Elsevier

Knowledge-Based Systems

Volume 212, 5 January 2021, 106597
Output-based transfer learning in genetic programming for document classification

https://doi.org/10.1016/j.knosys.2020.106597

Abstract

Transfer learning has been studied in document classification as a way of transferring a model trained on a source domain (SD) to a relatively similar target domain (TD). Feature-based transfer learning techniques investigate which features are transferred from SD to TD. This paper investigates an output-based transfer learning system using Genetic Programming (GP), which automatically selects features to construct classifiers, for document classification tasks. The proposed GP system generates programs directly from a set of sparse features and considers only the change in the outputs of the evolved programs from SD to TD. A linear model is then used to combine existing GP programs from SD as features for TD. In addition, new GP programs are mutated from the programs evolved in SD to improve accuracy. By directly using the evolved GP programs and their mutations, the feature extraction and estimation processes on TD are avoided. The experimental results demonstrate that the GP programs from SD can be used effectively for classifying documents in the relevant TD. The results also show that it is easy to train effective classifiers on TD when the GP programs are used as features. Furthermore, the proposed linear model, which takes multiple GP programs from SD as its inputs, outperforms single GP programs obtained directly from TD.

Introduction

Document classification has been addressed by a large number of machine learning algorithms [1], [2], [3], [4]. The number of features in a document classification task is often large, and a feature selection method is typically required [5], [6]. Some categories in a document classification task, such as category “comp.graphics” and category “comp.windows.x” in [7], are very closely related to each other [3], [7], [8].

Transfer learning techniques [9], [10], [11], [12] have been employed to train classifiers on a few categories and then pass the learned knowledge to other, similar categories [7]. Within a set of similar categories, the distribution of the selected features often differs from one category to another [7]. Therefore, when a model trained on a source domain is applied to a target domain, the change in the distribution of the selected features has to be taken into account [13], [14].

Genetic programming (GP) has been used effectively for feature selection in different applications, e.g., multiple classification [15], cybersecurity [16], and high dimensional data classification [17]. GP evolves programs that automatically include a subset of the features. In a question–answer ranking task [18], GP effectively selects a smaller set of features than other methods. This suggests that GP is capable of automatically selecting proper features to construct effective text classifiers. This research is motivated by the observation that an evolved GP program transforms the features it selects into a single output-based composite feature. Between a source domain and a target domain, only the difference in this transformed space is taken into account. Thus the features selected from the source domain do not require an estimate of their distribution change when they are employed in the target domain. It is promising to explore how GP programs evolved in the source domain can be transferred effectively to the target domain. Additionally, output-based transfer learning approaches [19], [20] have been proposed to train a classifier on a small dataset. The shared features are transformed from the target domain to the source domain, and the proposed target objective function considers the probability of each predicted category based on the outputs of the classifier on all the training data (including both the source domain and the target domain training data). The reported results show that the effectiveness of transfer learning can be improved when the output of the trained classifier is considered.
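To make the idea of a GP program acting as a single composite feature concrete, the following is a minimal sketch, not the authors' implementation. It assumes a program is an expression tree over word features with the arithmetic operators `+`, `-` and `*`; the function set, the word names and the helper names `evaluate` and `used_features` are all hypothetical choices made for illustration.

```python
from typing import Dict, Set, Tuple, Union

# Assumed representation: a program is a word feature (str), a constant
# (float), or a tuple (operator, left_subtree, right_subtree).
Program = Union[str, float, Tuple[str, "Program", "Program"]]
Doc = Dict[str, float]  # sparse document: word -> weight, absent words are 0


def evaluate(program: Program, doc: Doc) -> float:
    """Return the single output value of a program tree on one document."""
    if isinstance(program, str):
        return doc.get(program, 0.0)  # sparse lookup
    if isinstance(program, (int, float)):
        return float(program)
    op, left, right = program
    a, b = evaluate(left, doc), evaluate(right, doc)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    raise ValueError(f"unknown operator: {op}")


def used_features(program: Program) -> Set[str]:
    """Collect the word features a program refers to (its implicit selection)."""
    if isinstance(program, str):
        return {program}
    if isinstance(program, (int, float)):
        return set()
    _, left, right = program
    return used_features(left) | used_features(right)


# Hypothetical program and document, for illustration only.
program = ("-", ("+", "graphics", "image"), ("*", 2.0, "window"))
doc = {"graphics": 1.0, "image": 3.0, "server": 5.0}

score = evaluate(program, doc)      # (1 + 3) - 2 * 0
selected = used_features(program)   # words the program depends on
```

In this sketch the document contains the word "server", but the program never reads it, so it has no effect on the output. This is the sense in which a program selects a subset of the features and maps them to one value.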

This paper investigates an output-based transfer learning system using GP for document classification. GP programs are evolved directly from a set of sparse features on a source domain, without a separate feature selection method. The evolved programs are then applied directly to a target domain without considering the distribution of the input features; only the change in the outputs of the GP programs is considered. A linear model is proposed to combine a set of these GP programs, and it is optimised on the training data from the target domain. Furthermore, new programs are mutated directly from these GP programs, and the features represented by the mutated programs enrich the data used by the linear model on the target domain. The major contributions of this paper are as follows:

  • a transfer learning technique is proposed that can effectively and directly transfer GP programs evolved from the source domain to the target domain;

  • a method is introduced that can effectively combine GP programs evolved from the source domain and their mutated programs for predicting the test documents in the target domain; and

  • the evolved GP program can be explained to some extent in the context of document classification.
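The transfer step described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the system proposed in the paper: the programs are hand-written stand-ins for programs evolved on a source domain, `replace_terminal` is a simplified deterministic stand-in for GP mutation, and a perceptron is used as the linear model because the training procedure for the linear model is not given in this excerpt. All names, words and documents are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Program = Union[str, float, Tuple[str, "Program", "Program"]]
Doc = Dict[str, float]


def evaluate(program: Program, doc: Doc) -> float:
    """Evaluate a program tree on a sparse document (absent words are 0)."""
    if isinstance(program, str):
        return doc.get(program, 0.0)
    if isinstance(program, (int, float)):
        return float(program)
    op, left, right = program
    a, b = evaluate(left, doc), evaluate(right, doc)
    return {"+": a + b, "-": a - b, "*": a * b}[op]


def replace_terminal(program: Program, old: str, new: str) -> Program:
    """Simplified stand-in for mutation: swap one word feature for another."""
    if isinstance(program, str):
        return new if program == old else program
    if isinstance(program, (int, float)):
        return program
    op, left, right = program
    return (op, replace_terminal(left, old, new), replace_terminal(right, old, new))


def train_linear(rows: List[List[float]], labels: List[int], epochs: int = 100):
    """Fit weights and a bias with the perceptron rule (assumed linear model)."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for z, y in zip(rows, labels):
            score = sum(w * v for w, v in zip(weights, z)) + bias
            if y * score <= 0:  # misclassified: move the boundary toward y
                weights = [w + y * v for w, v in zip(weights, z)]
                bias += y
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias


# Stand-ins for programs evolved on the source domain (hypothetical).
source_programs = [("-", "graphics", "window"), ("+", "image", "render")]
mutated_programs = [replace_terminal(source_programs[0], "window", "server")]
programs = source_programs + mutated_programs

# Small labelled target-domain training set (hypothetical), labels in {+1, -1}.
target_docs = [
    {"graphics": 2.0, "image": 1.0},
    {"image": 2.0, "render": 1.0},
    {"server": 2.0, "window": 1.0},
    {"server": 1.0},
]
target_labels = [1, 1, -1, -1]


def to_outputs(doc: Doc) -> List[float]:
    """Represent a document only by the outputs of the transferred programs."""
    return [evaluate(p, doc) for p in programs]


rows = [to_outputs(d) for d in target_docs]
weights, bias = train_linear(rows, target_labels)


def predict(doc: Doc) -> int:
    z = to_outputs(doc)
    return 1 if sum(w * v for w, v in zip(weights, z)) + bias > 0 else -1
```

Note that the linear model never sees the word features of the target documents, only the program outputs, so no estimate of how the input feature distribution changes between domains is needed in this sketch.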

The remainder of this paper is organised as follows. Section 2 provides the background on document classification, transfer learning, and GP for text classification. Section 3 introduces the output-based transfer learning system for document classification. Section 4 describes the design of the experiments, and Section 5 presents the results and discussions. Section 6 concludes the paper and addresses future research directions.

Section snippets

Document classification

Document classification is the task of assigning a document to one or more categories based on its text content. To handle document classification tasks automatically, statistical techniques and artificial intelligence approaches have been used [5], [21], [22], [23].

Normally, document classification includes the stages of pre-processing, feature extraction, model training, and prediction. Some words, such as “an” and “the”, are removed in the stage of

The proposed output-based transfer learning approach

This section presents the proposed GP-based transfer learning system. Section 3.1 describes the main components of the GP system. Section 3.2 introduces the use of the output-based GP programs in the transfer learning system. Section 3.3 provides the overall structure of the system.

Dataset

The twenty newsgroup dataset [28] has been widely used by researchers [26], [27], [42]. Some categories in the dataset are very closely related to each other, such as “comp.graphics” and “comp.windows.x”. There are twenty categories and four groups. Each group is a major category; for example, “comp.graphics” and “comp.windows.x” belong to the major category “comp”. Following [42], six datasets are generated. Fig. 3 lists the details of the six binary classification tasks. The

Results and discussions

This section provides the test results on each dataset, together with a discussion of those results.

Conclusions

This paper investigated output-based transfer learning on GP programs for document classification. After GP programs were evolved in the source domains, a linear model was proposed to combine these GP programs for classifying documents in the target domains. In the experiments, the GP classifiers evolved in the source domains were shown to be helpful for classifying documents from the target domains. The combinations of randomly selected GP programs from a source domain in the proposed linear

CRediT authorship contribution statement

Wenlong Fu: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data, Writing - original draft, Writing - review & editing. Bing Xue: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data, Writing - original draft, Writing - review & editing. Xiaoying Gao: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data, Writing - original draft, Writing - review & editing. Mengjie Zhang:

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

All authors approved the version of the manuscript to be published.

Funding

No funding was received for this work.

Intellectual property

We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, with respect to intellectual property. In so doing we confirm that we have followed the regulations of our institutions concerning intellectual property.

References (50)

  • Kou G. et al. Evaluation of feature selection methods for text classification with small datasets using multiple criteria decision-making methods. Appl. Soft Comput. (2020)

  • Altınel B. et al. Semantic text classification: A survey of past and recent advances. Inf. Process. Manage. (2018)

  • Zhang W. et al. Learning document representation via topic-enhanced LSTM model. Knowl.-Based Syst. (2019)

  • Sidorov G. et al. Syntactic N-grams as machine learning features for natural language processing. Expert Syst. Appl. (2014)

  • Dey A. et al. Senti-N-Gram: An n-gram lexicon for sentiment analysis. Expert Syst. Appl. (2018)

  • Dogan T. et al. Improved inverse gravity moment term weighting for text classification. Expert Syst. Appl. (2019)

  • Escalante H.J. et al. Term-weighting learning via genetic programming for text classification. Knowl.-Based Syst. (2015)

  • Khodadi I. et al. Genetic programming-based feature learning for question answering. Inf. Process. Manage. (2016)

  • Ma J. et al. A filter-based feature construction and feature selection approach for classification using genetic programming. Knowl.-Based Syst. (2020)

  • Kim J. et al. Text classification using capsules. Neurocomputing (2020)

  • Feng G. et al. Relevance popularity: A term event model based feature selection scheme for text classification. PLoS One (2017)

  • Pan S.J. et al. Domain adaptation via transfer component analysis. IEEE Trans. Neural Netw. (2011)

  • Zhai Y. et al. Making trillion correlations feasible in feature grouping and selection. IEEE Trans. Pattern Anal. Mach. Intell. (2016)

  • Khan F.H. et al. Enhanced cross-domain sentiment classification utilizing a multi-source transfer learning approach. Soft Comput. (2019)

  • Pan S. et al. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. (2010)