Two-stage learning for multi-class classification using genetic programming
Introduction
Data classification is applied in many real-world problems, such as fraud detection, face recognition, speech recognition and knowledge extraction from databases. The field is receiving increased attention because of the unpredictability and complexity of real-world data. Evolutionary algorithms have performed well on classification tasks. Genetic Programming (GP) is an evolutionary algorithm introduced by Koza [1] for the automatic evolution of computer programs (including classifiers). GP has been successfully used to evolve classifier programs such as decision trees [2]. Other GP-based classification approaches include the evolution of neural networks [3], [4], [5], autonomous classification systems [6], rule induction algorithms [7], and fuzzy rule-based systems and fuzzy Petri nets [5], [8]. Most of these methods define a grammar that is used to create and evolve classification algorithms with GP.
Various researchers [9], [10], [11], [12], [13] have used GP to evolve classification rules. Rule-based systems include the atomic representations proposed by Eggermont [14], [15] and the SQL-based representations proposed by Freitas et al. [12]. Tunstel and Jamshidi [16], Berlanga et al. [17] and Mendes et al. [18] introduced the evolution of fuzzy rules using GP. Chien et al. [19] used a fuzzy discrimination function for classification. Falco et al. [20] discovered comprehensive classification rules that use continuous-valued attributes. Bojarczuk et al. [21], [22] used different sets of functions for different types of attributes and represented rules in disjunctive normal form; this type of GP is also referred to as constrained-syntax GP. Tsakonas et al. [23] introduced two GP-based systems for the medical domain and achieved noticeable performance. Lin et al. [24] proposed a layered GP, in which different layers correspond to different populations that perform feature extraction and classification. Another method is the evolution of arithmetic expressions for classification. Arithmetic expressions can be applied to numerical data and output a real value, which is translated into a classification decision using thresholds. Threshold schemes include static thresholds [25], [26], dynamic thresholds [26], [27] and slotted thresholds [28].
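The threshold idea can be illustrated with a minimal sketch, assuming fixed (static) class boundaries on the real line. The helper name `classify_by_static_thresholds`, the hand-written `expression` and the boundary values are hypothetical and are not taken from the cited works; dynamic and slotted thresholds would choose the boundaries differently.

```python
from bisect import bisect_right
from typing import Sequence


def classify_by_static_thresholds(output: float, boundaries: Sequence[float]) -> int:
    """Map a real-valued expression output to a class index.

    `boundaries` must be sorted in ascending order; k boundaries define
    k + 1 classes. (Hypothetical helper; boundary values are assumptions.)
    """
    return bisect_right(boundaries, output)


def expression(x1: float, x2: float) -> float:
    """Stand-in for an evolved arithmetic expression (written by hand here)."""
    return x1 * x2 - 3.0


# Assumed boundaries: class 0 if out < 0, class 1 if 0 <= out < 5, class 2 otherwise.
boundaries = [0.0, 5.0]
inputs = [(1.0, 1.0), (2.0, 2.0), (3.0, 4.0)]
labels = [classify_by_static_thresholds(expression(a, b), boundaries) for a, b in inputs]
```

With static boundaries a single expression can serve more than two classes, but the boundaries are fixed before evolution starts, so the expression has to fit them.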
Multi-class classification problems are common in real-world applications, in tasks such as object recognition, character recognition, person recognition, disease diagnosis and several others. Many classification algorithms are binary in nature and must be extended for multi-class classification. These include neural networks, decision trees, k-nearest neighbor, naive Bayes classifiers, and support vector machines [29]. GP also needs to be extended for multi-class classification problems, and several methods have been presented for doing so. The most notable is the one-versus-all method, also known as the binary decomposition method, which has been used widely in GP-based multi-class classification. In this method, one classifier is evolved for each class, discriminating that class from the other classes present in the data. The final decision is made by presenting the input vector to the classifiers of all classes; the classifier with a positive or the highest output is declared the winner. This method has been explored by many researchers [30], [31], [32], [33], [34]. A different method, proposed by Muni et al. [35], uses a multi-tree representation in which a single classifier is an integrated version of the individual classifiers for all classes. This amalgamated classifier is evolved in search of the best classifier that can recognize any of the classes in a single evolution.
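The one-versus-all decision step described above can be sketched as follows. This is only an illustration under stated assumptions: the three per-class classifiers are hand-written stand-ins for evolved GP expressions, and `one_versus_all_predict` is a hypothetical helper that applies the "highest output wins" rule.

```python
from typing import Callable, Dict, Sequence

Classifier = Callable[[Sequence[float]], float]


def one_versus_all_predict(classifiers: Dict[str, Classifier], x: Sequence[float]) -> str:
    """Return the label whose classifier gives the highest output for x."""
    return max(classifiers, key=lambda label: classifiers[label](x))


# Hand-written stand-ins for evolved expressions (assumption, illustration only).
classifiers: Dict[str, Classifier] = {
    "A": lambda x: x[0] - x[1],
    "B": lambda x: x[1] - x[0],
    "C": lambda x: 0.5 - abs(x[0] - x[1]),
}

winner = one_versus_all_predict(classifiers, [2.0, 0.5])
```

Here the outputs are 1.5 for "A", -1.5 for "B" and -1.0 for "C", so "A" is selected. Taking the maximum always yields a decision, whereas the "positive output" rule can leave an input with zero or several claimed classes.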
Several other methods, such as ‘all versus all’ [36], error correcting output codes [37], and generalized error correcting output codes [38], have also been used to tackle multi-class classification problems with binary classification algorithms. However, none of them has been used in GP because of the large number of computations they require.
The drawback of the binary decomposition method is the occurrence of conflicting situations, in which more than one classifier outputs a positive signal or none of the classifiers outputs a belong-to signal. Such situations degrade the classification accuracy. Several conflict resolution methods have been devised for this problem, but they require extra processing during the training and classification steps. Another problem is the presence of skewed data: the data appear unbalanced when a single class is classified against all remaining classes. This problem is solved by increasing the number of training instances so that they appear balanced for each class [30], [36]. The technique is named ‘interleaved data format’: the samples belonging to the class under consideration are repeated and placed alternately between samples belonging to the other classes. This strategy increases the size of the training data as well as the training time.
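Both problems can be made concrete with a short sketch. The function names `is_conflict` and `interleave_for_class` are hypothetical, and the interleaving shown is one plausible reading of the description above (each other-class sample is followed by a cyclically repeated sample of the class under consideration), assuming that class is the minority; it is not the exact procedure of [30], [36].

```python
from itertools import cycle
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]


def is_conflict(outputs: Dict[str, float]) -> bool:
    """True when zero or more than one classifier gives a positive output."""
    positives = sum(1 for value in outputs.values() if value > 0)
    return positives != 1


def interleave_for_class(samples: List[Sample], target: str) -> List[Tuple[Sequence[float], int]]:
    """Build a balanced binary training set for `target` versus the rest.

    Every sample of another class (label 0) is followed by a sample of the
    target class (label 1); target samples are repeated cyclically.
    """
    positives = [x for x, y in samples if y == target]
    negatives = [x for x, y in samples if y != target]
    if not positives:
        raise ValueError(f"no samples for class {target!r}")
    interleaved: List[Tuple[Sequence[float], int]] = []
    for neg, pos in zip(negatives, cycle(positives)):
        interleaved.append((neg, 0))
        interleaved.append((pos, 1))
    return interleaved


# Toy data (assumption): one sample of class "A" against four of other classes.
data: List[Sample] = [([0.1], "A"), ([0.9], "B"), ([0.8], "B"), ([0.5], "C"), ([0.4], "C")]
balanced = interleave_for_class(data, "A")
```

In this toy case the five original samples become eight training instances for class "A" alone, which shows why the strategy increases training time when it is repeated for every class.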
The proposed staged approach overcomes these two problems. It evolves the classifiers in two different stages that perform discrimination and integration, and it incorporates a discriminative fitness function that handles skewed data without increasing the computation. The integrated evolution eliminates the conflicting situations, which removes the evaluation time otherwise required for conflict resolution. The proposed algorithm is detailed in the next section.
Proposed methodology
Many attempts have been made to develop general approaches to multi-class classification. One of the well-known methods in the machine learning community is the one vs. all method. It involves learning a discriminator for each class label against all the remaining class labels. The proposed classification mechanism uses the same principle but divides the training process into two phases. The first stage resembles the traditional binary decomposition method. The output, given by this phase, is a set of classifier populations for
Results
Five benchmark multi-class classification problems have been selected from the UCI ML repository [41] for performance evaluation of this work. We have selected the datasets based on the following properties:
(1) Dataset should be real or numerical valued.
(2) Problem should be multi-class classification.
(3) There should be no missing values.
The datasets have been chosen from a variety of application domains to show the applicability of GP classification as well as the generalization of our proposed optimization technique.
Conclusions
The proposed two-stage learning mechanism for multi-class classification using Genetic Programming has yielded better results than the one-versus-all or binary decomposition method. This is because the binary decomposition method suffers from conflicting situations. In contrast, we have used a fitness measure that favors accurate classifiers and fewer conflicting outputs. The proposed method reduces the computation required to perform the conflict resolution during the
Hajira Jabeen has been working as an assistant professor at Iqra University, Islamabad, Pakistan, since 2009. Her fields of expertise include evolutionary computation, swarm intelligence and data classification.
References (48)
Genetic programming neural networks: a powerful bioinformatics tool for human genetics. Appl. Soft Comput. (2007)
A comparison of classification accuracy of four genetic programming-evolved intelligent structures. Inf. Sci. (2006)
et al. An autonomous GP-based system for regression and classification problems. Appl. Soft Comput. (2009)
et al. Learning discriminant functions with fuzzy attributes for classification using genetic programming. Expert Syst. Appl. (2002)
A constrained-syntax genetic programming system for discovering classification rules: application to medical data sets. Artif. Intelligence Med. (2004)
et al. Cancer classification using Rotation Forest. Comput. Biol. Med. (2008)
et al. An overview of ensemble methods for binary classifiers in multi-class problems: experimental study on one-vs-one and one-vs-all schemes. Pattern Recognition (2011)
Genetic Programming: On the Programming of Computers by Means of Natural Selection (1992)
Concept formation and decision tree induction using the genetic programming paradigm. Lect. Notes Comput. Sci. (1991)
et al. Modifying genetic programming for artificial neural network development for data mining. Soft Comput. (2008)
Evolving rule induction algorithms with multiobjective grammar based genetic programming. Knowl. Inf. Syst.
Genetic programming—a tool for flexible rule extraction. IEEE Cong. Evol. Comput.
A building block approach to genetic programming for rule discovery, in data mining: a heuristic approach
Evolution of classification rules for comprehensible knowledge discovery. IEEE Cong. Evol. Comput.
A Genetic Programming Framework for Two Data Mining Tasks: Classification and Generalized Rule Induction
Applying genetic programming technique in classification trees. Soft Computing
On genetic programming of fuzzy rule-based systems for intelligent control. Int. J. Intelligent Autom. Soft Comput.
An innovative application of a constrained-syntax genetic programming system to the problem of predicting survival of patients
Abdul Rauf Baig has been associated with National University of Computing and Emerging Technologies, NU-FAST, Islamabad, Pakistan, since 2004. His fields of expertise include artificial intelligence, data mining and swarm intelligence.