Multiobjective optimization of classifiers by means of 3D convex-hull-based evolutionary algorithms
Introduction
Classification is one of the most common problems in machine learning. The task of classification is to assign instances in a dataset to target classes using previously trained classifiers. The ROC (Receiver Operating Characteristic) curve is a technique for visualizing, organizing and selecting binary classifiers based on their performance [24]. ROC curves are typically used to evaluate and compare the performance of classifiers, and they have properties that make them especially useful for domains with skewed class distributions and for problems in which different misclassifications carry different costs. Originating from the field of object classification in radar images, ROC analysis has become increasingly important in many other areas with cost-sensitive classification [15] and/or unbalanced data distributions [49], such as medical decision making [50], signal detection [20] and diagnostic systems [52]. As opposed to ROC curves, which show the tradeoff between true positive rate and false positive rate, DET (Detection Error Tradeoff) curves [41] show the tradeoff between false positive and false negative error rates. Because both DET axes are error rates, the tradeoff between the misclassification costs of binary classifiers is easier to visualize with DET curves than with ROC curves.
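The relationship between the rates plotted in ROC space and in DET space can be made concrete with a minimal Python sketch. The class `ConfusionCounts` and the function `rates` are hypothetical names introduced only for this illustration; the counts in the usage note are made-up numbers, not results from this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary classifier (hypothetical container for this sketch)."""
    tp: int  # positive instances classified as positive
    fn: int  # positive instances classified as negative
    fp: int  # negative instances classified as positive
    tn: int  # negative instances classified as negative


def rates(c: ConfusionCounts) -> dict:
    """Return tpr, fpr, fnr and the resulting ROC and DET points."""
    positives = c.tp + c.fn
    negatives = c.fp + c.tn
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative instance")
    tpr = c.tp / positives  # true positive rate
    fpr = c.fp / negatives  # false positive rate
    fnr = c.fn / positives  # false negative rate, equal to 1 - tpr
    return {
        "tpr": tpr,
        "fpr": fpr,
        "fnr": fnr,
        "roc_point": (fpr, tpr),  # ROC space: fpr on x, tpr on y
        "det_point": (fpr, fnr),  # DET space: both axes are error rates
    }
```

For example, `rates(ConfusionCounts(tp=80, fn=20, fp=10, tn=90))` gives the ROC point (0.1, 0.8) and the DET point (0.1, 0.2); the two points differ only in that the vertical axis is flipped, since fnr = 1 - tpr.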
More recently, research has drawn attention to ROC convex hull (ROCCH) analysis, which covers the potentially optimal points for a given set of classifiers [24]. ROCCH makes use of the finding that two hard classifiers can be combined into a classifier whose characteristics in ROC space correspond to a linear combination of the characteristics of the single classifiers. Thus, when searching for an approximation to the Pareto front, these linear combinations do not have to be explicitly represented in ROC space. A performance indicator for sets of hard binary classifiers that is compliant with the improvement of ROCCH is the area under the convex hull (AUC). Likewise, the area above the DET convex hull can serve as an indicator of how well a Pareto front has been approximated: it measures the area attained by the current Pareto front approximation in DET space.
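As a rough illustration of the ROCCH idea in two dimensions, the following Python sketch builds the upper convex hull of a set of ROC points and computes the area under it. It is a minimal sketch under stated assumptions, not the procedure used in this paper: the function names are hypothetical, the hull is built with a standard monotone-chain scan, and the trivial classifiers (0, 0) and (1, 1) are assumed to be always available.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Upper convex hull of ROC points (fpr, tpr), sorted by increasing fpr."""
    # Assumption: the always-negative (0, 0) and always-positive (1, 1)
    # classifiers are added to the set.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last vertex while it lies on or below the line from
        # hull[-2] to p, i.e. while the turn is not strictly clockwise.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def area_under_hull(hull):
    """Trapezoidal area under a piecewise-linear hull."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(hull, hull[1:]))


def combine(a, b, lam):
    """ROC point reached by using classifier a with probability lam, else b."""
    return (lam * a[0] + (1 - lam) * b[0], lam * a[1] + (1 - lam) * b[1])
```

With the made-up points `[(0.1, 0.6), (0.5, 0.5)]`, the point (0.5, 0.5) lies below the segment from (0.1, 0.6) to (1, 1) and is discarded, so the hull is (0, 0), (0.1, 0.6), (1, 1). The `combine` helper shows why points on hull edges need no explicit representation: any point on a segment between two hull vertices is reachable by randomizing between the two classifiers.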
Several evolutionary multiobjective optimization algorithms (EMOAs) [31], [32], [54], [57], [58] have been applied to machine learning [2], [33] and image processing [36], [39]. One of the first applications of EMOAs to ROC optimization was proposed in [35], where a niched Pareto multiobjective genetic algorithm was used for classifier optimization by optimizing the biobjective ROC curve. The generalization improvement in multiobjective learning was discussed in [27], which presented the generation of binary neural network classifiers based on ROC analysis using an evolutionary multiobjective optimization algorithm. It showed that generalization ability can be improved more efficiently within a multiobjective framework than within a single-objective one. ROC analysis for multiclass classification was studied in [22], where a multiobjective optimization algorithm was used for classifier training based on multiclass ROC analysis. The ROC front concept was introduced as an alternative to the ROC curve representation in [13], and the strategy was applied to the selection of classifiers in a pool using a multiobjective optimization algorithm. Moreover, the maximization of the performance of ROC representations with respect to the AUC indicator was the subject of a recent study by Wang et al. [55]. They showed that their convex-hull-based multiobjective genetic programming algorithm (CH-MOGP) improves ROCCH with respect to AUC more strongly than state-of-the-art EMOAs applied to the same task, such as NSGA-II (Nondominated Sorting Genetic Algorithm II) [16], GDE3 (the third evolution step of Generalized Differential Evolution) [34], SPEA2 (Strength Pareto Evolutionary Algorithm 2) [63], MOEA/D (Multiobjective Evolutionary Algorithm based on Decomposition) [61], and SMS-EMOA (multiobjective selection based on dominated hypervolume) [7].
So far, algorithms that seek to maximize ROCCH performance have focused only on the problem of optimizing binary classifiers with respect to two criteria, i.e., minimization of false positive rate (fpr) and maximization of true positive rate (tpr). There is, however, an increasing interest in extending ROCCH performance analysis to more than two criteria. In this research we consider complexity as an additional objective: the aim is to find models with maximum simplicity (parsimony) or minimum computational cost. For rule-based systems, complexity can be described as the number of rules defining a classifier in proportion to the number of all possible rules. As it is easier to see the tradeoff between misclassification costs (i.e., fpr and the false negative rate, fnr) in DET space than in ROC space, we use DET curves to describe the performance of binary classifiers.
In the past, convex-hull-based selection operators were employed in EMOAs to maintain a well-distributed set or to make non-dominated sorting more effective (cf. [30], [43]). In [14] a multiobjective evolutionary algorithm based on the properties of the convex hulls defined in ROC space was proposed. It was applied to determine a set of fuzzy rule-based binary classifiers with different tradeoffs between fpr and tpr. In [17], NSGA-II was used to generate an approximation of a Pareto front composed of genetic fuzzy classifiers with different tradeoffs among sensitivity, specificity, and interpretability. After projecting the overall Pareto front onto ROC space, the ROC convex hull method was used to determine the potentially optimal classifiers on the ROC plane.
In this paper, we add complexity minimization (equivalently, parsimony maximization) as a third objective function and formulate the problem from the point of view of misclassification error optimization, minimizing the false positive and false negative error rates. To this end, we model the problem as a triobjective optimization problem in augmented DET space, and we propose a 3D convex-hull-based evolutionary multiobjective algorithm (3DCH-EMOA) that takes into account domain-specific properties of the 3D augmented DET space. Moreover, we analyze and assess the performance of the algorithm in different studies on academic problems, some of them new, and on practical applications. To analyze the capability of different algorithms to maximize convex hull volume in a more fundamental study, a set of test problems named ZEJD (Zhao, Emmerich, Jiao, Deutz) [21] is designed, and the capability of 3DCH-EMOA to capture only the convex part of a Pareto front is assessed. In addition, we include a study on spam filter design, in which the complexity objective is determined by the number of rules used. We also apply the proposed algorithm to sparse neural networks, in which not only the classification performance but also the structure of the network is optimized.
This paper is organized as follows. Related work is outlined in Section 2, and the background of augmented DET surfaces and the theory of multiobjective optimization are introduced in Section 3. We describe the framework of the 3DCH-EMOA algorithm in Section 4, and experimental results on the ZEJD benchmark test problems are described and discussed in Section 5. The spam filter application and its experimental results are presented in Section 6. The experimental results on multiobjective optimization of sparse neural networks are discussed in Section 7. Section 8 provides the conclusion and a discussion of the important aspects and future perspectives of this work. In addition, details of the ZEJD test functions are described in Appendix A.
ROC and DET in classification
Both ROC and DET curves are defined by a two-by-two confusion matrix which describes the relationship between the true labels and predicted labels from a classifier. An example of a confusion matrix is shown in Table 1. There are four possible outcomes with binary classifiers in a confusion matrix. It is a true positive (TP), if a positive instance is classified as positive. We call it false negative (FN or type II error), if a positive instance is classified as negative. If a negative instance
Augmented DET and multiobjective formulation
Finding a set of optimal binary classifiers can be viewed as a biobjective problem, i.e., minimizing fpr and fnr simultaneously in DET space. Our study aims at optimizing three objectives for the parsimonious binary classification problem. We define parsimony (to be maximized) or, equivalently, complexity (to be minimized) as a third objective, in addition to fpr and fnr.
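To make the three-objective view concrete, the following Python sketch represents each classifier as a point (fpr, fnr, complexity) in augmented DET space, with all three coordinates minimized, and filters a set of such points down to its non-dominated subset. This is only a hedged illustration of Pareto dominance: the function names are hypothetical, the rule-ratio definition of complexity is an assumption for rule-based classifiers, and the sketch does not implement the convex-hull-based selection of 3DCH-EMOA.

```python
def complexity_ratio(used_rules: int, total_rules: int) -> float:
    """Complexity as the share of rules used (assumed rule-based classifier)."""
    if total_rules <= 0 or not 0 <= used_rules <= total_rules:
        raise ValueError("expected 0 <= used_rules <= total_rules, total_rules > 0")
    return used_rules / total_rules


def dominates(a, b) -> bool:
    """True if a is no worse than b in every objective and better in one.

    All objectives (fpr, fnr, complexity) are minimized.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def nondominated(points):
    """Keep the points that no other point in the list dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]
```

For instance, among the made-up points (0.1, 0.2, 0.5), (0.2, 0.3, 0.6) and (0.05, 0.4, 0.3), the second is dominated by the first, while the first and third are mutually non-dominated because each is better in at least one objective.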
3D convex-hull-based evolutionary multiobjective optimization
In this section, we propose the 3D convex-hull-based evolutionary multiobjective algorithm (3DCH-EMOA) for augmented DET convex hull (ADCH) maximization with three objectives. In this paper, we consider only the 3D convex hull, and the solutions of 3DCH-EMOA act as vertices of the convex hull in augmented DET space. The aim of 3DCH-EMOA is to find a set of non-dominated solutions that covers part of the surface of the 3D convex hull, which is constructed with the population (the population is described in objective space) and
Experimental studies on artificial test problems
In this section, the ZEJD test functions are adopted to test the performance of 3DCH-EMOA and several other EMOAs, including NSGA-II, GDE3, SMS-EMOA, SPEA2 and MOEA/D. In this first benchmark we are interested in the capability of 3DCH-EMOA to cover the relevant part of the convex hull surface with points. To evaluate the performance of these algorithms, VAS, the Gini coefficient, computational time and the Mann–Whitney test [40] are adopted in this section. By comparing the results of all algorithms we can
Spam problem
From a technical point of view, an email anti-spam system consists of a set of boolean filtering rules that jointly allow spam messages to be detected. Discovering the relative importance of these rules and assigning the corresponding score (weight) to each rule is a complex setup process. The need for frequent score reassignment for existing rules and for setting scores for new rules, to keep the anti-spam filter updated and running, requires the adoption of
Multiobjective optimization of sparse neural networks
In this section, the proposed algorithm is applied to a multiobjective formulation of sparse neural networks, which avoids overfitting by seeking parsimonious neural network models and hence provides better predictions in augmented DET space. The idea of a sparse neural network was proposed in [44], in which a fully connected feedforward neural network was pruned through optimization using a single-objective differential evolution algorithm to produce a sparse network that has good performance
Conclusions and future work
In this paper, we analyzed the properties of the augmented DET convex hull (ADCH) maximization problem. 3DCH-EMOA was proposed to optimize the performance of augmented DET for classification. In order to evaluate the performance of several EMOAs, a set of test problems, ZEJD, was designed. 3DCH-EMOA was compared with other EMOAs, such as NSGA-II, GDE3, SPEA2, MOEA/D and SMS-EMOA, on the ZEJD test problems. 3DCH-EMOA always obtains the best results, not only for convergence but also for diversity metrics. By
Acknowledgment
This work was partially supported by the National Basic Research Program (973 Program) of China (no. 2013CB329402), the National Natural Science Foundation of China (nos. 61473215, 61371201, 61373111, 61303032, 61271301, 61272279, 61203303 and 61571342), the Excellent Young Science Foundation of China (no. 61522311), the Fund for Foreign Scholars in University Research and Teaching Programs (the 111 Project) (no. B07048), the Major Research Plan of the National Natural Science Foundation of
References (63)
- et al. SMS-EMOA: multiobjective selection based on dominated hyper. Eur. J. Oper. Res. (2007)
- et al. A multi-model selection framework for unknown and/or evolutive misclassification cost problems. Pattern Recognit. (2010)
- et al. Time-constrained cost-sensitive decision tree induction. Inf. Sci. (2016)
- et al. jMetal: A java framework for multi-objective optimization. Adv. Eng. Softw. (2011)
- et al. Multi-class ROC analysis from a multi-objective optimisation perspective. Pattern Recognit. Lett. (2006)
- An introduction to ROC analysis. Pattern Recognit. Lett. (2006)
- et al. Operator adaptation in evolutionary computation and its application to structure optimization of neural networks. Neurocomputing (2003)
- et al. A novel selection evolutionary strategy for constrained optimization. Inf. Sci. (2013)
- et al. A modified objective function method with feasible-guiding strategy to solve constrained multi-objective optimization problems. Appl. Soft Comput. (2014)
- et al. A variable reduction strategy for evolutionary algorithms handling equality constraints. Appl. Soft Comput. (2015)