Elsevier

Information Sciences

Volumes 367–368, 1 November 2016, Pages 80-104

Multiobjective optimization of classifiers by means of 3D convex-hull-based evolutionary algorithms

https://doi.org/10.1016/j.ins.2016.05.026

Abstract

The receiver operating characteristic (ROC) and detection error tradeoff (DET) curves are frequently used in the machine learning community to analyze the performance of binary classifiers. Recently, the convex-hull-based multiobjective genetic programming algorithm was proposed and successfully applied to maximize the convex hull area for binary classification problems by simultaneously minimizing the false positive rate and maximizing the true positive rate using indicator-based evolutionary algorithms. The area under the ROC curve was used for performance assessment and to guide the search. Here we extend this research and propose two major advancements. First, we formulate the algorithm in detection error tradeoff space, minimizing false positives and false negatives, with the advantage that the misclassification cost tradeoff can be assessed directly. Second, we add complexity as an objective function, which gives rise to a 3D objective space (as opposed to the previous 2D ROC space). A domain-specific performance indicator for 3D Pareto front approximations, the volume above the DET surface, is introduced and used to guide the indicator-based evolutionary algorithm toward optimal approximation sets. We assess the performance of the new algorithm on designed theoretical problems with different geometries of Pareto fronts and DET surfaces, and on two application-oriented benchmarks: (1) designing spam filters with low numbers of false rejects and false accepts and low computational cost using rule ensembles, and (2) finding sparse neural networks for binary classification of test data from the UCI machine learning benchmark. The results show a high performance of the new algorithm as compared to conventional methods for multicriteria optimization.

Introduction

Classification is one of the most common problems in machine learning. The task of classification is to assign instances in a dataset to target classes using previously trained classifiers. The ROC (Receiver Operating Characteristic) curve is a technique for visualizing, organizing and selecting binary classifiers based on their performance [24]. ROC curves are typically used to evaluate and compare the performance of classifiers, and they have properties that make them especially useful for domains with skewed class distributions and for problems in which misclassifications carry costs. Originating from the field of object classification in radar images, ROC analysis has become increasingly important in many other areas with cost-sensitive classification [15] and/or unbalanced data distributions [49], such as medical decision making [50], signal detection [20] and diagnostic systems [52]. As opposed to ROC curves, which show the tradeoff between true positive rate and false positive rate, DET (Detection Error Tradeoff) curves [41] show the tradeoff between false positive and false negative error rates. With DET it is easier to visualize the tradeoff between misclassification costs for binary classifiers than with ROC curves.
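As a minimal sketch of the quantities behind the two curves, the following Python fragment computes the true positive rate, false positive rate and false negative rate of one hard binary classifier from its predictions. The function name, the `Rates` container and the 0/1 label encoding are assumptions made for illustration; they are not taken from the paper.

```python
from typing import NamedTuple, Sequence


class Rates(NamedTuple):
    tpr: float  # true positive rate (ROC vertical axis)
    fpr: float  # false positive rate (shared horizontal axis)
    fnr: float  # false negative rate (DET vertical axis), equal to 1 - tpr


def error_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Rates:
    """Compute rates from labels encoded as 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative instance")
    return Rates(tpr=tp / (tp + fn), fpr=fp / (fp + tn), fnr=fn / (tp + fn))


# Toy data: four positives followed by four negatives.
rates = error_rates([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 1, 0, 0, 0])
```

A classifier therefore maps to the point (fpr, tpr) in ROC space and to (fpr, fnr) in DET space; since fnr = 1 - tpr, the DET point is the ROC point mirrored along the vertical axis.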

More recently, research has drawn attention to ROC convex hull (ROCCH) analysis, which covers the potentially optimal points for a given set of classifiers [24]. ROCCH exploits the finding that two hard classifiers can be combined into a classifier whose characteristics in ROC space are a linear combination of the characteristics of the individual classifiers. Thus, when searching for an approximation to the Pareto front, these linear combinations do not have to be represented explicitly in ROC space. A performance indicator for sets of hard binary classifiers that is compliant with the improvement of ROCCH is the area under the convex hull (AUC). Likewise, the area above the DET convex hull can serve as an indicator of how well a Pareto front has been approximated: it measures the area attained by the current Pareto front approximation in DET space.
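The two ideas in this paragraph can be illustrated with a short Python sketch. It assumes that DET points are given as (fpr, fnr) pairs, that the trivial classifiers (0, 1) and (1, 0) are always available, and that the area above the hull is measured inside the unit square, which makes it equal to the area under the ROC convex hull because fnr = 1 - tpr. All function names are hypothetical, and the hull is built with a standard monotone-chain scan rather than any procedure specified in the paper.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # (fpr, fnr), both to be minimized


def mix(a: Point, b: Point, lam: float) -> Point:
    """Expected DET point when classifier a is used with probability lam, else b."""
    return (lam * a[0] + (1 - lam) * b[0], lam * a[1] + (1 - lam) * b[1])


def _cross(o: Point, a: Point, b: Point) -> float:
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def det_convex_hull(points: Iterable[Point]) -> List[Point]:
    """Lower convex hull from (0, fnr_min) to (1, 0), sorted by fpr."""
    best = {0.0: 1.0, 1.0: 0.0}  # trivial all-negative and all-positive classifiers
    for fpr, fnr in points:
        best[fpr] = min(best.get(fpr, fnr), fnr)  # keep the lowest fnr per fpr
    hull: List[Point] = []
    for p in sorted(best.items()):
        # Drop the last vertex while it lies on or above the segment to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def area_above_det_hull(points: Iterable[Point]) -> float:
    """Unit-square area above the DET convex hull (trapezoid rule on the hull)."""
    hull = det_convex_hull(points)
    under = sum((x2 - x1) * (y1 + y2) / 2.0 for (x1, y1), (x2, y2) in zip(hull, hull[1:]))
    return 1.0 - under


classifiers = [(0.1, 0.4), (0.3, 0.2), (0.5, 0.3)]
hull = det_convex_hull(classifiers)
area = area_above_det_hull(classifiers)
midpoint = mix((0.1, 0.4), (0.3, 0.2), 0.5)
```

In this toy set the point (0.5, 0.3) lies above the segment joining (0.3, 0.2) and (1, 0), so it is not a hull vertex and does not contribute to the indicator, while `midpoint` lies on a hull edge and need not be stored explicitly.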

Some evolutionary multiobjective optimization algorithms (EMOAs) [31], [32], [54], [57], [58] have been applied to machine learning [2], [33] and image processing [36], [39]. One of the first algorithms to use EMOAs for ROC optimization was proposed in [35], where a niched Pareto multiobjective genetic algorithm was used for classifier optimization by optimizing a biobjective ROC curve. The generalization improvement in multiobjective learning was discussed in [27], which presented the generation of binary neural network classifiers based on ROC analysis using an evolutionary multiobjective optimization algorithm. It showed that generalization ability can be improved more efficiently within a multiobjective framework than within a single-objective one. ROC for multiclass classification was analyzed in [22], where a multiobjective optimization algorithm was used for classifier training based on multiclass ROC analysis. The ROC front concept was introduced as an alternative to the ROC curve representation in [13], and the strategy was applied to the selection of classifiers from a pool using a multiobjective optimization algorithm. Moreover, the maximization of the performance of ROC representations with respect to the AUC indicator was the subject of a recent study by Wang et al. [55]. They showed that their convex-hull-based multiobjective genetic programming algorithm (CH-MOGP) performs strongly in improving ROCCH with respect to AUC compared with state-of-the-art EMOAs applied to the same task, such as NSGA-II (Nondominated Sorting Genetic Algorithm II) [16], GDE3 (the third evolution step of Generalized Differential Evolution) [34], SPEA2 (Strength Pareto Evolutionary Algorithm 2) [63], MOEA/D (Multiobjective Evolutionary Algorithm based on Decomposition) [61], and SMS-EMOA (multiobjective selection based on dominated hypervolume) [7].

So far, algorithms that seek to maximize ROCCH performance have focused only on the problem of optimizing binary classifiers with respect to two criteria, i.e., minimization of the false positive rate (fpr) and maximization of the true positive rate (tpr). There is, however, an increasing interest in extending ROCCH performance analysis to more than two criteria. In this research we consider complexity as an additional objective. The objective here is to find models with maximum simplicity (parsimony) or minimum computational cost. For rule-based systems, complexity can be described as the number of rules defining a classifier in proportion to the number of all possible rules. As it is easier to see the tradeoff between misclassification costs (i.e., fpr and fnr) in DET space than in ROC space, we use the DET curve to describe the performance of binary classifiers.
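The rule-based complexity measure described above can be sketched in a few lines of Python. The function names and the rule identifiers `r1` to `r5` are hypothetical, and the sketch assumes that a classifier is identified simply by the subset of rules it uses.

```python
from typing import Iterable, Tuple


def complexity_ratio(used_rules: Iterable[str], all_rules: Iterable[str]) -> float:
    """Fraction of the available rules that a classifier actually uses."""
    available = set(all_rules)
    used = set(used_rules)
    if not available:
        raise ValueError("the rule pool must not be empty")
    if not used <= available:
        raise ValueError("a classifier may only use rules from the pool")
    return len(used) / len(available)


def objective_vector(fpr: float, fnr: float, complexity: float) -> Tuple[float, float, float]:
    """Bundle the three objectives; all three are to be minimized."""
    return (fpr, fnr, complexity)


# Hypothetical pool of five rules; the classifier uses two of them.
pool = ["r1", "r2", "r3", "r4", "r5"]
point = objective_vector(fpr=0.05, fnr=0.20, complexity=complexity_ratio(["r2", "r5"], pool))
```

Normalizing by the size of the rule pool keeps the complexity objective in [0, 1], the same range as fpr and fnr.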

In the past, convex-hull-based selection operators were employed in EMOAs to maintain a well-distributed set or to make non-dominated sorting more effective (cf. [30], [43]). In [14], a multiobjective evolutionary algorithm based on the properties of convex hulls defined in ROC space was proposed. It was applied to determine a set of fuzzy rule-based binary classifiers with different tradeoffs between false positive rate (fpr) and true positive rate (tpr). In [17], NSGA-II was used to generate an approximation of a Pareto front composed of genetic fuzzy classifiers with different tradeoffs among sensitivity, specificity, and interpretability. After projecting the overall Pareto front onto ROC space, the ROC convex hull method was used to determine the potentially optimal classifiers in the ROC plane.

In this paper, we add complexity minimization (equivalently, parsimony maximization) as a third objective function and formulate the problem from the point of view of misclassification error optimization, minimizing the false positive and false negative error rates. To this end we model the problem as a triobjective optimization problem in augmented DET space and propose a 3D convex-hull-based evolutionary multiobjective algorithm (3DCH-EMOA) that takes into account domain-specific properties of the 3D augmented DET space. Moreover, we analyze and assess the performance of the algorithm in different studies on academic problems, some of them new, and on practical applications. To analyze the capability of different algorithms to maximize convex hull volume in a more fundamental study, a set of test problems named ZEJD (Zhao, Emmerich, Jiao, Deutz) [21] is designed, and the capability of 3DCH-EMOA to capture only the convex part of a Pareto front is assessed. In addition, we include a study on spam filter design, in which the number of rules used determines the complexity objective. We also apply the proposed algorithm to sparse neural networks, in which not only the classification performance but also the structure of the network is optimized.

This paper is organized as follows: related work is outlined in Section 2, and the background of augmented DET surfaces and the theory of multiobjective optimization are introduced in Section 3. We describe the framework of the 3DCH-EMOA algorithm in Section 4, and experimental results on the ZEJD benchmark test problems are described and discussed in Section 5. The spam filter application and its experimental results are presented in Section 6. The experimental results on multiobjective optimization of sparse neural networks are discussed in Section 7. Section 8 provides the conclusion and a discussion of the important aspects and future perspectives of this work. In addition, details of the ZEJD test functions are given in Appendix A.

ROC and DET in classification

Both ROC and DET curves are defined by a two-by-two confusion matrix, which describes the relationship between the true labels and the labels predicted by a classifier. An example of a confusion matrix is shown in Table 1. A binary classifier has four possible outcomes in a confusion matrix. An outcome is a true positive (TP) if a positive instance is classified as positive. It is a false negative (FN, or type II error) if a positive instance is classified as negative. If a negative instance

Augmented DET and multiobjective formulation

Finding a set of optimal binary classifiers can be viewed as a biobjective problem, i.e., minimizing fpr and fnr simultaneously in DET space. Our study aims at optimizing three objectives for the parsimonious binary classification problem. In addition to fpr and fnr, we define parsimony (to be maximized) or, equivalently, complexity (to be minimized) as a third objective.
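To make the triobjective formulation concrete, the following Python sketch filters a set of candidate classifiers, each described by a hypothetical (fpr, fnr, complexity) triple, down to its non-dominated subset under minimization of all three objectives. This is the generic Pareto dominance relation, not the convex-hull-based selection of 3DCH-EMOA: a non-dominated point may still lie inside the convex hull and thus be irrelevant to the hull-based indicator.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, float, float]  # (fpr, fnr, complexity), all minimized


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated(points: Sequence[Objectives]) -> List[Objectives]:
    """Keep the points that no other point in the set dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate classifiers in augmented DET space.
candidates = [
    (0.1, 0.4, 0.5),
    (0.2, 0.3, 0.2),
    (0.2, 0.4, 0.6),
    (0.1, 0.4, 0.3),
]
front = nondominated(candidates)
```

Here the first and third candidates are dropped: each is matched or beaten on both error rates by a classifier that is also simpler.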

3D convex-hull-based evolutionary multiobjective optimization

In this section, we propose a 3D convex-hull-based evolutionary multiobjective algorithm (3DCH-EMOA) for augmented DET convex hull (ADCH) maximization with three objectives. In this paper, we only consider the 3D convex hull, and the solutions of 3DCH-EMOA act as vertices of the convex hull in augmented DET space. The aim of 3DCH-EMOA is to find a set of non-dominated solutions that covers part of the surface of the 3D convex hull, which is constructed with population Q ⊂ ℝ³ (the population is described in objective space) and

Experimental studies on artificial test problems

In this section, the ZEJD test functions are adopted to test the performance of 3DCH-EMOA and several other EMOAs, including NSGA-II, GDE3, SMS-EMOA, SPEA2 and MOEA/D. In this first benchmark we are interested in the capability of 3DCH-EMOA to cover the relevant part of the convex hull surface with points. To evaluate the performance of these algorithms, the volume above the DET surface (VAS), the Gini coefficient, computational time and the Mann–Whitney test [40] are adopted in this section. By comparing the results of all algorithms we can

Spam problem

From a technical point of view, an email anti-spam system consists of a set of boolean filtering rules (denoted as Ru = {r_1, r_2, ..., r_|Ru|}) that jointly allow spam messages to be detected. Discovering the relative importance of these rules and assigning a corresponding score (weight) to each rule is a complex setup process. The need to frequently reassign scores for existing rules and to set scores for new rules, in order to keep the anti-spam filter updated and running, requires the adoption of

Multiobjective optimization of sparse neural networks

In this section, the proposed algorithm is applied to optimize a multiobjective formulation of sparse neural networks, in order to avoid overfitting by seeking parsimonious neural network models and hence to provide better predictions in augmented DET space. The idea of sparse neural networks was proposed in [44], in which a fully connected feedforward neural network was pruned through optimization using a single-objective differential evolution algorithm to produce a sparse network that has good performance

Conclusions and future work

In this paper, we analyzed the properties of the augmented DET convex hull (ADCH) maximization problem. 3DCH-EMOA is proposed to optimize the performance of augmented DET for classification. In order to evaluate the performance of several EMOAs, a set of test problems, ZEJD, is designed. 3DCH-EMOA is compared with other EMOAs, such as NSGA-II, GDE3, SPEA2, MOEA/D and SMS-EMOA, on the ZEJD test problems. 3DCH-EMOA always obtains the best results, not only for convergence but also for diversity metrics. By

Acknowledgment

This work was partially supported by the National Basic Research Program (973 Program) of China (no. 2013CB329402), the National Natural Science Foundation of China (nos. 61473215, 61371201, 61373111, 61303032, 61271301, 61272279, 61203303 and 61571342), the Excellent Young Science Foundation of China (no. 61522311), the Fund for Foreign Scholars in University Research and Teaching Programs (the 111 Project) (no. B07048), the Major Research Plan of the National Natural Science Foundation of

References (63)

  • S. Yitzhaki

    Relative deprivation and the Gini coefficient

    Q. J. Econ.

    (1979)
  • Q. Zhang et al.

    MOEA/D: A multiobjective evolutionary algorithm based on decomposition

    IEEE Trans. Evol. Comput.

    (2007)
  • The apache spamassassin project. The powerful #1 open-source spam filter. SpamAssassin, 2011,...
  • W.A. Albukhanajer et al.

    Evolutionary multi-objective image feature extraction in the presence of noise

    IEEE Trans. Cybern.

    (2014)
  • R. Ariew

    Ockham’s Razor: A Historical and Philosophical Analysis of Ockham’s Principle of Parsimony

    (1976)
  • C.B. Barber et al.

    The quickhull algorithm for convex hulls

    ACM Trans. Math. Softw. (TOMS)

    (1996)
  • M. Barreno, A.A. Cárdenas, J.D. Tygar, Optimal ROC curve for a combination of classifiers, MIT Press, 2008. Proceedings...
  • V. Basto-Fernandes et al.

    Anti-spam multiobjective genetic algorithms optimization analysis

    Int. Resour. Manag. J.

    (2012)
  • U. Bhowan et al.

    Evolving diverse ensembles using genetic programming for classification with unbalanced data

    IEEE Trans. Evol. Comput.

    (2013)
  • U. Bhowan et al.

    Multi-objective genetic programming for classification with unbalanced data

    Proceedings of the 22nd Australasian Joint Conference: Advances in Artificial Intelligence, (AI)

    (2009)
  • L. Bottou

    Stochastic gradient learning in neural networks

    Proceedings of the 4th International Conference on Neural Networks and Their Applications (Neuro-Nîmes)

    (1991)
  • C. Bourke et al.

    On reoptimizing multi-class classifiers

    Mach. Learn.

    (2008)
  • C.C. Chang et al.

    LIBSVM: a library for support vector machines

    ACM Trans. Intell. Syst. Technol.

    (2011)
  • M. Cococcioni et al.

    A new multi-objective evolutionary algorithm based on convex hull for binary classifier optimization

    Proceedings of the IEEE Congress on Evolutionary Computation

    (2007)
  • K. Deb et al.

    A fast and elitist multiobjective genetic algorithm: NSGA-II

    IEEE Trans. Evol. Comput.

    (2002)
  • P. Ducange et al.

    Multi-objective genetic fuzzy classifiers for imbalanced and cost-sensitive datasets

    Soft Comput.

    (2010)
  • J.J. Durillo et al.

    The jMetal framework for multi-objective optimization: design and architecture

    Proceedings of the IEEE Congress on Evolutionary Computation CEC 2010

    (2010)
  • J.P. Egan

    Signal Detection Theory and ROC Analysis

    (1975)
  • M.T. Emmerich et al.

    A family of test problems with Pareto-fronts of variable curvature based on super-spheres

    Proceedings of the 18th International Conference on Multicriteria Decision Making MCDM

    (2006)
  • T. Fawcett

    Using rule sets to maximize ROC performance

    Proceedings of the IEEE International Conference on Data Mining, ICDM

    (2001)
  • T. Fawcett

    PRIE: A system for generating rule lists to maximize ROC performance

    Data Min. Knowl. Discov.

    (2008)