
Information Sciences

Volume 593, May 2022, Pages 488-504

Using a small number of training instances in genetic programming for face image classification

https://doi.org/10.1016/j.ins.2022.01.055

Abstract

Classifying faces is a difficult task due to image variations in illumination, occlusion, pose, expression, etc. Building a well-generalised classifier is particularly challenging when the training set is small. This paper proposes a new approach to the classification of face images based on multi-objective genetic programming (MOGP). In MOGP, image descriptors that extract effective features are automatically evolved by optimising two objectives at the same time: classification accuracy and a distance measure. The distance measure is a new measure intended to enhance the generalisation of the learned features and/or classifiers. The performance of MOGP is evaluated on eight face datasets. The results show that MOGP significantly outperforms 17 competitive methods.

Introduction

Face image classification has many important applications in security, criminal detection and surveillance [1]. This task includes facial expression classification and face recognition. Facial expression classification aims to identify different facial expressions from face images, which is an important task for human motion analysis and communication [2]. Face recognition aims to classify face images from different people into groups [3]. Face images are often collected under different environments so that they have different poses, illuminations, occlusions, facial expressions, and other facial details, which makes the task difficult.

Typically, face features extracted from images contain discriminative information, which can make the images easier to classify. Typical methods include the scale-invariant feature transform (SIFT) [4], local binary patterns (LBP) [5], eigenfaces [6], and Fisherfaces [6]. Recent advances in feature learning have enabled automatic characterisation of images by learning effective features rather than manually identifying them [7], [8]. Commonly used methods are convolutional neural networks (CNNs) [9], dictionary learning [10], and genetic programming (GP) [8]. However, learning features from images is challenging because of the large search space and high image variations.

Collecting and labelling large numbers of face images for training is often expensive or difficult due to privacy, security or other concerns. With insufficient training data, it is difficult for learned features and classifiers to generalise well. Popular image classification methods, e.g., neural network (NN)-based algorithms, often require sufficiently large training sets because of their huge numbers of trainable parameters [11], [12]. These methods are often combined with other strategies, such as data augmentation, transfer learning and meta-learning [13], to improve generalisation. However, these strategies are not always effective and rely on strong assumptions. For example, most data augmentation-based methods assume that the newly generated data follow the same distribution as the training data, and transfer learning-based methods assume that the source domains/tasks are similar or related to the target domains/tasks. To this end, this paper aims to solve face image classification using only small training data. Instead of using NNs, which need sufficient data to train, we use GP for this task.

Evolutionary computation (EC) studies algorithms inspired by biological evolution and social intelligence to solve real-world problems [14], [15], [16]. As an EC technique, GP typically evolves variable-length computer programs to solve problems [17]. GP has good global search ability and does not require a differentiable objective function. GP solutions are known for their flexible complexity and high interpretability. There is significant potential for GP to learn general image features [8], [18], [19]. However, there is a lack of investigation into using GP for face image classification with small training data.

Existing GP methods also face the issue of poor generalisation on small training data. In most GP-based methods [20], the fitness function measures the accuracy on the training set. When the training set is small, it may be easy to obtain a perfect training accuracy (fitness value), i.e., 100%, very early in the evolution, but the learned model often generalises poorly. To improve generalisation, this paper develops a new distance measure for GP fitness evaluation, in addition to the classification accuracy measure. Since the relationship between the accuracy and the distance is unknown, multi-objective optimisation algorithms that simultaneously optimise multiple objective functions can be used to handle this. EC techniques are the main approach to multi-objective optimisation and have shown promise in many problems [21].

A multi-objective GP (MOGP) method is proposed in this paper to classify face images on small training sets. The MOGP approach learns facial features by maximising two objectives, namely classification accuracy and a distance measure. The second objective is a new metric based on different distances, aiming to improve the generalisation ability of the learned features and/or classifiers. MOGP searches for multiple Pareto-optimal solutions using the idea of non-dominated sorting. MOGP is tested on eight face image datasets, covering face recognition and facial expression classification, with only a few images per class for training. MOGP is compared with two GP methods and 15 non-GP methods to demonstrate its effectiveness. There are two main contributions:

  • A new distance measure is developed as an objective function to maximise the inter-class distance and minimise the intra-class distance. By performing such an optimisation, the measure enhances the generalisation ability of the learning system when the training data is small.

  • A MOGP algorithm is proposed to automatically learn facial features while maximising classification accuracy and a new distance measure.

The proposed approach can automatically generate a dynamic number of global and/or local features from small-scale images while maximising classification accuracy, maximising the inter-class distance and minimising the intra-class distance, thereby improving generalisation performance. The proposed approach is simple and does not rely on strong assumptions. It can achieve high classification accuracy on different face image classification tasks. Furthermore, it can evolve human-interpretable solutions, showing the process of feature extraction.
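The paper's exact distance measure (Eq. (8)) is not reproduced in this excerpt, so the following is only an illustrative stand-in: a minimal sketch of an objective that rewards large inter-class and small intra-class Euclidean distances over feature vectors.

```python
import numpy as np

def distance_objective(features, labels):
    """Illustrative sketch (not the paper's Eq. (8)): larger values mean
    classes are far apart (inter-class) and internally compact (intra-class)."""
    classes = np.unique(labels)
    centroids = {c: features[labels == c].mean(axis=0) for c in classes}
    # Mean distance of each sample to its own class centroid (intra-class).
    intra = np.mean([np.linalg.norm(x - centroids[y])
                     for x, y in zip(features, labels)])
    # Mean pairwise distance between class centroids (inter-class).
    inter = np.mean([np.linalg.norm(centroids[a] - centroids[b])
                     for i, a in enumerate(classes)
                     for b in classes[i + 1:]])
    # Maximising (inter - intra) separates classes while keeping them compact.
    return inter - intra
```

Any measure with this shape gives the evolutionary search a smoother signal than accuracy alone, which saturates quickly on tiny training sets.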

Section snippets

Multi-objective optimisation

Multi-objective optimisation problems often maximise or minimise multiple (potentially) conflicting objectives at the same time and can be expressed as

$$\text{minimise } F(\mathbf{x}) = \{f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_k(\mathbf{x})\}$$

$$\text{subject to: } g_i(\mathbf{x}) \le 0,\ i = 1, 2, \ldots, m; \quad h_i(\mathbf{x}) = 0,\ i = 1, 2, \ldots, n$$

where $f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_k(\mathbf{x})$ denote the $k$ ($k > 1$) objectives and $\mathbf{x}$ represents the decision variables. $g_i(\mathbf{x})$ and $h_i(\mathbf{x})$ denote two types of constraints, and $m$ and $n$ are the numbers of each type of constraint, respectively.

The Pareto front usually contains many non-dominated solutions that can be
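The dominance relation underlying non-dominated sorting can be sketched as follows; this is a generic illustration (shown for maximisation of all objectives, as in MOGP), not the paper's implementation.

```python
def dominates(a, b):
    """True if solution a Pareto-dominates b when maximising all objectives:
    a is no worse than b in every objective and strictly better in at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def non_dominated(solutions):
    """Return the solutions dominated by no other solution (the first front)."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]
```

For example, with two objectives (accuracy, distance), `non_dominated` keeps every trade-off point not strictly improved upon by another solution.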

The proposed approach

This section presents the new MOGP approach in detail, i.e., the individual representation, the objective functions and the overall algorithm.

Benchmark methods

To demonstrate its effectiveness, we compare MOGP with two GP methods and 15 non-GP methods, which are

  • a single-objective GP method that only optimises the classification accuracy defined as Eq. (5). This method is termed SGP1. The individual representation is the same as MOGP;

  • a single-objective GP method that only optimises the distance measure defined as Eq. (8). This method is termed SGP2. The individual representation is the same as MOGP;

  • four different classification algorithms using raw

Results and discussions

The classification accuracies (%) obtained by MOGP, the two single-objective GP algorithms and the other 15 non-GP methods are listed in Table 4, Table 5. The statistical test is the Wilcoxon rank-sum test (p = 0.05). In Table 4, Table 5, the symbols “+”, “–” and “=” indicate that MOGP performs significantly better than, significantly worse than, or similarly to the corresponding method. The last rows of these tables summarise the results of the significance tests.
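Such a comparison can be run with SciPy's rank-sum test; the accuracy samples below are purely illustrative and are not taken from the paper.

```python
from scipy.stats import ranksums

# Hypothetical per-run accuracies (%) for two methods over repeated runs;
# these numbers are invented for illustration only.
mogp = [92.1, 91.5, 93.0, 92.4, 91.8, 92.7, 93.2, 91.9, 92.5, 92.0]
base = [88.3, 87.9, 89.1, 88.5, 88.0, 88.8, 89.4, 87.6, 88.2, 88.6]

stat, p = ranksums(mogp, base)
if p < 0.05:
    # Significant difference: direction decides "+" (better) or "-" (worse).
    verdict = "+" if sum(mogp) > sum(base) else "-"
else:
    verdict = "="  # no significant difference
print(verdict)
```

The nonparametric rank-sum test is a common choice here because per-run accuracies need not be normally distributed.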

Further analysis

This section further analyses MOGP in terms of approximated Pareto front, the number of learned features, computation time, parameter sensitivity, and evolved programs/trees.

Conclusions

This paper developed a MOGP algorithm that maximises the objectives of classification accuracy and a distance measure for face image classification using a small training set. The effectiveness of MOGP has been evaluated on eight face datasets. The results showed that MOGP outperformed two single-objective GP algorithms and 15 non-GP methods on these datasets. The results demonstrated that MOGP was effective for feature learning from small training data for face image classification.


CRediT authorship contribution statement

Ying Bi: Conceptualization, Methodology, Software, Writing – original draft, Writing – review & editing. Bing Xue: Writing – review & editing, Supervision, Project administration, Funding acquisition. Mengjie Zhang: Writing – review & editing, Supervision, Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported in part by the Marsden Fund of New Zealand Government under Contracts VUW1509 and VUW1615, the Science for Technological Innovation Challenge (SfTI) fund under contract 2019-S7-CRS, the University Research Fund at Victoria University of Wellington Grant No. 216378/3764 and 223805/3986, MBIE Data Science SSIF Fund under the contract RTVU1914, and National Natural Science Foundation of China (NSFC) under Grant 61876169.

References (50)

  • D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. (2004)
  • T. Ahonen et al., Face description with local binary patterns: Application to face recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2006)
  • P.N. Belhumeur et al., Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. (1997)
  • Y. Duan et al., Context-aware local binary feature learning for face recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2017)
  • Y. Bi et al., Genetic Programming for Image Classification: An Automated Approach to Feature Learning (2021)
  • W. Rawat et al., Deep convolutional neural networks for image classification: A comprehensive review, Neural Comput. (2017)
  • G. Zhang, J. Yang, Y. Zheng, Z. Luo, J. Zhang, Optimal discriminative feature and dictionary learning for image set...
  • Y. Bi et al., Dual-tree genetic programming for few-shot image classification, IEEE Trans. Evol. Comput. (2021)
  • Y. Bi et al., Learning and sharing: A multitask genetic programming approach to image feature learning, IEEE Trans. Evol. Comput. (2021)
  • Y. Wang et al., Generalizing from a few examples: A survey on few-shot learning, ACM Comput. Surv. (2020)
  • B. Niu et al., Structure-redesign-based bacterial foraging optimization for portfolio selection
  • J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (1992)
  • H. Al-Sahaf et al., A survey on evolutionary machine learning, J. R. Soc. New Zealand (2019)
  • H. Al-Sahaf et al., Keypoints detection and feature extraction: A dynamic genetic programming approach for evolving rotation-invariant texture image descriptors, IEEE Trans. Evol. Comput. (2017)
  • Y. Bi et al., Genetic programming with a new representation to automatically learn features and evolve ensembles for image classification, IEEE Trans. Cybern. (2021)