Evolutionary feature synthesis for facial expression recognition

https://doi.org/10.1016/j.patrec.2005.07.026

Abstract

Feature extraction is one of the key steps in object recognition. In this paper we propose a novel genetically inspired learning method for facial expression recognition (FER). Unlike current research on facial expression recognition, which generally selects visually meaningful features by hand, our learning method discovers features automatically through a genetic programming-based approach that uses a Gabor wavelet representation for primitive features and linear/nonlinear operators to synthesize new features. These new features are used to train a support vector machine classifier for recognizing facial expressions. The learned operator and classifier are then applied to unseen test images. To exploit the random nature of a genetic program, we design a multi-agent scheme to boost performance. We compare our approach with several approaches in the literature and show that it performs the task of facial expression recognition effectively.

Introduction

Automatic facial expression recognition (FER) is desirable for a variety of applications such as human–computer interaction, human behavior understanding, perceptual user interfaces, and interactive computer games. In an automatic FER system, face detection or localization in a cluttered scene is usually the first step. Next, relevant features must be extracted from the face, and finally the expression can be classified based on the extracted features (Daugman, 1997, Pantic and Rothkrantz, 2000).

Compared to face recognition, there is relatively little research on facial expression recognition. Previous work on automatic facial expression recognition includes studies using representations based on optical flow, principal component analysis, and physically based models. Viola and Jones (2001) use the AdaBoost method to solve computer vision problems such as image retrieval and face detection; AdaBoost can select features during the learning phase using a greedy strategy, but it does not perform well in the small-sample case (Guo and Dyer, 2003), which is the setting of our experiments. Yacoob and Davis (1994) use the inter-frame motion of edges extracted in the area of the mouth, nose, eyes, and eyebrows. Bartlett et al. (1996) use a combination of optical flow and principal components obtained from image differences. Hoey and Little (2000) approximate the flow of each frame with a low-dimensional vector based on a set of orthogonal Zernike polynomials and apply their method to the recognition of facial expressions with hidden Markov models (HMMs). Lyons et al., 1998, Lyons et al., 1999, Zhang et al., 1998, Zhang, 1999 use Gabor wavelet coefficients to code facial expressions: they first extract a set of geometric facial points and then apply multi-scale, multi-orientation Gabor wavelet filters to compute the coefficients at the chosen points. Similarly, Wiskott et al. (1997) use a labeled graph, based on the Gabor wavelet transform, to represent facial expression images, and perform face recognition through elastic graph matching.

Facial feature extraction attempts to find the most appropriate representation of face images for recognition, and it is the key step in facial expression recognition. The extracted features capture the characteristics of facial expressions and are fed to a classifier for recognition. The recognition accuracy of an automatic facial expression recognition system is therefore determined by the quality of the feature set used. What are good features? How can we synthesize effective features automatically from the available information? It is difficult to identify a set of features that characterizes a complex set of facial expressions. Typically, many types of features must be explored before a recognition system can perform the desired task. Many candidate features are available, and they may be correlated, which makes the design and selection of appropriate features a time-consuming and expensive process.

In conventional methods, human experts design an approach to detect potential features in images based on their knowledge and experience. This approach can often be dissected into primitive operations on the original image or on a set of related feature images derived from it. Human experts try only a limited number of conventional combinations and explore a very small portion of the feature space, since they are biased by their knowledge and have limited computational capacity. Genetic programming (GP), on the other hand, may try many unconventional combinations of primitive operations that a human expert might never imagine. Although some of these unconventional combinations can be difficult for human experts to explain, in some cases it is precisely these combinations that yield exceptionally good recognition results. In addition, the inherent parallelism of GP and the high speed of current computers allow GP to explore a much larger portion of the search space than human experts can, increasing the probability of finding an effective composite operator. The search performed by GP is not random: it is guided by the fitness of the composite operators in the population, and as the search proceeds, GP gradually shifts the population toward the portion of the feature space containing good composite operators.

Tan et al. (2003) propose a GP-based learning algorithm for fingerprint classification. Bhanu and Yu use GP for facial expression recognition with a Bayesian classifier (Bhanu et al., 2004). Unlike conventional methods that select visually meaningful features by hand (Lyons et al., 1998, Lyons et al., 1999, Zhang et al., 1998, Zhang, 1999, Guo and Dyer, 2003), our approach synthesizes features automatically. When features are chosen by hand, the selected points depend heavily on the person and the database; our proposed approach learns features without resorting to a specific database. Therefore, our approach can be considered fully domain-independent. To the best of our knowledge, unconventional features discovered by the computer have never before been used in facial expression classification.
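The fitness-guided search described above can be illustrated with a minimal GP loop. This is a hedged sketch, not the authors' implementation: the operator set, the tree generator, and especially the fitness function (a stand-in that rewards trees using many distinct primitive features, where the real system would score classifier accuracy on the synthesized features) are all illustrative assumptions.

```python
import random

random.seed(0)
OPS, LEAVES = ["ADD", "SUB"], ["f0", "f1", "f2"]  # assumed toy primitives

def random_tree(depth=2):
    """Grow a random composite-operator tree of bounded depth."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(LEAVES)
    return (random.choice(OPS), random_tree(depth - 1), random_tree(depth - 1))

def fitness(tree):
    # Stand-in fitness: count distinct primitive features used by the tree.
    # In the paper, fitness would come from recognition performance.
    flat = str(tree)
    return sum(leaf in flat for leaf in LEAVES)

def mutate(tree):
    # Simplest possible mutation: replace the individual with a fresh tree.
    return random_tree(depth=2)

# Evolve: keep the fitter half each generation, refill with mutants of it.
population = [random_tree() for _ in range(20)]
for generation in range(10):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]
    population = survivors + [mutate(t) for t in survivors]

best = max(population, key=fitness)
```

Even this crude loop shows the mechanism the text describes: selection concentrates the population in high-fitness regions of the operator space, while mutation keeps proposing unconventional combinations.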

Section 2 presents the recognition system and explains the technical details. Experiments and results are presented in Section 3, where we compare our results with the other published work. Finally, Section 4 provides the conclusions of this paper.

Section snippets

Technical approach

Genetic programming (GP) is an evolutionary computational paradigm (Koza, 1994, Bhanu et al., 2005) that extends the genetic algorithm and works with a population of individuals. An individual in a population can be any complicated data structure, such as a linked list, tree, or graph. In this paper, individuals are composite operators represented by binary trees with primitive operators as internal nodes and primitive features as leaf nodes. We design different primitive operators,
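The binary-tree representation of a composite operator can be sketched as follows. This is a minimal illustration under assumed names: the operator set and the toy "Gabor response" arrays are hypothetical stand-ins for the paper's primitive operators and primitive features.

```python
import numpy as np

# Assumed toy operator set: element-wise linear/nonlinear operations,
# standing in for the paper's primitive operators.
PRIMITIVE_OPS = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "MUL": lambda a, b: a * b,   # nonlinear, element-wise
    "MAX": np.maximum,
}

def evaluate(node, primitives):
    """Recursively evaluate a composite-operator tree.

    `node` is either ("leaf", index), selecting a primitive feature image,
    or (op_name, left_subtree, right_subtree) for an internal node.
    """
    if node[0] == "leaf":
        return primitives[node[1]]
    op, left, right = node
    return PRIMITIVE_OPS[op](evaluate(left, primitives),
                             evaluate(right, primitives))

# Two toy 2x2 arrays standing in for Gabor-filtered primitive features.
primitives = [np.array([[1.0, 2.0], [3.0, 4.0]]),
              np.array([[0.5, 0.5], [0.5, 0.5]])]

# Tree for MAX(ADD(f0, f1), f0): internal nodes are operators, leaves are
# primitive features, exactly as in the representation described above.
tree = ("MAX", ("ADD", ("leaf", 0), ("leaf", 1)), ("leaf", 0))
feature_image = evaluate(tree, primitives)
```

The output of the root node is the synthesized feature, which would then be reduced to a feature vector and passed to the classifier.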

Database

The database we use for our experiments contains 213 images of 10 Japanese women (Lyons et al., 1998). Each person has two to four images for each of the seven expressions: neutral (30 images), happy (31 images), sad (31 images), surprise (30 images), anger (30 images), disgust (29 images), and fear (32 images). The size of each image is 256 × 256 pixels; the images are downscaled to 32 × 32 for computational efficiency. We divide the database randomly into 10 roughly equal-sized parts, from
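The randomized ten-fold partition described above can be sketched as follows. Integer indices stand in for the 213 images; the shuffle-then-round-robin split is an assumption about the exact protocol, not the authors' code, but it yields the stated roughly equal parts (folds of 21 or 22 images).

```python
import random

random.seed(42)  # assumed seed, for reproducibility of the sketch
indices = list(range(213))  # one index per image in the database
random.shuffle(indices)

# Round-robin assignment after shuffling: 10 folds of size 21 or 22.
folds = [indices[i::10] for i in range(10)]

# Leave-one-fold-out evaluation: train on 9 parts, test on the held-out part.
test_fold = folds[0]
train_set = [i for f in folds[1:] for i in f]
```

The synthesized features for the training indices would then be used to fit the SVM classifier, with the held-out fold reserved for testing, rotating the held-out fold across all ten parts.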

Conclusion

In this paper, we propose a GP-based learning algorithm for facial expression recognition. The proposed approach learns feature vectors for facial expression recognition without explicit estimation of object pose and without any hand-tuned preprocessing specific to a database. Thus, our approach is automatic and database-independent. Compared to previous work, our experimental results show that GP can find good composite operators. Our GP-based algorithm is effective in extracting feature

References (21)

  • A.K. Jain et al., Unsupervised texture segmentation using Gabor filters, Pattern Recognition (1991)
  • M. Bartlett et al., Classifying facial action
  • B. Bhanu et al., Feature synthesis using genetic programming for face expression recognition, Genetic Evol. Comput. Conf. (2004)
  • B. Bhanu et al., Evolutionary Synthesis of Pattern Recognition Systems (2005)
  • C. Chang, C. Lin, LIBSVM: A library for support vector machines (2001). Available from:...
  • J. Daugman, Face and gesture recognition: An overview, IEEE Trans. Pattern Anal. Machine Intell. (1997)
  • G.D. Guo et al., Simultaneous feature selection and classifier training via linear programming: A case study for face expression recognition, IEEE Conf. Computer Vision Pattern Recogn. (2003)
  • J. Hoey et al., Representation and recognition of complex human motion, IEEE Conf. Computer Vision Pattern Recogn. (2000)
  • J. Koza, Genetic Programming II: Automatic Discovery of Reusable Programs (1994)
  • K. Krawiec et al., Evolutionary feature synthesis for visual learning, IEEE Trans. Systems Man Cybernet. (2005)

Cited by (66)

  • MPMFFT based DCA-DBT integrated probabilistic model for face expression classification

    2020, Journal of King Saud University - Computer and Information Sciences
  • Multi-Objective Differential Evolution for feature selection in Facial Expression Recognition systems

    2017, Expert Systems with Applications
    Citation excerpt:

    It has been pointed out that any non-exhaustive selection method does not guarantee to find the optimal feature subset, but rather provides a satisfactory local optimum (Peng, Long, & Ding, 2005). Plenty of papers related to feature selection for FER in combination with EAs have been published to date, e.g., Yu and Bhanu (2006), Zavaschi, Britto, Oliveira, and Koerich (2013), Olague, Hammoud, Trujillo, Hernández, and Romero (2009), Soyel, Tekguc, and Demirel (2011) and Lajevardi and Hussain (2012). Cited methods solve the feature selection as single-objective problem, where the selected features depend heavily on a classifier’s accuracy.

  • Intelligent facial emotion recognition using a layered encoding cascade optimization model

    2015, Applied Soft Computing Journal
    Citation excerpt:

    Minimum reconstruction errors on the combined manifolds were used to guide the classification process. Yu and Bhanu [38] proposed Genetic Programming (GP) combining with Gabor wavelet representations to synthesize facial expression features. Their work employed a GP-based method and linear/nonlinear operators for feature synthesis based on the primitive Gabor wavelet features.

  • Subspace learning for facial expression recognition: An overview and a new perspective

    2021, APSIPA Transactions on Signal and Information Processing