Evolutionary feature synthesis for facial expression recognition
Introduction
Automatic facial expression recognition (FER) is desirable for a variety of applications, such as human–computer interaction, human behavior understanding, perceptual user interfaces, and interactive computer games. In an automatic FER system, face detection or localization in a cluttered scene is usually the first step. Next, relevant features must be extracted from the face, and finally the expression can be classified based on the extracted features (Daugman, 1997; Pantic and Rothkrantz, 2000).
Compared to face recognition, there is relatively little research on facial expression recognition. Previous work on automatic facial expression recognition includes studies using representations based on optical flow, principal component analysis, and physically based models. Viola and Jones (2001) use the AdaBoost method to solve computer vision problems such as image retrieval and face detection; AdaBoost selects features in the learning phase using a greedy strategy. However, AdaBoost does not perform well in the small-sample case (Guo and Dyer, 2003), which is the setting of our experiments. Yacoob and Davis (1994) use the inter-frame motion of edges extracted in the area of the mouth, nose, eyes, and eyebrows. Bartlett et al. (1996) use a combination of optical flow and principal components obtained from image differences. Hoey and Little (2000) approximate the flow of each frame with a low-dimensional vector based on a set of orthogonal Zernike polynomials and apply their method to the recognition of facial expressions with hidden Markov models (HMMs). Lyons et al. (1998, 1999), Zhang et al. (1998), and Zhang (1999) use Gabor wavelet coefficients to code facial expressions: they first extract a set of geometric facial points and then use multi-scale, multi-orientation Gabor wavelet filters to compute the Gabor wavelet coefficients at the chosen facial points. Similarly, Wiskott et al. (1997) use a labeled graph, based on the Gabor wavelet transform, to represent facial expression images, and perform face recognition through elastic graph matching.
Facial feature extraction attempts to find the most appropriate representation of face images for recognition, and it is the key step in facial expression recognition. The extracted features capture the characteristics of facial expressions and are fed to a classifier for recognition. The recognition accuracy of an automatic facial expression recognition system is therefore determined by the quality of the feature set used. What are good features? How can we synthesize effective features automatically from the available information? It is difficult to identify a set of features that characterizes a complex set of facial expressions. Typically, many types of features are explored before a recognition system can be built to perform the desired task. Many candidate features are available, and they may be correlated, which makes the design and selection of appropriate features a time-consuming and expensive process.
In conventional methods, human experts design an approach to detect potential features in images based on their knowledge and experience. This approach can often be dissected into primitive operations on the original image or on a set of related feature images obtained from it. Human experts try only a limited number of conventional combinations and explore a very small portion of the feature space, since they are biased by their knowledge and have limited computational capability. Genetic programming (GP), however, may try many unconventional combinations of primitive operations that might never be imagined by a human expert. Although some of these unconventional combinations can be difficult for human experts to explain, in some cases it is precisely these combinations that yield exceptionally good recognition results. In addition, the inherent parallelism of GP and the high speed of current computers allow GP to explore a much larger portion of the search space than human experts can, enhancing the probability of finding an effective composite operator. The search performed by GP is not a random search; it is guided by the fitness of the composite operators in the population. As the search proceeds, GP gradually shifts the population toward the portion of the feature space containing good composite operators. Tan et al. (2003) propose a learning algorithm for fingerprint classification based on GP. Bhanu and Yu use GP for facial expression recognition with a Bayesian classifier (Bhanu et al., 2004). Unlike conventional methods that select visually meaningful features by hand (Lyons et al., 1998, 1999; Zhang et al., 1998; Zhang, 1999; Guo and Dyer, 2003), our approach synthesizes features automatically. For hand-chosen features, the selected points depend heavily on the person and the database; our approach learns features without resorting to a specific database.
Therefore, our approach can be considered fully domain-independent. To the best of our knowledge, unconventional features discovered by the computer have never before been used in facial expression classification.
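The fitness-guided search described above can be sketched as a minimal, self-contained toy. Everything here is an assumption for illustration, not the paper's actual setup: individuals are small arithmetic expression trees, fitness measures error against a toy target function, and the operator set and parameters are invented. The point is the mechanism: tournament selection gradually shifts the population toward fitter individuals. Full GP also applies subtree crossover, omitted here for brevity.

```python
import random

# Toy sketch of fitness-guided evolutionary search over expression trees.
# The operator set, target function, and parameters are illustrative only;
# full GP also exchanges random subtrees between individuals (crossover).
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def evaluate(tree, x):
    """Recursively evaluate an expression tree at input x."""
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def random_tree(depth=3):
    """Grow a random individual (GP-style initialization)."""
    if depth == 0 or random.random() < 0.3:
        return "x" if random.random() < 0.5 else float(random.randint(0, 3))
    op = random.choice(sorted(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def fitness(tree):
    """Negative squared error against a toy target, x**2 + 1."""
    return -sum((evaluate(tree, x) - (x * x + 1)) ** 2 for x in range(-3, 4))

def evolve(pop_size=60, generations=30, seed=1):
    random.seed(seed)
    pop = [random_tree() for _ in range(pop_size)]
    for _ in range(generations):
        # Tournament selection: fitter individuals are copied forward,
        # shifting the population toward good regions of the search space.
        pop = [max(random.sample(pop, 3), key=fitness) for _ in range(pop_size)]
        # Mutation: occasionally replace an individual with a fresh tree.
        pop = [random_tree() if random.random() < 0.1 else t for t in pop]
    return max(pop, key=fitness)

best = evolve()
```

Because selection copies the fittest of each sampled tournament, the search is biased by fitness rather than uniform over the space, which is the property the paragraph above emphasizes.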
Section 2 presents the recognition system and explains the technical details. Experiments and results are presented in Section 3, where we compare our results with other published work. Finally, Section 4 provides the conclusions of this paper.
Technical approach
Genetic programming (GP) is an evolutionary computational paradigm (Koza, 1994; Bhanu et al., 2005) that extends the genetic algorithm and works with a population of individuals. An individual in a population can be any complicated data structure, such as a linked list, tree, or graph. In this paper, individuals are composite operators represented by binary trees with primitive operators as internal nodes and primitive features as leaf nodes. We design different primitive operators,
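The binary-tree representation can be sketched as follows. This is a minimal illustration: the pixel-wise primitive operators and the feature names (`g0`, `g1`) are assumptions for the example, not the paper's actual operator set. Internal nodes apply primitive operators; leaves index into the set of primitive feature images.

```python
# Sketch of a composite operator: a binary tree whose internal nodes are
# primitive operators and whose leaves are primitive feature images.
# The pixel-wise operators and feature names below are illustrative only.
PRIMITIVES = {
    "ADD": lambda a, b: [x + y for x, y in zip(a, b)],
    "SUB": lambda a, b: [x - y for x, y in zip(a, b)],
    "MAX": lambda a, b: [max(x, y) for x, y in zip(a, b)],
}

class CompositeOp:
    """A node in the composite-operator tree: either an internal node
    holding a primitive operator, or a leaf naming a primitive feature."""
    def __init__(self, op=None, left=None, right=None, leaf=None):
        self.op, self.left, self.right, self.leaf = op, left, right, leaf

    def apply(self, features):
        """Evaluate the tree on a dict of primitive feature images
        (each image flattened to a list of floats)."""
        if self.leaf is not None:
            return features[self.leaf]
        a = self.left.apply(features)
        b = self.right.apply(features)
        return PRIMITIVES[self.op](a, b)

# Example: two primitive features, e.g. responses at two Gabor scales.
features = {"g0": [1.0, 2.0, 3.0], "g1": [0.5, 0.5, 0.5]}
op = CompositeOp(op="SUB",
                 left=CompositeOp(leaf="g0"),
                 right=CompositeOp(leaf="g1"))
print(op.apply(features))  # [0.5, 1.5, 2.5]
```

Crossover and mutation in GP then amount to swapping or regrowing subtrees of such structures, which is why the tree representation is convenient for evolutionary search.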
Database
The database we use for our experiments contains 213 images of 10 Japanese women (Lyons et al., 1998). Each person has two to four images for each of the seven expressions: neutral (30 images), happy (31), sad (31), surprise (30), anger (30), disgust (29), and fear (32). Each image is 256 × 256 pixels; we downscale the images to 32 × 32 for computational efficiency. We divide the database randomly into 10 roughly equal-sized parts, from
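The random ten-part division can be sketched as follows. The function name and the dealing scheme are assumptions for illustration; the paper does not specify its exact splitting procedure beyond "randomly into 10 roughly equal-sized parts".

```python
import random

def ten_fold_partition(n_images, n_folds=10, seed=0):
    """Shuffle image indices and deal them into roughly equal folds.
    Illustrative sketch; the paper's exact procedure is unspecified."""
    idx = list(range(n_images))
    random.Random(seed).shuffle(idx)
    # Deal shuffled indices round-robin into n_folds disjoint parts.
    return [idx[k::n_folds] for k in range(n_folds)]

folds = ten_fold_partition(213)  # 213 images -> folds of 21 or 22 images
```

Each image index lands in exactly one fold, so the folds can be rotated through as held-out test sets in the usual cross-validation fashion.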
Conclusion
In this paper, we propose a learning algorithm for facial expression recognition based on GP. The proposed approach learns a feature vector for facial expression recognition without explicit estimation of object pose and without any hand-tuned pre-processing specific to a database. Thus, our approach is automatic and database-independent. Compared to previous work, our experimental results show that GP can find good composite operators. Our GP-based algorithm is effective in extracting feature
References (21)

- Bartlett, M.S., et al., 1996. Classifying facial action. Advances in Neural Information Processing Systems.
- Bhanu, B., et al., 2004. Feature synthesis using genetic programming for face expression recognition. Genetic Evol. Comput. Conf.
- Bhanu, B., et al., 2005. Evolutionary Synthesis of Pattern Recognition Systems.
- Chang, C., Lin, C., 2001. LIBSVM: A library for support vector machines. Available from:...
- Daugman, J., 1997. Face and gesture recognition: An overview. IEEE Trans. Pattern Anal. Machine Intell.
- Guo, G., Dyer, C.R., 2003. Simultaneous feature selection and classifier training via linear programming: A case study for face expression recognition. IEEE Conf. Computer Vision Pattern Recogn.
- Hoey, J., Little, J.J., 2000. Representation and recognition of complex human motion. IEEE Conf. Computer Vision Pattern Recogn.
- Jain, A.K., Farrokhnia, F., 1991. Unsupervised texture segmentation using Gabor filters. Pattern Recognition.
- Koza, J.R., 1994. Genetic Programming II: Automatic Discovery of Reusable Programs.
- Krawiec, K., Bhanu, B., 2005. Evolutionary feature synthesis for visual learning. IEEE Trans. Systems Man Cybernet.