Evolutionary compact embedding for large-scale image classification
Introduction
Dimensionality reduction has been a critical preprocessing step in many fields of information processing and analysis, such as data mining [5], [12], [18], [19], [20], information retrieval [14], [16], and pattern recognition [11], [21], [29], [37]. Recently, with the advances of computer technology and the development of the World Wide Web, a huge amount of digital data, including text, images and videos, is generated, stored, analyzed, and accessed every day. To overcome the shortcomings of text-based image retrieval, content-based image classification and retrieval has attracted substantial attention. The most basic but essential scheme for image classification is the nearest neighbor search: given a query image, find the image most similar to it within a large database and assign that nearest neighbor's label to the query. However, exhaustively searching a dataset of N samples requires O(N) distance computations per query, which is not scalable in practical applications. To overcome this computational bottleneck, many methods have been proposed to index the data for fast query responses, such as the K-D tree and R tree [9]. However, these methods only work well at low dimensionality, typically less than 100 [2], [3]. Moreover, most vision-based applications also suffer from the curse of dimensionality, because visual descriptors usually have hundreds or even thousands of dimensions. Therefore, to make large-scale search or classification practical, methods have been proposed to effectively reduce the dimension of the data while increasing classification speed and accuracy.
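A minimal sketch of the baseline scheme described above, brute-force 1-NN classification, whose O(N) cost per query is exactly what indexing and embedding methods try to avoid (the data here are illustrative):

```python
import numpy as np

def nn_classify(query, database, labels):
    # One distance computation per stored sample: linear in N.
    dists = np.linalg.norm(database - query, axis=1)
    return labels[np.argmin(dists)]

database = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
labels = np.array([0, 0, 1])
print(nn_classify(np.array([4.5, 4.8]), database, labels))  # -> 1
```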
Perhaps the most fundamental dimensionality reduction algorithm is principal component analysis (PCA), which explains the variance–covariance structure of a set of variables through linear combinations of those variables. PCA is most commonly applied to condense the information contained in a large number of original variables into a smaller set of new composite variables or dimensions, while ensuring a minimum loss of information. Another effective scheme for dimensionality reduction is linear discriminant analysis (LDA). LDA is a supervised method that has proven successful on classification problems [4], [8]. Following the Fisher discriminant criterion, its projection vectors are obtained by maximizing the between-class covariance while simultaneously minimizing the within-class covariance. However, classical LDA is a linear method and cannot tackle nonlinear problems. To overcome this limitation, kernel discriminant analysis (KDA) [22] was developed. KDA is the nonlinear extension of LDA via the kernel trick: it implicitly operates in a new feature space, which allows nonlinear mappings to be learned. Beyond these, other dimensionality reduction methods can also achieve promising results in different applications. Locality preserving projections (LPP) [12] are linear projective maps obtained by solving a variational problem that preserves the neighborhood structure of the data set. LPP aims to find optimal linear approximations to the eigenfunctions of the Laplace–Beltrami operator on the manifold. In addition, another popular method, discriminative locality alignment (DLA) [38], has also been used as a dimensionality reduction algorithm for classification.
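The PCA step above can be sketched directly from its definition: project the centered data onto the eigenvectors of the sample covariance matrix with the largest eigenvalues, so that for a given target dimension k the least variance is discarded (data and sizes here are illustrative):

```python
import numpy as np

def pca(X, k):
    Xc = X - X.mean(axis=0)            # center each variable
    cov = Xc.T @ Xc / (len(X) - 1)     # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :k]           # top-k principal directions
    return Xc @ W

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
X[:, 0] *= 10.0                        # one dominant-variance direction
Y = pca(X, 2)                          # 200 samples reduced to 2 dims
```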
All the above methods can be viewed as direct graph embedding, or its linear/kernel/tensor extensions, of a specific intrinsic graph that describes certain desired statistical or geometric properties of a data set, with constraints from scale normalization or a penalty graph [36]. With the need for fast search and classification in large-scale vision applications, recent effort has turned to binary hashing techniques, which perform approximate similarity search based on Hamming distance to effectively reduce indexing time. Among such methods, kernelized locality-sensitive hashing (KLSH) [18] has been successfully utilized for large-scale image retrieval and classification. KLSH is essentially a kernelized method for probabilistic dimension reduction of high-dimensional data. The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability. Beyond that, deep learning methods such as the restricted Boltzmann machine (RBM) [6] have also been used to learn binary codes in an unsupervised manner.
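The bucketing idea behind (K)LSH can be sketched with the classic random-hyperplane family: each bit of the code is the sign of a random projection, so inputs pointing in similar directions fall into the same bucket with high probability (the bit count and data below are illustrative, not KLSH's kernelized construction):

```python
import numpy as np

rng = np.random.default_rng(0)
planes = rng.standard_normal((8, 4))   # 8-bit codes for 4-d inputs

def lsh_code(x, planes):
    # One bit per hyperplane: which side of the plane x lies on.
    return tuple(int(b) for b in (planes @ x > 0))

a = np.array([1.0, 0.9, 1.1, 1.0])
b = 1.5 * a                            # same direction as a
print(lsh_code(a, planes) == lsh_code(b, planes))  # -> True
```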
Although existing dimensionality reduction methods achieve promising results in a variety of applications, they generally rely on complex and advanced mathematical machinery to optimize predefined objective functions. However, for some optimization problems, closed-form solutions cannot always be found. Besides, in large-scale settings, the matrix factorization techniques used in the above methods can impose a heavy computational burden. How to automatically generate better solutions to such optimization problems therefore becomes an interesting topic for real-world vision applications. In this work, we propose evolutionary compact embedding (ECE), which applies genetic programming (GP) in combination with AdaBoost to automatically evolve a dimensionality reduction. ECE is demonstrated to enable accurate and robust large-scale image classification. Fig. 1 shows the workflow of ECE.
Genetic programming (GP) [26] simulates the Darwinian principle of natural selection to solve optimization problems. Unlike hand-crafted techniques based on deep domain knowledge, GP is inspired by natural evolution and can automatically solve problems without prior knowledge of their solutions. Users can apply GP to a wide range of practical problems, producing human-competitive results and even patentable inventions. Relying on natural, random processes, GP can escape traps that may capture deterministic methods. Because of this, GP is not limited to any particular research domain and yields relatively general solutions for target tasks. In GP, a group of primitive operators is first used to randomly assemble computational programs, which form the initial population. This population then 'evolves' (using crossover and mutation) through reproduction, with single parents or parent pairs chosen stochastically but biased according to their fitness on the task at hand. In this way, the overall fitness of the population tends to improve over time. Finally, the individual that achieves the best performance is taken as the final solution. A typical GP procedure is summarized in Algorithm 1.
Algorithm 1. Genetic Programming
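The evolve-and-select loop just described can be sketched with a toy symbolic-regression run. This is a mutation-only sketch (no crossover, for brevity); the operator set, population size, and target function are illustrative choices, not the paper's settings:

```python
import random

OPS = {'+': lambda a, b: a + b, '-': lambda a, b: a - b, '*': lambda a, b: a * b}

def random_tree(depth=3):
    # Programs are nested tuples built from a small primitive operator set.
    if depth == 0 or random.random() < 0.3:
        return random.choice(['x', 1.0])
    return (random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == 'x':
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree):
    # Squared error against the toy target f(x) = x*x + x (lower is better).
    total = 0.0
    for x in range(-5, 6):
        d = evaluate(tree, x) - (x * x + x)
        total += d * d
    return total

def mutate(tree):
    # Replace a random subtree with a freshly generated one.
    if random.random() < 0.2 or not isinstance(tree, tuple):
        return random_tree(2)
    op, left, right = tree
    return (op, mutate(left), mutate(right))

random.seed(0)
pop = [random_tree() for _ in range(200)]
initial_best = min(fitness(t) for t in pop)
for generation in range(30):
    pop.sort(key=fitness)                                   # selection biased by fitness
    pop = pop[:50] + [mutate(random.choice(pop[:50])) for _ in range(150)]
best = min(pop, key=fitness)
```

Because the top individuals are carried over unchanged each generation (elitism), the best fitness in the population can only improve or stay the same over time.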
Aiming at the task of dimensionality reduction, we combine GP with a boosting trick to obtain a novel embedding method. For an M-bit embedding, GP is used to iteratively generate the best-performing weighted binary classifier for each bit by jointly minimizing its empirical risk with the Gentle AdaBoost strategy [7] on a training set. This embedding scheme reduces the Hamming distance between data from the same class while increasing the Hamming distance between data from different classes. The final reduced representation is the code computed by the non-linear GP-evolved binary learner for each embedding bit. To the best of our knowledge, this is the first time GP with the boosting trick has been successfully applied to feature embedding for large-scale image classification.
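The per-bit boosting loop can be sketched as follows. A simple weighted decision stump stands in for the paper's GP-evolved binary classifier, and an exponential AdaBoost-style reweighting stands in for the Gentle AdaBoost update; data and bit count are illustrative:

```python
import numpy as np

def stump(X, y, w):
    # Weighted decision stump: an illustrative substitute for the
    # GP-evolved binary classifier used in the paper.
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] > t, s, -s)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    _, j, t, s = best
    return lambda Z: np.where(Z[:, j] > t, s, -s)

def learn_embedding_bits(X, y, learn_bit, M=4):
    # One weighted binary learner per embedding bit; samples the current
    # bit misclassifies are up-weighted before the next bit is learned.
    w = np.full(len(X), 1.0 / len(X))
    bits = []
    for _ in range(M):
        f = learn_bit(X, y, w)
        w *= np.exp(-y * f(X))     # boosting-style reweighting
        w /= w.sum()
        bits.append(f)
    return bits

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, 1, 1])
bits = learn_embedding_bits(X, y, stump)
codes = np.array([f(X) for f in bits]).T   # one M-bit code per sample
```

On this toy data, samples of the same class end up with identical codes (Hamming distance 0), while the two classes differ in every bit.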
The remainder of this paper is organized as follows. In Section 2, related work is reviewed. The architecture of ECE and the implementation details are presented in Sections 3 (Evolutionary compact embedding) and 4 (Improved ECE implementation for large-scale applications). Experiments and results are described in Section 5. In Section 6, we conclude this paper and outline possible future work.
Related work
Recently, some techniques have been successfully used for feature embedding based on boosting schemes. One of the most closely related works is boosted similarity sensitive coding (BSSC) [27], which is designed to learn an M-bit weighted Hamming embedding for task-specific similarity search:

H(x) = (α₁h₁(x), …, α_M h_M(x)),

so that the distance between any two samples x and y is given by a weighted Hamming distance:

d(x, y) = Σ_{m=1}^{M} α_m |h_m(x) − h_m(y)|,

where the weights α_m and the binary functions h_m(·) are learned by boosting.
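A weighted Hamming distance of this form, d(x, y) = Σ_m α_m |h_m(x) − h_m(y)|, is straightforward to compute; the codes and per-bit weights below are made up for illustration:

```python
import numpy as np

def weighted_hamming(hx, hy, alpha):
    # Sum the weights of the bits on which the two codes disagree.
    return float(np.sum(alpha * (hx != hy)))

alpha = np.array([0.5, 0.3, 0.2])   # per-bit boosting weights (illustrative)
hx = np.array([1, 0, 1])
hy = np.array([1, 1, 0])
print(weighted_hamming(hx, hy, alpha))  # -> 0.5 (second and third bits differ)
```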
Evolutionary compact embedding
In this section, the overall design of our evolutionary embedding algorithm is first introduced and then we describe how to train our GP classifier with the boosting trick.
Improved ECE implementation for large-scale applications
Our ECE method can theoretically reduce data of any dimension to a lower-dimensional compact code. However, the GP algorithm is time-consuming to train on large-scale datasets, especially when the dimensionality of the original data is high.
To reduce the GP optimization complexity, we improve our ECE algorithm using the random batch parallel learning (RBPL) technique. Given a training set {x_i} with labels {y_i}, we randomly assemble them into N pairs
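One plausible reading of this pairing step, sketched below; the disjoint-pair layout and parallel-worker framing are assumptions for illustration, not necessarily the paper's exact RBPL scheme:

```python
import numpy as np

rng = np.random.default_rng(42)

def random_pairs(n_samples):
    # Shuffle the sample indices and split them into disjoint pairs,
    # so that independent workers can process batches in parallel.
    idx = rng.permutation(n_samples)
    return idx.reshape(-1, 2)

pairs = random_pairs(10)   # 5 disjoint pairs covering all 10 samples
```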
Experiments and results
In this section, we systematically evaluate our proposed ECE against other popular dimensionality reduction algorithms on different datasets; the relevant experimental results are compared and discussed in the following subsections.
Conclusion
In this paper, we have presented a novel framework to learn highly discriminative embedding codes for dimensionality reduction, termed evolutionary compact embedding (ECE). We cast it as an optimization problem combining genetic programming (GP) with a boosting-based weight-updating trick. For each bit of ECE, the proposed learning scheme evolves a binary classification function through GP and re-weights the training samples for the next bit, jointly minimizing its empirical risk with the Gentle AdaBoost strategy [7].
Acknowledgements
This work is supported by the University of Sheffield, the National Natural Science Foundation of China (Grant No: 61125106), and the Key Research Program of the Chinese Academy of Sciences (Grant No. KGZD-EW-T03).
References (40)
- et al., Subspace projection: a unified framework for a class of partition-based dimension reduction techniques, Inform. Sci. (2009)
- et al., Fast dimension reduction for document classification based on imprecise spectrum analysis, Inform. Sci. (2013)
- et al., Content-based retrieval of human actions from realistic video databases, Inform. Sci. (2013)
- et al., Adaptive embedding techniques for VQ-compressed images, Inform. Sci. (2009)
- et al., Feature extraction using a fast null space based linear discriminant analysis algorithm, Inform. Sci. (2012)
- et al., Simultaneous feature selection and classification using kernel-penalized support vector machines, Inform. Sci. (2011)
- et al., Geometric and photometric invariant distinctive regions detection, Inform. Sci. (2007)
- et al., Feature selection for multi-label naive Bayes classification, Inform. Sci. (2009)
- U. Bhowan, M. Zhang, M. Johnston, Genetic programming for image classification with unbalanced data, in: International...
- et al., Speed up kernel discriminant analysis, VLDB (2011)
- Fast and accurate text classification via multiple linear discriminant projections, VLDB J.
- Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors), Ann. Stat.
- Multidimensional access methods, ACM Comput. Surv. (CSUR)
- A database for handwritten text recognition research, IEEE Trans. Pattern Anal. Mach. Intell.
- Boosted geometric hashing based indexing technique for finger-knuckle-print database, Inform. Sci.