Evolutionary compact embedding for large-scale image classification
Introduction
Dimensionality reduction has been a critical preprocessing step in many fields of information processing and analysis, such as data mining [5], [12], [18], [19], [20], information retrieval [14], [16], and pattern recognition [11], [21], [29], [37]. Recently, with the advances of computer technology and the development of the World Wide Web, a huge amount of digital data, including text, images and videos, is generated, stored, analyzed, and accessed every day. To overcome the shortcomings of text-based image retrieval, content-based image classification and retrieval has attracted substantial attention. The most basic but essential scheme for image classification is the nearest neighbor search: given a query image, find the image most similar to it within a large database and assign that nearest neighbor's label to the query. However, exhaustively searching a dataset of N samples requires O(N) distance computations per query, which is not scalable in practical applications. To overcome this computational bottleneck, many methods have been proposed to index the data for fast query responses, such as the K-D tree and R tree [9]. However, these methods only work well at low dimensionality, typically less than 100 [2], [3]. Moreover, most vision-based applications also suffer from the curse of dimensionality, because visual descriptors usually have hundreds or even thousands of dimensions. Therefore, to make large-scale search or classification practical, methods have been proposed to effectively reduce the dimension of the data while increasing classification speed and accuracy.
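A minimal sketch of the baseline scheme described above, brute-force 1-NN classification, whose O(N) cost per query is exactly what indexing and embedding methods try to avoid (the data here are illustrative):

```python
import numpy as np

def nn_classify(query, database, labels):
    # One distance computation per stored sample: linear in N.
    dists = np.linalg.norm(database - query, axis=1)
    return labels[np.argmin(dists)]

database = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
labels = np.array([0, 0, 1])
print(nn_classify(np.array([4.5, 4.8]), database, labels))  # -> 1
```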
Perhaps the most fundamental dimensionality reduction algorithm is principal component analysis (PCA), which explains the variance–covariance structure of a set of variables through linear combinations of those variables. PCA is most commonly applied to condense the information contained in a large number of original variables into a smaller set of new composite variables or dimensions, while ensuring a minimum loss of information. Another effective scheme for dimensionality reduction is linear discriminant analysis (LDA). LDA is a supervised method that has proven successful on classification problems [4], [8]. Following the Fisher discriminant criterion, its projection vectors are obtained by maximizing the between-class covariance while simultaneously minimizing the within-class covariance. However, classical LDA is a linear method and cannot tackle nonlinear problems. To overcome this limitation, kernel discriminant analysis (KDA) [22] was developed. KDA is the nonlinear extension of LDA via the kernel trick: it implicitly operates in a new feature space, which allows nonlinear mappings to be learned. Beyond these, other dimensionality reduction methods can also achieve promising results in different applications. Locality preserving projections (LPP) [12] are linear projective maps obtained by solving a variational problem that preserves the neighborhood structure of the data set. LPP aims to find optimal linear approximations to the eigenfunctions of the Laplace–Beltrami operator on the manifold. In addition, another popular method, discriminative locality alignment (DLA) [38], has also been used as a dimensionality reduction algorithm for classification.
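The PCA step above can be sketched directly from its definition: project the centered data onto the eigenvectors of the sample covariance matrix with the largest eigenvalues, so that for a given target dimension k the least variance is discarded (data and sizes here are illustrative):

```python
import numpy as np

def pca(X, k):
    Xc = X - X.mean(axis=0)            # center each variable
    cov = Xc.T @ Xc / (len(X) - 1)     # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :k]           # top-k principal directions
    return Xc @ W

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
X[:, 0] *= 10.0                        # one dominant-variance direction
Y = pca(X, 2)                          # 200 samples reduced to 2 dims
```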
All the above methods can be viewed as direct graph embedding, or its linear/kernel/tensor extensions, of a specific intrinsic graph that describes certain desired statistical or geometric properties of a data set, with constraints from scale normalization or a penalty graph [36]. With the need for fast search and classification in large-scale vision applications, recent effort has turned to binary hashing techniques, which perform approximate similarity search based on Hamming distance to effectively reduce indexing time. Among such methods, kernelized locality-sensitive hashing (KLSH) [18] has been successfully utilized for large-scale image retrieval and classification. KLSH is essentially a kernelized method for probabilistic dimension reduction of high-dimensional data. The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability. Beyond that, deep learning methods such as the restricted Boltzmann machine (RBM) [6] have also been used to learn binary codes in an unsupervised manner.
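The bucketing idea behind (K)LSH can be sketched with the classic random-hyperplane family: each bit of the code is the sign of a random projection, so inputs pointing in similar directions fall into the same bucket with high probability (the bit count and data below are illustrative, not KLSH's kernelized construction):

```python
import numpy as np

rng = np.random.default_rng(0)
planes = rng.standard_normal((8, 4))   # 8-bit codes for 4-d inputs

def lsh_code(x, planes):
    # One bit per hyperplane: which side of the plane x lies on.
    return tuple(int(b) for b in (planes @ x > 0))

a = np.array([1.0, 0.9, 1.1, 1.0])
b = 1.5 * a                            # same direction as a
print(lsh_code(a, planes) == lsh_code(b, planes))  # -> True
```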
Although existing dimensionality reduction methods achieve promising results in a variety of applications, they generally rely on complex and advanced mathematical machinery to optimize predefined objective functions. However, for some optimization problems, closed-form solutions cannot always be found. Besides, in large-scale settings, the matrix factorization techniques used in the above methods can impose a heavy computational burden. How to automatically generate better solutions to such optimization problems therefore becomes an interesting topic for real-world vision applications. In this work, we propose evolutionary compact embedding (ECE), which applies genetic programming (GP) in combination with AdaBoost to automatically evolve a dimensionality reduction. ECE is demonstrated to enable accurate and robust large-scale image classification. Fig. 1 shows the workflow of ECE.
Genetic programming (GP) [26] simulates the Darwinian principle of natural selection to solve optimization problems. Unlike hand-crafted techniques based on deep domain knowledge, GP is inspired by natural evolution and can automatically solve problems without prior knowledge of their solutions. Users can apply GP to a wide range of practical problems, producing human-competitive results and even patentable inventions. Relying on natural, random processes, GP can escape traps that may capture deterministic methods. Because of this, GP is not limited to any particular research domain and yields relatively general solutions for target tasks. In GP, a group of primitive operators is first used to randomly assemble computational programs, which form the initial population. This population then 'evolves' (using crossover and mutation) through reproduction, with single parents or parent pairs chosen stochastically but biased according to their fitness on the task at hand. In this way, the overall fitness of the population tends to improve over time. Finally, the individual that achieves the best performance is taken as the final solution. A typical GP procedure is summarized in Algorithm 1.
Algorithm 1. Genetic Programming
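The evolve-and-select loop just described can be sketched with a toy symbolic-regression run. This is a mutation-only sketch (no crossover, for brevity); the operator set, population size, and target function are illustrative choices, not the paper's settings:

```python
import random

OPS = {'+': lambda a, b: a + b, '-': lambda a, b: a - b, '*': lambda a, b: a * b}

def random_tree(depth=3):
    # Programs are nested tuples built from a small primitive operator set.
    if depth == 0 or random.random() < 0.3:
        return random.choice(['x', 1.0])
    return (random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == 'x':
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree):
    # Squared error against the toy target f(x) = x*x + x (lower is better).
    total = 0.0
    for x in range(-5, 6):
        d = evaluate(tree, x) - (x * x + x)
        total += d * d
    return total

def mutate(tree):
    # Replace a random subtree with a freshly generated one.
    if random.random() < 0.2 or not isinstance(tree, tuple):
        return random_tree(2)
    op, left, right = tree
    return (op, mutate(left), mutate(right))

random.seed(0)
pop = [random_tree() for _ in range(200)]
initial_best = min(fitness(t) for t in pop)
for generation in range(30):
    pop.sort(key=fitness)                                   # selection biased by fitness
    pop = pop[:50] + [mutate(random.choice(pop[:50])) for _ in range(150)]
best = min(pop, key=fitness)
```

Because the top individuals are carried over unchanged each generation (elitism), the best fitness in the population can only improve or stay the same over time.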
Aiming at the task of dimensionality reduction, we combine GP with a boosting trick to obtain a novel embedding method. For an M-bit embedding, GP is used to iteratively generate the best-performing weighted binary classifier for each bit by jointly minimizing its empirical risk with the Gentle AdaBoost strategy [7] on a training set. This embedding scheme reduces the Hamming distance between data from the same class while increasing the Hamming distance between data from different classes. The final reduced representation is the code computed by the non-linear GP-evolved binary learner for each embedding bit. To the best of our knowledge, this is the first time GP with the boosting trick has been successfully applied to feature embedding for large-scale image classification.
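The per-bit boosting loop can be sketched as follows. A simple weighted decision stump stands in for the paper's GP-evolved binary classifier, and an exponential AdaBoost-style reweighting stands in for the Gentle AdaBoost update; data and bit count are illustrative:

```python
import numpy as np

def stump(X, y, w):
    # Weighted decision stump: an illustrative substitute for the
    # GP-evolved binary classifier used in the paper.
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] > t, s, -s)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    _, j, t, s = best
    return lambda Z: np.where(Z[:, j] > t, s, -s)

def learn_embedding_bits(X, y, learn_bit, M=4):
    # One weighted binary learner per embedding bit; samples the current
    # bit misclassifies are up-weighted before the next bit is learned.
    w = np.full(len(X), 1.0 / len(X))
    bits = []
    for _ in range(M):
        f = learn_bit(X, y, w)
        w *= np.exp(-y * f(X))     # boosting-style reweighting
        w /= w.sum()
        bits.append(f)
    return bits

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, 1, 1])
bits = learn_embedding_bits(X, y, stump)
codes = np.array([f(X) for f in bits]).T   # one M-bit code per sample
```

On this toy data, samples of the same class end up with identical codes (Hamming distance 0), while the two classes differ in every bit.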
The remainder of this paper is organized as follows. In Section 2, related work is reviewed. The architecture of ECE and the implementation details are presented in Sections 3 (Evolutionary compact embedding) and 4 (Improved ECE implementation for large-scale applications). Experiments and results are described in Section 5. In Section 6, we conclude this paper and outline possible future work.
Related work
Recently, some techniques have been successfully used for feature embedding based on boosting schemes. One of the most closely related works is boosted similarity sensitive coding (BSSC) [27], which is designed to learn an M-bit weighted Hamming embedding for task-specific similarity search:

H(x) = (α₁h₁(x), …, α_M h_M(x)),

so that the distance between any two samples x and y is given by a weighted Hamming distance:

d(x, y) = Σ_{m=1}^{M} α_m |h_m(x) − h_m(y)|,

where the weights α_m and the binary functions h_m(·) are learned by boosting.
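A weighted Hamming distance of this form, d(x, y) = Σ_m α_m |h_m(x) − h_m(y)|, is straightforward to compute; the codes and per-bit weights below are made up for illustration:

```python
import numpy as np

def weighted_hamming(hx, hy, alpha):
    # Sum the weights of the bits on which the two codes disagree.
    return float(np.sum(alpha * (hx != hy)))

alpha = np.array([0.5, 0.3, 0.2])   # per-bit boosting weights (illustrative)
hx = np.array([1, 0, 1])
hy = np.array([1, 1, 0])
print(weighted_hamming(hx, hy, alpha))  # -> 0.5 (second and third bits differ)
```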
Evolutionary compact embedding
In this section, the overall design of our evolutionary embedding algorithm is first introduced and then we describe how to train our GP classifier with the boosting trick.
Improved ECE implementation for large-scale applications
Our ECE method can theoretically reduce data of any dimension to a lower-dimensional compact code. However, the GP algorithm is time-consuming to train on large-scale datasets, especially when the dimensionality of the original data is high.
To reduce the GP optimization complexity, we improve our ECE algorithm using the random batch parallel learning (RBPL) technique. Given a training set {x_i} with labels {y_i}, we randomly assemble them into N pairs
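One plausible reading of this pairing step, sketched below; the disjoint-pair layout and parallel-worker framing are assumptions for illustration, not necessarily the paper's exact RBPL scheme:

```python
import numpy as np

rng = np.random.default_rng(42)

def random_pairs(n_samples):
    # Shuffle the sample indices and split them into disjoint pairs,
    # so that independent workers can process batches in parallel.
    idx = rng.permutation(n_samples)
    return idx.reshape(-1, 2)

pairs = random_pairs(10)   # 5 disjoint pairs covering all 10 samples
```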
Experiments and results
In this section, we systematically evaluate our proposed ECE against other popular dimensionality reduction algorithms on different datasets; the relevant experimental results are compared and discussed in the following subsections.
Conclusion
In this paper, we have presented a novel framework to learn highly discriminative embedding codes for dimensionality reduction, termed evolutionary compact embedding (ECE). We cast it as an optimization problem combining genetic programming (GP) with a boosting-based weight-updating trick. For each bit of ECE, the proposed learning scheme evolves a binary classification function through GP and re-weights the training samples for the next bit, jointly minimizing its empirical risk with the Gentle AdaBoost strategy [7].
Acknowledgements
This work is supported by the University of Sheffield, the National Natural Science Foundation of China (Grant No: 61125106), and the Key Research Program of the Chinese Academy of Sciences (Grant No. KGZD-EW-T03).
References (40)
- et al., Subspace projection: a unified framework for a class of partition-based dimension reduction techniques, Inform. Sci. (2009)
- et al., Fast dimension reduction for document classification based on imprecise spectrum analysis, Inform. Sci. (2013)
- et al., Content-based retrieval of human actions from realistic video databases, Inform. Sci. (2013)
- et al., Adaptive embedding techniques for VQ-compressed images, Inform. Sci. (2009)
- et al., Feature extraction using a fast null space based linear discriminant analysis algorithm, Inform. Sci. (2012)
- et al., Simultaneous feature selection and classification using kernel-penalized support vector machines, Inform. Sci. (2011)
- et al., Geometric and photometric invariant distinctive regions detection, Inform. Sci. (2007)
- et al., Feature selection for multi-label naive Bayes classification, Inform. Sci. (2009)
- U. Bhowan, M. Zhang, M. Johnston, Genetic programming for image classification with unbalanced data, in: International...
- et al., Speed up kernel discriminant analysis, VLDB (2011)
- Fast and accurate text classification via multiple linear discriminant projections, VLDB J.
- Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors), Ann. Stat.
- Multidimensional access methods, ACM Comput. Surv. (CSUR)
- A database for handwritten text recognition research, IEEE Trans. Pattern Anal. Mach. Intell.
- Boosted geometric hashing based indexing technique for finger-knuckle-print database, Inform. Sci.