Information Sciences

Volume 316, 20 September 2015, Pages 567-581

Evolutionary compact embedding for large-scale image classification

https://doi.org/10.1016/j.ins.2014.06.030

Abstract

Effective dimensionality reduction is a classical research area for many large-scale analysis tasks in computer vision. Several recent methods attempt to learn either graph embedding or binary hashing for fast and accurate applications. In this paper, we propose a novel framework to automatically learn a task-specific compact coding, called evolutionary compact embedding (ECE), which can be regarded as an optimization algorithm combining genetic programming (GP) and a boosting trick. As an evolutionary computation methodology inspired by natural evolution, GP can solve problems without any prior knowledge of the solutions. In our evolutionary architecture, each bit of ECE is iteratively computed using a binary classification function, which is generated through GP evolution by jointly minimizing its empirical risk with the AdaBoost strategy on a training set. We formulate this as a greedy optimization that leads to small Hamming distances for similar samples and large distances for dissimilar samples. We then evaluate ECE on four image datasets: USPS handwritten digits, CMU PIE face, CIFAR-10 tiny image and SUN397 scene, showing the accurate and robust performance of our method for large-scale image classification.

Introduction

Dimensionality reduction has been a critical preprocessing step in many fields of information processing and analysis, such as data mining [5], [12], [18], [19], [20], information retrieval [14], [16], and pattern recognition [11], [21], [29], [37]. Recently, with advances in computer technology and the growth of the World Wide Web, a huge amount of digital data, including text, images and videos, is generated, stored, analyzed, and accessed every day. To overcome the shortcomings of text-based image retrieval, content-based image classification and retrieval has attracted substantial attention. The most basic but essential scheme for image classification is nearest neighbor search: given a query image, find the image most similar to it within a large database and assign the label of that nearest neighbor to the query. However, exhaustively searching a dataset with N samples is infeasible because the linear complexity O(N) does not scale in practical applications. To reduce this computational cost, many methods have been proposed to index the data for fast query responses, such as the K-D tree and R tree [9]. However, these methods only work well at low dimensionality, typically less than 100 [2], [3]. Besides, most vision-based applications also suffer from the curse of dimensionality, because visual descriptors usually have hundreds or even thousands of dimensions. Therefore, to make large-scale search or classification practical, methods have been proposed to effectively reduce the dimensionality of the data while increasing classification speed and accuracy.
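For concreteness, the following minimal sketch (Python with NumPy; the toy database size and feature dimension are assumptions for illustration) performs this brute-force nearest neighbor classification with a single O(N) linear scan, which is exactly the cost that becomes prohibitive at scale:

import numpy as np

def nn_classify(query, database, labels):
    # Assign the query the label of its nearest neighbor after one O(N) scan
    # over all N stored samples; this linear cost does not scale to large databases.
    dists = np.linalg.norm(database - query, axis=1)
    return labels[np.argmin(dists)]

# Usage with a hypothetical database of N = 10,000 descriptors of dimension 512.
rng = np.random.default_rng(0)
database = rng.normal(size=(10000, 512))
labels = rng.integers(0, 10, size=10000)
query = rng.normal(size=512)
print(nn_classify(query, database, labels))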

One of the most basic dimensionality reduction algorithms is principal component analysis (PCA), which explains the variance–covariance structure of a set of variables through linear combinations of those variables. PCA is most commonly applied to condense the information contained in a large number of original variables into a smaller set of new composite variables or dimensions, while ensuring a minimum loss of information. Another effective scheme for dimensionality reduction is linear discriminant analysis (LDA). LDA is a supervised method that has proved successful on classification problems [4], [8]. Following the Fisher discriminant criterion, the projection vectors are obtained by maximizing the between-class covariance while simultaneously minimizing the within-class covariance. However, classical LDA is a linear method and cannot tackle nonlinear problems. To overcome this limitation, kernel discriminant analysis (KDA) [22] was developed. KDA is the nonlinear extension of LDA: using the kernel trick, the analysis is implicitly performed in a new feature space, which allows nonlinear mappings to be learned. Beyond that, other dimensionality reduction methods can also achieve promising results for different applications. Locality preserving projections (LPP) [12] are linear projective maps obtained by solving a variational problem that preserves the neighborhood structure of the data set. LPP aims to find the optimal linear approximations to the eigenfunctions of the Laplace–Beltrami operator on the manifold. In addition, another popular method, termed discriminative locality alignment (DLA) [38], has also been used as a dimensionality reduction algorithm for classification. All the above methods can be viewed as direct graph embedding or its linear/kernel/tensor extensions of a specific intrinsic graph that describes certain desired statistical or geometric properties of a data set, with constraints from scale normalization or a penalty graph [36]. With the need for fast search and classification in large-scale vision applications, recent effort has turned to binary hashing techniques, which explore approximate similarity search based on Hamming distance to effectively reduce indexing time. Among these methods, kernelized locality-sensitive hashing (KLSH) [18] has been successfully utilized for large-scale image retrieval and classification. KLSH is essentially a kernelized method for performing probabilistic dimension reduction of high-dimensional data. The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability. Beyond that, some deep learning methods, e.g., the restricted Boltzmann machine (RBM) [6], have also been used to learn binary codes in an unsupervised manner.
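As a rough illustration of the PCA baseline described above, the following sketch (the toy data and the number of retained components are assumptions) projects centered data onto its top-k principal components, i.e., the linear combinations of the original variables that capture the most variance:

import numpy as np

def pca_project(X, k):
    # Project N x D data onto its top-k principal components.
    Xc = X - X.mean(axis=0)                     # center the variables
    cov = np.cov(Xc, rowvar=False)              # D x D covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigendecomposition (ascending order)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return Xc @ top                             # N x k reduced representation

# Usage: reduce 100 samples of 50-dimensional data to 2 dimensions.
X = np.random.default_rng(0).normal(size=(100, 50))
Z = pca_project(X, k=2)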

Although existing dimensionality reduction methods achieve promising results in a variety of applications, they generally rely on complex and advanced mathematical machinery to optimize pre-defined objective functions. However, for some optimization problems, direct solutions cannot always be found. Besides, in large-scale settings, the matrix factorization techniques used in the above methods can also impose a heavy computational burden. How to automatically generate good solutions to such optimization problems therefore becomes an interesting topic for real-world vision applications. In this work, we propose evolutionary compact embedding (ECE), which applies genetic programming (GP) in combination with AdaBoost to automatically evolve the dimensionality reduction mapping. ECE is demonstrated to enable accurate and robust large-scale image classification. Fig. 1 shows the workflow of ECE.

Genetic programming (GP) [26] simulates the Darwinian principle of natural selection to solve optimization problems. Unlike hand-crafted techniques based on deep domain knowledge, GP is inspired by natural evolution and can be employed to automatically solve problems without prior knowledge of the solutions. Users can apply GP to a wide range of practical problems, producing human-competitive results and even patentable inventions. Relying on natural and random processes, GP can escape traps that may capture deterministic methods. Because of this, the use of GP is not limited to any particular research domain, and it produces relatively general solutions for the target task. In GP, a group of primitive operators is first adopted to randomly assemble computational programs, which form the GP initial population. This population is then allowed to ‘evolve’ (using crossover and mutation) through reproduction, with single parents or parent pairs chosen stochastically but biased by their fitness on the task at hand. In this way, the overall fitness of the population tends to improve over generations. Finally, the individual that achieves the best performance is taken as the final solution. A typical GP procedure is outlined in Algorithm 1.

Algorithm 1

Genetic Programming
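The algorithm body is not reproduced in this snippet view. For orientation only, the sketch below shows a generic GP loop of the kind summarized above (random initial population, fitness-biased selection, crossover and mutation, best individual returned); the primitive set, the toy symbolic-regression fitness and every helper name are assumptions for illustration, not the operators or fitness function used by ECE:

import random
import operator

# Primitive operators (with arities) and terminals used to assemble random programs.
OPS = [(operator.add, 2), (operator.sub, 2), (operator.mul, 2)]
TERMINALS = ['x', 1.0, 2.0, 3.0]

def random_tree(depth=3):
    # Randomly assemble a program tree from primitives (initial population).
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    op, arity = random.choice(OPS)
    return (op, [random_tree(depth - 1) for _ in range(arity)])

def evaluate(tree, x):
    if tree == 'x':
        return x
    if isinstance(tree, float):
        return tree
    op, children = tree
    return op(*(evaluate(c, x) for c in children))

def fitness(tree):
    # Toy fitness: negative squared error against the target x**2 + x (assumed).
    xs = [i * 0.1 for i in range(-10, 11)]
    return -sum((evaluate(tree, x) - (x * x + x)) ** 2 for x in xs)

def mutate(tree):
    # Replace the whole tree or a random subtree with a fresh random one.
    if isinstance(tree, (str, float)) or random.random() < 0.2:
        return random_tree(2)
    op, children = tree
    i = random.randrange(len(children))
    children = list(children)
    children[i] = mutate(children[i])
    return (op, children)

def crossover(a, b):
    # Very simple crossover: graft a random subtree of b into a.
    if isinstance(a, (str, float)) or isinstance(b, (str, float)) or random.random() < 0.3:
        return b
    op, children = a
    i = random.randrange(len(children))
    children = list(children)
    children[i] = crossover(children[i], b)
    return (op, children)

def genetic_programming(pop_size=100, generations=30):
    population = [random_tree() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[:pop_size // 2]              # fitness-biased selection
        offspring = []
        while len(survivors) + len(offspring) < pop_size:
            p1, p2 = random.sample(survivors, 2)
            child = crossover(p1, p2)                   # reproduction via crossover
            if random.random() < 0.3:
                child = mutate(child)                   # mutation
            offspring.append(child)
        population = survivors + offspring
    return max(population, key=fitness)                 # best individual is the solution

best_program = genetic_programming()

In ECE, the fitness of an evolved program would instead reflect the weighted classification error of the corresponding binary learner, as described next.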

Aiming at the task of dimensionality reduction, we deliberately combine GP with a boosting trick to obtain a novel embedding method. For an M-bit embedding, GP is used to iteratively generate a best-performing weighted binary classifier for each bit by jointly minimizing its empirical risk with the Gentle AdaBoost strategy [7] on a training set. This embedding scheme reduces the Hamming distance between data from the same class, while increasing the Hamming distance between data from different classes. The final optimized reduced representation is defined as the code calculated from the non-linear GP-evolved binary learner for each embedding bit. To the best of our knowledge, this is the first time that GP with the boosting trick has been successfully applied to feature embedding for large-scale image classification.
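To make the per-bit loop concrete, the sketch below shows the general shape of such a boosted bit-learning scheme. It assumes binary labels y in {-1, +1}, uses a hypothetical gp_evolve_classifier stand-in (a weighted decision stump here, whereas ECE evolves the classifier with GP), and applies a generic Gentle-AdaBoost-style re-weighting; it is not the paper's exact risk or update rule:

import numpy as np

def gp_evolve_classifier(X, y, weights):
    # Hypothetical stand-in for the GP-evolved binary learner: a weighted decision
    # stump that picks the single feature whose median split best fits the weighted data.
    best = None
    for d in range(X.shape[1]):
        thr = np.median(X[:, d])
        pred = np.where(X[:, d] > thr, 1.0, -1.0)
        err = np.sum(weights * (pred != y))
        if best is None or err < best[0]:
            best = (err, d, thr)
    _, d, thr = best
    return lambda Z: np.where(Z[:, d] > thr, 1.0, -1.0)

def learn_embedding(X, y, M=32):
    # Learn M bit functions; after each bit, re-weight the samples (boosting trick)
    # so the next bit focuses on examples the previous bit handled poorly.
    w = np.full(X.shape[0], 1.0 / X.shape[0])
    bit_fns = []
    for _ in range(M):
        f = gp_evolve_classifier(X, y, w)   # one binary classifier per bit
        w *= np.exp(-y * f(X))              # Gentle-AdaBoost-style update (assumed)
        w /= w.sum()
        bit_fns.append(f)
    return bit_fns

def embed(X, bit_fns):
    # Map samples to M-bit binary codes, one column per evolved bit function.
    return np.stack([(f(X) > 0).astype(np.uint8) for f in bit_fns], axis=1)

The embedding of a sample is then simply the concatenation of the M bit outputs.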

The remainder of this paper is organized as follows. In Section 2, related work is reviewed. The architecture of ECE and its implementation details are presented in Section 3 (Evolutionary compact embedding) and Section 4 (Improved ECE implementation for large-scale applications). Experiments and results are described in Section 5. In Section 6, we conclude the paper and outline possible future work.

Section snippets

Related work

Recently, some techniques have been successfully used for feature embedding based on boosting schemes. One of the most related works is called boosted similarity sensitive coding (BSSC) [27], which is designed to learn an M-bit weighted Hamming embedding for task-specific similarity search:

H: X \rightarrow \{\alpha_1 h_1(x), \ldots, \alpha_m h_m(x), \ldots, \alpha_M h_M(x)\}

so that the distance between any two samples x_i and x_j is given by a weighted Hamming distance:

D(x_i, x_j) = \sum_{m=1}^{M} \alpha_m \, |h_m(x_i) - h_m(x_j)|

where the weights \alpha_m and the functions h_m(x_i)
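For example, given two already-computed binary codes and learned bit weights (all values below are made up for illustration), the weighted Hamming distance can be evaluated as:

import numpy as np

def weighted_hamming(code_i, code_j, alpha):
    # D(x_i, x_j) = sum_m alpha_m * |h_m(x_i) - h_m(x_j)|
    return np.sum(alpha * np.abs(code_i.astype(float) - code_j.astype(float)))

# Usage with hypothetical 8-bit codes and learned weights.
alpha = np.array([0.9, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1])
ci = np.array([1, 0, 1, 1, 0, 0, 1, 0])
cj = np.array([1, 1, 1, 0, 0, 0, 1, 1])
print(weighted_hamming(ci, cj, alpha))   # bits 1, 3, 7 differ: 0.7 + 0.5 + 0.1 = 1.3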

Evolutionary compact embedding

In this section, we first introduce the overall design of our evolutionary embedding algorithm and then describe how to train our GP classifier with the boosting trick.

Improved ECE implementation for large-scale applications

Our ECE method can in principle reduce data of any dimension to a lower-dimensional compact code. However, the GP algorithm is time-consuming to train on large-scale datasets, especially when the dimensionality of the original data is high.

To reduce the GP optimization complexity, we improve our ECE algorithm by using the random batch parallel learning (RBPL) technique. Given a training set X = \{x_1, x_2, \ldots, x_n, \ldots, x_N\} with labels Y = \{1, 2, \ldots, C\}, we randomly assemble them into N pairs \hat{X}_{pair} = \{(x_n
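The snippet is cut off before the full definition of the pair set. Purely as an illustration of the pair-assembly step, the sketch below pairs every training sample with one randomly chosen partner and marks the pair as similar when their labels agree; the actual RBPL batching and parallel training procedure is not reproduced here:

import numpy as np

def assemble_random_pairs(X, y, rng=np.random.default_rng(0)):
    # Pair each sample with a random partner; +1 if same class, -1 otherwise.
    # (Assumed pairing rule for illustration; the paper's exact rule may differ.)
    N = X.shape[0]
    partners = rng.integers(0, N, size=N)
    pair_labels = np.where(y == y[partners], 1, -1)
    return list(zip(X, X[partners])), pair_labels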

Experiments and results

In this section, we systematically evaluate our proposed ECE against other popular dimensionality reduction algorithms on several datasets; the relevant experimental results are compared and discussed in the following sub-sections.

Conclusion

In this paper, we have presented a novel framework to learn highly discriminative embedding codes for dimensionality reduction using evolutionary compact embedding (ECE). We address it as an optimization problem combining genetic programming (GP) with the boosting-based weight updating trick. For each bit of ECE, the proposed learning scheme evolves a binary classification function through GP and re-weights the training samples for the next bit to jointly minimize its empirical risk with the

Acknowledgements

This work is supported by the University of Sheffield, the National Natural Science Foundation of China (Grant No: 61125106), and the Key Research Program of the Chinese Academy of Sciences (Grant No. KGZD-EW-T03).

References (40)

  • L. Cayton, S. Dasgupta, A learning framework for nearest neighbor search, in:...
  • S. Chakrabarti et al., Fast and accurate text classification via multiple linear discriminant projections, VLDB J. (2003)
  • A. Fischer, C. Igel, An introduction to restricted Boltzmann machines, in: Progress in Pattern Recognition, Image...
  • J. Friedman et al., Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors), Ann. Stat. (2000)
  • K. Fukunaga, Introduction to statistical pattern recognition,...
  • V. Gaede et al., Multidimensional access methods, ACM Comput. Surv. (CSUR) (1998)
  • A. Gionis, P. Indyk, R. Motwani, et al., Similarity search in high dimensions via hashing, in: VLDB, vol. 99, pp....
  • X. He, P. Niyogi, Locality preserving projections, in: Neural Information Processing Systems, vol. 16, p....
  • J.J. Hull, A database for handwritten text recognition research, IEEE Trans. Pattern Anal. Mach. Intell. (1994)
  • U. Jayaraman et al., Boosted geometric hashing based indexing technique for finger-knuckle-print database, Inform. Sci. (2014)