
Information Fusion

Volume 57, May 2020, Pages 89-101

Neural architecture search for image saliency fusion

https://doi.org/10.1016/j.inffus.2019.12.007

Highlights

  • Neural architecture search addressed with Genetic Programming and Backpropagation.

  • Genetic Programming efficiently provides blueprints for neural network architectures.

  • Backpropagation significantly improves the performance of candidate blueprints.

  • Proper fusion of hand-crafted saliency methods can outperform deep learning methods.

  • Proper fusion of deep learning methods outperforms the state of the art.

Abstract

Saliency detection methods proposed in the literature exploit different rationales, visual clues, and assumptions, but there is no single best saliency detection algorithm that achieves good results on all the different benchmark datasets. In this paper we show that fusing different saliency detection algorithms by exploiting neural network architectures makes it possible to obtain better results. Designing the best architecture for a given task is still an open problem, since existing techniques have limits with respect to the problem formulation and the search space, and require very high computational resources. To overcome these problems, in this paper we propose a three-step fusion approach. In the first step, genetic programming techniques are exploited to combine the outputs of existing saliency algorithms using a set of provided operations. Having a discrete search space allows for fast generation of the candidate solutions. In the second step, the obtained solutions are converted into backbone Convolutional Neural Networks (CNNs) where all operations are implemented with differentiable functions, allowing efficient optimization of the corresponding parameters (in a continuous space) by backpropagation. In the last step, to enrich the expressiveness of the initial architectures, the networks are further extended with additional operations on intermediate levels of processing, which are once again efficiently optimized through backpropagation.

Extensive experimental evaluations show that the proposed saliency fusion approach outperforms the state of the art on the MSRA-B dataset and generalizes to unseen data from different benchmark datasets.

Introduction

According to [1], “Visual salience (or visual saliency) is the distinct subjective perceptual quality which makes some items in the world stand out from their neighbors and immediately grab our attention”. The human vision system is able to efficiently detect salient areas in a scene and further process them to extract high-level information [2], [3]. Visual saliency has been primarily studied by neuroscientists and cognitive scientists, and has recently received attention from other research communities working in the fields of computer vision, computer graphics, and multimedia, e.g. [4]. In the areas of multimedia and computer vision, visual saliency can be used to emphasize object-level regions in the scene, which can serve as a pre-processing step for scene recognition [5], [6], object detection [7], [8], segmentation [9], and tracking [10]. It can also be exploited for image manipulation and visualization in applications such as image retargeting [11], image collage [12], and non-photorealistic rendering [13]. Moreover, in multimedia applications saliency can be exploited for image and video summarization [14], [15], [16], enhancement [17], retrieval [18], and image quality or aesthetic assessment [19], [20].

Saliency detection methods can be divided into two categories: bottom-up and top-down. Bottom-up methods are stimuli-driven [21]. Saliency is usually modeled by local or global contrast on hand-crafted visual features, and knowledge about human visual attention is embedded in the model by exploiting heuristic priors such as background [22], compactness [23], or objectness [24]. With these methods, no explicit information about the semantics of the salient regions is provided; it is indirectly embedded via prior assumptions on the location, shape, or visual properties of the salient regions to be detected. Bottom-up methods can be considered general purpose.
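As a toy illustration of the contrast rationale behind many bottom-up methods, the sketch below scores each pixel by its absolute deviation from the global mean intensity (the function and this simplistic cue are ours for illustration only; the surveyed methods rely on far richer features and priors):

```python
# Toy bottom-up cue: global-contrast saliency on a grayscale image.
# A pixel is salient in proportion to how far it deviates from the mean
# intensity of the whole image; the map is normalized to [0, 1].
def global_contrast_saliency(image):
    """image: 2D list of gray levels; returns a saliency map in [0, 1]."""
    pixels = [v for row in image for v in row]
    mean = sum(pixels) / len(pixels)
    contrast = [[abs(v - mean) for v in row] for row in image]
    peak = max(max(row) for row in contrast) or 1.0
    return [[c / peak for c in row] for row in contrast]
```

On an image with a single bright pixel on a dark background, the bright pixel receives saliency 1.0 while the background stays well below it, matching the contrast prior while remaining blind to semantics.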

Top-down saliency methods are designed to find regions in the images that are relevant for a given task. They are often also referred to as task-driven approaches. These methods usually formulate the saliency detection as a supervised learning problem [25]. The rationale of top-down saliency methods is to identify image regions that belong to a pre-defined object category [26]. For this reason, these methods are theoretically more robust for identifying salient regions in cluttered backgrounds where bottom-up methods may fail. Top-down approaches rely on the use of training data to build the detection model. They can be very robust for the specific task on which they are trained but may not generalize well to other tasks.

In order to make the detection more robust and to improve the generalization capabilities, saliency methods often integrate different features [27] that can be either hand-crafted or learned by Convolutional Neural Networks (CNNs) [28], [29], [30], or fuse saliency maps generated by different methods [31]. However, the feature definition and selection, and the combination strategies, are usually empirically designed.

Since multiple observers may consider different regions of a scene salient, depending on the scene context and/or on the observer’s cultural background, saliency detection is an ill-posed problem [22], [32]. Saliency detection methods proposed in the literature exploit different rationales, visual clues, and assumptions, but as demonstrated by the experiments in [33], there is no single best saliency detection algorithm that achieves good results on all the different benchmark datasets.

In our previous works [34], [35], we exploited genetic programming (GP) to build the rationale with which to combine the binary outputs of several change detection algorithms. Using a priori defined unary, binary, and n-ary operators, the GP approach automatically combined the inputs and built an optimal, task-driven solution (i.e. a program) in the form of a hierarchical tree structure.
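As a toy sketch of such a GP-built combination (the operator set and tree encoding here are our invented simplifications, not the paper's actual operators, and the maps are real-valued as in the saliency case rather than binary), each tree node is either a leaf referencing an input map or an operator applied to its children:

```python
# Hypothetical fusion-tree evaluator. A tree is either an int (index of an
# input map) or a tuple ("op", child, ...). The three toy operators work
# element-wise on 2D maps with values in [0, 1].
OPS = {
    "avg": lambda a, b: [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)],
    "max": lambda a, b: [[max(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)],
    "inv": lambda a: [[1.0 - x for x in row] for row in a],
}

def eval_tree(node, maps):
    if isinstance(node, int):  # leaf: pick one of the input maps
        return maps[node]
    op, *children = node       # internal node: apply the operator to its children
    return OPS[op](*(eval_tree(c, maps) for c in children))
```

For example, the tree `("avg", 0, ("inv", 1))` averages the first map with the inverted second map; GP searches over such discrete tree structures, which is what makes candidate generation fast.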

In this work we further investigate and extend this approach to combine gray-level saliency maps, a domain we first addressed in [36]. We first create a candidate solution for combining the saliency maps using GP with a set of operations whose parameters are fixed a priori. To further improve this solution, we should also tune these parameters, but they cannot be easily (or efficiently) optimized within the GP framework. To optimize the parameters, we use the candidate solution obtained by the GP as a blueprint upon which to design the architecture of a backbone Convolutional Neural Network (CNN). Within the CNN optimization framework, it is much easier and more efficient to search for the optimal parameters of the operations of the GP solution. Another important advantage of the backbone CNN implementation is that, once the proposed solutions have been evaluated, we can easily and safely create deeper variants of the CNN by including other operations (e.g. post-processing) on intermediate results. These operations, initialized as identities, are further optimized or can be completely ignored by the CNN during training.
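The identity-initialized extra operations can be illustrated with a scalar toy example (ours, not the paper's actual layers): a pointwise affine map y = w·x + b starts as the identity (w = 1, b = 0), so inserting it leaves the network output unchanged; gradient descent on the loss can then adapt it, or keep it at the identity if it does not help:

```python
# Toy "identity-initialized" operation: a pointwise affine map trained by
# gradient descent on the mean squared error against target values.
def fit_affine(xs, targets, lr=0.1, steps=200):
    w, b = 1.0, 0.0  # identity initialization: output == input at step 0
    n = len(xs)
    for _ in range(steps):
        gw = sum(2 * (w * x + b - t) * x for x, t in zip(xs, targets)) / n
        gb = sum(2 * (w * x + b - t) for x, t in zip(xs, targets)) / n
        w, b = w - lr * gw, b - lr * gb
    return w, b
```

If the targets are the inputs shifted by a constant, training recovers w ≈ 1 and a matching bias; if the targets equal the inputs, the gradients vanish and the layer stays exactly at the identity.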

The extensive experiments on benchmark datasets, both qualitative and quantitative, validate the effectiveness of the proposed fusion strategy.

Finally, beyond the focus on saliency estimation for the scope of this paper, the proposed information fusion technique can be considered a general purpose method, with possible applications to other fields such as change detection [35] and semantic segmentation [37].

Section snippets

Saliency detection algorithms

Borji et al. [33] benchmarked 41 different saliency detection algorithms, each based on different assumptions and heuristics. For example, Li et al. [38] compute saliency from the perspective of the image reconstruction error of background images generated at different levels of detail. A graph-based approach is used instead by Yang et al. [39]. Again, superpixels are the base for the saliency computation. Foreground and background region queries are used to rank each image region using a

Proposed method

Our proposed saliency estimation approach aims at combining the advantages of Genetic Programming with those of Convolutional Neural Networks. With our approach, we design and optimize GP-generated solutions for saliency estimation in three steps. In the first step, Genetic Programming techniques are exploited to combine existing saliency maps using a set of provided operations. The output of this step is a fusion tree that encodes the optimal fusion strategy with respect to the defined

Experiments

In this section, we first describe the experimental setup, by introducing the input saliency estimation algorithms, the datasets that have been adopted at different phases of the optimization, and the evaluation metrics. We then present the following experiments: we select different fusion trees from the Genetic Programming phase, generate the corresponding CNNs, and evaluate them on various datasets for a comparison with the input algorithms.

Conclusions

We have proposed a general purpose neural architecture search strategy, with a focus on the estimation of image saliency. Specifically, we have devised a three-step optimization process that combines the output of existing algorithms for saliency estimation.

First, a fusion tree is generated through genetic programming, working on a set of predefined operators. The discrete search space of the operators to be used and combined is efficiently handled by the evolutionary algorithm. This initial

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research. The research leading to these results has received funding from TEINVEIN: TEcnologie INnovative per i VEicoli Intelligenti, CUP (Codice Unico Progetto – Unique Project Code): E96D17000110009 – Call “Accordi per la Ricerca e l’Innovazione”, cofunded by POR FESR 2014–2020 (Programma Operativo Regionale, Fondo Europeo di Sviluppo Regionale – Regional Operational

References (80)

  • A. Azaza et al.

    Context proposals for saliency detection

    Comput. Vis. Image Underst.

    (2018)
  • R. Miikkulainen et al.

    Evolving deep neural networks

    Artificial Intelligence in the Age of Neural Networks and Brain Computing

    (2019)
  • L. Itti

    Visual saliency

    Scholarpedia

    (2007)
  • R.M. Shiffrin et al.

Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory

    Psychol. Rev.

    (1977)
  • W. Schneider et al.

Controlled and automatic human information processing: I. Detection, search, and attention

    Psychol. Rev.

    (1977)
  • L. Itti et al.

    Computational modelling of visual attention

    Nat. Rev. Neurosci.

    (2001)
  • D. Gao et al.

    Discriminant saliency, the detection of suspicious coincidences, and applications to visual recognition

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2009)
  • Z. Ren et al.

Region-based saliency detection and its application in object recognition

    IEEE Trans. Circuits Syst. Video Technol.

    (2014)
  • S. Mitri et al.

    Robust object detection at regions of interest with an application in ball recognition

    Proceedings of the IEEE International Conference on Robotics and Automation

    (2005)
  • V. Navalpakkam et al.

    An integrated model of top-down and bottom-up attention for optimizing detection speed

    Proceedings of the 2006 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    (2006)
  • Q. Li et al.

    Saliency based image segmentation

    Proceedings of the 2011 International Conference on Multimedia Technology

    (2011)
  • V. Mahadevan et al.

    Saliency-based discriminant tracking

    Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition

    (2009)
  • S. Avidan et al.

    Seam carving for content-aware image resizing

    ACM Trans. Graph.

    (2007)
  • R. Margolin et al.

    Saliency for image manipulation

    Vis. Comput.

    (2013)
  • D. DeCarlo et al.

    Stylization and abstraction of photographs

    ACM Trans. Graph.

    (2002)
  • N. Ouerhani et al.

    Adaptive color image compression based on visual attention

    Proceedings of the 11th International Conference on Image Analysis and Processing (ICIAP)

    (2001)
  • S. Corchs et al.

    Video summarization using a neurodynamical model of visual attention

    Proceedings of the 6th IEEE Workshop on Multimedia Signal Processing

    (2004)
  • C. Guo et al.

    A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression

    IEEE Trans. Image Process.

    (2010)
  • F. Gasparini et al.

    Low quality image enhancement using visual attention

    Opt. Eng.

    (2007)
  • Y. Gao et al.

    Database saliency for fast image retrieval

    IEEE Trans. Multimed.

    (2015)
  • A. Li et al.

    Color image quality assessment combining saliency and FSIM

    Proceedings of the Fifth International Conference on Digital Image Processing (ICDIP 2013)

    (2013)
  • L. Wong et al.

    Saliency retargeting: an approach to enhance image aesthetics

    Proceedings of the 2011 IEEE Workshop on Applications of Computer Vision (WACV)

    (2011)
  • L. Itti et al.

    A model of saliency-based visual attention for rapid scene analysis

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1998)
  • Y. Wei et al.

    Geodesic saliency using background priors

    Proceedings of the European Conference On Computer vision

    (2012)
  • F. Perazzi et al.

    Saliency filters: contrast based filtering for salient region detection

    Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    (2012)
  • Y. Li et al.

    The secrets of salient object segmentation

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2014)
  • H. Jiang et al.

    Salient object detection: a discriminative regional feature integration approach

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2013)
  • H. Cholakkal et al.

Top-down saliency with locality-constrained contextual sparse coding

    Proceedings of the 2015 BMVC

    (2015)
  • T. Liu et al.

    Learning to detect a salient object

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2011)
  • G. Li et al.

    Deep contrast learning for salient object detection

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2016)
  • Q. Hou et al.

    Deeply supervised salient object detection with short connections

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2017)
  • S. Bianco et al.

    Multiscale fully convolutional network for image saliency

    J. Electron. Imaging

    (2018)
  • L. Mai et al.

    Saliency aggregation: a data-driven approach

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2013)
  • M. Amirul Islam et al.

    Revisiting salient object detection: simultaneous detection, ranking, and subitizing of multiple salient objects

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2018)
  • A. Borji et al.

    Salient object detection: a benchmark

    IEEE Trans. Image Process.

    (2015)
  • S. Bianco et al.

    How far can you get by combining change detection algorithms?

    Proceedings of the International Conference on Image Analysis and Processing – ICIAP 2017

    (2017)
  • S. Bianco et al.

    Combination of video change detection algorithms by genetic programming

    IEEE Trans. Evol. Comput.

    (2017)
  • M. Buzzelli et al.

    Combining saliency estimation methods

    Proceedings of the International Conference on Image Analysis and Processing – ICIAP 2019

    (2019)
  • D. Mazzini et al.

    A CNN architecture for efficient semantic segmentation of street scenes

    Proceedings of the 8th IEEE International Conference on Consumer Electronics – Berlin (ICCE-Berlin)

    (2018)
  • X. Li et al.

    Saliency detection via dense and sparse reconstruction

    Proceedings of the 2013 IEEE International Conference on Computer Vision

    (2013)