Neurocomputing

Volume 247, 19 July 2017, Pages 39-58

Evolving multi-dimensional wavelet neural networks for classification using Cartesian Genetic Programming

https://doi.org/10.1016/j.neucom.2017.03.048

Highlights

  • A new approach to evolution of multidimensional wavelet neural networks is presented.

  • Evolution of the wavelet rotation parameter in WNN is investigated for the first time.

  • Incorporation of the rotation parameter improves performance of the network.

  • Computational complexity is reduced by switching WNN input features on and off.

  • A recent publicly released Parkinson’s disease dataset is investigated along with others.

Abstract

Wavelet Neural Networks (WNNs) are complex artificial neural systems and their training can be a challenge. In the past, the most common training schemes for WNNs, such as gradient descent, have been restricted to training only a subset of differentiable parameters. In this paper, we propose an evolutionary method to train both differentiable and non-differentiable parameters using the concept of Cartesian Genetic Programming (CGP). The approach was evaluated on the two-spiral task and on real-world datasets for the detection of breast cancer and Parkinson’s disease. In our experiments, the performance of the proposed method was comparable to that of several standard classification methods. On the breast cancer dataset, it outperformed other non-ensemble and multistep processing methods. The experimental results show how the performance of WNNs depends on the number of wavelons used. The presented case studies demonstrate that the proposed WNNs perform competitively in comparison to several other methods and results reported in the literature.

Introduction

The wavelet transform has been used in pattern recognition, signal processing and compression applications for its ability to extract information from signals at either high time or frequency resolutions [1], [2], [3]. Wavelet neural networks (WNNs) utilize the concept of the wavelet transform in neural networks.

A combined model of wavelets and neural networks is suitable for function approximation and can be used for prediction and classification. WNNs have been successfully applied in many areas, including signal denoising [4], signal classification and compression [5], short-term electricity load forecasting [6], speech segmentation [7] and speaker recognition [8]. WNNs can provide better function approximation ability than standard multilayer perceptrons (MLPs) and radial basis function (RBF) neural networks over a wide range of applications [9], [10].

A WNN is determined by five key parameters. Three of them refer to the activation function (scale, translation, rotation) and two to the architecture of the network (weights and number of neurons). The behavior of these parameters is discussed in more detail in Section 2.1.4. The standard training procedure of a WNN employs a gradient descent algorithm, which can suffer from slow convergence and local optima [11]. In order to optimize the performance of WNNs, a number of research studies have utilized evolutionary algorithms and evolutionary programming techniques [12], [13].
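As a concrete illustration of the activation-function parameters (a minimal sketch, not the paper's exact formulation), a single one-dimensional wavelon translates and dilates its input before applying the mother wavelet, and the network output is a weighted sum over wavelons:

```python
import math

def mexican_hat(u):
    """Mexican hat mother wavelet (second derivative of a Gaussian)."""
    return (1.0 - u * u) * math.exp(-0.5 * u * u)

def wavelon(x, scale, translation, weight):
    """A single 1D wavelon: the input is translated and dilated before
    being passed through the mother wavelet."""
    return weight * mexican_hat((x - translation) / scale)

def wnn(x, params):
    """WNN output as a weighted sum over wavelons; 'params' is a list of
    (scale, translation, weight) triples, one per wavelon."""
    return sum(wavelon(x, s, t, w) for (s, t, w) in params)
```

The rotation parameter has no role in this one-dimensional case; it only appears when the input is a feature vector.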

Prediction of air and ground traffic flow [14], [15], energy consumption [16], large scale function estimation [17], function approximation [18], [19], [20], power transformer monitoring [21] and centrifugal compressor performance prediction [22] are a few of the many applications of WNNs that utilize genetic algorithms (GAs). WNN evolution via differential evolution (DE) has also been quite successful, with applications such as load forecasting [23] and bankruptcy prediction [24]. This variety of applications illustrates the adaptability of WNNs to different data domains.

The most common strategy to optimize combinations of WNN parameters is to evolve only the activation function parameters [14], [15], [16], [17]. Awad [25] evolved the translation and scale parameters using a GA and trained the weights using the Levenberg-Marquardt algorithm. Jinru et al. [26] used a two-stage approach, in which a GA first performs a global search of the parameters, which are then fine-tuned in a second stage by local search algorithms such as gradient descent. In [27], the translation parameter adapts to the network input and its response to a non-linear function, while the remaining attributes are evolved and optimized using particle swarm optimization. Simultaneous evolution of activation function parameters and network structure has also been studied and applied in various domains such as function approximation, Parkinson’s disease detection and prediction of hydro-turbine machine condition [20], [28], [29].

Apart from the existing methods of training wavelet parameters, a number of other optimization algorithms could be used to optimize the WNN parameters, including self-adaptive differential evolution [30] and a social emotional optimization algorithm enhanced with local search [31].

In the present paper, a novel algorithm based on the concept of Cartesian Genetic Programming (CGP) is used to evolve a multi-dimensional wavelet neural network, so that its potential application to classification tasks can be evaluated. The paper also aims to contribute to a better understanding of the behaviour of WNNs when their parameters are adjusted.

CGP is an evolutionary programming technique developed by Miller et al. [32]. The concept of CGP has also been used to evolve artificial neural networks [33]. The motivation behind using CGP for evolving parameters is twofold. First, CGP does not suffer from bloat [34]: its genotype contains redundant genes with a neutral effect on performance [35], [36], [37], which supports evolvability rather than inflating the evolved program. Second, most applications evolved via CGP are generic and robust, and achieve good accuracy compared to other methods [38], [39], [40], [41].
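The excerpt does not detail the evolutionary loop, but CGP is typically driven by a (1 + λ) evolutionary strategy in which offspring of equal fitness replace the parent, allowing neutral mutations to drift through the genotype. A minimal, generic sketch (the toy fitness function and all names are illustrative, not from the paper):

```python
import random

def one_plus_lambda(fitness, genotype, mutate, lam=4, generations=100):
    """Minimal (1 + lambda) evolutionary strategy, the selection scheme
    commonly used with CGP."""
    parent, parent_fit = genotype, fitness(genotype)
    for _ in range(generations):
        # All offspring are mutated copies of the current parent.
        offspring = [mutate(parent) for _ in range(lam)]
        for child in offspring:
            child_fit = fitness(child)
            if child_fit >= parent_fit:  # '>=' lets neutral mutations drift
                parent, parent_fit = child, child_fit
    return parent, parent_fit

# Toy usage: maximize -(g - 3)^2 over a single real-valued gene.
random.seed(0)
best, fit = one_plus_lambda(
    fitness=lambda g: -(g - 3.0) ** 2,
    genotype=0.0,
    mutate=lambda g: g + random.gauss(0.0, 0.5),
)
```

Accepting equal-fitness offspring is what exploits the neutral genes mentioned above: the search can move across plateaus in the fitness landscape instead of stalling on them.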

The computational cost of a WNN increases with the input dimension of the system. Our objective is to introduce an algorithm that can switch input features on and off, making them either active or inactive during the evolution process. Discarding too many features, however, might reduce accuracy. The advantage of an evolution-based approach is that features can be pruned during evolution while balancing the need for accuracy, thus efficiently reducing the time to train a network.
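One simple way to realize such on/off switching (a hypothetical encoding for illustration, not necessarily the paper's) is to carry a binary mask over the input features in each evolved candidate:

```python
def apply_feature_mask(sample, mask):
    """Keep only the input features whose mask bit is 1.  The mask is
    imagined here as part of the evolved genotype, so feature selection
    happens during evolution rather than as a separate preprocessing step."""
    return [x for x, keep in zip(sample, mask) if keep]

# Features 1 and 3 are switched off for this candidate network.
reduced = apply_feature_mask([0.3, 1.7, -0.2, 4.1], [1, 0, 1, 0])
```

Mutating the mask alongside the other genes lets evolution trade input dimensionality against classification accuracy.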

Another contribution of this work is the introduction of a rotation parameter Ri, represented as an n × n matrix where n is the total number of input features. Rotation matrices have not been used in any similar applications yet, due to non-differentiability issues and high computational cost. Our intent is to exploit rotations so that the approximation capability of WNNs can be correctly assessed.
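A sketch of how a rotation parameter could enter a two-dimensional wavelon (an illustrative formulation, not the paper's exact one): the input is translated, rotated by the matrix, then scaled per dimension before evaluating a radial Mexican hat. Note that rotation only changes the output when the per-dimension scales differ:

```python
import math

def rotation_2d(theta):
    """2x2 rotation matrix; in general the rotation parameter R_i is an
    n x n matrix, where n is the number of input features."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matvec(M, v):
    """Plain matrix-vector product."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def rotated_wavelon(x, translation, scales, theta):
    """Translate, rotate, then scale the input per dimension and evaluate
    a radial Mexican hat.  Illustrative only; rotation has an effect only
    when the scales are anisotropic."""
    u = matvec(rotation_2d(theta), [xi - ti for xi, ti in zip(x, translation)])
    r2 = sum((ui / si) ** 2 for ui, si in zip(u, scales))
    return (len(x) - r2) * math.exp(-0.5 * r2)
```

With equal scales the wavelet is radially symmetric and the rotation gene is neutral; with unequal scales the rotation orients the wavelet's elongated axis, which is what makes it worth evolving.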

In two of our previous publications [29], [42] we used CGP to evolve wavelet parameters for a one-dimensional WNN. The present manuscript reports a separate study on multi-dimensional WNNs and the introduction of the rotation parameter for function approximation. The structure of this paper is as follows. Section 2 describes WNNs, their properties and their tuning parameters, with visual examples. This section also introduces the mechanism used for building wavelet networks via Cartesian Genetic Programming (CGPWNN), which constitutes the main technical contribution of the paper. Sections 3–5 present the application of WNNs to three test problems: the standard 2D spiral benchmark, breast cancer classification via mammographic images, and Parkinson’s disease detection via speech signal analysis. Section 6 presents conclusions and possible directions for future research.

Section snippets

Wavelet neural networks

WNNs represent a class of neural networks with wavelets as activation functions; i.e., they combine the theory of wavelet transforms and neural networks [43]. WNNs generally have a feed-forward structure with one hidden layer, as shown in Fig. 1, and their activation functions are drawn from an orthonormal wavelet family. The most common wavelet activation functions are the Gaussian, Mexican hat, Morlet and Haar wavelets [44].

Three parameters play a significant role in the tuning of wavelets for

Case study I: Two-spiral task

The two-spiral task is a benchmark for non-linear classification [71], [72]. The dataset consists of two spirals, each with 97 sample points in a 2D Cartesian space (shown in Fig. 11). The objective is to classify each sample point as belonging to one of the spirals using only its (x, y) Cartesian coordinates.
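A common construction of this benchmark (following the original CMU formulation; the paper may use a pre-generated copy of the data) places 97 points per spiral, with the second spiral the point reflection of the first:

```python
import math

def two_spirals(points_per_spiral=97):
    """Generate the classic two-spiral benchmark: 97 labeled (x, y)
    points per spiral, the second spiral mirroring the first through
    the origin."""
    data = []
    for i in range(points_per_spiral):
        phi = i * math.pi / 16.0
        radius = 6.5 * (points_per_spiral + 7 - i) / (points_per_spiral + 7)
        x, y = radius * math.sin(phi), radius * math.cos(phi)
        data.append(((x, y), 0))    # first spiral
        data.append(((-x, -y), 1))  # second spiral: point reflection
    return data
```

The interleaved, non-linearly separable geometry is what makes the task a standard stress test for classifiers.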

In this study, the two-spiral task is investigated under three different configurations of the wavelet neural network.

Case study II: Breast cancer classification

The Digital Database for Screening Mammography (DDSM) [74], [75] is an online repository of mammographic images of different resolutions obtained from various hospitals. The suspicious areas on the mammograms are manually marked by two experienced radiologists. For analysis, these markings are represented as chain codes and hence can be extracted easily.
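The chain-code markings can be decoded back into boundary pixel coordinates in a few lines; the sketch below assumes the standard 8-connected Freeman convention (direction 0 = east, counted counter-clockwise in image coordinates), which may differ from DDSM's exact convention:

```python
# 8-connected Freeman directions, 0 = east, counter-clockwise, with the
# y axis pointing down as in image coordinates (an assumed convention).
DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)]

def decode_chain_code(start, codes):
    """Recover the boundary pixel coordinates of a marked region from its
    starting point and Freeman chain code."""
    x, y = start
    boundary = [(x, y)]
    for c in codes:
        dx, dy = DIRECTIONS[c]
        x, y = x + dx, y + dy
        boundary.append((x, y))
    return boundary
```

Storing only a start point plus one direction digit per boundary pixel is what makes the marked regions compact and easy to extract.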

In the dataset used by [76], mammographic images scanned by a HOWTEK scanner at 43.5 microns per pixel spatial resolution were downloaded and

Case study III: Sakar’s Parkinson’s disease dataset classification

This part of the research uses a recent, publicly available dataset from the University of California at Irvine (UCI) online machine learning repository [98], [99]. The dataset contains features extracted from the speech signals of Parkinson’s disease (PD) affected and healthy individuals.

Case study IV: Little’s Parkinson’s disease dataset classification

This case study uses a Parkinson’s disease dataset donated by Max Little to the University of California Irvine’s machine learning repository [103], [104]. This dataset consists of multiple recordings of the same speech from 31 individuals. Each individual has 6 or 7 speech records. A total of 22 features were extracted from each speech sample using acoustic analysis software. Details of the dataset and the research literature surrounding its usage can be found in [29].

Conclusion and future work

Wavelet neural networks (WNNs) combine the characteristics of wavelet transforms and neural networks. They have been the focus of many studies, including studies on time-series prediction and approximation of 1D functions. One of the contributions of our study is the introduction of a rotation parameter for multi-dimensional networks. In addition, we have proposed a genetic algorithm to evolve all of the parameters of the network to obtain better classification accuracies. We have applied these

Acknowledgment

The first author would like to acknowledge the support through an Australian Government Research Training Program Scholarship.


References (109)

  • R. Cheng et al.

    Radial wavelet neural network with a novel self-creating disk-cell-splitting algorithm for license plate character recognition

    Entropy (Basel)

    (2015)
  • J. Guillermo et al.

    Intelligent classification of real heart diseases based on radial wavelet neural network

    Proceedings of the Cairo International Biomedical Engineering Conference (CIBEC2014)

    (2014)
  • L. Breiman et al.

    Classification and Regression Trees

    (1984)
  • RuleQuest Research, C5.0: An informal tutorial, 2010, Accessed March 2017, URL...
  • S. Pour et al.

    Comparing data mining with ensemble classification of breast cancer masses in digital mammograms

    Proceedings of the Second Australian Workshop on Artificial Intelligence in Health: AIH 2012

    (2012)
  • UCI Machine Learning Repository: Parkinson Speech Dataset with Multiple Types of Sound Recordings Data Set, URL...
  • S.G. Mallat

    A theory for multiresolution signal decomposition: the wavelet representation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1989)
  • B.-G. Xu et al.

    Pattern recognition of motor imagery EEG using wavelet transform

    J. Biomed. Sci. Eng.

    (2008)
  • S. Kadambe et al.

    Adaptive wavelets for signal classification and compression

    Int. J. Electron. Commun.

    (2006)
  • H.H. Szu et al.

    Neural network adaptive wavelets for signal representation and classification

    Opt. Eng.

    (1992)
  • H. Szu et al.

    Wavelet transforms and neural networks for compression and recognition

    Neural Netw.

    (1996)
  • J. Zhang et al.

    Wavelet neural networks for function learning

    IEEE Trans. Signal Process.

    (1995)
  • H.-J. Yang et al.

    Wavelet neural network with improved genetic algorithm for traffic flow time series prediction

    Opt. - Int. J. Light Electron. Opt.

    (2016)
  • S. Yao et al.

    Evolving wavelet neural networks

    Proceedings of the IEEE International Conference on Neural Networks

    (1995)
  • F. Qiu et al.

    Air traffic flow of genetic algorithm to optimize wavelet neural network prediction

    Proceedings of the IEEE International Conference on Software Engineering and Service Science (ICSESS’2014)

    (2014)
  • H.-j. Yang et al.

    Wavelet neural network with improved genetic algorithm for traffic flow time series prediction

    Opt. - Int. J. Light Electron. Opt.

    (2016)
  • H. Zhao et al.

    Analysis of energy consumption prediction model based on genetic algorithm and wavelet neural network

    Proceedings of the 3rd International Workshop on Intelligent Systems and Applications (ISA’2011)

    (2011)
  • D. Sahoo et al.

    Evolutionary wavelet neural network for large scale function estimation in optimization

    Proceedings of the 11th Multidisciplinary Analysis and Optimization Conference (AIAA/ISSMO)

    (2006)
  • J. Xu

    A genetic algorithm for constructing wavelet neural networks

    Proceedings of the International Conference on Intelligent Computing (ICIC’2006)

    (2006)
  • Y. Luo et al.

    A niche hierarchy genetic algorithms for learning wavelet neural networks

    Proceedings of the 2nd IEEE Conference on Industrial Electronics and Applications

    (2007)
  • M. Huang et al.

    A novel learning algorithm for wavelet neural networks

  • Y.-C. Huang et al.

    Evolving wavelet networks for power transformer condition monitoring

    IEEE Trans. Power Deliv.

    (2002)
  • H. Liang-yong et al.

    Immune evolutionary algorithm of wavelet neural network to predict the performance in the centrifugal compressor and research

    Proceedings of the Third International Conference on Measuring Technology and Mechatronics Automation (ICMTMA’2011)

    (2011)
  • G.C. Liao

    Application a novel evolutionary computation algorithm for load forecasting of air conditioning

    Proceedings of the Asia-Pacific Power and Energy Engineering Conference

    (2012)
  • N. Chauhan et al.

    Differential evolution trained wavelet neural networks: Application to bankruptcy prediction in banks

    Expert Syst. Appl.

    (2009)
  • M. Awad

    Using genetic algorithms to optimize wavelet neural networks parameters for function approximation

    Int. J. Comp. Sci. Issues

    (2014)
  • L. Jinru et al.

    Fault diagnosis of piston compressor based on wavelet neural network and genetic algorithm

    Proceedings of the 7th World Congress on Intelligent Control and Automation (WCICA’2008)

    (2008)
  • S. Ling et al.

    Improved hybrid particle swarm optimized wavelet neural network for modeling the development of fluid dispensing for electronic packaging

    IEEE Trans. Ind. Electron.

    (2008)
  • M.M. Khan, S.K. Chalup, A. Mendes, Parkinson's disease data classification using evolvable wavelet neural networks, in:...
  • Z. Guo et al.

    Self-adaptive differential evolution with global neighborhood search

    Soft Comput.

    (2016)
  • Z. Guo et al.

    Enhancing social emotional optimization algorithm using local search

    Soft Comput.

    (2016)
  • J.F. Miller, P. Thomson, Cartesian genetic programming, in: Proceedings of the European Conference on...
  • M. Khan et al.

    Fast learning neural networks using cartesian genetic programming

    Neurocomputing

    (2013)
  • J. Miller

    What bloat? Cartesian genetic programming on boolean problems

    Proceedings of the Genetic and Evolutionary Computation Conference (GECCO2001) - Late Breaking Papers

    (2001)
  • V.K. Vassilev, J.F. Miller, The advantages of landscape neutrality in digital circuit evolution, in: Proceedings of the...
  • T. Yu, J. Miller, Neutrality and the evolvability of Boolean function landscape, in: Proceedings of the 4th European...
  • T. Yu, J. Miller, Finding needles in haystacks is not hard with neutrality, in: Proceedings of the 5th European...
  • M. Khan et al.

    Efficient representation of recurrent neural networks for Markovian/non-Markovian non-linear control problems

    Proceedings of the International Conference on System Design and Applications (ISDA2010)

    (2010)
  • A. Walker et al.

    Solving real-valued optimisation problems using cartesian genetic programming

    Proceedings of the Genetic and Evolutionary Computation Conference (GECCO2007)

    (2007)
  • J.A. Walker, J.F. Miller, Predicting prime numbers using cartesian genetic programming, in: Proceedings of the 10th...

    Maryam Mahsal Khan received her B.Sc. in Computer System Engineering from the University of Engineering & Technology Peshawar, Pakistan, in 2005 and her Master's in Electrical & Electronic Engineering from Universiti Teknologi PETRONAS, Malaysia, in 2008. Before commencing her Ph.D. at the University of Newcastle, she worked as an Assistant Professor at UET Peshawar, Pakistan and later as a Research Engineer at LMKR Pvt. Ltd, Islamabad, Pakistan. She has a keen interest in non-linear control, genetic algorithms and genetic programming, artificial neural networks, pattern recognition, image processing, signal processing, and time-frequency decomposition. She has a range of publications in these fields in reputable conferences.

    Alexandre Mendes received his Ph.D. degree in Electrical Engineering from the State University of Campinas, Brazil, in 2003. He is a Senior Lecturer with the School of Electrical Engineering and Computer Science at The University of Newcastle, Australia. His research interests include optimization and data mining, with applications in bioinformatics, robotics and operations research.

    Dr. Ping Zhang is a Research Fellow at Menzies Health Institute Queensland, Griffith University Australia. She has worked in bioinformatics and health informatics research area in the last 10 years. Her research interests include pattern recognition, biomarkers discovery, vaccine target identification and applying machine learning and statistical techniques for medical decision making.

    Stephan Chalup is an Associate Professor in Computer Science and Software Engineering at the University of Newcastle, Australia, where he leads the Interdisciplinary Machine Learning Research Group. He received his Ph.D. (Machine Learning) in 2002 from Queensland University of Technology in Brisbane, Australia. His research interests include manifold learning, kernel machines, humanoid robots, computer vision, and neural information processing systems.
