Genetic programming for automatic skin cancer image classification

https://doi.org/10.1016/j.eswa.2022.116680

Highlights

  • Proposed methods construct new models with texture, color, and wavelet features.

  • Evolved features are highly informative to discriminate between skin image classes.

  • New features improve classification accuracy and are efficient in real-time clinical settings.

  • The methods identify prominent visual features to help the dermatologist in making a diagnosis.

  • Achieved 86.77% accuracy on a difficult dataset, outperforming state-of-the-art methods.

Abstract

Developing a computer-aided diagnostic system for detecting various types of skin malignancy from images has attracted many researchers. However, analyzing the behavior of algorithms is as important as developing new systems, since how effective a system is in real-time situations greatly impacts how well it can assist the dermatologist in making a diagnosis. Unlike many machine learning approaches such as artificial neural networks, Genetic Programming (GP) automatically evolves models thanks to its dynamic representation and flexibility. This study aims to analyze recently developed GP-based approaches to skin image classification. These approaches utilize the intrinsic feature selection and feature construction abilities of GP to construct informative features from a variety of pre-extracted features encompassing local, global, texture, color, and multi-scale properties of skin images. The performance of these GP methods is assessed using two real-world skin image datasets, captured with a standard camera and with specialized instruments, and compared with six commonly used classification algorithms as well as existing GP methods. The results reveal that the constructed features greatly improve the performance of the machine learning classification algorithms. Unlike “black-box” algorithms such as deep neural networks, GP models are interpretable; our analysis therefore shows that these methods can help dermatologists identify prominent skin image features. Further, they can help researchers identify feature extraction methods suited to images captured with a specific instrument. Being fast, these methods can be deployed for making a quick and effective diagnosis in actual clinical situations.

Introduction

Skin cancer is a serious and challenging public health concern, with over 5 million new cases diagnosed every year in the United States (Siegel et al., 2019). Melanoma is the most serious form of skin cancer and becomes life-threatening if not treated early (Matthews et al., 2017). In 2019, the incidence of skin cancer in the United States was estimated to be over 104,350 cases, with almost 11,650 deaths (Siegel et al., 2019). While the mortality is quite high, the survival rate for melanoma exceeds 95% when it is diagnosed in its earlier stages (Matthews et al., 2017). The dramatic rise in the incidence of skin cancer, unpleasant biopsy procedures, and huge medical costs have made early diagnosis a top public health priority.

When examining a skin lesion, dermatologists commonly follow the ABCD (Asymmetry, Border irregularity, Color variation, and Dermoscopic structure) rule of dermoscopy (Stolz et al., 1994). This rule calculates a score from these four lesion properties to effectively separate different kinds of skin cancer images (Kasmi & Mokrani, 2016); a sketch of this scoring follows this paragraph. Another commonly used clinical methodology is the 7-point checklist (Streaks, Asymmetry, Blue-whitish veil, Dots, Regression areas, Pigment network, and presence of six colors: red, white, light-brown, dark-brown, blue-gray, black) (Argenziano et al., 1998). These significant visual properties, together with the availability of an enormous number of skin cancer images, have motivated numerous researchers to develop CAD frameworks that can help dermatologists with early detection. Automatic skin cancer image classification is a very difficult task, mainly because of (1) the low contrast of skin lesions, (2) the varying location of the lesion in an image, (3) the various artifacts in an image, e.g., gel, reflection, and hair, (4) the immense intra-class variations of melanomas, and (5) the huge inter-class similarity between different kinds of skin cancers (Yu et al., 2017). It is therefore necessary to design strategies that capture informative features which, in some way, imitate these clinical properties and hence utilize different texture, color, local, and global features. Local features are extracted from a sub-image, whereas global features are extracted from the entire image. Diagnostic systems or classification methods are particularly useful when they not only classify a kind of skin cancer accurately and quickly in actual circumstances but also identify significant features that help the dermatologist learn the critical visual patterns of skin lesions. Moreover, the original set of features extracted from images may be redundant or irrelevant, i.e., may not contain significant information for accurately classifying these images. In such cases, feature selection (FS) and feature construction (FC) methods help pick important features and generate new features from the original set to achieve improved performance (Ahmed et al., 2014, Ain et al., 2018b, Tran et al., 2015). This study analyzes classification methods developed for binary and multi-class skin cancer image classification, which exploit the FS and FC characteristics of GP to help the classification model learn better and achieve good performance.
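As a concrete illustration of this scoring, the following minimal sketch computes the Total Dermoscopy Score (TDS) commonly associated with the ABCD rule; the weights and decision thresholds are the widely cited ones and are assumed here for illustration rather than taken from this paper:

```python
def total_dermoscopy_score(asymmetry: int, border: int, colors: int, structures: int) -> float:
    """Total Dermoscopy Score (TDS) of the ABCD rule of dermoscopy.

    asymmetry:  0-2 (number of axes about which the lesion is asymmetric)
    border:     0-8 (number of border segments with an abrupt cut-off)
    colors:     1-6 (number of colors present in the lesion)
    structures: 1-5 (number of dermoscopic structures present)
    """
    return 1.3 * asymmetry + 0.1 * border + 0.5 * colors + 0.5 * structures

# Commonly cited interpretation: TDS < 4.75 benign, 4.75-5.45 suspicious,
# and TDS > 5.45 highly suggestive of melanoma.
print(total_dermoscopy_score(asymmetry=2, border=4, colors=4, structures=3))  # 6.5
```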

In recent years, convolutional neural networks (CNNs) have become popular in dermoscopy image analysis. Codella et al. (2015) used the Caffe architecture to perform feature extraction. Esteva et al. (2017) used a very large private dataset consisting of both clinical and dermoscopy images to train an Inception network, achieving performance close to that of a human expert. Recently, Kassem et al. (2020) utilized transfer learning with the pre-trained deep neural network GoogleNet to classify eight classes of skin cancer. Zhang et al. (2020) developed an improved whale optimization algorithm for the optimal selection of weights in CNNs for melanoma detection. However, with the limited size of available datasets, it is usually infeasible to train a CNN effectively from scratch.
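To make the transfer-learning setup concrete, here is a minimal sketch, not the authors' code, assuming torchvision (>= 0.13) and its pretrained GoogLeNet weights as a stand-in for the pipeline described by Kassem et al. (2020):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load GoogLeNet pretrained on ImageNet and replace the final fully
# connected layer with a new head for 8 lesion classes (the ISIC 2019
# setting used by Kassem et al., 2020).
model = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 8)

# Freeze the pretrained backbone; fine-tune only the new classifier head.
for name, param in model.named_parameters():
    if not name.startswith("fc."):
        param.requires_grad = False

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One (dummy) training step on a batch of 224x224 RGB images.
images, labels = torch.randn(4, 3, 224, 224), torch.randint(0, 8, (4,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```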

Genetic programming (GP) is an evolutionary computation method which solves a given problem by automatically evolving computer programs, often represented as trees (Koza, 1992). GP applies genetic operators such as crossover, mutation, and reproduction to the current generation of programs to produce a new generation (Koza & Poli, 2003). The success of GP relies on its algorithmic features: (1) it makes no explicit assumption about the problem; (2) it is flexible enough to combine with existing approaches and benefit from the best features of different methods; (3) its population-based, randomized search makes it robust and less likely to become trapped in sub-optimal solutions; and (4) it can produce unexpected solutions which humans would not presume effective for design domains (Eiben & Smith, 2015). Since not all features are important for classification, GP utilizes its built-in FS ability and usually selects the most significant features at its leaf nodes (terminals). These selected features usually have a better ability to distinguish images of different classes, which greatly helps achieve performance gains. A program evolved by GP can be considered a binary classifier and/or a newly constructed feature (CF) that can help improve classification accuracy. GP has not only been utilized for classification, but has also been explored extensively for FS and FC (Ahmed et al., 2014, Ain et al., 2018b, Tran et al., 2015, Tran et al., 2016). In the field of image analysis, GP has been extensively utilized for a wide range of applications including object detection (Zhang et al., 2003), feature extraction (Lensen et al., 2016), feature construction (Lensen et al., 2016), evolving texture image descriptors (Al-Sahaf et al., 2017a, Al-Sahaf et al., 2017b), and classification (Choi and Choi, 2010, Tackett, 1993).
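To make the evolutionary loop and a wrapper-style fitness concrete, here is a minimal, self-contained sketch using the DEAP library (a choice made for illustration; not the implementation used in the paper). Each evolved tree is one constructed feature over pre-extracted image features, and fitness is the cross-validated accuracy of a k-NN classifier on that feature:

```python
import operator
import random

import numpy as np
from deap import algorithms, base, creator, gp, tools
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

random.seed(0)

# Toy data standing in for pre-extracted skin-image features
# (rows = images, columns = e.g. texture/color descriptors).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = rng.integers(0, 2, size=100)

def protected_div(a, b):
    # Division returning 1.0 for near-zero denominators, a common GP convention.
    return a / b if abs(b) > 1e-6 else 1.0

# Terminals are the 10 input features; internal nodes are arithmetic
# operators, so each evolved tree is one constructed feature (CF).
pset = gp.PrimitiveSet("CF", X.shape[1])
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.sub, 2)
pset.addPrimitive(operator.mul, 2)
pset.addPrimitive(protected_div, 2)

creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=4)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("compile", gp.compile, pset=pset)

def evaluate(individual):
    # Wrapper-style fitness: cross-validated k-NN accuracy on the single CF.
    func = toolbox.compile(expr=individual)
    cf = np.array([func(*row) for row in X]).reshape(-1, 1)
    return (cross_val_score(KNeighborsClassifier(), cf, y, cv=3).mean(),)

toolbox.register("evaluate", evaluate)
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", gp.cxOnePoint)
toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)

pop = toolbox.population(n=50)
algorithms.eaSimple(pop, toolbox, cxpb=0.8, mutpb=0.2, ngen=10, verbose=False)
best = tools.selBest(pop, k=1)[0]
print(best)  # an interpretable expression over the original features
```

The printed expression is itself the interpretable model: unlike a neural network's weights, it names exactly which input features were selected and how they are combined.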

In view of the evaluation criteria, FS algorithms can be classified into three categories: wrapper, filter, and embedded approaches. While a wrapper approach incorporates a learning (classification) method in evaluating a feature subset, a filter approach does not utilize any classification method (Xue et al., 2016). An embedded approach integrates classifier learning and feature selection into a single procedure (Xue et al., 2016).
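To make the taxonomy concrete, a small sketch (using scikit-learn, an assumption of this illustration) contrasts a filter and a wrapper selector; an embedded example would be, for instance, the feature weights an L1-regularized classifier learns during training:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, SequentialFeatureSelector, mutual_info_classif
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 20))   # stand-in for pre-extracted image features
y = rng.integers(0, 2, size=120)

# Filter: rank features by mutual information with the class label,
# without consulting any classifier.
X_filtered = SelectKBest(mutual_info_classif, k=5).fit_transform(X, y)

# Wrapper: greedily grow a feature subset, scoring every candidate subset
# by the cross-validated accuracy of an actual classifier.
sfs = SequentialFeatureSelector(KNeighborsClassifier(), n_features_to_select=5, cv=3)
X_wrapped = sfs.fit_transform(X, y)

print(X_filtered.shape, X_wrapped.shape)  # (120, 5) (120, 5)
```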

In comparison with single-tree GP, where an individual contains only a single tree, multi-tree GP (MTGP) allows multiple trees in one individual (Oltean & Dumitrescu, 2006). In the literature, MTGP has been studied for texture image descriptors (Al-Sahaf et al., 2017b), multi-class classification (Muni et al., 2004), and high-dimensional data classification (Ain et al., 2018a). This study analyzes four MTGP-based methods developed for skin cancer image classification, where the multiple trees evolved in a GP individual correspond to either binary classifiers or CFs. These methods utilize various feature extraction methods to extract different kinds of informative features, including local, global, color, texture, and multi-resolution characteristics of skin images. They automatically generate multiple trees such that each tree is generated from a specific characteristic, e.g., one tree from gray-scale features and another from border-shape features. Moreover, unlike existing classification methods that are developed to produce viable outcomes for a single image modality, the MTGP methods studied in this work are robust and can perform well across images captured by multiple optical instruments. Previous methods mostly used a single type of feature, e.g., either color or texture, to classify skin images. The MTGP methods not only employ different types of features, such as texture, color, geometrical-shape, and frequency-based features, but also define a suitable way of combining them to achieve high classification accuracy. Existing approaches do not automatically construct new informative features for skin images, which can significantly improve classification performance; constructing new features from pre-extracted image features has not been explored in previous studies. Although this study is based on two previously published methods, this paper substantially extends the two short papers (Ain et al., 2018b, Ain et al., 2019) by providing comparisons and detailed analysis of GP-based multiple feature construction methods, including (1) how fast and effective these methods can be in computing results in actual clinical circumstances, and (2) which image features are most prominent and can help the dermatologist in identifying further medical procedures.
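The following conceptual sketch illustrates the multi-tree idea: one tree per feature group, with the trees' outputs concatenated into a new low-dimensional representation. The group sizes are hypothetical, and fixed stub expressions stand in for what would actually be evolved GP trees:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Pre-extracted feature groups for each image (hypothetical sizes).
rng = np.random.default_rng(2)
groups = {
    "texture": rng.normal(size=(100, 8)),
    "color":   rng.normal(size=(100, 6)),
    "shape":   rng.normal(size=(100, 4)),
    "wavelet": rng.normal(size=(100, 10)),
}
y = rng.integers(0, 2, size=100)

# An MTGP individual holds one tree per feature group; here each "tree"
# is stubbed as a fixed arithmetic expression over its group's columns.
individual = {
    "texture": lambda g: g[:, 0] * g[:, 3] - g[:, 5],
    "color":   lambda g: g[:, 1] + g[:, 2] / 2.0,
    "shape":   lambda g: g[:, 0] - g[:, 2],
    "wavelet": lambda g: g[:, 4] * g[:, 7],
}

# Each tree yields one constructed feature; concatenating them gives a
# 4-dimensional representation fed to a wrapped classifier.
cfs = np.column_stack([tree(groups[name]) for name, tree in individual.items()])
fitness = cross_val_score(KNeighborsClassifier(), cfs, y, cv=3).mean()
print(cfs.shape, fitness)  # (100, 4) and the wrapper fitness
```

In the actual methods, the stubbed lambdas are GP trees evolved jointly within one individual, so the wrapper fitness drives their co-adaptation across feature groups.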

Considering all the significant factors discussed above, we analyze four methods recently developed for classifying real-world skin cancer images: (1) an embedded MTGP approach with four trees in its classification model (Ain et al., 2018a), (2) an extension of Ain et al. (2018a) with five trees in its classification model, (3) a wrapper MTGP approach with four trees (CFs) (Ain et al., 2019), and (4) an extension of Ain et al. (2019) with five trees (CFs).

The following comparisons and analysis will be investigated in this work:

  • Single-tree versus multi-tree GP methods for binary classification and multi-class classification tasks.

  • 4-tree versus 5-tree GP representation for multiple feature construction.

  • Wrapper versus embedded approach in GP for skin image classification.

  • Visualizing the constructed features.

  • Analyzing efficiency in terms of computation time.

  • Identifying prominent features to diagnose a type of cancer.

This study can help to: (1) validate the effectiveness of these methods in real-time situations for making a quick diagnosis, (2) check how accurately melanoma images are detected, (3) check how well multiple kinds of skin cancer images are characterized in a multi-class classification task, (4) analyze and identify significant selected features which play a vital role in generating good trees and, subsequently, in classification, (5) help the dermatologist learn important visual features for diagnosing a cancer type, and (6) help computer vision researchers identify which kinds of extracted features are suitable for images taken with a specific instrument when developing more effective CAD systems.

Section snippets

Background

This section first describes the basics of the GP algorithm. It then describes the methods used to extract the various kinds of features that the four MTGP-based methods utilize for skin image classification.

Multiple feature construction methods

To ease understanding of the different methods compared and analyzed in this paper, this section briefly describes the two versions of four-tree GP (i.e., wrapper (WGP-4) and embedded (EGP-4)) and the two versions of five-tree GP (i.e., wrapper (WGP-5) and embedded (EGP-5)). The embedded methods are used only for binary classification, whereas the wrapper methods are used for both binary and multi-class classification.

Datasets

The four methods discussed in Section 3 are tested using two benchmark datasets of skin cancer images. These datasets vary in terms of: (1) the image size, (2) the optical device used to capture the images, (3) the number of classes, and (4) the various artifacts occluding the lesion area, such as reflection, hair, and gel. The details of the two datasets are listed in Table 4.

Results and discussions

This section presents and describes the findings of the experiments. The results are expressed as the mean and standard deviation over the 30 GP runs. For binary classification, the results are reported in terms of sensitivity, specificity, and balanced accuracy, with mean and standard deviation (x̄±s) shown in Table 5. For the multi-class classification task, the balanced accuracy (x̄±s) is shown in Table 6. The deterministic methods’ results are given in terms of the mean of
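For reference, the balanced accuracy for the binary task follows the standard definition (assumed here rather than quoted from the truncated text), the mean of sensitivity and specificity:

$$\text{Balanced accuracy} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right)$$

where the first term is the sensitivity and the second the specificity; for the multi-class task this generalizes to the average of the per-class recalls.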

Computation time

The average time required to train the WGP-4, WGP-5, EGP-4, and EGP-5 methods and to evaluate their performance on the test data is plotted for binary and multi-class classification in Fig. 10 and Fig. 11, respectively. From these plots, we can see that the time taken to evolve a model is typically influenced by the number of trees in a GP individual, the number of classes and the number of images in a dataset, the number of input features used to generate a tree(s) in a GP

Conclusions

This work has analyzed four recent multi-tree GP based methods developed for skin cancer image classification: two embedded and two wrapper methods. The embedded methods exploit the ability of GP to automatically evolve multiple binary classifiers in a single evolved GP individual, whereas the wrapper approaches utilize its ability to automatically construct multiple features. The embedded approaches are used only for binary classification, and the wrapper approaches are employed to solve

CRediT authorship contribution statement

Qurrat Ul Ain: Conceptualization, Methodology, Software, Validation, Formal Analysis, Writing – original draft, Visualization. Harith Al-Sahaf: Supervision, Project administration, Writing – review & editing. Bing Xue: Supervision, Writing – review & editing, Project administration, Funding acquisition. Mengjie Zhang: Supervision, Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported in part by the University Research Fund at Victoria University of Wellington, New Zealand grant number 216378/3764 and 223805/3986, the Science for Technological Innovation Challenge (SfTI), New Zealand fund under grant E3603/2903, the Marsden Fund of New Zealand Government, New Zealand under Contracts VUW1509 and VUW1615, and the MBIE Data Science SSIF Fund, New Zealand under the contract RTVU1914.

References

  • Argenziano, G., et al. (1998). Epiluminescence microscopy for the diagnosis of doubtful melanocytic skin lesions: Comparison of the ABCD rule of dermatoscopy and a new 7-point checklist based on pattern analysis. Archives of Dermatology.

  • Ballerini, L., et al. A color and texture based hierarchical k-NN approach to the classification of non-melanoma skin lesions.

  • Barata, C., et al. Deep learning for skin cancer diagnosis with hierarchical architectures.

  • Chang, T., et al. (1993). Texture analysis and classification with tree-structured wavelet transform. IEEE Transactions on Image Processing.

  • Choi, W.-J., et al. Computer-aided detection of pulmonary nodules using genetic programming.

  • Codella, N., et al. Deep learning, sparse coding, and SVM for melanoma recognition in dermoscopy images.

  • Eiben, A. E., et al. (2015). From evolutionary computation to the evolution of things. Nature.

  • Esteva, A., et al. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature.

  • Garnavi, R., et al. (2012). Computer-aided diagnosis of melanoma using border- and wavelet-based texture analysis. IEEE Transactions on Information Technology in Biomedicine.

  • Hall, M., et al. (2009). The WEKA data mining software: An update. SIGKDD Explorations Newsletter.

  • Harangi, B., et al. Classification of skin lesions using an ensemble of deep neural networks.

  • Iqbal, M., et al. (2017). Cross-domain reuse of extracted knowledge in genetic programming for image classification. IEEE Transactions on Evolutionary Computation.

  • Kasmi, R., et al. (2016). Classification of malignant melanoma and benign skin lesions: Implementation of automatic ABCD rule. IET Image Processing.

  • Kassem, M. A., et al. (2020). Skin lesions classification into eight classes for ISIC 2019 using deep convolutional neural network and transfer learning. IEEE Access.

  • Kawahara, J., et al. Deep features to classify skin lesions.