Classification of human cancer diseases by gene expression profiles
Introduction
The term cancer is used to identify diseases in which abnormal cells divide without control. This uncontrolled division causes a lump, called a tumor, to form, or rogue immune system cells to develop; the abnormal cells can invade other tissues and spread to other parts of the body through the blood and lymph systems. There are more than 100 different types of cancer, mostly named by the location in the body where the cancer first developed or by the type of tissue cell in which it originates (histological type). By type of tissue cell, cancers may be grouped into six major categories: carcinoma, sarcoma, myeloma, leukemia, lymphoma and mixed types. By primary site of origin, cancers may be of specific types such as breast cancer, lung cancer, prostate cancer, liver cancer, renal cell carcinoma (kidney cancer), oral cancer, brain cancer, etc. In cancer diagnosis, classification of the different tumor types is of the greatest importance. An accurate prediction of the tumor type enables better treatment and minimizes toxicity for patients. Traditional methods of tackling this problem are mostly based on morphological characteristics of tumorous tissue. These conventional methods are reported to have several diagnostic limitations. Consequently, developing methodologies that can effectively distinguish between cancer subtypes is vital.
The capabilities of DNA microarrays have enabled the simultaneous monitoring of the expression levels of a large number of genes [1] and have motivated the rise of computational analysis, including machine learning techniques. These techniques have been used to extract patterns and build classification models from gene expression data, and have supported cancer prediction [2] and prognosis [3]. DNA microarray technology has been broadly used in cancer studies for disease prediction. It is a powerful platform that has been used effectively for the analysis of gene expression in a wide variety of experimental studies [4]. Using microarray technology, one can analyze the expression levels of thousands of genes from two test cell samples. Depending on the source of the samples, essential investigations, such as disease progression, accurate diagnosis, drug response and prognosis after treatment, can be carried out [5].
Many successful feature selection algorithms have been devised, and a review of feature selection algorithms can be found in [6]. Several earlier researchers [7], [8], [9], [10] studied how to measure the goodness of a feature subset in order to determine an optimal one. Feature selection is essentially an optimization problem. The authors of [11] proposed the efficient selection of discriminative genes from microarray gene expression data for cancer diagnosis. The study in [12] addressed dimension reduction for classification with gene expression microarray data. A comparison of general schemes for gene selection methods [13], [14], [15] is shown in Table 1.
This paper tackles the classification of human cancer diseases using gene expression profiles. It presents a new methodology to analyze microarray datasets and efficiently classify cancer diseases. The methodology first employs information gain (IG) for feature selection, then a genetic algorithm (GA) for feature reduction, and finally genetic programming (GP) for cancer disease classification. This approach (IG/GA) improves cancer classification accuracy by reducing the number of features and preventing the GA from being trapped in a local optimum. The proposed methodology is evaluated by classifying cancer diseases in seven cancer datasets, and the results are compared with the most recent approaches.
The rest of this paper is organized as follows. Section 2 defines the problem and its challenges. Section 3 presents a literature review of related work. Section 4 presents an overview of feature selection and information gain. Section 5 presents the proposed methodology, while the experimental results are discussed in Section 6. Finally, Section 7 presents the concluding remarks.
Problem definition and challenges
Gene classification as an area of research poses a new challenge because of its unique problem characteristics. First, a challenge originates from the unique nature of the available gene expression datasets: most of these datasets have a sample size below 200, versus thousands to hundreds of thousands of genes in each tuple. Second, only a small number of these genes are relevant to the investigated disease. Third, a challenge originates from the occurrence of
Related work
The analysis of gene expression data obtained in microarray experiments has been of great interest in the research areas of pattern recognition, machine learning, and statistics. Researchers across the world are attracted to the problem of discovering biologically interesting information in the expression data of so many genes. As stated before, the key problem is the ratio between the large number of genes measured per instance and the small number of available samples. Gene selection
Feature selection and information gain
Feature selection is a preprocessing step that aims to select the most informative genes, those that can separate groups such as cancer subtypes. The main goal is to find a reduced set of features from a dataset in order to decrease the dimensionality of the original feature space. Generally, cancer classification studies require the use of formal feature selection strategies for two reasons:
- To lower the computational requirements of experimental tasks, which helps the
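As a rough illustration of how information gain can rank genes, the sketch below computes IG(Y; X) = H(Y) - H(Y | X) for each gene, where Y is the class label and X is a discretized expression level. It is a minimal sketch under stated assumptions, not the authors' implementation: the equal-width binning, the default of two bins, and the function names `entropy`, `information_gain` and `rank_genes_by_ig` are all hypothetical choices made for this example.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(feature, labels, n_bins=2):
    """IG of one continuous feature after equal-width binning.

    The binning scheme is an assumption of this sketch; the source
    does not say how expression values are discretized.
    """
    # Interior bin edges only, so values fall into n_bins groups.
    edges = np.linspace(feature.min(), feature.max(), n_bins + 1)[1:-1]
    binned = np.digitize(feature, edges)
    h_cond = 0.0
    for b in np.unique(binned):
        in_bin = binned == b
        # Weight each bin's label entropy by the fraction of samples in it.
        h_cond += in_bin.mean() * entropy(labels[in_bin])
    return entropy(labels) - h_cond

def rank_genes_by_ig(X, y, n_bins=2):
    """Return gene indices sorted by decreasing IG, plus the IG values.

    X has shape (n_samples, n_genes); y has shape (n_samples,).
    """
    gains = np.array([information_gain(X[:, j], y, n_bins)
                      for j in range(X.shape[1])])
    order = np.argsort(-gains, kind="stable")
    return order, gains
```

In use, one would keep only the top-ranked columns, for example `X[:, order[:k]]` for some cutoff `k`; how the cutoff is chosen is left open here.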
Proposed methodology
Fig. 1 shows the general framework of the proposed approach. The methodology first accepts a gene microarray dataset as input patterns and selects the significant features (feature selection) from the input patterns by using IG. The selected features are then reduced by applying GA. Finally, the methodology employs GP for cancer type classification [32].
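The feature reduction step can be pictured with a small genetic algorithm that evolves binary masks over the genes kept by the earlier selection step. The sketch below is illustrative only and rests on several assumptions that are not taken from the paper: the fitness is the leave-one-out accuracy of a nearest-centroid classifier (a stand-in, not the GP classifier named above), a small penalty discourages large subsets, and the operators (tournament selection, single-point crossover, bit-flip mutation, elitism) as well as all parameter values and function names are hypothetical. The GP classification stage is not sketched here.

```python
import numpy as np

def loo_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Stand-in fitness for this sketch; assumes every class has at
    least two samples so no centroid is computed from an empty set.
    """
    classes = np.unique(y)
    idx = np.arange(len(y))
    hits = 0
    for i in idx:
        keep = idx != i
        centroids = np.array([X[keep & (y == c)].mean(axis=0) for c in classes])
        dists = np.linalg.norm(centroids - X[i], axis=1)
        hits += int(classes[np.argmin(dists)] == y[i])
    return hits / len(y)

def ga_reduce(X, y, pop_size=20, generations=15, p_mut=0.05, penalty=0.01, seed=0):
    """Evolve a boolean gene mask; returns (best_mask, best_fitness)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]

    def fitness(mask):
        if not mask.any():
            return 0.0  # an empty subset cannot classify anything
        # Accuracy minus a small penalty proportional to subset size.
        return loo_centroid_accuracy(X[:, mask], y) - penalty * mask.sum() / n

    pop = rng.random((pop_size, n)) < 0.5  # random initial masks
    best_mask, best_fit = None, -np.inf
    for _ in range(generations):
        fits = np.array([fitness(m) for m in pop])
        i_best = int(np.argmax(fits))
        if fits[i_best] > best_fit:
            best_fit, best_mask = float(fits[i_best]), pop[i_best].copy()
        # Binary tournament selection: the fitter of two random individuals wins.
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((fits[a] >= fits[b])[:, None], pop[a], pop[b])
        # Single-point crossover between consecutive parents.
        children = parents.copy()
        for k in range(0, pop_size - 1, 2):
            cut = int(rng.integers(1, n)) if n > 1 else 0
            children[k, cut:] = parents[k + 1, cut:]
            children[k + 1, cut:] = parents[k, cut:]
        # Bit-flip mutation, then elitism: keep the best mask seen so far.
        children ^= rng.random(children.shape) < p_mut
        children[0] = best_mask
        pop = children
    return best_mask, best_fit
```

The size penalty is one simple way to express the two goals at once (high accuracy, few genes); a multi-objective formulation would be an equally reasonable design.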
Experimental results
This section presents the performance evaluation of the proposed IG/SGA methodology. The proposed framework is evaluated on seven cancer gene expression datasets. For each test, two important criteria are used for the empirical assessment of performance:
- The number of selected genes.
- Predictive accuracy on the selected genes.
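The two criteria can be computed together from a gene mask and a set of predictions. The helper below is a minimal sketch; the name `evaluate_selection` and the dictionary keys are hypothetical, and how the predictions are obtained (for example, which validation scheme is used) is not specified by this snippet.

```python
import numpy as np

def evaluate_selection(mask, y_true, y_pred):
    """Report the number of selected genes and the predictive accuracy.

    mask   : boolean (or 0/1) vector, one entry per gene
    y_true : true class labels of the test samples
    y_pred : labels predicted using only the selected genes
    """
    mask = np.asarray(mask, dtype=bool)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {
        "n_selected_genes": int(mask.sum()),
        "accuracy": float(np.mean(y_true == y_pred)),
    }
```

For instance, a mask selecting two of five genes with three of four test samples predicted correctly yields 2 selected genes and an accuracy of 0.75.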
Conclusions
Classification of cancer based on gene expression data is a promising research area in the field of data mining. The proposed algorithm addresses the problem of early cancer diagnosis, since cancer is one of the world's most serious health issues. In this paper, a new methodology is presented to classify human cancer diseases based on gene expression profiles. In the proposed methodology, IG is first used for feature selection, then GA is used for feature reduction
References (54)
- et al. Genetic bee colony (GBC) algorithm: a new gene selection method for microarray cancer classification. Comput. Biol. Chem. (2015)
- et al. Gene expression data classification using support vector machine and mutual information-based gene selection. Procedia Comput. Sci. (2015)
- et al. Hybrid genetic algorithm-neural network: feature extraction for unpreprocessed microarray data. Artif. Intell. Med. (2011)
- et al. A fuzzy intelligent approach to the classification problem in gene expression data analysis. Knowl. Based Syst. (2012)
- et al. Hidden Markov models for cancer classification using gene expression profiles. (2015)
- et al. Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J. (2015)
- et al. Selecting significant genes by randomization test for cancer classification using gene expression data. J. Biomed. Inform. (2013)
- Gene expression correlates of clinical prostate cancer behavior. Cancer Cell (2002)
- et al. Identification of a 12-gene signature for lung cancer prognosis through machine learning. J. Cancer Ther. (2011)
- et al. Applications of machine learning in cancer prediction and prognosis, US national library of medicine national institutes of health. Cancer Inf. (2006)