Feature Selection and Ranking of Key Genes for Tumor Classification: Using Microarray Gene Expression Data

  • Conference paper
Artificial Intelligence and Soft Computing – ICAISC 2006 (ICAISC 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4029)

Abstract

In this paper we use a t-test for significant gene expression analysis in different dimensions, based on molecular profiles from microarray data, and compare several computational intelligence techniques by classification accuracy on the Leukemia, Lymphoma and Prostate cancer datasets of the Broad Institute and the Colon cancer dataset from the Princeton gene expression project. Classification accuracy is evaluated with Linear Genetic Programs, Multivariate Adaptive Regression Splines (MARS), Classification and Regression Trees (CART) and Random Forests. Linear Genetic Programs and Random Forests perform best at detecting the malignancy of different tumors. Our results demonstrate the potential of learning machines for diagnosing the malignancy of a tumor.

We also address the related issue of ranking the importance of input features, which is itself a problem of great interest. Eliminating the insignificant inputs (genes) simplifies the problem and can lead to faster and more accurate classification of microarray gene expression data. Experiments on selected cancer datasets have been carried out to assess the effectiveness of this criterion. Results show that using the significant features gives the best performance, consistently across the microarray gene expression datasets we used. The classifiers perform best with the most significant features on every dataset except the Prostate cancer dataset.
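As a rough illustration of the t-test ranking and gene elimination described above, the sketch below scores each gene with a two-sample t-statistic and keeps the top-ranked ones. It is not the authors' implementation: the abstract does not say which t-test variant was used, so Welch's unequal-variance form is an assumption here, and the function names (`welch_t`, `rank_genes`, `select_top`), the synthetic data, and the choice of `k` are all hypothetical.

```python
import math
import random

def welch_t(a, b):
    """Welch's two-sample t-statistic (assumed variant) for two lists of values."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variance, class a
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)  # sample variance, class b
    denom = math.sqrt(va / na + vb / nb)
    return (ma - mb) / denom if denom > 0 else 0.0

def rank_genes(samples, labels):
    """Return gene indices sorted by |t|, most discriminative first."""
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        pos = [s[g] for s, y in zip(samples, labels) if y == 1]
        neg = [s[g] for s, y in zip(samples, labels) if y == 0]
        scores.append(abs(welch_t(pos, neg)))
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)

def select_top(samples, ranking, k):
    """Keep only the k highest-ranked genes in every sample."""
    keep = ranking[:k]
    return [[s[g] for g in keep] for s in samples]

# Synthetic stand-in for a microarray matrix: 20 samples x 50 genes,
# with gene 7 made strongly class-dependent (hypothetical data).
rng = random.Random(0)
labels = [0] * 10 + [1] * 10
samples = []
for y in labels:
    row = [rng.gauss(0.0, 1.0) for _ in range(50)]
    row[7] += 10.0 * y
    samples.append(row)

ranking = rank_genes(samples, labels)
reduced = select_top(samples, ranking, k=5)
```

The reduced matrix would then be passed to whichever classifier is being compared. Ranking on the training portion only, rather than on all samples, avoids leaking label information into the accuracy estimate.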

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mukkamala, S., Liu, Q., Veeraghattam, R., Sung, A.H. (2006). Feature Selection and Ranking of Key Genes for Tumor Classification: Using Microarray Gene Expression Data. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Żurada, J.M. (eds) Artificial Intelligence and Soft Computing – ICAISC 2006. ICAISC 2006. Lecture Notes in Computer Science, vol 4029. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11785231_100

  • DOI: https://doi.org/10.1007/11785231_100

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35748-3

  • Online ISBN: 978-3-540-35750-6

  • eBook Packages: Computer Science, Computer Science (R0)
