Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

“The Curse of Dimensionality” (Hastie et al., 2001) is well known in certain problem domains. It is also called the problem of “High P and Low N”, where the number of parameters far exceeds the number of samples available to learn from. We describe our methods for making the most of limited samples to produce reasonably general classification rules from data with a large number of parameters. We discuss the application of this approach to classifying mesothelioma samples from baseline data according to their time to recurrence. In this case there are 12,625 inputs for each sample but only 19 samples to learn from. We reflect on the theoretical implications of the behavior of genetic programming (GP) in these extreme cases and speculate on the nature of generality.

References

  • Affymetrix (2006). Human Genome U95 Set.

  • Almal, A., Mitra, A., Datar, R., Lenehan, P., Fry, D., Cote, R., and Worzel, W. (2006). Using genetic programming to classify node positive patients in bladder cancer. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2006).

  • Daida, Jason (2004). Considering the roles of structure in problem solving by a computer. In O’Reilly, Una-May, Yu, Tina, Riolo, Rick L., and Worzel, Bill, editors, Genetic Programming Theory and Practice II, chapter 5, pages 67–86. Springer, Ann Arbor.

  • Driscoll, Joseph A., Worzel, Bill, and MacLean, Duncan (2003). Classification of gene expression data with genetic programming. In Riolo, Rick L. and Worzel, Bill, editors, Genetic Programming Theory and Practice, chapter 3, pages 25–42. Kluwer.

  • Hastie, T., Tibshirani, R., and Friedman, J. (2001). The elements of statistical learning: Data mining, inference, and prediction. Springer Series in Statistics. Springer, Berlin.

  • Holland, J.H. (2003). Personal communication.

  • Hong, Jin-Hyuk and Cho, Sung Bae (2004). Lymphoma cancer classification using genetic programming with SNR features. In Keijzer, Maarten, O’Reilly, Una-May, Lucas, Simon M., Costa, Ernesto, and Soule, Terence, editors, Genetic Programming 7th European Conference, EuroGP 2004, Proceedings, volume 3003 of LNCS, pages 78–88, Coimbra, Portugal. Springer-Verlag.

  • Langdon, W. and Buxton, B. (2004). Genetic programming for mining DNA chip data from cancer patients. Genetic Programming and Evolvable Machines, 5(3):251–257.

  • MacLean, Duncan, Wollesen, Eric A., and Worzel, Bill (2004). Listening to data: Tuning a genetic programming system. In O’Reilly, Una-May, Yu, Tina, Riolo, Rick L., and Worzel, Bill, editors, Genetic Programming Theory and Practice II, chapter 15, pages 245–262. Springer, Ann Arbor.

  • Moore, Jason H., Parker, Joel S., Olsen, Nancy J., and Aune, Thomas M. (2002). Symbolic discriminant analysis of microarray data in autoimmune disease. Genetic Epidemiology, 23:57–69.

  • Pass, H.I., Liu, Z., Wali, A., Bueno, R., Land, S., Lott, D., Siddiq, F., Lonardo, F., Carbone, M., and Draghici, S. (2004). Gene expression profiles predict survival and progression of pleural mesothelioma. Clinical Cancer Research, 10(3):849–859.

  • Poli, R. (2000). Hyperschema theory for GP with one-point crossover, building blocks, and some new results in GA theory. In Proceedings of EuroGP’2000, LNCS, pages 163–180. Springer-Verlag.

  • Poli, Riccardo and Langdon, W. B. (1997). Genetic programming with one-point crossover and point mutation. Technical Report CSRP-97-13, University of Birmingham, School of Computer Science, Birmingham, B15 2TT, UK.

  • Sastry, Kumara, O’Reilly, Una-May, and Goldberg, David E. (2004). Population sizing for genetic programming based on decision making. In O’Reilly, Una-May, Yu, Tina, Riolo, Rick L., and Worzel, Bill, editors, Genetic Programming Theory and Practice II, chapter 4, pages 49–65. Springer, Ann Arbor.

Copyright information

© 2007 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Worzel, W.P., Almal, A., MacLean, C.D. (2007). Lifting the Curse of Dimensionality. In: Riolo, R., Soule, T., Worzel, B. (eds) Genetic Programming Theory and Practice IV. Genetic and Evolutionary Computation. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-49650-4_3

  • DOI: https://doi.org/10.1007/978-0-387-49650-4_3

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-33375-5

  • Online ISBN: 978-0-387-49650-4

  • eBook Packages: Computer Science, Computer Science (R0)
