Combining Decision Trees and Neural Networks for Drug Discovery

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2278))

Abstract

Genetic programming (GP) offers a generic method of automatically fusing classifiers using their receiver operating characteristics (ROC) to yield superior ensembles. We combine decision trees (C4.5) and artificial neural networks (ANN) on a difficult pharmaceutical data mining (KDD) drug discovery application: predicting inhibition of a P450 enzyme. Training data came from high throughput screening (HTS) runs. The evolved model may be used to predict the behaviour of virtual (i.e. yet to be manufactured) chemicals. Measures to reduce overfitting are also described.
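
As a rough illustration of scoring a fused classifier by its ROC, the sketch below computes the area under the ROC curve for two base classifiers and for a simple combination of their outputs. It is not the authors' method: the paper evolves the combining function with GP, whereas here a single hand-written combiner (the mean) stands in for an evolved one. The scores, labels, and names (`tree_scores`, `ann_scores`, `combiners`) are invented for the example and do not come from the paper's data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = inhibitor, 0 = non-inhibitor.
labels = [1, 1, 1, 0, 0, 0]
tree_scores = [0.9, 0.4, 0.8, 0.5, 0.2, 0.1]  # stand-in for C4.5 outputs
ann_scores = [0.3, 0.9, 0.7, 0.2, 0.6, 0.1]   # stand-in for ANN outputs

# Candidate ways of using the two outputs. A GP run would search a much
# larger space of expressions; these three are fixed by hand.
combiners = {
    "tree only": lambda t, a: t,
    "ann only": lambda t, a: a,
    "mean": lambda t, a: (t + a) / 2,
}

# Score each candidate by the ROC area of its fused output.
results = {
    name: roc_auc([f(t, a) for t, a in zip(tree_scores, ann_scores)], labels)
    for name, f in combiners.items()
}
best = max(results, key=results.get)

for name, auc in results.items():
    print(f"{name}: AUC = {auc:.3f}")
print("best:", best)
```

On this toy data each base classifier misranks one positive/negative pair, while the averaged output ranks every pair correctly, so the fused score has the larger ROC area. Real HTS data would also need a held-out set to check that such a gain is not an artefact of overfitting.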

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Langdon, W.B., Barrett, S.J., Buxton, B.F. (2002). Combining Decision Trees and Neural Networks for Drug Discovery. In: Foster, J.A., Lutton, E., Miller, J., Ryan, C., Tettamanzi, A. (eds) Genetic Programming. EuroGP 2002. Lecture Notes in Computer Science, vol 2278. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45984-7_6

  • DOI: https://doi.org/10.1007/3-540-45984-7_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43378-1

  • Online ISBN: 978-3-540-45984-2

  • eBook Packages: Springer Book Archive
