Abstract
We propose N-version Genetic Programming (NVGP), an ensemble method that improves accuracy and reduces the performance fluctuation of programs produced by genetic programming. Diversity is essential for forming successful ensembles. NVGP quantifies the behavioral diversity of ensemble members and defines an NVGP-optimal ensemble as one whose members exhibit independent fault occurrences. Applied to a DNA segment classification problem, NVGP-optimal ensembles yielded a significant improvement in accuracy.
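To make "independent fault occurrences" concrete, here is a minimal sketch. It is not the authors' code. It assumes binary classification, an odd number of members, and simple majority voting, and the function names are hypothetical. It computes the majority-vote error an ensemble would have if its members failed independently at their own error rates, and compares that figure with the error actually observed on a set of test cases.

```python
from math import comb


def majority_failure_prob(n: int, p: float) -> float:
    """P(more than half of n independent members fail), each failing with prob p.

    Classic N-version reliability formula; assumes n is odd so ties cannot occur.
    """
    k_min = n // 2 + 1  # smallest number of failures that outvotes the correct members
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def poisson_binomial_pmf(ps):
    """Distribution of the failure count when member i fails independently with prob ps[i]."""
    pmf = [1.0]  # before adding any member: zero failures with certainty
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)  # this member is correct
            nxt[k + 1] += mass * p    # this member fails
        pmf = nxt
    return pmf


def ensemble_error_report(correct):
    """correct[i][j] is True if member i classifies test case j correctly.

    Returns (observed majority-vote error, error predicted under independent faults).
    """
    n = len(correct)
    m = len(correct[0])
    # Observed: fraction of test cases on which a strict majority of members is wrong.
    observed = sum(
        1 for j in range(m) if sum(not correct[i][j] for i in range(n)) > n // 2
    ) / m
    # Predicted: the same event, assuming each member fails independently
    # at its own observed error rate.
    rates = [sum(not c for c in row) / m for row in correct]
    pmf = poisson_binomial_pmf(rates)
    predicted = sum(pmf[k] for k in range(n // 2 + 1, n + 1))
    return observed, predicted
```

For example, three members that each misclassify 1 of 10 cases give a predicted error of about 0.028 under independence. If all three fail on the same case, `ensemble_error_report` reports an observed error of 0.1, well above that prediction, which signals correlated faults. If each fails on a different case, the observed error is 0.0. Observed errors at or below the independence prediction indicate the kind of behavioral diversity the abstract describes.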
Cite this article
Imamura, K., Soule, T., Heckendorn, R.B. et al. Behavioral Diversity and a Probabilistically Optimal GP Ensemble. Genetic Programming and Evolvable Machines 4, 235–253 (2003). https://doi.org/10.1023/A:1025124423708