ABSTRACT
Decision tree induction is one of the most widely employed methods for extracting knowledge from data, since the resulting representation is intuitive and easily understood by humans. The most successful strategy for inducing decision trees, the greedy top-down approach, has been continuously improved by researchers over the years. Following recent breakthroughs in the automatic design of machine learning algorithms, this work proposes two different approaches for automatically generating generic decision tree induction algorithms. Both approaches are based on the evolutionary algorithms paradigm, which iteratively improves candidate solutions through metaphors of biological processes. We also propose guidelines for designing fitness functions for these evolutionary algorithms that take into account the requirements and needs of the end user.
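The greedy top-down strategy the abstract refers to can be illustrated with a minimal recursive sketch. The version below uses Shannon's information gain as the split criterion and assumes categorical attributes stored in dictionaries; all names and data layouts here are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from partitioning rows on a categorical attribute."""
    gain = entropy(labels)
    n = len(rows)
    for value in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

def induce_tree(rows, labels, attrs):
    """Greedy top-down induction: pick the best split, recurse on partitions."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[value] = induce_tree([rows[i] for i in idx],
                                  [labels[i] for i in idx],
                                  [a for a in attrs if a != best])
    return {best: tree}  # internal node: attribute -> {value: subtree}
```

An automatically designed induction algorithm, in the sense of this work, would be assembled by an evolutionary search choosing among design components such as the split criterion, stopping rule, and pruning method, rather than hand-coding one fixed combination as above.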
Towards the automatic design of decision tree induction algorithms