ABSTRACT
Decision tree induction is one of the most widely employed methods for extracting knowledge from data, since the resulting representation is intuitive and easily understood by humans. The most successful strategy for inducing decision trees, the greedy top-down approach, has been continuously improved by researchers over the years. Following recent breakthroughs in the automatic design of machine learning algorithms, this work proposes two different approaches for automatically generating generic decision tree induction algorithms. Both approaches are based on the evolutionary algorithms paradigm, which iteratively improves candidate solutions through metaphors of biological processes. We also propose guidelines for designing fitness functions for these evolutionary algorithms that take into account the requirements and needs of the end user.
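The greedy top-down strategy the abstract refers to can be illustrated with a minimal recursive sketch. The version below uses Shannon's information gain as the split criterion and assumes categorical attributes stored in dictionaries; all names and data layouts here are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from partitioning rows on a categorical attribute."""
    gain = entropy(labels)
    n = len(rows)
    for value in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

def induce_tree(rows, labels, attrs):
    """Greedy top-down induction: pick the best split, recurse on partitions."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[value] = induce_tree([rows[i] for i in idx],
                                  [labels[i] for i in idx],
                                  [a for a in attrs if a != best])
    return {best: tree}  # internal node: attribute -> {value: subtree}
```

An automatically designed induction algorithm, in the sense of this work, would be assembled by an evolutionary search choosing among design components such as the split criterion, stopping rule, and pruning method, rather than hand-coding one fixed combination as above.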
Towards the automatic design of decision tree induction algorithms