Abstract
Large databases are becoming increasingly common, as are the opportunities for discovering useful knowledge within them. Evolutionary computation methods such as genetic programming have previously been applied to several aspects of the problem of discovering knowledge in databases. The more specific task of producing human-comprehensible SQL queries has several potential applications but has thus far been explored only to a limited extent. In this chapter we show how developmental genetic programming can automatically generate SQL queries from sets of positive and negative examples. We show that a developmental genetic programming system can produce queries that are reasonably accurate while excelling in human comprehensibility relative to the well-known C5.0 decision tree generation system.
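To make the task concrete, the following is a minimal sketch of how a candidate query might be scored against positive and negative example rows. It is not the chapter's system: the table `people`, its columns, the example IDs, the candidate WHERE clauses, and the use of the F1 score as the quality measure are all assumptions made for illustration, and the evolutionary search itself is omitted.

```python
import sqlite3


def f1_of_query(conn, where_clause, positive_ids, negative_ids):
    """Score a candidate WHERE clause against labeled example rows.

    Hypothetical helper: assumes a table named `people` with an `id` column.
    """
    rows = conn.execute(f"SELECT id FROM people WHERE {where_clause}").fetchall()
    returned = {r[0] for r in rows}
    tp = len(returned & positive_ids)   # positives the query returns
    fp = len(returned & negative_ids)   # negatives the query returns
    fn = len(positive_ids - returned)   # positives the query misses
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy in-memory database (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, age INTEGER, hours INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, 45, 50), (2, 52, 60), (3, 23, 20), (4, 31, 40), (5, 48, 35)],
)

positive_ids = {1, 2, 5}   # rows the target query should return
negative_ids = {3, 4}      # rows the target query should not return

# Hand-written stand-ins for queries that a search process might propose.
candidates = ["age > 40", "hours > 38", "age > 40 AND hours > 45"]
scores = {c: f1_of_query(conn, c, positive_ids, negative_ids) for c in candidates}
best = max(scores, key=scores.get)
print(best, scores[best])
```

In this toy setting the short clause `age > 40` separates the examples perfectly, which illustrates why a compact SQL condition can be easier to read than a larger decision tree covering the same rows.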
Notes
C5.0 is available at http://rulequest.com/see5-info.html.
Acknowledgements
We thank Gerome Miklau for advice regarding databases and the UCI Machine Learning Repository for use of the adult dataset; see http://archive.ics.uci.edu/ml/index.html. This material is based upon work supported by the National Science Foundation under Grant No. 1017817. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Copyright information
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
Helmuth, T., Spector, L. (2013). Evolving SQL Queries from Examples with Developmental Genetic Programming. In: Riolo, R., Vladislavleva, E., Ritchie, M., Moore, J. (eds) Genetic Programming Theory and Practice X. Genetic and Evolutionary Computation. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6846-2_1
DOI: https://doi.org/10.1007/978-1-4614-6846-2_1
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6845-5
Online ISBN: 978-1-4614-6846-2
eBook Packages: Computer Science (R0)