Medical data mining using evolutionary computation

https://doi.org/10.1016/S0933-3657(98)00065-7

Abstract

In this paper, we introduce a system for discovering medical knowledge by learning Bayesian networks and rules, using evolutionary computation as the search algorithm. The Bayesian networks provide an overall structure of the relationships among the attributes, while the rules capture detailed and interesting patterns in the database. The system is applied to real-life medical databases for limb fracture and scoliosis. The knowledge discovered provides insight into, and a better understanding of, these two medical domains.

Introduction

Data mining aims at discovering novel, interesting and useful knowledge from databases [9]. Conventionally, data are analyzed manually, and many hidden but potentially useful relationships may go unrecognized by the analyst. Nowadays, many organizations, including modern hospitals, generate and collect huge amounts of data. This explosive growth of data requires an automated way to extract useful knowledge, which makes the medical domain a major area for applying data mining. Through data mining, we can extract interesting knowledge and regularities, which can then be applied in the corresponding field to increase working efficiency and improve the quality of decision making.

We developed a knowledge discovery system to extract knowledge from data. The system consists of five steps (Fig. 1). Real-life data are collected in the first step, and in the second step the data are preprocessed before analysis can start. The third and fourth steps induce knowledge from the preprocessed data. The third step, causality and structure analysis, learns the overall relationships between the variables; the resulting Bayesian network represents the knowledge structure. Based on this knowledge, the user can specify a grammar for the target rules to be discovered. The fourth step, rule learning, uses this grammar to learn a set of significant rules from the data. In the fifth step, domain experts verify and evaluate the discovered knowledge. They may find and correct mistakes in it; conversely, the learned knowledge can be used to refine existing domain knowledge. Finally, the learned Bayesian network is used to perform reasoning under uncertainty, and the induced rules are incorporated into an expert system for decision making.

In this paper, we present the two knowledge learning steps, which form the core of the knowledge discovery system. Both employ evolutionary computation as their search algorithm. The paper is organized as follows. Section 2 introduces background on evolutionary computation, Bayesian network learning, and rule learning. Section 3 describes the approaches for learning Bayesian networks. The rule learning process is delineated in Section 4, and the details of its techniques are given in Section 5. The data mining system has been applied to two real-life medical databases: the results on the fracture database are presented in Section 6, those on the scoliosis database in Section 7, and the conclusion is presented in Section 8.


Evolutionary computation

The term evolutionary computation describes algorithms that simulate natural evolution to perform function optimization and machine learning. They are based on the Darwinian principle of evolution through natural selection, and they maintain a population of individuals to explore the search space. Examples of evolutionary computation include genetic algorithms (GA) [19], [13], genetic programming (GP) [24], [25], evolutionary programming (EP) [10], [11] and evolution strategies…
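To make the population-based loop described above concrete, here is a minimal generational genetic algorithm, sketched in Python on the toy OneMax problem (maximize the number of 1-bits in a bit string). The problem and all operators and parameter values (tournament size 3, one-point crossover, bit-flip mutation at rate 1/n, elitism) are illustrative assumptions, not the setup of this paper, which uses EP for Bayesian networks and GGP for rules.

```python
import random
from typing import List, Tuple

Genome = List[int]


def onemax(genome: Genome) -> int:
    """Toy fitness: number of 1-bits (maximum is len(genome))."""
    return sum(genome)


def tournament(pop: List[Genome], k: int, rng: random.Random) -> Genome:
    """Selection: return the fittest of k individuals drawn at random."""
    return max(rng.sample(pop, k), key=onemax)


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """One-point crossover: prefix of parent a followed by suffix of parent b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(g: Genome, rate: float, rng: random.Random) -> Genome:
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in g]


def evolve(n_bits: int = 30, pop_size: int = 40, generations: int = 60,
           seed: int = 0) -> Tuple[Genome, List[int]]:
    """Run the GA; return the best genome and the best fitness per generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history: List[int] = []
    for _ in range(generations):
        elite = max(pop, key=onemax)  # elitism: best individual survives unchanged
        children = [elite]
        while len(children) < pop_size:
            p1, p2 = tournament(pop, 3, rng), tournament(pop, 3, rng)
            children.append(mutate(crossover(p1, p2, rng), 1.0 / n_bits, rng))
        pop = children
        history.append(onemax(max(pop, key=onemax)))
    return max(pop, key=onemax), history


best, history = evolve()
print(onemax(best), history[0], history[-1])
```

Because the elite is copied into every new population, the best fitness in `history` never decreases; selection pressure comes from the tournaments, and variation from crossover and mutation.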

Causality and structure analysis

In the proposed knowledge discovery process (Fig. 1), the causality and structure analysis step induces a Bayesian network from the data. The learning approach is based on Lam and Bacchus’s work [27], [26], which employs the minimum description length (MDL) principle to evaluate a Bayesian network. EP is used to optimize this metric in order to search for the best network structure.
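The MDL idea can be illustrated with a small scoring function: a candidate structure is charged for describing itself (parent lists and conditional probability parameters) plus for describing the data given the network, and lower totals are better. The sketch below, in Python, uses a common two-part encoding with 0.5·log2 N bits per free parameter; this encoding, the function names and the toy data are assumptions in the spirit of the MDL principle, not Lam and Bacchus's exact formula.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Tuple[int, ...]          # one row of integer-coded attribute values
Structure = Dict[int, List[int]]  # variable index -> list of parent indices


def conditional_entropy_bits(data: Sequence[Record], child: int,
                             parents: List[int]) -> float:
    """Empirical conditional entropy H(X_child | X_parents), in bits."""
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in data)
    h = 0.0
    for (pa, _value), count in joint.items():
        h -= (count / n) * math.log2(count / parent_counts[pa])
    return h


def mdl_score(data: Sequence[Record], arity: List[int],
              structure: Structure) -> float:
    """Two-part description length in bits (lower is better).

    Assumed encoding (illustrative only): log2(n_vars) bits per listed parent,
    0.5 * log2(N) bits per free CPT parameter, and N * H(X | parents) bits
    to encode each data column given its parents.
    """
    n_records, n_vars = len(data), len(arity)
    bits_per_param = 0.5 * math.log2(n_records)
    total = 0.0
    for child in range(n_vars):
        parents = structure.get(child, [])
        q = math.prod(arity[p] for p in parents)  # number of parent configurations
        total += len(parents) * math.log2(n_vars)          # parent lists
        total += bits_per_param * q * (arity[child] - 1)  # CPT parameters
        total += n_records * conditional_entropy_bits(data, child, parents)
    return total


# Hypothetical toy data: X1 copies X0 in every record.
copied = [(0, 0), (1, 1)] * 50
# Hypothetical toy data: X0 and X1 are independent.
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25

empty: Structure = {0: [], 1: []}
edge: Structure = {0: [], 1: [0]}
print(mdl_score(copied, [2, 2], edge) < mdl_score(copied, [2, 2], empty))
print(mdl_score(independent, [2, 2], empty) < mdl_score(independent, [2, 2], edge))
```

The score rewards the edge X0 → X1 only when it pays for its extra parameters by compressing the data. In an EP search, each individual would be a candidate structure, mutations would add, delete or reverse edges while rejecting cycles, and the lowest-scoring networks would survive; that loop is not shown here.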

Rule learning

The second knowledge learning step in our data mining process is to learn rules from the data. Our learning approach is based on generic genetic programming (GGP) [43], [42], [40], an extension of GP that uses a grammar [21] to control the structures evolved.
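The role of the grammar can be illustrated by generating rules only through its productions, so that every individual is a syntactically valid IF-THEN rule. The Python sketch below uses a hypothetical toy grammar over binary attributes `a`, `b` and target `c`, and scores rules by confidence; the grammar, attribute names and quality measure are assumptions, not the grammar or fitness function of this paper. GGP's grammar-respecting crossover and mutation on derivation trees are also not shown.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy grammar: nonterminals are in angle brackets.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<rule>": [["IF", "<ante>", "THEN", "<cons>"]],
    "<ante>": [["<cond>"], ["<cond>", "AND", "<ante>"]],
    "<cond>": [["a", "=", "<bit>"], ["b", "=", "<bit>"]],
    "<cons>": [["c", "=", "<bit>"]],
    "<bit>": [["0"], ["1"]],
}

Condition = Tuple[str, int]
Rule = Tuple[List[Condition], Condition]


def derive(symbol: str, rng: random.Random, depth: int = 0,
           max_depth: int = 4) -> List[str]:
    """Randomly expand `symbol` into terminal tokens using GRAMMAR."""
    if symbol not in GRAMMAR:  # terminal symbol
        return [symbol]
    options = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, allow only non-recursive productions so it terminates.
        options = [p for p in options if symbol not in p]
    tokens: List[str] = []
    for s in rng.choice(options):
        tokens.extend(derive(s, rng, depth + 1, max_depth))
    return tokens


def to_rule(tokens: List[str]) -> Rule:
    """Turn 'IF x = v AND ... THEN c = v' tokens into (conditions, consequent)."""
    then = tokens.index("THEN")
    body = [t for t in tokens[1:then] if t != "AND"]  # triples: attr, "=", value
    conditions = [(body[i], int(body[i + 2])) for i in range(0, len(body), 3)]
    consequent = (tokens[then + 1], int(tokens[then + 3]))
    return conditions, consequent


def confidence(rule: Rule, records: List[Dict[str, int]]) -> float:
    """Fraction of records matching the IF part that also match the THEN part."""
    conditions, (target, value) = rule
    covered = [r for r in records if all(r[a] == v for a, v in conditions)]
    if not covered:
        return 0.0
    return sum(r[target] == value for r in covered) / len(covered)


# Hypothetical toy records in which c always equals a.
records = [{"a": a, "b": b, "c": a} for a in (0, 1) for b in (0, 1)]
rng = random.Random(7)
for _ in range(3):
    tokens = derive("<rule>", rng)
    print(" ".join(tokens), confidence(to_rule(tokens), records))
```

Restricting generation to the grammar means the search never wastes effort on malformed rules, and a user can narrow the grammar (for example, fixing the consequent attribute) to focus the search on the rules of interest.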

Novel techniques for rule learning

Besides using GGP as the search algorithm, additional techniques are needed to learn multiple interesting rules from the database efficiently. These techniques are described in the following section.

Results on the fracture database

The described data mining technology has been applied to a real-life medical database of children with limb fractures admitted to the Prince of Wales Hospital of Hong Kong during the period 1984–1996. These data can provide information for analyzing fracture patterns in children. The database has 6500 records and eight attributes, which are listed in Table 3.

Results on the scoliosis database

The data mining process has also been applied to a database of scoliosis patients. Scoliosis is a deformity of the spine in which the patient has one or several curves in the spine; the curves with severe deformations are identified as major curves. The database stores measurements on the patients, such as the number of curves, curve locations, degrees and directions. It also records the maturity of the patient, the class of scoliosis and the treatment. The database…

Conclusion

We have presented a data mining system composed of five steps, of which the third and fourth are detailed in this paper. Both employ evolutionary computation as their search algorithm. Causality and structure analysis focuses on the general causal model among the variables, whereas rule learning captures the specific relationships among particular values of the variables.

Our system is particularly suitable for analyzing real-life databases that cannot be described completely by just a few rules.

Acknowledgements

This work was partially supported by Hong Kong RGC CERG Grant CUHK 4161/97E and CUHK Engineering Faculty Direct Grant 2050151. The authors wish to thank Chun Sau Lau and King Sau Lee for preparing, analyzing and implementing the rule learning system for the scoliosis database.

References (43)

  • Booker L. et al. Classifier systems and genetic algorithms. Artif. Intell. (1989).
  • Cooper G.F. The computational complexity of probabilistic inference using Bayesian belief networks. Artif. Intell. (1990).
  • Agrawal R., Imielinski T., Swami A. Mining association rules between sets of items in large databases. Proceedings of...
  • Bouckaert R.R. Properties of belief networks learning algorithms. Proceedings of the Conference on Uncertainty in...
  • Charniak E. Bayesian networks without tears. AI Mag. (1991).
  • Chow C.K. et al. Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theory (1968).
  • Clark P. et al. The CN2 induction algorithm. Mach. Learn. (1989).
  • Cooper G.F. et al. A Bayesian method for the induction of probabilistic networks from data. Mach. Learn. (1992).
  • Fayyad U.M., Piatetsky-Shapiro G., Smyth P. From data mining to knowledge discovery: An overview. AI Mag.,...
  • Fogel D.B. An introduction to simulated evolutionary optimization. IEEE Trans. Neural Netw. (1994).
  • Fogel L., Owens A., Walsh M. Artificial Intelligence through Simulated Evolution. New York: Wiley,...
  • Giordana A. et al. Search-intensive concept induction. Evol. Comput. (1995).
  • Goldberg D.E. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, MA,...
  • Goldberg D.E., Richardson J. Genetic algorithms with sharing for multimodal function optimization. Proceedings of the...
  • Heckerman D. Bayesian Networks for Knowledge Discovery, chapter 11. Cambridge, MA: MIT Press,...
  • Heckerman D. et al. Learning Bayesian networks: the combination of knowledge and statistical data. Mach. Learn. (1995).
  • Heckerman D. et al. Bayesian networks. Commun. ACM (1995).
  • Herskovits E., Cooper G. KUTATO: An entropy-driven system for construction of probabilistic expert systems from...
  • Holland J.H. Adaptation in Natural and Artificial Systems. Cambridge, MA: MIT Press,...
  • Holland J.H., Reitman J.S. Cognitive systems based on adaptive algorithms. In: Waterman D.A., Hayes-Roth F., editors....
  • Hopcroft J.E., Ullman J.D. Introduction to automata theory, languages, and computation. Reading, MA: Addison-Wesley,...

Cited by (39)

    • Knowledge discovery in medicine: Current issue and future trend

      2014, Expert Systems with Applications
      Citation Excerpt :

      Koyuncugil and Ozgulbas have also proposed a system based on association rule mining to identify the appropriate donor in organ transplantations with respect to time constraints (Koyuncugil & Ozgulbas, 2010). The hybrid approach proposed by Shun Ngan, Leung Wong, Lam, Leung, and Cheng (1999) can be used to extract parameters associated with length of stay at hospital for limb fracture and classification of scoliosis. This approach contains bayesian network, Rule induction and Evolutionary computing.

    • Automatic classification of epilepsy types using ontology-based and genetics-based machine learning

      2014, Artificial Intelligence in Medicine
      Citation Excerpt :

      The genetics-based data mining method presented in this paper is related to the work in the area of data mining for discovering temporal hidden knowledge from medical dataset. In [43], learning a Bayesian network and rules is used to discover medical knowledge. A grammar-based genetic programming is used as a search algorithm.
