Elsevier

Information Sciences

Volume 524, July 2020, Pages 318-352
A survey of evolutionary computation for association rule mining

https://doi.org/10.1016/j.ins.2020.02.073

Highlights

  • We present a review of trends and directions in EC-based association rule mining.

  • 221 algorithms were collected between 2000 and 2019 using a research methodology.

  • We review algorithms according to meta-heuristic approaches in nine groups.

  • Applications and the current problems and opportunities are described.

Abstract

Association Rule Mining (ARM) is a significant data mining task for discovering frequent patterns. It has achieved great success in a plethora of applications such as market basket analysis, computer networks, recommendation systems, and healthcare. In the past few years, evolutionary computation-based ARM has emerged as one of the most popular research areas for addressing the high computation time of traditional ARM. Although numerous papers have been published, there is no comprehensive analysis of existing evolutionary ARM methodologies. In this paper, we review emerging research on evolutionary computation for ARM. We discuss the application of evolutionary computation to different types of ARM approaches, including numerical rules, fuzzy rules, high-utility itemsets, class association rules, and rare association rules. Evolutionary ARM algorithms are classified into four main groups in terms of the evolutionary approach: evolution-based, swarm intelligence-based, physics-inspired, and hybrid approaches. Furthermore, we discuss the remaining challenges of evolutionary ARM, its applications, and future topics.

Introduction

Data mining is a prevalent and effective technique for extracting useful knowledge from data sources. Association Rule Mining (ARM), first introduced by Agrawal et al. [3], is one of the main tasks of data mining and aims to find close relationships between items in large datasets. ARM techniques have been successfully applied in various fields such as the healthcare industry, market basket analysis, and recommendation systems [18]. Implementations of some frequent pattern mining algorithms are available in the SPMF open-source data mining library [84]. Discovering Association Rules (ARs) in a transaction database is an NP-Hard problem. If there are $n$ items in the dataset, the number of itemsets is $2^n$. The maximum number of ARs that can be extracted from each itemset is $2^k - 2$, where $k$ is the length of the itemset. The time complexity of Apriori-based algorithms is $O(2^n) + O(2^k)$. As a result, the time complexity of discovering ARs is $O(k \times 2^n)$ [17]. This demonstrates that the running time increases exponentially with the number of items [24]. Traditional ARM algorithms therefore require a considerable amount of computation time. Additionally, they depend on data preparation before the algorithm is applied, which leads to a loss of information. Furthermore, the sharp boundaries between intervals of numeric attributes and the need to distinguish the degree of membership for intervals in fuzzy sets are two other drawbacks of conventional ARM methods.
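The counting argument above can be checked on a toy example. The following is a minimal sketch, not taken from the paper: the helper names `nonempty_itemsets` and `candidate_rules` and the four-item dataset are illustrative assumptions. It enumerates every non-empty itemset over n items (the count of $2^n$ includes the empty set, so $2^n - 1$ remain) and every rule X -> Y that splits a k-itemset into two non-empty parts ($2^k - 2$ rules).

```python
from itertools import combinations


def nonempty_itemsets(items):
    """Yield every non-empty subset of `items` (2^n - 1 subsets for n items)."""
    items = sorted(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def candidate_rules(itemset):
    """Yield every rule X -> Y where X and Y are non-empty, disjoint,
    and together make up `itemset` (2^k - 2 rules for a k-itemset)."""
    itemset = frozenset(itemset)
    for size in range(1, len(itemset)):
        for combo in combinations(sorted(itemset), size):
            antecedent = frozenset(combo)
            yield antecedent, itemset - antecedent


# Hypothetical toy item universe, used only for illustration.
items = ["bread", "butter", "eggs", "milk"]

n_itemsets = sum(1 for _ in nonempty_itemsets(items))
n_rules_full_set = sum(1 for _ in candidate_rules(items))
n_rules_total = sum(
    sum(1 for _ in candidate_rules(s)) for s in nonempty_itemsets(items)
)

print(n_itemsets, n_rules_full_set, n_rules_total)
```

Even with only four items, the full rule space is already several times larger than the item universe, which is the growth that motivates non-exhaustive search.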

Evolutionary Computation (EC) algorithms are a state-of-the-art and efficient strategy for finding near-optimal solutions. EC algorithms encode the problem as one or more candidate solutions, which are then evolved to improve their quality. A key characteristic of EC approaches is that strict termination conditions can be set to limit computation time while a nearly optimal solution can still be obtained. Additionally, the use of EC algorithms allows association rule discovery without the frequent itemset generation step, which reduces computation time [202]. In the last two decades, many researchers have proposed metaheuristic-based methods for discovering ARs to address the limitations of traditional approaches. Although many papers have been presented in the field of evolutionary ARM, no previous effort has been made to systematically review such methods. Nevertheless, there are a few survey and review papers that cover a small area of evolutionary ARM. A comparative analysis of three evolutionary ARM methods showed the effectiveness of GA for mining Quantitative ARs (QARs) [15]. Djenouri et al. [75] studied the application of three metaheuristic approaches (GA, PSO, and ACO) to frequent itemset mining and High Utility Itemset Mining (HUIM). Some of the ARM methods that are based on multi-objective evolutionary algorithms can be found in [203], whose authors categorized algorithms published by 2014 into three types: categorical ARM, fuzzy ARM, and numeric ARM. Ventura and Luna [263] described some of the Grammar Guided Genetic Programming (G3P)-based methods for mining ARs. A review of multi-objective optimization in ARM [247] analyzed ARM in terms of different aspects such as chromosome representations, genetic operators, and fitness functions. Badhon et al. [23] presented a more recent review than [247]. Del Jesus et al. [57] reviewed evolutionary ARM approaches published by 2011. Ghafari and Tjortjis [91] provided a comparative study of certain evolutionary ARM algorithms.
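To make the idea of encoding rules as evolvable solutions concrete, the following is a minimal genetic-algorithm sketch. It is not any specific algorithm from the surveyed literature: the gene encoding (0 = item unused, 1 = antecedent, 2 = consequent), the fitness function (support times confidence), the operators, the parameter values, and the toy transactions are all assumptions chosen for brevity. It illustrates two points made above: rules are searched directly, without a frequent itemset generation step, and a fixed generation budget acts as a strict termination condition.

```python
import random

# Hypothetical toy transaction database, used only for illustration.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
ITEMS = sorted(set().union(*TRANSACTIONS))


def decode(chromosome):
    """Map genes to a rule: 0 = item unused, 1 = antecedent, 2 = consequent."""
    antecedent = {item for item, g in zip(ITEMS, chromosome) if g == 1}
    consequent = {item for item, g in zip(ITEMS, chromosome) if g == 2}
    return antecedent, consequent


def fitness(chromosome):
    """Assumed fitness: support * confidence; 0.0 if either side is empty."""
    antecedent, consequent = decode(chromosome)
    if not antecedent or not consequent:
        return 0.0
    n_ante = sum(1 for t in TRANSACTIONS if antecedent <= t)
    n_both = sum(1 for t in TRANSACTIONS if (antecedent | consequent) <= t)
    if n_ante == 0:
        return 0.0
    return (n_both / len(TRANSACTIONS)) * (n_both / n_ante)


def tournament(population, rng):
    """Binary tournament selection: the fitter of two random individuals."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def evolve(pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 2) for _ in ITEMS] for _ in range(pop_size)]
    for _ in range(generations):  # strict termination: fixed budget
        children = [max(population, key=fitness)[:]]  # elitism keeps the best
        while len(children) < pop_size:
            p1, p2 = tournament(population, rng), tournament(population, rng)
            cut = rng.randint(1, len(ITEMS) - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [
                rng.randint(0, 2) if rng.random() < mutation_rate else g
                for g in child
            ]
            children.append(child)
        population = children
    return max(population, key=fitness)


best = evolve()
antecedent, consequent = decode(best)
print(sorted(antecedent), "->", sorted(consequent), fitness(best))
```

The computation time is bounded by `pop_size * generations` fitness evaluations regardless of how many items the dataset contains, at the cost of returning a good rule rather than a provably optimal one.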

Recent papers reviewed only a small fraction of evolutionary ARM algorithms, although many papers have been published in recent years. Therefore, we aimed to provide a comprehensive survey of emerging research on EC techniques for the ARM process. From a variety of papers collected for this study, 214 papers were selected and used for our survey. A total of 219 scientific ARM algorithms were proposed by these 214 papers. Fig. 1 depicts the distribution of the collected papers published between 2000 and 2019. As presented, the highest number of papers was published in 2017 (23 out of 214, 10%). This figure demonstrates that evolutionary ARM has emerged as a popular topic in recent years.

We categorized ARM algorithms into four groups, based on EC techniques: evolution-based (Genetic Algorithm (GA)-based approaches and Differential Evolution (DE)), swarm intelligence (Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), Honey Bee-based Optimization (HBO), and Bat algorithm), physics-inspired, and hybridization. For each group, we provide a brief description as well as a summary of algorithms and their characteristics. To give a useful overview, we also provide a statistical review of algorithms in terms of different aspects. We discuss current challenges and opportunities and point out potential trends and applications. In general, the objectives of this survey are as follows: (1) developing a classification framework for the application of EC techniques for ARM; (2) providing a systematic and comprehensive review; and (3) determining research gaps and proposing suggestions for directions of future research. We expect this survey to provide a reference point for researchers and data miners who want to be informed of the state-of-the-art evolutionary ARM methods. Investigating research applications encourages businesses and governments to pay attention to the use of evolutionary methods for knowledge discovery in their respective domains.

The rest of the paper is organized as follows: Section 2 provides background information on ARM as well as different types of patterns in ARM. Section 3 presents a classification of EC approaches for ARM. Section 4 discusses current issues and challenges. The measures used to assess the quality of the rules in the ARM process are described in Section 5. Section 6 presents the applications of evolutionary ARM approaches. In Section 7, discussion and statistical analysis of the evolutionary ARM algorithms are provided. Finally, Section 8 concludes the paper.

Section snippets

Association rule mining

ARM is the process of discovering ARs in transaction data. It is one of the most significant unsupervised methods for pattern recognition [199].

Evolutionary computation for ARM

Due to the nature of high-dimensional spaces, ARM is difficult to solve. Traditional heuristic methods therefore cannot provide sophisticated solutions, which has resulted in the increased popularity of non-exact innovative optimization approaches known as EC algorithms. These approaches use an iterative heuristic process to search the problem space and produce a sufficiently good solution [227]. Single solution-based and population-based are the two main

Challenges and problems

Different issues need to be considered when developing and applying an evolutionary ARM algorithm. Some problems such as large datasets, attribute values, and parameter settings are related to both development and application. Other problems such as membership functions (MFs) and solution encoding are associated with the design of an algorithm. One major barrier to using metaheuristic algorithms in ARM is that considerable skill and experience are required in order to determine suitable parameters, such as minimum

Quality measures

In order to select the best set of frequent patterns, it is essential to assess their quality. Probability-based measures are often used to evaluate the generality and reliability of ARs. In ARM, the support criterion is employed to measure the generality of ARs. The measures of confidence, lift, and leverage are applied to show the reliability of ARs [189]. Support and confidence are the most widely used measures in ARM. However, even though many meaningless and redundant rules are mined,
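The four measures named in this snippet can be computed directly from transaction counts. The sketch below uses their standard probability-based definitions for a rule X -> Y: support = P(X and Y), confidence = P(X and Y) / P(X), lift = P(X and Y) / (P(X) P(Y)), and leverage = P(X and Y) - P(X) P(Y). The function name `rule_measures` and the toy transactions are illustrative assumptions, and the snippet does not claim to reproduce the exact formulations used in the paper.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return support, confidence, lift, and leverage for antecedent -> consequent.

    `transactions` is a non-empty list of sets of items. Measures that would
    divide by zero are reported as 0.0 (an assumption of this sketch).
    """
    n = len(transactions)
    antecedent, consequent = set(antecedent), set(consequent)
    p_x = sum(1 for t in transactions if antecedent <= t) / n
    p_y = sum(1 for t in transactions if consequent <= t) / n
    p_xy = sum(1 for t in transactions if (antecedent | consequent) <= t) / n
    return {
        "support": p_xy,
        "confidence": p_xy / p_x if p_x else 0.0,
        "lift": p_xy / (p_x * p_y) if p_x and p_y else 0.0,
        "leverage": p_xy - p_x * p_y,
    }


# Hypothetical toy transaction database, used only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

measures = rule_measures(transactions, {"butter"}, {"bread"})
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

For the rule {butter} -> {bread} in this toy data, butter appears in 3 of 5 transactions and always together with bread, so the confidence is 1.0, while a lift above 1 and a positive leverage indicate that the two items co-occur more often than independence would predict.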

Applications of evolutionary association rule mining

Optimization is applied in engineering and industry in order to minimize or maximize goals. This stems from the limitations of resources, time, and money. Therefore, optimization is far more important in practice. Evolutionary ARM approaches have been effectively applied to a variety of domains. Generally, the major applications can be grouped into eight categories: market basket, recommendation systems, computer networks, healthcare, environment, industry, education, and road traffic. In the

Statistical review and discussion

Table 14 presents the distribution of ARM methods based on evolutionary techniques and their publication year (between 2000 and 2019). GA is the most commonly used EC approach and has been a popular algorithm since 2000, when evolutionary ARM appeared. Swarm intelligence-based algorithms are the second most widely-used approach after GA-based algorithms. This is because swarm intelligence-based methods can be used to converge to both a single optimum and multiple optimal solutions. Applying

Conclusion and future work

Association rule mining is currently one of the most active data mining topics. The application of EC techniques for ARM has received a considerable amount of research attention. This paper has provided a comprehensive survey of EC approaches for solving the ARM problem. We presented a classification of evolutionary ARM algorithms along with a brief description of them and a statistical review of methods in terms of different factors. Applications and critical challenges were also discussed.

Declaration of Competing Interest

None.

References (303)

  • C.H. Chen et al.

    An improved approach to find membership functions and multiple minimum supports in fuzzy data mining

    Expert Syst. Appl.

    (2009)
  • Y. Cheng et al.

    GA‐based multi-level association rule mining approach for defect analysis in the construction industry

    Autom. Constr.

    (2015)
  • Y. Cui et al.

    Multi-objective optimization methods and application in energy saving

    Energy

    (2017)
  • Y. Djenouri et al.

    Mining diversified association rules in big datasets: a cluster/GPU/genetic approach

    Inf. Sci.

    (2018)
  • Y. Djenouri et al.

    Combining Apriori heuristic and bio-inspired algorithms for solving the frequent itemsets mining problem

    Inf. Sci.

    (2017)
  • Y. Djenouri et al.

    Exploiting GPU parallelism in improving bees swarm optimization for mining big transactional databases

    Inf. Sci.

    (2019)
  • Y. Djenouri et al.

    Intelligent mapping between GPU and cluster computing for discovering big association rules

    Appl. Soft Comput.

    (2018)
  • K.Y. Fung et al.

    A multi-objective genetic algorithm approach to rule mining for affective product design

    Expert Syst. Appl.

    (2012)
  • A.H. Gandomi et al.

    Metaheuristic algorithms in modeling and optimization

  • Z. Geng et al.

    Semantic relation extraction using sequential and tree-structured LSTM with attention

    Inf. Sci.

    (2020)
  • Z. Geng et al.

    A model-free Bayesian classifier

    Inf. Sci.

    (2019)
  • A. Ghosh et al.

    Multi-objective rule mining using genetic algorithms

    Inf. Sci.

    (2004)
  • Y.J. Gong et al.

    Distributed evolutionary algorithms and their models: A survey of the state-of-the-art

    Appl. Soft Comput.

    (2015)
  • A. Agarwal et al.

    Association rule mining using hybrid GA-PSO for multi-objective optimisation

  • J. Agrawal et al.

    SET-PSO-based approach for mining positive and negative association rules

    Knowl. Inf. Syst.

    (2015)
  • R. Agrawal et al.

    Mining association rules between sets of items in large databases

  • M. Agrawal et al.

    Association rules optimization using improved PSO algorithm

  • R. Agrawal et al.

    Fast algorithms for mining association rules in large databases

  • K.I. Ahn et al.

    Efficient mining of frequent itemsets and a measure of interest for association rule mining

    J. Inf. Knowl. Manag.

    (2004)
  • B. Alatas et al.

    An efficient genetic algorithm for automated mining of both positive and negative quantitative association rules

    Soft Comput.

    (2006)
  • B. Alatas et al.

    Rough particle swarm optimization and its applications in data mining

    Soft Comput.

    (2008)
  • R. Alcalá et al.

    Genetic learning of membership functions for mining fuzzy association rules

  • J. Alcala-Fdez et al.

    Analysis of the effectiveness of the genetic algorithms based on extraction of association rules

    Fundamenta Informaticae

    (2010)
  • R. Alhajj et al.

    Multi-objective genetic algorithms based automated clustering for fuzzy association rules mining

    J. Intell. Inf. Syst.

    (2008)
  • W. Altaf et al.

    Applications of association rule mining in health informatics: a survey

    Artif. Intell. Rev.

    (2017)
  • E.V. Altay et al.

    Performance analysis of multi-objective artificial intelligence optimization algorithms in numerical association rule mining

    J. Ambient Intell. Human. Comput.

    (2019)
  • B.M. Al-Maqaleh

    Discovering interesting association rules: a multi-objective genetic algorithm approach

    Int. J. Appl. Inf. Syst.

    (2013)
  • V.P. Álvarez et al.

    An evolutionary algorithm to discover quantitative association rules from huge databases without the need for an a priori discretization

    Expert Syst. Appl.

    (2012)
  • B. Badhon et al.

    A survey on association rule mining based on evolutionary algorithms

    Int. J. Comput. Appl.

    (2019)
  • A. Berrado et al.

    Using metarules to organize and group discovered association rules

    Data Mining Knowl. Discov.

    (2007)
  • C. Birtolo et al.

    A generative approach to product bundling in the e-commerce domain

  • C. Birtolo et al.

    Searching optimal product bundles by means of GA-based engine and market basket analysis

  • S. Brin et al.

    Dynamic itemset counting and implication rules for market basket data

    ACM Sigmod Record

    (1997)
  • U. Can et al.

    Automatic mining of quantitative association rules with gravitational search algorithm

    Int. J. Software Eng. Knowledge Eng.

    (2017)
  • A. Cano et al.

    High performance evaluation of evolutionary-mined association rules on GPUs

    J. Supercomput.

    (2013)
  • C.J. Carmona et al.

    NMEEF-SD: non-dominated multiobjective evolutionary algorithm for extracting fuzzy rules in subgroup discovery

    IEEE Trans. Fuzzy Syst.

    (2010)
  • M.A. Chamazi et al.

    Deriving support threshold values and membership functions using the multiple-level cluster-based master–slave IFG approach

    Soft Comput.

    (2013)
  • M.A. Chamazi et al.

    Finding suitable membership functions for fuzzy temporal mining problems using fuzzy temporal bees method

    Soft Comput.

    (2019)
  • R. Chan et al.

    Mining high utility itemsets

  • C.H. Chen et al.

    A multiple-level genetic-fuzzy mining algorithm
