A survey of evolutionary computation for association rule mining
Introduction
Data mining is a prevalent and effective technique for extracting useful knowledge from data sources. Association Rule Mining (ARM) is one of the main tasks of data mining. First introduced by Agrawal et al. [3], ARM aims to find close relationships between items in large datasets. ARM techniques have been successfully applied in various fields such as the healthcare industry, market basket analysis, and recommendation systems [18]. Implementations of some frequent pattern mining algorithms are available in the SPMF open-source data mining library [84]. Discovering Association Rules (ARs) in a transaction database is an NP-Hard problem. If there are n items in the dataset, the number of itemsets is 2^n. The maximum number of ARs that can be extracted from each itemset is 2^k − 2, where k is the length of the itemset. The time complexity of Apriori-based algorithms is O(2^n) + O(2^k). As a result, the time complexity of discovering ARs is O(k × 2^n) [17]. This shows that the running time grows exponentially with the number of items [24]. Traditional ARM algorithms therefore require a considerable amount of computation time. Additionally, they depend on data preparation before the algorithm is applied, which leads to a loss of information. Furthermore, the sharp boundaries between intervals in numeric attributes and the need to determine the degree of membership for each interval in fuzzy sets are two other drawbacks of conventional ARM methods.
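The counts quoted above (2^n itemsets for n items, and at most 2^k − 2 rules from an itemset of length k) can be checked by brute force. The following is a minimal sketch, not taken from the surveyed papers; the function names `rules_from_itemset` and `count_itemsets` are illustrative assumptions. A rule is taken to be a split of the itemset into a non-empty antecedent and a non-empty consequent.

```python
from itertools import combinations


def rules_from_itemset(itemset):
    """Yield every rule A -> C where A and C are non-empty, disjoint,
    and together cover the whole itemset."""
    items = sorted(itemset)
    for size in range(1, len(items)):  # antecedent sizes 1 .. k-1
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent


def count_itemsets(n):
    """Number of subsets of n items (the empty set included)."""
    return 2 ** n


if __name__ == "__main__":
    for k in range(2, 7):
        items = [f"item{j}" for j in range(k)]
        n_rules = sum(1 for _ in rules_from_itemset(items))
        # Brute-force count next to the closed form 2^k - 2.
        print(k, n_rules, 2 ** k - 2)
    print(count_itemsets(20))  # size of the search space for 20 items
```

Even at 20 items the itemset space already exceeds one million candidates, which is the growth the paragraph above refers to.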
Evolutionary Computation (EC) algorithms are a state-of-the-art and efficient strategy for finding near-optimal solutions. EC algorithms encode the problem as one or more candidate solutions, which are then evolved to improve their quality. A key characteristic of EC approaches is that strict termination conditions can be set to limit computation time while a nearly optimal solution can still be obtained. Additionally, the use of EC algorithms allows association rule discovery without the frequent itemset generation step. This leads to a reduction in computation time [202]. In the last two decades, many researchers have presented the discovery of ARs based on metaheuristics to address the limitations of traditional approaches. Although many papers have been presented in the field of evolutionary ARM, no previous effort has been made to systematically review such methods. Nevertheless, there are a few survey and review papers that cover a small area of evolutionary ARM. A comparative analysis of three evolutionary ARM methods showed the effectiveness of GA for mining Quantitative ARs (QARs) [15]. Djenouri et al. [75] studied the application of three metaheuristic approaches (GA, PSO, and ACO) to frequent itemset mining and High Utility Itemset Mining (HUIM). Some of the ARM methods that are based on multi-objective evolutionary algorithms can be found in [203], whose authors categorized the algorithms published by 2014 into three types: categorical ARM, fuzzy ARM, and numeric ARM. Ventura and Luna [263] described some of the Grammar Guided Genetic Programming (G3P)-based methods for the mining of ARs. A review of multi-objective optimization in ARM [247] analyzed ARM in terms of different aspects such as chromosome representations, genetic operators, and fitness functions. Badhon et al. [23] presented a more recent review than [247]. Del Jesus et al. [57] reviewed evolutionary ARM approaches published by 2011. Ghafari and Tjortjis [91] provided a comparative study of certain evolutionary ARM algorithms.
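The general EC idea described in this part of the introduction (encode a rule as a candidate solution, evolve it, stop on a strict budget, and never enumerate frequent itemsets) can be sketched with a small genetic algorithm. This is an illustrative toy, not a method from any surveyed paper. The encoding (one gene per item: 0 = unused, 1 = antecedent, 2 = consequent), the fitness (support × confidence), the toy data, and all names and parameter values are assumptions made for the example.

```python
import random

# Hypothetical toy data, for illustration only.
ITEMS = ["bread", "butter", "milk", "jam"]
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "jam"},
]


def decode(chrom):
    """Map a chromosome to (antecedent, consequent) item sets."""
    antecedent = {i for i, g in zip(ITEMS, chrom) if g == 1}
    consequent = {i for i, g in zip(ITEMS, chrom) if g == 2}
    return antecedent, consequent


def fitness(chrom):
    """Support times confidence; 0 for rules with an empty side."""
    a, c = decode(chrom)
    if not a or not c:
        return 0.0
    n_a = sum(a <= t for t in TRANSACTIONS)
    n_ac = sum((a | c) <= t for t in TRANSACTIONS)
    if n_a == 0:
        return 0.0
    return (n_ac / len(TRANSACTIONS)) * (n_ac / n_a)


def evolve(pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randrange(3) for _ in ITEMS] for _ in range(pop_size)]
    for _ in range(generations):  # strict termination: fixed generation budget
        children = [max(pop, key=fitness)[:]]  # elitism keeps the best rule
        while len(children) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, len(ITEMS))  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Per-gene mutation: resample the gene with a small probability.
            child = [rng.randrange(3) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(decode(best), fitness(best))
```

Rules are scored directly against the transactions, so no frequent itemsets are generated first, and the running time is bounded by `pop_size × generations` fitness evaluations per selection step regardless of how many itemsets exist.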
Recent papers reviewed only a small fraction of evolutionary ARM algorithms, even though many papers have been published in recent years. Therefore, we aimed to provide a comprehensive survey of emerging research on EC techniques for the ARM process. From a variety of papers collected for this study, 214 papers were selected and used for our survey. A total of 219 scientific ARM algorithms were proposed by these 214 papers. Fig. 1 depicts the distribution of the collected papers published between 2000 and 2019. As presented, the highest number of papers was published in 2017 (23 out of 214, about 11%). This figure demonstrates that evolutionary ARM has emerged as a popular topic in recent years.
We categorized ARM algorithms into four groups, based on EC techniques: evolution-based (Genetic Algorithm (GA)-based approaches and Differential Evolution (DE)), swarm intelligence (Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), Honey Bee-based Optimization (HBO), and Bat algorithm), physics-inspired, and hybridization. For each group, we give a brief description as well as a summary of algorithms and their characteristics. To offer a useful overview, we include a statistical review of algorithms in terms of different aspects. We also discuss current challenges and opportunities and point out potential trends and applications. In general, the objectives of this survey are as follows: (1) developing a classification framework for the application of EC techniques for ARM; (2) providing a systematic and comprehensive review; and (3) determining research gaps and proposing suggestions for directions of future research. We expect this survey to provide a reference point for researchers and data miners to be informed of the state-of-the-art evolutionary ARM methods. Investigating research applications encourages businesses and governments to pay attention to the use of evolutionary methods for knowledge discovery in their respective domains.
The rest of the paper is organized as follows: Section 2 provides background information on ARM as well as different types of patterns in ARM. Section 3 presents a classification of EC approaches for ARM. Section 4 discusses current issues and challenges. The measures used to assess the quality of the rules in the ARM process are described in Section 5. Section 6 presents the applications of evolutionary ARM approaches. In Section 7, discussion and statistical analysis of the evolutionary ARM algorithms are provided. Finally, Section 8 concludes the paper.
Section snippets
Association rule mining
ARM is the process of discovering ARs in transaction data. It is one of the most significant unsupervised methods for pattern recognition [199].
Evolutionary computation for ARM
Due to the nature of high-dimensional spaces, ARM is difficult to solve. Traditional heuristic methods cannot provide sophisticated solutions, which has resulted in the increased popularity of non-exact innovative optimization approaches known as EC algorithms. These approaches use an iterative heuristic process to search the problem space and produce a sufficiently good solution [227]. Single solution-based and population-based are the two main
Challenges and problems
Different issues need to be considered when developing and applying an evolutionary ARM algorithm. Some problems, such as large datasets, attribute values, and parameter settings, are related to both development and application. Other problems, such as membership functions (MFs) and solution encoding, are associated with the design of an algorithm. One major barrier to using metaheuristic algorithms in ARM is that considerable skill and experience are required in order to determine suitable parameters, such as minimum
Quality measures
In order to select the best set of frequent patterns, it is essential to assess their quality. Probability-based measures are often used to evaluate the generality and reliability of ARs. In ARM, the support criterion is employed to measure the generality of ARs. The measures of confidence, lift, and leverage are applied to show the reliability of ARs [189]. Support and confidence are the most widely used measures in ARM. However, even though many meaningless and redundant rules are mined,
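The four measures named in this snippet (support, confidence, lift, and leverage) have standard probability-based definitions, which can be sketched as follows. The toy transactions and the function names `support` and `rule_measures` are assumptions made for illustration; the formulas are the usual textbook ones rather than definitions quoted from this survey.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_measures(antecedent, consequent, transactions):
    """Support, confidence, lift, and leverage of antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    s_a = support(a, transactions)
    s_c = support(c, transactions)
    s_ac = support(a | c, transactions)
    confidence = s_ac / s_a if s_a else 0.0    # P(C | A)
    lift = confidence / s_c if s_c else 0.0    # P(A, C) / (P(A) * P(C))
    leverage = s_ac - s_a * s_c                # P(A, C) - P(A) * P(C)
    return {
        "support": s_ac,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
    }


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    transactions = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
        {"bread", "milk"},
    ]
    for name, value in rule_measures({"bread"}, {"milk"}, transactions).items():
        print(f"{name}: {value:.4f}")
```

On this toy data the rule bread → milk has support 0.6 and confidence 0.75, yet its lift is below 1 and its leverage is negative, because milk already appears in 80% of transactions. This is a small instance of why support and confidence alone can rank uninformative rules highly.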
Applications of evolutionary association rule mining
Optimization is applied in engineering and industry in order to minimize or maximize goals. The need for it stems from limited resources, time, and money, which makes optimization especially important in practice. Evolutionary ARM approaches have been effectively applied to a variety of domains. Generally, the major applications can be grouped into eight categories: market basket, recommendation systems, computer networks, healthcare, environment, industry, education, and road traffic. In the
Statistical review and discussion
Table 14 presents the distribution of ARM methods based on evolutionary techniques and their publication year (between 2000 and 2019). GA is the most commonly used EC approach and has been a popular algorithm since 2000, when evolutionary ARM appeared. Swarm intelligence-based algorithms are the second most widely used approach after GA-based algorithms. This is because swarm intelligence-based methods can be used to converge to both a single optimum and multiple optimal solutions. Applying
Conclusion and future work
Association rule mining is currently one of the most active data mining topics. The application of EC techniques for ARM has received a considerable amount of research attention. This paper has provided a comprehensive survey of EC approaches for solving the ARM problem. We presented a classification of evolutionary ARM algorithms along with a brief description of them and a statistical review of methods in terms of different factors. Applications and critical challenges were also discussed.
Declaration of Competing Interest
None.
References (303)
- et al. A novel bee swarm optimization algorithm for numerical function optimization. Commun. Nonlinear Sci. Numer. Simul. (2010)
- et al. Chaotically encoded particle swarm optimization algorithm and its applications. Chaos Solit. Fract. (2009)
- et al. Multi-objective rule mining using a chaotic particle swarm optimization algorithm. Knowl.-Based Syst. (2009)
- et al. MODENAR: Multi-objective differential evolution algorithm for mining numeric association rules. Appl. Soft Comput. (2008)
- et al. Learning the membership function contexts for mining fuzzy association rules by using genetic algorithms. Fuzzy Sets Syst. (2009)
- et al. Rare-PEARs: A new multi objective evolutionary algorithm to mine rare and non-redundant quantitative association rules. Knowl.-Based Syst. (2015)
- et al. Algorithmic design issues in adaptive differential evolution schemes: Review and taxonomy. Swarm Evol. Comput. (2018)
- et al. Multi-objective PSO algorithm for mining numerical association rules without a priori discretization. Expert Syst. Appl. (2014)
- et al. A survey on optimization metaheuristics. Inf. Sci. (2013)
- et al. MOGA-based fuzzy data mining with taxonomy. Knowl.-Based Syst. (2013)
- An improved approach to find membership functions and multiple minimum supports in fuzzy data mining. Expert Syst. Appl.
- GA-based multi-level association rule mining approach for defect analysis in the construction industry. Autom. Constr.
- Multi-objective optimization methods and application in energy saving. Energy
- Mining diversified association rules in big datasets: a cluster/GPU/genetic approach. Inf. Sci.
- Combining Apriori heuristic and bio-inspired algorithms for solving the frequent itemsets mining problem. Inf. Sci.
- Exploiting GPU parallelism in improving bees swarm optimization for mining big transactional databases. Inf. Sci.
- Intelligent mapping between GPU and cluster computing for discovering big association rules. Appl. Soft Comput.
- A multi-objective genetic algorithm approach to rule mining for affective product design. Expert Syst. Appl.
- Metaheuristic algorithms in modeling and optimization
- Semantic relation extraction using sequential and tree-structured LSTM with attention. Inf. Sci.
- A model-free Bayesian classifier. Inf. Sci.
- Multi-objective rule mining using genetic algorithms. Inf. Sci.
- Distributed evolutionary algorithms and their models: A survey of the state-of-the-art. Appl. Soft Comput.
- Association rule mining using hybrid GA-PSO for multi-objective optimisation
- SET-PSO-based approach for mining positive and negative association rules. Knowl. Inf. Syst.
- Mining association rules between sets of items in large databases
- Association rules optimization using improved PSO algorithm
- Fast algorithms for mining association rules in large databases
- Efficient mining of frequent itemsets and a measure of interest for association rule mining. J. Inf. Knowl. Manag.
- An efficient genetic algorithm for automated mining of both positive and negative quantitative association rules. Soft Comput.
- Rough particle swarm optimization and its applications in data mining. Soft Comput.
- Genetic learning of membership functions for mining fuzzy association rules
- Analysis of the effectiveness of the genetic algorithms based on extraction of association rules. Fundamenta Informaticae
- Multi-objective genetic algorithms based automated clustering for fuzzy association rules mining. J. Intell. Inf. Syst.
- Applications of association rule mining in health informatics: a survey. Artif. Intell. Rev.
- Performance analysis of multi-objective artificial intelligence optimization algorithms in numerical association rule mining. J. Ambient Intell. Human. Comput.
- Discovering interesting association rules: a multi-objective genetic algorithm approach. Int. J. Appl. Inf. Syst.
- An evolutionary algorithm to discover quantitative association rules from huge databases without the need for an a priori discretization. Expert Syst. Appl.
- A survey on association rule mining based on evolutionary algorithms. Int. J. Comput. Appl.
- Using metarules to organize and group discovered association rules. Data Mining Knowl. Discov.
- A generative approach to product bundling in the e-commerce domain
- Searching optimal product bundles by means of GA-based engine and market basket analysis
- Dynamic itemset counting and implication rules for market basket data. ACM SIGMOD Record
- Automatic mining of quantitative association rules with gravitational search algorithm. Int. J. Software Eng. Knowledge Eng.
- High performance evaluation of evolutionary-mined association rules on GPUs. J. Supercomput.
- NMEEF-SD: non-dominated multiobjective evolutionary algorithm for extracting fuzzy rules in subgroup discovery. IEEE Trans. Fuzzy Syst.
- Deriving support threshold values and membership functions using the multiple-level cluster-based master–slave IFG approach. Soft Comput.
- Finding suitable membership functions for fuzzy temporal mining problems using fuzzy temporal bees method. Soft Comput.
- Mining high utility itemsets
- A multiple-level genetic-fuzzy mining algorithm