Mining association rules on Big Data through MapReduce genetic programming
Created by W.Langdon from
gp-bibliography.bib Revision:1.7989
@Article{PadilloLHV18,
author = "Francisco Padillo and Jose Maria Luna and
Francisco Herrera and Sebastian Ventura",
title = "Mining association rules on Big Data through
{MapReduce} genetic programming",
journal = "Integrated Computer-Aided Engineering",
year = "2018",
volume = "25",
number = "1",
pages = "31--48",
keywords = "genetic algorithms, genetic programming, Association
rules, Big Data, MapReduce, Hadoop, Spark",
ISSN = "1069-2509",
publisher = "IOS Press",
DOI = "doi:10.3233/ICA-170555",
size = "18 pages",
abstract = "Association rule mining is one of the most important
tasks to describe raw data. Although many efficient
algorithms have been developed to this aim, existing
algorithms do not work well on huge volumes of data.
The aim of this paper is to propose a new genetic
programming algorithm for mining association rules in
Big Data. The genetic operators of our proposal have
been specifically designed to avoid a growing in the
complexity of the solutions without an improvement in
their fitness function values. Furthermore, it
introduces a repairing operator to improve the
convergence. Additionally, to facilitate its
application on real world problems a grammar has been
included, allowing it to introduce subjective knowledge
into the mining process and to reduce the search space.
Due to the growing interest in data gathering, a unique
implementation of the proposed algorithm is not useful
so different implementations (considering different
architectures such as RMI, Hadoop and Spark) are
required depending on the data size. All these
adaptations obtain exactly the same solutions as those
of the original algorithm since they only differ on the
software architectures. The experimental study
considers more than 75 datasets and 14 algorithms and
the results reveal that the proposed algorithm obtains
excellent results for more than 12 quality measures.
The scalability of the proposal is also analysed by
considering the three parallel implementations on high
dimensional datasets (3,000 millions of instances) and
file sizes up to 800 GB.",
}