Scale Genetic Programming for large Data Sets: Case of Higgs Bosons Classification

https://doi.org/10.1016/j.procs.2018.07.264
Under a Creative Commons license
open access

Abstract

Extracting knowledge and significant information from very large data sets is a central topic in data science and attracts considerable interest from machine learning researchers. Several machine learning techniques, such as deep neural networks, have proven effective at handling massive data. Evolutionary algorithms are generally considered poorly suited to such problems because of their relatively high computational cost. This work attempts to show that, with some extensions, evolutionary algorithms can be an interesting solution for learning from very large data sets. We propose using Cartesian Genetic Programming (CGP) as a meta-heuristic approach to learn from the Higgs big data set. CGP is extended with an active sampling technique to help the algorithm cope with the volume of the provided data. The proposed method handles the complete benchmark data set of 11 million events and produces satisfactory preliminary results.
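To make the idea of pairing CGP with sampling concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single-row CGP genome, a small hypothetical function set (`FUNCS`), a (1+4) evolution strategy, accuracy as the fitness measure, and the simplest possible sampling scheme, in which a fresh uniform random subset of the training data is drawn at every generation. None of these choices is stated in the abstract; all names (`evolve`, `sample_size`, and so on) are illustrative.

```python
import random

# Hypothetical function set; the abstract does not list the one actually used.
FUNCS = [
    lambda a, b: a + b,
    lambda a, b: a - b,
    lambda a, b: a * b,
    lambda a, b: max(a, b),
]

def random_genome(n_inputs, n_nodes, rng):
    """Single row of nodes; node i may read any input or any earlier node."""
    nodes = [
        (rng.randrange(len(FUNCS)),
         rng.randrange(n_inputs + i),
         rng.randrange(n_inputs + i))
        for i in range(n_nodes)
    ]
    output = rng.randrange(n_inputs + n_nodes)
    return nodes, output

def evaluate(genome, x):
    """Feed-forward evaluation of the node graph on one feature vector."""
    nodes, output = genome
    values = list(x)
    for f, a, b in nodes:
        values.append(FUNCS[f](values[a], values[b]))
    return values[output]

def predict(genome, x):
    """Binary decision: class 1 if the program output is positive."""
    return 1 if evaluate(genome, x) > 0 else 0

def mutate(genome, n_inputs, rng, rate=0.1):
    """Point mutation: each gene is resampled with probability `rate`."""
    nodes, output = genome
    new_nodes = []
    for i, (f, a, b) in enumerate(nodes):
        if rng.random() < rate:
            f = rng.randrange(len(FUNCS))
        if rng.random() < rate:
            a = rng.randrange(n_inputs + i)
        if rng.random() < rate:
            b = rng.randrange(n_inputs + i)
        new_nodes.append((f, a, b))
    if rng.random() < rate:
        output = rng.randrange(n_inputs + len(nodes))
    return new_nodes, output

def accuracy(genome, X, y, idx):
    """Fraction of correctly classified rows among the sampled indices."""
    return sum(predict(genome, X[i]) == y[i] for i in idx) / len(idx)

def evolve(X, y, n_nodes=20, generations=50, sample_size=200,
           offspring=4, seed=0):
    """(1+offspring) evolution with a training sample renewed each generation."""
    rng = random.Random(seed)
    n_inputs = len(X[0])
    parent = random_genome(n_inputs, n_nodes, rng)
    for _ in range(generations):
        # Assumed sampling scheme: a fresh uniform random subset per generation.
        idx = rng.sample(range(len(X)), sample_size)
        best, best_fit = parent, accuracy(parent, X, y, idx)
        for _ in range(offspring):
            child = mutate(parent, n_inputs, rng)
            fit = accuracy(child, X, y, idx)
            if fit >= best_fit:  # ties favor the child (neutral drift)
                best, best_fit = child, fit
        parent = best
    return parent
```

In this sketch the cost of one generation depends on `sample_size` and not on the total number of events, which is the property that makes an evolutionary search affordable on a data set of millions of rows. The parent is re-evaluated on each new sample so that it is compared with its offspring on the same data.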

Keywords

Cartesian Genetic Programming
Active Sampling
Higgs Bosons Classification
large dataset
Machine Learning
