High Energy Physics event selection with Gene Expression Programming
Introduction
High Energy Physics, due to the vast and complex data expected from current and future experiments, is in need of powerful and efficient techniques for various analysis tasks. While methods like Artificial Neural Networks, Fisher Discriminant or Likelihood Estimators have already been used in the past or current experiments, Evolutionary Algorithms are less explored.
Evolutionary Algorithms (EA) are computer algorithms inspired by biological evolutionary theories. They encode the candidate solution to the problem at hand into an entity understood by the computer, called a chromosome. The optimal solution is sought through an optimisation process guided by an objective function, called the fitness function, which quantifies how good the candidate solution is. The optimisation process uses a generate-and-test cycle. An initial set of candidate solutions (called a generation) is created, usually randomly. Then new solutions are generated from the current ones applying a set of operators on the chromosomes. These operators, called genetic operators, simulate the genetic reproduction from biological evolutionary theories. The new solutions are then tested with the fitness function and the best ones are selected for further modifications (reproduction).
The EAs contain, traditionally, four types of algorithms: Genetic Algorithms (GA) [1], Genetic Programming (GP) [2], Evolutionary Strategies (ES) [3] and Evolutionary Programming (EP) [4]. The main differences among them are in the chromosome representation and in the reproduction methods. ES and EP represent the chromosome as a vector of real numbers, while in GA the chromosome is usually a fixed length binary string and in GP a LISP expression translated graphically into a tree. The common reproduction method is based on the mutation and cross-over operators that have different definitions in each algorithm. These operators are applied directly on the candidate solution, limiting the performance of the algorithms. In GP, for example, many syntactically invalid structures are produced through genetic variation, wasting computational resources.
Gene Expression Programming (GEP) [5] is a new evolutionary algorithm which overcomes some of the limitations of GA and GP. It works with two entities called the chromosome and the expression tree. The chromosome is the encoder of the candidate solution which is translated into an expression tree (the actual candidate solution). This translation process of the chromosome into an expression tree can be seen as similar to the expression of the biological genes encoded in DNA into proteins. The GEP genetic operators are applied on the chromosome, not directly on the candidate solution (expression tree). This reproduction method together with the structural organisation of the chromosome and its translation process into an expression tree allows unconstrained genetic modifications, always producing valid expression trees. Previous studies [6] showed that these characteristics allow GEP to outperform GP by two to four orders of magnitude in terms of convergence speed for benchmark symbolic regression and classification problems.
The potential of GEP has started to be exploited in different fields such as data and text mining [7], [8], [9], [10], electrical circuit modelling [11], and various engineering applications [12].
In high energy physics, EAs were successfully tested but not largely used, mainly due to their high computational needs. GAs were applied mainly to problems such as discrimination and parameter optimisation in both experimental and theoretical studies (for example, see [13], [14], [15], [16]). GP was recently applied to event selection type problems [17], [18], [19]. ES were tested for optimisation of event selection criteria [20].
This paper presents the application of GEP to particle physics data analysis, extending the first analysis presented in [21] for more datasets and more complex chromosome configurations. The algorithm is used for signal/background separation in the off-line event selection stage of the data analysis process. The purpose of the study is the evaluation of the potential of the algorithm for solving such a problem rather than the extraction of particular physics results.
Section snippets
Algorithm
The GEP algorithm is described in detail in [5] and [6]. The main ideas are summarised here.
The first steps, and the most difficult ones, of the application of GEP (and of any EA) are the problem definition, the encoding of the candidate solution and the definition of the fitness function. The encoding and the fitness functions are specific to each problem. Adequate choices are crucial for the success of the algorithm. In making these choices knowledge about the problem and about the expected
Methodology
In this study GEP was applied to an event selection problem using a supervised statistical learning approach. The algorithm was used to extract selection criteria for the signal/background classification from training data samples for which it was known to which class the event belonged. The generalisation power of the extracted selection criteria was tested by applying the criteria on similar but independent data samples (test data) and comparing the classification provided by these criteria
Analysis and results
Two analyses, with two sets of input functions, were performed on datasets with the signal to background ratio () equal to 0.25, 1 and 5. The results presented below required little computational resources. For the most complex configurations, approximately 60 minutes were needed in order to reach the convergence of the search process, on a computer with a processor.
Conclusions
A newly developed Evolutionary Algorithm, Gene Expression Programming, and its application to an event selection problem in high energy physics data analysis were introduced in this study. In order to evaluate the potential of the algorithm for solving such a problem, the selection of signal events from background events produced in the interaction at 10 GeV was chosen for investigation. This decay channel, being well understood theoretically and experimentally, was accurately
Acknowledgements
Support from the UK Particle Physics and Astronomy Research Council (PPARC) is gratefully acknowledged.
References (30)
A genetic algorithm analysis of resonances in reactions
Nuclear Physics A
(2004)Genetic algorithm for SUSY trigger optimisation in CMS detector at LHC
Nuclear Instruments and Methods in Physics Research A
(2003)- et al.
PhysicsGP: A genetic programming approach to event selection
Computer Physics Communications
(2005) Search for and using genetic programming event selection
Physics Letters B
(2005)- et al.
Parametric optimisation with evolutionary strategies in particle physics
Nuclear Instruments and Methods in Physics Research A
(2004) The BaBar detector
Nuclear Instruments and Methods in Physics Research A
(2002)High Energy Physics event generation with PYTHIA 5.7 and JETSET 7.4
Computer Physics Communications
(1994)The EvtGen particle decay simulation package
Nuclear Instruments and Methods in Physics Research A
(2001)GEANT4—a simulation toolkit
Nuclear Instruments and Methods in Physics Research A
(2003)Adaptation in Natural and Artificial Systems
(1975)
Genetic Programming: On the Programming of the Computers by Means of Natural Selection
Numerical Optimization of Computer Models
Artificial Intelligence through Simulated Evolution
Gene expression programming: A new adaptive algorithm for solving problems
Complex Systems
Gene Expression Programming: Mathematical Modelling by an Artificial Intelligence
Cited by (181)
A predictive model for determination of sand liquefaction potential based on energy method
2024, Soil Dynamics and Earthquake EngineeringPredicting the solubility of hydrogen in hydrocarbon fractions: Advanced data-driven machine learning approach and equation of state
2023, Journal of the Taiwan Institute of Chemical EngineersPredictive model of asphalt mixes’ theoretical maximum specific gravity using gene expression programming
2023, Results in EngineeringDetermining the dependence of solar cell power conversion efficiency on fabrication parameters in inorganic perovskite types by programming gene expression
2022, Physica E: Low-Dimensional Systems and NanostructuresCitation Excerpt :Fig. 3 shows the main steps of the GEP algorithm. It should be noted that the output of the genetic algorithm (GEP) depends on the fitting function and individuals are selected as a new or modified generation based on it [66,73,74]. In this study, GeneXproTools 5.0 is used to derive the PCE prediction model as a function of perovskite layer construction variables.