Search-based fault localisation: A systematic mapping study
Introduction
In recent decades, our reliance on software in all areas of human activity has grown significantly with the spread of computer-based systems. This creates growing demands for quality and productivity, in both the production process and the resulting product.
It is generally accepted that it is not possible to create perfect software: mistakes and the defects (faults) they introduce occur and may be largely unavoidable [1]. Faults are constantly introduced and fixed during the software production and maintenance cycles. The presence of faults in software may stem from a variety of factors including, but not limited to, changes in users' needs, misunderstanding of software requirements, inadequate software design, low-quality code, poor documentation, and mistakes in the coding phase. Software can still contain faults even after extensive testing, and failures experienced after software delivery are addressed by corrective maintenance [2]. Therefore, one of the main goals during software development and evolution is to remove as many faults as possible without introducing new ones in the process.
According to the Guide to the Software Engineering Body of Knowledge (SWEBOK) [2], software maintenance poses unique technical and management challenges for software engineers, such as trying to find one fault in a software system that contains a large number of lines of code and was developed by another software engineer. In this sense, software debugging is a process aimed at finding and resolving faults that prevent the correct operation of computer software or of systems thereof.
Due to the increasing size and complexity of software projects, finding faults has become a more onerous and time-consuming task [3]. Software Fault Localisation (FL) is a vital process which refers to finding the faulty software elements (e.g. statement, line or block of code) related to failures revealed by the execution of software tests. Such a process can be laborious and time-consuming when done manually, and more so as the complexity of software projects increases. Therefore, one of the main challenges of FL activities is to decrease human effort by reducing the amount of code that must be analysed before the software faults can be precisely located. Research into FL deals mainly with the problem of developing techniques to automate (or semi-automate) the process of locating software faults. To this end, the literature [4] offers different methods to help software engineering practitioners, who often spend a significant amount of time and effort on debugging [5]. Among such methods, search-based methods have received increasing attention, and a field of research has emerged, named Search-based Fault Localisation (SBFL).
In the SBFL field, the FL problem is treated as an optimisation problem and search-based algorithms are used to automate (or partially automate) FL solutions. SBFL researchers usually apply evolutionary algorithms, such as Genetic Programming, to derive metrics that measure the odds of each program element being faulty. Each individual in the population represents a candidate suspiciousness formula, and the population is a set of solutions that evolves towards better equations for calculating how suspicious each software element is. A classical example is to rank the software statements with respect to their fault-proneness by applying an approach based on a Genetic Algorithm. However, to the best of our knowledge, there has been no effort to provide an overall analysis of the SBFL methods in the literature.
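To make the idea of a "candidate suspiciousness formula" concrete, the following minimal Python sketch shows how individuals in such a population could be represented and scored. It is an illustration under stated assumptions, not a method taken from any of the mapped studies: the per-statement counts (`ef`, `ep`, `nf`, `np_`), the toy data in `SPECTRA`, the `naive` formula, and the rank-based `fitness` function are all hypothetical. Ochiai and Tarantula are well-known hand-crafted formulas used here only as example individuals. The sketch covers representation and fitness evaluation only; a real Genetic Programming approach would additionally evolve the formulas through crossover and mutation.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical spectrum counts for one statement:
# ef / ep = failing / passing tests that execute it,
# nf / np_ = failing / passing tests that do not execute it.
Spectrum = Tuple[int, int, int, int]
Formula = Callable[[int, int, int, int], float]


def ochiai(ef: int, ep: int, nf: int, np_: int) -> float:
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


def tarantula(ef: int, ep: int, nf: int, np_: int) -> float:
    fail = ef / (ef + nf) if ef + nf else 0.0
    pas = ep / (ep + np_) if ep + np_ else 0.0
    return fail / (fail + pas) if fail + pas else 0.0


def naive(ef: int, ep: int, nf: int, np_: int) -> float:
    # Deliberately poor individual: only counts how often a statement runs.
    return float(ef + ep)


def rank_statements(formula: Formula, spectra: Dict[str, Spectrum]) -> List[str]:
    # Most suspicious statement first; ties keep insertion order.
    return sorted(spectra, key=lambda s: formula(*spectra[s]), reverse=True)


def fitness(formula: Formula, spectra: Dict[str, Spectrum], faulty: str) -> int:
    # Assumed fitness: 1-based rank of the known faulty statement (lower is better).
    return rank_statements(formula, spectra).index(faulty) + 1


# Toy program with four statements and four tests (two failing, two passing).
SPECTRA: Dict[str, Spectrum] = {
    "s1": (2, 2, 0, 0),
    "s2": (2, 0, 0, 2),  # assumed to be the faulty statement
    "s3": (1, 2, 1, 0),
    "s4": (0, 1, 2, 1),
}

POPULATION: Dict[str, Formula] = {
    "ochiai": ochiai,
    "tarantula": tarantula,
    "naive": naive,
}

# Select the best individual of this (static) population.
best_name = min(POPULATION, key=lambda name: fitness(POPULATION[name], SPECTRA, "s2"))
```

In a search-based setting, the selection step at the end would be repeated over many generations, with new formulas produced from the fittest ones, so that the ranking position of known faults improves over time.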
To propose new SBFL methods that reduce the fault localisation effort, and to investigate how they are employed and evaluated, we need to examine and characterise existing methods. With this in mind, and to contribute to the development of the SBFL field, this paper reports the results of a mapping study on SBFL methods. The overall objective is to systematically identify, analyse, and describe the state-of-the-art advances in the field.
In our mapping we followed a research plan, according to guidelines of Kitchenham et al. [6], including research questions, inclusion and exclusion criteria, construction of the search string and selection of known search databases. We found 14 primary studies, which are analysed considering the following dimensions: i) main fora and frequency of publications over the years; ii) research interests addressed in the field; iii) main characteristics of the proposed methods such as used algorithms, search process aspects and evaluation functions used; and iv) evaluation aspects regarding baselines used in the evaluations, identified benchmarks and evaluation measures.
As a further contribution of our mapping, we discuss the main gaps identified by analysing the studies found. These gaps constitute research opportunities to guide future work in the field.
The paper is organised as follows. Section 2 reviews FL background and related work. Section 3 describes the protocol and procedure adopted in our mapping. The search process and the data extraction are in Sections 4 and 5 respectively. The main results and findings are analysed in Sections 5.1–5.5, which provide answers to our research questions. Section 6 summarises our findings by presenting the gaps and trends identified and the research opportunities derived from them. Section 7 details the main threats to the validity of our results and how they were mitigated. Section 8 concludes the paper.
Background
The terms error, fault (defect), and failure are defined, respectively, as “erroneous state of the system”, “defect in a system or a representation of a system that if executed/activated could potentially result in an error”, and “an externally visible deviation from the systems specification” [7]. The Standard IEEE 1044 (2009) [8] states that “a failure may be caused by (and thus indicate the presence of) a fault” and “a fault may cause one or more failures”. We adhere to this terminology in
Planning of the systematic mapping study
Following the guidelines of Kitchenham et al. [6] we created a protocol and structured our mapping study process into seven stages as illustrated in Fig. 1. Such stages are based on [19], [20] and are briefly introduced below:
1. the need and relevance motivates the mapping study and states the research questions (Section 3.1);
2. planning of the study refers to the main steps needed to carry out the mapping study and outlines its structure (this section);
3. search for primary studies seeks relevant
Search for primary studies
When searching for relevant studies, we follow the argumentation of Wohlin et al. [22] for achieving a good sample instead of exhaustively finding all primary studies. The systematic choice of the sources (e.g. quality indexed databases) and the scanning methods (e.g. application of search strings) promotes a better representation of the population for the targeted topic.
The following databases were selected, as recommended by Kitchenham et al. [6] and Petersen et al. [20]:
1. IEEExplore (//ieeexplore.ieee.org
Data extraction, classification and visualisation
The relevant studies were read in detail to extract the data needed to answer the research questions. The first two authors read all the primary studies in full, and each independently analysed them and extracted data. The full-text analysis included annotating a digital version of each paper with colours and comments, so that each colour was related to a research question (e.g. brown for the benchmarks, orange for evaluation measures, and so on). The analyses were compared and disagreements
Summary of results and research opportunities
In this section, we present a synthesis of our findings regarding the SBFL field and identify research opportunities.
Threats to validity
We present some threats to the validity of our results by following the guidelines of Wohlin et al. [47].
Construct validity refers to the relation between the theory behind the experiment and the observation(s), i.e. what the researcher has in mind and what is investigated according to the research questions. Regarding the research questions, we defined them in discussion meetings to reach alignment with the goals of the mapping, so it became a mitigated threat. On the search string, it impacts
Concluding remarks
Search-based fault localisation (SBFL) is the research field that deals with the use of optimisation techniques to automate, or partially automate, the location of faulty code. As faults are constantly introduced and fixed during the software lifecycle and locating faults is a very time-consuming task, SBFL is an important research subject whose first research initiatives occurred in 2011 (according to our findings). We observed an increasing interest in the field since 2017, given by the
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (61)
- et al. Search-based software engineering. Inf. Softw. Technol. (2001)
- et al. Lessons from applying the systematic literature review process within the software engineering domain. J. Syst. Softw. (2007)
- et al. Guidelines for conducting systematic mapping studies in software engineering: an update. Inf. Softw. Technol. (2015)
- et al. On the reliability of mapping studies in software engineering. J. Syst. Softw. (2013)
- et al. A new strategy for automotive off-board diagnosis based on a meta-heuristic engine. Eng. Appl. Artif. Intell. (2011)
- et al. Inferring links between concerns and methods with multi-abstraction vector space model. 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME) (2016)
- et al. Localizing multiple software faults based on evolution algorithm. J. Syst. Softw. (2018)
- ISO/IEC/IEEE, Software and Systems Engineering, Software Testing, Part 1: Concepts and Definitions, International...
- et al. Guide to the Software Engineering Body of Knowledge SWEBOK Version 3.0 (2014)
- Expertise in debugging computer programs: An analysis of the content of verbal protocols. IEEE Trans. Syst. Man Cybern. (1986)
- A survey on software fault localization. IEEE Trans. Softw. Eng.
- Practitioners’ expectations on automated fault localization. Proceedings of the 25th International Symposium on Software Testing and Analysis
- Guidelines for Performing Systematic Literature Reviews in Software Engineering, Technical Report EBSE-2007-01, School of Computer Science and Mathematics
- The use of program profiling for software maintenance with applications to the year 2000 problem. SIGSOFT Softw. Eng. Notes
- An empirical investigation of program spectra. SIGPLAN Not.
- On the accuracy of spectrum-based fault localization. Proceedings of the Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION
- Visualization of test information to assist fault localization. Proceedings of the 24th International Conference on Software Engineering. ICSE 2002
- An evaluation of similarity coefficients for software fault localization. Proceedings of the 12th Pacific Rim International Symposium on Dependable Computing
- A model for spectra-based software diagnosis. ACM Trans. Softw. Eng. Methodol.
- Empirical software engineering and verification
- Software fault localisation: a systematic mapping study. IET Softw.
- Systematic mapping studies in software engineering. Proceedings of the 12th International Conference on Evaluation and Assessment in Software Engineering
- Fault-localization techniques for software systems: a literature review. SIGSOFT Softw. Eng. Notes
- The current state and future of search based software engineering. 2007 Future of Software Engineering
- A systematic review of the application and empirical investigation of search-based test case generation. IEEE Trans. Softw. Eng.
- Optimization of guided wave sensors distribution along thin-walled small-diameter pipe. J. Central South Univ. (Sci. Technol.)
- Search-based software engineering: trends, techniques and applications. ACM Comput. Surv.
- Precise distributed localization algorithms for wireless networks. Sixth IEEE International Symposium on a World of Wireless Mobile and Multimedia Networks