Search-based fault localisation: A systematic mapping study

https://doi.org/10.1016/j.infsof.2020.106295

Abstract

Context

Software Fault Localisation (FL) refers to finding the faulty software elements related to failures produced by test case execution. This is a laborious and time-consuming task. To allow FL automation, search-based algorithms have been successfully applied, giving rise to the field of Search-Based Fault Localisation (SBFL). However, to the best of our knowledge, no study maps the SBFL field, and we believe such a map is important to promote new advances in this field.

Objective

To present the results of a mapping study on SBFL, characterising the proposed methods and identifying the sources of information used, the evaluation functions adopted, the algorithms applied, and elements of the reported experiments.

Method

Our mapping followed a defined process and a search protocol. The conducted analysis considers different dimensions and categories related to the main characteristics of SBFL methods.

Results

All methods are grounded on the coverage spectra category. Overall, the methods search for solutions related to suspiciousness formulae to identify possibly faulty code elements. Most studies use evolutionary algorithms, mainly Genetic Programming, with a single-objective function. There is little investigation of real- and multiple-fault scenarios, and the subjects are mostly written in C and Java. No consensus was observed on how to apply the evaluation metrics.

Conclusions

Search-based fault localisation has seen a rise in interest in the past few years, and the number of studies has been growing. We identified research opportunities such as exploring new sources of fault data, exploring multi-objective algorithms, analysing benchmarks according to classes of faults, and adopting a unified definition for evaluation measures.

Introduction

In recent decades, our reliance on software in all areas of human activity has increased significantly as a result of the growing use of computer-based systems. This reliance brings growing demands for quality and productivity, both in the production processes and in the generated product.

It is generally accepted that it is not possible to create perfect software: mistakes occur and introduced defects (faults) are largely unavoidable [1]. Faults are constantly introduced and fixed during the software production and maintenance cycles. Their presence may stem from a variety of factors including, but not limited to, changes in users’ needs, misunderstanding of software requirements, inadequate software design, low-quality code, poor documentation, and mistakes in the coding phase. Software can still contain faults even after extensive testing, and failures experienced after software delivery are addressed by corrective maintenance [2]. Therefore, one of the main goals during software development and evolution is to remove as many faults from the software as possible without introducing new ones while doing so.

According to the Guide to the Software Engineering Body of Knowledge (SWEBOK) [2], software maintenance poses unique technical and management challenges for software engineers, such as trying to find a fault in a system that contains a large number of lines of code and was developed by another engineer. In this sense, software debugging is a process aimed at finding and resolving faults that prevent the correct operation of computer software or of systems thereof.

Due to the increasing size and complexity of software projects, finding faults has become a more onerous and time-consuming task [3]. Software Fault Localisation (FL) is a vital process that refers to finding the faulty software elements (e.g. a statement, line or block of code) related to failures revealed by the execution of software testing activities. Done manually, this process becomes laborious and time-consuming as the complexity of software projects increases. Therefore, one of the main challenges of FL is to decrease human effort by reducing the amount of code analysed before the software faults can be precisely located. Research into FL deals mainly with developing techniques to automate (or semi-automate) the process of locating software faults. To this end, the literature [4] offers different methods to help software engineering practitioners, who often spend a significant amount of time and effort on debugging [5]. Among such methods, search-based methods have received increasing attention, and a field of research has emerged, named Search-Based Fault Localisation (SBFL).

In the SBFL field the FL problem is treated as an optimisation problem, and search-based algorithms are used to automate (or partially automate) FL solutions. SBFL researchers usually apply evolutionary algorithms, such as Genetic Programming, to derive metrics that measure the likelihood of each program element being faulty. Each individual in the population represents a candidate suspiciousness formula, and the population is a set of solutions that evolves towards better equations for calculating how suspicious each software element is. A classical example is to rank program statements with respect to their fault-proneness using a Genetic Algorithm-based approach; however, to the best of our knowledge, there has been no effort to provide an overall analysis of the SBFL methods in the literature.
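To make the notion of a suspiciousness formula concrete, the sketch below (our illustration, not taken from any of the surveyed methods) scores statements from their coverage spectra using the well-known Ochiai formula; an evolved formula would simply replace `ochiai` with the expression encoded by a GP individual. The counts ef/ep (failing/passing tests that cover a statement) and nf/np (failing/passing tests that do not) are the standard spectrum summary.

```python
import math

def ochiai(ef, ep, nf, np_):
    """Ochiai suspiciousness: ef / sqrt((ef + nf) * (ef + ep))."""
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

def rank_statements(spectra):
    """spectra maps statement id -> (ef, ep, nf, np).
    Returns statement ids from most to least suspicious."""
    scores = {s: ochiai(*counts) for s, counts in spectra.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy spectra: 's3' is covered by both failing tests and only one
# passing test, so it should rank highest.
spectra = {
    "s1": (1, 3, 1, 0),  # covered by 1 failing and 3 passing tests
    "s2": (0, 2, 2, 1),  # covered by no failing test
    "s3": (2, 1, 0, 2),  # covered by 2 failing and 1 passing test
}
print(rank_statements(spectra))  # → ['s3', 's1', 's2']
```

The developer would then inspect statements in ranked order, which is why FL research measures effort as the amount of code examined before the fault is reached.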

In order to propose new SBFL methods that reduce the fault localisation effort, and to investigate how they are employed and evaluated, we need to examine and characterise existing methods. Considering this fact and to contribute to the development of the SBFL field, this paper provides the results of a mapping study on SBFL methods. The overall objective is to systematically identify, analyse, and describe state-of-the-art advances in the field.

In our mapping we followed a research plan, according to the guidelines of Kitchenham et al. [6], including research questions, inclusion and exclusion criteria, construction of the search string and selection of known search databases. We found 14 primary studies, which are analysed considering the following dimensions: i) main fora and frequency of publications over the years; ii) research interests addressed in the field; iii) main characteristics of the proposed methods, such as the algorithms used, search process aspects and evaluation functions; and iv) evaluation aspects, covering baselines, identified benchmarks and evaluation measures.

As a contribution of our mapping, we also discuss the main gaps identified by analysing the selected studies. These gaps constitute research opportunities to guide future research in the field.

The paper is organised as follows. Section 2 reviews FL background and related work. Section 3 describes the protocol and procedure adopted in our mapping. The search process and the data extraction are in Sections 4 and 5, respectively. The main results and findings are analysed in Sections 5.1–5.5, which provide answers to our research questions. Section 6 summarises our findings, presenting the gaps and trends identified and the derived research opportunities. Section 7 details the main threats to the validity of our results and how they were mitigated. Section 8 concludes the paper.

Section snippets

Background

The terms error, fault (defect), and failure are defined, respectively, as “erroneous state of the system”, “defect in a system or a representation of a system that if executed/activated could potentially result in an error”, and “an externally visible deviation from the system’s specification” [7]. The standard IEEE 1044-2009 [8] states that “a failure may be caused by (and thus indicate the presence of) a fault” and “a fault may cause one or more failures”. We adhere to this terminology in

Planning of the systematic mapping study

Following the guidelines of Kitchenham et al. [6] we created a protocol and structured our mapping study process into seven stages as illustrated in Fig. 1. Such stages are based on [19], [20] and are briefly introduced below:

  • 1. the need and relevance motivates the mapping study and states the research questions (Section 3.1);

  • 2. planning of the study refers to the main steps needed to carry out the mapping study and outlines its structure (this section);

  • 3. search for primary studies seeks relevant

Search for primary studies

When searching for relevant studies, we follow the argumentation of Wohlin et al. [22] for achieving a good sample instead of exhaustively finding all primary studies. The systematic choice of the sources (e.g. quality indexed databases) and the scanning methods (e.g. application of search strings) promotes a better representation of the population for the targeted topic.

The following databases were selected, as recommended by Kitchenham et al. [6] and Petersen et al. [20]:

Data extraction, classification and visualisation

The relevant studies were read in detail to extract the data needed to answer the research questions. The first two authors read all the primary studies in full, and both independently analysed and extracted data. The full-text analysis included annotating a digital version of each paper using colours and comments, with each colour related to a research question (e.g. brown for benchmarks, orange for evaluation measures, and so on). The analyses were compared and disagreements

Summary of results and research opportunities

In this section, we present a synthesis of our findings regarding the SBFL field and identify research opportunities.

Threats to validity

We present some threats to the validity of our results by following the guidelines of Wohlin et al. [47].

Construct validity refers to the relation between the theory behind the experiment and the observation(s), i.e. what the researcher has in mind and what is investigated according to the research questions. We defined the research questions in discussion meetings to reach alignment with the goals of the mapping, thereby mitigating this threat. On the search string, it impacts

Concluding remarks

Search-based fault localisation (SBFL) is the research field that deals with the use of optimisation techniques to automate, or partially automate, the location of faulty code. As faults are constantly introduced and fixed during the software lifecycle, and locating them is a very time-consuming task, SBFL is an important subject whose first research initiatives occurred in 2011 (according to our findings). We observed an increasing interest in the field since 2017, given by the

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (61)

  • W.E. Wong et al., A survey on software fault localization, IEEE Trans. Softw. Eng. (2016)
  • P.S. Kochhar et al., Practitioners’ expectations on automated fault localization, Proceedings of the 25th International Symposium on Software Testing and Analysis (2016)
  • B. Kitchenham et al., Guidelines for Performing Systematic Literature Reviews in Software Engineering, Technical Report EBSE-2007-01, School of Computer Science and Mathematics (2007)
  • ISO/IEC, Systems and Software Engineering, Systems and Software Assurance, Part 1: Concepts and Vocabulary,...
  • IEEE, IEEE standard classification for software anomalies - redline, IEEE Std 1044–2009 (Revision of IEEE Std...
  • T. Reps et al., The use of program profiling for software maintenance with applications to the year 2000 problem, SIGSOFT Softw. Eng. Notes (1997)
  • M.J. Harrold et al., An empirical investigation of program spectra, SIGPLAN Not. (1998)
  • R. Abreu et al., On the accuracy of spectrum-based fault localization, Proceedings of the Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION (2007)
  • J.A. Jones et al., Visualization of test information to assist fault localization, Proceedings of the 24th International Conference on Software Engineering, ICSE 2002 (2002)
  • R. Abreu et al., An evaluation of similarity coefficients for software fault localization, Proceedings of the 12th Pacific Rim International Symposium on Dependable Computing (2006)
  • L. Naish et al., A model for spectra-based software diagnosis, ACM Trans. Softw. Eng. Methodol. (2011)
  • M. Harman et al., Empirical software engineering and verification
  • A. Zakari et al., Software fault localisation: a systematic mapping study, IET Softw. (2019)
  • K. Petersen et al., Systematic mapping studies in software engineering, Proceedings of the 12th International Conference on Evaluation and Assessment in Software Engineering (2008)
  • P. Agarwal et al., Fault-localization techniques for software systems: a literature review, SIGSOFT Softw. Eng. Notes (2014)
  • M. Harman, The current state and future of search based software engineering, 2007 Future of Software Engineering (2007)
  • S. Ali et al., A systematic review of the application and empirical investigation of search-based test case generation, IEEE Trans. Softw. Eng. (2010)
  • W. Wang et al., Optimization of guided wave sensors distribution along thin-walled small-diameter pipe, J. Central South Univ. (Sci. Technol.) (2016)
  • M. Harman et al., Search-based software engineering: trends, techniques and applications, ACM Comput. Surv. (2012)
  • S. Biaz et al., Precise distributed localization algorithms for wireless networks, Sixth IEEE International Symposium on a World of Wireless Mobile and Multimedia Networks (2005)