Pipe break prediction based on evolutionary data-driven methods with brief recorded data

https://doi.org/10.1016/j.ress.2011.03.010

Abstract

Pipe breaks often occur in water distribution networks, putting great pressure on utility managers to secure a stable water supply. However, pipe breaks are hard to detect with conventional methods. It is therefore necessary to develop reliable and robust pipe break models to assess a pipe's probability of failure and then to optimize the pipe break detection scheme. In the absence of deterministic physical models for pipe break, data-driven techniques provide a promising approach to investigating the principles underlying pipe break. In this paper, two data-driven techniques, namely Genetic Programming (GP) and Evolutionary Polynomial Regression (EPR), are applied to develop pipe break models for the water distribution system of Beijing City. Comparison with the recorded pipe break data from 1987 to 2005 showed that the models are capable of producing reliable predictions. The models can be used to prioritize pipes for break inspection and thereby improve detection efficiency.

Introduction

Pipe breaks often occur in water distribution networks, which represent the arteries of a city. They can incur large direct and indirect economic and social costs, such as water and energy loss, repair costs, traffic delays, and factory production loss due to inadequate water or service interruptions. Unfortunately, breaks in the pipe network are hard to locate because most of the pipes are buried underground and inaccessible.

According to interviews with many water utilities, the most commonly used approach to locating breaks is to listen for the sound of leaking water by connecting a pole to the pipe. Experienced workers may tell where a leak has occurred from the sound they hear. However, this empirical approach is inaccurate and inefficient. Alternatively, acoustic devices are attached to the pipes and the captured signal is analyzed with a leak-noise correlator to compute the break location. Due to the large scale of the pipe network and the high cost of the devices, it is unrealistic to install them across the entire city.

Therefore, reliable and robust pipe break models are required to assess a pipe's probability of failure, which can help in devising appropriate pipe break detection schemes and assist in locating breaks [1]. Many studies [1], [2], [3] have been dedicated to studying pipe break principles and developing predictive models with different methods, which can be broadly classified into three categories: physically based methods, statistical methods and data mining methods.

Physically based methods [2], [4] aim at discovering the physical mechanisms underlying pipe break. Although robust and comprehensive physically based methods can improve break prediction, they are hard to implement because the physical mechanisms that cause pipe breaks are too complex to be completely understood. Moreover, observing the complete pipe break process may be time-consuming and costly.

By contrast, statistical methods provide a cost-effective means of analysis [1]. They use the available historical data to identify pipe break patterns [5]. There are two types of statistical methods: deterministic and probabilistic models. Time-linear models [6] and time-exponential models [7], [8] were developed to reveal the pipe break pattern in a deterministic way. Probabilistic models were also used to formally measure the break probability of an individual pipe, such as proportional hazards models [9], time-dependent Poisson models [10], accelerated lifetime models [11], [12], Bayesian diagnostic models [13], logistic generalized linear models [3] and decision tree methods [14]. Despite the different variables considered, all these statistical methods aimed to describe pipe break rates with a unique predetermined expression in which the pipes shared the same explanatory variables. Some authors have compared the statistical models [15]. In some studies [16], [17], probabilistic approaches were introduced into the physically based methods and yielded acceptable results.

Recently, data mining methods such as genetic programming (GP) were employed to discover patterns in pipe break data sets [18]. The employment of such methods was required because of the complexity of water pipe networks. Berardi et al. [1] and Savic et al. [19], [20] used a novel hybrid data-driven method, called the Evolutionary Polynomial Regression (EPR), to model pipe breaks in water distribution systems. Parsimonious symbolic formulae were returned by GP and EPR with high accuracy in describing break occurrence in homogeneous pipe groups.

It must be noted that while using the statistical methods and the data-driven methods, pipes often need to be aggregated into homogeneous groups to obtain statistical significance, so that effective analysis can be conducted [7], [21].

Pioneered by Koza [22], genetic programming is an evolutionary algorithm-based methodology, which consists of finding computer programs that perform a user-defined task. It has been applied successfully to a broad range of applications such as automatic design, pattern recognition, etc. [23]. It has the advantage over black-box data mining methods of providing the potential to gain insight into the relationship between the variables.

Many applications of the GP technique in the literature have demonstrated its capability to develop representative models of complicated physical processes. Giustolisi [24] used GP to determine the Chézy resistance coefficient for full circular corrugated channels. Babovic and Keijzer [25] made use of GP as well as combinations of GP and other conventional models to develop rainfall-runoff models on the basis of hydro-meteorological data. The algorithm was further improved by Babovic [26] to generate interpretable formulae that take expert knowledge into account.

In this study, the authors implemented the algorithm in C++ based on the basic GP components coded by Kuhlmann and Hollick [27], and used it to develop the pipe break models. In the program, the rank selection method was used to choose individuals for the genetic operations, which included crossover and mutation. The crossover rate was set at 0.5, meaning that the first half of the individuals, ordered by fitness rank, were selected to spawn offspring. The mutation rate was set at 0.001, indicating that 0.1% of the nodes in an individual would be altered. The goodness-of-fit was evaluated using the coefficient of determination (CoD):

$$\mathrm{CoD} = 1 - \frac{\sum_{n}\left(\hat{y} - y_{obs}\right)^{2}}{\sum_{n}\left(y_{obs} - \bar{y}_{obs}\right)^{2}} = 1 - \frac{\mathrm{SSE}}{\sum_{n}\left(y_{obs} - \bar{y}_{obs}\right)^{2}} \tag{1}$$

where n is the number of samples, $\hat{y}$ is the value predicted by the model, $y_{obs}$ is the observed value, $\bar{y}_{obs}$ is the average of the observed values and SSE is the sum of squared errors.
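The fitness measure and the selection step described above can be illustrated with a minimal Python sketch. It is not the authors' C++ implementation: the function names `coefficient_of_determination` and `select_parents` are hypothetical, and rank selection is simplified here to keeping the best-ranked fraction of the population, which is one reading of the statement that the first half of the individuals spawn offspring.

```python
def coefficient_of_determination(y_pred, y_obs):
    """CoD = 1 - SSE / sum((y_obs - mean(y_obs))**2)."""
    if len(y_pred) != len(y_obs) or not y_obs:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(y_obs) / len(y_obs)
    # Sum of squared errors between predictions and observations.
    sse = sum((p - o) ** 2 for p, o in zip(y_pred, y_obs))
    # Total variation of the observations around their mean.
    total = sum((o - mean_obs) ** 2 for o in y_obs)
    if total == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - sse / total


def select_parents(population, fitness, crossover_rate=0.5):
    """Simplified rank selection (assumption): sort by fitness, best first,
    and keep the top `crossover_rate` fraction as parents."""
    ranked = sorted(zip(population, fitness), key=lambda pair: pair[1], reverse=True)
    n_parents = int(len(ranked) * crossover_rate)
    return [individual for individual, _ in ranked[:n_parents]]
```

A perfect model gives a CoD of 1, while a model that always predicts the mean of the observations gives 0, so higher values indicate a better fit.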

Evolutionary Polynomial Regression (EPR) is a hybrid data-driven technique recently developed by Giustolisi and Savic [28], which belongs to the family of genetic programming strategies. As stated by the developers, it incorporates the powerful regression capability of the conventional numerical regression techniques and the superior solution searching power of genetic programming.

The developed algorithm has been applied to various fields. Berardi et al. [1] used it to develop pipe deterioration models for water distribution systems. Elshorbagy and El-Baroudy [29] applied it to estimate the soil moisture content and compared it to the GP method. The algorithm was recently improved by Giustolisi and Savic [30] who introduced a multi-objective genetic algorithm, and tested it by developing a groundwater level prediction model based on the total monthly rainfall data.

In EPR, the generation of functions is not as random as in genetic programming. There are several optional forms of functions [28] and this study used the following form:

$$Y = a_{0} + \sum_{j=1}^{m} a_{j}\,(X_{1})^{ES(j,1)} \cdots (X_{k})^{ES(j,k)}\, f\!\left((X_{1})^{ES(j,k+1)} \cdots (X_{k})^{ES(j,2k)}\right) \tag{2}$$

where $X_{k}$ is the kth explanatory variable, ES is the matrix of unknown exponents, $a_{j}$ are unknown polynomial coefficients, m is the number of polynomial terms, $a_{0}$ is the bias term and f is a function selected by the user.
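To make the structure of this expression concrete, the following Python sketch evaluates it for one sample. It is only an illustration under stated assumptions: the name `epr_predict` and its argument layout are hypothetical and do not come from the EPR toolbox, and passing `f=None` is a convenience added here to drop the inner function so that each term reduces to a plain product of powers.

```python
import math


def epr_predict(x, a0, coeffs, es, f=None):
    """Evaluate Y = a0 + sum_j a_j * prod_i x_i**ES[j][i] * f(prod_i x_i**ES[j][k+i]).

    x      : list of k explanatory variable values for one sample
    coeffs : list of m polynomial coefficients a_j
    es     : m rows of exponents; 2k columns if f is given, k columns otherwise
    f      : user-selected function, or None to omit the inner function
    """
    k = len(x)
    y = a0
    for a_j, row in zip(coeffs, es):
        term = a_j
        # Outer product of powers: columns 0 .. k-1 of the exponent row.
        for i in range(k):
            term *= x[i] ** row[i]
        if f is not None:
            # Inner product of powers: columns k .. 2k-1 of the exponent row.
            inner = 1.0
            for i in range(k):
                inner *= x[i] ** row[k + i]
            term *= f(inner)
        y += term
    return y


# One term, two variables: Y = 1 + 0.5 * X1 * X2**2
y_plain = epr_predict([2.0, 3.0], 1.0, [0.5], [[1, 2]])
# Same outer term multiplied by ln(X2): Y = 1 + 0.5 * X1 * X2**2 * ln(X2)
y_log = epr_predict([2.0, 3.0], 1.0, [0.5], [[1, 2, 0, 1]], f=math.log)
```

An exponent of zero removes a variable from a term, which is how a search over ES can produce parsimonious formulae.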

Symbolic models were constructed by EPR in two stages: (i) model structure search using a genetic algorithm (GA) and (ii) parameter estimation by means of the least squares (LS) method. The formulae were also evaluated and selected using the CoD (Eq. (1)).
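The two-stage idea can be sketched in Python as follows. This is a deliberately reduced illustration, not the EPR toolbox: an exhaustive search over a small set of candidate exponents stands in for the genetic algorithm, the model is limited to a single term with no inner function, and all names (`fit_single_term`, `cod`, `search_structure`) as well as the data are hypothetical and synthetic.

```python
import math
from itertools import product


def fit_single_term(t, y):
    """Stage (ii): closed-form least squares fit of y = a0 + a1 * t."""
    n = len(t)
    mean_t = sum(t) / n
    mean_y = sum(y) / n
    stt = sum((ti - mean_t) ** 2 for ti in t)
    if stt == 0:  # constant term: only the bias can be estimated
        return mean_y, 0.0
    a1 = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y)) / stt
    return mean_y - a1 * mean_t, a1


def cod(y_pred, y_obs):
    """Coefficient of determination, 1 - SSE / total variation."""
    mean_obs = sum(y_obs) / len(y_obs)
    sse = sum((p - o) ** 2 for p, o in zip(y_pred, y_obs))
    total = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - sse / total


def search_structure(xs, y, exponents=(-2, -1, 0, 1, 2)):
    """Stage (i): try every exponent combination (exhaustive search used
    here in place of a GA) and keep the structure with the highest CoD."""
    best = None
    for es in product(exponents, repeat=len(xs[0])):
        t = [math.prod(xi ** e for xi, e in zip(row, es)) for row in xs]
        a0, a1 = fit_single_term(t, y)
        score = cod([a0 + a1 * ti for ti in t], y)
        if best is None or score > best[0]:
            best = (score, es, a0, a1)
    return best


# Synthetic data generated from Y = 3 + 2 * X1 / X2 (not data from the study).
xs = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 2.0)]
y = [3.0 + 2.0 * x1 / x2 for x1, x2 in xs]
score, es, a0, a1 = search_structure(xs, y)
```

Separating the structure search from the coefficient fit means the search only has to explore exponents, while the coefficients of each candidate follow directly from a linear least squares problem.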

In this study, the EPR toolbox [28] was used to model pipe breaks in the water distribution system.

Case study

The water distribution pipe network of Beijing City was selected as the case study in this research. There are two databases related to the pipes. The pipe property data contain the pipes' physical properties, such as diameter, length, year of installation, material and spatial information. The pipe break data, recorded over a period of 19 years (1987–2005), contain information on the diameter, the material, the year of installation, the year of break and the break cause.

Results

Symbolic equations were obtained by applying both the GP and EPR methods, and they were subsequently used for prediction. The results were then compared on several criteria, including the goodness of fit to the observed data, the parsimony of the equation and the ability to describe the physical phenomenon.

Comparison between the GP and EPR models

It is clear that Eqs. (3), (4) are quite similar in structure and performance. Breaks in each pipe group in the period of 1987–2002 were calculated by Eqs. (3), (4), and it was found that they fitted the recorded data well, as shown in Fig. 1.

The only difference between Eqs. (3), (4) resides in the coefficient. This indicates that both GP and EPR were able to identify the basic rule that describes the break variation among the pipe groups. The coefficient in the EPR equation was obtained by

Conclusion

Models to estimate pipe breaks were developed by using genetic programming (GP) and evolutionary polynomial regression (EPR). The water distribution system of Beijing City was selected as the case study area. The data on the pipe characteristics as well as break records were collected, and then grouped by pipe diameter and pipe age. The grouped data were divided into two parts according to the observation time, where breaks found from 1987 to 2002 were used for model development, and those from

Acknowledgment

The authors are grateful for the funding from the Ministry of Sciences and Technology of the People's Republic of China (no. 2006BAB17B03) and from the Chutian Scholarship (KJ2010B002), and to Dr. Koen Blanckaert for his thorough revision.

References (32)

  • T.M. Walski et al. Economic analysis of water main breaks. J Am Water Works Assoc (1982)
  • D.H. Marks et al. Predicting urban water distribution maintenance strategies: a case study of New Haven (1985)
  • Constantine AG, Darroch JN, Miller R. Predicting underground pipe failure. Aust Water Works Assoc,...
  • Eisenbeis P, Rostum J, Le Gat, Y. Statistical models for assessing the technical state of water networks—some European...
  • T.G. Watson et al. Bayesian-based pipe failure model. J Hydroinform (2004)
  • Q. Chen et al. Rule-based model for aging-induced leakage in water supply network of Beijing City. China Water & Wastewater (2008)