Pipe break prediction based on evolutionary data-driven methods with brief recorded data

doi:10.1016/j.ress.2011.03.010

Reliability Engineering & System Safety

Volume 96, Issue 8, August 2011, Pages 942-948

https://doi.org/10.1016/j.ress.2011.03.010

Abstract

Pipe breaks occur frequently in water distribution networks, putting utility managers under great pressure to secure a stable water supply. However, pipe breaks are hard to detect with conventional methods. It is therefore necessary to develop reliable and robust pipe break models that assess a pipe's probability of failure and help optimize the pipe break detection scheme. In the absence of deterministic physical models for pipe breaks, data-driven techniques provide a promising approach to investigating the principles underlying pipe breaks. In this paper, two data-driven techniques, namely Genetic Programming (GP) and Evolutionary Polynomial Regression (EPR), are applied to develop pipe break models for the water distribution system of Beijing City. Comparison with the recorded pipe break data from 1987 to 2005 showed that the models are capable of producing reliable predictions. The models can be used to prioritize pipes for break inspection and thereby improve detection efficiency.

Introduction

Pipe breaks often occur in water distribution networks, which represent the arteries of a city. They can incur large direct and indirect economic and social costs, such as water and energy loss, repair costs, traffic delays, and factory production loss due to inadequate water supply or service interruptions. Unfortunately, breaks in the pipe network are hard to locate because most of the pipes are buried underground and inaccessible.

According to interviews with many water utilities, the most commonly used approach to locating breaks is to listen for the sound of leaking water through a pole placed in contact with the pipe. Experienced workers may be able to tell where a leak has occurred from the sound they hear, but this empirical approach is inaccurate and inefficient. Alternatively, acoustic devices are attached to the pipes and the captured signal is analyzed with a leak-noise correlator to compute the break location. Because of the large scale of the pipe network and the high cost of the devices, it is unrealistic to install them across an entire city.

Therefore, reliable and robust pipe break models are required to assess a pipe's probability of failure, which can help in designing appropriate pipe break detection schemes and assist in locating breaks [1]. Many studies [1], [2], [3] have investigated the principles of pipe breaks and developed predictive models with different methods, which can broadly be classified into three categories: physically based methods, statistical methods and data mining methods.

Physically based methods [2], [4] aim at discovering the physical mechanisms underlying pipe breaks. Although robust and comprehensive physically based methods can improve break prediction, they are hard to implement because the physical mechanisms that cause pipe breaks are too complex to be completely understood. Moreover, observing the complete pipe break process may be time-consuming and costly.

In contrast, statistical methods provide a cost-effective means of analysis [1]. They use the available historical data to identify pipe break patterns [5]. There are two types of statistical models: deterministic and probabilistic. Time-linear models [6] and time-exponential models [7], [8] were developed to describe the pipe break pattern deterministically. Probabilistic models were also used to quantify the break probability of an individual pipe, such as proportional hazards models [9], time-dependent Poisson models [10], accelerated lifetime models [11], [12], Bayesian diagnostic models [13], logistic generalized linear models [3] and decision tree methods [14]. Despite the different variables considered, all these statistical methods aimed to describe pipe break rates with a single predetermined expression in which all pipes shared the same explanatory variables. Some authors have compared the statistical models [15]. In some studies [16], [17], probabilistic approaches were introduced into physically based methods with acceptable results.

Recently, data mining methods such as genetic programming (GP) were employed to discover patterns in pipe break data sets [18]. Such methods were needed because of the complexity of water pipe networks. Berardi et al. [1] and Savic et al. [19], [20] used a novel hybrid data-driven method, called Evolutionary Polynomial Regression (EPR), to model pipe breaks in water distribution systems. GP and EPR returned parsimonious symbolic formulae that described break occurrence in homogeneous pipe groups with high accuracy.

It must be noted that when statistical and data-driven methods are used, pipes often need to be aggregated into homogeneous groups to obtain statistical significance, so that effective analysis can be conducted [7], [21].
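As a rough illustration of such aggregation, the sketch below counts break records per group, using pipe diameter and an age class as the grouping keys. The record field names (`diameter_mm`, `install_year`, `break_year`), the 10-year age bin and the sample values are hypothetical choices made for this example and are not taken from the study's data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def group_breaks(records: Iterable[dict], age_bin: int = 10) -> Dict[Tuple[int, int], int]:
    """Count breaks per (diameter, age class) group.

    The age class is the pipe age at the time of the break, divided into
    bins of `age_bin` years (an assumed bin width, for illustration only).
    """
    counts: Counter = Counter()
    for rec in records:
        age = rec["break_year"] - rec["install_year"]
        counts[(rec["diameter_mm"], age // age_bin)] += 1
    return dict(counts)


# Made-up records, purely to show the call.
sample_records = [
    {"diameter_mm": 100, "install_year": 1970, "break_year": 1990},
    {"diameter_mm": 100, "install_year": 1975, "break_year": 1996},
    {"diameter_mm": 300, "install_year": 1980, "break_year": 1988},
]
grouped = group_breaks(sample_records)
```

Coarser bins give more records per group at the cost of resolution, which is the trade-off behind aggregating pipes in the first place.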

Pioneered by Koza [22], genetic programming is an evolutionary algorithm-based methodology, which consists of finding computer programs that perform a user-defined task. It has been applied successfully to a broad range of applications such as automatic design, pattern recognition, etc. [23]. It has the advantage over black-box data mining methods of providing the potential to gain insight into the relationship between the variables.

Many applications of the GP technique in the literature have demonstrated its capability to develop representative models of complicated physical processes. Giustolisi [24] used GP to determine the Chézy resistance coefficient for full circular corrugated channels. Babovic and Keijzer [25] used GP, as well as combinations of GP and other conventional models, to develop rainfall-runoff models on the basis of hydro-meteorological data. The algorithm was further improved by Babovic [26] to generate interpretable formulae that take expert knowledge into account.

In this study, the authors implemented the algorithm in C++ based on the basic GP components coded by Kuhlmann and Hollick [27], and used it to develop the pipe break models. In the program, the rank selection method was used to choose individuals for the genetic operations, which included crossover and mutation. The crossover rate was set at 0.5, meaning that the top half of the individuals, ranked by fitness, were selected to spawn offspring. The mutation rate was set at 0.001, indicating that 0.1% of the nodes in an individual would be altered. The goodness of fit was evaluated using the coefficient of determination (CoD): $CoD = 1 - \frac{\sum_{n} (\hat{y} - y_{obs})^{2}}{\sum_{n} (y_{obs} - \bar{y}_{obs})^{2}} = 1 - \frac{SSE}{\sum_{n} (y_{obs} - \bar{y}_{obs})^{2}}$ where $n$ is the number of samples, $\hat{y}$ is the value predicted by the model, $y_{obs}$ is the observed value, $\bar{y}_{obs}$ is the average of the observed values and $SSE$ is the sum of squared errors.
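The CoD definition and the rank-based choice of parents can be sketched in a few lines. The code below is an illustrative Python sketch, not the authors' C++ implementation; the function names are hypothetical, and reading the crossover rate of 0.5 as "keep the best-ranked half as parents" is an assumption based on the description above.

```python
from typing import List, Sequence


def coefficient_of_determination(y_pred: Sequence[float], y_obs: Sequence[float]) -> float:
    """CoD = 1 - SSE / sum((y_obs - mean(y_obs))^2)."""
    if len(y_pred) != len(y_obs) or len(y_obs) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(y_obs) / len(y_obs)
    sse = sum((p - o) ** 2 for p, o in zip(y_pred, y_obs))
    total = sum((o - mean_obs) ** 2 for o in y_obs)
    if total == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - sse / total


def select_parents_by_rank(fitness: Sequence[float], crossover_rate: float = 0.5) -> List[int]:
    """Return the indices of the best-ranked fraction of the population.

    Higher fitness is treated as better (e.g. a CoD closer to 1).
    """
    ranked = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)
    n_parents = int(len(ranked) * crossover_rate)
    return ranked[:n_parents]
```

A perfect prediction gives a CoD of 1, while a model that always predicts the mean of the observations gives 0, so the CoD can be used directly as the fitness passed to `select_parents_by_rank`.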

Evolutionary Polynomial Regression (EPR) is a hybrid data-driven technique recently developed by Giustolisi and Savic [28], which belongs to the family of genetic programming strategies. As stated by the developers, it incorporates the powerful regression capability of the conventional numerical regression techniques and the superior solution searching power of genetic programming.

The developed algorithm has been applied to various fields. Berardi et al. [1] used it to develop pipe deterioration models for water distribution systems. Elshorbagy and El-Baroudy [29] applied it to estimate the soil moisture content and compared it to the GP method. The algorithm was recently improved by Giustolisi and Savic [30] who introduced a multi-objective genetic algorithm, and tested it by developing a groundwater level prediction model based on the total monthly rainfall data.

In EPR, the generation of functions is not as random as in genetic programming. There are several optional forms of functions [28] and this study used the following form: $Y = a_{0} + \sum_{j=1}^{m} a_{j} \cdot (X_{1})^{ES(j,1)} \cdots (X_{k})^{ES(j,k)} \cdot f\left((X_{1})^{ES(j,k+1)} \cdots (X_{k})^{ES(j,2k)}\right)$ where $X_{k}$ is the $k$th explanatory variable, $ES$ is the matrix of unknown exponents, $a_{j}$ are unknown polynomial coefficients, $m$ is the number of polynomial terms, $a_{0}$ is the bias term and $f$ is a function selected by the user.
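To make the role of the exponent matrix concrete, the following sketch evaluates this functional form for one sample. It is a minimal illustration and not the EPR toolbox: the function name `epr_predict` and its argument layout are hypothetical, and the values in the usage note are made up.

```python
import math
from typing import Callable, Sequence


def epr_predict(
    x: Sequence[float],
    a0: float,
    a: Sequence[float],
    es: Sequence[Sequence[float]],
    f: Callable[[float], float],
) -> float:
    """Evaluate Y = a0 + sum_j a_j * prod_i x_i^ES(j,i) * f(prod_i x_i^ES(j,k+i)).

    x  : the k explanatory variables of one sample
    a  : the m term coefficients a_1..a_m
    es : m rows of 2k exponents (first k outside f, last k inside f)
    f  : the user-selected function
    """
    k = len(x)
    y = a0
    for a_j, row in zip(a, es):
        if len(row) != 2 * k:
            raise ValueError("each row of ES needs 2k exponents")
        outer = math.prod(x_i ** e for x_i, e in zip(x, row[:k]))
        inner = math.prod(x_i ** e for x_i, e in zip(x, row[k:]))
        y += a_j * outer * f(inner)
    return y
```

For example, with `x = (2.0, 3.0)`, `a0 = 1.0`, `a = [0.5]`, `es = [[1, 2, 0, 0]]` and the identity as `f`, the single term is 0.5 · 2 · 3² = 9, so the call returns 10.0. A zero exponent removes a variable from a term, which is how the search can produce parsimonious expressions.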

Symbolic models were constructed by EPR in two stages: (i) model structure search using the genetic algorithm (GA) and (ii) parameter estimation by means of the least squares (LS) method. The formulae were also evaluated and selected using the CoD (Eq. (1)).
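The two stages can be illustrated with a deliberately simplified sketch. Here stage (i) is an exhaustive enumeration of a few candidate exponents for a single-term model, which stands in for the GA search and is not the GA itself; stage (ii) is an ordinary least-squares fit via the normal equations. All function names and the toy data in the usage note are assumptions made for illustration.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_coefficients(terms: List[List[float]], y: Sequence[float]) -> Tuple[List[float], float]:
    """Stage (ii): least-squares estimate of [a0, a1, ...]; returns (coefficients, SSE)."""
    rows = [[1.0] + list(t) for t in terms]  # bias column plus one column per term
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coeffs = solve_linear(ata, aty)
    sse = sum((sum(c * v for c, v in zip(coeffs, r)) - yi) ** 2 for r, yi in zip(rows, y))
    return coeffs, sse


def search_structure(xs, y, candidates=(-1.0, 0.0, 1.0, 2.0)):
    """Stage (i) stand-in: try every exponent vector and keep the one with the lowest SSE."""
    k = len(xs[0])
    best = None
    for es in product(candidates, repeat=k):
        terms = [[math.prod(v ** e for v, e in zip(x, es))] for x in xs]
        try:
            coeffs, sse = fit_coefficients(terms, y)
        except ValueError:  # e.g. all-zero exponents duplicate the bias column
            continue
        if best is None or sse < best[2]:
            best = (es, coeffs, sse)
    return best
```

On toy data generated from y = 3 + 2x², `search_structure([[1.0], [2.0], [3.0], [4.0]], [5.0, 11.0, 21.0, 35.0])` selects the exponent 2 and recovers coefficients close to 3 and 2. Because the SSE is the numerator of the CoD's error term, minimizing it over structures fitted to the same data is equivalent to maximizing the CoD.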

In this study, the EPR toolbox [28] was used to model pipe breaks in the water distribution system.

Section snippets

Case study

The water distribution pipe network of Beijing City was selected as the case study in this research. There are two databases related to the pipes. The pipe property database contains the pipes' physical properties, such as the diameter, the length, the year of installation, the material and spatial information. The pipe break database, recorded over a period of 19 years (1987–2005), contains information on the diameter, the material, the year of installation, the year of break and the break cause.

Results

Symbolic equations were obtained by applying both the GP and EPR methods, and they were subsequently used for prediction. The results were then compared on several criteria, including the goodness of fit to the observed data, the parsimony of the equation and the ability to describe the physical phenomenon.

Comparison between the GP and EPR models

It is clear that Eqs. (3) and (4) are quite similar in structure and performance. Breaks in each pipe group in the period 1987–2002 were calculated with Eqs. (3) and (4), and the results fitted the recorded data well, as shown in Fig. 1.

The only difference between Eqs. (3), (4) resides in the coefficient. This indicates that both GP and EPR were able to identify the basic rule that describes the break variation among the pipe groups. The coefficient in the EPR equation was obtained by

Conclusion

Models to estimate pipe breaks were developed by using genetic programming (GP) and evolutionary polynomial regression (EPR). The water distribution system of Beijing City was selected as the case study area. The data on the pipe characteristics as well as break records were collected, and then grouped by pipe diameter and pipe age. The grouped data were divided into two parts according to the observation time, where breaks found from 1987 to 2002 were used for model development, and those from

Acknowledgment

The authors are grateful for the funding from the Ministry of Science and Technology of the People's Republic of China (no. 2006BAB17B03) and from the Chutian Scholarship (KJ2010B002), and to Dr. Koen Blanckaert for his thorough revision.

References (32)

P. Davis et al. A physical probabilistic model to predict failure rates in buried PVC pipelines. Reliab Eng Syst Saf (2007)

S. Yamijala et al. Statistical models for the analysis of water distribution system pipe break data. Reliab Eng Syst Saf (2009)

B. Rajani et al. Comprehensive review of structural deterioration of water mains: physically based models. Urban Water (2001)

Y. Kleiner et al. Comprehensive review of structural deterioration of water mains: statistical models. Urban Water (2001)

J. Lei et al. Statistical approach for describing failures and lifetimes of water mains. Water Sci Technol (1998)

A. Debón et al. Comparing risk of failure models in water supply networks using ROC curves. Reliab Eng Syst Saf (2010)

R. Sadiq et al. Probabilistic risk analysis of corrosion associated failures in cast iron water mains. Reliab Eng Syst Saf (2004)

L. Berardi et al. Development of pipe deterioration models for water distribution systems using EPR. J Hydroinform (2008)

A.J. Kettler et al. An analysis of pipe breakage in urban water distribution networks. Can J Civ Eng (1985)

U. Shamir et al. An analytical approach to scheduling pipe replacement. J Am Water Works Assoc (1979)

T.M. Walski et al. Economic analysis of water main breaks. J Am Water Works Assoc (1982)

D.H. Marks et al. Predicting urban water distribution maintenance strategies: a case study of New Haven (1985)

Constantine AG, Darroch JN, Miller R. Predicting underground pipe failure. Aust Water Works Assoc,...

Eisenbeis P, Rostum J, Le Gat, Y. Statistical models for assessing the technical state of water networks—some European...

T.G. Watson et al. Bayesian-based pipe failure model. J Hydroinform (2004)

Q. Chen et al. Rule-based model for aging-induced leakage in water supply network of Beijing City. China Water & Wastewater (2008)

