Artificial intelligence-based single and hybrid models for prediction of water quality in rivers: A review

https://doi.org/10.1016/j.chemolab.2020.103978

Highlights

  • Recent studies on river water quality modeling and prediction are reviewed.

  • Recent studies on the modeling of almost every type of river water quality variable are evaluated.

  • Various conventional and artificial intelligence-based single and hybrid models are reviewed.

  • Data normalization, data division, modeling performance evaluation measures, and recommendations for future works are discussed.

Abstract

The need for accurate predictions of water quality in rivers has encouraged researchers to develop new methods and to improve the predictive ability of conventional models. In recent years, artificial intelligence (AI)-based methods have been recognized as particularly powerful for this purpose. In this study, the performance of various single and hybrid AI models was investigated for the prediction of water quality in rivers, including artificial neural networks (ANNs), genetic programming (GP), fuzzy logic (FL), support vector machine (SVM), hybrid neuro-fuzzy (NF), hybrid ANN-ARIMA, hybrid genetic algorithm-neural networks (GA-NN), and wavelet-based hybrid models such as wavelet-neural networks (WANN), wavelet-neuro fuzzy (WNF), wavelet-support vector regression (WSVR), and wavelet-linear genetic programming (WLGP). For each model, a brief introduction is provided first, followed by recently published papers that illustrate its performance in modeling river water quality. For this purpose, 51 journal papers published from 2000 to 2016 that deal with single and hybrid AI models for river water quality prediction were selected. These papers are reviewed in terms of predictor selection, data normalization, training and test data division, modeling approaches, prediction time steps, and modeling performance evaluation procedures. The effect of using integrated models to improve the prediction accuracy of single models was investigated as well. Of the 51 selected papers, 31 (~60%) were published in the past five years. The selected papers had been cited a total of 1716 times by 20th February 2016. Among the various modeling techniques, ANN and WANN (17 and 7 papers, respectively) were the most widely used single and hybrid models. In the reviewed papers, more attention is given to the modeling of dissolved oxygen (DO) and suspended sediment in rivers. In 23 papers, data with daily time intervals were used for water quality modeling. The present paper covers 13 different single and hybrid AI models. It presents a comprehensive investigation into the application of AI methods for modeling river water quality and offers a critical insight into the use and reliability of the various modeling approaches for modeling diverse water quality measurements.

Introduction

Modeling water quality-related variables in surface waters is now considered crucial for the efficient and reliable management of water resources. Reliable models and indirect techniques that can determine future values of water quality-related variables can significantly reduce costs. Traditional techniques such as multiple linear regression (MLR) [1–4], multiple non-linear regression (MNLR) [1], the Mann-Kendall (MK) trend test [5], and auto-regressive integrated moving average (ARIMA) [1,6] models have a limited ability to estimate water quality in rivers accurately because of the complexity of water quality time series. This limitation led to the use of black box methods such as the artificial neural network (ANN) [7–11], neuro-fuzzy (NF) [3,12], genetic programming (GP) [13], and support vector machine (SVM) [14]. Besides these single methods, many hybrid models have been widely used for water quality modeling in rivers as well.

During the past decades, many researchers have used different models to improve the accuracy of water quality predictions. All these models can be classified under two main categories: conventional models and artificial intelligence (AI)-based models. Conventional statistical models, including MLR and ARIMA, are linear: their predictions of future values are constrained to be linear functions of past observations. Due to their relatively simple concepts and ease of implementation, linear models have been the focal point of many investigations and the critical tool for time series modeling over the past few decades [15]. However, most real problems, especially in the field of river water quality modeling, involve non-linear patterns that require non-linear models. Therefore, to overcome the limitation of such linear models, several non-linear models have been proposed in the literature in the past two decades. These models include ANN, fuzzy, GP, and SVM. Many studies have demonstrated the capability and usefulness of these methods for modeling different river water quality measurements as responses. Furthermore, in the past decade, combinations of these models, data pre-processing techniques, and many other hybrid models have been presented in the literature. The neuro-fuzzy (NF) model, the wavelet-neural networks (WANN) model, wavelet-support vector regression (WSVR), and the wavelet-linear genetic programming (WLGP) model are examples of hybrid models that have been used to model river water quality.
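To make the linearity constraint concrete, the following is a minimal sketch, not taken from any of the reviewed papers, of an autoregressive model fitted by ordinary least squares. Each forecast is an intercept plus a weighted sum of the previous p observations. The function names (make_lagged, fit_ar, predict_next), the lag order, and the synthetic series are all illustrative assumptions, and a full ARIMA model would add differencing and moving-average terms that are omitted here.

```python
import numpy as np


def make_lagged(series, p):
    """Return (X, y): each row of X holds the p values preceding the target in y.

    Column 0 is lag 1, column 1 is lag 2, and so on.
    """
    series = np.asarray(series, dtype=float)
    n = len(series)
    X = np.column_stack([series[p - k - 1 : n - k - 1] for k in range(p)])
    y = series[p:]
    return X, y


def fit_ar(series, p):
    """Fit x_t = c + phi_1 * x_{t-1} + ... + phi_p * x_{t-p} by least squares."""
    X, y = make_lagged(series, p)
    A = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # [c, phi_1, ..., phi_p]


def predict_next(series, coef):
    """One-step-ahead forecast: a purely linear function of the last p values."""
    p = len(coef) - 1
    recent = np.asarray(series, dtype=float)[-1 : -p - 1 : -1]  # most recent first
    return coef[0] + recent @ coef[1:]


# Hypothetical data: a synthetic series standing in for a daily water quality record.
rng = np.random.default_rng(0)
x = [8.0, 8.2]
for _ in range(300):
    x.append(2.4 + 1.2 * x[-1] - 0.5 * x[-2] + rng.normal(scale=0.1))

coef = fit_ar(x, p=2)
print("fitted [c, phi_1, phi_2]:", coef)
print("one-step-ahead forecast:", predict_next(x, coef))
```

Because the forecast is a fixed linear combination of past values, a model of this form cannot represent threshold effects or other non-linear responses, which is the limitation that motivates the non-linear models discussed above.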

In the following, the two categories of modeling techniques, namely the conventional and the AI-based models, are presented. Within each category, the commonly used methods for water quality modeling in rivers are briefly illustrated. Subsequently, to lend some insight into the application and reliability of each model in the field of water quality modeling in rivers, several studies published between 2000 and 2016 are reviewed. Although some reviews on modeling the quality of water bodies such as rivers, lakes, and reservoirs exist in the literature, most of them focus on a specific water quality variable as an output (e.g., suspended sediment) or on the application of a single method (the ANN model, for example). Table 1 summarizes the papers that review water quality modeling in riverine systems. Some of these reviews discussed the application of AI techniques for both water quality (e.g., suspended sediment) and water quantity (e.g., rainfall-runoff) modeling. A number of these papers reviewed studies concerning water quality in two or more bodies of water (e.g., river, lake, or groundwater).

The lack of a comprehensive review of river water quality modeling can be inferred from Table 1. A thorough review should involve different types of water quality variables and include several modeling approaches as well. The current review covers numerous studies that model several river water quality-related variables using 13 different conventional and AI modeling approaches. This review is the first attempt to evaluate the current studies on the modeling of almost all types of water quality variables in rivers using various modeling approaches. However, modeling of water quality in rivers via software such as GIS, Mike, Qual2k, and WASP is not included in this review paper.

This review focuses primarily on the application of AI-based approaches to model river water quality. The relevant papers were identified by a keyword search conducted on 20th February 2016, covering the period between 2000 and 2016. The keywords were water quality, river, modeling, forecasting, and prediction, accompanied by the names of one or more modeling approaches such as neural networks, fuzzy logic, genetic algorithm, and wavelet neural network, to name but a few. From the results returned by the search engines, the papers in English that were relevant to our purpose were selected. Finally, after examining their content, 51 of the most relevant papers were selected. The selected papers are published in 25 separate international scientific indexing (ISI) journals. The list of these journals and their 2014 impact factors is given in Appendix B.

The selected papers comprise the main bulk of the published material, including the most renowned studies, and had been cited a total of 1716 times up to 20th February 2016. Fig. 1 shows the number of cumulative citations per year. Four of the papers stand out with citation counts that exceed 100: Cigizoglu [19], Alp and Cigizoglu [20], Singh et al. [9], and Faruk [6], with 177, 112, 210, and 128 citations, respectively. These four highly cited papers were published before 2010, and each one used the ANN technique to model water quality.

The selected papers are analyzed based on their year of publication, the region of their study, their modeling approaches, and the water quality variables modeled. The results of this analysis are presented for each selected paper in Appendix C. It should be noted that the majority of these papers used two or more techniques to model water quality in rivers. Therefore, in Appendix C, the applied models of each study are ordered by their modeling accuracy. Fig. 2 shows the number of papers published annually from 2000 to 2016. From this figure, it is evident that during the past decade, there has been a growing interest in modeling water quality of rivers using AI-based models. Despite some fluctuations and drops in the number of papers published in 2005, 2006, and 2012, the overall trend in the number of publications per year is increasing. Most of the papers were published in the past few years. For example, 12 papers (about one-fourth of all papers) were published in 2014. The drop in the number of papers published in 2015 is partly due to the fact that this review was carried out on 20th February 2016.

The selected papers were categorized based on the modeling approach with superior performance. Table 2 indicates the number of papers for each modeling procedure. As can be observed in this table, the single ANN model and its combinations with other methods are the most widely used models among the various AI-based methods used for water quality modeling in rivers. Pre-processing the inputs for the ANN models using wavelet analysis is a common technique to improve the predictive ability of the ANN models. For each modeling approach, the number of papers published from 2000 to 2016 is shown in Fig. 3.
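The wavelet pre-processing step mentioned above can be sketched as follows. This is an illustrative example only, assuming a causal Haar "à trous" (undecimated) decomposition chosen for its simplicity; the reviewed studies may use other mother wavelets and transforms. The function names (haar_a_trous, build_wann_inputs), the number of levels, and the synthetic series are hypothetical. The series is split into detail sub-series at several time scales plus a smooth trend, and these sub-series, rather than the raw series, become the inputs of the neural network.

```python
import numpy as np


def haar_a_trous(series, levels):
    """Additive Haar decomposition: the details plus the final smooth sum to the input.

    At level j the current smooth is averaged with its own value 2**j steps
    earlier; the left edge is handled by repeating the first value.
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    smooth = x.copy()
    details = []
    for j in range(levels):
        shift = 2 ** j
        idx = np.maximum(np.arange(n) - shift, 0)  # clamp indices at the left edge
        next_smooth = 0.5 * (smooth + smooth[idx])
        details.append(smooth - next_smooth)  # variation removed at this scale
        smooth = next_smooth
    return details, smooth


def build_wann_inputs(series, levels):
    """Stack the sub-series column-wise: one feature per detail level plus the trend."""
    details, smooth = haar_a_trous(series, levels)
    return np.column_stack(details + [smooth])


# Hypothetical daily series: a slow trend plus a 7-step cycle plus noise.
rng = np.random.default_rng(1)
t = np.arange(256)
series = 0.01 * t + np.sin(2 * np.pi * t / 7) + rng.normal(scale=0.2, size=t.size)

features = build_wann_inputs(series, levels=3)

# Supervised pairs for one-step-ahead prediction: sub-series at time t -> value at t + 1.
X, y = features[:-1], series[1:]
print(X.shape, y.shape)
```

Any regression model, including an ANN, could then be trained on X and y. The decomposition is causal, so each feature at time t uses only values up to time t, which avoids leaking future information into the inputs.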

Section snippets

Brief introduction and bibliographic review on conventional models

Probably the most conventional and frequently used methods for modeling river water quality are based on regression techniques. MLR, MNLR, and ARIMA models are widely used for this purpose. However, they assume linearity and stationarity in the data and have a limited ability to capture non-stationarities and non-linearities involved in the hydro-environmental data [21]. These models are briefly illustrated in the following subsections, and the studies in which these techniques were used are

Artificial neural networks

In recent years, ANN has received more attention in water resources studies. It is a robust computational algorithm that is used to simulate complex non-linear relationships, especially in situations where the explicit form of the relation between the variables is unknown. ANNs are mathematical systems and are able to map a set of input data into a corresponding set of output data [37]. In an ANN predictive model, the historical data time series are used to predict its future values. The

Water quality data

According to their source and nature, river water quality variables can be categorized into three main groups: chemical, physical, and biological. In the literature for river systems, different water quality variables have been modeled. For example, algae, nutrients, or nutrient-related chemical indicators, such as DO and BOD, and physical water quality variables, such as T, EC, TDS, salinity, pH, and SSL, are often of concern. In this paper, the selected papers for review cover the three

Summary and conclusions

This paper reviewed studies that used AI-based techniques to model water quality in rivers. To this end, 51 journal papers published from 2000 to 2016 that focus on water quality modeling in rivers were selected. In this review, almost all types of single and hybrid AI models were taken into account. Analysis of the selected papers showed that in recent years there has been an increasing trend towards using these models in the field of water quality

Recommendations for future works

Based on the review of 51 papers on river water quality modeling, conducted in this paper, the following recommendations for future research are made:

  1. Because many studies have shown that combined or hybrid models such as WANN are more accurate than single-structure AI models, it is recommended that more attention be given to the improvement of such models.

  2. Because of the limited studies in the application of GA-NN, WGP, and WSVR models for modeling water quality in rivers, more

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (105)

  • V. Nourani et al.

    Applications of hybrid wavelet–Artificial Intelligence models in hydrology: a review

    J. Hydrol.

    (2014)
  • H.K. Cigizoglu

    Estimation and forecasting of daily suspended sediment data by multi-layer perceptrons

    Adv. Water Resour.

    (2004)
  • M. Alp et al.

    Suspended sediment load simulation by two artificial neural network methods using hydrometeorological data

    Environ. Model. Software

    (2007)
  • V. Nourani et al.

    A geomorphology-based ANFIS model for multi-station modeling of rainfall–runoff process

    J. Hydrol.

    (2013)
  • H. Li et al.

    Support vector machines and its applications in chemistry

    Chemometr. Intell. Lab. Syst.

    (2009)
  • K.H. Hamed et al.

    A modified Mann-Kendall trend test for autocorrelated data

    J. Hydrol.

    (1998)
  • J. Adamowski et al.

    Development of a coupled wavelet transform and neural network method for flow forecasting of non-perennial rivers in semi-arid watersheds

    J. Hydrol.

    (2010)
  • E. Dogan et al.

    Modeling biological oxygen demand of the Melen River in Turkey using an artificial neural network technique

    J. Environ. Manag.

    (2009)
  • A.P. Piotrowski et al.

    Comparing various artificial neural network types for water temperature prediction in rivers

    J. Hydrol.

    (2015)
  • A. Altunkaynak et al.

    Fuzzy logic modeling of the dissolved oxygen fluctuations in Golden Horn

    Ecol. Model.

    (2005)
  • L.A. Zadeh

    Fuzzy sets

    Inf. Contr.

    (1965)
  • S.S. Mahapatra et al.

    A Cascaded Fuzzy Inference System for Indian river water quality prediction

    Adv. Eng. Software

    (2011)
  • A. Aytek et al.

    A genetic programming approach to suspended sediment modelling

    J. Hydrol.

    (2008)
  • O. Kisi

    Modeling discharge-suspended sediment relationship using least square support vector machine

    J. Hydrol.

    (2012)
  • T. Rajaee et al.

    Forecasting of chlorophyll-a concentrations in South San Francisco Bay using five different models

    Appl. Ocean Res.

    (2015)
  • D. Labat

    Recent advances in wavelet analyses: Part 1. A review of concepts

    J. Hydrol.

    (2005)
  • D. Labat et al.

    Recent advances in wavelet analyses: Part 2. Amazon, Parana, Orinoco and Congo discharges time scale variability

    J. Hydrol.

    (2005)
  • D. Donald et al.

    Joint multiple adaptive wavelet regression ensembles

    Chemometr. Intell. Lab. Syst.

    (2011)
  • D.R. Wijaya et al.

    Information Quality Ratio as a novel metric for mother wavelet selection

    Chemometr. Intell. Lab. Syst.

    (2017)
  • T. Partal et al.

    Estimation and forecasting of daily suspended sediment data using wavelet–neural networks

    J. Hydrol.

    (2008)
  • S. Liu et al.

    A hybrid WA–CPSO-LSSVR model for dissolved oxygen content prediction in crab culture

    Eng. Appl. Artif. Intell.

    (2014)
  • M. Ravansalar et al.

    A wavelet–linear genetic programming model for sodium (Na+) concentration forecasting in rivers

    J. Hydrol.

    (2016)
  • M. Ay et al.

    Modelling of chemical oxygen demand by using ANNs, ANFIS and k-means clustering techniques

    J. Hydrol.

    (2014)
  • O. Kisi et al.

    Adaptive neuro-fuzzy computing technique for suspended sediment estimation

    Adv. Eng. Software

    (2009)
  • F.J. Chang et al.

    Assessment of arsenic concentration in stream water using neuro fuzzy networks with factor analysis

    Sci. Total Environ.

    (2014)
  • A. Sedki et al.

    Evolving neural network using real coded genetic algorithm for daily rainfall–runoff forecasting

    Expert Syst. Appl.

    (2009)
  • A. Burchard-Levine et al.

    A hybrid evolutionary data driven model for river water quality early warning

    J. Environ. Manag.

    (2014)
  • A.T.C. Goh

    Back-propagation neural networks for modeling complex systems

    Artif. Intell. Eng.

    (1995)
  • F. Jolai et al.

    Integrating data transformation techniques with Hopfield neural networks for solving travelling salesman problem

    Expert Syst. Appl.

    (2010)
  • N. Basant et al.

    Linear and nonlinear modeling for simultaneous prediction of dissolved oxygen and biochemical oxygen demand of the surface water—a case study

    Chemometr. Intell. Lab. Syst.

    (2010)
  • M. Daszykowski et al.

    TOMCAT: a MATLAB toolbox for multivariate calibration techniques

    Chemometr. Intell. Lab. Syst.

    (2007)
  • S. Palani et al.

    An ANN application for water quality forecasting

    Mar. Pollut. Bull.

    (2008)
  • T. Rajaee et al.

    River suspended sediment load prediction: application of ANN and wavelet conjunction model

    J. Hydrol. Eng.

    (2010)
  • D.Ö. Faruk

    A hybrid neural network and ARIMA model for water quality time series prediction

    Eng. Appl. Artif. Intell.

    (2010)
  • M.J. Diamantopoulou et al.

    Cascade correlation artificial neural networks for estimating missing monthly values of water quality parameters in rivers

    Water Resour. Manag.

    (2007)
  • A. Najah et al.

    Application of artificial neural networks for water quality prediction

    Neural Comput. Appl.

    (2013)
  • N. Wu et al.

    Modeling daily chlorophyll a dynamics in a German lowland river using artificial neural networks and multiple linear regression approaches

    Limnology

    (2014)
  • T. Rajaee

    Wavelet and neuro-fuzzy conjunction approach for suspended sediment prediction

    Clean–Soil, Air, Water

    (2010)
  • H. Orouji et al.

    Modeling of water quality parameters using data-driven models

    J. Environ. Eng.

    (2013)
  • J. Adamowski et al.

    Comparison of multivariate regression and artificial neural networks for peak urban water-demand forecasting: evaluation of different ANN learning algorithms

    J. Hydrol. Eng.

    (2010)