Elsevier

Journal of Hydrology

Volume 589, October 2020, 125335
Research papers
Prediction of surface water total dissolved solids using hybridized wavelet-multigene genetic programming: New approach

https://doi.org/10.1016/j.jhydrol.2020.125335

Highlights

  • A novel hybrid W-MGGP is developed for predicting TDS in surface water.

  • An investigation of the impact of wavelet decomposition is conducted.

  • 20 years of historical river discharge and TDS are used for the development.

  • The discrete meyer wavelet showed the best time series decomposition.

  • The proposed W-MGGP model provided a reliable prediction accuracy for the TDS.

Abstract

Total dissolved solids (TDS) are recognized as an essential indicator of surface water quality. The current research investigates the potential of a novel computer-aided approach based on the hybridization of wavelet pre-processing with multigene genetic programming (W-MGGP) for monthly TDS prediction at the Sefid Rud River in Northern Iran. Twenty years of historical monthly river flow (Q) and TDS data measured at the Astaneh station were used for model training and testing. The time series data were decomposed into several sub-series using three mother wavelets (i.e., Daubechies4 (db4), biorthogonal (bior6.8), and discrete meyer (dmey)) to assess appropriate combinations of the time series and their lag times, which were then used in the prediction process. The W-MGGP model was compared against the wavelet-gene expression programming (W-GEP), stand-alone MGGP, and GEP models. Results were evaluated using several performance metrics including root mean square error (RMSE), correlation coefficient (R), and Nash-Sutcliffe efficiency (NSE). Modeling results indicated that W-MGGP and W-GEP provided a superior prediction capacity for the TDS in comparison with the stand-alone artificial intelligence (AI) models. The discrete meyer wavelet exhibited the best performance in time series decomposition as a pre-processing approach. The proposed W-MGGP model based on the dmey mother wavelet attained the best statistical metrics (R = 0.942, RMSE = 90.383, and NSE = 0.862). The research findings demonstrated the value of hybridizing the wavelet pre-processing approach with the MGGP predictive model for TDS simulation.

Introduction

River water is one of the essential fresh surface water resources, naturally available to supply multiple human uses such as drinking, irrigation, and industrial production. Monitoring and management of surface water quality (WQ) play an undeniable role in environmental protection and sustainable use of these water resources (Ahmadianfar et al., 2020). Over the past few decades, efforts have been devoted to improving WQ management and sustainability through the precise simulation of the physical, chemical, and biological processes of various pollutants. Total dissolved solids (TDS) is a well-accepted WQ indicator that is used to assess the suitability of water for drinking and irrigation supply. TDS consists of a variety of inorganic salts (e.g., sodium (Na⁺), magnesium (Mg²⁺), calcium (Ca²⁺), and potassium (K⁺) as cations, as well as chloride (Cl⁻), sulfate (SO₄²⁻), nitrate (NO₃⁻), and bicarbonate (HCO₃⁻) as anions) and dissolved organic matter. According to the standard reported by the World Health Organization (WHO), the acceptable range of TDS for drinking water is 300–600 mg/l (WHO, 2011). The permissible TDS concentration range for agricultural water is 450–2000 mg/l (Ayers and Westcot, 1985).

Laboratory investigations and empirical calculation methods have been reported for TDS quantification (Tiyasha et al., 2020). However, laboratory testing and manual calculation have drawbacks: they are time consuming, prone to unintentional errors, and difficult to generalize. AI models have shown remarkable advancement in modeling the TDS of river water (Banadkooki et al., 2020). The widespread adoption of AI models stems from limitations recognized in the classical mathematical models, in addition to the highly stochastic patterns associated with WQ (Abba et al., 2020). Moreover, the classical models can only provide predictions for a linear and stationary dataset (Deng et al., 2015). The strength of the AI models lies in their ability to handle the non-linearity and complexity of environmental and hydrological processes, overcoming the drawbacks of the traditional models (Alizadeh et al., 2018, Das et al., 2020, Gholami et al., 2016, Maier et al., 2014, Naganna and Deka, 2019, Rezaie-Balf et al., 2019, Tiyasha et al., 2020, Wu and Chau, 2013). The AI models have been successfully employed to address a variety of water quality variables such as water quality index (WQI), dissolved oxygen (DO), nitrate (NO3), electrical conductivity (EC), chemical oxygen demand (COD), biochemical oxygen demand (BOD), ammoniacal nitrogen (NH3-N), pH, and sodium adsorption ratio (SAR) (Tiyasha et al., 2020). Examples of these AI models include artificial neural network (ANN), support vector machine (SVM), adaptive neuro fuzzy inference system (ANFIS), random forest (RF), decision tree (DT), genetic programming (GP), linear genetic programming (LGP), extreme learning machine (ELM), and gene expression programming (GEP) (Ay and Kisi, 2014, Azad et al., 2017, Emamgholizadeh et al., 2014, Heydari et al., 2013, Olyaie et al., 2017, Sengorur et al., 2015, Sepahvand et al., 2019, Takdastan et al., 2018, Tiwari et al., 2018).

Recently, the capacity of the SVM model was tested to assess different WQ variables in rivers (Mahmoudi et al., 2016); to forecast the Carlson's trophic state index in reservoirs (Chou et al., 2018); and to predict some WQ parameters in the Sefid Rud River basin in Iran (Bozorg-Haddad et al., 2017). The GEP, DT, and LGP were used to forecast TDS levels in the Zarinehroud basin in Iran (Zaman Zad Ghavidel and Montaseri, 2014); and to assess BOD, DO, and COD in the Karoun River in Iran (Najafzadeh et al., 2018).

WQ time series data are highly stochastic and chaotic, and a stand-alone AI based model has limitations for WQ modeling (Yaseen et al., 2018). Hence, integrating time series pre-processing approaches can facilitate decomposition of the time series and improve the predictive performance of the AI models. Among several powerful data pre-processing techniques, the discrete wavelet transform (DWT) has been demonstrated to be a satisfactory approach for decomposition of environmental, hydrological, and ecological time series data (Nourani et al., 2014). By providing a time–frequency representation of the analyzed time series, along with information about its physical structure, the wavelet transform can lead to accurate predictions, especially when input data are limited (Ghimire et al., 2019). Recently, some researchers investigated the possibility and the advantage of integrating the wavelet transform (WT) approach with AI based models for diverse river WQ simulations. Barzegar et al. (2016, 2017) integrated WT with ELM, ANFIS, and ANN models to predict EC and salinity. Their findings evidenced an improvement in prediction accuracy when the pre-processing approach was used. Several other researchers conducted similar studies on the integration of the WT with AI models and demonstrated its successful implementation for river WQ simulations (Montaseri et al., 2018, Rajaee et al., 2018, Ravansalar et al., 2016b, Ravansalar et al., 2016a, Ravansalar and Rajaee, 2015). These studies indicated that the hybridization of the WT with AI models is a promising computer-aided approach for environmental modeling. The exploration of new robust and reliable soft computing predictive models is an emerging modeling trend for better watershed management and sustainability.

In the current research, a new hybrid artificial intelligence model, called wavelet-multigene genetic programming (W-MGGP), is developed for accurately predicting the monthly TDS levels at the Sefid Rud River in Iran. The MGGP model was selected owing to its feasibility in modeling highly non-linear time series (Mehr and Safari, 2020, Mohammad-Azari et al., 2020). In this study, the influence of the discrete wavelet transform is explored in combination with the MGGP and GEP models. The newer variants of GP, including MGGP, have been employed in only a limited number of hydrological and environmental forecasting studies (Dadandeh Mehr and Demirel, 2016, Danandeh Mehr and Nourani, 2017), and thus the current research is devoted to TDS prediction.

Section snippets

Multigene genetic programming

GP is an optimization technique that utilizes the principle of Darwin’s theory of evolution (Gandomi et al., 2010). The principle of GP is similar to that of the genetic algorithm (GA), and both methods use the same three main operators: crossover, mutation, and selection (Danandeh Mehr et al., 2018). The main difference between the two methods is how solutions are represented. The GA represents solutions as strings of fixed length, while the GP expresses solutions as tree structures of varying sizes (
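The tree representation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function set, the variable names `q` and `tds_lag1`, and the two example genes are hypothetical. The weighted sum in `multigene_predict` follows the usual MGGP formulation (a bias plus a linear combination of gene outputs), which the snippet above does not spell out, so treat it as an assumption.

```python
import operator

# Hypothetical function set for internal nodes; the study's actual set is not given here.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, variables):
    """Recursively evaluate a GP expression tree.

    A tree is a constant (number), a variable name (str),
    or a tuple (operator_symbol, left_subtree, right_subtree).
    """
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, variables), evaluate(right, variables))
    if isinstance(tree, str):
        return variables[tree]
    return float(tree)


def size(tree):
    """Count the nodes of a tree; unlike GA strings, this varies per solution."""
    if isinstance(tree, tuple):
        return 1 + size(tree[1]) + size(tree[2])
    return 1


def multigene_predict(bias, weights, genes, variables):
    """Assumed MGGP output: bias plus a weighted sum of the gene outputs."""
    return bias + sum(w * evaluate(g, variables) for w, g in zip(weights, genes))


# Two hypothetical genes of different sizes.
gene_1 = ("*", "q", 0.5)                            # 0.5 * q
gene_2 = ("+", ("*", "tds_lag1", "tds_lag1"), "q")  # tds_lag1^2 + q
```

In practice the gene weights are typically fitted by least squares on the training data while the evolutionary operators act on the tree structures.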

Case study and data analysis

In this study, the data (monthly time scale) are from the Astaneh gauging station located on the Sefid Rud River (Longitude 49° 37′ 40″, Latitude 36° 57′ 02″). The observed data are employed to build the AI based models for TDS prediction. The Sefid Rud River has a length of 670 km and a drainage area of 13,450 km². It is the longest river in Northern Iran. In this study, monthly discharge and TDS data were obtained over a 20-year period (1985–2005, 240 months). Fig. 3 displays the

Proposed wavelet-multigene genetic programming method

In this research, the W-MGGP model is developed by combining the DWT and MGGP models. To do so, the datasets of Q and TDS are decomposed into several sub-datasets. To decompose a dataset using the wavelet transform, it is very important to choose a mother wavelet and a decomposition level for modeling. According to Nourani et al. (2014), the mother wavelet db4 is the most efficient in producing time localization properties for time series. In addition, the mother wavelets of bior6.8 and dmey are
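As a rough illustration of the decomposition step, the following Python sketch applies a multi-level DWT with the db4 wavelet. Several points are assumptions rather than details from the paper: "db4" is taken to be the 8-tap Daubechies filter in the common software naming, the signal is extended periodically, and the outputs are decimated coefficient series rather than sub-series reconstructed at full length. The helper names `dwt_level` and `decompose` are hypothetical, and the bior6.8 and dmey wavelets are not covered.

```python
import math

# db4 scaling (low-pass) filter, 8 taps, as commonly tabulated.
DB4_LO = [
    0.23037781330885523, 0.7148465705525415, 0.6308807679295904,
    -0.02798376941698385, -0.18703481171888114, 0.030841381835986965,
    0.032883011666982945, -0.010597401784997278,
]
# Wavelet (high-pass) filter derived from the low-pass one (quadrature mirror).
DB4_HI = [(-1) ** k * DB4_LO[len(DB4_LO) - 1 - k] for k in range(len(DB4_LO))]


def dwt_level(signal):
    """One DWT level with periodic extension; returns (approximation, detail)."""
    n = len(signal)
    if n % 2:
        raise ValueError("signal length must be even")
    approx, detail = [], []
    for i in range(n // 2):
        a = d = 0.0
        for k in range(len(DB4_LO)):
            x = signal[(2 * i + k) % n]  # wrap around at the end of the series
            a += DB4_LO[k] * x
            d += DB4_HI[k] * x
        approx.append(a)
        detail.append(d)
    return approx, detail


def decompose(signal, levels):
    """Return [approx_L, detail_L, ..., detail_1] for L = levels."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, d = dwt_level(approx)
        details.append(d)
    return [approx] + details[::-1]


# Synthetic 240-month series standing in for Q or TDS (not the study's data).
series = [math.sin(0.3 * t) + 0.1 * t for t in range(240)]
sub_series = decompose(series, 3)
```

Because the db4 filters are orthogonal, the total energy of the coefficients equals that of the input series, which is a convenient sanity check on any such implementation.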

Results and discussion

The efficiencies of the MGGP, W-MGGP, GEP, and W-GEP models were compared and evaluated. Their statistical performance results for the seven input combinations are shown in Table 4, Table 5. For the MGGP model (Table 4), combination (6), which used the TDS of the first, second, and third successive previous months and the Q of the current, first, and second successive previous months as inputs, yielded a better performance than the other combinations in terms of R (0.396), RMSE (239.718), and
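To make input combination (6) and the evaluation metrics concrete, here is a short Python sketch. It assumes the standard definitions of RMSE, Pearson correlation coefficient (R), and Nash-Sutcliffe efficiency (NSE); the function names are hypothetical and the code is not taken from the study.

```python
import math


def lagged_inputs(tds, q):
    """Build inputs and targets for combination (6).

    Each input row is [TDS(t-1), TDS(t-2), TDS(t-3), Q(t), Q(t-1), Q(t-2)]
    and the target is TDS(t), for every month t with three prior months.
    """
    if len(tds) != len(q):
        raise ValueError("tds and q must have the same length")
    rows, targets = [], []
    for t in range(3, len(tds)):
        rows.append([tds[t - 1], tds[t - 2], tds[t - 3], q[t], q[t - 1], q[t - 2]])
        targets.append(tds[t])
    return rows, targets


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    mo = sum(obs) / len(obs)
    mp = sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mo = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mo) ** 2 for o in obs)
    return 1.0 - num / den
```

An NSE of 1 indicates a perfect fit, while a value at or below 0 means the model is no better than predicting the observed mean. A constant bias leaves R at 1 while lowering NSE, which is one reason to report the metrics together.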

Conclusions

In the current study, a hybrid wavelet-multigene genetic programming (W-MGGP) model using three mother wavelets (db4, bior6.8, and dmey) was developed to simulate the monthly TDS levels of river water. In particular, the W-MGGP was compared with the W-GEP, MGGP, and GEP models, and their efficiencies and performances were evaluated. The time series of river discharge and TDS over a 20-year period were utilized for the development of the predictive models. Statistical analysis (i.e.,

CRediT authorship contribution statement

Mehdi Jamei: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing. Iman Ahmadianfar: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing. Xuefeng Chu: Conceptualization,

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

The authors express their appreciation to the dataset provider, “Guilan regional water company.” In addition, the authors are very thankful to the editors and reviewers for their constructive comments and suggestions.

References (63)

  • H.R. Maier et al. Understanding the behaviour and optimising the performance of back-propagation neural networks: an empirical study. Environ. Modell. Software (1998)
  • H.R. Maier et al. Evolutionary algorithms and other metaheuristics in water resources: current status, research challenges and future directions. Environ. Modell. Software (2014)
  • J.E. Nash et al. River flow forecasting through conceptual models part I — a discussion of principles. J. Hydrol. (1970)
  • V. Nourani et al. Applications of hybrid wavelet–Artificial Intelligence models in hydrology: a review. J. Hydrol. (2014)
  • E. Olyaie et al. A comparative analysis among computational intelligence techniques for dissolved oxygen prediction in Delaware River. Geosci. Front. (2017)
  • M. Ravansalar et al. A wavelet-linear genetic programming model for sodium (Na+) concentration forecasting in rivers. J. Hydrol. (2016)
  • A. Takdastan et al. Neuro-fuzzy inference system Prediction of stability indices and Sodium absorption ratio in Lordegan rural drinking water resources in west Iran. Data in Brief (2018)
  • W.C. Wang et al. A comparison of performance of several artificial intelligence methods for forecasting monthly discharge time series. J. Hydrol. (2009)
  • C.L. Wu et al. Prediction of rainfall time series using modular soft computing methods. Eng. Appl. Artif. Intell. (2013)
  • I. Ahmadianfar et al. A novel hybrid wavelet-locally weighted linear regression (W-LWLR) model for electrical conductivity (EC) prediction in water surface. J. Contam. Hydrol. (2020)
  • M.J. Alizadeh et al. Effect of river flow on the quality of estuarine and coastal waters using machine learning models. Eng. Appl. Comput. Fluid Mech. (2018)
  • R.S. Ayers et al. Water Quality for Agriculture (1985)
  • A. Azad et al. Prediction of water quality parameters using ANFIS optimized by intelligence algorithms (Case study: Gorganrood River). KSCE J. Civ. Eng. (2017)
  • F.B. Banadkooki et al. Estimation of total dissolved solids (TDS) using new hybrid machine learning models. J. Hydrol. (2020)
  • R. Barzegar et al. Application of wavelet-artificial intelligence hybrid models for water quality prediction: a case study in Aji-Chay River, Iran. Stochastic Environ. Res. Risk Assess. (2016)
  • R. Barzegar et al. Multi-step water quality forecasting using a boosting ensemble multi-wavelet extreme learning machine model. Stochastic Environ. Res. Risk Assess. (2017)
  • O. Bozorg-Haddad et al. Modeling water-quality parameters using genetic algorithm-least squares support vector regression and genetic programming. J. Environ. Eng. (2017)
  • Chen, X., Chau, K., 2019. Uncertainty Analysis on Hybrid Double Feedforward Neural Network Model for Sediment Load...
  • A. Dadandeh Mehr et al. On the calibration of multigene genetic programming to simulate low flows in the Moselle river. Uludağ Univ. J. Faculty Eng. (2016)
  • A. Danandeh Mehr et al. Genetic programming in water resources engineering: a state-of-the-art review. J. Hydrol. (2018)
  • P. Das et al. Hybrid wavelet packet machine learning approaches for drought modeling. Environ. Earth Sci. (2020)