Elsevier

Journal of Hydrology

Volume 537, June 2016, Pages 398-407

A wavelet–linear genetic programming model for sodium (Na+) concentration forecasting in rivers

https://doi.org/10.1016/j.jhydrol.2016.03.062

Highlights

  • This study presents ANN, WANN, LGP and WLGP models for the prediction of sodium.

  • Among the applied models, WLGP showed the best efficiency in terms of the NSE, RMSE, MAE and P.

  • WLGP approximated the cumulative streamflow better than the other models.

  • The innovation of the study is the prediction of sodium using the LGP and WLGP models.

Summary

The prediction of water quality parameters in water resources such as rivers is an important issue that needs to be considered for better management of irrigation systems and water supplies. In this respect, this study proposes a new hybrid wavelet–linear genetic programming (WLGP) model for the prediction of monthly sodium (Na+) concentration. The 23-year monthly data used in this study were measured in the Asi River at the Demirköprü gauging station located in Antakya, Turkey. First, the measured discharge (Q) and Na+ datasets are decomposed into several sub-series using the discrete wavelet transform (DWT). Then, these new sub-series are fed to the ad hoc linear genetic programming (LGP) model as input patterns to predict monthly Na+ one month ahead. The results of the proposed WLGP model are compared with those of the LGP, WANN and ANN models. The comparison shows the superiority of the WLGP model over the LGP, WANN and ANN models: the Nash–Sutcliffe efficiencies (NSE) for the WLGP, WANN, LGP and ANN models were 0.984, 0.904, 0.484 and 0.351, respectively. The results also point to the superiority of the single LGP model over the ANN model. In addition, the capability of the proposed WLGP model to predict Na+ peak values is presented in this study.

Introduction

Prediction of surface water quality has proved to be an important issue in water resources and environmental engineering. Rivers, among the most important water resources for mankind, are the sources whose quality is most adversely affected, owing to their dynamic nature. Na+ contamination in surface water such as rivers is widespread. Sodium impurities make their way into the environment through run-off from rain, melting snow and ice, as well as through splash and spray by vehicles and by wind. They find their way onto vegetation and into the soil, groundwater, storm drains and surface waters, where they have a significant impact on the environment. Therefore, it is crucial to find an accurate indirect way to quantify future Na+ concentration in surface water in order to implement better water resource management. It is difficult to describe water quality quantitatively using accurate mathematical models and to establish an accurate prediction model using classical models, as the classical models usually need large amounts of data and a long response time. Studies have been conducted to reduce the complexity of these problems by developing practical methods that do not require extensive algorithms and theory. Classical techniques such as the multi-linear regression (MLR) model are widely used for hydrological time series (Zounemat-Kermani, 2012, Magar and Jothiprakash, 2011). The limitation of this type of model lies in its linearity, which prevents it from capturing the non-stationarities and non-linearities in hydrologic datasets; therefore, such models are not adequately effective in dealing with these types of problems. In recent years, Artificial Intelligence (AI) models such as the artificial neural network (ANN), which is known as a reliable tool, have been successfully utilized in different research areas, including water quality and environmental engineering (Atieh et al., 2015, Mekanik et al., 2013, Wen et al., 2013, Karakaya et al., 2013, Wu et al., 2014). Atieh et al. (2015) obtained significant results with ANN models for sediment rating curve parameters in ungauged basins of Ontario, Canada. In their study, the ANNs were essential tools for calculating sediment for water quality management plans for ungauged basins. The application of ANNs was proposed by Mekanik et al. (2013) for long-term rainfall forecasting in Victoria, Australia. They found that the ANN models outperformed the MR models and that ANNs are reliable models for rainfall time series. Wen et al. (2013) utilized an ANN model for the prediction of dissolved oxygen (DO) in the Heihe River in China. Their results showed that the identified ANN model can be used to simulate water quality parameters. Karakaya et al. (2013) proposed ANN models for the prediction of DO and chlorophyll-a (chl-a) in a lake in Turkey. Their findings revealed the higher accuracy of the ANN model in predicting DO and chl-a. In another study, Wu et al. (2014) utilized an ANN model for the prediction of chl-a. A comparison of the developed ANN model with a linear model (MLR) showed the higher precision of the ANN model in solving the non-linear and complex problems in their study.

Recently, the linear genetic programming (LGP) model has been applied as a useful tool for modeling in water resources management. Danandeh Mehr et al. (2014b) investigated the application of LGP for the prediction of monthly streamflow in a river. Their findings, based on a comparison with three different ANN models, showed the potential of LGP as a useful tool for the prediction of monthly streamflow. An LGP model was used for the prediction of flow discharge in compound channels by Zahiri and Azamathulla (2014), who compared the efficiency of the LGP model with that of the M5 tree model. In that study, the accuracy of the LGP model was surpassed by another model in flow discharge prediction. The use of hybrid models to increase efficiency in predicting hydrological time series, especially in water quality modeling, is common. Examples include a hybrid ANN and wavelet model for the simulation of 30-min dissolved oxygen (Ravansalar et al., 2015), hybrid wavelet and support vector machine models for precipitation forecasting (Kisi and Cimen, 2011), hybrid artificial neural network–geostatistics models for spatiotemporal groundwater level forecasting in coastal aquifers (Nourani et al., 2010), a hybrid genetic algorithm–support vector regression (GA-SVR) model for chlorophyll-a level forecasting (Rajaee and Boroumand, 2015), a hybrid wavelet–genetic programming approach to optimize ANN modeling of the rainfall–runoff process (Nourani et al., 2012) and a hybrid wavelet and support vector machine for meteorological pollution forecasting (Osowski and Garanty, 2006).

The wavelet transform has become a useful mathematical tool for analyzing variations in time series, as it provides a time–frequency representation of a signal analyzed in the time domain. Combining the DWT with a single LGP model shows superiority over a single LGP model for hydrological time series. In recent decades, the combination of DWT theory with other artificial intelligence (AI) models has been employed in several studies in water resources and environmental engineering (Adamowski and Sun, 2010, Rajaee, 2010, Rajaee et al., 2011, Danandeh Mehr et al., 2013, Danandeh Mehr et al., 2014a, Ravansalar et al., 2015, Ravansalar and Rajaee, 2015, Nourani et al., 2011, Nourani et al., 2013).

Adamowski and Sun (2010) utilized a discrete “à trous” mother wavelet for the decomposition of daily streamflow time series from two rivers in Cyprus. They used the sub-series generated by the wavelet transform as input variables of the ANN model (WA-ANN) and compared the performance of the proposed model with that of the ANN. In their study, the WA-ANN model was found to provide more accurate flow forecasts than the ANN model for both rivers. Rajaee (2010) proposed a model combining wavelet analysis and the neuro-fuzzy model (WNF) to predict the daily suspended sediment load (SSL) at a gauging station in the USA. In the proposed model, the daily discharge and SSL signals were decomposed into sub-signals with different scales. The results showed that the proposed model performed better than the neuro-fuzzy (NF), MLR and SRC models in predicting SSL. Rajaee et al. (2011) studied WANN, ANN, MLR and SRC models for daily SSL estimation at a gauging station on the Iowa River in the USA. A comparison of the results indicated that the WANN model predicted the SSL more accurately than the conventional ANN, MLR and SRC models. Danandeh Mehr et al. (2013) proposed a hybrid DWT–ANN model to predict monthly streamflow one month ahead, and also modeled the streamflow with the LGP model. The results showed the superiority of the WANN model over the ANN model, but the LGP model gave better results than both the WANN and ANN models. In another study, Danandeh Mehr et al. (2014a) investigated the accuracy of the WLGP model for drought forecasting with 3-, 6- and 12-month lead times in the state of Texas. In their study, the main time series of the El Niño–Southern Oscillation indicator (NINO) and Palmer’s modified drought index (PMDI) were decomposed into sub-time series by the continuous wavelet transform (CWT). Significant bands were then determined using the significant variance method in the wavelet spectra. 
These significant bands were then used as inputs to the LGP model for drought forecasting. They compared the model’s results with those of the LGP, ANN, WANN, fuzzy logic (FL) and wavelet-FL (WFL) models and found that the WLGP model performed better than those models in drought forecasting. Ravansalar et al. (2015) proposed a hybrid WANN model to predict dissolved oxygen (DO) thirty minutes ahead at a gauging station on the River Calder, UK. In their study, the 30-min DO and temperature (T) time series were decomposed into several sub-series using four types of mother wavelet. These sub-series were then fed to the ANN model as input patterns. The results showed the superiority of the WANN model over the single ANN model in DO prediction thirty minutes ahead. Ravansalar and Rajaee (2015) applied DWT theory to decompose the monthly electrical conductivity (EC) and Q time series of the Demirköprü Station on the Asi River, Turkey. The sub-series were used as inputs to the ANN model for the prediction of EC one month ahead. The EC and Q time series were decomposed using seven types of mother wavelet. The WANN model that used the Dmey wavelet at decomposition level 3 was selected as the best model, outperforming the ad hoc ANN model in predicting EC one month ahead. Nourani et al. (2011) utilized the wavelet technique to decompose rainfall–runoff time series and linked it to the ANFIS and ANN models. The WANFIS and WANN models gave better results than other models such as SARIMAX-ANN, ANN and ANFIS. Nourani et al. (2013) compared models such as SOM-FFNN (self-organizing map–feed-forward neural network) and ARIMAX (auto-regressive integrated moving average with exogenous input) with the SOM-WT-FFNN model. Their results showed that applying the wavelet transform to the runoff data increased the performance of the FFNN models (SOM-WT-FFNN). 
A comprehensive account of the wavelet theory and its combination with other AI models can be found in Nourani et al. (2014).
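
The decomposition step shared by the wavelet hybrids reviewed above can be illustrated with a minimal sketch. It assumes the Haar wavelet purely for simplicity; the mother wavelet and decomposition level actually used in this study are not stated in the text shown here, and the function names `haar_dwt_step`, `haar_decompose` and `haar_reconstruct` are hypothetical. The sketch returns decimated coefficient series (each level halves the length), whereas the studies cited above may use sub-series reconstructed to the original length.

```python
import math


def haar_dwt_step(signal):
    """One level of the Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Scaled pairwise sums give the low-frequency (approximation) part,
    # scaled pairwise differences give the high-frequency (detail) part.
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Multi-level decomposition: returns [d1, d2, ..., d_levels, a_levels]."""
    sub_series = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        sub_series.append(detail)
    sub_series.append(approx)
    return sub_series


def haar_reconstruct(sub_series):
    """Invert haar_decompose, rebuilding the original signal."""
    approx = sub_series[-1]
    s = math.sqrt(2.0)
    for detail in reversed(sub_series[:-1]):
        rebuilt = []
        for a, d in zip(approx, detail):
            rebuilt.append((a + d) / s)
            rebuilt.append((a - d) / s)
        approx = rebuilt
    return approx


# Illustrative values only; these are not data from the Asi River.
monthly_series = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
parts = haar_decompose(monthly_series, levels=2)
```

In a hybrid model of the kind described above, each list in `parts` would serve as a separate input series to the data-driven model instead of the raw signal.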

This study presents a new investigation that couples the DWT with the LGP model for the prediction of water quality parameters using multiple hydrological input patterns. The main aim of this research was to investigate the accuracy of DWT theory linked to the LGP model for the prediction of water quality parameters such as Na+ concentration in the Asi River, located in Antakya, Turkey. A hybrid wavelet and linear genetic programming approach to water quality modeling, particularly sodium concentration prediction, has yet to be investigated in the literature. This emerging area remains largely unexplored.

The rest of the paper is structured as follows. First, the paper introduces the LGP, ANN and DWT in Section 2. The following section presents the proposed WLGP model combination. The gauging station and the statistical analysis of the dataset are then described, followed by the model application and results. Finally, the conclusions and suggestions for similar future studies are presented.

Section snippets

Linear genetic programming (LGP)

The LGP model is one of the applications of the genetic algorithm (GA) which was proposed by Koza (1992). LGP is practical in that it provides sufficient accuracy and is a novel application among algorithmic approaches. The difference between the LGP and GA techniques is the representation of the solution: LGP operates on parse trees, whereas the GA operates on bit strings. The parse tree is built from both the function set and the terminal set (Muttil and Chau, 2006). The function set can be
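
As a rough illustration of a parse tree built from a function set and a terminal set, the following sketch evaluates one small tree. The function set, the terminal names `Q` and `Na`, the nested-tuple encoding and the `evaluate` helper are all assumptions made for this example; they are not taken from the paper.

```python
import operator

# Hypothetical function set: a few binary arithmetic operators.
FUNCTION_SET = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}


def evaluate(tree, terminals):
    """Recursively evaluate a parse tree.

    A tree is either a terminal (a variable name or a numeric constant)
    or a tuple of the form (function_symbol, left_subtree, right_subtree).
    """
    if isinstance(tree, tuple):
        symbol, left, right = tree
        return FUNCTION_SET[symbol](
            evaluate(left, terminals), evaluate(right, terminals)
        )
    if isinstance(tree, str):
        # Variable terminal: look up its current value.
        return terminals[tree]
    # Numeric constant terminal.
    return float(tree)


# The expression (Q * 0.5) + Na, encoded as a nested tuple.
example_tree = ("+", ("*", "Q", 0.5), "Na")
value = evaluate(example_tree, {"Q": 10.0, "Na": 2.0})
```

An evolutionary search would generate and recombine many such candidate expressions and keep those that best fit the measured data; that search loop is outside the scope of this sketch.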

Gauging station and data analysis

The data used in this study are from the Demirköprü Station (No: 1907; 36°21′12′′E, 36°15′05′′N) on the Asi (Orontes) River, Antakya (P36), Turkey. The Demirköprü Station is located near the Turkish–Syrian border. The monthly Q and Na+ were selected as the independent and dependent variables, respectively, for modeling. The river is approximately 380 km long, and 88 km of its downstream portion is within the borders of Turkey (Ravansalar and Rajaee, 2015). The location of the studied

Model evaluation criteria

In this study, the Nash–Sutcliffe efficiency (NSE) (Nash and Sutcliffe, 1970), the root mean square error (RMSE), the mean absolute error (MAE) and the Pearson correlation coefficient (P) were used to evaluate the accuracy of the models by comparing the measured and predicted values. NSE measures the degree of correlation between the observed and predicted values and ranges between −∞ and 1; for an optimal model prediction, an NSE close to 1 is preferred. The NSE equation is as follows: NSE = 1 − Σ_{i=1}^{n} (Na_i^m − Na_i^p)^2 / Σ_{i=1}^{n} (
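
The four criteria named above can be computed as in the following minimal sketch. It assumes the standard textbook definitions; in particular, the NSE denominator is taken to be the sum of squared deviations of the measured values from their mean, as in the usual Nash–Sutcliffe form, because the equation is truncated in the text shown here. The function names are hypothetical.

```python
import math


def nse(measured, predicted):
    """Nash-Sutcliffe efficiency (standard form, assumed here)."""
    mean_m = sum(measured) / len(measured)
    num = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    den = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - num / den


def rmse(measured, predicted):
    """Root mean square error."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def mae(measured, predicted):
    """Mean absolute error."""
    n = len(measured)
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / n


def pearson(measured, predicted):
    """Pearson correlation coefficient."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)
```

A prediction with a constant offset shows why several criteria are reported together: P stays at 1 because the series remain perfectly correlated, while NSE, RMSE and MAE all register the bias.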

Result and discussion

The performance of the LGP, WLGP, WANN and ANN models was evaluated and compared. The results are presented in Table 3. Six input combinations were examined for all models. The combinations were arranged according to the results of the correlations in Table 2 and Table 3. Within the LGP models, model (2), which uses data from the successive previous month as input, gave the lowest RMSE (0.115 meq/l) and MAE (0.085) and the highest NSE (0.484) and P (0.696) during the testing period. In the WLGP model,
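
One way to assemble lagged input combinations for one-month-ahead prediction is sketched below. The six combinations examined in the paper are not listed in the text shown here, so the lag choices, the `build_patterns` helper and the sample values are hypothetical.

```python
def build_patterns(q, na, q_lags, na_lags):
    """Build (inputs, targets) for one-month-ahead Na+ prediction.

    q_lags and na_lags are lists of non-negative lags, where lag 0 means
    the current month t. The target for each row is Na+ at month t + 1.
    At least one lag must be supplied.
    """
    max_lag = max(q_lags + na_lags)
    inputs, targets = [], []
    # Start where every lagged value exists; stop one month before the end
    # so that the target Na(t + 1) is available.
    for t in range(max_lag, len(na) - 1):
        row = [q[t - k] for k in q_lags] + [na[t - k] for k in na_lags]
        inputs.append(row)
        targets.append(na[t + 1])
    return inputs, targets


# Illustrative values only; these are not data from the Asi River.
q_series = [10.0, 20.0, 30.0, 40.0, 50.0]
na_series = [1.0, 2.0, 3.0, 4.0, 5.0]

# Hypothetical combination: Q(t), Na(t) and Na(t-1) as inputs.
X, y = build_patterns(q_series, na_series, q_lags=[0], na_lags=[0, 1])
```

For a wavelet-based variant, the same helper could be applied to each sub-series in place of the raw Q and Na+ series.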

Conclusion and remarks

In this study, four frequently used data-driven models, namely LGP, WLGP, ANN and WANN, were evaluated in terms of their prediction of monthly Na+ at a gauging station on the Asi River, Antakya, Turkey. Comparing the models’ outcomes, it was found that the WLGP model, which uses the decomposed time series as input vectors and links them to the single LGP model, showed the best performance among all models, with NSE = 0.984, P = 0.992, MAE = 0.011 and RMSE = 0.019 meq/l. The presented WLGP model improved

References (46)

  • V. Nourani et al.

    Applications of hybrid wavelet–artificial intelligence models in hydrology: a review

    J. Hydrol.

    (2014)
  • R. Noori et al.

    Assessment of input variables determination on the SVM model performance using PCA, Gamma test, and forward selection techniques for monthly stream flow prediction

    J. Hydrol.

    (2011)
  • T. Partal et al.

    Estimation and forecasting of the daily suspended sediment data using wavelet-neural networks

    J. Hydrol.

    (2008)
  • T. Rajaee et al.

    Forecasting of chlorophyll-a concentrations in South San Francisco Bay using five different models

    Appl. Ocean Res.

    (2015)
  • T. Rajaee et al.

    Daily suspended sediment concentration simulation using ANN and neuro-fuzzy models

    Sci. Total Environ.

    (2009)
  • T. Rajaee

    Wavelet and ANN combination model for prediction of daily suspended sediment load in rivers

    Sci. Total Environ.

    (2011)
  • H.D. Tran et al.

    Selection of significant input variables for time series forecasting

    Environ. Modell. Software

    (2015)
  • M. Zounemat-Kermani et al.

    Performance of radial basis and LM-feed forward artificial neural networks for predicting daily watershed runoff

    Appl. Soft Comput.

    (2013)
  • P.S. Addison et al.

    Wavelet transform analysis of open channel wake flows

    J. Eng. Mech.

    (2001)
  • B. Cannas et al.

    Stream flow forecasting using neural networks and wavelet analysis

    J. Eur. Geosci. Union.

    (2005)
  • A. Cohen et al.

    Wavelets: the mathematical background

    Proc. IEEE

    (1996)
  • I. Daubechies

    The wavelet transform, time–frequency localization and signal analysis

    IEEE Trans. Inf. Theory

    (1990)
  • M.T. Hagan et al.

    Training feed forward networks with the Marquardt algorithm

    IEEE Trans. Neural Netw.

    (1994)