A wavelet–linear genetic programming model for sodium (Na+) concentration forecasting in rivers
Introduction
Predicting surface water quality is an important issue in water resources and environmental engineering. Rivers are among the most important water resources for mankind, and because of their dynamic nature they are also among the sources whose quality is most adversely affected. Na+ contamination in surface waters such as rivers is widespread. Sodium impurities enter the environment through run-off from rain and from melting snow and ice, as well as through splash and spray from vehicles and through wind. They find their way onto vegetation and into soil, groundwater, storm drains and surface waters, where they have a significant impact on the environment. It is therefore crucial to find an accurate indirect way of forecasting the Na+ concentration in surface water in order to improve water resource management. It is difficult to describe water quality quantitatively with accurate mathematical models and to establish an accurate prediction model with classical approaches, as classical models usually need large amounts of data and a long response time. Studies have been conducted to reduce the complexity of these problems by developing practical methods that do not require extensive algorithms and theory. Classical techniques such as the multi-linear regression (MLR) model are widely used for hydrological time series (Zounemat-Kermani, 2012, Magar and Jothiprakash, 2011). The limitation of such models lies in their linearity, which prevents them from capturing the non-stationarities and non-linearities in hydrologic datasets; therefore, they are not adequately effective in dealing with these types of problems. In recent years, Artificial Intelligence (AI) models such as the artificial neural network (ANN), which is known as a reliable tool, have been successfully used in different research areas, including water quality and environmental engineering (Atieh et al., 2015, Mekanik et al., 2013, Wen et al., 2013, Karakaya et al., 2013, Wu et al., 2014).
Atieh et al. (2015) obtained significant results with ANN models for sediment rating curve parameters in ungauged basins of Ontario, Canada. In their study, ANNs were essential tools for calculating the sediment quantities needed in water quality management plans for ungauged basins. Mekanik et al. (2013) proposed the application of ANNs to long-term rainfall forecasting in Victoria, Australia. They found that the ANN models outperformed the multiple regression (MR) models and that ANNs are reliable models for rainfall time series. Wen et al. (2013) used an ANN model to predict dissolved oxygen (DO) in the Heihe River in China. Their results showed that the identified ANN model can be used to simulate water quality parameters. Karakaya et al. (2013) proposed ANN models for predicting DO and chlorophyll-a (chl-a) in a lake in Turkey. The modeling results revealed the higher accuracy of the ANN model in predicting DO and chl-a. In another study, Wu et al. (2014) used an ANN model to predict chl-a. A comparison of the developed ANN model with a linear model such as MLR showed the higher precision of the ANN model in solving the non-linear and complex problems in their study.
Recently, the linear genetic programming (LGP) model has been applied as a useful tool for modeling in water resources management. Danandeh Mehr et al. (2014b) investigated the application of LGP to the prediction of monthly streamflow in a river. By comparing LGP with three different ANN models, their findings showed the potential of LGP as a useful tool for monthly streamflow prediction. Zahiri and Azamathulla (2014) used an LGP model to predict flow discharge in compound channels and compared its efficiency with that of the M5 tree model. In that study, the LGP model was outperformed by another model in flow discharge prediction. Hybrid models are widely used to increase efficiency in predicting hydrological time series, especially in water quality modeling. Examples include a hybrid ANN and wavelet model for simulating 30-min dissolved oxygen (Ravansalar et al., 2015), hybrid wavelet and support vector machine models for precipitation forecasting (Kisi and Cimen, 2011), hybrid artificial neural network-geostatistics models for spatiotemporal groundwater level forecasting in coastal aquifers (Nourani et al., 2010), hybrid genetic algorithm-support vector regression (GA-SVR) models for chlorophyll-a level forecasting (Rajaee and Boroumand, 2015), a hybrid wavelet–genetic programming approach to optimize ANN modeling of the rainfall–runoff process (Nourani et al., 2012) and a hybrid wavelet and support vector machine model for meteorological pollution forecasting (Osowski and Garanty, 2006).
The wavelet transform has become a useful mathematical tool for analyzing variations in time series, as it provides a time–frequency representation of a signal analyzed in the time domain. Combining the discrete wavelet transform (DWT) with a single LGP model has shown superiority over the single LGP model for hydrological time series. In recent decades, the DWT has been combined with other artificial intelligence (AI) models in a number of studies in water resources and environmental engineering (Adamowski and Sun, 2010, Rajaee, 2010, Rajaee et al., 2011, Danandeh Mehr et al., 2013, Danandeh Mehr et al., 2014a, Ravansalar et al., 2015, Ravansalar and Rajaee, 2015, Nourani et al., 2011, Nourani et al., 2013).
Adamowski and Sun (2010) used a discrete "à trous" mother wavelet to decompose daily streamflow time series from two rivers in Cyprus. They used the sub-series generated by the wavelet as input variables of the ANN model (WA-ANN) and then compared the performance of the proposed model with that of the ANN. In their study, the WA-ANN model provided more accurate flow forecasts than the ANN model for both rivers. Rajaee (2010) proposed a model combining wavelet analysis and the neuro-fuzzy approach (WNF) to predict the daily suspended sediment load (SSL) at a gauging station in the USA. In this model, the daily discharge and SSL signals were decomposed into sub-signals with different scales. The results showed that the proposed model performed better than the neuro-fuzzy (NF), MLR and sediment rating curve (SRC) models in predicting SSL. Rajaee et al. (2011) studied wavelet-ANN (WANN), ANN, MLR and SRC models for daily SSL estimation at a gauging station on the Iowa River in the USA. A comparison of the results indicated that the WANN model predicted the suspended sediment load more accurately than the conventional ANN, MLR and SRC models. Danandeh Mehr et al. (2013) proposed a hybrid DWT and ANN model to predict monthly streamflow one month ahead, and also modeled the streamflow with the LGP model. The results showed the superiority of the WANN model over the ANN model, but the LGP model gave better results than both the WANN and ANN models. In another study, Danandeh Mehr et al. (2014a) investigated the accuracy of the wavelet-LGP (WLGP) model for drought forecasting with lead times of 3, 6 and 12 months in the state of Texas. In their study, the main time series of the El Niño-Southern Oscillation indicator (NINO) and Palmer’s modified drought index (PMDI) were decomposed into sub-time series by the continuous wavelet transform (CWT). Significant bands were then determined in the wavelet spectra using the significant variance method.
These significant bands were then used as inputs to the LGP model for drought forecasting. They compared the model’s results with those of the LGP, ANN, WANN, fuzzy logic (FL) and wavelet-FL (WFL) models and found that the WLGP model performed better than those models in drought forecasting. Ravansalar et al. (2015) proposed a hybrid WANN model to predict dissolved oxygen (DO) thirty minutes ahead at a gauging station on the River Calder, UK. In their study, the thirty-minute DO and temperature (T) series were decomposed into several sub-series using four types of mother wavelet. These sub-series were then fed to the ANN model as input patterns. The results showed the superiority of the WANN model over the single ANN model in predicting DO thirty minutes ahead. Ravansalar and Rajaee (2015) applied the DWT to decompose the monthly electrical conductivity (EC) and Q time series of the Demirköprü Station on the Asi River, Turkey. The sub-series were used as inputs to the ANN model to predict the EC one month ahead. The EC and Q time series were decomposed using seven types of mother wavelet. The WANN model that used the Dmey mother wavelet at decomposition level 3 was selected as the best model, outperforming the ad hoc ANN model in predicting the EC one month ahead. Nourani et al. (2011) used the wavelet technique to decompose rainfall–runoff time series and linked it to the ANFIS and ANN models. The resulting WANFIS and WANN models gave better outcomes than other models such as SARIMAX-ANN, ANN and ANFIS. Nourani et al. (2013) compared models such as SOM-FFNN (self-organizing map-feed-forward neural network) and ARIMAX (auto regressive integrated moving average with exogenous input) with the SOM-WT-FFNN model. Their results showed that applying the wavelet transform to the runoff data increased the performance of the FFNN models (SOM-WT-FFNN).
A comprehensive account of the wavelet theory and its combination with other AI models can be found in Nourani et al. (2014).
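As a minimal sketch of the decomposition step shared by the hybrid models reviewed above, the following code splits a short series into one set of approximation coefficients and several sets of detail coefficients. The Haar wavelet, the function names and the sample values are assumptions made for illustration only; the cited studies used other mother wavelets (for example Dmey), and this code is not taken from any of them.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(signal):
    """One Haar DWT level: return (approximation, detail) coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even at every level")
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_dwt(signal, levels):
    """Multi-level decomposition, returned as [A_L, D_L, ..., D_1]."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]

def haar_idwt(coeffs):
    """Invert haar_dwt and rebuild the original series."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        rebuilt = []
        for a, d in zip(approx, detail):
            rebuilt.append((a + d) / SQRT2)
            rebuilt.append((a - d) / SQRT2)
        approx = rebuilt
    return approx

# Hypothetical monthly values, not data from the study.
series = [1.0, 3.0, 2.0, 6.0, 4.0, 4.0, 5.0, 7.0]
coeffs = haar_dwt(series, levels=2)
print([len(c) for c in coeffs])  # lengths of A2, D2, D1
print(haar_idwt(coeffs))         # should match the input up to rounding
```

In a wavelet-hybrid model, sub-series derived from such a decomposition, rather than the raw series alone, would be passed to the ANN or LGP model as inputs.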
This study is a new investigation that couples the DWT with the LGP model to predict water quality parameters using multiple hydrological input patterns. The main aim of this research was to investigate the accuracy of the DWT linked to the LGP model in predicting water quality parameters such as the Na+ concentration in the Asi River in Antakya, Turkey. To date, hybrid wavelet and linear genetic programming has not been investigated in the literature for water quality modeling, particularly for sodium concentration prediction. This emerging area remains largely unexplored.
The rest of the paper is structured as follows. Section 2 introduces the LGP, ANN and DWT. The next section presents the proposed WLGP model. The gauging station and the statistical analysis of the dataset are then described, followed by the model application and results. Finally, the conclusions and suggestions for future similar studies are presented.
Section snippets
Linear genetic programming (LGP)
The LGP model is one of the applications of the genetic algorithm (GA) which was proposed by Koza (1992). LGP is highly practical, as it provides sufficient accuracy, and it is a novel application among algorithmic approaches. The difference between the LGP and GA techniques lies in the representation of the solution: LGP operates on parse trees, whereas the GA operates on bit strings. The parse tree is created from both the function set and the terminal set (Muttil and Chau, 2006). The function set can be
Gauging station and data analysis
The data used in this study are from the Demirköprü Station (No: 1907; 36°21′12′′E, 36°15′05′′N) on the Asi (Orontes) River, Antakya (P36), Turkey. The Demirköprü Station is located near the Turkish-Syrian border. The monthly Q and Na+ were selected as the independent and dependent variables, respectively, for modeling. The river is approximately 380 km long, and 88 km of its downstream portion is within the borders of Turkey (Ravansalar and Rajaee, 2015). The location of the studied
Model evaluation criteria
In this study, the Nash–Sutcliffe efficiency (NSE) (Nash and Sutcliffe, 1970), the root mean square error (RMSE), the mean absolute error (MAE) and the Pearson correlation coefficient (P) were used to evaluate the accuracy of the models by comparing the measured and predicted values. NSE measures the degree of agreement between the observed and predicted values and ranges between −∞ and 1, where a value close to 1 indicates optimal model prediction. The NSE equation is as follows:
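The equation itself is not reproduced in this excerpt. In its standard form, NSE = 1 − Σ(O_i − P_i)² / Σ(O_i − Ō)², where O_i and P_i are the observed and predicted values and Ō is the mean of the observations; these symbols are assumed here and may differ from the paper's notation. The sketch below computes the four criteria with their usual textbook definitions. The function names and sample values are illustrative and are not taken from the study.

```python
import math

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / spread

def rmse(obs, pred):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error, in the units of the data."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def pearson(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    mean_o = sum(obs) / len(obs)
    mean_p = sum(pred) / len(pred)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)

# Hypothetical observed and predicted values, not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(nse(observed, predicted), rmse(observed, predicted),
      mae(observed, predicted), pearson(observed, predicted))
```

The sample values show why several criteria are reported together: a prediction that is offset by a constant bias has a perfect correlation but a low NSE.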
Result and discussion
The performance of the LGP, WLGP, WANN and ANN models was evaluated and compared. The results are presented in Table 3. Six input combinations were examined for all models. The combinations were arranged according to the results of the correlations in Table 2 and Table 3. Among the LGP models, model (2), which uses the data from the preceding month as input, gave the lowest RMSE (0.115 meq/l) and MAE (0.085) and the highest NSE (0.484) and P (0.696) during the testing period. In the WLGP model,
Conclusion and remarks
In this study, four frequently used data-driven models, namely LGP, WLGP, ANN and WANN, were evaluated for predicting the monthly Na+ at a gauging station on the Asi River, Antakya, Turkey. A comparison of the models’ outcomes showed that the WLGP model, which uses the decomposed time series as input vectors and links them to the single LGP model, performed best among all models, with NSE = 0.984, P = 0.992, MAE = 0.011 and RMSE = 0.019 meq/l. The presented WLGP model improved
References (46)
- et al. Development of a coupled wavelet transform and neural network method for flow forecasting of non-perennial rivers in semi-arid watersheds. J. Hydrol. (2010)
- et al. Integrative neural networks model for prediction of sediment rating curve parameters for ungauged basins. J. Hydrol. (2015)
- et al. Groundwater level forecasting using artificial neural network. J. Hydrol. (2005)
- et al. Streamflow prediction using linear genetic programming in comparison with a neuro-wavelet technique. J. Hydrol. (2013)
- et al. A gene-wavelet model for long lead time drought forecasting. J. Hydrol. (2014)
- et al. Linear genetic programming application for successive-station monthly streamflow prediction. Comput. Geosci. (2014)
- et al. Rainfall–runoff relation for karstic spring. Part 2: continuous wavelet and discrete orthogonal multi resolution analyses. J. Hydrol. (2000)
- et al. Multiple regression and Artificial Neural Network for long-term rainfall forecasting using large scale climate modes. J. Hydrol. (2013)
- et al. River flow forecasting through conceptual models, Part I: a discussion of principles. J. Hydrol. (1970)
- et al. Two hybrid artificial intelligence approaches for modeling rainfall–runoff process. J. Hydrol. (2011)