Elsevier

Journal of Hydrology

Volume 563, August 2018, Pages 669-678
Research papers
An improved gene expression programming model for streamflow forecasting in intermittent streams

https://doi.org/10.1016/j.jhydrol.2018.06.049

Highlights

  • A new hybrid model, called GEP-GA, was proposed for streamflow forecasting in intermittent streams.

  • The coefficients of the best GEP expressions were optimized through GA.

  • The GEP-GA was superior to GEP, GP, MLR and GEP-LR models.

  • The GA improved GEP accuracy by up to 30% in the current study.

Abstract

Skilful forecasting of monthly streamflow in intermittent rivers is a challenging task in stochastic hydrology. In this study, a genetic algorithm (GA) was combined with gene expression programming (GEP) to form a new hybrid model for month-ahead streamflow forecasting in an intermittent stream. In the hybrid model, named GEP-GA, the sub-expression trees of the best evolved GEP model were rescaled by appropriate weighting coefficients using a GA optimizer. The auto-correlation and partial auto-correlation functions of the streamflow records, as well as the evolutionary search of GEP, were used to identify the optimum predictors (i.e., number of lags) for the model. The proposed methodology was demonstrated using monthly streamflow data from Shavir Creek in Iran. The performance of GEP-GA was compared with that of classic genetic programming (GP), GEP, multiple linear regression, and GEP-linear regression models, all developed in the present study as benchmarks. The results showed that GEP-GA outperforms all the benchmarks, which motivates its use in practice.

Introduction

Accurate streamflow forecasting is an important task for a variety of issues in basin hydrology including (but not limited to) reservoir operation, irrigation planning, food production, flood damage mitigation and environmental protection. A number of models have been suggested to simulate this complex process either conceptually or through data-driven methods (Aksoy and Bayazit, 2000, Wang et al., 2009, Yaseen et al., 2017). Intermittent streams are those that may occasionally experience dry spells. This is often the case in arid and semi-arid regions (Salas, 1993), particularly in the tributaries of mountainous rivers or snow-fed streams. Because of the paucity of gauging stations in mountainous regions, the commonly used rainfall-runoff approaches may not be applicable to forecasting streamflow in intermittent streams. In such situations, data-driven techniques could be used to model the streamflow time series if a continuous set of streamflow measurements is available. The evolved model could then be applied to neighbouring tributaries using regionalization techniques. In the recent literature, owing to advances in data-driven techniques, a number of cross-station, single-station, and successive-station monthly streamflow forecasting models have been developed and their successful results have been reported (Danandeh Mehr et al., 2013).

Gene expression programming (GEP) is a relatively new data-driven method that uses a population of individuals (programs), improves them according to fitness, and obtains the best solution using one or more genetic operators (Ferreira 2001). The foremost differences between genetic programming (GP) and GEP reside mainly in the nature of their programs. In both, programs are nonlinear entities of different sizes and shapes. While programs are encoded as parse trees in GP, in GEP they are encoded as linear strings of fixed length (the chromosomes), which are afterwards expressed as expression trees. Details about GP and GEP are provided in Section 2.
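The difference in encoding can be made concrete with a small example. The following minimal Python sketch, which is an illustration and not the implementation used in the study, decodes a fixed-length linear gene into an expression tree by filling the tree level by level, in the manner of Ferreira's Karva notation. The function set, the protected square root, and the names `ARITY`, `decode` and `evaluate` are assumptions made for this example.

```python
import math
from collections import deque

# Number of arguments of each function symbol (assumed function set).
# Any symbol not listed here is treated as a terminal (input variable).
ARITY = {"+": 2, "-": 2, "*": 2, "Q": 1}  # Q denotes square root


def decode(gene):
    """Express a linear gene as a nested tree [symbol, child, child, ...].

    Symbols are read left to right and attached breadth-first; symbols
    beyond the last position needed by the tree are simply not expressed.
    """
    symbols = iter(gene)
    root = [next(symbols)]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for _ in range(ARITY.get(node[0], 0)):
            child = [next(symbols)]
            node.append(child)
            queue.append(child)
    return root


def evaluate(node, env):
    """Recursively evaluate a decoded tree for the terminal values in env."""
    symbol, children = node[0], node[1:]
    args = [evaluate(child, env) for child in children]
    if symbol == "+":
        return args[0] + args[1]
    if symbol == "-":
        return args[0] - args[1]
    if symbol == "*":
        return args[0] * args[1]
    if symbol == "Q":
        return math.sqrt(abs(args[0]))  # protected square root (assumption)
    return env[symbol]


# A fixed-length gene of 12 symbols; only the first 8 are expressed,
# giving sqrt((a + b) * (c - d)).
gene = "Q*+-abcdaaab"
tree = decode(gene)
value = evaluate(tree, {"a": 5.0, "b": 4.0, "c": 3.0, "d": 2.0})
print(tree)
print(value)  # sqrt((5 + 4) * (3 - 2)) = 3.0
```

Because the gene has a fixed length while the expressed tree uses only as many symbols as it needs, genetic operators can modify the string freely and the result still decodes to a valid tree, provided the tail of the gene contains terminals only.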

In recent years, different variants of GP such as GEP, multigene GP (MGGP), and linear GP (LGP) have been used for streamflow prediction (Babovic and Keijzer, 2002, Meshgi et al., 2015, Ravansalar et al., 2017). For example, Guven (2009) compared LGP with two versions of artificial neural networks (ANNs) to predict the daily streamflow of the Schuylkill River in the USA. The author demonstrated that LGP performs better than ANNs. Danandeh Mehr et al. (2013) used LGP for monthly streamflow prediction between successive stations on the Çoruh River, a perennial river in Turkey, and showed that LGP is superior to a neuro-wavelet model. Shoaib et al. (2015) integrated a GEP model with a discrete wavelet transform pre-processing approach to predict streamflow using rainfall data. The main contribution of the study was the introduction of a novel wavelet-GEP model applicable over four watersheds. Notably, the aim of applying the wavelet transform to the streamflow time series was to extract their temporal and spectral information. The authors used the sequential time series approach to determine the input vector matrix on which the predictive model was built. The proposed wavelet-GEP model outperformed the individual GEP model in all case study catchments during both training and testing phases. Using rainfall, potential evapotranspiration and streamflow from the Moselle River basin in France, Danandeh Mehr and Demirel (2016) showed that MGGP can be satisfactorily used for one-day-ahead low flow prediction. More recently, Danandeh Mehr and Kahya (2017) developed a Pareto-optimal moving MGGP model for daily streamflow prediction and demonstrated that their hybrid model can overcome the timing error in time series analysis of daily streamflow models.

Focusing on the implementation of GP/GEP in a wider range of hydrological studies, the author’s review showed that they have been frequently used to distil knowledge from natural or experimental observations (e.g., Khu et al., 2001, Kisi et al., 2012b, Meshgi et al., 2014, Johari and Nejad, 2015, Danandeh Mehr, 2018). These techniques generate symbolic expressions that can be interpreted and combined with domain knowledge (Babovic, 2005, Babovic, 2009), which motivates their use in practice. Until recently, only a few studies focused on the application of GEP to monthly streamflow forecasting. For example, Karimi et al. (2016) forecasted river flow at both daily and monthly time scales using a GEP model integrated with a wavelet data pre-processing approach at the Filyos River, a perennial river in the Mediterranean region of Turkey. For comparison purposes, a traditional auto-regressive moving average model together with two other soft computing methods, ANNs and an adaptive neuro-fuzzy inference system, were used in the study. The authors showed that wavelet-GEP was superior to its counterparts. Al-Juboori and Guven (2016) developed a GEP-based stepwise monthly streamflow prediction model and demonstrated that their model precisely forecasts monthly flows at the perennial Hurman River in Turkey as well as the Diyalah and Lesser Zab Rivers in Iraq.

Table 1 lists some of the studies that implemented at least one GP variant for time series modelling of streamflow data. As shown in the table, the papers of Karimi et al. (2016) and Al-Juboori and Guven (2016) deal with generating GEP-based monthly streamflow forecasting models for perennial rivers, whereas the present study focuses on calibrating GEP for intermittent rivers. The main difference between the methodology of this study and those of Karimi et al. (2016) and Al-Juboori and Guven (2016) is the inclusion of the seasonality effect, which is the major pattern in intermittent streamflow series, in the selection of potential predictors. Moreover, the present study puts forward a new strategy to enhance the accuracy of GEP forecasts.

On the other hand, documented studies on streamflow forecasting in intermittent rivers are quite limited owing to the complexity of time series modelling of intermittent flows (Kisi et al., 2012b). Although one might find a few studies that suggest the implementation of soft computing methods for intermittent streamflow forecasting (e.g., Cigizoglu, 2005, Kişi, 2009, Kisi et al., 2012b), to the best of the author’s knowledge, the present study is the first in the literature to apply GEP to monthly streamflow forecasting in an intermittent stream. In light of the abovementioned literature, a new hybridization procedure is suggested in order to augment GEP prediction accuracy. In this procedure, the coefficients of the best GEP-induced expression are optimized through a genetic algorithm (GA). The proposed hybrid GEP-GA methodology is applied to single-station monthly streamflow forecasting at Shavir Creek, an intermittent stream located in northwest Iran. The efficiency results of the new model are compared with those of classic GP, standalone GEP, multiple linear regression (MLR), and hybrid GEP-linear regression (GEP-LR) models, all developed in the present study as benchmarks.
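The idea of rescaling sub-expression outputs with GA-optimized coefficients can be sketched as follows. This is a minimal, generic real-coded GA in Python and not the author's implementation: the operators (tournament selection, arithmetic crossover, Gaussian mutation, elitism), the parameter values, the function names `rmse` and `ga_optimize`, and the synthetic data are all assumptions made for illustration. The outputs of the sub-expression trees are taken as given, and the unweighted sum (all weights equal to 1) stands in for the standalone GEP model.

```python
import random


def rmse(weights, sub_outputs, observed):
    """Root mean square error of the weighted sum of sub-expression outputs."""
    total = 0.0
    for row, obs in zip(sub_outputs, observed):
        predicted = sum(w * g for w, g in zip(weights, row))
        total += (predicted - obs) ** 2
    return (total / len(observed)) ** 0.5


def ga_optimize(sub_outputs, observed, pop_size=30, generations=100,
                mutation_sd=0.1, seed=0):
    """Search for one weight per sub-expression tree that minimizes RMSE."""
    rng = random.Random(seed)
    n_genes = len(sub_outputs[0])

    def fitness(weights):
        return rmse(weights, sub_outputs, observed)

    # The first individual is the unweighted model (all weights = 1), so
    # elitism guarantees a result no worse than it on the training data.
    population = [[1.0] * n_genes]
    population += [[rng.uniform(-2.0, 2.0) for _ in range(n_genes)]
                   for _ in range(pop_size - 1)]

    for _ in range(generations):
        population.sort(key=fitness)
        next_gen = [population[0][:]]  # elitism: carry over the best
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(population, 3), key=fitness)
            p2 = min(rng.sample(population, 3), key=fitness)
            # Arithmetic crossover followed by Gaussian mutation.
            alpha = rng.random()
            child = [alpha * x + (1.0 - alpha) * y + rng.gauss(0.0, mutation_sd)
                     for x, y in zip(p1, p2)]
            next_gen.append(child)
        population = next_gen
    return min(population, key=fitness)


# Synthetic demonstration (not the Shavir Creek data): three hypothetical
# sub-expression outputs per month and a target that mixes them unevenly.
data_rng = random.Random(1)
sub_outputs = [[data_rng.uniform(0.0, 5.0) for _ in range(3)]
               for _ in range(48)]
observed = [0.5 * g1 + 1.5 * g2 + 0.8 * g3 for g1, g2, g3 in sub_outputs]

baseline_error = rmse([1.0, 1.0, 1.0], sub_outputs, observed)
best_weights = ga_optimize(sub_outputs, observed)
best_error = rmse(best_weights, sub_outputs, observed)
print(baseline_error, best_error, best_weights)
```

Seeding the initial population with the all-ones vector is a design choice of this sketch: combined with elitism, it ensures that the weighted model can only match or improve on the unweighted one for the training data. Performance on unseen data still has to be checked separately.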

Section snippets

Study area and data

The task of intermittent streamflow forecasting in arid and semi-arid regions is more complicated than in moist tropical and subtropical climates. A first-order tributary of Shavir stream, an intermittent stream in the Sefidrood River Basin located in a semi-arid region of northwest Iran, was selected as the case study (Fig. 1). The stream catchment covers an area of approximately 55.5 km², which is about 0.03% of the territory of Ardabil Province, Iran. The stream springs from

Prediction scenario

Fig. 6 shows the ACF and PACF of the streamflow time series for a lag range of 0–60 months. The figure includes the corresponding 95% confidence limits and exhibits a pronounced annual oscillating pattern (almost 12-month periodicity) in the ACF diagram. This means that monthly streamflow at the gauging station is more strongly correlated with the flow in the same month of the previous year than with the flow in the previous month. In addition, the PACF graph shows that the serial correlation is very weak after two years. Therefore, 1-,
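The kind of lag screening described in this snippet can be illustrated with a short Python sketch that computes the sample ACF of a monthly series and flags lags whose correlation lies outside the approximate 95% limits of ±1.96/√N. The series below is synthetic (a clipped seasonal signal with dry months), not the Shavir Creek record, and the function name `autocorrelation` is an assumption of this example. The PACF is not computed here.

```python
import math
import random


def autocorrelation(series, max_lag):
    """Sample autocorrelation coefficients r_0 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    denominator = sum(d * d for d in deviations)
    acf = []
    for lag in range(max_lag + 1):
        numerator = sum(deviations[t] * deviations[t - lag]
                        for t in range(lag, n))
        acf.append(numerator / denominator)
    return acf


# Synthetic intermittent monthly flow: 20 years with a 12-month cycle,
# where negative values are clipped to zero to mimic dry spells.
rng = random.Random(42)
flow = [max(0.0, 10.0 * math.sin(2.0 * math.pi * t / 12.0) + rng.gauss(0.0, 1.0))
        for t in range(240)]

acf = autocorrelation(flow, 24)
bound = 1.96 / math.sqrt(len(flow))  # approximate 95% limit for white noise
significant_lags = [lag for lag in range(1, 25) if abs(acf[lag]) > bound]

print(round(acf[1], 2), round(acf[12], 2))
print(significant_lags)
```

For a series like this one, the coefficient at lag 12 exceeds the coefficient at lag 1, which is the pattern that would suggest adding the flow from the same month of the previous year to the candidate predictors.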

Conclusions

Classic GP and GEP have difficulty creating an appropriate model for intermittent streamflow forecasting. Using more complicated functions or increasing the runtime, the number of expressions, or the depth of genes does not necessarily improve their performance. On the contrary, these measures may lead GP/GEP to over-trained models after only a few generations. This paper proposed a novel hybrid method, GEP-GA, which embeds GA into GEP to enhance GEP performance through creating the new gene weights that meet the GEP

Acknowledgments

The streamflow data used in this research were provided by the Iran Water Resource Management Company (www.wrm.ir). The author would also like to thank the reviewers for their constructive comments on the manuscript.

References (55)

  • M. Shoaib et al.

    Runoff forecasting using hybrid wavelet gene expression programming (WGEP) approach

    J. Hydrol.

    (2015)
  • H.D. Tran et al.

    Selection of significant input variables for time series forecasting

    Environ. Modell. Software

    (2015)
  • W.C. Wang et al.

    A comparison of performance of several artificial intelligence methods for forecasting monthly discharge time series

    J. Hydrol.

    (2009)
  • Z.M. Yaseen et al.

    Novel approach for streamflow forecasting using a hybrid ANFIS-FFA model

    J. Hydrol.

    (2017)
  • H. Aksoy et al.

    A daily intermittent streamflow simulator

    Turk. J. Eng. Environ. Sci.

    (2000)
  • A.M. Al-Juboori et al.

    A stepwise model to predict monthly streamflow

    J. Hydrol.

    (2016)
  • M.J. Alizadeh et al.

    Predicting longitudinal dispersion coefficient using ANN with metaheuristic training algorithms

    Int. J. Environ. Sci. Technol.

    (2017)
  • V. Babovic

    Emergence, evolution, intelligence; hydroinformatics: a study of distributed and decentralised computing using intelligent agents

    IHE Delft Inst. Water Educ.

    (1996)
  • V. Babovic

    Data mining in hydrology

    Hydrol. Process.

    (2005)
  • V. Babovic

    Introducing knowledge into learning based on genetic programming

    J. Hydroinf.

    (2009)
  • V. Babovic et al.

    The evolution of equations from hydraulic data Part I: Theory

    J. Hydraul. Res.

    (1997)
  • V. Babovic et al.

    The evolution of equations from hydraulic data Part II: Applications

    J. Hydraul. Res.

    (1997)
  • V. Babovic et al.

    Genetic programming as a model induction engine

    J. Hydroinf.

    (2000)
  • V. Babovic et al.

    Rainfall runoff modelling based on genetic programming

    Nord. Hydrol.

    (2002)
  • Babović, V., Wu, Z., Larsen, L.C. (1994). Calibrating hydrodynamic models by means of simulated evolution. In...
  • K.W. Chau

    Use of meta-heuristic techniques in rainfall-runoff modelling

    Water

    (2017)
  • H.K. Cigizoglu

    Application of generalized regression neural networks to intermittent flow forecasting and estimation

    J. Hydrol. Eng.

    (2005)