Testing the structure of a hydrological model using Genetic Programming

https://doi.org/10.1016/j.jhydrol.2010.11.009

Summary

Genetic Programming is able to systematically explore many alternative model structures of different complexity from available input and response data. We hypothesised that Genetic Programming can be used to test the structure of hydrological models and to identify dominant processes in hydrological systems. To test this, Genetic Programming was used to analyse a data set from a lysimeter experiment in southeastern Australia. The lysimeter experiment was conducted to quantify the deep percolation response under surface irrigated pasture to different soil types, watertable depths and water ponding times during surface irrigation. Using Genetic Programming, a simple model of deep percolation was recurrently evolved in multiple Genetic Programming runs. This simple and interpretable model supported the dominant process contributing to deep percolation represented in a conceptual model that was published earlier. Thus, this study shows that Genetic Programming can be used to evaluate the structure of hydrological models and to gain insight about the dominant processes in hydrological systems.

Research highlights

► Lysimeter data were analysed using Genetic Programming (GP).
► A simple model was recurrently evolved in multiple GP runs.
► The GP model supported the structure of a hydrological model that was published earlier.

Introduction

Typically, a hydrological model can be formulated as:

$$q_t = f(x_t, \beta) + \varepsilon_t, \qquad t = 1, 2, \ldots, n,$$

where $t$ is a time interval, $q_t$ is the measured response of a hydrological system (such as streamflow or deep percolation below the plant rootzone) predicted by the function $f(\cdot)$, $x_t$ is a vector of inputs such as rainfall and potential evapotranspiration, $\beta$ is a vector of model parameters and $\varepsilon_t$ is an error. In this paper, $f(x_t, \beta)$ is referred to as the model structure representing the hydrological processes contributing to the response $q_t$. The model structure is an important source of uncertainty in hydrological predictions and should therefore be tested as rigorously as possible (Beven, 2001). The problem of identifying a model structure from an observed set of system inputs and responses has received considerable attention in control theory (see e.g. Ljung, 1999 and references therein) and statistics (see e.g. Breiman, 2001, Chatfield, 1995, and the discussions therein). Particularly in statistics, it is often argued that there may be multiple model structures that explain the observed data equally well and, at the same time, are physically plausible. While this is not necessarily a problem if the modelling purpose is prediction (as model predictions can be aggregated over a large set of competing models), it becomes an issue if the purpose of the modelling is system understanding. For most applications of hydrological models, a limited number of model structures are considered to be plausible. Consequently, only a few alternative model formulations are tested using some statistics of the model residuals $\varepsilon$, such as the root mean square error. In addition to these statistics, model residuals should be checked for unexplained structure, such as trends or correlations with model inputs and with variables that were not included in the model, to ensure that all information has been extracted from the available data (Kirchner et al., 1996).
While few alternatives seem to be available for these tests based on model residuals, the structure of hydrological models is often tested unsystematically and with limited rigor. In particular, as the complexity of models increases, the problem of non-uniqueness of model structures increases, i.e. many different model structures have similar error statistics and characteristics (Beven and Freer, 2001). Conversely, for simpler models representing only a limited number of dominant processes, non-uniqueness is typically less problematic. However, as usually only a limited number of model structures are tested, it is difficult to know whether a robust, sufficiently simple model has been found and the dominant processes have been identified.
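
The residual checks described above can be illustrated with a short sketch. It is not taken from the paper: the function names, the synthetic series and the candidate structure are all assumptions chosen for illustration. It computes the root mean square error of a candidate model and the correlation of its residuals with each input series, so that a variable missing from the model structure shows up as unexplained structure in the residuals.

```python
import math
import random


def rmse(observed, simulated):
    """Root mean square error between two equally long series."""
    return math.sqrt(
        sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed)
    )


def pearson(a, b):
    """Pearson correlation coefficient of two equally long series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def residual_diagnostics(observed, simulated, candidates):
    """Return the RMSE and the residual correlation with each candidate series."""
    residuals = [o - s for o, s in zip(observed, simulated)]
    correlations = {name: pearson(residuals, series)
                    for name, series in candidates.items()}
    return rmse(observed, simulated), correlations


# Synthetic data (an assumption, not the lysimeter data): the "true" response
# depends on rainfall and evapotranspiration plus a small random error.
rng = random.Random(0)
rain = [rng.uniform(0.0, 10.0) for _ in range(200)]
pet = [rng.uniform(0.0, 5.0) for _ in range(200)]
q_obs = [0.6 * r - 0.2 * e + rng.gauss(0.0, 0.1) for r, e in zip(rain, pet)]

# Hypothetical candidate structure that omits evapotranspiration.
q_sim = [0.6 * r for r in rain]

error, corr = residual_diagnostics(q_obs, q_sim, {"rain": rain, "pet": pet})
print(f"RMSE = {error:.3f}")
for name, value in corr.items():
    print(f"corr(residuals, {name}) = {value:+.2f}")
```

In this synthetic case the residuals correlate strongly with the omitted series and only weakly with the included one, which is the kind of signal that would prompt a revision of the model structure.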

Genetic Programming (GP) is able to systematically explore many alternative model structures of different complexity from available input and response data. It may help to transform a set of observed input and response data into a conceptual model of the underlying dominant processes. Therefore we hypothesised that GP can be used to identify dominant processes in hydrological systems and to evaluate the structure of hydrological models. To test this, GP was used to analyse a data set from a lysimeter experiment in southeastern Australia. Based on the GP analysis, we evaluated an existing conceptual model of deep percolation that had been previously developed with these experimental data.
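
To make the idea of exploring alternative model structures concrete, the following is a minimal sketch of tree-based GP for symbolic regression. It is not the GP software or configuration used in this study: the function set, population size, tournament size, operator probabilities and the variables `x1`, `x2`, `x3` are all assumptions, and the data are synthetic. Candidate equations are expression trees that are recombined by subtree crossover and subtree mutation, with a cap on equation size.

```python
import operator
import random

FUNCTIONS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]


def random_tree(rng, variables, depth):
    """Grow a tree: a variable name, a constant, or (function, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return rng.choice(variables)
        return round(rng.uniform(-2.0, 2.0), 2)
    return (rng.choice(FUNCTIONS),
            random_tree(rng, variables, depth - 1),
            random_tree(rng, variables, depth - 1))


def evaluate(tree, row):
    """Evaluate a tree for one data row (a dict of variable values)."""
    if isinstance(tree, tuple):
        (func, _symbol), left, right = tree
        return func(evaluate(left, row), evaluate(right, row))
    return row[tree] if isinstance(tree, str) else tree


def size(tree):
    """Number of nodes, used here as the equation size."""
    return 1 + size(tree[1]) + size(tree[2]) if isinstance(tree, tuple) else 1


def to_string(tree):
    if isinstance(tree, tuple):
        return f"({to_string(tree[1])} {tree[0][1]} {to_string(tree[2])})"
    return str(tree)


def mse(tree, rows, targets):
    return sum((evaluate(tree, r) - t) ** 2
               for r, t in zip(rows, targets)) / len(rows)


def paths(tree, path=()):
    """Yield the path (child indices from the root) to every node."""
    yield path
    if isinstance(tree, tuple):
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))


def get(tree, path):
    for index in path:
        tree = tree[index]
    return tree


def replace(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    children = list(tree)
    children[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(children)


def evolve(rows, targets, variables, max_size=9, pop_size=100,
           generations=20, seed=0):
    """Return (best_tree, best_mse) after the given number of generations."""
    rng = random.Random(seed)
    population = []
    while len(population) < pop_size:  # keep only trees within the size cap
        tree = random_tree(rng, variables, 3)
        if size(tree) <= max_size:
            population.append(tree)

    def score(trees):
        return sorted(((mse(t, rows, targets), t) for t in trees),
                      key=lambda pair: pair[0])

    scored = score(population)
    for _ in range(generations):
        next_pop = [t for _, t in scored[:2]]  # elitism: keep the two best
        while len(next_pop) < pop_size:
            parent = min(rng.sample(scored, 3), key=lambda pair: pair[0])[1]
            if rng.random() < 0.8:  # subtree crossover with a second parent
                donor = min(rng.sample(scored, 3), key=lambda pair: pair[0])[1]
            else:                   # subtree mutation with a fresh random tree
                donor = random_tree(rng, variables, 2)
            target_path = rng.choice(list(paths(parent)))
            graft = get(donor, rng.choice(list(paths(donor))))
            child = replace(parent, target_path, graft)
            next_pop.append(child if size(child) <= max_size else parent)
        scored = score(next_pop)
    best_mse, best_tree = scored[0]
    return best_tree, best_mse


# Synthetic example (an assumption): the response is generated as x1 * x2 - x3.
data_rng = random.Random(1)
names = ["x1", "x2", "x3"]
rows = [{n: data_rng.uniform(0.0, 5.0) for n in names} for _ in range(30)]
targets = [r["x1"] * r["x2"] - r["x3"] for r in rows]

best_tree, best_mse = evolve(rows, targets, names)
print(to_string(best_tree), f"MSE = {best_mse:.4f}")
```

Repeating such a run with different seeds and different values of `max_size` gives a rough picture of which inputs and sub-expressions recur across runs, which is the kind of evidence the analysis in this paper relies on.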

Section snippets

Genetic programming

Genetic Programming (GP) is a relatively new automatic programming technique for evolving computer programs to solve, or approximately solve, problems (Koza, 1992). In engineering applications, GP is frequently applied to model structure identification problems. In such applications, GP is used to infer the underlying structure of either a natural or experimental process in order to model the process numerically. A number of applications of GP have been reported in water resources, which

Analysis of lysimeter data using GP

Using the five different maximum equation sizes (i.e., 5, 8, 9, 10 and 15) and multiple GP runs with different initialisations, we found that the final infiltration rate of the subsoil $i_f$, the ponding time $t_o$ and the watertable depth GWD were selected at least once per GP run (Table 2). The amount of water stored in the rootzone between saturation and field capacity DW, the daily average rainfall R and the sum of daily crop evapotranspiration ET between two consecutive irrigations were less

Concluding remarks

In this study, GP was used to analyse a data set from a lysimeter experiment in southeastern Australia. We investigated the recurrence and performance of GP derived models using multiple GP runs and different equation sizes. The GP model DP = $i_f$ $t_o$ a GWD was recurrently evolved in the multiple GP runs up to a maximum equation size of nine. This simple model was readily interpretable. It supported that steady-state percolation during irrigation, as represented in the conceptual model developed by

Acknowledgements

This work was funded by the Department of Primary Industries, the Department of Sustainability and Environment, North Central Catchment Management Authority and the Goulburn Broken Catchment Management Authority. We would like to thank Murray Hannah (Department of Primary Industries), Prof. Chris Perera (Victoria University) and two anonymous reviewers for their helpful comments on the manuscript.

References (23)

  • C. Chatfield

    Model uncertainty, data mining and statistical inference

    Journal of the Royal Statistical Society, Series A (Statistics in Society)

    (1995)
Cited by (23)

    • Intercomparison of downscaling methods for daily precipitation with emphasis on wavelet-based hybrid models

      2021, Journal of Hydrology
      Citation Excerpt :

      In the case of SDSM, all the probable predictors are considered as the model predictors and the model is developed using the SDSM 5.2 toolbox (González-Rojí et al., 2019). For the GP models, a population size of 500 is chosen, and the mutation and crossover rates are kept at high value, following Sivapragasam et al. (2008) and Selle and Muttil (2011). The mean squared error (MSE) is used as the model fitness function.

    • Genetic programming for predictions of effectiveness of rolling dynamic compaction with dynamic cone penetrometer test results

      2019, Journal of Rock Mechanics and Geotechnical Engineering
      Citation Excerpt :

      However, as with GAs, GP performs a multi-directional simultaneous search for an optimal solution from a pool of many potential solutions, collectively known as a ‘population’. The fact that these methods operate from a population enables them to escape local minima in the error surface and is thus able to find optimal or near optimal solutions (Selle and Muttil, 2011). In the traditional GP approach, which is also referred to as tree-based genetic programming (TGP), the computer programs (individuals) have a symbolic representation of a rooted tree-like structure with ordered branches in which the root node and internal nodes are comprised of functions whereas, external nodes (leaves) contain the input values or constants (Koza, 1992).

    • Statistical downscaling of precipitation using machine learning techniques

      2018, Atmospheric Research
      Citation Excerpt :

      In the GP algorithm, initially, a set of equations (e.g. downscaling models) is randomly generated for relating the predictors with the predictand. Then these equations (models) are evolved by performing various genetic operations on them until their fitness reaches a maximum (Whigham and Crapper, 2001) or a predefined fitness threshold or in certain instances up to a predefined number of generations (Selle and Muttil, 2011). For the implementation of GP, training (calibration) and testing (validation) data sets, population size, tree size (depth of a tree), a set of terminals, a set of mathematical functions, a fitness measure, criterion for selecting models for the mating pool, values of genetic operators (e.g. probabilities of crossover, mutation and replication) and termination/stopping criterion should be predefined.

    • A Pareto-optimal moving average multigene genetic programming model for daily streamflow prediction

      2017, Journal of Hydrology
      Citation Excerpt :

      Mimicking Darwinian evolution, this process is iterated until the population contains programs that solve the task well (Searson, 2015). In hydrological applications, GP is commonly used to infer the underlying structure of either natural (e.g., Ghorbani et al., 2010; Danandeh Mehr et al., 2013; Nourani et al., 2013b; Sattar and Gharabaghi, 2015; Meshgi et al., 2015; Ravansalar et al., 2016) or experimental (e.g., Selle and Muttil, 2011; Khan et al., 2012; Uyumaz et al., 2014) processes. In such applications, GP generates some possible programs (solutions) representing the underlying process mathematically.
