Testing the structure of a hydrological model using Genetic Programming
Research highlights
► Lysimeter data were analysed using Genetic Programming (GP).
► A simple model was recurrently evolved in multiple GP runs.
► The GP model supported the structure of a hydrological model that was published earlier.
Introduction
Typically, a hydrological model can be formulated as:

$$q_t = f(x_t, \beta) + \varepsilon_t$$

where $t$ is a time interval, $q_t$ is the measured response of a hydrological system (such as streamflow or deep percolation below the plant rootzone) predicted by the function $f()$, $x_t$ is a vector of inputs such as rainfall and potential evapotranspiration, $\beta$ is a vector of model parameters and $\varepsilon_t$ is an error term. In this paper, $f(x_t, \beta)$ is referred to as the model structure, representing the hydrological processes that contribute to the response $q_t$. The model structure is an important source of uncertainty in hydrological predictions and should therefore be tested as rigorously as possible (Beven, 2001). The problem of identifying a model structure from an observed set of system inputs and responses has received considerable attention in control theory (see e.g. Ljung, 1999, and references therein) and statistics (see e.g. Breiman, 2001, Chatfield, 1995, and the discussions therein). Particularly in statistics, it is often argued that there may be multiple model structures that explain the observed data equally well and are, at the same time, physically plausible. This is not necessarily a problem if the purpose of modelling is prediction, as predictions can be aggregated over a large set of competing models, but it is a problem if the purpose is system understanding. In most applications of hydrological models, only a limited number of model structures are considered plausible. Consequently, only a few alternative model formulations are tested, using statistics of the model residuals $\varepsilon$ such as the root mean square error. In addition to these statistics, model residuals should be checked for unexplained structure, such as trends or correlations with model inputs or with variables not included in the model, to ensure that all information has been extracted from the available data (Kirchner et al., 1996).
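The residual statistics and checks for unexplained structure mentioned here can be sketched in plain Python. This is a minimal illustration, assuming additive residuals $\varepsilon_t = q_t - f(x_t, \beta)$, a Pearson correlation as the check against inputs and a least-squares slope against the time index as the check for trends; the function names, the `Model` type and the toy data are hypothetical and not taken from the study.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical model-structure signature: f(x_t, beta) -> predicted response.
Model = Callable[[Sequence[float], Sequence[float]], float]


def residuals(q_obs: Sequence[float], x: Sequence[Sequence[float]],
              f: Model, beta: Sequence[float]) -> List[float]:
    """epsilon_t = q_t - f(x_t, beta) for every time interval t."""
    return [q_t - f(x_t, beta) for q_t, x_t in zip(q_obs, x)]


def rmse(eps: Sequence[float]) -> float:
    """Root mean square error of the residuals."""
    return math.sqrt(sum(e * e for e in eps) / len(eps))


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def trend_slope(eps: Sequence[float]) -> float:
    """Least-squares slope of the residuals against the time index t (needs >= 2 values)."""
    n = len(eps)
    t_mean, e_mean = (n - 1) / 2, sum(eps) / n
    num = sum((t - t_mean) * (e - e_mean) for t, e in enumerate(eps))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


if __name__ == "__main__":
    # Toy data: x_t = (rainfall, potential evapotranspiration), q_t = response.
    x = [(1.0, 0.5), (2.0, 0.4), (3.0, 0.6), (4.0, 0.5)]
    q = [1.0, 2.1, 2.9, 4.2]
    f: Model = lambda x_t, beta: beta[0] * x_t[0]  # hypothetical one-parameter structure
    eps = residuals(q, x, f, [1.0])
    print("RMSE:", rmse(eps))
    print("corr(residuals, rainfall):", pearson(eps, [x_t[0] for x_t in x]))
    print("corr(residuals, PET):", pearson(eps, [x_t[1] for x_t in x]))
    print("residual trend per time step:", trend_slope(eps))
```

A residual correlation or trend clearly different from zero would hint that the candidate structure has not extracted all the information in the data; with only four toy points, the printed numbers carry no meaning beyond showing the mechanics.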
While few alternatives to these residual-based tests seem to be available, the choice of which model structures to test is often unsystematic and lacks rigor. In particular, as model complexity increases, so does the problem of non-uniqueness, i.e. many different model structures having similar error statistics and characteristics (Beven and Freer, 2001). Conversely, for simpler models representing only a limited number of dominant processes, non-uniqueness is typically less problematic. However, because usually only a limited number of model structures are tested, it is difficult to know whether a robust, sufficiently simple model has been found and whether the dominant processes have been identified.
Genetic Programming (GP) is able to systematically explore many alternative model structures of different complexity from available input and response data. It may help to transform a set of observed input and response data into a conceptual model of the underlying dominant processes. Therefore we hypothesised that GP can be used to identify dominant processes in hydrological systems and to evaluate the structure of hydrological models. To test this, GP was used to analyse a data set from a lysimeter experiment in southeastern Australia. Based on the GP analysis, we evaluated an existing conceptual model of deep percolation that had been previously developed with these experimental data.
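How GP searches model structures of different complexity from input and response data can be illustrated with a minimal tree-based GP (symbolic regression) sketch in plain Python. It assumes the function set {+, -, ×}, random constants in [-2, 2], RMSE as fitness, tournament selection, subtree crossover and mutation, elitism, and a node-count limit `max_size` as a hypothetical stand-in for a "maximum equation size". None of these settings are taken from the study, and this is not the GP software the authors used; all names are illustrative.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple, Union

# A tree is a variable name (str), a constant (float),
# or a tuple (operator, left_subtree, right_subtree).
Tree = Union[str, float, tuple]
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def random_tree(variables: Sequence[str], depth: int, rng: random.Random) -> Tree:
    """Grow a random expression tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        # Terminal: an input variable (70%) or a random constant (30%).
        if rng.random() < 0.7:
            return rng.choice(variables)
        return round(rng.uniform(-2.0, 2.0), 2)
    op = rng.choice(list(OPS))
    return (op, random_tree(variables, depth - 1, rng), random_tree(variables, depth - 1, rng))

def evaluate(tree: Tree, row: Dict[str, float]) -> float:
    if isinstance(tree, str):
        return row[tree]
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, row), evaluate(right, row))

def size(tree: Tree) -> int:
    """Number of nodes, used here as a proxy for equation size."""
    return 1 + size(tree[1]) + size(tree[2]) if isinstance(tree, tuple) else 1

def fitness(tree: Tree, rows: List[Dict[str, float]], targets: List[float]) -> float:
    """RMSE between model output and observed response (lower is better)."""
    sse = 0.0
    for r, q in zip(rows, targets):
        d = evaluate(tree, r) - q
        sse += d * d
    value = math.sqrt(sse / len(rows))
    return value if math.isfinite(value) else math.inf

def paths(tree: Tree, path: Tuple[int, ...] = ()):
    """Yield the index path of every node (subtree) in the tree."""
    yield path
    if isinstance(tree, tuple):
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))

def get(tree: Tree, path: Tuple[int, ...]) -> Tree:
    for i in path:
        tree = tree[i]
    return tree

def put(tree: Tree, path: Tuple[int, ...], new: Tree) -> Tree:
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    items = list(tree)
    items[path[0]] = put(tree[path[0]], path[1:], new)
    return tuple(items)

def evolve(rows, targets, variables, max_size=9, pop_size=200, generations=40, seed=0):
    """Run one GP run; returns (rmse, best_tree)."""
    rng = random.Random(seed)

    def new_individual() -> Tree:
        while True:  # resample until the size limit is respected
            t = random_tree(variables, 3, rng)
            if size(t) <= max_size:
                return t

    def tournament(scored):
        return min(rng.sample(scored, 3), key=lambda s: s[0])[1]

    pop = [new_individual() for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(t, rows, targets), t) for t in pop]
        children = [min(scored, key=lambda s: s[0])[1]]  # elitism: keep the best
        while len(children) < pop_size:
            parent = tournament(scored)
            site = rng.choice(list(paths(parent)))
            if rng.random() < 0.9:  # subtree crossover
                donor = tournament(scored)
                child = put(parent, site, get(donor, rng.choice(list(paths(donor)))))
            else:  # subtree mutation
                child = put(parent, site, random_tree(variables, 2, rng))
            children.append(child if size(child) <= max_size else parent)
        pop = children
    return min(((fitness(t, rows, targets), t) for t in pop), key=lambda s: s[0])

if __name__ == "__main__":
    # Synthetic, hypothetical data whose "true" structure is q = x1 * x2.
    rows = [{"x1": a, "x2": b} for a in (1.0, 2.0, 3.0) for b in (0.5, 1.0, 1.5)]
    targets = [r["x1"] * r["x2"] for r in rows]
    for seed in range(3):  # multiple runs with different initialisations
        err, model = evolve(rows, targets, ["x1", "x2"], seed=seed)
        print(seed, round(err, 4), model)
```

Because the best individual is always copied into the next generation, a run is never worse on the training data than its best random initial model. Repeating runs with different seeds and several `max_size` values is one simple way to check which structures recur.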
Genetic programming
Genetic Programming (GP) is a relatively new automatic programming technique for evolving computer programs to solve, or approximately solve, problems (Koza, 1992). In engineering applications, GP is frequently applied to model structure identification problems. In such applications, GP is used to infer the underlying structure of either a natural or experimental process in order to model the process numerically. A number of applications of GP have been reported in water resources, which
Analysis of lysimeter data using GP
Using the five different maximum equation sizes (i.e., 5, 8, 9, 10 and 15) and multiple GP runs with different initialisations, we found that the final infiltration rate of the subsoil if, the ponding time to and the watertable depth GWD were selected at least once per GP run (Table 2). The amount of water stored in the rootzone between saturation and field capacity DW, the daily average rainfall R and the sum of daily crop evapotranspiration ET between two consecutive irrigations were less
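Selection frequencies of this kind could be tallied with a short script. The sketch below assumes that each GP run yields a list of model expressions as text and that an input counts as "selected" in a run if it appears in at least one of that run's models; the function names, the expression format and the example expressions are hypothetical and are not results from the study. Because `if` is a Python keyword, identifiers are matched with a regular expression rather than parsed as Python.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Set

# Any identifier-like token; numeric constants such as 0.5 are ignored.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def variables_in(expression: str, candidates: Iterable[str]) -> Set[str]:
    """Candidate input variables that occur in one model expression."""
    return set(IDENT.findall(expression)) & set(candidates)


def selection_frequency(runs: List[List[str]], candidates: List[str]) -> Dict[str, float]:
    """Fraction of runs in which each candidate appears in at least one model."""
    counts: Counter = Counter()
    for models in runs:
        selected: Set[str] = set()
        for expression in models:
            selected |= variables_in(expression, candidates)
        counts.update(selected)
    return {v: counts[v] / len(runs) for v in candidates}


if __name__ == "__main__":
    candidates = ["if", "to", "GWD", "DW", "R", "ET"]
    # Placeholder expressions for two runs per maximum equation size;
    # made up for illustration, NOT models reported in the study.
    runs_by_size = {
        5: [["R - ET"], ["DW * R"]],
        9: [["to * GWD + R"], ["if - 0.3 * ET", "DW"]],
    }
    for max_size, runs in sorted(runs_by_size.items()):
        print(max_size, selection_frequency(runs, candidates))
```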
Concluding remarks
In this study, GP was used to analyse a data set from a lysimeter experiment in southeastern Australia. We investigated the recurrence and performance of GP derived models using multiple GP runs and different equation sizes. The GP model DP = if to a GWD was recurrently evolved in the multiple GP runs up to a maximum equation size of nine. This simple model was readily interpretable. It supported that steady-state percolation during irrigation, as represented in the conceptual model developed by
Acknowledgements
This work was funded by the Department of Primary Industries, the Department of Sustainability and Environment, North Central Catchment Management Authority and the Goulburn Broken Catchment Management Authority. We would like to thank Murray Hannah (Department of Primary Industries), Prof. Chris Perera (Victoria University) and two anonymous reviewers for their helpful comments on the manuscript.
References (23)
- et al., Equifinality, data assimilation, and uncertainty estimation in mechanistic modelling of complex environmental systems using the GLUE methodology. Journal of Hydrology (2001).
- et al., Real-time wave forecasting using genetic programming. Ocean Engineering (2008).
- et al., Testing and validating environmental models. Science of the Total Environment (1996).
- et al., Genetic programming for analysis and real-time prediction of coastal algal blooms. Ecological Modelling (2005).
- et al., River flow forecasting through conceptual models part I – A discussion of principles. Journal of Hydrology (1970).
- et al., Modelling rainfall–runoff relationships using genetic programming. Mathematical and Computer Modelling (2001).
- et al., Genetic programming as a model induction engine. Journal of Hydroinformatics (2000).
- et al., Understanding and predicting deep percolation under surface irrigation. Water Resources Research (2008).
- How far can we go in distributed hydrological modelling? Hydrology and Earth System Sciences (2001).
- Statistical modelling: the two cultures. Statistical Science (2001).
- Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society, Series A (Statistics in Society).
Cited by (23)
Intercomparison of downscaling methods for daily precipitation with emphasis on wavelet-based hybrid models
2021, Journal of Hydrology. Citation excerpt: In the case of SDSM, all the probable predictors are considered as the model predictors and the model is developed using the SDSM 5.2 toolbox (González-Rojí et al., 2019). For the GP models, a population size of 500 is chosen, and the mutation and crossover rates are kept at high value, following Sivapragasam et al. (2008) and Selle and Muttil (2011). The mean squared error (MSE) is used as the model fitness function.
Genetic programming for predictions of effectiveness of rolling dynamic compaction with dynamic cone penetrometer test results
2019, Journal of Rock Mechanics and Geotechnical Engineering. Citation excerpt: However, as with GAs, GP performs a multi-directional simultaneous search for an optimal solution from a pool of many potential solutions, collectively known as a ‘population’. The fact that these methods operate from a population enables them to escape local minima in the error surface and is thus able to find optimal or near optimal solutions (Selle and Muttil, 2011). In the traditional GP approach, which is also referred to as tree-based genetic programming (TGP), the computer programs (individuals) have a symbolic representation of a rooted tree-like structure with ordered branches in which the root node and internal nodes are comprised of functions whereas, external nodes (leaves) contain the input values or constants (Koza, 1992).
Statistical downscaling of precipitation using machine learning techniques
2018, Atmospheric Research. Citation excerpt: In the GP algorithm, initially, a set of equations (e.g. downscaling models) is randomly generated for relating the predictors with the predictand. Then these equations (models) are evolved by performing various genetic operations on them until their fitness reaches a maximum (Whigham and Crapper, 2001) or a predefined fitness threshold or in certain instances up to a predefined number of generations (Selle and Muttil, 2011). For the implementation of GP, training (calibration) and testing (validation) data sets, population size, tree size (depth of a tree), a set of terminals, a set of mathematical functions, a fitness measure, criterion for selecting models for the mating pool, values of genetic operators (e.g. probabilities of crossover, mutation and replication) and termination/stopping criterion should be predefined.
A Pareto-optimal moving average multigene genetic programming model for daily streamflow prediction
2017, Journal of Hydrology. Citation excerpt: Mimicking Darwinian evolution, this process is iterated until the population contains programs that solve the task well (Searson, 2015). In hydrological applications, GP is commonly used to infer the underlying structure of either natural (e.g., Ghorbani et al., 2010; Danandeh Mehr et al., 2013; Nourani et al., 2013b; Sattar and Gharabaghi, 2015; Meshgi et al., 2015; Ravansalar et al., 2016) or experimental (e.g., Selle and Muttil, 2011; Khan et al., 2012; Uyumaz et al., 2014) processes. In such applications, GP generates some possible programs (solutions) representing the underlying process mathematically.
Hydrologically informed machine learning for rainfall-runoff modelling: Towards distributed modelling
2021, Hydrology and Earth System Sciences