Original papers
Generalizability of gene expression programming and random forest methodologies in estimating cropland and grassland leaf area index
Introduction
Leaf area index (LAI) is a dimensionless variable defined as the total one-sided area of photosynthetic tissue per unit ground surface area (Watson, 1947). LAI is an important structural characteristic of ecosystems because it influences the exchanges of water, energy, and carbon between the land surface and the atmosphere (Sellers et al., 1988, Wulder et al., 1998, Sonnentag et al., 2007). It determines the size of the plant–atmosphere interface and therefore plays a key role in the energy and mass exchanges between the canopy and the atmosphere (Weiss et al., 2004). Xu et al. (2014) showed that including LAI in their variational data assimilation model improved the simulation of surface water and energy fluxes. Li et al. (2009) indicated that using LAI in the Xinanjiang hydrologic model improved rainfall-runoff modeling. Coopersmith et al. (2014) and Chen et al. (2015) obtained better soil moisture predictions by incorporating LAI in the Integrated Biosphere Simulator (IBIS) and HYDRUS-1D models.
Ground-based methods have been developed to measure LAI accurately (Asner et al., 2003, Jonckheere et al., 2004, Qu et al., 2014). However, because of their high cost and time requirements, these methods can only provide LAI at the point scale and over limited periods. Therefore, various models have been developed to estimate LAI over large spatial scales from remotely sensed data.
Currently, there are three main kinds of methods for retrieving LAI from remotely sensed data: empirical relationships, radiative transfer models, and heuristic data-driven models. Empirical methods link LAI to a remotely sensed vegetation index (e.g., NDVI) or to reflectance data through regression equations (Combal et al., 2003) and are relatively simple and accurate. However, they are sensor dependent and site specific, and their major drawback is the need for local calibration. Radiative transfer models use physical laws to describe explicitly the associations between vegetation properties and canopy spectra, and they produce reasonable LAI estimates at the regional scale (Meroni et al., 2004, He et al., 2013). However, radiative transfer models are usually complicated and time consuming (Jacquemoud et al., 2000). Thus, heuristic data-driven techniques are used to estimate LAI at larger scales (Xiao et al., 2014, Liang et al., 2014). The GLASS-LAI product is derived from reflectance data in the visible and infrared bands using heuristic data-driven techniques (Liang et al., 2014, Xiao et al., 2014).
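As a concrete illustration of the empirical approach, the sketch below computes NDVI from red and near-infrared reflectance and fits a straight-line LAI–NDVI regression by ordinary least squares. The function names and the calibration pairs are hypothetical and chosen only for illustration; the linear form is an assumption, and published empirical relationships may use other functional forms.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical calibration pairs (synthetic, NOT measurements from the study).
ndvi_obs = [0.2, 0.4, 0.6, 0.8]
lai_obs = [0.5, 1.5, 2.5, 3.5]

slope, intercept = fit_line(ndvi_obs, lai_obs)

# Apply the fitted relationship to a new pixel's reflectances.
lai_est = slope * ndvi(nir=0.45, red=0.05) + intercept
```

Because the coefficients come from a single calibration set, a relationship of this kind carries the sensor and site dependence noted above.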
In recent years, heuristic data-driven techniques such as gene expression programming (GEP) and random forest (RF) have been used to model hydrological and eco-hydrological parameters. Genetic programming (GP), a generalization of the genetic algorithm (GA) (Goldberg, 1989), was proposed by Koza (1992). It uses a “parse tree” structure to explore the solution space. Gene expression programming (GEP) is closely related to GP. A GEP chromosome comprises multiple genes, each encoding a smaller subprogram. Moreover, the structural organization of the linear chromosomes allows important genetic operators such as mutation, transposition, and recombination to operate without restriction (Ferreira, 2006). The major advantages of GP (and hence GEP) are that it can be applied to areas where (a) the interrelationships among the relevant variables are poorly understood, (b) finding the ultimate solution is difficult, (c) conventional mathematical analysis cannot provide analytical solutions, (d) an approximate solution is acceptable, (e) small improvements in performance are routinely measured and highly valued, and (f) there is a large amount of data that requires examination, classification, and integration (Banzhaf et al., 1998). One of the major advantages of GEP is that it can generate an explicit equation relating the input(s) and output of the underlying problem. Such an equation can then be interpreted to identify the rules governing the studied process.
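To make the chromosome-to-equation idea concrete, the following minimal sketch decodes a single gene written as a K-expression (the breadth-first linear encoding described by Ferreira) into an expression tree, evaluates it, and prints the explicit equation. The function set, the example gene, and the variable names a, b, and c are hypothetical; this is not the model configuration used in the study.

```python
import operator

# Illustrative function set: symbol -> (arity, implementation).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
}


def decode(gene):
    """Decode a K-expression breadth-first into nested (symbol, children) nodes."""
    nodes = [(sym, []) for sym in gene]
    next_free = 1  # index of the next symbol not yet attached to a parent
    i = 0
    while i < next_free:  # symbols past the coding region are never visited
        sym, children = nodes[i]
        arity = FUNCTIONS[sym][0] if sym in FUNCTIONS else 0
        for _ in range(arity):
            children.append(nodes[next_free])
            next_free += 1
        i += 1
    return nodes[0]


def evaluate(node, env):
    """Evaluate an expression tree given terminal values in env."""
    sym, children = node
    if sym in FUNCTIONS:
        return FUNCTIONS[sym][1](*(evaluate(c, env) for c in children))
    return env[sym]


def to_infix(node):
    """Render the tree as an explicit equation (binary functions only)."""
    sym, children = node
    if not children:
        return sym
    left, right = (to_infix(c) for c in children)
    return f"({left} {sym} {right})"


# Hypothetical gene: the trailing symbols "aab" are non-coding here.
tree = decode("+a*bcaab")
equation = to_infix(tree)
value = evaluate(tree, {"a": 1, "b": 2, "c": 3})
```

Because the gene has a fixed length while the decoded tree uses only as many symbols as the function arities require, point mutations anywhere in the string still decode to a tree, provided the gene ends in enough terminals; this is the property that lets the genetic operators act freely on the linear chromosome.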
A number of studies (e.g., Walthall et al., 2004, Dunea and Moise, 2008, Xiao et al., 2014) applied data-driven neural network models to obtain LAI from remotely sensed data and to fill gaps in the recorded data. Everingham et al. (2009) applied heuristic techniques to forecast regional sugarcane crop production. Torres et al. (2011) applied a support vector machine to estimate daily potential evapotranspiration with limited climatic data. Shiri et al. (2014a) used heuristic techniques to model dew point temperature. Shiri et al. (2014b) showed the generalizability of GEP in modeling daily evapotranspiration at local and regional scales. Karimi et al. (2017) used GEP to simulate daily evapotranspiration through a cross-station approach.
Many heuristic-based applications consider only a single data set assignment, in which models are developed and validated using data from the same site. Besides providing an incomplete evaluation of local performance, this type of data set assignment has another important drawback: the generalization ability of the resulting models is not evaluated outside the locations used to train them (Marti et al., 2013, Shiri et al., 2014b).
The present study aimed to assess the performance of the GEP and RF techniques at local and external (cross-station) scales for simulating LAI, using available meteorological and NDVI data. By relying only on meteorological data, LAI can be estimated for periods in the distant past and in the future for which no remotely sensed data exist. To the best of the authors’ knowledge, this is the first assessment of heuristic methods for estimating LAI at local and cross-station scales. The robust k-fold testing cross-validation technique was used to assess the applied methodologies at both scales.
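The two evaluation scales can be sketched as two ways of generating train/test splits. The snippet below is one plausible reading, not the authors' exact procedure: the local case holds out contiguous blocks of a single site's time-ordered samples, and the external case holds out one whole site at a time. The function names and site labels are hypothetical, and model training is deliberately omitted.

```python
def local_folds(n_samples, k):
    """Local (temporal) k-fold: hold out k contiguous blocks of one site's
    time-ordered sample indices, training on the remaining samples."""
    bounds = [i * n_samples // k for i in range(k + 1)]
    for start, stop in zip(bounds, bounds[1:]):
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test


def external_folds(sites):
    """External (spatial) testing: hold out each site in turn and
    train on all remaining sites."""
    for held_out in sites:
        train = [s for s in sites if s != held_out]
        yield train, held_out


# Hypothetical usage: 10 samples at one site, and three placeholder site labels.
local_splits = list(local_folds(10, 3))
external_splits = list(external_folds(["site_A", "site_B", "site_C"]))
```

In the external case every sample from the held-out site is unseen during training, so the resulting error reflects how well a model transfers to a new location rather than how well it fits local patterns.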
Data
The GEP and RF-based models were trained and tested extensively over ten experimental sites (five cropland and five grassland sites). The meteorological data for the ten sites were obtained from the Fluxnet website (http://www.fluxnet.ornl.gov/). The site locations and the temporal coverage of the data are summarized in Table 1. Half-hourly or hourly micrometeorological data such as wind speed, air temperature and humidity, atmospheric pressure, solar radiation, and incoming longwave
Global assessment of the models
Table 4 summarizes the global statistical indicators of the applied models for both land cover types. As expected, the local GEP and RF models (LGEP and LRF, respectively) gave the most accurate simulations because they were trained and tested using the meteorological patterns of the same locations. This greatly limits the applicability of these locally trained models, as they cannot be applied using data outside the trained
Conclusions
The present study reports a new application of heuristic data-driven models for simulating cropland and grassland LAI using meteorological variables. Meteorological and remotely sensed data from these land cover types were used to estimate LAI through gene expression programming (GEP) and random forest (RF) techniques. A robust cross-validation approach, k-fold testing, was adopted for both local (temporal) and external (spatial) assessment of the applied models.
References (51)
- et al. (2015). Investigating the impact of leaf area index temporal variability on soil moisture predictions using remote sensing vegetation data. J. Hydrol.
- et al. (2003). Retrieval of canopy biophysical variables from bidirectional reflectance—using prior information to solve the ill-posed inverse problem. Remote Sens. Environ.
- et al. (2014). Field-scale moisture estimates using COSMOS sensors: a validation study with temporary networks and leaf-area-indices. J. Hydrol.
- et al. (2009). Ensemble data mining approaches to forecast regional sugarcane crop production. Agric. For. Meteorol.
- et al. (2013). Retrieval of leaf area index in alpine wetlands using a two-layer canopy reflectance model. Int. J. Appl. Earth Observ. Geoinform.
- et al. (2002). Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sens. Environ.
- et al. (2000). Comparison of four radiative transfer models to simulate plant canopies reflectance: direct and inverse mode. Remote Sens. Environ.
- et al. (2004). Review of methods for in situ leaf area index determination Part I. Theories, sensors and hemispherical photography. Agric. For. Meteorol.
- et al. (2009). Predicting runoff in ungauged catchments by using Xinanjiang model with MODIS leaf area index. J. Hydrol.
- et al. (2013). Artificial neural networks vs. gene expressions programming for estimating outlet dissolved oxygen in micro irrigation sand filters fed with effluents. Comput. Electron. Agric.
- Modeling reference evapotranspiration with calculated targets: assessment and implications. Agric. Water Manage.
- Inversion of a radiative transfer model with hyperspectral observations for LAI mapping in poplar plantations. Remote Sens. Environ.
- Global products of vegetation leaf area and fraction absorbed PAR from year one of MODIS data. Remote Sens. Environ.
- LAINet – a wireless sensor network for coniferous forest leaf area index measurement: design, algorithm and validation. Comput. Electron. Agric.
- Modeling energy dissipation over stepped spillways using machine learning approaches. J. Hydrol.
- Evaluation of genetic programming-based models for simulating friction factor in alluvial channels. J. Hydrol.
- Evaluation of FAO56-PM, empirical, semi-empirical and gene expression programming approaches for estimating daily reference evapotranspiration in hyper-arid regions of Iran. Agric. Water Manage.
- Generalizability of gene expression programming-based approaches for estimating daily reference evapotranspiration in coastal stations of Iran. J. Hydrol.
- Independent testing for assessing the calibration of the Hargreaves-Samani equation: new heuristic alternatives for Iran. Comput. Electron. Agric.
- Using direct and indirect measurements of leaf area index to characterize the shrub canopy in an ombrotrophic peatland. Agric. For. Meteorol.
- Forecasting daily potential evapotranspiration using machine learning and limited climatic data. Agric. Water Manage.
- A comparison of empirical and neural network approaches for estimating corn and soybean leaf area index from Landsat ETM+ imagery. Remote Sens. Environ.
- Review of methods for in situ leaf area index (LAI) determination. Part II. Estimation of LAI, errors and sampling. Agric. For. Meteorol.
- Aerial image texture information in the estimation of northern deciduous and mixed wood forest leaf area index (LAI). Remote Sens. Environ.
- Global synthesis of leaf area index observation: implications for ecological and remote sensing studies. Glob. Ecol. Biogeogr.