Original papers
Generalizability of gene expression programming and random forest methodologies in estimating cropland and grassland leaf area index

https://doi.org/10.1016/j.compag.2017.12.007

Highlights

  • Leaf area index (LAI) was simulated using GEP and RF models.

  • Cropland and grassland data were used for developing and validating the models.

  • Local and external k-fold testing were used for assessing the models.

  • GEP and RF gave promising results in both scenarios.

Abstract

Leaf area index (LAI) is an important structural attribute of ecosystems that affects the energy, water, and carbon exchanges between the land surface and the atmosphere. Direct measurement of LAI is costly and time consuming, so indirect approaches have been developed for estimating it. The present paper aimed at modeling LAI in cropland and grassland sites from the available meteorological data through two heuristic data-driven techniques, namely gene expression programming (GEP) and random forest (RF). Different data set organizations were designed using local (temporal) and external (spatial) norms to provide a thorough data scanning strategy. The results showed that the external GEP and RF models (EGEP and ERF) might be suitable approaches for modeling LAI, with average scatter index (SI) values of 0.275 and 0.270 (cropland) and 0.273 and 0.279 (grassland), compared with average SI values of 0.207 and 0.204 (cropland) and 0.249 and 0.204 (grassland) for the local GEP and RF models, respectively. The presented methodology allowed the evaluation, at each site, of models developed (trained) using local patterns and of models developed using exogenous data (patterns from ancillary sites).
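The scatter index reported above is not defined in this excerpt. A minimal sketch, assuming the common definition (root mean square error divided by the mean of the observed values), could look as follows. The function name and the sample LAI values are hypothetical and are not taken from the paper.

```python
import math


def scatter_index(observed, simulated):
    """Scatter index, assumed here to be RMSE divided by the mean observation."""
    if not observed or len(observed) != len(simulated):
        raise ValueError("observed and simulated must be non-empty and equally long")
    n = len(observed)
    # Root mean square error between simulated and observed values.
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)
    # Normalizing by the mean observation makes the error dimensionless.
    return rmse / (sum(observed) / n)


# Hypothetical LAI values, for illustration only.
observed_lai = [1.0, 2.0, 3.0, 4.0]
simulated_lai = [1.5, 1.5, 3.5, 3.5]
si = scatter_index(observed_lai, simulated_lai)
```

Under this definition a lower SI means a closer fit, which is consistent with the local models (trained and tested on the same site) showing smaller values than the external ones.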

Introduction

Leaf area index (LAI) is a dimensionless variable defined as the total one-sided area of photosynthetic tissue per unit ground surface area (Watson, 1947). LAI is an important structural characteristic of ecosystems because it influences the exchanges of water, energy, and carbon between the land surface and the atmosphere (Sellers et al., 1988, Wulder et al., 1998, Sonnentag et al., 2007). It determines the size of the plant-atmosphere interface and therefore plays a key role in the energy and mass exchanges between the canopy and the atmosphere (Weiss et al., 2004). Xu et al. (2014) showed that including LAI in their variational data assimilation model improved the simulation of surface water and energy fluxes. Li et al. (2009) indicated that using LAI in the Xinanjiang hydrologic model improved rainfall-runoff modeling. Coopersmith et al. (2014) and Chen et al. (2015) obtained better soil moisture predictions by incorporating LAI in the Integrated Biosphere Simulator (IBIS) and HYDRUS-1D models.

Ground-based measurement methods have been developed to measure LAI accurately (Asner et al., 2003, Jonckheere et al., 2004, Qu et al., 2014). However, these methods can only obtain LAI at the point scale and during limited time periods because they are costly and time consuming. Therefore, different models have been developed to acquire LAI over large spatial scales based on remotely sensed data.

Currently, there are three main kinds of methods for retrieving LAI from remotely sensed data: empirical relationships, radiative transfer models, and heuristic data-driven models. The empirical methods link LAI to a remotely sensed vegetation index (e.g., NDVI) or to reflectance data through regression equations (Combal et al., 2003), and are relatively simple and accurate. However, these methods are sensor dependent and site specific, and their major drawback is the need for local calibration. Radiative transfer models use physical laws to describe explicitly the associations between vegetation properties and canopy spectra, and produce reasonable LAI at the regional scale (Meroni et al., 2004, He et al., 2013). However, radiative transfer models are usually complicated and time consuming (Jacquemoud et al., 2000). Thus, heuristic data-driven techniques are used to estimate LAI at larger scales (Xiao et al., 2014, Liang et al., 2014). The GLASS-LAI product, for example, is produced from reflectance data in the visible and infrared bands using heuristic data-driven techniques (Liang et al., 2014, Xiao et al., 2014).

In recent years, heuristic data-driven techniques (e.g., gene expression programming (GEP) and random forest (RF)) have been utilized for modeling hydrological and eco-hydrological parameters. Genetic programming (GP), a generalization of the genetic algorithm (GA) (Goldberg, 1989), was proposed by Koza (1992). It uses a "parse tree" structure for exploring the solutions. Gene expression programming (GEP) is closely related to GP. The chromosomes in GEP are composed of multiple genes, each gene encoding a smaller subprogram. Moreover, the structural organization of the linear chromosomes allows important genetic operators such as mutation, transposition, and recombination to act without restriction (Ferreira, 2006). The major advantages of GP (and hence GEP) are that it can be applied to areas where (a) the interrelationships among the pertinent factors are poorly understood, (b) finding the conclusive solution is difficult, (c) conventional mathematical analysis cannot supply analytical solutions, (d) an approximate solution is acceptable, (e) small improvements in performance are routinely measured and highly valued, and (f) there is a large amount of data that requires evaluation, classification, and integration (Banzhaf et al., 1998). Another major advantage of GEP is that it can generate an explicit equation relating the input(s) to the output of the underlying problem. Such an equation can then be interpreted to find the governing rules of the studied process.
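To illustrate how a linear GEP gene can yield an explicit equation, the sketch below decodes a gene breadth-first into an expression tree, in the spirit of the Karva notation described by Ferreira, and then prints and evaluates it. The function set, the gene string, and the variable names `a`, `b`, `c` are hypothetical; this is not the configuration used by the authors.

```python
import operator

# Hypothetical function set: symbol -> (arity, implementation).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
}


def decode(gene):
    """Read the gene left to right and fill the tree level by level."""
    root = [gene[0], []]
    queue = [root]
    pos = 1
    while queue:
        node = queue.pop(0)
        arity = FUNCTIONS[node[0]][0] if node[0] in FUNCTIONS else 0
        for _ in range(arity):
            child = [gene[pos], []]
            pos += 1
            node[1].append(child)
            queue.append(child)
    return root  # symbols after position `pos` are unused (non-coding)


def to_infix(node):
    """Render the tree as an explicit, human-readable equation."""
    symbol, children = node
    if not children:
        return symbol
    return "(" + to_infix(children[0]) + " " + symbol + " " + to_infix(children[1]) + ")"


def evaluate(node, values):
    """Evaluate the tree for given terminal values."""
    symbol, children = node
    if not children:
        return values[symbol]
    left, right = (evaluate(child, values) for child in children)
    return FUNCTIONS[symbol][1](left, right)


gene = "*+abcba"  # hypothetical gene; the trailing "ba" is not expressed
tree = decode(gene)
equation = to_infix(tree)
result = evaluate(tree, {"a": 2.0, "b": 3.0, "c": 4.0})
```

Because the gene has a fixed length while only part of it is expressed, operators such as mutation can change any symbol and still leave a decodable gene, provided the tail holds only terminals.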

A number of studies (e.g., Walthall et al., 2004, Dunea and Moise, 2008, Xiao et al., 2014) applied data-driven neural network models to obtain LAI from remotely sensed data and to fill the gaps in the recorded data. Everingham et al. (2009) applied heuristic techniques to forecast regional sugarcane crop production. Torres et al. (2011) applied a support vector machine to estimate daily potential evapotranspiration with limited climatic data. Shiri et al. (2014a) used heuristic techniques to model dew point temperature. Shiri et al. (2014b) showed the generalizability of GEP in modeling daily evapotranspiration at local and regional scales. Karimi et al. (2017) used GEP for simulating daily evapotranspiration through a cross-station approach.

Commonly, heuristic-based applications consider only a single data set assignment, in which models are developed and validated using data from the same site. Apart from not providing a thorough performance evaluation of the local patterns, another important drawback of this type of data set assignment is that the generalization ability of the resulting models is not evaluated outside the locations used to train them (Marti et al., 2013, Shiri et al., 2014b).

The present study aimed at assessing the performance of the GEP and RF techniques at local and external (cross-station) scales for simulating LAI, using available meteorological and NDVI data. By relying only on meteorological data, LAI can be estimated for the distant past and for the future, when no remotely sensed data exist. To the best of the authors' knowledge, this is the first assessment of heuristic methods for estimating LAI at local and cross-station scales. The robust k-fold testing cross-validation technique was used for assessing the applied methodologies at both local and external scales.
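The two assessment scales can be sketched as two ways of splitting the data. The example below assumes that local (temporal) testing holds out contiguous blocks of one site's record and that external (spatial) testing holds out one whole site at a time; the excerpt does not state the exact fold design, so both choices, along with the site names and record layout, are assumptions for illustration.

```python
from typing import Dict, Iterator, List, Tuple

Record = Tuple[str, int]  # (site id, time index): stand-in for one data pattern


def local_folds(records: List[Record], k: int) -> Iterator[Tuple[List[Record], List[Record]]]:
    """Temporal k-fold within one site: each contiguous block is held out once."""
    n = len(records)
    bounds = [i * n // k for i in range(k + 1)]
    for i in range(k):
        test = records[bounds[i]:bounds[i + 1]]
        train = records[:bounds[i]] + records[bounds[i + 1]:]
        yield train, test


def external_folds(by_site: Dict[str, List[Record]]) -> Iterator[Tuple[str, List[Record], List[Record]]]:
    """Spatial folds: each site is tested with data from the other sites only."""
    for held_out in by_site:
        train = [r for site, recs in by_site.items() if site != held_out for r in recs]
        yield held_out, train, by_site[held_out]


# Three hypothetical sites with six time steps each.
sites = {name: [(name, t) for t in range(6)] for name in ("A", "B", "C")}
local_splits = list(local_folds(sites["A"], k=3))
external_splits = list(external_folds(sites))
```

In the external case no pattern from the tested site reaches the training set, which is what allows the generalization ability of a model to be judged outside its training locations.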

Data

The GEP- and RF-based models were trained and tested extensively over ten experimental sites (five cropland and five grassland sites). The meteorological data of the ten experimental sites were obtained via the Fluxnet website (http://www.fluxnet.ornl.gov/). The site locations and the temporal coverage of the data are summarized in Table 1. Half-hourly or hourly micrometeorological data such as wind speed, air temperature and humidity, atmospheric pressure, solar radiation, and incoming longwave

Global assessment of the models

Table 4 summarizes the global statistical indicators of the applied models for both land cover types. As expected, the local GEP and RF models (LGEP and LRF, respectively) gave the most accurate simulations because they relied on local patterns, that is, they were trained and tested using the meteorological patterns of the same locations. This greatly limits the applicability of these locally trained models, so that they could not be applied using data outside the trained

Conclusions

The present study reports a new application of heuristic data-driven models for simulating cropland and grassland LAI using meteorological variables. Meteorological and remotely sensed data from these land cover types were used to estimate LAI through gene expression programming (GEP) and random forest (RF) techniques. A robust cross-validation approach, k-fold testing, was adopted for both local (temporal) and external (spatial) assessment of the applied models.

References (51)

  • P. Marti et al. Modeling reference evapotranspiration with calculated targets: assessment and implications. Agric. Water Manage. (2015)

  • M. Meroni et al. Inversion of a radiative transfer model with hyperspectral observations for LAI mapping in poplar plantations. Remote Sens. Environ. (2004)

  • R. Myneni. Global products of vegetation leaf area and fraction absorbed PAR from year one of MODIS data. Remote Sens. Environ. (2002)

  • Y. Qu et al. LAINet – a wireless sensor network for coniferous forest leaf area index measurement: design, algorithm and validation. Comput. Electron. Agric. (2014)

  • K. Roushangar et al. Modeling energy dissipation over stepped spillways using machine learning approaches. J. Hydrol. (2014)

  • K. Roushangar et al. Evaluation of genetic programming-based models for simulating friction factor in alluvial channels. J. Hydrol. (2014)

  • J. Shiri. Evaluation of FAO56-PM, empirical, semi-empirical and gene expression programming approaches for estimating daily reference evapotranspiration in hyper-arid regions of Iran. Agric. Water Manage. (2017)

  • J. Shiri et al. Generalizability of gene expression programming-based approaches for estimating daily reference evapotranspiration in coastal stations of Iran. J. Hydrol. (2014)

  • J. Shiri et al. Independent testing for assessing the calibration of the Hargreaves-Samani equation: new heuristic alternatives for Iran. Comput. Electron. Agric. (2015)

  • O. Sonnentag et al. Using direct and indirect measurements of leaf area index to characterize the shrub canopy in an ombrotrophic peatland. Agric. For. Meteorol. (2007)

  • A.F. Torres et al. Forecasting daily potential evapotranspiration using machine learning and limited climatic data. Agric. Water Manage. (2011)

  • C. Walthall et al. A comparison of empirical and neural network approaches for estimating corn and soybean leaf area index from Landsat ETM+ imagery. Remote Sens. Environ. (2004)

  • M. Weiss et al. Review of methods for in situ leaf area index (LAI) determination. Part II. Estimation of LAI, errors and sampling. Agric. For. Meteorol. (2004)

  • M.A. Wulder et al. Aerial image texture information in the estimation of northern deciduous and mixed wood forest leaf area index (LAI). Remote Sens. Environ. (1998)

  • G.P. Asner et al. Global synthesis of leaf area index observation: implications for ecological and remote sensing studies. Glob. Ecol. Biogeogr. (2003)