Cotton yield prediction with Markov Chain Monte Carlo-based simulation model integrated with genetic programing algorithm: A new hybrid copula-driven approach
Introduction
Timely information on crop yield is important for agriculture-dependent nations such as Pakistan, because it informs agricultural policy making, forward planning and agricultural markets. Agriculture contributes about 21% of Pakistan's GDP (Sarwar, 2014), and cotton is an important cash crop within the sector. The nation is highly dependent on the cotton industry and its related textile sector, and the cotton crop has therefore been given a principal status in the country. Cotton is grown from May to August as an industrial crop on 15% of the nation's available land area, producing 15 million bales during 2014-15 (Reporter, 2015). Pakistan ranks fourth among cotton growers and is the third largest exporter and fourth largest consumer (Banuri, 1998). In 2013, about 1.6 million farmers (out of a total of 5 million in all sectors) were engaged in cotton farming, cultivating more than 3 million hectares (Banuri, 1998; Reporter, 2015).
Data-intelligent models that utilize past data can offer accurate solutions to problems related to the projection of future trends in agriculture, crop yield, rainfall and drought, all of which affect agricultural productivity (Ali et al., 2018a, b; Bauer, 1975; Nguyen-Huy et al., 2017, 2018). Machine learning models, which are highly non-linear, utilize data whose input features are relevant to the prediction of crop yield. Kern et al. (2018) constructed multiple linear regression models to simulate the yield of the four major crop types in Hungary using environmental and remote sensing information. Bokusheva et al. (2016) developed copula models for crop yields based on vegetation health (VH) indices, and Craparo et al. (2015) built an ARIMA model to forecast the decline of coffee yield in Tanzania. Debnath et al. (2013) predicted cotton area and yield in India using an ARIMA model. Blanc et al. (2008) utilized a multiple regression model of the main climatic determinants of rain-fed cotton yield in West Africa. Yang et al. (2014) assessed cotton yield and water demand under climate change and future adaptation measures using the APSIM-OzCot model. Chen et al. (2011) studied the impact of climate change on cotton production and water consumption in China using the COSIM model. Hearn (1994) designed a simulation model named OZCOT for cotton crop management in Australia. Papageorgiou et al. (2011) predicted cotton yield in Greece using fuzzy cognitive maps. Jin and Xu (2012) estimated cotton yield in China using the Carnegie Ames Stanford Approach model. The aforementioned models were developed to study the impacts of climate change on cotton yield.
In summary, the existing literature shows that few studies in Pakistan have developed methods for the prediction of cotton yield, despite the country's relevance as a world leader in cotton production. Ali et al. (2015) used an ARIMA model to forecast the production of sugarcane and cotton crops of Pakistan from 2013–2030. Hina Ali et al. (2013) also analyzed production forecasting of cotton in Pakistan. Ahmad et al. (2017) developed an ARIMA model to forecast the area, production and yield of major crops in Pakistan. Raza and Ahmad (2015) studied the impact of climate change on cotton productivity in Punjab and Sindh, Pakistan, using fixed effect models. Ayaz et al. (2015) studied the effect of weather on the cotton crop in Sindh, Pakistan. Carpio and Ramirez (2002) used yield and acreage models to forecast cotton yield in India, Pakistan and Australia. Ahmad (1975) designed a time series prediction for the supply response of cotton in Punjab, Pakistan.
All the previous studies indicate that the prediction of cotton yield has been based primarily on the effect of climate change, with the adoption of the ARIMA model only. In addition, all these studies have been conducted for a large area, either a whole province or a national region, but not for a small locality. Moreover, advanced data-intelligent algorithms have seen limited application to more accurate prediction at a micro scale, where they could support decision-making in precision agriculture and farming systems and may shape the way future farming trends are analyzed. To address these issues, there is an apparent need for data-intelligent models that predict cotton yield more accurately and at a much finer scale than attempted previously. In this study, for the first time, a hybrid genetic programming integrated with a Markov Chain Monte Carlo (GP-MCMC) based copula model has been developed for the prediction of cotton yield in Faisalabad, Multan and Nawabshah in Pakistan. The novelty of this study lies in utilizing the as yet untested GP-MCMC based copula models for the prediction of cotton yield in Pakistan.
To advance the application of copula models, especially in agriculture where they have been relatively scarcely applied, the present study aims to address four primary objectives. (1) To apply GP-MCMC based copula models, MCMC based copula models and a standalone GP model in order to determine which of these models is the most accurate data-intelligent tool for predicting cotton yield in the developing nation of Pakistan. (2) To model the influence of the climate dataset (i.e., temperature, rainfall and humidity) in order to predict the cotton yield effectively in the selected districts of Punjab and Sindh, the primary agricultural hubs in Pakistan. (3) To develop and optimize the copula-based models by tuning the GP and the MCMC techniques, and to evaluate their performance in comparison with the MCMC based copula and standalone GP models. (4) To validate the predictive ability of each model with respect to cotton yield in Pakistan, making a major contribution to the use of data-driven models for agricultural yield estimation.
Theoretical framework
This section presents an overview of the proposed predictive GP-MCMC based copula models together with their comparative counterparts, the MCMC based copula models and the standalone GP model.
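To make the idea of an MCMC based copula more concrete, the following minimal Python sketch estimates the dependence parameter of a bivariate copula with a random-walk Metropolis sampler. It is an illustration under stated assumptions, not the procedure used in this study: the Clayton family, the flat prior on (0, theta_max], the proposal step size and the synthetic data are all hypothetical choices, and the function names are introduced here for illustration only.

```python
import math
import random


def clayton_log_density(u, v, theta):
    """Log-density of the bivariate Clayton copula, valid for theta > 0."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return (math.log1p(theta)
            - (1.0 + theta) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / theta) * math.log(s))


def log_likelihood(pairs, theta):
    """Sum of copula log-densities over all (u, v) pseudo-observations."""
    return sum(clayton_log_density(u, v, theta) for u, v in pairs)


def sample_clayton(n, theta, rng):
    """Draw n synthetic pairs from a Clayton copula by conditional inversion."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        w = 1.0 - rng.random()
        v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def metropolis_theta(pairs, n_iter=2000, step=0.3, theta_max=20.0, seed=0):
    """Random-walk Metropolis chain for theta under a flat prior on (0, theta_max]."""
    rng = random.Random(seed)
    theta = 1.0  # assumed starting value
    current = log_likelihood(pairs, theta)
    chain = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        # Proposals outside the prior support are rejected outright.
        if 0.0 < proposal <= theta_max:
            candidate = log_likelihood(pairs, proposal)
            # Symmetric proposal and flat prior: accept on the likelihood ratio.
            if math.log(1.0 - rng.random()) < candidate - current:
                theta, current = proposal, candidate
        chain.append(theta)
    return chain


# Synthetic demonstration: data generated with theta = 2 (an assumed value).
data = sample_clayton(300, theta=2.0, rng=random.Random(42))
chain = metropolis_theta(data)
kept = chain[500:]  # discard burn-in
print("posterior mean of theta:", sum(kept) / len(kept))
```

In practice the pairs (u, v) would be pseudo-observations obtained by transforming the predictor and the yield series to the unit interval, and several copula families would typically be compared rather than fixing one in advance.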
Materials and method
This section describes the acquired climate and cotton yield data, the study regions, the design of the predictive models and the performance criteria.
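As a hedged illustration of what performance criteria for a yield prediction model can look like, the sketch below computes four measures that are widely used for assessing forecasts: root mean square error, mean absolute error, the Nash-Sutcliffe efficiency and Willmott's index of agreement. The exact criteria adopted in this study are not reproduced in this excerpt, so the selection here is an assumption, and the function names and sample values are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nash_sutcliffe(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def willmott_index(obs, pred):
    """Willmott's index of agreement, bounded between 0 and 1."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den


# Hypothetical observed and predicted yields, for demonstration only.
observed = [2.0, 3.0, 4.0, 5.0]
predicted = [2.5, 2.5, 4.5, 4.5]
print("RMSE:", rmse(observed, predicted))
print("MAE:", mae(observed, predicted))
print("NSE:", nash_sutcliffe(observed, predicted))
print("Willmott d:", willmott_index(observed, predicted))
```

Error measures such as RMSE and MAE are expressed in the units of the yield, whereas the two efficiency indices are dimensionless, which makes them easier to compare across study sites.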
Results and discussion
The results of the GP-MCMC based copula model have been compared against MCMC based copula models and a standalone GP model based on the evaluation criteria described above (Eqs. (10)–(21)).
Fig. 6(a–c) demonstrates the joint dependence structure between GP based forecasted cotton yield and observed cotton yield anomalies using MCMC-copula models for the 33-year seasonal dataset. The asymmetric and skewed dependence structure of the
Conclusion
This paper has developed a suite of GP-MCMC based copula models using climate data (temperature, rainfall, humidity) as predictor variables and cotton yield data as an objective variable to predict cotton yield for different geographical sites in Pakistan. To attain an accurate GP-MCMC-copula model, the MCMC algorithm adopted a global optimization technique to find the best copula parameters. Evidently, the performance of the GP-MCMC based copula was found to be much better than the MCMC based
Acknowledgements
This research utilized cotton yield data acquired from the Pakistan Bureau of Statistics, Government of Pakistan, Islamabad, Pakistan, and climate data acquired from the Pakistan Meteorological Department, Pakistan, both of which are duly acknowledged. This study was supported by the University of Southern Queensland’s Office of Graduate Studies Postgraduate Research Scholarship (2017–2019). We thank all reviewers and the journal Editor for their useful comments that have improved the clarity of the
References (87)
- et al.
An ensemble-ANFIS based uncertainty assessment model for forecasting multi-scalar standardized precipitation index
Atmos. Res.
(2018)
- et al.
Multi-stage hybridized online sequential extreme learning machine integrated with Markov Chain Monte Carlo copula-Bat algorithm for rainfall forecasting
Atmos. Res.
(2018)
The role of remote sensing in determining the distribution and yield of crops
Adv. Agron.
(1975)
- et al.
The climatic determinants of cotton yields: evidence from a plot in West Africa
Agric. For. Meteorol.
(2008)
- et al.
Satellite-based vegetation health indices as a criteria for insuring against drought-related yield losses
Agric. For. Meteorol.
(2016)
- et al.
Coffea arabica yields decline in Tanzania due to climate change: Global implications
Agric. For. Meteorol.
(2015)
- et al.
HydroTest: a web-based toolbox of evaluation metrics for the standardised assessment of hydrological forecasts
Environ. Model. Softw.
(2007)
- et al.
Comparison of some existing models for estimating global solar radiation for Antalya (Turkey)
Energy Convers. Manag.
(2000)
- et al.
Extreme learning machine: theory and applications
Neurocomputing
(2006)
Statistical modelling of crop yield in Central Europe using climate data and remote sensing vegetation indices
Agric. For. Meteorol.
(2018)
A new hybrid support vector machine–wavelet transform approach for estimation of horizontal global solar radiation
Energy Convers. Manag.
River flow forecasting through conceptual models part I—a discussion of principles
J. Hydrol. (Amst)
Copula-statistical precipitation forecasting model in Australia’s agro-ecological zones
Agric. Water Manag.
Modeling the joint influence of multiple synoptic-scale, climate mode indices on Australian wheat yield using a vine copula-based approach
Eur. J. Agron.
Fuzzy cognitive map based approach for predicting yield in cotton crop production as a basis for decision support system in precision agriculture application
Appl. Softw. Comput.
The challenges–and some solutions–to process-based modelling of grazed agricultural systems
Environ. Model. Softw.
Prediction of cotton yield and water demand under climate change and future adaptation measures
Agric. Water Manag.
Wind power prediction using hybrid autoregressive fractionally integrated moving average and least square support vector machine
Energy
Quantum-behaved particle swarm optimization algorithm for economic load dispatch of power system
Expert Syst. Appl.
Supply response of cotton in Punjab: a time series analysis
Pak. Cottons
Major crops forecasting area, production and yield evidence from agriculture sector of Pakistan
Sarhad J. Agric.
A new look at the statistical model identification
IEEE Trans. Automat. Contr.
Forecasting production and yield of sugarcane and cotton crops of Pakistan for 2013-2030
Sarhad J. Agric.
A tutorial on adaptive MCMC
Stat. Comput.
Pakistan: Environmental Impact of Cotton Production and Trade
Turbulent Mirror: an Illustrated Guide to Chaos Theory and the Science of Wholeness
Forecasting foreign Cotton production: the case of India, Pakistan and Australia. Paper presented and published
The Proceedings of The 2002 Beltwide Cotton Conference
Particle swarm optimization algorithm
Inf. Control Shenyang
Impact of climate change on cotton production and water consumption in Shiyang River Basin
Trans. Chin. Soc. Agric. Eng.
A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence
Biometrika
The interdependence between rainfall and temperature: copula analyses
Sci. World J.
Support vector machine
Mach. Learn.
Handbook of Genetic Algorithms
Singular Value Decomposition, Proc. EUSIPCO-94
Forecasting area, production and yield of cotton in India using ARIMA model
Res. Rev. J. Space Sci. Technol.
Forecasting evaporative loss by least-square support-vector regression and evaluation with genetic programming, Gaussian process, and minimax probability machine regression: case study of Brisbane City
J. Hydrol. Eng.
Dry Weather Predicted in the Country During Friday/Monday
Ensemble learning
Crops Area and Production by Districts 1981-2008
Applied Regression Analysis
Shuffled complex evolution approach for effective and efficient global minimization
J. Opt. Theory Appl.
A New Class of Copulas With Tail Dependence and a Generalized Tail Dependence Estimator