Elsevier

Applied Soft Computing

Volume 70, September 2018, Pages 208-224

Decomposition genetic programming: An extensive evaluation on rainfall prediction in the context of weather derivatives

https://doi.org/10.1016/j.asoc.2018.05.016

Highlights

  • An extensive evaluation of a novel genetic decomposition algorithm on rainfall derivatives.

  • The decomposition algorithm is compared to 7 machine learning techniques.

  • Algorithm tested over 42 different cities from Europe and the USA.

  • The genetic decomposition algorithm is able to significantly outperform all other techniques.

  • Additional analysis on algorithmic performance across different climates.

Abstract

Regression problems provide some of the most challenging research opportunities in the area of machine learning, where the predictions of some target variables are critical to a specific application. Rainfall is a prime example, as it exhibits unique characteristics of high volatility and chaotic patterns that do not exist in other time series data. Moreover, rainfall is essential for applications that surround financial securities, such as rainfall derivatives. This paper extensively evaluates a novel algorithm called Decomposition Genetic Programming (DGP), which is an algorithm that decomposes the problem of rainfall into subproblems. Decomposition allows the GP to focus on each subproblem, before combining back into the full problem. The GP does this by having a separate regression equation for each subproblem, based on the level of rainfall. As we turn our attention to subproblems, this reduces the difficulty when dealing with data sets with high volatility and extreme rainfall values, since these values can be focused on independently. We extensively evaluate our algorithm on 42 cities from Europe and the USA, and compare its performance to the current state-of-the-art (Markov chain extended with rainfall prediction), and six other popular machine learning algorithms (Genetic Programming without decomposition, Support Vector Regression, Radial Basis Neural Networks, M5 Rules, M5 Model trees, and k-Nearest Neighbours). Results show that the DGP is able to consistently and significantly outperform all other algorithms. Lastly, another contribution of this work is to discuss the effect that DGP has had on the coverage of the rainfall predictions and whether it shows robust performance across different climates.

Introduction

Regression-based problems pose a unique challenge for researchers, since the predicted outputs have a pivotal effect on real-life problems. The complexity can be overcome through specific domain knowledge, but such knowledge is often unavailable. Within complex and chaotic time series data, recurring patterns are lacking and domain knowledge can be scarce. One type of time series that remains among the most difficult to model, and the most crucial to applications, is rainfall. This time series is highly volatile, shows little to no seasonality and is highly random. The effects of rainfall can be devastating, and unfavourable conditions can threaten the ability of societies and ecosystems to survive.

The phenomenon of rainfall has a direct impact on various domains such as water resource planning, agriculture and biological systems. Within finance, predicting the level of rainfall is important for protecting an individual's income from adverse rainfall effects. Over the years people have sought means of protecting their day-to-day income from unfavourable rainfall, but only recently has this become possible. Insurance against rain's adverse effects has existed for many years, but it is often of little use unless the impact is catastrophic and causes destruction. For instance, a farmer would only be able to receive compensation if they could demonstrate destruction of their crop, e.g. because of a severe flood. However, such a business can also be affected by unfavourable rainfall that is not necessarily catastrophic. For example, if a certain year is drier than normal, there might be a significant effect on crop production. In such cases, rainfall derivatives are a new method for reducing the financial risk posed by adverse or uncertain weather circumstances. A rainfall derivative has the advantage that no proof of damage caused by rain is required to exercise the protection, only the contract purchased.

Rainfall derivatives are part of the wider class of weather derivatives and share many aspects of conventional financial derivatives (e.g., oil and grain). Such a derivative is an agreed contract between two or more parties and can be written on the level of rainfall expected over a certain period of time. The contract's value is priced according to the level of rainfall predicted over that future period. Therefore, the problem of rainfall derivatives can be broken down into two parts. The first is predicting the accumulated rainfall over a specified period, and the second is having a pricing framework. The latter has its own problematic features, as rainfall derivatives constitute an incomplete market. To reduce the problem of mispricing, an algorithm that can predict rainfall accurately is key before assigning a price. In this paper we focus on the first aspect: predicting the rainfall amount.

As the concept of rainfall derivatives is relatively new, there exists little literature on this subject. Moreover, the difficulty in predicting rainfall has deterred researchers, unlike other weather derivatives such as temperature. To estimate future levels of rainfall, the Markov-chain extended with rainfall prediction (MCRP) [7] method has been commonly applied in a wide range of the literature, including rainfall derivatives [8], [9], [10], [11]. The general MCRP approach is often referred to as a ‘chain-dependent process’ [12], which splits the model into first capturing the occurrence pattern and then predicting the rainfall intensities. The occurrence pattern is produced by a Markov chain, where state 0 is a dry day and state 1 is a wet day. If a wet day is produced, the rainfall intensity is calculated by generating a random number from a given distribution (typically a Gamma or Mixed-Exponential distribution); otherwise a value of 0 is assigned (zero rainfall). We refer the reader to [7] for a complete description of MCRP. Despite being a popular approach, MCRP is very simplistic and does not truly capture the irregularities of rainfall. The final result tends to fluctuate around the observable mean of the training data. Moreover, there exists a large number of rainfall pathways that do not reflect future behaviour.
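The chain-dependent process described above can be illustrated with a minimal sketch. It is not the MCRP implementation of [7]: the function name, the first-order two-state chain, the Gamma parameters and the assumption that the day before the simulation was dry are all hypothetical choices made for illustration.

```python
import random


def simulate_chain_dependent(n_days, p_wet_given_dry, p_wet_given_wet,
                             shape, scale, seed=None):
    """Simulate daily rainfall with a two-state chain-dependent process.

    State 0 is a dry day and state 1 is a wet day. All parameter values
    are illustrative assumptions, not estimates from the paper.
    """
    rng = random.Random(seed)
    rainfall = []
    wet = False  # assumption: the day before the simulation was dry
    for _ in range(n_days):
        # Occurrence: the chance of a wet day depends on the previous state.
        p_wet = p_wet_given_wet if wet else p_wet_given_dry
        wet = rng.random() < p_wet
        # Intensity: draw from a Gamma distribution on wet days, else zero.
        rainfall.append(rng.gammavariate(shape, scale) if wet else 0.0)
    return rainfall


if __name__ == "__main__":
    path = simulate_chain_dependent(30, 0.25, 0.6, shape=2.0, scale=4.0, seed=1)
    print(sum(path))  # accumulated rainfall of one simulated 30-day pathway
```

Repeating the simulation with different seeds produces many rainfall pathways, which is one way to see why the averaged result tends to sit near the mean of the data used to fit the parameters.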

One way of overcoming some of the difficulties in modelling the rainfall time series is through change point models. The idea is based on abrupt changes in the time series: each point at which such a change occurs is considered a change point, and a new model explains the time series within each segment [13]. Change point models are frequently employed within econometrics [14], [15], climate [16] and hydrology [17], amongst other problem domains. The concept is similar to a decomposition method proposed in [18], but change point models split the time series into a typically larger number of smaller segments on the time axis. In [18], the time series of rainfall is split on the dependent variable according to whether the next day is expected to observe high, medium or low rainfall. The difference is that only three regression equations explain the whole time series of rainfall, instead of a larger number of regression models based on the abrupt changes in the time series.

Machine learning methods can be seen as an alternative and have become more popular over recent years. Typical applications within machine learning revolve around short-term predictions (e.g. rainfall-runoff models up to a few hours [19] or monthly amounts [20], [21]). For daily predictions, Weerasinghe et al. [22] used a feed-forward back-propagation neural network for daily rainfall prediction in Sri Lanka, which was inspired by the chain-dependent approach from statistics. The work in [23] also applied GP to daily rainfall data, but the GP performed poorly by itself, although when assisted by wavelets the predictive accuracy improved. In the context of rainfall derivatives, a selection of machine learning algorithms was explored in detail in [24], which showed that Radial Basis Function (RBF), Support Vector Regression (SVR) and Genetic Programming (GP) outperformed the commonly applied method of MCRP following a transformation of the data. In addition, Cramer et al. [25] presented in detail a tailored GP for the problem of rainfall prediction, and Cramer et al. [26] extended the above work by exploring the use of feature extraction. Both works showed promising results, where the GP could outperform MCRP, the current state of the art. Furthermore, Cramer et al. [18] extended the above GP works by proposing a new algorithm called Decomposition GP (DGP). This was a novel hybrid algorithm (comprising a Genetic Algorithm (GA) part and a Genetic Programming part) that decomposes the problem of rainfall into subproblems. The motivation for doing this was to allow the GP to focus on each subproblem, before combining back into the full problem. The GP did this by having a separate regression equation for each subproblem, determined based on the level of rainfall; in addition, the GA determined which regression equation should be used (solving a classification problem). As we turn our attention to subproblems, this reduces the difficulty when dealing with data sets with high volatility and extreme rainfall values, since these values can be focused on independently.
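The decomposition idea described above, in which a classification step selects one of several regression equations, can be sketched as follows. This is only a toy illustration: the thresholds, the three linear equations and the rule-based selector are hypothetical stand-ins for the evolved GP equations and the GA component, neither of which is specified here.

```python
LOW, MEDIUM, HIGH = 0, 1, 2


def toy_classifier(features):
    """Hypothetical stand-in for the GA: choose a rainfall class.

    Uses fixed thresholds on the most recent accumulated value; the real
    selector is evolved, so these numbers are assumptions.
    """
    latest = features[-1]
    if latest < 10.0:
        return LOW
    if latest < 40.0:
        return MEDIUM
    return HIGH


# Hypothetical stand-ins for the three evolved GP regression equations.
toy_equations = {
    LOW: lambda x: 0.5 * x[-1],
    MEDIUM: lambda x: 0.9 * x[-1] + 1.0,
    HIGH: lambda x: 1.2 * x[-1] + 5.0,
}


def decomposed_predict(features, classify, equations):
    """Pick the equation for the predicted class, then evaluate it."""
    return equations[classify(features)](features)


if __name__ == "__main__":
    for history in ([3.0, 4.0], [15.0, 20.0], [45.0, 50.0]):
        print(history, decomposed_predict(history, toy_classifier, toy_equations))
```

Keeping the selector and the equations as separate arguments mirrors the split between the classification subproblem and the regression subproblems, so either part can be swapped without touching the other.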

The main novelty of our paper is an in-depth technical and experimental comparative study of the DGP algorithm, building on [18]. This algorithm is an important step for time series that exhibit extreme behaviour. It is especially important within rainfall derivatives, where the price of a derivative is determined based on the level of rainfall, a prime example of the types of problems that our algorithm is looking to overcome. More specifically, the current study expands our previous work in the following five ways: (i) we give a more in-depth presentation of the DGP algorithm, (ii) we double the number of cities tested to 42, and we include cities not only from Europe, but also from the USA, (iii) we increase the number of algorithms we use as benchmarks from three (GP without decomposition, MCRP, RBF) to seven, as we now also include results for SVR, the M5 algorithm (both model trees M5R, and rules M5P), and k-Nearest Neighbour (KNN), (iv) we provide an extensive analysis of the results in terms of the GA component, which handles a classification task, as we compare it to other well-known classification techniques, such as RBF, SVM, RIPPER, Discriminant Analysis (DA), and Naive Bayes (NB), and (v) we provide an extensive discussion of the effectiveness of the DGP algorithm, by investigating how well its predictions cover the range of all rainfall data, and also by looking into how robustly it performs across different climates.

The remainder of this paper is organised as follows. In Section 2, we outline the data used. In Section 3, we present in detail the decomposition algorithm and its components. In Section 4, we outline the experimental setup for the DGP algorithm, and in Section 5, we discuss the results. In Section 6, we evaluate the effectiveness of DGP and also analyse the algorithm's performance on different climates. Finally in Section 7, we conclude and present future work.

Section snippets

The data used in the experiments

The daily rainfall data used is summarised in Table 1, which includes a total of 20 cities from around Europe and 22 from around the United States of America (USA). The data was retrieved from NOAA NCDC.

The use of machine learning methods effectively requires a modification to the data to align it with the problem domain of rainfall derivatives. Following [24] we use a sliding window accumulation method, given by:

$$r_{t_s} = \sum_{t=t_s}^{t_e} r_t,$$

where $r_t$ is the accumulated amount of
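As a rough illustration of the sliding window accumulation, the sketch below sums daily values from a start day t_s to an end day t_e and moves the window forward one day at a time. The function name, the fixed window length and the one-day step are assumptions made for illustration; the snippet above does not state them.

```python
def sliding_window_accumulation(daily, window):
    """Return the accumulated rainfall for every full window of days.

    Entry s is the sum of daily[s], ..., daily[s + window - 1], that is,
    t_s = s and t_e = s + window - 1 (an assumed indexing convention).
    """
    if window <= 0:
        raise ValueError("window must be a positive number of days")
    return [sum(daily[s:s + window]) for s in range(len(daily) - window + 1)]


if __name__ == "__main__":
    daily_rainfall = [1, 0, 2, 3, 0]
    print(sliding_window_accumulation(daily_rainfall, 2))
```

With a window of two days, the five daily values above yield four overlapping accumulated amounts, one per start day.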

Overview

Within this section we outline how we achieve the decomposition and how we break the problem down into smaller subproblems.

Our DGP consists of a number of individuals split into two separate populations, a GP part and a GA part. The GP part consists of b expression trees, where nodes represent functions or terminals as usual in GP [27]. For our implementation we define b to equal 3, such that we have 3 GP equations to predict low, medium and high rainfall amounts. The GA part consists of a

Experimental setup

The main goal of our experimentation is to establish whether the use of DGP is better than using a standard GP and other well known machine learning methods. As mentioned in the Introduction, producing more accurate rainfall predictions should lead to more accurate pricing.

We have identified three key aspects to investigate for DGP. The first is the performance against the financial state-of-the-art MCRP, as well as several popular machine learning algorithms. The second is the performance of

Results

Within this section we outline the results for how DGP performs against the benchmarks highlighted earlier. Moreover, we test the classification ability of the original GA against other well known techniques and how this impacts our DGP's predictive accuracy. To compare accuracy we use the Root Mean Squared Error (RMSE), because the data includes large deviations away from the mean of the data set. The idea is to have an algorithm that is able to cope with the extremes, thus analysing which
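For reference, the RMSE named above can be computed as in the following minimal sketch, which uses the standard definition (the square root of the mean squared difference between observed and predicted values). The function name and the example values are illustrative only and are not results from the paper.

```python
import math


def rmse(observed, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    observed = [0.0, 0.0, 0.0, 40.0]
    flat = [10.0, 10.0, 10.0, 10.0]      # flat prediction near the mean
    tracking = [2.0, 2.0, 2.0, 36.0]     # prediction that follows the extreme
    print(rmse(observed, flat), rmse(observed, tracking))
```

Because the errors are squared before averaging, a single large miss on an extreme value contributes far more than several small misses, which is why the measure suits data with large deviations from the mean.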

Effectiveness of the DGP algorithm

Within this section, we consider the effect that the problem decomposition (i.e., evolving a separate equation for each rainfall class) has had on DGP's ability to predict more similarly to the underlying data. Cramer et al. [24] noted that GP without decomposition tended to produce equations with flat predictions and was unable to meet the oscillations of the time series. To consider this we analyse the effect that DGP has had on the coverage of the predictions and whether DGP is able to

Conclusion

Within this paper, we presented an extensive evaluation of the Decomposition Genetic Programming (DGP) algorithm for the problem of rainfall within weather derivatives. DGP was proposed as a way to overcome the potential issues highlighted in previous work where we observed that GP was unable to consistently provide equations suitable for the underlying problem of rainfall. Therefore, we aimed to address this issue by thoroughly examining DGP to determine if the correct behaviour exists in our

References (30)

  • B.L. Cabrera et al., Pricing rainfall futures at the CME, J. Bank. Finance (2013)

  • M. Ritter et al., Minimizing geographical basis risk of weather derivatives using a multi-site rainfall model, Comput. Econ. (2014)

  • M. Cao et al., Precipitation modeling and contract valuation, J. Altern. Invest. (2004)

  • M. Odening et al., Analysis of rainfall derivatives using daily precipitation models: opportunities and pitfalls, Agric. Finance Rev. (2007)

  • R.W. Katz, Precipitation as a chain-dependent process, J. Appl. Meteorol. Climatol. (1977)