Genetic programming for retrieving missing information in wave records along the west coast of India

https://doi.org/10.1016/j.apor.2007.11.002

Abstract

Instruments such as floating wave rider buoys provide wave data continuously over long periods. However, such records invariably contain missing values because the instrument or telemetry system is damaged, malfunctioning or otherwise non-operational. This paper attempts to restore missing wave heights using one of the latest soft computing tools, namely Genetic Programming (GP). Gaps in the time series of significant wave heights, recorded every 3 h over the four years from January 2000 to December 2003, are filled in at six selected buoy locations along the west coast of India. The performance of GP was judged by the error statistics of bias, root mean square error, correlation coefficient and scatter index. The methodology gave reliable results, with fairly good overall agreement between the restored wave records and the actual measurements.
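The four error statistics named above can be computed as in the following Python sketch. Conventions vary between studies, so the definitions here are assumptions: bias is taken as mean(predicted - observed), and the scatter index as the RMSE divided by the mean observed value. The function name `error_stats` is illustrative.

```python
import math


def error_stats(observed, predicted):
    """Return bias, RMSE, correlation coefficient r and scatter index SI.

    Assumed conventions: bias = mean(predicted - observed),
    SI = RMSE / mean(observed). Raises ZeroDivisionError for constant series
    or a zero observed mean.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    bias = mean_p - mean_o
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    # Pearson correlation from centred cross- and auto-products.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)
    si = rmse / mean_o
    return {"bias": bias, "rmse": rmse, "r": r, "si": si}


stats = error_stats([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

In this toy example the positive and negative errors cancel, so the bias is close to zero even though the RMSE is not. This is why both statistics are usually reported together.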

Introduction

The effective utilization and management of offshore and coastal resources require information on ocean waves. Time series of ocean wave heights are used in many coastal, offshore and ocean engineering studies that support design and operational activities, such as deriving long-term wave heights for a given return period and the exceedance probabilities of a given wave height. Time series analysis usually requires that the sequential observations be equally spaced, made over long periods and reported without interruption. However, the continuity of the data is occasionally lost. This happens for many reasons, such as failure of data collection and transmission equipment, noise and synchronization problems between the buoy and the receivers, hardware and software failures, aging of equipment, accidental or weather-induced snapping of mooring lines, severe weather rendering the system non-operational, cloud cover in satellite imagery and theft of equipment deployed in the ocean.

The lost information, which is especially valuable during severe weather, therefore needs to be retrieved. A break in the data degrades the information obtained by analyzing such time series. It also degrades the performance of applications built on them, such as real-time wave forecasting and the derivation of wave height-duration curves. Missing information may also bias the results.

Techniques for substituting missing values in a time series of a random variable have been well studied and routinely employed for variables like river discharge and runoff (Mutreja [22]), but the same cannot be said for ocean waves. This is probably because many hydrological studies involve relatively small samples, such as analyses of annual peak river flows or monthly rainfall, in which a single missing value may introduce a very large bias. The methods employed in such applications include a random choice within the observed range, linear and non-linear interpolation (Mutreja [22]), autoregressive schemes (Bennis et al. [2]), chaos theory (Elshorbagy et al. [9]) and artificial neural networks (Khalil et al. [17]). In general oceanography, the problem of gaps in data has been addressed by investigators such as Thompson [27] and Sturges [26]. Thompson suggested that a random sampling of data points might be an optimally efficient approach. Sturges used a Monte Carlo technique to make up gaps at random in a known time series of monthly mean sea level. Emery and Thomson [10] gave an account of such attempts in the wider domain of oceanography. Makarynskyy et al. [20] described an effective use of ANNs to in-fill gaps in tidal data based on a large number of measurements. For the time history of wave heights, as opposed to the other variables in the works cited so far, relatively few studies address the issue directly. Stefanakos and Athanassoulis [25] used a residual wave height series, obtained by removing the trend and periodicity from the observed series, with the same probability distribution as the original one. The use of soft computing tools such as artificial neural networks (ANNs) for in-filling wave data is recent. Makarynskyy et al. [19] trained and validated ANNs over an extensive 8-year period and used them to restore significant wave heights over 36 h gaps.
Puca et al. [24] filled gaps at one location using spatial correlation with two nearby locations. Balas et al. [1] instead relied on temporal correlations, probably because their series had smaller gaps (about 2–24 h) and a shorter observation period (24 months). In general, small gaps, when few in number, appear to have been filled by simple interpolation, medium gaps by stochastic model fitting and large gaps by spatial correlation (Stefanakos and Athanassoulis [25], Makarynskyy and Makarynska [21]). However, the boundaries between small, medium and large gaps are not clearly defined. Past work generally indicates that a soft tool like the ANN is quite effective in retrieving missing wave height information. This success of ANNs inspired the present authors to experiment with alternative soft computing approaches.
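The simplest of these options is sketched below. The code linearly interpolates only short interior gaps in a series where `None` marks a missing value, and leaves longer gaps, or gaps at either end of the record, for stochastic or spatial-correlation methods. Because the literature draws no clear line between gap classes, the `max_gap` threshold and the helper names are hypothetical.

```python
def find_gaps(series):
    """Return (start, end) index pairs, end exclusive, for each run of None."""
    gaps, start = [], None
    for i, v in enumerate(series):
        if v is None and start is None:
            start = i
        elif v is not None and start is not None:
            gaps.append((start, i))
            start = None
    if start is not None:
        gaps.append((start, len(series)))
    return gaps


def fill_short_gaps(series, max_gap):
    """Linearly interpolate interior gaps of at most max_gap missing samples.

    Longer gaps and gaps touching either end of the record stay None.
    """
    out = list(series)
    for start, end in find_gaps(series):
        length = end - start
        if start == 0 or end == len(series) or length > max_gap:
            continue
        left, right = series[start - 1], series[end]
        for k in range(1, length + 1):
            out[start + k - 1] = left + (right - left) * k / (length + 1)
    return out


hs = [1.0, None, 2.0, None, None, None, None, 3.0]
filled = fill_short_gaps(hs, max_gap=2)  # only the single-sample gap is filled
```

With 3-hourly data, `max_gap=2` would interpolate gaps of up to 6 h. The value is purely illustrative.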

The present work therefore applies genetic programming (GP), one of the latest soft computing tools and one largely untried for this purpose, to fill in missing significant wave height (Hs) values at a given location from the values recorded at nearby stations. It is based on observations made over a period of four years. GP iteratively generates candidate solutions until they reach a specified level of acceptance under the selected criterion, which makes it a promising tool for retrieving missing values. The suitability of this new approach is assessed for different gap lengths, and its outcome is compared with that of an ANN. A comparison of the reconstructed significant wave height time series with the actual buoy measurements shows good performance of GP during both rough and calm periods.

Unlike in the past, large amounts of wave rider and satellite wave data are now increasingly available for multiple locations and over long durations in many parts of the world. This study should therefore be useful when dealing with such databases. Recently, Ustoorikar and Deo [28] attempted to use GP to fill gaps in measured wave heights at certain locations in the Gulf of Mexico. The present study goes beyond that pilot work and treats the problem more robustly. It also concerns a different part of the world, influenced by different met-ocean conditions. Further, the measured wave data at the present locations contain far more gaps than those at the stations in the Gulf of Mexico.


The database

The study used time series of significant wave height collected by floating buoys at six selected stations along the west coast of India: DS1, DS2, SW1, SW2, SW3 and SW4 (Fig. 1). The stations are maintained by the National Institute of Ocean Technology (NIOT) under the National Data Buoy Programme implemented by the Department of Ocean Development, Government of India. The stations DS1 (15.326°N, 69.371°E, water depth: 3800 m) and
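Before any in-filling, each station's records must be placed on a common time axis so that the gaps become explicit. The sketch below assumes that each record is a `(datetime, Hs)` pair whose timestamp falls exactly on the 3-hourly grid mentioned in the Abstract. The function names and the dictionary layout keyed by station ID are hypothetical.

```python
from datetime import datetime, timedelta

STEP = timedelta(hours=3)  # sampling interval of the Hs records


def align_stations(records, start, end, step=STEP):
    """Place each station's (timestamp, Hs) records on a common regular time
    axis, with None wherever no observation exists for that time step."""
    times = []
    t = start
    while t <= end:
        times.append(t)
        t += step
    series = {}
    for station, obs in records.items():
        lookup = dict(obs)  # assumes at most one record per timestamp
        series[station] = [lookup.get(ts) for ts in times]
    return times, series


def missing_fraction(values):
    """Share of time steps with no observation."""
    return sum(v is None for v in values) / len(values)


# Hypothetical records for two of the stations named above.
recs = {
    "DS1": [(datetime(2000, 1, 1, 0), 1.2), (datetime(2000, 1, 1, 6), 1.4)],
    "SW1": [(datetime(2000, 1, 1, 3), 0.9)],
}
times, series = align_stations(recs, datetime(2000, 1, 1, 0), datetime(2000, 1, 1, 9))
```

Real buoy timestamps may be offset from the nominal grid. In that case they would first need rounding to the nearest 3-h slot, which this sketch does not do.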

Genetic programming

The concept of genetic programming is borrowed from natural evolution, in which species survive according to the principle of ‘survival of the fittest’. GP is similar to the more widely known genetic algorithm (GA). However, whereas a GA solution is a set of numbers, a GP solution is a computer program or an equation. Koza [18] explained the various concepts related to GP. In GP a random population of individuals (equations or computer programs) is created, the fitness of
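A minimal symbolic-regression sketch of the general GP loop follows: a random population of expression trees, fitness evaluation, selection, crossover and mutation. The function set, tournament selection, elitism, depth limit and all rates are assumptions for illustration. They are not the settings used in this study, whose description is truncated here.

```python
import math
import random

# Assumed function set: binary arithmetic with protected division.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-6 else 1.0,
}


def random_tree(rng, n_vars, depth):
    """Grow a random tree; leaves are inputs ("x", i) or constants ("c", v)."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_vars))
        return ("c", round(rng.uniform(-2.0, 2.0), 2))
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, n_vars, depth - 1), random_tree(rng, n_vars, depth - 1))


def evaluate(tree, x):
    if tree[0] == "x":
        return x[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))


def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if tree[0] in OPS else 0


def fitness(tree, X, y):
    """RMSE of the tree's predictions; lower is fitter, non-finite scores inf."""
    try:
        mse = sum((evaluate(tree, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(y)
    except (OverflowError, ZeroDivisionError):
        return math.inf
    return math.sqrt(mse) if math.isfinite(mse) else math.inf


def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if tree[0] in OPS:
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def replace(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)


def tournament(rng, scored, k=3):
    """Return the fittest tree among k randomly drawn (fitness, tree) pairs."""
    return min(rng.sample(scored, k), key=lambda s: s[0])[1]


def evolve(X, y, pop_size=200, generations=30, max_depth=4, p_mut=0.2, seed=0):
    rng = random.Random(seed)
    n_vars = len(X[0])
    pop = [random_tree(rng, n_vars, max_depth) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((fitness(t, X, y), t) for t in pop), key=lambda s: s[0])
        new_pop = [scored[0][1]]  # elitism: the best individual survives unchanged
        while len(new_pop) < pop_size:
            mum, dad = tournament(rng, scored), tournament(rng, scored)
            # Subtree crossover: graft a random subtree of dad into mum.
            path, _ = rng.choice(list(subtrees(mum)))
            child = replace(mum, path, rng.choice(list(subtrees(dad)))[1])
            if rng.random() < p_mut:  # subtree mutation
                path, _ = rng.choice(list(subtrees(child)))
                child = replace(child, path, random_tree(rng, n_vars, 2))
            # Reject over-deep children to limit bloat.
            new_pop.append(child if depth(child) <= 2 * max_depth else mum)
        pop = new_pop
    best = min(pop, key=lambda t: fitness(t, X, y))
    return best, fitness(best, X, y)
```

Each individual is a nested tuple, so an evolved equation can be read directly from the tree. Because of elitism, the best fitness never worsens from one generation to the next.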

Retrieval of missing information

Of the four years of observations, those for the first 25 months were used to arrive at the final, optimum GP program or equation. Those for the last 23 months were used to validate its performance and to fill gaps with different amounts of missing information. The objective was to fill gaps in the significant wave height (Hs) time series at one location using Hs values from either a single other location or multiple locations. As mentioned in the
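The chronological split and the spatial-correlation setup can be sketched as follows. Counting 25 months from January 2000 ends in January 2002, so the split date of 1 February 2002 follows from the text. Everything else is an assumption for illustration, including the function names and the `model` callable, which could be any fitted predictor such as a GP-evolved equation. Training pairs are formed only at time steps where the target and all neighbor stations were recorded.

```python
from datetime import datetime

# Jan 2000 to Jan 2002 (25 months) for training; Feb 2002 to Dec 2003 (23 months) for validation.
SPLIT = datetime(2002, 2, 1)


def make_pairs(times, target, neighbors, split=SPLIT):
    """Build (neighbor values, target value) pairs at time steps where every
    series is present, split chronologically into training and validation sets."""
    train, valid = [], []
    for t, y, *xs in zip(times, target, *neighbors):
        if y is None or any(x is None for x in xs):
            continue
        (train if t < split else valid).append((tuple(xs), y))
    return train, valid


def fill_target(target, neighbors, model):
    """Fill missing target values with model(neighbor values) wherever all
    neighbors were recorded; gaps with a missing neighbor stay None."""
    filled = list(target)
    for i, y in enumerate(target):
        xs = tuple(n[i] for n in neighbors)
        if y is None and all(x is not None for x in xs):
            filled[i] = model(xs)
    return filled
```

Splitting by date rather than at random keeps the validation period strictly after the training period. This mirrors how the fitted model would be used on later records.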

Improving the prediction of higher waves

Figs. 5 to 10 show that the higher values of Hs (>4 m) are under-predicted. It was therefore decided to train GP separately on the higher values to improve their prediction. For this purpose, the training data file was divided into two sets of wave heights: 0 to 3 m, and 3 m and above. The training set containing the higher values of Hs (3 m and above) was used to generate the new equations. The testing results are then generated
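Splitting the training data at 3 m can be sketched as below. The text is truncated before it explains how a model is chosen when predicting an unknown Hs. The routing rule here is therefore purely an assumption: use the high-wave model whenever a general model's first-pass estimate reaches 3 m. The function names are also assumptions.

```python
THRESHOLD = 3.0  # m, boundary between the two training sets


def split_by_height(pairs, threshold=THRESHOLD):
    """Divide (inputs, Hs) training pairs into Hs < threshold and Hs >= threshold."""
    low = [(x, y) for x, y in pairs if y < threshold]
    high = [(x, y) for x, y in pairs if y >= threshold]
    return low, high


def two_stage_predict(x, general_model, high_model, threshold=THRESHOLD):
    """Assumed routing: keep the general model's estimate unless it reaches the
    threshold, in which case return the high-wave model's estimate instead."""
    first = general_model(x)
    return high_model(x) if first >= threshold else first
```

Routing on a first-pass estimate avoids needing the unknown target value. A misrouted estimate near the 3 m boundary is the obvious weakness of this assumed scheme.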

Conclusions

The preceding sections evaluated the applicability of genetic programming, one of the latest soft computing tools, for retrieving missing information in wave records collected at several locations along the west coast of India. The collected time histories contained a very large number of gaps. These were satisfactorily filled by GP models based on spatial correlation with values at neighboring stations.

Both program-based and equation-based GP models were used. No noticeable difference in

Acknowledgements

The authors thank NIOT, Chennai, India for providing the data used in this paper. The help provided by Ms Pooja Jain and Ms Ketaki Ustoorikar at different stages of this study is acknowledged. The funds received from the Naval Research Board, India are gratefully acknowledged.

References (29)

  • V. Babovic et al. Neural networks as routine for error updating of numerical models. Journal of Hydraulic Engineering, ASCE (2001).
  • V. Babovic, M. Keijzer, D.R. Aguilera, J. Harrington. An evolutionary approach to knowledge induction: Genetic programming...
  • H. Demuth et al. Neural network toolbox user’s guide (1998).
  • J.P. Drecourt. Application of neural networks and genetic programming to rainfall runoff modeling. Danish Hydraulic...