Prediction of tropospheric ozone concentrations: Application of a methodology based on the Darwin’s Theory of Evolution

https://doi.org/10.1016/j.eswa.2010.07.122

Abstract

This study aims to predict the next-day hourly average tropospheric ozone (O3) concentrations using genetic programming (GP). Given the complexity of this problem, GP is an adequate methodology because it can optimize the structure of the model and its parameters simultaneously. It is an artificial intelligence methodology that uses the same principles as the Darwinian Theory of Evolution. GP enables the automatic generation of mathematical expressions, which are modified through an iterative process that applies genetic operations.

The inputs of the models were the hourly average concentrations of carbon monoxide (CO), nitrogen oxide (NO), nitrogen dioxide (NO2) and O3, and some meteorological variables (temperature – T; solar radiation – SR; relative humidity – RH; and wind speed – WS) measured 24 h before. GP was also applied to the principal components (PC) obtained from these variables. The analysed period, from May to July 2004, was divided into training and test periods.

GP was able to select the most relevant variables for the prediction of O3 concentrations. Among the original variables, T, RH and O3 measured 24 h before were considered significant inputs for prediction. The selected PC also had important contributions from the same variables and from NO2. Compared with the models obtained using PC, GP models using the original variables performed better in the training period and worse in the test period. The results achieved with the GP methodology demonstrated that it can be very useful for solving complex environmental problems.

Introduction

Ozone (O3) is a strong photochemical oxidant present in different layers of the atmosphere. In the troposphere, this irritating and reactive gas has negative impacts on human health, climate, vegetation and materials (Bytnerowicz et al., 2006, Pires et al., 2008). Tropospheric ozone is the result of three basic processes: (i) photochemical production by the interaction of hydrocarbons and nitrogen oxides under the action of suitable ambient meteorological conditions (Guerra et al., 2004, Zolghadri et al., 2004); (ii) vertical transport of stratospheric air, rich in ozone, into the troposphere (Dueñas, Fernández, Cañete, Carretero, & Liger, 2002); and (iii) horizontal transport due to the wind that brings O3 produced in other regions.

The formation of O3 is a complex, nonlinear, time and space varying process. Accordingly, several studies presented different statistical approaches to predict O3 concentrations (Al-Alawi et al., 2008, Coman et al., 2008, Omidvari et al., 2008, Pires et al., 2008a, Sousa et al., 2007, Sousa et al., 2006, Sousa et al., 2009), including linear and nonlinear models. The linear models applied in the literature were: (i) multiple linear regression (MLR); (ii) principal component regression (PCR); (iii) quantile regression; and (iv) time series. On the other hand, the most common nonlinear model was the artificial neural network (ANN). The selection of a model must consider features such as complexity, flexibility, accuracy and speed of computation (Pires, Martins, Sousa, Alvim-Ferraz, & Pereira, 2008b). ANN models usually performed better than the linear ones (Al-Alawi et al., 2008, Sousa et al., 2006, Sousa et al., 2007) because of the nonlinear behaviour associated with O3 formation. However, they belong to the group of so-called black box models, which offer limited interpretability. Moreover, the selection of the optimal network architecture and the computation time are the main disadvantages of these models.

Besides the structure, the success of a statistical model depends on several factors: (i) the data size; (ii) the method used to optimize its parameters; (iii) the input variables; and (iv) the collinearity between the input variables. The collinearity between the input variables can be eliminated through the application of principal component analysis (PCA). It is mathematically defined as an orthogonal linear transformation that maps the original data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on (Pires, Sousa, et al., 2008). Thus, the principal components (PC) are orthogonal and uncorrelated with each other, being determined by linear combinations of the original variables. The directions of the new coordinate axes are given by the eigenvectors of the covariance matrix of the original variables. The magnitude of each eigenvalue corresponds to the variance of the data along the direction of its eigenvector. Varimax rotation is the most widely employed orthogonal rotation in PCA, because it tends to simplify the unrotated loadings, making the results easier to interpret. It simplifies the loadings by rigidly rotating the PC axes such that the variable projections (loadings) on each PC tend to be either high or low. After the rotation, the loadings show the relative contributions of the original variables to each PC.
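The PCA steps described above (centre the data, take the eigenvectors of the covariance matrix, order them by eigenvalue, project) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name principal_components and the synthetic data are assumptions made for illustration, and the varimax rotation is left out.

```python
import numpy as np

def principal_components(X):
    """Return eigenvalues, eigenvectors and PC scores of X (rows = observations)."""
    Xc = X - X.mean(axis=0)                 # centre each variable
    cov = np.cov(Xc, rowvar=False)          # covariance matrix of the original variables
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh handles symmetric matrices, ascending order
    order = np.argsort(eigvals)[::-1]       # reorder so the first PC has the greatest variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                   # projections of the data onto the new axes
    return eigvals, eigvecs, scores

# Synthetic stand-in for collinear inputs; NOT the data used in the study
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([
    base + 0.1 * rng.normal(size=(200, 1)),      # two strongly correlated columns
    2 * base + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),                   # one independent column
])
eigvals, eigvecs, scores = principal_components(X)
```

In this sketch the covariance matrix of scores is diagonal, with the eigenvalues on the diagonal, which is the sense in which the PC are uncorrelated.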

As many factors could influence the performance of models, their development should have more degrees of freedom. For the models referred to above, the structure is fixed in advance and only the parameters are optimized. In stochastic processes, such as the prediction of O3 concentrations, the structure of the models should be more flexible. In this context, genetic programming (GP) could be a successful methodology, as it does not assume any structure for the model in advance. GP can optimize both the structure of the model and its parameters simultaneously. As far as is known, no published study has applied GP to predict air pollutant concentrations. This study aims to predict the next-day hourly average O3 concentrations by applying GP to the original input variables and their corresponding principal components.
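One common way to let both structure and parameters vary is to represent each candidate model as an expression tree and to recombine trees with subtree crossover. The sketch below illustrates that general idea only. The nested-tuple encoding, the function set and the helper names (evaluate, subtrees, replace, crossover) are assumptions, not details taken from the study.

```python
import operator
import random

# Hypothetical function set; the one used in the study is not given here
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def evaluate(tree, inputs):
    """Evaluate a tree: tuple = function node, str = input variable, float = constant."""
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCTIONS[name](evaluate(left, inputs), evaluate(right, inputs))
    if isinstance(tree, str):
        return inputs[tree]
    return tree

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a sequence of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from subtrees(tree[i], path + (i,))

def replace(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(node)

def crossover(parent_a, parent_b, rng):
    """Subtree crossover: graft a random subtree of parent_b onto a random node of parent_a."""
    path, _ = rng.choice(list(subtrees(parent_a)))
    _, donor = rng.choice(list(subtrees(parent_b)))
    return replace(parent_a, path, donor)

# Two illustrative individuals; variable names follow the inputs named in the text,
# but the expressions and values are made up
a = ("add", ("mul", 0.5, "O3"), "T")
b = ("sub", "T", ("mul", 0.1, "RH"))
inputs = {"T": 20.0, "RH": 60.0, "O3": 40.0}
child = crossover(a, b, random.Random(1))
```

Because crossover can swap in a whole subtree or a single constant, the same operation changes the shape of the expression as well as its numerical parameters.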

Section snippets

Genetic programming

Genetic programming (GP) is an artificial intelligence methodology that uses principles of Darwin’s Theory of Evolution. Its search strategy is based on genetic algorithms (GA), introduced by John Holland in the 1960s (Goldberg, 1989). GA use bit strings as chromosomes and are commonly applied in function optimization. This algorithm has several disadvantages; for example, the length of the strings is static (Koza, 1992). Additionally, the size and the shape of the model, solution of a given

Data

The inputs of the GP models were the hourly averages of air pollutant concentrations and meteorological variables measured 24 h before. The atmospheric concentrations of carbon monoxide (CO), nitrogen oxide (NO), nitrogen dioxide (NO2) and O3 were collected at an urban site (Antas) with traffic influences, situated in Oporto, Northern Portugal. This site belongs to the air quality monitoring network of Oporto Metropolitan Area that is managed by the Regional Commission of Coordination and

Results and discussion

The GP procedure was coded by the authors in Matlab. Table 1 shows the main control parameters of GP. The tree size is defined as the number of levels in the tree. For example, in Fig. 1, the tree size is 4. The fittest individuals are those that presented the lowest errors in the training step. As the results obtained by the GP method are probabilistic, several runs should be made before drawing conclusions. In this study, four different runs were done using 3, 4 and 5 populations at
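The two definitions above, tree size as the number of levels and fitness as the lowest training error, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested-tuple tree encoding, the use of root mean squared error as the error measure, and the names tree_size, rmse and fittest are hypothetical, and the example tree is not the one shown in Fig. 1.

```python
import math

def tree_size(tree):
    """Number of levels in a nested-tuple expression tree (a lone terminal counts as 1)."""
    if isinstance(tree, tuple):
        return 1 + max(tree_size(child) for child in tree[1:])
    return 1

def rmse(model, samples):
    """Root mean squared error of a callable model over (input, target) pairs."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in samples) / len(samples))

def fittest(population, samples, n):
    """Return the n individuals with the lowest training error."""
    return sorted(population, key=lambda m: rmse(m, samples))[:n]

# Hypothetical tree with four levels: root, two operators, a nested operator, terminals
example = ("add", ("mul", 0.5, "O3"), ("sub", "T", ("mul", 0.1, "RH")))

# Made-up training pairs and candidate models, for illustration only
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
population = [lambda x: x, lambda x: 2 * x, lambda x: x + 3]
best = fittest(population, samples, 1)[0]
```

Here the candidate that doubles its input reproduces the made-up targets exactly, so it is the one returned by fittest.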

Conclusions

To predict the next-day hourly average O3 concentrations, GP was applied using the original variables (OV) and their PC as inputs. This methodology was able to select the relevant variables. When GP was applied to the original variables, T, RH and O3 were considered significant inputs for prediction. On the other hand, when it was applied to the PC, the selected ones had important contributions from the same variables and also from NO2. GP models using the OV presented better performance in training period and worse

Acknowledgements

The authors are grateful to Comissão de Coordenação da Direcção Regional-Norte and to Instituto Geofísico da Universidade do Porto for kindly providing the air quality and meteorological data. This work was supported by Fundação para a Ciência e Tecnologia (FCT). J.C.M. Pires also thanks the FCT for the fellowship SFRH/BD/23302/2005.
