Multi-stage genetic programming: A new strategy to nonlinear system modeling
Introduction
Modeling of nonlinear engineering systems can be performed by using different kinds of methods. Owing to the large variety of methods available in this field, no one method can be used a universally applicable solution. The modeling of engineering problems is a difficult task because of the need to estimate both the structure and the parameters of such systems. Different criteria can be characterized for model classification while dealing with a system modeling task [44]. A model can be classified as phenomenological or behavioral [33]. A phenomenological model is derived by considering the physical relationships governing a system. As a result, the structure of the model is selected according to prior knowledge about the system. It is not always possible to design phenomenological models for structural engineering systems because of their complexity. To deal with this issue, behavioral models are commonly employed. These models approximate the relationships between the inputs and outputs on the basis of a measured set of data, without the need for prior knowledge about the mechanisms that produced the experimental data. Behavioral models can provide very good results with minimal effort [33]. Traditional statistical regression techniques are commonly used for behavioral modeling purposes. However, regression analysis can have large uncertainties. Further, it has major drawbacks in terms of idealization of complex processes, approximation, and averaging widely varying prototype conditions. Regression analysis often assumes linear, or in some cases nonlinear, relationships between the output and the predictor variables; these assumptions do not always hold. Another major constraint in the application of regression analysis is the assumption of normality of residuals. Several alternative computer-aided pattern recognition and data classification approaches have been developed for behavioral modeling. For instance, pattern recognition systems learn adaptively from experience and extract various discriminators. Artificial neural networks (ANNs) are the most widely used pattern recognition procedures. ANNs have been used for a wide range of engineering problems (e.g., [4], [32], [38], [41]). Despite the acceptable performance of ANNs in most cases, they do not usually provide a definite function to calculate the outcome. In addition, ANNs require the structure of the network (e.g. transfer functions, number of hidden layers and neurons, etc.) to be identified a priori. The ANN approach is mostly appropriate to be used as part of a computer program.
Genetic programming (GP) [26] is a new behavioral modeling approach with completely new characteristics. GP is an extension of genetic algorithms (GAs). In general, it may be defined as a supervised machine learning technique that searches a program space instead of a data space. The programs created by traditional GP are represented as tree structures and expressed using a functional programming language [26]. The main advantage of GP-based approaches over regression and ANN techniques is their ability to generate prediction equations without assuming the form of the existing relationships. GP and its variants have been successfully applied to various real world problems (e.g., [1], [2], [3], [8], [9], [11], [12], [17], [30], [45], [46], [47], [48]). Several alternative approaches have been developed to improve the efficiency of the standard GP. Folino et al. [16] and Deschaine et al. [14] combined GP and simulated annealing (SA) to make a hybrid algorithm with better performance. In this hybrid algorithm, the SA strategy is used to decide the acceptance of a new individual. McKee and Lensberg [31] proposed a hybrid approach combining GP and rough sets to construct a bankruptcy prediction model. Brezocnik and Kovacic [7] presented a new integrated GP and GA approach to predict surface roughness in end milling. In this coupled GP and GA approach, GP is used to develop the prediction model and GA is employed to optimize the floating-point constants of the best model. Madar et al. [29] developed a new method coupling GP and orthogonal least squares (OLS) algorithms. This method uses GP to generate nonlinear models of dynamical systems represented as a tree structure. OLS is then employed to estimate the contribution of branches of tree to the model’s accuracy. Chan et al. [10] proposed a GP-based fuzzy regression (GP-FR) approach to overcome the deficiencies of the commonly used methods for the modeling of manufacturing processes. In the GP-FR method, GP is used to generate model structures and an FR generator based on fuzzy regression is used to determine outliers in experimental data sets [10]. Ravisankar et al. [36] presented novel ANN and GP hybrids to predict the failure of dotcom companies. In these ANN-GP hybrids, one method is used to perform feature selection in the first phase and the other one is used as a classifier in the second phase. Nasseri et al. [34] developed a combined Extended Kalman Filter (EKF) and GP method for forecasting water demand. In this method, the EKF algorithm is applied to infer latent variables in order to make a forecasting based on GP results of water demand. Lee and Tong [27] proposed a hybrid forecasting model for nonlinear time series by combining autoregressive integrated moving average (ARIMA) with GP.
This study investigates the feasibility of using a multi-stage genetic programming (MSGP) strategy to simulate the complex behavior of engineering systems. This strategy is based on the incorporation of the effects of the predictor variables individually and then the interactions between the variables to derive a more efficient formulation for a problem. The formulation capabilities of the MSGP strategy are demonstrated by applying it to three practical engineering examples. Further, a comparative study is conducted using the results obtained via MSGP and those of the standard GP and its variants. These models are developed using reliable experimental results collected from the literature.
Section snippets
Genetic programming
GP is a symbolic optimization technique used to create computer programs for solving a problem following the principle of Darwinian natural selection [17], [26]. A breakthrough in GP was made in the late 1980s by performing experiments on symbolic regression. GP was introduced by Koza [26] as an extension of GA. Most of the genetic operators used in GA can be implemented in GP with minor changes. The main difference between GP and GA is the representation of the solution. GP solutions are
Application to engineering problems
This paper introduces the MSGP strategy to assess the nonlinear relationships between various parameters in three practical engineering problems. The problems investigated are as follows:
- i
Simulation of pH neutralization process.
- ii
Prediction of surface roughness in end milling.
- iii
Classification of soil liquefaction conditions.
The above problems are chosen to evaluate the performance of MSGP because they belong to different fields of engineering, including chemical engineering, mechanical engineering,
Performance analysis and model validity
Representative engineering systems are formulated using MSGP. According to Smith [42], if a prediction model gives R > 0.8, and the error values (RMSE or MAE) are minimum, there is a strong correlation between the predicted and measured values [17]. The model can therefore be judged as very accurate. Fig. 3, Fig. 4, Fig. 5, Fig. 6 show that the MSGP models (Problems I and II) with high R and low RMSE or MAE values are able to predict the target values with an acceptable level of accuracy. The
Conclusions
In this paper, an efficient strategy called the MSGP strategy is introduced for analyzing the behavior of engineering systems. The capabilities of the MSGP strategy are demonstrated by application to three practical problems. The problems considered are the characterization of the pH neutralization process, estimation of the surface roughness in end milling, and classification of soil liquefaction conditions. Reliable databases from previously published experimental results are used to develop
References (48)
- et al.
Non-linear projection to latent structures revisited (the neural network PLS algorithm)
Computers and Chemical Engineering
(1999) - et al.
Generating prediction rules for liquefaction through data mining
Expert Systems with Applications
(2009) - et al.
Prediction of surface roughness with genetic programming
Journal of Materials Processing Technology
(2004) - et al.
Modelling damping ratio and shear modulus of sand–mica mixtures using genetic programming
Expert Systems with Applications
(2009) - et al.
Modeling manufacturing processes using a genetic programming-based fuzzy regression with detection of outliers
Information Sciences
(2010) - et al.
A relevance feedback method based on genetic programming for classification of remote sensing images
Information Sciences
(2011) - et al.
A hybrid computational approach to drive new ground-motion prediction equations
Engineering Applications of Artifical Intelligence
(2011) - et al.
Support vector machines: Their use in geotechnical engineering as illustrated using seismic liquefaction data
Computers and Geotechnics
(2007) - et al.
Beware of q2
Journal of Molecular Graphics and Modelling
(2002) - et al.
Operating regime based process modelling and identification
Computers and Chemical Engineering
(1997)