Genetic programming for experimental big data mining: A case study on concrete creep formulation
Introduction
Different techniques can be used for modeling nonlinear systems in structural engineering, and the models obtained from these techniques can be broadly categorized into two groups: phenomenological (or knowledge-based) and behavioral. Phenomenological models consider the physical laws governing the system (such as energy, momentum, etc.). In these models, the structure of the system should be selected by the model developer based on the physical laws, which requires prior knowledge about the system. Due to the complexity of many structural engineering systems/phenomena (such as modeling of concrete shrinkage and creep), it is not always possible to derive such models. In contrast to phenomenological models, behavioral models can be easily developed by finding the relationships between input variables and outputs for a set of experimental data without considering the physical theories. For developing behavioral models, no prior knowledge is needed about the mechanism or fundamental theory that produced the experimental data. Therefore, behavioral modeling techniques can be used for approximate modeling of many structural engineering systems [1], [2].
While behavioral models can be advantageous, many of them require the user to pre-specify or hypothesize the formulation structure of the model. In other words, these techniques only optimize the unknown coefficients of a pre-defined formulation structure. Regression analysis, in particular, is a commonly used technique for developing behavioral models. Although it can be used for developing both linear and nonlinear models, it is strongly sensitive to outliers and can exhibit large model errors due to the idealization of complex processes, approximation, and averaging of widely varying prototype conditions [3], [4]. Furthermore, for linear regression the least-squares estimate of the unknown parameters can be obtained analytically, whereas nonlinear regression typically uses an iterative optimization procedure to estimate the unknown parameters, which requires the user to provide starting values. Poorly chosen starting values can lead to convergence problems or to a local minimum rather than the global minimum in the optimization process. Therefore, traditional techniques such as regression analysis cannot guarantee that a reliable and accurate behavioral model will be obtained, particularly for complex nonlinear engineering systems.
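The contrast between the two regression settings can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the one-parameter model y = exp(b·x), the synthetic data, and the starting value 0.1 are all assumptions chosen for illustration. The linear fit is obtained in closed form, while the nonlinear fit uses Gauss-Newton iterations that must be seeded with a user-supplied starting value.

```python
import math
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Closed-form least-squares fit of y = a*x + b (no starting value needed)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def fit_exponential_rate(x: List[float], y: List[float],
                         b_start: float, iterations: int = 50) -> float:
    """Gauss-Newton iterations for the hypothetical model y = exp(b*x).

    The result depends on b_start: this is the user-supplied starting
    value that iterative nonlinear regression requires.
    """
    b = b_start
    for _ in range(iterations):
        pred = [math.exp(b * xi) for xi in x]
        jac = [xi * pi for xi, pi in zip(x, pred)]      # d(pred)/db
        resid = [yi - pi for yi, pi in zip(y, pred)]
        denom = sum(j * j for j in jac)
        if denom == 0.0:
            break
        b += sum(j * r for j, r in zip(jac, resid)) / denom
    return b


# Synthetic, noise-free illustration data (assumed, not from the paper).
x_data = [0.0, 1.0, 2.0, 3.0]
slope, intercept = fit_line(x_data, [1.0, 3.0, 5.0, 7.0])
y_exp = [math.exp(0.5 * xi) for xi in x_data]
rate = fit_exponential_rate(x_data, y_exp, b_start=0.1)
print(slope, intercept, rate)
```

With a starting value far from the true rate, the same iteration may overshoot or fail to converge, which is the practical difficulty described above.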
In recent years, more advanced computer-aided pattern-recognition and data-classification techniques, such as artificial neural networks (ANNs) and support vector machines (SVMs), have been used to develop behavioral models in various civil engineering problems (e.g., [5], [6], [7], [8]). An ANN discovers patterns and approximates relationships in data using a supervised learning algorithm, a form of regression that relies on the inputs and outputs of a training set [9]. Although ANNs are generally successful in prediction, they are only appropriate for use as part of a computer program, not for the development of practical prediction equations. In addition, an ANN requires the data to be normalized beforehand to suit the chosen activation function, requires the user to determine the best network architecture, and can have a complex structure with a high potential for over-fitting [10]. SVMs, on the other hand, are efficient kernel-based methods that solve a convex constrained quadratic programming (CCQP) problem to find a set of parameters. However, selecting the appropriate kernel for an SVM can be a challenge, and the results are not transparent [11].
One powerful technique for developing nonlinear behavioral models in the case of complex optimization problems is genetic programming (GP) [12]. GP is a specialized subset of genetic algorithms (GAs) [13], which are based on the principles of genetics and natural selection. GP and its variants have been successfully used for solving a number of different civil engineering problems (e.g., [14], [15]). Multi-gene genetic programming (MGGP) is a robust variant of GP that combines the ability of standard GP to construct the model structure with the capability of traditional regression in parameter estimation. In this technique, each symbolic model (and each member of the GP population) is a weighted linear combination of low-order nonlinear transformations of the input variables. In contrast to standard symbolic regression, MGGP allows the evolution of accurate and relatively compact mathematical models. Even when large numbers of input variables are used, this technique can automatically select the variables that contribute most to the model, formulate the structure of the model, and solve for the coefficients in the regression equation [16], [17], [18], [19]. Therefore, unlike other techniques such as traditional regression analysis or ANN, the MGGP technique does not require the user to pre-define the formulation structure of the model or select any existing form of the relationship for optimization [3], [4], which makes it more practical for complex optimization problems. Recent studies also show that, compared to other novel computer-based techniques such as SVM and particle swarm model selection, GP performs better in problems with high dimensionality and large training sets [20].
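The idea of a multi-gene model as a weighted linear combination of nonlinear transformations can be sketched as follows. This is only an illustration under stated assumptions: the two genes (x1·x2 and x1²), the synthetic data, and all function names are hypothetical, and the evolutionary search that would normally generate the genes is omitted. Only the regression step is shown, in which the gene weights and a bias term are estimated by ordinary least squares through the normal equations.

```python
from typing import Callable, List, Sequence

Gene = Callable[[Sequence[float]], float]


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_gene_weights(genes: List[Gene], inputs: List[Sequence[float]],
                     targets: List[float]) -> List[float]:
    """Least-squares weights [bias, w1, w2, ...] for y = bias + sum(w_i * gene_i(x))."""
    design = [[1.0] + [g(row) for g in genes] for row in inputs]
    k = len(genes) + 1
    ata = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(ata, aty)


# Hypothetical genes; in MGGP these would be evolved expression trees.
genes: List[Gene] = [lambda x: x[0] * x[1], lambda x: x[0] ** 2]

# Synthetic data generated from y = 1 + 2*x1*x2 - 0.5*x1^2 (assumed).
inputs = [(1.0, 1.0), (2.0, 1.0), (3.0, 2.0), (1.0, 3.0), (2.0, 2.0), (4.0, 1.0)]
targets = [1.0 + 2.0 * a * b - 0.5 * a ** 2 for a, b in inputs]

weights = fit_gene_weights(genes, inputs, targets)
print(weights)
```

Because the model is linear in the weights, this step has a closed-form solution even though each gene is nonlinear in the inputs; the search over gene structures is what the evolutionary part of the algorithm would handle.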
Typically, standard GP algorithms (including MGGP) optimize only one objective in the model development process: maximizing the goodness-of-fit to the training data. The main drawback of using a single objective is that the developed models can become overly complex. Minimizing the complexity of the developed models should therefore be considered as a second objective. In this study, a new algorithm called multi-objective genetic programming (MOGP) is developed. MOGP is an extension of standard GP algorithms that can simultaneously solve for two competing objectives (i.e., maximizing the goodness-of-fit and minimizing the model complexity). By performing multi-gene symbolic regression via MOGP, one can develop parsimonious and accurate data-based models for complex engineering systems.
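The trade-off between the two competing objectives can be pictured with a small non-dominated (Pareto) filter. The sketch below is an assumption-laden illustration rather than the MOGP algorithm itself: goodness-of-fit is expressed as an error to be minimized, complexity is an arbitrary integer score, and the candidate models and their values are invented for the example.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    error: float       # lower is better (stands in for goodness-of-fit)
    complexity: int    # lower is better (e.g., a node count; assumed measure)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and better on at least one."""
    no_worse = a.error <= b.error and a.complexity <= b.complexity
    strictly_better = a.error < b.error or a.complexity < b.complexity
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Invented candidate models for illustration only.
population = [
    Candidate("A", 0.10, 25),
    Candidate("B", 0.12, 12),
    Candidate("C", 0.20, 5),
    Candidate("D", 0.15, 14),
    Candidate("E", 0.30, 6),
]

front = pareto_front(population)
print([c.name for c in front])
```

In this example, D is dominated by B and E is dominated by C, so only A, B, and C remain. A single-objective search would return A alone, whereas the front exposes simpler models such as B that give up little accuracy.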
This paper examines the feasibility of using MOGP for modeling complex nonlinear civil engineering systems. Two objectives are considered for optimization through MOGP: 1) maximization of goodness-of-fit and 2) minimization of model complexity. As an illustration, the capability of this technique is demonstrated by developing a simple and accurate model (referred to as the “genetic programming based creep model” or “G-C model” in this study) for predicting a complex civil engineering phenomenon: the time-dependent total creep compliance of concrete [21]. A large experimental database selected from Northwestern University's Infrastructure Technology Institute (NU-ITI) database [22] is used for the G-C model development; as such, the proposed model is valid for a wide range of structural properties. The multiple imputation method [23] is used to deal with missing data so that as much of the collected data as possible can be incorporated in the model development. The predictors are selected from the literature and consist of parameters that have been found to influence the total creep of concrete (such as relative humidity, curing period, etc.). In this study, some schemes have also been used to handle Big Data in genetic programming. Model selection is conducted automatically by MOGP, which selects the predictors with the greatest statistical contribution to obtain an accurate, unbiased, and parsimonious model. To evaluate the capability of the G-C model, its accuracy is compared with that of other developed models in terms of various statistical indicators.
Multi-gene symbolic regression
Genetic algorithms (GAs) [13] and genetic programming (GP) [12] are two specific types of evolutionary algorithms that have been used in a wide range of practical problems in different fields, such as optimizing a fixed set of variables or finding a global optimum solution [24]. GA is a traditional optimization technique that uses a fixed-length linear representation with binary encoding of all parameters; thus, the output of the GA is a string of numbers. Compared with the GA approach, GP solves
Applying MOGP to predict the time-dependent total creep for concrete
Concrete creep is defined as the time-dependent increase of strain in hardened concrete (in excess of shrinkage) subjected to a sustained stress; it has a direct influence on prestress losses of pretensioned concrete members and the long-term deflection of girders [29]. Furthermore, it is well known that in concrete repairs, cracking due to restrained shrinkage can reduce the performance of a structure and create a direct path for penetration of corrosive ions into the concrete that can in turn
Summary and conclusions
This paper proposes a multi-objective genetic programming (MOGP) technique for the modeling of complex engineering systems. The proposed technique can automatically select the most significant variables in the model, formulate the model structure, and solve the unknown parameters of the regression equation, while simultaneously optimizing for both accuracy and complexity. To handle Big Data, some schemes (such as parallel processing) have been used in the proposed MOGP technique. As an
Acknowledgments
This material is based in part upon work supported by the National Science Foundation (NSF) under Cooperative Agreement No. DBI-0939454. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF.
References (47)
- et al. Estimating unconfined compressive strength of cockle shell–cement–sand mixtures using soft computing methodologies. Eng. Struct. (2015)
- et al. Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research. J. Pharm. Biomed. Anal. (2000)
- et al. Assessment of artificial neural network and genetic programming as predictive tools. Adv. Eng. Softw. (2015)
- et al. Numerical modeling of concrete strength under multiaxial confinement pressures using linear genetic programming. Autom. Constr. (2013)
- Prediction of concrete creep and shrinkage: past, present and future. Nucl. Eng. Des. (2001)
- et al. A multi-objective evolutionary algorithm for energy management of agricultural systems—a case study in Iran. Renew. Sust. Energ. Rev. (2015)
- et al. Extreme learning machine for prediction of heat load in district heating systems. Energy Build. (2016)
- et al. Genetic programming for moment capacity modeling of ferrocement members. Eng. Struct. (2013)
- et al. Probabilistic prediction model for average bond strength at steel–concrete interface considering corrosion effect. Eng. Struct. (2015)
- et al. Identification of parametric models. Commun. Control Eng. (1997)
- A novel genetic programming approach to nonlinear system modelling: application to the DAMADICS benchmark problem. Eng. Appl. Artif. Intell.
- A new multi-gene genetic programming approach to nonlinear system modeling. Part I: materials and structural engineering problems. Neural Comput. & Applic.
- A new multi-gene genetic programming approach to non-linear system modeling. Part II: geotechnical and earthquake engineering problems. Neural Comput. & Applic.
- Artificial neural network for predicting creep and shrinkage of high performance concrete. J. Adv. Concr. Technol.
- Effect of mixture temperature on slump flow prediction of conventional concretes using artificial neural networks. Aust. J. Civ. Eng.
- Concrete workability
- Advantage and drawback of support vector machine functionality
- Genetic programming: on the programming of computers by means of natural selection
- Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence
- New formulation of compressive strength of preformed-foam cellular concrete: an evolutionary approach. J. Mater. Civ. Eng.
- Co-evolution of non-linear PLS model components. J. Chemom.
- GPTIPS: an open source genetic programming toolbox for multigene symbolic regression
- Evolving toxicity models using multigene symbolic regression and multiple objectives. Int. J. Mach. Learn. Comput.
Cited by (85)
Autogenous deformation-induced stress evolution in cementitious materials considering viscoelastic properties: A review of experiments and models
2024, Developments in the Built Environment
Bayesian-based prediction and real-time updating of axial deformation in high-rise buildings during construction
2023, Engineering Structures
Data-enabled comparison of six prediction models for concrete shrinkage and creep
2023, Case Studies in Construction Materials
Evaluating the tensile strength of reinforced concrete using optimized machine learning techniques
2023, Engineering Fracture Mechanics
Particle swarm optimization technique-based prediction of peak ground acceleration of Iraq's tectonic regions
2023, Journal of King Saud University - Engineering Sciences
Prediction of high-temperature creep in concrete using supervised machine learning algorithms
2023, Construction and Building Materials