Elsevier

Automation in Construction

Volume 70, October 2016, Pages 89-97
Genetic programming for experimental big data mining: A case study on concrete creep formulation

https://doi.org/10.1016/j.autcon.2016.06.010

Highlights

  • A multi-objective genetic programming (MOGP) technique is introduced for handling Big Data and modeling complex systems.

  • As an illustration, MOGP is used to develop a concrete creep model (referred to as the “G-C model”).

  • The G-C model is developed based on Big Data collected from the literature.

  • The proposed G-C model is simple, and provides accurate and unbiased predictions.

Abstract

This paper proposes a new algorithm called multi-objective genetic programming (MOGP) for modeling complex civil engineering systems. The proposed technique combines the model structure selection ability of standard genetic programming with the parameter estimation power of classical regression, and it simultaneously optimizes both model complexity and goodness-of-fit through a non-dominated sorting algorithm. The performance of MOGP is illustrated by modeling a complex civil engineering problem: the time-dependent total creep of concrete. A Big Data set is used for the model development so that the proposed concrete creep model (referred to as a “genetic programming based creep model” or “G-C model” in this study) is valid for both normal and high strength concrete with a wide range of structural properties. The G-C model is then compared with currently accepted creep prediction models. The G-C model obtained by MOGP is simple, straightforward to use, and provides more accurate predictions than the other prediction models.

Introduction

Different techniques can be used for modeling nonlinear systems in structural engineering, and the models obtained from these techniques can be broadly categorized into two groups: phenomenological (or knowledge-based) and behavioral. Phenomenological models consider the physical laws governing the system (such as those of energy and momentum). In these models, the structure of the system must be selected by the model developer based on the physical laws, which requires prior knowledge about the system. Due to the complexity of many structural engineering systems and phenomena (such as concrete shrinkage and creep), it is not always possible to derive such models. In contrast, behavioral models can be developed by finding the relationships between input variables and outputs for a set of experimental data without considering the physical theories. Developing a behavioral model requires no prior knowledge about the mechanism or fundamental theory that produced the experimental data. Therefore, behavioral modeling techniques can be used for approximate modeling of many structural engineering systems [1], [2].

While behavioral models can be advantageous, many of them require the user to pre-specify (hypothesize) the formulation structure of the model. In other words, these behavioral techniques only optimize the unknown coefficients of a pre-defined formulation structure. Regression analysis, in particular, is a commonly used technique for developing behavioral models. Although it can be used for developing both linear and nonlinear models, it is strongly sensitive to outliers and can exhibit large model errors due to the idealization of complex processes, approximation, and averaging of widely varying prototype conditions [3], [4]. Furthermore, for linear regression the least-squares estimate of the unknown parameters can be obtained analytically, whereas nonlinear regression typically uses an iterative optimization procedure that requires the user to provide starting values. Poorly chosen starting values can lead to convergence problems or to a local minimum rather than the global minimum. Therefore, traditional techniques such as regression analysis cannot guarantee a reliable and accurate behavioral model, particularly for complex nonlinear engineering systems.
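
The sensitivity to starting values can be illustrated with a toy problem that is not taken from the paper. The sketch below assumes a hypothetical one-parameter model y = sin(b·x), noise-free synthetic data generated with b = 2, and plain gradient descent on the sum of squared errors (SSE); the function names, learning rate, and step count are illustrative choices only.

```python
import math

# Synthetic, noise-free data from y = sin(2x). The "true" b = 2 is an
# assumption of this demo, not a value from the paper.
XS = [0.1 * i for i in range(51)]
YS = [math.sin(2.0 * x) for x in XS]


def sse(b):
    """Sum of squared errors of the model sin(b * x) on the synthetic data."""
    return sum((math.sin(b * x) - y) ** 2 for x, y in zip(XS, YS))


def sse_gradient(b):
    """Analytic derivative of the SSE with respect to b."""
    return sum(
        2.0 * (math.sin(b * x) - y) * x * math.cos(b * x)
        for x, y in zip(XS, YS)
    )


def fit(b_start, learning_rate=2e-4, steps=5000):
    """Iteratively refine b by gradient descent from a user-supplied start."""
    b = b_start
    for _ in range(steps):
        b -= learning_rate * sse_gradient(b)
    return b
```

Calling `fit(1.8)` starts inside the basin of the global minimum and recovers b ≈ 2, whereas `fit(6.0)` settles in a nearby local minimum with a much larger SSE. The model and data are identical in both runs; only the starting value differs.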

In recent years, more advanced computer-aided pattern-recognition and data-classification techniques, such as artificial neural networks (ANNs) and support vector machines (SVMs), have been used to develop behavioral models in various civil engineering problems (e.g., [5], [6], [7], [8]). An ANN discovers patterns and approximates relationships in data based on a supervised learning algorithm, a form of regression that relies on the inputs and outputs of a training set [9]. Although ANNs are generally successful in prediction, they are only appropriate for use as part of a computer program, not for the development of practical prediction equations. In addition, an ANN requires the data to be normalized beforehand to suit the chosen activation function, requires the user to determine the best network architecture, and can have a complex structure and a high potential for over-fitting [10]. SVMs, on the other hand, are efficient kernel-based methods that solve a convex constrained quadratic programming (CCQP) problem to find a set of parameters. However, selecting the appropriate kernel for an SVM can be a challenge, and the results are not transparent [11].

One powerful technique for developing nonlinear behavioral models in the case of complex optimization problems is genetic programming (GP) [12]. GP is a specialized subset of genetic algorithms (GAs) [13], which are based on the principles of genetics and natural selection. GP and its variants have been successfully used for solving a number of different civil engineering problems (e.g., [14], [15]). Multi-gene genetic programming (MGGP) is a robust variant of GP that combines the ability of standard GP to construct the model structure with the capability of traditional regression to estimate parameters. In this technique, each symbolic model (and each member of the GP population) is a weighted linear combination of low-order nonlinear transformations of the input variables. In contrast to standard symbolic regression, MGGP allows the evolution of accurate and relatively compact mathematical models. Even when large numbers of input variables are used, this technique can automatically select the variables that contribute most to the model, formulate the structure of the model, and solve for the coefficients in the regression equation [16], [17], [18], [19]. Therefore, unlike other techniques such as traditional regression analysis or ANN, the MGGP technique does not require the user to pre-define the formulation structure of the model or select any existing form of the relationship for optimization [3], [4], which makes it more practical for complex optimization problems. Recent studies also show that, compared to other novel computer-based techniques such as SVM and particle swarm model selection, GP performs better in problems with high dimensionality and large training sets [20].
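
The split between structure search and parameter estimation can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the genes (the nonlinear transformations) have already been produced by the evolutionary search and that the gene weights are obtained by ordinary least squares via the normal equations. The helper names `solve_linear_system`, `fit_gene_weights`, and `predict` are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_gene_weights(genes, inputs, targets):
    """Least-squares weights [bias, w1, ..., wG] for a fixed set of genes."""
    # Design matrix: a constant column followed by each gene's output.
    design = [[1.0] + [g(*row) for g in genes] for row in inputs]
    k = len(genes) + 1
    gram = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    moment = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(gram, moment)


def predict(genes, weights, row):
    """Evaluate the weighted linear combination of genes for one input row."""
    return weights[0] + sum(w * g(*row) for w, g in zip(weights[1:], genes))
```

For example, with the two hypothetical genes `lambda x1, x2: x1 * x2` and `lambda x1, x2: x1 ** 2`, and targets generated from 1 + 2·x1·x2 + 0.5·x1², `fit_gene_weights` recovers weights close to [1, 2, 0.5]. In an actual MGGP run the genes themselves would be evolved, and only this weight-fitting step would be solved analytically.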

Typically, standard GP algorithms (including MGGP) optimize only one objective in the model development process: maximizing the goodness-of-fit to the training data. The main drawback of using a single objective is that the developed models can become overly complex. In other words, minimizing the complexity of the developed models is another important objective to consider. In this study, a new algorithm called multi-objective genetic programming (MOGP) is developed. MOGP is an extension of standard GP algorithms that can simultaneously solve for two competing objectives (i.e., maximizing the goodness-of-fit and minimizing the model complexity). By performing multi-gene symbolic regression via MOGP, one can develop parsimonious and accurate data-based models for complex engineering systems.
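
The trade-off between the two objectives can be sketched with a simple non-dominated sort. The code below is an illustrative assumption rather than the procedure used in the paper: each candidate model is reduced to a hypothetical pair of scores, `complexity` and `error`, both to be minimized (a lower error stands in for a higher goodness-of-fit), and candidates are peeled off into successive Pareto fronts.

```python
def dominates(a, b):
    """True if model a is no worse than b in both objectives and better in one."""
    no_worse = a["complexity"] <= b["complexity"] and a["error"] <= b["error"]
    better = a["complexity"] < b["complexity"] or a["error"] < b["error"]
    return no_worse and better


def non_dominated_sort(models):
    """Group models into successive Pareto fronts (front 0 is the best)."""
    remaining = list(models)
    fronts = []
    while remaining:
        # A model belongs to the current front if nothing left dominates it.
        front = [
            m for m in remaining
            if not any(dominates(other, m) for other in remaining)
        ]
        fronts.append(front)
        front_ids = {id(m) for m in front}
        remaining = [m for m in remaining if id(m) not in front_ids]
    return fronts
```

Given candidates with (complexity, error) scores of A = (3, 0.30), B = (5, 0.20), C = (9, 0.19), D = (6, 0.25), and E = (10, 0.40), the first front contains A, B, and C, because none of them is beaten on both objectives at once. D is dominated by B, and E is dominated by every other candidate. A final model would then be chosen from the first front according to how much complexity the user is willing to accept.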

This paper demonstrates the feasibility of using MOGP for modeling complex nonlinear civil engineering systems. Two objectives are considered for optimization through MOGP: 1) maximization of goodness-of-fit and 2) minimization of model complexity. As an illustration, the capability of this technique is demonstrated by developing a simple and accurate model (referred to as the “genetic programming based creep model” or “G-C model” in this study) for predicting a complex civil engineering phenomenon: the time-dependent total creep compliance of concrete [21]. A large experimental database selected from Northwestern University's Infrastructure Technology Institute (NU-ITI) database [22] is used for the G-C model development; as such, the proposed model is valid for a wide range of structural properties. The multiple imputation method [23] is used to deal with missing data so that as much of the collected data as possible can be incorporated in the model development. The predictors are selected from the literature and consist of parameters that have been found to influence the total creep of concrete (such as relative humidity, curing period, etc.). In this study, some schemes have also been used to handle Big Data in genetic programming. The model selection procedure is conducted automatically by MOGP, which selects the predictors that contribute most in a statistical sense in order to obtain an accurate, unbiased, and parsimonious model. To evaluate the capability of the G-C model, its accuracy is compared with that of other existing models in terms of various statistical indicators.

Section snippets

Multi-gene symbolic regression

Genetic algorithm (GA) [13] and genetic programming (GP) [12] are two specific types of evolutionary algorithms that have been used in a wide range of practical problems in different fields, such as optimizing a fixed set of variables or finding a global optimum solution [24]. GA is a traditional optimization technique that uses a fixed-length linear representation with binary encoding of all parameters; thus, the output of the GA is a string of numbers. Compared with the GA approach, GP solves

Applying MOGP to predict the time-dependent total creep for concrete

Concrete creep is defined as the time-dependent increase of strain in hardened concrete (in excess of shrinkage) subjected to a sustained stress; it has a direct influence on prestress losses of pretensioned concrete members and the long-term deflection of girders [29]. Furthermore, it is well known that in concrete repairs, cracking due to restrained shrinkage can reduce the performance of a structure and create a direct path for penetration of corrosive ions into the concrete that can in turn

Summary and conclusions

This paper proposes a multi-objective genetic programming (MOGP) technique for the modeling of complex engineering systems. The proposed technique can automatically select the most significant variables in the model, formulate the model structure, and solve the unknown parameters of the regression equation, while simultaneously optimizing for both accuracy and complexity. To handle Big Data, some schemes (such as parallel processing) have been used in the proposed MOGP technique. As an

Acknowledgments

This material is based in part upon work supported by the National Science Foundation (NSF) under Cooperative Agreement No. DBI-0939454. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF.

References (47)

  • M.F. Metenidis et al.

    A novel genetic programming approach to nonlinear system modelling: application to the DAMADICS benchmark problem

    Eng. Appl. Artif. Intell.

    (2004)
  • A.H. Gandomi et al.

    A new multi-gene genetic programming approach to nonlinear system modeling. Part I: materials and structural engineering problems

    Neural Comput. & Applic.

    (2012)
  • A.H. Gandomi et al.

    A new multi-gene genetic programming approach to non-linear system modeling. Part II: geotechnical and earthquake engineering problems

    Neural Comput. & Applic.

    (2012)
  • J. Karthikeyan et al.

    Artificial neural network for predicting creep and shrinkage of high performance concrete

    J. Adv. Concr. Technol.

    (2008)
  • M. Moini et al.

    Effect of mixture temperature on slump flow prediction of conventional concretes using artificial neural networks

    Aust. J. Civ. Eng.

    (2012)
  • M. Moini et al.

    Concrete workability

  • S. Karamizadeh et al.

    Advantage and drawback of support vector machine functionality

  • J.R. Koza

    Genetic programming: on the programming of computers by means of natural selection

    (1992)
  • J.H. Holland

    Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence

    (1992)
  • B. Kiani et al.

    New formulation of compressive strength of preformed-foam cellular concrete: an evolutionary approach

    J. Mater. Civ. Eng.

    (2016)
  • D. Searson et al.

    Co-evolution of non-linear PLS model components

    J. Chemom.

    (2007)
  • D.P. Searson et al.

    GPTIPS: an open source genetic programming toolbox for multigene symbolic regression

  • C. Hii et al.

    Evolving toxicity models using multigene symbolic regression and multiple objectives

    Int. J. Mach. Learn. Comput.

    (2011)