Conservative strategy-based ensemble surrogate model for optimal groundwater remediation design at DNAPLs-contaminated sites
Introduction
Dense nonaqueous-phase liquids (DNAPLs) are now frequently detected in groundwater throughout the world because of the widespread use, improper disposal, and accidental spills and leaks of petrochemical products (Kueper and McWhorter, 1991). Because of their low solubility in water, low mobility and high density, DNAPLs may remain in aquifers for long periods and thus become long-term, continuous sources of groundwater contamination (Qin et al., 2007). Surfactant-enhanced aquifer remediation (SEAR), an enhancement of the conventional pump-and-treat technique, is a promising way to remove DNAPLs from aquifers (Schaerlaekens et al., 2005). Adding surfactants to the water increases the solubility and mobility of DNAPLs in an aquifer (Delshad et al., 1996), which makes SEAR more efficient than the conventional pump-and-treat technique. Because the SEAR process is costly, optimizing the remediation design for cost-effectiveness is of great value.
Simulation-optimization (S/O) techniques have been used extensively to solve such problems (Jiang et al., 2015, Luo et al., 2013). With such techniques, the numerical simulation model may be called thousands of times before the optimal design is obtained, which is computationally expensive and may be prohibitive (Hou et al., 2015). Replacing the computationally expensive simulation model with a surrogate (also known as a meta-model or proxy model) has therefore become commonplace.
However, no matter how well the surrogate model approximates the simulation model, errors caused by surrogate-modeling uncertainty remain. In constrained optimization in which the constraints are surrogate models, the solution obtained may be infeasible because of these errors (Viana et al., 2010). He et al. (2010) regarded the error between the simulation model and the surrogate model as a stochastic variable and adopted the chance-constrained programming (CCP) method to incorporate it into the optimization model for groundwater remediation design. Before this method is adopted, however, the hypotheses that the surrogate errors are normally distributed with zero mean should be tested, and these hypotheses do not always hold.
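To make the role of the normality and zero-mean hypotheses concrete, the following minimal sketch shows a generic deterministic equivalent of a chance constraint on the removal rate. It is an illustration under stated assumptions, not the formulation of He et al. (2010): the error is assumed to be defined as simulation output minus surrogate output, and the function name `ccp_threshold` and its parameters `r_min`, `sigma`, `alpha` and `mu` are hypothetical.

```python
from statistics import NormalDist


def ccp_threshold(r_min, sigma, alpha, mu=0.0):
    """Smallest surrogate prediction s such that P(s + e >= r_min) >= alpha.

    Assumes (hypothetically) that the surrogate error e = simulation - surrogate
    is normally distributed with mean mu and standard deviation sigma.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    # Standard normal quantile for the required confidence level.
    z_alpha = NormalDist().inv_cdf(alpha)
    # P(e >= r_min - s) >= alpha  <=>  s >= r_min - mu + z_alpha * sigma
    return r_min - mu + z_alpha * sigma


if __name__ == "__main__":
    # Illustrative numbers only: minimum removal rate 0.80, error std 0.02.
    for level in (0.5, 0.9, 0.95):
        print(level, round(ccp_threshold(0.80, 0.02, level), 4))
```

Under these assumptions the stochastic constraint reduces to an ordinary constraint on the surrogate output with a tightened bound. If the errors are skewed or have a nonzero mean that is ignored, the tightened bound no longer corresponds to the stated confidence level, which is why the hypotheses need testing first.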
Recently, researchers have focused on conservative strategy-based surrogate models (also called conservative surrogates), which push the optimal solution into the feasible region (Pan et al., 2012). In many engineering problems, there is an incentive to obtain approximations that are as close as possible to the actual response but err on the safe side (Picheny et al., 2008). In groundwater remediation design optimization, such a response may be the contaminant removal rate, which is constrained by a minimum allowable value and therefore must not be overestimated if failure is to be avoided. In this paper, we call a surrogate model conservative when its estimates are lower than the true responses; that is, conservative surrogates tend to underestimate target values. In contrast, general surrogate models are unbiased: their estimates are equally likely to be lower or higher than the actual value (Pan et al., 2012). To date, conservative surrogate models have been used in structural analysis in vehicle engineering (Zhu et al., 2013) and aircraft engineering (Acar et al., 2007), but they have not previously been applied to the optimization of groundwater remediation design.
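The definition of conservativeness used here can be illustrated with a short sketch. It uses one simple device, a constant safety margin subtracted from the surrogate output and chosen from validation errors; this is an assumption made for illustration, and the specific conservative strategy adopted in the paper is not shown in this excerpt. The function names, the `target` parameter and the sample values are hypothetical.

```python
import math


def conservativeness(y_true, y_pred, margin=0.0):
    """Fraction of points where the shifted prediction does not exceed the true response."""
    hits = sum((p - t) <= margin for p, t in zip(y_pred, y_true))
    return hits / len(y_true)


def safety_margin(y_true, y_pred, target=0.9):
    """Smallest non-negative constant m such that y_pred - m <= y_true
    on at least a fraction `target` of the validation points."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must lie in (0, 1]")
    # Positive error means the surrogate overestimates the true response.
    errors = sorted(p - t for p, t in zip(y_pred, y_true))
    k = math.ceil(target * len(errors))  # points that must be underestimated
    return max(errors[k - 1], 0.0)


if __name__ == "__main__":
    # Illustrative removal rates only, not data from the study.
    simulated = [0.80, 0.85, 0.90, 0.75, 0.70]
    surrogate = [0.82, 0.84, 0.93, 0.74, 0.71]
    m = safety_margin(simulated, surrogate, target=0.8)
    print("margin:", m)
    print("before:", conservativeness(simulated, surrogate))
    print("after:", conservativeness(simulated, surrogate, margin=m))
```

A larger margin raises the fraction of underestimated points but also moves the surrogate further from the true response, which is the trade-off between safety and accuracy described above.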
Many techniques for surrogate modeling have been proposed, such as artificial neural networks (ANNs) (Luo et al., 2013), Kriging (KRG) (Zhao et al., 2016), support vector regression (SVR) (Ouyang et al., 2017), and extreme learning machines (ELMs) (Jiang et al., 2015). More recently, multi-gene genetic programming (MGGP) (Hinchliffe et al., 1996, Searson et al., 2007) has been designed to develop the input–output relationship of a system, and has attracted the attention of many researchers across a broad range of fields (Pan et al., 2013, Pandey et al., 2015, Mohammadzadeh et al., 2016). The main advantage of MGGP is its ability to develop a compact and explicit prediction equation in terms of different model variables without assuming a prior form of the existing relationships (Muduli and Das, 2015). A previous study (Ouyang et al., 2017) demonstrated the superiority of MGGP over KRG and SVR. Recently, researchers have tended to combine multiple surrogate models in ensembles instead of selecting only the best model and discarding the rest (Acar and Rais-Rohani, 2008, Goel et al., 2006, Viana et al., 2009). However, the combination of MGGP surrogate modeling with other techniques has not previously been evaluated.
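The idea of combining surrogates rather than keeping only the best one can be sketched as a weighted average of stand-alone predictions. The inverse-RMSE weighting below is a simple heuristic chosen for illustration; it is an assumption and not necessarily the weighting scheme used in this study or in the cited ensemble papers. All function names and numbers are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted responses."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def inverse_rmse_weights(rmses):
    """Weights proportional to 1/RMSE, normalized to sum to one."""
    if any(r <= 0.0 for r in rmses):
        raise ValueError("all RMSE values must be positive")
    inverse = [1.0 / r for r in rmses]
    total = sum(inverse)
    return [v / total for v in inverse]


def ensemble_predict(predictions, weights):
    """Weighted average of several surrogates' predictions, point by point.

    `predictions` holds one list of predicted values per surrogate.
    """
    n_points = len(predictions[0])
    return [
        sum(w * model[i] for w, model in zip(weights, predictions))
        for i in range(n_points)
    ]


if __name__ == "__main__":
    # Illustrative validation data for two hypothetical surrogates.
    y_val = [0.80, 0.85, 0.90]
    model_a = [0.81, 0.86, 0.89]
    model_b = [0.77, 0.88, 0.94]
    w = inverse_rmse_weights([rmse(y_val, model_a), rmse(y_val, model_b)])
    print("weights:", w)
    print("ensemble:", ensemble_predict([model_a, model_b], w))
```

Because the weights are normalized, the ensemble prediction always lies between the smallest and largest stand-alone predictions at each point, and the more accurate surrogate contributes more.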
The aim of the present study is to determine an optimal groundwater remediation design for DNAPL-contaminated sites at minimum cost under certain constraints. To address the abovementioned concerns, this study 1) combines an MGGP surrogate model with other surrogate models to form ensembles and compares them, and 2) adopts a conservative strategy to address surrogate-modeling uncertainty and thereby avoid failure. The CCP method is used for comparison.
Section snippets
MGGP
MGGP is a robust variant of genetic programming (GP) and is designed to generate empirical mathematical models of the input–output relationship from the datasets. GP is based on the evaluation of a single gene, whereas MGGP is constructed from a number of genes (Gandomi and Alavi, 2012, Searson et al., 2007). Each gene evolved by MGGP is a structured tree composed of functions and terminals (Searson et al., 2007), as can be seen in Fig. 1. The function set can include elements such as
Site overview
The application of the proposed approach was analyzed on a hypothetical perchloroethylene (PCE)-contaminated site. UTCHEM (Center for Petroleum and Geosystems Engineering, 2000) software developed by the University of Texas was used to simulate the SEAR processes. The studied site was a three-dimensional domain with a horizontal area of 100 × 70 m² and a depth of 20 m. The simulation domain consisted of 20 layers; each layer was discretized into 50 × 30 grid blocks. Each grid block had dimensions of 2
Analysis of surrogate models
First, three stand-alone surrogate models were constructed with MGGP, KRG and SVR. Two statistical metrics, the coefficient of determination (R²) (Eq. 18) and the root mean square error (RMSE) (Eq. 19), were used to assess model performance. The value of R² indicates how well the surrogate approximates the simulation model; higher values indicate better approximation. The RMSE is an indicator of the precision of the surrogate model; smaller values indicate greater accuracy. These metrics can be
Conclusions
In this study, a conservative strategy was proposed to address surrogate-modeling uncertainty when using the surrogate-based simulation-optimization technique to identify the optimal groundwater remediation design at DNAPLs-contaminated sites. In addition, the CCP method was employed for comparison. To construct a surrogate model that has favorable performance with both training and testing data, MGGP was combined with KRG, SVR and a combination of both methods to form ensemble
Acknowledgments
This work was supported by the National Natural Science Foundation of China (41372237) and the National Key Research and Development Program of China (No. 2016YFC0402804). The authors thank the editor and anonymous reviewers for their insightful comments and suggestions.
References (38)
- et al. A compositional simulator for modeling surfactant enhanced aquifer remediation, 1 formulation. J. Contam. Hydrol. (1996)
- et al. A stochastic optimization model under modeling uncertainty and parameter certainty for groundwater remediation design—part I. Model development. J. Hazard. Mater. (2010)
- et al. Ensemble of surrogates-based optimization for identifying an optimal surfactant-enhanced aquifer remediation strategy at heterogeneous DNAPL-contaminated sites. Comput. Geosci. (2015)
- et al. An efficient algorithm for constructing optimal design of computer experiments. J. Stat. Plan. Infer. (2005)
- et al. Model uncertainty of SPT-based method for evaluation of seismic soil liquefaction potential using multi-gene genetic programming. Soils Found. (2015)
- et al. Chance-constrained multi-objective optimization of groundwater remediation design at DNAPLs-contaminated sites using a multi-algorithm genetically adaptive method. J. Contam. Hydrol. (2017)
- et al. Multi-gene genetic programming based predictive models for municipal solid waste gasification in a fluidized bed gasifier. Bioresour. Technol. (2015)
- The convergence of variable metric methods for non-linearly constrained optimization calculations. Nonlinear Program. (1978)
- et al. Simulation-based process optimization for surfactant-enhanced aquifer remediation at heterogeneous DNAPL-contaminated sites. Sci. Total Environ. (2007)
- et al. A kriging surrogate model coupled in simulation–optimization approach for identifying release history of groundwater sources. J. Contam. Hydrol. (2016)
- Lightweight design of vehicle parameters under crashworthiness using conservative surrogates. Comput. Ind.
- Comparing effectiveness of measures that improve aircraft structural safety. J. Aerosp. Eng.
- Ensemble of metamodels with optimized weight factors. Struct. Multidiscip. Optim.
- Neural Networks for Pattern Recognition
- UTCHEM 9.0 Volume I
- LIBSVM: A Library for Support Vector Machines
- Chance-constrained programming. Manag. Sci.
- A new multi-gene genetic programming approach to nonlinear system modeling. Part I: Materials and structural engineering problems. Neural Comput. & Applic.
- A modified multi-gene genetic programming approach for modelling true stress of dynamic strain aging regime of austenitic stainless steel 304. Meccanica