A genetic programming-based model for drag coefficient of emergent vegetation in open channel flows

https://doi.org/10.1016/j.advwatres.2020.103582

Highlights

  • A new Ergun-like model for estimating drag coefficient of emergent vegetation is developed.

  • Machine learning technology is introduced into the study of drag coefficient.

  • The dependence of the two parameters in the Ergun equation on vegetation characteristics is determined by genetic programming.

Abstract

The estimation of drag exerted by vegetation is of great interest because of its importance in assessing the impact of vegetation on the hydrodynamic processes in aquatic environments. In the current research, genetic programming (GP), a machine learning (ML) technique based on natural selection, was adopted to search for a robust relationship between the bulk drag coefficient (Cd) for arrays of rigid circular cylinders representing emergent vegetation and the blockage ratio (ψ), vegetation density (λ) and pore Reynolds number (Rep), based on published data. We utilize a data set that spans a wide range of each parameter involved so that all possible dependencies are covered. A new predictor, which shares the same form as the Ergun-derived formula, was obtained without any form being pre-specified before searching. The dependence of the two parameters in the Ergun equation on vegetation characteristics was also estimated by GP. This new Cd predictor for emergent vegetation has a relatively concise form and exhibits a considerable improvement in prediction ability relative to existing predictors.

Introduction

Aquatic vegetation plays an important role in river ecosystems and greatly influences hydrodynamic processes within aquatic environments because of the additional resistance exerted by emergent and submerged vegetation (Etminan et al., 2017; Green, 2006; Huai et al., 2012; Nepf, 1999; Stoesser et al., 2010). The additional drag caused by vegetation may considerably reduce the flow velocity of the river, raise the water level and increase the risk of flooding. Moreover, by modifying the velocity profiles and turbulence structures in vegetated channels, aquatic vegetation helps to reduce sediment erosion and suspended transport. Thus, a comprehensive understanding of vegetation resistance is essential to assessing the impact of vegetation on these hydrodynamic processes in aquatic environments. Many researchers have a strong interest in estimating the drag generated by vegetation because of its great significance in ecological and environmental engineering (Mattis et al., 2012; Tang et al., 2014).

In the past few decades, research on vegetation drag has mostly been carried out in laboratories. Given the complex geometry of real vegetation in the field, aquatic plants are usually modeled as an array of rigid circular cylinders with uniform diameter, $d$. This simplified geometry has been widely adopted (Hamidifar et al., 2016; Li et al., 2018; Li and Zeng, 2009; Liu et al., 2008). The effect of canopies on the flow is mainly determined by the additional resistance, which can be quantified by the drag coefficient, defined as:

$$C_d = \frac{F_D}{\frac{1}{2}\rho A U_{ref}^2} \tag{1}$$

where $F_D$ is the vegetation-induced drag, $\rho$ is the water density, $A$ is the frontal area of the vegetation stems (for an isolated circular cylinder, $A = dL$, where $L$ is the length of the cylinder immersed in water), and $U_{ref}$ is a reference flow velocity. Usually the mean velocity approaching the cylinders is used as the reference velocity, which is equal to the upstream free flow velocity in the case of an isolated cylinder. For emergent vegetation, given the impact of vegetation distribution, using the upstream free flow velocity (herein referred to as the "bulk flow velocity", $U$) as the reference velocity may be reasonable only for very sparse vegetation. The mean pore velocity, $U_p = U/(1-\lambda) = Q/(Wh(1-\lambda))$, is closer than the bulk flow velocity to the velocity that actually approaches the vegetation stems, because the time-averaged longitudinal flow velocity is almost uniform in the vertical direction within emergent vegetation (Liu et al., 2008). Here, $W$ is the width of the channel, $h$ is the flow depth, $Q$ is the channel flow rate, and $\lambda$ is the dimensionless vegetation density ($\lambda = \pi n d^2/4$, where $d$ is the cylinder diameter and $n$ is the number of cylinders per unit bed area). Therefore, in most studies, the mean pore velocity $U_p$ is used as the reference flow velocity (Kothyari et al., 2009; Tanino, 2012; Tanino and Nepf, 2008).
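The definitions in this paragraph can be sketched in a few lines of Python. This is only an illustration: the function names and the sample numbers are hypothetical and are not taken from the paper or its data sets.

```python
import math


def vegetation_density(n, d):
    """Dimensionless density lambda = pi * n * d^2 / 4 (n in stems per m^2, d in m)."""
    return math.pi * n * d ** 2 / 4.0


def pore_velocity(Q, W, h, lam):
    """Mean pore velocity U_p = Q / (W * h * (1 - lambda))."""
    return Q / (W * h * (1.0 - lam))


def drag_coefficient(F_D, rho, A, U_ref):
    """Eq. (1): C_d = F_D / (0.5 * rho * A * U_ref^2)."""
    return F_D / (0.5 * rho * A * U_ref ** 2)


# Hypothetical flume values, chosen only to exercise the functions.
lam = vegetation_density(n=400.0, d=0.01)
U_p = pore_velocity(Q=0.01, W=0.5, h=0.2, lam=lam)
print(lam, U_p)
```

Because $U_p$ exceeds the bulk velocity $U$ by the factor $1/(1-\lambda)$, a drag coefficient referenced to $U_p$ is smaller than one referenced to $U$ for the same measured force.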

Usually, the drag coefficient Cd for emergent vegetation is obtained from laboratory experiments either by directly measuring the drag forces using load cells (strain gauges) or by equating driving forces with the resistance caused by the cylinders. The first method takes two forms. In one form, the drag force generated by the whole cylinder array is directly measured with a drag plate (Tinoco and Cowen, 2013; see Fig. 1 (b)), which can better reflect the bulk drag characteristics of the array. The resistance measured by this method includes both vegetation and bed resistance. In the numerical simulation results of Stoesser et al. (2010), bed shear stress accounts for only approximately 0.82%–6% of the total flow resistance; thus, bed resistance can be ignored. However, the disadvantage of this method is the relatively high requirement for experimental equipment. A simpler alternative is to measure the drag force generated by a single cylinder in the array (Ishikawa et al., 2000; Kothyari et al., 2009; van Rooijen et al., 2018; see Fig. 1 (a)). That is, we only need to select one cylinder in the area where the flow is fully developed, measure its drag, and use it to represent the drag characteristics of the entire cylinder array. More details about these two methods can be found in Tinoco and Cowen (2013) and Kothyari et al. (2009), respectively. Once the drag force is measured, Cd can be obtained from Eq. (1).

When the experimental conditions do not allow the direct measurement of the drag force, $C_d$ can be derived from the force balance per fluid mass within the vegetated channel, which can be expressed as (Nepf, 1999):

$$(1-\lambda) C_b U_p^2 + \frac{1}{2} C_d a h U_p^2 = -g h (1-\lambda) \frac{\partial \eta}{\partial x} \tag{2}$$

where the first term on the left is the resistance exerted by the channel bed, with a bed friction coefficient $C_b$; the second term on the left is the drag force contributed by the vegetation stems; $\lambda$ is the vegetation density; $a$ is the frontal area per unit volume (for circular cylinders, $\lambda = \pi a d/4$ and $a = nd$); $h$ is the flow depth; $g$ is the acceleration due to gravity; $U_p$ is the mean pore velocity; $x$ is the streamwise direction; and $\eta$ is the water surface elevation. As mentioned above, the bed shear stress accounts for only 0.82%–6% of the total flow resistance; therefore, the first term on the left of Eq. (2) can be omitted. Then, $C_d$ can be written as:

$$C_d = \frac{2 g S (1-\lambda)}{a U_p^2} \tag{3}$$

where $S$ is the energy slope, $S = -\partial\eta/\partial x$. When uniform flow is reached in the experiment, $S$ is equal to the bottom slope of the channel (e.g., Cheng and Nguyen, 2011; James et al., 2001, 2004; Kim and Stoesser, 2011). For non-uniform flow over a horizontal channel bed, $S$ is equal to $-\partial h/\partial x$ (e.g., Ferreira et al., 2009; Meftah and Mossa, 2013; Nepf, 1999; Tanino and Nepf, 2008; Tanino, 2012).
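As a rough numerical sketch of the slope-based estimate, the function below evaluates Eq. (3) with the bed-friction term neglected, taking $S$ as a positive energy slope and $a = nd$. The function names and the input values are hypothetical, chosen only for illustration.

```python
def frontal_area_per_volume(n, d):
    """a = n * d for circular cylinders (n in stems per m^2, d in m), units 1/m."""
    return n * d


def cd_from_slope(S, lam, a, Up, g=9.81):
    """Eq. (3): C_d = 2 g S (1 - lambda) / (a * U_p^2), bed friction neglected."""
    return 2.0 * g * S * (1.0 - lam) / (a * Up ** 2)


# Hypothetical uniform-flow run: bottom slope 0.001, 400 stems/m^2 of 1 cm diameter.
a = frontal_area_per_volume(n=400.0, d=0.01)
print(cd_from_slope(S=0.001, lam=0.0314, a=a, Up=0.07))
```

Since $U_p$ enters squared, a small error in the measured discharge or depth has a proportionally larger effect on the inferred $C_d$ than the same relative error in the slope.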

Moreover, the drag coefficient can also be obtained directly from high-resolution numerical simulations, such as wall-resolving Large Eddy Simulation (LES) and Direct Numerical Simulation (DNS) (Chang et al., 2018; Etminan et al., 2017; Nicolle and Eames, 2011; Stoesser et al., 2010).

However, physical measurements and numerical simulations are often unavailable or infeasible in practical ecological and environmental engineering applications. Therefore, Cd must be estimated. Researchers have approached the estimation of the drag coefficient for emergent vegetation from different angles: pure theoretical derivation (Meftah and Mossa, 2013), regression analysis on the basis of a pre-given form obtained from theoretical derivation (Sonnenwald et al., 2017, 2018; Tanino and Nepf, 2008), and multi-parameter regression analysis without any physical basis (Kothyari et al., 2009).

The drag coefficient predictor for an isolated cylinder is well established and can be expressed as (White, 1991):

$$C_d = 1 + 10\,Re^{-2/3} \tag{4}$$

where the cylinder Reynolds number is $Re = \rho U d/\mu$ and $\mu$ is the dynamic viscosity. Note that for an isolated cylinder, $\lambda = 0$, i.e., $U = U_p$. However, the flow structure in a multi-cylinder array is much more complicated than that around a single isolated cylinder because of the strong wake-wake and wake-cylinder interactions, especially when the vegetation density is not very small. Thus, Eq. (4) may not be directly used to estimate the drag coefficient of a multi-cylinder array. Based on Eq. (4), different $C_d$ predictors for cylinder arrays can be obtained by defining the Reynolds number and drag coefficient with various velocity and length scales (i.e., reference velocity and reference length). In emergent canopies, as mentioned above, previous studies have chosen the mean pore velocity, $U_p$, as the reference velocity scale. However, through LES, Etminan et al. (2017) found that the "constriction cross-section velocity", $U_c$, is the velocity scale that actually governs wake pressure and thus vegetation resistance. Therefore, Etminan et al. (2017) and van Rooijen et al. (2018) expressed the canopy drag coefficient using the new velocity scale as:

$$C_{d,c} = 1 + 10\,Re_c^{-2/3} \tag{5}$$

where $C_{d,c}$ and $Re_c$ are the canopy drag coefficient and cylinder Reynolds number based on $U_c$, with $U_c = (1-\lambda)U_p/\left(1-\sqrt{4\lambda/(\beta\pi)}\right) = U/(1-d/L_y)$. Here $\beta$ is the ratio of the lateral distance between adjacent stems at the same streamwise location, $L_y$, to the distance between two rows of vegetation stems in the streamwise direction, $L_x$ (as shown in Fig. 5).
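The isolated-cylinder predictor and the two expressions for the constriction velocity can be checked against each other numerically. The sketch below assumes a regular array in which each stem occupies a bed area $L_x L_y$, so that $\lambda = \pi d^2/(4 L_x L_y)$ and $d/L_y = \sqrt{4\lambda/(\beta\pi)}$; that geometric assumption, the function names and the sample spacing are illustrative and not taken from the paper.

```python
import math


def cd_white(re):
    """Isolated-cylinder form: C_d = 1 + 10 * Re^(-2/3)."""
    return 1.0 + 10.0 * re ** (-2.0 / 3.0)


def uc_from_spacing(U, d, Ly):
    """Constriction velocity from the bulk velocity: U_c = U / (1 - d / L_y)."""
    return U / (1.0 - d / Ly)


def uc_from_density(Up, lam, beta):
    """Same velocity from U_p, lambda and beta = L_y / L_x."""
    return (1.0 - lam) * Up / (1.0 - math.sqrt(4.0 * lam / (beta * math.pi)))


# Hypothetical regular array: 1 cm stems on a 5 cm x 5 cm grid.
d, Lx, Ly, Up = 0.01, 0.05, 0.05, 0.2
lam = math.pi * d ** 2 / (4.0 * Lx * Ly)
U = (1.0 - lam) * Up  # bulk velocity implied by U_p = U / (1 - lambda)
print(uc_from_spacing(U, d, Ly), uc_from_density(Up, lam, Ly / Lx))
```

Under the stated geometry the two routes give the same $U_c$, which is a quick way to confirm that the square root belongs in the density-based expression.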

Further, Cheng (2013) re-fitted the empirical formula of $C_d$ for an isolated circular cylinder using the available experimental data. Subsequently, a pseudofluid model was developed to define a non-dimensional cylinder diameter, $d_* = (2 C_d Re_p^2/\pi)^{1/3}$. With this new dimensionless cylinder diameter, Cheng (2013) extended the drag coefficient formula for an isolated cylinder to the cylinder array. Finally, the drag coefficient formula for a cylinder array can be written as:

$$C_d = \left( 11\left(\frac{(1+S)Re_p}{1+80\lambda}\right)^{-0.75} + 0.9\left[1-\exp\left(-\frac{1000(1+80\lambda)}{(1+S)Re_p}\right)\right] + 1.2\left[1-\exp\left(-\left(\frac{(1+S)Re_p}{4500(1+80\lambda)}\right)^{0.7}\right)\right] \right)\frac{1+S}{1-\lambda} \tag{6}$$

where $Re_p$ is the cylinder Reynolds number based on the mean pore velocity $U_p$ and cylinder diameter $d$, i.e., $Re_p = \rho U_p d/\mu$.
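A compact way to read the array formula is as the isolated-cylinder fit evaluated at a rescaled Reynolds number and then multiplied by a correction factor. The sketch below follows that reading. It assumes that the minus signs in the exponents and in the exponentials are as in the usual form of this fit and that the trailing factor is $(1+S)/(1-\lambda)$; the function names are hypothetical.

```python
import math


def cd_isolated(re):
    """Three-term isolated-cylinder fit (assumed form of the re-fitted formula)."""
    return (11.0 * re ** -0.75
            + 0.9 * (1.0 - math.exp(-1000.0 / re))
            + 1.2 * (1.0 - math.exp(-((re / 4500.0) ** 0.7))))


def cd_array(rep, lam, slope):
    """Array form: isolated fit at Re_eff, scaled by (1 + S) / (1 - lambda)."""
    re_eff = (1.0 + slope) * rep / (1.0 + 80.0 * lam)
    return cd_isolated(re_eff) * (1.0 + slope) / (1.0 - lam)


# With lambda = 0 and S = 0 the array form reduces to the isolated-cylinder fit.
print(cd_isolated(1000.0), cd_array(1000.0, 0.0, 0.0))
print(cd_array(1000.0, 0.05, 0.001))
```

Writing the formula this way makes its structure visible: vegetation density lowers the effective Reynolds number by the factor $1+80\lambda$, which is what raises the predicted $C_d$ for denser arrays.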

As mentioned above, by using different velocity and length scales to define the Reynolds number and drag coefficient, different $C_d$ predictors can be obtained. Etminan et al. (2017) and van Rooijen et al. (2018) redefined the Reynolds number and $C_d$ using a new velocity scale, $U_c$, and obtained Eq. (5). Similarly, Cheng and Nguyen (2011) proposed a new length scale, the vegetation-related hydraulic radius, $r_v$, to characterize vegetated channels. The new Reynolds number defined with this length scale is written as:

$$Re_v = \frac{\rho U_p r_v}{\mu} = \frac{\pi(1-\lambda)}{4\lambda} Re_p \tag{7}$$

where $r_v = \pi(1-\lambda)d/(4\lambda)$. With this new Reynolds number, $Re_v$, Cheng and Nguyen (2011) proposed a best-fit function to empirically describe the relationship between $C_d$ and $Re_v$ based on existing experimental data of $C_d$. The best-fit function can be expressed as:

$$C_d = \frac{50}{Re_v^{0.43}} + 0.7\left[1-\exp\left(-\frac{Re_v}{15000}\right)\right] \tag{8}$$
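The conversion from $Re_p$ to $Re_v$ and the best-fit function can be sketched as follows. The signs of the exponents are assumed to be negative, as the garbled source suggests, and the function names and sample inputs are hypothetical.

```python
import math


def re_v(rep, lam):
    """Re_v = pi * (1 - lambda) / (4 * lambda) * Re_p."""
    return math.pi * (1.0 - lam) / (4.0 * lam) * rep


def cd_cheng_nguyen(rev):
    """Best-fit form: C_d = 50 * Re_v^(-0.43) + 0.7 * [1 - exp(-Re_v / 15000)]."""
    return 50.0 * rev ** -0.43 + 0.7 * (1.0 - math.exp(-rev / 15000.0))


# Hypothetical condition: Re_p = 1000 at a density of 5 %.
rev = re_v(1000.0, 0.05)
print(rev, cd_cheng_nguyen(rev))
```

Note that `re_v` divides by $\lambda$, so the length scale is undefined for an isolated cylinder ($\lambda = 0$); this is one way in which such a predictor loses meaning outside the range of the data it was fitted to.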

Ergun (1952) proposed an expression for pressure drop in packed columns, and Tanino and Nepf (2008) related this expression to the drag coefficient, giving:

$$C_d = 2\left(\frac{\alpha_0}{Re_p} + \alpha_1\right) \tag{9}$$

where $\alpha_0$ and $\alpha_1$ are coefficients describing, respectively, the viscous contribution that arises from the viscous shear stress on the cylinder surface and the inertial contribution that arises from the pressure loss in the cylinder wake. Sonnenwald et al. (2018) re-parametrized $\alpha_0$ and $\alpha_1$ in terms of $\lambda$ and $d$ by least-squares curve fitting to available experimental data, and obtained a new function for estimating $C_d$:

$$C_d = 2\left(\frac{6475d + 32}{Re_p} + 17d + 3.2\lambda + 0.5\right) \tag{10}$$

where the coefficients of the $d$ terms must have units of m$^{-1}$ for the expression to remain dimensionless. Tanino and Nepf (2008), Tinoco and Cowen (2013), and Sonnenwald et al. (2017) obtained different expressions with similar forms, which are not listed here.

Kothyari et al. (2009), using their experimental data, proposed a $C_d$ predictor by multi-parameter regression analysis without considering any physical background, written as:

$$C_d = 1.53\left[1 + 0.45\ln(1+100\lambda)\right] Re_p^{-3/50} \tag{11}$$
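For comparison with the other predictors, the regression formula can be evaluated directly. The sketch below assumes the exponent on $Re_p$ is $-3/50$; the function name and the sample inputs are hypothetical.

```python
import math


def cd_kothyari(rep, lam):
    """C_d = 1.53 * [1 + 0.45 * ln(1 + 100 * lambda)] * Re_p^(-3/50)."""
    return 1.53 * (1.0 + 0.45 * math.log(1.0 + 100.0 * lam)) * rep ** (-3.0 / 50.0)


# The weak exponent means C_d changes slowly with Re_p at fixed density.
for rep in (100.0, 1000.0, 10000.0):
    print(rep, cd_kothyari(rep, 0.05))
```

The dependence on $Re_p$ is a pure power law here, so the formula has no finite limit as $Re_p \to 0$ and no viscous term of the Ergun type.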

Obviously, all drag coefficient predictors mentioned above were developed with data-driven methods, such as regression analysis or curve-fitting with pre-specified forms. However, they were obtained within a relatively limited variable space and are not physically sound, so they perform well only in certain cases (as discussed in Section 3.2). When extrapolated to conditions outside the sampled parameter space, they often produce absurd drag coefficient values or become meaningless (e.g., Eqs. (8) and (10)). Generally, researchers analyze the dependence of drag force on flow parameters and vegetation geometry and then subjectively suggest the variables that may affect Cd, or go further and subjectively suggest an analytical expression, i.e., a pre-specified form of the Cd predictor (Sonnenwald et al., 2017, 2018). This means that the predictive ability of the model is closely tied to the reasonableness of the subjective "suggestion". For instance, Sonnenwald et al. (2018) pre-specified that the Cd predictor has the Ergun-like form and assumed that α0 was linearly related to d, and that α1 was linearly related to d and λ. The resulting drag coefficient model mixes dimensional and non-dimensional parameters and may yield unrealistic Cd values (consider a large cylinder with a 1 m diameter), which is dangerous in practice. Moreover, several predictors developed by traditional multi-parameter regression methods are very complicated and not physically interpretable (e.g., Eqs. (6), (8), and (10)), which hinders further understanding of how flow variables and vegetation geometry influence drag force.
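The sensitivity of the dimensional predictor to the cylinder diameter is easy to see numerically. The sketch below evaluates the Ergun-type form $C_d = 2(\alpha_0/Re_p + \alpha_1)$ with $\alpha_0 = 6475d + 32$ and $\alpha_1 = 17d + 3.2\lambda + 0.5$, with $d$ in metres, as read from the Sonnenwald et al. (2018) expression above. The function names and the choices of $Re_p$ and $\lambda$ are hypothetical.

```python
def cd_ergun(rep, alpha0, alpha1):
    """Ergun-type form: C_d = 2 * (alpha0 / Re_p + alpha1)."""
    return 2.0 * (alpha0 / rep + alpha1)


def cd_sonnenwald(rep, lam, d_m):
    """Ergun-type form with alpha0, alpha1 linear in d (metres) and lambda."""
    alpha0 = 6475.0 * d_m + 32.0
    alpha1 = 17.0 * d_m + 3.2 * lam + 0.5
    return cd_ergun(rep, alpha0, alpha1)


# Same Re_p and density, two diameters: a laboratory stem and a 1 m cylinder.
print(cd_sonnenwald(rep=1000.0, lam=0.05, d_m=0.01))
print(cd_sonnenwald(rep=1000.0, lam=0.05, d_m=1.0))
```

At fixed $Re_p$ and $\lambda$ the prediction grows linearly with $d$, so moving from a 1 cm stem to a 1 m cylinder raises the predicted $C_d$ by more than an order of magnitude, which no dimensionless argument would justify.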

However, machine learning (ML), a more advanced data-driven approach, enables the exploration of possible relationships among variables in complicated phenomena on the basis of a provided data set. In the present research, we utilize genetic programming (GP), an ML technique based on natural selection, to develop a predictor of drag coefficient for emergent vegetation directly from previously published experimental data. Compared with the conventional data-driven methods mentioned above, GP does not require researchers to make a relatively subjective analysis of the data set. Following the principle of natural selection, it automatically searches for possible relations between the variables and retains only the solutions with higher accuracy during the evolving process, instead of requiring researchers to subjectively suggest a possible expression relating the drag coefficient to flow variables and vegetation features. At the same time, the GP algorithm can take all variables into the development of a predictor, while leaving the task of identifying the relevant variables to the program (Wang et al., 2017). In contrast, as mentioned above, conventional regression methods usually rely on the researchers' subjective "suggestions". Because the current understanding of the mechanisms that influence drag force is not yet sufficient, these subjective suggestions may lead to a poor predictor, whereas the GP algorithm avoids them. Furthermore, since the time needed to analyze the data set manually is greatly reduced in GP, more time can be devoted to exploring the physical background of Cd (Tinoco et al., 2015; Wang et al., 2017). Thus, the solution given by GP is simple in form and physically sound, and therefore performs well.
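To make the idea of evolving expressions concrete, the following toy sketch runs a mutation-only genetic search over small expression trees. It is not the Eureqa software used in the study and makes no claim to reproduce its algorithm. The target data are synthetic, generated from an Ergun-like form with hypothetical coefficients, and every name in the block is illustrative.

```python
import math
import operator
import random

# Binary operators; division is protected so evaluation never raises.
OPS = {"+": operator.add, "*": operator.mul,
       "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0}


def random_tree(rng, depth=3):
    """Random expression tree over the variable x and random constants."""
    if depth == 0 or rng.random() < 0.3:
        return ("x",) if rng.random() < 0.5 else ("c", rng.uniform(-5.0, 5.0))
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    if tree[0] == "x":
        return x
    if tree[0] == "c":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))


def size(tree):
    """Number of nodes, used as a simple complexity measure."""
    return 1 if tree[0] in ("x", "c") else 1 + size(tree[1]) + size(tree[2])


def mutate(tree, rng):
    """Replace a randomly chosen subtree with a fresh random one."""
    if tree[0] in ("x", "c") or rng.random() < 0.3:
        return random_tree(rng, depth=2)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))


def fitness(tree, data, penalty=0.01):
    """Mean absolute error plus a small complexity penalty (lower is better)."""
    mae = sum(abs(evaluate(tree, x) - y) for x, y in data) / len(data)
    return mae + penalty * size(tree) if math.isfinite(mae) else float("inf")


def evolve(data, generations=40, pop_size=200, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, data))
        history.append(fitness(pop[0], data))
        survivors = pop[: pop_size // 4]  # selection: keep the fittest quarter
        pop = survivors + [mutate(rng.choice(survivors), rng)
                           for _ in range(pop_size - len(survivors))]
    pop.sort(key=lambda t: fitness(t, data))
    history.append(fitness(pop[0], data))
    return pop[0], history


# Synthetic target with hypothetical coefficients: y = 2 * (100 / x + 0.5).
DATA = [(x, 2.0 * (100.0 / x + 0.5)) for x in (10.0, 20.0, 50.0, 100.0, 200.0, 500.0, 1000.0)]
best, history = evolve(DATA)
print(size(best), history[0], history[-1])
```

No functional form is supplied to the search; only the operators, the variable and the data are. The complexity penalty plays the role of the accuracy-versus-size trade-off, and because the fittest quarter is carried over unchanged, the best score can never get worse from one generation to the next.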

In this study, a well-documented and widely tested GP software, Eureqa (Schmidt and Lipson, 2013), which has been successfully applied to studies of vegetated flow (Huai et al., 2018; Tinoco et al., 2015; Wang et al., 2017), is used to find a new predictor without any pre-specified form. This contrasts with previous data-driven methods, which pre-specify a form and then find the optimal relationship through regression analysis (e.g., Sonnenwald et al., 2017, 2018; Tanino and Nepf, 2008; Tinoco and Cowen, 2013).

Data pre-processing

Laboratory experiments usually cannot explore a wide range of variables in a single study due to the limitations of cost and experimental conditions. To avoid the common problem of developing a new predictor that is calibrated to a single data set and outperforms other models only on that data set, we use multiple data sets. We collected a large number of observations from 10 studies to cover a broad parameter space (as seen in Table 1 and Fig. 2). In all studies, smooth

Results

After evaluating $4.6\times10^{11}$ formulas, only 7 formulas survived with 1 as the smallest size and 37 as the largest size (Table 2). Subsequent runs showed that, even when more formulas were evaluated, solutions with the same forms were obtained, though their coefficients may vary slightly.

As shown in Table 2, the most complicated formula (with a complexity of 37) is the best drag coefficient predictor in terms of MAE and MSE. However, its high complexity is likely to be the result of statistical

Summary and conclusion

In the present study, the genetic programming (GP) algorithm was used to develop a new predictor of drag coefficient for idealized emergent vegetation on the basis of existing experimental data. Among all the solutions provided by the GP software, Eureqa, the final expression of Cd was selected by balancing complexity, accuracy and physical meaning. Although no form was pre-specified before searching, a solution that agrees in form with Ergun's (1952) formula was obtained. The method

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Meng-Yang Liu: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing, Visualization. Wen-Xin Huai: Conceptualization, Methodology, Writing - review & editing, Supervision, Project administration, Funding acquisition. Zhong-Hua Yang: Writing - review & editing, Supervision. Yu-Hong Zeng: Writing - review & editing.

Acknowledgements

The authors would like to thank Y. Tanino and A. van Rooijen for providing the drag coefficient data from their experiments. Comments and suggestions made by the Editor Dr. G.C. Sander, Associate Editor Dr. R. Ferreira, and Reviewers have greatly improved the quality of the paper. The work was supported by the National Natural Science Foundation of China [Grant numbers 11872285, and 11672213].

References (39)

  • M. Ben Meftah et al.

    Prediction of channel flow characteristics through square arrays of emergent cylinders

    Phys. Fluids

    (2013)
  • K. Chang et al.

    2-D eddy resolving simulations of flow past a circular array of cylindrical plant stems

    J. Hydrodyn.

    (2018)
  • N.S. Cheng

    Calculation of drag coefficient for arrays of emergent circular cylinders with pseudofluid model

    J. Hydraulic Eng.

    (2013)
  • N.S. Cheng et al.

    Hydraulic radius for evaluating resistance induced by simulated emergent vegetation in open-channel flows

    J. Hydraulic Eng.

    (2011)
  • S. Ergun

    Fluid flow through packed columns

    Chem. Eng. Prog.

    (1952)
  • V. Etminan et al.

    A new model for predicting the drag exerted by vegetation canopies

    Water Resour. Res.

    (2017)
  • R.M.L. Ferreira et al.

    Discussion of "Laboratory investigation of mean drag in a random array of rigid, emergent cylinders" by Yukie Tanino and Heidi M. Nepf

    J. Hydraulic Eng.

    (2009)
  • W.X. Huai et al.

    Estimating the transverse mixing coefficient in laboratory flumes and natural rivers

    Water Air Soil Pollut.

    (2018)
  • Y. Ishikawa et al.

    Effect of density of trees on drag exerted on trees in river channels

    J. Forest Res.

    (2000)