Modelling formulations using gene expression programming – A comparative analysis with artificial neural networks

https://doi.org/10.1016/j.ejps.2011.08.021

Abstract

This study investigated the utility and potential advantages of applying gene expression programming (GEP) to four types of pharmaceutical formulation. GEP is a new development in evolutionary computing for modelling data and automatically generating equations that describe the cause-and-effect relationships in a system. The GEP models were compared with those generated by neural networks, a technique now widely used in formulation development. Both methods were capable of discovering subtle and non-linear relationships within the data, with no requirement for the user to specify the functional forms to be used. Although the neural networks rapidly developed models with higher values of the ANOVA R², these models were black boxes and provided little insight into the key relationships. GEP, by contrast, was significantly slower at developing models but generated relatively simple equations describing the relationships, which could be interpreted directly. The results indicate that GEP can be considered an effective and efficient modelling technique for formulation data.
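The ANOVA R² used to compare the models is, on the usual definition, the coefficient of determination R² = 1 − SS_res/SS_tot. The short Python sketch below assumes that standard definition (the exact calculation performed by the modelling software is not stated here), and the observed and predicted values are made up purely for illustration.

```python
def anova_r2(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total (assumed definition)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Total sum of squares: spread of the observations about their mean.
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    # Residual sum of squares: what the model leaves unexplained.
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_total == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_resid / ss_total


# Hypothetical observed and predicted responses for four records.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(round(anova_r2(obs, pred), 4))  # 0.98
```

A perfect fit gives 1.0, and predicting the mean for every record gives 0.0. A high training R² on its own says nothing about how interpretable the model is, which is the trade-off the abstract describes.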

Introduction

Formulation development is a complicated problem set in a design space that is multidimensional and difficult to conceptualise. To reduce development timescales within a quality-by-design framework, research into advanced computational techniques has led to the use of expert and knowledge-based systems for generating initial formulations (Rowe and Roberts, 1998) and of data mining for modelling formulation and process data (Ekins, 2006, Balakin, 2010). Statistical methods have been used for over thirty years in pharmaceutical formulation to derive equations expressing the relevant cause-and-effect relationships, and over the last fifteen years artificial neural networks (ANN) have been increasingly used in this area to develop models from experimental data, allowing predictions to be made and formulations optimised (Achanta et al., 1995, Bourquin et al., 1997, Takayama et al., 2003, Colbourn and Rowe, 2005). Neural networks cope well with complex non-linear relationships and have the added advantage that, unlike statistical techniques, they do not require the functional form relating the dependent and independent variables to be chosen a priori. However, the resulting models are invariably ‘black boxes’ that are difficult to interpret except indirectly, by examining response surfaces.

To overcome this specific disadvantage, Do et al. (2008) proposed the use of a new technique, genetic programming (GP). This method, which is based on evolutionary computing (Koza, 1998), is also able to generate mathematical equations, and Do et al. (2008) showed that it could be applied to modelling drug dissolution from controlled release formulations. They showed not only that it provided equations relating the variables that could be interpreted directly, but also that the resulting models had predictive power comparable to that of statistical models. These results support work by Gusel and Brezocnik (2006) on modelling the impact toughness of a copper alloy, in which the GP models were more precise than the statistical models, although the equations were very complex.

In this paper, gene expression programming (GEP), an extension of GP recently proposed by Ferreira (2001) and claimed (Ferreira, 2006) to produce models more quickly than GP, is evaluated as a modelling technique for four different types of pharmaceutical formulation, and the resulting models are compared with those generated using multi-layer perceptron (MLP) neural networks.


Gene expression programming

Because it is a new technique in the pharmaceutical formulation literature, a brief review of gene expression programming is given here. GEP is a development of genetic programming (GP); both belong to the general family of evolutionary computing, a methodology in which one or more populations of individuals, each of which provides a possible fit to the data, are generated at random. The fitness of each individual is assessed by how well it fits the training data, and the
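To make the general idea concrete, here is a minimal single-gene GEP sketch in Python. It assumes Ferreira's linear gene made of a head (functions or inputs) and a tail (inputs only) of length t = h(n − 1) + 1, decoded breadth-first (Karva notation) into an expression tree. The function set, RMSE-based fitness, tournament selection, elitism and mutation-only variation are illustrative choices for this sketch, not the settings used in this study or in any particular software. Full GEP also uses transposition, recombination and multigenic chromosomes, and those are omitted here.

```python
import math
import operator
import random

# Function set (an assumption for this sketch): symbol -> (arity, callable).
# Division is "protected" so that x / 0 returns 1.0 instead of raising.
FUNCS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "/": (2, lambda a, b: a / b if abs(b) > 1e-9 else 1.0),
}
MAX_ARITY = max(a for a, _ in FUNCS.values())


def tail_length(head_len):
    """Ferreira's rule t = h(n - 1) + 1 guarantees every gene decodes to a valid tree."""
    return head_len * (MAX_ARITY - 1) + 1


def arity(sym):
    return FUNCS[sym][0] if sym in FUNCS else 0


def evaluate(gene, inputs):
    """Decode a gene in Karva (breadth-first) notation and evaluate it on one record."""
    # Find the open reading frame: the prefix of the gene the tree actually uses.
    needed, end = 1, 0
    while end < needed:
        needed += arity(gene[end])
        end += 1
    orf = gene[:end]
    # Breadth-first decoding: each node takes the next unclaimed symbols as children.
    children, nxt = [], 1
    for sym in orf:
        children.append(range(nxt, nxt + arity(sym)))
        nxt += arity(sym)

    def value(i):
        sym = orf[i]
        if sym in FUNCS:
            return FUNCS[sym][1](*(value(c) for c in children[i]))
        return inputs[sym]  # terminal: look up the input variable

    return value(0)


def random_gene(head_len, terminals, rng):
    head = [rng.choice(list(FUNCS) + list(terminals)) for _ in range(head_len)]
    tail = [rng.choice(terminals) for _ in range(tail_length(head_len))]
    return head + tail


def mutate(gene, head_len, terminals, rng, rate=0.05):
    """Point mutation that respects the head/tail split, so offspring stay valid."""
    out = list(gene)
    for i in range(len(out)):
        if rng.random() < rate:
            pool = list(FUNCS) + list(terminals) if i < head_len else terminals
            out[i] = rng.choice(pool)
    return out


def fitness(gene, data):
    """Negative root-mean-square error over the training records (higher is better)."""
    try:
        sq = [(evaluate(gene, row) - y) ** 2 for row, y in data]
        rmse = math.sqrt(sum(sq) / len(sq))
    except OverflowError:
        return -math.inf
    return -rmse if math.isfinite(rmse) else -math.inf


def evolve(data, terminals, head_len=6, pop_size=100, generations=50, seed=0):
    """Mutation-only GEP with tournament selection and elitism (a simplification)."""
    rng = random.Random(seed)
    pop = [random_gene(head_len, terminals, rng) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((fitness(g, data), g) for g in pop),
                        key=lambda s: s[0], reverse=True)
        new_pop = [scored[0][1]]  # elitism: the best gene is copied unchanged
        while len(new_pop) < pop_size:
            parent = max(rng.sample(scored, 3), key=lambda s: s[0])[1]
            new_pop.append(mutate(parent, head_len, terminals, rng))
        pop = new_pop
    return max(pop, key=lambda g: fitness(g, data))


if __name__ == "__main__":
    # Hypothetical training data: y = x^2 + x on 11 points.
    data = [({"x": float(x)}, float(x * x + x)) for x in range(-5, 6)]
    best = evolve(data, terminals=["x"])
    print("best gene:", "".join(best), "RMSE:", -fitness(best, data))
```

Because head mutations may insert functions or inputs while tail mutations only insert inputs, every offspring still decodes to a valid expression. The fittest gene can be read back as an explicit equation. For example, the gene `+*xxx` followed by its tail decodes to x·x + x.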

Results and discussion

In all the cases below, the GEP modelling used a fixed population size of 1000, with 10 separate populations. The other parameters used in the specific models are discussed below.

Conclusions

For all the data sets examined here, a careful experimental design had been used to plan the data collection. One strength of GEP compared with neural networks is its ability to exclude irrelevant inputs. This is evident in the immediate release tablet example, where two input variables summed to a constant value: for many of the models, GEP selected one of these inputs and omitted the other.
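One input can be dropped without loss because, when x1 + x2 = c in every record, x2 = c − x1. Any model f(x1, x2, …) can then be rewritten as a function of x1 alone, and the data cannot separate the effects of the two inputs. The Python sketch below uses hypothetical variable names, data and model rather than the tablet data from this study. It implements a simple check for constant-sum input pairs and shows the substitution. GEP itself does not run such a check. It reaches the same result through selection, because expressions that use only one of the pair fit equally well.

```python
import math
from itertools import combinations


def constant_sum_pairs(records, names, tol=1e-9):
    """Return (a, b, total) for every pair of inputs whose sum is the same in all records."""
    pairs = []
    for a, b in combinations(names, 2):
        sums = [r[a] + r[b] for r in records]
        if max(sums) - min(sums) <= tol:
            pairs.append((a, b, sums[0]))
    return pairs


# Hypothetical design: x1 and x2 are percentages that always total 50.
records = [{"x1": x1, "x2": 50.0 - x1, "x3": x3}
           for x1, x3 in [(10.0, 1.0), (20.0, 3.0), (30.0, 2.0), (25.0, 4.0)]]


def f(x1, x2, x3):
    """A hypothetical model that uses both constrained inputs."""
    return 0.3 * x1 * x3 - 0.1 * x2 ** 2


def g(x1, x3, total=50.0):
    """The same model with x2 eliminated via x2 = total - x1."""
    return 0.3 * x1 * x3 - 0.1 * (total - x1) ** 2


print(constant_sum_pairs(records, ["x1", "x2", "x3"]))  # [('x1', 'x2', 50.0)]
# f and g agree on every record, so x2 carries no information beyond x1.
print(all(math.isclose(f(r["x1"], r["x2"], r["x3"]), g(r["x1"], r["x3"]))
          for r in records))  # True
```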

GEP was capable of developing good models for all

References

  • J. Bourquin et al. Basic concepts of artificial neural networks (ANN) modelling in the application to pharmaceutical development. Pharm. Dev. Technol. (1997).