Glucose forecasting combining Markov chain based enrichment of data, random grammatical evolution and Bagging

https://doi.org/10.1016/j.asoc.2019.105923

Highlights

  • We present a methodology to automate the decision of the insulin bolus, which reduces the number of dangerous predictions.

  • It combines Markov chain based enrichment of data, grammatical evolution and different ways of selecting the set of models to assemble.

  • The experimental results show that our models obtain more precise and robust predictions and fewer dangerous predictions on the Clarke error grid.

  • Two approaches to generate synthetic time series based on Markov chains were tested: (i) random walks following the Metropolis–Hastings algorithm, and (ii) symbolic aggregate approximation combined with the sampling importance resampling technique.

  • We present a new algorithm, Random-GE, which combines the properties of Grammatical Evolution and Random Forests.
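The Markov-chain idea behind the second enrichment approach can be sketched as follows. This is an illustration under assumptions, not the authors' implementation. The series is z-normalized, and each value is mapped to one of `alphabet_size` symbols with equiprobable Gaussian breakpoints (full SAX would first average the series into segments with PAA, omitted here). A first-order transition matrix is then estimated with add-one smoothing. Finally, a new symbol path is sampled and decoded back to glucose values using the mean of the observations that received each symbol. The sampling importance resampling step is not shown, and all function names are hypothetical.

```python
import random
from statistics import NormalDist, mean, pstdev


def sax_symbols(series, alphabet_size=4):
    """Map each z-normalized value to a symbol 0..alphabet_size-1 using
    equiprobable N(0, 1) breakpoints (the PAA averaging step is omitted)."""
    mu = mean(series)
    sd = pstdev(series) or 1.0  # avoid division by zero on a flat series
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    return [sum((x - mu) / sd > b for b in breakpoints) for x in series]


def transition_matrix(symbols, alphabet_size):
    """First-order transition probabilities with add-one (Laplace) smoothing."""
    counts = [[1] * alphabet_size for _ in range(alphabet_size)]
    for a, b in zip(symbols, symbols[1:]):
        counts[a][b] += 1
    return [[c / sum(row) for c in row] for row in counts]


def synthetic_series(series, alphabet_size=4, seed=0):
    """Sample a symbol path from the fitted chain and map each symbol back to
    the mean glucose value observed for that symbol (hypothetical decoding)."""
    rng = random.Random(seed)
    symbols = sax_symbols(series, alphabet_size)
    P = transition_matrix(symbols, alphabet_size)
    overall = mean(series)
    level = {s: mean(x for x, t in zip(series, symbols) if t == s) for s in set(symbols)}
    state = symbols[0]
    out = [level[state]]
    for _ in range(len(series) - 1):
        state = rng.choices(range(alphabet_size), weights=P[state])[0]
        out.append(level.get(state, overall))  # unseen symbols fall back to the mean
    return out
```

Calling `synthetic_series` with different seeds yields different synthetic series that could be appended to a patient's training data.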

Abstract

Diabetes Mellitus is a disease affecting more and more people every year. Depending on the type of diabetes, and sometimes on the stage of the illness, diabetic patients have to inject a dose of artificial insulin, called a bolus, before meals to make up for the absence or malfunctioning of their natural insulin. This decision is difficult. Patients need to estimate the amount of carbohydrates they are going to ingest, take past and future circumstances into account, know their past glucose values, evaluate whether the effect of previously injected insulin has already worn off, and consider any other relevant information. In this paper, we present and compare a set of methodologies to automate the decision of the insulin bolus, which reduce the number of dangerous predictions. We combine two data enrichment techniques based on Markov chains with grammatical evolution engines to generate models of blood glucose. We then use univariate marginal distribution algorithms and bagging techniques to select the set of models to assemble. In particular, we propose the Random-GE procedure, an adaptation of Random Forests to Grammatical Evolution, which leads to excellent prediction models with a simple configuration and a reduced execution time. The ensemble predicts glucose for a given pair of food intake and insulin dose, helping patients select the appropriate bolus to maintain healthy glucose levels after meals. Experimental results show that our models obtain more accurate and robust predictions than previous approaches.
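How an ensemble can support the bolus choice is illustrated by the small sketch below. It averages the glucose predictions of all models for a fixed carbohydrate intake and scans candidate insulin doses. It then keeps the dose whose averaged prediction falls inside the usual 70–180 mg/dl target range and lies closest to a goal value. The two linear models, the 0.5 U dose step, the 15 U cap, and the 125 mg/dl goal are hypothetical placeholders, not evolved models or clinical settings from this work.

```python
from statistics import mean


def ensemble_predict(models, carbs_g, insulin_u):
    """Average the glucose predictions (mg/dl) of all ensemble members."""
    return mean(m(carbs_g, insulin_u) for m in models)


def suggest_bolus(models, carbs_g, max_units=15.0, step=0.5,
                  low=70.0, high=180.0, goal=125.0):
    """Return (dose, predicted glucose) for the in-range dose closest to goal,
    or None if no candidate dose keeps the prediction within [low, high]."""
    best = None
    for k in range(int(round(max_units / step)) + 1):
        dose = k * step
        g = ensemble_predict(models, carbs_g, dose)
        if low <= g <= high and (best is None or abs(g - goal) < abs(best[1] - goal)):
            best = (dose, g)
    return best


# Hypothetical stand-ins for evolved models: post-meal glucose as a
# function of carbohydrates (g) and insulin (U).
models = [
    lambda c, u: 120 + 3.0 * c - 30.0 * u,
    lambda c, u: 130 + 3.0 * c - 30.0 * u,
]
print(suggest_bolus(models, carbs_g=60))  # prints (6.0, 125.0)
```

Returning `None` when no dose qualifies leaves that case to the patient or clinician, rather than silently picking an out-of-range dose.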

Introduction

Diabetes Mellitus is a chronic disease caused by a defect either in the production or in the action of the insulin generated by the pancreas. It is one of the most prevalent diseases in the world. The International Diabetes Federation estimates around 415 million diabetic patients [1] (up from 108 million in 1980), a prevalence of about 8%–10% among adults over 18 years. Diabetes was the seventh leading cause of death in 2016, with 1.6 million deaths directly caused by diabetes and 2.2 million additional deaths attributable to high blood glucose. The negative consequences of diabetes can be avoided, or at least delayed, by following a healthy diet and doing regular physical activity.

Diabetes is classified into two main types: Type 1 (T1DM) and Type 2 (T2DM). In T1DM patients, the pancreas is not able to produce enough insulin to process the glucose produced after food ingestion. These patients need to take additional artificial insulin with each meal, and sometimes between meals, to maintain healthy glucose levels. They have two alternatives to administer the insulin: an insulin pump or multiple manual injections/doses (MID). The insulin pump, a.k.a. continuous subcutaneous insulin infuser (CSII), allows greater flexibility and precision in the injections, although it implies wearing a connected device day and night. With both alternatives, deciding the amount of insulin is challenging and has to take many factors into account. In T2DM patients, the insulin produced by the pancreas does not work correctly, a phenomenon known as insulin resistance. In advanced stages of the disease, many T2DM patients need to resort to injecting artificial insulin using MID.

For both types of diabetes, it is essential to make the right decisions about the amount of insulin to inject. If too much insulin is injected, hypoglycemia may occur, while insufficient doses keep glucose levels too high. The goal is to keep blood glucose within the target range most of the time, usually between 70 and 180 mg/dl [2]. It has been shown that both short-term and long-term complications can emerge when glucose is not kept within this range, or when it shows high variability.

Controlling blood glucose in insulin-dependent patients requires predicting future glucose values to determine the amount of insulin to inject. This amount depends on many factors, but, above all, the patient should account for four of them: (i) the glucose value at the time of injection; (ii) an estimate of the amount of food to be ingested, usually measured in carbohydrate rations; (iii) the insulin previously injected; and (iv) an estimate of how much of that insulin is still active in the body. Doing all of these estimations manually is a complicated process that has to be repeated several times every day. Fortunately, recent advances in both devices and algorithms make it possible to automate some parts of this control process, making it easier for people with diabetes.
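The four factors listed above are the inputs of the conventional bolus-calculator arithmetic found in many insulin pumps: a carbohydrate bolus, plus a correction toward a target glucose, minus the insulin still on board. The sketch below is background only and is not the prediction approach of this paper. Several choices are illustrative assumptions, because real devices use patient-specific settings and curvilinear action profiles:
- the insulin-to-carbohydrate ratio `icr` (g/U) and the insulin sensitivity factor `isf` (mg/dl per U);
- the 120 mg/dl target;
- grams of carbohydrate instead of rations;
- linear decay of insulin on board over a 240-minute duration of action.

```python
def insulin_on_board(doses, now_min, dia_min=240.0):
    """Units still active, assuming each dose decays linearly to zero over
    dia_min minutes (a simplification of real insulin action curves).
    doses: list of (time_min, units) pairs."""
    iob = 0.0
    for t, units in doses:
        elapsed = now_min - t
        if 0 <= elapsed < dia_min:
            iob += units * (1.0 - elapsed / dia_min)
    return iob


def bolus(glucose, carbs_g, doses, now_min, icr=10.0, isf=50.0, target=120.0):
    """Carbohydrate bolus + correction bolus - insulin on board, floored at 0 U."""
    carb_part = carbs_g / icr                 # factor (ii)
    correction = (glucose - target) / isf     # factor (i)
    active = insulin_on_board(doses, now_min) # factors (iii) and (iv)
    return max(0.0, carb_part + correction - active)


# 60 g of carbohydrate at 170 mg/dl, 4 U injected two hours earlier.
print(bolus(170, 60, [(0, 4.0)], now_min=120))  # prints 5.0
```

Here 6 U cover the meal and 1 U corrects the high reading, but 2 U from the earlier dose are assumed to be still active, so only 5 U are suggested.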

There are different kinds of blood glucose control strategies [3]:

  • Traditional therapies with manual calculation and administration of the insulin protocol [4]. Patients decide the amount of insulin under the guidance of medical staff and their own experience.

  • Insulin pump therapies (semi-automated) [5]. Although insulin delivery is automated, patients have to remain alert to detect anomalous glucose situations, stopping the insulin infusion or correcting a trend through the infusion of glucagon or the ingestion of additional food. Dosing decisions are made as in traditional therapies.

  • Solutions based on the artificial pancreas [6]. This approach is the ideal solution, although it is still under research.

For all of these strategies, it is essential to develop mathematical models or artificial intelligence systems that describe the interaction between the glucose system and insulin using measurements and stored data. Continuous glucose monitoring systems (CGMS) have proven very valuable because they make it possible to record glucose values conveniently and reliably, and to develop models from these data.

When developing predictive models, several challenges have to be addressed. One of them is the difficulty of recording enough data to train the model. CGMS are expensive (approximately EUR 120 per month), and recording all food and insulin events can be tedious for patients, leading to data with a low temporal frequency. In addition, the glucose measurement provided by the device has an intrinsic uncertainty that a single recorded value cannot fully capture. This scarcity of data sometimes leads to overfitted models.
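One way to attack both problems, in the spirit of the Metropolis–Hastings random walks listed in the highlights, is to sample plausible glucose trajectories around each recorded series. The following is a sketch under assumptions, not the authors' procedure. The target density rewards trajectories that stay near the measurements (Gaussian sensor error with standard deviation `sensor_sd`) and that change smoothly between consecutive samples (Gaussian increments with standard deviation `step_sd`). A symmetric random-walk proposal perturbs one point at a time. All standard deviations, the iteration count, and the function names are hypothetical.

```python
import math
import random


def log_target(traj, obs, sensor_sd, step_sd):
    """Unnormalized log-density: stay close to the measurements (sensor error)
    and change smoothly between consecutive samples."""
    fit = sum((x - o) ** 2 for x, o in zip(traj, obs)) / (2 * sensor_sd ** 2)
    smooth = sum((b - a) ** 2 for a, b in zip(traj, traj[1:])) / (2 * step_sd ** 2)
    return -(fit + smooth)


def mh_enrich(obs, n_iter=5000, sensor_sd=10.0, step_sd=15.0, prop_sd=5.0, seed=0):
    """Draw one synthetic trajectory with a single-site random-walk
    Metropolis-Hastings sampler started at the observed series."""
    rng = random.Random(seed)
    traj = [float(x) for x in obs]
    cur = log_target(traj, obs, sensor_sd, step_sd)
    for _ in range(n_iter):
        i = rng.randrange(len(traj))
        old = traj[i]
        traj[i] = old + rng.gauss(0.0, prop_sd)  # symmetric proposal
        new = log_target(traj, obs, sensor_sd, step_sd)
        # Accept with probability min(1, exp(new - cur)); because the proposal
        # is symmetric, the Hastings correction cancels.
        if rng.random() < math.exp(min(0.0, new - cur)):
            cur = new
        else:
            traj[i] = old  # reject: restore the previous value
    return traj
```

Running `mh_enrich` with several seeds produces several synthetic series per recorded series. This is how such a sampler could enlarge a scarce training set while reflecting the measurement uncertainty.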

In this paper, we present a three-step methodology to support the automation of the insulin bolus decision. Specifically, we generate glucose models from historical data:

  • First, a data enrichment step based on Markov chains. It incorporates the intrinsic uncertainty of the collected data and increases the size of the training dataset. We work with two different methods.

  • Second, the enriched dataset is passed through a model generation engine. We test two different engines, both based on grammatical evolution (GE), which tackle modeling as a symbolic regression problem. The first is a classical GE for symbolic regression, as studied in other works. The second is a new proposal, Random-GE, which follows the principles of the Random Forest machine learning approach.

  • Finally, ensemble models are obtained in two different ways: (i) with a univariate marginal distribution algorithm (UMDA), which selects the set of models to assemble, and (ii) with Bagging [7].
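The genotype-to-phenotype mapping at the heart of GE can be sketched in a few lines. An integer genome is read codon by codon, and each codon selects, via `codon % number_of_options`, the production that rewrites the leftmost non-terminal of a BNF grammar. If the genome runs out, it is reused (wrapped) a limited number of times. The toy grammar below, its variables (`g` for glucose, `c` for carbohydrates, `i` for insulin), the wrapping limit, and the helper names are hypothetical and far simpler than the grammars used for glucose modeling.

```python
GRAMMAR = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["(", "<expr>", ")"], ["<var>"], ["<const>"]],
    "<op>": [["+"], ["-"], ["*"]],
    "<var>": [["g"], ["c"], ["i"]],
    "<const>": [["1"], ["2"], ["5"], ["10"]],
}


def ge_map(genome, start="<expr>", max_wraps=2):
    """Map an integer genome to an expression string, or None if the
    derivation is still unfinished after max_wraps wraps of the genome."""
    symbols = [start]
    budget = len(genome) * (max_wraps + 1)
    for pos in range(budget):
        nonterminals = [k for k, s in enumerate(symbols) if s in GRAMMAR]
        if not nonterminals:
            break
        k = nonterminals[0]  # always expand the leftmost non-terminal
        options = GRAMMAR[symbols[k]]
        symbols[k:k + 1] = options[genome[pos % len(genome)] % len(options)]
    if any(s in GRAMMAR for s in symbols):
        return None  # invalid individual: genome exhausted before completion
    return " ".join(symbols)


def evaluate(expr, g, c, i):
    """Evaluate a mapped expression (restricted namespace, not a security sandbox)."""
    return eval(expr, {"__builtins__": {}}, {"g": g, "c": c, "i": i})


expr = ge_map([0, 2, 0, 0, 2, 1])
print(expr, "->", evaluate(expr, g=100, c=30, i=2))  # prints g + c -> 130
```

In a symbolic regression setting, the evolutionary search operates on the integer genomes, while fitness is computed by evaluating the mapped expressions on the training data.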

With these steps, we improve on previous works that also apply GE. The combination of models provides the patient with glucose predictions for a given insulin–food pair.
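The UMDA variant of this combination step can be pictured as a search over binary masks that switch each candidate model in or out of the ensemble. The following is a minimal sketch under several assumptions: validation predictions are precomputed for every candidate, the ensemble prediction is the mean of the selected models, and validation RMSE serves as fitness. The population size, elite size, number of generations, and the [0.05, 0.95] clamp on the marginal probabilities are illustrative values, not the settings used in this work.

```python
import math
import random


def ensemble_rmse(mask, preds, target):
    """RMSE of the mean prediction of the selected models (inf if none)."""
    chosen = [p for p, bit in zip(preds, mask) if bit]
    if not chosen:
        return math.inf
    err = 0.0
    for t, y in enumerate(target):
        avg = sum(p[t] for p in chosen) / len(chosen)
        err += (avg - y) ** 2
    return math.sqrt(err / len(target))


def umda_select(preds, target, pop=40, top=10, gens=30, seed=0):
    """Return (best mask, best RMSE) found by a univariate marginal
    distribution algorithm over binary inclusion masks."""
    rng = random.Random(seed)
    n = len(preds)
    prob = [0.5] * n  # marginal probability of including each model
    best_mask, best_fit = None, math.inf
    for _ in range(gens):
        population = [[int(rng.random() < p) for p in prob] for _ in range(pop)]
        population.sort(key=lambda m: ensemble_rmse(m, preds, target))
        fit = ensemble_rmse(population[0], preds, target)
        if fit < best_fit:
            best_mask, best_fit = population[0], fit
        elite = population[:top]
        # UMDA update: new marginals are the bit frequencies among the elite,
        # clamped so that no model is permanently excluded or forced in.
        prob = [min(0.95, max(0.05, sum(m[i] for m in elite) / top)) for i in range(n)]
    return best_mask, best_fit
```

Because each bit is modeled independently, UMDA ignores interactions between models. This keeps the update cheap, at the cost of missing complementary pairs that only work well together.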

To test the validity of our proposal, we obtain models using data from five real patients from a public hospital in Spain. The experimental results show that our models obtain more precise and robust predictions, reducing the number of dangerous mispredictions compared with previous approaches. To analyze our experimental results, we use Clarke's error grid metric [8].

The rest of the paper is organized as follows. In Section 2, we review the state of the art of this problem. Section 3 is the core of the paper, where we explain the methodology and the main contributions of the work. We continue with the description of the data, metrics, experimental results, and discussion in Section 4. Conclusions and future work are given in Section 5.


Related work

The problem of predicting and modeling glucose levels has been an active area of research for the last ten years. Works can be grouped according to their primary objective: either predicting glucose levels for a forecasting horizon of up to two hours, or identifying 24-h models. The first group aims to aid the daily management of insulin, as this forecasting horizon is usually the time needed for the patient to decide the dose of insulin after a meal. The usefulness of 24-h models is

Methodology

Patients have to handle two types of situations in which insulin is necessary: basal insulin and prandial insulin. In non-diabetic individuals, basal insulin is continuously secreted to process blood glucose. When delivered to a diabetic patient through a CSII, basal insulin is provided as a series of small injections. In the case of MID, a single injection of slow insulin (an insulin with a long-duration action profile) is usually applied, taking the role of the whole set of small injections in the

Experimental results

We report Clarke’s Error Grid Analysis (EGA) [46]. EGA is a scatterplot of the experimental results that plots predicted values against actual observations. As in other scatterplots, the diagonal (bisector) represents a perfect prediction. The plot is divided into five zones, which represent the danger or severity of a prediction error. This analysis takes into account the particularities of diabetes clinical practice. For instance, we reach a hazardous situation if a prediction
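The zone assignment can be sketched as a function of the reference (actual) and predicted glucose in mg/dl. The boundaries below follow those used in commonly shared open-source implementations of Clarke's grid. Implementations differ in how they assign points lying exactly on a boundary, so this is an approximation rather than a reference implementation; the function names are hypothetical.

```python
def clarke_zone(ref, pred):
    """Return the Clarke error grid zone ('A'..'E') for one
    (reference, prediction) pair in mg/dl."""
    if (ref <= 70 and pred <= 70) or (0.8 * ref <= pred <= 1.2 * ref):
        return "A"  # within 20% of the reference, or both hypoglycemic
    if (ref >= 180 and pred <= 70) or (ref <= 70 and pred >= 180):
        return "E"  # erroneous: would lead to treating the opposite condition
    if (70 <= ref <= 290 and pred >= ref + 110) or (130 <= ref <= 180 and pred <= 1.4 * ref - 182):
        return "C"  # would lead to overcorrecting acceptable glucose levels
    if ((ref >= 240 and 70 <= pred <= 180)
            or (ref <= 175 / 3 and 70 <= pred <= 180)
            or (175 / 3 <= ref <= 70 and pred >= 1.2 * ref)):
        return "D"  # dangerous failure to detect hypo- or hyperglycemia
    return "B"  # benign errors


def zone_percentages(refs, preds):
    """Percentage of (reference, prediction) pairs falling in each zone."""
    counts = {z: 0 for z in "ABCDE"}
    for r, p in zip(refs, preds):
        counts[clarke_zone(r, p)] += 1
    return {z: 100.0 * c / len(refs) for z, c in counts.items()}
```

For example, a prediction of 100 mg/dl when the actual value is 250 mg/dl falls in zone D, because it would hide a hyperglycemic episode.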

Conclusions

In this paper, we continue the research on techniques to improve glucose prediction models obtained using grammatical evolution as the symbolic regression tool. We have tested a new proposal named Random-GE, which combines the properties of Random Forests and Grammatical Evolution. Random-GE shows its robustness, maintaining similar percentages for different time horizons. The main features of our methodology are:

  • Data of the patients are enriched with synthetic time series generated using
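Random-GE is described above as combining the properties of Random Forests and GE. The sketch below shows only the Random-Forest-style randomization that such a combination could rely on, and it is an assumption rather than the published algorithm. Each ensemble member is trained on a bootstrap sample of the rows and on a random subset of the input variables, and member predictions are averaged. `fit_member` is a hypothetical stand-in for a GE run (a single-variable least-squares fit), included so the example runs on its own. Breiman's Random Forests redraw the variable subset at every tree split; drawing one subset per member, as here, is the simpler random-subspace variant.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (b = 0 if x is constant)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def fit_member(X, y, features):
    """Hypothetical stand-in for one GE run: among the allowed features, keep
    the single-variable linear fit with the lowest training squared error."""
    best = None
    for j in features:
        xs = [row[j] for row in X]
        a, b = fit_line(xs, y)
        sse = sum((a + b * x - t) ** 2 for x, t in zip(xs, y))
        if best is None or sse < best[0]:
            best = (sse, j, a, b)
    _, j, a, b = best
    return lambda row: a + b * row[j]


def random_ensemble(X, y, n_members=10, n_features=2, seed=0):
    """Train each member on a bootstrap sample of rows and a random subset of
    input variables, then average member predictions (Bagging-style)."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap rows
        feats = rng.sample(range(len(X[0])), n_features)      # random variables
        members.append(fit_member([X[i] for i in idx], [y[i] for i in idx], feats))
    return lambda row: mean(m(row) for m in members)
```

Restricting each run to a random subset of rows and variables makes the members differ from one another. Averaging diverse members is what reduces variance in both Bagging and Random Forests.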

Declaration of Competing Interest

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.asoc.2019.105923.

Acknowledgments

This work has been partially supported by Fundación Eugenio Rodriguez Pascual, Spain 2019 grant “Desarrollo de sistemas adaptativos y bioinspirados para el control glucémico con infusores subcutáneos continuos de insulina y monitores continuos de glucosa” (Development of adaptive and bioinspired systems for glycemic control with continuous subcutaneous insulin infusers and continuous glucose monitors), GLENO Project; the Spanish Ministerio de Economia y Competitividad grant RTI2018-095180-B-I00,

References (46)

  • Hidalgo J.I. et al., Data based prediction of blood glucose concentrations using evolutionary methods, J. Med. Syst. (2017).
  • Hansen B. et al., Insulin administration: selecting the appropriate needle and individualizing the injection technique, Expert Opin. Drug Deliv. (2011).
  • Weissberg-Benchell J. et al., Insulin pump therapy, Diabetes Care (2003).
  • Bakhtiani P.A. et al., A review of artificial pancreas technologies with an emphasis on bi-hormonal therapy, Diabetes Obes. Metab. (2013).
  • Breiman L., Bagging predictors, Mach. Learn. (1996).
  • Clarke W. et al., Evaluating clinical accuracy of systems for self-monitoring of blood glucose, Diabetes Care (1987).
  • Cervigón C. et al., A genetic algorithm approach to customizing a glucose model based on usual therapeutic parameters, Prog. Artif. Intell. (2017).
  • Prud’Homme T. et al., Preclinically assessed optimal control of postprandial glucose excursions for type 1 patients with diabetes.
  • Kovatchev B. et al., Multinational study of subcutaneous model-predictive closed loop control in type 1 diabetes mellitus: Summary of the results, Diabetes Sci. Technol. (2010).
  • El-Khatib F. et al., A bihormonal closed-loop artificial pancreas for type 1 diabetes, Sci. Transl. Med. (2010).
  • Magni L. et al., Run-to-run tuning of model predictive control for type 1 diabetes subjects: In silico trial, J. Diabetes Sci. Technol. (2009).
  • Dassau E. et al., Artificial pancreatic beta-cell protocol for enhanced model identification.
  • Steil G.M. et al., The effect of insulin feedback on closed loop glucose control, J. Clin. Endocrinol. Metab. (2011).

Cited by (17)

    • Ensemble blood glucose prediction in diabetes mellitus: A review

      2022, Computers in Biology and Medicine
      Citation Excerpt :

      Several combination schemes have been investigated in the selected papers: weighted average and average were the most frequently used combiners. The average combiner [45–47,62,63,68–70,72,74,78,79] simply calculates the mean of all the single learner's prediction outputs and was mainly used by bagging meta-algorithms and RFs in particular. This combination method has the advantage to be simple and effective.

    • An autonomous channel deep learning framework for blood glucose prediction

      2022, Applied Soft Computing
      Citation Excerpt :

      It is not easy to establish an accurate physiological model for BG prediction due to the dependence on physiological parameters and professional knowledge. In contrast, the data-driven models without involving physiological variables can be modeled quickly with the historical CGM data and other variables [17,18]. Therefore, the data-driven models are used by most of the current BG prediction methods and achieve prediction accuracy comparable to physiological and hybrid models in practical applications.
