Glucose forecasting combining Markov chain based enrichment of data, random grammatical evolution and Bagging
Introduction
Diabetes Mellitus is a chronic disease caused by a defect either in the production or in the action of the insulin generated by the pancreas. It is one of the most prevalent diseases in the world. The International Diabetes Federation estimates around 415 million diabetic patients [1] (up from 108 million in 1980), a prevalence of about 8%–10% among adults over 18 years. In 2016 it was the seventh leading cause of death, with 1.6 million deaths directly caused by diabetes and 2.2 million additional deaths attributable to high blood glucose. The negative consequences of diabetes can be avoided, or at least delayed, by following a healthy diet and doing regular physical activity.
People who have diabetes fall into two main types: Type 1 (T1DM) and Type 2 (T2DM). In T1DM patients, the pancreas is not able to produce enough insulin to process the sugar produced after food ingestion. Those patients need to administer additional artificial insulin with each meal, and sometimes between meals, to maintain healthy levels of glucose. They have two alternatives for administering the insulin: an insulin pump or multiple manual injections/doses (MID). The insulin pump, a.k.a. continuous subcutaneous insulin infuser (CSII), allows greater flexibility and precision in the injections, although it implies carrying a connected device all day and night. In both alternatives, the decisions about the amount of insulin are challenging and have to consider many factors. In T2DM patients, the insulin produced by the pancreas does not work correctly, a phenomenon known as insulin resistance. In advanced stages of the disease, many T2DM patients need to resort to injecting artificial insulin using MID.
For both types of diabetes, it is essential to make the right decisions regarding the amount of insulin to be injected. If too much insulin is injected, hypoglycemia may occur, while insufficient injections keep glucose levels too high. The goal is to maintain blood glucose levels within the target range most of the time, usually between 70 and 180 mg/dl [2]. It has been shown that when these values are not maintained, or when there is high variability, both short-term and long-term complications can emerge.
Control of blood glucose in insulin-dependent patients requires predicting future glucose values to determine the amount of insulin to inject. This amount depends on many factors, but, above all, the patient should account for four of them: (i) the glucose value at the time of injection; (ii) an estimate of the amount of food ingested, usually measured in carbohydrate rations; (iii) the insulin previously injected; and (iv) an estimate of the fraction of that insulin that is still active in the body. Doing all of these estimations manually is a complicated process that has to be repeated several times every day. Fortunately, recent advances in both devices and algorithms allow automating some parts of this control process and make it easier for people with diabetes.
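As an illustration of the target range mentioned above, the following minimal sketch computes the fraction of readings that fall between 70 and 180 mg/dl. It assumes equally spaced glucose readings and inclusive bounds; the function names `time_in_range` and `classify_reading` are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: names and the inclusive-bound convention are assumptions.
TARGET_LOW = 70.0    # mg/dl, lower bound of the target range
TARGET_HIGH = 180.0  # mg/dl, upper bound of the target range


def classify_reading(glucose, low=TARGET_LOW, high=TARGET_HIGH):
    """Label a single glucose value relative to the target range."""
    if glucose < low:
        return "hypoglycemia"
    if glucose > high:
        return "hyperglycemia"
    return "in range"


def time_in_range(readings, low=TARGET_LOW, high=TARGET_HIGH):
    """Fraction of readings inside [low, high], assuming equal spacing in time."""
    if not readings:
        raise ValueError("readings must not be empty")
    inside = sum(1 for g in readings if low <= g <= high)
    return inside / len(readings)
```

For example, `time_in_range([60, 100, 150, 200])` counts two of the four readings as in range.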
There are different kinds of blood glucose control strategies [3]:
- Traditional therapies with manual calculation and administration of the insulin protocol [4]. Patients decide the amount of insulin under the guidance of medical staff and their own experience.
- Insulin pump therapies (semi-automated) [5]. Although the insulin injection is automated, patients have to stay alert to detect anomalous glucose situations, stopping the infusion of insulin or correcting a trend through the infusion of glucagon or the ingestion of additional meals. The decision is taken similarly to traditional therapies.
- Solutions based on the artificial pancreas [6]. This approach is the ideal solution, although it is still under research.
In any case, for all the strategies, it is essential to develop mathematical models or artificial intelligence systems that describe the interaction between the glucose system and insulin using the measurements and stored data. Continuous glucose monitoring systems (CGMS) have proven very valuable because they make it possible to record glucose values conveniently and reliably, and to develop models from these data.
When developing predictive models, several challenges have to be addressed. One of them is the difficulty of recording a large amount of data to train the model. CGMS are expensive (approximately EUR 120 per month), and recording all food and insulin events can be tedious for patients, leading to data with a low temporal frequency. In addition, the glucose measurement made by the device has an intrinsic uncertainty that cannot be captured entirely in the isolated recorded value. This scarcity of data sometimes leads to overfitted models.
In this paper, we present a three-step methodology to support the automation of the insulin bolus decision. Specifically, we generate glucose models from historical data:
- First, a data enrichment step based on Markov chains, which incorporates the intrinsic uncertainty of the collected data and increases the size of the training dataset. We work with two different methods.
- Second, the enriched dataset is passed through a model generation engine. We test two different methods, both based on grammatical evolution (GE), that tackle the modeling as a symbolic regression problem. We implement a classical GE for symbolic regression, also studied in other works, and compare it with a new proposal, Random-GE, which follows the principles of the Random-Forest machine learning approach.
- Finally, ensemble models are obtained in two different versions: (i) by a univariate marginal distribution algorithm (UMDA), which selects the set of models to combine, and (ii) by Bagging [7].
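The idea behind the first step can be sketched as follows. This is a generic first-order Markov chain over discretized glucose levels, not the specific enrichment methods of the paper, whose details are not given in this section. The bin edges, the rule for unseen states, and the names `to_state`, `fit_transition_matrix` and `sample_series` are all assumptions made for illustration.

```python
import bisect
import random


def to_state(value, edges):
    """Map a glucose value to the index of the bin that contains it (clamped)."""
    idx = bisect.bisect_right(edges, value) - 1
    return min(max(idx, 0), len(edges) - 2)


def fit_transition_matrix(series, edges):
    """Estimate first-order transition probabilities between glucose bins."""
    n = len(edges) - 1
    states = [to_state(v, edges) for v in series]
    counts = [[0.0] * n for _ in range(n)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1.0
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumption: a state never left in the data simply stays put.
            row = [1.0 if j == i else 0.0 for j in range(n)]
            total = 1.0
        matrix.append([c / total for c in row])
    return states, matrix


def sample_series(matrix, start_state, length, edges, rng):
    """Draw one synthetic series, reporting each state as its bin center."""
    centers = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
    state = start_state
    out = [centers[state]]
    for _ in range(length - 1):
        state = rng.choices(range(len(matrix)), weights=matrix[state])[0]
        out.append(centers[state])
    return out
```

Calling `sample_series` several times with different seeds yields additional series that follow the transitions seen in the recorded data, which is one way a small training set could be enlarged while reflecting measurement uncertainty.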
With these steps, we improve on previous works that also apply GE. The combination of models provides the patient with glucose predictions for an insulin–food pair.
To test the validity of our proposal, we obtain models using data from five real patients from a public hospital in Spain. The experimental results show that our models obtain more precise and robust predictions, reducing the number of dangerous mispredictions when compared with previous approaches. For the analysis of our experimental results we use Clarke's error grid metric [8].
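A minimal sketch of how Bagging combines models: each base model is trained on a bootstrap resample of the training set, and the ensemble prediction is the average of the individual predictions. The base learner here is a plain least-squares line used only as a stand-in for a GE-evolved expression; `fit_line` and `bagging` are hypothetical names, and the paper's actual configuration may differ.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares line; a stand-in for a model evolved by GE (assumption)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # Degenerate resample (a single distinct x): fall back to a constant.
        return lambda x: my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def bagging(xs, ys, n_models, rng):
    """Train n_models on bootstrap resamples and average their predictions."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)
```

Averaging over resampled models tends to reduce the variance of the prediction, which is the motivation for using it when training data are scarce.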
The rest of the paper is organized as follows. In Section 2, we review the state of the art of this problem. Section 3 is the core of the paper, where we explain the methodology and the main contributions of the work. We continue with the description of the data, metrics, experimental results, and discussion in Section 4. Conclusions and future work are given in Section 5.
Related work
The problem of predicting and modeling glucose levels has been an intensive area of research for the last ten years. We can group works according to their primary objective: either to predict glucose levels for a forecasting horizon of up to two hours or to identify 24-h models. The first group tries to be an aid in the daily management of insulin, as this forecasting horizon is usually the time needed for the patient to decide the dose of insulin after a meal. The usefulness of 24-h models is
Methodology
Patients have to handle two types of situations in which insulin is necessary: basal insulin and prandial insulin. In non-diabetic individuals, basal insulin is continuously secreted to process the blood glucose. When injected to a diabetic patient through a CSII, the insulin is provided as a series of small injections. In the case of MID, usually one slow insulin injection, an insulin with action profile of long duration, is applied, taking the role of the whole set of small injections in the
Experimental results
We report the Clarke’s Error Grid Analysis (EGA) [46]. EGA is a scatterplot of the experimental results, where we represent the prediction versus actual observations. As in other scatterplots, the bisectrix represents a perfect prediction, and the plot is divided into five zones, which represent the danger or severity of an error in the prediction. This analysis takes into account the particularities of the diabetes clinical practice. For instance, we reach a hazardous situation if a prediction
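As a partial illustration of the error grid, the sketch below checks only zone A, using its commonly stated definition: the prediction is within 20% of the reference value, or both values are below 70 mg/dl. Zones B to E have more involved boundaries and are deliberately not implemented here. The function names are hypothetical.

```python
def in_zone_a(reference, predicted):
    """True if a (reference, predicted) pair in mg/dl falls in Clarke zone A."""
    if reference < 70 and predicted < 70:
        # Both values in the hypoglycemic range count as clinically accurate.
        return True
    return abs(predicted - reference) <= 0.2 * reference


def zone_a_fraction(references, predictions):
    """Fraction of prediction pairs that fall in zone A."""
    if len(references) != len(predictions) or not references:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(in_zone_a(r, p) for r, p in zip(references, predictions))
    return hits / len(references)
```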
Conclusions
In this paper, we continue the research on techniques to improve prediction models for glucose obtained using grammatical evolution as the symbolic regression tool. We have tested a new proposal named Random-GE, which combines the properties of Random Forests and Grammatical Evolution. Random-GE shows its robustness by maintaining similar percentages for different time horizons. The main features of our methodology are:
- Data of the patients are enriched with synthetic time series generated using
Declaration of Competing Interest
No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.asoc.2019.105923.
Acknowledgments
This work has been partially supported by Fundación Eugenio Rodriguez Pascual, Spain 2019 grant -Desarrollo de sistemas adaptativos y bioinspirados para el control glucémico con infusores subcutáneos continuos de insulina y monitores continuos de glucosa (Development of adaptive and bioinspired systems for glycemic control with continuous subcutaneous insulin infusors and continuous glucose monitors)-GLENO Project; the Spanish Ministerio de Economia y Competitividad grant RTI2018-095180-B-I00,
References (46)
- et al. IDF diabetes atlas: Global estimates for the prevalence of diabetes for 2015 and 2040. Diabetes Res. Clin. Pract. (2017)
- et al. Manual closed-loop insulin delivery in children and adolescents with type 1 diabetes: a phase 2 randomised crossover trial. Lancet (2010)
- et al. Rapid model identification for online glucose prediction of new subjects with type 1 diabetes using model migration method. IFAC Proc. Vol. (2014)
- Identification for control: From the early achievements to the revival of experiment design. Eur. J. Control (2005)
- et al. Genetic programming-based induction of a glucose-dynamics model for telemedicine. J. Netw. Comput. Appl. (2018)
- et al. Risk-based postprandial hypoglycemia forecasting using supervised learning. Int. J. Med. Inf. (2019)
- Diabetes mellitus standards of care. Nurs. Clin. N. Am. (2015)
- et al. Comparative analysis of a-priori and a-posteriori dietary patterns using state-of-the-art classification algorithms: A case/case-control study. Artif. Intell. Med. (2013)
- et al. Glucmodel: A monitoring and modeling system for chronic diseases applied to diabetes. J. Biomed. Inform. (2014)
- et al. Glucose concentration can be predicted ahead in time from continuous glucose monitoring sensor time-series. IEEE Trans. Biomed. Eng. (2007)
- Data based prediction of blood glucose concentrations using evolutionary methods. J. Med. Syst.
- Insulin administration: selecting the appropriate needle and individualizing the injection technique. Expert Opin. Drug Deliv.
- Insulin pump therapy. Diabetes Care
- A review of artificial pancreas technologies with an emphasis on bi-hormonal therapy. Diabetes Obes. Metab.
- Bagging predictors. Mach. Learn.
- Evaluating clinical accuracy of systems for self-monitoring of blood glucose. Diabetes Care
- A genetic algorithm approach to customizing a glucose model based on usual therapeutic parameters. Prog. Artif. Intell.
- Preclinically assessed optimal control of postprandial glucose excursions for type 1 patients with diabetes
- Multinational study of subcutaneous model-predictive closed loop control in type 1 diabetes mellitus: Summary of the results. Diabetes Sci. Technol.
- A bihormonal closed-loop artificial pancreas for type 1 diabetes. Sci. Trans. Med.
- Run-to-run tuning of model predictive control for type 1 diabetes subjects: In silico trial. J. Diabetes Sci. Technol.
- Artificial pancreatic beta-cell protocol for enhanced model identification
- The effect of insulin feedback on closed loop glucose control. J. Clin. Endocrinol. Metab.