abstract = "Machine learning (ML) models trained for triggering
clinical decision support (CDS) are typically either
accurate or interpretable but not both. Scaling CDS to
the panoply of clinical use cases while mitigating
risks to patients will require many ML models be
intuitively interpretable for clinicians. To this end,
we adapted a symbolic regression method, coined the
feature engineering automation tool (FEAT), to train
concise and accurate models from high-dimensional
electronic health record (EHR) data. We first present
an in-depth application of FEAT to classify
hypertension, hypertension with unexplained
hypokalemia, and apparent treatment-resistant
hypertension (aTRH) using EHR data for 1200 subjects
receiving longitudinal care in a large healthcare
system. FEAT models trained to predict phenotypes
adjudicated by chart review had equivalent or higher
discriminative performance (p < 0.001) and were at
least three times smaller (p < 1e-06) than other
potentially interpretable models. For aTRH, FEAT
generated a six-feature, highly discriminative
(positive predictive value = 0.70,
sensitivity = 0.62), and clinically intuitive
model. To assess the generalisability of the approach,
we tested FEAT on 25 benchmark clinical phenotyping
tasks using the MIMIC-III critical care database. Under
comparable dimensionality constraints, FEAT's models
exhibited higher area under the receiver-operating
curve scores than penalized linear models across tasks
(p < 6e-06). In summary, FEAT can train EHR
prediction models that are both intuitively
interpretable and accurate, which should facilitate
safe and effective scaling of ML-triggered CDS to the
panoply of potential clinical use cases and healthcare
practices",
notes = "Preprint on MedRxiv:
10.1101/2020.12.12.20248005