Pathophysiological Features in Electronic Medical Records Sustain Model Performance under Temporal Dataset Shift.

AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science
Authors
Abstract

Access to real-world data streams like electronic medical records (EMRs) has accelerated the development of supervised machine learning (ML) models for clinical applications. However, few studies investigate the differential impact of particular features in the EMR on model performance under temporal dataset shift. To explain how features in the EMR impact models over time, this study aggregates features into groups by their source (e.g., medication orders, diagnosis codes, and lab results) and into categories based on whether they reflect patient pathophysiology or healthcare processes. We adapt Shapley values to explain feature groups' and feature categories' marginal contributions to initial and sustained model performance. We investigate three standard clinical prediction tasks and find that, while feature contributions to initial performance differ across tasks, pathophysiological features help mitigate the deterioration of model discrimination over time. These results provide interpretable insights into how specific feature groups contribute to model performance and robustness to temporal dataset shift.
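
As a rough illustration of attributing model performance to feature groups with Shapley values, the sketch below computes exact Shapley values over a handful of groups. It is not the authors' adaptation, which the abstract does not detail. The function `group_shapley`, the group names, and the toy scoring function `toy_value` (with made-up numbers) are all assumptions for illustration. In practice, the value function might return a discrimination metric such as AUROC for a model trained on the given groups.

```python
from itertools import combinations
from math import factorial


def group_shapley(groups, value):
    """Exact Shapley value of each feature group.

    groups: list of group names (hypothetical labels).
    value:  callable taking a frozenset of group names and returning a
            performance score for a model restricted to those groups.
    """
    n = len(groups)
    phi = {}
    for g in groups:
        others = [h for h in groups if h != g]
        total = 0.0
        # Average the marginal gain of adding g over all coalitions of the others.
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {g}) - value(s))
        phi[g] = total
    return phi


def toy_value(subset):
    # Made-up additive scores, purely illustrative (not results from the study).
    gains = {"labs": 0.20, "diagnoses": 0.10, "medications": 0.05}
    return 0.5 + sum(gains[g] for g in subset)


groups = ["labs", "diagnoses", "medications"]
phi = group_shapley(groups, toy_value)
```

Because the toy scores are additive, each group's Shapley value equals its own gain, and the values sum to the difference between the score with all groups and the score with none. Exact enumeration grows exponentially with the number of groups, so it is only practical when the groups are few.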

Year of Publication
2024
Journal
AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science
Volume
2024
Pages
95-104
Date Published
12/2024
ISSN
2153-4063
PubMed ID
38827052