soda-inria/predictive-ehr-benchmark

Predictive algorithms from Electronic Health Records

This repository hosts code for the working paper: Exploring a complexity gradient in representation and predictive algorithms for EHRs

Documentation

Source Code

Working Paper repository

Abstract

Electronic Health Records (EHRs) contain time-varying features with high cardinality. Current state-of-the-art predictive models rely on increasingly elaborate transformer-based pipelines to handle the complexity of these data. Because these models are difficult to deploy, transfer and adapt to local care environments, we explore the complexity-benefit tradeoff by comparing them to a simple aggregation of events. We use three clinical tasks that involve time-varying structured EHRs and address increasingly clinically relevant problems. We show that these benchmarking tasks display heterogeneous predictive difficulties. We introduce a simple aggregation of static embeddings, transferred from national claims and publicly available, and show that it outperforms transformer-based models on simple tasks with medium sample sizes. We highlight the sample and computing-resource efficiency of these simple models. Finally, clinically relevant problems generally present a strong class imbalance, which complicates model development and undermines performance. Further work is needed to understand whether transformer-based models perform well in these scenarios, where the limited number of cases requires good sample efficiency.
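
To make the idea of "a simple aggregation of static embeddings" concrete, here is a minimal sketch that mean-pools pre-computed code embeddings over each patient's events. It is an illustration only, not the repository's implementation: the function name `aggregate_patient_embeddings`, the dictionary-based data layout, the choice of mean pooling, and the toy codes and vectors are all assumptions made for this example.

```python
import numpy as np


def aggregate_patient_embeddings(events, embeddings, dim):
    """Mean-pool static code embeddings over each patient's events.

    events: dict mapping patient id -> list of medical codes (hypothetical layout)
    embeddings: dict mapping code -> 1-D numpy array of length `dim`
    Codes missing from `embeddings` are skipped; a patient with no known
    code gets a zero vector.
    """
    patient_ids = sorted(events)
    features = np.zeros((len(patient_ids), dim))
    for row, pid in enumerate(patient_ids):
        vectors = [embeddings[c] for c in events[pid] if c in embeddings]
        if vectors:
            features[row] = np.mean(vectors, axis=0)
    return patient_ids, features


# Toy data, invented for illustration only.
toy_embeddings = {
    "I10": np.array([1.0, 0.0]),
    "E11": np.array([0.0, 1.0]),
}
toy_events = {
    "patient_a": ["I10", "E11"],
    "patient_b": ["I10", "UNKNOWN"],
    "patient_c": [],
}
ids, X = aggregate_patient_embeddings(toy_events, toy_embeddings, dim=2)
```

The resulting matrix `X` has one fixed-length row per patient and could be passed to any standard tabular classifier, which is what keeps this kind of pipeline cheap compared with training a sequence model.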

Usage

See the usage page in the documentation.
