Our key projects span a diverse set of machine learning applications designed to accelerate discovery and improve clinical care. From foundational multimodal models to targeted disease prediction, phenotyping, and drug repurposing, ML4H combines large-scale data, deep clinical insight, and cutting-edge AI methods to drive innovation across cardiology, neurology, and beyond.
Patient contrastive learning representations
Patient Contrastive Learning Representations (PCLR) is a transfer learning approach for ECGs. It produces a performant, efficient, and pragmatic ECG representation that outperforms supervised deep learning models trained from scratch on data sets with fewer than a few thousand labeled events.
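As a rough illustration of the patient-contrastive idea, the sketch below scores a batch of ECG embeddings with an NT-Xent-style loss in which two ECGs from the same patient form a positive pair and ECGs from different patients act as negatives. The choice of loss, the `temperature` value, and the function names are assumptions made for illustration; the description above does not specify them.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def patient_contrastive_loss(embeddings, patient_ids, temperature=0.1):
    """Mean NT-Xent-style loss over all same-patient (positive) pairs.

    embeddings  : list of ECG embedding vectors (hypothetical encoder output)
    patient_ids : patient identifier for each embedding
    temperature : softmax temperature (illustrative value, not from the source)
    """
    n = len(embeddings)
    losses = []
    for i in range(n):
        # Scaled similarity of anchor i to every other ECG in the batch.
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        log_denom = math.log(sum(math.exp(s) for s in logits.values()))
        for j, s in logits.items():
            if patient_ids[j] == patient_ids[i]:
                # Negative log-probability of picking the same-patient ECG.
                losses.append(log_denom - s)
    if not losses:
        raise ValueError("need at least one patient with two ECGs in the batch")
    return sum(losses) / len(losses)


# Toy batch: the first two and last two embeddings are close to each other.
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matched = patient_contrastive_loss(batch, ["a", "a", "b", "b"])
loss_mismatched = patient_contrastive_loss(batch, ["a", "b", "a", "b"])
```

In this toy batch the loss is low when nearby embeddings belong to the same patient and high when they do not, which is the signal an encoder would be trained to minimize.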
Lead(s): Nathaniel Diamant (MIT, ML4H)
Paper(s):
ECG-MRI multimodal approach
ML4H developed a cross-modal autoencoder framework integrating ECGs and MRIs for constructing a holistic representation of the cardiovascular state. The joint representations were shown to improve phenotype prediction from a single modality and enable data imputation.
Lead(s): Adit Radhakrishnan (MIT, EWSC), Sam Friedman (ML4H)
Paper(s):
Registered autoencoders for GWAS and PheWAS
How ECGs and MRIs are preprocessed can strongly affect the information that foundational models encode. For example, registration to an anatomical atlas can greatly increase the number of phenotypic and genetic associations found in the learned latent space. Across multiple modalities and cohorts, the ML4H model factory builds autoencoders that capture these associations, empowering phenome-wide and genome-wide association studies.
Lead(s): Sam Friedman (ML4H)
Paper(s):
Natural language processing for heart failure adjudication from unstructured clinical notes
This natural language processing model accurately identifies heart failure events from unstructured discharge summaries.
Lead(s): Pulkit Singh (ML4H), Chris Reeder (ML4H), Jonathan Cunningham (BWH)
Paper(s):
Phenotyping longitudinal clinical notes for prediction of dementia
In this project, we built a pipeline that uses large language models (LLMs) to extract early clinical markers, specifically mentions of falls and hearing loss, from unstructured EHR notes. These markers were chosen because they are relevant early indicators of dementia risk and are often documented only in narrative text. They were then incorporated into a predictive modeling framework for identifying individuals at risk of developing dementia, with the broader aim of enabling earlier detection and intervention.
Lead(s): Valentina D’Souza (ML4H)
MRI-based fat distribution analysis modifies the effect of BMI
Using a convolutional neural network, the ML4H team quantified visceral, abdominal subcutaneous, and gluteofemoral fat depots from MRI and showed that the distribution of fat across the body can a) modify the effect of BMI, and b) be protective or harmful for developing diabetes or coronary artery disease.
Lead(s): Markus Klarqvist (ML4H), Saaket Agrawal (MGH)
Paper(s):
Echocardiogram phenotyping
A segmentation-free deep learning model interprets echocardiogram videos and automatically extracts measures of cardiac structure and function. These measurements were shown to be highly associated with future clinical outcomes.
Lead(s): Paolo Di Achille (ML4H), Emily Lau (MGH)
Paper(s):
Phenotyping cardiac fibrosis
A deep learning model automatically quantifies myocardial fibrosis from mid-ventricular short-axis cardiac MRI T1 maps in the UK Biobank. These measurements were associated with future disease risk and genetic loci, identifying novel biologic pathways relevant to fibrosis for therapeutic targeting.
Lead(s): Paolo Di Achille (ML4H), Victor Nauffal (BWH)
Paper(s):
Papillary muscle segmentation and fibrosis phenotyping
ML4H developed a deep learning model to segment the two small papillary muscles from mid-ventricular short-axis cardiac MRI T1 maps in the UK Biobank and automatically quantify fibrosis within each. Papillary muscle T1 time was associated with prevalent cardiovascular disease and measures of cardiac structure and function. A genome-wide association study revealed loci associated with papillary muscle fibrosis after adjusting for ventricular fibrosis.
Lead(s): Danielle Pace (ML4H), Victor Nauffal (BWH)
Prediction of atrial fibrillation (AF) based on 12-lead ECGs
ECG-AI is a deep learning model that uses the 12-lead ECG to predict time to incident atrial fibrillation (AF). Combining ECG-AI with a clinical risk model improved prediction of incident AF.
Lead(s): Sam Friedman (ML4H), Shaan Khurshid (MGH)
Paper(s):
Detection of hypertension from 12-lead ECGs
ECG2Hypertension is a deep learning model that detects hypertension from a single 12-lead ECG. Even for ECGs read as normal by a cardiologist, the model provides a digital biomarker for hypertension. This biomarker is a stronger predictor of mortality, stroke, heart failure, and myocardial infarction than systolic blood pressure or pulse pressure.
Lead(s): Sam Friedman (ML4H), Mostafa Al-Alusi (MGH)
Paper(s):
Prediction of heart failure (HF) based on echocardiograms
Echo2HF is a deep learning model that predicts time to incident heart failure from transthoracic echocardiographic images. Based on a single echo study consisting of multiple videos, Echo2HF accurately predicts future heart failure risk, outperforming modern clinical heart failure risk scores.
Lead(s): Tal Snitzer (ML4H), Emily Lau (MGH)
Polygenic risk score of aortic aneurysm, stenosis and dissection from cardiac MRI
By characterizing aortic dimensions across >2 million cardiac MRI images from 34,000 UK Biobank participants using a deep learning model, ML4H discovered ~80 loci in a genome-wide association study (GWAS) associated with aortic diameter and developed a polygenic risk score predicting risk of thoracic aortic aneurysm and aortic stenosis.
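For readers unfamiliar with polygenic risk scores, the sketch below shows the standard additive form: a weighted sum of risk-allele dosages, with weights taken from GWAS effect sizes, followed by standardization across a cohort. The variant identifiers, weights, and dosages are made up for illustration and are not results from this project.

```python
from statistics import mean, pstdev


def polygenic_risk_score(dosages, weights):
    """Additive score: sum of (GWAS effect size x allele dosage) per variant.

    dosages : variant id -> number of effect alleles carried (0, 1, or 2)
    weights : variant id -> per-allele effect size from a GWAS
    Raises KeyError if a weighted variant is missing from `dosages`.
    """
    return sum(beta * dosages[variant] for variant, beta in weights.items())


def standardize(scores):
    """Convert raw scores to z-scores relative to the cohort."""
    mu = mean(scores)
    sigma = pstdev(scores)
    return [(s - mu) / sigma for s in scores]


# Hypothetical effect sizes for three variants (illustrative only).
weights = {"variant_a": 0.20, "variant_b": -0.10, "variant_c": 0.05}

# Hypothetical genotype dosages for three participants.
cohort = [
    {"variant_a": 2, "variant_b": 1, "variant_c": 0},
    {"variant_a": 0, "variant_b": 2, "variant_c": 1},
    {"variant_a": 1, "variant_b": 0, "variant_c": 2},
]

raw_scores = [polygenic_risk_score(person, weights) for person in cohort]
z_scores = standardize(raw_scores)
```

Standardizing makes scores comparable across participants, so risk can be reported per standard deviation of the score.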
Lead(s): James Pirruccello (MGH, ML4H)
Paper(s):
Identifying shared genetic pathways for fibrosis from abdominal and cardiac MRIs
Leveraging abdominal MRI T1 maps in the UK Biobank, ML4H developed a deep learning segmentation model to simultaneously quantify fibrosis in the liver, kidney, and pancreas. Building on previous cardiac fibrosis phenotyping in the UK Biobank, this work identified both organ-specific and shared genetic pathways for fibrosis. Furthermore, ML4H demonstrated the role of multi-organ assessment of T1 time in risk stratification of all-cause mortality in the population.
Lead(s): Markus Klarqvist (ML4H), Victor Nauffal (BWH)
Paper(s):
The Community Care Cohort Project (C3PO)
C3PO contains longitudinal, high-resolution clinical data for more than half a million individuals. ML4H applies NLP to the ~80 billion tokens in C3PO to scale phenotype ascertainment and to drive new biological discovery and clinical impact across a wide variety of clinical analyses.
Lead(s): Chris Reeder (ML4H), Shaan Khurshid (MGH)
Paper(s):
The pregnancy cohort: PADME
ML4H developed the Predictive Analysis with Deep Learning Models for Maternal Endpoints (PADME) cohort, a multi-institutional EHR-based cohort of >56,000 pregnancies that builds on the C3PO cohort, with rich clinical data and rigorously defined longitudinal cardiovascular outcomes. To define the pregnancy period, we extracted gestational age from unstructured clinical notes, a critical element for accurately linking vital signs and outcome data. Using this novel resource, we aimed to characterize secular trends in maternal cardiometabolic comorbidities and pre-existing cardiovascular disease at the time of pregnancy, and to evaluate the incidence of cardiovascular complications during pregnancy.
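To make the gestational-age step concrete, here is a minimal sketch that pulls a "weeks and days" mention out of a note with a regular expression and backs out an estimated pregnancy start date from the note date. The description above does not say how PADME performs this extraction, so the pattern, the plausibility limits, and the function names are all assumptions. A simple pattern like this will also pick up unrelated durations such as "follow up in 2 weeks", so a real pipeline would need context checks.

```python
import re
from datetime import date, timedelta

# Matches forms such as "32w4d", "32 weeks", "20 weeks and 3 days".
# The negative lookaheads stop "w" or "d" from matching the start of a word.
GA_PATTERN = re.compile(
    r"(?<!\d)(\d{1,2})\s*(?:weeks?|wks?|w)(?![a-z])"
    r"(?:\s*(?:and\s*)?(\d)\s*(?:days?|d)(?![a-z]))?",
    re.IGNORECASE,
)


def extract_gestational_age_days(note_text):
    """Return gestational age in days from the first plausible mention, else None."""
    for match in GA_PATTERN.finditer(note_text):
        weeks = int(match.group(1))
        days = int(match.group(2) or 0)
        # Illustrative plausibility limits, not taken from the source.
        if weeks <= 44 and days <= 6:
            return weeks * 7 + days
    return None


def estimate_pregnancy_start(note_date, note_text):
    """Estimate the start of the pregnancy period from one dated note."""
    ga_days = extract_gestational_age_days(note_text)
    if ga_days is None:
        return None
    return note_date - timedelta(days=ga_days)


# Synthetic note text, written for this example.
ga = extract_gestational_age_days("Patient is G2P1 at 32w4d, BP 118/76.")
start = estimate_pregnancy_start(date(2020, 6, 15), "Seen at 20 weeks and 3 days.")
```

With an estimated start date in hand, vital signs and outcomes recorded in other notes can be placed on a common gestational timeline.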
Lead(s): Valentina D’Souza (ML4H), Emily Lau (MGH)