Areas of focus

Clinical datasets

In collaboration with Mass General Brigham (MGB), we use three large-scale, retrospective, EHR-based clinical datasets:

  • Community Care Cohort Project (C3PO)
  • Primary Care Loyalty Cohort (PCLC)
  • Electronic Warehouse of Cardiology (EWOC)

These datasets include demographics, anthropometrics, vital signs, laboratory results, patient-level encounter data, narrative notes, medication lists, billing and procedural codes, and results of cardiology and radiology diagnostic tests for more than 600,000 patients seen at MGB facilities. Together with data from large-scale biobanks and genetic studies such as All of Us, UK Biobank, and FinnGen, these data form the backbone of our model and infrastructure development activities.

AI/ML methods

Thus far we have constructed, trained, assessed, and implemented more than 40 clinical AI models for care quality assessments, clinical trial enrollment, information retrieval from EHR systems, risk stratification, and more. These include:

  • Multimodal fusion models for deep phenotyping
  • Predictive models for disease risk estimation and diagnosis
  • Large language models for phenotyping and disease classification

More information about our model development efforts is available under Key Projects and on .

Platforms

We develop infrastructure, platforms, and workflows that enable efficient model training, evaluation, and deployment in clinical settings. These efforts currently focus on:

  • SABERR: AI-enabled image annotation and segmentation
  • Secure cloud-based application for AI model deployment
  • : Library and web application for quality assessment of self-supervised representation learning-based models