A Representation Fusion Framework for Decoupling Diagnostic Information in Multimodal Learning.

Authors
Abstract

Modern medicine increasingly relies on multimodal data, ranging from clinical notes to imaging and genomics, to guide diagnosis and treatment. However, integrating these heterogeneous data sources in a principled and interpretable manner remains a major challenge. We present MODES (Multi-mOdal Disentangled Embedding Space), a representation fusion framework that explicitly separates shared and modality-specific factors of variation, offering a structured latent space for multimodal information that improves both prediction and interpretability. By leveraging pre-trained unimodal foundation models, MODES reduces the dependency on extensive paired datasets, which is crucial in data-scarce clinical settings. We introduce a masking strategy that optimizes representation dimensionality by eliminating low-information dimensions, yielding compact, information-rich representations. Our framework outperforms unimodal and conventional fusion models in predicting diagnoses and phenotypes. MODES also enables robust diagnostic inference under missing-data scenarios, offering a path toward interpretable and efficient multimodal diagnostics in personalized healthcare.

Year of Publication
2025
Journal
npj Digital Medicine
Volume
8
Issue
1
Pages
765
Date Published
12/2025
ISSN
2398-6352
DOI
10.1038/s41746-025-02144-6
PubMed ID
41408105