PMCID
PMC12637690

Adaptive resampling for improved machine learning in imbalanced single-cell datasets.

bioRxiv : the preprint server for biology
Authors
Abstract

While machine learning models trained on single-cell transcriptomics data have shown great promise in providing biological insights, existing tools struggle to effectively model underrepresented and out-of-distribution cellular features or states. We present a generalizable Adaptive Resampling (AR) approach that addresses these limitations and enhances single-cell representation learning by resampling data based on its learned latent structure in an online, adaptive manner concurrent with model training. Experiments on gene expression reconstruction, cell type classification, and perturbation response prediction tasks demonstrate that the proposed AR training approach leads to significantly improved downstream performance across datasets and metrics. Additionally, it enhances the quality of learned cellular embeddings compared to standard training methods. Our results suggest that AR may serve as a valuable technique for improving representation learning and predictive performance in single-cell transcriptomic models.
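
The abstract does not state how AR computes its sampling weights, so the following is only an illustrative sketch of the general idea of resampling according to latent structure, not the authors' method. It assumes an inverse-density rule: each cell's sampling probability grows with the distance to its k-th nearest neighbor in the current latent space, so cells in sparse (underrepresented) regions are drawn more often. The function names, the `k` and `temperature` parameters, and the weighting rule itself are all hypothetical.

```python
import numpy as np


def adaptive_sampling_weights(latents, k=5, temperature=1.0):
    """Hypothetical helper: sampling probabilities that upweight cells
    lying in sparse regions of the latent space (assumed inverse-density rule)."""
    latents = np.asarray(latents, dtype=float)
    n = latents.shape[0]
    k = min(k, n - 1)  # cannot look past the number of other points
    # Pairwise Euclidean distances between all latent vectors, shape (n, n).
    diffs = latents[:, None, :] - latents[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the self-distance (0),
    # so column k is the distance to the k-th nearest neighbor.
    kth = np.sort(dists, axis=1)[:, k]
    # A larger k-th neighbor distance means a sparser region, hence a larger weight.
    scores = (kth + 1e-12) ** temperature
    return scores / scores.sum()


def resample_indices(latents, batch_size, rng, k=5, temperature=1.0):
    """Draw a training batch (with replacement) using the adaptive weights."""
    probs = adaptive_sampling_weights(latents, k=k, temperature=temperature)
    return rng.choice(len(probs), size=batch_size, replace=True, p=probs)
```

In an online setting, one would presumably recompute the latents with the current encoder at regular intervals (for example once per epoch) and call `resample_indices` to choose the next batches, so that the sampling distribution tracks the representation as it is learned. The pairwise distance matrix is quadratic in the number of cells, so a real dataset would need an approximate nearest-neighbor search instead.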

Year of Publication
2025
Journal
bioRxiv : the preprint server for biology
Date Published
11/2025
ISSN
2692-8205
DOI
10.1101/2025.11.04.686583
PubMed ID
41279118