Discovery of disease-associated cellular states using ResidPCA in single-cell RNA and ATAC sequencing data.

HGG advances
Authors
Abstract

To advance understanding of cellular heterogeneity in disease from single-cell sequencing data, we introduce Residual Principal Component Analysis (ResidPCA), a robust method for identifying cell states that explicitly models cell type heterogeneity. In simulations, ResidPCA achieved more than fourfold higher accuracy than conventional Principal Component Analysis (PCA) and over threefold higher accuracy than Non-negative Matrix Factorization (NMF)-based methods in detecting states expressed across multiple cell types. Applied to single-cell RNA sequencing (scRNA-seq) of light-stimulated mouse visual cortex cells, ResidPCA captured stimulus-driven variability with an accuracy more than fivefold higher than NMF-based approaches. In single-nucleus datasets from an Alzheimer's disease cohort, ResidPCA identified 44 chromatin accessibility-based states from single-nucleus ATAC-seq (snATAC-seq) and 42 transcriptional states from single-nucleus RNA-seq (snRNA-seq). Thirty snATAC-seq states were significantly enriched for Alzheimer's disease heritability, often more so than established cell types such as microglia. The snATAC-seq state most significantly enriched for heritability further elucidates a recently implicated neuron-oligodendrocyte-microglial mechanistic axis, linking early amyloid production in neurons and oligodendrocytes with later microglial activation and immune response. Together, these results highlight ResidPCA's ability to uncover previously hidden biological variation in single-cell data and reveal disease-relevant cell states.
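The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea suggested by the name: remove variation explained by cell type, then run PCA on the residuals. It assumes that "modeling cell type heterogeneity" can be approximated by regressing each gene on one-hot cell type labels; the published ResidPCA method may differ in normalization, covariates, and other details. The function name `residual_pca`, its parameters, and the simulated data are all hypothetical and are not taken from the paper or its software.

```python
import numpy as np


def residual_pca(X, cell_types, n_components=2):
    """Illustrative only: PCA on residuals after regressing out cell type.

    X          : (cells x genes) array of normalized expression values.
    cell_types : length-cells array of cell type labels.
    Returns (scores, loadings, residuals).
    """
    X = np.asarray(X, dtype=float)
    labels, codes = np.unique(np.asarray(cell_types), return_inverse=True)
    # One indicator column per cell type; this acts as a per-type intercept.
    design = np.eye(len(labels))[codes]
    # Least-squares fit of every gene on cell type. With one-hot covariates
    # the fitted values are simply the per-type mean of each gene.
    beta, *_ = np.linalg.lstsq(design, X, rcond=None)
    residuals = X - design @ beta
    # PCA via SVD. The residuals are already column-centered because the
    # indicator columns span the intercept.
    U, S, Vt = np.linalg.svd(residuals, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    return scores, Vt[:n_components], residuals


# Hypothetical simulation: two cell types with very different baselines and
# one binary state shared by both types (balanced within each type).
rng = np.random.default_rng(0)
n_cells, n_genes = 200, 50
cell_types = np.repeat(["A", "B"], n_cells // 2)
state = np.tile([0.0, 1.0], n_cells // 2)

baseline = np.where((cell_types == "A")[:, None], 0.0, 5.0) * rng.normal(size=n_genes)
state_effect = np.zeros(n_genes)
state_effect[:10] = 2.0  # the shared state shifts the first 10 genes
X = baseline + np.outer(state, state_effect) + rng.normal(size=(n_cells, n_genes))

# Residual PCA versus standard PCA on the same matrix.
resid_scores, _, R = residual_pca(X, cell_types, n_components=2)
Xc = X - X.mean(axis=0)
U, S, _ = np.linalg.svd(Xc, full_matrices=False)
std_pc1 = U[:, 0] * S[0]

corr_resid = abs(np.corrcoef(resid_scores[:, 0], state)[0, 1])
corr_std = abs(np.corrcoef(std_pc1, state)[0, 1])
print(f"|corr(PC1, state)|  residual PCA: {corr_resid:.2f}  standard PCA: {corr_std:.2f}")
```

In this toy setting the leading component of standard PCA is dominated by the difference between cell types, whereas the leading residual component tracks the state shared across types, which is the kind of cross-cell-type signal the abstract describes.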

Year of Publication
2025
Journal
HGG advances
Pages
100538
Date Published
10/2025
ISSN
2666-2477
DOI
10.1016/j.xhgg.2025.100538
PubMed ID
41157948