Quantifying axes of inter-sample variability among transcriptional neighborhoods in single-cell datasets

Raychaudhuri Lab, Bioinformatics and Integrative Genomics Program, Harvard Medical School

Raychaudhuri Lab, Harvard Medical School; Brigham and Women’s Hospital

As single-cell datasets grow in sample size, there are increasing efforts to characterize cell states that vary across samples and associate with sample attributes such as disease status. But it is not yet clear how best to summarize single-cell data at the per-sample level to enable comparison across samples. Prevailing approaches typically assume a transcriptional structure, such as a clustering of cells, and represent samples through that lens, e.g., by measuring the abundance of each cluster in each sample. This can be limiting because it assumes that the chosen structure matches the underlying biology. We will present co-varying neighborhood analysis (CNA), an alternative approach with greater granularity and flexibility. CNA characterizes dominant axes of variation across samples by identifying groups of small regions in transcriptional space (termed neighborhoods) that co-vary in abundance across samples, suggesting shared function or regulation. CNA can then test statistically for associations between any sample-level attribute and the abundances of these co-varying neighborhood groups. We will discuss simulation evidence that CNA can identify disease-associated cell states with greater sensitivity and accuracy than a cluster-based approach. We will then show three applications of CNA that reveal a Notch activation signature in rheumatoid arthritis, heterogeneity in monocyte populations expanded in sepsis, and a novel T cell population associated with progression to active tuberculosis.
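
To make the idea concrete, the following is a minimal illustrative sketch of the general workflow described above, not the CNA implementation itself. It rests on simplifying assumptions that are ours: each neighborhood is taken to be a cell plus its k nearest neighbors under brute-force Euclidean distance, co-varying groups are summarized by the top principal components of a samples-by-neighborhoods abundance matrix, and association is assessed with a simple permutation test. All function and parameter names (`neighborhood_abundance`, `covarying_axes`, `association_pvalue`, `k`, `n_axes`, `n_perm`) are hypothetical.

```python
import numpy as np

def knn_indices(X, k):
    """Brute-force k nearest neighbors (Euclidean); fine for small toy data only."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # Column 0 is normally the cell itself (distance 0), so keep k + 1 columns.
    return np.argsort(d2, axis=1)[:, :k + 1]

def neighborhood_abundance(X, sample_ids, n_samples, k=10):
    """Samples x neighborhoods matrix: fraction of each sample's cells
    that fall in each cell-anchored neighborhood (an assumed definition)."""
    nbrs = knn_indices(X, k)
    nam = np.zeros((n_samples, X.shape[0]))
    for j, members in enumerate(nbrs):
        nam[:, j] = np.bincount(sample_ids[members], minlength=n_samples)
    # Normalize by the number of cells contributed by each sample.
    nam /= np.bincount(sample_ids, minlength=n_samples)[:, None]
    return nam

def covarying_axes(nam, n_axes=3):
    """Top axes of inter-sample variation via SVD of the column-centered matrix.
    Returns per-sample scores and per-neighborhood loadings."""
    centered = nam - nam.mean(axis=0)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    return U[:, :n_axes], Vt[:n_axes]

def association_pvalue(sample_scores, attribute, n_perm=1000, seed=0):
    """Permutation p-value for association between a sample-level attribute
    and the per-sample scores on the retained axes."""
    rng = np.random.default_rng(seed)

    def stat(y):
        yc = y - y.mean()
        # Squared projection of the centered attribute onto the retained axes.
        return np.sum((sample_scores.T @ yc) ** 2)

    observed = stat(attribute)
    null = np.array([stat(rng.permutation(attribute)) for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```

In this sketch, neighborhoods with large loadings of the same sign on a given axis play the role of a co-varying group, and the permutation test shuffles the attribute across samples rather than across cells, so the sample remains the unit of analysis.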
