PMCID
PMC12873960

SCANBIT facilitates identification of tumor cell populations in scRNAseq data using pseudobulked SNV calls.

bioRxiv : the preprint server for biology
Authors
Abstract

MOTIVATION: Single-cell RNAseq (scRNAseq) is an ideal tool for characterizing heterogeneity within the tumor microenvironment; however, accurately identifying tumor cells can be challenging. Reference-based methods can be inaccurate, and suitable reference datasets are not always available. Current purpose-built methods can also be inaccurate, particularly for highly heterogeneous tumor types. Improved methods are needed. We explored the use of genetic variants to distinguish tumor from normal cells within scRNAseq data.

RESULTS: We characterized the limitations inherent to calling variants from scRNAseq data, quantifying how data sparsity precludes genetic distance calculation between single cells. As a novel workaround, we pooled data from transcriptionally similar cell clusters to call high-quality variants, calculated pairwise differences between cell populations, and performed hierarchical clustering. We quantified confidence in the genetic divergence between tumor and normal cell populations using bootstrapping. We performed extensive validation on ground-truth datasets to assess how accurately tumor cells were identified. Applying our method to human scRNAseq samples highlighted the utility of the approach and revealed how mutational burden influences successful tumor cell identification.

Improved cell type assignment in scRNAseq data will facilitate analysis of tumor samples and, in turn, accelerate our understanding of the mechanisms underlying tumor progression and reveal potential biological vulnerabilities that can be exploited to develop improved treatment options.
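The workflow summarized in RESULTS (pseudobulked variant calls per cluster, pairwise differences, hierarchical clustering, bootstrapped confidence) can be illustrated with a minimal Python sketch. This is not the SCANBIT implementation. The function names, the toy genotype table, the distance definition (fraction of discordant calls among sites called in both clusters), the choice of average linkage, and the decision to resample variant sites during bootstrapping are all assumptions made for illustration; the abstract does not specify these details.

```python
import random
from itertools import combinations

# Hypothetical pseudobulked calls per cluster: 1 = alternate allele seen,
# 0 = reference only, None = no confident call at that site.
calls = {
    "tumor_a": [1, 1, 1, 1, 0, 1, 1, 0],
    "tumor_b": [1, 1, 1, None, 0, 1, 1, 0],
    "immune":  [0, 0, 0, 0, 0, 1, 1, 0],
    "stroma":  [0, 0, 0, 0, 0, 1, None, 0],
}

def pairwise_distance(a, b, sites):
    """Fraction of discordant calls among the given sites called in both clusters."""
    shared = [(a[i], b[i]) for i in sites if a[i] is not None and b[i] is not None]
    if not shared:
        return 0.0  # simplification: no shared sites is treated as no evidence of divergence
    return sum(x != y for x, y in shared) / len(shared)

def top_split(calls, sites):
    """Average-linkage agglomerative clustering, stopped when two groups remain."""
    dist = {
        frozenset((p, q)): pairwise_distance(calls[p], calls[q], sites)
        for p, q in combinations(calls, 2)
    }

    def linkage(g, h):
        # Mean distance over all cross-group cluster pairs.
        return sum(dist[frozenset((p, q))] for p in g for q in h) / (len(g) * len(h))

    groups = [frozenset([name]) for name in calls]
    while len(groups) > 2:
        g, h = min(combinations(groups, 2), key=lambda pair: linkage(*pair))
        groups = [x for x in groups if x not in (g, h)] + [g | h]
    return frozenset(groups)

def bootstrap_support(calls, n_boot=200, seed=0):
    """Share of site-resampled replicates that reproduce the observed top-level split."""
    n_sites = len(next(iter(calls.values())))
    observed = top_split(calls, range(n_sites))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        resampled = [rng.randrange(n_sites) for _ in range(n_sites)]
        hits += top_split(calls, resampled) == observed
    return observed, hits / n_boot

split, support = bootstrap_support(calls)
print(sorted(sorted(group) for group in split), support)
```

In this toy table, the first four sites stand in for somatic variants private to the tumor clusters and the remaining sites for variants shared by all clusters, so the top-level split is expected to separate the two tumor clusters from the immune and stroma clusters. Real data would need far more sites and a principled treatment of missing calls.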

Year of Publication
2026
Journal
bioRxiv : the preprint server for biology
Date Published
01/2026
ISSN
2692-8205
DOI
10.64898/2026.01.27.701834
PubMed ID
41659636