MIA: Jingyi Jessica Li, Permutation Enhances the Rigor of Genomics Data Analysis; Pan Liu, mcRigor

Jingyi Jessica Li
Fred Hutchinson Cancer Center

Primer: Permutation Enhances the Rigor of Genomics Data Analysis

Reliable single-cell data analysis is critical, particularly when visualizing complex biological structures and addressing data sparsity. This talk introduces two statistical methods, scDEED and mcRigor, that use permutation-based techniques to make these analyses more rigorous.

scDEED [Xia et al. 2024, Nature Communications] evaluates the reliability of two-dimensional (2D) embeddings produced by visualization methods such as t-SNE and UMAP, which are commonly used to visualize cell clusters but can misrepresent the data structure and lead to erroneous interpretations. scDEED calculates a reliability score for each cell embedding by comparing the cell's neighbors in the 2D embedding space with its pre-embedding neighbors. Cells with low reliability scores are flagged as dubious, while those with high scores are deemed trustworthy. scDEED also guides the choice of t-SNE and UMAP hyperparameters by minimizing the number of dubious embeddings, which significantly improved visualization reliability across multiple datasets.

mcRigor [Liu and Li 2025, Nature Communications] targets metacell partitioning in single-cell RNA-seq and ATAC-seq data analysis, a common strategy that addresses data sparsity by aggregating similar single cells into metacells. Existing algorithms often fail to verify metacell homogeneity, risking bias and spurious findings. mcRigor introduces a feature-correlation-based statistic that measures heterogeneity within a metacell and identifies dubious metacells composed of heterogeneous single cells. By optimizing the hyperparameters of metacell partitioning algorithms, mcRigor makes downstream analyses more reliable. It also supports benchmarking and selecting the most suitable partitioning algorithm for a dataset, leading to more robust discoveries.

Together, scDEED and mcRigor demonstrate the power of permutation-based approaches in refining single-cell data analysis, giving researchers tools to reach more accurate and reproducible insights into complex cellular processes.
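
To make the neighbor-consistency idea concrete, here is a minimal Python sketch. It is not the scDEED implementation: the score (the fraction of a cell's k pre-embedding neighbors that remain neighbors in 2D), the null (shuffling which embedding belongs to which cell), the 5%/95% cutoffs, and all function names are assumptions chosen for illustration only.

```python
import math
import random


def knn(points, k):
    """For each point, return the index set of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        # Rank the other points by (distance, index); ties break by index.
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in ranked[:k]})
    return neighbors


def overlap_scores(pre, emb, k):
    """Fraction of each cell's k pre-embedding neighbors kept in the 2D embedding."""
    return [len(a & b) / k for a, b in zip(knn(pre, k), knn(emb, k))]


def classify_cells(pre, emb, k=3, n_perm=20, alpha=0.05, seed=0):
    """Label each cell by comparing its score with a permutation null (assumed scheme)."""
    rng = random.Random(seed)
    observed = overlap_scores(pre, emb, k)
    null = []
    for _ in range(n_perm):
        shuffled = list(emb)
        rng.shuffle(shuffled)  # break the link between cells and their embeddings
        null.extend(overlap_scores(pre, shuffled, k))
    null.sort()
    low = null[int(alpha * len(null))]
    high = null[min(len(null) - 1, int((1 - alpha) * len(null)))]
    labels = []
    for s in observed:
        if s > high:
            labels.append("trustworthy")
        elif s <= low:
            labels.append("dubious")
        else:
            labels.append("intermediate")
    return observed, labels


# Toy data: 30 cells on a line in 3D, embedded faithfully into 2D.
pre = [(float(i), 0.0, 0.0) for i in range(30)]
emb = [(float(i), 0.0) for i in range(30)]
scores, labels = classify_cells(pre, emb)
```

In this toy case the embedding preserves every neighborhood, so each cell scores far above the shuffled null. In a hyperparameter search along the lines the abstract describes, one would rerun the embedding under each candidate setting and prefer the setting with the fewest cells labeled dubious.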

Pan Liu
Fred Hutchinson Cancer Center

Meeting: mcRigor: a statistical method to enhance the rigor of metacell partitioning in single-cell data analysis

In single-cell data analysis, addressing sparsity often involves aggregating the profiles of homogeneous single cells into metacells. However, existing metacell partitioning methods do not check the homogeneity assumption and may aggregate heterogeneous single cells, potentially biasing downstream analysis and leading to spurious discoveries. To fill this gap, we introduce mcRigor, a statistical method that detects dubious metacells, which are composed of heterogeneous single cells, and optimizes the hyperparameter of a metacell partitioning method. The core of mcRigor is a feature-correlation-based statistic that measures the heterogeneity of a metacell, with its null distribution derived from a double permutation scheme. As an optimizer for existing metacell partitioning methods, mcRigor has been shown to improve the reliability of discoveries in single-cell RNA-seq and multiome (RNA+ATAC) data analyses, such as uncovering differential gene co-expression modules, enhancer-gene associations, and temporal gene expression. Moreover, mcRigor enables benchmarking and selection of the most suitable metacell partitioning method, with hyperparameters optimized for the specific dataset, to support reliable downstream analysis. Our results indicate that among existing metacell partitioning methods, MetaCell and SEACells consistently outperform MetaCell2 and SuperCell, albeit with the trade-off of longer runtimes.
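
The intuition behind a feature-correlation-based heterogeneity statistic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not mcRigor itself: the statistic here is simply the mean absolute pairwise Pearson correlation between features within one metacell, and the null uses a single within-metacell permutation of each feature rather than the double permutation scheme named in the abstract. All function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either feature is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def heterogeneity(cells):
    """Mean absolute correlation over all feature pairs (rows are cells)."""
    features = list(zip(*cells))
    pairs = [(i, j) for i in range(len(features)) for j in range(i + 1, len(features))]
    return sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs)


def permutation_pvalue(cells, n_perm=200, seed=0):
    """Compare the observed statistic with a null from shuffling each feature."""
    rng = random.Random(seed)
    observed = heterogeneity(cells)
    features = [list(f) for f in zip(*cells)]
    exceed = 0
    for _ in range(n_perm):
        permuted = []
        for f in features:
            g = list(f)
            rng.shuffle(g)  # shuffle one feature across cells to break co-variation
            permuted.append(g)
        if heterogeneity(list(zip(*permuted))) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


# Toy metacell whose three features rise and fall together across 12 cells,
# as they would if the metacell mixed cells from a continuum of states.
mixed = [(i, 2 * i, 3 * i) for i in range(12)]
stat, pval = permutation_pvalue(mixed)
```

If a metacell is truly homogeneous, its cells differ only by measurement noise, so features should be close to uncorrelated and the statistic should look like the permuted values. A small p-value, as for the toy metacell above, would mark the metacell as dubious under this simplified scheme.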

Resources:

  • Paper:

 
