PMCID
PMC13228633

Unique molecular identifiers don't need to be unique: a collision-aware estimator for RNA-seq quantification.

bioRxiv : the preprint server for biology
Authors
Keywords
Abstract

RNA-sequencing (RNA-seq) relies on Unique Molecular Identifiers (UMIs) to accurately quantify gene expression after PCR amplification. Longer UMIs minimize collisions (where two distinct transcripts are assigned the same UMI) at the expense of increased sequencing and synthesis costs. However, it is not clear how long UMIs need to be in practice, especially given the nonuniformity of the empirical UMI distribution. In this work, we develop a method-of-moments estimator that accounts for UMI collisions, accurately quantifying gene expression and preserving downstream biological insights. We show that UMIs need not be unique: shorter UMIs can be used with a more sophisticated estimator.
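The abstract does not spell out the estimator, so the following is only a minimal sketch of how a collision-aware, moment-matching estimate could work, not the authors' method. It assumes that each molecule draws its UMI independently from a known probability vector (uniform or not) and that PCR duplicates have already been collapsed to a count of distinct UMIs per gene. Under those assumptions the expected number of distinct UMIs from n molecules is the sum over UMIs of 1 - (1 - p_i)^n, and the estimate is the n at which this expectation equals the observed count. The function names `expected_distinct_umis` and `estimate_molecules` are hypothetical.

```python
import math


def expected_distinct_umis(n, probs):
    """Expected number of distinct UMIs when n molecules each draw a UMI from probs."""
    return sum(1.0 - (1.0 - p) ** n for p in probs)


def estimate_molecules(observed_distinct, probs, tol=1e-9):
    """Moment-matching estimate (illustrative assumption, not the paper's estimator).

    Finds the molecule count n whose expected distinct-UMI count equals the
    observed distinct-UMI count, using bisection on a monotone function.
    """
    support = sum(1 for p in probs if p > 0)
    if observed_distinct <= 0:
        return 0.0
    if observed_distinct >= support:
        # Every possible UMI was seen, so the count is not identifiable.
        return math.inf

    # Grow the upper bracket until the expectation exceeds the observation.
    lo, hi = 0.0, 1.0
    while expected_distinct_umis(hi, probs) < observed_distinct:
        hi *= 2.0

    # Bisection: the expectation is increasing in n.
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if expected_distinct_umis(mid, probs) < observed_distinct:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical example: length-4 UMIs, uniform over 4**4 = 256 sequences.
    uniform = [1.0 / 256] * 256
    # The estimate exceeds the naive count of 100 because some molecules collided.
    print(estimate_molecules(100, uniform))
```

With uniform probabilities this reduces to the closed form n = log(1 - d/K) / log(1 - 1/K) for d observed distinct UMIs out of K possible; the numerical solve is used here so that a nonuniform empirical UMI distribution can be passed in unchanged.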

Year of Publication
2026
Journal
bioRxiv : the preprint server for biology
Date Published
05/2026
ISSN
2692-8205
DOI
10.1101/2025.09.08.674884
PubMed ID
42239132