Capturing sequence ambiguity among taxa in a primer-specific manner to improve taxonomic classification of amplicon sequencing.

Nucleic acids research
Authors
Abstract

Amplicon sequencing is a common, relatively low-cost, high-throughput strategy for taxonomically profiling microbial communities. However, it is subject to unique biases, including primer incompatibilities and the inability to differentiate between certain microbes due to low sequence variability. As a result, taxa may be misidentified, assigned multiple identities, or left unidentified, depending on the variable region sequenced. To address this, we developed Parathaa (Preserving and Assimilating Region-specific Ambiguities in Taxonomic Hierarchical Assignments for Amplicons), which directly models taxonomic sequence ambiguities within amplicon regions and allows assignment to multiple taxonomic labels when phylogenetically warranted. Parathaa accomplishes this by leveraging full-length sequence databases to build primer-specific phylogenies, which it uses to identify variable-region-specific taxonomic distance thresholds. Parathaa then assigns taxonomy to sequences by placing them into these trees, allowing for multiple assignments if the tree is not resolved at the placement location. Thus, Parathaa's assignments capture biological ambiguities specific to the sequenced variable region. At the species level, Parathaa performed better than both IDTAXA and RDP-based Naïve Bayes classifiers, with or without exact matching (as implemented in DADA2), when applied to a synthetic dataset from across the bacterial kingdom. Overall, Parathaa's approach allows users to retain more information and understand potential sources of bias when classifying amplicon reads.
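
The idea of allowing multiple assignments where a tree is unresolved can be illustrated with a toy sketch. This is not Parathaa's implementation or API: the `Node` class, the `assign_species` function, the branch lengths, and the threshold values are all hypothetical, chosen only to show how a placement node and a distance threshold could together yield one label, several labels, or none.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Toy tree node (hypothetical); only tips carry a species label."""
    branch_length: float = 0.0            # distance to the parent node
    species: Optional[str] = None         # set on tips, None on internal nodes
    children: List["Node"] = field(default_factory=list)


def tips_with_depth(node: Node, depth: float = 0.0) -> Iterator[Tuple[str, float]]:
    """Yield (species, distance from the starting node) for each tip below node."""
    if not node.children:
        yield node.species, depth
        return
    for child in node.children:
        yield from tips_with_depth(child, depth + child.branch_length)


def assign_species(placement: Node, threshold: float) -> List[str]:
    """Return every species label found within `threshold` of the placement node.

    One label means an unambiguous call, several labels mean the region cannot
    separate those taxa, and an empty list means no species-level call.
    The threshold here is an illustrative assumption, not a published value.
    """
    labels = {sp for sp, d in tips_with_depth(placement) if d <= threshold}
    return sorted(labels)


# Toy tree with made-up branch lengths: two near-identical tips and one distant tip.
tip_a = Node(0.001, "Escherichia coli")
tip_b = Node(0.002, "Shigella flexneri")
tip_c = Node(0.05, "Salmonella enterica")
inner = Node(0.01, children=[tip_a, tip_b])
root = Node(0.0, children=[inner, tip_c])

ambiguous = assign_species(inner, threshold=0.01)   # query placed at the unresolved node
unassigned = assign_species(root, threshold=0.001)  # query placed too deep in the tree
```

In this sketch, a query placed at `inner` keeps both labels rather than forcing a single, possibly wrong, species call, while a query placed at `root` with a tight threshold receives no species-level assignment.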

Year of Publication
2025
Journal
Nucleic acids research
Volume
53
Issue
22
Date Published
11/2025
ISSN
1362-4962
DOI
10.1093/nar/gkaf1291
PubMed ID
41325771