Capturing sequence ambiguity among taxa in a primer-specific manner to improve taxonomic classification of amplicon sequencing.
| Authors | |
| Abstract | Amplicon sequencing, a common strategy to taxonomically profile microbial communities, is relatively low cost and high throughput. However, it is subject to unique biases, including primer incompatibilities and the inability to differentiate between certain microbes due to low sequence variability. Due to this, taxa may be mis-, multiply-, or un-identified when using different variable regions. To address this, we developed Parathaa (Preserving and Assimilating Region-specific Ambiguities in Taxonomic Hierarchical Assignments for Amplicons), which directly models taxonomic sequence ambiguities within amplicon regions and allows for assignments to multiple taxonomic labels when phylogenetically warranted. Parathaa accomplishes this by leveraging full-length sequence databases to build primer-specific phylogenies, which it uses to identify variable-region-specific taxonomic distance thresholds. Parathaa then assigns taxonomy to sequences by placing them into these trees, allowing for multiple assignments if the tree is not resolved at the placement location. Thus, Parathaa's assignments capture biological ambiguities specific to the sequenced variable region. Parathaa performed better than both IDTAXA and RDP-based Naïve Bayes classifiers with or without exact matching (as implemented in DADA2) at the species level when applied to a synthetic dataset from across the bacterial kingdom. Overall, Parathaa's approach allows users to retain more information and understand potential sources of bias when classifying amplicon reads. |
| Year of Publication | 2025 |
| Journal | Nucleic Acids Research |
| Volume | 53 |
| Issue | 22 |
| Date Published | 11/2025 |
| ISSN | 1362-4962 |
| DOI | 10.1093/nar/gkaf1291 |
| PubMed ID | 41325771 |
| Links | |