PMCID
PMC13061034

Gene conversion is a key driver of diversity hotspots in antigens and virulence-associated loci.

bioRxiv : the preprint server for biology
Authors
Abstract

Despite the long-held view of Mycobacterium tuberculosis as a genetically conserved pathogen, many genomic regions remain poorly resolved due to high sequence homology and repetitive content. Using complete genome assemblies generated from long-read sequencing of 151 globally representative clinical isolates, we comprehensively analyzed patterns of genetic diversity and evolution across the genome. Our analysis uncovers pronounced diversity hotspots within paralogous regions generated by recurrent gene conversion between homologous genes. In many cases, these hotspots exhibit more than an order of magnitude greater genetic diversity than the rest of the genome, which is otherwise characterized by remarkably low variation. Mutations within these regions display clustered substitution patterns, excess paralog-matching variants, and distinct mutational spectra consistent with ongoing gene conversion. Our analysis identifies over 300 individual gene conversion events distributed throughout the phylogeny. These gene conversion events occur predominantly within gene families associated with virulence and host-pathogen interactions, including the PE, PPE, and ESX families. Several of the most pronounced diversity hotspots occur in antigens encoded within paralogous regions. Among these, the vaccine candidate PPE18 harbors mutations in validated epitope sequences and shows predicted alterations in HLA-II binding. Together, these findings demonstrate that gene conversion actively shapes antigenic and virulence-associated diversity in M. tuberculosis.

Year of Publication
2026
Journal
bioRxiv : the preprint server for biology
Date Published
03/2026
ISSN
2692-8205
DOI
10.64898/2026.02.26.708061
PubMed ID
41959376