PMCID
PMC13015563

Constrained Diffusion as a Paradigm for Evolution.

bioRxiv : the preprint server for biology
Abstract

A foundational question in computational biology is how to use data to describe the forces driving evolution. Here, we propose a novel view of evolution as a diffusion process constrained by the many biological, physical, and environmental factors that affect organism viability at any given time. We introduce DiffEvol, a framework that models evolution as constrained diffusion over a discrete genotype space. Using real-world genomic sequence data alone, DiffEvol estimates complex evolutionary constraints by inverting the diffusion dynamics to recover a constrained subspace representing the viable genotype manifold, as well as its evolution over time. Applied to SARS-CoV-2 sequence data from 2020 to 2024, DiffEvol reconstructs constraint functions that recapitulate known viral fitness trends, including a pronounced "phase transition" that followed the widespread adoption of the SARS-CoV-2 vaccine. Our constrained subspace representation of the data characterizes such features and trends more clearly. This framework could be used not only to improve forecasting of emergent pathogenic strains, but also to produce more accurate reverse-time analyses of their evolutionary dynamics, helping to identify ancestral variants and the forces that have shaped a pathogen's evolutionary trajectory. More generally, this formulation provides a method for linking observed sequence mutations to an evolving fitness landscape. Thus, constrained subspace diffusion offers a mathematical language for evolutionary dynamics in any system where random variation interacts with slowly changing structural or global constraints, and it can be applied to more complex evolutionary phenomena such as vaccine resistance, viral escape, and protein evolution.
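As a rough illustration of what "constrained diffusion over a discrete genotype space" can mean, the sketch below runs a forward random walk on binary genotypes in which single-site mutations are rejected whenever they would leave a viable set. It is a toy under stated assumptions, not the DiffEvol method: the binary encoding, the `viable` mask, and the function names `transition_matrix` and `diffuse` are all hypothetical, and the abstract's inverse problem of estimating the constraints from sequence data is not attempted here.

```python
import numpy as np


def transition_matrix(n_sites, viable):
    """Row-stochastic matrix for a bit-flip random walk restricted to viable genotypes.

    Genotypes are integers in [0, 2**n_sites); each bit is one site (assumption).
    A proposed single-site mutation is accepted only if the mutant is viable;
    otherwise the walker stays where it is.
    """
    n = 2 ** n_sites
    T = np.zeros((n, n))
    for g in range(n):
        if not viable[g]:
            # Non-viable genotypes never receive mass; a self-loop keeps the row stochastic.
            T[g, g] = 1.0
            continue
        for site in range(n_sites):
            h = g ^ (1 << site)  # flip one site
            if viable[h]:
                T[g, h] += 1.0 / n_sites
            else:
                T[g, g] += 1.0 / n_sites  # rejected mutation
    return T


def diffuse(p0, T, steps):
    """Propagate a genotype distribution p0 forward for a number of steps."""
    p = np.asarray(p0, dtype=float)
    for _ in range(steps):
        p = p @ T
    return p


# Toy example: 3 sites, with two genotypes declared non-viable (hypothetical constraint).
n_sites = 3
viable = np.ones(2 ** n_sites, dtype=bool)
viable[[0b011, 0b111]] = False

p0 = np.zeros(2 ** n_sites)
p0[0b000] = 1.0  # all mass starts on one ancestral genotype

T = transition_matrix(n_sites, viable)
p = diffuse(p0, T, steps=50)
print(np.round(p, 3))
```

In this toy the probability mass spreads by mutation but never enters the excluded genotypes, so the support of the distribution traces out the viable set. Letting `viable` change between steps would be one simple way to mimic constraints that evolve over time.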

Year of Publication
2026
Journal
bioRxiv : the preprint server for biology
Date Published
03/2026
ISSN
2692-8205
DOI
10.64898/2026.03.10.710948
PubMed ID
41889840