Protein design with generative diffusion models

Sarah Alamdari
BioML Group, Microsoft Research

Primer: An introduction to diffusion models for protein design

Generative AI is one of the most promising avenues towards gaining a deeper understanding of the vast amount of data available in our modern world. A generative model aims to learn the underlying patterns of a dataset, resulting in the ability to synthesize new and realistic data examples. Denoising diffusion probabilistic models are the current state-of-the-art generative deep learning approach behind some of the most striking examples of image (DALL-E) and text (GENIE) generation. Notably, diffusion models have recently been used to generate realistic protein sequences and structures, proving to be a powerhouse in the life-sciences domain. In this talk, we introduce the theoretical foundation and formulation of diffusion models, then discuss their application to protein design problems.

 

Ava Amini
Microsoft Research

Meeting: Bridging Biophysics and AI to Optimize Protein Design

Engineered proteins play increasingly essential roles in applications spanning pharmaceuticals, molecular tools, synthetic biology, and more. Deep generative models offer the ability to accelerate protein engineering for therapeutic and biological applications. Recently, a family of generative models called diffusion models has demonstrated the potential for unprecedented capability and control in de novo design. In this talk, we introduce biologically grounded diffusion models for generating protein structures and sequences.

We first share work in creating a new diffusion-based generative model that designs protein structures by mirroring the biophysics of the native protein folding process. To expand beyond the subset of protein biology captured in structural data, we reasoned that sequence – not structure – could serve as a universal design space for protein generation. We thus developed a general-purpose diffusion framework, EvoDiff, that combines evolutionary-scale data with the distinct conditioning capabilities of diffusion models for controllable protein design in sequence space alone. We envision that these modeling frameworks will enable new capabilities in protein engineering towards programmable, functional design.

 

For more information visit: /mia.