Diffusion for molecule generation/Structured variational autoencoders for prediction and optimization

Alexandru Dumitrescu
Aalto University

Dani Korpela
Aalto University

Primer: Diffusion for molecule generation

In silico molecule generation enables the rapid creation of an initial pool of drug-like molecules, potentially accelerating drug discovery and design significantly. We explore the use of deep generative models for molecular generation, focusing on recent advancements and challenges. We begin with an introduction to diffusion models, a powerful framework for generating high-quality data through iterative noising and denoising processes. Diffusion models on point cloud representations of molecules have been extensively explored, with prevailing methodologies focusing on neural network parametrizations that are E(3) invariant. We then examine Field-based Molecule Generation (FMG). Instead of moving points in 3D space, we generate 3D vector fields, and we analyze the pros and cons of the two representations. In a crucial difference from point-cloud approaches, we do not constrain our architectures to be rotationally invariant in FMG. We show that diffusion models that preserve rotational invariance must either disregard molecular chirality or become intractable. We end our discussion with next steps and ideas on how to integrate molecular generation methods as exploratory models in drug-discovery pipelines.
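As a rough illustration of the noising half of a diffusion model on a point cloud, the sketch below applies the closed-form forward step of a DDPM-style, variance-preserving process to a toy set of atom coordinates. The linear schedule, the number of steps, the function names, and the three-atom geometry are all assumptions made for this example; they are not taken from the talk and do not represent the speakers' method.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_point_cloud(points, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, eps ~ N(0, I)."""
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [
        tuple(signal * coord + sigma * rng.gauss(0.0, 1.0) for coord in point)
        for point in points
    ]


# Hypothetical three-atom toy geometry (arbitrary units), for illustration only.
atoms = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
slightly_noised = noise_point_cloud(atoms, alpha_bars[0], rng)
heavily_noised = noise_point_cloud(atoms, alpha_bars[-1], rng)
```

Under this assumed schedule the first step barely perturbs the coordinates, while the last step leaves almost no trace of the original geometry. A generative model would then be trained to reverse such steps.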

Harri Lähdesmäki
Aalto University

Meeting: Structured variational autoencoders for prediction and optimization

The variational autoencoder (VAE) is a neural architecture that learns a deep latent variable model using an amortized variational inference model, and it has become a popular approach to generative modeling and high-dimensional data analysis. Structured VAEs extend vanilla VAEs by incorporating probabilistic graphical models to account for dependencies in the prior over the latent variables, which naturally extends VAEs to temporal modeling and sequential decision making. In this talk, I will present our recent efforts in developing structured VAEs for modeling high-dimensional temporal and spatio-temporal data as well as for high-dimensional Bayesian optimization. Our proposed models are defined as Gaussian process prior VAEs or as latent neural ODEs or PDEs. I will discuss these models and their robust and computationally efficient learning methods. I will also highlight some applications in longitudinal modeling of electronic health records and dynamical modeling of physical systems as well as single-cell data.
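For readers new to the topic, the standard VAE objective and one generic way a structured prior enters it can be written as follows. The notation (encoder q_phi, decoder p_theta, latents z_i for observations x_i) is assumed here for illustration, and the second expression is a textbook form for a factorized amortized posterior with a joint prior, not necessarily the objective used in the talk.

```latex
% Standard VAE evidence lower bound (ELBO) with a factorized prior p(z):
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)

% With a structured (joint) prior over the latents of N observations,
% for example a Gaussian process prior indexed by time:
\log p_\theta(x_{1:N}) \;\ge\;
  \sum_{i=1}^{N} \mathbb{E}_{q_\phi(z_i \mid x_i)}\big[\log p_\theta(x_i \mid z_i)\big]
  - \mathrm{KL}\Big(\prod_{i=1}^{N} q_\phi(z_i \mid x_i) \,\Big\|\, p(z_{1:N})\Big)
```

The only change between the two bounds is the prior: replacing independent p(z_i) with a joint p(z_{1:N}) is what lets the latent variables carry temporal or other dependencies.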
