Learning, Predicting, and Interpreting Omics Data with Biologically Informed Models

Institute for Computational Biomedicine, Heidelberg University

Abstract:

High-throughput omics assays capture molecular states at scale, yet building robust, interpretable, and actionable models from these high-dimensional data remains difficult. Key challenges include limited numbers of observed conditions and perturbations, technical confounding and batch effects, and separating causal effects of interventions from correlations driven by shared regulation. Prior biological knowledge can help overcome these limitations by constraining the hypothesis space, providing contextual structure, and encoding mechanistic assumptions that are not directly identifiable from data alone. This talk starts with CORNETO, a unified optimization framework for combining prior biological knowledge with omics data to infer interpretable, context-specific biological networks. Using constrained optimization over prior-knowledge graphs, we show how this framework supports principled network inference across diverse biological settings. We illustrate this with its application in the EU-funded DECIDER research project (Clinical Decision via Integrating Multiple Data Levels to Overcome Chemotherapy Resistance in High-Grade Serous Ovarian Cancer), where it is used to identify molecular mechanisms of resistance from transcriptomics data. We then show how restricting attention to specific classes of CORNETO problems, especially convex ones, allows these problems to be embedded as convex layers and used as hard inductive biases in machine learning models. This provides a principled route to biologically informed neural networks for different biological problems. We close with results and lessons learned from recent competitions in perturbation biology, including the 1st Virtual Cell Challenge, focusing on what these benchmarks reveal about the strengths and limitations of current models.

Biography:

Pablo Rodriguez-Mier is a research scientist at the Institute for Computational Biomedicine, Heidelberg University, in the Saez-Rodriguez group, and a visitor at EMBL-EBI (European Bioinformatics Institute). His research focuses on biological network inference and predictive models of biological perturbation responses. More broadly, and drawing on a background in computer science, he is interested in understanding how complex biological systems behave under different conditions, using mechanistic, statistical, and machine learning models that integrate prior biological knowledge with experimental data. Previously, he was a postdoctoral researcher in Computational Systems Biology at INRAE Toxalim (Toulouse, France), where he developed computational models to understand and predict the metabolic deregulation of cancer cells caused by mutations in the TP53 gene.
