Multi-trait protein engineering - a synergistic ML-wet lab approach to AAV engineering
Fatma Elzahraa Eid
A major challenge in protein engineering is introducing de novo protein-protein interaction motifs. We are interested in creating ML models that learn the relationship between sequences at specific sites on a protein of interest and the pre-existing (e.g., stability) or de novo (e.g., target binding) functions that those sequences drive. The challenge is compounded when seeking to predict sequences that possess multiple traits, because false positives surge when multiple models are combined. Using a case study in adeno-associated virus (AAV) capsid engineering, I will address two key, interconnected questions in applying ML to protein engineering: (1) How can wet lab experiments be designed to produce context-specific data that are less biased, less noisy, and that drive meaningful learning? (2) How can ML models be trained and used to accurately predict protein variants that simultaneously possess multiple traits of interest?
In the Deverman vector engineering group, we established Fit4Function, an ML-guided approach for systematically engineering multi-trait AAV capsids (Eid et al., bioRxiv 2022.12.22.521680). The Fit4Function strategy relies on designing capsid libraries that evenly sample the manufacturable sequence space, so that screens of these libraries generate reproducible data that can be used to train accurate sequence-to-function models. Because the Fit4Function data are low in bias and noise, our models accurately predicted the functions of capsid variants in independently manufactured libraries. We demonstrated that these models could be applied in combination without suffering a high false positive rate. The Fit4Function strategy is a critical step toward assembling an ML atlas that predicts AAV capsid performance across dozens of traits, and it should be generalizable to other protein and non-protein targets of interest. In this seminar, I will highlight the ML insights and solutions developed while building the Fit4Function AAV engineering platform to enable multi-trait protein learning.
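To make concrete why false positives surge when several single-trait models are combined, the following is a minimal sketch and not part of Fit4Function. It assumes each model's errors are independent of the others', so the chance that a variant passing every filter truly has every trait is the product of the per-model precisions. The function names and the example precision values are hypothetical, chosen only for illustration.

```python
from math import prod


def joint_precision(precisions):
    """Fraction of variants passing every single-trait filter that truly
    have every trait, ASSUMING each model's errors are independent.

    `precisions` holds one precision value (in [0, 1]) per trait model.
    """
    for p in precisions:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"precision must be in [0, 1], got {p}")
    return prod(precisions)


def joint_false_discovery_rate(precisions):
    """Fraction of jointly selected variants expected to lack at least
    one trait, under the same independence assumption."""
    return 1.0 - joint_precision(precisions)


if __name__ == "__main__":
    # Illustrative only: every model is given the same hypothetical precision.
    for n_traits in (1, 3, 6):
        fdr = joint_false_discovery_rate([0.9] * n_traits)
        print(f"{n_traits} trait model(s): joint false discovery rate = {fdr:.3f}")
```

Under this assumption, three models that are each 90% precise yield a joint precision of 0.9 × 0.9 × 0.9 = 0.729, so about 27% of the jointly selected variants would be expected to fail at least one trait. Real models trained on related data may have correlated errors, in which case the product is only a rough guide.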