Vivien Jiang
Vivien is a rising senior at Johns Hopkins (Class of ’26) studying biomedical engineering with minors in computer science and in applied mathematics and statistics.
I thought I came to work on a summer project, but I walked away learning what scientific research is actually about, and why I want to keep doing it.
Protein Foundation Models (PFMs) enable researchers to predict a diverse set of protein properties, including structural characteristics, functional properties, and mutational fitness effects, directly from amino acid sequences. As with other foundation models, PFMs are trained on large-scale protein sequence datasets using self-supervised learning rather than being explicitly trained for specific prediction tasks, so their prediction capabilities emerge from that self-supervised training. Because of this generality, researchers choosing a PFM for a task often assume that improvements from self-supervised training translate uniformly across all downstream tasks. This assumption implies that the performance rankings of different models should remain consistent across prediction tasks. We test this hypothesis on the ProteinGym dataset, which comprises deep mutational scanning experiments that quantify the fitness effects of single amino acid substitutions across five functional categories: activity, organismal fitness, stability, expression, and binding. Surprisingly, we find that the performance rankings of state-of-the-art PFMs are unstable across these tasks, contradicting the desired generality of current PFMs. We also investigate whether similarity in models' internal representations of these tasks correlates with ranking stability, which offers deeper insight into potential causes of the instability. Given the widespread use of PFMs in academic and industrial research and applications, our findings inform the selection of appropriate models for specific biological tasks and motivate future work on understanding and improving task generalization in protein foundation models.
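As a rough illustration of what "ranking stability" means here, the sketch below ranks a few models on two task categories and compares the two rankings with Kendall's tau. It is not the project's actual analysis: the choice of Kendall's tau, the model names (`model_a`, `model_b`, `model_c`), and the scores are all hypothetical placeholders, not results.

```python
from itertools import combinations


def ranks(scores):
    """Rank models by score, 1 = best. Assumes no tied scores."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {model: position + 1 for position, model in enumerate(order)}


def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same models (no ties)."""
    models = list(rank_a)
    concordant = discordant = 0
    for m1, m2 in combinations(models, 2):
        # Positive product: both rankings order the pair the same way.
        agreement = (rank_a[m1] - rank_a[m2]) * (rank_b[m1] - rank_b[m2])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    n_pairs = len(models) * (len(models) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical per-task scores (placeholders, NOT real benchmark results).
scores = {
    "stability": {"model_a": 0.60, "model_b": 0.50, "model_c": 0.40},
    "binding": {"model_a": 0.30, "model_b": 0.45, "model_c": 0.35},
}

task_ranks = {task: ranks(task_scores) for task, task_scores in scores.items()}
tau = kendall_tau(task_ranks["stability"], task_ranks["binding"])
print(f"Kendall's tau between stability and binding rankings: {tau:.3f}")
```

A tau near 1 would mean the two tasks rank the models almost identically, while values near 0 or below indicate that the best model on one task is not reliably the best on the other.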
Project: Ranking Instability in Protein Foundation Models Across Functional Tasks
Mentors: Olawale Salaudeen, Eric and Wendy Schmidt Center