Predicting small-molecule binding sites in proteins using the pair representation of AlphaFold2

Harvard Medical School

Identification of small-molecule binding sites in proteins is an important task for drug discovery. Despite previous homology- and machine-learning-based approaches to this problem, true de novo binding-site prediction remains a challenge. Here, we use features from a pretrained neural network to train a logistic regression model, AF2BIND, for accurate prediction of de novo binding sites. AF2BIND identifies binding sites without relying on homology modeling, multiple sequence alignments, or knowledge of a pocket-compatible ligand. Interpretable aspects of the model can be used to predict chemical properties of compatible ligands. We apply AF2BIND to the human proteome to produce a database that includes thousands of previously unseen binding sites in disease-relevant proteins. We anticipate that AF2BIND will be used to focus drug discovery efforts and uncover functional sites in proteins across the tree of life.
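
The core idea in the abstract, a logistic regression trained on features taken from a pretrained network, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the per-residue feature vectors are synthetic random numbers standing in for features derived from the AlphaFold2 pair representation, the feature dimension and helper names (`make_synthetic_residues`, `train_logistic_regression`, `predict_proba`) are hypothetical, and plain batch gradient descent is used only to keep the example dependency-free. It is not the AF2BIND training procedure.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(features, labels, lr=0.1, epochs=200):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # derivative of the log loss w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(features, weights, bias):
    """Per-residue probability of belonging to a binding site."""
    return [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for x in features
    ]


def make_synthetic_residues(n_residues, n_features, rng):
    """Hypothetical stand-in for per-residue features from a pretrained network.

    Binding-site residues (label 1) are drawn around +1.5 and all other
    residues (label 0) around -1.5, so the two classes are easy to separate.
    """
    features, labels = [], []
    for _ in range(n_residues):
        is_site = 1 if rng.random() < 0.3 else 0
        shift = 1.5 if is_site else -1.5
        features.append([rng.gauss(shift, 1.0) for _ in range(n_features)])
        labels.append(is_site)
    return features, labels


rng = random.Random(0)  # fixed seed so the sketch is reproducible
X, y = make_synthetic_residues(n_residues=200, n_features=8, rng=rng)
weights, bias = train_logistic_regression(X, y)
probs = predict_proba(X, weights, bias)
predicted = [1 if p >= 0.5 else 0 for p in probs]
accuracy = sum(int(a == b) for a, b in zip(predicted, y)) / len(y)
print(f"training accuracy on synthetic residues: {accuracy:.2f}")
```

One reason a linear model is a natural fit for this setup is that each learned weight maps directly to one input feature, which is consistent with the abstract's remark that interpretable aspects of the model can be inspected. In this sketch, `weights` can be read off feature by feature.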
