Evaluation of DNA encoded library and machine learning model combinations for hit discovery.

npj drug discovery
Authors
Abstract

DNA-Encoded Library (DEL) technology allows the screening of millions to billions of compounds in a pooled fashion, which is faster and cheaper than traditional approaches. The massive amounts of DEL binder and not-binder data enable Machine Learning (ML) model development and virtual screening of readily accessible, drug-like libraries in an ultra-high-throughput fashion. Here, we report a comparative assessment of the DEL + ML pipeline for hit discovery using three DELs and five ML models (fifteen DEL + ML combinations). Each ML model was used to identify orthosteric binders of two therapeutic targets, casein kinase 1α and 1δ (CK1α/δ). Overall, 10% of the predicted binders and 94% of the predicted not-binders were confirmed in biophysical assays, including two nanomolar binders (187 and 69.6 nM). Our study provides insights into the DEL + ML paradigm for hit discovery: the importance of chemical diversity in training data and of ML model generalizability over accuracy. We have publicly shared our results for further use and similar developments.

Year of Publication
2025
Journal
npj drug discovery
Volume
2
Issue
1
Date Published
04/2025
ISSN
3005-1452
DOI
10.1038/s44386-025-00007-4
PubMed ID
42380223