Evaluation of DNA encoded library and machine learning model combinations for hit discovery.
| Authors | |
| Abstract | DNA-Encoded Library (DEL) technology allows the screening of millions to billions of compounds in a pooled fashion, which is faster and cheaper than traditional approaches. The massive amounts of DEL binder and not-binder data enable Machine Learning (ML) model development and virtual screening of readily accessible, drug-like libraries in an ultra-high-throughput fashion. Here, we report a comparative assessment of the DEL + ML pipeline for hit discovery using three DELs and five ML models (fifteen DEL + ML combinations). Each ML model was used to identify orthosteric binders of two therapeutic targets, Casein kinase 1α/δ (CK1α/δ). Overall, 10% of the predicted binders and 94% of the predicted not-binders were confirmed in biophysical assays, including two nanomolar binders (187 and 69.6 nM). Our study provides insights into the DEL + ML paradigm for hit discovery: the importance of chemical diversity in training data and of ML model generalizability over accuracy. We publicly shared our results for further use and similar developments. |
| Year of Publication | 2025 |
| Journal | npj Drug Discovery |
| Volume | 2 |
| Issue | 1 |
| Date Published | 04/2025 |
| ISSN | 3005-1452 |
| DOI | 10.1038/s44386-025-00007-4 |
| PubMed ID | 42380223 |
| Links |