MIA: Jacob Schreiber, Programmatic design and editing of cis-regulatory elements; Gregory Andrews
Jacob Schreiber
UMass Chan Medical School
Meeting: Programmatic design and editing of cis-regulatory elements using deep learning
The development of modern genome editing tools has enabled researchers to make genomic edits with high precision, but has left unsolved the problem of designing these edits. As a solution, we propose Ledidi, a computational method that rephrases the discrete task of designing genomic edits as a continuous optimization problem where the goal is to produce the desired outcome, as measured by one or more predictive models, using as few edits from an initial template sequence as possible. Ledidi can be paired with almost any trained machine learning model and, when applied across dozens of such models, can quickly design edits to precisely control predicted transcription factor binding, chromatin accessibility, transcription, and enhancer activity across several species. After demonstrating these capabilities, we used Ledidi to design cell type-specific enhancers and validated the designs using STARR-seq. We found that not only did the designs qualitatively induce cell type-specificity, but they also quantitatively controlled regulatory strength, with some designed enhancers exhibiting far greater activity than any naturally occurring enhancer.
Gregory Andrews
UMass Chan Medical School
Primer: Inferring transcription factor binding from base-pair level chromatin accessibility using deep learning contribution scores
The human genome encodes over 1,600 sequence-specific transcription factors (TFs) that orchestrate gene expression by binding cis-regulatory elements (CREs). Despite their central role in gene regulation, fewer than 1% of all TF–cell-type combinations have been experimentally characterized. Deep learning has transformed regulatory genomics, enabling accurate prediction of molecular phenomena, including TF binding, directly from sequence. Yet deep neural networks are often regarded as “black boxes,” raising fundamental questions about what they learn and why particular predictions are made. I will discuss these challenges, particularly the limits of model generalization and the difficulty of interpreting learned representations, in the context of transcription factor binding prediction. I will trace the development of interpretation methods, from early activation-based approaches in DeepBind, the first deep learning model applied to regulatory genomics, to modern attribution frameworks. I will then show that base-pair-level contribution scores derived from chromatin accessibility models reveal interpretable sequence features consistent with transcription factor binding. Finally, I will demonstrate how these inferred binding sites can be leveraged to predict TF occupancy across diverse cell types, offering a scalable and interpretable framework for decoding gene regulation.