Inferring transcription factor binding from base-pair-level chromatin accessibility using deep learning contribution scores

UMass Chan Medical School

The human genome encodes over 1,600 sequence-specific transcription factors (TFs) that orchestrate gene expression by binding cis-regulatory elements (CREs). Despite their central role in gene regulation, fewer than 1% of all TF–cell-type combinations have been experimentally characterized. Deep learning has transformed regulatory genomics, enabling accurate prediction of molecular phenomena, including TF binding, directly from sequence. Yet deep neural networks are often regarded as “black boxes,” raising fundamental questions about what they learn and why they make particular predictions. I will discuss these challenges, particularly the limits of model generalization and the difficulty of interpreting learned representations, in the context of TF binding prediction. I will trace the development of interpretation methods, from the early activation-based approaches in DeepBind, the first deep learning model applied to regulatory genomics, to modern attribution frameworks. I will then show that base-pair-level contribution scores derived from chromatin accessibility models reveal interpretable sequence features consistent with TF binding. Finally, I will demonstrate how these inferred binding sites can be leveraged to predict TF occupancy across diverse cell types, offering a scalable and interpretable framework for decoding gene regulation.
