Prediction and functional interpretation of inter-chromosomal genome architecture from DNA sequence with TwinC.

bioRxiv : the preprint server for biology
Authors
Abstract

Three-dimensional nuclear DNA architecture comprises well-studied intra-chromosomal (cis) folding and less characterized inter-chromosomal (trans) interfaces. Current predictive models of 3D genome folding can effectively infer pairwise cis-chromatin interactions from the primary DNA sequence but generally ignore trans contacts. There is an unmet need for robust models of trans-genome organization that provide insights into their underlying principles and functional relevance. We present TwinC, an interpretable convolutional neural network model that reliably predicts trans contacts measurable through genome-wide chromatin conformation capture (Hi-C). TwinC uses a paired sequence design from replicate Hi-C experiments to learn single base pair relevance in trans interactions across two stretches of DNA. The method achieves high predictive accuracy (AUROC=0.80) on a cross-chromosomal test set from Hi-C experiments in heart tissue. Mechanistically, the neural network learns the importance of compartments, chromatin accessibility, clustered transcription factor binding and G-quadruplexes in forming trans contacts. In summary, TwinC models and interprets trans genome architecture, shedding light on this poorly understood aspect of gene regulation.
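
For readers unfamiliar with the AUROC metric quoted in the abstract, the following is a minimal sketch of how such a score can be computed for a binary contact/no-contact classifier. It is not the authors' evaluation code: the function name `auroc`, the toy labels, and the toy scores are all hypothetical and chosen only for illustration. The sketch uses the pairwise (Mann-Whitney) formulation, in which AUROC is the fraction of positive/negative pairs that the model ranks correctly.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels: iterable of 0/1 values (hypothetical: 1 = contact, 0 = no contact)
    scores: iterable of predicted scores, higher meaning "more likely contact"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up data (not taken from the paper).
toy_labels = [1, 1, 1, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.2]
toy_auroc = auroc(toy_labels, toy_scores)
print(f"toy AUROC = {toy_auroc:.3f}")
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.80 means that a randomly chosen interacting pair outscores a randomly chosen non-interacting pair about 80% of the time.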

Year of Publication
2024
Journal
bioRxiv : the preprint server for biology
Date Published
09/2024
ISSN
2692-8205
DOI
10.1101/2024.09.16.613355
PubMed ID
39345598