PMCID
PMC12642317

Challenges in Predicting Chromatin Accessibility Differences between Species.

bioRxiv : the preprint server for biology
Authors
Abstract

Enhancers are transcriptional regulatory elements that help drive phenotypic diversity, yet they often undergo rapid sequence evolution despite functional conservation, posing a challenge for predicting their function across species. Machine learning models that predict quantitative enhancer activity using DNA sequence have not previously been evaluated for their ability to predict quantitative differences across orthologous regions. Here, we trained convolutional neural networks (CNNs) on a regression task to predict chromatin accessibility, which is a proxy for enhancer activity, in the liver across five mammals, and we developed a novel framework to evaluate cross-species performance. We demonstrated that training on multiple species improves model generalization to both species used in training and held-out species. However, the models consistently achieved poor performance in predicting quantitative differences in accessibility between species at orthologous regions. Our study highlights the challenges in using regression models to predict chromatin accessibility changes between species.
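The gap the abstract describes, between good per-species predictions and poor prediction of between-species differences, can be illustrated with a minimal sketch. This is not the authors' evaluation framework, whose details the abstract does not give. It assumes Pearson correlation as the metric and uses made-up accessibility values for two hypothetical species, A and B, at five orthologous regions. The helper names `pearson` and `difference_correlation` are illustrative only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def difference_correlation(obs_a, obs_b, pred_a, pred_b):
    """Correlate predicted and observed accessibility differences
    (species A minus species B) across orthologous regions."""
    obs_diff = [a - b for a, b in zip(obs_a, obs_b)]
    pred_diff = [a - b for a, b in zip(pred_a, pred_b)]
    return pearson(obs_diff, pred_diff)


# Hypothetical toy values (not data from the paper): observed and predicted
# accessibility signal at five orthologous regions in species A and B.
obs_a = [1.0, 2.0, 3.0, 4.0, 5.0]
obs_b = [1.2, 1.8, 3.3, 3.9, 5.1]
pred_a = [1.1, 2.1, 2.9, 4.2, 4.8]
pred_b = [1.0, 2.2, 3.0, 4.1, 5.0]

r_a = pearson(obs_a, pred_a)        # within-species fit, species A
r_b = pearson(obs_b, pred_b)        # within-species fit, species B
r_diff = difference_correlation(obs_a, obs_b, pred_a, pred_b)

print(f"species A r = {r_a:.2f}")
print(f"species B r = {r_b:.2f}")
print(f"difference r = {r_diff:.2f}")
```

In this toy case, the within-species correlations are high while the correlation of differences is close to zero. Orthologous regions tend to have similar accessibility, so the between-species difference is small relative to the overall signal range, and prediction errors that barely affect the per-species fit can dominate the difference.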

Year of Publication
2025
Journal
bioRxiv : the preprint server for biology
Date Published
11/2025
ISSN
2692-8205
DOI
10.1101/2025.11.09.687449
PubMed ID
41292905