PMCID
PMC12889522

AlphaGenome-enabled analysis of non-coding regulatory variants underlying RHD expression with wet-lab validation.

bioRxiv : the preprint server for biology
Abstract

Systematic identification of functional non-coding regulatory variants remains a major challenge in human genetics. Conventional approaches such as large-scale CRISPR screening and genome-wide association studies (GWAS) are powerful but often prohibitively expensive, time-consuming, and experimentally intensive, which limits their scalability for locus-specific mechanistic studies. Recent advances in artificial intelligence (AI) offer the potential to partially replace or substantially augment these approaches by prioritizing regulatory variants with a high likelihood of being functional. The RhD antigen, a major contributor to red blood cell alloimmunization, hemolytic transfusion reactions, and hemolytic disease of the fetus and newborn, is an excellent model for this paradigm, because coding variants alone do not fully account for differences in RHD expression. Here, we present an integrated AI-guided and experimental framework to identify and validate functional non-coding regulatory variants governing RHD expression. We first applied AlphaGenome (AG), a deep-learning model released in 2025 for predicting the impact of non-coding variants, to systematically interrogate the RHD locus. By integrating multi-omics datasets, AG prioritized regulatory regions within the promoter, the 5' untranslated region (5'UTR), and intragenic regions. In silico deletion- and single nucleotide polymorphism (SNP)-based perturbation analyses consistently predicted that variants within the promoter and its proximal regions, as well as within intragenic regions, exert strong suppressive effects on RHD expression. To experimentally validate these predictions, we performed CRISPR-mediated base editing in K562 cells at AG-prioritized non-coding SNP sites. Editing of a high-score predicted variant (chr1:25272434 G>A) achieved efficient base conversion and was accompanied by additional nearby edits, all of which AG predicted to downregulate RHD expression. In contrast, editing of low-score predicted sites such as chr1:25272422 C>T produced much smaller functional effects. Quantitative polymerase chain reaction (qPCR) analysis of full-length RHD transcripts, together with flow cytometry-based analysis of RhD expression, confirmed strong concordance between AI-based predictions and both transcriptional and phenotypic outcomes. Taken together, our results demonstrate that combining AI-guided regulatory variant prioritization with targeted base editing provides a potentially scalable and cost-effective alternative to traditional CRISPR screening for decoding functional non-coding variants in blood group genes, with direct implications for genomics-based RHD typing and transfusion medicine. To our knowledge, this study is also the first validation of AlphaGenome predictions at the phenotypic level using wet-lab experiments.
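
The in silico SNP perturbation described in the abstract compares model predictions for the reference and alternate alleles at a site. The sketch below illustrates that general idea only; it does not call AlphaGenome, and the helper names (parse_variant, apply_snp, variant_effect_score, rank_by_suppression) are hypothetical. It assumes 1-based genomic coordinates, a reference sequence window whose first base sits at seq_start, and predicted signal tracks (for example, predicted RNA-seq coverage over the gene) already obtained from some model for both alleles. Summing bins and taking a log2 ratio is one simple scoring choice, not necessarily the one AG or the authors used.

```python
import math
import re
from typing import Sequence, Tuple

# Hypothetical parser for variant strings written like "chr1:25272434 G>A".
VARIANT_RE = re.compile(r"^(chr\w+):(\d+)\s+([ACGT])>([ACGT])$")


def parse_variant(text: str) -> Tuple[str, int, str, str]:
    """Split 'chrom:pos REF>ALT' into (chrom, 1-based pos, ref, alt)."""
    match = VARIANT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized variant notation: {text!r}")
    chrom, pos, ref, alt = match.groups()
    return chrom, int(pos), ref, alt


def apply_snp(seq: str, seq_start: int, pos: int, ref: str, alt: str) -> str:
    """Return seq with one base substituted; seq[0] sits at 1-based seq_start."""
    i = pos - seq_start
    if not 0 <= i < len(seq):
        raise ValueError("variant lies outside the sequence window")
    # Guard against coordinate or genome-build mix-ups before building the alt allele.
    if seq[i].upper() != ref.upper():
        raise ValueError(f"reference mismatch at {pos}: expected {ref}, found {seq[i]}")
    return seq[:i] + alt + seq[i + 1:]


def variant_effect_score(ref_track: Sequence[float], alt_track: Sequence[float],
                         pseudocount: float = 1e-3) -> float:
    """log2(alt / ref) of summed predicted signal; negative = predicted down-regulation."""
    if len(ref_track) != len(alt_track):
        raise ValueError("ref and alt tracks must cover the same bins")
    ref_total = sum(ref_track) + pseudocount
    alt_total = sum(alt_track) + pseudocount
    return math.log2(alt_total / ref_total)


def rank_by_suppression(scores: dict) -> list:
    """Order (variant_id, score) pairs from most negative (strongest predicted suppression) upward."""
    return sorted(scores.items(), key=lambda item: item[1])
```

For example, parse_variant("chr1:25272434 G>A") returns ("chr1", 25272434, "G", "A"); the parsed position and alleles can then be passed to apply_snp to build the alternate sequence that a predictor would score alongside the reference.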

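The abstract does not state how the qPCR measurements of full-length transcripts were normalized. A widely used option is the comparative Ct (2^-ΔΔCt) method, sketched below under the assumptions of roughly 100% amplification efficiency for both assays and a stably expressed reference gene. The function name and the example Ct values are hypothetical and are not data from this study.

```python
from statistics import mean
from typing import Sequence


def ddct_fold_change(target_edited: Sequence[float], reference_edited: Sequence[float],
                     target_control: Sequence[float], reference_control: Sequence[float]) -> float:
    """Relative target expression (edited vs. control) by the 2^-ddCt method.

    Each argument is a list of replicate Ct values. Assumes ~100% PCR efficiency,
    so each cycle of delay corresponds to a two-fold drop in starting template.
    """
    delta_edited = mean(target_edited) - mean(reference_edited)     # dCt in edited cells
    delta_control = mean(target_control) - mean(reference_control)  # dCt in control cells
    delta_delta = delta_edited - delta_control
    return 2.0 ** -delta_delta
```

With illustrative values, ddct_fold_change([26.0, 26.0], [18.0, 18.0], [24.0, 24.0], [18.0, 18.0]) returns 0.25: after normalizing to the reference gene, the target amplifies two cycles later in edited cells, which corresponds to about a quarter of control expression.
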
Year of Publication
2026
Journal
bioRxiv : the preprint server for biology
Date Published
02/2026
ISSN
2692-8205
DOI
10.64898/2026.01.21.700828
PubMed ID
41676477