Kristin Ardlie / en Wed, 03 Sep 25 14:10:36 -0400 How massive datasets generated at Broad are powering the latest AI models in biology /news/how-massive-datasets-generated-broad-are-powering-latest-ai-models-biology <span class="field field--name-title field--type-string field--label-hidden"><h1>How massive datasets generated at Broad are powering the latest AI models in biology </h1> </span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"> <span>By Tom Ulrich</span> </span> <span class="field field--name-created field--type-created field--label-hidden"><time datetime="2025-09-03T14:10:36-04:00" class="datetime">September 3, 2025</time> </span> <div class="hero-section container"> <div class="hero-section__row row"> <div class="hero-section__content hero-section__content_left col-6"> <div class="hero-section__title"> <div class="block block-layout-builder block-field-blocknodelong-storytitle"> <span class="field field--name-title field--type-string field--label-hidden"><h1>How massive datasets generated at Broad are powering the latest AI models in biology </h1> </span> </div> </div> <div class="hero-section__description"> <div class="block block-layout-builder block-field-blocknodelong-storybody"> <div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>Broad scientists describe how data resources they helped build over more than a decade now form the foundation for cutting-edge AI and genome biology discoveries.</p> </div> </div> </div> <div class="hero-section__author"> <div class="block
block-layout-builder block-extra-field-blocknodelong-storyextra-field-author-custom"> By Tom Ulrich </div> </div> <div class="hero-section__date"> <div class="block block-layout-builder block-field-blocknodelong-storycreated"> <span class="field field--name-created field--type-created field--label-hidden"><time datetime="2025-09-03T14:10:36-04:00" title="Wednesday, September 3, 2025 - 14:10" class="datetime">September 3, 2025</time> </span> </div> </div> </div> <div class="hero-section__right col-6"> <div class="hero-section__image"> <div class="block block-layout-builder block-field-blocknodelong-storyfield-image"> <div class="field field--name-field-image field--type-entity-reference field--label-hidden field__item"> <article class="media media--type-image media--view-mode-multiple-content-types-header"> <div class="field field--name-field-media-image field--type-image field--label-hidden field__item"> <picture> <source srcset="/files/styles/multiple_ct_header_desktop_xl/public/longstory/Kristin%20Ardlie%20Brad%20Bernstein%20AlphaGenome%20ENCODE%20GTEx.png?itok=sRd4S6HS 1x" media="all and (min-width: 1921px)" type="image/png" width="754" height="503"/> <source srcset="/files/styles/multiple_ct_header_desktop_xl/public/longstory/Kristin%20Ardlie%20Brad%20Bernstein%20AlphaGenome%20ENCODE%20GTEx.png?itok=sRd4S6HS 1x" media="all and (min-width: 1601px) and (max-width: 1920px)" type="image/png" width="754" height="503"/> <source srcset="/files/styles/multiple_ct_header_desktop/public/longstory/Kristin%20Ardlie%20Brad%20Bernstein%20AlphaGenome%20ENCODE%20GTEx.png?itok=oB7m1az4 1x" media="all and (min-width: 1340px) and (max-width: 1600px)" type="image/png" width="736" height="520"/> <source srcset="/files/styles/multiple_ct_header_laptop/public/longstory/Kristin%20Ardlie%20Brad%20Bernstein%20AlphaGenome%20ENCODE%20GTEx.png?itok=kMeF7Y51 1x" media="all and (min-width: 800px) and (max-width: 1339px)" type="image/png" width="641" height="451"/> <source 
srcset="/files/styles/multiple_ct_header_tablet/public/longstory/Kristin%20Ardlie%20Brad%20Bernstein%20AlphaGenome%20ENCODE%20GTEx.png?itok=2NMDhb_m 1x" media="all and (min-width: 540px) and (max-width: 799px)" type="image/png" width="706" height="417"/> <source srcset="/files/styles/multiple_ct_header_phone/public/longstory/Kristin%20Ardlie%20Brad%20Bernstein%20AlphaGenome%20ENCODE%20GTEx.png?itok=hxf3Ecob 1x" media="all and (max-width: 539px)" type="image/png" width="499" height="294"/> <img loading="eager" width="499" height="294" src="/files/styles/multiple_ct_header_phone/public/longstory/Kristin%20Ardlie%20Brad%20Bernstein%20AlphaGenome%20ENCODE%20GTEx.png?itok=hxf3Ecob" alt="Photos of two people, Kristin Ardlie on the left and Brad Bernstein on the right. Kristin is the director of GTEx and Brad was a leader of the ENCODE Consortium." title="Photos of two people, Kristin Ardlie on the left and Brad Bernstein on the right. Kristin is the director of GTEx and Brad was a leader of the ENCODE Consortium." 
typeof="foaf:Image" /> </picture> </div> <div class="media-caption"> <div class="media-caption__description"> Kristin Ardlie (left) and Brad Bernstein (right) </div> </div> </article> </div> </div> </div> </div> </div> </div> <div class="content-section container"> <div class="content-section__main"> <div class="block block-layout-builder block-field-blocknodelong-storyfield-content-paragraphs"> <div
class="field field--name-field-content-paragraphs field--type-entity-reference-revisions field--label-hidden field__items"> <div class="field__item"> <div class="paragraph paragraph--type--text-with-sidebar text-with-sidebar"> <div class="field field--name-field-sidebar field--type-entity-reference-revisions field--label-hidden field__items"> <div class="field__item"> <div class="paragraph paragraph--type--sidebar-menu sidebar-menu"> <div class="sidebar-menu__col"> <div class="clearfix text-formatted field field--name-field-heading field--type-text field--label-hidden field__item"><p>Related People</p> </div> <div class="field field--name-field-links field--type-link field--label-hidden field__items"> <div class="field__item"><a href="/bios/kristin-ardlie">Kristin Ardlie</a></div> <div class="field__item"><a href="/bios/bradley-e-bernstein">Brad Bernstein</a></div> </div> </div> </div> </div> <div class="field__item"> <div class="paragraph paragraph--type--sidebar-menu sidebar-menu"> <div class="sidebar-menu__col"> <div class="clearfix text-formatted field field--name-field-heading field--type-text field--label-hidden field__item"><p>Related Programs</p> </div> <div class="field field--name-field-links field--type-link field--label-hidden field__items"> <div class="field__item"><a href="/epigenomics">Epigenomics Program</a></div> <div class="field__item"><a href="/gene-regulation-observatory-gro">Gene Regulation Observatory</a></div> <div class="field__item"><a href="https://www.gtexportal.org/home/">Genotype Tissue Expression (GTEx) Project</a></div> <div class="field__item"><a href="https://www.humancellatlas.org/">Human Cell Atlas</a></div> <div class="field__item"><a href="https://igvf.org/">Impact of Genomic Variation on Function (IGVF) Consortium</a></div> </div> </div> </div> </div> <div class="field__item"> <div class="paragraph paragraph--type--sidebar-articles sidebar-articles"> <div class="sidebar-articles__col"> <div class="clearfix text-formatted field 
field--name-field-heading field--type-text field--label-hidden field__item"><p>Related news</p> </div> <div class="field field--name-field-content-reference field--type-entity-reference field--label-hidden field__items"> <div class="field__item"><article about="/news/massive-single-cell-atlas-across-human-tissues-highlights-cell-types-where-disease-genes-are" class="node"> <div class="field field--name-field-image field--type-entity-reference field--label-hidden field__item"><article class="media media--type-image media--view-mode-multiple-ct-sidebar-link-with-image"> <div class="field field--name-field-media-image field--type-image field--label-hidden field__item"> <a href="/news/massive-single-cell-atlas-across-human-tissues-highlights-cell-types-where-disease-genes-are"><picture> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_desktop_xl/public/news/images/2022/HCA_GTEx_2022_main.jpg?h=9423a5c0&itok=a16pTVBV 1x" media="all and (min-width: 1921px)" type="image/jpeg" width="104" height="104"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_desktop_xl/public/news/images/2022/HCA_GTEx_2022_main.jpg?h=9423a5c0&itok=a16pTVBV 1x" media="all and (min-width: 1601px) and (max-width: 1920px)" type="image/jpeg" width="104" height="104"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_desktop/public/news/images/2022/HCA_GTEx_2022_main.jpg?h=9423a5c0&itok=rFyaDh5d 1x" media="all and (min-width: 1340px) and (max-width: 1600px)" type="image/jpeg" width="87" height="104"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_desktop/public/news/images/2022/HCA_GTEx_2022_main.jpg?h=9423a5c0&itok=rFyaDh5d 1x" media="all and (min-width: 800px) and (max-width: 1339px)" type="image/jpeg" width="87" height="104"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_tablet/public/news/images/2022/HCA_GTEx_2022_main.jpg?h=9423a5c0&itok=M0WkGq6D 1x" media="all and (min-width: 540px) and (max-width: 799px)" 
type="image/jpeg" width="285" height="186"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_phone/public/news/images/2022/HCA_GTEx_2022_main.jpg?h=9423a5c0&itok=IJVf4kVJ 1x" media="all and (max-width: 539px)" type="image/jpeg" width="220" height="186"/> <img loading="eager" width="220" height="186" src="/files/styles/multiple_ct_sidebar_link_with_image_phone/public/news/images/2022/HCA_GTEx_2022_main.jpg?h=9423a5c0&itok=IJVf4kVJ" alt="Susanna Hamilton, Broad Communications" typeof="foaf:Image" /> </picture></a> </div> </article> </div> <div class="node__content"> <a href="/news/massive-single-cell-atlas-across-human-tissues-highlights-cell-types-where-disease-genes-are" class="node__title"><span class="field field--name-title field--type-string field--label-hidden">Massive single-cell atlas across human tissues highlights cell types where disease genes are active</span> </a> </div> </article> </div> <div class="field__item"><article about="/news/broad-scientists-join-new-consortium-studying-function-genetic-variation" class="node"> <div class="field field--name-field-image field--type-entity-reference field--label-hidden field__item"><article class="media media--type-image media--view-mode-multiple-ct-sidebar-link-with-image"> <div class="field field--name-field-media-image field--type-image field--label-hidden field__item"> <a href="/news/broad-scientists-join-new-consortium-studying-function-genetic-variation"><picture> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_desktop_xl/public/news/images/2021/Brad_Bernstein_Jason_Buenrostro_Main_Intranet-02.png?h=a7d067a7&itok=pqDORM2B 1x" media="all and (min-width: 1921px)" type="image/png" width="104" height="104"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_desktop_xl/public/news/images/2021/Brad_Bernstein_Jason_Buenrostro_Main_Intranet-02.png?h=a7d067a7&itok=pqDORM2B 1x" media="all and (min-width: 1601px) and (max-width: 1920px)" type="image/png"
width="104" height="104"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_desktop/public/news/images/2021/Brad_Bernstein_Jason_Buenrostro_Main_Intranet-02.png?h=a7d067a7&itok=UZuZZDPA 1x" media="all and (min-width: 1340px) and (max-width: 1600px)" type="image/png" width="87" height="104"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_desktop/public/news/images/2021/Brad_Bernstein_Jason_Buenrostro_Main_Intranet-02.png?h=a7d067a7&itok=UZuZZDPA 1x" media="all and (min-width: 800px) and (max-width: 1339px)" type="image/png" width="87" height="104"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_tablet/public/news/images/2021/Brad_Bernstein_Jason_Buenrostro_Main_Intranet-02.png?h=a7d067a7&itok=w7oHas8Q 1x" media="all and (min-width: 540px) and (max-width: 799px)" type="image/png" width="285" height="186"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_phone/public/news/images/2021/Brad_Bernstein_Jason_Buenrostro_Main_Intranet-02.png?h=a7d067a7&itok=gVt0v4_J 1x" media="all and (max-width: 539px)" type="image/png" width="220" height="186"/> <img loading="eager" width="220" height="186" src="/files/styles/multiple_ct_sidebar_link_with_image_phone/public/news/images/2021/Brad_Bernstein_Jason_Buenrostro_Main_Intranet-02.png?h=a7d067a7&itok=gVt0v4_J" alt="Institute member Bradley Bernstein, director of the Epigenomics Program and Gene Regulation Observatory (left); Associate member Jason Buenrostro, assistant professor, Harvard University (right)" title="Institute member Bradley Bernstein, director of the Epigenomics Program and Gene Regulation Observatory (left); Associate member Jason Buenrostro, assistant professor, Harvard University (right)" typeof="foaf:Image" /> </picture></a> </div> </article> </div> <div class="node__content"> <a href="/news/broad-scientists-join-new-consortium-studying-function-genetic-variation" class="node__title"><span class="field field--name-title field--type-string 
field--label-hidden">Broad scientists join new consortium studying the function of genetic variation</span> </a> </div> </article> </div> <div class="field__item"><article about="/news/conversation-about-legacy-encode-and-what-comes-next-0" class="node"> <div class="field field--name-field-image field--type-entity-reference field--label-hidden field__item"><article class="media media--type-image media--view-mode-multiple-ct-sidebar-link-with-image"> <div class="field field--name-field-media-image field--type-image field--label-hidden field__item"> <a href="/news/conversation-about-legacy-encode-and-what-comes-next-0"><picture> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_desktop_xl/public/news/images/2021/ChuckEpstein_main.png?h=d3e04ee7&itok=Cu7_KaEj 1x" media="all and (min-width: 1921px)" type="image/png" width="104" height="104"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_desktop_xl/public/news/images/2021/ChuckEpstein_main.png?h=d3e04ee7&itok=Cu7_KaEj 1x" media="all and (min-width: 1601px) and (max-width: 1920px)" type="image/png" width="104" height="104"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_desktop/public/news/images/2021/ChuckEpstein_main.png?h=d3e04ee7&itok=Ayg3Pfwx 1x" media="all and (min-width: 1340px) and (max-width: 1600px)" type="image/png" width="87" height="104"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_desktop/public/news/images/2021/ChuckEpstein_main.png?h=d3e04ee7&itok=Ayg3Pfwx 1x" media="all and (min-width: 800px) and (max-width: 1339px)" type="image/png" width="87" height="104"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_tablet/public/news/images/2021/ChuckEpstein_main.png?h=d3e04ee7&itok=DghWV6TL 1x" media="all and (min-width: 540px) and (max-width: 799px)" type="image/png" width="285" height="186"/> <source
srcset="/files/styles/multiple_ct_sidebar_link_with_image_phone/public/news/images/2021/ChuckEpstein_main.png?h=d3e04ee7&itok=JF7rRdZi 1x" media="all and (max-width: 539px)" type="image/png" width="220" height="186"/> <img loading="eager" width="220" height="186" src="/files/styles/multiple_ct_sidebar_link_with_image_phone/public/news/images/2021/ChuckEpstein_main.png?h=d3e04ee7&itok=JF7rRdZi" alt="" typeof="foaf:Image" /> </picture></a> </div> </article> </div> <div class="node__content"> <a href="/news/conversation-about-legacy-encode-and-what-comes-next-0" class="node__title"><span class="field field--name-title field--type-string field--label-hidden">A conversation about the legacy of ENCODE and what comes next</span> </a> </div> </article> </div> <div class="field__item"><article about="/news/broad%E2%80%99s-new-gene-regulation-observatory-will-take-most-detailed-look-yet-non-coding-genome" class="node"> <div class="field field--name-field-image field--type-entity-reference field--label-hidden field__item"><article class="media media--type-image media--view-mode-multiple-ct-sidebar-link-with-image"> <div class="field field--name-field-media-image field--type-image field--label-hidden field__item"> <a href="/news/broad%E2%80%99s-new-gene-regulation-observatory-will-take-most-detailed-look-yet-non-coding-genome"><picture> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_desktop_xl/public/news/images/2021/GRO-announcement-main.png?h=d3e04ee7&itok=ipvAfkYh 1x" media="all and (min-width: 1921px)" type="image/png" width="104" height="104"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_desktop_xl/public/news/images/2021/GRO-announcement-main.png?h=d3e04ee7&itok=ipvAfkYh 1x" media="all and (min-width: 1601px) and (max-width: 1920px)" type="image/png" width="104" height="104"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_desktop/public/news/images/2021/GRO-announcement-main.png?h=d3e04ee7&itok=MAs-8X9L 1x" 
media="all and (min-width: 1340px) and (max-width: 1600px)" type="image/png" width="87" height="104"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_desktop/public/news/images/2021/GRO-announcement-main.png?h=d3e04ee7&itok=MAs-8X9L 1x" media="all and (min-width: 800px) and (max-width: 1339px)" type="image/png" width="87" height="104"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_tablet/public/news/images/2021/GRO-announcement-main.png?h=d3e04ee7&itok=RXB4vTbk 1x" media="all and (min-width: 540px) and (max-width: 799px)" type="image/png" width="285" height="186"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_phone/public/news/images/2021/GRO-announcement-main.png?h=d3e04ee7&itok=soNXw163 1x" media="all and (max-width: 539px)" type="image/png" width="220" height="186"/> <img loading="eager" width="220" height="186" src="/files/styles/multiple_ct_sidebar_link_with_image_phone/public/news/images/2021/GRO-announcement-main.png?h=d3e04ee7&itok=soNXw163" alt="" typeof="foaf:Image" /> </picture></a> </div> </article> </div> <div class="node__content"> <a href="/news/broad%E2%80%99s-new-gene-regulation-observatory-will-take-most-detailed-look-yet-non-coding-genome" class="node__title"><span class="field field--name-title field--type-string field--label-hidden">The Broad’s new Gene Regulation Observatory will take the most detailed look yet at the non-coding genome</span> </a> </div> </article> </div> <div class="field__item"><article about="/news/gtex-consortium-releases-fresh-insights-how-dna-differences-govern-gene-expression" class="node"> <div class="field field--name-field-image field--type-entity-reference field--label-hidden field__item"><article class="media media--type-image media--view-mode-multiple-ct-sidebar-link-with-image"> <div class="field field--name-field-media-image field--type-image field--label-hidden field__item"> <a
href="/news/gtex-consortium-releases-fresh-insights-how-dna-differences-govern-gene-expression"><picture> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_desktop_xl/public/news/images/2020/gtex_image.png?h=d3e04ee7&itok=RM_ACKHj 1x" media="all and (min-width: 1921px)" type="image/png" width="104" height="104"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_desktop_xl/public/news/images/2020/gtex_image.png?h=d3e04ee7&itok=RM_ACKHj 1x" media="all and (min-width: 1601px) and (max-width: 1920px)" type="image/png" width="104" height="104"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_desktop/public/news/images/2020/gtex_image.png?h=d3e04ee7&itok=Rlyv1NBq 1x" media="all and (min-width: 1340px) and (max-width: 1600px)" type="image/png" width="87" height="104"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_desktop/public/news/images/2020/gtex_image.png?h=d3e04ee7&itok=Rlyv1NBq 1x" media="all and (min-width: 800px) and (max-width: 1339px)" type="image/png" width="87" height="104"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_tablet/public/news/images/2020/gtex_image.png?h=d3e04ee7&itok=TUw4ln6f 1x" media="all and (min-width: 540px) and (max-width: 799px)" type="image/png" width="285" height="186"/> <source srcset="/files/styles/multiple_ct_sidebar_link_with_image_phone/public/news/images/2020/gtex_image.png?h=d3e04ee7&itok=8WGsu0sJ 1x" media="all and (max-width: 539px)" type="image/png" width="220" height="186"/> <img loading="eager" width="220" height="186" src="/files/styles/multiple_ct_sidebar_link_with_image_phone/public/news/images/2020/gtex_image.png?h=d3e04ee7&itok=8WGsu0sJ" alt="Susanna M. 
Hamilton, Broad Communications" typeof="foaf:Image" /> </picture></a> </div> </article> </div> <div class="node__content"> <a href="/news/gtex-consortium-releases-fresh-insights-how-dna-differences-govern-gene-expression" class="node__title"><span class="field field--name-title field--type-string field--label-hidden">GTEx Consortium releases fresh insights into how DNA differences govern gene expression</span> </a> </div> </article> </div> </div> </div> </div> </div> </div> <div class="clearfix text-formatted field field--name-field-text field--type-text-long field--label-hidden field__item"><p>In June, Google DeepMind took the wraps off <a href="https://deepmind.google/discover/blog/alphagenome-ai-for-better-understanding-the-genome/" target="_blank">AlphaGenome</a>, its latest machine learning model for biological discovery. While DeepMind's Nobel Prize-winning AlphaFold model focuses on proteins and how they fold, AlphaGenome predicts how genetic variants affect the processes that control when and where genes are turned on and off. </p> <p>In their announcement and <a href="https://www.biorxiv.org/content/10.1101/2025.06.25.661532v2" target="_blank">preprint</a>, DeepMind cited two resources, both funded by the NIH and largely created at the Broad in the 2010s, as their main sources of training data for AlphaGenome: the <a href="https://www.encodeproject.org/help/project-overview/" target="_blank">Encyclopedia of DNA Elements (ENCODE) Consortium</a>, which cataloged more than a million regulatory elements across the genome, and the <a href="https://www.gtexportal.org/home/" target="_blank">Genotype-Tissue Expression (GTEx) Project</a>, which continues to map the gene expression patterns of human and primate tissues.
</p> <p>Both resources have also been instrumental in revealing how the genome works and how noncoding genetic variants impact disease risk, and have laid the groundwork for efforts like the NIH's <a href="https://igvf.org/" target="_blank">Impact of Genomic Variation on Function Consortium</a>, the <a href="https://www.humancellatlas.org/" target="_blank">Human Cell Atlas</a>, and the Broad's <a href="/gene-regulation-observatory-gro">Gene Regulation Observatory (GRO)</a>. </p> <p>To learn more about how ENCODE, GTEx, and similar datasets are fueling science in the AI age, we spoke with <a href="/bios/kristin-ardlie">Kristin Ardlie</a>, an institute scientist at Broad and director of GTEx; and <a href="/bios/bradley-e-bernstein">Brad Bernstein</a>, an institute member, leader of the GRO, director of Broad's <a href="/epigenomics">Epigenomics Program</a>, and a leader of the ENCODE Consortium. </p> <p><strong>When they started, what were the goals of ENCODE and GTEx?</strong></p> <p><strong>Brad Bernstein:</strong> ENCODE's goal was to understand the language of the genome. When it started, only 1 to 2 percent of the genome could be explained. Nobody knew how much of the other 98 percent was functional, or how it impacted the regulation of the cell. With ENCODE, we realized that maybe 20 percent of the genome looked like it had regulatory or functional roles. It changed the idea that the noncoding part of the genome was just junk.</p> <p><strong>Kristin Ardlie:</strong> And that's what launched GTEx. Once we reached the point where human genetic studies were reliably finding variants associated with diseases and traits, we realized that most were in those unknown regions of the genome, and we had no idea how they functioned.
GTEx was launched as a way to systematically measure whether those genetic variants might have regulatory roles that affect gene expression in the context of tissues and cells and disease.</p> <p><strong>What does the emergence of AlphaGenome and other large language models say about the value of resources like ENCODE and GTEx?</strong></p> <p><strong>KA:</strong> These resources' legacy is enduring, in that more than a decade or two after we started building them, they're enabling developments that we couldn't have considered. They were designed to be community resources and to be as utilitarian as possible, with no constraints on their use. This latest development is a testament to the fact that they really are working as intended.</p> <p>And I think that this success shows us the path forward. The last five or 10 years have seen a lot of effort put into building atlases of single cells, which is going to be remarkably powerful as well. To find new opportunities to impact disease, we need to understand how biology and disease work at a cellular level.</p> <p>I keep thinking about sitting at the eye doctor, where they flip little lenses in front of your eyes to see what your eyeglass prescription should be so you can see things more clearly. The resolution these models have achieved is so much finer than before, but they need to be even finer and more powerful to really help us understand genome function. To achieve that, we need more of these unbiased foundational resources. Their value as training data for models that could help define the systematic rules of the genome is truly remarkable.</p> <p><strong>What are some other ways in which AI is helping us understand genome regulation?</strong> </p> <p><strong>BB:</strong> There's a number of labs just here at Broad that are applying machine learning to the regulatory code.
<a href="/bios/jason-d-buenrostro">Jason Buenrostro</a>, who leads the GRO with me, is using deep learning to work out <a href="https://www.linkedin.com/posts/broad-institute_2_multiscale-footprints-reveal-the-organization-activity-7291198568906805248-nzJy/" target="_blank">how regulatory elements close to genes like promoters are organized and change as cells develop</a>. Our colleague Anders Hansen is applying AI to <a href="https://www.biorxiv.org/content/10.1101/2025.05.06.650874v1.full" target="_blank">map the genome's structure and organization in 3D</a>, which is incredibly important for understanding long-range interactions between elements and how they control expression of both genes and entire genetic programs. My own team collaborated with scientists at Google on <a href="https://innovations.dana-farber.org/ai-learns-genomic-language-to-advance-cancer-treatment/" target="_blank">a general model of the genome’s regulatory code that can be readily applied to any new cell type</a>. There's a lot going on.</p> <p><strong>What do you want to see happen over the next five years to make AI as useful as possible for genomic discovery?</strong></p> <p><strong>KA:</strong> We need to continue developing resources that focus on perturbations in humans — biological changes that affect health. Take development. We go through many changes as we develop, and in a sense that's a big sort of perturbation. How do we study that process systematically and at scale, and what can it teach us? That's what the next phase of GTEx is working to discover.</p> <p>Disease is another form of perturbation, one that we often look at just from the perspective of an endpoint. But really it's a process by which cells go from normal to not-normal. What's going on there? We need to gather data systematically across that continuum. </p> <p>A better understanding of variants' functions will help us to interpret the results of genetic testing. 
When we screen a patient’s genome we often end up with variants whose significance we can’t determine.  Many of these might be regulatory variants that could be very consequential in disease, but which we can't yet interpret. We need these data resources and models like AlphaGenome to help us better interpret what these variants are doing.</p> <p><strong>BB:</strong> We have a lot of data from ENCODE and other resources about where transcription factors and other things bind to DNA, which genes are turned on in which cell type, and whatnot. But we don't have massive amounts of data in human cells about variations and perturbations. I'd like to see more data that comes from picking individual cell types and mutagenizing the whole genome to help us decipher the genome's regulatory code. </p> <p>It's a complicated question, though. What and how much data would we have to generate to power models that could fully understand long-range regulatory events, complex mechanisms, chromatin structures, and conformations that go well beyond transcription factor binding? It kind of boggles the mind.</p> <p>If we get it right, however, large language models like AlphaGenome could help us answer a debate about how best to interpret variants' functions: should we drill down on variants one-by-one, or should we use AI models to explore the rules of the genome in an agnostic, holistic way? 
I'm excited about figuring that out.</p> </div> </div> </div> </div> </div> </div> </div> <div class="content-section container"> <div class="content-section__main"> <div class="block-node-broad-tags block block-layout-builder block-field-blocknodelong-storyfield-broad-tags"> <div class="block-node-broad-tags__row"> <div class="block-node-broad-tags__title">Tags:</div> <div class="field field--name-field-broad-tags field--type-entity-reference field--label-hidden field__items"> <div class="field__item"><a href="/broad-tags/epigenomics" hreflang="en">Epigenomics</a></div> <div class="field__item"><a href="/broad-tags/gene-regulation-observatory" hreflang="en">Gene Regulation Observatory</a></div> <div class="field__item"><a href="/broad-tags/kristin-ardlie" hreflang="en">Kristin Ardlie</a></div> <div class="field__item"><a href="/broad-tags/brad-bernstein" hreflang="en">Brad Bernstein</a></div> </div> </div> </div> </div> </div>