AI generates short DNA sequences that show promise for gene therapies

The generative AI model designed sequences that successfully reactivated a protective gene in leukemia cell lines.

Credit: Susanna Hamilton
Scientists are training AI models to recognize and write pieces of human DNA that control gene expression, in hopes that one day these synthetic sequences can improve genetic medicine.

Highlights

  • The model, called DNA-Diffusion, is based on the same AI technology underpinning image generators like OpenAI’s DALL-E. 
  • The new model outperformed other methods in generating DNA segments that were functional and cell type-specific.
  • The researchers hope these synthetic gene regulatory elements could be used to improve gene therapies.

Scientists at the Broad Institute and Mass General Brigham have built a generative AI model that creates short DNA segments that can control gene activity in specific cells. These sequences, called cis-regulatory elements (CREs), make up a large part of the human genome, and synthetic versions of these bits of DNA could one day be part of gene therapies that tune gene activity to treat disease.

The model, called DNA-Diffusion, designed robust synthetic regulatory elements, including ones that reactivated a protective gene in leukemia cell lines. According to study senior author Luca Pinello, this technology could potentially lead to new therapeutic strategies that combine synthetic regulatory elements with existing gene therapy technologies to ensure that these therapies reach the right cell types in the body. The study is published in Nature Genetics.

“If you think about DNA as a language, you cannot master the language just by removing letters or inserting words into a sentence,” said Pinello, a Broad Institute associate member and faculty member at Mass General Brigham Cancer Institute. “To learn a language, we should be able to create whole new sentences.”

The team says their model outperformed other methods in generating synthetic CREs that were functional, cell type-specific, and diverse in sequence.

“In this paper, we demonstrate that this model not only can create sequences that appear to work in cells, but we can modulate the specificity, the activity, and the intensity of gene expression,” Pinello said. 

Targeting a cancer gene

The core technology is based on diffusion models, an AI technology that powers image generators like DALL-E and Stable Diffusion. These models are trained to analyze images at the pixel level and then to generate new images using patterns they learned during training. Pinello and his team adapted this approach for DNA, training their model on chromatin accessibility data from regions of the genome that contain regulatory elements. Chromatin accessibility data reveal which regulatory elements are actively being used by the cell to control gene activity. After training the model on these data from three cell types, the team used it to generate more than 5,800 synthetic CREs. When the researchers tested these sequences in the lab, they found that the sequences maintained their gene regulatory functions in specific cell types.
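To make the image-to-DNA analogy concrete, here is a minimal Python sketch of the general idea behind diffusion models applied to sequences. It is not the DNA-Diffusion code: it assumes a one-hot encoding of the four bases and the standard Gaussian noising step used in many diffusion models, and the names `one_hot`, `add_noise`, `decode`, and the toy sequence are hypothetical. A real model would be trained to reverse this noising process; that part is omitted here.

```python
import math
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of four numbers, one row per base."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def add_noise(x0, alpha_bar, rng):
    """Mix a clean encoding with Gaussian noise (the forward diffusion step).

    alpha_bar is the share of the original signal that is kept:
    1.0 leaves the sequence untouched, 0.0 replaces it with pure noise.
    """
    keep = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [[keep * v + noise * rng.gauss(0.0, 1.0) for v in row] for row in x0]


def decode(x):
    """Read each row back as the base with the largest value."""
    return "".join(BASES[max(range(4), key=lambda i: row[i])] for row in x)


rng = random.Random(0)           # fixed seed so the sketch is repeatable
seq = "ACGTTGCA"                 # hypothetical toy sequence, not from the study
clean = one_hot(seq)
untouched = add_noise(clean, alpha_bar=1.0, rng=rng)   # no noise added
scrambled = add_noise(clean, alpha_bar=0.0, rng=rng)   # pure noise

print(decode(untouched))   # same as the input sequence
print(decode(scrambled))   # an unrelated string of bases
```

During training, a diffusion model sees many noised examples like `scrambled` and learns to recover the clean signal; generating a new sequence then amounts to starting from noise and denoising step by step.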

To further demonstrate their technology, the researchers focused on the AXIN2 gene, which protects against chronic lymphocytic leukemia and is often turned off in B cells in patients with this disease. Using a cell line derived from individuals with chronic lymphocytic leukemia, Pinello and his collaborators analyzed the activity of 100 sequences, including 60 generated by their model. They found that many of the synthetic CREs were more effective at switching on AXIN2 than their natural counterparts. They also showed that sequences designed for B cells were active, whereas sequences designed for other cell types did not activate AXIN2.

Pinello and his team are now expanding the scope of their model and hope to combine their technology with genomic medicine approaches such as genome editors or gene therapies that use adeno-associated viruses (AAVs) to deliver therapeutic cargo to specific cells or tissues in the body. 

“People at the Broad Institute and all over the world are working on technologies to modify the genome in therapeutic ways, so gene therapies combined with this technology, we propose, can be a very powerful tool,” Pinello said.

Funding

This research was supported by the National Institutes of Health and the Rappaport MGH Research Scholar Award, as well as a Krantz Center Spark Award. 

Paper cited

DaSilva, L.F., Senan, S., et al. . Nature Genetics. Online December 23, 2025. DOI: 10.1038/s41588-025-02441-6