About 5% of the human genome reflects DNA that has been carefully conserved through the course of mammalian evolution. The importance of some of these conserved regions is clear (they encode proteins, the body’s miniature workhorses), but the lion’s share has proven considerably more perplexing. Such regions, called “conserved non-coding elements” (CNEs), are presumed to be important by virtue of their painstaking preservation, but their purpose is largely unknown. In a study that appears in the April 18 online edition of the Proceedings of the National Academy of Sciences, a team of scientists from the Broad Institute offers a tantalizing glimpse of CNEs’ mysterious ways.
Led by Broad Institute director Eric Lander, the work takes its cue from a handful of recent studies that suggest some CNEs serve regulatory functions, inducing nearby genes to turn on or off. The precise mechanics of this control, however, remain obscure. For instance, what “hardware” physically interacts with CNEs to influence genes? And what is the “software” programmed within CNEs that orchestrates these interactions?
Zeroing in on the software side, first author Xiaohui Xie and his Broad Institute colleagues used computational methods to cull data from a wide variety of mammalian genomes, including the human genome. This enabled the scientists to carefully examine CNEs for the presence of short DNA sequences or “motifs”, which can form a kind of docking site for proteins to bind to DNA. While their analysis uncovered more than 200 different types of motifs interspersed throughout the human genome, two motifs stood out because of their dense distribution. Each one appears thousands of times across the genome.
To learn more about these two unusually large groups of motifs, the researchers used a combination of proteomics and biochemical approaches that allowed them to decipher the motifs’ associated hardware — that is, the specific proteins that bind to them. Surprisingly, the scientists discovered that one group is bound by a protein previously thought to play a minor role in humans. Exactly how that protein functions is unclear, but given the frequency at which its corresponding motif appears in humans, it seems likely to participate broadly in human gene regulation.
Even more intriguing, though, is what the researchers found for the second group. Xie and his colleagues determined that it is bound by the CTCF protein, which works not to switch genes on and off, but rather to limit their activity to defined regions of the genome. In this way, CTCF forms a sort of genetic barrier or “insulator”. The finding is noteworthy because until now, insulators were known to function at only a few places in the human genome. Moreover, the finding led to the discovery of additional regions that likely serve as insulators (nearly 15,000 in total), suggesting that insulators play a previously unsuspected role in dividing the genome into distinct functional domains.
A recent independent study from researchers at the University of California, San Diego also described thousands of CTCF-binding sites across the human genome and, together with the current work, points to an extensive role for insulators in humans. Moreover, the Broad Institute team’s discovery reveals an unsuspected way in which CNEs might function, providing a window on the inner workings of some of the most crucial parts of the human genome.
Other Broad Institute scientists who authored the study include Tarjei Mikkelsen, Andreas Gnirke, Kerstin Lindblad-Toh and Manolis Kellis.
Paper cited:
Xie X, Mikkelsen TS, Gnirke A, Lindblad-Toh K, Kellis M, Lander ES. (2007) . Proceedings of the National Academy of Sciences; DOI: 10.1073/pnas.0701811104