
From an engineering standpoint, the design of the human genome definitely raises a few eyebrows. Of its 3 billion chemical building blocks, or "nucleotides," just 5% are deemed functional, a discovery that recently emerged from detailed comparisons of the human genome to other mammalian genomes, such as mouse, rat and dog. While less than half of these working nuts and bolts represent protein-coding genes, the rest remain a complete mystery. We know that they show evidence of careful preservation throughout evolutionary history, which speaks to their biological importance, yet they do not appear to code for proteins.
Adding another wrinkle to the mystery, these conserved noncoding elements, or "CNEs," seem to be evolutionary newcomers. They appear essentially out of the blue, and virtually none of the CNEs in humans can be traced beyond vertebrates (animals that have a backbone). Genes, on the other hand, tend to be the time-honored treasures of evolution and are preserved all the way from humans to worms. This leaves us to wonder: where exactly did CNEs come from, and what do they do?
Answering these questions has been hindered by the fact that CNEs are an eclectic mix, and therefore quite difficult to generalize or group into clusters based on shared characteristics. However, scientists have focused mainly on the CNEs that reside in the non-repetitive or unique parts of the human genome, largely ignoring what might lie within the repetitive sequences — more commonly known as junk DNA. Typically regarded as mere genetic rubbish, most of these repeats, which together fill about half of the human genome, are "transposons." Transposons can readily copy and paste themselves in different locations in the genome — like Xerox machines on wheels — but they often lose their mobility over time. It is these immobilized has-beens, frozen in place, that litter much of the human genome.
We sifted through this genomic junk to identify new CNEs and stumbled upon a very unusual group called MER121. As we described in the first of two publications in the Proceedings of the National Academy of Sciences, this cluster is remarkable both for its size and its evolutionary conservation. Most of the MER121 family, which numbers around 1,000, is conserved in other mammals, including dogs, mice and rats. Indeed, other CNEs have been found hidden within repetitive DNA, but they typically represent one or two lone elements, not an extensive family.
Xiaohui Xie (left) and Mike Kamal (right). Photo by Maria Nemchuk.
Searching among other genomes, we noted that many members of the MER121 family are also present in the marsupial Monodelphis domestica, but only a handful of copies exist in the chicken. This suggests that the MER121 family burst onto the genome scene sometime after the appearance of birds but before marsupials, although there were no genetic clues as to how this could have happened. Because it seemed unlikely that each family member had been made from scratch, we speculated that a transposon might have helped the MER121 family to multiply and spread, thereby explaining its widespread membership in mammalian genomes. Active transposons can usually be distinguished by their characteristic DNA sequences, but these genetic signatures often fade as the transposons lose their copy-and-paste capabilities, and we could find no evidence that MER121 family members ever behaved as transposons.
In search of genetic support for the role of transposons in dispersing CNEs, we put together a list of CNEs that do not seem to fit within the known classes of mammalian DNA repeats. We compared this list to a catalogue of known transposable elements in humans and other vertebrates, including mammals, birds and fish, searching for the faintest semblance of a transposon. This revealed a striking similarity between a CNE on human chromosome 2 and an active transposable element in zebrafish called SINE3-1. The relationship is unusually strong, given the considerable evolutionary gap between humans and fish.
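The comparison described above can be illustrated with a toy example. The sketch below is not the method used in the study, which is not specified here; it swaps in a much simpler technique, counting the short subsequences ("k-mers") that two sequences share. The sequences, the element names and the function names are all hypothetical and chosen only for illustration.

```python
# Toy sketch: rank catalogue entries by shared k-mers with a query sequence.
# All sequences and names are made up; real searches use alignment tools.

def kmers(seq, k=8):
    """Return the set of all length-k substrings of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b, k=8):
    """Fraction of k-mers shared by two sequences (0.0 means none shared)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)

def best_match(query, catalogue, k=8):
    """Return (name, score) for the catalogue entry most similar to the query."""
    scored = [(jaccard(query, seq, k), name) for name, seq in catalogue.items()]
    score, name = max(scored)
    return name, score

# Hypothetical catalogue of two transposon-like sequences.
toy_catalogue = {
    "toy_element_A": "ACGTTGCAAGGCTTACCGATAGGCTAACGTTAGC",
    "toy_element_B": "TTTTGGGGCCCCAAAATTTTGGGGCCCCAAAA",
}

# Hypothetical query that resembles toy_element_A only in its central stretch.
toy_query = "GGGAAGGCTTACCGATAGGCTAACCC"

name, score = best_match(toy_query, toy_catalogue)
print(name, round(score, 2))
```

In this toy case the query shares k-mers only with the middle of toy_element_A, loosely mirroring a similarity that is confined to one region of a transposon. Across a gap as wide as the one between humans and fish, sequences accumulate many mutations, so exact k-mer matching like this would likely miss real relationships that alignment-based searches can detect.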
Strangely, the similarity between the human CNE and the zebrafish SINE3-1 transposon is confined to an otherwise unremarkable region. SINE3-1 belongs to a group of transposons known as SINEs, which are found throughout the animal kingdom. SINEs are typically composed of three different DNA parts: a characteristic "head" region that recruits the cellular machinery needed for the transposon to copy itself, a well-recognized "tail" end that helps to insert these copies into different sites, and an intervening region, which usually exhibits no outstanding features and whose function, if any, is unknown. It is within this nondescript central piece that the human CNE and the zebrafish SINE3-1 are most similar to each other. However, because their likeness does not extend outward into the signature regions on either end, the evidence that this human CNE originated from a transposon remained suggestive rather than unequivocal.
After a closer look, we discovered that there were many other human CNEs that resemble the zebrafish SINE3-1 transposon, more than 100 in total. With only one exception, these new additions mimic our original find, and appear similar to SINE3-1 only within their central regions. The lone outlier, however, shares the DNA sequence of SINE3-1's distinguishing head portion as well as its middle, providing indisputable evidence that it and its related CNEs arose from a transposon.
The SINE3 family is noteworthy for other reasons, too. Most family members that exist in humans are preserved with similar sequences and positions in other mammalian genomes. In addition, we spotted several instances within the coelacanth, a lobe-finned fish often referred to as a "living fossil," despite the relatively small amount of genome sequence that is currently available. This striking observation indicates that the SINE3 element has been preserved over some 450 million years and further underscores its biological importance. The SINE3 family members in coelacanth are intriguing because they seem to be modular in their design, incorporating different head and tail sequences from the ones used in other species, but retaining the characteristic central region.
In addition to the MER121 and SINE3 families, we have assembled another 100 families of CNEs that have been carefully preserved in mammals. Our work suggests that transposons played a creative role in shaping the human genome, serving as a vehicle for duplicating and dispersing specific DNA sequences. Whether or not these sequences were retained over time, thereby attaining CNE status, likely depended upon precisely where in the genome they landed and what evolution had to say about the arrangement. In addition, this transposon-based mechanism provides a plausible explanation for why CNEs seem to appear so suddenly within vertebrate lineages. Although the functions of CNEs are still a puzzle, we have put in place some of the first key pieces that reveal how and when CNEs first arose.
Kamal M, Xie X, Lander ES. (2006). PNAS; 103:2740-2745. DOI:10.1073/pnas.0511238103.
Xie X, Kamal M, Lander ES. (2006). PNAS; 103:11659-11664. DOI:10.1073/pnas.0604768103.