When a gene is uninterrupted, the restriction map of its DNA corresponds exactly with the map of its mRNA.
When a gene possesses an intron, the map at each end of the gene corresponds with the map at each end of the message sequence. But within the gene, the maps diverge, because additional regions are found in the gene, but are not represented in the message. Each such region corresponds to an intron. The example of Figure 2.5 compares the restriction maps of a β-globin gene and mRNA. There are two introns. Each intron contains a series of restriction sites that are absent from the cDNA. But the pattern of restriction sites in the exons is the same in both the cDNA and the gene (Wensink et al., 1974; Berget, Moore, and Sharp, 1977; Chow et al., 1977; Glover and Hogness, 1977; Jeffreys and Flavell, 1977).
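The logic of the map comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sequences are invented toy strings (not real globin sequence), the cDNA is modeled simply as the exons joined together, and the function name `restriction_map` is hypothetical. Only the three recognition sequences (EcoRI, HindIII, BamHI) are real.

```python
# Real recognition sequences for three common restriction enzymes.
SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "BamHI": "GGATCC"}

def restriction_map(seq):
    """Return sorted (position, enzyme) pairs for every recognition site in seq."""
    hits = []
    for enzyme, site in SITES.items():
        pos = seq.find(site)
        while pos != -1:
            hits.append((pos, enzyme))
            pos = seq.find(site, pos + 1)
    return sorted(hits)

# Hypothetical toy gene: two exons separated by one intron.
EXON1 = "ATGGAATTCGCC"         # carries an EcoRI site
INTRON = "GTAAGTAAGCTTTTTCAG"  # carries a HindIII site
EXON2 = "AAAGGATCCTGA"         # carries a BamHI site

gene = EXON1 + INTRON + EXON2
cdna = EXON1 + EXON2           # assumption: cDNA = exons joined, nothing else

gene_map = restriction_map(gene)
cdna_map = restriction_map(cdna)

# Sites present in the gene but missing from the cDNA lie in the intron.
intron_only = {enzyme for _, enzyme in gene_map} - {enzyme for _, enzyme in cdna_map}

# Distance of the last site from the 3' end is the same in both maps.
gene_tail = len(gene) - gene_map[-1][0]
cdna_tail = len(cdna) - cdna_map[-1][0]
```

In this toy case the EcoRI site sits at the same distance from the 5' end in both maps and the BamHI site at the same distance from the 3' end, while the HindIII site appears only in the gene. The maps agree at the ends and diverge in the middle, which is the signature of an intron.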
Ultimately a comparison of the genomic and mRNA nucleotide sequences precisely defines the introns. As indicated in Figure 2.6, an intron usually has no open reading frame. An intact reading frame is created in the mRNA sequence by the removal of the introns.
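The effect of intron removal on the reading frame can be illustrated with a minimal Python sketch. The sequence is an invented toy example, the exon coordinates are supplied by hand rather than deduced by alignment, and the helper names `splice` and `first_stop_index` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def splice(genomic, exons):
    """Join exon segments given as 0-based, end-exclusive (start, end) pairs."""
    return "".join(genomic[start:end] for start, end in exons)

def first_stop_index(seq):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None

# Hypothetical toy gene (not a real sequence): exon 1, one intron, exon 2.
EXON1, INTRON, EXON2 = "ATGGCC", "GTAAGTTAGCAG", "AAATGA"
genomic = EXON1 + INTRON + EXON2
exons = [(0, 6), (18, 24)]  # assumed coordinates of the two exons

mrna = splice(genomic, exons)

# Reading the genomic DNA from the ATG runs into a stop codon inside the intron.
genomic_stop = first_stop_index(genomic)
# Reading the spliced sequence reaches a stop only at the final codon.
mrna_stop = first_stop_index(mrna)
```

Here the unspliced sequence is interrupted by an in-frame TAG within the intron, whereas the spliced sequence reads ATG GCC AAA TGA, with the only stop codon at the end. Real introns are defined by aligning the two sequences; this sketch assumes the boundaries are already known.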
The structures of eukaryotic genes show extensive variation. Some genes are uninterrupted, so that the genomic sequence is colinear with that of the mRNA. Most higher eukaryotic genes are interrupted, but the introns vary enormously in both number and size.
All classes of genes may be interrupted: nuclear genes coding for proteins, nucleolar genes coding for rRNA, and genes coding for tRNA. Interruptions also are found in mitochondrial genes in lower eukaryotes, and in chloroplast genes. Interrupted genes do not appear to be excluded from any class of eukaryotes, and have been found in bacteria and bacteriophages, although they are extremely rare in prokaryotic genomes.
Some interrupted genes possess only one or a few introns. The globin genes provide an extensively studied example (see 2.11 The members of a gene family have a common organization). The two general types of globin gene, α and β, share a common type of structure. The consistency of the organization of mammalian globin genes is evident from the structure of the "generic" globin gene summarized in Figure 2.7.
Interruptions occur at homologous positions (relative to the coding sequence) in all known active globin genes, including those of mammals, birds, and frogs. The first intron is always fairly short, and the second usually is longer, but the actual lengths can vary. Most of the variation in overall lengths between different globin genes results from the variation in the second intron. In the mouse, the second intron in the α-globin gene is only 150 bp long, so the overall length of the gene is 850 bp, compared with the major β-globin gene where the intron length of 585 bp gives the gene a total length of 1382 bp. The variation in length of the genes is much greater than the range of lengths of the mRNAs (α-globin mRNA = 585 bases, β-globin mRNA = 620 bases).
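The arithmetic behind this comparison can be made explicit. The short Python sketch below uses only the mouse figures quoted above; the dictionary layout and variable names are illustrative choices, not part of the original data.

```python
# Mouse globin figures quoted in the text (bp for genes and introns, bases for mRNA).
GLOBINS = {
    "alpha":      {"gene": 850,  "intron2": 150, "mrna": 585},
    "beta_major": {"gene": 1382, "intron2": 585, "mrna": 620},
}

a, b = GLOBINS["alpha"], GLOBINS["beta_major"]

gene_diff = b["gene"] - a["gene"]           # difference in overall gene length
intron2_diff = b["intron2"] - a["intron2"]  # difference in second-intron length
mrna_diff = b["mrna"] - a["mrna"]           # difference in mRNA length

# Share of the gene-length difference accounted for by the second intron.
intron2_share = intron2_diff / gene_diff
```

The genes differ by 532 bp, of which 435 bp (roughly four fifths) comes from the second intron alone, while the mRNAs differ by only 35 bases.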
The example of DHFR, a somewhat larger gene, is shown in Figure 2.8. The mammalian DHFR (dihydrofolate reductase) gene is organized into 6 exons that correspond to the 2000-base mRNA. But the exons extend over a much greater length of DNA because the introns are very long. In three mammals the exons remain essentially the same, and the relative positions of the introns are unaltered, but the lengths of individual introns vary extensively, so that the length of the gene varies from 25 to 31 kb.
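A rough calculation shows how small a part of the DHFR gene is represented in the message. The Python sketch below assumes, as a simplification, that the total exon length equals the 2000-base mRNA length quoted above; the function name `exon_fraction` is hypothetical.

```python
MRNA_BASES = 2000                   # mRNA length quoted in the text
GENE_LENGTHS_BP = (25_000, 31_000)  # range of gene lengths quoted in the text

def exon_fraction(mrna_bases, gene_bp):
    """Fraction of the gene represented in the mRNA (assumes exons total mrna_bases)."""
    return mrna_bases / gene_bp

fractions = [exon_fraction(MRNA_BASES, g) for g in GENE_LENGTHS_BP]

# Under the same assumption, the remainder of each gene is intron.
intron_bp = [g - MRNA_BASES for g in GENE_LENGTHS_BP]
```

On this estimate the exons make up about 8% of the shortest gene and between 6% and 7% of the longest, so introns account for more than 90% of the gene in each case.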
The globin and DHFR genes present examples of a general phenomenon: genes that are related by evolution have related organizations, with conservation of the positions of (at least some of) the introns. Variations in the lengths of the genes are primarily determined by the lengths of the introns.