The most common type of duplication generates a second copy of the gene close to the first copy. In some cases, the copies remain associated, and further duplication may generate a cluster of related genes. The best-characterized example of a gene cluster is provided by the globin genes, which constitute an ancient gene family concerned with a function that is central to the animal kingdom: the transport of oxygen through the bloodstream.
The major constituent of the red blood cell is the globin tetramer, associated with its heme (iron-binding) group in the form of hemoglobin. Functional globin genes in all species have the same general structure, divided into three exons as shown previously in Figure 2.7. We conclude that all globin genes are derived from a single ancestral gene; so by tracing the development of individual globin genes within and between species, we may learn about the mechanisms involved in the evolution of gene families.
The division of globin chains into α-like and β-like reflects the organization of the genes. Each type of globin is coded by genes organized into a single cluster. The structures of the two clusters in the higher primate genome are illustrated in Figure 4.3.
Stretching over 50 kb, the β cluster contains five functional genes (ε, two γ, δ, and β) and one nonfunctional gene (ψβ). The two γ genes differ in their coding sequence in only one amino acid; the G variant has glycine at position 136, where the A variant has alanine.
The more compact α cluster extends over 28 kb and includes one active ζ gene, one ζ nonfunctional gene, two α genes, two α nonfunctional genes, and the θ gene of unknown function. The two α genes code for the same protein. Two (or more) identical genes present on the same chromosome are described as nonallelic copies.
The details of the relationship between embryonic and adult hemoglobins vary with the organism. The human pathway has three stages: embryonic, fetal, and adult. The distinction between embryonic and adult is common to mammals, but the number of pre-adult stages varies. In Man, zeta and alpha are the two α-like chains. Epsilon, gamma, delta, and beta are the β-like chains. Figure 4.4 shows how the chains are expressed at different stages of development.
In the human pathway, ζ is the first α-like chain to be expressed, but is soon replaced by α. In the β-pathway, ε and γ are expressed first, with δ and β replacing them later. In adults, the α₂β₂ form provides 97% of the hemoglobin, α₂δ₂ is ~2%, and ~1% is provided by persistence of the fetal form α₂γ₂.
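The tetramer notation packs two facts together: which α-like and β-like chains pair up, and how much each form contributes in the adult. A minimal Python sketch that makes this explicit, using only the chain classes and approximate percentages quoted above; the names `ALPHA_LIKE`, `BETA_LIKE`, `ADULT_FORMS`, and `tetramer()` are hypothetical, introduced only for illustration.

```python
# Hypothetical classification of the human globin chains, following the text.
ALPHA_LIKE = {"ζ", "α"}
BETA_LIKE = {"ε", "γ", "δ", "β"}

# Approximate adult composition (percent of total hemoglobin) from the text.
# Key (x, y) stands for the tetramer x2y2, e.g. ("α", "β") is α2β2.
ADULT_FORMS = {
    ("α", "β"): 97,  # main adult form
    ("α", "δ"): 2,   # minor adult form (~2%)
    ("α", "γ"): 1,   # persisting fetal form (~1%)
}


def tetramer(alpha_chain: str, beta_chain: str) -> list[str]:
    """Return the four chains of an x2y2 tetramer after checking both chain types."""
    if alpha_chain not in ALPHA_LIKE:
        raise ValueError(f"{alpha_chain} is not an α-like chain")
    if beta_chain not in BETA_LIKE:
        raise ValueError(f"{beta_chain} is not a β-like chain")
    return [alpha_chain, alpha_chain, beta_chain, beta_chain]


# List each adult form with its share and its four chains.
for (a, b), percent in ADULT_FORMS.items():
    print(f"{a}2{b}2: ~{percent}% of adult hemoglobin, chains {tetramer(a, b)}")

# The three quoted shares account for all adult hemoglobin.
print(f"total: {sum(ADULT_FORMS.values())}%")
```

Laid out this way, the point of the percentages is easy to see: the α chain is common to all three adult forms, and only the β-like partner changes.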
What is the significance of the differences between embryonic and adult globins? The embryonic and fetal forms have a higher affinity for oxygen. This is necessary in order to obtain oxygen from the mother's blood. This explains why there is no equivalent in (for example) chicken, where the embryonic stages occur outside the body (that is, within the egg).
Functional genes are defined by their expression in RNA, and ultimately by the proteins for which they code. Nonfunctional genes are defined as such by their inability to code for proteins; the reasons for inactivity vary, and the deficiencies may be in transcription or translation (or both). They are called pseudogenes and given the symbol ψ.
A similar general organization is found in other vertebrate globin gene clusters, but details of the types, numbers, and order of genes all vary, as illustrated in Figure 4.5. Each cluster contains both embryonic and adult genes. The total lengths of the clusters vary widely. The longest is found in the goat, where a basic cluster of 4 genes has been duplicated twice. The distribution of active genes and pseudogenes differs in each case, illustrating the random nature of the conversion of one copy of a duplicated gene into the inactive state.
The characterization of these gene clusters makes an important general point. There may be more members of a gene family, both functional and nonfunctional, than we would suspect on the basis of protein analysis. The extra functional genes may represent duplicates that code for identical polypeptides; or they may be related to known proteins, although different from them (and presumably expressed only briefly or in low amounts).
With regard to the question of how much DNA is needed to code for a particular function, we see that coding for the β-like globins requires a range of 20-120 kb in different mammals. This is much greater than we would expect just from scrutinizing the known β-globin proteins or even considering the individual genes. However, clusters of this type are not common; most genes are found as individual loci.
From the organization of globin genes in a variety of species, we should be able to trace the evolution of present globin gene clusters from a single ancestral globin gene. Our present view of the evolutionary descent is pictured in Figure 4.6 (for review see Hardison, 1998).
The leghemoglobin gene of plants, which is related to the globin genes, may represent the ancestral form. The furthest back that we can trace a globin gene in modern form is provided by the sequence of the single chain of mammalian myoglobin, which diverged from the globin line of descent ~800 million years ago. The myoglobin gene has the same organization as globin genes, so we may take the three-exon structure to represent their common ancestor.
Some "primitive fish" have only a single type of globin chain, so they must have diverged from the line of evolution before the ancestral globin gene was duplicated to give rise to the α and β variants. This appears to have occurred ~500 million years ago, during the evolution of the bony fish.
The next stage of evolution is represented by the state of the globin genes in the frog X. laevis, which has two globin clusters. However, each cluster contains both α and β genes, of both larval and adult types. The cluster must therefore have evolved by duplication of a linked α-β pair, followed by divergence between the individual copies. Later the entire cluster was duplicated.
The amphibians separated from the mammalian/avian line ~350 million years ago, so the separation of the α- and β-globin genes must have resulted from a transposition in the mammalian/avian forerunner after this time. This probably occurred in the period of early vertebrate evolution. Since there are separate clusters for α and β globins in both birds and mammals, the α and β genes must have been physically separated before the mammals and birds diverged from their common ancestor, an event that occurred probably ~270 million years ago.
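The argument above dates the separation of the α and β clusters by bracketing it between two dated splits. A minimal Python sketch of that bracketing, using only the approximate dates quoted in this section; the `EVENTS` table and the `bracket()` helper are hypothetical names introduced for illustration, and the dates are the text's rough estimates rather than precise values.

```python
# Approximate dates from the text, in millions of years ago (Mya).
# A larger number means further in the past.
EVENTS = {
    "myoglobin diverges from the globin line": 800,
    "ancestral globin gene duplicates into α and β": 500,
    "amphibians separate from the mammalian/avian line": 350,
    "birds and mammals diverge": 270,
}


def bracket(after_mya: int, before_mya: int) -> tuple[int, int]:
    """Bound an event that followed one split and preceded another.

    Returns (earliest, latest) in Mya and rejects bounds given in the wrong order.
    """
    if after_mya <= before_mya:
        raise ValueError("the earlier split must have the larger Mya value")
    return after_mya, before_mya


# Print the timeline, oldest event first.
for name, mya in sorted(EVENTS.items(), key=lambda item: item[1], reverse=True):
    print(f"~{mya} Mya: {name}")

# Frog clusters still contain both α and β genes, so the separation came after the
# amphibian split; birds and mammals both have separate α and β clusters, so it
# came before they diverged.
earliest, latest = bracket(
    EVENTS["amphibians separate from the mammalian/avian line"],
    EVENTS["birds and mammals diverge"],
)
print(f"α and β clusters separated ~{earliest} to ~{latest} Mya "
      f"(a window of ~{earliest - latest} million years)")
```

With the quoted dates, the window comes out at roughly 80 million years; the sketch only restates the inference in the text and adds no new dating evidence.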
Changes have occurred within the separate α and β clusters in more recent times, as we see from the description of the divergence of the individual genes in Section 4.4, "Sequence divergence is the basis for the evolutionary clock."