Large-scale efforts have now led to the sequencing of many genomes. A range is summarized in Figure 3.9. They extend from the 0.6 × 10⁶ bp of a mycoplasma to the 3.3 × 10⁹ bp of the human genome, and include several important experimental organisms, such as yeasts, the fruit fly, and a nematode worm. (Web sites with summaries of genome sequences are listed at the end of this section.)
Figure 3.10 summarizes the minimum number of genes found in each class of organism; of course, many species may have more than the minimum number required for their type.
The sequences of the genomes of bacteria and archaea show that virtually all of the DNA (typically 85-90%) codes for RNA or protein. Figure 3.11 shows that the range of genome sizes is about an order of magnitude, and that the genome size is proportional to the number of genes. The typical gene is about 1000 bp in length.
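The proportionality between genome size and gene number can be checked with simple arithmetic. The sketch below is only an illustration: the function name `estimate_gene_count` and the table layout are assumptions made for the example, and the genome sizes and gene counts are the rounded figures quoted in this section.

```python
def estimate_gene_count(genome_size_bp: float, bp_per_gene: float = 1000.0) -> int:
    """Rule-of-thumb estimate: one gene per ~1000 bp of prokaryotic genome."""
    if genome_size_bp <= 0 or bp_per_gene <= 0:
        raise ValueError("genome size and gene length must be positive")
    return round(genome_size_bp / bp_per_gene)


# (genome size in bp, reported gene count), rounded values quoted in this section
reported = {
    "A. aeolicus": (1.5e6, 1512),
    "E. coli (smallest strain)": (4.6e6, 4249),
    "E. coli (largest strain)": (5.5e6, 5361),
}

# Relative error of the rule-of-thumb estimate against the reported count
errors = {}
for name, (size_bp, genes) in reported.items():
    estimate = estimate_gene_count(size_bp)
    errors[name] = (estimate - genes) / genes
    print(f"{name}: estimated {estimate}, reported {genes}, "
          f"relative error {errors[name]:+.1%}")
```

For these three genomes the estimate stays within about 10% of the reported count, which is as much as a one-parameter rule of thumb can be expected to deliver.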
All of the bacteria with genome sizes below 1.5 Mb are obligate intracellular parasites: they live within a eukaryotic host that provides them with small molecules. Their genomes identify the minimum number of functions required to construct a cell. All classes of genes are reduced in number compared with bacteria with larger genomes, but the most significant reduction is in loci coding for enzymes concerned with metabolic functions (which are largely provided by the host cell) and with regulation of gene expression. Mycoplasma genitalium has the smallest genome, ~470 genes.
The archaea have biological properties that are intermediate between the prokaryotes and eukaryotes, but their genome sizes and gene numbers fall in the same range as those of bacteria. Their genome sizes vary from 1.5 to 3 Mb, corresponding to 1500-2700 genes. M. jannaschii is a methane-producing species that lives under high pressure and temperature. Its total gene number is similar to that of H. influenzae, but fewer of its genes can be identified on the basis of comparison with genes known in other organisms. Its apparatus for gene expression resembles that of eukaryotes more than that of prokaryotes, but its apparatus for cell division more closely resembles that of prokaryotes.
The archaea and the smallest free-living bacteria identify the minimum number of genes required to make a cell able to function independently in the environment. The smallest archaeal genome has ~1500 genes. The free-living bacterium with the smallest known genome is the thermophile Aquifex aeolicus, with 1.5 Mb and 1512 genes (Deckert et al., 1998). A "typical" gram-negative bacterium, H. influenzae, has 1,743 genes each of ~900 bp. So we can conclude that ~1500 genes are required to make a free-living organism.
Bacterial genome sizes extend over almost an order of magnitude to <8 Mb. The larger genomes have more genes. The bacteria with the largest genomes, S. meliloti and M. loti, are nitrogen-fixing bacteria that live on plant roots. Their genome sizes (~7 Mb) and total gene numbers (>6000) are similar to those of yeasts (Galibert et al., 2001).
The size of the genome of E. coli is in the middle of the range. The common laboratory strain has 4,288 genes, with an average length of ~950 bp and an average separation between genes of 118 bp (Blattner et al., 1997). But there can be quite significant differences between strains. The known extremes of E. coli range from the smallest strain, which has 4.6 Mb with 4,249 genes, to the largest strain, which has 5.5 Mb with 5,361 genes.
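The E. coli figures can be cross-checked against the statement that 85-90% of a bacterial genome codes for RNA or protein. The following is a rough sketch under simplifying assumptions that are not in the original text: genes are treated as non-overlapping, each followed by one average-length intergenic gap, and the variable names are chosen for the example only.

```python
# Figures quoted for the common laboratory strain (Blattner et al., 1997)
gene_count = 4288
mean_gene_length_bp = 950
mean_spacing_bp = 118

# Assumption: non-overlapping genes, each followed by one average gap
coding_bp = gene_count * mean_gene_length_bp
implied_genome_bp = gene_count * (mean_gene_length_bp + mean_spacing_bp)
coding_fraction = coding_bp / implied_genome_bp

print(f"coding DNA: {coding_bp} bp")
print(f"implied genome size: {implied_genome_bp} bp")
print(f"coding fraction: {coding_fraction:.1%}")
```

Under these assumptions the implied genome size is close to 4.6 Mb and the coding fraction is about 89%, consistent with both the strain sizes and the 85-90% range given earlier.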
We still do not know the functions of all the genes. In most of these genomes, ~60% of the genes can be identified on the basis of homology with known genes in other species. These genes fall approximately equally into classes whose products are concerned with metabolism, cell structure or transport of components, and gene expression and its regulation. In virtually every genome, >25% of the genes cannot be ascribed any function. Many of these genes can be found in related organisms, which implies that they have a conserved function.
There has been some emphasis on sequencing the genomes of pathogenic bacteria, given their medical importance. An important insight into the nature of pathogenicity has been provided by the demonstration that "pathogenicity islands" are a characteristic feature of their genomes (for review see Hacker and Kaper, 2000). These are large regions, ~10-200 kb, that are present in the genome of a pathogenic species, but absent from the genomes of nonpathogenic variants of the same or related species. Their G-C content often differs from that of the rest of the genome, and it is likely that they migrate between bacteria by a process of horizontal transfer. For example, the bacterium that causes anthrax (B. anthracis) has two large plasmids (extrachromosomal DNA), one of which has a pathogenicity island that includes the gene coding for the anthrax toxin.
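Because the G-C content of a pathogenicity island often differs from that of the rest of the genome, a first screen for candidate regions can compare local base composition with the genome-wide average. The sketch below is a toy illustration, not a published method: the function names, window size, step, and threshold are all assumptions made for the example, and atypical base composition alone would not establish that a region is a pathogenicity island.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_gc_windows(seq: str, window: int, step: int, threshold: float):
    """Return (start, end, gc) for windows whose G-C content differs from
    the whole-sequence G-C content by at least `threshold`."""
    overall = gc_fraction(seq)
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        local = gc_fraction(seq[start:start + window])
        if abs(local - overall) >= threshold:
            hits.append((start, start + window, local))
    return hits


# Toy sequence: a G-C rich background with an A-T rich insert in the middle
toy_genome = "GC" * 50 + "AT" * 20 + "GC" * 50
candidates = atypical_gc_windows(toy_genome, window=20, step=20, threshold=0.5)
for start, end, gc in candidates:
    print(f"{start}-{end}: G-C content {gc:.2f}")
```

On a real genome the window would be on the order of kilobases rather than 20 bp, since the islands described above span roughly 10-200 kb.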