Biology
THE CONTENT OF THE GENOME
KEY TERMS:
- The genome is the complete set of sequences in the genetic material of an organism. It includes the sequence of each chromosome plus any DNA in organelles.
- The transcriptome is the complete set of RNAs present in a cell, tissue, or organism. Its complexity is due mostly to mRNAs, but it also includes noncoding RNAs.
- The proteome is the complete set of proteins that is expressed by the entire genome. Because some genes code for multiple proteins, the size of the proteome is greater than the number of genes. Sometimes the term is used to describe complement of proteins expressed by a cell at any one time.
The key question about the genome is how many genes it contains. We can think about the total number of genes at four levels, corresponding to successive stages in gene expression:
- The genome is the complete set of genes of an organism. Ultimately it is defined by the complete DNA sequence, although as a practical matter it may not be possible to identify every gene unequivocally solely on the basis of sequence.
- The transcriptome is the complete set of genes expressed under particular conditions. It is defined in terms of the set of RNA molecules that is present, and can refer to a single cell type or to any more complex assembly of cells up to the complete organism. Because some genes generate multiple mRNAs, the transcriptome is likely to be larger than the number of genes defined directly in the genome. The transcriptome includes noncoding RNAs as well as mRNAs.
- The proteome is the complete set of proteins. It should correspond to the mRNAs in the transcriptome, although there can be differences of detail reflecting changes in the relative abundance or stabilities of mRNAs and proteins. It can be used to refer to the set of proteins coded by the whole genome or produced in any particular cell or tissue.
- Proteins may function independently or as part of multiprotein assemblies. If we could identify all protein-protein interactions, we could define the total number of independent assemblies of proteins.
The number of genes in the genome can be identified directly by defining open reading frames. Large scale mapping of this nature is complicated by the fact that interrupted genes may consist of many separated open reading frames. Since we do not necessarily have information about the functions of the protein products, or indeed proof that they are expressed at all, this approach is restricted to defining the potential of the genome. However, a strong presumption exists that any conserved open reading frame is likely to be expressed.
Another approach is to define the number of genes directly in terms of the transcriptome (by directly identifying all the mRNAs) or proteome (by directly identifying all the proteins). This gives an assurance that we are dealing with bona fide genes that are expressed under known circumstances. It allows us to ask how many genes are expressed in a particular tissue or cell type, what variation exists in the relative levels of expression, and how many of the genes expressed in one particular cell are unique to that cell or are also expressed elsewhere.
Concerning the types of genes, we may ask whether a particular gene is essential: what happens to a null mutant? If a null mutation is lethal, or the organism has a visible defect, we may conclude that the gene is essential or at least conveys a selective advantage. But some genes can be deleted without apparent effect on the phenotype. Are these genes really dispensable, or does a selective disadvantage result from the absence of the gene, perhaps in other circumstances, or over longer periods of time?
-
Expressed Gene Number Can Be Measured En Masse
KEY CONCEPTS:"Chip" technology allows a snapshot to be taken of the expression of the entire genome in a yeast cell. ~75% (~4500 genes) of the yeast genome is expressed under normal growth conditions. Chip technology allows detailed comparisons of related...
-
The Human Genome Has Fewer Genes Than Expected
KEY CONCEPTS:Only 1% of the human genome consists of coding regions. The exons comprise ~5% of each gene, so genes (exons plus introns) comprise ~25% of the genome. The human genome has 30,000-40,000 genes. ~60% of human genes are alternatively spliced....
-
How Many Different Types Of Genes Are There?
KEY TERMS:The proteome is the complete set of proteins that is expressed by the entire genome. Because some genes code for multiple proteins, the size of the proteome is greater than the number of genes. Sometimes the term is used to describe complement...
-
Total Gene Number Is Known For Several Eukaryotes
KEY CONCEPTS:There are 6000 genes in yeast, 18,500 in worm, 13,600 in fly, 25,000 in the small plant Arabidopsis, and probably 30,000 in mouse and <40,000 in Man. As soon as we look at eukaryotic genomes, the relationship between genome size and gene...
-
Some Dna Sequences Code For More Than One Protein
KEY CONCEPTS:The use of alternative initiation or termination codons allows two proteins to be generated where one is equivalent to a fragment of the other. Nonhomologous protein sequences can be produced from the same sequence of DNA when it is read...
Biology