The genome of an organism is the sum total of its genetic information. The genome is not only a blueprint for the organism; it also contains historical notes on the organism's evolution. The ability to determine the sequence of deoxyribonucleic acid (DNA), and thus read the messages in the genome, is of immense biological importance because it both describes the organism in detail and indicates its evolutionary history.
DNA is a linear chain of four nucleotides: adenosine (A), thymidine (T), cytidine (C), and guanosine (G). The genetic information in DNA is encoded in the sequence of these nucleotides, much as the information in a word is encoded in a sequence of letters. The technique for determining the sequence of nucleotides in DNA is based on the same mechanism by which DNA is replicated in the cell. DNA is composed of two complementary strands in which the As of one strand are paired with the Ts of the complementary strand and the Cs of one strand are paired with the Gs of the complementary strand. When DNA is replicated, a new DNA strand (the primer strand) is extended by using the information in the complementary (template) strand. DNA has a direction (polarity): the growing end of a DNA strand is the 3′ end and the other end is the 5′ end. An enzyme, DNA polymerase, replicates DNA by adding nucleotides that complement the template strand to the 3′ end of the primer strand (Figure 2).
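The pairing rule and the direction of synthesis described above can be illustrated with a minimal Python sketch. The function name `extend_primer` and the convention of writing the template 3′ to 5′ (so that it lines up position by position with the primer written 5′ to 3′) are assumptions made for this illustration only; the sketch models the pairing logic, not the chemistry.

```python
# Base-pairing rules: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def extend_primer(template_3to5: str, primer_5to3: str) -> str:
    """Extend a primer along a template, as DNA polymerase does.

    template_3to5 is the template strand written from its 3' end to its
    5' end; primer_5to3 is the primer written 5' to 3'. Written this
    way, position i of the primer pairs with position i of the template.
    """
    # The primer must pair with the start of the template.
    for p, t in zip(primer_5to3, template_3to5):
        if PAIR[t] != p:
            raise ValueError("primer does not pair with the template")

    strand = primer_5to3
    # Add one complementary nucleotide at a time to the 3' end.
    for t in template_3to5[len(primer_5to3):]:
        strand += PAIR[t]
    return strand


if __name__ == "__main__":
    print(extend_primer("TACGGATC", "ATG"))  # prints ATGCCTAG
```

The new strand always grows at its 3′ end, which is why the loop only ever appends to `strand`.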
DNA polymerase has an absolute requirement for a hydroxyl group (OH) on the 3′ end of the primer strand. If the 3′ hydroxyl group is missing, no further nucleotides can be added to the primer strand. This termination of the elongation of the primer strand is the basis for determining the DNA sequence. If the DNA polymerase is presented with a mixture of nucleotides, some of which have 3′ OH groups and others of which have no 3′ OH group (and are bound to a colored dye), both types of nucleotides are added to the growing primer strand. When a nucleotide with no 3′ OH group is added to the primer strand, elongation is terminated with the colored dye at the 3′ end of the strand.
All the essential elements for determining the sequence of nucleotides in the primer DNA strand are now in place. A DNA synthesis reaction is set up in a test tube (in vitro), including DNA polymerase, a template DNA strand, a short uniform primer DNA strand, and a mixture of the four nucleotides (A, T, C, and G). The short primer DNA strands are synthesized chemically and are identical, so they pair with a specific sequence in the template DNA strand. Each of the nucleotides is present in two forms: the normal form with a 3′ hydroxyl group and the terminating form with a colored dye and no 3′ hydroxyl group. Each different terminating nucleotide (A, T, C, and G) has a different colored dye attached.
The amount of normal nucleotides present in the reaction is much larger than the amount of terminating nucleotides, so DNA synthesis proceeds almost normally, and only occasionally is the elongation of a primer strand terminated by the incorporation of a dye-labeled nucleotide lacking a 3′ hydroxyl group. Eventually, however, all of the primer strands incorporate a dye-labeled nucleotide and their elongation is terminated. Thus, at the end of the reaction there is a vast collection of primer strands of varying lengths, each terminated with a nucleotide that carries a colored dye specific to that terminal nucleotide.
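The effect of mixing a small proportion of terminating nucleotides into the reaction can be sketched as a simple simulation. This is a toy model under stated assumptions: every position has the same chance `p_terminate` of receiving a terminator, the function name `simulate_reaction` and the default values are hypothetical, and strands that reach the end of the template without a terminator are simply dropped because they carry no dye.

```python
import random

PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def simulate_reaction(template_3to5, primer_len, n_strands=10000,
                      p_terminate=0.05, seed=0):
    """Return (length, dye_base) for each terminated primer strand.

    p_terminate is an assumed, uniform chance that the nucleotide added
    at any position is the dye-labeled terminating form.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_strands):
        length = primer_len
        # Walk along the template beyond the primer, one base at a time.
        for t in template_3to5[primer_len:]:
            length += 1
            if rng.random() < p_terminate:
                # Terminator incorporated: elongation stops here and the
                # strand carries the dye for the base just added.
                fragments.append((length, PAIR[t]))
                break
    return fragments


if __name__ == "__main__":
    frags = simulate_reaction("TACGGATCCGTA", primer_len=3)
    print(len(frags), "terminated strands")
    print(sorted(set(frags))[:3])
```

Because termination is rare at each step, strands of every length accumulate, and each length is tied to exactly one dye.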
All of the primer strands start at the same point, specified by the sequence of the short uniform primer DNA. Thus, the length of the primer strand corresponds to the position of the terminal nucleotide in the DNA sequence relative to the starting position of the primer DNA strand. The color of the dye on the primer strand identifies the terminal nucleotide as an A, T, C, or G. Once the primer strands are arranged according to length, the DNA sequence will be indicated by the series of colors on progressively longer primer strands.
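The reasoning in this paragraph amounts to sorting the strands by length and reading off one dye per length. A minimal sketch follows, assuming each terminated strand is represented as a hypothetical `(length, base)` pair in which `base` is the nucleotide identified by the dye.

```python
def read_sequence(fragments):
    """Read a sequence from (length, base) pairs of terminated strands."""
    dye_at = {}
    for length, base in fragments:
        # Every strand of a given length must end in the same base.
        if dye_at.setdefault(length, base) != base:
            raise ValueError(f"conflicting dyes at length {length}")
    if not dye_at:
        return ""
    lengths = sorted(dye_at)
    # A missing length would silently drop a nucleotide from the read.
    if lengths != list(range(lengths[0], lengths[-1] + 1)):
        raise ValueError("the ladder of fragment lengths has a gap")
    # Progressively longer strands give successive nucleotides.
    return "".join(dye_at[n] for n in lengths)


if __name__ == "__main__":
    frags = [(6, "T"), (4, "C"), (5, "C"), (4, "C"), (7, "A"), (8, "G")]
    print(read_sequence(frags))  # prints CCTAG
```

The two checks reflect the two conditions the method relies on: one terminal nucleotide per length, and at least one strand for every length in the range being read.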
The DNA strands can be readily separated according to length by polyacrylamide gel electrophoresis (see Figure 1). The gel is a loose matrix of fibers through which the DNA can migrate. The DNA molecules have a large negative charge and thus are pulled toward the positive electrode in an electric field. The whole collection of primer strand DNA molecules is placed in a well at the top of the gel, with the positive electrode at the bottom of the gel. When the electric field is applied, the DNA molecules are drawn toward the positive electrode, with shorter molecules passing through the gel matrix more easily than longer molecules. Thus the smaller DNA molecules move the fastest.
After a fixed period of time, the DNA molecules are separated according to length, with the shortest molecules moving furthest down the gel. All of the molecules of a given length will form a band and will have the same terminal nucleotide and thus the same color. The DNA sequence can be read from the colors of the bands. One reads the sequence of the DNA from the 5′ end, starting at the bottom of the gel, to the 3′ end at the top of the gel.
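Reading a gel from bottom to top can be sketched in the same spirit. The color assignments in `DYE_TO_BASE` and the band distances below are invented for illustration; the text does not say which color marks which nucleotide, and real instruments use their own dye sets.

```python
# Hypothetical dye colors, assumed for this example only.
DYE_TO_BASE = {"green": "A", "red": "T", "blue": "C", "yellow": "G"}


def read_gel(bands):
    """Read a sequence 5' to 3' from (distance_from_well_mm, color) bands.

    The band that has traveled furthest from the well (the bottom of the
    gel) holds the shortest strands, so it is read first.
    """
    bottom_to_top = sorted(bands, key=lambda band: band[0], reverse=True)
    return "".join(DYE_TO_BASE[color] for _, color in bottom_to_top)


if __name__ == "__main__":
    bands = [(40.0, "blue"), (95.0, "green"), (70.0, "red"), (55.0, "yellow")]
    print(read_gel(bands))  # prints ATGC
```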
In practice the whole process is automated; the bands are scanned with a laser as they pass a specific point in the gel. These scans produce profiles for each nucleotide, as shown in the lower portion of Figure 3. A computer program then determines the DNA sequence from these colored profiles, as shown in the upper portion of Figure 3. A single automated DNA sequencing instrument can determine more than 100,000 nucleotides of DNA sequence per day and a large sequencing facility can often produce over 10 million nucleotides of sequence per day. This high sequencing capacity has made it feasible to determine the complete DNA sequence of large genomes including the human genome.
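How a program might turn four scanned profiles into a sequence can be suggested with a deliberately simple sketch: at each scan point, take the strongest of the four signals and call a base wherever that signal forms a peak above a threshold. This is not the algorithm used by any particular instrument; the function name `call_bases`, the threshold, and the sample profiles are all assumptions, and real base-calling software also corrects for noise, overlapping dye colors, and uneven band spacing.

```python
def call_bases(traces, threshold=0.5):
    """Call bases from four intensity profiles sampled over scan time.

    traces maps each base (A, T, C, G) to a list of intensities, one per
    scan point. Strands passing the detector first are the shortest, so
    the calls come out in 5' to 3' order.
    """
    n_points = len(next(iter(traces.values())))
    calls = []
    for i in range(1, n_points - 1):
        # Strongest channel at this scan point.
        base, signal = max(((b, t[i]) for b, t in traces.items()),
                           key=lambda item: item[1])
        trace = traces[base]
        # Call a base only at a local peak that clears the threshold.
        if signal >= threshold and trace[i - 1] < signal >= trace[i + 1]:
            calls.append(base)
    return "".join(calls)


if __name__ == "__main__":
    traces = {
        "A": [0.0, 0.9, 0.1, 0.0, 0.0, 0.0, 0.0, 0.1, 0.8, 0.0],
        "C": [0.0, 0.0, 0.1, 1.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
        "G": [0.0, 0.0, 0.0, 0.0, 0.1, 0.7, 0.1, 0.0, 0.0, 0.0],
        "T": [0.0] * 10,
    }
    print(call_bases(traces))  # prints ACGA
```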