DNA sequencing
DNA sequencing is the process of determining the nucleotide order of a given DNA fragment, called the DNA sequence. Currently, almost all DNA sequencing is performed using the chain termination method [1], developed by Frederick Sanger. This technique uses sequence-specific termination of an in vitro DNA synthesis reaction using modified nucleotide substrates.
The sequence of DNA encodes the information that living things need to survive and reproduce. Determining the sequence is therefore useful both in 'pure' research into why and how organisms live and in applied subjects. Because DNA is key to all living things, knowledge of DNA sequence may be useful in almost any area of biology. For example, in medicine it can be used to identify and diagnose genetic diseases and potentially to develop treatments for them. Similarly, research into pathogens may lead to treatments for contagious diseases.
Figure: Part of a radioactively labelled sequencing gel
In chain terminator sequencing (Sanger sequencing), which in its modern form is usually run with the thermal cycling used in PCR, extension is initiated at a specific site on the template DNA by a short oligonucleotide 'primer' complementary to the template at that region. The oligonucleotide primer is extended by a DNA polymerase, an enzyme that replicates DNA. Along with the primer and DNA polymerase, the reaction contains the four deoxynucleotide bases (DNA building blocks) and a low concentration of a chain-terminating nucleotide (most commonly a dideoxynucleotide). Occasional incorporation of the chain-terminating nucleotide by the DNA polymerase produces a series of related DNA fragments, each terminated at a position where that particular nucleotide is used. The fragments are then size-separated by electrophoresis in a slab polyacrylamide gel or, more commonly now, in a narrow glass tube (capillary) filled with a viscous polymer.
The classical chain termination method, or Sanger method, first involves preparing the DNA to be sequenced as a single strand. (A single-stranded preparation ensures one band per nucleotide position, whereas a double-stranded preparation would give two bands and make the sequence impossible to read.) The DNA sample is divided into four separate samples. Each of the four samples receives a primer, the four normal deoxynucleotides (dATP, dGTP, dCTP and dTTP), DNA polymerase, and only one of the four dideoxynucleotides (ddATP, ddGTP, ddCTP or ddTTP). The dideoxynucleotides are added in limited quantities. Either the primer or the dideoxynucleotides are radiolabeled or carry a fluorescent tag.
As the DNA strand is elongated, the DNA polymerase catalyses the joining of deoxynucleotides complementary to the template bases. The nucleotides available to the polymerase are a mixture of normal and tagged/terminating nucleotides. Whenever the matching dideoxynucleotide happens to be picked up by the polymerase, it is incorporated into the growing DNA strand. A dideoxynucleotide lacks the 3'-OH group needed to attach the next nucleotide, so the tagged/terminating base prevents further elongation. The result is a series of DNA fragments of varying lengths, all ending at the same type of base. Four separate reactions are needed because the tags do not identify which base terminated the chain. Only short stretches of DNA can be sequenced in each reaction. The PCR technique is limited to about 10 000 base pairs, and the maximum length of extension is dictated by the ratio of tagged/terminating nucleotides to normal ones.
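The termination process described above can be illustrated with a toy simulation. This is a minimal sketch, not a model of real reaction kinetics. The function name `terminator_reaction` is hypothetical. The parameter `p_term` is also hypothetical: it is a per-site chance of incorporating the dideoxynucleotide, standing in for the terminator-to-normal nucleotide ratio. To avoid complementing, the input is written as the sequence of the new strand being synthesized rather than the template.

```python
import random


def terminator_reaction(new_strand: str, dd_base: str, p_term: float,
                        n_molecules: int, seed: int = 0) -> list[int]:
    """Toy model of one of the four Sanger reactions.

    new_strand: sequence of the strand the polymerase builds, 5' to 3'.
    dd_base: the base whose dideoxy form is present in this reaction.
    p_term: hypothetical chance of incorporating the terminator (instead of
        the normal nucleotide) each time dd_base is called for.
    Returns the length of every terminated fragment. Chains that reach the
    end without terminating are ignored in this sketch.
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_molecules):
        for pos, base in enumerate(new_strand, start=1):
            # A terminator can only be incorporated where dd_base is called for.
            if base == dd_base and rng.random() < p_term:
                lengths.append(pos)  # no 3'-OH: extension stops here
                break
    return lengths


if __name__ == "__main__":
    fragments = terminator_reaction("GATTACA", "A", p_term=0.3, n_molecules=1000)
    print(sorted(set(fragments)))  # only lengths where the new strand has an A
```

With `p_term=1.0` every chain stops at the first matching base. Lower values spread the fragments over all positions where that base occurs, which is why each reaction yields a ladder of lengths rather than a single band.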
The DNA is then denatured, and the resulting fragments are separated by size with a resolution of just one nucleotide by gel electrophoresis. Each of the four DNA samples is run in one of four individual lanes (lanes A, T, G, C) depending on which dideoxynucleotide was added. Depending on whether the primers or dideoxynucleotides were radiolabeled or fluorescently labeled, the DNA bands can be detected by exposure to X-ray film or under UV light, and the DNA sequence can be read directly off the gel. In the image on the right, X-ray film was exposed to the dried gel, and the dark bands indicate the positions of DNA molecules of different lengths. A dark band in a lane indicates a chain termination at that particular nucleotide, and the DNA sequence can be read off as indicated.
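Reading the four lanes amounts to merging their bands by length. The sketch below assumes ideal data: exactly one band per position, with lengths counted from the end of the primer. The function names are hypothetical. Shorter fragments migrate further, so reading from the shortest band to the longest gives the synthesized strand 5' to 3'. The template strand is its reverse complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Merge the bands of the A, C, G and T lanes into one sequence.

    lanes maps each dideoxy base to the fragment lengths (counted from the
    end of the primer) at which a band appears in that lane. Reading from
    the shortest band to the longest gives the new strand 5' to 3'.
    """
    base_at: dict[int, str] = {}
    for base, lengths in lanes.items():
        for n in lengths:
            # Ideal data has exactly one lane with a band at each length.
            if base_at.get(n, base) != base:
                raise ValueError(f"bands in two lanes at length {n}")
            base_at[n] = base
    if not base_at:
        return ""
    longest = max(base_at)
    missing = [n for n in range(1, longest + 1) if n not in base_at]
    if missing:
        raise ValueError(f"no band at lengths {missing}")
    return "".join(base_at[n] for n in range(1, longest + 1))


def reverse_complement(seq: str) -> str:
    """The template strand is the reverse complement of the strand read off the gel."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    read = read_gel({"A": [2, 5, 7], "C": [6], "G": [1], "T": [3, 4]})
    print(read)                      # GATTACA
    print(reverse_complement(read))  # TGTAATC
```

The two checks in `read_gel` catch the problems listed in the next paragraph in their crudest form: bands in more than one lane at the same position, or gaps in the ladder.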
Sequencing by the Sanger method can run into various problems. The primer may also anneal to a second site on the template, causing two sequences to be read at the same time. This can be avoided by using higher annealing temperatures and primers with a higher G and C content. Contaminating RNA can act as a primer, leading to bands in all lanes at all positions through non-specific priming. Other problems include contamination with other plasmids, inhibitors of DNA polymerase, and generally low DNA concentrations. Secondary structure in the DNA being read by the polymerase can also cause reading problems, which show up on the readout as bands in all lanes at only a few positions. In short, the problems of this method are the standard problems one would encounter in PCR.
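The two remedies named above can be screened roughly in code: a higher G and C content, and a check that the primer has only one binding site. This sketch assumes the Wallace rule, Tm ≈ 2·(A+T) + 4·(G+C) °C, which is a coarse approximation usually applied only to short oligonucleotides. It looks only for exact matches on either strand. Real mispriming also occurs at partially matching sites, so this is a first-pass filter, not a primer-design tool. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C by the Wallace rule (short oligos only)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def count_sites(primer: str, dna: str) -> int:
    """Count exact, possibly overlapping, matches of the primer on both strands.

    A primer anneals to the strand complementary to the one it matches, so
    searching both strands covers both orientations. More than one hit means
    the primer could anneal at a second site; partial matches are not detected.
    """
    p = primer.upper()
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    return sum(
        strand[i:i + len(p)] == p
        for strand in (forward, reverse)
        for i in range(len(strand) - len(p) + 1)
    )


if __name__ == "__main__":
    primer = "GCGCATAT"
    print(gc_content(primer), wallace_tm(primer))  # 0.5 24
    print(count_sites("GATT", "GATTACAGATT"))      # 2: primer would bind twice
```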
There are two sub-types of chain-termination sequencing. In the original method, the nucleotide order of a particular DNA template is inferred by performing four parallel extension reactions, each using one of the four chain-terminating bases. The DNA fragments are detected by labelling the primer with a base-nonspecific label, for example radioactive phosphorus, before the sequencing reaction is performed. The four reactions are then run in four adjacent lanes on a slab polyacrylamide gel.
The Sanger method can also be done using primers that carry a base-nonspecific label at their 5' end, so that the label ends up at the 5' end of each extension product. Instead of the label being included in the terminating nucleotide, the label is in the primer. The difference from the radioactive Sanger method is that the label is at the 5' end instead of the 3' end. Four separate reactions are still required, but the dye labels can be read by an optical system instead of film or phosphor storage screens, so the method is faster, cheaper, and easier to automate. This approach is known as 'dye-primer sequencing'.
Dye terminator sequencing
Figure: View of the start of an example dye-terminator read
An alternative to labelling the primer is to label the terminators instead, an approach commonly called 'dye terminator sequencing'. Its major advantage is that the complete sequencing set can be performed in a single reaction, rather than the four needed with the labelled-primer approach. This is accomplished by labelling each of the dideoxynucleotide chain terminators with a separate fluorescent dye, each fluorescing at a different wavelength. This method is easier and quicker than the dye-primer approach. However, it may produce more uneven data peaks (different heights), because incorporation of the large dye-labelled chain terminators varies with the template sequence. This problem has been significantly reduced by the introduction of new enzymes and dyes that minimize incorporation variability.
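Turning a four-dye trace into bases can be sketched with a naive caller. It finds peaks in the combined signal and assigns each peak to the dye channel with the highest intensity. The function `call_bases` and its `threshold` parameter are assumptions for illustration. Production base callers model peak shape, spacing and dye mobility. The example also shows why uneven peak heights matter: a threshold tuned for strong peaks can miss a weak one.

```python
def call_bases(traces: dict[str, list[float]], threshold: float) -> str:
    """Naive four-dye base caller (illustration only).

    traces maps each base to its dye channel's intensity at each time point;
    all channels must have the same length. A peak is a local maximum of the
    summed signal that reaches `threshold`; it is called as the base whose
    channel is brightest at that time point.
    """
    bases = list(traces)
    n = len(traces[bases[0]])
    total = [sum(traces[b][i] for b in bases) for i in range(n)]
    calls = []
    for i in range(1, n - 1):
        is_peak = total[i] > total[i - 1] and total[i] >= total[i + 1]
        if is_peak and total[i] >= threshold:
            column = {b: traces[b][i] for b in bases}
            calls.append(max(column, key=column.get))
    return "".join(calls)


if __name__ == "__main__":
    # Hypothetical trace: the C peak is weaker than the A and G peaks.
    traces = {
        "A": [0, 5, 0, 0, 0, 0, 0],
        "C": [0, 0, 0, 4, 0, 0, 0],
        "G": [0, 0, 0, 0, 0, 6, 0],
        "T": [0, 0, 0, 0, 0, 0, 0],
    }
    print(call_bases(traces, threshold=1))    # ACG
    print(call_bases(traces, threshold=4.5))  # AG: the weak C peak is missed
```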
This method is now used for the vast majority of sequencing reactions as it is both simpler and cheaper. The major reason for this is that the primers do not have to be separately labelled (which can be a significant expense for a single-use custom primer), although this is less of a concern with frequently used 'universal' primers.
Automation and sample preparation
Modern automated DNA sequencing instruments can sequence as many as 384 fluorescently labelled samples in a batch (run) and perform as many as 24 runs a day. These instruments perform only the size separation and peak reading. The actual sequencing reaction(s), cleanup and resuspension in a suitable buffer must be performed separately.
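The throughput figures above imply a rough daily capacity. This sketch assumes that every run is full and that every sample yields a read of the maximum length quoted later in this article (around 1000 base pairs). The result is therefore an upper bound rather than a typical output. The function name is hypothetical.

```python
def daily_capacity(samples_per_run: int, runs_per_day: int,
                   read_length_bp: int) -> tuple[int, int]:
    """Upper bound on reads and bases per instrument per day.

    Assumes every run is full and every read reaches read_length_bp,
    which real runs will not achieve.
    """
    reads = samples_per_run * runs_per_day
    return reads, reads * read_length_bp


if __name__ == "__main__":
    reads, bases = daily_capacity(384, 24, read_length_bp=1000)
    print(reads, bases)  # 9216 9216000
```

That is at most about 9,200 reads, or roughly 9.2 million bases, per instrument per day.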
The magnitude of the fluorescent signal is related to the number of DNA strands in the reaction. If the initial amount of DNA is small, the signals will be weak. However, the properties of PCR allow the signal to be increased by increasing the number of cycles in the PCR programme.
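Why more cycles give more signal can be sketched with idealized arithmetic. The sketch assumes that the sequencing reaction uses a single primer, as described for the Sanger method above. Each cycle then copies only the original template strands, so labelled products grow linearly with cycle number. Ordinary two-primer PCR is shown for comparison: there, new strands also act as templates and growth is exponential. The `efficiency` parameter (the fraction of templates copied per cycle) and the function name are illustrative assumptions. In both cases the yield is proportional to the starting amount of template, which is why a small initial amount of DNA gives weak signals.

```python
def new_strands(cycles: int, templates: float, efficiency: float = 1.0,
                two_primers: bool = False) -> float:
    """Idealized count of newly synthesized strands after `cycles` cycles.

    efficiency: assumed fraction of available templates copied per cycle.
    two_primers=False: single-primer sequencing reaction; only the original
        templates are copied, so products accumulate linearly.
    two_primers=True: ordinary PCR for comparison; new strands are also
        copied, so the total grows exponentially.
    """
    if two_primers:
        return templates * ((1 + efficiency) ** cycles - 1)
    return templates * efficiency * cycles


if __name__ == "__main__":
    for cycles in (10, 20, 30):
        print(cycles, new_strands(cycles, templates=1e6, efficiency=0.9))
```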
At around the same time that the Sanger sequencing method was introduced, Maxam and Gilbert developed a method of DNA sequencing based on chemical modification of DNA followed by cleavage [2]. This method was initially popular because purified DNA could be used directly, whereas the initial Sanger method required each read start to be cloned to produce single-stranded DNA. As the chain termination method was developed and improved, Maxam-Gilbert sequencing fell out of favour because of its technical complexity, its use of hazardous chemicals, and difficulties with scale-up.
Other sequencing techniques under development, which may offer benefits over the conventional methods, include:
* Sequencing by hybridization
* Pyrosequencing
* Nanopore sequencing
Current methods can directly sequence only short lengths of DNA at a time. For example, modern sequencing machines using the Sanger method can achieve a maximum of around 1000 base pairs [3]. One cause of this limitation is that the probability of a chain terminating at a given position decreases geometrically with its distance from the primer, so long fragments become too rare to detect. Physical limits on gel size and resolution also contribute.
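The geometric decline can be made explicit. As a simplification, assume that every position has the same independent probability q of termination. (In a real single-base reaction only positions carrying that base can terminate.) The chance that a chain terminates exactly at position n is then q(1 - q)^(n - 1), and the fraction of chains still growing past position n is (1 - q)^n. The function names and the value of q below are hypothetical.

```python
def p_terminate_at(n: int, q: float) -> float:
    """Probability that a chain terminates exactly at position n (1-based),
    assuming an independent termination probability q at every position."""
    return q * (1 - q) ** (n - 1)


def fraction_longer_than(n: int, q: float) -> float:
    """Fraction of chains that extend past position n without terminating."""
    return (1 - q) ** n


if __name__ == "__main__":
    q = 0.005  # hypothetical per-position termination probability
    ratio = p_terminate_at(1000, q) / p_terminate_at(100, q)
    print(f"band at 1000 is {ratio:.3f} times as strong as band at 100")  # 0.011
```

Under these assumptions the band at position 1000 carries only about 1% of the signal of the band at position 100. Lowering q extends the ladder but weakens every band, which is the trade-off governed by the terminator-to-normal nucleotide ratio.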
It is often necessary to obtain the sequence of much larger regions. For example, even simple bacterial genomes contain millions of base pairs, and the human genome has more than 3 billion. Several strategies have been devised for large-scale DNA sequencing, including primer walking (see also chromosome walking) and shotgun sequencing. These involve taking many small reads of the DNA by the Sanger method and subsequently assembling them into a contiguous sequence. The different strategies have different trade-offs in speed and accuracy. For example, the shotgun method is the most practical for sequencing large genomes, but its assembly process is complex and potentially error-prone.
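The assembly step of shotgun sequencing can be illustrated with a greedy overlap-merge heuristic. It repeatedly joins the two reads with the longest suffix-prefix overlap until no overlaps remain. This is a swapped-in textbook technique chosen for brevity, not the algorithm of any particular assembler. It ignores sequencing errors and reverse-complement reads, and the `min_len` cutoff is an assumption. Repeated sequences longer than a read can make a greedy merge join the wrong pieces, which is one reason assembly is error-prone.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is also a prefix of b,
    or 0 if that overlap is shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge reads by their longest overlaps; return the contigs."""
    # Drop exact duplicates, then reads wholly contained in another read.
    contigs = list(dict.fromkeys(reads))
    contigs = [r for r in contigs if not any(r != s and r in s for s in contigs)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: several separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs


if __name__ == "__main__":
    reads = ["TGCAATC", "ATGGCGT", "GCGTGCA"]
    print(greedy_assemble(reads))  # ['ATGGCGTGCAATC']
```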
It is easier to obtain high-quality sequence data when the desired DNA is purified away from any contaminants in the original sample and amplified. This can be achieved through PCR if it is practical to design primers that cover the entire desired region. Alternatively, the sample can be cloned using a bacterial vector, harnessing bacteria to "grow" copies of the desired DNA a few thousand base pairs at a time. Most large-scale sequencing efforts involve the preparation of a large library of such clones. The advantage of sequencing clones rather than PCR products is that it virtually eliminates non-specific PCR products, which can cause signal noise.
See also
* Sequencing
* Human Genome Project