WO1995021269A1 - Method and apparatus for analyzing genetic material - Google Patents

Publication number
WO1995021269A1
Authority
WO
WIPO (PCT)
Prior art keywords
dna
genetic material
pcr
chamber
nucleotide
Application number
PCT/US1995/001395
Other languages
French (fr)
Inventor
Mark K. Perlin
Michael B. Gorin
Original Assignee
Perlin Mark K
Gorin Michael B
Application filed by Perlin Mark K, Gorin Michael B

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6827: Hybridisation assays for detection of mutation or polymorphism
    • C12Q1/6844: Nucleic acid amplification reactions
    • C12Q1/6858: Allele-specific amplification

Definitions

  • the present invention pertains to a process for determining inheritance patterns in eukaryotic DNA. More specifically, the present invention is related to densely sampling the genome with polymorphic genetic markers using a hybridization-based genotyping method, and then using this genetic information to assess the trait inheritance, including disease susceptibility, mendelian genetic disorders, and complex traits relevant for plant or animal husbandry.
  • One such hybridization-based genotyping method entails forming mismatched heteroduplexes and quantitating single-stranded loop sizes.
  • the specific objective of the system is genome-wide high-resolution genotyping for the purpose of health risk assessment, including genetic susceptibility for disease, and identification of disease-associated genes.
  • the means for achieving this is genotyping polymorphic genetic loci by hybridization assays.
  • Genomic mismatch scanning (Nelson, S.F., McCusker, J.H., Sander, M.A., Kee, Y., Modrich, P., and Brown, P.O. 1993. Genomic mismatch scanning: a new approach to genetic linkage mapping. Nature Genetics, 4 (May): 11-18), incorporated by reference, is one such approach, but has limited throughput since experiments are done on pairs (not sets) of individuals.
  • a sequence-tagged site is defined herein as a location on a genome characterized by at least one sequence. Much of this effort is done by Weissenbach's group at CEPH in France (Weissenbach, J., Gyapay, G., Dib, C., Vignal, A., Morissette, J., Millasseau, P., Vaysseix, G., and Lathrop, M. 1992. A second generation linkage map of the human genome. Nature, 359: 794-801), incorporated by reference, and by Lander's group at the Whitehead Institute in Cambridge, Massachusetts. STSs are readily amplified by means of the polymerase chain reaction.
  • VNTR variable nucleotide tandem repeat
  • the approach described herein centers on a detailed examination of such highly polymorphic intron genetic markers, rather than the highly conserved genes and their exon coding regions.
  • the method also applies to expanded repeats within genes, and specific nucleotide alterations of specific DNA sequences.
  • genotyping (1) an associated technology that will reduce the cost and error of the requisite genotyping, and thus enable widespread usage. Further, this technology must be coupled with (2) data acquisition and analysis methods that allow for fully automated error detection, risk analysis, and linkage analysis for both populations and families. Completion of this analysis generates a vast amount of data, hence the results must (3) be presented in a targeted fashion to disparate groups of end-users.
  • the novel parallel genotyping apparatus for polymorphic VNTRs. The approach is to spatially localize each genetic locus in a two-dimensional array, and then locally aggregate PCR-amplified DNA products to the proper array regions. Then, perform DNA hybridization studies by means of a detection mechanism to quantitate properties of the PCR products, and thereby determine the alleles (i.e., the genotype) for every genetic locus.
  • a VNTR is a linear sequence of (deoxy)nucleotides of the pattern LW^nR, where W is a short DNA sentence repeated n times, contained within two flanking regions of unique sequences: the left flanking region L, and the right flanking region R. These flanking sequences establish the singularity of a specific VNTR within a haploid genome. These unique sequences allow a VNTR to be associated with a specific location within the genome such that it can be physically or genetically mapped with respect to other DNA markers and/or genetic traits and disorders. Variations in the number of repetitive elements within the VNTR are common among individuals and allow specific alleles to be tracked as they are genetically transmitted from individuals to their offspring.
  • One class of VNTRs is the short tandem repeat (STR), where n tends to be small (e.g., under 100) and the repeating unit is short (e.g., between two and five nucleotides).
  • a CA-repeat is an STR where the dinucleotide CA is repeated n times, where n ranges in a human population from roughly ten to forty. There are an estimated 100,000 such CA-repeat loci in the human genome.
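  • The LW^nR pattern can be illustrated with a short Python sketch. This is not part of the described apparatus; the function name `count_repeats` and the flanking sequences below are hypothetical, chosen only to show how n is defined once L, W, and R are known.

```python
import re


def count_repeats(sequence, left_flank, repeat_unit, right_flank):
    """Return n if `sequence` has the form L + W*n + R, otherwise None."""
    if not repeat_unit:
        raise ValueError("repeat_unit must be a non-empty string")
    # Build the pattern L (W)* R from literal (escaped) pieces.
    pattern = (
        re.escape(left_flank)
        + "((?:" + re.escape(repeat_unit) + ")*)"
        + re.escape(right_flank)
    )
    match = re.fullmatch(pattern, sequence)
    if match is None:
        return None
    return len(match.group(1)) // len(repeat_unit)


# Hypothetical flanking sequences around a CA-repeat allele with n = 12.
allele = "GATTACAGG" + "CA" * 12 + "TTGCCATA"
print(count_repeats(allele, "GATTACAGG", "CA", "TTGCCATA"))  # prints 12
```

  • The sketch works on a known sequence string; the hybridization assays described in this document instead infer n from measured signals.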
  • Other VNTRs include trinucleotide and tetranucleotide repeats. Following PCR, the allelic variation in tandem repeat number can be determined by DNA size measurements using polyacrylamide gel electrophoresis.
  • VNTRs are important for several reasons.
  • Many VNTRs have been associated with specific diseases (e.g., Huntington's disease, fragile X syndrome) (Kremer, E., Pritchard, M., Lynch, M., Yu, S., Holman, K., Baker, E., Warren, S.T., Schlessinger, D., Sutherland, G.R., and Richards, R.I. 1991. Mapping of DNA instability at the Fragile X to a trinucleotide repeat sequence p(CCG)n. Science, 252: 1711-1714), incorporated by reference, where, in "anticipation", larger n often correlates with increased severity.
  • STRs serve as highly useful markers for specific diseases (Clemens, P., Fenwick, R., Chamberlain, J., Gibbs, R., de Andrade, M., Chakraborty, R., and Caskey, C. 1991. Linkage analysis for Duchenne and Becker muscular dystrophies using dinucleotide repeat polymorphisms. Am J Hum Genet, 49: 951-960), incorporated by reference.
  • (3) STRs are useful as sequence tagged sites (STSs) (Olson, M., Hood, L., Cantor, C., and Botstein, D. 1989. A common language for physical mapping of the human genome.
  • genotyping is easily effected by measuring the total length of the PCR product. This is commonly done by spatially (or temporally) separating DNA molecules of different sizes (or conformations) using, for example, gel electrophoresis.
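  • The length-based approach amounts to simple arithmetic: if the PCR product spans the flanking regions plus n copies of the repeat unit, then n follows from the measured length. A minimal sketch, assuming fixed and known flank lengths (the function name and the numbers are hypothetical):

```python
def repeats_from_length(product_length, left_len, right_len, unit_len):
    """Infer the repeat count n from a PCR product of length |L| + n*|W| + |R|."""
    repeat_region = product_length - left_len - right_len
    if repeat_region < 0 or repeat_region % unit_len != 0:
        raise ValueError("length is inconsistent with the assumed flanks and unit")
    return repeat_region // unit_len


# Hypothetical dinucleotide marker: 60 bp left flank, 55 bp right flank.
print(repeats_from_length(151, left_len=60, right_len=55, unit_len=2))  # prints 18
```

  • In practice the measured fragment length carries sizing error, so a real analysis would need a tolerance rather than the exact divisibility check used here.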
  • This invention therefore describes more cost effective approaches that enable higher throughput STR genotyping.
  • These methods employ nucleotide hybridization assays that directly measure the number of STR repeat units, rather than total fragment length.
  • Such detections by hybridization are miniaturizable, hence parallelizable (Monaco, A.P., Lam, V.M.S., Zehetner, G., Lennon, G.G., Douglas, C., Nizetic, D., Goodfellow, P.N., and Lehrach, H. 1991. Nucleic Acids Res, 19(12): 3315-3318), incorporated by reference, and, ultimately, highly manufacturable. Further, they can be adapted to work in chemical solutions, or on substrates with small surface area.
  • the first method entails creating and detecting loop mismatches in heteroduplexes formed from the alleles' PCR products.
  • the second method uses hybridization panels to determine the alleles.
  • the present invention pertains to an apparatus for analyzing the genetic material of an organism.
  • the apparatus comprises means for amplifying the genetic material of the organism.
  • the apparatus also comprises means for characterizing the amplified genetic material.
  • the characterizing means is in communication with the amplifying means.
  • the characterizing means contains all of the genetic material within a region having a radius of less than two feet.
  • the amplifying means and characterizing means characterize the genetic material at a rate exceeding 100 sequence-tagged sites per hour per organism. The sequence-tagged sites are inherent to the genetic material.
  • the genetic material includes nucleotide sequences.
  • the amplifying means preferably includes a reaction plate with which the genetic material is in contact.
  • the reaction plate has a plurality of chambers, each of which is disposed in a unique location of the plate corresponding to a location within a genome having at least one nucleotide sequence.
  • the characterizing means preferably includes means for detecting whether a chamber contains a nucleotide sequence of the genetic material corresponding to the chamber's unique location.
  • the apparatus preferably also includes a thermocycler in thermal communication with the plate to heat and cool the plate.
  • the detecting means preferably includes a detector connected to the chambers which produces a chamber signal for each chamber corresponding to genetic material in each chamber.
  • the detecting means preferably also includes a processor in communication with the detector which receives the signal and identifies unique properties of the nucleotides in each chamber. The unique properties of the nucleotide of the genetic material in each chamber pertain to a number of nucleotides in any of the nucleotide sequences of the genetic material.
  • the amplifying means preferably includes at least one nucleotide sequence that corresponds to each chamber and which is in contact with the chamber. Each nucleotide sequence interacts with the nucleotide sequence of the genetic material of the nucleotide sequence if it is present.
  • the present invention also pertains to a method for analyzing genetic material of an organism.
  • the method comprises the steps of amplifying the genetic material. Then there is the step of characterizing the amplified genetic material in a region having a radius of less than 20 feet at a rate exceeding 100 sequence-tagged sites per hour per organism.
  • the genetic material includes RNA or DNA.
  • After the characterizing step, there preferably is the step of assessing risk of illness for which there is a genetic susceptibility in the organism. Such illnesses can include cancer, heart disease, etc.
  • the present invention also pertains to a method for manufacturing an apparatus for analyzing genetic material of an organism.
  • the method comprises the steps of placing corresponding sequence-tagged sites in contact with corresponding chambers of a plate. Then, there is the step of connecting detectors to the chambers which can detect where the nucleotide sequences of the genetic material of the organism, when placed in contact with the chambers, have reacted with the corresponding sequence-tagged sites in the corresponding chamber. Then, there is the step of placing a thermocycling device in contact with the plate to cause the sequence-tagged sites in the chambers to react with genetic material of the organism that is placed in contact with the chambers. Next, there is the step of connecting a computer to the detectors and to the thermocycling device to control operation of the thermocycling device, and to receive signals which correspond to the genetic material of the organism and the sequence-tagged sites of each chamber from the detectors.
  • the present invention also pertains to a method for determining the size of nucleotide sequences of an STR marker contained on genetic material comprising the steps of: amplifying the nucleotide sequences of the genetic material in a region relating to the STR marker. Then there is the step of performing nucleic acid hybridizations on the amplified nucleotide sequences. Then there is the step of producing signals corresponding to the hybridizations of the amplified nucleotide sequences. Then there is the step of determining the sizes of the nucleotide sequences contained in the genetic material.
  • Figure 1 is a schematic representation of a preferred embodiment of the apparatus.
  • Figure 2 is a schematic representation of parts of
  • FIGS 3a-3d list the steps for parallel genotyping of the present invention.
  • Figures 4a and 4b are schematic representations of mismatched loops formed from allele DNA.
  • Figure 5 includes figures 5a-5c and is a schematic representation of loop mismatch for determining a sum of STR alleles.
  • Figure 6 includes figure 6a and is a block diagram showing loop mismatch for determining a difference of STR alleles.
  • Figure 7 is a flow chart for determining the STR alleles from the sum and difference.
  • Figure 8 is a flow chart of loop mismatch protocol for a single STR locus.
  • Figure 9 is a flow chart for reducing the number of PCR experiments.
  • Figures 10a-10c show representations for increasing measured signal from loops with respect to summation experiment.
  • Figures 11a and 11b are representations for increasing measured signal from loops with respect to difference experiments.
  • Figure 12 is a flow chart of concordance mapping for genetic patterns.
  • Figure 13 includes parts a-c and is a flow chart for determining an STR allele sum from a nucleic acid synthesis step.
  • Figure 14 includes parts a-c and is a flow chart for determining an STR allele difference from a nucleic acid synthesis step.
  • Figure 15 is a flow chart for determining STR alleles from a nucleic acid synthesis step.
  • Figure 16 is a schematic representation of an assay for determining STR alleles from a nucleic acid ligation step.
  • Figure 17 includes parts a-b and is a schematic representation of an assay for determining STR alleles from a nucleic acid loop ligation step.
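  • Figures 5 through 7 refer to determining the two STR alleles of a locus from a measured sum and a measured difference. The figures themselves are not reproduced here, so the following is only the algebra implied by that description: if the repeat counts are a and b with a >= b, then a = (sum + difference) / 2 and b = (sum - difference) / 2. The function name is hypothetical.

```python
def alleles_from_sum_and_difference(allele_sum, allele_diff):
    """Recover repeat counts (a, b), a >= b, from a + b and a - b."""
    if allele_diff < 0 or allele_diff > allele_sum:
        raise ValueError("difference must lie between 0 and the sum")
    if (allele_sum + allele_diff) % 2 != 0:
        raise ValueError("sum and difference must have the same parity")
    larger = (allele_sum + allele_diff) // 2
    smaller = (allele_sum - allele_diff) // 2
    return larger, smaller


print(alleles_from_sum_and_difference(38, 6))   # prints (22, 16)
print(alleles_from_sum_and_difference(30, 0))   # prints (15, 15), a homozygote
```

  • The parity check is a cheap consistency test: a sum and a difference of the same two integers are always both even or both odd, so a mismatch signals a measurement error.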
  • the apparatus comprises means for amplifying the genetic material of the organism.
  • the apparatus also comprises means for characterizing the amplified genetic material.
  • the characterizing means is in communication with the amplifying means.
  • the characterizing means contains all of the genetic material within a region having a radius of less than two feet. It should be noted that the region could have a radius of any reasonable size commensurate with the requirements of the task. For instance, the region could range from a volume of 1 cubic millimeter up to a radius of 10 feet, or anywhere in between.
  • the amplifying means and characterizing means characterize the genetic material at a rate preferably exceeding 100 sequence-tagged sites per hour per organism. It should be noted that the rate could be up to 100,000 sequence-tagged sites per hour per organism, or as slow as desired, or any rate in between. Also, per organism could also be defined to be the characterization of genetic material of multiple organisms. The sequence-tagged sites are inherent to the genetic material.
  • the genetic material includes nucleotide sequences.
  • the amplifying means preferably includes a reaction plate 102 with which the genetic material is in contact.
  • the reaction plate 102 has a plurality of chambers, each of which is disposed in a unique location of the plate 102 corresponding to a location within a genome having at least one nucleotide sequence.
  • the characterizing means preferably includes means for detecting whether a chamber contains a nucleotide sequence of the genetic material corresponding to the chamber's unique location.
  • the apparatus preferably also includes a thermocycler 104 in thermal communication with the plate 102 to heat and cool the plate 102.
  • the detecting means preferably includes a detector 108 connected to the chambers which produces a chamber signal for each chamber corresponding to genetic material in each chamber.
  • the detecting means preferably also includes a processor 110 in communication with the detector 108 which receives the signal and identifies unique properties of the nucleotides in each chamber.
  • the unique properties of the nucleotide of the genetic material in each chamber pertain to a number of nucleotides in any of the nucleotide sequences of the genetic material.
  • the amplifying means preferably includes at least one nucleotide sequence that corresponds to each chamber and which is in contact with the chamber. Each nucleotide sequence interacts with the nucleotide sequence of the genetic material of the nucleotide sequence if it is present.
  • the present invention also pertains to a method for analyzing genetic material of an organism.
  • the method comprises the steps of amplifying the genetic material. Then there is the step of characterizing the amplified genetic material in a region having a radius of less than 20 feet at a rate exceeding 100 sequence-tagged sites per hour per organism.
  • the genetic material includes RNA or DNA.
  • After the characterizing step, there preferably is the step of assessing risk of illness for which there is a genetic susceptibility in the organism. Such illnesses can include cancer, heart disease, etc.
  • the present invention also pertains to a method for manufacturing an apparatus for analyzing genetic material of an organism.
  • the method comprises the steps of placing corresponding sequence-tagged sites in contact with corresponding chambers of a plate 102. Then, there is the step of connecting detectors 108 to the chambers which can detect where the nucleotide sequences of the genetic material of the organism, when placed in contact with the chambers, have reacted with the corresponding sequence-tagged sites in the corresponding chamber. Then, there is the step of placing a thermocycling device 104 in contact with the plate 102 to cause the sequence-tagged sites in the chambers to react with genetic material of the organism that is placed in contact with the chambers.
  • Next, there is the step of connecting a computer 110 to the detectors 108 and to the thermocycling device 104 to control operation of the thermocycling device 104, and to receive signals which correspond to the genetic material of the organism and the sequence-tagged sites of each chamber from the detectors 108.
  • the present invention also pertains to a method for determining the size of nucleotide sequences of an STR marker contained on genetic material comprising the steps of: amplifying the nucleotide sequences of the genetic material in a region relating to the STR marker. Then there is the step of performing nucleic acid hybridizations on the amplified nucleotide sequences. Then there is the step of producing signals corresponding to the hybridizations of the amplified nucleotide sequences. Then there is the step of determining the sizes of the nucleotide sequences contained in the genetic material.
  • a parallel genotyping apparatus is described.
  • the purpose of said apparatus is to provide a physical, chemical, mechanical, and computational embodiment for performing simultaneous experiments on multiple genetic markers used for genetic characterization.
  • the apparatus is comprised of the following components:
  • (1) A multi-chambered reaction plate 102.
  • (2) A thermocycling device 104.
  • (3) A robotic device 106.
  • (4) A detection device 108.
  • (5) A computer device 110 with a memory.
  • the biochemical reactions occur in the chambers of the reaction plate 102, wherein a "chamber" denotes any localized region suitable for performing said reactions.
  • the thermocycling device 104 provides a means for PCR and hybridization experiments.
  • the robotic device 106 provides a means for transferring chemicals and performing other physical/chemical operations.
  • the detection device 108 is used to quantitatively measure the signals from DNA hybridization experiments.
  • the computer device 110 coordinates the activity of the other components, and performs any needed computations.
  • the primary requirement of the multi-chambered reaction plate 102 is a set of spatially arrayed chambers, each containing its own PCR primers for genome characterization, and providing operations for PCR amplification, DNA hybridization, and signal detection.
  • Any physical device, of any number of dimensions, in whole or in part, that provides this functionality can serve as a physical embodiment for the apparatus.
  • parallel synthesis methods for producing the oligonucleotides by spatially addressable masking techniques on a surface have been described (Fodor, S.P.A., Read, J.L., Pirrung, M.C., Stryer, L., Lu, A.T., and Solas, D. 1991. Light-directed spatially addressable parallel chemical synthesis.
  • the basic container for the parallel genotyping reactions is a commercially available polystyrene or polycarbonate 384-chamber microtiter plate (USA Scientific Products, Ocala, FL).
  • Alternative embodiments include 96-chamber and 864-chamber plates. Each well of a plate corresponds to one chamber. These plates occupy the space of standard 96-chamber microtiter plates and are compatible with current robotic systems such as the Beckman Biomek system.
  • the apparatus has one or more two-dimensional surfaces 102 comprised of reaction chambers.
  • Each STS genetic marker used from a genome corresponds to some reaction chamber.
  • This experimentation surface provides a means for performing parallel laboratory operations on all the chambers simultaneously.
  • five steps are performed: (1) A deposition of at least two oligonucleotides into the chamber. These oligonucleotides serve as PCR primers for the STS marker specific to the chamber.
  • Means are provided by the apparatus for PCR amplification, DNA hybridization, and signal detection. The following description relates these functions to the parts of the apparatus.
  • PCR Amplification The apparatus provides the means for amplifying the STS DNA region subsequent to presentation with genomic DNA.
  • PCR (Innis, M.A., Gelfand, D.H., Sninsky, J.J., and White, T.J. 1990. PCR Protocols: A Guide to Methods and Applications. San Diego, CA: Academic Press. Mullis, K.B., Faloona, F.A., Scharf, S.J., Saiki, R.K., Horn, G.T., and Erlich, H.A. 1986. Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction. Cold Spring Harbor Symp. Quant. Biol.)
  • thermocycling components for heating and cooling the reaction mixture.
  • the genomic DNA and PCR reagents are simultaneously transferred to the chambers by means of the robotic device.
  • thermostable polymerases can be used (Garrity, P.A., and Wold, B.J. (1992). Effects of different DNA polymerases in ligation-mediated PCR: enhanced genomic sequencing and in vivo footprinting. Proceedings of the National Academy of Sciences of the United States of America, 89(3): 1021-5. Ling, L.L., Keohavong, P., Dias, C., and Thilly, W.G. (1991). Optimization of the polymerase chain reaction with regard to fidelity: modified T7, Taq, and vent DNA polymerases. PCR Methods & Applications, 1(1): 63-9.), incorporated by reference.
  • thermocycling is done using a conventional programmable block thermal cycler 104 based on the heating and cooling of a metal block (using Peltier or fluid refrigerants for cooling) (R. Hoelzel, Trends in Genetics, August 1990, volume 6 #8; p 237-8), incorporated by reference.
  • the reaction plate is transferred to and from this computer-controlled thermal cycler by means of the robot 106.
  • a device 104 is used that heats and cools a rapidly circulating air mass around the plate (e.g., Biotherm PCR oven) (Garner, H.R., Armstrong, B., and Lininger, D.M. (1993). High-throughput PCR.
  • a robotic attachment (Beckman Biomek), incorporated by reference, comprised of a thermocycling surface which has the same 384-chamber shape as the reaction plate is used to physically mate with the 384-chamber reaction plate, and provide the necessary heating and cooling operations under computer control.
  • heating and cooling elements such as Peltier junctions can be physically incorporated into the apparatus. This surface is suitable for transferring sample genomic DNA to many chambers simultaneously. Miniaturization enables shorter cycle times and greater homogeneity because of the rapid temperature equilibration of the thin films and small volumes.
  • DNA hybridization. Sufficient volume and chemical composition is provided within each reaction chamber so that the requisite DNA hybridization (Ausubel, F.M., Brent, R., Kingston, R.E., Moore, D.D., Seidman, J.G., Smith, J.A., and Struhl, K., ed. 1993. Current Protocols in Molecular Biology. New York, NY: John Wiley and Sons. Sambrook, J., Fritsch, E.F., and Maniatis, T. 1989. Molecular Cloning, second edition. Plainview, NY: Cold Spring Harbor Press.), incorporated by reference, can occur.
  • the robotic component of the apparatus transfers the hybridization reaction mixture to the chambers, and provides means for heating and cooling the reaction chamber, as described above.
  • heteroduplexes include chemical derivatization and endonuclease digestion of single-stranded components.
  • the detection of the heteroduplexes and nucleotides within the loops is done with a commercially available spectrophotometric/fluorometric instrument 108 similar to that used for ELISAs (Dynatech Laboratories, Chantilly, VA), incorporated by reference, modified to accommodate the larger number and smaller size of the chambers.
  • a scanning laser fluorimeter can also be employed over the plate surface. Because the plate is flat and comprised of an optical grade surface, fluorescent detection is straightforward. The robot transfers the reaction plate to this optical detection device prior to the detection operation.
  • computerized fluorescent scanning microscopes are used that are capable of detecting and quantitating fluorescent signals and are suitable for the miniaturized system. These have been developed for immunological and genetic cytochemistry (Biological Detection Systems), incorporated by reference.
  • a physical signal is measured from the reagent attached to a PCR primer.
  • detection reagents include (but are not limited to) radioactivity, fluorescence, phosphorescence, chemiluminescence, electrical resistivity, pH, and ionic concentration.
  • the direct electrical detection mechanisms are particularly attractive for direct coupling of the experiment onto a miniaturized solid state detection device (Briggs, J., Kung, V.T., Gomez, B., Kasper, K.C., Nagainis, P.A., Masino, R.S., Rice, L.S., Zuk, R.F., and Ghazarossian, V.E. (1990).
  • the analysis of the signals is done by a computer device 110. Means are provided for transferring the signals from the detector into the memory of the computer. A computer program for determining genotypes from the quantitative signals and calibration curves resides in the memory of said computer.
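  • The text does not specify how the program maps a quantitative signal to a genotype. As one possible illustration only, the sketch below assumes a calibration curve of (repeat count, signal) points in which the signal rises strictly with repeat count, and inverts it by linear interpolation. The function name, the calibration values, and the choice of linear interpolation are all assumptions, not the method of this document.

```python
def estimate_repeats(signal, calibration):
    """Invert a monotone calibration curve by linear interpolation.

    `calibration` is a list of (repeat_count, signal) pairs; the signal is
    assumed to increase strictly with repeat count (an assumption).
    """
    points = sorted(calibration)
    pairs = list(zip(points, points[1:]))
    if any(s1 <= s0 for (_, s0), (_, s1) in pairs):
        raise ValueError("calibration signals must be strictly increasing")
    for (n0, s0), (n1, s1) in pairs:
        if s0 <= signal <= s1:
            fraction = (signal - s0) / (s1 - s0)
            return round(n0 + fraction * (n1 - n0))
    raise ValueError("signal lies outside the calibrated range")


# Hypothetical calibration points measured on alleles of known size.
curve = [(10, 1.0), (20, 2.0), (30, 3.0), (40, 4.0)]
print(estimate_repeats(2.52, curve))  # prints 25
```

  • Rejecting signals outside the calibrated range, rather than extrapolating, is a deliberately conservative choice in this sketch.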
  • the apparatus is manufactured by selecting a set of genetic markers, synthesizing both standard and derivatized oligonucleotide primers, and then depositing said oligonucleotide primers into the reaction chambers of a 384-chamber plate. This plate is then positioned with the other components of the apparatus, including the thermocycling device, the robotic device, the detection device, and the computer device.
  • a sufficient number of polymorphic genetic markers are chosen for unambiguously characterizing or tracing chromosomes in an organism containing DNA or RNA. Depending on the application, the marker spacing can range from 10 centiMorgans (cM) to 0.001 cM. One cM corresponds to approximately one million base pairs, i.e., one megabase (Mb). In a preferred embodiment, a resolution of 0.1 cM, or 100,000 base pairs (bp), is used. In the human species, for example, which contains about 3 billion bp, this works out to 30,000 markers.
  • the genetic markers to be used for each STS are obtained as PCR primer sequence pairs from available databases (Genbank, GDB, EMBL; Hilliard, Davison, Doolittle, and Roderick, Jackson Laboratory mouse genome database.
  • oligonucleotide primers for each STS are synthesized (Haralambidis, J., Duncan, L., Angus, K., and Tregear, G.W. 1990. The synthesis of polyamide-oligonucleotide conjugate molecules. Nucleic Acids Research, 18(3): 493-9. Nelson, P.S., Kent, M., and Muthini, S. 1992. Oligonucleotide labeling methods. 3. Direct labeling of oligonucleotides employing a novel, non-nucleosidic, 2-aminobutyl-1,3-propanediol backbone. Nucleic Acids Research, 20(23): 6253-9.
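  • The marker count follows from dividing the genome size by the marker spacing. The sketch below reproduces the 30,000-marker figure under the stated approximations (about 3 billion bp in the human genome, about one megabase per centiMorgan) and, using the minimum rate of 100 sequence-tagged sites per hour mentioned elsewhere in this document, estimates the corresponding run time. Function and constant names are hypothetical.

```python
GENOME_SIZE_BP = 3_000_000_000   # approximate human genome size from the text
BP_PER_CENTIMORGAN = 1_000_000   # rough conversion: about one megabase per cM


def markers_needed(resolution_cm,
                   genome_size_bp=GENOME_SIZE_BP,
                   bp_per_cm=BP_PER_CENTIMORGAN):
    """Number of evenly spaced markers for a given resolution in centiMorgans."""
    spacing_bp = resolution_cm * bp_per_cm
    return round(genome_size_bp / spacing_bp)


def hours_to_genotype(n_markers, sites_per_hour=100):
    """Run time at a given characterization rate (sites per hour per organism)."""
    return n_markers / sites_per_hour


n = markers_needed(0.1)
print(n)                     # prints 30000
print(hours_to_genotype(n))  # prints 300.0
```

  • Because 100 sites per hour is described as a rate to be exceeded, the 300-hour figure is an upper bound for this resolution; at the 100,000 sites per hour also mentioned, the same panel would take well under an hour.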
  • primers may be derivatized with a fluorescent detection molecule or a ligand for immunochemical detection such as digoxigenin. Derivatization of the primer for binding to the surface entails the incorporation of a biotinylated nucleotide at the 5' end of the synthetically made oligonucleotide. Additional biotinylated residues can also be incorporated (depending on the protocol) into this primer either at the time of biosynthesis or by secondary photo or chemical biotinylation.
  • oligonucleotides and their derivatives can be ordered from a commercial vendor (Research Genetics, Huntsville, AL).
  • the oligonucleotide primer sets are deposited into each reaction chamber by means of a robotic system from source chambers containing a large store of presynthesized oligonucleotides. Said transferring can be effected in one or more operations, wherein oligonucleotide primers are deposited into multiple chambers in each transferring step, thereby creating a two-dimensional spatial array.
  • this deposition is effected by means of a parallel deposition device to which the 384-chamber plate is presented by means of a conveyor belt.
  • the deposition device has source chambers, each containing a large store of a unique oligonucleotide specific to a reaction chamber. Said source chambers are spatially arrayed to conform to the reaction chambers of the plate. Both the device and plate are properly positioned and made stationary, and then the chambers are filled in one or more steps with the oligonucleotide.
  • the plates are dried and each chamber is then coated with a wax material, such as Ampliwax (Perkin-Elmer, Norwalk, CT), incorporated by reference.
  • This material hardens at 4°C, is liquid throughout the temperature range of the PCR, and serves as a vapor barrier to prevent evaporation of the PCR reactions during the denaturation steps at 95°C.
  • the oligonucleotides are covalently attached to a substrate such as glass by spatially addressable light-directed parallel DNA synthesis (Drmanac, R., Drmanac, S., Strezoska, Z., Paunesku, T., Labat, I., Zeremski, M., Snoddy, J., Funkhouser, W.K., Koop, B., and Hood, L. 1993. DNA Sequence Determination by Hybridization: a Strategy for Efficient Large-scale Sequencing. Science, 260: 1649-1652. Fodor, S.P.A., Read, J.L., Pirrung, M.C., Stryer, L.
  • Other components of the apparatus include resins and filters that will nonspecifically and reversibly bind double-stranded DNA, but not free nucleotides or short oligonucleotides (Molecular Biology LabFax, T.A. Brown, ed. Academic Press p281-4), incorporated by reference. These are commercially available and can be readily modified to fit within a manifold that will ensure leak-proof contact with the reaction chambers or plates. Uncharged nylon, charged nylon, and nitrocellulose are some of the filter materials in current use (Harley, C.B., and Vaziri, H. (1991). Deproteination of nucleic acids by filtration through a hydrophobic membrane. Genetic Analysis, Techniques &
  • the commercially available polystyrene or polycarbonate 384-chamber microtiter plate 102 is arranged in a 24 by 16 array.
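Because every component (plate, thermal cycler, robot, detector) shares the same 384-chamber layout, a single index-to-position mapping can address a marker throughout the workflow. The sketch below assumes 16 rows of 24 columns and row-major numbering from zero; the orientation, the numbering scheme, and the function names are illustrative assumptions, not details given in the text.

```python
ROWS, COLS = 16, 24  # assumed orientation of the 24 by 16 array


def chamber_position(index: int) -> tuple[int, int]:
    """Map a chamber index (0..383) to a (row, column) position."""
    if not 0 <= index < ROWS * COLS:
        raise ValueError(f"chamber index out of range: {index}")
    return divmod(index, COLS)


def chamber_index(row: int, col: int) -> int:
    """Inverse mapping: (row, column) back to the chamber index."""
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise ValueError(f"position out of range: ({row}, {col})")
    return row * COLS + col


print(chamber_position(383))  # (15, 23)
```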
  • the commercially available robotic device 106 has a surface with 384 chambers arranged in a spatial configuration identical to that of the reaction plate 102.
  • all robotic actions e.g., for the steps of amplification, hybridization, and detection
  • robotic device 106 in mechanical juxtaposition with plate 102.
  • the commercially available programmable block thermal cycler 104 has a surface with 384 chambers arranged in a spatial configuration identical to that of the reaction plate 102. During thermocycling, every chamber of the plate 102 is in direct contact with its corresponding chamber in the thermocycler 104.
  • the commercially available programmable oven thermocycler 104 is sufficiently large to accommodate the dimensions of 384-chamber reaction plate 102, and has sufficient uniformity to perform the necessary amplification reactions within each chamber.
  • a robotic device is used to transfer the reaction plate 102 to and from the oven thermocycler 104.
  • the commercially available ELISA-like spectrophotometric/fluorometric detection device 108 contains 384 chambers arranged in a spatial configuration identical to that of reaction plate 102. During the detection phase, the plate 102 is placed into the detector, with each chamber of plate 102 residing within its corresponding detection chamber of detector 108. This enables detections to be conducted simultaneously and independently for each chamber.
  • the computer device 110 coordinates the activities of the other components: plate 102, thermocycler 104, robotic device 106, and detector 108. Note that most commercial thermocyclers, robotic devices, and detectors include computational facilities for independently performing control, detection, and processing tasks, thus freeing the computer device 110 from such low-level processes.
  • the computer device 110 is connected to the detector 108. Signals obtained from the detector 108 are transferred to the memory of computer 110.
  • the computer 110 employs processing means for interpreting the signals in its memory, and determines and outputs the characteristics of nucleotide sequences in each chamber of the reaction plate 102.
  • genomic DNA is first extracted from an individual (say, by processing a blood sample).
  • PCR reagents are then mixed with the genomic DNA, and a robotic device applies this PCR/DNA mixture to the chambers of the reaction plate of the apparatus. Every chamber has its own predeposited PCR primers that define a unique genetic marker.
  • PCR amplification of the genomic DNA marker region is then performed on every chamber using the thermocycling component of the apparatus.
  • a quantitative hybridization experiment is then conducted in every chamber, possibly modifying the DNA.
  • the signals from these hybridization experiments are then measured from every chamber using the detection component, such as fluorescence measurements with a scanning light microscope. More than one (e.g., two or three) such parallel experiments may be needed to acquire all necessary genotyping data for one STR.
  • the measurements are then collected and analyzed by the component computer device to characterize the alleles at every marker.
  • the resulting genotyping information from the multiple alleles can be used for a number of applications, as described below.
  • One important use is the determination of genetic risk for phenotypic traits, including diseases.
  • haplotypes can be compared, and the shared genomic regions determined.
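One simple way to picture the haplotype comparison is to scan two haplotypes, marker by marker, for runs of identical alleles. The sketch below is only an illustration under stated assumptions: each haplotype is a list of allele repeat counts ordered along the chromosome, and the function name and the `min_markers` cutoff are hypothetical.

```python
def shared_regions(hap_a, hap_b, min_markers=2):
    """Return inclusive (start, end) marker-index ranges where two
    haplotypes carry identical alleles at consecutive markers."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same markers")
    regions = []
    start = None  # index where the current run of matches began
    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if a == b:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_markers:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and len(hap_a) - start >= min_markers:
        regions.append((start, len(hap_a) - 1))
    return regions


# Hypothetical allele repeat counts at six consecutive markers.
print(shared_regions([12, 15, 9, 20, 18, 11], [12, 15, 9, 17, 18, 11]))
# [(0, 2), (4, 5)]
```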
  • Correlating a shared trait and genotype commonalities enables a determination of genomic patterns that imply a quantitative risk for said trait.
  • These patterns can be applied to the genotypes of an individual and their relatives to compute a probability of expressing the trait.
  • the traits correspond to common multigenic multifactorial diseases, the highest risk entities are determined, and preventative measures undertaken, thereby improving the health of said individual.
  • Software systems are built to tailor the genotyping information for this advising task.
  • the quantitative hybridization experiment that is used in the preferred embodiment is a pair of loop mismatch assays.
  • the first assay measures the sum of the two STR allele loops, relative to a third (and smaller) STR.
  • the second assay measures the difference of the two STR allele loops relative to each other. By combining the sum and difference values, the two alleles can be determined.
  • the quantitative loop detection is effected by directly measuring the signals derived from the loops relative to the number of strands with loops (this is described in detail later on).
  • the loops are quantitated either by a chemical modification of the single-stranded loop DNA into a detectable state, or by incorporation of labeled DNA and subsequent digestion and detection of the single-stranded loop.
  • the number of strands is measured by using an end-labeled PCR primer.
  • the ratio of the (calibrated) loop measurements to the number of strands determines the loop size.
  • multiple hybridizations are performed for every STR, producing a pattern that determines the genotype.
  • This system for performing multiple genotypings in parallel, with each STR in its separate cell, has many useful advantages over current genotyping methods, including the best gel-based multiplex methods. Specifically,
  • the experiment's architecture allows independent interchangeability of STR loci. Any STR(s) of the same class can be placed at any cell of the device.
  • the synthesis of oligonucleotides can be spatially or temporally separated from the execution of the PCR amplification and the detection.
  • Manufacturing enables miniaturization of the device, and the incorporation of detection machinery into the device.
  • the manual labor required for genotyping is greatly reduced, because the manufactured device eliminates the separate steps of handling multiple (e.g., thousands) specific STR primers. This includes synthesizing the oligonucleotides, performing the PCR, loading gels or other detection devices, and checking the genotyping results.
  • the CA (or GT) repeat region is of varying length.
  • Complementary strand: a second strand having a Watson-Crick complementary DNA sequence to a first strand. However, the number of CA or GT repeats need not equal that of the first strand.
  • Upper strand: the DNA strand 202 of the STR locus that contains the CA-repeat units.
  • Lower strand: the DNA strand 204, complementary to the upper strand, that contains the GT-repeat units.
  • the PCR oligonucleotide primer 206 that initiates the upper strand of the STR locus.
  • the PCR oligonucleotide primer 208 that initiates the lower strand of the STR locus.
  • the system is comprised of the following steps:
  • Step 1 entails the manufacture of an apparatus in which STR loci have been selected, and appropriate oligonucleotides (with modifications) synthesized and deposited within each chamber.
  • Step 2a the process begins by extracting DNA from blood or tissue.
  • There are numerous standard methods to isolate DNA from sources including whole blood, isolated lymphocytes, tissue, and tissue culture (Ausubel, F.M., Brent, R., Kingston, R.E., Moore, D.D., Seidman, J.G., Smith, J.A., and Struhl, K., ed. 1993. Current Protocols in Molecular Biology. New York, NY: John Wiley and Sons. Sambrook, J., Fritsch, E.F., and Maniatis, T. 1989. Molecular Cloning, second edition. Plainview, NY: Cold Spring Harbor Press. Nordvag 1992. Direct PCR of Washed Blood Cells.
  • DNA is extracted from anticoagulated human blood removed by standard venipuncture and collected in tubes containing either EDTA or sodium citrate.
  • the red cells are lysed by a gentle detergent and the leukocyte nuclei are pelleted and washed with the lysis buffer.
  • the proteinase K digestion is performed for 2 hours to overnight at 50°C.
  • the solution is then extracted with an equal volume of buffered phenol-chloroform.
  • the upper phase is reextracted with chloroform and the DNA is precipitated by the addition of NaAcetate pH 6.5 to a final concentration of 0.3M and one volume of isopropanol.
  • the precipitated DNA is spun in a desktop centrifuge at approximately 15,000 g, washed with 70% ethanol, partially dried and resuspended in TE (10 mM Tris pH 7.5, 1 mM EDTA) buffer.
  • the reaction plates of the apparatus are maintained at 4°C at the time the genomic DNA has been mixed with the other components of the PCR reaction.
  • these other components include, but are not limited to, the standard PCR buffer (containing Tris pH 8.0, 50 mM KCl, 2.5 mM magnesium chloride, albumin), triphosphate deoxynucleotides (dTTP, dCTP, dATP, dGTP), the thermostable polymerase (Taq polymerase in this preferred embodiment, but others are available though buffer conditions are somewhat different) (Garrity, P.A., and Wold, B.J. 1992.
  • the PCR primers for each locus are chosen for consistency with these uniform reaction conditions.
  • the total amount of this mixture is determined by the final volume of each PCR reaction (say, 10 µl) and the number of reactions (say, 384).
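The total mixture volume follows directly from the per-reaction volume and the number of reactions. The sketch below uses the example figures from the text (10 microliters, 384 reactions); the optional overage for pipetting losses is an added assumption and is not mentioned in the source.

```python
def master_mix_volume_ul(reaction_volume_ul: float = 10.0,
                         n_reactions: int = 384,
                         overage_fraction: float = 0.0) -> float:
    """Total DNA/PCR mixture volume in microliters.

    overage_fraction is a hypothetical allowance for dead volume,
    e.g. 0.1 adds 10 percent; the default adds nothing.
    """
    if reaction_volume_ul <= 0 or n_reactions <= 0 or overage_fraction < 0:
        raise ValueError("volumes and counts must be positive")
    return reaction_volume_ul * n_reactions * (1.0 + overage_fraction)


print(master_mix_volume_ul())  # 3840.0
```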
  • This mixture can also be varied by including some of the constituents with the primers that are previously deposited in the microchambers, provided that all of the necessary components for the PCR reactions are kept separate until the Ampliwax is melted and the aqueous phases reconstitute, that each reaction cell receives a consistent and reproducible amount of the necessary components, and that the combination of constituents does not compromise stability and biological activity (e.g., the Taq polymerase may be unstable if stored in a lyophilized state on the reaction plates).
  • Step 2b the DNA/PCR mixture is applied to the reaction chambers with the Biomek robotics unit and the PCR is initiated by heating the plate rapidly to 95°C in order to melt the Ampliwax, allow the DNA/PCR mixture to mix with the oligonucleotide primers (convection mixing is sufficient), and denature the genomic DNA.
  • the Ampliwax forms a stable vapor barrier over the chambers during the PCR reactions.
  • This method of initiating the PCR reactions is referred to as a "hot start" (D'Aquila et al., Nuc. Acid Res. 19 (13) 3749 (1991)), incorporated by reference, and has the additional benefit of reducing the amount of nonspecific PCR products that are produced, thus improving the purity and amount of the final desired PCR signal that will be detected.
  • Step 2c the PCR reactions are performed on all of the reactions simultaneously by appropriately heating and cooling the plate to specific temperatures.
  • the plates are cooled to the annealing temperature (50°-65°C, typically 55°C) for a set time (0-100 seconds, typically 15 seconds), warmed to the extension temperature which is optimal for the thermostable polymerase (e.g., 73°C for Taq polymerase) and maintained for a set period of time (0-100 seconds, typically 30 seconds).
  • the cycle is completed by elevating the temperature of the reaction to denature the DNA products (93-95°C for 0-60 seconds, typically 15 seconds).
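The cycle just described can be written down as a small data structure, which makes the typical hold times easy to total. This is a sketch only: the class and function names are illustrative, the denaturation temperature is taken from the top of the stated range, the cycle count is a hypothetical value (the text does not give one), and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


# Typical values quoted in the text for one cycle.
TYPICAL_CYCLE = (
    Step("anneal", 55.0, 15),
    Step("extend", 73.0, 30),    # optimum quoted for Taq polymerase
    Step("denature", 95.0, 15),  # top of the stated range
)


def hold_time_seconds(cycle, n_cycles: int) -> int:
    """Total time spent at the programmed temperatures, excluding ramps."""
    return n_cycles * sum(step.seconds for step in cycle)


# 30 cycles is an assumed, hypothetical cycle count.
print(hold_time_seconds(TYPICAL_CYCLE, 30))  # 1800
```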
  • Step 2c the PCR cycles are completed, with each chamber containing the amplified DNA from a specific location of the genome.
  • Each mixture includes the DNA that was synthesized from the two alleles of the diploid genome (a single allele from haploid chromosomes as is the case with the sex chromosomes in males or in instances of cells in which a portion of the chromosome has been lost such as occurs in tumors, or no alleles when both are lost) .
  • the free triphosphate deoxynucleotides and the unused oligonucleotide primers are also in this mixture.
  • Step 2d is the last PCR step, which inactivates the thermostable polymerase, say, by the addition of EDTA. Ampliwax protects the integrity of the chambers and the mixing occurs at 37°C for several minutes.
  • Step 3a for quantitative loop mismatch genotyping, the DNA strands are allowed to reanneal at a temperature above the annealing temperature of the oligonucleotide primers, but below the melting temperature of perfectly matched complementary strands. In most instances, this will be between 65 and 75°C, depending on the salt conditions of the buffer.
  • the annealing time can vary from 1 hour to 24 hours, with 2 hours selected in the preferred embodiment.
  • Step 3a the heteroduplex annealing is done with the original contents of the chamber for the “subtraction” assay of the loop detection method.
  • the “addition” assay that is required for the measurements of loop mismatches entails combining of the contents of a chamber with its counterpart from a control plate in which the PCR reaction has been carried out with a corresponding set of primers (same oligonucleotides, but with different primer modifications) on a target DNA that has the smallest possible number of repeated elements for the given DNA marker. These two assays are done in different chambers of the reaction plate, or on separate plates entirely.
  • the Left primer is linked to a detection molecule
  • the Right primer is covalently linked to a molecule necessary for binding (i.e., biotin in the preferred embodiment)
  • the unknown genomic DNA (Source DNA) is amplified using a Left primer that is labeled with the detection molecule and the Right primer is unmodified.
  • the control DNA (or Target DNA) is amplified with an unmodified Left primer and the Right primer contains the binding molecule (such as biotin).
  • when the amplified DNA from the unknown source and the Target DNA are combined to form heteroduplexes, one will only detect the binding of the upper strand of the Source DNA to the immobilized lower strand of the Target DNA; homoduplexes of the Target DNA strands will be undetected and, being perfectly matched, create no exposed loops for detection.
  • the corresponding Source and Target DNAs are appropriately combined using the Biomek robot through direct physical transfer methods (i.e., aligning the Source DNA plate on top of the Target DNA plate directly and mixing by melting the Ampliwax).
  • Steps 3b, 3c, 3d, 3e, and 3f the unwanted single strands, primers and free nucleotides are removed by using a 3'-to-5'-specific exonuclease that will not cleave or disrupt internal single-stranded loop structures, in both the subtraction and addition assays.
  • Exonuclease VII from E. coli is capable of 3'-5' exonuclease activity limited to single-stranded DNA (Ausubel, F.M., Brent, R., Kingston, R.E., Moore, D.D., Seidman, J.G., Smith, J.A., and Struhl, K., ed. 1993.
  • the enzyme is added to the chambers over the Ampliwax surface, allowed to mix at 37°C and incubated for a brief period (1-60 minutes, 10 minutes in the preferred embodiment) and terminated by the addition of EDTA. At the same time, the buffer is adjusted to promote non-specific binding of the DNA to a resin or filter.
  • Step 3f free deoxynucleotides and primers that interfere with binding of the PCR products and the detection system are removed.
  • the purification of unincorporated DNA materials is combined with the elimination of single-stranded DNA species that remain after heteroduplex formation. In the preferred embodiment, this purification step is done after the heteroduplex formation, thereby also eliminating single-stranded DNA's.
  • although heteroduplex formation may be somewhat inhibited by residual primers, combining the steps greatly simplifies the method and aids in increasing the signal-to-noise ratio.
  • the separation of free deoxynucleotides and primers from the PCR products is achieved by filtration (the unwanted materials are significantly smaller than the final PCR products) using commercially available filters (Centricon 30 filters, Amicon), incorporated by reference, or by adsorption (Molecular Biology LabFax, T.A. Brown, ed. Academic Press p281-4), incorporated by reference, which entails the nonspecific binding of the PCR products, double-stranded and single-stranded DNA to a matrix followed by removal of the supernatants containing the primers and nucleotides.
  • Step 3g the filter is set upon a plastic manifold that fits over the chambers of the 384 chamber plate, the apparatus is inverted so that the Ampliwax rises to the bottom surface of the chambers, and the DNA solution comes into contact with the filter.
  • Step 3h the filter is separated from the chamber and washed with a high salt buffer to remove the free nucleotides.
  • Step 3i this filter is then placed against a polystyrene surface (optical grade)
  • the heteroduplexes are bound to the polystyrene surface in an exact replica of their initial spatial orientation.
  • the heteroduplexes containing biotinylated primer will bind to the streptavidin surface under a wide range of buffers that are pH neutral.
  • the DNA is bound in the TE buffer and in Step 3j the plate is washed twice with 0.15 M phosphate buffer.
  • Step 3k the chemical derivatization of the C and A residues within the heteroduplex loops employs a modification of the method originally described by (Kimura, K., Nakanishi, M., Yamamoto, T., and Tsuboi, M. (1977).
  • the pH can be varied from 4.5 to 6.5 and alternative buffers can be used
  • the plate is then covered with the buffer containing CAA at a final concentration of 2.0% and incubated at 37°C for 4 hours (longer or shorter times may be used).
  • the reaction is terminated in Step 3l by washing with 0.01 M Tris-HCl pH 7.0 and 1.0 M NaCl.
  • the NaCl prevents dissociation of the heteroduplexes during the etheno-dehydration step.
  • the plate is heated in the final wash volume at 85-90°C for 1 hour, which dehydrates the ethenoderivative.
  • loop-specific derivatization of the nucleotides with chloroacetaldehyde or other chemical modification reagents provides an alternative means for eliminating background reagents prior to detecting nuclease-liberated free derivatized nucleotides.
  • the fluorescence of the primer detector molecule that is bound to the hybridized strand is measured at this time, or measured at a later stage in conjunction with the fluorescent adducts created within the loop structures.
  • the detection of the hybridized strands and the derivatized nucleotides within the loops are performed at the same time.
  • the method of detection is preferably by fluorescence (Kimura, K., Nakanishi, M., Yamamoto, T., and Tsuboi, M. (1977). A correlation between the secondary structure of DNA and the reactivity of adenine residues with chloroacetaldehyde. Journal of Biochemistry, 81(6): 1699-703.), incorporated by reference.
  • Alternative embodiments include chemiluminescence (Martin R., Hoover, C., Grimme, S., Grogan, Cl, Holtke, J. and Kessler, CF. (1990) BioTechniques 9(6): 762-8), incorporated by reference, electrochemical coupling using silicon surfaces (Briggs, J., Kung, V.T., Gomez, B., Kasper, K.C., Nagainis, P.A., Masino, R.S., Rice, L.S., Zuk, R.F., and Ghazarossian, V.E. (1990). Sub-femtomole quantitation of proteins with Threshold, for the biopharmaceutical industry.
  • the etheno derivatives (primarily the ethenoadenine residues) within the loops are measured with the fluorimeter of the apparatus: excitation at 310 nm, and emission at 410 nm.
  • the degree of fluorescence and sensitivity of the fluorimeter are calibrated with a quinine sulfate standard (10⁻⁵ to 10⁻⁷ M in 0.1 N H₂SO₄).
  • the amount of direct etheno fluorescence is increased by a factor of 2 by completely digesting the samples with DNase I and phosphodiesterase, when a gel overlay is used to prevent diffusion of the signals and disruption of the two-dimensional array of markers.
  • the number of heteroduplexes is determined by the unique fluorescence of the adduct that was initially linked to the Left primers.
  • Rhodamine, fluorescein or isothiocyanine derivatives can all be used to obtain intense fluorescent signals that can be separately measured from the fluorescence of the etheno adducts.
  • Standard programs quantitate the two different signals by analyzing two or more regions of the emission and/or excitation spectra.
  • Alternative detection methods for the etheno-derivatives include the use of specific monoclonal antibodies (Eberle, G., Barbin, A., Laib, R.J., Ciroussel, F., Thomale, J., Bartsch, H., and Rajewsky, M.F.
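The text does not say how such programs separate the two signals. One common approach, shown here purely as an illustrative stand-in, is linear unmixing: if each fluorophore's response in two spectral windows is known from standards, the two measured intensities give a 2 by 2 linear system. The function name and all response values below are hypothetical.

```python
def unmix_two_signals(measured, response):
    """Solve a 2x2 linear system for the amounts of two fluorophores.

    measured: (m1, m2) intensities in two spectral windows.
    response: ((a11, a12), (a21, a22)), where a_ij is the response of
              window i to one unit of fluorophore j (from standards).
    """
    (a11, a12), (a21, a22) = response
    m1, m2 = measured
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("spectral windows do not separate the two signals")
    x1 = (m1 * a22 - a12 * m2) / det  # Cramer's rule
    x2 = (a11 * m2 - a21 * m1) / det
    return x1, x2


# Hypothetical responses: window 1 mostly sees the etheno adduct,
# window 2 mostly sees the primer label, with some cross-talk.
etheno, primer_label = unmix_two_signals((4.0, 5.3), ((1.0, 0.2), (0.1, 1.0)))
```

With more than two spectral regions the same idea becomes a least-squares fit rather than an exact solve.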
  • Step 4a detection residues within a mismatch loop will display differing degrees of reactivity to the modifying reagents as well as interactions (including fluorescence quenching and energy transfer) between closely spaced ethenoderivatives. (Which is why the fluorescence of an etheno derivative in a polynucleotide is approximately half that of the free ethenonucleotide.)
  • systematic labeling is used to calibrate the fluorescent signal for each size of mismatch loop, thereby compensating for the nonlinearity of the fluorescent signal with respect to the loop size.
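A calibration of this kind can be applied as a lookup table: measure the per-strand signal for loops of known size, then interpolate to convert an unknown signal back to a loop size. The sketch below assumes the calibration signal increases strictly with loop size and uses piecewise-linear interpolation; the function name and the calibration values are hypothetical.

```python
from bisect import bisect_left


def loop_size_from_signal(signal, calibration):
    """Convert a per-strand loop signal to a loop size in nucleotides.

    calibration: (loop_size_nt, per_strand_signal) pairs measured on
    known standards; signal is assumed strictly increasing with size.
    """
    points = sorted(calibration, key=lambda p: p[1])
    signals = [s for _, s in points]
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal outside the calibrated range")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return float(points[i][0])
    (n0, s0), (n1, s1) = points[i - 1], points[i]
    # Linear interpolation between the two bracketing standards.
    return n0 + (n1 - n0) * (signal - s0) / (s1 - s0)


# Hypothetical standards showing a sub-linear (quenched) response.
standards = [(2, 1.0), (4, 1.8), (8, 3.0)]
size = loop_size_from_signal(2.4, standards)
```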
  • Step 4a* an alternative embodiment of the heteroduplex loop detection is accomplished by incorporating labeled nucleotides during the Step 2a PCR synthesis, and then in Step 3l* digesting them out of the single-stranded loops of the heteroduplex.
  • Incorporating labeled nucleotides e.g., fluorescently or radioactively, using appropriate triphosphate deoxynucleotide precursors
  • the quantity of detectable freed label corresponds to the loop size.
  • S1 nuclease from Aspergillus oryzae (Dodgson, J.B., and Wells, R.D. (1977). Action of single-strand specific nucleases on model DNA heteroduplexes of defined size and sequence. Biochemistry, 16(11): 2374-9. Gite, S., and Shankar, V. (1992). Characterization of S1 nuclease. Involvement of carboxylate groups in metal binding. European Journal of Biochemistry, 210(2): 437-41. Shenk, T.E., Rhodes, C., Rigby, P.W., and Berg, P. (1975).
  • the gel prevents diffusion of the released nucleotides. (Diffusion is not an issue with direct detection of chemically modified nucleotides.)
  • the polystyrene plate is placed into a plastic manifold, recreating 384 separate chambers.
  • Step 4 detection chemical modification is combined with specific nuclease treatment.
  • S1 or micrococcal nuclease can be used to enhance the fluorescence of the etheno-derivatized adenosines generated by the chloroacetaldehyde reaction. This provides two sets of measures of the same residues, thus increasing accuracy and sensitivity.
  • the nuclease treatment can be used alone to liberate nucleotides from the loop. These free nucleotides are then separated from the retained double-stranded DNA of the heteroduplexes and quantitated. The spatial orientation of the reactions must be preserved as the nucleotides are released.
  • a gel such as polyacrylamide that is on a solid backing (available from FMC Corporation)
  • a manifold over the streptavidin plate to contain the solutions with the nuclease and free nucleotides.
  • for the polyacrylamide gel plate, one takes a 0.1-0.5 mm polyacrylamide gel (ranging from 4-15%) bound to a plastic backing. The gel is slightly dehydrated with minimal surface moisture.
  • the nuclease solution is applied to the surface of the gel (the amount of S1 or micrococcal nuclease must be titrated for the enzyme lot) and the gel is placed over the surface of the streptavidin plate to which the heteroduplexes are bound.
  • the gel layer is removed and the nucleotides embedded within the gel are quantitated by fluorescence, two-dimensional radioactivity counting, autoradiography, or immunochemical assays.
  • An alternative detection mechanism is described in Steps 4b, 4c, and 4d.
  • the nucleotides within the heteroduplex loops are detected by distinguishing these nucleotides from those that are contained within the double-stranded portions of the DNA strands.
  • the chemical modification agent chloroacetaldehyde that selectively reacts with the exposed nucleotides within the loops is employed to specifically modify the C and A nucleotides within the heteroduplex loops.
  • This reagent is preferable to other chemical modification agents such as hydroxylamine, bisulfite, and osmium tetroxide because of its ease of use, and the fact that the derivatized nucleotides are fluorescent, while the chemical reagent and the unmodified nucleotides are not fluorescent.
  • a detection amplification method, such as immunodetection of the adducts using a urease-conjugate and a silicon-based detection of a pH shift, contacts the polystyrene surface with an electronic silicon detector and a urea-containing gel interface using existing methods (Briggs, J., Kung, V.T., Gomez, B., Kasper, K.C., Nagainis, P.A., Masino, R.S., Rice, L.S., Zuk, R.F., and Ghazarossian, V.E. (1990). Sub-femtomole quantitation of proteins with Threshold, for the biopharmaceutical industry.
  • Step 5 the genotypes are determined for every STR.
  • the two signals for each locus represent the sum and difference between the alleles.
  • this representation becomes quantitative.
  • One allele is computed by adding the sum and difference values and then dividing by two, and the second allele is computed by subtracting the sum and difference values and then dividing by two. This genotype determination is done for every locus.
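The sum-and-difference arithmetic can be sketched as follows. The two measured values are treated as loop sizes in repeat units: the sum of both allele loops relative to a smaller reference STR, and the difference between the two alleles. Adding the reference repeat count back to recover absolute repeat numbers is an interpretive assumption, as are the function and parameter names.

```python
def alleles_from_sum_and_difference(loop_sum, loop_difference,
                                    reference_repeats=0):
    """Recover two allele sizes from the addition and subtraction assays.

    loop_sum:        (a - r) + (b - r), both loops versus the reference.
    loop_difference: |a - b|, the two alleles versus each other.
    reference_repeats: r, repeat count of the reference STR (assumed
                       known; 0 returns sizes relative to the reference).
    """
    if loop_difference < 0 or loop_difference > loop_sum:
        raise ValueError("difference must lie between 0 and the sum")
    larger = (loop_sum + loop_difference) / 2 + reference_repeats
    smaller = (loop_sum - loop_difference) / 2 + reference_repeats
    return larger, smaller


# Hypothetical alleles of 20 and 16 repeats against a 10-repeat reference:
# sum = (20 - 10) + (16 - 10) = 16, difference = 20 - 16 = 4.
print(alleles_from_sum_and_difference(16, 4, reference_repeats=10))
# (20.0, 16.0)
```

A homozygous locus gives a difference of zero, so both returned values coincide.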
  • Phenotypic data is gathered on the individuals, animals, or plants which are genotyped. For humans, this includes the basic medical examination: history, physical, and laboratory data. Additional phenotypic markers for various genetic diseases (e.g., creatine kinase for Duchenne muscular dystrophy) can also be collected. Environmental risks and exposures are also recorded.
  • Genomics, 18: 283-289. association, homozygosity mapping (Ben Hamida, C., Doerlinger, N., Belal, S., Linder, C., and Reutenauer, L. 1993. Localization of Friedreich ataxia phenotype with selective vitamin E deficiency to chromosome 8q by homozygosity mapping. Nature Genetics, 5: 195-200. Pollak, M.R., Chou, Y.-H.W., Cerda, J.J., Steinmann, B., LaDu, B.N., Seidman, J.G., and Seidman, C.E. 1993.
  • the result is one or more (with polygenic disease) peaks appearing at specific locations on the chromosome that both suggest specific gene regions and provide a signature pattern for phenotypic risk.
  • dense STS sampling along the genome i.e., x-axis
  • large numbers of individuals tested at these STSs with each STS's allele given a combined score (i.e., on a y-axis)
  • the conventional limitations of statistical linkage analysis are overcome, and the process becomes akin to a signal processing of genetic data in order to separate delta functions (i.e., the causative genes) from the background noise.
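To make the signal-processing analogy concrete, the sketch below treats the combined per-STS scores as a one-dimensional signal along the genome and reports local maxima that stand well above the background. The thresholding rule (mean plus a multiple of the standard deviation) is an illustrative choice, not a method stated in the text, and the function name and scores are hypothetical.

```python
from statistics import mean, pstdev


def candidate_peaks(scores, n_sigma=2.0):
    """Return indices of local maxima exceeding mean + n_sigma * stdev.

    scores: combined score per STS, ordered along the chromosome.
    """
    threshold = mean(scores) + n_sigma * pstdev(scores)
    peaks = []
    for i, y in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i < len(scores) - 1 else float("-inf")
        if y > threshold and y >= left and y >= right:
            peaks.append(i)
    return peaks


# Hypothetical scores with one strong signal at marker index 4.
print(candidate_peaks([1, 1, 1, 1, 10, 1, 1, 1, 1, 1]))  # [4]
```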
  • Risks of trait inheritance or disease can then be determined by probabilistic (e.g., Bayesian) techniques (Young, I.D. 1991. Introduction to Risk Calculation in Genetic Counselling. Oxford: Oxford University Press.), incorporated by reference, that correlate the available genotypic and phenotypic data and environmental factors with chance of disease occurrence.
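A Bayesian risk calculation of the kind referred to here reduces, in its simplest form, to updating a prior risk with the likelihood of the observed data under each hypothesis. The sketch below shows only that single update; the function name and the numbers are hypothetical and do not come from the text.

```python
def posterior_risk(prior, likelihood_if_at_risk, likelihood_if_not):
    """Bayes' rule for a two-hypothesis risk update.

    prior:                 probability of being at risk before the data.
    likelihood_if_at_risk: P(observed data | at risk).
    likelihood_if_not:     P(observed data | not at risk).
    """
    numerator = prior * likelihood_if_at_risk
    denominator = numerator + (1.0 - prior) * likelihood_if_not
    if denominator == 0:
        raise ValueError("observed data impossible under both hypotheses")
    return numerator / denominator


# Hypothetical example: prior risk 0.5, and the observed genotype data
# are four times less likely if the individual is at risk.
risk = posterior_risk(0.5, 0.25, 1.0)  # 0.125 / 0.625, about 0.2
```

Several independent observations can be combined by feeding each posterior back in as the next prior.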
  • the signatures of causative gene locations deduced from the population can be applied to each individual to ascertain risk.
  • one or more genetic loci can be associated with specific (desirable or undesirable) traits such as milk production or disease resistance. This information can be used for selective breeding.
  • the techniques of genotyping and phenotypic correlation can be similarly applied to the task of disease gene identification. Exploiting dense genotypic data is particularly advantageous over existing techniques in localizing the genes of complex multigenic diseases. Once genes have been localized on the genetic map, use of an integrated genetic/physical genome map allows the positional cloning (Kerem, B.-S., Rommens, J.M., Buchanan, J.A., Markiewicz, D., Cox, T.K., Chakravarti, A., Buchwald, M., and Tsui, L.-C. 1989. Identification of the cystic fibrosis gene: genetic analysis. Science, 245: 1073-1080.
  • the STR loop mismatch method employs heteroduplex hybridizations to directly measure the STR allele repeat number n.
  • the two alleles at a given STR locus as the complementary strands in a heteroduplex DNA molecule.
  • one strand S contains s STR repeat units and its mismatched complementary strand T' contains t STR repeat units.
  • U' denotes the complementary strand of sequence U.
  • Each STR repeat unit is comprised of k nucleotides. Assume that the left and right flanking regions are identical (i.e., perfectly complementary).
  • SS-DNA single-stranded nucleic acid
  • subsequence L 402 is the left flanking region (with subsequence L' 404 complementary) and subsequence R 406 is the right flanking region (with subsequence R' 408 complementary).
  • the s-t extra STR units form a single-stranded loop 410 of size (s-t)*k bases. Energetically, only one such loop is expected (Ninio, J. 1979. Biochimie, 61: 1133. Salser 1977. Cold Spring Harbor Symp. Quant. Biol., 42: 985.), incorporated by reference; however, multiple loops would in no way change the results.
  • the complementary structure shown in subfigure 4B is formed.
  • the single-stranded loop 412 size (t-s)*k is on the complementary strand.
  • the key idea is this: by detecting the size of the single-stranded loop 410 or 412, the value s-t (or t-s) can be determined. By comparing two unknown alleles with a known standard, and by also comparing the two alleles with respect to each other, these loop size measurements will precisely determine the two alleles, i.e., the genotype at the STR locus.
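The loop geometry described in these items can be captured in a few lines: the strand with more repeat units carries the loop, and the loop length is the repeat difference times the repeat unit length k. The function name and the returned strand labels are illustrative.

```python
def loop_description(s: int, t: int, k: int):
    """Loop size (in bases) and the strand carrying it for a heteroduplex
    of strand S (s repeat units) with strand T' (t repeat units)."""
    if s < 0 or t < 0 or k <= 0:
        raise ValueError("repeat counts must be >= 0 and k must be > 0")
    size = abs(s - t) * k
    if s > t:
        strand = "S"    # extra units loop out of S
    elif t > s:
        strand = "T'"   # extra units loop out of the complementary strand
    else:
        strand = None   # perfectly matched, no loop
    return size, strand


# Dinucleotide (CA) repeat, k = 2: 20 versus 16 repeat units.
print(loop_description(20, 16, 2))  # (8, 'S')
```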
  • the signal strength from a loop of single-stranded DNA is proportional to the number of unmatched nucleotides in the heteroduplex ST'. This signal is measured by means of a first label (*) that corresponds to the number of unmatched nucleotides in the loop of ST'. This label is measured by means of a physical detection that preferentially detects specific nucleotides in single-stranded DNA.
  • the nucleotides in the S strand of the heteroduplex molecule are chemically modified after the PCR synthesis.
  • the modification to these nucleotides renders them detectable (e.g., by fluorescence).
  • the measured fluorescence of these modified S nucleotides is proportional to the size of the loop mismatch s-t.
  • the nucleotides in the S strand of the heteroduplex molecule are labeled (radiolabeled, or other detectable means) and then incorporated during the PCR synthesis. Subsequent digestion with an S1-like endonuclease separates the mismatched (and labeled) S nucleotides from the heteroduplex. The measured signal of these released S nucleotides is proportional to the size s-t of the loop mismatch prior to enzymatic digestion.
  • Means of physically detecting a quantitative signal for determining the loop size include: radioactivity, fluorescence, optical density, ionic concentration, electromagnetic conductivity or susceptibility, electrochemical coupling, or other detection assays (all referred to previously in this description).
  • the loop size is determined by the ratio of the (1) measured single-stranded loop signal strength to the (2) measured number of strands having a loop. Therefore, in addition to detecting loop size, accurate quantitation also requires determining the number of heteroduplex strands with measurable loops. This is done using an independent second label (#) on the S strands of the heteroduplex molecules. This label is comprised of a detectable molecule attached to the PCR primer of the S strand; subsequent measurement of this molecule quantifies the number of strands in heteroduplexes.
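The ratio described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's calibration procedure: the function name, the two calibration constants (signal per labeled nucleotide and signal per labeled strand), and the default of one labeled nucleotide per repeat unit (one A per CA repeat) are all hypothetical.

```python
def loop_size_in_repeats(loop_signal: float, strand_signal: float,
                         signal_per_label: float, signal_per_strand: float,
                         labels_per_repeat: int = 1) -> float:
    """Estimate the loop size s-t in repeat units from the two label readings.

    loop_signal:       reading from the first label (*) on the loops
    strand_signal:     reading from the second label (#) on the S-strand primer
    signal_per_label:  assumed calibration, signal from one (*)-labeled nucleotide
    signal_per_strand: assumed calibration, signal from one (#)-labeled strand
    labels_per_repeat: (*)-detectable nucleotides per repeat unit (1 for A in CA)
    """
    if strand_signal <= 0:
        raise ValueError("no labeled strands detected")
    n_strands = strand_signal / signal_per_strand      # strands carrying a loop
    n_labels = loop_signal / signal_per_label          # labeled loop nucleotides
    return n_labels / n_strands / labels_per_repeat    # repeats per loop
```

Dividing by the strand count is what makes the estimate independent of how much heteroduplex was actually recovered.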
  • the loop with label (*) is indicated by A*s, which in the most preferred embodiment represent adenosine nucleotides on the single-stranded loop that are chemically modified by chloracetaldehyde into a detectable state.
  • A*s represent labeled (e.g., radiolabeled) nucleotides that are incorporated during PCR synthesis, and are then detected following endonuclease digestion.
  • the experiment consists of performing a PCR amplification of an unknown CA-repeat locus source S of the form L(CA)_s R, and hybridizing it to a known complementary oligonucleotide target T' of the form [L(CA)_t R]' in order to induce mismatch and quantitatively measure the loop.
  • a CA-repeat locus molecule is selected for analysis, and is defined by its unique left and right oligonucleotide primers.
  • the primers are synthesized with appropriate labeling and linking modifications (Haralambidis, J., Duncan, L., Angus, K., and Tregear, G.W. (1990).
  • the target DNA TT' is constructed from a standard of known CA-repeat length t in a separate PCR experiment.
  • the allele size t is chosen sufficiently small, say between 0 and 10, so that s>t is always guaranteed.
  • Standard PCR amplification of genomically-derived or cloned DNA for 20-40 cycles is done using unlabeled primers and nucleotides, with a linker such as biotin on the right primer.
  • Step 2b the source DNA SS' is constructed from sample genomic DNA via a PCR experiment.
  • the CA-repeat locus molecule is defined by its unique left and right primers.
  • a standard PCR amplification of genomically derived DNA is done for 20-40 cycles using labeled (#) left primer in the presence of A* labeled nucleotides.
  • Step 3a the SS' and TT' duplex molecules are denatured to form single stranded DNAs. When renatured in solution, the hybridization pairs
  • the T strands of the TT' duplex are not detectable (since their loops match), and can be factored out of the analysis.
  • the T strands can be removed by attaching the TT' duplex to solid support via the linker of T', and then denaturing T from T', and washing to remove T, thus purifying T'.
  • T can remain as a nondetectable competitive contaminant.
  • using an excess of SS' relative to TT' favors the production of ST' heteroduplexes. Therefore, the focus is on the hybridization pairs
  • the SS' contains no single-stranded loops, hence is not detectable. Further, since only the T' molecule has the linker for solid support, attaching the T' to a surface (e.g., the biotin of T' to a streptavidin-coated surface) and washing removes the SS' product. This leaves only ST'
  • the heteroduplex molecule is comprised of an upper strand 502 and a complementary lower strand 504.
  • the hybridization product is as shown in subfigure 5A.
  • the upper source strand 502 is produced by a first PCR amplification of sample genomic DNA.
  • the single-stranded DNA loop 506 contains *-detectable A nucleotides. (Following chemical modification or by incorporation/digestion, the *-detectable A's are used to measure loop size via label *.)
  • a second label (#) 508 on the upper strand is for strand quantification, and is attached to the left PCR primer.
  • the complementary lower target strand 504 is produced by a different PCR amplification of a known STR locus, or by direct synthesis.
  • This lower strand has a linker 510 such as biotin attached to its 5' (right) end.
  • Step 3b chemical modification embodiment, the exposed A*s on the single-stranded DNA loop are chemically modified by chloracetaldehyde, as shown in subfigure 5B; in Step 4a, detecting the fluorescence from the first label (*) on the A* 512 measures the magnitude of s-t.
  • Step 3c the exposed A*s on the single-stranded DNA loop are digested from the heteroduplex into free A* 514 using an endonuclease, as shown in subfigure 5C; in Step 4b, the radioactive A* is then detected using a scintillation counter, thereby measuring the magnitude of s-t.
  • the allele is determined in Step 5. Calibrations done prior to the experiment ensure that these measurements provide precise quantitation. Since
  • Step 5 adds the known value t to s, forming the average (s1+s2)/2 of the alleles. Multiplying this average by 2 determines the allele sum s1+s2.
  • This experiment consists of performing a PCR amplification of an unknown CA-repeat locus with the (zero, one, or) two sources S1 and S2 of the form L(CA)_s1 R and L(CA)_s2 R, and hybridizing them against each other's complementary strands. This induces a loop mismatch proportional to |s2-s1|, which is then quantitatively measured.
  • a CA-repeat locus molecule is selected for analysis, and is defined by its unique left and right oligonucleotide primers.
  • the primers are located far enough away from the CA-repeat region to assure a sufficiently long linear stretch of DNA in the homoduplex; this is done to make the effect of different loop sizes on the free energy negligible.
  • the rationale is that the flanking regions and the complementary CA/GT repeat regions have a total free energy that is proportional to the number of matching nucleotides, whereas the single-stranded DNA loop of heteroduplex has a free energy that grows as the logarithm of the loop size (Ninio, J. 1979. Biochimie, 61: 1133. Salser 1977. Cold Spring Harbor Symp. Quant. Biol., 42: 985.), incorporated by reference.
  • the free energy changes (and binding affinities) introduced by differing loop sizes are small.
  • target DNA TT' is constructed from a standard of known CA-repeat length t in a separate PCR experiment.
  • the allele size t is chosen sufficiently small, say between 0 and 10, so that s>t is always guaranteed.
  • Standard PCR amplification of genomically-derived or cloned DNA for 20-40 cycles is done using unlabeled primers and nucleotides. No labels or linkers are used.
  • Step 2b the two source alleles are constructed simultaneously in one PCR experiment: each allele serves as the hybridization target for the other.
  • a standard PCR amplification of genomically derived DNA is done for 20-40 cycles using labeled (#) left primer, and a right primer with a linker such as biotin, in the presence of A* labeled nucleotides.
  • Step 3a forms the heteroduplexes.
  • the S1,S1' and S2,S2' homoduplex molecules are denatured to form single stranded DNAs.
  • the hybridization pairs are renatured in solution.
  • S1,S1'; S1,S2'; S2,S1'; and S2,S2'.
  • the heteroduplex molecule constructed after PCR amplifying the sample genomic DNA, and rehybridizing, is comprised of an upper strand 602 and a complementary lower strand 604.
  • the single-stranded DNA loop 606 contains *-detectable A nucleotides. (Following chemical modification or by incorporation/digestion, the *-detectable A's are used to measure loop size via label *.)
  • a second label (#) 608 on the upper strand is for strand quantification, and is attached to the left PCR primer.
  • the lower strand also has a linker 610 such as biotin attached to its 5' (right) end.
  • the label is incorporated into both strands during the PCR by labeling the CA and/or the GT dN*'s.
  • both the S1,S2' and S2,S1' strands have detectable single-stranded loops. Since both have the same |s2-s1| loop size, there is a two- to four-fold increase in the desired measured signal.
  • Elimination can be done using a single-strand specific 3' to 5' exonuclease that removes SS-DNA but not internal loops, such as E. coli exonuclease VII.
  • Step 3c's chemical modification embodiment the exposed A*s on the single-stranded DNA loop are chemically modified by chloracetaldehyde; in Step 4a detecting the fluorescence from the first label (*) on A*s measures the magnitude of s-t.
  • Step 3d's alternative synthesis/digestion embodiment the exposed A*s on the single-stranded DNA loop are digested from the heteroduplex into free A* using an endonuclease; in Step 4b the radioactive A* is then detected using a scintillation counter, thereby measuring the magnitude of s-t.
  • Step 5 the allele difference is determined. Calibrations done prior to the experiment assure that these measurements provide precise quantitation. Since
  • the genotype is computed from loop mismatch data, referring to figure 7, by combining the sum (from the figure 5 protocol) and difference (from the figure 6 protocol) of the allele sizes; this determination exploits the elimination of PCR stutter artifact by pooling within each experiment, as described below.
  • the single experiment of Step 1 accurately measures the allele sum (s1+s2)
  • the single experiment of Step 2 accurately measures the allele difference |s2-s1|.
  • Combining these in Step 3 determines the two alleles:
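The combination of the two measurements can be illustrated with a short Python sketch. It assumes the standard algebra for recovering two numbers from their sum and absolute difference (the smaller allele is (sum - difference)/2 and the larger is (sum + difference)/2), and that the sum experiment yields the mean loop size against a target of known length t. The function names and the rounding of noisy measurements to whole repeat counts are assumptions made for this example.

```python
def allele_sum_from_mean_loop(mean_loop_repeats: float, t: int) -> float:
    """Sum experiment: the mean loop ((s1-t)+(s2-t))/2 plus t is (s1+s2)/2."""
    return 2 * (mean_loop_repeats + t)


def genotype_from_sum_and_difference(allele_sum: float, allele_diff: float) -> tuple[int, int]:
    """Return (s1, s2) with s1 <= s2 from the measured s1+s2 and |s2-s1|."""
    total = round(allele_sum)            # snap noisy readings to whole repeats
    diff = abs(round(allele_diff))
    if (total - diff) % 2 != 0:
        raise ValueError("sum and difference must have the same parity")
    s1, s2 = (total - diff) // 2, (total + diff) // 2
    if s1 < 0:
        raise ValueError("difference cannot exceed sum")
    return s1, s2
```

For example, a mean loop of 13.5 repeats against a target with t = 5 gives an allele sum of 37; with a measured difference of 3, the genotype would be (17, 20). The parity check is a simple consistency test: a sum and difference of opposite parity cannot come from whole repeat counts.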
  • a detailed protocol is given for the loop mismatch method. The following steps referring to figure 8 are designed for measuring a single STR, rather than the multiple STRs assayed in figure 3.
  • Step 1 of figure 8 an STR locus is selected, and PCR primers are chosen to provide large flanking regions.
  • this protocol is not optimized for compatibility with the apparatus of figure 1.
  • the primers are synthesized derivatized to support the characterization experiments.
  • a second right primer Rb containing one or more biotin residues at the 5' end or within the oligonucleotide.
  • Derivatizing the primer for binding to a surface entails incorporating a biotinylated nucleotide at the 5' end of the synthetically made oligonucleotide. Additional biotinylated residues can be incorporated into this primer either at the time of biosynthesis or by secondary photo or chemical biotinylation.
  • the preferred embodiment employs the direct addition of the 5' biotin by chemical synthesis; alternatively, additional biotin molecules may improve the heteroduplex isolation efficiency.
  • Step 2 three PCR amplifications are performed.
  • Source DNA from a genome to be characterized, and target DNA of known minimal repeat length t from an individual (or prepared in advance by cloning a segment of genomic DNA in a plasmid or phage vector) are prepared for PCR.
  • Three separate reactions are performed. These are identical, except for the following specific reaction mixtures:
  • PCR a (TT', sum): PCR mixture for Step 2.a: target DNA, L, Rb, all dNTPs unlabeled
  • PCR b (SS', sum): PCR mixture for Step 2.b: source DNA, L#, R, labeled α-32P-dATP, other dNTPs unlabeled
  • PCR c (S2,S1', difference): PCR mixture for Step 2.c: source DNA, L#, Rb, labeled α-32P-dATP, other dNTPs unlabeled
  • each 0.2 or 0.5 ml tube contains the appropriate set of primers, followed by the standard PCR buffer containing Tris buffer, KCl, MgCl2 and dNTP (the four triphosphate deoxynucleotides).
  • the total size of each PCR reaction is 50 µl (though this can vary from 10-100 µl).
  • Each specific PCR reaction contains its specific reaction mixture, the PCR buffer (i.e., 10 mM Tris pH 8.0, 50 mM KCl, 2.5 mM magnesium chloride, albumin), and thermostable (e.g., Taq) polymerase.
  • the PCR reaction is overlaid with a thin layer of Ampliwax that separates some of the components from each other so that the reaction begins when the temperature rises to a level that melts the wax and allows all of the components to mix.
  • This is the "hot start" method of PCR which reduces nonspecific synthesis products.
  • An initial heat denaturation at 93-95°C for 5 minutes is followed by thermal cycles, which are performed 20-40 times.
  • Each cycle consists of a 30 second denaturation step at 95°C, a 15-30 second annealing step at 50-65°C (typically 55°C), and an extension step at 73°C for 15-120 seconds (typically 45 seconds).
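To make the timing of this cycling program concrete, the following Python sketch encodes the stated step durations and computes the total programmed hold time. It is a hedged illustration: the class name, the default annealing time (30 s) and the default cycle count (30) are assumed choices within the stated ranges, and temperature ramp times between steps are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleProgram:
    initial_denature_s: int = 5 * 60  # 5 minutes at 93-95 C
    denature_s: int = 30              # 30 s at 95 C
    anneal_s: int = 30                # text allows 15-30 s; 30 s is an assumed choice
    extend_s: int = 45                # text allows 15-120 s; 45 s is the stated typical value
    cycles: int = 30                  # text allows 20-40; 30 is an assumed choice

    def __post_init__(self):
        # Reject values outside the ranges given in the protocol text.
        if not 15 <= self.anneal_s <= 30:
            raise ValueError("annealing step must be 15-30 s")
        if not 15 <= self.extend_s <= 120:
            raise ValueError("extension step must be 15-120 s")
        if not 20 <= self.cycles <= 40:
            raise ValueError("cycle count must be 20-40")

    def hold_time_s(self) -> int:
        """Total programmed hold time in seconds, excluding ramp times."""
        per_cycle = self.denature_s + self.anneal_s + self.extend_s
        return self.initial_denature_s + self.cycles * per_cycle
```

With these defaults the programmed holds add up to 3450 seconds, a little under an hour; actual run time would be longer because of heating and cooling ramps.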
  • 0.5M EDTA is added to a final concentration of 10 mM. This inactivates the Taq polymerase.
  • Step 3 the heteroduplex hybridizations and modifications are done.
  • Reactions a and b are combined (summation experiment) in Step 3a, and reaction c (difference experiment) is kept separate in Step 3b. All the following operations are done independently for the two reactions (sum and difference).
  • the samples are then heated to 95°C for 5 minutes and allowed to anneal at a temperature of 75°C to discourage primer-strand annealing. After 2-24 hours, the temperature is lowered to 4°C to solidify the Ampliwax and the exonuclease VII (Gibco, BRL), incorporated by reference, in the appropriate buffer is added to the surface.
  • the buffer conditions for the PCR are compatible directly with those of exonuclease VII.
  • the reactions are initiated by heating to 37°C and incubated for a time ranging from 1-120 minutes.
  • the reactions are terminated by the addition of chloroform to the tubes.
  • the supernatants from the chloroform extractions contained hetero- and homoduplexes, digested single strands, primers and free nucleotides.
  • the double-stranded DNA is then purified using a spin column/filter (such as Centricon filters from Amicon) to remove the small molecular weight material and concentrate the samples.
  • the purified DNAs from experiment are then adsorbed to streptavidin paramagnetic beads (DYNAL 1993. Dynabeads biomagnetic separation system, Technical Handbook: Molecular Biology, Dynal International, Norway.) to bind those double-stranded DNAs that contain the biotinylated right primer.
  • the beads are washed several times with a neutral salt buffer to reduce nonspecific binding and not disrupt the double-stranded DNA.
  • the tubes are incubated at 37°C for 4 hours (longer or shorter times may be used).
  • the reaction is terminated by washing with 0.01 M Tris-HCl pH 7.0 and 1.0 M NaCl.
  • the NaCl prevents dissociation of the heteroduplexes during the etheno-dehydration step.
  • the samples are heated in the final wash volume at 85°C for 1 hour (dehydrates the ethenoderivative).
  • Step 3 using a single-strand specific endonuclease such as S1 nuclease or micrococcal nuclease, the original PCR products that have been treated with exonuclease and bound to the streptavidin beads are equilibrated in the endonuclease buffer and reacted for varying times.
  • Steps 4a and 4b the signals are detected.
  • the fluorescence and radioactivity retained on the beads are measured.
  • the amount of fluorescein and 32P can be independently determined.
  • the fluorescence is measured by heating the samples to 95°C and eluting the DNA from the beads, taking the supernatants and measuring the fluorescence with a fluorimeter (excitation at 310 nm, emission at 410 nm).
  • the degree of fluorescence and sensitivity of the fluorimeter is calibrated with a quinine sulfate standard (10^-5 - 10^-7 M in 0.1 N H2SO4).
  • the tubes can be counted again for the amount of retained fluorescein and 32P labels.
  • the amount of radioactivity can be calibrated with known standards that account for tube geometry, sample volume and instrument counting efficiencies. Based upon the radioactivity and the fluorescence, the size of the loops can be established.
  • Step 5 the genotype is determined.
  • Step 5a the sum is computed from the Step 4a detection, and in Step 5b, the difference is computed from the Step 4b detection.
  • the results are combined in Step 5c to determine the genotype of the STR, as described. This completes the protocol.
  • DNA protection is done to minimize spurious signals from unhybridized single-stranded DNA, and exonucleases are not used.
  • Step 3a a 10:1 excess of the SS' amplified product relative to the TT' amplified product is preferably used.
  • Step 3b when necessary, TT' (or fragments thereof) without labels or linkers is added to block unhybridized S1' strands.
  • the number of PCR reactions can be reduced by performing PCR reactions b and c above as a first reaction using a cleavable biotinylated right primer and modifying several steps.
  • the PCR product can then be combined with the second target PCR reaction a to allow sequential measurement of the sum and difference experiments. This is accomplished by combining the two PCR reactions for the Source and Target DNAs in Step 2, preparing and isolating the heteroduplexes on the streptavidin beads in Step 3, and measuring the nucleotides within the loops by derivatization and fluorescence in Step 4.
  • Step 4 The initial measurements in Step 4 are then followed by the release of duplexes employing the immobilized Source strand by reduction of a disulfide linkage between the primer and the biotin.
  • a more sensitive detection system for the chemical modification embodiment is an antibody-enzyme conjugate that recognizes the derivatized DNA (i.e., the etheno-derivatives created by chloracetaldehyde) and catalyzes a colorimetric reaction that can be measured in the supernatant.
  • the simplest form of this assay would be to use a beta-galactosidase-antibody conjugate that acts on a colorimetric substrate such as X-gal or Blue-gal (BRL/Gibco).
  • the fragments L(CA)_(n-1) R, L(CA)_(n-2) R, and so on are also generated in addition to the main PCR product L(CA)_n R.
  • the distribution of the smaller fragments generally follows a decay pattern, with the amount of L(CA)_m R less than L(CA)_n R, when m<n.
  • This decay pattern is empirically observed to differ from one genetic locus to another, but remains stable across unrelated individuals for any given locus.
  • the use of pooled targets in the preferred embodiment eliminates this artifact. Multiple sources hybridized against multiple targets, producing a quadratic number of heteroduplexes. The different CA-repeat sizes
  • the mismatch loop size of each hybrid (S-i)x(T-j) is (s-t-i+j).
  • Each mismatch loop larger by d than the mean s-t is mirrored by a roughly equal concentration in its symmetric matrix entry of a mismatch loop smaller by d than the mean s-t.
  • the total signal from the stuttered sources with the stuttered targets averages out to the mean value s-t.
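The averaging argument can be checked numerically with a short Python sketch. The assumptions here are mine, not the patent's: stutter is modeled as a geometric decay with a hypothetical ratio, source and target share the same stutter profile (as the text suggests for a given locus), hybrid concentrations are taken as the product of the two strand fractions, and every loop size s-t-i+j is assumed to stay positive. Exact fractions are used so the result is not blurred by rounding.

```python
from fractions import Fraction


def stutter_weights(decay: Fraction, depth: int) -> list[Fraction]:
    """Relative amounts of products shortened by 0..depth repeats (sum to 1).

    A geometric decay is an assumed stand-in for the empirical stutter pattern.
    """
    raw = [decay ** i for i in range(depth + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def mean_loop_size(s: int, t: int,
                   source_w: list[Fraction], target_w: list[Fraction]) -> Fraction:
    """Concentration-weighted mean loop size over all hybrids (S-i) x (T-j)."""
    mean = Fraction(0)
    for i, wi in enumerate(source_w):
        for j, wj in enumerate(target_w):
            mean += wi * wj * (s - t - i + j)   # loop size of hybrid (S-i)x(T-j)
    return mean
```

With identical stutter profiles on both sides, the -i and +j contributions cancel on average and the mean is exactly s-t. If the target were a single unstuttered species, the mean would instead fall below s-t, which is the bias that pooled targets are meant to remove.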
  • STR Genotyping in a Combined Heteroduplex Experiment The sum and difference experiments at a locus are done separately using separate PCRs: two for the sum, and one for the difference, as described above.
  • the first PCR to construct TT' is preferably done prior to the introduction of sample genomic DNA, and can be incorporated (or "compiled") into the apparatus.
  • the protocol of figure 8 employs two PCRs. The following describes how to reduce this to just one PCR experiment, thereby reducing operating time and space requirements.
  • Digoxigenin is used as a linker (The Genius System User's Guide for Filter Hybridization, 1992. Boehringer Mannheim Corporation, Indianapolis, IN), incorporated by reference.
  • Step 1 an STR locus is selected and oligonucleotides prepared.
  • Step 2a unlabeled duplex TT' of a known small repeat size t is constructed by PCR or direct synthesis.
  • the right primer has a digoxigenin linker 1.
  • Step 2b the homoduplexes S1,S1' and S2,S2' of an uncharacterized genomic DNA sample are amplified via PCR.
  • the first label (*) is incorporated into the single-stranded loop, the left primer has the second label (#), and the right primer has a biotin linker 2.
  • Step 3 the duplexes are combined and denatured together at high temperature into their separate strands, yielding: S2, S1, T, S2', S1', and T'.
  • the upper right triangle submatrix hybrids provide all the detectable elements -
  • the hybrid species (S2,S2'; S1,S1'; T,T') along the matrix diagonal are not detectable, since the duplex strands are identical in size, and no loop mismatch is formed.
  • the lower left triangle submatrix hybrids are not detectable - (S1,S2') By assumption, s1<s2, so no loop mismatch is formed.
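The three-by-three hybridization matrix can be enumerated with a short Python sketch. This is an illustration under assumptions: the function name is hypothetical, the ordering t < s1 <= s2 is assumed, and a hybrid is treated as detectable only when the upper strand is longer than the lower strand, so that the loop falls on the upper strand.

```python
def classify_hybrids(s1: int, s2: int, t: int) -> dict[tuple[str, str], int]:
    """Map each (upper, lower) strand pair to its upper-strand loop size in repeats.

    A loop size of 0 means no detectable loop on the upper strand.
    """
    if not t < s1 <= s2:
        raise ValueError("expected t < s1 <= s2")
    upper = {"S2": s2, "S1": s1, "T": t}
    lower = {"S2'": s2, "S1'": s1, "T'": t}
    return {(u, l): max(nu - nl, 0)
            for u, nu in upper.items()
            for l, nl in lower.items()}


def detectable_hybrids(s1: int, s2: int, t: int) -> dict[tuple[str, str], int]:
    """Keep only the hybrids that carry a loop on the upper strand."""
    return {pair: loop
            for pair, loop in classify_hybrids(s1, s2, t).items()
            if loop > 0}
```

For s1 = 17, s2 = 20 and t = 5, only the upper-right triangle survives: (S2,S1') with a loop of 3, (S2,T') with 15, and (S1,T') with 12. The two T'-bound loops together give s1+s2-2t, and the S1'-bound loop gives s2-s1.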
  • the T' (pre-made, digoxigenin linker 1) lower DNA strands from the S1' and S2' (locus-made, biotin linker 2) lower strands are spatially separated by using two different solid supports to specifically bind the digoxigenin and the biotin linkers in different measurable regions.
  • the signals required for measuring the sum and the difference are detected in spatially separated experiments.
  • Step 5 the usual analysis (which exploits the expected PCR stuttering and the pooled targets) is used to compute the allele values.
  • a cleavable biotinylated linker is used on the right primer of T' that allows separate PCRs of a target and of genomic DNA, combines the samples into a single heteroduplex reaction, and then detects all nine of the hybridization products listed above. The following are measured: (a) the number of S1 and S2 strands bound, and (b) the number of nucleotides in the loops. Then, the S1,T' and S2,T' measurable heteroduplex species are liberated by reduction of the disulfide linkage, followed by remeasuring the S2,S1' bound, and the number of nucleotides in the remaining loops.
  • a Scalable STR Genotyping Assay The methods described referring to figure 8 enable practical construction of the apparatus in figure 1 and system manufactured device described in figure 3 in which multiple STR loci are genotyped simultaneously.
  • Step (1) is specific for a given STR.
  • the other four steps are largely independent of the given STR. Therefore, the apparatus in figure 1 is constructed to spatially encode multiple genetic loci on a surface, and places Step (1)'s specific STR oligonucleotides at each spatial location, prior to complete PCR processing.
  • Step (2a) in figure 8 deposits the pooled targets TT', and then Steps (2b-5) for the sample-dependent PCR processing, DNA hybridization, signal detection, and genotype determination are performed simultaneously over the surface.
  • Steps (2-5) for the sample-dependent PCR processing, DNA hybridization, signal detection, and genotype determination are performed simultaneously over the surface.
  • the steps of figure 8 for single STR genotyping are related to the steps of figure 3 for multiple STR genotyping.
  • the three dimensions of space and one dimension of time can be used to multiplex the STR-specific oligonucleotides and the PCR processing.
  • multiple reaction chambers in a three-dimensional arrangement would each contain STR-specific oligonucleotides over some time period.
  • the PCR processing would be done in parallel in multiple chambers, until all required signals were obtained. This physical arrangement can customize the PCR conditions, if necessary, to each STR.
  • 864-chamber plates can be physically arranged to achieve over 100,000 simultaneous characterizations.
  • This is done by constructing a surface of four plates in a 2x2 array, which provides 3,456 chambers in a layer. Stacking thirty such layers provides 103,680 chambers.
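The chamber arithmetic above can be verified with a few lines of Python. The function and parameter names are illustrative only; the plate size, 2x2 layout, and thirty layers are the values stated in the text.

```python
def chambers_per_layer(chambers_per_plate: int = 864,
                       plate_rows: int = 2, plate_cols: int = 2) -> int:
    """Chambers on one surface built from a plate_rows x plate_cols array of plates."""
    return chambers_per_plate * plate_rows * plate_cols


def total_chambers(layers: int = 30, **layout) -> int:
    """Chambers in a stack of identical layers."""
    return layers * chambers_per_layer(**layout)
```

With the stated values, one layer holds 864 x 4 = 3,456 chambers and thirty layers hold 103,680, which matches the figures in the text.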
  • This three dimensional arrangement is quite compact, with no chamber further than two feet from any other chamber.
  • this three dimensional organization fits into a thermocycling PCR oven.
  • the hybridization, detection, and other steps are multiplexed in time, enabling efficient use of the robotic device, detection device, and computer to achieve a throughput commensurate with the parallelization.
  • the signals from either the allele sum or allele difference experiments can be increased several-fold by detecting SS-DNA mismatch loops on both the upper and lower strands, rather than on just one strand.
  • the PCR stutter can again be eliminated by using pooled targets.
  • the PCR primers 1002 (left) and 1004 (right) for the upper strand 1006 and the lower strand 1008 of the target TT' both contain linkers 1010 (e.g., biotin) for binding to solid support, but no (#) labels.
  • the target TT' duplexes are constructed by standard PCR amplification of genomically derived DNA for 20-40 cycles using dNs without (*) labels.
  • Step 2b of figure 5 amplifies the unknown homoduplexes S1,S1' and S2,S2'.
  • the first label (*) 1012 for loop quantitation is present on nucleotides (in equal proportions) in both strands S 1014 and S' 1016.
  • the label (*) indicates detectability, whether by chemical modification or by incorporation/digestion.
  • the second label (#) 1018 for strand quantitation is present on both the left 1020 and right 1022 PCR primers.
  • the source DNA SS' is developed by standard PCR amplification of genomically derived DNA for 20-40 cycles using (*) labeled dA*, dC*, dG*, and dT*.
  • Step 4 of figure 5's detection 4n loop size (*) signals, and 2 strand (#) signals are measured per ST' molecule.
  • Step 5's allele determination this four-fold increase in label (*) and two-fold increase in label (#) is accounted for.
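One way to account for the four-fold and two-fold increases is sketched below in Python. It assumes both readings have already been converted to label counts by prior calibration, and that each ST' molecule contributes 4n loop labels and 2 strand labels as stated; the function name and the gain parameters are hypothetical.

```python
def loop_size_two_strand(loop_labels: float, strand_labels: float,
                         loop_gain: int = 4, strand_gain: int = 2) -> float:
    """Recover the loop size n when both strands carry labels.

    loop_labels:   calibrated count of (*) labels, 4n per ST' molecule
    strand_labels: calibrated count of (#) labels, 2 per ST' molecule
    """
    if strand_labels <= 0:
        raise ValueError("no labeled strands detected")
    molecules = strand_labels / strand_gain          # number of ST' molecules
    return loop_labels / loop_gain / molecules       # n per molecule
```

For 1,000 molecules with n = 12, the readings would be 48,000 loop labels and 2,000 strand labels, and the function returns 12.0. Setting both gains to 1 would correspond to labeling a single strand.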
  • Steps in figure 5 apply to the case of two alleles S1 and S2 for determining the allele sum.
  • Two separate PCRs are done, as described: one for S1,S1' and S2,S2' labeled duplexes, and one for linker TT' targets.
  • the denaturation/reannealing experiment constructs nine hybridization products. However, only those containing a T or T' linker are detectable.
  • each S1 (S2) or S1' (S2') acts as an S (S') strand, and the sum s1+s2 is measured.
  • the allele difference is determined using single-stranded loops from both the upper and lower strands. This again has the advantage of signal amplification.
  • the genotyping is done by cross-hybridizing S1,S1' with S2,S2'.
  • Step 1 the STR locus and its PCR primers are chosen.
  • Step 2 the two complementary strands are constructed in a single PCR amplification of sample genomic DNA.
  • the first loop quantitation label (*) is present on nucleotides (in equal proportions) in both S and S'.
  • linker such as biotin, which is attached to the 5' end of the right primer.
  • the hybridization product 1102 of the denaturation and reannealing is shown in subfigure 11A.
  • the various label and linker combinations are shown in the hybridization product table of subfigure 11B. Adding up the signals from the first label (*) 1104,
  • Step 5 the allele difference s2-s1 is computed as n, i.e., the normalized (and calibrated) ratio of loop size signal from the first label (*) to strand number signal from the second label (#).
  • step 1 is for identifying an STR, and synthesizing suitable PCR reagents.
  • the STR locus is identified by conventional techniques (Sambrook, J., Fritsch, E.F., and Maniatis, T. 1989. Molecular Cloning, second edition. Plainview, NY: Cold Spring Harbor Press; N. J. Dracopoli, J. L. Haines, B. R. Korf, C. C. Morton, C. E. Seidman, J. G. Seidman, D. T. Moir, and D. Smith, ed., Current Protocols in Human Genetics. New York: John Wiley and Sons, 1994), incorporated by reference.
  • preexisting STR loci for the genome of interest can be obtained from available databases (Genbank, GDB, EMBL; Hilliard, Davison, Doolittle, and Roderick, Jackson laboratory mouse genome database. Bar Harbor, ME; SSLP genetic map of the mouse, Map Pairs, Research Genetics, Huntsville, AL), incorporated by reference.
  • the STR's repeat unit includes no more than three distinct nucleotides; for clarity in exposition, the following specification of the preferred embodiment assumes that the STR is a CA-repeat marker.
  • nucleic acid sequences flanking the CA-repeat region are determined by DNA sequencing methods (Sambrook, J., Fritsch, E.F., and Maniatis, T. 1989. Molecular Cloning, second edition. Plainview, NY: Cold Spring Harbor Press; United States Biochemical 1994. USB Sequenase version 2.0 DNA sequencing kit, sequencing protocols, 9th edition, product number 70770, Amersham Life Science, Arlington Heights, IL), incorporated by reference.
  • sequence of all or part of the STR locus may reside in a preexisting available database, or in the original articles describing the locus.
  • oligonucleotide primers are designed for use with the DNA sequence using computer programs that facilitate PCR primer or DNA synthesis oligonucleotide design, such as MacVector 4.1 (Eastman Chemical Co., New Haven, CT) or Oligo 4.0 (National Biosciences, Inc., Madison, MN), incorporated by reference.
  • these programs facilitate selecting lengths and positionings of oligonucleotides that are operative for enzymatic reactions.
  • the two PCR primers and the reaction conditions are designed to permit amplification of the DNA sequence, and include:
  • primer R' (L) a left PCR primer for the upper strand, and (R') a right PCR primer for the complementary lower strand.
  • primer R' is biotinylated.
  • a third oligonucleotide for DNA sequencing primer and its reaction conditions are designed to permit sequencing of the DNA sequence:
  • (Q) a left (upstream) DNA sequencing primer that is directly adjacent to the CA-repeat region of the upper strand; this sequencing primer is designed to allow exten ⁇ ion acro ⁇ the entire tandem repeat sequence using nucleotides that are specifically limited to the repeat unit base composition.
  • the oligonucleotide primers for the CA-repeat genetic marker are synthesized (Haralambidis, J., Duncan, L., Angus, K., and Tregear, G.W. 1990. The synthesis of polyamide-oligonucleotide conjugate molecules. Nucleic Acids Research, 18(3): 493-9. Nelson, P.S., Kent, M., and Muthini, S. 1992. Oligonucleotide labeling methods. 3. Direct labeling of oligonucleotides employing a novel, non-nucleosidic, 2-aminobutyl-1,3-propanediol backbone.
  • these primers may be derivatized with a fluorescent detection molecule or a ligand for immunochemical detection such as digoxigenin.
  • these oligonucleotides and their derivatives can be ordered from a commercial vendor (Research Genetics, Huntsville, AL).
  • a genetic material whose genotype is to be determined is selected for study. This genetic material is then placed in contact with the PCR primers L and R', and PCR amplification is performed.
  • the methods for this PCR amplification given here are standard, and can be readily applied to every CA-repeat or microsatellite marker that corresponds to a (relatively unique) location on a genome.
  • these other components include, but are not limited to, the standard PCR buffer (containing Tris pH 8.0, 50 mM KCl, 2.5 mM magnesium chloride, albumin), triphosphate deoxynucleotides (e.g., dTTP, dCTP, dATP, dGTP), and a thermostable polymerase.
  • PCR is performed on all of the reactions by heating and cooling to specific locus-dependent temperatures that are given by the known PCR conditions.
  • the entire cycle of annealing, extension, and denaturation is repeated multiple times (ranging from 20-40 cycles depending on the efficiencies of the reactions and sensitivity of the detection system) (Innis, M.A., Gelfand, D.H., Sninsky, J.J., and White, T.J. 1990. PCR Protocols: A Guide to Methods and Applications. San Diego, CA: Academic Press.), incorporated by reference.
  • the thermocycling protocol on the Perkin-Elmer PCR System 9600 machine is:
  • each reaction tube containing the amplified DNA from a specific location of the genome.
  • Each mixture includes the DNA that was synthesized from the two alleles of the diploid genome (a single allele from haploid chromosomes, as is the case with the sex chromosomes in males or in instances of cells in which a portion of the chromosome has been lost, such as occurs in tumors, or no alleles when both are lost).
  • the free deoxynucleotides and primers may be separated from the PCR products by filtration using commercially available filters (Amicon, "Purification of PCR Products in Microcon Microconcentrators," Amicon, Beverly, MA, Protocol Publication 305; A. M. Krowczynska and M. B. Henderson, "Efficient Purification of PCR Products Using Ultrafiltration," BioTechniques, vol. 13, no. 2, pp. 286-289, 1992), incorporated by reference. Referring to figure 13, step 3 is for purification of the amplified complementary lower DNA strand.
  • the lower biotinylated strand is purified from the upper strand by using magnetic streptavidin coated beads (Dynal International, Oslo, Norway).
  • the steps of Dynabead preparation, PCR product immobilization, DNA duplex melting using a 0.1M NaOH solution, and separation of the upper and lower DNA strands to purify the lower strand are done as described (DYNAL 1993. Dynabeads biomagnetic separation system, Technical Handbook: Molecular Biology, Dynal International, Oslo, Norway), incorporated by reference.
  • step 4 is for nucleic acid synthesis of the upper strand.
  • the purified amplified lower DNA strand serves as a template for a sequencing reaction.
  • the sequencing reaction provides a template-directed synthesis that extends the upper strand across the CA-repeat region.
  • the nucleotides used are:
  • dNTPs: Extension is largely restricted to the repetitive sequence by including only dNTPs that appear in the repeat unit.
  • dATP and dCTP are used.
  • One or both of these dNTPs are labeled with a detectable label *, preferably a radioisotope such as 35S or 32P (DuPont NEN Research Products, Boston, MA), or a fluorescent probe (Biological Detection Systems, Pittsburgh, PA).
  • Termination is restricted to nucleotides not contained in the repetitive sequence.
  • ddGTP or ddTTP (ddUTP) are used, depending on the sequence of the marker.
  • the termination molecule is labelled with a second label **, that is distinct from the first label *, and can be independently detected.
  • fluorescein-labeled ddNTP (DuPont NEN Research Products, Boston, MA) is a convenient second label **.
  • a highly processive polymerase enzyme having little or no exonuclease activity is preferably used, such as Sequenase 2 (U.S. Biochemical, Cleveland, OH). Protocols optimized for the selected enzyme (United States Biochemical 1994.
  • USB Sequenase version 2.0 DNA sequencing kit sequencing protocols, 9th edition, product number 70770, Amersham Life Science, Arlington Heights, IL), incorporated by reference, are applied, with the (labeled and unlabeled) dNTPs and ddNTPs described above substituted for the dNTPs and ddNTPs contained in the conventional sequencing protocol.
  • the use of Mn buffer can be helpful when synthesizing short sequences.
  • step 5 is for detecting signals from the synthesized nucleic acids.
  • the newly synthesized upper DNA sequence formed by means of the DNA sequencing reaction remains hybridized to the biotinylated lower strand, which in turn is tightly bound to the streptavidin beads.
  • the DNA sequencing primers, nucleotides, and other reagents are removed by repeated gentle washing with a buffer that promotes double stranded DNA, such as the Dynabead binding and washing buffer (DYNAL 1993. Dynabeads biomagnetic separation system, Technical Handbook: Molecular Biology, Dynal International, Oslo, Norway), leaving only the bound duplex DNA containing the desired purified product.
  • Fluorescence signals are detected and quantitated, preferably by means of a fluorimeter.
  • Radioactive signals are detected and counted, preferably by means of a scintillation counter.
  • step 6 is for analyzing the detected signals to determine the genotype sum (or average).
  • Precalibration with a set of predetermined reference alleles can establish the scale factor, and any deviations from linearity.
  • PCR stutter artifact is accounted for by deconvolution with the known stutter distribution (Perlin, M.W., Burks, M.B., Hoop, R.C., and Hoffman, E.P. 1994.
  • this analysis procedure computes the genotype.
  • this procedure computes the average (or, equivalently, the sum) of the alleles.
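The calibration and averaging described above can be illustrated with a minimal sketch. It assumes, as a simplification not spelled out in this passage, that the ratio of the repeat label * signal to the terminator label ** signal grows linearly with the mean repeat count, so that reference alleles of known size fix a single scale factor. The function names and the through-the-origin fit are illustrative assumptions, and stutter deconvolution is not modeled.

```python
def fit_scale_factor(reference_ratios, reference_mean_repeats):
    """Least-squares slope through the origin: mean_repeats ~ scale * ratio.

    reference_ratios: (label * signal) / (label ** signal) for reference alleles.
    reference_mean_repeats: known mean repeat counts of those references.
    """
    numerator = sum(r * n for r, n in zip(reference_ratios, reference_mean_repeats))
    denominator = sum(r * r for r in reference_ratios)
    if denominator == 0:
        raise ValueError("reference ratios must not all be zero")
    return numerator / denominator


def genotype_average(repeat_signal, terminator_signal, scale):
    """Estimated mean repeat count of the alleles in one reaction."""
    if terminator_signal <= 0:
        raise ValueError("terminator signal must be positive")
    return scale * repeat_signal / terminator_signal


def genotype_sum(repeat_signal, terminator_signal, scale, ploidy=2):
    """Sum of allele sizes, i.e. the average times the number of alleles."""
    return ploidy * genotype_average(repeat_signal, terminator_signal, scale)
```

A deviation from linearity, which the text also mentions, would call for a richer calibration curve than the single slope used here.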
  • step 4' is for nucleic acid synthesis of the upper strand, and is comprised of the steps:
  • the purified amplified lower DNA strand serves as a template for a sequencing reaction.
  • the sequencing reaction provides a template-directed synthesis that extends the upper strand across the CA-repeat region.
  • the nucleotides used are:
  • dNTPs: Extension is largely restricted to the repetitive sequence by including only dNTPs that appear in the repeat unit. For a CA-repeat, only dATP and dCTP are used. These are both unlabeled.
  • a highly processive polymerase enzyme having little or no exonuclease activity is preferably used, such as Sequenase 2 (U.S. Biochemical, Cleveland, OH). Protocols optimized for the selected enzyme (United States Biochemical 1994.
  • USB Sequenase version 2.0 DNA sequencing kit sequencing protocols, 9th edition, product number 70770, Amersham Life Science, Arlington Heights, IL
  • the unlabeled dNTPs described above are substituted for the dNTPs and ddNTPs contained in the standard sequencing protocol. Washing with the stabilizing Dynabead binding and washing buffer is then done 2-4 times (DYNAL 1993. Dynabeads biomagnetic separation system. Technical Handbook: Molecular Biology, Dynal International, Oslo, Norway) to remove the unincorporated primers and dNTPs, and thereby purify the duplex DNA comprised of lower strand template and partially synthesized unlabeled upper strand DNA.
  • step 4'b is for heteroduplex formation between different alleles of the upper and lower strands.
  • sodium hydroxide is used to melt the duplex, and an equimolar amount of hydrochloric acid is then subsequently used to reanneal (DYNAL 1993. Dynabeads biomagnetic separation system, Technical Handbook: Molecular Biology, Dynal International, Oslo, Norway). Specifically (p. 23), using the bead-immobilized double stranded product,
  • the denaturing and renaturing is done by heating the duplex DNA solution to a temperature of 65°C to 95°C for a period of 2 to 30 minutes, and then gradually cooling the solution over a period of 15 to 90 minutes to a temperature between 25°C and 40°C.
  • step 4'c is for labeled restricted synthesis of the upper strand.
  • the purified amplified lower DNA strand serves as a template for continuing the sequencing reaction.
  • the template-directed synthesis continues the upper strand sequencing across the CA-repeat region.
  • the nucleotides used are:
  • dNTPs: Extension is largely restricted to the repetitive sequence by including only dNTPs that appear in the repeat unit.
  • dATP and dCTP are used for a CA-repeat.
  • dNTPs are labeled with a detectable label *, preferably a radioisotope such as 35S or 32P (DuPont NEN Research Products, Boston, MA), or a fluorescent probe (Biological Detection Systems, Pittsburgh, PA).
  • Termination is restricted to nucleotides not contained in the repetitive sequence.
  • ddGTP or ddTTP (ddUTP) are used, depending on the sequence of the marker.
  • the termination molecule is labelled with a second label **, that is distinct from the first label *, and can be independently detected.
  • fluorescein-labeled ddNTP (DuPont NEN Research Products, Boston, MA) is a convenient second label **.
  • a highly processive polymerase enzyme having little or no exonuclease activity is preferably used, such as Sequenase 2 (U.S. Biochemical, Cleveland, OH). Protocols optimized for the selected enzyme (United States Biochemical 1994.
  • USB Sequenase version 2.0 DNA sequencing kit, sequencing protocols, 9th edition, product number 70770, Amersham Life Science, Arlington Heights, IL) are applied, and the (labeled and unlabeled) dNTPs and ddNTPs described above are substituted for the dNTPs and ddNTPs contained in the standard sequencing protocol.
  • this heteroduplex product is comprised of unlabeled primer, an unlabeled repetitive sequence with about s repeated CA units, a *-labeled repetitive sequence with about (t-s) repeated CA units, and has a **-labeled terminator dye.
  • step 6' is for analyzing the detected signals to determine the genotype difference.
  • Precalibration with a set of predetermined reference alleles can establish this scale factor, and any deviations from linearity.
  • PCR stutter artifact is accounted for by deconvolution with the known stutter distribution (Perlin, M.W., Burks, M.B., Hoop, R.C., and Hoffman, E.P. 1994.
  • this analysis procedure computes the difference between the two alleles of the genotype.
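Once one experiment yields the sum of the two allele sizes and another yields their difference, the two alleles follow by simple arithmetic. The sketch below is only an illustration of that arithmetic; the function name, the rounding of noisy measurements to whole repeat counts, and the parity check are assumptions rather than steps taken from the text.

```python
def alleles_from_sum_and_difference(allele_sum, allele_difference):
    """Recover (smaller, larger) repeat counts from a measured sum and difference.

    Measurements are rounded to the nearest integer before solving
        larger + smaller = sum,  larger - smaller = difference.
    """
    total = round(allele_sum)
    diff = abs(round(allele_difference))
    if (total + diff) % 2 != 0:
        raise ValueError("sum and difference have inconsistent parity")
    larger = (total + diff) // 2
    smaller = (total - diff) // 2
    if smaller < 0:
        raise ValueError("difference exceeds sum")
    return smaller, larger
```

For example, a measured sum near 36 and a difference near 4 would correspond to alleles of 16 and 20 repeats; a homozygote gives a difference of 0.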
  • a method is described for determining STR alleles by nucleic acid synthesis that is comprised of the steps:
  • dNTPs: Nucleotides that are restricted to the composition of the repetitive unit, at least one of which is labeled with the repeat counter first label *.
  • this could be *-dATP and dCTP.
  • reporter R: A biotinylated reporter R that is added after the reducing agent has cleaved the biotinylated PCR primer from the streptavidin beads.
  • the reporter R is a biotinylated terminating ddNTP that is added by means of a sequencing enzyme.
  • reporter R is a biotinylated oligonucleotide that is added as the right flanking sequence of the repetitive sequence by means of a ligation enzyme.
  • the detection reagents used for the required labeling may include (but are not limited to) radioactivity, fluorescence, phosphorescence, chemiluminescence, electrical resistivity, pH, and ionic concentration.
  • the lower strand can be sequenced, instead of the upper strand.
  • a repetitive unit other than CA, but containing no more than three distinct nucleotides can be used.
  • dNTPs are used for every nucleotide in the repetitive unit, with at least one of the repetitive unit nucleotides labeled with the first label *, and ddNTP( ⁇ ) are used for every nucleotide not in the repetitive unit, with the appropriate terminating nucleotide immediately following the repetitive sequence labeled with the second label **.
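The nucleotide selection rule above (extend with the bases in the repeat unit, terminate with the bases outside it) can be written as a short sketch. The function name and the returned string labels are illustrative assumptions; which dNTP carries label * and which ddNTP carries label ** is left out.

```python
BASES = ("A", "C", "G", "T")


def nucleotide_sets(repeat_unit):
    """Return (extension dNTPs, terminating ddNTPs) for a repeat unit.

    The repeat unit is given as it appears in the strand being synthesized,
    e.g. "CA". Units using all four bases leave no terminator, so they are
    rejected, matching the "no more than three distinct nucleotides" limit.
    """
    unit = set(repeat_unit.upper())
    if not unit or not unit <= set(BASES):
        raise ValueError("repeat unit must contain only A, C, G, T")
    if len(unit) > 3:
        raise ValueError("repeat unit must use at most three distinct bases")
    extension = sorted("d" + base + "TP" for base in unit)
    termination = sorted("dd" + base + "TP" for base in BASES if base not in unit)
    return extension, termination
```

For a CA-repeat this gives dATP and dCTP for extension and ddGTP and ddTTP as candidate terminators, in line with the nucleotides named earlier in the list.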
  • the hybridization panel method for genotyping STRs is distinguished from the loop mismatch method described previously in that the determination of an STR's alleles is accomplished with an entire panel of hybridization probes, rather than determining the alleles with only two loop mismatch hybridization experiments.
  • This hybridization panel method generally entails more hybridization experiments per STR than the loop mismatch method.
  • this approach is applicable to the determination of specific nucleotide sequences related to genomic DNA, specific genes, and known mutations.
  • the central idea of the hybridization panel method for genotyping STR alleles is to have a detection panel of DNA probes.
  • This panel measures the extent of specific DNA binding of the patient's DNA against a set of probes.
  • a second coordinate of information can optionally be obtained by performing the reactions over a range of reaction stringencies (e.g., using temperature, ion concentration, or DNA denaturants).
  • the result is a mapping from one or two coordinates (probe and stringency) into the reaction energetics (binding affinity).
  • let L(CA)nR be one allele in the patient's PCR product for a given STR reaction chamber in the two dimensional array.
  • L is the left flanking region DNA subsequence
  • R is the right flanking region DNA subsequence
  • n is the number of allelically varying CA repeats, so that (CA)n is the middle DNA subsequence of length 2n.
  • the right PCR primer is a suffix subsequence of the right flanking region R.
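The allele notation L(CA)nR can be made concrete with a small sketch that assembles an allele from its three parts and recovers n from a sequence. The flank strings used in any call are hypothetical placeholders, not sequences from the source, and the function names are illustrative.

```python
def allele_sequence(left_flank, n_repeats, right_flank, repeat_unit="CA"):
    """Concatenate L, (unit)n and R into one allele sequence."""
    if n_repeats < 0:
        raise ValueError("n_repeats must be non-negative")
    return left_flank + repeat_unit * n_repeats + right_flank


def count_repeats(sequence, left_flank, right_flank, repeat_unit="CA"):
    """Return n for a sequence of the exact form L(unit)nR."""
    if not (sequence.startswith(left_flank) and sequence.endswith(right_flank)):
        raise ValueError("flanking regions do not match")
    middle = sequence[len(left_flank):len(sequence) - len(right_flank)]
    n, remainder = divmod(len(middle), len(repeat_unit))
    if remainder or middle != repeat_unit * n:
        raise ValueError("middle region is not a pure tandem repeat")
    return n
```

With a dinucleotide unit the middle subsequence has length 2n, as stated above.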
  • each detection panel is customized to the PCR product of its STR allele.
  • This is done by providing a panel of allele specific oligonucleotides (ASOs) (Lemna, W.K., Feldman, G.L., Kerem, B.-S., Fernbach, S.D., Zevkovich, E.P., O'Brien, W.E., Riordan, J.R., Collins, F.S., Tsui, L.-C., and Beaudet, A.L. 1990. Mutation analysis for heterozygote detection and the prenatal diagnosis of cystic fibrosis. N. E. J.
  • each ASO contains an allele-specific left flanking region, concatenated with a number n of repeat unit nucleotides, concatenated with an allele-specific right flanking region.
  • the lengths of the left and right regions flanking the varying size repeat polymer are individually adjusted to ensure that the left and right oligomers have roughly the same DNA binding energies when hybridizing to their respective complementary DNA strands.
  • thermodynamic basis for this (and alternative) approaches is that while perfect DNA duplex matches will have minimum energy, mismatches will induce bulges or loops in the DNA duplex molecule that increase the free energy. A two base-pair bulge will have sufficiently increased free energy (Ninio, J. 1979. Biochimie, 61: 1133. Salser 1977. Cold Spring Harbor Symp. Quant. Biol., 42: 985.), incorporated by reference, to reduce binding affinity by several kcal/mole relative to a perfect match; the larger the bulge, the more unfavorable the binding.
  • a panel of ASOs that provide for all values of n is used to determine the m values expressed from the PCR product.
  • the panel of target probes is constructed as the set of DNA sequences formed by concatenating L0, (CA)n, and R0, as
  • the complementary PCR source products have the form
  • one detection panel is provided for the PCR products of each genetic marker.
  • Each detection panel corresponds to one marker locus, and is embedded at that locus' coordinate in the spatially localized PCR marker grid.
  • the two surfaces (PCR and detection) may be separate or composite.
  • the oligomers flanking the STR region are (in general) different for every genetic marker. That is, the target probe panel sequences are customized to each genetic marker.
  • a second coordinate of hybridization stringency would be added.
  • This stringency variation can be implemented by varying any of several factors in the hybridization, including temperature, ion concentration, formamide concentration, and nucleotide composition (Sambrook, J., Fritsch, E.F., and Maniatis, T. 1989. Molecular Cloning, second edition. Plainview, NY: Cold Spring Harbor Press.), incorporated by reference.
  • the two coordinates of differential targets and differential stringency give an even clearer signature for STR alleles.
  • the signature of two alleles is formed by superimposing those of single alleles.
  • unique signatures in one or two coordinates
  • the separation of the superimposed patterns to effect genotyping can be done without recourse to such a library of signatures by curve fitting or deconvolution processing.
  • the stringency variation can be effected by temperature ramp, or by changing the chemical environment of the hybridization over time.
  • An alternative embodiment uses an identical detection panel of target oligonucleotides for every genetic locus.
  • each grid is comprised of the target panel { (CA)n | n varies across all interesting polymorphisms }.
  • n could range from 10 to 40.
  • intentional DNA pairing mismatch is introduced to bias the hybridization against further STRs. This can be done by a three-fold expansion of these probes by adding a mismatching base pair at one end. For example, with CA-repeats as the STR, these four probe families are possible for every n:
  • the same detection panel is used for every genetic locus, but intentional mismatch is introduced by changing the target DNA composition.
  • with CA-repeats as STRs, a family of (CA)n or (GT)n probes is used, but changes are introduced in specific bases. For example, some G's are changed to C's, or to the energetically similar base inosine.
  • the doping is introduced in the source molecule, rather than in the targets. This has the advantage of requiring just one target DNA molecule (i.e., a very large repeated oligomer) for all the genetic loci. Thus, the manufacturing costs are greatly reduced, since replicated complex panels for each locus are not needed.
  • the extent of doping is introduced (say, with inosine) as a variable into the PCR reaction itself.
  • the doping is random across the PCR products, but has constant statistics, particularly in the repetitive unit region of the unknown STR PCR product molecule. If two coordinate signatures are desired, hybridization stringency variation can be introduced as well.
  • a single STR detection probe is used for all experiments.
  • Using a single probe, say (CA)n (n large and fixed), dramatically reduces manufacturing costs.
  • a temperature ramp experiment is then conducted in parallel for every genetic locus by varying stringency.
  • For each PCR product with GT-repeat length k, when its subpopulation of (GT)k sequences rapidly melts, there will be a sharp change in the melting profile.
  • This will be detectable as a peak in the first derivative of the curve. The peaks provide a DNA size vs. concentration mapping that can then be used to determine the alleles.
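Locating those derivative peaks can be sketched numerically. The example below assumes the melting profile is sampled as a bound-signal value at increasing temperatures and that the signal falls as duplexes melt; the finite-difference scheme, the `min_height` threshold, and the function names are illustrative choices, and mapping a peak temperature to a repeat length k would need a separate calibration that is not shown.

```python
def first_derivative(temperatures, signal):
    """Forward finite-difference slope between consecutive samples."""
    return [
        (signal[i + 1] - signal[i]) / (temperatures[i + 1] - temperatures[i])
        for i in range(len(temperatures) - 1)
    ]


def melting_peaks(temperatures, signal, min_height):
    """Temperatures at which the melting rate (-d signal / dT) peaks.

    A peak is an interval whose melting rate is at least min_height, strictly
    greater than the previous interval and not less than the next one. The
    midpoint temperature of that interval is reported.
    """
    rate = [-slope for slope in first_derivative(temperatures, signal)]
    midpoints = [
        (temperatures[i] + temperatures[i + 1]) / 2 for i in range(len(rate))
    ]
    peaks = []
    for i in range(1, len(rate) - 1):
        if rate[i] >= min_height and rate[i] > rate[i - 1] and rate[i] >= rate[i + 1]:
            peaks.append(midpoints[i])
    return peaks
```

Two well separated peaks would then suggest a heterozygote, and a single peak a homozygote, under these simplifying assumptions.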
  • STR repeat units of any size.
  • the newer trinucleotide repeats, tetranucleotide repeats, etc. are more favorable energetically, and provide greater allele differentiation.
  • the size of the bound detection oligonucleotide is adjusted to maximally discriminate between a perfect match and a single base pair mismatch.
  • An alternative to detecting perfect vs. mismatched heteroduplexes is using chemical modification reagents (such as CII, CAA, OsO4, or hydroxylamine) that can react with single nucleotide mismatches and then be detected.
  • Nested PCR (Yourno 1992. A Method for Nested PCR with Single Closed Reaction Tubes. PCR Meth. Appl., 2(1): 60-65. Innis, M.A., Gelfand, D.H., Sninsky, J.J., and White, T.J. 1990. PCR Protocols: A Guide to Methods and Applications. San Diego, CA: Academic Press.), incorporated by reference, can be done for a purer PCR amplification to reduce noise.
  • LCR Ligase chain reaction
  • both strands must be nucleic acids. Whether these are comprised of DNA, RNA, or any other nucleic acid polymer is nonessential. The key requirement is the binding specificity of complete and partial sequence matches. Further, these nucleic acids are modified (e.g., with linker molecules, biotin, detection moieties) to perform the detection components of the method.
  • FIG 16 a schematic representation is shown of an assay for determining STR alleles from a nucleic acid ligation step.
  • Standard oligonucleotide ligation assay (OLA) assays for the exact match of a pair of oligonucleotides X and Y against a DNA template molecule previously amplified by PCR (Landegren, U., Kaiser, R., Sanders, J., and Hood, L. 1988. A ligase-mediated gene detection technique. Science, 241: 1077-1080; Innis, M.A., Gelfand, D.H., Sninsky, J.J., and White, T.J. 1990. PCR Protocols: A Guide to Methods and Applications. San Diego, CA: Academic Press), incorporated by reference. Following amplification with the PCR primers L and R', two ligation oligonucleotides are conventionally used:
  • (X) initiates the matching sequence from the 5' end, and is biotinylated
  • (Y) completes the matching sequence to the 3' end, and is labeled (e.g., with radiolabel or fluorescent label).
  • the 5' end of Y is phosphorylated to allow ligation to X.
  • variable length repeat precludes the described use of this assay.
  • CA-repeat alleles can be detected.
  • Zk bridges the gap between X and Y.
  • the 5' end of Zk is phosphorylated to allow ligation to X.
  • the phosphorylated Y is ligated to Zk.
  • This CA-repeat detection differs from conventional ligation assays in that (a) a three-way ligation is performed, (b) a set of intermediate molecules is used, (c) these intermediate molecules are universally reusable for assaying more than one CA-repeat marker, and (d) a sequence of varying length can be detected.
  • the best Zk's which have the strongest signals determine the alleles. This detection can be improved on by deconvolving the panel of signals with the known PCR stutter pattern of the alleles (Perlin, M.W., Burks, M.B., Hoop, R.C., and Hoffman, E.P. 1994.
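Reading the alleles off a panel of Zk ligation signals can be sketched as picking the strongest one or two signals. The sketch below is a simplification: it skips the stutter deconvolution mentioned above, and the `min_fraction` threshold used to decide between a homozygote and a heterozygote is a hypothetical parameter, not a value from the source.

```python
def call_alleles(signals, min_fraction=0.5):
    """Call a genotype from {repeat count k: ligation signal for Zk}.

    The strongest signal gives one allele. The second strongest is accepted
    as a second allele only if it reaches min_fraction of the strongest;
    otherwise the sample is called homozygous.
    """
    if not signals:
        raise ValueError("no signals supplied")
    ranked = sorted(signals.items(), key=lambda item: item[1], reverse=True)
    best_k, best_signal = ranked[0]
    if len(ranked) > 1 and ranked[1][1] >= min_fraction * best_signal:
        second_k = ranked[1][0]
        return tuple(sorted((best_k, second_k)))
    return (best_k, best_k)
```

In practice stutter bands adjacent to a true allele can carry substantial signal, which is why the text recommends deconvolution before a call of this kind.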
  • ligation chain reaction is performed, rather than a PCR amplification followed by an OLA detection step.
  • This embodiment uses the three oligonucleotides X, Y, and Z described above. Specific protocols can be found in (Ausubel, F.M., Brent, R., Kingston, R.E., Moore, D.D., Seidman, J.G., Smith, J.A., and Struhl, K., ed. 1993. Current Protocols in Molecular Biology. New York, NY: John Wiley and Sons; Dracopoli, N.J., Haines, J.L., Korf, B.R., Morton, C.
  • FIG 17 a schematic representation is shown of an assay for determining STR alleles from a nucleic acid loop ligation step.
  • Two unique primers for a specific microsatellite are constructed.
  • the primers are selected to flank the tandem repeat but to leave at least 15 to 20 bp of internal unique sequence flanking the repeat region.
  • the oligonucleotide is designed to have significant base mismatching if there is "slippage" and a portion of the oligonucleotide extends into the 5' and 3' portions of the tandem repeat.
  • the degree of extension into the repeat can be varied but is done so that the bridging oligonucleotides are smaller, preferably by 15-20 nucleotides, than the loop oligonucleotide.
  • a melting temperature for the loop oligonucleotide that is about 10° higher than the largest bridging oligonucleotide is desirable.
  • the loop oligonucleotide is biotinylated or covalently bound to a support matrix or surface.
  • the loop oligonucleotide is bound to paramagnetic beads that are covalently linked to streptavidin.
  • the loop oligonucleotide is phosphorylated at the 5' end.
  • the microsatellite marker is amplified using standard PCR primers and conditions.
  • the double-stranded DNA is denatured and annealed to the loop oligonucleotide.
  • the conditions of the annealing are such that the concentrations of the DNA and oligonucleotide are relatively low to discourage concatemer formation; the loop oligonucleotide should be present in excess with respect to the PCR product.
  • the hybridization is performed at a sufficient temperature (preferably 37°C) in 0.1x SSC or a comparable buffer such that the annealed loop oligonucleotide and PCR strand are stable, but simple annealing within the tandem repeat of the two PCR DNA strands is disfavored.
  • the annealing is performed at a low concentration in a minimum volume of 200 microliters in order to disfavor concatemer formation.
  • part A the original PCR primers do not need to be removed prior to the annealing. After the annealing is completed, the unhybridized DNA and primers are eliminated by washing with the hybridization buffer.
  • both specificity and sensitivity are achieved by hybridizing the PCR product with the loop oligonucleotide.
  • the structure is annealed (in a set of separate chambers or positions) with a set of bridging oligonucleotides that represent different multiples of the tandem repeat.
  • the bridging oligonucleotide is complementary to the PCR'd DNA strand that is hybridized to the loop oligonucleotide.
  • the bridging oligonucleotide is labeled with radioactivity or another detection tag such as fluorescein.
  • the bridging oligonucleotide is phosphorylated at the 5' end.
  • the exonuclease reaction is carried out to digest all noncircularized, single- or double-stranded DNAs and primers.
  • the remaining material on the support matrix represents the undigested circularized loop oligonucleotide and bridging oligonucleotide.
  • Bridging oligonucleotides that are too short or too long to perfectly close the loop oligonucleotide are ligated to one end of the loop oligonucleotide but cannot allow the structure to circularize.
  • these partially ligated products are then eliminated during the exonuclease step.
  • the nondegraded products (the circularized strands) are bound to the streptavidin-paramagnetic beads in a 500 µl tube, washed three times with 200 µl of washing buffer and then counted directly or denatured off of the beads using the loading buffer/dye for sequencing gels and run on a standard denaturing sequencing gel.
  • the annealing and ligation of the bridge and loop oligonucleotides to create a circular structure is performed as a two-stage process to discourage concatemer formation.
  • in this protocol, only the bridge oligonucleotide is phosphorylated.
  • the reaction is identical to that described until the end of the ligation step.
  • the sample is denatured at 95°C for 5 minutes and 0.1 unit of T4 Polynucleotide kinase is added at 37°C for 30 minutes. This phosphorylates the 5' ends of the loop oligonucleotides.
  • the reaction is then again heated at 95°C for 2-5 minutes and the samples are diluted 100 fold in 1x ligase buffer to promote circularization.
  • IPM Inner Product Mapping
  • the probe's RH signature is compared with the RH signature of every STS.
  • the signatures match at some RH (i.e., ++ or --)
  • this indicates concordance between the two signatures
  • a mismatch i.e., +- or -+
  • this indicates discordance between the signatures.
  • the sum of the matches minus the sum of the mismatches is computed, which generates a profile curve across the chromosome.
  • the peak of this profile suggests the location of the probe.
  • a feature of IPM is its ability to map accurately using few experiments: a logarithmic number of RHs provides linear resolving power.
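The matches-minus-mismatches score is an inner product when each signature is encoded as a vector of +1 and -1 entries, one per RH. The sketch below assumes that encoding and assumes the STS signatures are supplied in map order; the function names are illustrative.

```python
def inner_product(probe_signature, sts_signature):
    """Matches minus mismatches between two +1/-1 signatures."""
    if len(probe_signature) != len(sts_signature):
        raise ValueError("signatures must cover the same RHs")
    return sum(p * s for p, s in zip(probe_signature, sts_signature))


def ipm_profile(probe_signature, sts_signatures):
    """Score the probe against every STS, in chromosome order."""
    return [inner_product(probe_signature, sts) for sts in sts_signatures]


def best_location(profile):
    """Index of the STS at the peak of the profile."""
    return max(range(len(profile)), key=lambda i: profile[i])
```

Each product is +1 for a ++ or -- pair and -1 for a +- or -+ pair, so the sum equals the number of matches minus the number of mismatches.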
  • Recombination events in meiosis cause the founders' chromosomal regions to be retained or lost in progeny.
  • the location of this probe is suggested by the concordance of chromosomal regions that affected (or carrier) individuals share with founder(s) (++), or those regions which unaffected individuals do not share with founder(s) (--).
  • Step 1 phenotypic information is obtained on a set of related individuals.
  • Step 2 a dense genotyping across a chromosome using highly-polymorphic STSs is obtained for all informative pedigree members; in the preferred embodiment, this is done with the apparatus of figure 1.
  • using phase known genotypes, haplotyping is done wherever possible.
  • the founder genotype is obtained directly from the founder (if available) , or constructed indirectly as the union of alleles at each locus for every carrier or affected child of the founder.
  • Step 3 let v(i) be the sign of the phenotype of an individual i, where
  • w(i,m,a) the weight accorded the triple, as follows.
  • IPM identity-by-state
  • the w(i,m,a) term weights for the probability that an allele a was transmitted to individual i at marker m by the founder. That is, an accounting for identity-by-descent (IBD) is done.
  • IBD identity-by-descent
  • the probability of descent at a marker from the founder for an allele on the chromosome is computed.
  • the product of these link probabilities over every link in the inheritance path therefore provides an estimate of the probability of descent. Linearly rescaling this descent probability from the range [0,1] by the function
  • Step 5 a concordance is computed for every allele of every STS marker by summing over the chromosomes of the individuals i as
  • c(m,a) = SUM (over i) [ v(i) * w(i,m,a) ].
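The concordance sum c(m,a) can be sketched directly from its definition. The encoding of v(i) as +1 for affected or carrier individuals and -1 for unaffected individuals is an assumption made for this illustration, since the source's definition of v(i) is not reproduced in this excerpt; the weights w(i,m,a) are taken as given inputs, and the dictionary layout is likewise hypothetical.

```python
from collections import defaultdict


def concordance(v, w):
    """Compute c(m, a) = sum over i of v(i) * w(i, m, a).

    v: {individual: phenotype sign} (assumed +1 or -1).
    w: {(individual, marker, allele): weight}.
    Returns {(marker, allele): concordance}.
    """
    totals = defaultdict(float)
    for (individual, marker, allele), weight in w.items():
        totals[(marker, allele)] += v[individual] * weight
    return dict(totals)
```

As a usage example, three relatives scored at one marker:

```python
v = {"p1": 1, "p2": 1, "p3": -1}
w = {("p1", "D1", 15): 1.0, ("p2", "D1", 15): 0.5, ("p3", "D1", 15): 0.5}
c = concordance(v, w)  # c[("D1", 15)] is 1.0 + 0.5 - 0.5
```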
  • Step 7 the genetic regions correlating with the trait are localized.
  • the concordance function C(m) computes a profile over the chromosome. Where this profile shows a pattern on the chromosome that rises up to a peak, and then again descends from it, suggests the location of the gene (near the peak).
  • the unaffected individuals are weighted to have less influence.
  • Dense genotypes are obtained for related sets of individuals; in the preferred embodiment, this is done with the apparatus of figure 1.
  • Step 8 of figure 12 the genetic patterns obtained in Step 7 are used to assess the risk of individuals for various traits and diseases.
  • Step 9 the localization of disease genes on a genetic map is used to initiate the cloning of the gene via positional cloning techniques (Kerem, B.-S., Rommens, J.M., Buchanan, J.A., Markiewicz, D., Cox, T.K., Chakravarti, A., Buchwald, M., and Tsui, L.-C. 1989. Identification of the cystic fibrosis gene: genetic analysis.
  • Genotyping can be used for actuarial analysis of health risks in order to predict and reduce health care costs. Genotyping also finds application in transplantation (Scharf, S., Saiki, R., and Ehrlich, H. 1988. New methodology for HLA class II oligonucleotide typing using polymerase chain reaction (PCR) amplification. Hum.
  • the loop mismatch methods described can detect exon repeats that correlate with disease and prognosis, as well as exon alleles
  • Dense genotyping can be used to detect the occurrence of chromosomal patterns in a population.
  • This applies in law enforcement applications (Jeffreys, A.J., Brookfield, J.F.Y., and Semeonoff, R. 1985. Positive identification of an immigration test-case using human DNA fingerprints. Nature, 317: 818-819.), incorporated by reference, for genetically fingerprinting individuals, as well as in paternity testing to assess parenthood.
  • Genotyping can monitor the changes in the chromosomal patterns of populations, including:
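The concordance sum of Step 5 can be illustrated with a short sketch. The excerpt does not define v(i) or w(i,m,a), so the sketch assumes that v(i) is a per-individual weight (with unaffected individuals weighted lower) and that w(i,m,a) counts the copies of allele a that individual i carries at marker m. The function name and the toy data are hypothetical.

```python
from collections import defaultdict

def concordance(v, w):
    """Return c[(m, a)] = sum over individuals i of v[i] * w[(i, m, a)]."""
    c = defaultdict(float)
    for (i, m, a), copies in w.items():
        c[(m, a)] += v[i] * copies
    return dict(c)

# Hypothetical toy data: ind3 is unaffected and therefore down-weighted.
v = {"ind1": 1.0, "ind2": 1.0, "ind3": 0.25}
# Assumed meaning: copies of allele a at marker m carried by individual i.
w = {
    ("ind1", "M1", 12): 1, ("ind1", "M1", 15): 1,
    ("ind2", "M1", 12): 2,
    ("ind3", "M1", 15): 2,
    ("ind1", "M2", 20): 2,
    ("ind2", "M2", 22): 2,
    ("ind3", "M2", 20): 2,
}
c = concordance(v, w)
for key in sorted(c):
    print(key, c[key])
```

Under these assumptions, an allele shared by the heavily weighted individuals scores highest at its marker, which is the kind of signal a profile over the chromosome would build on.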

Abstract

Modern medicine exploits relatively little of an individual's genetic composition in directing preventative, diagnostic, or therapeutic interventions. However, tracing the descent of chromosomal segments within families and populations, and then correlating with phenotypic traits, would enable accurate assessment of risk for common multifactorial diseases. This information could then be used to customize medical interventions to the major medical conditions for which an individual had significant risk. The primary obstacle has been the very many genotyping experiments and computations required to densely sample genomes. The invention pertains to a system that enables high-throughput genotyping, and thus the effective determination of such risks, and other useful genetic information. The invention also pertains to methods for determining the size of simple tandem repeat alleles by nucleic acid hybridization, including forming mismatched heteroduplexes and quantitating their single-stranded loop sizes.

Description

METHOD AND APPARATUS FOR ANALYZING GENETIC MATERIAL
FIELD OF THE INVENTION
The present invention pertains to a process for determining inheritance patterns in eukaryotic DNA. More specifically, the present invention is related to densely sampling the genome with polymorphic genetic markers using a hybridization-based genotyping method, and then using this genetic information to assess trait inheritance, including disease susceptibility, mendelian genetic disorders, and complex traits relevant for plant or animal husbandry. One such hybridization-based genotyping method entails forming mismatched heteroduplexes and quantitating single-stranded loop sizes.
BACKGROUND OF THE INVENTION
The specific objective of the system is genome-wide high-resolution genotyping for the purpose of health risk assessment, including genetic susceptibility for disease, and identification of disease-associated genes. The means for achieving this is genotyping polymorphic genetic loci by hybridization assays.
In meiotic recombination, large regions of parental chromosomes are interleaved and passed on to the next generation. By effecting a very dense sampling of the genome (i.e., all the chromosomes) for every individual in a large family, one can determine who has inherited which portions of which chromosomes from whom. That is, the dense sampling serves to tag the origin and descent of linear chromosomal fragments throughout the pedigree. By correlating the genotypic inheritance pattern of chromosomal fragments with the phenotypic occurrence of common multifactorial disease in individuals, culprit chromosomal regions can be identified. From this analysis, accurate risk assessments can be made for individuals based on their genotype, in the context of their entire kinship. Genomic mismatch scanning (Nelson, S.F., McCusker, J.H., Sander, M.A., Kee, Y., Modrich, P., and Brown, P.O. 1993. Genomic mismatch scanning: a new approach to genetic linkage mapping. Nature Genetics, 4 (May): 11-18.), incorporated by reference, is one such approach, but has limited throughput since experiments are done on pairs (not sets) of individuals.
Medium-resolution genome-wide genotyping at 1 centiMorgan (cM) resolution would require about three thousand highly polymorphic genetic loci. High resolution at 0.1 cM would therefore require genotyping about 30,000 genetic loci. Currently, as part of the world-wide Human Genome Project (Watson, J.D., Gilman, M., Witkowski, J., and Zoller, M. 1992. Recombinant DNA, Second Edition. New York, New York: W.H. Freeman and Company), incorporated by reference, roughly 30,000 highly polymorphic genetic sequence tagged site (STS) (Olson, M., Hood, L., Cantor, C., and Botstein, D. 1989. A common language for physical mapping of the human genome. Science, 245: 1434-35.), incorporated by reference, loci will be developed and mapped in the next three years. A sequence-tagged site is defined herein as a location on a genome characterized by at least one sequence. Much of this effort is done by Weissenbach's group at CEPH in France (Weissenbach, J., Gyapay, G., Dib, C., Vignal, A., Morissette, J., Millasseau, P., Vaysseix, G., and Lathrop, M. 1992. A second generation linkage map of the human genome. Nature, 359: 794-801), incorporated by reference, and by Lander's group at the Whitehead Institute in Cambridge, Massachusetts. STSs are readily amplified by means of the polymerase chain reaction (PCR). These STSs will largely take the form of variable number tandem repeat (VNTR) sequences (Nakamura, Y., Leppert, M., O'Connell, P., Wolff, R., Holm, T., Culver, M., Martin, C., Fujimoto, E., Hoff, M., Kumlin, I., and White, R. 1987. Variable number tandem repeat (VNTR) markers for human gene mapping. Science, 235: 1616-1622.), incorporated by reference, that have several nucleotides repeated a fixed (though highly polymorphic) number of times at any allele.
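The locus counts quoted for the two resolutions follow from dividing the genetic length of the genome by the desired marker spacing. The sketch below is a minimal illustration, assuming evenly spaced markers and a genome of roughly 3,000 cM. That length is inferred from the figure of three thousand loci at 1 cM and is not stated in the text, and the function name is hypothetical.

```python
def loci_needed(genome_length_cm: float, resolution_cm: float) -> int:
    """Number of evenly spaced markers needed for one marker per resolution_cm."""
    if resolution_cm <= 0:
        raise ValueError("resolution must be positive")
    # round() guards against floating-point noise in the division.
    return round(genome_length_cm / resolution_cm)

ASSUMED_GENOME_CM = 3000  # inferred from the text, not a measured value

for spacing in (1.0, 0.1):
    print(f"{spacing} cM spacing: {loci_needed(ASSUMED_GENOME_CM, spacing)} loci")
```

A tenfold finer spacing raises the marker count tenfold, which is why gel-based methods that scale per locus become the cost bottleneck.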
Importantly, the approach described herein centers on a detailed examination of such highly polymorphic intron genetic markers, rather than the highly conserved genes and their exon coding regions. However, the method also applies to expanded repeats within genes, and specific nucleotide alterations of specific DNA sequences.
Achieving this goal of genome-wide high-resolution genotyping requires (1) an associated technology that will reduce the cost and error of the requisite genotyping, and thus enable widespread usage. Further, this technology must be coupled with (2) data acquisition and analysis methods that allow for fully automated error detection, risk analysis, and linkage analysis for both populations and families. Completion of this analysis generates a vast amount of data, hence the results must (3) be presented in a targeted fashion to disparate groups of end-users.
Much of the following description focuses on task (1), the novel parallel genotyping apparatus for polymorphic VNTRs. The approach is to spatially localize each genetic locus in a two-dimensional array, and then locally aggregate PCR-amplified DNA products to the proper array regions. DNA hybridization studies are then performed by means of a detection mechanism to quantitate properties of the PCR products, and thereby determine the alleles (i.e., the genotype) for every genetic locus.
More precisely, a VNTR is a linear sequence of (deoxy)nucleotides of the pattern LW^nR, where W is a short DNA sequence repeated n times, contained within two flanking regions of unique sequences: the left flanking region L, and the right flanking region R. These flanking sequences establish the singularity of a specific VNTR within a haploid genome. These unique sequences allow a VNTR to be associated with a specific location within the genome such that it can be physically or genetically mapped with respect to other DNA markers and/or genetic traits and disorders. Variations in the number of repetitive elements within the VNTR are common among individuals and allow specific alleles to be tracked as they are genetically transmitted from individuals to their offspring.
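The flank-repeat-flank pattern can be made concrete with a short sketch that recovers n from a sequence when L, W, and R are known. This is only an illustration of the pattern, not the hybridization assay of the invention. The function name and the example sequences are hypothetical.

```python
from typing import Optional

def repeat_count(sequence: str, left: str, unit: str, right: str) -> Optional[int]:
    """Return n if sequence == left + unit * n + right, otherwise None."""
    if not unit:
        raise ValueError("repeat unit W must be non-empty")
    if len(sequence) < len(left) + len(right):
        return None
    if not (sequence.startswith(left) and sequence.endswith(right)):
        return None
    # Everything between the two unique flanks must be whole copies of W.
    core = sequence[len(left):len(sequence) - len(right)]
    n, remainder = divmod(len(core), len(unit))
    if remainder or core != unit * n:
        return None
    return n

# Hypothetical flanks and alleles, for illustration only.
LEFT, UNIT, RIGHT = "GATTACAGG", "CA", "TTGCCATG"
allele_1 = LEFT + UNIT * 14 + RIGHT
allele_2 = LEFT + UNIT * 21 + RIGHT
genotype = (repeat_count(allele_1, LEFT, UNIT, RIGHT),
            repeat_count(allele_2, LEFT, UNIT, RIGHT))
print(genotype)
```

The two alleles share L, W, and R and differ only in n, so the pair of repeat counts is the genotype at this locus.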
An important subclass of VNTRs is the short tandem repeat (STR), where n tends to be small (e.g., < 100), and the repeating unit is short (e.g., between two and five nucleotides). For example, a CA-repeat is an STR where the dinucleotide CA is repeated n times, where n ranges in a human population from roughly ten to forty. There are an estimated 100,000 such CA-repeat loci in the human genome. Other VNTRs include trinucleotide and tetranucleotide repeats. Following PCR, the allelic variation in tandem repeat number can be determined by DNA size measurements using polyacrylamide gel electrophoresis.
These STRs and VNTRs are important for several reasons. (1) Many VNTRs have been associated with specific diseases (e.g., Huntington's disease, fragile X syndrome) (Kremer, I., Pritchard, M., Lynch, M., Yu, S., Holman, K., Baker, E., Warren, S.T., Schlessinger, D., Sutherland, G.R., and Richards, R.I. 1991. Mapping of DNA instability at the Fragile X to a trinucleotide repeat sequence p(CCG)n. Science, 252: 1711-1714), incorporated by reference, where, in "anticipation", larger n often correlates with increased severity. (2) STRs serve as highly useful markers for specific diseases (Clemens, P., Fenwick, R., Chamberlain, J., Gibbs, R., de Andrade, M., Chakraborty, R., and Caskey, C. 1991. Linkage analysis for Duchenne and Becker muscular dystrophies using dinucleotide repeat polymorphisms. Am J Hum Genet, 49: 951-960.), incorporated by reference. (3) STRs are useful as sequence tagged sites (STSs) (Olson, M., Hood, L., Cantor, C., and Botstein, D. 1989. A common language for physical mapping of the human genome. Science, 245: 1434-35.), incorporated by reference, in physical mapping studies. (4) There is tremendous genetic polymorphism at these loci (Weber, J., and May, P. 1989. Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Am J Hum Genet, 44: 388-396.), incorporated by reference. For each polymorphic locus, n may assume a wide range of allelic values in the population. Therefore, STRs are highly polymorphic loci that can be used in genetic linkage (Ott, J. 1991. Analysis of Human Genetic Linkage. Revised Edition. Baltimore, Maryland: The Johns Hopkins University Press.), incorporated by reference, and chromosome fingerprinting (Jeffreys, A.J., Wilson, V., and Thein, S.L. 1985. Hypervariable 'minisatellite' regions in human DNA. Nature, 314: 67-73. Jeffreys, A.J., Wilson, V., and Thein, S.L. 1985. Individual-specific fingerprints of human DNA. Nature, 316: 76-78.), incorporated by reference, studies that densely sample the genome.
Since STRs are easily amplified via PCR (Innis, M.A., Gelfand, D.H., Sninsky, J.J., and White, T.J. 1990. PCR Protocols: A Guide to Methods and Applications. San Diego, CA: Academic Press. Mullis, K.B., Faloona, F.A., Scharf, S.J., Saiki, R.K., Horn, G.T., and Erlich, H.A. 1986. Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction. Cold Spring Harbor Symp. Quant. Biol., 51: 263-273. Saiki, R.K., Gelfand, D.H., Stoffel, S., Scharf, S.J., Higuchi, R., Horn, G.T., Mullis, K.B., and Erlich, H.A. 1988. Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science, 239: 487-491.), incorporated by reference, and (by definition) their alleles differ only in the repeat number n, genotyping is easily effected by measuring the total length of the PCR product. This is commonly done by spatially (or temporally) separating DNA molecules of different sizes (or conformations) using, for example, gel electrophoresis. Other well-known approaches include mass spectroscopy, denaturing gradient gel electrophoresis, and chemical assays. A newer gel-based approach is two-dimensional DNA typing (te Meerman, G.J., Mullaart, E., van der Meulen, M.A., den Daas, J.H.G., Morolli, B., Uitterlinden, A.G., and Vijg, J. 1993. Linkage analysis by two-dimensional DNA typing. Am. J. Hum. Genet., 53: 1289-1297.), incorporated by reference. However, these measurements all have associated costs. In particular, none are particularly cost effective for genotyping the thousands of STR loci that are needed for densely sampling genomes.
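Length-based genotyping rests on simple arithmetic: if the amplified product consists of the two flanking regions plus n copies of the repeat unit, then n is the product length minus the flank lengths, divided by the unit length. The sketch below assumes exact size calls in base pairs and hypothetical flank lengths. Real gel measurements carry sizing error that this sketch ignores.

```python
def repeats_from_length(product_bp: int, left_bp: int, right_bp: int, unit_bp: int) -> int:
    """Infer the repeat number n from a PCR product length in base pairs."""
    repeat_span = product_bp - left_bp - right_bp
    if repeat_span < 0 or repeat_span % unit_bp != 0:
        raise ValueError("product length is inconsistent with flank-repeat-flank structure")
    return repeat_span // unit_bp

# Hypothetical CA-repeat locus: 60 bp and 50 bp flanks, 2 bp repeat unit.
band_sizes_bp = [140, 152]
alleles = [repeats_from_length(size, 60, 50, 2) for size in band_sizes_bp]
print(alleles)
```

For a dinucleotide repeat, adjacent alleles differ by only 2 bp, so the size separation must resolve small differences at every one of the thousands of loci.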
This invention therefore describes more cost effective approaches that enable higher throughput STR genotyping. These methods employ nucleotide hybridization assays that directly measure the number of STR repeat units, rather than total fragment length. Such detections by hybridization are miniaturizable, hence parallelizable (Monaco, A.P., Lam, V.M.S., Zehetner, G., Lennon, G.G., Douglas, C., Nizetic, D., Goodfellow, P.N., and Lehrach, H. 1991. Mapping irradiation hybrids to cosmid and yeast artificial chromosome libraries by direct hybridization of Alu-PCR products. Nucleic Acids Res, 19(12): 3315-3318.), incorporated by reference, and, ultimately, highly manufacturable. Further, they can be adapted to work in chemical solutions, or on substrates with small surface area.
Two novel methods for STR allele determination at a locus are introduced, both based on genotyping by hybridization. The first method entails creating and detecting loop mismatches in heteroduplexes formed from the alleles' PCR products. The second method uses hybridization panels to determine the alleles.
SUMMARY OF THE INVENTION
The present invention pertains to an apparatus for analyzing the genetic material of an organism. The apparatus comprises means for amplifying the genetic material of the organism. The apparatus also comprises means for characterizing the amplified genetic material. The characterizing means is in communication with the amplifying means. The characterizing means contains all of the genetic material within a region having a radius of less than two feet. The amplifying means and characterizing means characterize the genetic material at a rate exceeding 100 sequence-tagged sites per hour per organism. The sequence-tagged sites are inherent to the genetic material.
Preferably, the genetic material includes nucleotide sequences. The amplifying means preferably includes a reaction plate with which the genetic material is in contact. The reaction plate has a plurality of chambers, each of which is disposed in a unique location of the plate corresponding to a location within a genome having at least one nucleotide sequence. The characterizing means preferably includes means for detecting whether a chamber contains a nucleotide sequence of the genetic material corresponding to the chamber's unique location.
The apparatus preferably also includes a thermocycler in thermal communication with the plate to heat and cool the plate. The detecting means preferably includes a detector connected to the chambers which produces a chamber signal for each chamber corresponding to genetic material in each chamber. The detecting means preferably also includes a processor in communication with the detector which receives the signal and identifies unique properties of the nucleotides in each chamber. The unique properties of the nucleotide of the genetic material in each chamber pertain to a number of nucleotides in any of the nucleotide sequences of the genetic material.
The amplifying means preferably includes at least one nucleotide sequence that corresponds to each chamber and which is in contact with the chamber. Each nucleotide sequence interacts with the nucleotide sequence of the genetic material of the nucleotide sequence if it is present.
The present invention also pertains to a method for analyzing genetic material of an organism. The method comprises the steps of amplifying the genetic material. Then there is the step of characterizing the amplified genetic material in a region having a radius of less than 20 feet at a rate exceeding 100 sequence-tagged sites per hour per organism. Preferably, the genetic material includes RNA or DNA. After the characterizing step, there preferably is the step of assessing risk of illness for which there is a genetic susceptibility in the organism. Such illnesses can include cancer, heart disease, etc.
The present invention also pertains to a method for manufacturing an apparatus for analyzing genetic material of an organism. The method comprises the steps of placing corresponding sequence-tagged sites in contact with corresponding chambers of a plate. Then, there is the step of connecting detectors to the chambers which can detect where the nucleotide sequences of the genetic material of the organism, when placed in contact with the chambers, have reacted with the corresponding sequence-tagged sites in the corresponding chamber. Then, there is the step of placing a thermocycling device in contact with the plate to cause the sequence-tagged sites in the chambers to react with genetic material of the organism that is placed in contact with the chambers. Next, there is the step of connecting a computer to the detectors and to the thermocycling device to control operation of the thermocycling device, and to receive signals which correspond to the genetic material of the organism and the sequence-tagged sites of each chamber from the detectors.
The present invention also pertains to a method for determining the size of nucleotide sequences of an STR marker contained on genetic material comprising the steps of: amplifying the nucleotide sequences of the genetic material in a region relating to the STR marker. Then there is the step of performing nucleic acid hybridizations on the amplified nucleotide sequences. Then there is the step of producing signals corresponding to the hybridizations of the amplified nucleotide sequences. Then there is the step of determining the sizes of the nucleotide sequences contained in the genetic material.
BRIEF DESCRIPTION OF THE DRAWINGS
In the accompanying drawings, the preferred embodiment of the invention and preferred methods of practicing the invention are illustrated in which:
Figure 1 is a schematic representation of a preferred embodiment of the apparatus.
Figure 2 is a schematic representation of parts of DNA molecules for name convention purposes.
Figures 3a-3d list the steps for parallel genotyping of the present invention.
Figures 4a and 4b are schematic representations of mismatched loops formed from allele DNA.
Figure 5 includes figures 5a-5c and is a schematic representation of loop mismatch for determining a sum of STR alleles.
Figure 6 includes figure 6a and is a block diagram showing loop mismatch for determining a difference of STR alleles.
Figure 7 is a flow chart for determining the STR alleles from the sum and difference.
Figure 8 is a flow chart of loop mismatch protocol for a single STR locus.
Figure 9 is a flow chart for reducing the number of PCR experiments.
Figures 10a-10c show representations for increasing measured signal from loops with respect to summation experiments.
Figures 11a and 11b are representations for increasing measured signal from loops with respect to difference experiments.
Figure 12 is a flow chart of concordance mapping for genetic patterns.
Figure 13 includes parts a-c and is a flow chart for determining an STR allele sum from a nucleic acid synthesis step.
Figure 14 includes parts a-c and is a flow chart for determining an STR allele difference from a nucleic acid synthesis step.
Figure 15 is a flow chart for determining STR alleles from a nucleic acid synthesis step.
Figure 16 is a schematic representation of an assay for determining STR alleles from a nucleic acid ligation step.
Figure 17 includes parts a-b and is a schematic representation of an assay for determining STR alleles from a nucleic acid loop ligation step.
DESCRIPTION OF THE PREFERRED EMBODIMENT
Referring now to the drawings wherein like reference numerals refer to similar or identical parts throughout the several views, and more specifically to figure 1 thereof, there is shown an apparatus for analyzing the genetic material of an organism. The apparatus comprises means for amplifying the genetic material of the organism. The apparatus also comprises means for characterizing the amplified genetic material. The characterizing means is in communication with the amplifying means. The characterizing means contains all of the genetic material within a region having a radius of less than two feet. It should be noted that the region could have a radius of any reasonable size commensurate with the requirements of the task. For instance, the region could range from 1 cubic millimeter up to a radius of 10 feet, or anywhere in between. The amplifying means and characterizing means characterize the genetic material at a rate preferably exceeding 100 sequence-tagged sites per hour per organism. It should be noted that the rate could be up to 100,000 sequence-tagged sites per hour per organism, or as slow as desired, or any rate in between. Also, "per organism" could be defined to cover the characterization of genetic material of multiple organisms. The sequence-tagged sites are inherent to the genetic material.
Preferably, the genetic material includes nucleotide sequences. The amplifying means preferably includes a reaction plate 102 with which the genetic material is in contact. The reaction plate 102 has a plurality of chambers, each of which is disposed in a unique location of the plate 102 corresponding to a location within a genome having at least one nucleotide sequence. The characterizing means preferably includes means for detecting whether a chamber contains a nucleotide sequence of the genetic material corresponding to the chamber's unique location.
The apparatus preferably also includes a thermocycler 104 in thermal communication with the plate 102 to heat and cool the plate 102. The detecting means preferably includes a detector 108 connected to the chambers which produces a chamber signal for each chamber corresponding to genetic material in each chamber. The detecting means preferably also includes a processor 110 in communication with the detector 108 which receives the signal and identifies unique properties of the nucleotides in each chamber. The unique properties of the nucleotide of the genetic material in each chamber pertain to a number of nucleotides in any of the nucleotide sequences of the genetic material. The amplifying means preferably includes at least one nucleotide sequence that corresponds to each chamber and which is in contact with the chamber. Each nucleotide sequence interacts with the nucleotide sequence of the genetic material of the nucleotide sequence if it is present.
The present invention also pertains to a method for analyzing genetic material of an organism. The method comprises the steps of amplifying the genetic material. Then there is the step of characterizing the amplified genetic material in a region having a radius of less than 20 feet at a rate exceeding 100 sequence-tagged sites per hour per organism. Preferably, the genetic material includes RNA or DNA. After the characterizing step, there preferably is the step of assessing risk of illness for which there is a genetic susceptibility in the organism. Such illnesses can include cancer, heart disease, etc.
The present invention also pertains to a method for manufacturing an apparatus for analyzing genetic material of an organism. The method comprises the steps of placing corresponding sequence-tagged sites in contact with corresponding chambers of a plate 102. Then, there is the step of connecting detectors 108 to the chambers which can detect where the nucleotide sequences of the genetic material of the organism, when placed in contact with the chambers, have reacted with the corresponding sequence-tagged sites in the corresponding chamber. Then, there is the step of placing a thermocycling device 104 in contact with the plate 102 to cause the sequence-tagged sites in the chambers to react with genetic material of the organism that is placed in contact with the chambers. Next, there is the step of connecting a computer 110 to the detectors 108 and to the thermocycling device 104 to control operation of the thermocycling device 104, and to receive signals which correspond to the genetic material of the organism and the sequence-tagged sites of each chamber from the detectors 108.
The present invention also pertains to a method for determining the size of nucleotide sequences of an STR marker contained on genetic material comprising the steps of: amplifying the nucleotide sequences of the genetic material in a region relating to the STR marker. Then there is the step of performing nucleic acid hybridizations on the amplified nucleotide sequences. Then there is the step of producing signals corresponding to the hybridizations of the amplified nucleotide sequences. Then there is the step of determining the sizes of the nucleotide sequences contained in the genetic material.
A. An Apparatus for Parallel Genotyping
A parallel genotyping apparatus is described. The purpose of said apparatus is to provide a physical, chemical, mechanical, and computational embodiment for performing simultaneous experiments on multiple genetic markers used for genetic characterization.
Referring to figure 1, the apparatus is comprised of the following components:
(1) A multi-chambered reaction plate 102.
(2) A thermocycling device 104.
(3) A robotic device 106.
(4) A detection device 108.
(5) A computer device 110, with a memory.
The biochemical reactions occur in the chambers of the reaction plate 102, wherein a "chamber" denotes any localized region suitable for performing said reactions. The thermocycling device 104 provides a means for PCR and hybridization experiments. The robotic device 106 provides a means for transferring chemicals and performing other physical/chemical operations. The detection device 108 is used to quantitatively measure the signals from DNA hybridization experiments. The computer device 110 coordinates the activity of the other components, and performs any needed computations.
The primary requirement of the multi-chambered reaction plate 102 is a set of spatially arrayed chambers, each containing its own PCR primers for genome characterization, and providing operations for PCR amplification, DNA hybridization, and signal detection. Any physical device, of any number of dimensions, in whole or in part, that provides this functionality can serve as a physical embodiment for the apparatus. In an alternative embodiment, the parallel synthesis methods that have been described for producing oligonucleotides by spatially addressable masking techniques on a surface (Fodor, S.P.A., Read, J.L., Pirrung, M.C., Stryer, L., Lu, A.T., and Solas, D. 1991. Light-directed spatially addressable parallel chemical synthesis. Science, 251: 767-773), incorporated by reference, may be employed for manufacture. The process may be further miniaturized using molded or etched surfaces that allow one or more orders of magnitude more markers to be simultaneously characterized in each chamber without increasing DNA or enzyme requirements.
In the preferred embodiment, the basic container for the parallel genotyping reactions is a commercially available polystyrene or polycarbonate 384-chamber microtiter plate (USA Scientific Products, Ocala, FL). Alternative embodiments include 96-chamber and 864-chamber plates. Each well of the plate corresponds to one chamber. These plates occupy the space of standard 96-chamber microtiter plates and are compatible with current robotic systems such as the Beckman Biomek system. These plates can contain sufficient volumes for the PCR reactions in each chamber. Many of the required mechanical, physical, and chemical steps can be performed on the plate by manipulating it with currently available robotic units (e.g., Beckman Biomek) (Bentley, D.R., Todd, C., Collins, J., Holland, J., Dunham, I., Hassock, S., Bankier, A., and Giannelli, F. (1992). The development and application of automated gridding for efficient screening of yeast and bacterial ordered libraries. Genomics, 12(3): 534-41. Civitello, A.B., Richards, S., and Gibbs, R.A. (1992). A simple protocol for the automation of DNA cycle sequencing reactions and polymerase chain reactions. DNA Sequence, 3(1): 17-23. Drmanac, R., Drmanac, S., Labat, I., Crkvenjakov, R., Vicentic, A., and Gemmell, A. (1992). Sequencing by hybridization: towards an automated sequencing of one million M13 clones arrayed on membranes. Electrophoresis, 13(8): 566-73.), incorporated by reference, as described below.
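Because each genetic marker is tied to a fixed chamber position, the computer needs a mapping between markers and plate coordinates. The sketch below assumes the conventional microtiter layouts (8 x 12 for 96 chambers, 16 x 24 for 384 chambers) with lettered rows and numbered columns. The row-major assignment order, the function names, and the marker names are hypothetical choices for illustration.

```python
import string

# (rows, columns) for conventional microtiter plate formats.
PLATE_SHAPES = {96: (8, 12), 384: (16, 24)}

def chamber_label(index: int, n_chambers: int = 384) -> str:
    """Map a zero-based chamber index to a label such as 'A1' (row-major order)."""
    rows, cols = PLATE_SHAPES[n_chambers]
    if not 0 <= index < rows * cols:
        raise IndexError("chamber index outside the plate")
    row, col = divmod(index, cols)
    return f"{string.ascii_uppercase[row]}{col + 1}"

def assign_markers(marker_names, n_chambers: int = 384) -> dict:
    """Give each marker its own chamber, in the order the markers are listed."""
    if len(marker_names) > n_chambers:
        raise ValueError("more markers than chambers on one plate")
    return {name: chamber_label(i, n_chambers) for i, name in enumerate(marker_names)}

# Hypothetical marker names.
layout = assign_markers([f"STS_{k:04d}" for k in range(1, 26)])
print(layout["STS_0001"], layout["STS_0025"])
```

With a fixed layout of this kind, a detector reading taken at a given position can be attributed to one marker without any further labeling of the chamber contents.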
The apparatus has one or more two-dimensional surfaces 102 comprised of reaction chambers. Each STS genetic marker used from a genome corresponds to some reaction chamber. This experimentation surface provides a means for performing parallel laboratory operations on all the chambers simultaneously. Within each chamber, five steps are performed:
(1) A deposition of at least two oligonucleotides into the chamber. These oligonucleotides serve as PCR primers for the STS marker specific to the chamber.
(2) A PCR amplification of genomic DNA presented to the chamber.
(3) A DNA hybridization experiment that characterizes the amplified DNA, and possibly modifies the DNA.
(4) A signal detection from the hybridized (and possibly modified) DNA.
(5) An analysis of the detected signals to determine the alleles of the specific STS marker.
Means are provided by the apparatus for PCR amplification, DNA hybridization, and signal detection. The following description relates these functions to the parts of the apparatus.
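The five per-chamber steps form a fixed sequence that every chamber passes through in parallel. The sketch below is one way the coordinating computer might track that sequence, assuming each step is applied to all chambers before the next begins. The class and function names are hypothetical and no laboratory operation is modeled.

```python
from dataclasses import dataclass, field
from enum import IntEnum

class Step(IntEnum):
    DEPOSIT_PRIMERS = 1
    PCR_AMPLIFICATION = 2
    HYBRIDIZATION = 3
    SIGNAL_DETECTION = 4
    ALLELE_ANALYSIS = 5

@dataclass
class ChamberRun:
    marker: str
    completed: list = field(default_factory=list)

    @property
    def done(self) -> bool:
        return len(self.completed) == len(Step)

    def advance(self, step: Step) -> None:
        """Record a step, refusing any step performed out of order."""
        if self.done:
            raise RuntimeError("all steps already completed")
        expected = Step(len(self.completed) + 1)
        if step != expected:
            raise RuntimeError(f"expected {expected.name}, got {step.name}")
        self.completed.append(step)

def run_plate(markers):
    """Apply each step to every chamber before moving to the next step."""
    runs = [ChamberRun(marker) for marker in markers]
    for step in Step:
        for run in runs:
            run.advance(step)
    return runs

runs = run_plate(["STS_A", "STS_B", "STS_C"])  # hypothetical marker names
print(all(run.done for run in runs))
```

Iterating over steps in the outer loop mirrors the plate-wide character of the operations: the thermocycler and the detector act on the whole plate at once rather than on one chamber at a time.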
Deposit primers. This function can be considered part of the manufacturing process, as described below.
PCR Amplification. The apparatus provides the means for amplifying the STS DNA region subsequent to presentation with genomic DNA. When PCR (Innis, M.A., Gelfand, D.H., Sninsky, J.J., and White, T.J. 1990. PCR Protocols: A Guide to Methods and Applications. San Diego, CA: Academic Press. Mullis, K.B., Faloona, F.A., Scharf, S.J., Saiki, R.K., Horn, G.T., and Erlich, H.A. 1986. Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction. Cold Spring Harbor Symp. Quant. Biol., 51: 263-273.), incorporated by reference, is used for the amplification step, this means includes thermocycling components for heating and cooling the reaction mixture. In the preferred embodiment, the genomic DNA and PCR reagents are simultaneously transferred to the chambers by means of the robotic device. Various thermostable polymerases can be used (Garrity, P.A., and Wold, B.J. (1992). Effects of different DNA polymerases in ligation-mediated PCR: enhanced genomic sequencing and in vivo footprinting. Proceedings of the National Academy of Sciences of the United States of America, 89(3): 1021-5. Ling, L.L., Keohavong, P., Dias, C., and Thilly, W.G. (1991). Optimization of the polymerase chain reaction with regard to fidelity: modified T7, Taq, and vent DNA polymerases. PCR Methods & Applications, 1(1): 63-9.), incorporated by reference.
In the preferred embodiment, thermocycling is done using a conventional programmable block thermal cycler 104 based on the heating and cooling of a metal block (using Peltier or fluid refrigerants for cooling) (R. Hoelzel, Trends in Genetics, August 1990, volume 6 #8; p 237-8), incorporated by reference. The reaction plate is transferred to and from this computer-controlled thermal cycler by means of the robot 106. In an alternative embodiment, a device 104 is used that heats and cools a rapidly circulating air mass around the plate (e.g., Biotherm PCR oven) (Garner, H.R., Armstrong, B., and Lininger, D.M. (1993). High-throughput PCR. Biotechniques, 14(1): 112-5.), incorporated by reference. Such air thermal cyclers support the simultaneous processing of multiple plates. The conditions (such as temperature settings, ramp functions, and step times) are adjusted to the method of heating and cooling, since the method is sensitive to how rapidly the reaction chambers equilibrate with the changing temperatures.
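The dependence of the cycling conditions on the heating and cooling method can be seen in a simple timing model: total run time is the sum of the hold times plus the time spent ramping between setpoints. The sketch below assumes a constant ramp rate and ignores equilibration lag inside the chambers. The setpoints, hold times, ramp rates, and names are illustrative assumptions and are not taken from the text.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class Hold:
    temperature_c: float  # setpoint in degrees Celsius
    seconds: float        # time held at the setpoint

def run_time_seconds(cycle: Sequence[Hold], n_cycles: int,
                     ramp_c_per_s: float, start_c: float = 25.0) -> float:
    """Total time for n_cycles, counting holds plus constant-rate ramps."""
    if ramp_c_per_s <= 0:
        raise ValueError("ramp rate must be positive")
    total, current = 0.0, start_c
    for _ in range(n_cycles):
        for hold in cycle:
            total += abs(hold.temperature_c - current) / ramp_c_per_s
            total += hold.seconds
            current = hold.temperature_c
    return total

# Illustrative three-step cycle; setpoints and times are assumed values.
CYCLE = [Hold(94.0, 30.0), Hold(55.0, 30.0), Hold(72.0, 60.0)]
slow_ramp = run_time_seconds(CYCLE, n_cycles=30, ramp_c_per_s=1.0)
fast_ramp = run_time_seconds(CYCLE, n_cycles=30, ramp_c_per_s=3.0)
print(f"1 C/s: {slow_ramp / 60:.0f} min, 3 C/s: {fast_ramp / 60:.0f} min")
```

In this model a faster ramp shortens only the transitions, so hold times would still need to be chosen long enough for the chamber contents to reach each setpoint.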
In an alternative embodiment, a robotic attachment (Beckman Biomek), incorporated by reference, comprised of a thermocycling surface which has the same 384-chamber shape as the reaction plate is used to physically mate with the 384-chamber reaction plate, and provide the necessary heating and cooling operations under computer control. In another alternative embodiment where the reaction surface is fabricated, heating and cooling elements such as Peltier junctions can be physically incorporated into the apparatus. This surface is suitable for transferring sample genomic DNA to many chambers simultaneously. Miniaturization enables shorter cycle times and greater homogeneity because of the rapid temperature equilibration of the thin films and small volumes.
DNA hybridization. Sufficient volume and chemical composition is provided within each reaction chamber so that the requisite DNA hybridization (Ausubel, F.M., Brent, R., Kingston, R.E., Moore, D.D., Seidman, J.G., Smith, J.A., and Struhl, K., ed. 1993. Current Protocols in Molecular Biology. New York, NY: John Wiley and Sons. Sambrook, J., Fritsch, E.F., and Maniatis, T. 1989. Molecular Cloning, second edition. Plainview, NY: Cold Spring Harbor Press.), incorporated by reference, can occur. In the preferred embodiment, the robotic component of the apparatus transfers the hybridization reaction mixture to the chambers, and provides means for heating and cooling the reaction chamber, as described above.
In a preferred embodiment, means are provided for (optionally) modifying the DNA. Typical modifications to heteroduplexes, for example, include chemical derivatization and endonuclease digestion of single-stranded components.
Signal Detection. The detection of the heteroduplexes and nucleotides within the loops is done with a commercially available spectrophotometric/fluorometric instrument 108 similar to that used for ELISAs (Dynatech Laboratories, Chantilly, VA), incorporated by reference, modified to accommodate the larger number and smaller size of the chambers. A scanning laser fluorimeter can also be employed over the plate surface. Because the plate is flat and comprised of an optical grade surface, fluorescent detection is straightforward. The robot transfers the reaction plate to this optical detection device prior to the detection operation. In an alternative embodiment, computerized fluorescent scanning microscopes are used that are capable of detecting and quantitating fluorescent signals and are suitable for the miniaturized system. These have been developed for immunological and genetic cytochemistry (Biological Detection Systems), incorporated by reference.
A physical signal is measured from the reagent attached to a PCR primer. In other alternative embodiments, such detection reagents include (but are not limited to) radioactivity, fluorescence, phosphorescence, chemiluminescence, electrical resistivity, pH, and ionic concentration. The direct electrical detection mechanisms are particularly attractive for direct coupling of the experiment onto a miniaturized solid state detection device (Briggs, J., Kung, V.T., Gomez, B., Kasper, K.C., Nagainis, P.A., Masino, R.S., Rice, L.S., Zuk, R.F., and Ghazarossian, V.E. (1990). Sub-femtomole quantitation of proteins with Threshold, for the biopharmaceutical industry. Biotechniques, 9(5): 598-606. Kung, V.T., Panfili, P.R., Sheldon, E.L., King, R.S., Nagainis, P.A., Gomez, B.J., Ross, D.A., Briggs, J., and Zuk, R.F. (1990). Picogram quantitation of total DNA using DNA-binding proteins in a silicon sensor-based system. Analytical Biochemistry, 187(2): 220-7. Olson, J.D., Panfili, P.R., Armenta, R., Femmel, M.B., Merrick, H., Gumperz, J., Goltz, M., and Zuk, R.F. (1990). A silicon sensor-based filtration immunoassay using biotin-mediated capture. Journal of Immunological Methods, 134(1): 71-9. Olson, J.D., Panfili, P.R., Zuk, R.F., and Sheldon, E.L. (1991). Quantitation of DNA hybridization in a silicon sensor-based system: application to PCR. Molecular & Cellular Probes, 5(5): 351-8.), incorporated by reference. Such silicon-based detectors are described below.
Analysis. The analysis of the signals is done by a computer device 110. Means are provided for transferring the signals from the detector into the memory of the computer. A computer program for determining genotypes from the quantitative signals and calibration curves resides in the memory of said computer.
B. Manufacturing an Apparatus for Parallel Genotyping
In the preferred embodiment, the apparatus is manufactured by selecting a set of genetic markers, synthesizing both standard and derivatized oligonucleotide primers, and then depositing said oligonucleotide primers into the reaction chambers of a 384-chamber plate. This plate is then positioned with the other components of the apparatus, including the thermocycling device, the robotic device, the detection device, and the computer device.
A sufficient number of polymorphic genetic markers are chosen for unambiguously characterizing or tracing chromosomes in an organism containing DNA or RNA. Depending on the application, the marker spacing can range from 10 centiMorgans (cM) to 0.001 cM. One cM is approximately one million base pairs, or one megabase (Mb). In a preferred embodiment, a resolution of 0.1 cM, or 100,000 base pairs (bp), is used. In the human species, for example, which contains about 3 billion bp, this works out to 30,000 markers. The genetic markers to be used for each STS are obtained as PCR primer sequence pairs from available databases (Genbank, GDB, EMBL; Hilliard, Davison, Doolittle, and Roderick, Jackson laboratory mouse genome database. Bar Harbor, ME; SSLP genetic map of the mouse. Map Pairs, Research Genetics, Huntsville, AL), incorporated by reference. One of the goals of the worldwide genome project is to generate and make publicly available 30,000 genetic markers; currently, about 10,000 are available. Alternatively, some or all of these PCR sequences can also be constructed using existing techniques (Sambrook, J., Fritsch, E.F., and Maniatis, T. 1989. Molecular Cloning, second edition. Plainview, NY: Cold Spring Harbor Press.), incorporated by reference.
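The marker-count arithmetic in the preceding paragraph can be checked with a short sketch. The function names and the assumption of one marker per chamber per plate are illustrative only and are not part of the described apparatus; the conversion of 1 cM to about one million bp is the approximation used in the text.

```python
# Illustrative arithmetic only. The 1 cM ~ 1,000,000 bp conversion is the
# approximation stated in the text, not an exact physical constant.
BP_PER_CM = 1_000_000


def markers_needed(genome_bp: int, resolution_cm: float) -> int:
    """Number of evenly spaced markers for a given map resolution."""
    spacing_bp = resolution_cm * BP_PER_CM
    return round(genome_bp / spacing_bp)


def plates_needed(n_markers: int, chambers_per_plate: int = 384) -> int:
    """Plates required if each chamber holds one marker (an assumption)."""
    return -(-n_markers // chambers_per_plate)  # ceiling division


if __name__ == "__main__":
    n = markers_needed(3_000_000_000, 0.1)
    print(n, "markers on", plates_needed(n), "plates")
```

Under these assumptions, a 0.1 cM resolution over 3 billion bp gives 30,000 markers, which would occupy 79 plates of 384 chambers for a single assay.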
The oligonucleotide primers for each STS are synthesized (Haralambidis, J., Duncan, L., Angus, K., and Tregear, G.W. 1990. The synthesis of polyamide-oligonucleotide conjugate molecules. Nucleic Acids Research, 18(3): 493-9. Nelson, P.S., Kent, M., and Muthini, S. 1992. Oligonucleotide labeling methods. 3. Direct labeling of oligonucleotides employing a novel, non-nucleosidic, 2-aminobutyl-1,3-propanediol backbone. Nucleic Acids Research, 20(23): 6253-9. Roget, A., Bazin, H., and Teoule, R. 1989. Synthesis and use of labelled nucleoside phosphoramidite building blocks bearing a reporter group: biotinyl, dinitrophenyl, pyrenyl and dansyl. Nucleic Acids Research, 17(19): 7643-51. Schubert, F., Cech, D., Reinhardt, R., and Wiesner, P. 1992. Fluorescent labelling of sequencing primers for automated oligonucleotide synthesis. DNA Sequence, 2(5): 273-9. Theisen, P., McCollum, C., and Andrus, A. 1992. Fluorescent dye phosphoramidite labelling of oligonucleotides. Nucleic Acids Symposium Series, 1992(27): 99-100.), incorporated by reference. These primers may be derivatized with a fluorescent detection molecule or a ligand for immunochemical detection such as digoxigenin. Derivatization of the primer for binding to the surface entails the incorporation of a biotinylated nucleotide at the 5' end of the synthetically made oligonucleotide. Additional biotinylated residues can also be incorporated (depending on the protocol) into this primer either at the time of synthesis or by secondary photo or chemical biotinylation. Though the preferred embodiment employs the direct addition of the 5' biotin by chemical synthesis, additional biotin molecules for binding may be added to the primer for improving the efficiency of selection of heteroduplexes for analyses. Alternatively, said oligonucleotides and their derivatives can be ordered from a commercial vendor (Research Genetics, Huntsville, AL).
The oligonucleotide primer sets are deposited into each reaction chamber by means of a robotic system from source chambers containing a large store of presynthesized oligonucleotides. Said transferring can be effected in one or more operations, wherein oligonucleotide primers are deposited into multiple chambers in each transferring step, thereby creating a two-dimensional spatial array. In an alternative embodiment, this deposition is effected by means of a parallel deposition device to which the 384-chamber plate is presented by means of a conveyor belt. The deposition device has source chambers, each containing a large store of a unique oligonucleotide specific to a reaction chamber. Said source chambers are spatially arrayed to conform to the reaction chambers of the plate. Both the device and plate are properly positioned and made stationary, and then the chambers are filled in one or more steps with the oligonucleotide.
In the preferred embodiment, the plates are dried and each chamber is then coated with a wax material, such as Ampliwax (Perkin-Elmer, Norwalk, CT), incorporated by reference. This material hardens at 4°C, is liquid throughout the temperature range of the PCR, and serves as a vapor barrier to prevent evaporation of the PCR reactions during the denaturation steps at 95°C. By placing this material over the dried primers and allowing it to harden at 4°C, one establishes a stable apparatus that can be stored and to which the remaining components of the PCR reaction can be added without disruption of the stable two-dimensional array, so that the reactions can be initiated simultaneously.
In an alternative embodiment, the oligonucleotides are covalently attached to a substrate such as glass by spatially addressable light-directed parallel DNA synthesis (Drmanac, R., Drmanac, S., Strezoska, Z., Paunesku, T., Labat, I., Zeremski, M., Snoddy, J., Funkhouser, W.K., Koop, B., and Hood, L. 1993. DNA Sequence Determination by Hybridization: a Strategy for Efficient Large-scale Sequencing. Science, 260: 1649-1652. Fodor, S.P.A., Read, J.L., Pirrung, M.C., Stryer, L., Lu, A.T., and Solas, D. 1991. Light-directed spatially addressable parallel chemical synthesis. Science, 251: 767-773.), incorporated by reference. The DNA amplification is done directly on this surface (Innis, M.A., Gelfand, D.H., Sninsky, J.J., and White, T.J. 1990. PCR Protocols: A Guide to Methods and Applications. San Diego, CA: Academic Press.), incorporated by reference. Detection can be effected by 32P, fluorescence, or electronic means (Eggers, M., et al. 1993. Genosensors: Microfabricated Devices for Automated DNA Sequence Analysis. In Advances in DNA Sequencing Technology, #1891. Keller, R., ed., Proceedings of SPIE. Southern, E.M., Maskos, U., and Elder, J.K. 1991. Analyzing and Comparing Nucleic Acid Sequences by Hybridization to Arrays of Oligonucleotides: Evaluation Using Experimental Models. Genomics, 13: 1008-1017.), incorporated by reference.
Other components of the apparatus include resins and filters that will nonspecifically and reversibly bind double-stranded DNA, but not free nucleotides or short oligonucleotides (Molecular Biology LabFax, T.A. Brown, ed. Academic Press p281-4), incorporated by reference. These are commercially available and can be readily modified to fit within a manifold that will ensure leak-proof contact with the reaction chambers or plates. Uncharged nylon, charged nylon, and nitrocellulose are some of the filter materials in current use (Harley, C.B., and Vaziri, H. (1991). Deproteination of nucleic acids by filtration through a hydrophobic membrane. Genetic Analysis, Techniques & Applications, 8(4): 124-8. Twomey, T.A., and Krawetz, S.A. (1990). Parameters affecting hybridization of nucleic acids blotted onto nylon or nitrocellulose membranes. Biotechniques, 8(5): 478-82. Williams, D.L. (1990). The use of a PVDF membrane in the rapid immobilization of genomic DNA for dot-blot hybridization analysis. Biotechniques, 8(1): 14-5.), incorporated by reference. The polystyrene plate that is bound to streptavidin (or avidin) is also commercially available in neutral, positively-, and negatively-charged configurations (MaxiSorp, Nunc, or Combiplate, Applied Scientific Instrumentation, Inc, Eugene, OR), incorporated by reference. The appropriate material is selected according to the specific combination of the binding capacity, the degree of nonspecific or background binding, and the optical properties of the material.
In the preferred embodiment, referring to figure 1, the commercially available polystyrene or polycarbonate 384-chamber microtiter plate 102 is arranged in a 24 by 16 array. The commercially available robotic device 106 has a surface with 384 chambers arranged in a spatial configuration identical to that of the reaction plate 102. Thus, all robotic actions (e.g., for the steps of amplification, hybridization, and detection) are performed in parallel with robotic device 106 in mechanical juxtaposition with plate 102.
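Because the plate, robotic device, thermal cycler, and detector share one spatial layout, a single chamber index can address the same position on each component. The following is a minimal sketch of such an addressing scheme for the 24 by 16 array. The row-letter and column-number labels (A1 through P24) follow common microtiter plate practice and are an assumption, as are the function names.

```python
from typing import Tuple

ROWS, COLS = 16, 24  # 16 x 24 = 384 chambers, matching the 24 by 16 array


def chamber_position(index: int) -> Tuple[int, int]:
    """Map a 0-based chamber index to a 0-based (row, column) pair."""
    if not 0 <= index < ROWS * COLS:
        raise ValueError("chamber index out of range for a 384-chamber plate")
    return divmod(index, COLS)


def chamber_label(index: int) -> str:
    """Label a chamber as row letter plus column number (assumed convention)."""
    row, col = chamber_position(index)
    return f"{chr(ord('A') + row)}{col + 1}"


if __name__ == "__main__":
    print(chamber_label(0), chamber_label(24), chamber_label(383))
```

With this convention, the first chamber is A1, the first chamber of the second row is B1, and the last chamber is P24.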
In the preferred embodiment, the commercially available programmable block thermal cycler 104 has a surface with 384 chambers arranged in a spatial configuration identical to that of the reaction plate 102. During thermocycling, every chamber of the plate 102 is in direct contact with its corresponding chamber in the thermocycler 104. In an alternative embodiment, the commercially available programmable oven thermocycler 104 is sufficiently large to accommodate the dimensions of 384-chamber reaction plate 102, and has sufficient uniformity to perform the necessary amplification reactions within each chamber. A robotic device is used to transfer the reaction plate 102 to and from the oven thermocycler 104.
The commercially available ELISA-like spectrophotometric/fluorometric detection device 108 contains 384 chambers arranged in a spatial configuration identical to that of reaction plate 102. During the detection phase, the plate 102 is placed into the detector, with each chamber of plate 102 residing within its corresponding detection chamber of detector 108. This enables detections to be conducted simultaneously and independently for each chamber.
The computer device 110 coordinates the activities of the other components: plate 102, thermocycler 104, robotic device 106, and detector 108. Note that most commercial thermocyclers, robotic devices, and detectors include computational facilities for independently performing control, detection, and processing tasks, thus freeing the computer device 110 from such low-level processes. The computer device 110 is connected to the detector 108. Signals obtained from the detector 108 are transferred to the memory of computer 110. The computer 110 employs processing means for interpreting the signals in its memory, and determines and outputs the characteristics of nucleotide sequences in each chamber of the reaction plate 102.
C. A System for Parallel Genotyping
A system for characterizing multiple genetic markers is described, along with steps for using this information for preventative health care. In overview of the preferred embodiment, genomic DNA is first extracted from an individual (say, by processing a blood sample). PCR reagents are then mixed with the genomic DNA, and a robotic device applies this PCR/DNA mixture to the chambers of the reaction plate of the apparatus. Every chamber has its own predeposited PCR primers that define a unique genetic marker. PCR amplification of the genomic DNA marker region is then performed in every chamber using the thermocycling component of the apparatus. A quantitative hybridization experiment is then conducted in every chamber, possibly modifying the DNA. The signals from these hybridization experiments are then measured from every chamber using the detection component, such as fluorescence measurements with a scanning light microscope. More than one (e.g., two or three) such parallel experiments may be needed to acquire all necessary genotyping data for one STR. The measurements are then collected and analyzed by the component computer device to characterize the alleles at every marker.
The resulting genotyping information from the multiple alleles can be used for a number of applications, as described below. One important use is the determination of genetic risk for phenotypic traits, including diseases. By comparing dense genotyping data of STRs across related individuals, haplotypes can be compared, and the shared genomic regions determined. Correlating a shared trait and genotype commonalities enables a determination of genomic patterns that imply a quantitative risk for said trait. These patterns can be applied to the genotypes of an individual and their relatives to compute a probability of expressing the trait. When the traits correspond to common multigenic multifactorial diseases, the highest risk entities are determined, and preventative measures undertaken, thereby improving the health of said individual. Software systems are built to tailor the genotyping information for this advising task.
The quantitative hybridization experiment that is used in the preferred embodiment is a pair of loop mismatch assays. The first assay measures the sum of the two STR allele loops, relative to a third (and smaller) STR. The second assay measures the difference of the two STR allele loops relative to each other. By combining the sum and difference values, the two alleles can be determined. The quantitative loop detection is effected by directly measuring the signals derived from the loops relative to the number of strands with loops (this is described in detail later on). The loops are quantitated either by a chemical modification of the single-stranded loop DNA into a detectable state, or by incorporation of labeled DNA and subsequent digestion and detection of the single-stranded loop. The number of strands is measured by using an end-labeled PCR primer. The ratio of the (calibrated) loop measurements to the number of strands determines the loop size. In an alternative embodiment, multiple hybridizations are performed for every STR, producing a pattern that determines the genotype.
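The way the sum and difference values combine can be illustrated with a short sketch. It assumes that loop sizes have already been converted to repeat units, that the sum assay yields (a - r) + (b - r) for alleles of a and b repeats against a reference of r repeats, and that the difference assay yields |a - b|. The linear calibration factor and all function names are hypothetical; the text calls only for calibrated measurements and does not fix the form of the calibration.

```python
from typing import Tuple


def loop_size(loop_signal: float, strand_signal: float,
              signal_per_repeat: float) -> float:
    """Loop size in repeat units from the ratio of loop signal to strand signal.

    signal_per_repeat is a hypothetical linear calibration factor.
    """
    if strand_signal <= 0 or signal_per_repeat <= 0:
        raise ValueError("strand signal and calibration factor must be positive")
    return (loop_signal / strand_signal) / signal_per_repeat


def alleles_from_sum_and_difference(loop_sum: float, loop_diff: float,
                                    reference_repeats: int) -> Tuple[int, int]:
    """Recover the two allele repeat counts (smaller, larger).

    loop_sum  = (a - r) + (b - r), from the first (sum) assay
    loop_diff = |a - b|, from the second (difference) assay
    """
    larger = reference_repeats + (loop_sum + loop_diff) / 2
    smaller = reference_repeats + (loop_sum - loop_diff) / 2
    return round(smaller), round(larger)


if __name__ == "__main__":
    # Alleles of 14 and 18 repeats against a 10-repeat reference:
    # sum = 4 + 8 = 12, difference = 4.
    print(alleles_from_sum_and_difference(12, 4, 10))
```

Rounding to the nearest whole repeat is a design choice of this sketch; a real analysis would need to account for measurement noise before assigning integer repeat counts.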
This system for performing multiple genotypings in parallel, with each STR in its separate cell, has many useful advantages over current genotyping methods, including the best gel-based multiplex methods. Specifically,
• Massive parallelism greatly increases throughput by greatly reducing the total experimentation time.
• The experiment's architecture allows independent interchangeability of STR loci. Any STR(s) of the same class can be placed at any cell of the device. The synthesis of oligonucleotides can be spatially or temporally separated from the execution of the PCR amplification and the detection.
• Manufacturing enables miniaturization of the device, and the incorporation of detection machinery into the device.
• The manual labor required for genotyping is greatly reduced, because the manufactured device eliminates the separate steps of handling multiple (e.g., thousands) specific STR primers. This includes synthesizing the oligonucleotides, performing the PCR, loading gels or other detection devices, and checking the genotyping results.
• Reduced manual intervention greatly reduces the error rate.
Referring to figure 2, the following terminology is used throughout:
Strand. A single-stranded DNA PCR product of an STR. The CA (or GT) repeat region is of varying length.
Complementary strand. A second strand having a Watson-Crick complementary DNA sequence to a first strand. However, the number of CA or GT repeats need not equal that of the first strand.
Upper strand. The DNA strand 202 of the STR locus that contains the CA-repeat units.
Lower strand. The DNA strand 204 complementary to the upper strand that contains the GT-repeat units.
Left primer. The PCR oligonucleotide primer 206 that initiates the upper strand of the STR locus.
Right primer. The PCR oligonucleotide primer 208 that initiates the lower strand of the STR locus.
In the preferred embodiment, the system is comprised of the following steps:
Referring to figure 3, Step 1 entails the manufacture of an apparatus in which STR loci have been selected, and appropriate oligonucleotides (with modifications) synthesized and deposited within each chamber.
In the PCR amplification of Step 2a, the process begins by extracting DNA from blood or tissue. There are numerous standard methods to isolate DNA from sources including whole blood, isolated lymphocytes, tissue, and tissue culture (Ausubel, F.M., Brent, R., Kingston, R.E., Moore, D.D., Seidman, J.G., Smith, J.A., and Struhl, K., ed. 1993. Current Protocols in Molecular Biology. New York, NY: John Wiley and Sons. Sambrook, J., Fritsch, E.F., and Maniatis, T. 1989. Molecular Cloning, second edition. Plainview, NY: Cold Spring Harbor Press. Nordvag 1992. Direct PCR of Washed Blood Cells. BioTechniques, 12(4): 490-492.), incorporated by reference. In the preferred embodiment, DNA is extracted from anticoagulated human blood removed by standard venipuncture and collected in tubes containing either EDTA or sodium citrate. The red cells are lysed by a gentle detergent and the leukocyte nuclei are pelleted and washed with the lysis buffer. The nuclei are then resuspended in a standard phosphate buffered saline (pH=7.5) and then lysed in a solution of sodium dodecyl sulfate, EDTA and tris buffer pH 8.0 in the presence of proteinase K at 100 ug/ml. The proteinase K digestion is performed for 2 hours to overnight at 50°C. The solution is then extracted with an equal volume of buffered phenol-chloroform. The upper phase is reextracted with chloroform and the DNA is precipitated by the addition of NaAcetate pH 6.5 to a final concentration of 0.3M and one volume of isopropanol. The precipitated DNA is spun in a desktop centrifuge at approximately 15,000 g, washed with 70% ethanol, partially dried and resuspended in TE (10 mM Tris pH 7.5, 1 mM EDTA) buffer. There are numerous other methods for isolating eucaryotic DNA, including methods that do not require organic solvents, and purification by adsorption to column matrices.
None of these methods is novel; the only requirements are that the DNA be of sufficient purity to serve as template in the PCR reactions and that the amount of DNA be sufficient for the scale of the parallel genotyping procedure.
Continuing Step 2a, the reaction plates of the apparatus are maintained at 4°C at the time the genomic DNA is mixed with the other components of the PCR reaction. These other components include, but are not limited to, the standard PCR buffer (containing Tris pH 8.0, 50 mM KCl, 2.5 mM magnesium chloride, albumin), triphosphate deoxynucleotides (dTTP, dCTP, dATP, dGTP), and the thermostable polymerase (Taq polymerase in this preferred embodiment; others are available, though their buffer conditions are somewhat different) (Garrity, P.A., and Wold, B.J. 1992. Effects of different DNA polymerases in ligation-mediated PCR: enhanced genomic sequencing and in vivo footprinting. Proceedings of the National Academy of Sciences of the United States of America, 89(3): 1021-5. Ling, L.L., Keohavong, P., Dias, C., and Thilly, W.G. 1991. Optimization of the polymerase chain reaction with regard to fidelity: modified T7, Taq, and Vent DNA polymerases. PCR Methods & Applications, 1(1): 63-9.), incorporated by reference. The amounts of the genomic DNA and deoxynucleotides, Mg concentration, and enzyme are all adjusted so as to be optimal for the entire set of PCR reactions. The PCR primers for each locus are chosen for consistency with these uniform reaction conditions. The total amount of this mixture is determined by the final volume of each PCR reaction (say, 10 ul) and the number of reactions (say, 384). This mixture can also be varied by including some of the constituents with the primers that are previously deposited in the microchambers. All of the necessary components for the PCR reactions are kept separate until the Ampliwax is melted and the aqueous phases reconstitute. The requirements are that each reaction cell receives a consistent and reproducible amount of the necessary components, and that the combination of constituents does not compromise stability and biological activity (e.g., the Taq polymerase may be unstable if stored in a lyophilized state on the reaction plates).
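The total mixture volume described above follows directly from the per-reaction volume and the number of reactions, as the sketch below shows. The overage fraction (extra volume to cover pipetting losses) is an assumption that does not appear in the text, and the function names are illustrative.

```python
def master_mix_volume_ul(reaction_volume_ul: float = 10.0,
                         n_reactions: int = 384,
                         overage_fraction: float = 0.0) -> float:
    """Total mixture volume in microliters for one plate.

    overage_fraction is a hypothetical allowance for dead volume and
    pipetting loss; the text itself specifies no overage.
    """
    if overage_fraction < 0:
        raise ValueError("overage fraction cannot be negative")
    return reaction_volume_ul * n_reactions * (1.0 + overage_fraction)


def nmol_per_reaction(concentration_mm: float, reaction_volume_ul: float) -> float:
    """Amount of a solute in nanomoles, since 1 mM equals 1 nmol per ul."""
    return concentration_mm * reaction_volume_ul


if __name__ == "__main__":
    print(master_mix_volume_ul())        # 10 ul x 384 reactions
    print(nmol_per_reaction(2.5, 10.0))  # magnesium chloride at 2.5 mM
    print(nmol_per_reaction(50.0, 10.0)) # KCl at 50 mM
```

For the example figures in the text (10 ul per reaction, 384 reactions), the mixture totals 3,840 ul before any overage is added.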
In Step 2b, the DNA/PCR mixture is applied to the reaction chambers with the Biomek robotics unit and the PCR is initiated by heating the plate rapidly to 95°C in order to melt the Ampliwax, allow the DNA/PCR mixture to mix with the oligonucleotide primers (convection mixing is sufficient), and denature the genomic DNA. The Ampliwax forms a stable vapor barrier over the chambers during the PCR reactions. This method of initiating the PCR reactions is referred to as a "hot start" (D'Aquila et al., Nuc. Acid Res. 19 (13) 3749 (1991)), incorporated by reference, and has the additional benefit of reducing the amount of nonspecific PCR products that are produced, thus improving the purity and amount of the final desired PCR signal that will be detected.
In Step 2c, the PCR reactions are performed on all of the reactions simultaneously by appropriately heating and cooling the plate to specific temperatures. After the initial denaturation step of 93-95°C for 3-5 minutes, the plates are cooled to the annealing temperature (50-65°C, typically 55°C) for a set time (0-100 seconds, typically 15 seconds), then warmed to the extension temperature which is optimal for the thermostable polymerase (e.g., 73°C for Taq polymerase) and maintained for a set period of time (0-100 seconds, typically 30 seconds). Finally, the cycle is completed by elevating the temperature of the reaction to denature the DNA products (93-95°C for 0-60 seconds, typically 15 seconds). The entire cycle of annealing, extension, and denaturation is then repeated multiple times (ranging from 20-40 cycles depending on the efficiencies of the reactions and sensitivity of the detection system) (Innis, M.A., Gelfand, D.H., Sninsky, J.J., and White, T.J. 1990. PCR Protocols: A Guide to Methods and Applications. San Diego, CA: Academic Press.), incorporated by reference.
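The cycling schedule of Step 2c can be written out as a simple program of temperature holds. The sketch below uses the "typically" values from the text as defaults. The 4-minute initial denaturation, the 30-cycle count, and the 95°C denaturation setpoint are assumed midpoints or choices within the stated ranges, ramp times between temperatures are ignored, and the class and function names are illustrative rather than any thermal cycler's actual interface.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CycleProfile:
    """Hold times in seconds; defaults follow the 'typically' values in the text."""
    initial_denature_s: float = 240.0  # text gives 3-5 minutes; 4 min is assumed
    anneal_s: float = 15.0
    extend_s: float = 30.0
    denature_s: float = 15.0
    cycles: int = 30                   # text gives 20-40 cycles; 30 is assumed


def total_hold_time_s(profile: CycleProfile) -> float:
    """Sum of all hold times, excluding temperature ramps."""
    per_cycle = profile.anneal_s + profile.extend_s + profile.denature_s
    return profile.initial_denature_s + profile.cycles * per_cycle


def step_list(profile: CycleProfile, anneal_c: float = 55.0,
              extend_c: float = 73.0,
              denature_c: float = 95.0) -> List[Tuple[str, float, float]]:
    """Expand the profile into (step name, temperature in C, seconds) tuples."""
    steps = [("initial denaturation", denature_c, profile.initial_denature_s)]
    for _ in range(profile.cycles):
        steps.append(("anneal", anneal_c, profile.anneal_s))
        steps.append(("extend", extend_c, profile.extend_s))
        steps.append(("denature", denature_c, profile.denature_s))
    return steps


if __name__ == "__main__":
    profile = CycleProfile()
    print(total_hold_time_s(profile) / 60.0, "minutes of hold time")
```

With these defaults the hold times alone total 34 minutes; actual run time would be longer once ramping between temperatures is included.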
Following Step 2c, the PCR cycles are completed, with each chamber containing the amplified DNA from a specific location of the genome. Each mixture includes the DNA that was synthesized from the two alleles of the diploid genome (a single allele from haploid chromosomes as is the case with the sex chromosomes in males or in instances of cells in which a portion of the chromosome has been lost such as occurs in tumors, or no alleles when both are lost) . Also in this mixture are the free triphosphate deoxynucleotides and the unused oligonucleotide primers.
Step 2d is the last PCR step, which inactivates the thermostable polymerase, say, by the addition of EDTA. Ampliwax protects the integrity of the chambers and the mixing occurs at 37°C for several minutes.
In Step 3a, for quantitative loop mismatch genotyping, the DNA strands are allowed to reanneal at a temperature above the annealing temperature of the oligonucleotide primers, but below the melting temperature of perfectly matched complementary strands. In most instances, this will be between 65 and 75°C, depending on the salt conditions of the buffer. The annealing time can vary from 1 hour to 24 hours, with 2 hours selected in the preferred embodiment.
In Step 3a, the heteroduplex annealing is done with the original contents of the chamber for the "subtraction" assay of the loop detection method. The "addition" assay that is required for the measurements of loop mismatches entails combining of the contents of a chamber with its counterpart from a control plate in which the PCR reaction has been carried out with a corresponding set of primers (same oligonucleotides, but with different primer modifications) on a target DNA that has the smallest possible number of repeated elements for the given DNA marker. These two assays are done in different chambers of the reaction plate, or on separate plates entirely.
In the subtraction assay, the Left primer is linked to a detection molecule, and the Right primer is covalently linked to a molecule necessary for binding (i.e., biotin in the preferred embodiment). In the addition assay, the unknown genomic DNA (Source DNA) is amplified using a Left primer that is labeled with the detection molecule and a Right primer that is unmodified. In contrast, the control DNA (or Target DNA) is amplified with an unmodified Left primer and a Right primer that carries the binding molecule (such as biotin). In this situation, when the amplified DNA from the unknown source and the Target DNA are combined to form heteroduplexes, one will only detect the binding of the upper strand of the Source DNA to the immobilized lower strand of the Target DNA. Homoduplexes of the Target DNA strands will be undetected, and, being perfectly matched, create no exposed loops for detection. The corresponding Source and Target DNAs are appropriately combined using the Biomek robot through direct physical transfer methods (i.e., aligning the Source DNA plate on top of the Target DNA plate directly and mixing by melting the Ampliwax).
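The logic by which only one duplex species is detected in the addition assay can be enumerated explicitly. The sketch below is a simplified model: it assumes that a duplex is retained only if its lower strand carries biotin, that it gives a signal only if its upper strand carries the detection label, and that washing is complete. The dictionary layout and function name are illustrative.

```python
from itertools import product
from typing import Dict, Tuple

# Primer modifications in the addition assay, as described in the text.
# Upper strands are initiated by the Left primer, lower strands by the Right.
UPPER_LABELED = {"Source": True, "Target": False}
LOWER_BIOTINYLATED = {"Source": False, "Target": True}


def duplex_fates() -> Dict[Tuple[str, str], str]:
    """Classify every (upper strand origin, lower strand origin) duplex."""
    fates = {}
    for upper, lower in product(UPPER_LABELED, LOWER_BIOTINYLATED):
        bound = LOWER_BIOTINYLATED[lower]   # retained on the binding surface
        labeled = UPPER_LABELED[upper]      # carries the detection molecule
        if bound and labeled:
            fates[(upper, lower)] = "detected"
        elif bound:
            fates[(upper, lower)] = "bound, unlabeled"
        else:
            fates[(upper, lower)] = "washed away"
    return fates


if __name__ == "__main__":
    for pair, fate in duplex_fates().items():
        print(pair, "->", fate)
```

Under these assumptions only the Source upper strand paired with the Target lower strand is both immobilized and labeled, which matches the behavior described for the addition assay.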
In Steps 3b, 3c, 3d, 3e, and 3f, the unwanted single strands, primers and free nucleotides are removed by using a 3'-to-5'-specific exonuclease that will not cleave or disrupt internal single-stranded loop structures, in both the subtraction and addition assays. Exonuclease VII from E. coli is capable of 3'-5' exonuclease activity limited to single-stranded DNA (Ausubel, F.M., Brent, R., Kingston, R.E., Moore, D.D., Seidman, J.G., Smith, J.A., and Struhl, K., ed. 1993. Current Protocols in Molecular Biology. New York, NY: John Wiley and Sons. Vales, L.D., Rabin, B.A., and Chase, J.W. 1982. Subunit structure of Escherichia coli exonuclease VII. J. Biol. Chem., 257: 8799-8805.), incorporated by reference. One unique feature of this exonuclease is that it is not inactivated by EDTA, thus making it active under conditions that would inactivate the Taq polymerase. The enzyme is either removed by a subsequent wash step, or inactivated chemically. (A 5' exonuclease is not used, because that end is blocked by the linkage to biotin or by linkage to the surface.) The enzyme is added to the chambers over the Ampliwax surface, allowed to mix at 37°C and incubated for a brief period (1-60 minutes, 10 minutes in the preferred embodiment) and terminated by the addition of EDTA. At the same time, the buffer is adjusted to promote non-specific binding of the DNA to a resin or filter.
Following Step 3f, free deoxynucleotides and primers that interfere with binding of the PCR products and the detection system are removed. The purification of unincorporated DNA materials is combined with the elimination of single-stranded DNA species that remain after heteroduplex formation. In the preferred embodiment, this purification step is done after the heteroduplex formation, thereby also eliminating single-stranded DNAs. Although heteroduplex formation may be somewhat inhibited by residual primers, combining the steps greatly simplifies the method and aids in increasing the signal-to-noise ratio. In an alternative embodiment, the separation of free deoxynucleotides and primers from the PCR products is achieved by filtration (the unwanted materials are significantly smaller than the final PCR products) using commercially available filters (Centricon 30 filters, Amicon), incorporated by reference, or by adsorption (Molecular Biology LabFax, T.A. Brown, ed. Academic Press p281-4), incorporated by reference, which entails the nonspecific binding of the PCR products, double-stranded and single-stranded DNA to a matrix followed by removal of the supernatants containing the primers and nucleotides. The removal of the primers, followed by heteroduplex formation and then elimination of single strands can be done by exonuclease digestion or by other separation methods (Linxweiler, W., and Horz, W. 1982. Sequence specificity of exonuclease III from E. coli. Nucleic Acids Research, 10(16): 4845-59. Sandigursky, M., and Franklin, W.A. 1993. Exonuclease I of Escherichia coli removes phosphoglycolate 3'-end groups from DNA. Radiation Research, 135(2): 229-33.), incorporated by reference.
In Step 3g, the filter is set upon a plastic manifold that fits over the chambers of the 384-chamber plate, the apparatus is inverted so that the Ampliwax rises to the bottom surface of the chambers, and the DNA solution comes into contact with the filter. In Step 3h, the filter is separated from the chamber and washed with a high salt buffer to remove the free nucleotides. In Step 3i, this filter is then placed against a polystyrene surface (optical grade) (such as used in MaxiSorp plates manufactured by Nunc, Naperville, IL) that has been coated with streptavidin (Giorda, R., Lampasona, V., Kocova, M., and Trucco, M. 1993. Non-Radioisotopic Typing of Human Leukocyte Antigen Class II Genes on Microplates. BioTechniques, 15(5): 918-925. Giorda et al. 1994. Molecular HLA DQ typing on microplates: A step toward complete automation, manuscript), incorporated by reference, and the DNAs are eluted from the filter using a low ionic strength buffer such as TE (10 mM Tris pH 7.5, 1 mM EDTA), and allowed to specifically bind to the streptavidin through the biotinylated primers described previously.
The heteroduplexes are bound to the polystyrene surface in an exact replica of their initial spatial orientation. The heteroduplexes containing biotinylated primer will bind to the streptavidin surface under a wide range of buffers that are pH neutral. In the preferred embodiment, the DNA is bound in the TE buffer and in Step 3j the plate is washed twice with 0.15 M phosphate buffer.
In Step 3k, the chemical derivatization of the C and A residues within the heteroduplex loops employs a modification of the method originally described by Kimura and colleagues (Kimura, K., Nakanishi, M., Yamamoto, T., and Tsuboi, M. (1977). A correlation between the secondary structure of DNA and the reactivity of adenine residues with chloroacetaldehyde. Journal of Biochemistry, 81(6): 1699-703.), incorporated by reference. The plate is washed with 0.15 M Na phosphate buffer, pH = 6.5 (the pH can be varied from 4.5 to 6.5 and alternative buffers can be used). The plate is then covered with the buffer containing chloroacetaldehyde (CAA) at a final concentration of 2.0% and incubated at 37°C for 4 hours (longer or shorter times may be used). The reaction is terminated in Step 3l by washing with 0.01 M Tris-HCl pH 7.0 and 1.0 M NaCl. The NaCl prevents dissociation of the heteroduplexes during the etheno-dehydration step. The plate is heated in the final wash volume at 85-90°C for 1 hour, which dehydrates the ethenoderivative. Note that loop-specific derivatization of the nucleotides with chloroacetaldehyde or other chemical modification reagents (osmium tetroxide, hydroxylamine, carbodiimide, etc., as described below) provides an alternative means for eliminating background reagents prior to detecting nuclease-liberated free derivatized nucleotides.
In the Step 4a detection, the fluorescence of the primer detector molecule that is bound to the hybridized strand is measured at this time, or measured at a later stage in conjunction with the fluorescent adducts created within the loop structures. In the preferred embodiment, the detection of the hybridized strands and of the derivatized nucleotides within the loops is performed at the same time. The method of detection is preferably by fluorescence (Kimura, K., Nakanishi, M., Yamamoto, T., and Tsuboi, M. (1977). A correlation between the secondary structure of DNA and the reactivity of adenine residues with chloroacetaldehyde. Journal of Biochemistry, 81(6): 1699-703.), incorporated by reference. Alternative embodiments include chemiluminescence (Martin, R., Hoover, C., Grimme, S., Grogan, C., Holtke, J., and Kessler, C.F. (1990). BioTechniques, 9(6): 762-8), incorporated by reference, electrochemical coupling using silicon surfaces (Briggs, J., Kung, V.T., Gomez, B., Kasper, K.C., Nagainis, P.A., Masino, R.S., Rice, L.S., Zuk, R.F., and Ghazarossian, V.E. (1990). Sub-femtomole quantitation of proteins with Threshold, for the biopharmaceutical industry. Biotechniques, 9(5): 598-606. Kung, V.T., Panfili, P.R., Sheldon, E.L., King, R.S., Nagainis, P.A., Gomez, B.J., Ross, D.A., Briggs, J., and Zuk, R.F. (1990). Picogram quantitation of total DNA using DNA-binding proteins in a silicon sensor-based system. Analytical Biochemistry, 187(2): 220-7. Olson, J.D., Panfili, P.R., Armenta, R., Femmel, M.B., Merrick, H., Gumperz, J., Goltz, M., and Zuk, R.F. (1990). A silicon sensor-based filtration immunoassay using biotin-mediated capture. Journal of Immunological Methods, 134(1): 71-9. Olson, J.D., Panfili, P.R., Zuk, R.F., and Sheldon, E.L. (1991). Quantitation of DNA hybridization in a silicon sensor-based system: application to PCR. Molecular & Cellular Probes, 5(5): 351-8.), incorporated by reference, and immunochemical reagents such as antibody-enzyme conjugates (Eberle, G., Barbin, A., Laib, R.J., Ciroussel, F., Thomale, J., Bartsch, H., and Rajewsky, M.F. (1989). 1,N6-etheno-2'-deoxyadenosine and 3,N4-etheno-2'-deoxycytidine detected by monoclonal antibodies in lung and liver DNA of rats exposed to vinyl chloride. Carcinogenesis, 10(1): 209-12. Foiles, P.G., Miglietta, L.M., Nishikawa, A., Kusmierek, J.T., Singer, B., and Chung, F.L. (1993). Development of monoclonal antibodies specific for 1,N2-ethenodeoxyguanosine and N2,3-ethenodeoxyguanosine and their use for quantitation of adducts in G12 cells exposed to chloroacetaldehyde. Carcinogenesis, 14(1): 113-6. Palecek, E., and Hung, M.A. (1983). Determination of nanogram quantities of osmium-labeled nucleic acids by stripping (inverse) voltammetry. Analytical Biochemistry, 132(2): 236-42.), incorporated by reference.
In this Step 4a detection, the etheno derivatives (primarily the ethenoadenine residues) within the loops are measured with the fluorimeter of the apparatus: excitation at 310 nm, and emission at 410 nm. The degree of fluorescence and the sensitivity of the fluorimeter are calibrated with a quinine sulfate standard (10⁻⁵ to 10⁻⁷ M in 0.1 N H2SO4). The amount of direct etheno fluorescence is increased by a factor of 2 by completely digesting the samples with DNase I and phosphodiesterase, provided a gel overlay is used to prevent diffusion of the signals and disruption of the two-dimensional array of markers. The number of heteroduplexes is determined by the unique fluorescence of the adduct that was initially linked to the Left primers. Rhodamine, fluorescein or isothiocyanine derivatives can all be used to obtain intense fluorescent signals that can be measured separately from the fluorescence of the etheno adducts. Standard programs quantitate the two different signals by analyzing two or more regions of the emission and/or excitation spectra. Alternative detection methods for the etheno-derivatives include the use of specific monoclonal antibodies (Eberle, G., Barbin, A., Laib, R.J., Ciroussel, F., Thomale, J., Bartsch, H., and Rajewsky, M.F. (1989). 1,N6-etheno-2'-deoxyadenosine and 3,N4-etheno-2'-deoxycytidine detected by monoclonal antibodies in lung and liver DNA of rats exposed to vinyl chloride. Carcinogenesis, 10(1): 209-12. Foiles, P.G., Miglietta, L.M., Nishikawa, A., Kusmierek, J.T., Singer, B., and Chung, F.L. (1993). Development of monoclonal antibodies specific for 1,N2-ethenodeoxyguanosine and N2,3-ethenodeoxyguanosine and their use for quantitation of adducts in G12 cells exposed to chloroacetaldehyde. Carcinogenesis, 14(1): 113-6.), incorporated by reference, conjugated to a chemiluminescent (including horseradish peroxidase or beta-galactosidase) or electrochemical (urease and silicon-detector system) detection method.
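As an illustration only of how two overlapping fluorescent signals might be separated from readings taken in two spectral regions, the following minimal Python sketch solves a two-channel linear mixing model. The function name, the response coefficients, and the assumption that the channels mix linearly are hypothetical and are not taken from this description; the standard programs mentioned above may work differently.

```python
def unmix_two_signals(reading_a, reading_b, response):
    """Separate two fluorophore amounts from two channel readings.

    response[i][j] is an assumed, pre-calibrated contribution of one unit of
    fluorophore j (0 = etheno adduct, 1 = primer label) to channel i.
    The 2x2 linear system is solved by Cramer's rule.
    """
    (a11, a12), (a21, a22) = response
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the two channels cannot distinguish the fluorophores")
    etheno = (reading_a * a22 - a12 * reading_b) / det
    label = (a11 * reading_b - a21 * reading_a) / det
    return etheno, label
```

The etheno value would estimate the loop signal and the label value the number of heteroduplex strands, under the stated linearity assumption.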
In this Step 4a detection, residues within a mismatch loop display differing degrees of reactivity to the modifying reagents, and closely spaced ethenoderivatives interact with one another (including by fluorescence quenching and energy transfer). This is why the fluorescence of an etheno derivative in a polynucleotide is approximately half that of the free ethenonucleotide. Thus, systematic labeling is used to calibrate the fluorescent signal for each size of mismatch loop, thereby compensating for the nonlinearity of the fluorescent signal with respect to the loop size.
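The calibration described above can be sketched as a lookup against a table of standards. The following Python fragment is a minimal illustration, assuming a hypothetical calibration table of (loop size in bases, per-strand signal) pairs whose signal increases strictly with loop size; the function name and the use of linear interpolation between standards are assumptions, not part of this description.

```python
from bisect import bisect_left

def loop_size_from_signal(per_strand_signal, calibration):
    """Convert a per-strand loop signal into a loop size (in bases).

    calibration: list of (loop_size_bases, per_strand_signal) pairs, sorted by
    loop size, with strictly increasing signal (an assumed property).
    Values outside the calibrated range are clamped to the nearest standard.
    """
    sizes = [size for size, _ in calibration]
    signals = [signal for _, signal in calibration]
    if per_strand_signal <= signals[0]:
        return float(sizes[0])
    if per_strand_signal >= signals[-1]:
        return float(sizes[-1])
    i = bisect_left(signals, per_strand_signal)
    lo_sig, hi_sig = signals[i - 1], signals[i]
    fraction = (per_strand_signal - lo_sig) / (hi_sig - lo_sig)
    # Interpolate linearly between the two neighbouring standards.
    return sizes[i - 1] + fraction * (sizes[i] - sizes[i - 1])
```

Because the table is measured per loop size, a nonlinear response (for example, from quenching between adjacent adducts) is absorbed into the table rather than modeled explicitly.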
In Step 4a*, an alternative embodiment of the heteroduplex loop detection is accomplished by incorporating labeled nucleotides during the Step 2a PCR synthesis, and then in Step 3l* digesting them out of the single-stranded loops of the heteroduplex. Incorporating labeled nucleotides (e.g., fluorescently or radioactively, using appropriate triphosphate deoxynucleotide precursors) gives greater signal strength than, and is therefore preferable to, direct measurement of the liberated nucleotides by optical density. The quantity of detectable freed label corresponds to the loop size. The digestion is done using a single-strand specific nuclease, such as S1 nuclease from Aspergillus oryzae (Dodgson, J.B., and Wells, R.D. (1977). Action of single-strand specific nucleases on model DNA heteroduplexes of defined size and sequence. Biochemistry, 16(11): 2374-9. Gite, S., and Shankar, V. (1992). Characterization of S1 nuclease. Involvement of carboxylate groups in metal binding. European Journal of Biochemistry, 210(2): 437-41. Shenk, T.E., Rhodes, C., Rigby, P.W., and Berg, P. (1975). Biochemical method for mapping mutational alterations in DNA with S1 nuclease: the location of deletions and temperature-sensitive mutations in simian virus 40. Proceedings of the National Academy of Sciences of the United States of America, 72(3): 989-93. Wiegand, R.C., Godson, G.N., and Radding, C.M. (1975). Specificity of the S1 nuclease from Aspergillus oryzae. Journal of Biological Chemistry, 250(22): 8848-55.), incorporated by reference, native micrococcal nuclease (Chambers, S.A., and Rill, R.L. (1984). Enrichment of transcribed and newly replicated DNA in soluble chromatin released from nuclei by mild micrococcal nuclease digestion. Biochimica et Biophysica Acta, 782(2): 202-9. Galcheva, G.Z., Davidov, V., and Dessev, G. (1985). Formation of single-stranded regions in the course of digestion of DNA with DNAase II and micrococcal nuclease. Archives of Biochemistry & Biophysics, 240(1): 464-9.), incorporated by reference, or modified micrococcal nuclease (Corey, D.R., Pei, D., and Schultz, P.G. (1989). Generation of a catalytic sequence-specific hybrid DNase. Biochemistry, 28(21): 8277-86. Pei, D., Corey, D.R., and Schultz, P.G. (1990). Site-specific cleavage of duplex DNA by a semisynthetic nuclease via triple-helix formation. Proceedings of the National Academy of Sciences of the United States of America, 87(24): 9858-62.), incorporated by reference. When an apparatus is used that permits commingling of contents from different chambers, the spatial separation of the released nucleotides is maintained by performing the nuclease reaction in a gel overlay of the polystyrene plate. The gel prevents diffusion of the released nucleotides. (Diffusion is not an issue with direct detection of chemically modified nucleotides.) Alternatively, the polystyrene plate is placed into a plastic manifold, recreating 384 separate chambers.
In an alternative embodiment of the Step 4 detection, chemical modification is combined with specific nuclease treatment. S1 or micrococcal nuclease can be used to enhance the fluorescence of the etheno-derivatized adenosines generated by the chloroacetaldehyde reaction. This provides two sets of measures of the same residues, thus increasing accuracy and sensitivity. The nuclease treatment can also be used alone to liberate nucleotides from the loop. These free nucleotides are then separated from the retained double-stranded DNA of the heteroduplexes and quantitated. The spatial orientation of the reactions must be preserved as the nucleotides are released. This is done by performing the nuclease reaction in a gel, such as polyacrylamide on a solid backing (available from FMC Corporation), or by fitting a manifold over the streptavidin plate to contain the solutions with the nuclease and free nucleotides. To use the polyacrylamide gel plate, one takes a 0.1 - 0.5 mm polyacrylamide gel (ranging from 4-15%) bound to a plastic backing. The gel is slightly dehydrated, with minimal surface moisture. The nuclease solution is applied to the surface of the gel (the amount of S1 or micrococcal nuclease must be titrated for the enzyme lot) and the gel is placed over the surface of the streptavidin plate to which the heteroduplexes are bound. After incubating for 10-45 minutes (preferably 15 minutes) at room temperature to 37°C (preferably at 37°C), the gel layer is removed and the nucleotides embedded within the gel are quantitated by fluorescence, two-dimensional radioactivity counting, autoradiography, or immunochemical assays.
An alternative detection mechanism is described in Steps 4b, 4c, and 4d. The nucleotides within the heteroduplex loops are detected by distinguishing these nucleotides from those that are contained within the double-stranded portions of the DNA strands. In the preferred embodiment, the chemical modification agent chloroacetaldehyde, which selectively reacts with the exposed nucleotides within the loops, is employed to specifically modify the C and A nucleotides within the heteroduplex loops. This reagent is preferable to other chemical modification agents such as hydroxylamine, bisulfite, and osmium tetroxide because of its ease of use, and the fact that the derivatized nucleotides are fluorescent, while the chemical reagent and the unmodified nucleotides are not fluorescent. These other chemical methods represent alternative embodiments, with reagent conditions and detection methods adjusted accordingly as described by established techniques (Cotton, R.G. (1993). Current methods of mutation detection. Mutation Research, 285(1): 125-44. Ganguly, A., and Prockop, D.J. (1990). Detection of single-base mutations by reaction of DNA heteroduplexes with a water-soluble carbodiimide followed by primer extension: application to products from the polymerase chain reaction. Nucleic Acids Research, 18(13): 3933-9. Glikin, G.C., Vojtiskova, M., Rena, D.L., and Palecek, E. (1984). Osmium tetroxide: a new probe for site-specific distortions in supercoiled DNAs. Nucleic Acids Research, 12(3): 1725-35. Hayatsu, H. (1976). Reaction of cytidine with semicarbazide in the presence of bisulfite. A rapid modification specific for single-stranded polynucleotide. Biochemistry, 15(12): 2677-82. Jelen, F., Karlovsky, P., Makaturova, E., Pecinka, P., and Palecek, E. (1991). Osmium tetroxide reactivity of DNA bases in nucleotide sequencing and probing of DNA structure. General Physiology & Biophysics, 10(5): 461-73. Lilley, D.M. (1983). Structural perturbation in supercoiled DNA: hypersensitivity to modification by a single-strand-selective chemical reagent conferred by inverted repeat sequences. Nucleic Acids Research, 11(10): 3097-112. Smooker, P.M., and Cotton, R.G. (1993). The use of chemical reagents in the detection of DNA mutations. Mutation Research, 288(1): 65-77. Tindall, K.R., and Whitaker, R.A. (1991). Rapid localization of point mutations in PCR products by chemical (HOT) modification. Environmental & Molecular Mutagenesis, 18(4): 231-8.), incorporated by reference. In an alternative embodiment, a detection amplification method, such as immunodetection of the adducts using a urease-conjugate and a silicon-based detection of a pH shift, contacts the polystyrene surface with an electronic silicon detector and a urea-containing gel interface using existing methods (Briggs, J., Kung, V.T., Gomez, B., Kasper, K.C., Nagainis, P.A., Masino, R.S., Rice, L.S., Zuk, R.F., and Ghazarossian, V.E. (1990). Sub-femtomole quantitation of proteins with Threshold, for the biopharmaceutical industry. Biotechniques, 9(5): 598-606. Kung, V.T., Panfili, P.R., Sheldon, E.L., King, R.S., Nagainis, P.A., Gomez, B.J., Ross, D.A., Briggs, J., and Zuk, R.F. (1990). Picogram quantitation of total DNA using DNA-binding proteins in a silicon sensor-based system. Analytical Biochemistry, 187(2): 220-7. Olson, J.D., Panfili, P.R., Armenta, R., Femmel, M.B., Merrick, H., Gumperz, J., Goltz, M., and Zuk, R.F. (1990). A silicon sensor-based filtration immunoassay using biotin-mediated capture. Journal of Immunological Methods, 134(1): 71-9. Olson, J.D., Panfili, P.R., Zuk, R.F., and Sheldon, E.L. (1991). Quantitation of DNA hybridization in a silicon sensor-based system: application to PCR. Molecular & Cellular Probes, 5(5): 351-8.), incorporated by reference.
In Step 5, the genotypes are determined for every STR. The two signals for each locus represent the sum of and the difference between the alleles. When compared with predetermined calibration tables, this representation becomes quantitative. One allele is computed by adding the sum and difference values and then dividing by two, and the second allele is computed by subtracting the difference value from the sum value and then dividing by two. This genotype determination is done for every locus.
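The Step 5 arithmetic can be written out directly. The sketch below is a minimal Python illustration, assuming the sum and difference have already been converted to repeat units through the calibration tables; the function and parameter names are hypothetical.

```python
def alleles_from_sum_and_difference(allele_sum, allele_difference):
    """Recover two allele sizes from their calibrated sum and difference.

    Both inputs are assumed to be expressed in STR repeat units, with
    allele_difference >= 0.
    """
    larger = (allele_sum + allele_difference) / 2
    smaller = (allele_sum - allele_difference) / 2
    return larger, smaller
```

For example, a sum of 30 repeat units and a difference of 4 repeat units give alleles of 17 and 13 repeats; a difference of zero indicates a homozygous locus.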
While the foregoing method has been described for the measurement of loop mismatches as a technique for distinguishing the alleles of STRs, the same approach is applicable to detecting specific gene alleles and mutations. For mutation detection, chemical modification by CAA as well as by other reagents at the site of the basepair mismatch creates a detectable signal. The use of a bound oligonucleotide to create a solid-state detection of specific alleles has been described (Giorda, R., Lampasona, V., Kocova, M., and Trucco, M. 1993. Non-Radioisotopic Typing of Human Leukocyte Antigen Class II Genes on Microplates. BioTechniques, 15(5): 918-925. Lemna, W.K., Feldman, G.L., Kerem, B.-S., Fernbach, S.D., Zevkovich, E.P., O'Brien, W.E., Riordan, J.R., Collins, F.S., Tsui, L.-C., and Beaudet, A.L. 1990. Mutation analysis for heterozygote detection and the prenatal diagnosis of cystic fibrosis. N. Engl. J. Med., 322: 291-296.), incorporated by reference, and the use of chemical reagents to modify the sites of basepair mismatch is also well-described (Cotton, R.G. (1993). Current methods of mutation detection. Mutation Research, 285(1): 125-44.), incorporated by reference. The invention described herein combines chemical modification techniques with solid-state detection in a novel manner different from any existing gel electrophoresis method.
From the resulting dense genotyping data, the descent of chromosomal segments within families and populations can be traced. This is because the number of recombinations is small compared with the linear sampling density of the chromosomes. Hence, agreement of alleles at many consecutive closely-spaced markers having a high polymorphism information content (PIC) value (Botstein, D., White, R.L., Skolnick, M.H., and Davies, R.W. 1980. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am. J. Hum. Genet., 32: 314-31.), incorporated by reference, serves as a signature (with extremely high probability) in the population for a unique linear segment of chromosome. In fact, with sufficiently dense spacing (as described here), loci having much less informative PIC values can be used.
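For reference, the PIC value of a marker can be computed from its allele frequencies using the definition given in the Botstein et al. (1980) paper cited above: PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². The Python sketch below is illustrative; the function name and the input validation are assumptions.

```python
from itertools import combinations

def polymorphism_information_content(allele_frequencies):
    """Compute PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    if abs(sum(allele_frequencies) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_frequencies)
    # Correction for matings in which both parents carry the same heterozygous genotype.
    correction = sum(
        2 * (p * p) * (q * q) for p, q in combinations(allele_frequencies, 2)
    )
    return 1.0 - homozygosity - correction
```

A marker with two equally frequent alleles has a PIC of 0.375, while a marker with many alleles of similar frequency approaches 1, which is why multi-allelic STRs are highly informative.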
Phenotypic data is gathered on the individuals, animals, or plants which are genotyped. For humans, this includes the basic medical examination: history, physical, and laboratory data. Additional phenotypic markers for various genetic diseases (e.g., creatine kinase for Duchenne muscular dystrophy) can also be collected. Environmental risks and exposures are also recorded.
The genes associated with phenotypic traits are then localized on the genome. This analysis can be done by linkage (Ott, J. 1991. Analysis of Human Genetic Linkage, Revised Edition. Baltimore, Maryland: The Johns Hopkins University Press. Feingold, E., Brown, P.O., and Siegmund, D. 1993. Gaussian Models for Genetic Linkage Analysis Using Complete High-Resolution Maps of Identity by Descent. Am. J. Hum. Genet., 53: 234-252.), incorporated by reference, affected pedigree member (Weeks, D.E., and Lange, K. 1988. The affected pedigree member method of linkage analysis. Am. J. Hum. Genet., 42: 315-326. Weeks, D.E., and Lange, K. 1992. A multilocus extension of the affected-pedigree-member method of linkage analysis. Am. J. Hum. Genet., 50: 859-868.), incorporated by reference, affected relative pairs (Risch, N. 1990. Linkage strategies for genetically complex traits (in three parts). Am. J. Hum. Genet., 46: 222-253.), incorporated by reference, inclusion/exclusion (Perlin, M.W., and Chakravarti, A. 1993. Efficient Construction of High-Resolution Physical Maps from Yeast Artificial Chromosomes using Radiation Hybrids: Inner Product Mapping. Genomics, 18: 283-289.), incorporated by reference, association, homozygosity mapping (Ben Hamida, C., Doerlinger, N., Belal, S., Linder, C., and Reutenauer, L. 1993. Localization of Friedreich ataxia phenotype with selective vitamin E deficiency to chromosome 8q by homozygosity mapping. Nature Genetics, 5: 195-200. Pollak, M.R., Chou, Y.-H.W., Cerda, J.J., Steinmann, B., LaDu, B.N., Seidman, J.G., and Seidman, C.E. 1993. Homozygosity mapping of the gene for alkaptonuria to chromosome 3q2. Nature Genetics, 5: 201-4.), incorporated by reference, linkage disequilibrium, and other genetic localization techniques (Emery, A.E.H. 1986. Methodology in Medical Genetics: an introduction to statistical methods, Second Edition. Edinburgh: Churchill Livingstone. Vogel, F., and Motulsky, A.G. 1986. Human genetics: Problems and Approaches, Second Edition. Berlin: Springer-Verlag.), incorporated by reference. The result is one or more (with polygenic disease) peaks appearing at specific locations on the chromosome that both suggest specific gene regions and provide a signature pattern for phenotypic risk. With dense STS sampling along the genome (i.e., x-axis), and large numbers of individuals tested at these STSs, with each STS's allele given a combined score (i.e., on a y-axis), the conventional limitations of statistical linkage analysis are overcome, and the process becomes akin to signal processing of genetic data in order to separate delta functions (i.e., the causative genes) from the background noise. That is, in addition to conventional linkage analysis, a method based on superimposing genetic information from many related individuals as one-dimensional signals (along a genome) will accurately identify recurring genome locations by where the peaks occur. This method is described in figure 12. Importantly, this methodology will work well with complex multigenic multifactorial diseases (Lander, E.S., and Botstein, D. 1986. Mapping Complex Genetic Traits in Humans: New Methods Using a Complete RFLP Linkage Map. In Cold Spring Harbor Symposia on Quantitative Biology, 49-62. vol. LI, Cold Spring Harbor, Cold Spring Harbor Laboratory.), incorporated by reference, and not just single gene Mendelian inherited diseases. These complex diseases include all the most common diseases, such as cancer, heart disease, vascular disease, diabetes, glaucoma, and lung disease (King, R.A., Rotter, J.I., and Motulsky, A.G., ed. 1992. The Genetic Basis of Common Diseases. New York, NY: Oxford University Press.), incorporated by reference.
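The superposition idea can be illustrated with a small Python sketch. It assumes, hypothetically, that each individual contributes a list of per-marker scores ordered along the genome, that scores are combined by simple addition, and that a peak is a local maximum at or above a chosen threshold; none of these specifics (nor the function names) are taken from this description or from figure 12.

```python
def superimpose(signals):
    """Sum per-marker scores across individuals (one list per individual)."""
    return [sum(column) for column in zip(*signals)]

def find_peaks(combined, threshold):
    """Return marker indices that are local maxima at or above threshold."""
    peaks = []
    last = len(combined) - 1
    for i, value in enumerate(combined):
        left = combined[i - 1] if i > 0 else float("-inf")
        right = combined[i + 1] if i < last else float("-inf")
        # Strictly greater on the left so a flat plateau is reported once.
        if value >= threshold and value > left and value >= right:
            peaks.append(i)
    return peaks
```

With enough individuals, scores at unlinked markers tend to average out while scores near a causative gene accumulate, which is the sense in which peaks are separated from background noise.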
Risks of trait inheritance or disease can then be determined by probabilistic (e.g., Bayesian) techniques (Young, I.D. 1991. Introduction to Risk Calculation in Genetic Counselling. Oxford: Oxford University Press.), incorporated by reference, that correlate the available genotypic and phenotypic data and environmental factors with chance of disease occurrence. In particular, the signatures of causative gene locations deduced from the population can be applied to each individual to ascertain risk. For animal and plant studies, one or more genetic loci can be associated with specific (desirable or undesirable) traits such as milk production or disease resistance. This information can be used for selective breeding.
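A minimal sketch of the kind of Bayesian update used in such risk calculations follows, assuming a single binary hypothesis (for example, carrier versus non-carrier) and one piece of evidence. The function name, parameter names, and example numbers are illustrative assumptions.

```python
def posterior_risk(prior, likelihood_if_true, likelihood_if_false):
    """Apply Bayes' rule to a binary hypothesis.

    prior: probability of the hypothesis before the evidence.
    likelihood_if_true / likelihood_if_false: probability of observing the
    evidence when the hypothesis is true or false, respectively.
    """
    joint_true = prior * likelihood_if_true
    joint_false = (1.0 - prior) * likelihood_if_false
    total = joint_true + joint_false
    if total == 0:
        raise ValueError("evidence is impossible under both hypotheses")
    return joint_true / total
```

For instance, a prior risk of 1/2 combined with evidence that is four times less likely under the hypothesis (0.25 versus 1.0) yields a posterior risk of 1/5.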
Once the risks have been computed for an individual (in the context of his or her family) for all known disease entities, they can be sorted in descending order of likelihood and severity. The entities appearing at the top of this list are precisely those diseases that this individual has the greatest risk of developing. By moderating the environmental factors of these entities, including diagnostic, therapeutic, and preventative measures, the risks of these diseases can be reduced. This enables true cost-effective implementation of preventive health care: full customization to the genomic composition of each patient.
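The ranking step can be sketched as follows in Python. The record type, its field names, the 0-to-1 severity scale, and the choice to order by likelihood first and severity second are all assumptions made for illustration; the description does not specify how the two criteria are combined.

```python
from dataclasses import dataclass

@dataclass
class DiseaseRisk:
    name: str
    likelihood: float  # estimated probability of developing the disease
    severity: float    # hypothetical 0-to-1 severity score

def rank_risks(risks):
    """Sort risks in descending order of likelihood, then severity."""
    return sorted(risks, key=lambda r: (r.likelihood, r.severity), reverse=True)
```

A weighted product of likelihood and severity would be an equally plausible ordering key; only the key function would change.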
The techniques of genotyping and phenotypic correlation can be similarly applied to the task of disease gene identification. Exploiting dense genotypic data is particularly advantageous over existing techniques in localizing the genes of complex multigenic diseases. Once genes have been localized on the genetic map, use of an integrated genetic/physical genome map allows the positional cloning (Kerem, B.-S., Rommens, J.M., Buchanan, J.A., Markiewicz, D., Cox, T.K., Chakravarti, A., Buchwald, M., and Tsui, L.-C. 1989. Identification of the cystic fibrosis gene: genetic analysis. Science, 245: 1073-1080. Riordan, J.R., Rommens, J.M., Kerem, B.-S., Alon, N., Rozmahel, R., Grzelczak, Z., Zielenski, J., Lok, S., Plavsic, N., Chou, J.-L., Drumm, M.L., Iannuzzi, M.C., Collins, F.S., and Tsui, L.-C. 1989. Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science, 245: 1066-1073.), incorporated by reference, of the causative genomic materials. As more genes are mapped, the task increasingly becomes the association of known genes with specific traits and disease, rather than the isolation of new genes.
The storage and safeguarding of this genetic information requires large, secure memory devices. These can restrict access to just those persons given authority by the individual. In one embodiment, individuals are given CD-ROMs containing the genetic information of themselves and their relatives, with access restricted by encryption and passwords, so that each individual can only directly grant access to information about themselves. Alternatively, a centralized data service can provide this secure information.
With the large amount of genotypic, phenotypic, and risk assessment data obtained, the results of the customized risk analysis must be presented in a coherent fashion to the patient. This is done with the assistance of genetic counselors (Emery, A.E.H., and Rimoin, D.L., ed. 1983. Principles and practice of medical genetics. Edinburgh: Churchill Livingstone.), incorporated by reference, or clinical geneticists, or with a computer-based system that replicates this expertise. What is essential is to keep the bulk of the megabytes of information and low risk diseases in the background, and only bring to the patient's attention the most relevant risk and prevention information.
D. Method for Genotyping STRs using Loop Mismatch
The STR loop mismatch method employs heteroduplex hybridizations to directly measure the STR allele repeat number n. Consider the two alleles at a given STR locus as the complementary strands in a heteroduplex DNA molecule. Suppose that one strand S contains s STR repeat units and its mismatched complementary strand T' contains t STR repeat units. (Notation: U' denotes the complementary strand of sequence U.) Each STR repeat unit consists of k nucleotides. Assume that the left and right flanking regions are identical (i.e., perfectly complementary). When the hybridization product ST' is formed, if s=t (i.e., identical STR alleles), then there is a perfect match of the duplex DNA. If, however, s≠t (i.e., different STR alleles), then a heteroduplex is formed that has a loop of single-stranded nucleic acid (SS-DNA).
D.1. Method for Genotyping STRs using Loop Modification
Referring to figure 4, for s>t, the loop structure seen in subfigure 4A is formed. Here, subsequence L 402 is the left flanking region (with subsequence L' 404 complementary) and subsequence R 406 is the right flanking region (with subsequence R' 408 complementary). Crucially, the s-t extra STR units form a single-stranded loop 410 of size (s-t)*k bases. Energetically, only one such loop is expected (Ninio, J. 1979. Biochimie, 61: 1133. Salser 1977. Cold Spring Harbor Symp. Quant. Biol., 42: 985.), incorporated by reference; however, multiple loops would in no way change the results.
For s<t, the complementary structure shown in subfigure 4B is formed. Here, the single-stranded loop 412 of size (t-s)*k bases is on the complementary strand.
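The loop geometry for the two cases can be summarized in a few lines of Python. This is a sketch only; the function name and the string labels for the strands are hypothetical.

```python
def loop_for_heteroduplex(s, t, k):
    """Return (loop size in bases, strand carrying the loop) for heteroduplex ST'.

    s: repeat units on strand S; t: repeat units on strand T';
    k: nucleotides per repeat unit.
    """
    if s < 0 or t < 0 or k <= 0:
        raise ValueError("repeat counts must be >= 0 and k must be > 0")
    if s > t:
        return (s - t) * k, "S"   # loop 410 on the S strand (subfigure 4A)
    if s < t:
        return (t - s) * k, "T'"  # loop 412 on the complementary strand (subfigure 4B)
    return 0, None                # identical alleles: perfect duplex, no loop
```

For a CA-repeat (k = 2) with s = 15 and t = 10, the loop is 10 bases on the S strand.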
The key idea is this: by detecting the size of the single-stranded loop 410 or 412, the value s-t (or t-s) can be determined. By comparing two unknown alleles with a known standard, and by also comparing the two alleles with respect to each other, these loop size measurements will precisely determine the two alleles, i.e., the genotype at the STR locus.
The signal strength from a loop of single-stranded DNA is proportional to the number of unmatched nucleotides in the heteroduplex ST'. This signal is measured by means of a first label (*) that corresponds to the number of unmatched nucleotides in the loop of ST'. This label is measured by means of a physical detection that preferentially detects specific nucleotides in single-stranded DNA.
In the most preferred "chemical modification" embodiment, the nucleotides in the S strand of the heteroduplex molecule are chemically modified after the PCR synthesis. The modification to these nucleotides renders them detectable (e.g., by fluorescence). The measured fluorescence of these modified S nucleotides is proportional to the size of the loop mismatch s-t. In another preferred "synthesis/digestion" embodiment, the nucleotides in the S strand of the heteroduplex molecule are labeled (radiolabeled, or other detectable means) and then incorporated during the PCR synthesis. Subsequent digestion with an S1-like endonuclease separates the mismatched (and labeled) S nucleotides from the heteroduplex. The measured signal of these released S nucleotides is proportional to the size s-t of the loop mismatch prior to enzymatic digestion.
Means of physically detecting a quantitative signal for determining the loop size include: radioactivity, fluorescence, optical density, ionic concentration, electromagnetic conductivity or susceptibility, electrochemical coupling, or other detection assays (all referred to previously in this description).
The loop size is determined by the ratio of (1) the measured single-stranded loop signal strength to (2) the measured number of strands having a loop. Therefore, in addition to detecting loop size, accurate quantitation also requires determining the number of heteroduplex strands with measurable loops. This is done using an independent second label (#) on the S strands of the heteroduplex molecules. This label is comprised of a detectable molecule attached to the PCR primer of the S strand; subsequent measurement of this molecule quantifies the number of strands in heteroduplexes. Although this loop mismatch method applies to all VNTRs of the form LWⁿR, the following discussion assumes throughout an STR with W="CA". This is done solely to clarify the presentation, since the loop mismatch approach will work with any STR or VNTR locus, and on any linear nucleic acid (i.e., DNA, RNA, and hybrid polymers).
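The ratio described above can be sketched in Python as follows. The sketch assumes a linear response, in which each repeat unit in the loop contributes a fixed, pre-calibrated amount of (*) signal per strand; that linearity, the function name, and the parameter names are illustrative assumptions.

```python
def repeat_units_in_loop(loop_signal, strand_signal, signal_per_repeat_unit):
    """Estimate the number of repeat units in the loop (|s - t|).

    loop_signal: measured first-label (*) signal from the loops.
    strand_signal: measured second-label (#) signal, taken as proportional to
        the number of heteroduplex strands.
    signal_per_repeat_unit: assumed calibrated (*) signal per repeat unit per
        unit of strand signal.
    """
    if strand_signal <= 0 or signal_per_repeat_unit <= 0:
        raise ValueError("strand_signal and signal_per_repeat_unit must be > 0")
    per_strand_signal = loop_signal / strand_signal
    return per_strand_signal / signal_per_repeat_unit
```

Normalizing by the strand signal makes the estimate independent of how much PCR product was bound in a given chamber.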
To further clarify the presentation, the loop with label (*) is indicated by A*s, which in the most preferred embodiment represent adenosine nucleotides on the single-stranded loop that are chemically modified by chloroacetaldehyde into a detectable state. The presentation is written to be compatible with another preferred embodiment, wherein the A*s represent labeled (e.g., radiolabeled) nucleotides that are incorporated during PCR synthesis, and are then detected following endonuclease digestion.
To determine a single allele (e.g., homozygous or hemizygous locus), the experiment consists of performing a PCR amplification of an unknown CA-repeat locus source S of the form L(CA)ₛR, and hybridizing it to a known complementary oligonucleotide target T' of the form [L(CA)ₜR]' in order to induce mismatch and quantitatively measure the loop.
Referring to figure 5, in Step 1 a CA-repeat locus molecule is selected for analysis, and is defined by its unique left and right oligonucleotide primers. The primers are synthesized with appropriate labeling and linking modifications (Haralambidis, J., Duncan, L., Angus, K., and Tregear, G. (1990). The synthesis of polyamide-oligonucleotide conjugate molecules. Nucleic Acids Research, 18(3): 493-9. Nelson, P.S., Kent, M., and Muthini, S. (1992). Oligonucleotide labeling methods. 3. Direct labeling of oligonucleotides employing a novel, non-nucleosidic, 2-aminobutyl-1,3-propanediol backbone. Nucleic Acids Research, 20(23): 6253-9. Roget, A., Bazin, H., and Teoule, R. (1989). Synthesis and use of labelled nucleoside phosphoramidite building blocks bearing a reporter group: biotinyl, dinitrophenyl, pyrenyl and dansyl. Nucleic Acids Research, 17(19): 7643-51. Schubert, F., Cech, D., Reinhardt, R., and Wiesner, P. (1992). Fluorescent labelling of sequencing primers for automated oligonucleotide synthesis. DNA Sequence, 2(5): 273-9. Theisen, P., McCollum, C., and Andrus, A. (1992). Fluorescent dye phosphoramidite labelling of oligonucleotides. Nucleic Acids Symposium Series, 1992(27): 99-100.), incorporated by reference.
In Step 2a, the target DNA TT' is constructed from a standard of known CA-repeat length t in a separate PCR experiment. The allele size t is chosen sufficiently small, say between 0 and 10, so that s>t is always guaranteed. Standard PCR amplification of genomically-derived or cloned DNA for 20-40 cycles is done using unlabeled primers and nucleotides, with a linker such as biotin on the right primer.
In Step 2b the source DNA SS' is constructed from sample genomic DNA via a PCR experiment. The CA-repeat locus molecule is defined by its unique left and right primers. A standard PCR amplification of genomically derived DNA is done for 20-40 cycles using labeled (#) left primer in the presence of A* labeled nucleotides. In Step 3a the SS' and TT' duplex molecules are denatured to form single stranded DNAs. When renatured in solution, the hybridization pairs
(S+T)(S+T)'
recombine to form
SS', ST', TS', and TT'.
The T strands of the TT' duplex are not detectable (since their loops match), and can be factored out of the analysis. For example, the T strands can be removed by attaching the TT' duplex to solid support via the linker of T', and then denaturing T from T', and washing to remove T, thus purifying T'. Alternatively, T can remain as a nondetectable competitive contaminant. Further, using an excess of SS' relative to TT' favors the production of ST' heteroduplexes. Therefore, the focus is on the hybridization pairs
(S)(S+T)'
which recombine to form
SS' + ST'.
The SS' contains no single-stranded loops, hence is not detectable. Further, since only the T' molecule has the linker for solid support, attaching the T' to a surface (e.g., the biotin of T' to a streptavidin-coated surface) and washing removes the SS' product. This leaves only ST' as a detectable (and useful) product.
Referring to subfigure 5A, the heteroduplex molecule is comprised of an upper strand 502 and a complementary lower strand 504. With s>t, the hybridization product is as shown in subfigure 5A.
(S) The upper source strand 502 is produced by a first PCR amplification of sample genomic DNA.
(S1) The single-stranded DNA loop 506 contains *-detectable A nucleotides. (Following chemical modification or by incorporation/digestion, the *-detectable A's are used to measure loop size via label *.)
(S2) A second label (#) 508 on the upper strand is for strand quantification, and is attached to the left PCR primer.
(T') The complementary lower target strand 504 is produced by a different PCR amplification of a known STR locus, or by direct synthesis. This lower strand has a linker 510 such as biotin attached to its 5' (right) end.
With s>t, the strands S and T' are perfectly matched everywhere but in the CA-repeat region. This mismatch produces a loop of size 2(s-t) containing precisely (s-t) A*'s. In the Step 3b chemical modification embodiment, the exposed A*s on the single-stranded DNA loop are chemically modified by chloracetaldehyde, as shown in subfigure 5B; in Step 4a, detecting the fluorescence from the first label (*) on the A* 512 measures the magnitude of s-t.
In the alternative Step 3c synthesis/digestion embodiment, the exposed A*s on the single-stranded DNA loop are digested from the heteroduplex into free A* 514 using an endonuclease, as shown in subfigure 5C; in Step 4b, the radioactive A* is then detected using a scintillation counter, thereby measuring the magnitude of s-t.
Only the upper S strands 516 have the second label (#) 518, so detection in this fluorescent molecule's wavelength in Step 4a or 4b measures the number of strands.
The allele is determined in Step 5. Calibrations done prior to the experiment ensure that these measurements provide precise quantitation. Since
label 1 (*) => (number of SS-DNA strands) * (s-t), and
label 2 (#) => (number of SS-DNA strands),
taking the calibrated ratio of label 1 (*) to label 2 (#) gives a measure of s-t. When only one allele s is measured (as in a hemizygotic or homozygotic locus, or with separated chromosomes), this determines the value s-t. Since s>t, the allele s is then determined by adding the known value t to the measured value (s-t).
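As an illustration only, the Step 5 calculation for a single allele can be sketched in Python. The function name `single_allele` and the calibration factors `k1` and `k2` are hypothetical and do not appear in the description; the sketch assumes both label signals respond linearly, so that their calibrated ratio equals s-t.

```python
def single_allele(label1_signal, label2_signal, t, k1=1.0, k2=1.0):
    """Estimate allele size s from the loop label (*) and strand label (#).

    k1 and k2 are hypothetical calibration factors (signal per labeled
    loop nucleotide and signal per strand); they are assumptions of this
    sketch, not values given in the description.
    """
    if label2_signal <= 0:
        raise ValueError("strand label signal must be positive")
    loop_total = label1_signal / k1   # proportional to strands * (s - t)
    strands = label2_signal / k2      # proportional to number of strands
    return t + loop_total / strands   # (s - t) + t = s


# Made-up signals: 100 strands, each with a loop of 12 labeled A*'s, t = 8.
print(single_allele(label1_signal=1200.0, label2_signal=100.0, t=8))  # 20.0
```

Keeping the two calibration factors separate mirrors the text, where loop signal and strand count are measured with independent labels.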
The determination of the allele sum s1+s2 is described next. For general genotyping, the heterozygotic case must be handled. Suppose that a CA-repeat locus is heterozygotic, comprised of two alleles having CA-repeat numbers s1 and s2 corresponding to their respective DNA strands S1 and S2. Referring to figure 5, performing the Steps 1 and 2 of the two PCR experiments and the Step 3 hybridization with strand T', the two products (S1+S2)T',
or
S1,T'; and S2,T'
are formed. These two species are present in equal concentrations.
Step 4 measures the mean s = [(s1-t)+(s2-t)]/2. Following calibration, Step 5 adds the known value t to s, forming the average (s1+s2)/2 of the alleles. Multiplying this average by 2 determines the allele sum s1+s2.
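The sum calculation can be sketched the same way. This is a minimal illustration under the assumption that the calibrated ratio of label 1 (*) to label 2 (#) equals the mean loop excess [(s1-t)+(s2-t)]/2; the function name `allele_sum` and the example signals are hypothetical.

```python
def allele_sum(label1_signal, label2_signal, t):
    """Return s1 + s2 from the calibrated sum-experiment signals."""
    if label2_signal <= 0:
        raise ValueError("strand label signal must be positive")
    mean_excess = label1_signal / label2_signal  # [(s1-t)+(s2-t)]/2
    mean_allele = mean_excess + t                # (s1+s2)/2
    return 2 * mean_allele


# Made-up example: s1 = 15, s2 = 19, t = 8, with 100 strands of each species.
# Loop signal = 100*(15-8) + 100*(19-8) = 1800; strand signal = 200.
print(allele_sum(1800.0, 200.0, 8))  # 34.0
```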
Determining the allele difference s2-s1. This experiment consists of performing a PCR amplification of an unknown CA-repeat locus with the (zero, one, or) two sources S1 and S2 of the form L(CA)s1R and L(CA)s2R, and hybridizing them against each other's complementary strands. This induces a loop mismatch proportional to |s2-s1|, which is then quantitatively measured.
Referring to figure 6, in Step 1 a CA-repeat locus molecule is selected for analysis, and is defined by its unique left and right oligonucleotide primers. The primers are located far enough away from the CA-repeat region to assure a sufficiently long linear stretch of DNA in the homoduplex; this is done to make the effect of different loop sizes on the free energy negligible. The rationale is that the flanking regions and the complementary CA/GT repeat regions have a total free energy that is proportional to the number of matching nucleotides, whereas the single-stranded DNA loop of the heteroduplex has a free energy that grows as the logarithm of the loop size (Ninio, J. 1979. Biochimie, 61: 1133. Salser 1977. Cold Spring Harbor Symp. Quant. Biol., 42: 985.), incorporated by reference. Thus, relative to the large region of matched double-stranded DNA, the free energy changes (and binding affinities) introduced by differing loop sizes are small.
In optional Step 2a, target DNA TT' is constructed from a standard of known CA-repeat length t in a separate PCR experiment. The allele size t is chosen sufficiently small, say between 0 and 10, so that s>t is always guaranteed. Standard PCR amplification of genomically-derived or cloned DNA for 20-40 cycles is done using unlabeled primers and nucleotides. No labels or linkers are used.
In Step 2b, the two source alleles are constructed simultaneously in one PCR experiment: each allele serves as the hybridization target for the other. A standard PCR amplification of genomically derived DNA is done for 20-40 cycles using labeled (#) left primer, and a right primer with a linker such as biotin, in the presence of A* labeled nucleotides.
Step 3a forms the heteroduplexes. The S1,S1' and S2,S2' homoduplex molecules are denatured to form single stranded DNAs. When renatured in solution, the hybridization pairs
(S1+S2)(S1+S2)'
recombine to form the four products
S1,S1'; S1,S2'; S2,S1'; and S2,S2'.
All four species are present in roughly equal concentrations. This is because of the DNA energetics described in Step 1, which assures DNA binding affinities of approximately equal strength.
Referring to subfigure 6A, with s2>s1, the hybridization product is as shown. The heteroduplex molecule constructed after PCR amplifying the sample genomic DNA, and rehybridizing, is comprised of an upper strand 602 and a complementary lower strand 604.
(S) In the upper source strand 602:
(S1) The single-stranded DNA loop 606 contains *-detectable A nucleotides. (Following chemical modification or by incorporation/digestion, the *-detectable A's are used to measure loop size via label *.)
(S2) A second label (#) 608 on the upper strand is for strand quantification, and is attached to the left PCR primer.
(S') The lower strand also has a linker 610 such as biotin attached to its 5' (right) end.
When s1=s2, S1 is the same molecule as S2, and the homoduplex S1,S1' is formed (the other three duplexes are equivalent). Since no mismatch occurs, there is no single-stranded loop, and the detection measures zero signal, corresponding to the case s2-s1=0.
When s1≠s2, without loss of generality assume that s1<s2. Consider the four hybridization cases:
(S1,S1') Homoduplex with no detectable signal.
(S2,S2') Homoduplex with no detectable signal.
(S1,S2') Since s1<s2, the mismatch loop is on the S2' strand, and is unlabeled, producing no detectable signal.
(S2,S1') Since s1<s2, the mismatch loop is on the S2 strand, and is labeled, producing a detectable signal.
(In another embodiment, the label is incorporated into both strands during the PCR by labeling the CA and/or the GT dN*'s. Hence, both the S1,S2' and S2,S1' heteroduplexes have detectable single-stranded loops. Since both have the same |s2-s1| loop size, there is a two- to four-fold increase in the desired measured signal.)
Incomplete hybridization results in single stranded lower DNAs S1' and S2' bound by biotin to the solid support. While there are no *-detectable As in the GT-repeat region of these lower strands, *-detectable A may be incorporated into the flanking regions during the PCR. In Step 3b these single-stranded segments are made nondetectable by DNA elimination and/or protection.
Elimination can be done using a single-strand specific 3' to 5' exonuclease that removes SS-DNA but not internal loops, such as E. coli exonuclease VII.
Protection is effected by generating nonlabeled upper T strands in Step 2a to form the double-stranded products
T,S1' and T,S2'.
The same short T strand with known allele size t (i.e., t<s1, and t<s2) used in figure 5 would work here in figure 6 as well. (Since t is smaller than both s1 and s2, the mismatch loops would be formed in the unlabeled GT-repeat region of the lower strands, hence would be undetectable.) Using just the left and right flanking regions L and R would also block the single-stranded flanking DNA, and have more favorable binding kinetics in that they would tend to not displace hybridized S1 and S2 strands.
These techniques can be combined for a more complete hybridization.
The hybridization of strands S2 and S1' is perfectly matched everywhere but in the CA-repeat region. This mismatch produces a loop of size 2*(s2-s1) containing precisely (s2-s1) A*'s. In Step 3c's chemical modification embodiment, the exposed A*s on the single-stranded DNA loop are chemically modified by chloracetaldehyde; in Step 4a detecting the fluorescence from the first label (*) on A*s measures the magnitude of s2-s1.
In Step 3d's alternative synthesis/digestion embodiment, the exposed A*s on the single-stranded DNA loop are digested from the heteroduplex into free A* using an endonuclease; in Step 4b the radioactive A* is then detected using a scintillation counter, thereby measuring the magnitude of s2-s1.
All the strands (S1 and S2) are labeled with the fluorescent label (#), so Step 4a or 4b's detection in this fluorescent molecule's wavelength measures the total number of strands from all four hybrids.
In Step 5, the allele difference is determined. Calibrations done prior to the experiment assure that these measurements provide precise quantitation. Since
label 1 (*) => (strands/4) * (s2-s1), and
label 2 (#) => strands,
taking four times the calibrated ratio of label 1 (*) to label 2 (#) gives a measure of s2-s1.
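A minimal Python sketch of this difference calculation follows. It assumes, as the text states, that the four hybrid species are present in equal amounts and that only the S2,S1' heteroduplex carries a labeled loop, hence the factor of four. The function name `allele_difference` and the example signals are hypothetical.

```python
def allele_difference(label1_signal, label2_signal, species=4):
    """Return s2 - s1 from the calibrated difference-experiment signals.

    Only one of the `species` equally abundant hybrids has a detectable
    loop, so the raw ratio is (s2 - s1) / species.
    """
    if label2_signal <= 0:
        raise ValueError("strand label signal must be positive")
    return species * label1_signal / label2_signal


# Made-up example: s1 = 15, s2 = 19, 100 strands in each of the four hybrids.
# Loop signal = 100 * (19 - 15) = 400; strand signal = 400.
print(allele_difference(400.0, 400.0))  # 4.0
```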
The genotype is computed from loop mismatch data, referring to figure 7, by combining the sum (from the figure 5 protocol) and difference (from the figure 6 protocol) of the allele sizes; this determination exploits the elimination of PCR stutter artifact by pooling within each experiment, as described below. Thus, the single experiment of Step 1 accurately measures the allele sum (s1+s2), and the single experiment of Step 2 accurately measures the allele difference |s2-s1|. Combining these in Step 3 determines the two alleles:
s1 = (sum - difference)/2
s2 = (sum + difference)/2
When fewer than two distinct alleles are present on the two chromosomes:
(0) zero alleles - both s1 and s2 are zero;
(1) one allele - s1 and s2 are equal (i.e., the difference is zero). The quantitation calibrated to other alleles shows whether one or two copies of the allele are present.
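The combination step can be sketched as follows. This is only an illustration: the function name `genotype` is hypothetical, and rounding the noisy measurements to the nearest whole repeat count is an assumption of the sketch rather than something the description specifies.

```python
def genotype(allele_sum, allele_diff):
    """Combine the measured sum and difference into the two allele sizes."""
    s1 = (allele_sum - allele_diff) / 2
    s2 = (allele_sum + allele_diff) / 2
    # Repeat counts are integers, so snap each estimate to the nearest one.
    return round(s1), round(s2)


print(genotype(34, 4))      # exact measurements
print(genotype(34.2, 3.9))  # slightly noisy measurements
print(genotype(30, 0))      # zero difference: a single distinct allele
```

With a zero difference the two returned values coincide, which corresponds to case (1) above.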
A detailed protocol is given for the loop mismatch method. The following steps referring to figure 8 are designed for measuring a single STR, rather than the multiple STRs assayed in figure 3. In Step 1 of figure 8, an STR locus is selected, and PCR primers are chosen to provide large flanking regions. In particular, this protocol is not optimized for compatibility with the apparatus of figure 1. The primers are synthesized and derivatized to support the characterization experiments.
To genotype one STR, these modified primers are used:
a first left primer L that is unmodified,
a second left primer L# for the upper strand which has the fluorescein label (#) at the 5' end,
a first right primer R which contains no modifiers,
a second right primer Rb containing one or more biotin residues at the 5' end or within the oligonucleotide.
Derivatizing the primer for binding to a surface entails incorporating a biotinylated nucleotide at the 5' end of the synthetically made oligonucleotide. Additional biotinylated residues can be incorporated into this primer either at the time of biosynthesis or by secondary photo or chemical biotinylation. The preferred embodiment employs the direct addition of the 5' biotin by chemical synthesis; alternatively, additional biotin molecules may improve the heteroduplex isolation efficiency.
In Step 2, three PCR amplifications are performed. Source DNA from a genome to be characterized, and target DNA of known minimal repeat length t from an individual (or prepared in advance by cloning a segment of genomic DNA in a plasmid or phage vector) are prepared for PCR. Three separate reactions are performed. These are identical, except for the following specific reaction mixtures:
PCR a: TT' sum PCR mixture for Step 2.a: target DNA, L, Rb, all dNTPs unlabeled
PCR b: SS' sum PCR mixture for Step 2.b: source DNA, L#, R, labeled α-32P-dATP, other dNTPs unlabeled
PCR c: S2,S1' difference PCR mixture for Step 2.c: source DNA, L#, Rb, labeled α-32P-dATP, other dNTPs unlabeled
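For bookkeeping, the three reaction mixtures can be written down as a small data structure. The sketch below is purely illustrative: the class `PcrMixture`, its field names, and the helper `binds_streptavidin` are hypothetical, and only the primer and label assignments are taken from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrMixture:
    product: str        # duplex produced by the reaction
    template: str       # "target" (known repeat length t) or "source"
    left_primer: str    # "L" (unmodified) or "L#" (fluorescein label)
    right_primer: str   # "R" (unmodified) or "Rb" (biotinylated)
    labeled_datp: bool  # True when alpha-32P-dATP is included


MIXTURES = {
    "a": PcrMixture("TT'", "target", "L", "Rb", False),
    "b": PcrMixture("SS'", "source", "L#", "R", True),
    "c": PcrMixture("S2,S1'", "source", "L#", "Rb", True),
}


def binds_streptavidin(mixture):
    """A product can be captured on streptavidin beads only via the Rb primer."""
    return mixture.right_primer == "Rb"


print([name for name, m in MIXTURES.items() if binds_streptavidin(m)])
```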
The components of the PCR reaction are assembled so that each 0.2 or 0.5 ml tube contains the appropriate set of primers, followed by the standard PCR buffer containing Tris buffer, KCl, MgCl2 and dNTP (the four triphosphate deoxynucleotides). The total size of each PCR reaction is 50 µl (though this can vary from 10-100 µl). Each specific PCR reaction contains its specific reaction mixture, the PCR buffer (i.e., 10 mM Tris pH 8.0, 50 mM KCl, 2.5 mM magnesium chloride, albumin), and thermostable (e.g., Taq) polymerase. The PCR reaction is overlaid with a thin layer of Ampliwax that separates some of the components from each other so that the reaction begins when the temperature rises to a level that melts the wax and allows all of the components to mix. This is the "hot start" method of PCR which reduces nonspecific synthesis products. An initial heat denaturation of 93-95°C for 5 minutes is followed by thermal cycles performed 20-40 times. Each cycle consists of a 30 sec denaturation step at 95°C, a 15-30 second annealing step at 50-65°C (typically 55°C) and an extension step at 73°C for 15-120 seconds (typically 45 seconds). When the three PCR reactions are completed, 0.5M EDTA is added to a final concentration of 10 mM. This inactivates the Taq polymerase.
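As a worked example of the final step, the volume of 0.5 M EDTA stock needed to reach 10 mM can be estimated with a simple dilution calculation. The function name `edta_volume_ul` is hypothetical, and the sketch assumes that volumes are additive and that the reaction contains no EDTA beforehand.

```python
def edta_volume_ul(reaction_ul, stock_mM=500.0, final_mM=10.0):
    """Volume of EDTA stock (in microliters) to add to a reaction.

    Solves stock_mM * v = final_mM * (reaction_ul + v) for v.
    """
    if final_mM >= stock_mM:
        raise ValueError("final concentration must be below the stock")
    return final_mM * reaction_ul / (stock_mM - final_mM)


# For the 50 microliter reaction described above, about 1 microliter is needed.
print(round(edta_volume_ul(50.0), 2))
```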
In Step 3, the heteroduplex hybridizations and modifications are done. Reactions a and b are combined (summation experiment) in Step 3a, and reaction c (difference experiment) is kept separate in Step 3b. All the following operations are done independently for the two reactions (sum and difference). The samples are then heated to 95°C for 5 minutes and allowed to anneal at a temperature of 75°C to discourage primer-strand annealing. After 2-24 hours, the temperature is lowered to 4°C to solidify the Ampliwax and the exonuclease VII (Gibco, BRL), incorporated by reference, in the appropriate buffer is added to the surface. The buffer conditions for the PCR are compatible directly with those of exonuclease VII. The reactions are initiated by heating to 37°C and incubated for a time ranging from 1-120 minutes. The reactions are terminated by the addition of chloroform to the tubes.
The supernatants from the chloroform extractions contain hetero- and homoduplexes, digested single strands, primers and free nucleotides. The double-stranded DNA is then purified using a spin column/filter (such as Centricon filters from Amicon) to remove the small molecular weight material and concentrate the samples. The purified DNAs from each experiment are then adsorbed to streptavidin paramagnetic beads (DYNAL 1993. Dynabeads biomagnetic separation system, Technical Handbook: Molecular Biology, Dynal International, Oslo, Norway.) to bind those double-stranded DNAs that contain the biotinylated right primer. The beads are washed several times with a neutral salt buffer to reduce nonspecific binding and not disrupt the double-stranded DNA.
In the preferred chemical modification embodiment of Step 3, the DNA bound to the streptavidin beads is equilibrated in 0.15 M Na phosphate buffer pH = 6.5 (the pH can be varied from 4.5 to 6.5 and alternative buffers can be used) and then 2-chloracetaldehyde is added to a concentration of 2.0%. The tubes are incubated at 37°C for 4 hours (longer or shorter times may be used). The reaction is terminated by washing with 0.01M Tris-HCl pH 7.0 and 1.0 M NaCl. The NaCl prevents dissociation of the heteroduplexes during the etheno-dehydration step. The samples are heated in the final wash volume at 85°C for 1 hour (this dehydrates the ethenoderivative).
In the alternative incorporation/digestion embodiment of Step 3, using a single-strand specific endonuclease such as S1 nuclease or micrococcal nuclease, the original PCR products that have been treated with exonuclease and bound to the streptavidin beads are equilibrated in the endonuclease buffer and reacted for varying times.
In Steps 4a and 4b, the signals are detected. The fluorescence and radioactivity retained on the beads are measured. The amount of fluorescein and 32P can be independently determined. These values establish the number of double-stranded complexes, and total incorporation of 32P dATP into the molecules, respectively.
In the preferred chemical modification embodiment, the fluorescence is measured by heating the samples to 95°C and eluting the DNA from the beads, taking the supernatants and measuring the fluorescence with a fluorimeter (excitation at 310 nm, emission at 410 nm). The degree of fluorescence and sensitivity of the fluorimeter is calibrated with a quinine sulfate standard (10^-5 - 10^-7 M in 0.1 N H2SO4). The tubes can be counted again for the amount of retained fluorescein and 32P labels. The amount of radioactivity can be calibrated with known standards that account for tube geometry, sample volume and instrument counting efficiencies. Based upon the radioactivity and the fluorescence, the size of the loops can be established.
In the alternative incorporation/digestion embodiment, during the digestion, aliquots of the supernatants are removed and counted to determine the rate and extent of nuclease-dependent release of 32P-labelled nucleotides. This establishes the optimal parameters for the endonuclease digestion and accurate quantitation of the nucleotides contained within the loops. This method can also be done with the chloracetaldehyde-modified nucleotides to measure released fluorescence. A direct comparison of the two methods can be achieved with the same initial set of PCR reactions.
In Step 5, the genotype is determined. In Step 5a, the sum is computed from the Step 4a detection, and in Step 5b, the difference is computed from the Step 4b detection. The results are combined in Step 5c to determine the genotype of the STR, as described. This completes the protocol.
In another embodiment, DNA protection is done to minimize spurious signals from unhybridized single-stranded DNA, and exonucleases are not used. Referring to figure 8, in Step 3a, a 10:1 excess of the SS' amplified product relative to the TT' amplified product is preferably used. In Step 3b, when necessary, TT' (or fragments thereof) without labels or linkers is added to block unhybridized S1' strands.
In an alternative embodiment, the number of PCR reactions can be reduced by performing PCR reactions b and c above as a first reaction using a cleavable biotinylated right primer and modifying several steps. The PCR product can then be combined with the second target PCR reaction a to allow sequential measurement of the sum and difference experiments. This is accomplished by combining the two PCR reactions for the Source and Target DNAs in Step 2, preparing and isolating the heteroduplexes on the streptavidin beads in Step 3, and measuring the nucleotides within the loops by derivatization and fluorescence in Step 4. The initial measurements in Step 4 are then followed by the release of duplexes employing the immobilized Source strand by reduction of a disulfide linkage between the primer and the biotin. In Step 4, one measures the total number of bound duplexes, the number of duplexes that are attributable to the biotinylated source, the total number of nucleotides contained within all loops and the number of nucleotides contained within the loops formed between the Target and Source DNA.
In an alternative embodiment, in Step 4, a more sensitive detection system for the chemical modification embodiment is an antibody-enzyme conjugate that recognizes the derivatized DNA (i.e., the etheno-derivatives created by chloracetaldehyde) and catalyzes a colorimetric reaction that can be measured in the supernatant. The simplest form of this assay would be to use a beta-galactosidase-antibody conjugate that acts on a colorimetric substrate such as X-gal or Blue-gal (BRL/Gibco).
Eliminating PCR Stutter using Pooled Targets. When PCR is done on a CA-repeat locus, there is often a stutter pattern wherein smaller fragments are also generated in lesser amounts (Schwartz, L.S., Tarleton, J., Popovich, B., Seltzer, W.K., and Hoffman, E.P. 1992. Fluorescent Multiplex Linkage Analysis and Carrier Detection for Duchenne/Becker Muscular Dystrophy. Am. J. Hum. Genet., 51: 721-729), incorporated by reference. With a locus L(CA)nR, the fragments L(CA)n-1R, L(CA)n-2R, and so on are also generated in addition to the main PCR product L(CA)nR. The distribution of the smaller fragments generally follows a decay pattern, with the amount of L(CA)mR less than L(CA)nR, when m<n. This decay pattern is empirically observed to differ from one genetic locus to another, but remains stable across unrelated individuals for any given locus. As described herein, the use of pooled targets in the preferred embodiment eliminates this artifact. Multiple sources are hybridized against multiple targets, producing a quadratic number of heteroduplexes. The different CA-repeat sizes
s, s-1, s-2, ..., and
t, t-1, t-2, ...,
are obtained for the DNA strands
S, S-1, S-2, ..., and T', (T-1)', (T-2)', ...,
which, when cross-hybridized, produce an entire table of products
(S-i) x (T-j).
The mismatch loop size of each hybrid (S-i)x(T-j) is (s-t-i+j).
The factors affecting the relative signal from each (S-i)(T-j) hybridization pair are:
(a) The product a_ij of the concentrations [S-i] and [T-j], which are determined by the stutter pattern. With equal amounts of S and T, and identical stutter patterns, the underlying concentration matrix is symmetric: a_ij = a_ji.
(b) The signal produced by the loop size of the mismatch, which is proportional to (or monotonic in) the length (s-t-i+j) of the mismatch.
(c) The differential amount of hybridization based on the energetics of DNA binding resulting from different loop sizes. As noted above, this is minor.
Combining the major factors (a) and (b) , matrix symmetry results in the relative cancellation of off-diagonal terms.
Each mismatch loop larger by d than the mean s-t is mirrored by a roughly equal concentration in its symmetric matrix entry of a mismatch loop smaller by d than the mean s-t.
Thus, the total signal from the stuttered sources with the stuttered targets averages out to the mean value s-t.
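The symmetry argument can be checked numerically with a short Python sketch. It assumes the idealized conditions stated in factors (a) and (b): source and target share one stutter pattern, the concentration of each hybrid is the product of its two strand concentrations, and the signal is linear in loop size. The function name `mean_loop_size` and the stutter values are hypothetical.

```python
def mean_loop_size(s, t, stutter):
    """Concentration-weighted mean mismatch loop size over all hybrids.

    stutter[i] is the relative amount of the PCR product shortened by i
    repeats; the same pattern is assumed for the source and the target.
    """
    total_signal = 0.0
    total_weight = 0.0
    for i, source_amount in enumerate(stutter):      # source strand S-i
        for j, target_amount in enumerate(stutter):  # target strand (T-j)'
            weight = source_amount * target_amount   # a_ij, equal to a_ji
            total_signal += weight * (s - t - i + j)  # loop size of the hybrid
            total_weight += weight
    return total_signal / total_weight


# Made-up decay pattern: main product, then half and a quarter as much.
print(mean_loop_size(s=20, t=8, stutter=[1.0, 0.5, 0.25]))  # 12.0
```

Because the weight of entry (i, j) equals that of entry (j, i), the +d and -d deviations cancel and the weighted mean equals s-t whatever decay pattern is supplied.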
This averaging by pooling with stuttered targets applies to all the aforementioned experiments.
Sum experiment. The stuttered source (S1+S2) is hybridized with the complementary stuttered target T'. The stuttering is averaged away, and the desired signal strength (s1-t) + (s2-t) is measured.
Difference experiment. The stuttered source (S1+S2) is hybridized with the complementary stuttered target (S1+S2)'. The four hybridization species occur. The mismatch loop length from the hybrid S2,S1' is formed from equally stuttered S2 and S1', so this measurement is correctly averaged.
Therefore, pooled experiments that use stuttered targets remove the stuttering from the signals.
When the measured signal is nonlinear in the loop size, factor (b) above would no longer be perfectly linear. Nonetheless, the relationship between loop size and signal strength remains monotonic (and invertible). Calibration therefore removes stutter artifact.
STR Genotyping in a Combined Heteroduplex Experiment. The sum and difference experiments at a locus are done separately using separate PCRs: two for the sum, and one for the difference, as described above. The first PCR to construct TT' is preferably done prior to the introduction of sample genomic DNA, and can be incorporated (or "compiled") into the apparatus. Thus, with the introduction of sample DNA at "run time", the protocol of figure 8 employs two PCRs. The following describes how to reduce this to just one PCR experiment, thereby reducing operating time and space requirements. Digoxigenin is used as a linker (The Genius System User's Guide for Filter Hybridization, 1992. Boehringer Mannheim Corporation, Indianapolis, IN), incorporated by reference.
In an alternative embodiment, the sum and difference are obtained simultaneously in a combined PCR experiment. Referring to figure 9, in Step 1 an STR locus is selected and oligonucleotides prepared. In Step 2a, unlabeled duplex TT' of a known small repeat size t is constructed by PCR or direct synthesis. The right primer has a digoxigenin linker 1. In Step 2b, the homoduplexes S1,S1' and S2,S2' of an uncharacterized genomic DNA sample are amplified via PCR. The first label (*) is incorporated into the single-stranded loop, the left primer has the second label (#), and the right primer has a biotin linker 2. In Step 3, the duplexes are combined and denatured together at high temperature into their separate strands, yielding: S2, S1, T, S2', S1', and T'.
Assume, as before, that t<s1<s2. Renaturing at a lower temperature forms the hybridization pairs between
(S2, S1, T) and (S2, S1, T)',
or the nine duplex DNA molecules arranged as the table:
S2,S2'   S2,S1'   S2,T'
S1,S2'   S1,S1'   S1,T'
T,S2'    T,S1'    T,T'
The detectabilities in Step 4 of the DNA hybridization pairs in this table are as follows:
The upper right triangle submatrix hybrids provide all the detectable elements -
(S2,S1') This gives the loop difference s2-s1.
(S2,T') This gives one half of the loop sum s2-t.
(S1,T') This gives the other half of the loop sum s1-t.
The hybrid species (S2,S2'; S1,S1'; T,T') along the matrix diagonal are not detectable, since the duplex strands are identical in size, and no loop mismatch is formed.
The lower left triangle submatrix hybrids are not detectable -
(S1,S2') By assumption, s1<s2, so no detectable loop mismatch is formed.
(T,S2') By assumption, t<s2, so no detectable loop mismatch is formed. However, this helps in blocking any unhybridized single-stranded DNA.
(T,S1') By assumption, t<s1, so no detectable loop mismatch is formed. However, this helps in blocking any unhybridized single-stranded DNA.
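The classification of the nine hybrids can be sketched in Python. The rule used here is an interpretation of the text, not a quoted procedure: a hybrid is treated as detectable only when its upper strand carries the (*) label (the source strands S1 and S2, but not T) and has more repeats than the lower strand. The function name `detectable_hybrids` and the example repeat counts are hypothetical.

```python
def detectable_hybrids(s1, s2, t):
    """Map each detectable (upper, lower) hybrid to its loop excess in repeats."""
    # Upper strands: (repeat count, carries the loop label *).
    upper = {"S2": (s2, True), "S1": (s1, True), "T": (t, False)}
    # Lower strands: repeat count only; their loops are never labeled here.
    lower = {"S2'": s2, "S1'": s1, "T'": t}
    found = {}
    for upper_name, (upper_len, labeled) in upper.items():
        for lower_name, lower_len in lower.items():
            excess = upper_len - lower_len
            if labeled and excess > 0:  # loop lies on a labeled upper strand
                found[(upper_name, lower_name)] = excess
    return found


# Made-up example with t < s1 < s2.
for pair, excess in detectable_hybrids(s1=15, s2=19, t=8).items():
    print(pair, excess)
```

For t<s1<s2 only the three upper-right entries of the table are returned, with excesses s2-s1, s2-t, and s1-t.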
The T' (pre-made, digoxigenin linker 1) lower DNA strands are spatially separated from the S1' and S2' (locus-made, biotin linker 2) lower strands by using two different solid supports to specifically bind the digoxigenin and the biotin linkers in different measurable regions. Thus, the signals required for measuring the sum and the difference are detected in spatially separated experiments. In Step 5, the usual analysis (which exploits the expected PCR stuttering and the pooled targets) is used to compute the allele values.
In an alternative embodiment, a cleavable biotinylated linker is used on the right primer of T'. This allows separate PCRs of a target and of genomic DNA, combination of the samples into a single heteroduplex reaction, and then detection of all nine of the hybridization products listed above. The following are measured: (a) the number of S1 and S2 strands bound, and (b) the number of nucleotides in the loops. Then, the S1,T' and S2,T' measurable heteroduplex species are liberated by reduction of the disulfide linkage, followed by remeasuring the S2,S1' bound, and the number of nucleotides in the remaining loops.
A Scalable STR Genotyping Assay. The methods described referring to figure 8 enable practical construction of the apparatus in figure 1 and system manufactured device described in figure 3 in which multiple STR loci are genotyped simultaneously.
Of the five steps in figure 8, only Step (1) is specific for a given STR. The other four stepε are largely independent of the given STR. Therefore, the apparatuε in figure 1 iε constructed to spatially encode multiple genetic loci on a εurface, and placeε Step (1) 'ε specific STR oligonucleotides at each spatial location, prior to complete PCR processing. For the allele εum experiment Step (2a) in figure 8 depoεits the pooled targets TT', and then Stepε (2b- 5) for the sample-dependent PCR procesεing, DNA hybridization, εignal detection, and genotype determination are performed simultaneously over the εurface. For the allele difference experiment Stepε (2-5) for the sample- dependent PCR proceεsing, DNA hybridization, signal detection, and genotype determination are performed simultaneously over the surface. In this way, the steps of figure 8 for single STR genotyping are related to the stepε of figure 3 for multiple STR genotyping.
In an alternative embodiment, the εpatiotemporal encoding of genetic loci iε not reεtricted to a surface. Instead, the three dimensions of εpace and one dimenεion of time can be uεed to multiplex the STR-specific oligonucleotides and the PCR processing. For example, multiple reaction chamberε in a three-dimenεional arrangement would each contain STR-specific oligonucleotides over some time period. The PCR processing would be done in parallel in multiple chambers, until all required signals were obtained. This physical arrangement can customize the PCR conditions, if necesεary, to each STR.
In a second example, commercially available 864-chamber plates can be physically arranged to achieve over 100,000 simultaneous characterizations. This is done by constructing a surface of four plates in a 2x2 array, which provides 3,456 chambers in a layer. Stacking thirty such layers provides 103,680 chambers. This three-dimensional arrangement is quite compact, with no chamber further than two feet from any other chamber. For the amplification step, this three-dimensional organization fits into a thermocycling PCR oven. The hybridization, detection, and other steps are multiplexed in time, enabling efficient use of the robotic device, detection device, and computer to achieve a throughput commensurate with the parallelization.
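The chamber count in this second example can be checked with a few lines of arithmetic. The following is a minimal Python sketch; the constant and variable names are hypothetical and chosen only for illustration, while the numbers are those stated in the paragraph above.

```python
# Hypothetical names; the numbers are taken from the example above.
CHAMBERS_PER_PLATE = 864
PLATES_PER_LAYER = 2 * 2   # four plates arranged in a 2x2 array
LAYERS = 30                # thirty stacked layers

chambers_per_layer = CHAMBERS_PER_PLATE * PLATES_PER_LAYER
total_chambers = chambers_per_layer * LAYERS

print(chambers_per_layer)  # 3456
print(total_chambers)      # 103680
```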
Double-Loop Detection for Improved Signal. In another embodiment, the signals from either the allele sum or allele difference experiments can be increased several-fold by detecting SS-DNA mismatch loops on both the upper and lower strands, rather than on just one strand. The PCR stutter can again be eliminated by using pooled targets.
The following description for determining a single allele refers to figures 5 and 10. The key change from the protocol referring to figure 5 is that nucleotides on both the upper and lower strands of S are made detectable. Step 1 of figure 5 selects the STR of interest, and prepares the oligonucleotides. The CA-repeat locus molecule is defined by its unique left and right primers, indicated in the figure by shading. Step 2a of figure 5 amplifies the known homoduplexes TT'. Referring to subfigure 10A, the PCR primers 1002 (left) and 1004 (right) for the upper strand 1006 and the lower strand 1008 of the target TT' both contain linkers 1010 (e.g., biotin) for binding to solid support, but no (#) labels. The target TT' duplexes are constructed by standard PCR amplification of genomically derived DNA for 20-40 cycles using dNs without (*) labels.
Step 2b of figure 5 amplifies the unknown homoduplexes S1,S1' and S2,S2'. Referring to subfigure 10B, the first label (*) 1012 for loop quantitation is present on nucleotides (in equal proportions) in both strands S 1014 and S' 1016. The label (*) indicates detectability, whether by chemical modification or by incorporation/digestion. The second label (#) 1018 for strand quantitation is present on both the left 1020 and right 1022 PCR primers. The source DNA SS' is developed by standard PCR amplification of genomically derived DNA for 20-40 cycles using (*) labeled dA*, dC*, dG*, and dT*.
Following the denaturation and reannealing of Step 3 of figure 5, the hybridization pairs formed are shown in the table of hybridization products of subfigure 10C in figure 10. These are:
(SS') Homoduplex is not detected, since no linker is present, and the loop size is zero.
(TT') Homoduplex is not detected, since the loop size is zero.
(ST') Each heteroduplex molecule has 2n loop size (*) labels, and one strand (#) label.
(TS') Symmetrically to ST', TS' is now also detected, with the same loop and strand labeling quantitation.
Thus, in the detection Step 4 of figure 5, 4n loop size (*) signals and 2 strand (#) signals are measured per ST' molecule. In the allele determination of Step 5, this four-fold increase in label (*) and two-fold increase in label (#) is accounted for.
The analysis of the Steps in figure 5 applies to the case of two alleles S1 and S2 for determining the allele sum. Two separate PCRs are done, as described: one for S1,S1' and S2,S2' labeled duplexes, and one for linker TT' targets. By ensuring that s>t, the denaturation/reannealing experiment constructs nine hybridization products. However, only those containing a T or T' linker are detectable. The end result is that each S1 (S2) or S1' (S2') acts as an S (S') strand, and the sum s1+s2 is measured.
Similarly, the allele difference is determined using single-stranded loops from both the upper and lower strands. This again has the advantage of signal amplification. Here, unlabelled TT' is used only as a SS-DNA protection agent, and contains no linker on its PCR primers. Instead, as with the allele difference experiment of figure 6 for determining the allele difference s2-s1, the genotyping is done by cross-hybridizing S1,S1' with S2,S2'.
Referring to figure 6, in Step 1 the STR locus and its PCR primers are chosen. In Step 2, the two complementary strands are constructed in a single PCR amplification of sample genomic DNA. In one embodiment, there are two labels on the upper strand: the first loop quantitation label (*) is present on nucleotides (in equal proportions) in both S and S'. The second label (#) for strand quantitation is attached to the left primer. On the complementary lower strand S', there is one linker such as biotin, which is attached to the 5' end of the right primer.
The hybridization is performed in Step 3 of figure 6. Referring to figure 11, with s2>s1, the hybridization product 1102 of the denaturation and reannealing is shown in subfigure 11A. The various label and linker combinations are shown in the hybridization product table of subfigure 11B. Adding up the signals from the first label (*) 1104,
2n (S2,S1') + 2n (S1,S2') = 4n,
and adding up the signals from the second label (#) 1106,
1 (S1,S1') + 1 (S2,S2') + 1 (S2,S1') + 1 (S1,S2') = 4.
Referring to figure 6, in the detection Step 4, relative to the single-stranded detection case, there is greater signal strength from the loops and strands. The 4n loop size signal from the first label (*) represents a four-fold improvement over the single loop detection method originally described above. In Step 5, the allele difference s2-s1 is computed as n, i.e., the normalized (and calibrated) ratio of loop size signal from the first label (*) to strand number signal from the second label (#).
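The label bookkeeping of the double-loop allele difference experiment can be sketched as follows. This is a minimal Python illustration under idealized assumptions (one molecule of each of the four duplexes, one signal unit per label, no stutter or noise); the function names and the scale parameter are hypothetical and are not part of the described apparatus.

```python
def double_loop_signals(n):
    """Ideal (*) and (#) label totals over the four duplexes of figure 11.

    n is the loop size, i.e. the allele difference s2 - s1.
    """
    # Loop label (*): only the two heteroduplexes carry loops, 2n each.
    loop = {"S1,S1'": 0, "S2,S2'": 0, "S2,S1'": 2 * n, "S1,S2'": 2 * n}
    # Strand label (#): one per duplex.
    strand = {"S1,S1'": 1, "S2,S2'": 1, "S2,S1'": 1, "S1,S2'": 1}
    return sum(loop.values()), sum(strand.values())


def allele_difference(loop_signal, strand_signal, scale=1.0):
    """Normalized ratio of loop signal to strand signal.

    scale stands in for the calibration factor (assumed 1.0 here).
    """
    return scale * loop_signal / strand_signal


loop_total, strand_total = double_loop_signals(3)
print(loop_total, strand_total)                     # 12 4
print(allele_difference(loop_total, strand_total))  # 3.0
```

In practice the scale factor would come from calibration against known alleles rather than being fixed at 1.0.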
As in the Steps of figure 9, with appropriate linker separation and detection, the two separate sum and difference experiments can be combined into a single experiment.
D.2. Method for Genotyping STRs using Nucleic Acid Synthesis
Referring to figure 13, a method is described for determining a sum (or average) of STR alleles by nucleic acid synthesis that is comprised of the steps:
(1) Identifying an STR, and synthesizing suitable PCR reagents;
(2) PCR amplification of template DNA using the PCR reagents;
(3) Purification of amplified complementary lower DNA strand;
(4) Nucleic acid synthesis of the upper strand;
(5) Detecting signals from the synthesized nucleic acids;
(6) Analyzing the detected signals to determine the genotype sum (or average).
Referring to figure 13, step 1 is for identifying an STR, and synthesizing suitable PCR reagents.
The STR locus is identified by conventional techniques (Sambrook, J., Fritsch, E.F., and Maniatis, T. 1989. Molecular Cloning, second edition. Plainview, NY: Cold Spring Harbor Press; N. J. Dracopoli, J. L. Haines, B. R. Korf, C. C. Morton, C. E. Seidman, J. G. Seidman, D. T. Moir, and D. Smith, ed., Current Protocols in Human Genetics. New York: John Wiley and Sons, 1994), incorporated by reference. Alternatively, preexisting STR loci for the genome of interest can be obtained from available databases (Genbank, GDB, EMBL; Hilliard, Davison, Doolittle, and Roderick, Jackson Laboratory mouse genome database, Bar Harbor, ME; SSLP genetic map of the mouse, Map Pairs, Research Genetics, Huntsville, AL), incorporated by reference. The STR's repeat unit includes no more than three distinct nucleotides; for clarity in exposition, the following specification of the preferred embodiment assumes that the STR is a CA-repeat marker.
The nucleic acid sequences flanking the CA-repeat region are determined by DNA sequencing methods (Sambrook, J., Fritsch, E.F., and Maniatis, T. 1989. Molecular Cloning, second edition. Plainview, NY: Cold Spring Harbor Press; United States Biochemical 1994. USB Sequenase version 2.0 DNA sequencing kit, sequencing protocols, 9th edition, product number 70770, Amersham Life Science, Arlington Heights, IL), incorporated by reference. Alternatively, the sequence of all or part of the STR locus may reside in a preexisting available database, or in the original articles describing the locus.
Three oligonucleotide primers are designed for use with the DNA sequence using computer programs that facilitate PCR primer or DNA synthesis oligonucleotide design, such as MacVector 4.1 (Eastman Chemical Co., New Haven, CT) or Oligo 4.0 (National Biosciences, Inc., Plymouth, MN), incorporated by reference. These programs facilitate selecting lengths and positionings of oligonucleotides that are operative for enzymatic reactions. The two PCR primers and the reaction conditions are designed to permit amplification of the DNA sequence, and include:
(L) a left PCR primer for the upper strand, and
(R') a right PCR primer for the complementary lower strand. In the preferred embodiment, the 5' end of primer R' is biotinylated.
A third oligonucleotide, the DNA sequencing primer, and its reaction conditions are designed to permit sequencing of the DNA sequence:
(Q) a left (upstream) DNA sequencing primer that is directly adjacent to the CA-repeat region of the upper strand; this sequencing primer is designed to allow extension across the entire tandem repeat sequence using nucleotides that are specifically limited to the repeat unit base composition.
The oligonucleotide primers for the CA-repeat genetic marker are synthesized (Haralambidis, J., Duncan, L., Angus, K., and Tregear, G.W. 1990. The synthesis of polyamide-oligonucleotide conjugate molecules. Nucleic Acids Research, 18(3): 493-9. Nelson, P.S., Kent, M., and Muthini, S. 1992. Oligonucleotide labeling methods. 3. Direct labeling of oligonucleotides employing a novel, non-nucleosidic, 2-aminobutyl-1,3-propanediol backbone. Nucleic Acids Research, 20(23): 6253-9. Roget, A., Bazin, H., and Teoule, R. 1989. Synthesis and use of labelled nucleoside phosphoramidite building blocks bearing a reporter group: biotinyl, dinitrophenyl, pyrenyl and dansyl. Nucleic Acids Research, 17(19): 7643-51. Schubert, F., Cech, D., Reinhardt, R., and Wiesner, P. 1992. Fluorescent labelling of sequencing primers for automated oligonucleotide synthesis. DNA Sequence, 2(5): 273-9. Theisen, P., McCollum, C., and Andrus, A. 1992. Fluorescent dye phosphoramidite labelling of oligonucleotides. Nucleic Acids Symposium Series, 1992(27): 99-100.), incorporated by reference. These primers may be derivatized with a fluorescent detection molecule or a ligand for immunochemical detection such as digoxigenin. Alternatively, these oligonucleotides and their derivatives can be ordered from a commercial vendor (Research Genetics, Huntsville, AL).
Referring to figure 13, step 2 is for PCR amplification of template DNA using the PCR reagents.
A genetic material whose genotype is to be determined is selected for study. This genetic material is then placed in contact with the PCR primers L and R', and PCR amplification is performed. The methods for this PCR amplification given here are standard, and can be readily applied to every CA-repeat or microsatellite marker that corresponds to a (relatively unique) location on a genome.
In the preferred embodiment, the genomic DNA is mixed with the other components of the PCR reaction at 4°C. These other components include, but are not limited to, the standard PCR buffer (containing Tris pH8.0, 50 mM KCl, 2.5 mM magnesium chloride, albumin), triphosphate deoxynucleotides (dTTP, dCTP, dATP, dGTP), and the thermostable polymerase (e.g., Taq polymerase). The total amount of this mixture is determined by the final volume of each PCR reaction (preferably 10 ul to 100 ul), and the number of reactions.
The PCR is performed on all of the reactions by heating and cooling to specific locus-dependent temperatures that are given by the known PCR conditions. The entire cycle of annealing, extension, and denaturation is repeated multiple times (ranging from 20-40 cycles depending on the efficiencies of the reactions and sensitivity of the detection system) (Innis, M.A., Gelfand, D.H., Sninsky, J.J., and White, T.J. 1990. PCR Protocols: A Guide to Methods and Applications. San Diego, CA: Academic Press.), incorporated by reference. In the preferred embodiment, for STR CA-repeat loci, the thermocycling protocol on the Perkin-Elmer PCR System 9600 machine is:
a) Heat to 94°C for 3'
b) Repeat 30x:
94°C for 1/2' (denature)
53°C for 1/2' (anneal)
65°C for 4' (extend)
c) 65°C for 7' (extend)
d) 4°C soak ad libitum
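For planning instrument time, the thermocycling protocol above can be written as data and its programmed hold time summed. The following is a minimal Python sketch; the variable and function names are hypothetical, and the total deliberately excludes temperature ramp times and the open-ended 4°C soak, which the protocol does not quantify.

```python
# Each step is (temperature in degrees C, hold time in minutes).
INITIAL = [(94, 3.0)]                        # a) initial heating
CYCLE = [(94, 0.5), (53, 0.5), (65, 4.0)]    # b) denature, anneal, extend
N_CYCLES = 30
FINAL = [(65, 7.0)]                          # c) final extension


def hold_minutes(steps):
    """Sum the programmed hold times of a list of steps."""
    return sum(minutes for _temp, minutes in steps)


total_minutes = (
    hold_minutes(INITIAL)
    + N_CYCLES * hold_minutes(CYCLE)
    + hold_minutes(FINAL)
)
print(total_minutes)  # 160.0 (3 + 30 * 5 + 7), excluding ramps and soak
```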
The PCR cycles are completed, with each reaction tube containing the amplified DNA from a specific location of the genome. Each mixture includes the DNA that was synthesized from the two alleles of the diploid genome (a single allele from haploid chromosomes as is the case with the sex chromosomes in males or in instances of cells in which a portion of the chromosome has been lost such as occurs in tumors, or no alleles when both are lost). If desired, the free deoxynucleotides and primers may be separated from the PCR products by filtration using commercially available filters (Amicon, "Purification of PCR Products in Microcon Microconcentrators," Amicon, Beverly, MA, Protocol Publication 305; A. M. Krowczynska and M. B. Henderson, "Efficient Purification of PCR Products Using Ultrafiltration," BioTechniques, vol. 13, no. 2, pp. 286-289, 1992), incorporated by reference.
Referring to figure 13, step 3 is for purification of the amplified complementary lower DNA strand.
The lower biotinylated strand is purified from the upper strand by using magnetic streptavidin coated beads (Dynal International, Oslo, Norway). Specifically, the steps of Dynabead preparation, PCR product immobilization, DNA duplex melting using a 0.1M NaOH solution, and separation of the upper and lower DNA strands to purify the lower strand are done, as described (DYNAL 1993. Dynabeads biomagnetic separation system, Technical Handbook: Molecular Biology, Dynal International, Oslo, Norway), incorporated by reference. Specifically, with annotations:
(1) Prepare 100 ul Dynabeads (excess)
Use 20 ul (200 ug) of washed Dynabeads per PCR reaction.
(a) Pipette off supernatant while holding tube by magnet
(b) Wash beads x 2
- Resuspend beads in 100 ul 1x Dynabead buffer
- While holding tube near magnet, pipette off supernatant
(c) Resuspend in 200 ul 2x Dynabead buffer
(Dynabeads concentration now 5 ug/ul)
(2) Immobilize PCR product
Use 0.5 ug genomic DNA and 5-10 pmole of each PCR primer.
(a) add PCR product
Remove 40 ul PCR material from PCR tube under oil with pipette. Add 40 ul Dynabead to 40 ul PCR product
(b) incubate at room temp for 30 minutes
Gently rotate tube to keep Dynabeads suspended.
(3) Melting the DNA duplex
(a) pipette off supernatant while holding tube near magnet
(b) add 40 ul 1x Dynabead buffer
IMMOBILIZED PRODUCT IS NOW STABLE (4°C x several weeks)
(c) pipette off supernatant while holding tube near magnet
(d) add 8 ul 0.1M NaOH solution (freshly prepared)
(e) incubate at room temp for 10 minutes
(4) Separating the DNA strands
(a) pipette off supernatant while holding tube by magnet, and store supernatant in another tube
supernatant = nonbiotinylated strand
(b) wash x3:
- add 50 ul 0.1M NaOH solution (freshly prepared); pipette off supernatant while holding tube near magnet
- add 40 ul 1x Dynabead buffer; pipette off supernatant while holding tube near magnet
- add 50 ul 1x TE buffer; pipette off supernatant while holding tube near magnet
(c) adjust volume with water for sequencing reaction
- add 7 ul sterile water
For 20 ml 2x Dynabead binding and washing buffer:
16 ml 2.5 M NaCl
200 ul 1 M TRIS, pH 7.6
40 ul 0.5M EDTA
3.76 ml sterile water
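The buffer recipe above can be checked by computing the total volume and the final concentration of each component from its stock concentration. The following is a minimal Python sketch of that dilution arithmetic; the variable names are hypothetical, and the derived final concentrations are computed from the stated volumes rather than quoted from the handbook.

```python
# (component, volume in ml, stock concentration in mol/l); water has no solute.
components = [
    ("NaCl", 16.0, 2.5),
    ("Tris, pH 7.6", 0.200, 1.0),
    ("EDTA", 0.040, 0.5),
    ("sterile water", 3.76, 0.0),
]

total_ml = sum(volume for _name, volume, _stock in components)

# Final concentration = stock concentration * volume added / total volume.
final_molar = {
    name: stock * volume / total_ml
    for name, volume, stock in components
    if stock > 0
}

print(round(total_ml, 6))  # 20.0
for name, molar in final_molar.items():
    print(name, round(molar * 1000, 3), "mM")
```

With the stated volumes this works out to roughly 2 M NaCl, 10 mM Tris, and 1 mM EDTA in the 2x buffer.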
Referring to figure 13, step 4 is for nucleic acid synthesis of the upper strand.
The purified amplified lower DNA strand serves as a template for a sequencing reaction. Starting from the left flanking primer Q, the sequencing reaction provides a template-directed synthesis that extends the upper strand across the CA-repeat region. The nucleotides used are:
(Q) The DNA sequencing primer that flanks the CA-repeat region, and initiates the sequencing reaction.
(dNTPs) Extension is largely restricted to the repetitive sequence by including only dNTPs that appear in the repeat unit. For a CA-repeat, only dATP and dCTP are used. One or both of these dNTPs are labeled with a detectable label *, preferably a radioisotope such as 35S or 32P (DuPont NEN Research Products, Boston, MA), or a fluorescent probe (Biological Detection Systems, Pittsburgh, PA). When using fluorescein-labeled dUTP (DuPont NEN Research Products, Boston, MA), the roles of the "upper" and "lower" strands are exchanged, so that the template (rather than the synthesized product) contains the CA-repeat.
(ddNTP) Termination is restricted to nucleotides not contained in the repetitive sequence. For a CA-repeat marker, ddGTP or ddTTP (ddUTP) are used, depending on the sequence of the marker. The termination molecule is labelled with a second label **, that is distinct from the first label *, and can be independently detected. When a radioisotope is used for the first label *, fluorescein-labeled ddNTP (DuPont NEN Research Products, Boston, MA) is a convenient second label **.
Sequencing is done using standard DNA sequencing protocols (Sambrook, J., Fritsch, E.F., and Maniatis, T. 1989. Molecular Cloning, second edition. Plainview, NY: Cold Spring Harbor Press; Ausubel, F.M., Brent, R., Kingston, R.E., Moore, D.D., Seidman, J.G., Smith, J.A., and Struhl, K., ed. 1993. Current Protocols in Molecular Biology. New York, NY: John Wiley and Sons; N. J. Dracopoli, J. L. Haines, B. R. Korf, C. C. Morton, C. E. Seidman, J. G. Seidman, D. T. Moir, and D. Smith, ed., Current Protocols in Human Genetics. New York: John Wiley and Sons, 1994), incorporated by reference. A highly processive polymerase enzyme having little or no exonuclease activity is preferably used, such as Sequenase 2 (U.S. Biochemical, Cleveland, OH). Protocols optimized for the selected enzyme (United States Biochemical 1994. USB Sequenase version 2.0 DNA sequencing kit, sequencing protocols, 9th edition, product number 70770, Amersham Life Science, Arlington Heights, IL), incorporated by reference, are applied, with the (labeled and unlabeled) dNTPs and ddNTPs described above substituted for the dNTPs and ddNTPs contained in the conventional sequencing protocol. The use of Mn buffer can be helpful when synthesizing short sequences.
Referring to figure 13, step 5 is for detecting signals from the synthesized nucleic acids. The newly synthesized upper DNA sequence formed by means of the DNA sequencing reaction remains hybridized to the biotinylated lower strand, which in turn is tightly bound to the streptavidin beads. The DNA sequencing primers, nucleotides, and other reagents are removed by repeated gentle washing with a buffer that promotes double stranded DNA, such as the Dynabead binding and washing buffer (DYNAL 1993. Dynabeads biomagnetic separation system, Technical Handbook: Molecular Biology, Dynal International, Oslo, Norway), leaving only the bound duplex DNA containing the desired purified product. Since the only labels present in the duplex reside on the newly synthesized upper DNA sequence (with no label * or ** present on the lower template DNA), the strands need not be separated. Fluorescence signals are detected and quantitated, preferably by means of a fluorimeter. Radioactive signals are detected and counted, preferably by means of a scintillation counter.
For quality assurance or development work, standard sequencing gels can be used for detecting signals from the synthesized nucleic acids (Sambrook, J., Fritsch, E.F., and Maniatis, T. 1989. Molecular Cloning, second edition. Plainview, NY: Cold Spring Harbor Press), incorporated by reference. These protocols include a DNA denaturation step.
Referring to figure 13, step 6 is for analyzing the detected signals to determine the genotype sum (or average).
The ratio of the repeat unit label * to the end label ** varies in direct proportion to the number of tandem repeats. Precalibration with a set of predetermined reference alleles can establish the scale factor, and any deviations from linearity. PCR stutter artifact is accounted for by deconvolution with the known stutter distribution (Perlin, M.W., Burks, M.B., Hoop, R.C., and Hoffman, E.P. 1994. Toward fully automated genotyping: allele assignment, pedigree construction, phase determination, and recombination detection in Duchenne muscular dystrophy. Am. J. Hum. Genet., 55(4): 777-787), incorporated by reference.
For a single allele (e.g., hemizygote or homozygote), this analysis procedure computes the genotype. For more than one allele (e.g., heterozygote), this procedure computes the average (or, equivalently, the sum) of the alleles.
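The precalibration in step 6 can be illustrated with a straight-line fit of the measured label ratio against the known repeat counts of reference alleles, which is then inverted for an unknown sample. The following is a minimal Python sketch, assuming a purely linear response and ignoring stutter deconvolution; the function names are hypothetical and the reference values are made-up numbers for illustration only, not measurements.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def repeats_from_ratio(ratio, slope, intercept):
    """Invert the calibration line to estimate the number of tandem repeats."""
    return (ratio - intercept) / slope


# Made-up calibration data: known repeat counts and their measured */** ratios.
reference_repeats = [10, 15, 20, 25]
reference_ratios = [5.0, 7.5, 10.0, 12.5]

slope, intercept = fit_line(reference_repeats, reference_ratios)
estimate = repeats_from_ratio(9.0, slope, intercept)
print(slope, intercept, estimate)  # 0.5 0.0 18.0
```

For a heterozygous sample the estimate corresponds to the average of the two alleles, so doubling it gives the allele sum.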
Referring to figure 14, a method is described for determining a difference of STR alleles by nucleic acid synthesis that is comprised of the steps:
(1) Identifying an STR, and synthesizing suitable PCR reagents;
(2) PCR amplification of template DNA using the PCR reagents;
(3) Purification of amplified complementary lower DNA strand;
(4') Nucleic acid synthesis of the upper strand;
(5) Detecting signals from the synthesized nucleic acids;
(6') Analyzing the detected signals to determine the genotype difference.
Steps 1, 2, 3, and 5 have been described in figure 13.
Referring to figure 14, step 4' is for nucleic acid synthesis of the upper strand, and is comprised of the steps:
(4'a) Unlabeled restricted synthesis.
(4'b) Heteroduplex formation.
(4'c) Labeled restricted synthesis.
Referring to figure 14, step 4'a is for unlabeled restricted synthesis of the upper strand.
The purified amplified lower DNA strand serves as a template for a sequencing reaction. Starting from the left flanking primer Q, the sequencing reaction provides a template-directed synthesis that extends the upper strand across the CA-repeat region. The nucleotides used are:
(Q) The DNA sequencing primer that flanks the CA-repeat region, and initiates the sequencing reaction.
(dNTPs) Extension is largely restricted to the repetitive sequence by including only dNTPs that appear in the repeat unit. For a CA-repeat, only dATP and dCTP are used. These are both unlabeled.
(ddNTP) These are specifically excluded from the reaction mixture.
Sequencing is done using standard DNA sequencing protocols (Sambrook, J., Fritsch, E.F., and Maniatis, T. 1989. Molecular Cloning, second edition. Plainview, NY: Cold Spring Harbor Press; Ausubel, F.M., Brent, R., Kingston, R.E., Moore, D.D., Seidman, J.G., Smith, J.A., and Struhl, K., ed. 1993. Current Protocols in Molecular Biology. New York, NY: John Wiley and Sons; N. J. Dracopoli, J. L. Haines, B. R. Korf, C. C. Morton, C. E. Seidman, J. G. Seidman, D. T. Moir, and D. Smith, ed., Current Protocols in Human Genetics. New York: John Wiley and Sons, 1994), with an excess of dNTPs relative to primer and template. A highly processive polymerase enzyme having little or no exonuclease activity is preferably used, such as Sequenase 2 (U.S. Biochemical, Cleveland, OH). Protocols optimized for the selected enzyme (United States Biochemical 1994. USB Sequenase version 2.0 DNA sequencing kit, sequencing protocols, 9th edition, product number 70770, Amersham Life Science, Arlington Heights, IL) are applied, and the unlabeled dNTPs described above are substituted for the dNTPs and ddNTPs contained in the standard sequencing protocol. Washing with the stabilizing Dynabead binding and washing buffer is then done 2-4 times (DYNAL 1993. Dynabeads biomagnetic separation system, Technical Handbook: Molecular Biology, Dynal International, Oslo, Norway) to remove the unincorporated primers and dNTPs, and thereby purify the duplex DNA comprised of lower strand template and partially synthesized unlabeled upper strand DNA.
Referring to figure 14, step 4'b is for heteroduplex formation between different alleles of the upper and lower strands.
In the preferred embodiment, sodium hydroxide is used to melt the duplex, and an equimolar amount of hydrochloric acid is subsequently used to reanneal (DYNAL 1993. Dynabeads biomagnetic separation system, Technical Handbook: Molecular Biology, Dynal International, Oslo, Norway). Specifically (p. 23), using the bead-immobilized double stranded product,
(3) Melting the DNA duplex
(c) pipette off supernatant while holding tube near magnet
(d) add 8 ul 0.1M NaOH solution (freshly prepared)
(e) incubate at room temp for 10 minutes
(4') Reannealing the DNA duplex
(a) neutralize with 4 ul 0.2M HCl and 1 ul 1M Tris-HCl (pH adjusted to optimum of sequencing enzyme).
(b) mix immediately with a pipette and adjust the volume with water according to the sequencing protocol.
(c) the same pipette is always used for both NaOH and HCl to avoid small differences in calibration that can cause neutralization problems.
In an alternative embodiment, the denaturing and renaturing is done by heating the duplex DNA solution to a temperature of 65°C to 95°C for a period of 2 to 30 minutes, and then gradually cooling the solution over a period of 15 to 90 minutes to a temperature between 25°C and 40°C.
Referring to figure 14, step 4'c is for labeled restricted synthesis of the upper strand.
The purified amplified lower DNA strand serves as a template for continuing the sequencing reaction. Starting from the left flanking primer Q that has been partially extended across the CA-repeat region, the template-directed synthesis continues the upper strand sequencing across the CA-repeat region. The nucleotides used are:
(Q) No additional DNA sequencing primer is used.
(dNTPs) Extension is largely restricted to the repetitive sequence by including only dNTPs that appear in the repeat unit. For a CA-repeat, only dATP and dCTP are used. One or both of these dNTPs are labeled with a detectable label *, preferably a radioisotope such as 35S or 32P (DuPont NEN Research Products, Boston, MA), or a fluorescent probe (Biological Detection Systems, Pittsburgh, PA). When using fluorescein-labeled dUTP (DuPont NEN Research Products, Boston, MA), the roles of the "upper" and "lower" strands are exchanged, so that the template (rather than the synthesized product) contains the CA-repeat.
(ddNTP) Termination is restricted to nucleotides not contained in the repetitive sequence. For a CA-repeat marker, ddGTP or ddTTP (ddUTP) are used, depending on the sequence of the marker. The termination molecule is labelled with a second label **, that is distinct from the first label *, and can be independently detected. When a radioisotope is used for the first label *, fluorescein-labeled ddNTP (DuPont NEN Research Products, Boston, MA) is a convenient second label **.
Sequencing is done using standard DNA sequencing protocols (Sambrook, J., Fritsch, E.F., and Maniatis, T. 1989. Molecular Cloning, second edition. Plainview, NY: Cold Spring Harbor Press; Ausubel, F.M., Brent, R., Kingston, R.E., Moore, D.D., Seidman, J.G., Smith, J.A., and Struhl, K., ed. 1993. Current Protocols in Molecular Biology. New York, NY: John Wiley and Sons; N. J. Dracopoli, J. L. Haines, B. R. Korf, C. C. Morton, C. E. Seidman, J. G. Seidman, D. T. Moir, and D. Smith, ed., Current Protocols in Human Genetics. New York: John Wiley and Sons, 1994), incorporated by reference. A highly processive polymerase enzyme having little or no exonuclease activity is preferably used, such as Sequenase 2 (U.S. Biochemical, Cleveland, OH). Protocols optimized for the selected enzyme (United States Biochemical 1994. USB Sequenase version 2.0 DNA sequencing kit, sequencing protocols, 9th edition, product number 70770, Amersham Life Science, Arlington Heights, IL) are applied, and the (labeled and unlabeled) dNTPs and ddNTPs described above are substituted for the dNTPs and ddNTPs contained in the standard sequencing protocol.
The result of this unlabeled/heteroduplex/labeled restricted sequencing reaction is a set of four possible newly synthesized upper strands, corresponding to the two alleles s and t, where the length of allele s is less than or equal to the length of allele t:
(s,s') This homoduplex product is unlabeled with *, and may have a **-labeled terminator dye.
(t,t') This homoduplex product is unlabeled, and may have a **-labeled terminator dye.
(t,s') This heteroduplex product is unlabeled, and may have a **-labeled terminator dye.
(s,t') From the 5' end, this heteroduplex product is comprised of unlabeled primer, an unlabeled repetitive sequence with about s repeated CA units, a *-labeled repetitive sequence with about (t-s) repeated CA units, and has a **-labeled terminator dye.
Referring to figure 14, step 6' is for analyzing the detected signals to determine the genotype difference.
The ratio of the repeat unit label * to the end label ** varies in direct proportion to the number of tandem repeats. Since only one quarter of the reannealed duplexes contain **, the constant applied to the ratio of label * to label ** is greater than that of step 6 of figure 13. Precalibration with a set of predetermined reference alleles can establish this scale factor, and any deviations from linearity. PCR stutter artifact is accounted for by deconvolution with the known stutter distribution (Perlin, M.W., Burks, M.B., Hoop, R.C., and Hoffman, E.P. 1994. Toward fully automated genotyping: allele assignment, pedigree construction, phase determination, and recombination detection in Duchenne muscular dystrophy. Am. J. Hum. Genet., 55(4): 777-787), incorporated by reference.
For a pair of alleles (e.g., heterozygote), this analysis procedure computes the difference between the two alleles of the genotype.
Referring to figure 15, a method is described for determining STR alleles by nucleic acid synthesis that is comprised of the steps:
Perform the steps of figure 13.
Perform the steps of figure 14.
Combine the recalibrated ratio of label * to label ** from step 6 of figure 13, together with the recalibrated ratio of label * to label ** from step 6' of figure 14. This combination is preferably done by precalibration with a set of predetermined reference alleles that establish the mapping from the pair of measured ratios to the actual allele pairs. Alternatively, the alleles s and t are computed directly from the sum (or average) s+t and difference s-t. PCR stutter artifact is accounted for by deconvolution with the known stutter distribution (Perlin, M.W., Burks, M.B., Hoop, R.C., and Hoffman, E.P. 1994. Toward fully automated genotyping: allele assignment, pedigree construction, phase determination, and recombination detection in Duchenne muscular dystrophy. Am. J. Hum. Genet., 55(4): 777-787).
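The direct computation of the two alleles from the calibrated sum and difference is simple arithmetic. The following is a minimal Python sketch, assuming the calibrated values are expressed in repeat units and are rounded to whole repeats; the function name and the parity check are illustrative assumptions and are not part of the described method.

```python
def alleles_from_sum_and_difference(allele_sum, allele_diff):
    """Return (shorter, longer) repeat counts from the sum and the difference.

    Inputs are calibrated measurements in repeat units; they are rounded to
    whole repeats. The sign of the difference is ignored.
    """
    total = round(allele_sum)
    diff = abs(round(allele_diff))
    # Two whole numbers always have a sum and difference of equal parity.
    if (total - diff) % 2 != 0:
        raise ValueError("sum and difference are inconsistent after rounding")
    if diff > total:
        raise ValueError("difference exceeds sum")
    shorter = (total - diff) // 2
    longer = (total + diff) // 2
    return shorter, longer


print(alleles_from_sum_and_difference(38.1, 3.9))  # (17, 21)
print(alleles_from_sum_and_difference(40, 0))      # (20, 20), homozygote
```

A parity mismatch signals that measurement noise moved one of the two values across a rounding boundary, in which case the precalibrated mapping described above may be the more robust route.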
Referring to figures 13, 14, and 15, alternative embodiments for determining STR alleles by nucleic acid synthesis are given:
(a) Ligation with a reporter sequence R that flanks the CA-repeat region immediately to the right (downstream) can be used, instead of a ddNTP dye terminator.
(b) Other label molecules (such as biotin) can be used on the newly synthesized upper strand. In one embodiment, the lower PCR amplified strand is constructed with a cleavable biotinylated primer (Pierce, Rockford, IL), such as a disulfide link that can be subsequently cleaved with a reducing agent (e.g., DTT). The upper strand is then synthesized from the three (5' to 3') consecutive units:
(Q) A primer that is end-labeled with the strand counter second label **.
(dNTPs) Nucleotides that are restricted to the composition of the repetitive unit, at least one of which is labeled with the repeat counter first label *. For a CA-repeat, this could be *-dATP and dCTP.
(R) A biotinylated reporter R that is added after the reducing agent has cleaved the biotinylated PCR primer from the streptavidin beads. In one embodiment, the reporter R is a biotinylated terminating ddNTP that is added by means of a sequencing enzyme. In another embodiment, reporter R is a biotinylated oligonucleotide that is added as the right flanking sequence of the repetitive sequence by means of a ligation enzyme.
(c) The detection reagents used for the required labeling may include (but are not limited to) radioactivity, fluorescence, phosphorescence, chemiluminescence, electrical resistivity, pH, and ionic concentration.
(d) The lower strand can be sequenced, instead of the upper strand.
(e) A repetitive unit other than CA, but containing no more than three distinct nucleotides, can be used. In this case, dNTPs are used for every nucleotide in the repetitive unit, with at least one of the repetitive unit nucleotides labeled with the first label *, and ddNTP(s) are used for every nucleotide not in the repetitive unit, with the appropriate terminating nucleotide immediately following the repetitive sequence labeled with the second label **.
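The nucleotide selection rule of embodiment (e) can be sketched as a simple partition of the four bases. This is an illustrative Python sketch only: the function name is hypothetical, the repeat unit is taken to be written on the strand being synthesized, and the choice of which nucleotide carries each label is left out.

```python
def nucleotide_mix(repeat_unit):
    """Partition A, C, G, T into extending dNTPs and terminating ddNTPs."""
    bases = set("ACGT")
    in_unit = set(repeat_unit.upper())
    if not in_unit or not in_unit <= bases:
        raise ValueError("repeat unit must consist of A, C, G, T")
    if len(in_unit) > 3:
        # With all four bases in the unit, no terminator would remain.
        raise ValueError("repeat unit may contain at most three distinct bases")
    dntps = sorted("d%sTP" % b for b in in_unit)            # extend through the repeat
    ddntps = sorted("dd%sTP" % b for b in bases - in_unit)  # terminate after the repeat
    return dntps, ddntps
```

For a CA-repeat this yields dATP and dCTP for extension and ddGTP and ddTTP as terminators, in agreement with the (dNTPs) example given for the CA-repeat.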
E. Method for Genotyping STRs using a Hybridization Panel
The hybridization panel method for genotyping STRs is distinguished from the loop mismatch method described previously in that the determination of an STR's alleles is accomplished with an entire panel of hybridization probes, rather than with only two loop mismatch hybridization experiments. This hybridization panel method generally entails more hybridization experiments per STR than the loop mismatch method. However, this approach is applicable to the determination of specific nucleotide sequences related to genomic DNA, specific genes, and known mutations.
The central idea of the hybridization panel method for genotyping STR alleles is to have a detection panel of DNA probes. When an apparatus for genotyping multiple STRs is used, each spatial location of said apparatus corresponds to one genetic STR locus and contains a separate detection panel. This panel measures the extent of specific DNA binding of the patient's DNA against a set of probes. A second coordinate of information can optionally be obtained by performing the reactions over a range of reaction stringencies (e.g., using temperature, ion concentration, or DNA denaturants). The result is a mapping from one or two coordinates (probe and stringency) into the reaction energetics (binding affinity). Different alleles produce different energy surfaces. Hence, unique pairwise combinations of alleles will produce unique signature patterns. By performing the experiment described herein, the signature can be observed, and hence the zero, one, or two alleles at a sample point can be uniquely determined.
E.1. Method for Genotyping STRs using a Direct Hybridization Panel
To fix ideas, let L(CA)nR be one allele in the patient's PCR product for a given STR reaction chamber in the two dimensional array. Here, L is the left flanking region DNA subsequence, R is the right flanking region DNA subsequence, and n is the number of allelically varying CA repeats, so that (CA)n is the middle DNA subsequence of length 2n. The left PCR primer (denoted by P) is a prefix subsequence of the left flanking region L, and the right PCR primer (denoted by S) is a suffix subsequence of the right flanking region R. For constructing probes to such PCR products, note that a GT polymer binds complementarily to a CA polymer.
In a preferred embodiment for constructing said detection panel, each detection panel is customized to the PCR product of its STR allele. This is done by providing a panel of allele specific oligonucleotides (ASOs) (Lemna, W.K., Feldman, G.L., Kerem, B.-S., Fernbach, S.D., Zevkovich, E.P., O'Brien, W.E., Riordan, J.R., Collins, F.S., Tsui, L.-C., and Beaudet, A.L. 1990. Mutation analysis for heterozygote detection and the prenatal diagnosis of cystic fibrosis. N. Engl. J. Med., 322: 291-296), incorporated by reference, where each ASO contains an allele-specific left flanking region, concatenated with a number n of repeat unit nucleotides, concatenated with an allele-specific right flanking region. The lengths of the left and right regions flanking the varying size repeat polymer are individually adjusted to ensure that the left and right oligomers have roughly the same DNA binding energies when hybridizing to their respective complementary DNA strands.
The thermodynamic basis for this (and alternative) approaches is that while perfect DNA duplex matches will have minimum energy, mismatches will induce bulges or loops in the DNA duplex molecule that increase the free energy. A two base-pair bulge will have sufficiently increased free energy (Ninio, J. 1979. Biochimie, 61: 1133; Salser 1977. Cold Spring Harbor Symp. Quant. Biol., 42: 985), incorporated by reference, to reduce binding affinity by several kcal/mole relative to a perfect match; the larger the bulge, the more unfavorable the binding. Therefore, given an STR target with n repeating units in the middle (anchored by left and right flanking sequences), and an STR source PCR product with m complementary repetitive units (anchored by the complementary left and right flanking sequences), high stringency DNA hybridization is a sensitive measure of whether or not m=n. In this way, a panel of ASOs that provide for all values of n is used to determine the m values expressed from the PCR product.
With CA-repeats as STRs, each DNA target probe in the panel has the form L0(CA)nR0, or the complementary form L0'(CA)n'R0', where n varies across the polymorphic (CA)n alleles of the genetic locus (say, n = 15, 16, ..., 30), L0 is a suffix of the DNA flanking sequence L, R0 is a prefix of the DNA flanking sequence R, and U' is the complementary strand of DNA sequence U.
Consider the example panel of target probes for the STR-45 locus residing in an intron of the dystrophin gene (Clemens, P., Fenwick, R., Chamberlain, J., Gibbs, R., de Andrade, M., Chakraborty, R., and Caskey, C. 1991. Linkage analysis for Duchenne and Becker muscular dystrophies using dinucleotide repeat polymorphisms. Am. J. Hum. Genet., 49: 951-960), incorporated by reference. Fifteen bases are taken from the left flanking AT-rich region, and 10 bases from the right flanking GC-rich region in order to equalize the DNA hybridization energies, as
L0 = ATTAGTTGACCTAAA
R0 = CCCCTTGCCA
Target probes are then constructed by inserting (CA)n units, e.g.,
(CA)10 = CACACACACACACACACACA.
Then, the panel of target probes is constructed as the set of DNA sequences formed by concatenating L0, (CA)n, and R0, as
{ L0 (CA)n R0 | n varies from 10 to 40 by 1 }.
The complementary PCR source products have the form
{ L0' (GT)m R0' | m varies from 10 to 40 by 1 }.
When an exact match occurs between allele source and probe target, i.e., the GT-repeat polymer length m exactly equals the CA-repeat polymer length n, the binding is energetically most favorable (i.e., stable). Thus, under appropriate hybridization binding conditions (Sambrook, J., Fritsch, E.F., and Maniatis, T. 1989. Molecular Cloning, second edition. Plainview, NY: Cold Spring Harbor Press), incorporated by reference, the two alleles m1 and m2 will bind most avidly to the two probes in the target panel having the corresponding n1=m1 and n2=m2. The detection of the two specific targets n1 and n2 out of the entire target panel can be effected by a variety of methods, as described next.
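The construction of the STR-45 target panel described above can be sketched in a few lines of Python. The flanking sequences and the range n = 10 to 40 are taken from the text; the function and variable names are hypothetical, and the reverse-complement helper is included only to illustrate that the strand complementary to a CA-repeat probe carries a GT-repeat.

```python
L0 = "ATTAGTTGACCTAAA"  # 15 bases from the AT-rich left flank (from the text)
R0 = "CCCCTTGCCA"       # 10 bases from the GC-rich right flank (from the text)

def target_panel(left, right, unit="CA", n_min=10, n_max=40):
    """Return {n: left + (unit)n + right} for every repeat count n in the panel."""
    return {n: left + unit * n + right for n in range(n_min, n_max + 1)}

def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))

panel = target_panel(L0, R0)
```

Each probe in the resulting panel has length 25 + 2n, so the probes span 45 to 105 bases for n from 10 to 40.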
Confirmation of the energetics for the STR-45 locus target panel can be seen in the following data generated by running Zuker's RNA folding program (Zuker, M., and Stiegler, P. 1981. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Research, 9: 133-148), incorporated by reference. The left and right flanking sequences each contain ten bases. The temperature here is set to 70°C, a source is used with m = 21, and a panel of targets with n = 18, 19, ..., 24. As shown, the energetic difference between target 21 and its nearest neighbors exceeds 2 kcal/mole, and is thus unambiguously detectable.
Target        18     19     20     21     22     23     24
kcal/mole  -45.4  -48.4  -51.7  -57.5  -53.8  -52.6  -51.7
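The reading of the tabulated energies can be illustrated with a short Python sketch that selects the most favorable (lowest energy) target and reports its margin over the adjacent targets. The function name is hypothetical, and the entry for target 24, whose sign is not legible in the source, is assumed here to be negative like the other entries; that assumption does not affect the result.

```python
# Energies in kcal/mole at 70 degrees C for a source with m = 21 (from the table).
energies = {18: -45.4, 19: -48.4, 20: -51.7, 21: -57.5,
            22: -53.8, 23: -52.6, 24: -51.7}  # sign of target 24 is an assumption

def best_match(energy_by_target):
    """Return the lowest-energy target and its smallest gap to an adjacent target."""
    best = min(energy_by_target, key=energy_by_target.get)
    neighbors = [energy_by_target[n] for n in (best - 1, best + 1)
                 if n in energy_by_target]
    gap = min(e - energy_by_target[best] for e in neighbors)
    return best, round(gap, 1)
```

With these values the best match is target 21, and the smaller of its two neighbor gaps (to target 22) is 3.7 kcal/mole, consistent with the stated margin of more than 2 kcal/mole.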
To implement this differential detection, one detection panel is provided for the PCR products of each genetic marker. Each detection panel corresponds to one marker locus, and is embedded at that locus' coordinate in the spatially localized PCR marker grid. The two surfaces (PCR and detection) may be separate or composite. In this detection panel scheme, the oligomers flanking the STR region are (in general) different for every genetic marker. That is, the target probe panel sequences are customized to each genetic marker.
In another preferred embodiment, a second coordinate of hybridization stringency would be added. This stringency variation can be implemented by varying any of several factors in the hybridization, including temperature, ion concentration, formamide concentration, and nucleotide composition (Sambrook, J., Fritsch, E.F., and Maniatis, T. 1989. Molecular Cloning, second edition. Plainview, NY: Cold Spring Harbor Press), incorporated by reference. The two coordinates of differential targets and differential stringency give an even clearer signature for STR alleles. The signature of two alleles is formed by superimposing those of single alleles. By predetermining all possible single allele and paired allele patterns, unique signatures (in one or two coordinates) can be generated, and then later retrieved to effect genotyping into the component alleles. This is done by comparing the measured genotype signature at a genetic locus with the retrieved signatures, and determining a best match. Alternatively, the separation of the superimposed patterns to effect genotyping can be done without recourse to such a library of signatures by curve fitting or deconvolution processing.
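The signature library approach can be sketched as follows. This is a minimal one-coordinate illustration under stated assumptions: the single-allele signal model is purely hypothetical (a peak where the probe repeat count n equals the allele m), the best match is taken as the least-squares nearest library entry, and all names are illustrative. In practice the library would be built from measured or simulated binding data.

```python
from itertools import combinations_with_replacement

TARGETS = range(15, 31)  # probe repeat counts n in the panel (illustrative range)

def single_signature(m):
    """Hypothetical single-allele signal across the panel, peaking at n == m."""
    return [1.0 / (1.0 + (n - m) ** 2) for n in TARGETS]

def paired_signature(m1, m2):
    """Superimpose (add) the signatures of two single alleles."""
    return [a + b for a, b in zip(single_signature(m1), single_signature(m2))]

# Predetermine the signature of every possible allele pair (m1 <= m2).
LIBRARY = {pair: paired_signature(*pair)
           for pair in combinations_with_replacement(TARGETS, 2)}

def genotype(measured):
    """Return the allele pair whose stored signature best matches the measurement."""
    def squared_error(pair):
        return sum((x - y) ** 2 for x, y in zip(measured, LIBRARY[pair]))
    return min(LIBRARY, key=squared_error)
```

A two-coordinate version would store one such vector per stringency and sum the squared errors over both coordinates.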
Such signatures are seen in simulations with Zuker's program using CA-repeats as STRs, where the parameters are as before, but, additionally, the temperature assumes the multiple values 60°C, 70°C, and 80°C. With the flanking markers, this serves to reinforce the pattern of best match when m=n.
Target     18     19     20     21     22     23     24
60°C    -61.4  -65.0  -68.9  -75.4  -71.6  -70.4  -69.5
70°C    -45.4  -48.4  -51.7  -57.5  -53.8  -52.6  -51.7
80°C    -32.0  -33.6  -36.5  -41.7  -38.1  -36.8  -36.8
In this second differential detection approach, again one unique detection panel is provided for the PCR products of each genetic marker. However, for each STR locus, the target panel is replicated for every measured stringency. This replication can be accomplished by providing for the PCR products of the STR:
(1) A single panel, reused at different times with varying stringencies. The stringency variation can be effected by temperature ramp, or by changing the chemical environment of the hybridization over time.
(2) Multiple panels, usable at the same or different times, with varying stringencies. Here, the genetic locus grid and its PCR amplification is replicated across each of the multiple target panels.
(3) Multiple panels on the same surface. This is done by placing multiple target panels, each with a different stringency, on the same surface. These are all located in the same region of one genetic locus and its PCR amplification. Alternatively, one genetic locus may be replicated multiple times on the same surface, at each position having a target panel of identical composition, but different stringency.
(4) Any combination of the above.
An alternative embodiment uses an identical detection panel of target oligonucleotides for every genetic locus.
This has the utility of reducing manufacturing costs, since no STR locus customization is required, and the same detection panel design and manufacture is reusable for every genetic locus. With CA-repeats as STRs, each grid is comprised of the target panel { (CA)n | n varies across all interesting polymorphisms }.
For example, n could range from 10 to 40.
In another embodiment, intentional DNA pairing mismatch is introduced to bias the hybridization against further STRs. This can be done by a three-fold expansion of these probes by adding a mismatching base pair at one end. For example, with CA-repeats as the STR, these four probe families are possible for every n:
{A,C,T}(GT)n, {C,G,T}(CA)n, (GT)n{A,C,T}, or (CA)n{A,G,T}.
Within each family, say {C,G,T}(CA)n, the three probes
C(CA)n, G(CA)n, T(CA)n, but not A(CA)n,
are provided. The idea is that an intentional mismatch is introduced to avoid the close energetics produced from DNA slippage during hybridization.
Extending this, there is a nine-fold expansion of the STR by introducing intentional mismatch on both sides of the repeat region. For example, with CA-repeats as the STR, these nine probes are generated for every n:
{C,G,T}(CA)n{A,G,T}.
This provides a better balanced mismatch. The strategy has been used in PCR primer design for developing microsatellite markers. This approach can be further extended to introduce as much bias against STR extension as desired, by building targets for every n that have some number of STR blockers on the left, and some number of STR blockers on the right. The main advantage of this intentional mismatch approach is improved STR specificity for a fixed length. The main disadvantage is the increased number of target DNA probes required in the detection panel.
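The nine-fold expansion for a CA-repeat can be enumerated directly. The sketch below is illustrative only: the function name is hypothetical, and the blocker sets default to the {C,G,T} and {A,G,T} families given in the text for (CA)n, so that no probe can extend the repeat by a leading A or a trailing C.

```python
from itertools import product

def mismatch_probes(n, left_blockers="CGT", right_blockers="AGT", unit="CA"):
    """Return every probe of the form b1 + (unit)n + b2 with one blocker per side."""
    return [left + unit * n + right
            for left, right in product(left_blockers, right_blockers)]

probes = mismatch_probes(3)  # e.g. "CCACACAA", "CCACACAG", ...
```

Supplying longer blocker strings, or nesting the call, would give the further extension with several blockers per side at the cost of a larger panel.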
In another embodiment, the same detection panel is used for every genetic locus, but intentional mismatch is introduced by changing the target DNA composition. With CA-repeats as STRs, a family of (CA)n or (GT)n probes is used, but changes are introduced in specific bases. For example, some G's are changed to C's, or to the energetically similar base inosine. One (of many) doping strategies is to introduce k evenly spaced doping sites, where k = 0,1,2, and so on. In general, doping the targets reduces binding affinities in a selective way.
In another embodiment, the doping is introduced in the source molecule, rather than in the targets. This has the advantage of requiring just one target DNA molecule (i.e., a very large repeated oligomer) for all the genetic loci. Thus, the manufacturing costs are greatly reduced, since replicated complex panels for each locus are not needed. The extent of doping is introduced (say, with inosine) as a variable into the PCR reaction itself. The doping is random across the PCR products, but has constant statistics, particularly in the repetitive unit region of the unknown STR PCR product molecule. If two coordinate signatures are desired, hybridization stringency variation can be introduced as well.
In another embodiment, a single STR detection probe is used for all experiments. Using a single probe, say (CA)n (n large and fixed), dramatically reduces manufacturing costs. A temperature ramp experiment is then conducted in parallel for every genetic locus by varying stringency. For each PCR product with GT-repeat length k, when its subpopulation of (GT)k sequences rapidly melts, there will be a sharp change in the melting profile. This will be detectable as a peak in the first derivative of the curve. The peaks provide a DNA size vs. concentration mapping that can then be used to determine the alleles.
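The derivative-peak reading of a melting profile can be sketched numerically. Everything in the model below is an assumption made for illustration: the linear relation between repeat count and melting temperature, the sigmoidal melting transition, and all names. Only the peak-finding step (local maxima of the negative first derivative) corresponds to the procedure described in the text.

```python
import math

def melting_temperature(k):
    """Hypothetical model: each additional repeat raises Tm by 1 degree C."""
    return 40.0 + 1.0 * k

def bound_signal(temp, alleles, width=0.15):
    """Synthetic fraction of PCR product still bound to the probe at temp."""
    return sum(1.0 / (1.0 + math.exp((temp - melting_temperature(k)) / width))
               for k in alleles) / len(alleles)

def melting_peaks(temps, signal, min_height=0.1):
    """Temperatures where -d(signal)/dT has a local maximum above min_height."""
    # Central differences; slope[j] belongs to temps[j + 1].
    slope = [-(signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1])
             for i in range(1, len(temps) - 1)]
    peaks = []
    for j in range(1, len(slope) - 1):
        if slope[j] > min_height and slope[j - 1] < slope[j] >= slope[j + 1]:
            peaks.append(temps[j + 1])
    return peaks

temps = [50 + 0.1 * i for i in range(201)]            # ramp from 50 to 70 degrees C
signal = [bound_signal(t, (18, 23)) for t in temps]   # two alleles, k = 18 and 23
alleles = [round(t - 40.0) for t in melting_peaks(temps, signal)]
```

In a real measurement the peak heights would additionally reflect the relative concentration of each subpopulation, which is the size vs. concentration mapping referred to above.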
These embodiments work with STR repeat units of any size. The newer trinucleotide repeats, tetranucleotide repeats, etc. are more favorable energetically, and provide greater allele differentiation. In instances where unique DNA sequences are assayed, the size of the bound detection oligonucleotide is adjusted to maximally discriminate between a perfect match and a single base pair mismatch. An alternative to detecting perfect vs. mismatched heteroduplexes is using chemical modification reagents (such as CII, CAA, OsO4, or hydroxylamine) that can react with single nucleotide mismatches and then be detected.
In the hybridization detections, the roles of the upper strand and the lower strand may be interchanged. With CA-repeats, this would mean that the CA-strand and the GT-strand relations would be interchanged. Nested PCR (Yourno 1992. A Method for Nested PCR with Single Closed Reaction Tubes. PCR Meth. Appl., 2(1): 60-65; Innis, M.A., Gelfand, D.H., Sninsky, J.J., and White, T.J. 1990. PCR Protocols: A Guide to Methods and Applications. San Diego, CA: Academic Press), incorporated by reference, can be done for a purer PCR amplification to reduce noise. Two primer pairs are used: one pair for the initial amplification, and one labeled pair for the secondary amplification and detection. Ligase chain reaction (LCR) (Landegren, U., Kaiser, R., Sanders, J., and Hood, L. 1988. A ligase-mediated gene detection technique. Science, 241: 1077-1080), incorporated by reference, can be used in place of PCR when an assay for exact match is desired, as is the case with the described panel hybridizations.
In the hybridization detection assays described, both strands must be nucleic acids. Whether these are comprised of DNA, RNA, or any other nucleic acid polymer is nonessential. The key requirement is the binding specificity of complete and partial sequence matches. Further, these nucleic acids are modified (e.g., with linker molecules, biotin, detection moieties) to perform the detection components of the method.
E.2. Method for Genotyping STRs using a Nucleic Acid Ligation
Referring to figure 16, a schematic representation is shown of an assay for determining STR alleles from a nucleic acid ligation step.
Standard oligonucleotide ligation assay (OLA) assays for the exact match of a pair of oligonucleotides X and Y against a DNA template molecule previously amplified by PCR (Landegren, U., Kaiser, R., Sanders, J., and Hood, L. 1988. A ligase-mediated gene detection technique. Science, 241: 1077-1080; Innis, M.A., Gelfand, D.H., Sninsky, J.J., and White, T.J. 1990. PCR Protocols: A Guide to Methods and Applications. San Diego, CA: Academic Press), incorporated by reference. Following amplification with the PCR primers L and R', two ligation oligonucleotides are conventionally used:
(X) initiates the matching sequence from the 5' end, and is biotinylated;
(Y) completes the matching sequence to the 3' end, and is labeled (e.g., with radiolabel or fluorescent label). The 5' end of Y is phosphorylated to allow ligation to X.
When the sequence XY is complementary to a subsequence of the template DNA, ligation occurs and the match is detected. For CA-repeat (or any other polynucleotide repeat) marker detection, the variable length repeat precludes the described use of this assay. However, by introducing a set of third oligonucleotides {Zk}, where each Zk is a k-fold repeat of the unit Z (Z="CA" in the preferred embodiment), CA-repeat alleles can be detected. Specifically,
(Zk) bridges the gap between X and Y. The 5' end of Zk is phosphorylated to allow ligation to X. The phosphorylated Y, in turn, is ligated to Zk.
This CA-repeat detection differs from conventional ligation assays in that (a) a three-way ligation is performed, (b) a set of intermediate molecules is used, (c) these intermediate molecules are universally reusable for assaying more than one CA-repeat marker, and (d) a sequence of varying length can be detected.
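The logic of the three-way ligation panel can be sketched as an exact-match test, one per intermediate Zk. This is a simplified string model under stated assumptions: the template is represented by the strand whose sequence the ligated product X-Zk-Y would equal (i.e., the complement of the strand actually hybridized), ligation is treated as all-or-nothing, and the function name and example sequences are illustrative.

```python
def ligation_panel(template, x, y, unit="CA", k_values=range(10, 41)):
    """Return the k for which X + (unit)k + Y exactly matches part of the template."""
    return [k for k in k_values if x + unit * k + y in template]

# Illustrative flanking oligonucleotides and a template carrying 21 CA repeats.
X = "ATTAGTTGACCTAAA"
Y = "CCCCTTGCCA"
template = "GG" + X + "CA" * 21 + Y + "TT"
matches = ligation_panel(template, X, Y)
```

Only the intermediate with k equal to the allele's repeat count bridges the gap exactly; shorter or longer Zk leave a gap or an overlap and are not ligated on both sides in this model.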
A panel of assays is constructed, one for each intermediate sequence Zk which has k repeats of the base unit Z. The choice of k's in the panel corresponds to the allele distribution (hence repeat sizes) of the CA-repeat marker. When detecting two alleles, the best Zk's which have the strongest signals determine the alleles. This detection can be improved on by deconvolving the panel of signals with the known PCR stutter pattern of the alleles (Perlin, M.W., Burks, M.B., Hoop, R.C., and Hoffman, E.P. 1994. Toward fully automated genotyping: allele assignment, pedigree construction, phase determination, and recombination detection in Duchenne muscular dystrophy. Am. J. Hum. Genet., 55(4): 777-787), incorporated by reference. Deconvolution methods can be similarly applied for assaying more than two alleles, as is done in population studies.
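A minimal sketch of stutter deconvolution over the Zk panel follows. It is not the method of the cited reference; it assumes a simple model in which each true allele of repeat count k contributes a main signal at k and known fractional stutter signals only at shorter lengths k-1, k-2, and so on, and it solves that triangular system from the longest product downward. The stutter weights, signal values, and names are hypothetical.

```python
def deconvolve_stutter(signals, stutter):
    """signals: {k: observed intensity}; stutter: weights at k-1, k-2, ... (main = 1)."""
    true = {}
    for k in sorted(signals, reverse=True):  # longest product first
        # Signal at k explained by stutter from already-resolved longer alleles.
        shadow = sum(w * true.get(k + j, 0.0)
                     for j, w in enumerate(stutter, start=1))
        true[k] = max(signals[k] - shadow, 0.0)
    return true

def call_alleles(signals, stutter, count=2):
    """Return the repeat counts with the strongest deconvolved signals."""
    true = deconvolve_stutter(signals, stutter)
    return sorted(sorted(true, key=true.get, reverse=True)[:count])

# Synthetic panel: alleles 20 (amount 1.0) and 18 (amount 0.2), stutter 0.6 and 0.3.
observed = {20: 1.0, 19: 0.6, 18: 0.5, 17: 0.12, 16: 0.06}
alleles = call_alleles(observed, [0.6, 0.3])
```

In this synthetic case the raw signal at k = 19 (pure stutter) exceeds that at k = 18, so picking the two strongest raw signals would miscall the genotype, whereas the deconvolved signals identify 18 and 20.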
In an alternative embodiment, ligation chain reaction (LCR) is performed, rather than a PCR amplification followed by an OLA detection step. This embodiment uses the three oligonucleotides X, Y, and Z described above. Specific protocols can be found in (Ausubel, F.M., Brent, R., Kingston, R.E., Moore, D.D., Seidman, J.G., Smith, J.A., and Struhl, K., ed. 1993. Current Protocols in Molecular Biology. New York, NY: John Wiley and Sons; Dracopoli, N.J., Haines, J.L., Korf, B.R., Morton, C.C., Seidman, C.E., Seidman, J.G., Moir, D.T., and Smith, D., ed. 1994. Current Protocols in Human Genetics. New York: John Wiley and Sons; Landegren, U., Kaiser, R., Sanders, J., and Hood, L. 1988. A ligase-mediated gene detection technique. Science, 241: 1077-1080), incorporated by reference.
E.3. Method for Genotyping STRs using a Nucleic Acid Loop Ligation
Referring to figure 17, a schematic representation is shown of an assay for determining STR alleles from a nucleic acid loop ligation step.
Two unique primers for a specific microsatellite are constructed. The primers are selected to flank the tandem repeat but to leave at least 15 to 20 bp of internal unique sequence flanking the repeat region.
A loop oligonucleotide is constructed from the internal, unique flanking sequences within the PCR'd product. The oligonucleotide is designed to have significant base mismatching if there is "slippage" and a portion of the oligonucleotide extends into the 5' and 3' portions of the tandem repeat. The degree of extension into the repeat can be varied but is done so that the bridging oligonucleotides are smaller, preferably by 15-20 nucleotides, than the loop oligonucleotide. A melting temperature for the loop oligonucleotide that is about 10° higher than that of the largest bridging oligonucleotide is desirable. In the preferred embodiment, the loop oligonucleotide is biotinylated or covalently bound to a support matrix or surface. In another preferred embodiment described herein, the loop oligonucleotide is bound to paramagnetic beads that are covalently linked to streptavidin. The loop oligonucleotide is phosphorylated at the 5' end.
The microsatellite marker is amplified using standard PCR primers and conditions. The double-stranded DNA is denatured and annealed to the loop oligonucleotide. The conditions of the annealing are such that the concentrations of the DNA and oligonucleotide are relatively low to discourage concatemer formation; the loop oligonucleotide should be present in excess with respect to the PCR product. The hybridization is performed at a sufficient temperature (preferably 37°C) in 0.1x SSC or a comparable buffer such that the annealed loop oligonucleotide and PCR strand are stable, but simple annealing within the tandem repeat of the two PCR DNA strands is disfavored. The annealing is performed at a low concentration in a minimum volume of 200 microliters in order to disfavor concatemer formation.
Referring to figure 17, part A, the original PCR primers do not need to be removed prior to the annealing. After the annealing is completed, the unhybridized DNA and primers are eliminated by washing with the hybridization buffer.
Referring to figure 17, part B, both specificity and sensitivity are achieved by hybridizing the PCR product with the loop oligonucleotide. After removal of the complementary PCR product DNA strand and primers, the structure is annealed (in a set of separate chambers or positions) with a set of bridging oligonucleotides that represent different multiples of the tandem repeat. The bridging oligonucleotide is complementary to the PCR'd DNA strand that is hybridized to the loop oligonucleotide. The bridging oligonucleotide is labeled with radioactivity or another detection tag such as fluorescein. The bridging oligonucleotide is phosphorylated at the 5' end.
The exonuclease reaction is carried out to digest all noncircularized, single- or double-stranded DNAs and primers.
The remaining material on the support matrix represents the undigested circularized loop oligonucleotide and bridging oligonucleotide.
Bridging oligonucleotides that are too short or too long to perfectly close the loop oligonucleotide are ligated to one end of the loop oligonucleotide but cannot allow the structure to circularize. These partially ligated products are then eliminated during the exonuclease step.
The following ligation protocol steps are essentially as in (Innis, M.A., Gelfand, D.H., Sninsky, J.J., and White, T.J. 1990. PCR Protocols: A Guide to Methods and Applications. San Diego, CA: Academic Press), incorporated by reference.
(1) Combine:
3 μl of PCR'd sample
1 μl of sheared salmon sperm DNA at 10 μg/ml
2 μl of H2O
(2) Denature the above DNA by heating at 95°C for 2 minutes.
Alternatively, use alkali denaturation by replacing the 2 μl of H2O with 1 μl of 0.5 N NaOH (room temperature for 10 minutes) followed by 1 μl of 0.5 N HCl.
(3) Add:
1 μl of 140 fmol of biotinylated loop oligonucleotide (phosphorylated)
1 μl of 1.4 fmol of bridge oligonucleotide (phosphorylated with 32P)
2 μl of 0.1-0.2 Weiss units of T4 DNA ligase in 5x ligase buffer (250 mM Tris-Cl (pH=7.5), 500 mM NaCl, 50 mM MgCl2, 25 mM dithiothreitol, 5 mM ATP, 500 μg/μl BSA)
(4) Terminate the reaction by heating at 95°C for 2 minutes.
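As an arithmetic check on the protocol above, the listed volumes sum to a 10 μl reaction, so the 2 μl of 5x ligase buffer is diluted to 1x, and the loop oligonucleotide is in 100-fold molar excess over the bridge oligonucleotide. The short Python sketch below only restates the quantities given in steps (1) and (3); the variable names are illustrative.

```python
from fractions import Fraction

# Volumes in microliters, as listed in steps (1) and (3).
denatured_mix = {"PCR'd sample": 3, "salmon sperm DNA": 1, "H2O": 2}
ligation_mix = {"loop oligonucleotide": 1, "bridge oligonucleotide": 1,
                "T4 DNA ligase in 5x buffer": 2}

total_volume = sum(denatured_mix.values()) + sum(ligation_mix.values())

# Final buffer strength: 5x stock diluted into the total reaction volume.
buffer_strength = Fraction(5 * ligation_mix["T4 DNA ligase in 5x buffer"], total_volume)

# Molar excess of loop over bridge oligonucleotide (140 fmol vs. 1.4 fmol).
loop_to_bridge = Fraction(140) / Fraction("1.4")
```

The alkali denaturation variant replaces the 2 μl of H2O with 1 μl of NaOH plus 1 μl of HCl, so the total volume and the final buffer strength are unchanged.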
After the ligation, 0.1 to 0.5 units each of exonuclease VII (which digests single-stranded DNA from the 5' and 3' ends) and exonuclease III (which digests double-stranded DNA, but not single-stranded DNA) are added. Digestion proceeds at 37°C for 30 minutes.
The nondegraded products (the circularized strands) are bound to the streptavidin-paramagnetic beads in a 500 μl tube, washed three times with 200 μl of washing buffer and then counted directly or denatured off of the beads using the loading buffer/dye for sequencing gels and run on a standard denaturing sequencing gel.
In an alternative embodiment, the annealing and ligation of the bridge and loop oligonucleotides to create a circular structure is performed as a two-stage process to discourage concatemer formation. In this protocol, only the bridge oligonucleotide is phosphorylated. The reaction is identical to that described until the end of the ligation step. At that point, the sample is denatured at 95°C for 5 minutes and 0.1 unit of T4 polynucleotide kinase is added at 37°C for 30 minutes. This phosphorylates the 5' ends of the loop oligonucleotides. The reaction is then again heated at 95°C for 2-5 minutes and the samples are diluted 100 fold in 1x ligase buffer to promote circularization. The diluted sample is concentrated using the streptavidin-paramagnetic beads and then treated as above with exonucleases III and VII.
F. Method for Identifying Inheritance Patterns using Concordance Analysis
As a disease gene segregates within a pedigree, individuals inheriting the linear chromosomal segment that contains the founder's affected disease gene will carry the disease. Chromosomal regions that are closer to the disease gene will be more tightly linked, and these regions, and their associated genetic markers, will have a greater tendency to be associated with the disease. Conversely, regions and markers that are further away will be less likely to have the disease association. In an X-linked disease that is fully penetrant in males, the presence of phenotypic disease indicates inheritance of the affected disease gene, while absence (in males) indicates that the affected disease region has not been inherited. In autosomal diseases, the unaffecteds are less useful.
Inner Product Mapping (IPM) (Perlin, 1993) is a method for mapping large physical DNA probes (e.g., > 25,000bp cosmids or YACs) that uses radiation hybrids (RHs). For each RH, a dense sampling across the chromosome (or whole genome) is first obtained using sequence tagged sites or fluorescence in situ hybridization. This sampling maps the regions in which large chromosomal fragments have been retained, and where they have been lost, indicated by a + or -, respectively. Additionally, every physical probe has its own signature of +'s and -'s, one for each RH, which indicates whether or not the probe lies within some fragment of the RH. The probe's RH signature is compared with the RH signature of every STS. When the signatures match at some RH (i.e., ++ or --), this indicates concordance between the two signatures, whereas when there is a mismatch (i.e., +- or -+), this indicates discordance between the signatures. For every STS sample point along the chromosome, the sum of the matches minus the sum of the mismatches is computed, which generates a profile curve across the chromosome. The peak of this profile suggests the location of the probe. A feature of IPM is its ability to map accurately using few experiments: a logarithmic number of RHs provides linear resolving power.
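The IPM profile computation (matches minus mismatches at each STS) can be sketched as follows. The signatures are represented as strings of '+' and '-' with one character per radiation hybrid; the function name and the example data are hypothetical.

```python
def ipm_profile(probe_signature, sts_signatures):
    """For each STS, return (number of matches) - (number of mismatches) over all RHs."""
    return [sum(1 if p == s else -1 for p, s in zip(probe_signature, signature))
            for signature in sts_signatures]

# Six radiation hybrids; five STSs ordered along the chromosome (synthetic data).
probe = "+-++--"
sts_along_chromosome = ["-+-+-+", "+--+-+", "+-++--", "+-+---", "--+--+"]
profile = ipm_profile(probe, sts_along_chromosome)
peak_index = profile.index(max(profile))
```

Encoding + as +1 and - as -1 makes each profile value the inner product of the two signature vectors, which is the origin of the method's name.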
Recombination events in meiosis cause the founders' chromosomal regions to be retained or lost in progeny. One can consider the chromosomal segment containing the affected disease gene as a probe. The location of this probe is suggested by the concordance of chromosomal regions that affected (or carrier) individuals share with founder(s) (++), or those regions which unaffected individuals do not share with founder(s) (--). Conversely, discordance is suggested in those chromosomal regions affected (or carrier) individuals do not share with the founder(s) (+-), or those regions which unaffected individuals share with founder(s) (-+). This motivates the application of IPM to disease gene localization.
Referring to figure 12, in Step 1 phenotypic information is obtained on a set of related individuals. In Step 2, a dense genotyping across a chromosome using highly-polymorphic STSs is obtained for all informative pedigree members; in the preferred embodiment, this is done with the apparatus of figure 1. Using phase known genotypes, haplotyping is done wherever possible. The founder genotype is obtained directly from the founder (if available), or constructed indirectly as the union of alleles at each locus for every carrier or affected child of the founder.
Referring to figure 12, in Step 3 let v(i) be the sign of the phenotype of an individual i, where
v(i) = +1, when i is affected or a carrier, and
     = -1, when i is not affected.
Let the triple <i,m,a> denote that individual i at marker m has allele a. Genotyping over a pedigree constructs a set of such triples. In Step 4 compute
w(i,m,a) = the weight accorded the triple, as follows.
In one IPM approach, assume that the alleles are sufficiently informative for an identity-by-state (IBS) analysis. Then whether or not an individual's allele is identical to a founder's allele would be known unambiguously.
Therefore, in this case, define
w(i,m,a) = +1, when the founder has allele a occurring at marker m, and
         = -1, when allele a is not shared with the founder.
In a second IPM approach, the w(i,m,a) term weights for the probability that an allele a was transmitted to individual i at marker m by the founder. That is, an accounting for identity-by-descent (IBD) is done. At each link in the inheritance graph, the probability of descent at a marker from the founder for an allele on the chromosome is computed. The product of these link probabilities over every link in the inheritance path therefore provides an estimate of the probability of descent. Linearly rescaling this descent probability from the range [0,1] by the function
f(x) = 2x - 1
provides a number in the range [-1,+1], which is useful for calculations and which assigns certain descent the same weight (+1) as a shared allele in the IBS approach.
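The IBD weight can be sketched as a product of per-link descent probabilities followed by a linear rescaling to [-1,+1]. The sketch below uses the rescaling 2x - 1, chosen so that certain descent (x = 1) maps to +1 and thus agrees in sign with the IBS weights defined above; this sign convention, the function name, and the example probabilities are assumptions made for illustration.

```python
import math

def descent_weight(link_probabilities):
    """Rescale the product of per-link descent probabilities from [0,1] to [-1,+1]."""
    # Assumption: links along the inheritance path are treated as independent.
    probability = math.prod(link_probabilities)
    return 2.0 * probability - 1.0
```

For example, an allele passed through two meioses, each with descent probability 0.5, has descent probability 0.25 and weight -0.5, while an allele known with certainty to descend from the founder has weight +1.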
Under either IPM approach (IBS or IBD), in Step 5 a concordance is computed for every allele of every STS marker by summing over the individuals' {i} chromosomes as
c(m,a) = SUM (over i) [ v(i) * w(i,m,a) ].
Each summand is a number between -1 and +1. A marker which has an allele maximizing this sum has the greatest concordance with the founder, and suggests a chromosomal region containing the gene. Taking the maximum value c(m,a) of the alleles {a} at each marker m, in Step 6 the concordance function
C(m) = MAX (over a) [ c(m,a) ]
is computed. Note that this computation proceeds directly from the allele data, and requires no analysis of recombination breakpoints. In Step 7, the genetic regions correlating with the trait are localized. With densely sampled markers {m} at previously determined map locations, the concordance function C(m) computes a profile over the chromosome. A pattern in this profile that rises to a peak and then descends again suggests the location of the gene (near the peak). With autosomal or nonfully penetrant disorders, the unaffected individuals are weighted to have less influence. While two-point or multi-point likelihood analyses are alternative embodiments to the one-point IPM approach, their algorithmic complexity may preclude practical application to dense genotyping of very many individuals. Multigenic traits will produce patterns of multiple peaks; each peak corresponds to a region on the genome that influences the trait.
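Steps 3 through 6 can be sketched together for the IBS case. This is a simplified illustration: each individual contributes a single allele per marker (one chromosome), the weights follow the IBS definition of w(i,m,a), and the function names and the small pedigree data set are hypothetical.

```python
def ibs_weights(triples, founder_alleles):
    """w(i,m,a) = +1 if the founder has allele a at marker m, else -1."""
    return {(i, m, a): (1 if a in founder_alleles[m] else -1)
            for (i, m, a) in triples}

def concordance_profile(phenotype, weights):
    """Return {m: C(m)} with C(m) = max over a of sum over i of v(i) * w(i,m,a)."""
    c = {}
    for (i, m, a), w in weights.items():
        c[(m, a)] = c.get((m, a), 0) + phenotype[i] * w
    profile = {}
    for (m, a), value in c.items():
        profile[m] = max(profile.get(m, value), value)
    return profile

# Synthetic example: individuals 1 and 2 are affected, individual 3 is not.
v = {1: +1, 2: +1, 3: -1}
founder = {"m1": {"a"}, "m2": {"a"}, "m3": {"a"}}
triples = [(1, "m1", "a"), (2, "m1", "b"), (3, "m1", "a"),
           (1, "m2", "a"), (2, "m2", "a"), (3, "m2", "b"),
           (1, "m3", "b"), (2, "m3", "a"), (3, "m3", "a")]
profile = concordance_profile(v, ibs_weights(triples, founder))
```

In this example the profile peaks at the middle marker m2, where both affected individuals share the founder allele and the unaffected individual does not.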
Dense genotypes are obtained for related sets of individuals; in the preferred embodiment, this is done with the apparatus of figure 1. In Step 8 of figure 12, the genetic patterns obtained in Step 7 are used to assess the risk of individuals for various traits and diseases. In Step 9, the localization of disease genes on a genetic map is used to initiate the cloning of the gene via positional cloning techniques (Kerem, B.-S., Rommens, J.M., Buchanan, J.A., Markiewicz, D., Cox, T.K., Chakravarti, A., Buchwald, M., and Tsui, L.-C. 1989. Identification of the cystic fibrosis gene: genetic analysis. Science, 245: 1073-1080. Riordan, J.R., Rommens, J.M., Kerem, B.-S., Alon, N., Rozmahel, R., Grzelczak, Z., Zielenski, J., Lok, S., Plavsic, N., Chou, J.-L., Drumm, M.L., Iannuzzi, M.C., Collins, F.S., and Tsui, L.-C. 1989. Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science, 245: 1066-1073.), incorporated by reference.
G. Useful Applications of the System
Use of the apparatus described in figure 1 with the system in figure 3 is made for health risk assessment, as described above. Dense genotyping has application to prenatal genetic screening (Schwartz, L.S., Tarleton, J., Popovich, B., Seltzer, W.K., and Hoffman, E.P. 1992. Fluorescent Multiplex Linkage Analysis and Carrier Detection for Duchenne/Becker Muscular Dystrophy. Am. J. Hum. Genet., 51: 721-729.), incorporated by reference, and in detecting chromosomal abnormalities. Such genotyping can be used for actuarial analysis of health risks in order to predict and reduce health care costs. Genotyping also finds application in transplantation (Scharf, S., Saiki, R., and Ehrlich, H. 1988. New methodology for HLA class II oligonucleotide typing using polymerase chain reaction (PCR) amplification. Hum. Immunol., 23: 143.), incorporated by reference, and in the screening and evaluation of military personnel. The loop mismatch methods described can detect exon repeats that correlate with disease and prognosis, as well as exon alleles (via multiple chemical modification assays) for precise molecular diagnostics (Beggs, A., and Kunkel, L. 1990. A polymorphic CACA repeat in the 3' untranslated region of dystrophin. Nucleic Acids Res, 18: 1931. Beggs, A.H., Koenig, M., Boyce, F.M., and Kunkel, L.M. 1990. Detection of 98% of DMD/BMD gene deletions by polymerase chain reaction. Hum. Genet., 86: 45-48), incorporated by reference. The hybridization panel methods can similarly detect exon alleles.
The apparatus and system are useful for the positional cloning of genes that cause traits and diseases. Linkage (Ott, J. 1991. Analysis of Human Genetic Linkage, Revised Edition. Baltimore, Maryland: The Johns Hopkins University Press.), incorporated by reference, and other analyses use dense genotypes to elicit patterns of inheritance and localized genetic regions of influence that correlate with genes. Such patterns are useful in genetic design applications, such as animal and plant husbandry, for example, for crop improvement (Bernatzky, R. (1993). Genetic mapping and protein product diversity of the self-incompatibility locus in wild tomato (Lycopersicon peruvianum). Biochemical Genetics, 31(3-4): 173-84. Ho, J.Y., Weide, R., Ma, H.M., van, W.M., Lambert, K.N., Koornneef, M., Zabel, P., and Williamson, V.M. (1992). The root-knot nematode resistance gene (Mi) in tomato: construction of a molecular linkage map and identification of dominant cDNA markers in resistant genotypes. Plant Journal, 2(6): 971-82.), incorporated by reference, and cataloguing strains.
Dense genotyping can be used to detect the occurrence of chromosomal patterns in a population. This applies in law enforcement applications (Jeffreys, A.J., Brookfield, J.F.Y., and Semeonoff, R. 1985. Positive identification of an immigration test-case using human DNA fingerprints. Nature, 317: 818-819.), incorporated by reference, for genetically fingerprinting individuals, as well as in paternity testing to assess parenthood.
Genotyping can monitor the changes in the chromosomal patterns of populations, including:
• Cancer testing and assessment (Zhang, Y., Coyne, M.Y., Will, S.G., Levenson, C.H., and Kawasaki, E.S. (1991). Single-base mutational analysis of cancer and genetic diseases using membrane bound modified oligonucleotides. Nucleic Acids Research, 19(14): 3929-33.), incorporated by reference, determining the metastatic extent of tumor, and its sensitivity to treatment.
• In vitro assays for toxic, mutagenic, and other pharmacological effects of chemicals (e.g., on tissue cultures).
• The relatedness of populations, and quantitating environmental impact on populations (Atlas, et al, 1992. Molecular Approaches for Environmental Monitoring of Microorganisms. BioTechniques, 12(5): 706-714. Bej, and Mahbubani 1992. Applications of the Polymerase Chain Reaction in Environmental Microbiology. PCR Meth. Appl., 1(3): 151-159), incorporated by reference.
• Determining geographical spread for animal migration (e.g., fisheries) and pathogen spread (e.g., epidemiology).
• In the pest control industry for determining tolerance and susceptibility, and detecting resistance to pest control agents.
• With microorganisms (including yeast and bacteria) to characterize exon DNA for pathogenicity, or to determine causative organisms for infections (Lerman, L.S., ed. 1986. DNA Probes: Applications in Genetic and Infectious Disease and Cancer. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory), incorporated by reference.
Although the invention has been described in detail in the foregoing embodiments for the purpose of illustration, it is to be understood that such detail is solely for that purpose and that variations can be made therein by those skilled in the art without departing from the spirit and scope of the invention except as it may be described by the following claims.

Claims

WHAT IS CLAIMED IS:
1. An apparatus for analyzing genetic material of an organism comprising:
means for amplifying the genetic material of the organism; and
means for characterizing the amplified genetic material, said characterizing means in communication with the amplifying means, said characterizing means containing all of the genetic material within a region having a radius of less than two feet, said amplifying means and characterizing means characterizing the genetic material at a rate exceeding 100 sequence-tagged sites per hour per organism.
2. An apparatus as described in Claim 1 wherein the genetic material includes nucleotide sequences and wherein the amplifying means includes a reaction plate with which the genetic material is in contact, said reaction plate having a plurality of chambers, each of which is disposed in a unique location of the plate corresponding to a location within a genome having at least one nucleotide sequence.
3. An apparatus as described in Claim 2 wherein the characterizing means includes means for detecting whether a chamber contains a nucleotide sequence of the genetic material corresponding to the chamber's unique location.
4. An apparatus as described in Claim 3 including a thermocycler in thermal communication with the plate to heat and cool the plate.
5. An apparatus as described in Claim 4 wherein the detecting means includes a detector connected to the chambers which produces a chamber signal for each chamber corresponding to genetic material in each chamber, and a processor in communication with the detector which receives the signals and identifies unique properties of the nucleotides in each chamber.
6. An apparatus as described in Claim 5 wherein the unique properties of the nucleotide of the genetic material in each chamber pertain to a number of nucleotides in any of the nucleotide sequences of the genetic material.
7. An apparatus as described in Claim 6 wherein the amplifying means includes at least one nucleotide sequence that corresponds to each chamber in contact with the chamber, each nucleotide sequence interacting with the nucleotide sequence of the genetic material of the nucleotide sequence if it is present.
8. A method for analyzing genetic material of an organism comprising the steps of:
amplifying the genetic material; and
characterizing the amplified genetic material in a region having a radius of less than two feet at a rate exceeding 100 sequence-tagged sites per hour per organism.
9. A method as described in Claim 8 wherein the genetic material includes DNA or RNA.
10. A method as described in Claim 9 including after the characterizing step, there is the step of assessing risk of illness for which there is a genetic susceptibility in the organism.
11. A method for manufacturing an apparatus for analyzing genetic material of an organism comprising the steps of:
placing corresponding sequence-tagged sites in contact with corresponding chambers of a plate;
connecting detectors to the chambers which can detect whether nucleotide sequences of the genetic material of the organism, when placed in contact with the chambers, have reacted with the corresponding sequence-tagged sites in the corresponding chamber;
placing a thermocycling device in contact with the plate to cause the sequence-tagged sites in the chambers to react with genetic material of the organism that is placed in contact with the chambers; and
connecting a computer to the detectors and to the thermocycling device to control operation of the thermocycling device, and to receive signals which correspond to the genetic material of the organism and the sequence-tagged sites of each chamber from the detectors.
12. A method of determining the size of nucleotide sequences of an STR marker contained on genetic material comprising the steps of:
amplifying the nucleotide sequences of the genetic material in a region relating to the STR marker;
performing nucleic acid hybridizations on the amplified nucleotide sequences;
producing signals corresponding to the hybridizations of the amplified nucleotide sequences; and
determining the sizes of the nucleotide sequences contained in the genetic material.
13. A method as described in Claim 12 wherein the hybridizations include a nucleic acid synthesis step.
14. A method as described in Claim 12 wherein the hybridizations include a nucleic acid ligation step.
PCT/US1995/001395 1994-02-04 1995-02-02 Method and apparatus for analyzing genetic material WO1995021269A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US19249194A 1994-02-04 1994-02-04
US08/192,491 1994-02-04

Publications (1)

Publication Number Publication Date
WO1995021269A1 true WO1995021269A1 (en) 1995-08-10

Family

ID=22709899

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1995/001395 WO1995021269A1 (en) 1994-02-04 1995-02-02 Method and apparatus for analyzing genetic material

Country Status (1)

Country Link
WO (1) WO1995021269A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5126239A (en) * 1990-03-14 1992-06-30 E. I. Du Pont De Nemours And Company Process for detecting polymorphisms on the basis of nucleotide differences
US5364759A (en) * 1991-01-31 1994-11-15 Baylor College Of Medicine DNA typing with short tandem repeat polymorphisms and identification of polymorphic short tandem repeats
US5364759B1 (en) * 1991-01-31 1997-11-18 Baylor College Medicine Dna typing with short tandem repeat polymorphisms and indentification of polymorphic short tandem repeats
US5364759B2 (en) * 1991-01-31 1999-07-20 Baylor College Medicine Dna typing with short tandem repeat polymorphisms and identification of polymorphic short tandem repeats
US5348853A (en) * 1991-12-16 1994-09-20 Biotronics Corporation Method for reducing non-specific priming in DNA amplification

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GENOMICS, Volume 2, issued 1988, SKOLNICK et al., "Simultaneous Analysis of Multiple Polymorphic Loci Using Amplified Sequence Polymorphisms (ASPs)", pages 273-279. *
NATURE, Volume 359, issued 29 October 1992, WEISSENBACH et al., "A Second-Generation Linkage Map of the Human Genome", pages 794-801. *
SCIENCE, Volume 245, issued 29 September 1989, OLSON et al., "A Common Language for Physical Mapping of the Human Genome", pages 1434-1435. *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1996036731A2 (en) * 1995-05-19 1996-11-21 Trustees Of Boston University Nucleic acid detection methods
WO1996036731A3 (en) * 1995-05-19 1997-02-06 Univ Boston Nucleic acid detection methods
US5753439A (en) * 1995-05-19 1998-05-19 Trustees Of Boston University Nucleic acid detection methods
WO1998023776A1 (en) * 1996-11-29 1998-06-04 Amersham Pharmacia Biotech Uk Ltd. Method for determining tandem repeat sequence length
US6083701A (en) * 1996-11-29 2000-07-04 Amersham Pharmacia Biotech Uk Limited Method for determining tandem repeat sequence length
US7972778B2 (en) 1997-04-17 2011-07-05 Applied Biosystems, Llc Method for detecting the presence of a single target nucleic acid in a sample
US8067159B2 (en) 1997-04-17 2011-11-29 Applied Biosystems, Llc Methods of detecting amplified product
US8257925B2 (en) 1997-04-17 2012-09-04 Applied Biosystems, Llc Method for detecting the presence of a single target nucleic acid in a sample
US8278071B2 (en) 1997-04-17 2012-10-02 Applied Biosystems, Llc Method for detecting the presence of a single target nucleic acid in a sample
US8551698B2 (en) 1997-04-17 2013-10-08 Applied Biosystems, Llc Method of loading sample into a microfluidic device
US8563275B2 (en) 1997-04-17 2013-10-22 Applied Biosystems, Llc Method and device for detecting the presence of a single target nucleic acid in a sample
US8822183B2 (en) 1997-04-17 2014-09-02 Applied Biosystems, Llc Device for amplifying target nucleic acid
US8859204B2 (en) 1997-04-17 2014-10-14 Applied Biosystems, Llc Method for detecting the presence of a target nucleic acid sequence in a sample
US9506105B2 (en) 1997-04-17 2016-11-29 Applied Biosystems, Llc Device and method for amplifying target nucleic acid

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA JP US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA