US20060134662A1 - Method and system for genotyping samples in a normalized allelic space - Google Patents



Publication number
US20060134662A1
Authority
US
United States
Prior art keywords
allelic
genotype
normalized
sample
space
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/259,162
Inventor
Mark Pratt
David Holden
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Applied Biosystems LLC
Applied Biosystems Inc
Original Assignee
Individual
Priority to US11/259,162
Application filed by Individual
Assigned to APPLERA CORPORATION, A DELAWARE CORPORATION: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HOLDEN, DAVID; PRATT, MARK R.
Publication of US20060134662A1
Assigned to BANK OF AMERICA, N.A, AS COLLATERAL AGENT: SECURITY AGREEMENT. Assignors: APPLIED BIOSYSTEMS, LLC
Assigned to APPLIED BIOSYSTEMS INC.: CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: APPLERA CORPORATION
Assigned to APPLIED BIOSYSTEMS, LLC: MERGER (SEE DOCUMENT FOR DETAILS). Assignors: ATOM ACQUISITION, LLC AND APPLIED BIOSYSTEMS INC.
Assigned to APPLIED BIOSYSTEMS INC.: MERGER (SEE DOCUMENT FOR DETAILS). Assignors: ATOM ACQUISITION CORPORATION
Priority to US12/551,438 (US8359166B2)
Assigned to APPLIED BIOSYSTEMS, LLC: MERGER (SEE DOCUMENT FOR DETAILS). Assignors: APPLIED BIOSYSTEMS INC.
Assigned to APPLIED BIOSYSTEMS INC.: CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: APPLERA CORPORATION
Priority to US13/602,676 (US20130060479A1)
Assigned to APPLIED BIOSYSTEMS, INC.: LIEN RELEASE. Assignors: BANK OF AMERICA, N.A.
Assigned to APPLIED BIOSYSTEMS, LLC: CORRECTIVE ASSIGNMENT TO CORRECT THE RECEIVING PARTY NAME PREVIOUSLY RECORDED AT REEL: 030182 FRAME: 0677. ASSIGNOR(S) HEREBY CONFIRMS THE RELEASE OF SECURITY INTEREST. Assignors: BANK OF AMERICA, N.A.
Priority to US15/926,535 (US20180211000A1)
Current legal status: Abandoned

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation

Definitions

  • Genotyping analysis can be hampered by noise and other perturbations affecting signal response. Conventional approaches to making genotype calls often fail to resolve these factors, which increases the uncertainty and subjectivity of the analysis. In certain instances, conventional genotyping routines discard data that does not conform to an identifiable signal cluster, even though retaining this information is generally desirable.
  • Removing data from a sample set in this manner may prevent identification of critical data points and reduce the overall efficiency of the genotyping analysis. Conversely, including noisy allele signal responses may reduce genotype call accuracy, because the noise can skew certain conventional genotyping approaches.
  • FIG. 1 is a schematic illustration of a system for performing genotyping on genetic samples in accordance with some implementations of the present invention;
  • FIG. 2 is a schematic flowchart of the operations used to call genotypes using various aspects of the present invention as well as those available through clustering;
  • FIG. 3 is a flowchart diagram depicting the operations associated with normalizing the allelic signal response and transforming the signals into a normalized allelic space in accordance with some implementations of the present invention;
  • FIG. 4A is a schematic representation that illustrates the expected genotype values and basis for some of the constraints used by some implementations of the present invention;
  • FIG. 4B illustrates the more random-appearing distribution of allelic signals before normalization and processing performed in accordance with some aspects of the present invention;
  • FIG. 4C illustrates the effect of normalization on allelic signal data from FIG. 4B;
  • FIG. 4D illustrates the distribution of a large number of normalized training genotypes used to develop the prior probability distributions for the three expected genotypes;
  • FIG. 5A presents the operations of using a Cartesian coordinate system in the normalized allelic space to call the genotype for a given sample;
  • FIG. 5B provides an alternative approach to calling genotypes using a fixed prior probability distribution in accordance with implementations of the present invention;
  • FIG. 5C is a flowchart diagram of the operations for generating the fixed prior probability distribution used in calling the genotypes; and
  • FIG. 6 depicts a schematic system and methods for calling genotypes in accordance with implementations of the present invention.
  • Aspects of the present invention describe an apparatus and method for generating genotype calls for one or more samples.
  • The genotyping approach uses an allelic model, represented by one or more model parameters, to transform an allelic signal response and correct it for systematic variation.
  • The parameters in the model are associated with noise and other sources of systematic variation.
  • The model and parameters may then be used to transform the allelic signals into a representation in normalized allelic space that compensates for the one or more sources of noise or systematic variation.
  • Systematic variation may arise from instrument artifacts, chemistry artifacts, process artifacts, operational artifacts, temperature artifacts, humidity artifacts, volume artifacts and assay artifacts. Compensating for these variations in this manner makes it possible to determine the genotype for the sample based upon its relationship to the representation of the allelic signals in normalized allelic space and in accordance with the allelic model.
  • FIG. 1 is a schematic illustration of a system configured for genotypic analysis of samples in accordance with some implementations of the present invention.
  • System 100 includes biological/genetic samples 102, assays 104, sample processing systems 106, allele signal detection and acquisition system 108 (hereinafter allelic signal detection system 108), allele signal response database 110, normalized allelic space transform and genotype analysis system 112 (hereinafter normalized allelic space transform system 112), genotype database derived from normalized allelic space 114 and output systems and interface for genotyping analysis 116.
  • System 100 may be used in genetic and biological research to classify genetic sequence variations in samples 102, which may include insertions, deletions, restriction fragment length polymorphisms ("RFLPs"), short tandem repeat polymorphisms ("STRPs") and single nucleotide polymorphisms ("SNPs").
  • Many of the examples and descriptions contained herein below directly relate to this latter SNP type of genetic sequence variation.
  • SNP analysis is useful in studying the relationship between nucleotide variations and diseases or other conditions.
  • SNPs are but one of many different types of nucleotide variations, and it is contemplated that the concepts described with respect to SNPs can also be applied to the many other sources of genetic sequence variation described hereinabove and known to those skilled in the art.
  • A large amount of data may be generated and analyzed when performing genetic analyses or experiments. For example, this may be accomplished in sample processing systems 106 in part by placing samples in multiple wells and then evaluating them with assays 104.
  • For example, 96 wells of genetic samples may be evaluated with 48 assays 104 sensitive to 48 different SNPs in the genetic samples 102. This results in a total of 4608 (96 × 48) different tests to be performed for each tray of samples, with a corresponding set of results.
  • Assays 104 can be multiplexed with multiple assays for each sample or singleplexed whereby a single assay can be evaluated individually with each sample.
  • Allele signal detection system 108 assays the various samples and produces corresponding allele signal responses. These signals may be diagrammed using a scatter plot or other diagrammatic representation. In one implementation, the assays read by allele signal detection system 108 utilize a pair of fluorescent probes, each having an associated discrete marker or reporter dye responsive to one of the different alleles to be detected. During readout of the sample, allele signal detection system 108 records and associates the fluorescent intensities measured for each sample to determine its particular allelic composition.
  • The results of such an assay are used to determine whether a selected sample is homozygous for a first allele (e.g., A/A), homozygous for a second allele (e.g., B/B) or heterozygous for a combination of alleles (e.g., A/B).
  • Homozygous portions of the sample tend to exhibit increased fluorescence from one marker type, with the fluorescence from the opposing marker type significantly diminished or completely absent.
  • Portions of the sample identified as heterozygous typically present a more uniform degree of fluorescence from both markers, indicating a contribution from two different alleles A and B.
  • A commercial implementation of these operations is performed in the Applied Biosystems SNPlex and TaqMan platforms, which further employ Applied Biosystems' 3730, 3730xl and 3100 Prism Genetic Analyzers, Prism 7700 and 7900HT sequence detection systems and BioTrove OpenArray systems to monitor and record the aforementioned amplified fluorescence signals.
  • The raw data of the resulting allelic signal response is stored in allele signal response database 110.
  • This information may also be represented visually as values in a scatter or cluster plot.
  • Conventional genotyping systems typically represent the allelic signal response data from allele signal response database 110 in such plots to aid in visualizing clusters of signals representing the different genotypes. In some cases, these clusters form three distinct groupings of signal data, interpreted as samples that are homozygous for a first allele (e.g., A/A), homozygous for a second allele (e.g., B/B) or heterozygous for a combination of alleles (e.g., A/B).
  • Normalized allelic space transform and genotype analysis system 112 improves genotype calling by recognizing and compensating for systematic variation that may be present in system 100 and genetic samples 102. For example, such variation may be introduced by consistent sources of variation inherent to each sample, variation from run to run (i.e., variation in the run), imbalance between the signal responses of pairs of complementary alleles (i.e., allelic imbalance) and many other factors that may arise during genotype analysis.
  • As a result, the alleles can be called with greater accuracy and in a manner that can be more readily automated for high-throughput analytical genotyping systems.
  • The signal responses detected from the alleles are transformed from a relative measure of the allelic signal response into a unitless scalar measure of the various genotypes (i.e., homozygous A/A, homozygous B/B and heterozygous A/B). Through this transformation, representing the alleles in a normalized allelic space provides a more consistent and more accurate representation of the alleles being measured in a given sample.
  • Processing performed by normalized allelic space transform system 112 also tends to preserve more data, increasing the size of the sample sets stored in genotype database derived from normalized allelic space 114.
  • Computer-based analytical operations can be performed with fewer exceptions and less need to eliminate data that would otherwise appear as outlying data points or noise.
  • Output systems and interface for genotyping analysis 116 can therefore provide more automated, rule-based genotyping rather than requiring visual inspection or subjective viewing of the datasets. For example, a representation of genotypes in the normalized allelic space 114 can be more readily analyzed by a computer and requires less visualization and subjective judgment from a user or operator of the analytic equipment used in genotyping.
  • FIG. 2 is a schematic flowchart of the operations used to call genotypes using both aspects of the present invention as well as those available through clustering.
  • The analytical framework for genotyping operations generally works in three stages: genotype preprocessing 200, genotype classification 202 and genotype presentation 204, as illustrated in FIG. 2. It is contemplated that the preprocessing transformations of the allelic signals in accordance with aspects of the present invention provide a superior approach to genotype analysis, although conventional clustering and other approaches as suggested by Holden should also be considered. Consequently, the operations outlined in FIG. 2 include clustering as an optional part of the analytical framework, even though in most cases genotype calling may be performed with sufficient accuracy without these additional operations.
  • Allelic signal intensity information is gathered for a combination of experimental samples and corresponding assays, as represented by the allele A 206A and allele B 206B sample matrices. For example, each of 48 assays is applied to 96 samples, each tested for two alleles, for a total of 9216 measurements contained in the allele A 206A and allele B 206B matrices of intensity data points. As previously noted, these intensity values generally include several sources of systematic variation that prevent the cluster or scatter plots from exhibiting the expected genotype distributions of homozygous A/A, homozygous B/B or heterozygous A/B.
  • Implementations of the present invention transform the allelic signals to a normalized allelic space through a model and model parameters that compensate for these identified systematic variations (208). This step not only compensates for these systematic variations but also preserves more data.
  • Implementations of the present invention can then call the genotype for a target sample according to its relationship with expected distributions in normalized allelic space (210).
  • Several approaches can be used to compare the measurements associated with the target experimental sample and the other samples in normalized allelic space. For example, a proximity approach identifies a minimum distance between the target experimental sample and an expected genotype using Cartesian coordinates in the normalized allelic space.
  • An alternative implementation performs additional genotype preprocessing 200 to create a fixed prior probability distribution of alleles in normalized allelic space. In this operation, training data derived from many different assays, samples and runs are combined for analysis using a common model and several parameters.
  • Each of these data sets is normalized as previously described by identifying an appropriate set of parameters to compensate for the one or more sources of systematic variation.
  • These normalized values are associated with an error function that further refines the distribution presented in the normalized allelic space.
  • This alternative implementation determines a maximum probability genotype for the target sample as it relates to the error function and the neighboring data samples used as training data. Further details on this operation are provided below.
  • Genotype classification (212) can also employ conventional clustering as described in Holden.
  • In that case, the genotype of the target sample is called according to a clustering of the allelic signal response as described in Holden. For example, this may involve measuring a relative angular offset of values for various allelic signals identified for a set of samples and assays.
  • Genotype calls for the target sample in normalized allelic space (210) or based upon clustering of allelic data signals are then presented along with a confidence factor for each call and classification (214).
  • Referring to FIG. 3, a flowchart diagram depicts the operations associated with normalizing the allelic signal response and transforming the signals into a normalized allelic space in accordance with some implementations of the present invention.
  • Implementations of the present invention initially model the allelic signal response with one or more model parameters for each source of systematic variation (302).
  • A basic assumption, or constraint, of the model is that the normalized signals of any pair of alleles, whether homozygous A/A, homozygous B/B or heterozygous A/B, should sum to unity.
  • The model incorporates parameters that provide sufficient flexibility to account for multiplicative variability contributed by each run, each sample and each allele within a SNP assay. Accordingly, one implementation of this model appears below in Equation 1.
  • Equation 1: $\alpha_i \beta_j \left( A_{ij}/\gamma_i + \gamma_i B_{ij} \right) \approx 1$
  • Aspects of the present invention identify model parameters that reduce the overall variation between the allelic signal response and its representation in the normalized allelic space (304). Essentially, this operation identifies values for the model parameters that represent the systematic variation of the measured allelic signal response.
  • One example technique used to identify these parameters minimizes an appropriately selected error function.
  • In Equation 2, a chi-square error function is selected to identify the parameters that reduce this overall variation; however, it is contemplated that many other error functions could be selected and minimized to identify these parameter values.
  • The chi-square statistic in Equation 2 is a robust statistic often used to process data of uneven quality. Chi-square statistics also behave appropriately as an error function, because the values generated are non-negative, their minima usually indicate optimal parameters, and they describe goodness of fit.
  • The chi-square statistic operates by comparing the actual, or observed, signal values for each allele with the signal values that would be expected if there were no relationship at all among the set of allele signals in a run made with several samples and SNP assays.
  • The chi-square statistic in Equation 2 is useful because it tends to reduce the weight of data points that may have large error. Likewise, data with smaller error, which should be emphasized, are weighted more heavily.
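Equation 2 is not reproduced in this text, so the following Python sketch only illustrates the weighting behaviour described above. It assumes a weighted chi-square over the unity constraint, χ² = Σ ((a_ij + b_ij - 1) / σ_ij)², where σ_ij is a hypothetical per-measurement uncertainty; the function name `chi_square` is likewise illustrative, and the exact form of the patent's Equation 2 may differ.

```python
from typing import Sequence


def chi_square(a: Sequence[float], b: Sequence[float], sigma: Sequence[float]) -> float:
    """Weighted chi-square of the assumed unity constraint a + b = 1.

    Each residual is divided by its uncertainty sigma, so a point with a
    large sigma (large error) contributes less to the total and a point
    with a small sigma contributes more.
    """
    if not (len(a) == len(b) == len(sigma)):
        raise ValueError("a, b and sigma must have the same length")
    total = 0.0
    for a_ij, b_ij, s_ij in zip(a, b, sigma):
        if s_ij <= 0:
            raise ValueError("sigma values must be positive")
        residual = a_ij + b_ij - 1.0  # deviation from the unity constraint
        total += (residual / s_ij) ** 2
    return total
```

The same residual of 0.1 contributes (0.1/0.1)² = 1 when σ = 0.1 but only (0.1/0.5)² = 0.04 when σ = 0.5, which is the down-weighting of high-error points described above.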
  • To solve Equation 1, the non-linear model and the constraints provided in Equations 1-4 are recast into separate operations that can be solved quickly with linear algebra. This approach avoids the risk of converging to a local minimum rather than the global minimum of the function, and it allows an approximate solution to Equation 1 to be found without expending significant computation time. In practice, recasting the model into separate steps solvable with linear algebra gives satisfactory results.
  • The parameters are then applied to the measured allele signals to transform the allele pairs into a normalized allelic space rather than the allelic signal space originally measured.
  • The resulting normalized allelic space can be used universally, because it is largely free of systematic instrument and chemistry artifacts and can be used across runs, samples and alleles within SNP assays.
  • The conversion between the allele signal responses $A_{ij}$ and $B_{ij}$ and the corresponding values $a_{ij}$ and $b_{ij}$ in normalized allelic space is defined by the relationships in Equation 6.
  • Systematic variation in the raw allele signals $A_{ij}$ and $B_{ij}$ is compensated through Equation 6, generating normalized allele signals $a_{ij}$ and $b_{ij}$ in the normalized allelic space.
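Equation 6 and the exact parameter definitions are not reproduced in this text, so the following Python sketch only illustrates the idea of solving the model with linear algebra. It assumes a model of the form α_i β_j (A_ij/γ_i + γ_i B_ij) ≈ 1, where i indexes assays and j indexes samples, treats the per-assay imbalance factors γ_i as already known, and fits the products α_i β_j by ordinary least squares on the logarithm of the imbalance-corrected total signal. That log-space least-squares fit is a swapped-in technique, not the chi-square minimization described above, and `normalize` is a hypothetical name.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def normalize(A: Matrix, B: Matrix, gamma: List[float]) -> Tuple[Matrix, Matrix]:
    """Map raw allele signals into an assumed normalized allelic space.

    A[i][j] and B[i][j] are raw signals for assay i and sample j. The
    assumed model is alpha_i * beta_j * (A_ij / gamma_i + gamma_i * B_ij) ~= 1,
    with the per-assay imbalance factors gamma_i supplied by the caller.
    All imbalance-corrected totals must be positive.
    """
    n_i, n_j = len(A), len(A[0])
    # Taking logs turns the multiplicative scales into additive row (assay)
    # and column (sample) effects: log T_ij ~= -log alpha_i - log beta_j.
    log_t = [[math.log(A[i][j] / gamma[i] + gamma[i] * B[i][j])
              for j in range(n_j)] for i in range(n_i)]
    row = [sum(r) / n_j for r in log_t]
    col = [sum(log_t[i][j] for i in range(n_i)) / n_i for j in range(n_j)]
    grand = sum(row) / n_i
    # Closed-form least-squares fit of the two-way additive model on a
    # complete matrix: fitted log T_ij = row_i + col_j - grand.
    log_alpha = [grand - r for r in row]
    log_beta = [-c for c in col]
    a: Matrix = []
    b: Matrix = []
    for i in range(n_i):
        scale = [math.exp(log_alpha[i] + log_beta[j]) for j in range(n_j)]
        a.append([scale[j] * A[i][j] / gamma[i] for j in range(n_j)])
        b.append([scale[j] * gamma[i] * B[i][j] for j in range(n_j)])
    return a, b
```

Because the fitted scales absorb only multiplicative assay and sample effects, a + b comes out close to 1 only to the extent that the data follow the assumed model; the remaining deviations are what an error function such as chi-square would then measure.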
  • The genotype is then determined for the sample based upon its relationship to the representation of the allelic signals in this normalized allelic space and in accordance with the model (306). This determination is described in further detail below.
  • Referring to FIG. 4A, a schematic representation illustrates the expected genotype values and the basis for some of the constraints used by some implementations of the present invention.
  • The three possible genotypes are mapped in a unit space to expected genotype values at Cartesian coordinates (1, 0), (1/2, 1/2) and (0, 1).
  • These coordinates correspond to the homozygous AA, heterozygous AB and homozygous BB genotypes, respectively, with the expectation that the two values of each coordinate pair sum to unity.
  • FIG. 4B illustrates the more random-appearing distribution of approximately eighty thousand pairs of allelic signals before normalization and processing performed in accordance with some aspects of the present invention. Namely, the scatter plot of allele signal intensity points in FIG. 4B does not form three distinct representations of the expected genotypes from FIG. 4A but instead provides ambiguous information about the relative presence of one allele over another.
  • FIG. 4C illustrates, after normalization, a distinct grouping of the three expected genotypes in the unit space illustrated in FIG. 4A.
  • The points in FIG. 4C fall at the expected genotype values set forth in FIG. 4A, as encompassed by the model and parameters used to normalize the values from FIG. 4B according to Equations 1 through 6. Normalization not only removes the ambiguity of the values presented in the conventional scatter plot but also preserves data that conventional clustering and other analytical genotyping tools would otherwise treat as outlying or noisy.
  • FIG. 4D provides yet another distinct grouping of the three expected genotypes in accordance with other aspects of the present invention.
  • The contours in FIG. 4D are frequency determinations from a training set that are subsequently used to fit prior probability distributions.
  • Fitting the genotypes and parameters to elliptical Gaussian distributions provides even greater differentiation between the genotypes.
  • FIG. 5A presents the operations of using a Cartesian coordinate system in the normalized allelic space to call the genotype for a given sample. This is sometimes referred to as a proximity genotyping approach.
  • Implementations of the present invention measure a Cartesian coordinate distance from an allelic signal response represented in the normalized allelic space to each of the one or more expected genotype coordinates (502).
  • The expected genotype coordinates for homozygous AA, heterozygous AB and homozygous BB correspond to coordinates (1, 0), (1/2, 1/2) and (0, 1) in the unit-based normalized allelic space.
  • Various implementations of the present invention then call, for the sample, the genotype whose expected coordinates lie at the shortest Cartesian distance from the sample as represented in normalized allelic space (504).
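A minimal Python sketch of the proximity call in steps 502 and 504, assuming the sample has already been normalized to coordinates (a, b). The names `EXPECTED` and `call_by_proximity` are hypothetical. "Cartesian distance" in this two-dimensional space is taken to be the ordinary Euclidean distance, computed with `math.dist` (Python 3.8 or later).

```python
import math
from typing import Dict, Tuple

# Expected genotype coordinates in the unit-based normalized allelic space.
EXPECTED: Dict[str, Tuple[float, float]] = {
    "AA": (1.0, 0.0),
    "AB": (0.5, 0.5),
    "BB": (0.0, 1.0),
}


def call_by_proximity(a: float, b: float) -> Tuple[str, float]:
    """Return the genotype whose expected coordinate is nearest to (a, b),
    together with that Euclidean distance."""
    best = min(EXPECTED, key=lambda g: math.dist((a, b), EXPECTED[g]))
    return best, math.dist((a, b), EXPECTED[best])
```

For example, a normalized point at (0.9, 0.12) lies much closer to (1, 0) than to (1/2, 1/2), so it is called AA. The returned distance could also serve as a simple basis for a confidence measure, though that use is an assumption here.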
  • Clustering can also be used as an additional indicator for genotyping but may not be required, given how straightforward it is to call the genotype in the normalized allelic space.
  • FIG. 5B provides an alternative approach to calling genotypes using a fixed prior probability distribution in accordance with implementations of the present invention. Initially, this approach identifies a probability for each experimental genotype according to the prior probability model distributions (506). The normalized allelic values for each experimental genotype are entered into the equations that describe the prior probability distribution, for example a set of elliptical Gaussian equations. Determining the fixed prior probability distribution is described below with reference to FIG. 5C. Next, the genotype for the experimental sample is called based upon the greatest prior probability result (506). In practice, the call is based upon the greatest probability given by the prior probability distribution evaluated at the sample's normalized coordinates; the highest resulting probability indicates the most likely genotype for the experimental sample.
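The probability-based call can be sketched in the same way. This Python example assumes that each genotype's prior is an axis-aligned elliptical Gaussian of the form F·exp(-(a-a0)²/s_a² - (b-b0)²/s_b²), the form this document uses for its prior distributions, and it uses made-up parameter values in place of fitted ones. `PRIORS`, `elliptical_gaussian` and `call_by_prior` are hypothetical names.

```python
import math
from typing import Dict, Tuple

Params = Tuple[float, float, float, float, float]  # (a0, b0, F, s_a, s_b)

# Hypothetical parameters for each genotype; in practice they would come
# from fitting the normalized training data.
PRIORS: Dict[str, Params] = {
    "AA": (1.0, 0.0, 1.0, 0.1, 0.1),
    "AB": (0.5, 0.5, 1.0, 0.1, 0.1),
    "BB": (0.0, 1.0, 1.0, 0.1, 0.1),
}


def elliptical_gaussian(a: float, b: float, a0: float, b0: float,
                        amp: float, s_a: float, s_b: float) -> float:
    """Axis-aligned elliptical Gaussian amp * exp(-(a-a0)^2/s_a^2 - (b-b0)^2/s_b^2)."""
    return amp * math.exp(-((a - a0) ** 2) / s_a ** 2 - ((b - b0) ** 2) / s_b ** 2)


def call_by_prior(a: float, b: float,
                  priors: Dict[str, Params] = PRIORS) -> Tuple[str, Dict[str, float]]:
    """Evaluate every genotype's distribution at the normalized point (a, b)
    and return the genotype with the greatest value, plus all scores."""
    scores = {g: elliptical_gaussian(a, b, *p) for g, p in priors.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Returning every score, not only the winner, leaves room to compare the top two probabilities when deciding how confident a call is.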
  • FIG. 5C is a flowchart diagram of the operations for generating the fixed prior probability distribution used in calling the genotypes.
  • A large set of training data genotypes 510A and 510B is used to establish the prior probability distribution.
  • The frequency of occurrence of these normalized signals in the training set establishes the expected distribution in normalized allelic space for subsequent genotype samples.
  • The systematic variation is compensated for by normalizing and transforming the allelic signal response of the various training genotypes into scalar values in normalized allelic space (512).
  • Various implementations of the present invention fit the normalized set of genotype training data to one or more probability distributions that reflect the set of alleles (610).
  • For example, the distribution can be modeled using three elliptical Gaussian distributions having the general form of Equation 7:
  • Equation 7: $F(a,b) = F\,e^{-(a-a_0)^2/s_a^2 - (b-b_0)^2/s_b^2}$
  • Equation 8 provides the specific prior probability distributions obtained by setting $(a_0, b_0)$ to the expected genotype coordinates for homozygous AA, heterozygous AB and homozygous BB, namely (1, 0), (1/2, 1/2) and (0, 1) in the unit-based normalized allelic space:
  • $\mathrm{Homozygous}_{aa}(a,b) = F_{aa}\,e^{-(a-1)^2/s_a^2 - (b-0)^2/s_b^2}$
  • $\mathrm{Heterozygous}_{ab}(a,b) = F_{ab}\,e^{-(a-1/2)^2/s_a^2 - (b-1/2)^2/s_b^2}$
  • $\mathrm{Homozygous}_{bb}(a,b) = F_{bb}\,e^{-(a-0)^2/s_a^2 - (b-1)^2/s_b^2}$
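The text does not specify how the distribution parameters are estimated from the training data. One possible approach is sketched below in Python, under several stated assumptions: the training points are already normalized and labelled with a genotype; each center stays fixed at its expected coordinate; each width is a maximum-likelihood moment estimate (the Gaussian exp(-(x-x0)²/s²) has variance s²/2, so s² = 2 × mean squared deviation); and each amplitude F is scaled so that the surface integrates to that genotype's training frequency. `fit_priors` and `CENTERS` are hypothetical names, and fitting separate widths per genotype generalizes the shared s_a and s_b in Equation 8.

```python
import math
from typing import Dict, List, Tuple

Params = Tuple[float, float, float, float, float]  # (a0, b0, F, s_a, s_b)

# Expected genotype centers in the unit-based normalized allelic space.
CENTERS: Dict[str, Tuple[float, float]] = {
    "AA": (1.0, 0.0),
    "AB": (0.5, 0.5),
    "BB": (0.0, 1.0),
}


def fit_priors(training: List[Tuple[float, float, str]]) -> Dict[str, Params]:
    """Fit (a0, b0, F, s_a, s_b) per genotype from normalized, labelled points."""
    n_total = len(training)
    params: Dict[str, Params] = {}
    for g, (a0, b0) in CENTERS.items():
        pts = [(a, b) for a, b, label in training if label == g]
        if not pts:
            continue  # genotype absent from the training set
        # exp(-(x - x0)^2 / s^2) has variance s^2 / 2, so the maximum-likelihood
        # width about the fixed center is s^2 = 2 * mean squared deviation.
        s_a = math.sqrt(2.0 * sum((a - a0) ** 2 for a, _ in pts) / len(pts))
        s_b = math.sqrt(2.0 * sum((b - b0) ** 2 for _, b in pts) / len(pts))
        if s_a == 0.0 or s_b == 0.0:
            raise ValueError(f"zero spread for genotype {g}")
        # The surface integrates to pi * s_a * s_b * F, so this F makes it
        # integrate to the genotype's frequency in the training set.
        amp = (len(pts) / n_total) / (math.pi * s_a * s_b)
        params[g] = (a0, b0, amp, s_a, s_b)
    return params
```

Keeping the centers fixed mirrors Equation 8, where $(a_0, b_0)$ are the expected genotype coordinates rather than fitted values; only the spreads and amplitudes are learned from the training set.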
  • FIG. 6 depicts a schematic system 112 and methods for calling genotypes in accordance with implementations of the present invention.
  • System 112 includes a memory 602 for holding executing programs (typically random access memory (RAM) or read-only memory (ROM) such as flash memory), a display interface 604, a spectral detector interface 606, a secondary storage 608, a network communication port 610 and a processor 612, operatively coupled together over an interconnect 614.
  • Display interface 604 allows presentation of information related to genotyping in normalized allelic space and with clustering of signals.
  • Spectral detector interface 606 contains circuitry to control operation of a spectral detector including duplex transmission of data in real-time or in a batch operation.
  • Secondary storage 608 can contain experimental results and programs for long-term storage including allelic signal response data, normalized allelic space transformation data, normalized and raw genotype training data, target samples and other data useful in genotyping in accordance with aspects of the present invention.
  • Network communication port 610 transmits and receives results and data over a network to and from other computer systems and databases.
  • Processor 612 executes the routines and modules contained in memory 602 .
  • Memory 602 includes allelic normalization component 616, fixed prior probability distribution component 618, cluster genotyping component 620, normalized allelic space genotyping component 622 and a run-time system 628 for managing computing resources used in conjunction with one or more of the above components.
  • Allelic normalization component 616 contains routines to transform allelic signals in a signal domain to scalar values in the normalized allelic space described previously. Essentially, this component uses a model and corresponding model parameters to compensate for systematic variation in the one or more allelic signals detected.
  • Fixed prior probability distribution component 618 gathers many data points from various runs, samples and SNP assays to create a probability distribution and basis for genotyping new samples.
  • The prior probability distribution in normalized allelic space can be further modeled using an elliptical Gaussian representation for each of the homozygous genotypes (AA and BB) as well as the heterozygous genotype (AB).
  • Run-time system 628 manages system resources used when processing one or more of the previously mentioned modules.
  • Run-time system 628 can be a general-purpose operating system, an embedded operating system, or a real-time operating system or controller.
  • System 112 can be preprogrammed, in ROM, for example, using field-programmable gate array (FPGA) technology or it can be programmed (and reprogrammed) by loading a program from another source (for example, from a floppy disk, an ordinary disk drive, a CD-ROM, or another computer).
  • Alternatively, system 112 can be implemented using customized application specific integrated circuits (ASICs).
  • Embodiments of the invention can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations thereof.
  • Apparatus of the invention can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method steps of the invention can be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output.
  • The invention can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device.
  • Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language.
  • Suitable processors include, by way of example, both general and special purpose microprocessors.
  • Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory.
  • Generally, a computer will include one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks.
  • Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM disks. Any of the foregoing can be supplemented by, or incorporated in, ASICs.

Abstract

Aspects of the present invention describe an apparatus and method for generating genotype calls for a sample. The genotyping initially models allelic signal response into an allelic model having one or more model parameters for an identified one or more sources of systematic variation. The model and parameters are then used to transform the allelic signals to a normalized allelic space that serves to compensate for the one or more sources of systematic variation. By compensating for the systematic variation in this manner, the genotype for the sample is readily determined based upon its relationship to the representation of the allelic signals in normalized allelic space and in accordance with the allelic model.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application claims priority to U.S. Provisional Application No. 60/622,279, filed Oct. 25, 2004, assigned to the assignee of the present invention, and titled “Self Calibration and Joint Modeling of High Throughput Genotype Data”, which is incorporated herein by reference.
  • INTRODUCTION
  • Genotyping analysis can be hampered by noise and other perturbations affecting signal response. Conventional approaches to genotype calling often do not adequately resolve these factors, which increases the uncertainty and subjectivity of the analysis. In certain instances, conventional genotyping routines may discard data that does not conform to an identifiable signal cluster despite the overall effort to keep this information.
  • Removal of data from a sample set in this manner may actually prevent identification of critical data points and reduce the overall efficiency of the genotyping analysis. Further, including noisy allele signal response may reduce genotype call accuracy as it can skew or throw off certain conventional genotyping approaches.
  • Improved computational processing of genetic samples and their associated signal responses is of growing importance. This is especially true in the context of high throughput platforms such as SNPlex® by Applied Biosystems Corporation that are capable of performing genotyping on smaller reaction volumes and larger multiplexed assays. For such systems, it is desirable to maintain the ability to operate at high speeds while producing accurate results even with data that is conventionally difficult to evaluate.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The skilled artisan will understand that the drawings, described below, are for illustration purposes only. The drawings are not intended to limit the scope of the present teachings in any way.
  • FIG. 1 is a schematic illustration of a system for performing genotyping on genetic samples in accordance with some implementations of the present invention;
  • FIG. 2 is a schematic flowchart of the operations used to call genotypes using various aspects of the present invention as well as those available through clustering;
  • FIG. 3 is a flowchart diagram depicting the operations associated with normalizing the allelic signal response and transforming the signals into a normalized allelic space in accordance with some implementations of the present invention;
  • FIG. 4A is a schematic representation that illustrates the expected genotype values and basis for some of the constraints used by some implementations of the present invention;
  • FIG. 4B illustrates the more random appearing distribution of allelic signals before normalization and processing performed in accordance with some aspects of the present invention;
  • FIG. 4C illustrates the effect of normalization on allelic signal data from FIG. 4B;
  • FIG. 4D illustrates the distribution of a large number of normalized training genotypes used to develop the prior probability distributions for the three expected genotypes.
  • FIG. 5A presents the operations of using a Cartesian coordinate system in the normalized allelic space to call the genotype for a given sample;
  • FIG. 5B provides an alternative approach to calling genotypes using a fixed prior probability distribution in accordance with implementations of the present invention;
  • FIG. 5C is a flowchart diagram of the operations for generating the fixed prior probability distribution used in calling the genotypes; and
  • FIG. 6 depicts a schematic system and methods for calling genotypes in accordance with implementations of the present invention.
  • SUMMARY
  • Aspects of the present invention describe an apparatus and method for generating genotype calls for one or more samples. The genotyping approach transforms and corrects for systematic variation in an allelic signal response with an allelic model represented with one or more model parameters. The parameters in the model are associated with noise and other sources of systematic variation. The model and parameters may then be used to transform the allelic signals into a representation of normalized allelic space that serves to compensate for the one or more sources of the noise or systematic variation. In various implementations, systematic variation may arise from instrument artifacts, chemistry artifacts, process artifacts, operational artifacts, temperature artifacts, humidity artifacts, volume artifacts and assay artifacts. Compensating for these variations in this manner makes it possible to determine the genotype for the sample based upon its relationship to the representation of the allelic signals in normalized allelic space and in accordance with the allelic model.
  • These and other features of the present teachings are set forth herein.
  • DESCRIPTION
  • FIG. 1 is a schematic illustration of a system configured for genotypic analysis of samples in accordance with some implementations of the present invention. System 100 includes biological/genetic samples 102, assays 104, sample processing systems 106, allele signal detection and acquisition system 108 (hereinafter allelic signal detection system 108), allele signal response database 110, normalized allelic space transform and genotype analysis system 112 (hereinafter normalized allelic space transform system 112), genotype database derived from normalized allelic space 114 and output systems and interface for genotyping analysis 116.
  • In general, system 100 may be used in genetic and biological research to classify genetic sequence variations in samples 102 that may include insertions, deletions, restriction fragment length polymorphisms (“RFLPs”), short tandem repeat polymorphisms (“STRPs”) and single nucleotide polymorphisms (“SNPs”). Many of the examples and descriptions contained herein below directly relate to this latter SNP type of genetic sequence variation. Detailed analysis of SNPs is useful in studying the relationship between nucleotide variations and diseases or other conditions. However, SNPs are but one of many different types of nucleotide variations, and it is contemplated that these concepts, as applied to SNPs, can also be applied to many other sources of genetic sequence variation described hereinabove and known to those skilled in the art.
  • Often, a large amount of data may be generated and analyzed when performing a genetic analysis or experiments. For example, this may be accomplished in sample processing systems 106 in part by placing samples in multiple wells and then evaluating with assays 104. In an exemplary application of the present invention, 96 wells of genetic samples may be evaluated with 48 assays 104 sensitive to 48 different SNPs on the genetic samples 102. This results in a total of 4608 different tests to be performed for each tray of samples and a corresponding set of results. Assays 104 can be multiplexed with multiple assays for each sample or singleplexed whereby a single assay can be evaluated individually with each sample.
  • Allelic signal detection system 108 assays the various samples and produces a corresponding allelic signal response. These signals may be diagrammed using a scatter plot or other diagrammatic representation. In one implementation, the assays used with allelic signal detection system 108 utilize a pair of fluorescent probes, each having an associated discrete marker or reporter dye responsive to one of the different alleles to be detected. During readout of the sample, allelic signal detection system 108 records the fluorescent intensities measured for each sample and associates them with that sample to determine its particular allelic composition. Generally, the results of such an assay are used to determine if a selected sample is homozygous for a first allele (e.g., A/A), homozygous for a second allele (e.g., B/B) or heterozygous for a combination of alleles (e.g., A/B).
  • In one aspect, homozygous portions of the sample tend to exhibit an increased degree of fluorescence in one or another marker type, with the amount of observed fluorescence from the opposing marker type significantly diminished or completely absent. Conversely, portions of the sample identified as heterozygous for both alleles (e.g., A/B) typically present a more uniform degree of fluorescence from both markers, indicating a contribution from two different alleles A and B. A commercial implementation of these operations is performed in the Applied Biosystems SNPlex system and TaqMan platforms, which further employ Applied Biosystems' 3730, 3730xl and 3100 Prism Genetic Analyzers, Prism 7700 and 7900HT sequence detection systems and Biotrove OpenArray systems to monitor and record the aforementioned amplified fluorescence signals.
  • In one aspect, the raw data of the resulting allelic signal response is stored in allele signal response database 110. This information may also be represented visually as values in a scatter or cluster plot. Conventional genotyping systems typically represent the allelic signal response data from allele signal response database 110 in such representations to aid in visualization of clusters of signals representing the different genotypes. In some cases, these clusters tend to form three distinct groupings of signal data, interpreted as representing samples that are homozygous for a first allele (e.g., A/A), homozygous for a second allele (e.g., B/B) or heterozygous for a combination of alleles (e.g., A/B).
  • In other situations, data may tend to aggregate in greater or fewer than three different clusters. In such instances, it may become difficult to call the genotype unambiguously. U.S. patent application Ser. No. 10/611,414 by Holden et al. entitled “A System and Method for SNP Genotype Clustering” (hereinafter Holden), incorporated by reference herein, provides at least one approach for clustering analysis and calling genotypes when the distribution of allele signals does not readily and distinctly translate to the different genotypes. In Holden, the clusters are discriminated using a relative angular measure of the allele signal response and the clusters they form.
  • Normalized allelic space transform and genotype analysis system 112 (hereinafter normalized allelic space transform system 112) further improves calling an allele by recognizing and compensating for systematic variation that may be present in system 100 and genetic samples 102. For example, this may be introduced from consistent sources of variation inherent to each sample, variation from run to run (i.e., variation in the run), imbalance between the signal response from pairs of complementary alleles (i.e., allelic imbalance) and many other factors that may arise during genotype analysis.
  • By recognizing and compensating for this variation, genotypes can be called with greater accuracy and in a manner that can be more readily automated for high throughput analytical genotyping systems. In at least one implementation of the present invention, the signal response detected from the alleles is transformed from a relative measure of the allelic signal response into a unit-less scalar measure of the various genotypes (i.e., homozygous A/A, homozygous B/B and heterozygous A/B). Through this transformation, the representation in a normalized allelic space provides a more consistent and more accurate representation of the alleles being measured in a given sample.
  • Processing performed by normalized allelic space transform system 112 also tends to preserve more data and increase the sample sets stored in genotype database derived from normalized allelic space 114. Computer-based analytical operations can be performed with fewer exceptions and with less need to eliminate data that would otherwise appear as outlying data points or noise. Output systems and interface for genotyping analysis 116 can therefore provide more automated, rule-based genotyping rather than requiring visual inspection or subjective viewing of the datasets. For example, a representation of genotypes in the normalized allelic space 114 can be more readily analyzed by computer and requires less visualization and subjective judgment from a user or operator of the analytic equipment used in genotyping.
  • FIG. 2 is a schematic flowchart of the operations used to call genotypes using both aspects of the present invention as well as those available through clustering. The analytical framework for genotyping operations generally works in three stages: genotype preprocessing 200, genotype classification 202 and genotype presentation 204, as illustrated in FIG. 2. It is contemplated that the preprocessing transformations of the allelic signals in accordance with aspects of the present invention provide a superior approach to genotype analysis, though conventional clustering and other approaches as suggested by Holden may also be considered. Consequently, the operations outlined in FIG. 2 include clustering as an optional part of the analytical framework, even though in most cases genotype calling may be performed with sufficient accuracy without these additional operations or steps.
  • In genotype preprocessing 200, allelic signal intensity information is gathered for a combination of experimental samples and corresponding assays as represented by allele A 206A and allele B 206B sample matrices. For example, each of 48 assays are applied to 96 samples each tested for two alleles for a total of 9216 measurements contained in both allele A 206A and allele B 206B matrices of intensity data points. As previously recognized, these intensity values generally include several sources of systematic variation thus causing the cluster or scatter plots to not exhibit the expected genotype distributions of homozygous A/A, B/B or heterozygous A/B alleles.
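As a sketch of the data layout just described, assume the allele A (206A) and allele B (206B) intensities are held as plain assay-by-sample matrices. The variable and function names below are hypothetical. The measurement count then works out as follows:

```python
# Hypothetical layout: row i = SNP assay, column j = sample, one matrix per allele.
N_ASSAYS = 48   # SNP assays in the example
M_SAMPLES = 96  # samples (wells) in the example tray


def empty_intensity_matrix(n_assays, m_samples):
    """Return an n_assays x m_samples matrix of zero intensities (list of lists)."""
    return [[0.0] * m_samples for _ in range(n_assays)]


allele_a = empty_intensity_matrix(N_ASSAYS, M_SAMPLES)  # stands in for matrix 206A
allele_b = empty_intensity_matrix(N_ASSAYS, M_SAMPLES)  # stands in for matrix 206B

tests_per_tray = N_ASSAYS * M_SAMPLES  # one test per assay/sample combination
measurements = sum(len(row) for row in allele_a) + sum(len(row) for row in allele_b)
print(tests_per_tray, measurements)  # 4608 9216
```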
  • In one implementation, it is observed that sources of systematic variation can be attributed to a number of factors including variability in the different samples, variability from one run to another run over time and an imbalance in the signal response from one allele over a complementary allele. As will be described in further detail later herein, implementations of the present invention transform the allelic signals to a normalized allelic space through a model and model parameters that compensate for these identified systematic variations (208). Essentially, this step of the operation not only compensates for these systematic variations but also serves to preserve more data.
  • Once the allelic signals are transformed into the normalized allelic space, implementations of the present invention can call the genotype for a target sample according to its relationships with expected distributions in normalized allelic space (210). Several approaches can be used to compare the measurements associated with the target experimental sample and the other samples in normalized allelic space. For example, a proximity approach identifies a minimum distance between the target experimental sample and an expected genotype using Cartesian coordinates in the normalized allelic space. An alternative implementation performs additional genotype preprocessing 200 to create a fixed prior probability distribution of alleles normalized in a normalized allelic space. In this operation, training data derived from many different assays, samples and runs are combined for analysis using a common model and several parameters. It is presumed that each of these data sets is normalized as previously described by identifying an appropriate set of parameters to compensate for the one or more sources of systematic variation. These normalized values are associated with an error function that further refines the distribution presented in the normalized allelic space. Instead of a Cartesian measurement to an expected genotype coordinate, this alternate implementation determines a maximum probability genotype for the target sample as it relates to the error function and neighboring data samples used as training data. Further details on this operation are provided later herein.
  • In either of the above implementations, or in others similarly contemplated, genotype classification (212) can also employ conventional clustering. In particular, the genotype of the target sample may be called according to a clustering of the allelic signal response as described in Holden, for example by measuring a relative angular offset of the values of the allelic signals identified for a set of samples and assays. Genotype calls for the target sample made in normalized allelic space (210) or based upon clustering of allelic data signals are then presented along with a confidence factor for each call and classification (214).
  • Referring now to FIG. 3, a flowchart diagram depicts the operations associated with normalizing the allelic signal response and transforming the signals into a normalized allelic space in accordance with some implementations of the present invention. Implementations of the present invention initially model the allelic signal response with one or more model parameters for each source of systematic variation (302). A basic assumption or constraint for the model is that the normalized signal of any pair of alleles whether homozygous A/A, B/B or heterozygous A/B should sum to unity. Schematically, parameters are incorporated into the model that provide sufficient flexibility to account for a multiplicative variability as contributed by each run, sample and each allele within a SNP assay. Accordingly, one implementation of this model appears below in Equation 1.
    $$\gamma \alpha_i \beta_j \left( \frac{A_{ij}}{\delta_i} + \delta_i B_{ij} \right) - 1 = 0$$
  • Where:
      • Aij and Bij are the measured allele signals for SNP assay i and sample j
      • γ is a parameter to compensate for variability in the overall run
      • αi is a parameter to compensate for variability in the SNP assay i
      • βj is a parameter to compensate for variability in the sample j
      • δi is a parameter to compensate for allele imbalance in SNP assay i
  • Equation 1
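The following is a minimal sketch of Equation 1. It assumes the measured signals are stored as nested lists indexed [i][j] (SNP assay i, sample j), and the function names are hypothetical. Under the unity-sum constraint, every residual should be close to zero once suitable parameters have been found.

```python
def normalized_sum(A, B, gamma, alpha, beta, delta):
    """gamma * alpha_i * beta_j * (A_ij / delta_i + delta_i * B_ij) for every (i, j)."""
    return [
        [gamma * alpha[i] * beta[j] * (A[i][j] / delta[i] + delta[i] * B[i][j])
         for j in range(len(beta))]
        for i in range(len(alpha))
    ]


def model_residuals(A, B, gamma, alpha, beta, delta):
    """Equation 1 residuals: the normalized sum minus its expected value of 1."""
    return [[s - 1.0 for s in row]
            for row in normalized_sum(A, B, gamma, alpha, beta, delta)]


# One SNP assay, two samples: sample 0 reads as A/A, sample 1 as A/B.
A = [[1.0, 1.0]]
B = [[0.0, 1.0]]
print(model_residuals(A, B, gamma=0.5, alpha=[2.0], beta=[1.0, 0.5], delta=[1.0]))
# [[0.0, 0.0]]
```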
  • Given the model in Equation 1 or another useful model, aspects of the present invention identify model parameters that reduce the overall variation between the allelic signal response and its representation in the normalized allelic space (304). Essentially, this operation identifies values for the parameters α, β, γ and δ that represent the systematic variation in the measured allelic signal response. One example technique for identifying these parameters minimizes an appropriately selected error function. In Equation 2 below, a Chi-square error function is selected to identify the parameters that reduce this overall variation; however, it is contemplated that many other error functions could be selected and solved to identify these parameter values.
    $$\chi^2 = \sum_{i=1}^{N} \sum_{j=1}^{M} \left[ \frac{\gamma \alpha_i \beta_j \left( A_{ij}/\delta_i + \delta_i B_{ij} \right) - 1}{\sigma_{ij}} \right]^2$$
    Where:
      • χ² is the Chi-square statistic
      • σij is an estimate of the error associated with measuring the allelic signal response for SNP assay i and sample j
      • N is the number of SNP assays
      • M is the number of samples
  • Equation 2
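Equation 2 can be sketched in the same way. This assumes a hypothetical matrix `sigma` of per-measurement error estimates σij with the same [i][j] layout. The example also shows that the same deviation contributes less to χ² when its estimated error is larger.

```python
def chi_square(A, B, sigma, gamma, alpha, beta, delta):
    """Equation 2: sum over assays i and samples j of the squared, error-weighted residual."""
    total = 0.0
    for i in range(len(alpha)):
        for j in range(len(beta)):
            model = gamma * alpha[i] * beta[j] * (A[i][j] / delta[i] + delta[i] * B[i][j])
            total += ((model - 1.0) / sigma[i][j]) ** 2
    return total


A = [[1.0, 1.0]]
B = [[0.0, 1.0]]
params = dict(alpha=[2.0], beta=[1.0, 0.5], delta=[1.0])

print(chi_square(A, B, [[0.5, 0.5]], gamma=0.5, **params))  # 0.0 (perfect fit)
# gamma 20% too high: each residual is 0.2, giving about 2 * (0.2 / 0.5)**2 = 0.32
print(chi_square(A, B, [[0.5, 0.5]], gamma=0.6, **params))
# the same residuals with a larger sigma are down-weighted to about 0.08
print(chi_square(A, B, [[1.0, 1.0]], gamma=0.6, **params))
```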
  • The Chi-square statistic in Equation 2 is often used to process data of uneven quality. It also behaves appropriately as an error function: its values are non-negative, and its minimum usually indicates the optimal parameters and describes goodness-of-fit. In this application, the statistic compares the modeled sum of the normalized allele signals for each SNP assay and sample against its expected value of unity and weights each deviation by the estimated measurement error σij. Because each squared deviation is divided by σij², data points with large error are discounted, while data with smaller error, which should be emphasized, are weighted more heavily.
  • Minimizing the Chi-square statistic of Equation 2 can be accomplished by setting each of its first partial derivatives to zero, which is the condition for an extremum of χ². Requiring an extremum is typically sufficient because error functions of this kind generally do not have maxima. Accordingly, the set of first partial derivatives in Equation 3 below can be used to solve for the parameters of Equation 1 and Equation 2 without additional checks on the sign of the second derivatives. However, because the equations are non-linear, some care must be taken to ensure that the minimum identified is the global minimum and not merely a local minimum.
    $$\frac{\partial \chi^2}{\partial \gamma} = 0, \quad \frac{\partial \chi^2}{\partial \alpha_i} = 0, \quad \frac{\partial \chi^2}{\partial \delta_i} = 0, \quad \frac{\partial \chi^2}{\partial \beta_j} = 0$$
  • Where:
      • ∂χ²/∂γ is the partial derivative of Chi-square with respect to γ, the overall run compensation factor
      • ∂χ²/∂αi is the partial derivative of Chi-square with respect to αi, the SNP assay compensation factor
      • ∂χ²/∂δi is the partial derivative of Chi-square with respect to δi, the allele balance compensation factor
      • ∂χ²/∂βj is the partial derivative of Chi-square with respect to βj, the sample compensation factor
  • Equation 3
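As a worked instance of Equation 3, consider only the γ condition, with α, β and δ held fixed. Write Sij = αiβj(Aij/δi + δiBij) and wij = 1/σij². Equation 2 then becomes Σ wij(γSij − 1)². Setting its derivative, 2Σ wijSij(γSij − 1), to zero gives γ = Σ wijSij / Σ wijSij². Because χ² is a convex quadratic in γ, this stationary point is a minimum along γ.

The sketch below uses hypothetical names. It computes that value and checks numerically that the slope of χ² vanishes there. It illustrates only one of the conditions and is not the full joint solution of Equation 3.

```python
def scaled_sums(A, B, alpha, beta, delta):
    """S_ij = alpha_i * beta_j * (A_ij / delta_i + delta_i * B_ij): Equation 1 without gamma."""
    return [[alpha[i] * beta[j] * (A[i][j] / delta[i] + delta[i] * B[i][j])
             for j in range(len(beta))] for i in range(len(alpha))]


def chi_square_in_gamma(gamma, S, sigma):
    """Equation 2 viewed as a function of gamma alone."""
    return sum(((gamma * s - 1.0) / sigma[i][j]) ** 2
               for i, row in enumerate(S) for j, s in enumerate(row))


def optimal_gamma(S, sigma):
    """Root of d(chi^2)/d(gamma) = 2 * sum(w * S * (gamma * S - 1)), with w = 1 / sigma^2."""
    num = den = 0.0
    for i, row in enumerate(S):
        for j, s in enumerate(row):
            w = 1.0 / sigma[i][j] ** 2
            num += w * s
            den += w * s * s
    return num / den


A = [[1.0, 0.9], [1.1, 0.0]]
B = [[0.0, 1.0], [0.1, 1.2]]
sigma = [[0.1, 0.2], [0.1, 0.1]]
S = scaled_sums(A, B, [1.0, 1.0], [1.0, 1.0], [1.0, 1.0])
g = optimal_gamma(S, sigma)
h = 1e-5  # central-difference step
slope = (chi_square_in_gamma(g + h, S, sigma) - chi_square_in_gamma(g - h, S, sigma)) / (2 * h)
print(abs(slope) < 1e-6)  # True
```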
  • In some cases, the set of equations in Equation 3 may be degenerate and require additional constraints. In its degenerate form, Equation 3 would not be able to specify the α and β parameters uniquely. For example, it can be seen that the model is insensitive to multiplying the α's by a factor and the β's by its reciprocal. Accordingly, the added constraints to prevent the degenerate condition are described below with respect to Equation 4.
    $$\prod_{i=1}^{N} \alpha_i = 1 \quad (1) \qquad \prod_{j=1}^{M} \beta_j = 1 \quad (2)$$
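  • Where:
      • ∏ αi (over i = 1 to N) is the repeated product of the correction factors for the SNP assays; constraining it to 1 gives these factors a geometric mean of 1.
      • ∏ βj (over j = 1 to M) is the repeated product of the correction factors for the samples; constraining it to 1 gives these factors a geometric mean of 1.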
  • Equation 4
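The degeneracy, and the way the constraints of Equation 4 remove it, can be sketched as follows. The helper names are hypothetical. The approach divides the αi and βj by their geometric means and folds both factors into γ. This leaves every product γαiβj, and therefore the model, unchanged.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive values, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def enforce_product_constraints(gamma, alpha, beta):
    """Rescale so that prod(alpha) == prod(beta) == 1 without changing gamma*alpha_i*beta_j."""
    ga = geometric_mean(alpha)
    gb = geometric_mean(beta)
    return gamma * ga * gb, [a / ga for a in alpha], [b / gb for b in beta]


alpha, beta = [2.0, 8.0], [1.0, 4.0]
c = 3.0  # any positive factor: alpha*c with beta/c describes exactly the same model
g1, a1, b1 = enforce_product_constraints(1.0, alpha, beta)
g2, a2, b2 = enforce_product_constraints(1.0, [x * c for x in alpha], [y / c for y in beta])
print(g1, a1, b1)  # approximately 8.0 [0.5, 2.0] [0.5, 2.0]
print(g2, a2, b2)  # the same values, up to rounding
```

Both parameterizations collapse to the same constrained solution, which is exactly the uniqueness that Equation 4 is meant to restore.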
  • To expedite finding a solution, the non-linear model and constraints provided in Equations 1-4 are recast into separate steps that can be solved quickly with linear algebra. This approach also removes the concern of converging to a local minimum rather than the global minimum of the function, and it yields a satisfactory approximate solution to Equation 1 without a computationally expensive search.
  • First, the allele imbalance in Equation 1 is temporarily disregarded by removing δi, which gives relationship (1) in Equation 5 below. Substituting Vij as defined in relationship (2) and taking the logarithm of both sides gives relationship (3), which is equivalent to the linear equation (4) that can be solved by methods known to those skilled in the art. As a last step, after (4) has been solved for α, β and γ, a separate solution is derived for the allele-balancing factor δ.
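    $$\gamma \alpha_i \beta_j \left( A_{ij} + B_{ij} \right) = 1 \quad (1)$$
    $$V_{ij} = A_{ij} + B_{ij} \quad (2)$$
    $$\log\left( \gamma \alpha_i \beta_j V_{ij} \right) = 0 \quad (3)$$
    $$g + a_i + b_j + u_{ij} = 0 \quad (4)$$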
  • Where:
      • g=log γ
      • ai=log αi
      • bj=log βj
      • uij=log Vij
  • Equation 5
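The following is a minimal sketch of solving relationship (4) by least squares. It assumes a complete matrix (no missing assay/sample combinations), positive Vij, and equal weighting of all measurements. Under those assumptions, with the constraints of Equation 4 written in log form as Σai = 0 and Σbj = 0, the least-squares solution reduces to the row, column and grand means of uij (the familiar two-way additive decomposition).

This is one possible linear-algebra route, not necessarily the one used in the original implementation. The allele-balancing factor δ is left to the separate step described above.

```python
import math


def solve_log_linear(A, B):
    """Fit g + a_i + b_j + u_ij = 0 with u_ij = log(A_ij + B_ij), sum(a) = sum(b) = 0.

    Returns (gamma, alpha, beta) = (exp(g), exp(a_i), exp(b_j)); delta is ignored here.
    """
    n, m = len(A), len(A[0])
    u = [[math.log(A[i][j] + B[i][j]) for j in range(m)] for i in range(n)]
    grand = sum(sum(row) for row in u) / (n * m)
    row_means = [sum(row) / m for row in u]
    col_means = [sum(u[i][j] for i in range(n)) / n for j in range(m)]
    g = -grand                             # log(gamma)
    a = [-(r - grand) for r in row_means]  # log(alpha_i), sums to zero
    b = [-(c - grand) for c in col_means]  # log(beta_j), sums to zero
    return math.exp(g), [math.exp(x) for x in a], [math.exp(x) for x in b]


# Synthetic check: build V_ij = 1 / (gamma * alpha_i * beta_j) from known parameters
# (products of alpha and of beta equal 1), split V into A and B, then recover them.
true_gamma, true_alpha, true_beta = 0.5, [2.0, 0.5], [4.0, 0.25]
V = [[1.0 / (true_gamma * a * b) for b in true_beta] for a in true_alpha]
A = [[0.7 * v for v in row] for row in V]
B = [[0.3 * v for v in row] for row in V]
print(solve_log_linear(A, B))  # approximately (0.5, [2.0, 0.5], [4.0, 0.25])
```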
  • Once determined, the parameters are applied to the measured allele signals to transform each allele pair from the originally measured allelic signal space into the normalized allelic space. Because the resulting normalized allelic space is largely free of systematic instrument and chemistry artifacts, it can be used universally across runs, samples and alleles within SNP assays. The measured allele signals Aij and Bij are converted to the corresponding normalized values aij and bij by the relationships in Equation 6.
    $$a_{ij} = \gamma \alpha_i \beta_j A_{ij} / \delta_i$$
    $$b_{ij} = \gamma \alpha_i \beta_j \delta_i B_{ij}$$
  • Where:
      • aij is the allele signal response for allele A (measured as Aij) mapped into the normalized allelic space
      • bij is the allele signal response for allele B (measured as Bij) mapped into the normalized allelic space
  • Equation 6
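Equation 6 applied to a single measured pair can be sketched as below. The function name and the numbers are hypothetical. In this made-up assay, allele B reads four times weaker than allele A. A value of δ = 2 rebalances the pair, so that a heterozygous sample lands at (½, ½) and the two normalized values sum to one, as Equation 1 requires.

```python
def to_normalized_space(A_ij, B_ij, gamma, alpha_i, beta_j, delta_i):
    """Equation 6: map one measured allele pair (A_ij, B_ij) to (a_ij, b_ij)."""
    scale = gamma * alpha_i * beta_j  # run, assay and sample corrections
    return scale * A_ij / delta_i, scale * delta_i * B_ij


a, b = to_normalized_space(2.0, 0.5, gamma=0.5, alpha_i=2.0, beta_j=0.5, delta_i=2.0)
print(a, b, a + b)  # 0.5 0.5 1.0
```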
  • Equation 6 compensates for the systematic variation in the raw allele signals Aij and Bij, generating normalized allele signals aij and bij in the normalized allelic space. The genotype for the sample is then determined based upon its relationship to the representation of the allelic signals in this normalized allelic space and in accordance with the model (306). This determination is described in further detail later herein.
  • In FIG. 4A, a schematic representation illustrates the expected genotype values and the basis for some of the constraints used by some implementations of the present invention. In particular, the three possible genotypes are mapped in a unit space at expected genotype values along Cartesian coordinates (1, 0), (½, ½) and (0, 1), corresponding to homozygous AA, heterozygous AB and homozygous BB, respectively. For each genotype, the two coordinates are expected to sum to unity.
  • As a point of comparison, FIG. 4B illustrates the more random-appearing distribution of approximately eighty thousand pairs of allelic signals before normalization and processing performed in accordance with some aspects of the present invention. Namely, the scatter plot of allele signal intensity points in FIG. 4B does not form three distinct representations of the expected genotypes from FIG. 4A but instead provides ambiguous information about the relative presence of one allele over another.
  • In contrast, FIG. 4C illustrates a distinct grouping of the three expected genotypes in the unit space of FIG. 4A. The points in FIG. 4C fall at the expected genotype values set forth in FIG. 4A once the values from FIG. 4B are normalized with the model and parameters of Equation 1 through Equation 6. Normalization not only removes the ambiguity of the values presented in the conventional scatter plot but also preserves data that conventional clustering and other analytical genotyping tools would otherwise treat as outlying or noisy.
  • FIG. 4D provides yet another distinct grouping of the three expected genotypes in accordance with other aspects of the present invention. These contours are frequency determinations of a training set that are subsequently used to fit prior probability distributions. In particular, the genotypes and parameters fitted to an elliptical Gaussian distribution provide even more differentiation between the alleles.
  • Given the distribution illustrated in FIG. 4C, a genotype can be called directly, without the need for clustering. FIG. 5A presents the operations of using a Cartesian coordinate system in the normalized allelic space to call the genotype for a given sample. This is sometimes referred to as a proximity genotyping approach. Using the normalized allelic space representation, implementations of the present invention measure the Cartesian distance from an allelic signal response represented in the normalized allelic space to each of the one or more expected genotype coordinates (502). For example, the expected genotype coordinates for homozygous AA, heterozygous AB and homozygous BB correspond to (1, 0), (½, ½) and (0, 1) in the unit-based normalized allelic space.
  • Next, various implementations of the present invention call the genotype whose expected coordinate lies at the shortest Cartesian distance from the sample represented in normalized allelic space (504). As previously described, clustering can also be used as an additional indicator for genotyping, but it may not be required, given how straightforward it is to call the genotype in the normalized allelic space.
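The proximity steps (502) and (504) can be sketched as a nearest-coordinate lookup. The dictionary and function names are hypothetical, and the input points are assumed to be already normalized by Equation 6.

```python
import math

# Expected genotype coordinates in the unit-based normalized allelic space (FIG. 4A).
EXPECTED = {"AA": (1.0, 0.0), "AB": (0.5, 0.5), "BB": (0.0, 1.0)}


def call_by_proximity(a, b, expected=EXPECTED):
    """Return (genotype, distance) for the expected coordinate nearest to (a, b)."""
    distances = {g: math.hypot(a - x, b - y) for g, (x, y) in expected.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]


for point in [(0.92, 0.06), (0.47, 0.55), (0.08, 1.03)]:
    print(point, call_by_proximity(*point)[0])
# (0.92, 0.06) AA
# (0.47, 0.55) AB
# (0.08, 1.03) BB
```

The returned distance could also serve as a raw input to a confidence measure, although how the confidence factor (214) is computed is not specified here.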
  • Likewise, the prior probability distribution illustrated in FIG. 4D also allows a genotype to be called directly, without the need for clustering. FIG. 5B provides an alternative approach to calling genotypes using a fixed prior probability distribution in accordance with implementations of the present invention. Initially, this approach identifies the probability of each experimental genotype according to the prior probability model distributions (506). The normalized allelic values for the experimental sample are entered into the equations that describe the prior probability distribution, for example a set of elliptical Gaussian equations. Details on determining the fixed prior probability distribution are described herein below with respect to FIG. 5C. Next, the genotype for the experimental sample is called based upon the greatest prior probability result (506). In practice, the genotype whose probability distribution, applied to the sample normalized in normalized allelic space, yields the highest probability is the most likely genotype for the experimental sample.
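The maximum-probability call can be sketched as below. Several assumptions apply. The parameter values are placeholders rather than trained values. The distributions take the axis-aligned elliptical Gaussian form given later in Equation 7. The names are hypothetical.

```python
import math

# Hypothetical prior parameters (F_o, a_o, b_o, s_a, s_b) for each genotype; in practice
# they would be fitted to normalized training data as described for FIG. 5C.
PRIORS = {
    "AA": (1.0, 1.0, 0.0, 0.15, 0.15),
    "AB": (1.0, 0.5, 0.5, 0.15, 0.15),
    "BB": (1.0, 0.0, 1.0, 0.15, 0.15),
}


def elliptical_gaussian(a, b, F_o, a_o, b_o, s_a, s_b):
    """Axis-aligned elliptical Gaussian evaluated at the normalized point (a, b)."""
    return F_o * math.exp(-((a - a_o) ** 2) / s_a ** 2 - ((b - b_o) ** 2) / s_b ** 2)


def call_by_max_probability(a, b, priors=PRIORS):
    """Evaluate every genotype's distribution at (a, b) and return the most probable."""
    scores = {g: elliptical_gaussian(a, b, *p) for g, p in priors.items()}
    return max(scores, key=scores.get), scores


genotype, scores = call_by_max_probability(0.58, 0.40)
print(genotype)  # AB
```

With equal amplitudes and equal widths, as in these placeholder values, the call coincides with the proximity approach. The two approaches diverge once the fitted widths or amplitudes differ between genotypes.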
  • FIG. 5C is a flowchart diagram of the operations for generating the fixed prior probability distribution used in calling the genotypes. Initially, a large set of training data genotypes 510A and 510B is used to establish the prior probability distribution. Essentially, the frequency of occurrence of these normalized signals in the training set establishes the expected distribution in normalized allelic space for subsequent genotype samples. As before, systematic variation is compensated for by normalizing and transforming the allelic signal responses of the various training genotypes into scalar values in normalized allelic space (512). Various implementations of the present invention then fit the normalized set of genotype training data to one or more probability distributions that reflect the set of alleles (610). In some implementations, the distribution can be modeled using three elliptical Gaussian distributions, each having the general form set forth in Equation 7 below.
    $$F(a,b) = F_o \exp\left(-\frac{(a-a_o)^2}{s_a^2} - \frac{(b-b_o)^2}{s_b^2}\right) \qquad \text{(Equation 7)}$$
  • Where:
      • $a$ is the normalized signal for allele A in the allelic space
      • $b$ is the normalized signal for allele B in the allelic space
      • $F_o$ is a model parameter (the scale factor of the distribution)
      • $s_b^2$ is a measure of the dimension of the ellipse along the axis of allele $b$ in allelic space
      • $s_a^2$ is a measure of the dimension of the ellipse along the axis of allele $a$ in allelic space
      • $a_o$ is an offset for the center of the ellipse along the axis of allele $a$ in allelic space
      • $b_o$ is an offset for the center of the ellipse along the axis of allele $b$ in allelic space
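The text does not specify how the parameters of Equation 7 are fitted to the training data. The sketch below assumes a simple moment-matching fit, one genotype class at a time; this is a hypothetical choice, not necessarily the authors' method. Because Equation 7 writes the exponent as $-(a-a_o)^2/s_a^2$ rather than $-(a-a_o)^2/2\sigma^2$, the fitted widths are $s = \sqrt{2}\,\sigma$, and choosing $F_o = w/(\pi s_a s_b)$ makes each component integrate to its class fraction $w$. The function names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def fit_elliptical_gaussian(points, weight=1.0):
    """Fit Equation 7's parameters to normalized (a, b) points of one genotype.

    Moment matching (an assumption): centers are sample means and
    s^2 = 2 * variance. Needs at least two distinct values per axis,
    otherwise s_a or s_b is zero. Returns (F_o, a_o, b_o, s_a, s_b).
    """
    a_vals = [p[0] for p in points]
    b_vals = [p[1] for p in points]
    a_o, b_o = fmean(a_vals), fmean(b_vals)
    s_a = math.sqrt(2.0 * pvariance(a_vals, mu=a_o))
    s_b = math.sqrt(2.0 * pvariance(b_vals, mu=b_o))
    # The plane integral of exp(-(a-a_o)^2/s_a^2 - (b-b_o)^2/s_b^2) is
    # pi * s_a * s_b, so this F_o makes the component integrate to `weight`.
    f_o = weight / (math.pi * s_a * s_b)
    return f_o, a_o, b_o, s_a, s_b


def fit_prior_model(training):
    """training: dict mapping genotype label -> list of normalized (a, b) points.

    Each class is weighted by its frequency of occurrence in the training set.
    """
    total = sum(len(pts) for pts in training.values())
    return {
        genotype: fit_elliptical_gaussian(pts, weight=len(pts) / total)
        for genotype, pts in training.items()
    }
```

With class weights taken from the training frequencies, the three fitted components together integrate to 1, so their values at a sample point can be compared directly.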
  • Equation 8 provides a more specific set of equations for the prior probability distributions, in which $a_o$ and $b_o$ are set to the expected genotype coordinates $(a, b)$ for homozygous allele AA, heterozygous allele AB and homozygous allele BB, namely (1,0), (½, ½) and (0,1), respectively, in the unit-based normalized allelic space.
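    $$\text{Homozygous}_{aa}(a,b) = F_{aa} \exp\left(-\frac{(a-1)^2}{s_a^2} - \frac{(b-0)^2}{s_b^2}\right)$$
    $$\text{Heterozygous}_{ab}(a,b) = F_{ab} \exp\left(-\frac{(a-\tfrac{1}{2})^2}{s_a^2} - \frac{(b-\tfrac{1}{2})^2}{s_b^2}\right)$$
    $$\text{Homozygous}_{bb}(a,b) = F_{bb} \exp\left(-\frac{(a-0)^2}{s_a^2} - \frac{(b-1)^2}{s_b^2}\right) \qquad \text{(Equation 8)}$$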
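The FIG. 5B calling step can be sketched by evaluating Equation 8 at the sample's normalized coordinates and keeping the genotype with the greatest value. This is a minimal sketch under stated assumptions: the amplitudes $F_{aa}$, $F_{ab}$, $F_{bb}$ and the shared widths $s_a$, $s_b$ are taken as already fitted, the scores are computed in log space only to avoid floating-point underflow, and the returned `share` (the winning value divided by the sum over the three genotypes) is a hypothetical confidence measure not taken from the text. All names are hypothetical.

```python
import math

# Centers fixed at the expected genotype coordinates, as in Equation 8.
CENTERS = {"AA": (1.0, 0.0), "AB": (0.5, 0.5), "BB": (0.0, 1.0)}


def log_prior(a, b, f, center, s_a, s_b):
    """Natural log of F * exp(-(a-a_o)^2/s_a^2 - (b-b_o)^2/s_b^2); needs f > 0."""
    a_o, b_o = center
    return math.log(f) - (a - a_o) ** 2 / s_a**2 - (b - b_o) ** 2 / s_b**2


def call_by_prior(a, b, f_by_genotype, s_a, s_b):
    """Return (genotype, share) for the genotype with the greatest prior value.

    f_by_genotype: dict with the amplitudes for "AA", "AB" and "BB".
    """
    scores = {
        g: log_prior(a, b, f_by_genotype[g], c, s_a, s_b)
        for g, c in CENTERS.items()
    }
    best = max(scores, key=scores.get)
    # Subtract the best log score before exponentiating to avoid underflow.
    total = sum(math.exp(s - scores[best]) for s in scores.values())
    return best, 1.0 / total
```

With equal amplitudes and $s_a = s_b$, this rule reduces to the proximity rule of FIG. 5A; unequal amplitudes shift the decision boundaries. For instance, a sample at (0.75, 0.25) is equidistant from the AA and AB centers, so `call_by_prior(0.75, 0.25, {"AA": 2.0, "AB": 1.0, "BB": 1.0}, 0.15, 0.15)` calls `"AA"`.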
  • FIG. 6 depicts a schematic system 112 and methods for calling genotypes in accordance with implementations of the present invention. System 112 includes a memory 602 to hold executing programs (typically random access memory (RAM), or read-only memory (ROM) such as Flash), a display interface 604, a spectral detector interface 606, a secondary storage 608, a network communication port 610 and a processor 612, operatively coupled together over an interconnect 614.
  • Display interface 604 allows presentation of information related to genotyping in normalized allelic space, including the clustering of signals. Spectral detector interface 606 contains circuitry to control operation of a spectral detector, including duplex transmission of data in real time or in a batch operation. Secondary storage 608 can hold experimental results and programs for long-term storage, including allelic signal response data, normalized allelic space transformation data, normalized and raw genotype training data, target samples and other data useful in genotyping in accordance with aspects of the present invention. Network communication port 610 transmits and receives results and data over a network to and from other computer systems and databases. Processor 612 executes the routines and modules contained in memory 602.
  • In the illustration, memory 602 includes allelic normalization component 616, fixed prior probability distribution component 618, cluster genotyping component 620, normalized allelic space genotyping component 622 and a run-time system 628 for managing computing resources used in conjunction with one or more of the above components.
  • Allelic normalization component 616 contains routines to transform allelic signals in a signal domain to scalar values in the normalized allelic space described previously. Essentially, this component uses a model and corresponding model parameters to compensate for systematic variation in the one or more allelic signals detected.
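One way the allelic normalization could work is sketched below, assuming the linear model of claim 10, read here as $\gamma\alpha_i\beta_j(A_{ij} + \delta_i B_{ij}) = 1$, holds exactly for every measurement. Under that reading the normalized coordinates $a = \gamma\alpha_i\beta_j A_{ij}$ and $b = \gamma\alpha_i\beta_j\delta_i B_{ij}$ always satisfy $a + b = 1$, which is consistent with the expected coordinates (1,0), (½,½) and (0,1) all lying on that line. This shortcut sidesteps the parameter fitting the patent describes (minimizing overall variation across a training population) and only needs the allele-imbalance factor $\delta_i$. The estimator from known heterozygotes is likewise a hypothetical illustration, and all names are hypothetical.

```python
from statistics import median


def normalize(A, B, delta=1.0):
    """Map raw allele signals (A, B) for one SNP assay and sample to (a, b).

    Assumes gamma * alpha_i * beta_j * (A + delta * B) == 1 holds exactly,
    so the combined scale factor equals 1 / (A + delta * B).
    """
    denom = A + delta * B
    if denom <= 0:
        raise ValueError("allele signals must give a positive A + delta*B")
    scale = 1.0 / denom  # stands in for gamma * alpha_i * beta_j
    return scale * A, scale * delta * B


def estimate_delta_from_heterozygotes(pairs):
    """Hypothetical estimator: a heterozygote should land at a == b, which
    means A == delta * B, so take the median of A / B over known AB samples."""
    return median([A / B for A, B in pairs])
```

For example, with $\delta_i = 1$, raw signals (300, 100) map to (0.75, 0.25); if known heterozygotes for the assay show A about twice B, the estimated $\delta_i \approx 2$ moves a (200, 100) sample to (0.5, 0.5).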
  • Fixed prior probability distribution component 618 gathers many data points from various runs, samples and SNP assays to create a probability distribution and basis for genotyping new samples. As previously described, the prior probability distribution in normalized allelic space can be further modeled using a Gaussian elliptical representation for each of the homozygous alleles (both AA and BB) as well as the heterozygous allele.
  • Run-time system 628 manages system resources used when processing one or more of the previously mentioned modules. For example, run-time system 628 can be a general-purpose operating system, an embedded operating system or a real-time operating system or controller.
  • System 112 can be preprogrammed, in ROM, for example, using field-programmable gate array (FPGA) technology or it can be programmed (and reprogrammed) by loading a program from another source (for example, from a floppy disk, an ordinary disk drive, a CD-ROM, or another computer). In addition, system 112 can be implemented using customized application specific integrated circuits (ASICs).
  • Having thus described various implementations and embodiments of the present invention, it should be noted by those skilled in the art that the disclosures are exemplary only and that various other alternatives, adaptations and modifications may be made within the scope of the present invention. For example, aspects of the present invention are described as useful for genotyping single nucleotide polymorphisms (“SNPs”) in SNP assays. However, it is also contemplated that the teachings of the present invention can be applied to facilitate genotyping of insertions, deletions, restriction fragment length polymorphisms (“RFLPs”) and short tandem repeat polymorphisms (“STRPs”), as well as SNPs.
  • Embodiments of the invention can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations thereof. Apparatus of the invention can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method steps of the invention can be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output. The invention can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language. Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. Generally, a computer will include one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM disks. Any of the foregoing can be supplemented by, or incorporated in, ASICs.
  • Thus, the invention is not limited to the specific embodiments described and illustrated above. Instead, the invention is construed according to the claims that follow and the full scope of their equivalents.

Claims (20)

1. A computer implemented method of generating genotype calls for a sample, comprising:
modeling allelic signal response into an allelic model having one or more model parameters for an identified one or more sources of systematic variation;
transforming the allelic signals through the one or more model parameters of the allelic model to a normalized allelic space that serves to compensate for the one or more sources of systematic variation; and
determining a genotype for the sample based upon its relationship to the representation of the allelic signals in normalized allelic space and in accordance with the allelic model.
2. The method of claim 1 wherein determining the genotype for the sample further comprises,
measuring a Cartesian distance from a sample represented in the normalized allelic space to an expected one or more genotype coordinates; and
calling a genotype for the sample that represents the shortest measured Cartesian distance between the sample and one of the expected one or more genotype coordinates.
3. The method of claim 2 wherein the expected one or more genotype coordinates is selected from a set of genotype coordinates including: a homozygous allele A, a heterozygous allele AB and a homozygous allele B having corresponding coordinates (1, 0), (½, ½) and (0, 1) respectively.
4. The method of claim 1 wherein determining the genotype for the sample further comprises,
creating a fixed prior probability distribution of normalized allele signals from a statistically significant training set of genotype data by fitting a normalized training set of genotype data to probability distributions that reflect a set of allowed genotypes in the normalized allelic space;
transforming ordinary allelic signals through the one or more model parameters of an allelic model to a normalized allelic space that serves to compensate for the one or more sources of systematic variation; and
calling the genotype for the sample based upon a greatest prior probability as reflected by the fitted probability distribution and applied to the sample normalized in the normalized allelic space.
5. The method of claim 4 wherein the probability distributions are based upon elliptical Gaussian probability distributions.
6. The method of claim 4 wherein the elliptical Gaussian probability distribution for the set of alleles normalized in the normalized allelic space is represented by the following equations:

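$$\text{Homozygous}_{aa}(a,b) = F_{aa} \exp\left(-\frac{(a-1)^2}{s_a^2} - \frac{(b-0)^2}{s_b^2}\right)$$
$$\text{Heterozygous}_{ab}(a,b) = F_{ab} \exp\left(-\frac{(a-\tfrac{1}{2})^2}{s_a^2} - \frac{(b-\tfrac{1}{2})^2}{s_b^2}\right)$$
$$\text{Homozygous}_{bb}(a,b) = F_{bb} \exp\left(-\frac{(a-0)^2}{s_a^2} - \frac{(b-1)^2}{s_b^2}\right)$$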
7. The method of claim 1 wherein the systematic variation in allelic signals is selected from a set of systematic factors including: a sample variability, an experimental run variability and an allelic imbalance.
8. The method of claim 1 wherein a parameter associated with each source of systematic variation is modified to minimize overall variation in the allelic signals and a model of the alleles in a training population.
9. The method of claim 1 wherein the model of the alleles in the training population are assigned a unit probability of occurrence in the allele space of homozygous and heterozygous alleles.
10. The method of claim 8 wherein the model of the alleles Aij and Bij in the training population are fit to a linear function as follows:

$$\gamma \, \alpha_i \, \beta_j \left(A_{ij} + \delta_i B_{ij}\right) = 1$$
Where:
$A_{ij}$ and $B_{ij}$ are the measured allele signals for SNP assay $i$ and sample $j$
$\gamma$ is a parameter to compensate for variability in the overall run
$\alpha_i$ is a parameter to compensate for variability in the SNP assay $i$
$\beta_j$ is a parameter to compensate for variability in the sample $j$
$\delta_i$ is a parameter to compensate for allele imbalance for a SNP assay $i$ and sample $j$
$\alpha_i$, $\beta_j$ and $\gamma$ are thus the parameters for systematic variation due to polymorphism $i$, sample $j$ and run, respectively.
11. A computer program product for generating genotype calls for a sample, comprising instructions operable to cause a programmable processor to:
model allelic signal response into an allelic model having one or more model parameters for an identified one or more sources of systematic variation;
transform the allelic signals through the one or more model parameters of the allelic model to a normalized allelic space that serves to compensate for the one or more sources of systematic variation; and
determine a genotype for the sample based upon its relationship to the representation of the allelic signals in normalized allelic space and in accordance with the allelic model.
12. The computer program product of claim 11 wherein the instructions to determine the genotype for the sample further comprises instructions when executed that,
measure a Cartesian distance from a sample represented in the normalized allelic space to an expected one or more genotype coordinates; and
call a genotype for the sample that represents the shortest measured Cartesian distance between the sample and one of the expected one or more genotype coordinates.
13. The computer program product of claim 12 wherein the expected one or more genotype coordinates is selected from a set of genotype coordinates including: a homozygous allele A, a heterozygous allele AB and a homozygous allele B having corresponding coordinates (1, 0), (½, ½) and (0, 1) respectively.
14. The computer program product of claim 11 wherein the instructions that determine the genotype for the sample further comprises instructions when executed that,
create a fixed prior probability distribution of normalized allele signals from a statistically significant training set of genotype data by fitting a normalized training set of genotype data to probability distributions that reflect a set of allowed genotypes in the normalized allelic space; and,
transform ordinary allelic signals through the one or more model parameters of an allelic model to a normalized allelic space that serves to compensate for the one or more sources of systematic variation;
call the genotype for the sample based upon a greatest prior probability as reflected by the fitted probability distribution and applied to the sample normalized in the normalized allelic space.
15. The computer program product of claim 14 wherein the probability distributions are based upon elliptical Gaussian probability distributions.
16. The computer program product of claim 15 wherein the elliptical Gaussian probability distribution for the set of alleles normalized in the normalized allelic space is represented by the following equations:

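$$\text{Homozygous}_{aa}(a,b) = F_{aa} \exp\left(-\frac{(a-1)^2}{s_a^2} - \frac{(b-0)^2}{s_b^2}\right)$$
$$\text{Heterozygous}_{ab}(a,b) = F_{ab} \exp\left(-\frac{(a-\tfrac{1}{2})^2}{s_a^2} - \frac{(b-\tfrac{1}{2})^2}{s_b^2}\right)$$
$$\text{Homozygous}_{bb}(a,b) = F_{bb} \exp\left(-\frac{(a-0)^2}{s_a^2} - \frac{(b-1)^2}{s_b^2}\right)$$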
17. An apparatus for generating genotype calls for a sample, comprising:
means for modeling allelic signal response into an allelic model having one or more model parameters for an identified one or more sources of systematic variation;
means for transforming the allelic signals through the one or more model parameters of the allelic model to a normalized allelic space that serves to compensate for the one or more sources of systematic variation; and
means for determining a genotype for the sample according to a clustering of allelic data signals as represented in the normalized allelic space.
18. The apparatus of claim 17 wherein clustering of allelic data signals further comprises,
means for measuring a relative angular offset of values for the allelic data signals in the normalized allelic space; and
means for calling a genotype for the sample based upon its proximity to the different angular offsets of values and their corresponding genotype classifications.
19. The apparatus of claim 18 wherein the expected one or more genotype coordinates is selected from a set of genotype coordinates including: a homozygous allele A, a heterozygous allele AB and a homozygous allele B having corresponding coordinates (1, 0), (½, ½) and (0, 1) respectively.
20. The apparatus of claim 17 wherein determining the genotype for the sample further comprises,
means for presenting the cluster of allelic data signals in a diagram along with a confidence factor for the sample and genotype classification.
US11/259,162 2004-10-25 2005-10-25 Method and system for genotyping samples in a normalized allelic space Abandoned US20060134662A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US11/259,162 US20060134662A1 (en) 2004-10-25 2005-10-25 Method and system for genotyping samples in a normalized allelic space
US12/551,438 US8359166B2 (en) 2004-10-25 2009-08-31 Method and system for genotyping samples in a normalized allelic space
US13/602,676 US20130060479A1 (en) 2004-10-25 2012-09-04 Method and system for genotyping samples in a normalized allelic space
US15/926,535 US20180211000A1 (en) 2004-10-25 2018-03-20 Method and System for Genotyping Samples in a Normalized Allelic Space

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US62227904P 2004-10-25 2004-10-25
US11/259,162 US20060134662A1 (en) 2004-10-25 2005-10-25 Method and system for genotyping samples in a normalized allelic space

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US12/551,438 Continuation US8359166B2 (en) 2004-10-25 2009-08-31 Method and system for genotyping samples in a normalized allelic space

Publications (1)

Publication Number Publication Date
US20060134662A1 true US20060134662A1 (en) 2006-06-22

Family

ID=36596379

Family Applications (4)

Application Number Title Priority Date Filing Date
US11/259,162 Abandoned US20060134662A1 (en) 2004-10-25 2005-10-25 Method and system for genotyping samples in a normalized allelic space
US12/551,438 Active 2026-08-20 US8359166B2 (en) 2004-10-25 2009-08-31 Method and system for genotyping samples in a normalized allelic space
US13/602,676 Abandoned US20130060479A1 (en) 2004-10-25 2012-09-04 Method and system for genotyping samples in a normalized allelic space
US15/926,535 Abandoned US20180211000A1 (en) 2004-10-25 2018-03-20 Method and System for Genotyping Samples in a Normalized Allelic Space

Country Status (1)

Country Link
US (4) US20060134662A1 (en)

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070027636A1 (en) * 2005-07-29 2007-02-01 Matthew Rabinowitz System and method for using genetic, phentoypic and clinical data to make predictions for clinical or lifestyle decisions
US20070178501A1 (en) * 2005-12-06 2007-08-02 Matthew Rabinowitz System and method for integrating and validating genotypic, phenotypic and medical information into a database according to a standardized ontology
WO2008115497A2 (en) * 2007-03-16 2008-09-25 Gene Security Network System and method for cleaning noisy genetic data and determining chromsome copy number
US20080243398A1 (en) * 2005-12-06 2008-10-02 Matthew Rabinowitz System and method for cleaning noisy genetic data and determining chromosome copy number
US20110033862A1 (en) * 2008-02-19 2011-02-10 Gene Security Network, Inc. Methods for cell genotyping
US20110092763A1 (en) * 2008-05-27 2011-04-21 Gene Security Network, Inc. Methods for Embryo Characterization and Comparison
US20110178719A1 (en) * 2008-08-04 2011-07-21 Gene Security Network, Inc. Methods for Allele Calling and Ploidy Calling
US8024128B2 (en) 2004-09-07 2011-09-20 Gene Security Network, Inc. System and method for improving clinical decisions by aggregating, validating and analysing genetic and phenotypic data
US8532930B2 (en) 2005-11-26 2013-09-10 Natera, Inc. Method for determining the number of copies of a chromosome in the genome of a target individual using genetic data from genetically related individuals
US8825412B2 (en) 2010-05-18 2014-09-02 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US9163282B2 (en) 2010-05-18 2015-10-20 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US9228234B2 (en) 2009-09-30 2016-01-05 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US9424392B2 (en) 2005-11-26 2016-08-23 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US9499870B2 (en) 2013-09-27 2016-11-22 Natera, Inc. Cell free DNA diagnostic testing standards
US9677118B2 (en) 2014-04-21 2017-06-13 Natera, Inc. Methods for simultaneous amplification of target loci
US10011870B2 (en) 2016-12-07 2018-07-03 Natera, Inc. Compositions and methods for identifying nucleic acid molecules
US10083273B2 (en) 2005-07-29 2018-09-25 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US10081839B2 (en) 2005-07-29 2018-09-25 Natera, Inc System and method for cleaning noisy genetic data and determining chromosome copy number
US10113196B2 (en) 2010-05-18 2018-10-30 Natera, Inc. Prenatal paternity testing using maternal blood, free floating fetal DNA and SNP genotyping
US10179937B2 (en) 2014-04-21 2019-01-15 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
CN109240531A (en) * 2018-07-16 2019-01-18 青岛海信移动通信技术股份有限公司 Sampling compensation method, device, mobile terminal and the storage medium of touch data
US10262755B2 (en) 2014-04-21 2019-04-16 Natera, Inc. Detecting cancer mutations and aneuploidy in chromosomal segments
US10316362B2 (en) 2010-05-18 2019-06-11 Natera, Inc. Methods for simultaneous amplification of target loci
US10526658B2 (en) 2010-05-18 2020-01-07 Natera, Inc. Methods for simultaneous amplification of target loci
US10577655B2 (en) 2013-09-27 2020-03-03 Natera, Inc. Cell free DNA diagnostic testing standards
US10894976B2 (en) 2017-02-21 2021-01-19 Natera, Inc. Compositions, methods, and kits for isolating nucleic acids
US11111543B2 (en) 2005-07-29 2021-09-07 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US11111544B2 (en) 2005-07-29 2021-09-07 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US11322224B2 (en) 2010-05-18 2022-05-03 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11326208B2 (en) 2010-05-18 2022-05-10 Natera, Inc. Methods for nested PCR amplification of cell-free DNA
US11332785B2 (en) 2010-05-18 2022-05-17 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11332793B2 (en) 2010-05-18 2022-05-17 Natera, Inc. Methods for simultaneous amplification of target loci
US11339429B2 (en) 2010-05-18 2022-05-24 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11408031B2 (en) 2010-05-18 2022-08-09 Natera, Inc. Methods for non-invasive prenatal paternity testing
CN115035957A (en) * 2022-05-31 2022-09-09 陕西师范大学 Improved minimum residue method analysis mixed STR atlas based on particle swarm optimization
US11479812B2 (en) 2015-05-11 2022-10-25 Natera, Inc. Methods and compositions for determining ploidy
US11485996B2 (en) 2016-10-04 2022-11-01 Natera, Inc. Methods for characterizing copy number variation using proximity-litigation sequencing
US11525159B2 (en) 2018-07-03 2022-12-13 Natera, Inc. Methods for detection of donor-derived cell-free DNA
US11939634B2 (en) 2010-05-18 2024-03-26 Natera, Inc. Methods for simultaneous amplification of target loci

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
US20060134662A1 (en) * 2004-10-25 2006-06-22 Pratt Mark R Method and system for genotyping samples in a normalized allelic space
US9600625B2 (en) 2012-04-23 2017-03-21 Bina Technologies, Inc. Systems and methods for processing nucleic acid sequence data

Citations (1)

Publication number Priority date Publication date Assignee Title
US20030186279A1 (en) * 2002-03-28 2003-10-02 Affymetrix, Inc. Large scale genotyping methods

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US6274317B1 (en) * 1998-11-02 2001-08-14 Millennium Pharmaceuticals, Inc. Automated allele caller
AU2003247832A1 (en) * 2002-06-28 2004-01-19 Applera Corporation A system and method for snp genotype clustering
US20060134662A1 (en) * 2004-10-25 2006-06-22 Pratt Mark R Method and system for genotyping samples in a normalized allelic space

Cited By (94)

Publication number Priority date Publication date Assignee Title
US8024128B2 (en) 2004-09-07 2011-09-20 Gene Security Network, Inc. System and method for improving clinical decisions by aggregating, validating and analysing genetic and phenotypic data
US11111544B2 (en) 2005-07-29 2021-09-07 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US10260096B2 (en) 2005-07-29 2019-04-16 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US10266893B2 (en) 2005-07-29 2019-04-23 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US20070027636A1 (en) * 2005-07-29 2007-02-01 Matthew Rabinowitz System and method for using genetic, phentoypic and clinical data to make predictions for clinical or lifestyle decisions
US10227652B2 (en) 2005-07-29 2019-03-12 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US10392664B2 (en) 2005-07-29 2019-08-27 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US11111543B2 (en) 2005-07-29 2021-09-07 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US10081839B2 (en) 2005-07-29 2018-09-25 Natera, Inc System and method for cleaning noisy genetic data and determining chromosome copy number
US10083273B2 (en) 2005-07-29 2018-09-25 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US9424392B2 (en) 2005-11-26 2016-08-23 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US9430611B2 (en) 2005-11-26 2016-08-30 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US8682592B2 (en) 2005-11-26 2014-03-25 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US11306359B2 (en) 2005-11-26 2022-04-19 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US8532930B2 (en) 2005-11-26 2013-09-10 Natera, Inc. Method for determining the number of copies of a chromosome in the genome of a target individual using genetic data from genetically related individuals
US9695477B2 (en) 2005-11-26 2017-07-04 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US10711309B2 (en) 2005-11-26 2020-07-14 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US10597724B2 (en) 2005-11-26 2020-03-24 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US10240202B2 (en) 2005-11-26 2019-03-26 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US8515679B2 (en) 2005-12-06 2013-08-20 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US20080243398A1 (en) * 2005-12-06 2008-10-02 Matthew Rabinowitz System and method for cleaning noisy genetic data and determining chromosome copy number
US20070178501A1 (en) * 2005-12-06 2007-08-02 Matthew Rabinowitz System and method for integrating and validating genotypic, phenotypic and medical information into a database according to a standardized ontology
CN101790731A (en) * 2007-03-16 2010-07-28 吉恩安全网络公司 Be used to remove the system and method that genetic data disturbed and determined the chromosome copies number
WO2008115497A3 (en) * 2007-03-16 2009-05-28 Gene Security Network System and method for cleaning noisy genetic data and determining chromsome copy number
WO2008115497A2 (en) * 2007-03-16 2008-09-25 Gene Security Network System and method for cleaning noisy genetic data and determining chromsome copy number
US20110033862A1 (en) * 2008-02-19 2011-02-10 Gene Security Network, Inc. Methods for cell genotyping
US20110092763A1 (en) * 2008-05-27 2011-04-21 Gene Security Network, Inc. Methods for Embryo Characterization and Comparison
US9639657B2 (en) 2008-08-04 2017-05-02 Natera, Inc. Methods for allele calling and ploidy calling
US20110178719A1 (en) * 2008-08-04 2011-07-21 Gene Security Network, Inc. Methods for Allele Calling and Ploidy Calling
US10216896B2 (en) 2009-09-30 2019-02-26 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US10061889B2 (en) 2009-09-30 2018-08-28 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US9228234B2 (en) 2009-09-30 2016-01-05 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US10522242B2 (en) 2009-09-30 2019-12-31 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US10061890B2 (en) 2009-09-30 2018-08-28 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11339429B2 (en) 2010-05-18 2022-05-24 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US10597723B2 (en) 2010-05-18 2020-03-24 Natera, Inc. Methods for simultaneous amplification of target loci
US11525162B2 (en) 2010-05-18 2022-12-13 Natera, Inc. Methods for simultaneous amplification of target loci
US10017812B2 (en) 2010-05-18 2018-07-10 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11519035B2 (en) 2010-05-18 2022-12-06 Natera, Inc. Methods for simultaneous amplification of target loci
US11939634B2 (en) 2010-05-18 2024-03-26 Natera, Inc. Methods for simultaneous amplification of target loci
US10316362B2 (en) 2010-05-18 2019-06-11 Natera, Inc. Methods for simultaneous amplification of target loci
US11482300B2 (en) 2010-05-18 2022-10-25 Natera, Inc. Methods for preparing a DNA fraction from a biological sample for analyzing genotypes of cell-free DNA
US11312996B2 (en) 2010-05-18 2022-04-26 Natera, Inc. Methods for simultaneous amplification of target loci
US11746376B2 (en) 2010-05-18 2023-09-05 Natera, Inc. Methods for amplification of cell-free DNA using ligated adaptors and universal and inner target-specific primers for multiplexed nested PCR
US10526658B2 (en) 2010-05-18 2020-01-07 Natera, Inc. Methods for simultaneous amplification of target loci
US11408031B2 (en) 2010-05-18 2022-08-09 Natera, Inc. Methods for non-invasive prenatal paternity testing
US10538814B2 (en) 2010-05-18 2020-01-21 Natera, Inc. Methods for simultaneous amplification of target loci
US10557172B2 (en) 2010-05-18 2020-02-11 Natera, Inc. Methods for simultaneous amplification of target loci
US10113196B2 (en) 2010-05-18 2018-10-30 Natera, Inc. Prenatal paternity testing using maternal blood, free floating fetal DNA and SNP genotyping
US11306357B2 (en) 2010-05-18 2022-04-19 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US10590482B2 (en) 2010-05-18 2020-03-17 Natera, Inc. Amplification of cell-free DNA using nested PCR
US11322224B2 (en) 2010-05-18 2022-05-03 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11332793B2 (en) 2010-05-18 2022-05-17 Natera, Inc. Methods for simultaneous amplification of target loci
US9334541B2 (en) 2010-05-18 2016-05-10 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11332785B2 (en) 2010-05-18 2022-05-17 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US10655180B2 (en) 2010-05-18 2020-05-19 Natera, Inc. Methods for simultaneous amplification of target loci
US10174369B2 (en) 2010-05-18 2019-01-08 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US10731220B2 (en) 2010-05-18 2020-08-04 Natera, Inc. Methods for simultaneous amplification of target loci
US10774380B2 (en) 2010-05-18 2020-09-15 Natera, Inc. Methods for multiplex PCR amplification of target loci in a nucleic acid sample
US10793912B2 (en) 2010-05-18 2020-10-06 Natera, Inc. Methods for simultaneous amplification of target loci
US11326208B2 (en) 2010-05-18 2022-05-10 Natera, Inc. Methods for nested PCR amplification of cell-free DNA
US9163282B2 (en) 2010-05-18 2015-10-20 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US8949036B2 (en) 2010-05-18 2015-02-03 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11111545B2 (en) 2010-05-18 2021-09-07 Natera, Inc. Methods for simultaneous amplification of target loci
US11286530B2 (en) 2010-05-18 2022-03-29 Natera, Inc. Methods for simultaneous amplification of target loci
US8825412B2 (en) 2010-05-18 2014-09-02 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US10577655B2 (en) 2013-09-27 2020-03-03 Natera, Inc. Cell free DNA diagnostic testing standards
US9499870B2 (en) 2013-09-27 2016-11-22 Natera, Inc. Cell free DNA diagnostic testing standards
US9677118B2 (en) 2014-04-21 2017-06-13 Natera, Inc. Methods for simultaneous amplification of target loci
US11371100B2 (en) 2014-04-21 2022-06-28 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US11319596B2 (en) 2014-04-21 2022-05-03 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US10179937B2 (en) 2014-04-21 2019-01-15 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US10597709B2 (en) 2014-04-21 2020-03-24 Natera, Inc. Methods for simultaneous amplification of target loci
US10597708B2 (en) 2014-04-21 2020-03-24 Natera, Inc. Methods for simultaneous amplifications of target loci
US11530454B2 (en) 2014-04-21 2022-12-20 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US10351906B2 (en) 2014-04-21 2019-07-16 Natera, Inc. Methods for simultaneous amplification of target loci
US11390916B2 (en) 2014-04-21 2022-07-19 Natera, Inc. Methods for simultaneous amplification of target loci
US11408037B2 (en) 2014-04-21 2022-08-09 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US10262755B2 (en) 2014-04-21 2019-04-16 Natera, Inc. Detecting cancer mutations and aneuploidy in chromosomal segments
US11414709B2 (en) 2014-04-21 2022-08-16 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US11486008B2 (en) 2014-04-21 2022-11-01 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US11319595B2 (en) 2014-04-21 2022-05-03 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US11479812B2 (en) 2015-05-11 2022-10-25 Natera, Inc. Methods and compositions for determining ploidy
US11946101B2 (en) 2015-05-11 2024-04-02 Natera, Inc. Methods and compositions for determining ploidy
US11485996B2 (en) 2016-10-04 2022-11-01 Natera, Inc. Methods for characterizing copy number variation using proximity-litigation sequencing
US11519028B2 (en) 2016-12-07 2022-12-06 Natera, Inc. Compositions and methods for identifying nucleic acid molecules
US10533219B2 (en) 2016-12-07 2020-01-14 Natera, Inc. Compositions and methods for identifying nucleic acid molecules
US10577650B2 (en) 2016-12-07 2020-03-03 Natera, Inc. Compositions and methods for identifying nucleic acid molecules
US11530442B2 (en) 2016-12-07 2022-12-20 Natera, Inc. Compositions and methods for identifying nucleic acid molecules
US10011870B2 (en) 2016-12-07 2018-07-03 Natera, Inc. Compositions and methods for identifying nucleic acid molecules
US10894976B2 (en) 2017-02-21 2021-01-19 Natera, Inc. Compositions, methods, and kits for isolating nucleic acids
US11525159B2 (en) 2018-07-03 2022-12-13 Natera, Inc. Methods for detection of donor-derived cell-free DNA
CN109240531A (en) * 2018-07-16 2019-01-18 青岛海信移动通信技术股份有限公司 Sampling compensation method, device, mobile terminal and the storage medium of touch data
CN115035957A (en) * 2022-05-31 2022-09-09 陕西师范大学 Improved minimum residue method analysis mixed STR atlas based on particle swarm optimization

Also Published As

Publication number Publication date
US20130060479A1 (en) 2013-03-07
US20100161237A1 (en) 2010-06-24
US8359166B2 (en) 2013-01-22
US20180211000A1 (en) 2018-07-26

Similar Documents

Publication Publication Date Title
US20180211000A1 (en) Method and System for Genotyping Samples in a Normalized Allelic Space
US6245517B1 (en) Ratio-based decisions and the quantitative analysis of cDNA micro-array images
Jun et al. Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data
AU2021200154B2 (en) Somatic copy number variation detection
US20040126782A1 (en) System and method for SNP genotype clustering
US20050216208A1 (en) Diagnostic decision support system and method of diagnostic decision support
WO2006014509A2 (en) Quantitative pcr data analysis system (qdas)
WO2006014464A2 (en) Method for quantitative pcr data analysis system (qdas)
US20230086774A1 (en) Method and system for predicting biological age on basis of various omics data analyses
CN114530198A (en) Screening method of SNP (single nucleotide polymorphism) sites for detecting sample pollution level and detection method of sample pollution level
Demidov et al. ClinCNV: novel method for allele-specific somatic copy-number alterations detection
Sauk et al. NIPTmer: rapid k-mer-based software package for detection of fetal aneuploidies
US6876929B2 (en) Process for removing systematic error and outlier data and for estimating random error in chemical and biological assays
US20050050129A1 (en) Method of estimating a penetrance and evaluating a relationship between diplotype configuration and phenotype using genotype data and phenotype data
Vineis et al. Technical variability in laboratory data
EP3207133B1 (en) Cross-platform transformation of gene expression data
US7558411B2 (en) Method and system for managing and querying gene expression data according to quality
Choo-Wosoba et al. hsegHMM: hidden Markov model-based allele-specific copy number alteration analysis accounting for hypersegmentation
CN116705157B (en) Method and device for detecting microsatellite state of plasma sample based on second-generation sequencing
US20230011085A1 (en) Method and system for determining a cnv profile for a tumor using sparse whole genome sequencing
US20160265051A1 (en) Methods for Detection of Fetal Chromosomal Abnormality Using High Throughput Sequencing
Kim et al. GenomomFF: Cost-effective method to measure fetal fraction by adaptive multiple regression techniques with optimally selected autosomal chromosome regions
Tyekucheva et al. Bioinformatic analysis of epidemiological and pathological data
Choi et al. Repurposing kinship coefficients as a sample integrity method for next generation sequencing data in a clinical setting
Choi et al. Validation of Genomic-Based Assay

Legal Events

Date Code Title Description
AS Assignment

Owner name: APPLERA CORPORATION, A DELAWARE CORPORATION, CALIF

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PRATT, MARK R.;HOLDEN, DAVID;REEL/FRAME:017319/0396;SIGNING DATES FROM 20060116 TO 20060120

AS Assignment

Owner name: BANK OF AMERICA, N.A, AS COLLATERAL AGENT, WASHING

Free format text: SECURITY AGREEMENT;ASSIGNOR:APPLIED BIOSYSTEMS, LLC;REEL/FRAME:021976/0001

Effective date: 20081121

AS Assignment

Owner name: APPLIED BIOSYSTEMS INC., CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:APPLERA CORPORATION;REEL/FRAME:023087/0896

Effective date: 20080630

Owner name: APPLIED BIOSYSTEMS INC., CALIFORNIA

Free format text: MERGER;ASSIGNOR:ATOM ACQUISITION CORPORATION;REEL/FRAME:023087/0918

Effective date: 20081121

Owner name: APPLIED BIOSYSTEMS, LLC, CALIFORNIA

Free format text: MERGER;ASSIGNOR:ATOM ACQUISITION, LLC AND APPLIED BIOSYSTEMS INC.;REEL/FRAME:023087/0931

Effective date: 20081121

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: APPLIED BIOSYSTEMS INC.,CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:APPLERA CORPORATION;REEL/FRAME:023994/0538

Effective date: 20080701

Owner name: APPLIED BIOSYSTEMS, LLC,CALIFORNIA

Free format text: MERGER;ASSIGNOR:APPLIED BIOSYSTEMS INC.;REEL/FRAME:023994/0587

Effective date: 20081121

Owner name: APPLIED BIOSYSTEMS, LLC,CALIFORNIA

Free format text: MERGER;ASSIGNOR:APPLIED BIOSYSTEMS INC.;REEL/FRAME:023985/0801

Effective date: 20081121

AS Assignment

Owner name: APPLIED BIOSYSTEMS, INC., CALIFORNIA

Free format text: LIEN RELEASE;ASSIGNOR:BANK OF AMERICA, N.A.;REEL/FRAME:030182/0677

Effective date: 20100528

AS Assignment

Owner name: APPLIED BIOSYSTEMS, LLC, CALIFORNIA

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE RECEIVING PARTY NAME PREVIOUSLY RECORDED AT REEL: 030182 FRAME: 0701. ASSIGNOR(S) HEREBY CONFIRMS THE RELEASE OF SECURITY INTEREST;ASSIGNOR:BANK OF AMERICA, N.A.;REEL/FRAME:038006/0024

Effective date: 20100528

Owner name: APPLIED BIOSYSTEMS, LLC, CALIFORNIA

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE RECEIVING PARTY NAME PREVIOUSLY RECORDED AT REEL: 030182 FRAME: 0677. ASSIGNOR(S) HEREBY CONFIRMS THE RELEASE OF SECURITY INTEREST;ASSIGNOR:BANK OF AMERICA, N.A.;REEL/FRAME:038006/0024

Effective date: 20100528