System and method for cleaning noisy genetic data and determining chromosome copy number

A genetic data and chromosome technology, applied in the field of genetic data acquisition, manipulation and use, that addresses the problems of hard-to-detect M2 trisomy, highly error-prone direct measurement of DNA, and unregulated current PGD techniques.

Pending Publication Date: 2020-08-06
NATERA

AI Technical Summary

Benefits of technology

[0036] In one aspect of the invention, methods make use of knowledge of the genetic data of the mother and the father such as diploid tissue samples, sperm from the father, haploid samples from the mother or other embryos derived from the mother's and father's gametes, together with the knowledge of the mechanism of meiosis and the imperfect measurement of the embryonic DNA, in order to reconstruct, in silico, the embryonic DNA at the location of key loci with a high degree of confidence. In one aspect of the invention, genetic data derived from other related individuals, such as other embryos, brothers and sisters, grandparents or other relatives can also be used to increase the fidelity of the reconstructed embryonic DNA. It is important to note that the parental and other secondary genetic data allows the reconstruction not only of SNPs that were measured poorly, but also of insertions, deletions, and of SNPs or whole regions of DNA that were not measured at all.
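The idea of combining parental genotypes with a noisy embryonic measurement can be illustrated with a toy calculation at a single biallelic SNP. The sketch below is not the method of the application: the function names, the error model, and the default rates `ado=0.3` and `err=0.01` are assumptions chosen only for illustration. It builds a Mendelian prior from the parents and applies Bayes' rule to one observed genotype call.

```python
from itertools import product


def mendelian_prior(mother, father):
    """Prior over the child's genotype from parental genotypes such as 'AB'.

    Each parent passes on one of its two alleles with probability 1/2.
    """
    prior = {}
    for m, f in product(mother, father):
        genotype = "".join(sorted(m + f))
        prior[genotype] = prior.get(genotype, 0.0) + 0.25
    return prior


def likelihood(observed, true, ado=0.3, err=0.01):
    """P(observed call | true genotype) under an assumed toy noise model.

    Homozygous truth: the call is correct with probability 1 - err.
    Heterozygous truth: the call is correct only if no allele drops out
    (probability 1 - ado) and no other error occurs (probability 1 - err).
    The remaining probability is split evenly over the two wrong calls.
    """
    if true[0] == true[1]:
        p_correct = 1.0 - err
    else:
        p_correct = (1.0 - ado) * (1.0 - err)
    return p_correct if observed == true else (1.0 - p_correct) / 2.0


def posterior(observed, mother, father, ado=0.3, err=0.01):
    """Posterior over the child's genotype given one noisy call."""
    prior = mendelian_prior(mother, father)
    unnorm = {g: p * likelihood(observed, g, ado, err) for g, p in prior.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


if __name__ == "__main__":
    # Mother AA, father AB, embryo measured as AA.
    print(posterior("AA", "AA", "AB"))
    # Both parents AA, embryo measured as AB: the call contradicts the parents.
    print(posterior("AB", "AA", "AA"))
```

With a mother of genotype AA and a father of genotype AB, an observed AA call leaves roughly a 13% chance under these assumed rates that the embryo is actually AB with one allele dropped out. When both parents are AA, an observed AB call is overruled entirely, which is the sense in which parental data can "clean" a measurement. A realistic treatment would also model linkage between neighboring loci and crossover locations, which this sketch ignores.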
[0038] In another aspect of the invention, the direct measurements of the amount of genetic material, amplified or unamplified, present at a plurality of loci, can be used to detect monosomy, uniparental disomy, trisomy and other aneuploidy states. The idea behind this method is that measuring the amount of genetic material at multiple loci will give a statistically significant result.
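Why many loci help can be shown with a short calculation. The sketch below is an illustration under stated assumptions, not the procedure of the application: it assumes each locus signal is normalized so that two copies give 1.0, that noise is Gaussian with a known per-locus standard deviation `sigma`, and that loci are independent. The function names are hypothetical. It covers only quantity-based states (monosomy, disomy, trisomy); uniparental disomy does not change the amount of material and is outside this sketch.

```python
import math


def copy_number_log_likelihoods(signals, sigma, candidates=(1, 2, 3)):
    """Gaussian log-likelihood (up to a constant) of each copy number.

    Assumes the expected normalized signal for copy number c is c / 2.
    """
    scores = {}
    for c in candidates:
        mu = c / 2.0
        scores[c] = sum(-0.5 * ((x - mu) / sigma) ** 2 for x in signals)
    return scores


def call_copy_number(signals, sigma):
    """Return the candidate copy number with the highest likelihood."""
    scores = copy_number_log_likelihoods(signals, sigma)
    return max(scores, key=scores.get)


def z_score_vs_disomy(signals, sigma):
    """Z-score of the mean signal against the disomy expectation of 1.0."""
    n = len(signals)
    mean = sum(signals) / n
    return (mean - 1.0) / (sigma / math.sqrt(n))


if __name__ == "__main__":
    sigma = 0.6
    # Hypothetical trisomic chromosome: true mean 1.5, large per-locus noise.
    signals = [2.1, 0.9] * 50
    print(call_copy_number([0.9], sigma))      # a single noisy locus
    print(call_copy_number(signals, sigma))    # all 100 loci together
    print(z_score_vs_disomy(signals, sigma))
```

In this made-up data set a single locus reading of 0.9 is closest to the disomy expectation and would be miscalled, while the 100 loci together favor three copies. The standard error of the mean shrinks as `sigma / sqrt(n)`, which is the statistical basis for measuring at a plurality of loci.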

Problems solved by technology

Current PGD techniques are unregulated, expensive and highly unreliable: error rates for screening disease-linked loci or aneuploidy are on the order of 10%, each screening test costs roughly $5,000, and a couple is forced to choose between testing for aneuploidy, which afflicts roughly 50% of IVF embryos, or screening for disease-linked loci on the single cell.
Since only a single copy of the DNA is available from one cell, direct measurements of the DNA are highly error-prone, or noisy.
M2 trisomy is particularly difficult to detect.
One advantage of FISH is that it is less expensive than karyotyping, but the technique is complex and expensive enough that generally only a small selection of chromosomes is tested (usually chromosomes 13, 18, 21, X, Y; also sometimes 8, 9, 15, 16, 17, 22); in addition, FISH has a low level of specificity.
All genotyping techniques, when used on single cells, small numbers of cells, or fragments of DNA, suffer from integrity issues, most notably allele drop out (ADO).
Unfortunately, standard methods require invasive testing and carry a roughly 1 percent risk of miscarriage.
A major drawback of prenatal diagnosis is that given the limited courses of action once an abnormality has been detected, it is only valuable and ethical to test for very serious defects.
As a result, prenatal diagnosis is typically only attempted in cases of high-risk pregnancies, where the elevated chance of a defect combined with the seriousness of the potential abnormality outweighs the risks.
A key challenge in using NIPGD is the task of identifying and extracting fetal cells or nucleic acids from the mother's blood.
Current techniques are able to isolate small quantities of fetal cells from the mother's blood, although it is very difficult to enrich the fetal cells to purity in any quantity.
Since only tens of molecules of each embryonic SNP are available through these techniques, the genotyping of the fetal tissue with high fidelity is not currently possible.
The major limitations to amplifying material from a single cell are (1) the necessity of using extremely dilute DNA concentrations or an extremely small volume of reaction mixture, and (2) the difficulty of reliably dissociating DNA from proteins across the whole genome.
There are numerous difficulties in using DNA amplification in these contexts.
This is often due to contamination of the DNA, the loss of the cell, its DNA, or accessibility of the DNA during the PCR reaction.
Other sources of error that may arise in measuring the embryonic DNA by amplification and microarray analysis include transcription errors introduced by the DNA polymerase where a particular nucleotide is incorrectly copied during PCR, and microarray reading errors due to imperfect hybridization on the array.
The biggest problem, however, remains allele drop-out (ADO), defined as the failure to amplify one of the two alleles in a heterozygous cell.
ADO can affect more than 40% of amplifications and has already caused PGD misdiagnoses.
ADO becomes a health issue especially in the case of a dominant disease, where the failure to amplify can lead to implantation of an affected embryo.
The need for more than one set of primers for each marker (in heterozygotes) complicates the PCR process.
This process is often accompanied by contamination.
PGD is also costly; consequently, there is a need for less expensive approaches, such as mini-sequencing.
It is apparent that the techniques will be severely error-prone due to the limited amount of genetic material which will exacerbate the impact of effects such as allele drop-outs, imperfect hybridization, and contamination.
Compared with array-based genotyping technologies, TaqMan is quite expensive per reaction (~$0.40 / reaction), and throughput is relatively low (384 genotypes per run).
Also, the Illumina platform takes as long to complete as the 500K Affymetrix arrays (up to 72 hours), which is problematic for IVF genotyping.
Disadvantages of these arrays are their low flexibility and lower sensitivity.
The main advantages of pyrosequencing include an extremely fast turnaround and unambiguous SNP calls; however, the assay is not currently conducive to high-throughput parallel analysis.
However, the technique has not been verified for genomic data from a single cell, or a single strand of DNA, as would be required for pre-implantation genetic diagnosis.
Hybridization, however, is inherently noisy, because of the complexities of the DNA sample and the huge number of probes on the arrays.
Background is exceedingly low in this assay (due to specificity), though allele dropout may be high (due to poor performing probes).
When this technique is used on genomic data from a single cell (or small numbers of cells) it will—like PCR based approaches—suffer from integrity issues.
For example, the inability of the padlock probe to hybridize to the genomic DNA will cause allele dropouts.
These approaches to reducing the time for the hybridization reaction will result in reduced data quality.

Embodiment Construction

Conceptual Overview of the System

[0060] The goal of the disclosed system is to provide highly accurate genomic data for the purpose of genetic diagnoses. In cases where the genetic data of an individual contains a significant amount of noise, or errors, the disclosed system makes use of the expected similarities between the genetic data of the target individual and the genetic data of related individuals, to clean the noise in the target genome. This is done by determining which segments of chromosomes of related individuals were involved in gamete formation and, when necessary, where crossovers may have occurred during meiosis, and therefore which segments of the genomes of related individuals are expected to be nearly identical to sections of the target genome. In certain situations this method can be used to clean noisy base pair measurements on the target individual, but it also can be used to infer the identity of individual base pairs or whole regions of DNA that were not measur...

Abstract

Disclosed herein is a system and method for increasing the fidelity of measured genetic data, for making allele calls, and for determining the state of aneuploidy, in one or a small set of cells, or from fragmentary DNA, where a limited quantity of genetic data is available. Poorly or incorrectly measured base pairs, missing alleles and missing regions are reconstructed using expected similarities between the target genome and the genome of genetically related individuals. In accordance with one embodiment, incomplete genetic data from an embryonic cell are reconstructed at a plurality of loci using the more complete genetic data from a larger sample of diploid cells from one or both parents, with or without haploid genetic data from one or both parents. In another embodiment, the chromosome copy number can be determined from the measured genetic data, with or without genetic information from one or both parents.

Description

CROSS-REFERENCES TO RELATED APPLICATIONS

[0001] This application is a continuation of U.S. application Ser. No. 16/823,127 filed Mar. 18, 2020, which is (i) a continuation-in-part of U.S. application Ser. No. 16/411,507 filed May 14, 2019, and (ii) a continuation-in-part of U.S. application Ser. No. 16/399,911 filed Apr. 30, 2019. U.S. application Ser. No. 16/411,507 is a continuation of U.S. application Ser. No. 16/288,690 filed Feb. 28, 2019, which is a continuation of U.S. application Ser. No. 15/187,555 filed Jun. 20, 2016, now U.S. Pat. No. 10,227,652, which is a continuation of U.S. application Ser. No. 14/092,457 filed Nov. 27, 2013, now U.S. Pat. No. 9,430,611. U.S. application Ser. No. 14/092,457 is a continuation of U.S. application Ser. No. 13/793,133 filed Mar. 11, 2013, now U.S. Pat. No. 9,424,392, and a continuation of U.S. application Ser. No. 13/793,186 filed Mar. 11, 2013, now U.S. Pat. No. 8,682,592. Each of U.S. application Ser. No. 13/793,133 and U.S. application S...

Application Information

Patent Type & Authority: Applications (United States)
IPC(8): C12Q1/6883, G16B30/00, C12Q1/6827, C12Q1/6876, G16B40/00, G16B25/00, G16B20/00
CPC: G16B40/00, C12Q2600/156, C12Q1/6876, G16B20/00, C12Q1/6883, C12Q2600/158, C12Q2600/118, G16B25/00, G16B30/00, C12Q1/6827, G16B20/10, C12Q1/6869, C12Q1/6806, C12Q2537/149, C12Q2545/114, C12Q1/6855, C12Q1/6886
Inventor: RABINOWITZ, MATTHEW; BANJEVIC, MILENA; DEMKO, ZACHARY; JOHNSON, DAVID; KIJACIC, DUSAN; PETROV, DIMITRI; SWEETKIND-SINGER, JOSHUA; XU, JING
Owner: NATERA