Family whole-genome haplotype linkage analysis method, apparatus, storage medium, and device
By employing family-based whole-genome haplotype linkage analysis and haplotype correction strategies, combined with SNP information and Mendelian inheritance laws, integrated detection of PGT-A, PGT-M, and PGT-SR has been achieved. This solves the problems of high detection costs and long processing times in existing technologies, and improves detection speed and accuracy.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SUZHOU BASECARE MEDICAL DEVICE CO LTD
- Filing Date
- 2023-04-24
- Publication Date
- 2026-06-12
AI Technical Summary
Existing technologies for chromosomal abnormality diagnosis are characterized by high detection costs, cumbersome operation, and long time consumption, making it difficult to achieve integrated detection of PGT-A/M/SR. Furthermore, existing methods such as FISH, microarray-comparative genomic hybridization, SNP-array, and high-depth whole-genome sequencing suffer from specificity, complexity, and high cost issues, failing to meet the needs of large-scale clinical applications.
The family-based whole-genome haplotype linkage analysis method utilizes known SNP information and Mendelian inheritance laws, combined with family relationships, to obtain SNP site information of genomic DNA from the father, mother, or offspring in one go, construct a whole-genome haplotype linkage map, and adopt a haplotype correction strategy to achieve integrated detection of PGT-A, PGT-M, and PGT-SR.
It reduces the amount of sequencing data and cost, improves detection speed and accuracy, and realizes integrated detection of PGT-A, PGT-M and PGT-SR, meeting the detection needs of a variety of clinical diseases and reducing detection time and economic burden.
Figure CN116665774B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of molecular biology and relates to a method, apparatus, storage medium and device for haplotype linkage analysis of whole genome in a family.
Background Technology
[0002] Chromosomal abnormalities are a significant cause of low implantation rates, pregnancy failure, and birth defects in humans. The early miscarriage rate in natural pregnancies is approximately 15%-20%, while the early miscarriage rate in in vitro fertilization and embryo transfer (IVF-ET) is approximately 25%, with embryonic chromosomal abnormalities accounting for about 40-50% of these cases. The incidence of neonatal chromosomal abnormalities is 0.5%-1%. For patients or their families with chromosomal abnormalities, single-gene disorders, unexplained recurrent miscarriages, or implantation failure, genetic counseling may lead to recommendations for embryo testing from multiple PGT platforms to prevent recurrent miscarriages and the risk of genetic diseases in offspring.
[0003] Currently, PGT-A / M / SR testing in clinical practice is performed on different technical platforms, resulting in high clinical testing costs, cumbersome procedures, and long processing times, which limits large-scale clinical application. PGT-A screens embryos for chromosomal aneuploidy before implantation, particularly for patients with advanced maternal age, recurrent miscarriage, repeated implantation failure, adverse pregnancy history, or severe teratospermia in the male partner. PGT-SR is used when one or both parents carry a chromosomal structural abnormality, such as an inversion, balanced translocation, or Robertsonian translocation. Because chromosomes recombine and segregate during gamete formation, embryos are tested for chromosomal structural abnormalities before implantation. PGT-M tests embryos before implantation for single-gene genetic diseases when the parents have or carry a known genetic condition, such as thalassemia, hereditary deafness, or polycystic kidney disease.
[0004] The molecular cytogenetics technology that has been developed in recent years is a product of the combination of cytogenetics, molecular biology, and molecular immunology, and it has been widely used in the clinical diagnosis and research of chromosomal diseases.
[0005] Chromosomal karyotype analysis involves culturing specific cells, preparing and staining them with special techniques, and then observing the number and structure of chromosomes during metaphase under a light microscope. It is a fundamental method for diagnosing chromosomal disorders. However, this method is limited by its lengthy experimental process and culture time, and it can only analyze chromosomes in metaphase.
[0006] Fluorescence in situ hybridization (FISH) technology uses known nucleic acid sequences as probes, which are directly labeled with fluorescein or labeled with non-radioactive substances and hybridized with target DNA. The fluorescein label is then ligated through an immunocytochemical process, and the hybridization signal is observed under a fluorescence microscope to perform qualitative, localization, and quantitative analysis of the nucleic acid to be tested in the sample. However, this technology is limited by the specificity of the probes, and can only detect one or a few known chromosomal abnormalities at a time; certain subfamily DNA sequences are very similar to each other, and cross-reactions can occur between centromere sequences of several pairs of chromosomes; moreover, the technology is complex, cumbersome to operate, and uses expensive reagents, making it unsuitable for large-scale clinical application.
[0007] Array-Comparative Genomic Hybridization (Array-CGH) is a technique that combines gene chips and CGH. It uses microarrays in place of the metaphase chromosomes of traditional CGH, allowing fluorescently labeled test DNA and reference DNA to competitively hybridize with short target sequences on the microarray. Its drawbacks include the ability to detect only known chromosomal abnormalities, the need for control samples during detection, and the requirement to compare signals with control samples for result analysis, which makes the results highly sensitive to hybridization signal quality.
[0008] To ensure accurate test results, SNP-array technology requires high-quality and broad-coverage chip probes. Therefore, the diagnostic capability of SNP-array is limited by the source, quality, quantity, and distribution density of the probes already fixed on the chip. High-quality, broad-coverage chips are expensive, and the associated testing and analysis equipment and consumables are also costly, increasing the financial burden on patients and limiting its routine clinical application.
[0009] High-depth whole-genome sequencing can detect abnormalities in chromosome number and chromosome segment, as well as smaller microdeletions and microduplications. However, it generates a large amount of sequencing data (90Gb), resulting in high sequencing costs, long experimental cycles, low data utilization, and waste of resources.
[0010] Nanopore third-generation sequencing is extremely expensive for whole-genome sequencing and cannot be directly used for embryo testing.
[0011] CN111961707A discloses a method for constructing nucleic acid libraries and its application in the analysis of chromosomal structural abnormalities in preimplantation embryos (RetSeq technology). Compared with high-depth whole-genome sequencing, RetSeq sequencing significantly reduces the cost, but the sequencing data volume is still relatively high (80M).
[0012] In summary, the field of chromosomal abnormality diagnosis urgently needs a low-cost, fast clinical diagnostic method to achieve universal PGT-A / M / SR integrated detection and address the detection needs of various clinical diseases.
Summary of the Invention
[0013] To address the shortcomings of existing technologies and practical needs, this invention provides a method, apparatus, storage medium, and device for family whole-genome haplotype linkage analysis. By using known SNP information as a reference and combining it with family relationships, the amount of sequencing data can be further reduced. This allows aneuploidy, single-gene diseases, and chromosomal structural rearrangements to be detected together in a single experimental test.
[0014] To achieve the above objectives, the present invention adopts the following technical solution:
[0015] In a first aspect, the present invention provides a method for haplotype linkage analysis of a family genome for non-disease diagnosis purposes, the method comprising the following steps:
[0016] (1) Take genomic DNA samples from any one or two of the father, mother, or offspring in a family that conforms to Mendel's laws of inheritance, and obtain the SNP information of the genomic DNA samples;
[0017] (2) Use the genomic DNA sample with SNP information as the reference sample, and establish a data set of coordinate and genotype information for the SNP sites of the reference sample, with the human standard reference genome hg19 or hg38 as the coordinate system;
[0018] (3) Sequence the other individual samples in the family and, based on the SNP coordinate and genotype information data set of the reference sample and on Mendel's laws of inheritance, analyze the genotypes of the SNPs at the same coordinates in the other family samples;
[0019] (4) Based on the genotype information of all SNP loci at the same coordinates in the family samples, combined with the family kinship, construct the whole-genome haplotype linkage map of the family;
[0020] (5) In the whole-genome haplotype linkage map of the family, perform haplotype correction to remove interfering segments, according to the principle that co-inherited haplotype information among offspring should be consistent and non-co-inherited haplotype information should be inconsistent.
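Steps (2) and (3) can be pictured as a lookup table keyed by genomic coordinate. The following is a minimal sketch, not the patented implementation: the names `SnpSite`, `build_reference`, and `restrict_to_reference` are hypothetical, and the coordinates and genotypes are made-up illustration values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpSite:
    chrom: str  # chromosome name, e.g. "chr4"
    pos: int    # coordinate on the chosen build (hg19 or hg38)


def build_reference(records):
    """Map each reference SNP coordinate to its genotype, e.g. 'GC'."""
    return {SnpSite(chrom, pos): genotype for chrom, pos, genotype in records}


def restrict_to_reference(reference, observations):
    """Keep only bases observed at coordinates present in the reference set."""
    return {site: bases for site, bases in observations.items() if site in reference}


if __name__ == "__main__":
    # Illustrative values only; they do not come from the patent.
    reference = build_reference([("chr4", 1000, "GC"), ("chr4", 2500, "AA")])
    observed = {SnpSite("chr4", 1000): {"C"}, SnpSite("chr4", 1800): {"T"}}
    print(restrict_to_reference(reference, observed))
```

Only the observation at a reference coordinate survives the filter, which mirrors the idea that other family samples are analyzed solely at the fixed coordinate loci of the reference data set.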
[0021] This invention develops a family-based whole-genome haplotype linkage analysis technology that incorporates SNP localization. In families conforming to Mendelian inheritance laws, a certain number of SNP loci need to be obtained only once from the genomic DNA of any one or two of the father, mother, or offspring (embryos), without repeatedly obtaining all SNP information for every family member. Using the known SNP information as a reference, combined with family relationships, further reduces the amount of sequencing data. In addition, a haplotype correction strategy is designed to remove the influence of interfering segments and improve typing accuracy, meeting the integrated detection needs of PGT-A, PGT-M, and PGT-SR without building a separate detection procedure for each type of disease.
[0022] Preferably, the method for obtaining SNP information of the genomic DNA sample in step (1) includes any one of the following: obtaining from existing NGS data, obtaining from existing SNP array data, performing WGS sequencing, performing SNP array detection, performing RetSeq detection (see CN111961707A), or performing third-generation sequencing detection.
[0023] Preferably, the number of SNPs in step (1) is >300,000, more preferably 500,000 to 800,000, and even more preferably 600,000 to 700,000.
[0024] Preferably, the sequencing method described in step (3) includes WGS sequencing or RetSeq detection (see CN111961707A). The amount of sequencing data can be lower than that of conventional WGS or RetSeq detection; furthermore, compared with conventional WGS or RetSeq detection, the amount of sequencing data can be reduced by up to 60% or more.
[0025] Preferably, the established coordinates are based on the human standard reference genome hg19 or hg38.
[0026] In this invention, the specific analytical approach based on Mendel's laws of inheritance in step (3) is as follows. First, SNP sites with excessively low QC values are filtered out, as are SNP sites that do not conform to the genetic relationship. Then, based on the genetic relationship: if both parents are AA, the offspring must be AA (probability 100%); if both parents are BB, the offspring must be BB (probability 100%); if one parent is AA and the other is BB, the offspring must be AB (probability 100%); if both parents are AB, the offspring may be AA, AB, or BB (probabilities 25%, 50%, and 25%, respectively); if one parent is AA and the other is AB, the offspring may be AA or AB (probability 50% each); if one parent is AB and the other is BB, the offspring may be AB or BB (probability 50% each). Thus, from the known SNP genotype information of the reference sample and the genetic relationship, the possible genotypes of the SNP at each coordinate locus in the other samples, and the probability of each genotype, are known. Combining the sequencing data of the other family samples with the constraint that only 1 to 3 genotypes, each with a known probability, can exist at a locus yields the accurate genotype of each SNP.
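The genotype probabilities listed above follow from enumerating the four equally likely allele combinations of the two parents. A minimal sketch of that enumeration is given here; the function name `offspring_genotype_probs` is hypothetical, and the code assumes a biallelic SNP with each parental genotype written as a two-letter string.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotype_probs(parent1: str, parent2: str) -> dict:
    """Return {genotype: probability} for one SNP, e.g. 'AB' x 'AA'.

    Each parent passes on one of its two alleles with equal probability,
    so the four allele pairings are equally likely.
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in sorted(counts.items())}


if __name__ == "__main__":
    crosses = [("AA", "AA"), ("BB", "BB"), ("AA", "BB"),
               ("AB", "AB"), ("AA", "AB"), ("AB", "BB")]
    for p1, p2 in crosses:
        print(p1, "x", p2, "->", offspring_genotype_probs(p1, p2))
```

Running the six crosses reproduces the 100%, 25%/50%/25%, and 50%/50% cases stated in the paragraph.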
[0027] For example, samples from the father and mother in the family lineage were used as reference samples for testing first, and 450,000 SNPs with fixed coordinates were obtained. The fixed coordinate sites are designated as SNP1, SNP2, SNP3, ... SNP450,000 according to their positions on the human reference genome.
[0028] If the father's genotype at the first fixed coordinate SNP1 locus is GC and the mother's is GG, then the genotype of the offspring embryo at the SNP1 locus can only be GG or GC (with probabilities of 50% and 50%, respectively). When low-depth sequencing shows that an embryo has a C base at the SNP1 locus, then the genotype of that embryo at the SNP1 coordinate locus is GC. When another embryo shows both G and C bases at the SNP1 locus, then the genotype of that embryo at the SNP1 coordinate locus is GC. When another embryo shows only a G base at the SNP1 locus, then the genotype of that embryo at the SNP1 coordinate locus may be GG or GC (the specific type can be distinguished by considering the haplotypes of closely linked upstream and downstream coordinate loci during haplotype correction).
[0029] If the father's genotype information for the second fixed coordinate SNP2 locus is AA and the mother's is CC, then it can be known that the genotype of the offspring embryo at the SNP2 locus can only be AC (probability 100%). In this case, regardless of what the low-depth sequencing information of the embryo indicates, it is known to be AC.
[0030] Then, the genotype information of the father and mother at the 3rd to 450,000th fixed coordinate loci SNP3 to SNP450,000 are analyzed sequentially to obtain the possible genotypes and probabilities of the offspring. The interpretation is made in combination with the low-depth sequencing of each embryo. For embryo genotypes that cannot be accurately interpreted at the moment, they can be further confirmed during haplotype correction.
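The SNP1 and SNP2 examples above can be expressed as a small filter: list the genotypes the parents allow, then keep those that contain every base actually seen in the low-depth reads. This is an illustrative sketch only; `call_embryo_genotype` is a hypothetical name, genotypes are written with alleles in alphabetical order (so GC appears as "CG"), and sequencing error handling is left out.

```python
def call_embryo_genotype(father: str, mother: str, observed_bases: set) -> set:
    """Return the embryo genotypes consistent with parents and observed bases.

    One result  -> the genotype is determined at this locus.
    Two or more -> ambiguous; to be resolved during haplotype correction.
    Empty set   -> the observation conflicts with the parental genotypes.
    """
    possible = {"".join(sorted(a + b)) for a in father for b in mother}
    return {g for g in possible if observed_bases <= set(g)}


if __name__ == "__main__":
    print(call_embryo_genotype("GC", "GG", {"C"}))       # C seen
    print(call_embryo_genotype("GC", "GG", {"G", "C"}))  # G and C seen
    print(call_embryo_genotype("GC", "GG", {"G"}))       # only G seen
    print(call_embryo_genotype("AA", "CC", set()))       # nothing seen
```

With only a G observed, both GG and GC remain, which is the case the text defers to haplotype correction; for the AA and CC parents the embryo is AC even when no read covers the locus.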
[0031] In this invention, haplotype correction specifically includes the following. One offspring (embryo) is arbitrarily selected from all offspring (embryos) in the family as a reference, and the haplotypes of all other offspring are compared with it. Relative to the reference, each other offspring (embryo) either shares only the maternal haplotype, shares only the paternal haplotype, shares both the paternal and maternal haplotypes, or shares neither haplotype; the paternal and maternal haplotypes are constructed from the reference offspring. Furthermore, according to the law of linkage, the probability of homologous recombination at the Kb scale is far less than 1%, so interfering segments appearing on the haplotype map can be effectively corrected and removed based on these haplotype co-inheritance relationships.
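One simple way to picture the removal of interfering segments is run-length smoothing: along a chromosome, each SNP is labeled by whether the embryo shares the reference embryo's haplotype ("S") or not ("D"), and a very short run sandwiched between two long runs of the opposite label is treated as noise rather than a real recombination. The sketch below is an assumption for illustration, not the correction procedure claimed in the patent; the function name and the `min_run` threshold are hypothetical.

```python
from itertools import groupby


def correct_haplotype(labels: list, min_run: int = 3) -> list:
    """Relabel short runs that are flanked on both sides by the same long run."""
    # Run-length encode the per-SNP labels, e.g. S,S,D -> [["S", 2], ["D", 1]].
    runs = [[label, len(list(group))] for label, group in groupby(labels)]
    for i in range(1, len(runs) - 1):
        left, mid, right = runs[i - 1], runs[i], runs[i + 1]
        is_short = mid[1] < min_run
        is_flanked = (left[0] == right[0]
                      and left[1] >= min_run and right[1] >= min_run)
        if is_short and is_flanked:
            mid[0] = left[0]  # treat the short run as an interfering segment
    corrected = []
    for label, length in runs:
        corrected.extend([label] * length)
    return corrected


if __name__ == "__main__":
    raw = list("SSSSSDSSSSSDDDDDD")
    print("".join(correct_haplotype(raw)))
```

In this example the isolated "D" inside the shared block is relabeled, while the long "D" block at the end is kept as a genuine switch of haplotype.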
[0032] In a specific embodiment of the present invention, to help distinguish different haplotypes in subsequent analysis, the alleles that the offspring inherited from the father can be marked in blue and the alleles on the father's other homolog in red; the alleles that the offspring inherited from the mother can be marked in orange and the alleles on the mother's other homolog in green. According to Mendel's laws of inheritance, it is determined whether the embryonic alleles and the reference alleles have the same origin. If the origins are the same, the alleles are marked in blue (father's valid locus) or orange (mother's valid locus); otherwise, they are marked in red (father's valid locus) or green (mother's valid locus). Valid SNP loci are shown in Table 1.
[0033] Table 1
[0034]
[0035]
[0036] In a second aspect, the present invention provides the application of the family genome haplotype linkage analysis method for non-disease diagnosis purposes described in the first aspect in the construction of PGT-A, PGT-M and PGT-SR detection devices.
[0037] In a third aspect, the present invention provides a detection device for PGT-A, PGT-M and PGT-SR, the detection device comprising an SNP acquisition unit, a reference sample construction unit, a PGT-A analysis unit, a whole-genome haplotype linkage map construction unit, and PGT-M and PGT-SR analysis units.
[0038] The SNP acquisition unit is used to perform the following:
[0039] Take genomic DNA samples from any one or two of the father, mother, or offspring in a family that conforms to Mendel's laws of inheritance, and obtain the SNP information of the genomic DNA samples.
[0040] The reference sample construction unit is used to perform the following:
[0041] Using genomic DNA samples with SNP information as reference samples, a set of coordinate and genotype information data for the SNP sites of the reference samples is established.
[0042] The PGT-A analysis unit is used to perform the following:
[0043] SNP loci with the same coordinates as the reference sample in other individual samples of the family were sequenced. Based on the SNP coordinates and genotype information dataset of the reference sample, the genotypes of SNPs with the same coordinates in other samples of the family were analyzed according to Mendel's laws of inheritance. Copy number variation analysis was performed using the circular binary segmentation algorithm.
[0044] The copy number variation analysis in this invention proceeds as follows. The sequencing data are aligned, and the genome is divided into 10Kb windows in which reads are counted; GC correction and window merging then give the normalized read count of each merged window. The normalized read counts of the sample are compared with the reference database to calculate the LogRR value of each window. The LogRR value reflects the difference between the sample and the reference database for that window, that is, the CNV status of the window. Finally, a t-statistic is constructed from the difference in mean LogRR between adjacent windows to locate the breakpoints of variant segments accurately and determine the specific CNV information.
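The LogRR and t-statistic steps can be sketched as follows. This is a simplified illustration under stated assumptions: the logarithm base is taken to be 2 (the text does not specify it), normalization is a plain division by total reads, GC correction and window merging are omitted, all counts are assumed to be non-zero, and the Welch form of the t-statistic is one possible choice. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def log_rr(sample_counts, reference_counts):
    """Per-window log2 ratio of normalized sample reads to reference reads."""
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)
    return [
        math.log2((s / sample_total) / (r / reference_total))
        for s, r in zip(sample_counts, reference_counts)
    ]


def welch_t(left, right):
    """t-statistic for the difference in mean LogRR between two window groups.

    Each group needs at least two windows and some spread in its values,
    otherwise the standard error is undefined or zero.
    """
    standard_error = math.sqrt(variance(left) / len(left)
                               + variance(right) / len(right))
    return (mean(left) - mean(right)) / standard_error


if __name__ == "__main__":
    # Illustrative read counts: the last three windows look like a gain.
    sample = [98, 102, 100, 148, 152, 150]
    reference = [100, 100, 100, 100, 100, 100]
    ratios = log_rr(sample, reference)
    print([round(x, 3) for x in ratios])
    print(welch_t(ratios[3:], ratios[:3]))
```

A large absolute t-value between the two groups of windows marks the boundary between them as a candidate segmentation point.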
[0045] The unit for constructing a whole-genome haplotype linkage map is used to perform the following:
[0046] Based on the genotypic information of all SNP loci at the same coordinates in the family sample, and combined with the family kinship, a haplotype linkage map of the whole genome of the family was constructed.
[0047] The PGT-M and PGT-SR analysis units are used to perform the following:
[0048] In the family whole-genome haplotype linkage map, based on the principle that co-inherited haplotype information among offspring should be consistent and non-co-inherited haplotype information should be inconsistent, haplotype segment correction is performed to remove interfering segments, and PGT-M and PGT-SR analyses are conducted.
[0049] Preferably, the method for obtaining SNP information of the genomic DNA sample in the SNP acquisition unit includes any one of the following: obtaining from existing NGS data, obtaining from existing SNP array data, performing WGS sequencing, performing RetSeq detection, performing SNP array detection, or performing third-generation sequencing detection.
[0050] Preferably, the number of SNPs in the SNP acquisition unit is >300,000, more preferably 500,000 to 800,000, and even more preferably 600,000 to 700,000.
[0051] Preferably, the sequencing method in the PGT-A analysis unit includes WGS sequencing, RetSeq detection, SNP array detection, or third-generation sequencing.
[0052] In a fourth aspect, the present invention provides a computer device including a memory and a processor, wherein the memory stores a computer program that executes the family whole genome haplotype linkage analysis method for non-disease diagnosis purposes as described in the first aspect or implements the functions of the PGT-A, PGT-M and PGT-SR detection devices as described in the third aspect.
[0053] In a fifth aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, characterized in that the computer program executes the family whole genome haplotype linkage analysis method for non-disease diagnosis purposes as described in the first aspect or implements the functions of the PGT-A, PGT-M and PGT-SR detection devices as described in the third aspect.
[0054] Compared with the prior art, the present invention has the following beneficial effects:
[0055] (1) This invention develops a family whole genome haplotype linkage analysis technology that combines SNP localization. In families that conform to Mendelian inheritance laws, only a certain number (>300,000) of SNP locus information from the genomic DNA of any one or two samples of the father, mother, or offspring (embryo) is needed, without needing to obtain all SNP information of the entire family members. Based on the samples with known SNP information in the family as reference samples, a whole genome SNP locus coordinate and genotype information data set of the reference samples is constructed. Then, when other samples in the family are detected and linkage analyzed, only the genotype information of the fixed coordinate loci in the data set is analyzed. Sufficient SNP data can be obtained with low coverage (0.3×–1.4×) and low cost sequencing. At the same time, the accuracy of SNP locus detection is ensured by combining the family genetic relationship.
[0056] (2) This invention uses a haplotype correction strategy. All haplotype information of offspring is inherited from the parents, so co-inherited haplotype information among offspring should be consistent, while non-co-inherited haplotype information should be inconsistent. This allows haplotype segments to be corrected, removing interfering segments and improving typing accuracy.
[0057] (3) Compared with conventional high-depth WGS or RetSeq detection, the sequencing data volume of the method of the present invention can be reduced by more than 60%, and the low-depth sequencing can shorten the entire detection time, thereby reducing sequencing costs and increasing detection rate.
[0058] (4) This invention can universally solve the integrated detection needs of PGT-A, PGT-M and PGT-SR without the need to build corresponding detection processes for each type of disease.
Attached Figure Description
[0059] Figure 1: Flowchart of the integrated PGT-A / SR / M testing process;
[0060] Figure 2: Chromosome aneuploidy detection results for offspring embryos of the XY family;
[0061] Figure 3A: Haplotype diagram of the XY family (chromosome 4);
[0062] Figure 3B: Haplotype diagram of the XY family (chromosome 10);
[0063] Figure 4: Genetic analysis results for the 4q35 and 10q26 regions in the XY family;
[0064] Figure 5: Whole-genome haplotype map of offspring embryonic cells of the ZY family obtained by low-depth sequencing;
[0065] Figure 6: Haplotype diagram of the ZY family;
[0066] Figure 7: CNV results for the ZY family;
[0067] Figure 8: Chromosome aneuploidy detection results for offspring embryos of the WZW family;
[0068] Figure 9: Haplotype diagram of the WZW family;
[0069] Figure 10: X-chromosome genetic analysis results for the WZW family.
Detailed Implementation
[0070] To further illustrate the technical means and effects of this invention, the following description, in conjunction with embodiments and accompanying drawings, provides a further explanation of the invention. It is understood that the specific embodiments described herein are merely illustrative of the invention and not intended to limit it.
[0071] Where specific techniques or conditions are not specified in the examples, they shall be performed in accordance with the techniques or conditions described in the literature in this field, or in accordance with the product instructions. Reagents or instruments whose manufacturers are not specified are all conventional products that can be purchased through legitimate channels.
[0072] This invention tests the parents or other reference samples within a family pedigree using any method that provides information on >300,000 SNPs (preferably 500,000-800,000, and more preferably 600,000-700,000), for example existing NGS data, existing SNP array data, WGS sequencing, SNP array detection, RetSeq detection, or third-generation sequencing. Whole-genome SNP localization and dataset creation are performed to construct parental whole-genome haplotypes. Low-depth sequencing is then performed on representative regions of the whole-genome DNA from the other embryonic cells to be tested. This avoids repeated testing of parental genotypes at each embryonic cell examination, further reducing sequencing costs and shortening testing time. By analyzing embryos for chromosomal aneuploidy, chromosomal structural abnormalities, and single-gene mutations, the invention assists clinicians in selecting chromosomally normal embryos for implantation. This invention further constructs a detection device applicable to PGT-A / M / SR, as shown in Figure 1. It is a low-cost, versatile, integrated detection solution that can also be applied to other fields of life sciences.
[0073] NGS: Next-Generation Sequencing.
[0074] RAD-Seq: Restriction Site-Associated DNA Sequencing (reduced-representation genome sequencing).
[0075] IVF-ET: in vitro fertilization-embryo transfer.
[0076] SNP: single nucleotide polymorphism.
[0077] PGT-A: Preimplantation Genetic Testing for aneuploidy.
[0078] PGT-M: Preimplantation Genetic Testing for Monogenic disorders.
[0079] PGT-SR: Preimplantation Genetic Testing for Structural Rearrangements.
[0080] Example 1
[0081] This embodiment provides a device for detecting chromosomal aneuploidy and screening for single-gene genetic diseases.
[0082] During genetic counseling, it was found that the client, XY, aged 35, had experienced spontaneous abortion associated with advanced maternal age, and that her husband, QB, and QB's mother, GLX, both have facioscapulohumeral muscular dystrophy type 1 (FSHD). The couple requested assisted reproduction and selection of genetically normal embryos for transfer. Facioscapulohumeral muscular dystrophy is a hereditary muscle disease that most severely affects the muscles of the face, shoulders, and upper arms (in some patients the mutation arises de novo). FSHD progresses slowly and is not life-threatening. Research has found that the genetic abnormality for this disease is located at the 4q35 locus near the end of chromosome 4.
[0083] Peripheral blood samples (5 mL each) were collected from both partners and the husband's parents and stored in EDTA anticoagulant blood collection tubes. Genomic DNA was extracted using the Tiangen Blood / Cell / Tissue Genomic DNA Extraction Kit. Following ovarian superovulation stimulation with medication, in vitro fertilization (IVF) was performed. Several trophoblast cells from blastocysts cultured for five days were selected and numbered XY-1, XY-2, XY-8, and XY-14. DNA analysis of selected embryonic cells was conducted to analyze the presence of aneuploidy. Linkage analysis was performed on the D4Z4 repeat regions in the 4q35 and 10q26 regions to detect whether the embryos carried FSHD-causing mutations, assisting clinicians in determining embryo implantation suitability.
[0084] Single-cell amplification products of offspring embryo XY-1 that passed quality inspection were run on a 750K single nucleotide polymorphism (SNP) microarray, which was then scanned on an Affymetrix GeneChip (GCS3000) scanner. Low-depth genome sequencing (0.5×-0.9×) was performed on the other family members, namely the husband's father, the husband's mother, the husband, the wife, and the other offspring embryos. Whole-genome amplification of offspring embryos was performed using the QIAGEN REPLI-g Single Cell Kit, followed by RetSeq-NGS library construction.
[0085] ①. DNA digestion: Take 200 ng of DNA sample, add NspI and MboI endonuclease, mix well by pipetting (do not vortex), briefly centrifuge, and immediately place in a PCR instrument: 37℃ for 20 minutes, 65℃ for 20 minutes, and maintain at 4℃.
[0086] ②. Add adapters to the ends of DNA fragments: Select adapters with different sequences according to the sequencing platform, add the adapter mixture to the enzyme-digested DNA, vortex to mix, and briefly centrifuge. Immediately after centrifugation, place in a PCR instrument: 60℃ for 10 minutes, then maintain at 4℃;
[0087] ③. Adapter ligation: Add the ligase mixture to the DNA containing the adapter, vortex to mix, briefly centrifuge, and immediately place in a PCR instrument: 22℃ for 25 minutes, 65℃ for 10 minutes, and hold at 4℃.
[0088] ④. Fragment selection: Add water to 100μL, then add 60μL of AMPure XP magnetic beads, mix well, and let stand at room temperature for 5 minutes. Place the mixture on a magnetic rack and let it stand for 3-5 minutes until the liquid is clear. Transfer the supernatant to a new centrifuge tube, add 18μL of AMPure XP magnetic beads, mix well, and let stand at room temperature for 5 minutes. Place the mixture on a magnetic rack and let it stand until the liquid is clear. Remove the supernatant, wash with 200μL of 80% alcohol, dry at room temperature, and then elute the DNA with 22μL of Low TE.
[0089] ⑤. Library amplification: Add PCR reaction mixture to the DNA sample after fragment screening, then add 2 μL of specific primer X, vortex to mix, briefly centrifuge, and then place the PCR tube into the PCR instrument: 98℃ for 45 seconds; (98℃ for 15 seconds, 55℃ for 30 seconds, 72℃ for 30 seconds) * 6 cycles; 72℃ for 1 minute; store at 4℃.
[0090] ⑥. Library purification: After the reaction is complete, centrifuge, add 50 μL of AMPure XP magnetic beads, mix well, let stand at room temperature for 5 minutes, then place on a magnetic rack for 4 minutes until the liquid is clear, discard the supernatant, wash with 200 μL of 80% alcohol, repeat once, dry the magnetic beads at room temperature, add 25 μL of Low TE to resuspend the magnetic beads, and elute the DNA.
[0091] The constructed library was sequenced using PE100-NGS, and the sequencing data volume was 10M raw reads (0.5×-0.9×).
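As a rough consistency check on the stated data volume, mean genome-equivalent depth can be estimated as total sequenced bases divided by genome size. The sketch below rests on assumptions not stated in the text: that "10M raw reads" counts read pairs, that each read is 100 bp (PE100), and that the human genome is about 3.1 Gb. The function name `mean_depth` is hypothetical.

```python
def mean_depth(read_pairs: int, read_length: int = 100,
               paired: bool = True, genome_size: float = 3.1e9) -> float:
    """Estimate mean depth as sequenced bases divided by genome size."""
    reads_per_fragment = 2 if paired else 1
    total_bases = read_pairs * read_length * reads_per_fragment
    return total_bases / genome_size


if __name__ == "__main__":
    # 10 million PE100 pairs under the assumptions above.
    print(round(mean_depth(10_000_000), 2))
```

Under these assumptions the estimate falls inside the 0.5×-0.9× range quoted in the text; if the 10M figure instead counted single reads, the estimate would be half as large.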
[0092] SNP acquisition unit
[0093] First, gene chips were used to test the XY-1 sample of offspring embryos to obtain information on 300,000 SNP sites in the sample.
[0094] Reference sample construction unit
[0095] Then, the obtained SNP locus information is used as a reference sample, and a set of coordinate and genotype information data (coordinates are the human standard reference genome hg19) is established for the known SNP loci of the reference sample.
[0096] PGT-A Analysis Unit
[0097] Then, RetSeq was performed on other samples from the family, generating 10M raw reads (0.5×-0.9×). SNPs with excessively low QC values and those not conforming to genetic relationships were filtered out. Then, based on the reference sample's SNP coordinates and genotype information dataset, bioinformatics analysis was performed on SNPs at the same coordinates in other samples from the family, according to Mendel's laws of inheritance, to obtain accurate genotypes.
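As a non-limiting illustration of the filtering step above, the following Python sketch drops SNP records whose QC value is too low or whose offspring genotype cannot be formed from one allele of each parent. The record fields (`qc`, `father`, `mother`, `child`), the function names, and the `min_qc` threshold are assumptions made for illustration; the embodiment does not specify them.

```python
from itertools import product


def mendelian_consistent(father: str, mother: str, child: str) -> bool:
    """Return True if the child genotype can be built from one paternal
    and one maternal allele. Genotypes are two-letter strings such as "GC"."""
    possible = {"".join(sorted(p + m)) for p, m in product(father, mother)}
    return "".join(sorted(child)) in possible


def filter_snps(snps, min_qc=20.0):
    """Keep records that pass QC and conform to the family relationship.

    min_qc is a hypothetical threshold, not a value from the embodiment.
    """
    kept = []
    for rec in snps:
        if rec["qc"] < min_qc:
            continue  # QC value too low
        if not mendelian_consistent(rec["father"], rec["mother"], rec["child"]):
            continue  # genotype does not conform to the genetic relationship
        kept.append(rec)
    return kept


if __name__ == "__main__":
    records = [
        {"id": "SNP1", "qc": 35.0, "father": "GC", "mother": "GG", "child": "GC"},
        {"id": "SNP2", "qc": 35.0, "father": "GC", "mother": "GG", "child": "CC"},
        {"id": "SNP3", "qc": 5.0, "father": "AG", "mother": "GG", "child": "AG"},
    ]
    print([r["id"] for r in filter_snps(records)])  # ['SNP1']
```

In this sketch, SNP2 is removed because CC cannot arise from GC × GG parents, and SNP3 is removed for its low QC value.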
[0098] The circular binary segmentation (CBS) algorithm is used to analyze the sequencing results and obtain the number of valid sequences mapped to each chromosome. The ratio of this number to the number of sequences for the corresponding chromosome in the reference database is then calculated. If the ratio is too high, the chromosome can be identified as a trisomy or duplication; if the ratio is too low, the chromosome can be identified as a monosomy or deletion. This method enables the detection of chromosomal aneuploidy. The results of chromosomal aneuploidy detection in the XY family offspring embryos are shown in Figure 2.
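As a non-limiting illustration of the ratio comparison described above (not of the CBS segmentation itself), the following Python sketch normalizes per-chromosome read counts against a reference and labels each chromosome. The cut-off values `high` and `low`, the function names, and the example counts are assumptions made for illustration; the embodiment does not state numerical thresholds.

```python
def chromosome_ratios(sample_counts, reference_counts):
    """Ratio of the sample's read fraction to the reference's read fraction
    for each chromosome; a value near 1.0 indicates the expected dosage."""
    s_total = sum(sample_counts.values())
    r_total = sum(reference_counts.values())
    return {
        chrom: (sample_counts[chrom] / s_total) / (reference_counts[chrom] / r_total)
        for chrom in sample_counts
    }


def classify(ratio, high=1.25, low=0.75):
    """Label a chromosome from its ratio. high and low are hypothetical cut-offs."""
    if ratio > high:
        return "gain (trisomy or duplication)"
    if ratio < low:
        return "loss (monosomy or deletion)"
    return "normal"


if __name__ == "__main__":
    # Hypothetical valid-read counts per chromosome.
    reference = {"chr1": 100, "chr2": 100, "chr21": 20}
    sample = {"chr1": 100, "chr2": 100, "chr21": 30}
    for chrom, ratio in chromosome_ratios(sample, reference).items():
        print(chrom, round(ratio, 3), classify(ratio))
```

Normalizing by the total read count makes the ratio independent of the sequencing data volume of each sample, which matters when embryos are sequenced to different depths.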
[0099] Whole-genome haplotype linkage analysis map construction unit
[0100] Based on the genotypic information of all SNP loci at the same coordinate system in the family pedigree samples, and combined with kinship, a haplotype linkage map of the whole genome of the family was constructed. Haplotype linkage analysis of the offspring embryos was performed to observe the D4Z4 repeat regions in the 4q35 and 10q26 regions to determine whether the embryos carried the pathogenic mutation for FSHD. Sequencing data indicators for the XY family are shown in Table 2.
[0101] Table 2
[0102]
[0103] For example, based on the male's genotype information of the first fixed coordinate locus SNP1 being GC and the female's being GG, it can be known that the genotype of the offspring embryo at the SNP1 locus can only be GG or GC (with probabilities of 50% and 50%, respectively). When low-depth sequencing shows that embryo XY-2 has a C base at the SNP1 locus, it can be known that the genotype of the embryo at the SNP1 coordinate locus is GC. When another embryo XY-8 has both G and C bases at the SNP1 locus, it can be known that the genotype of the embryo at the SNP1 coordinate locus is GC. When embryo XY-14 has only a G base at the SNP1 locus, it can be known that the genotype of the embryo at the SNP1 coordinate locus may be GG or GC (in haplotype correction, the haplotypes of closely linked coordinate loci upstream and downstream of the embryo can be combined for differentiation). Then, the genotype information of the male and female at the 2nd to 300,000th fixed coordinate loci SNP2 to SNP300,000 are analyzed sequentially to obtain the possible genotypes and probabilities of the offspring. The interpretation is made in combination with the low-depth sequencing of each embryo. For the genotype of the embryo that cannot be accurately interpreted at the moment, it can be further confirmed during haplotype correction.
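As a non-limiting illustration of the SNP1 example above, the following Python sketch enumerates the possible offspring genotypes and their probabilities from the parental genotypes, then keeps only the genotypes compatible with the bases seen in low-depth sequencing. The function names are assumptions made for illustration, and genotypes are stored with alleles in alphabetical order, so GC is written as "CG".

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(father: str, mother: str):
    """Possible offspring genotypes and their Mendelian probabilities.
    Alleles within a genotype are sorted, so "GC" is reported as "CG"."""
    counts = Counter("".join(sorted(p + m)) for p, m in product(father, mother))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def resolve(father: str, mother: str, observed_bases: str):
    """Candidate genotypes that contain every base observed in the embryo's
    low-depth reads. More than one candidate means the locus stays
    unresolved until haplotype correction."""
    observed = set(observed_bases)
    return sorted(g for g in offspring_genotypes(father, mother) if observed <= set(g))


if __name__ == "__main__":
    # Male GC, female GG at SNP1, as in the example.
    print(offspring_genotypes("GC", "GG"))
    print(resolve("GC", "GG", "C"))   # XY-2: ['CG']
    print(resolve("GC", "GG", "GC"))  # XY-8: ['CG']
    print(resolve("GC", "GG", "G"))   # XY-14: ['CG', 'GG'], left for haplotype correction
```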
[0104] Haplotype correction and PGT-SR analysis
[0105] In the haplotype linkage analysis of a family pedigree, all offspring haplotype information is inherited from the two parents. Co-inherited haplotype information among offspring should be consistent, while non-co-inherited haplotype information should be inconsistent; this allows haplotype segment correction to remove interfering segments. In this family, offspring (embryo) XY-1 is used as the reference. Each other offspring (embryo) shares only the maternal haplotype with the reference, shares only the paternal haplotype with the reference, shares both the paternal and maternal haplotypes with the reference, or shares no haplotype with the reference. In summary, haplotypes that originate from the same paternal or maternal haplotype among all offspring are co-inherited haplotypes, and their haplotype information should be completely consistent; haplotypes that originate from different paternal or maternal haplotypes are non-co-inherited haplotypes, and their haplotype information should be inconsistent. The XY family haplotype results are shown in Figure 3A and Figure 3B. The results of regional genetic analysis of the XY family at 4q35 and 10q26 are shown in Figure 4. The embryo testing results for the XY family are shown in Table 3.
[0106] Table 3
[0107]
[0108] Based on the above results, the XY-14 embryo sample showed no chromosomal aneuploidy, and no abnormal genetic variations were found in the D4Z4 repeat region of 4q35 or the D4Z4 repeat region of 10q26. Therefore, the embryo is suitable for implantation.
[0109] Example 2
[0110] This embodiment provides a detection device for families with balanced translocations.
[0111] During genetic counseling, it was found that the client, ZY, aged 28, had experienced spontaneous abortion and had not undergone genetic testing; the client requested assisted reproduction at the hospital. PGT-SR testing was performed, with 5 mL peripheral blood samples taken from both partners and stored in EDTA anticoagulant blood collection tubes. Following ovarian stimulation with medication, in vitro fertilization (IVF) was performed. Several trophoblast cells from blastocysts cultured for five days were selected for preimplantation genetic testing to analyze balanced chromosomal translocations.
[0112] SNP acquisition unit
[0113] The male and female samples were numbered ZY-father and ZY-mother, respectively, and the offspring embryo samples were numbered ZY-1, ZY-4, and ZY-5. First, the DNA extracted from the peripheral blood of the male and female was analyzed using a microarray chip (Illumina iScan) to obtain information on approximately 400,000 SNP sites in the samples.
[0114] Reference sample construction unit
[0115] Then, the obtained SNP locus information was used as a reference sample to establish a coordinate and genotype information dataset (coordinates are the human standard reference genome hg19). Low-depth whole-genome sequencing analysis was performed on the offspring embryonic cells, using standard WGS and PE100-NGS detection, resulting in 50M raw reads (1.5×-3×). The haplotype map of the offspring embryonic cells from the ZY family, obtained through low-depth sequencing, is shown in Figure 5.
[0116] Whole-genome haplotype linkage analysis map construction unit
[0117] SNPs with excessively low QC values in the sequencing data were filtered out, as were SNPs that did not conform to genetic relationships. Then, based on the genetic relationships and using the reference sample's SNP coordinates and genotype information dataset, bioinformatics analysis was performed on SNPs at the same coordinates in other samples of the family, based on Mendelian inheritance laws, to obtain accurate genotypes. Based on the genotype information of all SNPs at the same coordinates in the family samples, and combined with family kinship, a haplotype linkage map of the whole genome of the family was constructed. The sequencing data indicators of the ZY family are shown in Table 4.
[0118] Table 4
[0119]
| Sample number | Information | Data volume | Sequencing depth | LogRR_MAPD | LogRR_SD | Number of valid sites |
| --- | --- | --- | --- | --- | --- | --- |
| ZY-1 | embryo | 50.0M | 3× | 0.06 | 0.138 | chr5:52; chr7:15 |
| ZY-4 | embryo | 51.8M | 3× | 0.056 | 0.1 | chr5:53; chr7:17 |
| ZY-5 | embryo | 50.5M | 3× | 0.07 | 0.128 | chr5:51; chr7:13 |
[0120] For example, based on the male's genotype information of the first fixed coordinate locus SNP1 being AG and the female's being GG, it can be known that the genotype of the offspring embryo at the SNP1 locus can only be AG or GG (with probabilities of 50% and 50%, respectively). When low-depth sequencing shows that embryo ZY-1 has an A base at the SNP1 locus, it can be known that the genotype of the embryo at the SNP1 coordinate locus is AG. When another embryo ZY-4 has both A and G bases at the SNP1 locus, it can be known that the genotype of the embryo at the SNP1 coordinate locus is AG. When embryo ZY-5 has only a G base at the SNP1 locus, it can be known that the genotype of the embryo at the SNP1 coordinate locus is GG. Then, the genotype information of the male and female at the 2nd to 400,000th fixed coordinate loci SNP2 to SNP400,000 are analyzed sequentially to obtain the possible genotypes and probabilities of the offspring. The interpretation is made in combination with the low-depth sequencing of each embryo. For the genotype of the embryo that cannot be accurately interpreted at the moment, it can be further confirmed during haplotype correction.
[0121] Haplotype correction
[0122] In the haplotype linkage map of a family's whole genome, all offspring haplotype information is inherited from the two parents. Co-inherited haplotype information among offspring should be consistent, while non-co-inherited haplotype information should be inconsistent; this allows haplotype segment correction to remove interfering segments. In this family, offspring (embryo) ZY-4 is used as the reference. Each other offspring (embryo) shares only the maternal haplotype with the reference, shares only the paternal haplotype with the reference, shares both the paternal and maternal haplotypes with the reference, or shares no haplotype with the reference. Haplotypes that originate from the same paternal or maternal haplotype among all offspring are co-inherited haplotypes, and their haplotype information should be completely consistent. Haplotypes that originate from different paternal or maternal haplotypes are non-co-inherited haplotypes, and their haplotype information should be inconsistent. The haplotype results for the ZY family are shown in Figure 6.
[0123] Analysis using this technique showed that all offspring embryo samples exhibited SNP localization accuracy of >99%.
[0124] The CNV results for the ZY family are shown in Figure 7. The analysis results show that, in the ZY family pedigree, the male is normal, while the female is a translocation carrier. Embryo ZY-4 and embryo ZY-1 both exhibit copy number abnormalities in the translocation region and are also translocation carriers. In the chromosome 5 translocation interpretation area, the orange haplotype of embryo ZY-4 indicates a translocation carrier, while the green haplotype of embryo ZY-5 indicates a normal embryo. In the chromosome 7 translocation interpretation area, the orange haplotype of embryo ZY-4 indicates a translocation carrier, while the green haplotype of embryo ZY-5 indicates a normal embryo.
[0125] Example 3
[0126] This embodiment provides a PGT-A / SR / M combined detection device.
[0127] In clinical diagnosis, the client, WZW, aged 31, presented with spontaneous abortion and requested assisted reproduction and selection of genetically normal embryos for transfer. Peripheral blood samples (5 mL) were collected from both WZW and ZYL. Approximately 20 mg of tissue was extracted from the aborted fetus, WZW-0, and DNA was extracted using a Kangwei Century nucleic acid extraction and purification kit. Following in vitro fertilization (IVF), several blastocyst trophoblast cells were selected and analyzed, designated WZW-2, WZW-8, and WZW-9. The analysis examined for aneuploidy, chromosomal structural abnormalities (such as translocations, inversions, microdeletions, and microduplications), and the presence of single-gene genetic diseases, assisting clinicians in determining embryo implantation suitability.
[0128] Obtaining SNP information
[0129] The DNA of the woman and the aborted fetus was analyzed using whole-genome sequencing (WGS) at a whole-genome sequencing depth of at least 30×, obtaining 300,000 SNP loci for each sample. Then, using the woman and the aborted fetus as reference samples, a coordinate and genotype information dataset (coordinates based on the human standard reference genome hg19) was established for the known SNP loci in the reference samples. The offspring embryo samples underwent whole-genome amplification using the QIAGEN REPLI-g Single Cell Kit. The male's DNA and the offspring embryos were analyzed using standard WGS, PE100, yielding 20M raw reads (0.5×-0.9×).
[0130] Constructing reference samples
[0131] After obtaining the sequencing data, SNP sites with low QC values are filtered out. The sequencing data is then matched with the human genome database to establish a reference sample SNP information dataset to identify genetic variations.
[0132] Copy number variation (CNV) analysis was performed using the circular binary segmentation (CBS) algorithm. Sequencing data were aligned and analyzed, with reads counted in 10 kb windows. GC correction and window merging were then performed to obtain the normalized read count of each window. Next, the normalized read counts were compared with a reference database, and the LogRR value for each window was calculated. The LogRR value reflects the difference between the sample and the reference database for each window segment, i.e., the CNV status of that window segment. Finally, a t-statistic was constructed from the mean difference of LogRR values between adjacent windows to accurately determine the segmentation points of the variant regions and identify specific CNV information. The results of chromosome aneuploidy detection in the WZW family offspring embryos are shown in Figure 8. The sequencing data indicators of the WZW family are shown in Table 5.
[0133] Table 5
[0134]
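As a non-limiting illustration of the LogRR and t-statistic steps described in paragraph [0132], the following Python sketch computes a per-window LogRR value and locates the position where the mean LogRR of adjacent window groups differs most. It is a simplified stand-in and not the CBS algorithm itself: GC correction and window merging are omitted, a base-2 logarithm and a Welch-type t-statistic are assumed, all window counts are assumed to be non-zero, and the function names and the group size `k` are hypothetical.

```python
import math
from statistics import mean, variance


def log_rr(sample_counts, reference_counts):
    """Per-window log2 ratio of the sample's read fraction to the reference's."""
    s_total = sum(sample_counts)
    r_total = sum(reference_counts)
    return [
        math.log2((s / s_total) / (r / r_total))
        for s, r in zip(sample_counts, reference_counts)
    ]


def t_statistic(left, right):
    """Welch-type t-statistic for the difference in mean LogRR of two groups."""
    diff = mean(right) - mean(left)
    se = math.sqrt(variance(left) / len(left) + variance(right) / len(right))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def best_breakpoint(values, k=3):
    """Index i that maximizes |t| between values[i-k:i] and values[i:i+k]."""
    candidates = range(k, len(values) - k + 1)
    return max(candidates, key=lambda i: abs(t_statistic(values[i - k:i], values[i:i + k])))


if __name__ == "__main__":
    # Hypothetical LogRR track: five windows near 0, then five near log2(1.5).
    track = [0.0, 0.01, -0.01, 0.02, -0.02, 0.58, 0.6, 0.57, 0.59, 0.61]
    print(best_breakpoint(track, k=3))  # 5
```

A full CBS implementation would apply such a test recursively and assess significance by permutation; the sketch shows only the single-breakpoint comparison.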
[0135] Constructing a haplotype linkage map of the whole genome of a family
[0136] For example, at the first fixed coordinate locus SNP1, where the female's genotype is AA and the male's is GG, the genotype of the offspring embryo at SNP1 can only be AG. Similarly, at the second fixed coordinate locus SNP2, where the female's genotype is AG and the male's is CC, the genotype of the offspring embryo at SNP2 can only be AC or GC (with probabilities of 50% and 50%, respectively). When low-depth sequencing shows that embryo WZW-2 has an A base at SNP2, the genotype of that embryo at SNP2 is AC. When embryo WZW-8 shows both G and C bases at SNP2, the genotype is GC. When embryo WZW-9 shows only a C base at SNP2, the genotype may be AC or GC (in haplotype correction, the haplotypes of closely linked upstream and downstream loci can be combined to distinguish them).
[0137] Then, the genotype information of the male and female at the 3rd to 300,000th fixed coordinate loci SNP3 to SNP300,000 are analyzed sequentially to obtain the possible genotypes and probabilities of the offspring. The interpretation is made in combination with the low-depth sequencing of each embryo. For the genotype of the embryo that cannot be accurately interpreted at the moment, it can be further confirmed during haplotype correction.
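As a non-limiting illustration of the SNP1 and SNP2 examples above, the following Python sketch marks which loci can distinguish the two haplotypes of one parent and reads off the allele that parent transmitted when the low-depth data allow it. Treating a locus as "informative" when one parent is heterozygous and the other homozygous is a common convention in linkage analysis and is an assumption here, as are the function names; loci where both parents are heterozygous are not handled in this sketch.

```python
def informative_for(father: str, mother: str):
    """Return "paternal" or "maternal" when exactly one parent is heterozygous
    at the locus, otherwise None."""
    father_het = father[0] != father[1]
    mother_het = mother[0] != mother[1]
    if father_het and not mother_het:
        return "paternal"
    if mother_het and not father_het:
        return "maternal"
    return None


def transmitted_allele(het_parent: str, hom_parent: str, observed_bases: str):
    """Allele passed on by the heterozygous parent, if an observed base can
    only have come from that parent; None when the reads cannot decide."""
    distinguishing = set(observed_bases) & (set(het_parent) - set(hom_parent))
    return distinguishing.pop() if len(distinguishing) == 1 else None


if __name__ == "__main__":
    print(informative_for("GG", "AA"))  # SNP1: None, every embryo is AG
    print(informative_for("CC", "AG"))  # SNP2: 'maternal'
    print(transmitted_allele("AG", "CC", "A"))   # WZW-2: 'A'
    print(transmitted_allele("AG", "CC", "GC"))  # WZW-8: 'G'
    print(transmitted_allele("AG", "CC", "C"))   # WZW-9: None, left for haplotype correction
```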
[0138] Haplotype correction
[0139] In a family pedigree haplotype linkage map, all offspring haplotype information is inherited from the two parents. Co-inherited haplotype information among offspring should be consistent, while non-co-inherited haplotype information should be inconsistent. This allows haplotype segment correction to remove interfering segments. In this family, offspring (aborted fetus) WZW-0 is used as the reference. Each other offspring (embryo) shares only the maternal haplotype with the reference, shares only the paternal haplotype with the reference, shares both the paternal and maternal haplotypes with the reference, or shares no haplotype with the reference. Haplotypes that originate from the same paternal or maternal haplotype among all offspring are co-inherited haplotypes, and their haplotype information should be completely consistent. Haplotypes that originate from different paternal or maternal haplotypes are non-co-inherited haplotypes, and their haplotype information should be inconsistent.
[0140] The haplotype results of the WZW family are shown in Figure 9, the results of the X chromosome genetic analysis of the WZW family are shown in Figure 10, and the embryo testing results for the WZW family are shown in Table 6.
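As a non-limiting illustration of one way the correction principle above could be applied, the following Python sketch takes the per-locus haplotype labels of one embryo along a chromosome and relabels a short run that is flanked on both sides by longer runs of the same label, treating it as an interfering segment. The label representation, the `min_run` threshold, and the function name are assumptions made for illustration; the embodiment does not specify how interfering segments are recognized.

```python
from itertools import groupby


def correct_haplotype(labels, min_run=3):
    """Relabel short runs that sit between two longer runs of the same label.

    labels: per-locus haplotype labels in coordinate order, e.g. "A" or "B"
            for the two haplotypes of one parent.
    min_run: hypothetical minimum number of consecutive loci for a segment
             to be trusted.
    """
    runs = [(label, len(list(group))) for label, group in groupby(labels)]
    corrected = []
    for i, (label, length) in enumerate(runs):
        if 0 < i < len(runs) - 1 and length < min_run:
            prev_label, prev_len = runs[i - 1]
            next_label, next_len = runs[i + 1]
            # Only overwrite when both flanks agree and are themselves trusted.
            if prev_label == next_label and prev_len >= min_run and next_len >= min_run:
                label = prev_label
        corrected.extend([label] * length)
    return corrected


if __name__ == "__main__":
    noisy = list("AAAAABAAAA")      # one isolated locus disagrees with its flanks
    crossover = list("AAAAABBBBB")  # a sustained switch is left untouched
    print("".join(correct_haplotype(noisy)))      # AAAAAAAAAA
    print("".join(correct_haplotype(crossover)))  # AAAAABBBBB
```

Requiring trusted flanks on both sides keeps a genuine recombination event, which produces a sustained switch of label, from being erased.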
[0141] Table 6
[0142]
| Embryo | Classification |
| --- | --- |
| WZW-2 | carrier |
| WZW-8 | normal |
| WZW-9 | normal |
[0143] Based on the above results, both the woman and the miscarried fetus had a 0.47 Mb deletion in the Xq28 region, which was the main cause of the miscarriage. Embryos WZW-8 and WZW-9 did not have deletions in the Xq28 region of the X chromosome and were both suitable for implantation.
[0144] In summary, by combining SNP localization technology with known SNP information as a reference and with pedigree relationships, this invention further reduces the amount of sequencing data. Whether RetSeq or high-depth WGS sequencing is used, the amount of sequencing data can be reduced by more than 60%. Integrated detection of aneuploidy, single-gene diseases, and chromosomal structural rearrangements can be achieved in the same experimental test.
[0145] The applicant declares that the detailed method of the present invention is illustrated by the above embodiments, but the present invention is not limited to the above detailed method, that is, it does not mean that the present invention must rely on the above detailed method to be implemented. Those skilled in the art should understand that any improvements to the present invention, equivalent substitutions of the raw materials of the product of the present invention, addition of auxiliary components, selection of specific methods, etc., all fall within the protection scope and disclosure scope of the present invention.
Claims
1. A method for family-based whole-genome haplotype linkage analysis for non-disease diagnosis purposes, characterized in that, The method includes the following steps: (1) Take one or two genomic DNA samples from the father, mother or offspring in a family that conforms to Mendel's laws of inheritance, and obtain the SNP information of the genomic DNA samples; (2) Use the genomic DNA sample after obtaining SNP information as the reference sample, and establish a set of coordinate and genotype information data for the SNP sites of the reference sample with the human standard reference genome hg19 or hg38 as the coordinates; (3) Sequence other individual samples in the family and, based on the SNP coordinates and genotype information data set of the reference sample and on Mendel's laws of inheritance, analyze the genotypes of SNPs with the same coordinates in the other samples of the family; (4) Based on the genotype information of all SNP loci at the same coordinates in the family samples, and combined with the family kinship, construct the whole-genome haplotype linkage map of the family; (5) In the whole-genome haplotype linkage map of the family, perform haplotype correction to remove interfering segments according to the principle that co-inherited haplotype information among offspring should be consistent and non-co-inherited haplotype information should be inconsistent.
2. The method for family-wide genome haplotype linkage analysis for non-disease diagnosis purposes according to claim 1, characterized in that, The method for obtaining SNP information of the genomic DNA sample in step (1) includes any one of the following: obtaining from existing NGS data, obtaining from existing SNP array data, performing WGS sequencing, performing RetSeq detection, performing SNP array detection, or performing third-generation sequencing detection.
3. The family genome-wide haplotype linkage analysis method for non-disease diagnosis purposes according to claim 1 or 2, characterized in that, The number of SNPs in step (1) is >300,000.
4. The method for family-wide genome haplotype linkage analysis for non-disease diagnosis purposes according to claim 3, characterized in that, The number of SNPs in step (1) is 500,000 to 800,000.
5. The method for family-wide genome haplotype linkage analysis for non-disease diagnosis purposes according to claim 4, characterized in that, The number of SNPs in step (1) is 600,000 to 700,000.
6. The method for family-wide genome haplotype linkage analysis for non-disease diagnosis purposes according to claim 1, characterized in that, The sequencing methods described in step (3) include WGS sequencing or RetSeq detection.
7. A detection device for PGT-A, PGT-M, and PGT-SR, characterized in that, The detection device includes an SNP acquisition unit, a reference sample construction unit, a PGT-A analysis unit, a whole-genome haplotype linkage analysis map construction unit, and PGT-M and PGT-SR analysis units. The SNP acquisition unit is used to perform the following: Take genomic DNA samples from any one or two of the father, mother, or offspring in a family that conforms to Mendel's laws of inheritance, and obtain the SNP information of the genomic DNA samples; The reference sample construction unit is used to perform the following: Using genomic DNA samples after obtaining SNP information as reference samples, a set of coordinate and genotype information data for the SNP sites in the reference samples is established; The PGT-A analysis unit is used to perform the following: SNP loci with the same coordinates as the reference sample in other individual samples of the family were sequenced. Based on the SNP coordinates and genotype information dataset of the reference sample, the genotypes of SNPs with the same coordinates in other samples of the family were analyzed according to Mendel's laws of inheritance. Copy number variation analysis was performed using the circular binary segmentation algorithm. The unit for constructing a whole-genome haplotype linkage map is used to perform the following: Based on the genotype information of all SNP loci at the same coordinates in the family sample, and combined with the family kinship, a haplotype linkage map of the whole genome of the family was constructed. 
The PGT-M and PGT-SR analysis units are used to perform the following: In the family whole-genome haplotype linkage map, based on the principle that co-inherited haplotype information among offspring should be consistent and non-co-inherited haplotype information should be inconsistent, haplotype segment correction is performed to remove interfering segments, and PGT-M and PGT-SR analyses are conducted.
8. The PGT-A, PGT-M, and PGT-SR detection device according to claim 7, characterized in that, The methods for obtaining SNP information of the genomic DNA sample in the SNP acquisition unit include: obtaining it from existing NGS data, obtaining it from existing SNP array data, performing WGS sequencing, performing RetSeq detection, performing SNP array detection, or performing third-generation sequencing detection.
9. The PGT-A, PGT-M, and PGT-SR detection device according to claim 7, characterized in that, The number of SNPs obtained in the SNP acquisition unit is >300,000.
10. The PGT-A, PGT-M, and PGT-SR detection device according to claim 9, characterized in that, The number of SNPs obtained in the SNP acquisition unit is 500,000 to 800,000.
11. The PGT-A, PGT-M, and PGT-SR detection device according to claim 10, characterized in that, The number of SNPs obtained in the SNP acquisition unit is 600,000 to 700,000.
12. The PGT-A, PGT-M, and PGT-SR detection device according to claim 7, characterized in that, The sequencing methods described in the PGT-A analysis unit include WGS sequencing, RetSeq detection, SNP array detection, or third-generation sequencing.
13. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, The computer program executes the family whole genome haplotype linkage analysis method for non-disease diagnosis purposes as described in any one of claims 1-6 or implements the functions of the PGT-A, PGT-M and PGT-SR detection devices as described in any one of claims 7-12.
14. A computer-readable storage medium having a computer program stored thereon, characterized in that, The computer program executes the family whole genome haplotype linkage analysis method for non-disease diagnosis purposes as described in any one of claims 1-6 or implements the functions of the PGT-A, PGT-M and PGT-SR detection devices as described in any one of claims 7-12.