DNA tags and their use
By adding specific DNA tags and adapters to the ends of DNA/RNA molecular fragments, sequencing libraries are constructed and high-throughput sequencing is performed, solving the problem of detecting trace genomic variations in existing technologies and enabling accurate detection and early screening of trace variations.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BGI GENOMICS CO LTD
- Filing Date
- 2017-04-27
- Publication Date
- 2026-06-23
AI Technical Summary
Existing high-throughput sequencing technologies suffer from amplification bias and sequencing errors when detecting minute variations in the genome, making it difficult to accurately identify low-frequency variations and limiting the detection of rare and low-frequency variations.
By employing original DNA tags and DNA adapters, specific sequences are added to the ends of DNA/RNA molecular fragments as unique tags to construct sequencing libraries and perform high-throughput sequencing. UMI technology is used for data analysis to eliminate false positive mutations, enabling the detection and verification of extremely small variations.
It enables the detection of mutations with a frequency as low as 0.01%, supporting early screening for cancers, neurodegenerative diseases, cardiovascular diseases, and other conditions induced by cumulative mutations in somatic cells and stem cells, thereby improving the sensitivity and accuracy of detection.
Abstract
Description
[0001] Priority information
[0002] None.
Technical Field
[0003] This invention relates to the field of biological sequencing, specifically to DNA tags, DNA adapters, methods for constructing sequencing libraries, sequencing libraries, and sequencing methods.
Background Technology
[0004] The rapid development of high-throughput sequencing technology has ushered in a new era for genomics research. It not only enables large-scale genome sequencing but also facilitates gene expression analysis and the identification of non-coding small RNAs. In the medical field, high-throughput sequencing technology has broken through the throughput limitations in disease research, making multi-level and comprehensive disease research possible and providing effective means for disease prevention, diagnosis, and treatment. DNA sequencing, DNA quantification, and RNA abundance analysis are of great significance in genomics, gene expression research, and medical genetics testing. However, high-throughput sequencing requires PCR amplification of sample DNA / RNA before sequencing. PCR is generally plagued by amplification bias and errors. Furthermore, sequencing errors can occur during the sequencing process due to specific sequencing platforms and environments, resulting in approximately 1% of bases not being correctly identified, thus limiting the detection of rare and low-frequency variants.
[0005] Unique Molecular Identifier (UMI) technology adds a randomly synthesized sequence (typically 5-12 bp) to the end of a DNA / RNA fragment as a unique tag that identifies the fragment and records the original DNA / RNA information of the sample. As early as 2011, Isaac Kinde, Jian Wu, and others used Unique Identifier (UID) technology, which is similar to UMI, to detect rare mutations. In 2012, to determine the relative abundance of two different molecules or the absolute quantity of multiple molecules in a single sample, Teemu Kivioja, Anna... first used single-molecule tagging (UMI) technology to count the absolute quantity of multiple molecules. In the same year, Michael W. Schmitt et al. further combined UMI with duplex sequencing (DS) technology to detect extremely rare mutations. In 2014, Scott R. Kennedy, Michael W. Schmitt et al. provided a detailed protocol covering efficient DS adapter synthesis, library preparation, target enrichment, and data analysis workflows. In 2015, Michael W. Schmitt et al. used DS technology to detect rare mutations in the ABL1 gene.
[0006] However, further research is needed to detect minute variations in the genome.
Summary of the Invention
[0007] The present invention aims to solve at least one of the technical problems existing in the prior art.
[0008] Based on their original UMI sequences, the inventors of this application have developed a system for detecting and verifying minute genomic variations. This system can detect mutations as low as 0.01%, enabling early screening for cancers, neurodegenerative diseases, cardiovascular diseases, and other conditions induced by cumulative mutations in somatic cells and stem cells.
[0009] In a first aspect, the present invention provides a DNA tag. According to embodiments of the invention, the tag has a sequence selected from at least one of the following: (1) HHATHHHTCACCHHATHHH (SEQ ID NO: 10); and (2) HHHTAHHTAHHHTAHH (SEQ ID NO: 11), wherein H represents A, T, or C. Using the tag according to embodiments of the invention, the detection and verification of extremely small amounts of variation (mutation frequency as low as 0.01%) can be achieved, which is of great significance for the early screening of cancers, neurodegenerative diseases, cardiovascular diseases, etc., induced by cumulative mutations in somatic cells, stem cells, etc.
[0010] In a second aspect, the present invention provides a DNA adapter. According to an embodiment of the invention, the DNA adapter contains the aforementioned DNA tag. Constructing a sequencing library using the DNA adapter according to an embodiment of the invention, and then sequencing the library, allows for the detection of extremely small amounts of variation, exhibiting high sensitivity for detecting trace mutations or rare mutations with a mutation frequency as low as 0.01%. The DNA adapter according to an embodiment of the invention is of great significance for the early screening of cancers, neurodegenerative diseases, cardiovascular diseases, and other conditions induced by cumulative mutations in somatic cells and stem cells.
[0011] In a third aspect, the present invention proposes the application of the aforementioned DNA tag and adapter in the detection of minute variations. Using the tag and adapter according to embodiments of the present invention, the detection and verification of extremely minute variations (mutation frequency as low as 0.01%) can be achieved, which is of great significance for the early screening of cancers, neurodegenerative diseases, cardiovascular diseases, etc., induced by cumulative mutations in somatic cells, stem cells, etc.
[0012] In a fourth aspect, the present invention provides a method for constructing a sequencing library. According to embodiments of the invention, the method includes enriching nucleic acid molecules linked with the aforementioned DNA adapters to obtain a sequencing library. The sequencing library constructed using the method according to embodiments of the invention can be used for the detection of extremely small variations, with mutation frequencies as low as 0.01%.
[0013] In a fifth aspect, the present invention provides a sequencing library. According to an embodiment of the present invention, the sequencing library is obtained by the method for constructing a sequencing library described above. High-throughput sequencing of this sequencing library can detect mutation frequencies as low as 0.01%, enabling early screening for cancers, neurodegenerative diseases, cardiovascular diseases, etc., induced by cumulative mutations in somatic cells and stem cells.
[0014] In a sixth aspect, the present invention provides a sequencing method. According to an embodiment of the invention, the method includes sequencing the aforementioned sequencing library and performing data analysis. Using the sequencing method according to an embodiment of the invention, the detection and verification of low-frequency mutations can be achieved. Furthermore, depending on the sequencing depth, the mutation frequency detectable by UMI technology can reach 0.01%, which can be effectively applied to the early screening of cancers, neurodegenerative diseases, cardiovascular diseases, etc., induced by cumulative mutations in somatic cells and stem cells.
[0015] Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Attached Figure Description
[0016] The above and / or additional aspects and advantages of the present invention will become apparent and readily understood from the description of the embodiments taken in conjunction with the following drawings, in which:
[0017] Figure 1 is a flowchart of the overall analysis of a trace variation detection system according to an embodiment of the present invention;
[0018] Figure 2 is a flowchart of data analysis and processing according to an embodiment of the present invention;
[0019] Figure 3 is a diagram illustrating the purification, quantification, and Sanger sequencing verification of PCR products according to an embodiment of the present invention;
[0020] Figure 4 is a diagram showing the 2100 detection result of an adapter prepared using the add-"T" strategy according to an embodiment of the present invention;
[0021] Figure 5 is a diagram showing the 2100 detection result of an adapter prepared using the add-anchor strategy according to an embodiment of the present invention;
[0022] Figure 6 is a diagram showing the 2100 detection result of an adapter prepared using the enzyme digestion strategy according to an embodiment of the present invention;
[0023] Figure 7 is a diagram showing the 2100 detection result of a sequencing library according to an embodiment of the present invention;
[0024] Figure 8 is a cumulative depth distribution map of a sample according to an embodiment of the present invention;
[0025] Figure 9 is a depth distribution map of a sample according to an embodiment of the present invention;
[0026] Figure 10 is a distribution map of the UMI sequence sets of samples according to an embodiment of the present invention; and
[0027] Figure 11 is a diagram showing the construction result of a duplex consensus sequence according to an embodiment of the present invention.
Detailed description of the invention
[0028] Embodiments of the present invention are described in detail below. Examples of these embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain the present invention, and should not be construed as limiting the present invention.
[0029] It should be noted that the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. Furthermore, in the description of this invention, unless otherwise stated, "a plurality of" means two or more.
[0030] For the nucleic acids mentioned in this specification and claims, those skilled in the art should understand that they actually include any one or both of the complementary double strands. For convenience, although only one strand is given in most cases in this specification and claims, the other complementary strand is actually disclosed. For example, mentioning SEQ ID NO: 1 actually includes its complementary sequence. Those skilled in the art will also understand that one strand can be used to detect the other strand, and vice versa.
[0031] DNA Tags
[0032] In a first aspect, the present invention provides a DNA tag for detecting trace variations. According to embodiments of the invention, the tag has a sequence selected from at least one of the following: (1) HHATHHHTCACCHHATHHH; and (2) HHHTAHHTAHHHTAHH, wherein H represents A, T, or C. Using the tag according to embodiments of the invention, the detection and verification of extremely small variations (mutation frequency as low as 0.01%) can be achieved, which is of great significance for the early screening of cancers, neurodegenerative diseases, cardiovascular diseases, etc., induced by cumulative mutations in somatic cells, stem cells, etc.
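The two tag templates can be explored with a short script. The following is a minimal Python sketch, not part of the patent: the function names are hypothetical, and the only facts taken from the text are the two templates and the rule that H stands for A, T, or C. It checks whether an observed tag fits a template, draws a random tag, and counts how many distinct tags a template allows.

```python
import random
import re

# H stands for A, C, or T (any base except G), as stated in the text.
H_BASES = "ACT"

TAG_TEMPLATES = {
    "SEQ ID NO: 10": "HHATHHHTCACCHHATHHH",
    "SEQ ID NO: 11": "HHHTAHHTAHHHTAHH",
}


def template_to_regex(template):
    """Compile a tag template into a regular expression (hypothetical helper)."""
    pattern = "".join("[ACT]" if base == "H" else base for base in template)
    return re.compile(pattern)


def matches_template(tag, template):
    """Return True if the whole tag fits the template."""
    return template_to_regex(template).fullmatch(tag) is not None


def random_tag(template, rng=random):
    """Draw one concrete tag by filling every H with a random A, C, or T."""
    return "".join(rng.choice(H_BASES) if base == "H" else base for base in template)


def tag_space_size(template):
    """Number of distinct tags the template can produce: 3 ** (number of H)."""
    return len(H_BASES) ** template.count("H")


if __name__ == "__main__":
    for name, template in TAG_TEMPLATES.items():
        tag = random_tag(template)
        print(name, len(template), template.count("H"), tag_space_size(template), tag)
```

Each template contains 10 degenerate positions, so a single tag has 3^10 = 59,049 possible sequences; when a template carries one such tag at each end, the combined space is the square of that number.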
[0033] DNA Adapters
[0034] In a second aspect, the present invention provides a DNA adapter. According to an embodiment of the invention, the DNA adapter contains the aforementioned DNA tag. Constructing a sequencing library using the DNA adapter according to an embodiment of the invention, and then sequencing the library, allows for the detection of extremely small amounts of variation, exhibiting high sensitivity for detecting trace mutations or rare mutations with a mutation frequency as low as 0.01%. The DNA adapter according to an embodiment of the invention is of great significance for the early screening of cancers, neurodegenerative diseases, cardiovascular diseases, and other conditions induced by cumulative mutations in somatic cells and stem cells.
[0035] According to another specific embodiment of the present invention, the adapter has a sticky dT end. Therefore, efficient and rapid ligation of the adapter to the gene fragment to be sequenced can be achieved through rapid TA ligation.
[0036] According to a specific embodiment of the present invention, the DNA adapter further includes an anchoring sequence formed between the sticky dT end and the tag sequence. During the annealing reaction, the anchoring sequence and the tag sequence pair complementarily up to the end of the anchoring sequence, leaving a protruding T base at its 3' end. In molecular cloning, adding a protruding base to a blunt end is relatively unstable and has a certain failure rate. By contrast, when two sequences are annealed (where the anchoring sequence carries an additional dT base), the protruding dT end is formed through complementary pairing of the two sequences alone, with no ligation reaction required. Therefore, introducing an anchoring sequence is more efficient and robust than the commonly used addition of dT to the 3' end of a blunt-ended molecule.
[0037] According to a specific example of the present invention, the anchoring sequence has the nucleotide sequence shown in SEQ ID NO: 1: CTATGTCGATGC (SEQ ID NO: 1). The anchoring sequence according to embodiments of the present invention is strictly not complementary to sequences other than its complementary sequence, and it is not easily ligated. Furthermore, dDTP does not contain dC bases, therefore the extension reaction terminates, thereby effectively protecting the complementary structure of the anchoring sequence from destruction.
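Since the specification treats each listed strand as also disclosing its complement, it can help to compute the strand that pairs with the anchoring sequence. The sketch below is illustrative only: the helper names are hypothetical, and the `occurs_in` check is just a naive exact-match test for whether the anchor or its reverse complement appears inside another oligo, not a thermodynamic specificity analysis.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

ANCHOR = "CTATGTCGATGC"  # SEQ ID NO: 1, taken from the text


def reverse_complement(seq):
    """Return the complementary strand written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def occurs_in(anchor, other):
    """Naive check: does the anchor or its reverse complement appear in `other`?"""
    return anchor in other or reverse_complement(anchor) in other


if __name__ == "__main__":
    print(reverse_complement(ANCHOR))  # GCATCGACATAG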
[0038] According to an embodiment of the present invention, the sticky end dT is formed at the 3' end of the DNA tag. This allows for rapid and efficient TA ligation with the fragment to be sequenced, which carries an added A at its 3' end.
[0039] According to a specific embodiment of the present invention, the adapter with the anchoring sequence is obtained by sequentially performing gradient annealing, dDTP extension, and ethanol purification and nick repair. The specific steps are as follows:
[0040] 1. Gradient annealing, the specific steps include:
[0041] 1) Dilute each of the three sequences to 150 μM with ddH2O (OAB buffer) according to the molar amount stated on the tube label, then mix 12 μl of each of the three sequences (equal volumes), as shown in Table 1;
[0042] Table 1:
[0043]
[0044] Note: Experiments have shown that adding dT during synthesis of the anchoring sequence gives better stability and efficiency than adding dT after the adapter is prepared. Therefore, when preparing adapters that contain an anchoring sequence, dT should be added during synthesis of the anchoring sequence.
[0045] 2) Place the sample in a PCR instrument for annealing reaction;
[0046] 3) After the reaction is complete, store at -20℃ and label it as pre-Mix-ac;
[0047] 2. dDTP extension, the specific steps include:
[0048] 1) Take 35 μl of pre-Mix-ac, add the reagent, and mix thoroughly by pipetting. The resulting system is shown in Table 2:
[0049] Table 2:
[0050]
Component                        Volume
pre-Mix-ac                       35 μl
10× Blue buffer                  5 μl
dDTP (25 mM each)                5 μl
Klenow (3'→5' exo-) (5 U/μl)     5 μl
Total                            50 μl
[0051] 2) Incubate at 37℃ for 1 hour;
[0052] 3) Purify with ethanol and dissolve in 50 μl of ddH2O;
[0053] 4) Store at -20℃ and label it as ac-Adpater-1.T.1.
[0054] 3. Ethanol purification and nick repair, the specific steps include:
[0055] 1) Take 45 μl of ac-Adpater-1.T.1, add the following reagents, mix well by pipetting, and the resulting system is shown in Table 3;
[0056] Table 3:
[0057]
Component                        Volume
ac-Adpater-1.T.1                 45 μl
2× Rapid ligation buffer         50 μl
T4 DNA Ligase (600 U/μl)         5 μl
Total                            100 μl
[0058] 2) Incubate at 37℃ for 30 minutes.
[0059] 4. Purify with ethanol and dissolve in 30 μl of ddH2O. Take 1 μl and dilute it for detection on the 2100;
[0060] 5. Store at -20℃ after the reaction is complete.
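The concentrations implied by the annealing and extension steps above follow from C1·V1 = C2·V2. The sketch below is a bookkeeping aid only: the helper name is hypothetical, and the input numbers are the ones given in the steps and in Table 2 (150 μM oligos at 12 μl each, and a 50 μl extension reaction).

```python
def final_concentration(stock_conc, stock_volume, total_volume):
    """Solve C1*V1 = C2*V2 for C2; units follow the inputs."""
    return stock_conc * stock_volume / total_volume


# Annealing mix: three oligos, each diluted to 150 uM, 12 ul of each.
anneal_total_ul = 3 * 12
oligo_final_um = final_concentration(150, 12, anneal_total_ul)

# Extension reaction from Table 2 (volumes in ul).
table2 = {"pre-Mix-ac": 35, "10x Blue buffer": 5, "dDTP": 5, "Klenow": 5}
table2_total_ul = sum(table2.values())
ddtp_final_mm = final_concentration(25, table2["dDTP"], table2_total_ul)
klenow_units = 5 * table2["Klenow"]  # 5 U/ul stock

if __name__ == "__main__":
    print(f"each oligo in annealing mix: {oligo_final_um} uM in {anneal_total_ul} ul")
    print(f"extension reaction total: {table2_total_ul} ul")
    print(f"dDTP final: {ddtp_final_mm} mM each, Klenow: {klenow_units} U")
```

Summing the component volumes this way is also a quick consistency check on each reaction table.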
[0061] According to a specific example of the present invention, the DNA adapter further comprises: a restriction enzyme sequence formed at the end of the DNA tag, wherein the restriction enzyme sequence carries a restriction endonuclease recognition site suitable for generating a sticky T-terminal. The endonuclease can cleave the 8 bases following the restriction enzyme recognition site on the sense strand and the 7 bases following the restriction enzyme recognition site on the antisense strand, forming a sticky 3' end with a 1-dT base protruding. The adapter with the restriction enzyme sequence forms a more stable 3' T-terminal structure.
[0062] According to another specific example of the present invention, the restriction enzyme sequence is an HphI-specific recognition site. After the HphI-specific recognition site is specifically recognized and cleaved by HphI, a sticky end dT can be generated at the 3' end of the DNA adapter, which can then be rapidly and efficiently ligated with the fragment to be sequenced.
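The cut geometry described above (8 bases downstream of the site on the sense strand, 7 on the antisense strand) can be illustrated with a toy calculation. This is a sketch under stated assumptions: the recognition sequence GGTGA comes from general enzyme references rather than from this text, the example sequence is invented, and the function name is hypothetical.

```python
# Assumption: HphI recognition sequence as listed in enzyme references,
# not stated in this document.
HPHI_SITE = "GGTGA"
TOP_OFFSET = 8     # sense strand is cut 8 bases after the site
BOTTOM_OFFSET = 7  # antisense strand is cut 7 bases after the site


def hphi_cut(top_strand):
    """Return (left fragment of the top strand, 3' overhang) or None.

    Coordinates are on the top (sense) strand, written 5' to 3'.
    """
    start = top_strand.find(HPHI_SITE)
    if start < 0:
        return None
    site_end = start + len(HPHI_SITE)
    top_cut = site_end + TOP_OFFSET
    bottom_cut = site_end + BOTTOM_OFFSET
    if top_cut > len(top_strand):
        return None  # not enough sequence downstream of the site
    left_top = top_strand[:top_cut]
    overhang = top_strand[bottom_cut:top_cut]  # single unpaired base
    return left_top, overhang


if __name__ == "__main__":
    # Invented toy sequence: site, 7 spacer bases, then a T in the 8th position.
    toy = "ACGT" + HPHI_SITE + "CCCCCCC" + "T" + "AAAA"
    print(hphi_cut(toy))
```

In this model the overhang is a T only when the eighth base after the site is a T, which is why the position of the site relative to the adapter end matters.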
[0063] According to another specific example of the present invention, the adapter ligated with the HphI-specific recognition site restriction enzyme sequence is obtained by sequentially performing gradient annealing, dDTP extension, and HphI digestion. Specifically, it can be obtained through two methods: short PCR after digestion and long PF (PCR-Free) after digestion.
[0064] The specific method for short PCR after enzyme digestion is as follows:
[0065] 1. Gradient annealing, the specific steps include:
[0066] 1) Dilute each sequence to 100 μM with ddH2O (OAB buffer) according to the molar amount stated on the tube label, then mix 20 μl of each solution;
[0067] 2) Place the sample in a PCR instrument for annealing reaction;
[0068] 3) After the reaction is complete, store at -20℃ and label it as pre-Mix-S.
[0069] 2. dDTP extension, the specific steps include:
[0070] 1) Take 35 μl of pre-Mix-S, add the reagents, and mix well by pipetting. The resulting system is shown in Table 4;
[0071] Table 4:
[0072]
Component                        Volume
pre-Mix-S                        35 μl
10× Blue buffer                  5 μl
dDTP (25 mM each)                5 μl
Klenow (3'→5' exo-)              5 μl
Total                            50 μl
[0073] 2) Incubate at 37℃ for 1 hour;
[0074] 3) Ethanol purification: dissolve in 20 μl of ddH2O, then take 1 μl and dilute it for 2100 high-sensitivity detection;
[0075] 4) Store at -20℃ and label it as pre-Adpater-S.
[0076] 3. HphI digestion, the specific steps include:
[0077] 1) Take the volumes listed in Table 5 and add them to pre-Adpater-S for mixing. The resulting system is shown in Table 5.
[0078] Table 5:
[0079]
[0080] 2) Incubate at 37℃ for 16 hours, then inactivate at 65℃ for 20 minutes;
[0081] 3) Ethanol purification: dissolve in 30 μl of ddH2O, then take 1 μl and dilute it for 2100 high-sensitivity detection;
[0082] 4) Store at -20℃ after the reaction is complete.
[0083] The specific method for generating long PF (PCR-Free) after enzyme digestion is as follows:
[0084] Gradient annealing, the specific steps include:
[0085] 1) Dilute each sequence to 100 μM with ddH2O (OAB buffer) according to the molar amount stated on the tube label, then mix 20 μl of each solution;
[0086] 2) Place the sample in a PCR instrument for annealing reaction;
[0087] 3) After the reaction is complete, store at -20℃ and label it as pre-Mix-L57.
[0088] dDTP extension, the specific steps include:
[0089] 1) Take 35 μl of pre-Mix-L57, add the following reagents, and mix well by pipetting. The resulting system is shown in Table 6;
[0090] Table 6:
[0091]
Component                        Volume
pre-Mix-L57                      35 μl
10× Blue buffer                  5 μl
dDTP (250 nM each)               5 μl
Klenow (3'→5' exo-)              5 μl
Total                            50 μl
[0092] 2) Incubate at 37℃ for 1 hour;
[0093] 3) Ethanol purification: dissolve in 20 μl of ddH2O, then take 1 μl and dilute it for 2100 high-sensitivity detection;
[0094] 4) Store at -20℃ and label as pre-Adpater-L57.
[0095] HphI digestion, the specific steps include:
[0096] 1) Take the volumes listed in Table 7 and add them to pre-Adpater-L57 for mixing. The system is shown in Table 7.
[0097] Table 7:
[0098]
[0099]
[0100] 2) Incubate at 37℃ for 16 hours; incubate at 65℃ for 20 minutes to inactivate;
[0101] 3) Ethanol purification: dissolve in 30 μl of ddH2O, then take 1 μl and dilute it for 2100 high-sensitivity detection;
[0102] 4) After the reaction is complete, store at -20℃.
[0103] Uses of DNA tags and DNA adapters in detecting minute variations
[0104] In a third aspect, the present invention proposes the use of the aforementioned DNA tag and adapter in the detection of minute variations. Using the tag and adapter according to embodiments of the present invention, extremely minute variations (mutation frequency as low as 0.01%) can be detected and verified. In scientific research, this provides a reliable detection method for studies of extremely minute variations, such as detection of somatic mitochondrial mutation rates, detection of rare DNA variants (e.g., novel susceptibility sites), accurate calculation of DNA / RNA copy number by single-molecule counting, research on hereditary diseases, and aging research (e.g., detection of aging-related methylation sites). Furthermore, it is of great significance for early screening of cancers, neurodegenerative diseases, cardiovascular diseases, etc., induced by cumulative mutations in somatic cells and stem cells.
[0105] Methods for constructing sequencing libraries
[0106] In a fourth aspect, the present invention provides a method for constructing a sequencing library. According to embodiments of the invention, the method includes enriching nucleic acid molecules linked with the aforementioned DNA adapters to obtain a sequencing library. The sequencing library constructed using the method according to embodiments of the invention can be used for the detection of extremely small variations, with mutation frequencies as low as 0.01%.
[0107] Specifically, according to an embodiment of the present invention, the nucleic acid molecule is obtained by: (1) performing PCR amplification on the nucleic acid sample to be tested to obtain a nucleic acid sample fragment; (2) adding A to the 3' end of the nucleic acid sample fragment; (3) ligating the DNA adapter described above with the nucleic acid sample fragment obtained in step (2) to obtain the nucleic acid molecule ligated with the DNA adapter described above.
[0108] According to another embodiment of the present invention, after a DNA adapter with only a sticky end dT or an anchoring sequence between the sticky end dT and the tag sequence is ligated to the sample fragment to be tested, the enrichment process is achieved by PCR enrichment. The specific steps are as follows:
[0109] 1) Experimental preparation. A PCR reaction table was prepared based on the experimental task sheet and sample size.
[0110] 2) Add template. Add DNA samples to the 96-well PCR plate according to the order in the PCR reaction table. For batch samples, add 3 μL per well; for re-amplification / re-extraction samples, add 5 μL per well. Verify that the DNA information matches the PCR reaction table. Add samples to the bottom or near the wall of the tube. After sealing with sealing film, briefly centrifuge at 2000 rpm for 30 seconds, check the bottom of the tube for sample addition, and set aside for use.
[0111] 3) Mix aliquoting. Aliquot the prepared mix into the reaction plates, 22 μL per well for batch samples and 20 μL per well for reamplification samples, and add the mix while the plates are suspended. After covering with a gel pad, briefly centrifuge at 1500 rpm for 30 s, and immediately perform PCR amplification using a PCR instrument.
[0112] 4) Cyclic amplification using a PCR instrument;
[0113] 5) Post-amplification product detection: After PCR amplification, the product is briefly centrifuged at 2,000 rpm for 30 seconds and transferred to the electrophoresis chamber for detection. If the product cannot be detected immediately, it should be stored at 4℃.
[0114] According to another embodiment of the present invention, after the adapter with the restriction enzyme sequence is ligated to the test sample, the enrichment process can also be achieved by the above-described PCR enrichment method. According to yet another specific example of the present invention, when the adapter with the restriction enzyme sequence is obtained by the above-described long PF (PCR-Free) method after restriction enzyme digestion, the enrichment process can be omitted after the adapter with the restriction enzyme sequence is ligated to the test sample.
[0115] According to a specific example of the present invention, prior to the enrichment process, the nucleic acid molecules ligated with the aforementioned DNA adapter are further purified. Specifically, the purification process can be performed using magnetic beads. The purification process removes enzymes and buffer solutions involved in the ligation process, thereby eliminating interference with subsequent enrichment and significantly improving the enrichment success rate and efficiency of the ligation products.
[0116] Sequencing Libraries
[0117] In a fifth aspect, the present invention provides a sequencing library. According to an embodiment of the invention, the sequencing library is obtained using the sequencing library construction method described above. High-throughput sequencing of this sequencing library can detect mutation frequencies as low as 0.01%, enabling early screening for cancers, neurodegenerative diseases, cardiovascular diseases, etc., induced by cumulative mutations in somatic cells and stem cells.
[0118] Sequencing Methods
[0119] In a sixth aspect, the present invention provides a sequencing method. According to an embodiment of the invention, the method includes sequencing the aforementioned sequencing library and performing data analysis. Using the sequencing method according to an embodiment of the invention, the detection and verification of low-frequency mutations can be achieved, and it can be effectively applied to the early screening of cancers, neurodegenerative diseases, cardiovascular diseases, etc., induced by cumulative mutations in somatic cells and stem cells.
[0120] According to a specific embodiment of the present invention, the sequencing is performed using the HiSeq2500 platform. High-throughput sequencing on the HiSeq2500 platform can significantly reduce costs, ensure the stability of experimental data and analysis results, and, more importantly, depending on the sequencing depth, the mutation frequency that UMI technology can detect can reach 0.01%.
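Because the detectable frequency depends on sequencing depth, a simple binomial model gives a rough sense of how many independent template molecules must be covered before a 0.01% variant is likely to be seen at all. This is a sketch under simplifying assumptions that are not from the patent: molecules are sampled independently, errors are ignored, and the function name and the choice of "at least k supporting molecules" are illustrative.

```python
from math import comb


def prob_at_least(k, depth, freq):
    """Probability of observing at least k variant molecules among `depth`
    independently sampled molecules when the true variant frequency is `freq`.

    Intended for small k; very large k would make the binomial terms unwieldy.
    """
    p_fewer = sum(
        comb(depth, i) * freq**i * (1.0 - freq) ** (depth - i) for i in range(k)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    freq = 0.0001  # 0.01%
    for depth in (1_000, 10_000, 30_000, 100_000):
        print(depth, round(prob_at_least(1, depth, freq), 3))
```

Under this model, a depth equal to 1/frequency (10,000 unique molecules for 0.01%) gives only about a 63% chance of sampling even one variant molecule, so confident detection needs several times that depth.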
[0121] According to a specific example of the present invention, the data analysis and processing flow is shown in Figure 2. The details are as follows:
[0122] 1) Data preprocessing. The raw sequencing data is preprocessed, including filtering low-quality reads, extracting UMI adapter sequences, and analyzing read and UMI adapter sequence information.
[0123] 2) Alignment. Use BWA (V0.5.9-r16) to align the preprocessed reads to the reference sequence;
[0124] 3) Filtering the alignment results. The alignment results are statistically analyzed and filtered;
[0125] 4) Sorting. Use samtools (V 0.1.16) to sort the results;
[0126] 5) Construct single-strand consensus sequences. Construct a single-strand consensus sequence for each family in the UMI sequence set;
[0127] 6) Sorting. Use samtools (V0.1.16) to sort the single-strand consensus sequences;
[0128] 7) Construct duplex consensus sequences. Construct a duplex consensus sequence from the complementary sequences in the UMI sequence set;
[0129] 8) Sorting. Use samtools (V0.1.16) to sort the duplex consensus sequences;
[0130] 9) Filtering and sorting. Use samtools (V0.1.16) to filter the duplex consensus sequences and sort the filtered results;
[0131] 10) Local alignment. Use GATK (V2.4-9) to perform local alignment of the duplex consensus sequences;
[0132] 11) Mutation information analysis. Analyze and statistically process mutation information according to the set mutation rate.
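The tag extraction and consensus steps in the workflow above can be sketched in a few lines of Python. This is a simplified illustration, not the patented pipeline: reads are assumed to be already aligned and oriented, a family is keyed by its pair of end tags `(tag_a, tag_b)`, the complementary family is assumed to carry the swapped key `(tag_b, tag_a)`, and the thresholds `min_size` and `min_agree` are hypothetical.

```python
from collections import Counter, defaultdict

TAG_LEN = 19  # length of the SEQ ID NO: 10 tag


def split_read(read, tag_len=TAG_LEN):
    """Preprocessing: separate the leading tag from the insert sequence."""
    return read[:tag_len], read[tag_len:]


def consensus(seqs, min_size=3, min_agree=0.7):
    """Per-position majority vote; positions without enough agreement become N."""
    if len(seqs) < min_size:
        return None
    length = min(len(s) for s in seqs)
    called = []
    for i in range(length):
        base, count = Counter(s[i] for s in seqs).most_common(1)[0]
        called.append(base if count / len(seqs) >= min_agree else "N")
    return "".join(called)


def single_strand_consensus(reads):
    """reads: iterable of (tag_a, tag_b, sequence). Returns {family key: consensus}."""
    families = defaultdict(list)
    for tag_a, tag_b, seq in reads:
        families[(tag_a, tag_b)].append(seq)
    result = {}
    for key, seqs in families.items():
        cons = consensus(seqs)
        if cons is not None:
            result[key] = cons
    return result


def duplex_consensus(sscs):
    """Keep only bases on which a family and its complementary family agree."""
    duplex = {}
    for (tag_a, tag_b), seq in sscs.items():
        partner_key = (tag_b, tag_a)
        if tag_a == tag_b or partner_key in duplex or partner_key not in sscs:
            continue
        partner = sscs[partner_key]
        duplex[(tag_a, tag_b)] = "".join(
            x if x == y else "N" for x, y in zip(seq, partner)
        )
    return duplex
```

A family with a single discordant read still yields a clean single-strand consensus, and a family without a complementary partner produces no duplex consensus, which is where most sequencing and late-cycle PCR errors drop out.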
[0133] In summary, by utilizing the DNA tag, DNA adapter, method for constructing a sequencing library, sequencing library, and sequencing method according to embodiments of the present invention, the detection and verification of low-frequency mutations can be achieved. The detectable mutation frequency can be as low as 0.01%, which can be effectively applied to the early screening of cancers, neurodegenerative diseases, cardiovascular diseases, etc., induced by cumulative mutations in somatic cells and stem cells. Specifically, the present invention employs a special library preparation and analysis strategy in which the prepared adapter sequence is ligated to the sample DNA. Although the adapter sequence contains 10 degenerate bases, each adapter molecule still has its own specific sequence. After the adapter is ligated to the sample DNA, each raw sequencing template carries a 19-base molecular tag at each end, for a total of 38 tag bases per template. Each degenerate base has 3 choices, so the 20 degenerate bases per template (10 at each end) give 3^20 possibilities, which is approximately 3.5 billion. This ensures that each raw template is effectively unique in the raw library. PCR amplification of the original library produces two complementary molecular families for each template: a forward-strand family and a reverse-strand family. Based on this library preparation and sequencing strategy, false-positive mutation sites can be excluded during analysis using the following criteria:
[0134] 1) The mutation occurs only once or a few times within a molecular family, and the same mutation does not occur in the complementary molecular family. This indicates that the mutation is a random error, either a replication error introduced during PCR or a base-calling error by the HiSeq instrument, and that the sample does not carry a mutation at that position;
[0135] 2) The mutation appears uniformly in a molecular family but not in its complementary molecular family, indicating that this mutation is a replication error introduced in the first cycle of PCR;
[0136] 3) The mutations appear uniformly within the molecular family and correspond to mutations in the complementary strand. This indicates that the mutations are real and reliable.
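The three criteria above amount to a small decision rule on how often a variant appears in a molecular family and in its complementary family. The sketch below encodes that rule; the function name, the labels, and the numeric cut-offs (`uniform` for "appears uniformly" and `absent` for "does not occur") are hypothetical choices, since the text gives no thresholds.

```python
def classify_variant(frac_family, frac_partner, uniform=0.9, absent=0.1):
    """Classify a candidate variant from the fraction of reads carrying it
    in a molecular family and in its complementary family.

    The rule is symmetric: either family may be passed first.
    """
    high = max(frac_family, frac_partner)
    low = min(frac_family, frac_partner)
    if high == 0:
        return "no variant"
    if low >= uniform:
        return "true variant"                       # criterion 3
    if high >= uniform and low <= absent:
        return "first-cycle PCR error"              # criterion 2
    if high < uniform and low <= absent:
        return "sporadic PCR or sequencing error"   # criterion 1
    return "unresolved"


if __name__ == "__main__":
    for pair in [(1.0, 1.0), (1.0, 0.0), (0.05, 0.0), (0.5, 0.5)]:
        print(pair, classify_variant(*pair))
```

Cases that fit none of the three criteria are reported as unresolved rather than forced into a category.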
[0137] The present invention will be explained below with reference to embodiments. Those skilled in the art will understand that the following embodiments are for illustrative purposes only and should not be considered as limiting the scope of the invention. Where specific techniques or conditions are not specified in the embodiments, they shall be performed in accordance with the techniques or conditions described in the literature in the art (e.g., refer to J. Sambrook et al., *Molecular Cloning: A Laboratory Manual*, 3rd edition, Science Press, translated by Huang Peitang et al.) or according to the product instructions. Reagents or instruments whose manufacturers are not specified are all commercially available conventional products, such as those purchased from Illumina.
[0138] In the embodiments of the present invention, two sets of DNA samples were used to perform PCR on the target region. After the specific base sites of each sample were determined by Sanger sequencing, they were mixed in molar ratios of 1:1, 1:100, 1:1000, and 1:10000 to form four sets of products. Finally, the three UMI strategies were tested in sequence, as detailed in Table 8.
[0139] Table 8:
[0140]
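As a rough guide to what the four mixtures should contain, the expected fraction of the minor sample can be computed from the mixing ratio. This sketch assumes each ratio is read as minor:major and that mixing is ideal; the function name `minor_fraction` is introduced here for illustration, and measured frequencies may deviate from these values.

```python
from fractions import Fraction


def minor_fraction(minor_parts: int, major_parts: int) -> Fraction:
    """Expected molar fraction of the minor sample in an ideal mixture."""
    return Fraction(minor_parts, minor_parts + major_parts)


# Mixing ratios stated in paragraph [0138], read as minor:major (assumption).
for minor, major in [(1, 1), (1, 100), (1, 1000), (1, 10000)]:
    frac = float(minor_fraction(minor, major))
    print(f"{minor}:{major} -> fraction {frac:.6f} ({frac * 100:.4f}%)")
```

Under these assumptions the 1:10000 mixture corresponds to a minor fraction of about 0.01%.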
[0141] The target areas are shown in Table 9.
[0142] The target sequence is DRB1*01:01:01 (Note: the sequence corresponding to this type is the reference sequence of the DRB1 gene; the sequence shown below is the sequence of this type within the target region). The sequence is as follows:
[0143] ATGGGTGTGTCTGAAGCTCCCTGGAGGCTCCTGCATGACAGCGCTGACAGTGACACTGATGGTGCTGAGCTCCCCACTGGCTTTGGCTGGGGACACCCGAC (SEQ ID NO: 4).
[0144] Table 9:
[0145]
Gene name | Exon | Start position | End position | Sequence length
DRB1 | Exon 1 | 211 | 310 | 100 bp
[0146] The implementation of each step is described below, following the overall analysis flowchart of the trace variation detection system (see Figure 1).
[0147] 1. DNA extraction, the specific steps are as follows:
[0148] (1) Add 20 μL of proteinase K solution to a 1.5 mL centrifuge tube;
[0149] (2) Add 200 μL of blood sample to the tube;
[0150] (3) Add 200 μL of buffer AL to the tube, vortex for 15 seconds to mix thoroughly;
[0151] (4) Incubate in a 56℃ water bath for 10 minutes;
[0152] (5) Centrifuge briefly in a microcentrifuge to bring all the liquid down to the bottom of the tube;
[0153] (6) Add 200 μL of anhydrous ethanol, vortex for 15 seconds to mix, and centrifuge briefly in a microcentrifuge to bring all the liquid down to the bottom of the tube;
[0154] (7) Carefully transfer all the liquid obtained in the previous step into the purification column, without wetting the edge, centrifuge at 8000 rpm for 1 minute in a high-speed centrifuge, discard the collection tube and replace it with a new collection tube;
[0155] (8) Carefully open the tube cap, add 500 μL of buffer AW1, being careful not to wet the edges, centrifuge at 8000 rpm for 1 minute in a high-speed centrifuge, discard the collection tube, and replace it with a new collection tube;
[0156] (9) Open the tube cap, add 500 μL of buffer AW2, and centrifuge at 14000 rpm for 3 minutes in a high-speed centrifuge.
[0157] (10) Discard the collection tube, replace it with a new centrifuge tube, and centrifuge at 14,000 rpm for 1 minute in a high-speed centrifuge;
[0158] (11) Discard the collection tube, put the purification column into a 1.5 mL centrifuge tube, let it air dry for 3 minutes, add 50 μL of buffer AE or ultrapure water, let it stand at room temperature for 5 minutes, centrifuge at 8000 rpm for 1 minute in a high-speed centrifuge, discard the purification column, and cover the centrifuge tube.
[0159] (12) Measure the OD value on the NanoDrop 2000 and record the results;
[0160] (13) Label the extracted DNA and store it in a -20℃ refrigerator.
[0161] 2. PCR amplification, the specific steps are as follows:
[0162] (1) Primer design;
[0163] Specific and conserved regions upstream and downstream of the target region were identified through bioinformatics analysis as candidate regions for primer design. Primer design was then completed according to primer design principles. To improve data utilization, the PCR primer amplification region was kept as short as possible while still covering the target region.
[0164] Based on the design principles in the technical solution, the primer sequences for the above target regions were finally determined as shown in Table 10.
[0165] Table 10:
[0166]
Gene name | Exon | Forward primer | Reverse primer | Amplicon length
DRB1 | Exon 1 | CCCTGGAGGCTCCTG (SEQ ID NO: 5) | CACCCRCAATGTGCA (SEQ ID NO: 6) | 75 bp
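The reverse primer in Table 10 contains the degenerate base R (A or G under the standard IUPAC nucleotide code), so it stands for more than one concrete oligonucleotide. The sketch below expands a degenerate sequence into its concrete variants; the `expand` helper is introduced here for illustration and is not part of the described method.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(seq: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate sequence."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]


forward_primer = "CCCTGGAGGCTCCTG"  # SEQ ID NO: 5, no degenerate bases
reverse_primer = "CACCCRCAATGTGCA"  # SEQ ID NO: 6, one degenerate base (R)

print(expand(forward_primer))
print(expand(reverse_primer))
```

The reverse primer expands to two variants that differ only at the R position.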
[0167] (2) The DNA samples were amplified by PCR using high-fidelity PCR enzyme and the prepared primers to enrich the target sequence.
[0168] 3. Purification and quantification of the PCR products, and Sanger sequencing verification (see Figures 3a and 3b);
[0169] 4. End repair: take more than 200 ng of the product for end repair, then purify;
[0170] 5. dA-tailing: add an "A" to the 3' end, then purify;
[0171] 6. Add the UMI adapter. The specific steps are as follows:
[0172] (1) Adapter preparation. The three strategies shown in Figure 1 are described in turn:
[0173] I. The "T" strategy, i.e., adding dT at the 3' end, is as follows:
[0174] 1) Gradient annealing, the specific steps include:
[0175] a) Dilute each primer to 100 μM with ddH2O (OAB buffer) according to the molar amount printed on the tube label, then take 20 μl of each and mix them, as shown in Table 11.
[0176] Table 11:
[0177]
[0178] b) Place the sample in a PCR instrument for annealing reaction;
[0179] c) After the reaction is complete, store at -20°C and label it as pre-Mix-T.
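The dilution in step a) follows from the molar amount on the tube label: volume = amount / concentration. The sketch below shows the arithmetic; the function name and the 25 nmol example amount are hypothetical and are used only to illustrate the unit conversion.

```python
def resuspension_volume_ul(amount_nmol: float, target_um: float) -> float:
    """Diluent volume (μl) that brings amount_nmol of oligo to target_um (μM)."""
    if amount_nmol <= 0 or target_um <= 0:
        raise ValueError("amount and target concentration must be positive")
    # nmol divided by μmol/L gives mL; multiply by 1000 to convert to μl.
    return amount_nmol / target_um * 1000.0


# Hypothetical tube labeled 25 nmol, diluted to the 100 μM used in this step.
print(resuspension_volume_ul(25, 100))
```

The same function covers other target concentrations, such as a 150 μM stock.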
[0180] 2) dNTP extension, the specific steps include:
[0181] a) Take 35 μl of pre-Mix-T, add the reagents shown in Table 12, and mix well by pipetting:
[0182] Table 12:
[0183]
Reagent | Volume
pre-Mix-T | 35 μl
10× Blue buffer | 5 μl
dNTP (25 mM each) | 5 μl
Klenow (3'→5' exo-) (5 U/μl) | 5 μl
Total | 50 μl
[0184] b) Incubate at 37℃ for 1 hour;
[0185] c) Purify with alcohol, then dissolve in 42 μl of ddH2O.
[0186] 3) Add dT, the specific steps include:
[0187] a) Add the reagents from Table 13 to the product from the previous step.
[0188] Table 13:
[0189]
Reagent | Volume
Product of the previous step | 42 μl
10× Blue buffer | 5 μl
dTTP (10 mM) | 1 μl
Klenow (3'→5' exo-) (5 U/μl) | 2 μl
Total | 50 μl
[0190] b) Incubate at 37℃ for 30 minutes.
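The volumes in Tables 12 and 13 can be checked with the dilution rule C_final = C_stock × V_added / V_total. The sketch below is a bookkeeping aid only; the dictionary keys and the helper name `final_concentration` are introduced here, while the volumes and stock concentrations are copied from the two tables.

```python
def final_concentration(stock_conc: float, volume_ul: float, total_ul: float) -> float:
    """Dilution rule: C_final = C_stock * V_added / V_total (units follow the stock)."""
    return stock_conc * volume_ul / total_ul


# Volumes in μl, copied from Table 12 (extension) and Table 13 (dT addition).
extension_mix = {"pre-Mix-T": 35, "10x Blue buffer": 5,
                 "nucleotide mix (25 mM each)": 5, "Klenow exo- (5 U/ul)": 5}
tailing_mix = {"previous product": 42, "10x Blue buffer": 5,
               "dTTP (10 mM)": 1, "Klenow exo- (5 U/ul)": 2}

for name, mix in (("extension", extension_mix), ("dT addition", tailing_mix)):
    total = sum(mix.values())
    buffer_x = final_concentration(10, mix["10x Blue buffer"], total)
    print(f"{name}: total {total} ul, buffer {buffer_x:g}x")

nucleotide_mm = final_concentration(25, 5, 50)  # mM per nucleotide, extension step
dttp_mm = final_concentration(10, 1, 50)        # mM dTTP, dT addition step
klenow_units = (5 * 5, 5 * 2)                   # enzyme units per reaction
print(nucleotide_mm, dttp_mm, klenow_units)
```

Both mixes sum to the stated 50 μl, and the 10× buffer ends up at 1× in each.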
[0191] 4) Alcohol purification: dissolve in 30 μl of ddH2O. Take 1 μl and dilute it for 2100 analysis (see Figure 4);
[0192] 5) After the reaction is complete, store at -20℃ and label it as dT-Adapter-T.
[0193] II. The anchor strategy, i.e., adding an anchoring sequence. The specific steps are as follows:
[0194] 1) Gradient annealing, the specific steps include:
[0195] a) Dilute each primer to 150 μM with ddH2O (OAB buffer) according to the molar amount printed on the tube label, then mix 12 μl of each solution. See Table 1 for details.
[0196] b) Place the sample in a PCR instrument for the annealing reaction;
[0197] c) After the reaction is complete, store at -20°C and label it as pre-Mix-ac;
[0198] 2) dNTP extension, the specific steps include:
[0199] a) Take 35 μl of pre-Mix-ac, add the reagents shown in Table 2, and mix well by pipetting;
[0200] b) Incubate at 37℃ for 1 hour;
[0201] c) Purify with alcohol, then dissolve in 50 μl of ddH2O;
[0202] d) Store at -20℃ and label it as ac-Adapter-1.T.1.
[0203] 3) Nick repair after alcohol purification, the specific steps include:
[0204] a) Take 45 μl of ac-Adapter-1.T.1, add the reagents shown in Table 14, and mix well by pipetting;
[0205] Table 14:
[0206]
Reagent | Volume
ac-Adapter-1.T.1 | 45 μl
2× Rapid ligation buffer | 50 μl
T4 DNA Ligase (600 U/μl) | 5 μl
Total | 100 μl
[0207] b) Incubate at 37℃ for 30 minutes.
[0208] 4) Alcohol purification: dissolve in 30 μl of ddH2O. Take 1 μl and dilute it for 2100 analysis (see Figure 5);
[0209] 5) After the reaction is complete, store at -20℃ and label it as ac-Adapter.
[0210] III. The enzyme digestion strategy, i.e., HphI digestion, which includes a short-sequence (S) protocol and a long-sequence (L) protocol, corresponding to the PCR and PCR-Free protocols respectively. The specific steps are as follows:
[0211] 1) Gradient annealing, the specific steps include:
[0212] a) Dilute each primer to 100 μM with ddH2O (OAB buffer) according to the molar amount printed on the tube label, then mix 20 μl of each solution.
[0213] Primers for the short sequence protocol are shown in Table 15:
[0214] Table 15:
[0215]
[0216] Primers for long sequence schemes are shown in Table 16:
[0217] Table 16:
[0218]
[0219] b) Place the samples in a PCR instrument for the annealing reaction;
[0220] c) After the reaction is complete, store at -20℃ and label them as pre-Mix-S and pre-Mix-L57, respectively.
[0221] 2) dNTP extension, the specific steps include:
[0222] a) Take pre-Mix-S and pre-Mix-L57 separately, add the reagents shown in Table 17, and mix well by pipetting;
[0223] Table 17:
[0224]
Reagent | Volume
pre-Mix-S / pre-Mix-L57 | 35 μl
10× Blue buffer | 5 μl
dNTP (25 mM each) | 5 μl
Klenow (3'→5' exo-) (5 U/μl) | 5 μl
Total | 50 μl
[0225] b) Incubate at 37℃ for 1 hour;
[0226] c) Purify with alcohol, then dissolve in 20 μl of ddH2O;
[0227] d) Store at -20℃ and label them as pre-Adapter-S and pre-Adapter-L57, respectively.
[0228] 3) HphI enzyme digestion, the specific steps include:
[0229] a) Mix the systems shown in Table 18 and Table 19 respectively;
[0230] Table 18:
[0231]
[0232] Table 19:
[0233]
[0234] b) Incubate at 37℃ for 16 hours, then incubate at 65℃ for 20 minutes to inactivate.
[0235] 4) Alcohol purification: dissolve in 30 μl of ddH2O. Take 1 μl and dilute it for 2100 analysis (see Figures 6a and 6b);
[0236] 5) After the reaction is complete, store at -20℃ and label them as Adapter-S and Adapter-L, respectively.
[0237] (2) Ligate the prepared UMI adapter;
[0238] (3) Magnetic bead purification
[0239] 7. PCR enrichment (this step is omitted for the long-sequence enzyme digestion protocol, i.e., PCR-Free), followed by magnetic bead purification.
[0240] 8. Library quality control and pooling: the libraries were submitted for 2100 analysis (see Figures 7a, 7b, 7c, and 7d) and qPCR quantification. The qPCR quantification results are shown in Table 20. The libraries were then pooled and were ready for sequencing.
[0241] Table 20:
[0242]
[0243] 9. PE sequencing
[0244] 10. Data Analysis
[0245] Due to space limitations, only the samples obtained with the long-sequence enzyme digestion protocol are used as examples below.
[0246] 1) Preprocess the PE90 data from the HiSeq 2500 sequencing platform and extract the UMI sequences;
[0247] 2) Remove the primer sequences and align the reads with BWA (v0.5.9-r16);
[0248] 3) Process the alignment results and compile statistics. The cumulative depth distribution and the depth distribution of the samples are shown in Figures 8 and 9. Due to space limitations, only the results of UMI-LT57-1 are presented.
[0249] 4) Sort the processed alignment results with samtools (v0.1.16);
[0250] 5) Construct the single-strand consensus sequences. The distribution of the UMI sequence families for this sample is shown in Figure 10. Due to space limitations, only the results of UMI-LT57-1 are presented.
[0251] 6) Sort the single-strand consensus sequences with samtools (v0.1.16);
[0252] 7) Construct the duplex consensus sequences. The result is stored in SAM format; a screenshot of the result is shown in Figure 11. Due to space limitations, only the results of UMI-LT57-1 are presented.
[0253] 8) Sort, filter, and sort again with samtools (v0.1.16);
[0254] 9) Perform local realignment with GATK (v2.4-9);
[0255] 10) Mutation information analysis. The statistical results are shown in Tables 21-24. Due to space limitations, only the region containing the preset mutation sites is shown.
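The UMI extraction and the two consensus steps in the list above can be illustrated with a toy sketch. It is not the pipeline of the invention, which works on BWA alignments with samtools and GATK. The assumptions made here are: the tag occupies the first 16 bases of a read and follows the pattern HHHTAHHTAHHHTAHH given in claim 1, reads in a family are equal-length and already in the same orientation, and the cutoffs `min_agree` and `min_family_size` are arbitrary illustrative values.

```python
import re
from collections import Counter, defaultdict

# Tag layout from claim 1 (HHHTAHHTAHHHTAHH, H = A/T/C) written as a regex.
TAG_RE = re.compile(r"[ACT]{3}TA[ACT]{2}TA[ACT]{3}TA[ACT]{2}")
TAG_LEN = 16  # assumption: the tag occupies the first 16 bases of a read


def split_umi(read: str):
    """Return (umi, insert), or None if the read does not start with a valid tag."""
    umi, insert = read[:TAG_LEN], read[TAG_LEN:]
    return (umi, insert) if TAG_RE.fullmatch(umi) else None


def consensus(seqs, min_agree: float = 0.7) -> str:
    """Column-wise majority vote; 'N' where no base reaches min_agree."""
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_agree else "N")
    return "".join(out)


def single_strand_consensus(reads, min_family_size: int = 3) -> dict:
    """Group reads by UMI and collapse each large-enough family to one sequence."""
    families = defaultdict(list)
    for read in reads:
        parts = split_umi(read)
        if parts:
            families[parts[0]].append(parts[1])
    return {umi: consensus(members)
            for umi, members in families.items()
            if len(members) >= min_family_size}


def duplex_consensus(forward_sscs: str, reverse_sscs: str) -> str:
    """Keep a base only where both strand consensuses agree."""
    return "".join(f if f == r else "N" for f, r in zip(forward_sscs, reverse_sscs))


tag = "ACTTAACTAACTTAAC"
reads = [tag + "ACGT"] * 3 + [tag + "ACTT"]  # one read carries an error
print(single_strand_consensus(reads))
print(duplex_consensus("ACGT", "ACTT"))
```

In this toy input the single erroneous read is outvoted within its family, and a base on which the two strand consensuses disagree is masked as N in the duplex consensus.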
[0256] Table 21: Results of UMI-LT57-1 Mutation Information Analysis
[0257]
Chr | Ref | Pos | Total_Depth | Eff_Depth | Total_Mut | A_Mut_Fre | T_Mut_Fre | C_Mut_Fre | G_Mut_Fre
D_ref | C | 243 | 22612 | 22546 | 0 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 0->0.0000
D_ref | A | 244 | 22615 | 22450 | 2 | 0->0.0000 | 0->0.0000 | 1->0.0000 | 1->0.0000
D_ref | T | 245 | 22616 | 22410 | 1 | 1->0.0000 | 0->0.0000 | 0->0.0000 | 0->0.0000
D_ref | G | 246 | 22617 | 22550 | 2 | 0->0.0000 | 1->0.0000 | 0->0.0000 | 0->0.0000
D_ref | A | 247 | 22620 | 22416 | 18128 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 18128->0.8087
D_ref | C | 248 | 22621 | 22533 | 0 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 0->0.0000
D_ref | A | 249 | 22612 | 22296 | 2 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 1->0.0000
D_ref | G | 250 | 22498 | 22440 | 0 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 0->0.0000
D_ref | C | 251 | 22403 | 22123 | 17802 | 0->0.0000 | 17802->0.8047 | 0->0.0000 | 0->0.0000
D_ref | G | 252 | 22393 | 22180 | 17846 | 0->0.0000 | 17845->0.8046 | 1->0.0000 | 0->0.0000
D_ref | C | 253 | 22391 | 22335 | 0 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 0->0.0000
[0258] Note: All preset mutation sites were detected: A247G, C251T, and G252T (positions 247, 251, and 252 in the table). The column headings have the following meanings: Chr is the reference sequence identifier; Ref is the reference base; Pos is the position on the reference sequence; Total_Depth is the total depth; Eff_Depth is the effective depth; Total_Mut is the total number of mutated bases. A_Mut_Fre, T_Mut_Fre, C_Mut_Fre, and G_Mut_Fre each give, in the form count->frequency, the number of bases mutated to A, T, C, or G respectively and the proportion of that number in the effective depth. Tables 22 to 24 use the same layout.
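To make the count->frequency notation concrete, the sketch below parses one row of Table 21 and recomputes the frequency as count divided by effective depth. The `parse_row` helper and its field names are introduced here for illustration; the row itself is copied from the table.

```python
def parse_row(line: str) -> dict:
    """Parse one table row; accepts space-separated or '|'-separated fields."""
    fields = line.replace("|", " ").split()
    calls = {}
    # The four mutation columns appear in the order A, T, C, G.
    for base, cell in zip("ATCG", fields[6:10]):
        count, freq = cell.split("->")
        calls[base] = (int(count), float(freq))
    return {
        "chr": fields[0], "ref": fields[1], "pos": int(fields[2]),
        "total_depth": int(fields[3]), "eff_depth": int(fields[4]),
        "total_mut": int(fields[5]), "calls": calls,
    }


row = parse_row("D_ref A 247 22620 22416 18128 "
                "0->0.0000 0->0.0000 0->0.0000 18128->0.8087")
count, reported = row["calls"]["G"]
recomputed = round(count / row["eff_depth"], 4)
print(row["ref"], row["pos"], count, reported, recomputed)
```

For this row the recomputed value agrees with the reported 0.8087, which confirms that the frequency is expressed as a proportion of the effective depth rather than of the total depth.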
[0259] Table 22: Results of UMI-LT57-2 Mutation Information Analysis
[0260]
Chr | Ref | Pos | Total_Depth | Eff_Depth | Total_Mut | A_Mut_Fre | T_Mut_Fre | C_Mut_Fre | G_Mut_Fre
D_ref | C | 243 | 12877 | 12827 | 0 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 0->0.0000
D_ref | A | 244 | 12878 | 12734 | 0 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 0->0.0000
D_ref | T | 245 | 12878 | 12649 | 1 | 1->0.0001 | 0->0.0000 | 0->0.0000 | 0->0.0000
D_ref | G | 246 | 12880 | 12830 | 0 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 0->0.0000
D_ref | A | 247 | 12884 | 12683 | 591 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 591->0.0466
D_ref | C | 248 | 12884 | 12829 | 0 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 0->0.0000
D_ref | A | 249 | 12885 | 12672 | 0 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 0->0.0000
D_ref | G | 250 | 12884 | 12817 | 0 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 0->0.0000
D_ref | C | 251 | 12882 | 12785 | 587 | 0->0.0000 | 587->0.0459 | 0->0.0000 | 0->0.0000
D_ref | G | 252 | 12882 | 12762 | 587 | 0->0.0000 | 587->0.0460 | 0->0.0000 | 0->0.0000
D_ref | C | 253 | 12882 | 12823 | 0 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 0->0.0000
[0261] Table 23: Results of UMI-LT57-3 Mutation Information Analysis
[0262]
[0263]
[0264] Table 24: Results of UMI-LT57-4 Mutation Information Analysis
[0265]
Chr | Ref | Pos | Total_Depth | Eff_Depth | Total_Mut | A_Mut_Fre | T_Mut_Fre | C_Mut_Fre | G_Mut_Fre
D_ref | C | 243 | 5273 | 5252 | 0 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 0->0.0000
D_ref | A | 244 | 5273 | 5199 | 0 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 0->0.0000
D_ref | T | 245 | 5273 | 5193 | 0 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 0->0.0000
D_ref | G | 246 | 5286 | 5247 | 0 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 0->0.0000
D_ref | A | 247 | 5288 | 5187 | 1 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 1->0.0002
D_ref | C | 248 | 5288 | 5258 | 0 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 0->0.0000
D_ref | A | 249 | 5288 | 5161 | 1 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 1->0.0002
D_ref | G | 250 | 5288 | 5261 | 0 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 0->0.0000
D_ref | C | 251 | 5288 | 5241 | 2 | 0->0.0000 | 2->0.0004 | 0->0.0000 | 0->0.0000
D_ref | G | 252 | 5288 | 5246 | 1 | 0->0.0000 | 1->0.0002 | 0->0.0000 | 0->0.0000
D_ref | C | 253 | 5288 | 5253 | 0 | 0->0.0000 | 0->0.0000 | 0->0.0000 | 0->0.0000
[0266] The analysis results show a strong correlation between the detected mutation frequency and the sample mixing ratio; even at a mixing ratio of 1:10000, the preset mutation sites are correctly detected. Therefore, the UMI sequence designed in this system can detect mutations with a mutation frequency of 0.01%.
[0267] Industrial applicability
[0268] The method of this invention can be effectively applied to the detection and verification of low-frequency mutations, with a detectable mutation frequency as low as 0.01%. It is therefore suitable for the early screening of cancers, neurodegenerative diseases, cardiovascular diseases, and other diseases induced by cumulative mutations in somatic cells, stem cells, and the like.
[0269] Although specific embodiments of the invention have been described in detail, those skilled in the art will understand that various modifications and substitutions can be made to those details based on all the teachings disclosed, and all such changes are within the scope of protection of this invention. The full scope of this invention is given by the appended claims and any equivalents thereof.
[0270] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "illustrative embodiment," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.
Claims
1. A DNA adapter, characterized in that the DNA adapter is obtained by gradient annealing and dNTP extension of primers, the primer sequences being shown in SEQ ID NO: 2 and SEQ ID NO: 9; the DNA adapter includes a DNA tag having the following sequence: HHHTAHHTAHHHTAHH, wherein H represents A, T, or C.
2. The DNA adapter according to claim 1, characterized in that the adapter has a sticky end dT.
3. The DNA adapter according to claim 2, characterized in that it further includes an anchoring sequence, which is formed between the sticky end dT and the DNA tag.
4. The DNA adapter according to claim 3, characterized in that the anchoring sequence has the nucleotide sequence shown in SEQ ID NO: 1.
5. The DNA adapter according to claim 2, characterized in that the sticky end dT is formed at the 3' end of the DNA tag.
6. The DNA adapter according to claim 4, characterized in that the adapter is obtained by sequentially performing gradient annealing, dNTP extension treatment, and alcohol purification with nick repair treatment.
7. The DNA adapter according to claim 2, characterized in that it further includes a restriction enzyme sequence formed at the end of the DNA tag, wherein the restriction enzyme sequence carries a restriction endonuclease recognition site suitable for generating the sticky end dT.
8. The DNA adapter according to claim 7, characterized in that the restriction enzyme sequence is an HphI-specific recognition site.
9. The DNA adapter according to claim 8, characterized in that the adapter is obtained by sequentially performing gradient annealing, dNTP extension, and HphI digestion.
10. Use of the DNA adapter according to any one of claims 1-9 in the preparation of reagents for detecting trace variations.
11. A method for constructing a sequencing library, characterized in that nucleic acid molecules ligated with the DNA adapter of any one of claims 1 to 9 are enriched to obtain the sequencing library.
12. The method according to claim 11, characterized in that the nucleic acid molecules are obtained in the following manner: (1) performing PCR amplification on the nucleic acid sample to be tested to obtain nucleic acid sample fragments; (2) adding an A to the 3' end of the nucleic acid sample fragments; (3) ligating the DNA adapter according to any one of claims 1 to 9 to the nucleic acid sample fragments obtained in step (2) to obtain the nucleic acid molecules ligated with the DNA adapter according to any one of claims 1 to 9.
13. The method according to claim 11, characterized in that the enrichment is achieved through PCR enrichment.
14. The method according to claim 11, characterized in that, prior to the enrichment, the method further includes purifying the nucleic acid molecules ligated with the DNA adapter of any one of claims 1 to 9.
15. The method according to claim 14, characterized in that the purification is performed using magnetic beads.
16. A sequencing library, characterized in that it is obtained by the method described in any one of claims 11 to 15.
17. A sequencing method, characterized in that it includes sequencing the sequencing library described in claim 16 and analyzing the resulting data.
18. The method according to claim 17, characterized in that the sequencing is performed using the HiSeq 2500 platform.