A DNA linker sequence structure combination, reagent, kit and library construction method
By designing a UMI adapter sequence structure combination and a single-stranded DNA library construction method, the invention addresses the problems of adapter stability and multi-sample pooled sequencing in UMI technology, enabling efficient and accurate gene mutation detection, especially of low-frequency mutations.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHENZHEN HUADA GENE INST
- Filing Date
- 2024-12-27
- Publication Date
- 2026-06-30
AI Technical Summary
In existing UMI technology, the double-stranded adapter structure has low stability, which affects sequencing accuracy. The technology is also poorly suited to multi-sample pooled sequencing, because an additional UMI sequencing primer is required for separate detection.
A UMI sequence structure combination was designed, including an adapter strand and a complementary auxiliary strand. A stable double-stranded DNA structure was formed by tandemly connecting random bases with specific base sequences. Combined with a single-stranded DNA molecular library construction method, a one-step ligation was performed using T4 DNA ligase, and a ligation reaction enhancer was added to improve efficiency.
It improves sequencing accuracy and compatibility with multi-sample pooled sequencing, allows single-base errors to be corrected, simplifies the workflow, and shortens ligation time.
Abstract
Description
Technical Field
[0001] This invention relates to the field of biotechnology, and in particular to a DNA adapter sequence structure combination, reagents, kits, and library construction method.
Background Technology
[0002] With the development of next-generation sequencing technology, research in the field of biotechnology is expanding to a wider scope and greater depth. In studies such as whole-genome sequencing and targeted capture, high-depth sequencing can improve detection sensitivity, thereby obtaining more accurate and comprehensive genetic information, which is of great significance for the study of gene mutation-related diseases, especially cancer. However, higher sequencing depth means more repetitive sequence data. To reduce systematic errors introduced during library preparation and sequencing, unique molecular identifiers (UMIs) are introduced during library construction. These UMIs can correctly distinguish the original molecular sequences from the background noise generated during library construction and sequencing.
[0003] Unique molecular identifiers (UMIs) are typically short, completely random nucleotide sequences, or short sequences of random nucleotides joined to specific nucleotides. Common lengths range from 4 to 15 nt and can be adjusted to the needs of the study. UMIs are generally introduced during library construction by methods such as single-stranded ligation, double-stranded ligation, or polymerase chain reaction (PCR) amplification. Like a molecular barcode, a UMI attaches a unique serial number to each original DNA molecule in a sample. Reads that carry the same UMI are duplicates of the same original molecule produced by PCR amplification or sequencing. UMIs can therefore trace duplicate sequences and accurately quantify the number of starting molecules, while reducing errors generated during library construction and sequencing as well as the heterogeneity caused by PCR amplification. This allows DNA fragments from the same source to be distinguished in high-throughput sequencing data, further improving the accuracy of single-molecule sequences. In data analysis, aligning and jointly analyzing multiple sequence fragments derived from the same source molecule allows single-base sequencing errors to be corrected, improving the accuracy of single-base sequencing.
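The error-correction idea in this paragraph can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis pipeline of the invention: it assumes that reads sharing a UMI are already aligned and of equal length, and the function name `umi_consensus` and the toy reads are hypothetical.

```python
from collections import Counter, defaultdict


def umi_consensus(tagged_reads):
    """Collapse (umi, sequence) pairs into one consensus sequence per UMI.

    Assumes every read sharing a UMI has the same length and is already aligned.
    """
    groups = defaultdict(list)
    for umi, seq in tagged_reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        # A majority vote at each position removes isolated sequencing errors.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


reads = [
    ("ACGT", "TTGACCA"),
    ("ACGT", "TTGACCA"),
    ("ACGT", "TTGTCCA"),  # toy single-base error at the fourth position
    ("GGCA", "TTGACCA"),
]
result = umi_consensus(reads)
print(result)       # {'ACGT': 'TTGACCA', 'GGCA': 'TTGACCA'}
print(len(result))  # 2 starting molecules
```

Counting the distinct UMIs gives the estimate of starting molecules, and the per-position vote is what corrects the single-base error in the third read.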
[0004] In previous UMI techniques, the UMI typically sits in the middle region of the double-stranded adapter structure and is not directly connected to the target region. It must therefore be detected and read separately with an additional UMI sequencing primer, which makes the approach unsuitable for pooled sequencing of samples with different sequencing requirements. Furthermore, when the UMI contains many random bases, the double-stranded structure of the UMI adapter is relatively unstable, and the lack of an anchoring sequence can impair the effectiveness of the UMI.
Summary of the Invention
[0005] In view of this, the present invention provides a DNA adapter sequence structure combination, reagents, kits, and library construction method. The present invention provides a novel UMI sequence structure: a complete UMI adapter in which the UMI is joined in tandem with a specific anchoring sequence, that is, random bases combined in tandem with specific base sequences or tandem repeat sequences. It is better suited to multi-sample pooled sequencing, and alternative solutions can be formed by changing the structure and length of the UMI.
[0006] The present invention provides reagents and methods for constructing single-stranded DNA molecular libraries using the UMI sequence structure combination. An alternative solution for constructing double-stranded DNA molecular libraries can likewise be formed using the UMI sequence structure combination of the present invention.
[0007] This invention can be applied to technical fields such as gene mutation detection, especially low-frequency mutation detection, and can improve detection accuracy. In data analysis, by uniformly comparing and analyzing multiple DNA molecular sequence fragments from the same source, single-base errors caused by sequencing can be effectively corrected, improving the accuracy of single-base sequencing.
[0008] To achieve the above-mentioned objectives, the present invention provides the following technical solution:
[0009] In a first aspect, the present invention provides a UMI connector, comprising a connector chain and a complementary auxiliary chain;
[0010] The connector chain includes a universal sequence and a first UMI;
[0011] The complementary auxiliary strand comprises a random base sequence, a second UMI, and a universal complementary sequence;
[0012] The universal sequence binds to the primers for PCR amplification;
[0013] The anchoring sequence or the anchoring complementary sequence is each an independent combination of fixed base sequences of 3-15 nt;
[0014] The first UMI and the second UMI each independently contain 3-15 nt of random bases;
[0015] The random base sequence is 1 to 10 nt in length and is used to bind to the single-stranded target sequence, assisting the linker strand in connecting with the single-stranded target sequence.
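For a sense of scale, the number of distinct tags available from 3-15 nt of random bases can be worked out directly. The short Python sketch below is illustrative only; `umi_space` is a hypothetical helper name, and it assumes all four bases are equally allowed at every position.

```python
def umi_space(length):
    """Number of distinct random-base UMIs of a given length (4 bases per position)."""
    return 4 ** length


for n in (3, 8, 15):
    print(n, umi_space(n))
# 3 64
# 8 65536
# 15 1073741824

# Tagging both ends of a molecule multiplies the two spaces together.
print(umi_space(8) * umi_space(8))  # 4294967296
```

Longer UMIs give more distinct tags, but as the background section notes, a long random stretch also makes the duplex less stable, which is the motivation for the anchoring sequence.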
[0016] In some specific embodiments of the present invention, the connector chain further comprises one or more of a modifying group or an anchoring sequence;
[0017] The complementary auxiliary chain further includes one or more of the modifying groups or anchoring complementary sequences;
[0018] In some specific embodiments of the present invention, the anchoring complementary sequence provides complementary pairing sites for the anchoring sequence of the connector chain, and the anchoring sequence and the anchoring complementary sequence together maintain the stability of the UMI connector double-strand structure and ensure the connection efficiency of the single-strand target sequence and the double-strand structure of the UMI connector.
[0019] In some specific embodiments of the present invention, the modifying group blocks the self-linking of the connector;
[0020] Preferably, the modifying group includes one or more of amino modification, biotin modification, or thio modification.
[0021] This invention provides a UMI connector, including a connector chain and a complementary auxiliary chain;
[0022] The linker chain comprises a modifying group, a universal sequence, a first UMI, and an anchoring sequence;
[0023] The complementary auxiliary chain comprises a modifying group, a random base sequence, an anchored complementary sequence, a second UMI, a universal complementary sequence, and a modifying group;
[0024] The universal sequence binds to the primers for PCR amplification;
[0025] The anchoring sequence or the anchoring complementary sequence is each an independent combination of fixed base sequences of 3-15 nt;
[0026] The first UMI and the second UMI each independently contain 3-15 nt of random bases;
[0027] The random base sequence is 1 to 10 nt in length and is used to bind to the single-stranded target sequence, assisting the linker strand in connecting with the single-stranded target sequence.
[0028] The anchoring sequence is a specific base sequence combination that is tandemly linked with random bases to form an overall UMI adapter combination. The specific base sequence combination fully considers the balance among the four bases, avoiding sequencing quality problems caused by base imbalance during the sequencing process.
[0029] The main function of the complementary auxiliary chain is to assist in the connection.
[0030] In some specific embodiments of the present invention, the universal sequence includes a structural sequence that matches the adapter sequence used by the sequencing platform;
[0031] Preferably, the sequencing platform includes MGI or Illumina.
[0032] In some specific embodiments of the present invention, from the 5' end to the 3' end, the connector chain of the UMI connector at the 5' end includes a modifying group, a universal sequence, a UMI, and an anchoring sequence (As);
[0033] From the 5' end to the 3' end, the complementary auxiliary chain of the UMI connector at the 5' end includes a modifying group, a random base sequence, anchoring complementary sequences (ACs), a UMI, a universal complementary sequence, and a modifying group.
[0034] The connecting chain and the complementary auxiliary chain of the 5' end UMI connector are complementary and paired to form a double-chain UMI connector assembly. It should be noted that adding non-complementary base sequences to the connecting chain and the complementary auxiliary chain can form a Y-type UMI connector assembly or a hairpin-type UMI connector assembly.
[0035] In some specific embodiments of the present invention, the universal sequence has:
[0036] (I) A nucleotide sequence as shown in SEQ ID No. 1; or
[0037] (II) A nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotide sequences to the nucleotide sequence shown in (I), and which has the same or similar function to the nucleotide sequence shown in (I); or
[0038] (III) A nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II); and / or
[0039] The anchoring sequence has:
[0040] (I) A nucleotide sequence such as 5'-CATCAT-3', 5'-ATCATC-3', 5'-GGGGGG-3' and / or 5'-TCATCA-3'; or
[0041] (II) A nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotide sequences to the nucleotide sequence shown in (I), and which has the same or similar function to the nucleotide sequence shown in (I); or
[0042] (III) A nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II);
[0043] The first UMI and anchoring sequence also include combinations of tandem repeating sequences;
[0044] The universal complementary sequence has:
[0045] (I) A nucleotide sequence as shown in SEQ ID No. 2; or
[0046] (II) A nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotide sequences to the nucleotide sequence shown in (I), and which has the same or similar function to the nucleotide sequence shown in (I); or
[0047] (III) A nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II); and / or
[0048] The anchored complementary sequence has:
[0049] (I) A nucleotide sequence such as 5'-ATGATG-3', 5'-GATGAT-3', 5'-CCCCCC-3' and / or 5'-TGATGA-3'; or
[0050] (II) A nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotide sequences to the nucleotide sequence shown in (I), and which has the same or similar function to the nucleotide sequence shown in (I); or
[0051] (III) A nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II);
[0052] The second UMI and the anchored complementary sequence also include combinations of tandem repeat sequences.
[0053] In some specific embodiments of the present invention, the universal sequence is 5'-GAACGACATGGCTACGATCCGACTT-3' (SEQ ID No. 1).
[0054] In some specific embodiments of the present invention, the anchoring sequence is:
[0055] 5'-CATCAT-3',
[0056] 5'-ATCATC-3',
[0057] 5'-GGGGGG-3' and / or
[0058] 5'-TCATCA-3'.
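The base-balance property claimed for the anchoring sequences can be checked for the four sequences listed above. The Python sketch below is illustrative and assumes the four anchors are used together in equal amounts; the helper name `per_position_composition` is hypothetical.

```python
from collections import Counter

ANCHORS_5P = ["CATCAT", "ATCATC", "GGGGGG", "TCATCA"]


def per_position_composition(seqs):
    """Return one Counter of bases for each position across equal-length sequences."""
    return [Counter(column) for column in zip(*seqs)]


composition = per_position_composition(ANCHORS_5P)
for position, counts in enumerate(composition, start=1):
    print(position, dict(sorted(counts.items())))

# Balanced means every position shows A, C, G and T the same number of times.
balanced = all(
    set(counts) == set("ACGT") and len(set(counts.values())) == 1
    for counts in composition
)
print(balanced)  # True
```

Under that equal-mixing assumption, each sequencing cycle across the anchor region sees all four bases in equal proportion.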
[0059] In some specific embodiments of the present invention, the universal complementary sequence is: 5'-AAGTCGGATCGTAGCCATGTCGTTC-3' (SEQ ID No. 2).
[0060] In some specific embodiments of the present invention, the anchoring complementary sequence is:
[0061] 5'-ATGATG-3',
[0062] 5'-GATGAT-3',
[0063] 5'-CCCCCC-3' and / or
[0064] 5'-TGATGA-3'.
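Each anchoring complementary sequence listed above is the reverse complement of one anchoring sequence, and SEQ ID No. 2 is the reverse complement of SEQ ID No. 1. A minimal Python check, with `reverse_complement` as a hypothetical helper written for this illustration:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Forward sequence -> complementary sequence as listed in the text.
PAIRS = {
    "CATCAT": "ATGATG",
    "ATCATC": "GATGAT",
    "GGGGGG": "CCCCCC",
    "TCATCA": "TGATGA",
    # SEQ ID No. 1 and SEQ ID No. 2
    "GAACGACATGGCTACGATCCGACTT": "AAGTCGGATCGTAGCCATGTCGTTC",
}

for forward, listed in PAIRS.items():
    assert reverse_complement(forward) == listed, (forward, listed)
print("all listed pairs are reverse complements")
```

Because both strands are written 5' to 3', the complement has to be reversed before it is compared with the listed sequence.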
[0065] In some specific embodiments of the present invention, the connector chain of the UMI connector at the 5' end has:
[0066] (I) A nucleotide sequence as shown in any of SEQ ID Nos. 5–10; or
[0067] (II) A nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotide sequences to the nucleotide sequence shown in (I), and which has the same or similar function to the nucleotide sequence shown in (I); or
[0068] (III) A nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II); and / or
[0069] The complementary auxiliary chain of the UMI connector at the 5' end has:
[0070] (I) A nucleotide sequence as shown in any of SEQ ID Nos. 11–16; or
[0071] (II) A nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotide sequences to the nucleotide sequence shown in (I), and which has the same or similar function to the nucleotide sequence shown in (I); or
[0072] (III) A nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology with the nucleotide sequence described in (I) or (II).
[0073] In some specific embodiments of the present invention, the connector chain further comprises phosphate groups;
[0074] From the 5' end to the 3' end, the connector chain of the UMI connector at the 3' end includes a phosphate group, an anchoring sequence (As), UMI, a universal sequence, and a modifying group;
[0075] The phosphate group is used to react with the 3' end hydroxyl group of the single-chain target sequence to form a phosphodiester bond;
[0076] From the 5' end to the 3' end, the complementary auxiliary chain of the UMI connector at the 3' end includes a modifying group, a universal complementary sequence, a UMI, anchoring complementary sequences (ACs), a random base sequence, and a modifying group.
[0077] The connecting chain and complementary auxiliary chain of the 3' UMI connector are complementary and paired to form a double-stranded UMI connector assembly. It should be noted that adding non-complementary base sequences to the connecting chain and complementary auxiliary chain can form Y-type or hairpin-type UMI connector assemblies.
[0078] In some specific embodiments of the present invention, the universal sequence has:
[0079] (I) A nucleotide sequence as shown in SEQ ID No. 3; or
[0080] (II) A nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotide sequences to the nucleotide sequence shown in (I), and which has the same or similar function to the nucleotide sequence shown in (I); or
[0081] (III) A nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II); and / or
[0082] The anchoring sequence has:
[0083] (I) A nucleotide sequence such as 5'-TACTAC-3', 5'-CTACTA-3', 5'-GGGGGG-3' and / or 5'-ACTACT-3'; or
[0084] (II) A nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotide sequences to the nucleotide sequence shown in (I), and which has the same or similar function to the nucleotide sequence shown in (I); or
[0085] (III) A nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II);
[0086] The first UMI and anchoring sequence also include combinations of tandem repeating sequences;
[0087] The universal complementary sequence has:
[0088] (I) A nucleotide sequence as shown in SEQ ID No. 4; or
[0089] (II) A nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotide sequences to the nucleotide sequence shown in (I), and which has the same or similar function to the nucleotide sequence shown in (I); or
[0090] (III) A nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II); and / or
[0091] The anchoring complementary sequence has:
[0092] (I) A nucleotide sequence such as 5'-GTAGTA-3', 5'-TAGTAG-3', 5'-CCCCCC-3' and / or 5'-AGTAGT-3'; or
[0093] (II) A nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotide sequences to the nucleotide sequence shown in (I), and which has the same or similar function to the nucleotide sequence shown in (I); or
[0094] (III) A nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II);
[0095] The second UMI and the anchored complementary sequence also include combinations of tandem repeat sequences.
[0096] In some specific embodiments of the present invention, the universal sequence is:
[0097] 5'-AAGTCGGAGGCCAAGCGGTCTT-3' (SEQ ID No. 3).
[0098] In some specific embodiments of the present invention, the anchoring sequence is:
[0099] 5'-TACTAC-3',
[0100] 5'-CTACTA-3',
[0101] 5'-GGGGGG-3' and / or
[0102] 5'-ACTACT-3'.
[0103] In some specific embodiments of the present invention, the universal complementary sequence is:
[0104] 5'-AAGACCGCTTGGCCTCCGACTT-3' (SEQ ID No. 4).
[0105] In some specific embodiments of the present invention, the anchoring complementary sequence is:
[0106] 5'-GTAGTA-3',
[0107] 5'-TAGTAG-3',
[0108] 5'-CCCCCC-3' and / or
[0109] 5'-AGTAGT-3'.
[0110] In some specific embodiments of the present invention, the connector chain of the UMI connector at the 3' end has:
[0111] (I) A nucleotide sequence as shown in any of SEQ ID Nos. 17–22; or
[0112] (II) A nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotide sequences to the nucleotide sequence shown in (I), and which has the same or similar function to the nucleotide sequence shown in (I); or
[0113] (III) A nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II); and / or
[0114] The complementary auxiliary chain of the UMI connector at the 3' end has:
[0115] (I) A nucleotide sequence as shown in any of SEQ ID Nos. 23–28; or
[0116] (II) A nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotide sequences to the nucleotide sequence shown in (I), and which has the same or similar function to the nucleotide sequence shown in (I); or
[0117] (III) A nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology with the nucleotide sequence described in (I) or (II).
[0118] In a second aspect, the present invention also provides a reagent comprising the aforementioned UMI connector.
[0119] In a third aspect, the present invention also provides a kit comprising the UMI connector or the reagent.
[0120] In a fourth aspect, the present invention also provides the use of any of the following in amplification, sequencing and / or library construction, including but not limited to one or more of gene mutation detection, DNA detection, RNA detection or ancient DNA detection:
[0121] (I) The aforementioned UMI connector;
[0122] (II) The reagents described above; or
[0123] (III) The kit described above.
[0124] In a fifth aspect, the present invention also provides a method for constructing a DNA molecular library, which uses a one-step method to introduce UMIs at one or both ends of a single-stranded DNA molecule by means of any of the following, with the addition of a ligation reaction enhancer:
[0125] (I) The aforementioned UMI connector;
[0126] (II) The reagents described above; or
[0127] (III) The kit described above.
[0128] In some specific embodiments of the present invention, the principle of splint ligation is used: T4 DNA ligase ligates the single-stranded DNA molecule to the double-stranded UMI adapter, thereby introducing the UMI.
[0129] Compared with the prior art, the present invention has made the following technical improvements:
[0130] First, this invention designs a novel UMI sequence structure and introduces the concept of a complete UMI adapter in which a UMI is joined in tandem with a specific anchoring sequence, that is, random bases combined in tandem with specific base sequences or tandem repeat sequences. The design considers both the balance between bases during sequencing and the stability of the double-stranded UMI adapter structure. UMIs can trace duplicate sequences and accurately quantify the number of starting molecules, while reducing errors generated during library preparation and sequencing as well as the heterogeneity caused by PCR amplification. This enables DNA fragments from the same source to be distinguished in high-depth, high-throughput sequencing data, further improving the accuracy of single-molecule sequences. In data analysis, aligning and jointly analyzing multiple sequence fragments derived from the same source molecule allows single-base sequencing errors to be corrected, improving the accuracy of single-base sequencing.
[0131] Second, this invention combines a unique molecular tag (UMI) sequence combination with reagents and methods for constructing single-stranded DNA molecular libraries to introduce a UMI at one or both ends of a single-stranded DNA molecule, thereby improving the sensitivity and accuracy of genomic DNA detection. The UMI adapter of this invention is a double-stranded DNA adapter structure, including a 5' UMI adapter and a 3' UMI adapter, each composed of an adapter strand and a complementary auxiliary strand. The adapter strand includes a modifying group, a universal sequence, a UMI, and an anchoring sequence. The anchoring sequence is a combination of specific base sequences joined in tandem with the random bases to form the overall UMI adapter combination. This combination takes full account of the balance among the four bases, avoiding sequencing quality problems caused by base imbalance during sequencing. The main function of the complementary auxiliary strand is to assist ligation; its structure therefore includes a modifying group, a random base sequence, an anchoring complementary sequence, a UMI, and a universal complementary sequence. The random base sequence is 1 to 10 nt long; it binds to the single-stranded target sequence and helps the UMI adapter strand ligate to it. The anchoring complementary sequence provides complementary pairing sites for the anchoring sequence of the adapter strand. Together, they maintain the stability of the double-stranded UMI adapter structure and ensure efficient ligation between the single-stranded DNA molecule and the double-stranded UMI adapter.
[0132] Third, this invention provides a set of double-stranded UMI adapter sequences. Using the principle of splint ligation, UMIs are introduced at one or both ends of single-stranded DNA molecules with T4 DNA ligase in a one-step reaction. Some single-stranded DNA library construction techniques use T4 RNA ligase, which is time-consuming and far less efficient than T4 DNA ligase. Others introduce UMIs stepwise in separate reactions, which is complex and time-consuming. This invention introduces UMIs at one or both ends of single-stranded DNA molecules in one step and simultaneously adds a ligation reaction enhancer, such as hexaamminecobalt(III) chloride, which can significantly improve reaction efficiency and greatly shorten the ligation reaction time.
Attached Figure Description
[0133] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the accompanying drawings used in the description of the embodiments or the prior art will be briefly introduced below.
[0134] Figure 1 shows a schematic diagram of the overall UMI connector sequence assembly;
[0135] Figure 2 shows the resolution of library samples for two sets of UMI connector sequence combinations;
[0136] Figure 3 shows the sequence repetition rate of library samples for two sets of UMI connector sequence combinations and for a non-UMI connector;
[0137] Figure 4 shows the matching rate of library samples for two sets of UMI connector sequence combinations and for a non-UMI connector.
Detailed Implementation
[0138] This invention discloses a DNA adapter sequence structure combination, reagents, kits, and a library construction method. Those skilled in the art can refer to this document and appropriately modify the process parameters to achieve the desired results. It should be noted in particular that all similar substitutions and modifications are obvious to those skilled in the art and are considered to be included in this invention. The methods and applications of this invention have been described through preferred embodiments. Those skilled in the art can modify, or appropriately change and combine, the methods and applications described herein without departing from the content, spirit, and scope of this invention in order to implement and apply the technology of this invention.
[0139] This invention designs a novel UMI sequence structure and introduces the concept of a complete UMI adapter in which a UMI is joined in tandem with a specific anchoring sequence, that is, random bases combined in tandem with specific base sequences or tandem repeat sequences. The design ensures both base balance during sequencing and the stability of the double-stranded UMI adapter structure. The UMI sequence combination is used together with reagents and methods for constructing single-stranded DNA libraries to introduce UMIs at one or both ends of single-stranded DNA molecules, improving the sensitivity and accuracy of genomic DNA detection. First, the DNA sample is denatured into single-stranded DNA molecules at high temperature. During denaturation, a thermostable single-stranded DNA-binding protein is added to prevent the strands from reannealing into a double-stranded structure, keeping the single-stranded DNA stable. Next, T4 DNA ligase ligates double-stranded UMI adapters to both ends of the single-stranded DNA molecule simultaneously, introducing the universal adapter sequence, the UMI, and the anchoring sequence combination. The universal adapter sequence binds the PCR primers, while the anchoring sequence combination balances the sequenced bases and stabilizes the double-stranded UMI adapter structure. Barcode sequences for sample identification and structural sequences for the sequencing reaction are then introduced by PCR amplification. Library samples carrying different barcode sequences are mixed at equal concentrations to form a single sequencing sample, which is read on a gene sequencer to achieve high-throughput sequencing detection.
[0140] This invention provides a set of reagents for the phosphorylation and denaturation of DNA samples. T4 polynucleotide kinase (T4 PNK) is used to phosphorylate DNA molecules lacking a phosphate group at the 5' end, and Tris-HCl is added to buffer the pH of the reaction system. At the same time, the DNA sample is denatured into single-stranded DNA molecules at high temperature. During denaturation, a thermostable single-stranded DNA-binding protein is added to prevent the single-stranded DNA molecules from annealing back into a double-stranded structure, which keeps the single-stranded DNA stable and improves its ligation efficiency.
[0141] This invention provides a complete UMI adapter in which a UMI is joined in tandem with a specific anchoring sequence. The UMI adapter is a double-stranded DNA structure comprising a 5' UMI adapter and a 3' UMI adapter, each composed of an adapter strand and a complementary auxiliary strand. The adapter strand includes a modifying group, a universal sequence, a UMI, and an anchoring sequence. The universal sequence is used for primer binding in the PCR amplification step. The anchoring sequence is a combination of specific base sequences joined in tandem with the random bases to form the complete UMI adapter. This combination takes full account of the balance among the four bases, avoiding sequencing quality problems caused by base imbalance during sequencing. The primary function of the complementary auxiliary strand is to assist ligation; its structure therefore includes modifying groups, a random base sequence, an anchoring complementary sequence, a UMI, and a universal complementary sequence. The random base sequence is 1-10 nt long; it binds to the single-stranded target sequence and helps the UMI adapter strand ligate to it. The anchoring complementary sequence provides complementary pairing sites for the anchoring sequence of the adapter strand. Together, they maintain the stability of the double-stranded UMI adapter structure and ensure efficient ligation of the single-stranded DNA molecule to the double-stranded UMI adapter. A schematic diagram of the overall UMI adapter sequence structure is shown in Figure 1.
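Once such a library is sequenced, each read has to be split back into its tag and its insert. The Python sketch below is illustrative only. It assumes, which the text does not state explicitly, that the read begins at the UMI and is followed directly by one of the 5' anchoring sequences and then the target insert. The function name `split_read` and the 8 nt UMI length are hypothetical choices within the 3-15 nt range.

```python
ANCHORS = ("CATCAT", "ATCATC", "GGGGGG", "TCATCA")


def split_read(read, umi_len=8, anchors=ANCHORS):
    """Split a read into (umi, anchor, insert); return None if no anchor follows the UMI."""
    umi, rest = read[:umi_len], read[umi_len:]
    for anchor in anchors:
        if rest.startswith(anchor):
            return umi, anchor, rest[len(anchor):]
    return None


print(split_read("ACGTTGCA" + "CATCAT" + "GGATCCTTAGC"))
# ('ACGTTGCA', 'CATCAT', 'GGATCCTTAGC')
print(split_read("ACGTTGCA" + "AAAAAA" + "GGATCCTTAGC"))
# None
```

Because the anchor is a fixed sequence at a known offset, it doubles as a sanity check: reads without a recognizable anchor can be set aside rather than assigned a wrong UMI.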
[0142] From the 5' end to the 3' end, the adapter strand of the 5' end UMI adapter consists of a modifying group, a universal sequence, the UMI, and an anchoring sequence (As). The modifying group can be an amino modification, a biotin modification, a thio modification, etc., and is used to block adapter self-ligation. The universal sequence is used for primer binding in the PCR amplification step and can be a structural sequence matching the adapter sequence of the relevant sequencing platform, such as MGI or Illumina. For MGI, the preferred structural sequence is 5'-GAACGACATGGCTACGATCCGACTT-3'. Those skilled in the art can provide structural sequences matching the adapter sequences of other platforms based on the principles provided by this invention. This invention is not limited thereto; any feasible sequences and sequence combinations obtained by substitution, deletion, or addition are within the scope of protection of this invention. The UMI consists of 3-15 nt of random bases; As consists of a combination of fixed base sequences of 3-15 nt. To balance the sequenced bases, As is preferably a combination of 5'-CATCAT-3', 5'-ATCATC-3', 5'-GGGGGG-3', and 5'-TCATCA-3', but alternatives may be formed from other balanced base combinations. This invention is not limited thereto, and any feasible sequences and sequence combinations obtained by substitution, deletion, or addition are within the scope of protection of this invention.
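The layout of the 5' adapter strand described in this paragraph can be written out as a sequence template. The Python sketch below is illustrative only: the modifying group is a chemical modification and is left out, the random bases are written as the IUPAC symbol N, and the helper names and the default 8 nt UMI length are hypothetical.

```python
import random

UNIVERSAL_MGI = "GAACGACATGGCTACGATCCGACTT"  # preferred MGI sequence from the text
ANCHORS = ("CATCAT", "ATCATC", "GGGGGG", "TCATCA")


def adapter_strand_template(anchor, umi_len=8):
    """Degenerate template, 5'->3': universal sequence, UMI as N bases, anchoring sequence."""
    if not 3 <= umi_len <= 15:
        raise ValueError("UMI length must be 3-15 nt")
    return UNIVERSAL_MGI + "N" * umi_len + anchor


def sample_adapter_strand(anchor, umi_len=8, rng=random):
    """One concrete molecule drawn from the degenerate template."""
    umi = "".join(rng.choice("ACGT") for _ in range(umi_len))
    return UNIVERSAL_MGI + umi + anchor


for anchor in ANCHORS:
    print(adapter_strand_template(anchor))
```

In an actual oligo pool each N position is synthesized as a mix of the four bases, so `sample_adapter_strand` simply mimics drawing one molecule from that pool.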
[0143] Depending on the number of UMIs required, the adapter strand sequence of the 5' end UMI adapter can be extended to a combination of tandem repeats of UMI and As, as shown in Table 1.
[0144] Table 1. Preferred sequence combinations and extended structures of the adapter strand of the 5' end UMI adapter (5'-3' direction)
[0145]
[0146]
[0147] From the 5' end to the 3' end, the complementary auxiliary strand of the 5' end UMI adapter consists of a modifying group, a random base sequence, an anchoring complementary sequence (ACs), a UMI, a universal complementary sequence, and a modifying group. The modifying group can be an amino modification, a biotin modification, a thio modification, etc., and is used to block adapter self-ligation. The random base sequence is a 1-10 nt random base fragment that binds to the single-stranded target sequence, assisting ligation of the adapter strand to the single-stranded target sequence. The universal complementary sequence can be a sequence complementary to the structural sequence matching the adapter sequence of related sequencing platforms such as MGI and Illumina. For MGI, the preferred structural sequence is 5'-AAGTCGGATCGTAGCCATGTCGTTC-3'; those skilled in the art can provide corresponding structural sequences matching the adapter sequence for other platforms based on the principles provided by this invention. This invention is not limited thereto; any feasible sequences and sequence combinations obtained by substitution, deletion, addition, or replacement are within the scope of protection of this invention. The UMI consists of 3-15 nt of random bases; ACs are fixed base sequence combinations of 3-15 nt, complementary to As. To keep the sequenced bases balanced, ACs are preferably combinations of 5'-ATGATG-3', 5'-GATGAT-3', 5'-CCCCCC-3', and 5'-TGATGA-3', but other balanced base combinations may be substituted. This invention is not limited thereto, and any feasible sequences and sequence combinations obtained by substitution, deletion, addition, or replacement are within the scope of protection of this invention. Depending on the number of UMIs and the extension requirements of the adapter strand, the complementary auxiliary strand sequence can be extended into a combination of tandem repeats of UMIs and ACs, as shown in Table 2.
[0148] Table 2. Preferred sequence combinations and extended structures of the complementary auxiliary strand of the 5' end UMI adapter (5'-3' direction)
[0149]
[0150]
[0151] The adapter strand and the complementary auxiliary strand of the 5' UMI adapter pair by base complementarity to form a double-stranded UMI adapter combination. It should be noted that adding non-complementary base sequences to the adapter strand and the complementary auxiliary strand can form Y-type or hairpin-type UMI adapter combinations.
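The pairing between the fixed parts of the two strands can be verified by taking reverse complements. The sketch below is illustrative only; it uses the MGI universal sequence and the preferred As and ACs quoted in paragraphs [0142] and [0147], and the helper name is hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

# Fixed segments of the 5' adapter strand mapped to the segments given for
# the complementary auxiliary strand (all written 5'-3').
FIXED_PAIRS_5P = {
    "GAACGACATGGCTACGATCCGACTT": "AAGTCGGATCGTAGCCATGTCGTTC",  # universal sequence
    "CATCAT": "ATGATG",
    "ATCATC": "GATGAT",
    "GGGGGG": "CCCCCC",
    "TCATCA": "TGATGA",
}

for adapter_segment, auxiliary_segment in FIXED_PAIRS_5P.items():
    paired = reverse_complement(adapter_segment) == auxiliary_segment
    print(adapter_segment, auxiliary_segment, paired)
```

Every listed pair is an exact reverse complement, so the fixed segments can form a continuous duplex; only the random UMI positions and the 1-10 nt random overhang of the auxiliary strand are left unconstrained.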
[0152] From the 5' end to the 3' end, the adapter strand of the 3' end UMI adapter consists of a phosphate group, an anchoring sequence (As), the UMI, a universal sequence, and a modifying group. The phosphate group reacts with the 3' end hydroxyl group of the single-stranded target sequence to form a phosphodiester bond; As is a fixed base sequence combination of 3-15 nt. To keep the sequenced bases balanced, As is preferably a combination of 5'-TACTAC-3', 5'-CTACTA-3', 5'-GGGGGG-3', and 5'-ACTACT-3', but other balanced base combinations may be substituted. This invention is not limited thereto, and any feasible sequences and sequence combinations obtained by substitution, deletion, addition, or replacement are within the scope of protection of this invention.
[0153] The UMI consists of 3-15 nt of random bases. The universal sequence is used for primer binding in the PCR amplification step and can be a structural sequence matching the adapter sequence of sequencing platforms such as MGI and Illumina. For MGI, the preferred structural sequence is 5'-AAGTCGGAGGCCAAGCGGTCTT-3'. Those skilled in the art can provide structural sequences matching the adapter sequence for other platforms based on the principles provided by this invention. This invention is not limited thereto; any feasible sequences and sequence combinations obtained by substitution, deletion, addition, or replacement are within the scope of protection of this invention. The modifying group can be an amino modification, a biotin modification, a thio modification, etc., and is used to block adapter self-ligation. Depending on the number of UMIs required, the adapter strand sequence of the 3' end UMI adapter can be extended to a combination of tandem repeats of UMI and As, as shown in Table 3.
[0154] Table 3. Preferred sequence combinations and extended structures of the adapter strand of the 3' end UMI adapter (5'-3' direction)
[0155]
[0156]
[0157] From the 5' end to the 3' end, the complementary auxiliary strand of the 3' end UMI adapter consists of a modifying group, a universal complementary sequence, the UMI, an anchoring complementary sequence (ACs), a random base sequence, and a modifying group. The modifying group can be an amino modification, a biotin modification, a thio modification, etc., and is used to block adapter self-ligation. The universal complementary sequence can be a sequence complementary to the structural sequence matching the adapter sequence of related sequencing platforms such as MGI and Illumina; for MGI, the preferred structural sequence is 5'-AAGACCGCTTGGCCTCCGACTT-3'. Those skilled in the art can provide corresponding structural sequences matching the adapter sequence for other platforms based on the principles provided by this invention. This invention is not limited thereto; any feasible sequences and sequence combinations obtained by substitution, deletion, addition, or replacement are within the scope of protection of this invention. The UMI is a random base sequence of 3-15 nt. ACs are fixed base sequence combinations of 3-15 nt, complementary to As. To keep the sequenced bases balanced, ACs are preferably combinations of 5'-GTAGTA-3', 5'-TAGTAG-3', 5'-CCCCCC-3', and 5'-AGTAGT-3', but other balanced base combinations may be substituted. This invention is not limited thereto; any feasible sequences and sequence combinations obtained by substitution, deletion, addition, or replacement are within the scope of protection of this invention. The random base sequence is a 1-10 nt random base fragment that binds to the single-stranded target sequence, assisting ligation of the adapter strand to the single-stranded target sequence. Depending on the number of UMIs and the extension requirements of the adapter strand, the complementary auxiliary strand sequence can be extended into a combination of tandem repeats of UMIs and ACs, as shown in Table 4.
[0158] Table 4. Preferred sequence combinations and extended structures of the complementary auxiliary strand of the 3' end UMI adapter (5'-3' direction)
[0159]
[0160]
[0161] The adapter strand and the complementary auxiliary strand of the 3' UMI adapter pair by base complementarity to form a double-stranded UMI adapter combination. It should be noted that adding non-complementary base sequences to the adapter strand and the complementary auxiliary strand can form Y-type or hairpin-type UMI adapter combinations.
[0162] This invention provides reagents and methods for introducing UMIs at one or both ends of a single-stranded DNA molecule in a single reaction. Based on the principle of splint ligation, T4 DNA ligase is used to ligate single-stranded DNA molecules to double-stranded UMI adapters, thereby introducing the UMI sequence. A ligation enhancer, such as hexaamminecobalt chloride, is added at the same time to improve ligation efficiency and shorten the reaction time.
[0163] This invention provides universal primer sequences for PCR amplification of ligation products. The forward primer is named PCR-F, and the reverse primer, which carries a barcode sequence for sample identification, is named PCR-RN, where N represents the barcode number. Different barcode sequences have different numbers, and each barcode consists of 10 bases. Preferred primer sequences are shown in Table 5. Amplifying libraries with these primers yields libraries carrying different barcode sequences. Multiple libraries with different barcodes can be mixed at equal concentrations to form a single sequencing sample, so that several libraries are sequenced together, increasing throughput and reducing detection costs.
[0164] Table 5. Preferred primer sequences (5'-3' direction)
[0165]
[0166] Unless otherwise indicated, the practice of the methods and systems disclosed herein relates to conventional techniques and apparatus commonly used in the fields of molecular biology, microbiology, protein purification, protein engineering, protein and DNA sequencing, and recombinant DNA, which are within the skill of the art. Such techniques and apparatus are known to those skilled in the art and are described in numerous articles and references (see, for example, Sambrook et al., “Molecular Cloning: A Laboratory Manual,” 3rd edition (Cold Spring Harbor, 2001)).
[0167] Numerical ranges include the values that define them. Each maximum numerical limit given throughout this specification is intended to include every lower numerical limit, as if such lower numerical limits were explicitly stated herein. Each minimum numerical limit given throughout this specification includes every higher numerical limit, as if such higher numerical limits were explicitly stated herein. Each numerical range given throughout this specification includes every narrower numerical range falling within such a wider numerical range, as if such narrower numerical ranges were explicitly stated herein.
[0168] The headings provided in this invention are not intended to limit this disclosure.
[0169] Unless otherwise specified, all technical and scientific terms used in this invention have the same meaning as commonly understood by one of ordinary skill in the art. Various scientific dictionaries including those containing the terms used in this invention are well known and available to those skilled in the art. While any methods and materials similar to or equivalent to those described and used in this invention may be used in the practice or testing of embodiments disclosed herein, some methods and materials are still described.
[0170] The terminology provided in this invention is described more fully with reference to the entire specification. It should be understood that this disclosure is not limited to the specific methods, procedures, and reagents described, as these may vary depending on the context used by those skilled in the art.
[0171] Terminology Explanation:
[0172] As used herein, unless the context clearly indicates otherwise, the singular terms “a / an” and “the” include plural referents.
[0173] Unless otherwise specified, nucleic acids are written from left to right in the 5' to 3' direction, and amino acid sequences are written from left to right in the direction from amino to carboxyl.
[0174] Unique molecular identifiers (UMIs) are nucleotide sequences that are attached to, or identified in, DNA molecules and that can be used to distinguish individual DNA molecules from one another. See, for example, Kivioja, Nature Methods 9, 72-74 (2012). UMIs can be sequenced along with the DNA molecules they are associated with to determine whether a read belongs to one source DNA molecule or another. As used herein, the term "UMI" refers to the sequence information of a polynucleotide.
[0175] Typically, multiple copies of a single source molecule are sequenced. In DNBSEQ sequencing using MGI sequencing technology, rolling circle amplification (RCA) can be performed on the source molecule before the sequencing reaction: individual DNA molecules undergo RCA to generate DNA nanoballs (DNBs). Each molecule in a DNB originates from the same source DNA molecule but is sequenced separately. For error correction and other purposes, it can be important to determine that all reads from a single cluster are identified as originating from the same source molecule. UMIs allow this grouping. A DNA molecule that gives rise to multiple copies through amplification or other replication is called a source DNA molecule.
[0176] UMIs are similar to barcodes, which are typically used to distinguish the reads of one sample from the reads of other samples. UMIs, in contrast, are used to distinguish one source DNA molecule from another when many DNA molecules are sequenced together. Because there can be many more DNA molecules in a sample than samples in a sequencing run, there are typically many more distinct UMIs than distinct barcodes in a sequencing run.
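The grouping role of UMIs described above can be sketched in a few lines of Python. This is an illustrative assumption about how reads might be grouped, not the analysis pipeline of this invention: reads are taken to be tuples of UMI, alignment site and sequence, reads in one family are taken to have equal length, and the names `group_by_source` and `consensus` are hypothetical.

```python
from collections import Counter, defaultdict

def group_by_source(reads):
    """Group reads that share a UMI and an alignment site.

    reads: iterable of (umi, chrom, pos, strand, sequence) tuples.
    Returns a dict mapping (umi, chrom, pos, strand) to a list of sequences.
    """
    families = defaultdict(list)
    for umi, chrom, pos, strand, seq in reads:
        families[(umi, chrom, pos, strand)].append(seq)
    return dict(families)

def consensus(seqs):
    """Majority vote per position across equal-length reads of one family."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))

reads = [
    ("ACGT", "lambda", 100, "+", "TTGACC"),
    ("ACGT", "lambda", 100, "+", "TTGACC"),
    ("ACGT", "lambda", 100, "+", "TTGTCC"),  # one read with a single-base error
    ("GGCA", "lambda", 100, "+", "TTGACC"),  # same site, different source molecule
]
families = group_by_source(reads)
print(len(families))
print(consensus(families[("ACGT", "lambda", 100, "+")]))
```

Reads that share both the UMI and the site collapse into one family, so duplicates are counted once, while the read with a different UMI at the same site is kept as a separate source molecule.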
[0177] As used herein, the term "alignment" refers to the process of comparing a read with a reference sequence to determine whether the reference sequence contains the read sequence. The alignment process attempts to determine whether a read can be mapped to the reference sequence, but it does not always produce a read that aligns to the reference sequence. If the reference sequence contains the read, the read can be mapped to the reference sequence, or in some implementations, to a specific location within the reference sequence. In some cases, alignment only determines whether a read is a member of a specific reference sequence (i.e., whether the read exists within the reference sequence). For example, aligning a read sequence to a reference sequence of human chromosome 13 determines whether the read sequence exists within the reference sequence of chromosome 13. The tool providing this information may be called a set membership tester. In some cases, alignment additionally indicates the location of the read sequence within the reference sequence. For example, if the reference sequence is the entire human genome sequence, alignment may indicate that the read sequence is present on chromosome 13, and may also indicate that the read sequence is located on a specific strand and / or site on chromosome 13. In some cases, alignment tools are imperfect because a) not all valid alignments can be found, and b) some obtained alignments are invalid. This occurs for various reasons, such as reads containing errors and reads differing from the reference genome due to haplotype differences. In some applications, alignment tools include built-in mismatch tolerance, which tolerates a certain degree of base pair mismatches and still allows reads to be aligned with the reference sequence. This helps identify valid alignments of reads that would otherwise be missed.
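To make the ideas of set membership, location and mismatch tolerance concrete, the following is a deliberately naive Python sketch. It scans every offset of the reference and is not how production aligners work; the function name and the `max_mismatches` parameter are assumptions introduced for illustration.

```python
def align(read, reference, max_mismatches=1):
    """Return (position, mismatches) of the best ungapped placement, or None.

    A result other than None answers the membership question; the position
    answers the location question. Ties keep the leftmost placement.
    """
    best = None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best

reference = "GGGACGTACGTTT"
print(align("ACGTACG", reference))                    # exact match
print(align("ACGTTCG", reference))                    # tolerated single mismatch
print(align("ACGTTCG", reference, max_mismatches=0))  # rejected without tolerance
```

The third call shows the effect described in the text: without mismatch tolerance, a read carrying one sequencing error is not placed at all.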
[0178] As used herein, the term “mapping” refers to the process of aligning a read sequence to a larger sequence, such as a reference genome.
[0179] The terms "polynucleotide," "nucleic acid," and "nucleic acid molecule" are used interchangeably and refer to a covalently linked sequence of nucleotides (i.e., ribonucleotides for RNA and deoxyribonucleotides for DNA), in which the 3' position of the pentose sugar of one nucleotide is linked to the 5' position of the pentose sugar of the next nucleotide by a phosphodiester group. Nucleotides include sequences of any form of nucleic acid, including but not limited to RNA and DNA molecules such as cell-free DNA (cfDNA) molecules. The term "polynucleotide" includes, but is not limited to, single-stranded and double-stranded polynucleotides.
[0180] The terms “site” and “alignment location” are used interchangeably to refer to a unique location on the reference genome (i.e., chromosome ID, chromosome position, and orientation). In some implementations, the site can be the location of a residue, sequence tag, or segment on the reference sequence.
[0181] As used herein, the terms “reference genome” or “reference sequence” refer to any specific known genome sequence (whether partial or complete) of any organism or virus that can be used as a reference to an identified sequence from a subject. For example, reference genomes for human subjects and a variety of other organisms are available at the National Center for Biotechnology Information (NCBI) at ncbi.nlm.nih.gov. A “genome” refers to the complete genetic information of an organism or virus as a sequence of nucleic acids. However, it should be understood that “complete” is a relative concept, as even the gold-standard reference genome is expected to include gaps and errors.
[0182] As used herein, the term "primer" refers to an isolated oligonucleotide that, when placed under conditions that induce the synthesis of an extension product (e.g., conditions including nucleotides, an inducing agent such as DNA polymerase, necessary ions and molecules, and suitable temperature and pH), can act as a point of initiation of synthesis. For maximum amplification efficiency, primers are preferably single-stranded, but may optionally be double-stranded. If double-stranded, the primers are first treated to separate their strands before being used to prepare the extension product. Primers can be oligodeoxyribonucleotides. Primers are long enough to initiate the synthesis of the extension product in the presence of an inducing agent. The exact length of the primer depends on a number of factors, including temperature, primer source, the method used, and the parameters used for primer design.
[0183] This invention establishes a DNA adapter sequence structure combination, reagents, and a library construction method. It mainly brings improvements in the following aspects:
[0184] First, this invention provides an integrated UMI in which a UMI is joined in tandem with a specific anchoring sequence. This UMI is a combination of random bases joined in tandem with a specific base sequence, or a tandem repeat of such a combination. The UMI can trace duplicate sequences and accurately quantify the number of starting molecules. It can also reduce errors generated during library preparation and sequencing, as well as the heterogeneity caused by PCR amplification. This makes it possible to distinguish DNA fragments from the same source in high-depth, high-throughput sequencing data, further improving the accuracy of single-molecule sequences.
[0185] Second, the UMI adapter of this invention is a double-stranded DNA adapter, including a 5' UMI adapter and a 3' UMI adapter, each composed of an adapter strand and a complementary auxiliary strand. The adapter strand ligates to the target single-stranded DNA molecule. Modifying groups in its sequence structure prevent self-ligation of the adapter, while the anchoring sequence provides a specific base combination that is joined in tandem with the random bases to form a complete UMI adapter combination. This design takes the balance among the four bases into account, avoiding sequencing quality problems caused by base imbalance during sequencing. The complementary auxiliary strand primarily assists ligation; its anchoring complementary sequence provides complementary pairing sites for the anchoring sequence of the adapter strand. Together, they maintain the stability of the double-stranded structure of the UMI adapter, ensuring efficient ligation between the single-stranded DNA molecule and the double-stranded UMI adapter.
[0186] Third, this invention provides reagents and methods for constructing single-stranded DNA libraries using the UMI sequence combination. Based on the principle of splint ligation, T4 DNA ligase is used to introduce UMIs at one or both ends of the single-stranded DNA molecule in the same reaction. Some single-stranded DNA library construction techniques use T4 RNA ligase to ligate single-stranded adapters, resulting in long ligation times and markedly low ligation efficiency. The use of T4 DNA ligase and the splint ligation principle improves the efficiency of the ligation reaction and shortens the reaction time. Some single-stranded DNA library construction techniques introduce UMIs stepwise in different reactions, which is complex and time-consuming. The one-step method and the use of ligation enhancers further improve reaction efficiency and significantly shorten the ligation time.
[0187] Fourth, this invention provides a UMI scheme suitable for single-stranded library construction. The UMI is linked directly to the target single-stranded DNA molecule, and the sequencing reaction reads the UMI sequence first, ensuring that the UMI sequence appears within the regular read. This eliminates the need for an additional third sequencing primer and improves the accuracy of UMI sequencing. The design is platform-independent, allowing different adapter sequences to be designed for different sequencing platforms. The invention has a wide range of applications; for example, UMI tags can be added to RNA-derived libraries by using cDNA obtained from reverse transcription of RNA.
[0188] The raw materials and reagents used in the DNA adapter sequence structure combination, reagents, kits and library construction method provided by this invention are all commercially available.
[0189] The present invention will be further illustrated below with reference to the embodiments:
[0190] Example 1: Annealing and Preparation of UMI Adapters
[0191] 1. Adapter combinations
[0192] In this example, two sets of 5' UMI adapters and one set of non-UMI adapters were designed and synthesized, with the sequence structures shown in Table 6. The adapters were dissolved in TE buffer to a concentration of 100 μM. The UMI1-5 adapter combination is a 5' end UMI adapter for single-stranded DNA molecules, in which four random bases are joined in tandem with six anchor bases to form a complete UMI combination. The UMI2-5 adapter combination is likewise a 5' end UMI adapter for single-stranded DNA molecules. Unlike UMI1-5, UMI2-5 inserts one fixed base among the four random bases to balance the four bases, and then joins them in tandem with the six anchor bases to form a complete UMI combination. The anchoring sequence combinations of both UMI1-5 and UMI2-5 are the preferred balanced base sequences, and both were used together with the 3' non-UMI adapter for library construction.
[0193] Table 6. Sequence structures of the 5' UMI adapters (2 sets) and the non-UMI 5' and 3' adapters
[0194]
[0195]
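The number of distinct tags that such a design can produce follows from simple counting. The sketch below assumes every random position is equally likely to be A, C, G or T, and that the anchoring sequence is counted as part of the tag when several anchors are pooled; both are assumptions made for illustration, and the function name is hypothetical.

```python
def tag_diversity(n_random, n_anchors=1):
    """Number of distinct tags: 4 choices per random base, times the anchors in the pool."""
    if n_random < 0 or n_anchors < 1:
        raise ValueError("n_random must be >= 0 and n_anchors >= 1")
    return (4 ** n_random) * n_anchors

# UMI1-5 as described in Example 1: four random bases, four pooled anchoring sequences.
print(tag_diversity(4, n_anchors=4))
# Range of random-base lengths allowed by the description (3-15 nt), one anchor.
print(tag_diversity(3), tag_diversity(15))
```

Under these assumptions, the fixed base inserted in UMI2-5 does not change the count, because it adds no random position.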
[0196] 2. Adapter annealing
[0197] The UMI adapter of the present invention is a double-stranded DNA adapter. According to the principle of complementary base pairing, each adapter strand was annealed separately with its own complementary auxiliary strand: 5-ad-L-1-UMI1 and 5-ad-A-1-UMI1, 5-ad-L-2-UMI1 and 5-ad-A-2-UMI1, 5-ad-L-3-UMI1 and 5-ad-A-3-UMI1, and 5-ad-L-4-UMI1 and 5-ad-A-4-UMI1 were annealed individually; 5-ad-L-1-UMI2 and 5-ad-A-1-UMI2, 5-ad-L-2-UMI2 and 5-ad-A-2-UMI2, 5-ad-L-3-UMI2 and 5-ad-A-3-UMI2, and 5-ad-L-4-UMI2 and 5-ad-A-4-UMI2 were annealed individually; 5-ad-L and 5-ad-A, and 3-ad-L and 3-ad-A, were annealed individually. Annealing reaction mixtures were prepared in 0.2 mL PCR tubes (formulations are shown in Table 7). After vortexing and brief centrifugation, the PCR tubes were incubated at room temperature for 30 minutes, or the annealing program in Table 8 was run on a PCR instrument. After annealing, the four UMI adapters with different anchoring sequences within each of the UMI1-5 and UMI2-5 combinations were mixed in a 1:1:1:1 ratio to prepare the respective UMI adapter combinations. Finally, the UMI1-5 combination, the UMI2-5 combination, and the non-UMI 5' and 3' adapters (5-ad and 3-ad) were diluted with TE buffer to a working adapter solution with a concentration of 1 μM.
[0198] Table 7 Formulation of the adapter annealing reaction system
[0199]
[0200] Table 8 Annealing Reaction Procedure
[0201]
[0202]
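The final dilution to a 1 μM working solution follows the usual relation C1·V1 = C2·V2. The helper below is a minimal sketch; the concentration of the annealed adapter depends on the Table 7 formulation, which is not reproduced here, so the 10 μM value in the usage line is a placeholder rather than a value from this example.

```python
def dilution_volumes(stock_um, target_um, final_ul):
    """Return (stock volume, diluent volume) in microliters, using C1*V1 = C2*V2."""
    if not 0 < target_um <= stock_um:
        raise ValueError("target concentration must be positive and not exceed the stock")
    stock_ul = target_um * final_ul / stock_um
    return stock_ul, final_ul - stock_ul

# Placeholder: a hypothetical 10 uM annealed adapter diluted to 100 uL at 1 uM.
stock_ul, te_ul = dilution_volumes(stock_um=10, target_um=1, final_ul=100)
print(stock_ul, te_ul)
```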
[0203] Example 2: Comparison of UMI adapter combinations and non-UMI adapters
[0204] 1. Sample preparation
[0205] This example used Lambda phage genomic DNA (New England Biolabs, N3011S). The Lambda genomic DNA was sheared into 300–500 bp fragments using a Covaris LE220 instrument. 5 ng of fragmented Lambda DNA was used for each library, with two sets of UMI adapter combinations and one set of non-UMI adapters, and three replicates per adapter set. Libraries constructed using the UMI1-5 combination with the 3-ad adapter were named UMI-1-1, UMI-1-2, and UMI-1-3; libraries constructed using the UMI2-5 combination with the 3-ad adapter were named UMI-2-1, UMI-2-2, and UMI-2-3; and libraries constructed using the 5-ad and 3-ad adapters were named NON-1-1, NON-1-2, and NON-1-3.
[0206] 2. Construction of the single-stranded library
[0207] 2.1 Phosphorylation and Denaturation Reactions
[0208] This invention provides a set of reagents for the phosphorylation and denaturation of DNA samples. T4 polynucleotide kinase (BGI) is used to phosphorylate DNA molecules lacking a phosphate group at the 5' end. Tris-HCl (SIGMA, T2694-100ML) is added to buffer the pH of the reaction system. The DNA sample is then denatured into single-stranded DNA molecules at high temperature. A thermostable single-stranded DNA binding protein (New England Biolabs, M2401S) is included during denaturation to prevent the single-stranded DNA molecules from reannealing into a double-stranded structure, thus keeping the DNA single-stranded and improving its ligation efficiency. The phosphorylation and denaturation reaction system is shown in Table 9. The reaction is incubated on a PCR instrument at 37°C for 15 minutes and then at 95°C for 5 minutes. After the reaction, the tube is immediately transferred to ice and held there for 2 minutes.
[0209] Table 9 Phosphorylation and Denaturation Reaction Systems
[0210]
[0211] 2.2 Ligation Reaction
[0212] This invention provides an integrated UMI adapter in which a UMI is joined in tandem with a specific anchoring sequence. The UMI adapter is a double-stranded DNA structure. Based on the principle of splint ligation, UMIs are introduced at one or both ends of a single-stranded DNA molecule using T4 DNA ligase (BGI) in the same reaction. A ligation enhancer, such as hexaamminecobalt chloride (MERYER, M84231-25G), is added at the same time to improve ligation efficiency and shorten the reaction time. In this example, the annealed and diluted UMI1-5 combination, UMI2-5 combination, or non-UMI adapters from Table 6 were used for library construction. 2 μL of the UMI1-5 combination (1 μM), the UMI2-5 combination (1 μM), or the 5-ad adapter (1 μM) was added to the respective reactions, along with 30 μL of ligation reaction solution, the composition of which is shown in Table 10. The reactions were incubated at 23°C for 30 minutes on a PCR instrument.
[0213] Table 10 Composition of the ligation reaction solution
[0214]
[0215] 2.3 Purification of Ligation Products
[0216] Add 120 μL of DNAClean Beads (VAZYME, N411-03) to the reaction tube to purify the ligation product. Elute with 23 μL of DNA elution buffer and transfer 21 μL of the purified ligation product to a new PCR reaction tube for PCR amplification.
[0217] 2.4 PCR amplification
[0218] This invention provides universal primer sequences for PCR amplification of ligation products. The forward primer is named PCR-F, and the reverse primer, which carries a barcode sequence for sample identification, is named PCR-RN, where N represents the barcode number. Different barcode sequences have different numbers, and each barcode consists of 10 bases. This example uses PCR-F (20 μM) and PCR-RN (20 μM) from Table 5 for PCR amplification. Amplifying libraries with these primers yields libraries carrying different barcode sequences. Multiple libraries with different barcodes can be mixed at equal mass concentrations to form a single sequencing sample, so that several libraries are sequenced together, improving throughput and reducing costs. PCR-RN (20 μM) with a different barcode sequence is added to each library, along with 27 μL of PCR amplification reaction solution, the composition of which is shown in Table 11. The reactions are placed on a PCR instrument and the program in Table 12 is run.
[0219] Table 11 Composition of PCR amplification reaction solution
[0220]
[0221] Table 12 PCR amplification reaction procedure
[0222]
[0223] 2.5 Purification and mixing of PCR amplification products
[0224] PCR products were purified using 50 μL of DNA Clean Beads (VAZYME, N411-03). The concentration of the purified products was determined using a Qubit 3.0 fluorometer (Life Technologies). The nine libraries constructed from the UMI1-5, UMI2-5, and non-UMI adapters were then mixed at equal molar concentrations into a single sequencing sample and vortexed for later use.
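Equimolar pooling requires converting the measured mass concentration of each library into a molar concentration. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the library names come from Example 2, but the concentrations, the 400 bp mean fragment length and the 50 fmol target are hypothetical values chosen only to illustrate the arithmetic.

```python
def molarity_nM(conc_ng_per_ul, mean_length_bp):
    """Convert a dsDNA mass concentration to nM, assuming ~660 g/mol per base pair."""
    return conc_ng_per_ul / (660.0 * mean_length_bp) * 1e6

def pooling_volumes(libraries, fmol_each):
    """Volume (uL) of each library that contributes `fmol_each` femtomoles.

    libraries: dict of name -> (ng/uL, mean fragment length in bp).
    1 nM equals 1 fmol/uL, so volume = fmol / nM.
    """
    return {
        name: fmol_each / molarity_nM(conc, length)
        for name, (conc, length) in libraries.items()
    }

# Hypothetical measurements, not data from this example.
libraries = {"UMI-1-1": (20.0, 400), "UMI-2-1": (10.0, 400), "NON-1-1": (40.0, 400)}
for name, volume in pooling_volumes(libraries, fmol_each=50).items():
    print(name, round(volume, 2))
```

Because volume scales inversely with concentration, a library measured at half the concentration of another with the same fragment length needs twice the volume to contribute the same number of molecules.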
[0225] 2.6 Single-strand circularization and sequencing reaction
[0226] In this example, the MGIEasy rapid circularization module (MGI, 1000005282) was used for the single-strand circularization reaction, strictly following the kit instructions. Sequencing was performed on the DNBSEQ platform (MGI) with the PE100+10 (paired-end 100 + 10) read configuration to obtain reliable base sequence information.
[0227] 2.7 Data Results Analysis
[0228] The sequencing reads are first demultiplexed by barcode sequence to obtain the reads belonging to each sample. Adapter sequences are then removed with fastp v0.23.2, and low-quality reads shorter than 30 nt or containing more than one N base are filtered out. Based on the length of the tandem combination of UMI and anchoring sequence (10 nt for UMI1-5, 11 nt for UMI2-5), UMIs are split from the reads with zero mismatches allowed, and the UMI splitting rate and the read duplication rate are calculated. Finally, the reads are aligned to the Lambda whole-genome reference sequence (New England Biolabs, lambda Spike-in control DNA Sequences) with BWA v0.7.17 under default parameters, and the alignment rate of the Lambda reads is calculated with the "flagstat" function of Samtools v1.11. Quality control data for the nine libraries constructed using the UMI1-5, UMI2-5, and non-UMI adapters were compiled, including the UMI splitting rate, the duplication rate after UMI correction, and the alignment rate to the reference sequence. With zero mismatches allowed, both the UMI1-5 and UMI2-5 adapters achieved UMI splitting rates exceeding 91%, indicating that both design strategies are applicable (see Figure 2). The duplication rate of each sample was calculated, duplicates were corrected using the UMIs, and the results were compared with the duplication rates of the libraries constructed with non-UMI adapters. The duplication rate was significantly reduced after UMI correction (see Figure 3). The alignment rate of each sample to the Lambda whole-genome reference sequence was also calculated. There was no significant difference in alignment rate between the libraries built with the two UMI adapter combinations and those built with non-UMI adapters; all alignment rates were above 99% (see Figure 4). This demonstrates that the DNA molecular tag structure combination, reagents, and library construction method of this invention can effectively trace duplicate sequences and accurately quantify the number of starting molecules, while effectively reducing errors and background noise generated during library construction and sequencing.
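The zero-mismatch UMI splitting step can be illustrated for the UMI1-5 layout (four random bases followed by a six-base anchoring sequence). This is a minimal sketch, not the software used in this example: it assumes the read starts with the UMI followed directly by the anchoring sequence, and it assumes the four preferred 5' anchoring sequences were used, since Table 6 is not reproduced here. The function names are hypothetical.

```python
# Assumption: the preferred 5' anchoring sequences from the description were used.
ANCHORS = {"CATCAT", "ATCATC", "GGGGGG", "TCATCA"}

def split_umi(read, n_random=4, anchor_len=6):
    """Return (umi, anchor, insert), or None if the anchor does not match exactly."""
    tag_len = n_random + anchor_len
    if len(read) <= tag_len:
        return None
    umi, anchor = read[:n_random], read[n_random:tag_len]
    if anchor not in ANCHORS or "N" in umi:
        return None  # zero mismatches tolerated in the anchoring sequence
    return umi, anchor, read[tag_len:]

def splitting_rate(reads):
    """Fraction of reads from which a UMI could be split."""
    reads = list(reads)
    if not reads:
        return 0.0
    return sum(split_umi(r) is not None for r in reads) / len(reads)

reads = ["ACGTCATCATTTGGCCAA", "ACGTCATCTTTTGGCCAA"]
print(split_umi(reads[0]))
print(splitting_rate(reads))
```

The second read carries a single error inside the anchoring sequence and is therefore rejected, which is the behavior implied by a zero-mismatch setting; a more tolerant scheme would accept anchors within a small Hamming distance of the expected set.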
[0229] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.
Claims
1. A UMI adapter, characterized in that it comprises an adapter strand and a complementary auxiliary strand; the adapter strand comprises a universal sequence and a first UMI; the complementary auxiliary strand comprises a random base sequence, a second UMI, and a universal complementary sequence; the universal sequence binds to the primers used for PCR amplification; the first UMI and the second UMI each independently contain 3-15 nt of random bases; the random base sequence is 1 to 10 nt in length and binds to the single-stranded target sequence, assisting the adapter strand in ligating to the single-stranded target sequence.
2. The UMI adapter as described in claim 1, characterized in that the adapter strand further comprises one or more of a modifying group or an anchoring sequence; the complementary auxiliary strand further comprises one or more of a modifying group or an anchoring complementary sequence; preferably, the anchoring sequence and the anchoring complementary sequence are each independently a fixed base sequence of 3-15 nt; preferably, the anchoring complementary sequence provides complementary pairing sites for the anchoring sequence of the adapter strand, and the anchoring sequence and the anchoring complementary sequence together maintain the stability of the double-stranded structure of the UMI adapter, ensuring the efficiency of ligation between the single-stranded target sequence and the double-stranded structure of the UMI adapter; preferably, the modifying group blocks self-ligation of the adapter; preferably, the modifying group includes one or more of amino modification, biotin modification, or thio modification.
3. The UMI adapter as described in claim 1 or 2, characterized in that the universal sequence includes a structural sequence that matches the adapter sequence used by the sequencing platform; preferably, the sequencing platform includes MGI or Illumina; preferably, from the 5' end to the 3' end, the adapter strand of the 5' UMI adapter comprises a modifying group, a universal sequence, a UMI, and an anchoring sequence (As).
4. The UMI adapter as described in claim 3, characterized in that, from the 5' end to the 3' end, the complementary auxiliary strand of the 5' UMI adapter comprises a modifying group, a random base sequence, an anchoring complementary sequence (ACs), a UMI, a universal complementary sequence, and a modifying group; preferably, the universal sequence has: (I) a nucleotide sequence as shown in SEQ ID No. 1; or (II) a nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotides in the nucleotide sequence shown in (I), and having the same or similar function as the nucleotide sequence shown in (I); or (III) a nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II); and/or the anchoring sequence has: (I) a nucleotide sequence as shown in 5'-CATCAT-3', 5'-ATCATC-3', 5'-GGGGGG-3' and/or 5'-TCATCA-3'; or (II) a nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotides in the nucleotide sequence shown in (I), and having the same or similar function as the nucleotide sequence shown in (I); or (III) a nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II); and/or the universal complementary sequence has: (I) a nucleotide sequence as shown in SEQ ID No. 2; or (II) a nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotides in the nucleotide sequence shown in (I), and having the same or similar function as the nucleotide sequence shown in (I); or (III) a nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II); and/or the anchoring complementary sequence has: (I) a nucleotide sequence as shown in 5'-ATGATG-3', 5'-GATGAT-3', 5'-CCCCCC-3' and/or 5'-TGATGA-3'; or (II) a nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotides in the nucleotide sequence shown in (I), and having the same or similar function as the nucleotide sequence shown in (I); or (III) a nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II); preferably, the first UMI and the anchoring sequence further include a combination of tandem repeat sequences; preferably, the second UMI and the anchoring complementary sequence further include a combination of tandem repeat sequences; preferably, the adapter strand of the 5' UMI adapter has: (I) a nucleotide sequence as shown in any one of SEQ ID Nos. 5–10; or (II) a nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotides in the nucleotide sequence shown in (I), and having the same or similar function as the nucleotide sequence shown in (I); or (III) a nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II); and/or the complementary auxiliary strand of the 5' UMI adapter has: (I) a nucleotide sequence as shown in any one of SEQ ID Nos. 11–16; or (II) a nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotides in the nucleotide sequence shown in (I), and having the same or similar function as the nucleotide sequence shown in (I); or (III) a nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II); preferably, the adapter strand also contains a phosphate group.
5. The UMI adapter as described in claim 3, characterized in that, from the 5' end to the 3' end, the adapter strand of the 3' UMI adapter comprises a phosphate group, an anchoring sequence (As), a UMI, a universal sequence, and a modifying group; the phosphate group reacts with the 3' terminal hydroxyl group of the single-stranded target sequence to form a phosphodiester bond; from the 5' end to the 3' end, the complementary auxiliary strand of the 3' UMI adapter comprises a modifying group, a universal complementary sequence, a UMI, an anchoring complementary sequence (ACs), a random base sequence, and a modifying group; preferably, the universal sequence has: (I) a nucleotide sequence as shown in SEQ ID No. 3; or (II) a nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotides in the nucleotide sequence shown in (I), and having the same or similar function as the nucleotide sequence shown in (I); or (III) a nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II); and/or the anchoring sequence has: (I) a nucleotide sequence as shown in 5'-TACTAC-3', 5'-CTACTA-3', 5'-GGGGGG-3' and/or 5'-ACTACT-3'; or (II) a nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotides in the nucleotide sequence shown in (I), and having the same or similar function as the nucleotide sequence shown in (I); or (III) a nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II); the first UMI and the anchoring sequence further include a combination of tandem repeat sequences; the universal complementary sequence has: (I) a nucleotide sequence as shown in SEQ ID No. 4; or (II) a nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotides in the nucleotide sequence shown in (I), and having the same or similar function as the nucleotide sequence shown in (I); or (III) a nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II); and/or the anchoring complementary sequence has: (I) a nucleotide sequence as shown in 5'-GTAGTA-3', 5'-TAGTAG-3', 5'-CCCCCC-3' and/or 5'-AGTAGT-3'; or (II) a nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotides in the nucleotide sequence shown in (I), and having the same or similar function as the nucleotide sequence shown in (I); or (III) a nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II); the second UMI and the anchoring complementary sequence further include a combination of tandem repeat sequences.
6. The UMI adapter as described in claim 3, characterized in that the adapter strand of the 3' UMI adapter has: (I) a nucleotide sequence as shown in any one of SEQ ID Nos. 17–22; or (II) a nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotides in the nucleotide sequence shown in (I), and having the same or similar function as the nucleotide sequence shown in (I); or (III) a nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II); and/or the complementary auxiliary strand of the 3' UMI adapter has: (I) a nucleotide sequence as shown in any one of SEQ ID Nos. 23–28; or (II) a nucleotide sequence obtained by substituting, deleting, or adding one or more nucleotides in the nucleotide sequence shown in (I), and having the same or similar function as the nucleotide sequence shown in (I); or (III) a nucleotide sequence having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence homology to the nucleotide sequence described in (I) or (II).
7. A reagent, characterized in that it comprises the UMI adapter as described in any one of claims 1 to 6.
8. A reagent kit, characterized in that it comprises the UMI adapter as described in any one of claims 1 to 6 or the reagent as described in claim 7.
9. Use of any of the following in amplification, sequencing and/or library construction, the use including but not limited to one or more of gene mutation detection, DNA detection, RNA detection, or ancient DNA detection: (I) the UMI adapter as described in any one of claims 1 to 6; (II) the reagent as described in claim 7; or (III) the kit as described in claim 8.
10. A method for constructing a DNA molecular library, characterized in that UMIs are introduced at one end or both ends of a single-stranded DNA molecule in a one-step method using any of the following, and a ligation reaction enhancer is added: (I) the UMI adapter as described in any one of claims 1 to 6; (II) the reagent as described in claim 7; or (III) the kit as described in claim 8.
11. The method for constructing a DNA molecular library as described in claim 10, characterized in that, using the principle of splint ligation, T4 DNA ligase is used to ligate the single-stranded DNA molecule to the double-stranded UMI adapter, thereby introducing the UMI.
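As an illustrative check that is not part of the claims, the anchoring sequences and anchoring complementary sequences listed in claims 4 and 5 can be verified to pair as reverse complements, which is what allows them to stabilize the double-stranded adapter structure. The sketch below is a minimal Python example. The names `reverse_complement`, `ANCHOR_PAIRS`, and `all_pairs_complementary` are hypothetical, and pairing the sequences in the order in which they are listed is an assumption.

```python
# Watson-Crick complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

# (anchoring sequence, anchoring complementary sequence), all written 5' to 3',
# paired in listing order (an assumption made for this illustration).
ANCHOR_PAIRS = {
    "claim 4": [
        ("CATCAT", "ATGATG"),
        ("ATCATC", "GATGAT"),
        ("GGGGGG", "CCCCCC"),
        ("TCATCA", "TGATGA"),
    ],
    "claim 5": [
        ("TACTAC", "GTAGTA"),
        ("CTACTA", "TAGTAG"),
        ("GGGGGG", "CCCCCC"),
        ("ACTACT", "AGTAGT"),
    ],
}

def all_pairs_complementary(pairs):
    """True if every complementary sequence is the reverse complement of its anchor."""
    return all(reverse_complement(anchor) == comp for anchor, comp in pairs)

if __name__ == "__main__":
    for claim, pairs in ANCHOR_PAIRS.items():
        print(claim, all_pairs_complementary(pairs))
```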