Adapter, method for preparing a large-fragment sequencing library, sequencing method, kit, and applications thereof

By using uracil nucleotide adapters to simplify the large-fragment library construction process, the problems of complex procedures, high costs, and amplification errors in existing technologies are solved, achieving efficient and accurate library preparation and sequencing.

CN122303378A · Pending · Publication Date: 2026-06-30 · MGI TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: MGI TECH CO LTD
Filing Date: 2024-12-27
Publication Date: 2026-06-30

IPC: C12Q1/6806; C12Q1/6869; C40B50/06; C12N15/11

AI Tagging

Technology Topics

Large fragment · Genome

AI Technical Summary

Technical Problem

Existing methods for constructing large fragment libraries are complex, costly, and inefficient, and are prone to amplification errors and PCR bias, making them difficult to widely apply in clinical disease diagnosis.

Method used

By employing linkers with uracil nucleotides, sticky ends are formed after ligation with DNA fragments, allowing for circularization and controlled length nick translation, simplifying experimental procedures and improving accuracy.

Benefits of technology

It greatly simplifies experimental procedures and time, reduces costs, improves the reproducibility and success rate of experiments, avoids information loss, and ensures the accuracy of sequencing results.

Abstract

This application discloses an adapter, a method for preparing a large-fragment sequencing library, a sequencing method, a kit, and applications thereof. The scheme uses a specially designed adapter for circularization, which enables controlled nick translation in two directions simultaneously. This allows information from both ends of large fragments to be captured rapidly and accurately, facilitating genome assembly and variant detection.

Description

Technical Field

[0001] This application belongs to the field of library construction and sequencing technology, and relates to a method for preparing large-fragment sequencing libraries, specifically involving an adapter, a method for preparing large-fragment sequencing libraries, a sequencing method, a kit, and their applications.

Background Technology

[0002] Constructing large-fragment libraries aims to capture and sequence longer DNA segments to better understand and assemble complex genomes, capturing a wider range of genetic information, and thus plays a crucial role in genomics research. With the rise of Next Generation Sequencing (NGS), large-fragment library construction techniques have undergone a revolutionary change. NGS technology has not only significantly improved sequencing speed and throughput but also reduced costs, making large-scale genome sequencing feasible. Against this backdrop, novel large-fragment library construction strategies such as Mate-Pair have emerged. These technologies can efficiently capture genomic regions far exceeding those amplified by traditional PCR, opening new avenues for in-depth research on genome structural variations, genome rearrangements, and transposon activity.

[0003] Large DNA fragment libraries, also known as mate-pair libraries, are designed to prepare short DNA fragments containing sequences at both ends of larger segments in the genome. They are primarily used for de novo sequencing of plants, animals, and microorganisms (sequencing without a reference genome, requiring de novo assembly). These libraries help researchers better understand long-range genetic information, improve the efficiency of sequence assembly, and are crucial for understanding the overall genome layout.

[0004] However, the existing Mate Pair library construction process is complex and lengthy, with insufficient efficiency and specificity, which severely restricts its promotion in clinical disease diagnosis. According to the traditional construction protocol, genomic DNA is first randomly fragmented into larger fragments (2-30kb) using physical or enzymatic methods, followed by end repair, adapter addition, biotinylation at both ends, and blunt end circularization. After digesting linear DNA molecules, circular DNA molecules are randomly fragmented into smaller fragments, and biotinylated fragments at the ends are captured using magnetic beads carrying streptavidin. After end repair and adapter addition, a large fragment library is finally constructed for sequencing. Although the traditional Mate Pair library construction technology has certain advantages in the detection of genomic structural variations, it also has some obvious drawbacks. These drawbacks mainly affect library quality and sequencing efficiency, including: (i) low blunt end circularization efficiency and huge template consumption, resulting in a dramatic increase in the amount of starting DNA required; (ii) streptavidin labeling can lead to data waste and high false positive rates; (iii) the operation steps are cumbersome, taking up to 3-4 days, and the high cost greatly limits its popularization in clinical practice. Therefore, simplifying the Mate Pair library construction process, improving efficiency, reducing costs, and enhancing specificity are particularly important.

[0005] Transposase-based Mate Pair library construction optimizes the workflow by using a transposase to fragment genomic DNA and add sequencing adapters to both ends of the fragments. The transposase cleaves the nucleic acid and ligates transposon sequences to both ends of the fragments at the same time, thus achieving simultaneous fragmentation and adapter addition. The fragmented DNA molecules are then circularized to form circular DNA, with special labels applied to both ends for subsequent separation and identification. The circular DNA is fragmented mechanically or enzymatically, the labeled fragments are recovered by means of the label, and the recovered fragments undergo end repair. Library construction is then completed via PCR amplification. Although this transposase-based method simplifies the process to some extent, it still has the following drawbacks: (i) High chimerism rate: during ligation, different DNA fragments may be ligated at random, producing chimeras, i.e., fragments from different original DNA molecules that are incorrectly joined together, which increases the complexity of data analysis and reduces accuracy; (ii) High cost and time consumption: the library construction process is relatively cumbersome, involving multiple steps such as physical fragmentation, ligation, circularization, and biotin-labeled capture. These operations are both time-consuming and costly, especially for large-scale sample processing; (iii) Low circularization efficiency: as in the traditional Mate Pair library construction technology, the conventional circularization step may give a low circularization rate, which directly affects the yield and quality of the constructed library.

[0006] CP-AL is a recently reported method for constructing large-fragment mate-pair libraries (Dong Zirui et al., Development of coupling controlled polymerizations by adapter-ligation in mate-pair sequencing for detection of various genomic variants in one single assay. DNA Research. 2019 Aug 1; 26(4):313-325. doi:10.1093/dnares/dsz011, which is incorporated herein by reference in its entirety where applicable national and regional patent laws apply). It is mainly used for genomic variant detection; it improves overall efficiency, reduces cost and library construction time, and is simple enough to operate that it can be fully automated. First, genomic DNA is fragmented to a size of 3-8 kb. Adapter AD1 is then ligated to the fragments, followed by PCR (with U-containing primers) and digestion with USER enzyme. After this treatment, the DNA fragments carry sticky ends and undergo double-strand circularization, which leaves nicks. The distance the nicks travel is limited by controlled nick translation (CNT), a step crucial for obtaining accurate sequencing results. The AD2-3' adapter is then added, and the template extension length is limited by controlled primer extension (CPE). The AD2-5' adapter is then added, and PCR is finally performed to form a Mate Pair library with AD2 adapter sequences at both ends and the AD1 adapter sequence in the middle.

[0007] While CP-AL Mate Pair library construction simplifies the process in some aspects, it remains a relatively complex technique overall. First, CP-AL Mate Pair library construction involves several delicate biochemical reaction steps, including DNA fragmentation, adapter ligation, circularization, controlled nick translation, and controlled primer extension. Each step requires precise operating conditions and strict quality control. This places high demands on the professional skills and experience of the experimenters; those lacking sufficient training may struggle to obtain high-quality libraries. Second, long-fragment amplification is more challenging: during library construction, the amplification efficiency of long DNA fragments is often lower than that of short DNA fragments. Furthermore, the amplification fidelity of long DNA fragments is more difficult to control, and amplification errors are more likely to occur, affecting the accuracy of long fragments in the final library. Third, bias may be introduced: although CP-AL technology aims to reduce PCR bias, selective amplification of certain fragments may still occur during the PCR amplification step, especially for sequences with extreme GC content.

[0008] Therefore, simplifying experimental procedures while avoiding amplification errors and improving the accuracy of library construction and sequencing remains a pressing issue to be addressed in developing methods for constructing large-fragment libraries.

Summary of the Invention

[0009] This application aims to address at least one of the aforementioned technical problems existing in the prior art. To this end, this application provides an adapter having a uracil nucleotide, a method for preparing a large-fragment sequencing library, a sequencing method, a kit, and applications thereof.

[0010] According to one aspect of this application, an adapter having a uracil nucleotide is provided. In some embodiments, the uracil nucleotide is a deoxyuracil nucleotide (dUMP). In some embodiments, the adapter is a double-stranded adapter comprising a first single strand and a second single strand. In some embodiments, the first single strand is used for attachment to the 3' end of a DNA fragment, and the second single strand is used for attachment to the 5' end of the DNA fragment.

[0011] In some embodiments, the 5' end of the adapter has a phosphate group. In some embodiments, the 3' end of the adapter has a protruding end. In some preferred embodiments, the 5' end of the first single strand of the adapter has a phosphate group, and the 3' end of the second single strand of the adapter has a protruding end. Preferably, the 3' protruding end of the adapter is complementary to the 3' protruding end of the DNA fragment, thereby enabling the adapter to recognize and ligate with the DNA fragment. For example, when the 3' protruding end of the DNA fragment is one or more A bases, the 3' protruding end of the adapter is designed to be the corresponding number of T bases. Likewise, when the 3' protruding end of the DNA fragment is polyG, the 3' protruding end of the adapter is designed to be polyC. In some embodiments, to prevent the end of the adapter that is not intended to ligate with the DNA fragment from doing so, the 5' end at that non-ligating end may lack a phosphate group (e.g., it may carry a hydroxyl group instead), and / or that end may be designed as a protruding end that is not complementary to the end of the DNA fragment. For example, the 5' end of the second single strand of the adapter may lack a phosphate group (e.g., carry a hydroxyl group instead), and / or the 3' end of the first single strand and the 5' end of the second single strand may be designed to be non-complementary (e.g., forming a Y-shaped structure).
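The overhang pairing rule described above (A overhang on the fragment, T overhang on the adapter; polyG on the fragment, polyC on the adapter) can be sketched as a small helper. This is only an illustration: the function name `adapter_overhang` is hypothetical and is not part of the application, and the sketch assumes the two 3' overhangs pair in antiparallel orientation, so the adapter overhang is the reverse complement of the fragment overhang.

```python
# Illustrative sketch only: names are hypothetical, not taken from the application.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def adapter_overhang(fragment_overhang: str) -> str:
    """Return the adapter 3' overhang that can pair with the fragment's 3' overhang.

    Both sequences are written 5'->3'. Pairing is antiparallel, so the result
    is the reverse complement of the input.
    """
    seq = fragment_overhang.upper()
    if not seq or any(base not in COMPLEMENT for base in seq):
        raise ValueError("overhang must be a non-empty string over A/C/G/T")
    return "".join(COMPLEMENT[base] for base in reversed(seq))


if __name__ == "__main__":
    print(adapter_overhang("A"))    # single A-tail on the fragment
    print(adapter_overhang("GGG"))  # polyG tail on the fragment
```

For homopolymer tails such as A or polyG, the reverse complement is simply the same number of complementary bases, which matches the examples given in the text.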

[0012] In some embodiments, the adapter has a uracil nucleotide on one of its single strands. In some embodiments, the uracil nucleotide is located on the single strand of the adapter intended for attachment to the 5' end of a DNA fragment (the second single strand). In some embodiments, the uracil nucleotide and the 3' overhang of the adapter are on the same single strand. In some embodiments, the number of uracil nucleotides is 1-10; preferably, 1-5. More preferably, the number of uracil nucleotides is 1. In some embodiments, the length from the position of the uracil nucleotide to the end of its single strand not intended for attachment to a DNA fragment (i.e., the 5' end of the second single strand) is 3-20 nt; preferably, 5-15 nt. When more than one uracil nucleotide is present, this length is measured from the uracil nucleotide farthest from the 5' end.

[0013] In some embodiments, the sequences of the two single strands of the adapter may be completely complementary or partially complementary. In some embodiments, the sequences of the two single strands of the adapter may be partially complementary at either end or both ends. Preferably, the two single strands of the adapter are complementary at one end (i.e., forming a Y-shape) to connect to the DNA fragment. In some embodiments, the two single strands of the adapter in the middle portion are not complementary (i.e., forming a bubble-like structure).

[0014] In some embodiments, on the single strand opposite the uracil-containing second single strand (i.e., on the first single strand), the segment running from the nucleotide complementary to the uracil nucleotide (A) to the end of that single strand (i.e., its 3' end) comprises a reverse complementary nucleotide sequence portion. In some embodiments, the reverse complementary nucleotide sequence portion includes the nucleotide at the 3' end of the first single strand. In some embodiments, the length from the nucleotide (A) complementary to the uracil nucleotide to the 3' end is 3-20 nt; preferably 5-15 nt; more preferably 8-10 nt. When more than one uracil nucleotide is present, this length is measured from the complementary nucleotide farthest from the 3' end.

[0015] In some embodiments, the length of the reverse complementary nucleotide sequence portion is 3-20 nt; preferably 5-15 nt; more preferably 8-10 nt. In some embodiments, on the first single strand (the strand opposite the uracil-containing second single strand), the length of the sequence from the complementary nucleotide (A) to the 3' end is greater than the length of the reverse complementary nucleotide sequence portion. In some embodiments, the length from the complementary nucleotide (A) to the 3' end is 1-5 nt longer than the length of the reverse complementary nucleotide sequence portion; preferably, it is 2-3 nt longer.
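The length constraints stated in the preceding paragraphs can be collected into a simple design check. The sketch below is an assumption-laden illustration: the class `AdapterDesign`, its field names, and the function `check_design` are hypothetical, and only the broadest ("in some embodiments") ranges from the text are encoded, not the preferred sub-ranges.

```python
# Illustrative sketch only: class, field, and function names are hypothetical.
from dataclasses import dataclass
from typing import List


@dataclass
class AdapterDesign:
    uracil_count: int    # number of uracil nucleotides on the second strand
    u_to_5prime_nt: int  # length from the (farthest) U to the 5' end of the second strand
    a_to_3prime_nt: int  # length from the complementary A to the 3' end of the first strand
    revcomp_nt: int      # length of the reverse complementary sequence portion


def check_design(d: AdapterDesign) -> List[str]:
    """Return a list of violated constraints (empty if the design is within range)."""
    problems = []
    if not 1 <= d.uracil_count <= 10:
        problems.append("uracil count outside 1-10")
    if not 3 <= d.u_to_5prime_nt <= 20:
        problems.append("U-to-5' length outside 3-20 nt")
    if not 3 <= d.a_to_3prime_nt <= 20:
        problems.append("A-to-3' length outside 3-20 nt")
    if not 3 <= d.revcomp_nt <= 20:
        problems.append("reverse complementary portion outside 3-20 nt")
    # The A-to-3' segment should exceed the reverse complementary portion by 1-5 nt.
    if not 1 <= d.a_to_3prime_nt - d.revcomp_nt <= 5:
        problems.append("A-to-3' length not 1-5 nt longer than reverse complementary portion")
    return problems


if __name__ == "__main__":
    print(check_design(AdapterDesign(1, 10, 10, 8)))  # within all stated ranges
    print(check_design(AdapterDesign(1, 10, 8, 8)))   # no extra nucleotides beyond the portion
```

The difference between the last two fields is the number of nucleotides of the A-to-3' segment that lie outside the reverse complementary portion, which the text limits to 1-5 nt.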

[0016] In some embodiments, the two single strands of the adapter may have the same number of bases or different numbers of bases. In some embodiments, the two single strands of the adapter may each be 20-50 nt in length; preferably, 25-40 nt. In some embodiments, the length of the non-complementary portion between the two single strands of the adapter may be 10-30 nt; preferably, 15-25 nt. In some embodiments, the total length of the complementary portion between the two single strands of the adapter may be 10-30 bp; preferably, 15-25 bp. Those skilled in the art can design the specific sequences of the two single strands of the adapter and adjust the length of the complementary or non-complementary portions in the middle or at both ends to form a stable adapter.

[0017] According to some embodiments of this application, the adapter may optionally, but not necessarily, have a labeling modifying group. Labeling modifying groups are generally used for the adsorption or purification of DNA fragments. When the adapter is ligated to a DNA fragment to form a ligation product, the ligation product or its derivatives can be readily adsorbed or purified using molecules or groups that have an affinity for, or interact with, the modifying group of the adapter. For example, a ligand in a ligand-receptor interaction system can be selected as the modifying group of the adapter, and a solid phase bound to a receptor can be used to adsorb or purify the ligation product. For example, the ligand-receptor interaction system can be a typical biotin-avidin system. To achieve equivalent adsorption or purification functionality, those skilled in the art can also select other molecules or groups with strong affinity interactions, such as antibodies and antigens, amino and hydroxyl groups, etc. In some embodiments, the molecules or groups with affinity or interaction are immobilized. In some embodiments, the reaction product carrying the modifying group is purified using a solid phase, such as magnetic beads, a spin column, or a chromatographic column, on which the molecules or groups with affinity or interaction are immobilized. In some embodiments, the affinity constant between the modifying group and the molecules or groups with affinity or interaction is at least 10^5 mol / L. Preferably, the affinity constant between the modifying group and the molecule or group with which it has an affinity or interaction is 10^6 mol / L to 10^18 mol / L. More preferably, the affinity constant between the modifying group and the molecule or group with which it has an affinity or interaction is 10^8 mol / L to 10^16 mol / L.

[0018] According to some embodiments of this application, the adapter optionally, but not necessarily, has a tag sequence. In some embodiments, the tag sequence includes, but is not limited to, a unique molecular tag (UMI) sequence and a sample tag sequence. The unique molecular tag (UMI) sequence is used to count the copy number of nucleic acid molecules in a sample. The sample tag sequence is used to distinguish different samples for subsequent multi-sample pooling sequencing. For example, the tag sequence can be a barcode sequence or an index sequence. In some embodiments, the tag sequence is 3-20 nt in length; preferably, the tag sequence is 4-15 nt in length; more preferably, it is 8-10 nt.

[0019] According to another embodiment of this application, the adapters of the above embodiments are provided for the preparation or sequencing of large-fragment sequencing libraries. In some embodiments, the adapters are used to ligate DNA fragments, and then to prepare large-fragment sequencing libraries based on the resulting DNA ligation products or to further sequence the large-fragment sequencing libraries.

[0020] According to some embodiments of this application, after ligating a DNA fragment to the adapter, a DNA ligation product with uracil nucleotides at both ends is obtained. The nucleotides from the uracil nucleotide to the end of the single strand not intended to be ligated to the DNA fragment (i.e., the 5' end of the second single strand) are removed; that is, the nucleotides from the uracil nucleotide to the 5' end of the DNA ligation product are removed, forming sticky ends at both ends of the DNA product. The sequences of the two sticky ends have reverse complementary nucleotide sequence portions, which causes the DNA product to self-circularize. According to some embodiments of this application, the length of the sequence from the nucleotide (A) complementary to the uracil nucleotide to the 3' end is greater than the length of the reverse complementary nucleotide sequence portion, and the reverse complementary nucleotide sequence portion includes the nucleotide at the 3' end of the single strand. Therefore, only the 3'-terminal portions of the sticky ends pair with each other, leaving a gap on each single strand, one on either side of the joined region of the circular DNA molecule. Because gaps are thereby introduced at both ends of the initial DNA fragment sequence in the circular DNA molecule, controlled-length nick translation (controlled nick translation, CNT) can be performed on the initial DNA fragment in two directions; the circular DNA molecule is then broken at the translated gaps, yielding fragments that carry the sequences of both ends of the DNA fragment. The technical solution of this application can rapidly and accurately capture information from both ends of large fragments, facilitating subsequent genome assembly and variant detection.
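The sticky-end and gap arithmetic in this paragraph can be illustrated with a short string-level model. Everything in the sketch is an assumption for illustration: the sequences are invented, the function names are hypothetical, strands are written 5'->3', and enzymatic details (for example, how the uracil is excised) are reduced to simple string slicing.

```python
# Illustrative sketch only: sequences and function names are hypothetical.
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(_COMP)[::-1]


def remove_u_to_5prime(second_strand: str) -> str:
    """Drop everything from the 5' end through the U farthest from the 5' end."""
    idx = second_strand.rfind("U")
    if idx < 0:
        raise ValueError("strand contains no uracil")
    return second_strand[idx + 1:]


def tails_can_pair(tail_a: str, tail_b: str, k: int) -> bool:
    """True if the 3'-terminal k nt of the two exposed tails are reverse complementary."""
    if k <= 0 or len(tail_a) < k or len(tail_b) < k:
        return False
    return tail_a[-k:] == revcomp(tail_b[-k:])


def gap_nt(tail_len: int, k: int) -> int:
    """Unpaired nucleotides left on a tail when only its 3'-terminal k nt pair."""
    return tail_len - k


if __name__ == "__main__":
    # Hypothetical second strand with one U ten positions from its 5' end.
    print(remove_u_to_5prime("GCATATGCGUCAGGTTCAT"))
    # Hypothetical exposed tail: starts with the A opposite the U, ends in an 8-nt
    # portion that is its own reverse complement, so two identical tails can pair.
    tail = "ACGCATATGC"
    print(tails_can_pair(tail, tail, 8), gap_nt(len(tail), 8))
```

In this toy case the exposed tail is 10 nt and the pairing portion is 8 nt, so 2 nt remain unpaired on each strand, consistent with the 1-5 nt (preferably 2-3 nt) difference described earlier in the text.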

[0021] The following embodiments of the application of the above-described adapters for preparing or sequencing large fragment sequencing libraries are provided in several aspects. Those skilled in the art will understand that, based on the disclosed embodiments, the same scheme can be performed within a broad range of equivalents without affecting the scope of the subject matter or specific aspects described herein. Any changes, modifications, substitutions, combinations, or simplifications made without departing from the spirit and principle of this application should be considered equivalent substitutions and are included within the scope of protection of this application.

[0022] According to another embodiment of this application, a method for preparing a large-fragment sequencing library is provided, comprising the following steps:

[0023] A DNA fragment is provided, and the DNA fragment is ligated to an adapter containing uracil nucleotides to obtain a DNA ligation product with the adapters attached to both ends;

[0024] The uracil nucleotides at the ends of the DNA ligation product are removed to obtain a DNA digestion product with sticky ends at both ends; wherein the sequences of the sticky ends at both ends are complementary.

[0025] The DNA digestion product is circularized to form a circular DNA molecule with a single-strand gap. The gap in the circular DNA molecule is then translated over a controlled length (controlled nick translation). The circular DNA molecule is cut at the translated gap, and the resulting DNA fragment products are screened to obtain a large-fragment sequencing library.
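The outcome of the steps above can be pictured with a deliberately simplified, single-strand model: circularization joins the two ends of the fragment through the adapter-derived junction, and cutting a controlled distance into the fragment on each side releases a short product that carries both fragment ends. The function name `mate_pair_product`, the junction string, and the translation length `n` are hypothetical; strand identity, gap positions, and length variability are ignored.

```python
# Illustrative sketch only: a single-strand toy model with hypothetical names.
def mate_pair_product(fragment: str, junction: str, n: int) -> str:
    """Return the piece released when a circularized fragment is cut n nt
    into the fragment on each side of the adapter-derived junction.

    In the circle, the 3' end of the fragment is followed by the junction and
    then by the 5' end of the fragment, so the product reads
    (last n nt of fragment) + junction + (first n nt of fragment).
    """
    if n <= 0:
        raise ValueError("translation length must be positive")
    if 2 * n > len(fragment):
        raise ValueError("translation length exceeds half the fragment length")
    return fragment[-n:] + junction + fragment[:n]


if __name__ == "__main__":
    # Toy 16-nt 'large fragment' and a placeholder junction sequence.
    print(mate_pair_product("AAAACCCCGGGGTTTT", "NNNN", 4))
```

The product is much shorter than the original fragment yet contains sequence from both of its ends, which is the property a mate-pair style library relies on.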

[0026] According to some embodiments of this application, compared with existing large-fragment library construction methods, the experimental process is greatly simplified, with fewer experimental steps, fewer reagents, and less time required. This not only reduces costs but also improves the reproducibility and success rate of the experiment, avoiding the loss of large-fragment information caused by cumbersome operation steps. Because gaps are introduced at both ends of the initial DNA fragment sequence in the circular DNA molecule, controlled-length nick translation (controlled nick translation, CNT) can be performed on the initial DNA fragment in two directions; the circular DNA molecule is then broken at the translated gaps, yielding fragments that carry the sequences of both ends of the DNA fragment. The technical solution of this application can quickly and accurately capture information at both ends of large fragments, providing convenience for subsequent genome assembly and variant detection.

[0027] According to some embodiments of this application, the DNA fragment is longer than 1 kb. Preferably, the DNA fragment is 2-40 kb in length. More preferably, the DNA fragment is 3-8 kb in length. Even more preferably, the DNA fragment is 3-5 kb in length. The length of the provided DNA fragment depends on the span of the gene to be studied, and DNA fragments of the desired length can be screened after the DNA double strand is broken.

[0028] In some embodiments, the DNA fragment is provided by breaking the DNA double strand through sonication or enzymatic treatment. Preferably, the DNA fragment is provided by breaking the DNA double strand through enzymatic treatment. Preferably, the DNA fragment is provided by breaking the DNA double strand using a nuclease. Preferably, the nuclease includes at least one selected from deoxyribonuclease I (DNase I), deoxyribonuclease II (DNase II), micrococcal nuclease (MNase), double-strand-specific DNase (dsDNase), salt-active nuclease (SAN), and nuclease Vvn. In some embodiments, the concentration of the nuclease in the reaction system is 0.001 U / μL-1 U / μL; preferably 0.01 U / μL-0.5 U / μL; more preferably 0.04 U / μL-0.1 U / μL. In some embodiments, the enzymatic reaction is carried out at 10-50°C for 5-60 min; preferably at 15-45°C for 10-50 min; more preferably at 25-40°C for 10-40 min; and even more preferably at 37°C for 15-30 min. Those skilled in the art can reasonably select known nucleases, or obtain enzymes with identical or similar functions as substitutes. Those skilled in the art can also select an appropriate enzyme dosage, buffer solution, cofactors, and salt ions, and adjust the reaction temperature and time, as well as the pH value and concentration of each component of the reaction system, according to the specific type of nuclease used and the desired effect.
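As a worked example of the concentration ranges above, the amount of nuclease needed for a reaction is the final concentration multiplied by the reaction volume. The helper below is a minimal sketch: the function name `nuclease_units` is hypothetical, the 50 μL reaction volume in the example is an assumed value that does not come from the application, and only the broadest stated range (0.001-1 U/μL) is checked.

```python
# Illustrative sketch only: function name and the 50 uL example volume are assumptions.
def nuclease_units(conc_u_per_ul: float, volume_ul: float) -> float:
    """Units of nuclease needed for a given final concentration and reaction volume."""
    if not 0.001 <= conc_u_per_ul <= 1.0:
        raise ValueError("concentration outside the broadest stated range (0.001-1 U/uL)")
    if volume_ul <= 0:
        raise ValueError("reaction volume must be positive")
    return conc_u_per_ul * volume_ul


if __name__ == "__main__":
    # More-preferred range 0.04-0.1 U/uL in an assumed 50 uL reaction.
    for conc in (0.04, 0.1):
        print(f"{conc} U/uL x 50 uL = {nuclease_units(conc, 50.0):.1f} U")
```

Under the assumed 50 μL volume, the more-preferred concentration range corresponds to roughly 2 to 5 units of enzyme per reaction.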

[0029] According to some embodiments of this application, the provided DNA fragment has single-stranded sticky ends. In some embodiments, the DNA fragment has a 5' single-stranded sticky end.

[0030] In some embodiments, the DNA fragment is end-repaired before being ligated to the adapter. In some embodiments, DNA fragments provided by enzymatically breaking the DNA double strand can be used for end-repair without purification. Therefore, DNA double-strand breaking and end-repair can be performed in the same tube or reactor.

[0031] In some embodiments, a nucleic acid polymerase is used to repair the sticky single-stranded ends of a DNA fragment. Preferably, the nucleic acid polymerase has 5'-3' DNA polymerase activity. For the 5' sticky single-stranded ends of the DNA fragment, a nucleic acid polymerase with 5'-3' DNA polymerase activity is used to extend the 3' single-stranded ends using the 5' sticky single-stranded ends as a template to repair the 3' single strand.

[0032] In some embodiments, the nucleic acid polymerase used for end repair is a DNA-dependent DNA polymerase. In some preferred embodiments, the DNA-dependent DNA polymerase includes at least one selected from T4 DNA polymerase, DNA polymerase I large fragment (Klenow fragment), T7 DNA polymerase, DNA polymerase I, Taq DNA polymerase, Bst DNA polymerase, and Phi29 DNA polymerase. In some embodiments, the concentration of the nucleic acid polymerase in the reaction system is 0.001 U / μL-5 U / μL; preferably 0.01 U / μL-1 U / μL; more preferably 0.1 U / μL-0.5 U / μL. In some embodiments, the end repair reaction is carried out at 10-50°C for 5-60 min; preferably at 20-45°C for 10-50 min; more preferably at 25-40°C for 10-40 min; and even more preferably at 37°C for 15-30 min. Those skilled in the art can reasonably select known nucleic acid polymerases, or obtain enzymes with identical or similar functions as substitutes. Furthermore, those skilled in the art can select an appropriate enzyme dosage, buffer solution, cofactors, salt ions, and dNTPs, and adjust the reaction temperature, time, pH value, and concentration of each component in the reaction system, based on the specific type of nucleic acid polymerase used and the desired effect.

[0033] In some embodiments, the DNA fragment has protruding sticky ends. Preferably, the 3' end of the DNA fragment has a protruding end. Preferably, the end of the DNA fragment has one or more protruding A bases. In some embodiments, the 3' end of the DNA fragment has one or more protruding A bases.

[0034] In some embodiments, protruding sticky ends are added to DNA fragments via end repair. In some embodiments, protruding sticky ends are added to DNA fragments using nucleic acid polymerases. Adding specific protruding sticky ends to DNA fragments, such as adding an A base to the 3' end of a DNA fragment to form a protruding sticky end, facilitates subsequent ligation with adapters designed to have a protruding sticky end with a T base. However, those skilled in the art can also add sticky ends of one or more other types of bases to DNA fragments using other known techniques and employ adapters whose sticky ends have bases complementary to the added bases, such as adding polyG to the 3' end of a DNA fragment and correspondingly designing the adapter to have a polyC sticky end.

[0035] In some specific embodiments, at least one of T4 DNA polymerase, Klenow Fragment, and Taq DNA polymerase is used for end repair. T4 DNA polymerase, in the presence of a template and dNTPs, catalyzes the template-directed addition of complementary deoxynucleotides to the 3'-OH end of the DNA strand. The Klenow Fragment is an N-terminally truncated form of DNA polymerase I; it retains the 5'→3' polymerase and 3'→5' exonuclease activities of DNA polymerase I but lacks the 5'→3' exonuclease activity of the complete enzyme. Taq DNA polymerase is a thermostable enzyme that synthesizes DNA from a single-stranded template in the presence of dNTPs. It possesses both 5'→3' polymerase and 5'→3' exonuclease activities, and in the presence of dATP it can add an A base to the 3' end of the DNA fragment. Adding an A base to the 3' end of the DNA fragment to form a protruding sticky end facilitates subsequent ligation with a double-stranded adapter having a protruding T sticky end. The Taq DNA polymerase is preferably recombinant Taq DNA polymerase (rTaq DNA polymerase), which is particularly suitable for difficult templates and / or GC-rich templates. However, those skilled in the art can also add sticky ends of one or more other types of bases to DNA fragments using other known techniques and employ double-stranded adapters whose sticky ends have bases complementary to the added bases, such as adding polyG to the 3' end of the DNA and correspondingly designing the double-stranded adapter to have polyC sticky ends.

[0036] In some embodiments of this application, before ligating the DNA fragment to the adapter, in order to avoid undesirable self-ligation between DNA fragments, or to improve the ligation efficiency between the DNA fragment and the adapter in subsequent ligation steps, the DNA fragment to be ligated may be dephosphorylated or phosphorylated.

[0037] In some embodiments, to avoid unintended self-ligation between DNA fragments, phosphatases are used to remove the phosphate groups at the 5' and / or 3' ends of the DNA fragments, resulting in dephosphorylated DNA fragments. In some embodiments, to remove the phosphate groups at the 5' and 3' ends of the DNA fragments, phosphatases with catalytic 5' end dephosphorylation activity or 3' end dephosphorylation activity can be selected to remove the phosphate groups at the 5' and 3' ends of the DNA fragments sequentially or simultaneously. Alternatively, phosphatases with both catalytic 5' end dephosphorylation activity and 3' end dephosphorylation activity can be selected to remove the phosphate groups at the 5' and 3' ends of the DNA fragments simultaneously.

[0038] In some embodiments, the phosphatase may be an acid phosphatase or an alkaline phosphatase. Preferably, the phosphatase is [specific enzyme name missing]. More preferably, the alkaline phosphatase is a nonspecific phosphomonoesterase. Nonspecific phosphomonoesterases can catalyze the hydrolysis of the 5'-phosphate and 3'-phosphate groups of almost all phosphate monoesters. In some embodiments, the phosphatase may be at least one of bacterial alkaline phosphatase (BAP), shrimp alkaline phosphatase (SAP), calf intestinal alkaline phosphatase (CIAP), placental alkaline phosphatase (PLAP), and secretory alkaline phosphatase (SEAP). In some more preferred embodiments, the phosphatase may be a wild-type enzyme or a recombinant enzyme. Compared to wild-type enzymes, recombinant enzymes obtained through recombinant expression do not contain affinity tags or other modifications found in wild-type or natural enzymes.

[0039] In some embodiments, the phosphatase may be T4 polynucleotide kinase (T4 PNK). T4 PNK is a polynucleotide 5'-hydroxyl kinase that catalyzes the transfer of the γ-phosphate group of ATP to the 5'-hydroxyl terminus of an oligonucleotide chain (double-stranded or single-stranded DNA) and to a nucleoside 3'-monophosphate. T4 PNK also catalyzes the reverse of the phosphorylation reaction and exhibits 3'-terminal phosphatase activity, catalyzing the hydrolysis of the 3'-phosphate group from the 3'-phosphate terminus of an oligonucleotide, a deoxynucleoside 3'-monophosphate, or a deoxynucleoside 3'-diphosphate. When ADP is present, T4 PNK also exhibits 5'-terminal phosphatase activity, catalyzing the exchange of the 5'-phosphate group between a 5'-phosphorylated oligonucleotide / polynucleotide and ATP. Those skilled in the art can rationally select existing known phosphatases, or extract enzymes with identical or similar functionality as substitutes. Those skilled in the art can also select an appropriate enzyme dosage, buffer solution, cofactors, and salt ions, and adjust the reaction temperature and time, as well as the pH value and the concentration of each component in the reaction system, based on the specific type of phosphatase used and the desired effect.

[0040] In some embodiments, the 5' end of the DNA fragment is phosphorylated to facilitate ligation between the DNA fragment and the adapter. DNA fragments formed by breaking the DNA double strand using endonucleases typically have a phosphate group at the 5' end and a hydroxyl group at the 3' end. To ensure ligation between the DNA fragment and the adapter, the DNA fragment can be phosphorylated using a phosphokinase. In some embodiments, the phosphokinase is a polynucleotide 5'-hydroxyl kinase, which is used to phosphorylate the 5' end of the DNA fragment. Preferably, the polynucleotide 5'-hydroxyl kinase is T4 polynucleotide kinase (T4 PNK). T4 PNK catalyzes the transfer of the phosphate group of ATP to the 5'-hydroxyl end of the oligonucleotide chain (double-stranded or single-stranded DNA). In some embodiments, the concentration of the phosphokinase in the reaction system is 0.001 U / μL-5 U / μL; preferably 0.01 U / μL-1 U / μL; more preferably 0.1 U / μL-0.5 U / μL. In some embodiments, the phosphorylation reaction at the 5' end of the DNA fragment is carried out at 10-50°C for 5-60 min; preferably at 15-45°C for 10-50 min; more preferably at 25-40°C for 10-40 min; and even more preferably at 37°C for 15-30 min. Those skilled in the art can rationally select existing known phosphokinases, or can extract enzymes with completely identical or similar functions as substitutes. Those skilled in the art can also select an appropriate enzyme dosage, buffer solution, cofactors, and salt ions, and adjust the reaction temperature and time, as well as the pH value and the concentration of each component of the reaction system, according to the specific type of phosphokinase used and the desired effect.
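The nested temperature and time ranges given for the phosphorylation reaction can be expressed as a small lookup. The sketch below is illustrative only; the tier labels and the function name `classify_condition` are hypothetical, while the numeric ranges are taken from the paragraph above.

```python
# (label, (min temp, max temp) in degrees C, (min time, max time) in minutes),
# ordered from the narrowest range to the broadest.
TIERS = [
    ("even more preferred", (37, 37), (15, 30)),
    ("more preferred", (25, 40), (10, 40)),
    ("preferred", (15, 45), (10, 50)),
    ("disclosed", (10, 50), (5, 60)),
]


def classify_condition(temp_c: float, minutes: float) -> str:
    """Return the narrowest tier that contains the given temperature and time."""
    for label, (t_lo, t_hi), (m_lo, m_hi) in TIERS:
        if t_lo <= temp_c <= t_hi and m_lo <= minutes <= m_hi:
            return label
    return "outside disclosed ranges"


if __name__ == "__main__":
    for temp, time_min in [(37, 20), (30, 20), (45, 50), (10, 5), (60, 10)]:
        print(temp, time_min, classify_condition(temp, time_min))
```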

[0041] According to some embodiments of this application, a DNA fragment is ligated to a linker containing a uracil nucleotide. In some embodiments, the uracil nucleotide is a deoxyuracil nucleotide (dUMP). In some embodiments, the linker is a double-stranded linker comprising a first single strand and a second single strand, wherein the first single strand is ligated to the 3' end of the DNA fragment, and the second single strand is ligated to the 5' end of the DNA fragment.

[0042] In some embodiments, the 5' end of the adapter is phosphorylated to facilitate ligation between the adapter and the DNA fragment. In some embodiments, the 3' end of the adapter has a protruding end. In some preferred embodiments, the 5' end of the first single strand of the adapter is phosphorylated, and the 3' end of the second single strand of the adapter has a protruding end. Preferably, the 3' protruding end of the adapter is complementary to the 3' protruding end of the DNA fragment, thereby enabling the adapter to recognize and ligate with the DNA fragment. For example, when the 3' protruding end of the DNA fragment is one or more A bases, the 3' protruding end of the adapter is designed to be the corresponding number of T bases; when the 3' protruding end of the DNA fragment is polyG, the 3' protruding end of the adapter is designed to be polyC. In some embodiments, to prevent the end of the adapter that is not intended to ligate with the DNA fragment from doing so, the 5' end at that non-ligation end may lack a phosphate group (e.g., carry a hydroxyl group instead), and / or that end may be designed as a protruding end that is not complementary to the end of the DNA fragment. For example, the 5' end of the second single strand of the adapter may lack a phosphate group (e.g., carry a hydroxyl group instead), and / or the 3' end of the first single strand and the 5' end of the second single strand may be designed to be non-complementary (e.g., forming a Y-shaped structure).

[0043] According to some embodiments of this application, the adapter has a uracil nucleotide on one of its single strands. In some embodiments, the uracil nucleotide is located on the single strand (second single strand) of the adapter intended to connect to the 5' end of a DNA fragment. In some embodiments, the uracil nucleotide and the 3' overhang of the adapter are on the same single strand. In some embodiments, the number of uracil nucleotides is 1-10; preferably, 1-5; more preferably, 1. In some embodiments, the length from the position of the uracil nucleotide to the end of its single strand not intended to connect to a DNA fragment (i.e., the 5' end of the second single strand) is 3-20 nt; preferably, 5-15 nt. When more than one uracil nucleotide is present, the length from the uracil nucleotide to the 5' end is calculated from the uracil nucleotide farthest from the 5' end.
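The length rule for the uracil position can be checked programmatically. The following is a minimal sketch under stated assumptions: the strand is written 5'→3', the length is counted inclusive of the uracil itself (the text does not say whether the uracil is counted), and the example sequence and function names are hypothetical.

```python
def uracil_distance_to_5prime(second_strand: str) -> int:
    """Length from the 5' end to the uracil farthest from the 5' end.

    Assumption: the uracil itself is included in the count.
    """
    seq = second_strand.upper()
    last_u = seq.rfind("U")  # farthest uracil when several are present
    if last_u < 0:
        raise ValueError("strand contains no uracil nucleotide")
    return last_u + 1


def within_range(length_nt: int, low: int = 3, high: int = 20) -> bool:
    """Check a length against the 3-20 nt range (use 5, 15 for the preferred range)."""
    return low <= length_nt <= high


if __name__ == "__main__":
    example = "ACGTACGTUACGTT"  # hypothetical second strand, 5'->3'
    length = uracil_distance_to_5prime(example)
    print(length, within_range(length), within_range(length, 5, 15))
```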

[0044] In some embodiments, the sequences of the two single strands of the adapter may be completely complementary or partially complementary. In some embodiments, the sequences of the two single strands of the adapter may be partially complementary at either end or both ends. Preferably, the two single strands of the adapter are complementary at one end intended to connect to the DNA fragment (i.e., the adapter has a Y-shaped structure). In some embodiments, the two single strands of the adapter in the middle portion are not complementary (i.e., the adapter has a bubble-like structure).

[0045] In some embodiments, on the single strand (first single strand) opposite the single strand containing the uracil nucleotide (second single strand), the sequence from the nucleotide (A) complementary to the uracil nucleotide to the end (i.e., the 3' end) of that single strand contains an inversely complementary nucleotide sequence portion. Inverse complementarity means that when this nucleotide sequence portion is flipped from the 5' end to the 3' end, the flipped sequence can form a complementary pair with the nucleotide sequence portion itself. In some embodiments, the inversely complementary nucleotide sequence portion includes the nucleotide at the 3' end of the single strand (first single strand). According to some embodiments of this application, after ligating the DNA fragment to the adapter, a DNA ligation product with uracil nucleotides at both ends is obtained. The nucleotides from the uracil nucleotide to the end of its single strand not intended to be ligated to the DNA fragment (i.e., toward the 5' end of the second single strand) are removed; that is, the nucleotides from the uracil nucleotide to the 5' end of the DNA ligation product are removed, forming sticky ends at both ends of the DNA product. The sequences at both sticky ends have inversely complementary nucleotide sequence portions, thereby causing the DNA product to self-circularize (see Figure 1).
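Inverse complementarity as defined above is equivalent to a sequence being equal to its own reverse complement. A minimal sketch of that check follows; the function names and the example sequences are hypothetical and are not taken from the application.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse the order (sequence read 5'->3')."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def is_inversely_complementary(seq: str) -> bool:
    """True if two copies of the sequence can pair with each other antiparallel."""
    return seq.upper() == reverse_complement(seq)


if __name__ == "__main__":
    print(is_inversely_complementary("GCGATCGC"))  # self-pairing 8-nt sequence
    print(is_inversely_complementary("GATTACA"))   # not self-pairing
```

Because both sticky ends of the ligation product come from the same adapter design, a sequence that passes this check on one end can pair with the identical sequence on the other end.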

[0046] In some embodiments, the length from the uracil complementary nucleotide (A) to its 3' end is 3-20 nt; preferably, 5-15 nt; more preferably, 8-10 nt. When more than one uracil nucleotide is present, the length from the uracil complementary nucleotide to its 3' end is calculated based on the furthest distance. In some embodiments, the length of the inversely complementary nucleotide sequence portion is 3-20 nt; preferably, 5-15 nt; more preferably, 8-10 nt. In some embodiments, on the single strand (first single strand) opposite the single strand containing the uracil nucleotide (second single strand), the length of the sequence from the uracil complementary nucleotide (A) to its 3' end is greater than the length of the inversely complementary nucleotide sequence portion. In some embodiments, the length from the uracil complementary nucleotide (A) to its 3' end is 1-5 nt longer than the length of the inversely complementary nucleotide sequence portion; preferably, 2-3 nt longer. According to some embodiments of this application, the extra nucleotides are used to form a single-strand gap on the circular molecule during DNA product cyclization (see Figure 1).
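The relationship between the sticky-end length, the inversely complementary portion, and the resulting gap is simple arithmetic, sketched below. The example overhang and the function names are hypothetical; the sketch assumes the inversely complementary portion is the longest self-pairing stretch ending at the 3' end.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def longest_self_pairing_suffix(overhang: str) -> str:
    """Longest 3'-terminal stretch that equals its own reverse complement."""
    seq = overhang.upper()
    for start in range(len(seq)):
        suffix = seq[start:]
        if suffix == reverse_complement(suffix):
            return suffix
    return ""


def gap_length(overhang: str) -> int:
    """Unpaired nucleotides left on each strand after the 3' ends anneal."""
    return len(overhang) - len(longest_self_pairing_suffix(overhang))


if __name__ == "__main__":
    # Hypothetical overhang: starts with the A opposite the uracil, 10 nt in total.
    overhang = "ATGCGATCGC"
    print(longest_self_pairing_suffix(overhang), gap_length(overhang))
```

With this hypothetical 10 nt overhang and an 8 nt self-pairing portion, the gap is 2 nt, which falls in the preferred 2-3 nt range.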

[0047] In some embodiments, the two single strands of the adapter may have the same number of bases or different numbers of bases. In some embodiments, the two single strands of the adapter may each be 20-50 nt in length; preferably, 25-40 nt. In some embodiments, the length of the non-complementary portion between the two single strands of the adapter may be 10-30 nt; preferably, 15-25 nt. In some embodiments, the total length of the complementary portion between the two single strands of the adapter may be 10-30 bp; preferably, 15-25 bp. Those skilled in the art can design the specific sequences of the two single strands of the adapter and adjust the length of the complementary or non-complementary portions in the middle or at both ends to form a stable adapter. In some embodiments, the concentration of the adapter in the ligation reaction system is 100 nmol / L-10 μmol / L; preferably, 200 nmol / L-1 μmol / L; more preferably, 300 nmol / L-600 nmol / L.
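The adapter concentration ranges translate into molar amounts and pipetting volumes by ordinary dilution arithmetic. The sketch below is illustrative only; the 50 μL reaction volume and the 10 μmol / L stock are hypothetical values chosen for the example, not values from the application.

```python
def adapter_pmol(final_nmol_per_l: float, reaction_ul: float) -> float:
    """Amount of adapter in the reaction, in pmol.

    nmol/L x uL = 1e-6 nmol = 1e-3 pmol, hence the division by 1000.
    """
    return final_nmol_per_l * reaction_ul / 1000.0


def stock_volume_ul(final_nmol_per_l: float, reaction_ul: float,
                    stock_nmol_per_l: float) -> float:
    """Volume of stock to add, from C1 x V1 = C2 x V2."""
    return final_nmol_per_l * reaction_ul / stock_nmol_per_l


if __name__ == "__main__":
    # Hypothetical 50 uL ligation at 300 nmol/L from a 10 umol/L (10000 nmol/L) stock.
    print(adapter_pmol(300, 50))
    print(stock_volume_ul(300, 50, 10_000))
```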

[0048] According to some embodiments of this application, the adapter may optionally, but not necessarily, have a labeled modifying group. Labeled modifying groups are generally used for the adsorption or purification of DNA fragments. When the adapter is ligated to a DNA fragment to form a ligation product, the ligation product or its derivatives can be readily adsorbed or purified using molecules or groups that have an affinity or interaction with the modifying group of the adapter. For example, a ligand in a ligand-receptor interaction system can be selected as the modifying group of the adapter, and a solid phase bound to a receptor can be used to adsorb or purify the ligation product. For example, the ligand-receptor interaction system can be a typical biotin-avidin system. To achieve equivalent adsorption or purification functionality, those skilled in the art can also select other molecules or groups with strong affinity interactions, such as antibodies and antigens, amino and hydroxyl groups, etc. In some embodiments, the molecules or groups with affinity or interaction are immobilized. In some embodiments, the reaction product with the modifying group is purified using a solid phase, such as magnetic beads, a centrifuge column, or a chromatographic column, on which molecules or groups with affinity or interaction are immobilized. In some embodiments, the affinity constant between the modifying group and the molecule or group with which it has an affinity or interaction is at least 10^5 mol / L. Preferably, the affinity constant is 10^6 mol / L to 10^18 mol / L. More preferably, the affinity constant is 10^8 mol / L to 10^16 mol / L.

[0049] According to some embodiments of this application, the adapter optionally, but not necessarily, has a tag sequence. In some embodiments, the tag sequence includes, but is not limited to, a unique molecular tag (UMI) sequence and a sample tag sequence. The unique molecular tag (UMI) sequence is used to count the copy number of nucleic acid molecules in a sample. The sample tag sequence is used to distinguish different samples for subsequent multi-sample pooling sequencing. For example, the tag sequence can be a barcode sequence or an index sequence. In some embodiments, the tag sequence is 3-20 nt in length; preferably, the tag sequence is 4-15 nt in length; more preferably, it is 8-10 nt.
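How the two tag types are used downstream can be sketched as follows: the sample tag groups reads by sample, and distinct UMIs within a sample approximate the number of original molecules. This is a simplified illustration; the read tuples are made-up placeholder data and the function name `count_molecules` is hypothetical. Real pipelines also handle sequencing errors in the tags, which is omitted here.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per sample tag.

    `reads` is an iterable of (sample_tag, umi) pairs, one per sequenced read.
    """
    umis_by_sample = defaultdict(set)
    for sample_tag, umi in reads:
        umis_by_sample[sample_tag].add(umi)
    return {sample: len(umis) for sample, umis in umis_by_sample.items()}


if __name__ == "__main__":
    # Placeholder reads: duplicates of the same UMI count as one molecule.
    reads = [
        ("ACGTACGT", "AAAATTTT"),
        ("ACGTACGT", "AAAATTTT"),
        ("ACGTACGT", "CCCCGGGG"),
        ("TTGGCCAA", "AAAATTTT"),
    ]
    print(count_molecules(reads))
```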

[0050] In some embodiments, a DNA ligase is used to ligate the DNA fragment to the adapter. In some embodiments of this application, the DNA ligase includes at least one of ATP-dependent DNA ligase and NAD+-dependent DNA ligase. In some preferred embodiments, the DNA ligase includes at least one of T4 DNA ligase, Taq DNA ligase, DNA ligase I, DNA ligase III, and DNA ligase IV. In some embodiments, the concentration of the DNA ligase in the ligation reaction system is 0.5 U / μL-50 U / μL; preferably 1 U / μL-40 U / μL; more preferably 2 U / μL-20 U / μL. In some embodiments, the ligation reaction is carried out at 10-40°C for 5-60 min; preferably at 16-25°C for 10-40 min. Those skilled in the art can reasonably select existing known DNA ligases, or can extract enzymes with completely identical or similar functions as substitutes. Those skilled in the art can also select appropriate enzyme dosage, buffer solution, coenzyme factor, salt ions, and adjust reaction temperature and time, as well as pH value and concentration of each component in the reaction system, based on the specific type of DNA ligase used and the desired effect.

[0051] According to some embodiments of this application, the removal of the nucleotides from the uracil nucleotide to the end of the DNA ligation product comprises the following steps: (1) forming a single nucleotide gap at the position of the uracil nucleotide, and (2) removing the nucleotides from the single nucleotide gap to the end. In some embodiments, the nucleotides from the uracil nucleotide to the 5' end of the single strand are removed.

[0052] In some embodiments, a uracil cleavage enzyme is used to create a single nucleotide gap at the uracil nucleotide position. In some embodiments, creating a single nucleotide gap at the uracil nucleotide position includes excising the uracil base of the uracil nucleotide and breaking the 3' and 5' phosphodiester bonds of the deoxyribose at the abasic site. In some embodiments, a single nucleotide gap is created at the uracil nucleotide position by using uracil DNA glycosylase (UDG) and a DNA lyase simultaneously or sequentially. Uracil DNA glycosylase (UDG) catalyzes the excision of the uracil base, forming an abasic site while maintaining the integrity of the phosphodiester backbone. The lyase activity of a DNA lyase, such as endonuclease VIII, breaks the 3' and 5' phosphodiester bonds at the abasic site and releases the base-free deoxyribose, thereby creating a single nucleotide gap at the uracil nucleotide position. In some embodiments, a uracil cleavage enzyme that combines both activities can also be used to form a single nucleotide gap at the uracil nucleotide position. For example, the USER (Uracil-Specific Excision Reagent) enzyme combines the activities of both enzymes mentioned above. In some embodiments, the concentration of the uracil cleavage enzyme in the reaction system is 0.001 U / μL-5 U / μL; preferably 0.01 U / μL-0.5 U / μL; more preferably 0.01 U / μL-0.05 U / μL. In some embodiments, the reaction forming the single nucleotide gap is carried out at 16-50°C for 5-60 min; preferably, at 25-40°C for 10-40 min. Those skilled in the art can reasonably select existing known uracil cleavage enzymes, or can extract enzymes with completely identical or similar functions for substitution. Those skilled in the art can also select an appropriate enzyme dosage, buffer solution, cofactors, and salt ions, and adjust the reaction temperature and time, as well as the pH value and the concentration of each component of the reaction system, according to the specific type of uracil cleavage enzyme used and the desired effect.
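The effect of uracil excision on the uracil-containing strand can be modeled as a string operation. The sketch below is a simplified illustration, not an enzymatic model: the strand is assumed to be written 5'→3', the example sequence is hypothetical, and `excise_at_uracil` is a hypothetical helper name.

```python
def excise_at_uracil(strand: str):
    """Split a uracil-containing strand at its uracil position(s).

    Returns (released_5prime_segment, retained_3prime_segment). The uracil
    nucleotide itself is lost, leaving a single nucleotide gap. When several
    uracils are present, any short pieces between them are ignored here.
    """
    seq = strand.upper()
    first_u = seq.find("U")
    if first_u < 0:
        raise ValueError("strand contains no uracil nucleotide")
    last_u = seq.rfind("U")
    released = seq[:first_u]      # 5' side of the gap, detached in a later step
    retained = seq[last_u + 1:]   # 3' side, still joined to the DNA fragment
    return released, retained


if __name__ == "__main__":
    released, retained = excise_at_uracil("ACGTACGTUACGTT")
    print(released, retained)
```

Note that the released 5' segment remains base-paired to the opposite strand after excision; it only detaches in the subsequent removal step.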

[0053] In some embodiments, the nucleotides from the single nucleotide gap to the end are removed by physical / chemical or enzymatic methods. In some embodiments, the nucleotides from the single nucleotide gap to the end are removed by heat denaturation or alkaline denaturation. By controlling the high-temperature conditions of physical denaturation or the alkaline system of chemical denaturation, the hydrogen bonds between the two strands in the region from the single nucleotide gap to the end are broken, thereby detaching the single strand from the single nucleotide gap to the end, while reducing or avoiding the impact on the double-stranded large fragment of the DNA ligation product. In some embodiments, the single strand from the single nucleotide gap to the end is detached by treatment at 65-80°C for 5-20 min; preferably, at 70-75°C for 10-15 min.
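One way to see why moderate heat can detach the short terminal strand while sparing the long duplex is a rough melting-temperature estimate. The sketch below uses the Wallace rule (Tm ≈ 2 °C per A/T plus 4 °C per G/C), a textbook approximation for short oligonucleotides that is not part of the application; the example sequence is hypothetical, and real melting behavior depends on salt and strand concentrations.

```python
def wallace_tm(seq: str) -> int:
    """Rough melting temperature (degrees C) of a short oligonucleotide duplex.

    Wallace rule: 2 degrees per A/T pair plus 4 degrees per G/C pair.
    Only a coarse estimate, and only meaningful for short sequences.
    """
    s = seq.upper()
    at_pairs = s.count("A") + s.count("T")
    gc_pairs = s.count("G") + s.count("C")
    return 2 * at_pairs + 4 * gc_pairs


if __name__ == "__main__":
    # Hypothetical 8-nt terminal segment on the 5' side of the gap.
    print(wallace_tm("ACGTACGT"))
```

For this hypothetical 8 nt segment the estimate is far below the 65-80°C treatment range, consistent with the short strand dissociating while the long duplex stays paired.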

[0054] According to some embodiments of this application, the nucleotides from the uracil nucleotide to the ends of the DNA ligation product are removed to obtain a DNA digestion product with sticky ends at both ends. In some embodiments, each sticky end of the DNA digestion product contains an inversely complementary nucleotide sequence portion, which serves as the circularization linking region of the DNA circular molecule. The sticky ends of the DNA digestion product originate from the linker: each is the nucleotide sequence on the first single strand (the strand opposite the uracil-containing second single strand) from the nucleotide (A) complementary to the uracil nucleotide to the end (i.e., the 3' end) of that strand. By designing the specific sequence of the linker, each single-stranded sticky end contains an inversely complementary nucleotide sequence portion. Therefore, these inversely complementary nucleotide sequence portions from the linker serve as the linking region for the self-circularization of the DNA fragment, allowing the DNA digestion product with sticky ends to undergo self-circularization to form a DNA circular molecule. In some embodiments, the linking region of the DNA circular molecule has a gap on each of the two single strands. Because the length from the uracil complementary nucleotide (A) to its 3' end is greater than the length of the inversely complementary nucleotide sequence portion, and the inversely complementary nucleotide sequence portion includes the nucleotide at the 3' end of its respective single strand, the sticky ends pair only at their 3' ends, leaving one gap on each of the two single strands, on either side of the linking region of the DNA circular molecule. In some embodiments, the length of each gap is 1-5 nt; preferably, 2-3 nt.

[0055] In some embodiments, to improve the specificity of complementary ligation at the sticky ends of the DNA digestion product, the cyclization reaction may optionally omit the use of DNA ligase. In some embodiments, the sticky ends of the DNA digestion product already possess sufficient complementary ligation specificity, for example, when the sticky ends of the DNA digestion product have appropriately long inversely complementary nucleotide sequence portions. Optionally, the cyclization reaction may use DNA ligase to improve cyclization efficiency. Those skilled in the art can rationally select existing known DNA ligases, or can extract enzymes with completely identical or similar functions as substitutes. Those skilled in the art can also select an appropriate enzyme dosage, buffer solution, cofactors, and salt ions, and adjust the reaction temperature and time, as well as the pH value and the concentration of each component of the reaction system, depending on whether a DNA ligase is used, the specific type of DNA ligase used, and the desired effect.

[0056] According to some embodiments of this application, circular DNA molecules are purified. In some embodiments, purification of circular DNA molecules includes digesting linear DNA molecules and recovering the circular DNA molecules. In some embodiments, linear DNA molecules are digested by physical / chemical methods or enzymatic methods. In some embodiments, linear DNA molecules are digested by enzymatic methods. In some embodiments, linear DNA molecules are digested by DNases. Taking advantage of the greater resistance of circular DNA molecules compared to linear DNA molecules, the linear DNA molecules are hydrolyzed or cleaved by controlling the digestion reaction system of physical / chemical methods or enzymatic methods, while reducing or avoiding the impact on the circular DNA molecules. There are various ways to digest linear DNA molecules without affecting the circular DNA molecules, and those skilled in the art can determine this by adjusting the reaction conditions. For example, by adjusting the temperature, time, and reaction system (e.g., enzyme dosage, ion concentration), the digested mixture is screened to determine the reaction conditions that efficiently digest linear DNA molecules while avoiding affecting the circular DNA molecules.

[0057] In some embodiments, nucleases, including endonucleases and exonucleases, are used to digest linear DNA molecules. In some preferred embodiments, exonucleases are used to digest linear DNA molecules. In some embodiments of this application, the exonucleases include ATP-dependent deoxyribonucleases (ATP-Dependent DNases). In some preferred embodiments, the concentration of the exonuclease in the reaction system is 0.001 U / μL-5 U / μL; preferably 0.01 U / μL-0.5 U / μL; more preferably 0.01 U / μL-0.05 U / μL. In some embodiments, the digestion reaction is carried out at 10-50°C for 5-60 min; preferably at 15-45°C for 10-50 min; more preferably at 25-40°C for 10-40 min; and even more preferably at 37°C for 15-30 min. Those skilled in the art can reasonably select existing known exonucleases, or can extract enzymes with completely identical or similar functions for substitution. Those skilled in the art can also select appropriate enzyme dosage, buffer solution, coenzyme factor, salt ions, and adjust reaction temperature and time, as well as pH value and concentration of each component in the reaction system, based on the specific type of exonuclease used and the desired effect.
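Converting a concentration range in U / μL into the amount of enzyme for a given reaction is a single multiplication. The sketch below is illustrative; the 50 μL reaction volume is a hypothetical example and `enzyme_units` is a hypothetical helper name.

```python
def enzyme_units(conc_u_per_ul: float, reaction_ul: float) -> float:
    """Total enzyme units needed for a reaction of the given volume."""
    return conc_u_per_ul * reaction_ul


if __name__ == "__main__":
    reaction_ul = 50  # hypothetical reaction volume
    # More preferred exonuclease range from the text: 0.01-0.05 U/uL.
    low = enzyme_units(0.01, reaction_ul)
    high = enzyme_units(0.05, reaction_ul)
    print(round(low, 3), round(high, 3))
```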

[0058] In some embodiments, circular DNA molecules are recovered. Those skilled in the art are familiar with many methods for purifying and recovering nucleic acids, including but not limited to the phenol / chloroform method, centrifugation column method, gel extraction method, and magnetic bead method. Those skilled in the art can rationally select an appropriate nucleic acid purification method based on the desired specific effect and the characteristics of the sample.

[0059] The phenol / chloroform method is one of the most common DNA purification methods. This method utilizes the different partitioning of DNA and impurities between the aqueous phase and the organic (phenol / chloroform) phase to separate DNA molecules from other impurities. An equal volume of phenol / chloroform mixture is added to a solution containing DNA molecules, mixed thoroughly, and then centrifuged to separate the layers. The upper aqueous layer containing the DNA is then collected, and an equal volume of isopropanol is added to precipitate the DNA molecules.

[0060] The centrifugation column method is also a rapid and simple DNA purification method. This method utilizes the special structure of the centrifugation column to separate DNA molecules from other impurities. A DNA solution is added to the centrifugation column, and the DNA molecules are separated from other impurities by centrifugation. The DNA molecules in the centrifugation column are then washed and eluted to obtain purified DNA molecules.

[0061] Gel extraction is a commonly used DNA recovery technique, primarily used to extract target DNA fragments from agarose gels. The agarose gel containing the target fragment is excised under ultraviolet light, avoiding sections of gel that do not contain the target fragment as much as possible. The excised gel is then added to a gel-dissolving solution and heated to melt the gel. The DNA fragments in the dissolved gel solution are adsorbed and eluted, yielding purified DNA molecules.

[0062] Magnetic bead purification is a highly efficient and automatable method for DNA purification. It utilizes the specific binding of affinity molecules on the surface of magnetic beads to DNA molecules, separating them from other impurities. The specific steps are as follows: first, cells are lysed, releasing DNA molecules into the solution; then, magnetic beads are added, allowing them to bind to the DNA molecules; finally, magnetic force is used to separate the DNA-bound magnetic beads from the impurities, and the DNA is eluted from the beads, yielding purified DNA. Furthermore, by adjusting the ratio of magnetic beads to DNA fragment sample, DNA fragments of specific sizes can be adsorbed. In some cases, two rounds of magnetic bead adsorption can be used to select DNA fragments of specific sizes from the sample: in the first round, the magnetic beads adsorb larger fragments (above the target size), the beads are discarded, and the supernatant (containing the target fragment) is retained; in the second round, the magnetic beads adsorb the target fragment, the supernatant is discarded, and the target fragment is eluted from the beads for recovery.
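The logic of the two-round selection can be sketched as two successive filters. This is a deliberately idealized model: it assumes each bead ratio corresponds to a sharp size cutoff above which fragments bind, which real beads only approximate, and the cutoff values, fragment sizes, and function name are hypothetical.

```python
def two_round_size_selection(fragment_sizes, upper_cutoff_bp, lower_cutoff_bp):
    """Idealized double-sided bead selection.

    Round 1: beads bind fragments larger than `upper_cutoff_bp`; the beads are
    discarded and the supernatant is kept.
    Round 2: beads bind fragments of at least `lower_cutoff_bp`; the
    supernatant is discarded and the bound fragments are eluted.
    """
    supernatant = [s for s in fragment_sizes if s <= upper_cutoff_bp]
    eluted = [s for s in supernatant if s >= lower_cutoff_bp]
    return eluted


if __name__ == "__main__":
    sizes = [100, 300, 500, 800, 1500]  # hypothetical fragment sizes in bp
    print(two_round_size_selection(sizes, upper_cutoff_bp=1000, lower_cutoff_bp=400))
```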

[0063] In some embodiments of this application, the reaction products obtained from any of the individual reaction steps or their sub-steps may be purified. Those skilled in the art can determine whether a purification step is necessary based on the overall process efficiency and reaction yield. For example, purifying the reaction product before using it in subsequent steps can improve the yield of the specific reaction product in the next step. Conversely, when the reaction is highly specific and yields essentially a single reaction product, the purification step may be omitted. Alternatively, if the reaction product from a previous step is not a single product, but the next reaction step acts specifically on the desired reaction product, the purification step may also be omitted.

[0064] In some preferred embodiments, magnetic beads can be used to purify DNA products obtained from each reaction step or its sub-steps. Various magnetic bead purification kits are commercially available to those skilled in the art. These magnetic bead purification methods are primarily based on solid-phase reversible immobilization technology. The surface of the magnetic beads is specifically modified, and the adsorption and elution of DNA are achieved through interactions such as electrostatic, hydrophilic, and hydrophobic interactions between the magnetic beads and DNA molecules. Some magnetic bead purification kits work by using magnetic beads with silanol or carboxyl functional groups modified on their outer surface. In a purification buffer system containing PEG, high salt ions, etc., DNA is adsorbed by forming ion bridges between DNA, salt ions, and carboxyl groups. Simple magnetic field treatment can separate the DNA-adsorbed magnetic beads from impurities in other supernatants. In a buffer solution free of PEG and salt ions, the ion bridges between DNA and the magnetic beads are broken, thereby reversibly desorbing the target DNA from the magnetic beads. In some more preferred embodiments, the magnetic bead method may include, for example, AMPure XP magnetic beads or DNA Clean Beads. Those skilled in the art can rationally select appropriate magnetic bead purification kits based on the desired specific effects and the characteristics of the sample.

[0065] According to some embodiments of this application, a controlled-length nick translation is performed on a circular DNA molecule with a single-strand nick. In some embodiments, the nick translation is achieved by one of the following enzymes or enzyme mixtures: (1) a nucleic acid polymerase with 5'→3' strand displacement activity; or (2) a 5'→3' nucleic acid polymerase together with either a nucleic acid polymerase having 5'→3' exonuclease activity or a 5'→3' exonuclease. In some embodiments, the nick translation is performed using a nucleic acid polymerase with 5'→3' strand displacement activity. A nucleotide single strand is synthesized from the 3' end of the nick by the nucleic acid polymerase with 5'→3' strand displacement activity, while the nucleotides on the 5' side of the nick are displaced. In some embodiments, the nick translation is performed by simultaneously using a 5'→3' nucleic acid polymerase together with either a nucleic acid polymerase having 5'→3' exonuclease activity or a 5'→3' exonuclease. A nucleotide single strand is synthesized from the 3' end of the nick by the 5'→3' nucleic acid polymerase, while the nucleic acid polymerase having 5'→3' exonuclease activity or the 5'→3' exonuclease cleaves the nucleotides at the 5' end of the nick.

[0066] In some preferred embodiments, the nucleic acid polymerase used for nick translation includes at least one of T4 DNA polymerase, DNA polymerase I large fragment (Klenow fragment), Klenow fragment (3'→5' exo-), DNA polymerase, (exo-) DNA polymerase, Deep DNA polymerase, Deep (exo-) DNA polymerase, DNA polymerase I, Taq DNA polymerase, Bst DNA polymerase, and Phi29 DNA polymerase. In some embodiments, the concentration of the nucleic acid polymerase in the reaction system is 0.001 U / μL-10 U / μL; preferably 0.01 U / μL-5 U / μL; more preferably 0.1 U / μL-2 U / μL. In some embodiments, the nick translation reaction is carried out at 10-50°C for 5-60 min; preferably at 20-45°C for 10-50 min; more preferably at 25-40°C for 10-40 min; and even more preferably at 37°C for 15-30 min. Those skilled in the art can reasonably select existing known nucleic acid polymerases or exonucleases, or can extract enzymes with completely identical or similar functions for substitution. Those skilled in the art can also select appropriate enzyme dosage, buffer solution, coenzyme factor, salt ions, dNTPs, and adjust reaction temperature and time, as well as pH value and concentration of each component in the reaction system, based on the specific type of nucleic acid polymerase or exonuclease used and the desired effect.

[0067] According to some embodiments of this application, the controlled nick translation length is 150-400 bp. In some preferred embodiments, the controlled nick translation length is 200-300 bp. The nick translation length can be controlled in various ways, and those skilled in the art can determine the conditions that give a specific nick translation length by adjusting the reaction conditions so as to limit the efficiency of strand synthesis and of nucleotide removal. For example, the temperature, time, and reaction system (e.g., enzyme dosage, dNTP concentration) are adjusted, the DNA product after nick translation is cleaved, and the reaction conditions required to obtain controlled nick translation (CNT) of a specific length are determined from the fragment size of the cleaved product. Experimental procedures for controlling a specific nick translation length by adjusting reaction conditions can be found in the publications of Dong Zirui et al., which are incorporated herein by reference in their entirety to the extent permitted by applicable national and regional patent laws.
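The calibration described above can be sketched as follows, by way of illustration only. The sketch assumes that the cleaved product is about twice the nick translation length plus the adapter, as stated elsewhere in this application for the screened fragment. The function names and the trial values are hypothetical and are not experimental data.

```python
def estimate_cnt_length(product_bp, adapter_bp=0):
    """Estimate the nick translation length from the cleaved product size.

    Assumes product size ~= 2 * CNT length + adapter length.
    """
    return (product_bp - adapter_bp) / 2


def conditions_in_range(trials, target=(200, 300), adapter_bp=0):
    """Return the names of trial conditions whose estimated CNT length is in range."""
    low, high = target
    return [
        name
        for name, product_bp in trials
        if low <= estimate_cnt_length(product_bp, adapter_bp) <= high
    ]


# Hypothetical calibration series: (condition label, measured product size in bp).
trials = [("37C_15min", 380), ("37C_20min", 450), ("37C_30min", 700)]
print(conditions_in_range(trials))
```

With these made-up values only the middle condition gives an estimated length inside 200-300 bp; a real calibration would use measured fragment sizes.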

[0068] According to some embodiments of this application, a mismatch-cleaving nuclease is used to cut the circular DNA molecule at the translated nick. In some preferred embodiments, the mismatch-cleaving nuclease is T7 endonuclease I. T7 endonuclease I is the product of T7 gene 3 and can recognize and cleave imperfectly paired or nicked double-stranded DNA. The cleavage site is the first, second, or third phosphodiester bond on the 5' side of the mismatched base. In some embodiments, the concentration of the mismatch-cleaving nuclease in the reaction system is 0.001 U/μL-10 U/μL; preferably 0.01 U/μL-5 U/μL; more preferably 0.1 U/μL-2 U/μL. In some embodiments, the nick cleavage reaction is carried out at 10-50°C for 5-60 min; preferably at 20-45°C for 10-50 min; more preferably at 25-40°C for 10-40 min; and even more preferably at 37°C for 15-30 min. Those skilled in the art can reasonably select known mismatch-cleaving nucleases, or can isolate enzymes with identical or similar functions as substitutes. Those skilled in the art can also select an appropriate enzyme dosage, buffer solution, cofactors, and salt ions, and adjust the reaction temperature and time, as well as the pH value and the concentration of each component in the reaction system, based on the specific type of mismatch-cleaving nuclease used and the desired effect.

[0069] According to some embodiments of this application, the circular DNA molecule is cut at the translated nick to form two fragments, and the DNA fragment products are then screened. In some embodiments of this application, the DNA fragment products to be screened are the fragments that carry the sequences of both ends of the original DNA fragment, with the adapter sequence located in the middle of the fragment.

[0070] In some embodiments, DNA fragment products are screened based on a specific length. Generally, the adapter itself is small enough that its length is negligible, so the screened DNA fragment products are approximately twice the controlled nick translation length, while the other fragment is considerably larger. For example, when the controlled nick translation length is 200-300 bp, DNA fragment products with a length of approximately 400-600 bp are screened. In some embodiments, DNA fragment products of a specific length are screened using gel extraction. Gel extraction allows direct separation of the two DNA fragments of different sizes formed after cleaving the circular DNA molecule. In some embodiments, DNA fragment products of a specific length are screened using magnetic beads. In some optional embodiments, DNA fragment products of a specific length are screened by adjusting the ratio of magnetic beads to DNA fragment sample and/or by performing two rounds of magnetic bead adsorption.
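The size relationship above can be expressed as a short calculation, given here as a non-limiting sketch. It assumes the screened fragment spans about two nick translation lengths plus the adapter; `selection_window` and `in_window` are hypothetical names, and `adapter_bp` defaults to 0 because the adapter length is treated as negligible.

```python
def selection_window(cnt_min_bp, cnt_max_bp, adapter_bp=0):
    """Return the (low, high) size window in bp for the adapter-containing fragment.

    Assumes fragment length ~= 2 * CNT length + adapter length.
    """
    if cnt_min_bp <= 0 or cnt_max_bp < cnt_min_bp:
        raise ValueError("expected 0 < cnt_min_bp <= cnt_max_bp")
    return 2 * cnt_min_bp + adapter_bp, 2 * cnt_max_bp + adapter_bp


def in_window(fragment_bp, window):
    """True if a fragment length falls inside the size-selection window."""
    low, high = window
    return low <= fragment_bp <= high


window = selection_window(200, 300)
print(window)                   # (400, 600)
print(in_window(500, window))   # True
print(in_window(5000, window))  # False
```

A 200-300 bp nick translation length therefore maps to the 400-600 bp window given in the example, and the larger remaining fragment falls outside it.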

[0071] The technical solution of this application controls the nick translation length so that the nick moves out of the adapter sequence region and into the DNA fragment sequence region. The nick-translated DNA product is then cut, and target fragments of a specific length are screened, thereby obtaining fragments that carry the sequences of both ends of the DNA fragment. This technical solution avoids the labeling and magnetic-bead capture of target fragments that existing mate-pair library construction processes require after randomly fragmenting the circular DNA molecules. At the same time, it efficiently and accurately controls how far into the DNA fragment the end sequences are obtained, which improves the usefulness of the sequencing results and data analysis and ultimately increases the efficiency of sequence assembly significantly.

[0072] In some embodiments, the adapter has a labeled modifying group, allowing the adsorption or purification of DNA fragment products using molecules or groups that have an affinity for or interact with the modifying group of the adapter. For example, a typical biotin-avidin system can be used, with the adapter labeled with a biotin-modifying group, and avidin-immobilized magnetic beads used to purify the DNA fragment products. To achieve equivalent adsorption or purification functions, those skilled in the art can also select other molecules or groups with strong affinity interactions, such as antibodies and antigens, amino and hydroxyl groups, etc.

[0073] According to some embodiments of this application, the method of this application further includes amplifying and / or sequencing the screened DNA fragment products.

[0074] In some embodiments, the screened DNA fragment products are amplified to obtain an amplified library. In some embodiments, before amplifying the screened DNA fragment products, the method of this application further includes end repair of the DNA fragment products. For end repair of the DNA fragment products, refer to the embodiments regarding end repair described above. In some embodiments, end repair of the DNA fragment products further includes adding a 3' protruding sticky end and/or phosphorylating the 5' end of the DNA fragment.

[0075] In some embodiments, before amplifying the screened DNA fragment product, the method of this application further includes ligating the DNA fragment product with an additional adapter to obtain a secondary ligation product. For ligation of the DNA fragment product with the additional adapter, refer to the embodiments described above regarding adapter ligation; the additional adapter does not need to contain uracil nucleotides. In some embodiments, the additional adapter may have a tag sequence. Preferably, the tag sequence includes, but is not limited to, a unique molecular tag (UMI) sequence and a sample tag sequence. The unique molecular tag (UMI) sequence is used to count the copy number of nucleic acid molecules in a sample. The sample tag sequence is used to distinguish different samples for subsequent pooled sequencing of multiple samples. For example, the tag sequence may be a barcode sequence or an index sequence.
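By way of illustration, the two roles of the tag sequences can be sketched as follows: the sample tag groups reads by sample, and the number of distinct UMI sequences within each group estimates the number of original molecules. The function name and the read tuples are hypothetical, and the sketch ignores sequencing errors in the tags and genomic position.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per sample tag.

    `reads` is an iterable of (sample_tag, umi) pairs, one per sequenced read.
    Reads sharing a sample tag and a UMI are treated as copies of one molecule.
    """
    umis_per_sample = defaultdict(set)
    for sample_tag, umi in reads:
        umis_per_sample[sample_tag].add(umi)
    return {sample: len(umis) for sample, umis in umis_per_sample.items()}


# Hypothetical reads from two pooled samples.
reads = [
    ("S1", "ACGTACGT"),
    ("S1", "ACGTACGT"),  # duplicate copy of the same molecule
    ("S1", "TTGCAAGC"),
    ("S2", "ACGTACGT"),
]
print(count_molecules(reads))  # {'S1': 2, 'S2': 1}
```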

[0076] In some embodiments, the amplification step may employ PCR amplification or PCR-free amplification. In some embodiments, the DNA fragment product or secondary ligation product is amplified to obtain an amplified library. In some embodiments, amplification primers are designed based on the sequence of the additional adapter in the secondary ligation product, thereby performing amplification. Those skilled in the art can rationally select a suitable amplification reaction system according to the specific desired effect and the characteristics of the nucleic acid sample.

[0077] In some embodiments, PCR amplification can be performed based on DNA fragment products or secondary ligation products. In some embodiments, forward or reverse primers are designed based on the sequence of the additional adapter in the secondary ligation product to perform PCR amplification. In some embodiments, primers with tag sequences can be designed, whether or not the adapter contains a tag sequence. By introducing known tag sequences through adapters or amplification primers, those skilled in the art can rationally select appropriate sequencing technologies to detect nucleic acid samples or their amplified libraries.

[0078] In some embodiments, PCR-free amplification can be performed based on DNA fragment products or secondary ligation products. PCR-free amplification, or PCR-free library preparation, refers to a library preparation process that does not require PCR amplification. Its advantage is that it avoids errors introduced by PCR amplification throughout the entire process from library preparation to sequencing. PCR-free amplification technology involves binding nucleic acid samples to bridging connectors. A bridging connector is a short DNA molecule with a specific sequence that can ligate to or bind to a nucleic acid sample, providing a sequence that can be analyzed by sequencing technology. After bridging, two adjacent nucleic acid molecules in the nucleic acid sample are linked by the connector, and a nucleic acid amplification method used in PCR-free technology, such as rolling circle amplification (RCA), can then be used to amplify the ligated nucleic acid sequence. RCA is an amplification technique that does not require PCR cycling and can generate a large number of DNA copies. Through the above steps, the PCR-free method can amplify nucleic acid samples and generate enough copies for sequencing analysis without PCR cycling, avoiding the errors and amplification bias caused by PCR cycling in traditional PCR methods, and saving time and cost.

[0079] In some embodiments of this application, the sequencing step includes sequencing the DNA fragment product, the secondary ligation product, or its amplified library. In some embodiments, the sequencing step includes repeating the sequencing of the DNA fragment product, the secondary ligation product, or its amplified library using the same template. Sequencing techniques that use the same template for repeat sequencing can correct read sequences, thereby improving sequencing accuracy. In some preferred embodiments, the sequencing method includes, but is not limited to, BGI Genomics' DNBSEQ sequencing technology.
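The read-correction idea can be illustrated with a simple per-position majority vote over repeated reads of one template. This is a generic sketch rather than the error-correction procedure of any particular sequencing platform; the function name and the reads are hypothetical, and the reads are assumed to be aligned and of equal length.

```python
from collections import Counter


def consensus(reads):
    """Return the per-position majority base across equal-length reads."""
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("all reads must have the same length")
    # zip(*reads) yields one column of bases per template position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


# Three hypothetical reads of the same template; the third has one error.
print(consensus(["ACGT", "ACGT", "ACCT"]))  # ACGT
```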

[0080] In some specific embodiments, DNBSEQ sequencing technology is used to sequence nucleic acid fragment ligation products or libraries of nucleic acid fragment ligation products. BGI Genomics' DNBSEQ technology consists of three stages: denaturation and circularization of the library; generation of DNBs (DNA nanoballs) via rolling circle amplification (RCA); and loading of the DNBs onto the regular array of a sequencing chip for sequencing. By circularizing the DNA strands of the library to form circular DNA products, and then performing rolling circle amplification with DNA polymerase, DNBSEQ technology not only effectively increases the copy number of the DNA to be tested, greatly enhancing the signal intensity, but also avoids the accumulation of errors seen in PCR amplification, because rolling circle replication always copies the same template, effectively improving sequencing accuracy. After circularization and amplification, the DNBs are loaded onto the sequencing chip array. The DNBs hybridize to the complementary adapters on the array. Under the catalysis of DNA polymerase, the sequencing template binds fluorescently labeled probes in the sequencing reagents. The fluorescent groups are then excited and emit light. The light signals emitted by the different fluorescent groups are collected by the instrument's camera, processed, and converted into digital signals. After further processing, the base sequence information of the sample to be tested is finally obtained. DNBs increase the copy number of the DNA in the sample being tested, and each round of amplification copies the original template, thereby enhancing signal strength and improving sequencing accuracy.

[0081] According to another embodiment of this application, a sequencing method for a large fragment sequencing library is provided, comprising the following steps:

[0082] A DNA fragment is provided, and the DNA fragment is ligated to an adapter containing uracil nucleotides to obtain a DNA ligation product with the adapters attached to both ends;

[0083] The uracil nucleotides at the ends of the DNA ligation product are removed to obtain a DNA digestion product with sticky ends at both ends; wherein the sequences of the sticky ends at both ends are complementary.

[0084] The DNA digestion product is circularized to form a circular DNA molecule with a single-strand nick. The circular DNA molecule is then subjected to controlled-length nick translation. The circular DNA molecule is cut at the translated nick, and the DNA fragment products are screened to obtain a large fragment sequencing library.

[0085] The large fragment sequencing library is sequenced.

[0086] In some embodiments, sequencing can be performed using any method known in the art, including but not limited to the use of conventional sequencing platforms, instruments, or equipment. In some embodiments, sequencing is performed using a sequencer.

[0087] Those skilled in the art can adjust the sequencing parameters appropriately based on the selected sequencing method to achieve the sequencing objective. In some embodiments of the present invention, the sequencing method further includes analyzing the sequencing data to obtain the sequence information at both ends of a large DNA fragment.

[0088] According to another embodiment of this application, a kit for preparing large-fragment sequencing libraries is provided, comprising: an adapter ligation kit; a single-strand digestion kit; a nick translation kit; and a nick cleavage kit. In some optional embodiments of this application, the kit may further comprise: an enzyme digestion kit; an end repair kit; a purification kit; a secondary ligation kit; an amplification kit; or a sequencing kit.

[0089] In some embodiments, the enzyme digestion kit is used to break the DNA double strand to provide DNA fragments. In some embodiments, the enzyme digestion kit includes an endonuclease. In some embodiments, the endonuclease includes at least one of deoxyribonuclease I (DNase I), deoxyribonuclease II (DNase II), micrococcal nuclease (MNase), double-strand-specific DNase (dsDNase), salt-active nuclease (SAN), and Vvn nuclease. Those skilled in the art can reasonably select known endonucleases, or can isolate enzymes with identical or similar functions as substitutes. Those skilled in the art can also provide an appropriate enzyme dosage, buffer, cofactors, and salt ions in the enzyme digestion kit according to the specific type of endonuclease used and the desired effect.

[0090] In some embodiments, the end-repair kit is used for end repair of DNA fragments. In some embodiments, the end-repair kit includes a nucleic acid polymerase. Preferably, the nucleic acid polymerase has 5'-3' DNA polymerase activity. In some embodiments, the nucleic acid polymerase is a DNA-dependent DNA polymerase. In some preferred embodiments, the DNA-dependent DNA polymerase includes at least one of T4 DNA polymerase, DNA polymerase I large fragment (Klenow fragment), T7 DNA polymerase, DNA polymerase I, Taq DNA polymerase, Bst DNA polymerase, and Phi29 DNA polymerase. In some specific embodiments, the nucleic acid polymerase includes at least one of T4 DNA polymerase, Klenow fragment, and Taq DNA polymerase. In the presence of dATP, Taq DNA polymerase can add an A base to the 3' end of the DNA fragment. Adding an A base to the 3' end of the DNA fragment forms a protruding sticky end, which facilitates subsequent ligation with a double-stranded adapter having a protruding T sticky end. The Taq DNA polymerase is preferably recombinant Taq DNA polymerase (rTaq DNA polymerase). Those skilled in the art can reasonably select known nucleic acid polymerases, or can isolate enzymes with identical or similar functions as substitutes. Those skilled in the art can also provide appropriate enzyme dosages, buffer solutions, cofactors, salt ions, and dNTPs in the end-repair kit, depending on the specific type of nucleic acid polymerase used and the desired effect.

[0091] In some embodiments, the end-repair kit further includes a kinase for phosphorylating the DNA fragment. In some embodiments, the kinase is a polynucleotide 5'-hydroxyl kinase, which is used to phosphorylate the 5' end of the DNA fragment. Preferably, the polynucleotide 5'-hydroxyl kinase is T4 polynucleotide kinase (T4 PNK). Those skilled in the art can reasonably select known kinases, or can isolate enzymes with identical or similar functions as substitutes. Those skilled in the art can also provide appropriate enzyme dosages, buffer solutions, cofactors, and salt ions in the end-repair kit according to the specific type of kinase used and the desired effect.

[0092] In some embodiments, the adapter ligation kit is used to ligate a DNA fragment to an adapter. In some embodiments, the adapter ligation kit includes an adapter having a uracil nucleotide. In some embodiments, the uracil nucleotide is deoxyuridine monophosphate (dUMP). In some embodiments, the adapter is a double-stranded adapter comprising a first single strand and a second single strand, wherein the first single strand is the strand of the adapter intended to ligate to the 3' end of the DNA fragment, and the second single strand is the strand of the adapter intended to ligate to the 5' end of the DNA fragment.

[0093] In some embodiments, the 5' end of the adapter has a phosphate group. In some embodiments, the 3' end of the adapter has a protruding end. In some preferred embodiments, the 5' end of the first single strand of the adapter has a phosphate group, and the 3' end of the second single strand of the adapter has a protruding end. Preferably, the 3' protruding end of the adapter is complementary to the 3' protruding end of the DNA fragment, thereby enabling the adapter to recognize and ligate with the DNA fragment. For example, when the 3' protruding end of the DNA fragment is one or more A bases, the 3' protruding end of the adapter is designed to be the corresponding number of T bases. As another example, when the 3' protruding end of the DNA fragment is polyG, the 3' protruding end of the adapter is designed to be polyC. In some embodiments, to prevent the end of the adapter that is not intended to ligate with the DNA fragment from doing so, the 5' end at that non-ligating end may lack a phosphate group (e.g., carry a hydroxyl group instead), and/or that end may be designed as a protruding end that is not complementary to the end of the DNA fragment. For example, the 5' end of the second single strand of the adapter may lack a phosphate group (e.g., carry a hydroxyl group instead), and/or the 3' end of the first single strand and the 5' end of the second single strand may be designed to be non-complementary (e.g., in a Y-shaped structure).

[0094] In some embodiments, the adapter has a uracil nucleotide on one of its single strands. In some embodiments, the uracil nucleotide is located on the single strand (second single strand) of the adapter intended to connect to the 5' end of a DNA fragment. In some embodiments, the uracil nucleotide and the 3' overhang of the adapter are on the same single strand. In some embodiments, the number of uracil nucleotides is 1-10; preferably, 1-5. More preferably, the number of uracil nucleotides is 1. In some embodiments, the length from the position of the uracil nucleotide to the end of its single strand not intended to connect to a DNA fragment (i.e., the 5' end of the second single strand) is 3-20 nt; preferably, 5-15 nt. When more than one uracil nucleotide is present, the length of the uracil nucleotide to the 5' end is calculated based on the farthest distance.
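As a non-limiting illustration, the numerical ranges above can be checked for a candidate second single strand written 5' to 3'. The sketch assumes that the "length from the uracil to the 5' end" means the number of nucleotides lying 5' of the uracil, which is one possible reading. The function name and the example sequence are hypothetical.

```python
def check_uracil_strand(second_strand):
    """Check uracil count and uracil-to-5'-end distance against the stated ranges.

    Distance is taken as the number of nucleotides 5' of the uracil; with
    several uracils, the one farthest from the 5' end is used.
    """
    seq = second_strand.upper()
    positions = [i for i, base in enumerate(seq) if base == "U"]
    count = len(positions)
    farthest = positions[-1] if positions else None
    return {
        "uracil_count": count,
        "count_ok": 1 <= count <= 10,          # 1-10 uracils
        "count_preferred": 1 <= count <= 5,    # preferably 1-5
        "distance_nt": farthest,
        "distance_ok": farthest is not None and 3 <= farthest <= 20,
        "distance_preferred": farthest is not None and 5 <= farthest <= 15,
    }


# Hypothetical strand with a single uracil nine nucleotides from the 5' end.
print(check_uracil_strand("GTCAGCTAGUCCGATTGCAAGCTT"))
```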

[0095] In some embodiments, the sequences of the two single strands of the adapter may be completely complementary or partially complementary. In some embodiments, the sequences of the two single strands of the adapter may be partially complementary at either end or both ends. Preferably, the two single strands of the adapter are complementary at one end intended to connect to the DNA fragment (i.e., the adapter has a Y-shaped structure). In some embodiments, the two single strands of the adapter in the middle portion are not complementary (i.e., the adapter has a bubble-like structure).

[0096] In some embodiments, on the strand (first single strand) opposite the single strand containing the uracil nucleotide (second single strand), the segment extending from the nucleotide complementary to the uracil nucleotide (A) to the end of that strand (i.e., its 3' end) comprises a reverse complementary nucleotide sequence portion. In some embodiments, the reverse complementary nucleotide sequence portion includes the nucleotide at the 3' end of the first single strand. In some embodiments, the length from the nucleotide (A) complementary to the uracil nucleotide to the 3' end is 3-20 nt; preferably 5-15 nt; more preferably 8-10 nt. When more than one uracil nucleotide is present, this length is calculated from the uracil-complementary nucleotide farthest from the 3' end.

[0097] In some embodiments, the length of the reverse complementary nucleotide sequence portion is 3-20 nt; preferably 5-15 nt; more preferably 8-10 nt. In some embodiments, on the strand (first single strand) opposite the single strand containing the uracil nucleotide (second single strand), the length of the sequence from the complementary nucleotide (A) to the 3' end is greater than the length of the reverse complementary nucleotide sequence portion. In some embodiments, the length from the complementary nucleotide (A) to the 3' end is 1-5 nt longer than the reverse complementary nucleotide sequence portion; preferably 2-3 nt longer.
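One way to read the reverse complementary portion is sketched below, as an interpretation rather than a definition. If the same adapter is attached to both ends of the DNA fragment, the two exposed 3' tails have the same sequence, so they can pair with each other only over a stretch that equals its own reverse complement. The function names and example sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_self_complementary(seq):
    """True if the sequence equals its own reverse complement.

    Two identical overhangs with this property can base-pair with each other.
    """
    seq = seq.upper()
    return seq == reverse_complement(seq)


print(reverse_complement("AACG"))         # CGTT
print(is_self_complementary("GAATTC"))    # True
print(is_self_complementary("GATTACA"))   # False
```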

[0098] In some embodiments, the two single strands of the adapter may have the same number of bases or different numbers of bases. In some embodiments, the two single strands of the adapter may each be 20-50 nt long; preferably, 25-40 nt. In some embodiments, the length of the non-complementary portion between the two single strands of the adapter may be 10-30 nt; preferably, 15-25 nt. In some embodiments, the total length of the complementary portion between the two single strands of the adapter may be 10-30 bp; preferably, 15-25 bp. Those skilled in the art can design the specific sequences of the two single strands of the adapter and adjust the length of the complementary or non-complementary portions in the middle or at either end to form a stable adapter.

[0099] According to some embodiments of this application, the adapter may optionally, but not necessarily, have a labeled modifying group. Labeled modifying groups are generally used for the adsorption or purification of DNA fragments. When the adapter is ligated to a DNA fragment to form a ligation product, the ligation product or its derivatives can be readily adsorbed or purified using molecules or groups that have an affinity for, or interact with, the modifying group of the adapter. For example, a ligand in a ligand-receptor interaction system can be selected as the modifying group of the adapter, and a solid phase bound to the receptor can be used to adsorb or purify the ligation product. For example, the ligand-receptor interaction system can be a typical biotin-avidin system. To achieve equivalent adsorption or purification functionality, those skilled in the art can also select other molecules or groups with strong affinity interactions, such as antibodies and antigens, amino and hydroxyl groups, etc. In some embodiments, the molecules or groups with affinity or interaction are immobilized. In some embodiments, the reaction product carrying the modifying group is purified using a solid phase, such as magnetic beads, a spin column, or a chromatographic column, on which the molecules or groups with affinity or interaction are immobilized. In some embodiments, the affinity constant between the modifying group and the molecule or group with which it has an affinity or interaction is at least 10^5 mol/L. Preferably, the affinity constant is from 10^6 mol/L to 10^18 mol/L. More preferably, the affinity constant is from 10^8 mol/L to 10^16 mol/L.

[0100] According to some embodiments of this application, the adapter optionally, but not necessarily, has a tag sequence. In some embodiments, the tag sequence includes, but is not limited to, a unique molecular tag (UMI) sequence and a sample tag sequence. The unique molecular tag (UMI) sequence is used to count the copy number of nucleic acid molecules in a sample. The sample tag sequence is used to distinguish different samples for subsequent multi-sample pooling sequencing. For example, the tag sequence can be a barcode sequence or an index sequence. In some embodiments, the tag sequence is 3-20 nt in length; preferably, the tag sequence is 4-15 nt in length; more preferably, it is 8-10 nt.

[0101] In some embodiments, the adapter ligation kit may further include a DNA ligase. In some embodiments, the DNA ligase includes at least one of an ATP-dependent DNA ligase and an NAD+-dependent DNA ligase. In some preferred embodiments, the DNA ligase includes at least one of T4 DNA ligase, Taq DNA ligase, DNA ligase I, DNA ligase III, and DNA ligase IV. Those skilled in the art can reasonably select existing known DNA ligases, or can extract enzymes with completely identical or similar functions as substitutes. Those skilled in the art can also provide appropriate enzyme dosages, buffer solutions, coenzyme factors, and salt ions in the adapter ligation kit according to the specific type of DNA ligase used and the desired effect.

[0102] In some embodiments, the single-strand digestion kit is used to remove the portion of the DNA ligation product from the uracil nucleotide to the end. In some embodiments, the single-strand digestion kit includes a uracil nicking enzyme for creating a single nucleotide gap at the uracil nucleotide position. Creating a single nucleotide gap at the uracil nucleotide position includes: excising the uracil base of the uracil nucleotide and breaking the phosphodiester bonds 3' and 5' of the deoxyribose at the abasic site.

[0103] In some embodiments, the uracil nicking enzyme comprises uracil DNA glycosylase (UDG) and a DNA lyase. Uracil DNA glycosylase (UDG) catalyzes excision of the uracil base, forming an abasic site while leaving the phosphodiester backbone intact. The lyase activity of the DNA lyase, such as endonuclease VIII, breaks the phosphodiester bonds 3' and 5' of the abasic site, releasing the base-free deoxyribose and thereby creating a single nucleotide gap at the uracil nucleotide position. In some embodiments, the uracil nicking enzyme comprises... The USER enzyme (Uracil-Specific Excision Reagent) combines the activities of the two enzymes mentioned above, forming a single nucleotide gap at the uracil nucleotide position. Those skilled in the art can reasonably select known uracil nicking enzymes, or can isolate enzymes with identical or similar functions as substitutes. Those skilled in the art can also provide appropriate enzyme dosages, buffer solutions, cofactors, and salt ions in the single-strand digestion kit according to the specific type of uracil nicking enzyme used and the desired effect.

[0104] In some embodiments, the single-strand digestion kit also includes a heat-based or alkali-based denaturation kit for removing the nucleotides between the single nucleotide gap and the end. By controlling the high-temperature conditions of physical denaturation or the alkaline system of chemical denaturation, the hydrogen bonds between the two strands of the duplex from the single nucleotide gap to the end are broken, causing the single strand between the gap and the end to detach.
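The two-step removal described above can be sketched on a sequence string, for illustration only: the uracil is excised, leaving a one-nucleotide gap, and the short piece between the gap and the 5' end is then assumed to detach on denaturation. The function names and the example strand are hypothetical, and the sketch does not model melting temperatures.

```python
def excise_uracils(strand):
    """Split a 5'->3' strand at every U, dropping the U itself.

    Models the one-nucleotide gap left by base excision and backbone cleavage.
    """
    return [piece for piece in strand.upper().split("U") if piece]


def strand_after_denaturation(strand):
    """Return the piece 3' of the last uracil.

    Pieces between the gap and the 5' end are assumed to dissociate.
    """
    return strand.upper().split("U")[-1]


second_strand = "GTCAGCTAGUCCGATTGCAAGCTT"  # hypothetical uracil-containing strand
print(excise_uracils(second_strand))             # ['GTCAGCTAG', 'CCGATTGCAAGCTT']
print(strand_after_denaturation(second_strand))  # CCGATTGCAAGCTT
```

Removing the 5' piece exposes the matching stretch of the opposite strand as a single-stranded sticky end.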

[0105] In some embodiments, the kit may further include a circularization kit for circularizing the DNA digestion product to form a circular DNA molecule with a single-strand nick. The adapter ligation kit described above can serve as the circularization kit, or a circularization kit containing different components can be provided separately. In some alternative embodiments, the circularization kit may not include a DNA ligase. Those skilled in the art can provide appropriate amounts of enzyme, buffer, cofactors, and salt ions in the circularization kit depending on whether a DNA ligase is used, the specific type of DNA ligase used, and the desired effect.

[0106] In some embodiments, purification kits can be used to purify the reaction products obtained from any of the individual reaction steps or sub-steps. In some embodiments, purification kits are used to purify circular DNA molecules. Many reagents for purifying and recovering nucleic acids are known to those skilled in the art. In some embodiments, purification kits include, but are not limited to, kits based on methods such as phenol/chloroform extraction, spin columns, gel extraction, and magnetic beads. Those skilled in the art can reasonably provide suitable purification kits according to the specific desired effect and the characteristics of the sample.

[0107] In some preferred embodiments, the purification kit includes a magnetic bead-based purification kit. Magnetic bead purification is primarily based on solid-phase reversible immobilization technology. The magnetic beads are specifically modified, and the adsorption and elution of DNA are achieved through interactions such as electrostatic, hydrophilic, and hydrophobic interactions between the magnetic beads and DNA molecules. Some magnetic bead purification kits work by using magnetic beads with silanol or carboxyl functional groups modified on their outer surface. In a purification buffer system containing PEG and high-salt ions, DNA is adsorbed by forming ion bridges between DNA, salt ions, and carboxyl groups. Simple magnetic field treatment separates the DNA-adsorbed magnetic beads from impurities in the supernatant. In a buffer solution free of PEG and salt ions, the ion bridges between DNA and the magnetic beads are broken, thereby reversibly desorbing the target DNA from the magnetic beads. In some more preferred embodiments, the purification kit may include, but is not limited to, AMPure XP magnetic beads and DNA Clean Beads. Those skilled in the art can reasonably provide suitable magnetic bead purification kits as purification solutions based on the specific desired effect and sample characteristics.

[0108] In some embodiments, the kit may further include a linear digestion kit for digesting linear DNA molecules. In some embodiments, the linear digestion kit includes a nuclease. Because circular DNA molecules are more resistant to nuclease digestion than linear DNA molecules, the linear DNA molecules can be hydrolyzed or cleaved by controlling the enzymatic digestion reaction system, while reducing or avoiding any effect on the circular DNA molecules. Preferably, the nuclease includes endonucleases and exonucleases. More preferably, the exonucleases include ATP-dependent deoxyribonucleases. Those skilled in the art can reasonably select known exonucleases, or can isolate enzymes with identical or similar functions as substitutes. Those skilled in the art can also provide appropriate enzyme dosages, buffer solutions, cofactors, and salt ions in the linear digestion kit according to the specific type of exonuclease used and the desired effect.

[0109] In some embodiments, the nick translation kit is used for controlled nick translation of a circular DNA molecule having a single-strand nick. In some embodiments, the nick translation kit may include at least one of the following enzymes or enzyme mixtures: (1) a nucleic acid polymerase with 5'-3' strand displacement activity; or (2) a 5'-3' nucleic acid polymerase together with either a nucleic acid polymerase having 5'-3' exonuclease activity or a 5'-3' exonuclease. In some embodiments, the nick translation kit may include a nucleic acid polymerase with 5'-3' strand displacement activity. This polymerase synthesizes a single nucleotide strand from the 3' end of the nick while displacing the nucleotides at the 5' end of the nick. In some embodiments, the nick translation kit may include a 5'-3' nucleic acid polymerase and a nucleic acid polymerase with 5'-3' exonuclease activity or a 5'-3' exonuclease. The 5'-3' nucleic acid polymerase synthesizes a single nucleotide strand from the 3' end of the nick, while the nucleic acid polymerase with 5'-3' exonuclease activity, or the 5'-3' exonuclease, cleaves the nucleotides at the 5' end of the nick. In some preferred embodiments, the nucleic acid polymerase of the nick translation kit includes at least one of T4 DNA polymerase, DNA polymerase I large fragment (Klenow fragment), Klenow Fragment (3'→5' exo-), DNA polymerase, (exo-) DNA polymerase, Deep DNA polymerase, Deep (exo-) DNA polymerase, DNA polymerase I, Taq DNA polymerase, Bst DNA polymerase, and Phi29 DNA polymerase. Those skilled in the art can reasonably select known nucleic acid polymerases or exonucleases, or can use enzymes with identical or similar functions as substitutes. In some preferred embodiments, the length of the controlled nick translation is 150-400 bp. In some preferred embodiments, the length of the controlled nick translation is 200-300 bp. Those skilled in the art can also provide appropriate enzyme dosages, buffer solutions, cofactors, salt ions, and dNTPs in the nick translation kit according to the specific type of nucleic acid polymerase or exonuclease used and the desired effect.

[0110] In some embodiments, the cleavage kit is used to cleave circular DNA molecules at the translated nick. In some embodiments, the cleavage kit includes a mismatch cleavage nuclease. In some preferred embodiments, the mismatch cleavage nuclease is T7 endonuclease I. T7 endonuclease I is the product of T7 gene 3 and recognizes and cleaves imperfectly paired or nicked double-stranded DNA, cutting at the first, second, or third phosphodiester bond on the 5' side of the mismatched base. Those skilled in the art can reasonably select known mismatch cleavage nucleases, or can use enzymes with identical or similar functions as substitutes. Those skilled in the art can also provide appropriate enzyme dosages, buffers, cofactors, and salt ions in the cleavage kit according to the specific type of mismatch cleavage nuclease used and the desired effect. In some embodiments, the cleavage kit can employ existing nick translation systems, such as the MGIEasy CNT-CPE large fragment library preparation kit from BGI Genomics.

[0111] In some embodiments, cleaving the circular DNA molecule at the translated nick forms two fragments, and the purification kit can also be used to screen the DNA fragment products. In some embodiments, DNA fragment products are screened by length, and the purification kit may include a gel extraction-based kit or a magnetic bead-based kit. In some embodiments, the adapter carries a labeling modification group, and the purification kit may include a solid phase, such as magnetic beads, a spin column, or a chromatographic column, on which molecules or groups having affinity for or interacting with the adapter are immobilized. In some embodiments, the adapter carries a labeling modification group, and the purification kit may include a magnetic bead-based kit; preferably, molecules or groups having affinity for or interacting with the modification group of the adapter are immobilized on the magnetic beads.

[0112] In some embodiments, the secondary ligation kit is used to ligate the DNA fragment product to an additional adapter. The adapter ligation kit described above can be provided as a secondary ligation kit, or additional secondary ligation kits containing different components can be provided. In some alternative embodiments, the secondary ligation kit includes an additional adapter, which does not necessarily contain uracil nucleotides. In some embodiments, the additional adapter may have a tag sequence. Preferably, the tag sequence includes, but is not limited to, a unique molecular tag (UMI) sequence and a sample tag sequence. The unique molecular tag (UMI) sequence is used to count the copy number of nucleic acid molecules in a sample. The sample tag sequence is used to distinguish different samples for subsequent multi-sample pooling sequencing. For example, the tag sequence can be a barcode sequence or an index sequence.
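As an illustration of how the two tag types described above are typically used downstream, the following minimal Python sketch groups reads by sample tag and counts the distinct UMIs observed for each sample. The tag sequences, the read tuples, and the function name are hypothetical and are not taken from this application; error tolerance in the tags, which real pipelines usually need, is omitted.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per sample tag.

    `reads` is an iterable of (sample_tag, umi) pairs. Both values are
    hypothetical illustrations, not sequences from this application.
    """
    umis_by_sample = defaultdict(set)
    for sample_tag, umi in reads:
        umis_by_sample[sample_tag].add(umi)
    return {sample: len(umis) for sample, umis in umis_by_sample.items()}


# Hypothetical pooled reads from two samples.
reads = [
    ("ACGTACGTAC", "ACGT"),
    ("ACGTACGTAC", "ACGT"),  # same sample and UMI: counted once
    ("ACGTACGTAC", "TTGA"),
    ("TGCATGCATG", "ACGT"),  # same UMI in another sample: counted separately
]
molecule_counts = count_molecules(reads)
```

Here the sample tag separates the pooled reads and the UMI collapses repeated observations of the same original molecule.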

[0113] In some embodiments, the amplification kit is used to amplify DNA fragment products or secondary ligation products. In some embodiments, the amplification kit includes a nucleic acid polymerase; amplification primers; and dNTPs, including dATP, dGTP, dTTP, and dCTP. In some preferred embodiments, the nucleic acid polymerase is a DNA-dependent DNA polymerase. In some more preferred embodiments, the DNA-dependent DNA polymerase includes at least one of T4 DNA polymerase, DNA polymerase I large fragment (Klenow fragment), T7 DNA polymerase, DNA polymerase I, Taq DNA polymerase, Bst DNA polymerase, and Phi29 DNA polymerase.

[0114] In some preferred embodiments, the amplification primers are designed based on the sequence of the additional adapter in the secondary ligation product. In some preferred embodiments, to facilitate subsequent sequencing, the amplification primers may have a sequencing tag sequence, regardless of whether the adapter contains a tag sequence. Preferably, the primer with the sequencing tag sequence can be a forward primer or a reverse primer, or both the forward and reverse primers used for amplification may have sequencing tag sequences. Sequencing tag sequences include, but are not limited to, unique molecular tag (UMI) sequences and sample tag sequences. For example, a sample tag sequence may be a barcode sequence or an index sequence.

[0115] In some embodiments, the amplification kit may include a PCR amplification kit or a PCR-free amplification kit. Those skilled in the art can select library preparation kits that do not require PCR amplification based on relevant disclosed technologies. In some embodiments, the amplification kit is a PCR-free amplification kit. In some embodiments, the PCR-free amplification kit includes a rolling circle amplification (RCA) kit, for example the DNBSEQ one-step DNB preparation kit from BGI Genomics. Those skilled in the art can reasonably provide suitable amplification kits according to the specific desired effect and the characteristics of the nucleic acid sample, and can provide suitable amplification reaction systems based on the specific type of nucleic acid polymerase used.

[0116] In some embodiments, sequencing kits are used to sequence DNA fragment products, secondary ligation products, or amplified libraries thereof. Those skilled in the art can reasonably provide sequencing kits based on known sequencing technologies or other suitable sequencing technologies. In addition, those skilled in the art can provide mutually compatible amplification kits and sequencing kits. For example, pretreatment kits or amplification kits compatible with the chosen sequencing technology can be provided to pretreat or amplify nucleic acid fragment ligation products or libraries of such products, and the pretreated or amplified samples can then be sequenced using the sequencing kit. In some embodiments, sequencing kits include, but are not limited to, sequencing chips, sequencers, or read length analyzers.

[0117] In some preferred embodiments, the sequencing kit includes, but is not limited to, a sequencing kit based on BGI Genomics' DNBSEQ sequencing technology. According to some embodiments of this application, sequencing technology that performs repeated sequencing on the same template can correct read sequences, thereby improving sequencing accuracy.
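The idea that repeated reads of the same template can correct individual read errors can be sketched as a per-position majority vote. This application does not specify a correction algorithm, so the function below is only an assumed, simplified illustration; the reads and names are hypothetical.

```python
from collections import Counter


def consensus(reads):
    """Majority-vote consensus over equal-length reads of one template."""
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must have equal length")
    # zip(*reads) yields one column of base calls per template position.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


# Three hypothetical reads of the same template; the third has one error.
repeated_reads = ["ACGTAC", "ACGTAC", "ACCTAC"]
corrected = consensus(repeated_reads)
```

With three reads, a single miscalled base at one position is outvoted by the two agreeing reads.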

[0118] Those skilled in the art will understand that there may be no clear boundaries between the various kits categorized according to the processing or reaction stage of the DNA fragments, and the kits and their reagent components are not entirely independent. For example, compatible buffers or compatible salt ion mixtures may be provided to be used in one or more reaction stages. Those skilled in the art are capable of reasonably determining which reagent components are suitable for one or more reaction stages, thus providing only a sufficient number of portions in the kit. Those skilled in the art will understand that the aforementioned kits or packages will include reagents, instruments or consumables, instructions, etc.

[0119] According to another embodiment of this application, applications of the adapters, the methods for preparing large-fragment sequencing libraries, the sequencing methods for large-fragment sequencing libraries, and the kits for preparing large-fragment sequencing libraries described in the above embodiments are provided.

[0120] In some embodiments, the adapters, methods for preparing large-fragment sequencing libraries, sequencing methods for large-fragment sequencing libraries, and kits for preparing large-fragment sequencing libraries described above are used for the preparation or sequencing of large-fragment sequencing libraries.

[0121] In some embodiments, the adapters, methods for preparing large-fragment sequencing libraries, sequencing methods for large-fragment sequencing libraries, and kits for preparing large-fragment sequencing libraries described above are used for genome assembly.

[0122] In some embodiments, the adapters, methods for preparing large-fragment sequencing libraries, sequencing methods for large-fragment sequencing libraries, and kits for preparing large-fragment sequencing libraries described above are used for gene mutation detection.

[0123] In some embodiments, gene mutation detection includes polymorphism detection. In some embodiments, gene mutation detection includes the detection of randomly amplified polymorphic DNA (RAPD), restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), simple sequence repeats (SSR), inter-simple sequence repeats (ISSR), sequence-related amplified polymorphism (SRAP), expressed sequence tag-derived simple sequence repeats (EST-SSR), base insertions or deletions (InDel), and single nucleotide polymorphisms (SNP).

[0124] According to the embodiments of various aspects of this application, the beneficial effects of the technical solution of this application include, but are not limited to:

[0125] The technical solution of this application can efficiently and rapidly construct large-fragment sequencing libraries, avoiding the loss of large-fragment information caused by cumbersome operation steps. Gaps are introduced at both ends of the initial DNA fragment sequence in the circular DNA molecule, controlled-length nick translation (CNT) is performed in both directions along the initial DNA fragment, and the circular DNA molecule is then cleaved at the translated nick. In this way, the information at both ends of the large fragment can be captured quickly and accurately, which facilitates subsequent genome assembly and variant detection.

[0126] In some preferred embodiments of this application, DNA fragment products can be screened according to specific lengths, and adapters do not necessarily have labeling modification groups. This avoids the need for labeling modification and magnetic beads to capture target fragments after randomly breaking circular DNA molecules in existing Mate pair library construction processes, reducing data waste and high false positive rates caused by streptavidin labeling. Simultaneously, it allows for efficient and accurate control of the depth of sequences obtained at both ends of the DNA fragment, improving the effectiveness of sequencing results and data analysis, and ultimately significantly increasing the efficiency of sequence assembly.

[0127] In some preferred embodiments of this application, DNA fragment products can be amplified by PCR-free amplification, eliminating the need for the PCR step, avoiding PCR bias and amplification errors of long DNA fragments, thereby improving the accuracy of large fragment detection.

Description of the Drawings

[0128] Figure 1 is a schematic diagram illustrating the self-circularization of DNA products to form circular molecules according to some embodiments of this application.

[0129] Figure 2 is a schematic flowchart of a sequencing method for a large-fragment sequencing library according to some embodiments of this application.

[0130] Figure 3 shows schematic diagrams of reaction products and their sequence information according to some embodiments of this application, wherein panel A shows the Ad1 adapter ligation product, panel B shows the USER enzyme digestion product, and panel C shows the circularization product.

Detailed Description

[0131] The following specific embodiments further illustrate the content of this application in detail. Unless otherwise specified, the raw materials, reagents, or apparatus used in the embodiments and comparative examples are all available from conventional commercial sources or can be obtained by existing technical methods. Unless otherwise specified, the testing or experimental methods are conventional methods in the art.

[0132] Unless otherwise stated, those skilled in the art will understand that the chemical terms used above and throughout this specification have their common meaning in the art. A particular term or phrase should not be considered uncertain or unclear unless specifically defined, but should be understood in its common meaning. When trade names appear herein, they are intended to refer to the corresponding product or its active ingredient.

[0133] Unless otherwise indicated or defined, all terms used herein have their usual meaning in the art, which will be clear to those skilled in the art. For example, refer to standard manuals such as Molecular Cloning: A Laboratory Manual (Sambrook et al., "Molecular Cloning: A Laboratory Manual" (4th Ed.), Vols. 1-3, Cold Spring Harbor Laboratory Press, 2012); Lewin's Genes XI (Krebs et al., "Lewin's Genes XI", Jones & Bartlett Learning, 2017); and Modern Molecular Biology (Zhu Yuxian et al., Modern Molecular Biology (5th Edition), Higher Education Press, 2019).

[0134] The subject matter of this application and the claims specifically relates to artificial products or methods of using or producing such artificial products, which may be variants of natural (wild-type) products. Although there may be a degree of sequence identity with natural structures, it is understood that the materials, methods, and uses of the present invention (e.g., specifically isolated nucleic acid sequences) are “artificial” or synthetic and should therefore not be considered as a result of “natural laws”.

[0135] The terms “first” and “second” as used herein are for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as “first” or “second” may explicitly or implicitly include at least one of that feature. In the description herein, “multiple” means at least two, such as two, three, etc., unless otherwise explicitly specified.

[0136] The terms “comprising,” “including,” “having,” and “containing” as used herein can be used as synonyms and should be understood as open-ended definitions that allow for the presence of other components, parts, or elements. “Consisting of…” is considered the most closed definition, containing no elements other than those that define the constituent features. Therefore, “comprising” is broader and encompasses the definition of “consisting of…”.

[0137] As used herein, the term “about” means the same value or a value that differs from a given value by ±5%, specifically by ±2%, and more specifically by ±1%.

[0138] As used herein, the term “partially” means “at least partially”; therefore, unless clearly contradicted by the surrounding text, “partially” covers both “partially” and “completely”.

[0139] As used herein and in the claims, the singular form, such as "a/an," includes the plural unless the context clearly indicates otherwise.

[0140] Where appropriate, optional features of each aspect or exemplary embodiment of this application should be understood to be applicable to any other aspect or exemplary embodiment of this application. Therefore, all features disclosed in this specification (including any appended claims and drawings), and/or all steps of any disclosed method or all components of any disclosed kit, may be interchanged and combined in any way, unless at least some of such features and/or steps/components are mutually exclusive.

[0141] Example

[0142] The following examples further illustrate the adapters, large-fragment sequencing library preparation methods, sequencing methods, kits, and applications according to this application. The examples described herein are illustrative of the subject matter of this application and are not intended to limit the scope of this application. Many modifications and variations can be made to the technical solutions described and illustrated herein without departing from the scope of this application. Therefore, it should be understood that these examples are merely illustrative and do not limit the scope of the invention.

[0143] The materials used in the methods for preparing large-fragment sequencing libraries and in the sequencing methods disclosed herein can be obtained using standard, well-known methods. Many materials are commercially available, and even those not commercially available can be prepared using standard techniques known to those skilled in the art. The preparation and/or sequencing methods provided herein include multiple reaction steps, each of which can be performed independently and may or may not be combined with one or more preceding or subsequent steps. Therefore, this application aims to protect each individual reaction step of the preparation and/or sequencing methods provided herein separately.

[0144] According to one embodiment of this application, a method for preparing a large-fragment sequencing library is specifically provided, comprising the following steps:

[0145] 1. Fragment the genomic DNA into fragments of 2 to 8 kb, and purify the DNA fragments using magnetic beads.

[0146] 2. End-repair and A-tail the DNA fragments, ligate them to an adapter containing uracil nucleotides, and purify the DNA ligation product.

[0147] 3. Digest the DNA ligation product with a uracil-specific nicking enzyme to form a single-nucleotide gap at the uracil nucleotide position; then use controlled high-temperature denaturation to release the nucleotides between the single-nucleotide gap and the end, yielding a DNA digestion product with sticky ends at both ends.

[0148] 4. Without adding DNA ligase, circularize the DNA digestion products to form circular DNA molecules with single-strand gaps, and purify the circular DNA molecules.

[0149] 5. Under controlled reaction conditions, perform nick translation on the circular DNA molecules, cleave the circular DNA molecules at the translated nick, and screen DNA fragment products of specific lengths to obtain a large-fragment sequencing library.

[0150] According to one embodiment of this application, a sequencing method for a large fragment sequencing library is specifically provided, comprising the following steps:

[0151] 1. Fragment the genomic DNA and purify the fragmented products by gel extraction to obtain DNA fragments of 3-5 kb.

[0152] 2. End-repair the DNA fragments and add a 3' A-tail using a mixture of nucleic acid polymerases containing Taq DNA polymerase. Ligate the DNA fragments to an adapter that has a phosphate group at the 5' end and a protruding end at the 3' end, with uracil nucleotides present on the single strand of the protruding end. Purify with magnetic beads to obtain the DNA ligation product with uracil nucleotides at both ends.

[0153] 3. Digest the DNA ligation product with USER enzyme to form a single-nucleotide gap at the position of the uracil nucleotide; treat at 65-80℃ for 5-20 min to release the single strand between the single-nucleotide gap and the end, producing a DNA digestion product with sticky ends at both ends, wherein the sequences of the sticky ends are complementary.

[0154] 4. Prepare a ligation reaction system without adding DNA ligase so that the DNA digestion product circularizes into a circular DNA molecule, wherein each of the two single strands of the circular DNA molecule has a gap; digest the linear DNA molecules with a nuclease and recover the circular DNA molecules.

[0155] 5. By controlling the amount of nucleic acid polymerase and the concentration of dNTPs, perform controlled nick translation (CNT) on the circular DNA molecules, with the nick translation length controlled to 200-300 bp; cleave the circular DNA molecules at the translated nick using a mismatch nuclease, and screen DNA fragment products of about 400-600 bp.

[0156] 6. End-repair the DNA fragment product and add a 3' A-tail, then ligate the DNA fragment to an additional adapter to obtain a secondary ligation product.

[0157] 7. Design amplification primers based on the sequence of the additional adapter in the secondary ligation product, and perform PCR-free amplification on the secondary ligation product to obtain a large-fragment sequencing library.

[0158] 8. After the large-fragment sequencing library has passed quality control, sequence it using a sequencer.

[0159] The technical solution according to the embodiments of this application can efficiently and rapidly construct large-fragment sequencing libraries, avoiding the loss of large-fragment information due to cumbersome operation steps. Eliminating the PCR step avoids PCR bias and amplification errors of long DNA fragments, thereby improving the accuracy of large-fragment detection. Circularization through a special adapter design enables simultaneous controlled nick translation in two directions. Because mismatch nucleases efficiently cleave both the nick and the complementary strand, information from both ends of large fragments can be captured quickly and accurately, facilitating subsequent genome assembly and variant detection.
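The relationship between the nick translation length in step 5 and the size range of the screened product can be checked with simple arithmetic. The sketch below assumes that each selected product consists of two translated arms, one from each direction, and ignores the adapter linking region between them; this simplification and the function name are illustrative assumptions, not statements from this application.

```python
def expected_product_range(translation_min_bp, translation_max_bp, arms=2):
    """Return (min, max) product length for `arms` translated arms.

    Assumes the product length is the sum of the translated arms only;
    the adapter linking region is ignored in this simplified model.
    """
    return arms * translation_min_bp, arms * translation_max_bp


# Nick translation length of 200-300 bp, as stated in step 5.
product_range = expected_product_range(200, 300)
```

Under this assumption, a 200-300 bp translation length in each direction corresponds to the approximately 400-600 bp products screened in step 5.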

[0160] Example 1

[0161] NA12878 Commercial Standard DNA Large Fragment Sequencing Library Preparation Process

[0162] I. Experimental Materials

[0163] Large-fragment sequencing libraries were prepared using NA12878 commercial standard DNA as the sample. The prepared libraries were sequenced using a DNBSEQ-G99 sequencer (PE150 sequencing type), with a sequencing data volume of 100M. Data analysis was performed to obtain performance results including data utilization, alignment rate, and duplication rate.

[0164] II. Experimental Procedure

[0165] See Figure 1. The procedure includes the following steps:

[0166] 1. DNA fragmentation

[0167] Take 200 ng of NA12878 commercial standard DNA and shear it with a Covaris M220 instrument to obtain DNA fragments of 3-5 kb; purify the DNA fragments with 0.5× DNA Clean Beads and dissolve them in 40 μL TE buffer.

[0168] 2. End repair and addition of A-terminal bases

[0169] Take the purified DNA fragment from the previous step and prepare the end repair reaction system according to Table 1 below. After mixing and centrifuging, place it in a PCR instrument and treat it at 37℃ for 30 min and 65℃ for 15 min. Finally, cool it to room temperature to obtain the end-repaired DNA fragment.

[0170] Table 1

[0171]
Component                          Volume
DNA fragments                      38 μL
10× T4 PNK buffer                  5 μL
T4 PNK (BGI, 01E016MS)             2 μL
T4 DNA polymerase (BGI, BGE007)    2 μL
dNTP (10 mmol/L)                   1 μL
dATP (100 mmol/L)                  1 μL
rTaq (BGI, 01E012MS)               1 μL
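The total volume and the final concentrations implied by Table 1 follow from the dilution formula C_final = C_stock × V_added / V_total. The short Python sketch below works through this arithmetic using the volumes listed in Table 1; the variable and function names are illustrative, and the dATP figure covers only the separately added dATP stock, not any dATP already present in the dNTP mix.

```python
# Volumes in microliters, copied from Table 1.
volumes_ul = {
    "DNA fragments": 38,
    "10x T4 PNK buffer": 5,
    "T4 PNK": 2,
    "T4 DNA polymerase": 2,
    "dNTP (10 mmol/L)": 1,
    "dATP (100 mmol/L)": 1,
    "rTaq": 1,
}
total_ul = sum(volumes_ul.values())


def final_concentration(stock, volume_ul, total_volume_ul):
    """Dilution formula: C_final = C_stock * V_added / V_total."""
    return stock * volume_ul / total_volume_ul


# Buffer strength in "x"; nucleotide concentrations in mmol/L.
buffer_fold = final_concentration(10, volumes_ul["10x T4 PNK buffer"], total_ul)
dntp_mmol_l = final_concentration(10, volumes_ul["dNTP (10 mmol/L)"], total_ul)
datp_mmol_l = final_concentration(100, volumes_ul["dATP (100 mmol/L)"], total_ul)
```

The listed volumes sum to 50 μL, so the 10× buffer ends up at 1× working strength.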

[0172] 3. Preparation and ligation of the first adapter

[0173] Synthesize the following sequences:

[0174] First adapter forward strand (Ad1-F):

[0175] 5'P-CGGTCGTGCAAGTCGGATGTGA ATAGCGTACGCTA-3'OH (SEQ ID NO:1, wherein, after digestion with USER enzyme in the next step, ATAGCGTACGCTA (SEQ ID NO:2) forms a sticky end, which further serves as the linking region of the DNA circle in the circularization step);

[0176] First adapter reverse strand (Ad1-R):

[0177] 5'OH-CGCTA U TCACATCCGACTTGCACGACCGT-3'OH (SEQ ID NO:3, where U is the deoxyuridine nucleotide dUMP).

[0178] The 50 μmol/L Ad1-F sequence and the 50 μmol/L Ad1-R sequence were annealed to form the 25 μmol/L first adapter Ad1.
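The design of the first adapter can be checked directly from the listed sequences. The following Python sketch, offered as an illustrative check rather than part of the described method, verifies that Ad1-R pairs with Ad1-F, that removing U frees a short 5' piece of Ad1-R, and that the exposed overhang contains a self-complementary core that lets two identical overhangs anneal. Treating the final T of Ad1-R as the 3' overhang that pairs with the A-tail is an assumption made here; the variable names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq):
    """Reverse complement; U is treated like T (it pairs with A)."""
    return seq.translate(COMPLEMENT)[::-1]


AD1_F = "CGGTCGTGCAAGTCGGATGTGAATAGCGTACGCTA"  # SEQ ID NO:1
OVERHANG = "ATAGCGTACGCTA"                     # SEQ ID NO:2
AD1_R = "CGCTAUTCACATCCGACTTGCACGACCGT"        # SEQ ID NO:3

u_index = AD1_R.index("U")
released = AD1_R[:u_index]        # 5' piece freed after U excision and heating
retained = AD1_R[u_index + 1:-1]  # stays annealed; final T assumed to be the 3' overhang

# The retained part of Ad1-R pairs with the 5' portion of Ad1-F.
duplex_ok = reverse_complement(retained) == AD1_F[:len(retained)]
# Before digestion, the released piece plus U pairs with the start of the overhang.
pre_digest_ok = reverse_complement(AD1_R[:u_index + 1]) == OVERHANG[:u_index + 1]
# The overhang minus its first base is its own reverse complement.
core = OVERHANG[1:]
core_is_palindrome = core == reverse_complement(core)
```

The 12-nt core is shorter than the 13-nt overhang, so when two overhangs anneal, the position opposite the leading A on each strand is left unfilled, which appears consistent with the single-nucleotide gap on each strand described in the method.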

[0179] To the completed end-repair reaction, add 8 μL of 10×T4 DNA ligase buffer, 2 μL of T4 DNA ligase (NEB, M0202S, 400 U/μL), 2 μL of the first adapter Ad1 (25 μmol/L), and 20 μL of ddH2O. After mixing and centrifugation, incubate in a PCR instrument at 20°C for 30 min. After the reaction, purify the Ad1 adapter ligation product using 40 μL of DNA Clean Beads and dissolve it in 32 μL of TE buffer. The Ad1 adapter ligation product and its sequence information are shown in Figure 3(A).

[0180] 4. USER enzymatic digestion

[0181] Add 10 μL Standard Taq Reaction Buffer (NEB, B9014S), 5 μL USER enzyme (NEB, M5505L), and 55 μL ddH2O to the purified ligation product from the previous step. After mixing and centrifugation, place the mixture in a PCR instrument and incubate at 37°C for 30 min, 70°C for 15 min (heat denaturation), and 4°C for 5 min. The DNA product after USER digestion and its sequence information are shown in Figure 3(B).

[0182] 5. Circularization and linear DNA digestion

[0183] After completing the digestion reaction in the previous step, add 200 μL of 10×T4 DNA ligase buffer and 1700 μL of ddH2O to the mixture. After mixing and centrifugation, incubate at room temperature for 30 min for circularization. The circularization product and its sequence information are shown in Figure 3(C): the reverse complementary nucleotide sequence regions serve as the circularization linking region of the circular DNA molecule, with one gap on each of the inner and outer single strands, flanking the linking region. After the reaction, 5 μL of Plasmid-Safe ATP-Dependent DNase (EPICENTRE, E3110K) was added to the circularization reaction system, which was mixed, centrifuged, and then incubated in a PCR instrument at 37°C for 30 min to complete the linear DNA digestion. The completed reaction system was divided into two tubes of approximately 1 mL each, and 500 μL of DNA Clean Beads was added to each tube for purification. The purified product was dissolved in 50 μL of TE buffer.

[0184] 6. Controlled Nick Translation (CNT)

[0185] Using the MGIEasy CNT-CPE Large Fragment Library Construction Kit V1.0, the purified product from the previous step was used to prepare the controlled nick translation reaction system according to Table 2 below. After mixing and centrifugation, the system was placed in a PCR instrument and treated at 37℃ for 15 min and 65℃ for 15 min.

[0186] Table 2

[0187]
Component                                        Volume
DNA fragments                                    50 μL
NEBuffer 2 (NEB, B7002S)                         6 μL
Bst DNA polymerase, Full Length (NEB, B7002S)    2 μL
Klenow fragment (Enzymatics, P7060L)             0.5 μL
dNTP mix (10 mM)                                 2.5 μL

[0188] 7. Digestion with T7 endonuclease I

[0189] After completing the nick translation reaction, add 7 μL of 10×T7 endonuclease I buffer and 3 μL of T7 endonuclease I (Beyotime, D7080S) to the reaction mixture, mix well, centrifuge, and incubate at 37°C for 30 min in a PCR instrument to complete the digestion reaction. The reaction product is then subjected to double-sided magnetic bead size selection using 49 μL + 10.5 μL of DNA Clean Beads. Specifically, cleaving the circular DNA molecule at the translated nick produces smaller DNA fragments of approximately 400-600 bp and larger DNA fragments of approximately 2500-5000 bp. The first round of magnetic bead adsorption removes fragments larger than approximately 2500 bp, leaving the DNA fragments of approximately 400-600 bp in the supernatant. A second round of magnetic bead adsorption and purification is then performed, and the purified product is dissolved in 40 μL of TE buffer.
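Double-sided bead selection is usually described by bead-to-sample volume ratios, so it can help to convert the 49 μL + 10.5 μL additions into ratios. The sketch below does this for the reaction volume implied by Table 2 plus the added buffer and enzyme; it assumes no volume loss between steps and expresses the second ratio cumulatively relative to the original sample volume, which is a common convention but an assumption here. The function name is illustrative.

```python
def bead_ratios(sample_ul, first_beads_ul, second_beads_ul):
    """Return (first, cumulative) bead-to-sample volume ratios."""
    first = first_beads_ul / sample_ul
    cumulative = (first_beads_ul + second_beads_ul) / sample_ul
    return first, cumulative


# Table 2 components (61 uL) plus 7 uL buffer and 3 uL T7 endonuclease I.
sample_ul = 50 + 6 + 2 + 0.5 + 2.5 + 7 + 3
first_ratio, cumulative_ratio = bead_ratios(sample_ul, 49, 10.5)
```

A lower first ratio binds mainly the larger fragments, which matches the description above in which the first round removes fragments larger than approximately 2500 bp and the second round recovers the smaller products from the supernatant.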

[0190] 8. End repair and addition of A-terminal bases

[0191] Referring to step 2 above, "End repair and addition of A-terminal bases", take the purified DNA fragment from the previous step to prepare an end repair reaction system, and after reaction treatment, obtain the end-repaired DNA fragment.

[0192] 9. Preparation and ligation of the second adapter

[0193] Synthesize the following sequences:

[0194] Second adapter forward strand (Ad2-F):

[0195] 5'P-AGTCGGAGGCCAAGCGGTCTTAGGAAGACAA TAGAGGACAA CAACTCCTTGGCTCA CA-3'OH (SEQ ID NO:4, the underlined sequence is the barcode sequence);

[0196] Second adapter reverse strand (Ad2-R):

[0197] 5'OH-TTGTCTTCCTAAGGAACGACATGGCTACGATCCGACTT-3'OH (SEQ ID NO: 5).

[0198] The 50 μmol/L Ad2-F and 50 μmol/L Ad2-R sequences were annealed to form the 25 μmol/L second adapter Ad2.

[0199] Add 8 μL of 10×T4 DNA ligase buffer, 2 μL of T4 DNA ligase (NEB, M0202S, 400 U/μL), 2 μL of adapter Ad2 (25 μmol/L), and 18 μL of ddH2O to the end-repair reaction system from step 8 above. After mixing and centrifugation, incubate in a PCR instrument at 20°C for 30 min. After the reaction, purify the secondary ligation product using 40 μL of DNA Clean Beads and dissolve it in 32 μL of TE buffer.

[0200] 10. DNB amplification and sequencing

[0201] The purified secondary ligation product was used to prepare DNBs using a one-step DNB preparation kit (BGI, 020-000849-00). After successful DNB preparation, sequencing was performed using the DNBSEQ-2000PE100 sequencing instrument. The sequencing results were analyzed through basic steps including adapter and primer filtering and alignment. FastQC was used to evaluate the data, and PRINSEQ was used to detect PCR duplicates. The raw data was filtered using the NGS QC Toolkit to remove low-quality reads and reads containing adapters. The filtered data was then aligned to the human genome (version GRCh38), duplicates were marked using Picard, and Samtools was used for insert size analysis.
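The insert size analysis mentioned above can be illustrated with a small, self-contained Python sketch that reads the template length (TLEN, column 9 of a SAM record) and reports the fraction of read pairs below a size threshold. This is not the Samtools implementation and not necessarily the analysis used in this example; the function name, the 1 kb default threshold, and the SAM records are hypothetical.

```python
def short_pair_fraction(sam_lines, threshold_bp=1000):
    """Fraction of read pairs whose TLEN is below `threshold_bp`.

    Only records with a positive TLEN (SAM column 9) are counted, so each
    pair contributes once.
    """
    total = short = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        tlen = int(fields[8])
        if tlen <= 0:
            continue  # mate with negative TLEN, or TLEN unavailable
        total += 1
        if tlen < threshold_bp:
            short += 1
    return short / total if total else 0.0


# Hypothetical records: one pair spanning 4000 bp and one spanning 500 bp.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t1000\t60\t100M\t=\t4900\t4000\t*\t*",
    "r1\t147\tchr1\t4900\t60\t100M\t=\t1000\t-4000\t*\t*",
    "r2\t99\tchr1\t2000\t60\t100M\t=\t2400\t500\t*\t*",
    "r2\t147\tchr1\t2400\t60\t100M\t=\t2000\t-500\t*\t*",
]
fraction = short_pair_fraction(example)
```

A low value of this fraction would indicate that most pairs span a long distance on the reference, which is the property a large-fragment library is expected to show.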

[0202] III. Experimental Results

[0203] 1. See Table 3 below for sequencing data statistics. The results are in line with expectations. The duplication rate is low and the sequencing coverage is high, indicating that the template utilization efficiency is high.

[0204] Table 3: Sequencing Data Statistics

[0205]
Metric                   Value
Available data           432,364,673
Sequencing depth         32.4
Alignment rate           99.65%
Duplication rate         0.28%
Coverage                 99.47%
At least 10× coverage    99.68%

[0206] 2. The statistics of long-fragment data are shown in Table 4 below. The results are as expected. The proportion of read pairs with a length of less than 1 kb is low, indicating that the large-fragment library has good specificity.

[0207] Table 4: Statistics of Long-Fragment Data

[0208]

[0209] The analysis results show that the method described in this embodiment can efficiently and rapidly construct large-fragment sequencing libraries, avoiding the loss of large-fragment information caused by cumbersome operation steps. Circularization through a special adapter design enables simultaneous restriction nick translation in two directions. Utilizing the efficient cleavage of the T7 endonuclease I on the nick and complementary strands, information from both ends of large fragments can be captured quickly and accurately.

[0210] The subject matter of this application has been fully described through the above embodiments. Those skilled in the art will understand that the implementation methods of this application are not limited to the above embodiments, and the same solution can be implemented within a broad range of equivalents without affecting the scope of the subject matter or specific aspects described herein. Any changes, modifications, substitutions, combinations, or simplifications made without departing from the spirit and principle of this application should be considered equivalent substitutions and are included within the protection scope of this application.

Claims

1. A method for preparing a large-fragment sequencing library, characterized in that it comprises the following steps: providing a DNA fragment, and ligating the DNA fragment to an adapter containing uracil nucleotides to obtain a DNA ligation product with adapters attached to both ends; removing the uracil nucleotides at the ends of the DNA ligation product to obtain a DNA digestion product with sticky ends at both ends, wherein the sequences of the sticky ends at both ends are complementary; circularizing the DNA digestion product to form a circular DNA molecule with a single-strand gap; subjecting the circular DNA molecule to length-controlled gap translation; and cutting the circular DNA molecule at the translated gap and screening the DNA fragment products to obtain a large-fragment sequencing library.

2. The preparation method according to claim 1, characterized in that the DNA fragment is 2-40 kb in length; preferably, the DNA fragment has protruding sticky ends; preferably, the 5' end of the DNA fragment has a phosphate group.

3. The preparation method according to claim 1, characterized in that the adapter comprises a first single strand and a second single strand, wherein the first single strand is attached to the 3' end of the DNA fragment, and the second single strand is attached to the 5' end of the DNA fragment; preferably, the 5' end of the first single strand has a phosphate group, and the 3' end of the second single strand has a protruding end; preferably, the uracil nucleotide is located on the second single strand; preferably, on the first single strand, the sequence from the nucleotide complementary to the uracil nucleotide to the 3' end includes a reverse complementary nucleotide sequence portion; preferably, on the first single strand, the length of the sequence from the nucleotide complementary to the uracil nucleotide to the 3' end is greater than the length of the reverse complementary nucleotide sequence portion.

4. The preparation method according to claim 1, characterized in that removing the uracil nucleotides removes the nucleotides from the uracil nucleotide to the 5' end of the single strand on which it is located; preferably, the sticky ends at both ends of the DNA digestion product each contain a reverse complementary nucleotide sequence portion, which serves as the circularization linking region of the circular DNA molecule; preferably, the circularization linking region of the circular DNA molecule has a gap on each of the two single strands.

5. The preparation method according to claim 1, characterized in that the gap translation is achieved by at least one of the following enzymes or enzyme mixtures: (1) a nucleic acid polymerase with 5'-3' strand displacement activity; or (2) a 5'-3' nucleic acid polymerase together with either a nucleic acid polymerase having 5'-3' exonuclease activity or a 5'-3' exonuclease.

6. The preparation method according to claim 1, characterized in that the method further includes purifying the circular DNA molecule; preferably, the purification includes digesting linear DNA molecules and recovering circular DNA molecules; optionally, the method further includes screening the DNA fragment products according to a specific length; optionally, the method further includes ligating the screened DNA fragment product with an additional adapter to obtain a secondary ligation product; optionally, the method further includes amplifying the screened DNA fragment product or the secondary ligation product to obtain an amplified library; optionally, the method further includes sequencing the DNA fragment product, the secondary ligation product, or the amplified library thereof.

7. A sequencing method for a large-fragment sequencing library, characterized in that it comprises the following steps: constructing a large-fragment sequencing library according to the preparation method of any one of claims 1-6; and sequencing the large-fragment sequencing library; optionally, the sequencing method further includes analyzing the sequencing data to obtain sequence information at both ends of the DNA fragment.

8. An adapter having uracil nucleotides, characterized in that the adapter comprises a first single strand and a second single strand; preferably, the first single strand is used to connect to the 3' end of a DNA fragment, and the second single strand is used to connect to the 5' end of the DNA fragment; preferably, the 5' end of the first single strand has a phosphate group, and the 3' end of the second single strand has a protruding end; preferably, the uracil nucleotide is located on the second single strand; preferably, on the first single strand, the sequence from the nucleotide complementary to the uracil nucleotide to the 3' end includes a reverse complementary nucleotide sequence portion; preferably, on the first single strand, the length of the sequence from the nucleotide complementary to the uracil nucleotide to the 3' end is greater than the length of the reverse complementary nucleotide sequence portion.

9. A kit for preparing large-fragment sequencing libraries, characterized in that it includes: an adapter ligation kit; a single-strand digestion kit; a gap translation kit; and a gap-cutting kit; optionally, the adapter ligation kit includes an adapter with uracil nucleotides; optionally, the single-strand digestion kit includes a uracil nicking enzyme; optionally, the single-strand digestion kit further includes a nucleic acid polymerase having 5'-3' strand displacement activity or 5'-3' exonuclease activity, or a 5'-3' exonuclease; optionally, the gap translation kit includes at least one of the following enzymes or enzyme mixtures: (1) a nucleic acid polymerase with 5'-3' strand displacement activity; or (2) a 5'-3' nucleic acid polymerase together with either a nucleic acid polymerase having 5'-3' exonuclease activity or a 5'-3' exonuclease; optionally, the gap-cutting kit includes a mismatch-cutting nuclease; preferably, the mismatch-cutting nuclease is T7 endonuclease I.

10. Use of the preparation method according to any one of claims 1-6, the sequencing method according to claim 7, the adapter according to claim 8, and the kit according to claim 9 in the preparation or sequencing of large-fragment sequencing libraries; preferably, the adapter is used to ligate DNA fragments, and a large-fragment sequencing library is prepared based on the obtained DNA ligation product, or the large-fragment sequencing library is further sequenced; preferably, the preparation method according to any one of claims 1-6, the sequencing method according to claim 7, the adapter according to claim 8, and the kit according to claim 9 are used for gene mutation detection or genome assembly or splicing.