Single base precision DNA double strand break sequencing library, method for constructing the same and application thereof

By optimizing library construction and bioinformatics analysis using the DSBclear-seq method, the detection bias caused by protein occlusion in existing technologies has been resolved. This enables efficient and low-cost detection of DNA double-strand breaks with single-base precision, and is suitable for high-resolution analysis of endogenous DSBs and restriction enzyme sites.

CN122303379A · Pending · Publication Date: 2026-06-30 · OCEAN UNIV OF CHINA


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: OCEAN UNIV OF CHINA
Filing Date: 2026-04-23
Publication Date: 2026-06-30

Application Information

Patent Timeline

23 Apr 2026: Application
30 Jun 2026: Publication (CN122303379A)

IPC: C12Q1/6806; C12Q1/6869; C40B50/06; G16B30/00; G16B30/10

AI Tagging

Technology Topics

genomic DNA; Detection bias


AI Technical Summary

Technical Problem

Existing single-base precision DNA double-strand break detection methods suffer from protein occlusion and chromatin spatial conformation limitations, leading to detection bias. They are also characterized by high reagent costs, cumbersome operation, and long cycles, making it difficult to achieve high-resolution detection of endogenous DSBs.

Method used

The DSBclear-seq method is adopted, which optimizes the library construction process and combines it with bioinformatics analysis. High-molecular-weight genomic DNA is used as the starting material, to which P5 adapters and semi-functional P7 adapters are ligated. Bioinformatics analysis is used to filter false-positive background, which simplifies the operation steps and reduces reagent costs.

Benefits of technology

It significantly improves detection efficiency and sensitivity, can capture the ends of DSBs wrapped by proteins, reduces labor and time costs, is suitable for high-resolution analysis of endogenous DSBs and specific break sites induced by restriction enzymes, and shortens the experimental cycle.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a single-base precision DNA double-strand break (DSB) sequencing library, its construction method, and its applications, belonging to the field of genome detection technology. Specifically, it relates to a method for constructing a single-base precision DNA double-strand break sequencing library, comprising: extracting high-molecular-weight genomic DNA from a sample; ligating P5 adapters to the ends of DSBs present in the genome; fragmenting the P5-ligated genomic DNA by sonication; ligating semi-functional P7 adapters; and enriching the target sequences by PCR to obtain the DSB sequencing library. By using the constructed library for DSB sequencing and combining it with bioinformatics analysis strategies to filter false-positive background noise, the invention overcomes the DSB detection bias caused by protein masking in existing in situ labeling methods. It also significantly reduces reagent costs, simplifies experimental procedures, and shortens experimental cycles, thereby improving detection efficiency and the potential for large-scale application.


Description

Technical Field

[0001] This invention belongs to the field of genome detection technology, and particularly relates to single-base precision DNA double-strand break sequencing libraries, their construction methods, and applications.

Background Technology

[0002] DNA double-strand breaks (DSBs) were previously considered one of the most cytotoxic DNA damages. Their erroneous repair can lead to mutations and genomic rearrangements, and can trigger cell cycle arrest, chromosomal translocations, apoptosis, and even cancer. However, increasing evidence in recent years suggests that endogenous DSBs play important regulatory roles in neural development and cancer development, participating in various biological processes such as transcription, replication, meiosis, and immune cell development. With a deeper understanding of the physiological functions of DSBs, accurately analyzing their generation and repair characteristics at the genomic scale has become a crucial prerequisite for understanding their biological roles. However, achieving high-precision localization and quantitative analysis of DSBs at the genomic level remains a significant challenge. In recent years, the rapid development of DSB mapping technology has enabled researchers to depict the distribution of DSBs across the genome with unprecedented scope and resolution, thus opening new avenues for related research.

[0003] Currently, high-throughput detection methods for DSBs mainly fall into three categories: 1) Indirect DNA damage analysis methods, including chromatin immunoprecipitation sequencing (ChIP-seq), translocation capture sequencing (TC-Seq), and GUIDE-seq. Although these methods can detect DSBs across the entire genome, they are all indirect, resulting in lower resolution and reflecting only the DSB repair process. 2) DSB detection methods based on DNA polymerization by terminal deoxynucleotidyl transferase (TdT), including damaged DNA immunoprecipitation (dDIP), DSB-seq, and DNA breakage immunocapture (dBrIC). These techniques all use TdT to label DSBs with biotinylated nucleotides, immunoprecipitate the DSB fragments with anti-biotin antibodies, and sequence them after library construction. The average resolution is approximately 100-300 bp, which is higher than that of indirect methods. 3) Single-base precision DSB detection methods, including the pioneering BLESS method by Crosetto et al., which enabled direct in situ DSB labeling and streptavidin magnetic bead enrichment, allowing localization and quantification of DSBs across the entire genome at single-nucleotide resolution. Subsequent methods such as END-seq, DSBCapture, and BLISS are all optimizations and improvements based on the BLESS principle. However, in situ labeling of DSBs within the cell nucleus still has limitations. After DSBs occur, they are usually rapidly occupied by DNA damage response (DDR) proteins. In addition, under in situ fixation conditions, chromatin retains a higher-order folded conformation, causing protein shielding and steric hindrance at the break ends. This reduces the accessibility of the break ends to ligases, making it difficult to effectively detect some protein-bound DSBs. Furthermore, because these methods label DSBs in situ within the cell nucleus, they demand high stability and in situ reactivity of the enzyme reagents, resulting in high reagent costs, cumbersome experimental procedures, and a long overall experimental cycle.

Summary of the Invention

[0004] To address the aforementioned technical challenges, this invention proposes a single-base precision DNA double-strand break sequencing library, its construction method, and its applications. This invention directly uses high-molecular-weight genomic DNA as the starting material for library construction. Although this may introduce exogenous random DNA break signals, these signals lack consistency and reproducibility in biological replicates and can be effectively identified and removed. Based on this, this invention further employs a bioinformatics analysis strategy to filter false positives based on the read coverage depth of DSB sites and the signal consistency across replicate samples, thereby significantly reducing noise interference and improving the detection accuracy of true DSB signals. Overall, DSBclear-seq overcomes the DSB detection bias caused by protein physical occlusion and chromatin spatial conformation limitations in existing in situ labeling strategies by optimizing the library construction process and integrating computational filtering methods. Simultaneously, this method demonstrates significant advantages in reducing reagent costs, simplifying operational steps, and shortening experimental cycles, which is beneficial for improving detection efficiency and large-scale application potential, especially suitable for high-resolution analysis of endogenous DSBs and restriction endonuclease-induced specific break sites.
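The replicate-consistency filter described above can be illustrated with a minimal Python sketch. It is not the inventors' implementation: the function name `reproducible_sites`, the input layout (one dictionary of site coverage per replicate), and the requirement that a site pass in at least two replicates are assumptions made for illustration. Only the coverage threshold of 5 comes from the specification.

```python
from collections import Counter

def reproducible_sites(replicates, min_coverage=5, min_replicates=2):
    """Keep DSB sites whose coverage passes the threshold in enough replicates.

    replicates: list of dicts mapping (chrom, pos) -> read coverage.
    min_replicates is an illustrative assumption, not a value from the text.
    """
    support = Counter()
    for coverage in replicates:
        for site, depth in coverage.items():
            if depth >= min_coverage:
                support[site] += 1
    return {site for site, n in support.items() if n >= min_replicates}

# Toy data: only chr1:100 reaches coverage >= 5 in both replicates.
rep1 = {("chr1", 100): 12, ("chr1", 250): 1, ("chr2", 40): 7}
rep2 = {("chr1", 100): 9, ("chr1", 300): 2, ("chr2", 40): 3}
kept = reproducible_sites([rep1, rep2])
```

Random breaks introduced during DNA extraction are unlikely to recur at the same base in independent replicates, which is why a combined depth and reproducibility test removes them.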

[0005] To overcome the limitations of existing single-base precision DSB detection methods, this invention provides a novel DSB library construction method, DSBclear-seq. This method avoids the problem of protein-bound DSBs being undetectable, uses low-cost reagents, has simple steps, and a short cycle time. It is particularly suitable for the detection of endogenous DSBs and specific DSB sites such as enzyme cleavage sites, and has the potential for high efficiency and large-scale application.

[0006] To achieve the above objectives, this invention provides a method for constructing a single-base precision DNA double-strand break sequencing library, comprising the following steps: 1) extracting high-molecular-weight genomic DNA from cell or tissue samples, and then ligating P5 adapters to the ends of DSBs present in the genome; 2) fragmenting the P5-ligated genomic DNA by sonication into 300-500 bp fragments, and then ligating the semi-functional P7 adapter; 3) enriching the target sequences by PCR to obtain the DSB sequencing library.

[0007] Preferably, the P5 adapter ligation in step 1) includes the following steps: trimming the 5' and 3' overhangs of the free DNA ends to form blunt ends; adding an A tail to the 3' end of the blunt-ended DNA; ligating the DNA ends carrying the 3' A overhang to the adapter carrying a 3' T overhang; and performing magnetic bead size selection to remove excess adapters.

[0008] Preferably, the P5 adapter in step 1) carries a C3 spacer arm. The P5 adapter includes a first strand and a second strand. The nucleotide sequence of the first strand is shown in SEQ ID NO.1, and that of the second strand is shown in SEQ ID NO.2. The 5' end of the second strand is phosphorylated, and its 3' end carries a spacer (phosphoramidite) modification. After annealing, a double-stranded oligonucleotide with a 3' T sticky end is generated for the subsequent TA ligation reaction. TA sticky-end ligation has higher capture efficiency than blunt-end ligation and can improve detection sensitivity. The C3 spacer arm of the P5 adapter provides steric hindrance, preventing further adapter attachment at the P5 adapter during the second-round P7 adapter ligation.

[0009] Preferably, step 2) involves the following steps: trimming the 5' and 3' overhangs of the free DNA ends to form blunt ends; adding an A tail to the 3' end of the blunt-ended DNA; ligating the DNA ends carrying the 3' A overhang to the adapter carrying a 3' T overhang; and performing magnetic bead size selection to remove excess adapters.

[0010] Preferably, the semi-functional P7 adapter in step 2) comprises a first strand and a second strand. The nucleotide sequence of the first strand is shown in SEQ ID NO.3, and that of the second strand is shown in SEQ ID NO.4. The 5' end of the second strand is phosphorylated, and its 3' end carries a spacer (phosphoramidite) modification. The 3' T sticky end of the P7 adapter is used for TA ligation to the target fragment. The semi-functional P7 adapter lacks the complementary sequence required for binding of the subsequent PCR amplification primer, so only target sequences containing DSB sites, which carry a P5 adapter at one end and a P7 adapter at the other, can be enriched by PCR amplification for subsequent sequencing analysis.

[0011] Preferably, the target sequence in step 3) is a sequence with a P5 adapter attached to one end and a semi-functional P7 adapter attached to the other end. Only target sequences with a P5 adapter attached to one end and a semi-functional P7 adapter attached to the other end can be amplified in PCR, as the single-base site information of DNA double-strand breaks is fixed by the P5 adapter. During PCR, non-target sequences with semi-functional P7 adapters attached to both ends do not have PCR primer binding sites and therefore cannot be amplified. The primer sequences used for PCR amplification are shown in SEQ ID NO. 5 and SEQ ID NO. 6.

[0012] This invention also provides a single-base precision DNA double-strand break sequencing library constructed using the aforementioned method. The sequence structures generated by DSB library construction fall into three categories: both ends are connected to P7 adapter sequences; one end is connected to a P5 adapter sequence and the other end to a P7 adapter sequence; and both ends are connected to P5 adapter sequences. In the first case, the sequence fragment lacks primer binding sites, therefore it will not be amplified in PCR and will degrade during high-temperature cycling. In the second case, only the P5 adapter sequence contains primer binding sites; after the first round of PCR amplification, fragments with complete P5 and P7 adapter sequences will be generated, enabling exponential amplification of the DSB target sequence fragment from the second round onwards. The third case only occurs when the distance between two adjacent DSBs is less than 500 bp, and normal amplification can also be achieved.
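The three fragment categories above can be summarized as a small lookup. The Python sketch below is illustrative only; the function name `pcr_fate` and the returned labels are hypothetical, while the outcomes themselves follow the paragraph above.

```python
def pcr_fate(end_a, end_b):
    """Return the expected PCR outcome for a fragment, given the adapter on each end.

    Each argument is "P5" or "P7", where "P7" means the semi-functional P7 adapter.
    """
    ends = sorted((end_a, end_b))
    if ends == ["P7", "P7"]:
        # No primer binding site on either end.
        return "not amplified"
    if ends == ["P5", "P7"]:
        # P5 primes round 1; complete P5/P7 ends exist from round 2 onwards.
        return "amplified (DSB target fragment)"
    if ends == ["P5", "P5"]:
        # Only arises when two DSBs lie less than 500 bp apart.
        return "amplified (two nearby DSBs)"
    raise ValueError("adapter labels must be 'P5' or 'P7'")

outcomes = [pcr_fate("P7", "P7"), pcr_fate("P7", "P5"), pcr_fate("P5", "P5")]
```

Because every fragment produced by sonication receives a P7 adapter on at least one end, the P5 adapter is the only feature that distinguishes DSB-derived fragments from background.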

[0013] This invention also provides the application of the single-base precision DNA double-strand break sequencing library constructed by the above method in DNA double-strand break sequencing.

[0014] Preferably, the DNA double-strand break sequencing includes the following steps: using the single-base precision DNA double-strand break sequencing library, performing sequencing on the Illumina sequencing platform to obtain the original paired-end sequences, and determining the DSB positions through the sequencing results.

[0015] Further preferably, determining the DSB positions from the sequencing results includes the following steps: (a) screening the raw paired-end reads: trimming the sequencing adapters from read 1 and read 2, retaining pairs in which both reads align to the same chromosome of the genome, and removing PCR duplicates; (b) obtaining the position and read coverage of each DSB in the genome: only the alignment information of read 1 is retained; if read 1 aligns to the positive strand of the genome, the position of the first base of the alignment is taken as the DSB site; if read 1 aligns to the negative strand of the genome, the position of the last base of the alignment is taken as the DSB site; the number of reads aligned to a DSB site is recorded as the read coverage of that site; (c) retaining high-confidence DSB sites: only DSB sites with read coverage ≥5 are retained, thereby effectively filtering out random DSBs introduced during library construction.
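The site-calling rule in this paragraph can be sketched in a few lines of Python. The tuple layout `(chrom, start, end, strand)` with 1-based inclusive coordinates and the function name `call_dsb_sites` are assumptions for illustration; the strand rule and the coverage threshold of 5 are taken from the paragraph.

```python
from collections import Counter

def call_dsb_sites(read1_alignments, min_coverage=5):
    """Count read 1 ends per genomic position and keep well-supported sites.

    read1_alignments: iterable of (chrom, start, end, strand), 1-based inclusive.
    Returns a dict mapping (chrom, pos) -> read coverage.
    """
    coverage = Counter()
    for chrom, start, end, strand in read1_alignments:
        # "+" strand: first aligned base; "-" strand: last aligned base.
        pos = start if strand == "+" else end
        coverage[(chrom, pos)] += 1
    return {site: n for site, n in coverage.items() if n >= min_coverage}

# Toy input: five "+" reads starting at chr1:1000, two "-" reads ending at chr1:500.
reads = [("chr1", 1000, 1149, "+")] * 5 + [("chr1", 351, 500, "-")] * 2
sites = call_dsb_sites(reads)
```

In this toy input the site at chr1:1000 is retained with coverage 5, while the site at chr1:500 is discarded because only two reads support it.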

[0016] Compared with the prior art, the present invention has the following advantages and technical effects: This invention provides a single-base precision DNA double-strand break (DSB) sequencing library, its construction method, and applications, overcoming the limitations of existing single-base precision DSB detection methods. It offers a novel DSB library construction method, DSBclear-seq, combined with bioinformatics analysis strategies to filter false-positive background noise, thus overcoming to some extent the DSB detection bias caused by protein masking in existing in situ labeling methods. It also significantly reduces reagent costs, simplifies experimental procedures, and shortens experimental cycles, thereby improving detection efficiency and large-scale application potential. It is particularly suitable for high-resolution analysis of endogenous DSBs and restriction enzyme-induced specific break sites. This invention greatly reduces labor and time costs; DSB library construction can be completed in just two days, reducing the time required by methods such as DSBCapture and sBLISS by more than half. Working directly on extracted genomic DNA, rather than on cells in situ, allows protein-wrapped DSB ends to be captured, improving capture sensitivity. The enzyme reaction reagents used are inexpensive, which lowers the reagent cost of the required reactions and lays the foundation for large-scale, low-cost DSB sequencing. This invention is particularly suitable for capturing non-random DSB sites.

Description of the Attached Figures

[0017] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0018] Figure 1 is an overview of the principle and sequencing application of the single-base precision DNA double-strand break sequencing library of this invention.
Figure 2 is a schematic diagram of the high sensitivity of the single-base precision DNA double-strand break sequencing library of this invention and its capture of EcoRV-HF restriction sites, wherein a shows the high sensitivity and b shows the capture of EcoRV-HF restriction sites.
Figure 3 shows the capture efficiency of the single-base precision DNA double-strand break sequencing library of this invention and of sBLISS for EcoRV-HF restriction sites.
Figure 4 shows that the single-base precision DNA double-strand break sequencing library of this invention and sBLISS detect consistent DSB read distribution characteristics in the coral genome.
Figure 5 shows the noise filtering threshold and the high-sensitivity detection of low-level endogenous DSBs for the single-base precision DNA double-strand break sequencing library of this invention, wherein a) is the number of endogenous DSBs enriched at high DSB levels in sBLISS and DSBclear-seq before filtering; b) is the number of endogenous DSBs enriched at high DSB levels in sBLISS and DSBclear-seq after filtering; c) is a Venn diagram of the overlap rate of endogenous DSBs between sBLISS and DSBclear-seq before filtering; and d) is a Venn diagram of the overlap rate of endogenous DSBs between sBLISS and DSBclear-seq after filtering.

Detailed Implementation

[0019] Various exemplary embodiments of the present invention will now be described in detail. This detailed description should not be considered as a limitation of the present invention, but rather as a more detailed description of certain aspects, features, and embodiments of the present invention.

[0020] It should be understood that the terminology used in this invention is merely for describing particular embodiments and is not intended to limit the invention.

[0021] Unless otherwise stated, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. While only preferred methods and materials have been described herein, any methods and materials similar or equivalent to those described herein may be used in the implementation or testing of this invention. All references to this specification are incorporated by way of citation to disclose and describe methods and / or materials associated with those references. In the event of any conflict with any incorporated reference, the content of this specification shall prevail.

[0022] Various modifications and variations can be made to the specific embodiments described in this specification without departing from the scope or spirit of the invention, as will be apparent to those skilled in the art. Other embodiments derived from this specification will also be apparent to those skilled in the art. This specification and embodiments are merely exemplary.

[0023] The terms “include,” “including,” “have,” “contain,” etc., used in this article are all open-ended terms, meaning that they include but are not limited to.

[0024] Example 1 This example uses coral cells and employs the method of this invention to detect endogenous DSBs on the genome and DSBs generated by EcoRV-HF digestion, constructing a single-base precision DNA double-strand break library. The existing sBLISS detection method is used as a positive control, and a sample with no adapter added is used as a negative control. Details are as follows: 1. Cell dissociation and separation: Place the coral fragments into a 15 mL tube, add collagenase II (Solarbio, C8150) at a concentration of 2 mol / mL, and digest at room temperature. Pipette up and down continuously with a 1 mL pipette tip until the coral fragments turn white, obtaining a mixed cell suspension of coral polyps and zooxanthellae.

[0025] The mixed cell suspension was centrifuged at 400g for 10 min, and the supernatant was discarded. Coral polyp single-cell suspensions were then separated using hydroxyethyl starch density gradient centrifugation.

[0026] 2. Genomic DNA extraction: Centrifuge the coral polyp single-cell suspension at 800g for 10 min and discard the supernatant. Resuspend the pellet in a 1.5 mL EP tube with 100 μL Tail buffer (10 mM Tris-HCl, 100 mM NaCl, 50 mM EDTA, 1% SDS, pH 7.5, 25℃), add 10 μL of 20 mg/mL proteinase K (Roche, DBMK), and incubate at 55℃ and 800 rpm on a thermomixer until the sample becomes clear.

[0027] Add an equal volume of phenol:chloroform:isoamyl alcohol (25:24:1) and gently vortex for 10 min. Centrifuge at 20000g for 15 min, and transfer the clear upper layer of nucleic acid solution to a new 1.5 mL EP tube. Add an equal volume of chloroform:isoamyl alcohol (24:1) and gently vortex for 10 min. Centrifuge at 20000g for 15 min, and transfer the clear upper layer of nucleic acid solution to a new 1.5 mL EP tube. Add 0.1 × volume of 3M sodium acetate, mix, then add 2.45 × volume of pre-chilled ethanol, mix, and incubate at -20℃ for 1 h. Pre-chill the centrifuge to 4℃.

[0028] Centrifuge the precipitated nucleic acid at 20,000g for 30 minutes at 4°C, and discard the supernatant. Wash the precipitate twice with 70% ethanol, handling gently. After the final wash, remove all residual ethanol and dry the nucleic acid. Then dissolve the dried precipitate in ddH₂O.

[0029] 3. EcoRV-HF enzyme digestion: Take 1 μg of genomic DNA, add 5 μL of 10×rCutSmart™ Buffer (NEB, R3195S), and add 1 μL of EcoRV-HF® (NEB, R3195S), making a final volume of 50 μL. Incubate in a PCR instrument at 37°C for 1 h. Purify the reaction product again using the phenol-chloroform-isoamyl alcohol method.

[0030] 4. DSB end blunting and A-tailing: Take 1 μg of genomic DNA and add 15 μL of End Prep Mix 4 (Vazyme, ND607), bringing the final volume to 65 μL. Incubate in a PCR instrument at 20°C for 15 min, then at 65°C for 15 min; hold at 4°C.

[0031] 5. P5 adapter ligation: To the product from the previous step, add 25 μL Rapid Ligation Buffer 2 (Vazyme, ND607), 5 μL Rapid DNA Ligase (Vazyme, ND607), and 5 μL P5 adapter, bringing the final volume to 100 μL. Incubate at 20°C for 15 min in a PCR instrument; hold at 4°C. P5 adapter first strand: A ATGATACGGCGACCACCGAGATCTACAC[index]ACACTCTTTCCCTACACGACGCTCTTCCGATC T (SEQ ID NO.1). P5 adapter second strand: GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT[index]GTGTAGATCTCGGTGGTCGCCGTATCATT (SEQ ID NO.2). (Here, the gap between two bases indicates a phosphorothioate modification between those bases, which prevents degradation by nucleases; [index] indicates an index used to distinguish different samples during sequencing, typically a combination of 6-8 bases.)
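As a quick consistency check on the two P5 adapter strands listed above, the following Python sketch confirms that annealing them leaves a single unpaired 3' T on the first strand, which is the overhang used for TA ligation. The index `ACGTAC` is a hypothetical placeholder, and placing its reverse complement in the second strand is an assumption; the flanking sequences are copied from SEQ ID NO.1 and SEQ ID NO.2 with the modification marks omitted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

INDEX = "ACGTAC"  # hypothetical 6-base index, for illustration only

P5_FIRST = ("AATGATACGGCGACCACCGAGATCTACAC" + INDEX
            + "ACACTCTTTCCCTACACGACGCTCTTCCGATCT")
P5_SECOND = ("GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT" + revcomp(INDEX)
             + "GTGTAGATCTCGGTGGTCGCCGTATCATT")

def three_prime_overhang(first, second):
    """Return the unpaired 3' tail of `first` when `second` pairs with its 5' part.

    Returns None if the two strands are not complementary in that arrangement.
    """
    paired = revcomp(second)
    if first.startswith(paired):
        return first[len(paired):]
    return None

overhang = three_prime_overhang(P5_FIRST, P5_SECOND)
```

Here `overhang` evaluates to `"T"`, consistent with the 3' T sticky end described for the annealed adapter.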

[0032] 6. Ultrasonic fragmentation and genome purification: Using a non-contact ultrasonic cell disruptor, the parameters were set to 5 seconds of operation followed by 10 seconds of pause, 40% power, for a total time of 5 minutes, at 4°C, to obtain DNA fragments of 300-500 bp.

[0033] The reaction product was purified using VAHTS DNA Clean Beads (Vazyme) to remove residual P5 adapters and enzyme reaction solution. First, the magnetic beads were mixed and equilibrated to room temperature. A 0.6× volume of magnetic beads was added to 100 μL of the reaction product and incubated at room temperature for 5 min; the PCR tube was placed on a magnetic rack for approximately 5 min until the solution became clear, and the supernatant was carefully removed; with the PCR tube kept on the magnetic rack, the beads were rinsed twice with 200 μL of freshly prepared 80% ethanol, 30 s each time; the 80% ethanol was carefully removed, and the tube was left uncapped to dry for approximately 5 min until no ethanol residue remained; the PCR tube was removed from the magnetic rack, and the beads were resuspended in 22.5 μL of ddH₂O, mixed by pipetting, and left to stand at room temperature for 2 min. The PCR tube was then placed on the magnetic rack again for approximately 5 min until the solution became clear, and 20 μL was transferred to a new PCR tube.

[0034] 7. P7 adapter ligation and product purification: Repeat step 4 (end blunting and A-tailing), then add 25 μL Rapid Ligation Buffer 2 (Vazyme, ND607), 5 μL Rapid DNA Ligase (Vazyme, ND607), and 5 μL semi-functional P7 adapter to the product, bringing the final volume to 100 μL. Incubate at 20°C for 15 min in a PCR instrument; hold at 4°C. Then repeat the bead purification described in step 6.

[0035] Semi-functional P7 adapter first strand: C AAGCAGAAGACGGCATACGAGAT[index]GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC T (SEQ ID NO.3). Semi-functional P7 adapter second strand: GATCGGAAGAGCACACGTCTGAACTCCAGTCAC (SEQ ID NO.4). (Here, the gap between two bases indicates a phosphorothioate modification between those bases, which prevents degradation by nucleases; [index] indicates an index used to distinguish different samples during sequencing, typically a combination of 6-8 bases.)

[0036] 8. Library amplification: To the 20 μL of purified product from the previous step, add 5 μL of PCR Primer Mix 3 for Illumina (Vazyme, ND607) and 25 μL of VAHTS HiFi Amplification Mix (Vazyme, ND607), bringing the final volume to 50 μL. Run the following program in a PCR instrument: 95°C for 3 min; 15 cycles of 98°C for 20 s, 60°C for 15 s, and 72°C for 30 s; 72°C for 5 min; hold at 4°C. Primer sequences: AATGATACGGCGACCACCGA (SEQ ID NO.5); CAAGCAGAAGACGGCATACGA (SEQ ID NO.6).
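The statement that only the P5 adapter provides a primer binding site can be checked directly from the listed sequences. In the Python sketch below, a primer is taken to have a binding site on a strand if the primer's reverse complement occurs in that strand. This is a plain string comparison offered as an illustration under that assumption, not a model of PCR; `P5_SECOND_3P` is the portion of SEQ ID NO.2 that follows the index.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

PRIMER_P5 = "AATGATACGGCGACCACCGA"               # SEQ ID NO.5
PRIMER_P7 = "CAAGCAGAAGACGGCATACGA"              # SEQ ID NO.6
P5_SECOND_3P = "GTGTAGATCTCGGTGGTCGCCGTATCATT"   # SEQ ID NO.2, part after [index]
P7_SECOND = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # SEQ ID NO.4

def has_binding_site(primer, template_strand):
    """True if the primer can anneal to the strand (perfect match only)."""
    return revcomp(primer) in template_strand

p5_site = has_binding_site(PRIMER_P5, P5_SECOND_3P)  # True
p7_site = has_binding_site(PRIMER_P7, P7_SECOND)     # False
```

The second strand of the semi-functional P7 adapter stops short of the region complementary to SEQ ID NO.6, so the P7 primer has nothing to anneal to until a P5-primed extension has copied the full P7 first strand.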

[0037] 9. Product purification: First, vortex the VAHTS DNA Clean Beads to mix and equilibrate them to room temperature. Add a 0.9× volume of magnetic beads to the PCR product and mix by pipetting. Incubate at room temperature for 5 min. Place the PCR tube on a magnetic rack to separate the magnetic beads from the liquid. After standing for about 5 min, carefully remove the supernatant. Keep the PCR tube on the magnetic rack and rinse the beads twice with 200 μL of freshly prepared 80% ethanol, removing the supernatant completely each time. Keep the PCR tube on the magnetic rack and air dry for about 5 min until no ethanol residue remains. Remove the PCR tube from the magnetic rack, add 22.5 μL of ddH₂O, mix by pipetting, and let stand at room temperature for 2 min. Place the PCR tube back on the magnetic rack and wait for the solution to clarify. Carefully pipette 20 μL into a new EP tube.
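The bead amount in this step is given as a ratio to the sample volume. A minimal Python helper makes the arithmetic explicit; the function name `bead_volume_ul` is hypothetical, and the 50 μL sample volume is the PCR reaction volume from step 8.

```python
def bead_volume_ul(sample_volume_ul, ratio):
    """Volume of magnetic beads (in microlitres) for a given bead-to-sample ratio."""
    if sample_volume_ul <= 0 or ratio <= 0:
        raise ValueError("volume and ratio must be positive")
    return round(sample_volume_ul * ratio, 2)

# 0.9x beads for the 50 uL PCR product from step 8.
pcr_cleanup_beads = bead_volume_ul(50, 0.9)
```

For the 50 μL PCR product this gives 45 μL of beads.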

[0038] 10. Library quality control and submission for sequencing: The fragment size distribution of the product was determined by 1% agarose gel electrophoresis, and the product concentration was measured with Qubit. The fragment size distribution should be 300-1000 bp, and the concentration should be higher than 3 ng/μL.

[0039] 11. Data analysis workflow: For the demultiplexed FASTQ files, adapter sequences were trimmed using fastp with the parameters "-gq 5 -u 50 -n 15 -l 150 --overlap_diff_limit 1 --overlap_diff_percent_limit 10" to obtain clean reads. BWA index was used to build an index of the coral reference genome, and BWA-MEM was used to align the clean reads to the reference genome, producing a SAM file. SAMtools was used to convert the SAM file to BAM and remove low-quality alignments, with the parameters "-q 10 -F 4 -F 256 -f 64". PCR duplicates were removed using Picard (`java -jar picard.jar`). A custom AWK script was then used to parse the BAM file and obtain the position of the first nucleotide at the 5' end of the P5-end reads (if the read aligns to the "+" strand, the DSB site is the alignment start position; if it aligns to the "-" strand, the DSB site is the alignment end position), outputting a BED file.
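The SAMtools filter in this step selects alignments by SAM flag bits and mapping quality. The Python sketch below shows the intended selection on individual flag values, assuming the options act together (exclude unmapped reads, flag 4; exclude secondary alignments, flag 256; require the first read of a pair, flag 64; require MAPQ of at least 10). The function names are hypothetical, and the sketch does not reproduce the SAMtools command line itself.

```python
UNMAPPED = 0x4         # -F 4: drop unmapped reads
SECONDARY = 0x100      # -F 256: drop secondary alignments
FIRST_IN_PAIR = 0x40   # -f 64: keep read 1 only
REVERSE = 0x10         # read aligned to the "-" strand

def keep_alignment(flag, mapq, min_mapq=10):
    """True if an alignment passes the flag and mapping-quality filter."""
    if flag & (UNMAPPED | SECONDARY):
        return False
    if not flag & FIRST_IN_PAIR:
        return False
    return mapq >= min_mapq

def strand_of(flag):
    """Return "+" or "-" from the reverse-strand flag bit."""
    return "-" if flag & REVERSE else "+"

# (flag, MAPQ) examples: read 1 forward, read 2 reverse, secondary, low MAPQ, unmapped.
examples = [(99, 60), (147, 60), (355, 60), (83, 5), (77, 0)]
kept = [keep_alignment(flag, mapq) for flag, mapq in examples]
```

Only the first example passes. The strand returned by `strand_of` then decides whether the alignment start or the alignment end is reported as the DSB site.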

[0040] 12. Prediction and analysis of EcoRV-HF in silico restriction sites: The EcoRV-HF recognition sequence is GAT^ATC ("^" indicates that EcoRV-HF cleaves GATATC between T and A). SeqKit locate was used to predict the restriction sites in the coral genome; bedtools intersect was used to calculate the overlap rate between the detected DSBs and the predicted in silico EcoRV-HF restriction sites. The capture efficiency was also compared with that of the sBLISS method.
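The in silico prediction and overlap calculation can be sketched in plain Python as a stand-in for SeqKit and bedtools, which are not called here. The function names, the 0-based coordinate convention, and the toy sequence are assumptions for illustration; the recognition sequence GATATC and the cut between T and A come from the text.

```python
def ecorv_cut_sites(sequence, site="GATATC", cut_offset=3):
    """0-based index of the first base after each GAT^ATC cut in `sequence`."""
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    return cuts

def detection_rate(predicted, detected):
    """Fraction of predicted cut positions that were also detected."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(detected)) / len(predicted)

toy_genome = "AAGATATCGGGGATATCTT"   # two GATATC sites, made up for the example
predicted = ecorv_cut_sites(toy_genome)
rate = detection_rate(predicted, detected=[5])
```

Because GATATC is its own reverse complement, scanning one strand finds every site. In the toy sequence the predicted cuts are at indices 5 and 14, and detecting one of the two gives a rate of 0.5.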

[0041] 13. Detection and analysis of endogenous DSBs on the genome: The DSB position BED files were analyzed with HOMER to identify DSB site motifs; bedtools intersect was used to compare the DSBs detected on specific motifs by the method of this invention with those detected by sBLISS.

[0042] 14. Results: The construction of the single-base precision DNA double-strand break library by the method of this invention and its application in sequencing proceed through five stages (as shown in Figure 1). Stage 1 is the extraction of genomic DNA; in Stage 2, chemically modified P5 adapters carrying an index (used to distinguish different samples during sequencing, typically a combination of 6-8 bases) are ligated to mark the DSB ends; Stage 3 is the ultrasonic fragmentation of the genomic DNA into 300-500 bp fragments; in Stage 4, chemically modified P7 adapters lacking PCR primer binding sites are ligated to the other end of the DSB-bearing fragments; Stage 5 is the PCR enrichment and amplification of the DSB-bearing fragments. The method of this invention allows DSB ends to be detected even when they are wrapped by proteins. Current library construction techniques such as sBLISS require in vitro transcription followed by reverse transcription, or avidin magnetic bead enrichment, to enrich DSB-bearing fragments. The improved P5 and P7 adapter combination used in this invention significantly reduces the steps, costs, and operation time required for target fragment enrichment. The method provided by this invention is particularly suitable for the detection of non-random DSB sites, such as those produced by enzyme digestion.

[0043] To demonstrate the effectiveness of the method described in this invention, the coral genome was first digested with the high-fidelity restriction endonuclease EcoRV-HF (NEB, R3195V), and the DSBs generated by EcoRV-HF digestion were detected using the DSBclear-seq method of this invention and the known detection method sBLISS. As shown in Figure 2a, the method of this invention can sensitively detect DSBs generated by EcoRV-HF digestion. As shown in Figure 2b, the method detects the blunt-end cleavage pattern of EcoRV-HF, with the forward and reverse reads aligned at the known cleavage sites. This confirms that the method can accurately detect DSB ends with single-base precision. The significantly increased level of genome fragmentation after EcoRV-HF treatment indicates a broad dynamic detection range for the method. As shown in Figure 3, the method of this invention detected 90.9% of the EcoRV-HF restriction sites, while sBLISS detected 70.6%, demonstrating that the method detects more sites and has higher sensitivity. Furthermore, compared with methods such as sBLISS, DSBCapture, and END-seq, the method of this invention has lower reagent costs and a shorter experimental cycle, making it more economical and effective.

[0044] After its effectiveness was confirmed on EcoRV-HF-digested genomes, the method of this invention was applied to detect endogenous genomic DSBs. As shown in Figure 4, the sBLISS detection method (which detects DSBs in situ within the cell nucleus) and DSBclear-seq exhibit consistent DSB read distribution characteristics. As shown in Figures 5a and 5b, the method of this invention can be applied to the detection of endogenous low-level DSB sites, and it can effectively shield against exogenous DSB noise introduced during genome extraction by retaining only DSB sites with a read coverage of ≥5. As shown in Figures 5c and 5d, the method of the present invention shows a high overlap of 97.1% with endogenous DSBs on motifs at high DSB levels.

[0045] In summary, this invention provides DSBclear-seq, a novel high-throughput method for detecting DNA double-strand breaks with single-base precision. Comparison with sBLISS shows that the method can detect not only restriction endonuclease-induced DSBs and low levels of endogenous DSBs, but also protein-covered DSB ends. The method is cost-effective, simple to operate, and has a short experimental cycle, which makes it significant for promoting large-scale, low-cost DNA double-strand break sequencing.

[0046] The embodiments described above are merely preferred embodiments of the present invention and are not intended to limit the scope of the present invention. Various modifications and improvements made by those skilled in the art to the technical solutions of the present invention without departing from the spirit of the present invention should fall within the protection scope defined by the claims of the present invention.

Claims

1. A method for constructing a single-base-precision DNA double-strand break sequencing library, characterized in that it includes the following steps: 1) extracting high-molecular-weight genomic DNA from cell or tissue samples, and then ligating P5 adapters to the DSB ends present in the genome; 2) using ultrasound to fragment the P5-adapter-ligated genome into 300-500 bp fragments, and then ligating the semi-functional P7 adapter; 3) obtaining the DSB sequencing library by enriching the target sequences with PCR.

2. The construction method of claim 1, wherein the P5 adapter ligation described in step 1) includes the following steps: removing the 5' and 3' overhangs of the free DNA ends to form blunt ends; then adding an A tail to the 3' end of the blunt-end DNA; ligating the DNA end bearing the 3' A overhang to the adapter bearing a 3' T overhang; and performing magnetic bead fragment size selection to remove excess adapters.

3. The construction method of claim 1, wherein the P5 adapter described in step 1) has a C3 spacer arm. The P5 adapter includes a first strand and a second strand. The nucleotide sequence of the first strand of the P5 adapter is shown in SEQ ID NO.1, and the nucleotide sequence of the second strand of the P5 adapter is shown in SEQ ID NO.2. The 5' end and 3' end of the second strand of the P5 adapter are modified with 5' phosphorylation and a 3' spacer phosphoramidite, respectively.

4. The construction method of claim 1, wherein the ligation of the semi-functional P7 adapter described in step 2) includes the following steps: removing the 5' and 3' overhangs of the free DNA ends to form blunt ends; then adding an A tail to the 3' end of the blunt-end DNA; ligating the DNA end bearing the 3' A overhang to the adapter bearing a 3' T overhang; and performing magnetic bead fragment size selection to remove excess adapters.

5. The construction method of claim 1, wherein the semi-functional P7 adapter described in step 2) includes a first strand and a second strand. The nucleotide sequence of the first strand of the semi-functional P7 adapter is shown in SEQ ID NO.3, and the nucleotide sequence of the second strand of the semi-functional P7 adapter is shown in SEQ ID NO.4. The 5' end and 3' end of the second strand of the semi-functional P7 adapter are modified with 5' phosphorylation and a 3' spacer phosphoramidite, respectively.

6. The construction method of claim 1, wherein the target sequence mentioned in step 3) is a sequence with one end ligated to the P5 adapter and the other end ligated to the semi-functional P7 adapter.

7. The single-base precision DNA double-strand break sequencing library constructed by the construction method according to any one of claims 1 to 6.

8. The application of the single-base precision DNA double-strand break sequencing library constructed by the construction method according to any one of claims 1 to 6 in DNA double-strand break sequencing.

9. The application according to claim 8, characterized in that the DNA double-strand break sequencing includes the following steps: the single-base precision DNA double-strand break sequencing library is sequenced on the Illumina sequencing platform to obtain raw paired-end reads, and the positions of DSBs are determined from the sequencing results.

10. The application according to claim 9, characterized in that the method for determining the positions of DSBs from the sequencing results includes the following steps: screening the raw paired-end reads: trimming the sequencing adapters from read 1 and read 2, retaining pairs in which both reads align to the same chromosome of the genome, and removing PCR duplicates; obtaining the position and read coverage of each DSB in the genome: retaining only the alignment information of read 1; if read 1 aligns to the positive strand of the genome, the position of the first base of the sequence is taken as the DSB site; if read 1 aligns to the negative strand of the genome, the position of the last base of the sequence is taken as the DSB site; at the same time, the number of reads aligned to this DSB site is recorded as the read coverage of the DSB site; retaining high-confidence DSB sites: retaining only DSB sites with a read coverage of ≥5, thereby effectively shielding against random DSBs introduced during library construction.
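The site-calling logic of claim 10 can be sketched as follows. This is a minimal illustration, not the applicant's implementation. It assumes the reads have already been adapter-trimmed, aligned, restricted to pairs on the same chromosome, and deduplicated. The `Alignment` record, the function name, and the 1-based inclusive coordinates are hypothetical. The "last base" of a negative-strand read is interpreted here as the rightmost aligned reference coordinate, which is also an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class Alignment(NamedTuple):
    """Hypothetical, simplified alignment record (not a real library type)."""
    chrom: str
    start: int       # 1-based leftmost aligned reference position
    end: int         # 1-based rightmost aligned reference position (inclusive)
    strand: str      # "+" or "-"
    is_read1: bool


def call_dsb_sites(alignments: Iterable[Alignment],
                   min_coverage: int = 5) -> Dict[Tuple[str, int], int]:
    """Return {(chrom, position): read coverage} for high-confidence DSB sites."""
    coverage: Counter = Counter()
    for aln in alignments:
        if not aln.is_read1:
            continue                      # only read 1 marks the DSB end
        # Positive strand: first aligned base; negative strand: last aligned base.
        site = aln.start if aln.strand == "+" else aln.end
        coverage[(aln.chrom, site)] += 1
    # Keep only sites whose read coverage reaches the threshold (>= 5 in the claim).
    return {key: n for key, n in coverage.items() if n >= min_coverage}


if __name__ == "__main__":
    toy = (
        [Alignment("chr1", 100, 249, "+", True)] * 5     # 5 reads starting at 100
        + [Alignment("chr1", 20, 99, "-", True)] * 3     # 3 reads ending at 99
        + [Alignment("chr1", 300, 449, "-", False)] * 9  # read 2, ignored
    )
    print(call_dsb_sites(toy))
```

With real data, the records would typically be derived from a BAM file; the threshold is kept as a parameter so that it can be compared against other cut-offs.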