Method and application for constructing a cell-free DNA methylation sequencing library

The method repairs single-strand gaps in cell-free DNA using DNA ligase before enzymatic conversion, addressing low starting amounts and short fragments, ensuring accurate methylation detection and preserving fragmentomics information.

JP7880983B2 · Active · Publication Date: 2026-06-26 · CHANGPING NAT LAB

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Patents
Current Assignee / Owner
CHANGPING NAT LAB
Filing Date
2022-07-02
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Current methods for constructing DNA methylation sequencing libraries face challenges with low starting amounts, short fragments, and severe gaps in cell-free DNA, leading to incomplete library construction and inaccurate methylation detection.

Method used

A method that uses DNA ligase to repair single-strand gaps, then enzymatic conversion to distinguish methylation states, with carrier DNA added to protect low amounts of DNA, so that the library is constructed with its fragmentomics information intact.

Benefits of technology

Enables high-throughput sequencing from low starting DNA amounts while preserving methylation and fragmentomics information and reducing methylation loss, so that DNA methylation is detected accurately.

✦ Generated by Eureka AI based on patent content.


Abstract

The present invention relates to a method for constructing a cell-free DNA methylation sequencing library, and applications thereof, which can detect DNA methylation and fragmentomics simultaneously. In the method of the present invention, when a sequencing library is constructed, phosphodiester bond gaps in the DNA double strand are first repaired, a linker sequence is then ligated to the double-stranded DNA, and a DNA methylation sequencing library is then constructed by an enzymatic conversion method. The method maintains the integrity of the DNA double strand, prevents DNA strand breakage, ensures that the fragmentomics information in the sequencing library is not damaged, and prevents DNA methylation downstream of a gap from being erased during end repair, so that accurate DNA methylation information can be obtained.

Description

Technical Field

[0001] This application relates to biotechnology, in particular to a method for the epigenetic analysis of cell-free DNA.

Background Art

[0002] Cell-free DNA (cfDNA) generally refers to DNA fragments released from cells into body fluids or the surrounding environment during cell metabolism, exocytosis, apoptosis, or necrosis. Materials such as plasma, serum, urine, saliva, and amniotic fluid contain trace amounts of cell-free DNA, which reflects information such as genetic mutations, disease, and aging in the body. Extraction, library construction, sequencing, and analysis of cell-free DNA, as an effective non-invasive or minimally invasive detection method, are widely applied in clinical diagnosis and scientific research, for example in cancer screening and diagnosis, prenatal diagnosis, preimplantation diagnosis, and aging research. Epigenetic information carried by cell-free DNA, such as DNA methylomic/hydroxymethylomic features and fragmentomic features, is increasingly recognized as a source of useful molecular markers for the detection, diagnosis, and/or monitoring of various diseases such as cancer.

[0003] DNA methylation is a chemical modification of DNA in living organisms. In essence, a methyl group replaces the hydrogen atom at carbon 5 of the pyrimidine ring of cytosine, forming 5-methylcytosine. Without altering the DNA sequence, such modifications affect many biological processes, including gene expression and regulation, cell division and differentiation, and cell metabolism and growth; they are thus epigenetic mechanisms that influence growth and development, metabolism and reproduction, disease onset and progression, and aging and death. To analyze DNA methylation by high-throughput sequencing (next-generation sequencing, NGS), either methylated or unmethylated cytosine must first be selectively converted to another base so that the two states can be distinguished.

[0004] Fragmentomics information is the set of information obtained by analyzing the fragmentation characteristics of cell-free DNA, including the distribution of cell-free DNA across the genome, fragment length and start/end positions, the bases at fragment ends, and the jaggedness (non-blunt ends) of fragments. Studies have shown that cell-free DNA tends to be cleaved at particular genomic regions or elements, so fragmentomics information is an important type of epigenomic information that can reflect chromatin openness, nucleosome occupancy, transcription factor binding, and epigenetic modifications, and is also a useful molecular marker for monitoring and diagnosing various diseases such as cancer [1]. Obtaining fragmentomics information by high-throughput sequencing requires that the integrity of the DNA fragments be maintained.

[0005] However, because cell-free DNA is produced by extensive enzymatic degradation both inside and outside cells, it has several characteristics that make its detection difficult. 1. Low cfDNA content (e.g., only a few ng of cfDNA per mL of plasma). Detection methods must therefore handle starting amounts of 1 ng or less, and higher library construction efficiency is needed to improve detection sensitivity. 2. Short cfDNA fragments (approximately 100 to 200 bp). First, short fragments are easily lost during the purification steps of library construction and cannot withstand the additional breakage caused by chemical conversion. Second, whole-genome library construction methods based on transposases, random primers, ligation, or cleavage cannot be applied to short fragments, and targeted amplification methods such as targeted PCR also have difficulties and limitations. 3. Severe chemical and structural damage. cfDNA damage can arise from many processes, including DNA damage related to the physiological and pathological state of cells, nuclease digestion during cell death, persistent attack by active substances such as free nucleases in body fluids (e.g., the circulation), and long-term storage in solutions such as plasma and repeated freeze-thaw cycles. Studies have found that 97–98% of cfDNA molecules contain gaps, i.e., single-strand phosphodiester bond breaks in the DNA double helix [2]. In single-stranded library methods, the double strand is denatured and the strands separate, so cfDNA breaks at these gaps and generates shorter fragments [3].
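Point 3 above notes that nicked double-stranded cfDNA falls apart once its strands are separated. A minimal sketch, assuming a fragment is described only by its length and hypothetical lists of nick positions on each strand (the function names are ours, not the patent's), shows how one duplex yields several shorter single-stranded pieces:

```python
from typing import Dict, List


def single_strand_pieces(length: int, nicks: List[int]) -> List[int]:
    """Lengths of the pieces one strand falls into once it is no longer base-paired.

    `length` is the strand length in nucleotides. Each entry in `nicks` is the
    index i (0 < i < length), counted along that strand 5'->3', of a broken
    phosphodiester bond between nucleotides i-1 and i; out-of-range or
    duplicate entries are ignored.
    """
    cuts = sorted({n for n in nicks if 0 < n < length})
    bounds = [0] + cuts + [length]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def denature(length: int, top_nicks: List[int], bottom_nicks: List[int]) -> Dict[str, List[int]]:
    """A nicked duplex stays one molecule while paired; after denaturation
    each strand is released as separate pieces split at its own nicks."""
    return {
        "top": single_strand_pieces(length, top_nicks),
        "bottom": single_strand_pieces(length, bottom_nicks),
    }


# Hypothetical 167 bp fragment with one nick on each strand.
print(denature(167, top_nicks=[60], bottom_nicks=[120]))
# {'top': [60, 107], 'bottom': [120, 47]}
```

No single-stranded piece keeps the original 167-nt length, which illustrates why a single-stranded library cannot report the true fragment length. The sketch ignores damaged or modified ends at the nicks.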

[0006] Existing methods for constructing DNA methylation sequencing libraries fall into four categories: (1) double-stranded library construction using bisulfite conversion [4]; (2) single-stranded library construction using bisulfite conversion (e.g., the Swift Biosciences Accel-NGS® Methyl-Seq DNA Library Kit, Cat. No. 30024); (3) double-stranded library construction using enzymatic conversion (e.g., the EM-seq technology of New England Biolabs, Inc. (WO2017075436A1) and its commercial kit E7120, the TAPS technology of WO2019136413A1, and the Cabernet technology of WO2021077415); and (4) single-stranded library construction using enzymatic conversion (e.g., CN114032287A). However, none of these four approaches fully resolves the difficulties of detecting cell-free DNA methylation.

[0007] The widely used bisulfite conversion method for detecting DNA methylation requires reacting DNA for several hours under harsh conditions: high salt (9 M), high acidity (pH 5), and high temperature (50–90°C). As a result, the DNA denatures and breaks, and most of it is degraded and lost during the conversion reaction. Because of this extensive breakage and loss, bisulfite conversion cannot meet the requirements of library construction from low starting amounts of cfDNA. It also aggravates the fragmentation and gap damage of cfDNA, making it difficult to develop a highly sensitive cell-free DNA methylation assay based on bisulfite conversion.

[0008] In recent years, enzymatic DNA methylation sequencing technologies such as EM-seq, TAPS, and Cabernet have been developed and have matured. Enzymatic conversion avoids harsh chemical treatment and the resulting cleavage and degradation of DNA, and offers higher library construction efficiency, sensitivity, and library uniformity. However, none of the existing patents and products based on enzymatic conversion solves all the problems of cell-free DNA methylation detection. Regarding low starting amounts of cfDNA, both EM-seq (WO2017075436A1) and TAPS (WO2019136413A1) require a minimum input of 5–10 ng, which precludes highly sensitive sequencing of cell-free DNA. Cabernet (WO2021077415) enables library construction from low DNA input, even from single cells, but because it relies on transposases it cannot be applied to short cell-free DNA fragments, and in particular it offers no solution to the cfDNA gap problem.

[0009] Regarding chemical damage in cfDNA: in conventional sequencing that does not detect methylation, libraries are built from double-stranded cfDNA, so gaps are filled during end repair, do not affect the sequencing result, and have not been considered important. In conventional bisulfite methylation sequencing, libraries are built from single-stranded cfDNA, so cfDNA breaks at gaps and is lost, but this does not introduce errors into the methylation data themselves, so the issue has likewise not been considered important. Conventional libraries built with the newer enzymatic conversion methods are double-stranded, and in these, severe methylation erasure is indeed observed, causing significant methylation loss. For example, methylation sequencing data from a double-stranded library reported by Erger et al. [5] show the average methylation level falling steadily from approximately 80% at the start of the reads to 40%, with extensive methylation erasure also observed in read 2, but the authors did not discuss this anomaly in their paper. In the DNA methylation sequencing library and its construction and detection method disclosed by Takeishi Biological Laboratory (CN114032287A), the authors observed that the methylation level decreased along the library reads. They attributed this simply to loss of the terminal single strands and tried to avoid the problem with a single-stranded library construction method. This did not actually solve the problem: breakage at gaps in the single-stranded state prevents accurate acquisition of fragmentomics information, and the low efficiency of single-stranded ligation results in low library yield. The impact of single-strand gaps on DNA methylation detection has therefore remained unrecognized and unresolved in the field.
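The position-dependent decline in methylation described above (and plotted in Figure 4) can be checked by averaging CpG methylation calls by their position along the read, often called an M-bias profile. A minimal sketch, assuming each read has already been reduced to a hypothetical list of (position_in_read, is_methylated) CpG calls; the toy data are illustrative, not from the patent:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

CpGCall = Tuple[int, bool]  # (0-based position within the read, methylated?)


def methylation_by_position(reads: Iterable[List[CpGCall]]) -> Dict[int, float]:
    """Mean CpG methylation rate at each read position (an M-bias profile)."""
    methylated: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for calls in reads:
        for pos, is_meth in calls:
            total[pos] += 1
            methylated[pos] += int(is_meth)
    return {pos: methylated[pos] / total[pos] for pos in sorted(total)}


# Toy reads: calls near the 3' end are mostly unmethylated,
# mimicking erasure downstream of an unrepaired gap.
reads = [
    [(5, True), (50, True), (120, False)],
    [(5, True), (50, False), (120, False)],
    [(5, True), (50, True), (120, True)],
    [(5, False), (50, True), (120, False)],
]
print(methylation_by_position(reads))  # {5: 0.75, 50: 0.75, 120: 0.25}
```

A flat profile indicates no position bias; a steady drop toward the 3' end of the reads is the artifact the patent attributes to unrepaired gaps.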

[0010] Therefore, no current library construction method comprehensively solves the problems of low template input, short fragments, and severe gaps in cell-free DNA methylation detection. As a result, a complete DNA library cannot be obtained, and cell-free DNA methylation cannot be measured accurately together with fragmentomics information.

Summary of the Invention

[0011] The inventors have diligently studied the characteristics of cfDNA and the principles of library construction methods, analyzed the causes of problems in the cfDNA methylation detection process, and, through a series of targeted improvements, have succeeded in providing a novel cfDNA methylation sequencing method.

[0012] In one aspect, the present application provides a method for pretreating a biological sample for sequencing library construction, comprising S1: repairing single-strand gaps in DNA fragments contained in the sample using DNA ligase; and S2: optionally, filling in terminal gaps of the DNA fragments using DNA polymerase; wherein, after sequencing, the sequencing library is used to obtain DNA methylation information data, or DNA methylation information data and fragmentomics information data, of the biological sample.

[0013] In some embodiments, the length of the DNA fragment is 50 bp to 1000 bp, preferably 100 bp to 500 bp, and more preferably 100 bp to 350 bp.

[0014] In some embodiments, the method further includes, prior to step S1, fragmenting the DNA molecules contained in the biological sample to generate the DNA fragments.

[0015] In some embodiments, the ligase has 3',5'-phosphodiester bond catalytic activity.

[0016] In some embodiments, the ligase is selected from HiFi-Taq ligase, T4 DNA ligase, Taq DNA ligase, E. coli DNA ligase, and T7 DNA ligase.

[0017] In some embodiments, the biological sample is selected from animal body fluids, cell culture media, or biological samples from nature, and includes, but is not limited to, peripheral blood, plasma, serum, urine, feces, saliva, cerebrospinal fluid, lymph, alveolar lavage fluid, amniotic fluid, cleavage fluid, cell culture media, embryo culture media, microbial culture media, soil leachate, and bone meal leachate.

[0018] In some embodiments, the DNA fragments include, but are not limited to, cell-free DNA, extranuclear-derived free DNA, and fragmented DNA, and include, for example, cell-free DNA derived from cancer subjects.

[0019] In some embodiments, the DNA fragment content in the sample is 10 ng or less, or 1 ng or less, for example, 0.1 ng.

[0020] In some embodiments, the DNA methylation includes, but is not limited to, cytosine methylation and/or hydroxymethylation.

[0021] In another aspect, the present invention provides a method for constructing a sequencing library, comprising: 1) providing a biological sample and pretreating it by the above pretreatment method to obtain DNA fragments without 3',5'-phosphodiester bond gaps; and 2) constructing the sequencing library from the DNA fragments without 3',5'-phosphodiester bond gaps; wherein, after sequencing, the sequencing library is used to obtain DNA methylation information data, or DNA methylation information data and fragmentomics information data, of the biological sample.

[0022] In some embodiments, step 2) further includes: a) blunting both ends of the DNA fragments and adding a linker sequence, preferably one in which some or all of the cytosines are methylated and/or hydroxymethylated cytosines; and b) converting unmethylated cytosine in the DNA fragments to uracil by an enzymatic conversion method, wherein the uracil is read as thymine in subsequent amplification and sequencing and the methylated cytosine is read as cytosine.

[0023] In some embodiments, step 2) further includes: a) blunting both ends of the DNA fragments and adding a linker sequence; and b) converting methylated cytosine in the DNA fragments to uracil or dihydrouracil by an enzymatic conversion method, wherein the uracil or dihydrouracil is read as thymine in subsequent amplification and sequencing and the unmethylated cytosine is read as cytosine.

[0024] In some embodiments, the EM-seq conversion method or the TAPS conversion method is used as the enzymatic conversion method.

[0025] In some embodiments, the method further includes adding carrier DNA to the biological sample after step a).

[0026] In some embodiments, the carrier DNA may be any DNA that does not contain the linker sequence. Preferably, the carrier DNA fragments are 100–500 bp in size.

[0027] In another aspect, the present application provides a sequencing library obtained by the above method.

[0028] In another aspect, the present application provides a method for identifying the positions of single-strand gaps in DNA fragments contained in a biological sample, comprising: 1) dividing the biological sample into two parts, biological sample A and biological sample B; 2) preparing a first sequencing library by processing biological sample A with the above pretreatment method, and preparing a second sequencing library by processing biological sample B with the same pretreatment method but without DNA ligase; and 3) identifying the positions of the single-strand gaps from the differences in DNA methylation information between the first and second sequencing libraries.

[0029] In another aspect, the present application provides a method for identifying the health status of a subject, comprising: 1) providing a biological sample derived from the subject; 2) constructing a sequencing library using the above pretreatment method; and 3) sequencing the library directly, or after enrichment by hybrid capture; wherein DNA methylation information data, or DNA methylation information data and fragmentomics information data, of the biological sample are obtained from the sequencing data and compared with normal DNA methylation information data and/or fragmentomics information data in a population to identify the health status of the subject.
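The comparison step is not specified further here. One simple way to compare a subject against a healthy reference population is a per-region z-score; the sketch below assumes that approach, with hypothetical region names and values, and is not the patent's classifier:

```python
from statistics import mean, stdev
from typing import Dict, List


def region_z_scores(subject: Dict[str, float],
                    reference: Dict[str, List[float]]) -> Dict[str, float]:
    """Per-region z-score of a subject's methylation level against a reference cohort.

    z = (subject value - cohort mean) / cohort sample SD. Regions with fewer
    than two reference values, or zero reference SD, are skipped.
    """
    scores = {}
    for region, value in subject.items():
        ref = reference.get(region, [])
        if len(ref) < 2:
            continue
        sd = stdev(ref)
        if sd == 0:
            continue
        scores[region] = (value - mean(ref)) / sd
    return scores


# Hypothetical healthy-cohort methylation levels for two regions.
reference = {"region_A": [0.1, 0.2, 0.3], "region_B": [0.5, 0.5, 0.5]}
subject = {"region_A": 0.5, "region_B": 0.6, "region_C": 0.4}
print({k: round(v, 2) for k, v in region_z_scores(subject, reference).items()})
# {'region_A': 3.0}
```

In practice, per-region scores from methylation and fragmentomics features would be combined by a trained model; that step is outside this sketch.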

[0030] The method provided herein may be used for epigenetic analysis of cell-free DNA. The method provided herein may also be used in the fields of genomics, medicine, diagnostics, and epigenetics research.

Brief Description of the Drawings

[0031] [Figure 1] Flowchart of the methylation library construction method provided by this application.
[Figure 2] Flowchart of the EM-seq conversion method.
[Figure 3] Flowchart of the TAPS conversion method.
[Figure 4] Library methylation bias in human peripheral blood cell-free DNA methylation libraries constructed with DNA ligase treatment (solid line) and without it (dashed line). The vertical axis shows the mean CpG methylation rate, and the horizontal axis the position along each library fragment in the 5' to 3' direction.
[Figure 5] Periodic distribution of cell-free DNA fragments, with peaks recurring at intervals of approximately 170 bp.
[Figure 6] Distribution of library fragments produced by different library construction methods. Capillary gel electrophoresis was performed with the same amount of library DNA input, and the distribution of DNA fragments from 1 to 6000 bp was analyzed. The library produced by the method of the present invention (upper panel) shows periodic peaks matching the characteristics of peripheral blood cell-free DNA. In contrast, the library from the Swift Biosciences Accel-NGS® Methyl-Seq DNA Library Kit (lower panel) has only one main peak, lacks periodic peaks, and contains many non-library impurity peaks.

Modes for Carrying Out the Invention

[0032] Unless otherwise specified, all technical and scientific terms used in this application have meanings that are generally understood by those skilled in the art.

[0033] In this specification, "DNA ligase" refers to an enzyme that repairs single-strand gaps (nicks) in double-stranded DNA molecules. DNA ligases typically have 3',5'-phosphodiester bond catalytic activity. For example, such a DNA ligase can catalyze the formation of a phosphodiester bond between the 5' phosphate group and the 3' hydroxyl group in the gap. Generally, DNA ligases do not add new nucleotides to the gap.

[0034] In this specification, "DNA polymerase" refers to an enzyme that, in the presence of a primer, synthesizes a complementary DNA strand by adding dNTPs (deoxyribonucleoside triphosphates) to the 3' end of the primer, using a single DNA strand as the template. Many DNA polymerases also have 3'-5' exonuclease activity, which proofreads during synthesis, and/or 5'-3' exonuclease activity, which performs excision repair.

[0035] In this specification, "DNA fragment" refers to a short fragment of DNA, for example, having a length of 50 bp to 700 bp, for example, 100 bp to 500 bp, and particularly 100 bp to 350 bp. Since DNA fragments contained in biological samples are usually heterogeneous, i.e., not of the same length, the above lengths accordingly refer to the average length of these DNA fragments. These DNA fragments may have different sequences, for example, originating from different regions of the genome of the same organism, or from different organisms. DNA fragments may have single-strand gaps and may have blunt ends or non-blunt ends (3' or 5' overhangs).

[0036] In this specification, "blunt-ended DNA fragment" refers to a double-stranded DNA fragment that does not have 3' or 5' overhangs at its ends.

[0037] In this specification, "DNA methylation" refers to the modification of cytosine bases in a DNA molecule or DNA fragment to 5-methylcytosine (5mC). In vertebrates, DNA methylation generally occurs at CpG sites (i.e., sites in the DNA sequence where a cytosine is immediately followed by a guanine) and is catalyzed by DNA methyltransferases, which convert cytosine to 5-methylcytosine. While many CpG sites in the human genome are methylated, certain regions, such as cytosine (C)- and guanine (G)-rich CpG islands, are normally unmethylated. CpG methylation affects the transcriptional activity of associated genes; for example, methylation can repress tumor suppressor genes, and demethylation can activate certain oncogenes, either of which may contribute to cancer development. In addition, a smaller number of cytosine bases, occurring at lower frequency than 5mC, may be modified to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), or 5-carboxylcytosine (5caC). In this specification, "methylation" may also refer to modification to 5hmC, 5fC, or 5caC unless the context indicates otherwise.

[0038] In this specification, “DNA methylation information” refers to information on the methylation status of a DNA molecule or DNA fragment, including but not limited to methylation sites, methylation levels, and methylation type (5mC or 5hmC). “Methylation level”, also called “degree of methylation”, means the proportion (or frequency) of a particular methylation site in a sample that is methylated. Whether a site is methylated can be detected by various methods. Conventional methods use chemical or enzymatic conversion, in which either methylated or unmethylated cytosine is converted to uracil (U) or to a base that pairs like uracil (e.g., dihydrouracil, DHU). During subsequent amplification, this base pairs with adenine (A) and is copied as thymine (T), so the converted cytosine (unmethylated or methylated, depending on the method) appears as thymine in the detection result (e.g., the sequencing result). Comparison with a reference sequence then shows whether each cytosine in the DNA molecule or DNA fragment was methylated. The reference sequence may be an unconverted sequence from the same sample or the corresponding sequence in a healthy population. As described later, 5mC and 5hmC can also be distinguished in several ways. DNA methylation information is currently widely used for cancer screening and diagnosis (e.g., lung, breast, liver, and colorectal cancer), particularly early screening and diagnosis.
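For an EM-seq-style readout, the comparison with a reference sequence described above reduces to counting C versus T at each reference CpG. A minimal sketch, assuming reads are already aligned to the forward strand starting at position 0 with no indels (a strong simplification); the function name and data are hypothetical:

```python
from typing import Dict, List


def call_cpg_levels(reference: str, reads: List[str]) -> Dict[int, float]:
    """Methylation level at each reference CpG under an EM-seq-style readout.

    Assumes every read is aligned to `reference` from position 0 with no
    indels, forward strand only. At a CpG cytosine, a read 'C' counts as
    methylated (protected from conversion) and a read 'T' as unmethylated
    (converted); any other base is ignored.
    """
    levels: Dict[int, float] = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] != "CG":
            continue
        meth = unmeth = 0
        for read in reads:
            if i >= len(read):
                continue
            if read[i] == "C":
                meth += 1
            elif read[i] == "T":
                unmeth += 1
        if meth + unmeth:
            levels[i] = meth / (meth + unmeth)
    return levels


ref = "ACGTTCGA"
reads = ["ACGTTTGA", "ATGTTCGA", "ACGTTCGA", "ACGTTTGA"]
print(call_cpg_levels(ref, reads))  # {1: 0.75, 5: 0.5}
```

For a TAPS-style readout the interpretation is inverted: a T at a reference CpG indicates a modified cytosine and a C an unmodified one.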

[0039] In this specification, "fragmentomics information" refers to the collection of information obtained by analyzing the fragmentation characteristics of cell-free DNA in a sample, including, for example, the distribution of cell-free DNA across the genome, fragment length and start/end positions, the bases at fragment ends, and the jaggedness (non-blunt ends) of fragments. Several recent studies have reported using this information for cancer screening (e.g., lung cancer).
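Several of the fragmentomics features listed above follow directly from fragment coordinates and sequences. A minimal sketch, assuming each fragment is given as a hypothetical (chrom, start, end, sequence) tuple with 0-based half-open coordinates; real pipelines would derive these from paired-end alignments, which is not shown:

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical record layout: (chrom, start, end, sequence), 0-based half-open.
Fragment = Tuple[str, int, int, str]


def fragment_features(fragments: Iterable[Fragment],
                      motif_len: int = 4) -> Tuple[Counter, Counter]:
    """Return (fragment-length counts, 5'-end motif counts).

    Length is end - start. The end motif is the first `motif_len` bases of
    the sequence as given, i.e. the 5' end of one strand only.
    """
    lengths: Counter = Counter()
    motifs: Counter = Counter()
    for _chrom, start, end, seq in fragments:
        lengths[end - start] += 1
        if len(seq) >= motif_len:
            motifs[seq[:motif_len]] += 1
    return lengths, motifs


frags = [
    ("chr1", 100, 267, "CCCA" + "A" * 163),
    ("chr1", 500, 667, "CCCA" + "G" * 163),
    ("chr2", 10, 176, "TGGA" + "T" * 162),
]
lengths, motifs = fragment_features(frags)
print(lengths.most_common(1))  # [(167, 2)]
print(motifs.most_common(1))   # [('CCCA', 2)]
```

Jaggedness is not computed here because it needs strand-specific end information that this simplified record does not carry.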

[0040] In this specification, "enzymatic conversion" refers to the modification of cytosine or methylated cytosine, catalyzed by specific enzymes, so that methylated and unmethylated states, or 5mC and 5hmC, can be distinguished in subsequent detection. Several enzymatic conversion methods are known in the art, including but not limited to EM-seq conversion and TAPS conversion. Enzymatic conversion methods typically combine these enzymes with chemical reagents.

[0041] In this specification, “EM-seq conversion” refers to a technique developed by New England Biolabs for distinguishing methylated from unmethylated cytosine by enzymatic conversion. 5mC and 5hmC are modified to 5-carboxycytosine (5caC) by TET dioxygenases (including TET1, TET2, and TET3), and unmethylated cytosine is then deaminated to uracil by a cytidine deaminase, for example a member of the APOBEC protein family such as APOBEC3A. Using a DNA glycosyltransferase (GT) additionally allows 5mC and 5hmC to be distinguished. A scheme of the EM-seq conversion method is shown in Figure 2. For details of the EM-seq conversion method, see PCT application publication WO2017075436A, which is incorporated herein by reference in its entirety.

[0042] The TAPS (TET-assisted pyridine borane sequencing) conversion method is somewhat similar to the EM-seq conversion method. It also uses an enzymatic method to modify 5mC and 5hmC to 5-carboxycytosine (5caC), but without a deamination step. Instead, it retains the unmethylated cytosine while converting 5caC to dihydrouracil (DHU) using the reducing agent pyridine borane. In subsequent copying or amplification processes, DHU is recognized as uracil by polymerase and paired with A. Thus, the 5mC and 5hmC modified sites are detected as T, while the unmodified C remains detected as C. Similarly, using DNA glycosyltransferase (GT) or KRuO4 allows for further differentiation between 5mC and 5hmC. The scheme of the TAPS conversion method can be seen in Figure 3. For more detailed information on the TAPS conversion method, please refer to PCT application publication WO2019136413A, the full text of which is incorporated herein by reference.
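The two chemistries described above give opposite readouts for the same cytosine state. A minimal sketch that encodes the final sequenced base for each state as stated in [0041] and [0042], ignoring the optional glycosyltransferase or KRuO4 steps that separate 5mC from 5hmC; the state labels are our own notation:

```python
from typing import Dict, List

# Sequenced base for each cytosine state after conversion and PCR.
READOUT: Dict[str, Dict[str, str]] = {
    # EM-seq: TET modification protects 5mC/5hmC; APOBEC deaminates
    # unmodified C to U, which is copied as T.
    "EM-seq": {"C": "T", "5mC": "C", "5hmC": "C"},
    # TAPS: TET oxidation plus pyridine borane reduction turns 5mC/5hmC
    # into DHU, which is copied as T; unmodified C stays C.
    "TAPS": {"C": "C", "5mC": "T", "5hmC": "T"},
}


def converted_read(states: List[str], method: str) -> str:
    """Sequence read out from a strand given per-position states.

    `states` holds "A", "G", "T", "C", "5mC" or "5hmC"; non-cytosine bases
    pass through unchanged. `method` must be "EM-seq" or "TAPS".
    """
    table = READOUT[method]
    return "".join(table.get(s, s) for s in states)


strand = ["A", "5mC", "G", "T", "C", "G"]
print(converted_read(strand, "EM-seq"))  # ACGTTG
print(converted_read(strand, "TAPS"))    # ATGTCG
```

The same methylation call therefore requires the C/T interpretation to be inverted depending on which conversion method was used.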

[0043] In this specification, "health status" refers to the health condition of a subject, including whether the subject has a disease, the level of risk of developing a disease, whether the subject is suitable for a particular treatment or treatment method, and the prognosis of the disease. In this specification, “subject” means an animal, such as a mammal, including but not limited to humans, rodents, monkeys, felines, canids, equids, bovids, suids, sheep, goats, mammalian laboratory animals, mammalian livestock, mammalian sports animals, and mammalian pets. Subjects may be male or female and of any age, including infants, toddlers, adolescents, adults, and elderly subjects. In some examples, subjects are patients. In certain examples, subjects are human, such as human patients. This term may be used interchangeably with “patient,” “test subject,” “treatment subject,” etc.

[0044] In one aspect, the present invention provides a method for pretreating a sample to be used for sequencing library construction. The pretreatment comprises repairing single-strand gaps in the DNA fragments contained in the sample and then performing conventional end repair. Gap repair can be performed with DNA ligase, and end repair with DNA polymerase. Gap repair must precede end repair because the inventors have, for the first time, recognized and confirmed that these single-strand gaps are a major cause of methylation information loss, particularly in cell-free DNA (cfDNA) such as circulating tumor DNA (ctDNA). If only conventional end repair is performed, the DNA polymerase extends from the free 3' end at the gap and, through its 5'-3' exonuclease and polymerase activities, performs strand displacement downstream of the gap; because the newly synthesized DNA is made from unmodified nucleotides, the methylation information in the original strand is erased. When end repair is performed after gap repair, no strand displacement occurs and the original methylation information is preserved.
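The erasure mechanism described above can be illustrated with a toy model of a single strand. A minimal sketch, assuming (as a simplification) that without prior ligation the polymerase replaces every nucleotide from the nick to the 3' end with newly synthesized, unmodified DNA; the function and argument names are hypothetical:

```python
from typing import List, Optional


def end_repair(strand: List[str], nick: Optional[int], ligate_first: bool) -> List[str]:
    """Toy model of one strand (5'->3') passing through end repair.

    `strand` holds "A", "C", "G", "T" or "5mC". `nick` is the index of the
    first nucleotide downstream of a single-strand nick (None if intact).
    If the nick is not sealed, the polymerase is assumed to extend from the
    free 3'-OH and replace every downstream nucleotide with new DNA made from
    unmodified dNTPs, so any 5mC downstream of the nick becomes plain C.
    """
    if nick is None or ligate_first:
        # Ligase restores the phosphodiester bond: no free 3'-OH, no displacement.
        return list(strand)
    repaired = list(strand[:nick])
    # Newly synthesized copy of the template carries no methylation.
    repaired += ["C" if base == "5mC" else base for base in strand[nick:]]
    return repaired


strand = ["5mC", "G", "A", "5mC", "G", "T", "5mC", "G"]
print(end_repair(strand, nick=3, ligate_first=False))
# ['5mC', 'G', 'A', 'C', 'G', 'T', 'C', 'G']
print(end_repair(strand, nick=3, ligate_first=True))
# ['5mC', 'G', 'A', '5mC', 'G', 'T', '5mC', 'G']
```

Only the cytosines downstream of the nick lose their 5mC mark, which is consistent with the 3'-ward methylation decline described in [0009] and Figure 4.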

[0045] In another aspect, the present invention provides a method for constructing a sequencing library, comprising pretreating a sample as described above and then constructing the library. In some embodiments, library construction also includes modifying cytosine (C) or methylated cytosine (including 5mC and/or 5hmC) in the library by enzymatic conversion, so that methylated and unmethylated cytosine, or 5mC and 5hmC, can be distinguished in a subsequent detection step, thereby forming a DNA methylation sequencing library. Loading this library onto a sequencer and sequencing it yields the corresponding methylation information data. Those skilled in the art will understand that sequencing a library prepared by the method of the present invention yields not only methylation information data but also fragmentomics information data. This is because the library is constructed for methylation NGS from double-stranded cfDNA fragments, and methylation states are distinguished by enzymatic conversion only after single-strand gaps have been repaired; the integrity of the original DNA fragments is therefore essentially preserved, and with it the original fragmentomics information, including fragment lengths and their distribution.

[0046] In another embodiment, the minimum DNA fragment input of the method of the present invention can be further reduced to 0.1 ng or less by using carrier DNA.
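The protective effect of carrier DNA described in [0046] and [0050] can be illustrated with a toy loss model. The model is our assumption, not the patent's: each purification is taken to lose a fixed mass of DNA (for example through surface adsorption), shared between template and carrier in proportion to their masses. All numbers are hypothetical:

```python
def template_recovered(template_ng: float, carrier_ng: float,
                       loss_ng_per_step: float, steps: int) -> float:
    """Template mass remaining after `steps` purifications (toy model).

    Assumes each step irreversibly loses a fixed `loss_ng_per_step` of total
    DNA, split between template and carrier in proportion to their current
    masses. If the total is at or below the loss, everything is lost.
    """
    for _ in range(steps):
        total = template_ng + carrier_ng
        if total <= loss_ng_per_step:
            return 0.0
        keep = (total - loss_ng_per_step) / total
        template_ng *= keep
        carrier_ng *= keep
    return template_ng


# Hypothetical: 0.1 ng template, 0.05 ng lost per purification, two purifications.
print(template_recovered(0.1, 0.0, 0.05, 2))              # 0.0
print(round(template_recovered(0.1, 100.0, 0.05, 2), 4))  # 0.0999
```

Under this model, 0.1 ng of template alone is lost after two steps, while almost all of it survives alongside 100 ng of carrier. Because the carrier lacks the linker sequence ([0026]), it presumably is not amplified by linker-specific PCR afterwards.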

[0047] In another aspect, the present invention identifies the location of a single-strand gap in a DNA fragment contained in a biological sample from the difference in methylation information between libraries prepared with and without DNA ligase. Because a gap does not usually coincide exactly with a 5mC or 5hmC site, the gap can usually be localized only to an interval. For example, the gap is located between a first position, a methylation site detected both with and without DNA ligase, and a second position, a downstream site where methylation differs (for example, methylation is detected when DNA ligase is added but not when it is omitted).
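The interval rule above can be written as a short scan over paired methylation calls. A minimal sketch, assuming calls from the ligase-treated and untreated libraries have already been matched by position along the same strand; the data structure and function name are hypothetical:

```python
from typing import Dict, Optional, Tuple


def locate_gap(with_ligase: Dict[int, bool],
               without_ligase: Dict[int, bool]) -> Optional[Tuple[Optional[int], int]]:
    """Bracket a single-strand gap from paired methylation calls on one strand.

    Keys are site positions along the strand (5'->3'); values say whether the
    site was called methylated. The gap lies after the last site methylated
    in both libraries (first position) and at or before the first site
    methylated only with ligase (second position). Returns (first, second),
    with first=None if no concordant site precedes the discordant one, or
    None if no discordant site is found.
    """
    last_concordant: Optional[int] = None
    for pos in sorted(set(with_ligase) & set(without_ligase)):
        if with_ligase[pos] and without_ligase[pos]:
            last_concordant = pos
        elif with_ligase[pos]:
            # Methylated with ligase but erased without it: first downstream difference.
            return (last_concordant, pos)
    return None


with_lig = {10: True, 30: True, 55: True, 80: True}
without_lig = {10: True, 30: True, 55: False, 80: False}
print(locate_gap(with_lig, without_lig))  # (30, 55)
```

Sites unmethylated in both libraries carry no information about the gap and are skipped; a None start means the gap may lie anywhere upstream of the first discordant site.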

[0048] In another aspect, the present invention allows a subject's health status to be determined from the acquired methylation information; in particular, combining methylation information with fragmentomics information allows the subject's health status to be determined more accurately.

[0049] The technical problems that this invention aims to solve include, but are not limited to, the following.
1. To provide a high-throughput next-generation sequencing (NGS) library construction and sequencing method for cell-free DNA that simultaneously and accurately measures DNA methylation and fragmentomics information.
2. To provide a highly sensitive method that efficiently constructs NGS libraries even from low starting amounts of 1 ng or less.
3. To maintain the integrity of the cell-free DNA strands, so that the fragmentomics information in the sequencing library is not destroyed.
4. To avoid DNA breaks and loss of methylation information caused by gap damage in cell-free DNA.
5. To make it possible, by simply changing the technical route, to measure any desired type of methylation information, such as DNA methylation, together with fragmentomics information.

[0050] The method provided by the present invention is used for, but is not limited to, the epigenetic analysis of cell-free DNA, and achieves the above objectives. The method includes one or more of the following features.
1. Avoiding methylation loss due to DNA gap damage by using a specific DNA ligase. One or more specific DNA ligases, such as Taq DNA ligase, are applied to gapped DNA to repair phosphodiester bond gaps in the double-stranded template, which prevents DNA loss from strand breaks and/or erasure of methylation information in subsequent steps. When the gap-repaired template is converted into a double-stranded library by enzymatic conversion, no severe erasure of methylation information occurs at the 3' end of the library, and the detected methylation rate recovers to near the average methylation rate.
2. Using carrier DNA to help protect trace amounts of DNA, which enables effective library construction and highly sensitive methylation sequencing from trace DNA. This feature is an improved version of the enzymatic conversion method and draws on the Cabernet technology described in patent publication WO2021077415. After a library linker is added to the template DNA, carrier DNA with similar physicochemical properties but without a linker sequence is added before enzymatic conversion. In steps where DNA loss or degradation is likely, such as purification, most of the DNA lost is carrier DNA, which effectively protects the template DNA. Furthermore, omitting the single-strand purification step after conversion and adding the PCR amplification reaction system directly minimizes template loss during purification.
3. Whereas bisulfite conversion destroys fragmentomics information, this method uses enzymatic conversion and a double-stranded DNA library strategy, measuring DNA methylation accurately while preserving DNA fragmentomics information. Treatment with DNA ligase does not affect the preservation of fragmentomics information.

[0051] Method Overview
(1) Extraction of cell-free DNA
Cell-free DNA is DNA that has been released outside of cells or cell nuclei. It can be extracted from various biological materials, such as body fluids, in vitro cell culture media, and the natural environment, including but not limited to peripheral blood, plasma, serum, urine, feces, saliva, cerebrospinal fluid, lymph, alveolar lavage fluid, amniotic fluid, cleavage fluid, cell culture media, embryo culture media, microbial culture media, soil leachate, and bone meal leachate. Cell-free DNA is obtained by extraction and purification.

[0052] The total amount of cell-free DNA used for downstream sequencing analysis can be reduced to picogram or nanogram levels.

[0053] (2) DNA gap repair
This step may use a specific ligase, and may use one or a combination of repair reagents and/or DNA ligases, such as HiFi-Taq ligase, T4 DNA ligase, Taq DNA ligase, T7 DNA ligase, and E. coli DNA ligase.

[0054] DNA ligases and/or DNA repair reagents repair gaps in the double-stranded template DNA and prevent DNA loss from strand breaks and/or erasure of methylation information in subsequent processes.

[0055] (3) Addition of library linker sequences (adaptors) to both ends of the DNA
Gaps at the ends of the double strand are filled using a polymerase or a similar method, and in some cases a single A (adenine) base is added to the 3' ends to facilitate linker ligation.

[0056] In this step, DNA ligase is used to ligate a DNA linker with a specific sequence, in which some or all of the cytosines are methylated or hydroxymethylated, to both ends of the template DNA.

[0057] In some cases, the DNA linker sequence may be biotin-labeled to facilitate subsequent purification.

[0058] In some cases, sample barcodes and single-molecule barcodes may be incorporated into the DNA linker sequence to label the origin of each DNA template, which improves detection accuracy and allows samples of different origins to be pooled in subsequent steps.
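A minimal sketch of how sample barcodes and single-molecule barcodes might be used downstream: reads are split by sample barcode, and reads sharing the same molecular barcode and mapping coordinates are collapsed as PCR duplicates. The `Read` record layout and the collapsing rule are assumptions for illustration; the document does not specify the barcode design or the deduplication criterion.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Read(NamedTuple):
    sample_barcode: str  # identifies the sample of origin (hypothetical field)
    umi: str             # single-molecule barcode (hypothetical field)
    chrom: str
    start: int           # fragment start on the reference
    end: int             # fragment end on the reference


def demultiplex_and_dedup(reads: List[Read]) -> Dict[str, List[Read]]:
    """Group reads by sample, keeping one read per (UMI, chrom, start, end)."""
    per_sample: Dict[str, Dict[Tuple[str, str, int, int], Read]] = defaultdict(dict)
    for read in reads:
        key = (read.umi, read.chrom, read.start, read.end)
        # setdefault keeps the first read seen for each original molecule
        per_sample[read.sample_barcode].setdefault(key, read)
    return {sample: list(unique.values()) for sample, unique in per_sample.items()}


pooled = [
    Read("S1", "AACG", "chr1", 100, 267),
    Read("S1", "AACG", "chr1", 100, 267),  # PCR duplicate of the read above
    Read("S1", "TTGC", "chr1", 100, 267),  # same position, different molecule
    Read("S2", "AACG", "chr1", 100, 267),
]
for sample, unique_reads in demultiplex_and_dedup(pooled).items():
    print(sample, len(unique_reads))
```

Keeping the molecular barcode in the key means that two independent molecules with identical end points are not mistaken for duplicates, which matters for fragmentomics counts.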

[0059] (4) Addition of carrier DNA
To prevent the loss of trace amounts of template DNA in subsequent steps, similar carrier DNA is added to the template DNA solution in an amount several times that of the template. In steps where DNA loss or degradation is likely, such as purification, it is mainly the carrier DNA that is lost, which effectively avoids loss of the template DNA.

[0060] Because the carrier DNA does not contain a specific DNA linker sequence, it is not detected in subsequent library amplification and sequencing.
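The amounts used in Example 1 (1 μL of carrier DNA at 25 ng/μL, step 1.4) give a feel for the carrier-to-template ratio. The short calculation below is only an illustration: apart from 3 ng, which is the starting amount used in Example 3, the template amounts passed in are hypothetical inputs.

```python
def carrier_excess(carrier_ng_per_ul: float, carrier_ul: float, template_ng: float):
    """Return (carrier mass in ng, carrier-to-template mass ratio)."""
    if template_ng <= 0:
        raise ValueError("template amount must be positive")
    carrier_ng = carrier_ng_per_ul * carrier_ul
    return carrier_ng, carrier_ng / template_ng


# Example 1 carrier dose against several template inputs
for template in (3.0, 1.0, 0.1):
    mass, ratio = carrier_excess(25.0, 1.0, template)
    print(f"{template} ng template -> {mass} ng carrier, about {ratio:.0f}-fold excess")
```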

[0061] (5) Methylation conversion
This step uses an enzymatic conversion method to specifically convert cytosine, methylated cytosine, and/or hydroxymethylated cytosine in the DNA, altering their base-pairing behavior so that they correspond to different bases, and can therefore be distinguished, in subsequent amplification and sequencing.

[0062] In this step, the EM-seq technology developed by New England Biolabs can be used. In other cases, TAPS technology may be used.
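The expected read-outs of the two conversion chemistries can be summarised in a small Python model. It is a simplification under stated assumptions: conversion is treated as complete, only the strand shown is modelled, and the `states` mapping of positions to modification labels is a hypothetical input format. In EM-seq, TET2 oxidation and glucosylation protect 5mC and 5hmC while APOBEC deaminates unmodified C to U (read as T); in TAPS, 5mC and 5hmC are converted to dihydrouracil (read as T) while unmodified C is left unchanged.

```python
from typing import Dict

UNMODIFIED, MC, HMC = "C", "5mC", "5hmC"


def read_out(state: str, method: str) -> str:
    """Base reported by sequencing for one template cytosine after conversion."""
    if state not in (UNMODIFIED, MC, HMC):
        raise ValueError(f"unknown cytosine state: {state}")
    if method == "EM-seq":
        # Modified C is protected; unmodified C is deaminated to U -> read as T
        return "T" if state == UNMODIFIED else "C"
    if method == "TAPS":
        # Modified C becomes dihydrouracil -> read as T; unmodified C unchanged
        return "C" if state == UNMODIFIED else "T"
    raise ValueError(f"unknown method: {method}")


def convert(seq: str, states: Dict[int, str], method: str) -> str:
    """Apply the read-out rule to every C; positions absent from states are unmodified."""
    return "".join(
        read_out(states.get(i, UNMODIFIED), method) if base == "C" else base
        for i, base in enumerate(seq)
    )


template = "ACGTCG"
states = {1: MC}  # C at index 1 methylated, C at index 4 unmodified
print(convert(template, states, "EM-seq"))  # ACGTTG
print(convert(template, states, "TAPS"))    # ATGTCG
```

The two methods give mirror-image read-outs, which is why the analysis step interprets a C/T mismatch against the reference differently depending on the chemistry used.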

[0063] (6) Library amplification and sequencing
During methylation conversion, the DNA becomes single-stranded or partially single-stranded because of thermal denaturation and the altered base pairing. Single-stranded DNA is poorly recovered during purification, which leads to significant loss. This method therefore omits the single-strand purification step and adds the PCR amplification reaction system directly, minimizing the loss of template DNA during purification.

[0064] The library undergoes PCR amplification. The amplification primers contain sequences complementary to the linker sequence and sequencer-compatible sequences, and may also incorporate a sample barcode. The primers anneal to the linker sequences attached to both ends of the template DNA, allowing amplification and formation of the desired sequencing library.

[0065] The sequencing library undergoes quantification and quality control after purification, followed by sequencing using a sequencer.

[0066] (7) Analysis of sequencing results
When methylation conversion uses EM-seq technology, unmethylated cytosine is converted and read as thymine, while methylated cytosine remains cytosine. When methylation conversion uses TAPS technology, methylated cytosine is converted and read as thymine, while unmethylated cytosine remains unchanged. DNA methylation information is obtained by alignment with a reference genome. Because this method adds the library linker while the cell-free DNA is still double-stranded, before methylation conversion, DNA fragment information is preserved. Alignment with a reference genome therefore also reveals fragmentomics information of the cell-free DNA, such as the original fragment length, fragment distribution, upstream/downstream breakpoint locations, and end base patterns. The analysis workflow is shown in Figure 1.
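As a rough illustration of step (7), the sketch below calls CpG methylation from a single read aligned to the top strand and derives simple fragmentomics features from the fragment coordinates. It assumes gap-free alignments (no indels), top-strand reads, and 0-based half-open coordinates; real aligners such as Bismark handle both strands, mismatches, and quality filtering, and the function names here are hypothetical.

```python
from typing import Dict


def call_cpg(ref: str, read: str, start: int, method: str = "EM-seq") -> Dict[int, bool]:
    """Map each CpG covered by the read to True (methylated) or False (unmethylated)."""
    calls: Dict[int, bool] = {}
    for offset, base in enumerate(read):
        pos = start + offset
        if pos + 1 >= len(ref):
            break  # a CpG needs the following reference base
        if ref[pos] != "C" or ref[pos + 1] != "G" or base not in ("C", "T"):
            continue
        if method == "EM-seq":
            calls[pos] = base == "C"  # protected C = methylated
        elif method == "TAPS":
            calls[pos] = base == "T"  # converted C = methylated
        else:
            raise ValueError(f"unknown method: {method}")
    return calls


def fragment_features(ref: str, frag_start: int, frag_end: int, k: int = 4) -> dict:
    """Length and 5' end motif of one fragment; the motif is read from the
    reference because converted read bases no longer match it at cytosines."""
    return {"length": frag_end - frag_start, "end_motif": ref[frag_start:frag_start + k]}


ref = "AACGTTCGAACG"
print(call_cpg(ref, "AATGTTCGAA", 0))   # {2: False, 6: True}
print(fragment_features(ref, 0, 12))    # {'length': 12, 'end_motif': 'AACG'}
```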

[0067] Other applications of the present invention
DNA hydroxymethylation and fragmentomics sequencing of cell-free DNA
In mammals, the content of hydroxymethylated DNA (5hmC) is lower than that of methylated DNA (5mC), but it has a different biological significance.

[0068] If only hydroxymethylation needs to be measured, that is, if 5hmC must be distinguished from 5mC rather than detected together with it, this can be achieved in the above methylation conversion step, for example, by omitting the TET oxidase while keeping the other enzymatic reactions.
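The effect of omitting TET oxidase can be modelled by extending the EM-seq read-out rule: without TET oxidation, 5mC is no longer protected and is deaminated like unmodified C, whereas glucosylated 5hmC still reads as C. The subtraction in `estimate_5mc` (5mC level taken as the full EM-seq level minus the TET-free level) is an additional assumption borrowed from subtraction-based designs and is not described in this document; both function names are hypothetical.

```python
def em_seq_read_out(state: str, tet_oxidation: bool = True) -> str:
    """Base read for a template cytosine under EM-seq-style conversion."""
    if state == "C":
        return "T"  # always deaminated by APOBEC
    if state == "5mC":
        # protected only after TET oxidation; otherwise deaminated like C
        return "C" if tet_oxidation else "T"
    if state == "5hmC":
        return "C"  # glucosylated 5hmC is protected in both set-ups
    raise ValueError(f"unknown cytosine state: {state}")


def estimate_5mc(level_with_tet: float, level_without_tet: float) -> float:
    """Hypothetical estimate: (5mC + 5hmC) minus 5hmC, clamped at zero."""
    return max(0.0, level_with_tet - level_without_tet)


for state in ("C", "5mC", "5hmC"):
    print(state, em_seq_read_out(state, True), em_seq_read_out(state, False))
```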

[0069] Enrichment method based on targeted PCR
Enriching target fragments of interest saves sequencing costs and increases the sensitivity of detecting target genes.

[0070] In step (6) of the "Method Overview" above, PCR is performed with primers complementary to the linkers at both ends of the template DNA to amplify the entire template DNA. If the target gene fragment needs to be enriched, amplification can instead be performed by targeted PCR: target primers containing sequences complementary to the target gene fragment after methylation conversion are designed and introduced into the PCR reaction system to amplify the target gene sequence.

[0071] The target primer may or may not contain part or all of the linker sequence compatible with the sequencer.

[0072] If only the upstream or only the downstream target primer is added, a fragment is obtained in which one end is the target primer and the other end is the linker sequence added in step (3) of the "Method Overview" above.

[0073] The upstream and downstream primers may be joined end to end on a single DNA molecule to form a lock primer.

[0074] The default linker sequence primer may be added first, followed by the target primer after amplification, or they may be added simultaneously, or only the target primer may be used without adding the default linker sequence primer.
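Because target primers must match the sequence after methylation conversion, candidate primer sequences can be derived from an in-silico converted reference. The sketch below assumes EM-seq-style conversion of the top strand (every C read as T except CpG cytosines assumed methylated) and takes primers at fixed, user-chosen positions; it ignores melting temperature, the bottom strand, and CpG sites of uncertain methylation, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def convert_top_strand(ref: str, cpg_methylated: bool = True) -> str:
    """In-silico EM-seq conversion: C -> T unless it is a CpG C assumed methylated."""
    out = list(ref)
    for i, base in enumerate(ref):
        if base != "C":
            continue
        is_cpg = i + 1 < len(ref) and ref[i + 1] == "G"
        if not (is_cpg and cpg_methylated):
            out[i] = "T"
    return "".join(out)


def candidate_primers(ref: str, fwd_start: int, rev_end: int, length: int = 20,
                      cpg_methylated: bool = True):
    """Forward primer = converted top strand; reverse primer = its reverse complement."""
    converted = convert_top_strand(ref, cpg_methylated)
    forward = converted[fwd_start:fwd_start + length]
    reverse = revcomp(converted[rev_end - length:rev_end])
    return forward, reverse


print(candidate_primers("ACGTCAGGCCTA", 0, 12, length=4))  # ('ACGT', 'TAAA')
```

The same converted sequence could serve as the starting point for hybrid capture probes, which likewise must be complementary to the converted target.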

[0075] Enrichment method using hybrid capture
In step (6) of the "Method Overview" above, hybrid capture enrichment may be performed before or after amplification. This requires a probe containing a sequence complementary to the target gene fragment after methylation conversion.

[0076] In hybrid capture enrichment, the library is mixed with biotin-labeled single-stranded DNA/RNA probes; through thermal denaturation and annealing, library molecules complementary to the target sequence hybridize with the probes. The hybridized probes are then captured, for example with streptavidin-conjugated magnetic beads, to obtain an enriched library.

[0077] Obtaining gap location information instead of performing gap repair
If the ligase treatment in step (2) of the "Method Overview" above is omitted and repair is instead performed with an appropriate polymerase or modified nucleotides, the methylation information downstream of the gap is altered. The gap position can then be determined by analyzing the sequence in which methylation has been erased or altered.
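One way to picture this analysis is to scan a read from the untreated library for a terminal run of CpGs that read as unmethylated even though they are expected to be methylated (for example, because they are methylated in a ligase-treated reference). The sketch below is a hypothetical heuristic under that assumption: the input tuples, the `min_run` threshold, and the function name are illustrative choices, not part of the described method.

```python
from typing import List, Optional, Tuple

# (position, expected_methylated, observed_methylated), ordered 5'->3' on the strand
CpGCall = Tuple[int, bool, bool]


def locate_erasure(calls: List[CpGCall], min_run: int = 3) -> Optional[Tuple[Optional[int], int]]:
    """Return (last retained methylated CpG, first erased CpG) or None.

    Erasure by extension from a nick runs from the gap towards the 3' end,
    so only a trailing run of expected-but-unobserved methylation is used.
    """
    informative = [(pos, observed) for pos, expected, observed in calls if expected]
    run_start = len(informative)
    while run_start > 0 and not informative[run_start - 1][1]:
        run_start -= 1
    if len(informative) - run_start < min_run:
        return None  # too short to distinguish from genuine hypomethylation
    upstream = informative[run_start - 1][0] if run_start > 0 else None
    return upstream, informative[run_start][0]


calls = [(10, True, True), (25, True, True), (40, False, False),
         (55, True, False), (70, True, False), (90, True, False)]
print(locate_erasure(calls))  # (25, 55)
```

Here the gap would be placed between CpG 25 and CpG 55; CpG 40 is skipped because it is not expected to be methylated and so carries no information about erasure.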

[0078] This invention discloses a highly sensitive DNA methylation sequencing technique for DNA with single-strand gaps. The technique first repairs DNA gap damage so that DNA methylation signals are not lost or misread. It then constructs a library using double-stranded DNA as the template, overcoming the low efficiency of single-stranded library construction. Next, it converts methylation states into differences in base pairing using a substantially optimized enzymatic conversion method. Combined with carrier DNA protecting the template during the reaction and purification steps, the technique produces a high-quality, highly sensitive methylation sequencing library from an extremely small amount of starting DNA while preserving DNA fragmentomics information.

[0079] The advantages of the method of the present invention include, but are not limited to, the following.
1. The minimum starting amount of DNA is low, less than 0.1 ng.
2. Library yield can be increased fourfold.
3. The library alignment rate improves from 40% to 70%.
4. The length of continuous DNA methylation loss at the 3' end of the library decreases from 170 bp to within 40 bp.

[0080] The method of the present invention can be applied to various liquid biopsy diagnostic products such as early cancer screening, cancer gene detection, prenatal diagnosis, preimplantation genetic diagnosis, and genetic diagnosis, and provides valuable technical support for scientific research in related fields.

[0081] The present invention will be further described below with reference to specific examples. [Examples]

[0082] Example 1: In this example, the process of preparing a sequencing library will be explained using peripheral blood cfDNA as an example.

[0083] 1.1 cfDNA extraction
cfDNA can be extracted using any standard method in the prior art.

[0084] 5 mL of peripheral blood was collected in a 5 mL EDTA vacuum blood collection tube (purple cap), and plasma was separated within 4 hours by low-speed centrifugation at 1600 × g for 10 min at room temperature in a horizontal (swing-bucket) rotor. The upper plasma layer was carefully removed and transferred to a 1.5 mL tube, which was centrifuged at 6000 × g for 10 min at 4°C in a high-speed centrifuge; the supernatant was transferred to a new 1.5 mL tube and stored at -80°C or on dry ice. cfDNA can be extracted from the separated plasma with a nucleic acid extraction or purification kit (QIAGEN, QIAamp® Circulating Nucleic Acid Kit, catalog number 55114), typically yielding 5-20 ng of cfDNA per 1 mL of plasma.

[0085] 1.2 cfDNA gap repair
A ligase reaction system was set up, using Taq DNA ligase as an example: the reagents listed in Table 1 below were added to the purified cfDNA solution. The sonicated spike-in DNA in Table 1 was an equal-volume mixture of fully CpG-methylated pUC19 DNA (NEB E7122) and fully CpG-unmethylated lambda DNA (NEB E7123), sheared to about 300 bp by sonication and diluted to 0.2 ng/μL for use. [Table 1]

[0086] The mixture was briefly mixed, centrifuged, and then incubated in a PCR machine at 37°C to 60°C for 10 to 30 minutes.
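The spike-in described in step 1.2 lends itself to a simple conversion check. Assuming reads from the spike-in can be separated by aligning to the lambda and pUC19 reference sequences, the unmethylated lambda DNA gives the fraction of cytosines that were converted, and the CpG-methylated pUC19 DNA gives the fraction of methylated CpGs that were protected. The function below is a hypothetical helper that works on counts of reference cytosines read as C or T; the counts in the usage line are made up.

```python
from typing import Tuple


def spike_in_metrics(lambda_counts: Tuple[int, int],
                     puc19_cpg_counts: Tuple[int, int]) -> Tuple[float, float]:
    """Return (conversion rate, protection rate) from (read_as_C, read_as_T) counts.

    lambda_counts: reference cytosines of unmethylated lambda DNA.
    puc19_cpg_counts: CpG cytosines of CpG-methylated pUC19 DNA.
    """
    lam_c, lam_t = lambda_counts
    puc_c, puc_t = puc19_cpg_counts
    if lam_c + lam_t == 0 or puc_c + puc_t == 0:
        raise ValueError("no spike-in coverage")
    conversion_rate = lam_t / (lam_c + lam_t)  # unmethylated C should read as T
    protection_rate = puc_c / (puc_c + puc_t)  # methylated CpG should read as C
    return conversion_rate, protection_rate


print(spike_in_metrics((5, 995), (970, 30)))  # (0.995, 0.97)
```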

[0087] 1.3 End repair and linker ligation
An End Prep reaction system was set up by adding the reagents of the NEBNext Ultra II End Repair/dA-Tailing Module (NEB E7546L) (Table 2) in sequence to the repair product from the previous step. [Table 2]

[0088] The mixture was briefly mixed, centrifuged, and then incubated in a PCR machine at 20°C for 30 minutes followed by 65°C for 30 minutes.

[0089] After the reaction was complete, a linker in which all cytosines are methylated was added. For compatibility with the subsequent library amplification primers, the recommended pair of linker sequences is: 5'- A [5mC] A [5mC] T [5mC] TTT [5mC] [5mC] [5mC] TA [5mC] A [5mC] GA [5mC] G [5mC] T [5mC] TT [5mC] [5mC] GAT [5mC] T-3' (SEQ ID NO:1) and 5'-phosphate group-GAT [5mC] GGAAGAG [5mC] A [5mC] A [5mC] GT [5mC] TGAA [5mC] T [5mC] [5mC] AGT [5mC] A-3' (SEQ ID NO:2). The linker oligonucleotides were diluted to a total concentration of 15 μM, and 1.25 μL was added and mixed thoroughly.

[0090] Subsequently, the following reagents (Table 3) from the NEBNext Ultra II Ligation Module kit (NEB E7595L) were added sequentially. [Table 3]

[0091] The mixture was quickly mixed, centrifuged, and incubated in a PCR machine at 20°C for 15 minutes.
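The bracket notation used for SEQ ID NO:1 and SEQ ID NO:2 can be checked programmatically. The sketch below (helper names hypothetical) parses the notation into a plain sequence plus the indices of 5mC bases, confirms that every cytosine in each linker is methylated, and counts how many bases at the 5' end of SEQ ID NO:2 pair with the 3' end of SEQ ID NO:1 after a single unpaired 3'-T. The 5'-phosphate of SEQ ID NO:2 is not represented. The result suggests a forked linker with a short duplex region; that is an inference from the sequences, not a statement in the document.

```python
import re
from typing import List, Tuple

SEQ_ID_1 = ("A [5mC] A [5mC] T [5mC] TTT [5mC] [5mC] [5mC] TA [5mC] A [5mC] GA [5mC] "
            "G [5mC] T [5mC] TT [5mC] [5mC] GAT [5mC] T")
SEQ_ID_2 = ("GAT [5mC] GGAAGAG [5mC] A [5mC] A [5mC] GT [5mC] TGAA [5mC] T [5mC] "
            "[5mC] AGT [5mC] A")  # 5'-phosphate omitted

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_linker(text: str) -> Tuple[str, List[int]]:
    """Return the plain sequence and the 0-based indices of 5mC bases."""
    tokens = re.findall(r"\[5mC\]|[ACGT]", text)
    seq = "".join("C" if tok == "[5mC]" else tok for tok in tokens)
    methylated = [i for i, tok in enumerate(tokens) if tok == "[5mC]"]
    return seq, methylated


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def duplex_length(top: str, bottom: str, overhang: int = 1) -> int:
    """Bases of bottom (5'->3') pairing with top's 3' end, after an unpaired overhang."""
    n = 0
    for a, b in zip(revcomp(top)[overhang:], bottom):
        if a != b:
            break
        n += 1
    return n


seq1, meth1 = parse_linker(SEQ_ID_1)
seq2, meth2 = parse_linker(SEQ_ID_2)
print(seq1)  # ACACTCTTTCCCTACACGACGCTCTTCCGATCT
print(seq2)  # GATCGGAAGAGCACACGTCTGAACTCCAGTCA
print(len(meth1) == seq1.count("C"), len(meth2) == seq2.count("C"))  # True True
print(duplex_length(seq1, seq2))  # 12
```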

[0092] 1.4 Addition of carrier DNA and purification
1 μL of carrier DNA was added to the ligation product from the previous step, briefly mixed, and centrifuged.

[0093] Carrier DNA was prepared by shearing lambda DNA (NEB N3011) to about 300 bp by sonication and diluting it to 25 ng/μL for use.

[0094] DNA was purified with 1.8 volumes of SPRI or AMPure XP magnetic beads and eluted with water to obtain purified DNA.

[0095] 1.5 Enzymatic conversion of DNA methylation
The purified DNA from the previous step was enzymatically converted with the NEBNext® Enzymatic Methyl-seq Conversion Module (NEB E7125L), which comprises TET oxidation, glucosylation protection, thermal denaturation, and APOBEC deamination of the DNA. After APOBEC deamination, the DNA was not purified and proceeded directly to the next reaction.

[0096] 1.6 PCR amplification of the library
After the APOBEC deamination reaction in the preceding step, PCR primers carrying library linker and index sequences were added to the liquid, for example 5 μL of a NEBNext® Multiplex Oligos for Illumina primer set, followed by an equal volume of 2× Q5U master mix (NEB M0597L). After brief mixing and centrifugation, the reaction was run in a PCR instrument at 98°C for 30 s; 7 cycles of 98°C for 10 s, 62°C for 30 s, and 72°C for 90 s; and 65°C for 5 min, and the product was then held at 4°C.

[0097] DNA was purified with 1.1 volumes of SPRI or AMPure XP magnetic beads, eluted with 1× TE buffer, and stored at -20°C.
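The cycling program in step 1.6 can be written down as data, which makes the total hold time explicit. This is a bookkeeping sketch only: ramp times and the final 4°C hold are ignored, and the step labels are descriptive names added here.

```python
from typing import List, Tuple

# (step label, temperature in °C, hold in seconds, repetitions) from step 1.6
PROFILE: List[Tuple[str, int, int, int]] = [
    ("initial denaturation", 98, 30, 1),
    ("denaturation", 98, 10, 7),
    ("annealing", 62, 30, 7),
    ("extension", 72, 90, 7),
    ("final step", 65, 300, 1),
]


def total_hold_seconds(profile: List[Tuple[str, int, int, int]]) -> int:
    """Sum of hold times across all repetitions, excluding ramps."""
    return sum(seconds * reps for _, _, seconds, reps in profile)


total = total_hold_seconds(PROFILE)
print(f"{total} s (about {total / 60:.1f} min)")  # 1240 s (about 20.7 min)
```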

[0098] 1.7 Library sequencing and analysis
The constructed library underwent concentration quantification and fragment distribution quality control, followed by paired-end sequencing on an Illumina sequencer. The sequencing data were aligned and analyzed with Bismark software to obtain whole-genome methylation data; to avoid biased methylation distributions, the methylation calls from the first 10 bases of read 1 and the first 40 bases of read 2 were excluded from the methylation analysis. Information obtained through bioinformatics analysis, such as the distribution of the library across the genome, fragment lengths and start/end positions, and 5'-end base composition, constitutes fragmentomics information. [Table 4]
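The exclusion of the first 10 bases of read 1 and the first 40 bases of read 2 can be expressed as a filter on per-position methylation calls. In Bismark itself this appears to correspond to the `--ignore 10 --ignore_r2 40` options of `bismark_methylation_extractor`; the Python below is an independent sketch with a hypothetical call format (read number, 1-based position from the read's 5' end, methylated flag).

```python
from typing import Iterable, List, Tuple

Call = Tuple[int, int, bool]  # (read number 1 or 2, 1-based position in read, methylated)


def drop_biased_positions(calls: Iterable[Call], ignore_r1: int = 10,
                          ignore_r2: int = 40) -> List[Call]:
    """Keep only calls beyond the first ignore_r1 / ignore_r2 bases of each read."""
    kept = []
    for read_number, position, methylated in calls:
        limit = ignore_r1 if read_number == 1 else ignore_r2
        if position > limit:
            kept.append((read_number, position, methylated))
    return kept


def methylation_rate(calls: List[Call]) -> float:
    """Fraction of calls that are methylated."""
    if not calls:
        raise ValueError("no calls left after filtering")
    return sum(1 for *_, methylated in calls if methylated) / len(calls)


calls = [(1, 5, False), (1, 11, True), (2, 40, False), (2, 41, True)]
print(methylation_rate(drop_biased_positions(calls)))  # 1.0
```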

[0099] The sequencing results contain tens of millions to billions of such records, and the amount of sequencing data depends on the choice of sequencer and flow cell.

[0100] Example 2: Comparison with and without DNA gap repair
To evaluate whether this method can overcome the methylation loss seen in conventional enzymatic conversion library construction methods, a parallel controlled comparison was performed.

[0101] Cell-free DNA was extracted from the plasma of healthy individuals and divided into two groups. Experimental group: treated with DNA ligase (see step 1.2 of Example 1, DNA gap repair reaction). Control group: not treated with DNA ligase (DNA ligase replaced with water).

[0102] The cell-free DNA from the two groups was subjected to the same double-stranded library construction process.

[0103] End modification was performed with the NEBNext Ultra II End Repair/dA-Tailing Module (NEB E7546L), and linker ligation was performed with the NEBNext Ultra II Ligation Module (NEB E7595L) and the NEBNext® EM-seq® Adaptor (NEB E7165).

[0104] The template DNA after linker addition was purified with magnetic beads and then enzymatically converted with the NEBNext® Enzymatic Methyl-seq Conversion Module (NEB E7125L). The converted DNA was then PCR-amplified with Q5U master mix (NEB M0597L).

[0105] The constructed library was sequenced on an Illumina sequencer with 150 bp paired-end reads, and alignment and analysis were performed with Bismark software. The methylation bias (M-bias) of the library was analyzed by plotting the average methylation rate at each read position along the 5'-3' direction of the library fragment, to evaluate whether the methylation level across the library was uniform. The results are shown in Figure 4.

[0106] As shown in the figure, when a double-stranded methylation library was constructed without gap repair of the cell-free DNA, the average methylation level in the sequencing library showed increasingly pronounced loss (dotted line in the figure) from approximately 130 bp in read 1 onward into read 2, along the 5'-3' direction of the library fragment. The loss was particularly pronounced in the read 2 data, averaging 40% of the true methylation rate. In the experimental group, DNA ligase repair treatment restored the methylation lost in read 2 (solid line in the figure); the decrease in methylation rate in the last 30 bp at the 3' end of the library is caused by defects at the 3' end of the double-stranded cell-free DNA template itself.
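The M-bias analysis in Example 2 amounts to averaging methylation calls by read and by position. A minimal sketch is shown here, using a hypothetical call format of (read number, 1-based position in read, methylated flag); Bismark's methylation extractor also produces an M-bias report, and this code is not that implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Call = Tuple[int, int, bool]  # (read number, 1-based position in read, methylated)


def m_bias(calls: Iterable[Call]) -> Dict[int, Dict[int, float]]:
    """Average methylation rate per read position, separately for read 1 and read 2."""
    counts: Dict[Tuple[int, int], List[int]] = defaultdict(lambda: [0, 0])  # [methylated, total]
    for read_number, position, methylated in calls:
        tally = counts[(read_number, position)]
        tally[0] += int(methylated)
        tally[1] += 1
    curve: Dict[int, Dict[int, float]] = defaultdict(dict)
    for (read_number, position), (meth, total) in sorted(counts.items()):
        curve[read_number][position] = meth / total
    return dict(curve)


calls = [(1, 1, True), (1, 1, False), (1, 2, True), (2, 1, False)]
print(m_bias(calls))  # {1: {1: 0.5, 2: 1.0}, 2: {1: 0.0}}
```

A flat curve indicates uniform methylation along the fragment; a progressive drop along read 2, as in the untreated control, is the signature of gap-related erasure.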

[0107] These data demonstrate that the method provided herein can compensate for methylation loss caused by gap damage in cell-free DNA.

[0108] Example 3: Obtaining fragmentomics information together with methylation information
To evaluate the advantages of this method over conventional methods, the technical method of the present invention was compared with a commercial kit. Experimental group: the technical method of the present invention (see Example 1 for the specific procedure). Control group: Swift Biosciences Accel-NGS® Methyl-Seq DNA Library Kit (manufacturer part number 33024; see the kit manual for detailed instructions).

[0109] Cell-free DNA was extracted from the plasma of healthy individuals, and libraries were constructed with the two methods from the same starting amount of cell-free DNA (3 ng) and with the same number of PCR cycles (7). After the libraries were sequenced and analyzed, alignment efficiency was evaluated with Bismark software. The results are shown in Table 5. [Table 5]

[0110] Specifically, when library DNA yield was quantified with Qubit, the technical method of the present invention produced 83.4 ng of library, significantly more than the Swift kit. The alignment rate of the sequencing library produced by the present method was 70%, about 1.6 times that of the Swift library (44%), indicating high library effectiveness. The duplication rate of the library constructed by this method was 15%, significantly lower than the 22% of the Swift library, indicating fewer duplicate fragments and higher library quality. This method also preserves fragmentomics information (see Figures 5 and 6). Analysis of the DNA fragments before library construction with the Agilent Femto Pulse automated pulsed-field electrophoresis fragment analysis system showed (Figure 5) that the cell-free DNA itself has a characteristic peak pattern with a period of approximately 170 bp, a typical fragmentation feature caused by nucleosome occupancy. Analysis of the fragments after library construction with the Agilent 5200 Fragment Analyzer System (Figure 6) showed that, compared with libraries constructed with the Swift kit, the sequencing library constructed by the present invention retains the fragment distribution characteristic of cell-free DNA and therefore allows better analysis of fragmentomics data.
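The nucleosome-related periodicity described here was observed by electrophoresis, but the same pattern can be looked for in sequencing data using fragment lengths from paired-end alignments. The sketch below finds local maxima in a fragment-length histogram; the window size and minimum count are hypothetical tuning parameters, and the input in the usage line is synthetic.

```python
from collections import Counter
from typing import Iterable, List


def length_peaks(lengths: Iterable[int], window: int = 10, min_count: int = 5) -> List[int]:
    """Fragment lengths whose count exceeds every other length within +/- window bp."""
    hist = Counter(lengths)
    peaks = []
    for length, count in hist.items():
        if count < min_count:
            continue
        neighbours = (hist.get(length + d, 0) for d in range(-window, window + 1) if d != 0)
        if all(count > n for n in neighbours):
            peaks.append(length)
    return sorted(peaks)


synthetic = [167] * 50 + [160] * 10 + [175] * 10 + [334] * 8 + [320] * 2
print(length_peaks(synthetic))  # [167, 334]
```

On real data the spacing between successive peaks would be compared with the roughly 170 bp period seen by electrophoresis.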

[0111] This invention demonstrates for the first time that the severe methylation defects in double-stranded library construction result from both single-strand gaps and terminal single-strand deletions. Terminal single-strand deletions represent a loss of original information that cannot be recovered, whereas single-strand gaps erase the methylation of most reads. The mechanism is as follows. In double-stranded library construction, the ends of the double-stranded DNA template must first be repaired to form blunt ends or 3'-A overhangs before the linker sequence is ligated. However, the DNA polymerase used for end repair can also initiate synthesis at a gap (extension from a nick): starting at the gap, it displaces or degrades the original template strand downstream in the 5'-3' direction and synthesizes a replacement strand from the supplied unmodified nucleotides, which erases a continuous stretch of DNA methylation information. This erasure cannot be restored later, for example by bioinformatics, and because gap locations are random, no consistent criterion can identify which portion of the methylation information is valid. The pretreatment method provided herein repairs gaps before library construction and thereby solves the problem of widespread methylation loss (Figure 4).

[0112] References:
1. Lo, Y. M. D., Han, D. S. C., Jiang, P., & Chiu, R. W. K. (2021). Epigenetics, fragmentomics, and topology of cell-free DNA in liquid biopsies. Science, 372.
2. Sanchez, C., Roch, B., Mazard, T., Blache, P., Dache, Z., Pastor, B., Pisareva, E., Tanos, R., & Thierry, A. R. (2021). Circulating nuclear DNA structural features, origins, and complete size profile revealed by fragmentomics. JCI Insight, 6(7), e144561. https://doi.org/10.1172/jci.insight.144561
3. Sanchez, C., Snyder, M. W., Tanos, R., Shendure, J., & Thierry, A. R. (2018). New insights into structural features and optimal detection of circulating tumor DNA determined by single-strand DNA analysis. NPJ Genomic Medicine, 3, 31. https://doi.org/10.1038/s41525-018-0069-0
4. Lister, R., O'Malley, R. C., Tonti-Filippini, J., Gregory, B. D., Berry, C. C., Millar, A. H., et al. (2008). Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell, 133, 523-536.
5. Erger, F., Norling, D., Borchert, D., Leenen, E., Habbig, S., Wiesener, M. S., Bartram, M. P., Wenzel, A., Becker, C., Toliat, M. R., Nurnberg, P., Beck, B. B., & Altmuller, J. (2020). cfNOMe - A single assay for comprehensive epigenetic analyses of cell-free DNA. Genome Medicine, 12(1), 54. https://doi.org/10.1186/s13073-020-00750-5

Claims

1. A method for pretreating a biological sample for constructing a sequencing library, comprising: S1: a step of repairing single-strand gaps in DNA fragments contained in the sample using a DNA ligase; and S2: a step of filling terminal gaps in the DNA fragments using a DNA polymerase; wherein the sequencing library is for obtaining, after sequencing, DNA methylation information data of the biological sample, or DNA methylation information and fragmentomics information data of the biological sample; and wherein, in step S1, the single-strand gap is a 3',5'-phosphodiester bond gap.

2. The method according to claim 1, wherein the length of the DNA fragment is 50 bp to 1000 bp.

3. The method according to claim 1, further comprising fragmenting DNA molecules contained in the biological sample to generate DNA fragments before step S1.

4. The method according to claim 1, wherein the ligase has catalytic activity for forming 3',5'-phosphodiester bonds.

5. The method according to claim 1, wherein the ligase is selected from HiFi-Taq ligase, T4 DNA ligase, Taq DNA ligase, E. coli DNA ligase, and T7 DNA ligase.

6. The method according to claim 1, wherein the biological sample is selected from a cell culture medium or a biological sample from nature.

7. The method according to claim 1, wherein the biological sample is selected from peripheral blood, plasma, serum, urine, feces, saliva, cerebrospinal fluid, lymph, alveolar lavage fluid, amniotic fluid, cleavage fluid, cell culture medium, embryo culture medium, microbial culture medium, soil leachate, and bone meal leachate.

8. The method according to claim 7, wherein the DNA fragment is selected from cell-free DNA and fragmented DNA.

9. The method according to claim 1, wherein the DNA fragment content in the sample is 10 ng or less.

10. The method according to claim 1, wherein the DNA methylation comprises cytosine methylation and/or hydroxymethylation.

11. A method for constructing a sequencing library, comprising: 1) preparing a biological sample and pretreating it by the method according to any one of claims 1 to 10 to obtain DNA fragments that do not have a 3',5'-phosphodiester bond gap; and 2) constructing the sequencing library using the DNA fragments; wherein the sequencing library is for obtaining, after sequencing, DNA methylation information data of the biological sample, or DNA methylation information and fragmentomics information data of the biological sample.

12. The method according to claim 11, wherein step 2) further comprises a) blunting both ends of the DNA fragments and adding a linker sequence in which some or all of the cytosines are methylated and/or hydroxymethylated cytosines, and b) converting cytosine in the DNA fragments to uracil by an enzymatic conversion method, wherein the uracil is recognized as thymine and the methylated cytosine is recognized as cytosine in subsequent amplification and sequencing; or wherein step 2) further comprises a) blunting both ends of the DNA fragments and adding a linker sequence, and b) converting methylated cytosine in the DNA fragments to uracil or dihydrouracil by an enzymatic conversion method, wherein the uracil or dihydrouracil is recognized as thymine and unmethylated cytosine is recognized as cytosine in subsequent amplification and sequencing.

13. The method according to claim 12, wherein the enzymatic conversion method is the EM-seq conversion method or the TAPS conversion method.

14. The method according to claim 12, further comprising adding carrier DNA to the biological sample after step a).

15. The method according to claim 14, wherein the carrier DNA does not contain the linker sequence and the fragment size of the carrier DNA is 100 to 500 bp.

16. A method for identifying the location of single-strand gaps in DNA fragments contained in a biological sample, comprising: 1) dividing the biological sample into two parts, biological sample A and biological sample B; 2) preparing a first sequencing library by processing biological sample A by the method according to any one of claims 1 to 10, and preparing a second sequencing library by processing biological sample B by the method according to any one of claims 1 to 10 except that the DNA ligase is not used in step S1; and 3) identifying the location of the single-strand gap based on the difference in DNA methylation information obtained from the first and second sequencing libraries; wherein the single-strand gap is a single-strand 3',5'-phosphodiester bond gap.