Single-cell DNA methylation detection method

By combining single-cell tagging with pooled enzymatic conversion, the method addresses the DNA damage and loss seen in existing techniques, enabling high-throughput, high-sensitivity single-cell DNA methylation detection suitable for high-precision analysis of complex biological tissues.

CN118308491B · Active · Publication Date: 2026-06-23 · BEIJING CHANGPING LAB +1


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: BEIJING CHANGPING LAB
Filing Date: 2023-01-06
Publication Date: 2026-06-23


IPC: C12Q1/6888; C12Q1/6806; C12Q1/6869; C12N15/11





Abstract

The present disclosure provides methods of single-cell DNA methylation analysis that enable high-throughput, whole-genome DNA methylation sequencing of single cells in a variety of tissues.


Description

TECHNICAL FIELD

[0001] The present disclosure provides a single-cell DNA methylation detection method.

TECHNICAL BACKGROUND

[0002] DNA methylation is an important epigenetic modification that plays a key role in genome regulation. The establishment and removal of DNA methylation are closely linked to cell fate determination, embryonic development, and disease onset. Researchers analyze differences in the distribution of DNA methylation across the genomes of different tissues or cell types to explore the relationship between DNA methylation and cell type-specific gene expression, as well as the maintenance and transition of cell states. As research has progressed, it has become clear that sequencing genomic DNA extracted from bulk tissue masks the heterogeneity among cells within the tissue. High-throughput single-cell DNA methylation sequencing is therefore particularly important for high-precision DNA methylation analysis of complex biological tissues such as brain and cancer.

[0003] Methylation sequencing distinguishes methylated from unmethylated cytosines by their different reactivity in a conversion reaction. Approaches include whole-genome DNA methylation sequencing based on bisulfite conversion, the TAPS process (Liu et al., Nature Biotechnology (2019). “Bisulfite-free direct detection of 5-methylcytosine and 5-hydroxymethylcytosine at base resolution”), and the EM-seq process (Williams et al., New England Biolabs, Inc. (2019). “Enzymatic Methyl-seq: The Next Generation of Methylome Analysis”). DNA damage caused by the conversion reactions and DNA loss during purification both limit the use of these methods when the starting material is a single cell.

[0004] Single-cell DNA methylation sequencing methods that perform DNA amplification after bisulfite conversion include scRRBS (Guo et al., Genome Research (2013). “Single-Cell methylome landscapes of mouse embryonic stem cells and early embryos analyzed using reduced representation bisulfite sequencing”), scBS-seq (Smallwood et al., Nature Methods (2014). “Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity”), scWGBS (Farlik et al., Cell Reports (2015). “Single-Cell DNA Methylation Sequencing and Bioinformatic Inference of Epigenomic Cell-State Dynamics”). Due to DNA damage and loss caused by bisulfite reaction and purification, these methods have low genome coverage for single-cell DNA methylation sequencing.

[0005] In the snmC-seq method published in Science in 2017 (Luo et al., Science (2017). “Single-cell methylomes identify neuronal subtypes and regulatory elements in mammalian cortex”), the bisulfite-converted products are amplified with random primers carrying tag sequences, after which single-cell DNA carrying different tag sequences can be pooled for library construction to improve throughput. The drawback is that each cell must be individually bisulfite-converted and its DNA purified before tagging, which increases reagent costs, operational complexity, and laboratory platform requirements, making the technique difficult to adopt widely. The sci-MET method published in Nature Biotechnology in 2018 (Mulqueen et al., Nature Biotechnology (2018). “Highly scalable generation of DNA methylation profiles in single cells”) first uses tagged Tn5 to fragment DNA inside the nucleus; by pooling and re-sorting, cells carrying different tags can then be bisulfite-converted in a single reaction. However, the bisulfite reaction is destructive to DNA and further fragments and degrades the already fragmented DNA, causing a significant loss of information and ultimately a serious reduction in detection sensitivity.

[0006] Patent application WO2021077415 provides a single-cell DNA methylation sequencing technology based on Tn5 transposition and enzymatic conversion. This method, which can be called Cabernet technology, connects the second-generation sequencing adapter carrying methylation modification to both ends of the DNA fragment through Tn5, and distinguishes between cytosine carrying and not carrying methylation modification by enzymatic conversion reaction, achieving single-cell whole-genome DNA methylation detection. This method replaces bisulfite conversion and relies on enzymatic conversion to minimize DNA damage from chemical reactions. However, each cell still needs to be handled individually, making it difficult to meet the high-throughput demand.

[0007] Introducing cell-specific tags with Tn5 and then amplifying the pooled material can achieve high throughput. Examples include, but are not limited to, the sci (single-cell combinatorial indexing) method (Mulqueen et al., Nature Biotechnology (2018). “Highly scalable generation of DNA methylation profiles in single cells”) and the s3 method published in Nature Biotechnology in 2021 (Mulqueen et al., Nature Biotechnology (2021). “High-content single-cell combinatorial indexing”). In the s3 method, Tn5 transposase is loaded with a uracil-containing DNA adapter sequence, so that extension during the gap-filling reaction is incomplete and the adapter can be replaced. This method raises throughput by introducing tag sequences through a Tn5 transposition reaction performed on many cells at once, but it limits the efficiency of the transposition reaction and is not suitable for methods that require sequence conversion, such as methylation sequencing.

[0008] In summary, there is still a lack of high-throughput, low-cost, high-sensitivity single-cell DNA methylation detection methods in the field.

SUMMARY

[0009] The present disclosure provides a method for single-cell methylation detection that achieves high throughput by introducing single-cell tags and then performing enzymatic conversion, library construction, and sequencing on the pooled material.

[0010] In one aspect, the present disclosure provides a method of analyzing methylation characteristics of single-cell genomic DNA, comprising:

[0011] contacting genomic DNA from a single cell with a plurality of transposomes, wherein each transposome comprises a transposase and a transposon DNA, wherein the transposon DNA comprises a double-stranded transposase binding site and an overhang, wherein the overhang comprises a first primer-binding sequence at the 5' end of the overhang and a uracil nucleotide downstream (e.g., on the 3' side) of the first primer-binding sequence and upstream (e.g., on the 5' side) of the transposase binding site, to obtain double-stranded genomic DNA fragments comprising transposon DNA at each end;

[0012] filling the gap between the transposon DNA and the genomic DNA fragments with a uracil-intolerant polymerase to form a first double-stranded extension product of the genomic DNA fragments;

[0013] contacting the first double-stranded extension product with a template-converting oligonucleotide, wherein the template-converting oligonucleotide comprises, from the 5' end to the 3' end: a second primer-binding sequence, a tag sequence, and a transposase-binding site-binding sequence; wherein the tag sequence has a unique nucleotide sequence corresponding to the cell;

[0014] performing an extension reaction using a uracil-intolerant polymerase to obtain a second double-stranded extension product, wherein the second double-stranded extension product constitutes a library of tagged genomic DNA fragments of the cell;

[0015] mixing libraries of tagged genomic DNA fragments obtained from different cells, and processing the mixed libraries to convert cytosine to uracil; and

[0016] amplifying the converted mixed library using first and second primers to generate amplicons.

[0017] In some embodiments, the transposome has two transposases and two transposon DNAs, wherein each transposon DNA includes the double-stranded transposase binding site and the overhang.

[0018] In some embodiments, the plurality of transposomes cleave the genomic DNA into a plurality of double-stranded genomic DNA fragments representing a library of genomic DNA fragments, wherein each double-stranded genomic DNA fragment includes transposon DNA at each end of the genomic DNA fragment.

[0019] In some embodiments, the first double-stranded extension product has a 5' overhang at each end, the overhang containing a first primer-binding sequence.

[0020] In some embodiments, the second double-stranded extension product has a 5' overhang at one end, the overhang containing a first primer-binding sequence.

[0021] In some embodiments, the template-converting oligonucleotide contains a transposase-binding site sequence that can hybridize to a transposase-binding site sequence located at the 3' end of the genomic fragment in the first double-stranded extension product.

[0022] In some embodiments, the methylation includes 5-methylcytosine (5mC) and / or 5-hydroxymethylcytosine (5hmC).

[0023] In some embodiments, the transposase binding site binding sequence contained in the template-converting oligonucleotide includes a locked nucleotide (LNA) modification.

[0024] In some embodiments, the transposase binding site binding sequence contains multiple LNA modifications, such as 2, 3, 4, or 5 LNA modifications.

[0025] In some embodiments, the gap-filling and/or extension steps use four nucleotides (A, T, C, and G), wherein the cytosine nucleotide (e.g., dCTP) contains a modified cytosine that is resistant to the conversion step. In some embodiments, resistance to the conversion step means the ability to withstand a reaction that converts cytosine to uracil (e.g., APOBEC deamination) without sequence alteration. In some embodiments, the modified cytosine is a methylated cytosine, such as 5-methylcytosine (5-mdC) or 5-hydroxymethylcytosine (5-hmdC).
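The effect of conversion-resistant cytosines can be illustrated with a minimal Python sketch. It assumes a simplified model in which every unprotected cytosine is deaminated to uracil and is then read as thymine after amplification; the function name `convert` and the example sequence are hypothetical and are not taken from the disclosure.

```python
def convert(strand, protected):
    """Simulate C-to-U conversion as it appears after amplification (U reads as T).

    strand:    DNA bases (A, C, G, T) of one strand, written 5' to 3'.
    protected: 0-based positions of cytosines assumed to resist conversion,
               e.g. modified cytosines incorporated during gap filling.
    """
    out = []
    for i, base in enumerate(strand):
        if base == "C" and i not in protected:
            out.append("T")  # unprotected cytosine: deaminated, read as T
        else:
            out.append(base)  # protected cytosine or any other base: unchanged
    return "".join(out)


# Hypothetical example: the cytosine at position 1 is protected, the others are not.
example = "ACGTCCGA"
converted = convert(example, protected={1})
print(converted)  # ACGTTTGA
```

In this model, sequence synthesized with modified cytosine nucleotides keeps its original bases through conversion, which is why the filled-in adapter regions would remain usable as primer-binding sites.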

[0026] In some embodiments, the first primer-binding sequence does not contain cytosine nucleotides.

[0027] In some embodiments, the first primer-binding sequence comprises a cytosine nucleotide containing a modified cytosine, wherein the modified cytosine is resistant to the conversion step. In some embodiments, the modified cytosine is a methylated cytosine, such as 5-methylcytosine (5-mdC) or 5-hydroxymethylcytosine (5-hmdC).

[0028] In some embodiments, the transposon DNA contains a plurality of (e.g., 2, 3 or 4) consecutive uracil nucleotides downstream of the first primer binding sequence and upstream of the transposase binding site.

[0029] In some embodiments, the 3' end of the first primer-binding sequence is adjacent to the uracil nucleotide. In some embodiments, the 5' end of the transposase-binding site is adjacent to the uracil nucleotide. In some exemplary embodiments, the first primer-binding sequence and the transposase-binding site are linked by three consecutive uracil nucleotides.
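The layout of the overhang-carrying strand of the transposon DNA can be sketched in Python as follows. The 19-nt Tn5 mosaic-end sequence is used only as an example transposase binding site, and `PRIMER_SITE_1` is a made-up, cytosine-free placeholder; neither sequence is taken from the disclosure. The helper `single_stranded_overhang` assumes, as a simplification, that a uracil-intolerant polymerase copying this strand stops at the first uracil it reaches.

```python
# Example transposase binding site: Tn5 mosaic end (19 nt). Illustration only.
ME = "AGATGTGTATAAGAGACAG"
# Hypothetical cytosine-free first primer-binding sequence (placeholder).
PRIMER_SITE_1 = "GTAGGTGTGAGTGATGGTTGAGGATGT"


def build_overhang_strand(primer_site, n_uracil, binding_site):
    """Return 5'-[primer site][U x n][transposase binding site]-3'."""
    if "C" in primer_site.upper():
        raise ValueError("primer-binding sequence must be cytosine-free in this sketch")
    if n_uracil < 1:
        raise ValueError("at least one uracil is required")
    return primer_site + "U" * n_uracil + binding_site


def single_stranded_overhang(strand):
    """Part of the strand left uncopied if extension stalls at the uracil block.

    The polymerase reads the template from its 3' end, so the first uracil it
    meets is the last 'U' in 5'-to-3' order.
    """
    last_u = strand.rfind("U")
    return strand[: last_u + 1]


strand = build_overhang_strand(PRIMER_SITE_1, 3, ME)
print(strand)
print(single_stranded_overhang(strand))
```

The cytosine check corresponds to the embodiment in which the first primer-binding sequence contains no cytosine; an embodiment using modified cytosines instead would need a different representation.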

[0030] In some embodiments, the mixed library is processed in the presence of carrier DNA to convert cytosine to uracil. In some embodiments, the carrier DNA is selected from dsDNA fragments between 100 bp and 4000 bp in length (e.g., 100 bp to 1000 bp, 100 bp to 800 bp, 100 bp to 600 bp, 100 bp to 500 bp, 100 bp to 400 bp, 200 bp to 400 bp). In some embodiments, the carrier DNA is selected from fragments about 200 bp to 400 bp in length, such as 300 bp. In some embodiments, the carrier DNA is sonicated λ DNA.

[0031] In some embodiments, the method further includes sequencing the amplicon.

[0032] In some embodiments, the first and second primers are used to amplify the converted mixed library to generate amplicons, each containing a sequencing adapter at each end, thereby generating a sequencing library.

[0033] In some embodiments, the sequencing adapter is an Illumina sequencing adapter.

[0034] In some embodiments, the first primer introduces a P7 end adapter, and the second primer introduces a P5 end adapter. In some embodiments, the first primer comprises (i) a first portion at the 3' end containing a sequence capable of hybridizing with the first primer-binding sequence, and (ii) a second portion at the 5' end of the first portion containing the P7 sequence, and may also contain other desired functional sequences, such as an index sequence. In some embodiments, the second primer comprises (i) a first portion at the 3' end containing a sequence capable of hybridizing with the second primer-binding sequence, and (ii) a second portion at the 5' end of the first portion containing the P5 sequence, and may also contain other desired functional sequences, such as an index sequence.

[0035] In some embodiments, the first primer introduces a P5 end adapter, and the second primer introduces a P7 end adapter. In some embodiments, the first primer comprises (i) a first portion at the 3' end containing a sequence capable of hybridizing with the first primer-binding sequence, and (ii) a second portion at the 5' end of the first portion containing the P5 sequence, and may also contain other desired functional sequences, such as an index sequence. In some embodiments, the second primer comprises (i) a first portion at the 3' end containing a sequence capable of hybridizing with the second primer-binding sequence, and (ii) a second portion at the 5' end of the first portion containing the P7 sequence, and may also contain other desired functional sequences, such as an index sequence.

[0036] In some embodiments, the transposase is a Tn5 transposase, a Mu transposase, a Tn7 transposase, or an IS5 transposase.

[0037] In some embodiments, the uracil-intolerant polymerase is selected from Q5 polymerase, Deep Vent DNA polymerase, Phusion high-fidelity polymerase, KAPA high-fidelity polymerase, Phanta polymerase, or any combination thereof.

[0038] In some embodiments, the amplification step using the first and second primers is performed using a uracil-resistant polymerase. In some embodiments, the uracil-resistant polymerase is selected from Q5U polymerase, KAPA U+ polymerase, Phanta Uc polymerase, or any combination thereof.

[0039] In some embodiments, the bound transposase is removed from the double-stranded genomic DNA fragment before gap filling and extension.

[0040] In some embodiments, the method further includes a step after the extension step but before the transformation step: purifying the reaction medium containing the second double-stranded extension product, for example, by means of a DNA spin-column or bead-based DNA purification.

[0041] In some embodiments, the method further includes a step following the amplification step: purifying the reaction medium containing the amplicon, for example, by using a DNA spin-column or bead-based DNA purification.

[0042] In some embodiments, the process of converting cytosine to uracil is an enzymatic conversion. In some embodiments, the enzymatic conversion includes an APOBEC deamination reaction. In some embodiments, the enzymatic conversion includes the use of T4-BGT enzyme and APOBEC3A enzyme. In some embodiments, the enzymatic conversion includes the use of TET2 enzyme and APOBEC3A enzyme.

DESCRIPTION OF THE DRAWINGS

[0043] The foregoing and other features and advantages of the invention will be more fully understood from the following detailed description of illustrative embodiments in conjunction with the accompanying drawings, in which:

[0044] Figure 1 shows a schematic diagram of the workflow of an exemplary embodiment. First, single cells are sorted and lysed. Tn5 transposition then generates single-cell genomic DNA fragments, which are labeled with cell-specific tag sequences to distinguish them from those of other cells. Single-cell DNA labeled with different tag sequences can then be pooled for enzymatic conversion, library construction, and sequencing.

[0045] Figure 2 shows a schematic diagram of the reaction principle by which single-cell tags are added in an exemplary embodiment.

[0046] Figure 3 shows the results of splitting sequencing data by single-cell tag. The left panel shows the distribution of sequencing data volume across single cells after splitting by tag: the horizontal axis represents the individual cells from the same 96-well plate that were pooled for library preparation and sequencing, and the vertical axis represents the number of sequencing reads assigned to each cell. The right panel shows the alignment to the human and mouse reference genomes of the split reads obtained when human and mouse DNA tagged in the same 96-well plate was pooled for library preparation and sequencing.

[0047] Figure 4 shows quality-control and correlation results. A) Lambda DNA fragments without any methylation modification (labeled C in the figure), pUC19 DNA fragments carrying 5mC at CpG sites (labeled 5mC in the figure), and 5hmC DNA fragments (labeled 5hmC in the figure) were each tested for 5mC and 5hmC according to the TSO-Cabernet protocol; the accuracy results are shown as a bar chart. B) The heatmap shows the Pearson correlation of genomic 5mC modification between single-cell samples and multi-cell samples detected by TSO-Cabernet technology (TSO_1~TSO_5) and Cabernet technology (Cabernet_1~Cabernet_3, Bulk_Cabernet_1, Bulk_Cabernet_2). Red indicates strong correlation.

DETAILED DESCRIPTION

[0048] According to one aspect, a method is provided for analyzing the methylation signature of single-cell genomic DNA by contacting genomic DNA from a single cell with multiple transposomes. Each transposome has two transposases (e.g., Tn5 transposase) and two transposon DNAs, each transposase binding to a transposon DNA so that the transposome is a dimer of transposase/transposon DNA complexes. Each transposon DNA includes a double-stranded transposase binding site (e.g., a double-stranded 19 bp Tnp binding site) and an overhang (e.g., a 5' overhang). The overhang includes a first primer-binding sequence at its 5' end and contains uracil nucleotides (e.g., one or more consecutive uracil nucleotides) downstream (e.g., on the 3' side) of the first primer-binding sequence and upstream (e.g., on the 5' side) of the transposase binding site. The overhang may have any length suitable for including the first primer-binding sequence or other desired functional sequences.

[0049] The transposomes bind randomly to target sites along the double-stranded genomic DNA and cleave it into multiple double-stranded fragments, each fragment having a first complex linked to the upper strand via a transposase binding site and a second complex linked to the lower strand via a transposase binding site. Thus, the transposon DNA (i.e., the transposase binding site together with an overhang containing a first primer-binding sequence) is attached to each 5' end of the double-stranded fragment. According to one aspect, the transposase is removed from the complex.

[0050] The transposon DNA is joined to the double-stranded genomic DNA fragment, and a single-stranded gap remains between one strand of the double-stranded genomic DNA fragment and one strand of the transposon DNA. A uracil-intolerant polymerase is used to fill the gap between the transposon DNA and the genomic DNA fragment, creating a double-stranded link between the double-stranded genomic DNA fragment and the double-stranded transposon DNA to form a first double-stranded extension product of the genomic DNA fragment. The first double-stranded extension product has a 5' overhang at each end, the overhang containing a first primer-binding sequence. According to one aspect, a uracil nucleotide is located downstream (e.g., on the 3' side) of the first primer-binding sequence and upstream (e.g., on the 5' side) of the transposase binding site. During gap filling with a uracil-intolerant polymerase, this uracil prevents the 3' ends of both strands of the first double-stranded extension product from extending further towards the first primer-binding sequence, so that the first double-stranded extension product retains a 5' overhang at each end. According to one aspect, the first double-stranded extension product comprises an upper strand and a lower strand. The upper strand comprises, from the 5' end to the 3' end: a first primer-binding sequence, a uracil nucleotide, a transposase binding site, the upper strand of the genomic DNA fragment, the filled gap, and a transposase binding site. The lower strand comprises, from the 5' end to the 3' end: a first primer-binding sequence, a uracil nucleotide, a transposase binding site, the lower strand of the genomic DNA fragment, the filled gap, and a transposase binding site.

[0051] The first double-stranded extension product is contacted with a template-converting oligonucleotide, wherein the template-converting oligonucleotide comprises, from its 5' end to its 3' end: a second primer-binding sequence, a tag sequence, and a transposase-binding site-binding sequence; wherein the tag sequence has a unique nucleotide sequence corresponding to the cell. According to one aspect, by introducing the tag sequence, the genomic DNA of different cells is labeled with different unique tag sequences to distinguish them from each other. According to one aspect, the transposase-binding site-binding sequence is capable of hybridizing to the transposase-binding site sequence located at the 3' end of the genomic fragment in the first double-stranded extension product.
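One way to obtain a distinct tag for every cell, and to assemble the corresponding oligonucleotides in the 5'-to-3' order described above, is sketched below in Python. The tag length, the minimum Hamming distance, the greedy selection strategy, and the placeholder sequences `SECOND_PRIMER_SITE` and `BINDING_SEQ` are all assumptions made for illustration; the disclosure does not specify how tag sequences are designed, and LNA modifications are not represented here.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def design_tags(n, length=8, min_dist=3):
    """Greedily pick n tags whose pairwise Hamming distance is at least min_dist."""
    tags = []
    for candidate in product("ACGT", repeat=length):
        tag = "".join(candidate)
        if all(hamming(tag, t) >= min_dist for t in tags):
            tags.append(tag)
            if len(tags) == n:
                return tags
    raise ValueError("not enough tags for the requested length and distance")


# Hypothetical placeholder sequences, not taken from the disclosure.
SECOND_PRIMER_SITE = "GTGAGTGGATTGGAGTTGAGATG"
BINDING_SEQ = "AGATGTGTATAAGAGACAG"

# One well of a 96-well plate per cell: A1..H12.
wells = [f"{row}{col}" for row in "ABCDEFGH" for col in range(1, 13)]
tags = design_tags(len(wells))

# 5'-[second primer-binding sequence][tag][binding sequence]-3'
oligos = {well: SECOND_PRIMER_SITE + tag + BINDING_SEQ for well, tag in zip(wells, tags)}
print(oligos["A1"])
```

Requiring a minimum distance between tags means that a single sequencing error cannot turn one cell's tag into another's; other design criteria, such as balanced base composition, could be layered on top.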

[0052] An extension reaction is performed using a uracil-intolerant polymerase to obtain a second double-stranded extension product; wherein the second double-stranded extension product has a 5' overhang at one end, the overhang containing a first primer-binding sequence; wherein the second double-stranded extension product constitutes a library of tagged genomic DNA fragments of the cell. According to one aspect, a template-converting oligonucleotide anneals to one strand of the first double-stranded extension product and serves as a template, allowing that strand of the first double-stranded extension product to be extended (for example by PCR extension) so that it acquires the tag sequence and the second primer-binding sequence, thereby tagging the genomic DNA fragment. Those skilled in the art will understand that, according to the base-pairing rules, the tag sequence and the second primer-binding sequence in the tagged genomic DNA strand obtained by the extension reaction are in fact complementary to the corresponding sequences in the template-converting oligonucleotide. According to one aspect, that strand of the first double-stranded extension product also serves as a template, so that the annealed template-converting oligonucleotide is extended from its 3' end. Uracil nucleotides are located downstream (e.g., on the 3' side) of the first primer-binding sequence and upstream (e.g., on the 5' side) of the transposase-binding site. When a uracil-intolerant polymerase is used for the extension reaction, this prevents the 3' end of the template-converting oligonucleotide from extending further towards the first primer-binding sequence, thereby forming a second double-stranded extension product with a 5' overhang at one end. According to one aspect, the second double-stranded extension product comprises an upper strand and a lower strand. The upper strand comprises, from the 5' end to the 3' end: a first primer-binding sequence, a uracil nucleotide, a transposase-binding site, a genomic DNA fragment, a transposase-binding site, a tag sequence, and a second primer-binding sequence. The lower strand comprises, from the 5' end to the 3' end: a second primer-binding sequence, a tag sequence, a transposase-binding site, an extension strand templated by the genomic DNA fragment, and a transposase-binding site.

[0053] Libraries of tagged genomic DNA fragments obtained from different cells are mixed, and the mixed libraries are processed to convert cytosine to uracil. According to one aspect, the above steps are performed separately on genomic DNA from different single cells to obtain a second double-stranded extension product for each cell, each carrying a unique tag sequence; these second double-stranded extension products are then mixed. According to one aspect, mixing single-cell genomic DNA carrying different tag sequences before conversion and library construction can significantly improve throughput.
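Because the libraries are pooled before conversion and sequencing, reads must later be assigned back to their cell of origin by tag. A minimal Python sketch of that splitting step follows. It assumes that the tag occupies the first `tag_len` bases of each read and that only exact tag matches are accepted; the read layout, tag sequences, and cell names are hypothetical and not taken from the disclosure.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_cell, tag_len):
    """Group reads by cell according to the tag at the start of each read.

    Reads whose tag is not in tag_to_cell are collected under 'unassigned'.
    The tag is trimmed off, leaving only the remaining insert sequence.
    """
    by_cell = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        cell = tag_to_cell.get(tag, "unassigned")
        by_cell[cell].append(insert)
    return dict(by_cell)


# Hypothetical tags and reads for two cells.
tag_to_cell = {"AAGT": "cell_A1", "GGTA": "cell_A2"}
reads = ["AAGTTTGATTGA", "GGTATGGTTAGA", "AAGTGATTTGGA", "TTTTGATTGATG"]

split = demultiplex(reads, tag_to_cell, tag_len=4)
for cell, inserts in sorted(split.items()):
    print(cell, len(inserts))
```

Counting the reads per cell in this way gives the kind of per-cell read distribution described for Figure 3.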

[0054] Amplicons are generated by amplifying the converted mixed library using first and second primers. According to one aspect, the first and second primers specifically hybridize to the first primer-binding sequences and second primer-binding sequences at the two ends of a tagged genomic DNA fragment, respectively. According to one aspect, the first and second primers are mixed with the mixed library, and the tagged genomic DNA fragment is amplified. According to one aspect, sequencing adapters are introduced at both ends of the tagged genomic DNA fragment using the first and second primers, respectively, thereby generating a sequencing library. According to one aspect, the amplicons, each including a sequencing adapter, are sequenced, for example using a high-throughput sequencing method known to those skilled in the art.

[0055] Unless otherwise indicated, certain embodiments or features thereof may be implemented using conventional techniques of molecular biology, microbiology, recombinant DNA, etc., which are within the capabilities of a person skilled in the art. Such techniques are well explained in the literature. See, for example, Sambrook, Fritsch and Maniatis, *Molecular Cloning: A Laboratory Manual*, 2nd edition (1989); *Oligonucleotide Synthesis* (M.J. Gait, ed., 1984); *Animal Cell Culture* (R.I. Freshney, ed., 1987); the series *Methods in Enzymology* (Academic Press, Inc.); *Gene Transfer Vectors for Mammalian Cells* (J.M. Miller and M.P. Calos, eds., 1987); *Handbook of Experimental Immunology* (D.M. Weir and C.C. Blackwell, eds.); *Current Protocols in Molecular Biology* (F.M. Ausubel, R. Brent, R.E. Kingston, D.D. Moore, J.G. Seidman, J.A. Smith, and K. Struhl, eds., 1987); *Current Protocols in Immunology* (J.E. Coligan, A.M. Kruisbeek, D.H. Margulies, E.M. Shevach, and W. Strober, eds., 1991); *Annual Review of Immunology*; and feature articles in journals such as *Advances in Immunology*. All patents, patent applications, and publications mentioned above and below are incorporated herein by reference.

[0056] The terminology and notation used herein for nucleic acid chemistry, biochemistry, genetics, and molecular biology follow those of standard treatises and texts in the field (e.g., Kornberg and Baker, DNA Replication, 2nd ed. (W.H. Freeman, New York, 1992); Lehninger, Biochemistry, 2nd ed. (Worth Publishers, New York, 1975); Strachan and Read, Human Molecular Genetics, 2nd ed. (Wiley-Liss, New York, 1999); Eckstein, ed., Oligonucleotides and Analogs: A Practical Approach (Oxford University Press, New York, 1991); Gait, ed., Oligonucleotide Synthesis: A Practical Approach (IRL Press, Oxford, 1984); etc.).

[0057] Definitions of terms

[0058] As used herein, “DNA methylation” typically involves the modification of a cytosine base in a DNA molecule or fragment to 5-methylcytosine (5mC). In addition, although less frequently than 5mC, a minority of cytosine bases are modified to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), 5-carboxycytosine (5caC), etc. When methylation is mentioned herein, it refers to any such modification of a cytosine base, whether to 5mC or to 5hmC, 5fC, 5caC, etc., unless the context otherwise requires. In some embodiments, the methylation described herein includes 5-methylcytosine (5mC) and/or 5-hydroxymethylcytosine (5hmC).

[0059] As used herein, "methylation signature" refers to information about the methylation status of a DNA molecule or fragment, including but not limited to methylation sites, methylation levels, and methylation types (5mC or 5hmC). As used herein, "methylation level," also known as "degree of methylation," refers to the proportion (or frequency) of copies of a specific methylation site in a sample that are methylated. Methylation detection is typically based on the following principle: either the methylated or the unmethylated cytosine is converted to uracil (U) or to a base that is substantially equivalent to uracil in base pairing (e.g., dihydrouracil, DHU). During subsequent amplification, the uracil pairs with adenine (A) just as thymine (T) does, so that the converted cytosine at that site appears as thymine in the detection results (e.g., sequencing results). By comparison with a reference sequence, it can then be determined whether the cytosine in the DNA molecule or fragment is methylated. This reference sequence can be a sequence from the same sample that has not undergone the above conversion, or a corresponding sequence from a healthy population. In addition, as described below, there are other methods to distinguish between different methylation types (e.g., 5mC and 5hmC).
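The methylation level defined above can be computed from converted reads with a short Python sketch. It assumes a conversion scheme in which unmethylated cytosine is converted and therefore read as T while methylated cytosine is read as C, and it assumes gap-free forward-strand alignments given as (start position, read sequence) pairs; the function name and the example data are hypothetical.

```python
def methylation_levels(reference, aligned_reads):
    """Per-site methylation level at reference cytosines.

    reference:     unconverted reference sequence (forward strand).
    aligned_reads: iterable of (start, sequence) pairs, 0-based, gap-free.
    Returns {position: methylated_reads / informative_reads}.
    """
    counts = {}  # position -> [reads showing C, reads showing C or T]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(reference) or reference[pos] != "C":
                continue  # only reference cytosines are informative
            if base not in "CT":
                continue  # skip sequencing errors or variants
            site = counts.setdefault(pos, [0, 0])
            site[0] += base == "C"  # C retained: counted as methylated
            site[1] += 1
    return {pos: m / n for pos, (m, n) in sorted(counts.items())}


# Hypothetical reference with cytosines at positions 1 and 4, covered by three reads.
reference = "ACGTCG"
reads = [(0, "ACGTTG"), (0, "ACGTCG"), (0, "ATGTTG")]
levels = methylation_levels(reference, reads)
print(levels)
```

For a single cell, each site is usually covered by very few reads, so per-site values of this kind are typically aggregated over regions before cells are compared; that aggregation is outside this sketch.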

[0060] As used herein, “single cell” means one cell. A single cell suitable for use in the methods described herein can be obtained from the target tissue or from a biopsy, blood sample, or cell culture. Alternatively, cells from a specific organ, tissue, tumor, neoplasm, etc., can be obtained and used in the methods described herein. Furthermore, cells from any population can generally be used in the methods, such as populations of prokaryotic or eukaryotic single-celled organisms (including bacteria or yeast). Single-cell suspensions can be obtained using standard methods known in the art, including, for example, enzymatic digestion of the proteins that bind cells together in a tissue sample using trypsin or papain, release of adherent cells from a culture, or mechanical separation of cells from a sample. Single cells can be placed in any suitable reaction vessel that allows each cell to be processed individually, for example a 96-well plate in which each single cell is placed in its own well.

[0061] As used herein, the term "genome" is defined as the set of genes carried by an individual, cell, or organelle. As used herein, the term "genomic DNA" is defined as DNA material containing some or all of the set of genes carried by an individual, cell, or organelle.

[0062] As used herein, the term "nucleotide" refers to a nucleoside having one or more phosphate groups linked to a sugar moiety by an ester bond. Exemplary nucleotides include nucleosides with monophosphate, diphosphate, and triphosphate. The terms "polynucleotide," "oligonucleotide," and "nucleic acid molecule" are used interchangeably herein and refer to polymers of nucleotides (deoxyribonucleotides or ribonucleotides) of any length linked together by phosphodiester bonds between their 5' and 3' carbon atoms. The following are non-limiting examples of polynucleotides: genes or gene fragments (e.g., probes, primers, EST, or SAGE tags), exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA having any sequence, isolated RNA having any sequence, nucleic acid probes, and primers. Polynucleotides may contain modified nucleotides, such as methylated nucleotides and nucleotide analogs. The term also refers to double-stranded and single-stranded molecules. Unless otherwise stated or required, any embodiment of the invention comprising polynucleotides includes both double-stranded forms and each of two complementary single-stranded forms known or predicted to constitute a double-stranded form. A polynucleotide consists of a specific sequence of four nucleotide bases: adenine (A); cytosine (C); guanine (G); thymine (T); and, when the polynucleotide is RNA, uracil (U), corresponding to thymine. Therefore, the term polynucleotide sequence is a letter representation of a polynucleotide molecule. This letter representation can be entered into a database in a computer with a central processing unit and used for bioinformatics applications such as functional genomics and homology searches.

[0063] The nucleotides described herein include both natural nucleotides and nucleotide analogs or modified nucleotides. The terms "nucleotide analog" and "modified nucleotide" refer to non-standard nucleotides, including ribonucleotides or deoxyribonucleotides that are not naturally occurring. In some exemplary embodiments, nucleotide analogs may be modified at any position to alter certain chemical properties of the nucleotide while retaining its ability to perform its intended function. Examples of positions at which nucleotides can be derivatized include the 5-position, such as 5-(2-amino)propyluridine, 5-bromouridine, 5-propynyluridine, 5-propenyluridine, etc.; the 6-position, such as 6-(2-amino)propyluridine; and the 8-position of adenosine and/or guanosine, such as 8-bromoguanosine, 8-chloroguanosine, 8-fluoroguanosine, etc. Nucleotide analogs also include deaza nucleotides, such as 7-deaza-adenosine; O- and N-modified (e.g., alkylated, such as N6-methyladenosine, or other modifications known in the art) nucleotides; and other heterocyclically modified nucleotide analogs, such as those described in Herdewijn, Antisense Nucleic Acid Drug Dev., 2000 Aug. 10(4):297-310.

[0064] Nucleotide analogs may also contain modifications to the sugar moiety of the nucleotide. For example, the 2'-OH group may be replaced by a group selected from H, OR, R, F, Cl, Br, I, SH, SR, NH2, NHR, NR2, or COOR, wherein R is a substituted or unsubstituted C1-C6 alkyl, alkenyl, alkynyl, aryl, etc. Other possible modifications include those described in U.S. Patents 5,858,988 and 6,291,438.

[0065] The phosphate group of a nucleotide can also be modified, for example, by substituting one or more oxygen atoms of the phosphate group with sulfur (e.g., phosphorothioate), or by making other substitutions that allow the nucleotide to perform its intended function (such as those described, for example, in Eckstein, Antisense Nucleic Acid Drug Dev. 2000 Apr. 10(2): 117-21, Rusckowski et al. Antisense Nucleic Acid Drug Dev. 2000 Oct. 10(5): 333-45, Stein, Antisense Nucleic Acid Drug Dev. 2001 Oct. 11(5): 317-25, Vorobjev et al. Antisense Nucleic Acid Drug Dev. 2001 Apr. 11(2): 77-85, and U.S. Patent No. 5,684,143). Some of the above modifications (e.g., phosphate group modifications) reduce the rate of hydrolysis of, for example, polynucleotides containing such analogs, in vivo or in vitro.

[0066] As used herein, the terms “complementary” and “complementarity” refer to nucleotide sequences related by the base pairing rules. For example, the sequence 5'-AGT-3' is complementary to the sequence 5'-ACT-3'. Complementarity can be partial or complete. Partial complementarity occurs when one or more nucleic acid bases do not match according to the base pairing rules. Complete or full complementarity occurs when each nucleic acid base matches another base according to the base pairing rules. The degree of complementarity between nucleic acid strands has a significant impact on the efficiency and strength of hybridization between them.
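As an illustration of the base pairing rules described above, the following sketch checks the 5'-AGT-3' / 5'-ACT-3' example and reports partial complementarity as a fraction. The helper names `reverse_complement` and `complementarity` are hypothetical, and the sketch only handles equal-length DNA sequences aligned antiparallel without gaps.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs antiparallel with seq, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def complementarity(seq_a: str, seq_b: str) -> float:
    """Fraction of positions at which seq_a pairs with seq_b.

    Both sequences are written 5'->3' and are aligned antiparallel.
    1.0 means complete complementarity; lower values mean partial.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    partner = reverse_complement(seq_b)
    matches = sum(a == p for a, p in zip(seq_a.upper(), partner))
    return matches / len(seq_a)


if __name__ == "__main__":
    print(reverse_complement("AGT"))
    print(complementarity("AGT", "ACT"))
```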

[0067] As used herein, the term "hybridization" refers to the pairing of complementary nucleic acids. Hybridization and hybridization strength (i.e., the strength of association between nucleic acids) are influenced by factors such as the degree of complementarity between the nucleic acids, the stringency of the conditions involved, the Tm of the resulting hybrid, and the G:C ratio within the nucleic acids. A single molecule that contains paired complementary nucleic acids within its structure is said to be "self-hybridized".

[0068] As used herein, the term "Tm" refers to the melting temperature of a nucleic acid. The melting temperature is the temperature at which half of a population of double-stranded nucleic acid molecules dissociates into single strands. Equations for calculating the Tm of nucleic acids are well known in the art. As indicated by standard references, when the nucleic acid is in an aqueous solution of 1 M NaCl, a simple estimate of the Tm value can be calculated using the following formula: Tm = 81.5 + 0.41 (%G + C) (see, for example, Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid Hybridization (1985)). Other references include more complex calculations of Tm that take structural and sequence characteristics into account.
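The simple estimate quoted above can be evaluated directly. The sketch below is only an arithmetic illustration of Tm = 81.5 + 0.41 (%G + C) for the stated 1 M NaCl condition; the function names are hypothetical, and the more complex calculations mentioned in the text are not modeled.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a nucleic acid sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def tm_simple(seq: str) -> float:
    """Simple Tm estimate in degrees C: 81.5 + 0.41 * (%G+C), at 1 M NaCl."""
    return 81.5 + 0.41 * gc_percent(seq)


if __name__ == "__main__":
    # A sequence with 50% G+C content.
    print(round(tm_simple("ATGCATGC"), 1))
```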

[0069] As used herein, the term "stringency" refers to the conditions under which nucleic acid hybridization is performed, including temperature, ionic strength, and the presence of other compounds such as organic solvents.

[0070] When used for nucleic acid hybridization, “low stringency conditions” include conditions equivalent to the following: when using probes approximately 500 nucleotides in length, binding or hybridization is performed at 42°C in a solution consisting of 5x SSPE (43.8 g/L NaCl, 6.9 g/L NaH2PO4(H2O), and 1.85 g/L EDTA, pH adjusted to 7.4 with NaOH), 0.1% SDS, 5x Denhardt's reagent (each 500 ml of 50x Denhardt's reagent contains 5 g Ficoll (Type 400, Pharmacia) and 5 g BSA (Fraction V; Sigma)), and 100 μg/ml denatured salmon sperm DNA; followed by washing at 42°C in a solution containing 5x SSPE and 0.1% SDS.

[0071] When used for nucleic acid hybridization, “moderately stringent conditions” include conditions equivalent to the following: when using probes approximately 500 nucleotides in length, binding or hybridization is performed at 42°C in a solution consisting of 5x SSPE (43.8 g/L NaCl, 6.9 g/L NaH2PO4(H2O), and 1.85 g/L EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5x Denhardt's reagent, and 100 μg/ml denatured salmon sperm DNA; followed by washing at 42°C in a solution containing 1.0x SSPE and 1.0% SDS.

[0072] When used for nucleic acid hybridization, “highly stringent conditions” include conditions equivalent to the following: when using probes approximately 500 nucleotides in length, binding or hybridization is performed at 42°C in a solution consisting of 5x SSPE (43.8 g/L NaCl, 6.9 g/L NaH2PO4(H2O), and 1.85 g/L EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5x Denhardt's reagent, and 100 μg/ml denatured salmon sperm DNA; followed by washing at 42°C in a solution containing 0.1x SSPE and 1.0% SDS.

[0073] When hybridization occurs between two single-stranded polynucleotides in an antiparallel configuration, the reaction is called "annealing," and those polynucleotides are described as "complementary." If hybridization can occur between one strand of the first polynucleotide and one strand of the second polynucleotide, then the double-stranded polynucleotide is complementary or homologous to the other polynucleotide. According to the generally accepted rules of base pairing, complementarity or homology (the degree to which one polynucleotide is complementary to another) can be quantified based on the proportion of bases in the opposing strands that are expected to form hydrogen bonds with each other.

[0074] As used herein, the term "amplification" refers to the process of forming one or more additional copies of a specific polynucleotide. Amplification methods include PCR methods known to those skilled in the art, and also include rolling circle amplification (Blanco et al., J. Biol. Chem., 264, 8935-8940, 1989), hyperbranched rolling circle amplification (Lizardi et al., Nat. Genetics, 19, 225-232, 1998), and loop-mediated isothermal amplification (Notomi et al., Nuc. Acids Res., 28, e63, 2000), each of which is hereby incorporated in its entirety by reference. These methods are known and widely practiced in the art. The term "amplification reagent" may include primers, nucleic acid templates, and amplification enzymes (e.g., polymerases), and may also include other reagents required for amplification, such as nucleotides (e.g., deoxyribonucleoside triphosphates), buffers, etc. Typically, amplification reagents are placed and contained together with other reaction components in a reaction vessel (test tube, microwell, etc.).

[0075] As used herein, the term "primer" typically refers to a natural or synthetic oligonucleotide that, upon forming a duplex with a polynucleotide template, can act as an initiation site for nucleic acid synthesis (such as a sequencing primer) and is extended from its 3' end along the template to form an extended duplex. The sequence of the nucleotides added during extension is determined by the sequence of the template polynucleotide. Typically, primers are extended by a DNA polymerase. Primer lengths are typically 3 to 36 nucleotides, 5 to 24 nucleotides, 14 to 36 nucleotides, or 17 to 30 nucleotides. A "primer" can be considered a short polynucleotide, typically with a free 3'-OH group, which binds by hybridization to a target or template that may be present in the target sample and subsequently promotes the polymerization of a polynucleotide complementary to the target.

[0076] Obtaining single cell genomic DNA

[0077] According to one aspect, the method provided in this disclosure relates to obtaining genomic DNA from a single cell.

[0078] In some embodiments, cells are identified, and then individual cells are isolated. Cells within the scope of this disclosure include any type of cell in which understanding the DNA content is useful to those skilled in the art. Cells according to this disclosure include any type of cancer cell, hepatocyte, oocyte, embryonic cell, stem cell, iPS cell, ES cell, neuron, erythrocyte, melanocyte, astrocyte, germ cell, oligodendrocyte, kidney cell, etc.

[0079] In some embodiments, single cells suitable for use in the methods described herein can be obtained from target tissue or from biopsies, blood samples, or cell cultures. Alternatively, cells from specific organs, tissues, tumors, neoplasms, etc., can be obtained and used in the methods described herein. Single-cell suspensions can be obtained using standard methods known in the art, including, for example, enzymatic digestion of cell-binding proteins in a tissue sample using trypsin or papain, release of adherent cells from a culture, or mechanical separation of cells from a sample. Single cells can be placed in any suitable reaction vessel that allows for individual processing of single cells, for example, a 96-well plate in which each single cell is placed in its own well.

[0080] Methods for manipulating single cells are known in the art, including fluorescence-activated cell sorting (FACS), flow cytometry (Herzenberg, PNAS USA 76:1453-55, 1979), micromanipulation, and the use of semi-automatic cell pickers (e.g., the Quixell™ cell transfer system from Stoelting Co.). For example, individual cells can be selected based on characteristics detectable by microscopy, such as location, morphology, or reporter gene expression. Additionally, a combination of gradient centrifugation and flow cytometry can be used to increase separation or sorting efficiency.

[0081] Once the desired cells are identified, they are lysed using methods known to those skilled in the art to release cell contents, including DNA. The cell contents are then contained in a container or collection space. In some embodiments, cell contents, such as genomic DNA, can be released from the cells by lysing them. Lysis can be achieved, for example, by heating the cells, or by using a detergent or other chemical method, or a combination of these methods. Any suitable lysis method known in the art can be used. In some embodiments, the cells are heated in a cell lysis buffer containing a detergent. In some embodiments, heating the cells at 72°C for 2 minutes in the presence of Tween-20 is sufficient to lyse the cells; alternatively, the cells may be heated in water to 65°C for 10 minutes (Esumi et al., Neurosci Res 60(4):439-51 (2008)); or heated in PCR buffer II (Applied Biosystems) supplemented with 0.5% NP-40 to 70°C for 90 seconds (Kurimoto et al., Nucleic Acids Res 34(5):e42 (2006)); or lysis may be achieved using a protease such as proteinase K or a chaotropic salt such as guanidine isothiocyanate (US Publication No. 2007/0281313). The resulting cell lysate may be used directly according to the methods described herein, for example, by adding a reaction mixture to the cell lysate. Alternatively, the cell lysate may be aliquoted into two or more volumes, such as into two or more containers, tubes, or regions, wherein a portion of the cell lysate is contained in each volume of the container, tube, or region, using methods known to those skilled in the art. The genomic DNA contained in each container, tube, or region can then be processed using the methods described herein.

[0082] Transposition

[0083] According to one aspect, the method provided in this disclosure relates to fragmenting genomic DNA using transposases. The method uses a transposase or transposome to fragment an original or starting nucleic acid sequence (such as genomic DNA) and attaches an overhang sequence containing a first primer-binding sequence to each end of a cleavage site or break site, thereby producing a set of fragments (each member of the set having the same overhang sequence).

[0084] In some implementations, genomic DNA is cleaved into double-stranded fragments using a plurality of transposomes or a transposome library. Each transposome in the plurality or library is a dimer of a transposase bound to transposon DNA; that is, each transposome comprises two separate transposon DNAs. Each transposon DNA in a transposome includes a transposase binding site and an overhang sequence containing a first primer-binding sequence. Thus, a number of fragments of the original nucleic acid sequence are generated by the transposome library, wherein each fragment has the same overhang sequence containing a first primer-binding sequence at each end of the fragment.

[0085] In some embodiments, the overhang sequence may have any length suitable for including the first primer-binding sequence or other desired functional sequences. In some exemplary embodiments, the length of the overhang sequence does not exceed 60 bp, for example, not exceeding 55 bp or 50 bp. In some exemplary embodiments, the length of the overhang sequence is at least 4 bp, for example, at least 5 bp, at least 8 bp, at least 10 bp, at least 12 bp, or at least 15 bp. In some exemplary embodiments, the length of the overhang sequence is 10-50 bp.

[0086] In some embodiments, the first primer-binding sequence does not contain cytosine nucleotides. According to one aspect, the first primer-binding sequence, which does not contain cytosine nucleotides, is able to withstand reactions that convert cytosine to uracil (e.g., APOBEC deamination) without undergoing sequence alteration.

[0087] In some embodiments, the first primer-binding sequence comprises a cytosine nucleotide containing a modified cytosine, wherein the modified cytosine is resistant to the transformation step. In some embodiments, the modified cytosine is a methylated cytosine, such as 5-methylcytosine (5-mdC) or 5-hydroxymethylcytosine (5-hmdC).
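The two design options above (a first primer-binding sequence with no cytosine, or one whose cytosines are protected by modification) can be checked in silico with a minimal sketch. The example sequences, the function names `deaminate` and `survives_conversion`, and the position-based model of protection are all hypothetical illustrations, not sequences or procedures taken from this disclosure.

```python
from typing import FrozenSet


def deaminate(seq: str, protected: FrozenSet[int] = frozenset()) -> str:
    """Model C-to-U conversion: every C becomes U unless its 0-based
    position is listed in `protected` (e.g. a methylated cytosine)."""
    return "".join(
        "U" if base == "C" and i not in protected else base
        for i, base in enumerate(seq.upper())
    )


def survives_conversion(seq: str, protected: FrozenSet[int] = frozenset()) -> bool:
    """True if the sequence is unchanged by the modeled conversion."""
    return deaminate(seq, protected) == seq.upper()


if __name__ == "__main__":
    cytosine_free = "GGTGTAGTGGGTTTGG"   # hypothetical, contains no C
    with_cytosine = "GGTCAGT"            # hypothetical, C at position 3
    print(survives_conversion(cytosine_free))
    print(survives_conversion(with_cytosine))
    print(survives_conversion(with_cytosine, frozenset({3})))
```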

[0088] In some embodiments, the transposon DNA comprises a plurality of (e.g., 2, 3, or 4) consecutive uracil nucleotides downstream of the first primer-binding sequence and upstream of the transposase-binding site. In some embodiments, the 3' end of the first primer-binding sequence is adjacent to the uracil nucleotide. In some embodiments, the 5' end of the transposase-binding site is adjacent to the uracil nucleotide. In some embodiments, the 3' end of the first primer-binding sequence and the 5' end of the transposase-binding site are linked by a plurality of (e.g., 2, 3, or 4) consecutive uracil nucleotides.

[0089] In some implementations, exemplary transposon systems include Tn5 transposase, Mu transposase, Tn7 transposase, or IS5 transposase. Other useful transposon systems are known to those skilled in the art, including the Tn3 transposon system (see Maekawa, T., Yanagihara, K., and Ohtsubo, E. (1996), A cell-free system of Tn3 transposition and transposition immunity, Genes Cells 1, 1007-1016), the Tn7 transposon system (see Craig, NL (1991), Tn7: a target site-specific transposon, Mol. Microbiol. 5, 2569-2573), the Tn10 transposon system (see Chalmers, R., Sewitz, S., Lipkow, K., and Crellin, P. (2000), Complete nucleotide sequence of Tn10, J. Bacteriol. 182, 2970-2972), the PiggyBac transposon system (see Li, X., Burnight, ER, Cooney, AL, Malani, N., Brady, T., Sander, JD, Staber, J., Wheelan, SJ, Joung, JK, McCray, PB, Jr. et al. (2013), PiggyBac transposase tools for genome engineering, Proc. Natl. Acad. Sci. USA 110, E2279-2287), the Sleeping Beauty transposon system (see Ivics, Z., Hackett, PB, Plasterk, RH and Izsvak, Z. (1997), Molecular reconstruction of Sleeping Beauty, a Tc1-like transposon from fish, and its transposition in human cells, Cell 91, 501-510), and the Tol2 transposon system (see Kawakami, K. (2007), Tol2: a versatile gene transfer vector in vertebrates, Genome Biol. 8 Suppl. 1, S7).

[0090] Specific Tn5 transposition systems have been described and are known to those skilled in the art. See Goryshin, IY and W. Reznikoff, Tn5 in vitro transposition. The Journal of Biological Chemistry, 1998. 273(13): 7367-74; Davies, DR et al., Three-dimensional structure of the Tn5 synaptic complex transposition intermediate. Science, 2000. 289(5476): 77-85; Goryshin, IY et al., Insertional transposon mutagenesis by electroporation of released Tn5 transposition complexes. Nature Biotechnology, 2000. 18(1): 97-100; and Steiniger-White, M., I. Rayment and W. Reznikoff, Structure/function insights into Tn5 transposition. Current Opinion in Structural Biology, 2004. 14(1): 50-7. Each of these references is hereby incorporated in its entirety by reference for all purposes. Kits for using the Tn5 transposon system for DNA library preparation and other purposes are known. See Adey, A. et al., Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biology, 2010. 11(12): R119; Marine, R. et al., Evaluation of a transposase protocol for rapid generation of shotgun high-throughput sequencing libraries from nanogram quantities of DNA. Applied and Environmental Microbiology, 2011. 77(22): 8071-9; Parkinson, NJ et al., Preparation of high-quality next-generation sequencing libraries from picogram quantities of target DNA. Genome Research, 2012. 22(1): 125-33; Adey, A. and J. Shendure, Ultra-low-input, tagmentation-based whole-genome bisulfite sequencing. Genome Research, 2012. 22(6): 1139-43; Picelli, S. et al., Full-length RNA-seq from single cells using Smart-seq2. Nature Protocols, 2014. 9(1): 171-81; and Buenrostro, JD et al., Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature Methods, 2013. Each of these references is hereby incorporated in its entirety by reference for all purposes. See also WO 98/10077, EP 2527438 and EP 2376517, each of which is hereby incorporated in its entirety by reference. Commercially available transposon kits are sold under the brand name NEXTERA and are available from Illumina.

[0091] In some exemplary embodiments, the transposase is a Tn5 transposase, and the transposase binding site comprises a first strand as shown in AGATGTGTATAAGAGACAG (SEQ ID NO:11) and a second strand as shown in its complementary sequence.

[0092] In some embodiments, the bound transposase is removed from the double-stranded genomic DNA fragment before gap filling and extension. In some embodiments, the transposase is inactivated by using a protease. In some embodiments, the method further includes inactivating the protease by heat and/or a protease inhibitor. In some embodiments, residual transposase is inactivated by digestion with a protease (such as QIAGEN protease) at a final concentration of 1-500 μg/mL at 37-55°C for 10-60 minutes. The protease is then inactivated by heat and/or a protease inhibitor (such as AEBSF).

[0093] Gap filling

[0094] The double-stranded fragments generated by the transposition method described herein are then processed to fill gaps. Gap filling described herein is performed using a uracil-intolerant polymerase. A uracil-intolerant polymerase is a DNA polymerase that cannot read and amplify a nucleic acid template containing uracil. Such polymerases are known to those skilled in the art. According to one aspect, a uracil nucleotide is contained between the downstream side (e.g., 3' end) of the first primer-binding sequence described herein and the upstream side (e.g., 5' end) of the transposase binding site, thereby preventing the 3' end from extending further into the first primer-binding sequence during gap filling with a uracil-intolerant polymerase, resulting in a first double-stranded extension product with a 5' overhang at each end.
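The stalling behavior described above can be pictured with a toy model in which extension copies a template strand from its 3' end and stops at the first uracil. This is a simplified sketch: the primer-binding sequence `GGTGTAGTGG` is a hypothetical placeholder, the function name `extend_until_uracil` is invented for illustration, and real polymerase behavior near uracil is not modeled beyond a hard stop. The transposase binding site is the SEQ ID NO:11 first strand given in this disclosure.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_until_uracil(template_5to3: str) -> str:
    """Copy a template starting from its 3' end and stop at the first U.

    Returns the newly synthesized strand written 5'->3'. A uracil-intolerant
    polymerase is modeled as halting when it reaches U in the template.
    """
    product = []
    for base in reversed(template_5to3.upper()):  # template is read 3'->5'
        if base == "U":
            break
        product.append(COMPLEMENT[base])
    return "".join(product)


if __name__ == "__main__":
    primer_binding = "GGTGTAGTGG"               # hypothetical placeholder
    binding_site = "AGATGTGTATAAGAGACAG"        # SEQ ID NO:11 first strand
    template = primer_binding + "UU" + binding_site
    copied = extend_until_uracil(template)
    # Only the transposase binding site is copied; the primer-binding
    # sequence stays single-stranded as a 5' overhang.
    print(copied, len(template) - 2 - len(copied))
```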

[0095] In some embodiments, the uracil-intolerant polymerase is selected from Q5 polymerase, Deep Vent DNA polymerase, Phusion high-fidelity polymerase, KAPA high-fidelity polymerase, Phanta polymerase, or any combination thereof.

[0096] In some embodiments, the gap-filling step includes using four nucleotides: A, T, C, and G, wherein the cytosine nucleotide (e.g., dCTP) contains a modified cytosine that is resistant to the transformation step. Resistant to transformation, as described herein, means being able to withstand a reaction that converts cytosine to uracil (e.g., APOBEC deamination) without sequence alteration. In some embodiments, the modified cytosine is a methylated cytosine, such as 5-methylcytosine (5-mdC) or 5-hydroxymethylcytosine (5-hmdC). In some embodiments, the gap-filling step includes using methylated dCTP instead of dCTP in a dNTP mixture.

[0097] Labeling single cell genomic DNA

[0098] According to one aspect, the method provided in this disclosure relates to distinguishing different cells by introducing tag sequences, such that the genomic DNA of each cell is labeled with a different, unique tag sequence. The tag sequence is introduced via the template-converting oligonucleotide described herein by means of an extension reaction.

[0099] In some embodiments, the tag sequence may have any length suitable for distinguishing different cells. In some exemplary embodiments, the tag sequence is 4-30 bp in length, such as 4-25 bp or 4-20 bp. In some exemplary embodiments, the tag sequence is 4 bp-16 bp in length.
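To relate tag length to the number of cells that can be distinguished, the following sketch computes the theoretical upper bound of 4^n distinct tags for a tag of n bases, and the shortest length that covers a given number of cells. The function names are hypothetical. Practical tag sets are usually smaller than this bound, because tags are typically chosen to differ from one another at several positions so that sequencing errors do not convert one tag into another.

```python
def tag_capacity(length: int) -> int:
    """Upper bound on the number of distinct tags of a given length."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return 4 ** length


def min_tag_length(n_cells: int) -> int:
    """Shortest tag length whose upper bound covers n_cells distinct tags."""
    if n_cells < 1:
        raise ValueError("n_cells must be at least 1")
    length = 1
    while tag_capacity(length) < n_cells:
        length += 1
    return length


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, tag_capacity(n))
    print(min_tag_length(96))
```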

[0100] In some embodiments, the transposase binding site binding sequence of the template-converting oligonucleotide includes a locked nucleic acid (LNA) modification. According to one aspect, the genomic DNA fragment has mutually reverse-complementary transposase binding site sequences at its two ends, which creates the possibility of single-stranded DNA circularization during annealing. Transposase binding site binding sequences with LNA modifications have higher Tm values (e.g., each LNA modification can increase the Tm by approximately 2°C), which allows higher annealing temperatures to be set so that the template-converting oligonucleotide binds while single-stranded DNA circularization is prevented.

[0101] In some embodiments, the transposase binding site binding sequence comprises multiple LNA modifications, such as 2, 3, 4, or 5 LNA modifications. In some embodiments, the transposase binding site binding sequence comprises 5 LNA modifications.

[0102] In some embodiments, the template-converting oligonucleotide is annealed to the transposase binding site sequence located at the 3' end of the genomic fragment in the first double-stranded extension product, and then the first double-stranded extension product is given the tag sequence and the second primer binding sequence by an extension reaction (e.g., PCR), thereby tagging the genomic DNA fragment.

[0103] In some embodiments, the extension step includes using four nucleotides, A, T, C, and G, wherein the cytosine nucleotide (e.g., dCTP) contains a modified cytosine that is resistant to the transformation step. Resistant to transformation, as described herein, means being able to withstand a reaction that converts cytosine to uracil (e.g., APOBEC deamination) without sequence alteration. In some embodiments, the modified cytosine is a methylated cytosine, such as 5-methylcytosine (5-mdC) or 5-hydroxymethylcytosine (5-hmdC). In some embodiments, the extension step includes using methylated dCTP instead of dCTP in a dNTP mixture.

[0104] Vector DNA and optional purification

[0105] According to some aspects, transformation is performed in the presence of vector DNA (that is, carrier DNA). According to some aspects, a mixed library of tagged genomic DNA fragments from different cells is processed in the presence of vector DNA to convert cytosine to uracil. The vector DNA can be any dsDNA fragment of length between 100 and 4,000 base pairs (bp) (e.g., 100 bp to 1000 bp, 100 bp to 800 bp, 100 bp to 600 bp, 100 bp to 500 bp, 100 bp to 400 bp, 200 bp to 400 bp). In some embodiments, the vector DNA is selected from fragments of length about 200 bp to 400 bp, for example, 300 bp. In some embodiments, the vector DNA can be a different DNA type from the target DNA. In some embodiments, the vector DNA can be the same DNA type as the target DNA. In some embodiments, the vector DNA is sonicated λDNA. In some embodiments, the vector DNA does not include an Illumina sequencing adaptor.

[0106] The vector DNA is used to reduce damage to or loss of the target DNA caused by the transformation treatment. In some embodiments, the vector DNA is added to the reaction medium in an amount of 100 to 1000 times the amount of sample DNA.
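As a worked example of the 100-fold to 1000-fold ratio stated above, the sketch below computes the corresponding mass range of vector DNA for a pooled sample. The figure of about 6 pg of genomic DNA per cell is an outside assumption (a commonly quoted approximate value for a diploid human cell), not a value from this disclosure, and the function name is hypothetical.

```python
from typing import Tuple

PG_PER_CELL_ASSUMED = 6.0  # assumption: approx. genomic DNA per diploid human cell


def vector_dna_range_pg(
    n_cells: int,
    pg_per_cell: float = PG_PER_CELL_ASSUMED,
    low_fold: float = 100.0,
    high_fold: float = 1000.0,
) -> Tuple[float, float]:
    """Mass range (pg) of vector DNA at low_fold to high_fold the sample DNA."""
    sample_pg = n_cells * pg_per_cell
    return sample_pg * low_fold, sample_pg * high_fold


if __name__ == "__main__":
    low, high = vector_dna_range_pg(n_cells=96)
    print(f"{low / 1000:.1f} ng to {high / 1000:.1f} ng")
```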

[0107] According to certain aspects, prior to the transformation treatment required for methylation detection, the reaction medium comprising the mixed library and vector DNA can be purified by DNA spin-column or bead-based DNA purification, or by other purification methods known to those skilled in the art. Alternatively, the reaction medium can be directly transformed.

[0108] Transformation

[0109] According to one aspect, the method provided in this disclosure involves mixing and processing libraries of tagged genomic DNA fragments obtained from different cells to convert cytosine into uracil.

[0110] In some implementations, the transformation is performed in the presence of vector DNA.

[0111] In some embodiments, the process of converting cytosine to uracil is an enzymatic conversion. Reagents used for converting cytosine to uracil are known to those skilled in the art. In some embodiments, an enzymatic agent for converting cytosine to uracil is used, namely a cytosine deaminase, including those of the APOBEC family, such as APOBEC-seq or APOBEC3A. Members of the APOBEC family are cytidine deaminases that convert cytosine to uracil without altering the modified cytosine bases. Based on this disclosure, other enzymatic agents will become apparent to those skilled in the art.

[0112] In some embodiments, the enzymatic transformation includes the use of T4-BGT enzyme and APOBEC3A enzyme to detect 5hmC.

[0113] In some embodiments, the enzymatic transformation includes the use of TET2 enzyme and APOBEC3A enzyme to detect 5mC or 5hmC.
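The two enzyme combinations above can be summarized as a lookup of the base expected in sequencing for each cytosine state. The readout table in this sketch is an assumption based on how such enzyme combinations are commonly described (T4-BGT protecting 5hmC by glucosylation, TET2 protecting 5mC and 5hmC by oxidation, and APOBEC3A deaminating whatever is left unprotected); it is not stated in this form in the disclosure, and the names `READOUT`, `sequenced_base`, and `classify` are hypothetical.

```python
# Assumed readout: base observed after conversion and amplification.
READOUT = {
    "T4-BGT+APOBEC3A": {"C": "T", "5mC": "T", "5hmC": "C"},
    "TET2+APOBEC3A": {"C": "T", "5mC": "C", "5hmC": "C"},
}


def sequenced_base(scheme: str, cytosine_state: str) -> str:
    """Base expected in the reads for a cytosine in the given state."""
    return READOUT[scheme][cytosine_state]


def classify(bgt_read: str, tet_read: str) -> str:
    """Infer the cytosine state at one site from both readouts.

    bgt_read: base seen with T4-BGT + APOBEC3A
    tet_read: base seen with TET2 + APOBEC3A
    """
    if tet_read == "T":
        return "C"        # unprotected under both schemes
    if bgt_read == "C":
        return "5hmC"     # protected under both schemes
    return "5mC"          # protected only under the TET2 scheme


if __name__ == "__main__":
    for state in ("C", "5mC", "5hmC"):
        bgt = sequenced_base("T4-BGT+APOBEC3A", state)
        tet = sequenced_base("TET2+APOBEC3A", state)
        print(state, bgt, tet, classify(bgt, tet))
```

Under these assumptions, the TET2 scheme alone reports 5mC and 5hmC together, and comparison with the T4-BGT scheme at the same site separates the two.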

[0114] In some embodiments, the vector DNA can be removed after transformation, or the transformed fragment can be amplified without amplifying the vector DNA, thereby obtaining amplified fragmented DNA. The tagged genomic DNA fragment obtained by the methods described herein has first and second primer-binding sequences at each end, thereby ensuring that the tagged genomic DNA fragment is sufficiently distinguishable from the vector DNA. In some embodiments, the DNA linked by the first and second primer-binding sequences is amplified while the vector DNA is not amplified. The vector DNA becomes single-stranded DNA, i.e., ssDNA, and is removed from the mixture, yielding a pure amplified target DNA fragment.

[0115] Optional purification

[0116] According to some aspects, the reaction medium comprising the transformed fragment may be purified prior to amplification by DNA spin-column or bead-based DNA purification or other purification methods known to those skilled in the art. Alternatively, the reaction medium may be directly subjected to amplification. In some embodiments, the process includes, after the transformation step but before the amplification step, purifying the reaction medium containing the transformed mixed library; preferably, the purification step is performed by DNA spin-column or bead-based DNA purification. In some embodiments, the reaction medium containing the transformed fragment is directly subjected to the amplification step without purification.

[0117] Amplification

[0118] According to one aspect, the method provided in this disclosure relates to amplifying a transformed mixed library using first and second primers to generate an amplicon.

[0119] In some implementations, first and second primers are used to amplify a transformed mixed library to generate amplicons, each containing a sequencing adapter at its end, thereby generating a sequencing library.

[0120] In some embodiments, the first primer introduces a first sequencing adapter sequence, and the second primer introduces a second sequencing adapter sequence. In some embodiments, the first primer comprises (i) a first portion at the 3' end containing a sequence capable of hybridizing with the first primer-binding sequence, and (ii) a second portion at the 5' end of the first portion containing the first sequencing adapter sequence. In some embodiments, the second portion of the first primer may also contain other desired functional sequences, such as an index sequence. In some embodiments, the second primer comprises (i) a first portion at the 3' end containing a sequence capable of hybridizing with the second primer-binding sequence, and (ii) a second portion at the 5' end of the first portion containing the second sequencing adapter sequence. In some embodiments, the second portion of the second primer may also contain other desired functional sequences, such as an index sequence.

[0121] In some implementations, the sequencing adapter is an Illumina sequencing adapter.

[0122] In some embodiments, the first primer introduces a P7 end adapter, and the second primer introduces a P5 end adapter. In some embodiments, the first primer comprises (i) a first portion at the 3' end containing a sequence capable of hybridizing with the first primer-binding sequence, and (ii) a second portion at the 5' end of the first portion containing a P7 sequence. In some embodiments, the second portion of the first primer may also contain other desired functional sequences, such as an i7 index sequence. In some embodiments, the second primer comprises (i) a first portion at the 3' end containing a sequence capable of hybridizing with the second primer-binding sequence, and (ii) a second portion at the 5' end of the first portion containing a P5 sequence. In some embodiments, the second portion of the second primer may also contain other desired functional sequences, such as an i5 index sequence.

[0123] In some embodiments, the first primer introduces a P5 end adapter, and the second primer introduces a P7 end adapter. In some embodiments, the first primer comprises (i) a first portion at the 3' end containing a sequence capable of hybridizing with the first primer-binding sequence, and (ii) a second portion at the 5' end of the first portion containing a P5 sequence. In some embodiments, the second portion of the first primer may also contain other desired functional sequences, such as an i5 index sequence. In some embodiments, the second primer comprises (i) a first portion at the 3' end containing a sequence capable of hybridizing with the second primer-binding sequence, and (ii) a second portion at the 5' end of the first portion containing a P7 sequence. In some embodiments, the second portion of the second primer may also contain other desired functional sequences, such as an i7 index sequence.

[0124] In some embodiments, the amplification step using the first and second primers is performed using a uracil-resistant polymerase. A uracil-resistant polymerase is a DNA polymerase capable of reading and amplifying a nucleic acid template containing uracil. Such polymerases are known to those skilled in the art. In some embodiments, the uracil-resistant polymerase is selected from Q5U polymerase, KAPA U+ polymerase, Phanta Uc polymerase, or any combination thereof.

[0125] In some implementations, PCR is used to achieve amplification. Methods for PCR are well known in the art. PCR typically involves providing oligonucleotide primers with the desired target sequence and amplification reagents, followed by thermal cycling in the presence of a polymerase (e.g., DNA polymerase). The primers are complementary to their respective corresponding strands of the double-stranded target sequence (“primer-binding sequences”). To achieve amplification, the double-stranded target sequence is denatured, and then the primers are annealed to their complementary sequences within the target molecule. After annealing, the primers are extended with a polymerase to form a new pair of complementary strands. The denaturation, primer annealing, and polymerase extension steps can be repeated multiple times (i.e., denaturation, annealing, and extension constitute a “cycle”; multiple “cycles” can be performed) to obtain high concentrations of amplified segments of the desired target sequence.
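The cycling described above can be illustrated with a short calculation. The following is a minimal sketch, assuming an idealized model in which a fixed fraction of templates is copied in every cycle; the function name and the `efficiency` parameter are illustrative and do not come from this disclosure.

```python
def pcr_fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Idealized fold increase in target copies after `cycles` PCR cycles.

    `efficiency` is the fraction of templates copied per cycle
    (1.0 means perfect doubling). It is an illustrative parameter,
    not a value taken from the disclosure.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Compare perfect doubling with a hypothetical 90% per-cycle efficiency.
    for n in (10, 11):
        print(n, pcr_fold_amplification(n), pcr_fold_amplification(n, 0.9))
```

Real reactions plateau as primers and nucleotides are consumed, so this model only describes the early, exponential phase.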

[0126] Optional purification

[0127] According to some aspects, prior to sequencing, the reaction medium containing the amplified fragment can be purified by DNA spin-column or bead-based DNA purification, or other purification methods known to those skilled in the art. DNA purification following amplification (such as by PCR reaction) removes most of the single-stranded vector DNA, resulting in a pure, amplified target DNA library ready for sequencing. In some embodiments, the purification of the reaction medium containing the amplicon is included after the amplification step but before the sequencing step; preferably, this purification step is performed by DNA spin-column or bead-based DNA purification.

[0128] Sequencing

[0129] DNA amplified according to the methods described herein can be sequenced and analyzed using methods known to those skilled in the art. A variety of sequencing methods known in the art can be used to determine the sequence of the target nucleic acid, including but not limited to sequencing-by-synthesis (SBS), sequencing by hybridization (SBH), sequencing by ligation (SBL) (Shendure et al. (2005) Science 309:1728), quantitative incremental fluorescent nucleotide addition sequencing (QIFNAS), stepwise ligation and cleavage, fluorescence resonance energy transfer (FRET), molecular beacons, TaqMan reporter probe digestion, pyrosequencing, fluorescence in situ sequencing (FISSEQ), FISSEQ beads (US Patent No. 7,425,431), wobble sequencing (PCT/US05/27695), multiplex sequencing (US Serial No. 12/027,039, filed February 6, 2008; Porreca et al. (2007) Nat. Methods 4:931), POLONY sequencing (US Patents 6,432,360, 6,485,944, and 6,511,803, and PCT/US05/06425), ROLONY (US Serial No. 12/120,541, filed May 14, 2008), and allele-specific oligonucleotide ligation assays (e.g., oligonucleotide ligation assays (OLA), single-template molecule OLA using ligated linear probes and rolling circle amplification (RCA) reads, ligated padlock probes, and/or single-template molecule OLA using ligated circular padlock probes and RCA reads). High-throughput sequencing methods can also be used, for example on platforms such as Roche 454, Illumina, AB-SOLiD, Helicos, and Polonator. Several light-based sequencing technologies are known in the field (Landegren et al. (1998) Genome Res. 8:769-76; Kwok (2000) Pharmacogenomics 1:95-100; and Shi (2001) Clin. Chem. 47:164-172).

[0130] In some implementations, high-throughput sequencing methods (such as the Illumina sequencing platform) can be used to sequence the amplified DNA. In some implementations, the sequencing is sequencing-by-synthesis (SBS).

[0131] Amplification and sequencing methods are useful in the field of predictive medicine, where diagnostic assays, prognostic assays, pharmacogenomics, and monitoring of clinical trials are used for prognostic (predictive) purposes, so that individuals can be treated prophylactically. Therefore, one aspect of the present invention relates to diagnostic assays that analyze genomic DNA to ascertain an individual's risk of developing a symptom and/or disease. Such assays can be used for prognostic or predictive purposes, so that individuals can be treated prophylactically before the onset of a symptom and/or disease. Thus, in some exemplary embodiments, methods for diagnosing and/or prognosing one or more diseases and/or conditions are provided using the methods described herein for analyzing the methylation characteristics of single-cell genomic DNA.

[0132] It should be understood that the embodiments of the invention described herein are merely illustrative of some applications of the principles of the invention. Various modifications can be made by those skilled in the art based on the teachings presented herein without departing from the true spirit and scope of the invention. All references, patents, and published patent applications cited throughout this application are hereby incorporated in their entirety for all purposes.

[0133] The present invention will now be described with reference to the following examples, which are intended to illustrate the invention (and not limit it). Where specific conditions are not indicated in the examples, conventional conditions or conditions recommended by the manufacturer are used. Reagents or instruments used, unless otherwise specified, are all commercially available conventional products. Those skilled in the art will understand that the examples are given by way of illustration and are not intended to limit the scope of protection claimed in this application.

[0134] Example 1

[0135] According to an exemplary embodiment of this application, this application provides a high-throughput single-cell 5mC / 5hmC detection method, which may be referred to as TSO-Cabernet. The TSO-Cabernet method includes sorting single cells using FACS; fragmenting single-cell genomic DNA using Tn5 transposons, introducing sequencing primers and tag sequences at both ends of the fragments to independently label single cells; and performing enzymatic transformation, library construction, and sequencing on a mixture of single cells labeled with different tag sequences (Figure 1).

[0136] According to an exemplary embodiment of this application, a method for sequence tagging genomic fragments of a single cell includes: Tn5 transposition, gap filling, adapter replacement, and extension (Figure 2), wherein:

[0137] Tn5 transposition: Genomic DNA is fragmented by Tn5 transposition, which introduces custom P7 sequencing primer sequences (i.e., first primer-binding sequences) at both ends of the genomic DNA. Breakage of the genomic DNA and insertion of the transposons leave gaps at both ends of the transposition / insertion sites, resulting in genomic DNA fragments that carry a transposon DNA Tnp binding site and a custom P7 sequencing primer sequence at the 5' position on the upper strand, and likewise at the 5' position on the lower strand. A uracil base is placed between the custom P7 sequencing primer sequence and the ME sequence (i.e., the transposase binding site) to prevent the 3' end from extending further into the P7 adapter sequence during DNA polymerase gap filling, which facilitates subsequent adapter replacement. Because of the uracil bases, the P7-end sequencing primer sequence (the adapter sequence upstream of the ME sequence) remains single-stranded. BGT binds double-stranded DNA, so during the enzymatic conversion reaction it could not bind this single-stranded region properly or provide glycosyl groups to protect its cytosines, which would then be unable to withstand the APOBEC deamination reaction. The P7-end adapter sequence is therefore designed to be "cytosine-free", so that it withstands the APOBEC deamination reaction without sequence changes.
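The rationale for a "cytosine-free" adapter can be illustrated with a small in-silico sketch. It assumes a simplified model in which every unprotected cytosine is deaminated to uracil and read as thymine after amplification. The sequences below are hypothetical placeholders, not the sequences of Table 3-1, and the function names are illustrative.

```python
def simulate_deamination(seq: str, protected: frozenset = frozenset()) -> str:
    """Return the read-out of `seq` after cytosine-to-uracil conversion.

    Unprotected C is deaminated to U and read as T after amplification.
    Indices in `protected` (0-based) stand for modified cytosines that
    resist conversion and are still read as C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in protected:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def is_conversion_stable(seq: str, protected: frozenset = frozenset()) -> bool:
    """True if the sequence reads the same before and after conversion."""
    return simulate_deamination(seq, protected) == seq.upper()


if __name__ == "__main__":
    cytosine_free = "GTGAGTGATGGTTGAGGTAGT"  # hypothetical adapter, no C
    with_cytosine = "GTCAGC"                 # hypothetical adapter with C
    print(is_conversion_stable(cytosine_free))
    print(simulate_deamination(with_cytosine))
    print(simulate_deamination(with_cytosine, frozenset({2})))
```

A primer designed against the original adapter sequence still matches a cytosine-free adapter after conversion, whereas an adapter containing unprotected cytosines would change sequence and no longer match.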

[0138] Adapter replacement: Adapter replacement is based on a PCR reaction in which a custom oligonucleotide chain (i.e., the template-converting oligonucleotide) carrying a tag sequence and a P5-end sequencing primer sequence (i.e., the second primer-binding sequence) binds to the ME sequence through complementary base pairing. The sample DNA fragment itself has reverse-complementary ME sequences at both ends, so single-stranded DNA may circularize as the reaction cools for annealing. To avoid this, five locked nucleotide (LNA) modifications are placed in the ME sequence portion of the custom oligonucleotide chain used for adapter replacement. ME sequences with LNA modifications have higher Tm values (each LNA modification can increase the Tm value by approximately 2°C), which allows a higher annealing temperature to be set so that the oligonucleotide chain binds while single-stranded DNA circularization is prevented. Additionally, those skilled in the art will understand that the P5-end sequencing primer sequence can instead be introduced via transposition, and the P7-end sequencing primer sequence can be introduced during adapter replacement.

[0139] Annealing extension: The annealed oligonucleotide chain serves as a template strand, allowing the sample DNA to undergo PCR extension to obtain the P5-end sequencing primer sequence and tag sequence. The oligonucleotide chain also extends to the 3' end, forming a double-stranded DNA structure to ensure the efficiency of subsequent DNA purification and enzymatic transformation. Furthermore, modified cytosine is used as a substrate during PCR extension, ensuring that the extended portion does not undergo sequence changes after enzymatic transformation, thus guaranteeing the normal progress of amplification and library construction.

[0140] At this point, the DNA double strands produced through the extension step already possess tag sequence information. Single-cell DNA samples carrying different tag sequences can be mixed together for subsequent enzymatic transformations, and finally, using the sequences at both ends of the fragments as bridges, a sequencing library adapted for the Illumina platform can be constructed.

[0141] Example 2

[0142] Preparation of reaction solution

[0143] 2.1 Preparation of cell lysis buffer

[0144] Table 2-1: Cell lysis buffer formulation

[0145]

[0146] *Sequence: TCAGGTTTTCCTGAA (SEQ ID NO:1)

[0147] 2.2 Preparation of transposition reaction solution (2x)

[0148] Table 2-2: Transposition Reaction Solution Formulation

[0149]

[0150] 2.3 Preparation of transposition termination solution

[0151] Table 2-3: Transposition Termination Solution Formulation

[0152]

[0153] 2.4 Preparation of transposon annealing buffer (10x)

[0154] Table 2-4: Transposon Annealing Buffer Formulation

[0155]

[0156] 2.5 Preparation of transposition complex storage solution

[0157] Table 2-5: Transposition Complex Storage Solution Formulation

[0158]

[0159] 2.6 Preparation of magnetic bead dilution solution

[0160] Table 2-6: Magnetic Bead Dilution Solution Formulation

[0161]

[0162] 2.7 Lambda DNA Fragmentation

[0163] Unmodified Lambda DNA (Thermo Scientific, SD0021) was sheared to 300 bp fragments using a non-contact ultrasonic disruptor, purified, and stored long-term at -20°C.

[0164] Example 3

[0165] Preparation of transposition complex

[0166] 3.1 Preparation of transposons

[0167] Table 3-1: Transposon Sequences

[0168]

[0169]

[0170] One strand of the transposon is Tn5_3U_oligo, which contains the P7 end sequencing primer sequence, uracil nucleotide, and ME sequence, as shown in Table 3-1. The other strand is Tn5_comp, which contains the ME sequence, as shown in Table 3-1.

[0171] Table 3-2: Transposon Annealing Reaction System

[0172]

[0173] Mix the components shown in the table above and run the following program in a PCR instrument: 95℃ for 1 min, cooling to 25℃ in steps of 0.1℃ every 3 seconds, then hold at 4℃. Store long-term at -20℃.
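As a quick check of how long the slow-cooling step takes, the ramp from 95℃ to 25℃ at 0.1℃ every 3 seconds can be worked out as follows. This is a minimal sketch; the function name is illustrative.

```python
def ramp_duration_seconds(start_c: float, end_c: float,
                          step_c: float, seconds_per_step: float) -> float:
    """Time needed to ramp between two temperatures in fixed steps."""
    # round() guards against floating-point error in the step count.
    steps = round(abs(start_c - end_c) / step_c)
    return steps * seconds_per_step


if __name__ == "__main__":
    seconds = ramp_duration_seconds(95, 25, 0.1, 3)
    print(seconds, "s =", seconds / 60, "min")
```

The ramp therefore consists of 700 steps and takes 2100 s (35 min), in addition to the initial 1 min at 95℃.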

[0174] 3.2 Assembly of the transposition complex

[0175] The transposon was mixed with Tn5 transposase (Vazyme) at a ratio of 1.1:1, incubated at room temperature for 30 min, diluted to 250 nM with transposition complex storage solution, aliquoted, and stored at -80℃.

[0176] Example 4

[0177] Isolation of single cells and cell lysis

[0178] Single-cell sorting was performed using pipettes or a flow cytometer. Single cells were sorted into 0.2 mL PCR tubes containing 2.5 μL of cell lysis buffer, and the cell lysis PCR program was run (50℃ for 1 h, 65℃ for 1 h, 70℃ for 15 min, 4℃ hold). Single-cell lysis products were stored at -80℃.

[0179] Example 5

[0180] Tagging of single cells

[0181] 5.1 Tn5 transposition reaction

[0182] Table 5-1: Transposition Reaction System

[0183]

[0184] Prepare the reaction system according to Table 5-1. After incubating at 55℃ for 10 min, add 1 μL of 2 mg/mL protease (QIAGEN, 19157) and 1 μL of transposition termination solution to the transposition reaction system. Run the PCR program (50℃ for 40 min, 70℃ for 15 min, 4℃ hold) to inactivate the transposase.

[0185] 5.2 Gap filling and adapter conversion

[0186] Add 18 μL of adapter conversion reaction premix (see Table 5-2) to the DNA sample after the transposition reaction is completed, and run the following program in the PCR instrument: 50℃ for 3 min, 98℃ for 30 s, 10 cycles of (98℃ for 10 s, 59℃ for 20 s, 72℃ for 1 min), 72℃ for 2 min, 4℃ hold. The tagged TSO sequence (i.e., the template-converting oligonucleotide) includes, from the 5' end to the 3' end: the P5 sequencing primer sequence, the tag sequence, and the ME binding sequence. The ME binding sequence binds to the ME sequence through complementary base pairing. An exemplary tagged TSO sequence is shown in SEQ ID NO:4: TCGTCGGCAGCGTC (P5 sequencing primer sequence) TTACCGAC (tag sequence) AGATGTGTA+TA+AG+AG+AC+AG (ME binding sequence). Note: "+" indicates that the nucleotide immediately 3' of it is LNA-modified.
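The "+" notation in SEQ ID NO:4 can be parsed programmatically to recover the plain sequence and the LNA positions. The sketch below uses the three segments exactly as given in the text; the parser, the variable names, and the use of the approximate 2°C-per-LNA figure from the adapter replacement description as a simple additive estimate are illustrative assumptions.

```python
def parse_lna(annotated: str):
    """Split '+'-annotated oligo text into (plain_sequence, lna_positions).

    A '+' marks the nucleotide immediately 3' of it as LNA-modified.
    Positions are 0-based indices into the plain sequence.
    """
    bases = []
    lna_positions = []
    mark_next = False
    for ch in annotated:
        if ch == "+":
            mark_next = True
            continue
        if mark_next:
            lna_positions.append(len(bases))
            mark_next = False
        bases.append(ch)
    if mark_next:
        raise ValueError("a trailing '+' has no nucleotide to modify")
    return "".join(bases), lna_positions


# Segments of the exemplary tagged TSO (SEQ ID NO:4), 5' to 3'.
P5_PRIMER = "TCGTCGGCAGCGTC"
TAG = "TTACCGAC"
ME_BINDING_ANNOTATED = "AGATGTGTA+TA+AG+AG+AC+AG"

me_plain, lna = parse_lna(ME_BINDING_ANNOTATED)
tso = P5_PRIMER + TAG + me_plain

# Rough additive estimate only: about 2 degrees C per LNA modification.
tm_gain_c = len(lna) * 2.0

if __name__ == "__main__":
    print(tso, len(tso))
    print(me_plain, lna, tm_gain_c)
```

The parsed ME binding sequence carries five LNA modifications, matching the five LNAs described for adapter replacement.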

[0187] Table 5-2: Adapter Conversion Reaction Premix

[0188]

[0189] *Methylated dCTP (NEB, N0356S) can be replaced with hydroxymethylated dCTP (Jena Bioscience, NU-932L).

[0190] The 96-well plate samples were then mixed, and 40 ng of fragmented Lambda DNA and 5 mL of magnetic beads (Beckman, B23319) were added. DNA purification was performed according to the instructions, and finally eluted in 56 μL of 1 mM Tris-HCl (pH = 8.0).

[0191] Example 6

[0192] Library preparation

[0193] 6.1 TET2 reaction (5mC detection)

[0194] Add 44 μL of TET2 reaction premix (see Table 6-1) to 56 μL of elution product and incubate at 37 °C for 1 h.

[0195] Table 6-1: TET2 Reaction Premix

[0196]

[0197] After the TET2 reaction was completed, 2 μL of stop reagent (NEB, E7125L) was added and the mixture was incubated at 37°C for 30 min to terminate the TET2 reaction. Subsequently, 183.6 μL of magnetic beads were added for DNA purification according to the instructions, and the mixture was eluted with 8 μL of 1 mM Tris-HCl (pH = 8.0).

[0198] 6.2 BGT reaction (5hmC detection)

[0199] Add 44 μL of BGT reaction premix (see Table 6-2) to 56 μL of elution product and incubate at 37 °C for 2 h.

[0200] Table 6-2: BGT Reaction Premix

[0201]

[0202] After the BGT reaction was completed, 5 μL of proteinase K (NEB, P8107S) was added and the mixture was incubated at 37°C for 30 min to terminate the BGT reaction. Subsequently, 189 μL of magnetic beads were added for DNA purification according to the instructions, followed by elution with 8 μL of 1 mM Tris-HCl (pH = 8.0).
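The bead volumes in 6.1 and 6.2 appear to follow a fixed bead-to-sample ratio. The sketch below checks this, assuming a ratio of 1.8x; that ratio is inferred from the stated volumes and is not given explicitly in the text.

```python
def bead_volume_ul(sample_ul: float, ratio: float = 1.8) -> float:
    """Bead volume for a given sample volume at a bead-to-sample ratio.

    The default ratio of 1.8 is an assumption inferred from the volumes
    stated in 6.1 and 6.2, not a value given in the disclosure.
    """
    return round(sample_ul * ratio, 1)


if __name__ == "__main__":
    tet2_sample_ul = 56 + 44 + 2  # eluate + TET2 premix + stop reagent
    bgt_sample_ul = 56 + 44 + 5   # eluate + BGT premix + proteinase K
    print(bead_volume_ul(tet2_sample_ul))
    print(bead_volume_ul(bgt_sample_ul))
```

Both results match the stated volumes (183.6 μL and 189 μL), which supports the inferred ratio.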

[0203] 6.3 APOBEC reaction

[0204] Table 6-3: APOBEC Reaction Premix

[0205]

[0206] Add 2 μL of 0.1 M NaOH to 8 μL of the elution product obtained in 6.1 or 6.2, incubate at 50 °C for 10 min to denature the double-stranded DNA, and then cool rapidly on ice. Keep the sample on ice, add 10 μL of APOBEC reaction premix (formulation shown in Table 6-3), and incubate at 37 °C for 3 h (5mC detection) or 12 h (5hmC detection).

[0207] 6.4 Library Amplification

[0208] Add 20 μL of 2x Q5U PCR premix (NEB, M0597L), 0.4 μL of 100 μM s3N501 primer, and 0.4 μL of 100 μM s3N701 primer. The s3N501 primer introduces the P5-end sequencing adapter and contains the P5 sequence, an i5 index sequence, and a sequence that hybridizes with the P5-end sequencing primer sequence introduced at one end of the genomic fragment. An example sequence is shown in XX. The s3N701 primer introduces the P7-end sequencing adapter and contains the P7 sequence, an i7 index sequence, and a sequence that hybridizes with the P7-end sequencing primer sequence introduced at the other end of the genomic fragment. An example sequence is shown in Table 6-4.
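The final primer concentration in the amplification reaction can be estimated by simple dilution arithmetic. The sketch below assumes that the premix and primers are added directly to the whole 20 μL APOBEC reaction from 6.3 (8 μL eluate + 2 μL NaOH + 10 μL premix). The text does not state the template volume, so this is an assumption.

```python
def final_concentration_um(stock_um: float, stock_ul: float,
                           total_ul: float) -> float:
    """Dilution formula C1*V1 = C2*V2, solved for C2 (in micromolar)."""
    return stock_um * stock_ul / total_ul


# Assumed template volume: the entire APOBEC reaction from 6.3.
apobec_reaction_ul = 8 + 2 + 10
pcr_premix_ul = 20
primer_ul = 0.4  # volume of each of the two primers

total_ul = apobec_reaction_ul + pcr_premix_ul + 2 * primer_ul
each_primer_um = final_concentration_um(100, primer_ul, total_ul)

if __name__ == "__main__":
    print(round(total_ul, 1), "uL total")
    print(round(each_primer_um, 3), "uM per primer")
```

Under this assumption each primer ends up at roughly 1 μM in a reaction of about 40.8 μL.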

[0209] Table 6-4: Adapter Primer Sequences

[0210]

[0211] Whole genome amplification was performed using a PCR instrument with the following program: 98℃ for 30 s, 11 cycles of (98℃ for 10 s, 60℃ for 30 s, 65℃ for 90 s), 65℃ for 5 min, 4℃ hold. Subsequently, DNA purification and fragment selection were performed according to the instructions using a DNA purification kit.

[0212] Example 7

[0213] Library sequencing

[0214] The purified and fragment-selected libraries were quantified using a Qubit fluorometer, and the fragment size distribution was assessed using a fragment analysis system. A 20% base-balancing sequence was spiked in to increase base complexity. The final library was sequenced at a concentration of 0.9 pM on an Illumina NovaSeq 6000 sequencer with 150 bp paired-end reads. The sequencing primer sequences are shown in Table 7.

[0215] Table 7: Sequencing Primer Sequences

[0216]

[0217]

[0218] Example 8

[0219] Sequencing results

[0220] Multi-cell labeled, mixed-library omics sequencing technologies often suffer from problems such as poor quality control of low-depth cells and uneven distribution of sequencing data due to significant differences in sequencing depth among individual cells. Statistical analysis of the sequencing reads obtained from cells in the same 96-well plate after processing with the TSO-Cabernet method of this disclosure showed that the method achieves a uniform sequencing depth distribution (Figure 3, left panel). This effectively ensures that multiple cells in the same library obtain sufficient sequencing data.

[0221] After multiple cells are mixed together for enzymatic transformation, library construction, and sequencing, accurately separating and tracing the sequencing data using the independent tag sequence of each cell is crucial to the accuracy of each single-cell analysis. To verify this, human and mouse genomic DNA were added to different wells of the same 96-well plate, and the DNA in each well was tagged, enzymatically transformed, built into libraries, and sequenced according to the TSO-Cabernet procedure provided in this disclosure. Finally, the separated data were aligned to the human and mouse reference genomes. Counting the sequencing reads aligned to the human and mouse reference genomes for each sample showed that samples with significant mixing of human and mouse genomes were virtually nonexistent (Figure 3, right panel). This illustrates that the method allows the vast majority (greater than 99%) of sequencing reads to be traced accurately based on the tag sequence information contained in the sequencing data, thereby ensuring the accuracy of downstream analysis.
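The species-mixing check described above can be sketched as a per-well classification based on read counts. This is a minimal illustration, not the analysis pipeline used in this disclosure; the function name, the 0.99 purity threshold (chosen to mirror the "greater than 99%" figure), and the example counts are all hypothetical.

```python
def classify_well(human_reads: int, mouse_reads: int,
                  purity_threshold: float = 0.99) -> str:
    """Label a well by the species that dominates its aligned reads.

    Returns 'human' or 'mouse' when one species accounts for at least
    `purity_threshold` of aligned reads, 'mixed' otherwise, and 'empty'
    when no reads aligned. The threshold is an illustrative assumption.
    """
    total = human_reads + mouse_reads
    if total == 0:
        return "empty"
    if human_reads / total >= purity_threshold:
        return "human"
    if mouse_reads / total >= purity_threshold:
        return "mouse"
    return "mixed"


if __name__ == "__main__":
    # Hypothetical (human, mouse) read counts for three wells.
    wells = {"A1": (9950, 50), "A2": (10, 9990), "A3": (600, 400)}
    for name, (human, mouse) in wells.items():
        print(name, classify_well(human, mouse))
```

Wells labeled "mixed" would indicate cross-contamination or tag mis-assignment between cells.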

[0222] The K562 sample, which underwent 5mC / 5hmC detection according to the TSO-Cabernet protocol, contained three standards (C corresponds to the Lambda DNA fragment; 5mC corresponds to the pUC19 DNA fragment; and 5hmC corresponds to the 5hmC-modified DNA fragment), which respectively indicate the detection accuracy for C, 5mC, and 5hmC (Figure 4A). Statistical results show that in 5hmC sequencing, the average proportions of C and 5mC incorrectly identified as 5hmC were 0.576% and 1.54%, respectively, while the average proportion of 5hmC correctly identified was 99.6%; in 5mC sequencing, the average proportion of C incorrectly identified as 5mC was 0.818%, while the average proportion of 5mC correctly identified was 98.3%. Furthermore, to verify the reliability of the TSO-Cabernet workflow in detecting DNA methylation modifications, five single-cell samples were selected and compared, in a genome-wide DNA methylation modification correlation analysis, with samples (including single-cell and multi-cell samples) analyzed using the Cabernet technology disclosed in WO2021077415 (Figure 4B). The figure shows a high degree of similarity between the TSO-Cabernet and Cabernet methods in detecting DNA methylation modifications in the genome.
These results demonstrate that the TSO-Cabernet method can provide reliable detection results for 5mC and 5hmC. The TSO-Cabernet method provided in this disclosure enables simultaneous transformation and library preparation of multiple cells, demonstrating that the method significantly improves throughput while maintaining detection accuracy.

Claims

1. A method for analyzing the methylation characteristics of single-cell genomic DNA, including:
Genomic DNA from a single cell is contacted with multiple transposons, wherein each transposon contains a transposase and transposon DNA, wherein the transposon DNA includes a double-stranded transposase binding site and a dangling splint, wherein the dangling splint includes a first primer-binding sequence at its 5' end and contains uracil nucleotides between the downstream of the first primer-binding sequence and the upstream of the transposase binding site, to obtain a double-stranded genomic DNA fragment containing transposon DNA at each end; wherein (i) the first primer-binding sequence does not contain cytosine nucleotides; or (ii) the first primer-binding sequence contains cytosine nucleotides containing methylated cytosine to tolerate the transformation step;
The gap between the transposon DNA and the genomic DNA fragment is filled with a uracil-intolerant polymerase to form a first double-stranded extension product of the genomic DNA fragment;
The first double-stranded extension product is contacted with a template-converting oligonucleotide, wherein the template-converting oligonucleotide comprises, from the 5' end to the 3' end: a second primer-binding sequence, a tag sequence, and a transposase-binding site binding sequence; wherein the tag sequence has a unique nucleotide sequence corresponding to the cell; and the transposase-binding site binding sequence contained in the template-converting oligonucleotide includes a locked nucleotide (LNA) modification;
An extension reaction is performed using a uracil-intolerant polymerase to obtain a second double-stranded extension product, wherein the second double-stranded extension product constitutes a library of tagged genomic DNA fragments of the cell;
Libraries of tagged genomic DNA fragments obtained from different cells are mixed, and the mixed library is processed to convert cytosine into uracil;
The transformed mixed library is amplified using the first and second primers to generate an amplicon.

2. The method of claim 1, wherein the transposase binding site binding sequence contains multiple LNA modifications.

3. The method of claim 1, wherein the transposase binding site binding sequence contains 2, 3, 4 or 5 LNA modifications.

4. The method of claim 1, wherein the gap-filling and / or extension steps include the use of four nucleotides: A, T, C, and G, wherein the cytosine nucleotides contain methylated cytosine to tolerate the transformation steps.

5. The method of claim 4, wherein the methylated cytosine is 5-methylcytosine (5-mdC) or 5-hydroxymethylcytosine (5-hmdC).

6. The method of claim 1, wherein the methylated cytosine mentioned in (ii) is 5-methylcytosine (5-mdC) or 5-hydroxymethylcytosine (5-hmdC).

7. The method of claim 1, wherein the transposon DNA contains a plurality of consecutive uracil nucleotides between the downstream of the first primer binding sequence and the upstream of the transposase binding site.

8. The method of claim 1, wherein the transposon DNA contains two, three, or four consecutive uracil nucleotides between the downstream of the first primer binding sequence and the upstream of the transposase binding site.

9. The method of claim 1, wherein the 3' end of the first primer binding sequence is adjacent to the uracil nucleotide.

10. The method of claim 1, wherein the 5' end of the transposase binding site is adjacent to the uracil nucleotide.

11. The method according to any one of claims 1-10, wherein the mixed library is processed in the presence of vector DNA to convert cytosine into uracil.

12. The method of claim 11, wherein the vector DNA is selected from dsDNA fragments with a length between 100 bp and 4000 bp.

13. The method of claim 11, wherein the vector DNA is λDNA that has been treated with ultrasound.

14. The method of any one of claims 1-10, further comprising sequencing the amplicon.

15. The method according to any one of claims 1-10, wherein the transformed mixed library is amplified using first and second primers to generate amplicons, each containing a sequencing adapter at its end, thereby generating a sequencing library.

16. The method of claim 15, wherein the sequencing adapter is an Illumina sequencing adapter.

17. The method of claim 16, wherein the first primer introduces a P7 end adapter, and the second primer introduces a P5 end adapter.

18. The method of claim 17, wherein the first primer comprises (i) a first portion at the 3' end containing a sequence capable of hybridizing with the first primer binding sequence, and (ii) a second portion at the 5' end of the first portion containing a P7 sequence; the second primer comprises (i) a first portion at the 3' end containing a sequence capable of hybridizing with the second primer binding sequence, and (ii) a second portion at the 5' end of the first portion containing a P5 sequence.

19. The method of claim 16, wherein the first primer introduces a P5 end adapter, and the second primer introduces a P7 end adapter.

20. The method of claim 19, wherein the first primer comprises (i) a first portion at the 3' end containing a sequence capable of hybridizing with the first primer binding sequence, and (ii) a second portion at the 5' end of the first portion containing the P5 sequence; the second primer comprises (i) a first portion at the 3' end containing a sequence capable of hybridizing with the second primer binding sequence, and (ii) a second portion at the 5' end of the first portion containing the P7 sequence.

21. The method according to any one of claims 1-10, wherein the transposase is a Tn5 transposase, a Mu transposase, a Tn7 transposase, or an IS5 transposase.

22. The method according to any one of claims 1-10, wherein the uracil-intolerant polymerase is selected from Q5 polymerase, Deep Vent DNA polymerase, Phusion high-fidelity polymerase, KAPA high-fidelity polymerase, Phanta polymerase, or any combination thereof.

23. The method according to any one of claims 1-10, wherein the amplification step using the first and second primers is performed using a uracil-resistant polymerase.

24. The method of claim 23, wherein the uracil-resistant polymerase is selected from Q5U polymerase, KAPA U+ polymerase, Phanta Uc polymerase, or any combination thereof.

25. The method of any one of claims 1-10, wherein the bound transposase is removed from the double-stranded genomic DNA fragment prior to gap filling and extension.

26. The method of any one of claims 1-10, further comprising the step of purifying the reaction medium containing the second double-stranded extended product after the extension step but before the conversion step.

27. The method of claim 26, wherein the purification step is performed by DNA spin-column or bead-based DNA purification.

28. The method of any one of claims 1-10, further comprising the step of purifying the reaction medium containing the amplicon after the amplification step.

29. The method of claim 28, wherein the purification step is performed by DNA spin-column or bead-based DNA purification.

30. The method of any one of claims 1-10, wherein the process of converting cytosine to uracil is an enzymatic conversion.

31. The method of claim 30, wherein the enzymatic transformation includes the APOBEC deamination reaction.

32. The method of claim 30, wherein the enzymatic conversion comprises using T4-BGT enzyme and APOBEC3A enzyme, or using TET2 enzyme and APOBEC3A enzyme.

Citation Information

Patent Citations

Transposon end compositions and methods for modifying nucleic acids
EP2376517A1
Methods and compositions for DNA fragmentation and tagging by transposases
EP2527438A1
Methods for quantitative cDNA analysis in single-cell
US20070281313A1
Multiplex decoding of sequence tags in barcodes
US20080269068A1
Nanogrid rolling circle DNA sequencing
US20090018024A1