Method for identifying 5-methylcytosine (5mC) in target nucleic acid
By using sulfinic acid derivatives to reduce TET-oxidized 5mC (5caC) to DHU, the method addresses the template degradation and high cost of existing technologies, enabling highly sensitive and selective 5mC localization sequencing with less damage to DNA samples.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- SHENZHEN HUADA GENE INST
- Filing Date
- 2024-12-13
- Publication Date
- 2026-06-18
Description
A method for identifying 5-methylcytosine (5mC) in target nucleic acids
Technical Field
[0001] This invention belongs to the field of biotechnology and, more specifically, relates to a method for identifying 5-methylcytosine (5mC) in target nucleic acids.
Background Technology
[0002] Under the action of DNA methyltransferases, the carbon-5 position of cytosine in mammalian DNA is methylated, forming 5-methylcytosine (5mC). 5mC is an important epigenetic modification involved in a wide range of biological processes, from genomic imprinting and gene expression regulation to normal development. 5mC can be further oxidized sequentially by ten-eleven translocation (TET) proteins to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxycytosine (5caC), forming new modifications. Abnormal DNA methylation has been associated with various diseases and is a recognized hallmark of cancer. Therefore, the identification of 5mC in DNA sequences is not only essential for basic research but also valuable for clinical applications, including diagnosis and treatment.
[0003] However, the abundance of these DNA modifications is very low, in some cases fewer than one per 10^6 cytosines in DNA, and the chemical structures of the unmodified bases (cytosine, adenine, thymine, guanine) are similar to those of the modified bases, which can cause serious interference when detecting the modified bases. Therefore, single-base resolution localization analysis requires high sensitivity and high selectivity to ensure accurate detection.
[0004] The most widely used methods for DNA methylation detection are bisulfite sequencing (BS) and its derivatives. These methods all employ bisulfite treatment to convert unmodified cytosine into uracil while leaving 5mC intact. After PCR amplification, uracil is read as thymine, allowing the modification status of each cytosine to be derived at single-base resolution (a C-to-T transition indicates the location of an unmethylated cytosine). However, because this approach is based on indirect rather than direct detection, and requires a thorough and large-scale chemical conversion of the biological sample by bisulfite in a harsh environment of high salt and high pH, it results in significant template degradation and loss of the original sequence complexity, which reduces the accuracy of the results and generates redundant data.
[0005] In 2019, Chunxiao Song of Oxford University creatively transformed the original indirect detection (converting the large proportion of unmodified C to T) into direct detection (converting the small proportion of 5mC to T), a technique known as TET-assisted pyridine borane sequencing (TAPS). The principle of TAPS is briefly described as follows: 5mC is first oxidized to 5caC by TET enzymes, which is then reduced to dihydrouracil (DHU) by pyridine borane. DHU can be recognized as uracil by most DNA polymerases for PCR amplification, thus achieving differential amplification of 5mC (TA base pair amplification) and unmodified C (CG base pair amplification). However, the pyridine borane used in this method is a flammable and explosive chemical with poor stability and relatively high cost.
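The difference between the indirect and direct read-outs described above can be illustrated with a small sketch. This is a toy model and not part of the original disclosure: the template is a string in which a lowercase `m` marks a hypothetical 5mC position, conversion is assumed to be 100% efficient, and the function names are purely illustrative.

```python
def bisulfite_readout(template: str) -> str:
    """Indirect detection: unmodified C reads as T, 5mC ('m') still reads as C."""
    return "".join({"C": "T", "m": "C"}.get(base, base) for base in template)


def direct_readout(template: str) -> str:
    """Direct detection (TAPS-like): 5mC ('m') reads as T, unmodified C stays C."""
    return "".join({"m": "T"}.get(base, base) for base in template)


template = "ACmGTCCGA"  # hypothetical strand; 'm' marks a 5mC
print(bisulfite_readout(template))  # ATCGTTTGA
print(direct_readout(template))     # ACTGTCCGA
```

In the toy strand, the indirect read-out changes three of the four cytosine positions, whereas the direct read-out changes only the single modified position, which is why direct detection preserves more of the original sequence complexity.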
[0006] In summary, given the shortcomings of existing technologies, there is still an urgent need in this field for a safe, bisulfite-free, highly sensitive, and highly selective 5mC localization sequencing method with single-base resolution that can be applied to the field of DNA modification sequencing.
Summary of the Invention
[0007] To address the problems existing in the prior art, the present invention aims to provide a single-base resolution localization method for 5-methylcytosine-modified DNA that combines TET enzyme oxidation with a sulfinic acid derivative reducing agent of formula R¹SO₂R² or R²SO₂-SO₂R².
[0008] Therefore, in a first aspect, the present invention provides a method for converting 5-methylcytosine (5mC) in a target nucleic acid into dihydrouracil (DHU), the method comprising the following steps:
[0009] a) Provide a nucleic acid sample containing a target nucleic acid, wherein the target nucleic acid contains 5-methylcytosine (5mC);
[0010] b) Process the nucleic acid sample to convert 5mC to 5-carboxycytosine (5caC) to obtain the first converted target nucleic acid;
[0011] c) The first converted target nucleic acid is further treated with a sulfinic acid derivative to reduce the 5caC to dihydrouracil (DHU) to obtain the second converted target nucleic acid.
[0012] In a second aspect, the present invention provides a method for identifying 5-methylcytosine (5mC) in a target nucleic acid, the method comprising the following steps:
[0013] a') Provide nucleic acid samples containing the target nucleic acid;
[0014] b) Process the nucleic acid sample to convert 5mC to 5-carboxycytosine (5caC) to obtain the first converted target nucleic acid;
[0015] c) The first converted target nucleic acid is further treated with a sulfinic acid derivative to reduce the 5caC to dihydrouracil (DHU) to obtain the second converted target nucleic acid;
[0016] d) Perform sequence analysis on the second converted target nucleic acid.
[0017] In a third aspect, the present invention provides a kit comprising a sulfinic acid derivative.
[0018] The beneficial effects of the present invention include at least one or more of the following:
[0019] This invention provides a safe, bisulfite-free, highly sensitive, and highly selective single-base resolution method for localizing carboxylated or methylated cytosine in the field of DNA modification sequencing.
[0020] This invention is simple to operate, low in cost, and the sulfinic acid derivative reducing agent used is chemically stable and safer, while also better reducing damage to DNA samples.
Attached Figure Description
[0021] Figure 1 shows the molecular weight determination results of the product obtained in Example 1.
Detailed Implementation
[0022] The present invention will be described in detail below. It should be understood that the following description is merely illustrative and is not intended to limit the scope of the invention; the scope of protection of the invention is defined by the appended claims. Furthermore, those skilled in the art will understand that modifications can be made to the technical solutions of the present invention without departing from its spirit and intent. Unless otherwise specified, the technical means used in the embodiments are conventional means well known to those skilled in the art.
[0023] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the subject matter pertains. Before a detailed description of the invention, the following definitions are provided to better understand it.
[0024] In the context of this invention, many embodiments use the expressions "comprising," "including," or "basically / mainly composed of...". The expressions "comprising," "including," or "basically / mainly composed of..." are generally understood as open-ended expressions, indicating that they include not only the elements, components, parts, or method steps specifically listed after the expression, but also other elements, components, parts, or method steps. Additionally, in this document, the expressions "comprising," "including," or "basically / mainly composed of..." can also be understood as closed-ended expressions in certain circumstances, indicating that they include only the elements, components, parts, or method steps specifically listed after the expression, and exclude any other elements, components, parts, or method steps. Furthermore, in the context of this invention, many embodiments use the expression "composed of...", which should be understood as a closed-ended expression, indicating that it includes only the elements, components, parts, or method steps specifically listed after the expression, and excludes any other elements, components, parts, or method steps.
[0025] In cases where numerical ranges are provided, such as concentration ranges, percentage ranges, or ratio ranges, it should be understood that, unless the context explicitly specifies otherwise, all intermediate values between the upper and lower limits of the range, separated by one-tenth of the unit of the lower limit, and any other values or intermediate values within the range, are included within the subject matter. The upper and lower limits of these smaller ranges may be independently included in the smaller ranges, and such embodiments are also included within the subject matter, limited by any specific excluded limit values within the range. Where the range includes one or two limit values, the range excluding any one or both of those included limit values is also included within the subject matter.
[0026] To better understand this teaching and without limiting its scope, unless otherwise indicated, all figures and other numerical values used in the specification and claims to express quantities, percentages, or proportions should in all cases be understood to be modified by the term "about," which can be defined as a range comprising ±10% of that value or endpoint. Therefore, unless otherwise indicated, the numerical parameters set forth in the following specification and appended claims are approximate values that may vary depending on the desired properties sought. At a minimum, each numerical parameter should be interpreted based at least on the reported significant figures and by applying common rounding techniques.
[0027] As used herein and in the appended claims, unless the context clearly specifies otherwise, the indefinite articles (“a”, “an”) and definite articles (“the”) in the singular form include the plural referent. Similarly, the terms indefinite article (“a”, “an”), “one or more”, and “at least one” are used interchangeably herein.
[0028] When a group of substituents is disclosed herein, it should be understood that all individual members of that group and all subgroups (including any isomers, enantiomers, and diastereomers of members of that group) are disclosed separately. When the Markush group or other groupings are used herein, all individual members of that group and all possible combinations and subcombinations of that group are intended to be included separately in this disclosure. When a compound is described herein without specifying a particular isomer, enantiomer, or diastereomer, such as in a formula or chemical name, the description is intended to include each isomer and enantiomer of the compound described separately, or any combination thereof.
[0029] In a first aspect, the present invention provides a method for converting 5-methylcytosine (5mC) in a target nucleic acid into dihydrouracil (DHU), the method comprising the following steps:
[0030] a) Provide a nucleic acid sample containing a target nucleic acid, wherein the target nucleic acid contains 5-methylcytosine (5mC);
[0031] b) Process the nucleic acid sample to convert 5mC to 5-carboxycytosine (5caC) to obtain the first converted target nucleic acid;
[0032] c) The first converted target nucleic acid is further treated with a sulfinic acid derivative to reduce the 5caC to dihydrouracil (DHU) to obtain the second converted target nucleic acid.
[0033] In a second aspect, the present invention provides a method for identifying 5-methylcytosine (5mC) in a target nucleic acid, the method comprising the following steps:
[0034] a') Provide nucleic acid samples containing the target nucleic acid;
[0035] b) Process the nucleic acid sample to convert 5mC to 5-carboxycytosine (5caC) to obtain the first converted target nucleic acid;
[0036] c) The first converted target nucleic acid is further treated with a sulfinic acid derivative to reduce the 5caC to dihydrouracil (DHU) to obtain the second converted target nucleic acid;
[0037] d) Perform sequence analysis on the second converted target nucleic acid.
[0038] As used herein, the term "identification" can be understood as (1) determining the presence of 5-methylcytosine in the target nucleic acid based on the presence of a conversion from 5-methylcytosine (5mC) to thymine (T) in the new strand of the amplified product, and (2) if present, further determining the position of 5-methylcytosine in the target nucleic acid based on the conversion from 5-methylcytosine (5mC) to thymine (T) in the new strand of the amplified product. Furthermore, the term "identification" can also be understood as providing a quantitative level of 5mC in the target nucleic acid based on the percentage of thymine (T) at all conversion sites in the new strand of the amplified product.
[0039] The steps of the method of the first and second aspects of the present invention will be described in detail below.
[0040] Step a) or a'): Provide a nucleic acid sample containing the target nucleic acid.
[0041] In the context of this invention, one objective is to convert modified cytosine (if present) in a target nucleic acid into a detectable form. Therefore, the target nucleic acid can be a modified target nucleic acid, more specifically, a target nucleic acid in which some or all of the cytosine is modified with a methyl or carboxyl group. For the conversion method of the first aspect of this invention, the target nucleic acid contains 5-methylcytosine (5mC), while for the identification method of the second aspect of this invention, the target nucleic acid may or may not contain 5-methylcytosine (5mC), and whether or not it contains 5-methylcytosine (5mC) can be determined by the identification method of this invention.
[0042] Furthermore, those skilled in the art will understand that if the cytosine in the target nucleic acid is modified with a carboxyl group, the cytosine can also be transformed or identified using the modified method of the present invention.
[0043] Nucleic acid modification, particularly methylation, plays a crucial role in growth and development, environmental responses, and disease development, especially tumorigenesis. A comprehensive description of nucleic acid modification patterns is in high demand for disease mechanism research, diagnosis, and treatment. The nucleic acid samples can be derived from blood, body fluids, or cells.
[0044] In one embodiment, prior to subsequent steps such as step b), the method of the second aspect of the invention may further include a step of fragmenting the target nucleic acid to obtain nucleic acid fragments.
[0045] This step is optional and may or may not be included, depending on the circumstances. The inventors have discovered that excessively long target nucleic acids have a significant negative impact on the accuracy of subsequent sequencing results. Therefore, when the target nucleic acid is too long, fragmentation can be considered to obtain nucleic acid fragments of the desired length. Target nucleic acid fragmentation can employ conventional methods in the art, including but not limited to: using fragmentation enzymes (e.g., Tn5 transposase, DNase I, Endonuclease V, Fragmentase, etc.), sonication, etc.
[0046] Therefore, in one embodiment, the target nucleic acid or the nucleic acid fragment has 11-255 nucleotides, for example, 20-40, 50-70, 80-100, 110-130, 140-160, or 170-190 nucleotides. In a preferred embodiment, the target nucleic acid or the nucleic acid fragment has 70-80 nucleotides. In a most preferred embodiment, the target nucleic acid or the nucleic acid fragment has 74 nucleotides.
[0047] Step b): Process the nucleic acid sample to convert 5mC to 5-carboxycytosine (5caC), thereby obtaining the first converted target nucleic acid.
[0048] The nucleic acid sample can be processed using any method known in the art suitable for converting 5mC to 5caC to obtain a first converted target nucleic acid.
[0049] Those skilled in the art will understand that, in the method of the second aspect of the present invention, when a nucleic acid fragmentation step is included between step a') and step b), the expression "processing the nucleic acid sample" in step b) refers to processing the nucleic acid fragment.
[0050] In some embodiments, the nucleic acid sample or the nucleic acid fragment is treated with a TET enzyme in step b) to obtain the first converted target nucleic acid.
[0051] TET enzymes are α-ketoglutarate (α-KG)- and Fe²⁺-dependent dioxygenases found in living organisms. They can catalyze the conversion of 5-methylcytosine (5mC) into 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxycytosine (5caC), forming new modifications.
[0052] In some preferred embodiments, the TET enzyme is selected from human TET1, TET2 and TET3, mouse Tet1, Tet2 and Tet3, NgTET, CcTET, and their derivatives or analogs.
[0053] Step c): The first converted target nucleic acid is further treated with a sulfinic acid derivative to reduce the 5caC to dihydrouracil (DHU), thereby obtaining the second converted target nucleic acid.
[0054] Existing technologies include various DNA methylation analysis methods based on initial DNA sample processing, such as bisulfite sequencing and TET enzyme-assisted pyridine borane conversion sequencing. As described in the background section of this application, bisulfite sequencing is based on indirect detection rather than direct detection. It requires thorough and large-scale chemical conversion of the biological sample by bisulfite in a harsh environment of high salt and high pH, resulting in significant template degradation and loss of the original sequence complexity, which reduces the accuracy of the results and generates redundant data. TET enzyme-assisted pyridine borane conversion sequencing, on the other hand, is a direct detection method, but the pyridine borane used is a flammable and explosive chemical with poor stability and relatively high cost.
[0055] In response to the aforementioned problems in the existing technology, the inventors unexpectedly discovered through repeated research that using sulfinic acid derivatives as a reducing agent can reduce 5caC to dihydrouracil (DHU). The method of this invention, using sulfinic acid derivatives as a reducing agent, is simple to operate, low in cost, and the sulfinic acid derivative reducing agent used is chemically stable and safer, while also better reducing damage to DNA samples.
[0056] In some embodiments, the sulfinic acid derivative has the structural formula R¹SO₂R² or R²SO₂-SO₂R², wherein:
[0057] R¹ is selected from formamidinyl, methyl, ethyl, n-propyl, isopropyl, n-butyl, isobutyl, sec-butyl, tert-butyl, n-pentyl, isopentyl, sec-pentyl, neopentyl, n-hexyl, hydroxymethyl, substituted C1-C6 alkyl, aryl, heterocyclyl, substituted aryl, and substituted heterocyclyl groups;
[0058] R² is hydrogen or a metal ion selected from the group consisting of: lithium ion, sodium ion, potassium ion, magnesium ion, calcium ion, strontium ion, barium ion, manganese ion, iron ion, cobalt ion, nickel ion, copper ion, zinc ion, silver ion, gold ion, molybdenum ion, and tungsten ion.
[0059] In this document, the term "C1-C6 alkyl" refers to an alkyl group with 1 to 6 carbon atoms in its main chain, for example, the main chain may have 1, 2, 3, 4, 5 or 6 carbon atoms.
[0060] In some embodiments, R¹ is formamidinyl.
[0061] In some preferred embodiments, R² is hydrogen, a lithium ion, a sodium ion, or a potassium ion.
[0062] In some embodiments, the sulfinic acid derivative is a dithionite, such as sodium dithionite, lithium dithionite, or potassium dithionite. Dithionites typically have strong reducing properties and can effectively reduce 5caC to dihydrouracil (DHU).
[0063] In some preferred embodiments, the sulfinic acid derivative is formamidine sulfinic acid.
[0064] Furthermore, the inventors optimized the method conditions and found that buffer concentrations greater than 4M or less than 0.5M, or pH conditions less than 4.65 or greater than 8.0, all adversely affected the final conversion rate. In addition, the inventors also discovered that the solvent is essential for the method of this invention; without the solvent, the technical effects of this invention cannot be achieved.
[0065] Therefore, in some embodiments, step c) is carried out in a mixed solution of buffer and solvent, wherein the concentration of the buffer is 0.5-4M and the pH is 4.65-8.0. For example, the concentration of the buffer can be 0.5M, 0.8M, 1M, 1.2M, 1.5M, 1.8M, 2M, 2.2M, 2.5M, 2.8M, 3M, 3.2M, 3.5M, 3.8M or 4M; and the pH of the buffer can be 4.65, 4.8, 5.0, 5.2, 5.5, 5.8, 6.0, 6.2, 6.5, 6.8, 7.0, 7.2, 7.5, 7.8 or 8.0.
[0066] In some preferred embodiments, the concentration of the buffer solution is 3M and the pH is 5.2.
[0067] In some embodiments, the buffer solution may be sodium acetate buffer, TE buffer, or Tris-HCl buffer. In some preferred embodiments, the buffer solution is sodium acetate buffer.
[0068] In some embodiments, the solvent may be N,N-dimethylformamide (DMF), N,N-dimethylacetamide (DMA), or dimethyl sulfoxide (DMSO). In some preferred embodiments, the solvent is DMF.
[0069] In some embodiments, the volume ratio of the buffer solution to the solvent in the mixed solution can be (1-3):1, for example 1:1, 1.2:1, 1.5:1, 1.8:1, 2:1, 2.2:1, 2.5:1, 2.8:1 or 3:1.
[0070] In optimizing the method conditions, the inventors also discovered that lower reaction temperatures (e.g., 37°C) lead to lower conversion rates, while increasing the reaction temperature (e.g., to 60°C) can significantly improve the conversion rate. Furthermore, the inventors found that excessively long reaction times also negatively impact the final conversion rate.
[0071] Therefore, in some embodiments, step c) is carried out at a reaction temperature of 50-70°C (e.g., 50°C, 55°C, 60°C, 65°C or 70°C).
[0072] In some preferred embodiments, the reaction temperature is 60°C.
[0073] In some embodiments, the reaction is carried out for 24-48 hours.
[0074] In some preferred embodiments, the reaction is carried out for 24 hours.
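The condition ranges given above for step c) can be gathered into a simple check. The following is a minimal sketch rather than part of the disclosed method: the class and function names are hypothetical, the ranges are taken literally from the text (without applying the "about ±10%" reading of numerical values), and the 30 µL buffer and 20 µL solvent volumes are borrowed from Example 1.

```python
from dataclasses import dataclass


@dataclass
class ReductionConditions:
    buffer_molar: float    # buffer concentration, M
    ph: float              # buffer pH
    buffer_ul: float       # buffer volume, µL
    solvent_ul: float      # organic solvent volume, µL
    temperature_c: float   # reaction temperature, °C
    hours: float           # reaction time, h


def out_of_range(c: ReductionConditions) -> list[str]:
    """Return the names of parameters that fall outside the stated ranges."""
    # The text states that the solvent is essential, so zero solvent always fails.
    ratio = c.buffer_ul / c.solvent_ul if c.solvent_ul > 0 else float("inf")
    checks = {
        "buffer_molar": 0.5 <= c.buffer_molar <= 4.0,
        "ph": 4.65 <= c.ph <= 8.0,
        "buffer:solvent ratio": 1.0 <= ratio <= 3.0,
        "temperature_c": 50.0 <= c.temperature_c <= 70.0,
        "hours": 24.0 <= c.hours <= 48.0,
    }
    return [name for name, ok in checks.items() if not ok]


preferred = ReductionConditions(3.0, 5.2, 30.0, 20.0, 60.0, 24.0)
print(out_of_range(preferred))  # []
```

With the preferred values (3 M, pH 5.2, 60 °C, 24 hours) nothing is flagged, while a run at 37 °C would be reported as outside the stated temperature range.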
[0075] In some embodiments, the method further includes, between step c) and subsequent step d), constructing a sequencing library based on the second converted target nucleic acid.
[0076] Sequencing libraries can be constructed using any method known in the art. For example, sequencing libraries can be constructed using adapter ligation, transposase ligation, single-strand circularization, etc. DNA nanoball (DNB) libraries can be generated using single-strand circularization and then further used for sequencing and analysis.
[0077] Step d): Perform sequence analysis on the second converted target nucleic acid.
[0078] In one embodiment, the sequence analysis includes amplification and sequencing of the second converted target nucleic acid.
[0079] Conventional methods can be used to amplify and sequence DNA or sequencing libraries. The location and quantitative level of 5mC in the target nucleic acid can be obtained by analyzing the sequencing data. The conversion of 5-methylcytosine (5mC) to thymine (T) in the new strand of the amplified product indicates the presence of 5-methylcytosine (5mC) in the target nucleic acid and indicates the location of 5mC in the target nucleic acid. The percentage of T at all conversion sites in the new strand of the amplified product provides the quantitative level of 5mC in the target nucleic acid.
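The read-out described in step d) can be sketched as a per-position count. This is only an illustration under simplifying assumptions that are not in the original: reads are taken to be already aligned to the reference, gap-free and full length, only the forward strand is considered (so the complementary G-to-A signal is ignored), and the function name and the toy sequences are hypothetical.

```python
from collections import Counter


def t_fraction_at_cytosines(reference: str, reads: list[str]) -> dict[int, float]:
    """For each reference C (0-based position), return the fraction of reads showing T."""
    fractions = {}
    for pos, ref_base in enumerate(reference):
        if ref_base != "C":
            continue
        counts = Counter(read[pos] for read in reads)
        informative = counts["C"] + counts["T"]  # ignore other calls (errors, N)
        if informative:
            fractions[pos] = counts["T"] / informative
    return fractions


reference = "ACGTCGA"
reads = ["ACGTTGA", "ACGTTGA", "ACGTCGA", "ACGTTGA"]
print(t_fraction_at_cytosines(reference, reads))  # {1: 0.0, 4: 0.75}
```

A position with a T fraction well above the background error rate would be called as modified, and the fraction itself serves as the quantitative level at that site.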
[0080] In a third aspect, the present invention provides a kit comprising a sulfinic acid derivative.
[0081] In some embodiments, the sulfinic acid derivative has the structural formula R¹SO₂R² or R²SO₂-SO₂R², wherein:
[0082] R¹ is selected from formamidinyl, methyl, ethyl, n-propyl, isopropyl, n-butyl, isobutyl, sec-butyl, tert-butyl, n-pentyl, isopentyl, sec-pentyl, neopentyl, n-hexyl, hydroxymethyl, substituted C1-C6 alkyl, aryl, heterocyclyl, substituted aryl, and substituted heterocyclyl groups;
[0083] R² is hydrogen or a metal ion selected from the group consisting of: lithium ion, sodium ion, potassium ion, magnesium ion, calcium ion, strontium ion, barium ion, manganese ion, iron ion, cobalt ion, nickel ion, copper ion, zinc ion, silver ion, gold ion, molybdenum ion, and tungsten ion.
[0084] In this document, the term "C1-C6 alkyl" refers to an alkyl group with 1 to 6 carbon atoms in its main chain, for example, the main chain can have 1, 2, 3, 4, 5 or 6 carbon atoms.
[0085] In some embodiments, R¹ is formamidinyl.
[0086] In some preferred embodiments, R² is hydrogen, a lithium ion, a sodium ion, or a potassium ion.
[0087] In some embodiments, the sulfinic acid derivative is a dithionite, such as sodium dithionite, lithium dithionite, or potassium dithionite. Dithionites typically have strong reducing properties and can effectively reduce 5caC to dihydrouracil (DHU).
[0088] In some preferred embodiments, the sulfinic acid derivative is formamidine sulfinic acid.
[0089] In some embodiments, the kit also includes a TET enzyme.
[0090] In some embodiments, the TET enzyme is selected from human TET1, TET2 and TET3, mouse Tet1, Tet2 and Tet3, NgTET, CcTET, and their derivatives or analogs.
[0091] In some embodiments, the kit further includes a buffer solution with a concentration of 0.5-4M and a pH of 4.65-8.0, such as sodium acetate buffer, TE buffer, or Tris-HCl buffer.
[0092] In some preferred embodiments, the buffer solution is a sodium acetate buffer solution with a concentration of 3M and a pH of 5.2.
[0093] In some embodiments, the kit further includes solvents such as N,N-dimethylformamide (DMF), N,N-dimethylacetamide (DMA), or dimethyl sulfoxide (DMSO).
[0094] In some preferred embodiments, the solvent is DMF.
[0095] It is understood that the descriptions of the first and second aspects of the present invention given above also apply to the third aspect of the present invention. Therefore, for the sake of brevity, they will not be repeated here.
[0096] Example
[0097] The embodiments of the present invention will be described in detail below with reference to examples. Those skilled in the art will understand that the following examples are merely illustrative and should not be considered as limiting the scope of the present invention. Where specific techniques or conditions are not specified in the examples, they are performed according to the techniques or conditions described in the literature in the art or according to the product instructions. Reagents or instruments whose manufacturers are not specified are all conventional products that can be obtained commercially.
[0098] Example 1: Conversion of carboxylated DNA by formamidine sulfinic acid
[0099] The target nucleic acid used was a pre-made target nucleic acid with carboxylation modification, the DNA sequence of which was (5' to 3'): TCGAC5caCGGATC (purchased from Changzhou Xinyisheng Life Technology Co., Ltd.).
[0100] Sulfinic acid derivative reduction: 2.16 mg of formamidine sulfinic acid was added to a mixed solvent containing 30 μL of sodium acetate buffer (pH 5.2, 3 M) and 20 μL of N,N-dimethylformamide (DMF), followed by the addition of 50–300 ng of oxidized DNA dissolved in water. The reaction was carried out in a Thermo-Shaker at 60 °C and 1500 rpm for 24 hours. The reaction product was detected by mass spectrometry, and the results are shown in Figure 1. The theoretical molecular weight of the product was 3320.2, and the measured value was 3319.9. These results demonstrate that formamidine sulfinic acid, as a reducing agent, can successfully convert 5caC in DNA to dihydrouracil (DHU).
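The agreement between the theoretical and measured molecular weights reported in this example can be expressed as a mass error. The short calculation below uses only the two values stated in the text; the variable names are illustrative, and no acceptance threshold is implied because the source does not give one.

```python
theoretical_mass = 3320.2  # Da, theoretical molecular weight stated for the DHU product
measured_mass = 3319.9     # Da, measured value stated in the example

delta_da = measured_mass - theoretical_mass        # absolute error in Da
delta_ppm = delta_da / theoretical_mass * 1e6      # relative error in parts per million

print(f"mass error: {delta_da:+.1f} Da ({delta_ppm:+.0f} ppm)")
```

The deviation is about 0.3 Da on a mass of roughly 3320 Da, that is, on the order of 0.01%.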
[0101] Example 2: Conversion and sequencing of carboxylated DNA by formamidine sulfinic acid
[0102] 1. Preparation of template DNA:
[0103] A 74 bp DNA template was synthesized directly using the following sequence, which contains a 5caC on the sense strand, thus eliminating the need for TET enzyme oxidation.
[0104] 2. Conversion:
[0105] Sulfinic acid derivative reduction: Formamidine sulfinic acid (2.16 mg) was added to a mixed solvent containing 30 μL sodium acetate buffer (pH = 5.2, 3 M) and 20 μL DMF, followed by the addition of 50–300 ng of the synthesized DNA dissolved in water. The reaction was carried out in a Thermo-Shaker at 37 °C and 1500 rpm for 24 hours. After the reaction was complete, the DNA was purified using the DNA Cleanup Columns (5 μg) purification kit (NEB, catalog number: T1034L).
[0106] 3. Library construction:
[0107] The purified DNA samples were used for library construction and sequencing with the BGI DNA Library Construction Kit (Enzyme Digestion DNA Library Preparation Kit, BGI Genomics, Product No.: 1000006987).
[0108] The specific steps for constructing the library are as follows:
[0109] 1: End repair and addition of dA tail
[0110] 1.1 Take 5 ng of DNA for end repair. The volume should be ≤40 μL. If the volume is less than 40 μL, make up the difference with TE Buffer.
[0111] 1.2 Prepare the end-repair reaction solution, the composition of which is shown in the table below:
[0112] 1.3 Add the prepared reaction solution to the DNA sample, vortex and centrifuge the resulting mixture to the bottom of the PCR tube, and carry out the reaction according to the reaction conditions shown in the table below.
[0113] 2: Adapter ligation
[0114] 2.1 Dilute the MGI Adapter in the kit 10-fold, then take 5 μL of the diluted adapter and add it to the PCR tube from step 1.3.
[0115] 2.2 Prepare the reaction solution on ice (the composition of the reaction solution is shown in the table below), and add the prepared reaction solution to the PCR tube from step 2.1.
[0116] 2.3 Carry out the reaction under the conditions shown in the table below. After the reaction is complete, add 20 μL of TE Buffer.
[0117] 2.4 Add twice the volume of magnetic beads to purify the product. During elution, add 21 μL of TE buffer, and finally take 19 μL for subsequent amplification.
[0118] 3: PCR amplification
[0119] 3.1 Prepare the reaction solution on ice, the composition of which is shown in the table below:
[0120] 3.2 Add the above reaction solution to the sample from step 2.4, and perform DNA PCR amplification in a PCR instrument according to the reaction conditions shown in the table below.
[0121] 3.3 Add 2 times the volume of magnetic beads to the amplified sample for purification. During elution, add 32 μL of TE Buffer. Finally, take out 32 μL for subsequent circularization.
[0122] 4: DNA circularization and preparation of DNB
[0123] 4.1 Take 25 ng of the DNA sample obtained in step 3.3 and add TE buffer to a total volume of 34 μL. Then add 2 μL of ad153 splint oligo (GCCATGTCGTTCTGTGAGCCAAGG, 10 μM) and 4 μL of 10x phi29 buffer. Perform the reaction in a PCR instrument according to the reaction conditions shown in the table below.
[0124] 4.2 Place the circularized sample obtained in 4.1 on ice and add 40 μL of DNA Nanoball (DNB) polymerase buffer I and 4 μL of DNB polymerase mixture II. Prepare DNB according to the reaction conditions shown in the table below.
[0125] 4.3 After the reaction is complete, quickly add 20 μL of stop DNB buffer to the PCR tube and gently invert the PCR tube 10 times, avoiding vigorous shaking and centrifugation. Use a single-stranded DNA dye to determine the concentration of the prepared DNB.
[0126] 5: DNB sample loading
[0127] 5.1 Prepare the sequencing sample by combining the DNB sample obtained in step 4.3 above (30%) with a balanced library (70%). The balanced library is prepared as follows: obtain 200-300 bp genomic fragments from E. coli by sonication and double selection, then prepare a library from these fragments according to steps 1-3.
[0128] 5.2 Dilute the sequencing sample obtained in step 5.1 with DNB load buffer I to prepare a mixed DNB sample with a mixed DNB concentration of 10 ng / μL.
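The dilution in step 5.2 follows the usual C1·V1 = C2·V2 relationship. The sketch below is illustrative only: the function name, the stock concentration of 40 ng/μL, and the final volume of 20 μL are assumptions, and only the 10 ng/μL target comes from the step above.

```python
def dilution_volumes(stock_conc, final_volume_ul, target_conc=10.0):
    """Return (stock_ul, buffer_ul) needed to reach target_conc in final_volume_ul.

    Concentrations are in ng/uL; the target defaults to the 10 ng/uL of step 5.2.
    """
    if stock_conc < target_conc:
        raise ValueError("stock is already below the target concentration")
    # C1 * V1 = C2 * V2, solved for V1.
    stock_ul = target_conc * final_volume_ul / stock_conc
    # The remainder of the final volume is DNB load buffer I.
    buffer_ul = final_volume_ul - stock_ul
    return stock_ul, buffer_ul


# Hypothetical stock at 40 ng/uL diluted to 20 uL.
stock_ul, buffer_ul = dilution_volumes(40.0, 20.0)
print(stock_ul, buffer_ul)
```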
[0129] 5.3 The prepared diluted DNB mixed sample is mixed with DNB Load Buffer II according to the volumes shown in the table below to prepare a mixture for loading onto the sequencing chip.
[0130] 5.4 Load the mixture prepared in step 5.3 onto the MGI sequencing chip for subsequent sequencing.
[0131] 6: Sequencing
[0132] The MGI sequencing chip described above was sequenced using a sequencer from BGI Genomics to obtain sequencing results.
[0133] 7: Results
[0134] Sequencing results showed that the conversion rate of 5caC (conversion of 5caC to DHU, where conversion rate refers to the percentage of T at each conversion site in the amplified new strand) was 34.42%.
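The conversion rate defined above can be illustrated with a minimal sketch. It assumes that the reads are already aligned to the template without insertions or deletions, so that each conversion site has a fixed 0-based index, and that an unconverted site is read as C. The function name, the toy reads, and the site indices are hypothetical and are not data from this example.

```python
def conversion_rate(reads, sites):
    """Percentage of T calls over all covered conversion sites.

    reads: iterable of read strings from the amplified new strand.
    sites: 0-based positions of the conversion sites in the template.
    """
    t_calls = 0
    total_calls = 0
    for read in reads:
        for pos in sites:
            # Skip positions the read does not cover or ambiguous calls such as N.
            if pos < len(read) and read[pos] in "ACGT":
                total_calls += 1
                if read[pos] == "T":
                    t_calls += 1
    if total_calls == 0:
        raise ValueError("no coverage at the conversion sites")
    return 100.0 * t_calls / total_calls


# Toy reads with two hypothetical conversion sites at indices 1 and 3.
toy_reads = ["ATGT", "ACGT", "ATGC", "ACGC"]
print(conversion_rate(toy_reads, [1, 3]))
```

In this toy input, 4 of the 8 site calls are T, so the function reports 50.0.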
[0135] Example 3: First round of optimization of sulfinic acid derivative types
[0136] In this embodiment, the same synthetic DNA template sequence as in Example 2 is used.
[0137] In the reduction step of the sulfinic acid derivative, a method similar to that in Example 2 was used, except that the type of sulfinic acid derivative was adjusted: sodium hydroxymethyl sulfinate (3.00 mg) or sodium dithionite (3.48 mg) was used instead of formamidine sulfinic acid (2.16 mg) for the reaction. After the reaction was completed, the DNA was purified using DNA Cleanup Columns (5 μg) (NEB, Catalog No.: T1034L).
[0138] Library construction and sequencing were performed using the same methods as in Example 2. Sequencing results showed that the conversion rates of 5caC using sodium hydroxymethyl sulfinate and sodium dithionite were 5.68% and 32.46%, respectively, both lower than the 34.42% obtained in Example 2. Therefore, the use of formamidine sulfinic acid for the reaction is preferred.
[0139] Example 4: Buffer Type Optimization
[0140] In this embodiment, the same synthetic DNA template sequence as in Example 2 is used.
[0141] In the reduction step of the sulfinic acid derivative, a method similar to that in Example 2 was used, except that the type of buffer was adjusted. Specifically, TE buffer (pH = 8, containing 10 mM Tris and 1 mM EDTA) or Tris-HCl (pH = 8, 1 M) was used instead of sodium acetate buffer (pH = 5.2, 3 M) for the reaction. After the reaction was completed, the DNA was purified using DNA Cleanup Columns (5 μg) (NEB, Catalog No.: T1034L).
[0142] Library construction and sequencing were performed using the same methods as in Example 2. Sequencing results showed that the conversion rates of 5caC using TE buffer and Tris-HCl were 28.76% and 18.02%, respectively. Therefore, using sodium acetate buffer as the reaction buffer is preferred.
[0143] Example 5: First round of optimization of buffer concentration
[0144] In this embodiment, the same synthetic DNA template sequence as in Example 2 is used.
[0145] In the reduction step of the sulfinic acid derivative, a method similar to that in Example 2 was used, except that the concentration of the sodium acetate buffer was adjusted. Specifically, sodium acetate buffers (pH = 5.2) with concentrations of 0.5 M, 2 M, 3 M, and 4 M were used for the reaction. After the reaction was completed, the DNA was purified using DNA Cleanup Columns (5 μg) (NEB, Catalog No.: T1034L).
[0146] Library construction and sequencing were performed using the same methods as in Example 2. Sequencing results showed that the conversion rates of 5caC using 0.5M, 2M, 3M, and 4M sodium acetate buffer (pH = 5.2) were 21.02%, 5.68%, 34.42%, and 19.86%, respectively. Therefore, using a 3M sodium acetate buffer solution is preferred for the reaction.
[0147] Example 6: Buffer pH Optimization
[0148] In this embodiment, the same synthetic DNA template sequence as in Example 2 is used.
[0149] In the reduction step of the sulfinic acid derivative, a method similar to that in Example 2 was used, except that the pH of the sodium acetate buffer was adjusted. Specifically, sodium acetate buffers (3 M) with pH values of 4.65 and 6.0 were used for the reaction. After the reaction was completed, the DNA was purified using DNA Cleanup Columns (5 μg) (NEB, Catalog No.: T1034L).
[0150] Library construction and sequencing were performed using the same methods as in Example 2. Sequencing results showed that the conversion rates of 5caC using sodium acetate buffer (3M) at pH 4.65 and 6.0 were 10.52% and 32.5%, respectively. Therefore, using sodium acetate buffer solution at pH 5.2 is preferred for the reaction.
[0151] Example 7: Optimization of Reduction Reaction Temperature
[0152] In this embodiment, the same synthetic DNA template sequence as in Example 2 is used.
[0153] In the reduction step of the sulfinic acid derivative, a method similar to that in Example 2 was used, except that the reaction temperature was adjusted to 60°C. After the reaction was completed, the DNA was purified using DNA Cleanup Columns (5 μg) (NEB, Catalog No.: T1034L).
[0154] Library construction and sequencing were performed using the same methods as in Example 2. Sequencing results showed that the conversion rate of 5caC at 60°C was 99.24%. Therefore, conducting the reaction at 60°C is preferred.
[0155] Example 8: Second round of optimization of sulfinic acid derivative types
[0156] In this embodiment, the same synthetic DNA template sequence as in Example 2 is used.
[0157] In the reduction step of the sulfinic acid derivative, a method similar to that in Example 7 was used, except that the type of sulfinic acid derivative was adjusted: sodium dithionite (3.48 mg) was used instead of formamidine sulfinic acid (2.16 mg) for the reaction. After the reaction was completed, the DNA was purified using DNA Cleanup Columns (5 μg) (NEB, Catalog No.: T1034L).
[0158] Library construction and sequencing were performed using the same methods as in Example 2. Sequencing results showed that the conversion rate of 5caC using sodium dithionite at 60°C was 59.68%, compared with 99.24% in Example 7. Therefore, the use of formamidine sulfinic acid at a reaction temperature of 60°C remains preferred.
[0159] Example 9: Second round of optimization of buffer concentration
[0160] In this embodiment, the same synthetic DNA template sequence as in Example 2 is used.
[0161] In the reduction step of the sulfinic acid derivative, a method similar to that in Example 7 was used, i.e., the reaction was carried out at 60°C, except that the concentration of the sodium acetate buffer was adjusted. Specifically, sodium acetate buffers (pH = 5.2) with concentrations of 0.5 M, 2 M, 3 M, and 4 M were used for the reaction. After the reaction was completed, the DNA was purified using DNA Cleanup Columns (5 μg) (NEB, Catalog No.: T1034L).
[0162] Library construction and sequencing were performed using the same methods as in Example 2. Sequencing results showed that the conversion rates of 5caC using 0.5M, 2M, 3M, and 4M sodium acetate buffer (pH = 5.2) were 84.56%, 48.22%, 99.24%, and 26.56%, respectively. Therefore, at a reaction temperature of 60°C, a 3M sodium acetate buffer remains preferred.
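The two rounds of buffer concentration optimization can be compared side by side. The sketch below uses only the conversion rates reported in Examples 5 and 9; the variable and function names are illustrative assumptions.

```python
# Conversion rates of 5caC (%) keyed by sodium acetate concentration (M), pH 5.2.
rates_example5 = {0.5: 21.02, 2.0: 5.68, 3.0: 34.42, 4.0: 19.86}   # conditions of Example 2
rates_example9 = {0.5: 84.56, 2.0: 48.22, 3.0: 99.24, 4.0: 26.56}  # reaction at 60 degrees C


def best_concentration(rates):
    """Return the buffer concentration with the highest conversion rate."""
    return max(rates, key=rates.get)


# Gain in percentage points obtained by running the reaction at 60 degrees C.
gains = {
    conc: round(rates_example9[conc] - rates_example5[conc], 2)
    for conc in sorted(rates_example5)
}

for conc, gain in gains.items():
    print(f"{conc} M: {rates_example5[conc]}% -> {rates_example9[conc]}% (+{gain} points)")

print("Preferred in Example 5:", best_concentration(rates_example5), "M")
print("Preferred in Example 9:", best_concentration(rates_example9), "M")
```

Both rounds select 3 M, and the conversion rate at every tested concentration is higher at 60°C.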
[0163] Example 10: Solvent Type Optimization
[0164] In this embodiment, the same synthetic DNA template sequence as in Example 2 is used.
[0165] In the reduction step of the sulfinic acid derivative, a method similar to that in Example 7 was used, except that the solvent type was adjusted; that is, DMSO, DMA, DMF, or no solvent was used for the reaction. After the reaction was completed, the DNA was purified using DNA Cleanup Columns (5 μg) (NEB, Catalog No.: T1034L).
[0166] Library construction and sequencing were performed using the same methods as in Example 2. Sequencing results showed that the conversion rates of 5caC using DMSO, DMA, DMF, and no solvent were 67.34%, 91.85%, 99.24%, and 1.03%, respectively. Therefore, a solvent is necessary in the method of the present invention, and DMF is the preferred solvent.
[0167] Example 11: Optimization of Reaction Time
[0168] In this embodiment, the same synthetic DNA template sequence as in Example 2 is used.
[0169] In the reduction step of the sulfinic acid derivative, a method similar to that in Example 7 was used, except that the reaction time was adjusted to 48 hours. After the reaction was completed, the DNA was purified using DNA Cleanup Columns (5 μg) (NEB, Catalog No.: T1034L).
[0170] Library construction and sequencing were performed using the same methods as in Example 2. Sequencing results showed that the conversion rate of 5caC at a reaction time of 48 h was 82.16%. Therefore, a reaction time of 24 h is preferred.
[0171] Example 12: Effect of target nucleic acid length
[0172] In this embodiment, the same synthetic DNA template sequence as in Example 2 (i.e., the DNA template sequence with a length of 74 bp mentioned above) and another DNA template sequence with a length of 255 bp were used.
[0173] Synthesize a 255bp DNA template directly using the following sequence:
[0174] In the reduction step of the sulfinic acid derivative, a method similar to that in Example 7 was used. After the reaction was completed, the DNA was purified using DNA Cleanup Columns (5 μg) (NEB, Catalog No.: T1034L).
[0175] Library construction and sequencing were performed using the same methods as in Example 2. Sequencing results showed that the conversion rates of 5caC using DNA template sequences of 74 bp and 255 bp were 99.24% and 67.42%, respectively. Therefore, a nucleic acid sequence length of 74 bp is preferred.
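The single-factor comparisons carried out at 60°C in Examples 8 and 10-12 can be collected into one table to check the stated preferences. The sketch below uses only conversion rates reported in those examples and in Example 7. The data structure and names are illustrative, and the 24 h baseline is inferred from the conclusion of Example 11 rather than stated as a reaction condition in this section.

```python
# Conversion rates of 5caC (%) at 60 degrees C, grouped by the factor that was varied.
results_60c = {
    "sulfinic acid derivative": {
        "formamidine sulfinic acid": 99.24,  # Example 7
        "sodium dithionite": 59.68,          # Example 8
    },
    "solvent": {"DMSO": 67.34, "DMA": 91.85, "DMF": 99.24, "none": 1.03},  # Example 10
    "reaction time": {"24 h": 99.24, "48 h": 82.16},                      # Example 11
    "template length": {"74 bp": 99.24, "255 bp": 67.42},                 # Example 12
}


def preferred_conditions(results):
    """Return, for each factor, the condition with the highest conversion rate."""
    return {
        factor: max(rates, key=rates.get)
        for factor, rates in results.items()
    }


preferred = preferred_conditions(results_60c)
for factor, condition in preferred.items():
    print(f"{factor}: {condition} ({results_60c[factor][condition]}%)")
```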
Claims
1. A method for converting 5-methylcytosine (5mC) in a target nucleic acid to dihydrouracil (DHU), the method comprising the following steps: a) providing a nucleic acid sample containing a target nucleic acid, wherein the target nucleic acid contains 5-methylcytosine (5mC); b) treating the nucleic acid sample to convert the 5mC to 5-carboxycytosine (5caC) to obtain a first converted target nucleic acid; c) further treating the first converted target nucleic acid with a sulfinic acid derivative to reduce the 5caC to dihydrouracil (DHU) to obtain a second converted target nucleic acid.
2. A method for identifying 5-methylcytosine (5mC) in a target nucleic acid, the method comprising the following steps: a') providing a nucleic acid sample containing the target nucleic acid; b) treating the nucleic acid sample to convert 5mC to 5-carboxycytosine (5caC) to obtain a first converted target nucleic acid; c) further treating the first converted target nucleic acid with a sulfinic acid derivative to reduce the 5caC to dihydrouracil (DHU) to obtain a second converted target nucleic acid; d) performing sequence analysis on the second converted target nucleic acid.
3. The method according to claim 2, wherein between step a') and step b), a step of fragmenting the target nucleic acid to obtain nucleic acid fragments is further included.
4. The method according to claim 2 or 3, wherein the sequence analysis comprises amplifying and sequencing the second converted target nucleic acid, wherein the conversion of 5-methylcytosine (5mC) to thymine (T) in the new strand of the amplified product indicates the presence of 5mC in the target nucleic acid and indicates the location of the 5mC in the target nucleic acid.
5. The method according to any one of claims 1-4, wherein in step b), the nucleic acid sample or the nucleic acid fragment is treated with a TET enzyme to obtain the first converted target nucleic acid; preferably, the TET enzyme is selected from human TET1, TET2 and TET3, mouse Tet1, Tet2 and Tet3, Naegleria TET (NgTET), Coprinus comatus TET (CcTET), and derivatives or analogs thereof.
6. The method according to any one of claims 1-5, wherein the sulfinic acid derivative has the structural formula R1SO2R2 or R2SO2-SO2R2, wherein: R1 includes formamidinyl, methyl, ethyl, n-propyl, isopropyl, n-butyl, isobutyl, sec-butyl, tert-butyl, n-pentyl, isopentyl, sec-pentyl, neopentyl, n-hexyl, hydroxymethyl, and various substituted C1-C6 alkyl groups, as well as aryl, heterocyclic, substituted aryl, and substituted heterocyclic groups; and R2 is hydrogen or a metal ion selected from the group consisting of: lithium ion, sodium ion, potassium ion, magnesium ion, calcium ion, strontium ion, barium ion, manganese ion, iron ion, cobalt ion, nickel ion, copper ion, zinc ion, silver ion, gold ion, molybdenum ion, and tungsten ion.
7. The method according to claim 6, wherein R1 is a formamidinyl group; preferably, R2 is hydrogen, a lithium ion, a sodium ion, or a potassium ion.
8. The method according to any one of claims 1-7, wherein the sulfinic acid derivative is selected from the group consisting of sodium dithionite, lithium dithionite, potassium dithionite, and formamidine sulfinic acid; preferably, the sulfinic acid derivative is formamidine sulfinic acid.
9. The method according to any one of claims 1-8, wherein step c) is carried out in a mixed solution of buffer and solvent, wherein the concentration of the buffer is 0.5-4M, 1-4M, 2-4M, 3-4M, 0.5-3M, 1-3M, or 2-3M; preferably, the concentration of the buffer is 0.5M, 2M, 3M, or 4M.
10. The method according to any one of claims 1-9, wherein step c) is carried out in a mixed solution of buffer and solvent, wherein the pH of the buffer is 4.65-8.0, 5.2-8.0, 4.65-5.2, 4.65-6.0, 5.2-6.0, or 6.0-8.0; preferably, the pH of the buffer is 4.65, 5.2, 6.0, or 8.0.
11. The method according to any one of claims 9-10, wherein the buffer is sodium acetate buffer, TE buffer or Tris-HCl buffer, and the solvent is N,N-dimethylformamide (DMF), N,N-dimethylacetamide (DMA) or dimethyl sulfoxide (DMSO); preferably, the buffer is sodium acetate buffer and the solvent is DMF.
12. The method according to any one of claims 1-11, wherein step c) is carried out at a reaction temperature of 50-70°C; preferably, the reaction temperature is 50-60°C or 60-70°C; more preferably, the reaction temperature is 60°C.
13. The method according to claim 12, wherein the reaction is carried out for 24-48 hours; preferably, the reaction is carried out for 24 hours.
14. The method according to any one of claims 1-13, wherein the target nucleic acid or the nucleic acid fragment has 11-255 nucleotides, preferably 70-80 nucleotides.
15. The method according to any one of claims 2-14, wherein the method further comprises, between step c) and step d), constructing a sequencing library from the second converted target nucleic acid.
16. A kit comprising a sulfinic acid derivative.
17. The kit according to claim 16, wherein the sulfinic acid derivative has the structural formula R1SO2R2 or R2SO2-SO2R2, wherein: R1 includes formamidinyl, methyl, ethyl, n-propyl, isopropyl, n-butyl, isobutyl, sec-butyl, tert-butyl, n-pentyl, isopentyl, sec-pentyl, neopentyl, n-hexyl, hydroxymethyl, and various substituted C1-C6 alkyl groups, as well as aryl, heterocyclic, substituted aryl, and substituted heterocyclic groups; and R2 is hydrogen or a metal ion selected from the group consisting of: lithium ion, sodium ion, potassium ion, magnesium ion, calcium ion, strontium ion, barium ion, manganese ion, iron ion, cobalt ion, nickel ion, copper ion, zinc ion, silver ion, gold ion, molybdenum ion, and tungsten ion.
18. The kit according to claim 17, wherein R1 is a formamidinyl group; preferably, R2 is hydrogen, a lithium ion, a sodium ion, or a potassium ion.
19. The kit according to any one of claims 16-18, wherein the sulfinic acid derivative is selected from the group consisting of sodium dithionite, lithium dithionite, potassium dithionite, and formamidine sulfinic acid; preferably, the sulfinic acid derivative is formamidine sulfinic acid.
20. The kit according to any one of claims 16-19, wherein the kit further comprises a TET enzyme; preferably, the TET enzyme is selected from human TET1, TET2 and TET3, mouse Tet1, Tet2 and Tet3, Naegleria TET (NgTET), Coprinus comatus TET (CcTET), and derivatives or analogs thereof.
21. The kit according to any one of claims 16-20, wherein the kit further comprises a buffer solution of 0.5-4M concentration and pH 4.65-8.0, such as sodium acetate buffer, TE buffer or Tris-HCl buffer; preferably, the buffer solution is a sodium acetate buffer solution of 3M concentration and pH 5.2.
22. The kit according to any one of claims 16-21, wherein the kit further comprises a solvent such as N,N-dimethylformamide (DMF), N,N-dimethylacetamide (DMA) or dimethyl sulfoxide (DMSO); preferably, the solvent is DMF.