Novel uracil DNA glycosylase Rco and its application in base editing

By recognizing and removing hypoxanthine produced by the deamination of the A base using a novel uracil DNA glycosylase, and combining it with Cas protein to construct a novel base editor, the problem of low editing efficiency of existing tools is solved, enabling precise editing of the rice genome and its application in breeding.

CN122256307A · Pending · Publication Date: 2026-06-23 · SANYA NATIONAL INSTITUTE OF SOUTHERN BREEDING CHINESE ACADEMY OF AGRICULTURAL SCIENCES +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SANYA NATIONAL INSTITUTE OF SOUTHERN BREEDING CHINESE ACADEMY OF AGRICULTURAL SCIENCES
Filing Date
2026-05-26
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing base editing tools for substitutions such as A→T or A→C suffer from low editing efficiency, a narrow targeting range, and low product purity, which limits their application in precision breeding of crops such as rice.

Method used

Develop a novel uracil DNA glycosylase that recognizes and removes hypoxanthine produced by deamination of the A base. Combine this enzyme with a Cas protein to construct a novel base editor that enables A→T or A→C base substitution.

Benefits of technology

This technology enables precise editing of the rice genome, expands the application of DNA glycosylases in crop genome editing, provides a novel editing tool for rice, and lays the foundation for precision breeding and gene function research.


Abstract

This invention provides a novel DNA glycosylase Rco and its application in base editing. Specifically, it provides a novel DNA glycosylase Rco and a novel base editor containing this novel DNA glycosylase. This novel base editor can achieve adenine-based base transversions. The novel DNA glycosylase of this invention can be applied to precise genome editing and has promising application prospects.

Description

Technical Field

[0001] This invention relates to the field of gene editing, and more specifically, to a novel uracil DNA glycosylase Rco and its application in base editing.

Background Technology

[0002] Gene editing technology enables precise introduction of mutations at target sites in the genome. Related tools mainly include zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and the CRISPR / Cas system. Among these, the CRISPR / Cas system, which uses a simple single guide RNA (sgRNA) to direct targeted cleavage by Cas proteins, offers advantages such as high editing efficiency and ease of design, and has become the mainstream genome editing tool. It is widely used in genetic improvement research in various organisms and shows significant application potential in the precision breeding of important food crops such as rice.

[0003] However, the CRISPR / Cas system relies on DNA double-strand breaks (DSBs) to activate repair pathways, which readily generates unintended insertion or deletion mutations (InDels), reducing editing accuracy and limiting efficient application in breeding. Base editors (BEs) improve on this by fusing dCas9 (which has lost its double-strand cutting activity) or nCas9 (which retains only single-strand cutting activity) with a functional deaminase. This allows precise base conversions without inducing DSBs, significantly reducing unintended InDels. Currently, common base editors mainly achieve C→T (CBE) or A→G (ABE) conversions. Further, introducing uracil DNA glycosylase (UNG) into a CBE removes the uracil (U) generated when the cytosine deaminase deaminates a C base, forming an apurinic / apyrimidinic (AP) site; repair of this site may insert a G or A base, which has allowed the development of tools capable of C→G or C→A transversion editing (such as CGBE or GBE). Similarly, human N-methylpurine DNA glycosylase (MPG) has been reported to remove the hypoxanthine (Hx) generated when the adenine deaminase in an ABE deaminates an A base, forming an AP site whose repair may insert a T or C base; this has led to tools capable of A→T or A→C base substitution (such as AYBE). Furthermore, some studies have directly modified and evolved DNA glycosylases to recognize and remove normal bases, forming AP sites without the need for deaminases, and have thus developed deaminase-free base editing tools such as DAF-CBE and DAF-TBE. However, existing A→T or A→C base substitution tools still generally suffer from low editing efficiency, a narrow targeting range, and low product purity.
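As an illustration of the terminology used above, the following minimal Python sketch classifies a single-base substitution as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion (purine to pyrimidine, or the reverse). The function name `classify_substitution` is a hypothetical helper introduced only for this illustration and is not part of the invention.

```python
# Purines and pyrimidines among the four DNA bases.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition', 'transversion', or 'none' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("ref and alt must each be one of A, C, G, T")
    if ref == alt:
        return "none"
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("C", "T"), ("A", "T"), ("A", "C"), ("C", "G")]:
        print(f"{ref}->{alt}: {classify_substitution(ref, alt)}")
```

Under this classification, the A→G and C→T conversions of ABE and CBE are transitions, whereas the A→T and A→C substitutions targeted here are transversions.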

[0004] Therefore, there is still a need in this field to explore novel DNA glycosylases with higher activity or new properties, and to develop precise base transversion editing tools adapted to the characteristics of the rice genome to meet the practical needs of precision breeding.

Summary of the Invention

[0005] The purpose of this invention is to provide a novel uracil DNA glycosylase and its application in base editing.

[0006] In a first aspect, the present invention provides a DNA glycosylase having an amino acid sequence selected from the group consisting of: (1) the amino acid sequence shown in SEQ ID NO:1; and (2) an amino acid sequence having ≥80%, ≥85%, ≥90%, ≥91%, ≥92%, ≥93%, ≥94%, ≥95%, ≥96%, ≥97%, ≥98%, ≥99%, or 100% sequence identity with the amino acid sequence shown in SEQ ID NO:1.
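The sequence identity thresholds above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned (gaps written as "-"), and it defines identity as identical residues divided by aligned columns; the specification does not state which alignment method or identity definition is used, so both are assumptions. The sequences shown are toy strings, not SEQ ID NO:1, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a precomputed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A column counts as a match only if both residues are identical and not gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the aligned pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    reference = "MKTAYIAKQR"   # toy reference, NOT SEQ ID NO:1
    candidate = "MKTAYIAKQK"   # toy variant with one substitution
    print(percent_identity(reference, candidate))
    print(meets_threshold(reference, candidate, threshold=80.0))
```

In practice the alignment itself would come from a dedicated alignment tool; this sketch covers only the final counting step.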

[0007] In a second aspect, the present invention provides a fusion protein comprising the DNA glycosylase described in the first aspect of the present invention and a Cas protein.

[0008] In another preferred embodiment, the fusion protein is a fusion protein comprising the DNA glycosylase, a Cas protein, and a deaminase.

[0009] In another preferred embodiment, the fusion protein further includes one or more of a localization signal, a reporter protein, and a tag sequence.

[0010] In another preferred embodiment, the fusion protein has the structure shown in formula (I):
Z1-Z2-Z3-Z4-Z5-Z6-Z7 (I)
In the formula, "-" represents a peptide bond or a linking peptide; Z1 is a nuclear localization signal sequence or absent; Z2 is a deaminase or a deamination domain; Z3 is a linker sequence; Z4 is a Cas protein; Z5 is a nuclear localization signal sequence or absent; Z6 is the DNA glycosylase; Z7 is a nuclear localization signal sequence or absent.
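The domain order of formula (I) can be sketched as a small Python data structure in which the three nuclear localization signal positions are optional. The class name `FusionLayout` and the element labels are hypothetical placeholders for illustration; which optional positions are actually occupied in SEQ ID NO:7 is not stated here, so the example layout is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FusionLayout:
    """N-to-C order of formula (I): Z1-Z2-Z3-Z4-Z5-Z6-Z7."""
    z2_deaminase: str              # deaminase or deamination domain
    z3_linker: str                 # linker sequence
    z4_cas: str                    # Cas protein
    z6_glycosylase: str            # DNA glycosylase
    z1_nls: Optional[str] = None   # NLS or absent
    z5_nls: Optional[str] = None   # NLS or absent
    z7_nls: Optional[str] = None   # NLS or absent

    def elements(self) -> List[str]:
        """Return the elements that are present, in formula (I) order."""
        ordered = [self.z1_nls, self.z2_deaminase, self.z3_linker, self.z4_cas,
                   self.z5_nls, self.z6_glycosylase, self.z7_nls]
        return [e for e in ordered if e is not None]

    def describe(self) -> str:
        """Join the present elements with '-' as in formula (I)."""
        return "-".join(self.elements())


if __name__ == "__main__":
    # Illustrative layout only; the NLS positions chosen here are an assumption.
    layout = FusionLayout("TadA8e", "linker", "nCas9", "Rco",
                          z1_nls="NLS", z7_nls="NLS")
    print(layout.describe())
```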

[0011] In another preferred embodiment, the nuclear localization signal sequence has an amino acid sequence as shown in SEQ ID NO:2.

[0012] In another preferred embodiment, the linker sequence has an amino acid sequence as shown in SEQ ID NO:4.

[0013] In another preferred embodiment, the deaminase includes: TadA, APOBEC1, CDA, and A3A.

[0014] In another preferred embodiment, the deamination domain includes: ADAR1, ADAR2, APOBEC, AID, and TAD.

[0015] In another preferred embodiment, the deaminase is TadA8e.

[0016] In another preferred embodiment, the amino acid sequence of the deaminase is selected from the group consisting of: (a1) Has the amino acid sequence shown in SEQ ID NO:3; or (a2) An amino acid sequence having ≥80%, ≥85%, ≥90%, ≥91%, ≥92%, ≥93%, ≥94%, ≥95%, ≥96%, ≥97%, ≥98%, ≥99%, or 100% sequence identity with the amino acid sequence shown in SEQ ID NO:3.

[0017] In another preferred embodiment, the Cas protein includes: Cas9 or a variant thereof, Cas12 or a variant thereof, Cas13 or a variant thereof, Cas14 or a variant thereof, and TnpB.

[0018] In another preferred embodiment, the Cas protein is Cas9 or a variant thereof.

[0019] In another preferred embodiment, the amino acid sequence of the Cas protein is selected from the group consisting of: (b1) Having the amino acid sequence shown in SEQ ID NO:5; or (b2) An amino acid sequence having ≥80%, ≥85%, ≥90%, ≥91%, ≥92%, ≥93%, ≥94%, ≥95%, ≥96%, ≥97%, ≥98%, ≥99%, or 100% sequence identity with the amino acid sequence shown in SEQ ID NO:5.

[0020] In another preferred embodiment, the amino acid sequence of the fusion protein is selected from the group consisting of: (c1) Has the amino acid sequence shown in SEQ ID NO:7; or (c2) An amino acid sequence having ≥80%, ≥85%, ≥90%, ≥91%, ≥92%, ≥93%, ≥94%, ≥95%, ≥96%, ≥97%, ≥98%, ≥99%, or 100% sequence identity with the amino acid sequence shown in SEQ ID NO:7.

[0021] A third aspect of the present invention provides a CRISPR-Cas composition comprising: (1) Guide RNA, or one or more DNA molecules encoding said guide RNA; and (2) The fusion protein described in the second aspect of the present invention, or the nucleic acid molecule encoding the fusion protein described in the second aspect of the present invention.

[0022] In another preferred embodiment, the CRISPR-Cas composition is used for the transversion of base A to base T or base C.

[0023] In a fourth aspect, the present invention provides a polynucleotide encoding the DNA glycosylase described in the first aspect of the present invention or the fusion protein described in the second aspect of the present invention.

[0024] In another preferred embodiment, the polynucleotide is a DNA molecule that has been codon-optimized according to the codon preference of the host cell.

[0025] In another preferred embodiment, the host cell includes eukaryotic cells and prokaryotic cells.

[0026] In a fifth aspect, the present invention provides a vector containing the polynucleotide described in the fourth aspect of the present invention.

[0027] A sixth aspect of the present invention provides a delivery system comprising a delivery medium and any one of: the DNA glycosylase of the first aspect of the present invention, the fusion protein of the second aspect of the present invention, the CRISPR-Cas composition of the third aspect of the present invention, the polynucleotide of the fourth aspect of the present invention, or the vector of the fifth aspect of the present invention.

[0028] In another preferred embodiment, the delivery medium includes nanoparticles, plasmids, exosomes, microbubbles, gene guns, or electroporation devices.

[0029] In a seventh aspect, the present invention provides a host cell expressing the DNA glycosylase of the first aspect of the present invention, the fusion protein of the second aspect of the present invention, or the CRISPR-Cas composition of the third aspect of the present invention, or having a genome integrated with the polynucleotide of the fourth aspect of the present invention, or containing the vector of the fifth aspect of the present invention or the delivery system of the sixth aspect of the present invention.

[0030] In an eighth aspect, the present invention provides a kit comprising the DNA glycosylase described in the first aspect of the present invention, the fusion protein described in the second aspect of the present invention, the CRISPR-Cas composition described in the third aspect of the present invention, or the host cell described in the seventh aspect of the present invention.

[0031] In another preferred embodiment, the components of the kit are in the same or different containers.

[0032] In another preferred embodiment, the kit includes one or more buffer solutions.

[0033] In another preferred embodiment, the kit also includes a label or instructions.

[0034] In another preferred embodiment, the kit is used for one or more of the following: gene or genome editing, plant breeding, targeting a target gene, cutting a target gene or a non-target gene, and base transversion.

[0035] A ninth aspect of the present invention provides a method for base transversion in a target nucleic acid, the method comprising contacting the target nucleic acid with the CRISPR-Cas composition described in the third aspect of the present invention.

[0036] A tenth aspect of the present invention provides the use of the DNA glycosylase of the first aspect of the present invention, the fusion protein of the second aspect of the present invention, the CRISPR-Cas composition of the third aspect of the present invention, the polynucleotide of the fourth aspect of the present invention, the vector of the fifth aspect of the present invention, the delivery system of the sixth aspect of the present invention, the host cell of the seventh aspect of the present invention, or the kit of the eighth aspect of the present invention for base editing, base transversion, or plant breeding.

[0037] It should be understood that, within the scope of this invention, the above-described technical features of this invention and the technical features specifically described below (such as in the embodiments) can be combined with each other to form new or preferred technical solutions. Due to space limitations, they will not be described in detail here.

Description of the Drawings

[0038] Figure 1 shows sequence similarity comparisons between members of the sixth family of uracil DNA glycosylases and previously reported glycosylases such as MPG and UNG. Glycosylase origins and abbreviations: human MPG (hMPG); human UNG (hUNG1, hUNG2); E. coli UNG (eUNG). The following are members of the sixth family of uracil DNA glycosylases: Polaribacter sp. (Psp); Burkholderia phymatum (Bph); Candidatus Methanoregula boonei (Cme); Entamoeba histolytica (Ehi); M. barkeri (Mba); M. hungatei (Mhu); Ricinus communis (Rco).

[0039] Figure 2 shows the construct diagram of the candidate adenine transversion editors.

[0040] Figure 3 shows the base editing efficiency at the endogenous rice target OsWaxy-sg1.

[0041] Figure 4 shows the base editing efficiency at the endogenous rice target OsWaxy-sg2.

[0042] Figure 5 shows the base editing efficiency at the endogenous rice target OsSLR1-sg1.

[0043] Figure 6 shows the base editing efficiency at the endogenous rice target OsNRT1.1B-sg1.

Detailed Description

[0044] Through extensive and in-depth research, the inventors have discovered a novel DNA glycosylase for the first time. This novel DNA glycosylase is smaller than commonly used DNA glycosylases and can recognize and remove hypoxanthine produced by the deamination of the A base. Furthermore, a novel base editor containing this novel DNA glycosylase, a deaminase, and a Cas protein was constructed. Experimental verification showed that this novel base editor can produce A→T or A→C base substitutions. The novel DNA glycosylase of this invention can be used for precise editing of plant genomes. Based on this, the present invention was completed.

[0045] Specifically, members of the uracil DNA glycosylase (UNG) family typically act only on uracil (U) produced by deamination of C bases. However, our research group has explored the diversity of the uracil DNA glycosylase family and, for the first time, experimentally discovered that members of the sixth branch of this family possess the activity to recognize and excise hypoxanthine (Hx) produced by deamination of A bases. By fusing this with a highly efficient deaminase, we successfully constructed a novel base editing system. Validation of this system in rice protoplasts showed that it can precisely perform base transversion editing without inducing DNA double-strand breaks. This invention not only expands the application of DNA glycosylases in rice and crop genome editing, providing a new strategy for developing novel rice editing tools with independent intellectual property rights, but also holds promise for promoting its industrial application in rice gene function research and precision rice breeding. Furthermore, it provides a reference for precision genome editing in other crops, possessing significant scientific research and agricultural industrial application value.

[0046] It should be understood that the specific methods and experimental conditions of the invention described below in varying degrees of detail are intended to provide a substantive understanding of the invention. Definitions of certain terms used in this specification are provided below. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.

Terms

[0047] As used herein, the terms “containing” or “including (comprising)” can be open-ended, semi-closed, or closed-ended. In other words, the terms also include “consisting essentially of” or “consisting of”.

[0048] As used herein, the term “and / or” refers to and covers any and all possible combinations of one or more of the related listed items.

[0049] As used herein, the term "significant" means that, in a hypothesis test, the observed effect (such as the difference between the experimental and control groups) is unlikely to be caused solely by random error. A hypothesis test includes: the null hypothesis (H0), which assumes that the observed effect does not exist (such as no difference between the experimental and control groups); the p-value, which is the probability of observing the current or a more extreme effect when H0 is true; and the significance threshold (α), which is used to decide whether the result of the test is significant. Generally, the significance threshold is 0.05. If the p-value ≤ α, then H0 is rejected, meaning the observed effect is taken to exist, and the result is called "significant."
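The decision rule described above (reject H0 when the p-value ≤ α) can be sketched in Python. The specification does not name a particular statistical test, so the exact two-sided permutation test on the difference of group means used here is only one possible choice, and the measurements are made-up numbers for illustration, not experimental data. The function names are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def exact_permutation_p_value(group_a: Sequence[float],
                              group_b: Sequence[float]) -> float:
    """Two-sided exact permutation p-value for the difference of means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Enumerate every way of relabelling n_a of the pooled values as group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Count relabellings at least as extreme as the observed split.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """Reject H0 when the p-value does not exceed the threshold alpha."""
    return p_value <= alpha


if __name__ == "__main__":
    control = [1.0, 2.0, 3.0, 4.0]        # hypothetical values
    treated = [10.0, 11.0, 12.0, 13.0]    # hypothetical values
    p = exact_permutation_p_value(control, treated)
    print(p, is_significant(p))
```

With four values per group there are 70 possible relabellings, so the smallest attainable two-sided p-value is 2/70; with three values per group it is 2/20 = 0.1, which cannot reach α = 0.05 regardless of the effect size.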

[0050] As used herein, a protein “fragment,” “variant,” or “homolog” may optionally be characterized as having at least 60%, preferably 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% amino acid sequence identity with a reference protein (such as a reference isoform). In some preferred embodiments, fragments, variants, isoforms, and homologs of a reference protein may be characterized by their ability to perform the functions performed by the reference protein.

[0051] Typically, protein derivatization does not adversely affect the protein's desired activity (e.g., activity binding to guide RNA, endonuclease activity, activity of binding to and cleaving a target sequence at a specific site guided by guide RNA); that is, the protein derivative has the same activity as the original protein. Modified forms of "derivatives" include the deletion, insertion, modification, and / or substitution of one or more amino acids in the protein. The term "artificial evolutionary modification" indicates artificial involvement.

[0052] Those skilled in the art will recognize that the structure of a protein can be altered without adversely affecting its activity and functionality, for example, by introducing one or more conservative amino acid substitutions into the protein's amino acid sequence without adversely affecting the protein molecule's activity and / or three-dimensional structure.

[0053] Those skilled in the art will recognize examples and implementations of conservative amino acid substitutions. Specifically, an amino acid residue can be substituted with another amino acid residue belonging to the same group as the residue to be substituted; that is, a nonpolar amino acid residue can replace another nonpolar amino acid residue, a polar, uncharged amino acid residue can replace another polar, uncharged amino acid residue, a basic amino acid residue can replace another basic amino acid residue, and an acidic amino acid residue can replace another acidic amino acid residue. Such substituted amino acid residues may or may not be encoded by the genetic code. Conservative substitutions, where an amino acid is replaced by another amino acid belonging to the same group, fall within the scope of this disclosure, provided that the substitution does not result in the inactivation of the protein's biological activity. Therefore, the proteins of this disclosure may contain one or more conservative substitutions in their amino acid sequences, preferably generated by substitutions according to Table A. Additionally, this disclosure also covers proteins that contain one or more other nonconservative substitutions, provided that such nonconservative substitutions do not significantly affect the desired function and biological activity of the proteins of this disclosure.
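The same-group rule described above can be sketched in Python. The four group names follow the paragraph, but the assignment of individual residues to groups below is one common textbook convention and is an assumption; it is not a reproduction of Table A, whose contents are not shown here. The names `GROUPS`, `group_of`, and `is_conservative` are hypothetical.

```python
# One common textbook grouping of the 20 standard amino acids (an assumption,
# not the Table A of this specification).
GROUPS = {
    "nonpolar": set("GAVLIPFMW"),
    "polar_uncharged": set("STCYNQ"),
    "basic": set("KRH"),
    "acidic": set("DE"),
}


def group_of(residue: str) -> str:
    """Return the group name for a one-letter amino acid code."""
    residue = residue.upper()
    for name, members in GROUPS.items():
        if residue in members:
            return name
    raise ValueError(f"unknown amino acid code: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same group under this grouping."""
    return group_of(original) == group_of(replacement)


if __name__ == "__main__":
    for orig, new in [("L", "I"), ("K", "R"), ("S", "T"), ("D", "K")]:
        label = "conservative" if is_conservative(orig, new) else "nonconservative"
        print(f"{orig}->{new}: {label}")
```

A same-group check of this kind only flags candidate substitutions; whether a given substitution preserves activity still has to be confirmed experimentally, as the paragraph above requires.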

[0054] Conservative amino acid substitutions can occur at one or more predicted non-essential amino acid residues. “Non-essential” amino acid residues are those that can be altered (deleted, substituted, or replaced) without changing biological activity, while “essential” amino acid residues are required for biological activity. A “conservative amino acid substitution” is a substitution in which an amino acid residue is replaced by an amino acid residue with a similar side chain. Amino acid substitutions can occur in non-conserved regions of Cas enzymes. Generally, such substitutions are not performed on conserved amino acid residues, or on amino acid residues located within conserved motifs, where such residues are required for protein activity. However, those skilled in the art will understand that functional variants may have a small number of conservative or non-conservative alterations in conserved regions.

[0055] In some implementations, the selected groups of amino acids considered to be mutually conservative substitutions include those shown in Table A.

Those skilled in the art will recognize that one or more amino acid residues can be altered (replaced, deleted, truncated, or inserted) from the N and / or C-terminus of a protein while retaining its functional activity. Therefore, proteins that have one or more amino acid residues altered from the N and / or C-terminus of the fusion protein of this invention while retaining their desired functional activity are also within the scope of this disclosure. These alterations may include those introduced by modern molecular methods such as PCR, including PCR amplification that alters or lengthens the protein-coding sequence by means of amino acid-coding sequences contained in the oligonucleotides used for amplification.

[0056] It should be recognized that proteins can be altered in various ways, including amino acid substitution, deletion, truncation, and insertion, and methods for such operations are generally known to those skilled in the art. For example, amino acid sequence variants of proteins can be prepared by mutating DNA. This can also be accomplished through other forms of mutagenesis and / or directed evolution, for example, by using known mutagenesis, recombination, and / or shuffling methods, combined with relevant screening methods, to perform one or more amino acid substitutions; or the deletion and / or insertion of one or more amino acids.

[0057] Those skilled in the art will understand that minor amino acid changes, whether naturally occurring or introduced artificially (e.g., using recombinant DNA technology), may occur in the fusion proteins of this application without loss of protein function or activity. If these mutations occur in the catalytic domain, active site, or other functional domains of the protein, the properties of the polypeptide may be altered, but the polypeptide may retain its activity. If the mutations are not located near the catalytic domain, active site, or other functional domains, a smaller impact can be expected.

[0058] Those skilled in the art can identify the essential amino acids of fusion proteins using methods known in the art, such as site-directed mutagenesis, protein evolution, or bioinformatics analysis. The catalytic domains, active sites, or other functional domains of a protein can also be determined through physical structural analysis, such as nuclear magnetic resonance, crystallography, electron diffraction, or photoaffinity labeling, combined with mutation of amino acids at presumed key sites.

DNA glycosylase

[0059] As used herein, "uracil DNA glycosylase," "DNA glycosylase," and "glycosylase" are interchangeable and all refer to the novel DNA glycosylases provided by this invention. These novel DNA glycosylases have extremely low amino acid sequence similarity to existing DNA glycosylases and are considered members of the sixth family of uracil DNA glycosylases. Furthermore, these novel DNA glycosylases possess the activity of recognizing and excising hypoxanthine (Hx) produced by the deamination of the A base.

[0060] Preferably, the DNA glycosylase has an amino acid sequence selected from the group consisting of: (1) the amino acid sequence shown in SEQ ID NO:1; and (2) an amino acid sequence having ≥80%, ≥85%, ≥90%, ≥91%, ≥92%, ≥93%, ≥94%, ≥95%, ≥96%, ≥97%, ≥98%, ≥99%, or 100% sequence identity with the amino acid sequence shown in SEQ ID NO:1.

Fusion protein

[0061] This invention provides a fusion protein comprising the DNA glycosylase of this invention and a Cas protein. Specifically, the fusion protein of this invention also includes a Cas protein-based base editor. As used herein, the term "Cas protein" is used in its broadest sense to include wild-type Cas proteins, their derivatives or variants, analogs, and functional fragments thereof such as oligonucleotide-binding fragments.

[0062] The term “wild type” has the meaning commonly understood by those skilled in the art as referring to the typical form of an organism, strain, gene, or protein, or the characteristic that distinguishes it from mutant or variant forms when it exists in nature, which can be isolated from its natural source and has not been intentionally modified by humans.

[0063] Preferably, the fusion protein is a fusion protein comprising the DNA glycosylase, a Cas protein, and a deaminase. Preferably, the fusion protein further comprises one or more of a localization signal, a reporter protein, and a tag sequence.

[0064] Preferably, the Cas protein may include: Cas9 or its variants, Cas12 or its variants, Cas13 or its variants, Cas14 or its variants, TnpB, etc. Preferably, the Cas protein is Cas9 or its variants. Preferably, the Cas protein used in this invention is a type of Cas9, nCas9, whose amino acid sequence is shown in SEQ ID NO:5. A "variant" or "homolog" of a Cas protein refers to a polypeptide that substantially retains the function or activity of the Cas protein.

[0065] Preferably, the amino acid sequence of the Cas protein has one or more amino acid substitutions, deletions, or additions compared to the amino acid sequence shown in SEQ ID NO:5, and substantially retains the biological function of its source sequence; the substitutions, deletions, or additions of one or more amino acids include one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) amino acid substitutions, deletions, or additions.

[0066] The term "Cas protein-based base editor" refers to a fusion protein comprising a Cas protein and a deaminase or deamination domain. It is called a base editor because it can perform direct single-base substitution.

[0067] Preferably, the deaminase comprises: TadA, APOBEC1, CDA, A3A. The deamination domain comprises one or more of ADAR1, ADAR2, APOBEC, AID, or TAD.

[0068] As used herein, a “deamination domain” includes a deaminase (e.g., adenosine deaminase or cytidine deaminase) catalytic domain. As used herein, “adenosine deaminase” or “adenosine deaminase protein” refers to a protein, polypeptide, or one or more functional domains of a protein or polypeptide capable of catalyzing the hydrolytic deamination reaction that converts adenine (or the adenine portion of a molecule) to hypoxanthine (or the hypoxanthine portion of a molecule). Preferably, the adenine-containing molecule is adenosine (A), and the hypoxanthine-containing molecule is inosine (I). The adenine-containing molecule may be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).

[0069] Adenosine deaminases include, but are not limited to, members of the enzyme family called adenosine deaminases acting on RNA (ADAR), members of the enzyme family called adenosine deaminases acting on tRNA (ADAT), and other family members containing an adenosine deaminase domain (ADAD). According to this disclosure, adenosine deaminases are capable of targeting adenine in RNA / DNA heteroduplexes and RNA duplexes. In certain embodiments, adenosine deaminases have been modified to increase their ability to edit DNA in RNA / DNA heteroduplexes or RNA duplexes.

[0070] Preferably, the deaminase is a cytidine deaminase. The term "cytidine deaminase" or "cytidine deaminase protein" refers to a protein, polypeptide, or one or more functional domains of a protein or polypeptide that catalyzes a hydrolytic deamination reaction that converts cytosine (or the cytosine portion of a molecule) to uracil (or the uracil portion of a molecule). In some embodiments, the cytosine-containing molecule is cytidine (C), and the uracil-containing molecule is uridine (U). The cytosine-containing molecule may be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).

[0071] Cytidine deaminases include, but are not limited to, members of the enzyme family known as the apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like (APOBEC) family of deaminases, activation-induced deaminases (AID), or cytidine deaminase 1 (CDA1). In some embodiments, the cytidine deaminase includes APOBEC family deaminases.

[0072] Preferably, the deaminase has an amino acid sequence as shown in SEQ ID NO:3, or an amino acid sequence having ≥80%, ≥85%, ≥90%, ≥91%, ≥92%, ≥93%, ≥94%, ≥95%, ≥96%, ≥97%, ≥98%, ≥99%, or 100% sequence identity with the amino acid sequence shown in SEQ ID NO:3.

[0073] Preferably, the amino acid sequence of the fusion protein has the amino acid sequence shown in SEQ ID NO:7; or an amino acid sequence having ≥80%, ≥85%, ≥90%, ≥91%, ≥92%, ≥93%, ≥94%, ≥95%, ≥96%, ≥97%, ≥98%, ≥99%, or 100% sequence identity with the amino acid sequence shown in SEQ ID NO:7.

Guide RNA

[0074] As used herein, the terms “guide RNA,” “guide sequence,” and “sgRNA” are used interchangeably and have the meanings commonly understood by those skilled in the art. Generally, a guide RNA may comprise, consist essentially of, or consist of a direct repeat (DR) sequence and a spacer sequence.

[0075] In some cases, a guide sequence is any polynucleotide sequence that is sufficiently complementary to the target sequence / gene / nucleotide to hybridize with said target sequence / gene / nucleotide and guide the CRISPR-Cas complex to specifically bind to said target sequence / gene / nucleotide. Generally, when optimally aligned, the complementarity between the guide sequence and its corresponding target sequence / gene / nucleotide is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99%.

[0076] Preferably, the spacer sequence is more than 80% complementary to the target sequence / target gene / target nucleic acid, more preferably more than 90%, more preferably more than 95%, even more preferably more than 99%, and still more preferably 100% complementary.

[0077] Preferably, the spacer sequence is about or at least about 16 nucleotides in length, for example, about 16-100, about 16-90, about 17-70, about 17-50, or about 18-41 consecutive nucleotides in length. More preferably, the spacer sequence is about 20 consecutive nucleotides (20 nt) in length.
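The complementarity percentages and spacer length ranges described above can be checked with a minimal Python sketch. It assumes simple Watson-Crick RNA:DNA pairing without wobble pairs, equal-length ungapped sequences, and a target strand supplied in the 5'→3' direction; the specification refers to optimal alignment, which this sketch does not perform. The spacer shown is an arbitrary 20 nt string, not one of the rice targets, and all names are hypothetical.

```python
# Watson-Crick partner in DNA for each RNA base (no wobble pairing considered).
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def percent_complementarity(spacer_rna: str, target_strand_5to3: str) -> float:
    """Percent of spacer positions that pair with the antiparallel target strand."""
    spacer = spacer_rna.upper()
    target = target_strand_5to3.upper()
    if not spacer:
        raise ValueError("spacer must not be empty")
    if len(spacer) != len(target):
        raise ValueError("this sketch requires equal-length sequences")
    # Hybridization is antiparallel: spacer position i faces target position -(i+1).
    antiparallel = target[::-1]
    paired = sum(1 for r, d in zip(spacer, antiparallel)
                 if RNA_TO_DNA_PAIR.get(r) == d)
    return 100.0 * paired / len(spacer)


def spacer_length_ok(spacer_rna: str, min_len: int = 16, max_len: int = 100) -> bool:
    """Check the spacer length against a chosen range (default 16-100 nt)."""
    return min_len <= len(spacer_rna) <= max_len


if __name__ == "__main__":
    spacer = "GACUUACGGAUCCAUGCAAG"  # arbitrary 20 nt example, not a real target
    # Build the perfectly complementary DNA strand, written 5'->3'.
    target = "".join(RNA_TO_DNA_PAIR[b] for b in spacer)[::-1]
    print(spacer_length_ok(spacer), percent_complementarity(spacer, target))
```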

Vector

[0078] As used herein, a "vector" is a nucleic acid molecule that can transport another nucleic acid molecule linked to it.

[0079] Vectors include, but are not limited to, single-stranded, double-stranded, or partially double-stranded nucleic acid molecules; nucleic acid molecules including one or more free ends, or without free ends (e.g., circular); nucleic acid molecules including DNA, RNA, or both; and a wide variety of other polynucleotides known in the art. Vectors can be introduced into host cells through transformation, transduction, or transfection, thereby enabling the expression of their carried genetic material elements in the host cells. A vector can be introduced into a host cell to produce transcripts, proteins, or peptides, including proteins, fusion proteins, isolated nucleic acid molecules, etc., as described herein (e.g., CRISPR transcripts, such as nucleic acid transcripts, proteins, or enzymes). A vector may contain a variety of elements controlling expression, including, but not limited to, promoter sequences, transcription initiation sequences, enhancer sequences, selection elements, and reporter genes. Vectors may also contain a replication initiation site.

[0080] Vectors include plasmids and viral vectors. A plasmid is a circular double-stranded DNA loop into which another DNA fragment can be inserted, for example, using standard molecular cloning techniques. In a viral vector, virus-derived DNA or RNA sequences are present in the vector for packaging into a virus; such viruses include, for example, retroviruses, replication-defective retroviruses, adenoviruses, replication-defective adenoviruses, and adeno-associated viruses. Viral vectors also include polynucleotides carried by a virus for transfection into a host cell. Some vectors (e.g., bacterial vectors with bacterial origins of replication and episomal mammalian vectors) are capable of autonomous replication in the host cells into which they are introduced.

[0081] Other vectors (e.g., non-episomal mammalian vectors) integrate into the host cell's genome after introduction and thereby replicate along with the host genome. Furthermore, some vectors can direct the expression of genes to which they are operably linked. Such vectors are called "expression vectors."

[0082] Those skilled in the art will understand that the design of expression vectors can depend on factors such as the selection of host cells to be transformed and the desired expression level.

[0083] Kit The present invention provides a kit comprising the DNA glycosylase of the present invention, a fusion protein, a CRISPR-Cas-containing composition, a host cell, or any two or all of the above components.

[0084] Preferably, the kit of this disclosure further includes a label or instructions for use of one or more components contained therein, and / or a label or instructions for use in combination with one or more additional components that may be available elsewhere or are required.

[0085] Preferably, the kit further includes one or more buffer solutions that can be used to dissolve any of the one or more components contained therein, and / or to provide suitable reaction conditions for one or more of the components. Such buffer solutions may include one or more of the following: PBS, HEPES, Tris, MOPS, Na₂CO₃, NaHCO₃, NaB, or combinations thereof. In some embodiments, reaction conditions include an appropriate pH, such as an alkaline pH. In some embodiments, the pH is between 7 and 10.

[0086] Preferably, any one or more of the kit components can be stored in a suitable container and at a suitable temperature, such as 4 degrees Celsius.

[0087] Target sequence, target gene or target nucleic acid As used herein, the terms “target sequence,” “target gene,” or “target nucleic acid” are used interchangeably and refer to a specific nucleic acid containing a nucleic acid sequence that is wholly or partially complementary to a spacer sequence in the guide RNA. A “target sequence” is a polynucleotide targeted by a spacer sequence in the guide RNA, such as a sequence complementary to that spacer sequence, wherein hybridization between the target sequence and the spacer sequence will promote the formation of a CRISPR-Cas complex (including the Cas protein and the guide RNA). Perfect complementarity between sequences is not required, as long as sufficient complementarity exists to induce hybridization and promote the formation of a CRISPR-Cas complex. Preferably, the target nucleic acid contains a non-coding region (e.g., a promoter or terminator). Preferably, the target nucleic acid is single-stranded or double-stranded.

[0088] Sequence identity As used herein, “sequence identity” refers to the percentage of nucleotide / amino acid residues in the query sequence that are identical to those in the reference sequence after aligning the sequences and, where necessary, introducing gaps to achieve the maximum percentage of sequence identity. To determine the percentage of sequence identity between two or more nucleic acid or amino acid sequences, those skilled in the art are familiar with various methods for performing pairwise and multiple sequence alignments, for example, using publicly available computer software such as ClustalOmega (Söding, J. Bioinformatics (2005) 21, 951-960), T-coffee (Notredame et al. J. Mol. Biol. (2000) 302, 205-217), Kalign (Lassmann and Sonnhammer 2005, BMC Bioinformatics, 6(298)), and MAFFT (Katoh and Standley, Molecular Biology and Evolution (2013) 30(4) 772–780). When using such software, the default parameters are preferably used, for example for the gap opening penalty and the gap extension penalty.
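The definition of sequence identity above can be sketched as a small calculation on two sequences that have already been aligned, for example by one of the tools listed. This is an illustrative sketch only. It assumes the denominator is the number of residues in the query sequence (gap columns in the query are skipped), which is one reading of the definition; alignment tools may report identity with a different denominator. The function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_reference: str, gap: str = "-") -> float:
    """Percent of query residues identical to the aligned reference residue.

    Inputs are rows of an existing alignment and must have equal length.
    Columns where the query has a gap are skipped; a query residue aligned
    to a gap in the reference counts as non-identical.
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    query_residues = 0
    identical = 0
    for q, r in zip(aligned_query.upper(), aligned_reference.upper()):
        if q == gap:
            continue  # no query residue in this column
        query_residues += 1
        if q == r:
            identical += 1
    if query_residues == 0:
        raise ValueError("query contains no residues")
    return 100.0 * identical / query_residues


# Hypothetical aligned amino acid sequences: 7 query residues, 6 identical.
query = "MKT-AYLL"
reference = "MKTGAYIL"
print(round(percent_identity(query, reference), 1))  # 85.7
```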

[0089] The main advantages of this invention include: (1) This invention is the first to discover that members of the sixth family of uracil DNA glycosylases have the activity of recognizing and removing hypoxanthine (Hx) produced by deamination of the A base.

[0090] (2) The novel DNA glycosylase of the present invention is much smaller in protein size than commonly used DNA glycosylases, which is advantageous for delivery.

[0091] (3) The novel DNA glycosylase of the present invention is fused with a deaminase and a Cas protein to construct a novel base editing system, which can accurately perform base transversion editing without causing DNA double-strand breaks.

[0092] (4) The novel base editing system constructed in this invention can effectively perform A→T and A→C base editing.

[0093] The present invention will be further illustrated below with reference to specific embodiments. It should be understood that these embodiments are for illustrative purposes only and are not intended to limit the scope of the invention. Experimental methods in the following embodiments, unless otherwise specified, are generally performed under conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or as recommended by the manufacturer. Unless otherwise stated, percentages and parts are weight percentages and parts by weight.

[0094] Example 1: Obtaining a novel uracil DNA glycosylase A search of the NCBI database for potential glycosylases that could recognize damaged A bases revealed functional diversity within the uracil DNA glycosylase family. While members of this family typically recognize only uracil (U) produced by the deamination of C bases, this study found that members of the sixth branch of this family have the potential to recognize and excise hypoxanthine (Hx) produced by the deamination of A bases. The gene sequences and the corresponding encoded amino acid sequences of each member of this branch were then extracted, and after codon optimization, the target genes were artificially synthesized. These genes served as templates for constructing the subsequent fusion protein expression vectors, and their activity was experimentally tested.

[0095] Example 2: Similarity comparison between members of the sixth family of uracil DNA glycosylases and existing glycosylases In this example, TBtools software was used to perform multiple sequence alignment of the novel uracil DNA glycosylase obtained in Example 1 and various existing glycosylases, and the sequence similarity between them was compared.

[0096] The results (Figure 1) show that the sixth family member of uracil DNA glycosylase has extremely low amino acid sequence similarity to reported glycosylases such as MPG and UNG, and can be considered a novel glycosylase for base editing tools.

[0097] Example 3: Activity test of an adenine transversion editor assembled from a novel uracil DNA glycosylase Uracil DNA glycosylase family enzymes from different sources were each fused to the C-terminus of the adenine deaminase TadA8e-nCas9 fusion protein, and the reported AYBE vector (which produces A→T or A→C base substitutions) was used as a control. Ultimately, seven novel AHDG transversion editors were generated (Figure 2). Four targeting sgRNAs were designed against endogenous rice genes and ligated into the constructs (OsWaxy-sg1, OsWaxy-sg2, OsSLR1-sg1, OsNRT1.1B-sg1), which were then transformed into rice protoplasts for testing. The edited endogenous sites in rice were then analyzed by high-throughput sequencing.

[0098] Tests in rice protoplasts at the four endogenous target sites showed that A→G editing is produced by the TadA8e deaminase, while A→T and A→C editing are produced by the combined action of the adenine deaminase TadA8e and the glycosylase (Figures 3-6).
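The distinction drawn above between A→G, A→T, and A→C outcomes can be illustrated with a short tally over sequencing reads at one target adenine. This is a hedged sketch and not the analysis pipeline used in the examples: it assumes reads have already been aligned to the reference window with no insertions or deletions, and the function name `tally_adenine_outcomes`, the reference window, and the read counts are all hypothetical values that do not come from Tables 1-4. It uses built-in generic type hints, so Python 3.9 or later is assumed.

```python
from collections import Counter


def tally_adenine_outcomes(reference: str, position: int, reads: list[str]) -> dict[str, float]:
    """Percent of reads showing each base at one reference adenine.

    Assumes every read covers the same window as `reference` and contains
    no indels, so `position` indexes the same base in each read.
    """
    if reference[position].upper() != "A":
        raise ValueError("reference base at position is not A")
    if not reads:
        raise ValueError("no reads supplied")
    counts = Counter(read[position].upper() for read in reads)
    total = len(reads)
    return {
        "A>G": 100.0 * counts["G"] / total,  # transition, attributed to deaminase alone
        "A>T": 100.0 * counts["T"] / total,  # transversion, deaminase plus glycosylase
        "A>C": 100.0 * counts["C"] / total,  # transversion, deaminase plus glycosylase
        "unedited": 100.0 * counts["A"] / total,
    }


# Made-up toy data: 10 reads over a 7-nt window with the target A at index 3.
reference = "GCTAGCT"
reads = ["GCTAGCT"] * 6 + ["GCTGGCT"] * 2 + ["GCTTGCT"] + ["GCTCGCT"]
print(tally_adenine_outcomes(reference, 3, reads))
# {'A>G': 20.0, 'A>T': 10.0, 'A>C': 10.0, 'unedited': 60.0}
```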

[0099] The results showed that the fusion construct AHDG-Rco exhibited significant A→T transversion editing activity and weak A→C editing activity, which is the first report of base editing activity for this class of glycosylases (Tables 1-4). Although AYBE showed the highest overall activity, AHDG-Rco showed higher editing efficiency at certain base positions, and the glycosylase Rco is 30%-50% smaller in protein size than the MPGv3 used in the control AYBE, which gives it an advantage for delivery of the editing tool.

[0100] Therefore, Rco, a member of the sixth family of uracil DNA glycosylases, has the potential for A→T and A→C base transversion editing, is worthy of further modification and evolution, and has important application value.

[0101] Table 1. Base editing efficiency of the novel base editor on OsWaxy-sg1

Table 2. Base editing efficiency of the novel base editor on OsWaxy-sg2

Table 3. Base editing efficiency of the novel base editor on OsSLR1-sg1

Table 4. Base editing efficiency of the novel base editor on OsNRT1.1B-sg1

All documents mentioned in this invention are incorporated herein by reference as if each document were individually incorporated by reference. Furthermore, it should be understood that after reading the foregoing teachings of this invention, those skilled in the art can make various alterations or modifications to this invention, and these equivalent forms also fall within the scope defined by the appended claims.

Claims

1. A fusion protein, characterized in that the amino acid sequence of the fusion protein is SEQ ID NO: 7.

2. A CRISPR-Cas composition, characterized in that the CRISPR-Cas composition comprises: (1) a guide RNA, or one or more DNA molecules encoding said guide RNA; and (2) the fusion protein of claim 1, or a nucleic acid molecule encoding the fusion protein of claim 1; wherein the CRISPR-Cas composition is used to replace base A with base T or base C.

3. A polynucleotide, characterized in that the polynucleotide encodes the fusion protein of claim 1.

4. A vector, characterized in that the vector contains the polynucleotide of claim 3.

5. A delivery system, characterized in that the delivery system includes the fusion protein of claim 1, the CRISPR-Cas composition of claim 2, the polynucleotide of claim 3, or the vector of claim 4, and a delivery medium.

6. A host cell, characterized in that the host cell expresses the fusion protein of claim 1, or the CRISPR-Cas composition of claim 2, or has the polynucleotide of claim 3 integrated into its genome, or contains the vector of claim 4, or the delivery system of claim 5.

7. A kit, characterized in that the kit comprises the fusion protein of claim 1, the CRISPR-Cas composition of claim 2, or the host cell of claim 6.

8. A method for replacing base A in a target nucleic acid with base T or base C, characterized in that the method includes contacting the target nucleic acid with the CRISPR-Cas composition of claim 2.

9. Use of the fusion protein of claim 1, the CRISPR-Cas composition of claim 2, the polynucleotide of claim 3, the vector of claim 4, the delivery system of claim 5, the host cell of claim 6, or the kit of claim 7, characterized in that the use is for base editing, base transversion, or plant breeding.