High-efficiency, high-precision base editor for cytosine (C) to guanine (G) conversion

By restructuring the C-to-G base editor and optimizing its uracil-N-glycosylase (UNG) and cytosine deaminase components, the invention improves the targeted editing efficiency and accuracy of the C-to-G base editor and reduces off-target effects, addressing the insufficient efficiency and accuracy of existing technologies.

CN115703842B · Active · Publication Date: 2026-06-23 · AGRICULTURAL GENOMICS INSTITUTE AT SHENZHEN CHINESE ACADEMY OF AGRICULTURAL SCIENCES (SHENZHEN BRANCH GUANGDONG LABORATORY FOR LINGNAN MODERN AGRICULTURE)

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
AGRICULTURAL GENOMICS INSTITUTE AT SHENZHEN CHINESE ACADEMY OF AGRICULTURAL SCIENCES (SHENZHEN BRANCH GUANGDONG LABORATORY FOR LINGNAN MODERN AGRICULTURE)
Filing Date
2021-08-10
Publication Date
2026-06-23


Abstract

The application provides a high-efficiency, high-precision base editor for cytosine C to guanine G conversion. The application establishes a novel C-to-G base editor. Compared with existing C-to-G base editors, the editor of the application has significantly improved targeted editing efficiency and editing precision, and is characterized by high product purity, a narrow editing window, and a low off-target effect.

Description

Technical Field

[0001] This invention belongs to the field of biotechnology. Specifically, it relates to a base editor for the conversion of cytosine C to guanine G.

Background Technology

[0002] With the development of modern life sciences, and to meet the rapid advancements in genetics, molecular biology, developmental biology, and medicine, manipulating specific sites on the genome at the cellular, embryonic, and even in vivo levels has become a major challenge for scientists. In recent years, a series of nucleases have been discovered, greatly promoting the development of gene editing technology. These mainly include three types of nuclease systems: zinc finger nucleases, TALENs, and the CRISPR / Cas system.

[0003] Adenosine and cytidine deaminases mediate adenosine and cytidine deamination, which are the most common base editing mechanisms in mammals and plants. Adenosine deaminase deaminates adenosine (A) in RNA to inosine (I), which is then read as guanine (G) during transcription and translation, resulting in an A>G conversion. Cytidine deaminase deaminates cytosine (C) in DNA or RNA to uracil (U), which in DNA is then read as thymine (T) by polymerases, resulting in a C>T conversion. Both adenosine and cytidine deaminases have physiological functions in biological processes such as immunity, development, and stress responses. Furthermore, by fusing a deaminase to a CRISPR effector, with the CRISPR effector serving as an anchor and the deaminase as a catalyst, a family of single-base editors has been created, enhancing the ability to precisely edit specific bases in DNA or RNA sequences. For example, fusing APOBEC1 or APOBEC3 with Cas9 creates a C>T single-base DNA editor. Members of the ADAR deaminase family and modified ADARs have been fused with CRISPR-Cas13 effectors to create single-base RNA editors with A>I and C>U conversion capabilities.

[0004] C>G (C-to-G) base conversion was long a challenge in the field. It was not until 2020 that researchers developed single-base editing tools (CGBEs) capable of mediating C>G base conversions. The team led by David R. Liu at the Broad Institute had successively reported cytosine base editors (CBEs), which convert cytosine C to thymine T, and adenine base editors (ABEs), which convert adenine A to guanine G, both without generating double-strand breaks. However, these BEs cannot install C-to-G or A-to-T conversions, which account for 40% of human pathogenic point mutations. Two reports in 2020 demonstrated that replacing the uracil-DNA glycosylase inhibitor (UGI) in CBEs with uracil-DNA glycosylase (UNG) creates a new generation of editors (CGBEs) that can achieve cytosine C to guanine G conversions.

[0005] Current research has several limitations. First, it has tested only a limited number of sites and therefore fails to elucidate the impact of flanking sequences on editing outcomes. Second, it lacks in-depth exploration of the off-target effects of CGBEs. Furthermore, because research on CGBE tools is still in its early stages, the efficiency and accuracy of their targeted editing urgently need further improvement.

Summary of the Invention

[0006] In biological research and applications, precise alteration of single nucleotides is a powerful method for gene editing. The purpose of this invention is to provide a highly efficient and precise base editing technique for the conversion of cytosine C to guanine G.

[0007] In a first aspect of the invention, a method is provided to improve the targeted editing efficiency and editing accuracy of a C-to-G base editor, comprising: modifying the C-to-G base editor to include the following elements in an operationally sequentially connected manner: nuclear localization signal - uracil N-glycosylation enzyme (UNG) - cytosine deaminase - Cas enzyme - nuclear localization signal; wherein the uracil N-glycosylation enzyme is a nematode-derived uracil N-glycosylation enzyme or an Escherichia coli-derived uracil N-glycosylation enzyme.

[0008] In one or more embodiments, the cytosine deaminase comprises an enzyme selected from the group consisting of: APOBEC1, with its 90th position mutated to Tyr(Y) and its 126th position mutated to Glu(E); APOBEC3A or its CTD domain; or, APOBEC3G or its CTD domain.

[0009] In one or more embodiments, the Cas9 enzyme comprises an enzyme selected from the group consisting of Cas9n, Cas9n-NG, xCas9n, and Cas9; preferably Cas9n-NG.
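To illustrate why the PAM specificity of the chosen Cas enzyme matters, the following minimal Python sketch lists candidate protospacers on one strand under two PAM patterns. It rests on assumptions that are not stated in the text above: the commonly described PAMs (NGG for SpCas9-derived enzymes, NG for the Cas9-NG variant), a 20-nt protospacer immediately 5' of the PAM, and an invented demo sequence. The name `find_protospacers` is hypothetical and is not part of the invention.

```python
import re

# Assumed PAM patterns (not given in the text): NGG for SpCas9-derived
# enzymes, NG for the Cas9-NG variant. N matches any base.
PAM_PATTERNS = {"NGG": "[ACGT]GG", "NG": "[ACGT]G"}


def find_protospacers(seq, pam, length=20):
    """Return (start, protospacer, pam_seq) for each site on the given strand.

    A lookahead is used so that overlapping candidate sites are all reported.
    """
    seq = seq.upper()
    pattern = re.compile(rf"(?=([ACGT]{{{length}}})({PAM_PATTERNS[pam]}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Invented demo sequence, for illustration only.
demo = "ACGTACGTACGTACGTACGTAGGTTT"
ngg_sites = find_protospacers(demo, "NGG")
ng_sites = find_protospacers(demo, "NG")
print(len(ngg_sites), len(ng_sites))
```

Under these assumptions, every NGG site is also an NG site, so the relaxed PAM can only widen the set of targetable positions on a given sequence.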

[0010] In one or more embodiments, the uracil N-glycosylation enzyme and the cytosine deaminase are connected by a linker; preferably, the uracil N-glycosylation enzyme and the cytosine deaminase are connected by a linker with the sequence shown in SEQ ID NO:3 (32AA, SGGSSGGSSGSETPGTSESATPESSGGSSGGS).

[0011] In one or more embodiments, the cytosine deaminase and the Cas enzyme are connected by a linker; preferably, the cytosine deaminase and the Cas enzyme are connected by a linker with the sequence shown in SEQ ID NO:4 (16AA, SGSETPGTSESATPES).

[0012] In one or more embodiments, the amino terminus of the nuclear localization sequence further includes a tag sequence, preferably a Flag tag; more preferably a 3xFlag tag, the amino acid sequence of which is shown in SEQ ID NO:5 (DYKDHDGDYKDHDIDYKDDDDK).

[0013] In one or more embodiments, the amino acid sequence of the nematode-derived uracil N-glycosylation enzyme is as shown in SEQ ID NO:1, or its active domain, or a conserved variant polypeptide thereof.

[0014] In one or more embodiments, the amino acid sequence of the E. coli-derived uracil N-glycosylation enzyme is as shown in SEQ ID NO:2, or its active domain, or its conserved variant polypeptide.

[0015] In one or more embodiments, the amino acid sequence of the nuclear localization signal is as shown in SEQ ID NO:6 (PKKKRKV).

[0016] In one or more embodiments, the various elements form a fusion polypeptide / fusion protein.

[0017] In one or more embodiments, the various elements are directly connected to each other, or may further include connectors (connection sequences).

[0018] In one or more embodiments, the operational sequence connection is a connection from the N end to the C end.
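The element order and linkers described in the embodiments above can be sketched as a simple N-to-C concatenation. This is only an illustrative sketch: the linker, tag, and nuclear localization signal sequences are those quoted in the text, but the UNG, deaminase, and Cas sequences are not reproduced in this section, so bracketed placeholder tokens stand in for them, and the function name `assemble_cgbe` is hypothetical.

```python
# Sequences quoted in the text above.
LINKER_32AA = "SGGSSGGSSGSETPGTSESATPESSGGSSGGS"  # SEQ ID NO:3
LINKER_16AA = "SGSETPGTSESATPES"                  # SEQ ID NO:4
FLAG_3X = "DYKDHDGDYKDHDIDYKDDDDK"                # SEQ ID NO:5
NLS = "PKKKRKV"                                   # SEQ ID NO:6


def assemble_cgbe(ung, deaminase, cas, with_flag=True):
    """Concatenate the elements from the N-terminus to the C-terminus.

    Order: [3xFlag] - NLS - UNG - 32AA linker - deaminase - 16AA linker - Cas - NLS
    """
    parts = [
        FLAG_3X if with_flag else "",
        NLS,
        ung,
        LINKER_32AA,
        deaminase,
        LINKER_16AA,
        cas,
        NLS,
    ]
    return "".join(parts)


# Placeholder tokens only; they are NOT real protein sequences.
demo = assemble_cgbe(ung="<UNG>", deaminase="<DEAMINASE>", cas="<CAS>")
print(demo)
```

Keeping the order in one list makes it easy to check that the UNG sits N-terminal to the deaminase, which is the structural feature the embodiments emphasize.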

[0019] In one or more embodiments, the uracil N-glycosylation enzyme is preferably a nematode-derived uracil N-glycosylation enzyme.

[0020] In one or more embodiments, the conserved variant polypeptide of uracil N-glycosylation enzyme derived from nematodes or Escherichia coli includes: (1) a polypeptide formed by substitution, deletion or addition of one or more (e.g., 1-20, preferably 1-10; more preferably 1-5; more preferably 1-3) amino acid residues of a polypeptide of the sequence shown in SEQ ID NO:1 or SEQ ID NO:2, and having the function of the polypeptide of the sequence shown in SEQ ID NO:1 or SEQ ID NO:2; (2) a polypeptide whose amino acid sequence is more than 80% (preferably more than 85%; more preferably more than 90%; more preferably more than 95%; more preferably more than 98%; more preferably more than 99%) identical to the polypeptide of the sequence shown in SEQ ID NO:1 or SEQ ID NO:2, and having the function of the polypeptide of the sequence shown in SEQ ID NO:1 or SEQ ID NO:2; or (3) a polypeptide formed by adding a tag sequence or a signal peptide sequence to the N or C end of the polypeptide of the sequence shown in SEQ ID NO:1 or SEQ ID NO:2.
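The identity tiers in item (2) can be made concrete with a short sketch. It assumes the two sequences have already been aligned to equal length (for example by an external alignment tool) and uses one common convention, matches divided by alignment length with gap columns counted as mismatches; other conventions give slightly different numbers. The function names are hypothetical.

```python
# "More than X%" identity tiers listed in the text, most stringent first.
THRESHOLDS = (99, 98, 95, 90, 85, 80)


def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    matches = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


def highest_tier(identity):
    """Return the most stringent tier that the identity exceeds, or None."""
    for tier in THRESHOLDS:
        if identity > tier:
            return tier
    return None


# Toy example using the short NLS sequence from the text with one substitution.
identity = percent_identity("PKKKRKV", "PKAKRKV")
print(round(identity, 1), highest_tier(identity))
```

Identity alone does not establish that a variant retains function; the text additionally requires that the variant have the function of the reference polypeptide.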

[0021] In one or more embodiments, the improvement of the C-to-G base editor's targeted editing efficiency and accuracy includes: increasing the product purity of the edited product (i.e., significantly reducing the amount of C-to-A and C-to-T editing), promoting a narrower editing window, and reducing off-target effects; preferably, the improvement of the C-to-G base editor's targeted editing efficiency and accuracy includes one selected from: (a) significantly increasing the probability (preference) of the C-to-G base editor targeting the trinucleotide motif "WCW"; preferably, wherein the cytosine deaminase is APOBEC1 with position 90 mutated to Tyr(Y) and position 126 mutated to Glu(E); (b) significantly increasing the probability (preference) of the C-to-G base editor targeting the trinucleotide motif "TCW"; preferably, wherein the cytosine deaminase is APOBEC3A; or (c) significantly increasing the probability (preference) of the C-to-G base editor targeting the trinucleotide motif "CCN"; preferably, wherein the cytosine deaminase is APOBEC3G; more preferably, the target site contains three or more consecutive Cs; wherein, in the trinucleotide motifs, W represents A or T, and N represents A, T, C, or G.

[0022] In one or more embodiments, in (a) when W is A, it is preferred to prepare a C-to-G base editor (cOPTI-CGBE) using UNG derived from nematodes.
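The motif preferences listed above can be sketched as a small lookup that reports which motifs a target C falls in. The sketch assumes that the target C is the middle base of each trinucleotide motif, which the text does not state explicitly, and the function and table names are hypothetical. The deaminase labels simply restate the preferred embodiments.

```python
W = set("AT")  # W represents A or T; N (any base) needs no check

# Preferred deaminase per motif, as listed in the embodiments above.
MOTIF_TO_DEAMINASE = {
    "WCW": "APOBEC1 (position 90 Tyr, position 126 Glu)",
    "TCW": "APOBEC3A",
    "CCN": "APOBEC3G",
}


def matching_motifs(seq, c_index):
    """Return the motifs matched by the C at c_index (assumed middle base)."""
    seq = seq.upper()
    if not 0 < c_index < len(seq) - 1:
        raise ValueError("target C needs one flanking base on each side")
    if seq[c_index] != "C":
        raise ValueError("position is not a C")
    upstream, downstream = seq[c_index - 1], seq[c_index + 1]
    hits = []
    if upstream in W and downstream in W:
        hits.append("WCW")
    if upstream == "T" and downstream in W:
        hits.append("TCW")
    if upstream == "C":
        hits.append("CCN")
    return hits


for site in ("ATCAG", "GCCGT", "GGCGG"):
    motifs = matching_motifs(site, 2)
    print(site, motifs, [MOTIF_TO_DEAMINASE[m] for m in motifs])
```

Because TCW is a subset of WCW, a site such as ATCAG matches both motifs, so the lookup returns a list rather than a single recommendation.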

[0023] In another aspect of the invention, a C-to-G base editor (characterized by high targeted editing efficiency and editing precision) is provided, comprising the following elements connected in an operational sequence: nuclear localization signal - uracil N-glycosylation enzyme (UNG) - cytosine deaminase - Cas enzyme - nuclear localization signal; wherein the uracil N-glycosylation enzyme is a nematode-derived uracil N-glycosylation enzyme or an Escherichia coli-derived uracil N-glycosylation enzyme.

[0024] In another aspect of the invention, isolated polynucleotides or recombinant expression vectors containing such polynucleotides are provided; wherein the polynucleotides encode any of the C-to-G base editors described above.

[0025] In another aspect of the invention, a genetically engineered host cell is provided, which contains the recombinant expression vector described above, or whose genome is integrated with the isolated polynucleotides described above.

[0026] In another aspect of the invention, the use of any of the preceding C-to-G base editors is provided for C-to-G gene editing (including gene editing with C-to-G as the primary objective), thereby improving the targeted editing efficiency and editing accuracy of the C-to-G base editor.

[0027] In another aspect of the invention, the use of any of the preceding C-to-G base editors is provided for the preparation of reagents or kits for performing C-to-G gene editing (including gene editing with C-to-G as the primary objective), improving the targeted editing efficiency and editing accuracy of C-to-G base editors.

[0028] In one or more embodiments, the C-to-G base editor is used for gene editing in an in vitro method for non-living organisms.

[0029] In one or more embodiments, the C-to-G base editor is intended for use in gene editing in organisms that do not develop into living organisms.

[0030] In one or more embodiments, the C-to-G base editor is used for gene editing as a cellular-level method, including single-cell, two-cell, or multi-cell methods.

[0031] In one or more embodiments, the C-to-G base editor is used for gene editing purposes that are not directly for disease diagnosis.

[0032] In another aspect of the invention, a method for performing C-to-G gene editing (including gene editing with C-to-G as the primary objective) is provided, comprising gene editing mediated by any of the C-to-G base editors described above; preferably, gene editing is performed by co-injecting a nucleic acid sequence encoding the C-to-G base editor and gRNA / sgRNA into a recipient; preferably, the recipient comprises a somatic cell or a germ cell; preferably, the germ cell comprises an embryonic cell or a fertilized egg.

[0033] In one or more embodiments, the gene editing method is an in vitro method for non-living organisms.

[0034] In one or more embodiments, the gene-editing method is targeted at objects that do not develop into living organisms.

[0035] In one or more embodiments, the gene editing method is a cellular-level method, including single-cell, two-cell, or multi-cell methods.

[0036] In one or more embodiments, the gene editing method is for purposes other than disease diagnosis.

[0037] In another aspect of the invention, a reagent or kit for gene editing is provided, comprising any of the C-to-G base editors described above; or, comprising the isolated polynucleotides described above; or, comprising a recombinant expression vector containing the polynucleotides.

[0038] Other aspects of the invention will be apparent to those skilled in the art from the disclosure herein.

Attached Figure Description

[0039] Figure 1 a. Schematic diagram of CGBEs.

[0040] Figure 1 b. Editing efficiency of CGBEs with different UNG compositions at 34 endogenous sites (in the figure, the lower the number, the greater the significance compared with the control, the same applies below).

[0041] Figure 1 c. Editing efficiency of optimized CGBEs at 34 endogenous sites.

[0042] Figure 1 d. Editing efficiency of CGBEs at different positions of C in 34 endogenous sites.

[0043] Figure 2 a. Editing efficiency of CGBEs at 34 endogenous sites.

[0044] Figure 2 b. Editing efficiency of CGBEs at different positions of C in 34 endogenous sites.

[0045] Figure 2 c. Ratio of C-to-G editing efficiency to C-to-A or C-to-T editing efficiency for CGBEs at 34 endogenous sites.

[0046] Figure 2 d. Proportion of insertions or deletions produced by CGBEs at 34 endogenous sites.

[0047] Figure 2 e. Editing efficiency of PE2, PE3, and CGBEs at different sites.

[0048] Figure 2 f. The proportion of insertions or deletions of PE2, PE3, and CGBEs at different sites.

[0049] Figure 3 a. Number of single nucleotide mutations across the entire genome.

[0050] Figure 3 b. Single nucleotide mutations across the entire genome.

[0051] Figure 3 c. Number of single nucleotide mutations within the transcriptome.

[0052] Figure 3 d. Single nucleotide mutation types within the transcriptome.

[0053] Figure 3 e. Editing efficiency of predicted sites.

[0054] Figure 4 a. The influence of upstream base type of the target site on C to G editing efficiency.

[0055] Figure 4 b. The impact of downstream base type at the target site on C-to-G editing efficiency.

[0056] Figure 4 c. The influence of upstream base type on the efficiency of C-to-A or T editing at the target site.

[0057] Figure 4 d. The effect of downstream base type of the target site on the C to A or T editing efficiency.

[0058] Figure 4 e. The influence of upstream and downstream base types at the target site on the editing efficiency from C to G, A, or T.

[0059] Figure 4 f. The influence of upstream and downstream base types at the target site on the insertion or deletion ratio.

[0060] Figure 5 a. Schematic diagram of eA3A-OPTI-CGBEs.

[0061] Figure 5 b. Editing efficiency of target C in different positions of eA3A-OPTI-CGBEs.

[0062] Figure 5 c. The influence of upstream base type on the editing efficiency from C to G, A, or T at the target site.

[0063] Figure 5 d. The impact of downstream base type at the target site on the editing efficiency from C to G, A, or T.

[0064] Figure 5 e. The impact of upstream and downstream base types at the target site on C-to-G editing efficiency.

[0065] Figure 5 f. The influence of upstream and downstream base types at the target site on the insertion or deletion ratio.

[0066] Figure 6 a. Schematic diagram of hA3G-OPTI-CGBEs and hA3G-CTD-OPTI-CGBEs.

[0067] Figure 6 b. Editing efficiency from C to G for target C at different locations.

[0068] Figure 6 c. The influence of upstream base type on the editing efficiency from C to G, A, or T at the target site.

[0069] Figure 6 d. The impact of downstream base type at the target site on the editing efficiency from C to G, A, or T.

[0070] Figure 6 e. The impact of upstream and downstream base types at the target site on C-to-G editing efficiency.

[0071] Figure 6 f. The influence of upstream and downstream base types at the target site on the insertion or deletion ratio.

[0072] Figure 6 g. When there are multiple Cs at the target site, the efficiency of C to G editing at different C positions.

[0073] Figure 7 a. Schematic diagrams of the different OPTI-CGBEs-NG that recognize NG PAMs.

[0074] Figure 7 b. Editing efficiency from C to G, A, or T at the target sites.

[0075] Figure 7 c. The proportion of target site insertions or deletions.

[0076] Figure 8 a. C-to-G editing efficiency of OPTI-CGBEs by upstream and downstream base pairs at the target position, at the library level.

[0077] Figure 8 b. C-to-G editing efficiency of upstream and downstream base pairs of eA3A-OPTI-CGBEs at the target position in the library.

[0078] Figure 8 c. C-to-G editing efficiency of upstream and downstream base pairs of hA3G-OPTI-CGBEs at the target position in the library.

[0079] Figure 8 d. C-to-G editing efficiency of upstream and downstream base pairs of hA3G-CTD-OPTI-CGBEs at the target position in the library.

[0080] Figure 9 The C-to-G editing efficiency of all OPTI-CGBEs upstream and downstream base pairs at the target position in the library.

[0081] Figure 10 a. Two-cell injection protocol during the fertilized egg stage.

[0082] Figure 10 b. Blastocyst rate after injection at the fertilized egg stage.

[0083] Figure 10 c. Comparison of editing efficiency at three sites in the embryo.

[0084] Figure 10 d. Sanger sequencing results.

[0085] Figure 10 e. Mice obtained from injected embryos and their offspring.

Detailed Implementation

[0086] The inventors are dedicated to improving the efficiency and accuracy of base editors for the cytosine C to guanine G conversion. Through in-depth research, analysis, optimization, and experimentation, a novel C-to-G base editor has been developed. Compared with existing C-to-G base editors, the editor of this invention significantly improves the targeted editing efficiency and editing accuracy, and features high product purity of the edited product, narrow editing window, and low off-target effect.

[0087] Terms

[0088] As used herein, the terms “C-to-G base editor”, “CGBE”, and “CGBEs” are used interchangeably and all refer to base editors that perform the conversion of cytosine C to guanine G.

[0089] As used herein, the term "conserved variant polypeptide" refers to a polypeptide that substantially retains the same biological function or activity as the reference polypeptide. A "conserved variant polypeptide" may be (i) a polypeptide in which one or more conserved or non-conserved amino acid residues (preferably conserved amino acid residues) are substituted, and such substituted amino acid residues may or may not be encoded by the genetic code; or (ii) a polypeptide having substituent groups in one or more amino acid residues; or (iii) a polypeptide formed by fusing a mature polypeptide with another compound (e.g., a compound that extends the half-life of the polypeptide, such as polyethylene glycol); or (iv) a polypeptide formed by fusing an additional amino acid sequence to the sequence of this polypeptide (e.g., a leader sequence or secretion sequence, a sequence used to purify this polypeptide, a proprotein sequence, or a fusion protein formed with an antigen IgG fragment). In accordance with the teachings herein, such fragments, derivatives, and analogs are within the scope well known to those skilled in the art.

[0090] As used herein, the term "variant" or "mutant" refers to a peptide or polypeptide whose amino acid sequence has been altered compared to a reference sequence by the insertion, deletion, or substitution of one or more amino acids, but which retains at least one biological activity. Mutants described in any embodiment herein comprise amino acid sequences having at least 80%, preferably at least 85%, preferably at least 90%, preferably at least 95%, preferably at least 97% sequence identity to a reference sequence (as described herein) and retaining the biological activity of the reference sequence (e.g., as a glycosylase). Sequence identity between two aligned sequences can be calculated using, for example, NCBI's BLASTp. Mutants also include amino acid sequences having one or more mutations (insertions, deletions, or substitutions) in the amino acid sequence of a reference sequence while still retaining the biological activity of the reference sequence. "One or more" typically refers to 1-20, for example 1-15, 1-10, 1-8, 1-5, or 1-3. Substitutions are preferably conservative substitutions. For example, in the art, conservative substitutions with amino acids of similar or comparable properties generally do not alter the function of the protein or polypeptide. "Amino acids with similar or comparable properties" include, for example, families of amino acid residues having similar side chains.
These families include amino acids with basic side chains (e.g., lysine, arginine, histidine), amino acids with acidic side chains (e.g., aspartic acid, glutamic acid), amino acids with uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), amino acids with nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), amino acids with β-branched side chains (e.g., threonine, valine, isoleucine), and amino acids with aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). Therefore, replacing one or more sites in the polypeptide of the present invention with another amino acid residue from the same side chain class will not substantially affect its activity.
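The side-chain families listed above lend themselves to a simple lookup. The following sketch encodes exactly the families named in the text (in one-letter code) and treats a substitution as conservative when both residues share at least one family; that decision rule is a simplification chosen for illustration, and the function names are hypothetical.

```python
# Side-chain families as listed in the text, in one-letter amino acid code.
FAMILIES = {
    "basic": set("KRH"),
    "acidic": set("DE"),
    "uncharged polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta-branched": set("TVI"),
    "aromatic": set("YFWH"),
}


def shared_families(original, replacement):
    """Names of the families that contain both residues."""
    return [
        name
        for name, members in FAMILIES.items()
        if original in members and replacement in members
    ]


def is_conservative(original, replacement):
    """True if the residues differ but share at least one side-chain family."""
    return original != replacement and bool(shared_families(original, replacement))


# The two APOBEC1 substitutions named in this document, as examples.
print("W->Y:", shared_families("W", "Y"), is_conservative("W", "Y"))
print("R->E:", shared_families("R", "E"), is_conservative("R", "E"))
```

By this rule the Trp-to-Tyr change at position 90 stays within the aromatic family, whereas the Arg-to-Glu change at position 126 crosses from the basic to the acidic family, which is consistent with the latter being a deliberate functional modification rather than a neutral variant.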

[0091] As used herein, "operationally linked" or "operably coupled to" refers to a situation where certain portions of a linear DNA sequence can regulate or control the activity of other portions of the same linear DNA sequence. For example, if a promoter controls transcription of a coding sequence, then it is operably coupled to that coding sequence. "Operationally sequential coupling" refers to the coupling of elements in a specific order, such as from the amino terminus to the carboxyl terminus.

[0092] As used herein, a "connector" (linker) refers to a chemical group or molecule that connects two molecules or parts, such as two domains of a fusion protein, for example the Cas9 enzyme and a deaminase. In some embodiments, the connector is located between or flanking two groups, molecules, or other parts and is covalently linked to them. In some embodiments, the connector is an amino acid or multiple amino acids (e.g., a peptide or protein). In some embodiments, the connector is an organic molecule, group, polymer, or chemical moiety. The length and type of the connector can be designed as needed.

[0093] As used herein, "mutation" refers to the substitution of a residue in a sequence (e.g., a nucleic acid or amino acid sequence) by another residue, the change of one or more residues in a sequence to another residue, or the occurrence of deletion or insertion. In the editor of this invention, it is desirable to generate as many "C-to-G" mutations as possible to improve the purity of its editing, while reducing or eliminating mutations in other bases.
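One way to make "purity of editing" concrete is to compute it from base counts observed at the target C in sequencing reads. The sketch below is a minimal illustration under stated assumptions: the metric definitions (C-to-G efficiency as G reads over all reads, purity as G reads over all non-C reads) are one plausible formalization rather than the definitions used in the patent's figures, the counts are invented, and the function name is hypothetical.

```python
def editing_summary(counts):
    """Summarize editing outcomes at one target C.

    counts: read counts per base observed at the target position,
            e.g. {"C": 600, "G": 300, "A": 60, "T": 40}.
    """
    total = sum(counts.get(base, 0) for base in "ACGT")
    if total == 0:
        raise ValueError("no reads at this position")
    edited = total - counts.get("C", 0)  # reads where C became G, A, or T
    c_to_g = counts.get("G", 0)
    return {
        "c_to_g_efficiency": c_to_g / total,
        "total_editing": edited / total,
        # Share of edited reads that carry the desired C-to-G outcome.
        "purity": c_to_g / edited if edited else 0.0,
    }


# Invented counts, for illustration only.
summary = editing_summary({"C": 600, "G": 300, "A": 60, "T": 40})
print(summary)
```

Separating efficiency from purity shows why the two goals are distinct: an editor can convert many reads while still yielding a large share of C-to-A or C-to-T by-products.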

[0094] As used herein, “insertion / deletion” (indel) refers to the insertion or deletion of nucleotide bases within a nucleic acid. Such insertions or deletions can lead to frameshift mutations within gene coding regions. In some embodiments, it is desirable for a base editor to efficiently modify (e.g., mutate or deaminate) specific nucleotides within a nucleic acid without generating a large number of insertions or deletions within that nucleic acid, and the indel proportion is used as an indicator of this.

[0095] As used herein, guide RNA (gRNA) can exist as a complex of two or more RNAs or as a single RNA molecule. gRNA existing as a single RNA molecule can be called single guide RNA (sgRNA), although the term "gRNA" is used interchangeably to refer to guide RNA existing as a single molecule or as a complex of two or more molecules. Typically, gRNA existing as a single RNA species contains two domains: (1) a domain that shares homology with the target nucleic acid (e.g., and guides the binding of the Cas9 complex to the target); and (2) a domain that binds to the Cas9 protein.

[0096] C-to-G base editor

[0097] The C-to-G base editor of this invention includes the following elements in an operationally sequentially linked manner: nuclear localization signal - uracil N-glycosylation enzyme (UNG) - cytosine deaminase - Cas enzyme - nuclear localization signal.

[0098] Uracil N-glycosylation enzyme (UNG)

[0099] The C-to-G base editor includes UNG. UNG acts by selectively hydrolyzing the N-glycosidic bond of uracil in double-stranded or single-stranded DNA containing dU, leaving a DNA strand with an abasic site. That is, UNG removes the uracil generated by the cytosine deaminase, creating an apurinic / apyrimidinic (AP) site and inducing DNA repair at the damaged site, thereby achieving base conversion.

[0100] Although UNG is found in many species, our research revealed significant differences in performance between UNGs from different species when used to fuse with other functional molecules to form C-to-G base editors. Through large-sample analysis, we selected optimized UNGs and excluded those with unsatisfactory performance.

[0101] Therefore, in a preferred embodiment of the present invention, the UNG is nematode-derived UNG, or Escherichia coli-derived UNG, or its active domain, or a conserved variant polypeptide thereof; most preferably, the UNG is nematode-derived UNG, or its active domain, or a conserved variant polypeptide thereof. The inventors have found that this optimized selection can significantly improve the conversion of cytosine C to guanine G at the target site.

[0102] Cytosine deaminase

[0103] The C-to-G base editor described in this invention includes a cytosine deaminase. In this invention, "deaminase" refers to a protein or enzyme that catalyzes a deamination reaction. Preferably, the deaminase is a cytosine (cytidine) deaminase, which catalyzes the hydrolysis and deamination of cytidine or deoxycytidine to uridine or deoxyuridine. In some embodiments, the deaminase is a domain (such as a CTD domain) of a cytidine deaminase.

[0104] Cytosine deaminases have been studied in this field, and various cytosine deaminases have been proposed. In this invention, the cytosine deaminase includes APOBEC1 or its homologs. The APOBEC1 homologs include enzymes that perform the same or similar functions as APOBEC1, enzymes that have substantially the same or similar structural domains as APOBEC1, or enzymes that originate from different species but perform the same function in their respective species. For example, the APOBEC1 homologs include enzymes selected from the group consisting of: APOBEC3G and APOBEC3A.

[0105] As a preferred embodiment of the present invention, a modified cytosine deaminase is preferably used, wherein the amino acids corresponding to Trp(W) at position 90 and Arg(R) at position 126 of APOBEC1 are mutated, and the cytosine deaminase is linked to a nuclear localization sequence. Preferably, the mutation is that the Trp at position 90 of APOBEC1 of the cytosine deaminase is mutated to Tyr(Y); and / or that the Arg at position 126 is mutated to Glu(E).

[0106] In a preferred embodiment of the invention, the cytosine deaminase and the nuclear localization sequence are further linked by a linker sequence. This linker sequence can be any sequence that does not affect the function of either sequence, such as a tag sequence or some flexible linker sequences known in the art. Suitable tags can be used in this invention. For example, the tag can be FLAG, HA, HA1, c-Myc, Poly-His, Poly-Arg, Strep-TagII, AU1, EE, T7, 4A6, ε, B, gE, or Ty1.

[0107] Cas enzyme

[0108] In this invention, the Cas enzyme is an RNA-guided nuclease. Methods for site-specific cleavage (e.g., genome modification) using Cas enzymes such as Cas9 are known in the art. In some embodiments, the Cas9 domain is the Cas9 domain from Streptococcus pyogenes (SpCas9). In a preferred embodiment, the Cas9 enzyme is an SpCas9 nickase (SpCas9n).

[0109] In the modified base editor of this invention, the UNG, cytosine deaminase, and Cas enzyme can be recombinant proteins, natural proteins, or synthetic proteins, with recombinant proteins being preferred. The proteins of this invention can be naturally purified products, chemically synthesized products, or produced from prokaryotic or eukaryotic hosts (e.g., bacteria, yeast, higher plants, insects, and mammalian cells) using recombinant technology.

[0110] This invention also includes fragments, derivatives, and analogs of the UNG, cytosine deaminase, or Cas enzyme (i.e., the selected or modified enzymes of this invention). As used herein, the terms “fragment,” “derivative,” and “analog” refer to proteins that substantially retain the same biological function or activity as the selected or modified enzymes of this invention. The protein fragments, derivatives, or analogs of this invention may be (i) proteins with one or more conserved or non-conserved amino acid residues (preferably conserved amino acid residues) substituted, where the substituted residues may or may not be encoded by the genetic code; or (ii) proteins having substituent groups in one or more amino acid residues; or (iii) proteins formed by fusing additional amino acid sequences to this protein sequence (such as leader sequences, secretory sequences, sequences used to purify this protein, proprotein sequences, or fusion proteins). These fragments, derivatives, and analogs are within the scope well known to those skilled in the art as defined herein. However, in some embodiments, the amino acid sequences of the selected or modified enzymes and their fragments, derivatives, and analogs contain the mutations specified in this invention, and the sites of these mutations are conserved where necessary.

[0111] In this invention, the term "selectively optimized or modified enzyme" also includes (but is not limited to): the deletion, insertion, and / or substitution of several amino acids (typically 1-20, more preferably 1-10, and even more preferably 1-8, 1-5, 1-3, or 1-2), and the addition or deletion of one or more amino acids (typically up to 20, preferably up to 10, and more preferably up to 5) at the C-terminus and / or N-terminus. For example, in the art, substitution with amino acids of similar or comparable properties generally does not alter the function of the protein. Similarly, the addition of one or more amino acids at the C-terminus and / or N-terminus generally does not alter the function of the protein. The term also includes active fragments and active derivatives of the selected, optimized, or modified enzyme. However, in some embodiments, the amino acid sequences of the selected, optimized, or modified enzymes and their fragments, derivatives, and analogs contain mutations specified in this invention, and these mutation sites are conserved when necessary.

[0112] In this invention, the term "selectively optimized or modified enzyme" also includes (but is not limited to): derived proteins that have at least 80%, preferably at least 85%, more preferably at least 90%, and even more preferably at least 95%, such as at least 98% or 99%, sequence identity with the amino acid sequence of the selected, optimized, or modified enzyme and that retain its protein activity. Similarly, in some embodiments, these derived proteins contain the mutations specified in this invention in the amino acid sequence of the selected, optimized, or modified enzyme and its fragments, derivatives, and analogs, and the sites of these mutations are conserved where necessary.

[0113] The present invention also provides a polynucleotide sequence encoding a selected or modified enzyme or a conservative variant thereof.

[0114] The polynucleotides of this invention can be in DNA or RNA form. DNA form includes cDNA, genomic DNA, or artificially synthesized DNA. DNA can be single-stranded or double-stranded. DNA can be a coding strand or a non-coding strand.

[0115] The polynucleotide encoding the mature protein of the mutant includes: a coding sequence that encodes only the mature protein; a coding sequence for the mature protein and various additional coding sequences; a coding sequence for the mature protein (and optional additional coding sequences) and a non-coding sequence.

[0116] "A polynucleotide encoding a protein" can be a polynucleotide that contains only the sequence encoding the protein, or a polynucleotide that also includes additional coding and / or non-coding sequences.

[0117] The selected, optimized, or modified full-length enzyme nucleotide sequences or fragments thereof of the present invention can generally be obtained by PCR amplification, recombinant methods, or artificial synthesis. For PCR amplification, primers can be designed based on the nucleotide sequences disclosed in the present invention, especially the open reading frame sequences, and the relevant sequences can be amplified using commercially available cDNA libraries or cDNA libraries prepared according to conventional methods known to those skilled in the art as templates. When the sequence is long, it is often necessary to perform two or more PCR amplifications, and then splice the fragments amplified from each amplification in the correct order.

[0118] Once the relevant sequence has been obtained, it can be produced in large quantities using recombinant methods. This typically involves cloning it into a vector, transferring the vector into cells, and then isolating the sequence from the proliferated host cells using conventional methods.

[0119] Furthermore, the relevant sequences can be synthesized artificially, especially when the fragment length is short. Typically, long fragments are obtained by first synthesizing multiple small fragments and then ligating them. Currently, the DNA sequence encoding the protein of the present invention (or a fragment thereof, or a derivative thereof) can be obtained entirely through chemical synthesis. This DNA sequence can then be introduced into various existing DNA molecules (or vectors) and cells known in the art. Furthermore, mutations can be introduced into the protein sequence of the present invention through chemical synthesis.

[0120] The present invention also relates to vectors containing the polynucleotides of the present invention, host cells generated by genetic engineering using the vectors of the present invention or selected, optimized or modified enzyme-coding sequences, and methods for generating the proteins of the present invention via recombinant technology.

[0121] Using conventional recombinant DNA technology, the polynucleotide sequence of the present invention can be used to express or produce recombinant, selected, or modified enzymes. Generally, the steps are as follows: (1) transforming or transducing suitable host cells with the polynucleotide encoding the selected, optimized, or modified enzyme of the present invention, or with a recombinant expression vector containing the polynucleotide; (2) culturing the host cells in a suitable culture medium; and (3) isolating and purifying the protein from the culture medium or cells.

[0122] In this invention, the selected, optimized, or modified enzyme polynucleotide sequence or the C-to-G base editor polynucleotide sequence can be inserted into a recombinant expression vector. The term "recombinant expression vector" refers to bacterial plasmids, bacteriophages, yeast plasmids, plant cell viruses, mammalian cell viruses, or other vectors well-known in the art. In short, various plasmids and vectors can be used as long as they can replicate and remain stable within the host. An important characteristic of expression vectors is that they typically contain an origin of replication, a promoter, a marker gene, and translation control elements. As a preferred embodiment of this invention, the expression vector is suitable for the introduction of a base editor and for intracellular expression, such as, but not limited to, pCMV vectors.

[0123] Methods well known to those skilled in the art can be used to construct expression vectors containing the selected, optimized, or modified enzyme polynucleotide sequence or the C-to-G base editor polynucleotide sequence and suitable transcription / translation control signals. These methods include in vitro recombinant DNA techniques, DNA synthesis techniques, in vivo recombination techniques, etc. The DNA sequence can be operably linked to an appropriate promoter in the expression vector to direct mRNA synthesis. The expression vector also includes a ribosome binding site for translation initiation and a transcription terminator. The expression vector preferably contains one or more selectable marker genes to provide phenotypic traits for selecting transformed host cells.

[0124] Vectors containing the appropriate DNA sequence and an appropriate promoter or control sequence can be used to transform appropriate host cells.

[0125] Methods and applications of gene editing

[0126] This invention also provides a method for gene editing, including gene editing mediated by the C-to-G base editor described herein. Besides using the C-to-G base editor described herein for gene editing, other gene editing reagents known in the art can be used; for example, gRNA (or sgRNA) can be designed in a manner known in the art.

[0127] In this invention, there are no particular limitations on the applicable gene editing targets; they can be somatic cells or germ cells, animal cells or human cells.

[0128] Although C-to-G base editors (CGBEs) hold great promise as powerful gene editors, their application in science and medicine is limited by issues such as off-target effects on DNA and RNA, low purity of edited products, excessively wide editing windows, and low editing precision. In a specific embodiment of this invention, the inventors meticulously screened enzymes from various sources, combined them with appropriate mutations, and evaluated these effects by measuring editing efficiency and product purity. GOTI and RNA-Seq were used to analyze the off-target effects on DNA and RNA, respectively. The results showed:

[0129] (a) Among the CGBEs mentioned, cUNG-CGBE from nematodes and eUNG-CGBE from Escherichia coli significantly improved the conversion of cytosine C to guanine G at the target site; the effect of cUNG-CGBE from nematodes was particularly outstanding.

[0130] (b) Based on cUNG-CGBE or eUNG-CGBE, the W90Y and R126E mutations (YE1) were introduced into Apobec1 to obtain YE1-cUNG-CGBE or YE1-eUNG-CGBE. The amount of C-to-A and C-to-T editing was greatly reduced, and the purity of the corresponding products was significantly improved. The effect of cUNG-CGBE derived from nematodes was particularly excellent.

[0131] (c) Based on YE1-cUNG-CGBE or YE1-eUNG-CGBE, after adding FNLS to the amino terminus, FNLS-YE1-cUNG-CGBE or FNLS-YE1-eUNG-CGBE was obtained, which showed higher overall editing efficiency; the effect of cUNG-CGBE derived from nematodes was particularly excellent.

[0132] (d) Based on FNLS-YE1-cUNG / eUNG-CGBE, YE1 and cUNG are swapped to obtain an editor whose main components include FNLS-cUNG / eUNG-Apobec1(YE1)-Cas9n-NLS. At this time, the C-to-G editing efficiency is further improved, the product purity of the edited product is significantly improved (the C to G conversion ratio is significantly increased), and the editing window is significantly narrowed; the effect of cUNG-CGBE from nematodes is particularly excellent.

[0133] This invention achieves remarkable technical results in improving the editing efficiency and the purity of edited products in targeted editing, and reduces insertion and deletion mutations (indels) during targeted editing to a very low level. Moreover, through multiple modifications, the inventors have optimized editors adapted to various motif structures of target genes, which is beneficial for further optimization and improvement of editing accuracy.

[0134] In one embodiment of the present invention, the cytosine deaminase in the CGBE base editor is APOBEC1, in which position 90 is mutated to Tyr (Y) and position 126 is mutated to Glu (E); this editor preferentially targets the three-base motif "WCW" (W represents A or T), which is beneficial for narrowing the gene editing window.

[0135] In one embodiment of the present invention, the cytosine deaminase in the CGBE base editor is APOBEC3A. This editor preferentially targets the three-base motif "TCW", which helps to narrow the gene editing window.

[0136] In one embodiment of the present invention, the cytosine deaminase in the CGBE base editor is APOBEC3G. This editor preferentially targets the three-base motif "CCN" (N represents A, T, C or G), which helps to narrow the gene editing window.

[0137] Various Cas9 enzymes can be used in the method of this invention. An editor containing multiple Cas9 enzymes was also established in this invention, and its effectiveness was analyzed. As a preferred embodiment of this invention, Cas9n-NG and spGn are used, and their editing efficiency is higher than that of xCas9n.

[0138] Animal-level studies have also demonstrated that the optimized editor of this invention is an effective base editing tool for mammalian embryonic genomes.

[0139] The present invention also provides reagents or kits for performing the gene editing operations of the present invention, wherein the C-to-G base editor of the present invention is contained; or wherein the isolated polynucleotide or a recombinant expression vector containing the polynucleotide is contained.

[0140] The present invention will be further illustrated below with reference to specific embodiments. It should be understood that these embodiments are for illustrative purposes only and are not intended to limit the scope of the invention. Experimental methods in the following embodiments that do not specify specific conditions are generally performed according to conventional conditions such as those described in J. Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd Edition, Science Press, 2002, or according to the manufacturer's recommendations.

[0141] Example 1: Optimization of CGBEs Efficiency and Off-Target Study

[0142] 1. Optimization and transformation of CGBEs

[0143] The existing pCMV-BE3 (Addgene plasmid #73021) backbone plasmid includes the deaminase rApobec1, nCas9 (Cas9n), and the uracil DNA glycosylase inhibitor (UGI) (with NLS and PolyA designed at the carboxyl terminus). CMV-mCherry was added to pCMV-BE3 as a positive selection marker.

[0144] To address the problems of current CGBE editors, the inventors optimized their efficiency. First, they compared uracil-DNA glycosylases (UNGs) from different species as replacements for UGI (Figure 1 a). These UNGs were derived from human (hUNG, NC_000012.12), *Escherichia coli* (eUNG, NC_000913.3), *C. elegans* (cUNG, SEQ ID NO:1), and mouse (mUNG, NC_000071.7), respectively, and each replaced the uracil DNA glycosylase inhibitor (UGI) in pCMV-BE3. The constructs used are shown in Table 1 and Figure 1 a.

[0145] Table 1

[0146]

[0147]

[0148] The amino acid sequence of UNG (cUNG) derived from nematodes is as follows (SEQ ID NO:1):

[0149] MSKTVRIPDMFLKASAASKRKSASSNTENIPEKVPAGNENQEVKKMKLQAPEPTEILLKSLLTGESWSKLLEEEFKKGYISKIEKFLNSEVNKGKQVFPPPTQIFTTFNLLPFDEISVVIIGQDPYHDDNQAHGLSFSVQKGVKPPPSLKNIYKELESDIEGFKRPDHGNLLGWTRQGVFMLNATLTVRAHEANSHAKIGWQTFTDTVIRIISRQSEKPIVFLLWGGFAHKKEELIDTKKHVVIKTAHPSPLSARKWWGCKCFSKCNTELENSGRNPINWADL

[0150] The amino acid sequence of UNG (eUNG) derived from Escherichia coli is as follows (SEQ ID NO:2):

[0151] MANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIYPPQKDVFNAFRFTELGDVKVVILGQDPYHGPGQAHGLAFSVRPGIAIPPSLLNMYKELENTIPGFTRPNHGYLESWARQGVLLLNTVLTVRAGQAHSHASLGWETFTDKVISLINQHREGVVFLLWGSHAQKKGAIIDKQRHHVLKAPPHPSPLSAHRGFFGCNHFVLANQWLEQRGETPIDWMPVLPAESE

[0152] HEK293T cells were transfected with CGBEs containing UNGs from different sources and gRNAs targeting 34 endogenous sites. At 48 hours after transfection, positive cells were sorted, and PCR amplification and sequencing were performed using primers (forward and reverse primers were designed based on the sequences flanking the target sites). The editing efficiency was calculated as the number of reads with C-to-G mutations at the target site divided by the total number of reads. The 34 endogenous sites and gRNAs are shown in Table 2. The results showed that UNG from *E. coli* (eUNG-CGBE) and UNG from *C. elegans* (cUNG-CGBE) significantly increased the conversion of cytosine C to guanine G at the target sites (Figure 1 b; eUNG-CGBE, 12.5%; cUNG-CGBE, 14.8%). cUNG was thus the most effective of the UNGs tested.
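The read-counting rule in the preceding paragraph (reads with a C-to-G mutation at the target site divided by total reads) can be sketched as follows. This is a minimal illustration, assuming the reads have already been aligned to the amplicon so that the target C sits at a fixed index; the function name `c_to_g_efficiency` and the toy reads are hypothetical and are not taken from the patent.

```python
def c_to_g_efficiency(reads, target_index):
    """Fraction of aligned reads that carry G at the target position (reference base C)."""
    if not reads:
        raise ValueError("no reads supplied")
    edited = sum(1 for read in reads if read[target_index].upper() == "G")
    return edited / len(reads)


# Hypothetical toy reads (not experimental data); index 5 is the target C.
toy_reads = [
    "ACGTACGTAC",  # unedited, C at index 5
    "ACGTAGGTAC",  # C-to-G
    "ACGTAGGTAC",  # C-to-G
    "ACGTATGTAC",  # C-to-T, not counted as C-to-G
]
efficiency = c_to_g_efficiency(toy_reads, 5)
print(f"C-to-G efficiency: {efficiency:.1%}")
```

A real pipeline would also need read alignment and quality filtering, which are outside the scope of this sketch.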

[0153] Table 2

[0154]

[0155]

[0156] To prevent random DNA and RNA off-target effects induced by the deaminase rApobec1, the inventors mutated tryptophan (W) at position 90 of rApobec1 to tyrosine (Y) and arginine (R) at position 126 to glutamate (E), obtaining YE1-eUNG-CGBE and YE1-cUNG-CGBE, as shown in Table 3 and Figure 1 a.

[0157] Table 3

[0158]

| Construct | Description |
| --- | --- |
| YE1-eUNG-CGBE | Based on eUNG-CGBE, rApobec1 undergoes W90Y and R126E mutations. |
| YE1-cUNG-CGBE | Based on cUNG-CGBE, rApobec1 undergoes W90Y and R126E mutations. |
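The substitution notation used for the YE1 variant (W90Y, R126E) can be applied to a protein sequence programmatically. The sketch below is only an illustration: the helper `apply_substitutions` is hypothetical, and the input is a placeholder sequence, not the rApobec1 sequence, which is not reproduced in this section.

```python
import re


def apply_substitutions(protein, substitutions):
    """Apply substitutions such as 'W90Y' (1-based position), checking the wild-type residue."""
    residues = list(protein)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"cannot parse substitution {sub!r}")
        wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
        if position < 1 or position > len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}")
        residues[position - 1] = mutant
    return "".join(residues)


# Placeholder sequence (NOT rApobec1): alanines with W at position 90 and R at position 126.
toy_protein = "A" * 89 + "W" + "A" * 35 + "R" + "A" * 10
ye1_toy = apply_substitutions(toy_protein, ["W90Y", "R126E"])
print(ye1_toy[89], ye1_toy[125])
```

Checking the wild-type residue before substituting guards against numbering mismatches, for example when a tag has been added at the N-terminus.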

[0159] HEK293T cells were transfected with YE1-eUNG-CGBE and YE1-cUNG-CGBE and gRNA targeting 34 endogenous sites, respectively. After 48 hours of transfection, transfected positive cells were sorted, and the number of reads with C-to-G / A / T mutations at the target sites was counted and divided by the total number of reads.

[0160] As shown in Figure 1 c, after editing with YE1-eUNG-CGBE and YE1-cUNG-CGBE, the amount of C-to-A and C-to-T editing was significantly reduced, and the purity of the corresponding products was significantly improved.

[0161] To further improve the editing efficiency of YE1-eUNG-CGBE and YE1-cUNG-CGBE, the inventors added a Flag tag and a nuclear localization signal peptide to the N-terminus of the fusion protein and optimized the codons. FNLS denotes a 3x Flag tag plus an NLS nuclear localization sequence (PKKKRKV). The constructs are shown in Table 4 and Figure 1 a.

[0162] Table 4

[0163]

| Construct | Description |
| --- | --- |
| FNLS-YE1-eUNG-CGBE | Based on YE1-eUNG-CGBE, FNLS is added to the amino terminus. |
| FNLS-YE1-cUNG-CGBE | Based on YE1-cUNG-CGBE, FNLS is added to the amino terminus. |

[0164] The 3xFlag tag, NLS nuclear localization sequence, and codon-optimized nCas9 sequence are referenced from pCMV-YE1-FNLS-BE3 (Addgene Plasmid#154005).

[0165] The results show (Figure 1 c) that, compared with YE1-eUNG-CGBE, the variant FNLS-YE1-eUNG-CGBE achieved a higher overall editing efficiency in HEK293T cells (twice that of YE1-eUNG-CGBE).

[0166] Furthermore, the inventors fused eUNG to the N-terminus of CGBE (FNLS-eUNG-YE1-CGBE) instead of the original C-terminus position, thus changing the domain position. The constructs are shown in Table 5 and Figure 1 a.

[0167] Table 5

[0168]

[0169] The results show (Figure 1 c) that this positional change in FNLS-eUNG-YE1-CGBE further improved editing efficiency (average 22.7%). Similarly, FNLS-cUNG-YE1-CGBE, which carries cUNG at the N-terminus, also significantly improved C-to-G editing efficiency (29.6%; Figure 1 b). The C-to-G editing efficiency obtained with cUNG was significantly higher than that obtained with eUNG.

[0170] In parallel, the inventors determined the product purity of the C-to-G edited products of each modified construct by counting the number of reads with C-to-G / A / T mutations at the target sites and dividing by the total number of reads. As shown in Figure 1 c, the product purity of the C-to-G edited products of FNLS-eUNG-YE1-CGBE and FNLS-cUNG-YE1-CGBE was also significantly improved, and the construct with cUNG was even more effective than the construct with eUNG.
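As an illustration of how the C-to-G / A / T read counts can be turned into outcome fractions and a purity figure, consider the sketch below. It assumes that purity means the share of C-to-G reads among all reads in which the target C was converted (to G, A, or T); that definition is an assumption made here for illustration, and the function names and toy reads are hypothetical.

```python
from collections import Counter


def outcome_counts(reads, target_index):
    """Count the base observed at the target position across aligned reads."""
    return Counter(read[target_index].upper() for read in reads)


def outcome_fractions(reads, target_index):
    """Fraction of total reads showing each base at the target position."""
    counts = outcome_counts(reads, target_index)
    return {base: counts.get(base, 0) / len(reads) for base in "ACGT"}


def c_to_g_purity(reads, target_index):
    """C-to-G reads divided by all edited reads (assumed definition of purity)."""
    counts = outcome_counts(reads, target_index)
    edited = counts.get("G", 0) + counts.get("A", 0) + counts.get("T", 0)
    return counts.get("G", 0) / edited if edited else 0.0


# Hypothetical toy reads (not experimental data); index 2 is the target C.
toy_reads = ["AACTT"] * 5 + ["AAGTT"] * 4 + ["AATTT"] * 1
fractions = outcome_fractions(toy_reads, 2)
purity = c_to_g_purity(toy_reads, 2)
print(fractions, purity)
```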

[0171] Furthermore, the inventors also determined the editing window of each modified construct. The C-to-G conversion efficiency was calculated for each C position along the gRNA target sequence (with the PAM located at positions 21-23).

[0172] As shown in Figure 1 d, the editing windows of FNLS-eUNG-YE1-CGBE and FNLS-cUNG-YE1-CGBE are narrowed, and editing efficiency is high for Cs located at positions 4-7 of the gRNA target sequence (with the PAM at positions 21-23).
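The per-position calculation behind an editing window can be sketched as below: for every C in positions 1-20 of the target (PAM at 21-23), compute the fraction of reads carrying G at that position. The reference is the EMX1 site 8 target listed in Table 7; the reads are hypothetical toy data, and the helper names are assumptions made for this example.

```python
def editing_window(reference, reads):
    """Map each 1-based protospacer position holding a C to its C-to-G read fraction."""
    profile = {}
    for index, base in enumerate(reference[:20].upper()):
        if base != "C":
            continue
        edited = sum(1 for read in reads if read[index].upper() == "G")
        profile[index + 1] = edited / len(reads)
    return profile


def with_g_at(sequence, positions):
    """Return a copy of sequence with G written at the given 1-based positions."""
    bases = list(sequence)
    for position in positions:
        bases[position - 1] = "g"
    return "".join(bases)


reference = "aagtctctctcttaatgacacgg"  # EMX1 site 8 (Table 7), PAM at positions 21-23
# Hypothetical reads: two unedited, one edited at position 5, one edited at positions 5 and 7.
toy_reads = [
    reference,
    with_g_at(reference, [5]),
    with_g_at(reference, [5, 7]),
    reference,
]
profile = editing_window(reference, toy_reads)
print(profile)
```

Positions whose fraction stands clearly above the rest would then delimit the window, such as positions 4-7 reported for the optimized constructs.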

[0173] 2. Comparison of optimized and modified CGBEs with the previously reported CGBE1

[0174] The inventors' optimized CGBE mutants FNLS-eUNG-YE1-CGBE and FNLS-cUNG-YE1-CGBE were compared with the published CGBE1 (Addgene plasmid#140252). The calculation method for C-to-G editing efficiency, the determination of the editing window, and the determination of the purity of the editing product are as described above.

[0175] The results show that the optimized CGBEs exhibit significantly higher C-to-G editing efficiency (Figure 2 a). The optimized CGBEs showed higher editing efficiency at positions 5 and 6 of the editing window (with the PAM at positions 21-23) (Figure 2 b). Furthermore, compared to CGBE1, FNLS-eUNG-YE1-CGBE and FNLS-cUNG-YE1-CGBE produced edited products with higher purity, and the proportion of C-to-G conversions relative to other edited products was significantly increased (Figure 2 c).

[0176] The inventors also calculated the indel mutation frequency of CGBEs by dividing the number of reads with insertion or deletion mutations by the total number of reads. The results showed that the indel frequency of FNLS-cUNG-YE1-CGBE was significantly reduced (Figure 2 d).

[0177] Furthermore, the optimized CGBEs were compared with the prime editors PE2 (Addgene plasmid #132775) and PE3. Six different targets were tested in the experiments (EMX1 site 2, FANCF site 1, HEK293 site 1, HEK293 site 2, HEK293 site 3, RNF2 site 1), and PE2 and PE3 were found to be significantly less efficient than the optimized CGBEs (Figure 2 e). PE3 also induced a higher frequency of insertion and deletion (indel) mutations (Figure 2 f).

[0178] 3. Determination of off-target effects of optimized and modified CGBEs

[0179] The inventors employed off-target research methods such as GOTI and RNA-seq to evaluate the potential DNA and RNA off-target effects of FNLS-eUNG-YE1-CGBE and FNLS-cUNG-YE1-CGBE. The detected sites are shown in Table 6, where OT1–OT7 are the predicted off-target sites. During detection, PCR amplification and sequencing were performed using primers (internal forward and internal reverse primers designed based on the sequences flanking the target sites).

[0180] As shown in Figure 3 a, the number of single nucleotide variants (SNVs) in CGBE-edited mouse embryos was similar to the baseline SNV level and significantly lower than in the BE3 group. Furthermore, no mutation preference was observed in the CGBE group (Figure 3 b). This indicates that the optimized CGBE mutants did not cause detectable off-target effects at the DNA level. In addition, FNLS-eUNG-YE1-CGBE and FNLS-cUNG-YE1-CGBE did not increase the number of RNA SNVs or show any mutation preference, indicating that the optimized CGBE mutants did not cause RNA off-target effects (Figure 3 cd).

[0181] Furthermore, the inventors found no significant off-target effects at the gRNA-dependent off-target sites predicted by Cas-OFFinder (Figure 3 e).

[0182] Table 6

[0183]

[0184]

[0185] In summary, these results show that FNLS-eUNG-YE1-CGBE and FNLS-cUNG-YE1-CGBE, hereinafter referred to as eOPTI-CGBE and cOPTI-CGBE respectively, can achieve high C-to-G conversion efficiency with low off-target effects.

[0186] Example 2: Target sequence preference of OPTI-CGBEs in endogenous gene studies

[0187] As previously stated, the inventors evaluated the editing effect at 34 target sites. When the inventors analyzed the sequences upstream and downstream of the sites successfully edited by eOPTI-CGBE, they detected a clear bias toward a 3-base motif ("WCW"; W representing A or T), without an increase in other editing products (Figure 4). The motifs tallied were ACN, CCN, GCN, TCN, NCA, NCC, NCG, and NCT (where "N" represents any base A, T, G, or C, and the "C" in the middle position is the target site for "C-to-G").

[0188] A very similar preference was also found for cOPTI-CGBE, except that at the W positions T is preferred over A (Figure 4 ad).
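The three-base motifs discussed here (WCW, TCW, CCN, each with the target C in the middle) can be checked mechanically using the IUPAC codes W = A or T and N = any base. The following is a minimal sketch; the helper names are hypothetical, and the two example targets are taken from Tables 7 and 9 of this document.

```python
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "N": "ACGT"}
MOTIFS = ("WCW", "TCW", "CCN")


def trinucleotide_context(sequence, position):
    """Return the 3-base context centred on a 1-based position, in upper case."""
    if position < 2 or position > len(sequence) - 1:
        raise ValueError("position needs one flanking base on each side")
    return sequence[position - 2:position + 1].upper()


def matches_motif(context, motif):
    """True if every base of context is allowed by the corresponding IUPAC code."""
    return len(context) == len(motif) and all(
        base in IUPAC[code] for base, code in zip(context, motif)
    )


def matching_motifs(sequence, position):
    """List the motifs in MOTIFS that fit the target C at the given position."""
    context = trinucleotide_context(sequence, position)
    return [motif for motif in MOTIFS if matches_motif(context, motif)]


emx1_site_8 = "aagtctctctcttaatgacacgg"   # Table 7
emx1_site_18 = "gctgcccctacacccttcccagg"  # Table 9
print(matching_motifs(emx1_site_8, 5))    # context TCT
print(matching_motifs(emx1_site_8, 19))   # context ACA
print(matching_motifs(emx1_site_18, 6))   # context CCC
```

Note that TCW is a subset of WCW, so a TCT or TCA context matches both motifs.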

[0189] Next, to verify the analysis results, the inventors tested the editing effect of OPTI-CGBEs on 20 motifs containing "WCW" (Table 7). During the detection, PCR amplification and sequencing were performed using primers (forward and reverse primers designed based on the sequences flanking the target site).

[0190] The results showed that the eOPTI-CGBE editing efficiency for cytosines within the "WCW" motif was significantly higher than for target Cs without this motif (30.2% vs 9.5%) (Figure 4 e). The cOPTI-CGBE editing efficiency showed a similar improvement (34.0% vs 12.1%) (Figure 4 e). Furthermore, comparative analysis of the "WCW" target sites also showed that the purity of the edited products from eOPTI-CGBE and cOPTI-CGBE was significantly improved, while the proportions of C-to-A and C-to-T editing decreased significantly, and the induced insertion frequency was greatly reduced (eOPTI-CGBE: 22.6% down to 12.6%; cOPTI-CGBE: 11.7% down to 5.6%) (Figure 4 ef). In comparison, the induced insertion frequency generated by cOPTI-CGBE is significantly lower than that of eOPTI-CGBE, indicating that choosing cUNG to establish the base editor of this invention has advantages.

[0191] Therefore, eOPTI-CGBE and cOPTI-CGBE can achieve high C-to-G editing efficiency when targeting sites with the "WCW" motif, and the product purity is high.

[0192] Table 7

[0193]

| On-target site | gRNA sequence | SEQ ID NO: |
| --- | --- | --- |
| EMX1 site 8 | aagtctctctcttaatgacacgg | 81 |
| EMX1 site 9 | gcgttcagggcctgggagcctgg | 82 |
| EMX1 site 10 | gtctttcttaacgtattgagagg | 83 |
| EMX1 site 11 | caaactcaaccacttcctgttgg | 84 |
| EMX1 site 12 | aggacttctttggctccctgg | 85 |
| EMX1 site 13 | tctagactcaacccactagatgg | 86 |
| EMX1 site 14 | attcacagaaacctccctagagg | 87 |
| EMX1 site 15 | ttgatcaatgtggaactcactgg | 88 |
| EMX1 site 16 | tgctgtctgggccacatgcaggg | 89 |
| EMX1 site 17 | cctgactgatggagggagagggg | 90 |
| DNMT3B site 6 | aatttcttgaatccaggaggcgg | 91 |
| DNMT3B site 7 | atagtctacatttatagtgctgg | 92 |
| DNMT3B site 8 | atcacagacatgggtatgccagg | 93 |
| DNMT3B site 9 | taaacatgtgttaaggatcctgg | 94 |
| DNMT3B site 10 | ggctctgatcttcatcccctcgg | 95 |
| DNMT3B site 11 | gggatcatgggacagatgggagg | 96 |
| DNMT3B site 12 | tcttttctcaatagaacaaatgg | 97 |
| DNMT3B site 13 | cctgacagggtggatgagccagg | 98 |
| DNMT3B site 14 | ttagtcacttgcctactgtcagg | 99 |
| DNMT3B site 15 | tgcctactggttaaaggacctgg | 100 |

[0194] Example 3: OPTI-CGBEs mutant

[0195] To further expand the application scope of OPTI-CGBEs, the inventors further examined the sequence preferences of other deaminases in order to determine the target range in the CGBE editor.

[0196] The inventors explored three different deaminase modules: a human APOBEC3A deaminase (from Addgene Plasmid #131315) with a preference for the "TCN" motif, and two human APOBEC3G deaminases (from Addgene Plasmid #163636) with a preference for C-rich sequence contexts (Figure 5 a, Figure 6 a and Table 8).

[0197] Table 8

[0198]

[0199]

[0200] Using the measurement method described above, the inventors first analyzed the editing efficiency and activity window of these CGBE editors at 34 target sites (Table 2). The results showed that these new CGBEs also exhibited a narrowed editing window, similar to eOPTI-CGBE and cOPTI-CGBE (Figure 5 b). The C-to-G editing efficiency of these CGBE editors at different target positions is shown in Figure 6 b.

[0201] Furthermore, eA3A-eOPTI-CGBE and eA3A-cOPTI-CGBE show a clear preference for the "TCW" motif, where W is either A or T, which is consistent with the motif preference of cOPTI-CGBE (Figure 5 cf).

[0202] Unlike the motif preference of rAPOBEC1 and APOBEC3A, the inventors found that two variants of APOBEC3G constructed with E. coli UNG (hA3G-eOPTI-CGBE and hA3G-CTD-eOPTI-CGBE) showed a significant preference for the "CCN" motif (Figure 6 cf). In addition, 26 gRNAs containing CCN motifs were tested (Table 9); during detection, PCR amplification and sequencing were performed using primers (internal forward and internal reverse primers designed based on the sequences flanking the target site). The results showed that the editing efficiency at target sites with the CCN motif was significantly higher than at target sites without it (Figure 6 e).

[0203] The inventors found that hA3G-cOPTI-CGBE and hA3G-CTD-cOPTI-CGBE, constructed using nematode UNG (cUNG in place of the eUNG used above), also yielded similar results (Figure 6). Furthermore, when the target site contains three or more consecutive Cs, the APOBEC3G variants containing eUNG or cUNG exhibit high C-to-G conversion efficiency, especially those containing cUNG (Figure 6 g).

[0204] Table 9

[0205]

| On-target site | On-target sequence | SEQ ID NO: |
| --- | --- | --- |
| EMX1 site 18 | gctgcccctacacccttcccagg | 101 |
| EMX1 site 19 | tggcccctgtgccctgcagatgg | 102 |
| EMX1 site 20 | ttacccagttctctgggagcagg | 103 |
| EMX1 site 21 | tcctccccaaggggggcctctgg | 104 |
| EMX1 site 22 | cctcccagattatgcatgagagg | 105 |
| EMX1 site 23 | ggccccagtggctgctctggggg | 106 |
| EMX1 site 24 | ggagcccttcttcttctgctcgg | 107 |
| EMX1 site 25 | gctcccatcacatcaaccggtgg | 108 |
| EMX1 site 26 | cactccctggccaggctttgggg | 109 |
| EMX1 site 27 | accaccccaggctctggggctgg | 110 |
| EMX1 site 28 | gtaccccaccccaggctcccagg | 111 |
| EMX1 site 29 | tgtcccacttgtcaggatgaagg | 112 |
| DNMT3B site 16 | catccctgacccagagaacccgg | 113 |
| DNMT3B site 17 | catccctacctagccagatgggg | 114 |
| DNMT3B site 18 | gcatccctggctgggtagatagg | 115 |
| DNMT3B site 19 | cacaccctgtaaacagcaaatgg | 116 |
| DNMT3B site 20 | gcatccctacttcacagatgggg | 117 |
| DNMT3B site 21 | gggccctgtggggtggatgtggg | 118 |
| DNMT3B site 22 | agctccctattcccatgcctagg | 119 |
| DNMT3B site 23 | ttgccctggaagcagcacacagg | 120 |
| DNMT3B site 24 | tcacccaggctggaatgcagtgg | 121 |
| EMX1 site 30 | ctccccattggcctgcttcgtgg | 122 |
| EMX1 site 31 | ttgcccaccctagtcattggagg | 123 |
| HEK293 site 6 | ctgcccaatatgtaaaataacgg | 124 |
| HEK293 site 7 | cttcccaagtgagaagccagtgg | 125 |
| FANCF site 6 | caacccaaatgcctttctgaagg | 126 |

[0206] To further broaden the target range of C-to-G editing, the inventors also constructed six CGBE-NG editors capable of recognizing "NG" PAMs by replacing Cas9n with Cas9n-NG (Addgene Plasmid #138566), spGn (Addgene Plasmid #140001), or xCas9n (Addgene Plasmid #138565) (Table 10 and Figure 7 a).

[0207] Table 10

[0208]

| Construct | Description |
| --- | --- |
| eOPTI-CGBE-Cas9n-NG | Based on eOPTI-CGBE, Cas9n is replaced with Cas9n-NG. |
| eOPTI-CGBE-spGn | Based on eOPTI-CGBE, Cas9n is replaced with spGn. |
| eOPTI-CGBE-xCas9n | Based on eOPTI-CGBE, Cas9n is replaced with xCas9n. |
| cOPTI-CGBE-Cas9n-NG | Based on cOPTI-CGBE, Cas9n is replaced with Cas9n-NG. |
| cOPTI-CGBE-spGn | Based on cOPTI-CGBE, Cas9n is replaced with spGn. |
| cOPTI-CGBE-xCas9n | Based on cOPTI-CGBE, Cas9n is replaced with xCas9n. |

[0209] Twenty gRNAs targeting sites with NG PAMs were tested (Table 11).

[0210] Table 11

[0211]

| On-target site | On-target sequence | SEQ ID NO: |
| --- | --- | --- |
| EMX1 site 32 | caaacaagggcagatggcgcag | 127 |
| EMX1 site 33 | aagttctggcaagaggggactg | 128 |
| EMX1 site 34 | agaggactgcctgggaagggtg | 129 |
| EMX1 site 35 | aaccatctgcagggcacagggg | 130 |
| EMX1 site 36 | tctttctgaggcctgccctcag | 131 |
| EMX1 site 37 | ccaactgcagctgtgcagacgg | 132 |
| EMX1 site 38 | cagcacatttaaaagcaagtgg | 133 |
| EMX1 site 39 | aatgttctggggacacttgcag | 134 |
| EMX1 site 40 | aaccatcttcagcgttggcagg | 135 |
| EMX1 site 41 | ggttctttacttggctttgcag | 136 |
| DNMT3B site 25 | cagacagggtttcaccatgttg | 137 |
| DNMT3B site 26 | aagttctgggattacagtcatg | 138 |
| DNMT3B site 27 | gcatttctgatcagagagcctg | 139 |
| DNMT3B site 28 | gacgtcagggaagcccgtactg | 140 |
| DNMT3B site 29 | tgtttctgtgtggagtgcctgg | 141 |
| DNMT3B site 30 | ctgttacatgtgtctcccgcag | 142 |
| DNMT3B site 31 | gctgtcatggcgtcctgcggcg | 143 |
| DNMT3B site 32 | gggttcattcactccctccttg | 144 |
| DNMT3B site 33 | gattttcaggggcctgatgaag | 145 |
| DNMT3B site 34 | gccttctgagttagcagagctg | 146 |

[0212] As shown in Figure 7 bc, the editing efficiency of the Cas9n-NG and spGn versions is higher than that of the xCas9n version, while the indel frequency of the Cas9n-NG version is lower than that of the spGn version. Therefore, Cas9n-NG is the best version for C-to-G editing at sites with NG PAMs.
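The difference between targets with an "NGG" PAM and targets with an "NG" PAM can be illustrated with a small pattern check. The sketch below assumes that the sequences in Table 7 are listed as a 20-nt protospacer followed by a 3-nt PAM and those in Table 11 as a 20-nt protospacer followed by the 2-nt "NG"; that reading of the tables is an assumption, and the helper names are hypothetical.

```python
def pam_matches(pam, pattern):
    """True if pam fits pattern, where 'N' in the pattern matches any base."""
    pam, pattern = pam.upper(), pattern.upper()
    return len(pam) == len(pattern) and all(
        code == "N" or code == base for base, code in zip(pam, pattern)
    )


def split_target(target, pam_length):
    """Split a listed target into (protospacer, PAM) using the last pam_length bases."""
    return target[:-pam_length], target[-pam_length:]


ngg_site = "aagtctctctcttaatgacacgg"  # EMX1 site 8 (Table 7)
ng_site = "caaacaagggcagatggcgcag"    # EMX1 site 32 (Table 11)

protospacer, pam = split_target(ngg_site, 3)
protospacer_ng, pam_ng = split_target(ng_site, 2)
print(pam, pam_matches(pam, "NGG"))
print(pam_ng, pam_matches(pam_ng, "NG"))
```

A check of this kind only tests the PAM pattern; whether a given Cas9 variant edits a site efficiently still has to be measured experimentally, as in the comparison above.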

[0213] Example 4: Library-level study of OPTI-CGBEs sequence preference

[0214] The inventors conducted a large-scale screen to evaluate the sequence preference of OPTI-CGBEs. A paired sgRNA library (from Dr. Leopold Parts of the Wellcome Sanger Institute) containing 41,388 clonal oligonucleotides, each containing a 20 nt sgRNA sequence and its target sequence, was used. HEK293T cells were infected with a lentiviral vector containing the paired sgRNA library, followed by transfection with a plasmid encoding one of the eight OPTI-CGBEs. Deep sequencing was then performed to evaluate the editing results and explore the impact of sequence context on editing efficiency. Target Cs were defined as Cs located at positions 4-7 of the protospacer with a sequencing coverage exceeding 100X. The inventors found that the motif preferences observed in the library are basically consistent with those observed at the endogenous sites: eOPTI-CGBE and cOPTI-CGBE prefer the "WCW" motif, eA3A-OPTI-CGBEs prefer the "TCW" motif, and OPTI-CGBEs fused with APOBEC3G and APOBEC3G-CTD prefer the "CCN" motif (Figure 8 ad).
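The selection rule for target Cs in the library screen (a C at positions 4-7 of the protospacer, at a site covered by more than 100 reads) can be written as a simple filter. This is a minimal sketch under that reading of the rule; the function name, its parameters, and the coverage values are hypothetical, and the protospacer is the first 20 nt of EMX1 site 8 from Table 7, used only as an example sequence.

```python
def target_c_positions(protospacer, coverage, window=(4, 7), min_coverage=100):
    """Return 1-based positions of Cs inside the window, or [] if coverage is too low."""
    if coverage <= min_coverage:
        return []
    start, end = window
    return [
        position
        for position in range(start, end + 1)
        if protospacer[position - 1].upper() == "C"
    ]


protospacer = "aagtctctctcttaatgaca"  # first 20 nt of EMX1 site 8 (Table 7)
print(target_c_positions(protospacer, coverage=250))
print(target_c_positions(protospacer, coverage=80))
```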

[0215] Next, the inventors used a training dataset (an 80% random sample) from the paired sgRNA library (from Dr. Leopold Parts of the Wellcome Sanger Institute) to build a logistic regression model that learns motif preferences. The model was then tested on the remaining 20% of the library and showed good performance. The learned parameters are displayed as sequence logos, showing the motif preferences of eOPTI-CGBE (WCW), cOPTI-CGBE (TCW), eA3A-OPTI-CGBEs (TCW), and hA3G-OPTI-CGBEs (CCN) (Figure 9).
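The 80% / 20% train and test procedure with a logistic regression model can be sketched in plain Python as below. The patent does not specify the features or the training procedure, so everything here is an assumption for illustration: the features are one-hot encodings of the bases immediately before and after the target C, the data are synthetic and noise-free (a context is labelled as edited only if it matches WCW), and training uses batch gradient descent.

```python
import math
import random

BASES = "ACGT"


def encode(context):
    """Indices of active features: base before C (0-3), base after C (4-7), bias (8)."""
    return (BASES.index(context[0]), 4 + BASES.index(context[2]), 8)


def predict(weights, features):
    """Logistic probability that the target C is edited."""
    score = sum(weights[i] for i in features)
    return 1.0 / (1.0 + math.exp(-score))


def train(examples, epochs=1000, learning_rate=1.0):
    """Fit the weights by batch gradient descent on the mean log-loss."""
    weights = [0.0] * 9
    for _ in range(epochs):
        gradient = [0.0] * 9
        for features, label in examples:
            error = predict(weights, features) - label
            for i in features:
                gradient[i] += error
        for i in range(9):
            weights[i] -= learning_rate * gradient[i] / len(examples)
    return weights


# Synthetic, noise-free data (NOT library measurements): label 1 if the context is WCW.
rng = random.Random(0)
contexts = [a + "C" + b for a in BASES for b in BASES] * 25
rng.shuffle(contexts)
examples = [(encode(c), int(c[0] in "AT" and c[2] in "AT")) for c in contexts]
split = int(0.8 * len(examples))
train_set, test_set = examples[:split], examples[split:]

weights = train(train_set)
correct = sum((predict(weights, f) > 0.5) == bool(y) for f, y in test_set)
accuracy = correct / len(test_set)
print(f"held-out accuracy: {accuracy:.2f}")
```

In this toy setting the learned weights for A and T at the flanking positions end up larger than those for C and G, which is the kind of per-position preference that a sequence logo summarizes.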

[0216] Example 5: Application of OPTI-CGBEs in mouse embryos

[0217] Having obtained CGBE variants with high C-to-G base editing efficiency and no detectable off-target effects in different sequence contexts, the inventors then applied OPTI-CGBEs to genomic DNA editing in mouse embryos. The target genes and their gRNA sequences are shown in Table 12.

[0218] The mRNA encoding eOPTI-CGBE or cOPTI-CGBE was injected into fertilized eggs along with one of the three gRNAs targeting the Tyr gene (Figure 10 a). Embryonic development was not adversely affected by the injection (Figure 10 b).

[0219] Table 12

[0220]

| gRNA | Sequence (5'-3') | SEQ ID NO: |
| --- | --- | --- |
| Tyr-A | tcagtctatgtcatccccac | 147 |
| Tyr-B | acatctacgacctctttgta | 148 |
| Tyr-C | gtgtcaagggacacactgct | 149 |

[0221] The inventors first found that both OPTI-CGBEs achieved high C-to-G base conversion efficiencies at the three target sites in the Tyr gene (NC_000073.7) (Figure 10 c; eOPTI-CGBE: 30.4%, 12.9% and 5.2%; cOPTI-CGBE: 33.2%, 22.1% and 12.5%).

[0222] Consistent with previous findings, the inventors showed that injecting OPTI-CGBEs at the two-cell stage of embryonic development significantly improved the C-to-G conversion efficiency at all three Tyr target sites (Figure 10 c; eOPTI-CGBE: 41.1%, 25.2%, and 11.0%; cOPTI-CGBE: 36.4%, 30.5%, and 12.3%). The efficacy of embryo injection was further validated by Sanger sequencing (Figure 10 d).

[0223] The introduction of a stop codon into the Tyr gene results in an albino phenotype in C57BL / 6J mice. The inventors injected eOPTI-CGBE mRNA and sgRNA-Tyr-C into fertilized eggs or embryos at the two-cell stage, transplanted the embryos into recipient mothers, and tracked the hair color phenotype of the offspring. The inventors found that offspring from embryos injected at the two-cell stage also showed a higher C-to-G editing rate in the Tyr gene than offspring from zygote-injected embryos. Furthermore, offspring from fertilized egg-injected embryos mostly had wild-type black hair, with a small portion showing black-and-white chimeric hair, while offspring from two-cell-injected embryos had a larger proportion of chimeric hair and a small proportion of uniform white hair. More than 50% of the offspring from chimeric-haired females mated with males had white hair, and no offspring with chimeric hair were found (Figure 10 e).

[0224] These results demonstrate that the optimized OPTI-CGBEs of this invention are an effective base editing tool for mammalian embryonic genomes.

[0225] All documents mentioned in this invention are incorporated herein by reference as if each document were individually incorporated by reference. Furthermore, it should be understood that after reading the foregoing teachings of this invention, those skilled in the art can make various alterations or modifications to this invention, and these equivalent forms also fall within the scope defined by the appended claims.

Sequence Listing

<110> Shenzhen Institute of Agricultural Genomics, Chinese Academy of Agricultural Sciences <120> A high-efficiency and high-precision base editor for cytosine C to guanine G conversion <130> 216297 <160> 149 <170> SIPOSequenceListing 1.0 <210> 1 <211> 282 <212> PRT <213> C. elegans <400> 1 Met Ser Lys Thr Val Arg Ile Pro Asp Met Phe Leu Lys Ala Ser Ala 1 5 10 15 Ala Ser Lys Arg Lys Ser Ala Ser Asn Thr Glu Asn Ile Pro Glu Lys 20 25 30 Val Pro Ala Gly Asn Glu Asn Gln Glu Val Lys Lys Met Lys Leu Gln 35 40 45 Ala Pro Glu Pro Thr Glu Ile Leu Leu Lys Ser Leu Leu Thr Gly Glu 50 55 60 Ser Trp Ser Lys Leu Leu Glu Glu Glu Phe Lys Lys Gly Tyr Ile Ser 65 70 75 80 Lys Ile Glu Lys Phe Leu Asn Ser Glu Val Asn Lys Gly Lys Gln Val 85 90 95 Phe Pro Pro Pro Thr Gln Ile Phe Thr Thr Phe Asn Leu Leu Pro Phe 100 105 110 Asp Glu Ile Ser Val Val Ile Ile Gly Gln Asp Pro Tyr His Asp Asp 115 120 125 Asn Gln Ala His Gly Leu Ser Phe Ser Val Gln Lys Gly Val Lys Pro 130 135 140 Pro Pro Ser Leu Lys Asn Ile Tyr Lys Glu Leu Glu Ser Asp Ile Glu 145 150 155 160 Gly Phe Lys Arg Pro Asp His Gly Asn Leu Leu Gly Trp Thr Arg Gln 165 170 175 Gly Val Phe Met Leu Asn Ala Thr Leu Thr Val Arg Ala His Glu Ala 180 185 190 Asn Ser His Ala Lys Ile Gly Trp Gln Thr Phe Thr Asp Thr Val Ile 195 200 205 Arg Ile Ile Ser Arg Gln Ser Glu Lys Pro Ile Val Phe Leu Leu Trp 210 215 220 Gly Gly Phe Ala His Lys Lys Glu Glu Leu Ile Asp Thr Lys Lys His 225 230 235 240 Val Val Ile Lys Thr Ala His Pro Ser Pro Leu Ser Ala Arg Lys Trp 245 250 255 Trp Gly Cys Lys Cys Phe Ser Lys Cys Asn Thr Glu Leu Glu 
Asn Ser 260 265 270 Gly Arg Asn Pro Ile Asn Trp Ala Asp Leu 275 280 <210> 2 <211> 229 <212> PRT <213> E.coli <400> 2 Met Ala Asn Glu Leu Thr Trp His Asp Val Leu Ala Glu Glu Lys Gln 1 5 10 15 Gln Pro Tyr Phe Leu Asn Thr Leu Gln Thr Val Ala Ser Glu Arg Gln 20 25 30 Ser Gly Val Thr Ile Tyr Pro Pro Gln Lys Asp Val Phe Asn Ala Phe 35 40 45 Arg Phe Thr Glu Leu Gly Asp Val Lys Val Val Ile Leu Gly Gln Asp 50 55 60 Pro Tyr His Gly Pro Gly Gln Ala His Gly Leu Ala Phe Ser Val Arg 65 70 75 80 Pro Gly Ile Ala Ile Pro Pro Ser Leu Leu Asn Met Tyr Lys Glu Leu 85 90 95 Glu Asn Thr Ile Pro Gly Phe Thr Arg Pro Asn His Gly Tyr Leu Glu 100 105 110 Ser Trp Ala Arg Gln Gly Val Leu Leu Leu Asn Thr Val Leu Thr Val 115 120 125 Arg Ala Gly Gln Ala His Ser His Ala Ser Leu Gly Trp Glu Thr Phe 130 135 140 Thr Asp Lys Val Ile Ser Leu Ile Asn Gln His Arg Glu Gly Val Val 145 150 155 160 Phe Leu Leu Trp Gly Ser His Ala Gln Lys Lys Gly Ala Ile Ile Asp 165 170 175 Lys Gln Arg His His Val Leu Lys Ala Pro His Pro Ser Pro Leu Ser 180 185 190 Ala His Arg Gly Phe Phe Gly Cys Asn His Phe Val Leu Ala Asn Gln 195 200 205 Trp Leu Glu Gln Arg Gly Glu Thr Pro Ile Asp Trp Met Pro Val Leu 210 215 220 Pro Ala Glu Ser Glu 225 <210> 3 <211> 32 <212> PRT <213> Artificial Sequence <400> 3 Ser Gly Gly Ser Ser Gly Gly Ser Ser Gly Ser Glu Thr Pro Gly Thr 1 5 10 15 Ser Glu Ser Ala Thr Pro Glu Ser Ser Gly Gly Ser Ser Gly Gly Ser 20 25 30 <210> 4 <211> 16 <212> PRT <213> Artificial Sequence <400> 4 Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser 1 5 10 15 <210> 5 <211> 22 <212> PRT <213> Artificial Sequence <400> 5 Asp Tyr Lys Asp His Asp Gly Asp Tyr Lys Asp His Asp Ile Asp Tyr 1 5 10 15 Lys Asp Asp Asp Asp Lys 20 <210> 6 <211> 7 <212> PRT <213> Artificial Sequence <400> 6 Pro Lys Lys Lys Arg Lys Val 1 5 <210> 7 <211> 23 <212> DNA <213> Artificial Sequence <400> 7 tgcccctccc tccctggccc agg 23 <210> 8 <211> 23 <212> DNA <213> Artificial Sequence <400> 8 agagcccccc ctcaaagaga ggg 23 <210> 9 <211> 23 <212> DNA 
<213> Artificial Sequence <400> 9 gagtccgagc agaagaagaa ggg 23 <210> 10 <211> 23 <212> DNA <213> Artificial Sequence <400> 10 ggaatccctt ctgcagcacc tgg 23 <210> 11 <211> 23 <212> DNA <213> Artificial Sequence <400> 11 gaacacaaag catagactgc ggg 23 <210> 12 <211> 23 <212> DNA <213> Artificial Sequence <400> 12 ggcccagact gagcacgtga tgg 23 <210> 13 <211> 23 <212> DNA <213> Artificial Sequence <400> 13 ggcactgcgg ctggaggtgg ggg 23 <210> 14 <211> 23 <212> DNA <213> Artificial Sequence <400> 14 gtcatcttag tcattacctg agg 23 <210> 15 <211> 23 <212> DNA <213> Artificial Sequence <400> 15 gtattcacct gaaagtgtgc agg 23 <210> 16 <211> 23 <212> DNA <213> Artificial Sequence <400> 16 ggcactcggg ggcgagagga ggg 23 <210> 17 <211> 23 <212> DNA <213> Artificial Sequence <400> 17 gcgtgacttc cacatgagcg tgg 23 <210> 18 <211> 23 <212> DNA <213> Artificial Sequence <400> 18 gactcaccca ggagtgcgtt agg 23 <210> 19 <211> 23 <212> DNA <213> Artificial Sequence <400> 19 gagctcactg aacgctggca tgg 23 <210> 20 <211> 23 <212> DNA <213> Artificial Sequence <400> 20 gttagaccca tgggagcagc tgg 23 <210> 21 <211> 23 <212> DNA <213> Artificial Sequence <400> 21 agagcctgat gggaagactg agg 23 <210> 22 <211> 23 <212> DNA <213> Artificial Sequence <400> 22 gtagcctcag tcttcccatc agg 23 <210> 23 <211> 23 <212> DNA <213> Artificial Sequence <400> 23 aagtcctcct actactgccc tgg 23 <210> 24 <211> 23 <212> DNA <213> Artificial Sequence <400> 24 agtctccaca caggtgctgt tgg 23 <210> 25 <211> 23 <212> DNA <213> Artificial Sequence <400> 25 tgtcccccat cctgccccag agg 23 <210> 26 <211> 23 <212> DNA <213> Artificial Sequence <400> 26 tcacctgggc cagggaggga ggg 23 <210> 27 <211> 23 <212> DNA <213> Artificial Sequence <400> 27 gggaccccgc caccgtgcgc cgg 23 <210> 28 <211> 23 <212> DNA <213> Artificial Sequence <400> 28 cgccgtctcc aaggtgaaag cgg 23 <210> 29 <211> 23 <212> DNA <213> Artificial Sequence <400> 29 acgcctctct gcaatgctat tgg 23 <210> 30 <211> 23 <212> DNA <213> Artificial Sequence <400> 30 gacgccctct ggaggaagca ggg 23 
<210> 31 <211> 23 <212> DNA <213> Artificial Sequence <400> 31 cagctcctgc accgggatac tgg 23 <210> 32 <211> 23 <212> DNA <213> Artificial Sequence <400> 32 ctgacctgca ttctctcccc tgg 23 <210> 33 <211> 23 <212> DNA <213> Artificial Sequence <400> 33 aggcccaggg gagagaatgc agg 23 <210> 34 <211> 23 <212> DNA <213> Artificial Sequence <400> 34 gaagccagta gagctcaaag tgg 23 <210> 35 <211> 23 <212> DNA <213> Artificial Sequence <400> 35 tgccgtctct ctcctgagtc cgg 23 <210> 36 <211> 23 <212> DNA <213> Artificial Sequence <400> 36 gctccttctc tgagtggtaa agg 23 <210> 37 <211> 23 <212> DNA <213> Artificial Sequence <400> 37 aagttcgcta atcccggaac tgg 23 <210> 38 <211> 23 <212> DNA <213> Artificial Sequence <400> 38 cgtcgccgat cttcacaggg tgg 23 <210> 39 <211> 23 <212> DNA <213> Artificial Sequence <400> 39 gtaacgagct gcatccccga ggg 23 <210> 40 <211> 23 <212> DNA <213> Artificial Sequence <400> 40 ggggctcaac atcggaagag ggg 23 <210> 41 <211> 23 <212> DNA <213> Artificial Sequence <400> 41 gagtccgagc agaagaagaa ggg 23 <210> 42 <211> 23 <212> DNA <213> Artificial Sequence <400> 42 gagttagagc agaagaagaa agg 23 <210> 43 <211> 23 <212> DNA <213> Artificial Sequence <400> 43 gagtctaagc agaagaaga gag <210> 44 <211> 23 <212> DNA <213> Artificial Sequence <400> 44 gagtcctagc gag gag <210> 45 <211> 23 <212> DNA <213> Artificial Sequence <400> 45 gagtccggga aggagaagaa agg <210> 46 <211> 23 <212> DNA <213> Artificial Sequence <400> 46 aagtccgagg rich agg <210> 47 <211> 23 <212> DNA <213> Artificial Sequence <400> 47 gaatccaagc 23 years old <210> 48 <211> 23 <212> DNA <213> Artificial Sequence <400> 48 acgtctgagc aggravated tgg <210> 49 <211> 23 <212> DNA <213> Artificial Sequence <400> 49 ggaatccctt ctgcagcacc tgg 23 <210> 50 <211> 23 <212> DNA <213> Artificial Sequence <400> 50 ggaaccccgt ctgcagcacc agg 23 <210> 51 <211> 23 <212> DNA <213> Artificial Sequence <400> 51 ggagtccctc ctacagcacc agg 23 <210> 52 <211> 23 <212> DNA <213> Artificial Sequence <400> 52 agaggcccct ctgcagcacc agg 23 <210> 53 
<211> 23 <212> DNA <213> Artificial Sequence <400> 53 accatccctc ctgcagcacc agg 23 <210> 54 <211> 23 <212> DNA <213> Artificial Sequence <400> 54 tgaatcccat ctccagcacc agg 23 <210> 55 <211> 23 <212> DNA <213> Artificial Sequence <400> 55 ggagtccctc ctacagcacc agg 23 <210> 56 <211> 23 <212> DNA <213> Artificial Sequence <400> 56 ggagtccctc ctgcagcacc tga 23 <210> 57 <211> 23 <212> DNA <213> Artificial Sequence <400> 57 gaacacaaag catagactgc ggg 23 <210> 58 <211> 23 <212> DNA <213> Artificial Sequence <400> 58 gaacacaatg catagattgc cgg 23 <210> 59 <211> 23 <212> DNA <213> Artificial Sequence <400> 59 aaacataaag catagactgc aaa 23 <210> 60 <211> 23 <212> DNA <213> Artificial Sequence <400> 60 tcagggtgag catagactgc cgg 23 <210> 61 <211> 23 <212> DNA <213> Artificial Sequence <400> 61 tgaagtgttg catagactgc agg 23 <210> 62 <211> 23 <212> DNA <213> Artificial Sequence <400> 62 ggagagagag catagactgc tgg 23 <210> 63 <211> 23 <212> DNA <213> Artificial Sequence <400> 63 ccaaacaaaa catagactgc tgg 23 <210> 64 <211> 23 <212> DNA <213> Artificial Sequence <400> 64 ccaaacaaaa catagactgc tgg 23 <210> 65 <211> 23 <212> DNA <213> Artificial Sequence <400> 65 cacccagact gagcacgtgc tgg 23 <210> 66 <211> 23 <212> DNA <213> Artificial Sequence <400> 66 agctcagact gagcaagtga ggg 23 <210> 67 <211> 23 <212> DNA <213> Artificial Sequence <400> 67 agaccagact gagcaagaga ggg 23 <210> 68 <211> 23 <212> DNA <213> Artificial Sequence <400> 68 gagccagaat gagcacgtga ggg 23 <210> 69 <211> 23 <212> DNA <213> Artificial Sequence <400> 69 caggaagctg gagcacgtga ggg 23 <210> 70 <211> 23 <212> DNA <213> Artificial Sequence <400> 70 aaggctgagg gagcacgtga agg 23 <210> 71 <211> 23 <212> DNA <213> Artificial Sequence <400> 71 gtcaggggaa gagcacgtga cgg 23 <210> 72 <211> 23 <212> DNA <213> Artificial Sequence <400> 72 gttgtgaact gagcacgtga ggg 23 <210> 73 <211> 23 <212> DNA <213> Artificial Sequence <400> 73 atatttgctg gagcacgtga agg 23 <210> 74 <211> 23 <212> DNA <213> Artificial Sequence <400> 74 ggcactgcgg 
ctggaggtgg ggg 23 <210> 75 <211> 23 <212> DNA <213> Artificial Sequence <400> 75 tgcactgcgg ccggaggagg tgg 23 <210> 76 <211> 23 <212> DNA <213> Artificial Sequence <400> 76 ggctctgcgg ctggaggggg tgg 23 <210> 77 <211> 23 <212> DNA <213> Artificial Sequence <400> 77 agcagtgcgg ctagaggtgg tgg 23 <210> 78 <211> 23 <212> DNA <213> Artificial Sequence <400> 78 ggcactgggg ctgggggagg ggg 23 <210> 79 <211> 23 <212> DNA <213> Artificial Sequence <400> 79 gtggctggag gtggaggtgg ggg 23 <210> 80 <211> 23 <212> DNA <213> Artificial Sequence <400> 80 gagggaaggg ctggaggtgg agg 23 <210> 81 <211> 23 <212> DNA <213> Artificial Sequence <400> 81 aagtctctct cttaatgaca cgg 23 <210> 82 <211> 23 <212> DNA <213> Artificial Sequence <400> 82 gcgttcaggg cctgggagcc tgg 23 <210> 83 <211> 23 <212> DNA <213> Artificial Sequence <400> 83 gtctttctta acgtattgag agg 23 <210> 84 <211> 23 <212> DNA <213> Artificial Sequence <400> 84 caaactcaac cacttcctgt tgg 23 <210> 85 <211> 23 <212> DNA <213> Artificial Sequence <400> 85 aggacttctc tttggctccc tgg 23 <210> 86 <211> 23 <212> DNA <213> Artificial Sequence <400> 86 tctagactca acccactaga tgg 23 <210> 87 <211> 23 <212> DNA <213> Artificial Sequence <400> 87 attcacagaa acctccctag agg 23 <210> 88 <211> 23 <212> DNA <213> Artificial Sequence <400> 88 ttgatcaatg tggaactcac tgg 23 <210> 89 <211> 23 <212> DNA <213> Artificial Sequence <400> 89 tgctgtctgg gccacatgca ggg 23 <210> 90 <211> 23 <212> DNA <213> Artificial Sequence <400> 90 cctgactgat ggagggagag ggg 23 <210> 91 <211> 23 <212> DNA <213> Artificial Sequence <400> 91 aatttcttga atccaggagg cgg 23 <210> 92 <211> 23 <212> DNA <213> Artificial Sequence <400> 92 atagtctaca tttatagtgc tgg 23 <210> 93 <211> 23 <212> DNA <213> Artificial Sequence <400> 93 atcacagaca tgggtatgcc agg 23 <210> 94 <211> 23 <212> DNA <213> Artificial Sequence <400> 94 taaacatgtg ttaaggatcc tgg 23 <210> 95 <211> 23 <212> DNA <213> Artificial Sequence <400> 95 ggctctgatc ttcatcccct cgg 23 <210> 96 <211> 23 <212> DNA <213> Artificial 
Sequence <400> 96 gggatcatgg gacagatggg agg 23 <210> 97 <211> 23 <212> DNA <213> Artificial Sequence <400> 97 tcttttctca atagaacaaa tgg 23 <210> 98 <211> 23 <212> DNA <213> Artificial Sequence <400> 98 cctgacaggg tggatgagcc agg 23 <210> 99 <211> 23 <212> DNA <213> Artificial Sequence <400> 99 ttagtcactt gcctactgtc agg 23 <210> 100 <211> 23 <212> DNA <213> Artificial Sequence <400> 100 tgcctactgg ttaaaggacc tgg 23 <210> 101 <211> 23 <212> DNA <213> Artificial Sequence <400> 101 gctgccccta cacccttccc agg 23 <210> 102 <211> 23 <212> DNA <213> Artificial Sequence <400> 102 tggcccctgt gccctgcaga tgg 23 <210> 103 <211> 23 <212> DNA <213> Artificial Sequence <400> 103 ttacccagtt ctctgggagc agg 23 <210> 104 <211> 23 <212> DNA <213> Artificial Sequence <400> 104 tcctccccaa ggggggcctc tgg 23 <210> 105 <211> 23 <212> DNA <213> Artificial Sequence <400> 105 cctcccagat tatgcatgag agg 23 <210> 106 <211> 23 <212> DNA <213> Artificial Sequence <400> 106 ggccccagtg gctgctctgg ggg 23 <210> 107 <211> 23 <212> DNA <213> Artificial Sequence <400> 107 ggagcccttc ttcttctgct cgg 23 <210> 108 <211> 23 <212> DNA <213> Artificial Sequence <400> 108 gctcccatca catcaaccgg tgg 23 <210> 109 <211> 23 <212> DNA <213> Artificial Sequence <400> 109 cactccctgg ccaggctttg ggg 23 <210> 110 <211> 23 <212> DNA <213> Artificial Sequence <400> 110 accaccccag gctctggggc tgg 23 <210> 111 <211> 23 <212> DNA <213> Artificial Sequence <400> 111 gtaccccacc ccaggctccc agg 23 <210> 112 <211> 23 <212> DNA <213> Artificial Sequence <400> 112 tgtcccactt gtcaggatga agg 23 <210> 113 <211> 23 <212> DNA <213> Artificial Sequence <400> 113 catccctgac ccagagaacc cgg 23 <210> 114 <211> 23 <212> DNA <213> Artificial Sequence <400> 114 catccctacc tagccagatg ggg 23 <210> 115 <211> 23 <212> DNA <213> Artificial Sequence <400> 115 gcatccctgg ctgggtagat agg 23 <210> 116 <211> 23 <212> DNA <213> Artificial Sequence <400> 116 cacaccctgt aaacagcaaa tgg 23 <210> 117 <211> 23 <212> DNA <213> Artificial Sequence <400> 117 gcatccctac 
ttcacagatg ggg 23 <210> 118 <211> 23 <212> DNA <213> Artificial Sequence <400> 118 gggccctgtg gggtggatgt ggg 23 <210> 119 <211> 23 <212> DNA <213> Artificial Sequence <400> 119 agctccctat tcccatgcct agg 23 <210> 120 <211> 23 <212> DNA <213> Artificial Sequence <400> 120 ttgccctgga agcagcacac agg 23 <210> 121 <211> 23 <212> DNA <213> Artificial Sequence <400> 121 tcacccaggc tggaatgcag tgg 23 <210> 122 <211> 23 <212> DNA <213> Artificial Sequence <400> 122 ctccccattg gcctgcttcg tgg 23 <210> 123 <211> 23 <212> DNA <213> Artificial Sequence <400> 123 ttgcccaccc tagtcattgg agg 23 <210> 124 <211> 23 <212> DNA <213> Artificial Sequence <400> 124 ctgcccaata tgtaaaataa cgg 23 <210> 125 <211> 23 <212> DNA <213> Artificial Sequence <400> 125 cttcccaagt gagaagccag tgg 23 <210> 126 <211> 23 <212> DNA <213> Artificial Sequence <400> 126 caacccaaat gcctttctga agg 23 <210> 127 <211> 22 <212> DNA <213> Artificial Sequence <400> 127 caaacaaggg cagatggcgc ag 22 <210> 128 <211> 22 <212> DNA <213> Artificial Sequence <400> 128 aagttctggc aagaggggac tg 22 <210> 129 <211> 22 <212> DNA <213> Artificial Sequence <400> 129 agaggactgc ctgggaaggg tg 22 <210> 130 <211> 22 <212> DNA <213> Artificial Sequence <400> 130 aaccatctgc agggcacagg gg 22 <210> 131 <211> 22 <212> DNA <213> Artificial Sequence <400> 131 tctttctgag gcctgccctc ag 22 <210> 132 <211> 22 <212> DNA <213> Artificial Sequence <400> 132 ccaactgcag ctgtgcagac gg 22 <210> 133 <211> 22 <212> DNA <213> Artificial Sequence <400> 133 cagcacattt aaaagcaagt gg 22 <210> 134 <211> 22 <212> DNA <213> Artificial Sequence <400> 134 aatgttctgg ggacacttgc ag 22 <210> 135 <211> 22 <212> DNA <213> Artificial Sequence <400> 135 aaccatcttc agcgttggca gg 22 <210> 136 <211> 22 <212> DNA <213> Artificial Sequence <400> 136 ggttctttac ttggctttgc ag 22 <210> 137 <211> 22 <212> DNA <213> Artificial Sequence <400> 137 cagacagggt ttcaccatgt tg 22 <210> 138 <211> 22 <212> DNA <213> Artificial Sequence <400> 138 aagttctggg attacagtca tg 22 <210> 139 <211> 22 
<212> DNA <213> Artificial Sequence <400> 139 gcatttctga tcagagagcc tg 22 <210> 140 <211> 22 <212> DNA <213> Artificial Sequence <400> 140 gacgtcaggg aagcccgtac tg 22 <210> 141 <211> 22 <212> DNA <213> Artificial Sequence <400> 141 tgtttctgtg tggagtgcct gg 22 <210> 142 <211> 22 <212> DNA <213> Artificial Sequence <400> 142 ctgttacatg tgtctcccgc ag 22 <210> 143 <211> 22 <212> DNA <213> Artificial Sequence <400> 143 gctgtcatgg cgtcctgcgg cg 22 <210> 144 <211> 22 <212> DNA <213> Artificial Sequence <400> 144 gggttcattc actccctcct tg 22 <210> 145 <211> 22 <212> DNA <213> Artificial Sequence <400> 145 gattttcagg ggcctgatga ag 22 <210> 146 <211> 22 <212> DNA <213> Artificial Sequence <400> 146 gccttctgag ttagcagagc tg 22 <210> 147 <211> 20 <212> DNA <213> Artificial Sequence <400> 147 tcagtctatg tcatccccac 20 <210> 148 <211> 20 <212> DNA <213> Artificial Sequence <400> 148 acatctacga cctctttgta 20 <210> 149 <211> 20 <212> DNA <213> Artificial Sequence <400> 149 gtgtcaaggg acacactgct 20

Claims

1. A method for improving the efficiency and accuracy of targeted editing in a C-to-G base editor, characterized in that the method includes: modifying a C-to-G base editor to include the following elements in an operatively, sequentially linked manner: nuclear localization signal - uracil N-glycosylation enzyme - cytosine deaminase - Cas enzyme - nuclear localization signal; the uracil N-glycosylation enzyme is a uracil N-glycosylation enzyme with an amino acid sequence from nematodes as shown in SEQ ID NO: 1, or a uracil N-glycosylation enzyme with an amino acid sequence from Escherichia coli as shown in SEQ ID NO: 2; the cytosine deaminase is rAPOBEC1, with a mutation at position 90 to Tyr and a mutation at position 126 to Glu; the Cas enzyme includes an enzyme selected from the group consisting of: Cas9n, Cas9n-NG; the uracil N-glycosylation enzyme and the cytosine deaminase are connected by a linker with the sequence shown in SEQ ID NO: 3; the cytosine deaminase and the Cas enzyme are connected by a linker with the sequence shown in SEQ ID NO: 4; the amino acid sequence of the nuclear localization signal is shown in SEQ ID NO: 6.

2. The method as described in claim 1, characterized in that, The amino terminus of the nuclear localization sequence also includes a tag sequence.

3. The method as described in claim 2, characterized in that the tag sequence is a 3xFlag tag, and its amino acid sequence is shown in SEQ ID NO: 5.

4. The method as described in claim 1, characterized in that, The improvements to the targeted editing efficiency and accuracy of the C-to-G base editor include: increasing the product purity of the edited product, narrowing the editing window, and reducing off-target effects.

5. The method as described in claim 4, characterized in that, The improvement of the C-to-G base editor's targeted editing efficiency and editing accuracy includes: increasing the probability of the C-to-G base editor targeting the tri-base motif "WCW"; where W represents A or T, and N represents A, T, C, or G.

6. A C-to-G base editor, characterized in that it comprises the following elements in an operatively, sequentially connected manner: nuclear localization signal - uracil N-glycosylation enzyme - cytosine deaminase - Cas enzyme - nuclear localization signal; wherein the uracil N-glycosylation enzyme is a uracil N-glycosylation enzyme with an amino acid sequence from nematodes as shown in SEQ ID NO: 1, or a uracil N-glycosylation enzyme with an amino acid sequence from Escherichia coli as shown in SEQ ID NO: 2; the cytosine deaminase is rAPOBEC1, with a mutation at position 90 to Tyr and a mutation at position 126 to Glu; the Cas enzyme includes an enzyme selected from the group consisting of: Cas9n, Cas9n-NG; the uracil N-glycosylation enzyme and the cytosine deaminase are connected by a linker with the sequence shown in SEQ ID NO: 3; the cytosine deaminase and the Cas enzyme are connected by a linker with the sequence shown in SEQ ID NO: 4; the amino acid sequence of the nuclear localization signal is shown in SEQ ID NO: 6.

7. The C-to-G base editor as described in claim 6, characterized in that, The amino terminus of the nuclear localization sequence also includes a tag sequence.

8. The C-to-G base editor as described in claim 7, characterized in that the tag is a 3xFlag tag, and its amino acid sequence is shown in SEQ ID NO: 5.

9. An isolated polynucleotide, said polynucleotide encoding a C-to-G base editor as described in any one of claims 6-8.

10. A recombinant expression vector containing the polynucleotide of claim 9.

11. A genetically engineered host cell containing the recombinant expression vector of claim 10, or having the isolated polynucleotide of claim 9 integrated into its genome.

12. The use of the C-to-G base editor according to any one of claims 6-8 for C-to-G gene editing, improving the targeted editing efficiency and editing accuracy of the C-to-G base editor; the gene editing method is a cell-level method, including single-cell, two-cell, or multi-cell methods; the target of the gene editing method will not develop into a living organism.

13. Use of the C-to-G base editor according to any one of claims 6-8, for preparing reagents or kits for performing C-to-G gene editing and improving the targeted editing efficiency and editing accuracy of the C-to-G base editor.

14. A method for performing C-to-G gene editing, comprising gene editing mediated by a C-to-G base editor as described in any one of claims 6-8; wherein the gene editing method is a cell-level method, including single-cell, two-cell, or multi-cell methods; and wherein the target of the gene editing method will not develop into a living organism.

15. The method for C-to-G gene editing as described in claim 14, characterized in that gene editing is performed by co-injecting the nucleic acid sequence encoding the C-to-G base editor and gRNA / sgRNA into the recipient.

16. A reagent or kit for performing C-to-G gene editing, comprising the C-to-G base editor according to any one of claims 6-8; or the isolated polynucleotide as described in claim 9; or a recombinant expression vector containing the polynucleotide as described in claim 9.