Optimized protein linker and method of use

Optimized linker sequences and domain architectures for Cas12a-based adenine base editors address the inefficiencies of previous systems, enabling effective nucleic acid editing in various organisms and commercial applications.

JP7883986B2 · Inactive · Publication Date: 2026-07-02 · PAIRWISE PLANTS SERVICES INC

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Patents
Current Assignee / Owner
PAIRWISE PLANTS SERVICES INC
Filing Date
2021-07-21
Publication Date
2026-07-02
Estimated Expiration
Not applicable · inactive patent

AI Technical Summary

Technical Problem

Existing CRISPR-based adenine base editors using Cas12a have not demonstrated successful editing, likely due to unoptimized linker sequences and domain architectures that differ from those used in Cas9-based systems.

Method used

Designing a novel linker sequence and optimizing the domain architecture of a Cas12a-based adenine base editor, incorporating specific amino acid sequences (SEQ ID NOs: 1 to 24) to enhance editing efficiency and versatility.

Benefits of technology

The optimized Cas12a-based adenine base editor enables targeted nucleic acid editing in both prokaryotes and eukaryotes, expanding the repertoire of site-directed base editing tools and making them suitable for commercial applications, particularly in editing the genomes of commercially relevant crops.

✦ Generated by Eureka AI based on patent content.


Abstract

The present invention relates to peptide linkers and fusion proteins containing linkers designed to optimize the activity of the proteins contained therein, as well as methods of using them. The present invention further relates to newly designed adenine base editors based on Cas12a.

Description

[Technical Field]

[0001] Statement regarding electronic filing of a sequence listing: A Sequence Listing in ASCII text format, titled 1499.50.WO_ST25.txt and created on July 14, 2021, has been submitted via EFS-Web in lieu of a paper copy, in accordance with 37 C.F.R. § 1.821. This Sequence Listing is incorporated by reference into this specification.

[0002] Statement of priority: This application claims the benefit, under 35 U.S.C. § 119(e), of U.S. Provisional Patent Application No. 63/054,449, filed July 21, 2020, the entire contents of which are incorporated by reference into this specification.

[0003] Field of the invention: The present invention relates to peptide linkers, fusion proteins containing linkers designed to optimize the activity of the proteins contained therein, and methods of using them. The present invention further relates to a newly designed Cas12a-based adenine base editor.

[Background Art]

[0004] Over the past six years, CRISPR-based gene editing tools (particularly those based on Cas9) have become increasingly widespread. Early tools relied on the ability of Cas9 to generate blunt-ended double-strand breaks in DNA, together with double-strand break repair mechanisms such as homologous recombination and non-homologous end joining; newer methods instead use modified versions of the nuclease primarily as a targeting tool for other covalently linked effector proteins. Notably, Cas9-based base editors were developed by fusing Cas9 to a deaminase domain (see, e.g., Gaudelli et al. Nature 551:464-471 (2017)). The first cytosine base editor was constructed by fusing the rat APOBEC1 (apolipoprotein B mRNA editing enzyme) domain, which deaminates cytosine to uracil in both RNA and DNA, to the N-terminus of Cas9 using a previously published, unstructured XTEN protein-based linker (Komor et al. Nature 533(7603):420-424 (2016)). A uracil DNA glycosylase inhibitor (UGI) domain was fused to the C-terminus of Cas9 to reduce base excision repair activity. Subsequent versions of the Cas9 cytosine base editor (CBE) doubled the length of both linkers by adding flexible glycine and serine residues and added a second UGI domain. Using the same architecture and linkers, the adenine base editor (ABE) was then developed by removing the UGI domain and replacing the APOBEC1 domain with an E. coli TadA (tRNA-specific adenosine deaminase) domain, which naturally targets tRNA but was evolved to act on DNA. The evolved TadA deaminates adenine to inosine, which base-pairs with cytosine during DNA replication, resulting in an A→G edit (or a T→C edit on the complementary strand). The latest version of the ABE has been optimized for use in human cells through codon optimization and improved nuclear localization signals.
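The net A→G outcome described in [0004] can be illustrated with a toy model. The Python sketch below is a minimal illustration under stated assumptions, not the method of this specification: it assumes a hypothetical editing window (positions 1 to 6 of the protospacer, chosen only for illustration) and treats every A inside that window as fully converted, whereas real editing is partial and position-dependent. The names `simulate_abe_edit` and `complement_strand` are hypothetical.

```python
# Complement table for DNA bases (used to show the paired strand).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def simulate_abe_edit(protospacer: str, window: range) -> str:
    """Return the protospacer with every A inside `window` converted to G.

    Positions are 1-based. Models only the net outcome: A is deaminated to
    inosine (I), I pairs with C during replication, and the site is then
    read as G. Real editing is partial, not all-or-none.
    """
    edited = []
    for pos, base in enumerate(protospacer.upper(), start=1):
        edited.append("G" if base == "A" and pos in window else base)
    return "".join(edited)


def complement_strand(seq: str) -> str:
    """Base-by-base complement (not reversed), aligned under `seq`."""
    return seq.upper().translate(COMPLEMENT)


if __name__ == "__main__":
    original = "TAACGATTAC"                            # toy sequence
    edited = simulate_abe_edit(original, range(1, 7))  # hypothetical window: positions 1-6
    # A at positions 2, 3 and 6 become G; the A at position 9 lies outside the window.
    print(original, "->", edited)
    # On the paired strand the same sites change from T to C.
    print(complement_strand(original), "->", complement_strand(edited))
```

Keeping the window as a parameter makes it easy to compare hypothetical windows without touching the conversion logic.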

[0005] Cas12a, also known as Cpf1, is a more recently discovered CRISPR endonuclease that is increasingly used as a genome editing tool. Cas12a differs from Cas9 in several respects, including its size, its nuclease activity, the direction in which it binds its guide RNA, and the protospacer adjacent motif (PAM) it recognizes. However, successful adenine base editing with Cas12a has not been demonstrated. Accordingly, new Cas12a-based adenine base editing tools are needed to overcome this shortcoming.

[Summary of the Invention]

[0006] State-of-the-art CRISPR-based adenine base editors primarily involve fusing an evolved TadA heterodimer to the N-terminus of Cas9 via a GS-XTEN-GS linker. While these Cas9-based ABEs edit DNA efficiently, similar fusions to Cas12a have not been found to produce successful edits. The linker sequences of Cas9 ABEs have not yet been optimized for the position of the deaminase domain. Furthermore, because of structural differences between Cas9 and Cas12a, linker sequences and domain architectures that work for Cas9-based ABEs may not be ideal for Cas12a-based ABEs. We have designed novel linker sequences and optimized the domain architecture of a Cas12a-based adenine base editor, which may enable the targeting of new sites and / or expand the repertoire of site-directed base editing tools and / or make them suitable for commercial use. We also provide methods for modifying nucleic acids using the fusion protein and / or a polynucleotide encoding it. These editors may be used in prokaryotes and / or eukaryotes, including for editing the genomes of commercially relevant crops.

[0007] One aspect of the present invention provides a polypeptide comprising any one of the amino acid sequences of SEQ ID NOs: 1 to 24 (L1 to L24).

[0008] A second aspect of the present invention provides a polypeptide comprising a Cas12a domain and one of the amino acid sequences of SEQ ID NOs: 1 to 24.

[0009] A third aspect of the present invention provides a fusion protein comprising a Cas12a domain, a polypeptide of interest, and one of the amino acid sequences of SEQ ID NOs: 1 to 24.

[0010] A fourth aspect provides a Type V Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) (CRISPR-Cas) system comprising (a) a fusion protein comprising a Cas12a domain, a linker comprising one of the amino acid sequences of SEQ ID NOs: 1 to 24, and a polypeptide of interest, wherein the Cas12a domain is linked to the polypeptide of interest via that amino acid sequence, or a nucleic acid encoding the fusion protein; and (b) a guide nucleic acid (CRISPR RNA, CRISPR DNA, crRNA, crDNA, gRNA) comprising a spacer sequence and a repeat sequence, wherein the guide nucleic acid can form a complex with the Cas12a domain of the fusion protein and the spacer sequence can hybridize to a target nucleic acid, thereby guiding the Cas12a domain and the polypeptide of interest to the target nucleic acid so that the system modifies or modulates the target nucleic acid.

[0011] A fifth aspect of the present invention provides a fusion protein comprising (a) a Cas12a domain, which, when present with a bound guide nucleic acid (e.g., gRNA), specifically binds to a target nucleic acid sequence; (b) a first adenine deaminase domain; and (c) a second adenine deaminase domain, wherein the first and second adenine deaminase domains, when present with the Cas12a domain and the gRNA, deaminate adenosine bases in the single-stranded portion of the target nucleic acid sequence, and the Cas12a domain is linked to the first adenine deaminase domain or the second adenine deaminase domain via any one of the amino acid sequences of SEQ ID NOs: 1 to 24.

[0012] A sixth aspect provides a fusion protein comprising (a) a first adenine deaminase domain, (b) a second adenine deaminase domain, and (c) a Cas12a (Cpf1) domain having a mutation in the nuclease active site, wherein the second adenine deaminase domain is different from the first adenine deaminase domain, the C-terminus of the first adenine deaminase domain is linked to the N-terminus of the second adenine deaminase domain, and the N-terminus of the Cas12a domain is linked to the C-terminus of the second adenine deaminase domain via any one of the amino acid sequences of SEQ ID NOs: 1 to 10 (L1 to L10).

[0013] A seventh aspect of the present invention provides a fusion protein comprising (a) a Cas12a (Cpf1) domain, (b) a first adenine deaminase domain, and (c) a second adenine deaminase domain, wherein the second adenine deaminase domain is different from the first adenine deaminase domain, the C-terminus of the first adenine deaminase domain is linked to the N-terminus of the second adenine deaminase domain, and the C-terminus of the Cas12a domain is linked to the N-terminus of the first adenine deaminase domain. When the first adenine deaminase domain is a wild-type adenine deaminase domain, the Cas12a domain is linked to the N-terminus of the first adenine deaminase domain via any one of the amino acid sequences of SEQ ID NOs: 11 to 24 (L11 to L24); when the first adenine deaminase domain is a mutated / evolved adenine deaminase domain, the Cas12a domain is linked to the N-terminus of the first adenine deaminase domain via any one of the amino acid sequences of SEQ ID NOs: 11 to 15 (L11 to L15).
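The linker rules of the sixth and seventh aspects ([0012] and [0013]) can be summarized as a small lookup. The Python sketch below is an illustrative encoding, not part of the specification. The names `allowed_linkers` and `domain_order` are hypothetical, as are the labels "N" and "C" (the side of Cas12a to which the deaminase pair is fused) and the strings used for the deaminase type. It tracks only linker identities (L1 to L24 = SEQ ID NOs: 1 to 24), not the sequences themselves.

```python
# Linker labels follow the text: L1-L24 correspond to SEQ ID NOs: 1-24.
N_SIDE_LINKERS = [f"L{i}" for i in range(1, 11)]         # sixth aspect: L1-L10
C_SIDE_WT_FIRST = [f"L{i}" for i in range(11, 25)]       # seventh aspect, wild-type first deaminase: L11-L24
C_SIDE_EVOLVED_FIRST = [f"L{i}" for i in range(11, 16)]  # seventh aspect, evolved first deaminase: L11-L15


def allowed_linkers(side: str, first_deaminase: str = "wild-type") -> list[str]:
    """Linkers paired with a deaminase pair fused to Cas12a's `side` ("N" or "C").

    `first_deaminase` ("wild-type" or "evolved") only matters when side == "C".
    """
    if side == "N":
        return list(N_SIDE_LINKERS)
    if side == "C":
        if first_deaminase == "wild-type":
            return list(C_SIDE_WT_FIRST)
        if first_deaminase == "evolved":
            return list(C_SIDE_EVOLVED_FIRST)
        raise ValueError(f"unknown deaminase type: {first_deaminase!r}")
    raise ValueError(f"side must be 'N' or 'C', got {side!r}")


def domain_order(side: str, linker: str, first_deaminase: str = "wild-type") -> list[str]:
    """N-to-C order of domains for the chosen architecture and linker."""
    if linker not in allowed_linkers(side, first_deaminase):
        raise ValueError(f"{linker} is not paired with this architecture")
    if side == "N":
        # Sixth aspect: deaminase 1 -> deaminase 2 -> linker -> Cas12a N-terminus
        return ["deaminase_1", "deaminase_2", linker, "Cas12a"]
    # Seventh aspect: Cas12a C-terminus -> linker -> deaminase 1 -> deaminase 2
    return ["Cas12a", linker, "deaminase_1", "deaminase_2"]


if __name__ == "__main__":
    print(domain_order("N", "L3"))
    print(domain_order("C", "L12", first_deaminase="evolved"))
    print(len(allowed_linkers("C")), len(allowed_linkers("C", "evolved")))
```

Rejecting a linker outside the permitted set (for example L20 with an evolved first deaminase) mirrors the narrower L11 to L15 range stated in [0013].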

[0014] An eighth aspect of the present invention provides a method for modifying a target nucleic acid, comprising contacting the target nucleic acid with (a)(i) the fusion protein of the present invention and (a)(ii) a guide nucleic acid, (b) a complex comprising the fusion protein of the present invention and the guide nucleic acid, (c) a composition comprising the fusion protein of the present invention and the guide nucleic acid, and / or (d) a system of the present invention.

[0015] A ninth aspect of the present invention provides a method for modifying a target nucleic acid, comprising contacting a cell or cell-free system containing a target nucleic acid with (a)(i) a polynucleotide encoding the polypeptide or fusion protein of the present invention or an expression cassette or vector containing the same, and (a)(ii) a guide nucleic acid or an expression cassette or vector containing the same, and / or (b) a complex containing the fusion protein of the present invention and a nucleic acid construct encoding the guide nucleic acid and / or an expression cassette or vector containing the same, under conditions in which the fusion protein is expressed and forms a complex with the guide nucleic acid, wherein the complex hybridizes with the target nucleic acid and thereby modifies the target nucleic acid.

[0016] A tenth aspect of the present invention provides a method for editing a target nucleic acid, comprising contacting the target nucleic acid with (a)(i) a fusion protein of the present invention and (a)(ii) a guide nucleic acid, (b) a complex comprising the fusion protein of the present invention and the guide nucleic acid, (c)(i) a composition comprising the fusion protein of the present invention and (c)(ii) a guide nucleic acid, and / or (d)(i) a system of the present invention, wherein the adenine deaminase domain converts adenosine (A) in the target nucleic acid to guanine (G), thereby editing the target nucleic acid to produce a (point) mutation.

[0017] An eleventh aspect of the present invention provides a method for editing a target nucleic acid, comprising contacting a cell or cell-free system containing a target nucleic acid with (a)(i) a polynucleotide encoding the fusion protein of the present invention or an expression cassette or vector containing the same, and (a)(ii) a guide nucleic acid or an expression cassette or vector containing the same, and / or (b) a complex containing the fusion protein of the present invention and a nucleic acid construct encoding the guide nucleic acid and / or an expression cassette or vector containing the same, under conditions in which the fusion protein is expressed and forms a complex with the guide nucleic acid, wherein the complex hybridizes with the target nucleic acid and the adenine deaminase domain converts adenosine (A) in the target nucleic acid to guanine (G), thereby editing the target nucleic acid and causing a (point) mutation.

[0018] The present invention further provides constructs, complexes, compositions, expression cassettes, vectors and cells comprising the polypeptide and / or fusion protein of the present invention, and / or polynucleotides and nucleic acid constructs encoding the fusion protein and complex of the present invention.

[0019] These and other aspects of the present invention are described in more detail in the following description of the present invention.

Sequence Listing Free-Text

[0020] SEQ ID NOs: 1 to 24 are amino acid sequences of the present invention useful for linking polypeptides. SEQ ID NOs: 25 to 29 are amino acid sequences of exemplary peptide linkers useful for linking polypeptides. SEQ ID NOs: 30 to 46 are exemplary Cas12a amino acid sequences useful for the present invention. SEQ ID NOs: 47 to 48 and 79 to 82 are exemplary TadA amino acid sequences useful for the present invention. SEQ ID NOs: 49 to 77 and 90 to 96 are exemplary fusion proteins. SEQ ID NOs: 83 to 89 are examples of spacer sequences.
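When cross-referencing the Sequence Listing, the ranges in [0020] can be held in a small table. This is a hypothetical helper, not part of the filing. SEQ ID NO: 78 is not mentioned in the free text above, so the lookup reports it as undescribed rather than guessing its content.

```python
# (first, last, description) with inclusive bounds, transcribed from [0020].
SEQ_ID_CATEGORIES = [
    (1, 24, "linker of the present invention (L1-L24)"),
    (25, 29, "exemplary peptide linker"),
    (30, 46, "exemplary Cas12a"),
    (47, 48, "exemplary TadA"),
    (49, 77, "exemplary fusion protein"),
    (79, 82, "exemplary TadA"),
    (83, 89, "exemplary spacer"),
    (90, 96, "exemplary fusion protein"),
]


def describe_seq_id(n: int) -> str:
    """Return the free-text category for SEQ ID NO: n, if the text gives one."""
    for first, last, label in SEQ_ID_CATEGORIES:
        if first <= n <= last:
            return label
    return "not described in the free text"


if __name__ == "__main__":
    for n in (12, 31, 78, 80, 85):
        print(n, describe_seq_id(n))
```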

Brief Description of the Drawings

[0021] [Figure 1] Figures 1A-1C show exemplary domain arrangements of the Cas12a-based adenine base editors of the present invention selected for screening. Ten linker designs were selected with a TadA heterodimer fused to the N-terminus of Cpf1 (Figure 1A), and fourteen were selected with a TadA heterodimer fused to the C-terminus of Cpf1 (Figure 1B). Further, five of the fourteen C-terminal linkers (Cterm_1, Cterm_4, Cterm_5, C9R, and Cterm_10) were selected with the order of the TadA and TadA* domains reversed (Figure 1C). "GS-" denotes a GS linker and includes, for example, GS-XTEN-GS.
[Figure 2] Figure 2 shows the average LbCas12a nuclease activity observed with each of the three exemplary spacers in the same experiment.
[Figure 3] Figure 3 is a graph showing the editing frequency of the fusion protein of the present invention with DNMT1 spacer 1.
[Figure 4] Figure 4 is a graph showing the editing frequency of the fusion protein of the present invention with DNMT1 spacer 2.
[Figure 5] Figure 5 is a graph showing the editing frequency of the fusion protein of the present invention with DNMT1 spacer 3.
[Figure 6] Figure 6 shows the average LbCas12a nuclease activity observed with each of four exemplary spacers: RNF2 spacer 1, RNF2 spacer 2, RNF2 spacer 3, and RNF2 spacer 4.
[Figure 7] Figure 7 is a graph showing the average adenine-to-guanine editing frequency observed for the fusion protein of the present invention with RNF2 spacer 1.
[Figure 8] Figure 8 is a graph showing the average adenine-to-guanine editing frequency observed for the fusion protein of the present invention with RNF2 spacer 2.
[Figure 9] Figure 9 is a graph showing the average adenine-to-guanine editing frequency observed for the fusion protein of the present invention with RNF2 spacer 3.
[Figure 10] Figure 10 is a graph showing the average adenine-to-guanine editing frequency observed for the fusion protein of the present invention with RNF2 spacer 4.
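The figures report adenine-to-guanine editing frequencies per spacer. One common way to compute a per-position A-to-G frequency from sequencing reads is sketched below. This is an assumption for illustration, since this section does not describe the analysis pipeline actually used. The sketch assumes reads are already aligned to the reference amplicon without insertions or deletions; the function names are hypothetical, and the toy reads are invented, not data from the figures.

```python
def a_to_g_frequency(reference: str, reads: list[str]) -> dict[int, float]:
    """Percent of reads showing G at each A of `reference` (1-based positions).

    Assumes every read is already aligned to `reference` with no indels.
    """
    reference = reference.upper()
    if not reads:
        raise ValueError("no reads supplied")
    if any(len(r) != len(reference) for r in reads):
        raise ValueError("reads must be pre-aligned and the same length as the reference")
    freqs: dict[int, float] = {}
    for i, ref_base in enumerate(reference):
        if ref_base != "A":
            continue  # only adenines can show A-to-G editing
        edited = sum(1 for r in reads if r[i].upper() == "G")
        freqs[i + 1] = 100.0 * edited / len(reads)
    return freqs


def mean_over_replicates(replicates: list[dict[int, float]]) -> dict[int, float]:
    """Average per-position frequencies across replicates sharing the same positions."""
    positions = replicates[0].keys()
    return {p: sum(rep[p] for rep in replicates) / len(replicates) for p in positions}


if __name__ == "__main__":
    toy_reads = ["TAAC", "TGAC", "TGGC", "TAAC"]  # invented reads, not data from the figures
    print(a_to_g_frequency("TAAC", toy_reads))
```

With reference `TAAC` and the four toy reads, position 2 is edited in two of four reads (50%) and position 3 in one of four (25%).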
[Modes for carrying out the invention]

[0022] The present invention will now be described with reference to the accompanying drawings and examples, which illustrate embodiments of the invention. This description is not intended to be a detailed catalog of all the different ways in which the invention may be implemented, or of all the features that may be added to the invention. For example, features illustrated with respect to one embodiment may be incorporated into other embodiments, and features illustrated with respect to a particular embodiment may be deleted from that embodiment. Thus, it is contemplated that, in some embodiments of the invention, any feature or combination of features set forth herein may be excluded or omitted. In addition, numerous variations and additions to the various embodiments suggested herein, which do not depart from the invention, will be apparent to those skilled in the art in light of this disclosure. Hence, the following description is intended to illustrate some particular embodiments of the invention, and not to exhaustively specify all permutations, combinations, and variations thereof.

[0023] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs. The terminology used in the description of the present invention herein is for the purpose of describing particular embodiments only and is not intended to limit the invention.

[0024] All publications, patent applications, patents, and other references cited herein are incorporated by reference in their entireties for the teachings relevant to the sentence and / or paragraph in which the reference is presented.

[0025] Unless the context indicates otherwise, it is specifically intended that the various features of the invention described herein can be used in any combination. Moreover, the present invention also contemplates that, in some embodiments, any feature or combination of features set forth herein can be excluded or omitted. To illustrate, if the specification states that a composition comprises components A, B, and C, it is specifically intended that any of A, B, or C, or a combination thereof, can be omitted and disclaimed singly or in any combination.

[0026] The singular forms "a," "an," and "the" used in the description of this invention and the appended claims are intended to similarly include the plural form unless the context clearly indicates otherwise.

[0027] Furthermore, as used herein, “and / or” refers to and encompasses any and all possible combinations of one or more of the associated listed items, as well as the lack of combinations when interpreted in the alternative (“or”).
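Read combinatorially, the definition of “and / or” in [0027] covers every non-empty subset of the listed items. The single-item subsets correspond to the alternative (“or”) reading, and the larger subsets correspond to the conjunctive readings. A minimal Python illustration using the standard `itertools.combinations` function (the helper name is hypothetical):

```python
from itertools import combinations


def and_or_readings(items: list[str]) -> list[tuple[str, ...]]:
    """All non-empty combinations covered by "A and / or B and / or ...".

    Size-1 tuples are the alternative ("or") readings; larger tuples are the
    conjunctive readings. There are 2**n - 1 readings for n items.
    """
    readings: list[tuple[str, ...]] = []
    for k in range(1, len(items) + 1):
        readings.extend(combinations(items, k))
    return readings


if __name__ == "__main__":
    for reading in and_or_readings(["A", "B", "C"]):
        print(" + ".join(reading))
```

For three items this yields seven readings: A, B, C, A + B, A + C, B + C, and A + B + C.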

[0028] When referring to measurable values, such as quantity or concentration, the term “about” as used herein is intended to include variations of ±10%, ±5%, ±1%, ±0.5%, or ±0.1% of the specified value, as well as the specified value itself. For example, if X is a measurable value, “about X” is intended to include X, as well as variations of X of ±10%, ±5%, ±1%, ±0.5%, or ±0.1%. The ranges provided herein for measurable values may include any other ranges and / or individual values within them.

[0029] In this specification, phrases such as "between X and Y" and "between about X and Y" should be interpreted as including X and Y. In this specification, phrases such as "between about X and Y" mean "between about X and about Y," and phrases such as "from about X to Y" mean "from about X to about Y."
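The numeric conventions in [0028] and [0029] can be expressed as interval checks. The sketch below is illustrative only. The tolerance tuple mirrors the values listed in [0028], and the function names are hypothetical. “About X” is treated as the closed interval X ± (tolerance × |X|), and “between X and Y” is treated as inclusive of both endpoints.

```python
# Tolerances listed for "about" in [0028]: ±10%, ±5%, ±1%, ±0.5%, ±0.1%.
ABOUT_TOLERANCES = (0.10, 0.05, 0.01, 0.005, 0.001)


def about_interval(x: float, tolerance: float) -> tuple[float, float]:
    """Closed interval covered by "about x" at a given relative tolerance."""
    delta = abs(x) * tolerance
    return x - delta, x + delta


def is_about(value: float, x: float, tolerance: float) -> bool:
    low, high = about_interval(x, tolerance)
    return low <= value <= high


def is_between(value: float, x: float, y: float) -> bool:
    """'between X and Y' includes both endpoints."""
    return x <= value <= y


def is_about_x_to_y(value: float, x: float, y: float, tolerance: float) -> bool:
    """'from about X to Y' read as 'from about X to about Y'."""
    return about_interval(x, tolerance)[0] <= value <= about_interval(y, tolerance)[1]


if __name__ == "__main__":
    for tol in ABOUT_TOLERANCES:
        print(f"about 100 at ±{tol:.1%}:", about_interval(100.0, tol))
```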

[0030] As used herein, the terms “comprise,” “comprises,” and “comprising” specify the presence of an expressly defined feature, integer, process, operation, element, and / or component, but do not exclude the presence or addition of one or more other features, integers, processes, operations, elements, components, and / or groups thereof.

[0031] As used herein, the transitional phrase “consisting essentially of” means that the scope of a claim is to be interpreted to encompass the specified materials or steps recited in the claim and those that do not materially affect the basic and novel characteristics of the claimed invention. Thus, the term “consisting essentially of,” when used in a claim of this invention, is not intended to be interpreted as equivalent to “comprising.”

[0032] As used herein, the terms “increase,” “increasing,” “increased,” “augment,” “augmenting,” and “augmented” (and grammatical variations thereof) describe an increase of at least about 25%, 50%, 75%, 100%, 150%, 200%, 300%, 400%, 500%, or more compared to a control.

[0033] As used herein, the terms “reduce,” “reduced,” “reducing,” “reduction,” “decrease,” and “lower” (and grammatical variations thereof) describe, for example, a reduction of at least about 5%, 10%, 15%, 20%, 25%, 35%, 50%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% compared to a control. In particular embodiments, the reduction can result in no or essentially no detectable activity or amount (i.e., an insignificant amount, e.g., less than about 10% or even 5%).
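The thresholds in [0032] and [0033] are all expressed relative to a control. A minimal sketch, assuming the change is computed as a percentage of a positive control value (a common convention; this section does not give a formula), reports the largest listed threshold that a measurement meets. The function names are hypothetical.

```python
from typing import Optional

# Thresholds listed in [0032] (increase) and [0033] (reduction), in percent.
INCREASE_THRESHOLDS = (25, 50, 75, 100, 150, 200, 300, 400, 500)
REDUCTION_THRESHOLDS = (5, 10, 15, 20, 25, 35, 50, 75, 80, 85, 90, 95, 97, 98, 99, 100)


def percent_change(measured: float, control: float) -> float:
    """Signed change relative to a (positive) control, as a percentage of the control."""
    if control == 0:
        raise ValueError("control must be non-zero")
    return 100.0 * (measured - control) / control


def largest_increase_met(measured: float, control: float) -> Optional[int]:
    change = percent_change(measured, control)
    met = [t for t in INCREASE_THRESHOLDS if change >= t]
    return max(met) if met else None


def largest_reduction_met(measured: float, control: float) -> Optional[int]:
    reduction = -percent_change(measured, control)
    met = [t for t in REDUCTION_THRESHOLDS if reduction >= t]
    return max(met) if met else None


if __name__ == "__main__":
    print(largest_increase_met(150.0, 100.0))   # a 50% increase over the control
    print(largest_reduction_met(10.0, 100.0))   # a 90% reduction from the control
```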

[0034] A "heterologous" or "recombinant" nucleotide sequence is a nucleotide sequence that is not naturally associated with the host cell into which it is introduced, including non-naturally occurring multiple copies of a naturally occurring nucleotide sequence.

[0035] A "native" or "wild-type" nucleic acid, nucleotide sequence, polypeptide, or amino acid sequence refers to a naturally occurring or endogenous nucleic acid, nucleotide sequence, polypeptide, or amino acid sequence. Thus, for example, a "wild-type mRNA" is an mRNA that is naturally occurring in, or endogenous to, an organism. A "homologous" nucleic acid sequence is a nucleotide sequence that is naturally associated with the host cell into which it is introduced.

[0036] As used herein, the terms “nucleic acid,” “nucleic acid molecule,” “nucleotide sequence,” and “polynucleotide” refer to RNA or DNA that is linear or branched, single-stranded or double-stranded, or a hybrid thereof. The terms also encompass RNA / DNA hybrids. Furthermore, when dsRNA is produced synthetically, less common bases, such as inosine, 5-methylcytosine, 6-methyladenine, hypoxanthine, and others, can be used for antisense, dsRNA, and ribozyme pairing. For example, polynucleotides containing C-5 propyne analogues of uridine and cytidine have been shown to bind RNA with high affinity and to be potent antisense inhibitors of gene expression. Other modifications, such as modifications to the phosphodiester backbone or to the 2'-hydroxyl group of the ribose sugar of RNA, are also possible.

[0037] As used herein, the term "nucleotide sequence" refers to a heteropolymer or sequence of nucleotides from the 5' end to the 3' end of a nucleic acid molecule, including DNA molecules or RNA molecules such as cDNA, DNA fragments or portions, genomic DNA, synthetic (e.g., chemically synthesized) DNA, plasmid DNA, mRNA, and antisense RNA (any of which may be single-stranded or double-stranded). The terms "nucleotide sequence," "nucleic acid," "nucleic acid molecule," "oligonucleotide," and "polynucleotide" are used interchangeably herein to refer to a heteropolymer of nucleotides. The nucleic acid molecules and / or nucleotide sequences provided herein are presented from left to right in the 5' to 3' direction and are represented using the standard codes for representing nucleotide characters as set forth in the U.S. sequence rules, 37 CFR §§1.821-1.825, and the World Intellectual Property Organization (WIPO) Standard ST.25. As used herein, the "5' region" may mean the region of a polynucleotide closest to its 5' end. Thus, for example, an element in the 5' region of a polynucleotide can be located anywhere from the first nucleotide at the 5' end of the polynucleotide to a nucleotide in the middle of the polynucleotide. As used herein, the "3' region" may mean the region of a polynucleotide closest to its 3' end. Thus, for example, an element in the 3' region of a polynucleotide can be located anywhere from the first nucleotide at the 3' end of the polynucleotide to a nucleotide in the middle of the polynucleotide.

[0038] As used herein, the term “gene” refers to a nucleic acid molecule that can be used to produce mRNA, antisense RNA, miRNA, anti-microRNA antisense oligodeoxyribonucleotides (AMOs), and the like. Genes may or may not be used to produce a functional protein or gene product. Genes can include both coding and non-coding regions (e.g., introns, regulatory elements, promoters, enhancers, termination sequences, and / or 5' and 3' untranslated regions). A gene may be “isolated,” meaning a nucleic acid that is substantially or essentially free of the components normally found in association with the nucleic acid in its native state. Such components include other cellular material, culture medium from recombinant production, and / or various chemicals used in chemically synthesizing the nucleic acid.

[0039] The term "mutation" refers to point mutations (e.g., missense or nonsense mutations, or insertions or deletions of a single base pair that result in a frameshift), insertions, deletions, and / or truncations. When a mutation is a substitution of one residue in an amino acid sequence with another, or a deletion or insertion of one or more residues in a sequence, the mutation is typically described by identifying the original residue, followed by the position of that residue within the sequence, followed by the identity of the newly substituted residue.
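The substitution notation described in [0039] (original residue, then position, then new residue) can be parsed mechanically. This hypothetical helper handles single-residue substitutions only, not insertions or deletions. It uses 1-based positions and, by a common convention assumed here, accepts `*` for a stop codon as in nonsense mutations. The peptide in the usage example is a made-up toy sequence.

```python
import re

# Original residue, 1-based position, new residue ("*" = stop); e.g. "A4G".
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z*])$")


def parse_substitution(code: str) -> tuple[str, int, str]:
    """Split a code such as "A4G" into ("A", 4, "G")."""
    match = SUBSTITUTION.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a single-residue substitution: {code!r}")
    original, position, new = match.groups()
    return original, int(position), new


def apply_substitution(protein: str, code: str) -> str:
    """Apply a substitution after checking the stated original residue."""
    original, position, new = parse_substitution(code)
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside a {len(protein)}-residue sequence")
    if protein[position - 1] != original:
        raise ValueError(f"residue {position} is {protein[position - 1]}, not {original}")
    return protein[:position - 1] + new + protein[position:]


if __name__ == "__main__":
    toy = "MKTAYIA"                        # made-up peptide
    print(parse_substitution("A4G"))
    print(apply_substitution(toy, "A4G"))
```

Checking the stated original residue before substituting catches notation that refers to the wrong reference sequence or numbering.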

[0040] As used herein, the terms “complementary” and “complementarity” refer to the natural binding of polynucleotides by base pairing under permissive salt and temperature conditions. For example, the sequence “AGT” (5' to 3') binds to the complementary sequence “TCA” (3' to 5'). Complementarity between two single-stranded molecules may be “partial,” in which only some of the nucleotides bind, or it may be complete, when total complementarity exists between the single-stranded molecules. The degree of complementarity between nucleic acid strands has a significant effect on the efficiency and strength of hybridization between them.

[0041] As used herein, "complementary" may mean 100% complementarity with the comparator nucleotide sequence, or it may mean less than 100% complementarity (e.g., about 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%).
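The base-pairing example in [0040] and the idea of partial complementarity in [0041] can be checked with a short sketch. It handles DNA bases (A, C, G, T) only and assumes the two strands are laid out antiparallel with no gaps. The function names are hypothetical.

```python
# Watson-Crick pairs for DNA.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Complementary strand, written 3'->5' beneath a 5'->3' input."""
    return "".join(PAIR[b] for b in seq.upper())


def reverse_complement(seq: str) -> str:
    """Complementary strand written in the conventional 5'->3' direction."""
    return complement(seq)[::-1]


def percent_complementarity(top_5to3: str, bottom_3to5: str) -> float:
    """Percent of positions that base-pair when two equal-length strands are
    laid out antiparallel (top 5'->3', bottom 3'->5') with no gaps."""
    if len(top_5to3) != len(bottom_3to5) or not top_5to3:
        raise ValueError("strands must be non-empty and the same length")
    paired = sum(1 for a, b in zip(top_5to3.upper(), bottom_3to5.upper()) if PAIR[a] == b)
    return 100.0 * paired / len(top_5to3)


if __name__ == "__main__":
    print(complement("AGT"), reverse_complement("AGT"))
    print(percent_complementarity("AGT", "TCA"))    # complete complementarity
    print(percent_complementarity("AGTC", "TCAA"))  # partial: last position does not pair
```

Here “AGT” pairs with “TCA” read 3' to 5', which is “ACT” when written 5' to 3' by the convention in [0037].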

[0042] A “part” or “fragment” of a nucleotide sequence of the present invention will be understood to mean a nucleotide sequence that is reduced in length relative to a reference nucleic acid or nucleotide sequence (for example, reduced by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more nucleotides) and that comprises, consists essentially of, and / or consists of a nucleotide sequence of contiguous nucleotides identical or nearly identical (for example, 70%, 71%, ...) to the reference nucleic acid or nucleotide sequence.

[0043] As used herein with respect to polypeptides, the term “fragment” or “part” may refer to a polypeptide that is reduced in length relative to a reference polypeptide and that comprises, consists essentially of, and / or consists of an amino acid sequence of contiguous amino acids identical or nearly identical (e.g., 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical) to the corresponding part of the reference polypeptide. Such a polypeptide fragment may, where appropriate, be included in a larger polypeptide of which it is a constituent. In some embodiments, the polypeptide fragment comprises, consists essentially of, or consists of at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 200, 225, 250, 260, 270, 280, 290 or more contiguous amino acid residues of the reference polypeptide.

[0044] Different nucleic acids or proteins that exhibit homology are referred to herein as “homologues.” The term homologue includes homologous sequences from the same and other species, as well as orthologous sequences from the same and other species. “Homology” refers to the level of similarity between two or more nucleic acid and / or amino acid sequences, expressed as a percentage of positional identity (i.e., sequence similarity or identity). Homology also refers to the concept of similar functional properties among different nucleic acids or proteins. Thus, the compositions and methods of the present invention further include homologues of the nucleotide and polypeptide sequences of the present invention. As used herein, “ortholog” refers to homologous nucleotide and / or amino acid sequences in different species that arose from a common ancestral gene during speciation. A homologue of a nucleotide sequence of the present invention has substantial sequence identity to that nucleotide sequence (for example, at least about 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, or 100%).

[0045] As used herein, “sequence identity” refers to the extent to which two optimally aligned polynucleotide or polypeptide sequences are invariant throughout a window of alignment of their components, such as nucleotides or amino acids. "Identity" can be readily calculated by known methods, including, but not limited to, those described in Computational Molecular Biology (Lesk, A. M., ed.) Oxford University Press, New York (1988); Biocomputing: Informatics and Genome Projects (Smith, D. W., ed.) Academic Press, New York (1993); Computer Analysis of Sequence Data, Part I (Griffin, A. M., and Griffin, H. G., eds.) Humana Press, New Jersey (1994); Sequence Analysis in Molecular Biology (von Heijne, G., ed.) Academic Press (1987); and Sequence Analysis Primer (Gribskov, M. and Devereux, J., eds.) Stockton Press, New York (1991).

[0046] As used herein, the terms “sequence identity percentage” and “identity percentage” refer to the percentage of identical nucleotides in a linear polynucleotide sequence of a reference ("query") polynucleotide molecule (or its complementary strand) as compared to a test ("subject") polynucleotide molecule (or its complementary strand) when the two sequences are optimally aligned. In some embodiments, “identity percentage” may refer to the percentage of identical amino acids in an amino acid sequence as compared to a reference polypeptide.

[0047] As used herein, the phrase “substantially identical” or “substantial identity,” in the context of two nucleic acid molecules, nucleotide sequences, or protein sequences, refers to two or more sequences or subsequences whose nucleotide or amino acid residue identity, when compared and aligned for maximum correspondence using one of the following sequence comparison algorithms or by visual inspection, is at least about 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, or 100%. In some embodiments of the present invention, substantial identity exists over a contiguous nucleotide region of a nucleotide sequence of the invention that is about 10 to about 30 nucleotides, about 15 to about 25 nucleotides, about 30 to about 40 nucleotides, about 50 to about 60 nucleotides, about 70 to about 80 nucleotides, about 90 to about 100 nucleotides, or more in length, including any range therein, up to the full length of the sequence. In some embodiments, the nucleotide sequences may be substantially identical over at least about 20 nucleotides (e.g., about 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 nucleotides). In some embodiments, a substantially identical nucleotide or protein sequence performs substantially the same function as the nucleotide sequence (or encoded protein sequence) to which it is substantially identical.

[0048] For sequence comparison, typically one sequence acts as a reference sequence to which a test sequence is compared. When using a sequence comparison algorithm, the test and reference sequences are entered into a computer, subsequence coordinates are designated if necessary, and sequence algorithm program parameters are designated. The sequence comparison algorithm then calculates the sequence identity percentage for the test sequence relative to the reference sequence based on the designated program parameters.

[0049] Optimal alignment of sequences for aligning a comparison window is well known to those skilled in the art and may be performed by tools such as the local homology algorithm of Smith and Waterman, the homology alignment algorithm of Needleman and Wunsch, the similarity search method of Pearson and Lipman, and optionally by computerized implementations of these algorithms, such as GAP, BESTFIT, FASTA, and TFASTA, available as part of the GCG® Wisconsin Package® (Accelrys Inc., San Diego, CA). The “identity fraction” for aligned segments of a test sequence and a reference sequence is the number of identical components shared by the two aligned sequences divided by the total number of components in the reference sequence segment, i.e., the entire reference sequence or a defined smaller portion of the reference sequence. The sequence identity percentage is the identity fraction multiplied by 100. The comparison of one or more polynucleotide sequences may be to a full-length polynucleotide sequence or a portion thereof, or to a longer polynucleotide sequence. Furthermore, for the purposes of the present invention, "identity percentage" may also be determined using BLASTX version 2.0 for translated nucleotide sequences and BLASTN version 2.0 for polynucleotide sequences.
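The identity-fraction calculation described in paragraph [0049] can be illustrated with a short sketch. It assumes the two sequences have already been optimally aligned (for example, by one of the alignment tools named above) and that gaps are written as "-"; the function name percent_identity is hypothetical and is not part of GAP, BESTFIT, FASTA, or BLAST. The denominator is the ungapped length of the reference (query) segment, as the paragraph specifies.

```python
def percent_identity(query_aln: str, subject_aln: str) -> float:
    """Percent identity of an aligned pair, normalized to the reference (query) segment.

    query_aln / subject_aln: pre-aligned sequences of equal length, gaps as "-".
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have equal length")
    # Identical components: aligned columns where the reference residue is not a gap
    # and matches the test residue (case-insensitive).
    identical = sum(
        1 for q, s in zip(query_aln, subject_aln)
        if q != "-" and q.upper() == s.upper()
    )
    # Total number of components in the reference segment (gaps excluded).
    ref_len = sum(1 for q in query_aln if q != "-")
    if ref_len == 0:
        raise ValueError("reference segment is empty")
    # Identity fraction multiplied by 100.
    return 100 * identical / ref_len


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
print(percent_identity("ACGT-ACGT", "ACGTTACGA"))    # 87.5
```

In the second example, the gap column is not counted toward the reference length, so 7 identical positions over 8 reference nucleotides gives 87.5%.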

[0050] Furthermore, two nucleotide sequences may be considered substantially complementary if they hybridize with each other under stringent conditions. In some representative embodiments, two nucleotide sequences considered substantially complementary hybridize with each other under highly stringent conditions.

[0051] In the context of nucleic acid hybridization experiments such as Southern and Northern hybridization, "stringent hybridization conditions" and "stringent hybridization washing conditions" are sequence-dependent and differ under different environmental parameters. A comprehensive guide to nucleic acid hybridization can be found in Tijssen, Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, part I, chapter 2, “Overview of principles of hybridization and the strategy of nucleic acid probe assays,” Elsevier, New York (1993). Generally, highly stringent hybridization and washing conditions are selected to be approximately 5°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH.

[0052] The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Very stringent conditions are selected to be equal to the Tm for a particular probe. An example of stringent hybridization conditions for hybridization of complementary nucleotide sequences that have more than 100 complementary residues on a filter in a Southern or Northern blot is 50% formamide with 1 mg of heparin at 42°C, with the hybridization carried out overnight. An example of highly stringent washing conditions is 0.15 M NaCl at 72°C for about 15 minutes. An example of stringent washing conditions is a 0.2× SSC wash at 65°C for 15 minutes (see Sambrook, below, for a description of SSC buffer). Often, a high-stringency wash is preceded by a low-stringency wash to remove background probe signal. An example of a medium-stringency wash for a duplex of more than 100 nucleotides is 1× SSC at 45°C for 15 minutes. An example of a low-stringency wash for a duplex of more than 100 nucleotides is 4-6× SSC at 40°C for 15 minutes. For short probes (e.g., about 10 to 50 nucleotides), stringent conditions typically involve salt concentrations of less than about 1.0 M Na ion, typically about 0.01 to 1.0 M Na ion (or other salts) at pH 7.0 to 8.3, and a temperature of at least about 30°C. Stringent conditions can also be achieved by adding destabilizing agents such as formamide. Generally, a signal-to-noise ratio of 2× (or higher) relative to that observed for an unrelated probe in the particular hybridization assay indicates detection of specific hybridization. Nucleotide sequences that do not hybridize to each other under stringent conditions are still substantially identical if the proteins they encode are substantially identical. This can occur, for example, when a copy of a nucleotide sequence is created using the maximum codon degeneracy permitted by the genetic code.

[0053] Any nucleotide sequence, polynucleotide, and / or recombinant nucleic acid construct of the present invention may be codon-optimized for expression in any organism of interest. Codon optimization is well known in the art and involves modifying a nucleotide sequence to reflect codon usage bias, using species-specific codon frequency tables. Codon frequency tables are generated from a sequence analysis of the most highly expressed genes of the organism / species of interest. When the nucleotide sequence is to be expressed in the nucleus, the codon frequency table is generated from a sequence analysis of the most highly expressed nuclear genes of the species of interest. The modifications to the nucleotide sequence are determined by comparing the species-specific codon frequency table with the codons present in the native polynucleotide sequence. As is understood in the art, codon optimization of a nucleotide sequence yields a nucleotide sequence having less than 100% identity to the native nucleotide sequence (e.g., 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, or 99.9%, and any range or value therein) that still encodes a polypeptide having the same function as that encoded by the original native nucleotide sequence. Therefore, in some embodiments of the present invention, the polynucleotides, nucleic acid constructs, expression cassettes, and / or vectors of the present invention (containing / encoding the polypeptide, fusion protein, or complex of the present invention, e.g., Cas12a, the polypeptide of interest, adenine deaminase, or linker) may be codon-optimized for expression in a particular species of interest, e.g., a particular plant species, a particular bacterial species, a particular animal species, etc. In some embodiments, the codon-optimized polynucleotides, nucleic acid constructs, expression cassettes, and / or vectors of the present invention have approximately 70% to approximately 99.9% or more (e.g., 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.9%, or 100%) identity with the non-codon-optimized polynucleotides, nucleic acid constructs, expression cassettes, and / or vectors of the present invention.
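The comparison of a codon frequency table against the codons of a native sequence, described in paragraph [0053], can be sketched with the simplest possible strategy: replace every codon with the most frequent synonymous codon for the host. This is an illustrative assumption, not the method used in this specification; practical codon-optimization tools often use more elaborate schemes. The codon table below covers only a few amino acids and its frequencies are invented for illustration, not measured data for any species; all function names are hypothetical.

```python
# Hypothetical codon frequency table for an unnamed host species.
# Values are made-up relative frequencies per amino acid, NOT measured data.
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.40, "AAG": 0.60},
    "L": {"CTT": 0.30, "CTC": 0.15, "CTG": 0.10, "TTG": 0.25, "TTA": 0.12, "CTA": 0.08},
    "G": {"GGT": 0.30, "GGC": 0.15, "GGA": 0.40, "GGG": 0.15},
    "S": {"TCT": 0.30, "TCC": 0.12, "TCA": 0.20, "TCG": 0.08, "AGT": 0.15, "AGC": 0.15},
    "*": {"TAA": 0.35, "TAG": 0.15, "TGA": 0.50},
}

# Reverse lookup (codon -> amino acid) and the most frequent codon for each amino acid.
CODON_TO_AA = {codon: aa for aa, usage in CODON_USAGE.items() for codon in usage}
PREFERRED = {aa: max(usage, key=usage.get) for aa, usage in CODON_USAGE.items()}


def split_codons(cds: str) -> list[str]:
    """Split an in-frame coding sequence into codons."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    """Translate using only the codons present in CODON_USAGE."""
    return "".join(CODON_TO_AA[c] for c in split_codons(cds))


def codon_optimize(cds: str) -> str:
    """Replace each codon with the most frequent synonymous codon in the table."""
    return "".join(PREFERRED[CODON_TO_AA[c]] for c in split_codons(cds))


def nucleotide_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)


native = "ATGAAGCTAGGATCGTGA"  # hypothetical native sequence encoding MKLGS*
optimized = codon_optimize(native)
print(optimized)                                          # ATGAAGCTTGGATCTTGA
print(translate(optimized) == translate(native))          # True
print(round(nucleotide_identity(native, optimized), 1))   # 88.9
```

The result illustrates the point made in the text: the optimized sequence has less than 100% identity to the native sequence (here 16 of 18 nucleotides) yet encodes the same polypeptide.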

[0054] In any of the embodiments described herein, the polynucleotide or nucleic acid constructs of the present invention may be operably associated with various promoters and other regulatory elements for expression in the organism of interest and / or within the cells of the organism of interest. Thus, in some embodiments, the expression cassette or vector comprising the polynucleotide or nucleic acid construct of the present invention may further include one or more promoters, enhancers, and / or terminators operably linked to one or more polynucleotide or nucleic acid constructs.

[0055] As used herein, “operably linked” or “operably associated” means that the elements referred to are functionally related to each other and are usually also physically related. Thus, as used herein, the terms “operably linked” or “operably associated” refer to nucleotide sequences on a single nucleic acid molecule that are functionally associated. Accordingly, a first nucleotide sequence that is operably linked to a second nucleotide sequence is positioned in a functional relationship with the second nucleotide sequence. For example, a promoter is operably associated with a nucleotide sequence if the promoter effects the transcription or expression of that nucleotide sequence. Those skilled in the art will understand that a control sequence (e.g., a promoter) need not be contiguous with the nucleotide sequence with which it is operably associated, as long as the control sequence functions to direct the expression of that sequence. Thus, for example, an intervening sequence that is transcribed but not translated may be present between a promoter and a nucleotide sequence, and the promoter can still be considered “operably linked” to the nucleotide sequence.

[0056] In this specification, the term "linked" with respect to polypeptides refers to the attachment of one polypeptide to another. Polypeptides may be linked to another polypeptide directly (e.g., via a peptide bond) at the N-terminus or C-terminus, or via a linker.

[0057] The term "linker," as recognized in the art, refers to a bond, chemical group, or molecule that connects two molecules or moieties, such as two domains of a fusion protein, for example, a Cas12a domain and a nucleic acid-editing domain (e.g., adenosine deaminase). A linker may consist of a single linking molecule or may comprise multiple linking molecules (e.g., amino acids). In some embodiments, the linker may be an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker may be an amino acid or a peptide linker. In some embodiments, the peptide linker is about 4 to 100 or more amino acids in length, for example, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, or more amino acids in length. In some embodiments, the peptide linker may be a GS linker. In some embodiments, the linker may comprise the amino acid sequence SGGS (SEQ ID NO: 25), (GGS)n, or S(GGS)n (one or more repeats of SEQ ID NO: 26), where n is 1 to 20 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, and any range or value therein). In some embodiments, the linker may comprise the amino acid sequence SGGSGGSGGS (SEQ ID NO: 27). In some embodiments, the linker may comprise the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 28), also known as the XTEN linker. In some embodiments, the linker may comprise the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 29), also known as the GS-XTEN-GS linker. In some embodiments, the linker comprises, consists essentially of, or consists of any one of the amino acid sequences of SEQ ID NOs: 1 to 24.
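The GS-type linkers listed in paragraph [0057] are built from short repeated units, which a few lines of code make explicit. The sketch below only composes the sequences quoted in that paragraph (SEQ ID NOs: 25 and 27 to 29); the sequences of SEQ ID NOs: 1 to 24 are not reproduced in this section and are therefore not included. The helper name ggs_linker is hypothetical.

```python
# Linker building blocks quoted in paragraph [0057].
SGGS = "SGGS"               # SEQ ID NO: 25
XTEN = "SGSETPGTSESATPES"   # SEQ ID NO: 28 (XTEN linker)


def ggs_linker(n: int, leading_serine: bool = False) -> str:
    """Return (GGS)n, or S(GGS)n when leading_serine is True, for n = 1 to 20."""
    if not 1 <= n <= 20:
        raise ValueError("n must be between 1 and 20")
    return ("S" if leading_serine else "") + "GGS" * n


# SEQ ID NO: 27 corresponds to S(GGS)n with n = 3.
seq_id_27 = ggs_linker(3, leading_serine=True)

# SEQ ID NO: 29 (GS-XTEN-GS) is the XTEN linker flanked on each side by two SGGS units.
gs_xten_gs = SGGS * 2 + XTEN + SGGS * 2

print(seq_id_27, len(seq_id_27))    # SGGSGGSGGS 10
print(gs_xten_gs, len(gs_xten_gs))  # SGGSSGGSSGSETPGTSESATPESSGGSSGGS 32
```

Writing the linkers this way shows that both fall inside the stated 4 to 100 amino acid length range and that SEQ ID NO: 29 is a composite of SEQ ID NOs: 25 and 28.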

[0058] A "promoter" is a nucleotide sequence that controls or regulates the transcription of a nucleotide sequence (e.g., a coding sequence) that is operably associated with the promoter. The coding sequence controlled or regulated by the promoter may encode a polypeptide and / or functional RNA. Typically, a "promoter" refers to a nucleotide sequence that contains an RNA polymerase II binding site and directs the initiation of transcription. Generally, promoters are found 5' or upstream of the start of the coding region of the corresponding coding sequence. The promoter region may also contain other elements that act as regulators of gene expression. These include TATA box consensus sequences and often CAAT box consensus sequences (Breathnach and Chambon, (1981) Annu. Rev. Biochem. 50:349). In plants, CAAT boxes may be replaced by AGGA boxes (Messing et al., (1983) in Genetic Engineering of Plants, T. Kosuge, C. Meredith and A. Hollaender (eds.), Plenum Press, pp. 211-227).

[0059] Examples of promoters include constitutive, inducible, temporally regulated, developmentally regulated, chemically regulated, tissue-preferred, and / or tissue-specific promoters for use in the preparation of recombinant nucleic acid molecules, such as "synthetic nucleic acid constructs" or "protein-RNA complexes." Various types of these promoters are known in the art.

[0060] The choice of promoter may vary depending on the temporal and spatial requirements for expression, and also on the host cell to be transformed. Promoters for many different organisms are well known in the art. Based on the extensive knowledge available in the art, an appropriate promoter can be selected for the particular host organism of interest. For example, much is known about the promoters upstream of highly constitutively expressed genes in model organisms, and such knowledge can be readily accessed and applied in other systems as appropriate.

[0061] In some embodiments, the polynucleotide and / or nucleic acid constructs of the present invention may be an “expression cassette” or may be contained within an expression cassette. As used herein, “expression cassette” means, for example, a recombinant nucleic acid molecule containing a nucleic acid construct of the present invention (e.g., encoding a complex of the present invention (e.g., a fusion protein and guide nucleic acid of the present invention)), wherein the nucleic acid construct is operably associated with at least one regulatory sequence (e.g., a promoter). Thus, some embodiments of the present invention provide, for example, an expression cassette designed to express a nucleic acid construct of the present invention.

[0062] The expression cassette containing the nucleotide sequence of interest may be chimeric, meaning that at least one of its components is heterologous with respect to at least one of its other components (e.g., a promoter from the host organism operably linked to a polynucleotide of interest to be expressed in the host organism, where the polynucleotide of interest is from an organism other than the host or is not normally found in association with that promoter). The expression cassette may also be one that is naturally occurring but has been obtained in a recombinant form useful for heterologous expression.

[0063] The expression cassette may, in some cases, include transcriptional and / or translational termination regions (i.e., stop regions) and / or enhancer regions that are functional in the selected host cell. A variety of transcriptional terminators and / or enhancers are available for use in expression cassettes; the terminators are responsible for the termination of transcription and correct mRNA polyadenylation. The termination regions and / or enhancer regions may be native to the operably linked nucleotide sequence of interest, native to the host cell, or derived from another source (e.g., foreign or heterologous to the promoter, the nucleotide sequence of interest, the host, or any combination thereof).

[0064] Furthermore, the expression cassette of the present invention may include a nucleotide sequence encoding a selectable marker, which can be used to select transformed host cells. As used herein, “selectable marker” means a nucleotide sequence that, when expressed, imparts a distinct phenotype to host cells expressing the marker, thereby allowing such transformed cells to be distinguished from cells that do not have the marker. Such a nucleotide sequence may encode either a selectable or a screenable marker, depending on whether the marker confers a trait that can be selected for by chemical means, such as by using a selective agent (e.g., an antibiotic), or whether the marker is simply a trait that can be identified through observation or testing, such as by screening (e.g., fluorescence). Many examples of suitable selectable markers are known in the art and can be used in the expression cassettes described herein.

[0065] In addition to expression cassettes, the nucleic acid molecules / constructs and polynucleotide sequences described herein can be used in conjunction with vectors. The term “vector” refers to a composition for transferring, delivering, or introducing a nucleic acid (or nucleic acids) into a cell. A vector comprises a nucleic acid molecule containing the nucleotide sequence(s) to be transferred, delivered, or introduced. Vectors for use in the transformation of host organisms are well known in the art. Non-limiting examples of general classes of vectors include double-stranded or single-stranded, linear or circular viral vectors, plasmid vectors, phage vectors, phagemid vectors, cosmid vectors, fosmid vectors, bacteriophages, artificial chromosomes, minicircles, or Agrobacterium binary vectors, which may or may not be self-transmissible or mobilizable. In some embodiments, viral vectors may include, but are not limited to, retrovirus, lentivirus, adenovirus, adeno-associated virus, or herpes simplex virus vectors. A vector as defined herein can transform a prokaryotic or eukaryotic host either by integration into the cellular genome or by existing extrachromosomally (e.g., an autonomously replicating plasmid with an origin of replication). Additionally included are shuttle vectors, meaning DNA vehicles capable, naturally or by design, of replication in two different host organisms, which may be selected from actinomycetes and related species, bacteria, and eukaryotes (e.g., higher plant, mammalian, yeast, or fungal cells). In some embodiments, the nucleic acid in the vector is under the control of, and operably linked to, a promoter or other regulatory element suitable for transcription in a host cell. The vector may be a bifunctional expression vector that functions in multiple hosts. In the case of genomic DNA, the vector may contain its own promoter or other regulatory element, and in the case of cDNA, it may be under the control of a promoter or other regulatory element suitable for expression in the host cell. Accordingly, the polynucleotides and nucleic acid constructs of the present invention, and / or expression cassettes containing them, may be included in the vectors described herein and in vectors known in the art.

[0066] As used herein, “to bring into contact,” “in contact,” “contacted,” and their grammatical variations refer to placing together the components of a desired reaction (e.g., transformation, transcriptional regulation, genome editing, nicking, and / or cleavage) under conditions suitable for carrying out the desired reaction. Thus, for example, a target nucleic acid can be brought into contact with the fusion protein and guide nucleic acid of the present invention, thereby modifying the target nucleic acid. In some embodiments, target DNA can be brought into contact with a polynucleotide or nucleic acid construct encoding the fusion protein of the present invention and the guide nucleic acid under conditions in which the fusion protein is expressed, forms a complex with the guide nucleic acid, and then the complex hybridizes to the target nucleic acid to modify the target nucleic acid.

[0067] As used herein, to “modify” or “modifying” a target nucleic acid includes editing (e.g., mutating), covalent modification, exchanging / substituting nucleic acids / nucleotide bases, deleting, cleaving, nicking, and / or transcriptional regulation of the target nucleic acid.

[0068] In the context of a polynucleotide of interest, “introduce,” “introducing,” and “introduced” (and their grammatical variations) mean presenting the nucleotide sequence of interest (e.g., polynucleotides, nucleic acid constructs, complexes (e.g., protein-RNA chimeric complexes), and / or guide nucleic acids) to a host organism or a cell of said organism (e.g., a host cell) in such a manner that the nucleotide sequence gains access to the interior of the cell. Thus, for example, a polynucleotide encoding the fusion protein of the present invention and a guide nucleic acid may be introduced into a cell of an organism, thereby transforming the cell.

[0069] As used herein, the term "transformation" refers to the introduction of a heterologous nucleic acid into a cell. Transformation of a cell may be stable or transient. Therefore, in some embodiments, a host cell or host organism is stably transformed with a nucleic acid molecule of the present invention. In other embodiments, a host cell or host organism is transiently transformed with a recombinant nucleic acid molecule of the present invention.

[0070] In the context of polynucleotides, "transient transformation" means that polynucleotides are introduced into a cell but are not integrated into the cell's genome.

[0071] In the context of a polynucleotide introduced into a cell, "stably introducing" or "stably introduced" means that the introduced polynucleotide is stably integrated into the genome of the cell, and thus the cell is stably transformed with the polynucleotide.

[0072] As used herein, “stable transformation” or “stably transformed” means that a nucleic acid molecule is introduced into a cell and integrated into the genome of the cell. As such, the integrated nucleic acid molecule can be inherited by its progeny, more particularly by the progeny of multiple successive generations. As used herein, “genome” includes the nuclear genome and the plastid genome, and therefore includes, for example, integration of a nucleic acid into the chloroplast genome or the mitochondrial genome. Furthermore, as used herein, stable transformation may also refer to a transgene that is maintained extrachromosomally, for example, as a minichromosome or a plasmid.

[0073] Transient transformations can be detected, for example, by enzyme-linked immunosorbent assay (ELISA) or Western blotting, which can detect the presence of peptides or polypeptides encoded by one or more transgenes introduced into an organism. Stable transformations of cells can be detected, for example, by Southern blotting hybridization assays of the cell's genomic DNA with nucleic acid sequences, which specifically hybridize with the nucleotide sequence of the transgene introduced into the organism (e.g., a plant). Stable transformations of cells can also be detected, for example, by Northern blotting hybridization assays of the cell's RNA with nucleic acid sequences, which specifically hybridize with the nucleotide sequence of the transgene introduced into the host organism. Furthermore, stable transformations of cells can be detected, for example, by polymerase chain reaction (PCR) or other amplification reactions well known in the art, which result in amplification of the transgene sequence using specific primer sequences that hybridize with the target sequence of the transgene, and which can be detected according to standard methods. Transformations can also be detected by direct sequencing and / or hybridization protocols well known in the art.

[0074] Accordingly, in some embodiments, the nucleotide sequences, nucleic acid constructs, and / or expression cassettes of the present invention may be transiently expressed and / or stably incorporated into the genome of a host organism. Thus, in some embodiments, the fusion protein of the present invention, or a polynucleotide encoding it, is introduced into a cell together with the guide nucleic acid such that no DNA is maintained in the cell.

[0075] The nucleic acid constructs / polynucleotides of the present invention can be introduced into cells by any method known to those skilled in the art. In some embodiments of the present invention, cell transformation includes nuclear transformation. In other embodiments, cell transformation includes plastid transformation (e.g., chloroplast transformation). In further embodiments, the nucleic acid constructs / polynucleotides of the present invention can be introduced into cells via conventional breeding techniques.

[0076] Procedures for transforming both eukaryotes and prokaryotes are well known and routine in the art and are described throughout the literature (see, for example, Jiang et al. 2013. Nat. Biotechnol. 31:233-239; Ran et al. Nature Protocols 8:2281-2308 (2013)).

[0077] Accordingly, nucleotide sequences can be introduced into a host organism or its cells by any of several methods known in the art. The methods of the present invention do not depend on a particular method for introducing one or more nucleotide sequences into an organism, only that the sequences gain access to the interior of at least one cell of the organism. When more than one nucleotide sequence is to be introduced, the sequences can be assembled as part of a single nucleic acid construct or as separate nucleic acid constructs, and can be located on the same or different nucleic acid constructs. Thus, the nucleotide sequences can be introduced into cells of interest in a single transformation event or in separate transformation events, or, where applicable, incorporated into plants, for example, as part of a breeding protocol.

[0078] The present invention relates to polypeptides (e.g., SEQ ID NOs. 1-24) that can be used, for example, to link two or more protein / protein domains. In some embodiments, the polypeptides of the present invention may be about 70% to 100% identical to any one of the amino acid sequences of SEQ ID NOs. 1-24 (e.g., 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, or 100% identical). In some embodiments, the present invention provides a polynucleotide encoding any one of the amino acid sequences of SEQ ID NOs: 1 to 24, and / or a polynucleotide having 70% to 100% identity (e.g., 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, or 100%) to a polynucleotide encoding any one of the amino acid sequences of SEQ ID NOs: 1 to 24. In some embodiments, the polynucleotide encoding any one of the amino acid sequences of SEQ ID NOs: 1 to 24 may be codon-optimized for expression in organisms.

[0079] The present invention also provides synthetic fusion proteins comprising these polypeptides. In some embodiments, the present invention provides a polypeptide comprising one of the amino acid sequences of SEQ ID NOs: 1 to 24 and a polypeptide of interest. In some embodiments, the polypeptide of interest may be linked at its C-terminus and / or N-terminus to one of the amino acid sequences of SEQ ID NOs: 1 to 24, optionally at the C-terminus or N-terminus of that amino acid sequence. In some embodiments, the polypeptide of interest may comprise two or more polypeptides of interest (e.g., 2, 3, 4, 5, 6, 7 or more), which may be the same or different, wherein at least two of the two or more polypeptides of interest may be linked to each other via one of the amino acid sequences of SEQ ID NOs: 1 to 24.
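The domain arrangements described in paragraph [0079], where polypeptides of interest are joined N-terminus to C-terminus through linker sequences, can be modeled as simple string concatenation that also records where each segment lands. This is a hypothetical sketch: the function assemble_fusion is not from the specification, the domain sequences are placeholder strings rather than real deaminase or Cas12a sequences, and SGGSGGSGGS (SEQ ID NO: 27) stands in for any of the linkers because SEQ ID NOs: 1 to 24 are not reproduced here.

```python
Segment = tuple[str, str]  # (label, amino acid sequence)


def assemble_fusion(domains: list[Segment], linkers: list[str]):
    """Join domains N- to C-terminus, placing linkers[i] between domains[i] and domains[i + 1].

    Returns the fused sequence and a list of (label, start, end) spans
    in 1-based, inclusive residue coordinates.
    """
    if len(linkers) != len(domains) - 1:
        raise ValueError("supply exactly one linker between each pair of adjacent domains")
    segments = [domains[0]]
    for i, (linker, domain) in enumerate(zip(linkers, domains[1:]), start=1):
        segments.append((f"linker {i}", linker))
        segments.append(domain)
    spans, pos = [], 1
    for label, seq in segments:
        spans.append((label, pos, pos + len(seq) - 1))
        pos += len(seq)
    return "".join(seq for _, seq in segments), spans


# Placeholder sequences only; real deaminase and Cas12a domains are far longer.
fusion, spans = assemble_fusion(
    [("deaminase", "DDDDDD"), ("Cas12a", "CCCCCC")],
    ["SGGSGGSGGS"],  # SEQ ID NO: 27, used here purely as an example linker
)
print(fusion)  # DDDDDDSGGSGGSGGSCCCCCC
print(spans)   # [('deaminase', 1, 6), ('linker 1', 7, 16), ('Cas12a', 17, 22)]
```

Reversing the order of the domains list gives the opposite architecture (polypeptide of interest C-terminal to Cas12a), and passing three or more domains models the multi-domain arrangements mentioned in the text.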

[0080] Polypeptides of interest useful in the present invention include, but are not limited to, polypeptides or protein domains having deaminase (deamination) activity (e.g., cytosine deaminase, adenine deaminase), nickase activity, recombinase activity, transposase activity, methylase activity, glycosylase (DNA glycosylase) activity, glycosylase inhibitory activity (e.g., uracil-DNA glycosylase inhibitor (UGI)), demethylase activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-strand RNA cleavage activity, double-strand RNA cleavage activity, restriction endonuclease activity (e.g., Fok1), nucleic acid binding activity, methyltransferase activity, DNA repair activity, DNA-damaging activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer formation activity, integrase activity, transposase activity, polymerase activity, ligase activity, helicase activity, and / or photolyase activity. In some embodiments, the polypeptide of interest is an adenine deaminase, a cytosine deaminase, a Fok1 nuclease, or a uracil-DNA glycosylase inhibitor. In some embodiments, a polynucleotide encoding the polypeptide of interest may be codon-optimized for expression in an organism.

[0081] In some embodiments, the polypeptide of interest is a CRISPR Cas12a polypeptide or a Cas12a domain, wherein the C-terminus and / or N-terminus of the Cas12a is linked to the C-terminus or N-terminus of any one of the amino acid sequences of SEQ ID NOs: 1 to 24.

[0082] In some embodiments, a fusion protein is provided comprising Cas12a, a polypeptide of interest, and one of the amino acid sequences of SEQ ID NOs: 1-24. In some embodiments, the amino acid sequences of SEQ ID NOs: 1-24 link Cas12a and one or more (e.g., 1, 2, 3, 4, 5, 6, 7 or more) polypeptides of interest (e.g., adenine deaminase domains, e.g., TadA / TadA*), enabling optimal positioning of the polypeptide(s) of interest relative to the Cas12a domain. The amino acid sequences of SEQ ID NOs: 1-24 may be used to link Cas12a with a polypeptide of interest so that the polypeptide of interest can access a single-stranded portion of the non-target strand, for example, for nucleic acid modification such as base editing.

[0083] In some embodiments, when used to link a polypeptide of interest to Cas12a, the amino acid sequences of SEQ ID NOs: 1-24 may provide different windows for nucleic acid modification or editing. For example, an amino acid sequence of SEQ ID NOs: 1-24 linking a polypeptide of interest to Cas12a may provide an editing or modification window of 1 to approximately 25 nucleotides from the corresponding PAM (protospacer adjacent motif) in the target nucleic acid (e.g., DNA) (e.g., an editing / modification window of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides from the PAM, and any range or value therein). In some embodiments, the editing or modification window can extend from 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 to about 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides from the PAM (e.g., 1 to 20, 1 to 15, 1 to 10, 3 to 15, 4 to 10, 5 to 25, 5 to 20, 5 to 15, 5 to 10, or 7 to 15 nucleotides from the PAM, etc.).
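To make the notion of an editing window "counted from the PAM" in paragraph [0083] concrete, the sketch below lists which adenines of a protospacer fall inside a given window. It assumes the protospacer is written 5' to 3' on the non-target strand starting with the nucleotide immediately 3' of the Cas12a PAM (position 1), and that an adenine base editor acts on A's of that strand. The 20-nt sequence and the window (8, 13) are arbitrary hypothetical values chosen within the 1 to 25 nucleotide range stated above; the actual window produced by a given linker is an experimental property not modeled here.

```python
def adenines_in_window(protospacer: str, window: tuple[int, int]) -> list[int]:
    """Positions of A within an editing window.

    protospacer: non-target-strand sequence written 5'->3', beginning with the
                 nucleotide immediately 3' of the PAM (position 1).
    window:      (first, last) positions counted from the PAM, inclusive.
    """
    first, last = window
    if not 1 <= first <= last:
        raise ValueError("window must satisfy 1 <= first <= last")
    return [
        pos
        for pos, base in enumerate(protospacer.upper(), start=1)
        if first <= pos <= last and base == "A"
    ]


protospacer = "GACTAAGCATTCAGTACGAT"  # hypothetical 20-nt target
print(adenines_in_window(protospacer, (8, 13)))  # [9, 13]
print(adenines_in_window(protospacer, (1, 20)))  # [2, 5, 6, 9, 13, 16, 19]
```

Comparing the two calls shows why the window matters: the same target contains seven adenines, but a linker that yields a narrower window would expose only two of them to the deaminase.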

[0084] Cas12a is a type V clustered regularly interspaced short palindromic repeats (CRISPR)-Cas nuclease. Cas12a differs from the better-known type II CRISPR Cas9 nuclease in several respects. For example, Cas9 recognizes a G-rich protospacer adjacent motif (PAM) (NGG) located on the 3' side of its guide RNA (gRNA, sgRNA) binding site (protospacer, target nucleic acid, target DNA), whereas Cas12a recognizes a T-rich PAM (5'-TTN, 5'-TTTN) located on the 5' side of the target nucleic acid. Furthermore, the orientations in which Cas9 and Cas12a bind their respective guide RNAs are largely reversed with respect to their N-termini and C-termini. In addition, the Cas12a enzyme uses a single guide RNA (gRNA, CRISPR array, crRNA) instead of the dual guide RNA (sgRNA (e.g., crRNA and tracrRNA)) found in the natural Cas9 system, and Cas12a processes its own gRNA. Cas12a nuclease activity also generates staggered DNA double-strand breaks rather than the blunt ends produced by Cas9 nuclease activity, and Cas12a relies on a single RuvC domain to cleave both DNA strands, whereas Cas9 uses both an HNH domain and a RuvC domain for cleavage.

[0085] The CRISPR Cas12a polypeptide or CRISPR Cas12a domain useful in the present invention may be any known or later-identified Cas12a nuclease (formerly known as Cpf1) (see, for example, U.S. Patent No. 9,790,490, which is incorporated herein by reference for its disclosure of Cpf1 (Cas12a) sequences). The terms “Cas12a,” “Cas12a polypeptide,” or “Cas12a domain” refer to an RNA-guided nuclease comprising a Cas12a polypeptide or a fragment thereof that comprises the guide nucleic acid binding domain of Cas12a and / or an active, inactive, or partially active DNA cleavage domain of Cas12a. In some embodiments, the Cas12a useful in the present invention may contain a mutation in its nuclease active site (e.g., the RuvC site of the Cas12a domain). A Cas12a domain or Cas12a polypeptide that has a mutation in its nuclease active site and therefore no longer has nuclease activity is commonly referred to as deadCas12a (e.g., dCas12a). In some embodiments, a Cas12a domain or Cas12a polypeptide having a mutation in its nuclease active site may have impaired activity.

[0086] In some embodiments, the Cas12a domain may comprise, but is not limited to, any one of the amino acid sequences of SEQ ID NOs: 30-46 (e.g., SEQ ID NO: 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, or 46), or may be encoded by a polynucleotide encoding any one of those sequences. In some embodiments, the fusion protein of the present invention may include the Cas12a domain of the Cas12a from Lachnospiraceae bacterium ND2006 (LbCas12a) (e.g., SEQ ID NO: 30).

[0087] In some embodiments, the polynucleotide encoding the Cas12a domain may be codon-optimized for expression in an organism. Thus, in some embodiments, the present invention provides a polynucleotide having at least about 70% identity (e.g., about 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity) to the polynucleotide encoding any one of the amino acid sequences of SEQ ID NOs.

[0088] In some embodiments, a type V clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas) (CRISPR-Cas) system is provided, comprising: (a) a fusion protein comprising a Cas12a domain, a linker comprising one of the amino acid sequences of SEQ ID NOs: 1 to 24, and a polypeptide of interest, wherein the Cas12a domain is linked to the polypeptide of interest via one of the amino acid sequences of SEQ ID NOs: 1 to 24, or a nucleic acid encoding the fusion protein; and (b) a guide nucleic acid (CRISPR RNA, CRISPR DNA, crRNA, crDNA) comprising a spacer sequence and a repeat sequence, wherein the guide nucleic acid can form a complex with the Cas12a domain of the fusion protein and the spacer sequence can hybridize to a target nucleic acid, thereby guiding the Cas12a domain and the polypeptide of interest to the target nucleic acid so that the system modifies (e.g., cleaves or edits) or modulates (e.g., modulates transcription of) the target nucleic acid.

[0089] In some embodiments, a fusion protein is provided comprising a Cas12a domain, a polypeptide of interest, and one of the amino acid sequences of SEQ ID NOs: 1-24, wherein the polypeptide of interest is an adenine deaminase polypeptide or domain.

[0090] In some embodiments, the present invention provides a fusion protein comprising (a) a Cas12a domain that specifically binds to a target nucleic acid sequence when bound to a guide nucleic acid (e.g., gRNA), (b) a first adenine deaminase domain, and (c) a second adenine deaminase domain, wherein the first and second adenine deaminase domains, together with the Cas12a domain and the gRNA, deaminate adenosine bases in the single-stranded portion of the target nucleic acid sequence, and the Cas12a domain is linked to the first or the second adenine deaminase domain via any one of the amino acid sequences of SEQ ID NOs: 1 to 24. In some embodiments, the N-terminus of the Cas12a domain may be linked to the C-terminus of the second adenine deaminase domain via any one of the amino acid sequences of SEQ ID NOs: 1 to 10, or the C-terminus of the Cas12a domain may be linked to the N-terminus of either the first or the second adenine deaminase domain via any one of the amino acid sequences of SEQ ID NOs: 11 to 24. In some embodiments, the first adenine deaminase domain is a wild-type adenine deaminase (e.g., TadA (tRNA-specific adenosine deaminase), e.g., SEQ ID NO: 47) and the second adenine deaminase domain is a mutated / evolved adenine deaminase domain (e.g., TadA* (an evolved tRNA-specific adenosine deaminase), e.g., SEQ ID NO: 48 or 78-82), wherein the C-terminus of the Cas12a domain is linked to the N-terminus of the second adenine deaminase domain via one of the amino acid sequences of SEQ ID NOs: 11-15, or to the N-terminus of the first adenine deaminase domain via one of the amino acid sequences of SEQ ID NOs: 11-24, or the N-terminus of the Cas12a domain is linked to the C-terminus of the second adenine deaminase domain via one of the amino acid sequences of SEQ ID NOs: 1-10. Exemplary fusion proteins of the present invention include, but are not limited to, the amino acid sequences of SEQ ID NOs: 49-77 and / or SEQ ID NOs: 90-96.

[0091] In some embodiments, a fusion protein is provided comprising (a) a first adenine deaminase domain, (b) a second adenine deaminase domain, and (c) a Cas12a (Cpf1) domain, wherein the Cas12a domain contains a mutation in its nuclease active site, the second adenine deaminase domain is different from the first adenine deaminase domain, the C-terminus of the first adenine deaminase domain is linked to the N-terminus of the second adenine deaminase domain, and the N-terminus of the Cas12a domain is linked to the C-terminus of the second adenine deaminase domain via any one of the amino acid sequences of SEQ ID NOs: 1 to 10 (L1 to L10). In some embodiments, the first adenine deaminase domain is a wild-type adenine deaminase (e.g., TadA) (e.g., SEQ ID NO: 47), and the second adenine deaminase domain is a mutated / evolved adenine deaminase domain (e.g., TadA*) (e.g., SEQ ID NO: 48 or 78-82). In some embodiments, a fusion protein is provided that includes any one of the amino acid sequences of SEQ ID NOs: 49-77 and / or SEQ ID NOs: 90-96.

[0092] In some embodiments, the fusion protein comprises (a) a Cas12a domain, (b) a first adenine deaminase domain, and (c) a second adenine deaminase domain, wherein the second adenine deaminase domain is different from the first adenine deaminase domain, the C-terminus of the first adenine deaminase domain is linked to the N-terminus of the second adenine deaminase domain, and the C-terminus of the Cas12a domain is linked to the N-terminus of the first adenine deaminase domain. In such a fusion protein, if the first adenine deaminase domain is a wild-type adenine deaminase domain, the Cas12a domain is linked to its N-terminus via one of the amino acid sequences of SEQ ID NOs: 11-24 (L11-L24), and if the first adenine deaminase domain is a mutated / evolved adenine deaminase domain, the Cas12a domain is linked to its N-terminus via one of the amino acid sequences of SEQ ID NOs: 11-15 (L11-L15). In some embodiments, the first adenine deaminase domain is a wild-type adenosine deaminase (e.g., a wild-type tRNA-specific adenosine deaminase domain) or a mutated / evolved adenosine deaminase domain (e.g., a mutated / evolved tRNA-specific adenosine deaminase domain) (e.g., SEQ ID NO: 47, 48, or 78-82). In some embodiments, the second adenine deaminase domain is either a wild-type adenosine deaminase (e.g., a wild-type tRNA-specific adenosine deaminase domain) or a mutated / evolved adenosine deaminase domain (e.g., a mutated / evolved tRNA-specific adenosine deaminase domain) (e.g., SEQ ID NO: 47, 48, or 78-82). In some embodiments, the first and second adenine deaminase domains form a dimer. In some embodiments, a fusion protein is provided that includes any one of the amino acid sequences of SEQ ID NOs: 49-77 and / or 90-96.
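The linker rules in [0090] to [0092] depend on two things: which end of Cas12a joins the deaminase, and whether the deaminase at that junction is wild-type (TadA) or evolved (TadA*). The sketch below encodes those rules as a lookup so a candidate design can be checked. The labels `deaminase_then_cas12a` / `cas12a_then_deaminase`, the class `FusionDesign`, and the placeholder `dimer-linker` are hypothetical illustrations, not part of the specification; linker numbers stand for SEQ ID NOs: 1-24 (L1-L24).

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical labels for the two domain orders described in [0090]-[0092].
Orientation = Literal["deaminase_then_cas12a", "cas12a_then_deaminase"]
DeaminaseKind = Literal["wild_type", "evolved"]


def allowed_linkers(orientation: Orientation, adjacent: DeaminaseKind) -> range:
    """Linker numbers (L1-L24 = SEQ ID NOs: 1-24) allowed at the Cas12a junction."""
    if orientation == "deaminase_then_cas12a":
        # Deaminase C-terminus -> Cas12a N-terminus: L1-L10.
        return range(1, 11)
    if adjacent == "wild_type":
        # Cas12a C-terminus -> wild-type TadA N-terminus: L11-L24.
        return range(11, 25)
    # Cas12a C-terminus -> evolved TadA* N-terminus: L11-L15.
    return range(11, 16)


@dataclass
class FusionDesign:
    orientation: Orientation
    adjacent: DeaminaseKind  # deaminase domain directly joined to Cas12a
    linker: int              # n in L(n), i.e. SEQ ID NO: n

    def is_valid(self) -> bool:
        return self.linker in allowed_linkers(self.orientation, self.adjacent)

    def layout(self) -> list[str]:
        """Domain order from N- to C-terminus; the two deaminase domains differ."""
        near = "TadA*" if self.adjacent == "evolved" else "TadA"
        far = "TadA" if self.adjacent == "evolved" else "TadA*"
        link = f"L{self.linker}"
        if self.orientation == "deaminase_then_cas12a":
            return [far, "dimer-linker", near, link, "Cas12a"]
        return ["Cas12a", link, near, "dimer-linker", far]


design = FusionDesign("cas12a_then_deaminase", "evolved", 12)
print(design.is_valid(), design.layout())
# True ['Cas12a', 'L12', 'TadA*', 'dimer-linker', 'TadA']
```

The rules are kept in one function so that the narrower L11-L15 window for an evolved deaminase at the Cas12a C-terminus is stated in a single place.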

[0093] The adenine deaminase (or adenosine deaminase) useful in the present invention may be any known or later identified adenine deaminase of any biological origin (see, for example, U.S. Patent No. 10,113,163, which is incorporated herein by reference for its disclosure of adenine deaminases). As used herein, “adenine deaminase” and “adenosine deaminase” refer to polypeptides or domains that catalyze, or are capable of catalyzing, the hydrolytic deamination of adenine or adenosine (e.g., removal of the amine group from adenine). In some embodiments, the adenine deaminase may catalyze the hydrolytic deamination of adenosine or deoxyadenosine to inosine or deoxyinosine, respectively. In some embodiments, the adenosine deaminase catalyzes the hydrolytic deamination of adenine or adenosine in DNA. In some embodiments, the adenine deaminase encoded by the nucleic acid construct of the present invention can induce an A→G conversion in the sense (e.g., "+"; template) strand of a target nucleic acid, or a T→C conversion in the antisense (e.g., "-"; complementary) strand of a target nucleic acid.

[0094] In some embodiments, the adenosine deaminase may be a variant of a naturally occurring adenine deaminase. Therefore, in some embodiments, the adenosine deaminase useful for the present invention may be about 70% to 100% identical to a wild-type adenine deaminase (e.g., about 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to a naturally occurring adenine deaminase). In some embodiments, the term adenine deaminase or adenosine deaminase may refer to a non-naturally occurring, engineered, mutated, or evolved adenosine deaminase. Thus, for example, an engineered, mutated, or evolved adenine deaminase polypeptide or adenine deaminase domain may be about 70% to 99.9% identical to a naturally occurring adenine deaminase polypeptide / domain (e.g., about 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9% identical to a naturally occurring adenine deaminase polypeptide or adenine deaminase domain). In some embodiments, the adenosine deaminase is derived from bacteria (e.g., Escherichia coli, Staphylococcus aureus, Haemophilus influenzae, Caulobacter crescentus, etc.). In some embodiments, the polynucleotide encoding the adenine deaminase polypeptide / domain may be codon-optimized for expression in an organism (e.g., a plant).
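The identity ranges above (e.g., 70% to 99.9%) presuppose some way of scoring two sequences against each other, which this section does not define. As a minimal sketch, assuming the two sequences have already been aligned to equal length by some alignment tool, percent identity can be computed column by column. The function name and the gap convention (gaps count as mismatches, gap-gap columns are skipped) are assumptions; other conventions use a different denominator, and the toy strings below are not SEQ ID NO sequences.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two sequences already aligned to equal length.

    Gap characters ('-') count as mismatches; columns where both sequences
    carry a gap are skipped. This is one common convention, not the only one.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment contains no comparable columns")
    # Gap-gap columns were removed, so x == y always means a residue match.
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


wild_type = "MSEVEFSHEY"  # toy 10-residue fragment
variant = "MSEVQFAHEW"    # same fragment with 3 substitutions
print(percent_identity(wild_type, variant))  # 70.0
```

A result of 70.0 sits exactly at the lower bound of the ranges recited above, so the toy variant would fall inside them under this convention.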

[0095] In some embodiments, the adenine deaminase domain may be a wild-type tRNA-specific adenosine deaminase domain, such as tRNA-specific adenosine deaminase (TadA), and / or a mutated / evolved adenosine deaminase domain, such as a mutated / evolved tRNA-specific adenosine deaminase domain (TadA*). In some embodiments, the TadA domain may be derived from Escherichia coli. In some embodiments, TadA may be modified, for example, truncated, and may lack one or more N-terminal and / or C-terminal amino acids compared to full-length TadA (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 N-terminal and / or C-terminal amino acid residues may be lacking compared to full-length TadA). In some embodiments, the TadA polypeptide or TadA domain does not contain an N-terminal methionine. In some embodiments, wild-type Escherichia coli TadA comprises the amino acid sequence of SEQ ID NO: 47. In some embodiments, the mutated / evolved Escherichia coli TadA* comprises the amino acid sequence of SEQ ID NO: 48 or 78-82. In some embodiments, the polynucleotide encoding TadA / TadA* may be codon-optimized for expression in an organism.

[0096] In some embodiments, the first deaminase domain may be linked to a second deaminase domain via a linker (e.g., a peptide linker) to form an adenine deaminase dimer. In some embodiments, the first deaminase domain may be linked to the second deaminase domain via a GS linker. In some embodiments, the GS linker may include the amino acid sequence SGGS (SEQ ID NO: 25), (GGS)n, or S(GGS)n (one or more repeats of SEQ ID NO: 26), where n is 1 to 20 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, and any range or value therein). In some embodiments, the GS linker may include the amino acid sequences SGGSGGSGGS (SEQ ID NO: 27), SGSETPGTSESATPES (SEQ ID NO: 28), and / or SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 29). In some embodiments, the adenine deaminase dimer includes a first deaminase domain linked to a second deaminase domain via SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 29). In some embodiments, the first deaminase domain is linked at its C-terminus to the N-terminus of the second deaminase domain. In some embodiments, the second deaminase domain is linked at its C-terminus to the N-terminus of the first deaminase domain.
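The GS linkers in [0096] are built from short repeating units, and the sketch below makes that modular structure explicit. It generates (GGS)n or S(GGS)n for n = 1 to 20 and checks two relationships that can be read directly from the sequences as printed above: SEQ ID NO: 27 is S(GGS)3, and SEQ ID NO: 29 is two SGGS units (SEQ ID NO: 25) on either side of SEQ ID NO: 28. The function and constant names are illustrative only.

```python
SEQ_ID_25 = "SGGS"
SEQ_ID_27 = "SGGSGGSGGS"
SEQ_ID_28 = "SGSETPGTSESATPES"
SEQ_ID_29 = "SGGSSGGSSGSETPGTSESATPESSGGSSGGS"


def gs_linker(n: int, leading_serine: bool = True) -> str:
    """Return S(GGS)n (default) or (GGS)n for 1 <= n <= 20, the range given in [0096]."""
    if not 1 <= n <= 20:
        raise ValueError("n must be between 1 and 20")
    core = "GGS" * n
    return "S" + core if leading_serine else core


# SEQ ID NO: 29 read as two SGGS units on each side of SEQ ID NO: 28.
composite = SEQ_ID_25 * 2 + SEQ_ID_28 + SEQ_ID_25 * 2

print(gs_linker(3) == SEQ_ID_27, composite == SEQ_ID_29, len(SEQ_ID_29))
# True True 32
```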

[0097] The fusion proteins of the present invention, comprising a Cas12a domain linked to a polypeptide of interest as described herein, may be used in combination with guide RNAs (gRNA, CRISPR array, CRISPR RNA, crRNA) designed to function with the Cas12a domain to modify a target nucleic acid. Guide nucleic acids (CRISPR RNA, CRISPR DNA, crRNA, crDNA) useful in the present invention include spacer sequences and repeat sequences. The guide nucleic acid can form a complex with the Cas12a domain of the fusion protein, and the spacer sequence can hybridize to the target nucleic acid, thereby guiding the Cas12a domain and the polypeptide of interest to the target nucleic acid, which is then modified (e.g., cleaved or edited) or modulated (e.g., its transcription modulated) by the polypeptide of interest of the fusion protein. For example, a fusion protein containing a Cas12a domain linked to an adenine deaminase domain as described herein can be used in combination with a Cas12a guide nucleic acid to modify a target nucleic acid. In this case, the adenine deaminase domain of the fusion protein deaminates an adenosine base in the target nucleic acid, thereby editing the target nucleic acid.

[0098] As used herein, “guide nucleic acid,” “guide RNA,” “gRNA,” “CRISPR RNA / DNA,” “crRNA,” or “crDNA” means a nucleic acid comprising at least one spacer sequence that is complementary to (hybridizes with) target DNA (e.g., a protospacer) and at least one repeat sequence (e.g., a repeat of a type V Cas12a CRISPR-Cas system, or a fragment or portion thereof), where the repeat sequence is linked to the 5' end of the spacer sequence. The design of the gRNA of the present invention is based on the type V Cas12a CRISPR-Cas system. In some embodiments, the Cas12a gRNA may include, from 5' to 3', a repeat sequence (full length or a portion thereof (the "handle"), e.g., a pseudoknot-like structure) and a spacer sequence. In some embodiments, the guide nucleic acid may contain multiple repeat-spacer sequences (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more repeat-spacer sequences) (e.g., repeat-spacer-repeat, e.g., repeat-spacer-repeat-spacer-repeat-spacer-repeat-spacer-repeat-spacer, etc.). The guide nucleic acid of the present invention is synthetic, artificial, and not found in nature. The gRNA can be very long and may include an aptamer (e.g., as in the MS2 recruitment strategy) or other RNA structures appended to the spacer.
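The 5'-to-3' layout described in [0098] (a repeat or its 5' "handle" followed by a spacer, optionally repeated to form a multiplex array) can be sketched as simple string assembly. This is a minimal illustration, assuming RNA written in uppercase A/C/G/U; the function name `build_guide` and the short repeat and spacer strings are hypothetical placeholders, not an actual Cas12a repeat or a spacer from this specification.

```python
RNA_BASES = set("ACGU")


def build_guide(repeat: str, spacers: list[str], end_with_repeat: bool = False) -> str:
    """Assemble a Cas12a-style guide 5'->3': repeat-spacer[-repeat-spacer...][-repeat]."""
    if not spacers:
        raise ValueError("at least one spacer is required")
    repeat = repeat.upper()
    spacers = [s.upper() for s in spacers]
    for part in [repeat, *spacers]:
        if not set(part) <= RNA_BASES:
            raise ValueError(f"non-RNA characters in {part!r}")
    # Each spacer is placed 3' of a copy of the repeat.
    guide = "".join(repeat + spacer for spacer in spacers)
    return guide + repeat if end_with_repeat else guide


HYPOTHETICAL_REPEAT = "GAUC"  # placeholder, not a real Cas12a repeat
print(build_guide(HYPOTHETICAL_REPEAT, ["AAAA", "CCCC"], end_with_repeat=True))
# GAUCAAAAGAUCCCCCGAUC
```

Keeping the repeat as a parameter reflects the text's allowance for a full-length repeat or only its 5' portion.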

[0099] As used herein, “repeat sequence” refers, for example, to any repeat sequence of a wild-type CRISPR Cas12a locus or to a repeat sequence of a synthetic crRNA. Repeat sequences useful in the present invention may be any known or later identified repeat sequence of a CRISPR Cas12a (type V) locus, or synthetic repeats designed to function in a type V CRISPR-Cas system. The repeat sequence may include hairpin structures and / or stem-loop structures. In some embodiments, the repeat sequence may form a pseudoknot-like structure (i.e., a “handle”) at its 5' end. Thus, in some embodiments, the repeat sequence may be identical or substantially identical (e.g., at least 70% identical) to a repeat sequence from a wild-type type V CRISPR locus. Repeat sequences from wild-type Cas12a (type V) CRISPR loci can be determined by established algorithms, for example, using CRISPRfinder provided through CRISPRdb (see Grissa et al. Nucleic Acids Res. 35 (Web Server issue): W52-7). In some embodiments, the repeat sequence or a portion thereof is linked to the 5' end of a spacer sequence, thereby forming a repeat-spacer sequence (e.g., guide RNA, crRNA).

[0100] In some embodiments, the repeat sequence comprises, consists essentially of, or consists of at least 10 nucleotides (e.g., about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50-100 nucleotides, or any range or value therein), depending on whether the particular repeat and the guide RNA containing it are processed. In some embodiments, the repeat sequence comprises, consists essentially of, or consists of about 10 to about 20, about 10 to about 30, about 10 to about 45, about 10 to about 50, about 15 to about 30, about 15 to about 40, about 15 to about 45, about 15 to about 50, about 20 to about 30, about 20 to about 40, about 20 to about 50, about 30 to about 40, about 40 to about 80, or about 50 to about 100 or more nucleotides.

[0101] The repeat sequence linked to the 5' end of the spacer sequence may comprise a portion of a repeat sequence (for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 or more consecutive nucleotides of a wild-type repeat sequence). In some embodiments, the portion of the repeat sequence linked to the 5' end of the spacer sequence may be about 5 to about 10 consecutive nucleotides long (e.g., about 5, 6, 7, 8, 9, or 10 nucleotides) and have at least 90% identity (e.g., at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) to the same region (e.g., the 5' end) of the wild-type Cas12a repeat nucleotide sequence. In some embodiments, the portion of the repeat sequence includes a pseudoknot-like structure (e.g., a "handle") at its 5' end.

[0102] As used herein, a “spacer sequence” is a nucleotide sequence that is complementary to a target nucleic acid (e.g., target DNA, e.g., a protospacer). The spacer sequence may be fully or substantially complementary to the target nucleic acid (e.g., at least about 70% complementary (e.g., about 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more)). Thus, in some embodiments, the spacer sequence may have one, two, three, four, or five mismatches compared with the target nucleic acid, and such mismatches may be contiguous or noncontiguous. In some embodiments, the spacer sequence may have 70% complementarity with the target nucleic acid. In other embodiments, the spacer nucleotide sequence may have 80% complementarity with the target nucleic acid. In still other embodiments, the spacer nucleotide sequence may have 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.5%, etc. complementarity with the target nucleic acid (protospacer). In some embodiments, the spacer sequence is 100% complementary to the target nucleic acid. The spacer sequence may be about 15 to about 30 nucleotides long (e.g., 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides, or any range or value therein). Therefore, in some embodiments, the spacer sequence may be fully or substantially complementary over a region of the target nucleic acid (e.g., protospacer) that is at least about 15 to about 30 nucleotides long. In some embodiments, the spacer is about 20 nucleotides long. In some embodiments, the spacer is about 23 nucleotides long.
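The percent complementarity and mismatch counts in [0102] can be illustrated with a short comparison. To avoid reversing strands, this sketch assumes the protospacer is supplied as the non-target (PAM-containing) strand written 5' to 3'; a spacer position then pairs with the target strand exactly when the two bases are equal after reading U as T. The function name and the 20-nt example sequences are hypothetical.

```python
def spacer_match(spacer: str, protospacer: str) -> tuple[float, list[int]]:
    """Percent complementarity and 1-based mismatch positions (from the spacer 5' end).

    `protospacer` is the non-target strand 5'->3', so a spacer base pairs with the
    target strand wherever it equals this strand after reading U as T.
    """
    s = spacer.upper().replace("U", "T")
    p = protospacer.upper()
    if len(s) != len(p):
        raise ValueError("spacer and protospacer must have equal length")
    mismatches = [i + 1 for i, (a, b) in enumerate(zip(s, p)) if a != b]
    pct = 100.0 * (len(s) - len(mismatches)) / len(s)
    return pct, mismatches


spacer = "GACUGCAUGCAAGCUUACGA"  # hypothetical 20-nt spacer
print(spacer_match(spacer, "GACTGCATGCAAGCTTACGA"))  # (100.0, [])
print(spacer_match(spacer, "GACTGCATGCAAGCTTACGT"))  # (95.0, [20])
```

One mismatch in a 20-nt spacer gives 95% complementarity, within the ranges recited above.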

[0103] In some embodiments, the 5' region of the guide RNA spacer sequence may be fully complementary to the target DNA while the 3' region of the spacer is substantially complementary to the target DNA, so that the overall complementarity of the spacer sequence to the target DNA may be less than 100%. For example, the first 1, 2, 3, 4, 5, 6, 7, 8, etc. nucleotides in the 5' region of a 20-nucleotide spacer sequence (i.e., the seed region) may be 100% complementary to the target DNA, while the remaining nucleotides in the 3' region of the spacer sequence may be substantially complementary to the target DNA (e.g., at least about 70% complementary). In some embodiments, the first 1-8 nucleotides at the 5' end of the spacer sequence (e.g., the first 1, 2, 3, 4, 5, 6, 7, or 8 nucleotides, or any range therein) may be 100% complementary to the target DNA, while the remaining nucleotides in the 3' region of the spacer sequence are substantially complementary to the target DNA (e.g., at least about 50% complementary (e.g., 50%, 55%, 60%, 65%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more)). In some embodiments, the seed region of the spacer may be about 5 to 6 nucleotides long. In some embodiments, the seed region of the spacer is 5 nucleotides long. In some embodiments, the seed region of the spacer is 6 nucleotides long.
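The seed-region embodiments in [0103] amount to a two-part rule: a perfectly matched 5' seed and a 3' remainder that only needs to meet a lower complementarity threshold. The sketch below checks that rule, reusing the same assumed convention as a plain comparison (protospacer given as the non-target strand 5' to 3', U read as T). The defaults `seed_len=6` and `min_tail_pct=70.0` are illustrative picks from the ranges in the text, and the function name is hypothetical.

```python
def seed_rule_ok(spacer: str, protospacer: str, seed_len: int = 6,
                 min_tail_pct: float = 70.0) -> bool:
    """True if the 5' seed matches perfectly and the 3' remainder meets min_tail_pct.

    `protospacer` is the non-target strand written 5'->3', so a spacer position
    pairs with the target strand when the bases are equal after reading U as T.
    """
    s = spacer.upper().replace("U", "T")
    p = protospacer.upper()
    if len(s) != len(p) or not 0 < seed_len < len(s):
        raise ValueError("need equal lengths and 0 < seed_len < spacer length")
    matches = [a == b for a, b in zip(s, p)]
    seed, tail = matches[:seed_len], matches[seed_len:]
    return all(seed) and 100.0 * sum(tail) / len(tail) >= min_tail_pct


spacer = "GACUGCAUGCAAGCUUACGA"                        # hypothetical 20-nt spacer
print(seed_rule_ok(spacer, "GACTGCATGCAAGCTTACGT"))  # True: mismatch only at position 20
print(seed_rule_ok(spacer, "GATTGCATGCAAGCTTACGA"))  # False: mismatch in the seed (position 3)
```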

[0104] As used herein, “target nucleic acid,” “target DNA,” “target nucleotide sequence,” “target region,” or “target region in genome” refers to a region of the genome of an organism that is fully complementary (100% complementary) or substantially complementary (e.g., at least 70% complementary (e.g., 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more)) to the spacer sequence in the guide RNA of the present invention. Target regions useful for the CRISPR-Cas12a system are located immediately 3' to the PAM sequence in the genome of an organism. The target region may be any stretch of at least 15 consecutive nucleotides located immediately adjacent to the PAM sequence (e.g., 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 nucleotides, etc.).

[0105] A “protospacer sequence” refers to a portion of a target double-stranded DNA (or of a target region within the genome) that is fully or substantially complementary to (and hybridizes with) the spacer sequence of a CRISPR repeat-spacer sequence (e.g., guide RNA, CRISPR array, crRNA). In the type V CRISPR-Cas Cas12a system, the protospacer sequence is adjacent (e.g., immediately adjacent) to a protospacer adjacent motif (PAM). The PAM is located at the 5' end of the non-target strand and at the 3' end of the target strand (see below for an example).

[0106] Canonical Cas12a PAMs are T-rich. In some embodiments, the canonical Cas12a PAM sequence may be 5'-TTN, 5'-TTTN, or 5'-TTTV. In some embodiments, non-canonical PAMs may be used, although they may be less efficient.
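Paragraphs [0104] to [0106] place the Cas12a target region immediately 3' of a T-rich PAM on the non-target strand. The sketch below scans one strand for a PAM pattern and returns the adjacent candidate protospacer. Assumptions: the input is the strand that carries the PAM 5' to 3', the default pattern is TTTV (V = A, C, or G) written as a regular expression, and 20 nt is one length from the 15-30 nt range; the other strand would need a reverse-complement pass, which is omitted here. The function name is hypothetical.

```python
import re


def find_cas12a_sites(dna: str, protospacer_len: int = 20, pam_regex: str = "TTT[ACG]"):
    """Return (pam_start, pam, protospacer) tuples for one strand, 0-based positions."""
    seq = dna.upper()
    sites = []
    # Zero-width lookahead so overlapping matches of looser patterns such as TT[ACGT] are all reported.
    for m in re.finditer(f"(?=({pam_regex}))", seq):
        start, pam = m.start(), m.group(1)
        ps_start = start + len(pam)  # protospacer begins immediately 3' of the PAM
        protospacer = seq[ps_start:ps_start + protospacer_len]
        if len(protospacer) == protospacer_len:
            sites.append((start, pam, protospacer))
    return sites


example = "GGTTTA" + "ACGTACGTACGTACGTACGT" + "GG"  # toy sequence
print(find_cas12a_sites(example))
# [(2, 'TTTA', 'ACGTACGTACGTACGTACGT')]
```

Passing a looser `pam_regex` (for example "TT[ACGT]" for 5'-TTN) widens the search, consistent with the text's note that such PAMs may be usable but less efficient.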

[0107] Further PAM sequences may be determined by those skilled in the art using established experimental and computational approaches. For example, experimental approaches include targeting a sequence flanked by all possible nucleotide sequences and identifying the sequence members that are not targeted, for example, through transformation of target plasmid DNA (Esvelt et al. 2013. Nat. Methods 10:1116-1121; Jiang et al. 2013. Nat. Biotechnol. 31:233-239). In some embodiments, computational approaches may include performing BLAST searches of native spacers to identify the original target DNA sequences within bacteriophages or plasmids, and aligning these sequences to determine conserved sequences adjacent to the target sequence (Briner and Barrangou. 2014. Appl. Environ. Microbiol. 80:994-1001; Mojica et al. 2009. Microbiology 155:733-740).

[0108] In some embodiments, complexes and compositions are provided comprising one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or more) fusion proteins of the present invention and one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or more) guide nucleic acids (e.g., CRISPR RNA / DNA, e.g., crRNA / crDNA). In some embodiments, polynucleotides or nucleic acid constructs encoding the polypeptides, fusion proteins, guide nucleic acids, and / or complexes of the present invention are provided. In some embodiments, nucleic acid constructs, expression cassettes, and / or vectors comprising the polynucleotides and / or one or more guide nucleic acids of the present invention are provided. In some embodiments, the polynucleotide encoding the fusion protein of the present invention may be on the same polynucleotide, nucleic acid construct, expression cassette, or vector as the guide nucleic acids, or on a separate one. If the fusion protein is encoded on a polynucleotide, nucleic acid construct, expression cassette, or vector separate from the one containing the guide nucleic acid, the polynucleotide, nucleic acid construct, expression cassette, or vector encoding the fusion protein of the present invention may be provided (e.g., contacted with the target nucleic acid) before, simultaneously with, or after the guide nucleic acid is provided.

[0109] In some embodiments, the polynucleotides, nucleic acid constructs, expression cassettes and / or vectors of the present invention may be codon-optimized for expression in an organism. In some embodiments, the optimized polynucleotides, nucleic acid constructs, or expression cassettes of the present invention may be about 70% to 100% (e.g., about 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, or 100%) identical to the polynucleotides, nucleic acid constructs, or expression cassettes encoding the polypeptides, fusion proteins and complexes of the present invention.

[0110] In some embodiments, cells comprising one or more polynucleotides, guide nucleic acids, nucleic acid constructs, expression cassettes, or vectors of the present invention are provided.

[0111] The polypeptides, fusion proteins, guide RNAs, complexes, and compositions of the present invention, as well as the encoding polynucleotide / nucleic acid constructs / expression cassettes / vectors, can be used to modify target nucleic acids and / or their expression.

[0112] In some embodiments, the fusion protein of the present invention is an adenine base editor (ABE) for use in base editing of a target nucleic acid, wherein the fusion protein comprises a Cas12a domain linked to an adenine deaminase domain.

[0113] In some embodiments, a method for modifying a target nucleic acid is provided, comprising modifying the target nucleic acid by contacting it with (a)(i) the fusion protein of the present invention and (a)(ii) a guide nucleic acid (e.g., CRISPR RNA, CRISPR DNA, crRNA, crDNA), (b) a complex comprising the fusion protein of the present invention and the guide nucleic acid, (c) a composition comprising the fusion protein of the present invention and the guide nucleic acid, and / or (d) a system of the present invention. The target nucleic acid may be contacted with the fusion protein before, simultaneously with, or after contacting the target nucleic acid with the guide nucleic acid.

[0114] In some embodiments, a method for modifying a target nucleic acid is provided, comprising contacting the target nucleic acid with a guide nucleic acid and a fusion protein containing one of the amino acid sequences of SEQ ID NOs: 49-77 or 90-96. The target nucleic acid may be contacted with the fusion protein of the present invention before, simultaneously with, or after contacting the target nucleic acid with the guide nucleic acid.

[0115] In some embodiments, a method for modifying a target nucleic acid is provided, comprising contacting a cell or cell-free system containing a target nucleic acid with (a)(i) a polynucleotide encoding the polypeptide or fusion protein of the present invention, or an expression cassette or vector containing the same, and (a)(ii) a guide nucleic acid, and / or an expression cassette or vector containing the same, and / or (b) a nucleic acid construct encoding a complex containing the fusion protein and the guide nucleic acid, and / or an expression cassette or vector containing the same, under conditions in which the fusion protein is expressed and forms a complex with the guide nucleic acid, the complex hybridizes to the target nucleic acid and thereby modifies the target nucleic acid. Where provided on a separate construct, the target nucleic acid may be contacted with the polynucleotide encoding the fusion protein, nucleic acid construct, expression cassette, or vector before, simultaneously with, or after contacting the target nucleic acid with the guide nucleic acid.

[0116] In some embodiments, a method for modifying a target nucleic acid is provided, comprising contacting a cell or cell-free system containing a target nucleic acid with a polynucleotide encoding a fusion protein comprising one of the amino acid sequences of SEQ ID NOs. 49-77 or 90-96, or an expression cassette or vector containing the same, and a guide nucleic acid or an expression cassette or vector containing the same, under conditions in which the fusion protein is expressed and forms a complex with the guide nucleic acid, the complex hybridizes to the target nucleic acid and thereby modifies the target nucleic acid. If provided on a separate construct, the target nucleic acid may be contacted with the polynucleotide encoding the fusion protein, nucleic acid construct, expression cassette, or vector before, simultaneously with, or after contacting the target nucleic acid with the guide nucleic acid.

[0117] In some embodiments, the present invention provides a method for editing a target nucleic acid, comprising contacting the target nucleic acid with (a)(i) the fusion protein of the present invention and (a)(ii) a guide nucleic acid, (b) a complex comprising the fusion protein of the present invention and the guide nucleic acid, (c)(i) a composition comprising the fusion protein of the present invention and (ii) a guide nucleic acid, and / or (d)(i) a CRISPR-Cas system of the present invention, wherein the adenine deaminase domain converts adenosine (A) in the target nucleic acid to guanine (G), thereby editing the target nucleic acid to produce a (point) mutation. The target nucleic acid may be contacted with the fusion protein of the present invention before, simultaneously with, or after contacting the target nucleic acid with the guide nucleic acid.

[0118] In some embodiments, a method for editing a target nucleic acid is provided, comprising contacting the target nucleic acid with a fusion protein containing one of the amino acid sequences of SEQ ID NOs: 49-77 or 90-96, and a guide nucleic acid, thereby editing the target nucleic acid. The target nucleic acid may be contacted with the fusion protein of the present invention before, simultaneously with, or after contacting the target nucleic acid with the guide nucleic acid.

[0119] In some embodiments, a method is provided for editing a target nucleic acid, comprising contacting a cell or cell-free system containing a target nucleic acid with (a)(i) a polynucleotide encoding the fusion protein of the present invention, and / or an expression cassette or vector containing the same, and (a)(ii) a guide nucleic acid, and / or an expression cassette or vector containing (a)(i) and / or (a)(ii), and / or (b) a nucleic acid construct encoding a complex containing the fusion protein and the guide nucleic acid, or an expression cassette or vector containing the same, under conditions in which the fusion protein is expressed and forms a complex with the guide nucleic acid, wherein the complex hybridizes to the target nucleic acid, and the adenine deaminase domain converts adenosine (A) in the target nucleic acid to guanine (G), thereby editing the target nucleic acid to produce a (point) mutation. If provided on a separate construct, the target nucleic acid may be contacted with the polynucleotide encoding the fusion protein (or the expression cassette or vector containing it) before, simultaneously with, or after contacting the target nucleic acid with the guide nucleic acid.

[0120] In some embodiments, a method for editing a target nucleic acid is provided, comprising contacting a cell or cell-free system containing the target nucleic acid with a polynucleotide encoding a fusion protein containing one of the amino acid sequences of SEQ ID NOs: 49-77 or 90-96, or an expression cassette or vector containing the same, and a guide nucleic acid or an expression cassette or vector containing the same, under conditions in which the fusion protein is expressed and forms a complex with the guide nucleic acid, thereby hybridizing the complex to the target nucleic acid and editing the target nucleic acid. The polynucleotide encoding the fusion protein containing one of the amino acid sequences of SEQ ID NOs: 49-77 or 90-96 may be located on the same expression cassette or vector as the one containing the guide nucleic acid. If the polynucleotide encoding the fusion protein containing one of the amino acid sequences of SEQ ID NOs: 49-77 or 90-96 is located on a separate expression cassette or vector from the one containing the guide nucleic acid, the target nucleic acid may be contacted with the expression cassette / vector containing the fusion protein before, simultaneously with, or after contact with the expression cassette / vector containing the guide nucleic acid.

[0121] In some embodiments, the adenine deaminase of the fusion protein of the present invention causes an A→G conversion in the sense (e.g., "+"; template) strand of the target nucleic acid, or a T→C conversion in the antisense (e.g., "-"; complementary) strand of the target nucleic acid.
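Because the two strands are base-paired, an A→G change on one strand is necessarily read as a T→C change at the same base pair on the complementary strand. A minimal Python sketch of that correspondence, assuming plain uppercase A/C/G/T strings and 0-based positions (the helper names are illustrative only, not taken from the source):

```python
# Watson-Crick complements for unambiguous DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def a_to_g(seq: str, pos: int) -> str:
    """Apply an A->G substitution at 0-based position `pos` of `seq`."""
    if seq[pos] != "A":
        raise ValueError(f"position {pos} is {seq[pos]!r}, not 'A'")
    return seq[:pos] + "G" + seq[pos + 1:]


def complementary_change(seq: str, pos: int) -> tuple[str, str]:
    """Show how an A->G edit at `pos` appears on the opposite strand.

    Returns (base before, base after) at the same base pair, read on the
    reverse-complement strand.
    """
    edited = a_to_g(seq, pos)
    opposite = len(seq) - 1 - pos  # index of the same base pair on the other strand
    return reverse_complement(seq)[opposite], reverse_complement(edited)[opposite]


print(complementary_change("GACTA", 1))  # → ('T', 'C')
```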

[0122] The fusion proteins and polypeptides of the present invention and the nucleic acid constructs encoding them may be used in combination with guide nucleic acids to modify target nucleic acids (including, but not limited to, plasmid sequences), for example by generating A→G or T→C mutations in a target nucleic acid; generating A→G or T→C mutations in coding sequences to alter amino acid identity, to generate stop codons, or to disrupt start codons; generating point mutations in genomic DNA to disrupt transcription factor binding or splice junctions; and/or generating other nucleic acid modifications with a fusion protein comprising a Cas12a domain fused to another domain (a polypeptide of interest) via any one of the amino acid sequences of SEQ ID NOs: 1 to 24 (e.g., peptide linkers).
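As one illustration of the coding-sequence edits listed above, each single A→G or T→C substitution in an ATG start codon turns it into a non-ATG codon. The sketch below enumerates those single-base variants for a codon written on the coding strand; the helper name is hypothetical:

```python
def single_base_abe_variants(codon: str) -> list[str]:
    """Return every codon reachable by exactly one A->G or T->C substitution.

    A->G models an edit on the strand shown; T->C models an A->G edit on the
    complementary strand, read back on the strand shown.
    """
    substitutions = {"A": "G", "T": "C"}
    variants = []
    for i, base in enumerate(codon.upper()):
        if base in substitutions:
            variants.append(codon[:i].upper() + substitutions[base] + codon[i + 1:].upper())
    return variants


start = "ATG"
variants = single_base_abe_variants(start)
print(variants)  # → ['GTG', 'ACG']
print(all(v != start for v in variants))  # → True
```

Whether a given variant fully abolishes translation initiation in practice (GTG, for example, can serve as a non-canonical start codon in some organisms) is outside the scope of this sketch.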

[0123] The fusion proteins of the present invention and the polypeptides and nucleic acid constructs encoding them may be useful for modifying target nucleic acids in any organism, including but not limited to animals, plants, fungi, archaea, or bacteria. Animals include, but are not limited to, mammals, insects, fish, birds, etc.

[0124] Exemplary mammals for which the present invention may be useful include, but are not limited to, primates (human and non-human, e.g., chimpanzees, baboons, monkeys, gorillas), cats, dogs, mice, rats, ferrets, gerbils, hamsters, cattle, pigs, horses, goats, donkeys, and sheep.

[0125] Any target nucleic acid of any plant or plant part can be modified using the fusion protein of the present invention and the polypeptides and nucleic acid constructs encoding it. Any plant (or group of plants, e.g., a genus or higher-order classification) can be used in carrying out the present invention, including angiosperms, gymnosperms, monocots, dicots, C3, C4, and CAM plants, bryophytes, ferns and/or fern allies, microalgae, and/or macroalgae. Plants and/or plant parts useful in the present invention may be of any plant species, variety, or cultivar. The term "plant part" as used herein includes, but is not limited to, embryos, pollen, ovules, seeds, leaves, stems, shoots, flowers, branches, fruits, kernels, spikes, cobs, husks, stalks, roots, root tips, anthers, and plant cells (including plant cells that are intact in plants and/or plant parts, plant protoplasts, plant tissues, plant cell tissue cultures, plant calli, plant masses, etc.). As used herein, “shoot” refers to the above-ground parts of a plant, including leaves and stems. Furthermore, as used herein, “plant cell” refers to the structural and physiological unit of a plant, which includes a cell wall and may also refer to a protoplast. A plant cell may be in the form of an isolated single cell or a cultured cell, or may be part of a more highly organized unit such as a plant tissue or plant organ.

[0126] The fusion proteins and polypeptides of the present invention and the nucleic acid constructs encoding them can be used to modify (e.g., by base editing, cleavage, nicking, etc.) target nucleic acids of any plant or plant part. Non-limiting examples of plants useful in the present invention include turfgrasses (e.g., bluegrass, bentgrass, ryegrass, sedge), reeds, broadleaf sedges, Japanese pampas grass, Aruncus, and switchgrass; vegetable crops (artichoke, kohlrabi, yellow radish, chives, asparagus, lettuce (e.g., salad greens, leaf lettuce, tallow lettuce), malanga, melons (e.g., muskmelon, watermelon, Crenshaw melon, honeydew melon, cantaloupe), Brassica crops (e.g., Brussels sprouts, cabbage, cauliflower, broccoli, collards, kale, Chinese cabbage, bok choy), cardoni, carrots, Chinese cabbage, okra, onions, celery, parsley, chickpeas, parsnips, chicory, peppers, potatoes, cucurbits (e.g., marrow, cucumber, zucchini, squash, pumpkin, honeydew melon, watermelon, cantaloupe), radishes, dried onions, rutabaga, eggplant, pokeweed, shallots, endive, garlic, spinach, leeks, squash, greens, beets (sugar beets and fodder beets), sweet potatoes, Swiss chard, horseradish, tomatoes, turnips, and spices); fruit crops such as apples, apricots, cherries, nectarines, peaches, pears, plums, prunes, cherries, quince, figs, nuts (such as chestnuts, pecans, pistachios, hazelnuts, peanuts, walnuts, macadamia nuts, almonds, etc.), citrus fruits (such as clementines, kumquats, oranges, grapefruits, tangerines, mandarins, lemons, limes, etc.), blueberries, black raspberries, boysenberries, cranberries, currants, gooseberries, loganberries, raspberries, strawberries, blackberries, grapes (wine and table), avocados, bananas, kiwis, persimmons, pomegranates, pineapples, tropical fruits, pears, melons, mangoes, papayas, and lychees; crops such as clover, alfalfa, timothy, evening primrose, meadowfoam, corn (feed corn, sweet corn, popcorn), hops, jojoba, buckwheat, safflower, quinoa, wheat, rice, barley, rye, millet, sorghum, oats, triticale, sorghum, tobacco, kapok, legumes (beans (e.g., string beans and dried beans), lentils, peas, soybeans), oil plants (rapeseed, canola, mustard, poppy, olive, sunflower, coconut, castor oil plant, cocoa bean, peanut, oil palm), duckweed, Arabidopsis, fiber plants (cotton, flax, hemp, jute), cannabis (e.g., hemp (Cannabis sativa), Indian hemp (Cannabis indica), and Cannabis ruderalis), the laurel family (cinnamon, camphor), or plants such as coffee, sugarcane, tea, and natural rubber plants; and/or bedding plants, such as flowering plants, cacti, succulents, and/or ornamental plants (e.g., roses, tulips, violets); as well as trees, such as forest trees (broad-leaved trees and evergreens, such as conifers; e.g., elm, ash, oak, maple, fir, spruce, cedar, pine, birch, cypress, eucalyptus, willow), shrubs, and other seedlings. In some embodiments, the fusion proteins and polypeptides of the present invention and the nucleic acid constructs encoding them may be used to modify corn, soybeans, wheat, canola, rice, tomatoes, peppers, sunflowers, raspberries, blackberries, black raspberries, and/or cherries.

[0127] The present invention further includes a kit for carrying out the method of the present invention. The kit of the present invention may include reagents, buffers, and apparatus for mixing, measuring, sorting, labeling, etc., as well as instructions suitable for modifying target nucleic acids.

[0128] In some embodiments, the present invention provides a kit comprising one or more polypeptides of the present invention, one or more fusion proteins of the present invention, one or more polynucleotides encoding one or more fusion proteins of the present invention, a CRISPR-Cas system of the present invention, and / or an expression cassette or vector comprising the same, optionally together with instructions for use. In some embodiments, the kit may further comprise a Cas12a guide nucleic acid and / or an expression cassette or vector comprising the same. In some embodiments, the guide nucleic acid may be provided on the same expression cassette or vector as the polynucleotide encoding the fusion protein of the present invention.

[0129] Therefore, in some embodiments, a kit is provided comprising a nucleic acid construct comprising (a) a polynucleotide encoding a fusion protein provided herein, and (b) a promoter that drives the expression of the polynucleotide of (a). In some embodiments, the kit may further comprise a nucleic acid construct encoding a guide nucleic acid, the construct comprising a cloning site for cloning a nucleic acid sequence identical to or complementary to the target nucleic acid sequence into the backbone of the guide nucleic acid.
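For Cas12a, the guide places the repeat 5' of the spacer (as recited in claim 63), so cloning a target-specific spacer into a guide backbone conceptually amounts to appending it downstream of the repeat. A minimal sketch, assuming DNA-level input strings and a placeholder repeat rather than any repeat sequence from the source; `build_crrna` and `REPEAT_PLACEHOLDER` are hypothetical names:

```python
# Hypothetical placeholder: substitute the direct repeat of the Cas12a
# ortholog actually used (no repeat sequence is given in this section).
REPEAT_PLACEHOLDER = "N" * 20

VALID_BASES = set("ACGT")


def build_crrna(spacer_dna: str, repeat: str = REPEAT_PLACEHOLDER) -> str:
    """Assemble a Cas12a crRNA (5'->3': repeat, then spacer) as RNA.

    `spacer_dna` is the protospacer-matching sequence written 5'->3' as DNA.
    """
    spacer_dna = spacer_dna.upper()
    if not spacer_dna or not set(spacer_dna) <= VALID_BASES:
        raise ValueError("spacer must be a non-empty A/C/G/T string")
    # In the transcribed guide, every T of the DNA sequence is read as U.
    return (repeat.upper() + spacer_dna).replace("T", "U")


guide = build_crrna("AAGAAATATTACAACATATAAAA")  # DMNT1 Spacer 1 (SEQ ID NO: 83)
print(guide[-23:])  # → AAGAAAUAUUACAACAUAUAAAA
```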

[0130] In some embodiments, the kit may further comprise one or more nuclear localization signals fused to the fusion protein, or a polynucleotide encoding them. In some embodiments, the polynucleotide of the kit may further encode one or more selection markers useful for identifying transformants (e.g., nucleic acids encoding antibiotic resistance genes, herbicide resistance genes, etc.). In some embodiments, the polynucleotide may be an mRNA and may encode one or more introns within the encoded fusion protein.

[0131] The present invention will now be described with reference to the following examples. These examples are not intended to limit the claims of the present invention but to illustrate specific embodiments. Any variation of the illustrated methods that would occur to a person skilled in the art is intended to fall within the scope of the present invention.

[Examples]

[0132] [Example 1] To date, no successful Cas12a-based adenine base editor has been demonstrated. The inventors therefore attempted to develop an optimized Cas12a-based adenine base editor by designing linkers of optimal length and sequence, based on the ideal placement of the deaminase relative to the DNA strand being edited, and by designing fusions of an adenine deaminase dimer (e.g., TadA/TadA*) to either the N-terminus or the C-terminus of Cas12a.

[0133] In the initial design of the fusion protein, Cas12a from Lachnospiraceae bacterium ND2006 (LbCas12a; e.g., SEQ ID NO: 30) was used because of its low temperature sensitivity and proven activity in plant cells. However, because different Cas12a endonucleases are structurally highly similar, these designs are expected to extend to Cas12a enzymes from other species (e.g., Acidaminococcus sp. Cpf1 (AsCpf1), Francisella novicida Cpf1 (FnCpf1), and others; see SEQ ID NOs: 31-46).

[0134] Using a structure-based approach, the inventors analyzed the placement of the adenine deaminase domain (e.g., TadA/TadA*) relative to the Cas12a domain and developed several linker sequences designed to position the adenine deaminase domain optimally. These linkers allow the deaminase to access the single-stranded portion of the non-target strand for base editing. Because of the position of the Cas12a termini and the orientation of its guide RNA, the ideal linker sequence and length are likely to differ significantly from those of the state-of-the-art linkers used in Cas9 ABEs. In this example, the linkers are designed to accommodate several possible base editor domain architectures, fusing the adenine deaminase domain to either end of Cas12a and swapping the order of the wild-type and evolved TadA domains. Exemplary linkers designed by the inventors are listed in Table 1.
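The domain architectures described above, together with the linker SEQ ID ranges recited for each architecture in claims 37, 38, and 41, can be captured as a small lookup table. The sketch below is an illustrative encoding only; `Architecture`, `ARCHITECTURES`, and `allowed_linker_ids` are hypothetical names, the linker sequences themselves (Table 1) are not reproduced, and the linker joining the two deaminase domains (claims 49 to 52) is omitted for brevity:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Architecture:
    name: str
    domain_order: tuple[str, ...]  # listed N-terminus -> C-terminus
    linker_seq_ids: range          # SEQ ID NOs permitted for the Cas12a linker


ARCHITECTURES = [
    # N-terminal fusion: TadA (wild type) - TadA* - linker - Cas12a (claims 37, 38).
    Architecture("N-terminal", ("TadA", "TadA*", "linker", "Cas12a"), range(1, 11)),
    # C-terminal fusion, wild-type TadA adjacent to the linker (claims 37, 41).
    Architecture("C-terminal", ("Cas12a", "linker", "TadA", "TadA*"), range(11, 25)),
    # C-terminal fusion with the deaminase order reversed (Figure 1C; claims 37, 41).
    Architecture("C-terminal, reversed", ("Cas12a", "linker", "TadA*", "TadA"), range(11, 16)),
]


def allowed_linker_ids(name: str) -> list[int]:
    """Return the SEQ ID NOs permitted as the Cas12a linker for an architecture."""
    for arch in ARCHITECTURES:
        if arch.name == name:
            return list(arch.linker_seq_ids)
    raise KeyError(name)


for arch in ARCHITECTURES:
    ids = arch.linker_seq_ids
    print(f"{arch.name}: {'-'.join(arch.domain_order)}, SEQ ID NOs {ids.start}-{ids.stop - 1}")
```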

[0135] [Table 1]

[0136] Figures 1A to 1C provide an overview of various structures developed using the designed linker.

[0137] To test the effectiveness of each designed linker sequence (including its length, flexibility, and susceptibility to proteases), constructs containing each linker sequence in a vector for expression in mammalian cells were prepared (see, e.g., SEQ ID NOs: 49-77 or 90-96). Each linker was tested in the relevant domain configuration, i.e., fusion of the TadA heterodimer to the N-terminus or C-terminus of LbCpf1 (Figures 1A and 1B). For some of the C-terminal linkers (Cterm_1, Cterm_4, Cterm_5, C9R, and Cterm_10), the order of the deaminase components (mutant and wild-type) was reversed (Figure 1C). After screening in mammalian cells, the most effective linker for each architecture was selected for testing in stable plant transformation (e.g., of soybean).
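Length and flexibility are the first linker properties listed above. As a rough sketch only, the snippet below reports length and glycine/serine content for the GS linkers recited in claim 51, since the sequences of SEQ ID NOs: 1 to 24 (Table 1) are not reproduced in this section. Treating the Gly+Ser fraction as a flexibility proxy is an assumption made here, and `linker_profile` is a hypothetical helper:

```python
# GS-type linkers recited in claim 51. SEQ ID NO: 25, S(GGS)n, is omitted
# because its length depends on n.
LINKERS = {
    26: "SGGS",
    27: "SGGSGGSGGS",
    28: "SGSETPGTSESATPES",
    29: "SGGSSGGSSGSETPGTSESATPESSGGSSGGS",
}


def linker_profile(seq: str) -> tuple[int, float]:
    """Return (length in residues, fraction of Gly + Ser) for a peptide linker."""
    if not seq:
        raise ValueError("empty linker sequence")
    gly_ser = sum(1 for residue in seq.upper() if residue in "GS")
    return len(seq), gly_ser / len(seq)


for seq_id, seq in LINKERS.items():
    length, gs_fraction = linker_profile(seq)
    print(f"SEQ ID NO: {seq_id}: {length} aa, Gly+Ser fraction {gs_fraction:.2f}")
```

A high Gly+Ser fraction only hints at flexibility; it says nothing about protease susceptibility, which the text treats as a separate property to be tested experimentally.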

[0138] [Example 2] Editing in HEK293T cells
HEK293T cells were seeded in antibiotic-free DMEM medium on 48-well collagen-coated plates (Corning). At 70-80% confluence, cells were transfected with 750 ng of base editor plasmid and 250 ng of guide RNA plasmid using 1.5 μL of Lipofectamine 3000 (ThermoFisher Scientific), according to the manufacturer's protocol. After 3 days, the cells were lysed and DNA was extracted using the MagMax DNA Extraction Kit (Applied Biosystems). The spacer sequences used in the guide RNAs were as follows:
DMNT1 Spacer 1: AAGAAATATTACAACATATAAAA (SEQ ID NO: 83)
DMNT1 Spacer 2: AAATCCAGAATGCACAAAGTACT (SEQ ID NO: 84)
DMNT1 Spacer 3: ATATAATGCATAATAAAAAACTT (SEQ ID NO: 85)
RNF2 Spacer 1: TATGAGTTACAACGAACACCTCA (SEQ ID NO: 86)
RNF2 Spacer 2: CACGTCTCATATGCCCCTTGGCA (SEQ ID NO: 87)
RNF2 Spacer 3: GAACATGAAAACTTAAATAGAAC (SEQ ID NO: 88)
RNF2 Spacer 4: ATGTTCTAAAAATGTATCCCAGT (SEQ ID NO: 89)
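Figures 3 to 5 label edited adenines by their position within the spacer (e.g., A8 and A11). Assuming that positions are numbered from 1 at the 5' (PAM-proximal) end of the spacer, DMNT1 Spacer 1 has adenines at positions 8 and 11, matching the labels cited for Figure 3 (an inferred correspondence, not one stated in the text). The sketch below lists candidate adenines per spacer; `adenine_positions` is a hypothetical helper, and the (8, 14) window is an illustrative choice, not a value from the source:

```python
from typing import Optional

SPACERS = {
    "DMNT1 Spacer 1": "AAGAAATATTACAACATATAAAA",  # SEQ ID NO: 83
    "DMNT1 Spacer 2": "AAATCCAGAATGCACAAAGTACT",  # SEQ ID NO: 84
    "DMNT1 Spacer 3": "ATATAATGCATAATAAAAAACTT",  # SEQ ID NO: 85
    "RNF2 Spacer 1": "TATGAGTTACAACGAACACCTCA",   # SEQ ID NO: 86
    "RNF2 Spacer 2": "CACGTCTCATATGCCCCTTGGCA",   # SEQ ID NO: 87
    "RNF2 Spacer 3": "GAACATGAAAACTTAAATAGAAC",   # SEQ ID NO: 88
    "RNF2 Spacer 4": "ATGTTCTAAAAATGTATCCCAGT",   # SEQ ID NO: 89
}


def adenine_positions(spacer: str, window: Optional[tuple[int, int]] = None) -> list[int]:
    """Return 1-based positions of A in `spacer`, numbered from its 5' end.

    If `window` = (first, last) is given, keep only positions inside it
    (inclusive). The window is a user choice, not part of the source protocol.
    """
    positions = [i for i, base in enumerate(spacer.upper(), start=1) if base == "A"]
    if window is not None:
        first, last = window
        positions = [p for p in positions if first <= p <= last]
    return positions


for name, seq in SPACERS.items():
    print(f"{name} ({len(seq)} nt): A at {adenine_positions(seq, window=(8, 14))}")
```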

[0139] Table 2 shows the mean frequency of adenine-to-guanine editing observed at the edited positions within the three tested spacers. All experimental constructs were built as fusions of TadA8.20m to the indicated terminus of dLbCas12a via the indicated linker (e.g., the Cterm1_8.20m construct contains dLbCas12a-Cterm1-TadA8.20m). N-terminal fusions of TadA8.20m or TadA8e to dLbCas12a via a GS-XTEN-GS linker were used as controls.
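The construct labels follow the convention given above (e.g., Cterm1_8.20m denotes dLbCas12a-Cterm1-TadA8.20m). A hypothetical parser for that convention is sketched below, assuming that labels beginning with "Nterm" place the deaminase N-terminal to dLbCas12a and labels beginning with "Cterm" place it C-terminal; only the Cterm example is confirmed by the text, and labels without a deaminase suffix (such as C9R in [0137]) are rejected:

```python
def expand_construct_label(label: str) -> str:
    """Expand a label such as 'Cterm1_8.20m' into its N->C domain order."""
    linker, sep, variant = label.partition("_")
    if not sep or not variant:
        raise ValueError(f"expected '<linker>_<TadA variant>', got {label!r}")
    deaminase = f"TadA{variant}"
    if linker.startswith("Cterm"):
        # Deaminase fused to the C-terminus of dLbCas12a.
        return f"dLbCas12a-{linker}-{deaminase}"
    if linker.startswith("Nterm"):
        # Assumed mirror convention: deaminase fused to the N-terminus.
        return f"{deaminase}-{linker}-dLbCas12a"
    raise ValueError(f"unrecognized linker name {linker!r}")


print(expand_construct_label("Cterm1_8.20m"))  # → dLbCas12a-Cterm1-TadA8.20m
```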

[0140] [Table 2]

[0141] Figures 3–5 show the editing frequencies in Table 2 in graphical form. For each construct, the amount of adenine-to-guanine editing observed at each editing position within the spacer is indicated by individual bars (e.g., A8 and A11 in Figure 3; A9, A10, and A14 in Figure 4; A10, A12, etc. in Figure 5). Figure 2 shows the average LbCas12a nuclease activity observed at each of the three test spacers in the same experiment. Based on these data, five linkers (Cterm10, Cterm12, Nterm7, Nterm10, and Nterm11) were selected as candidates for further testing as fusions with TadA8e deaminase. The editing data for these constructs and two control ABEs are shown in Table 3 and Figures 6–10.

[0142] [Table 3]

[0143] Figures 7–10 show the mean adenine-to-guanine editing frequencies observed at each position within the target spacer for five selected linkers. Figure 6 shows the mean LbCas12a nuclease activity observed in each of the four test spacers in the same experiment. In each of these figures, the error bars represent the standard deviation of three replicates.
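The per-position summaries described above (means of three replicates, with standard-deviation error bars) can be computed from raw replicate values along the lines sketched below. The input layout, the function name, and the use of the sample standard deviation (n - 1 denominator) are assumptions, since the text does not specify them; no data from Tables 2 or 3 are reproduced here:

```python
from statistics import mean, stdev


def summarize_positions(replicates: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """Map each edit position (e.g. 'A8') to (mean, sample SD) across replicates.

    `replicates[position]` holds one A-to-G editing frequency (in %) per
    replicate; at least two values are needed to compute a standard deviation.
    """
    summary = {}
    for position, values in replicates.items():
        if len(values) < 2:
            raise ValueError(f"{position}: need >= 2 replicates for a standard deviation")
        summary[position] = (mean(values), stdev(values))
    return summary
```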

[0144] These data indicate that C-terminal fusion of adenine deaminase to dLbCas12a via the designed linker Cterm12 consistently outperforms the control constructs.

[0145] The foregoing is illustrative of the present invention and is not to be construed as limiting thereof. The invention is defined by the following claims, with equivalents of the claims to be included therein. The claims as filed were as follows:
[Claim 1] A polypeptide comprising any one of the amino acid sequences of SEQ ID NOs: 1 to 24.
[Claim 2] The polypeptide according to claim 1, comprising a polypeptide of interest and any one of the amino acid sequences of SEQ ID NOs: 1 to 24.
[Claim 3] A polypeptide comprising a Cas12a domain and any one of the amino acid sequences of SEQ ID NOs: 1 to 24.
[Claim 4] A fusion protein comprising a Cas12a domain, a polypeptide of interest, and any one of the amino acid sequences of SEQ ID NOs: 1 to 24.
[Claim 5] The polypeptide according to claim 3 or the fusion protein according to claim 4, wherein the Cas12a domain comprises a mutation within the nuclease active site.
[Claim 6] The fusion protein according to claim 4 or 5, wherein the Cas12a domain is linked at its C-terminus and/or N-terminus to any one of the amino acid sequences of SEQ ID NOs: 1 to 24.
[Claim 7] The fusion protein according to any one of claims 4 to 6, wherein the C-terminus of the Cas12a domain is linked to the N-terminus of any one of the amino acid sequences of SEQ ID NOs: 1 to 24, and the C-terminus of that amino acid sequence is linked to the N-terminus of the polypeptide of interest.
[Claim 8] The fusion protein according to any one of claims 4 to 6, wherein the N-terminus of the Cas12a domain is linked to the C-terminus of any one of the amino acid sequences of SEQ ID NOs: 1 to 24, and the N-terminus of that amino acid sequence is linked to the C-terminus of the polypeptide of interest.
[Claim 9] The polypeptide according to claim 2 or the fusion protein according to any one of claims 4 to 8, wherein the polypeptide of interest comprises a protein domain having deaminase (deamination) activity (e.g., cytosine deaminase, adenine deaminase), nickase activity, recombinase activity, transposase activity, methylase activity, glycosylase (DNA glycosylase) activity, glycosylase inhibitory activity (e.g., uracil-DNA glycosylase inhibitor (UGI)), demethylase activity, transcriptional activation activity, transcriptional repression activity, transcriptional deactivation factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, decongestant activity, restriction endonuclease activity (e.g., Fok1), nucleic acid binding activity, methyltransferase activity, DNA repair activity, DNA damage activity, dismutase activity, alkylation activity, depurination activity, oxidative activity, pyrimidine dimer formation activity, integrase activity, transposase activity, polymerase activity, ligase activity, helicase activity, and/or photolyase activity.
[Claim 10] The polypeptide according to claim 2 or claim 3, or the fusion protein according to any one of claims 4 to 8, wherein the polypeptide of interest comprises an adenine deaminase domain.
[Claim 11] The polypeptide according to any one of claims 2, 3, or 10, or the fusion protein according to any one of claims 4 to 10, wherein the adenine deaminase domain is a tRNA-specific adenosine deaminase (TadA) and/or an evolved tRNA-specific adenosine deaminase (TadA*).
[Claim 12] A polynucleotide encoding the polypeptide according to any one of claims 1 to 3, 5, or 9 to 11, or the fusion protein according to any one of claims 4 to 11.
[Claim 13] The polynucleotide according to claim 12, wherein the polynucleotide is codon-optimized for expression in an organism.
[Claim 14] The polynucleotide according to claim 13, wherein the organism is an animal, plant, fungus, archaeon, or bacterium.
[Claim 15] A complex comprising the fusion protein according to any one of claims 4 to 9 and a guide nucleic acid.
[Claim 16] A complex comprising the fusion protein according to claim 10 or claim 11 and a guide nucleic acid.
[Claim 17] A nucleic acid construct encoding the complex according to claim 15 or claim 16.
[Claim 18] A composition comprising (a) the polypeptide according to any one of claims 1 to 3, 5, or 9, or the fusion protein according to any one of claims 4 to 9, and (b) a guide nucleic acid.
[Claim 19] An expression cassette or vector comprising the polynucleotide according to any one of claims 12 to 14 or the nucleic acid construct according to claim 17.
[Claim 20] A Type V Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) (CRISPR-Cas) system comprising: (a) a fusion protein comprising a Cas12a domain, a linker comprising any one of the amino acid sequences of SEQ ID NOs: 1 to 24, and a polypeptide of interest, wherein the Cas12a domain is linked to the polypeptide of interest via the amino acid sequence of any one of SEQ ID NOs: 1 to 24, or a nucleic acid encoding the fusion protein; and (b) a guide nucleic acid comprising a spacer sequence and a repeat sequence, wherein the guide nucleic acid is capable of forming a complex with the Cas12a domain of the fusion protein and the spacer sequence is capable of hybridizing to a target nucleic acid, thereby guiding the Cas12a domain and the polypeptide of interest to the target nucleic acid, whereby the system modifies (e.g., cleaves or edits) or modulates (e.g., modulates transcription of) the target nucleic acid.
[Claim 21] The system according to claim 20, wherein the Cas12a domain comprises a mutation in the nuclease active site.
[Claim 22] The system according to claim 20 or 21, wherein the Cas12a domain is linked at its C-terminus and/or N-terminus to any one of the amino acid sequences of SEQ ID NOs: 1 to 24.
[Claim 23] The system according to any one of claims 20 to 22, wherein the C-terminus of the Cas12a domain is linked to the N-terminus of the polypeptide of interest via any one of the amino acid sequences of SEQ ID NOs: 1 to 24.
[Claim 24] The system according to any one of claims 20 to 22, wherein the N-terminus of the Cas12a domain is linked to the C-terminus of the polypeptide of interest via any one of the amino acid sequences of SEQ ID NOs: 1 to 24.
[Claim 25] The system according to any one of claims 20 to 24, wherein the polypeptide of interest comprises at least one polypeptide or protein domain having deaminase (deamination) activity, nickase activity, recombinase activity, transposase activity, methylase activity, glycosylase (DNA glycosylase) activity, glycosylase inhibitory activity (e.g., uracil-DNA glycosylase inhibitor (UGI)), demethylase activity, transcriptional activation activity, transcriptional repression activity, transcriptional deactivation factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, restriction endonuclease activity (e.g., Fok1), nucleic acid binding activity, methyltransferase activity, DNA repair activity, DNA damage activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer formation activity, integrase activity, transposase activity, polymerase activity, ligase activity, helicase activity, and/or photolyase activity.
[Claim 26] The system according to any one of claims 20 to 25, wherein the polypeptide of interest is an adenine deaminase or an adenine deaminase domain.
[Claim 27] The system according to claim 26, wherein the adenine deaminase is a tRNA-specific adenosine deaminase (TadA).
[Claim 28] The system according to claim 26, wherein the polypeptide of interest is an adenine deaminase dimer (e.g., a first adenine deaminase and a second adenine deaminase).
[Claim 29] The system according to any one of claims 20 to 28, wherein one or both of (a) and (b) are comprised in one or more expression cassettes and/or vectors.
[Claim 30] A cell comprising the polynucleotide according to any one of claims 12 to 14, the nucleic acid construct according to claim 17, the expression cassette or vector according to claim 19, or the system according to any one of claims 20 to 29.
[Claim 31] A method for modifying a target nucleic acid, comprising contacting the target nucleic acid with: (a)(i) the fusion protein according to any one of claims 4 to 11 and (a)(ii) a guide nucleic acid; (b) the complex according to claim 15 or 16; (c) a composition comprising the fusion protein according to any one of claims 4 to 11 and a guide nucleic acid; and/or (d) the system according to any one of claims 20 to 28, thereby modifying the target nucleic acid.
[Claim 32] A method for modifying a target nucleic acid, comprising contacting a cell or cell-free system comprising the target nucleic acid with: (a)(i) the polypeptide according to claim 3 or claim 5, or a polynucleotide encoding the fusion protein according to any one of claims 4 to 11, or an expression cassette or vector comprising the same, and (a)(ii) a guide nucleic acid or an expression cassette or vector comprising the same; and/or (b) a nucleic acid construct encoding the complex according to claim 15 or claim 16 and/or an expression cassette or vector comprising the same, under conditions in which the fusion protein is expressed and forms a complex with the guide nucleic acid, wherein the complex hybridizes with the target nucleic acid, thereby modifying the target nucleic acid.
[Claim 33] A method for editing a target nucleic acid, comprising contacting the target nucleic acid with: (a)(i) the fusion protein according to claim 10 or claim 11 and (a)(ii) a guide nucleic acid; (b) the complex according to claim 16; (c) a composition comprising (c)(i) the fusion protein according to claim 10 or claim 11 and (c)(ii) a guide nucleic acid; and/or (d)(i) the system according to any one of claims 26 to 28, wherein the adenine deaminase domain converts adenosine (A) in the target nucleic acid to guanine (G), thereby editing the target nucleic acid and producing a mutation (e.g., a point mutation).
[Claim 34] A method for editing a target nucleic acid, comprising contacting a cell or cell-free system comprising the target nucleic acid with: (a)(i) a polynucleotide encoding the fusion protein according to claim 10 or claim 11, or an expression cassette or vector comprising the same, and (a)(ii) a guide nucleic acid or an expression cassette or vector comprising the same; (b) a nucleic acid construct encoding the complex according to claim 16, or an expression cassette or vector comprising the same; and/or (c) the system according to claim 29, under conditions in which the fusion protein is expressed and forms a complex with the guide nucleic acid, wherein the complex hybridizes with the target nucleic acid and the adenine deaminase domain converts adenosine (A) in the target nucleic acid to guanine (G), thereby editing the target nucleic acid and producing a (point) mutation.
[Claim 35] The method according to claim 33 or 34, wherein the point mutation is an A→G conversion in the sense (e.g., "+"; template) strand of the target nucleic acid, or a T→C conversion in the antisense (e.g., "-"; complementary) strand of the target nucleic acid.
[Claim 36] A fusion protein comprising: (a) a Cas12a domain that specifically binds to a target nucleic acid sequence when bound to a guide nucleic acid (e.g., a gRNA); (b) a first adenine deaminase domain; and (c) a second adenine deaminase domain, wherein, when present together with the Cas12a domain and the gRNA, the first and second adenine deaminase domains deaminate adenosine bases in a single-stranded portion of the target nucleic acid sequence, and wherein the Cas12a domain is linked to the first adenine deaminase domain or the second adenine deaminase domain via any one of the amino acid sequences of SEQ ID NOs: 1 to 24.
[Claim 37] The fusion protein according to claim 36, wherein the first adenine deaminase domain is a wild-type adenine deaminase domain and the second adenine deaminase domain is a mutated/evolved adenine deaminase domain, and wherein: the C-terminus of the Cas12a domain is linked to the N-terminus of the second adenine deaminase domain via any one of the amino acid sequences of SEQ ID NOs: 11 to 15; the C-terminus of the Cas12a domain is linked to the N-terminus of the first adenine deaminase domain via any one of the amino acid sequences of SEQ ID NOs: 11 to 24; or the N-terminus of the Cas12a domain is linked to the C-terminus of the second adenine deaminase domain via any one of the amino acid sequences of SEQ ID NOs: 1 to 10.
[Claim 38] A fusion protein comprising: (a) a first adenine deaminase domain; (b) a second adenine deaminase domain that differs from the first adenine deaminase domain; and (c) a Cas12a (Cpf1) domain comprising a mutation in the nuclease active site, wherein the C-terminus of the first adenine deaminase domain is linked to the N-terminus of the second adenine deaminase domain, and the N-terminus of the Cas12a domain is linked to the C-terminus of the second adenine deaminase domain via any one of the amino acid sequences of SEQ ID NOs: 1 to 10.
[Claim 39] The fusion protein according to any one of claims 36 to 38, wherein the first adenine deaminase domain is a wild-type adenosine deaminase domain.
[Claim 40] The fusion protein according to any one of claims 36 to 39, wherein the second adenine deaminase domain is a mutated/evolved adenosine deaminase domain.
[Claim 41] A fusion protein comprising: (a) a Cas12a (Cpf1) domain; (b) a first adenine deaminase domain; and (c) a second adenine deaminase domain that differs from the first adenine deaminase domain, wherein the C-terminus of the first adenine deaminase domain is linked to the N-terminus of the second adenine deaminase domain and the C-terminus of the Cas12a domain is linked to the N-terminus of the first adenine deaminase domain, and wherein, if the first adenine deaminase domain is a wild-type adenine deaminase domain, the Cas12a domain is linked to the N-terminus of the first adenine deaminase domain via any one of the amino acid sequences of SEQ ID NOs: 11 to 24, and, if the first adenine deaminase domain is a mutated/evolved adenine deaminase domain, the Cas12a domain is linked to the N-terminus of the first adenine deaminase domain via any one of the amino acid sequences of SEQ ID NOs: 11 to 15.
[Claim 42] The fusion protein according to claim 41, wherein the first adenine deaminase domain is a wild-type tRNA-specific adenosine deaminase domain or a mutated/evolved tRNA-specific adenosine deaminase domain.
[Claim 43] The fusion protein according to claim 41 or claim 42, wherein the second adenine deaminase domain is a wild-type tRNA-specific adenosine deaminase domain or a mutated/evolved tRNA-specific adenosine deaminase domain.
[Claim 44] The fusion protein according to any one of claims 39, 42, or 43, wherein the wild-type tRNA-specific adenosine deaminase is wild-type Escherichia coli TadA.
[Claim 45] The fusion protein according to any one of claims 40, 42, or 43, wherein the mutated/evolved tRNA-specific adenosine deaminase domain is an evolved E. coli TadA (TadA*).
[Claim 46] The fusion protein according to claim 44, wherein the Escherichia coli TadA comprises the amino acid sequence of SEQ ID NO: 47.
[Claim 47] The fusion protein according to claim 45, wherein the E. coli TadA* comprises the amino acid sequence of SEQ ID NO: 48.
[Claim 48] The fusion protein according to any one of claims 36 to 47, wherein the Cas12a domain comprises a mutation in the nuclease active site.
[Claim 49] The fusion protein according to any one of claims 36 to 48, wherein the first adenine deaminase domain is linked to the second adenine deaminase domain via a linker.
[Claim 50] The fusion protein according to claim 49, wherein the linker is a GS linker.
[Claim 51] The fusion protein according to claim 50, wherein the GS linker is (GSS)n, S(GGS)n (SEQ ID NO: 25), SGGS (SEQ ID NO: 26), SGGSGGSGGS (SEQ ID NO: 27), SGSETPGTSESATPES (SEQ ID NO: 28), and/or SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 29).
[Claim 52] The fusion protein according to claim 50, wherein the linker is SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 29).
[Claim 53] A polynucleotide encoding the fusion protein according to any one of claims 36 to 52.
[Claim 54] The polynucleotide according to claim 53, wherein the polynucleotide is codon-optimized for expression in an organism.
[Claim 55] A complex comprising the fusion protein according to any one of claims 36 to 52 and a guide nucleic acid.
[Claim 56] A nucleic acid construct encoding the complex according to claim 55.
[Claim 57] An expression cassette or vector comprising the polynucleotide according to claim 53 or claim 54 or the nucleic acid construct according to claim 56.
[Claim 58] A cell comprising the polynucleotide according to claim 53 or claim 54, the nucleic acid construct according to claim 56, or the expression cassette or vector according to claim 57.
[Claim 59] A composition comprising the fusion protein according to any one of claims 36 to 52 and a guide nucleic acid.
[Claim 60] A method for editing a target nucleic acid, comprising contacting the target nucleic acid with: (a)(i) the fusion protein according to any one of claims 36 to 52 and (a)(ii) a guide nucleic acid; (b) the complex according to claim 55; and/or (c) the composition according to claim 59, wherein the adenine deaminase domain converts adenosine (A) in the target nucleic acid to guanine (G), thereby editing the target nucleic acid and producing a (point) mutation in the target nucleic acid.
[Claim 61] A method for editing a target nucleic acid, comprising contacting a cell or cell-free system comprising the target nucleic acid with: (a)(i) the polynucleotide according to claim 53 or claim 54 and (a)(ii) a guide nucleic acid, and/or an expression cassette or vector comprising (a)(i) and/or (a)(ii); and/or (b) the nucleic acid construct according to claim 56 and/or an expression cassette or vector comprising the same, under conditions in which the fusion protein is expressed and forms a complex with the guide nucleic acid, wherein the complex hybridizes with the target nucleic acid and the adenine deaminase domain converts adenosine (A) in the target nucleic acid to guanine (G), thereby editing the target nucleic acid and producing a (point) mutation in the target nucleic acid.
[Claim 62] The method according to claim 60 or 61, wherein the point mutation is an A→G conversion in the sense (e.g., "+"; template) strand of the target nucleic acid, or a T→C conversion in the antisense (e.g., "-"; complementary) strand of the target nucleic acid.
[Claim 63] The method according to claim 60 or claim 61, wherein the guide nucleic acid comprises, from 5' to 3', a repeat sequence and a spacer sequence, and the spacer sequence is 70% to 100% complementary to the target nucleic acid (protospacer).
[Claim 64] The method according to any one of claims 60 to 63, wherein the target nucleic acid is adjacent to a protospacer adjacent motif (PAM).
[Claim 65] The method according to claim 64, wherein the PAM comprises the nucleotide sequence 5'-TTN, 5'-TTTV, or 5'-TTTN.
[Claim 66] A kit comprising a polypeptide according to any one of claims 1 to 3 and/or a fusion protein according to any one of claims 4 to 11 or 36 to 52, optionally together with instructions for its use.
[Claim 67] A kit comprising a polynucleotide and/or an expression cassette or vector containing it, optionally together with instructions for use.
[Claim 68] The kit according to claim 66 or 67, further comprising a Cas12a guide nucleic acid and/or an expression cassette or vector containing it.
[Claim 69] The kit according to claim 68, wherein the guide nucleic acid includes a cloning site for cloning a nucleic acid sequence identical or complementary to a target nucleic acid sequence into the backbone of the guide nucleic acid.
[Claim 70] The kit according to any one of claims 66 to 69, further comprising one or more nuclear localization signals fused to the fusion protein, or a polynucleotide encoding such signals.
[Claim 71] The kit according to any one of claims 66 to 70, wherein the polynucleotide further encodes one or more selection markers.
[Claim 72] The kit according to any one of claims 66 to 71, wherein the polynucleotide is mRNA and encodes one or more introns within the encoded fusion protein.
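Claim 65 uses IUPAC nucleotide codes, where N is any base and V is A, C, or G (not T). Below is a minimal sketch of scanning a single strand for these PAMs with Python's standard `re` module. The function names and the default 20-nt protospacer length are illustrative assumptions. The sketch takes the Cas12a protospacer to start immediately 3' of the 5' T-rich PAM on the same strand.

```python
import re

# IUPAC codes needed for the PAMs in claim 65.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "V": "[ACG]", "N": "[ACGT]"}


def find_pam_sites(seq: str, pam: str) -> list[int]:
    """0-based start positions of every PAM match, including overlapping ones."""
    pattern = "".join(IUPAC[code] for code in pam)
    # A zero-width lookahead lets re.finditer report overlapping matches.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


def candidate_protospacers(seq: str, pam: str, length: int = 20) -> list[str]:
    """Protospacers of the assumed length starting right after each PAM (3' side)."""
    seq = seq.upper()
    hits = []
    for start in find_pam_sites(seq, pam):
        begin = start + len(pam)
        if begin + length <= len(seq):
            hits.append(seq[begin:begin + length])
    return hits


dna = "GTTTACGTTTTG"
print(find_pam_sites(dna, "TTTV"))  # prints [1, 8]
print(find_pam_sites(dna, "TTTN"))  # prints [1, 7, 8]
print(candidate_protospacers(dna, "TTTV", length=3))  # prints ['CGT']
```

Only the given strand is searched. Scanning the reverse complement as well would find PAMs on the other strand.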

Claims

1. A fusion protein comprising a Cas12a domain, an adenine deaminase domain, and any one of the amino acid sequences of SEQ ID NOs: 1 to 24.

2. The fusion protein according to claim 1, wherein the Cas12a domain contains a mutation within the nuclease active site.

3. The fusion protein according to claim 1 or 2, wherein the Cas12a domain is linked at its C-terminus or N-terminus to one of the amino acid sequences of SEQ ID NOs: 1 to 24.

4. The fusion protein according to any one of claims 1 to 3, further comprising a polypeptide of interest, wherein the polypeptide of interest comprises a protein domain having nickase activity, recombinase activity, transposase activity, methylase activity, glycosylase activity, glycosylase inhibitory activity, demethylase activity, transcriptional activation activity, transcriptional repression activity, transcriptional deactivation factor activity, histone modification activity, nuclease activity, single-strand RNA cleavage activity, double-strand RNA cleavage activity, restriction endonuclease activity, nucleic acid binding activity, methyltransferase activity, DNA repair activity, DNA damage activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer formation activity, integrase activity, transposase activity, polymerase activity, ligase activity, helicase activity, and/or photolyase activity.

5. The fusion protein according to any one of claims 1 to 4, wherein the adenine deaminase domain is TadA (tRNA-specific adenosine deaminase) and/or TadA* (evolved tRNA-specific adenosine deaminase).

6. The fusion protein according to any one of claims 1 to 5, wherein the fusion protein comprises the amino acid sequence of any one of SEQ ID NOs: 49 to 77 or SEQ ID NOs: 90 to 96.

7. The fusion protein according to any one of claims 1 to 6, wherein the adenine deaminase domain comprises a first adenine deaminase domain and a second adenine deaminase domain that differs from the first adenine deaminase domain, and the Cas12a domain contains a mutation in the nuclease active site; the C-terminus of the first adenine deaminase domain is linked to the N-terminus of the second adenine deaminase domain; the C-terminus of the second adenine deaminase domain is linked to the N-terminus of the Cas12a domain via any one of the amino acid sequences of SEQ ID NOs: 1 to 10; and, optionally, the first adenine deaminase domain is a wild-type adenosine deaminase domain and/or the second adenine deaminase domain is a mutated/evolved adenosine deaminase domain.

8. The fusion protein according to any one of claims 1 to 6, wherein the adenine deaminase domain comprises a first adenine deaminase domain and a second adenine deaminase domain that differs from the first adenine deaminase domain; the C-terminus of the first adenine deaminase domain is linked to the N-terminus of the second adenine deaminase domain; and the C-terminus of the Cas12a domain is linked to the N-terminus of the first adenine deaminase domain via any one of the amino acid sequences of SEQ ID NOs: 11 to 24 when the first adenine deaminase domain is a wild-type adenosine deaminase domain, or via any one of the amino acid sequences of SEQ ID NOs: 11 to 15 when the first adenine deaminase domain is a mutated/evolved adenosine deaminase domain.
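Claims 7 and 8 describe two domain orders. They differ in where the Cas12a domain sits and in which linker SEQ ID NOs may join it to the deaminase pair. The following sketch encodes only those linker rules. The class and function names are hypothetical, and linkers are referred to by SEQ ID NO because their sequences are not reproduced in these claims. The linker between the two deaminase domains is omitted because claims 7 and 8 do not specify it.

```python
from enum import Enum


class Deaminase(Enum):
    WILD_TYPE = "TadA"   # wild-type adenosine deaminase domain
    EVOLVED = "TadA*"    # mutated/evolved adenosine deaminase domain


def allowed_cas12a_linkers(cas12a_first: bool, first: Deaminase) -> range:
    """SEQ ID NOs allowed between Cas12a and the deaminase pair (claims 7 and 8)."""
    if not cas12a_first:
        return range(1, 11)   # claim 7: SEQ ID NOs 1 to 10
    if first is Deaminase.WILD_TYPE:
        return range(11, 25)  # claim 8, wild-type first domain: 11 to 24
    return range(11, 16)      # claim 8, evolved first domain: 11 to 15


def domain_order(cas12a_first: bool, first: Deaminase, second: Deaminase,
                 linker_id: int) -> list[str]:
    """Return the N- to C-terminal domain order, or raise if the linker is not allowed."""
    if linker_id not in allowed_cas12a_linkers(cas12a_first, first):
        raise ValueError(f"SEQ ID NO: {linker_id} is not allowed for this layout")
    linker = f"linker (SEQ ID NO: {linker_id})"
    if cas12a_first:
        # Claim 8: Cas12a C-terminus -> linker -> first deaminase -> second deaminase
        return ["Cas12a", linker, first.value, second.value]
    # Claim 7: first deaminase -> second deaminase -> linker -> Cas12a N-terminus
    return [first.value, second.value, linker, "Cas12a"]


print(domain_order(False, Deaminase.WILD_TYPE, Deaminase.EVOLVED, 3))
# prints ['TadA', 'TadA*', 'linker (SEQ ID NO: 3)', 'Cas12a']
print(list(allowed_cas12a_linkers(True, Deaminase.EVOLVED)))
# prints [11, 12, 13, 14, 15]
```

Representing the linker choice as a range lookup keeps the conditional rule of claim 8 in one place. For example, `domain_order(True, Deaminase.EVOLVED, Deaminase.WILD_TYPE, 20)` raises because SEQ ID NO: 20 falls outside 11 to 15.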

9. The fusion protein according to claim 8, wherein the first adenine deaminase domain is wild-type tRNA-specific adenosine deaminase or a mutated/evolved tRNA-specific adenosine deaminase domain, and/or the second adenine deaminase domain is wild-type tRNA-specific adenosine deaminase or a mutated/evolved tRNA-specific adenosine deaminase domain, and optionally the wild-type tRNA-specific adenosine deaminase is wild-type Escherichia coli TadA, and/or the mutated/evolved tRNA-specific adenosine deaminase domain is evolved Escherichia coli TadA*.

10. The fusion protein according to claim 8, wherein the Cas12a domain contains a mutation in the nuclease active site.

11. A complex comprising the fusion protein according to any one of claims 1 to 10 and a guide nucleic acid.

12. An expression cassette or vector comprising a polynucleotide encoding a fusion protein according to any one of claims 1 to 10, wherein the Cas12a domain and the adenine deaminase domain are linked to each other by any one of the amino acid sequences of SEQ ID NOs: 1 to 24.

13. A cell comprising the expression cassette or vector according to claim 12.

14. The cell according to claim 13, wherein the cell is derived from an animal, plant, fungus, archaeon, or bacterium.

15. A cell comprising the fusion protein according to any one of claims 1 to 10.

16. The cell according to claim 15, wherein the cell is derived from an animal, plant, fungus, archaeon, or bacterium.

17. An in vitro method for modifying a target nucleic acid, comprising contacting the target nucleic acid with a fusion protein according to any one of claims 1 to 10, thereby modifying the target nucleic acid.

18. An in vitro method for editing a target nucleic acid, comprising contacting the target nucleic acid with the fusion protein according to any one of claims 1 to 10 and a guide RNA, wherein the adenine deaminase domain converts adenosine (A) in the target nucleic acid to guanine (G), thereby editing the target nucleic acid and causing a mutation.

19. A cell comprising the fusion protein described in claim 8.