Cas fusion proteins and related methods for site-specific integration

JP2026507938A5 · Pending · Publication Date: 2026-07-01 · SYNGENTA CROP PROTECTION AG

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SYNGENTA CROP PROTECTION AG
Filing Date
2023-06-23
Publication Date
2026-07-01

AI Technical Summary

Technical Problem

Site-specific nucleases, such as CRISPR/Cas systems, often lack accuracy and efficiency in targeted genome editing, leading to off-target editing and low frequency of desired insertion events.

Method used

A fusion protein is developed, comprising a site-specific nuclease fused to a recruiter domain with a site-specific DNA-binding domain, such as a Cro repressor family protein, to enhance the specificity and frequency of genome editing by tethering a donor polynucleotide to the target site.

Benefits of technology

The fusion protein increases the efficiency of site-specific integration by promoting spatial proximity between the cleavage site and the donor polynucleotide template, improving the accuracy and frequency of homologous recombination-mediated repair.

✦ Generated by Eureka AI based on patent content.

Abstract

Provided herein are fusion proteins and related methods and systems for improving the efficiency of genome editing using site-specific nucleases. The fusion proteins, systems, and methods can selectively increase the desired editing outcome (e.g., insertion of a donor polynucleotide sequence). Various useful compositions for producing and using the fusion proteins and implementing the methods are also provided.

Description

[Technical Field]

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to PCT Application No. PCT/CN2023/080827, filed March 10, 2023, which is incorporated by reference.

[0002] The present disclosure relates to methods for increasing site-specific integration. The methods presented herein are applicable to both non-homologous end joining (NHEJ) and homology-dependent repair (HDR) mechanisms.

SEQUENCE LISTING

[0003] This application is accompanied by a Sequence Listing entitled 82448SL.xml, created on March 6, 2023, which is approximately 62.5 kilobytes in size and is hereby incorporated by reference in its entirety.

[Background technology]

[0004] Site-specific nucleases (SDNs) (e.g., zinc finger nucleases, transcription activator-like effector nucleases, and CRISPR-associated nucleases) are becoming increasingly popular in the gene editing space. These SDNs act as endonucleases and generally create double-strand breaks (DSBs) at specific DNA sequences, thereby activating the cell's intrinsic repair mechanisms (e.g., homologous recombination). The repair process can result in site-specific modifications to the specific DNA sequence. The CRISPR (clustered regularly interspaced short palindromic repeats)/Cas (CRISPR-associated) system evolved in bacteria and archaea as an adaptive immune system to defend against viral attacks. In recent years, the CRISPR/Cas system has attracted particular attention as a genome editing tool. The CRISPR/Cas system, which generates site-specific DSBs, can be used to edit the DNA of eukaryotic cells, for example, by creating deletions, insertions, and/or changes in the nucleotide sequence.

Summary of the Invention

[Problem to be solved by the invention]

[0005] Site-specific modification induced by SDNs often lacks accuracy (for example, off-target editing may occur) and, in many cases, occurs at a low frequency. For example, when a CRISPR/Cas system is configured to use a donor template to effect site-specific integration, the specificity of DSB targeting may vary, and the frequency of the desired insertion event may be low. Therefore, there is a need for methods that improve the efficiency of targeted genome editing using SDNs.

[Means for solving the problem]

[0006] This Summary is provided to introduce selected concepts that are further described below in the Detailed Description. This Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in limiting the scope of the claimed subject matter.

[0007] In one aspect, the disclosure provides a fusion protein comprising a site-specific nuclease fused to a recruiter domain comprising a site-specific DNA-binding domain.

[0008] In some embodiments, the site-specific nuclease comprises a CRISPR-associated nuclease. In some embodiments, the CRISPR-associated nuclease is selected from the group consisting of Cas5, Cas6, Cas7, Cas8, Cas9, Cas12a, Cas12b, Cas12i, Cas12j, Cas12L, Cas12e, Cas12c, Cas12d, Cas12g, Cas12h, TnpB, Cas13a, Cas13b, Cas14, and nickase or inactivated versions thereof. In some embodiments, the CRISPR-associated nuclease is a Cas9 enzyme. In some embodiments, the CRISPR-associated nuclease is a Cas12a enzyme.

[0009] In some embodiments, the recruiter domain is a Cro repressor family protein. In some embodiments, the Cro repressor family protein comprises N15 Cro, lambda Cro, P22 Cro, 434 Cro, or any combination thereof. In some embodiments, the recruiter domain comprises an amino acid sequence having at least 90% identity to any one of SEQ ID NOs: 1-4. In some embodiments, the recruiter domain comprises a dimerization domain.

[0010] In some embodiments, the fusion protein comprises a linker positioned between the site-specific nuclease and the recruiter domain. In some embodiments, the linker comprises any one of SEQ ID NOs: 6, 7, 15, or 16. In some embodiments, the fusion protein comprises a nuclear localization signal.

[0011] In some embodiments, the fusion protein comprises an amino acid sequence having at least 90% identity to SEQ ID NO:11 or 13.

[0012] In another aspect, the disclosure provides a recombinant nucleic acid encoding a fusion protein, the fusion protein comprising a site-specific nuclease fused to a recruiter domain comprising a site-specific DNA-binding domain.

[0013] In another aspect, the disclosure provides a DNA construct comprising a promoter operably linked to a recombinant nucleic acid. In some embodiments, the promoter comprises at least one of an inducible promoter, a constitutive promoter, an egg cell-specific promoter, a pollen-specific promoter, or an apical meristem-specific promoter. In some embodiments, the promoter is a ubiquitin 4 promoter, an actin promoter, a tubulin promoter, a MADS box promoter, or a plant virus promoter.

[0014] In another aspect, the disclosure provides a vector comprising the recombinant nucleic acid or DNA construct.

[0015] In another aspect, the disclosure provides a cell comprising the recombinant nucleic acid, DNA construct, or vector. In some embodiments, the cell is a plant cell. In some embodiments, the plant cell is a corn plant cell, a soybean plant cell, a rice plant cell, a wheat plant cell, or a sunflower plant cell.

[0016] In another aspect, the present disclosure provides a method of editing a nucleic acid, the method comprising: (a) providing at least one fusion protein described herein; (b) providing a nucleic acid, the nucleic acid comprising a first binding site and a target region comprising a portion of the nucleic acid, the first binding site being within or adjacent to the target region; (c) providing a donor polynucleotide comprising a donor nucleotide region and at least one recruit sequence that is specifically bound by a recruiter domain of the at least one fusion protein; and (d) contacting the nucleic acid and donor polynucleotide with at least one fusion protein, wherein the at least one fusion protein specifically binds to the first binding site of the nucleic acid and the recruit sequence of the donor polynucleotide, thereby effecting editing of the target region of the nucleic acid.

[0017] In some embodiments, the first binding site is adjacent to the 5' or 3' end of the target region. In some embodiments, editing the target region of the nucleic acid replaces at least a portion of the target region with at least a portion of the donor polynucleotide.

[0018] In some embodiments, the nucleic acid further comprises a second binding site, the second binding site being within or adjacent to the target region, and at least one fusion protein specifically binds to the first binding site and the second binding site of the nucleic acid. In some embodiments, the second binding site is adjacent to the 5' or 3' end of the target region.

[0019] In some embodiments, the recruiter domain of the fusion protein comprises a Cro repressor family protein, and at least one recruitment sequence comprises a Cro OR3 operator sequence. In some embodiments, the Cro OR3 operator sequence comprises an N15 OR3 operator sequence (optionally SEQ ID NO: 18), a lambda OR3 operator sequence, a P22 OR3 operator sequence, a 434 OR3 operator sequence, or a combination thereof.

[0020] In some embodiments, the donor polynucleotide comprises at least one homology arm, wherein the at least one homology arm comprises a nucleotide sequence having complementarity to a portion of the target region of the nucleic acid. In some embodiments, the donor polynucleotide comprises at least two recruit sequences. In some embodiments, the donor polynucleotide comprises a first recruit sequence adjacent to the 5' end of the donor nucleotide region and a second recruit sequence adjacent to the 3' end of the donor nucleotide region. In some embodiments, the at least two recruit sequences are not within the donor nucleotide region.
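
The donor layout described in this paragraph can be sketched as a small data structure. This is only an illustrative model, not the applicant's construct: the class name, field names, and all sequences are hypothetical placeholders (in particular, the recruit sequence shown is not SEQ ID NO: 18), and the sketch assumes the "donor nucleotide region" spans the left homology arm, the insertion sequence, and the right homology arm, with one recruit sequence on each side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DonorPolynucleotide:
    """Toy model of a donor; every sequence stored here is a placeholder."""

    recruit_5p: str          # recruit sequence 5' of the donor nucleotide region
    left_homology_arm: str   # LHA, homologous to the target region
    insert: str              # insertion/replacement sequence
    right_homology_arm: str  # RHA, homologous to the target region
    recruit_3p: str          # recruit sequence 3' of the donor nucleotide region

    def elements(self):
        """Return the elements in assumed 5'-to-3' order."""
        return [
            ("recruit_5p", self.recruit_5p),
            ("left_homology_arm", self.left_homology_arm),
            ("insert", self.insert),
            ("right_homology_arm", self.right_homology_arm),
            ("recruit_3p", self.recruit_3p),
        ]

    def sequence(self) -> str:
        """Concatenate all elements into one donor sequence."""
        return "".join(seq for _, seq in self.elements())

    def layout(self):
        """Return (name, start, end) half-open spans for each element."""
        spans, pos = [], 0
        for name, seq in self.elements():
            spans.append((name, pos, pos + len(seq)))
            pos += len(seq)
        return spans


# Hypothetical placeholder sequences, chosen only to make the spans easy to read.
donor = DonorPolynucleotide(
    recruit_5p="ACGTACGT",
    left_homology_arm="GGGGCCCC",
    insert="ATATAT",
    right_homology_arm="TTTTAAAA",
    recruit_3p="ACGTACGT",
)

for name, start, end in donor.layout():
    print(f"{name}: {start}-{end}")
```

Under this assumed layout, both recruit sequences fall outside the donor nucleotide region, which matches the last embodiment in the paragraph.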

[0021] In some embodiments, the site-specific nuclease of the at least one fusion protein comprises a CRISPR-associated nuclease, and the method further comprises providing at least one guide RNA, wherein the at least one guide RNA comprises a nucleotide sequence having complementarity to the first binding site and / or the second binding site of the nucleic acid.

[0022] This application includes the following figures, which illustrate certain embodiments and/or features of the present compositions and methods and are intended to supplement any one or more descriptions of the present compositions and methods. These figures do not limit the scope of the present compositions and methods, except where expressly indicated to be so by the description herein.

[Brief explanation of the drawings]

[0023]

[Figure 1] Figure 1 shows a schematic diagram of some embodiments of donor polynucleotides described herein, according to aspects of the present disclosure. Shown is a donor DNA polynucleotide ("donor DNA") with a portion ("insertion/replacement sequence") designed for insertion at a target site, as described herein. Also shown are the left and right homology arms ("LHA" and "RHA," respectively) and the Cro OR3 recruitment sequence ("Cro OR3").

[Figure 2] Figure 2 shows a schematic diagram of one embodiment of the methods provided herein, according to aspects of the present disclosure. Shown are a nucleic acid ("genomic DNA") bearing a target region and a donor DNA polynucleotide ("donor DNA") bearing a portion ("insertion/replacement sequence") designed for insertion into the target region, as described herein. The left and right homology arms ("LHA" and "RHA," respectively) and the Cro OR3 recruitment sequence ("Cro OR3") are also shown. Potential cleavage sites ("on-target cut" and "off-target cut"), which may be more or less useful in certain embodiments provided herein, are shown in the genomic DNA locus.

[Figure 3] Figure 3 shows a schematic diagram of two fusion proteins provided herein localized to a target site in a genomic DNA sequence and tethered to a donor DNA sequence, according to an embodiment of the disclosure.

[Figure 4] Figure 4 shows a schematic diagram of a Cas9-N15 Cro fusion protein according to an embodiment of the present disclosure.

[Figure 5] Figure 5 shows a schematic diagram of a Cas12a-N15 Cro fusion protein according to an embodiment of the present disclosure.

DETAILED DESCRIPTION OF THE INVENTION

[0024] The following description describes various aspects and embodiments of the present compositions and methods. The specific embodiments are not intended to define the scope of the present compositions and methods. Rather, the embodiments merely provide non-limiting examples of various compositions and methods that fall at least within the scope of the disclosed compositions and methods. The description should be read from the perspective of one of ordinary skill in the art, and therefore may not necessarily include information that is known to one of ordinary skill in the art.

I. Terminology

[0025] All technical and scientific terms used herein are intended to have the same meaning as commonly understood by those skilled in the art, unless otherwise defined below. References to technology used herein are intended to refer to technology as commonly understood in the art, including variations of those technologies and/or equivalent technology substitutions that would be apparent to those skilled in the art. While the following terms are believed to be well understood by those skilled in the art, definitions are provided below to facilitate description of the subject matter of the present disclosure.

[0026] As used herein, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to an "enzyme" includes any combination of two or more such molecules, and the like.

[0027] As used herein, "and / or" refers to and includes any and all possible combinations of one or more of the associated listed items.

[0028] The term "about," as used herein, refers to the normal range of error for the respective value, readily known to one of ordinary skill in the art, e.g., ±20%, ±10%, or ±5% within the intended meaning of the recited value.

[0029] As used herein, the terms "comprising" or "comprises" are open-ended. When used in reference to a nucleic acid (or amino acid sequence) of interest, it refers to a nucleic acid sequence (or amino acid sequence) that includes the subject sequence as a part or entire sequence.

[0030] As used herein, the transitional phrase "consisting essentially of" means that the claim should be construed to include the specified materials or steps recited in the claim and materials or steps that do not materially affect the basic novel characteristic or characteristics of the claimed subject matter. Thus, it is intended that the term "consisting essentially of," when used in the claims of this disclosure, should not be construed as the equivalent of "comprise."

[0031] The term "plurality" refers to more than one entity. Thus, "plurality of individuals" refers to at least two individuals. In some embodiments, the term "plurality" refers to more than half of the total. For example, in some embodiments, "plurality of a population" refers to more than half of the members of the population.

[0032] The term "plant," as used herein, refers to any plant at any stage of development, particularly a seed plant. The term "plant cell," as used herein, refers to the structural and physiological unit of a plant, including a protoplast and a cell wall. A plant cell can be in the form of an isolated single cell or a cultured cell, or as part of a more highly organized unit, such as a plant tissue, a plant organ, or a whole plant. A plant cell can be derived from or part of an angiosperm or a gymnosperm. The plant cell can be a monocotyledonous plant cell (e.g., a corn cell, a rice cell, a sorghum cell, a sugarcane cell, a barley cell, a wheat cell, an oat cell, a turfgrass cell, or an ornamental herb cell) or a dicotyledonous plant cell (e.g., a tobacco cell, a pepper cell, an eggplant cell, a sunflower cell, a cruciferous plant cell, a flax cell, a potato cell, a cotton cell, a soybean cell, a sugarbeet cell, or a rapeseed cell). The term "plant cell culture," as used herein, refers to a culture of plant units at various stages of development, such as protoplasts, cell culture cells, cells of plant tissue, pollen, pollen tubes, ovules, embryo sacs, zygotes, and embryos. The term "plant tissue," as used herein, refers to a group of plant cells organized into a structural and functional unit. It includes any tissue of a plant, whether in planta or in culture. This term includes, but is not limited to, whole plants, plant organs, plant seeds, and the like. The term "plant part" includes single cells and cellular tissues, such as intact plant cells, tissue cultures, and any group of plant cells organized into a structural and/or functional unit. Use of these terms, with or without reference to any specific type of plant tissue listed above or otherwise encompassed by this definition, is not intended to exclude any other type of plant tissue.
The term "plant part," as used herein, refers to a part of a plant, including single cells and cellular tissues, such as intact plant cells in a plant, cell clumps, and tissue cultures from which plants can be regenerated. Examples of plant parts include, but are not limited to, single cells and tissues derived from pollen, ovules, zygotes, leaves, embryos, roots, root tips, anthers, flowers, inflorescences, fruits, stems, shoots, cuttings, and seeds, as well as pollen, ovules, egg cells, zygotes, leaves, embryos, roots, root tips, anthers, flowers, inflorescences, fruits, stems, shoots, cuttings, scions, rootstocks, seeds, protoplasts, calluses, etc.

[0033] The terms "polypeptide," "peptide," and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. As used herein, these terms encompass amino acid chains of any length, including full-length proteins, in which the amino acid residues are linked by covalent peptide bonds.

[0034] The terms "nucleic acid" and "polynucleotide" are used interchangeably and, as used herein, refer to deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) and polymers thereof in either single- or double-stranded form, as well as both sense and antisense strands of RNA, cDNA, genomic DNA, and mitochondrial DNA, as well as synthetic forms and mixed polymers of the above. In higher plants, DNA is the genetic material, while RNA is responsible for transferring the information contained within DNA to proteins. A "genome" is the entire body of genetic material contained in each cell of an organism. When RNA is described, it is understood that its corresponding cDNA is also described, and uridine is represented as thymidine. In certain embodiments, nucleotides refer to ribonucleotides, deoxynucleotides, or modified forms of either type of nucleotide, and combinations thereof. Additionally, polynucleotides disclosed herein can include either or both naturally occurring and modified nucleotides linked together by naturally occurring and / or non-naturally occurring nucleotide linkages. Nucleic acid molecules can be chemically or biochemically modified or can include non-natural or derivatized nucleotide bases, as one of ordinary skill in the art would readily understand. Such modifications include, for example, labels, methylation, substitution of one or more naturally occurring nucleotides with analogs, internucleotide modifications, such as uncharged linkages (e.g., methylphosphonates, phosphotriesters, phosphoramidates, carbamates, etc.), charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), pendant moieties (e.g., polypeptides), intercalators (e.g., acridine, psoralen, etc.), chelators, alkylating agents, and modified linkages (e.g., α-anomeric nucleic acids, etc.). 
The above terms are intended to encompass any topological conformation, including single-stranded, double-stranded, partially duplexed, triplexed, hairpinned, circular, and padlock conformations. Reference to a nucleic acid sequence includes its complement unless otherwise specified. Thus, a reference to a nucleic acid molecule having a particular sequence should be understood to encompass its complementary strand and its complementary sequence. Nucleotide sequences are "complementary" if they hybridize specifically in solution (e.g., according to Watson-Crick base-pairing rules). The term also includes codon-optimized nucleic acids that encode the same polypeptide sequence. It is also understood that nucleic acids can be crude, purified, or attached to synthetic materials, such as beads or column matrices.
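
As a minimal illustration of Watson-Crick complementarity for DNA, the sketch below derives the reverse complement of a sequence and checks whether two strands are perfectly complementary. The function names are hypothetical, only the four unambiguous DNA bases are handled, and perfect base pairing is a simplification: hybridization in solution also tolerates mismatches depending on conditions.

```python
# Translation table for the four unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_perfect_complement(strand_a: str, strand_b: str) -> bool:
    """True if strand_b (5' to 3') pairs with every base of strand_a (5' to 3')."""
    return reverse_complement(strand_a).upper() == strand_b.upper()


print(reverse_complement("ATGC"))             # GCAT
print(is_perfect_complement("ATGC", "GCAT"))  # True
```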

[0035] The term "corresponding to," in reference to nucleic acid sequences, means that when the nucleic acid sequences are aligned with one another, the nucleic acids that "correspond to" a given recited position in the present invention are those that align with that position in the reference sequence, but that are not necessarily at the same exact numerical position relative to a particular nucleic acid sequence of the present invention. Optimal alignment of sequences for comparison can be performed by computerized implementation of known algorithms or by visual inspection. Readily available sequence comparison and multiple sequence alignment algorithms are the Basic Local Alignment Search Tool (BLAST) and ClustalW/ClustalW2/Clustal Omega programs, respectively, available on the Internet (e.g., the EMBL-EBI website). Other suitable programs include, but are not limited to, GAP, BestFit, Plot Similarity, and FASTA, which are part of the Accelrys GCG package available from Accelrys, Inc., San Diego, Calif., United States of America. See also Smith & Waterman, 1981; Needleman & Wunsch, 1970; Pearson & Lipman, 1988; Ausubel et al., 1988; and Sambrook & Russell, 2001.

[0036] Unless otherwise indicated, a particular nucleic acid sequence implicitly encompasses its conservatively modified variants (e.g., degenerate codon substitutions), alleles, orthologs, SNPs, and complementary sequences along with the explicitly indicated sequence. Specifically, degenerate codon substitutions can be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and / or deoxyinosine residues. See Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); and Rossolini et al., Mol. Cell. Probes 8:91-98 (1994).

[0037] The terms "identity" or "substantial identity," when used in reference to a polynucleotide or polypeptide sequence described herein, refer to a sequence having at least 60% sequence identity with a reference sequence. Alternatively, the percent identity can be any integer between 60% and 100%. Exemplary embodiments include at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity when compared to a reference sequence using a program described herein, preferably BLAST with standard parameters as described below. One of skill in the art will recognize that these values can be appropriately adjusted to determine the corresponding identity of proteins encoded by two nucleotide sequences by taking into account codon degeneracy, amino acid similarity, reading frame alignment, and the like.

[0038] For sequence comparison, typically one sequence serves as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, the test and reference sequences are input into a computer, subsequence coordinates are designated if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence identity of the test sequence relative to the reference sequence, based on the program parameters.
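
The percent-identity calculation can be illustrated on two sequences that have already been aligned. This is a sketch under stated assumptions, not the BLAST computation: the function names are hypothetical, gaps are written as "-", and identity is taken as identical columns divided by all aligned columns. Alignment programs differ in which denominator they use, so reported values can differ.

```python
def percent_identity(aligned_ref: str, aligned_test: str, gap: str = "-") -> float:
    """Percent of aligned columns in which both sequences carry the same residue."""
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences have a gap; they carry no information.
    columns = [
        (r, t) for r, t in zip(aligned_ref, aligned_test)
        if not (r == gap and t == gap)
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for r, t in columns if r == t and r != gap)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_ref: str, aligned_test: str, threshold: float) -> bool:
    """True if the test sequence has at least `threshold` percent identity."""
    return percent_identity(aligned_ref, aligned_test) >= threshold


# Toy sequences: 9 of 10 aligned positions are identical.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))              # 90.0
print(meets_identity_threshold("ACGTACGTAC", "ACGTACGTAA", 90))  # True
```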

[0039] As used herein, the term "comparison window" refers to any segment of a number of consecutive positions selected from the group consisting of 20 to 600, usually about 50 to about 200, and more usually about 100 to about 150, where a sequence can be compared to a reference sequence of the same number of consecutive positions after the two sequences are optimally aligned. Methods for aligning sequences for comparison are well known in the art. Optimal alignment of sequences for comparison can be performed using the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2:482 (1981); the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970); the similarity search method of Pearson and Lipman, Proc. Natl. Acad. Sci. (USA) 85:2444 (1988); computer implementations of these algorithms (e.g., BLAST); or manual alignment and visual inspection.
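
For orientation, the global alignment approach attributed above to Needleman and Wunsch can be sketched as a dynamic-programming score computation. This is a simplified illustration: the function name and the scoring values (match +1, mismatch -1, linear gap penalty -1) are assumptions chosen for readability, not parameters taken from this disclosure or from any particular program, and only the optimal score is returned, not the alignment itself.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of a and b (linear gap penalty)."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("GATTACA", "GATTACA"))  # 7
print(needleman_wunsch_score("ACGT", "AGT"))         # 2
```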

[0040] Suitable algorithms for determining percent sequence identity and percent sequence similarity are the BLAST and BLAST 2.0 algorithms described in Altschul et al. (1990) J. Mol. Biol. 215:403-410 and Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402, respectively. Software for performing BLAST analyses is publicly available through the website of the National Center for Biotechnology Information (NCBI). This algorithm involves first identifying high-scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence that, when aligned with words of the same length in a database sequence, match or meet some positive threshold score T. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as possible to increase the cumulative alignment score. Cumulative scores are calculated using the parameters M (reward score for a pair of matching residues, always >0) and N (penalty score for mismatched residues, always <0) for nucleotide sequences. For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is stopped when the cumulative alignment score falls by an amount X from its maximum achieved value; the cumulative score falls below 0 due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a word size (W) of 28, an expectation (E) of 10, M=1, N=-2, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a word size (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix. See Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1992).
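
The extension step described in this paragraph can be sketched for nucleotide sequences as an ungapped, rightward-only extension of a seed. It is a teaching sketch rather than the BLAST implementation: the function name is hypothetical, the seed is scored with the same M and N values, the drop-off value X=5 is an arbitrary assumption, and extension to the left, neighborhood word lookup, and statistics are omitted.

```python
def extend_seed_right(query: str, subject: str, q_start: int, s_start: int,
                      seed_len: int, match: int = 1, mismatch: int = -2,
                      x_drop: int = 5):
    """Extend an ungapped seed to the right; return (best_length, best_score)."""
    def pair_score(offset: int) -> int:
        same = query[q_start + offset] == subject[s_start + offset]
        return match if same else mismatch

    # Score the seed itself (M for matches, N for mismatches).
    score = sum(pair_score(i) for i in range(seed_len))
    best_score, best_len = score, seed_len

    offset = seed_len
    # Stop condition 3: the end of either sequence is reached.
    while q_start + offset < len(query) and s_start + offset < len(subject):
        score += pair_score(offset)
        offset += 1
        if score > best_score:
            best_score, best_len = score, offset
        # Stop condition 1: score fell by X from its maximum.
        # Stop condition 2: cumulative score fell below zero.
        if best_score - score >= x_drop or score < 0:
            break
    return best_len, best_score


# Toy sequences: the first 8 bases match, then the sequences diverge.
query = "ACGTACGTTTTT"
subject = "ACGTACGTAAAA"
print(extend_seed_right(query, subject, 0, 0, seed_len=4))  # (8, 8)
```

The returned length and score describe the best-scoring prefix seen before the extension stopped, which is the segment that would be kept as a candidate high-scoring pair in this simplified model.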

[0041] The BLAST algorithm also performs a statistical analysis of the similarity between two sequences. See, e.g., Karlin & Altschul, Proc. Nat'l. Acad. Sci. USA 90:5873-5877 (1993). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability that a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered to be similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid with the reference nucleic acid is less than about 0.01, more preferably less than about 10⁻⁵, and most preferably less than about 10⁻²⁰.

[0042] "Recombination" is the exchange of DNA strands to create new nucleotide sequence configurations. This term can also refer to the homologous recombination process that occurs in the repair of double-stranded DNA breaks, in which a polynucleotide is used as a template to repair a homologous polynucleotide. This term can also refer to the exchange of information between two homologous chromosomes during meiosis. The frequency of double recombination is the product of the frequencies of the single recombinants. For example, recombinants in a 10 cM region can be found at a frequency of 10%, and double recombinants are found at a frequency of 10% × 10% = 1% (1 centimorgan is defined as 1% recombinant progeny in a testcross).
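
The arithmetic in this paragraph can be written out as a short calculation. The function name is hypothetical, and the sketch assumes, as the paragraph does, that 1 cM corresponds to a 1% recombination frequency and that the two crossovers occur independently (no interference), which is a reasonable approximation only for short map distances.

```python
def double_recombinant_frequency(region_a_cm: float, region_b_cm: float) -> float:
    """Expected double-recombinant frequency (as a fraction) for two regions."""
    freq_a = region_a_cm / 100.0  # 1 cM treated as 1% recombinant progeny
    freq_b = region_b_cm / 100.0
    return freq_a * freq_b        # product rule for independent events


# Two 10 cM regions: 10% x 10% = 1% expected double recombinants.
print(f"{double_recombinant_frequency(10, 10):.2%}")  # 1.00%
```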

[0043] A "gene" is a defined region located within a genome that contains, in addition to the aforementioned coding nucleic acid sequence, other primarily regulatory nucleic acid sequences involved in the control of expression, i.e., transcription and translation, of the coding portion. A gene can include both coding and non-coding regions (e.g., introns, regulatory elements, promoters, enhancers, termination sequences, and 5' and 3' untranslated regions). A gene typically expresses mRNA, functional RNA, or a specific protein and includes regulatory sequences. A gene may or may not be usable for the production of a functional protein. In some embodiments, a gene refers to only the coding region. The term "native gene" refers to a gene as found in nature. The term "chimeric gene" refers to any gene that: 1) contains regulatory and coding sequences that are not found together in nature, or 2) encodes portions of a protein that are not naturally contiguous, or 3) contains portions of a promoter that are not naturally contiguous. Thus, a chimeric gene can contain regulatory and coding sequences from different sources, or regulatory and coding sequences from the same source but arranged in a manner different from that found in nature. A gene may be "isolated," which refers to a nucleic acid molecule that is substantially or essentially free from components normally found associated with the nucleic acid molecule in nature, including other cellular material, culture medium from recombinant production, and / or various chemicals used to chemically synthesize the nucleic acid molecule.

[0044] A "gene of interest" or "nucleotide sequence of interest" refers to any gene that, when transferred into a plant, confers a desired characteristic on the plant, such as antibiotic resistance, viral resistance, insect resistance, disease resistance or resistance to other pests, herbicide resistance, improved nutritional value, improved performance in an industrial process, or altered reproductive ability. A "gene of interest" can also include one that is transferred to a plant to produce a commercially valuable enzyme or metabolite in the plant.

[0045] An "isolated" nucleic acid molecule or nucleotide sequence, or an "isolated" polypeptide, is a nucleic acid molecule, nucleotide sequence, or polypeptide that exists apart from its natural environment by the hand of man and / or has a different, modified, regulated, and / or altered function compared to its function in its natural environment, and is therefore not a product of nature. An isolated nucleic acid molecule or isolated polypeptide can exist in purified form or can exist in a non-native environment (e.g., a recombinant host cell). Thus, for example, with respect to a polynucleotide, the term isolated means that it is separated from the chromosome and / or cell in which it occurs in nature. A polynucleotide is also isolated if it is separated from the chromosome and / or cell in which it occurs in nature and then inserted into a genetic context, chromosome, chromosomal location, and / or cell that does not occur in nature. Recombinant nucleic acid molecules and nucleotide sequences of the present invention can be considered "isolated" as defined above.

[0046] Thus, an "isolated nucleic acid molecule" or "isolated nucleotide sequence" is a nucleic acid molecule or nucleotide sequence that is not immediately adjacent to the nucleotide sequences (one at the 5' end and one at the 3' end) to which it is immediately adjacent in the naturally occurring genome of the organism from which it originates. Thus, in one embodiment, an isolated nucleic acid includes some or all of the 5' non-coding (e.g., promoter) sequences immediately adjacent to the coding sequence. Thus, the term includes recombinant nucleic acids that are incorporated into, for example, a vector, an autonomously replicating plasmid, or virus, or the genomic DNA of a prokaryote or eukaryote, or that exist as a separate molecule independent of other sequences (e.g., a cDNA or genomic DNA fragment produced by PCR or restriction endonuclease treatment). It also includes recombinant nucleic acids that are part of a hybrid nucleic acid molecule that encodes an additional polypeptide or peptide sequence. An "isolated nucleic acid molecule" or "isolated nucleotide sequence" can include nucleotide sequences that are derived from and inserted into the same natural cell type of origin, but that exist in a non-natural state, e.g., in different copy numbers and / or under the control of regulatory sequences that differ from those found in the nucleic acid molecule's natural state.

[0047] The term "isolated" can further refer to a nucleic acid molecule, nucleotide sequence, polypeptide, peptide, or fragment (e.g., when produced by recombinant DNA technology) or chemical precursor or other chemical (e.g., when chemically synthesized) that is substantially free of cellular material, viral material, and / or culture medium. Furthermore, an "isolated fragment" is a fragment of a nucleic acid molecule, nucleotide sequence, or polypeptide that is not naturally occurring as a fragment and, as such, would not be found in the natural state. "Isolated" does not necessarily mean that the preparation is technically pure (homogeneous), but is sufficiently pure to provide the polypeptide or nucleic acid in a form that can be used for its intended purpose.

[0048] "Homology-dependent repair" or "homologous recombination repair" or "HDR" refers to a mechanism that repairs single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA) damage in cells. Cells can use this repair mechanism when an HDR template with a sequence highly homologous to the damaged site is available. The term "complete HDR" refers to a situation in which the genome-homologous junction of the replaced allele has undergone complete HDR, and "incomplete HDR" refers to a situation in which the genome-homologous junction of the replaced allele has undergone partial or incomplete HDR. In some embodiments, a donor polynucleotide molecule with homology to the cut target DNA sequence is used as a template for repair of the cut target DNA sequence, resulting in the transfer of genetic information from the donor polynucleotide to the target DNA. Thus, new nucleic acid material can be inserted / copied into the site. Optionally, the target DNA is contacted with a donor molecule, e.g., a donor polynucleotide molecule. Optionally, the donor polynucleotide molecule is introduced into the cell. Optionally, at least a segment of the donor polynucleotide molecule is integrated into the genome of the cell.

[0049] "Microhomology-mediated end joining," or "MMEJ," or "alternative non-homologous end joining" (Alt-NHEJ) refers to a form of double-strand break repair in DNA. This repair mechanism utilizes microhomology sequences to align the broken strands. "Non-homologous end joining" or "NHEJ" refers to a form of double-strand break repair in DNA. The double-strand break is repaired by directly ligating the broken ends to each other. Generally, in the absence of a donor polynucleotide, there is no insertion of new nucleic acid material at the site, although some nucleic acid material may be lost or added, resulting in small deletions or small insertions. In some embodiments, a donor polynucleotide molecule can be provided (e.g., introduced into a cell), and a portion of the donor polynucleotide can be inserted into the genome via MMEJ or NHEJ. Some embodiments of the methods provided herein increase the likelihood of insertion of the donor polynucleotide via tethering the donor polynucleotide to the target site, as described below.

II. Introduction

[0050] Provided herein are fusion proteins and related recombinant nucleic acids, systems, and methods for increasing the efficiency of genome editing using an SDN and donor polynucleotide tethering. The present disclosure is based, in part, on the inventors' discovery, demonstrated in the Examples herein, that 1) fusing an SDN to a recruiter domain containing a site-specific DNA-binding domain, and 2) using a donor polynucleotide homology repair template containing a binding site for the recruiter domain, results in an increased frequency of HDR. Without being bound by any particular theory, it is possible that the recruiter domain binds to the binding site on the donor polynucleotide template and thereby tethers the donor polynucleotide to the cleavage site (i.e., via fusion of the recruiter domain to the SDN that makes the cut). This tethering may in turn increase the likelihood of HDR-mediated repair of the break (e.g., by promoting spatial proximity between the cleavage site and the donor polynucleotide template).

III. Fusion Proteins

[0051] In one aspect, provided herein is a fusion protein comprising a site-specific nuclease linked to a recruiter domain comprising a site-specific DNA-binding domain. As used throughout, a "fusion protein" is a protein comprising two distinct polypeptide sequences, i.e., a site-specific nuclease polypeptide sequence and a recruiter domain polypeptide sequence, joined or linked to form a single polypeptide. In some embodiments, the two amino acid sequences are encoded by separate nucleic acid sequences that are joined together so that a single polypeptide is produced when they are transcribed and translated. The site-specific nuclease and recruiter domain can be linked in any order and orientation relative to each other. For example, the C-terminus of the site-specific nuclease can be linked to the N-terminus or C-terminus of the recruiter domain. The site-specific nuclease and recruiter domain can also be separated by one or more additional fusion protein domains, as described below.

A. Site-specific nucleases

[0052] The fusion proteins provided herein comprise a site-specific modifying polypeptide (e.g., a site-specific nuclease). The site-specific modifying polypeptide modifies a target DNA (e.g., by cleaving or methylating the target DNA) and / or modifies a polypeptide associated with the target DNA (e.g., by methylating or acetylating histone tails). In some embodiments, the site-specific modifying polypeptide interacts with a guide RNA, either a single RNA molecule or an RNA duplex of at least two RNA molecules, and is directed to a DNA sequence (e.g., a chromosomal sequence or an extrachromosomal sequence, such as an episomal sequence, a minicircle sequence, a mitochondrial sequence, a chloroplast sequence, etc.) by virtue of its association with the guide RNA. In some embodiments, the site-specific modifying polypeptide is a site-specific nuclease, which is capable of cleaving one or both strands of DNA at a designated target sequence.

[0053] The term "cleavage" or "cleaving" refers to the breaking of a covalent phosphodiester bond in the ribosyl-phosphodiester backbone of a polynucleotide and encompasses both single-strand and double-strand breaks. Double-strand breaks can occur as a result of two separate single-strand cleavage events. Cleavage can result in the creation of either blunt ends or overhanging ends (also known as sticky ends). A "nuclease cleavage site" or "genomic nuclease cleavage site" is a region of nucleotides within which a site-specific nuclease cleaves (e.g., upon binding to a proximal binding site). When the polynucleotide is DNA (e.g., genomic DNA), one or both strands can be cleaved at the nuclease cleavage site. Such cleavage by nuclease enzymes triggers DNA repair mechanisms within the cell, thereby establishing an environment in which homologous recombination can occur.

[0054] A variety of site-specific nucleases can be used in the fusion proteins, systems, and methods disclosed herein. Suitable nucleases include, but are not limited to, CRISPR-associated (Cas) proteins or Cas nucleases, zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), meganucleases, RNA-binding proteins (RBPs), CRISPR-associated RNA-binding proteins, recombinases, flippases, transposases, Argonaute (Ago) proteins (e.g., prokaryotic Argonaute (pAgo), archaeal Argonaute (aAgo), eukaryotic Argonaute (eAgo), and Natronobacterium gregoryi Argonaute (NgAgo)), adenosine deaminases acting on RNA (ADARs), CRISPR-Cas-inspired RNA targeting (CIRT) systems, Pumilio / fem-3 binding factors (PUFs), homing endonucleases, and any functional fragment, derivative, or variant thereof. Exemplary site-specific nucleases suitable for use in the fusion proteins, systems, and methods disclosed herein are further described below.

[0055] In some embodiments, the site-specific nuclease is a naturally occurring site-specific nuclease. Exemplary naturally occurring site-specific nucleases are known in the art (see, e.g., Makarova et al., 2017, Cell 168:328-328.e1 and Shmakov et al., 2017, Nat Rev Microbiol 15(3):169-182, both of which are incorporated herein by reference). In some embodiments, the site-specific nuclease binds to a DNA-targeting polynucleotide (e.g., a guide RNA), is guided to a specific sequence within the target DNA, and cleaves the target DNA.

[0056] In some embodiments, the site-specific nuclease is modified from its native sequence (e.g., by mutation of one or more amino acid residues) to alter its function. For example, the site-specific nuclease can be modified to be enzymatically inactive. The term "enzymatically inactive" can refer to a site-specific nuclease that can bind to a nucleic acid sequence in a polynucleotide in a sequence-specific manner but does not cleave the target polynucleotide. An enzymatically inactive site-specific polypeptide can include an enzymatically inactive domain (e.g., a nuclease domain). "Enzymatically inactive" can refer to no activity, substantially no activity, or essentially no activity. It can also refer to 1% or less, 2% or less, 3% or less, 4% or less, 5% or less, 6% or less, 7% or less, 8% or less, 9% or less, or 10% or less of the wild-type activity (e.g., nucleic acid cleavage activity, wild-type Cas9 activity).

[0057] In some embodiments, the site-specific nuclease (e.g., an enzymatically inactive site-specific nuclease) is fused to one or more transcriptional repressor domains, activator domains, epigenetic domains, recombinase domains, transposase domains, flippase domains, nickase domains, cleavage domains, or any combination thereof. The activator domain can include one or more tandem activation domains located at the carboxyl terminus of the enzyme. In other cases, the fusion can include one or more tandem repressor domains located at the carboxyl terminus of the protein. Non-limiting exemplary activation domains include GAL4, herpes simplex activation domain VP16, VP64 (a tetramer of herpes simplex activation domain VP16), the NF-κB p65 subunit, and Epstein-Barr virus R transactivator (Rta); see Chavez et al., Nat Methods, 2015, 12(4):326-328 and U.S. Patent Application Publication No. 20140068797. Non-limiting exemplary repression domains include the KRAB (Kruppel-associated box) domain of Kox1, the Mad mSIN3-interacting domain (SID), and the ERF repressor domain (ERD); see Chavez et al., Nat Methods, 2015, 12(4):326-328 and U.S. Patent Application Publication No. 20140068797. Nucleases can also be fused to heterologous polypeptides that provide increased or decreased stability. The fused domain or heterologous polypeptide can be located at the N-terminus, at the C-terminus, or internally within the nuclease.

CRISPR / Cas nucleases

[0058] In some embodiments, the site-specific nuclease comprises a CRISPR-associated (Cas) protein or Cas nuclease that functions in a CRISPR (clustered regularly interspaced short palindromic repeats) / Cas system. In bacteria, this system can provide adaptive immunity to foreign DNA (Barrangou, R., et al., "CRISPR provides acquired resistance against viruses in prokaryotes," Science (2007) 315:1709-1712; Makarova, K.S., et al., "Evolution and classification of the CRISPR-Cas systems," Nat Rev Microbiol (2011) 9:467-477; Garneau, J.E., et al., "The CRISPR / Cas bacterial immune system cleaves bacteriophage and plasmid DNA," Nature (2010) 468:67-71; Sapranauskas, R., et al., "The Streptococcus thermophilus CRISPR / Cas system provides immunity in Escherichia coli," Nucleic Acids Res (2011) 39:9275-9282). CRISPR / Cas systems (e.g., modified and / or unmodified) can be utilized as genome engineering tools in a wide variety of organisms, including various mammals, animals, plants, microorganisms, and yeast. CRISPR / Cas systems can include a guide nucleic acid, such as a guide RNA (gRNA), complexed with a Cas protein for targeted regulation of gene expression and / or activity or nucleic acid editing.
RNA-guided Cas proteins (e.g., Cas nucleases, such as Cas9 nuclease) can specifically bind to target polynucleotides (e.g., DNA) in a sequence-dependent manner. Cas proteins can cleave DNA when they possess nuclease activity (Gasiunas, G., et al., "Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria," Proc Natl Acad Sci USA (2012) 109:E2579-E2586; Jinek, M., et al., "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity," Science (2012) 337:816-821; Sternberg, S.H., et al., "DNA interrogation by the CRISPR RNA-guided endonuclease Cas9," Nature (2014) 507:62; Deltcheva, E., et al., "CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III," Nature (2011) 471:602-607). DNA breaks (e.g., double-strand breaks) can result in DNA break repair, allowing for the introduction of one or more genetic modifications (e.g., nucleic acid editing). DNA break repair can occur by non-homologous end joining (NHEJ), microhomology-mediated end joining (MMEJ), or homology-directed repair (HDR). In some embodiments, donor polynucleotides are used to facilitate HDR, as detailed in the "Systems" section below. The CRISPR-Cas system has been widely used for programmable genome editing in various organisms and model systems (Cong, L., et al., "Multiplex genome engineering using CRISPR-Cas systems," Science (2013) 339:819-823; Jiang, W., et al., "RNA-guided editing of bacterial genomes using CRISPR-Cas systems," Nat. Biotechnol. (2013) 31:233-239; Sander, J.D. & Joung, J.K., "CRISPR-Cas systems for editing, regulating, and targeting genomes," Nature Biotechnol. (2014) 32:347-355).

[0059] In some embodiments, the site-specific nucleases described herein comprise a Cas protein complexed with a guide nucleic acid, such as a guide RNA (further described in the "Systems" section below). In some embodiments, the site-specific nuclease comprises a Cas protein complexed with a single guide nucleic acid, such as a single guide RNA (sgRNA). In some embodiments, the site-specific nuclease comprises an RNA-binding protein (RBP) optionally complexed with a guide nucleic acid, such as a guide RNA (e.g., sgRNA), capable of forming a complex with the Cas protein. In some examples, the RNA-guided Cas protein recognizes a DNA target complementary to a portion of the gRNA known as the CRISPR RNA (crRNA) sequence. The target sequence is often referred to as the protospacer, and the portion of the crRNA sequence complementary to the protospacer is often referred to as the spacer. To function (e.g., to cleave DNA), many Cas nucleases also require a specific protospacer adjacent motif (PAM), a DNA sequence typically 2-6 base pairs long that immediately follows the protospacer sequence.

[0060] Various site-specific Cas nucleases (e.g., Cas proteins from different species) may be useful in the fusion proteins, systems, and methods provided herein based on the varying enzymatic properties of different Cas proteins (e.g., different protospacer adjacent motif (PAM) sequence preferences, increased or decreased enzymatic activity, increased or decreased levels of cytotoxicity, tendency to generate one or more of NHEJ, homologous recombination repair, single-strand breaks, double-strand breaks, etc.). Cas proteins from various species (e.g., those disclosed in Shmakov et al., 2017, or polypeptides derived therefrom) may require different PAM sequences in target DNA. Thus, for a particular Cas enzyme of choice, the PAM sequence requirements may differ from the 5'-NGG-3' sequence (where N is any of A, T, C, or G) known to be required for Cas9 activity. Many Cas9 orthologs have been identified from a wide variety of species, and these proteins share only a few identical amino acids. Nevertheless, all identified Cas9 orthologs share the same domain architecture, including a central HNH endonuclease domain and a split RuvC / RNase H domain. Cas9 proteins share four key motifs of conserved architecture: motifs 1, 2, and 4 are RuvC-like motifs, while motif 3 is an HNH motif. Likewise, Cas12a proteins from various species may have PAM sequence requirements that differ from the canonical TTTV PAM of LbCas12a.
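The PAM dependence described above can be illustrated with a short Python sketch. It is offered only as an illustration under stated assumptions and is not part of the disclosed methods: the function names, the spacer lengths, and the `pam_side` parameter are hypothetical. The sketch also assumes the commonly reported geometry in which the Cas9 PAM lies 3' of the protospacer and the Cas12a PAM lies 5' of it, and it scans only the strand supplied.

```python
# IUPAC nucleotide ambiguity codes (V = A, C, or G; N = any base).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_matches(pattern: str, site: str) -> bool:
    """Return True if `site` satisfies the IUPAC PAM `pattern` (e.g. NGG, TTTV)."""
    return len(pattern) == len(site) and all(
        base in IUPAC[code] for code, base in zip(pattern.upper(), site.upper())
    )


def find_protospacers(seq: str, pam: str, spacer_len: int, pam_side: str):
    """List (start, protospacer) pairs adjacent to a matching PAM on this strand.

    pam_side is a hypothetical parameter of this sketch:
      "3prime" -> PAM follows the protospacer (assumed here for Cas9)
      "5prime" -> PAM precedes the protospacer (assumed here for Cas12a)
    """
    seq = seq.upper()
    k = len(pam)
    hits = []
    for i in range(len(seq) - k + 1):
        if not pam_matches(pam, seq[i:i + k]):
            continue
        if pam_side == "3prime":
            start, end = i - spacer_len, i
        elif pam_side == "5prime":
            start, end = i + k, i + k + spacer_len
        else:
            raise ValueError("pam_side must be '3prime' or '5prime'")
        # Keep only protospacers that fit entirely inside the sequence.
        if start >= 0 and end <= len(seq):
            hits.append((start, seq[start:end]))
    return hits


# Illustrative inputs only; the sequences are arbitrary.
cas9_hits = find_protospacers("GATTACAGATTACAGATTACAGG", "NGG", 20, "3prime")
cas12a_hits = find_protospacers("TTTCGATTACAGATTACAGATTACAGA", "TTTV", 23, "5prime")
```

Changing only the `pam` and `pam_side` arguments is enough to model a nuclease with a different PAM preference, which is the practical consequence of the variation noted in this paragraph.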

[0061] Any suitable CRISPR / Cas system can be used. CRISPR / Cas systems can be referred to using various nomenclature systems. Exemplary nomenclature systems are provided in Makarova, K.S., et al., "An updated evolutionary classification of CRISPR-Cas systems," Nat Rev Microbiol (2015) 13:722-736 and Shmakov, S., et al., "Discovery and Functional Characterization of Diverse Class 2 CRISPR-Cas Systems," Mol Cell (2015) 60:1-13. A CRISPR / Cas system can be a Type I, Type II, Type III, Type IV, Type V, or Type VI system, or any other suitable CRISPR / Cas system. As used herein, a CRISPR / Cas system can be a Class 1 system, a Class 2 system, or any other suitably classified CRISPR / Cas system. The determination of Class 1 or Class 2 can be based on the genes encoding the effector modules. Class 1 systems generally have a multi-subunit crRNA-effector complex, while Class 2 systems generally have a single protein, such as Cas9, Cpf1 (also called Cas12a), C2c1, C2c2, or C2c3, as the crRNA-effector complex. Class 1 CRISPR / Cas systems may use a complex of multiple Cas proteins to effect regulation. Class 1 CRISPR / Cas systems may include, for example, Type I (e.g., Type I, IA, IB, IC, ID, IE, IF, IU), Type III (e.g., Type III, IIIA, IIIB, IIIC, IIID), and Type IV (e.g., Type IV, IVA, IVB) CRISPR / Cas types. Class 2 CRISPR / Cas systems may use a single large Cas protein to effect regulation. Class 2 CRISPR / Cas systems can include, for example, Type II (e.g., Type II, Type IIA, Type IIB) and Type V CRISPR / Cas types. CRISPR systems may complement each other and / or provide functional units in trans to facilitate CRISPR gene targeting.
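The class and type groupings in this paragraph can be summarized as a small lookup, sketched here in Python purely for illustration. The names `CLASS_BY_TYPE` and `crispr_class` are hypothetical, and subtypes are assumed to be written with a hyphen (e.g., "I-E"). The placement of Type VI in Class 2 follows the cited classification literature rather than the lists in this paragraph.

```python
# Class assignment by CRISPR/Cas type, following the groupings in the text.
CLASS_BY_TYPE = {
    "I": 1, "III": 1, "IV": 1,   # Class 1: multi-subunit crRNA-effector complex
    "II": 2, "V": 2,             # Class 2: single-protein effector
    "VI": 2,                     # assumption: taken from the cited classification
}


def crispr_class(system_type: str) -> int:
    """Return 1 or 2 for labels such as 'Type II', 'V', or 'I-E'."""
    label = system_type.strip().upper()
    if label.startswith("TYPE"):
        label = label[4:].strip()
    main_type = label.split("-")[0]  # drop a hyphenated subtype such as '-E'
    try:
        return CLASS_BY_TYPE[main_type]
    except KeyError:
        raise ValueError(f"unrecognized CRISPR/Cas type: {system_type!r}") from None
```

For example, `crispr_class("Type II")` and `crispr_class("V")` both fall in Class 2, consistent with Cas9 and Cas12a being single-protein effectors.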

[0062] The Cas protein may be derived from any suitable organism, including, but not limited to, Streptococcus pyogenes, Streptococcus thermophilus, Streptococcus sp., Staphylococcus aureus, Nocardiopsis dassonvillei, Streptomyces pristinaespiralis, Streptomyces viridochromogenes, Streptosporangium roseum, Alicyclobacillus acidocaldarius, Bacillus pseudomycoides, Bacillus selenitireducens, Exiguobacterium sibiricum, Lactobacillus delbrueckii, Lactobacillus salivarius, Microscilla marina, Burkholderiales bacteria, Polaromonas naphthalenivorans, Polaromonas sp., Crocosphaera watsonii, Cyanothece sp., Microcystis aeruginosa, Pseudomonas aeruginosa, Synechococcus sp., Acetohalobium arabaticum, Ammonifex degensii, Caldicellulosiruptor bescii, Candidatus Desulforudis, Clostridium botulinum, Clostridium difficile, Finegoldia magna, Natranaerobius thermophilus, Pelotomaculum thermopropionicum, Acidithiobacillus caldus, Acidithiobacillus ferrooxidans, Allochromatium vinosum, Marinobacter sp., Nitrosococcus halophilus, Nitrosococcus watsonii, Pseudoalteromonas haloplanktis, Ktedonobacter racemifer, Methanohalobium evestigatum, Anabaena variabilis, Nodularia spumigena, Nostoc sp., Arthrospira maxima, Arthrospira platensis, Arthrospira sp., Lyngbya sp., Microcoleus chthonoplastes, Oscillatoria sp., Petrotoga mobilis, Thermosipho africanus, Acaryochloris marina, Leptotrichia shahii, and Francisella novicida. In some embodiments, the organism is Streptococcus pyogenes (S. pyogenes). In some embodiments, the organism is Staphylococcus aureus (S. aureus). In some embodiments, the organism is Streptococcus thermophilus (S. thermophilus).

[0063] The Cas protein may also be derived from organisms including, but not limited to, Veillonella atypica, Fusobacterium nucleatum, Filifactor alocis, Solobacterium moorei, Coprococcus catus, Treponema denticola, Peptoniphilus duerdenii, Catenibacterium mitsuokai, Streptococcus mutans, Listeria innocua, Staphylococcus pseudintermedius, Acidaminococcus intestini, Olsenella uli, Oenococcus kitaharae, Bifidobacterium bifidum, Lactobacillus rhamnosus, Lactobacillus gasseri, Finegoldia magna, Mycoplasma mobile, Mycoplasma gallisepticum, Mycoplasma ovipneumoniae, Mycoplasma canis, Mycoplasma synoviae, Eubacterium rectale, Streptococcus thermophilus, Eubacterium dolichum, Lactobacillus coryniformis subsp. torquens, Ilyobacter polytropus, Ruminococcus albus, Akkermansia muciniphila, Acidothermus cellulolyticus, Bifidobacterium longum, Bifidobacterium dentium, Corynebacterium diphtheriae, Elusimicrobium minutum, Nitratifractor salsuginis, Sphaerochaeta globus, Fibrobacter succinogenes subsp. succinogenes, Bacteroides fragilis, Capnocytophaga ochracea, Rhodopseudomonas palustris, Prevotella micans, Prevotella ruminicola, Flavobacterium columnare, Aminomonas paucivorans, Rhodospirillum rubrum, Candidatus Puniceispirillum marinum, Verminephrobacter eiseniae, Ralstonia syzygii, Dinoroseobacter shibae, Azospirillum, Nitrobacter hamburgensis, Bradyrhizobium, Wolinella succinogenes, Campylobacter jejuni subsp. jejuni, Helicobacter mustelae, Bacillus cereus, Acidovorax ebreus, Clostridium perfringens, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria meningitidis, Pasteurella multocida subsp. multocida, Sutterella wadsworthensis, Proteobacteria, Legionella pneumophila, Parasutterella excrementihominis, and Francisella novicida.

[0064] Non-limiting examples of Cas proteins include c2c1, C2c2, c2c3, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), Cas10, Cas10d, CasF, CasG, CasH, Cpf1 (also called Cas12a), Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, and Cu1966, and homologs or modified versions thereof. In some embodiments, the site-specific nuclease of the fusion proteins provided herein comprises a CRISPR-associated nuclease, wherein the CRISPR-associated nuclease is Cas5, Cas6, Cas7, Cas8, Cas9, Cas12a, Cas12b, Cas12i, Cas12j, Cas12L, Cas12e, Cas12c, Cas12d, Cas12g, Cas12h, TnpB, Cas13a, Cas13b, or Cas14. In some embodiments, the CRISPR-associated nuclease is a Cas9 enzyme. In some embodiments, the CRISPR-associated nuclease is a Cas12a enzyme. In some embodiments, the CRISPR-associated nuclease is a nickase or an inactivated version of a CRISPR-associated nuclease.

[0065] Lachnospiraceae bacterium Cpf1 (LbCpf1) is one member of a large group of Cpf1 proteins. The terms "Cpf1" and "Cas12a" are used interchangeably throughout this disclosure. Cpf1 is a Cas protein. In some embodiments, the site-specific nuclease is catalytically active Cas12a from Lachnospiraceae bacterium ("LbCas12a") or Moraxella bovoculi AAX08_00205 ("Mb2Cas12a"). In some embodiments, the site-specific nuclease domain of the fusion protein is a Cas12a protein from any of Lachnospiraceae bacterium, Acidaminococcus sp., Moraxella bovoculi, Thiomicrospira sp., Moraxella lacunata, Methanomethylophilus alvus, Butyrivibrio sp., or Bacteroidetes oral sp.

[0066] A Cas protein may comprise one or more domains. Non-limiting examples of domains include a guide nucleic acid recognition and / or binding domain, a nuclease domain (e.g., DNase or RNase domain, RuvC, HNH), a DNA-binding domain, an RNA-binding domain, a helicase domain, a protein-protein interaction domain, and a dimerization domain. The guide nucleic acid recognition and / or binding domain can interact with the guide nucleic acid. The nuclease domain can comprise catalytic activity for nucleic acid cleavage. The nuclease domain can lack catalytic activity to prevent nucleic acid cleavage. The Cas protein can also be a chimeric Cas protein fused with another protein or polypeptide. For example, the Cas protein can be a chimera of various Cas proteins comprising domains from different Cas proteins.

[0067] As used herein, a Cas protein can be an active variant, an inactive variant, or a fragment of a wild-type or modified Cas protein. The Cas protein can contain amino acid changes, such as deletions, insertions, substitutions, variants, mutations, fusions, chimeras, or any combination thereof, compared to the wild-type version of the Cas protein. The Cas protein can be a polypeptide with at least about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity or similarity to an exemplary wild-type Cas protein. The Cas protein can be a polypeptide with at most about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% sequence identity and / or similarity to an exemplary wild-type Cas protein. A variant or fragment may comprise at least about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity or similarity to a wild-type or modified Cas protein or portion thereof. A variant or fragment may lack nucleic acid cleavage activity while being capable of being complexed with a guide nucleic acid and targeted to a nucleic acid locus.
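The percent sequence identity thresholds recited above can be made concrete with a minimal Python sketch. It assumes the two sequences have already been aligned (equal length, with "-" marking gaps) and uses one of several possible conventions, namely identical residues divided by alignment columns that are not gaps in both sequences. The function name and the convention are illustrative assumptions, not a definition adopted by this disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Ignore columns in which both sequences carry a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no comparable columns")
    # A column counts as a match only if both residues are identical non-gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Illustrative check of a variant against an "at least about 90%" threshold.
identity = percent_identity("MKRTALLV", "MKRTALLI")
meets_90_percent = identity >= 90.0
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values for the same pair, so the convention in use should be stated whenever a threshold is applied.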

[0068] In some embodiments, the modified Cas protein has reduced function compared to its unmodified form. In some embodiments, the modified Cas protein lacks the function of its unmodified form. For example, a nuclease-deficient Cas protein retains the ability to bind to DNA but lacks or has reduced nucleic acid cleavage activity. Cas nucleases (e.g., retaining wild-type nuclease activity, having reduced nuclease activity, and / or lacking nuclease activity) can function in CRISPR / Cas systems to modulate (e.g., decrease, increase, or eliminate) the level and / or activity of a target gene or protein. Cas proteins can bind to target polynucleotides and prevent transcription by physical obstruction or editing the nucleic acid sequence, resulting in a non-functional gene product. In some embodiments, the engineered Cas protein has 90% or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, 30% or less, 20% or less, 10% or less, 5% or less, or 1% or less of the functionality (e.g., nuclease activity) of a wild-type Cas protein (e.g., Cas9 from S. pyogenes). In some embodiments, the engineered Cas protein does not have substantial functionality of a wild-type Cas protein. When a Cas protein is in an engineered form and does not have substantial nucleic acid cleavage activity, it may be referred to as enzymatically inactive and / or "dead" (abbreviated "d"). A dead Cas protein (e.g., dCas, dCas9) may be capable of binding to a target polynucleotide but not cleaving the target polynucleotide. In some aspects, the dead Cas protein is a dead Cas9 protein or a dead Cas12a protein.

[0069] In some embodiments, the engineered Cas protein can be an engineered Cas "base editor." Base editing allows for the direct, irreversible conversion of one target DNA base to another in a programmable manner, without the need for DNA cleavage or a donor polynucleotide molecule. For example, Komor et al. (2016, Nature, 533:420-424) teach a Cas9-cytidine deaminase fusion in which Cas9 is engineered to be inactivated and not induce double-stranded DNA breaks. Additionally, Gaudelli et al. (2017, Nature, doi:10.1038/nature24644) teach a catalytically impaired Cas9 fused to a tRNA adenosine deaminase, which can mediate the conversion of A / T to G / C in a target DNA sequence. Another class of engineered Cas9 nucleases that can be used as site-specific nucleases in the fusion proteins of the present disclosure are variants that can recognize a wide range of PAM sequences, including NG, GAA, and GAT (Hu et al., 2018, Nature, doi:10.1038/nature26155).

[0070] Cas proteins can be modified to optimize regulation of gene expression. Cas proteins can be modified to increase or decrease nucleic acid binding affinity, nucleic acid binding specificity, and / or enzymatic activity. Cas proteins can also be modified to alter any other activity or property of the protein, such as stability. For example, one or more nuclease domains of a Cas protein can be modified, deleted, or inactivated, or the Cas protein can be truncated to remove domains that are not essential for protein function or to optimize (e.g., enhance or decrease) the activity of the Cas protein for regulating gene expression.

[0071] One or more nuclease domains of a Cas protein (e.g., RuvC, HNH) may be deleted or mutated so that they are no longer functional or have reduced nuclease activity. For example, in a Cas protein containing at least two nuclease domains (e.g., Cas9), if one of the nuclease domains is deleted or mutated, the resulting Cas protein, known as a nickase, can generate a single-strand break, rather than a double-strand break, at the CRISPR RNA (crRNA) recognition sequence within double-stranded DNA. Such a nickase may be capable of cleaving either the complementary or non-complementary strand, but not both. In some embodiments, the targeting specificity of the double-strand break is improved by targeting the nickase to opposite strands at two nearby loci. If the nickase cleaves a single strand at both loci, a double-strand break is formed and can be repaired by HR as described herein. When all of the nuclease domains of a Cas protein (e.g., both the RuvC and HNH nuclease domains in the Cas9 protein, or the RuvC nuclease domain in the Cpf1 protein) are deleted or mutated, the resulting Cas protein may have reduced or no ability to cleave both strands of double-stranded DNA.
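The paired-nickase logic described above can be sketched in a few lines of Python. This is a simplified model for illustration only: the `Nick` type, the function name, and the `max_offset` parameter are hypothetical, and the text specifies no particular distance for "nearby" loci, so the caller must supply one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nick:
    position: int  # reference coordinate of the single-strand break
    strand: str    # "+" or "-"


def forms_double_strand_break(a: Nick, b: Nick, max_offset: int) -> bool:
    """Model two nicks as one double-strand break.

    Assumption of this sketch: a break forms when the nicks lie on opposite
    strands and no more than `max_offset` bases apart.
    """
    for nick in (a, b):
        if nick.strand not in {"+", "-"}:
            raise ValueError("strand must be '+' or '-'")
    on_opposite_strands = a.strand != b.strand
    close_enough = abs(a.position - b.position) <= max_offset
    return on_opposite_strands and close_enough
```

Under this model a single off-target nick never yields a double-strand break, because a break requires a second nick on the opposite strand within the chosen offset; this reflects the specificity benefit described in the paragraph.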

Zinc finger nucleases

[0072] In some embodiments, a site-specific nuclease suitable for use in the fusion proteins or methods described herein is a "zinc finger nuclease" or "ZFN." ZFN refers to a fusion between a cleavage domain, such as the cleavage domain of FokI, and at least one zinc finger motif (e.g., at least 2, 3, 4, or 5 zinc finger motifs) capable of binding to polynucleotides such as DNA and RNA. Heterodimerization of two individual ZFNs at a specific location in a specific polynucleotide, with a specific orientation and spacing, can lead to cleavage of that polynucleotide. For example, binding of a ZFN to DNA can induce a double-strand break in the DNA. Two individual ZFNs can bind to opposite strands of DNA with their C-termini separated by a specific distance, allowing the two cleavage domains to dimerize and cleave the DNA. In some cases, a linker sequence between the zinc finger domain and the cleavage domain may be required such that the 5' ends of each binding site are separated by approximately 5 to 7 base pairs. In some cases, a cleavage domain is fused to the C-terminus of each zinc finger domain. Exemplary ZFNs include, but are not limited to, those described in Urnov et al., Nature Reviews Genetics, 2010, 11:636-646; Gaj et al., Nat Methods, 2012, 9(8):805-7; U.S. Patent Nos. 6,534,261; 6,607,882; 6,746,838; 6,794,136; 6,824,978; 6,866,997; 6,933,113; 6,979,539; 7,013,219; 7,030,215; 7,220,719; 7,241,573; 7,241,574; 7,585,849; 7,595,376; 6,903,185; 6,479,626; and U.S. Patent Application Publication Nos. 2003 / 0232410 and 2009 / 0203140.

[0073] In some embodiments, nucleases, including ZFNs, can generate double-strand breaks in target polynucleotides, such as DNA. Double-strand breaks in DNA can result in DNA break repair, allowing for the introduction of one or more genetic modifications (e.g., nucleic acid editing). DNA break repair can occur by non-homologous end joining (NHEJ) or homology-directed repair (HDR). In HDR, a donor polynucleotide repair template or template polynucleotide can be provided with homologous arms flanking the target DNA site. In some embodiments, ZFNs are zinc finger nickases that induce site-specific single-strand DNA breaks or nicks, thus resulting in HR. For a description of zinc finger nickases, see, for example, Ramirez et al., Nucl Acids Res, 2012, 40(12):5560-8; Kim et al., Genome Res, 2012, 22(7):1327-33. In some embodiments, the ZFN binds to a polynucleotide (e.g., DNA and / or RNA) but is not able to cleave the polynucleotide.

[0074] In some embodiments, the cleavage domain of a nuclease, including a ZFN, comprises a modified form of a wild-type cleavage domain. The modified form of the cleavage domain may contain amino acid changes (e.g., deletions, insertions, or substitutions) that reduce the nucleic acid cleavage activity of the cleavage domain. For example, the modified form of the cleavage domain may have 90% or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, 30% or less, 20% or less, 10% or less, 5% or less, or 1% or less of the nucleic acid cleavage activity of the wild-type cleavage domain. The modified form of the cleavage domain may have no substantial nucleic acid cleavage activity. In some embodiments, the cleavage domain is enzymatically inactive.

[0075] TAL effector nucleases In some embodiments, a site-specific nuclease suitable for use in the fusion proteins, systems, or methods described herein is a "TALEN" or "TAL effector nuclease." TALEN generally refers to an engineered transcription activator-like effector nuclease having a central domain of DNA-binding tandem repeats and a cleavage domain. TALENs can be generated by fusing a TAL effector DNA-binding domain to a DNA-cleavage domain. Optionally, the DNA-binding tandem repeat comprises 33-35 amino acids in length and has two hypervariable amino acid residues at positions 12 and 13 that can recognize at least one specific DNA base pair. A transcription activator-like effector (TALE) protein can be fused to a nuclease, such as wild-type or mutant FokI endonuclease, or the catalytic domain of FokI. FokI can be engineered with several mutations, e.g., to improve cleavage specificity or activity, to enable its use as a TALEN. Such TALENs can be engineered to bind to any desired DNA sequence. TALENs can be used to create double-strand breaks in target DNA sequences, which then cause NHEJ or HR, thereby generating genetic modifications (e.g., nucleic acid sequence editing). Double-strand breaks in DNA can result in DNA break repair, allowing for the introduction of one or more genetic modifications (e.g., nucleic acid editing). DNA break repair can occur by non-homologous end joining (NHEJ) or homology-directed repair (HDR). In HDR, a donor polynucleotide repair template or template polynucleotide can be provided that has homology arms adjacent to the target DNA site. Optionally, a single-stranded donor polynucleotide repair template is provided to facilitate HR. For a detailed description of TALENs and their use in gene editing, see, e.g., U.S. Patent Nos. 8,440,431; 8,440,432; 8,450,471; 8,586,363; and 8,697,853; Scharenberg et al., Curr Gene Ther, 2013, 13(4):291-303; Gaj et al., Nat Methods, 2012, 9(8):805-7; Beurdeley et al., Nat Commun, 2013, 4:1762, and Joung and Sander, Nat Rev Mol Cell Biol, 2013, 14(1):49-55.

[0076] In some embodiments, the TALEN is engineered to reduce nuclease activity. In some embodiments, the nuclease domain of the TALEN comprises a modified form of a wild-type nuclease domain. The modified form of the nuclease domain may include amino acid changes (e.g., deletions, insertions, or substitutions) that reduce the nucleic acid cleavage activity of the nuclease domain. For example, the modified form of the nuclease domain may have 90% or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, 30% or less, 20% or less, 10% or less, 5% or less, or 1% or less of the nucleic acid cleavage activity of the wild-type nuclease domain. The modified form of the nuclease domain may have no substantial nucleic acid cleavage activity. In some embodiments, the nuclease domain is enzymatically inactive.

[0077] In some embodiments, a Transcription Activator-Like Effector (TALE) protein is fused to a domain capable of regulating transcription and is nuclease-free. In some embodiments, a Transcription Activator-Like Effector (TALE) protein is designed to function as a transcription activator. In some embodiments, a Transcription Activator-Like Effector (TALE) protein is designed to function as a transcription repressor. For example, the DNA-binding domain of a Transcription Activator-Like Effector (TALE) protein can be fused (e.g., linked) to one or more transcription activation domains or one or more transcription repression domains. Non-limiting examples of transcription activation domains include the herpes simplex VP16 activation domain and tetrameric repeats of the VP16 activation domain, e.g., the VP64 activation domain. Non-limiting examples of transcription repression domains include the Krüppel-associated box domain.

[0078] Meganuclease In some embodiments, a site-specific nuclease suitable for use in the fusion proteins, systems, or methods described herein is a meganuclease. Meganuclease generally refers to a rare-cutting endonuclease or homing endonuclease, which can be highly specific. Meganucleases can recognize DNA target sites at least 12 base pairs in length, e.g., 12-40 base pairs, 12-50 base pairs, or 12-60 base pairs in length. Meganucleases can be modular DNA-binding nucleases, such as any fusion protein containing at least one catalytic domain of an endonuclease and at least one DNA-binding domain or protein that specifies a nucleic acid target sequence. The DNA-binding domain can comprise at least one motif that recognizes single-stranded or double-stranded DNA. Meganucleases can generate double-strand breaks. Double-strand breaks in DNA can result in DNA break repair, allowing for the introduction of one or more genetic modifications (e.g., nucleic acid editing). DNA break repair can occur by non-homologous end joining (NHEJ) or homology-directed repair (HDR). In HDR, a donor polynucleotide template can be provided with homology arms flanking the site of the target DNA. Meganucleases can be monomeric or dimeric. In some embodiments, meganucleases are naturally occurring (found in nature) or wild-type; in other cases, meganucleases are non-natural, artificial, engineered, synthetic, rationally designed, or man-made. In some embodiments, meganucleases of the present disclosure include I-CreI meganuclease, I-CeuI meganuclease, I-MsoI meganuclease, I-SceI meganuclease, variants thereof, derivatives thereof, and fragments thereof. For detailed descriptions of useful meganucleases and their applications for gene editing, see, e.g., Silva et al., Curr Gene Ther, 2011, 11(1):11-27; Zaslavskiy et al., BMC Bioinformatics, 2014, 15:191; Takeuchi et al., Proc Natl Acad Sci USA, 2014, 111(11):4061-4066; and U.S. Patent Nos. 7,842,489; 7,897,372; 8,021,867; 8,163,514; 8,133,697; 8,021,867; 8,119,361; 8,119,381; 8,124,36 and 8,129,134.

[0079] In some embodiments, the nuclease domain of the meganuclease comprises a modified form of a wild-type nuclease domain. The modified form of the nuclease domain may contain amino acid changes (e.g., deletions, insertions, or substitutions) that reduce the nucleic acid cleavage activity of the nuclease domain. For example, the modified form of the nuclease domain may have 90% or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, 30% or less, 20% or less, 10% or less, 5% or less, or 1% or less of the nucleic acid cleavage activity of the wild-type nuclease domain. The modified form of the nuclease domain may have no substantial nucleic acid cleavage activity. In some embodiments, the nuclease domain is enzymatically inactive. In some embodiments, the meganuclease can bind to DNA but cannot cleave DNA.

[0080] B. Recruiter Domain The fusion proteins provided herein comprise a recruiter domain that includes a site-specific DNA-binding domain. The recruiter domain of the fusion proteins herein can comprise any polypeptide having a unique recognition motif.

[0081] In some embodiments, the recruiter domain is a Cro repressor family protein. Cro repressor family proteins are bacteriophage transcription factors that function as homodimers. See, e.g., M.S. Dubrava, et al., N15 Cro and λ Cro: Orthologous DNA-binding domains with completely different but equally effective homodimer interfaces, Protein Science 17:803-812 (2008). In some embodiments, the recruiter domain comprises an N15 Cro protein, a lambda Cro protein, a P22 Cro protein (see, e.g., A. R. Poteete, et al., Bacteriophage P22 Cro Protein: Sequence, Purification, and Properties, Biochemistry 25:251-256 (1986)), a 434 Cro protein (see, e.g., C. Wolberger, et al., Structure of a phage 434 Cro / DNA complex, Nature 335:789-795 (1988)), or a combination thereof. Further information on Cro family proteins from N15, lambda, P22, and 434 phages can be found in B.M. Hall, et al., Extreme divergence between one-to-one orthologs: the structure of N15 Cro bound to operator DNA and its relationship to the λ Cro complex, Nucleic Acids Research, 47(13):7118-7129 (2019).

[0082] In some embodiments, the recruiter domain of the fusion proteins provided herein comprises all or most of the polypeptide sequence of a naturally occurring DNA-binding protein. In some embodiments, the recruiter domain comprises only the DNA-binding domain of a naturally occurring DNA-binding protein. In some embodiments, the recruiter domain comprises one or more modifications relative to the protein from which they are derived (e.g., as described in the "Variations" section below). In some embodiments, the recruiter domain comprises a synthetic DNA-binding polypeptide sequence.

[0083] In some embodiments of the fusion proteins provided herein, the recruiter domain functions as an oligomer (i.e., it binds to its specific DNA sequence in oligomeric form). In some embodiments, the recruiter domain functions as a homo-oligomer. In some embodiments, the recruiter domain functions as a homo-dimer, homo-trimer, homo-tetramer, or higher-order homo-oligomer. In such embodiments, the fusion proteins provided herein can include two or more monomers (e.g., two, three, four, or more monomers) of the recruiter domain. In some embodiments, the monomers are separated by a linker (e.g., any linker described herein). For example, Cro repressor family proteins generally function as homo-dimers. In some embodiments, the fusion proteins provided herein include two or more monomers of a Cro repressor family protein recruiter domain sequence. In some embodiments, the two or more monomers are linked by a flexible linker, allowing the monomers to interact and form a homo-oligomer. Exemplary fusion proteins including two monomers of a Cro repressor family protein are described in the Examples herein and shown in Figures 4 and 5.

[0084] In some embodiments, recruiter domains function as hetero-oligomers, and in such embodiments, the fusion proteins provided herein can include at least one monomer of two or more recruiter domain proteins.

[0085] In some embodiments, the recruiter domains provided herein comprise an amino acid sequence having at least 70% (e.g., at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identity to any one of SEQ ID NOs: 1-4.
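By way of illustration only, an identity threshold such as "at least 70%" could be evaluated along the following lines. This is a minimal sketch, assuming the two sequences have already been aligned to equal length with "-" marking gaps. The function names, the toy sequences, and the choice of denominator (all alignment columns except those where both sequences have a gap) are assumptions made for this example; the disclosure does not prescribe a particular alignment tool or identity convention.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences ('-' marks a gap).

    Assumption: identity = identical columns / compared columns * 100,
    where columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information for either sequence
        compared += 1
        if a == b:  # both are the same residue (cannot both be gaps here)
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 70.0) -> bool:
    """True if the pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy sequences, not SEQ ID NOs from this disclosure.
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
    print(meets_identity_threshold("AAAA", "AKKK"))
```

In practice the alignment itself would come from an established alignment program, and the reported identity can differ slightly depending on how gaps and terminal overhangs are counted.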

[0086] C. Additional Fusion Protein Domains In some embodiments, the fusion proteins provided herein comprise one or more linkers. A linker, as used herein, is also referred to as a spacer and is a flexible molecule or a flexible stretch of molecules that joins or connects two portions (e.g., domains) of a fusion protein or a modified protein as provided herein. In some embodiments, the linker is a polypeptide. Proteins in which domains are joined by a polypeptide linker are referred to as fusion proteins. In some embodiments, the linker is a non-peptide linker. Proteins in which domains are joined by a non-peptide linker are referred to as modified proteins. It will be understood that where fusion proteins are discussed throughout this disclosure, modified proteins are generally also contemplated, where feasible.

[0087] Linkers can increase the range of orientations that domains in a fusion protein or modified protein can adopt. Linkers can be optimized to produce a desired effect in a fusion protein or modified protein. Aspects and considerations of linker design are described, for example, in Chen, X. et al., Adv Drug Deliv Rev. 2013 Oct 15;65(10):1357-1369 and Klein, J.S. et al., Protein Eng. Des. Sel. 2014;27(10):325-330. In some embodiments, the proteins provided herein comprise a peptide linker. In some embodiments, the proteins provided herein comprise a non-peptide linker. In some embodiments, the proteins provided herein comprise a peptide linker and a non-peptide linker. The proteins provided herein can also comprise multiple linkers, including at least one peptide linker, at least one non-peptide linker, or at least one peptide linker and at least one non-peptide linker.

[0088] Linkers can be short or long, flexible or fixed. See, e.g., PCT / US2020 / 051383 (incorporated herein by reference in its entirety), WO 2020 / 168102 (incorporated herein by reference in its entirety), and U.S. Patent Application Publication No. 2021 / 0017506 (incorporated herein by reference in its entirety).

[0089] In some embodiments, the length of the linker can affect one or more functions of the fusion protein. Selection of a linker to achieve a desired length is within the ability of one of ordinary skill in the art. In some embodiments, the peptide linker can be, for example, 5 to 100 or more amino acids in length (e.g., 5 aa, 10 aa, 15 aa, 20 aa, 25 aa, 30 aa, 35 aa, 40 aa, 45 aa, 50 aa, 55 aa, 60 aa, 65 aa, 70 aa, 75 aa, 80 aa, 85 aa, 90 aa, 95 aa, or 100 aa).

[0090] Depending on the length, the linker sequence may have various conformations in the secondary structure, such as helical, β-strand, coil / bend, and turn. In some cases, the linker sequence may have an extended conformation and function as an independent domain that does not interact with adjacent protein domains. The linker sequence may be flexible or rigid. Flexible linkers provide some degree of movement or interaction of the polypeptide domains and are generally rich in small or polar amino acids such as Gly and Ser (e.g., at least 90%, at least 95%, at least 98%, at least 99%, or all of the amino acid residues in the linker are either Gly or Ser). Rigid linkers may be used to maintain a constant distance between the domains and promote the maintenance of their independent functions. The linker bond may be via an amide bond (e.g., a peptide bond) or other functional group, as discussed further below.

[0091] In some embodiments, a peptide linker described herein comprises an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOs: 6, 7, 15, or 16. In some embodiments, the linker comprises an XTEN linker sequence. See, e.g., X. Li, et al., Base editing with a Cpf1-cytidine deaminase fusion, Nature Biotechnology 36:324-327 (2018); Y. Zong, et al., Precise base editing in rice, wheat, and maize with a Cas9-cytidine deaminase fusion, Nature Biotechnology 35:438-440 (2017); and V. Schellenberger, et al., A recombinant polypeptide extends the in vivo half-life of peptides and proteins in a tunable manner, Nature Biotechnology 27:1189-1190 (2009). In some embodiments, the linker comprises one or more repeats (e.g., 2 repeats, 3 repeats, 4 repeats, 5 repeats, 6 repeats or more) of SGGS (SEQ ID NO: 22), GGGS (SEQ ID NO: 23), GGGGS (SEQ ID NO: 24) and / or one or more repeats of GSSGSS (SEQ ID NO: 25). Further exemplary peptide linkers include, but are not limited to, SGSETPGTSESATPE ("XTEN-03," SEQ ID NO:27), SGSETPGTSESATPES ("XTEN-01," SEQ ID NO:28), SGSETPGTSESATPELK ("XTEN-02," SEQ ID NO:29), a peptide linker comprising (GGGGS) (SEQ ID NO:30), (GGGGS) (SEQ ID NO:31), (GGGGS) (SEQ ID NO:32), GGGGGGGG (SEQ ID NO:33), GSAGSAAGSGEF (SEQ ID NO:34), A(EAAAK) (SEQ ID NO:35), or A(EAAAK) (SEQ ID NO:36). Further non-limiting exemplary linkers that can be used include those disclosed in PCT / US2020 / 051383, Chen et al., Adv. Drug. Deliv. Rev. 65(10):1357-1369 (2013), and van Rosmalen et al., Biochemistry 56(50):6565-6574 (2017), the entire contents of each of which are incorporated herein by reference.
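For illustration, the construction of a repeat-based linker and a check of its Gly/Ser content could be sketched as follows. This is a hedged example rather than a method of the disclosure: the helper names are hypothetical, the repeat count of 3 is chosen arbitrarily, and the 90% default merely mirrors the lowest Gly/Ser percentage mentioned above for flexible linkers.

```python
def repeat_linker(unit: str, repeats: int) -> str:
    """Concatenate `repeats` copies of a linker unit such as GGGGS."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return unit * repeats


def gly_ser_fraction(linker: str) -> float:
    """Fraction of residues in the linker that are Gly (G) or Ser (S)."""
    if not linker:
        raise ValueError("linker must not be empty")
    return sum(1 for aa in linker.upper() if aa in "GS") / len(linker)


def is_gly_ser_rich(linker: str, minimum: float = 0.90) -> bool:
    """True if at least `minimum` of the residues are Gly or Ser.

    Assumption: 0.90 is used only because it is the lowest percentage
    listed in the text for Gly/Ser-rich flexible linkers.
    """
    return gly_ser_fraction(linker) >= minimum


if __name__ == "__main__":
    linker = repeat_linker("GGGGS", 3)  # illustrative repeat count
    print(linker, len(linker), is_gly_ser_rich(linker))
    # An XTEN-type linker from the text is not Gly/Ser-rich by this measure.
    print(gly_ser_fraction("SGSETPGTSESATPES"))
```

A composition check of this kind says nothing about actual flexibility in a folded fusion protein; it only screens sequences against the stated Gly/Ser criterion.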

[0092] In some embodiments, the non-peptide linker can include any of a number of known chemical linkers. Exemplary chemical linkers can include one or more units of beta-alanine, 4-aminobutyric acid (GABA), (2-aminoethoxy)acetic acid (AEA), 6-aminohexanoic acid (Ahx), PEG multimers, and trioxatridecane-succinamic acid (Ttds). In some embodiments, the non-peptide linker includes one or more units of polyethylene glycol (PEG). PEG is commonly used as a linker in the conjugation of polypeptide domains due to its water solubility, lack of toxicity, low immunogenicity, and well-defined chain length. See, for example, Ramirez-Paz, J., et al., PLoS One 13(7):e0197643 (2018). The number of PEG linking units can be selected based on the desired length of the linker.

[0093] Variant proteins containing non-peptide linkers can be produced in a variety of ways. For example, the site-specific nuclease and recruiter domains can be produced separately (e.g., in vitro or by expression in and purification from host cells) and chemically linked in vitro. In some embodiments, the site-specific nuclease, recruiter domain, and linker can each be produced separately in vitro and chemically linked. A variety of chemical linkers can be used to bridge the two amino acid residues.

[0094] Also contemplated herein are embodiments in which the site-specific nuclease and recruiter domain described above are used separately (e.g., introduced into a cell separately or applied to a target nucleic acid separately) to provide a complex without the use of a linker as described above. Various methods for forming complexes between two or more polypeptides are known in the art, including, but not limited to, using protein-protein interaction strategies (e.g., SunTag, coiled coil, etc.), using RNA aptamers and related binding proteins (e.g., MS2, N22, etc.), and tag:catcher strategies. For example, a site-specific nuclease of the present disclosure may comprise an MS2 RNA aptamer, which would promote interaction with a recruiter domain comprising an MS2 coat protein.

[0095] In some embodiments, the fusion proteins provided herein comprise a targeting sequence that mediates localization (or retention) of a protein to a subcellular location, such as the plasma membrane or the membrane of a given organelle, such as the nucleus, cytosol, mitochondria, endoplasmic reticulum (ER), Golgi, chloroplast, apoplast, peroxisome, or other organelle. For example, the targeting sequence can target a protein (e.g., a nuclease) to the nucleus using a nuclear localization signal (NLS), to the outside of the nucleus of the cell, e.g., to the cytoplasm, using a nuclear export signal (NES), to mitochondria using a mitochondrial targeting signal, to the endoplasmic reticulum (ER) using an ER retention signal, to peroxisomes using a peroxisomal targeting signal, to the plasma membrane using a membrane localization signal, or a combination thereof. In some embodiments, the fusion protein comprises a nuclear localization signal. Non-limiting examples of NLSs include the NLS of the SV40 virus large T antigen having the amino acid sequence PKKKRKV (SEQ ID NO: 37); an NLS from nucleoplasmin (e.g., the nucleoplasmin bipartite NLS having the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 38)); a c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO: 39) or RQRRNELKRSP (SEQ ID NO: 40); the hRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 41); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 42) of the IBB domain from importin alpha; the sequences VSRKRPRP (SEQ ID NO: 43) and PPKKARED (SEQ ID NO: 44) of the myoma T protein; the sequence PQPKKKPL (SEQ ID NO: 45) of human p53; the sequence SALIKKKKKMAP (SEQ ID NO: 46) of mouse c-abl IV; the sequences DRLRR (SEQ ID NO: 47) and PKQKKRK (SEQ ID NO: 48) of influenza virus NS1; the sequence RKLKKKIKKL (SEQ ID NO: 49) of hepatitis virus delta antigen; the sequence REKKKFLKRR (SEQ ID NO: 50) of mouse Mx1 protein; the sequence KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 51) of human poly(ADP-ribose) polymerase; the sequence RKCLQAGMNLEARKTKK (SEQ ID NO: 52) of the human glucocorticoid steroid hormone receptor; and the sequence KRPRDRHDGELGGRKRAR (SEQ ID NO: 53) of Agrobacterium VirD2 protein.
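As a simple illustration, the presence of one of the listed NLS sequences in a candidate protein could be checked by exact string matching, as sketched below. The dictionary holds only a subset of the NLS sequences recited above, and the function name and the test protein are hypothetical. Exact matching is an assumption of this sketch; it does not predict whether an NLS is functional in a given protein context.

```python
# Subset of the NLS sequences recited in the text, keyed by source protein.
NLS_MOTIFS = {
    "SV40 large T antigen": "PKKKRKV",
    "nucleoplasmin bipartite": "KRPAATKKAGQAKKKK",
    "c-myc": "PAAKRVKLD",
    "Agrobacterium VirD2": "KRPRDRHDGELGGRKRAR",
}


def find_nls(protein: str) -> list[tuple[str, int]]:
    """Return (motif name, 0-based start) for every exact NLS match,
    sorted by position in the protein sequence."""
    seq = protein.upper()
    hits = []
    for name, motif in NLS_MOTIFS.items():
        start = seq.find(motif)
        while start != -1:
            hits.append((name, start))
            start = seq.find(motif, start + 1)  # allow overlapping matches
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Hypothetical sequence carrying an SV40 NLS and a c-myc NLS.
    print(find_nls("MAPKKKRKVGGSPAAKRVKLD"))
```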

[0096] In some embodiments, the fusion proteins provided herein comprise an amino acid sequence having at least 70% (e.g., at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identity to SEQ ID NO: 11 or 13.

[0097] Any of the polypeptides and fusion proteins described herein can further comprise a detectable moiety, such as a fluorescent protein or a fragment thereof. Examples of fluorescent proteins include, but are not limited to, yellow fluorescent protein (YFP, e.g., Venus), green fluorescent protein (GFP), and red fluorescent protein (RFP), as well as derivatives of these proteins, such as mutant derivatives. See, for example, Chudakov et al., "Fluorescent Proteins and Their Applications in Imaging Living Cells and Tissues," Physiological Reviews 90(3):1103-1163 (2010) and Specht et al., "A Critical and Comparative Review of Fluorescent Tools for Live-Cell Imaging," Annual Review of Physiology 79:93-117 (2017).

[0098] Any of the polypeptides described herein can further comprise an affinity tag, such as a polyhistidine tag (e.g., (His)6 (SEQ ID NO: 54)), an HA tag (e.g., YPYDVPDYA (SEQ ID NO: 55)), albumin binding protein, alkaline phosphatase, an AU1 epitope, an AU5 epitope, a biotin carboxy carrier protein (BCCP), a FLAG epitope (e.g., DYKDDDDK (SEQ ID NO: 56)), or a MYC epitope (e.g., EQKLISEEDL (SEQ ID NO: 57)), to name a few. See Kimple et al., "Overview of Affinity Tags for Protein Purification," Curr. Protoc. Protein Sci. 73:Unit-9.9 (2013).

[0099] D. Variant Variants of the polypeptides of the present disclosure are also provided herein. Unless otherwise explicitly stated, polypeptide variants retain their respective biological activity. For example, variants of site-specific nuclease polypeptides retain the biological function of the full-length native sequence site-specific nuclease. In another example, variants of recruiter domains retain the biological function of the full-length native sequence recruiter domain.

[0100] Modifications to any of the polypeptides or proteins provided herein can be made by known methods. For example, modifications can be made by site-directed mutagenesis of nucleotides in a nucleic acid encoding the polypeptide, thereby generating DNA encoding the modification, which can then be expressed in recombinant cell culture to produce the encoded polypeptide. Techniques for making substitution mutations at predetermined sites in DNA having a known sequence are well known. For example, M13 primer mutagenesis and PCR-based mutagenesis methods can be used to make one or more substitution mutations. Any of the nucleic acid sequences provided herein can be codon-optimized to alter, e.g., maximize, expression in a host cell or organism.

[0101] The amino acids in the polypeptides described herein can be any of the 20 naturally occurring amino acids, D stereoisomers of naturally occurring amino acids, unnatural amino acids, and chemically modified amino acids. Unnatural amino acids (i.e., those not found in proteins in nature) are also known in the art, as shown, for example, in Zhang et al., "Protein engineering with unnatural amino acids," Curr. Opin. Struct. Biol. 23(4):581-587 (2013); Xie et al., "Adding amino acids to the genetic repertoire," 9(6):548-54 (2005); and all references cited therein. β- and γ-amino acids are known in the art and are also contemplated herein as unnatural amino acids.

[0102] As used herein, a chemically modified amino acid refers to an amino acid whose side chain has been chemically modified. For example, the side chain can be modified to include a signaling moiety, such as a fluorophore or a radiolabel. The side chain can also be modified to include a new functional group, such as a thiol, a carboxylic acid, or an amino group. Post-translationally modified amino acids are also included in the definition of chemically modified amino acids.

[0103] Conservative amino acid substitutions are also contemplated. For example, conservative amino acid substitutions may be made at one or more amino acid residues, for example, at one or more lysine residues in any of the polypeptides provided herein. Those skilled in the art will know that a conservative substitution is the replacement of an amino acid residue with another amino acid residue that is biologically and / or chemically similar. The following eight groups each contain amino acids that are conservative substitutions for each other: 1) alanine (A), glycine (G); 2) aspartic acid (D), glutamic acid (E); 3) asparagine (N), glutamine (Q); 4) arginine (R), lysine (K); 5) isoleucine (I), leucine (L), methionine (M), valine (V); 6) phenylalanine (F), tyrosine (Y), tryptophan (W); 7) serine (S), threonine (T); and 8) cysteine (C), methionine (M).

[0104] For example, when arginine is substituted with serine, conservative substitutions of serine (e.g., threonine) are also contemplated. Non-conservative substitutions, such as lysine with asparagine, are also contemplated.
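The eight groups listed above lend themselves to a simple lookup. The following sketch, offered only as an illustration, encodes the groups exactly as recited (with methionine appearing in both group 5 and group 8) and tests whether a given replacement is conservative under that grouping. The function name is hypothetical, and treating an unchanged residue as "not a substitution" is a choice made for this example.

```python
# The eight conservative-substitution groups as recited in the text
# (one-letter codes). Methionine (M) is a member of groups 5 and 8.
CONSERVATIVE_GROUPS = (
    frozenset("AG"),    # 1) alanine, glycine
    frozenset("DE"),    # 2) aspartic acid, glutamic acid
    frozenset("NQ"),    # 3) asparagine, glutamine
    frozenset("RK"),    # 4) arginine, lysine
    frozenset("ILMV"),  # 5) isoleucine, leucine, methionine, valine
    frozenset("FYW"),   # 6) phenylalanine, tyrosine, tryptophan
    frozenset("ST"),    # 7) serine, threonine
    frozenset("CM"),    # 8) cysteine, methionine
)


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall within at least one of the eight groups."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues: no substitution took place
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    print(is_conservative("R", "K"))  # arginine -> lysine, group 4
    print(is_conservative("K", "N"))  # lysine -> asparagine, no shared group
```

Residues that appear in none of the groups, such as proline and histidine, are never classified as conservative replacements by this lookup.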

[0105] IV. Recombinant Nucleic Acids, Constructs, Vectors and Host Cells Also provided herein are recombinant nucleic acids encoding any of the polypeptides described herein. For example, recombinant nucleic acids encoding a polypeptide having at least 70% identity (e.g., at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) to SEQ ID NO: 12 or 14 are also provided. Recombinant nucleic acids having at least 70% identity to SEQ ID NO: 12 or 14 are also provided.

[0106] Also provided are DNA constructs comprising a promoter operably linked to a recombinant nucleic acid encoding a fusion protein or domain thereof as described herein. A nucleic acid is "operably linked" when it is placed into a functional relationship with another nucleic acid sequence. Many promoters can be used in the constructs described herein. A promoter is a region or sequence located upstream and / or downstream of the transcription start site that is involved in the recognition and binding of RNA polymerase and other proteins to initiate transcription.

[0107] The term "promoter," as used herein, refers to a nucleotide sequence that controls the expression of a coding sequence by providing recognition for RNA polymerase and other factors necessary for proper transcription, typically located upstream (5') of that coding sequence. A "promoter regulatory sequence" consists of proximal and more distal upstream elements. Promoter regulatory sequences affect the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences include enhancers, promoters, untranslated leader sequences, introns, and polyadenylation signal sequences. These include natural and synthetic sequences, as well as sequences that may be a combination of natural and synthetic sequences. An "enhancer" is a DNA sequence that can stimulate promoter activity and may be an intrinsic element of the promoter or a heterologous element inserted to increase the level or tissue specificity of the promoter. It is capable of operating in both orientations (normal or inverted) and can function when moved either upstream or downstream from the promoter. The term "promoter" includes "promoter regulatory sequence."

[0108] The choice of which promoter to include depends on several factors, including but not limited to efficiency, selectability, inducibility, desired expression level, and cell- or tissue-preferred expression. It is routine for those skilled in the art to regulate the expression of a sequence by appropriate selection and placement of promoters and other regulatory regions relative to that sequence.

[0109] Certain promoters have been shown to be capable of directing RNA synthesis at higher rates than others. These are called "strong promoters." Certain other promoters have been shown to direct RNA synthesis at higher levels only in certain types of cells or tissues; when a promoter preferentially directs RNA synthesis to a particular tissue (while RNA synthesis may occur at lower levels in other tissues), it is often referred to as a "tissue-specific promoter" or "tissue-preferred promoter." Because the expression pattern of a chimeric gene (or genes) introduced into plants is controlled using a promoter, there is ongoing interest in isolating novel promoters capable of controlling the expression of a chimeric gene (or genes) to a consistent level in specific tissue types or at specific plant developmental stages.

[0110] Certain promoters are capable of directing relatively similar levels of RNA synthesis in all tissues of a plant. These are called "constitutive promoters" or "tissue-independent" promoters. Constitutive promoters can be divided into strong, intermediate, and weak categories based on their effectiveness in directing RNA synthesis. Constitutive promoters are particularly useful in this regard, since in many cases, a chimeric gene (or genes) must be expressed simultaneously in different plant tissues to obtain the desired function of the gene (or genes). Although many constitutive promoters have been discovered and characterized from plants and plant viruses, there is still ongoing interest in isolating more novel synthetic or natural constitutive promoters capable of controlling the expression of chimeric genes (or genes) at different levels for gene stacking, thereby controlling the expression of multiple genes in the same transgenic plant.

[0111] Among the most commonly used promoters are the nopaline synthase (NOS) promoter (Ebert et al., Proc. Natl. Acad. Sci. USA 84:5745-5749 (1987)); the octopine synthase (OCS) promoter; caulimovirus promoters such as the cauliflower mosaic virus (CaMV) 19S promoter (Lawton et al., Plant Mol. Biol. 9:315-324 (1987)); the light-inducible promoter from the small subunit of Rubisco (Pellegrineschi et al., Biochem. Soc. Trans. 23(2):247-250 (1995)); the Adh promoter (Walker et al., Proc. Natl. Acad. Sci. USA 84:6624-6628 (1987)); the sucrose synthase promoter (Yang et al., Proc. Natl. Acad. Sci. USA 87:4144-4148 (1990)); the R gene complex promoter (Chandler et al., Plant Cell 1:1175-1183 (1989)); and the chlorophyll a / b binding protein gene promoter.

[0112] Furthermore, it is contemplated that promoters that combine elements from two or more promoters may be useful. For example, U.S. Patent No. 5,491,288 discloses combining a cauliflower mosaic virus promoter with a histone promoter. Thus, elements from the promoters disclosed herein may be combined with elements from other promoters. Promoters useful for plant transgene expression include inducible, viral, synthetic, constitutive (Odell Nature 313:810-812 (1985)), temporally regulated, spatially regulated, tissue-specific, and spatiotemporally regulated promoters. Using the regulatory elements described herein, many agronomic genes can be expressed in transformed plants. More specifically, plants can be genetically engineered to express a variety of phenotypes of agronomic interest.

[0113] In some embodiments of the DNA constructs provided herein, the promoter can be a eukaryotic or prokaryotic promoter. In some embodiments, the promoter is an inducible promoter, a naturally occurring inducible promoter (e.g., drought-inducible Rab17), a synthetic inducible promoter (e.g., auxin-inducible DR5, estradiol-inducible XVE / pLex, dexamethasone-inducible GVG / Gal4), a constitutive promoter (e.g., ZmUbq1, OsAct1, OsTub3, EF), an egg cell-specific promoter (e.g., EC1, EC2, EC3, EC4, EC5), a pollen-specific promoter, an apical meristem-specific promoter, or a promoter with enhanced expression in the zygote. In some embodiments, the promoter is a floral mosaic promoter (e.g., ZmBde1, OsAP1). In some embodiments, the promoter is a ubiquitin 4 promoter, an actin promoter, a tubulin promoter, a MADS box promoter, or a plant virus promoter. Suitable promoters are disclosed, for example, in U.S. Pat. No. 10,519,456, the entire contents of which are incorporated herein by reference, and PCT / US2022 / 020690, the entire contents of which are incorporated herein by reference.

[0114] The recombinant nucleic acids provided herein can be included in an expression cassette for expression in a host cell or organism of interest. The cassette will include 5' and 3' regulatory sequences operably linked to the recombinant nucleic acids provided herein, allowing for expression of the fusion protein. The cassette may additionally contain at least one additional gene or genetic element to be co-transformed into the cell or organism. When additional genes or elements are included, the components are operably linked. Alternatively, the additional genes or elements can be provided on multiple expression cassettes. Such expression cassettes comprise multiple restriction and / or recombination sites for insertion of polynucleotides under the transcriptional control of the regulatory regions. The expression cassette may additionally contain a selectable marker gene. The expression cassette will include, in the 5' to 3' transcriptional direction, a transcriptional and translational initiation region (i.e., promoter) functional in the cell or organism of interest, a polynucleotide of the invention, and a transcriptional and translational termination region (i.e., termination region). A promoter of the invention is capable of directing or driving expression of a coding sequence (i.e., a nucleic acid sequence that is transcribed into RNA, such as mRNA, rRNA, tRNA, snRNA, ncRNA, lncRNA, sense RNA, or antisense RNA, whether or not that RNA is then translated to produce a protein) in a host cell. The regulatory regions (i.e., promoter, transcriptional regulatory region, and translation termination region) can be endogenous or heterologous to the host cell or to each other. As used herein, "heterologous" with respect to a sequence refers to a sequence that is derived from a foreign species or, if derived from the same species, has been substantially altered from its native form in composition and / or genomic locus by deliberate human intervention.

[0115] Additional regulatory signals include, but are not limited to, a start site for transcription initiation, operators, activators, enhancers, other regulatory elements, ribosome binding sites, start codons, termination signals, etc. See Sambrook et al. (1992) Molecular Cloning: A Laboratory Manual, ed. Maniatis et al. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY); Davis et al., eds. (1980) Advanced Bacterial Genetics (Cold Spring Harbor Laboratory Press), Cold Spring Harbor, NY, and references cited therein.

[0116] The expression cassette may also contain a selectable marker gene for selecting transformed cells. Marker genes include genes that confer antibiotic resistance, such as those that confer hygromycin resistance, ampicillin resistance, gentamicin resistance, and neomycin resistance, to name a few. Additional selectable markers are known, and any may be used.

[0117] In preparing expression cassettes, various DNA fragments can be manipulated to provide the DNA sequences in the proper orientation and, if necessary, in the proper reading frame. To this end, adapters or linkers can be used to join the DNA fragments, or other manipulations can be involved to provide convenient restriction sites, remove excess DNA, remove restriction sites, etc. To this end, in vitro mutagenesis, primer repair, restriction, annealing, resubstitutions, such as transitions and transversions, can be involved.

[0118] In preparing expression cassettes, various DNA fragments can be manipulated to provide the DNA sequences in the proper orientation and, if necessary, in the proper reading frame. To this end, adapters or linkers can be used to join the DNA fragments, or other manipulations can be involved to provide convenient restriction sites, remove excess DNA, remove restriction sites, etc. In vitro mutagenesis, primer repair, restriction, annealing, and resubstitutions, such as transitions and transversions, can be used for this purpose.

[0119] Also provided are vectors containing the recombinant nucleic acids or DNA constructs described herein. It is contemplated that the vectors have the necessary functional elements to direct and regulate the transcription of the inserted nucleic acid. Such functional elements include, but are not limited to, a promoter, a region upstream or downstream of the promoter, such as an enhancer that can regulate the transcriptional activity of the promoter, an origin of replication, a restriction site suitable for facilitating the cloning of an insert adjacent to the promoter, an antibiotic resistance gene or other marker that can be useful for selecting cells containing the vector or a vector containing the insert, an RNA splice junction, a transcription termination region, or any other region that can be useful for promoting the expression of the inserted gene or hybrid gene. Such functional elements are generally described in Sambrook et al., Molecular Cloning: A Laboratory Manual, 4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 2012. The vector can be, for example, a plasmid.

[0120] Many E. coli expression vectors useful for expressing nucleic acids are known to those skilled in the art. Other microbial hosts suitable for use include bacilli such as Bacillus subtilis, other Enterobacteriaceae bacteria such as Salmonella and Serratia, and various Pseudomonas species. Expression vectors can also be made in these prokaryotic hosts, which will typically contain expression control sequences compatible with the host cell (e.g., an origin of replication). Additionally, several well-known promoters exist, such as the lactose promoter system, the tryptophan (Trp) promoter system, the beta-lactamase promoter system, or promoter systems from lambda phage. Additionally, yeast expression can be used. Provided herein are nucleic acids encoding the polypeptides of the invention, which can be expressed by yeast cells. More specifically, the nucleic acid can be expressed by Pichia pastoris or S. cerevisiae.

[0121] Mammalian cells also allow for the expression of proteins in an environment that favors important post-translational modifications, such as folding and cysteine pairing, addition of complex carbohydrate structures, and secretion of active proteins. Vectors useful for expressing active proteins in mammalian cells are known in the art and can include genes that confer hygromycin resistance, geneticin, or G418 resistance, or other genes or phenotypes suitable for use as selectable markers or methotrexate resistance for gene amplification. A number of suitable host cell lines capable of secreting intact human proteins have been developed in the art, including CHO cells, HeLa cells, HEK-293 cells, HEK-293T cells, U2OS cells, or any other primary or transformed cell line. Other suitable host cell lines include COS-7 cells, myeloma cell lines, Jurkat cells, and the like. Expression vectors for these cells may contain expression control sequences such as a replication origin, a promoter, an enhancer, and necessary information processing sites such as a ribosome binding site, an RNA splice site, a polyadenylation site, and a transcription terminator sequence. Preferred expression control sequences are promoters derived from immunoglobulin genes, SV40, adenovirus, bovine papilloma virus, etc.

[0122] The expression vectors described herein can also include nucleic acids as described herein under the control of an inducible promoter, such as a tetracycline-inducible promoter or a glucocorticoid-inducible promoter. The nucleic acids of the invention can also be under the control of a tissue-specific promoter that drives expression of the nucleic acid in specific cells, tissues, or organs. Any regulatable promoter, of which many examples are known in the art, is also contemplated, such as metallothionein promoters, heat shock promoters, and other regulatable promoters. Additionally, Cre-loxP inducible systems and Flp recombinase inducible promoter systems can also be used, both of which are known in the art.

[0123] Insect cells also allow the expression of polypeptides. Recombinant proteins produced in insect cells with baculovirus vectors undergo post-translational modifications similar to wild-type mammalian proteins.

[0124] Also provided herein are host cells comprising the recombinant nucleic acids, DNA constructs, and / or vectors described herein, as well as methods for making such cells. In some embodiments, the cell is a plant cell. In some embodiments, the plant cell is a corn plant cell, a soybean plant cell, a rice plant cell, a wheat plant cell, or a sunflower plant cell.

[0125] Host cells comprising a nucleic acid or vector described herein are provided. The host cell can be an in vitro, ex vivo, or in vivo host cell. The host cell as provided herein is capable of expressing a fusion protein. Cell populations of any of the host cells described herein are also provided. In some embodiments, the cell population comprises a plurality of cells, wherein the plurality of cells comprises a recombinant nucleic acid encoding a fusion protein as described herein. In some embodiments, the cell population comprises a plurality of cells, wherein the plurality of cells comprises a DNA construct encoding a fusion protein as described herein. In some embodiments, the cell population comprises a plurality of cells, wherein the plurality of cells comprises a vector comprising a recombinant nucleic acid or DNA construct encoding a fusion protein as described herein. In some embodiments, the cell population comprises a plurality of cells, wherein the plurality of cells comprises a plurality of any of the host cells described herein. In some embodiments, the plurality of cells of any of the cell populations described herein express a fusion protein as described herein.

[0126] In some embodiments, the provided cells stably or transiently express the fusion protein. Stable expression of a fusion protein in a cell refers to the integration of any of the nucleic acids, DNA constructs, or vectors described herein into the genome of the cell, thereby enabling the cell to express the fusion protein. Transient expression refers to the direct expression of the fusion protein from any of the nucleic acids, DNA constructs, and / or vectors after introduction into the cell (i.e., the gene encoding the fusion protein is not integrated into the genome of the cell).

[0127] In some embodiments, the provided cells constitutively or inducibly express the fusion protein. Constitutive expression refers to ongoing, continuous expression of a gene (i.e., a protein), while inducible expression refers to gene (protein) expression in response to a stimulus. Inducible expression is generally regulated by an inducible promoter, a description of which is included above.

[0128] Also provided are cell cultures comprising one or more host cells described herein. Numerous cell culture and production methods are available in the art, including cells of bacterial origin (e.g., E. coli and other bacterial strains), animal origin (especially mammalian origin), and archaebacterial origin. See, for example, Sambrook, supra; Ausubel, ed. (1995) Current Protocols in Molecular Biology, John Wiley & Sons; Freshney (1994) Culture of Animal Cells, a Manual of Basic Technique, 3rd Ed., Wiley-Liss, New York and references cited therein; Doyle and Griffiths (1997) Mammalian Cell Culture: Essential Techniques, John Wiley and Sons, NY; Humason (1979) Animal Tissue Techniques, 4th Ed., W.H. Freeman and Company; and Ricciardelli et al. (1989) In Vitro Cell. Dev. Biol. 25:1016-1024.

[0129] The host cell can be a prokaryotic cell, including, for example, a bacterial cell. Alternatively, the cell can be a eukaryotic cell, such as a plant cell, yeast cell, insect cell, avian cell, or mammalian cell. The plant cell can be a field or greenhouse crop plant cell, including, but not limited to, broad-acre crop plants, fruits and vegetables, perennial woody plants, and ornamental plants. In some embodiments, the plant cell is a plant cell of sugarcane, pumpkin, corn (maize), wheat, rice, cassava, soybean, hay, potato, cotton, tomato, alfalfa, or green algae. In some embodiments, the plant cell is a plant cell of any vegetable, such as cabbage, turnip, carrot, parsnip, beetroot, lettuce, bean, broad bean, pea, potato, eggplant, tomato, cucumber, pumpkin, squash, onion, garlic, leek, pepper, spinach, yam, sweet potato, or cassava. In some embodiments, the plant cells are corn, soybean, sunflower, tomato, rice, or wheat plant cells. In some embodiments, the mammalian cells can be HEK-293T cells, HEK-293 cells, Chinese hamster ovary (CHO) cells, U2OS cells, COS-7 cells, HeLa cells, or any other primary or transformed cells. A number of other suitable host cell lines have been developed, including various tumor cell lines such as myeloma cell lines, fibroblast cell lines, and melanoma cell lines. Vectors containing the nucleic acid segment of interest can be transferred or introduced into host cells by well-known methods that vary depending on the type of cellular host.

[0130] As used herein, the phrase "introducing," in the context of introducing a nucleic acid into a cell (e.g., a prokaryotic, bacterial, eukaryotic, or plant cell), refers to changing the location of a nucleic acid sequence from outside the cell to inside the cell. Optionally, introducing refers to changing the location of a nucleic acid from outside the cell to inside the nucleus of the cell. When two or more nucleic acid molecules are to be introduced, the nucleic acid molecules can be assembled as part of a single polynucleotide or nucleic acid construct or as separate polynucleotides or nucleic acid constructs, and can be located on the same or different nucleic acid constructs. Thus, such polynucleotides can be introduced into a cell (e.g., a plant cell) in a single transformation event, in separate transformation events, or, for example, as part of a breeding protocol. Various methods of introducing nucleic acids into cells are contemplated, including, but not limited to, electroporation, nanoparticle delivery, biolistic transformation, viral delivery, contact with nanowires or nanotubes, receptor-mediated internalization, cell-penetrating peptide-mediated transfer, liposome-mediated transfer, DEAE-dextran, lipofectamine, calcium phosphate, or any method now known or later identified for the introduction of nucleic acids into prokaryotic or eukaryotic hosts. Targeted nuclease systems (e.g., RNA-guided nucleases, transcription activator-like effector nucleases (TALENs), zinc finger nucleases (ZFNs), or megaTALs (MTs)) can also be used to introduce nucleic acids, such as nucleic acids encoding the fusion proteins described herein, into host cells. See Li et al., Signal Transduction and Targeted Therapy 5, Article No. 1 (2020).

[0131] Transformation of cells can be stable or transient. Thus, transgenic cells, plant cells, plants, and / or plant parts of the present invention can be stably transformed or transiently transformed. Transformation can refer to the transfer of a nucleic acid molecule into the genome of a host cell, resulting in stable genetic inheritance. In some embodiments, introduction into plants, plant parts, and / or plant cells is via bacterial-mediated transformation, particle bombardment transformation, calcium phosphate-mediated transformation, cyclodextrin-mediated transformation, electroporation, liposome-mediated transformation, nanoparticle-mediated transformation, polymer-mediated transformation, virus-mediated nucleic acid delivery, whisker-mediated nucleic acid delivery, microinjection, sonication, infiltration, polyethylene glycol-mediated transformation, protoplast transformation, or any other electrical, chemical, physical, and / or biological mechanism, or any combination thereof, that results in the introduction of a nucleic acid into a plant, plant part, and / or cell thereof.

[0132] Plant transformation procedures are well known and routine in the art and are described throughout this specification. Non-limiting examples of plant transformation methods include transformation by bacterial-mediated nucleic acid delivery (e.g., by bacteria from the genus Agrobacterium), virus-mediated nucleic acid delivery, silicon carbide or nucleic acid whisker-mediated nucleic acid delivery, liposome-mediated nucleic acid delivery, microinjection, microparticle bombardment, calcium phosphate-mediated transformation, cyclodextrin-mediated transformation, electroporation, nanoparticle-mediated transformation, sonication, infiltration, PEG-mediated nucleic acid uptake, and any other electrical, chemical, physical (mechanical), and / or biological mechanism that results in the introduction of nucleic acid into plant cells, including any combination thereof. General guides to various plant transformation methods known in the art include Miki et al. ("Procedures for Introducing Foreign DNA into Plants" in Methods in Plant Molecular Biology and Biotechnology, Glick, BR and Thompson, JE, Eds. (CRC Press, Inc., Boca Raton, 1993), pages 67-88) and Rakoczy-Trojanowska (Cell Mol Biol Lett 7:849-858 (2002)).

[0133] Agrobacterium-mediated transformation is a commonly used method for transforming plants due to its high transformation efficiency and its versatility with many different species. Agrobacterium-mediated transformation typically involves transferring a binary vector carrying the foreign DNA of interest into a suitable Agrobacterium strain, the choice of which may depend on the complement of vir genes carried by the host Agrobacterium strain either on a coexisting Ti plasmid or on the chromosome (Uknes et al. 1993, Plant Cell 5:159-169). Transfer of the recombinant binary vector into Agrobacterium can be achieved by a triparental mating procedure using Escherichia coli carrying the recombinant binary vector and a helper E. coli strain carrying a plasmid capable of mobilizing the recombinant binary vector into the target Agrobacterium strain. Alternatively, the recombinant binary vector can be transferred into Agrobacterium by nucleic acid transformation (Hoefgen and Willmitzer 1988, Nucleic Acids Res 16:9877).

[0134] Transformation of plants with recombinant Agrobacterium usually involves co-cultivation of the Agrobacterium with an explant from the plant, followed by methods well known in the art. The transformed tissue, which carries an antibiotic or herbicide resistance marker between the binary plasmid T-DNA borders, is typically regenerated on selective media.

[0135] Another method for transforming plants, plant parts, and plant cells involves projecting inert or biologically active particles into plant tissues and cells. See, e.g., U.S. Patent Nos. 4,945,050, 5,036,006, and 5,100,792. Generally, this method involves projecting inert or biologically active particles into plant cells under conditions effective to penetrate the outer surface of the cells and cause their internalization. When inert particles are used, the vector can be introduced into the cells by coating the particles with a vector containing the nucleic acid of interest. Alternatively, one or more cells can be surrounded by the vector, resulting in the vector being carried into the cells following the particle. Biologically active particles (e.g., dried yeast cells, dried bacteria, or bacteriophage, each containing one or more nucleic acids to be introduced) can also be projected into plant tissue. As used herein, the phrase "biolistic transformation" refers to a method of directly introducing RNA or DNA into a cell (e.g., a plant cell) by mixing the RNA or DNA with heavy metal particles (e.g., tungsten or gold) and releasing them into the cell (e.g., a plant cell) using high-velocity pressure, allowing the RNA or DNA to penetrate the cell (e.g., penetrate the plant cell wall).

[0136] CRISPR / Cas systems can also be used to edit the genome of a host cell or organism. As detailed above, the "CRISPR / Cas" system refers to a broad class of bacterial systems for defense against foreign nucleic acids. Any of the CRISPR / Cas system components described herein can be used to introduce a fusion protein, recombinant nucleic acid, or system into the genome of a host cell or organism. CRISPR / Cas system-mediated genome editing methods are known in the art. It will be understood that the introduction of a fusion protein, recombinant nucleic acid, or system described herein into the genome of a host cell or organism using a CRISPR / Cas system will differ from the detailed methods and systems provided herein.

[0137] Any of the fusion proteins described herein can be purified or isolated from a host cell or a population of host cells. For example, a recombinant nucleic acid encoding any of the fusion proteins described herein can be introduced into a host cell under conditions that allow expression of the fusion protein. In some embodiments, the recombinant nucleic acid is codon-optimized for expression. After expression in the host cell, the fusion protein can be isolated or purified using purification methods known in the art.

[0138] V. System In another aspect, provided herein is a system useful for editing one or more nucleic acids. The system comprises one or more of the fusion proteins (or recombinant nucleic acids, constructs, vectors, or host cells) described above. In some embodiments, the system further comprises one or more additional elements useful for editing one or more nucleic acids. For example, the system provided herein may further comprise a donor polynucleotide. As another example, a system comprising a fusion protein comprising a Cas nuclease may further comprise one or more guide nucleic acids and / or one or more donor polynucleotide sequences. Donor polynucleotides and guide nucleic acids are described in more detail below. The system provided herein is useful for practicing the methods described in Section VI of this disclosure.

[0139] A. Donor Polynucleotide The disclosed systems and methods may include a donor polynucleotide. A "donor polynucleotide," "donor molecule," or "donor template" is a target polynucleotide, typically a nucleotide polymer or oligomer, intended for insertion into a target genomic site. The donor sequence may be one or more transgenes, expression cassettes, or nucleotide sequences of interest. The donor molecule may be a single-stranded, partially double-stranded, or double-stranded donor DNA molecule. The donor polynucleotide may be a natural or modified polynucleotide, an RNA-DNA chimera, or a DNA fragment (either a single-stranded, at least partially double-stranded, or fully double-stranded DNA molecule), or a PCR-amplified ssDNA or at least partially dsDNA fragment. In some embodiments, the donor DNA molecule is part of a circularized DNA molecule. In some cases, fully double-stranded donor DNA can provide increased stability, as dsDNA fragments are generally more resistant to nuclease degradation than ssDNA.

[0140] In some embodiments, the donor polynucleotide comprises at least one recruiting sequence to which the recruiter domain of at least one fusion protein provided herein binds. In some embodiments, the donor polynucleotide comprises 2, 3, 4, 5, 6, 7, 8, or more recruiting sequences. In some embodiments, two or more recruiting sequences are the same sequence. In some embodiments, two or more recruiting sequences are different sequences. In some embodiments, the recruiting sequence comprises at least 10 (e.g., at least 12, at least 14, at least 16, at least 18, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, or more) contiguous nucleotides that are at least 70% identical to a recognition motif specifically bound by the recruiter domain. In some embodiments, the recruiting sequence comprises a naturally occurring sequence that is specifically bound by a DNA-binding protein. In some embodiments, the recruiting sequence comprises one or more modifications to a naturally occurring sequence. In some embodiments, the recruiting sequence comprises a synthetic sequence that is specifically bound by a recruiter domain described herein.
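
The identity criterion above (a stretch of at least 10 contiguous nucleotides that is at least 70% identical to a recognition motif) can be illustrated with a minimal Python sketch. It assumes ungapped, position-by-position comparison on one strand only, which is just one possible way to compute identity; the function names and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
from typing import List, Tuple


def window_identity(window: str, motif: str) -> float:
    """Fraction of matching positions between two equal-length sequences (no gaps)."""
    if len(window) != len(motif):
        raise ValueError("sequences must have equal length")
    matches = sum(a == b for a, b in zip(window.upper(), motif.upper()))
    return matches / len(motif)


def find_recruiting_sites(
    donor: str, motif: str, min_identity: float = 0.70, min_length: int = 10
) -> List[Tuple[int, float]]:
    """Return (start, identity) for each donor window that meets the identity threshold.

    Assumption: only the given strand is scanned and no gaps are allowed.
    """
    if len(motif) < min_length:
        raise ValueError("motif is shorter than the minimum contiguous length")
    hits = []
    for start in range(len(donor) - len(motif) + 1):
        identity = window_identity(donor[start:start + len(motif)], motif)
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


# Hypothetical toy sequences for illustration only (not real operator sequences).
TOY_MOTIF = "GATTACAGGCTC"                       # 12 nt recognition motif
TOY_DONOR = "TTTTTTTT" + "GATTACAGGCTA" + "CCCCCCCC"  # one mismatch in the site

hits = find_recruiting_sites(TOY_DONOR, TOY_MOTIF)
for start, identity in hits:
    print(f"candidate recruiting sequence at {start}: {identity:.0%} identity")
```

A gapped alignment or a scan of both strands would be a reasonable alternative design; the ungapped scan is used here only because it keeps the threshold logic easy to read.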

[0141] In some embodiments, the recruiting sequence is a Cro repressor family protein operator sequence ("Cro O_R"). In some embodiments, the recruiting sequence comprises an N15 O_R3 operator sequence, a lambda O_R3 operator sequence, a P22 O_R3 operator sequence, or a 434 O_R operator sequence. In some embodiments, the recruiting sequence comprises a nucleotide sequence having at least 70% (e.g., at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identity to any one of SEQ ID NOs: 5-10.

[0142] The donor molecule can include at least 10 contiguous nucleotides (often referred to as homology arms) that are at least 70% identical to a genomic nucleotide sequence, such that these contiguous nucleotides are sufficient for homologous recombination of the donor polynucleotide molecule into the genome of the cell with the target genomic DNA sequence, for example, after cleavage by a site-specific nuclease. In some embodiments, the donor polynucleotide molecule can comprise at least about 10, 20, 30, 50, 70, 80, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 7500, 10000, 15,000, or 20,000 nucleotides (including any value within this range not explicitly recited herein), and the donor polynucleotide molecule is at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% identical to a genomic nucleic acid sequence. In some embodiments, the donor molecule comprises at least one homology arm. In some embodiments, the donor molecule comprises two homology arms (which may be referred to as a left homology arm and a right homology arm).

[0143] In some embodiments, the donor polynucleotide molecule may be substantially complementary to a genomic nucleic acid sequence. In some embodiments, the donor polynucleotide molecule comprises a heterologous nucleic acid sequence. In some embodiments, the donor polynucleotide molecule comprises at least one expression cassette. In some embodiments, the donor polynucleotide molecule may comprise a transgene comprising at least one expression cassette. In some embodiments, the donor polynucleotide molecule comprises an allelic modification of a gene native to the target genome. The allelic modification may comprise at least one nucleotide insertion, at least one nucleotide deletion, and / or at least one nucleotide substitution. In some embodiments, the allelic modification may comprise a small insertion or deletion.

[0144] The donor polynucleotide can be any suitable nucleic acid. In some embodiments, the donor polynucleotide is part of a donor template. In some embodiments, the donor template is part of a plasmid or linear nucleic acid. In some embodiments, the donor polynucleotide is part of a chromosome.

[0145] The various sequences of the donor polynucleotides described herein can be organized in any suitable configuration. Exemplary embodiments of donor polynucleotides comprising three recruit sequences and left and right homology arms flanking an insert sequence designed for integration into a target site (e.g., according to the methods described herein) are shown in Figure 1. In some embodiments, the donor polynucleotide comprises two homology arms, where the homology arms are adjacent to the insert sequence. In donor polynucleotide embodiments comprising one or more recruit sequences, the recruit sequences can be upstream and / or downstream of the insert sequence. In some embodiments, the recruit sequence is between the homology arms and the insert sequence. In some embodiments, the recruit sequence is outside of the homology arms. In some embodiments, the donor polynucleotide comprises two or more recruit sequences comprising identical sequences in tandem.
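
Two of the configurations described above can be sketched as a small Python data structure: recruit sequences placed in tandem outside the homology arms, or between the left homology arm and the insert. This is a minimal sketch that models upstream placement only; the class name, field names, and toy sequences are hypothetical and do not correspond to SEQ ID NO: 20 or 21.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DonorDesign:
    """Hypothetical container for the parts of a donor polynucleotide."""
    left_arm: str
    insert: str
    right_arm: str
    recruit_sequences: List[str] = field(default_factory=list)
    # True: recruit sequences sit upstream of (outside) the left homology arm.
    # False: recruit sequences sit between the left homology arm and the insert.
    recruit_outside_arms: bool = True

    def assemble(self) -> str:
        """Concatenate the parts 5' to 3' in the selected configuration."""
        tandem = "".join(self.recruit_sequences)
        if self.recruit_outside_arms:
            return tandem + self.left_arm + self.insert + self.right_arm
        return self.left_arm + tandem + self.insert + self.right_arm


# Toy parts for illustration only; real arms and inserts are far longer.
site = "CACA"
outside = DonorDesign("AAAA", "GGGG", "TTTT", [site] * 3)
inside = DonorDesign("AAAA", "GGGG", "TTTT", [site] * 3, recruit_outside_arms=False)

print(outside.assemble())
print(inside.assemble())
```

Downstream placement, or recruit sequences on both sides of the insert, could be modeled by adding a second list of sites appended after the right homology arm or after the insert.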

[0146] In some embodiments, the donor polynucleotide comprises a nucleotide sequence having at least 70% (e.g., at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identity to SEQ ID NO: 20 or 21.

[0147] B. Guide Nucleic Acid Optionally, the systems and methods described herein include at least one guide nucleic acid polynucleotide. Optionally, the systems and methods described herein include multiple guide nucleic acids. In some embodiments, the polynucleotide may be deoxyribonucleic acid (DNA). Optionally, the DNA sequence may be single-stranded or double-stranded. In some embodiments, the at least one guide nucleic acid polynucleotide may be a ribonucleic acid (guide RNA).

[0148] In some embodiments, the nuclease may be complexed with at least one guide RNA polynucleotide. The at least one guide RNA polynucleotide may comprise a nucleic acid targeting region that comprises a sequence complementary to a nucleic acid sequence on a targeted polynucleotide, such as a targeted genomic locus or gene, conferring sequence specificity for nuclease targeting. In some embodiments, the at least one guide RNA polynucleotide may comprise two separate nucleic acid molecules (which may be referred to as a double-guide nucleic acid) or a single nucleic acid molecule (which may be referred to as a single-guide nucleic acid (e.g., single-guide RNA or sgRNA)). In some embodiments, the guide nucleic acid is a single-guide nucleic acid comprising a fused CRISPR RNA (crRNA) and a trans-activating crRNA (tracrRNA). In some embodiments, the guide nucleic acid is a single-guide nucleic acid comprising a crRNA. In some embodiments, the guide nucleic acid is a single-guide nucleic acid comprising a crRNA but lacking a tracrRNA. In some embodiments, the guide nucleic acid is a double-guide nucleic acid comprising unfused crRNA and tracrRNA. An exemplary double-guide nucleic acid may comprise a crRNA-like molecule and a tracrRNA-like molecule. Exemplary single guide nucleic acids can include crRNA-like molecules. Exemplary single guide nucleic acids can include fused crRNA-like and tracrRNA-like molecules.

[0149] The crRNA can include a nucleic acid targeting segment (e.g., a spacer region) of the guide nucleic acid and a stretch of nucleotides that can form one half of a double-stranded duplex of the Cas protein binding segment of the guide nucleic acid.

[0150] The tracrRNA can include a stretch of nucleotides that forms the other half of the double-stranded duplex of the Cas protein-binding segment of the gRNA. The stretch of nucleotides of the crRNA is complementary to the stretch of nucleotides of the tracrRNA and can hybridize with it to form the double-stranded duplex of the Cas protein-binding domain of the guide nucleic acid.
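
The complementarity between the crRNA stretch and the tracrRNA stretch can be checked with a short Python sketch. It assumes strict Watson-Crick pairing (A-U, G-C) over the whole stretch, which is a simplification: natural duplexes may also contain wobble pairs, bulges, or mismatches. The function names and the toy sequences are hypothetical.

```python
# Watson-Crick complement table for RNA.
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def rna_reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return rna.upper().translate(_RNA_COMPLEMENT)[::-1]


def can_form_duplex(crrna_stretch: str, tracrrna_stretch: str) -> bool:
    """True if the two stretches pair perfectly in antiparallel orientation.

    Assumption: perfect Watson-Crick pairing only; G-U wobble is not counted.
    """
    return tracrrna_stretch.upper() == rna_reverse_complement(crrna_stretch)


# Toy stretches for illustration only (not real repeat or anti-repeat sequences).
toy_crrna_stretch = "GGAUCCAU"
toy_tracrrna_stretch = rna_reverse_complement(toy_crrna_stretch)

print(toy_tracrrna_stretch)
print(can_form_duplex(toy_crrna_stretch, toy_tracrrna_stretch))
```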

[0151] The crRNA and tracrRNA can hybridize to form a guide nucleic acid. The crRNA can also provide a single-stranded nucleic acid targeting segment (e.g., a spacer region) that hybridizes to a target nucleic acid recognition sequence (e.g., a protospacer). The sequence of the crRNA or tracrRNA molecule, including the spacer region, can be designed to be specific for the species in which the guide nucleic acid will be used.

[0152] Whether the nuclease requires only the crRNA molecule or both the crRNA and tracrRNA molecules (covalently linked or not) depends on the CRISPR-associated nuclease used.

[0153] In some embodiments, the nucleic acid targeting region of the guide nucleic acid can be 18 to 72 nucleotides in length. The nucleic acid targeting region of the guide nucleic acid (e.g., spacer region) can have a length of about 12 nucleotides to about 100 nucleotides. For example, the nucleic acid targeting region of the guide nucleic acid (e.g., spacer region) can have a length of about 12 nucleotides (nt) to about 80 nt, about 12 nt to about 50 nt, about 12 nt to about 40 nt, about 12 nt to about 30 nt, about 12 nt to about 25 nt, about 12 nt to about 20 nt, about 12 nt to about 19 nt, about 12 nt to about 18 nt, about 12 nt to about 17 nt, about 12 nt to about 16 nt, or about 12 nt to about 15 nt. Alternatively, the DNA targeting segment can have a length of about 18 nt to about 20 nt, about 18 nt to about 25 nt, about 18 nt to about 30 nt, about 18 nt to about 35 nt, about 18 nt to about 40 nt, about 18 nt to about 45 nt, about 18 nt to about 50 nt, about 18 nt to about 60 nt, about 18 nt to about 70 nt, about 18 nt to about 80 nt, about 18 nt to about 90 nt, about 18 nt to about 100 nt, about 20 nt to about 25 nt, about 20 nt to about 30 nt, about 20 nt to about 35 nt, about 20 nt to about 40 nt, about 20 nt to about 45 nt, about 20 nt to about 50 nt, about 20 nt to about 60 nt, about 20 nt to about 70 nt, about 20 nt to about 80 nt, about 20 nt to about 90 nt, or about 20 nt to about 100 nt. The length of the nucleic acid targeting region can be at least 5, 10, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, or more nucleotides. The length of the nucleic acid targeting region (e.g., spacer sequence) can be at most 5, 10, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, or more nucleotides.

[0154] In some embodiments, the nucleic acid targeting region of the guide nucleic acid (e.g., a spacer) is 20 nucleotides in length. In some embodiments, the nucleic acid targeting region of the guide nucleic acid is 19 nucleotides in length. In some embodiments, the nucleic acid targeting region of the guide nucleic acid is 18 nucleotides in length. In some embodiments, the nucleic acid targeting region of the guide nucleic acid is 17 nucleotides in length. In some embodiments, the nucleic acid targeting region of the guide nucleic acid is 16 nucleotides in length. In some embodiments, the nucleic acid targeting region of the guide nucleic acid is 21 nucleotides in length. In some embodiments, the nucleic acid targeting region of the guide nucleic acid is 22 nucleotides in length.

[0155] The nucleotide sequence of the guide nucleic acid complementary to the nucleotide sequence of the target nucleic acid (target sequence) can have a length of, for example, at least about 12 nucleotides (nt) to about 80 nt, about 12 nt to about 50 nt, about 12 nt to about 45 nt, about 12 nt to about 40 nt, about 12 nt to about 35 nt, about 12 nt to about 30 nt, about 12 nt to about 25 nt, about 12 nt to about 20 nt, about 12 nt to about 19 nt, about 19 nt to about 20 ... The length may be about 19 nt to about 25 nt, about 19 nt to about 30 nt, about 19 nt to about 35 nt, about 19 nt to about 40 nt, about 19 nt to about 45 nt, about 19 nt to about 50 nt, about 19 nt to about 60 nt, about 20 nt to about 25 nt, about 20 nt to about 30 nt, about 20 nt to about 35 nt, about 20 nt to about 40 nt, about 20 nt to about 45 nt, about 20 nt to about 50 nt, or about 20 nt to about 60 nt.

[0156] The protospacer sequence of a targeted polynucleotide can be identified by identifying a protospacer adjacent motif (PAM) within the region of interest and selecting a region of desired size upstream or downstream of the PAM as the protospacer. The corresponding spacer sequence can be designed by determining the complementary sequence of the protospacer region.
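The PAM-based selection described above can be sketched as follows. This is a minimal illustration, assuming an SpCas9-style NGG PAM and a 20-nt protospacer located immediately 5' of the PAM on the searched strand; the function names `reverse_complement` and `find_protospacers` are hypothetical and not part of this disclosure. Only one strand is scanned, and the opposite strand could be scanned by passing in the reverse complement of the region. Which strand is reported as the spacer depends on the convention in use, so the sketch simply exposes both the PAM-proximal sequence and a helper for its complement.

```python
import re

# Translation table for DNA complementation (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_protospacers(region: str, length: int = 20):
    """List (start, protospacer, pam) for every NGG PAM in the region.

    Assumption: the protospacer is the `length` nucleotides immediately
    5' of the PAM on the same strand (SpCas9-style geometry).
    """
    region = region.upper()
    hits = []
    # A lookahead is used so that overlapping PAMs are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", region):
        pam_start = match.start()
        start = pam_start - length
        if start < 0:
            continue  # not enough sequence upstream of this PAM
        hits.append((start, region[start:pam_start], match.group(1)))
    return hits
```

For a nuclease with a different PAM or with the protospacer on the 3' side of the PAM, the regular expression and the slice would need to be changed accordingly.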

[0157] Spacer sequences can be identified using a computer program (e.g., machine-readable code) that can use variables such as predicted melting temperature, secondary structure formation and predicted annealing temperature, sequence identity, genomic context, chromatin exposure, %GC, genomic frequency, methylation status, and the presence of SNPs.
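As a rough illustration of how such a program might score candidates, the sketch below filters spacers on two of the simpler variables: %GC and the longest run of thymines. The thresholds (`gc_min`, `gc_max`, `max_t_run`) and the function names are illustrative assumptions only; they are not values taken from this disclosure, and a practical tool would combine many more of the variables listed above.

```python
def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in seq."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def longest_run(seq: str, base: str) -> int:
    """Return the length of the longest uninterrupted run of `base`."""
    best = current = 0
    for b in seq.upper():
        current = current + 1 if b == base else 0
        best = max(best, current)
    return best


def passes_filters(spacer: str, gc_min: float = 40.0, gc_max: float = 70.0,
                   max_t_run: int = 3) -> bool:
    """Apply two simple, assumed filters to a candidate spacer."""
    gc_ok = gc_min <= gc_percent(spacer) <= gc_max
    # Long T runs are avoided here on the assumption that they could act
    # as a terminator for some promoters used to express guide RNAs.
    t_ok = longest_run(spacer, "T") <= max_t_run
    return gc_ok and t_ok
```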

[0158] The percent complementarity between a nucleic acid targeting sequence (e.g., a spacer sequence of at least one guide polynucleotide as disclosed herein) and a target nucleic acid (e.g., a protospacer sequence of one or more target loci as disclosed herein) can be at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100%. The percent complementarity between a nucleic acid targeting sequence and a target nucleic acid can be at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% over about 20 contiguous nucleotides.
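Percent complementarity as used above can be computed with a short helper. The sketch below is one possible reading, assuming both sequences are written 5' to 3', have equal length, pair in antiparallel orientation, and that only Watson-Crick pairs count (G:U wobble pairs are not scored). The function name `percent_complementarity` is hypothetical.

```python
# Watson-Crick pairs, with U treated as T so RNA spacers can be compared to DNA.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(spacer: str, target: str) -> float:
    """Percent of positions that base-pair when spacer and target
    (both written 5'->3') are aligned in antiparallel orientation."""
    s = spacer.upper().replace("U", "T")
    # Reverse the target so position i of the spacer faces its pairing partner.
    t = target.upper().replace("U", "T")[::-1]
    if not s or len(s) != len(t):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum((a, b) in WATSON_CRICK for a, b in zip(s, t))
    return 100.0 * paired / len(s)
```

With a 20-nt spacer, a single mismatch gives 95% complementarity, which corresponds to the "at least 95% ... over about 20 contiguous nucleotides" tier recited above.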

[0159] The Cas protein-binding segment of a guide nucleic acid may comprise two stretches of nucleotides (e.g., crRNA and tracrRNA) that are complementary to each other. The two stretches of nucleotides may be covalently linked by intervening nucleotides (e.g., a linker in the case of a single guide nucleic acid). The two stretches may hybridize to form a double-stranded RNA duplex or hairpin of the Cas protein-binding segment, thus creating a stem-loop structure. The crRNA and tracrRNA may be covalently linked via the 3' end of the crRNA and the 5' end of the tracrRNA. Alternatively, the tracrRNA and crRNA may be covalently linked via the 5' end of the tracrRNA and the 3' end of the crRNA.

[0160] The Cas protein-binding segment of the guide nucleic acid can have a length of about 10 nucleotides to about 100 nucleotides, such as about 10 nucleotides (nt) to about 20 nt, about 20 nt to about 30 nt, about 30 nt to about 40 nt, about 40 nt to about 50 nt, about 50 nt to about 60 nt, about 60 nt to about 70 nt, about 70 nt to about 80 nt, about 80 nt to about 90 nt, or about 90 nt to about 100 nt. For example, the Cas protein-binding segment of the guide nucleic acid can have a length of about 15 nucleotides (nt) to about 80 nt, about 15 nt to about 50 nt, about 15 nt to about 40 nt, about 15 nt to about 30 nt, or about 15 nt to about 25 nt.

[0161] The dsRNA duplex of the Cas protein-binding segment of the guide nucleic acid can have a length of about 6 base pairs (bp) to about 50 bp. For example, the dsRNA duplex of the protein-binding segment can have a length of about 6 bp to about 40 bp, about 6 bp to about 30 bp, about 6 bp to about 25 bp, about 6 bp to about 20 bp, about 6 bp to about 15 bp, about 8 bp to about 40 bp, about 8 bp to about 30 bp, about 8 bp to about 25 bp, about 8 bp to about 20 bp, or about 8 bp to about 15 bp. For example, the dsRNA duplex of the Cas protein binding segment can have a length of about 8 bp to about 10 bp, about 10 bp to about 15 bp, about 15 bp to about 18 bp, about 18 bp to about 20 bp, about 20 bp to about 25 bp, about 25 bp to about 30 bp, about 30 bp to about 35 bp, about 35 bp to about 40 bp, or about 40 bp to about 50 bp.

[0162] In some embodiments, the dsRNA duplex of the Cas protein-binding segment may have a length of 36 base pairs. The percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment may be at least about 60%. For example, the percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment may be at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or at least about 99%. In some cases, the percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment may be 100%.

[0163] The linker (e.g., the sequence linking the crRNA and tracrRNA to form a single guide nucleic acid) can be about 3 nucleotides (nt) to about 100 nucleotides in length. For example, the linker can be about 3 nucleotides (nt) to about 90 nt, about 3 nucleotides (nt) to about 80 nt, about 3 nucleotides (nt) to about 70 nt, about 3 nucleotides (nt) to about 60 nt, about 3 nucleotides (nt) to about 50 nt, about 3 nucleotides (nt) to about 40 nt, about 3 nucleotides (nt) to about 30 nt, about 3 nucleotides (nt) to about 20 nt, or about 3 nucleotides (nt) to about 10 nt in length. For example, the linker can have a length of about 3 nt to about 5 nt, about 5 nt to about 10 nt, about 10 nt to about 15 nt, about 15 nt to about 20 nt, about 20 nt to about 25 nt, about 25 nt to about 30 nt, about 30 nt to about 35 nt, about 35 nt to about 40 nt, about 40 nt to about 50 nt, about 50 nt to about 60 nt, about 60 nt to about 70 nt, about 70 nt to about 80 nt, about 80 nt to about 90 nt, or about 90 nt to about 100 nt. In some embodiments, the linker of the DNA-targeting RNA is 4 nt.

[0164] Guide nucleic acids of the disclosed systems may contain modifications or sequences that provide additional desirable characteristics (e.g., modified or controlled stability, intracellular targeting, tracking by fluorescent labels, binding sites for proteins or protein complexes, etc.). Examples of such modifications include a 5' cap (e.g., a 7-methylguanylate cap (m7G)), a 3' polyadenylation tail (3' poly(A) tail), a riboswitch sequence (e.g., to allow for regulated stability and / or regulated accessibility by proteins and / or protein complexes), a stability control sequence, a sequence that forms a dsRNA duplex (hairpin), a modification or sequence that targets the RNA to a subcellular location (e.g., nucleus, mitochondria, chloroplast, etc.), a modification or sequence that provides tracking (e.g., direct conjugation to a fluorescent molecule, conjugation to a moiety that facilitates fluorescent detection, a sequence that allows fluorescent detection, etc.), and a modification or sequence that provides binding sites for proteins (e.g., proteins that act on DNA, including transcriptional activators, transcriptional repressors, DNA methyltransferases, DNA demethylases, histone acetyltransferases, histone deacetylases, and combinations thereof).

[0165] A guide nucleic acid may contain one or more modifications (e.g., base modifications, backbone modifications) to provide a nucleic acid with novel or enhanced characteristics (e.g., enhanced stability). A guide nucleic acid may contain a nucleic acid affinity tag. A nucleoside may be a base-sugar combination. The base portion of a nucleoside may be a heterocyclic base. The two most common classes of such heterocyclic bases are purines and pyrimidines. A nucleotide may be a nucleoside further comprising a phosphate group covalently linked to the sugar portion of the nucleoside. For nucleosides that include a pentofuranosyl sugar, the phosphate group may be linked to the 2', 3', or 5' hydroxyl moiety of the sugar. In forming a guide nucleic acid, the phosphate groups may covalently link adjacent nucleosides to each other to form a linear polymeric compound. The respective ends of this linear polymeric compound can then be further joined to form a circular compound, although linear compounds may be preferred. In addition, linear compounds can have internal nucleotide base complementarity and thus can fold in such a way as to yield fully or partially double-stranded compounds. Within a guide nucleic acid, the phosphate groups are generally referred to as forming the internucleoside backbone of the guide nucleic acid. The linkage or backbone of the guide nucleic acid can be a 3'→5' phosphodiester linkage.

[0166] The guide nucleic acid can have a modified backbone and / or modified internucleoside linkage. Modified backbones can include those that retain a phosphorus atom in the backbone and those that do not have a phosphorus atom in the backbone.

[0167] Suitable modified guide nucleic acid backbones containing a phosphorus atom therein include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphate triesters, aminoalkyl phosphotriesters, methyl and other alkyl phosphonates, such as 3'-alkylene phosphonates, 5'-alkylene phosphonates, chiral phosphonates, phosphinates, phosphoramidates including 3'-aminophosphoramidate and aminoalkylphosphoramidates, phosphorodiamidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, selenophosphates, and boranophosphates, including those with normal 3'-5' linkages, 2'-5' linked analogs, and those with reversed polarity such that one or more internucleotide linkages are 3'→3', 5'→5', or 2'→2' linkages. Suitable guide nucleic acids with inverted polarity can include a single 3'→3' linkage at the 3'-most internucleotide linkage (such as a single inverted nucleoside residue lacking a nucleobase or having a hydroxyl group instead). Various salts (e.g., potassium chloride or sodium chloride), mixed salts, and free acid forms can also be included.

[0168] The guide nucleic acid may contain one or more phosphorothioate and / or heteroatom internucleoside linkages, particularly -CH2-NH-O-CH2-, -CH2-N(CH3)-O-CH2- (methylene(methylimino) or MMI backbone), -CH2-O-N(CH3)-CH2-, -CH2-N(CH3)-N(CH3)-CH2- and -O-N(CH3)-CH2-CH2- (where the natural phosphodiester internucleotide linkage is represented as -O-P(=O)(OH)-O-CH2-).

[0169] The guide nucleic acid may include a morpholino backbone structure. For example, the nucleic acid may include a six-membered morpholino ring instead of a ribose ring. In some of these embodiments, phosphorodiamidate or other non-phosphodiester internucleoside linkages replace the phosphodiester linkages.

[0170] Guide nucleic acids can include polynucleotide backbones formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatom or heterocyclic internucleoside linkages, including those with morpholino linkages (formed in part by the sugar portion of the nucleoside), siloxane backbones, sulfide, sulfoxide and sulfone backbones, formacetyl and thioformacetyl backbones, methylene formacetyl and thioformacetyl backbones, riboacetyl backbones, alkene-containing backbones, sulfamate backbones, methyleneimino and methylenehydrazino backbones, sulfonate and sulfonamide backbones, amide backbones, and others with a mixture of N, O, S, and CH2 moieties.

[0171] The guide nucleic acid may include a nucleic acid mimetic. The term "mimetic" is intended to include polynucleotides in which only the furanose ring or both the furanose ring and the internucleotide linkage are replaced with non-furanose groups; replacement of only the furanose ring may also be referred to as a sugar surrogate. The heterocyclic base moiety or modified heterocyclic base moiety may be maintained for hybridization with an appropriate target nucleic acid. One such nucleic acid may be a peptide nucleic acid (PNA). In a PNA, the sugar backbone of a polynucleotide may be replaced with an amide-containing backbone, specifically an aminoethylglycine backbone. The backbone in a PNA compound may contain two or more linked aminoethylglycine units. The nucleobases are retained, and the heterocyclic base moieties may be directly or indirectly linked to the aza nitrogen atoms of the amide portion of the backbone.

[0172] Guide nucleic acids can contain linked morpholino units (morpholino nucleic acids) with heterocyclic bases attached to the morpholino ring. Linking groups can link the morpholino monomer units of morpholino nucleic acids. Nonionic morpholino-based oligomeric compounds may have fewer undesired interactions with cellular proteins. Morpholino-based polynucleotides can be nonionic mimics of guide nucleic acids. Various compounds within the morpholino class can be joined using different linking groups. A further class of polynucleotide mimics can be called cyclohexenyl nucleic acids (CeNA). The furanose ring typically present in nucleic acid molecules can be replaced with a cyclohexenyl ring. CeNA DMT-protected phosphoramidite monomers can be prepared and used to synthesize oligomeric compounds using phosphoramidite chemistry. Incorporation of CeNA monomers into nucleic acid strands can increase the stability of DNA / RNA hybrids. CeNA oligoadenylates can form complexes with nucleic acid complements, with stability similar to that of the native complex. Further modifications include locked nucleic acids (LNAs) in which the 2'-hydroxyl group is linked to the 4' carbon atom of the sugar ring, forming a 2'-C,4'-C-oxymethylene linkage and thereby a bicyclic sugar moiety. The linkage can be a methylene group, (-CH2-)n, bridging the 2' oxygen atom and the 4' carbon atom, where n is 1 or 2. LNAs and LNA analogs can exhibit extremely high thermal stability of duplexes with complementary nucleic acids (Tm = +3 to +10°C), stability against 3'-exonuclease degradation, and good solubility properties.

[0173] The guide nucleic acid may contain one or more substituted sugar moieties. Suitable sugar substituent groups include OH, F, O-, S-, or N-alkyl, O-, S-, or N-alkenyl, O-, S-, or N-alkynyl, or O-alkyl-O-alkyl, where the alkyl, alkenyl, and alkynyl may be substituted or unsubstituted C1-C10 alkyl or C2-C10 alkenyl and alkynyl. In particular, O((CH2)nO)mCH3, O(CH2)nOCH3, O(CH2)nNH2, O(CH2)nCH3, O(CH2)nONH2, and O(CH2)nON((CH2)nCH3)2 (where n and m are from 1 to about 10) are preferred. The sugar substituent may also be selected from C1-C10 lower alkyl, substituted lower alkyl, alkenyl, alkynyl, alkaryl, aralkyl, O-alkaryl or O-aralkyl, SH, SCH3, OCN, Cl, Br, CN, CF3, OCF3, SOCH3, SO2CH3, ONO2, NO2, N3, NH2, heterocycloalkyl, heterocycloalkaryl, aminoalkylamino, polyalkylamino, substituted silyl, an RNA cleaving group, a reporter group, an intercalator, a group that improves the pharmacokinetic properties of the guide nucleic acid, a group that improves the pharmacodynamic properties of the guide nucleic acid, and other substituents with similar properties. Suitable modifications may include 2'-methoxyethoxy (2'-O-CH2CH2OCH3, also known as 2'-O-(2-methoxyethyl) or 2'-MOE, an alkoxyalkoxy group). Further suitable modifications may include 2'-dimethylaminooxyethoxy (an O(CH2)2ON(CH3)2 group, also known as 2'-DMAOE) and 2'-dimethylaminoethoxyethoxy (also known as 2'-O-dimethyl-amino-ethoxy-ethyl or 2'-DMAEOE), that is, 2'-O-CH2-O-CH2-N(CH3)2.

[0174] Other suitable sugar substituents include methoxy (-O-CH3), aminopropoxy (-OCH2CH2CH2NH2), allyl (-CH2-CH=CH2), -O-allyl (-O-CH2-CH=CH2), and fluoro (F). The 2'-sugar substituent can be in the arabino (up) or ribo (down) position. A preferred 2'-arabino modification is 2'-F. Similar modifications can be made at other positions on the oligomeric compound, particularly the 3' position of the sugar on the 3'-terminal nucleoside or in a 2'-5'-linked nucleotide and the 5' position of a 5'-terminal nucleotide. Oligomeric compounds can also have sugar mimetics, such as a cyclobutyl moiety, in place of the pentofuranosyl sugar.

[0175] A guide nucleic acid may also include nucleobase (or "base") modifications or substitutions. As used herein, "unmodified" or "natural" nucleobases can include purine bases (e.g., adenine (A) and guanine (G)) and pyrimidine bases (e.g., thymine (T), cytosine (C), and uracil (U)). Modified nucleobases include 5-methylcytosine (5-me-C), 5-hydroxymethylcytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl (-C≡C-CH3) uracil and cytosine and other alkynyl derivatives of pyrimidine bases, 6-azouracil, cytosine and thymine, 5-uracil ( Other synthetic and natural nucleobases may be mentioned, such as 8-amino-, 8-isopropyl ... Modified nucleobases can include tricyclic pyrimidines such as phenoxazine cytidine (1H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), phenothiazine cytidine (1H-pyrimido(5,4-b)(1,4)benzothiazin-2(3H)-one), G-clamps such as substituted phenoxazine cytidines (e.g., 9-(2-aminoethoxy)-H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), carbazole cytidine (2H-pyrimido(4,5-b)indol-2-one), and pyridoindole cytidine (H-pyrido(3',2':4,5)pyrrolo(2,3-d)pyrimidin-2-one).

[0176] Heterocyclic base moieties include those in which the purine or pyrimidine base is replaced with other heterocycles, such as 7-deaza-adenine, 7-deazaguanosine, 2-aminopyridine, and 2-pyridone. Nucleobases can be useful for increasing the binding affinity of polynucleotide compounds. These can include 5-substituted pyrimidines, 6-azapyrimidines, and N-2, N-6, and O-6 substituted purines, including 2-aminopropyladenine, 5-propynyluracil, and 5-propynylcytosine. 5-Methylcytosine substitutions can increase nucleic acid duplex stability by 0.6-1.2°C and can be a preferred base substitution (e.g., when combined with a 2'-O-methoxyethyl sugar modification).

[0177] Modification of the guide nucleic acid may involve chemically linking to the guide nucleic acid one or more moieties or conjugates capable of improving the activity, cellular distribution, or cellular uptake of the guide nucleic acid. These moieties or conjugates may include conjugate groups covalently attached to functional groups, such as primary or secondary hydroxyl groups. Conjugate groups may include, but are not limited to, intercalators, reporter molecules, polyamines, polyamides, polyethylene glycols, polyethers, groups that improve the pharmacodynamic properties of oligomers, and groups that can improve the pharmacokinetic properties of oligomers. Conjugate groups may include, but are not limited to, cholesterol, lipids, phospholipids, biotin, phenazine, folic acid, phenanthridine, anthraquinone, acridine, fluorescein, rhodamine, coumarin, and dyes. Groups that improve pharmacodynamic properties include groups that improve uptake, improve degradation resistance, and / or enhance sequence-specific hybridization with the target nucleic acid. Groups that can improve pharmacokinetic properties include groups that improve the uptake, distribution, metabolism, or excretion of nucleic acids. Conjugate moieties can include, but are not limited to, lipid moieties such as cholesterol moieties, cholic acid, thioethers (e.g., hexyl-S-tritylthiol), thiocholesterol, aliphatic chains (e.g., dodecanediol or undecyl residues), phospholipids (e.g., di-hexadecyl-rac-glycerol or triethylammonium 1,2-di-O-hexadecyl-rac-glycero-3-H-phosphonate), polyamine or polyethylene glycol chains or adamantane acetic acid, palmityl moieties, or octadecylamine or hexylamino-carbonyl-oxycholesterol moieties.

[0178] In some embodiments, the at least one guide RNA polynucleotide of the systems or methods provided herein is capable of binding to at least a portion of a genome (e.g., a plant genome) or a gene (e.g., a plant gene). Optionally, the at least one guide RNA polynucleotide is capable of forming a complex with a site-specific nuclease and directing the site-specific nuclease to target a portion of the target nucleic acid (e.g., a site in the genome or gene).

[0179] In some embodiments, the systems described herein comprise at least one guide RNA polynucleotide capable of complexing with the site-specific nuclease portion of a fusion protein of the systems. In some embodiments, the systems described herein comprise at least two (e.g., at least three, at least four, at least five, or at least six) different guide RNA polynucleotides capable of complexing with the site-specific nuclease portion of a fusion protein of the systems.

[0180] In some embodiments, the guide nucleic acid comprises a nucleotide sequence having at least 70% (e.g., at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identity to SEQ ID NO: 17 or 19.

[0181] Also provided herein are kits that include components of the systems described in this disclosure. In some embodiments, the kits include one or more of the fusion proteins and / or polynucleotides described herein.

[0182] VI. Method In another aspect, provided herein are methods for editing one or more nucleic acids using the fusion proteins and / or systems described herein. In some embodiments, the method comprises contacting a nucleic acid comprising a fusion protein binding site (i.e., the nucleic acid to be edited) with at least one fusion protein as described herein, wherein contacting the nucleic acid with the at least one fusion protein results in editing of the nucleic acid. The nucleic acid (i.e., the nucleic acid to be edited) can be any suitable nucleic acid. In some embodiments, the nucleic acid is part of a chromosome. In some embodiments, the nucleic acid is part of a genome (e.g., a plant genome).

[0183] As described herein and demonstrated in the Examples below, the methods provided herein can result in an increased frequency of one or more desired nucleic acid editing outcomes (e.g., fragment replacement by HDR).

[0184] In some embodiments, the nucleic acid edited by the present method comprises a target region. As used herein, "target region" refers to a portion of a nucleic acid that is targeted for editing. For example, the target region may be a portion of a gene to be edited. In some embodiments, at least a portion of the target region is replaced with at least a portion of a donor polynucleotide. In some embodiments, the target region comprises at least one (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) nuclease cleavage site. In some embodiments, the nucleic acid comprises at least one binding site. In some embodiments, the target region is adjacent to a nuclease cleavage site. In some embodiments, the nucleic acid comprises a first binding site adjacent to the 5' end of the target region and a second binding site adjacent to the 3' end of the target region.

[0185] In some embodiments, the nucleic acid to be edited comprises a first binding site and a second binding site, as described in more detail below. In some embodiments, the first binding site and the second binding site are different sequences, and the method comprises providing two different fusion proteins, one that binds to the first binding site and one that binds to the second binding site. In some embodiments, the first binding site and the second binding site are the same sequence, and the method comprises providing a fusion protein that can bind to both the first binding site and the second binding site.

[0186] In some embodiments, the methods herein include providing a donor polynucleotide. In some embodiments, the donor polynucleotide comprises a left homology arm (i.e., a homology arm complementary to a sequence upstream of the target region) and a right homology arm (i.e., a homology arm complementary to a sequence downstream of the target region). In such embodiments, the target region of the nucleic acid to be edited is flanked by the homology arms, and the target region comprises at least one fusion protein binding site (i.e., at least one cleavage site). An exemplary embodiment is shown in Figure 2.

[0187] In some embodiments of the methods provided herein, the recruiter domain of the fusion protein specifically binds to a recruit sequence in the donor polynucleotide. Without being bound by any particular theory, the fusion protein, which binds to both a binding site in the nucleic acid target region (i.e., via binding of a site-specific nuclease) and a recruit sequence in the donor polynucleotide (i.e., via specific binding of the recruiter domain), tethers the donor polynucleotide in close proximity to the cleavage site formed by the SDN. Such proximity, in some embodiments, increases the likelihood that the cleaved nucleic acid will be repaired to result in at least partial integration of the donor polynucleotide in the target region. In some embodiments, the donor polynucleotide comprises at least one homology arm, and repair occurs via HDR. In some embodiments, repair occurs via NHEJ or MMEJ, and the end of the donor polynucleotide is joined to the cleaved nucleic acid end within the target region.

[0188] In some embodiments of the methods provided herein, such as the Examples described herein, the site-specific nuclease of the at least one fusion protein comprises a CRISPR-associated nuclease. In such embodiments, the method may further comprise providing a guide RNA to target the fusion protein to the binding site. In some embodiments, the method comprises providing at least one first guide RNA and at least one second guide RNA. In some embodiments, the at least one first guide RNA comprises a nucleotide sequence having complementarity to a first binding site of the nucleic acid to be edited. In some embodiments, the at least one second guide RNA comprises a nucleotide sequence having complementarity to a second binding site of the nucleic acid to be edited.

[0189] The methods herein include providing at least a fusion protein, a nucleic acid to be edited, and a donor polynucleotide, and may also include providing at least one guide RNA. These various components may be provided using any suitable technique. For example, providing the fusion protein may include introducing the fusion protein into a cell or introducing a recombinant nucleic acid, construct, or vector encoding the fusion protein into a cell. Similarly, the gRNA may be provided by introducing the gRNA itself or a nucleic acid sequence encoding the gRNA. In some embodiments, the fusion protein and the gRNA may be encoded by the same DNA construct or vector.

[Example]

[0190] Example 1. Cro-Cas9 fusion induces allelic replacement at the ZmALS2 locus. This example demonstrates that the Cro-Cas9 fusion protein improves the efficiency of homology-directed repair (HDR)-mediated allele replacement using donor DNA containing a Cro binding sequence. In this example, as shown in Figure 3 (top panel), the donor DNA contains a Cro operator site (OR3). The Cro protein, functioning as a homodimer, binds to the operator site and is tethered to Cas9 via an XTEN linker.

[0191] The schematic design of the Cro-Cas9 fusion is shown in Figure 4. A single-chain dimer of N15 bacteriophage Cro protein (two monomers linked by a 15-amino acid linker: SEQ ID NO: 15) was fused to the N-terminus of Streptococcus pyogenes Cas9 (SpCas9) via a 32-amino acid linker (SEQ ID NO: 16). The fusion protein is tagged with a 3xFLAG peptide followed by an SV40 nuclear localization signal (NLS) at the N-terminus and another NLS at the C-terminus.

[0192] Svitashev et al. (2015, Plant Physiol.) reported Cas9-induced allele replacement at the ZmALS2 (maize acetolactate synthase homolog on chromosome 5) locus. In this example, the same target site (gRNA represented by SEQ ID NO: 17) was selected to evaluate the effectiveness of Cro-Cas9 fusion in improving allele replacement efficiency.

[0193] A DNA construct was built to express a Cro-Cas9 fusion protein and gRNA in maize cells. The maize codon-optimized coding sequence for the fusion protein was operably linked to the sugarcane ubiquitin 4 promoter and the Agrobacterium tumefaciens nopaline synthase terminator. A single guide RNA (sgRNA) targeting the ZmALS2 locus was operably linked to the Oryza sativa U3 promoter, with a stretch of nine consecutive thymine bases following the sgRNA spacer sequence to terminate transcription. A control construct was also generated that lacked the Cro domain but was otherwise identical to the Cro-Cas9 expression vector.

[0194] The donor DNA, which served as the repair template for HDR, was based on a 657-bp homologous fragment taken from the ZmALS2 coding region, beginning at the fourth codon (GCT for alanine) and ending at the 222nd codon (CGG for arginine). The donor DNA sequence is represented by SEQ ID NO: 20. A total of eight single nucleotide substitutions were introduced into the homologous fragment around the Cas9 cleavage site. Only two of these resulted in amino acid changes. A G-to-C substitution at codon 111 changed methionine to isoleucine, while also disrupting the TGG PAM of the gRNA and thereby protecting the donor DNA from Cas9 cleavage. A C-to-T substitution at codon 119 changed proline to serine, rendering the gene product resistant to chlorsulfuron herbicide. The nucleotide substitutions also created three restriction sites as molecular barcodes to facilitate detection of HDR-edited alleles. A 20-bp bacteriophage N15 OR3 operator sequence (SEQ ID NO: 18), which is specifically recognized and bound by the Cro protein, was added to each flank of the substitution-containing homologous fragment to create the donor DNA, which was cloned into a high-copy cloning vector flanked by PmeI and AscI restriction sites for excision.
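The arithmetic behind the donor design can be checked with a short sketch. It confirms that codons 4 through 222 inclusive span 657 bp, that a G-to-C change in the methionine codon ATG yields an isoleucine codon, and that a C-to-T change at the first position of any proline codon yields a serine codon. The position of the C-to-T change within codon 119 is not stated in the text; the first position is inferred here because it is the only single C-to-T change in a CCN codon that gives serine. The helper names are hypothetical, and the codon table is only the subset of the standard genetic code needed for this check.

```python
# Subset of the standard genetic code needed for this check.
CODONS = {
    "ATG": "Met", "ATC": "Ile",
    "CCT": "Pro", "CCC": "Pro", "CCA": "Pro", "CCG": "Pro",
    "TCT": "Ser", "TCC": "Ser", "TCA": "Ser", "TCG": "Ser",
}


def fragment_length(first_codon: int, last_codon: int) -> int:
    """Length in bp of a coding fragment spanning both codons inclusive."""
    return (last_codon - first_codon + 1) * 3


def substitute(codon: str, position: int, new_base: str) -> str:
    """Return the codon with the base at a 0-based position replaced."""
    return codon[:position] + new_base + codon[position + 1:]


# Codons 4..222 inclusive: 219 codons, 657 bp.
homology_bp = fragment_length(4, 222)

# Codon 111: ATG (Met) with G-to-C at the third position gives ATC (Ile).
codon_111_edit = CODONS[substitute("ATG", 2, "C")]

# Codon 119: any proline codon with C-to-T at the first position gives Ser.
codon_119_edits = {CODONS[substitute(c, 0, "T")]
                   for c in ("CCT", "CCC", "CCA", "CCG")}
```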

[0195] The linearized vector (0.2 pmol) was co-delivered with double-stranded donor DNA (2 pmol) into maize embryonic callus cells via biolistic delivery. Transgenic calli were selected on PMI selective medium and subsequently regenerated into transgenic plants using standard tissue culture procedures. Regenerated plants were sampled for DNA extraction, and a TaqMan assay was designed to distinguish the HDR-replaced, herbicide-resistant allele from the wild-type allele. A pair of primers outside the homology region was designed to amplify a 1.5 kb fragment covering the entire homology region by PCR, and the amplicon was subjected to restriction fragment length polymorphism (RFLP) analysis and Sanger sequencing for sequence verification.

[0196] The results are summarized in Table 1. In each of the three allele-replaced plants generated with the Cro-Cas9 fusion construct, one of the two alleles underwent HDR-mediated allele replacement, introducing all eight nucleotide substitutions, and the other allele contained an indel mutation at the Cas9 cleavage site. The control vector produced only one plant in which one of the two alleles underwent partial replacement with a subset of the eight introduced substitutions.

[0197] [Table 1]

[0198] Example 2. Cro-Cas12a fusion induces allelic replacement at the ZmGL2 locus. This example demonstrates that the Cro-Cas12a fusion protein improves the efficiency of homology-directed repair (HDR)-mediated allele replacement using donor DNA containing a Cro binding sequence. In this example, as shown in Figure 3 (bottom), the donor DNA contains a Cro operator site (OR3). The Cro protein, functioning as a homodimer, binds to the operator site and is tethered to Cas12a via an XTEN linker.

[0199] The schematic design of the Cro-Cas12a fusion is shown in Figure 5. The same single-chain N15 Cro dimer described in Example 1 was fused to the N-terminus of the Lachnospiraceae bacterium ND2006 Cas12a (LbCas12a) D156R mutant via a 30-amino acid (GGGGS)6 linker (SEQ ID NO: 26). The fusion protein is flanked by one SV40 NLS at the N-terminus and two SV40 NLSs at the C-terminus.
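The domain order described in paragraph [0199] can be laid out as a short sketch. The part labels are descriptive placeholders chosen for illustration, not sequences from the sequence listing; only the linker is built, from the repeat unit stated in the text.

```python
# Linker as described: six repeats of the GGGGS unit.
linker = "GGGGS" * 6

# N- to C-terminal order of the parts named in the text. The labels are
# illustrative placeholders, not actual sequences.
fusion_layout = [
    "SV40 NLS",
    "single-chain N15 Cro dimer",
    f"(GGGGS)6 linker ({len(linker)} aa)",
    "LbCas12a D156R",
    "SV40 NLS",
    "SV40 NLS",
]

print(" - ".join(fusion_layout))
```

Six repeats of a five-residue unit give the 30-amino acid length stated for the linker.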

[0200] K. Lee, et al., "Activities and specificities of CRISPR/Cas9 and Cas12a nucleases for targeted mutagenesis in maize," Plant Biotech. J., 17:362-372 (2019), reported that LbCas12a could efficiently induce double-stranded DNA breaks in the maize Glossy2 (ZmGL2) locus. In this example, the same target site (gRNA, SEQ ID NO: 19) was selected to evaluate the effectiveness of the Cro-LbCas12a fusion in improving allele replacement efficiency.

[0201] A DNA construct was constructed to express the Cro-LbCas12a fusion protein and gRNA in maize cells. The maize codon-optimized coding sequence for the fusion protein was operably linked to the sugarcane ubiquitin 4 promoter and the Agrobacterium tumefaciens nopaline synthase terminator. The mature crRNA targeting the ZmGL2 locus, flanked by a hammerhead (HH) ribozyme and a hepatitis delta virus (HDV) ribozyme for processing, was operably linked to another copy of the sugarcane ubiquitin 4 promoter and another copy of the nopaline synthase terminator. An identical construct lacking the Cro domain was constructed as a control.

[0202] The donor DNA, which served as the repair template for HDR, was based on an 859-bp homologous fragment taken from the ZmGL2 coding region. The donor DNA sequence is represented by SEQ ID NO: 21. The homology to the left (upstream) of the cleavage site is in the first intron, and the homology to the right (downstream) of the cleavage site is in the second exon. A series of eight nucleotide substitutions (ACAAACTT to TAGTGACC) was introduced into the homologous fragment in the middle of the gRNA protospacer, which created two consecutive premature stop codons and protected the donor from Cas12a cleavage. The same N15 OR3 operator sequence as above was added to each flank of the substitution-containing homologous fragment to create the donor DNA, which can be cloned into a cloning vector and linearized by PCR using this vector as a template.
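The effect of the eight-nucleotide substitution on the protospacer can be illustrated with a short sketch using the gRNA sequence of SEQ ID NO: 19. The reading frame is an assumption made for illustration: codons are read starting from the first base of the substituted octamer, because the frame of the ZmGL2 coding sequence is not given in this paragraph.

```python
protospacer = "GTCACAGATCACAAACTTCAAATG"  # SEQ ID NO: 19
original, replacement = "ACAAACTT", "TAGTGACC"
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Locate the substituted octamer inside the protospacer and apply the edit.
start = protospacer.find(original)
edited = (
    protospacer[:start] + replacement + protospacer[start + len(original):]
)

# Number of positions at which the donor differs from the gRNA target.
mismatches = sum(a != b for a, b in zip(protospacer, edited))

# Assumed frame: read the first two codons of the substituted octamer.
codons = [replacement[i:i + 3] for i in range(0, 6, 3)]
stops = [codon for codon in codons if codon in STOP_CODONS]

print(start, edited, mismatches, stops)
```

In this sketch all eight substituted positions differ from the protospacer, which is consistent with the donor no longer being recognized by the gRNA, and the substituted octamer begins with TAG followed by TGA, consistent with two consecutive stop codons in the assumed frame.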

[0203] A mixture of 0.15 pmol of linearized vector and 2.25 pmol of double-stranded donor DNA was co-delivered into maize embryonic callus cells by biolistic bombardment. Transgenic calli were selected on PMI selective medium and subsequently regenerated into transgenic plants via tissue culture procedures. Regenerated plants were sampled for DNA extraction, and a TaqMan assay was designed to distinguish the HDR-replaced functional knockout allele from the wild-type allele. A pair of primers outside the homology region was designed to amplify by PCR a 1.4 kb fragment covering the entire homology region, and the amplicon was subjected to Sanger sequencing to verify the predicted mutation and the genome-homology junctions.

[0204] The results are summarized in Table 2. In all of the allele-replaced plants, the eight nucleotide substitutions were introduced into the ZmGL2 locus. Of these plants, six generated with the Cro-LbCas12a fusion and three generated with the LbCas12a control showed complete HDR at both genome-homology junctions of the replaced allele. In contrast, two plants generated with the Cro-LbCas12a fusion and seven plants generated with the LbCas12a control had at least one genome-homology junction of the replaced allele that did not undergo complete HDR. Notably, in one plant generated with Cro-LbCas12a, both alleles underwent complete HDR replacement.
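The plant counts stated in paragraph [0204] can be expressed as proportions with a short sketch. It uses only the counts given in the text, since Table 2 is not reproduced here; the proportions are computed among plants carrying the substitutions, not among all transformed plants, and no statistical test is implied.

```python
# Plant counts as stated in the text (allele-replaced plants only).
counts = {
    "Cro-LbCas12a": {"complete": 6, "incomplete": 2},
    "LbCas12a control": {"complete": 3, "incomplete": 7},
}


def complete_fraction(group: dict) -> float:
    """Fraction of plants with complete HDR at both junctions."""
    total = group["complete"] + group["incomplete"]
    return group["complete"] / total


fractions = {name: complete_fraction(group) for name, group in counts.items()}

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.0%} with complete HDR at both junctions")
```

On these counts, six of eight plants (75%) from the Cro-LbCas12a fusion and three of ten plants (30%) from the control showed complete HDR at both junctions.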

[0205] [Table 2]

[0206] Reference sequence list
SEQ ID NO: 1 N15 Cro monomer amino acid sequence MKPEELVRHFGDVEKAAVGVGVTPGAVYQWLQAGEIPPLRQSDIEVRTAYKLKSDFTSQRMGKEGHNSGTK
SEQ ID NO: 2 Lambda Cro monomer amino acid sequence MEQRITLKDYAMRFGQTKTAKDLGVYQSAINKAIHAGRKIFLTINADGSVYAEEVKPFPSNKKTTA
SEQ ID NO: 3 P22 Cro monomer amino acid sequence MYKKDVIDHFGTQRAVAKALGISDAAVSQWKEVIPEKDAYRLEIVTAGALKYQENAYRQAA
SEQ ID NO: 4 434 Cro monomer amino acid sequence MQTLSERLKKRRIALKMTQTELATKAGVKQQSIQLIEAGVTKRPRFLFEIAMALNCDPVWLQYGTKRGKAA
SEQ ID NO: 5 N15 Cro OR3 TTATAGCTGGCTATAA
SEQ ID NO: 6 Lambda Cro OR3 (natural) TATCACCGCAAGGGATA
SEQ ID NO: 7 Lambda Cro OR3 (synthetic) TATCACCGCGGGTGATA
SEQ ID NO: 8 P22 Cro OR3 AGTTAAGTCATCTTAAAT
SEQ ID NO: 9 434 Cro OR3 (natural) ACAAGAAAAACTGT
SEQ ID NO: 10 434 Cro OR3 (synthetic) ACAATATATATTGT
SEQ ID NO: 11 Cro-Cas9 amino acid sequence [sequence not reproduced]
SEQ ID NO: 12 Cro-Cas9 nucleotide sequence [sequence not reproduced]
SEQ ID NO: 13 Cro-Cas12a amino acid sequence [sequence not reproduced]
SEQ ID NO: 14 Cro-Cas12a nucleotide sequence [sequence not reproduced]
SEQ ID NO: 15 15-amino acid linker GGGSGGGSGGGSGGG
SEQ ID NO: 16 32-amino acid linker SGGSSGGSSGSETPGTSESATPESSGGSSGGS
SEQ ID NO: 17 ZmALS2 (maize acetolactate synthase homolog) gRNA GCTGCTCGATTCCGTCCCCA
SEQ ID NO: 18 20-bp bacteriophage N15 OR3 operator CTTTATAGCTGGCTATAATT
SEQ ID NO: 19 Maize Glossy2 (ZmGL2) gRNA GTCACAGATCACAAACTTCAAATG
SEQ ID NO: 20 Donor DNA serving as the repair template for HDR, based on a 657-bp homologous fragment taken from the ZmALS2 coding region, starting at codon 4 (GCT for alanine) and ending at codon 222 (CGG for arginine) [sequence not reproduced]
SEQ ID NO: 21 Donor DNA serving as the repair template for HDR, based on an 859-bp homologous fragment taken from the ZmGL2 coding region [sequence not reproduced]
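A few internal consistency checks on the short sequences listed above can be sketched as follows. The sketch is illustrative only and uses the sequences exactly as printed in the list; the variable names are chosen here for readability and do not appear in the disclosure.

```python
N15_OR3_CORE = "TTATAGCTGGCTATAA"        # SEQ ID NO: 5
N15_OR3_20BP = "CTTTATAGCTGGCTATAATT"    # SEQ ID NO: 18
LINKER_15 = "GGGSGGGSGGGSGGG"            # SEQ ID NO: 15
LINKER_32 = "SGGSSGGSSGSETPGTSESATPESSGGSSGGS"  # SEQ ID NO: 16

# Locate the 16-nt core (SEQ ID NO: 5) inside the 20-bp sequence
# (SEQ ID NO: 18) and report the bases that flank it.
offset = N15_OR3_20BP.find(N15_OR3_CORE)
left_flank = N15_OR3_20BP[:offset]
right_flank = N15_OR3_20BP[offset + len(N15_OR3_CORE):]

lengths = {
    "SEQ ID NO: 5": len(N15_OR3_CORE),
    "SEQ ID NO: 18": len(N15_OR3_20BP),
    "SEQ ID NO: 15": len(LINKER_15),
    "SEQ ID NO: 16": len(LINKER_32),
}

print(offset, left_flank, right_flank, lengths)
```

As printed, the 20-bp sequence of SEQ ID NO: 18 contains the 16-nt sequence of SEQ ID NO: 5 with two additional bases on each side, and the two linkers have the 15 and 32 residues given in their descriptions.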

[0207] [Table 3-1]

[0208] [Table 3-2]

[0209] All patents, patent publications, patent applications, journal articles, books, technical references, etc. discussed in this disclosure are hereby incorporated by reference in their entirety for all purposes.

[0210] It should be understood that the figures and descriptions of the present disclosure have been simplified to illustrate elements relevant to a clear understanding of the present disclosure. It should be understood that the drawings are presented for illustrative purposes and are not presented as structural diagrams. Omitted details and variations or alternative embodiments are within the understanding of those skilled in the art.

[0211] It will be appreciated that in certain aspects of the present disclosure, a single component may be replaced by multiple components, and multiple components may be replaced by a single component, to provide an element or structure or to perform one or more given functions. Except to the extent that such a substitution would not be functional in practicing a particular embodiment of the present disclosure, such substitution is deemed to be within the scope of the present disclosure.

[0212] The examples shown herein are intended to illustrate possible specific implementations of the present disclosure. It will be understood that the examples are intended primarily to illustrate the present disclosure for those skilled in the art. There may be variations to these diagrams or to the operations described herein without departing from the spirit of the present disclosure. For example, in certain cases, method steps or operations may be performed or executed in a different order, or operations may be added, deleted, or modified.

[0213] Where a range of values is provided, it is understood that each intervening value between the upper and lower limits of that range, to the smallest fraction of the lower limit, is also specifically disclosed, unless the context clearly dictates otherwise. Any narrower range between any stated or unstated intervening value in a stated range and any other stated or intervening value in that stated range is encompassed. The upper and lower limits of these smaller ranges may independently be included or excluded, and each range in which either, neither, or both limits are included in the smaller range is also encompassed within the technology, subject to any specifically excluded limits in the stated range. When a stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included.

[0214] The foregoing description sets forth numerous specific details to provide a more thorough understanding of the present invention. However, it will be apparent to those skilled in the art that the invention described in this disclosure may be practiced without one or more of these specific details. In other instances, features and procedures well known to those skilled in the art have not been described to avoid obscuring the invention. The embodiments of the present disclosure have been described for purposes of illustration and not limitation. Although the present invention has been described primarily with reference to specific embodiments, it is anticipated that other embodiments will become apparent to those skilled in the art upon reading this disclosure, and such embodiments are intended to be included within the scope of the present method. Therefore, the present disclosure is not limited to the embodiments described above or depicted in the drawings, and various embodiments and modifications can be made without departing from the scope of the following claims.

Claims

1. A fusion protein comprising a site-specific nuclease fused to a recruiter domain containing a site-specific DNA-binding domain, wherein the recruiter domain contains a Cro repressor family protein.

2. The fusion protein according to claim 1, wherein the site-specific nuclease comprises a CRISPR-associated nuclease.

3. The fusion protein according to claim 2, wherein the CRISPR-associated nuclease is selected from the group consisting of Cas5, Cas6, Cas7, Cas8, Cas9, Cas12a, Cas12b, Cas12i, Cas12j, Cas12L, Cas12e, Cas12c, Cas12d, Cas12g, Cas12h, TnpB, Cas13a, Cas13b, Cas14 and their nickase or inactivated versions.

4. The fusion protein according to claim 3, wherein the CRISPR-associated nuclease is a Cas9 enzyme.

5. The fusion protein according to claim 3, wherein the CRISPR-associated nuclease is a Cas12a enzyme.

6. The fusion protein according to claim 1, wherein the Cro repressor family protein comprises N15 Cro, P22 Cro, 434 Cro, or any combination thereof.

7. The fusion protein according to any one of claims 1 to 6, wherein the recruiter domain comprises an amino acid sequence having at least 90% identity with any one of SEQ ID NOs: 1, 3, or 4.

8. The fusion protein according to any one of claims 1 to 6, wherein the recruiter domain includes a dimerization domain.

9. The fusion protein according to any one of claims 1 to 6, comprising a linker located between the site-specific nuclease and the recruiter domain.

10. The fusion protein according to claim 9, wherein the linker comprises any one of SEQ ID NOs: 6, 7, 15, or 16.

11. A fusion protein according to any one of claims 1 to 6, comprising a nuclear localization signal.

12. A fusion protein according to any one of claims 1 to 6, comprising an amino acid sequence having at least 90% identity with SEQ ID NO: 11 or 13.

13. A recombinant nucleic acid encoding a fusion protein according to any one of claims 1 to 6.

14. A DNA construct comprising a promoter operably linked to the recombinant nucleic acid according to claim 13.

15. The DNA construct according to claim 14, wherein the promoter comprises at least one of an inducible promoter, a constitutive promoter, an egg cell-specific promoter, a pollen-specific promoter, or an apical meristem-specific promoter.

16. The DNA construct according to claim 14, wherein the promoter is a ubiquitin 4 promoter, an actin promoter, a tubulin promoter, a MADS box promoter, or a plant virus promoter.

17. A vector comprising a recombinant nucleic acid according to any one of claims 1 to 6, or a DNA construct comprising a promoter operably linked to the recombinant nucleic acid.

18. A cell comprising a recombinant nucleic acid according to any one of claims 1 to 6, a DNA construct comprising a promoter operably linked to the recombinant nucleic acid, or a vector comprising the recombinant nucleic acid.

19. The cell according to claim 18, which is a plant cell.

20. The cell according to claim 19, wherein the plant cell is a corn plant cell, a soybean plant cell, a rice plant cell, a wheat plant cell, or a sunflower plant cell.

21. A method for editing a nucleic acid, the method comprising:
a. providing at least one fusion protein according to any one of claims 1 to 6;
b. providing the nucleic acid, wherein the nucleic acid comprises a first binding site and a target region including a part of the nucleic acid, and the first binding site is located within or adjacent to the target region;
c. providing a donor polynucleotide comprising a donor nucleotide region and at least one recruitment sequence that is specifically bound by the recruiter domain of the at least one fusion protein; and
d. contacting the nucleic acid and the donor polynucleotide with the at least one fusion protein, wherein the at least one fusion protein specifically binds to the first binding site of the nucleic acid and to the recruitment sequence of the donor polynucleotide, thereby resulting in editing of the target region of the nucleic acid.

22. The method according to claim 21, wherein the first binding site is adjacent to the 5' or 3' end of the target region.

23. The method according to claim 21, wherein the nucleic acid further comprises a second binding site, the second binding site being located within or adjacent to the target region, and the at least one fusion protein specifically binds to the first binding site and the second binding site of the nucleic acid.

24. The method according to claim 23, wherein the second binding site is adjacent to the 5' or 3' end of the target region.

25. The method according to claim 21, wherein the at least one recruitment sequence comprises a Cro OR3 operator sequence.

26. The method according to claim 25, wherein the Cro OR3 operator sequence comprises an N15 OR3 operator sequence (optionally SEQ ID NO: 18), a P22 OR3 operator sequence, a 434 OR3 operator sequence, or a combination thereof.

27. The method according to claim 21, wherein the donor polynucleotide comprises at least one homology arm, and the at least one homology arm comprises a nucleotide sequence complementary to a portion of the target region of the nucleic acid.

28. The method according to claim 21, wherein the donor polynucleotide comprises at least two recruitment sequences.

29. The method according to claim 28, wherein the donor polynucleotide includes a first recruitment sequence adjacent to the 5' end of the donor nucleotide region and a second recruitment sequence adjacent to the 3' end of the donor nucleotide region.

30. The method according to claim 28, wherein the at least two recruitment sequences are not located within the donor nucleotide region.

31. The method according to claim 21, wherein the site-specific nuclease of the at least one fusion protein comprises a CRISPR-associated nuclease, and the method further comprises providing at least one guide RNA, the at least one guide RNA comprising a nucleotide sequence complementary to the first binding site and / or the second binding site of the nucleic acid.

32. The method according to claim 21, wherein the editing of the target region of the nucleic acid is to replace at least a portion of the target region with at least a portion of the donor polynucleotide.