Cas Exonuclease Fusion Proteins and Related Methods for Excision, Inversion, and Site-Specific Integration

JP2025521592A5 · Pending · Publication Date: 2026-07-01 · SYNGENTA CROP PROTECTION AG

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SYNGENTA CROP PROTECTION AG
Filing Date: 2023-06-23
Publication Date: 2026-07-01

Application Information

IPC: C07K19/00; C12N9/16; C12N9/22; C12N15/55; C12N15/62; C12N15/63; C12N15/82; C12N1/15; C12N1/19; C12N1/21; C12N5/10; C12N5/04; C12N15/90

CPC: C12N9/22; C12N2310/20; C07K2319/00; C12N15/82; C12N15/8201; C07K2319/09; C12N9/226; C12N15/111

AI Technical Summary

Technical Problem

Site-specific modifications induced by SDNs, such as CRISPR/Cas systems, often result in inaccurate off-target editing and low frequency of desired deletion events, with varying deletion sizes and frequencies.

Method used

Fusion proteins comprising a site-specific nuclease, like CRISPR-associated nuclease, linked to a non-specific exonuclease, such as Trex2, are used to enhance genome editing efficiency through targeted excision, inversion, or replacement of nucleic acid regions by modulating DNA repair pathways.

Benefits of technology

The fusion proteins increase the frequency and accuracy of desired editing outcomes, such as excision, inversion, or replacement of genomic fragments, by biasing DNA repair mechanisms towards specific pathways like NHEJ or HDR, thereby improving the efficiency of genome editing.

Abstract

The present specification provides fusion proteins, related methods, and systems for increasing the efficiency of genome editing using site-specific nucleases. This fusion protein, system, and method can selectively increase desired editing outcomes (e.g., inversion, excision, and homologous recombination repair). Also provided are various compositions useful for making and using the fusion protein and for practicing the method.

Description

Technical Field

Cross-Reference to Related Applications

[0001] This application claims priority to the specification of Chinese Patent Application No. 202210718723.X, filed on June 23, 2022, which is incorporated herein by reference.

[0002] This disclosure relates to methods for increasing excision, inversion, and site-specific integration. The methods presented herein are applicable to both the non-homologous end-joining (NHEJ) and homology-dependent repair (HDR) mechanisms.

Sequence Listing

[0003] This application is accompanied by a sequence listing entitled 82439-ST26.xml, which is approximately 252 kilobytes in size. This sequence listing is incorporated herein by reference in its entirety.

Background Art

[0004] Site-specific nucleases (SDNs) (e.g., zinc finger nucleases, transcription activator-like effector nucleases, CRISPR-associated nucleases) are becoming increasingly popular in the gene editing space. These SDNs act as endonucleases and generally activate the cell's native repair mechanisms (e.g., homologous recombination) by creating double-strand breaks (DSBs) in specific DNA sequences. During the repair process, site-specific modifications to the specific DNA sequence can be achieved. The CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas (CRISPR-associated) system evolved as an adaptive immune system in bacteria and archaea to defend against viral attack. In recent years, the CRISPR/Cas system has attracted particular attention as a genome editing tool. Using a CRISPR/Cas system that generates site-specific DSBs, eukaryotic DNA can be edited, for example, by causing deletions, insertions, and/or changes in nucleotide sequences.

Summary of the Invention

Problems to be Solved by the Invention

[0005] Site-specific modifications induced by SDNs are often inaccurate (e.g., off-target editing can occur) and often occur at low frequencies. For example, when a CRISPR/Cas system is configured to cause deletions by making one or more DSBs, the size of the deletions can vary and the frequency of the desired deletion event can be relatively low. Therefore, there is a need for methods to increase the efficiency of targeted genome editing using SDNs.

Means for Solving the Problems

[0006] This summary is provided to introduce a selected set of concepts that are further described in detail below. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in limiting the scope of the claimed subject matter.

[0007] In one aspect, provided herein is a fusion protein comprising a site-specific nuclease linked to a non-specific end-processing enzyme. In some embodiments, the site-specific nuclease comprises a CRISPR-associated nuclease. In some embodiments, the CRISPR-associated nuclease is selected from the group consisting of Cas5, Cas6, Cas7, Cas8, Cas9, Cas12a, Cas12b, Cas12i, Cas12j, Cas12L, Cas12e, Cas12c, Cas12d, Cas12g, Cas12h, TnpB, Cas13a, Cas13b, Cas14, and nickase or inactivated versions thereof. In some embodiments, the CRISPR-associated nuclease is a Cas9 enzyme. In some embodiments, the CRISPR-associated nuclease is a Cas12a enzyme.

[0008] In some embodiments of the fusion protein provided herein, the non-specific end-processing enzyme is a non-specific exonuclease. In some embodiments, the non-specific exonuclease is T5Exo, Trex2, Escherichia coli exonuclease I, exonuclease III, exonuclease T, exonuclease IX, exonuclease X, RecJ, Pol II, Pol IIIε, WRN, MRE11, APE1, VDJP, RAD1, RAD9, p53, or Trex1. In some embodiments, the non-specific end-processing enzyme comprises an amino acid sequence having at least 90% identity to any one of SEQ ID NOs: 4, 5, 18, 19, 20, 22, or 58-74. In some embodiments, the non-specific end-processing enzyme is a monomer of a dimerizing protein.

[0009] In some embodiments of the fusion proteins provided herein, the fusion protein comprises a linker located between a site-specific nuclease and a non-specific end-processing enzyme. In some embodiments, the linker comprises SEQ ID NO: 7. In some embodiments, the fusion protein comprises a nuclear localization signal. In some embodiments, the fusion protein comprises an amino acid sequence having at least 90% identity to any one of SEQ ID NOs: 50-57.

[0010] Also provided herein is a recombinant nucleic acid encoding any of the fusion proteins described herein. Also provided is a DNA construct comprising a promoter operably linked to the recombinant nucleic acid described herein. In some embodiments, the promoter is at least one of an inducible promoter, a constitutive promoter, an egg cell-specific promoter, a pollen-specific promoter, or a meristem-specific promoter. In some embodiments, the promoter is a ubiquitin 4 promoter, an actin promoter, a tubulin promoter, a MADS box promoter, or a plant virus promoter. Also provided herein is a vector comprising the recombinant nucleic acid or DNA construct described herein. Also provided herein is a cell comprising the recombinant nucleic acid, DNA construct, or vector described herein. In some embodiments, the cell is a plant cell. In some embodiments, the plant cell is a maize plant cell, a soybean plant cell, a rice plant cell, a wheat plant cell, and/or a sunflower plant cell.

[0011] Also provided herein is a method of editing a nucleic acid, comprising: providing at least one fusion protein described herein; providing a nucleic acid, wherein the nucleic acid comprises a first binding site, a second binding site, and a target region comprising a portion of the nucleic acid, the first binding site being adjacent to the 5' end of the target region and the second binding site being adjacent to the 3' end of the target region; and contacting the nucleic acid with the at least one fusion protein, wherein the at least one fusion protein specifically binds to the first binding site and the second binding site, thereby causing an edit in the target region of the nucleic acid. In some embodiments, the site-specific nuclease of the at least one fusion protein comprises a CRISPR-associated nuclease, and the method further comprises providing at least one first guide RNA and at least one second guide RNA, wherein the at least one first guide RNA comprises a nucleotide sequence having complementarity to the first binding site and the at least one second guide RNA comprises a nucleotide sequence having complementarity to the second binding site. In some embodiments, the first binding site and the second binding site are on the same strand. In some embodiments, the first binding site and the second binding site are on opposite strands. In some embodiments, at least one of the first binding site or the second binding site is within the target region. In some embodiments, both the first binding site and the second binding site are within the target region. In some embodiments, neither the first binding site nor the second binding site is within the first target region.

[0012] In some embodiments of the methods of editing a nucleic acid provided herein, the method further comprises providing a donor nucleic acid, wherein the donor nucleic acid comprises a third binding site, a fourth binding site, and a donor nucleotide region, the third binding site is adjacent to the 5' end of the donor nucleotide region, the fourth binding site is adjacent to the 3' end of the donor nucleotide region, and at least one fusion protein specifically binds to the third binding site and the fourth binding site. In some embodiments, the site-specific nuclease of the at least one fusion protein comprises a CRISPR-associated nuclease, and the method further comprises providing at least one third guide RNA and at least one fourth guide RNA, wherein the at least one third guide RNA comprises a nucleotide sequence having complementarity to the third binding site, and the at least one fourth guide RNA comprises a nucleotide sequence having complementarity to the fourth binding site. In some embodiments, the third binding site and the fourth binding site are on the same strand. In some embodiments, the third binding site and the fourth binding site are on opposite strands. In some embodiments, at least one of the third binding site or the fourth binding site is within the donor nucleotide region. In some embodiments, both the third binding site and the fourth binding site are within the donor nucleotide region. In some embodiments, neither the third binding site nor the fourth binding site is within the donor nucleotide region.

[0013] In some embodiments of the methods of editing a nucleic acid provided herein, the nucleic acid is a portion of a first chromosome. In some embodiments, the donor nucleic acid is a portion of a donor template. In some embodiments, the donor template is a portion of a plasmid or a linear nucleic acid.

[0014] In some embodiments of the methods of editing a nucleic acid provided herein, the editing is an excision, inversion, or replacement of at least a portion of the target region.
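
As a purely illustrative aid, and not part of the disclosed method, the three editing outcomes named above can be modelled as string operations on a sequence with one cut at each end of the target region. The function names, the toy sequence, and the assumption of clean cuts with exact rejoining are hypothetical simplifications; actual repair outcomes vary in size and frequency.

```python
# Toy model (hypothetical): a target region flanked by two cut positions.
# cut5 and cut3 are 0-based string indices marking the 5' and 3' boundaries
# of the target region; both cuts are assumed blunt and exactly rejoined.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def excise(seq: str, cut5: int, cut3: int) -> str:
    """Excision: drop the target region and join the two flanks."""
    return seq[:cut5] + seq[cut3:]


def invert(seq: str, cut5: int, cut3: int) -> str:
    """Inversion: reinsert the target region in the opposite orientation."""
    return seq[:cut5] + reverse_complement(seq[cut5:cut3]) + seq[cut3:]


def replace(seq: str, cut5: int, cut3: int, donor: str) -> str:
    """Replacement: substitute a donor region for the target region."""
    return seq[:cut5] + donor + seq[cut3:]


if __name__ == "__main__":
    sequence = "AAAA" + "GATTACA" + "CCCC"  # flank + target region + flank
    print(excise(sequence, 4, 11))           # AAAACCCC
    print(invert(sequence, 4, 11))           # AAAATGTAATCCCCC
    print(replace(sequence, 4, 11, "GGG"))   # AAAAGGGCCCC
```

The sketch treats the binding sites only as cut coordinates; it does not model guide RNAs, strand choice, or exonuclease end processing.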

[0015] In some embodiments of the methods of editing nucleic acids provided herein, the donor nucleic acid is a portion of a second chromosome. In some embodiments, the first chromosome and the second chromosome are homologous or non-homologous chromosomes. In some embodiments, the editing is a chromosomal rearrangement or replacement of at least a portion of the target region. In some embodiments, the chromosomal rearrangement is a reciprocal or non-reciprocal translocation.

[0016] This application includes the following figures. These figures are intended to illustrate certain embodiments and/or features of the compositions and methods and to supplement any one or more descriptions of the compositions and methods. These figures are not intended to limit the scope of the compositions and methods, except as explicitly indicated by the description in the specification to that effect.

Brief Description of the Drawings

[0017] Figures 1, 2, 3A, 3B, 4, 5, 6, 7A, 7B, 8, and 9.

DETAILED DESCRIPTION OF THE INVENTION

[0018] The following description sets forth various aspects and embodiments of the present compositions and methods. It is not intended that the detailed embodiments define the scope of the present compositions and methods. Rather, the embodiments merely provide non-limiting examples of various compositions and methods that are at least included within the scope of the disclosed compositions and methods. The description is to be read from the perspective of one of ordinary skill in the art, and thus information well known to one of ordinary skill in the art is not necessarily included.

I. TERMINOLOGY

[0019] All technical and scientific terms used herein are intended to have the same meaning as commonly understood by one of ordinary skill in the art, unless specifically defined otherwise below. References to techniques used herein are intended to refer to techniques commonly understood in the art, including variations of those techniques and/or alternative examples of equivalent techniques that would be apparent to one of ordinary skill in the art. The following terms are defined below for ease of explanation of the subject matter of the present disclosure, although one of ordinary skill in the art is considered to be fully familiar with them.

[0020] As used herein, the singular forms "a", "an", and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, when referring to "an enzyme", it optionally includes combinations of two or more such molecules, and the like.

[0021] As used herein, "and/or" refers to and encompasses any and all possible combinations of one or more of the associated listed items.

[0022] The term "about", as used herein, refers to the normal error range of each value readily known to those of ordinary skill in the art. For example, ±20%, ±10%, or ±5% is within the intended meaning range of the recited value.

[0023] As used herein, the term "comprising" or "comprise" is open-ended. When used in connection with a subject nucleic acid (or amino acid sequence), it refers to a nucleic acid sequence (or amino acid sequence) that includes the subject sequence as part or in whole.

[0024] As used herein, the transitional phrase "consisting essentially of" should be interpreted to mean that the scope of a claim encompasses the specified materials or steps recited in the claim, together with materials or steps that do not materially affect the basic and novel characteristic(s) of the claimed subject matter. Thus, it is intended that the term "consisting essentially of" should not be interpreted as equivalent to "comprising" when used in the claims of this disclosure.

[0025] The term "plurality" refers to more than one entity. Thus, "a plurality of individuals" refers to at least two individuals. In some embodiments, the term "plurality" refers to more than half of the whole. For example, in some embodiments, "a plurality of a group" refers to more than half of the members of that group.

[0026] As used herein, the term "plant" refers to any plant at any stage of development, particularly seed plants. As used herein, the term "plant cell" refers to the structural and physiological unit of a plant, including a protoplast and a cell wall. A plant cell may be in the form of an isolated single cell or a cultured cell, or as part of a highly organized unit such as, for example, a plant tissue, a plant organ, or an entire plant. A plant cell may be derived from or be part of an angiosperm or a gymnosperm. A plant cell may be a monocotyledonous plant cell (e.g., a corn cell, a rice cell, a sorghum cell, a sugarcane cell, a barley cell, a wheat cell, a rye cell, a turfgrass cell, or an ornamental herb cell) or a dicotyledonous plant cell (e.g., a tobacco cell, a pepper cell, an eggplant cell, a sunflower cell, a Brassicaceae plant cell, a flax cell, a potato cell, a cotton cell, a soybean cell, a sugar beet cell, or a rape cell). As used herein, the term "plant cell culture" refers to a culture of plant units at various stages of development, such as, for example, protoplasts, cultured cells, cells of plant tissues, pollen, pollen tubes, ovules, embryo sacs, zygotes, and embryos. As used herein, the term "plant tissue" refers to a group of plant cells organized into structural and functional units. Any tissue of a plant, whether in planta or in culture, is included. This term includes, but is not limited to, whole plants, plant organs, plant seeds, tissue cultures, and any group of plant cells organized into structural and/or functional units. When this term is used in conjunction with, or without, any particular type of plant tissue as listed above or otherwise subsumed within this definition, it is not intended to exclude any other type of plant tissue.
As used herein, the term "plant part" refers to parts of a plant, including single cells and cell tissues, such as plant cells that are intact in plants, and cell aggregates and tissue cultures from which plants can be regenerated. Examples of plant parts include, but are not limited to, pollen, ovules, zygotes, leaves, embryos, roots, root tips, anthers, flowers, floral parts, fruits, stems, shoots, cuttings, and seeds; and single cells and tissues from pollen, ovules, egg cells, zygotes, leaves, embryos, roots, root tips, anthers, flowers, floral parts, fruits, stems, shoots, cuttings, scions, rootstocks, seeds, protoplasts, callus, etc.

[0027] The terms "polypeptide", "peptide", and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. As used herein, these terms encompass amino acid chains of any length, including full-length proteins, in which the amino acid residues are linked by covalent peptide bonds.

[0028] The terms "nucleic acid" and "polynucleotide" are used synonymously and, as used herein, refer to deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) in either single-stranded or double-stranded form, and polymers thereof, as well as to both the sense and antisense strands of RNA, cDNA, genomic DNA, mitochondrial DNA, and to synthetic forms and hybrid polymers thereof. In higher plants, while DNA is the genetic material, RNA is involved in transferring the information contained within DNA to proteins. "Genome" refers to the entirety of the genetic material contained in each cell of a living organism. When RNA is described, it is understood that its corresponding cDNA is also described, where uridine is represented as thymidine in cDNA. In a detailed embodiment, a nucleotide refers to a ribonucleotide, deoxynucleotide, or a modified form of any type of nucleotide, and combinations thereof. Additionally, the polynucleotides disclosed herein may include either or both naturally occurring and modified nucleotides linked together by naturally occurring and/or non-naturally occurring nucleotide linkages. Nucleic acid molecules, as would be readily understood by those skilled in the art, may be chemically or biochemically modified or may contain non-natural or derivatized nucleotide bases. Such modifications include, for example, labeling, methylation, substitution with analogs for one or more of the naturally occurring nucleotides, internucleotide modifications such as uncharged linkages (e.g., methylphosphonates, phosphate triesters, phosphoramidates, carbamates, etc.), charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), pendant moieties (e.g., polypeptides), intercalators (e.g., acridine, psoralen, etc.), chelators, alkylating agents, and modified linkages (e.g., α-anomeric nucleic acids, etc.).
The above terms are also intended to include any conformation of any topology, including single-stranded, double-stranded, partially double-stranded, triple-stranded, hairpin-shaped, circular, and padlock-shaped conformations. When referring to a nucleic acid sequence, its complement is included unless otherwise specified. Accordingly, when referring to a nucleic acid molecule having a particular sequence, it should be understood that it includes its complementary strand with its complementary sequence. A nucleotide sequence is "complementary" when it specifically hybridizes in solution (e.g., according to the Watson-Crick base pairing rules). This term also includes codon-optimized nucleic acids encoding the same polypeptide sequence. It is also understood that the nucleic acid may be unpurified, purified, or attached to a synthetic material, such as beads or column matrix.

[0029] The term "corresponding to" in the context of nucleic acid sequences means that when aligning nucleic acid sequences of certain sequences with each other, those nucleic acids that "correspond to" certain recited positions in the present invention align with those positions in the reference sequence, but are not necessarily in their numerically exact positions compared to a particular nucleic acid sequence of the present invention. Optimal alignment of sequences for comparison can be performed by computerized implementations of known algorithms or by visual inspection. Readily available sequence comparison and multiple sequence alignment algorithms are, respectively, the Basic Local Alignment Search Tool (BLAST) and the ClustalW / ClustalW2 / Clustal Omega programs available on the Internet (e.g., the website of EMBL-EBI). Other suitable programs include, but are not limited to, GAP, BestFit, Plot Similarity, and FASTA, which are part of the Accelrys GCG package available from Accelrys, Inc., San Diego, Calif., United States of America. See also Smith & Waterman, 1981; Needleman & Wunsch, 1970; Pearson & Lipman, 1988; Ausubel et al., 1988; and Sambrook & Russell, 2001.

[0030] Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses its conservatively modified variants, such as degenerate codon substitutions, alleles, orthologs, SNPs, and complementary sequences, as well as the explicitly indicated sequences. Specifically, degenerate codon substitutions can be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with a mixed base and/or a deoxyinosine residue. See Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); and Rossolini et al., Mol. Cell. Probes 8:91-98 (1994).
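
The third-position substitution described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name is hypothetical, the mixed base is written as the IUPAC code N (any base), and the input is assumed to be an in-frame coding sequence. Substituting N leaves the encoded amino acid unchanged only for codons that are fourfold degenerate at the third position, as in the example codons chosen here.

```python
def degenerate_third_positions(coding_seq, codon_indices=None, mixed_base="N"):
    """Replace the third base of selected (or all) codons with a mixed base.

    coding_seq    -- in-frame DNA string whose length is a multiple of 3
    codon_indices -- 0-based codon numbers to modify; None means all codons
    mixed_base    -- symbol written at the third position (assumed: IUPAC "N")
    """
    if len(coding_seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [coding_seq[i:i + 3] for i in range(0, len(coding_seq), 3)]
    selected = range(len(codons)) if codon_indices is None else codon_indices
    for idx in selected:
        codons[idx] = codons[idx][:2] + mixed_base
    return "".join(codons)


if __name__ == "__main__":
    # GCT (Ala), GGA (Gly), CCC (Pro): all fourfold degenerate at position 3.
    print(degenerate_third_positions("GCTGGACCC"))                     # GCNGGNCCN
    print(degenerate_third_positions("GCTGGACCC", codon_indices=[1]))  # GCTGGNCCC
```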

[0031] The terms "identity" or "substantial identity," as used in the context of the polynucleotide or polypeptide sequences described herein, refer to a sequence having at least 60% sequence identity with a reference sequence. Alternatively, the percent identity may be any integer from 60% to 100%. Exemplary embodiments include at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity when compared to a reference sequence using the programs described herein, preferably BLAST with the standard parameters described below. Those skilled in the art will recognize that these values can be appropriately adjusted to determine the corresponding identity of the proteins encoded by two nucleotide sequences, taking into account codon degeneracy, amino acid similarity, reading frame positioning, and the like.

[0032] For sequence comparison, typically one sequence serves as the reference sequence and is compared with a test sequence. When using a sequence comparison algorithm, the test sequence and the reference sequence are input into a computer, subsequence coordinates are specified as needed, and sequence algorithm program parameters are specified. Default program parameters can be used, or alternative parameters can be specified. The sequence comparison algorithm then calculates the percent sequence identity of the test sequence relative to the reference sequence based on the program parameters.
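
A minimal sketch of the final step, computing percent identity once a test sequence and a reference sequence have already been aligned, might look as follows. The function name, the use of "-" as the gap symbol, and the choice of denominator (all alignment columns except those that are gaps in both sequences) are assumptions made for illustration; alignment programs differ in how they define the denominator, so values from this sketch need not match any particular tool.

```python
def percent_identity(test_aligned, ref_aligned, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length."""
    if len(test_aligned) != len(ref_aligned):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for t, r in zip(test_aligned, ref_aligned):
        if t == gap and r == gap:
            continue  # a column that is a gap in both rows carries no information
        compared += 1
        if t == r:
            matches += 1  # identical residues (never two gaps at this point)
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # 9 columns, 7 identical: one gap column and one mismatch.
    print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 2))  # 77.78
```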

[0033] As used herein, "comparison window" includes reference to any one segment of a number of contiguous positions selected from the group consisting of from 20 to 600, usually about 50 to about 200, more usually about 100 to about 150, in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. Methods of aligning sequences for comparison are well known in the art. Optimal alignment of sequences for comparison can be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2:482 (1981), the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970), the similarity search method of Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85:2444 (1988), computer implementations of these algorithms (e.g., BLAST), or by manual alignment and visual inspection.

[0034] Algorithms suitable for determining percent sequence identity and percent sequence similarity are the BLAST and BLAST 2.0 algorithms, described in Altschul et al. (1990) J. Mol. Biol. 215:403-410 and Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402, respectively. Software for performing BLAST analyses is publicly available through the website of the National Center for Biotechnology Information (NCBI). This algorithm involves initially identifying high-scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which match or satisfy some positive-valued threshold score T when aligned with words of the same length in the database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating a search to find longer HSPs that contain them. The word hits are then extended in both directions along each sequence as far as possible while the cumulative alignment score can be increased. The cumulative score is calculated for nucleotide sequences using the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for a mismatch residue; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction stops when the cumulative alignment score drops by an amount X from its maximum achieved value; the cumulative score goes to zero or below due to the accumulation of one or more negative-score residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses, by default, a word size (W) of 28, an expectation value (E) of 10, M = 1, N = -2, and a comparison of both strands. For amino acid sequences, the BLASTP program uses, by default, a word size (W) of 3, an expectation value (E) of 10, and the BLOSUM62 scoring matrix. See Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1992).
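
The extension rule described above (accumulate M for matches and N for mismatches, and stop on an X-drop, a non-positive score, or the end of either sequence) can be illustrated with a short sketch. This is not the BLAST implementation: the function name is hypothetical, the seed is assumed to be an exact word match, only ungapped rightward extension is shown, and the value of X used in the example is arbitrary.

```python
def extend_right(query, subject, q_start, s_start, word_size,
                 match=1, mismatch=-2, x_drop=5):
    """Ungapped rightward extension of a seed hit with an X-drop cutoff.

    Returns (best_score, best_length): the maximum cumulative score reached
    and the number of aligned positions (seed included) at that maximum.
    """
    score = word_size * match          # seed assumed to be an exact match
    best_score, best_len = score, word_size
    length = word_size
    i, j = q_start + word_size, s_start + word_size
    while i < len(query) and j < len(subject):   # stop at either sequence end
        score += match if query[i] == subject[j] else mismatch
        length += 1
        if score > best_score:
            best_score, best_len = score, length
        # Stop when the score has dropped by X from its maximum,
        # or has fallen to zero or below.
        if best_score - score >= x_drop or score <= 0:
            break
        i += 1
        j += 1
    return best_score, best_len


if __name__ == "__main__":
    # Seed "ACGT" at position 0 of both sequences; three further matches
    # follow, then mismatches trigger the X-drop cutoff.
    print(extend_right("ACGTACGTTTTT", "ACGTACGAAAAA", 0, 0, 4))  # (7, 7)
```

A full implementation would also extend leftward from the seed and combine the two results; that step is omitted here for brevity.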

[0035] The BLAST algorithm also performs a statistical analysis of the similarity between two sequences. See, e.g., Karlin & Altschul, Proc. Nat'l. Acad. Sci. USA 90:5873-5877 (1993). One measure of similarity provided by the BLAST algorithm is the minimum sum probability (P(N)), which provides an indication of the probability that a match between two nucleotide or amino acid sequences occurs by chance. For example, a test nucleic acid is considered to be similar to a reference sequence if the minimum sum probability in a comparison of the test nucleic acid and the reference nucleic acid is less than about 0.01, more preferably less than about 10⁻⁵, and most preferably less than about 10⁻²⁰.

[0036] "Recombination" is the exchange of DNA strands to create a new nucleotide sequence configuration. This term may also refer to the homologous recombination process that occurs in the repair of double-strand DNA breaks, in which case a polynucleotide is used as a template to repair a homologous polynucleotide. This term may also refer to the exchange of information between two homologous chromosomes during meiosis. The frequency of a double recombination is the product of the frequencies of the corresponding single recombinants. For example, recombinants in a 10 cM region can be found at a frequency of 10%, and double recombinants are found at a frequency of 10% × 10% = 1% (1 centimorgan is defined as 1% recombinant progeny in a test cross).
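
The arithmetic in the example above can be written out as a small helper. The function name is hypothetical, and the sketch assumes, as the text does, that the single recombination events are independent, so that their frequencies multiply; it also assumes the simple short-interval conversion of 1 cM to a 1% recombination frequency.

```python
def double_recombinant_frequency(*single_frequencies):
    """Product of single-recombinant frequencies (each given as a fraction)."""
    result = 1.0
    for freq in single_frequencies:
        if not 0.0 <= freq <= 1.0:
            raise ValueError("frequencies must lie between 0 and 1")
        result *= freq
    return result


def centimorgans_to_frequency(cm):
    """Assumed short-interval conversion: 1 cM corresponds to 1% recombinants."""
    return cm / 100.0


if __name__ == "__main__":
    single = centimorgans_to_frequency(10)  # 10 cM region, i.e. 10%
    double = double_recombinant_frequency(single, single)
    print(f"{double:.2%}")                  # 1.00%
```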

[0037] A "gene" is a defined region located within the genome and containing, in addition to the aforementioned coding nucleic acid sequences, other mainly regulatory nucleic acid sequences involved in the regulation of the expression, i.e., transcription and translation, of its coding portion. A gene can include both a coding region and non-coding regions (e.g., introns, regulatory elements, promoters, enhancers, termination sequences, and 5' and 3' untranslated regions). A gene typically expresses mRNA, functional RNA, or a specific protein, and includes regulatory sequences. A gene may or may not be usable for the production of a functional protein. In some embodiments, a gene refers only to the coding region. The term "natural gene" refers to a gene as found in nature. The term "chimeric gene" refers to any gene that 1) contains DNA sequences with regulatory and coding sequences that are not found co-existing in nature, or 2) contains sequences encoding parts of proteins that are not adjacent in nature, or 3) contains parts of a promoter that are not adjacent in nature. Thus, a chimeric gene can contain regulatory and coding sequences derived from different sources or regulatory and coding sequences derived from the same source but arranged in a manner different from that found in nature. A gene may be "isolated", which means a nucleic acid molecule that substantially or essentially does not contain components normally found in association with nucleic acid molecules in their natural state. Such components include other cellular materials, culture media from recombinant production, and/or various chemicals used in the chemical synthesis of nucleic acid molecules.

[0038] The "gene of interest" or "nucleotide sequence of interest" refers to any gene that confers a desired trait to a plant upon transfer into the plant, such as antibiotic resistance, virus resistance, insect resistance, disease resistance, or resistance to other pests, herbicide resistance, improvement of nutritional value, improvement of performance in industrial processes, or modification of reproductive ability. The "gene of interest" can also include those that are transferred into the plant so that a commercially valuable enzyme or metabolite is produced in the plant.

[0039] An "isolated" nucleic acid molecule or nucleotide sequence, or an "isolated" polypeptide, is a nucleic acid molecule, nucleotide sequence, or polypeptide that, by the hand of man, exists apart from its natural environment and/or has a different, modified, regulated, and/or altered function as compared to its function in its natural environment, and thus is not a natural product. An isolated nucleic acid molecule or isolated polypeptide can exist in purified form or in a non-natural environment (e.g., a recombinant host cell). Thus, for example, with respect to polynucleotides, the term isolated means that it has been separated from the chromosomes and/or cells in which it occurs in nature. A polynucleotide is also isolated if it has been separated from the chromosomes and/or cells in which it occurs in nature and then inserted into a genetic context, chromosome, chromosomal locus, and/or cell in which it does not occur in nature. The recombinant nucleic acid molecules and nucleotide sequences of the present invention can be considered to be "isolated" as defined above.

[0040] Accordingly, an "isolated nucleic acid molecule" or "isolated nucleotide sequence" is a nucleic acid molecule or nucleotide sequence that is not directly adjacent to the nucleotide sequences (one at the 5' end and one at the 3' end) to which it is directly adjacent in the naturally occurring genome of the organism from which it is derived. Accordingly, in one embodiment, an isolated nucleic acid includes some or all of the 5' non-coding (e.g., promoter) sequences that are directly adjacent to the coding sequence. This term therefore includes, for example, recombinant nucleic acids that are incorporated into a vector, an autonomously replicating plasmid or virus, or the genomic DNA of a prokaryote or eukaryote, or that exist as separate molecules (e.g., cDNA or genomic DNA fragments produced by PCR or restriction endonuclease treatment) independent of other sequences. It also includes recombinant nucleic acids that are part of a hybrid nucleic acid molecule encoding a further polypeptide or peptide sequence. An "isolated nucleic acid molecule" or "isolated nucleotide sequence" can also include a nucleotide sequence that is derived from and inserted into the same natural source cell type but exists in a non-natural state, e.g., exists in a different copy number and / or under the control of regulatory sequences that are different from those found in the natural state of the nucleic acid molecule.

[0041] The term "isolated" can further refer to a nucleic acid molecule, nucleotide sequence, polypeptide, peptide, or fragment (e.g., when produced by recombinant DNA techniques) that is substantially free of cellular material, viral material, and / or culture medium, or a chemical precursor or other chemical (e.g., when chemically synthesized). Further, an "isolated fragment" is a fragment of a nucleic acid molecule, nucleotide sequence, or polypeptide that does not naturally exist as such and is not found in its natural state as such. "Isolated" does not necessarily mean that the preparation is technically pure (homogeneous), but is pure enough to provide the polypeptide or nucleic acid in a form that can be used for the intended purpose.

[0042] "Homology-Dependent Repair" or "Homologous Recombination Repair" or "HDR" refers to the mechanism that repairs single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA) damage in cells. This repair mechanism can be used by cells when there is an HDR template with a sequence highly homologous to the damaged site. The term "complete HDR" refers to the situation where the genomic homology junction in the replaced allele has undergone complete HDR, and "incomplete HDR" refers to the situation where the genomic homology junction in the replaced allele has undergone partial or incomplete HDR. Because a donor DNA molecule with homology to the cleaved target DNA sequence is used as a template for the repair of the cleaved target DNA sequence, genetic information is transferred from the donor polynucleotide to the target DNA. Thus, new nucleic acid material can be inserted / copied into the site. In some cases, the target DNA contacts a donor molecule, such as a donor DNA molecule. In some cases, the donor DNA molecule is introduced into the cell. In some cases, at least a segment of the donor DNA molecule is integrated into the genome of the cell.

[0043] "Microhomology-Mediated End Joining" or "MMEJ" or "Alternative Non-Homologous End Joining" (Alt-NHEJ) refers to a form of double-strand break repair in DNA. This repair mechanism utilizes microhomology sequences to align the cleaved strands. "Non-Homologous End Joining" or "NHEJ" refers to a form of double-strand break repair in DNA. The double-strand break is repaired by directly ligating the cleavage ends to each other. Generally, there is no insertion of new nucleic acid material into the site, but small deletions or small insertions can occur due to the loss or addition of some nucleic acid material.

II. Introduction

[0044] This specification provides fusion proteins and related recombinant nucleic acids, systems, and methods for increasing the efficiency of SDN-mediated genome editing by inversion, excision, and HDR, using fusion proteins and donor DNA tethering methods. The present disclosure is based, in part, on the inventors' discovery that fusing an SDN to a non-specific end-processing enzyme (e.g., a non-specific exonuclease) increases the frequency of desired editing outcomes, such as inversion of genomic fragments between two targeted SDN-induced double-strand breaks (DSBs), as demonstrated in Example 1 herein. In general, the detailed cellular mechanisms used for DSB repair are thought to depend on the nature of the DNA ends created by the DSB (e.g., blunt or sticky ends) and / or the level at which end-processing occurs (e.g., whether one of the strands is trimmed). DSBs that do not undergo end-trimming are generally repaired by classical non-homologous end joining (C-NHEJ). C-NHEJ is considered an "error-prone" pathway because, in some cases, it leads to the formation of small insertions and deletions. However, when end-trimming does occur, the ends of the DSB may contain one or more overhangs (e.g., a 3' overhang or a 5' overhang), which can interact with nearby homologous sequences. The repair mechanism of the DSB can vary depending on the extent of processing. When relatively limited end-trimming occurs at the ends of the DSB, the DSB is generally processed by alternative non-homologous end joining (ALT-NHEJ). ALT-NHEJ refers to a class of pathways that includes blunt-end ligation (blunt-end EJ) and microhomology-mediated end joining (MMEJ), which are prone to deletion, and synthesis-dependent microhomology-mediated end joining (SD-MMEJ), which is prone to insertion. However, when end-trimming is extensive, the resulting overhangs can undergo strand invasion of highly homologous sequences, which can be endogenous or heterologous sequences, followed by repair of the DSB by the homology-dependent repair (HDR) pathway. Without being bound by any particular theory, the fusion proteins provided herein may increase the frequency of desirable editing results by combining DSB formation with a desired type and / or level of end processing, thereby biasing DSB repair towards a particular pathway.
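The general relationship described above between the extent of end-trimming and the repair pathway that is typically engaged can be summarized as a simple lookup. The following Python sketch is illustrative only: the `EndTrimming` scale and the `likely_repair_pathway` function are hypothetical names introduced here, and the mapping merely restates the general tendencies described in this paragraph rather than predicting the outcome at any particular break.

```python
from enum import Enum


class EndTrimming(Enum):
    """Extent of end-trimming at a DSB (hypothetical three-level scale)."""
    NONE = "none"
    LIMITED = "limited"
    EXTENSIVE = "extensive"


# General tendencies restated from the text; this is not a predictive model.
PATHWAY_BY_TRIMMING = {
    EndTrimming.NONE: "C-NHEJ",      # no end-trimming
    EndTrimming.LIMITED: "ALT-NHEJ", # includes blunt-end EJ, MMEJ, SD-MMEJ
    EndTrimming.EXTENSIVE: "HDR",    # strand invasion of homologous sequence
}


def likely_repair_pathway(trimming: EndTrimming) -> str:
    """Return the pathway generally associated with a given trimming level."""
    return PATHWAY_BY_TRIMMING[trimming]
```

Under this simplified view, a fusion protein that promotes a particular level of end processing would be expected to shift repair towards the corresponding entry in the table.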

[0045] The present disclosure is also based in part on the inventors' discovery that the use of fusion proteins that are capable of dimerizing can increase the frequency of desirable editing results. As further discussed herein, the fusion proteins can remain bound to their nucleic acid targets even after DSB formation. Additionally, the fusion proteins can be targeted to remain bound to a portion of the nucleic acid target that is upstream or downstream of the DSB cleavage site. When two or more fusion proteins that are capable of forming a dimer are used, the polynucleotide ends to which the fusion proteins are bound come into proximity. Without being bound by any particular theory, the likelihood of which particular DSB repair pathway will be used may be influenced by this proximity. In addition, the inventors have shown that by modulating the targeting of the fusion proteins, it is possible to bias DSB repair towards different outcomes (e.g., excision of a target fragment, inversion of a target fragment, or HDR using a donor template).

III. Fusion Proteins

[0046] In one aspect, provided herein is a fusion protein comprising a site-specific nuclease linked to a non-specific end-processing enzyme. As used throughout, a "fusion protein" includes two different polypeptide sequences, namely, a site-specific nuclease polypeptide sequence and a non-specific end-processing enzyme polypeptide sequence, which are joined or linked to form a single polypeptide. In some embodiments, the two amino acid sequences are encoded by separate nucleic acid sequences joined such that a single polypeptide is created when they are transcribed and translated. The site-specific nuclease and the non-specific end-processing enzyme may be linked in any order and orientation relative to each other. For example, the C-terminal end of the site-specific nuclease may be linked to the N-terminal end or to the C-terminal end of the non-specific end-processing enzyme. The site-specific nuclease and the non-specific end-processing enzyme may also be separated by one or more additional fusion protein domains, as described below.

A. Site-Specific Nuclease

[0047] The fusion proteins provided herein include a site-specific modification polypeptide. The site-specific modification polypeptide modifies the target DNA (e.g., by cleavage or methylation of the target DNA) and / or modifies a polypeptide associated with the target DNA (e.g., by methylation or acetylation of a histone tail). In some embodiments, the site-specific modification polypeptide interacts with a guide RNA that is either a single RNA molecule or an RNA duplex of at least two RNA molecules, and, due to its association with the guide RNA, is directed to a DNA sequence (e.g., a chromosomal sequence or an extrachromosomal sequence such as an episomal sequence, a minicircle sequence, a mitochondrial sequence, a chloroplast sequence, etc.). In some embodiments, the site-specific modification polypeptide is a site-specific nuclease, which is capable of cleaving one or both strands of DNA at a designated target sequence.

[0048] The term "cleavage" or "cleaving" refers to the breaking of the covalent phosphodiester bond in the ribosyl phosphate backbone of a polynucleotide, and includes both single-strand cleavage and double-strand cleavage. Double-strand cleavage can occur as a result of two separate single-strand cleavage events. Cleavage can result in the creation of either blunt ends or overhanging ends (also known as sticky ends). A "nuclease cleavage site" or "genomic nuclease cleavage site" is a region of nucleotides within which a site-specific nuclease cleaves (e.g., when bound to a proximal binding site). When the polynucleotide is DNA (e.g., genomic DNA), one or both strands can be cleaved at the nuclease cleavage site. Such cleavage by a nuclease enzyme triggers the DNA repair machinery within the cell, thereby establishing an environment for homologous recombination to occur.

[0049] In the fusion proteins, systems, and methods disclosed herein, various site-specific nucleases can be used. Suitable nucleases include, but are not limited to, CRISPR-associated (Cas) proteins or Cas nucleases; zinc finger nucleases (ZFNs); transcription activator-like effector nucleases (TALENs); meganucleases; RNA-binding proteins (RBPs); CRISPR-associated RNA-binding proteins; recombinases; flippases; transposases; Argonaute (Ago) proteins (e.g., prokaryotic Argonaute (pAgo), archaeal Argonaute (aAgo), eukaryotic Argonaute (eAgo), and Natronobacterium gregoryi Argonaute (NgAgo)); adenosine deaminases acting on RNA (ADAR); CRISPR-Cas inspired RNA targeting (CIRT) systems; Pumilio / fem-3 binding factor (PUF) proteins; homing endonucleases; and any fragment (including any functional fragment), derivative, or variant thereof. Exemplary site-specific nucleases suitable for use in the fusion proteins, systems, and methods disclosed herein are further described below.

[0050] In some embodiments, the site-specific nuclease is a naturally occurring site-specific nuclease. Exemplary naturally occurring site-specific nucleases are known in the art (see, e.g., Makarova et al., 2017, Cell 168:328-328.e1, and Shmakov et al., 2017, Nat Rev Microbiol 15(3): 169-182, both of which are incorporated herein by reference). In some embodiments, the site-specific nuclease binds to a DNA-targeting polynucleotide (e.g., a guide RNA), thereby being directed to a specific sequence within the target DNA and cleaving the target DNA.

[0051] In some embodiments, the site-specific nuclease is modified from its native sequence (e.g., by mutation of one or more amino acid residues) such that its function is altered. For example, the site-specific nuclease may be modified to be enzymatically inactive. The term "enzymatically inactive" can refer to a site-specific nuclease that can bind to a nucleic acid sequence in a polynucleotide in a sequence-specific manner but does not cleave the target polynucleotide. An enzymatically inactive site-specific polypeptide may include an enzymatically inactive domain (e.g., a nuclease domain). "Enzymatically inactive" can refer to having no activity, substantially no activity, or essentially no activity. It can also refer to an activity that does not exceed 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or 10% of an exemplary wild-type activity (e.g., nucleic acid cleavage activity, wild-type Cas9 activity).
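The percentage-based sense of "enzymatically inactive" described above can be expressed as a comparison of a variant's measured activity against a wild-type reference. The following Python sketch is a hypothetical illustration: the function names, the choice of activity units, and the default 10% cutoff (the upper end of the range recited above) are assumptions made here, not part of the disclosure.

```python
def relative_activity_pct(variant_activity: float, wild_type_activity: float) -> float:
    """Variant activity as a percentage of wild-type activity (same units for both)."""
    if wild_type_activity <= 0:
        raise ValueError("wild-type activity must be positive")
    return 100.0 * variant_activity / wild_type_activity


def is_enzymatically_inactive(
    variant_activity: float,
    wild_type_activity: float,
    threshold_pct: float = 10.0,  # assumed default; the text recites 1% to 10%
) -> bool:
    """True if the variant's activity does not exceed the chosen threshold."""
    return relative_activity_pct(variant_activity, wild_type_activity) <= threshold_pct
```

For example, a variant measured at 0.5 activity units against a wild-type reference of 10.0 units has 5% relative activity and would qualify under either a 5% or a 10% threshold.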

[0052] In some embodiments, a site-specific nuclease (e.g., an enzymatically inactive site-specific nuclease) is fused to one or more transcriptional repressor domains, activator domains, epigenetic domains, recombinase domains, transposase domains, flippase domains, nickase domains, cleavage domains, or any combination thereof. Examples of activator domains include one or more tandem activation domains located at the carboxyl terminus of the enzyme. In other cases, the fused domains include one or more tandem repressor domains located at the carboxyl terminus of the protein. Non-limiting exemplary activation domains include GAL4, herpes simplex activation domain VP16, VP64 (a tetramer of herpes simplex activation domain VP16), the NF-κB p65 subunit, and Epstein-Barr virus R transactivator (Rta), as described in Chavez et al., Nat Methods, 2015, 12(4):326-328 and U.S. Patent Application Publication No. 20140068797. Non-limiting exemplary repression domains include the KRAB (Krüppel-associated box) domain of Kox1, the Mad mSIN3 interaction domain (SID), and the ERF repressor domain (ERD), as described in Chavez et al., Nat Methods, 2015, 12(4):326-328 and U.S. Patent Application Publication No. 20140068797. The nuclease may also be fused to a heterologous polypeptide that provides increased or decreased stability. The fused domain or heterologous polypeptide may be located at the N-terminus, at the C-terminus, or internally within the nuclease.

CRISPR / Cas nuclease

[0053] In some embodiments, the site-specific nuclease comprises a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) / Cas system-associated (Cas) protein or Cas nuclease that functions in the CRISPR / Cas system. In bacteria, this system can provide adaptive immunity against foreign DNA (Barrangou, R., et al, “CRISPR provides acquired resistance against viruses in prokaryotes,” Science (2007) 315:1709-1712; Makarova, K.S., et al, “Evolution and classification of the CRISPR-Cas systems,” Nat Rev Microbiol (2011) 9:467-477; Garneau, J.E., et al, “The CRISPR / Cas bacterial immune system cleaves bacteriophage and plasmid DNA,” Nature (2010) 468:67-71; Sapranauskas, R., et al, “The Streptococcus thermophilus CRISPR / Cas system provides immunity in Escherichia coli,” Nucleic Acids Res (2011) 39:9275-9282). The CRISPR / Cas system (e.g., modified and / or unmodified) can be utilized as a genome engineering tool in a wide variety of organisms, including diverse mammals, animals, plants, microorganisms, and yeast. The CRISPR / Cas system can include a guide nucleic acid such as a guide RNA (gRNA) complexed with a Cas protein for targeted regulation of gene expression and / or activity or nucleic acid editing. RNA-guided Cas proteins (e.g., Cas nucleases such as Cas9 nuclease) can specifically bind to a target polynucleotide (e.g., DNA) in a sequence-dependent manner. When the Cas protein has nuclease activity, it can cleave DNA (Gasiunas, G., et al., “Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria,” Proc Natl Acad Sci USA (2012) 109: E2579-E2586; Jinek, M., et al., “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity,” Science (2012) 337: 816-821; Sternberg, S.H., et al., “DNA interrogation by the CRISPR RNA-guided endonuclease Cas9,” Nature (2014) 507: 62; Deltcheva, E., et al., “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III,” Nature (2011) 471: 602-607). As a result of DNA cleavage (e.g., double-strand cleavage), DNA cleavage repair can occur, enabling the introduction of one or more gene modifications (e.g., nucleic acid editing). DNA cleavage repair can occur by non-homologous end joining (NHEJ), microhomology-mediated end joining (MMEJ), or homologous recombination repair (HDR). In some embodiments, a donor nucleic acid is used to facilitate HDR, as detailed in the “Systems” section below. The CRISPR-Cas system has been widely used for programmable genome editing in various organisms and model systems (Cong, L., et al., “Multiplex genome engineering using CRISPR Cas systems,” Science (2013) 339:819-823; Jiang, W., et al., “RNA-guided editing of bacterial genomes using CRISPR-Cas systems,” Nat. Biotechnol. (2013) 31:233-239; Sander, J.D. & Joung, J.K, “CRISPR-Cas systems for editing, regulating and targeting genomes,” Nature Biotechnol. (2014) 32:347-355).

[0054] In some embodiments, the site-specific nucleases described herein include a Cas protein that forms a complex with a guide nucleic acid such as a guide RNA (further described in the “Systems” section below). In some embodiments, the site-specific nuclease includes a Cas protein that forms a complex with a single guide nucleic acid such as a single guide RNA (sgRNA). In some embodiments, the site-specific nuclease includes an RNA-binding protein (RBP) optionally complexed with a guide nucleic acid such as a guide RNA (e.g., sgRNA) that is capable of forming a complex with the Cas protein. In some examples, the RNA-guided Cas protein recognizes a DNA target complementary to a portion of the gRNA known as the CRISPR RNA (crRNA) sequence. The target sequence is often referred to as a protospacer, and the portion of the crRNA sequence that is complementary to the protospacer is often referred to as the spacer. For function (e.g., to cleave DNA), many Cas nucleases also require a specific protospacer adjacent motif (PAM), a DNA sequence of about 2-6 base pairs immediately following the protospacer sequence.
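The relationship between a protospacer and the PAM that immediately follows it can be illustrated by scanning a DNA string for candidate sites. The following Python sketch is a simplified, hypothetical example: the function name is introduced here, the 20-nucleotide protospacer length is an assumed default not stated in the text, the 5'-NGG-3' PAM is used as an example of a PAM of the kind described, and only the strand given as input is searched.

```python
def find_ngg_targets(seq: str, spacer_len: int = 20) -> list[tuple[int, str, str]]:
    """List (start, protospacer, PAM) for sites followed by a 5'-NGG-3' PAM.

    Assumptions: spacer_len defaults to 20 (not specified in the text) and
    only the input strand is scanned; the reverse complement is ignored.
    """
    seq = seq.upper()
    pam_len = 3
    hits = []
    for start in range(len(seq) - spacer_len - pam_len + 1):
        pam = seq[start + spacer_len : start + spacer_len + pam_len]
        # N may be any of A, C, G, or T; the last two positions must be G.
        if pam[0] in "ACGT" and pam[1:] == "GG":
            hits.append((start, seq[start : start + spacer_len], pam))
    return hits
```

A practical design tool would also scan the reverse complement and apply additional filters; those steps are omitted to keep the sketch focused on the protospacer and PAM arrangement.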

[0055] In the fusion proteins, systems, and methods provided herein, various site-specific Cas nucleases (e.g., Cas proteins from different species) can be useful based on the various enzymatic properties of different Cas proteins (e.g., different protospacer adjacent motif (PAM) sequence preferences; increased or decreased enzymatic activity; increased or decreased cytotoxicity levels; tendency to cause one or more of NHEJ, homologous recombination repair, single-strand breaks, double-strand breaks, etc.). Cas proteins from various species (e.g., those disclosed in Shmakov et al., 2017, or polypeptides derived therefrom) may require different PAM sequences in the target DNA. Thus, for a given Cas enzyme of choice, the PAM sequence requirements can differ from the 5'-NGG-3' sequence (wherein N is any one of A, T, C, or G), which is known to be required for Cas9 activity. Many Cas9 orthologs have been identified from a wide variety of species, and those proteins share only a few identical amino acids. All identified Cas9 orthologs have the same domain architecture, including a central HNH endonuclease domain and a split RuvC / RNase H domain. Cas9 proteins share four key motifs of conserved structure; motifs 1, 2, and 4 are RuvC-like motifs, while motif 3 is an HNH motif. In contrast, Cas12a proteins from various species may have different PAM sequence requirements compared to the LbCas12a canonical PAM of TTTV.
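Because PAM requirements are written with degenerate bases (N in 5'-NGG-3', V in TTTV), checking a candidate site against a PAM amounts to matching each position against a set of allowed nucleotides. The Python sketch below uses the standard IUPAC nucleotide ambiguity codes; the function name `pam_matches` is hypothetical and introduced only for illustration.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_matches(pam_pattern: str, site: str) -> bool:
    """True if `site` satisfies the degenerate `pam_pattern`, position by position."""
    pam_pattern = pam_pattern.upper()
    site = site.upper()
    if len(pam_pattern) != len(site):
        return False
    return all(base in IUPAC[code] for code, base in zip(pam_pattern, site))
```

With this helper, `pam_matches("NGG", "TGG")` and `pam_matches("TTTV", "TTTA")` both hold, whereas `pam_matches("TTTV", "TTTT")` does not, since V excludes T.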

[0056] Any suitable CRISPR / Cas system can be used. CRISPR / Cas systems can be referred to using various nomenclature systems. Exemplary nomenclature systems are provided in Makarova, K.S. et al, “An updated evolutionary classification of CRISPR-Cas systems,” Nat Rev Microbiol (2015) 13:722-736 and Shmakov, S. et al, “Discovery and Functional Characterization of Diverse Class 2 CRISPR-Cas Systems,” Mol Cell (2015) 60:1-13. The CRISPR / Cas system can be a type I, type II, type III, type IV, type V, type VI system, or any other suitable CRISPR / Cas system. The CRISPR / Cas system as used herein can be a class 1, class 2, or any other suitably classified CRISPR / Cas system. The determination of class 1 or class 2 can be based on the genes encoding the effector modules. Class 1 systems generally have a multi-subunit crRNA-effector complex, while class 2 systems generally have a single protein, such as Cas9, Cpf1, C2c1, C2c2, or C2c3, or a crRNA-effector complex. Class 1 CRISPR / Cas systems can use a complex of multiple Cas proteins to effect regulation. Class 1 CRISPR / Cas systems can include, for example, type I (e.g., type I, IA, IB, IC, ID, IE, IF, IU), type III (e.g., type III, IIIA, IIIB, IIIC, IIID), and type IV (e.g., type IV, IVA, IVB) CRISPR / Cas types. Class 2 CRISPR / Cas systems can use a single large Cas protein to effect regulation. Class 2 CRISPR / Cas systems can include, for example, type II (e.g., type II, IIA, IIB) and type V CRISPR / Cas types. CRISPR systems may be complementary to each other and / or may provide functional units in trans to facilitate CRISPR locus targeting.
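The class and type groupings described above can be captured in a small lookup table. The Python sketch below is illustrative only; the names `CLASS_BY_TYPE` and `crispr_class` are hypothetical. The entries for types I through V follow the groupings stated in this paragraph. The entry for type VI is an addition not stated in the paragraph: it is assigned to class 2 here on the basis of the cited Makarova and Shmakov classification, in which the type VI effector C2c2 is a single-protein effector.

```python
# Type-to-class assignments. Types I, II, III, IV, and V follow the text;
# type VI is an assumption added here (class 2 per the cited classification).
CLASS_BY_TYPE = {
    "I": 1,
    "III": 1,
    "IV": 1,
    "II": 2,
    "V": 2,
    "VI": 2,  # not stated in the paragraph above
}


def crispr_class(system_type: str) -> int:
    """Return 1 or 2 for a CRISPR / Cas type given as a Roman numeral (e.g., "II")."""
    key = system_type.strip().upper()
    if key not in CLASS_BY_TYPE:
        raise ValueError(f"unknown CRISPR / Cas type: {system_type!r}")
    return CLASS_BY_TYPE[key]
```

Subtype suffixes (e.g., IIA or IIIB) are deliberately not parsed in this sketch; a caller would pass the bare type numeral.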

[0057] The Cas protein may be derived from any suitable organism. Non-limiting examples include Streptococcus pyogenes, Streptococcus thermophilus, Streptococcus sp., Staphylococcus aureus, Nocardiopsis dassonvillei, Streptomyces pristinaespiralis, Streptomyces viridochromogenes, Streptosporangium roseum, Alicyclobacillus acidocaldarius, Bacillus pseudomycoides, Bacillus selenitireducens, Exiguobacterium sibiricum, Lactobacillus delbrueckii, Lactobacillus salivarius, Microscilla marina, bacteria of the Burkholderiales order, Polaromonas naphthalenivorans, Polaromonas sp., Crocosphaera watsonii, Cyanothece sp., Microcystis aeruginosa, Pseudomonas aeruginosa, Synechococcus sp., Acetohalobium arabaticum, Ammonifex degensii, Caldicellulosiruptor bescii, Candidatus Desulforudis, Clostridium botulinum, Clostridium difficile, Finegoldia magna, Natranaerobius thermophilus, Pelotomaculum thermopropionicum, Acidithiobacillus caldus, Acidithiobacillus ferrooxidans, Allochromatium vinosum, Marinobacter sp., Nitrosococcus halophilus, Nitrosococcus watsonii, Pseudoalteromonas haloplanktis, Ktedonobacter racemifer, Methanohalobium evestigatum, Anabaena variabilis, Nodularia spumigena, Nostoc sp., Arthrospira maxima, Arthrospira platensis, Arthrospira sp., Lyngbya sp., Microcoleus chthonoplastes, Oscillatoria sp., Petrotoga mobilis, Thermosipho africanus, Acaryochloris marina, Leptotrichia shahii, and Francisella novicida. In some embodiments, the organism is Streptococcus pyogenes (S. pyogenes). In some embodiments, the organism is Staphylococcus aureus (S. aureus). In some embodiments, the organism is Streptococcus thermophilus (S. thermophilus).

[0058] Cas proteins can be derived from various bacterial species, including, but not limited to, Veillonella atypica, Fusobacterium nucleatum, Filifactor alocis, Solobacterium moorei, Coprococcus catus, Treponema denticola, Peptoniphilus duerdenii, Catenibacterium mitsuokai, Streptococcus mutans, Listeria innocua, Staphylococcus pseudintermedius, Acidaminococcus intestini, Olsenella uli, Oenococcus kitaharae, Bifidobacterium bifidum, Lactobacillus rhamnosus, Lactobacillus gasseri, Finegoldia magna, Mycoplasma mobile, Mycoplasma gallisepticum, Mycoplasma ovipneumoniae, Mycoplasma canis, Mycoplasma synoviae, Eubacterium rectale, Streptococcus thermophilus, Eubacterium dolichum, Lactobacillus coryniformis subsp. torquens, Ilyobacter polytropus, Ruminococcus albus, Akkermansia muciniphila, Acidothermus cellulolyticus, Bifidobacterium longum, Bifidobacterium dentium, Corynebacterium diphtheriae, Elusimicrobium minutum, Nitratifractor salsuginis, Sphaerochaeta globosa, Fibrobacter succinogenes subsp. succinogenes, Bacteroides fragilis, Capnocytophaga ochracea, Rhodopseudomonas palustris, Prevotella micans, Prevotella ruminicola, Flavobacterium columnare, Aminomonas paucivorans, Rhodospirillum rubrum, Candidatus Puniceispirillum marinum, Verminephrobacter eiseniae, Ralstonia syzygii, Dinoroseobacter shibae, Azospirillum, Nitrobacter hamburgensis, Bradyrhizobium, Wolinella succinogenes, Campylobacter jejuni subsp. jejuni, Helicobacter mustelae, Bacillus cereus, Acidovorax ebreus, Clostridium perfringens, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria meningitidis, Pasteurella multocida subsp. multocida, Sutterella wadsworthensis, Proteobacteria, Legionella pneumophila, Parasutterella excrementihominis, and Francisella novicida.

[0059] Non-limiting examples of Cas proteins include C2c1, C2c2, C2c3, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6e, Cas6f, Cas7, Cas8a, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), Cas10, Cas10d, CasF, CasG, CasH, Cpf1, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, and Cu1966, and homologs or modified versions thereof. In some embodiments, the site-specific nuclease of the fusion protein provided herein includes a CRISPR-associated nuclease, where the CRISPR-associated nuclease is Cas5, Cas6, Cas7, Cas8, Cas9, Cas12a, Cas12b, Cas12i, Cas12j, Cas12L, Cas12e, Cas12c, Cas12d, Cas12g, Cas12h, TnpB, Cas13a, Cas13b, or Cas14. In some embodiments, the CRISPR-associated nuclease is the Cas9 enzyme. In some embodiments, the CRISPR-associated nuclease is the Cas12a enzyme. In some embodiments, the CRISPR-associated nuclease is a nickase or an inactivated version of the CRISPR-associated nuclease.

[0060] The Lachnospiraceae bacterium Cpf1 (LbCpf1) is one member of a large group of Cpf1 proteins. The terms "Cpf1" and "Cas12a" are used synonymously throughout this disclosure. Cpf1 is a Cas protein. In some embodiments, the site-specific nuclease is a catalytically inactive Cas12a from the Lachnospiraceae bacterium ("dLbCas12a"). In other embodiments, the site-specific nuclease is a catalytically active Cas12a from the Lachnospiraceae bacterium ("LbCas12a") or Moraxella bovoculi AAX08_00205 ("Mb2Cas12a"). In some embodiments, the site-specific nuclease domain of the fusion protein is a Cas12a protein from any of the Lachnospiraceae bacterium, Acidaminococcus sp., Moraxella bovoculi, Thiomicrospira sp., Moraxella lacunata, Methanomethylophilus alvus, Butyrivibrio sp., or Bacteroidetes oral sp.

[0061] The Cas protein can include one or more domains. Non-limiting examples of domains include a guide nucleic acid recognition and / or binding domain, a nuclease domain (e.g., a DNase or RNase domain, RuvC, HNH), a DNA binding domain, an RNA binding domain, a helicase domain, a protein-protein interaction domain, and a dimerization domain. The guide nucleic acid recognition and / or binding domain can interact with a guide nucleic acid. The nuclease domain can include catalytic activity for nucleic acid cleavage. The nuclease domain may lack catalytic activity to prevent nucleic acid cleavage. The Cas protein may be a chimeric Cas protein fused to another protein or polypeptide. The Cas protein may be, for example, a chimera of various Cas proteins including domains from different Cas proteins.

[0062] The Cas proteins used herein may be active variants, inactive variants, or fragments of wild-type or modified Cas proteins. The Cas proteins may include amino acid changes such as deletions, insertions, substitutions, variants, mutations, fusions, chimeras, or any combination thereof compared to the wild-type version of the Cas protein. The Cas proteins may be polypeptides having at least about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity or sequence similarity to an exemplary wild-type Cas protein. The Cas proteins may be polypeptides having at most about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100% sequence identity and / or sequence similarity to an exemplary wild-type Cas protein. The variant or fragment may include at least about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity or sequence similarity to the wild-type or modified Cas protein or a portion thereof. The variant or fragment may be able to complex with a guide nucleic acid and be targeted to a nucleic acid locus while lacking nucleic acid cleavage activity.
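Percent sequence identity of the kind recited above is computed from an alignment of a variant against a reference. The Python sketch below is a minimal, hypothetical illustration that assumes the two sequences have already been aligned to equal length, with "-" marking gaps; the function name and the convention of dividing by the full alignment length (gaps included) are choices made here, and other conventions exist.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention: identical non-gap positions divided by the total
    alignment length, including gap columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("alignment must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)
```

For instance, the toy peptides `MKTAY` and `MKTAF` differ at one of five aligned positions and therefore share 80% identity under this convention.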

[0063] In some embodiments, the modified Cas protein has reduced functionality compared to its unmodified form. In some embodiments, the modified Cas protein lacks the functionality of the unmodified form. For example, a nuclease-deficient Cas protein retains the ability to bind DNA but lacks or has reduced nucleic acid cleavage activity. Cas nucleases (e.g., those retaining wild-type nuclease activity, having reduced nuclease activity, and / or lacking nuclease activity) can function in the CRISPR / Cas system to regulate (e.g., decrease, increase, or abolish) the level and / or activity of a target gene or protein. The Cas protein can bind to a target polynucleotide and cause a non-functional gene product to be produced by physically interfering or by preventing transcription through editing of the nucleic acid sequence. In some embodiments, the modified Cas protein has 90% or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, 30% or less, 20% or less, 10% or less, 5% or less, or 1% or less of the functionality (e.g., nuclease activity) of the wild-type Cas protein (e.g., Cas9 from Streptococcus pyogenes). In some embodiments, the modified Cas protein has substantially no functionality of the wild-type Cas protein. When the Cas protein is in a modified form and has substantially no nucleic acid cleavage activity, it can be referred to as enzymatically inactive and / or "dead" (abbreviated as "d"). A dead Cas protein (e.g., dCas, dCas9) can bind to a target polynucleotide but does not cleave the target polynucleotide. In some aspects, the dead Cas protein is a dead Cas9 protein or a dead Cas12a protein.

[0064] In some embodiments, the modified Cas protein may be a modified Cas "base editor". In base editing, it is possible to directly and irreversibly convert one target DNA base to another base in a programmable manner, and DNA cleavage or donor DNA molecules are not required. For example, Komor et al (2016, Nature, 533: 420-424) taught a Cas9-cytidine deaminase fusion, where Cas9 is also inactivated and engineered not to induce double-stranded DNA cleavage. Additionally, Gaudelli et al (2017, Nature, doi:10.1038 / nature24644) taught a catalytically impaired Cas9 fused to tRNA adenosine deaminase, which can mediate the conversion of A / T to G / C in the target DNA sequence. Another class of engineered Cas9 nucleases that can be used as site-specific nucleases in the fusion proteins of the present disclosure are mutants that can recognize a wide range of PAM sequences, including NG, GAA, and GAT (Hu et al., 2018, Nature, doi:10.1038 / nature26155).

[0065] The Cas protein can be modified such that the regulation of gene expression is optimized. The Cas protein can be modified such that its nucleic acid binding affinity, nucleic acid binding specificity, and / or enzymatic activity are increased or decreased. The Cas protein can also be modified such that other optional activities or properties of the protein, such as stability, are changed. For example, one or more nuclease domains of the Cas protein can be modified, deleted, or inactivated, or the Cas protein can be truncated to regulate gene expression, removing domains that are not essential for the function of the protein, or optimizing (e.g., enhancing or reducing) the activity of the Cas protein.

[0066] One or more nuclease domains (e.g., RuvC, HNH) of the Cas protein can be deleted or mutated so that they are no longer functional or have reduced nuclease activity. For example, in a Cas protein that contains at least two nuclease domains (e.g., Cas9), if one of the nuclease domains is deleted or mutated, the resulting Cas protein, known as a nickase, can generate a single-strand break rather than a double-strand break in the CRISPR RNA (crRNA) recognition sequence within double-stranded DNA. Such a nickase can cleave either the complementary or non-complementary strand, but not both. In some embodiments, the targeting specificity of the double-strand break is improved by targeting the nickase to opposite strands at two neighboring loci. If the nickase cleaves the single strands at both loci, a double-strand break is formed and can be repaired by HR as described herein. If all of the nuclease domains of the Cas protein (e.g., both the RuvC and HNH nuclease domains in the Cas9 protein; the RuvC nuclease domain in the Cpf1 protein) are deleted or mutated, the resulting Cas protein may have a reduced or no ability to cleave both strands of double-stranded DNA.

[0067] Zinc finger nuclease In some embodiments, a site-specific nuclease suitable for use in the fusion proteins or methods described herein is a "zinc finger nuclease" or "ZFN". A ZFN refers to a fusion between a cleavage domain, such as the cleavage domain of FokI, and at least one zinc finger motif (e.g., at least 2, 3, 4, or 5 zinc finger motifs) capable of binding to a polynucleotide such as DNA and RNA. When two individual ZFNs heterodimerize at specific positions and intervals in a particular polynucleotide, cleavage of that polynucleotide can result. For example, when a ZFN binds to DNA, a double-strand break can be induced in that DNA. The two individual ZFNs can bind to the opposite strands of DNA with their C-termini separated by a specific distance so that the two cleavage domains can dimerize to cleave the DNA. Optionally, the linker sequence between the zinc finger domain and the cleavage domain may require the 5' ends of the two binding sites to be separated by about 5 to 7 base pairs. Optionally, the cleavage domain is fused to the C-terminus of each zinc finger domain. Exemplary ZFNs include, but are not limited to, those described in Urnov et al., Nature Reviews Genetics, 2010, 11:636-646; Gaj et al., Nat Methods, 2012, 9(8):805-7; U.S. Patent No. 6,534,261; No. 6,607,882; No. 6,746,838; No. 6,794,136; No. 6,824,978; No. 6,866,997; No. 6,933,113; No. 6,979,539; No. 7,013,219; No. 7,030,215; No. 7,220,719; No. 7,241,573; No. 7,241,574; No. 7,585,849; No. 7,595,376; No. 6,903,185; No. 6,479,626; and those described in U.S. Patent Application Publication Nos. 2003 / 0232410 and 2009 / 0203140.

[0068] In some embodiments, a nuclease comprising a ZFN can generate a double-strand break in a target polynucleotide, such as DNA. As a result of the double-strand break in the DNA, DNA break repair can occur, enabling the introduction of one or more gene modifications (e.g., nucleic acid editing). DNA break repair can occur by non-homologous end joining (NHEJ) or homology-directed repair (HDR). In HDR, a donor DNA repair template or template polynucleotide containing homology arms adjacent to the site of the target DNA can be provided. In some embodiments, the ZFN is a zinc finger nickase that induces site-specific single-strand DNA breaks or nicks, thereby resulting in HR. For an explanation of zinc finger nickases, see, for example, Ramirez et al., Nucl Acids Res, 2012, 40(12):5560-8; Kim et al., Genome Res, 2012, 22(7):1327-33. In some embodiments, the ZFN binds to a polynucleotide (e.g., DNA and / or RNA), but is unable to cleave the polynucleotide.

[0069] In some embodiments, the cleavage domain of a nuclease comprising a ZFN comprises a modified form of a wild-type cleavage domain. The modified form of the cleavage domain can include amino acid changes (e.g., deletions, insertions, or substitutions) that reduce the nucleic acid cleavage activity of the cleavage domain. For example, the modified form of the cleavage domain can have a nucleic acid cleavage activity that is 90% or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, 30% or less, 20% or less, 10% or less, 5% or less, or 1% or less of the wild-type cleavage domain. The modified form of the cleavage domain can be one that has substantially no nucleic acid cleavage activity. In some embodiments, the cleavage domain is enzymatically inactive.

[0070] TAL effector nuclease In some embodiments, a site-specific nuclease suitable for use in the fusion proteins, systems, or methods described herein is a "TALEN" or "TAL effector nuclease". A TALEN generally refers to an engineered transcription activator-like effector nuclease having a central domain of DNA-binding tandem repeats and a cleavage domain. A TALEN can be produced by fusing a TAL effector DNA-binding domain to a DNA cleavage domain. In some cases, each of the DNA-binding tandem repeats is 33 to 35 amino acids in length and has two hypervariable amino acid residues at positions 12 and 13 that can recognize at least one specific DNA base pair. The transcription activator-like effector (TALE) protein can be fused to a nuclease such as a wild-type or mutant FokI endonuclease or the catalytic domain of FokI. Several mutations of FokI have been made for its use in TALENs, for example, to improve cleavage specificity or activity. Such TALENs can be engineered to bind to any desired DNA sequence. Double-strand breaks can be created in the target DNA sequence using TALENs, and then gene modifications (e.g., nucleic acid sequence editing) can be generated by causing NHEJ or HR to occur. As a result of double-strand breaks in DNA, DNA break repair can occur, enabling the introduction of one or more gene modifications (e.g., nucleic acid editing). DNA break repair can occur by non-homologous end joining (NHEJ) or homology-directed repair (HDR). In HDR, a donor DNA repair template or template polynucleotide having homology arms adjacent to the site of the target DNA can be provided. In some cases, a single-stranded donor DNA repair template is provided to promote HR. For a detailed description of TALENs and their use in gene editing, see, for example, U.S. Patent Nos. 8,440,431; 8,440,432; 8,450,471; 8,586,363; and 8,697,853; Scharenberg et al., Curr Gene Ther, 2013, 13(4):291-303; Gaj et al., Nat Methods, 2012, 9(8):805-7; Beurdeley et al., Nat Commun, 2013, 4:1762; and Joung and Sander, Nat Rev Mol Cell Biol, 2013, 14(1):49-55.

[0071] In some embodiments, the TALEN is engineered to reduce nuclease activity. In some embodiments, the nuclease domain of the TALEN comprises a modified form of the wild-type nuclease domain. The modified form of the nuclease domain may include amino acid changes (e.g., deletions, insertions, or substitutions) that reduce the nucleic acid cleavage activity of the nuclease domain. For example, the modified form of the nuclease domain may have nucleic acid cleavage activity that is 90% or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, 30% or less, 20% or less, 10% or less, 5% or less, or 1% or less of the wild-type nuclease domain. The modified form of the nuclease domain may be one that has substantially no nucleic acid cleavage activity. In some embodiments, the nuclease domain is enzymatically inactive.

[0072] In some embodiments, the transcription activator-like effector (TALE) protein is fused to a domain capable of regulating transcription and does not contain a nuclease. In some embodiments, the transcription activator-like effector (TALE) protein is designed to function as a transcription activator. In some embodiments, the transcription activator-like effector (TALE) protein is designed to function as a transcription repressor. For example, the DNA binding domain of the transcription activator-like effector (TALE) protein can be fused (e.g., linked) to one or more transcription activation domains or to one or more transcription repression domains. Non-limiting examples of transcription activation domains include the herpes simplex VP16 activation domain and the tetrameric repeat of the VP16 activation domain, e.g., the VP64 activation domain. Non-limiting examples of transcription repression domains include the Kruppel-associated box domain.

[0073] Meganuclease In some embodiments, a site-specific nuclease suitable for use in the fusion proteins, systems, or methods described herein is a meganuclease. A meganuclease generally refers to a rare-cutting endonuclease or homing endonuclease that can be highly specific. Meganucleases can recognize DNA target sites of at least 12 base pairs in length, for example, 12-40 base pairs in length, 12-50 base pairs in length, or 12-60 base pairs in length. A meganuclease can be a modular DNA-binding nuclease, such as any fusion protein comprising at least one catalytic domain of an endonuclease and at least one DNA-binding domain or protein that specifies a nucleic acid target sequence. The DNA-binding domain can comprise at least one motif that recognizes single-stranded or double-stranded DNA. Meganucleases can generate double-strand breaks. As a result of double-strand breaks in DNA, DNA break repair can occur, enabling the introduction of one or more gene modifications (e.g., nucleic acid editing). DNA break repair can occur by non-homologous end joining (NHEJ) or homology-directed repair (HDR). In HDR, a donor DNA repair template or template polynucleotide having homology arms adjacent to the site of the target DNA can be provided. Meganucleases can be monomers or dimers. In some embodiments, the meganuclease is naturally occurring (found in nature) or wild-type, and in other cases, the meganuclease is non-natural, artificial, engineered, synthetic, rationally designed, or man-made. In some embodiments, meganucleases of the present disclosure include I-CreI meganuclease, I-CeuI meganuclease, I-MsoI meganuclease, I-SceI meganuclease, variants thereof, derivatives thereof, and fragments thereof. For a detailed description of useful meganucleases and their applications in gene editing, see, for example, Silva et al., Curr Gene Ther, 2011, 11(1):11-27; Zaslavskiy et al., BMC Bioinformatics, 2014, 15:191; Takeuchi et al., Proc Natl Acad Sci USA, 2014, 111(11):4061-4066, and U.S. Patent Nos. 7,842,489; 7,897,372; 8,021,867; 8,163,514; 8,133,697; 8,021,867; 8,119,361; 8,119,381; 8,124,36; and 8,129,134.

[0074] In some embodiments, the nuclease domain of the meganuclease comprises a modified form of the wild-type nuclease domain. The modified form of the nuclease domain can include amino acid changes (e.g., deletions, insertions, or substitutions) that reduce the nucleic acid cleavage activity of the nuclease domain. For example, the modified form of the nuclease domain can have a nucleic acid cleavage activity that is 90% or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, 30% or less, 20% or less, 10% or less, 5% or less, or 1% or less of the wild-type nuclease domain. The modified form of the nuclease domain can be one that has substantially no nucleic acid cleavage activity. In some embodiments, the nuclease domain is enzymatically inactive. In some embodiments, the meganuclease can bind to DNA but cannot cleave DNA.

[0075] B. Non-Specific End-Processing Enzymes The fusion proteins provided herein include non-specific end-processing enzymes. A non-specific end-processing enzyme is a polypeptide that modifies an end of a polynucleotide in a non-sequence-specific manner. In some embodiments, the non-specific end-processing enzyme is a non-specific exonuclease. As used herein, the term "exonuclease" refers to an enzyme that cleaves a polynucleotide from the 5'-end or 3'-end. A 5'→3' exonuclease cleaves a polynucleotide only in the 5' to 3' direction. A 3'→5' exonuclease cleaves a polynucleotide only in the 3' to 5' direction. A bidirectional exonuclease can cleave a polynucleotide in either direction. Suitable exonucleases are described, for example, in Lovett, 2011, ASM Journals EcoSal Plus 4(2):10.1128 / ecosalplus.4.4.7 and Shevelev and Huebscher, 2002, Nature Reviews Molecular Cell Biology 3:364-376.
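The directionality defined above can be illustrated with a minimal sketch. The function name `trim_strand`, its parameters, and the example strand are hypothetical and are not part of the disclosure; the sketch only models which end of a single strand (written 5' to 3') is shortened and does not attempt to model enzyme kinetics or processivity.

```python
def trim_strand(strand: str, n: int, direction: str) -> str:
    """Remove up to n nucleotides from one end of a strand written 5'->3'.

    direction "5to3" models a 5'->3' exonuclease (digestion starts at the 5' end).
    direction "3to5" models a 3'->5' exonuclease (digestion starts at the 3' end).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    n = min(n, len(strand))  # cannot remove more nucleotides than exist
    if direction == "5to3":
        return strand[n:]
    if direction == "3to5":
        return strand[: len(strand) - n]
    raise ValueError("direction must be '5to3' or '3to5'")


if __name__ == "__main__":
    example = "ACGTACGT"  # hypothetical strand, 5' end on the left
    print(trim_strand(example, 3, "5to3"))
    print(trim_strand(example, 3, "3to5"))
```

Under this toy model, a bidirectional exonuclease would simply correspond to applying both directions to the same strand.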

[0076] In some embodiments, the non-specific exonuclease is T5Exo, Trex2 (a non-processive 3'→5' exonuclease that functions as a homodimer), Escherichia coli (E. coli) exonuclease I, exonuclease III, exonuclease T, exonuclease IX, exonuclease X, RecJ, Pol II, Pol IIIε, WRN, MRE11, APE1, VDJP, RAD1, RAD9, p53, or Trex1.

[0077] In some embodiments, the non-specific end-processing enzyme comprises an amino acid sequence having at least 70% (e.g., at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identity to any one of SEQ ID NOs: 4, 5, 18, 19, 20, 22, or 58-74.
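As a rough illustration of how a percent-identity threshold such as "at least 70%" might be checked, the sketch below compares two sequences that have already been aligned to equal length, with "-" marking gaps. The function names, the choice of the alignment length as the denominator, and the toy sequences are assumptions made for illustration; this passage does not specify an alignment algorithm or identity convention, and in practice an established alignment tool would be used.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Both inputs must have the same length; '-' denotes a gap.
    Assumption: identity = matching columns / total alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment must not be empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if the alignment reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy aligned fragments, not taken from any SEQ ID NO.
    print(percent_identity("ACDE", "ACDF"))
    print(meets_identity_threshold("ACDE", "ACDF", 70.0))
```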

[0078] C. Dimerization Dimerization refers to the situation where two non-covalently linked protein domains function together as a single unit (i.e., a "dimer"). Non-limiting examples of dimers include homodimers, heterodimers, oligomerization / polymerization, autonomous dimerization, and inducible dimerization. See, for example, U.S. Patent Application Publication No. 2016 / 0024485; U.S. Patent Application Publication No. 2020 / 0199254; International Publication No. 1999 / 010510; International Publication No. 2022 / 040909; and U.S. Patent Application Publication No. 2018 / 163195. In some embodiments, the fusion proteins provided herein form dimers (e.g., homodimers). In some embodiments, this is through protein-protein interactions of non-specific end-processing enzymes. In some embodiments, the non-specific end-processing enzyme has the ability to dimerize. In some examples, the non-specific end-processing enzyme is a monomer of a protein that dimerizes. In some examples, the non-specific end-processing enzyme contains a dimerization domain. In some embodiments, a fusion protein containing a non-specific end-processing enzyme capable of dimerizing will form a fusion protein dimer (or a complex with more than two monomers) through dimerization of the non-specific end-processing enzyme.

[0079] In some embodiments, the non-specific end-processing enzyme is capable of dimerizing in its endogenous form. As an example, Trex2 is capable of dimerizing and is known to function as a homodimer. In some embodiments, the non-specific end-processing enzyme contains a domain that contributes to dimerization. In some embodiments, the non-specific end-processing enzyme is capable of autonomously dimerizing.

[0080] In some embodiments, the non-specific end-processing enzyme is an engineered polypeptide that has acquired a dimerization function, for example, by addition of a dimerization domain. A wide variety of protein dimerization domains are known in the art, including, for example, antibody Fc domains and commercially available dimerization systems (e.g., the iDimerize® system, Takara Bio USA). In some embodiments, dimerization is achieved by using any of the polypeptide interaction strategies described in Section III.D below. In some examples, the dimerization domain may be located at the N'- or C'-terminal end of the non-specific end-processing enzyme.

[0081] D. Further Fusion Protein Domains In some embodiments, the fusion proteins provided herein include one or more linkers. A linker as used herein, also referred to as a spacer, is a flexible molecule or a series of flexible molecules that joins or attaches two portions (e.g., domains) of a fusion protein or modified protein as provided herein. In some embodiments, the linker is a polypeptide. A protein in which domains are joined by a polypeptide linker is referred to as a fusion protein. In some embodiments, the linker is a non-peptide linker. A protein in which domains are joined by a non-peptide linker is referred to as a modified protein. It will be understood that when fusion proteins are discussed throughout this disclosure, modified proteins are generally also contemplated where feasible.

[0082] The linker can increase the range of orientations that the domains of the fusion protein or modified protein can adopt. The linker may be optimized to produce the desired effect in the fusion protein or modified protein. Aspects of linker design and considerations are described, for example, in Chen, X. et al., Adv Drug Deliv Rev. 2013 Oct 15;65(10):1357-1369, and Klein, J. S. et al. 2014 Protein Eng Des Sel. 27(10):325-330. In some embodiments, the proteins provided herein include a peptide linker. In some embodiments, the proteins provided herein include a non-peptide linker. In some embodiments, the proteins provided herein include both a peptide linker and a non-peptide linker. The proteins provided herein may also include multiple linkers, including at least one peptide linker, at least one non-peptide linker, or at least one peptide linker and at least one non-peptide linker.

[0083] The linker may be short or long, flexible or rigid. See, for example, PCT / US2020 / 051383 (incorporated herein by reference in its entirety), and International Publication No. 2020 / 168102 (incorporated herein by reference in its entirety), and U.S. Patent Application Publication No. 2021 / 0017506 (incorporated herein by reference in its entirety).

[0084] In some embodiments, the length of the linker can affect one or more functions of the fusion protein. Selection of a linker to achieve the desired length is within the ability of one of ordinary skill in the art. In some embodiments, the peptide linker may be, for example, 5 to 100 amino acids in length or more (e.g., 5 aa, 10 aa, 15 aa, 20 aa, 25 aa, 30 aa, 35 aa, 40 aa, 45 aa, 50 aa, 55 aa, 60 aa, 65 aa, 70 aa, 75 aa, 80 aa, 85 aa, 90 aa, 95 aa, or 100 aa).

[0085] Linker sequences can have various conformations depending on their length, such as helices, β-strands, coils / bends, and turns, in terms of secondary structure. In some examples, the linker sequence may have an extended conformation and can function as an independent domain that does not interact with adjacent protein domains. The linker sequence may be flexible or rigid. Flexible linkers provide some degree of movement or interaction of polypeptide domains and are generally rich in small or polar amino acids such as Gly and Ser (e.g., at least 90%, at least 95%, at least 98%, at least 99%, or all of the amino acid residues of the linker are either Gly or Ser). Rigid linkers can be used to maintain a fixed distance between domains and help maintain their independent functions. The linker can be attached by an amide bond (e.g., a peptide bond) or other functional groups as further discussed below.
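The Gly/Ser composition criterion for flexible linkers described above can be checked with a short sketch. The function name `gly_ser_fraction` is hypothetical; the two example linkers are sequences that appear elsewhere in this disclosure, and the 90% cutoff is the lowest of the example thresholds given above.

```python
def gly_ser_fraction(linker: str) -> float:
    """Fraction of residues in a peptide linker that are Gly (G) or Ser (S)."""
    if not linker:
        raise ValueError("linker must not be empty")
    residues = linker.upper()
    return sum(1 for aa in residues if aa in "GS") / len(residues)


if __name__ == "__main__":
    for linker in ("GGGGSGGGGS", "GSAGSAAGSGEF"):
        fraction = gly_ser_fraction(linker)
        # 0.90 corresponds to the "at least 90%" example threshold.
        print(linker, round(fraction, 3), fraction >= 0.90)
```

A composition check of this kind says nothing about the actual conformation of a linker; it only screens sequences against the stated residue-content thresholds.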

[0086] In some embodiments, the peptide linkers described herein include an amino acid sequence having at least 90% sequence identity with SEQ ID NO: 7. In some embodiments, the linker includes one or more repeats (e.g., 2 repeats, 3 repeats, 4 repeats, 5 repeats, 6 repeats, or more) of GGGGS (SEQ ID NO: 125) and / or one or more repeats of GSSGSS (SEQ ID NO: 126). Further exemplary peptide linkers include, but are not limited to, SGSETPGTSESATPE (SEQ ID NO: 127), SGSETPGTSESATPES (SEQ ID NO: 128), (GGGGS)3 (SEQ ID NO: 129), (GGGGS)5 (SEQ ID NO: 130), (GGGGS)10 (SEQ ID NO: 131), GGGGGGGG (SEQ ID NO: 132), GSAGSAAGSGEF (SEQ ID NO: 133), A(EAAAK)3A (SEQ ID NO: 134), or A(EAAAK)10A (SEQ ID NO: 135). Further non-limiting exemplary linkers that can be used include those disclosed in PCT / US2020 / 051383, Chen et al., Adv. Drug. Deliv. Rev. 65(10):1357-1369 (2013), and Rosmalen et al., Biochemistry 2017, 56, 50, 6565-6574 (the entire contents of both of which are incorporated herein by reference).
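The repeat shorthand used for several of these linkers, e.g., (GGGGS)3 or A(EAAAK)3A, can be expanded into a full amino acid sequence with a small helper. The function name `expand_linker` and the regular expression are illustrative assumptions; the sketch handles only the simple "(UNIT)n" pattern seen in this paragraph.

```python
import re

# Matches "(UNIT)n", where UNIT is a run of one-letter amino acid codes
# and n is the repeat count.
_REPEAT = re.compile(r"\(([A-Z]+)\)(\d+)")


def expand_linker(notation: str) -> str:
    """Expand repeat shorthand such as '(GGGGS)3' into the full sequence."""
    return _REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), notation)


if __name__ == "__main__":
    for notation in ("(GGGGS)3", "(GGGGS)10", "A(EAAAK)3A"):
        sequence = expand_linker(notation)
        print(notation, sequence, len(sequence))
```

Expanding the shorthand makes it easy to confirm that a given linker falls within a desired length range, such as the 5 to 100 amino acid range mentioned for peptide linkers.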

[0087] In some embodiments, the non-peptide linker can include any of several known chemical linkers. Exemplary chemical linkers include one or more units of β-alanine, 4-aminobutyric acid (GABA), (2-aminoethoxy)acetic acid (AEA), 6-aminohexanoic acid (Ahx), PEG polymers, and trioxatridecan-succinamic acid (Ttds). In some embodiments, the non-peptide linker includes one or more units of polyethylene glycol (PEG), which is often used as a linker for conjugation of polypeptide domains due to its water solubility, lack of toxicity, low immunogenicity, and well-defined chain length. See, e.g., Ramirez-Paz, J., et al., PLoS One 13(7):e0197643 (2018). The number of PEG linking units may be selected based on the desired length of the linker.

[0088] Modified proteins containing non-peptide linkers can be made in a variety of ways. For example, the site-specific nuclease and the non-specific end-processing enzyme can be made separately (e.g., in vitro or by expression in and purification from a host cell) and chemically linked in vitro. In some embodiments, the site-specific nuclease, the non-specific end-processing enzyme, and the linker can each be made separately in vitro and chemically linked. A variety of chemical linkers can be used to crosslink two amino acid residues.

[0089] Also contemplated in this specification are embodiments in which the site-specific nuclease and the non-specific end-processing enzyme as described above are used separately (e.g., separately introduced into cells or separately applied to the target nucleic acid) and form a complex without the use of a linker as described above. Various methods of forming a complex between two or more polypeptides are known in the art and include, but are not limited to, protein-protein interaction strategies (e.g., SunTag, coiled coil, etc.), RNA aptamers and related binding proteins (e.g., MS2, N22, etc.), and tag and catcher strategies. For example, the site-specific nuclease of the present disclosure may include an MS2 RNA aptamer, which is thought to facilitate interaction with a non-specific end-processing enzyme containing the MS2 coat protein.

[0090] In some embodiments, the fusion proteins provided herein include targeting sequences that mediate the localization (or retention) of the protein to an intracellular location, such as the plasma membrane or the membrane of a given organelle, nucleus, cytosol, mitochondria, endoplasmic reticulum (ER), Golgi, chloroplast, apoplast, peroxisome, or other organelle. For example, the targeting sequence can utilize a nuclear localization signal (NLS) to direct a protein (e.g., a nuclease) to the nucleus; utilize a nuclear export signal (NES) to direct it outside of the cell nucleus, e.g., to the cytoplasm; utilize a mitochondrial targeting signal to direct it to the mitochondria; utilize an ER retention signal to direct it to the endoplasmic reticulum (ER); utilize a peroxisome targeting signal to direct it to the peroxisome; utilize a membrane localization signal to direct it to the plasma membrane; or combinations thereof. In some embodiments, the fusion protein includes a nuclear localization signal. Non-limiting examples of NLSs include NLS sequences derived from the following: the NLS of SV40 virus large T antigen having the amino acid sequence PKKKRKV (SEQ ID NO: 8); the NLS from nucleoplasmin (e.g., the bipartite NLS of nucleoplasmin of sequence KRPAATKKAGQAKKKK (SEQ ID NO: 136)); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO: 137) or RQRRNELKRSP (SEQ ID NO: 138); the hRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 139); the sequence of the IBB domain from importin α, RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 140); the sequences of VSRKRPRP (SEQ ID NO: 141) and PPKKARED (SEQ ID NO: 142) of the myogenic T protein; the sequence of human p53, PQPKKKPL (SEQ ID NO: 143); the sequence of mouse c-abl IV, SALIKKKKKMAP (SEQ ID NO: 144); the sequences of DRLRR (SEQ ID NO: 145) and PKQKKRK (SEQ ID NO: 146) of influenza virus NS1; the sequence of hepatitis virus δ antigen, RKLKKKIKKL (SEQ ID NO: 147); the sequence of mouse Mx1 protein, REKKKFLKRR (SEQ ID NO: 148); the sequence of human poly(ADP-ribose) polymerase, KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 149); the sequence of the human glucocorticoid steroid hormone receptor, RKCLQAGMNLEARKTKK (SEQ ID NO: 150); and the sequence of the Agrobacterium VirD2 protein, KRPRDRHDGELGGRKRAR (SEQ ID NO: 151).
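To show how NLS sequences such as those listed above might be located within a protein sequence, the sketch below performs an exact-match scan for three of the listed motifs. The dictionary `KNOWN_NLS`, the function `find_nls`, and the test sequence are hypothetical; exact string matching is a simplification that would miss variant or degenerate signals.

```python
# Three of the NLS sequences listed above; the labels are for display only.
KNOWN_NLS = {
    "SV40 large T antigen": "PKKKRKV",
    "nucleoplasmin (bipartite)": "KRPAATKKAGQAKKKK",
    "c-myc": "PAAKRVKLD",
}


def find_nls(protein: str) -> list[tuple[str, int]]:
    """Return (label, 0-based start index) for every exact NLS match, sorted by position."""
    sequence = protein.upper()
    hits = []
    for label, motif in KNOWN_NLS.items():
        start = sequence.find(motif)
        while start != -1:
            hits.append((label, start))
            start = sequence.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Hypothetical construct: Met, SV40 NLS, short spacer, c-myc NLS.
    print(find_nls("MPKKKRKVGGSPAAKRVKLD"))
```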

[0091] In some embodiments, the fusion proteins provided herein include an amino acid sequence having at least 70% (e.g., at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identity to any one of SEQ ID NOs: 50 - 57.

[0092] Any of the polypeptides and fusion proteins described herein can further include a detectable moiety, such as a fluorescent protein or a fragment thereof. Examples of fluorescent proteins include, but are not limited to, yellow fluorescent protein (YFP, e.g., Venus), green fluorescent protein (GFP), and red fluorescent protein (RFP), and derivatives of these proteins, such as mutagenized derivatives. See, for example, Chudakov et al., “Fluorescent Proteins and Their Applications in Imaging Living Cells and Tissues,” Physiological Reviews 90(3):1103-1163 (2010); and Specht et al., “A Critical and Comparative Review of Fluorescent Tools for Live-Cell Imaging,” Annual Review of Physiology 79:93-117 (2017).

[0093] Any of the polypeptides described herein can further include an affinity tag, such as, by way of example, a polyhistidine tag (e.g., (His)6 (SEQ ID NO: 152)), an HA tag (e.g., YPYDVPDYA (SEQ ID NO: 153)), an albumin-binding protein, alkaline phosphatase, an AU1 epitope, an AU5 epitope, a biotin carboxyl carrier protein (BCCP), a FLAG epitope (e.g., DYKDDDDK (SEQ ID NO: 154)), or a MYC epitope (e.g., EQKLISEEDL (SEQ ID NO: 155)). See Kimple et al., “Overview of Affinity Tags for Protein Purification,” Curr. Protoc. Protein Sci. 73:Unit-9.9 (2013).

[0094] E. Mutants Also provided herein are variants of the disclosed polypeptides. Polypeptide variants retain their respective biological activities unless otherwise expressly noted. For example, variants of a site-specific nuclease polypeptide retain the biological function of the full-length native sequence site-specific nuclease. In another example, variants of a non-specific end-processing enzyme retain the biological function of the full-length native sequence non-specific end-processing enzyme.

[0095] Modifications to any of the polypeptides or proteins provided herein are made by known methods. By way of example, a modification is made by site-directed mutagenesis of nucleotides in the nucleic acid encoding the polypeptide, thereby creating DNA encoding the modification, and then expressing the DNA in a recombinant cell culture to produce the encoded polypeptide. Techniques for making substitution mutations at predetermined sites in DNA having a known sequence are well known. For example, one or more substitution mutations can be made using M13 primer mutagenesis and PCR-based mutagenesis methods. Any of the nucleic acid sequences provided herein can be codon-optimized such that expression in a host cell or organism is altered, for example, maximized.

[0096] The amino acids in the polypeptides described herein may be any of the 20 naturally occurring amino acids, D-stereoisomers of naturally occurring amino acids, unnatural amino acids, and chemically modified amino acids. Unnatural amino acids (i.e., those not found in proteins in nature) are also known in the art, for example, as shown in Zhang et al., “Protein engineering with unnatural amino acids,” Curr. Opin. Struct. Biol. 23(4):581-587 (2013); Xie et al., “Adding amino acids to the genetic repertoire,” 9(6):548-54 (2005); and all the references cited therein. β and γ amino acids are known in the art and are also contemplated herein as unnatural amino acids.

[0097] As used herein, a chemically modified amino acid refers to an amino acid whose side chain has been chemically modified. For example, the side chain can be modified to include a signaling moiety, such as a fluorophore or a radiolabel. The side chain can also be modified to include a new functional group, such as a thiol, carboxylic acid, or amino group. Post-translationally modified amino acids are also included in the definition of chemically modified amino acids.

[0098] Conservative amino acid substitutions are also contemplated. By way of example, conservative amino acid substitutions can be made at one or more amino acid residues, for example, at one or more lysine residues of any of the polypeptides provided herein. One of ordinary skill in the art will know that a conservative substitution is the replacement of one amino acid residue with another that is biologically and / or chemically similar. The following eight groups each contain amino acids that are conservative substitutions for one another: 1) Alanine (A), Glycine (G); 2) Aspartic acid (D), Glutamic acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W); 7) Serine (S), Threonine (T); and 8) Cysteine (C), Methionine (M).
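The eight groups listed above can be encoded directly to test whether a given substitution counts as conservative under this grouping. The names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical; note that methionine belongs to two groups (5 and 8), so it is treated as a conservative substitute for members of either group.

```python
# The eight groups as listed above, using one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    set("AG"),    # 1) Alanine, Glycine
    set("DE"),    # 2) Aspartic acid, Glutamic acid
    set("NQ"),    # 3) Asparagine, Glutamine
    set("RK"),    # 4) Arginine, Lysine
    set("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
    set("ST"),    # 7) Serine, Threonine
    set("CM"),    # 8) Cysteine, Methionine
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if the two residues differ and share at least one of the eight groups."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # no substitution took place
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    print(is_conservative("S", "T"))  # serine -> threonine
    print(is_conservative("K", "N"))  # lysine -> asparagine
```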

[0099] As an example, when a substitution of arginine with serine is mentioned, conservative substitutions for the serine (e.g., threonine) are also contemplated. Non-conservative substitutions, such as substituting lysine with asparagine, are also contemplated.

[0100] IV. Recombinant Nucleic Acids, Constructs, Vectors, and Host Cells Also provided herein are recombinant nucleic acids encoding any of the polypeptides described herein. For example, recombinant nucleic acids encoding polypeptides having at least 70% (e.g., at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identity to any of SEQ ID NOs: 4-6, 18-20, 22, or 50-74 are also provided. Also provided are recombinant nucleic acids having at least 70% identity to any of SEQ ID NOs: 21, 32, or 33.

[0101] Also provided are DNA constructs comprising a promoter operably linked to a recombinant nucleic acid encoding a fusion protein or domain thereof as described herein. A nucleic acid is "operably linked" when it is placed into a functional relationship with another nucleic acid sequence. A number of promoters can be used in the constructs described herein. A promoter is a region or sequence located upstream and / or downstream of the start of transcription that is involved in the recognition and binding of RNA polymerase and other proteins for initiating transcription.

[0102] As used herein, the term "promoter" refers to a nucleotide sequence that controls the expression of a coding sequence by providing recognition by RNA polymerase and other factors necessary for proper transcription, and is usually upstream (5' side) of the coding sequence. A "promoter regulatory sequence" consists of proximal and more distal upstream elements. Promoter regulatory sequences affect the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences include enhancers, promoters, untranslated leader sequences, introns, and polyadenylation signal sequences. These include natural and synthetic sequences, and sequences that can be combinations of natural and synthetic sequences. An "enhancer" is a DNA sequence that can stimulate promoter activity and can be an element intrinsic to the promoter or a heterologous element inserted to increase the level or tissue specificity of the promoter. It has the ability to operate in both orientations (normal or inverted) and to function when moving either upstream or downstream from the promoter. The meaning of the term "promoter" includes "promoter regulatory sequences".

[0103] The choice of which promoter to include depends on several factors including, but not limited to, efficiency, selectability, inducibility, desired expression level, and cell- or tissue-preferential expression. It is routine for those skilled in the art to regulate the expression of a sequence by appropriate selection and placement of a promoter and other regulatory regions for that sequence.

[0104] Certain promoters have been shown to be capable of directing RNA synthesis at a higher rate than others. These are called "strong promoters." Certain other promoters have been shown to direct RNA synthesis more highly only in specific types of cells or tissues, and when a promoter preferentially directs RNA synthesis to a particular tissue (where RNA synthesis can occur at low levels in other tissues), it is often referred to as a "tissue-specific promoter," or a "tissue-preferred promoter." Since the expression pattern of a chimeric gene (or genes) introduced into a plant is controlled using a promoter, there continues to be interest in the isolation of novel promoters capable of controlling the expression of a chimeric gene (or genes) at a certain level in specific tissue types or at specific plant developmental stages.

[0105] Certain types of promoters are capable of directing RNA synthesis at relatively similar levels in all tissues of a plant. These are called "constitutive promoters" or "tissue-independent" promoters. Constitutive promoters can be divided into strong, intermediate, and weak categories based on their effectiveness in directing RNA synthesis. In many cases, constitutive promoters are particularly useful in that it is often necessary to simultaneously express a chimeric gene (or genes) in different plant tissues to obtain the desired function of the gene (or genes). Many constitutive promoters have been discovered and characterized from plants and plant viruses, but there continues to be interest in the isolation of more novel synthetic or natural constitutive promoters capable of controlling the expression of a chimeric gene (or genes) at different levels and expression in multiple genes in the same transgenic plant for gene stacking.

[0106] Among the most commonly used promoters are, in particular, the nopaline synthase (NOS) promoter (Ebert et al., Proc. Natl. Acad. Sci. USA 84:5745-5749 (1987)); the octopine synthase (OCS) promoter; caulimovirus promoters such as the cauliflower mosaic virus (CaMV) 19S promoter (Lawton et al., Plant Mol. Biol. 9:315-324 (1987)); the light-inducible promoter from the small subunit of ribulose bisphosphate carboxylase (Pellegrineschi et al., Biochem. Soc. Trans. 23(2):247-250 (1995)); the Adh promoter (Walker et al., Proc. Natl. Acad. Sci. USA 84:6624-6628 (1987)); the sucrose synthase promoter (Yang et al., Proc. Natl. Acad. Sci. USA 87:4144-4148 (1990)); the R gene complex promoter (Chandler et al., Plant Cell 1:1175-1183 (1989)); the chlorophyll a / b binding protein gene promoter; and the like.

[0107] Furthermore, it is contemplated that promoters that combine elements from two or more promoters may be useful. For example, U.S. Patent No. 5,491,288 discloses combining the cauliflower mosaic virus promoter with a histone promoter. Thus, elements from the promoters disclosed herein may be combined with elements from other promoters. Promoters useful for plant transgene expression include inducible, viral, synthetic, constitutive (Odell et al., Nature 313:810-812 (1985)), temporally regulated, spatially regulated, tissue-specific, and spatiotemporally regulated promoters. Using the regulatory elements described herein, numerous agronomic genes can be expressed in transgenic plants. More specifically, plants can be genetically engineered to express a variety of phenotypes of agronomic interest.

[0108] In some embodiments of the DNA constructs provided herein, the promoter may be a eukaryotic or prokaryotic promoter. In some embodiments, the promoter is an inducible promoter, a native inducible promoter (e.g., drought-inducible Rab17), a synthetic inducible promoter (e.g., auxin-inducible DR5, estradiol-inducible XVE / pLex, dexamethasone-inducible GVG / Gal4), a constitutive promoter (e.g., ZmUbq1, OsAct1, OsTub3, EF), an egg cell-specific promoter (e.g., EC1, EC2, EC3, EC4, EC5), a pollen-specific promoter, a shoot apical meristem-specific promoter, or a promoter with enhanced expression in the zygote. In some embodiments, the promoter is a floral mosaic promoter (e.g., ZmBde1, OsAP1). In some embodiments, the promoter is a ubiquitin 4 promoter, an actin promoter, a tubulin promoter, a MADS box promoter, or a plant virus promoter. Suitable promoters are disclosed, for example, in U.S. Patent No. 10,519,456 (the entire content of which is incorporated herein by reference) and PCT / US2022 / 020690 (incorporated herein by reference).

[0109] The recombinant nucleic acids provided herein can be included in an expression cassette for expression in a host cell or organism of interest. The cassette will include 5' and 3' regulatory sequences operably linked to the recombinant nucleic acids provided herein, which will enable the expression of the fusion protein. The cassette can additionally contain at least one additional gene or gene element that will be co-transformed into the cell or organism. When additional genes or elements are included, the components are operably linked. Alternatively, the additional genes or elements can be provided on multiple expression cassettes. Such expression cassettes comprise multiple restriction sites and / or recombination sites for the insertion of polynucleotides under the transcriptional regulation of the regulatory region. The expression cassette can additionally contain a selectable marker gene. The expression cassette will include, in the 5' to 3' transcriptional direction, a transcriptional and translational start region (i.e., a promoter) that is functional in the cell or organism of interest, a polynucleotide of the invention, and a transcriptional and translational termination region (i.e., a termination region). The promoter of the invention has the ability to direct or drive the expression of a coding sequence (i.e., a nucleic acid sequence that is transcribed into an RNA such as mRNA, rRNA, tRNA, snRNA, ncRNA, lncRNA, sense RNA, or antisense RNA, whether or not that RNA is subsequently translated to produce a protein) in a host cell. The regulatory regions (i.e., the promoter, transcriptional regulatory region, and translational termination region) can be endogenous or heterologous to the host cell and / or to each other. 
As used herein, "heterologous" with respect to a sequence means a sequence that is derived from a foreign species or, if derived from the same species, has been substantially modified from its natural form in composition and / or genomic locus by intentional human intervention.
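The 5' to 3' ordering described above (promoter, polynucleotide of interest, termination region) can be illustrated with a small data-structure sketch. This is only an illustration under stated assumptions: the `Element` class, the `kind` labels, and the `is_valid_cassette` function are hypothetical names that do not appear in the disclosure, and the check covers only the minimal promoter / coding sequence / terminator arrangement, not markers or additional regulatory elements.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Element:
    """One annotated part of a cassette (hypothetical model, for illustration only)."""
    name: str
    kind: str  # assumed labels: "promoter", "coding", or "terminator"


def is_valid_cassette(elements: List[Element]) -> bool:
    """Return True if the elements read promoter -> coding sequence(s) -> terminator.

    The list is interpreted in the 5' to 3' transcriptional direction.
    """
    kinds = [e.kind for e in elements]
    if len(kinds) < 3:
        return False
    return (
        kinds[0] == "promoter"
        and kinds[-1] == "terminator"
        and all(k == "coding" for k in kinds[1:-1])
    )


# Example with placeholder element names.
example = [
    Element("ZmUbq1 promoter", "promoter"),
    Element("fusion protein coding sequence", "coding"),
    Element("termination region", "terminator"),
]
print(is_valid_cassette(example))
```

A selectable marker or additional gene would, in this simplified model, be represented as its own cassette with its own promoter and termination region.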

[0110] Additional regulatory signals include, but are not limited to, the start site of transcription initiation, operator, activator, enhancer, other regulatory elements, ribosome binding site, start codon, termination signal, etc. See Sambrook et al. (1992) Molecular Cloning: A Laboratory Manual, ed. Maniatis et al. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.); Davis et al., eds. (1980) Advanced Bacterial Genetics (Cold Spring Harbor Laboratory Press), Cold Spring Harbor, N.Y., and the references cited therein.

[0111] The expression cassette can also contain a selectable marker gene for selecting transformed cells. Examples of marker genes include genes conferring antibiotic resistance such as those conferring hygromycin resistance, ampicillin resistance, gentamicin resistance, neomycin resistance, etc. Further selectable markers are known and any one can be used.

[0112] When preparing the expression cassette, various DNA fragments can be manipulated so as to provide the DNA sequences in the appropriate orientation and, if necessary, in the appropriate reading frame. For this purpose, DNA fragments can be ligated using adapters or linkers, or other manipulations can be involved to provide convenient restriction sites, removal of extra DNA, removal of restriction sites, etc. For this purpose, in vitro mutagenesis, primer repair, restriction, annealing, and substitutions, such as transitions and transversions, can be involved.

[0113] When preparing an expression cassette, various DNA fragments can be manipulated so that the DNA sequences are provided in the appropriate orientation and, if necessary, in the appropriate reading frame. For this purpose, DNA fragments can be ligated using adapters or linkers, or other manipulations may be involved to provide convenient restriction sites, removal of extra DNA, removal of restriction sites, etc. For this purpose, in vitro mutagenesis, primer repair, restriction, annealing, and substitutions, for example, transitions and transversions, may be used.

[0114] Furthermore, vectors are provided that contain the recombinant nucleic acids or DNA constructs shown herein. The vectors are intended to have the functional elements necessary to direct and regulate the transcription of the inserted nucleic acid. Such functional elements include, but are not limited to, promoters, regions upstream or downstream of the promoter, for example, enhancers that can regulate the transcriptional activity of the promoter, origins of replication, restriction sites appropriate for facilitating the cloning of the insert adjacent to the promoter, antibiotic resistance genes or other markers that can be useful for the selection of cells containing the vector or vectors containing the insert, RNA splice junctions, transcription termination regions, or any other region that can be useful for promoting the expression of the inserted gene or hybrid gene. See generally Sambrook et al., Molecular Cloning: A Laboratory Manual, 4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 2012. The vector may be, for example, a plasmid.

[0115] There are numerous Escherichia coli (E. coli) expression vectors known to those skilled in the art that are useful for the expression of nucleic acids. Other microbial hosts suitable for use include bacilli such as Bacillus subtilis, and other Enterobacteriaceae bacteria such as those of the genus Salmonella and the genus Serratia, and various Pseudomonas species. In these prokaryotic hosts, expression vectors can also be made, which will typically contain expression control sequences (e.g., origin of replication) compatible with the host cell. In addition, there are numerous well-known promoters, such as the lactose promoter system, the tryptophan (Trp) promoter system, the β-lactamase promoter system, or the promoter system from λ phage. In addition, yeast expression can be used. Nucleic acids encoding the polypeptides of the present invention are provided herein, where the nucleic acids can be expressed by yeast cells. More specifically, the nucleic acids can be expressed by Pichia pastoris or S. cerevisiae.

[0116] Mammalian cells also enable the expression of proteins in an environment that favors important post-translational modifications, such as folding and cysteine pair formation, addition of complex carbohydrate structures, and secretion of active proteins. Vectors useful for the expression of active proteins in mammalian cells are known in the art and can include genes conferring hygromycin resistance, geneticin or G418 resistance, or other genes or phenotypes suitable for use as selectable markers, or genes conferring methotrexate resistance for gene amplification. In the art, several suitable host cell lines with the ability to secrete intact human proteins have been developed, including CHO cells, HeLa cells, HEK-293 cells, HEK-293T cells, U2OS cells, or any other primary or transformed cell line. Other suitable host cell lines include COS-7 cells, myeloma cell lines, Jurkat cells, and the like. Expression vectors for these cells can include expression control sequences, such as an origin of replication, a promoter, an enhancer, and necessary information processing sites such as a ribosome binding site, an RNA splice site, a polyadenylation site, and a transcription terminator sequence. Preferred expression control sequences are promoters derived from immunoglobulin genes, SV40, adenovirus, bovine papillomavirus, and the like.

[0117] Also included as expression vectors described herein are nucleic acids as described herein under the control of inducible promoters, such as tetracycline-inducible promoters or glucocorticoid-inducible promoters. The nucleic acids of the present invention may also be under the control of tissue-specific promoters that promote the expression of the nucleic acids in specific cells, tissues, or organs. Also contemplated are any regulatable promoters well known in the art, such as the metallothionein promoter, the heat shock promoter, and other regulatable promoters. Furthermore, Cre-loxP inducible systems, as well as Flp recombinase inducible promoter systems, can be used, both of which are known in the art.

[0118] Insect cells can also enable the expression of polypeptides. Recombinant proteins produced in insect cells with baculovirus vectors undergo post-translational modifications similar to wild-type mammalian proteins.

[0119] Also provided herein are host cells comprising the recombinant nucleic acids, DNA constructs, and / or vectors described herein, as well as methods of making such cells. In some embodiments, the cells are plant cells. In some embodiments, the plant cells are corn plant cells, soybean plant cells, rice plant cells, wheat plant cells, or sunflower plant cells.

[0120] Host cells comprising the nucleic acids or vectors described herein are provided. The host cells may be in vitro, ex vivo, or in vivo host cells. The host cells as provided herein have the ability to express a fusion protein. Also provided is a cell population of any of the host cells described herein. In some embodiments, the cell population comprises a plurality of cells, where the plurality of cells comprises a recombinant nucleic acid encoding a fusion protein as described herein. In some embodiments, the cell population comprises a plurality of cells, where the plurality of cells comprises a DNA construct encoding a fusion protein as described herein. In some embodiments, the cell population comprises a plurality of cells, where the plurality of cells comprises a vector comprising a recombinant nucleic acid or DNA construct encoding a fusion protein as described herein. In some embodiments, the cell population comprises a plurality of cells, where the plurality of cells comprises a plurality of any of the host cells described herein. In some embodiments, the plurality of cells of any of the cell populations described herein express a fusion protein as described herein.

[0121] In some embodiments, the provided cells stably or transiently express a fusion protein. Stable expression of a fusion protein in a cell refers to the integration of any of the nucleic acids, DNA constructs, or vectors described herein into the genome of the cell, thereby enabling the cell to express the fusion protein. Transient expression refers to the direct expression of a fusion protein from any of a nucleic acid, DNA construct, and / or vector after introduction into the cell (i.e., the gene encoding the fusion protein is not integrated into the genome of the cell).

[0122] In some embodiments, the provided cells constitutively or inducibly express a fusion protein. Constitutive expression refers to the ongoing continuous expression of a gene (i.e., a protein), while inducible expression refers to gene (protein) expression in response to a stimulus. Inducible expression is generally regulated by an inducible promoter, and an explanation thereof is included above.

[0123] Also provided is a cell culture comprising one or more host cells described herein. In the art, many methods for culturing and producing cells are available, including cells of bacterial origin (e.g., E. coli and other bacterial strains), animal origin (particularly mammalian origin), and archaebacterial origin. See, for example, Sambrook, supra; Ausubel, ed. (1995) Current Protocols in Molecular Biology, John Wiley & Sons; Freshney (1994) Culture of Animal Cells, a Manual of Basic Technique, 3rd Ed., Wiley-Liss, New York, and the references cited therein; Doyle and Griffiths (1997) Mammalian Cell Culture: Essential Techniques, John Wiley and Sons, NY; Humason (1979) Animal Tissue Techniques, 4th Ed., W.H. Freeman and Company; see also Ricciardelli et al. (1989) In vitro Cell Dev. Biol. 25:1016-1024.

[0124] The host cell may be a prokaryotic cell, including, for example, a bacterial cell. Alternatively, the cell may be a eukaryotic cell, such as, for example, a mammalian cell. In some embodiments, the cell may be a HEK-293T cell, a HEK-293 cell, a Chinese hamster ovary (CHO) cell, a U2OS cell, or any other primary or transformed cell. In some embodiments, the cell may be a COS-7 cell, a HeLa cell, an avian cell, a myeloma cell, a Pichia cell, an insect cell, or a plant cell. Numerous other suitable host cell lines have been developed, including various tumor cell lines such as myeloma cell lines, fibroblast cell lines, and melanoma cell lines. The vector containing the nucleic acid segment of interest can be introduced or transfected into the host cell by well-known methods depending on the type of cell host.

[0125] As used herein, the term "introducing," in the context of introducing a nucleic acid into a cell (e.g., a prokaryotic cell, a bacterial cell, a eukaryotic cell, a plant cell), refers to changing the position of a nucleic acid sequence from outside the cell to inside the cell. In some cases, introducing refers to changing the position of a nucleic acid from outside the cell to inside the nucleus of the cell. Where two or more nucleic acid molecules are to be introduced, they can be assembled as part of a single polynucleotide or nucleic acid construct, or as separate polynucleotides or nucleic acid constructs, and can be located in the same or different nucleic acid constructs. Thus, such polynucleotides can be introduced into a cell (e.g., a plant cell) in a single transformation event, in separate transformation events, or, for example, as part of a breeding protocol. A variety of methods for introducing nucleic acids into cells are contemplated, including, but not limited to, electroporation, nanoparticle delivery, biolistic transformation, viral delivery, contact with nanowires or nanotubes, receptor-mediated internalization, translocation by cell-penetrating peptides, liposome-mediated translocation, DEAE dextran, lipofectamine, calcium phosphate, or any method currently known or later identified for introducing nucleic acids into prokaryotic or eukaryotic cell hosts. Targeted nuclease systems (e.g., RNA-guided nucleases, transcription activator-like effector nucleases (TALENs), zinc finger nucleases (ZFNs), or megaTALs (MTs)) can also be used to introduce nucleic acids, e.g., nucleic acids encoding the fusion proteins described herein, into host cells. See Li et al. Signal Transduction and Targeted Therapy 5, Article No. 1 (2020).

[0126] Cell transformation can be either stable or transient. Thus, the transgenic cells, plant cells, plants and / or plant parts of the present invention can be stably transformed or transiently transformed. Transformation can refer to the transfer of a nucleic acid molecule into the genome of a host cell, resulting in genetically stable inheritance. In some embodiments, introduction into plants, plant parts and / or plant cells is via bacterial-mediated transformation, particle bombardment transformation, calcium phosphate-mediated transformation, cyclodextrin-mediated transformation, electroporation, liposome-mediated transformation, nanoparticle-mediated transformation, polymer-mediated transformation, virus-mediated nucleic acid delivery, whisker-mediated nucleic acid delivery, microinjection, sonication, infiltration, polyethylene glycol-mediated transformation, protoplast transformation, or any other electrical, chemical, physical and / or biological mechanism that results in the introduction of nucleic acids into plants, plant parts and / or their cells, or any combination thereof.

[0127] Plant transformation procedures are well-known and routinely performed in the art and are described throughout this document. Non-limiting examples of methods for transforming plants include transformation by bacterial-mediated nucleic acid delivery (e.g., by bacteria from the genus Agrobacterium), virus-mediated nucleic acid delivery, silicon carbide or nucleic acid whisker-mediated nucleic acid delivery, liposome-mediated nucleic acid delivery, microinjection, particle bombardment, calcium phosphate-mediated transformation, cyclodextrin-mediated transformation, electroporation, nanoparticle-mediated transformation, sonication, infiltration, PEG-mediated nucleic acid uptake, and any other electrical, chemical, physical (mechanical), and / or biological mechanisms that result in the introduction of nucleic acids into plant cells, including any combination of these. General manuals regarding various plant transformation methods known in the art include Miki et al. ("Procedures for Introducing Foreign DNA into Plants" in Methods in Plant Molecular Biology and Biotechnology, Glick, B.R. and Thompson, J.E., Eds. (CRC Press, Inc., Boca Raton, 1993), pages 67-88) and Rakoczy-Trojanowska (Cell Mol Biol Lett 7:849-858 (2002)).

[0128] Agrobacterium-mediated transformation is a commonly used method for plant transformation due to its high transformation efficiency and broad utility in numerous different species. Agrobacterium-mediated transformation typically involves introducing a binary vector carrying the foreign DNA of interest into a suitable Agrobacterium strain that may rely on a complement of vir genes carried by the host Agrobacterium strain either on a co-resident Ti plasmid or chromosomally (Uknes et al. 1993, Plant Cell 5:159-169). Transfer of the recombinant binary vector into Agrobacterium can be achieved by a triparental mating procedure using Escherichia coli carrying the recombinant binary vector and a helper E. coli strain carrying a plasmid capable of mobilizing the recombinant binary vector into the target Agrobacterium strain. Alternatively, the recombinant binary vector can be introduced into Agrobacterium by nucleic acid transformation (Hoefgen and Willmitzer 1988, Nucleic Acids Res 16:9877).

[0129] Transformation of plants by recombinant Agrobacterium usually involves co-cultivation of Agrobacterium with explants from the plant and follows methods well known in the art. The transformed tissue carries an antibiotic or herbicide resistance marker between the binary plasmid T-DNA borders and is typically regenerated on selective media.

[0130] Another method for transforming plants, plant parts, and plant cells involves firing inert or biologically active particles at plant tissues and cells. See, for example, U.S. Patent Nos. 4,945,050; 5,036,006; and 5,100,792. Generally, this method involves firing inert or biologically active particles at plant cells under conditions effective to penetrate the outer surface of the cell and effect uptake into its interior. When inert particles are utilized, the vector containing the nucleic acid of interest can be introduced into the cell by coating the particles with the vector. Alternatively, by surrounding one or more cells with the vector, the vector may subsequently be carried into the cell following the particle. Biologically active particles (e.g., dried yeast cells, dried bacteria, or bacteriophage, each containing one or more nucleic acids to be introduced) can also be fired into plant tissue. As used herein, the phrase "biolistic transformation" refers to a method of directly introducing RNA or DNA into a cell (e.g., a plant cell), where the RNA or DNA is mixed with heavy metal particles (e.g., tungsten or gold) and released into the cell (e.g., a plant cell) using high velocity pressure, thereby enabling the RNA or DNA to penetrate the cell (e.g., penetrate the plant cell wall).

[0131] The CRISPR / Cas system can also be used to edit the genome of a host cell or organism. As detailed above, the "CRISPR / Cas" system refers to a broad class of bacterial systems for defense against foreign nucleic acids. Any of the CRISPR / Cas system components described herein can be used to introduce a fusion protein, recombinant nucleic acid, or system into the genome of a host cell or organism. CRISPR / Cas system-mediated genome editing methods are known in the art. It will be understood that the introduction of the fusion proteins, recombinant nucleic acids, or systems described herein using the CRISPR / Cas system into the genome of a host cell or organism is different from the detailed methods and systems provided herein.

[0132] Any of the fusion proteins described herein can be purified or isolated from a host cell or a population of host cells. For example, a recombinant nucleic acid encoding any of the fusion proteins described herein can be introduced into a host cell under conditions that permit expression of the fusion protein. In some embodiments, the recombinant nucleic acid is codon-optimized for expression. After expression in the host cell, the fusion protein can be isolated or purified using purification methods known in the art.

[0133] V. SYSTEM In another aspect, provided herein is a system useful for editing one or more nucleic acids. The system includes one or more of the fusion proteins (or recombinant nucleic acids, constructs, vectors, or host cells) described above. In some embodiments, the system further includes one or more additional elements useful for editing one or more nucleic acids. For example, the systems provided herein can further include a donor polynucleotide. As another example, a system that includes a fusion protein comprising a Cas nuclease can further include one or more guide nucleic acids and / or one or more donor polynucleotide sequences. Donor polynucleotides and guide nucleic acids are described in more detail below. The systems provided herein are useful for performing the methods described in Section VI of this disclosure.

[0134] A. Donor Polynucleotide The systems and methods of the present disclosure may include a donor polynucleotide. A "donor polynucleotide," "donor molecule," or "donor template" is a nucleotide polymer or oligomer intended for insertion into a target polynucleotide, typically a target genomic site. The donor sequence may be one or more transgenes, expression cassettes, or nucleotide sequences of interest. The donor molecule can be a single-stranded, partially double-stranded, or double-stranded donor DNA molecule. The donor polynucleotide can be a natural or modified polynucleotide, an RNA-DNA chimera, or a DNA fragment (either single-stranded or at least partially double-stranded, or completely double-stranded DNA molecule), or a PCR-amplified ssDNA or at least partially dsDNA fragment. In some embodiments, the donor DNA molecule is part of a circularized DNA molecule. Since dsDNA fragments are generally more resistant to nuclease degradation compared to ssDNA, in some examples, fully double-stranded donor DNA can provide increased stability.

[0135] The donor molecule may contain a stretch of at least 10 consecutive nucleotides (such stretches are often referred to as homology arms) that is at least 70% identical to the genomic nucleotide sequence, such that these consecutive nucleotides are sufficient, for example, for homologous recombination of the donor DNA molecule into the cell's genome at the target genomic DNA sequence after cleavage by a site-specific nuclease. In some embodiments, the donor DNA molecule can contain at least about 10, 20, 30, 50, 70, 80, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 7500, 10,000, 15,000, or 20,000 nucleotides (including any value within this range not explicitly listed herein), and the donor DNA molecule is at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% identical to the genomic nucleic acid sequence. In some embodiments, the donor DNA molecule can be substantially complementary to the genomic nucleic acid sequence. In some embodiments, the donor DNA molecule contains a heterologous nucleic acid sequence. In some embodiments, the donor DNA molecule contains at least one expression cassette. In some embodiments, the donor DNA molecule can contain a transgene containing at least one expression cassette. In some embodiments, the donor DNA molecule contains an allelic modification of a gene native to the target genome. The allelic modification can include at least one nucleotide insertion, at least one nucleotide deletion, and / or at least one nucleotide substitution. In some embodiments, the allelic modification can include small insertions or deletions. In some embodiments, the donor DNA molecule contains homology arms for the target genomic site.
In some embodiments, the donor DNA molecule contains at least 100 consecutive nucleotides that are at least 90% identical to the genomic nucleic acid sequence, and may optionally further contain a heterologous nucleic acid sequence such as a transgene.
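As a rough illustration of the length and identity criteria above, the following sketch checks whether a candidate homology arm is at least 100 nucleotides long and at least 90% identical to the corresponding genomic flank. It rests on simplifying assumptions that are not part of the disclosure: the two sequences are compared position by position without gaps, identity is computed over the full arm length, and the function names and default thresholds are hypothetical.

```python
def ungapped_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length sequences compared position by position."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def arm_qualifies(arm: str, genomic_flank: str,
                  min_length: int = 100, min_identity: float = 90.0) -> bool:
    """Return True if the arm meets both the length and the identity threshold.

    Assumes the arm and the genomic flank are already the same length and
    in register (no insertions or deletions), which is a simplification.
    """
    if len(arm) < min_length or len(arm) != len(genomic_flank):
        return False
    return ungapped_identity(arm, genomic_flank) >= min_identity
```

In practice, percent identity is usually derived from a gapped alignment, so a production check would align the arm to the genome first; the thresholds can be set to any of the values listed in the paragraph above.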

[0136] The donor polynucleotide can be any suitable nucleic acid. In some embodiments, the donor nucleic acid is a portion of a donor template. In some embodiments, the donor template is a portion of a plasmid or linear nucleic acid. In some embodiments, the donor nucleic acid is a portion of a chromosome.

[0137] In some embodiments, the donor polynucleotide comprises a nucleotide sequence having at least 70% (e.g., at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identity to SEQ ID NO: 75 or SEQ ID NO: 76.

[0138] B. Guide Nucleic Acid Optionally, the systems and methods described herein include at least one guide nucleic acid polynucleotide. Optionally, the systems and methods described herein include a plurality of guide nucleic acids. In some embodiments, the polynucleotide can be deoxyribonucleic acid (DNA). Optionally, the DNA sequence can be single-stranded or double-stranded. In some embodiments, at least one guide nucleic acid polynucleotide can be ribonucleic acid (guide RNA).

[0139] In some embodiments, the nuclease can complex with at least one guide RNA polynucleotide. The at least one guide RNA polynucleotide can include a nucleic acid targeting region that confers sequence specificity for nuclease targeting by including a sequence complementary to a nucleic acid sequence on a polynucleotide to be targeted, such as a genomic locus or gene to be targeted. In some embodiments, the at least one guide RNA polynucleotide can include two separate nucleic acid molecules, which can be referred to as a double-guide nucleic acid, or a single nucleic acid molecule, which can be referred to as a single-guide nucleic acid (e.g., single-guide RNA or sgRNA). In some embodiments, the guide nucleic acid is a single-guide nucleic acid that includes a fused CRISPR RNA (crRNA) and a trans-activating crRNA (tracrRNA). In some embodiments, the guide nucleic acid is a single-guide nucleic acid that includes crRNA. In some embodiments, the guide nucleic acid is a single-guide nucleic acid that includes crRNA but lacks tracrRNA. In some embodiments, the guide nucleic acid is a double-guide nucleic acid that includes an unfused crRNA and tracrRNA. Exemplary double-guide nucleic acids can include a crRNA-like molecule and a tracrRNA-like molecule. Exemplary single-guide nucleic acids can include a crRNA-like molecule. Exemplary single-guide nucleic acids can include a fused crRNA-like molecule and a tracrRNA-like molecule.

[0140] The crRNA can include a nucleic acid targeting segment of the guide nucleic acid (e.g., the spacer region) and a contiguous stretch of nucleotides that can form one half of the double-stranded duplex of the Cas protein-binding segment of the guide nucleic acid.

[0141] The tracrRNA can include a contiguous stretch of nucleotides that forms the other half of the double-stranded duplex of the Cas protein-binding segment of the guide nucleic acid. The contiguous stretch of nucleotides of the crRNA is complementary to the contiguous stretch of nucleotides of the tracrRNA and can hybridize therewith to form the double-stranded duplex of the Cas protein-binding segment of the guide nucleic acid.

[0142] The crRNA and the tracrRNA can hybridize to form a guide nucleic acid. The crRNA can also provide a single-stranded nucleic acid targeting segment (e.g., a spacer region) that hybridizes to a target nucleic acid recognition sequence (e.g., a protospacer). The sequence of the crRNA, including the spacer region, or the tracrRNA molecule can be designed to be species-specific for the species in which the guide nucleic acid is to be used.

[0143] Whether only the crRNA molecule or both the crRNA molecule and the tracrRNA molecule (whether covalently linked or not) are required for the nuclease depends on the CRISPR-associated nuclease used.

[0144] In some embodiments, the nucleic acid targeting region of the guide nucleic acid can be 18 to 72 nucleotides in length. The nucleic acid targeting region (e.g., spacer region) of the guide nucleic acid can have a length of about 12 nucleotides to about 100 nucleotides. For example, the nucleic acid targeting region (e.g., spacer region) of the guide nucleic acid can have a length of about 12 nucleotides (nt) to about 80 nt, about 12 nt to about 50 nt, about 12 nt to about 40 nt, about 12 nt to about 30 nt, about 12 nt to about 25 nt, about 12 nt to about 20 nt, about 12 nt to about 19 nt, about 12 nt to about 18 nt, about 12 nt to about 17 nt, about 12 nt to about 16 nt, or about 12 nt to about 15 nt. Alternatively, the nucleic acid targeting region can have a length of about 18 nt to about 20 nt, about 18 nt to about 25 nt, about 18 nt to about 30 nt, about 18 nt to about 35 nt, about 18 nt to about 40 nt, about 18 nt to about 45 nt, about 18 nt to about 50 nt, about 18 nt to about 60 nt, about 18 nt to about 70 nt, about 18 nt to about 80 nt, about 18 nt to about 90 nt, about 18 nt to about 100 nt, about 20 nt to about 25 nt, about 20 nt to about 30 nt, about 20 nt to about 35 nt, about 20 nt to about 40 nt, about 20 nt to about 45 nt, about 20 nt to about 50 nt, about 20 nt to about 60 nt, about 20 nt to about 70 nt, about 20 nt to about 80 nt, about 20 nt to about 90 nt, or about 20 nt to about 100 nt. The length of the nucleic acid targeting region can be at least 5, 10, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, or 30 nucleotides. The length of the nucleic acid targeting region (e.g., spacer sequence) can be at most 5, 10, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, or 30 nucleotides.

[0145] In some embodiments, the nucleic acid targeting region (e.g., spacer) of the guide nucleic acid is 20 nucleotides in length. In some embodiments, the nucleic acid targeting region of the guide nucleic acid is 19 nucleotides in length. In some embodiments, the nucleic acid targeting region of the guide nucleic acid is 18 nucleotides in length. In some embodiments, the nucleic acid targeting region of the guide nucleic acid is 17 nucleotides in length. In some embodiments, the nucleic acid targeting region of the guide nucleic acid is 16 nucleotides in length. In some embodiments, the nucleic acid targeting region of the guide nucleic acid is 21 nucleotides in length. In some embodiments, the nucleic acid targeting region of the guide nucleic acid is 22 nucleotides in length.

[0146] The nucleotide sequence of the guide nucleic acid complementary to the nucleotide sequence of the target nucleic acid (target sequence) can have a length of, for example, at least about 12 nt, at least about 15 nt, at least about 18 nt, at least about 19 nt, at least about 20 nt, at least about 25 nt, at least about 30 nt, at least about 35 nt or at least about 40 nt. The nucleotide sequence of the guide nucleic acid complementary to the nucleotide sequence of the target nucleic acid (target sequence) can have a length of about 12 nucleotides (nt) to about 80 nt, about 12 nt to about 50 nt, about 12 nt to about 45 nt, about 12 nt to about 40 nt, about 12 nt to about 35 nt, about 12 nt to about 30 nt, about 12 nt to about 25 nt, about 12 nt to about 20 nt, about 12 nt to about 19 nt, about 19 nt to about 20 nt, about 19 nt to about 25 nt, about 19 nt to about 30 nt, about 19 nt to about 35 nt, about 19 nt to about 40 nt, about 19 nt to about 45 nt, about 19 nt to about 50 nt, about 19 nt to about 60 nt, about 20 nt to about 25 nt, about 20 nt to about 30 nt, about 20 nt to about 35 nt, about 20 nt to about 40 nt, about 20 nt to about 45 nt, about 20 nt to about 50 nt, or about 20 nt to about 60 nt.

[0147] The protospacer sequence of the polynucleotide to be targeted can be identified by identifying a protospacer adjacent motif (PAM) within the region of interest and selecting a region of a desired size upstream or downstream of the PAM as the protospacer. The corresponding spacer sequence can be designed by determining the complementary sequence of the protospacer region.
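The PAM-based selection described in paragraph [0147] can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the disclosure: the function names, the default PAM pattern `TTT[ACG]` (a commonly cited Cas12a-type motif), the 20-nt protospacer length, and the choice to scan only the strand supplied. The sequence reported as complementary to the protospacer is simply its reverse complement, following the wording of the paragraph.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_protospacers(region, pam_regex="TTT[ACG]", length=20, pam_side="5prime"):
    """List candidate protospacers next to each PAM on the supplied strand.

    pam_side="5prime": the PAM precedes the protospacer (protospacer is downstream).
    pam_side="3prime": the PAM follows the protospacer (protospacer is upstream).
    Returns tuples of (pam_start, pam, protospacer, complementary_sequence).
    """
    region = region.upper()
    hits = []
    # A lookahead is used so that overlapping PAM matches are all reported.
    for match in re.finditer(f"(?=({pam_regex}))", region):
        pam = match.group(1)
        pam_start = match.start()
        if pam_side == "5prime":
            start = pam_start + len(pam)
        elif pam_side == "3prime":
            start = pam_start - length
        else:
            raise ValueError("pam_side must be '5prime' or '3prime'")
        end = start + length
        if start < 0 or end > len(region):
            continue  # not enough flanking sequence for a full protospacer
        protospacer = region[start:end]
        hits.append((pam_start, pam, protospacer, reverse_complement(protospacer)))
    return hits
```

A call such as `find_protospacers(region, pam_regex="[ACGT]GG", pam_side="3prime")` would instead select the region upstream of an NGG-type PAM. Scanning the opposite strand would require repeating the search on the reverse complement of the region.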

[0148] The spacer sequence can be identified using a computer program (e.g., machine-readable code). The computer program can use variables such as predicted melting temperature, secondary structure formation, and predicted annealing temperature, sequence identity, genomic context, chromatin accessibility, %GC, genomic occurrence frequency, methylation status, presence of SNPs, etc.
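As a rough illustration of how a program might use a few of the variables listed in paragraph [0148], the sketch below computes %GC, a simple melting-temperature estimate, and the longest homopolymer run for each candidate spacer, then filters and ranks the candidates. The thresholds (40 to 60 %GC, runs of at most 4), the use of the Wallace rule (2 °C per A/T, 4 °C per G/C, a rule of thumb for short oligonucleotides), and all function names are assumptions chosen for the example; the disclosure does not specify any scoring scheme.

```python
from itertools import groupby


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace-rule melting temperature estimate in degrees C (short oligos only)."""
    seq = seq.upper()
    at = sum(base in "ATU" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max((len(list(run)) for _, run in groupby(seq.upper())), default=0)


def rank_spacers(candidates, gc_min=40.0, gc_max=60.0, max_run=4):
    """Keep candidates within the (hypothetical) limits, closest to 50 %GC first."""
    rows = [
        {
            "spacer": s,
            "gc_percent": gc_percent(s),
            "tm_wallace_c": wallace_tm(s),
            "longest_run": longest_homopolymer(s),
        }
        for s in candidates
    ]
    kept = [
        r for r in rows
        if gc_min <= r["gc_percent"] <= gc_max and r["longest_run"] <= max_run
    ]
    return sorted(kept, key=lambda r: abs(r["gc_percent"] - 50.0))
```

Variables such as chromatin accessibility, methylation status, or genomic occurrence frequency would need external data sources and are outside the scope of this sketch.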

[0149] The percent complementarity between a nucleic acid targeting sequence (e.g., the spacer sequence of at least one guide polynucleotide as disclosed herein) and a target nucleic acid (e.g., the protospacer sequence of one or more target loci as disclosed herein) can be at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100%. The percent complementarity between the nucleic acid targeting sequence and the target nucleic acid can be at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% over about 20 consecutive nucleotides.
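One simple way to compute the percent complementarity referred to in paragraph [0149] is sketched below. It assumes an ungapped, antiparallel comparison of two sequences of equal length, both written 5' to 3', and counts only Watson-Crick pairs (G-U wobble pairs are treated as mismatches). These simplifications and the function name are assumptions for illustration; the disclosure does not prescribe a calculation method.

```python
# Watson-Crick pairs, allowing either T (DNA) or U (RNA) opposite A.
_PAIRS = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
}


def percent_complementarity(spacer: str, target: str) -> float:
    """Percent of positions that base-pair when spacer and target anneal antiparallel.

    Both sequences are given 5'->3' and must have the same length.
    """
    spacer = spacer.upper()
    target = target.upper()
    if not spacer or len(spacer) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    # Reverse the target so that position i of the spacer faces its pairing partner.
    paired = sum((a, b) in _PAIRS for a, b in zip(spacer, reversed(target)))
    return 100.0 * paired / len(spacer)
```

For a 20-nt spacer, one mismatch corresponds to 95% complementarity and two mismatches to 90%, which is how the thresholds "over about 20 consecutive nucleotides" translate into mismatch counts under this simple model.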

[0150] The Cas protein-binding segment of the guide nucleic acid may include two stretches of nucleotides that are complementary to each other (e.g., crRNA and tracrRNA). The two stretches of nucleotides may be covalently linked by intervening nucleotides (e.g., a linker in the case of a single guide nucleic acid). The two stretches of nucleotides can hybridize to form a double-stranded RNA duplex or hairpin of the Cas protein-binding segment, resulting in a stem-loop structure. crRNA and tracrRNA may be covalently linked via the 3' end of crRNA and the 5' end of tracrRNA. Alternatively, tracrRNA and crRNA may be covalently linked via the 5' end of tracrRNA and the 3' end of crRNA.

[0151] The Cas protein-binding segment of the guide nucleic acid may have a length of about 10 nucleotides to about 100 nucleotides, such as about 10 nucleotides (nt) to about 20 nt, about 20 nt to about 30 nt, about 30 nt to about 40 nt, about 40 nt to about 50 nt, about 50 nt to about 60 nt, about 60 nt to about 70 nt, about 70 nt to about 80 nt, about 80 nt to about 90 nt, or about 90 nt to about 100 nt. For example, the Cas protein-binding segment of the guide nucleic acid may have a length of about 15 nucleotides (nt) to about 80 nt, about 15 nt to about 50 nt, about 15 nt to about 40 nt, about 15 nt to about 30 nt, or about 15 nt to about 25 nt.

[0152] The dsRNA duplex of the Cas protein-binding segment of the guide nucleic acid can have a length of about 6 base pairs (bp) to about 50 bp. For example, the dsRNA duplex of the protein-binding segment can have a length of about 6 bp to about 40 bp, about 6 bp to about 30 bp, about 6 bp to about 25 bp, about 6 bp to about 20 bp, about 6 bp to about 15 bp, about 8 bp to about 40 bp, about 8 bp to about 30 bp, about 8 bp to about 25 bp, about 8 bp to about 20 bp, or about 8 bp to about 15 bp. For example, the dsRNA duplex of the Cas protein-binding segment can have a length of about 8 bp to about 10 bp, about 10 bp to about 15 bp, about 15 bp to about 18 bp, about 18 bp to about 20 bp, about 20 bp to about 25 bp, about 25 bp to about 30 bp, about 30 bp to about 35 bp, about 35 bp to about 40 bp, or about 40 bp to about 50 bp.

[0153] In some embodiments, the dsRNA duplex of the Cas protein-binding segment can have a length of 36 base pairs. The percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment can be at least about 60%. For example, the percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment can be at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or at least about 99%. In some cases, the percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment can be 100%.

[0154] The linker (e.g., a sequence that links crRNA and tracrRNA to form a single guide nucleic acid) can have a length of about 3 nucleotides to about 100 nucleotides. For example, the linker can have a length of about 3 nucleotides (nt) to about 90 nt, about 3 nucleotides (nt) to about 80 nt, about 3 nucleotides (nt) to about 70 nt, about 3 nucleotides (nt) to about 60 nt, about 3 nucleotides (nt) to about 50 nt, about 3 nucleotides (nt) to about 40 nt, about 3 nucleotides (nt) to about 30 nt, about 3 nucleotides (nt) to about 20 nt, or about 3 nucleotides (nt) to about 10 nt. For example, the linker can have a length of about 3 nt to about 5 nt, about 5 nt to about 10 nt, about 10 nt to about 15 nt, about 15 nt to about 20 nt, about 20 nt to about 25 nt, about 25 nt to about 30 nt, about 30 nt to about 35 nt, about 35 nt to about 40 nt, about 40 nt to about 50 nt, about 50 nt to about 60 nt, about 60 nt to about 70 nt, about 70 nt to about 80 nt, about 80 nt to about 90 nt, or about 90 nt to about 100 nt. In some embodiments, the linker of the DNA targeting RNA is 4 nt.

[0155] The guide nucleic acids of the systems of the present disclosure may include modifications or sequences that provide additional desirable features (e.g., modified or regulated stability; intracellular targeting; tracking by fluorescent labeling; binding sites for proteins or protein complexes, etc.). Examples of such modifications include a 5' cap (e.g., a 7-methylguanylate cap (m7G)); a 3' polyadenylation tail (3' poly(A) tail); riboswitch sequences (e.g., enabling regulated stability and/or regulated accessibility by proteins and/or protein complexes); stability control sequences; sequences that form dsRNA duplexes (hairpins); modifications or sequences that target RNA to intracellular locations (e.g., nucleus, mitochondria, chloroplasts, etc.); modifications or sequences that provide tracking (e.g., direct conjugation to a fluorescent molecule, conjugation to a moiety that facilitates fluorescent detection, sequences that enable fluorescent detection, etc.); and modifications or sequences that provide binding sites for proteins (e.g., proteins that act on DNA, including transcriptional activators, transcriptional repressors, DNA methyltransferases, DNA demethylases, histone acetyltransferases, histone deacetylases, and combinations thereof).

[0156] The guide nucleic acid can include one or more modifications (e.g., base modifications, backbone modifications) that provide the nucleic acid with novel or enhanced features (e.g., improved stability). The guide nucleic acid can include a nucleic acid affinity tag. A nucleoside can be a base-sugar combination. The base portion of a nucleoside can be a heterocyclic base. Two of the most common classes of such heterocyclic bases are purines and pyrimidines. A nucleotide can be a nucleoside that further includes a phosphate group covalently linked to the sugar portion of the nucleoside. For a nucleoside containing a pentofuranosyl sugar, the phosphate group may be linked to the 2′, 3′, or 5′ hydroxyl portion of the sugar. In forming the guide nucleic acid, the phosphate groups can covalently link adjacent nucleosides to each other to form a linear polymer compound. The respective ends of this linear polymer compound can in turn be joined to form a cyclic compound; however, a linear compound may also be suitable. In addition, the linear compound can have internal nucleotide base complementarity and thus can fold into a completely or partially double-stranded compound. Within the guide nucleic acid, the phosphate groups are commonly referred to as forming the internucleoside backbone of the guide nucleic acid. The linkage or backbone of the guide nucleic acid can be a 3′→5′ phosphodiester linkage.

[0157] The guide nucleic acid can include a modified backbone and / or a modified internucleoside linkage. Modified backbones can include those that retain a phosphorus atom in the backbone and those that do not have a phosphorus atom in the backbone.

[0158] Suitable modified guide nucleic acid backbones containing phosphorus atoms include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphate triesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates (such as 3'-alkylene phosphonates, 5'-alkylene phosphonates, and chiral phosphonates), phosphinates, phosphoramidates (including 3'-aminophosphoramidate and aminoalkylphosphoramidates), phosphorodiamidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, selenophosphates, and boranophosphates having normal 3'-5' linkages, 2'-5' linked analogs of these, and those having an inverted polarity in which one or more internucleotide linkages are 3'→3', 5'→5', or 2'→2' linkages. Suitable guide nucleic acids having an inverted polarity include a single 3'→3' linkage at the most 3'-terminal internucleotide linkage (such as a single inverted nucleoside residue that lacks a nucleobase or has a hydroxyl group in its place). Various salts (e.g., potassium chloride or sodium chloride), mixed salts, and free acid forms may also be included.

[0159] The guide nucleic acid may include one or more phosphorothioate and / or heteroatom nucleoside linkages, specifically, -CH2-NH-O-CH2-, -CH2-N(CH3)-O-CH2- (methylene(methylimino) or MMI backbone), -CH2-O-N(CH3)-CH2-, -CH2-N(CH3)-N(CH3)-CH2- and -O-N(CH3)-CH2-CH2- (where the natural phosphodiester internucleotide bond is represented as -O-P(=O)(OH)-O-CH2-).

[0160] The guide nucleic acid may include a morpholino backbone structure. For example, the nucleic acid may include a 6-membered morpholino ring instead of a ribose ring. In some of these embodiments, phosphorodiamidate or other non-phosphodiester nucleoside linkages replace the phosphodiester bond.

[0161] The guide nucleic acid can include a polynucleotide backbone formed by short-chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short-chain heteroatomic or heterocyclic internucleoside linkages. These can include those having morpholino linkages (formed in part by the sugar moiety of the nucleoside); siloxane backbones; sulfide, sulfoxide, and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; riboacetyl backbones; alkene-containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having a mixture of N, O, S, and CH2 component parts.

[0162] The guide nucleic acid can include nucleic acid mimics. The term "mimic" is intended to include polynucleotides in which only the furanose ring, or both the furanose ring and the internucleotide linkage, are replaced by non-furanose groups; replacement of only the furanose ring can also be referred to as a sugar substitute. The heterocyclic base moiety or modified heterocyclic base moiety can be maintained for hybridization with an appropriate target nucleic acid. One such nucleic acid can be a peptide nucleic acid (PNA). In PNA, the sugar-backbone of the polynucleotide can be replaced with an amide-containing backbone, specifically an aminoethylglycine backbone. The backbone in a PNA compound can include two or more linked aminoethylglycine units that give PNA an amide-containing backbone. The nucleobases can be retained and can bind directly or indirectly to the aza-nitrogen atoms of the amide portion of the backbone.

[0163] The guide nucleic acid can include linked morpholino units (morpholino nucleic acids) having heterocyclic bases attached to the morpholino ring. A linking group can link the morpholino monomer units of the morpholino nucleic acid. Nonionic morpholino-based oligomeric compounds can have fewer undesired interactions with cellular proteins. Morpholino-based polynucleotides can be nonionic mimics of the guide nucleic acid. Various compounds within the morpholino class can be joined using different linking groups. A further class of polynucleotide mimics can be referred to as cyclohexenyl nucleic acids (CeNA). The furanose ring normally present in nucleic acid molecules can be replaced with a cyclohexenyl ring. CeNA phosphoramidite monomers protected with DMT can be prepared and used in the synthesis of oligomeric compounds using phosphoramidite chemistry. Incorporation of CeNA monomers into nucleic acid strands can increase the stability of DNA/RNA hybrids. CeNA oligoadenylic acids can form complexes with nucleic acid complements, which have a stability similar to that of natural complexes. A further modification can be locked nucleic acids (LNA), in which the 2'-hydroxyl group is linked to the 4'-carbon atom of the sugar ring, forming a 2'-C,4'-C-oxymethylene linkage and thereby a bicyclic sugar moiety. The linkage can be a methylene group (-CH2-)n, where n is 1 or 2, that bridges the 2'-oxygen atom and the 4'-carbon atom. LNAs and LNA analogs can exhibit very high thermal stability of duplexes with complementary nucleic acids (ΔTm = +3 to +10 °C), stability against 3'-exonuclease degradation, and good solubility properties.

[0164] The guide nucleic acid can include one or more substituted sugar moieties. Suitable polynucleotides can include sugar substituents selected from OH; F; O-, S-, or N-alkyl; O-, S-, or N-alkenyl; O-, S-, or N-alkynyl; or O-alkyl-O-alkyl (where alkyl, alkenyl, and alkynyl are substituted or unsubstituted C1-C10 alkyl or C2-C10 alkenyl and alkynyl). In particular, O((CH2)nO)mCH3, O(CH2)nOCH3, O(CH2)nNH2, O(CH2)nCH3, O(CH2)nONH2, and O(CH2)nON((CH2)nCH3)2 (wherein n and m are from 1 to about 10) are preferred. The sugar substituents may also be selected from C1-C10 lower alkyl, substituted lower alkyl, alkenyl, alkynyl, alkaryl, aralkyl, O-alkaryl or O-aralkyl, SH, SCH3, OCN, Cl, Br, CN, CF3, OCF3, SOCH3, SO2CH3, ONO2, NO2, N3, NH2, heterocycloalkyl, heterocycloaralkyl, aminoalkylamino, polyalkylamino, substituted silyl, an RNA cleavage group, a reporter group, an intercalator, a group that improves the pharmacokinetic properties of the guide nucleic acid, a group that improves the pharmacodynamic properties of the guide nucleic acid, and other substituents having similar properties. Suitable modifications may include 2'-methoxyethoxy (2'-O-CH2CH2OCH3, also known as 2'-O-(2-methoxyethyl) or 2'-MOE, an alkoxyalkoxy group). Further suitable modifications may include 2'-dimethylaminooxyethoxy (a O(CH2)2ON(CH3)2 group, also known as 2'-DMAOE) and 2'-dimethylaminoethoxyethoxy (also known as 2'-O-dimethyl-amino-ethoxy-ethyl or 2'-DMAEOE), i.e., 2'-O-CH2-O-CH2-N(CH3)2.

[0165] Other suitable sugar substituents include methoxy (-O-CH3), aminopropoxy (-OCH2CH2CH2NH2), allyl (-CH2-CH=CH2), -O-allyl (-O-CH2-CH=CH2), and fluoro (F). The 2'-sugar substituent may be in the arabino (upward) or ribo (downward) position. A preferred 2'-arabino modification is 2'-F. Similar modifications can also be made at other positions on the oligomeric compound, particularly at the 3'-position of the sugar on the 3'-terminal nucleoside or in the 2'-5'-linked nucleotide and at the 5'-position of the 5'-terminal nucleotide. The oligomeric compound may also have a sugar mimetic such as a cyclobutyl moiety instead of a pentofuranosyl sugar.

[0166] Guide nucleic acids can also include nucleic acid base (or "base") modifications or substitutions. As used herein, "unmodified" or "natural" nucleic acid bases can include purine bases (e.g., adenine (A) and guanine (G)) and pyrimidine bases (e.g., thymine (T), cytosine (C), and uracil (U)). Modified nucleic acid bases include 5-methylcytosine (5-me-C), 5-hydroxymethylcytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl (-C≡C-CH3) uracil and cytosine and other alkynyl derivatives of pyrimidine bases, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and guanines, 5-halo, particularly 5-bromo, 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 2-F-adenine, 2-amino-adenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 7-deazaadenine and 3-deazaguanine and 3-deazaadenine, and other synthetic and natural nucleic acid bases. Modified nucleic acid bases include tricyclic pyrimidines, such as phenoxazine cytidine (1H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), phenothiazine cytidine (1H-pyrimido(5,4-b)(1,4)benzothiazin-2(3H)-one), G-clamps such as a substituted phenoxazine cytidine (e.g., 9-(2-aminoethoxy)-H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), carbazole cytidine (2H-pyrimido(4,5-b)indol-2-one), and pyridoindole cytidine (H-pyrido(3',2':4,5)pyrrolo(2,3-d)pyrimidine-2-one).

[0167] Examples of the heterocyclic base moiety include those in which a purine or pyrimidine base is replaced by another heterocycle such as 7-deaza-adenine, 7-deazaguanosine, 2-aminopyridine, and 2-pyridone. Nucleic acid bases can be useful for increasing the binding affinity of polynucleotide compounds. These may include 5-substituted pyrimidines, 6-azapyrimidines, and N-2, N-6, and O-6 substituted purines, including 2-aminopropyladenine, 5-propynyluracil, and 5-propynylcytosine. 5-Methylcytosine substitution can increase nucleic acid duplex stability by 0.6 to 1.2 °C and can be a preferred base substitution (for example, when combined with 2'-O-methoxyethyl sugar modification).

[0168] Modification of the guide nucleic acid can include chemically linking to the guide nucleic acid one or more moieties or conjugates that can enhance the activity, cellular distribution, or cellular uptake of the guide nucleic acid. These moieties or conjugates may include conjugate groups covalently attached to a functional group, such as a primary or secondary hydroxyl group. Conjugate groups include, but are not limited to, intercalators, reporter molecules, polyamines, polyamides, polyethylene glycols, polyethers, groups that enhance the pharmacodynamic properties of the oligomer, and groups that can enhance the pharmacokinetic properties of the oligomer. Conjugate groups include, but are not limited to, cholesterol, lipids, phospholipids, biotin, phenazine, folic acid, phenanthridine, anthraquinone, acridine, fluorescein, rhodamine, coumarin, and dyes. Groups that enhance the pharmacodynamic properties include groups that improve uptake, enhance resistance to degradation, and/or strengthen sequence-specific hybridization to the target nucleic acid. Groups that can enhance the pharmacokinetic properties include groups that improve the uptake, distribution, metabolism, or excretion of the nucleic acid. Conjugate moieties include, but are not limited to, lipid moieties, such as cholesterol moieties, cholic acid, thioethers (e.g., hexyl-S-tritylthiol), thiocholesterol, aliphatic chains (e.g., dodecanediol or undecyl residues), phospholipids (e.g., di-hexadecyl-rac-glycerol or triethylammonium 1,2-di-O-hexadecyl-rac-glycero-3-H-phosphonate), polyamine or polyethylene glycol chains, adamantane acetic acid, palmitoyl moieties, and octadecylamine or hexylamino-carbonyl-oxy cholesterol moieties.

[0169] In some embodiments, at least one guide RNA polynucleotide of the systems or methods provided herein can bind to at least a portion of a genome (e.g., a plant genome) or a gene (e.g., a plant gene). Optionally, the at least one guide RNA polynucleotide has the ability to form a complex with a site-specific nuclease and direct the site-specific nuclease to target a portion of a target nucleic acid (e.g., a site in a genome or gene).

[0170] In some embodiments, the systems described herein include at least one guide RNA polynucleotide capable of forming a complex with the site-specific nuclease portion of a fusion protein of the system. In some embodiments, the systems described herein include at least two (e.g., at least three, at least four, at least five, or at least six) different guide RNA polynucleotides capable of forming a complex with the site-specific nuclease portion of a fusion protein of the system.

[0171] In some embodiments, the guide nucleic acid includes a nucleotide sequence having at least 70% (e.g., at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identity to any one of SEQ ID NOs: 2, 3, 11, 12, 23-26, or 28-31.

[0172] Also provided herein are kits containing the components of the systems described in this disclosure. In some embodiments, the kit includes one or more of the fusion proteins and / or polynucleotides described herein.

VI. Methods

[0173] In another aspect, provided herein are methods for editing one or more nucleic acids using the fusion proteins and/or systems described herein. In some embodiments, the method includes contacting a nucleic acid comprising a fusion protein binding site (i.e., the nucleic acid to be edited) with at least one fusion protein as described herein, wherein contacting the nucleic acid with the at least one fusion protein results in an edit to the nucleic acid. The nucleic acid (i.e., the nucleic acid to be edited) can be any suitable nucleic acid. In some embodiments, the nucleic acid is a portion of a chromosome. In some embodiments, the nucleic acid is a portion of a genome (e.g., a plant genome).

[0174] As described herein and as demonstrated in the following examples, the methods provided herein can result in an increase in the frequency of one or more desired nucleic acid editing outcomes (e.g., excision of a fragment, inversion of a fragment, replacement of a fragment by HDR, chromosomal rearrangement). By using a fusion protein that is targeted by its site-specific nuclease to a specific nucleic acid sequence (e.g., a genomic site, a donor template site), the methods herein can be used to increase (or decrease) the frequency of one or more desired nucleic acid editing outcomes. In some embodiments, the fusion protein is targeted to a specific strand of the nucleic acid. In some embodiments, the fusion protein is targeted to a site upstream or downstream of the nuclease cleavage site. In some embodiments, the fusion protein is targeted to bind to the nucleic acid in a specific orientation.

[0175] In some embodiments, the nucleic acid to be edited by the present method comprises a target region. As used herein, a "target region" refers to a portion of a nucleic acid that is the target of editing. For example, the target region can be a portion of a gene to be edited. In some embodiments, the fusion proteins provided herein are targeted to binding sites inside and/or outside the target region as described below (e.g., two sites are inside the target region, one site is inside the target region and one site is outside, or two sites are outside the target region). In some embodiments, each fusion protein binding site is in proximity to a nuclease cleavage site. In some embodiments, the target region is adjacent to a nuclease cleavage site. In some embodiments, the nucleic acid comprises a first binding site adjacent to the 5' end of the target region and a second binding site adjacent to the 3' end of the target region.

[0176] In some embodiments, as detailed below, the nucleic acid to be edited comprises a first binding site and a second binding site. In some embodiments, the first binding site and the second binding site are different sequences, and the method comprises providing two different fusion proteins, one of which binds to the first binding site and one of which binds to the second binding site. In some embodiments, the first binding site and the second binding site are the same sequence, and the method comprises providing a fusion protein that can bind to both the first binding site and the second binding site.

[0177] In some embodiments, the methods herein include providing a donor nucleic acid comprising a third binding site and a fourth binding site. In some embodiments, the third binding site and the fourth binding site are different sequences, and the methods include providing two different fusion proteins, one of which binds to the third binding site and one of which binds to the fourth binding site. In some embodiments, the third binding site and the fourth binding site are the same sequence, and the methods include providing a fusion protein that can bind to both the third binding site and the fourth binding site. Additionally, in some embodiments, the third and / or fourth binding sites are the same sequence as the first and / or second binding sites. The first, second, third, and fourth binding sites can include any combination of sequences, from all four having the same sequence to all four having different sequences. In some embodiments, the nucleic acid (i.e., the nucleic acid to be edited) is a portion of a first chromosome and the donor nucleic acid is a portion of a second chromosome. In some embodiments, the first chromosome and the second chromosome are different chromosomes. In some embodiments, the first chromosome and the second chromosome are homologous chromosomes. In some embodiments, the first chromosome and the second chromosome are non-homologous chromosomes. In some embodiments, the first chromosome and the second chromosome are the same chromosome.

[0178] In some embodiments of the methods provided herein, for example, in the exemplary embodiments described below, the site-specific nuclease of at least one fusion protein comprises a CRISPR-associated nuclease. In such embodiments, the method can further comprise providing a guide RNA to target the fusion protein to the binding site. In some embodiments, the method comprises providing at least one first guide RNA and at least one second guide RNA. In some embodiments, the at least one first guide RNA comprises a nucleotide sequence having complementarity to a first binding site of the nucleic acid to be edited. In some embodiments, the at least one second guide RNA comprises a nucleotide sequence having complementarity to a second binding site of the nucleic acid to be edited. In some embodiments, when the method comprises providing a donor nucleic acid, the method further comprises providing at least one third guide RNA and at least one fourth guide RNA. In some embodiments, the at least one third guide RNA comprises a nucleotide sequence having complementarity to a third binding site (i.e., on the donor nucleic acid). In some embodiments, the at least one fourth guide RNA comprises a nucleotide sequence having complementarity to a fourth binding site (i.e., on the donor nucleic acid).
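The relationship between binding sites and guide RNAs in paragraphs [0176] to [0178] can be sketched as a simple grouping: binding sites that share a sequence could in principle be addressed by the same guide RNA, so the number of distinct binding-site sequences gives a lower bound on the number of different guide RNAs to supply. This reading, the function name, and the short placeholder sequences in the usage note are assumptions for illustration only.

```python
from collections import defaultdict


def group_sites_by_sequence(sites):
    """Group binding-site names by their (case-insensitive) sequence.

    `sites` maps a site name (e.g. "first") to its sequence.
    Returns a dict mapping each distinct sequence to the list of site names
    that share it, in the order the sites were given.
    """
    groups = defaultdict(list)
    for name, sequence in sites.items():
        groups[sequence.upper()].append(name)
    return dict(groups)
```

For example, if the first and third binding sites share one sequence and the second and fourth share another, `group_sites_by_sequence` returns two groups, consistent with the case in which two different guide RNAs serve all four sites. If all four sequences differ, four groups are returned.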

[0179] In some embodiments, the frequency of a desired nucleic acid editing result can be increased or decreased by targeting the fusion protein to bind inside and / or outside of the target region. In some embodiments of the methods provided herein, the editing made to the nucleic acid is an excision (i.e., removal), inversion (i.e., reversal of orientation), or replacement of at least a portion of the target region. In some embodiments, the editing is a chromosomal rearrangement. In some embodiments, the chromosomal rearrangement is a reciprocal translocation. In some embodiments, the chromosomal rearrangement is a non-reciprocal translocation. Some non-limiting, exemplary embodiments of fusion protein targeting strategies that increase the frequency of particular editing results are discussed below with reference to the accompanying figures, and then various aspects of the method are further discussed. The exemplary embodiments use a Cas enzyme as the SDN of the fusion protein, and thus include discussion of gRNA-mediated binding and PAM sequences; however, as discussed in Section III.A above, similarly targeted binding (e.g., to target inside and / or outside of the target region) can be achieved with other site-specific nucleases (e.g., zinc finger nucleases, TAL effector nucleases, meganucleases, etc.).

[0180] In some embodiments, as demonstrated in Example 1 of the present disclosure, according to the methods provided herein, the frequency of fragment inversion (e.g., at a genomic locus) between paired cleavage sites on a nucleic acid is increased. An exemplary embodiment is shown in FIG. 1, where the method includes two fusion proteins that bind to a first binding site and a second binding site on the same strand of a DNA polynucleotide. The first binding site is adjacent to the 5' end of the target region (i.e., the region between nuclease cleavage sites), and the second binding site is adjacent to the 3' end of the target region. For purposes of illustration, the fusion protein of this embodiment includes LbCas12a as an SDN linked to Trex2 exonuclease as a non-specific end-processing enzyme. The DNA polynucleotide is a portion of the ZmDMR6 gene, the first binding site includes a PAM sequence located in the promoter of the gene, and the second binding site includes a PAM sequence located in intron 1 of the gene. LbCas12a (i.e., the SDN of the fusion protein) cleaves the polynucleotide in the 3' direction from the PAM, and the fusion protein remains bound to the binding site on the cleaved polynucleotide end that includes the PAM. Since the first binding site (designated "gRNA1") is upstream of the cleavage site (i.e., outside the target region), the fusion protein bound to the first binding site remains bound to the cleaved polynucleotide end that is upstream and outside the target region. Since the second binding site (designated "gRNA2") is upstream of the cleavage site (i.e., inside the target region), the fusion protein bound to the second binding site remains bound to the downstream end of the target region. 
Without being bound by any particular theory, when the fusion proteins bound to the first and second binding sites dimerize (i.e., via dimerization of Trex2), the downstream end of the target region and the end of the polynucleotide upstream of the target region will come into proximity, where they will be ligated through a DNA repair mechanism such as NHEJ or MMEJ. When the remaining polynucleotide ends (i.e., the ends not bound by the fusion protein) are ligated through a DNA repair mechanism in this way, inversion of the target region occurs.

[0181] In some embodiments, as described in Example 3 of the present disclosure, the methods provided herein increase the frequency of excision (i.e., removal) of a fragment between paired cleavage sites on a nucleic acid (e.g., at a genomic locus). An exemplary embodiment is shown in FIG. 2, where the method includes two fusion proteins that bind to a first binding site and a second binding site on opposite strands of a DNA polynucleotide. The first binding site is adjacent to the 5' end of the target region, and the second binding site is adjacent to the 3' end of the target region. For purposes of illustration, the fusion proteins of this embodiment include LbCas12a as an SDN linked to Trex2 exonuclease as a non-specific end-processing enzyme. The DNA polynucleotide is a portion of the ZmDMR6 gene, the first binding site includes a PAM sequence located in the promoter of the gene, and the second binding site includes a PAM sequence located in intron 1 of the gene. LbCas12a (i.e., the SDN of the fusion protein) cleaves the polynucleotide in the 3' direction from the PAM, and the fusion protein remains bound to the binding site on the cleaved polynucleotide end that includes the PAM. Since the first binding site (designated "gRNA1") is upstream of the cleavage site (i.e., outside the target region), the fusion protein bound to the first binding site remains bound to the cleaved polynucleotide end that is upstream and outside the target region. Since the second binding site (designated "gRNA2") is downstream of the cleavage site (i.e., outside the target region), the fusion protein bound to the second binding site remains bound to the cleaved polynucleotide end that is downstream and outside the target region.
Without being bound by any particular theory, when the fusion proteins bound to the first and second binding sites dimerize, the cleaved polynucleotide ends upstream and downstream of the target region come into proximity, where they are ligated through a DNA repair mechanism such as NHEJ or MMEJ, resulting in excision of the target region.

[0182] In some embodiments, the increased excision frequency described above can also be achieved by a method using fusion proteins targeted within the target region. An exemplary embodiment of such a method involves two fusion proteins that bind to a first binding site and a second binding site on opposite strands of a DNA polynucleotide. The first binding site is adjacent to the 5' end of the target region, the second binding site is adjacent to the 3' end of the target region, and both the first binding site and the second binding site are within the target region. Thus, after SDN-mediated cleavage, each fusion protein remains bound to one end of the target region. Without being bound by any particular theory, when the fusion proteins bound to the first and second binding sites dimerize, the cleaved polynucleotide ends of the target region come into proximity, whereupon they are joined via a DNA repair mechanism such as NHEJ or MMEJ (i.e., a circular polynucleotide is formed). When the remaining polynucleotide ends (i.e., the ends to which no fusion protein is bound) are likewise joined via a DNA repair mechanism, excision of the target region occurs.
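
The targeting rules of the three preceding embodiments can be condensed into a small lookup. The following Python sketch is illustrative only: the function name `expected_outcome` and its string labels are hypothetical, and it encodes only the configurations described above (same strand with one binding site outside and one inside the target region; opposite strands with both sites outside; opposite strands with both sites inside).

```python
def expected_outcome(same_strand: bool, site1_inside: bool, site2_inside: bool) -> str:
    """Return the editing outcome favored by fusion-protein dimerization.

    Hypothetical helper; it covers only the configurations described in the text.
    """
    if same_strand and site1_inside != site2_inside:
        # FIG. 1: one binding site outside and one inside the target region
        return "inversion"
    if not same_strand and site1_inside == site2_inside:
        # FIG. 2 (both sites outside) or the variant with both sites inside
        return "excision"
    return "not described"


print(expected_outcome(True, False, True))    # FIG. 1 configuration
print(expected_outcome(False, False, False))  # FIG. 2 configuration
print(expected_outcome(False, True, True))    # both sites inside the target region
```

Any other combination is reported as "not described" rather than guessed.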

[0183] In some embodiments, the methods provided herein further comprise providing a donor nucleic acid comprising a fusion protein binding site. As described above, the donor nucleic acid can be used with the SDN to provide a template for homology-directed repair (HDR). In some embodiments, the donor nucleic acid can also be used to provide a replacement fragment to be inserted in place of the target region. In some embodiments, use of the donor nucleic acid can facilitate translocation (e.g., chromosomal translocation). By using a fusion protein that targets specific sequences in both the nucleic acid to be edited and the donor nucleic acid, the frequency of one or more desired nucleic acid editing outcomes can be increased (or decreased) using the methods herein. In some embodiments, the donor nucleic acid provided in the methods herein comprises a third binding site and a fourth binding site (i.e., if the nucleic acid to be edited comprises a first binding site and a second binding site) and a donor nucleotide region. As used herein, "donor nucleotide region" refers to a portion of the donor nucleic acid that is adjacent to the fusion protein binding site. Some non-limiting, exemplary embodiments of methods comprising a donor nucleic acid are discussed below with reference to the accompanying figures.

[0184] In some embodiments, as described in Example 5 of the present disclosure, the methods provided herein increase the frequency of fragment replacement or targeted insertion between paired cleavage sites on a nucleic acid (e.g., at a genomic locus). Two exemplary embodiments are illustrated in FIGS. 3A and 3B, where the methods include two fusion proteins that bind to a first binding site and a second binding site on opposite strands of a DNA polynucleotide (i.e., the nucleic acid to be edited, herein shown as "genomic DNA"), and two fusion proteins that bind to a third binding site and a fourth binding site on opposite strands of a donor nucleic acid (herein shown as "donor DNA"). In both embodiments, the first binding site is adjacent to the 5' end of the target region (i.e., the region between nuclease cleavage sites), the second binding site is adjacent to the 3' end of the target region, the third binding site is adjacent to the 5' end of the donor nucleotide region (i.e., the region of the donor nucleic acid between nuclease cleavage sites), and the fourth binding site is adjacent to the 3' end of the donor nucleotide region. For illustrative purposes, the fusion proteins in these embodiments include LbCas12a as an SDN linked to Trex2 exonuclease as a non-specific end-processing enzyme. LbCas12a (i.e., the SDN of the fusion protein) cleaves the polynucleotide (i.e., both the nucleic acid to be edited and the donor nucleic acid) in the 3' direction from the PAM, and the fusion protein remains bound to the binding site on the cleaved polynucleotide end containing the PAM. Since the first binding site (designated "gRNA-a") is upstream of the cleavage site (i.e., outside the target region), the fusion protein bound to the first binding site remains bound to the cleaved polynucleotide end that is upstream and outside the target region.
Since the second binding site (designated "gRNA-b") is downstream of the cleavage site (i.e., outside the target region), the fusion protein bound to the second binding site remains bound to the cleaved polynucleotide end that is downstream and outside the target region. Since the third binding site (designated "gRNA-a / c") is downstream of the cleavage site (i.e., within the donor nucleotide region), the fusion protein bound to the third binding site remains bound to the cleaved polynucleotide end of the donor nucleotide region (i.e., the upstream end of the donor nucleotide region). Since the fourth binding site (designated "gRNA-b / d") is upstream of the cleavage site (i.e., within the donor nucleotide region), the fusion protein bound to the fourth binding site remains bound to the cleaved polynucleotide end of the donor nucleotide region (i.e., the downstream end of the donor nucleotide region). Without being bound by any particular theory, when the fusion proteins bound to the first and third binding sites dimerize (i.e., via dimerization of Trex2), the cleaved polynucleotide end upstream of the target region comes into proximity with the upstream end of the donor nucleotide region, and when the fusion proteins bound to the second and fourth binding sites dimerize, the cleaved polynucleotide end downstream of the target region comes into proximity with the downstream end of the donor nucleotide region. When the adjacent polynucleotide ends are ligated through a DNA repair mechanism such as NHEJ or MMEJ, replacement of the target region with the donor nucleotide region will occur. The donor nucleotide region may also be inserted in the reverse orientation (i.e., when the fusion proteins bound to the first and fourth binding sites dimerize and the fusion proteins bound to the second and third binding sites dimerize).

[0185] In the embodiment shown in FIG. 3A, the donor nucleotide region does not contain homology arms, and thus has a higher probability of NHEJ or MMEJ repair than HDR. In the embodiment shown in FIG. 3B, the donor nucleotide region contains homology arms. Without being bound by any particular theory, in the embodiment illustrated in FIG. 3B, the fusion proteins bound to the first binding site and the third binding site dimerize, and the fusion proteins bound to the second binding site and the fourth binding site dimerize, so that the homology arms of the donor nucleotide region come into proximity with the homologous sequence of the nucleotide to be edited (for example, genomic DNA), thereby promoting HDR-mediated repair.
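
For the FIG. 3A / 3B arrangement, the relationship between which binding sites are brought together by dimerization and the orientation of the inserted donor nucleotide region can be written as a short lookup. This is a minimal sketch under the dimerization hypothesis stated above; the function name `donor_outcome` and the result labels are hypothetical, and binding sites are referred to simply by their ordinal numbers (1 to 4).

```python
def donor_outcome(dimer_pairs) -> str:
    """Map the two dimerized binding-site pairs to the expected donor orientation.

    Hypothetical helper for the FIG. 3A / 3B arrangement only.
    """
    pairs = {frozenset(p) for p in dimer_pairs}
    if pairs == {frozenset((1, 3)), frozenset((2, 4))}:
        return "replacement, donor region in forward orientation"
    if pairs == {frozenset((1, 4)), frozenset((2, 3))}:
        return "replacement, donor region in reverse orientation"
    return "not described"


print(donor_outcome([(1, 3), (2, 4)]))
print(donor_outcome([(1, 4), (2, 3)]))
```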

[0186] In some embodiments, as described in Example 7 of the present disclosure, the methods provided herein increase the frequency of translocation between two polynucleotides (e.g., between two chromosomes). An exemplary embodiment is shown in FIG. 4, where the method includes two fusion proteins that bind to a first binding site and a second binding site on opposite strands of a DNA polynucleotide (i.e., the nucleic acid to be edited, shown here as the "recipient chromosome"), and two fusion proteins that bind to a third binding site and a fourth binding site on opposite strands of a donor nucleic acid (shown here as the "donor chromosome"). The first binding site is adjacent to the 5' end of the target region (i.e., the region between the nuclease cleavage sites), the second binding site is adjacent to the 3' end of the target region, the third binding site is adjacent to the 5' end of the donor nucleotide region (i.e., the region between the nuclease cleavage sites of the donor nucleic acid), and the fourth binding site is adjacent to the 3' end of the donor nucleotide region. For illustrative purposes, the fusion proteins in this embodiment include LbCas12a as an SDN linked to Trex2 exonuclease as a non-specific end-processing enzyme. LbCas12a (i.e., the SDN of the fusion protein) cleaves the polynucleotide (i.e., both the nucleic acid to be edited and the donor nucleic acid) in the 3' direction from the PAM, and the fusion protein remains bound to the binding site on the cleaved polynucleotide end containing the PAM. Since the first binding site (designated "gRNA-a") is upstream of the cleavage site (i.e., outside the target region), the fusion protein bound to the first binding site remains bound to the cleaved polynucleotide end that is upstream and outside the target region.
Since the second binding site (designated "gRNA-b") is downstream of the cleavage site (i.e., outside the target region), the fusion protein bound to the second binding site remains bound to the cleaved polynucleotide end that is downstream and outside the target region. The third binding site (designated "gRNA-c") is upstream of the cleavage site (i.e., outside the donor nucleotide region), so the fusion protein bound to the third binding site remains bound to the cleaved donor polynucleotide end that is upstream and outside the donor nucleotide region. The fourth binding site (designated "gRNA-d") is downstream of the cleavage site (i.e., outside the donor nucleotide region), so the fusion protein bound to the fourth binding site remains bound to the cleaved donor polynucleotide end that is downstream and outside the donor nucleotide region. Without being bound by any particular theory, when the fusion proteins bound to the first and fourth binding sites dimerize (i.e., via dimerization of Trex2), the cleaved polynucleotide end upstream of the target region comes into proximity with the cleaved polynucleotide end downstream of the donor nucleotide region, and when the fusion proteins bound to the second and third binding sites dimerize, the cleaved polynucleotide end downstream of the target region comes into proximity with the cleaved polynucleotide end upstream of the donor nucleotide region. When the neighboring polynucleotide ends are ligated via a DNA repair mechanism such as NHEJ or MMEJ, translocation of the polynucleotide (and excision of the target region and donor nucleotide region) will occur.

[0187] Table 1 summarizes the expected editing results induced by the methods of the present disclosure using various fusion protein targeting strategies.

[0188]

Table 1

[0189] The methods herein include providing at least a fusion protein and a nucleic acid to be edited, and may also include providing a donor nucleic acid and / or at least one guide RNA. These various components can be provided using any suitable technique. For example, providing a fusion protein can include introducing the fusion protein into a cell, or introducing a recombinant nucleic acid, construct, or vector encoding the fusion protein into a cell. Similarly, the gRNA can be provided by introducing the gRNA itself or a nucleic acid sequence encoding the gRNA. In some embodiments, the fusion protein and the gRNA may be encoded by the same DNA construct or vector.

Examples

[0190] Example 1. Increasing the frequency of fragment inversion between paired guide RNA targeting sites in the maize genome using the LbCas12a-Trex2 fusion. This example demonstrates that the LbCas12a-Trex2 fusion can promote fragment inversion between two paired gRNA targeting sites that are on the same chromosome and on the same strand (as shown in FIG. 1, for example). The maize downy mildew resistance 6 (DMR6) gene (the nucleotide sequence is shown in SEQ ID NO: 1) was selected as an example to demonstrate this design. Similar designs may be applicable to any other genomic locus of any organism.

[0191]

Table 2

[0192] DMR6 is a well-characterized plant susceptibility gene that was first studied in Arabidopsis. Knocking out this gene in other plant species is expected to confer resistance to multiple pathogens. However, because of the high GC content of the DMR6 protein-coding sequence, it is difficult to find a standard TTTN PAM at which a simple Cas12a-induced indel mutation would knock out the gene. To avoid this difficulty, one guide RNA was designed to target a sequence (SEQ ID NO: 2) adjacent to a TTTG PAM in the promoter region, and a second guide RNA was designed to target a sequence (SEQ ID NO: 3) adjacent to a TTTA PAM in the first intron. When these two guide RNAs are co-expressed with the LbCas12a nuclease, excision or inversion of the genomic fragment between the two targeting sites should occur, knocking out the gene because the fragment contains the entire first exon. The design of the guide RNAs and the expected results are illustrated in Figure 5.
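
Searching a locus for candidate Cas12a targets of the kind described above can be sketched as a scan for TTTN PAMs on both strands, taking the protospacer immediately 3' of each PAM. The Python sketch below is illustrative only: the function names are hypothetical, the default protospacer length of 23 nt is an assumption not stated in this disclosure, and the demonstration sequence is a toy string rather than any SEQ ID NO.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def find_cas12a_sites(seq: str, spacer_len: int = 23):
    """Yield (strand, pam_start, pam, protospacer) for every TTTN PAM.

    pam_start is 0-based on the scanned strand ("+" is the given sequence,
    "-" is its reverse complement). spacer_len is an assumed value.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # Only keep PAMs with a full-length protospacer on their 3' side
        for i in range(len(s) - 4 - spacer_len + 1):
            if s[i:i + 3] == "TTT":
                yield strand, i, s[i:i + 4], s[i + 4:i + 4 + spacer_len]


# Toy sequence with a short spacer length, for demonstration only
for hit in find_cas12a_sites("CCTTTGACGTACC", spacer_len=5):
    print(hit)
```

In a GC-rich coding sequence such a scan returns few or no hits, which is the difficulty that motivated placing the two targets in the promoter and the first intron.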

[0193] Previous in-house research suggested that it is more difficult to excise or invert the genomic sequence between paired guide RNA targeting sites using Cas12a than using Cas9. This may be because, in contrast to the blunt ends generated by Cas9, the double-strand breaks created by Cas12a have sticky ends that are more difficult to separate from each other. To improve the frequency of fragment excision and inversion, various DNA exonucleases, including bacteriophage T5 exonuclease (T5Exo, SEQ ID NO: 4), a 5'→3' exonuclease, and mouse three prime repair exonuclease 2 (Trex2, SEQ ID NO: 5), a 3'→5' exonuclease, were fused to LbCas12a (SEQ ID NO: 6) through a flexible GS linker (SEQ ID NO: 7). The purpose was for the fused exonuclease to trim and separate the sticky ends immediately after LbCas12a generates a double-strand break. This hypothesis is shown in Figure 6 using T5Exo as an example.

[0194] To verify the effectiveness of the Cas12a-exonuclease fusions in maize, multiple binary vectors were constructed as summarized in Table 2. The protein coding sequence of each Cas12a-exonuclease fusion (SEQ ID NO: 8), flanked at both ends by nuclear localization signal peptides, was optimized based on maize codon usage and expressed from an expression cassette in which it was operably linked to the sugarcane ubiquitin 4 promoter (SEQ ID NO: 9) and the Agrobacterium nopaline synthase terminator (SEQ ID NO: 10). Two guide RNAs (SEQ ID NO: 11 and 12) were expressed as a tandem array under the rice U6 promoter (SEQ ID NO: 13), and the transcripts were processed into separate mature crRNAs by the native RNase activity of LbCas12a. In all vectors, the Escherichia coli phosphomannose isomerase gene (SEQ ID NO: 14) operably linked to the maize ubiquitin 1 promoter (SEQ ID NO: 15) and the Agrobacterium nopaline synthase terminator (SEQ ID NO: 10) served as a selectable marker.

[0195]

Table 3

[0196] Each construct was delivered by Agrobacterium tumefaciens-mediated transformation or, if the construct was unstable in Agrobacterium, by biolistic transformation, into callus cells derived from immature maize embryos of the inbred line AX5707 owned by Syngenta. The transformed calli were subjected to mannose selection, and plantlets were regenerated by standard tissue culture procedures. Samples for DNA extraction were taken from the regenerated plantlets, and transgenic plants containing the construct were identified among them by TaqMan assay.

[0197] PCR primers were designed to amplify the genomic sequence around and between the two targeting sites, and the resulting amplicons were used to characterize the editing results in transgenic plants. The expected amplicon size from the wild-type template is 1,416 bp. Excision of the fragment between the targeting sites yields a shorter amplicon of approximately 350 bp. The amplicons from each transgenic plant were sequenced by the Sanger method to identify inverted alleles and to further characterize the repaired junctions at the targeting sites.
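
The size-based readout described above can be illustrated with a short Python sketch. The two amplicon sizes are taken from the text; the function name `classify_amplicon` and the 50 bp tolerance are assumptions made only for illustration. Because an inversion leaves the amplicon length essentially unchanged, the sketch does not try to separate wild-type from inverted alleles, which is consistent with the use of Sanger sequencing for that purpose.

```python
WILD_TYPE_BP = 1416  # expected wild-type amplicon size (from the text)
EXCISION_BP = 350    # approximate amplicon size after excision (from the text)


def classify_amplicon(size_bp: int, tolerance_bp: int = 50) -> str:
    """Classify a PCR band by size; tolerance_bp is an assumed gel-reading margin."""
    if abs(size_bp - EXCISION_BP) <= tolerance_bp:
        return "excision candidate"
    if abs(size_bp - WILD_TYPE_BP) <= tolerance_bp:
        return "wild type or inversion (sequence to distinguish)"
    return "unclassified"


# Approximate length of the fragment removed by excision, in bp
print(WILD_TYPE_BP - EXCISION_BP)
print(classify_amplicon(1400))
print(classify_amplicon(360))
```

The difference of roughly 1.07 kb agrees with the approximately 1-kb targeted fragment between the two gRNA sites.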

[0198] The editing results are summarized in Table 4. Compared with the control construct without a fused exonuclease domain, fusing T5 exonuclease to the N-terminus or the C-terminus did not improve the frequency of the desired excision or inversion. In contrast, fusing Trex2 to either the N-terminus or the C-terminus resulted in significantly higher inversion frequencies (approximately 20% in both cases), while the excision frequency increased only slightly. Among the transgenic plants containing construct 25962 (encoding a C-terminal LbCas12a-Trex2 fusion), there were three plants in which the targeted 1-kb fragment was inverted in both alleles, and two plants in which the targeted fragment was excised from one allele and inverted in the other. The fully sequenced alleles resulting from editing with the C-terminal LbCas12a-Trex2 fusion are shown in Table 5 below together with the unedited wild-type reference sequence. Most of the repaired junctions lost only a few additional base pairs. In Table 5, the gRNA1 target and flanking sequences (100 bp on each side) are underlined; the gRNA2 target and flanking sequences (100 bp on each side) are in italics. The PAM sequence is in bold and the protospacer is double-underlined. The sequences in lowercase are the reverse complements of the reference sequences. The nucleotides enclosed in square brackets are those inserted or mutated during the repair process.

[0199]

Table 4

[0200]

Table 5 (parts 5-1 to 5-17)

[0201] Example 2. Analysis of one or more molecular mechanisms conferring high inversion rates induced by the LbCas12a-Trex2 fusion. In this example, the necessity of Trex2 dimerization and / or 3'→5' exonuclease activity for the high inversion rates induced by the LbCas12a-Trex2 fusion was examined. A catalytically defective Trex2 mutant (Trex2 CD, SEQ ID NO: 18), a dimerization-defective Trex2 mutant (Trex2 DD, SEQ ID NO: 19), and a single-chain Trex2 homodimer (scTrex2, SEQ ID NO: 20) were each fused to the C-terminus of LbCas12a. The efficacy of each fusion in inducing inversion was then compared to that of the original LbCas12a-Trex2 fusion. Based on previous studies (Chen et al., Nucleic Acids Res. 2007 Apr;35(8):2682-2694.; Delacote et al., PLoS One. 2013;8(1):e53217.), Trex2 CD loses exonuclease activity but retains the ability to form homodimers; Trex2 DD has impaired homodimer formation and thus also loses exonuclease activity; scTrex2 forms an intramolecular dimer after translation that exhibits exonuclease activity but can no longer mediate dimerization between two fusion proteins.

[0202] Construct 27431, expressing LbCas12a-Trex2 CD, was created by mutating the codons encoding amino acid residues H188 and D193 in construct 25962 to alanine-encoding codons. Construct 27432, expressing LbCas12a-Trex2 DD, was created by mutating the codons encoding amino acid residues E29, K59, N94, R107, and E191 in construct 25962 to alanine-encoding codons. Construct 27433, expressing LbCas12a-scTrex2, was created by inserting a DNA sequence encoding a polypeptide linker (TPPQTGLDVPY) and a second, recoded Trex2 monomer immediately upstream of the C-terminal NLS (SEQ ID NO: 21). All constructs use the same gRNA pair targeting ZmDMR6.

[0203]

Table 6

[0204]

Table 7

[0205] Transgenic plants were created and analyzed as described in Example 1. The efficiency of targeted inversion for each construct was compared with that of constructs 25962 and 26297, and the results are summarized in Table 8. When two catalytic residues of Trex2 are mutated, the LbCas12a-Trex2 CD fusion can still induce fragment inversion at a frequency similar to, if not higher than, that induced by the original LbCas12a-Trex2 fusion. When five dimerization-mediating residues are mutated, the inversion rate induced by the LbCas12a-Trex2 DD fusion is reduced to only one-third of that induced by the original LbCas12a-Trex2 fusion. The editing results for LbCas12a-scTrex2 were partly unexpected: the editing efficiency at the gRNA1 target dropped to nearly one-third of the efficiency at the same target with the other constructs. However, among the 13 plants in which both targets were edited, a targeted inversion was identified in one plant, a frequency (7.7%) comparable to that of construct 27432 (6.0%). Overall, these results suggest that the high inversion rate induced by the LbCas12a-Trex2 fusion requires not the 3'→5' exonuclease activity of Trex2 but its dimerization ability.

[0206]

Table 8

[0207] Example 3. Use of the LbCas12a-Trex2 fusion to increase the fragment excision frequency between paired guide RNA targeting sites in the maize genome. This example demonstrates that the LbCas12a-Trex2 fusion can significantly increase the excision frequency of the target region when used with a pair of gRNAs that target the same chromosome (as shown in Figure 2, for example) and whose binding sites are on opposite strands and both outside the target region.

[0208] In Example 1, two gRNAs in the same orientation were selected to excise the first exon of the ZmDMR6 gene. When the LbCas12a-Trex2 fusion was used in combination with the two gRNAs, the desired excision frequency increased only slightly compared to the non-fusion control (see Table 4). Based on the dimerization hypothesis described in Example 2, it was predicted that the excision frequency induced by LbCas12a-Trex2 could be further increased by using two gRNAs whose binding sites are on opposite strands and outside the target region. To design such gRNA pairs, ZmDMR6-crRNA1 was kept as it is and paired with each of two gRNAs whose targets are on the complementary strand downstream of ZmDMR6-crRNA2, namely ZmDMR6-crRNA3 (SEQ ID NO: 23) and ZmDMR6-crRNA4 (SEQ ID NO: 24) (Figure 7A).

[0209]

Table 9

[0210] By replacing the coding sequence of ZmDMR6-crRNA2 in construct 25962 with the coding sequences of ZmDMR6-crRNA3 and ZmDMR6-crRNA4, respectively, fusion constructs 26710 and 26711 were created. By replacing the coding sequence of ZmDMR6-crRNA2 in construct 26297 with the coding sequences of ZmDMR6-crRNA3 and ZmDMR6-crRNA4, respectively, non-fusion control constructs 26712 and 26713 were created.

[0211]

Table 10

[0212] Transgenic plants were generated and analyzed as described in Example 1, except that the reverse primer for PCR was redesigned. The editing results are summarized in Table 11. Interestingly, for all four constructs, the inversion frequency was extremely low or even zero. The excision frequency obtained for all four constructs was below 10%, but in the presence of Trex2, the excision frequency between ZmDMR6-crRNA1 and ZmDMR6-crRNA4 almost doubled. Other binding-site designs (as shown in Table 1) that were expected to significantly improve the excision frequency between the two target sites were also tested.

[0213]

Table 11

[0214] To test the prediction that the excision frequency induced by LbCas12a-Trex2 can be significantly increased by using two gRNAs whose binding sites are on opposite strands and inside the target region, the ZmWaxy1 gene was selected. ZmWaxy1-crRNA1 (SEQ ID NO: 25) was designed to target the sequence adjacent to a TTTG PAM in exon 4 on the coding strand, while ZmWaxy1-crRNA5 (SEQ ID NO: 26) was designed to target the sequence adjacent to a TTTA PAM in the promoter region on the complementary strand (Figure 7B).

[0215] Fusion construct 26958 was created by replacing the coding sequences of ZmDMR6-crRNA1 and ZmDMR6-crRNA2 in construct 25962 with the coding sequences of ZmWaxy1-crRNA1 and ZmWaxy1-crRNA5, respectively. Non-fusion control construct 26961 was created by making the same replacements in construct 26297. Transgenic plants were created and analyzed as described in Example 1. The editing results are also summarized in Table 9. For this gRNA pair, a high excision frequency (about 20%) was achieved even without Trex2; when Trex2 was added, however, the excision frequency increased significantly to 58.6%, almost tripling. The excision results in this example do not conform to the dimerization hypothesis, but can be explained by an alternative hypothesis based on repair inhibition. For example, when Trex2 is present, immediate repair by standard NHEJ between the two ends at one DSB site can be inhibited (possibly due to exonuclease activity), but the two unligated ends at the two DSB sites remain available for NHEJ repair. NHEJ repair between these two unligated ends can then result in a preferred repair outcome depending on the orientation of the gRNA target sites.

[0216] Example 4. Use of the LbCas12a-Trex2 fusion to increase the frequency of fragment inversion and excision between paired guide RNA targeting sites in the soybean genome. In this example, the effectiveness of the LbCas12a-Trex2 fusion in increasing the frequency of fragment inversion and excision in soybean, a dicotyledonous crop, is demonstrated. For these experiments, the soybean fatty acid desaturase 2A (FAD2-1A) gene (SEQ ID NO: 27) was selected.

[0217] A first gRNA, GmFAD2A-crRNA1 (SEQ ID NO: 28), was designed to target the sequence adjacent to a TTTG PAM in the FAD2-1A promoter region on the coding strand. A second gRNA, GmFAD2A-crRNA2 (SEQ ID NO: 29), was designed to target the sequence adjacent to a TTTG PAM near the 3' splice site of the first intron on the coding strand. The distance between the target sites of crRNA1 and crRNA2 is approximately 1.1 kb. A third gRNA, GmFAD2A-crRNA3 (SEQ ID NO: 30), was designed to target the sequence adjacent to a TTTG PAM in the second exon on the complementary strand. A fourth gRNA, GmFAD2A-crRNA4 (SEQ ID NO: 31), was designed to target the sequence adjacent to a TTTG PAM in the second exon on the coding strand. The distance between the target sites of crRNA3 and crRNA4 is approximately 1 kb (Figure 8).

[0218] The following predictions are based on the maize results described in Examples 1 and 3. GmFAD2A-crRNA1 and crRNA2 are in the same orientation. Therefore, when both gRNAs are co-expressed with the LbCas12a-Trex2 fusion in soybean cells, they are expected to induce a significantly higher frequency of inversion of the fragment between the two gRNA target sites than when co-expressed with LbCas12a alone. GmFAD2A-crRNA3 and crRNA4 bind to opposite strands inside the target region. Therefore, when both gRNAs are co-expressed with the LbCas12a-Trex2 fusion in soybean cells, they are expected to induce a significantly higher frequency of excision of the fragment between the two gRNA target sites than when co-expressed with LbCas12a alone.

[0219] The binary vectors constructed to verify the effectiveness of the Cas12a-exonuclease fusion in soybean are summarized in Table 12. The Arabidopsis codon-optimized coding sequence of the LbCas12a-Trex2 fusion (SEQ ID NO: 32), or the Arabidopsis codon-optimized coding sequence of LbCas12a (SEQ ID NO: 33), was operably linked to the figwort mosaic virus (FMV) enhancer (SEQ ID NO: 34), the promoter of the Arabidopsis EF1A elongation factor gene (SEQ ID NO: 35), and the Agrobacterium nopaline synthase terminator (SEQ ID NO: 10) for constitutive expression in soybean cells. Each gRNA pair, arranged as a tandem array flanked by hammerhead (HH) and hepatitis delta virus (HDV) ribozymes (SEQ ID NOs: 36 and 37, respectively), was expressed under the soybean ubiquitin promoter (SEQ ID NO: 38). Transcripts carrying the gRNA array were processed into separate mature crRNAs by ribozyme self-cleavage and the intrinsic RNase activity of LbCas12a. The aminoglycoside 3'-adenylyltransferase gene (SEQ ID NO: 39), operably linked to the promoter of the soybean EF1 elongation factor gene (SEQ ID NO: 40) and the terminator of the pea (Pisum sativum) ribulose-1,5-bisphosphate carboxylase small subunit (rbcS2) E9 gene (SEQ ID NO: 41), confers resistance to the antibiotic spectinomycin in plant cells and served as the plant selection marker in all vectors.

[0220]

Table 12

[0221]

Table 13

[0222] Each construct was delivered by Agrobacterium tumefaciens-mediated transformation to the stem cells in the cotyledons of the Syngenta-owned elite soybean line 06KG. The transformed cotyledons were subjected to spectinomycin selection, and plants were regenerated by tissue culture procedures. Samples for DNA extraction were taken from the regenerated plants, and transgenic plants containing the construct were identified among them by TaqMan assay.

[0223] Inversion and excision events in the samples were detected by PCR. For 27188 and 27194, a forward primer upstream of the GmFAD2A-crRNA1 target site and a reverse primer downstream of the GmFAD2A-crRNA2 target site were designed to amplify the sequences around and between these two target sites. A second "forward primer" located between the GmFAD2A-crRNA1 and GmFAD2A-crRNA2 target sites (priming the same strand as the first forward primer) was designed to amplify the inverted fragment when paired with the first forward primer. For 27253 and 27254, a forward primer upstream of the GmFAD2A-crRNA3 target site and a reverse primer downstream of the GmFAD2A-crRNA4 target site were designed to amplify the sequences around and between these two target sites; excision of the fragment between the two target sites yields a smaller PCR amplicon. The results based on agarose gel electrophoresis of the PCR products are summarized in Table 14. These results support the findings in maize: when the two target sites of a pair are on the same strand, with one within and one outside the target region, fragment inversion is strongly preferred; when the two target sites of a pair are on opposite strands and both are within the target region, excision of the fragment is preferred.
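The primer logic described above can be sketched in a few lines of Python. The model below treats each primer as a position and a direction on a reference, applies an excision or inversion between two cut positions, and reports whether a primer pair still faces each other and what amplicon length results. All names, coordinates, and the idealized assumption of clean cuts without resection or indels are hypothetical and serve only to illustrate why the second forward primer yields a product only after inversion and why excision shortens the outer amplicon.

```python
"""Sketch: which primer pairs amplify after excision or inversion (idealized)."""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Primer:
    name: str
    pos: int        # 0-based coordinate of the primer's 5' end
    forward: bool   # True if extension runs toward increasing coordinates


def after_excision(p: Primer, start: int, end: int) -> Optional[Primer]:
    """Primer after deleting [start, end); None if its site is deleted."""
    if start <= p.pos < end:
        return None
    if p.pos >= end:
        return Primer(p.name, p.pos - (end - start), p.forward)
    return p


def after_inversion(p: Primer, start: int, end: int) -> Primer:
    """Primer after inverting [start, end); inside primers flip direction."""
    if start <= p.pos < end:
        return Primer(p.name, start + end - 1 - p.pos, not p.forward)
    return p


def amplicon_size(a: Optional[Primer], b: Optional[Primer]) -> Optional[int]:
    """Amplicon length if the primers face each other, otherwise None."""
    if a is None or b is None:
        return None
    fwd, rev = (a, b) if a.forward else (b, a)
    if fwd.forward and not rev.forward and fwd.pos < rev.pos:
        return rev.pos - fwd.pos + 1
    return None


if __name__ == "__main__":
    # Hypothetical layout: cuts 1.1 kb apart, outer primers F1/R1, inner F2.
    cut1, cut2 = 200, 1300
    f1, r1, f2 = Primer("F1", 0, True), Primer("R1", 1499, False), Primer("F2", 700, True)
    print("unedited F1+R1:", amplicon_size(f1, r1))
    print("unedited F1+F2:", amplicon_size(f1, f2))
    print("inverted F1+F2:", amplicon_size(f1, after_inversion(f2, cut1, cut2)))
    print("excised  F1+R1:", amplicon_size(after_excision(f1, cut1, cut2),
                                           after_excision(r1, cut1, cut2)))
```

With these illustrative coordinates, the outer pair gives the same 1500-bp product before and after inversion, which is why a same-strand inner primer is needed to detect the inverted fragment, whereas excision reduces the outer product to 400 bp.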

[0224] [Table 14]

[0225] Due to the high degree of mosaicism frequently observed in T0 plants created by transformation based on organogenesis, the ratio of desired mutations will be determined by NGS, and the heritability of the desired mutations will be determined in the T1 population.

[0226] Example 5. Use of the Cas12a-Trex2 fusion to achieve efficient targeted insertion by NHEJ / MMEJ repair in maize. In this example, it is demonstrated that the LbCas12a-Trex2 fusion, in combination with a gRNA in a carefully designed orientation, can efficiently mediate the insertion of a 3.6 kb expression cassette of the 5-enolpyruvylshikimate-3-phosphate synthase gene from Agrobacterium tumefaciens strain CP4 (CP4 EPSPS; cargo sequence, SEQ ID NO: 42) into a targeted gap in the maize genome by the NHEJ / MMEJ repair pathway (i.e., without requiring long homology in the donor DNA).

[0227] The gRNAs ZmDMR6-crRNA1 and ZmDMR6-crRNA3 (see Example 3), whose binding sites are on the reverse strand outside the target region (as shown, for example, in FIG. 3A), were selected to generate a gap between these two target sites in the ZmDMR6 gene. The cargo-containing donor DNA was generated by adding the target sequences of crRNA1 and crRNA2 (SEQ ID NOs: 43 and 44, respectively) to the 5' and 3' ends of the cargo sequence, respectively, such that the crRNA binding sites are on the reverse strand but within the donor nucleotide region (as shown, for example, in FIG. 3A). The donor construct 27022 was obtained by cloning this donor DNA into a 1.8 kb miniaturized pUC57 backbone.

[0228] [Table 15]

[0229]

Table 16

[0230] The linearized plasmid DNAs of constructs 26710 and 27022 will be co-delivered to maize immature embryos by biolistic transformation. The bombarded embryos will be subjected to mannose selection, and plantlets will be regenerated by tissue culture. Samples for DNA extraction will be taken from the regenerated plantlets, and transgenic plants containing the construct will be identified among them by TaqMan assay. As a non-fused control, transgenic plants co-transformed with constructs 26712 and 27022 will be created following the same procedure.

[0231] To identify insertions at the target locus, multiplex junction PCR assays will be designed, each using genome-specific and cargo-specific primers. The junction PCR amplicons will be sequenced by the Sanger method to characterize the junction sequences. Long PCR using two genome-specific primers adjacent to the insertion site will be performed to confirm the length of the insertion. Since the cargo is inserted by NHEJ / MMEJ repair, the cargo can be inserted in either direction, and small indels are expected to be found at the junctions.
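Because an NHEJ / MMEJ insertion can occur in either direction, the pattern of positive junction PCR reactions indicates the orientation of the cargo. The following Python sketch shows one way such a readout could be interpreted. The primer names (GF and GR for genome primers upstream and downstream of the insertion site, both pointing toward it; C5 and C3 for cargo primers at the 5' and 3' ends of the cargo, both pointing outward) and the function itself are hypothetical and are not part of this disclosure.

```python
"""Sketch: infer cargo orientation from positive junction PCR primer pairs."""


def _normalize(pairs):
    """Treat (a, b) and (b, a) as the same primer pair."""
    return {frozenset(pair) for pair in pairs}


# A forward insertion places the cargo 5' end next to the upstream genome flank.
FORWARD_JUNCTIONS = _normalize([("GF", "C5"), ("GR", "C3")])
# A reverse insertion places the cargo 3' end next to the upstream genome flank.
REVERSE_JUNCTIONS = _normalize([("GF", "C3"), ("GR", "C5")])


def infer_orientation(positive_pairs):
    """Return (orientation, junctions_detected).

    orientation is "forward", "reverse", "mixed" (both patterns seen, for
    example in a mosaic plant), or "none". junctions_detected counts the
    positive reactions that match a known junction.
    """
    hits = _normalize(positive_pairs)
    n_fwd = len(hits & FORWARD_JUNCTIONS)
    n_rev = len(hits & REVERSE_JUNCTIONS)
    if n_fwd and n_rev:
        return "mixed", n_fwd + n_rev
    if n_fwd:
        return "forward", n_fwd
    if n_rev:
        return "reverse", n_rev
    return "none", 0


if __name__ == "__main__":
    print(infer_orientation([("GF", "C5"), ("GR", "C3")]))
    print(infer_orientation([("C3", "GF")]))
```

A call that reports only one junction would suggest that the other junction should be examined by long PCR or sequencing before the insertion is considered complete.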

[0232] Example 6. Use of the Cas12a-Trex2 fusion to achieve efficient targeted insertion by homology-directed repair (HDR) in maize. In this example, it is demonstrated that the LbCas12a-Trex2 fusion, in combination with carefully designed gRNAs in the correct orientation, can efficiently mediate the insertion of a 3.8 kb CP4 EPSPS expression cassette (cargo sequence, SEQ ID NO: 45) into a targeted gap in the maize genome by the HDR pathway (i.e., with long homology sequences flanking the cargo sequence). The intergenic region on chromosome 1 of maize identified as ZmSH1 was selected to demonstrate this design.

[0233] One gRNA, ZmSH1-crRNA1 (SEQ ID NO: 46), was designed to target the sequence adjacent to the TTTG PAM on the plus strand. Another gRNA, ZmSH1-crRNA2 (SEQ ID NO: 47), was designed to target the sequence adjacent to the TTTG PAM on the minus strand. In the AX5707 genome, the two gRNA target sites are 167 bp apart. Their binding sites are on the reverse strand outside the target region, so that a gap is generated when the ZmSH1 gRNAs are co-expressed with the LbCas12a-Trex2 fusion or with non-fused LbCas12a.

[0234] A 458-bp genomic sequence (SEQ ID NO: 48) upstream of the PAM targeted by ZmSH1-crRNA1 was selected as the left homology arm (LHA) for mediating HDR and added to the 5'-end of the cargo sequence. A 509-bp genomic sequence (SEQ ID NO: 49) downstream of the PAM targeted by ZmSH1-crRNA2 was selected as the right homology arm (RHA) and added to the 3'-end of the cargo sequence. When the target sequences of ZmSH1-crRNA1 and ZmSH1-crRNA2 (both with TTTG PAM added to the 5'-end) are added to the 5'-end of the left homology arm and the 3'-end of the right homology arm, respectively, such that their crRNA binding sites are on the reverse strand but within the donor nucleotide region, donor DNA is generated (Figure 9).
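The order of elements in the donor DNA described above can be summarized as a simple coordinate layout. The Python sketch below lays the segments end to end and reports their positions. The 458-bp and 509-bp arm lengths come from the text; the 3800-bp cargo length is a rounded placeholder for the approximately 3.8 kb cassette, and the 23-nt target length is an assumption. The sketch records order and length only and does not model strand orientation of the target sequences.

```python
"""Sketch: coordinate layout of an HDR donor (order and lengths only)."""

PAM_LEN = 4        # TTTG
TARGET_LEN = 23    # assumed protospacer length, not stated in the text

# (segment name, length in bp), listed 5' to 3' as described in the text.
ZMSH1_DONOR_SEGMENTS = [
    ("PAM + crRNA1 target", PAM_LEN + TARGET_LEN),
    ("left homology arm", 458),
    ("cargo", 3800),  # placeholder for the approximately 3.8 kb cassette
    ("right homology arm", 509),
    ("PAM + crRNA2 target", PAM_LEN + TARGET_LEN),
]


def donor_layout(segments):
    """Return (name, start, end) tuples with 0-based, end-exclusive coordinates."""
    layout = []
    start = 0
    for name, length in segments:
        layout.append((name, start, start + length))
        start += length
    return layout


if __name__ == "__main__":
    for name, start, end in donor_layout(ZMSH1_DONOR_SEGMENTS):
        print(f"{name:22s} {start:5d} {end:5d}")
```

A layout of this kind can help in placing cargo-specific junction primers so that they fall inside the cargo rather than inside a homology arm, which is also present in the genome.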

[0235]

Table 17

[0236] The donor DNA, Cas12a-Trex2 expression cassette, gRNA expression cassette, and PMI selection marker will be constructed in a single T-DNA region in the binary vector. The resulting construct will be used to transform immature maize embryos, transgenic plants will be created by tissue culture with mannose selection, and the transgenic plants will be identified by TaqMan assay.

[0237] To identify the desired insertion at the target locus, multiplex junction PCR assays will be designed, each using genome-specific and cargo-specific primers. The junction PCR amplicons will be sequenced by the Sanger method to characterize the junction sequences. Long PCR will be performed using two genome-specific primers adjacent to the insertion site to confirm the length of the insertion. Since the cargo is inserted by HDR, the cargo is expected to be inserted in the designed direction, as it is in the donor DNA, and the junctions are expected to contain no additional mutations.

[0238] Example 7. Use of the Cas12a-Trex2 fusion to achieve efficient targeted chromosomal translocation. In this example, it is demonstrated that the LbCas12a-Trex2 fusion, in combination with a carefully designed gRNA, can efficiently mediate the translocation of chromosomal arms between two non-homologous chromosomes. The soybean FAD2-1A gene on chromosome 10 and its paralog FAD2-1B gene on chromosome 20 are selected as targets to demonstrate this design.

[0239] One gRNA will be designed to target a sequence adjacent to a TTTV PAM that is present in both FAD2-1A and FAD2-1B. A second gRNA will be designed to target a sequence adjacent to a TTTV PAM on the opposite (reverse) strand that is likewise present in both FAD2-1A and FAD2-1B. Co-expression of both gRNAs and the LbCas12a-Trex2 fusion will create gaps in both FAD2-1A and FAD2-1B, and the fusion molecule will remain bound to the genomic termini adjacent to the gaps. As illustrated in Figure 4, dimerization of Trex2 will lead to the exchange of chromosomal arms between chromosome 10 and chromosome 20 at some frequency. Chromosomal translocation will be identified by junction PCR, each reaction using one FAD2-1A-specific primer and one FAD2-1B-specific primer.

[0240] All patents, patent publications, patent applications, journal articles, books, technical references, etc. discussed in this disclosure are hereby incorporated by reference in their entirety for all purposes.

[0241] It should be understood that the figures and descriptions of this disclosure are simplified to illustrate relevant elements for a clear understanding of the disclosure. These figures should be recognized as being presented for illustrative purposes rather than as structural diagrams. Omitted details and variations or alternative embodiments are within the scope of understanding of those skilled in the art.

[0242] In certain aspects of this disclosure, it will be understood that a single component can be replaced by multiple components, and multiple components can be replaced by a single component, in order to provide an element or structure or to perform a given one or more functions. Such replacements are considered to be within the scope of this disclosure, except where such replacements do not function in the implementation of a particular embodiment of this disclosure.

[0243] The examples presented herein are intended to illustrate potential and specific embodiments of this disclosure. It will be understood that the examples are primarily for the purpose of illustrating this disclosure to those skilled in the art. Without departing from the spirit of this disclosure, there can be variations in these figures or the operations described herein. For example, in certain cases, method steps or operations can be performed or executed in a different order, or operations can be added, deleted, or modified.

[0244] When a range of values is provided, it is understood that each value intervening between the upper and lower limits of that range, down to the minimum unit fraction of the lower limit value, is also specifically disclosed, unless the context clearly dictates otherwise. Any narrower range between any explicitly stated value or intervening value within an explicitly stated range and any other explicitly stated value or intervening value within an explicitly stated range is included. The upper and lower limits of those smaller ranges may independently be included or excluded from that range, and subject to any specifically excluded upper or lower limit values in the explicitly stated range, each range where either one, neither, or both are included in that smaller range is likewise included within the scope of the present technology. When the explicitly stated range includes one or both of the upper and lower limit values, ranges excluding one or both of those included upper and lower limit values are also included.

[0245] In the foregoing description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that the present invention as described in this disclosure may be practiced without one or more of these specific details. In other instances, well-known features and procedures that are known to those of ordinary skill in the art have not been described so as not to obscure the present invention. Embodiments of the present disclosure are described for purposes of illustration and not limitation. Although the present invention has been mainly described with reference to specific embodiments, it is contemplated that other embodiments will become apparent to those of ordinary skill in the art upon reading this disclosure, and such embodiments are intended to be included within the scope of the method of the present invention. Accordingly, the present disclosure is not limited to the embodiments described above or depicted in the drawings, and various embodiments and variations can be realized without departing from the scope of the following claims.

Claims

1. A fusion protein comprising a CRISPR-associated nuclease linked to a nonspecific end-processing enzyme, wherein the nonspecific end-processing enzyme is a nonspecific exonuclease, and the nonspecific exonuclease is Trex2, Trex1, Escherichia coli (E. coli) exonuclease I, exonuclease III, exonuclease T, exonuclease IX, exonuclease X, RecJ, Pol II, Pol IIIε, WRN, MRE11, APE1, VDJP, RAD1, RAD9, or p53, the fusion protein optionally comprising (i) a linker located between the CRISPR-associated nuclease and the nonspecific end-processing enzyme; and / or (ii) a nuclear localization signal.

2. The fusion protein according to claim 1, wherein the CRISPR-associated nuclease is selected from the group consisting of Cas5, Cas6, Cas7, Cas8, Cas9, Cas12a, Cas12b, Cas12i, Cas12j, Cas12L, Cas12e, Cas12c, Cas12d, Cas12g, Cas12h, TnpB, Cas13a, Cas13b, Cas14, and their nickase or inactivated versions.

3. The fusion protein according to claim 2, wherein the CRISPR-associated nuclease is a Cas9 enzyme or a Cas12a enzyme.

4. The fusion protein according to any one of claims 1 to 3, wherein (i) the nonspecific end-processing enzyme comprises an amino acid sequence having at least 90% identity with any one of SEQ ID NOs: 4, 5, 18, 19, 20, 22, or 58-74; (ii) the nonspecific end-processing enzyme is a monomer of a dimerizing protein; (iii) the linker comprises SEQ ID NO: 7; and / or (iv) the fusion protein comprises an amino acid sequence having at least 90% identity with any one of SEQ ID NOs: 50 to 57.

5. A recombinant nucleic acid encoding a fusion protein according to any one of claims 1 to 3.

6. A DNA construct comprising a promoter operably linked to the recombinant nucleic acid according to claim 5.

7. The DNA construct according to claim 6, wherein the promoter is at least one of the following: an inducible promoter, a constitutive promoter, an egg cell-specific promoter, a pollen-specific promoter, or an apical meristem-specific promoter, and wherein optionally the promoter is a ubiquitin 4 promoter, an actin promoter, a tubulin promoter, a MADS box promoter, or a plant virus promoter.

8. A cell comprising a recombinant nucleic acid encoding a fusion protein according to any one of claims 1 to 3, a DNA construct including a promoter operably linked to the recombinant nucleic acid, or a vector including the recombinant nucleic acid, wherein optionally the cell is a plant cell selected from maize plant cells, soybean plant cells, rice plant cells, wheat plant cells, or sunflower plant cells.

9. A method for editing a nucleic acid, comprising: a. providing at least one fusion protein according to any one of claims 1 to 3; b. providing the nucleic acid, wherein the nucleic acid comprises a first binding site, a second binding site, and a target region including a portion of the nucleic acid, wherein the first binding site is adjacent to the 5' end of the target region, and the second binding site is adjacent to the 3' end of the target region; and c. contacting the nucleic acid with the at least one fusion protein, wherein the at least one fusion protein specifically binds to the first binding site and the second binding site, thereby causing editing of the target region of the nucleic acid, wherein the method further comprises providing at least one first guide RNA and at least one second guide RNA, the at least one first guide RNA comprises a nucleotide sequence complementary to the first binding site, the at least one second guide RNA comprises a nucleotide sequence complementary to the second binding site, and the editing comprises excision, inversion, or substitution of at least a portion of the target region.

10. The method according to claim 9, wherein the first binding site and the second binding site are on the same strand or on opposite strands.

11. The method according to claim 9, wherein (i) at least one of the first binding site and / or the second binding site is located within the first target region; or (ii) neither the first binding site nor the second binding site is located within the first target region.

12. The method according to claim 9, further comprising providing a donor nucleic acid, wherein the donor nucleic acid comprises a third binding site, a fourth binding site, and a donor nucleotide region, the third binding site being adjacent to the 5' end of the donor nucleotide region, the fourth binding site being adjacent to the 3' end of the donor nucleotide region, and the at least one fusion protein specifically binding to the third binding site and the fourth binding site, the method optionally further comprising providing at least one third guide RNA and at least one fourth guide RNA, wherein the at least one third guide RNA comprises a nucleotide sequence complementary to the third binding site, and the at least one fourth guide RNA comprises a nucleotide sequence complementary to the fourth binding site.

13. The method according to claim 12, wherein (i) the third binding site and the fourth binding site are on the same strand or on opposite strands; (ii) at least one of the third binding site or the fourth binding site is located within the donor nucleotide region; (iii) both the third binding site and the fourth binding site are located within the donor nucleotide region; and / or (iv) neither the third binding site nor the fourth binding site is located within the donor nucleotide region.

14. The method according to claim 12, wherein (i) the nucleic acid is a part of a first chromosome; and / or (ii) the donor nucleic acid is part of a donor template, and the donor template is part of a plasmid or linear nucleic acid, or optionally the donor nucleic acid is a portion of a second chromosome.

15. The method of claim 14, wherein (i) the first chromosome and the second chromosome are homologous or non-homologous chromosomes; and / or (ii) the editing is a chromosomal rearrangement or substitution of at least a portion of the target region, and optionally the chromosomal rearrangement is a reciprocal translocation or a non-reciprocal translocation.