Increased meiotic recombination in plants
By targeting histone methyltransferases like SDG2 and PRDM9 to specific genomic loci using CRISPR-based systems, meiotic recombination rates in plants are enhanced, addressing the limitations of current methods and accelerating trait transfer in crop varieties.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- RGT UNIV OF CALIFORNIA
- Filing Date
- 2025-12-02
- Publication Date
- 2026-06-11
Description
Attorney Docket No. 26223-20028.40
INCREASED MEIOTIC RECOMBINATION IN PLANTS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to and benefit of U.S. Provisional Application No. 63/727,095, filed December 2, 2024, the contents of which are hereby incorporated by reference in their entirety.
REFERENCE TO AN ELECTRONIC SEQUENCE LISTING
[0002] The contents of the electronic sequence listing (262232002840SEQLIST.xml; Size: 362,028 bytes; and Date of Creation: November 19, 2025) are herein incorporated by reference in their entirety.
FIELD
[0003] The present disclosure relates to stimulating recombination in plants by targeting histone methyltransferases to specific loci in which increased recombination is desired. Specifically, the present disclosure provides methods and compositions for using guided (e.g., RNA-guided) histone methyltransferases (e.g., H3K4me3 methyltransferases) to increase meiotic crossover rates at desired repeat-rich loci, such as centromeric and centromere-proximal loci, in plants.
BACKGROUND
[0004] Meiotic crossover recombination rates impose a limit on the speed with which new traits can be transferred to elite crop varieties.
[0005] There is currently no robust method for selectively increasing rates of meiotic recombination by site-directed alteration of histone and/or DNA methylation patterns in plants. Accordingly, there is a need for methods of increasing meiotic recombination rates at desired loci in plant genomes, such as low-recombining, repeat-rich centromeric and centromere-proximal regions.
BRIEF SUMMARY
[0006] In one aspect, the present disclosure provides a method for producing a plurality of plant meiocytes having an increased rate of recombination between a first genomic locus and a second genomic locus, the method including: (a) providing a plant including a recombinant polypeptide including a histone methyltransferase domain and that is capable of being targeted to a target locus between the first genomic locus and the second genomic locus; (b) growing the plant under conditions whereby the recombinant polypeptide is targeted to the target locus in a plurality of plant meiocyte precursor cells; and (c) growing the plant under conditions whereby the plurality of plant meiocyte precursor cells undergo meiosis, thereby producing a plurality of plant meiocytes having an increased rate of recombination between the first genomic locus and the second genomic locus, wherein the rate of recombination is measured relative to a comparator plurality of plant meiocytes. In some embodiments, (a) the plant further includes: i) a second recombinant polypeptide including 1) a nuclease-deficient CAS9 polypeptide (dCAS9) or fragment thereof and 2) a multimerized epitope; and ii) a crRNA and a tracr RNA, or fusions thereof; wherein the recombinant polypeptide including a histone methyltransferase domain further includes an affinity polypeptide that specifically binds to the epitope; and (b) the growing includes conditions whereby the second recombinant polypeptide and the recombinant polypeptide including the histone methyltransferase domain are targeted to the target locus.
[0007] In another aspect, the present disclosure provides a method for producing a progeny plant, including: (a) providing a plant including a recombinant polypeptide including a histone methyltransferase domain and that is capable of being targeted to a target locus between a first genomic locus and a second genomic locus; (b) growing the plant under conditions whereby the recombinant polypeptide is targeted to the target locus in a plant meiocyte precursor cell; (c) producing a first plant meiocyte from the plant meiocyte precursor cell; and (d) crossing the first plant meiocyte with a second plant meiocyte, thereby producing a progeny plant, wherein the progeny plant includes an increased number of recombined genomic sequences between the first genomic locus and the second genomic locus compared to a comparator progeny plant. In some embodiments, (a) the plant further includes: i) a second recombinant polypeptide including 1) a nuclease-deficient CAS9 polypeptide (dCAS9) or fragment thereof and 2) a multimerized epitope; and ii) a crRNA and a tracr RNA, or fusions thereof; wherein the recombinant polypeptide including a histone methyltransferase domain further includes an affinity polypeptide that specifically binds to the epitope; and (b) the growing includes conditions whereby the second recombinant polypeptide and the recombinant polypeptide including the histone methyltransferase domain are targeted to the target locus.
[0008] In some embodiments that may be combined with any of the preceding embodiments, the multimerized epitope includes a GCN4 epitope. In some embodiments that may be combined with any of the preceding embodiments, the second polypeptide includes a nuclear localization signal (NLS). In some embodiments that may be combined with any of the preceding embodiments, the affinity polypeptide is an antibody. In some embodiments, the antibody is an scFv antibody. In some embodiments that may be combined with any of the preceding embodiments, the polypeptide including a histone methyltransferase domain includes an SV40-type NLS.
[0009] In some embodiments, the rate of recombination in the plurality of plant meiocytes is at least 10% higher, at least 20% higher, at least 30% higher, at least 40% higher, at least 50% higher, or more than 50% higher than the rate of recombination in the comparator plurality of plant meiocytes. In some embodiments, the progeny plant includes at least 1 more, at least 2 more, at least 3 more, at least 4 more, at least 5 more, or more than 5 more recombined genomic sequences between the first genomic locus and the second genomic locus compared to the comparator progeny plant.
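A minimal sketch of how the percentage thresholds recited above could be checked against measured rates. The function names and the example rates are hypothetical illustrations, not part of the disclosure, and the sketch assumes "X% higher" means a relative increase over the comparator rate.

```python
def percent_increase(test_rate: float, comparator_rate: float) -> float:
    """Relative increase of test_rate over comparator_rate, in percent.

    Assumption: "X% higher" is read as a relative (not absolute) increase.
    """
    if comparator_rate <= 0:
        raise ValueError("comparator rate must be positive")
    return 100.0 * (test_rate - comparator_rate) / comparator_rate


def meets_threshold(test_rate: float, comparator_rate: float,
                    threshold_pct: float) -> bool:
    """True if test_rate is at least threshold_pct higher than the comparator."""
    return percent_increase(test_rate, comparator_rate) >= threshold_pct


if __name__ == "__main__":
    # Hypothetical rates in centimorgans, for illustration only.
    print(percent_increase(6.0, 4.0))
    print(meets_threshold(6.0, 4.0, 50.0))
```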
[0010] In another aspect, the present disclosure provides a plant produced from: (1) a plant meiocyte from the plurality of plant meiocytes produced by the method of any of the preceding embodiments; or (2) the method of any of the preceding embodiments.
[0011] In another aspect, the present disclosure provides a plant part of the plant of any of the preceding embodiments. In another aspect, the present disclosure provides a seed produced by the plant of any of the preceding embodiments. In another aspect, the present disclosure provides a plant generated from the plant part of any of the preceding embodiments or grown from the seed of any of the preceding embodiments. In another aspect, the present disclosure provides a plant derived from the plant of any of the preceding embodiments. In some embodiments that may be combined with any of the preceding embodiments, the plant, plant part, or seed includes increased methylation between the first genomic locus and the second genomic locus compared to a comparator plant, comparator plant part, or comparator seed.
[0012] In some embodiments that may be combined with any of the preceding embodiments, the histone methyltransferase domain deposits H3K4me3. In some embodiments that may be combined with any of the preceding embodiments, the histone methyltransferase domain is an SDG2 polypeptide, a PRDM9 polypeptide, or a fragment thereof. In some embodiments that may be combined with any of the preceding embodiments, the histone methyltransferase domain is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, or at least 95% identical to a sequence selected from the group consisting of the histone methyltransferase amino acid sequences provided herein and homologs and orthologs thereof, optionally wherein the histone methyltransferase domain is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, or at least 95% identical to the amino acid sequence of an SDG2 polypeptide selected from the group consisting of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 93, and SEQ ID NO: 94, or to the amino acid sequence of a PRDM9 polypeptide selected from the group consisting of SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41, SEQ ID NO: 42, and SEQ ID NO: 43.
In some embodiments that may be combined with any of the preceding embodiments, the histone methyltransferase domain comprises (a) an Arabidopsis SDG2 polypeptide sequence having at least 95% sequence identity to an amino acid sequence selected from the group consisting of SEQ ID NO: 1, SEQ ID NO: 16, SEQ ID NO: 93, and SEQ ID NO: 94; or (b) a murine PRDM9 polypeptide having at least 95% sequence identity to an amino acid sequence selected from the group consisting of SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 38, and SEQ ID NO: 43.
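For illustration, percent sequence identity of the kind recited above could be computed from a pairwise alignment along the following lines. This is a sketch under stated assumptions: the text here does not specify an alignment method or an identity definition, so the code assumes the two sequences are already aligned (with '-' as the gap character) and counts identity over columns where at least one sequence has a residue. The example sequences are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: identity = identical columns / columns in which at least
    one sequence has a residue; '-' marks an alignment gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # column with gaps in both sequences carries no information
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no residues to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical 10-residue peptides differing at one position.
    identity = percent_identity("MKTAYIAKQR", "MKTAYIAKQK")
    print(identity, identity >= 70.0)
```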
[0013] In some embodiments that may be combined with any of the preceding embodiments, at least one of the first genomic locus and the second genomic locus is in a centromere or proximal to a centromere. In some embodiments that may be combined with any of the preceding embodiments, the first genomic locus and the second genomic locus are in a centromere.
[0014] In some embodiments that may be combined with any of the preceding embodiments, the second plant meiocyte is derived from the same plant as the first plant meiocyte, and/or the second plant meiocyte is derived from a plant including the first and second recombinant polypeptides. In some embodiments that may be combined with any of the preceding embodiments, the second plant meiocyte is not derived from the same plant as the first plant meiocyte, and/or the second plant meiocyte is derived from a plant lacking the first and second recombinant polypeptides.
[0015] In some embodiments that may be combined with any of the preceding embodiments, the first genomic locus and the second genomic locus are separated by at least 1 kilobase (kb), at least 5 kb, at least 50 kb, at least 100 kb, at least 500 kb, at least 1 Megabase (Mb), at least 2 Mb, at least 3 Mb, at least 5 Mb, at least 10 Mb, at least 50 Mb, or more than 50 Mb.
[0016] In some embodiments that may be combined with any of the preceding embodiments, the gRNA or crRNA includes a sequence that aligns perfectly to the target locus. In some embodiments that may be combined with any of the preceding embodiments, the gRNA or crRNA includes a sequence that aligns to the target locus with five or fewer mismatches. In some embodiments that may be combined with any of the preceding embodiments, the gRNA or crRNA includes a sequence that aligns to a repeat sequence in a centromere or aligns to a repeat sequence proximal to a centromere.
DESCRIPTION OF THE FIGURES
[0017] The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the office upon request and payment of the necessary fee.
[0018] FIGS. 1A-1E illustrate the SunTag system and SDG2. FIG. 1A illustrates the SunTag system for modulation of chromatin. FIG. 1B illustrates the structure of SDG2. The upper panel shows SDG2 with catalytic SET domain (drawn approximately to scale). The lower left panel depicts predicted alignment error (PAE) plot from AlphaFold3 for the SDG2 CDS with a box drawn around the region that was cloned into SunTag (amino acids 1571-2335). Note the high confidence structural prediction for this region. The lower right depicts the AlphaFold3 top model for SDG2, focusing on the regions cloned into SunTag and indicates the amino acid change used for dSDG2. FIG. 1C illustrates an epifluorescence image of rootstocks from 2-week-old representative plants showing nuclear localization of sfGFP from SunTag lines. FIG. 1D illustrates a Western blot for the dCas9-10xGCN4 and effector module in SunTag lines showing activation of FWA expression. FIG. 1E illustrates RT-qPCR data showing activation of FWA expression.
[0019] FIGS. 2A-2B illustrate the SDG2 Y1903 control sequence. FIG. 2A illustrates an amino acid alignment of SDG2 and related SET domain containing proteins. Y1903 is highlighted. FIG. 2B illustrates a ConSurf analysis showing the maximum level of conservation for Y1903.
[0020] FIGS. 3A-3J illustrate that SunTag:SDG2 activates FWA mRNA expression. FIG. 3A illustrates RT-qPCR data for FWA expression from the genotypes indicated. Error bars represent SEM from three biological replicates. *** indicates p-value < 0.001 (Student's t-test). FIG. 3B illustrates RT-qPCR data for the expression of FWA (upper panel) and sfGFP (lower panel) with individual biological replicates shown. FIG. 3C illustrates ChIP-qPCR data for the presence of SunTag (left panel) and H3K4me3 enrichment (right panel) over the TSS of FWA. The control region (AT5G65130) has high H3K4me3. Error bars represent SEM from 2 biological replicates. FIG. 3D illustrates ChIP-qPCR data with two independent FWA targeting lines. FIG. 3E illustrates the enrichment of SunTag:SDG2:FWA_g4 lines (2 biological replicates each) at FWA TSS in a genome browser image. FIG. 3F illustrates a Venn diagram of the overlap between the peaks called from HA ChIP-seq of two independent SunTag:SDG2 FWA targeting lines. FIG. 3G illustrates a genome browser image of overlapping peak regions for FWA. FIG. 3H illustrates a genome browser image of overlapping peak regions for an off-target region with sequence similarity to the FWA binding site that was previously identified (Papikian et al., 2019). FIG. 3I illustrates a genome browser image of overlapping peak regions for an off-target region with sequence similarity to the FWA binding site that was previously identified (Papikian et al., 2019). FIG. 3J illustrates a comparative MA plot of the transcriptome of SunTag:SDG2:FWA_g4 as compared to the non-transformed control (rdr6). Differentially expressed genes are shown in red (FDR < 0.01), with FWA labelled and enlarged for visibility.
[0021] FIGS. 4A-4F illustrate that SunTag:SDG2 targeting to SNC1 enhances resistance to P. syringae. FIG. 4A illustrates the SNC1 gene and images from plants from the genotypes indicated. The upper panel depicts the SNC1 gene, with the red line indicating the SNC1 gRNA target site, drawn to scale. The lower panel depicts representative images of 3-week-old plants from the genotypes indicated. FIG. 4B illustrates rosette size quantification. Each dot represents an individual plant. Error bars represent SEM. Different letters indicate significant difference by ANOVA with post-hoc Tukey HSD. FIG. 4C illustrates ChIP-qPCR data for the presence of SunTag at SNC1. FIG. 4D illustrates RT-qPCR data for SNC1 expression. Error bars represent SEM. * indicates p-value < 0.05. FIG. 4E illustrates colonization dynamics of Pst::LUX in the genotypes indicated. Error bars represent SEM. FIG. 4F illustrates a Pst::LUX assay for colonization quantification at 3 days post inoculation. Different letters indicate significant difference by ANOVA with post-hoc Tukey HSD.
[0022] FIGS. 5A-5I illustrate that centromeric targeting of SunTag:SDG2 elevates meiotic crossover recombination rate. FIG. 5A illustrates the region of chromosome 3, showing CENH3 ChIP-seq enrichment (Naish et al., 2021) with CTL3.9 (Wu et al., 2015) red/green T-DNA marker positions and LRCen3 guide RNA binding sites indicated. FIG. 5B illustrates seeds and crossovers from a CTL3.9 double homozygous line. The left panel depicts representative images of red and green fluorescing seeds from a CTL3.9 double homozygous line. The right panel depicts how crossovers are identified within CTL3.9. FIG. 5C illustrates a boxplot graph of crossover recombination frequency over CTL3.9 in centimorgans. Control data is from Col-0 (non-transgenic) crossed to CTL3.9. Different letters indicate significant difference by ANOVA with post-hoc Tukey HSD. The red dot indicates the seed set line taken to the next generation (F4). FIG. 5D illustrates the recombination rate of SunTag-SDG2 g_LRCen3 independent transformants over CTL3.9 of the F3 lines (as shown in FIGS. 5A-5C and FIG. 5I). FIG. 5E illustrates the distortion ratios (green to non-green) for the green T-DNAs from CTL3.9 from the F3s. FIG. 5F illustrates the distortion ratios (red to non-red) for the red T-DNAs from CTL3.9 from the F3s. FIG. 5G illustrates the recombination frequency over CTL3.9 (F3s) with different SunTag-SDG2 guide RNAs, including all scorable individuals including no guide lines. FIG. 5H illustrates the recombination frequency over CTL3.9 (F3s) with different SunTag-SDG2 guide RNAs, filtered for individuals with no distortion (approximately 3:1). FIG. 5I illustrates chromosome-wide plots of chromosome 3. The upper and middle panels depict enrichment of H3K4me3 and SunTag (anti-HA) by ChIP-seq in sibling lines expressing or not expressing SunTag (+/-). Enrichment is calculated as log2 fold-change over non-transgenic (Col-0) controls in 100 kb windows. The lower panel depicts an LRCen3 guide RNA binding site density over chromosome 3. CTL3.9 marker positions are indicated, and centromeric regions are shown in grey.
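The description of FIG. 5H refers to filtering for individuals with no segregation distortion (approximately 3:1) but does not say how that was judged. One common way to test a marker ratio against an expected 3:1 is a chi-square goodness-of-fit test, sketched below as an assumption rather than as the method actually used. The function names, the 5% significance level, and the seed counts are hypothetical.

```python
# Critical value of the chi-square distribution for 1 degree of freedom
# at a significance level of 0.05.
CHI2_CRITICAL_DF1_ALPHA_05 = 3.841


def chi_square_3_to_1(n_marker: int, n_no_marker: int) -> float:
    """Chi-square statistic for observed counts against an expected 3:1 ratio."""
    total = n_marker + n_no_marker
    if total == 0:
        raise ValueError("at least one seed must be scored")
    expected_marker = 0.75 * total
    expected_no_marker = 0.25 * total
    return ((n_marker - expected_marker) ** 2 / expected_marker
            + (n_no_marker - expected_no_marker) ** 2 / expected_no_marker)


def is_undistorted(n_marker: int, n_no_marker: int) -> bool:
    """True if the counts do not differ significantly from 3:1 (assumed alpha 0.05)."""
    return chi_square_3_to_1(n_marker, n_no_marker) < CHI2_CRITICAL_DF1_ALPHA_05


if __name__ == "__main__":
    # Hypothetical counts of fluorescent vs. non-fluorescent seeds.
    print(is_undistorted(150, 50))
    print(is_undistorted(100, 100))
```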
[0023] FIG. 6A illustrates ChIP replicate comparisons by Pearson correlation clustering using 25 bp bin resolution for anti-H3K4me3. FIG. 6B illustrates ChIP replicate comparisons by Pearson correlation clustering using 25 bp bin resolution for anti-HA. FIGS. 6C-6D illustrate ChIP-seq data chromosome-wide plots as shown in FIGS. 5A-5C and FIG. 5I, over all chromosomes from sibling lines expressing or not expressing SunTag:SDG2:LRCen3 (+/-). Enrichment is calculated as log2 fold change over non-transgenic (Col-0) controls in 100 kb windows. CTL3.9 marker positions are indicated, and centromeric regions are shown in grey.
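As a rough illustration of the enrichment calculation described for these plots (log2 fold change over control in 100 kb windows), the sketch below bins read start positions into fixed windows and takes the log2 ratio per window. It is a simplification under stated assumptions: the pseudocount, the absence of library-size normalization, the function names, and the example positions are all hypothetical and are not taken from the disclosure.

```python
import math
from collections import Counter

WINDOW_BP = 100_000  # window size stated in the figure description


def window_counts(read_positions, window_bp=WINDOW_BP):
    """Count reads per fixed-size window, keyed by 0-based window index."""
    return Counter(pos // window_bp for pos in read_positions)


def log2_fold_change(sample_positions, control_positions, chrom_length,
                     window_bp=WINDOW_BP, pseudocount=1.0):
    """Per-window log2(sample / control) along one chromosome.

    Assumption: a pseudocount avoids division by zero in empty windows;
    real pipelines would typically also normalize for sequencing depth.
    """
    sample = window_counts(sample_positions, window_bp)
    control = window_counts(control_positions, window_bp)
    n_windows = math.ceil(chrom_length / window_bp)
    return [
        math.log2((sample[i] + pseudocount) / (control[i] + pseudocount))
        for i in range(n_windows)
    ]


if __name__ == "__main__":
    # Hypothetical read start positions on a 200 kb chromosome.
    print(log2_fold_change([10, 20, 30, 150_000], [5, 150_001], 200_000))
```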
[0024] FIG. 7A illustrates a LRCen3 guide RNA binding site density plot with 0 mismatch tolerance. FIG. 7B illustrates a LRCen3 guide RNA binding site density plot with 1 mismatch tolerance.
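A binding site density with a given mismatch tolerance, as plotted in FIGS. 7A-7B, can be approximated by sliding the guide sequence along the genome and counting positions that differ by no more than the allowed number of mismatches. The sketch below is a simplified illustration: it scans one strand only and ignores the PAM requirement, and the function names and toy sequences are hypothetical rather than the LRCen3 guide or any real genome.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_sites(genome: str, guide: str, max_mismatches: int = 0) -> list[int]:
    """0-based start positions where guide matches with <= max_mismatches.

    Assumption: forward strand only, no PAM check.
    """
    k = len(guide)
    return [
        i for i in range(len(genome) - k + 1)
        if hamming(genome[i:i + k], guide) <= max_mismatches
    ]


def site_density(genome: str, guide: str, window_bp: int,
                 max_mismatches: int = 0) -> list[int]:
    """Number of binding sites starting in each consecutive window."""
    n_windows = -(-len(genome) // window_bp)  # ceiling division
    density = [0] * n_windows
    for start in find_sites(genome, guide, max_mismatches):
        density[start // window_bp] += 1
    return density


if __name__ == "__main__":
    toy_genome = "ACGTACGAACGT"  # hypothetical sequence
    toy_guide = "ACGT"           # hypothetical guide
    print(find_sites(toy_genome, toy_guide, 0))
    print(site_density(toy_genome, toy_guide, 6, 1))
```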
[0025] FIG. 8 illustrates representative images from 3-week-old plants.
[0026] FIG. 9A illustrates that SunTag-SDG2 no guide causes bulk increases in H3K4me3. The upper panel depicts a Western blot showing bulk levels of H3K4me3 normalized to H3 in the genotypes indicated. The lower panel depicts the ratio of H3K4me3 / H3. FIG. 9B illustrates that overexpression of SunTag-SDG2dcat effector can activate FWA. Panels depict RT-qPCR data for expression of FWA (upper) and sfGFP (middle) and ratio (lower) in the genotypes indicated.
[0027] FIGS. 10A-10H illustrate that SunTag PRDM9 is sufficient for activation of FWA. FIG. 10A illustrates the structure of PRDM9. The upper panel depicts PRDM9 with catalytic SET domain (drawn approximately to scale). The lower left panel depicts PAE (predicted alignment error) plot from AlphaFold3 for the PRDM9 CDS with a box drawn around the region that was cloned into SunTag (amino acids 110-417). The lower right panel depicts the AlphaFold3 top model for PRDM9, focusing on the regions cloned into SunTag and indicates the amino acid change used for dPRDM9. FIG. 10B illustrates RT-qPCR data for FWA (upper panel) and the effector module (sfGFP, lower panel) for the genotypes indicated. Two independent T3 lines are used per genotype. Error bars represent SEM. FIG. 10C illustrates RT-qPCR data of FWA expression (upper) and sfGFP expression (lower) for individual T1 plants from the genotypes indicated. FIG. 10D illustrates ChIP-qPCR data for the presence of SunTag (left panel) and H3K4me3 enrichment (right panel) over the TSS of FWA. Error bars represent SEM from 2 biological replicates. FIG. 10E illustrates a comparative MA plot of the transcriptome of SunTag:PRDM9:FWA_g4 as compared to the non-transformed control (rdr6). Differentially expressed genes are shown in red (FDR < 0.01), with FWA labelled and enlarged for visibility. FIG. 10F illustrates an MA plot for the transcriptome of SunTag:SDG2:No_guide versus the non-transformed control (rdr6). FIG. 10G illustrates an MA plot for the transcriptome of SunTag:PRDM9:No_guide versus the non-transformed control (rdr6). Differentially expressed genes are depicted in red (FDR < 0.1). Note that FWA is not identified as differentially expressed in either FIG. 10F or FIG. 10G. FIG. 10H illustrates efficient SunTag targeting and H3K4me3 enrichment at FWA in a genome browser image. Biological replicates from independent T3 lines are overlaid.
[0028] FIG. 11 illustrates crossover recombination frequency over CTL3.9 in centimorgans (p < 0.001, two-sample, two-sided t-test). Boxplots show the median, the interquartile range, whiskers extending to 1.5 times the interquartile range, and individual data points plotted as dots; n = 24 for control, n = 17 for SunTag:PRDM9:LRCen3_g.
DETAILED DESCRIPTION
[0029] The following description is presented to enable a person of ordinary skill in the art to make and use the various embodiments. Descriptions of specific devices, techniques, methods, and applications are provided only as examples. Various modifications to the examples described herein will be readily apparent to those of ordinary skill in the art, and the general principles defined herein may be applied to other examples and applications without departing from the spirit and scope of the various embodiments. Thus, the various embodiments are not intended to be limited to the examples described herein and shown, but are to be accorded the scope consistent with the claims.
[0030] The present disclosure relates generally to the targeting of histone methyltransferase domains to specific loci in plants to induce recombination in or around the targeted loci. For example, the present disclosure provides methods and compositions for using RNA-guided histone methyltransferase domains to induce recombination at or near specific loci (e.g., in centromeres) in plants.
[0031] The present disclosure relates to the deposition of histone methylation at or around a target nucleic acid, which, without wishing to be bound by theory, is believed to promote meiotic recombination, even in DNA sequences that typically exhibit only low rates of recombination, such as centromeric and centromere-proximal loci. Furthermore, recruitment of multiple copies of a protein to a target substrate (e.g. DNA, RNA, or protein) may amplify reaction rates in biological systems. Thus, when the protein being targeted comprises a histone methyltransferase domain, providing multiple copies of that histone methyltransferase domain may result in efficient and targeted methylation of histones on or near the targeted nucleic acid (e.g., within about 1, 2, 4, 6, 8, 10, 15, or 20 Megabases (Mb) of the target nucleic acid).
[0032] As provided herein, Applicant successfully demonstrated that targeting histone methyltransferase domains to specific loci can be used to trigger recombination at or around the targeted locus in plants. This can be used to, for example, increase the rate of recombination in regions of a plant genome that do not typically recombine, also known as recombination “cold spots,” such as, for example, centromeric and centromere-proximal loci. For example, Applicant was able to show that targeting H3K4me3 (histone 3 lysine 4 trimethylation) methyltransferase activity successfully drove increased meiotic crossover recombination when targeted to low-recombining, repeat-rich genomic loci, such as centromeres. This represents the development of a method to promote recombination at specific areas of chromatin.
[0033] Accordingly, the present disclosure provides methods for increasing meiotic crossover recombination rates in a targeted genomic region by recruiting a histone methyltransferase domain (e.g., SDG2, PRDM9) to a target nucleic acid in plants. Multiple copies of the histone methyltransferase domain may be recruited to the target region simultaneously. Recruitment of the histone methyltransferase domain may be accomplished in various different ways, such as, for example, via CRISPR-based targeting in a manner that allows for methylation of histones (e.g., H3K4me3 methylation) on or near the target nucleic acid. In certain aspects, this specific targeting involves the use of a system that includes (1) a nuclease-deficient CAS9 polypeptide that is recombinantly fused to a multimerized epitope, (2) a histone methyltransferase domain polypeptide that is recombinantly fused to an affinity polypeptide, and (3) a guide RNA (gRNA). In this aspect, the dCAS9 portion of the dCAS9-multimerized epitope fusion protein is involved with targeting a target nucleic acid as directed by the guide RNA. The multimerized epitope portion of the dCAS9-multimerized epitope fusion protein is involved with binding to the affinity polypeptide (which is recombinantly fused to a histone methyltransferase domain). The affinity polypeptide portion of the histone methyltransferase domain-affinity polypeptide fusion protein is involved with binding to the multimerized epitope so that the histone methyltransferase domain can be in association with dCAS9. The histone methyltransferase domain portion of the histone methyltransferase domain-affinity polypeptide fusion protein is involved with methylating histones at or near a target nucleic acid, once the complex has been targeted to a target nucleic acid via the guide RNA, which, without wishing to be bound by theory, is believed to promote crossover recombination.
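The three-component system described above can be summarized as a small data model. The sketch below is purely illustrative: the class and function names are hypothetical, and it assumes that each copy of the epitope can recruit at most one affinity polypeptide, which is a simplification not stated in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DCas9EpitopeFusion:
    """Component (1): dCAS9 fused to a multimerized epitope."""
    epitope_name: str    # e.g. "GCN4"
    epitope_copies: int  # number of epitope repeats in the fusion


@dataclass(frozen=True)
class EffectorFusion:
    """Component (2): histone methyltransferase domain fused to an affinity polypeptide."""
    affinity_polypeptide: str      # e.g. "scFv"
    binds_epitope: str             # epitope recognized by the affinity polypeptide
    methyltransferase_domain: str  # e.g. "SDG2" or "PRDM9"


@dataclass(frozen=True)
class GuideRNA:
    """Component (3): guide RNA directing dCAS9 to the target."""
    target_locus: str


def recruitment_summary(scaffold: DCas9EpitopeFusion, effector: EffectorFusion,
                        guide: GuideRNA) -> tuple[str, str, int]:
    """Return (target locus, effector domain, maximum effector copies recruited).

    Assumption: one affinity polypeptide binds per epitope copy, and no
    recruitment occurs if the affinity polypeptide recognizes a different epitope.
    """
    copies = scaffold.epitope_copies if effector.binds_epitope == scaffold.epitope_name else 0
    return guide.target_locus, effector.methyltransferase_domain, copies


if __name__ == "__main__":
    scaffold = DCas9EpitopeFusion("GCN4", 10)
    effector = EffectorFusion("scFv", "GCN4", "SDG2")
    print(recruitment_summary(scaffold, effector, GuideRNA("centromeric repeat")))
```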
[0034] The use of the terms “a,” “an,” and “the,” and similar referents in the context of describing the disclosure (especially in the context of the following claims) is to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. Conversely, the use of the term “plurality” in the context of describing the disclosure is to be construed to indicate the plural, i.e., more than one, unless otherwise indicated herein or clearly contradicted by context. The terms “comprising,” “having,” “including,” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to,”) unless otherwise noted. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. For example, if the range 10-15 is disclosed, then 11, 12, 13, and 14 are also disclosed. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate the embodiments of the disclosure and does not pose a limitation on the scope of the disclosure unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the embodiments of the disclosure.
[0035] Reference to “about” a value or parameter herein refers to the usual error range for the respective value readily known to the skilled person in this technical field. Reference to “about” a value or parameter herein includes (and describes) aspects that are directed to that value or parameter per se. For example, description referring to “about X” includes description of “X.”
[0036] It is understood that aspects and embodiments of the present disclosure described herein include “comprising,” “consisting,” and “consisting essentially of” aspects and embodiments.
[0037] It is to be understood that one, some, or all of the properties of the various embodiments described herein may be combined to form other embodiments of the present disclosure. These and other aspects of the present disclosure will become apparent to one of skill in the art. These and other embodiments of the present disclosure are further described by the detailed description that follows.
[0038] The terms “isolated” and “purified” as used herein refer to a material that is removed from at least one component with which it is naturally associated (e.g., removed from its original environment). The term “isolated,” when used in reference to an isolated protein, refers to a protein that has been removed from the culture medium of the host cell that expressed the protein. As such, an isolated protein is free of extraneous or unwanted compounds (e.g., nucleic acids, native bacterial or other proteins, etc.).
Recombinant Polypeptides
[0039] The present disclosure relates to the use of recombinant polypeptides to trigger recombination at or near a target nucleic acid (e.g., recombinant polypeptides comprising a histone methyltransferase domain, such as, for example, recombinant SDG2 polypeptides and/or recombinant PRDM9 polypeptides). In certain aspects, the targeting involves the use of a nuclease-deficient CAS9 polypeptide that is recombinantly fused to a multimerized epitope. In certain aspects, the targeting involves the use of a histone methyltransferase domain polypeptide that is recombinantly fused to a zinc finger domain polypeptide.
[0040] As used herein, a “polypeptide” is an amino acid sequence including a plurality of consecutive polymerized amino acid residues (e.g., at least about 15 consecutive polymerized amino acid residues). “Polypeptide” refers to an amino acid sequence, oligopeptide, peptide, protein, or portions thereof, and the terms “polypeptide” and “protein” are used interchangeably.
[0041] Polypeptides as described herein also include polypeptides having various amino acid additions, deletions, or substitutions relative to the native amino acid sequence of a polypeptide of the present disclosure. In some embodiments, polypeptides that are homologs of a polypeptide of the present disclosure contain non-conservative changes of certain amino acids relative to the native sequence of a polypeptide of the present disclosure. In some embodiments, polypeptides that are homologs of a polypeptide of the present disclosure contain conservative changes of certain amino acids relative to the native sequence of a polypeptide of the present disclosure, and thus may be referred to as conservatively modified variants. A conservatively modified variant may include individual substitutions, deletions or additions to a polypeptide sequence which result in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well-known in the art. Such conservatively modified variants are in addition to and do not exclude polymorphic variants, interspecies homologs, and alleles of the disclosure. The following eight groups contain amino acids that are conservative substitutions for one another: 1) Alanine (A), Glycine (G); 2) Aspartic acid (D), Glutamic acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W); 7) Serine (S), Threonine (T); and 8) Cysteine (C), Methionine (M) (see, e.g., Creighton, Proteins (1984)). A modification of an amino acid to produce a chemically similar amino acid may be referred to as an analogous amino acid.
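The eight conservative substitution groups listed above can be encoded directly as a lookup. The following sketch checks whether a single-residue substitution falls within one of those groups; the function name is hypothetical, and the choice to treat an unchanged residue as "not a substitution" is an assumption made for the example.

```python
# The eight groups of mutually conservative amino acids listed in the text,
# in one-letter code. Methionine (M) appears in groups 5 and 8.
CONSERVATIVE_GROUPS = [
    set("AG"),    # 1) Alanine, Glycine
    set("DE"),    # 2) Aspartic acid, Glutamic acid
    set("NQ"),    # 3) Asparagine, Glutamine
    set("RK"),    # 4) Arginine, Lysine
    set("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
    set("ST"),    # 7) Serine, Threonine
    set("CM"),    # 8) Cysteine, Methionine
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if replacing `original` with `replacement` stays within one group."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return False  # assumption: an unchanged residue is not a substitution
    return any(original in group and replacement in group
               for group in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    print(is_conservative("I", "V"))  # both in group 5
    print(is_conservative("A", "D"))  # different groups
```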
[0042] Recombinant polypeptides of the present disclosure that are composed of individual polypeptide domains may be described based on the individual polypeptide domains of the overall recombinant polypeptide. A domain in such a recombinant polypeptide refers to a particular stretch of contiguous amino acids with a particular function or activity, such as a catalytic activity. For example, in a recombinant polypeptide that is a fusion of a histone methyltransferase domain polypeptide and an affinity polypeptide, the contiguous amino acids that constitute the histone methyltransferase domain polypeptide may be described as the histone methyltransferase domain in the overall recombinant polypeptide, and the contiguous amino acids that constitute the affinity polypeptide may be described as the affinity domain in the overall recombinant polypeptide. Individual domains in an overall recombinant protein may also be referred to as units of the recombinant protein. Recombinant polypeptides that are composed of individual polypeptide domains may also be referred to as fusion polypeptides.
[0043] Certain aspects of the present disclosure relate to a nuclease-deficient CAS9 (dCAS9) polypeptide that is recombinantly fused to a multimerized epitope (e.g., a dCAS9-multimerized epitope fusion protein). The dCAS9 polypeptide domain of a dCAS9-multimerized epitope fusion protein may be in an N-terminal orientation or a C-terminal orientation relative to the multimerized epitope domain. The multimerized epitope domain of a dCAS9-multimerized epitope fusion protein may be in an N-terminal orientation or a C-terminal orientation relative to the dCAS9 polypeptide domain. In some embodiments, a dCAS9-multimerized epitope fusion protein may be a direct fusion of a dCAS9 polypeptide domain and a multimerized epitope domain. In some embodiments, a dCAS9-multimerized epitope fusion protein may be an indirect fusion of a dCAS9 polypeptide domain and a multimerized epitope domain. In embodiments where the fusion is indirect, a linker domain or other contiguous amino acid sequence may separate the dCAS9 polypeptide domain and the multimerized epitope domain.
[0044] Certain aspects of the present disclosure relate to a histone methyltransferase domain polypeptide (e.g., SDG2, PRDM9) that is recombinantly fused to an affinity polypeptide (e.g., a histone methyltransferase domain-affinity polypeptide fusion protein). The histone methyltransferase domain polypeptide domain of a histone methyltransferase domain-affinity polypeptide fusion protein may be in an N-terminal orientation or a C-terminal orientation relative to the affinity polypeptide. The affinity polypeptide domain of a histone methyltransferase domain-affinity polypeptide fusion protein may be in an N-terminal orientation or a C-terminal orientation relative to the histone methyltransferase domain polypeptide domain. In some embodiments, a histone methyltransferase domain-affinity polypeptide fusion protein may be a direct fusion of a histone methyltransferase domain polypeptide domain and an affinity polypeptide domain. In some embodiments, a histone methyltransferase domain-affinity polypeptide fusion protein may be an indirect fusion of a histone methyltransferase domain polypeptide domain and an affinity polypeptide domain. In embodiments where the fusion is indirect, a linker domain or other contiguous amino acid sequence may separate the histone methyltransferase domain polypeptide domain and the affinity polypeptide domain.
Histone Methyltransferase Domains
[0045] Certain aspects of the present disclosure involve targeting a histone methyltransferase domain to a target nucleic acid such that the histone methyltransferase domain methylates histones at the target nucleic acid and leads to an increase in the recombination rate at or near the target nucleic acid during meiosis. In some embodiments, a histone methyltransferase domain is present in a recombinant polypeptide that contains a histone methyltransferase domain polypeptide and an affinity polypeptide. Various different histone methyltransferase domains may be used in the methods of the present disclosure.
[0046] Histone methyltransferases are polypeptides that facilitate the methylation of histones. Specifically, histone methyltransferase domains catalyze the addition of one or multiple methyl groups to lysine (K) or arginine (R) residues on histone polypeptides. Some histone methyltransferase polypeptides preferentially methylate H3 histones, while some histone methyltransferase polypeptides preferentially methylate H4 histones. Histone methyltransferase polypeptides of the present disclosure may deposit methyl groups in various different types of patterns, such as, for example, as H3K4me3, which refers to trimethylation of lysine 4 on the histone H3 protein. Histone methyltransferase polypeptides may also deposit methyl groups to produce other types of methylation patterns, including, for example, mono-methylation, di-methylation, or tri-methylation of lysine 4 on the histone H3 protein, lysine 9 on the histone H3 protein, lysine 27 on the histone H3 protein, or lysine 36 on the histone H3 protein (see, e.g., Fang et al., Reprogramming of Histone H3 Lysine Methylation During Plant Sexual Reproduction, 12 Frontiers in Plant Science 782450 (2021)). The deposition of methyl groups on histone polypeptides can be measured by various techniques, including, for example, Western blotting, mass spectrometry, or chromatin immunoprecipitation sequencing.
[0047] A histone methyltransferase of the present disclosure may methylate histones at the target locus (e.g., on a sequence directly targeted by a recombinant polypeptide and / or guide RNA provided herein) or near the target locus (e.g., within about 10 kb, 1 kb, 500 bp, 400 bp, 300 bp, 200 bp, or 100 bp on either side of the target nucleic acid, or within about 0-1, 1-2, 2-3, 3-4, 4-5, 5-6, 6-7, 7-8, 8-9, 9-10, 10-15, 15-20, 20-25, 25-30, 30-35, 35-40, 40-45, or 45-50 Mb on either side of the target nucleic acid, or elsewhere between the first and second genomic loci between which recombination is measured).
[0048] Histone methyltransferase domains may interact with DNA-binding proteins and methylate histones on various types of genomic regions, such as, for example, enhancers, promoters, other regulatory elements, and / or coding regions of a nucleic acid, which can then alter expression of the nucleic acid and / or the likelihood of a recombination crossover event occurring at or near the nucleic acid. Histone methyltransferase domains and / or methylated histones may interact with proteins that are components of transcriptional machinery or other proteins that are involved in regulation of transcription in a manner that promotes expression of the nucleic acid. Histone methyltransferase domains and / or methylated histones may also promote local recombination. Histone methyltransferase domains of the present disclosure may be, for example, from a histone lysine methyltransferase (KMT) and / or from a protein arginine methyltransferase (PRMT). Examples of such histone methyltransferases are known in the art, including, for example, SET domain containing histone methyltransferases and non-SET domain containing histone methyltransferases.
[0049] Histone methyltransferase domains of the present disclosure may be endogenous to the host plant, or they may be exogenous / heterologous to the host plant. In some embodiments, the histone methyltransferase domain is a plant histone methyltransferase domain. In some embodiments, the histone methyltransferase domain is derived from Arabidopsis. For example, one or more copies of a SET DOMAIN GROUP2 (SDG2) domain may be used herein. In some embodiments, at least two, at least three, or at least four or more copies of a SDG2 domain may be used as a histone methyltransferase domain. A polypeptide containing at least one copy of a domain from SDG2 or from a homolog or ortholog thereof is known as a SDG2 domain. In some embodiments, the histone methyltransferase domain is an SDG2 polypeptide. A SDG2 polypeptide of the present disclosure may contain an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% amino acid identity to the amino acid sequence of Arabidopsis SDG2 provided herein or to a sequence comprised therein, such as, for example, to amino acids 1571-2335 of Arabidopsis SDG2.
In some embodiments, a SDG2 polypeptide comprises an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% sequence identity to a sequence selected from the group consisting of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 93, and SEQ ID NO: 94.
[0050] In some embodiments, the histone methyltransferase domain is not a plant histone methyltransferase domain. In some embodiments, the histone methyltransferase domain is derived from an animal, such as a mammal. For example, one or more copies of a PR DOMAIN CONTAINING 9 (PRDM9) domain may be used herein. In some embodiments, at least two, at least three, or at least four or more copies of a PRDM9 domain may be used as a histone methyltransferase domain. A polypeptide containing at least one copy of a domain from PRDM9 or from a homolog or ortholog thereof is known as a PRDM9 domain. In some embodiments, the histone methyltransferase domain is a PRDM9 polypeptide. A PRDM9 polypeptide of the present disclosure may contain an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% amino acid identity to the amino acid sequence of mouse PRDM9 provided herein or to a sequence comprised therein, such as, for example, to amino acids 110-417 of mouse PRDM9.
In some embodiments, a PRDM9 polypeptide comprises an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% sequence identity to a sequence selected from the group consisting of SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41, SEQ ID NO: 42, and SEQ ID NO: 43.
[0051] Other exemplary histone methyltransferases can be, for example, SET domain containing histone methyltransferases and non-SET domain containing histone methyltransferases (see, e.g., Fang et al., Reprogramming of Histone H3 Lysine Methylation During Plant Sexual Reproduction, 12 Frontiers in Plant Science 782450 (2021)). SET domain containing histone methyltransferases include, for example, Class I histone methyltransferases (Enhancer of Zeste (E(Z)) homologs, e.g., the protein methyltransferase enhancer of zeste homolog 2 (EZH2), which methylates lysine 27 of the histone H3 protein), Class II histone methyltransferases (Absent, Small, or Homeotic discs 1 (ASH1) groups), Class III histone methyltransferases (Trithorax (Trx) groups), Class IV histone methyltransferases (Arabidopsis Trithorax-related 5 (ATXR5) and Arabidopsis Trithorax-related 6 (ATXR6)), Class V histone methyltransferases (SU(VAR)3-9 subgroups, e.g., SUV39H1 or SUV39H2, which methylate lysine 9 of the histone H3 protein), Class VI histone methyltransferases (SMYD subfamily), and Class VII histone methyltransferases (SETD subfamily, e.g., SETD bifurcated 1 or 2 (SETDB1, SETDB2), which methylate lysine 9 of the histone H3 protein). Non-SET domain containing histone methyltransferases include, for example, the Dot1 protein, which methylates lysine 79 in the histone H3 protein. Other regulators of histone methylation include, for example, histone demethylases, such as lysine-specific demethylase 1 (LSD1), IBM1, or Jumonji C (JmjC) domain-containing proteins (JMJs).
[0052] Additional histone methyltransferase domains that may be used in the methods and compositions described herein will be readily apparent to those of skill in the art.
SDG2 Polypeptides
[0053] In some embodiments, a histone methyltransferase domain of the present disclosure is a recombinant SDG2 polypeptide. Certain aspects of the present disclosure therefore relate to recombinant SDG2 polypeptides. SDG2 proteins are known in the art and are described herein. In Arabidopsis thaliana, locus AT4G15180 codes for SDG2. SDG2 is a histone methyltransferase, and functions to catalyze methylation of histone 3 (H3) at position lysine 4 (K4). Accordingly, SDG2 is an H3K4 histone methyltransferase. SDG2 proteins generally catalyze tri-methylation (me3) of H3K4, producing H3K4me3. However, without wishing to be bound by theory, SDG2 may also catalyze some quantity of mono-methylation (me1) or di-methylation (me2) of H3K4.
[0054] Recombinant SDG2 polypeptides of the present disclosure may contain an SDG2 polypeptide domain and a domain involved in facilitating the targeting of the recombinant SDG2 polypeptide to a target nucleic acid. In some embodiments, recombinant SDG2 polypeptides include an SDG2 polypeptide domain and a heterologous DNA-binding domain. In some embodiments, recombinant SDG2 polypeptides include an SDG2 polypeptide domain and a dCAS9 polypeptide domain. In some embodiments, recombinant SDG2 polypeptides include an SDG2 polypeptide domain and an scFv antibody polypeptide domain.
[0055] Various SDG2 polypeptides may be used in the methods and compositions of the present disclosure, including full-length SDG2 proteins and fragments thereof. In some embodiments, an SDG2 polypeptide contains at least 20 consecutive amino acids, at least 30 consecutive amino acids, at least 40 consecutive amino acids, at least 50 consecutive amino acids, at least 60 consecutive amino acids, at least 70 consecutive amino acids, at least 80 consecutive amino acids, at least 90 consecutive amino acids, at least 100 consecutive amino acids, at least 120 consecutive amino acids, at least 140 consecutive amino acids, at least 160 consecutive amino acids, at least 180 consecutive amino acids, at least 200 consecutive amino acids, at least 220 consecutive amino acids, at least 240 consecutive amino acids, at least 260 consecutive amino acids, at least 280 consecutive amino acids, at least 300 consecutive amino acids, at least 350 consecutive amino acids, at least 400 consecutive amino acids, at least 450 consecutive amino acids, at least 500 consecutive amino acids, at least 550 consecutive amino acids, at least 600 consecutive amino acids, at least 650 consecutive amino acids, or at least 750 consecutive amino acids or more of a full-length SDG2 protein. In some embodiments, an SDG2 polypeptide comprises an amino acid sequence within the full-length SDG2 protein, such as, for example, amino acids 1571 to 2335 therein. In some embodiments, an SDG2 polypeptide may include sequences with one or more amino acids removed from the consecutive amino acid sequence of a full-length SDG2 protein. In some embodiments, an SDG2 polypeptide may include sequences with one or more amino acids replaced / substituted with an amino acid different from the endogenous amino acid present at a given amino acid position in a consecutive amino acid sequence of a full-length SDG2 protein.
In some embodiments, an SDG2 polypeptide may include sequences with one or more amino acids added to an otherwise consecutive amino acid sequence of a full-length SDG2 protein.
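As an illustration of how a fragment defined by residue coordinates (such as amino acids 1571 to 2335) relates to its length, the following minimal sketch extracts a fragment using 1-based, inclusive coordinates. The function name fragment and the placeholder sequences are hypothetical; the sketch assumes, without confirmation from the source, that the stated residue ranges are 1-based and inclusive, and it does not reproduce any sequence from the sequence listing.

```python
def fragment(sequence: str, start: int, end: int) -> str:
    """Return residues start..end of sequence (1-based, inclusive).

    Assumption for this sketch: coordinates such as "amino acids 1571 to
    2335" count the first residue as position 1 and include both endpoints.
    """
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("coordinates fall outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return sequence[start - 1:end]


if __name__ == "__main__":
    # Placeholder of arbitrary residues, long enough to cover the coordinates;
    # NOT an actual SDG2 sequence.
    placeholder = "X" * 2335
    print(len(fragment(placeholder, 1571, 2335)))  # 765 residues
```

Under this reading, a fragment spanning positions 1571 to 2335 contains 765 residues, which falls within the "at least 750 consecutive amino acids or more" embodiment recited above.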
[0056] Suitable SDG2 proteins may be identified and isolated from monocot and dicot plants. Examples of suitable SDG2 proteins may include, for example, those listed in Table 1, homologs thereof, and orthologs thereof.
Table 1: SDG2 Proteins
[0057] In some embodiments, an SDG2 polypeptide of the present disclosure has an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% amino acid identity to an amino acid sequence selected from the group consisting of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 93, and SEQ ID NO: 94. In some embodiments, an SDG2 polypeptide of the present disclosure has an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% amino acid identity to the amino acid sequence of the A. thaliana SDG2 protein provided herein, e.g., an amino acid sequence selected from the group consisting of SEQ ID NO: 1, SEQ ID NO: 16, SEQ ID NO: 93, and SEQ ID NO: 94.
[0058] An SDG2 polypeptide may include the amino acid sequence or a fragment thereof of any SDG2 homolog or ortholog, such as any one of those listed in Table 1. One of skill would readily recognize that additional SDG2 protein homologs and / or orthologs may exist and may be used herein.
SDG2 Catalytic Domain (SDG2C) Polypeptides
[0059] As described above, in some embodiments, the SDG2 polypeptide is a fragment of a full-length SDG2 protein. In some embodiments, the fragment includes the catalytic (H3K4 histone methyltransferase) domain of SDG2 (SDG2C). Accordingly, in some embodiments, the SDG2 polypeptide is a fragment of a full-length SDG2 protein that includes the SDG2 catalytic domain (an SDG2C polypeptide).
[0060] Examples of suitable SDG2C polypeptides may include, for example, those listed in Table 2, homologs thereof, and orthologs thereof.
Table 2: SDG2C Polypeptides
[0061] In some embodiments, an SDG2C polypeptide of the present disclosure has an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% amino acid identity to the amino acid sequence of Arabidopsis thaliana SDG2C. In some embodiments, a SDG2C polypeptide of the present disclosure is catalytically inactive, such as, for example, by comprising an amino acid sequence with a Y1903F mutation. In some embodiments, the SDG2C polypeptide is catalytically active.
[0062] An SDG2C polypeptide may include the amino acid sequence or a fragment thereof of any SDG2C polypeptide homolog or ortholog, such as any one of those listed in Table 2. One of skill would readily recognize that additional SDG2C polypeptide homologs and / or orthologs may exist and may be used herein.
PRDM9 Polypeptides
[0063] In some embodiments, a histone methyltransferase domain of the present disclosure is a recombinant PRDM9 polypeptide. Certain aspects of the present disclosure therefore relate to recombinant PRDM9 polypeptides. PRDM9 proteins are known in the art and are described herein. PRDM9 is a histone methyltransferase, and functions to catalyze methylation of histone 3 (H3) at position lysine 4 (K4). Accordingly, PRDM9 is an H3K4 histone methyltransferase. PRDM9 proteins may catalyze mono-methylation (me1) of H3K4, producing H3K4me1; di-methylation (me2) of H3K4, producing H3K4me2; or tri-methylation (me3) of H3K4, producing H3K4me3 (see, e.g., Wu et al., Molecular Basis for the Regulation of the H3K4 Methyltransferase Activity of PRDM9, 5 Cell Reports 13 (2013)).
[0064] Recombinant PRDM9 polypeptides of the present disclosure may contain a PRDM9 polypeptide domain and a domain involved in facilitating the targeting of the recombinant PRDM9 polypeptide to a target nucleic acid. In some embodiments, recombinant PRDM9 polypeptides include a PRDM9 polypeptide domain and a heterologous DNA-binding domain. In some embodiments, recombinant PRDM9 polypeptides include a PRDM9 polypeptide domain and a dCAS9 polypeptide domain. In some embodiments, recombinant PRDM9 polypeptides include a PRDM9 polypeptide domain and an scFv antibody polypeptide domain.
[0065] Various PRDM9 polypeptides may be used in the methods and compositions of the present disclosure, including full-length PRDM9 proteins and fragments thereof. In some embodiments, a PRDM9 polypeptide contains at least 20 consecutive amino acids, at least 30 consecutive amino acids, at least 40 consecutive amino acids, at least 50 consecutive amino acids, at least 60 consecutive amino acids, at least 70 consecutive amino acids, at least 80 consecutive amino acids, at least 90 consecutive amino acids, at least 100 consecutive amino acids, at least 120 consecutive amino acids, at least 140 consecutive amino acids, at least 160 consecutive amino acids, at least 180 consecutive amino acids, at least 200 consecutive amino acids, at least 220 consecutive amino acids, at least 240 consecutive amino acids, at least 260 consecutive amino acids, at least 280 consecutive amino acids, at least 300 consecutive amino acids, at least 350 consecutive amino acids, at least 400 consecutive amino acids, at least 450 consecutive amino acids, at least 500 consecutive amino acids, at least 550 consecutive amino acids, at least 600 consecutive amino acids, at least 650 consecutive amino acids, or at least 750 consecutive amino acids or more of a full-length PRDM9 protein. In some embodiments, a PRDM9 polypeptide comprises an amino acid sequence within the full-length PRDM9 protein, such as, for example, amino acids 110 to 417 therein. In some embodiments, a PRDM9 polypeptide may include sequences with one or more amino acids removed from the consecutive amino acid sequence of a full-length PRDM9 protein. In some embodiments, a PRDM9 polypeptide may include sequences with one or more amino acids replaced / substituted with an amino acid different from the endogenous amino acid present at a given amino acid position in a consecutive amino acid sequence of a full-length PRDM9 protein.
In some embodiments, a PRDM9 polypeptide may include sequences with one or more amino acids added to an otherwise consecutive amino acid sequence of a full-length PRDM9 protein.
[0066] Suitable PRDM9 proteins may be identified and isolated from mice or other mammalian species. Examples of suitable PRDM9 proteins may include, for example, those listed in Tables 3A-3B, homologs thereof, and orthologs thereof.
Table 3A: PRDM9 Proteins
Table 3B: Additional PRDM9 amino acid sequences
[0067] In some embodiments, a PRDM9 polypeptide of the present disclosure has an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% amino acid identity to an amino acid sequence selected from the group consisting of SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41, SEQ ID NO: 42, and SEQ ID NO: 43. In some embodiments, a PRDM9 polypeptide of the present disclosure has an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% amino acid identity to the amino acid sequence of the murine PRDM9 protein provided herein, e.g., an amino acid sequence selected from the group consisting of SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 38, and SEQ ID NO: 43. In some embodiments, a PRDM9 polypeptide of the present disclosure is catalytically inactive, such as, for example, by comprising an amino acid sequence with a G282A mutation. In some embodiments, the PRDM9 polypeptide is catalytically active.
[0068] A PRDM9 polypeptide may include the amino acid sequence or a fragment thereof of any PRDM9 homolog or ortholog, such as any one of those listed in Tables 3A-3B. One of skill would readily recognize that additional PRDM9 protein homologs and / or orthologs may exist and may be used herein.
PRDM9 Catalytic Domain (PRDM9C) Polypeptides
[0069] As described above, in some embodiments, the PRDM9 polypeptide is a fragment of a full-length PRDM9 protein. In some embodiments, the fragment includes the catalytic (H3K4 histone methyltransferase) domain of PRDM9 (PRDM9C). Accordingly, in some embodiments, the PRDM9 polypeptide is a fragment of a full-length PRDM9 protein that includes the PRDM9 catalytic domain (a PRDM9C polypeptide).
[0070] Examples of suitable PRDM9C polypeptides may include, for example, those listed in Table 4, homologs thereof, and orthologs thereof.
Table 4: PRDM9C Polypeptides
[0071] In some embodiments, a PRDM9C polypeptide of the present disclosure has an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% amino acid identity to the amino acid sequence of the murine PRDM9C protein provided herein, e.g., SEQ ID NO: 38 or SEQ ID NO: 43.
[0072] A PRDM9C polypeptide may include the amino acid sequence or a fragment thereof of any PRDM9C polypeptide homolog or ortholog, such as any one of those listed in Table 4. One of skill would readily recognize that additional PRDM9C polypeptide homologs and / or orthologs may exist and may be used herein.
Methods of Identifying Sequence Similarity
[0073] Various methods are known to those of skill in the art for identifying similar (e.g. homologs, orthologs, paralogs, etc.) polypeptide and / or polynucleotide sequences, including phylogenetic methods, sequence similarity analysis, and hybridization methods.
[0074] Phylogenetic trees may be created for a gene family by using a program such as CLUSTAL (Thompson et al. Nucleic Acids Res. 22: 4673-4680 (1994); Higgins et al. Methods Enzymol 266: 383-402 (1996)) or MEGA (Tamura et al. Mol. Biol. & Evo. 24:1596-1599 (2007)). Once an initial tree for genes from one species is created, potential orthologous sequences can be placed in the phylogenetic tree and their relationships to genes from the species of interest can be determined. Evolutionary relationships may also be inferred using the Neighbor-Joining method (Saitou and Nei, Mol. Biol. & Evo. 4:406-425 (1987)). Homologous sequences may also be identified by a reciprocal BLAST strategy. Evolutionary distances may be computed using the Poisson correction method (Zuckerkandl and Pauling, pp. 97-166 in Evolving Genes and Proteins, edited by V. Bryson and H.J. Vogel. Academic Press, New York (1965)).
[0075] In addition, evolutionary information may be used to predict gene function. Functional predictions of genes can be greatly improved by focusing on how genes became similar in sequence (i.e. by evolutionary processes) rather than on the sequence similarity itself (Eisen, Genome Res. 8: 163-167 (1998)). Many specific examples exist in which gene function has been shown to correlate well with gene phylogeny (Eisen, Genome Res. 8: 163-167 (1998)). One skilled in the art would recognize that phylogenetic analysis allows similar functions to be predicted for closely related polypeptides.
[0076] When a group of related sequences is analyzed using a phylogenetic program such as CLUSTAL, closely related sequences typically cluster together or in the same clade (a group of similar genes). Groups of similar genes can also be identified with pairwise BLAST analysis (Feng and Doolittle, J. Mol. Evol. 25: 351-360 (1987)). Analysis of groups of similar genes with similar function that fall within one clade can yield sub-sequences that are particular to the clade. These sub-sequences, known as consensus sequences, can be used not only to define the sequences within each clade, but also to define the functions of these genes; genes within a clade may contain paralogous sequences, or orthologous sequences that share the same function (see also, for example, Mount, Bioinformatics: Sequence and Genome Analysis, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., page 543 (2001)).
[0077] To find sequences that are homologous to a reference sequence, BLAST nucleotide searches can be performed with the BLASTN program, score=100, wordlength=12, to obtain nucleotide sequences homologous to a nucleotide sequence encoding a protein of the disclosure. BLAST protein searches can be performed with the BLASTX program, score=50, wordlength=3, to obtain amino acid sequences homologous to a protein or polypeptide of the disclosure. To obtain gapped alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be utilized as described in Altschul et al. (1997) Nucleic Acids Res. 25:3389. Alternatively, PSI-BLAST (in BLAST 2.0) can be used to perform an iterated search that detects distant relationships between molecules. See Altschul et al. (1997) supra. When utilizing BLAST, Gapped BLAST, or PSI-BLAST, the default parameters of the respective programs (e.g., BLASTN for nucleotide sequences, BLASTX for proteins) can be used.
[0078] Methods for the alignment of sequences and for the analysis of similarity and identity of polypeptide and polynucleotide sequences are well-known in the art.
[0079] As used herein “sequence identity” refers to the percentage of residues that are identical in the same positions in the sequences being analyzed. As used herein “sequence similarity” refers to the percentage of residues that have similar biophysical / biochemical characteristics in the same positions (e.g. charge, size, hydrophobicity) in the sequences being analyzed.
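As a minimal sketch of the two definitions above, the following Python example computes percent identity and percent similarity over the ungapped columns of a pre-computed pairwise alignment. The residue groupings in SIMILARITY_GROUPS are a hypothetical, simplified classification chosen for illustration only; alignment programs typically score similarity with substitution matrices instead, and the choice of denominator (ungapped columns) is likewise an assumption of this example.

```python
# Hypothetical biophysical groupings used only for this illustration.
SIMILARITY_GROUPS = ("AVLIMC", "FWY", "KRH", "DE", "STNQ", "G", "P")


def _group_of(residue: str):
    """Return the index of the group containing the residue, or None if unknown."""
    for index, members in enumerate(SIMILARITY_GROUPS):
        if residue in members:
            return index
    return None


def identity_and_similarity(seq_a: str, seq_b: str):
    """Return (percent identity, percent similarity) for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns containing a gap character are excluded from the count.
    columns = [(a.upper(), b.upper()) for a, b in zip(seq_a, seq_b)
               if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    identical = sum(a == b for a, b in columns)
    # Identical residues also count as similar.
    similar = sum(
        a == b or (_group_of(a) is not None and _group_of(a) == _group_of(b))
        for a, b in columns
    )
    total = len(columns)
    return 100.0 * identical / total, 100.0 * similar / total
```

For the toy alignment "MKVLDE" versus "MRILDQ", three of six positions are identical and two further positions (K/R and V/I) fall in the same illustrative group.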
[0080] Methods of alignment of sequences for comparison are well-known in the art, including manual alignment and computer assisted sequence alignment and analysis. This latter approach is a preferred approach in the present disclosure, due to the increased throughput afforded by computer assisted methods. As noted below, a variety of computer programs for performing sequence alignment are available, or can be produced by one of skill.
[0081] The determination of percent sequence identity and / or similarity between any two sequences can be accomplished using a mathematical algorithm. Examples of such mathematical algorithms are the algorithm of Myers and Miller, CABIOS 4:11-17 (1988); the local homology algorithm of Smith et al., Adv. Appl. Math. 2:482 (1981); the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443-453 (1970); the search-for-similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. 85:2444-2448 (1988); and the algorithm of Karlin and Altschul, Proc. Natl. Acad. Sci. USA 87:2264-2268 (1990), modified as in Karlin and Altschul, Proc. Natl. Acad. Sci. USA 90:5873-5877 (1993).
[0082] Computer implementations of these mathematical algorithms can be utilized for comparison of sequences to determine sequence identity and / or similarity. Such implementations include, for example: CLUSTAL in the PC / Gene program (available from Intelligenetics, Mountain View, Calif.); the AlignX program, version 10.3.0 (Invitrogen, Carlsbad, CA) and GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Version 8 (available from Genetics Computer Group (GCG), 575 Science Drive, Madison, Wis., USA). Alignments using these programs can be performed using the default parameters. The CLUSTAL program is well described by Higgins et al. Gene 73:237-244 (1988); Higgins et al. CABIOS 5:151-153 (1989); Corpet et al., Nucleic Acids Res. 16:10881-90 (1988); Huang et al. CABIOS 8:155-65 (1992); and Pearson et al., Meth. Mol. Biol. 24:307-331 (1994). The BLAST programs of Altschul et al. J. Mol. Biol. 215:403-410 (1990) are based on the algorithm of Karlin and Altschul (1990) supra.
[0083] Polynucleotides homologous to a reference sequence can be identified by hybridization to each other under stringent or under highly stringent conditions. Single-stranded polynucleotides hybridize when they associate based on a variety of well-characterized physical-chemical forces, such as hydrogen bonding, solvent exclusion, base stacking and the like. The stringency of a hybridization reflects the degree of sequence identity of the nucleic acids involved, such that the higher the stringency, the more similar are the two polynucleotide strands. Stringency is influenced by a variety of factors, including temperature, salt concentration and composition, organic and non-organic additives, solvents, etc. present in both the hybridization and wash solutions and incubations (and number thereof), as described in more detail in references cited below (e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. ("Sambrook") (1989); Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology, vol. 152 Academic Press, Inc., San Diego, Calif. ("Berger and Kimmel") (1987); and Anderson and Young, "Quantitative Filter Hybridisation." In: Hames and Higgins, ed., Nucleic Acid Hybridisation, A Practical Approach. Oxford, IRL Press, 73-111 (1985)).
[0084] Encompassed by the disclosure are polynucleotide sequences that are capable of hybridizing to the disclosed polynucleotide sequences and fragments thereof under various conditions of stringency (see, for example, Wahl and Berger, Methods Enzymol. 152: 399-407 (1987); and Kimmel, Methods Enzymol. 152: 507-511 (1987)). Full length cDNA, homologs, orthologs, and paralogs of polynucleotides of the present disclosure may be identified and isolated using well-known polynucleotide hybridization methods.
[0085] With regard to hybridization, conditions that are highly stringent, and means for achieving them, are well known in the art. See, for example, Sambrook et al. (1989) (supra); Berger and Kimmel (1987) pp. 467-469 (supra); and Anderson and Young (1985) (supra).
[0086] Hybridization experiments are generally conducted in a buffer of pH between 6.8 and 7.4, although the rate of hybridization is nearly independent of pH at ionic strengths likely to be used in the hybridization buffer (Anderson and Young (1985) (supra)). In addition, one or more of the following may be used to reduce non-specific hybridization: sonicated salmon sperm DNA or another non-complementary DNA, bovine serum albumin, sodium pyrophosphate, sodium dodecylsulfate (SDS), polyvinyl-pyrrolidone, ficoll and Denhardt's solution. Dextran sulfate and polyethylene glycol 6000 act to exclude DNA from solution, thus raising the effective probe DNA concentration and the hybridization signal within a given unit of time. In some instances, conditions of even greater stringency may be desirable or required to reduce non-specific and / or background hybridization. These conditions may be created with the use of higher temperature, lower ionic strength and higher concentration of a denaturing agent such as formamide.
[0087] Stringency conditions can be adjusted to screen for moderately similar fragments such as homologous sequences from distantly related organisms, or for highly similar fragments such as genes that duplicate functional enzymes from closely related organisms. The stringency can be adjusted either during the hybridization step or in the post-hybridization washes. Salt concentration, formamide concentration, hybridization temperature and probe lengths are variables that can be used to alter stringency. As a general guideline, high stringency is typically performed at Tm-5°C to Tm-20°C, moderate stringency at Tm-20°C to Tm-35°C and low stringency at Tm-35°C to Tm-50°C for duplex >150 base pairs. Hybridization may be performed at low to moderate stringency (25-50°C below Tm), followed by post-hybridization washes at increasing stringencies. Maximum rates of hybridization in solution are determined empirically to occur at Tm-25°C for DNA-DNA duplex and Tm-15°C for RNA-DNA duplex. Optionally, the degree of dissociation may be assessed after each wash step to determine the need for subsequent, higher stringency wash steps.
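The general guideline above can be expressed as a simple lookup on the difference between the melting temperature and the hybridization (or wash) temperature. The following Python sketch is illustrative only: the function name is hypothetical, the Tm is supplied by the user rather than calculated, and because the stated ranges share their endpoints (Tm-20°C and Tm-35°C), this example arbitrarily assigns each shared endpoint to the higher stringency class.

```python
def classify_stringency(tm_celsius: float, temperature_celsius: float) -> str:
    """Classify a temperature relative to Tm using the guideline for duplexes >150 bp."""
    delta = tm_celsius - temperature_celsius  # degrees below Tm
    if 5 <= delta <= 20:
        return "high"
    if 20 < delta <= 35:  # assumption: exactly Tm-20 is treated as high stringency
        return "moderate"
    if 35 < delta <= 50:  # assumption: exactly Tm-35 is treated as moderate stringency
        return "low"
    return "outside guideline range"
```

For a duplex with an assumed Tm of 85°C, a 70°C step falls in the high stringency range, a 60°C step in the moderate range, and a 45°C step in the low range.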
[0088] High stringency conditions may be used to select for nucleic acid sequences with high degrees of identity to the disclosed sequences. An example of stringent hybridization conditions obtained in a filter-based method such as a Southern or northern blot for hybridization of complementary nucleic acids that have more than 100 complementary residues is about 5°C to 20°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH.
[0089] Hybridization and wash conditions that may be used to bind and remove polynucleotides with less than the desired homology to the nucleic acid sequences or their complements of the present disclosure include, for example: 6X SSC and 1% SDS at 65°C; 50% formamide, 4X SSC at 42°C; 0.5X SSC to 2.0X SSC, 0.1% SDS at 50°C to 65°C; or 0.1X SSC to 2X SSC, 0.1% SDS at 50°C to 65°C; with a first wash step of, for example, 10 minutes at about 42°C with about 20% (v / v) formamide in 0.1X SSC, and with, for example, a subsequent wash step with 0.2X SSC and 0.1% SDS at 65°C for 10, 20 or 30 minutes.
[0090] For identification of less closely related homologs, wash steps may be performed at a lower temperature, e.g., 50°C. An example of a low stringency wash step employs a solution and conditions of at least 25°C in 30 mM NaCl, 3 mM trisodium citrate, and 0.1% SDS over 30 min. Greater stringency may be obtained at 42°C in 15 mM NaCl, with 1.5 mM trisodium citrate, and 0.1% SDS over 30 min. Wash procedures will generally employ at least two final wash steps. Additional variations on these conditions will be readily apparent to those skilled in the art (see, for example, US Patent Application No. 20010010913).
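For reference, the salt concentrations recited above can be related to the "X SSC" notation used elsewhere in this section. The Python sketch below assumes the conventional composition of 1X SSC (150 mM NaCl and 15 mM trisodium citrate); under that assumption, 30 mM NaCl with 3 mM trisodium citrate corresponds to 0.2X SSC, and 15 mM NaCl with 1.5 mM trisodium citrate corresponds to 0.1X SSC. The function names are hypothetical.

```python
# Assumed conventional composition of 1X SSC.
SSC_1X_NACL_MM = 150.0
SSC_1X_CITRATE_MM = 15.0


def ssc_fold(nacl_mm: float, citrate_mm: float) -> float:
    """Return the SSC fold-concentration matching the given salt concentrations."""
    fold_from_nacl = nacl_mm / SSC_1X_NACL_MM
    fold_from_citrate = citrate_mm / SSC_1X_CITRATE_MM
    # Both components must imply the same dilution for the buffer to be plain SSC.
    if abs(fold_from_nacl - fold_from_citrate) > 1e-9:
        raise ValueError("NaCl and citrate are not in the 10:1 ratio of SSC")
    return fold_from_nacl


def ssc_components(fold: float):
    """Return (NaCl mM, trisodium citrate mM) for a given SSC fold-concentration."""
    return fold * SSC_1X_NACL_MM, fold * SSC_1X_CITRATE_MM
```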
[0091] If desired, one may employ wash steps of even greater stringency, including conditions of 65°C-68°C in a solution of 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS, or about 0.2X SSC, 0.1% SDS at 65°C and washing twice, each wash step of 10, 20 or 30 min in duration, or about 0.1X SSC, 0.1% SDS at 65°C and washing twice for 10, 20 or 30 min. Hybridization stringency may be increased further by using the same conditions as in the hybridization steps, with the wash temperature raised about 3°C to about 5°C, and stringency may be increased even further by using the same conditions except the wash temperature is raised about 6°C to about 9°C.
Targeting to Specific Loci
[0092] Certain aspects of the present disclosure relate to recombinant polypeptides that contain a heterologous targeting domain and are capable of being targeted to a target nucleic acid. A targeting domain generally refers to a polypeptide or amino acid sequence that is able to facilitate or is involved in facilitating, either directly or indirectly, targeting of a recombinant polypeptide to a target nucleic acid sequence. For example, the targeting domain may directly confer the specific targeting functionality of a histone methyltransferase domain polypeptide of the present disclosure to a target nucleic acid, or the targeting domain may be associated with or interact with another agent that confers the specific targeting functionality of a histone methyltransferase domain polypeptide of the present disclosure to the target nucleic acid. In some embodiments, the targeting domain may associate with a DNA-binding polypeptide that is able to be targeted to a target nucleic acid. Suitable targeting domains for use in the present disclosure are described herein and will be readily apparent to one of skill in the art.
[0093] Certain aspects of the present disclosure relate to targeting histone methyltransferase domain polypeptides, such as, for example, SDG2 polypeptides, to specific loci. Targeted loci may also be referred to as target nucleic acids. Various methods for targeting polypeptides to a specific nucleic acid are known in the art and are described herein. In some embodiments, an RNA-guided DNA-binding protein or system is used to facilitate targeting of a histone methyltransferase domain polypeptide to a target nucleic acid (e.g. CRISPR-CAS9 targeting systems, such as a SunTag system). In some embodiments, a DNA-binding domain may be used to facilitate targeting of a histone methyltransferase domain polypeptide to a target nucleic acid.
[0094] In some embodiments, the targeting is perfect (i.e., comprises no off-target effects). In some embodiments, the targeting is imperfect (i.e., comprises off-target effects). In some embodiments, imperfect targeting is designed intentionally in order to increase methylation levels and / or recombination rates in multiple genomic regions simultaneously.
[0095] Certain aspects of the present disclosure involve CRISPR-based targeting of a target nucleic acid, which involves use of a CRISPR-CAS9 targeting system. CRISPR-CAS9 systems involve the use of a CRISPR RNA (crRNA), a trans-activating CRISPR RNA (tracrRNA), and a CAS9 protein. The crRNA and tracrRNA aid in directing the CAS9 protein to a target nucleic acid sequence, and these RNA molecules can be specifically engineered to target specific nucleic acid sequences. In particular, certain aspects of the present disclosure involve the use of a single guide RNA (gRNA) that reconstitutes the function of the crRNA and the tracrRNA. Further, certain aspects of the present disclosure involve a CAS9 protein that does not exhibit DNA cleavage activity (dCAS9). As disclosed herein, gRNA molecules may be used to direct a dCAS9 protein to a target nucleic acid sequence. In some embodiments, a gRNA or crRNA is engineered to target a sequence perfectly, i.e., without mismatches. In some embodiments, a gRNA or crRNA is engineered to target a sequence imperfectly, i.e., with mismatches.
[0096] Certain aspects of the present disclosure involve targeting of a target nucleic acid that includes one or more repeats, for example, a target nucleic acid at a repeat-rich locus. In some embodiments, a gRNA or crRNA is engineered to target multiple possible locations in a centromere repeat and / or in a repeat proximal to a centromere, whether perfectly or imperfectly. For example, in some embodiments, a gRNA or crRNA is engineered to perfectly target 2 or more, 3 or more, 5 or more, 10 or more, 20 or more, 50 or more, 100 or more, 150 or more, 200 or more, 250 or more, 300 or more, 350 or more, 400 or more, 450 or more, 500 or more, 750 or more, or more than 1000 individual repeats in a centromere and / or in a region proximal to a centromere. In some embodiments, a gRNA or crRNA is engineered to imperfectly target 2 or more, 3 or more, 5 or more, 10 or more, 20 or more, 50 or more, 100 or more, 150 or more, 200 or more, 250 or more, 300 or more, 350 or more, 400 or more, 450 or more, 500 or more, 750 or more, or more than 1000 different repeats in a centromere and / or in a region proximal to a centromere.
[0097] In some embodiments, a gRNA or crRNA that aligns perfectly to one target locus (e.g., within a first centromere or first centromere proximal region) is engineered to tolerate mismatches to a different locus (e.g., within a second centromere or second centromere proximal region). Without wishing to be bound by theory, it is believed that designing gRNAs or crRNAs with mismatch tolerance may allow, e.g., increasing of the recombination rate in multiple genomic loci simultaneously without having to design and transform multiple different targeting constructs.
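As a non-limiting illustration of how perfectly and imperfectly matched sites in a repeat-rich region might be enumerated for a candidate guide, the following Python sketch slides a window across a sequence and counts mismatches against the targeting sequence. It is a simplified example under stated assumptions: only the given strand is scanned, PAM requirements are not checked, every position is weighted equally, and any non-matching character (including an ambiguous base such as N) counts as a mismatch. The function names are hypothetical, and this is not presented as the design procedure of the present disclosure.

```python
def count_mismatches(window: str, spacer: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(window, spacer))


def find_sites(sequence: str, spacer: str, max_mismatches: int = 0):
    """Return (start, mismatches) for each window within the mismatch tolerance."""
    sequence = sequence.upper()
    spacer = spacer.upper()
    width = len(spacer)
    hits = []
    for start in range(len(sequence) - width + 1):
        mismatches = count_mismatches(sequence[start:start + width], spacer)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits
```

Calling find_sites with max_mismatches=0 lists only the perfectly matched repeats, while raising the tolerance adds diverged repeat copies, which gives a rough count of how many loci a single targeting construct might reach.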
[0098] In addition to the CRISPR-based targeting systems described herein, recombinant histone methyltransferase domain polypeptides of the present disclosure may be targeted to a target nucleic acid via a DNA-binding domain. Accordingly, certain aspects of the present disclosure relate to recombinant histone methyltransferase domain polypeptides that have DNA-binding activity. In some embodiments, this DNA-binding activity is achieved through a heterologous DNA-binding domain (e.g. binds with a sequence affinity other than that of any DNA-binding domain that may be present in the endogenous protein). In some embodiments, recombinant histone methyltransferase domain polypeptides of the present disclosure contain a DNA-binding domain. Recombinant histone methyltransferase domain polypeptides of the present disclosure may contain one DNA binding domain or they may contain more than one DNA-binding domain. Heterologous DNA-binding domains may be recombinantly fused to a histone methyltransferase domain polypeptide of the present disclosure such that the histone methyltransferase domain polypeptide is then able to be targeted to a specific nucleic acid sequence.
[0099] In some embodiments, the DNA-binding domain is a zinc finger domain. A zinc finger domain generally refers to a DNA-binding protein domain that contains zinc fingers, which are small protein structural motifs that can coordinate one or more zinc ions to help stabilize their protein folding. Zinc fingers were first identified as DNA-binding motifs (Miller et al., 1985), and numerous other variations of them have been characterized. Recent progress has been made that allows the engineering of DNA-binding proteins that specifically recognize any desired DNA sequence. For example, it was shown that a three-finger zinc finger protein could be constructed to block the expression of a human oncogene that was transformed into a mouse cell line (Choo and Klug, 1994).
[0100] Zinc fingers can generally be classified into several different structural families and typically function as interaction modules that bind DNA, RNA, proteins, or small molecules. Suitable zinc finger domains of the present disclosure may contain two, three, four, five, six, seven, eight, or nine zinc fingers. Examples of suitable zinc finger domains may include, for example, Cys2His2 (C2H2) zinc finger domains, C-x8-C-x5-C-x3-H (CCCH) (SEQ ID NO: 44; SEQ ID NO: 45) zinc finger domains, multi-cysteine zinc finger domains, and zinc binuclear cluster domains.
[0101] In some embodiments, the DNA-binding domain binds a specific nucleic acid sequence. For example, the DNA-binding domain may bind a sequence that is at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 20 nucleotides, at least 25 nucleotides, at least 30 nucleotides, at least 35 nucleotides, at least 40 nucleotides, at least 45 nucleotides, at least 50 nucleotides, or a higher number of nucleotides in length.
[0102] In some embodiments, a recombinant histone methyltransferase domain polypeptide of the present disclosure may contain two N-terminal CCCH zinc finger domains.
[0103] In some embodiments, the zinc finger domain is an engineered zinc finger array, such as a C2H2 zinc finger array. Engineered arrays of C2H2 zinc fingers can be used to create DNA-binding proteins capable of targeting desired genomic DNA sequences. Methods of engineering zinc finger arrays are well known in the art, and include, for example, combining smaller zinc fingers of known specificity.
[0104] In some embodiments, recombinant histone methyltransferase domain polypeptides of the present disclosure may contain a DNA-binding domain other than a zinc finger domain. Examples of such DNA-binding domains may include, for example, TAL (transcription activator-like) effector targeting domains, helix-turn-helix family DNA-binding domains, basic domains, ribbon-helix-helix domains, TBP (TATA-box binding protein) domains, barrel dimer domains, RHB domains (real homology domain), BAH (bromo-adjacent homology) domains, SANT domains, Chromodomains, Tudor domains, Bromodomains, PHD domains (plant homeo domain), WD40 domains, and MBD domains (methyl-CpG-binding domain).
[0105] In some embodiments, the DNA-binding domain is a TAL effector targeting domain. TAL effectors generally refer to secreted bacterial proteins, such as those secreted by Xanthomonas or Ralstonia bacteria when infecting various plant species. Generally, TAL effectors are capable of binding promoter sequences in the host plant and activate the expression of plant genes that aid in bacterial infection. TAL effectors recognize plant DNA sequences through a central repeat targeting domain that contains a variable number of approximately 34 amino acid repeats. Moreover, TAL effector targeting domains can be engineered to target specific DNA sequences. Methods of modifying TAL effector targeting domains are well known in the art, and described in Bogdanove and Voytas, Science. 2011 Sep 30; 333(6051):1843-6.
[0106] Other DNA-binding domains for use in the methods and compositions of the present disclosure will be readily apparent to one of skill in the art, in view of the present disclosure.
CRISPR-CAS
[0107] In some embodiments, the targeting domain is or may include an RNA-guided DNA binding protein. For example, certain aspects of the present disclosure involve CRISPR-based targeting of a histone methyltransferase domain to a target nucleic acid, which involves use of a CRISPR-CAS targeting system (e.g., CRISPR-CAS9, CRISPR-CAS12, CRISPR-CAS PHI, etc.). In some embodiments, an epitope or multimerized epitope of the present disclosure is present in a recombinant polypeptide that contains a dCAS9 polypeptide.
[0108] CRISPR systems naturally use small base-pairing guide RNAs to target and cleave foreign DNA elements in a sequence-specific manner (Wiedenheft et al., 2012). There are diverse CRISPR systems in different organisms that may be used to target proteins of the present disclosure to a target nucleic acid. One of the simplest systems is the type II CRISPR system from Streptococcus pyogenes. Only a single gene encoding the CAS9 protein and two RNAs, a mature CRISPR RNA (crRNA) and a partially complementary trans-acting RNA (tracrRNA), are necessary and sufficient for RNA-guided silencing of foreign DNAs (Jinek et al., 2012). Maturation of crRNA requires tracrRNA and RNase III (Deltcheva et al., 2011). However, this requirement can be bypassed by using an engineered small guide RNA (gRNA) containing a designed hairpin that mimics the tracrRNA-crRNA complex (Jinek et al., 2012). Base pairing between the gRNA and target DNA normally causes double-strand breaks (DSBs) due to the endonuclease activity of CAS9.
[0109] It is known that the endonuclease domains of the CAS9 protein can be mutated to create a programmable RNA-dependent DNA-binding protein (dCAS9) (Qi et al., 2013). The fact that duplex gRNA-dCAS9 binds target sequences without endonuclease activity has been used to tether regulatory proteins, such as transcriptional activators or repressors, to promoter regions in order to modify gene expression (Gilbert et al., 2013), and CAS9 transcriptional activators have been used for target specificity screening and paired nickases for cooperative genome engineering (Mali et al., 2013, Nature Biotechnology 31:833-838). Thus, dCAS9 may be used as a modular RNA-guided platform to recruit different proteins to DNA in a highly specific manner. One of skill in the art would recognize other RNA-guided DNA binding protein / RNA complexes that can be used equivalently to CRISPR-CAS9.
[0110] Various CAS proteins suitable for use in the methods and compositions of the present disclosure are known in the art and described herein. In some embodiments, the CAS polypeptide may be a Cas9 polypeptide having an amino acid sequence that has at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% amino acid identity to the amino acid sequence of dCas9, Cas9, Cas12a, and / or CasPhi.
[0111] CRISPR-based systems may be used to target a histone methyltransferase domain polypeptide (e.g. PRDM9, SDG2) to a specific nucleic acid. Targeting using CRISPR-based systems may be beneficial over other genome targeting techniques in certain instances. For example, one need only change the guide RNAs in order to target recombinant polypeptides to a new genomic location, or even multiple locations simultaneously. Further, CAS9-mediated targeting has been shown to be insensitive to the methylation state of the target nucleic acid (Nature Biotechnology 31, 827-832 (2013)). In addition, guide RNAs can be extended to include sites for binding to certain proteins which can be fused to polypeptides of interest (e.g., histone methyltransferase polypeptides).
[0112] Suitable CRISPR-based targeting systems and variations thereof are well-known in the art and may be used in the embodiments of the present disclosure in view of the guidance provided herein. For example, WO2018 / 136783 describes a SunTag-based targeting system for use in plants.
CAS9 Proteins
[0113] A variety of CAS9 proteins may be used in the methods of the present disclosure. There are several CAS9 genes present in different bacteria species (Esvelt, K et al, 2013, Nature Methods). One of the most characterized CAS9 proteins is the CAS9 protein from S. pyogenes that, in order to be active, needs to bind a gRNA with a specific sequence and the presence of a PAM motif (NGG, where N is any nucleotide) at the 3’ end of the target locus. However, other CAS9 proteins from different bacterial species show differences in 1) the sequence of the gRNA they can bind and 2) the sequence of the PAM motif. Therefore, it is possible that other CAS9 proteins such as, for example, those from Streptococcus thermophilus or N. meningitidis may also be utilized herein. Indeed, these two CAS9 proteins have a smaller size (around 1100 amino acids) as compared to S. pyogenes CAS9 (1400 amino acids), which may confer some advantages during cloning or protein expression.
[0114] CAS9 proteins from a variety of bacteria have been used successfully in engineered CRISPR-CAS9 systems. There are also versions of CAS9 proteins available in which the codon usage has been more highly optimized for expression in eukaryotic systems, such as human codon optimized CAS9 (Cell, 152:1173-1183) and plant optimized CAS9 (Nature Biotechnology, 31:688-691).
[0115] CAS9 proteins may also be modified for various purposes. For example, CAS9 proteins may be engineered to contain a nuclear-localization sequence (NLS). CAS9 proteins may be engineered to contain an NLS at the N-terminus of the protein, at the C-terminus of the protein, or at both the N- and C-terminus of the protein. Engineering a CAS9 protein to contain an NLS may assist with directing the protein to the nucleus of a host cell. CAS9 proteins may be engineered such that they are unable to cleave nucleic acids (e.g. nuclease-deficient dCAS9 polypeptides). One of skill in the art would be able to readily identify a suitable CAS9 protein for use in the methods and compositions of the present disclosure.
[0116] Exemplary CAS9 proteins that may be used in the methods and compositions of the present disclosure may include, for example, a CAS9 protein having an amino acid sequence disclosed herein, homologs thereof, and fragments thereof, e.g., SEQ ID NO: 102, SEQ ID NO: 103, SEQ ID NO: 104. In some embodiments, the CAS9 polypeptide is a dCAS9 polypeptide. dCAS9 polypeptides may contain an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% amino acid identity to the amino acid sequence of a dCAS9 amino acid sequence provided herein, e.g., SEQ ID NO: 98.
CRISPR RNAs
[0117] The CRISPR RNA (crRNA) of the present disclosure may take a variety of forms. As described above, the sequence of the crRNA is involved in conferring specificity to targeting a specific nucleic acid.
[0118] Many different crRNA molecules can be designed to target many different sequences. With respect to targeting, target nucleic acids generally require the PAM sequence, NGG, at the end of the 20 base pair target sequence. crRNAs of the present disclosure may be expressed as a single crRNA molecule, or they may be expressed in the form of a crRNA / tracrRNA hybrid molecule where the crRNA and the tracrRNA have been fused together, forming a guide RNA (gRNA). crRNA molecules and / or guide RNA molecules may be extended to include sites for the binding of RNA binding proteins.
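The PAM requirement described above can be illustrated with a short Python sketch that lists each 20-nucleotide candidate target sequence immediately followed by an NGG motif. This is a minimal example under stated assumptions: only the strand as written is scanned (a fuller search would also scan the reverse complement), overlapping candidates are reported, and the function name is hypothetical.

```python
import re

# A lookahead is used so that overlapping candidate sites are all reported.
_NGG_SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def find_ngg_targets(sequence: str):
    """Return (start, target, pam) for each 20-nt target followed by an NGG PAM."""
    sequence = sequence.upper()
    return [
        (match.start(), match.group(1), match.group(2))
        for match in _NGG_SITE.finditer(sequence)
    ]
```

Each returned target corresponds to a candidate crRNA targeting sequence; the PAM itself lies in the target nucleic acid and is not part of the crRNA.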
[0119] Multiple crRNAs and / or guide RNAs can be encoded into a single CRISPR array to enable simultaneous targeting to several sites (Science 2013: Vol. pp. 819-823). For example, the tracrRNA may be expressed separately, and two adjacent target sequences may be encoded in a pre-crRNA array interspaced with repeats.
[0120] A variety of promoters may be used to drive expression of the crRNA and / or the guide RNA. crRNAs and / or guide RNAs may be expressed using a Pol III promoter such as, for example, the U6 promoter or the H1 promoter (eLife 2013 2:e00471). For example, an approach in plants has been described using three different Pol III promoters from three different Arabidopsis U6 genes, and their corresponding gene terminators (BMC Plant Biology 2014 14:327). One skilled in the art would readily understand that many additional Pol III promoters could be utilized to simultaneously express many crRNAs and / or guide RNAs to many different locations in the genome. The use of different Pol III promoters for each crRNA and / or gRNA expression cassette may be desirable to reduce the chances of natural gene silencing that can occur when multiple copies of identical sequences are expressed in plants. In addition, crRNAs and / or guide RNAs can be modified to improve the efficiency of their function in guiding CAS9 to a target nucleic acid. For example, it has been shown that adding either 8 or 20 additional nucleotides to the gRNA in order to extend the hairpin by 4 or 10 base pairs resulted in more efficient CAS9 activity (eLife 2013 2:e00471).
[0121] In some embodiments, the guide RNA is driven by a U6 promoter. In some embodiments, the guide RNA is driven by a promoter having a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% nucleic acid sequence identity to the nucleic acid sequence of a U6 promoter sequence provided herein, e.g., SEQ ID NO: 105.
[0122] Alternatively, a tRNA-gRNA expression cassette (Xie, X et al, 2015, Proc Natl Acad Sci U S A. 2015 Mar 17; 112(11):3570-5) may be used to deliver multiple gRNAs simultaneously with high expression levels. In such an embodiment, a tRNA in such a cassette may have a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% nucleic acid sequence identity to the nucleic acid sequence of a tRNA sequence provided herein, e.g., SEQ ID NO: 106.
Trans-activating CRISPR RNAs
[0123] The trans-activating CRISPR RNA (tracrRNA) of the present disclosure may take a variety of forms, as will be readily understood by one of skill in the art. As described above, tracrRNAs are involved in the maturation of a crRNA. tracrRNAs of the present disclosure may be expressed as a single tracrRNA molecule, or they may be expressed in the form of a crRNA / tracrRNA hybrid molecule where the crRNA and the tracrRNA have been fused together, forming a guide RNA (gRNA). tracrRNA molecules and / or guide RNA molecules may be extended to include sites for the binding of RNA binding proteins.
[0124] As CRISPR systems naturally exist in a variety of bacteria, the framework of the crRNA and tracrRNA in these bacteria may be adapted for use in the methods and compositions described herein. crRNAs, tracrRNAs, and / or guide RNAs of the present disclosure may be constructed based on the framework of one or more of these molecules in, for example, S. pyogenes, Streptococcus thermophilus, and / or N. meningitidis. For example, a guide RNA of the present disclosure may be constructed based on the framework of the crRNA and tracrRNA from S. pyogenes (SEQ ID NO: 46), Streptococcus thermophilus (SEQ ID NO: 47), and / or N. meningitidis (SEQ ID NO: 48). In these exemplary frameworks, the 5’ end of the sequence contains 20 generic nucleotides (N) that correspond to the crRNA targeting sequence. This sequence will vary depending on the sequence of the particular nucleic acid being targeted.
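The relationship between such a framework and a particular guide RNA can be sketched as a simple substitution of the 20 generic 5’ nucleotides with a chosen targeting sequence. This is a minimal illustration under stated assumptions: the framework tail used in the example is an arbitrary placeholder, not SEQ ID NO: 46, 47, or 48, and the function name is hypothetical.

```python
RNA_BASES = set("ACGU")
SPACER_LENGTH = 20  # the frameworks described above begin with 20 generic nucleotides (N)


def build_guide_rna(framework: str, targeting_sequence: str) -> str:
    """Replace the 20 leading N positions of a framework with a targeting sequence."""
    spacer = targeting_sequence.upper().replace("T", "U")  # accept DNA or RNA letters
    if len(spacer) != SPACER_LENGTH or not set(spacer) <= RNA_BASES:
        raise ValueError("targeting sequence must be 20 nucleotides of A, C, G, U/T")
    if not framework.upper().startswith("N" * SPACER_LENGTH):
        raise ValueError("framework must begin with 20 generic nucleotides (N)")
    return spacer + framework[SPACER_LENGTH:]


# Arbitrary placeholder framework tail, for illustration only.
placeholder_framework = "N" * 20 + "ACGUACGUACGU"
guide = build_guide_rna(placeholder_framework, "ACGT" * 5)
```

Only the first 20 positions change from one target to the next; the remainder of the framework is carried over unchanged.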
[0125] In some embodiments, the tracrRNA component may have a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% nucleic acid sequence identity to the nucleotide sequence set forth in SEQ ID NO: 49.
Affinity Polypeptides
[0126] Certain aspects of the present disclosure relate to recombinant polypeptides that contain an affinity polypeptide. Affinity polypeptides of the present disclosure may bind to one or more epitopes (e.g. a multimerized epitope). In some embodiments, an affinity polypeptide is present in a recombinant polypeptide that contains a histone methyltransferase domain polypeptide and an affinity polypeptide.
[0127] A variety of affinity polypeptides are known in the art and may be used herein. Generally, the affinity polypeptide should be stable in the conditions present in the intracellular environment of a plant cell. Additionally, the affinity polypeptide should specifically bind to its corresponding epitope with minimal cross-reactivity.
[0128] The affinity polypeptide may be an antibody such as, for example, an scFv. The antibody may be optimized for stability in the plant intracellular environment. When a GCN4 epitope is used in the methods described herein, a suitable affinity polypeptide that is an antibody may contain an anti-GCN4 scFv domain.
[0129] In embodiments where the affinity polypeptide is an scFv antibody, the polypeptide may contain an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% amino acid identity to the amino acid sequence of an scFv antibody polypeptide sequence provided herein, e.g., SEQ ID NO: 95.
[0130] Other exemplary affinity polypeptides include, for example, proteins with SH2 domains or the domain itself, 14-3-3 proteins, proteins with SH3 domains or the domain itself, the Alpha-Syntrophin PDZ protein interaction domain, the PDZ signal sequence, or proteins from plants which can recognize AGO hook motifs (e.g. AGO4 from Arabidopsis thaliana).
[0131] Additional affinity polypeptides that may be used in the methods and compositions described herein will be readily apparent to those of skill in the art.
Epitopes and Multimerized Epitopes
[0132] Certain aspects of the present disclosure relate to recombinant polypeptides that contain an epitope or a multimerized epitope. Epitopes of the present disclosure may bind to an affinity polypeptide. In some embodiments, an epitope or multimerized epitope is present in a recombinant polypeptide that contains a dCAS9 polypeptide.
[0133] Epitopes of the present disclosure may be used for recruiting affinity polypeptides (and any polypeptides they may be recombinantly fused to) to a dCAS9 polypeptide. In embodiments where a dCAS9 polypeptide is fused to an epitope or a multimerized epitope, the dCAS9 polypeptide may be fused to one copy of an epitope, multiple copies of an epitope, more than one different epitope, or multiple copies of more than one different epitope as further described herein.
[0134] A variety of epitopes and multimerized epitopes are known in the art and may be used herein. In general, the epitope or multimerized epitope may be any polypeptide sequence that is specifically recognized by an affinity polypeptide of the present disclosure. Exemplary epitopes may include a c-Myc affinity tag, an HA affinity tag, a His affinity tag, an S affinity tag, a methionine-His affinity tag, an RGD-His affinity tag, a FLAG octapeptide, a strep tag or strep tag II, a V5 tag, a VSV-G epitope, and a GCN4 epitope.
[0135] Other exemplary amino acid sequences that may serve as epitopes and multimerized epitopes include, for example, phosphorylated tyrosines in specific sequence contexts recognized by SH2 domains, characteristic consensus sequences containing phosphoserines recognized by 14-3-3 proteins, proline rich peptide motifs recognized by SH3 domains, the PDZ protein interaction domain or the PDZ signal sequence, and the AGO hook motif from plants.
[0136] Epitopes described herein may also be multimerized. Multimerized epitopes may include at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, or at least 24 or more copies of an epitope.
[0137] Multimerized epitopes may be present as tandem copies of an epitope, or each individual epitope may be separated from another epitope in the multimerized epitope by a linker or other amino acid sequence. Suitable linker regions are known in the art and are described herein. The linker may be configured to allow the binding of affinity polypeptides to adjacent epitopes without, or without substantial, steric hindrance. Linker sequences may also be configured to provide an unstructured or linear region of the polypeptide to which they are recombinantly fused. The linker sequence may comprise e.g. one or more glycines and / or serines. The linker sequences may be e.g. at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 or more amino acids in length.
[0138] In some embodiments, the epitope is a GCN4 epitope. In some embodiments, the multimerized epitope contains at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, or at least 24 copies of a GCN4 epitope polypeptide sequence provided herein, e.g., SEQ ID NO: 96 or SEQ ID NO: 97. In some embodiments, the multimerized epitope contains 10 copies of a GCN4 epitope polypeptide sequence provided herein.
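For illustration only, the tandem and linker-separated arrangements of a multimerized epitope described above can be sketched as string assembly over one-letter amino acid codes. The epitope string "PEPTIDE" and the glycine / serine linker below are placeholders chosen for readability; they are not SEQ ID NO: 96 or SEQ ID NO: 97, and the function name is hypothetical.

```python
def multimerize_epitope(epitope: str, copies: int, linker: str = "") -> str:
    """Join `copies` copies of an epitope, placing `linker` between adjacent copies.

    An empty linker gives direct tandem copies; a non-empty linker separates
    each epitope from the next.
    """
    if copies < 2:
        raise ValueError("a multimerized epitope contains at least 2 copies")
    if not epitope:
        raise ValueError("epitope sequence must not be empty")
    return linker.join([epitope] * copies)


# Placeholder sequences, for illustration only.
tandem = multimerize_epitope("PEPTIDE", 3)
spaced = multimerize_epitope("PEPTIDE", 10, linker="GGSGG")
```

A design with n copies contains n - 1 linkers, so the 10-copy example above is 10 x 7 + 9 x 5 = 115 residues long.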
[0139] Additional epitopes and multimerized epitopes that may be used in the methods and compositions described herein will be readily apparent to those of skill in the art.
Nuclear Localization Signals (NLS)
[0140] Recombinant polypeptides of the present disclosure may contain one or more nuclear localization signals (NLS). Nuclear localization signals may also be referred to as nuclear localization sequences, domains, peptides, or other terms readily apparent to those of skill in the art. A nuclear localization signal is a translocation sequence that, when present in a polypeptide, directs that polypeptide to localize to the nucleus of a eukaryotic cell.
[0141] Various nuclear localization signals may be used in recombinant polypeptides of the present disclosure. For example, one or more SV40-type NLS or one or more REX NLS may be used in recombinant polypeptides. Recombinant polypeptides may also contain two or more tandem copies of a nuclear localization signal. For example, recombinant polypeptides may contain at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, or at least ten copies, either tandem or not, of a nuclear localization signal.
[0142] Recombinant polypeptides of the present disclosure may contain one or more nuclear localization signals that contain an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% amino acid identity to an amino acid sequence selected from the group consisting of SEQ ID NO: 50, SEQ ID NO: 51, and SEQ ID NO: 52.
Linkers
[0143] Various linkers may be used in the construction of recombinant proteins as described herein. In general, linkers are short peptides that separate the different domains in a multi-domain protein. They may play an important role in fusion proteins, affecting the crosstalk between the different domains, the yield of protein production, and the stability and / or the activity of the fusion proteins. Linkers are generally classified into 2 major categories: flexible or rigid. Flexible linkers are typically used when the fused domains require a certain degree of movement or interaction, and these linkers are usually composed of small amino acids such as, for example, glycine (G), serine (S) or proline (P).
[0144] The degree of movement between domains allowed by flexible linkers is an advantage in some fusion proteins. However, it has been reported that flexible linkers can sometimes reduce protein activity due to an inefficient separation of the two domains. In this case, rigid linkers may be used since they enforce a fixed distance between domains and promote their independent functions. A thorough description of several linkers has been provided in Chen X et al., 2013 (Advanced Drug Delivery Reviews 65, 1357-1369).
[0145] Various linkers may be used in, for example, the construction of recombinant polypeptides as described herein. Linkers may be used in e.g. dCAS9-multimerized epitope fusion proteins as described herein to separate the coding sequences of the dCAS9 polypeptide and the multimerized epitope polypeptide. Linkers may be used in e.g. histone methyltransferase domain-affinity polypeptide fusion proteins as described herein to separate the coding sequences of the histone methyltransferase domain polypeptide and the affinity polypeptide. For example, a variety of flexible linkers, rigid linkers, short linkers, and long linkers may be used as described herein.
[0146] A variety of shorter or longer linker regions are known in the art, for example corresponding to a series of glycine residues, a series of adjacent glycine-serine dipeptides, a series of adjacent glycine-glycine-serine tripeptides, or known linkers from other proteins. A flexible linker may include, for example, the amino acid sequence set forth in SEQ ID NO: 53 and variants thereof. A rigid linker may include, for example, the amino acid sequence set forth in SEQ ID NO: 54 and variants thereof. The XTEN linker set forth in SEQ ID NO: 55, and variants thereof, described in Guilinger et al., 2014 (Nature Biotechnology 32, 577-582), may also be used. This particular linker was previously shown to produce the best results among other linkers in a protein fusion between dCAS9 and the nuclease FokI.
[0147] Recombinant polypeptides of the present disclosure may contain one or more linkers that contain an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% amino acid identity to an amino acid sequence selected from the group consisting of SEQ ID NO: 56 and SEQ ID NO: 57.
Tags, Reporters, and Other Features
[0148] Recombinant polypeptides of the present disclosure may contain one or more tags that allow for e.g. purification and / or detection of the recombinant polypeptide. Various tags may be used herein and are well-known to those of skill in the art. Exemplary tags may include HA, GST, FLAG, MBP, etc., and multiple copies of one or more tags may be present in a recombinant polypeptide.
[0149] Recombinant polypeptides of the present disclosure may contain one or more reporters that allow for e.g. visualization and / or detection of the recombinant polypeptide. A reporter polypeptide is a protein that may be readily detectable due to its biochemical characteristics such as, for example, enzymatic activity or chemifluorescent features. Reporter polypeptides may be detected in a number of ways depending on the characteristics of the particular reporter. For example, a reporter polypeptide may be detected by its ability to generate a detectable signal (e.g. fluorescence), by its ability to form a detectable product, etc. Various reporters may be used herein and are well-known to those of skill in the art. Exemplary reporters may include GFP, GUS, mCherry, luciferase, etc., and multiple copies of one or more reporters may be present in a recombinant polypeptide.
[0150] Recombinant polypeptides of the present disclosure may contain one or more polypeptide domains that serve a particular purpose depending on the particular goal / need. For example, recombinant polypeptides may contain a GB1 polypeptide. Recombinant polypeptides may contain translocation sequences that target the polypeptide to a particular cellular compartment or area. Suitable features will be readily apparent to those of skill in the art.
Recombinant Nucleic Acids Encoding Recombinant Proteins
[0151] Certain aspects of the present disclosure relate to recombinant nucleic acids encoding recombinant proteins of the present disclosure. Certain aspects of the present disclosure relate to recombinant nucleic acids encoding various portions / domains of recombinant proteins of the present disclosure.
[0152] As used herein, the terms “polynucleotide,” “nucleic acid,” and variations thereof shall be generic to polydeoxyribonucleotides (containing 2-deoxy-D-ribose), to polyribonucleotides (containing D-ribose), to any other type of polynucleotide that is an N-glycoside of a purine or pyrimidine base, and to other polymers containing non-nucleotidic backbones, provided that the polymers contain nucleobases in a configuration that allows for base pairing and base stacking, as found in DNA and RNA. Thus, these terms include known types of nucleic acid sequence modifications, for example, substitution of one or more of the naturally occurring nucleotides with an analog, and inter-nucleotide modifications. As used herein, the symbols for nucleotides and polynucleotides are those recommended by the IUPAC-IUB Commission of Biochemical Nomenclature.
[0153] In some embodiments, a recombinant nucleic acid is provided that encodes a recombinant polypeptide comprising a histone methyltransferase domain, such as, e.g., a recombinant SDG2 polypeptide. In some embodiments, the recombinant nucleic acid encodes an SDG2 polypeptide that has an amino acid sequence that is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or about 100% identical to Arabidopsis thaliana SDG2C.
[0154] In some embodiments, a recombinant nucleic acid is provided containing a plant promoter and that encodes a recombinant polypeptide containing a nuclease-deficient CAS9 polypeptide (dCAS9) and a multimerized epitope. This recombinant nucleic acid may encode a recombinant polypeptide having an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% amino acid identity to the amino acid sequence of a dCAS9 with a multimerized epitope provided herein.
[0155] In some embodiments, a recombinant nucleic acid is provided containing a plant promoter and that encodes a recombinant polypeptide containing a histone methyltransferase domain and an affinity polypeptide. This recombinant nucleic acid may encode a recombinant polypeptide having an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% amino acid identity to, for example, an amino acid sequence provided in the Examples herein of a recombinant polypeptide containing a histone methyltransferase domain and an affinity polypeptide.
[0156] Recombinant nucleic acids are also provided that have a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% nucleic acid sequence identity to any nucleic acid sequence provided herein.
[0157] Sequences of the polynucleotides of the present disclosure may be prepared by various suitable methods known in the art, including, for example, direct chemical synthesis or cloning. For direct chemical synthesis, formation of a polymer of nucleic acids typically involves sequential addition of 3'-blocked and 5'-blocked nucleotide monomers to the terminal 5'-hydroxyl group of a growing nucleotide chain, wherein each addition is effected by nucleophilic attack of the terminal 5'-hydroxyl group of the growing chain on the 3'-position of the added monomer, which is typically a phosphorus derivative, such as a phosphotriester, phosphoramidite, or the like. Such methodology is known to those of ordinary skill in the art and is described in the pertinent texts and literature (e.g., in Matteucci et al., (1980) Tetrahedron Lett 21:719-722; U.S. Pat. Nos. 4,500,707; 5,436,327; and 5,700,637). In addition, the desired sequences may be isolated from natural sources by splitting DNA using appropriate restriction enzymes, separating the fragments using gel electrophoresis, and thereafter, recovering the desired polynucleotide sequence from the gel via techniques known to those of ordinary skill in the art, such as utilization of polymerase chain reactions (PCR; e.g., U.S. Pat. No. 4,683,195).
[0158] The nucleic acids employed in the methods and compositions described herein may be codon optimized relative to a parental template for expression in a particular host cell. Cells differ in their usage of particular codons, and codon bias corresponds to relative abundance of particular tRNAs in a given cell type. By altering codons in a sequence so that they are tailored to match with the relative abundance of corresponding tRNAs, it is possible to increase expression of a product (e.g. a polypeptide) from a nucleic acid. Similarly, it is possible to decrease expression by deliberately choosing codons corresponding to rare tRNAs. Thus, codon optimization / deoptimization can provide control over nucleic acid expression in a particular cell type (e.g. bacterial cell, plant cell, mammalian cell, etc.). Methods of codon optimizing a nucleic acid for tailored expression in a particular cell type are well-known to those of skill in the art.
Target Nucleic Acids
[0159] Histone methyltransferase domains of the present disclosure may be targeted to specific target nucleic acids to methylate histones at or near the target nucleic acid and / or to promote recombination events at or near the target nucleic acid (which may also be referred to as a target sequence, target genomic sequence, target locus, and the like). The target nucleic acid may be described with reference to sequences upstream and / or downstream of the target in order to aid tracking of recombination. For example, a target nucleic acid may be described as being between a first genomic locus and a second genomic locus, in terms of 5’ to 3’ order on a contiguous stretch of nucleic acids, such as within a single chromosome comprising the target nucleic acid.
[0160] In some embodiments, the histone methyltransferase domain polypeptide is targeted to the target nucleic acid via a heterologous DNA-binding domain. In this sense, a target nucleic acid of the present disclosure is targeted based on the particular nucleotide sequence in the target nucleic acid that is recognized by the targeting portion of the DNA-binding domain. In some embodiments, histone methyltransferase domains methylate histones at or near the target nucleic acid and / or lead to localized increases in recombination rates at or near the target nucleic acid by being targeted to the nucleic acid with the assistance of a guide RNA (via CRISPR-based targeting). In some embodiments, the CRISPR-based targeting scheme may be a SunTag targeting system. With CRISPR-based targeting, a target nucleic acid of the present disclosure is targeted based on the particular nucleotide sequence in the target nucleic acid that is recognized by the targeting portion of the crRNA or guide RNA that is used according to the methods of the present disclosure.
[0161] Various types of nucleic acids may be targeted in the methods provided herein, as will be readily apparent to one of skill in the art. The target nucleic acid may be located within the coding region of a target gene or upstream or downstream thereof. Moreover, the target nucleic acid may reside endogenously in a target gene or may be inserted into the gene (e.g., as a heterologous sequence), for example, using techniques such as homologous recombination. For example, a target gene of the present disclosure can be operably linked to a control region, such as a promoter, that contains a sequence that can be recognized by e.g. a crRNA / tracrRNA and / or a guide RNA of the present disclosure such that a histone methyltransferase domain of the present disclosure may be targeted to that sequence. In some embodiments, the target nucleic acid is not a target of and / or does not naturally associate with the naturally-occurring histone methyltransferase domain polypeptide (e.g. PRDM, SDG2).
[0162] In some embodiments, one or both of the first genomic locus and second genomic locus is in euchromatin. In some embodiments, one or both of the first genomic locus and second genomic locus is in heterochromatin. In some embodiments, one or both of the first genomic locus and second genomic locus is in a recombination cold spot (e.g., in a region of the genome that, in wild type cells, undergoes meiotic recombination relatively infrequently). In some embodiments, one or both of the first genomic locus and second genomic locus is in a recombination hot spot (e.g., in a region of the genome that, in wild type cells, undergoes meiotic recombination relatively frequently). In some embodiments, one or both of the first genomic locus and second genomic locus is neither in a recombination cold spot nor hot spot (e.g., in a region of the genome that, in wild type cells, undergoes meiotic recombination at roughly average rates compared to the rest of the genome). One of skill in the art can readily determine the relative recombination propensity of a genomic region, including, for example, whether a genomic region is in a recombination hot spot or cold spot.
[0163] In some embodiments, the target nucleic acid is in euchromatin. In some embodiments, the target nucleic acid is in heterochromatin. In some embodiments, the target nucleic acid is in a recombination cold spot (e.g., in a region of the genome that, in wild type cells, undergoes meiotic recombination relatively infrequently). For example, a cold spot may be identified in a genome as a region having a very low number of crossover events per meiosis, or even zero crossover events (see, e.g., Si et al., 2015, Widely Distributed Hot and Cold Spots in Meiotic Recombination as Shown by the Sequencing of Rice F2 Plants, 206 New Phytologist 1491 (2015)). In some embodiments, the target nucleic acid is in a recombination hot spot (e.g., in a region of the genome that, in wild type cells, undergoes meiotic recombination relatively frequently). For example, a hot spot may be identified in a genome as a region having a very high number of crossover events per meiosis (see, e.g., Si et al., 2015, Widely Distributed Hot and Cold Spots in Meiotic Recombination as Shown by the Sequencing of Rice F2 Plants, 206 New Phytologist 1491 (2015)). In some embodiments, the target nucleic acid is neither in a recombination cold spot nor hot spot (e.g., in a region of the genome that, in wild type cells, undergoes meiotic recombination at roughly average rates compared to the rest of the genome). One of skill in the art can readily determine the relative recombination propensity of a genomic region, including, for example, whether a genomic region is in a recombination hot spot or cold spot.
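As one non-limiting illustration of how relative recombination propensity might be assessed, the following sketch labels equal-sized genomic windows as cold, average, or hot by comparing each window's observed crossover count with the genome-wide mean. The cutoff values (one quarter of the mean and three times the mean) are arbitrary assumptions chosen for the example and are not taken from the present disclosure or from the cited reference; the function name is hypothetical.

```python
from statistics import mean


def classify_windows(crossover_counts, cold_fraction=0.25, hot_multiple=3.0):
    """Label each window 'cold', 'average', or 'hot' relative to the mean count.

    A window with zero crossovers, or with a count at or below
    cold_fraction * mean, is 'cold'; a count at or above hot_multiple * mean
    is 'hot'; everything else is 'average'.
    """
    if not crossover_counts:
        raise ValueError("at least one window is required")
    average = mean(crossover_counts)
    labels = []
    for count in crossover_counts:
        if count == 0 or count <= cold_fraction * average:
            labels.append("cold")
        elif count >= hot_multiple * average:
            labels.append("hot")
        else:
            labels.append("average")
    return labels


# Made-up crossover counts for five windows, for illustration only.
labels = classify_windows([0, 10, 10, 10, 70])
```

In practice the counts would be normalized by window size and by the number of meioses sampled before any such comparison is made.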
[0164] In some embodiments, the target nucleic acid is in a centromere or proximal to a centromere. As described above, centromeres are known as chromosomal positions where kinetochores assemble, and are often occupied by a centromere-specific variant of the histone H3 protein, and methods of identifying centromeres and centromere-proximal regions are well known in the art. In some embodiments, one or both of the first genomic locus and second genomic locus is in a centromere. In some embodiments, one or both of the first genomic locus and second genomic locus is proximal to a centromere. In some embodiments, the target nucleic acid, the first genomic locus, and the second genomic locus are in a centromere, e.g., within the same centromere. In some embodiments, the target nucleic acid, the first genomic locus, and the second genomic locus are all proximal to a centromere, e.g., proximal to the same centromere. In some embodiments, the target nucleic acid, first genomic locus, and / or second genomic locus comprises one or more histones. In some embodiments, the genomic region between the first and second genomic loci comprises histones. In some embodiments, the histones comprise centromeric histone H3 (CENH3) histones. In some embodiments, the histones are methylated. In some embodiments, the methylation state of the histones is increased as a consequence of the methods provided herein. In some embodiments, the type of methylation that is increased is H3K4me3.
[0165] In some embodiments, the target nucleic acid comprises a repetitive sequence, which may be referred to as a repeat. Repetitive sequences of DNA are known to occur within plant and other genomes. It is well-understood in the art that repetitive sequences represent fragments of DNA that are present multiple times, in copies, within a genome. For example, as described above, centromeres are known to contain repetitive DNA sequences, such as high-copy tandem repeats (also called satellites), or transposon families. In some embodiments, the genomic sequence between the first genomic locus and second genomic locus is repetitive. One of skill in the art can readily determine whether a particular genomic sequence is repetitive relative to the respective genome sequence as a whole.
[0166] In some embodiments, the target nucleic acid is endogenous to the plant where the rate of recombination is increased according to the methods described herein. In some embodiments, the target nucleic acid is a transgene of interest that has been inserted into a plant. Methods of introducing transgenes into plants are well known in the art. Transgenes may be inserted into plants for a variety of reasons, including, for example, in order to provide a production system for a desired protein, to genetically complement a mutation or phenotype, or to modulate the metabolism of a plant.
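By way of example only, a crude indication of repetitiveness can be obtained by counting how many of a sequence's overlapping k-mers occur more than once. This sketch measures repetition within a single sequence and is offered as a simplified proxy; it is not a method described in the present disclosure, it does not compare a region against the whole genome, and the function name and default k are assumptions.

```python
from collections import Counter


def repeated_kmer_fraction(sequence: str, k: int = 4) -> float:
    """Fraction of overlapping k-mers in `sequence` that occur more than once."""
    sequence = sequence.upper()
    if k <= 0 or len(sequence) < k:
        raise ValueError("k must be positive and no longer than the sequence")
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(kmers)
    repeated = sum(1 for kmer in kmers if counts[kmer] > 1)
    return repeated / len(kmers)


# Toy sequences, for illustration only.
tandem_like = repeated_kmer_fraction("ACGTACGTACGT")
unique_like = repeated_kmer_fraction("ACGTTGCA")
```

A tandem array such as the first toy sequence scores at the top of the range, while a sequence with no repeated k-mer scores zero; real analyses would use longer k-mers and genome-wide counts.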
[0167] Suitable target nucleic acids will be readily apparent to one of skill in the art depending on the particular need or outcome. The target nucleic acid may be in e.g. a region of euchromatin (e.g., a highly expressed gene), or the target nucleic acid may be in a region of heterochromatin (e.g., centromere DNA). Use of histone methyltransferase domains according to the methods described herein to induce recombination events in a region of heterochromatin or other highly methylated region of a plant genome may be especially useful in certain research embodiments. For example, activation of a retrotransposon in a plant genome may find use in allowing more precise modulation of important agricultural traits by decoupling genomic regions that typically segregate together. A target nucleic acid of the present disclosure may be methylated or it may be unmethylated.
[0168] Target nucleic acids will be readily apparent. Exemplary target nucleic acids for, e.g., research or other purposes may include, for example, genes and / or highly repetitive DNA regions, such as centromeres. The methods of the present disclosure may also provide a quantitative approach to comparing guide RNA efficiency at targeting histone methylation using plant-based SunTag expression systems.
Plants of the Present Disclosure
[0169] Certain aspects of the present disclosure relate to plants containing histone methyltransferase domains that are targeted to one or more target nucleic acids in the plant in order to induce histone methylation and / or increase meiotic recombination at or around the target nucleic acid.
[0170] As used herein, a “plant” refers to any of various photosynthetic, eukaryotic multicellular organisms of the kingdom Plantae, characteristically producing embryos, containing chloroplasts, having cellulose cell walls and lacking locomotion. As used herein, a “plant” includes any plant or part of a plant at any stage of development, including seeds, suspension cultures, plant cells, embryos, meristematic regions, callus tissue, leaves, roots, shoots, gametophytes, sporophytes, pollen, microspores, and progeny thereof. Also included are cuttings, and cell or tissue cultures. As used in conjunction with the present disclosure, plant tissue includes, for example, whole plants, plant cells, plant organs, e.g., leaves, stems, roots, meristems, plant seeds, protoplasts, callus, cell cultures, and any groups of plant cells organized into structural and / or functional units.
[0171] Any plant cell may be used in the present disclosure so long as it remains viable after being transformed with a sequence of nucleic acids. Preferably, the plant cell is not adversely affected by the transduction of the necessary nucleic acid sequences, the subsequent expression of the proteins or the resulting intermediates.
[0172] As disclosed herein, a broad range of plant types may be modified to incorporate recombinant polypeptides and / or polynucleotides of the present disclosure. Suitable plants that may be modified include both monocotyledonous (monocot) plants and dicotyledonous (dicot) plants.
[0173] Examples of suitable plants may include, for example, species of the Family Gramineae, including Sorghum bicolor and Zea mays; species of the genera: Cucurbita, Rosa, Vitis, Juglans, Fragaria, Lotus, Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum, Geranium, Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum, Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana, Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis, Nemesia, Pelargonium, Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browallia, Glycine, Pisum, Phaseolus, Lolium, Oryza, Avena, Hordeum, Secale, and Triticum.
[0174] In some embodiments, plant cells may include, for example, those from corn (Zea mays), canola (Brassica napus, Brassica rapa ssp.), Brassica species useful as sources of seed oil, alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), millet (e.g., pearl millet (Pennisetum glaucum), proso millet (Panicum miliaceum), foxtail millet (Setaria italica), finger millet (Eleusine coracana)), sunflower (Helianthus annuus), safflower (Carthamus tinctorius), wheat (Triticum aestivum), duckweed (Lemna), soybean (Glycine max), tobacco (Nicotiana tabacum), potato (Solanum tuberosum), peanuts (Arachis hypogaea), cotton (Gossypium barbadense, Gossypium hirsutum), sweet potato (Ipomoea batatas), cassava (Manihot esculenta), coffee (Coffea spp.), coconut (Cocos nucifera), pineapple (Ananas comosus), citrus trees (Citrus spp.), cocoa (Theobroma cacao), tea (Camellia sinensis), banana (Musa spp.), avocado (Persea americana), fig (Ficus carica), guava (Psidium guajava), mango (Mangifera indica), olive (Olea europaea), papaya (Carica papaya), cashew (Anacardium occidentale), macadamia (Macadamia spp.), almond (Prunus amygdalus), sugar beets (Beta vulgaris), sugarcane (Saccharum spp.), oats, barley, vegetables, ornamentals, and conifers.
[0175] Examples of suitable vegetable plants may include, for example, tomatoes (Lycopersicon esculentum), lettuce (e.g., Lactuca sativa), green beans (Phaseolus vulgaris), lima beans (Phaseolus limensis), peas (Lathyrus spp.), and members of the genus Cucumis such as cucumber (C. sativus), cantaloupe (C. cantalupensis), and musk melon (C. melo).
[0176] Examples of suitable ornamental plants may include, for example, azalea (Rhododendron spp.), hydrangea (Hydrangea macrophylla), hibiscus (Hibiscus rosa-sinensis), roses (Rosa spp.), tulips (Tulipa spp.), daffodils (Narcissus spp.), petunias (Petunia hybrida), carnation (Dianthus caryophyllus), poinsettia (Euphorbia pulcherrima), and chrysanthemum.
[0177] Examples of suitable conifer plants may include, for example, loblolly pine (Pinus taeda), slash pine (Pinus elliotii), ponderosa pine (Pinus ponderosa), lodgepole pine (Pinus contorta), Monterey pine (Pinus radiata), Douglas-fir (Pseudotsuga menziesii), Western hemlock (Tsuga canadensis), Sitka spruce (Picea glauca), redwood (Sequoia sempervirens), silver fir (Abies amabilis), balsam fir (Abies balsamea), Western red cedar (Thuja plicata), and Alaska yellow-cedar (Chamaecyparis nootkatensis).
[0178] Examples of suitable leguminous plants may include, for example, guar, locust bean, fenugreek, soybean, garden beans, cowpea, mungbean, lima bean, fava bean, lentils, chickpea, peanuts (Arachis sp.), crown vetch (Vicia sp.), hairy vetch, adzuki bean, lupine (Lupinus sp.), trifolium, common bean (Phaseolus sp.), field bean (Pisum sp.), clover (Melilotus sp.), Lotus, trefoil, lens, and false indigo.
[0179] Examples of suitable forage and turf grass may include, for example, alfalfa (Medicago sp.), orchard grass, tall fescue, perennial ryegrass, creeping bent grass, and redtop.
[0180] Examples of suitable crop plants and model plants may include, for example, Arabidopsis, corn, rice, alfalfa, sunflower, canola, soybean, cotton, peanut, sorghum, wheat, tobacco, and lemna.
[0181] The plants of the present disclosure may be genetically modified in that recombinant nucleic acids have been introduced into the plants, and as such the genetically modified plants do not occur in nature. A suitable plant of the present disclosure is one capable of expressing one or more nucleic acid constructs encoding one or more recombinant proteins. The recombinant proteins encoded by the nucleic acids may be e.g. recombinant polypeptides containing a nuclease-deficient CAS9 polypeptide (dCAS9) and a multimerized epitope, as well as recombinant polypeptides containing a histone methyltransferase domain and an affinity polypeptide.
[0182] As used herein, the terms “transgenic plant” and “genetically modified plant” are used interchangeably and refer to a plant which contains within its genome a recombinant nucleic acid. Generally, the recombinant nucleic acid is stably integrated within the genome such that the polynucleotide is passed on to successive generations. However, in certain embodiments, the recombinant nucleic acid is transiently expressed in the plant. The recombinant nucleic acid may be integrated into the genome alone or as part of a recombinant expression cassette. “Transgenic” is used herein to include any cell, cell line, callus, tissue, plant part or plant, the genotype of which has been altered by the presence of exogenous nucleic acid including those transgenics initially so altered as well as those created by sexual crosses or asexual propagation from the initial transgenic.
[0183] “Recombinant nucleic acid” or “heterologous nucleic acid” or “recombinant polynucleotide” as used herein refers to a polymer of nucleic acids wherein at least one of the following is true: (a) the sequence of nucleic acids is foreign to (i.e., not naturally found in) a given host cell; (b) the sequence may be naturally found in a given host cell, but in an unnatural (e.g., greater than expected) amount; or (c) the sequence of nucleic acids contains two or more subsequences that are not found in the same relationship to each other in nature. For example, regarding instance (c), a recombinant nucleic acid sequence will have two or more sequences from unrelated genes arranged to make a new functional nucleic acid. Specifically, the present disclosure describes the introduction of an expression vector into a plant cell, where the expression vector contains a nucleic acid sequence coding for a protein that is not normally found in a plant cell or contains a nucleic acid coding for a protein that is normally found in a plant cell but is under the control of different regulatory sequences. With reference to the plant cell’s genome, then, the nucleic acid sequence that codes for the protein is recombinant. A protein that is referred to as recombinant generally implies that it is encoded by a recombinant nucleic acid sequence which may be present in the plant cell. Recombinant proteins of the present disclosure may also be exogenously supplied directly to host cells (e.g. plant cells).
[0184] A “recombinant” polypeptide, protein, or enzyme of the present disclosure, is a polypeptide, protein, or enzyme that may be encoded by a “recombinant nucleic acid” or “heterologous nucleic acid” or “recombinant polynucleotide.”
[0185] In some embodiments, the genes encoding the recombinant proteins in the plant cell may be heterologous to the plant cell. In certain embodiments, the plant cell does not naturally produce one or more polypeptides of the present disclosure, and contains heterologous nucleic acid constructs capable of expressing one or more genes necessary for producing those molecules. In certain embodiments, the plant cell does not naturally produce one or more polypeptides of the present disclosure, and is provided the one or more polypeptides through exogenous delivery of the polypeptides directly to the plant cell without the need to express a recombinant nucleic acid encoding the recombinant polypeptide in the plant cell.
[0186] Recombinant nucleic acids and / or recombinant proteins of the present disclosure may be present in host cells (e.g. plant cells). In some embodiments, recombinant nucleic acids are present in an expression vector, and the expression vector may be present in host cells (e.g. plant cells).
Expression of Recombinant Proteins in Plants
[0187] Recombinant polypeptides of the present disclosure may be introduced into plant cells via any suitable methods known in the art. For example, a recombinant polypeptide can be exogenously added to plant cells and the plant cells are maintained under conditions such that the recombinant polypeptide is involved with targeting one or more target nucleic acids to methylate histones at or near the target nucleic acids in the plant cells. Alternatively, a recombinant nucleic acid encoding a recombinant polypeptide of the present disclosure can be expressed in plant cells and the plant cells are maintained under conditions such that the recombinant polypeptides of the present disclosure are targeted to one or more target nucleic acids and methylate histones at or near the target gene in the plant cells. Additionally, in some embodiments, a recombinant polypeptide of the present disclosure may be transiently expressed in a plant via viral infection of the plant, or by introducing a recombinant polypeptide-encoding RNA into a plant to methylate histones at or near a target nucleic acid of interest. Methods of introducing recombinant proteins via viral infection or via the introduction of RNAs into plants are well known in the art. For example, Tobacco rattle virus (TRV) has been successfully used to introduce zinc finger nucleases in plants to cause genome modification (“Nontransgenic Genome Modification in Plant Cells”, Plant Physiology 154:1079-1087 (2010)).
[0188] A recombinant nucleic acid encoding a recombinant polypeptide of the present disclosure can be expressed in a plant with any suitable plant expression vector. Typical vectors useful for expression of recombinant nucleic acids in higher plants are well known in the art and include, for example, vectors derived from the tumor-inducing (Ti) plasmid of Agrobacterium tumefaciens (e.g., see Rogers et al., Meth. in Enzymol. (1987) 153:253-277). These vectors are plant integrating vectors in that on transformation, the vectors integrate a portion of vector DNA into the genome of the host plant. Exemplary A. tumefaciens vectors useful herein are plasmids pKYLX6 and pKYLX7 (e.g., see Schardl et al., Gene (1987) 61:1-11; and Berger et al., Proc. Natl. Acad. Sci. USA (1989) 86:8402-8406); and plasmid pBI 101.2 that is available from Clontech Laboratories, Inc. (Palo Alto, CA).
[0189] In addition to regulatory domains, recombinant polypeptides of the present disclosure can be expressed as a fusion protein that is coupled to, for example, a maltose binding protein ("MBP"), glutathione S transferase (GST), hexahistidine, c-myc, or the FLAG epitope for ease of purification, monitoring expression, or monitoring cellular and subcellular localization.
[0190] Moreover, a recombinant nucleic acid encoding a recombinant polypeptide of the present disclosure can be modified to improve expression of the recombinant protein in plants by using codon preference. When the recombinant nucleic acid is prepared or altered synthetically, advantage can be taken of known codon preferences of the intended plant host where the nucleic acid is to be expressed. For example, recombinant nucleic acids of the present disclosure can be modified to account for the specific codon preferences and GC content preferences of monocotyledons and dicotyledons, as these preferences have been shown to differ (Murray et al., Nucl. Acids Res. (1989) 17: 477-498).
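By way of a non-limiting sketch, the use of host codon preferences can be pictured as back-translating a polypeptide sequence with a table of preferred codons for the intended host. The two tables below are hypothetical placeholders covering only a few residues, and the names `back_translate` and `gc_content` are likewise introduced here for illustration; real codon-usage tables for a given monocot or dicot host would be substituted in practice.

```python
# Hypothetical preferred-codon tables (illustrative placeholders, not measured data).
# Every codon shown encodes the stated residue under the standard genetic code.
HOST_A_PREFERRED = {"M": "ATG", "A": "GCC", "K": "AAG", "L": "CTG", "*": "TGA"}
HOST_B_PREFERRED = {"M": "ATG", "A": "GCT", "K": "AAA", "L": "CTT", "*": "TAA"}


def back_translate(protein: str, preferred: dict) -> str:
    """Build a coding sequence using one preferred codon per residue."""
    try:
        return "".join(preferred[residue] for residue in protein.upper())
    except KeyError as missing:
        raise ValueError(f"no preferred codon listed for residue {missing}") from None


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return sum(sequence.count(base) for base in "GC") / len(sequence)


peptide = "MAKL*"  # toy peptide ending in a stop codon
cds_host_a = back_translate(peptide, HOST_A_PREFERRED)
cds_host_b = back_translate(peptide, HOST_B_PREFERRED)
```

The same peptide yields coding sequences with different GC content under the two placeholder tables, which is the kind of difference a host-specific design would take into account.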
[0191] In some embodiments, recombinant polypeptides of the present disclosure can be used to create increased genetic diversity in a plant by promoting meiotic recombination as a consequence of the targeted histone methylation induced by the methods provided herein. Production of plant meiocytes using the methods provided herein may provide a more diverse breeding pool from which a plant having increased genetic diversity may be efficiently produced.
[0192] The present disclosure further provides expression vectors encoding recombinant polypeptides of the present disclosure. A nucleic acid sequence coding for the desired recombinant polypeptide of the present disclosure can be used to construct a recombinant expression vector which can be introduced into the desired host cell. A recombinant expression vector will typically contain a nucleic acid encoding a recombinant protein of the present disclosure, operably linked to transcriptional initiation regulatory sequences which will direct the transcription of the nucleic acid in the intended host cell, such as tissues of a transformed plant.
[0193] Recombinant nucleic acids e.g. encoding recombinant polypeptides of the present disclosure may be expressed on multiple expression vectors or they may be expressed on a single expression vector. In some embodiments, recombinant nucleic acids encoding (1) recombinant polypeptides containing a nuclease-deficient CAS9 polypeptide (dCAS9) and a multimerized epitope, (2) recombinant polypeptides containing a histone methyltransferase domain and an affinity polypeptide, and (3) a crRNA and a tracrRNA, or fusions thereof (guide RNA), are all expressed on a single vector.
[0194] For example, plant expression vectors may include (1) a cloned gene under the transcriptional control of 5' and 3' regulatory sequences and (2) a dominant selectable marker. Such plant expression vectors may also contain, if desired, a promoter regulatory region (e.g., one conferring inducible or constitutive, environmentally- or developmentally-regulated, or cell- or tissue-specific / selective expression), a transcription initiation start site, a ribosome binding site, an RNA processing signal, a transcription termination site, and / or a polyadenylation signal.
[0195] In some embodiments, an expression vector containing recombinant nucleic acids of the present disclosure may contain a plant-specific TBS insulator sequence having a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% nucleic acid sequence identity to the nucleic acid sequence of a plant-specific TBS insulator sequence provided herein.
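For illustration only, percent nucleic acid sequence identity between a candidate sequence and a reference sequence could be computed along the lines of the following Python sketch. It assumes the two sequences have already been aligned (gaps written as "-") and takes identity as matching positions divided by aligned columns; other conventions for the denominator exist, and the present disclosure does not prescribe this particular calculation. The function names and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences carry a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if identity is at least `threshold` percent (e.g., 90 for 'at least 90%')."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy aligned sequences (hypothetical): one mismatch in ten columns.
reference = "ACGTACGTAC"
candidate = "ACGTTCGTAC"
identity = percent_identity(reference, candidate)
```

Here `meets_identity_threshold(reference, candidate, 90)` would hold while a 95% threshold would not.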
[0196] In some embodiments, expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a promoter (e.g. a promoter functional in plants or a plant-specific promoter). A plant promoter, or functional fragment thereof, can be employed to control the expression of a recombinant nucleic acid of the present disclosure in regenerated plants. The selection of the promoter used in expression vectors will determine the spatial and temporal expression pattern of the recombinant nucleic acid in the modified plant, e.g., the nucleic acid encoding the recombinant polypeptide of the present disclosure is only expressed in the desired tissue or at a certain time in plant development or growth. Certain promoters will express recombinant nucleic acids in all plant tissues and are active under most environmental conditions and states of development or cell differentiation (i.e., constitutive promoters). Other promoters will express recombinant nucleic acids in specific cell types (such as leaf epidermal cells, mesophyll cells, root cortex cells) or in specific tissues or organs (roots, leaves or flowers, for example) and the selection will reflect the desired location of accumulation of the gene product. Alternatively, the selected promoter may drive expression of the recombinant nucleic acid under various inducing conditions.
[0197] Examples of suitable constitutive promoters may include, for example, the core promoter of the Rsyn7, the core CaMV 35S promoter (Odell et al., Nature (1985) 313:810-812), CaMV 19S (Lawton et al., 1987), rice actin (Wang et al., 1992; U.S. Pat. No. 5,641,876; and McElroy et al., Plant Cell (1985) 2:163-171), ubiquitin (Christensen et al., Plant Mol. Biol. (1989) 12:619-632; and Christensen et al., Plant Mol. Biol. (1992) 18:675-689), pEMU (Last et al., Theor. Appl. Genet. (1991) 81:581-588), MAS (Velten et al., EMBO J. (1984) 3:2723-2730), nos (Ebert et al., 1987), Adh (Walker et al., 1987), the 1'- or 2'-promoter derived from T-DNA of Agrobacterium tumefaciens, the Smas promoter, the cinnamyl alcohol dehydrogenase promoter (U.S. Pat. No. 5,683,439), the Nos promoter, the pEmu promoter, the rubisco promoter, the GRP1-8 promoter, and other transcription initiation regions from various plant genes known to skilled artisans, and constitutive promoters described in, for example, U.S. Pat. Nos. 5,608,149; 5,608,144; 5,604,121; 5,569,597; 5,466,785; 5,399,680; 5,268,463; and 5,608,142.
[0198] In some embodiments, expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a UBQ10 promoter. In some embodiments, expression of a nucleic acid of the present disclosure may be driven (in operable linkage) with a promoter having a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% nucleic acid sequence identity to the nucleic acid sequence of a UBQ10 promoter sequence provided herein, e.g., SEQ ID NO: 109.
[0199] Examples of suitable tissue specific promoters may include, for example, the lectin promoter (Vodkin et al., 1983; Lindstrom et al., 1990), the corn alcohol dehydrogenase 1 promoter (Vogel et al., 1989; Dennis et al., 1984), the corn light harvesting complex promoter (Simpson, 1986; Bansal et al., 1992), the corn heat shock protein promoter (Odell et al., Nature (1985) 313:810-812; Rochester et al., 1986), the pea small subunit RuBP carboxylase promoter (Poulsen et al., 1986; Cashmore et al., 1983), the Ti plasmid mannopine synthase promoter (Langridge et al., 1989), the Ti plasmid nopaline synthase promoter (Langridge et al., 1989), the petunia chalcone isomerase promoter (Van Tunen et al., 1988), the bean glycine rich protein 1 promoter (Keller et al., 1989), the truncated CaMV 35S promoter (Odell et al., Nature (1985) 313:810-812), the potato patatin promoter (Wenzler et al., 1989), the root cell promoter (Conkling et al., 1990), the maize zein promoter (Reina et al., 1990; Kriz et al., 1987; Wandelt and Feix, 1989; Langridge and Feix, 1983; Reina et al., 1990), the globulin-1 promoter (Belanger and Kriz et al., 1991), the alpha-tubulin promoter, the cab promoter (Sullivan et al., 1989), the PEPCase promoter (Hudspeth & Grula, 1989), the R gene complex-associated promoters (Chandler et al., 1989), and the chalcone synthase promoters (Franken et al., 1991).
[0200] Alternatively, the plant promoter can direct expression of a recombinant nucleic acid of the present disclosure in a specific tissue (such as, for example, to a tissue that produces meiocytes) or may be otherwise under more precise environmental or developmental control. Such promoters are referred to here as “inducible” promoters. Environmental conditions that may affect transcription by inducible promoters include, for example, pathogen attack, anaerobic conditions, or the presence of light. Examples of inducible promoters include, for example, the AdhI promoter which is inducible by hypoxia or cold stress, the Hsp70 promoter which is inducible by heat stress, and the PPDK promoter which is inducible by light. Examples of promoters under developmental control include, for example, promoters that initiate transcription only, or preferentially, in certain tissues, such as leaves, roots, fruit, seeds, or flowers. An exemplary promoter is the anther specific promoter 5126 (U.S. Pat. Nos. 5,689,049 and 5,689,051). The operation of a promoter may also vary depending on its location in the genome. Thus, an inducible promoter may become fully or partially constitutive in certain locations.
[0201] Moreover, any combination of a constitutive or inducible promoter, and a non-tissue-specific or tissue-specific promoter may be used to control the expression of various recombinant polypeptides of the present disclosure.
[0202] The recombinant nucleic acids of the present disclosure and / or a vector housing a recombinant nucleic acid of the present disclosure, may also contain a regulatory sequence that serves as a 3’ terminator sequence. One of skill in the art would readily recognize a variety of terminators that may be used in the recombinant nucleic acids of the present disclosure. For example, a recombinant nucleic acid of the present disclosure may contain a 3’ NOS terminator.
[0203] In some embodiments, recombinant nucleic acids of the present disclosure contain a transcriptional termination site. Transcription termination sites may include, for example, OCS terminators and NOS terminators.
[0204] In some embodiments, recombinant nucleic acids of the present disclosure contain a transcriptional termination site having a nucleic acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% nucleic acid sequence identity to a transcriptional termination nucleic acid sequence provided herein, e.g., SEQ ID NO: 110 or SEQ ID NO: 111. Plant transformation protocols as well as protocols for introducing recombinant nucleic acids of the present disclosure into plants may vary depending on the type of plant or plant cell, e.g., monocot or dicot, targeted for transformation. Suitable methods of introducing recombinant nucleic acids of the present disclosure into plant cells and subsequent insertion into the plant genome include, for example, microinjection (Crossway et al., Biotechniques (1986) 4:320-334), electroporation (Riggs et al., Proc. Natl. Acad Sci. USA (1986) 83:5602-5606), Agrobacterium-mediated transformation (U.S. Pat. No. 5,563,055), direct gene transfer (Paszkowski et al., EMBO J. (1984) 3:2717-2722), and ballistic particle acceleration (U.S. Pat. No. 4,945,050; Tomes et al. (1995). "Direct DNA Transfer into Intact Plant Cells via Microprojectile Bombardment," in Plant Cell, Tissue, and Organ Culture: Fundamental Methods, ed. Gamborg and Phillips (Springer-Verlag, Berlin); and McCabe et al., Biotechnology (1988) 6:923-926).
[0205] Additionally, recombinant polypeptides of the present disclosure can be targeted to a specific organelle within a plant cell. Targeting can be achieved by providing the recombinant protein with an appropriate targeting peptide sequence. Examples of such targeting peptides include, for example, secretory signal peptides (for secretion or cell wall or membrane targeting), plastid transit peptides, chloroplast transit peptides, mitochondrial target peptides, vacuole targeting peptides, nuclear targeting peptides, and the like (e.g., see Reiss et al., Mol. Gen. Genet. (1987) 209(1):116-121; Settles and Martienssen, Trends Cell Biol (1998) 12:494-501; Scott et al., J Biol Chem (2000) 10:1074; and Luque and Correas, J Cell Sci (2000) 113:2485-2495).
[0206] The modified plant may be grown into plants in accordance with conventional ways (e.g., see McCormick et al., Plant Cell Reports (1986) 81-84). These plants may then be grown, and pollinated with either the same transformed strain or different strains, with the resulting hybrid having the desired phenotypic characteristic. Two or more generations may be grown to ensure that the subject phenotypic characteristic is stably maintained and inherited and then seeds harvested to ensure the desired phenotype or other property has been achieved.
[0207] The present disclosure also provides plants derived from plants having increased levels of histone methylation and / or increased rates of recombination as a consequence of the methods of the present disclosure. A plant having increased histone methylation and / or an increased rate of recombination between the first and second genomic loci as a consequence of the methods of the present disclosure may be crossed with itself or with another plant to produce an F1 plant. In some embodiments, one or more of the resulting F1 plants may also have increased levels of histone methylation and / or increased rates of recombination at or near the target nucleic acid.
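As one hedged illustration of how a rate of recombination between a first and second genomic locus might be compared between plants, the sketch below applies a generic genetics calculation: the fraction of scored haplotypes carrying a non-parental combination of alleles at two marker loci. The scoring scheme, the function name `recombination_frequency`, and the counts are invented toy values and assumptions for illustration; they are not data or methods from the present disclosure.

```python
def recombination_frequency(haplotypes, parental):
    """Fraction of haplotypes whose two-locus allele combination is non-parental.

    haplotypes: list of (allele_at_first_locus, allele_at_second_locus) tuples.
    parental:   set containing the two parental allele combinations.
    """
    if not haplotypes:
        raise ValueError("at least one scored haplotype is required")
    recombinant = sum(1 for haplotype in haplotypes if haplotype not in parental)
    return recombinant / len(haplotypes)


# Parental phases at the two marker loci (hypothetical marker names).
parental = {("A", "B"), ("a", "b")}

# Invented toy counts: 100 scored haplotypes per group.
control = [("A", "B")] * 45 + [("a", "b")] * 45 + [("A", "b")] * 5 + [("a", "B")] * 5
treated = [("A", "B")] * 40 + [("a", "b")] * 40 + [("A", "b")] * 10 + [("a", "B")] * 10

rf_control = recombination_frequency(control, parental)
rf_treated = recombination_frequency(treated, parental)
fold_change = rf_treated / rf_control
```

With these toy counts the treated group shows twice the recombinant fraction of the control group; a real comparison would also require an appropriate statistical test and sample size.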
[0208] Further provided are methods of screening plants derived from plants having increased levels of histone methylation and / or increased rates of recombination between the first and second genomic loci (e.g., at or near the target locus) as a consequence of the methods of the present disclosure. In some embodiments, the derived plants (e.g. F1 or F2 plants resulting from or derived from crossing the plant having increased levels of histone methylation and / or increased rates of recombination between the first and second genomic loci as a consequence of the methods of the present disclosure with another plant) can be selected from a population of derived plants. For example, provided are methods of selecting one or more of the derived plants that (i) lack recombinant nucleic acids, and (ii) have increased levels of histone methylation and / or increased rates of recombination between the first and second genomic loci.
Plant Meiosis and Plant Meiocytes
[0209] Certain aspects of the present disclosure relate to the process of plant meiosis and plant meiocytes. There are various forms of meiosis known in the art, including, for example, reciprocal meiotic crossover recombination (also referred to as meiotic crossover recombination) (see, e.g., Wang et al., Meiosis in Crops: From Genes to Genomes, 72 Journal of Experimental Botany 6091 (2021)). The process of plant meiosis is well-understood in the art. Meiosis is a process of specialized cell division that occurs during sexual reproduction and involves chromosomal pairing and recombination in diploid meiocytes to produce haploid gametes. Certain aspects of the present disclosure relate to stimulating recombination in plants during reciprocal meiotic crossover recombination.
[0210] The process of plant meiosis is well-understood to begin with meiocytes, which are cells that are destined to enter meiosis. In plants, meiosis is known to occur within specialized female and male reproductive organs that contain meiocyte precursor cells, which are destined to become meiocytes. Meiocyte precursor cells may be targeted with DNA-binding or other domains, using polypeptides or other systems. Certain aspects of the present disclosure involve, for example, targeting a recombinant polypeptide to a target locus in a plant meiocyte precursor cell. Meiocytes are a type of sub-epidermal cell, where the meiotic fate is determined based on a number of genetic factors known in the art. One or more of the peptides provided herein may be expressed throughout a plant or targeted to one or more specific tissues, such as, for example, to flowers and tissues therein. In some embodiments, a plant meiocyte precursor cell is produced from a tissue regenerated from a vegetative tissue of a plant expressing the peptides provided herein.
[0211] Meiosis begins with diploid meiocytes, which eventually differentiate into haploid gametes. During the process of meiosis, a diploid meiocyte containing homologous chromosomes first undergoes a round of DNA replication to produce identical sister chromatids. Then, the meiocyte undergoes chromosomal recombination, and one round of cell division. Homologous chromosomes pair with each other and undergo recombination between non-sister chromatids; then the homologs segregate and the meiocyte divides into two daughter cells. In a second round of cell division, the sister chromatids in those two daughter cells segregate, and the two cells divide into four haploid gametes.
[0212] The process of chromosomal recombination during meiosis may also be referred to as meiotic recombination. Meiotic recombination is known to result in an exchange of genetic material between non-sister chromatids of homologous chromosomes. This exchange of genetic material may also be referred to as, for example, a crossover or recombination event. Within a given genome, recombination may occur within different genomic positions, resulting in crossover or an exchange of genetic material. Certain aspects of the present disclosure involve recombination occurring in a genomic region between a first genomic locus and a second genomic locus. The process of exchanging genetic material during meiosis is well-known to result in new allelic combinations and new traits.
Centromeres
[0213] Certain aspects of the present disclosure relate to targeting histone methyltransferase domain polypeptides to specific loci. Targeted loci may be located in any genomic region or chromosomal position, between, for example a first genomic locus and a second genomic locus. In some embodiments, the targeted loci may be located in heterochromatin regions of DNA, for example, in centromeres. The methods provided herein may have an effect of increasing recombination in genomic regions that experience recombination suppression in the wild type, such as, for example, centromeres and centromere proximal regions.
[0214] Methods of identifying centromeres are well understood in the art (see, e.g., Naish & Henderson, The Structure, Function, and Evolution of Plant Centromeres, 34 Genome Research 161 (2024)). For example, centromeres are known as chromosomal positions where kinetochores assemble. Centromeres help to regulate cell division and are generally responsible for proper distribution of replicated chromosomes. It is well known that centromeres are often occupied by a centromere-specific variant of the histone H3 protein. In plants, this variant may also be called CENH3. Centromeres can thus be recognized by, for example, the presence of CENH3 variant nucleosomes in chromosomal regions. Accordingly, the methods provided herein in some aspects may involve targeting the recombinant polynucleotides provided herein to a target genomic locus that is in a region occupied by CENH3 variant nucleosomes. Alternatively or additionally, the methods provided herein may involve targeting to genomic loci proximal to a centromere, e.g., a region proximal to a region occupied by CENH3 variant nucleosomes. Centromeres can also be recognized by the presence of predominantly repetitive DNA sequences, such as high-copy tandem repeats (also called satellites), or transposon families. It is well known that meiotic crossovers are suppressed in centromeres and in sequences proximal to centromeres (see, e.g., Pazhayam et al., Centromere-proximal suppression of meiotic crossovers in Drosophila is robust to changes in centromere number, repetitive DNA content, and centromere-clustering, Genetics, Volume 226, Issue 3, March 2024, iyad216, discussing Beadle, A possible influence of the spindle fibre on crossing-over in Drosophila. PNAS 1932, 18(2): 160-165. doi:10.1073/pnas.18.2.160). The genomic regions proximal to a centromere are sometimes referred to as pericentromeres or pericentromeric heterochromatin, and often contain repetitive sequences.
The length of a centromere-proximal region may vary depending on, e.g., the organism and chromosome, but may include, for example, at least the first 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, 125, 150, 175, 200, 225, 250, 275, 300 Mb or more on either side of a centromere. In a crop genome, for example, the centromere may be relatively narrow, with a relatively large zone of recombination suppression surrounding it. Some plants have large genomes, and the exact range of distances in Mb of a centromere-proximal region may vary. For example, in bread wheat, the recombination-suppressed centromere-proximal region is 200 Mb on some chromosomes (see, e.g., Tock et al., Crossover-active regions of the wheat genome are distinguished by DMC1, the chromosome axis, H3K27me3, and signatures of adaptation. Genome Res. 2021 Sep;31(9):1614-1628. doi: 10.1101/gr.273672.120. Epub 2021 Aug 23. PMID: 34426514; PMCID: PMC8415368). The methods provided herein may unlock recombination in these centromere-proximal regions, allowing genetic gains to be efficiently achieved through breeding.
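As an illustrative sketch only, the centromere-proximal flanks described above can be written as coordinate intervals on a chromosome. The function name, the half-open base-pair coordinate convention, and the example coordinates are assumptions made for illustration and do not come from the present disclosure.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start_bp, end_bp); half-open, an assumed convention

BP_PER_MB = 1_000_000


def centromere_proximal_flanks(
    cen_start: int, cen_end: int, flank_mb: float, chrom_len: int
) -> Tuple[Interval, Interval]:
    """Return the left and right flanking intervals of a centromere.

    Each flank extends flank_mb megabases away from the centromere and is
    clipped at the chromosome ends.
    """
    if not 0 <= cen_start < cen_end <= chrom_len:
        raise ValueError("centromere must lie within the chromosome")
    if flank_mb <= 0:
        raise ValueError("flank_mb must be positive")
    flank_bp = int(round(flank_mb * BP_PER_MB))
    left = (max(0, cen_start - flank_bp), cen_start)
    right = (cen_end, min(chrom_len, cen_end + flank_bp))
    return left, right


# Hypothetical chromosome: 700 Mb long, centromere at 300-310 Mb, 200 Mb flanks.
left, right = centromere_proximal_flanks(300_000_000, 310_000_000, 200, 700_000_000)
print(left, right)
```

A target locus, or a first and second genomic locus, falling inside either returned interval would count as centromere-proximal under the chosen flank size.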
[0215] Thus, in some embodiments, one or more of the first genomic locus, second genomic locus, and / or target locus may be in a centromere and / or in a region proximal to a centromere. In some embodiments, one or more of the first genomic locus, second genomic locus, and / or target locus may be in a repetitive region. In some embodiments, all of the first genomic locus, second genomic locus, and target locus are in a centromere. In some embodiments, all of the first genomic locus, second genomic locus, and target locus are in a region proximal to a centromere. In some embodiments, the first genomic locus is in a centromere and the second genomic locus is in a region proximal to a centromere. In some embodiments, the second genomic locus is in a centromere and the first genomic locus is in a region proximal to a centromere.
[0216] Certain aspects of the present disclosure involve altering the rate of meiotic crossover recombination in regions of a genome, including in centromeric and / or centromere-proximal regions. Within chromosomes, or genomes, the rate of recombination or crossover is known to vary. As described above, certain aspects of the present disclosure involve the rate of recombination or crossover between a first genomic locus and a second genomic locus. Some chromosomal regions are known to undergo a high rate of recombination or crossover and may also be referred to as hot spots. For example, gene-rich chromosomal arms are known to have relatively higher rates of meiotic recombination. Other regions are known to undergo a low rate of recombination or crossover and may also be referred to as cold spots. For example, most heterochromatic regions are known to have relatively lower rates of meiotic recombination. It is well understood in the art that centromeres, and the proximal genomic regions surrounding centromeres, have lower rates of recombination, compared to other genomic regions. In A. thaliana, for example, recombination is known to occur at a higher frequency in euchromatic arms, at a lower frequency in heterochromatic regions proximal to centromeric regions, and at a much lower frequency over the centromere itself (see, e.g., Naish et al., 2021; Yelina et al., 2012). In certain aspects of the present disclosure, meiotic recombination is increased in centromeric regions and / or centromere-proximal regions using the methods of the present disclosure. In some embodiments, the increase is compared to the rate of recombination between corresponding genomic loci in a corresponding wild type background.
Recombination Rates
[0217] It is well understood in the art that rates of recombination in a genome, also referred to as recombination frequency, can be described and experimentally measured in various ways (see, e.g., Wu et al., 2015). For example, one way to describe the rate of recombination in a specific genomic region is with the number of crossover events per meiosis in that region (see, e.g., Si et al., 2015, Widely Distributed Hot and Cold Spots in Meiotic Recombination as Shown by the Sequencing of Rice F2 Plants, 206 New Phytologist 1491 (2015)). Another way to describe the rate of recombination is based on the number of recombinant gametes compared to the total number of gametes. Another way to describe the rate of recombination is based on the number of recombination events compared to the total number of individuals studied (see, e.g., Phillips et al., The Effect of Temperature on the Male and Female Recombination Landscape of Barley, 208 New Phytologist 421 (2015)).
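By way of a minimal, non-limiting sketch, two of the descriptions above (recombinant gametes over total gametes, and crossover events per meiosis) reduce to simple ratios. The function names and the example counts are hypothetical and are not data from the present disclosure.

```python
def recombinant_fraction(recombinant_gametes: int, total_gametes: int) -> float:
    """Fraction of scored gametes that are recombinant for the interval."""
    if total_gametes <= 0:
        raise ValueError("total_gametes must be positive")
    if not 0 <= recombinant_gametes <= total_gametes:
        raise ValueError("recombinant_gametes must be between 0 and total_gametes")
    return recombinant_gametes / total_gametes


def crossovers_per_meiosis(crossover_events: int, meioses: int) -> float:
    """Mean number of crossover events per meiosis within the interval."""
    if meioses <= 0:
        raise ValueError("meioses must be positive")
    if crossover_events < 0:
        raise ValueError("crossover_events must be non-negative")
    return crossover_events / meioses


# Hypothetical counts, for illustration only.
print(recombinant_fraction(12, 200))    # 12 recombinant gametes out of 200 scored
print(crossovers_per_meiosis(30, 120))  # 30 crossovers observed across 120 meioses
```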
[0218] Another way to describe the rate of recombination is based on genetic distance (i.e., centimorgan or cM) compared to physical genomic distance (i.e., the number of base pairs in the interval over which the rate of recombination is measured). For example, a rate of recombination, also referred to as recombination or crossover frequency, may be calculated as centimorgan (cM) per megabase (Mb), where Mb represents the physical genomic distance. In embodiments where the physical genomic distance is measured in Mb to calculate recombination rate or frequency, the distance (e.g., the number of bases between a first genomic locus and a second genomic locus as described herein) may be at least about 2 Mb, at least about 2.5 Mb, at least about 3 Mb, at least about 3.5 Mb, at least about 4 Mb, at least about 4.5 Mb, at least about 5 Mb, at least about 5.5 Mb, at least about 6 Mb, at least about 6.5 Mb, at least about 7 Mb, at least about 7.5 Mb, at least about 8 Mb, at least about 8.5 Mb, at least about 9 Mb, at least about 9.5 Mb, at least about 10 Mb, at least about 10.5 Mb, at least about 11 Mb, at least about 11.5 Mb, at least about 12 Mb, at least about 12.5 Mb, at least about 13 Mb, at least about 13.5 Mb, at least about 14 Mb, at least about 14.5 Mb, at least about 15 Mb, at least about 15.5 Mb, at least about 16 Mb, at least about 16.5 Mb, at least about 17 Mb, at least about 17.5 Mb, at least about 18 Mb, at least about 18.5 Mb, at least about 19 Mb, at least about 19.5 Mb, at least about 20 Mb, at least about 20.5 Mb, at least about 21 Mb, at least about 21.5 Mb, at least about 22 Mb, at least about 22.5 Mb, at least about 23 Mb, at least about 23.5 Mb, at least about 24 Mb, at least about 24.5 Mb, at least about 25 Mb, or at least about 25.5 Mb. For example, in some embodiments where the physical genomic distance is measured in Mb, the distance may be between 7 and 7.5 Mb, between 7.5 and 8 Mb, between 8 and 8.5 Mb, between 8.5 and 9 Mb, or between 9 and 9.5 Mb.
In some embodiments, the distance may be about 8.5 Mb. In some embodiments, the distance may be about 5-6 Mb.
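The cM per Mb calculation described above can be sketched as follows. This is an illustrative example under stated assumptions: the function name, the locus coordinates, and the 4.25 cM genetic distance are hypothetical, and the genetic distance is taken as an input because converting a recombinant fraction to cM generally depends on the mapping function chosen.

```python
def cm_per_mb(genetic_distance_cm: float, first_locus_bp: int, second_locus_bp: int) -> float:
    """Recombination rate as genetic distance (cM) per physical distance (Mb)."""
    physical_mb = abs(second_locus_bp - first_locus_bp) / 1_000_000
    if physical_mb == 0:
        raise ValueError("the two loci must be at different positions")
    if genetic_distance_cm < 0:
        raise ValueError("genetic distance must be non-negative")
    return genetic_distance_cm / physical_mb


# Hypothetical interval of 8.5 Mb spanning 4.25 cM.
rate = cm_per_mb(4.25, 1_000_000, 9_500_000)
print(rate)  # 0.5 cM per Mb
```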
[0219] In embodiments where the rate of recombination is calculated as cM per Mb, the rate of recombination between a first genomic locus and a second genomic locus after application of the methods provided herein may be at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, or at least about 75%. In some embodiments, the rate of recombination between a first genomic locus and a second genomic locus after application of the methods provided herein may be between 10% and 15%, between 15% and 20%, between 20% and 25%, or between 25% and 30%. In embodiments where the rate of recombination is calculated as cM per Mb, the rate of recombination between a first genomic locus and a second genomic locus may increase by at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, or at least about 75% as a consequence of the methods provided herein. In some embodiments, the rate of recombination between a first genomic locus and a second genomic locus may increase by between 10% and 15%, between 15% and 20%, between 20% and 25%, or between 25% and 30% as a consequence of the methods provided herein.
[0220] Certain aspects of the present disclosure involve an increased rate of recombination between a first genomic locus and a second genomic locus. As described above, the first genomic locus and the second genomic locus may be separated by a physical genomic distance measured in Mb. An increased rate of recombination can be measured with any of the above descriptions of the rate of recombination in a specific genomic region between a first genomic locus and a second genomic locus. For example, an increased rate of recombination may involve an increase in the rate of recombination measured in cM per Mb in a plant or plant cell provided with the recombinant polypeptides provided herein, or a transgenic plant or cell thereof, compared to the rate of recombination measured in cM per Mb in a comparator plant or comparator plant cell. The comparator may be, for example, a corresponding plant not provided with the recombinant polypeptides provided herein, or a cell thereof, such as a wild type plant or cell, or an otherwise non-transgenic plant or cell thereof. In some embodiments, the comparator plant or cell may be of the same plant variety as a plant or cell upon which the methods provided herein are performed, except that the comparator does not undergo the methods provided herein.
[0221] In some embodiments, the rate of recombination may increase by at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, or about 100% in the methods provided herein compared to a comparator plant or plant cell.
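As a hedged illustration of the comparison described above, a percent increase relative to a comparator can be computed from two rates expressed in the same units (e.g., cM per Mb). The function name and the example rates are hypothetical; they are not measurements from the present disclosure.

```python
def percent_increase(rate_with_method: float, comparator_rate: float) -> float:
    """Percent change in recombination rate relative to a comparator.

    Both rates must be in the same units (e.g., cM per Mb) and measured
    over corresponding genomic intervals.
    """
    if comparator_rate <= 0:
        raise ValueError("comparator_rate must be positive")
    return (rate_with_method - comparator_rate) / comparator_rate * 100.0


# Hypothetical rates: 0.75 cM/Mb with the method versus 0.5 cM/Mb in a comparator.
print(percent_increase(0.75, 0.5))  # 50.0
```

Under this definition, an increase of about 100% corresponds to a doubling of the comparator rate.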
[0222] The number of crossovers may vary across a genomic interval, such as, for example, between a first genomic locus and a second genomic locus. For example, a single genomic interval undergoing meiosis may experience 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more than 15 crossover events, with each crossover event creating a recombined genomic sequence.
[0223] The rate of recombination in a genome or a specific genomic region can be measured in different ways. For example, recombination rate may be measured by performing crosses between plants that carry fluorescent transgenic markers around specific genomic regions. The presence or absence of fluorescence can indicate meiotic recombination events that occur within a specific genomic region. There are several other methods that can be employed to measure rates of recombination within and across genomes (see, e.g., Ahn et al., High-Throughput Measuring of Meiotic Recombination Rates in Barley Pollen Nuclei Using Crystal Digital PCR™, 107 The Plant Journal 649 (2021)), for example, meiotic chromosome analysis, genotyping-by-sequencing, or other molecular markers.
Methods of Increasing Levels of Histone Methylation and / or Increasing Recombination Rates at or Near a Target Nucleic Acid in Plants
[0224] Targeting a polypeptide comprising a histone methyltransferase domain according to the methods provided herein promotes meiotic recombination (also known as crossovers, crossover events, and the like) at and / or around the targeted locus. This may be measured in a variety of ways. For example, one way to quantify the increase in meiotic recombination is from the perspective of a plurality of plant meiocytes, whereby, for example, a first plurality of meiocytes (e.g., produced by a method provided herein) exhibits an increased rate of recombination at or around the targeted locus compared to the rate of recombination at or around a corresponding locus in a second plurality of plant meiocytes (e.g., not produced by a method provided herein). Another way to quantify the increase in meiotic recombination, for example, is from the perspective of a progeny plant, whereby, for example, a first progeny plant (e.g., produced by a method provided herein) exhibits an increased number of recombined genomic sequences at or around the targeted locus compared to the number of recombined genomic sequences at or around a corresponding locus in a second progeny plant (e.g., not produced by a method provided herein).
[0225] In some embodiments, the histone methyltransferase domain methylates the target locus. Thus, the target locus of a plant comprising a recombinant polypeptide comprising a histone methyltransferase domain that is targeted to a target locus may comprise increased methylation (of histones at or near, e.g., the target locus) compared to a plant not comprising the recombinant polypeptide comprising the histone methyltransferase domain. Without wishing to be bound by theory, it is believed that histone methylation (such as, for example, a relative increase in histone methylation at and / or around a targeted locus) induces recombination at or around the histone methylation site, such that targeting histone methylation may be used to prompt recombination at one or more desired sites. Thus, if the histone methylation is targeted to a genomic locus that naturally recombines only rarely (so-called “recombination cold spots”, such as, for example, in or near a centromere), targeted histone methylation as provided herein could be used to, for example, increase genomic diversity by promoting, for example, rare types of recombination events. In other words, the histone methyltransferase domain targeting of the methods provided herein may, in some embodiments, make a recombination cold spot less “cold”.
[0226] Thus, provided herein is a method for producing a plurality of plant meiocytes having an increased rate of recombination between a first genomic locus and a second genomic locus, the method comprising: (a) providing a plant comprising a recombinant polypeptide comprising a histone methyltransferase domain and that is capable of being targeted to a target locus between the first genomic locus and the second genomic locus; (b) growing the plant under conditions whereby the recombinant polypeptide is targeted to the target locus in a plurality of plant meiocyte precursor cells; and (c) growing the plant under conditions whereby the plurality of plant meiocyte precursor cells undergo meiosis, thereby producing a plurality of plant meiocytes having an increased rate of recombination between the first genomic locus and the second genomic locus. In some embodiments, the rate of recombination is measured relative to a comparator plurality of plant meiocytes.
[0227] In some embodiments, the plant further comprises: i) a second recombinant polypeptide comprising 1) a nuclease-deficient CAS9 polypeptide (dCAS9) or fragment thereof and 2) a multimerized epitope; and ii) a crRNA and a tracr RNA, or fusions thereof; wherein the recombinant polypeptide comprising a histone methyltransferase domain further comprises an affinity polypeptide that specifically binds to the epitope; and (b) the growing comprises conditions whereby the second recombinant polypeptide and the recombinant polypeptide comprising the histone methyltransferase domain are targeted to the target locus.
[0228] In some embodiments, the plurality of plant meiocytes having an increased rate of recombination between the first genomic locus and the second genomic locus is a first plurality of plant meiocytes. In some embodiments, the increased rate of recombination is measured relative to a comparator plurality of plant meiocytes. In some embodiments, the comparator plurality of plant meiocytes is produced by a second plant. In some embodiments, the second plant is of the same genotype as the first plant except that the second plant lacks the first recombinant polypeptide and the second recombinant polypeptide.
[0229] Thus, in some embodiments, provided herein is a method for producing a first plurality of plant meiocytes having an increased rate of recombination between a first genomic locus and a second genomic locus, the method comprising: (a) providing a first plant comprising: a first recombinant polypeptide comprising 1) a histone methyltransferase domain and 2) an affinity polypeptide that specifically binds to a multimerized epitope; a second recombinant polypeptide comprising: 1) a nuclease-deficient CAS9 polypeptide (dCAS9) or fragment thereof and 2) the multimerized epitope; and a crRNA and a tracr RNA, or fusions thereof; (b) growing the first plant under conditions whereby the first and second recombinant polypeptides are targeted to a target locus between the first genomic locus and the second genomic locus in a plurality of plant meiocyte precursor cells; and (c) growing the plant under conditions whereby the plurality of plant meiocyte precursor cells undergo meiosis, thereby producing a first plurality of plant meiocytes having an increased rate of recombination between the first genomic locus and the second genomic locus, wherein the rate of recombination is measured relative to a comparator plurality of plant meiocytes produced by a second plant, wherein the second plant is of the same genotype as the first plant but lacks the first recombinant polypeptide and the second recombinant polypeptide.
[0230] Also provided herein is a method for producing a progeny plant, comprising: (a) providing a plant comprising a recombinant polypeptide comprising a histone methyltransferase domain and that is capable of being targeted to a target locus between a first genomic locus and a second genomic locus; (b) growing the plant under conditions whereby the recombinant polypeptide is targeted to the target locus in a plant meiocyte precursor cell; (c) producing a first plant meiocyte from the plant meiocyte precursor cell; and (d) crossing the first plant meiocyte with a second plant meiocyte, thereby producing a progeny plant, wherein the progeny plant comprises an increased number of recombined genomic sequences between the first genomic locus and the second genomic locus. In some embodiments, the number of recombined genomic sequences is measured in comparison to a comparator progeny plant.
[0231] In some embodiments, the plant further comprises: i) a second recombinant polypeptide comprising 1) a nuclease-deficient CAS9 polypeptide (dCAS9) or fragment thereof and 2) a multimerized epitope; and a crRNA and a tracr RNA, or fusions thereof; wherein the recombinant polypeptide comprising a histone methyltransferase domain further comprises an affinity polypeptide that specifically binds to the epitope; and (b) the growing comprises conditions whereby the second recombinant polypeptide and the recombinant polypeptide comprising the histone methyltransferase domain are targeted to the target locus.
[0232] In some embodiments, the first plant meiocyte comprises an increased number of recombination events compared to a comparator plant meiocyte, wherein the comparator plant meiocyte was produced from a plant meiocyte precursor cell that did not comprise the first and / or second recombinant polypeptides.
[0233] In some embodiments, the progeny plant that comprises an increased number of recombined genomic sequences between the first genomic locus and the second genomic locus is a first progeny plant. In some embodiments, the comparator progeny plant is produced from a cross between meiocytes that were not derived from a plant comprising the first recombinant polypeptide or the second recombinant polypeptide. The level and / or relative level of histone methylation at and / or around a locus that is targeted in a plant meiocyte precursor cell may vary in a progeny plant derived from the plant meiocyte precursor cell. For example, in some embodiments, a progeny plant has greater histone methylation at and / or around the target locus compared to the level of histone methylation at and / or around the corresponding locus in the comparator progeny plant. For example, in some embodiments, a progeny plant has increased histone methylation in a genomic region within about 1, 2, 4, 6, 8, 10, 15, or 20 Mb of the target nucleic acid. In other embodiments, a progeny plant may have roughly equal or less histone methylation at and / or around the target locus compared to the level of histone methylation at and / or around the corresponding locus in the comparator progeny plant.
[0234] Thus, in some embodiments, provided herein is a method for producing a first progeny plant, comprising: (a) providing a first plant comprising: a first recombinant polypeptide comprising 1) a histone methyltransferase domain and 2) an affinity polypeptide that specifically binds to a multimerized epitope; a second recombinant polypeptide comprising: 1) a nuclease-deficient CAS9 polypeptide (dCAS9) or fragment thereof and 2) the multimerized epitope; and a crRNA and a tracr RNA, or fusions thereof; (b) growing the first plant under conditions whereby the first and second recombinant polypeptides are targeted to a target locus between a first genomic locus and a second genomic locus in a plant meiocyte precursor cell; (c) producing a first plant meiocyte from the plant meiocyte precursor cell; and (d) crossing the first plant meiocyte with a second plant meiocyte, thereby producing a first progeny plant, wherein the first progeny plant comprises an increased number of recombined genomic sequences between the first genomic locus and the second genomic locus compared to a comparator progeny plant, wherein the comparator progeny plant was produced from a cross between meiocytes that were not derived from a plant comprising the first recombinant polypeptide or the second recombinant polypeptide.
[0235] Growing conditions sufficient for the recombinant polypeptides of the present disclosure to be expressed in the plant, to be targeted to a target locus between a first genomic locus and a second genomic locus, and to increase the rate of recombination between the first and second genomic loci of the present disclosure are well known in the art and include any suitable growing conditions disclosed herein. Typically, the plant is grown under conditions sufficient to express a recombinant polypeptide of the present disclosure, and for the expressed recombinant polypeptides to be localized to the nucleus of cells of the plant in order to be targeted to and induce histone methylation at or near the target nucleic acid (if those targets are present in the nucleus). Generally, the conditions sufficient for the expression of the recombinant polypeptide will depend on the promoter used to control the expression of the recombinant polypeptide. For example, if an inducible promoter is utilized, expression of the recombinant polypeptide in a plant will require that the plant be grown in the presence of the inducer.
[0236] As noted above, growing conditions sufficient for the recombinant polypeptides of the present disclosure to be expressed in the plant to be targeted to one or more target nucleic acids and to methylate histones and / or lead to localized increases in recombination rates at or near the target nucleic acid may vary depending on a number of factors (e.g. species of plant, use of inducible promoter, etc.). Suitable growing conditions may include, for example, ambient environmental conditions, standard greenhouse conditions, growth in long days under standard environmental conditions (e.g. 16 hours of light, 8 hours of dark), growth in 12 hour light : 12 hour dark day / night cycles, etc.
[0237] Various time frames may be used to observe increased rates of recombination according to the methods of the present disclosure. Recombination rates may be observed / assayed after meiosis in progeny cells or progeny plants produced from the methods provided herein, for example, about 5 days of growth, about 10 days of growth, about 15 days after growth, about 20 days after growth, about 25 days after growth, about 30 days after growth, about 35 days after growth, about 40 days after growth, about 50 days after growth, or 55 days or more of growth.
[0238] A recombination rate between a first genomic locus and a second genomic locus in a plant cell housing recombinant polypeptides of the present disclosure may increase by at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% as compared to a corresponding control. Various controls will be readily apparent to one of skill in the art. For example, a control may be a corresponding plant or plant cell that does not contain recombinant polypeptides of the present disclosure (e.g. wild-type plant or plant cell).
[0239] A recombination rate between a first genomic locus and a second genomic locus that bound a genomic interval comprising a target nucleic acid of the present disclosure may increase as compared to a corresponding control third genomic locus and fourth genomic locus that bound a control genomic interval not comprising a target nucleic acid. A recombination rate between a first genomic locus and a second genomic locus that bound a genomic interval comprising a target nucleic acid of the present disclosure may increase at least about 1-fold, at least about 2-fold, at least about 3-fold, at least about 4-fold, at least about 5-fold, at least about 10-fold, at least about 15-fold, at least about 20-fold, at least about 25-fold, at least about 30-fold, at least about 40-fold, at least about 50-fold, at least about 75-fold, at least about 100-fold, at least about 150-fold, at least about 200-fold, at least about 300-fold, at least about 400-fold, at least about 500-fold, at least about 600-fold, at least about 700-fold, at least about 800-fold, at least about 900-fold, at least about 1,000-fold, at least about 1,250-fold, at least about 1,500-fold, at least about 1,750-fold, at least about 2,000-fold, at least about 2,500-fold, at least about 3,000-fold, at least about 3,500-fold, at least about 4,000-fold, at least about 4,500-fold, at least about 5,000-fold, at least about 5,500-fold, at least about 6,000-fold, at least about 6,500-fold, at least about 7,000-fold, at least about 7,500-fold, at least about 8,000-fold, at least about 8,500-fold, at least about 9,000-fold, at least about 9,500-fold, at least about 10,000-fold, at least about 12,000-fold, at least about 14,000-fold, at least about 16,000-fold, at least about 18,000-fold, or at least about 20,000-fold or more as compared to the recombination rate across a corresponding control nucleic acid interval.
In some embodiments, a recombination rate between a first genomic locus and a second genomic locus that bound a genomic interval comprising a target nucleic acid of the present disclosure may increase in the range of about 1,000-fold to about 10,000-fold as compared to the recombination rate across a corresponding control nucleic acid interval. As stated above, various controls will be readily apparent to one of skill in the art. For example, a control nucleic acid interval may be a corresponding nucleic acid interval from a plant or plant cell that does not contain a nucleic acid encoding a recombinant polypeptide of the present disclosure. Alternatively, a genomic interval from within the same plant but not comprising a target locus may be used as a comparator.
[0240] In some embodiments, nucleic acids targeted by a histone methyltransferase domain polypeptide (e.g. SDG2) according to the methods of the present disclosure may experience a gain or increase in histone methylation at and / or in proximity of the targeted nucleic acid after the histone methyltransferase domain polypeptide has been targeted to the target nucleic acid.
[0241] A target nucleic acid of the present disclosure in a plant cell housing a recombinant histone methyltransferase domain polypeptide of the present disclosure may have its level of histone methylation increased by at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% as compared to a corresponding control. Various controls will be readily apparent to one of skill in the art. For example, a control may be a corresponding plant or plant cell that does not contain a nucleic acid encoding a recombinant histone methyltransferase domain polypeptide of the present disclosure (e.g. a wild-type plant or plant cell).
[0242] A target nucleic acid of the present disclosure having increased histone methylation as compared to a corresponding control nucleic acid (as a consequence of the methods of the present disclosure) may exhibit an increase in histone methylation over a number of nucleotides including and adjacent to the targeted nucleotide sequences in a target nucleic acid. For example, the increase in histone methylation may be present over one nucleotide, over about 5 nucleotides, over about 10 nucleotides, over about 15 nucleotides, over about 20 nucleotides, over about 25 nucleotides, over about 30 nucleotides, over about 35 nucleotides, over about 40 nucleotides, over about 45 nucleotides, over about 50 nucleotides, over about 55 nucleotides, over about 60 nucleotides, over about 75 nucleotides, over about 100 nucleotides, over about 125 nucleotides, over about 150 nucleotides, over about 175 nucleotides, over about 200 nucleotides, over about 225 nucleotides, over about 250 nucleotides, over about 275 nucleotides, over about 300 nucleotides, over about 350 nucleotides, over about 400 nucleotides, over about 450 nucleotides, over about 500 nucleotides, over about 600 nucleotides, over about 700 nucleotides, over about 800 nucleotides, over about 900 nucleotides, over about 1,000 nucleotides, over about 1,500 nucleotides, over about 2,000 nucleotides, over about 2,500 nucleotides, or over about 3,000 nucleotides or more as compared to corresponding nucleotides in a corresponding control nucleic acid. The increase in methylation of histones on nucleotides adjacent to the target nucleotides in the target nucleic acid may occur in nucleotides that are 5’ to the target nucleotide sequences, 3’ to the target nucleotide sequences, or both 5’ and 3’ to the target nucleotide sequences.
[0243] Increased histone methylation of a target nucleic acid induced by targeting a recombinant histone methyltransferase domain polypeptide to the target nucleic acid may be stable in plants even in the absence of the recombinant histone methyltransferase domain polypeptide in the plant. Accordingly, the methods of the present disclosure may allow one or more target nucleic acids in a plant to maintain an increased level of histone methylation, and thereby maintain an increased rate of recombination, after a nucleic acid encoding a recombinant histone methyltransferase domain polypeptide has been crossed out or otherwise removed from the plant. For example, after targeting a particular genomic region with a recombinant histone methyltransferase domain polypeptide according to the methods of the present disclosure, the increased level of histone methylation and / or increased rate of recombination at or around the targeted region may remain stable even after crossing away the transgenes. It is an object of the present disclosure to provide plants having increased histone methylation and / or increased rates of recombination at or near one or more target nucleic acids according to the methods of the present disclosure. As the methods of the present disclosure may allow one or more target nucleic acids in a plant to remain in their state of increased histone methylation after a recombinant histone methyltransferase domain polynucleotide encoding a recombinant histone methyltransferase domain polypeptide of the present disclosure has been crossed out of the plant, the progeny plants of these plants may have increased histone methylation of one or more target nucleic acids even in the absence of the recombinant polynucleotides that produce the recombinant polypeptides of the present disclosure.
Correspondingly, cells from such progeny plants that undergo meiosis may also maintain the increased recombination rates even in the absence of the recombinant polynucleotides that produce the recombinant polypeptides of the present disclosure. Without wishing to be bound by theory, it is believed that the level of histone methylation is maintained for one or more generations even in the absence of the recombinant polynucleotides that produce the recombinant polypeptides of the present disclosure, and that the increased recombination rate is a consequence of the increased histone methylation.
[0244] Comparisons in the present disclosure may also be in reference to corresponding control plants. Various control plants will be readily apparent to one of skill in the art. For example, a control plant may be a plant that does not contain one or more of: (1) a recombinant polypeptide including a nuclease-deficient CAS9 polypeptide (dCAS9) or fragment thereof and a multimerized epitope, (2) a recombinant polypeptide including a histone methyltransferase domain and an affinity polypeptide, and / or (3) a crRNA and a tracrRNA, or fusions thereof.
[0245] Methods of probing the expression level of a nucleic acid are well-known to those of skill in the art. For example, qRT-PCR analysis may be used to determine the expression level of a population of nucleic acids isolated from a nucleic acid-containing sample (e.g. plants, plant tissues, or plant cells).
[0246] Methods of probing the histone methylation status of a nucleic acid are well-known to those of skill in the art. For example, bisulfite sequencing and nucleic acid analysis may be used to determine the histone methylation status, on a nucleotide-by-nucleotide basis, of a population of nucleic acids isolated from a nucleic acid-containing sample (e.g. plants, plant tissues, or plant cells).
[0247] It is to be understood that while the present disclosure has been described in conjunction with the preferred specific embodiments thereof, the foregoing description is intended to illustrate and not limit the scope of the present disclosure. Other aspects, advantages, and modifications within the scope of the present disclosure will be apparent to those skilled in the art to which the present disclosure pertains.
EXAMPLES
[0248] The following examples are offered to illustrate provided embodiments and are not intended to limit the scope of the present disclosure.
Example 1: CRISPR targeting of H3K4me3 activates gene expression and unlocks centromeric crossover recombination in Arabidopsis.
[0249] This Example demonstrates Applicant’s development of a SunTag system that allows CRISPR targeting of H3K4me3.
[0250] H3K4me3 is a fundamental and highly conserved chromatin mark across eukaryotes, playing a central role in many genome-related processes, including transcription, maintenance of cell identity, DNA damage repair, and meiotic recombination. However, identifying the causal function of H3K4me3 in these diverse pathways remains a challenge, and we lack the tools to manipulate it for agricultural benefit. Here we used the CRISPR-based SunTag system to direct H3K4me3 methyltransferases in the model plant, Arabidopsis thaliana. Targeting of SunTag-SDG2 activated the expression of the endogenous reporter gene, FWA. We showed that SunTag-SDG2 can be employed to increase pathogen resistance by targeting the H3K4me3-dependent disease resistance gene, SNC1. Meiotic crossover recombination rates impose a limit on the speed with which new traits can be transferred to elite crop varieties. We demonstrated that targeting of SunTag-SDG2 to low-recombining centromeric regions can significantly stimulate crossover formation. Finally, we revealed that the effect is not specific to SDG2 and is likely dependent on the H3K4me3 mark itself, as the orthogonal mammalian-derived H3K4me3 methyltransferase, PRDM9, produces a similar effect on gene expression with reduced off-target potential. Overall, this Example supports an instructive role for H3K4me3 in transcription and meiotic recombination and opens the door to precise modulation of important agricultural traits.
Introduction
[0251] Re-programming of transcriptional landscapes in response to developmental and environmental cues critically underlies a plant’s ability to adapt and optimize its performance. The potential to re-wire and fine-tune expression of endogenous genes therefore holds great promise for agricultural improvement. Chromatin modifications associated with gene expression play an essential role in this process (Lloyd and Lister, 2021).
[0252] Trimethylation of histone 3 lysine 4 (H3K4me3) is a well-studied and highly conserved chromatin mark. Its enrichment around the transcriptional start site of expressed genes is observed across eukaryotes (Schwaiger et al., 2014), and is thought to assist in transcriptional maintenance and memory of cell identity (Wang and Helin, 2024). Despite the ubiquity of H3K4me3 and its positive correlation with gene expression, whether it plays an instructive role in transcription remains a matter of debate, due in part to the limited evidence of transcriptional impact when H3K4me3 levels are perturbed (Francisco et al., 2023; Howe et al., 2017). In animals, TAF3 directly binds to H3K4me3 and recruits basal transcription factor machinery, providing a plausible mechanism to stimulate transcription (Lauberth et al., 2013; Vermeulen et al., 2007). Recent work in embryonic stem cells shows that H3K4me3 plays a role in RNA Polymerase II pause-release and elongation, rather than transcriptional initiation (H. Wang et al., 2023). In plants, there are no TAF3 homologues and whether similar mechanisms exist to promote transcription downstream of H3K4me3 remains unknown.
[0253] H3K4me3 also plays a key role in several other genome-related processes. One example is meiotic recombination, which allows for genetic exchange between parental chromosomes (Borde et al., 2009). This process is essential for breeders to transfer traits from wild relatives into elite cultivars. However, the positioning of meiotic recombination events is non-uniform across the genome, with gene-rich chromosomal arms having relatively higher rates of recombination, and centromeric repeat-rich regions being more recombination suppressed (Rowan et al., 2019; Tock et al., 2021). Loci encoding desirable traits that reside within these low-recombining regions of the genome are largely intractable to selection by traditional breeding approaches. Further, in mammals, the H3K4me3 SET domain methyltransferase-containing protein, PRDM9, helps to define active sites of crossover recombination (Powers et al., 2016). While H3K4me3 levels are positively correlated with recombination rates in plants (Choi et al., 2013; Rowan et al., 2019), whether deposition of H3K4me3 is sufficient to promote crossover formation is entirely unknown.
[0254] One way to reveal the causal role of an epigenetic modification is to directly deposit the mark at targeted genomic locations (Dubois and Roudier, 2021; Harris et al., 2023; Policarpi and Hackett, 2021; Villiger et al., 2024; Wang and Yamaguchi, 2024). Epigenome editing tools, such as the CRISPR-based SunTag system (Tanenbaum et al., 2014), have proven to be highly efficient at targeting DNA methylation addition, removal and histone modification (Gallego-Bartolome et al., 2018; Papikian et al., 2019; Pflueger et al., 2018; Policarpi et al., 2024; M. Wang et al., 2023), although their utility for modulating agronomically useful traits is only just beginning to be realized (Selma and Orzaez, 2021), and is limited by the range of possible marks that can be directed (Lloyd and Lister, 2021). In the model plant Arabidopsis thaliana, SDG2 is the main histone methyltransferase responsible for deposition of H3K4me3 (Guo et al., 2010).
[0255] Here, we describe the first CRISPR-directed H3K4me3 depositor in plants and demonstrate its utility in a range of scenarios. We translationally fused the SDG2 methyltransferase domain to the scFv single chain antibody, which can be recruited to the SunTag epitope chain connected to dead Cas9 (dCas9). We showed that SunTag-SDG2 mediated H3K4me3 deposition can activate the expression of endogenous genes. Further, by targeting a disease resistance gene, we generated plants with enhanced resistance capabilities. Next, we showed that SunTag-SDG2 can unlock meiotic crossover recombination across centromeric regions. Finally, we incorporated the methyltransferase domain from PRDM9 into SunTag - which was recently shown to instruct transcription in a mammalian context (Policarpi et al., 2024) - showing that it is also highly efficient for transcriptional activation in plants, and displays reduced off-target potential. These results demonstrate the power and versatility of a functional H3K4me3 targeting system for modulation of critical genome-templated processes, such as transcriptional enhancement and meiotic recombination.
Results and Discussion
SunTag SDG2 Mediates Transcriptional Activation of FWA
[0256] The SunTag system we chose is composed of three components (Gardiner et al., 2022; Tanenbaum et al., 2014) (FIG. 1A): 1) A catalytically deactivated (nuclease deficient) Cas9 (dCas9) translationally fused to a tail with 10x copies of the GCN4 epitope, each separated by a 22 amino acid flexible linker. 2) An effector module that encodes a single chain fragment variable (scFv) antibody that recognizes GCN4, super folder GFP (sfGFP), and the effector of interest (e.g. enzymatic modifier / recruitment scaffold). 3) U6 promoter driven CRISPR guide RNA. When the three components are expressed within the same cell, the guide RNA recruits the dCas9-10xGCN4 to the target locus of interest, while the GCN4 single chain antibody containing effector module binds to the dCas9-10xGCN4 epitope tail. The system allows for conformational flexibility and the potential for high stoichiometric concentration of the effector at the locus of interest.
[0257] We inserted the C-terminal coding sequence of the H3K4 methyltransferase, SDG2 (amino acids 1571-2335), into the SunTag system (FIGS. 1B-1E). As a control, we also generated a version of SDG2 with an amino acid change (Y1903F) predicted to eliminate catalytic activity (FIGS. 2A-2B) (Guo et al., 2010). Hereafter, we refer to these as SDG2 and dSDG2 (for catalytically deactivated). We targeted the SunTag system to the promoter of the epigenetically repressed FWA gene (Soppe et al., 2000), using the previously published CRISPR guide RNA 4, which is complementary to two tandem repeat regions directly upstream of the FWA transcription start site (TSS) (Gallego-Bartolome et al., 2018). These constructs were transformed into the rdr6 background, as previous results have shown that SunTag is less prone to silencing and achieves higher gene expression in this background (M. Wang et al., 2023). We observed that SunTag:SDG2:FWA_g4 activated FWA mRNA expression, while the SunTag:dSDG2:FWA_g4 and no guide RNA (SunTag:SDG2:No_g) controls did not (FIG. 3A and FIG. 1E). Importantly, effector module expression was detected in all constructs, indicating that both guide RNA targeting and catalytic activity are required for FWA activation (FIG. 3B). To validate the presence of the SunTag and test for the deposition of H3K4me3, we performed ChIP-qPCR and observed significant enrichment both for the presence of SunTag and for H3K4me3 accumulation at the target site in these transgenic lines (FIGS. 3C-3D), confirming that SunTag-SDG2 binds the FWA locus and deposits H3K4me3. To investigate SunTag:SDG2:FWA_g4 binding genome-wide, we performed high-throughput sequencing (anti-HA ChIP-seq), which confirmed the previously reported high specificity of SunTag targeting (Papikian et al., 2019), with the FWA target locus as the top binding target (FIGS. 3E-3I).
Only 3 conserved binding sites were identified in the SunTag:SDG2:FWA_g4 lines: FWA and two off-target sites previously identified as having sequence similarity to FWA_g4 (Papikian et al., 2019). Furthermore, transcriptome analyses confirmed FWA as the upregulated differentially expressed gene with the highest fold change (FIG. 3J). Together, these data indicate that deposition of H3K4me3 by SunTag:SDG2 is sufficient to overcome epigenetic silencing and drive transcriptional activation in a locus-specific manner.
SNC1 mediated modulation of disease resistance
[0258] Given that SunTag:SDG2 can modulate chromatin and activate gene expression, we were interested in using the tool to modify an important plant trait. Gene expression during pathogen attack is a key factor in determining disease outcome. SNC1 is a disease resistance gene, encoding a nucleotide-binding leucine-rich repeat (NLR) (M. Y. Wang et al., 2023) that sits within the partially epigenetically repressed RPP5 gene cluster (Yi and Richards, 2009). SNC1 expression activity is modulated by H3K4me3 (Yang et al., 2023), and its level of expression is positively correlated with basal resistance (Yang et al., 2023; Yi and Richards, 2009). Therefore, SNC1 is an attractive target for proof-of-principle modulation of disease resistance by epigenome engineering.
[0259] We designed a guide RNA to target the promoter region of SNC1 directly upstream of the TSS and inserted it into our SunTag:SDG2 and SunTag:dSDG2 constructs (FIG. 4A). As high pathogen resistance is generally associated with dwarfed plants, we inspected the lines for growth and developmental phenotypes. In our SNC1 targeting SunTag:SDG2:SNC1_g4 lines, but not our SunTag:dSDG2:SNC1_g4 controls, we observed a small, dwarfed phenotype with reduced rosette diameter (FIGS. 4A-4B). These plant phenotypes resembled those of our positive control bal lines, which overexpress SNC1, resulting in the small dwarfed plant phenotype and high basal disease resistance (Yi and Richards, 2009). As the phenotype is consistent with upregulation of SNC1 by SunTag, we confirmed the increased expression and successful targeting of SunTag:SDG2 to the SNC1 locus (FIGS. 4C-4D).
[0260] To assess whether these lines show increased pathogen resistance, we challenged the plants with the generalist plant pathogen, Pseudomonas syringae pathovar tomato strain DC3000 (Pst). The recently described bioluminescent Pst (Pst::LUX) was used to allow for non-destructive quantification of pathogen colonization (Furci et al., 2021). We developed a high-throughput assay system, inoculating 5-day-old seedlings with Pst::LUX and imaging to quantify the levels of Pst::LUX at 24-hour intervals over the course of infection. We challenged a range of control genotypes that are known to be Pst hyperresistant (bal, edr1) or hyporesistant (NahG), recapitulating the expected pattern of Pst::LUX colonization dynamics in these backgrounds (FIG. 4E) (Furci et al., 2021). Next, we challenged our SNC1 targeting lines with Pst::LUX, finding that SunTag:SDG2:SNC1_g4 lines displayed hyperresistance that was nearly identical in magnitude to that of the SNC1 overexpressing positive control, bal (FIG. 4F). The results demonstrate that epigenome engineering of a single defense gene, SNC1, is sufficient to generate plants with improved disease resistance phenotypes.
Centromeric targeting of SDG2 drives increased meiotic crossover recombination
[0261] Beyond transcription, H3K4me3 is associated with many other fundamental genome-related processes including meiotic recombination. In plants, meiotic crossover recombination rates broadly correlate with epigenetic territories and are non-uniform across the genome (Rowan et al., 2019; Tock et al., 2021). In A. thaliana, recombination occurs at a relatively high frequency in euchromatic arms, is significantly reduced over the more heterochromatic pericentromeric regions, and drops to virtually zero over the centromere (Naish et al., 2021; Yelina et al., 2012). Crossover hotspots are often found at gene promoters where H3K4me3 is enriched (Choi et al., 2013). In previous work, we showed that gene-associated crossover recombination hotspots can be suppressed by small RNA directed DNA methylation (Yelina et al., 2015). Therefore, we hypothesized that recombination within the centromeric regions might be increased by SunTag mediated epigenetic activation via deposition of H3K4me3.
[0262] The Arabidopsis centromeres are composed of highly repetitive CEN178 satellite repeat arrays (Naish et al., 2021; Wlodzimierz et al., 2023). We designed a single guide RNA that perfectly matches 250 individual CEN178 repeats within the centromere region of chromosome 3 (LRCen3_g), and inserted this guide into the SunTag:SDG2 system. As centromere-proximal crossover recombination events are relatively rare, to quantitatively assess the impact on recombination rate over centromere 3, we crossed (non-transgenic) Col-0 plants with CTL3.9 (Wu et al., 2015), which is a transgenic line that encodes red and green fluorescent markers flanking CEN3. The red and green T-DNA insertions span a region of approximately 9Mb (positions 9.7Mb and 18.8Mb, respectively, on the ColCEN assembly) of chromosome 3, while the centromeric satellite arrays are located from 13.6-15.7Mb (FIG. 5A). Meiotic recombination events that occur within this region result in progeny inheriting either red or green fluorescent markers, the frequency of which can be measured by automated seed imaging (Kbiri et al., 2022). We transformed CTL3.9 F1 double hemizygous plants with the SunTag:SDG2:LRCen3_g construct (FIGS. 5A-5B). The F2 plants represented individual transformants of the SunTag (T1 for SunTag), and so F3 seeds from individual F2 plants were imaged to measure the rate of recombination, in centiMorgans, over the centromere 3 spanning region of CTL3.9 (Fernandes et al., 2024).
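The guide design step described above, counting how many repeats a candidate guide matches, can be illustrated with a minimal Python sketch. The function names and the toy sequences are hypothetical and are not CEN178 or LRCen3_g sequences; the sketch only scans the forward strand and ignores PAM requirements, so it is a simplification rather than the procedure actually used.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def count_guide_sites(guide: str, repeats: list[str], max_mismatches: int = 0) -> int:
    """Count repeats with at least one window within max_mismatches of the guide."""
    k = len(guide)
    hits = 0
    for rep in repeats:
        # Slide a guide-length window along the repeat (forward strand only).
        windows = (rep[i:i + k] for i in range(len(rep) - k + 1))
        if any(hamming(guide, w) <= max_mismatches for w in windows):
            hits += 1
    return hits


# Toy data (hypothetical sequences, for illustration only).
guide = "ACGTACGTAC"
repeats = [
    "TTACGTACGTACTT",  # contains a perfect match
    "TTACGTACCTACTT",  # contains a single-mismatch match
    "TTTTTTTTTTTTTT",  # no match
]
perfect = count_guide_sites(guide, repeats, max_mismatches=0)
relaxed = count_guide_sites(guide, repeats, max_mismatches=1)
print(perfect, relaxed)
```

Raising `max_mismatches` from 0 to 1 increases the number of matched repeats in the toy data, which is the kind of mismatch tolerance that matters when a guide is aimed at a highly repetitive array.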
[0263] Remarkably, the SunTag:SDG2:LRCen3_g containing lines showed significantly elevated crossover recombination within the CTL3.9 interval, which was not observed in the no guide RNA controls (ANOVA with post-hoc Tukey HSD cutoff <0.05) (FIGS. 5C-5H). Some individual transgenic plants displayed a substantial increase in CTL3.9 crossover recombination rate, over 50% higher than the average in non-transgenics, across this megabase-scale centromere-spanning region. Progeny from one of the most highly recombining independent lines, which also showed Mendelian inheritance of the CTL3.9 T-DNAs, were sown to assess whether the effect could persist in the subsequent generation (F4). The plants retained a significantly elevated crossover frequency within CTL3.9 in the next generation, suggesting that the SunTag:SDG2:LRCen3_g effect is relatively stable in the presence of the transgene (FIG. 5C).
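The crossover frequencies discussed above are expressed in centiMorgans derived from counts of fluorescent seed. The text does not state which formula was applied, so the following Python sketch is an assumption: it uses a formula commonly applied to seed from self-fertilized plants hemizygous for two linked fluorescent markers, cM = 100 × (1 − √(1 − 2(N_G + N_R)/N_T)), where N_G and N_R are the counts of green-only and red-only seed and N_T is the total seed count. The function name and the example counts are hypothetical.

```python
import math


def crossover_cM(n_green_only: int, n_red_only: int, n_total: int) -> float:
    """Genetic distance (cM) between two linked seed markers from selfed seed counts.

    Assumed formula: cM = 100 * (1 - sqrt(1 - 2 * (N_G + N_R) / N_T)).
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    single_color_fraction = 2 * (n_green_only + n_red_only) / n_total
    if single_color_fraction > 1:
        raise ValueError("single-color seed fraction too high for this formula")
    return 100 * (1 - math.sqrt(1 - single_color_fraction))


# Hypothetical counts, chosen only to show the arithmetic.
control_cM = crossover_cM(n_green_only=40, n_red_only=40, n_total=1000)
treated_cM = crossover_cM(n_green_only=60, n_red_only=60, n_total=1000)
percent_change = 100 * (treated_cM - control_cM) / control_cM
print(round(control_cM, 2), round(treated_cM, 2), round(percent_change, 1))
```

A relative change computed this way is what a statement such as "over 50% higher than the average in non-transgenics" refers to, although the counts shown here are invented.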
[0264] To examine the distribution of SunTag:SDG2:LRCen3_g in the high recombination lines (P4), we performed ChIP-seq (anti-HA) comparing progeny from sibling plants that either have (B12) or do not have (B5) the SunTag machinery, as well as non-transgenic (Col-0) controls. Chromosome-wide plots at 100 kb resolution showed that SDG2:g-LRCen3 is enriched over the entire centromeric region of chromosome 3 (FIG. 5I). SunTag was also enriched over the other four A. thaliana centromeres (FIGS. 6A-6D). As these regions are highly repetitive, we reasoned that the LRCen3 guide RNA might additionally bind the other centromeres due to mismatch tolerance. Indeed, while the LRCen3 guide RNA aligns perfectly to 250 loci within the centromere of chromosome 3, by allowing a single mismatch the guide is predicted to bind regions in all 5 centromeres, which is consistent with the ChIP-seq data observed (FIGS. 7A-7B). Having confirmed that SunTag is enriched within the centromeric regions in these lines, we were next interested to examine levels of H3K4me3. In the same lines, H3K4me3 ChIP-seq showed that levels were broadly increased throughout the genome (FIG. 5I). This increase was stronger in centromeric regions as compared to euchromatic arms, as expected, and was most prominent over the pericentromere. As the markers on CTL3.9 span the centromere and adjacent pericentromeric regions (Fernandes et al., 2024) (FIG. 5I), this is consistent with the significantly elevated crossover recombination rate observed. Overall, the results demonstrate that centromerically targeted SunTag-SDG2 can elevate the meiotic crossover potential of these typically recombination-suppressed regions.
Mammalian PRDM9 drives efficient transcriptional activation in Arabidopsis
[0265] As the enrichment of H3K4me3 in SunTag:SDG2:LRCen3_g was not exclusive to the guide RNA targeted centromeric regions, we reasoned that the SunTag system may afford some level of off-target activity due to overexpression and ectopic activity of the non-Cas9-bound SDG2 containing effector module. Consistent with this observation, we also noticed that our SunTag:SDG2 no guide RNA controls frequently exhibited pleiotropic developmental phenotypes (FIG. 8). To quantify global levels of H3K4me3, we performed bulk histone westerns, comparing no guide RNA lines to successful FWA expression-activating guide RNA containing lines. Importantly, the SDG2 effector itself is expressed to similar levels as in the FWA targeting lines (FIG. 3B and FIG. 3D). The SunTag:SDG2 no guide lines all showed significant (3-4 fold) increases in global levels of H3K4me3, as compared to the FWA guide containing lines (FIG. 9A). This indicates that the guide RNA may help to sequester scFv-SDG2 at the locus of interest, thereby reducing off-target activity. We also noticed that a small number of SunTag:dSDG2:FWA_g4 lines could activate FWA, but this effect was limited to lines where the dSDG2 effector was massively overexpressed (FIG. 9B). This suggests that either dSDG2 retains some residual catalytic activity, or that it can recruit endogenous machinery to initiate transcription.
[0266] In order to reduce off-target activity and the potential for co-recruitment of endogenous complexes, we replaced SDG2 with an orthogonal effector, PRDM9. PRDM9 is a mammalian-derived H3K4me3 methyltransferase, and Policarpi et al. (2024) recently showed that the catalytic domain could be incorporated into the SunTag system for targeting of H3K4me3 in mammalian cells. Therefore, we inserted the PRDM9 methyltransferase domain, as well as the catalytically deactivated version, into our SunTag system and targeted FWA (FIG. 10A). Strikingly, SunTag:PRDM9:FWA_g4 was highly effective at activating FWA mRNA expression (FIG. 10B) (23 / 32 lines in the T1 generation, 72%), while none of the dPRDM9 (0 / 16) and no-guide RNA (0 / 19) versions were able to do so (FIG. 10C, Table 5). Importantly, we could detect robust expression of the effector in all constructs (FIG. 8), and ChIP-qPCR confirmed the presence of SunTag and the enrichment of H3K4me3 at FWA (FIG. 10D). At the transcriptome level, the effect of SunTag:PRDM9:FWA_g4 was highly specific, with only 19 genes identified as differentially expressed (15 up, 4 down, FDR<0.01), and with FWA itself having the highest fold change among them (>158 fold change over rdr6) (FIG. 10E). Unlike the no-guide SDG2 lines, we did not observe any developmental defects in the PRDM9 no-guide lines, and correspondingly, we observed only minor transcriptional defects when comparing the no-guide lines to non-transformed rdr6 controls (FIGS. 10F-10G). Finally, we performed ChIP-seq in these lines, confirming FWA as the most significant and enriched peak for SunTag binding and revealing deposition of H3K4me3 at the transcriptional start site of FWA (FIG. 10H). Overall, the results demonstrate that PRDM9 is highly effective for deposition of H3K4me3 in plant as well as mammalian SunTag systems. Furthermore, as PRDM9 is not likely to have endogenous partners in the plant nucleus, the results provide strong evidence in favor of a direct role for H3K4me3 in transcriptional stimulation.
Table 5: FWA Activating T1 Lines Per Construct.
Conclusion
[0267] Here we show that SunTag mediated targeting of H3K4me3 facilitates control of gene expression states, can allow for improved pathogen resistance capabilities, and can drive targeted increases in centromere-proximal recombination. From a fundamental biological perspective, our data provide strong support in favor of an instructive role for H3K4me3 in transcription, as we show that site-specific deposition of H3K4me3 results in activation of mRNA expression. As the effect is dependent on the presence of a functional catalytic domain, and is observed using both native and non-native methyltransferases, co-recruitment of endogenous complexes can likely be ruled out as an explanatory factor in driving this transcriptional enhancement. Furthermore, we reveal H3K4me3 targeting as a novel strategy for stimulating crossover formation in low-recombining, repeat-rich regions of the genome. The results open the door to non-GM approaches for precise genome engineering and rational design of agriculturally relevant traits.
Materials & Methods
Plant Materials and Growth Conditions
[0268] All Arabidopsis thaliana lines were in a Col-0 ecotype background. Plants grown on plates were sown onto 1/2x Murashige-Skoog (MS) medium, 1% sucrose and 0.8% agar (pH 5.7), stratified for 2 d at 4 °C in the dark, then transferred to growth chambers (Percival CU-41L4D) at 21 °C under long day light conditions (16 hour light / 8 hour dark). Plants grown on soil were grown in F2 soil at 20 °C under long day light conditions (16 hour light / 8 hour dark). Lines: rdr6-15 (SAIL_617_H07) and fwa lines were previously described (M. Wang et al., 2023). NahG and edr1 were kindly provided by Prof. Juriaan Ton and were previously described (Furci et al., 2021). bal lines were previously described (Yi and Richards, 2009). The CTL3.9 line was previously described (Fernandes et al., 2024; Wu et al., 2015). All transgenic plants were generated by Agrobacterium tumefaciens strain AGL1 using floral dip (Clough and Bent, 1998).
Plasmid Construction
[0269] To generate the SunTag:SDG2:FWA_g4 construct, the CDS of the catalytic domain of SDG2 (from amino acids 1571-2335) was amplified and cloned into the BsiWI linearized SunTag construct (previously described (Papikian et al., 2019)), using In-Fusion (Takara). To generate dSDG2, the conserved amino acid predicted to abrogate catalytic activity when mutated (Y1903F) was identified using ConSurf (Ashkenazy et al., 2016). Overlapping PCR was used to generate a version of SDG2 containing the desired mutation (Table 6), and the amplified PCR product was cloned into a BsiWI linearized SunTag-SDG2 construct by In-Fusion (Takara). To generate SunTag:PRDM9, the catalytic domain of mouse PRDM9 (110-417aa) (described in Policarpi et al., 2024) was synthesized (TWIST Bioscience) and cloned into the SunTag using the same BsiWI linearization and In-Fusion method described above. Subsequently, overlap PCR was used to generate a version of PRDM9 with the G282A mutation (Table 6). The SNC1, LRCen3, and no-guide controls were cloned using the protocol described in (Ghoshal et al., 2021) (Table 6).
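Because only a fragment of each methyltransferase is cloned, a substitution such as Y1903F has to be mapped from protein numbering onto the fragment before primers are designed. The Python sketch below shows that bookkeeping under a stated assumption: that the substitution codes are numbered relative to the full-length protein, which the text does not state explicitly. The function names and the toy fragment are hypothetical and do not represent the SDG2 or PRDM9 sequences.

```python
import re


def parse_substitution(code: str) -> tuple[str, int, str]:
    """Split a code such as 'Y1903F' into (wild-type residue, position, new residue)."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if m is None:
        raise ValueError(f"not a substitution code: {code!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def apply_substitution(fragment: str, fragment_start: int, code: str) -> str:
    """Apply a substitution to a fragment whose first residue is fragment_start (1-based).

    Assumes the substitution code uses full-length protein numbering.
    """
    wild_type, position, mutant = parse_substitution(code)
    index = position - fragment_start  # 0-based index within the fragment
    if not 0 <= index < len(fragment):
        raise ValueError("position lies outside the fragment")
    if fragment[index] != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {fragment[index]}")
    return fragment[:index] + mutant + fragment[index + 1:]


# Toy fragment (hypothetical): starts at residue 1571 and has Y at residue 1903.
toy_fragment = "A" * 332 + "Y" + "A" * 10
toy_mutant = apply_substitution(toy_fragment, 1571, "Y1903F")
print(toy_mutant[332])
```

Checking the wild-type residue before substituting guards against an off-by-one error in the numbering offset, which is the most likely mistake when working with a truncated coding sequence.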
[0270] Sequences of constructs used in these experiments included the following:
[0271] >pEG302-l-SunTag:PRDM9:FWA_g4 (SEQ ID NO: 58)
[0272] >pEG302-l-SunTag:dPRDM9:FWA_g4 (SEQ ID NO: 59)
[0273] >pEG302-l-SunTag:SDG2:LRCen3 (SEQ ID NO: 60)
[0274] >pEG302-l-SunTag:SDG2:No_g (SEQ ID NO: 61)
[0275] >pEG302-l-SunTag:dSDG2:FWA_g4 (SEQ ID NO: 62)
[0276] >pEG302-l-SunTag:SDG2:FWA_g4 (SEQ ID NO: 63)
Table 6: Oligo List.
Chromatin Immunoprecipitation
[0277] The chromatin immunoprecipitation experiments were performed as previously described (Harris et al., 2024). 2 g of 14-day-old seedlings were collected per biological replicate. For SunTag-SDG2 FWA and SunTag-SDG2 SNC1, the input material was pooled T2 plants. For SunTag:PRDM9:FWA_g4 in rdr6, 2 independent T3 lines expressing FWA and 2 independent T3 no-guide (NoG) lines were used alongside rdr6 controls. 7 µl of anti-HA (3F10, Merck) and 5 µl of anti-H3K4me3 (ab8580, Abcam) were added to up to 2 ml of sheared chromatin per sample for pulldown. The ChIP purified DNA was directly used for ChIP-qPCR and ChIP-seq library preparation. Primer information is listed in Table 6. Values were normalized to input ($2^{-(\mathrm{Ct}_{\mathrm{ChIP}} - \mathrm{Ct}_{\mathrm{Input}})} \times 100$). ChIP-seq libraries were generated using NuGen Ovation Ultra Low System V2 kits according to manufacturer’s instructions and were sequenced on an Illumina NovaSeq X instrument with 150bp paired-end reads.
Crossover Measurement
[0278] For the recombination rate experiment, SunTag:SDG2:LRCen3_g and No_g plasmids were transformed into F1 plants derived from a cross between Col-0 and CTL3.9 (double homozygous for the seed coat fluorescent markers) (Wu et al., 2015). 70-90 individual T1 transgenic plants (F2 generation) were recovered per construct after selection on hygromycin. Seeds derived from individual T1s (F3) were imaged and scored for recombination rate using the previously described SeedScoring CellProfiler pipeline (Fernandes et al., 2024; Kbiri et al., 2022), with approximately 1,000 seeds per biological sample. Seed sets were prescreened to remove lines in which either of the fluorescent markers had become homozygous (presence/absence) in the F2 generation. For the F4 data, seeds derived from the most highly recombining and non-distorted line (P4) were sown (F3) and grown for a generation, and seeds from individual plants were collected (F4) for recombination rate analysis as above, again filtering out seed sets in which either of the fluorescent markers had become homozygous (presence/absence) in the previous generation.
Western Blots
[0279] Western blots were performed as previously described (Ichino et al., 2021) with the minor modifications described below. For detection of the large dCas9-10xGCN4 and SDG2 effector components as shown in Fig. S1, single leaf punch (1 cm diameter) samples from individual plants were used as input. Tissue was ground in a Tissuelyser II (Retsch), resuspended in 100 µl of a 60:40 ratio of 2X SDS:8 M urea, mixed, and boiled at 95 °C for 5 minutes. Samples were centrifuged at >10,000 x g to remove debris and loaded onto a 3-8% Tris-acetate gel using MOPS running buffer. After electrophoresis, protein was transferred from the gel to a methanol-activated PVDF membrane in Tobin buffer (25 mM Tris, 192 mM glycine, 20% methanol, 0.035% SDS). Membranes were blocked in 5% milk (w/v) in PBST (phosphate-buffered saline, 0.1% Tween 20) and then incubated with HRP-conjugated anti-HA at a 1:3,000 dilution in blocking buffer. The blot was imaged by ECL detection.
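As a small worked example of the buffer arithmetic above, the sketch below computes component amounts for the stated Tobin buffer composition and splits the 100 µl resuspension volume at the 60:40 ratio. It assumes Tris base (121.14 g/mol) and glycine (75.07 g/mol), and it assumes that the methanol percentage is v/v and the SDS percentage is w/v, which the text does not specify. The function names are hypothetical.

```python
# Molecular weights in g/mol (standard values for Tris base and glycine).
MW_TRIS_BASE = 121.14
MW_GLYCINE = 75.07


def tobin_buffer_recipe(volume_l: float) -> dict:
    """Component amounts for the transfer buffer composition given in the text."""
    return {
        "tris_base_g": 0.025 * MW_TRIS_BASE * volume_l,  # 25 mM Tris
        "glycine_g": 0.192 * MW_GLYCINE * volume_l,      # 192 mM glycine
        "methanol_ml": 0.20 * volume_l * 1000,           # 20% (v/v assumed)
        "sds_g": 0.035 / 100 * volume_l * 1000,          # 0.035% (w/v assumed)
    }


def lysis_mix(total_ul: float, sds_parts: float = 60, urea_parts: float = 40) -> dict:
    """Split a resuspension volume into 2X SDS buffer and 8 M urea at a 60:40 ratio."""
    total_parts = sds_parts + urea_parts
    urea_ul = total_ul * urea_parts / total_parts
    return {
        "sds_2x_ul": total_ul * sds_parts / total_parts,
        "urea_8m_ul": urea_ul,
        "final_urea_molar": 8.0 * urea_ul / total_ul,
    }


recipe = tobin_buffer_recipe(1.0)  # amounts for 1 litre
mix = lysis_mix(100)               # 100 µl resuspension volume
```

Under these assumptions, the 60:40 mix corresponds to 60 µl of 2X SDS buffer and 40 µl of 8 M urea per sample.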
[0280] For the western blots used to detect levels of H3K4me3, 100 mg of 7-day-old seedlings were frozen in liquid nitrogen and ground using a Tissuelyser II (Retsch). Total histones were extracted using the EpiQuik™ Total Histone Extraction Kit (EpigenTek) according to the manufacturer's instructions. 5 µl of sample for H3 and 20 µl of sample for H3K4me3 were loaded onto a NuPage 4-12% Bis-Tris gel (Invitrogen) in NuPage MOPS SDS running buffer (Invitrogen). Transfer from gel to PVDF membrane was performed using an iBlot 3 machine with the dedicated transfer stack (Invitrogen). Following transfer, the membrane was blocked in 5% milk (w/v) in TBST (Tris-buffered saline, 0.1% Tween 20) with shaking at room temperature for 1 hour. Blocking buffer was discarded and membranes were incubated with primary antibody (anti-H3, ab1791, and anti-H3K4me3, ab8580, Abcam) in blocking buffer (1:1,000) with shaking overnight at 4 °C. The following morning, the membrane was rinsed in TBST 5 times for 5 minutes at room temperature. Secondary antibody was added for 1 hour (goat anti-mouse HRP, 1:10,000 in blocking buffer, ab6789, Abcam). The membrane was washed again 5 times for 5 minutes. Detection was performed using SuperSignal West Pico chemiluminescent substrate (#34080, Thermo) and an ImageQuant 800 (Amersham).
Fluorescent Imaging
[0281] Roots of 7-day-old plants grown on vertical 1/2x MS solid media plates were imaged using a Leica DM6000B epifluorescence microscope.
Rosette Size Surface Area Measurement
[0282] 3-week-old plants grown on soil were imaged using a Google Pixel 7 mounted on a tripod. Images were then analyzed using Fiji.
Pseudomonas syringae Colonization Assay
[0283] Infection assays were performed with a P. syringae pathovar tomato DC3000 (Pst) strain carrying a stable chromosomal insertion of the luxCDABE operon from Photorhabdus luminescens (Pst::LUX) (Fan et al., 2008), using a slightly modified version of a previously published protocol (Furci et al., 2021). Plants were grown on 1/2X MS solid media supplemented with 200 µg/ml Timentin in 96-well plates (Falcon) for 7 days. Prior to inoculation, Pst::LUX was grown overnight at 28 °C in King's B media containing 50 µg/ml kanamycin and 50 µg/ml rifampicin. Bacteria were pelleted by centrifugation, washed in 10 mM MgSO4, and finally resuspended to an OD600 of 0.2 in 10 mM MgSO4 containing 0.015% v/v Silwet L-77. 7-day-old seedlings were sprayed with the bacterial solution, and the 96-well plates were sealed with parafilm to maintain 100% relative humidity. Seedlings were imaged at 1-4 days post infection (dpi). Prior to imaging with an ImageQuant 800 (Amersham), plates were left in the dark for 2 min. Bacterial bioluminescence images were acquired with a 4 min exposure time, and bright field images were taken using the OD settings of the ImageQuant 800. Image-based quantification of bioluminescence was carried out in Fiji. For each well, the well outline was obtained from the bright field image and added to the ROI Manager. These ROIs were then transposed onto the bioluminescence images. The bioluminescence intensity from infected seedlings was obtained using the Fiji functions Analyze, Measure and Mean value.
RT-qPCR
[0284] RNA was extracted from the indicated plant material using TRIzol reagent (Invitrogen) and the Direct-zol RNA MiniPrep kit (Zymo), including in-column DNase treatment, following the manufacturer's instructions. cDNA was synthesized using SuperScript IV (Invitrogen). qRT-PCRs were performed using Luna Universal qPCR Master Mix (NEB) and a CFX Connect Real-Time PCR Detection System (Bio-Rad). For SunTag-SDG2 targeting FWA and SNC1, RNA was extracted from 14-day-old whole seedlings grown on 1/2x MS plates. SNC1-targeted lines were treated with Pst::LUX 4 h prior to harvesting. For SunTag:PRDM9 qRT-PCR, RNA was extracted from 4-week-old leaf tissue of all T1 lines and 2 independent T3 lines per construct. 450 ng of total RNA was used for cDNA synthesis. Primers are listed in Table 6.
Quant-Seq Library Construction
[0285] For SunTag-SDG2 QuantSeq, 350 ng of total RNA from ~10 pooled T2 plants was used as input material. For SunTag:PRDM9 QuantSeq, 350 ng of total RNA from 3 T1 SunTag:PRDM9:FWA_g4 plants expressing FWA, 3 T1 no-guide SunTag:PRDM9 lines, and rdr6 plants was used for library preparation. Libraries were prepared using the QuantSeq 3' mRNA-Seq Library Prep Kit FWD (Lexogen) according to the manufacturer's instructions and sequenced on an Illumina NovaSeq 6000 instrument (PE150).
Bioinformatic Analysis
ChIP-seq Analysis
[0286] The ChIP-seq analysis pipeline was run as previously described with minor modifications. Bowtie2 (version 2.5.0) was used to map the PE150 read data in fastq format to the TAIR10 genome (--no-unal), and alignments were converted to BAM format using Samtools (version 1.10). Reads were de-duplicated using the samtools fixmate and markdup commands. Tracks were generated in DeepTools (version 3.5.1) using bamCoverage (--normalizeUsing RPGC --effectiveGenomeSize 135000000 --binSize 10) with multicopy regions blacklisted (--blackListFileName) using the regions identified in Klasfeld et al., 2022. For analysis of reads mapping to the centromere, a modified version of the previously described pipeline (Wlodzimierz et al., 2023) was used. Briefly, PE150 read data in fastq format were mapped to the Col-CEN genome (Naish et al., 2021) using Bowtie2 (version 2.2.5) with (--very-sensitive -k 200 --no-unal --no-discordant) and were filtered for primary alignments using samtools view (-F 256 -q 5) prior to deduplication and track generation, as above. Peaks were called using MACS2 (version 2.2.9.1) with default parameters. Correlation plots were generated in DeepTools using multiBamSummary (--binSize 25) and plotCorrelation (-c pearson --removeOutliers --plotNumbers -p heatmap).
QuantSeq Analysis
[0287] In accordance with the manufacturer's (Lexogen) instructions, only read 1 (*1.fq.gz) from the PE150 data was used for downstream analysis. Briefly, reads were trimmed using cutadapt (version 1.18) and mapped to the TAIR10 genome using STAR (version 2.7.10b) (--quantMode GeneCounts --alignIntronMax 10000 --outSAMmultNmax 20). Gene counts were used as input for DESeq2 analysis in R. Differentially expressed genes were defined using an adjusted p-value (Bonferroni method) cutoff of <0.01.
gRNA Binding Analysis:
[0288] For the LRCen3 binding and mismatch analysis, the LRCen3 guide RNA sequence (Table 6) was mapped to the Col-CEN genome using bowtie2 (version 2.2.5). The following options were used for no mismatches (-f -a --end-to-end --np 0 --score-min L,0,0) and for 1 mismatch (-f -a --end-to-end --score-min L,0,-1). Mapped reads were sorted and converted to BAM format using Samtools (version 1.10), and subsequently converted to .bed format using the bamtobed function in bedtools (version 2.20.1). Bed files were used for downstream analysis in R to generate chromosome-wide density plots over 100 kb regions.
References
Ashkenazy H, Abadi S, Martz E, Chay O, Mayrose I, Pupko T, Ben-Tal N. 2016. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res 44:W344-W350. doi:10.1093/nar/gkw408
Borde V, Robine N, Lin W, Bonfils S, Geli V, Nicolas A. 2009. Histone H3 lysine 4 trimethylation marks meiotic recombination initiation sites. EMBO J 28:99-111. doi:10.1038/emboj.2008.257
Choi K, Zhao X, Kelly KA, Venn O, Higgins JD, Yelina NE, Hardcastle TJ, Ziolkowski PA, Copenhaver GP, Franklin FCH, Mcvean G, Henderson IR. 2013. Arabidopsis meiotic crossover hot spots overlap with H2A.Z nucleosomes at gene promoters. Nat Genet 45:1327-1338. doi:10.1038/ng.2766
Clough SJ, Bent AF. 1998. Floral dip: A simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. Plant J 16:735-743. doi:10.1046/j.1365-313X.1998.00343.x
Dubois A, Roudier F. 2021. Deciphering Plant Chromatin Regulation via CRISPR/dCas9-Based Epigenome Engineering. Epigenomes 5:17. doi:10.3390/epigenomes5030017
Fernandes JB, Naish M, Lian Q, Burns R, Tock AJ, Rabanal FA, Wlodzimierz P, Habring A, Nicholas RE, Weigel D, Mercier R, Henderson IR. 2024. Structural variation and DNA methylation shape the centromere-proximal meiotic crossover landscape in Arabidopsis. Genome Biol 25:1-31. doi:10.1186/s13059-024-03163-4
Furci L, Pardo DP, Ton J. 2021. A rapid and non-destructive method for spatial-temporal quantification of colonization by Pseudomonas syringae pv. tomato DC3000 in Arabidopsis and tomato. Plant Methods 1-8. doi:10.1186/s13007-021-00826-2
Gallego-Bartolome J, Gardiner J, Liu W, Papikian A, Ghoshal B, Kuo HY, Zhao JM-C, Segal DJ, Jacobsen SE. 2018. Targeted DNA demethylation of the Arabidopsis genome using the human TET1 catalytic domain. Proc Natl Acad Sci 115:201716945. doi:10.1073/pnas.1716945115
Gardiner J, Ghoshal B, Wang M, Jacobsen SE. 2022. CRISPR-Cas-mediated transcriptional control and epimutagenesis. Plant Physiol 188:1811-1824. doi:10.1093/plphys/kiac033
Ghoshal B, Picard CL, Vong B, Feng S, Jacobsen SE. 2021. CRISPR-based targeting of DNA methylation in Arabidopsis thaliana by a bacterial CG-specific DNA methyltransferase. Proc Natl Acad Sci 1-8. doi:10.1073/pnas.2125016118
Guo L, Yu Y, Law JA, Zhang X. 2010. Set domain group2 is the major histone H3 lysine 4 trimethyltransferase in Arabidopsis. Proc Natl Acad Sci U S A 107:18557-18562. doi:10.1073/pnas.1010478107
Harris CJ, Amtmann A, Ton J. 2023. Epigenetic processes in plant stress priming: Open questions and new approaches. Curr Opin Plant Biol 75:102432. doi:10.1016/j.pbi.2023.102432
Harris CJ, Zhong Z, Ichino L, Feng S, Jacobsen SE. 2024. H1 restricts euchromatin-associated methylation pathways from heterochromatic encroachment. Elife 1-17.
Howe FS, Fischl H, Murray SC, Mellor J. 2017. Is H3K4me3 instructive for transcription activation? BioEssays 39:1-12. doi:10.1002/bies.201600095
Ichino L, Boone BA, Strauskulage L, Harris CJ, Kaur G, Gladstone MA, Tan M, Feng S, Jami-Alahmadi Y, Duttke SH. 2021. MBD5 and MBD6 couple DNA methylation to gene silencing through the J-domain protein SILENZIO. Science 372:1434-1439.
Kbiri N, Dluzewska J, Henderson IR, Ziolkowski PA. 2022. Quantifying Meiotic Crossover Recombination in Arabidopsis Lines Expressing Fluorescent Reporters in Seeds Using SeedScoring Pipeline for CellProfiler. In: Lambing C, editor. Plant Gametogenesis. Methods in Molecular Biology. New York, NY: Springer US. pp. 121-134. doi:10.1007/978-1-0716-2253-7_10
Klasfeld S, Roule T, Wagner D. 2022. Greenscreen: A simple method to remove artifactual signals and enrich for true peaks in genomic datasets including ChIP-seq data. Plant Cell 34:4795-4815. doi:10.1093/plcell/koac282
Lauberth SM, Nakayama T, Wu X, Ferris AL, Tang Z, Hughes SH, Roeder RG. 2013. H3K4me3 interactions with TAF3 regulate preinitiation complex assembly and selective gene activation. Cell 152:1021-1036. doi:10.1016/j.cell.2013.01.052
Lloyd JPB, Lister R. 2021. Epigenome plasticity in plants. Nat Rev Genet. doi:10.1038/s41576-021-00407-y
Naish M, Alonge M, Wlodzimierz P, Tock AJ, Abramson BW, Schmucker A, Mandakova T, Jamge B, Lambing C, Kuo P, Yelina N, Hartwick N, Colt K, Smith LM, Ton J, Kakutani T, Martienssen RA, Schneeberger K, Lysak MA, Berger F, Bousios A, Michael TP, Schatz MC, Henderson IR. 2021. The genetic and epigenetic landscape of the Arabidopsis centromeres. Science 374. doi:10.1126/science.abi7489
Papikian A, Liu W, Gallego-Bartolome J, Jacobsen SE. 2019. Site-specific manipulation of Arabidopsis loci using CRISPR-Cas9 SunTag systems. Nat Commun 10. doi:10.1038/s41467-019-08736-7
Perez MF, Sarkies P. 2023. Histone methyltransferase activity affects metabolism in human cells independently of transcriptional regulation. PLoS Biol 21:1-31. doi:10.1371/journal.pbio.3002354
Pflueger C, Tan D, Swain T, Nguyen T, Pflueger J, Nefzger C, Polo JM, Ford E, Lister R. 2018. A modular dCas9-SunTag DNMT3A epigenome editing system overcomes pervasive off-target activity of direct fusion dCas9-DNMT3A constructs. Genome Res 28:1193-1206. doi:10.1101/gr.233049.117
Policarpi C, Hackett JA. 2021. Epigenetic editing: Dissecting chromatin function. BioEssays 1-16.
Policarpi C, Munafò M, Tsagkris S, Carlini V, Hackett JA. 2024. Systematic epigenome editing captures the context-dependent instructive function of chromatin modifications. Nat Genet 56:1168-1180. doi:10.1038/s41588-024-01706-w
Powers NR, Parvanov ED, Baker CL, Walker M, Petkov PM, Paigen K. 2016. The Meiotic Recombination Activator PRDM9 Trimethylates Both H3K36 and H3K4 at Recombination Hotspots In Vivo. PLoS Genet 12:1-24. doi:10.1371/journal.pgen.1006146
Rowan BA, Heavens D, Feuerborn TR, Tock AJ, Henderson IR, Weigel D. 2019. An Ultra High-Density Arabidopsis thaliana Crossover. Genetics 213:771-787.
Schwaiger M, Schonauer A, Rendeiro AF, Pribitzer C, Schauer A, Gilles AF, Schinko JB, Renfer E, Fredman D, Technau U. 2014. Evolutionary conservation of the eumetazoan gene regulatory landscape. Genome Res 24:639-650. doi:10.1101/gr.162529.113
Selma S, Orzaez D. 2021. Perspectives for epigenetic editing in crops. Transgenic Res 30:381-400. doi:10.1007/s11248-021-00252-z
Soppe WJ, Jacobsen SE, Alonso-Blanco C, Jackson JP, Kakutani T, Koornneef M, Peeters AJ. 2000. The late flowering phenotype of fwa mutants is caused by gain-of-function epigenetic alleles of a homeodomain gene. Mol Cell 6:791-802.
Tanenbaum ME, Gilbert LA, Qi LS, Weissman JS, Vale RD. 2014. A versatile protein tagging system for signal amplification in single molecule imaging and gene regulation. Cell 159:635-646. doi:10.1016/j.cell.2014.09.039
Tock AJ, Holland DM, Jiang W, Osman K, Sanchez-Moran E, Higgins JD, Edwards KJ, Uauy C, Franklin FCH, Henderson IR. 2021. Crossover-active regions of the wheat genome are distinguished by DMC1, the chromosome axis, H3K27me3, and signatures of adaptation. Genome Res 31:1614-1628. doi:10.1101/gr.273672.120
Vermeulen M, Mulder KW, Denissov S, Pijnappel WWMP, van Schaik FMA, Varier RA, Baltissen MPA, Stunnenberg HG, Mann M, Timmers HTM. 2007. Selective Anchoring of TFIID to Nucleosomes by Trimethylation of Histone H3 Lysine 4. Cell 131:58-69. doi:10.1016/j.cell.2007.08.016
Villiger L, Joung J, Koblan L, Weissman J, Abudayyeh OO, Gootenberg JS. 2024. CRISPR technologies for genome, epigenome and transcriptome editing. Nat Rev Mol Cell Biol. doi:10.1038/s41580-023-00697-6
Wang H, Fan Z, Shliaha PV, Miele M, Hendrickson RC, Jiang X, Helin K. 2023. H3K4me3 regulates RNA polymerase II promoter-proximal pause-release. Nature 615:339-348. doi:10.1038/s41586-023-05780-8
Wang H, Helin K. 2024. Roles of H3K4 methylation in biology and disease. Trends Cell Biol xx:1-14. doi:10.1016/j.tcb.2024.06.001
Wang M, Zhong Z, Gallego-Bartolome J, Li Z, Feng S, Kuo HY, Kan RL, Lam H, Richey JC, Tang L, Zhou J, Liu M, Jami-Alahmadi Y, Wohlschlegel JA, Jacobsen SE. 2023. A gene silencing screen uncovers diverse tools for targeted gene repression in Arabidopsis. Nat Plants. doi:10.1038/s41477-023-01362-8
Wang MY, Chen J Bin, Wu R, Guo HL, Chen Y, Li ZJ, Wei LY, Liu C, He SF, Du M Da, Guo Y long, Peng YL, Jones JDG, Weigel D, Huang JH, Zhu WS. 2023. The plant immune receptor SNC1 monitors helper NLRs targeted by a bacterial effector. Cell Host Microbe 31:1792-1803.e7. doi:10.1016/j.chom.2023.10.006
Wang X, Yamaguchi N. 2024. Cause or effect: Probing the roles of epigenetics in plant development and environmental responses. Curr Opin Plant Biol 81:102569. doi:10.1016/j.pbi.2024.102569
Wlodzimierz P, Rabanal FA, Burns R, Naish M, Primetis E, Scott A, Mandakova T, Gorringe N, Tock AJ, Holland D, Fritschi K, Habring A, Lanz C, Patel C, Schlegel T, Collenberg M, Mielke M, Nordborg M, Roux F, Shirsekar G, Alonso-Blanco C, Lysak MA, Novikova PY, Bousios A, Weigel D, Henderson IR. 2023. Cycles of satellite and transposon evolution in Arabidopsis centromeres. Nature 618:557-565. doi:10.1038/s41586-023-06062-z
Wu G, Rossidivito G, Hu T, Berlyand Y, Poethig RS. 2015. Traffic lines: New tools for genetic analysis in Arabidopsis thaliana. Genetics 200:35-45. doi:10.1534/genetics.114.173435
Yang L, Wang Z, Hua J. 2023. Multiple chromatin-associated modules regulate expression of an intracellular immune receptor gene in Arabidopsis. New Phytol 237:2284-2297. doi:10.1111/nph.18672
Yelina NE, Choi K, Chelysheva L, Macaulay M, de Snoo B, Wijnker E, Miller N, Drouaud J, Grelon M, Copenhaver GP, Mezard C, Kelly KA, Henderson IR. 2012. Epigenetic remodeling of meiotic crossover frequency in Arabidopsis thaliana DNA methyltransferase mutants. PLoS Genet 8:e1002844. doi:10.1371/journal.pgen.1002844
Yelina NE, Lambing C, Hardcastle TJ, Zhao X, Santos B, Henderson IR. 2015. DNA methylation epigenetically silences crossover hot spots and controls chromosomal domains of meiotic recombination in Arabidopsis. Genes Dev 29:2183-2202. doi:10.1101/gad.270876.115
Yi H, Richards EJ. 2009. Gene duplication and hypermutation of the pathogen Resistance gene SNC1 in the Arabidopsis bal variant. Genetics 183:1227-1234. doi:10.1534/genetics.109.105569
WO 2018/136783
U.S. Patent No. 11,692,198
Example 2: PRDM9 increases meiotic recombination over the CTL3.9 interval
[0289] This Example demonstrates that PRDM9 is able to increase meiotic recombination over the CTL3.9 interval, to a level similar to SDG2 as shown in Example 1.
Methods
[0290] Methods were the same as those described in Example 1. Briefly, for the recombination rate experiment, SunTag plasmids were transformed into F1 plants derived from a cross between the traffic line CTL3.9 (containing insertions of NAP:eGFP and NAP:dsRED) and Col-0 (Wu et al., 2015). As the red and green T-DNA insertions are separated by nearly 9 Mb, disruption from the T-DNA insertions (5 kb and 12 kb, respectively) was likely to be negligible at this scale. To help reduce the impact of transgene silencing in the absence of the rdr6 background (due to the impracticality of generating homozygous rdr6 in the double heterozygous CTL3.9 F1 background) and to ensure a sufficient number of scorable lines, the aim was to screen 70-100 T1s per construct. T1 transgenic plants (F2 plants) were recovered on hygromycin selection plates. Seeds harvested from T1 plants (F3 seeds) were first pre-screened using a Leica DFC310 FX dissecting microscope with ultraviolet filters. Only seed sets that contained both red and green markers were cleaned to remove plant debris and included in the analysis pipeline. A seed monolayer containing between 1,000 and 2,500 seeds was imaged with a Leica DFC310 FX dissecting microscope, first using brightfield, followed by UV through a dsRED filter and then UV through a GFP filter. Images were analyzed using a modified CellProfiler pipeline (version 4.2.5). For the F4 data, seeds derived from the most highly recombining and non-distorted line (P4) were sown (F3) and grown for a generation, and seeds from individual plants were collected (F4) for recombination rate analysis as above, again filtering out seed sets in which either of the fluorescent markers had become homozygous (presence/absence) in the previous generation.
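The text does not state how the red and green seed counts are converted into a recombination frequency. As a hedged illustration only, one formula commonly used for selfed seed from plants carrying two linked fluorescent reporters in cis is cM = 100 x (1 - [1 - 2(N_G + N_R)/N_T]^(1/2)), where N_G and N_R are the green-only and red-only seed counts and N_T is the total seed count. The sketch below assumes that formula. The function name and the counts are hypothetical, and the SeedScoring pipeline itself may differ in detail.

```python
import math


def crossover_frequency_cm(green_only: int, red_only: int, total: int) -> float:
    """Genetic distance (cM) from selfed-seed counts of a cis two-colour reporter interval."""
    if total <= 0:
        raise ValueError("total seed count must be positive")
    single_colour_term = 2 * (green_only + red_only) / total
    if single_colour_term > 1:
        raise ValueError("single-colour seeds exceed half of the total; check the counts")
    return 100 * (1 - math.sqrt(1 - single_colour_term))


# Hypothetical counts for one seed set of about 1,000 seeds (not data from this disclosure).
example_cm = crossover_frequency_cm(green_only=80, red_only=85, total=1000)
print(f"{example_cm:.2f} cM")
```

Seed sets in which one marker has become homozygous lack one of the single-colour classes and would bias this estimate, which is consistent with the filtering step described above.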
[0291] The PRDM9 fragment amino acid sequence is set forth in SEQ ID NO: 38. The plasmid map for SunTag:PRDM9:LRCen3_g is set forth in SEQ ID NO: 112.
Results
[0292] Meiotic crossover frequency was measured over the centromere-spanning CTL3.9 interval in plants transformed with SunTag:PRDM9:LRCen3_g. A significant increase was observed as compared to non-transformed control lines (p<0.001, two-sample, two-sided t-test), with some individuals showing more than 50% increase over this megabase-spanning region as compared to the median level in the non-transgenic control (e.g., recombination frequency of 24, as compared to the non-transgenic median of 17) (see FIG. 11). This pattern and level of increase in meiotic recombination rate over this region was similar to that observed with the SunTag:SDG2:LRCen3_g construct (see Example 1).
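The comparison above is reported as a two-sample, two-sided t-test. The text does not say whether equal variances were assumed, so the following is only a sketch of the pooled-variance (Student) form, with a hypothetical function name and made-up values that are not data from this disclosure. It returns the t statistic and degrees of freedom; the two-sided p-value would then be read from a t distribution with a statistics package.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t_statistic(control: Sequence[float], treated: Sequence[float]) -> Tuple[float, int]:
    """Student two-sample t statistic (pooled variance) and its degrees of freedom."""
    n1, n2 = len(control), len(treated)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    df = n1 + n2 - 2
    # Pooled estimate of the common variance, weighted by each group's degrees of freedom.
    pooled_var = ((n1 - 1) * variance(control) + (n2 - 1) * variance(treated)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = (mean(treated) - mean(control)) / standard_error
    return t_stat, df


# Made-up recombination frequencies, for illustration only.
t_stat, df = pooled_t_statistic(control=[16, 17, 18], treated=[20, 22, 24])
```

If the two groups have clearly different variances or sizes, Welch's unequal-variance form would be the more cautious choice.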
[0293] These results show that PRDM9 is similarly effective at increasing meiotic crossover frequency as compared to SDG2. As PRDM9 is a mammalian-derived enzyme and so not likely to have endogenous partners in the plant cell, this suggests that targeted deposition of H3K4me3 is responsible for driving the increased meiotic recombination rate.
References
Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, Ronneberger O, Willmore L, Ballard AJ, Bambrick J, Bodenstein SW, Evans DA, Hung CC, O'Neill M, Reiman D, Tunyasuvunakool K, Wu Z, Zemgulyte A, Arvaniti E, Beattie C, Bertolli O, Bridgland A, Cherepanov A, Congreve M, Cowen-Rivers AI, Cowie A, Figurnov M, Fuchs FB, Gladman H, Jain R, Khan YA, Low CMR, Perlin K, Potapenko A, Savy P, Singh S, Stecula A, Thillaisundaram A, Tong C, Yakneen S, Zhong ED, Zielinski M, Žídek A, Bapst V, Kohli P, Jaderberg M, Hassabis D, Jumper JM. 2024. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630:493-500. doi:10.1038/s41586-024-07487-w
Ashkenazy H, Abadi S, Martz E, Chay O, Mayrose I, Pupko T, Ben-Tal N. 2016. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res 44:W344-W350. doi:10.1093/nar/gkw408
Baldini A, Battaglia F, Perrella G. 2025. The generation of novel epialleles in plants: the prospective behind reshaping the epigenome. Front Plant Sci 16. doi:10.3389/fpls.2025.1544744
Boone BA, Ichino L, Wang S, Gardiner J, Yun J, Jami-Alahmadi Y, Sha J, Mendoza CP, Steelman BJ, van Aardenne A, Kira-Lucas S, Trentchev I, Wohlschlegel JA, Jacobsen SE. 2023. ACD15, ACD21, and SLN regulate the accumulation and mobility of MBD6 to silence genes and transposable elements. Sci Adv 9:1-14. doi:10.1126/sciadv.adi9036
Borde V, Robine N, Lin W, Bonfils S, Geli V, Nicolas A. 2009. Histone H3 lysine 4 trimethylation marks meiotic recombination initiation sites. EMBO J 28:99-111. doi:10.1038/emboj.2008.257
Casas-Mollano JA, Zinselmeier MH, Sychla A, Smanski MJ. 2023. Efficient gene activation in plants by the MoonTag programmable transcriptional activator. Nucleic Acids Res 51:7083-7093. doi:10.1093/nar/gkad458
Cheng Y, Zhou Y, Wang M. 2024. Targeted gene regulation through epigenome editing in plants. Curr Opin Plant Biol 80:102552. doi:10.1016/j.pbi.2024.102552
Choi K, Zhao X, Kelly KA, Venn O, Higgins JD, Yelina NE, Hardcastle TJ, Ziolkowski PA, Copenhaver GP, Franklin FCH, Mcvean G, Henderson IR. 2013. Arabidopsis meiotic crossover hot spots overlap with H2A.Z nucleosomes at gene promoters. Nat Genet 45:1327-1338. doi:10.1038/ng.2766
Clough SJ, Bent AF. 1998. Floral dip: A simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. Plant J 16:735-743. doi:10.1046/j.1365-313X.1998.00343.x
Cuimei Zhang, Yajun Tang, Shanjie Tang, Lei Chen, Haidi Yuan, Yujun Xu, Yangyan Zhou, Shuaibin Zhang, Jianli Wang, Hongyu Wen, Wenbo Jiang, Yongzhen Pang, Xian Deng, Xiaofeng Cao, Junhui Zhou, Xianwei Song QLP. 2024. An Inducible CRISPR-activation tool for accelerated plant regeneration. Plant Commun.
de Melo BP, Lourenço-Tessutti IT, Paixao JFR, Noriega DD, Silva MCM, de Almeida-Engler J, Fontes EPB, Grossi-de-Sa MF. 2020. Transcriptional modulation of AREB-1 by CRISPRa improves plant physiological performance under severe water deficit. Sci Rep 10:1-10. doi:10.1038/s41598-020-72464-y
Du J, Johnson LM, Groth M, Feng S, Hale CJ, Li S, Vashisht AA, Gallego-Bartolome J, Wohlschlegel JA, Patel DJ, Jacobsen SE. 2014. Mechanism of DNA methylation-directed histone methylation by KRYPTONITE. Mol Cell 55:495-504. doi:10.1016/j.molcel.2014.06.009
Dubois A, Roudier F. 2021. Deciphering Plant Chromatin Regulation via CRISPR/dCas9-Based Epigenome Engineering. Epigenomes 5:17. doi:10.3390/epigenomes5030017
Fal K, El Khoury S, Le Masson M, Berr A, Carles CC. 2025. CRISPR/dCas9-targeted H3K27me3 demethylation at the CUC3 boundary gene triggers ectopic transcription and impacts plant development. iScience 28:112475. doi:10.1016/j.isci.2025.112475
Fayos I, Frouin J, Meynard D, Vernet A, Herbert L, Guiderdoni E. 2022. Manipulation of Meiotic Recombination to Hasten Crop Improvement. Biology (Basel) 11:1-15. doi:10.3390/biology11030369
Fernandes JB, Naish M, Lian Q, Burns R, Tock AJ, Rabanal FA, Wlodzimierz P, Habring A, Nicholas RE, Weigel D, Mercier R, Henderson IR. 2024. Structural variation and DNA methylation shape the centromere-proximal meiotic crossover landscape in Arabidopsis. Genome Biol 25:1-31. doi:10.1186/s13059-024-03163-4
Furci L, Pardo DP, Ton J. 2021. A rapid and non-destructive method for spatial-temporal quantification of colonization by Pseudomonas syringae pv. tomato DC3000 in Arabidopsis and tomato. Plant Methods 1-8. doi:10.1186/s13007-021-00826-2
Gallego-Bartolome J, Gardiner J, Liu W, Papikian A, Ghoshal B, Kuo HY, Zhao JM-C, Segal DJ, Jacobsen SE. 2018. Targeted DNA demethylation of the Arabidopsis genome using the human TET1 catalytic domain. Proc Natl Acad Sci 115:201716945. doi:10.1073/pnas.1716945115
Gardiner J, Ghoshal B, Wang M, Jacobsen SE. 2022. CRISPR-Cas-mediated transcriptional control and epimutagenesis. Plant Physiol 188:1811-1824. doi:10.1093/plphys/kiac033
Ghoshal B, Gardiner J. 2021. CRISPR-dCas9-Based Targeted Manipulation of DNA Methylation in Plants. Springer Protoc Handbooks 2:57-71. doi:10.1007/978-1-0716-1657-4_5
Gorringe N, Topp S, Burns R, Yamaguchi S, Fernando A. 2025. Natural variation modifies centromere-proximal meiotic crossover frequency and segregation distortion in Arabidopsis thaliana. bioRxiv 1-69.
Guo L, Yu Y, Law JA, Zhang X. 2010. Set domain group2 is the major histone H3 lysine 4 trimethyltransferase in Arabidopsis. Proc Natl Acad Sci U S A 107:18557-18562. doi:10.1073/pnas.1010478107
Harris CJ, Amtmann A, Ton J. 2023. Epigenetic processes in plant stress priming: Open questions and new approaches. Curr Opin Plant Biol 75:102432. doi:10.1016/j.pbi.2023.102432
Harris CJ, Zhong Z, Ichino L, Feng S, Jacobsen SE. 2024. H1 restricts euchromatin-associated methylation pathways from heterochromatic encroachment. Elife 1-17.
Howe FS, Fischl H, Murray SC, Mellor J. 2017. Is H3K4me3 instructive for transcription activation? BioEssays 39:1-12. doi:10.1002/bies.201600095
Ichino L, Boone BA, Strauskulage L, Harris CJ, Kaur G, Gladstone MA, Tan M, Feng S, Jami-Alahmadi Y, Duttke SH. 2021. MBD5 and MBD6 couple DNA methylation to gene silencing through the J-domain protein SILENZIO. Science 372:1434-1439.
Johnson LM, Du J, Hale CJ, Bischof S, Feng S, Chodavarapu RK, Zhong X, Marson G, Pellegrini M, Segal DJ, Patel DJ, Jacobsen SE. 2014. SRA- and SET-domain-containing proteins link RNA polymerase V occupancy to DNA methylation. Nature 507:124-128. doi:10.1038/nature12931
Kbiri N, Dluzewska J, Henderson IR, Ziolkowski PA. 2022. Quantifying Meiotic Crossover Recombination in Arabidopsis Lines Expressing Fluorescent Reporters in Seeds Using SeedScoring Pipeline for CellProfiler. In: Lambing C, editor. Plant Gametogenesis. Methods in Molecular Biology. New York, NY: Springer US. pp. 121-134. doi:10.1007/978-1-0716-2253-7_10
Klasfeld S, Roule T, Wagner D. 2022. Greenscreen: A simple method to remove artifactual signals and enrich for true peaks in genomic datasets including ChIP-seq data. Plant Cell 34:4795-4815. doi:10.1093/plcell/koac282
Lauberth SM, Nakayama T, Wu X, Ferris AL, Tang Z, Hughes SH, Roeder RG. 2013. H3K4me3 interactions with TAF3 regulate preinitiation complex assembly and selective gene activation. Cell 152:1021-1036. doi:10.1016/j.cell.2013.01.052
Leng X, Thomas Q, Rasmussen SH, Marquardt S. 2020. A G(enomic)P(ositioning)S(ystem) for Plant RNAPII Transcription. Trends Plant Sci 25:744-764. doi:10.1016/j.tplants.2020.03.005
Lloyd JPB, Lister R. 2021. Epigenome plasticity in plants. Nat Rev Genet. doi:10.1038/s41576-021-00407-y
Lu Z, Marand AP, Ricci WA, Ethridge CL, Zhang X, Schmitz RJ. 2019. The prevalence, evolution and chromatin signatures of plant regulatory elements. Nat Plants 5:1250-1259. doi:10.1038/s41477-019-0548-z
Marquardt S, Petrillo E, Manavella PA. 2023. Cotranscriptional RNA processing and modification in plants. Plant Cell 35:1654-1670. doi:10.1093/plcell/koac309
Naish M, Alonge M, Wlodzimierz P, Tock AJ, Abramson BW, Schmucker A, Mandakova T, Jamge B, Lambing C, Kuo P, Yelina N, Hartwick N, Colt K, Smith LM, Ton J, Kakutani T, Martienssen RA, Schneeberger K, Lysak MA, Berger F, Bousios A, Michael TP, Schatz MC, Henderson IR. 2021. The genetic and epigenetic landscape of the Arabidopsis centromeres. Science 374. doi:10.1126/science.abi7489
Oberkofler V, Bäurle I. 2022. Inducible epigenome editing probes for the role of histone H3K4 methylation in Arabidopsis heat stress memory. Plant Physiol 703-714. doi:10.1093/plphys/kiac113
Pan C, Wu X, Markel K, Malzahn AA, Kundagrami N, Sretenovic S, Zhang Y, Cheng Y, Shih PM, Qi Y. 2021. CRISPR-Act3.0 for highly efficient multiplexed gene activation in plants. Nat Plants 7:942-953. doi:10.1038/s41477-021-00953-7
Papikian A, Liu W, Gallego-Bartolome J, Jacobsen SE. 2019. Site-specific manipulation of Arabidopsis loci using CRISPR-Cas9 SunTag systems. Nat Commun 10. doi:10.1038/s41467-019-08736-7
Perez MF, Sarkies P. 2023. Histone methyltransferase activity affects metabolism in human cells independently of transcriptional regulation. PLoS Biol 21:1-31. doi:10.1371/journal.pbio.3002354
Pflueger C, Tan D, Swain T, Nguyen T, Pflueger J, Nefzger C, Polo JM, Ford E, Lister R. 2018. A modular dCas9-SunTag DNMT3A epigenome editing system overcomes pervasive off-target activity of direct fusion dCas9-DNMT3A constructs. Genome Res 28:1193-1206. doi:10.1101/gr.233049.117
Policarpi C, Hackett JA. 2021. Epigenetic editing: Dissecting chromatin function. BioEssays 1-16.
Policarpi C, Munafò M, Tsagkris S, Carlini V, Hackett JA. 2024. Systematic epigenome editing captures the context-dependent instructive function of chromatin modifications. Nat Genet 56:1168-1180. doi:10.1038/s41588-024-01706-w
Powers NR, Parvanov ED, Baker CL, Walker M, Petkov PM, Paigen K. 2016. The Meiotic Recombination Activator PRDM9 Trimethylates Both H3K36 and H3K4 at Recombination Hotspots In Vivo. PLoS Genet 12:1-24. doi:10.1371/journal.pgen.1006146
Rickels R, Herz HM, Sze CC, Cao K, Morgan MA, Collings CK, Gause M, Takahashi YH, Wang L, Rendleman EJ, Marshall SA, Krueger A, Bartom ET, Piunti A, Smith ER, Abshiru NA, Kelleher NL, Dorsett D, Shilatifard A. 2017. Histone H3K4 monomethylation catalyzed by Trr and mammalian COMPASS-like proteins at enhancers is dispensable for development and viability. Nat Genet 49:1647-1653. doi:10.1038/ng.3965
Rowan BA, Heavens D, Feuerborn TR, Tock AJ, Henderson IR, Weigel D. 2019. An Ultra High-Density Arabidopsis thaliana Crossover. Genetics 213:771-787.
Schwaiger M, Schonauer A, Rendeiro AF, Pribitzer C, Schauer A, Gilles AF, Schinko JB, Renfer E, Fredman D, Technau U. 2014. Evolutionary conservation of the eumetazoan gene regulatory landscape. Genome Res 24:639-650. doi:10.1101/gr.162529.113
Selma S, Orzaez D. 2021. Perspectives for epigenetic editing in crops. Transgenic Res 30:381-400. doi:10.1007/s11248-021-00252-z
Shang JY, Lu YJ, Cai XW, Su YN, Feng C, Li L, Chen S, He XJ. 2021. COMPASS functions as a module of the INO80 chromatin remodeling complex to mediate histone H3K4 methylation in Arabidopsis. Plant Cell 33:3250-3271. doi:10.1093/plcell/koab187
Song X, Tang S, Liu H, Meng Y, Luo H, Wang B, Hou XL, Yan B, Yang C, Guo Z, Wang L, Jiang S, Deng X, Cao X. 2025. Inheritance of acquired adaptive cold tolerance in rice through DNA methylation. Cell 1-12. doi:10.1016/j.cell.2025.04.036
Soppe WJ, Jacobsen SE, Alonso-Blanco C, Jackson JP, Kakutani T, Koornneef M, Peeters AJ. 2000. The late flowering phenotype of fwa mutants is caused by gain-of-function epigenetic alleles of a homeodomain gene. Mol Cell 6:791-802.
Taagen E, Bogdanove AJ, Sorrells ME. 2020. Counting on Crossovers: Controlled Recombination for Plant Breeding. Trends Plant Sci 25:455-465. doi:10.1016/j.tplants.2019.12.017
Tanenbaum ME, Gilbert LA, Qi LS, Weissman JS, Vale RD. 2014. A versatile protein tagging system for signal amplification in single molecule imaging and gene regulation. Cell 159:635-646. doi:10.1016/j.cell.2014.09.039
Tock AJ, Holland DM, Jiang W, Osman K, Sanchez-Moran E, Higgins JD, Edwards KJ, Uauy C, Franklin FCH, Henderson IR. 2021. Crossover-active regions of the wheat genome are distinguished by DMC1, the chromosome axis, H3K27me3, and signatures of adaptation. Genome Res 31:1614-1628. doi:10.1101/gr.273672.120
Underwood CJ, Choi K, Lambing C, Zhao X, Serra H, Borges F, Simorowski J, Ernst E, Jacob Y, Henderson IR, Martienssen RA. 2018. Epigenetic activation of meiotic recombination near Arabidopsis thaliana centromeres via loss of H3K9me2 and non-CG DNA methylation. Genome Res 28:519-531. doi:10.1101/gr.227116.117
Vermeulen M, Mulder KW, Denissov S, Pijnappel WWMP, van Schaik FMA, Varier RA, Baltissen MPA, Stunnenberg HG, Mann M, Timmers HTM. 2007. Selective Anchoring of TFIID to Nucleosomes by Trimethylation of Histone H3 Lysine 4. Cell 131:58-69. doi:10.1016/j.cell.2007.08.016
Villiger L, Joung J, Koblan L, Weissman J, Abudayyeh OO, Gootenberg JS. 2024. CRISPR technologies for genome, epigenome and transcriptome editing. Nat Rev Mol Cell Biol. doi:10.1038/s41580-023-00697-6
Wang H, Fan Z, Shliaha PV, Miele M, Hendrickson RC, Jiang X, Helin K. 2023. H3K4me3 regulates RNA polymerase II promoter-proximal pause-release. 
Nature 615:339-348. doi:10.1038 / s41586-023-05780-8Wang H, Helin K. 2024. Roles of H3K4 methylation in biology and disease. Trends Cell Biol xx:l-14. doi:10.1016 / j.tcb.2024.06.001Wang M, He Y, Zhong Z, Papikian A, Wang S, Gardiner J, Ghoshal B, Feng S, Jami-Alahmadi Y, Wohlschlegel JA, Jacobsen SE. 2025. Histone H3 lysine 4 methylation recruits DNA demethylases to enforce gene expression in Arabidopsis. Nat Plants 11:206-217. doi:10.1038 / s41477-025-01924-yWang M, Zhong Z, Gallego-bartolome J, Li Z, Feng S, Kuo HY, Kan RL, Lam H, Richey JC, Tang L, Zhou J, Liu M, Jami-Alahmadi Y, Wohlschlegel JA, Jacobsen SE. 2023. A gene silencing screen uncovers diverse tools for targeted gene repression in Arabidopsis . Nat Plants. doi:10.1038 / s41477-023-01362-8Wang MY, Chen J Bin, Wu R, Guo HL, Chen Y, Li ZJ, Wei LY, Liu C, He SF, Du M Da, Guo Y long, Peng YL, Jones JDG, Weigel D, Huang JH, Zhu WS. 2023. The plant immune receptor SNC1 monitors helper NLRs targeted by a bacterial effector. Cell Host Microbe 31:1792-1803. e7. doi:10.1016 / j.chom.2023.10.00697MF-364578422Attorney Docket No. 26223-20028.40Wang X, Yamaguchi N. 2024. Cause or effect: Probing the roles of epigenetics in plant development and environmental responses. Curr Opin Plant Biol 81:102569. doi:10.1016 / j.pbi.2024.102569Wang Yuqiu, Fan Y, Fan D, Zhang Y, Zhou X, Zhang R, Wang Yao, Sun Y, Zhang W, He Y, Deng XW, Zhu D. 2022. The Arabidopsis DREAM complex antagonizes WDR5A to modulate histone H3K4me2 / 3 deposition for a subset of genome repression. Proc Natl Acad Sci U S A 119:1-9. doi:10.1073 / pnas.2206075119Wlodzimierz P, Rabanal FA, Burns R, Naish M, Primetis E, Scott A, Mandakova T, Gorringe N, Tock AJ, Holland D, Fritschi K, Habring A, Lanz C, Patel C, Schlegel T, Collenberg M, Mielke M, Nordborg M, Roux F, Shirsekar G, Alonso-Blanco C, Lysak MA, Novikova PY, Bousios A, Weigel D, Henderson IR. 2023. Cycles of satellite and transposon evolution in Arabidopsis centromeres. Nature 618:557-565. 
doi:10.1038 / s41586-023- 06062-zWu G, Rossidivito G, Hu T, Berlyand Y, Poethig RS. 2015. Traffic lines: New tools for genetic analysis in Arabidopsis thaliana. Genetics 200:35 45. doi:10.1534 / genetics.114.173435Xiao J, Lee U, Wagner D. 2016. ScienceDirect Tug of war : adding and removing histone lysine methylation in Arabidopsis. Curr Opin Plant Biol 34:41-53. doi:10.1016 / j.pbi.2016.08.002Xie SS, Zhang YZ, Peng L, Yu DT, Zhu G, Zhao Q, Wang CH, Xie Q, Duan CG. 2023. JMJ28 guides sequence-specific targeting of ATXl / 2-containing COMPASS-like complex in Arabidopsis. Cell Rep 42:112163. doi:10.1016 / j.celrep.2023.112163Xu L, Wang Y, Li X, Hu Q, Adamkova V, Xu J, Harris CJ, Ausin I. 2025. H3K4me3 binding ALFIN - LIKE proteins recruit SWR1 for gene - body deposition of H2A . Z. Genome Biol. doi:10.1186 / sl3059-025-03605-7Xue M, Ma L, Li X, Zhang H, Zhao F, Liu Q, Jiang D. 2025. Single amino acid mutations in histone H3.3 illuminate the functional significance of H3K4 methylation in plants. Nat Commun 16. doi:10.1038 / s41467- 025-59711-4Yang L, Wang Z, Hua J. 2023. Multiple chromatin-associated modules regulate expression of an intracellular immune receptor gene in Arabidopsis. New Phytol 237:2284-2297. doi: 10.1111 / nph.18672Yelina NE, Choi K, Chelysheva L, Macaulay M, de Snoo B, Wijnker E, Miller N, Drouaud J, Grelon M, Copenhaver GP, Mezard C, Kelly K a, Henderson IR. 2012. Epigenetic remodeling of meiotic crossover frequency in Arabidopsis thaliana DNA methyltransferase mutants. PLoS Genet 8:el002844. doi: 10.1371 / journal.pgen.1002844Yelina NE, Lambing C, Hardcastle TJ, Zhao X, Santos B, Henderson IR. 2015. DNA methylation epigenetically silences crossover hot spots and controls chromosomal domains of meiotic recombination in Arabidopsis. Genes Dev 29:2183-2202. doi:10.1101 / gad.270876.115Yi H, Richards EJ. 2009. Gene duplication and hypermutation of the pathogen Resistance gene SNC1 in the arabidopsis bal variant. Genetics 183:1227-1234. 
doi:10.1534 / genetics.109.105569Yi H, Richards EJ. 2007. A cluster of disease resistance genes in Arabidopsis is coordinately regulated by transcriptional activation and RNA silencing. Plant Cell 19:2929-39. doi: 10.1105 / tpc.107.051821ADDITIONAL SEQUENCE INFORMATION
[0294] Arabidopsis thaliana SDG2 (SEQ ID NO: 1)
[0295] Amino acid sequence of Arabidopsis thaliana SDG2C (SEQ ID NO: 16)
[0296] Amino acid sequence of Arabidopsis thaliana SDG2 mutant (Y1903F) (SEQ ID NO: 93)
[0297] Amino acid sequence of Arabidopsis thaliana SDG2C mutant (Y1903F) (SEQ ID NO: 94)
[0298] Mus musculus PRDM9 wild-type sequence (SEQ ID NO: 30)
[0299] Mus musculus PRDM9 G282A mutation sequence (SEQ ID NO: 31)
[0300] Mus musculus PRDM9 wild-type catalytic domain (110-417aa) (SEQ ID NO: 38)
[0301] Mus musculus PRDM9 G282A catalytic domain (110-417aa) (SEQ ID NO: 43)
[0302] scFv antibody polypeptide sequence (SEQ ID NO: 95)
[0303] GCN4 epitope polypeptide sequence (1 copy) (SEQ ID NO: 96)
[0304] GCN4 epitope polypeptide sequence (10 copies) (SEQ ID NO: 97)
[0305] dCAS9 amino acid sequence (SEQ ID NO: 98)
[0306] CAS9 DNA sequence (SEQ ID NO: 99)
[0307] CasO (DNA sequence) (SEQ ID NO: 100)
[0308] Cas (protein sequence) (SEQ ID NO: 101)
[0309] Exemplary CAS9 amino acid sequence 1 (SEQ ID NO: 102)
[0310] Exemplary CAS9 amino acid sequence 2 (SEQ ID NO: 103)
[0311] Exemplary CAS9 amino acid sequence 3 (SEQ ID NO: 104)
[0312] U6 promoter sequence (SEQ ID NO: 105)
[0313] Exemplary tRNA sequence (e.g., for a tRNA-gRNA expression cassette) (SEQ ID NO: 106)
[0314] Polypeptide sequence of dCAS9 with a multimerized epitope (SEQ ID NO: 107)
[0315] Plant-specific TBS insulator sequence (SEQ ID NO: 108)
[0316] UBQ10 promoter sequence (SEQ ID NO: 109)
[0317] Exemplary transcriptional termination nucleic acid sequence 1 (SEQ ID NO: 110)
[0318] Exemplary transcriptional termination nucleic acid sequence 2 (SEQ ID NO: 111)
Claims
CLAIMS
What is claimed is:
1. A method for producing a plurality of plant meiocytes having an increased rate of recombination between a first genomic locus and a second genomic locus, the method comprising:
(a) providing a plant comprising a recombinant polypeptide comprising a histone methyltransferase domain and that is capable of being targeted to a target locus between the first genomic locus and the second genomic locus;
(b) growing the plant under conditions whereby the recombinant polypeptide is targeted to the target locus in a plurality of plant meiocyte precursor cells; and
(c) growing the plant under conditions whereby the plurality of plant meiocyte precursor cells undergo meiosis, thereby producing a plurality of plant meiocytes having an increased rate of recombination between the first genomic locus and the second genomic locus, wherein the rate of recombination is measured relative to a comparator plurality of plant meiocytes.
2. The method of claim 1, wherein:
(a) the plant further comprises: i) a second recombinant polypeptide comprising 1) a nuclease-deficient CAS9 polypeptide (dCAS9) or fragment thereof and 2) a multimerized epitope; and ii) a crRNA and a tracrRNA, or fusions thereof; wherein the recombinant polypeptide comprising a histone methyltransferase domain further comprises an affinity polypeptide that specifically binds to the epitope; and
(b) the growing comprises conditions whereby the second recombinant polypeptide and the recombinant polypeptide comprising the histone methyltransferase domain are targeted to the target locus.
3. A method for producing a progeny plant, comprising:
(a) providing a plant comprising a recombinant polypeptide comprising a histone methyltransferase domain and that is capable of being targeted to a target locus between a first genomic locus and a second genomic locus;
(b) growing the plant under conditions whereby the recombinant polypeptide is targeted to the target locus in a plant meiocyte precursor cell;
(c) producing a first plant meiocyte from the plant meiocyte precursor cell; and
(d) crossing the first plant meiocyte with a second plant meiocyte, thereby producing a progeny plant, wherein the progeny plant comprises an increased number of recombined genomic sequences between the first genomic locus and the second genomic locus compared to a comparator progeny plant.
4. The method of claim 3, wherein:
(a) the plant further comprises: i) a second recombinant polypeptide comprising 1) a nuclease-deficient CAS9 polypeptide (dCAS9) or fragment thereof and 2) a multimerized epitope; and ii) a crRNA and a tracrRNA, or fusions thereof; wherein the recombinant polypeptide comprising a histone methyltransferase domain further comprises an affinity polypeptide that specifically binds to the epitope; and
(b) the growing comprises conditions whereby the second recombinant polypeptide and the recombinant polypeptide comprising the histone methyltransferase domain are targeted to the target locus.
5. The method of claim 2 or claim 4, wherein the multimerized epitope comprises a GCN4 epitope.
6. The method of any one of claims 2, 4, and 5, wherein the second polypeptide comprises a nuclear localization signal (NLS).
7. The method of any one of claims 2 and 4-6, wherein the affinity polypeptide is an antibody.
8. The method of claim 7, wherein the antibody is an scFv antibody.
9. The method of any one of claims 2 and 4-8, wherein the polypeptide comprising a histone methyltransferase domain comprises an SV40-type NLS.
10. The method of claim 1, wherein the rate of recombination in the plurality of plant meiocytes is at least 10% higher, at least 20% higher, at least 30% higher, at least 40% higher, at least 50% higher, or more than 50% higher than the rate of recombination in the comparator plurality of plant meiocytes.
11. The method of claim 3, wherein the progeny plant comprises at least 1 more, at least 2 more, at least 3 more, at least 4 more, at least 5 more, or more than 5 more recombined genomic sequences between the first genomic locus and the second genomic locus compared to the comparator progeny plant.
12. A plant produced from:
(1) a plant meiocyte from the plurality of plant meiocytes produced by the method of claim 1 or claim 2; or
(2) the method of claim 3 or claim 4.
13. A plant part of the plant of claim 12.
14. A seed produced by the plant of claim 12.
15. A plant generated from the plant part of claim 13 or grown from the seed of claim 14.
16. A plant derived from the plant of claim 12.
17. The plant of any one of claims 12, 15, or 16, the plant part of claim 13, or the seed of claim 14, comprising increased methylation between the first genomic locus and the second genomic locus compared to a comparator plant, comparator plant part, or comparator seed.
18. The method of any one of claims 1-17, wherein the histone methyltransferase domain deposits H3K4me3.
19. The method of any one of claims 1-18, wherein the histone methyltransferase domain is an SDG2 polypeptide, a PRDM9 polypeptide, or a fragment thereof.
20. The method of any one of claims 1-19, wherein the histone methyltransferase domain is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, or at least 95% identical to a sequence selected from the group consisting of the histone methyltransferase amino acid sequences provided herein and homologs and orthologs thereof, optionally wherein the histone methyltransferase domain is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, or at least 95% identical to the amino acid sequence of an SDG2 polypeptide selected from the group consisting of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 93, and SEQ ID NO: 94, or to the amino acid sequence of a PRDM9 polypeptide selected from the group consisting of SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41, SEQ ID NO: 42, and SEQ ID NO: 43.
21. The method of claim 20, wherein the histone methyltransferase domain comprises:
(a) an Arabidopsis SDG2 polypeptide sequence having at least 95% sequence identity to an amino acid sequence selected from the group consisting of SEQ ID NO: 1, SEQ ID NO: 16, SEQ ID NO: 93, and SEQ ID NO: 94; or
(b) a murine PRDM9 polypeptide having at least 95% sequence identity to an amino acid sequence selected from the group consisting of SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 38, and SEQ ID NO: 43.
22. The method of any one of claims 1-21, wherein at least one of the first genomic locus and the second genomic locus is in a centromere or proximal to a centromere.
23. The method of claim 22, wherein the first genomic locus and the second genomic locus are in a centromere.
24. The method of claim 3, wherein the second plant meiocyte is derived from the same plant as the first plant meiocyte, and/or wherein the second plant meiocyte is derived from a plant comprising the first and second recombinant polypeptides.
25. The method of claim 3, wherein the second plant meiocyte is not derived from the same plant as the first plant meiocyte, and/or wherein the second plant meiocyte is derived from a plant lacking the first and second recombinant polypeptides.
26. The method of claim 1 or claim 3, wherein the first genomic locus and the second genomic locus are separated by at least 1 kilobase (kb), at least 5 kb, at least 50 kb, at least 100 kb, at least 500 kb, at least 1 Megabase (Mb), at least 2 Mb, at least 3 Mb, at least 5 Mb, at least 10 Mb, at least 50 Mb, or more than 50 Mb.
27. The method of claim 1 or claim 3, wherein the gRNA or crRNA comprises a sequence that aligns perfectly to the target locus.
28. The method of claim 1 or claim 3, wherein the gRNA or crRNA comprises a sequence that aligns to the target locus with five or fewer mismatches.
29. The method of claim 22, wherein the gRNA or crRNA comprises a sequence that aligns to a repeat sequence in a centromere or aligns to a repeat sequence proximal to a centromere.
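Illustrative note (not part of the claims): the numerical thresholds recited in claims 10, 27, and 28 can be checked with simple arithmetic. The Python sketch below is only one possible reading. It assumes that "aligns with five or fewer mismatches" means an ungapped, position-by-position comparison of a guide sequence against an equal-length target site written in the DNA alphabet, and that "at least 10% higher" means a relative increase over the comparator rate. The function names and the example sequences are hypothetical and do not come from the application.

```python
def count_mismatches(guide: str, target_site: str) -> int:
    """Count position-by-position differences (ungapped Hamming distance).

    Assumption: both sequences use the DNA alphabet and have equal length.
    """
    if len(guide) != len(target_site):
        raise ValueError("guide and target site must have the same length")
    return sum(a != b for a, b in zip(guide.upper(), target_site.upper()))


def percent_increase(rate: float, comparator_rate: float) -> float:
    """Relative increase of a recombination rate over a comparator, in percent."""
    if comparator_rate <= 0:
        raise ValueError("comparator rate must be positive")
    return (rate - comparator_rate) / comparator_rate * 100.0


if __name__ == "__main__":
    # Made-up 20-nt sequences, for illustration only.
    guide = "ACGTACGTACGTACGTACGT"
    site = "ACGTACGAACGTACGTACCT"
    n = count_mismatches(guide, site)
    print("mismatches:", n)
    print("perfect alignment (claim 27 reading):", n == 0)
    print("five or fewer mismatches (claim 28 reading):", n <= 5)

    # Hypothetical crossover rates in arbitrary units.
    print("percent increase:", percent_increase(15.0, 10.0))
    print("meets the 'at least 10% higher' tier:", percent_increase(15.0, 10.0) >= 10.0)
```

A gapped aligner or a different definition of recombination rate would give different results; the sketch only makes the stated thresholds concrete under the assumptions named above.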