Cas enzyme and system comprising same, and use
By improving the Cas enzyme and fusion molecules, the cleavage activity and DNA binding efficiency of the CRISPR-Cas system have been enhanced, overcoming the shortcomings of existing systems in genome editing and epigenetic modification, and achieving highly efficient gene regulation of eukaryotic cells.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- EPIGENIC THERAPEUTICS INC
- Filing Date
- 2025-12-05
- Publication Date
- 2026-06-11
AI Technical Summary
Existing CRISPR-Cas systems lack robust programmable effectors and systems for modifying nucleic acids and polynucleotides, making it difficult to meet the needs of genome editing and epigenetic modification.
A Cas enzyme is provided that has superior cleavage activity and DNA binding efficiency compared to the wild type, and can bind to transcriptional activation or repression tools, recognize more PAM sequences, and be engineered to form fusion molecules and non-naturally occurring CRISPR-Cas systems.
The engineered Cas enzyme has improved cleavage activity and DNA binding efficiency, which enhances the programmability and flexibility of the system and makes it suitable for gene editing and epigenetic modification in eukaryotic cells such as mammalian cells.
Abstract
Description
Cas enzymes, systems, and applications
TECHNICAL FIELD
[0001] The present application relates to the field of biological medicine, in particular to a Cas enzyme, systems, and applications thereof.
BACKGROUND
[0002] Recent advances in genome sequencing technologies and analytical methods have dramatically accelerated the understanding of the genetic basis of biological activities in diverse fields, ranging from prokaryotic synthetic pathways to human pathologies. To fully understand and evaluate the vast amount of information generated by gene sequencing technologies, there is a need for corresponding improvements in the scale, efficiency, and ease-of-use of genome and epigenome manipulation technologies. These new genome and epigenome engineering technologies will accelerate the development of new applications in many fields, including biotechnology, agriculture, and human therapeutics.
[0003] Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated (Cas) genes, collectively known as CRISPR-Cas or CRISPR / Cas systems, have now been understood to provide bacteria and archaea with immunity against phage infection. The CRISPR-Cas system of prokaryotic adaptive immunity is an extremely diverse set of protein effectors, non-coding elements, and locus structures, some examples of which have been engineered and adapted to produce important biotechnologies. Components of the system involved in host defense include one or more effector proteins capable of modifying DNA or RNA and RNA guide elements responsible for targeting the activities of these proteins to specific sequences on phage DNA or RNA, which can be reprogrammed to target alternative DNA or RNA targets.
[0004] CRISPR-Cas systems can be broadly divided into two classes: class 1 systems are composed of multiple effector proteins, and class 2 systems are composed of a single effector protein that complexes with an RNA guide to target a DNA or RNA substrate. The single-subunit effector composition of class 2 systems provides a simpler set of components for engineering and application translation, and has been an important source of programmable effectors to date. Characterization and engineering of class 2 CRISPR-Cas systems, exemplified by CRISPR-Cas9, has paved the way for diverse and widespread biotechnological applications of genome editing and other aspects. However, in addition to the current CRISPR-Cas systems that have enabled new applications through their unique properties, there is still a need to develop powerful genome engineering tools, i.e., alternative programmable effectors and systems for modifying nucleic acids and polynucleotides (i.e., DNA, RNA, or any hybrid, derivative, or modification thereof).
SUMMARY
[0005] In one aspect, the present application provides a Cas enzyme having one or more of the following advantages: (1) has superior cleavage activity compared to the wild-type Cas enzyme; (2) has superior DNA binding efficiency; (3) can be used in epigenetic modification in combination with transcriptional activation or transcriptional repression tools; and (4) can recognize more PAM sequences.
[0006] The Cas enzyme described herein comprises a sequence having at least about 80% identity to the amino acid sequence set forth in any one of SEQ ID NOs: 1-4, or is based on the amino acid sequence set forth in any one of SEQ ID NOs: 1-4, and comprises one or more amino acid substitutions or deletions, respectively.
[0007] For example, the Cas enzyme comprises a sequence having about 80%, about 81%, about 82%, about 83%, about 84%, about 85%, about 86%, about 87%, about 88%, about 89%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or about 100% identity to the amino acid sequence set forth in any one of SEQ ID NOs: 1-4.
[0008] In certain embodiments, the one or more amino acid substitutions comprise:
[0009] 1) amino acid substitutions at one or more of positions 28, 83, 148, 206, 212, 265, 268, 277, 322, 336, 362, 375, 386, 402, 420, 560, 602, 627, 743, 756, 793, 802, 859, 860, 891, 901, 1004, 1053, 1079, 1088, 1108, 1122, 1162, 1212, 1242, and 1255, based on the amino acid sequence set forth in SEQ ID NO: 1; or
[0010] 2) one or more amino acid substitutions at one or more of positions 6, 35, 39, 130, 149, 169, 172, 178, 252, 266, 285, 306, 309, 342, 378, 388, 395, 399, 404, 411, 421, 434, 438, 476, 506, 520, 572, 602, 611, 612, 626, 674, 679, 716, 729, 767, 930, 939, 1017, 1072, 1087, 1103, 1104, and 1242, of the amino acid sequence set forth in SEQ ID NO:2; or
[0011] 3) one or more amino acid substitutions or deletions at one or more of positions 7, 82, 165, 167, 180, 194, 224, 226, 267, 288, 323, 361, 446, 447, 451, 458, 484, 510, 532, 537, 561, 569, 571, 604, 605, 693, 891, 899, 915, 916, and 1001, of the amino acid sequence set forth in SEQ ID NO:3; or
[0012] 4) an amino acid substitution at one or more of positions 4, 6, 9, 11, 15, 17, 23, 24, 28, 53, 65, 72, 81, 103, 110, 122, 123, 133, 136, 144, 153, 160, 164, 165, 172, 183, 184, 191, 196, 202, 211, 213, 219, 224, 227, 229, 231, 233, 235, 237, 243, 259, 260, 262, 268, 274, 275, 281, 286, 297, 326, 328, 331, 332, 338, 343, 352, 361, 363, 365, 367, 375, 382, 383, 387, 402, 418, 420, 429, 434, 441, 443, 456, 458, 462, 466, 477, 485, 494, 519, 520, 527, 528, 531, 579, 582, 591, 608, 612, 615, 627, 633, 646, 664, 666, 684, 698, 699, 704, 707, 708, 711, 713, 715, 718, 720, 724, 731, 733, 749, 754, 766, 786, 804, 818, 820, 842, 851, 864, 865, 881, 890, 901, 907, and 908, based on the amino acid sequence set forth in SEQ ID NO:4.
[0013] In certain embodiments, the substitution of one or more amino acids comprises:
[0014] 1) one or more of M206I, I756V, D1212N, E148D, G793D, S859R, I1088V, C265Y, L336M, S375R, F1242V, D627Y, K901N, F1255S, D1162G, K83E, V386L, D560E, A1122V, M277L, S802C, T28S, S891N, N362S, D860N, S420N, N1079I, T402I, D602N, E212V, S268G, E743G, L1004M, E322K, R1053I, and K1108Q based on the amino acid sequence set forth in SEQ ID NO: 1; or
[0015] 2) one or more of S130R, Q169R, A172R, T178R, D252R, T729R, D1017R, T306A, E395K, Y602H, N611K, A1103V, D1104N, F309L, I520L, D679Y, T1072S, D378E, T399S, T411I, E6K, D679K, D939V, E1087K, E1242G, L35M, D149Y, E388K, E404V, K476M, E716K, L342D, I572F, R39S, E285K, L421M, L438M, T506I, I674N, V612M, D626N, F767L, T930S, G266S, and S434N based on the amino acid sequence set forth in SEQ ID NO: 2; or
[0016] 3) one or more of D7G, Q165W or Q165R or Q165K, N167R, V180K or V180N or V180Q or V180R or V180T or V180W or V180G, S224 deletion or S224R, K226R, V267M, E323R or E323G or E323K, G361R, K446R, F451G, E571R, R605G, N693K, L899R, K915R or K915N or K915W or K915V or K915Y or K915Q or K915G or K915T, G916D, T1001I, D458K, D484K, D510K, D532K, D537K, D561K, D569K, I288T, D891E, S447N, A194V, A82V, and I604L based on the amino acid sequence set forth in SEQ ID NO: 3; or
[0017] 4) one or more of E361K or E361R, K367R, E338K or E338D or E338R, D15G, K28R, E262K, M434R, K443R, T458K or T458R, A229T, W237R, D418G, D420G, K227N, E235Q, E243K, L268V, S274Y, K275M or K275E, E297K, F343Y or F343S or F343I, P383A, S387R, K429N, L520M, L579R, E731Q, M820I, E441K, E213G, N260D, D402Y, L110M, I133N, I191T, N519K, V365M, V233D, M666I, E749G, S165C, W259R, E612K, D615N, T196I, G494D, Q231R, K907N, L908C, N224S, E851K, H462Y, N466D, A708S or A708H or A708Y or A708R, Q881H, P627L, T818P, S328R, E262G, D15Y, Y184C, A11V, K219Q or K219E, E842K, G528C, E144R or E144K, D183Y, K363R, K582R, T6R or T6C, H202R, T4R, H286R or H286Q, M646D, K527R, A153T, T901R, M382R, M172E or M172C, D804R, A890W, M122W, F485A, M211E or M211R, H65R, D684R, Y711D or Y711P, I713R or I713N, N53R, V103R, E81R, K123R, K9R, S17I, I331F or I331C or I331S, Y754F, T786I, V326G, C608Y, S766N, N864D, N332H, E375K, A531E, K72T, A664Q, K24Q, W699Q or W699T or W699A or W699S or W699I or W699L or W699V or W699H or W699M or W699F or W699R or W699P or W699N or W699E or W699D, E160P, D591W, D136Q, R281M, G733I, T718I, W707L or W707M or W707I or W707A or W707V, Y720A or Y720S or Y720Q or Y720G or Y720T or Y720N, A352V, C704V, Y715V or Y715H, N724T, K698Y or K698H, R865A, P456S, D477A, A164V, G23D, and S633N based on the amino acid sequence set forth in SEQ ID NO: 4.
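The substitution labels above follow the usual convention of original residue, 1-based position in the reference sequence, and replacement residue (for example, V180K replaces valine at position 180 with lysine). As a minimal sketch of how such labels could be checked against a reference and applied, the following Python example may be helpful. The function names and the 10-residue toy sequence are hypothetical and are not taken from the application; deletions such as "S224 deletion" are deliberately not handled.

```python
import re

# Label such as "V180K": original residue, 1-based position, replacement residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Split a label like 'V180K' into (original, position, replacement)."""
    match = _SUBSTITUTION.match(label.strip())
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    original, position, replacement = match.groups()
    return original, int(position), replacement


def apply_substitutions(reference, labels):
    """Return a copy of `reference` with every substitution applied.

    Positions are 1-based and refer to the unmodified reference sequence.
    A label whose original residue does not match the reference is rejected.
    """
    residues = list(reference)
    for label in labels:
        original, position, replacement = parse_substitution(label)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside the reference")
        if reference[position - 1] != original:
            raise ValueError(
                f"{label}: reference has {reference[position - 1]} at {position}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


# Toy reference for illustration only, NOT one of the SEQ ID NOs.
toy_reference = "MKVLDESGTA"
variant = apply_substitutions(toy_reference, ["V3K", "E6D"])
print(variant)  # MKKLDDSGTA
```

Checking the stated original residue against the reference before substituting is a simple guard against numbering a variant relative to the wrong SEQ ID NO.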
[0018] In certain embodiments, the one or more amino acid substitutions comprise one or more substitutions selected from:
[0019] 1) V180K, I288T, S447N, D891E, and E323K based on the amino acid sequence set forth in SEQ ID NO: 3; or
[0020] 2) T458K, E262K, K443R, M434R, N332H, W237R, E144K, D183Y, G528C, S633N, S328R, I331S, E375K, C608Y, E338K, and D804R based on the amino acid sequence set forth in SEQ ID NO: 4.
[0021] In certain embodiments, the Cas enzyme comprises one, two, or three substitutions selected from V180K, I288T, and S447N based on the amino acid sequence set forth in SEQ ID NO: 3, the two substitutions comprising V180K / I288T, V180K / S447N, or I288T / S447N; or five substitutions V180K / I288T / S447N / D891E / E323K based on the amino acid sequence set forth in SEQ ID NO: 3.
[0022] In certain embodiments, the Cas enzyme comprises one, two, three, four, or five substitutions selected from T458K, E262K, K443R, M434R, and N332H based on the amino acid sequence set forth in SEQ ID NO: 4; the two substitutions comprise T458K / E262K, T458K / K443R, T458K / M434R, T458K / N332H, E262K / K443R, E262K / M434R, E262K / N332H, K443R / M434R, K443R / N332H, or M434R / N332H; the three substitutions comprise T458K / E262K / K443R, T458K / E262K / M434R, T458K / E262K / N332H, T458K / K443R / M434R, T458K / K443R / N332H, T458K / M434R / N332H, E262K / K443R / M434R, E262K / K443R / N332H, E262K / M434R / N332H, or K443R / M434R / N332H; the four substitutions comprise T458K / E262K / K443R / M434R, T458K / E262K / K443R / N332H, T458K / E262K / M434R / N332H, T458K / K443R / M434R / N332H, or E262K / K443R / M434R / N332H; or the Cas enzyme comprises the six substitutions T458K / E262K / K443R / M434R / N332H / W237R.
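The two-, three-, and four-substitution sets listed above are the complete sets of combinations of the five named substitutions (10, 10, and 5 combinations, respectively). As an illustrative sketch only, they can be enumerated in Python with the standard library; the helper name `enumerate_combinations` is hypothetical.

```python
from itertools import combinations

# The five substitutions named in the paragraph above (SEQ ID NO: 4 numbering).
substitutions = ["T458K", "E262K", "K443R", "M434R", "N332H"]


def enumerate_combinations(labels, size):
    """Return every unordered combination of `size` labels, joined with ' / '."""
    return [" / ".join(combo) for combo in combinations(labels, size)]


for size in range(1, len(substitutions) + 1):
    combos = enumerate_combinations(substitutions, size)
    print(f"{size} substitution(s): {len(combos)} combination(s)")
    for combo in combos:
        print("   ", combo)
```

Because `itertools.combinations` preserves input order, the pairs are produced in the same order as in the paragraph, starting with T458K / E262K and ending with M434R / N332H.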
[0023] For example, the one or more substitutions or deletions of amino acids are any of those set forth in Table 1:
[0024] Table 1
[0025] In certain embodiments, the Cas enzyme comprises the amino acid sequence set forth in any one of SEQ ID NOs: 5-148, 166-378, and 383-398. In certain embodiments, the Cas enzyme has a catalytically active domain capable of binding to a target DNA strand and / or a catalytically active domain capable of cleaving the target DNA strand.
[0026] In certain embodiments, the one or more amino acid substitutions or deletions are located in the catalytically active domain, such that the Cas enzyme has only the activity of binding to a target DNA strand, or the activity of binding to a target DNA strand and the activity of cleaving the target DNA single strand.
[0027] In another aspect, the present application provides a fusion molecule comprising a Cas enzyme described herein and one or more heterologous functional domains.
[0028] In certain embodiments, the one or more heterologous functional domains are capable of modulating the expression of one or more gene products.
[0029] In certain embodiments, the one or more heterologous functional domains are directly or indirectly fused to the Cas enzyme.
[0030] In certain embodiments, the one or more heterologous functional domains are selected from the group consisting of a helicase, a nuclease, a helicase-nuclease, a DNA methyltransferase, a DNA hydroxylmethylase, a histone methylase, a histone demethylase, a histone acetyltransferase, a histone deacetylase, a phosphatase, a kinase, a transcriptional (co-)activator, a transcriptional repressor, a DNA binding protein, a DNA structural protein, a marker protein, a reporter protein, a fluorescent protein, a ligand binding protein, a signal peptide, a subcellular localization sequence, an antibody epitope, and an affinity purification tag.
[0031] In certain embodiments, the one or more heterologous functional domains have one or more of the following activities: methylase activity, demethylase activity, deaminase activity, transcriptional activation activity, transcriptional repression activity, transcriptional release factor activity, reverse transcriptase activity, histone modification activity, RNA cleavage activity, and nucleic acid binding activity.
[0032] In another aspect, the present application provides an engineered, programmable, non-naturally occurring CRISPR-Cas system, which comprises a Cas enzyme described herein or a nucleotide encoding the Cas enzyme; or which comprises a fusion molecule described herein or a nucleotide encoding the fusion molecule, and one or more guide RNAs that target a locus of a nucleic acid molecule encoding one or more gene products in a cell, such that the Cas enzyme or the fusion molecule binds and / or cleaves the locus of the nucleic acid molecule encoding one or more gene products; and the Cas enzyme or the fusion molecule is present simultaneously or non-simultaneously with the guide RNA.
[0033] In another aspect, the present application provides an engineered, non-naturally occurring vector system comprising one or more vectors comprising: a) a first regulatory element operably linked to one or more guide RNAs capable of hybridizing to a target sequence in a locus of a nucleic acid molecule encoding one or more gene products, and b) a second regulatory element operably linked to a Cas enzyme described herein or a fusion molecule described herein, wherein the component a) and the component b) are located on the same or different vectors of the vector system, and the guide RNA targets the locus of the nucleic acid molecule encoding one or more gene products in a cell, thereby directing the Cas enzyme or the fusion molecule to bind and / or cleave the locus of the nucleic acid molecule encoding one or more gene products; and the Cas enzyme or the fusion molecule is present simultaneously or non-simultaneously with the guide RNA.
[0034] In certain embodiments, the expression of the one or more gene products is altered.
[0035] In certain embodiments, the expression of the gene product is reduced or increased.
[0036] In certain embodiments, the gene product is a protein.
[0037] In certain embodiments, the cell is a eukaryotic cell.
[0038] In certain embodiments, the eukaryotic cell is a mammalian cell. For example, the mammalian cell includes, but is not limited to, murine, simian, human, farm animal, sports animal, and pet animal cells.
[0039] In certain embodiments, the mammalian cell is a human cell.
[0040] In certain embodiments, the Cas enzyme is codon-optimized for expression in a eukaryotic cell.
[0041] In certain embodiments, the guide RNA comprises a guide sequence fused to a tracr sequence.
[0042] In certain embodiments, the guide RNA comprises a direct repeat sequence and a spacer sequence, wherein the spacer sequence binds to a nucleic acid molecule targeted by the guide RNA.
[0043] In certain embodiments, the direct repeat sequence is 10 to 70 nucleotides in length.
[0044] In certain embodiments, the direct repeat sequence is 31 to 36 nucleotides in length.
[0045] In certain embodiments, the spacer sequence is 16 to 24 nucleotides in length.
[0046] In certain embodiments, the nucleic acid molecule targeted by the guide RNA comprises a nucleotide sequence capable of complementary pairing with the spacer sequence.
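The length ranges and the complementarity requirement stated above can be expressed as simple checks. The following Python sketch is illustrative only: the function names, the default ranges passed as parameters, and the placeholder sequences are assumptions made for this example and do not come from the application. It assumes both strands are written 5' to 3' and that pairing is perfect Watson-Crick pairing, which is stricter than the "capable of complementary pairing" wording requires.

```python
# Complement table for a DNA strand written 5'->3'.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def spacer_for(target_strand_dna):
    """Return the RNA spacer (5'->3') fully complementary to a DNA strand (5'->3')."""
    reverse_complement = target_strand_dna.upper().translate(DNA_COMPLEMENT)[::-1]
    return reverse_complement.replace("T", "U")


def spacer_pairs_with(spacer_rna, target_strand_dna):
    """True if the spacer can base-pair with the DNA strand along its full length."""
    return spacer_rna.upper() == spacer_for(target_strand_dna)


def check_guide(direct_repeat, spacer, dr_range=(10, 70), spacer_range=(16, 24)):
    """Return a list of length problems; an empty list means both ranges are met."""
    problems = []
    if not dr_range[0] <= len(direct_repeat) <= dr_range[1]:
        problems.append(f"direct repeat length {len(direct_repeat)} outside {dr_range}")
    if not spacer_range[0] <= len(spacer) <= spacer_range[1]:
        problems.append(f"spacer length {len(spacer)} outside {spacer_range}")
    return problems


# Placeholder sequences for illustration only, not taken from the application.
toy_direct_repeat = "AUUUC" * 7             # 35 nt
toy_target_strand = "ATGCCGTTAGCAATCGGATC"  # 20 nt
toy_spacer = spacer_for(toy_target_strand)

print(check_guide(toy_direct_repeat, toy_spacer))                     # []
print(check_guide(toy_direct_repeat, toy_spacer, dr_range=(31, 36)))  # []
print(spacer_pairs_with(toy_spacer, toy_target_strand))               # True
```

Passing `dr_range=(31, 36)` applies the narrower direct repeat range of the embodiment above instead of the broader 10 to 70 nucleotide range.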
[0047] In certain embodiments, the vector or the Cas enzyme of the system further comprises one or more nuclear localization sequences (NLS).
[0048] In certain embodiments, the system is introduced into the cell by a delivery system selected from the group consisting of a virion, a liposome, a lipid nanoparticle, electroporation, microinjection, and conjugation.
[0049] In another aspect, the present application provides a method of altering expression of one or more gene products, the method comprising introducing into a cell comprising and expressing a nucleic acid molecule encoding the one or more gene products an engineered, non-naturally occurring CRISPR-Cas system comprising a Cas enzyme described herein or a fusion molecule described herein, and one or more guide RNAs targeting a locus of the nucleic acid molecule encoding the one or more gene products, thereby directing the Cas enzyme or the fusion molecule to bind and / or cleave the locus, thereby altering expression of the one or more gene products; and, the Cas enzyme or the fusion molecule is present simultaneously or non-simultaneously with the guide RNA.
[0050] In another aspect, the application provides a method of altering expression of one or more gene products, the method comprising introducing into a cell comprising and expressing a nucleic acid molecule encoding the one or more gene products an engineered, non-naturally occurring vector system comprising one or more vectors comprising: a) a first regulatory element operably linked to one or more guide RNAs, the one or more guide RNAs being capable of hybridizing to a target sequence in a locus of the nucleic acid molecule encoding the one or more gene products, and b) a second regulatory element operably linked to a Cas enzyme described herein or a fusion molecule described herein, wherein the component a) and the component b) are located on the same or different vectors of the vector system, and the guide RNA targets the locus of the nucleic acid molecule encoding the one or more gene products in the cell, thereby directing the Cas enzyme or the fusion molecule to bind and / or cleave the locus, thereby altering expression of the one or more gene products; and the Cas enzyme or the fusion molecule is present in the cell simultaneously or non-simultaneously with the guide RNA.
[0051] In certain embodiments, the expression of the gene product is decreased or increased.
[0052] In certain embodiments, the gene product is a protein.
[0053] In certain embodiments, the cell is a eukaryotic cell.
[0054] In certain embodiments, the eukaryotic cell is a mammalian cell. For example, the mammalian cell includes, but is not limited to, murine, simian, human, farm animal, sports animal, and pet animal cells.
[0055] In certain embodiments, the mammalian cell is a human cell.
[0056] In certain embodiments, the Cas enzyme is codon-optimized for expression in a eukaryotic cell.
[0057] In certain embodiments, the guide RNA comprises a guide sequence fused to a tracr sequence.
[0058] In certain embodiments, the guide RNA comprises a direct repeat sequence and a spacer sequence, wherein the spacer sequence binds to a nucleic acid molecule targeted by the guide RNA.
[0059] In certain embodiments, the direct repeat sequence is 10 to 70 nucleotides in length.
[0060] In certain embodiments, the direct repeat sequence is 31 to 36 nucleotides in length.
[0061] In certain embodiments, the spacer sequence is 16 to 24 nucleotides in length.
[0062] In certain embodiments, the nucleic acid molecule targeted by the guide RNA comprises a nucleotide sequence capable of complementary pairing with the spacer sequence.
[0063] In certain embodiments, the vector or the Cas enzyme of the system further comprises one or more nuclear localization sequences (NLS).
[0064] In certain embodiments, the method comprises introducing the CRISPR-Cas system or the vector system into the cell by a delivery system selected from the group consisting of a virion, a liposome, a lipid nanoparticle, electroporation, microinjection, and conjugation.
[0065] In another aspect, the present application provides a nucleic acid encoding the Cas enzyme described herein, the fusion molecule described herein, or the CRISPR-Cas system described herein.
[0066] In another aspect, the present application provides a cell comprising the Cas enzyme described herein, the fusion molecule described herein, the CRISPR-Cas system described herein, the vector system described herein, and / or the nucleic acid described herein.
[0067] In another aspect, the present application provides a kit comprising the Cas enzyme described herein, the fusion molecule described herein, the CRISPR-Cas system described herein, the vector system described herein, the nucleic acid described herein, and / or the cell described herein.
[0068] In certain embodiments, the kit further comprises a container for placing the Cas enzyme, the fusion molecule, the CRISPR-Cas system, the vector system, the nucleic acid, and / or the cell, and an instruction manual.
[0069] Other aspects and advantages of the present application will become apparent to those skilled in the art from the following detailed description, which shows and describes illustrative embodiments of the present application. As will be realized, the contents of the present application are capable of modifications in various obvious respects, all without departing from the spirit and scope of the inventive principles described in the present application. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not restrictive.
BRIEF DESCRIPTION OF DRAWINGS
[0070] The specific features of the invention to which this application relates are set out in the claims appended hereto. The features and advantages of the invention to which this application relates are better understood by reference to the exemplary embodiments described hereinafter and to the accompanying drawings. The drawings are briefly described as follows:
[0071] Figure 1 shows the flow chart of the method for determining the cleavage activity of the Cas enzymes (different variants of EpiCas002, EpiCas045, EpiCas059 shown in Table 1) in the cleavage reporter system in eukaryotic cells.
[0072] Figure 2 shows the ratio of the proportion of green fluorescent cells activated by the Cas enzymes (different variants of EpiCas002 shown in Table 1) to the proportion of transfection-positive cells, as determined by the method of Figure 1.
[0073] Figures 3A-3B show the results of the determination of the cleavage activity of the Cas enzymes (different variants of EpiCas045 shown in Table 1) by the method of Figure 1, with the wild-type activity taken as 1, and the fold increase of the mutant compared to the wild-type calculated.
[0074] Figures 4-21 show the results of the determination of the cleavage activity of the Cas enzymes (different variants of EpiCas059 shown in Table 1) by the method of Figure 1, with the wild-type activity taken as 1, and the fold increase of the mutant compared to the wild-type calculated.
[0075] Figures 22A-22D show the ratio of the proportion of green fluorescent cells activated by the Cas enzymes (different variants of EpiCas059 shown in Table 1) to the proportion of transfection-positive cells, as determined by the method of Figure 1.
[0076] Figure 23 shows the results of the determination of the cleavage activity of the Cas enzymes (different variants of EpiCas059 shown in Table 1) by the method of Figure 1, with the wild-type activity taken as 1, and the fold increase of the mutant compared to the wild-type calculated, after the N-terminal PAM of the sequence complementary to the gRNA targeting sequence was changed to TTVG.
[0077] Figure 24 shows the results of the determination of the cleavage activity of the Cas enzymes (different variants of EpiCas059 shown in Table 1) by the method of Figure 1, with the wild-type activity taken as 1, and the fold increase of the mutant compared to the wild-type calculated, after the N-terminal PAM of the sequence complementary to the gRNA targeting sequence was changed to TVTG.
[0078] Figures 25-26 show the results of the determination of the cleavage activity of the Cas enzymes (different variants of EpiCas059 shown in Table 1) by the method of Figure 1, with the wild-type activity taken as 1, and the fold increase of the mutant compared to the wild-type calculated, after the N-terminal PAM of the sequence reverse complementary to the gRNA targeting sequence was changed to CTAG.
[0079] Figure 27 shows the activation of the target gene at the endogenous CXCR4 site by a gene activation tool based on the Cas enzymes of the present application (different variants of EpiCas057 shown in Table 1).
[0080] Figure 28 shows the activation of the target gene at the endogenous HBB site by a gene activation tool based on the Cas enzymes of the present application (different variants of EpiCas057 shown in Table 1).
[0081] Figure 29 shows the targeting reporter system of the present application.
[0082] Figure 30 shows the DNA binding efficiency of the Cas enzymes of the present application (different variants of EpiCas057 shown in Table 1), as tested with the GFP reporter system.
[0083] Figures 31-33 show the InDel ratio of the Cas enzymes of the present application (different variants of EpiCas059 shown in Table 1).
[0084] Figure 34 shows the protein structure of EpiCas057.
[0085] Figure 35 shows the protein structure of EpiCas059.
DETAILED DESCRIPTION
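The PAMs named in the figure descriptions above (TTVG, TVTG, CTAG) use IUPAC nucleotide codes, in which V stands for A, C, or G. As a brief illustrative sketch, a degenerate PAM can be expanded into its concrete sequences, or matched against a candidate site, as follows. The function names are hypothetical, and nothing here reflects how PAM preference was actually assayed in the application.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes for DNA.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_pam(pam):
    """List every concrete DNA sequence covered by a degenerate PAM."""
    choices = [IUPAC[code] for code in pam.upper()]
    return ["".join(bases) for bases in product(*choices)]


def matches_pam(site, pam):
    """True if the concrete DNA `site` is covered by the degenerate `pam`."""
    if len(site) != len(pam):
        return False
    return all(base in IUPAC[code] for base, code in zip(site.upper(), pam.upper()))


print(expand_pam("TTVG"))           # ['TTAG', 'TTCG', 'TTGG']
print(expand_pam("TVTG"))           # ['TATG', 'TCTG', 'TGTG']
print(matches_pam("CTAG", "TTVG"))  # False
```

Under this reading, TTVG and TVTG each cover three concrete four-nucleotide PAMs, while CTAG is a single fixed sequence that neither of them covers.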
[0086] The following specific examples illustrate embodiments of the present application, and those skilled in the art can readily understand other advantages and effects of the present application from the disclosure of this specification.
[0087] Term definition
[0088] In the present application, the term "identity" is used interchangeably with "homology" and generally refers to the relatedness between the sequences of two or more polypeptide molecules or two or more nucleic acid molecules, as determined by comparing their sequences. In the art, "identity" also refers to the degree of relatedness of nucleic acid molecules or polypeptide sequences, which can be determined by matching the sequences of two or more nucleotides or two or more amino acids. In the present application, the percent (%) identity of an amino acid sequence is defined as the percentage of amino acid residues in the candidate sequence that are identical with the amino acid residues in the reference polypeptide sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity, and not considering any conservative substitutions as sequence identity. Alignment for purposes of determining percent amino acid sequence identity can be achieved in a variety of ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, BLAST-2, ALIGN, or Megalign (DNASTAR) software. Those skilled in the art can determine appropriate parameters for aligning sequences, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared. In certain embodiments, the percent (%) identity of a polypeptide molecule or nucleic acid molecule sequence can also be determined based on the type of mutations that are considered in calculating the total number of residues. The types of mutations include insertions (extensions) at either or both ends of the sequence, deletions (truncations) at either or both ends of the sequence, substitution (also referred to as replacement) of one or more amino acids / nucleotides, insertions within the sequence, and deletions within the sequence.
Using the amino acid sequence of a polypeptide as an example (the same applies to nucleotide sequence), if the type of mutations is one or more of the following: substitution of one or more amino acids / nucleotides, insertion within the sequence, and deletion within the sequence, the total number of residues is calculated as the larger of the two molecules being compared. If the type of mutations also includes insertions (extensions) at either or both ends of the sequence or deletions (truncations) at either or both ends of the sequence or insertions within the sequence and deletions within the sequence, the number of amino acids inserted or deleted at either or both ends or within the sequence (e.g., less than 20 amino acids inserted or deleted at either or both ends) is not counted in the total number of residues. In calculating percent identity, the sequences being compared can be aligned to produce the maximum percent sequence identity, and gaps in the alignment, if any, can be addressed by a particular algorithm.
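As a minimal sketch of the percent identity calculation described above, the following Python example counts identical aligned residues and divides by the length of the longer of the two ungapped sequences. It assumes the two sequences have already been aligned by an external tool (for example, one of the programs named above) and are supplied as equal-length strings with "-" marking gaps. The function name and toy sequences are hypothetical, the choice of denominator is one reading of the "larger of the two molecules" rule, and the exclusion of short terminal extensions or truncations from the total is not implemented.

```python
def percent_identity(aligned_candidate, aligned_reference, gap="-"):
    """Percent identity of two pre-aligned sequences of equal aligned length.

    Identical, non-gap aligned positions are counted as matches. The total
    is the length of the longer of the two sequences after gaps are removed.
    """
    if len(aligned_candidate) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        1
        for a, b in zip(aligned_candidate, aligned_reference)
        if a == b and a != gap
    )
    total = max(
        len(aligned_candidate.replace(gap, "")),
        len(aligned_reference.replace(gap, "")),
    )
    if total == 0:
        raise ValueError("both sequences are empty")
    return 100.0 * matches / total


# Toy sequences for illustration only, NOT SEQ ID NOs of the application.
reference = "MKVLDESGTA"
two_substitutions = "MKKLDDSGTA"  # differs at positions 3 and 6
one_deletion = "MKVL-ESGTA"       # internal deletion of position 5

print(percent_identity(two_substitutions, reference))  # 80.0
print(percent_identity(one_deletion, reference))       # 90.0
```

With this convention, a variant carrying two substitutions in a 10-residue reference is 80% identical, which is the lower bound of the "at least about 80% identity" range used elsewhere in the application.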
[0089] In the present application, "catalytically active domain" refers to a distinguishable or determinable conserved structural entity within a Cas protein (enzyme) that exhibits significant secondary structure content and that is the region of the Cas protein (enzyme) enabling functions such as binding and / or cleavage of a polynucleotide. An exemplary Cas enzyme with catalytically active domains is a nuclease of the Cas9 family, which has two catalytically active domains: an HNH-like domain, which cleaves the single-stranded polynucleotide (target strand) paired to the guide RNA, and a RuvC-like domain, which cleaves the complementary strand of the target strand.
[0090] In the present application, the term "bind" (e.g., with respect to the target DNA binding (catalytically active) domain of a polypeptide or enzyme) generally refers to a non-covalent interaction between macromolecules (e.g., between a protein and a nucleic acid). When in a non-covalent interaction state, macromolecules are said to be "associated" or "interacting" or "bound" (e.g., when molecule X is said to interact with molecule Y, it is meant that molecule X is non-covalently bound to molecule Y). It will be appreciated that not all components of the binding interaction need be sequence-specific (e.g., contact with a phosphate residue in the DNA backbone), but some portion of the binding interaction can be sequence-specific.
[0091] In the present application, the term "fusion molecule" generally refers to a bipartite molecule comprising an enzyme (protein or peptide) of the present application coupled to at least one other moiety, thereby forming a single entity. The enzyme and the at least one other moiety can be separated by a linker, or can be directly coupled. The at least one other moiety can be fused to the enzyme of the present application at the N-terminus, C-terminus, or at any amino acid outside of the terminal amino acids. The other moiety can be fused to a moiety already comprised in the fusion molecule. One of skill in the art is well aware of assays to determine optimal order and / or combination of moieties in a fusion molecule of the present application. Generally, when a fusion molecule comprises an enzyme of the present application and at least one other peptide, the term does not include fusion molecules in which the fusion results in a naturally occurring peptide.
[0092] In this application, the term "heterologous" generally refers to a nucleotide or polypeptide sequence that is not present in natural nucleic acids or proteins. In some embodiments of this application, the term "heterologous functional domain" may refer to about one, two, three, four, five, six, seven, eight, nine, ten, or more domains other than the Cas enzyme described in this application, or a portion of the fusion molecule described in this application. Examples of heterologous functional domains that may be included in or fused with the Cas enzyme described in this application include, but are not limited to, epitope tags, reporter gene sequences, and protein domains having one or more of the following activities: methylase activity, demethylase activity, transcriptional activation activity, transcriptional repression activity, transcriptional release factor activity, histone modification activity, RNA cleavage activity, and nucleic acid binding activity. Non-limiting examples of epitope tags include histidine (His) tags, V5 tags, FLAG tags, influenza virus hemagglutinin (HA) tags, Myc tags, VSV-G tags, and thioredoxin (Trx) tags. Examples of reporter genes include, but are not limited to, glutathione S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), β-galactosidase, β-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and autofluorescent proteins including blue fluorescent protein (BFP). Cas enzymes can be fused to a gene sequence encoding a protein or protein fragment that binds to DNA molecules or other cellular molecules, including, but not limited to, maltose-binding protein (MBP), S-tag, Lex A DNA-binding domain (DBD) fusions, GAL4 DNA-binding domain fusions, and herpes simplex virus (HSV) VP16 protein fusions. 
Additional domains that can form part of a fusion molecule containing a Cas enzyme are described in US20110059502, which is incorporated herein by reference.
[0093] In this application, the term "expression" generally refers to the process of transcribing a DNA template into a polynucleotide (such as mRNA or other RNA transcripts) and / or the subsequent translation of the transcribed mRNA into a peptide, polypeptide, or protein. Transcripts and encoded polypeptides can be collectively referred to as "gene products." If the polynucleotides are derived from genomic DNA, expression may include the splicing of mRNA in eukaryotic cells.
[0094] In this application, the terms “polynucleotide,” “nucleotide,” “nucleotide sequence,” “nucleic acid,” and “oligonucleotide” are used interchangeably. They generally refer to a polymeric form of nucleotides of any length, whether deoxyribonucleotides or ribonucleotides, or analogues thereof. Polynucleotides can have any three-dimensional structure and can perform any function, known or unknown. The following are non-limiting examples of polynucleotides: coding or non-coding regions of genes or gene fragments, loci (a locus) defined by linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, short hairpin RNA (shRNA), microRNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. Polynucleotides may contain one or more modified nucleotides, such as methylated nucleotides and nucleotide analogues. If present, modifications to the nucleotide structure may be made before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. Polynucleotides may be further modified after polymerization, such as by conjugation with a labeling component.
[0095] In this application, the terms “non-naturally occurring” and “engineered” are used interchangeably. When they refer to nucleic acid molecules, peptides, or combinations thereof and systems thereof, they generally mean that the nucleic acid molecule or peptide is at least substantially free from at least one other component with which it is associated in nature and as found in nature.
[0096] In this application, the term "vector" generally refers to a nucleic acid molecule capable of delivering another nucleic acid molecule linked to it. Vectors include, but are not limited to, single-stranded, double-stranded, or partially double-stranded nucleic acid molecules; nucleic acid molecules including one or more free ends, or without free ends (e.g., circular); nucleic acid molecules including DNA, RNA, or both; and a wide variety of other polynucleotides known in the art. One type of vector is a "plasmid," which refers to a circular double-stranded DNA loop into which another DNA fragment can be inserted, for example, by standard molecular cloning techniques. Another type of vector is a viral vector, in which a virus-derived DNA or RNA sequence is present in a vector used for packaging into a virus (e.g., retroviruses, replication-defective retroviruses, adenoviruses, replication-defective adenoviruses, and adeno-associated viruses). Viral vectors also contain polynucleotides carried by a virus for transfection into a host cell. Some vectors (e.g., bacterial vectors with bacterial origins of replication and episomal mammalian vectors) are capable of autonomous replication in the host cells into which they are introduced. Other vectors (e.g., non-episomal mammalian vectors) integrate into the genome of the host cell after introduction and thereby replicate along with the host genome. Furthermore, certain vectors are capable of directing the expression of genes to which they are operatively linked. Such vectors are referred to herein as “expression vectors.” Common expression vectors used in recombinant DNA technology are typically in plasmid form. 
Recombinant expression vectors may contain the nucleic acids of this application in a form suitable for nucleic acid expression in a host cell, meaning that these recombinant expression vectors contain one or more regulatory elements selected based on the host cell to be used for expression, said regulatory elements being operatively linked to the nucleic acid sequence to be expressed. Within a recombinant expression vector, “operatively linked” is intended to mean that the nucleotide sequence of interest is linked to the one or more regulatory elements in a manner that allows the expression of that nucleotide sequence (e.g., in an in vitro transcription / translation system or in the host cell when the vector is introduced into the host cell).
[0097] In this application, the term "regulatory element" is generally intended to include promoters, enhancers, internal ribosome entry sites (IRES), and other expression control elements (e.g., transcription termination signals, such as polyadenylation signals and poly-U sequences). Such regulatory sequences are described, for example, in Goeddel, *Gene Expression Technology: Methods in Enzymology*, 185, Academic Press, San Diego, California (1990). In some embodiments, regulatory elements may include those sequences that direct constitutive expression of a nucleotide sequence in many types of host cells and those sequences that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). Tissue-specific promoters may primarily direct expression in the desired tissue of interest, such as muscle, neurons, bone, skin, blood, specific organs (e.g., liver, pancreas), or specific cell types (e.g., lymphocytes). Regulatory elements can also direct expression in a time-dependent manner (e.g., in a cell cycle-dependent or developmental stage-dependent manner), which may or may not be tissue- or cell type-specific. In some embodiments, a vector may contain one or more pol III promoters (e.g., 1, 2, 3, 4, 5, or more pol III promoters), one or more pol II promoters (e.g., 1, 2, 3, 4, 5, or more pol II promoters), one or more pol I promoters (e.g., 1, 2, 3, 4, 5, or more pol I promoters), or combinations thereof. Examples of pol III promoters include, but are not limited to, the U6 and H1 promoters. Examples of pol II promoters include, but are not limited to, the retroviral Rous sarcoma virus (RSV) LTR promoter (optionally having an RSV enhancer), the cytomegalovirus (CMV) promoter (optionally having a CMV enhancer), the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerate kinase (PGK) promoter, and the EF1α promoter. 
The term “regulatory element” can also encompass enhancer elements such as WPRE, CMV enhancer, R-U5' fragment in the LTR of HTLV-I, SV40 enhancer; and intron sequence between exons 2 and 3 of rabbit β-globin (Proceedings of the National Academy of Sciences of the United States of America, Vol. 78(3), pp. 1527-31, 1981).
[0098] Those skilled in the art will understand that the design of expression vectors can depend on factors such as the choice of host cells to be transformed and the desired expression level. A vector can be introduced into a host cell to produce transcripts, proteins, or peptides, including fusion molecules or enzymes encoded by nucleic acids as described in this application (e.g., clustered regularly interspaced short palindromic repeats (CRISPR) transcripts, proteins, enzymes, their mutant forms, their fusion molecules or fusion proteins, etc.). Advantageous vectors include lentiviruses and adeno-associated viruses, and vectors of this type can also be selected to target specific cell types.
[0099] In this application, the term "codon optimization" generally refers to a method of modifying a nucleic acid sequence to enhance expression in a host cell of interest by replacing at least one codon of the natural sequence (e.g., about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more codons) with codons that are more frequently or most frequently used in the genes of that host cell, while maintaining the natural amino acid sequence. Different species exhibit particular bias for certain codons of a given amino acid. Codon bias (differences in codon usage between organisms) often correlates with the translation efficiency of messenger RNA (mRNA), which is in turn thought to depend on, among other things, the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs within a cell generally reflects the codons most frequently used for peptide synthesis. Therefore, genes can be tailored for optimal expression in a given organism based on codon optimization. Codon usage tables are readily available, for example, in the "Codon Usage Database," and these tables can be adapted in various ways. For example, see Nakamura Y. et al., Codon usage tabulated from the international DNA sequence databases: status for the year 2000, Nucleic Acids Res. 28:292 (2000). Computer algorithms for codon optimization of specific sequences for expression in specific host cells are also available, such as Gene Forge (Aptagen, Jacobus, PA). In some embodiments, one or more codons (e.g., 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more, or all codons) in the sequence encoding the Cas enzyme correspond to the codon most frequently used for a particular amino acid.
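As a non-limiting illustration of the codon replacement described above, the following minimal Python sketch swaps each codon for the most frequently used synonymous codon of a host. The table `TOY_USAGE`, its frequencies, and the function names are hypothetical placeholders chosen for illustration; a real workflow would load a complete usage table for the host of interest and would typically weigh further factors that this sketch ignores.

```python
# HYPOTHETICAL toy codon-usage table (amino acid -> {codon: relative frequency}).
# The numbers are placeholders for illustration, not measured host data.
TOY_USAGE = {
    "M": {"ATG": 1.00},
    "L": {"CTG": 0.40, "CTC": 0.20, "TTA": 0.07},
    "K": {"AAG": 0.57, "AAA": 0.43},
    "*": {"TGA": 0.50, "TAA": 0.30, "TAG": 0.20},
}


def codon_to_amino_acid(usage):
    """Invert the usage table into a codon -> amino acid lookup."""
    return {codon: aa for aa, codons in usage.items() for codon in codons}


def translate(cds, usage):
    """Translate a coding sequence using only the codons known to the table."""
    lookup = codon_to_amino_acid(usage)
    return "".join(lookup[cds[i:i + 3]] for i in range(0, len(cds), 3))


def optimize(cds, usage):
    """Replace every codon with the host's most frequent synonymous codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    lookup = codon_to_amino_acid(usage)
    # For each amino acid, pick the codon with the highest frequency.
    best = {aa: max(codons, key=codons.get) for aa, codons in usage.items()}
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        if codon not in lookup:
            raise KeyError(f"codon {codon} is not in the usage table")
        out.append(best[lookup[codon]])
    return "".join(out)


original = "ATGTTAAAATAA"
optimized = optimize(original, TOY_USAGE)
# The encoded amino acid sequence must be unchanged by the optimization.
assert translate(original, TOY_USAGE) == translate(optimized, TOY_USAGE)
```

Replacing every codon with the single most frequent one is only the simplest of the strategies the paragraph allows; replacing a subset of codons fits the same definition.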
[0100] In this application, the terms "guide RNA" and "gRNA" are used interchangeably, and generally refer to a group of nucleic acid molecules that facilitate the specific guidance of RNA-guided nucleases or other effector molecules (usually complexed with gRNA molecules) to target sequences. In nature, crRNA and tracrRNA usually exist as two separate RNA molecules that together form the gRNA. The term "tracrRNA" generally refers to a scaffold RNA that can bind to Cas nucleases, and the term "crRNA," also known as CRISPR RNA, generally refers to a nucleotide sequence complementary to the target DNA. crRNA and tracrRNA can also be fused into a single strand, in which case the gRNA can also be called a single guide RNA (sgRNA). sgRNA has become the most common form of gRNA used by those skilled in the art in CRISPR technology; therefore, the terms "sgRNA" and "gRNA" may have the same meaning herein. sgRNA can be synthesized artificially or prepared from a DNA template in vitro or in vivo. sgRNA can bind to Cas nucleases and target the target DNA, guiding Cas nucleases to cleave DNA sites complementary to the gRNA.
[0101] As used herein, crRNA typically comprises a spacer sequence mediating target recognition and a direct repeat sequence (also referred to herein as a “direct repeat” or “DR sequence”) that forms a complex with a CRISPR-Cas effector protein. In some cases, the spacer sequence (also called a guide sequence) is any polynucleotide sequence that is sufficiently complementary to the target sequence to hybridize with said target sequence and guide the specific binding of the CRISPR-Cas system complex to said target sequence. In some embodiments, when optimally aligned, the complementarity between the spacer sequence and its corresponding target sequence is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%. Determining optimal alignment is within the capabilities of a person skilled in the art. For example, publicly available and commercially available alignment algorithms and programs exist, such as, but not limited to, ClustalW, the Smith-Waterman algorithm in Matlab, Bowtie, Geneious, Biopython, and SeqMan.
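By way of a hedged illustration, the percentage complementarity between a spacer sequence and a target sequence could be estimated as sketched below. The sketch assumes an ungapped comparison of equal-length sequences and counts only Watson-Crick pairs; the function name and the example sequences are hypothetical, and the alignment programs named above would be used where gaps or unequal lengths must be handled.

```python
# DNA base on the target strand -> RNA base that pairs with it.
PAIRS = {"A": "U", "C": "G", "G": "C", "T": "A"}


def percent_complementarity(spacer_rna, target_dna):
    """Percentage of spacer positions that base-pair with the target strand.

    Both sequences are written 5'->3'. The strands pair antiparallel, so the
    spacer is compared against the target strand read in reverse.
    """
    spacer = spacer_rna.upper().replace("T", "U")
    target = target_dna.upper()
    if not spacer or len(spacer) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(
        1 for s, t in zip(spacer, reversed(target)) if PAIRS.get(t) == s
    )
    return 100.0 * paired / len(spacer)


# Hypothetical 10-nt example: a fully complementary spacer, then one mismatch.
target = "ACGTACGTAC"
print(percent_complementarity("GUACGUACGU", target))
print(percent_complementarity("GUACGUACGA", target))
```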
[0102] In some embodiments, the spacer sequence is at least 5, at least 10, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 35, at least 40, at least 45, or at least 50 nucleotides in length. In some embodiments, the spacer sequence is no more than 50, 45, 40, 35, 30, 25, 24, 23, 22, 21, 20, 15, 10, or fewer nucleotides in length. In some embodiments, the spacer sequence is 10-30, 15-25, 15-22, 16-24, 19-25, or 19-22 nucleotides in length. In some preferred embodiments, the spacer sequence is 20 nucleotides in length.
[0103] In some embodiments, the direct repeat sequence is at least 10, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 40, at least 45, at least 50, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, or at least 70 nucleotides in length. In some embodiments, the direct repeat sequence is no more than 70, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 50, 45, 40, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 15, 10, or fewer nucleotides in length. In some embodiments, the direct repeat sequence is 55-70 nucleotides in length, for example, 55-65 nucleotides, for example, 60-65 nucleotides, for example, 62-65 nucleotides, for example, 63-64 nucleotides. In some embodiments, the direct repeat sequence is 15-40 nucleotides in length, for example 15-25 nucleotides, for example 20-30 nucleotides, for example 22-36 nucleotides, for example 31 nucleotides.
[0104] In this application, the terms "comprising" and "including" are used interchangeably, and generally refer to including the expressly specified features but not excluding other elements. The term "at least" generally refers to including the stated number.
[0105] In this application, the terms “about” or “approximately” generally refer to an acceptable range of error for a particular value, as determined by those skilled in the art, in part depending on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” or “approximately” may mean within 1 or more standard deviations according to practice in the art. Alternatively, “about” or “approximately” may mean a range of up to 10% or 20% (i.e., ±10% or ±20%).
[0106] In this application, the term “selected from” generally refers to the selection of objects and all combinations thereof. For example, “selected from (:) A, B and C” means all combinations of A, B and C, such as A, B, C, A+B, A+C, B+C or A+B+C.
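The enumeration implied by “selected from” can be sketched as follows, assuming the term covers every non-empty combination of the listed objects. The function name is a hypothetical label used only for this illustration.

```python
from itertools import combinations


def selected_from(options):
    """Return every non-empty combination of the options, smallest first."""
    return [
        combo
        for size in range(1, len(options) + 1)
        for combo in combinations(options, size)
    ]


# Reproduces the A, B, C example from the definition above.
labels = ["+".join(combo) for combo in selected_from(["A", "B", "C"])]
print(labels)
```

For n listed objects this yields 2^n - 1 combinations, which is seven for A, B and C.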
[0107] In this application, when referring to a combination of multiple mutation sites, such as containing multiple specific amino acid substitution combinations, " / " can be used interchangeably with "and" and ",". For example, the Cas enzyme contains: T458K / E262K substitution; the Cas enzyme containing: T458K, E262K substitution both refer to the Cas enzyme having both T458K and E262K substitutions.
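The equivalence of the separators can be illustrated with a small parser. This is a sketch under stated assumptions: it handles only simple substitutions of the form original residue, position, new residue (for example T458K), it does not handle deletions, and the function and pattern names are hypothetical.

```python
import re

# " / ", "," and "and" are treated as interchangeable separators.
_SEPARATORS = re.compile(r"\s*(?:/|,|\band\b)\s*")
# One-letter original residue, 1-based position, one-letter new residue.
_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_combination(text):
    """Parse a mutation combination into a set of (original, position, new)."""
    tokens = [t for t in _SEPARATORS.split(text.strip()) if t]
    parsed = set()
    for token in tokens:
        match = _SUBSTITUTION.fullmatch(token)
        if match is None:
            raise ValueError(f"not a simple substitution: {token!r}")
        original, position, replacement = match.groups()
        parsed.add((original, int(position), replacement))
    # A frozenset makes the comparison independent of the written order.
    return frozenset(parsed)


# The three notations describe the same pair of substitutions.
assert (
    parse_combination("T458K / E262K")
    == parse_combination("T458K, E262K")
    == parse_combination("T458K and E262K")
)
```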
[0108] In this application, the term "reference sequence" refers to an amino acid sequence used as a benchmark. For example, the reference sequence of the Cas enzyme is EpiCas002 (SEQ ID NO:1) with an amino acid mutation of S128R substitution, which means that the serine at position 128 from the N-terminus of the amino acid sequence of SEQ ID NO:1 is mutated to arginine.
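The position numbering relative to a reference sequence can be sketched as below. The five-residue sequence is a hypothetical toy reference, not SEQ ID NO:1, and the function name is likewise an assumption made for illustration. Positions are counted from 1 at the N-terminus, and the residue found in the reference is checked against the residue named in the substitution.

```python
def apply_substitution(reference, substitution):
    """Apply a substitution such as 'S3R' to a reference amino acid sequence."""
    original = substitution[0]
    position = int(substitution[1:-1])  # 1-based, counted from the N-terminus
    replacement = substitution[-1]
    if not 1 <= position <= len(reference):
        raise ValueError(f"position {position} is outside the reference")
    found = reference[position - 1]
    if found != original:
        raise ValueError(
            f"reference has {found} at position {position}, not {original}"
        )
    return reference[:position - 1] + replacement + reference[position:]


# Hypothetical toy reference: serine at position 3 is replaced by arginine.
toy_reference = "MKSAT"
print(apply_substitution(toy_reference, "S3R"))
```

Checking the original residue guards against applying a substitution to the wrong reference sequence, since the same position can hold different residues in SEQ ID NOs:1-4.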
[0109] The embodiments described below are not intended to be limited by any theory, but are merely for illustrating the enzymes and their fusion molecules, methods and uses of this application, and are not intended to limit the scope of the invention.
[0110] Detailed Description of the Invention
[0111] Cas enzyme
[0112] The amino acid sequence of the Cas enzyme described in this application may contain a sequence that has at least about 80% identity with the amino acid sequence shown in any one of SEQ ID NOs:1-4, or the Cas enzyme may be based on the amino acid sequence shown in any one of SEQ ID NOs:1-4, respectively containing substitutions or deletions of one or more amino acids.
[0113] For example, the Cas enzyme may comprise a sequence having about 80%, about 81%, about 82%, about 83%, about 84%, about 85%, about 86%, about 87%, about 88%, about 89%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or about 100% identity with the amino acid sequence shown in any of SEQ ID NOs:1-4.
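As a hedged sketch of how a percentage identity might be checked against the stated threshold, the following Python function compares two sequences that have already been aligned, with "-" marking gaps. The denominator used here (alignment columns in which at least one sequence has a residue) is one common convention and is an assumption of this sketch; the application does not prescribe a particular formula, and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences carry a gap.
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no comparable alignment columns")
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


def meets_identity_threshold(aligned_a, aligned_b, threshold=80.0):
    """True when the identity is at least the threshold (80% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical 10-residue example with a single substitution.
print(percent_identity("MKTAYIAKQR", "MKTAYLAKQR"))
```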
[0114] In some embodiments, the Cas enzyme can undergo substitution or deletion of one or more amino acids based on the amino acid sequence shown in SEQ ID NO:1. The amino acid substitutions may include substitutions at one or more of the following positions: 28, 83, 148, 206, 212, 265, 268, 277, 322, 336, 362, 375, 386, 402, 420, 560, 602, 627, 743, 756, 793, 802, 859, 860, 891, 901, 1004, 1053, 1079, 1088, 1108, 1122, 1162, 1212, 1242, and 1255. In some embodiments, the Cas enzyme can undergo substitution or deletion of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more amino acids based on the amino acid sequence shown in SEQ ID NO:1.
[0115] In some embodiments, the Cas enzyme can undergo substitution or deletion of one or more amino acids based on the amino acid sequence shown in SEQ ID NO:1. The amino acid substitutions may include substitution of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10) amino acids selected from M206I, I756V, D1212N, E148D, G793D, S859R, I1088V, C265Y, L336M, S375R, F1242V, D627Y, K901N, F1255S, D1162G, K83E, V386L, D560E, A1122V, M277L, S802C, T28S, S891N, N362S, D860N, S420N, N1079I, T402I, D602N, E212V, S268G, E743G, L1004M, E322K, R1053I and K1108Q.
[0116] In some embodiments, the Cas enzyme may comprise the amino acid sequence shown in any one of SEQ ID NOs:5-18, 397, and 398.
[0117] In some embodiments, the Cas enzyme can undergo substitution or deletion of one or more amino acids based on the amino acid sequence shown in SEQ ID NO:2. The amino acid substitutions may include substitutions at one or more of the following positions: 6, 35, 39, 130, 149, 169, 172, 178, 252, 266, 285, 306, 309, 342, 378, 388, 395, 399, 404, 411, 421, 434, 438, 476, 506, 520, 572, 602, 611, 612, 626, 674, 679, 716, 729, 767, 930, 939, 1017, 1072, 1087, 1103, 1104, and 1242. In some embodiments, the Cas enzyme can undergo substitution or deletion of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more amino acids based on the amino acid sequence shown in SEQ ID NO:2.
[0118] In some embodiments, the Cas enzyme can undergo substitution or deletion of one or more amino acids based on the amino acid sequence shown in SEQ ID NO:2. The amino acid substitutions may include substitution of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10) amino acids selected from S130R, Q169R, A172R, T178R, D252R, T729R, D1017R, T306A, E395K, Y602H, N611K, A1103V, D1104N, F309L, I520L, D679Y, T1072S, D378E, T399S, T411I, E6K, D679K, D939V, E1087K, E1242G, L35M, D149Y, E388K, E404V, K476M, E716K, L342D, I572F, R39S, E285K, L421M, L438M, T506I, I674N, V612M, D626N, F767L, T930S, G266S and S434N.
[0119] In some embodiments, the Cas enzyme may comprise the amino acid sequence shown in any one of SEQ ID NOs:19-35.
[0120] In some embodiments, the Cas enzyme can undergo substitution or deletion of one or more amino acids based on the amino acid sequence shown in SEQ ID NO:3. The substitution of amino acids may include substitutions at one or more of the following positions: 7, 82, 165, 167, 180, 194, 224, 226, 267, 288, 323, 361, 446, 447, 451, 458, 484, 510, 532, 537, 561, 569, 571, 604, 605, 693, 891, 899, 915, 916, and 1001. In some embodiments, the Cas enzyme may be substituted or deleted with one, two, three, four, five, six, seven, eight, nine, ten or more amino acids based on the amino acid sequence shown in SEQ ID NO:3.
[0121] In some embodiments, the Cas enzyme can undergo substitution or deletion of one or more amino acids based on the amino acid sequence shown in SEQ ID NO:3. The substitutions may include substitution of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10) amino acids selected from D7G, Q165W or Q165R or Q165K, N167R, V180K or V180N or V180Q or V180R or V180T or V180W or V180G, S224 deletion or S224R, K226R, V267M, E323R or E323G or E323K, G361R, K446R, F451G, E571R, R605G, N693K, L899R, K915R or K915N or K915W or K915V or K915Y or K915Q or K915G or K915T, G916D, T1001I, D458K, D484K, D510K, D532K, D537K, D561K, D569K, I288T, D891E, S447N, A194V, A82V and I604L.
[0122] In some embodiments, the Cas enzyme may comprise the amino acid sequence shown in any one of SEQ ID NOs:36-79 and 166-181.
[0123] In some embodiments, the Cas enzyme can undergo substitution or deletion of one or more amino acids based on the amino acid sequence shown in SEQ ID NO:4. The amino acid substitutions may include substitutions at one or more of the following positions: 4, 6, 9, 11, 15, 17, 23, 24, 28, 53, 65, 72, 81, 103, 110, 122, 123, 133, 136, 144, 153, 160, 164, 165, 172, 183, 184, 191, 196, 202, 211, 213, 219, 224, 227, 229, 231, 233, 235, 237, 243, 259, 260, 262, 268, 274, 275, 281, 286, 297, 326, 328, 331, 332, 338, 343, 352, 361, 363, 365, 367, 375, 382, 383, 387, 402, 418, 420, 429, 434, 441, 443, 456, 458, 462, 466, 477, 485, 494, 519, 520, 527, 528, 531, 579, 582, 591, 608, 612, 615, 627, 633, 646, 664, 666, 684, 698, 699, 704, 707, 708, 711, 713, 715, 718, 720, 724, 731, 733, 749, 754, 766, 786, 804, 818, 820, 842, 851, 864, 865, 881, 890, 901, 907, and 908. In some embodiments, the Cas enzyme can undergo substitution or deletion of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more amino acids based on the amino acid sequence shown in SEQ ID NO:4.
[0124] In some embodiments, the Cas enzyme can undergo substitution or deletion of one or more amino acids based on the amino acid sequence shown in SEQ ID NO:4. The substitutions may include substitution of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) amino acids selected from E361K or E361R, K367R, E338K or E338D or E338R, D15G, K28R, E262K, M434R, K443R, T458K or T458R, A229T, W237R, D418G, D420G, K227N, E235Q, E243K, L268V, S274Y, K275M or K275E, E297K, F343Y or F343S or F343I, P383A, S387R, K429N, L520M, L579R, E731Q, M820I, E441K, E213G, N260D, D402Y, L110M, I133N, I191T, N519K, V365M, V233D, M666I, E749G, S165C, W259R, E612K, D615N, T196I, G494D, Q231R, K907N, L908C, N224S, E851K, H462Y, N466D, A708S or A708H or A708Y or A708R, Q881H, P627L, T818P, S328R, E262G, D15Y, Y184C, A11V, K219Q or K219E, E842K, G528C, E144R or E144K, D183Y, K363R, K582R, T6R or T6C, H202R, T4R, H286R or H286Q, M646D, K527R, A153T, T901R, M382R, M172E or M172C, D804R, A890W, M122W, F485A, M211E or M211R, H65R, D684R, Y711D or Y711P, I713R or I713N, N53R, V103R, E81R, K123R, K9R, S17I, I331F or I331C or I331S, Y754F, T786I, V326G, C608Y, S766N, N864D, N332H, E375K, A531E, K72T, A664Q, K24Q, W699Q or W699T or W699A or W699S or W699I or W699L or W699V or W699H or W699M or W699F or W699R or W699P or W699N or W699E or W699D, E160P, D591W, D136Q, R281M, G733I, T718I, W707L or W707M or W707I or W707A or W707V, Y720A, Y720S, Y720Q, Y720G, Y720T, Y720N, A352V, C704V, Y715V, Y715H, N724T, K698Y, K698H, R865A, P456S, D477A, A164V, G23D, and S633N.
[0125] In some embodiments, the Cas enzyme may comprise an amino acid sequence represented by any one of SEQ ID NOs:80-148, 182-378, or 383-396.
[0126] Specific substitution mutation combinations
[0127] In some embodiments, the Cas enzyme may undergo substitution or deletion of one or more amino acids based on the amino acid sequence shown in SEQ ID NO:3. The substitution of amino acids may include one or more (e.g., 1, 2, 3, 4, or 5) amino acids selected from V180K, I288T, S447N, D891E, and E323K.
[0128] In some embodiments, the Cas enzyme can undergo a one-amino acid substitution based on the amino acid sequence shown in SEQ ID NO:3. For example, the Cas enzyme can undergo a V180K substitution based on the amino acid sequence shown in SEQ ID NO:3. For example, the Cas enzyme can undergo an I288T substitution based on the amino acid sequence shown in SEQ ID NO:3. For example, the Cas enzyme can undergo an S447N substitution based on the amino acid sequence shown in SEQ ID NO:3. For example, the Cas enzyme can undergo a D891E substitution based on the amino acid sequence shown in SEQ ID NO:3. For example, the Cas enzyme can undergo an E323K substitution based on the amino acid sequence shown in SEQ ID NO:3.
[0129] In some embodiments, the Cas enzyme can undergo two-amino acid substitutions based on the amino acid sequence shown in SEQ ID NO:3. For example, the Cas enzyme can undergo V180K and I288T substitutions based on the amino acid sequence shown in SEQ ID NO:3. For example, the Cas enzyme can undergo V180K and S447N substitutions based on the amino acid sequence shown in SEQ ID NO:3. For example, the Cas enzyme can undergo I288T and S447N substitutions based on the amino acid sequence shown in SEQ ID NO:3.
[0130] In some embodiments, the Cas enzyme can undergo three-amino acid substitutions based on the amino acid sequence shown in SEQ ID NO:3. For example, the Cas enzyme can undergo V180K, I288T, and S447N substitutions based on the amino acid sequence shown in SEQ ID NO:3.
[0131] In some embodiments, the Cas enzyme may be substituted with V180K, I288T, S447N, D891E, and E323K based on the amino acid sequence shown in SEQ ID NO:3.
[0132] In some embodiments, the Cas enzyme may undergo substitution or deletion of one or more amino acids based on the amino acid sequence shown in SEQ ID NO:4. The substitution of amino acids may include substitution or deletion of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) amino acids selected from T458K, E262K, K443R, M434R, N332H, W237R, E144K, D183Y, G528C, S633N, S328R, I331S, E375K, C608Y, E338K, and D804R.
[0133] In some embodiments, the Cas enzyme can undergo a one-amino acid substitution based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo a T458K substitution based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo an E262K substitution based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo a K443R substitution based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo an M434R substitution based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo an N332H substitution based on the amino acid sequence shown in SEQ ID NO:4.
[0134] In some embodiments, the Cas enzyme can undergo two-amino acid substitutions based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo T458K and E262K substitutions based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo T458K and K443R substitutions based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo T458K and M434R substitutions based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo T458K and N332H substitutions based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo E262K and K443R substitutions based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo E262K and M434R substitutions based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo E262K and N332H substitutions based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo K443R and M434R substitutions based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo K443R and N332H substitutions based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo M434R and N332H substitutions based on the amino acid sequence shown in SEQ ID NO:4.
[0135] In some embodiments, the Cas enzyme can undergo three-amino acid substitutions based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo substitutions of T458K, E262K, and K443R based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo substitutions of T458K, E262K, and M434R based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo substitutions of T458K, K443R, and M434R based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo substitutions of T458K, K443R, and N332H based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo substitutions of T458K, M434R, and N332H based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo E262K, K443R, and M434R substitutions based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo E262K, K443R, and N332H substitutions based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo E262K, M434R, and N332H substitutions based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo K443R, M434R, and N332H substitutions based on the amino acid sequence shown in SEQ ID NO:4.
[0136] In some embodiments, the Cas enzyme can undergo four-amino acid substitutions based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo substitutions of T458K, E262K, K443R, and M434R based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo substitutions of T458K, E262K, K443R, and N332H based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo substitutions of T458K, E262K, M434R, and N332H based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo substitutions of T458K, K443R, M434R, and N332H based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo substitutions of E262K, K443R, M434R, and N332H based on the amino acid sequence shown in SEQ ID NO:4.
[0137] In some embodiments, the Cas enzyme can undergo substitution of five amino acids based on the amino acid sequence shown in SEQ ID NO:4. For example, the Cas enzyme can undergo substitution of T458K, E262K, K443R, M434R, and N332H based on the amino acid sequence shown in SEQ ID NO:4.
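The combinations described in the preceding paragraphs can be enumerated programmatically. The following is a minimal sketch, assuming only the five substitutions named above relative to SEQ ID NO:4; the function name `substitution_combinations` is hypothetical and is not part of this application.

```python
from itertools import combinations

# The five substitutions named above, relative to SEQ ID NO:4.
SUBSTITUTIONS = ("T458K", "E262K", "K443R", "M434R", "N332H")


def substitution_combinations(size):
    """Return every unordered combination of `size` substitutions."""
    return list(combinations(SUBSTITUTIONS, size))


if __name__ == "__main__":
    for size in range(1, len(SUBSTITUTIONS) + 1):
        combos = substitution_combinations(size)
        print(f"{size} substitution(s): {len(combos)} combination(s)")
        for combo in combos:
            print("  " + " / ".join(combo))
```

For five substitutions there are ten possible pairs, ten possible triples, five possible quadruples, and one quintuple; the paragraphs above give examples drawn from this space.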
[0138] In some embodiments, the Cas enzyme may have a catalytically active domain capable of binding to and / or cleaving the target DNA strand.
[0139] In some embodiments, the substitution of one or more amino acids may be located in the catalytically active domain, thereby enabling the Cas enzyme to have only the activity of binding to the target DNA strand, or to have both the activity of binding to the target DNA strand and the activity of cleaving the single strand of the target DNA.
[0140] Fusion Molecules
[0141] In another aspect, this application provides a fusion molecule that may include the Cas enzyme described in this application and one or more heterologous functional domains.
[0142] In some embodiments, the one or more heterologous functional domains may be able to regulate the expression of one or more gene products.
[0143] In some embodiments, the one or more heterologous functional domains may be directly or indirectly fused to the Cas enzyme.
[0144] In some embodiments, the one or more heterologous functional domains may be selected from helicases, nucleases, helicase-nucleases, DNA methyltransferases, DNA hydroxymethyltransferases, histone methyltransferases, histone demethylases, histone acetyltransferases, histone deacetylases, phosphatases, kinases, transcription (co)activators, transcription repressors, DNA-binding proteins, DNA structural proteins, marker proteins, reporter proteins, fluorescent proteins, ligand-binding proteins, signal peptides, subcellular localization sequences, antibody epitopes, and affinity purification tags.
[0145] In some embodiments, the one or more heterologous functional domains may have one or more of the following activities: methylase activity, demethylase activity, deaminase activity, transcriptional activation activity, transcriptional repression activity, transcriptional release factor activity, reverse transcriptase activity, histone modification activity, RNA cleavage activity, and nucleic acid binding activity.
[0146] CRISPR-Cas system and vector system
[0147] In another aspect, this application provides an engineered, programmable, non-naturally occurring CRISPR-Cas system, which may contain the Cas enzyme described in this application or a nucleotide sequence encoding the Cas enzyme; or the system may contain the fusion molecule described in this application or a nucleotide sequence encoding the fusion molecule, and one or more guide RNAs, which target the loci of nucleic acid molecules encoding one or more gene products in the cell, thereby guiding the Cas enzyme or the fusion molecule to bind to and / or cleave the loci of the nucleic acid molecules encoding one or more gene products; and the Cas enzyme or the fusion molecule and the guide RNA may exist simultaneously or separately.
[0148] In another aspect, this application provides an engineered, non-naturally occurring vector system, which may comprise one or more vectors, the one or more vectors comprising: a) a first regulatory element operably linked to one or more guide RNAs capable of hybridizing with a target sequence in a locus of a nucleic acid molecule encoding one or more gene products; and b) a second regulatory element operably linked to the Cas enzyme or the fusion molecule described in this application, wherein components a) and b) are located on the same or different vectors of the vector system, and the guide RNA targets the locus of the nucleic acid molecule encoding one or more gene products in the cell, thereby directing the Cas enzyme or the fusion molecule to bind to and / or cleave the locus of the nucleic acid molecule encoding one or more gene products; and the Cas enzyme or the fusion molecule and the guide RNA may exist simultaneously or separately.
[0149] In some embodiments, the expression of one or more gene products can be altered.
[0150] In some embodiments, the expression of the gene product may be reduced or increased.
[0151] In some embodiments, the gene product may be a protein.
[0152] In some embodiments, the cell may be a eukaryotic cell.
[0153] In some embodiments, the eukaryotic cell may be a mammalian cell. For example, the mammalian cell may include, but is not limited to, cells of mice, monkeys, humans, livestock, sporting animals, and pets.
[0154] In some embodiments, the mammalian cell may be a human cell.
[0155] In some embodiments, the Cas enzyme may be codon-optimized for expression in eukaryotic cells.
[0156] In some embodiments, the guide RNA may comprise a guide sequence fused to a tracr sequence.
[0157] In some embodiments, the guide RNA may comprise a direct repeat sequence and a spacer sequence, wherein the spacer sequence binds to a nucleic acid molecule targeted by the guide RNA.
[0158] In some embodiments, the length of the direct repeat sequence can be 10 to 70 nucleotides.
[0159] In some embodiments, the length of the direct repeat sequence can be 31 to 36 nucleotides.
[0160] In some embodiments, the length of the spacer sequence may be 16 to 24 nucleotides.
[0161] In some embodiments, the nucleic acid molecule targeted by the guide RNA may contain a nucleotide sequence that can complementarily pair with the spacer sequence.
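The length ranges and the pairing requirement stated above can be expressed as a simple check. The following is a minimal sketch; the function names `pairing_dna` and `check_guide` are hypothetical, and the sequences in the usage block are placeholders for illustration only, not sequences from this application.

```python
# Watson-Crick partners of RNA bases, written as DNA bases.
RNA_TO_DNA_PARTNER = {"A": "T", "C": "G", "G": "C", "U": "A"}


def pairing_dna(spacer_rna):
    """Return the DNA strand (5'->3') that base-pairs with an RNA spacer."""
    return "".join(RNA_TO_DNA_PARTNER[base] for base in reversed(spacer_rna.upper()))


def check_guide(direct_repeat, spacer, target_dna):
    """Check a guide RNA against the length ranges and pairing stated above."""
    return {
        "direct_repeat_10_to_70_nt": 10 <= len(direct_repeat) <= 70,
        "direct_repeat_31_to_36_nt": 31 <= len(direct_repeat) <= 36,
        "spacer_16_to_24_nt": 16 <= len(spacer) <= 24,
        "target_pairs_with_spacer": pairing_dna(spacer) in target_dna.upper(),
    }


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    toy_repeat = "A" * 33
    toy_spacer = "GGGAAACCCUUUGGGAAACC"
    toy_target = "TT" + pairing_dna(toy_spacer) + "AA"
    print(check_guide(toy_repeat, toy_spacer, toy_target))
```

The check treats the target as a single DNA strand and requires perfect pairing over the full spacer; tolerance for mismatches is not modeled here.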
[0162] In some embodiments, the vector of the system or the Cas enzyme may also contain one or more nuclear localization sequences (NLS).
[0163] In some embodiments, the system can be introduced into the cells via a delivery system selected from viral particles, liposomes, lipid nanoparticles, electroporation, microinjection, and conjugation.
[0164] Method
[0165] In another aspect, this application provides a method for altering the expression of one or more gene products. The method may include introducing an engineered, non-naturally occurring CRISPR-Cas system into a cell containing and expressing a nucleic acid molecule encoding the one or more gene products. The system may contain the Cas enzyme or the fusion molecule described in this application, and one or more guide RNAs. The one or more guide RNAs target the loci of the nucleic acid molecule encoding the one or more gene products, thereby directing the Cas enzyme or the fusion molecule to bind to and / or cleave the loci, thereby altering the expression of the one or more gene products. Furthermore, the Cas enzyme or the fusion molecule and the guide RNA may or may not coexist.
[0166] In another aspect, this application provides a method for altering the expression of one or more gene products. The method may include introducing an engineered, non-naturally occurring vector system into a cell containing and expressing a nucleic acid molecule encoding the one or more gene products. The vector system may contain one or more vectors comprising: a) a first regulatory element operably linked to one or more guide RNAs capable of hybridizing with a target sequence at a locus of the nucleic acid molecule encoding the one or more gene products; and b) a second regulatory element operably linked to the Cas enzyme or the fusion molecule described in this application. Components a) and b) are located on the same or different vectors of the vector system, and the guide RNA targets the locus of the nucleic acid molecule encoding the one or more gene products in the cell, thereby directing the Cas enzyme or the fusion molecule to bind to and / or cleave the locus, thereby altering the expression of the one or more gene products. Furthermore, the Cas enzyme or the fusion molecule and the guide RNA may or may not coexist.
[0167] In some embodiments, the expression of the gene product may be reduced or increased.
[0168] In some embodiments, the gene product may be a protein.
[0169] In some embodiments, the cell may be a eukaryotic cell.
[0170] In some embodiments, the eukaryotic cell may be a mammalian cell. For example, the mammalian cell includes, but is not limited to, cells of mice, monkeys, humans, livestock, sporting animals, and pets.
[0171] In some embodiments, the mammalian cell may be a human cell.
[0172] In some embodiments, the Cas enzyme may be codon-optimized for expression in eukaryotic cells.
[0173] In some embodiments, the guide RNA may comprise a guide sequence fused to a tracr sequence.
[0174] In some embodiments, the guide RNA may comprise a direct repeat sequence and a spacer sequence, wherein the spacer sequence binds to a nucleic acid molecule targeted by the guide RNA.
[0175] In some embodiments, the length of the direct repeat sequence can be 10 to 70 nucleotides.
[0176] In some embodiments, the length of the direct repeat sequence can be 31 to 36 nucleotides.
[0177] In some embodiments, the length of the spacer sequence may be 16 to 24 nucleotides.
[0178] In some embodiments, the nucleic acid molecule targeted by the guide RNA may contain a nucleotide sequence that can complementarily pair with the spacer sequence.
[0179] In some embodiments, the vector of the system or the Cas enzyme may also contain one or more nuclear localization sequences (NLS).
[0180] In some embodiments, the method may include introducing the CRISPR-Cas system or the vector system into the cells via a delivery system selected from viral particles, liposomes, lipid nanoparticles, electroporation, microinjection, and conjugation.
[0181] Nucleic acids, cells and reagent kits
[0182] In another aspect, this application provides nucleic acids encoding the Cas enzyme, the fusion molecule, or the CRISPR-Cas system described in this application.
[0183] In another aspect, this application provides a cell that may contain the Cas enzyme, the fusion molecule, the CRISPR-Cas system, the vector system, and / or the nucleic acid described in this application.
[0184] In some embodiments, the cells may include HEK293T cells.
[0185] In another aspect, this application provides a kit that may contain the Cas enzyme described in this application, the fusion molecule described in this application, the CRISPR-Cas system described in this application, the vector system described in this application, the nucleic acid described in this application, and / or the cell described in this application.
[0186] In some embodiments, the kit may also include a container for placing the Cas enzyme, the fusion molecule, the CRISPR-Cas system, the vector system, the nucleic acid and / or the cells, and instructions for use.
[0187] Other aspects and advantages of this application will readily be apparent to those skilled in the art from the detailed description below. Only exemplary embodiments of this application are shown and described in the following detailed description. As will be appreciated by those skilled in the art, the content of this application enables them to make modifications to the disclosed specific embodiments without departing from the spirit and scope of the invention to which this application pertains. Accordingly, the descriptions in the accompanying drawings and specification of this application are merely exemplary and not restrictive.
[0188] Example
[0189] Example 1
[0190] In the examples and accompanying figures, "+" indicates the combination of mutation sites. Taking Figure 3A as an example, M1+M2 means that the variant has the mutation sites of both M1 and M2, that is, it has both the S130R and Q169R mutations on the basis of EpiCas045 (SEQ ID NO: 2); M1+M2+M6 means that the variant has the mutation sites of M1, M2 and M6, that is, it has the S130R, Q169R and T729R mutations on the basis of EpiCas045 (SEQ ID NO: 2).
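The "+" notation can be expanded mechanically into a list of substitutions. The following is a minimal sketch that uses only the three site-to-substitution assignments stated in the paragraph above for EpiCas045 (M1, M2, and M6); the remaining entries of Table 1 are not reproduced here, and the function name `expand_variant` is hypothetical.

```python
# Only the assignments stated in the paragraph above; other sites are in Table 1.
SITE_TO_SUBSTITUTION = {
    "M1": "S130R",
    "M2": "Q169R",
    "M6": "T729R",
}


def expand_variant(label):
    """Expand a label such as 'M1+M2+M6' into its list of substitutions."""
    return [SITE_TO_SUBSTITUTION[site.strip()] for site in label.split("+")]


if __name__ == "__main__":
    for label in ("M1+M2", "M1+M2+M6"):
        print(label, "->", ", ".join(expand_variant(label)))
```

A label that names a site missing from the mapping raises `KeyError`, which makes an incomplete lookup table visible instead of silently dropping a mutation.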
[0191] 1.1 Determine the cleavage activity of the Cas enzyme of this application in eukaryotic cells.
[0192] Different variants of the Cas enzyme provided in this application, their corresponding gRNAs, and reporter plasmids with gRNA targeting sites were co-transfected into HEK293T cells. By modifying the targeting sequence on the reporter plasmid, reporter systems for different Cas enzymes can be rapidly constructed. As shown in Figure 1, cells carrying an uncleaved reporter system express only blue fluorescent protein; the green fluorescent protein expression cassette is incomplete, so green fluorescent protein cannot be expressed normally and the reporter system emits only blue fluorescence. After cleavage at the gRNA targeting site, the cells repair the DNA through homologous recombination, thereby obtaining a complete green fluorescent protein expression cassette, causing the reporter system to emit green fluorescence. Cas enzymes with stronger cleavage activity cleave a higher proportion of the reporter DNA, resulting in more copies of repaired green fluorescent protein, and the cells exhibit stronger green fluorescence. Therefore, this example compares the proportion and fluorescence intensity of green fluorescent cells to quickly compare the cleavage activity of different Cas enzymes.
[0193] In this embodiment, an exemplary nucleotide sequence encoding the reporter system is shown below (bold text represents blue and green fluorescent protein sequences, italics represent linker sequences, and underlined text represents 2A cleavage peptide sequences; bold underlined text represents an exemplary sequence that is reverse complementary to the gRNA targeting sequence (SEQ ID NO: 149), wherein the TTTG and TGG (italics) at both ends of this sequence will be set to different sequences according to the PAM preferences of different Cas enzymes):
[0194] Figure 2 shows the proportion of green fluorescent cells activated by different variants of the reference sequence SEQ ID NO:1 (EpiCas002 shown in Table 1). The DR (direct repeat) region sequence and spacer sequence of the gRNA are shown in SEQ ID NO:151 and 154, respectively.
[0195] This embodiment also compared the enhanced cleavage activity of different variants compared to wild-type Cas enzyme. The proportion of cells activated by wild-type Cas enzyme with the reference sequence was normalized to 1, and the fold increase in the activation rate of the mutant compared to the wild type was calculated. The results shown in Figures 3A-3B are a comparison of the cleavage activity of different variants (variant numbers are shown in Table 1) of the reference sequence SEQ ID NO:2 (EpiCas045 shown in Table 1); wherein, the DR region sequence and spacer sequence of gRNA are shown as SEQ ID NO:152 and 154, respectively.
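The normalization described above, in which the wild-type proportion is set to 1 and each variant is expressed as a fold value, can be sketched as follows. The percentages in the usage block are hypothetical placeholders, not measured data, and the function name `fold_over_wild_type` is an assumption made for illustration.

```python
def fold_over_wild_type(activated_percent, wild_type_label="WT"):
    """Divide each activated-cell percentage by the wild-type percentage.

    The wild type therefore maps to 1.0, and each variant maps to its
    fold value relative to the wild type.
    """
    wild_type = activated_percent[wild_type_label]
    if wild_type <= 0:
        raise ValueError("wild-type percentage must be positive")
    return {name: value / wild_type for name, value in activated_percent.items()}


if __name__ == "__main__":
    # Hypothetical percentages of green fluorescent cells, for illustration only.
    example = {"WT": 10.0, "M1": 15.0, "M1+M2": 25.0}
    for name, fold in fold_over_wild_type(example).items():
        print(f"{name}: {fold:.2f}")
```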
[0196] Figures 4-21 show the test results of different variants of the reference sequence SEQ ID NO:4 (EpiCas059 shown in Table 1) (variant numbers are shown in Table 1). The proportion of cells activated by the wild-type Cas enzyme with the reference sequence was normalized to 1, and the fold increase in the activation ratio of the mutant compared with the wild type was calculated.
[0197] Figures 22A-22D show the results comparing the cleavage activity of different variants of the reference sequence SEQ ID NO:4 (EpiCas059 shown in Table 1) (variant numbers are shown in Table 1), and directly statistically analyzing the proportion of green fluorescent cells that can be activated by different variants.
[0198] In Figures 4-21 and 22A-22D, the DR region sequence and Spacer sequence of the gRNA are shown as SEQ ID NO:153 and 154, respectively.
[0199] The completely inactivated Cas enzyme sequence (Dead sequence) shown in Figure 5 is as shown in SEQ ID NO:155. The M46 and M47 variants involved in Figures 4-7 and 22 are shown in Table 2 below:
[0200] Table 2 M46 and M47 variants
[0201] All of the above results indicate that the cleavage activity of the mutants is improved compared to their wild-type counterparts, and the mutants formed by the superposition of multiple single amino acid substitutions can further enhance the activity.
[0202] 1.2 Modify the 5'-end PAM of the sequence in the reporter system that is reverse complementary to the gRNA target sequence, and determine the cleavage activity of the Cas enzyme in eukaryotic cells.
[0203] The 5'-end PAM of the sequence in the reporter system that is reverse complementary to the gRNA target sequence was changed to TTVG (TTVG represents a mixture of the TTCG, TTGG, and TTAG versions in equal proportions), and the increase in cleavage activity of different variants relative to the wild-type Cas enzyme was compared. The proportion of cells activated by the wild-type Cas enzyme with the reference sequence was normalized to 1, and the fold increase in the activation ratio of each mutant compared with the wild type was calculated.
[0204] Figure 23 shows the test results of different variants of the reference sequence SEQ ID NO:4 (EpiCas059 shown in Table 1). The proportion of cells activated by the wild-type Cas enzyme with the reference sequence was normalized to 1, and the fold increase in the activation ratio of the mutant compared with the wild type was calculated.
[0205] The 5'-end PAM of the sequence in the reporter system that is reverse complementary to the gRNA target sequence was changed to TVTG (TVTG represents a mixture of the TCTG, TGTG, and TATG versions in equal proportions), and the increase in cleavage activity of different variants relative to the wild-type Cas enzyme was compared. The proportion of cells activated by the wild-type Cas enzyme with the reference sequence was normalized to 1, and the fold increase in the activation ratio of each mutant compared with the wild type was calculated.
[0206] Figure 24 shows the test results of different variants of the reference sequence SEQ ID NO:4 (EpiCas059 shown in Table 1). The proportion of cells activated by the wild-type Cas enzyme with the reference sequence was normalized to 1, and the fold increase in the activation ratio of the mutant compared with the wild type was calculated.
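The degenerate PAMs TTVG and TVTG used above follow the standard IUPAC nucleotide code, in which V stands for A, C, or G. The following is a minimal sketch of expanding such a code into its concrete sequences; the function name `expand_pam` is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_pam(pam):
    """Return every concrete sequence encoded by a degenerate PAM."""
    choices = (IUPAC[base] for base in pam.upper())
    return ["".join(bases) for bases in product(*choices)]


if __name__ == "__main__":
    print(expand_pam("TTVG"))
    print(expand_pam("TVTG"))
```

Each degenerate PAM above expands to three sequences, matching the three versions that were mixed in equal proportions.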
[0207] The sequence in the reporter system that is reverse complementary to the gRNA target sequence was changed to TGCAAGCTAACAGTTGCTTT, and its 5'-end PAM was changed to CTAG. The increase in cleavage activity of different variants relative to the wild-type Cas enzyme was compared. The proportion of cells activated by the wild-type Cas enzyme with the reference sequence was normalized to 1, and the fold increase in the activation ratio of each mutant compared with the wild type was calculated.
[0208] Figures 25 and 26 show the test results of different variants of the reference sequence SEQ ID NO:4 (EpiCas059 shown in Table 1). The proportion of cells activated by the wild-type Cas enzyme with the reference sequence was normalized to 1, and the fold increase in the activation ratio of the mutant compared with the wild type was calculated.
[0209] In Figures 4-26, the DR region sequence and spacer sequence of the gRNA used in the experimental group are shown in SEQ ID NO:153 and 154, respectively. The DR region sequence and spacer sequence of the gRNA used in the NT control group are shown in SEQ ID NO:153 and 379 (sequence: CUGAAGGUGUCUGGCAGAGC), respectively. The DR region sequence and spacer sequence of the gRNA used after the reporter sequence that is reverse complementary to the gRNA target sequence was changed are shown in SEQ ID NO:153 and 380 (sequence: UGCAAGCUAACAGUUGCUUU), respectively. The specific sequences are shown in Table 3.
[0210] Table 3
[0211] All of the above results indicate that the cleavage activity of the mutants is improved compared to their wild-type counterparts, and the mutants formed by the superposition of multiple single amino acid substitutions can further enhance the activity.
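As a consistency check on the sequences quoted in section 1.2, the spacer written for SEQ ID NO:380 can be compared with the 20-nucleotide reporter sequence given in paragraph [0207]. The following is a minimal sketch using only those quoted sequences; the helper name `dna_to_rna` is hypothetical, and no statement about strand orientation is intended.

```python
def dna_to_rna(dna):
    """Rewrite a DNA sequence in RNA letters (T becomes U)."""
    return dna.upper().replace("T", "U")


# Sequences quoted in paragraphs [0207] and [0209].
REPORTER_SEQUENCE = "TGCAAGCTAACAGTTGCTTT"
SPACER_SEQ_ID_380 = "UGCAAGCUAACAGUUGCUUU"
SPACER_SEQ_ID_379 = "CUGAAGGUGUCUGGCAGAGC"  # NT control spacer

if __name__ == "__main__":
    print(dna_to_rna(REPORTER_SEQUENCE) == SPACER_SEQ_ID_380)
    print(len(SPACER_SEQ_ID_380), len(SPACER_SEQ_ID_379))
```

Both quoted spacers are 20 nucleotides long, which falls within the 16 to 24 nucleotide spacer range given in the description.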
[0212] Example 2: Detection of inactivated Cas enzyme (dCas) at endogenous sites
[0213] (1) The Cas enzyme variant provided in this application (its reference sequence is EpiCas057 shown in Table 1, i.e., SEQ ID NO:3, and the different variants are numbered as shown in Table 1) is fused with 10×GCN4 to recruit the fusion peptide of scFV-P65-HSF1, thereby obtaining a dCas-SPH gene activation tool based on the Cas enzyme provided in this application. The principle of this tool is based on the fact that GCN4 can spontaneously recognize and bind to scFV, thereby enriching the P65 and HSF1 effectors with transcriptional activation functions near the target site of the Cas enzyme, and then activating the gene expression at the target site.
[0214] A transcriptional activation tool and CXCR4-targeting gRNAs (spacer sequences shown in SEQ ID NOs: 156-159, and the corresponding DR region sequence shown in SEQ ID NO: 160; these four gRNAs were mixed in equal proportions) were co-transfected into HEK293T cells. Forty-eight hours after transfection, cells were collected and stained with PE anti-human CXCR4 antibody (BioLegend, 306506). The fluorescence intensity of the PE channel was detected by flow cytometry to reflect the expression intensity of CXCR4. The mean fluorescence intensity (MFI) of PE in the transfection-positive population was divided by the MFI of PE in the transfection-negative population to obtain the CXCR4 activation fold change, which represents the activation efficiency of the different activation tools (Figure 27), and thus the DNA binding effect of the different tools.
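The fold-change calculation described above is a ratio of two mean fluorescence intensities. The following is a minimal sketch, assuming per-cell PE intensities are already available as lists of numbers; the function name `activation_fold_change` and the values in the usage block are hypothetical placeholders, not measured data.

```python
from statistics import mean


def activation_fold_change(pe_transfected, pe_untransfected):
    """Mean PE intensity of transfected cells divided by that of untransfected cells."""
    background = mean(pe_untransfected)
    if background <= 0:
        raise ValueError("mean intensity of the untransfected population must be positive")
    return mean(pe_transfected) / background


if __name__ == "__main__":
    # Hypothetical per-cell PE intensities, for illustration only.
    transfected = [200.0, 400.0]
    untransfected = [50.0, 100.0]
    print(activation_fold_change(transfected, untransfected))
```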
[0215] The amino acid sequence of the scFV-P65-HSF1 fusion peptide is shown in SEQ ID NO:161 (italics represent NLS, bold italics represent P65 and HSF1, bold represents scFV, italic underline represents HA tag, underline represents linker peptide, and <> represents Flag marker):
[0216] The amino acid sequence of the dCas-10×GCN4 fusion peptide is shown below (italics represent NLS, the bold dCas portion is selected from any amino acid sequence in SEQ ID NO:36-79, and <> represents GCN4):
[0217] The fusion peptides scFV-P65-HSF1 and dCas-10×GCN4 of each tool can be co-expressed by the same plasmid sequence.
[0218] (2) The aforementioned transcription activation tool and gRNA targeting the HBB gene (spacer sequences shown in SEQ ID NOs:162-164, and the corresponding DR region sequence shown in SEQ ID NO:165; the three gRNAs were mixed in equal proportions) were transfected into genetically modified HEK293T cells. In the 293T genome, a p2a-GFP sequence was knocked into the HBB gene before the stop codon using Cas9. The activation tool binds to the HBB gRNA and targets the HBB gene promoter region, activating HBB gene expression, which is represented by GFP. A higher GFP activation ratio indicates stronger binding ability. Forty-eight hours after transfection, GFP fluorescence was detected by flow cytometry. The proportion of GFP-expressing cells in the positive transfection population represented the DNA binding efficiency of different mutants (Figure 28).
[0219] The transcription activation tool and the gRNA targeting the reporter system shown in Figure 29 (spacer sequence shown in SEQ ID NO:162, and the corresponding DR region sequence shown in SEQ ID NO:165) were transfected into genetically modified HEK293T cells. The 293T genome contains a reporter system that has low background expression and efficiently expresses GFP only after specific targeted activation. The activation tool binds to the specifically targeted gRNA and targets the reporter system promoter region, activating GFP. A higher GFP activation rate indicates stronger binding ability. Forty-eight hours after transfection, GFP fluorescence was detected by flow cytometry. The proportion of GFP-expressing cells in the transfection-positive population represented the DNA binding efficiency of different mutants. The results are shown in Figure 30. The results indicate that the different mutants all exhibited good DNA binding efficiency.
[0220] In this example, the nucleotide sequence encoding the reporter system is shown in SEQ ID NO:382 (bold text represents the green fluorescent protein sequence, bold underline represents the sequence that is reverse complementary to the gRNA targeting sequence (SEQ ID NO:381), and italic underline represents the minimal promoter unit):
[0221] The amino acid sequence of the scFV-P65-HSF1 fusion peptide is shown in SEQ ID NO:161 (italics represent NLS, bold italics represent P65 and HSF1, bold represents scFV, italic underline represents HA tag, underline represents linker peptide, and <> represents Flag marker):
[0222] The amino acid sequence of the dCas-10×GCN4 fusion peptide is shown below (italics represent NLS, the bold dCas portion is selected from any amino acid sequence in SEQ ID NO:36-79, and <> represents GCN4):
[0223] The fusion peptides scFV-P65-HSF1 and dCas-10×GCN4 of each tool can be co-expressed by the same plasmid sequence.
[0224] Example 3: Determination of the cleavage activity of the Cas enzyme variant in this application at endogenous sites in eukaryotic cells.
[0225] Different variants of the Cas enzyme provided in this application and their corresponding gRNA plasmids (the DR region sequence and spacer sequence of the gRNA are shown in SEQ ID NO:153 and 380, respectively, and the target site is BCL11A) were co-transfected into HEK293T cells. After 48 hours of transfection, transfected positive cells were sorted by flow cytometry, and the cell genome was extracted and the target site was amplified for high-throughput sequencing. The proportion of insertions and deletions (InDel) in different experimental groups was determined by analyzing the sequencing results. The proportion of InDel reflects the activity of different variants.
[0226] Figures 31-33 show the test results for different variants of the reference sequence SEQ ID NO:4 (EpiCas059 shown in Table 1) (variant numbers are shown in Table 1). The results indicate that the Cas enzymes of these different variants all exhibit good cleavage activity.
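The InDel proportion described in this example is the share of sequenced reads at the target site that carry an insertion or deletion. This application does not specify the analysis pipeline, so the following is only a minimal sketch under one assumption: that reads have been aligned to the reference amplicon and that each alignment is summarized by a SAM-format CIGAR string, in which the operations I and D denote insertions and deletions. The function names and the CIGAR strings in the usage block are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM format).
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_indel(cigar):
    """Return True if the alignment contains an insertion (I) or deletion (D)."""
    return any(op in "ID" for _, op in _CIGAR_OP.findall(cigar))


def indel_proportion(cigars):
    """Fraction of aligned reads whose CIGAR string contains an I or D operation."""
    if not cigars:
        raise ValueError("no reads supplied")
    return sum(has_indel(cigar) for cigar in cigars) / len(cigars)


if __name__ == "__main__":
    # Hypothetical alignments, for illustration only.
    reads = ["150M", "70M2D78M", "60M3I87M", "150M"]
    print(indel_proportion(reads))
```

A practical analysis would likely also restrict counting to a window around the expected cleavage site and subtract the background seen in an untreated control; neither step is modeled here.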
[0227] Example 4 Mutation Site Analysis
[0228] Protein structures were generated using the AlphaFold3 program with default parameters (https://alphafoldserver.com/).
[0229] Figure 34 shows the protein structure of EpiCas057, with V180, I288, and S447 sites highlighted in red. These three highly effective mutation sites, although located far apart in the sequence of the EpiCas057 protein, are spatially close in structure, which explains why mutations at these three sites can efficiently enhance the activity of EpiCas057.
[0230] Figure 35 shows the protein structure of EpiCas059, with the M434, K443, and T458 sites highlighted in red. These three highly efficient mutation sites are close in sequence and located in the same structural region. Mutations in these sites can efficiently enhance the activity of EpiCas059.
[0231] The above experiments demonstrate that the different Cas enzyme variants provided in this application are suitable for epigenetic modification and editing applications, and are also suitable for, without limitation, other DNA-targeted applications such as base editing, gene activation, gene silencing, and chromosome imaging.
Claims
1. A Cas enzyme whose amino acid sequence comprises a sequence that is at least about 80% identical to the amino acid sequence set forth in any one of SEQ ID NOs: 1-4, or which is based on the amino acid sequence set forth in any one of SEQ ID NOs: 1-4 with one or more amino acid substitutions or deletions, respectively.
2. The Cas enzyme of claim 1, wherein the one or more amino acid substitutions comprise: 1) amino acid substitutions at one or more of positions 28, 83, 148, 206, 212, 265, 268, 277, 322, 336, 362, 375, 386, 402, 420, 560, 602, 627, 743, 756, 793, 802, 859, 860, 891, 901, 1004, 1053, 1079, 1088, 1108, 1122, 1162, 1212, 1242, and 1255, based on the amino acid sequence set forth in SEQ ID NO: 1; or 2) amino acid substitutions at one or more of positions 6, 35, 39, 130, 149, 169, 172, 178, 252, 266, 285, 306, 309, 342, 378, 388, 395, 399, 404, 411, 421, 434, 438, 476, 506, 520, 572, 602, 611, 612, 626, 674, 679, 716, 729, 767, 930, 939, 1017, 1072, 1087, 1103, 1104, and 1242, based on the amino acid sequence set forth in SEQ ID NO: 2; or 3) amino acid substitutions or deletions at one or more of positions 7, 82, 165, 167, 180, 194, 224, 226, 267, 288, 323, 361, 446, 447, 451, 458, 484, 510, 532, 537, 561, 569, 571, 604, 605, 693, 891, 899, 915, 916, and 1001, based on the amino acid sequence set forth in SEQ ID NO: 3; or 4) an amino acid substitution at one or more of positions 4, 6, 9, 11, 15, 17, 23, 24, 28, 53, 65, 72, 81, 103, 110, 122, 123, 133, 136, 144, 153, 160, 164, 165, 172, 183, 184, 191, 196, 202, 211, 213, 219, 224, 227, 229, 231, 233, 235, 237, 243, 259, 260, 262, 268, 274, 275, 281, 286, 297, 326, 328, 331, 332, 338, 343, 352, 361, 363, 365, 367, 375, 382, 383, 387, 402, 418, 420, 429, 434, 441, 443, 456, 458, 462, 466, 477, 485, 494, 519, 520, 527, 528, 531, 579, 582, 591, 608, 612, 615, 627, 633, 646, 664, 666, 684, 698, 699, 704, 707, 708, 711, 713, 715, 718, 720, 724, 731, 733, 749, 754, 766, 786, 804, 818, 820, 842, 851, 864, 865, 881, 890, 901, 907, and 908, based on the amino acid sequence set forth in SEQ ID NO: 4.
3. The Cas enzyme of claim 1 or 2, wherein the substitution of one or more amino acids comprises: 1) one or more of M206I, I756V, D1212N, E148D, G793D, S859R, I1088V, C265Y, L336M, S375R, F1242V, D627Y, K901N, F1255S, D1162G, K83E, V386L, D560E, A1122V, M277L, S802C, T28S, S891N, N362S, D860N, S420N, N1079I, T402I, D602N, E212V, S268G, E743G, L1004M, E322K, R1053I, and K1108Q based on the amino acid sequence set forth in SEQ ID NO: 1; or 2) one or more of S130R, Q169R, A172R, T178R, D252R, T729R, D1017R, T306A, E395K, Y602H, N611K, A1103V, D1104N, F309L, I520L, D679Y, T1072S, D378E, T399S, T411I, E6K, D679K, D939V, E1087K, E1242G, L35M, D149Y, E388K, E404V, K476M, E716K, L342D, I572F, R39S, E285K, L421M, L438M, T506I, I674N, V612M, D626N, F767L, T930S, G266S, and S434N based on the amino acid sequence set forth in SEQ ID NO: 2; or 3) one or more of D7G, Q165W or Q165R or Q165K, N167R, V180K or V180N or V180Q or V180R or V180T or V180W or V180G, S224 deletion or S224R, K226R, V267M, E323R or E323G or E323K, G361R, K446R, F451G, E571R, R605G, N693K, L899R, K915R or K915N or K915W or K915V or K915Y or K915Q or K915G or K915T, G916D, T1001I, D458K, D484K, D510K, D532K, D537K, D561K, D569K, I288T, D891E, S447N, A194V, A82V, and I604L based on the amino acid sequence set forth in SEQ ID NO: 3; or 4) one or more of E361K or E361R, K367R, E338K or E338D or E338R, D15G, K28R, E262K, M434R, K443R, T458K or T458R, A229T, W237R, D418G, D420G, K227N, E235Q, E243K, L268V, S274Y, K275M or K275E, E297K, F343Y or F343S or F343I, P383A, S387R, K429N, L520M, L579R, E731Q, M820I, E441K, E213G, N260D, D402Y, L110M, I133N, I191T, N519K, V365M, V233D, M666I, E749G, S165C, W259R, E612K, D615N, T196I, G494D, Q231R, K907N, L908C, N224S, E851K, H462Y, N466D, A708S or A708H or A708Y or A708R, Q881H, P627L, T818P, S328R, E262G, D15Y, Y184C, A11V, K219Q or K219E, E842K, G528C, E144R or E144K, D183Y, K363R, K582R, T6R or T6C, H202R, T4R, H286R or H286Q, M646D, K527R, A153T, T901R, M382R, M172E or M172C, D804R, A890W, M122W, F485A, M211E or M211R, H65R, D684R, Y711D or Y711P, I713R or I713N, N53R, V103R, E81R, K123R, K9R, S17I, I331F or I331C or I331S, Y754F, T786I, V326G, C608Y, S766N, N864D, N332H, E375K, A531E, K72T, A664Q, K24Q, W699Q or W699T or W699A or W699S or W699I or W699L or W699V or W699H or W699M or W699F or W699R or W699P or W699N or W699E or W699D, E160P, D591W, D136Q, R281M, G733I, T718I, W707L or W707M or W707I or W707A or W707V, Y720A or Y720S or Y720Q or Y720G or Y720T or Y720N, A352V, C704V, Y715V or Y715H, N724T, K698Y or K698H, R865A, P456S, D477A, A164V, G23D, and S633N based on the amino acid sequence set forth in SEQ ID NO: 4.
4. The Cas enzyme of any one of claims 1-3, wherein the one or more substitutions of amino acids comprises one or more substitutions selected from: 1) V180K, I288T, S447N, D891E, and E323K based on the amino acid sequence set forth in SEQ ID NO: 3; or 2) based on the amino acid sequence set forth in SEQ ID NO: 4: T458K, E262K, K443R, M434R, N332H, W237R, E144K, D183Y, G528C, S633N, S328R, I331S, E375K, C608Y, E338K, and D804R.
5. The Cas enzyme of claim 4, comprising one, two, or three substitutions selected from V180K, I288T, and S447N, based on the amino acid sequence set forth in SEQ ID NO: 3, said two substitutions comprising: V180K / I288T, V180K / S447N, or I288T / S447N; or comprising the following five substitutions: V180K / I288T / S447N / D891E / E323K, wherein " / " denotes "and".
6. The Cas enzyme of claim 4, comprising one, two, three, four, or five substitutions selected from T458K, E262K, K443R, M434R, and N332H, based on the amino acid sequence set forth in SEQ ID NO: 4, said two substitutions comprising: T458K / E262K, T458K / K443R, T458K / M434R, T458K / N332H, E262K / K443R, E262K / M434R, E262K / N332H, K443R / M434R, K443R / N332H, or M434R / N332H; said three substitutions comprising: T458K / E262K / K443R, T458K / E262K / M434R, T458K / E262K / N332H, T458K / K443R / M434R, T458K / K443R / N332H, T458K / M434R / N332H, E262K / K443R / M434R, E262K / K443R / N332H, E262K / M434R / N332H, or K443R / M434R / N332H; said four substitutions comprising: T458K / E262K / K443R / M434R, T458K / E262K / K443R / N332H, T458K / E262K / M434R / N332H, T458K / K443R / M434R / N332H, or E262K / K443R / M434R / N332H; or comprising the following six substitutions: T458K / E262K / K443R / M434R / N332H / W237R, wherein " / " denotes "and".
7. The Cas enzyme of any one of claims 1-6, comprising the amino acid sequence set forth in any one of SEQ ID NOs: 5-148, 166-378, 384-392, or 394-398.
8. The Cas enzyme of any one of claims 1-7, having a catalytically active domain capable of binding to a target DNA strand and / or a catalytically active domain capable of cleaving the target DNA strand.
9. The Cas enzyme of any one of claims 1-8, wherein the substitution or deletion of one or more amino acids is located in the catalytically active domain, such that the Cas enzyme has only the activity of binding to a target DNA strand, or has the activity of binding to a target DNA strand and the activity of cleaving the target DNA single strand.
10. A fusion molecule comprising the Cas enzyme of any one of claims 1-9 and one or more heterologous functional domains.
11. The fusion molecule of claim 10, wherein the one or more heterologous functional domains are capable of modulating expression of one or more gene products.
12. The fusion molecule of any one of claims 10-11, wherein the one or more heterologous functional domains are fused directly or indirectly to the Cas enzyme.
13. The fusion molecule of any one of claims 10-12, wherein the one or more heterologous functional domains are selected from the group consisting of a helicase, a nuclease, a helicase-nuclease, a DNA methyltransferase, a DNA hydroxylmethylase, a histone methylase, a histone demethylase, a histone acetyltransferase, a histone deacetylase, a phosphatase, a kinase, a transcriptional (co-)activator, a transcriptional repressor, a DNA binding protein, a DNA structural protein, a marker protein, a reporter protein, a fluorescent protein, a ligand binding protein, a signal peptide, a subcellular localization sequence, an antibody epitope, and an affinity purification tag.
14. The fusion molecule of any one of claims 10-13, wherein the one or more heterologous functional domains have one or more of the following activities: methylase activity, demethylase activity, deaminase activity, transcriptional activation activity, transcriptional repression activity, transcriptional release factor activity, reverse transcriptase activity, histone modification activity, RNA cleavage activity, and nucleic acid binding activity.
15. An engineered, programmable, non-naturally occurring CRISPR-Cas system, the system comprising the Cas enzyme of any one of claims 1-9 or a nucleotide encoding the Cas enzyme; or the system comprising the fusion molecule of any one of claims 10-14 or a nucleotide encoding the fusion molecule, and one or more guide RNAs that target a locus of a nucleic acid molecule encoding one or more gene products in a cell, thereby directing the Cas enzyme or the fusion molecule to bind and / or cleave the locus of the nucleic acid molecule encoding one or more gene products; and, the Cas enzyme or the fusion molecule is present simultaneously or non-simultaneously with the guide RNA.
16. An engineered, non-naturally occurring vector system, the vector system comprising one or more vectors comprising: a) a first regulatory element operably linked to one or more guide RNAs that are capable of hybridizing to a target sequence in a locus of a nucleic acid molecule encoding one or more gene products, and b) a second regulatory element operably linked to the Cas enzyme of any one of claims 1-9 or the fusion molecule of any one of claims 10-14, wherein said component a) and said component b) are located on the same or different vectors of said vector system, and said guide RNA targets in a cell a locus of said nucleic acid molecule encoding one or more gene products, thereby directing said Cas enzyme or said fusion molecule to bind and / or cleave said locus of said nucleic acid molecule encoding one or more gene products; and said Cas enzyme or said fusion molecule is present simultaneously or non-simultaneously with said guide RNA.
17. The system of any one of claims 15-16, wherein expression of said one or more gene products is altered.
18. The system of any one of claims 15-17, wherein expression of said gene product is decreased or increased.
19. The system of any one of claims 15-18, wherein said gene product is a protein.
20. The system of any one of claims 15-19, wherein said cell is a eukaryotic cell.
21. The system of claim 20, wherein said eukaryotic cell is a mammalian cell.
22. The system of claim 21, wherein said mammalian cell is a human cell.
23. The system of any one of claims 15-22, wherein said Cas enzyme is codon-optimized for expression in a eukaryotic cell.
24. The system of any one of claims 15-23, wherein said guide RNA comprises a guide sequence fused to a tracr sequence.
25. The system of any one of claims 15-24, wherein said guide RNA comprises a direct repeat sequence and a spacer sequence, wherein said spacer sequence binds to a nucleic acid molecule targeted by said guide RNA.
26. The system of claim 25, wherein said direct repeat sequence is 10 to 70 nucleotides in length.
27. The system of claim 25 or 26, wherein said direct repeat sequence is 31 to 36 nucleotides in length.
28. The system of claim 25, wherein said spacer sequence is 16 to 24 nucleotides in length.
29. The system of any one of claims 15-28, wherein said nucleic acid molecule targeted by said guide RNA comprises a nucleotide sequence capable of complementary pairing to said spacer sequence.
30. The system of any one of claims 15-29, wherein said vector or said Cas enzyme of said system further comprises one or more nuclear localization sequences (NLS).
31. The system of any one of claims 15-30, wherein said system is introduced into said cell by a delivery system selected from the group consisting of a virion, a liposome, a lipid nanoparticle, electroporation, microinjection, and conjugation.
32. A method of altering expression of one or more gene products, the method comprising introducing into a cell comprising and expressing a nucleic acid molecule encoding the one or more gene products an engineered, non-naturally occurring CRISPR-Cas system comprising a Cas enzyme of any one of claims 1-9 or a fusion molecule of any one of claims 10-14, and one or more guide RNAs that target a locus of the nucleic acid molecule encoding the one or more gene products to direct the Cas enzyme or the fusion molecule to bind and / or cleave the locus, thereby altering expression of the one or more gene products; and the Cas enzyme or the fusion molecule is present simultaneously or non-simultaneously with the guide RNA.
33. A method of altering expression of one or more gene products, the method comprising introducing into a cell comprising and expressing a nucleic acid molecule encoding the one or more gene products an engineered, non-naturally occurring vector system comprising one or more vectors comprising: a) a first regulatory element operably linked to one or more guide RNAs that are capable of hybridizing to a target sequence in a locus of the nucleic acid molecule encoding the one or more gene products, and b) a second regulatory element operably linked to a Cas enzyme of any one of claims 1-9 or a fusion molecule of any one of claims 10-14, wherein the component a) and the component b) are located on the same or different vectors of the vector system, and the guide RNAs target a locus of the nucleic acid molecule encoding the one or more gene products in the cell to direct the Cas enzyme or the fusion molecule to bind and / or cleave the locus, thereby altering expression of the one or more gene products; and the Cas enzyme or the fusion molecule is present simultaneously or non-simultaneously with the guide RNA.
34. The method of claim 32 or 33, wherein the expression of the gene product is decreased or increased.
35. The method of any one of claims 32-34, wherein the gene product is a protein.
36. The method of any one of claims 32-35, wherein the cell is a eukaryotic cell.
37. The method of claim 36, wherein the eukaryotic cell is a mammalian cell.
38. The method of claim 37, wherein the mammalian cell is a human cell.
39. The method of any one of claims 32-38, wherein the Cas enzyme is codon-optimized for expression in a eukaryotic cell.
40. The method of any one of claims 32-39, wherein the guide RNA comprises a guide sequence fused to a tracr sequence.
41. The method of any one of claims 32-40, wherein the guide RNA comprises a direct repeat sequence and a spacer sequence, wherein the spacer sequence binds to a nucleic acid molecule targeted by the guide RNA.
42. The method of claim 41, wherein the direct repeat sequence is 10 to 70 nucleotides in length.
43. The method of claim 41 or 42, wherein the direct repeat sequence is 31 to 36 nucleotides in length.
44. The method of claim 41, wherein the spacer sequence is 16 to 24 nucleotides in length.
45. The method of claim 41 or 44, wherein the nucleic acid molecule targeted by the guide RNA comprises a nucleotide sequence capable of complementary pairing to the spacer sequence.
46. The method of any one of claims 32-45, wherein the vector or the Cas enzyme of the system further comprises one or more nuclear localization sequences (NLS).
47. The method of any one of claims 32-46, wherein the method comprises introducing the CRISPR-Cas system or the vector system into the cell by a delivery system selected from the group consisting of a virion, a liposome, a lipid nanoparticle, electroporation, microinjection, and conjugation.
48. A nucleic acid encoding the Cas enzyme of any one of claims 1-9, the fusion molecule of any one of claims 10-14, or the CRISPR-Cas system of any one of claims 15 and 17-31.
49. A cell comprising the Cas enzyme of any one of claims 1-9, the fusion molecule of any one of claims 10-14, the CRISPR-Cas system of any one of claims 15 and 17-31, the vector system of any one of claims 16-31, and / or the nucleic acid of claim 48.
50. A kit comprising the Cas enzyme of any one of claims 1-9, the fusion molecule of any one of claims 10-14, the CRISPR-Cas system of any one of claims 15 and 17-31, the vector system of any one of claims 16-31, the nucleic acid of claim 48, and / or the cell of claim 49.
51. The kit of claim 50, further comprising a container for placing the Cas enzyme, the fusion molecule, the CRISPR-Cas system, the vector system, the nucleic acid, and / or the cell, and an instruction manual.
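For readers working with the substitution shorthand used in the claims (for example "V180K": wild-type residue, 1-based position, replacement residue), the following Python sketch shows one way such labels might be parsed, applied to a sequence, and combined. It is an illustration only and not part of the claimed subject matter. The function names and the ten-residue toy sequence are hypothetical, and the toy sequence is not any of the SEQ ID NOs referenced above. The sketch covers only simple substitutions and does not handle deletions such as "S224 deletion".

```python
import re
from itertools import combinations

# Shorthand such as "V180K": wild-type residue, 1-based position, new residue.
# Deletions (e.g. "S224 deletion") are intentionally not covered by this sketch.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Split a label like 'V180K' into ('V', 180, 'K')."""
    match = _SUBSTITUTION.match(label.strip())
    if match is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_substitutions(sequence, labels):
    """Return a copy of `sequence` with every labelled substitution applied.

    Raises ValueError if a position is out of range or if the residue at
    that position does not match the wild-type residue named in the label.
    """
    residues = list(sequence)
    for label in labels:
        wild_type, position, replacement = parse_substitution(label)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{label}: found {residues[position - 1]} at position {position}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


def enumerate_combinations(labels, size):
    """List every `size`-member combination, joined with ' / ' as in the claims."""
    return [" / ".join(combo) for combo in combinations(labels, size)]


# The five substitutions named in claim 6 (relative to SEQ ID NO: 4).
CLAIM_6_SET = ["T458K", "E262K", "K443R", "M434R", "N332H"]

# Hypothetical toy sequence, used only to exercise the helper functions.
TOY_SEQUENCE = "MKTAYIAKQR"

if __name__ == "__main__":
    print(apply_substitutions(TOY_SEQUENCE, ["K2R", "A4V"]))
    for size in (2, 3, 4):
        print(size, len(enumerate_combinations(CLAIM_6_SET, size)))
```

Enumerating the two-, three-, and four-member combinations of the five claim 6 substitutions in this way yields 10, 10, and 5 combinations respectively, which matches the number of combinations listed in claim 6.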