A CRISPR-Cas system

CN122249549A · Pending · Publication Date: 2026-06-19 · SHENZHEN HUADA GENE INST

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHENZHEN HUADA GENE INST
Filing Date
2025-03-25
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing CRISPR-Cas systems have problems such as large molecular weight, difficult delivery, limited editing sites, easy off-target effects, and PAM sequence dependence that limits their application scope. In particular, the Cas9 and Cas12a systems have insufficient delivery and targeting in eukaryotic cells.

Method used

A new CRISPR-Cas system has been developed, which contains the Cas protein C2c11 with less than 700 amino acids, which has zinc finger domains and RuvC domains and does not contain HNH domains. It can form a complex with guide RNA, recognize and cut target nucleic acids, and has broad targeting and high editing activity.

Benefits of technology

It achieves small molecular weight, high activity and broad targeting of nucleic acid editing, reduces off-target effects, and expands the application range of the CRISPR-Cas system in eukaryotic cells.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention relates to the field of nucleic acid editing, particularly to clustered regularly interspaced short palindromic repeats (CRISPR) technology. Specifically, this invention relates to a system or composition comprising a Cas effector protein, as well as a vector system, delivery composition, and kit comprising said system or composition. This invention also relates to the use of these systems or compositions, vector systems, delivery compositions, and kits in nucleic acid editing, and to methods for nucleic acid editing, nucleic acid detection, and disease treatment.

Description

A CRISPR-Cas system

Technical Field

[0001] The present invention relates to the field of nucleic acid editing, in particular to nucleic acid editing based on clustered regularly interspaced short palindromic repeats (CRISPR). Specifically, the present invention relates to a system or composition comprising a Cas effector protein, and a vector system, a delivery composition, and a kit comprising the system or composition. The present invention also relates to the use of these systems or compositions, vector systems, delivery compositions, and kits in nucleic acid editing, as well as methods for nucleic acid editing, nucleic acid detection, and disease treatment.

Background Art

[0002] The CRISPR (clustered regularly interspaced short palindromic repeats) system is an adaptive immune system in prokaryotes, possessing RNA-guided endonuclease activity. The earliest discovered gene editing system was the Cas9 system, mediated by crRNA and tracrRNA (trans-activating crRNA). This system uses certain Cas proteins, primarily Cas1 and Cas2, to capture bacteriophage viral DNA or exogenous plasmid DNA and insert it into the bacteria's native direct repeat sequences, forming a CRISPR sequence. The CRISPR sequence is transcribed into pre-crRNA, which is then processed and modified into crRNA. The crRNA and tracrRNA combine through local base pairing and form a ribonucleoprotein (RNP) with the Cas9 nuclease, which possesses RNA-guided DNA endonuclease activity. When the bacteria are reinfected with the virus, the invading DNA is specifically recognized and cleaved by the RNP, a process that requires the presence of a specific protospacer-adjacent motif (PAM). The CRISPR-Cas system has been widely used in gene editing due to its RNA-guided endonuclease activity.

[0003] CRISPR-Cas systems are divided into two classes, Class 1 and Class 2, according to whether the effector involved in the defense process is a multi-subunit complex or a single protein. Class 1 CRISPR-Cas systems are categorized into Types I, III, and IV based on their effector proteins, and use effector complexes formed by multiple Cas proteins to cleave foreign nucleic acids. Class 2 systems are divided into Types II, V, and VI. Their most notable characteristic is that a single Cas (CRISPR-associated) protein forms a complex with crRNA (CRISPR RNA) to perform targeted cleavage, which makes them simpler to use. Although Class 2 systems use only a single effector protein, the different types of effector proteins identified vary significantly in molecular weight, domain structure, crRNA, PAM sequence preference, and nucleic acid cleavage pattern, providing greater flexibility for gene editing. Type II and type V CRISPR-Cas systems encode Cas1, Cas2, and several accessory proteins, such as Cas4. Type VI CRISPR-Cas systems consist solely of the CRISPR array and an effector protein containing two ribonuclease domains of the HEPN superfamily. Furthermore, a notable feature of type II and type V CRISPR-Cas systems is the presence of RuvC-like nuclease domains in their effector proteins. In the type II effector protein Cas9, the RuvC domain contains an inserted HNH nuclease domain; Cas9 preferentially recognizes guanine-rich PAM sequences and, under RNA guidance, cleaves phosphodiester bonds to produce blunt ends. In contrast, the various subtypes of type V effector proteins contain only RuvC-like domains and lack HNH nuclease domains; they preferentially recognize thymine-rich PAM sequences and cleave phosphodiester bonds to produce sticky ends.

[0004] CRISPR technology has developed rapidly, enabling gene editing in bacteria, archaea, and eukaryotic cells. However, it has also exposed numerous challenges. The currently widely used Cas9 and Cas12a gene editing systems suffer from limitations such as large molecular weight, difficulty in delivery, limited editing sites, and susceptibility to off-target effects. Cas9 and Cas12a are typically larger than 1200 amino acids, whereas the genome of the widely used adeno-associated virus (AAV) vector is only approximately 4.7 kb, so its packaging capacity is limited; inserting an excessively long exogenous sequence may exceed that capacity, which poses a significant challenge for AAV delivery. Furthermore, CRISPR-Cas systems rely heavily on the presence of a PAM sequence for target sequence recognition, which significantly restricts their application in cell editing and nucleic acid detection. Currently commercialized Cas proteins include SpCas9, which is restricted to the 3'-terminal PAM NGG, and LbCas12a / AsCas12a, which are restricted to the 5'-terminal PAM TTTN. Although the specificity of currently engineered CRISPR-Cas systems has been significantly improved, their PAM requirements still limit their application. In addition, off-target effects caused by low specificity, large-fragment deletions, and complex gene recombination events also require attention.
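The PAM restriction described above can be illustrated with a short Python sketch that lists candidate target sites for a 3'-terminal PAM (NGG, as for SpCas9) or a 5'-terminal PAM (TTTN, as for LbCas12a / AsCas12a). The function name, the 20 nt protospacer length, and the toy sequence are assumptions made for illustration; only the given strand is scanned, and the PAM preference of C2c11 itself is not modeled.

```python
import re

# IUPAC codes needed for the two PAMs quoted in the text (NGG and TTTN).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}

def find_protospacers(seq, pam, pam_side, length=20):
    """Return (0-based start, protospacer) pairs on the given strand.

    pam_side is "3prime" when the PAM follows the protospacer (NGG style)
    or "5prime" when it precedes it (TTTN style). The default length of
    20 nt is an illustrative assumption, not a value from the patent.
    """
    seq = seq.upper()
    pam_pattern = "".join(IUPAC[base] for base in pam.upper())
    spacer = "([ACGT]{%d})" % length
    # A lookahead is used so that overlapping candidate sites are all reported.
    if pam_side == "3prime":
        pattern = "(?=" + spacer + pam_pattern + ")"
    elif pam_side == "5prime":
        pattern = "(?=" + pam_pattern + spacer + ")"
    else:
        raise ValueError("pam_side must be '3prime' or '5prime'")
    return [(m.start(1), m.group(1)) for m in re.finditer(pattern, seq)]

# Toy sequence (hypothetical): TTTA + 20 nt protospacer + AGG.
toy = "TTTA" + "ACGT" * 5 + "AGG"
print(find_protospacers(toy, "NGG", "3prime"))   # one site, starting at index 4
print(find_protospacers(toy, "TTTN", "5prime"))  # the same site, via the 5' PAM
```

A sequence with no GG dinucleotide yields no NGG-restricted sites at all, which is the kind of targeting gap the paragraph refers to.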

[0005] Recently, several small type V systems, including CRISPR-Cas12f, Cas12m, and Cas12n, have attracted considerable attention due to their extremely small Cas protein size and high specificity. Studies have found that these proteins, guided by the corresponding crRNA and tracrRNA, can achieve dsDNA recognition and targeted cleavage in eukaryotic cells. However, the editing activity of the small Cas12 systems reported so far is relatively low. Moreover, their high PAM recognition specificity results in a narrow range of target sequences, limiting their further application. Therefore, new, smaller CRISPR-Cas systems with higher activity and a wider targeting range are urgently needed. Given the drawbacks of currently available CRISPR-Cas systems, the development of a more robust CRISPR-Cas system that combines these advantages is of great significance to gene editing technology.

Summary of the Invention

[0006] After extensive experimentation and repeated exploration, the inventors of this application unexpectedly discovered a new class of nucleases with smaller molecular size and higher editing activity. Based on this discovery, the inventors developed a new CRISPR-Cas system and a gene editing method based on this system.

[0007] System or combination

[0008] The present application identifies a new class of Cas proteins, which can be distinguished from other classes of Cas proteins by protein size, primary structure (e.g., motif), secondary structure (e.g., α-helical structure, β-sheet structure), functional domain (e.g., containing a zinc finger domain, containing a RuvC domain, and not containing an HNH domain) and / or biological activity.

[0009] Thus, in a first aspect, the present invention provides a system or composition comprising:

[0010] a) Cas protein C2c11 or a nucleic acid molecule A1 encoding the Cas protein C2c11, wherein the Cas protein C2c11 is capable of cleaving or fragmenting a target nucleic acid; and the Cas protein C2c11 is less than 700 amino acids (e.g., 300-700 amino acids, 350-650 amino acids, 400-600 amino acids, 450-550 amino acids); wherein the Cas protein C2c11:

[0011] (i) comprises at least one (e.g., 1, 2, 3 or more) zinc finger domain capable of recognizing, binding and/or cleaving a target nucleic acid, wherein the zinc finger domain comprises an α-helical structure and a β-sheet structure and does not contain the motif Cys-X1-X2-Cys, wherein X1 and X2 are each independently selected from any amino acid;

[0012] (ii) comprises at least one RuvC domain capable of binding to and / or cleaving a target nucleic acid; and

[0013] (iii) does not contain an HNH domain; and

[0014] b) one or more guide RNAs or nucleic acid molecules B1 encoding the guide RNAs, wherein the guide RNAs are capable of forming a complex with the Cas protein C2c11, and under conditions that allow nucleic acid hybridization or annealing, the guide RNAs are capable of guiding the complex to hybridize or anneal to the target nucleic acid.
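The size limit in item a) can be related to the AAV packaging constraint raised in the Background with simple arithmetic: a protein of n amino acids needs 3 × (n + 1) bp of coding sequence, counting one stop codon. The sketch below assumes the approximately 4.7 kb figure quoted in the Background as the capacity and deliberately ignores the promoter, guide RNA cassette, polyadenylation signal, and ITRs, all of which also consume capacity in a real vector. The function names are illustrative.

```python
AAV_CAPACITY_BP = 4700  # approximate AAV genome size quoted in the Background

def coding_length_bp(n_amino_acids, include_stop=True):
    """Length of the open reading frame in base pairs (3 bp per codon)."""
    return 3 * (n_amino_acids + (1 if include_stop else 0))

def remaining_capacity_bp(n_amino_acids, capacity_bp=AAV_CAPACITY_BP):
    """Capacity left for regulatory elements and the guide RNA cassette."""
    return capacity_bp - coding_length_bp(n_amino_acids)

for size in (1200, 700):
    print(size, coding_length_bp(size), remaining_capacity_bp(size))
# 1200 aa -> 3603 bp coding, 1097 bp left
#  700 aa -> 2103 bp coding, 2597 bp left
```

Under these assumptions a 700-amino-acid effector leaves more than twice as much room for the remaining expression elements as a 1200-amino-acid one.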

[0015] In certain embodiments, when the target nucleic acid is a double-stranded target nucleic acid, the RuvC domain is capable of unwinding the double-stranded target nucleic acid (e.g., DNA double strand, DNA-RNA double strand) into a single-stranded target nucleic acid.

[0016] Zinc finger domain of Cas protein C2c11

[0017] As used herein, the term "zinc finger domain" refers to a domain involved in the recognition, binding and / or cleavage of target nucleic acids by the Cas protein C2c11 of the present application, which can recognize, bind and / or cut DNA and / or RNA target nucleic acids. In certain embodiments, the spatial position of the zinc finger domain in the Cas protein C2c11 is identical or similar to its spatial position in the proximal family protein (e.g., Type V-U Cas protein) of the Cas protein C2c11. In certain embodiments, in the domain contained in the Cas protein C2c11, the zinc finger domain is adjacent to the RuvC domain.

[0018] At present, the zinc finger domain contained in the Cas protein of less than 700 amino acids known in the art contains only a β-sheet structure, and most of them contain the motif Cys-X1-X2-Cys (e.g., Cys-His-His-Cys), wherein X1 and X2 are each independently selected from any amino acid. The Cas protein C2c11 in the system or composition provided herein is less than 700 amino acids, and the zinc finger domain contained therein contains both an α-helix structure (α-helix) and a β-sheet structure (β-sheet), and does not contain the motif Cys-X1-X2-Cys, wherein X1 and X2 are each independently selected from any amino acid.

[0019] In the new class of Cas proteins C2c11 provided in the present application, the amino acid sequences of the same domain contained in different proteins may be the same or different.

[0020] In certain exemplary embodiments, the zinc finger domain contained in the Cas protein C2c11 in the system or composition provided herein contains the amino acid sequence Y1-X1-X2-Y2, wherein X1 and X2 are each independently selected from any amino acid, Y1 and Y2 are each independently selected from amino acids other than Cys, and Y1 and Y2 can be the same or different amino acids.

[0021] In certain exemplary embodiments, the zinc finger domain contained in the Cas protein C2c11 in the system or composition provided herein contains the amino acid sequence Gln-X1-X2-Asn, wherein X1 and X2 are each independently selected from any amino acid.

[0022] In certain exemplary embodiments, the zinc finger domain contained in the Cas protein C2c11 in the system or composition provided herein contains the amino acid sequence His-X1-X2-Gly, wherein X1 and X2 are each independently selected from any amino acid.
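The four-residue motifs named above (Cys-X1-X2-Cys, Gln-X1-X2-Asn, His-X1-X2-Gly) translate directly into one-letter regular expressions. The following Python sketch reports where each motif occurs in a zinc finger domain sequence; the helper name and the toy sequence are hypothetical, and the general Y1-X1-X2-Y2 form is omitted because it matches almost any four-residue window without a terminal Cys.

```python
import re

# One-letter patterns; "." stands for X1 or X2 (any amino acid).
MOTIFS = {
    "Cys-X1-X2-Cys": "C..C",
    "Gln-X1-X2-Asn": "Q..N",
    "His-X1-X2-Gly": "H..G",
}

def motif_positions(domain_seq):
    """Map each motif name to the 1-based start positions where it occurs."""
    domain_seq = domain_seq.upper()
    found = {}
    for name, pattern in MOTIFS.items():
        # Lookahead keeps overlapping occurrences.
        found[name] = [m.start() + 1
                       for m in re.finditer("(?=" + pattern + ")", domain_seq)]
    return found

def lacks_cxxc(domain_seq):
    """True when the domain contains no Cys-X1-X2-Cys motif."""
    return not motif_positions(domain_seq)["Cys-X1-X2-Cys"]

toy_domain = "MKQAVNLSHTEGR"  # hypothetical sequence, not taken from the patent
print(motif_positions(toy_domain))
print(lacks_cxxc(toy_domain))
```

Such a check only screens the primary sequence; whether a region actually folds with both an α-helix and a β-sheet requires structure prediction or experimental data.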

[0023] In certain embodiments, the zinc finger domain of the Cas protein C2c11 in the system or composition provided herein contains the conserved amino acids Ser and Arg. In certain embodiments, the conserved amino acid Ser (S) is located at position 425 of the Cas protein C2c11 corresponding to SEQ ID NO: 18. In certain embodiments, the conserved amino acid Arg (R) is located at position 442 of the Cas protein C2c11 corresponding to SEQ ID NO: 18.

[0024] Other functional domains of Cas protein C2c11

[0025] In the identified active Cas proteins (e.g., Cas12a, Cas12f), multiple functional domains (e.g., REC domain, WED domain, RuvC domain, ZNF domain) have been identified.

[0026] In certain embodiments, after the Cas protein C2c11 binds to the target nucleic acid through the zinc finger domain, other domains of the Cas protein C2c11 (e.g., RuvC domain) are responsible for cutting and / or breaking the target nucleic acid. In certain embodiments, after the Cas protein C2c11 binds to the target nucleic acid, the RuvC domain is responsible for cutting the target sequence and the complementary chain of the target sequence. In certain embodiments, after the Cas protein C2c11 recognizes the double-stranded target nucleic acid, the RuvC domain is responsible for unwinding the double-stranded target nucleic acid into a single-stranded target nucleic acid. In certain embodiments, the RuvC domain is capable of cutting or breaking the phosphodiester bond in the target nucleic acid. In certain embodiments, after the RuvC domain cuts or breaks the target nucleic acid, a target nucleic acid with a sticky end is generated.

[0027] In certain embodiments, the Cas protein C2c11 comprises a REC domain. In certain embodiments, the REC domain is capable of participating in the recognition of the Cas protein C2c11 and the PAM. In certain embodiments, the REC domain comprises one or more (e.g., one, two, three, four, five or more) α-helical structures.

[0028] In certain embodiments, the Cas protein C2c11 comprises a WED domain. In certain embodiments, the WED domain can participate in the recognition of the Cas protein C2c11 and PAM, and / or the processing of crRNA (e.g., processing pre-crRNA into mature crRNA).

[0029] In certain embodiments, the Cas protein C2c11 has cis-cleavage activity for cleaving a target nucleic acid (e.g., a double-stranded target nucleic acid, a single-stranded target nucleic acid).

[0030] In certain embodiments, after the Cas protein C2c11 of the present application forms a complex with the guide RNA, it is guided by the guide RNA and hybridizes or anneals to the target nucleic acid, and then the complex cuts one or both chains of the target nucleic acid.

[0031] In certain embodiments, the Cas protein C2c11 has a trans-cleavage activity that can be activated by a target nucleic acid. In certain embodiments, the Cas protein C2c11 can cut single-stranded nucleic acids (e.g., ssDNA, ssRNA) after being activated by a target nucleic acid.

[0032] In certain embodiments, the Cas protein C2c11 of the present application has a trans-cleavage activity that can be activated by a target nucleic acid. In certain embodiments, both double-stranded and single-stranded target nucleic acids can activate the trans-cleavage activity of the Cas protein C2c11. In certain embodiments, when the Cas protein C2c11 forms a complex with the guide RNA and hybridizes or anneals with the target nucleic acid, the complex is activated and has a trans-cleavage activity against non-target single-stranded DNA sequences; that is, the complex can indiscriminately cleave single-stranded DNA of any sequence in the reaction system. In certain embodiments, when a fluorescently labeled single-stranded DNA probe is present in the reaction system, the presence of the target nucleic acid can be determined by detecting the fluorescent signal of the single-stranded DNA probe, thereby achieving detection of the target nucleic acid.

[0033] In certain embodiments, the amino acid residue at position 250 of the Cas protein C2c11 corresponding to SEQ ID NO: 18 is D. In certain embodiments, the amino acid residue at position 358 of the Cas protein C2c11 corresponding to SEQ ID NO: 18 is E. In certain embodiments, the amino acid residue at position 425 of the Cas protein C2c11 corresponding to SEQ ID NO: 18 is S. In certain embodiments, the amino acid residue at position 442 of the Cas protein C2c11 corresponding to SEQ ID NO: 18 is R. In certain embodiments, the amino acid residue at position 455 of the Cas protein C2c11 corresponding to SEQ ID NO: 18 is D. In certain embodiments, the amino acid residue at position 460 of the Cas protein C2c11 corresponding to SEQ ID NO: 18 is F. In certain embodiments, the amino acid residue at position 484 of the Cas protein C2c11 corresponding to SEQ ID NO: 18 is F (Figure 8).

[0034] In certain embodiments, the zinc finger domain contained in the Cas protein C2c11 contains the motif Y1-X1-X2-Y2, wherein X1 and X2 are each independently selected from any amino acid, Y1 and Y2 are each independently selected from amino acids other than Cys, and Y1 and Y2 can be the same or different amino acids; for example, containing the motif Gln-X1-X2-Asn, or containing the motif His-X1-X2-Gly.

[0035] In certain embodiments, the amino acid residues at positions 250, 358, 455, and 460 are located in a RuvC domain. In certain embodiments, the amino acid residues at positions 425 and 442 are located in a zinc finger domain.
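The conserved positions listed in paragraphs [0033] and [0035] can be collected into a small lookup table and checked against a sequence. The sketch below is a minimal illustration in Python: it assumes the input is already numbered in SEQ ID NO: 18 coordinates (for a different homolog, positions would first have to be mapped through an alignment, which is not shown), and the test sequence is a synthetic placeholder, not the actual SEQ ID NO: 18.

```python
# position (1-based, SEQ ID NO: 18 numbering) -> (expected residue, domain)
# The domain of position 484 is not stated in the text, hence None.
CONSERVED = {
    250: ("D", "RuvC"),
    358: ("E", "RuvC"),
    425: ("S", "zinc finger"),
    442: ("R", "zinc finger"),
    455: ("D", "RuvC"),
    460: ("F", "RuvC"),
    484: ("F", None),
}

def check_conserved(seq):
    """Return {position: True/False} for each conserved position."""
    report = {}
    for pos, (expected, _domain) in CONSERVED.items():
        observed = seq[pos - 1] if pos <= len(seq) else None
        report[pos] = observed == expected
    return report

# Synthetic placeholder: 500 alanines with the conserved residues filled in.
residues = ["A"] * 500
for pos, (aa, _domain) in CONSERVED.items():
    residues[pos - 1] = aa
placeholder = "".join(residues)

print(all(check_conserved(placeholder).values()))
```

A `False` entry in the report flags a position where the candidate differs from the residue named in the text.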

[0036] In certain embodiments, the Cas protein C2c11 comprises a WED domain, a REC domain, a RuvC domain, and a zinc finger domain.

[0037] In certain embodiments, among the domains comprised by the Cas protein C2c11, the zinc finger domain is adjacent to the RuvC domain.

[0038] In certain embodiments, the WED domain comprises a WED-I subdomain and / or a WED-II subdomain. In certain embodiments, the WED domain consists of a WED-I subdomain and a WED-II subdomain.

[0039] In certain embodiments, the RuvC domain comprises a RuvC-I subdomain, a RuvC-II subdomain and / or a RuvC-III subdomain. In certain embodiments, the RuvC domain consists of a RuvC-I subdomain, a RuvC-II subdomain and a RuvC-III subdomain.

[0040] In certain embodiments, the zinc finger domain comprises a zinc finger domain-I subdomain and / or a zinc finger domain-II subdomain. In certain embodiments, the zinc finger domain consists of a zinc finger domain-I subdomain and a zinc finger domain-II subdomain.

[0041] In certain embodiments, the Cas protein C2c11 comprises the amino acid sequence of the following domains and / or subdomains from N-terminus to C-terminus: RuvC-I subdomain, RuvC-II subdomain, zinc finger domain-I subdomain, RuvC-III subdomain and zinc finger domain-II subdomain.

[0042] In certain embodiments, the Cas protein C2c11 comprises the amino acid sequence of the following domains and / or subdomains from N-terminus to C-terminus in sequence: WED-I subdomain, REC domain, WED-II subdomain, RuvC-I subdomain, RuvC-II subdomain, zinc finger domain-I subdomain, RuvC-III subdomain and zinc finger domain-II subdomain. In certain embodiments, the Cas protein C2c11 is a natural protein obtained from an organism (e.g., a microorganism) or a homolog thereof, or is a protein comprising one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more amino acids) amino acid substitutions, deletions and / or additions based on the natural protein. In certain embodiments, the protein retains the secondary structure (e.g., as defined above) and / or biological activity (e.g., as defined above) of the natural protein from which it is derived.
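The N-terminus to C-terminus domain order given in paragraph [0042] can be expressed as an ordered list and compared with the domain annotations of a candidate protein. The Python sketch below assumes annotations are supplied as (name, start, end) tuples with 1-based inclusive coordinates; the abbreviation "ZF" for the zinc finger subdomains and all coordinates in the example are hypothetical.

```python
EXPECTED_ORDER = [
    "WED-I", "REC", "WED-II",
    "RuvC-I", "RuvC-II", "ZF-I", "RuvC-III", "ZF-II",
]

def has_expected_architecture(annotations, expected=EXPECTED_ORDER):
    """Check that non-overlapping domains occur in the expected N-to-C order.

    annotations: iterable of (name, start, end), 1-based inclusive.
    """
    ordered = sorted(annotations, key=lambda a: a[1])
    # Reject overlapping annotations, which make the order ambiguous.
    for (_, _, prev_end), (_, next_start, _) in zip(ordered, ordered[1:]):
        if next_start <= prev_end:
            return False
    return [name for name, _, _ in ordered] == list(expected)

# Hypothetical coordinates for illustration only.
example = [
    ("WED-I", 1, 20), ("REC", 21, 180), ("WED-II", 181, 240),
    ("RuvC-I", 241, 300), ("RuvC-II", 340, 400), ("ZF-I", 410, 450),
    ("RuvC-III", 451, 470), ("ZF-II", 475, 500),
]
print(has_expected_architecture(example))
```

The shorter order in paragraph [0041] can be checked with the same function by passing `expected=EXPECTED_ORDER[3:]` together with only the RuvC and zinc finger annotations.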

[0043] In certain embodiments, the microorganism is selected from bacteria, viruses (e.g., bacteriophages), or any combination thereof.

[0044] In certain embodiments, the bacteria is selected from the group consisting of Corynebacterium, Schutella, Legionella, Treponema, Filamentosa, Eubacterium, Streptococcus, Lactobacillus, Mycoplasma, Bacteroides, Flavobacterium, Flavobacterium, Glomerella, Azospirillum, Gluconacetobacter, Neisseria, Roseburia, Microbacterium, Staphylococcus, Nitrobacter, Campylobacter, Carnegiea, Rhodobacter, Listeria, Parodia, Clostridium, Lachnospiraceae, Leptotrichia, Francisella, Alicyclobacillus, Methanophilus, Porphyromonas, Prevotella, Bacillus, Helicococcus, Spirochaete, Desulfovibrio, Desulfovibrio, Potassium Fungus, Truffle, Oscillatory Spirospira, Eubacterium, Ruminococcus, Lachnospira, Acidaminococcus, or any combination thereof.

[0045] In certain embodiments, the microorganism is selected from Prevotella, Bacteroides, Bacillus, Oscillatory Spirulina, Eubacterium, Ruminococcus, Clostridium (e.g., Clostridium butyricum), Lachnospira, Eubacterium, or any combination thereof.

[0046] In certain embodiments, the microorganism is selected from a bacteriophage (e.g., a tailed phage).

[0047] In certain embodiments, the microorganism is selected from Prevotellaceae bacterium, Bacteroidales bacterium, Bacilli bacterium, Oscillospiraceae bacterium, Eubacteriales bacterium, Clostridiales bacterium, Ruminococcus bacterium, Clostridia bacterium, Caudoviricetes, Lachnospira sp., Blautia sp., Ruminococcus bromii, Butyricicoccus intestinisimiae, or any combination thereof. In certain embodiments, the microorganism is a cultured or uncultured microorganism.

[0048] In certain embodiments, the Cas protein C2c11 has an amino acid sequence as shown in any one of SEQ ID NOs: 1-30, or a sequence having one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acids) amino acid substitutions, deletions and / or additions compared thereto; wherein the sequence substantially retains the secondary structure (e.g., as defined above) and / or biological activity (e.g., as defined above) of the sequence from which it is derived.

[0049] In certain embodiments, the Cas protein C2c11 has an amino acid sequence as shown in any one of SEQ ID NOs: 1-30, or has at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity compared thereto, and substantially retains the secondary structure (e.g., as defined above) and / or biological activity (e.g., as defined above) of the sequence from which it is derived.
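The sequence identity thresholds above depend on how identity is computed. As a minimal sketch, the Python function below takes two sequences that have already been aligned (equal length, with "-" marking gaps) and reports identical columns as a percentage of all alignment columns. This denominator is one common convention and is an assumption here; the patent text does not state which alignment program or identity definition is intended.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of a pairwise alignment.

    Both inputs must be pre-aligned strings of equal length using '-' for gaps.
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)

def meets_threshold(aligned_a, aligned_b, threshold):
    """True when identity is at least the given percentage."""
    return percent_identity(aligned_a, aligned_b) >= threshold

# Toy alignment (hypothetical): 5 identical columns out of 7.
print(round(percent_identity("MKT-AYL", "MKTQAFL"), 1))  # 71.4
```

With this toy alignment the pair would satisfy an "at least 70%" limit but not an "at least 80%" limit.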

[0050] In certain embodiments, the nucleotide sequence of the nucleic acid molecule A1 encoding the amino acid sequence of the Cas protein C2c11 is codon-optimized for expression in prokaryotic and/or eukaryotic cells. The eukaryotic cells can be cells of mammals (including primates), including but not limited to humans, mice, rats, rabbits, and dogs.

[0051] As used herein, the term "codon optimized" refers to a method of enhancing expression of a sequence of interest in a host cell by replacing at least one codon (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) of a native sequence with a codon that is more frequently or most frequently used in the genes of that host cell, while maintaining the native amino acid sequence. Different species exhibit specific preferences for certain codons of a particular amino acid. Codon usage tables are readily available, for example in the Codon Usage Database at www.kazusa.or.jp/codon/, and these tables can be adapted in a number of ways.
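The definition of codon optimization above amounts to a synonymous substitution: each codon may be swapped for a host-preferred codon of the same amino acid, so the translated protein is unchanged. The Python sketch below illustrates this with the standard genetic code and a deliberately tiny preference table; that table is a hypothetical placeholder, not real codon usage data for any host, and a practical workflow would derive it from a codon usage table for the intended host.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa
                for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical host preferences (illustration only).
PREFERRED_CODON = {"L": "CTG", "S": "AGC", "R": "CGG"}

def translate(dna):
    """Translate an in-frame DNA sequence; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(GENETIC_CODE[dna[i:i + 3]] for i in range(0, len(dna), 3))

def codon_optimize(dna, preferred=PREFERRED_CODON):
    """Replace codons with preferred synonymous codons where one is listed."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    for aa, codon in preferred.items():
        if GENETIC_CODE[codon] != aa:
            raise ValueError("preferred codon does not encode " + aa)
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        out.append(preferred.get(GENETIC_CODE[codon], codon))
    return "".join(out)

native = "ATGTTATCACGATAA"           # M L S R stop
optimized = codon_optimize(native)
print(optimized)                      # ATGCTGAGCCGGTAA
print(translate(native) == translate(optimized))  # True
```

The final comparison is the essential invariant: whatever preference table is used, the amino acid sequence before and after optimization must be identical.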

[0052] In certain embodiments, the Cas protein C2c11 is encoded by the nucleotide sequence shown in any one of SEQ ID NOs: 31-60, or by a nucleotide sequence having at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity thereto.

[0053] In certain embodiments, the Cas protein C2c11 comprises a modified portion, for example, it is connected to another molecule (e.g., another polypeptide or protein). Typically, the modification of the protein does not adversely affect the desired activity of the protein (e.g., activity binding to a guide RNA, endonuclease activity, activity of binding to and cutting a specific site of a target sequence under the guidance of a guide RNA). Therefore, the protein of the present invention is also intended to include such modified forms. For example, the protein of the present invention can be functionally linked (by chemical coupling, gene fusion, non-covalent linkage or other means) to one or more other molecular groups, such as another protein or polypeptide, a detection reagent or a pharmaceutical agent, etc.

[0054] In certain embodiments, the modifying moiety is selected from another protein or polypeptide, a detectable label, a purification tag, or any combination thereof.

[0055] In certain embodiments, the modifying moiety is linked to the N-terminus or C-terminus of the protein, optionally through a linker.

[0056] In certain embodiments, the modifying moiety is fused to the N-terminus or C-terminus of the protein.

[0057] In certain embodiments, the additional protein or polypeptide is selected from an epitope tag, a reporter gene sequence, a nuclear localization signal (NLS) sequence, a targeting moiety, a transcriptional activation domain (e.g., VP64), a transcriptional repression domain (e.g., a KRAB domain or a SID domain), a nuclease domain (e.g., FokI), a domain having any of the following activities: methylase activity, demethylase activity, transcriptional activation activity, transcriptional repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity, nucleic acid binding activity, or any combination thereof.

[0058] In certain embodiments, the protein of the present invention may be linked to a nuclear localization signal (NLS) sequence to enhance the ability of the protein of the present invention to enter the cell nucleus.

[0059] In certain embodiments, the protein of the present invention can be connected to a detectable label or reporter gene to facilitate detection of the protein of the present invention. Such detectable labels are well known to those skilled in the art and include, for example, fluorescent dyes such as FITC or DAPI. Such reporter genes are also well known to those skilled in the art, and examples thereof include but are not limited to GST, HRP, CAT, GFP, HcRed, DsRed, CFP, YFP, BFP, etc.

[0060] In certain embodiments, the protein of the present invention can be linked to an epitope tag to facilitate expression, detection, tracing, and / or purification of the protein of the present invention. Such epitope tags are well known to those skilled in the art, and examples thereof include, but are not limited to, His, V5, FLAG, HA, Myc, VSV-G, Trx, and the like. Those skilled in the art know how to select an appropriate epitope tag based on the desired purpose (e.g., purification, detection, or tracing).

[0061] In certain embodiments, the modified portion is connected to the N-terminus or C-terminus of the protein of the present invention via a linker. Such linkers are well known in the art, and examples thereof include, but are not limited to, linkers comprising one or more (e.g., 1, 2, 3, 4, or 5) amino acids (e.g., Glu or Ser) or amino acid derivatives (e.g., Ahx, β-Ala, GABA, or Ava) or PEG, etc.

[0062] The protein of the present invention is not limited by the method of production; for example, it can be produced by genetic engineering methods (recombinant technology) or by chemical synthesis methods.

[0063] Guide RNA

[0064] In certain embodiments, the system or composition comprises two or more guide RNAs capable of hybridizing to different target nucleic acids or different regions of the same target nucleic acid. In certain embodiments, the two or more guide RNAs guide the same Cas protein C2c11 or guide different Cas proteins C2c11 respectively.

[0065] In certain embodiments, the guide RNA comprises a guide sequence, and the guide sequence is capable of hybridizing or annealing to a target nucleic acid under conditions that permit nucleic acid hybridization or annealing. In certain embodiments, the guide sequence comprises a complementary sequence to the sequence of the target nucleic acid. In certain embodiments, the guide sequence is at least 10 nt in length, e.g., 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.

[0066] In certain embodiments, the guide RNA further comprises a backbone sequence. In certain embodiments, the backbone sequence has a length of at least 20 nt, such as 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, 200-300 nt or longer.

[0067] In certain embodiments, the guide RNA forms a complex with the Cas protein C2c11 through the backbone sequence.

[0068] In certain embodiments, the guide sequence is located 3' to the backbone sequence.

[0069] In certain embodiments, the guide RNA is non-naturally occurring or modified.

[0070] In certain embodiments, the sequence of the guide RNA comprises at least one chemical modification.

[0071] In certain embodiments, the chemical modification is selected from pseudo-U, 5-methyl-C, methylated nucleotide or nucleotide analog, 2'-O-methyl, 2'-O-methyl 3' phosphorothioate, 2'-O-methyl 3' thio PACE, or any combination thereof.

[0072] Herein, the guide RNA includes crRNA and tracrRNA, wherein a partial sequence of crRNA serves as the guide sequence of the guide RNA, and the remaining sequence of crRNA and tracrRNA together serve as the backbone sequence of the guide RNA.

[0073] In certain embodiments, the crRNA comprises a repetitive sequence and a guide sequence, wherein the guide sequence is capable of hybridizing or annealing to the target nucleic acid under conditions that allow nucleic acid hybridization or annealing. In certain embodiments, the repetitive sequence is located upstream of the guide sequence.

[0074] In certain embodiments, tracrRNA comprises a complementary repeat sequence, wherein, under conditions allowing nucleic acid hybridization or annealing, the complementary repeat sequence can hybridize or anneal to the repeat sequence of crRNA. It will be appreciated by those skilled in the art that the complementary repeat sequence and the repeat sequence do not need to be completely complementary. In certain embodiments, when optimally aligned, the degree of complementarity between the complementary repeat sequence and the repeat sequence can be at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95% or at least 99%.
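The degree of complementarity between the tracrRNA complementary repeat sequence and the crRNA repeat sequence can be estimated by pairing the two strands in antiparallel orientation and counting Watson-Crick pairs. The Python sketch below is a simplified illustration: it assumes both strands are written 5' to 3', are of equal length, and are already optimally aligned without gaps; G-U wobble pairs are not counted. The example sequences are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def percent_complementarity(repeat, anti_repeat):
    """Percentage of positions forming Watson-Crick pairs.

    Both RNA strands are given 5'->3'; the anti-repeat is read 3'->5'
    so that it pairs antiparallel with the repeat.
    """
    repeat = repeat.upper().replace("T", "U")
    anti_repeat = anti_repeat.upper().replace("T", "U")
    if len(repeat) != len(anti_repeat) or not repeat:
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum((x, y) in WATSON_CRICK
                 for x, y in zip(repeat, reversed(anti_repeat)))
    return 100.0 * paired / len(repeat)

print(percent_complementarity("GUUGCAGA", "UCUGCAAC"))  # 100.0
print(percent_complementarity("GUUGCAGA", "UCUGCAAG"))  # 87.5
```

The second pair would satisfy an "at least 80%" complementarity limit but not an "at least 90%" limit under this simplified definition.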

[0075] In certain embodiments, tracrRNA further comprises a non-complementary repeat sequence that can form a stem-loop structure (also referred to as a "hairpin structure") in the secondary structure. In certain embodiments, the non-complementary repeat sequence is located upstream of the complementary repeat sequence.

[0076] In certain embodiments, the tracrRNA consists of complementary repeat sequences and non-complementary repeat sequences.

[0077] In certain embodiments, a linker sequence is further included between the complementary repeat sequence of tracrRNA and the repeat sequence of crRNA. In certain embodiments, the linker sequence is at least 2 nt in length, for example, 2 nt, 3 nt, 4 nt, 5 nt, 6 nt, 7 nt, or 8 nt.

[0078] In certain exemplary embodiments, the structure of the guide RNA of the present application is shown in Figure 9. In such exemplary embodiments, the guide RNA comprises, from the 5' end to the 3' end, the following: a non-complementary repeat sequence of tracrRNA (the portion of the red segment in Figure 9 forming a stem-loop structure), a complementary repeat sequence of tracrRNA (the portion of the red segment complementary to the blue segment in Figure 9), a linker sequence (linker in Figure 9), a repeat sequence of crRNA (the portion of the blue segment complementary to the red segment in Figure 9), and a guide sequence of crRNA (the yellow segment in Figure 9).

[0079] In such exemplary embodiments, the non-complementary repeat sequence of tracrRNA, the complementary repeat sequence of tracrRNA, the linker sequence and the repeat sequence of crRNA serve together as the backbone sequence of guide RNA. In such exemplary embodiments, the guide sequence of crRNA serves as the guide sequence of guide RNA.
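The 5'-to-3' arrangement described in the exemplary embodiments above can be sketched as a simple data structure. This is only an illustration under stated assumptions: the class and field names are hypothetical, and any sequences supplied to it are placeholders rather than the sequences of the sequence listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideRNAParts:
    """Segments of a single guide RNA, listed in 5'-to-3' order (names are hypothetical)."""

    tracr_non_complementary_repeat: str  # forms the stem-loop (hairpin) structure
    tracr_complementary_repeat: str      # pairs with the crRNA repeat sequence
    linker: str                          # joins the tracrRNA part to the crRNA part
    cr_repeat: str                       # crRNA repeat sequence
    guide: str                           # crRNA guide sequence, hybridizes to the target

    def backbone(self) -> str:
        # Backbone sequence = every segment except the guide sequence.
        return (
            self.tracr_non_complementary_repeat
            + self.tracr_complementary_repeat
            + self.linker
            + self.cr_repeat
        )

    def full_sequence(self) -> str:
        # The guide sequence sits at the 3' end, downstream of the repeat sequence.
        return self.backbone() + self.guide
```

Keeping the backbone and the guide sequence separate reflects the description above, in which the backbone is held constant while the guide sequence is chosen for each target nucleic acid.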

[0080] In certain embodiments, the backbone sequence comprises or consists of a sequence selected from the group consisting of:

[0081] (i) the sequence shown in any one of SEQ ID NOs: 63, 65, 67, 72-98;

[0082] (ii) a sequence having one or more base substitutions, deletions and / or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 base substitutions, deletions and / or additions) compared to the sequence shown in any one of SEQ ID NOs: 63, 65, 67, 72-98;

[0083] (iii) a sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% sequence identity to the sequence shown in any one of SEQ ID NOs: 63, 65, 67, 72-98;

[0084] (iv) a sequence that hybridizes under stringent conditions to the sequence described in any one of (i) to (iii); or

[0085] (v) a complementary sequence of the sequence described in any one of (i) to (iii);

[0086] Furthermore, the sequence of any one of (ii) to (v) substantially retains the biological activity of the sequence from which it is derived.
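Criterion (ii) above, a sequence differing from a reference by a limited number of base substitutions, deletions and / or additions, can be illustrated with a standard edit-distance (Levenshtein) calculation. The sketch below is an assumption-laden illustration: the function names are hypothetical, the limit of 10 edits mirrors the example range given in (ii), and the check addresses only the sequence criterion, not whether biological activity is retained.

```python
def edit_distance(reference: str, candidate: str) -> int:
    """Minimum number of single-base substitutions, deletions and additions (Levenshtein distance)."""
    previous = list(range(len(candidate) + 1))
    for i, ref_base in enumerate(reference, start=1):
        current = [i]
        for j, cand_base in enumerate(candidate, start=1):
            substitution_cost = 0 if ref_base == cand_base else 1
            current.append(
                min(
                    previous[j] + 1,                      # base of reference deleted
                    current[j - 1] + 1,                   # base added in candidate
                    previous[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1]


def within_edit_limit(reference: str, candidate: str, max_edits: int = 10) -> bool:
    """True if the candidate differs from the reference by 1 to max_edits edits."""
    return 1 <= edit_distance(reference, candidate) <= max_edits
```

An identical sequence returns `False` here because it falls under item (i) rather than item (ii).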

[0087] In certain embodiments, the guide RNA comprises or consists of a sequence selected from the group consisting of:

[0088] (i) the sequence shown in any one of SEQ ID NOs: 62, 64, and 66;

[0089] (ii) a sequence having one or more base substitutions, deletions and / or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 base substitutions, deletions and / or additions) compared to the sequence shown in any one of SEQ ID NOs: 62, 64, and 66;

[0090] (iii) a sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% sequence identity to the sequence shown in any one of SEQ ID NOs: 62, 64, and 66;

[0091] (iv) a sequence that hybridizes under stringent conditions to the sequence described in any one of (i) to (iii); or

[0092] (v) a complementary sequence of the sequence described in any one of (i) to (iii);

[0093] Furthermore, the sequence of any one of (ii) to (v) substantially retains the biological function of the sequence from which it is derived.

[0094] Target nucleic acid

[0095] In certain embodiments, the sequence of the target nucleic acid is a DNA and / or RNA sequence from a prokaryotic cell or a eukaryotic cell; or, the sequence of the target nucleic acid is a non-naturally occurring DNA and / or RNA sequence.

[0096] In certain embodiments, the target nucleic acid is selected from a double-stranded target nucleic acid, a single-stranded target nucleic acid, or any combination thereof.

[0097] In certain embodiments, the target nucleic acid is present in a cell; alternatively, the target nucleic acid is present in an in vitro nucleic acid molecule (e.g., a plasmid or genomic DNA collected in vitro by cell lysis or PCR amplification). In certain embodiments, cells include, but are not limited to, prokaryotic cells such as Escherichia coli cells, and eukaryotic cells such as yeast cells, insect cells, plant cells, and animal cells (e.g., mammalian cells, e.g., mouse cells, human cells, etc.).

[0098] In certain embodiments, when the target nucleic acid is DNA, the target nucleic acid is located at the 3' end of the protospacer adjacent motif (PAM). In certain embodiments, the sequence of the PAM is selected from GTC, TTC, TTTC, TGTC, AGTC or any combination thereof.
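The PAM requirement described above can be illustrated by scanning a DNA sequence for the listed PAMs and reporting the stretch immediately 3' of each PAM as a candidate target. This is a minimal sketch under stated assumptions: only the given strand is scanned, the function name is hypothetical, and the default target length of 20 nt is an illustrative placeholder that is not specified in this passage.

```python
# PAM sequences listed in the paragraph above.
PAMS = ("GTC", "TTC", "TTTC", "TGTC", "AGTC")


def find_candidate_targets(dna: str, target_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (PAM start index, PAM, candidate target) for each PAM found on the given strand.

    The candidate target is the target_len bases immediately 3' of the PAM.
    Overlapping PAMs (e.g., AGTC and the GTC inside it) are reported separately,
    so the same candidate target can appear more than once.
    """
    dna = dna.upper()
    hits = []
    for start in range(len(dna)):
        for pam in PAMS:
            if dna.startswith(pam, start):
                target_start = start + len(pam)
                target = dna[target_start:target_start + target_len]
                if len(target) == target_len:  # skip PAMs too close to the 3' end
                    hits.append((start, pam, target))
    return hits
```

A fuller search would also scan the reverse-complement strand and remove duplicate candidate targets; both are omitted here to keep the sketch short.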

[0099] In certain embodiments, when the target nucleic acid is RNA, the target nucleic acid is not restricted by a PAM.

[0100] In certain embodiments, the target nucleic acid is modified (e.g., cut, edited) by the complex formed by the guide RNA and Cas protein C2c11 after hybridization or annealing with the complex. In certain embodiments, the modification causes the expression product of the target nucleic acid to change (e.g., the expression level of the expression product is increased or decreased).

[0101] In certain embodiments, the systems or compositions further comprise additional components.

[0102] In certain embodiments, the additional component is selected from:

[0103] (1) an auxiliary protein or a nucleic acid molecule C1 encoding the auxiliary protein, wherein the auxiliary protein can enhance the activity of the Cas protein C2c11;

[0104] (2) a repressor protein or a nucleic acid molecule D1 encoding the repressor protein, wherein the repressor protein is capable of reducing the activity of the Cas protein C2c11;

[0105] (3) a nuclease (e.g., Cas1, Cas2, or Cas4) or a nucleic acid molecule E1 encoding the nuclease;

[0106] (4) Any combination of (1) to (3) above.

[0107] In certain embodiments, the additional component forms a complex with the Cas protein C2c11, or exists independently of the Cas protein C2c11.

[0108] Vector system

[0109] In a second aspect, the present application provides a vector system, comprising one or more vectors, wherein the one or more vectors comprise:

[0110] (a) a nucleic acid molecule A1 encoding the Cas protein C2c11 as described in the first aspect; optionally, the vector further comprises a first regulatory element operably linked to the nucleic acid molecule A1; and

[0111] (b) a nucleic acid molecule B1 encoding the guide RNA according to the first aspect; optionally, the vector further comprises a second regulatory element operably linked to the nucleic acid molecule B1;

[0112] Components (a) and (b) are located on the same vector or on different vectors of the vector system.

[0113] As used herein, the term "vector" is a tool that allows or facilitates the transfer of an entity from one environment to another. Thus, in certain embodiments, a "vector" includes a substance for delivering or introducing a Cas protein and / or guide RNA into a cell. In certain embodiments, the vector is also used to amplify the Cas protein and / or guide RNA (e.g., in prokaryotic cells).

[0114] In certain embodiments, one type of vector is a plasmid, which refers to a circular double-stranded DNA into which other DNA fragments can be inserted, for example, by standard molecular cloning techniques. Another type of vector is a viral vector, in which virally derived DNA or RNA sequences are present in a vector for packaging into a virus (e.g., retroviruses, replication-defective retroviruses, adenoviruses, replication-defective adenoviruses, and adeno-associated viruses). Viral vectors also include polynucleotides carried by a virus for transfection into a host cell. Some vectors (e.g., bacterial vectors with a bacterial origin of replication and episomal mammalian vectors) can replicate autonomously in the host cell into which they are introduced. Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of the host cell after being introduced into the host cell, and thus replicate together with the host genome. Moreover, some vectors can direct the expression of the genes to which they are operably linked. Such vectors are referred to herein as "expression vectors". The expression vectors commonly used in recombinant DNA technology are typically in the form of plasmids.

[0115] The recombinant expression vector may comprise other nucleic acid molecules suitable for nucleic acid expression in a host cell. For example, the recombinant expression vector comprises one or more regulatory elements suitable for nucleic acid expression in a host cell, and the regulatory elements are operably linked to the nucleic acid sequence to be expressed.

[0116] In certain embodiments, the one or more vectors further comprise a nucleic acid molecule C1 encoding an auxiliary protein. In certain embodiments, the vector further comprises a third regulatory element operably linked to the nucleic acid molecule C1.

[0117] In certain embodiments, the one or more vectors further comprise a nucleic acid molecule D1 encoding a repressor protein. In certain embodiments, the vector further comprises a fourth regulatory element operably linked to the nucleic acid molecule D1.

[0118] In certain embodiments, the one or more vectors further comprise a nucleic acid molecule E1 encoding a nuclease. In certain embodiments, the vector further comprises a fifth regulatory element operably linked to the nucleic acid molecule E1.

[0119] In certain embodiments, the first regulatory element, the second regulatory element, the third regulatory element, the fourth regulatory element, and the fifth regulatory element are the same or different.

[0120] In certain embodiments, the first regulatory element, the second regulatory element, the third regulatory element, the fourth regulatory element, and the fifth regulatory element are each independently selected from a promoter (e.g., an inducible promoter), an enhancer, an internal ribosome entry site (IRES), a terminator, or any combination thereof.

[0121] In certain embodiments, the promoter can be one or more constitutive promoters, and / or one or more conditional promoters, and / or one or more inducible promoters, and / or one or more tissue-specific promoters.

[0122] In certain embodiments, the promoter is selected from: an RNA polymerase I (pol I) promoter, a pol II promoter, a pol III promoter, a T7 promoter, a U6 promoter, an H1 promoter, the retroviral Rous sarcoma virus (RSV) LTR promoter, the cytomegalovirus (CMV) promoter, the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerate kinase (PGK) promoter, the EF1α promoter or any combination thereof.

[0123] In certain embodiments, the vector system comprises a viral vector.

[0124] In certain embodiments, the viral vector is selected from a retroviral vector, a lentiviral vector, an adenoviral vector, an adeno-associated viral vector, a herpes simplex viral vector, or any combination thereof.

[0125] In certain embodiments, the nucleotide sequences of nucleic acid molecule A1, nucleic acid molecule C1, nucleic acid molecule D1 and / or nucleic acid molecule E1 are codon-optimized according to the codon preference of the host cell (e.g., a eukaryotic cell or a prokaryotic cell).

[0126] Delivery composition

[0127] In a third aspect, the present application provides a delivery composition comprising:

[0128] (1) a delivery vehicle, and

[0129] (2) the system or composition as described in the first aspect or the vector system as described in the second aspect.

[0130] In certain embodiments, the delivery vehicle is a particle.

[0131] In certain embodiments, the delivery vehicle is selected from lipid particles, sugar particles, metal particles, protein particles, liposomes, exosomes, microvesicles, gene guns, viral vectors (e.g., replication-defective retroviruses, lentiviruses, adenoviruses, or adeno-associated viruses), or any combination thereof.

[0132] The delivery composition can deliver the components of the system or composition as described in the first aspect or of the vector system as described in the second aspect separately, at different times or under different conditions, or can deliver those components together.

[0133] Delivery can be performed by any method known in the art. Such methods include, but are not limited to, electroporation, lipofection, nucleofection, microinjection, sonoporation, gene guns, calcium phosphate-mediated transfection, cationic transfection, dendrimer transfection, heat shock transfection, magnetofection, impalefection, optical transfection, agent-enhanced nucleic acid uptake, and delivery via liposomes, immunoliposomes, viral particles, artificial virions, and the like.

[0134] Kit

[0135] In a fourth aspect, the present invention provides a kit comprising one or more components selected from the group consisting of: the system or composition of the first aspect, the vector system of the second aspect, or the delivery composition of the third aspect.

[0136] In certain embodiments, the kit further comprises instructions for using the system or composition, vector system, or delivery composition.

[0137] In certain embodiments, the components included in the kits of the present invention may be provided in any suitable container.

[0138] In certain embodiments, the kit further comprises one or more buffers. The buffer can be any buffer, including but not limited to sodium carbonate buffer, sodium bicarbonate buffer, borate buffer, Tris buffer, MOPS buffer, HEPES buffer and combinations thereof. In certain embodiments, the buffer is neutral or close to neutral. In certain embodiments, the buffer has a pH from about 6.0 to 9.0 (e.g., 6.0 to 7.0, 7.0 to 8.0 or 8.0 to 9.0).

[0139] In certain embodiments, the kit further comprises one or more oligonucleotides corresponding to a targeting sequence for insertion into a vector so as to operably link the targeting sequence and regulatory elements. In certain embodiments, the kit comprises a homologous recombination template polynucleotide.

[0140] Complex

[0141] In a fifth aspect, the present invention provides a complex comprising:

[0142] (i) a protein component comprising Cas protein C2c11, wherein the Cas protein C2c11 is as defined in the first aspect; and

[0143] (ii) a nucleic acid component;

[0144] wherein the protein component and the nucleic acid component bind to each other to form the complex.

[0145] In certain embodiments, the nucleic acid component comprises a guide RNA in a CRISPR-Cas system. In certain embodiments, the guide RNA is as defined in the first aspect.

[0146] In certain embodiments, the protein component further comprises another protein. In certain embodiments, the other protein forms a complex with the Cas protein C2c11 in a covalent or non-covalent manner.

[0147] In certain embodiments, the other protein is selected from:

[0148] (1) an auxiliary protein, which can enhance the activity of the Cas protein C2c11;

[0149] (2) a repressor protein, which can reduce the activity of the Cas protein C2c11;

[0150] (3) nuclease (e.g., Cas1, Cas2, or Cas4);

[0151] (4) Any combination of (1) to (3) above.

[0152] In certain embodiments, the auxiliary protein or repressor protein and the Cas protein C2c11 are derived from the same organism.

[0153] Methods for modifying target genes

[0154] In a sixth aspect, the present invention provides a method for modifying a target gene, comprising: contacting the system or composition as described in the first aspect, the vector system as described in the second aspect, the delivery composition as described in the third aspect, the kit as described in the fourth aspect, or the complex as described in the fifth aspect with the target nucleic acid.

[0155] In certain embodiments, the method is used to modify the target nucleic acid in vitro or ex vivo. In certain embodiments, the target nucleic acid is present in an in vitro nucleic acid molecule (e.g., a plasmid or genomic DNA collected in vitro by cell lysis or PCR amplification).

[0156] In certain embodiments, the method is not a method of treating a human or animal by therapy. In certain embodiments, the method does not include a step of modifying the germline genetic identity of a human.

[0157] In another aspect, the present invention provides a method for modifying a target gene, comprising: delivering the system or composition as described in the first aspect, the vector system as described in the second aspect, the delivery composition as described in the third aspect, the kit as described in the fourth aspect, or the complex as described in the fifth aspect into a cell containing the target nucleic acid.

[0158] In certain embodiments, the target gene is present in a cell. In certain embodiments, the cell is a prokaryotic cell. In certain embodiments, the cell is a eukaryotic cell. In certain embodiments, the cell is a mammalian cell. In certain embodiments, the cell is a human cell. In certain embodiments, the cell is selected from non-human primate, cattle, pig or rodent cells. In certain embodiments, the cell is a non-mammalian eukaryotic cell, such as a cell of poultry or fish. In certain embodiments, the cell is a plant cell, such as a cell of a cultivated plant (such as cassava, corn, sorghum, wheat or rice), alga, tree or vegetable.

[0159] In certain embodiments, the modification comprises cutting and / or editing. In certain embodiments, the modification comprises a break in the target sequence, such as a single-strand or double-strand break in DNA or a single-strand break in RNA. In certain embodiments, the break results in reduced transcription of the target gene.

[0160] In certain embodiments, the modification causes a change in the expression product of the target gene (e.g., an increase or decrease in the expression level of the expression product).

[0161] In certain embodiments, the method further comprises contacting the target nucleic acid with an editing template, or delivering the editing template to a cell comprising the target nucleic acid. In such embodiments, the method repairs the broken target nucleic acid by homologous recombination with an exogenous template polynucleotide, wherein the repair results in a mutation comprising an insertion, deletion, or substitution of one or more nucleotides of the target nucleic acid. In certain embodiments, the mutation results in a change in one or more amino acids in a protein expressed from a gene comprising the target nucleic acid.

[0162] Thus, in certain embodiments, the modification further comprises inserting an editing template (e.g., an exogenous nucleic acid) into the break.

[0163] Cell

[0164] In a seventh aspect, the present invention provides an in vitro, ex vivo or in vivo cell or cell line, or progeny thereof, which comprises: the system or composition as described in the first aspect, the vector system as described in the second aspect, the delivery composition as described in the third aspect, or the complex as described in the fifth aspect.

[0165] In another aspect, the present invention provides a cell, cell line or progeny thereof comprising a modified target gene, wherein the cell or cell line has been modified by the method according to the sixth aspect.

[0166] In certain embodiments, the cell is a prokaryotic cell.

[0167] In certain embodiments, the cell is a eukaryotic cell. In certain embodiments, the cell is a mammalian cell. In certain embodiments, the cell is a human cell. In certain embodiments, the cell is a non-human mammalian cell, such as a cell of a non-human primate, cattle, sheep, pig, dog, monkey, rabbit or rodent (such as rat or mouse). In certain embodiments, the cell is a non-mammalian eukaryotic cell, such as a cell of poultry (such as chicken), fish or shellfish (such as clams, shrimp). In certain embodiments, the cell is a plant cell, such as a cell of a cultivated plant or food crop (such as cassava, corn, sorghum, soybean, wheat, oat or rice), or a cell of an alga, tree, production plant, fruit or vegetable (for example, trees such as citrus trees and nut trees; Solanaceae plants, cotton, tobacco, tomato, grape, coffee, cocoa, etc.).

[0168] In certain embodiments, the cell is a stem cell or a stem cell line.

[0169] In certain embodiments, the cell is in vitro, ex vivo, or in vivo.

[0170] In certain embodiments, the modification results in an altered expression of at least one gene product of the cell, for example, an increase in the expression of the at least one gene product, or a decrease in the expression of the at least one gene product.

[0171] In certain embodiments, the expression of the gene product is altered (e.g., enhanced or reduced). In certain embodiments, the expression of the gene product is enhanced. In certain embodiments, the expression of the gene product is reduced.

[0172] In another aspect, the present invention provides a gene product, wherein the gene product is derived from the cell, cell line or progeny thereof according to the seventh aspect.

[0173] In certain embodiments, the gene product is a protein.

[0174] In another aspect, the present invention provides a plant or animal model comprising a cell or cell line comprising a modified target gene as described above, or progeny thereof.

[0175] Use

[0176] In another aspect, the present invention relates to the use of the system or composition of the first aspect, the vector system of the second aspect, the delivery composition of the third aspect, the kit of the fourth aspect, or the complex of the fifth aspect for nucleic acid editing (e.g., in vitro or ex vivo nucleic acid editing), or in the preparation of a nucleic acid editing preparation, in the preparation of an in vitro or ex vivo nucleic acid detection preparation, or in the preparation of a medicament for treating a disease or condition in a subject in need thereof.

[0177] In certain embodiments, the nucleic acid to be edited is present in a cell. In certain embodiments, the nucleic acid to be edited is a genome. In certain embodiments, the cell is a prokaryotic cell or a eukaryotic cell. In certain embodiments, the nucleic acid to be edited is present in an in vitro nucleic acid molecule (e.g., a plasmid). In certain embodiments, the nucleic acid to be edited includes one or more genes.

[0178] In certain embodiments, the nucleic acid editing includes gene editing, for example, the gene editing includes single-site and / or multi-site gene modification, single-site and / or multi-site gene knockout, single-site and / or multi-site gene knock-in, single-site and / or multi-site methylation modification (for example, adding or removing a methylation modification), changing the expression of gene products, repairing mutations, and / or inserting polynucleotides. In certain embodiments, the gene editing does not include a step of modifying the germline genetic identity of a human. In certain embodiments, the use is not a method for treating humans or animals by therapy. In certain embodiments, the use further includes repairing the edited target sequence by homologous recombination with an exogenous template polynucleotide, wherein the repair can produce a mutation in the target sequence, including insertion, deletion or substitution of one or more nucleotides.

[0179] In certain embodiments, the nucleic acid detected in vitro or ex vivo is selected from double-stranded DNA, single-stranded DNA, single-stranded RNA, double-stranded RNA, or any combination thereof. In certain embodiments, the nucleic acid detection is used to detect tumors, viruses, or bacteria, for example, tumors, Ebola virus, avian influenza virus, African swine fever virus, and other viruses or bacteria.

[0180] In certain embodiments, the system or composition as described in the first aspect, the vector system as described in the second aspect, the delivery composition as described in the third aspect, the kit as described in the fourth aspect, or the complex as described in the fifth aspect can be used to edit a gene associated with the disease or condition in a subject to treat the disease or condition.

[0181] In certain embodiments, the gene has one or more mutations, is a gene that causes a genetic variation, or is linked to one or more genes that cause a genetic variation.

[0182] In certain embodiments, the disease or disorder is a genetic disease or disorder; for example, a blood disease or disorder, an eye disease or disorder, a liver disease or disorder, a muscle disease or disorder, or a neurological disease or disorder.

[0183] In certain embodiments, the disease or condition is a disease or condition caused by a genetic mutation or a pathogenic SNP; for example, cancer.

[0184] Detection method

[0185] In certain embodiments, the Cas protein C2c11 of the present application has a trans-cleavage activity that can be activated by a target nucleic acid. In certain embodiments, double-stranded or single-stranded target nucleic acids can activate the trans-cleavage activity of the Cas protein C2c11. Therefore, when the Cas protein C2c11 of the present application forms a complex with the guide RNA and the complex hybridizes or anneals with the target nucleic acid, the complex is activated and acquires trans-cleavage activity toward non-target single-stranded DNA sequences. That is, the complex can indiscriminately cleave single-stranded DNA of any sequence in the reaction system. In this case, when a fluorescently labeled single-stranded DNA probe is present in the reaction system, the presence or absence of the target nucleic acid can be determined by detecting the fluorescence signal of the single-stranded DNA probe.

[0186] Therefore, in another aspect, the present invention also relates to a method for detecting the presence of a target nucleic acid in a sample, comprising the following steps:

[0187] (1) contacting the sample with a labeled single-stranded DNA probe and any of the following components: the system or composition described in the first aspect, the vector system described in the second aspect, the delivery composition described in the third aspect, the kit described in the fourth aspect, or the complex described in the fifth aspect; wherein,

[0188] the system or composition, vector system, delivery composition, kit or complex comprises a guide sequence that is capable of hybridizing to a target nucleic acid, and the single-stranded DNA probe does not hybridize to the guide sequence;

[0189] (2) detecting a detectable signal generated when the Cas protein C2c11 contained in the system or composition, vector system, delivery composition, kit or complex cleaves the single-stranded DNA probe, thereby determining whether the target nucleic acid is present in the sample.

[0190] In certain embodiments, the detectable signal is determined by one or more methods selected from the group consisting of imaging-based detection, sensor-based detection, color-based detection, gold nanoparticle-based detection, fluorescence polarization-based detection, colloidal phase transition / dispersion-based detection, electrochemical-based detection, and semiconductor-based sensing detection.

[0191] In certain embodiments, the target nucleic acid is as defined in the first aspect.

[0192] In certain embodiments, one end (e.g., the 5' end) of the single-stranded DNA probe is labeled with a fluorescent group, and the other end (e.g., the 3' end) is labeled with a quencher group.
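With a probe carrying a fluorescent group and a quencher group, cleavage of the probe separates the two labels, so an increase in fluorescence indicates that trans-cleavage has been activated. One possible way to turn such a readout into a presence / absence call is sketched below. The decision rule (mean sample signal at least a chosen multiple of the mean no-target control signal), the function name and the default threshold of 2.0 are all assumptions made for illustration and are not specified in the present application.

```python
from statistics import mean


def target_detected(
    sample_signal: list[float],
    control_signal: list[float],
    min_fold: float = 2.0,  # hypothetical threshold; must be set empirically
) -> bool:
    """Call the target present if sample fluorescence exceeds the no-target control by min_fold."""
    if not sample_signal or not control_signal:
        raise ValueError("both signal lists must contain at least one reading")
    baseline = mean(control_signal)
    if baseline <= 0:
        raise ValueError("control fluorescence must be positive to compute a fold change")
    return mean(sample_signal) / baseline >= min_fold
```

In practice the threshold would need to be set from replicate controls for the specific probe, instrument and reaction conditions used.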

[0193] In certain embodiments, the target nucleic acid is selected from double-stranded DNA, single-stranded DNA, RNA, or any combination thereof.

[0194] Therefore, in some embodiments, the method further comprises the step of contacting the sample with a reagent for reverse transcription before step (1). In some embodiments, the reagent for reverse transcription is selected from reverse transcriptase, oligonucleotide primers, dNTPs or any combination thereof.

[0195] In certain embodiments, the method further comprises a step of amplifying the target nucleic acid in the sample before step (1).

[0196] In certain embodiments, the amplification is selected from nucleic acid sequence-based amplification, recombinase polymerase amplification, loop-mediated isothermal amplification, strand displacement amplification, exonuclease III assisted signal amplification, hybridization chain reaction, helicase-dependent amplification, isothermal circular strand displacement polymerization, multiple displacement amplification, primase-based whole genome amplification, rolling circle amplification, whole genome amplification, or any combination thereof.

[0197] In certain embodiments, the method further comprises, before step (1), pre-treating the sample to expose the target nucleic acid in the sample.

[0198] In certain embodiments, the sequence of the target nucleic acid is a sequence obtained from a pathogen.

[0199] In certain embodiments, the pathogen is selected from a virus, a bacterium, a fungus, a protozoan, a parasite, or any combination thereof.

[0200] In certain embodiments, the sample is a biological sample or an environmental sample. In certain embodiments, the biological sample is an isolated biological sample.

[0201] In certain embodiments, the biological sample is selected from blood, plasma, serum, urine, feces, sputum, mucus, lymph, bile, ascites, pleural effusion, saliva, cerebrospinal fluid, any body secretion, transudate or exudate (e.g., fluid obtained from an abscess or any other site of infection or inflammation), a swab of the skin or mucosal surface, or any combination thereof.

[0202] In certain embodiments, the environmental sample is selected from a food sample, a paper surface, a fabric, a metal surface, a wood surface, a plastic surface, a soil sample, a freshwater sample, a wastewater sample, a saltwater sample, or any combination thereof.

[0203] Definition of terms

[0204] Unless otherwise indicated, scientific and technical terms used herein have the meanings commonly understood by those skilled in the art. Furthermore, procedures in molecular genetics, nucleic acid chemistry, chemistry, molecular biology, biochemistry, cell culture, microbiology, cell biology, genomics, and recombinant DNA used herein are conventional procedures widely used in the relevant fields. To facilitate a better understanding of the present invention, definitions and explanations of relevant terms are provided below.

[0205] In the present invention, the expression "C2c11" refers to a class of Cas effector proteins discovered and identified for the first time by the inventors of the present application, which can be distinguished from other classes of Cas proteins by protein size, primary structure (e.g., specific motifs), secondary structure (e.g., α-helical structure, β-sheet structure), functional domain (e.g., zinc finger domain, RuvC domain) and / or biological activity. In certain embodiments, the "C2c11" of the present application is less than 700 amino acids in length (e.g., 300-700 amino acids, 350-650 amino acids, 400-600 amino acids, 450-550 amino acids) and comprises at least one (e.g., 1, 2, 3 or more) zinc finger domain, wherein the zinc finger domain comprises an α-helical structure and a β-sheet structure and does not contain the amino acid sequence Cys-X1-X2-Cys, wherein X1 and X2 are each independently selected from any amino acid. The C2c11 of the present invention is a nuclease that binds to and cuts a specific site of a target sequence under the guidance of a guide RNA, and has both DNA and RNA endonuclease activities.
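Two of the sequence-level features in the definition above, a length of less than 700 amino acids and a zinc finger domain lacking the sequence Cys-X1-X2-Cys, can be checked mechanically. The sketch below is illustrative only: the function name is hypothetical, the protein is assumed to be given in uppercase one-letter code, and the zinc finger domain boundaries are assumed to be supplied by the caller (for example, from a separate domain annotation), since this passage does not define them.

```python
import re

# Cys, any two residues, Cys (Cys-X1-X2-Cys) in one-letter amino acid code.
CXXC = re.compile(r"C..C")


def meets_size_and_motif_criteria(protein: str, zf_start: int, zf_end: int) -> bool:
    """Check length < 700 aa and absence of Cys-X1-X2-Cys within protein[zf_start:zf_end].

    zf_start and zf_end are 0-based, end-exclusive indices of the zinc finger domain.
    """
    if not 0 <= zf_start < zf_end <= len(protein):
        raise ValueError("zinc finger domain boundaries must lie within the protein")
    zinc_finger_domain = protein[zf_start:zf_end]
    return len(protein) < 700 and CXXC.search(zinc_finger_domain) is None
```

Passing this check does not establish that a protein is a C2c11; the secondary structure, the RuvC domain and the biological activity named in the definition are not assessed here.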

[0206] As used herein, the term "zinc finger (ZF) domain" refers to a domain involved in the recognition, binding and / or cutting functions of Cas proteins, which can recognize, bind and / or cut DNA, RNA or protein. Typically, a zinc finger domain is capable of coordinating one or more zinc ions, each zinc ion being coordinated by four amino acid residues (e.g., Cys and / or His) as ligands. This structure can adopt a variety of geometric configurations, such as "C2H2 type", "C4 type", "C6 type" and the like. The diversity and modularity of zinc finger domains allow them to participate, in different combinations, in the recognition and binding of various biomolecules. DNA recognition by zinc finger domains can occur via sequence-specific and nonspecific interactions, which are controlled by amino acids in the ZF-DNA interface (Bulyk, Huang, Choo, & Church, 2001, PNAS, 98:7158-63).

[0207] In the Cas protein C2c11 of the present application, the "zinc finger domain" can be involved in recognizing, binding and / or cutting target nucleic acids (e.g., DNA, RNA). In certain embodiments, after the Cas protein C2c11 binds to the target nucleic acid through the zinc finger domain, other domains of the Cas protein C2c11 (e.g., the RuvC domain) are responsible for cutting and / or breaking the target nucleic acid. In certain embodiments, the spatial position of the zinc finger domain in the Cas protein C2c11 is the same as or similar to its spatial position in a closely related family protein (e.g., a Type V-U Cas protein). In certain embodiments, among the domains included in the Cas protein C2c11, the zinc finger domain is adjacent to the RuvC domain.

[0208] As used herein, the term "domain" refers to a region in a protein molecule that has a specific structure and / or function and is the basic unit that constitutes the tertiary structure of a protein. Each domain is typically composed of 50 to 300 amino acid residues, contains a unique spatial conformation, and performs the same or different biological functions. Generally speaking, a group of proteins with the same domain is called a family.

[0209] As used herein, the term "motif" refers to a relatively conserved amino acid sequence in a protein. Typically, a protein family may or may not contain a specific motif to distinguish it from other protein families. A motif typically consists of 3 to 20 consecutive amino acid residues. In some embodiments, the amino acid sequence of the motif can also be shorter or longer.

[0210] As used herein, the terms "Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-CRISPR-associated (Cas) (CRISPR-Cas) system" or "CRISPR system" are used interchangeably and have the meaning generally understood by those skilled in the art, which generally include transcripts or other elements related to the expression of CRISPR-associated ("Cas") genes, or transcripts or other elements capable of directing the activity of the Cas genes. Such transcripts or other elements may include sequences encoding Cas effector proteins and guide RNAs comprising CRISPR RNA (crRNA), as well as trans-acting crRNA (tracrRNA) sequences contained in the CRISPR-Cas system, or other sequences or transcripts from the CRISPR locus.

[0211] As used herein, the terms "Cas effector protein" and "Cas effector enzyme" are used interchangeably and refer to a protein of a certain length of amino acids that is present in the CRISPR-Cas system. In some cases, such a protein refers to a protein identified from the Cas locus.

[0212] As used herein, the terms "guide RNA" and "mature crRNA" are used interchangeably and have the meanings generally understood by those skilled in the art. In general, a guide RNA can include, consist essentially of, or consist of a guide sequence (also referred to as a spacer sequence (spacer) in the context of an endogenous CRISPR system) and a backbone sequence. In some cases, a guide sequence is any polynucleotide sequence that has sufficient complementarity with a target sequence to hybridize with the target sequence and guide the CRISPR-Cas complex to specifically bind to the target sequence. In certain embodiments, when optimally aligned, the degree of complementarity between a guide sequence and its corresponding target sequence is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95% or at least 99%. It is within the capabilities of those of ordinary skill in the art to determine the optimal alignment. For example, there are publicly available and commercially available alignment algorithms and programs, such as, but not limited to, ClustalW, the Smith-Waterman algorithm, Bowtie, Geneious, Biopython, and SeqMan in MATLAB.

[0213] In some cases, the guide sequence is at least 10, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 35, at least 40, at least 45, or at least 50 nucleotides in length. In some cases, the guide sequence is no more than 50, 45, 40, 35, 30, 25, 24, 23, 22, 21, 20, 15, 10, or fewer nucleotides in length. In certain embodiments, the guide sequence is 10-15, 15-20, 20-25, 25-30, or 30-40 nucleotides in length.

[0214] In some cases, the backbone sequence is at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, or at least 70 nucleotides in length. In some cases, the backbone sequence is no more than 70, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 50, 45, 40, 35, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 15, 10 or fewer nucleotides in length. In certain embodiments, the backbone sequence is 20-30, 30-40, 40-50, 50-100, 100-200, or 200-300 nucleotides in length.
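The relationship between the guide sequence, the backbone sequence and the length ranges given above can be illustrated with a minimal Python sketch. It assumes the exemplary layout in which the backbone sequence lies 5' of the guide sequence (as described for Figure 9); the function names and the toy sequences are hypothetical and are not sequences of the present application.

```python
def assemble_guide_rna(backbone: str, guide: str) -> str:
    """Join a backbone sequence and a guide sequence (5'->3') into one RNA string.

    Assumption: the backbone is placed 5' of the guide sequence.
    DNA-style input (T) is converted to RNA (U).
    """
    parts = []
    for name, seq in (("backbone", backbone), ("guide", guide)):
        rna = seq.upper().replace("T", "U")
        unexpected = set(rna) - set("ACGU")
        if unexpected:
            raise ValueError(f"{name} contains non-nucleotide symbols: {sorted(unexpected)}")
        parts.append(rna)
    return "".join(parts)


def length_in_range(seq: str, shortest: int, longest: int) -> bool:
    """Return True if the sequence length falls within an inclusive range."""
    return shortest <= len(seq) <= longest


# Toy example: a 4-nt "backbone" and a 4-nt "guide" (far shorter than real ones).
print(assemble_guide_rna("ggat", "ACGT"))
# A 20-nt guide falls within the 15-20 nt range mentioned in the text.
print(length_in_range("A" * 20, 15, 20))
```

Such a check is only a convenience for bookkeeping; it says nothing about whether a given guide RNA is active.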

[0215] As used herein, the term "CRISPR-Cas complex" refers to a ribonucleoprotein complex formed by the binding of a guide RNA or mature crRNA to a Cas protein, comprising a guide sequence that hybridizes to a target sequence and binds to the Cas protein. The ribonucleoprotein complex is capable of recognizing and cleaving a polynucleotide that can hybridize to the guide RNA or mature crRNA.

[0216] Therefore, in the case of forming a CRISPR-Cas complex, a "target sequence" refers to a polynucleotide that a guide sequence is designed to target, such as a sequence having complementarity with the guide sequence, wherein hybridization between the target sequence and the guide sequence will promote the formation of the CRISPR-Cas complex. Complete complementarity is not required, as long as there is sufficient complementarity to cause hybridization and promote the formation of a CRISPR-Cas complex. The target sequence can comprise any polynucleotide, such as DNA or RNA. In some cases, the target sequence is located in the nucleus or cytoplasm of the cell. In some cases, the target sequence may be located in an organelle of a eukaryotic cell, such as a mitochondrion or chloroplast. A sequence or template that can be used to recombine into a target locus comprising the target sequence is referred to as an "editing template" or "editing polynucleotide" or "editing sequence". In certain embodiments, the editing template is an exogenous nucleic acid. In certain embodiments, the recombination is homologous recombination.

[0217] In the present invention, the expression "target sequence" or "target nucleic acid" can be any polynucleotide that is endogenous or exogenous to a cell (e.g., a eukaryotic cell). For example, the target nucleic acid can be a polynucleotide present in the nucleus of a eukaryotic cell. The target nucleic acid can be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA). In some cases, the target sequence should be associated with a protospacer adjacent motif (PAM). The precise sequence and length requirements for the PAM vary depending on the Cas effector enzyme used, but the PAM is typically a 2-5 base pair sequence adjacent to the protospacer sequence (i.e., target sequence). Those skilled in the art will be able to identify PAM sequences for use with a given Cas effector protein.
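As an illustration of how a PAM constrains the choice of target sequence, the following minimal Python sketch lists candidate protospacers that lie immediately 3' of a PAM on one strand. The PAM "TTTC" is taken from the substrates described for Figure 7; the placement of the PAM 5' of the protospacer, the function name, the protospacer length and the toy DNA sequence are assumptions made for illustration only.

```python
def find_targets_with_5prime_pam(dna: str, pam: str, protospacer_len: int):
    """Return (start_index, protospacer) pairs for every PAM occurrence.

    Only the given strand is scanned; a full search would also scan the
    reverse complement. Indices are 0-based positions in `dna`.
    """
    dna = dna.upper()
    pam = pam.upper()
    hits = []
    pam_start = dna.find(pam)
    while pam_start != -1:
        proto_start = pam_start + len(pam)
        protospacer = dna[proto_start:proto_start + protospacer_len]
        # Keep only full-length protospacers (skip PAMs too close to the end).
        if len(protospacer) == protospacer_len:
            hits.append((proto_start, protospacer))
        # Advance by one so that overlapping PAM occurrences are also found.
        pam_start = dna.find(pam, pam_start + 1)
    return hits


toy_dna = "AATTTCGATTACAGATTACAGGTTCC"  # hypothetical sequence
print(find_targets_with_5prime_pam(toy_dna, "TTTC", 10))
# -> [(6, 'GATTACAGAT')]
```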

[0218] In some cases, the target sequence or target nucleic acid can include multiple disease-related genes and polynucleotides and signal transduction biochemical pathway-related genes and polynucleotides. Non-limiting examples of such target sequences or target nucleic acids include those listed in U.S. Provisional Patent Applications 61 / 736,527 and 61 / 748,427, filed on December 12, 2012 and January 2, 2013, respectively, and International Application No. PCT / US2013 / 074667, filed on December 12, 2013, all of which are incorporated herein by reference.

[0219] In some cases, examples of target sequences or target nucleic acids include sequences associated with signal transduction biochemical pathways, such as signal transduction biochemical pathway-related genes or polynucleotides. Examples of target polynucleotides include disease-associated genes or polynucleotides. "Disease-associated" genes or polynucleotides refer to any genes or polynucleotides that produce transcriptional or translational products at abnormal levels or in abnormal forms in cells derived from tissues affected by the disease, compared to tissues or cells of non-disease controls. In cases where the altered expression is associated with the emergence and / or progression of the disease, it can be a gene that is expressed at an abnormally high level; alternatively, it can be a gene that is expressed at an abnormally low level. Disease-associated genes also refer to genes with one or more mutations or genetic variations that are directly responsible for, or are in linkage disequilibrium with one or more genes responsible for, the etiology of the disease. The transcribed or translated products can be known or unknown and can be at normal or abnormal levels.

[0220] As used herein, the term "wild type" or "native" has the meaning commonly understood by those skilled in the art to refer to the typical form of an organism, strain, gene, or characteristic as it occurs in nature, as distinguished from mutant or variant forms, which can be isolated from a source in nature and has not been intentionally modified by man.

[0221] As used herein, the term "non-naturally occurring" means the involvement of human effort. When these terms are used to describe a nucleic acid molecule or polypeptide, they mean that the nucleic acid molecule or polypeptide is at least substantially separated from at least one other component with which it is found in nature or associated as found in nature.

[0222] As used herein, the term "homolog" has the meaning commonly understood by those skilled in the art. As further guidance, a "homolog" of a protein as described herein refers to a protein belonging to a different species that performs the same or similar function as the protein to which it is a homolog.

[0223] As used herein, the term "identity" refers to the match between two polypeptides or between two nucleic acids. When a position in both sequences being compared is occupied by the same base or amino acid monomer subunit (e.g., a position in each of the two DNA molecules is occupied by adenine, or a position in each of the two polypeptides is occupied by lysine), then the molecules are identical at that position. The "percent identity" between two sequences is the number of matching positions shared by the two sequences divided by the number of positions compared, multiplied by 100. For example, if 6 out of 10 positions in two sequences match, then the two sequences have 60% identity. For example, the DNA sequences CTGACT and CAGGTT share 50% identity (3 out of 6 positions match). Typically, two sequences are compared when they are aligned for maximum identity. Such an alignment can be achieved, for example, by using the method of Needleman et al. (1970) J. Mol. Biol. 48:443-453, which can be conveniently performed using a computer program such as the Align program (DNAstar, Inc.). The percent identity between two amino acid sequences can also be determined using the algorithm of E. Meyers and W. Miller (Comput. Appl Biosci., 4:11-17 (1988)), which has been incorporated into the ALIGN program (version 2.0), using a PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of 4. In addition, the percent identity between two amino acid sequences can be determined using the Needleman and Wunsch (J Mol Biol. 48:443-453 (1970)) algorithm, which has been incorporated into the GAP program in the GCG software package (available at www.gcg.com), using a BLOSUM62 matrix or a PAM250 matrix and a gap weight of 16, 14, 12, 10, 8, 6, or 4 and a length weight of 1, 2, 3, 4, 5, or 6.
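The percent identity calculation defined above can be reproduced with a minimal Python sketch. It assumes the two sequences have already been aligned and have the same length; it does not perform the alignment itself and does not treat gap characters specially. The function name is hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Computed as (matching positions / positions compared) x 100.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# The worked example from the text: 3 of 6 positions match.
print(percent_identity("CTGACT", "CAGGTT"))  # -> 50.0
```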

[0224] As used herein, the expression "corresponding to position ... of SEQ ID NO: 1" refers to the fragment at the equivalent position in the compared sequences when the sequences are optimally aligned, i.e., when the sequences are aligned to obtain the highest percentage identity.

[0225] As used herein, the term "vector" refers to a nucleic acid delivery vehicle into which a polynucleotide can be inserted. When a vector is capable of expressing a protein encoded by the inserted polynucleotide, the vector is called an expression vector. A vector can be introduced into a host cell by transformation, transduction, or transfection, so that the genetic material elements it carries are expressed in the host cell. Vectors are well known to those skilled in the art and include, but are not limited to, plasmids, artificial chromosomes (such as yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs), or P1-derived artificial chromosomes (PACs)), bacteriophages (such as lambda phage or M13 phage), and animal viruses. Animal viruses that can be used as vectors include, but are not limited to, retroviruses (including lentiviruses), adenoviruses, adeno-associated viruses, herpes viruses (such as herpes simplex virus), poxviruses, baculoviruses, papillomaviruses, and papovaviruses (such as SV40). A vector can contain a variety of elements that control expression, including, but not limited to, promoter sequences, transcription initiation sequences, enhancer sequences, selection elements, and reporter genes. In addition, a vector may also contain a replication initiation site.

[0226] As used herein, the term "host cell" refers to a cell into which a vector can be introduced, including but not limited to prokaryotic cells such as Escherichia coli or Bacillus subtilis, fungal cells such as yeast cells or Aspergillus, insect cells such as S2 Drosophila cells or Sf9, or animal cells such as fibroblasts, CHO cells, COS cells, NS0 cells, HeLa cells, BHK cells, HEK 293 cells or human cells.

[0227] Those skilled in the art will appreciate that the design of the expression vector may depend on factors such as the choice of the host cell to be transformed, the desired expression level, etc. A vector can be introduced into a host cell to thereby produce a transcript, protein, or peptide, including a protein, fusion protein, isolated nucleic acid molecule, etc. as described herein (e.g., a CRISPR transcript, such as a nucleic acid transcript, protein, or enzyme).

[0228] As used herein, the term "regulatory element" is intended to include promoters, enhancers, internal ribosome entry sites (IRES), and other expression control elements (e.g., transcription termination signals, such as polyadenylation signals and poly-U sequences), which are described in detail in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, CA (1990). In some cases, regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cells and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). Tissue-specific promoters can primarily direct expression in a desired tissue of interest, such as muscle, neurons, bone, skin, blood, specific organs (e.g., liver, pancreas), or specific cell types (e.g., lymphocytes). In some cases, regulatory elements can also direct expression in a timing-dependent manner (e.g., in a cell cycle-dependent or developmental stage-dependent manner), which may or may not be tissue- or cell-type-specific. In some cases, the term "regulatory element" encompasses enhancer elements such as WPRE, the CMV enhancer, the R-U5' segment in the LTR of HTLV-I (Mol. Cell. Biol., Vol. 8(1), pp. 466-472, 1988), the SV40 enhancer, and the intron sequence between exons 2 and 3 of rabbit β-globin (Proc. Natl. Acad. Sci. USA., Vol. 78(3), pp. 1527-31, 1981).

[0229] As used herein, the term "promoter" has a meaning well known to those skilled in the art and refers to a non-coding nucleotide sequence located upstream of a gene that can initiate expression of a downstream gene. A constitutive promoter is a nucleotide sequence that, when operably linked to a polynucleotide encoding or defining a gene product, results in the production of the gene product in the cell under most or all physiological conditions of the cell. An inducible promoter is a nucleotide sequence that, when operably linked to a polynucleotide encoding or defining a gene product, results in the production of the gene product in the cell essentially only when an inducer corresponding to the promoter is present in the cell. A tissue-specific promoter is a nucleotide sequence that, when operably linked to a polynucleotide encoding or defining a gene product, results in the production of the gene product in the cell essentially only when the cell is a cell of the tissue type corresponding to the promoter.

[0230] As used herein, the term "operably linked" is intended to mean that the nucleotide sequence of interest is linked to the one or more regulatory elements in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription / translation system or in a host cell when the vector is introduced into the host cell).

[0231] As used herein, the term "complementarity" refers to the ability of a nucleic acid to form one or more hydrogen bonds with another nucleic acid sequence by means of traditional Watson-Crick or other non-traditional types. Percent complementarity represents the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (e.g., Watson-Crick base pairing) with another nucleic acid sequence (e.g., 5, 6, 7, 8, 9, 10 out of 10 is 50%, 60%, 70%, 80%, 90%, and 100% complementary). "Complete complementarity" means that all consecutive residues of a nucleic acid sequence form hydrogen bonds with the same number of consecutive residues in another nucleic acid sequence. As used herein, "substantially complementary" refers to a degree of complementarity that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50 or more nucleotides, or to two nucleic acids that hybridize under stringent conditions.
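The percent complementarity defined above can be sketched in Python as follows. This is a minimal illustration that counts only Watson-Crick pairs (A-U, A-T, G-C) between two equal-length strands paired in antiparallel orientation; non-traditional pairs such as G-U wobble are deliberately ignored, and the function name and toy sequences are hypothetical.

```python
# Watson-Crick pairs, covering RNA:DNA and RNA:RNA / DNA:DNA combinations.
WATSON_CRICK = {
    ("A", "U"), ("U", "A"), ("A", "T"), ("T", "A"),
    ("G", "C"), ("C", "G"),
}


def percent_complementarity(guide: str, target: str) -> float:
    """Percent of guide residues that pair with the target.

    Both sequences are written 5'->3'; the target is reversed so that the
    strands are compared in antiparallel orientation.
    """
    guide, target = guide.upper(), target.upper()
    if len(guide) != len(target) or not guide:
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum((g, t) in WATSON_CRICK for g, t in zip(guide, reversed(target)))
    return 100.0 * paired / len(guide)


guide_rna = "AUGCAUGCAU"                                  # toy 10-nt guide
print(percent_complementarity(guide_rna, "ATGCATGCAT"))  # -> 100.0
print(percent_complementarity(guide_rna, "ATGCATGCAA"))  # -> 90.0
```

The second call shows the "9 out of 10 is 90%" case from the definition: a single mismatch at one end of the duplex lowers the value by ten percentage points.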

[0232] As used herein, "stringent conditions" for hybridization refer to conditions under which a nucleic acid having complementarity to a target sequence predominantly hybridizes with the target sequence and substantially does not hybridize to non-target sequences. Stringent conditions are usually sequence-dependent and vary depending on many factors. Generally speaking, the longer the sequence, the higher the temperature at which the sequence specifically hybridizes to its target sequence. Non-limiting examples of stringent conditions are described in "Laboratory Techniques In Biochemistry And Molecular Biology - Hybridization With Nucleic Acid Probes" by Tijssen (1993), Part I, Chapter II, "Overview of hybridization principles and nucleic acid probe analysis strategy", Elsevier, New York.

[0233] As used herein, the term "hybridization" refers to a reaction in which one or more polynucleotides react to form a complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues. Hydrogen bonding can occur by means of Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner. The complex can comprise two chains forming a duplex, three or more chains forming a multi-chain complex, a single self-hybridizing chain, or any combination thereof. A hybridization reaction can constitute a step in a broader process (such as the beginning of PCR or the cutting of a polynucleotide via an enzyme). A sequence that can hybridize with a given sequence is referred to as the "complement" of the given sequence.

[0234] As used herein, the term "expression" refers to the process of transcription from a DNA template into a polynucleotide (e.g., into mRNA or other RNA transcripts) and / or the process of subsequent translation of the transcribed mRNA into a peptide, polypeptide, or protein. The transcript and the encoded polypeptide may be collectively referred to as a "gene product." If the polynucleotide is derived from genomic DNA, expression may include splicing of the mRNA in a eukaryotic cell.

[0235] As used herein, the term "linker" refers to a linear polypeptide formed by connecting multiple amino acid residues via peptide bonds. The linker of the present invention can be an artificially synthesized amino acid sequence, or a naturally occurring polypeptide sequence, such as a polypeptide having a hinge region function. Such linker polypeptides are well known in the art (see, for example, Holliger, P. et al. (1993) Proc. Natl. Acad. Sci. USA 90: 6444-6448; Poljak, RJ et al. (1994) Structure 2: 1121-1123).

[0236] As used herein, the term "treat" refers to treating or curing a disorder, delaying the onset of symptoms of a disorder, and / or delaying the progression of a disorder.

[0237] As used herein, the term "subject" includes, but is not limited to, various animals, such as mammals, such as bovines, equines, ovines, porcines, canines, felines, lagomorphs, rodents (e.g., mice or rats), non-human primates (e.g., macaques or cynomolgus monkeys), or humans. In certain embodiments, the subject (e.g., human) suffers from a disorder (e.g., a disorder caused by a disease-associated gene defect).

[0238] Advantageous Effects of the Invention

[0239] Compared with the prior art, the Cas protein C2c11 and system of the present invention have significant differences in protein size, primary structure (e.g., motif), secondary structure (e.g., α-helix structure, β-sheet structure) and / or functional domain (e.g., containing a zinc finger domain, containing a RuvC domain, and not containing an HNH domain).

[0240] In addition, the Cas protein C2c11 and system of the present invention have significant advantages. First, the Cas effector protein of the present invention has cis-cleavage activity (e.g., it can cut double-stranded and single-stranded target nucleic acids) and trans-cleavage activity (e.g., it can achieve detection of target nucleic acids). Secondly, the molecular size of the Cas effector protein of the present invention is significantly smaller than that of Cas9 and Cas12a proteins, so the transfection efficiency is significantly better than that of Cas9 and Cas12a proteins. Further, the editing activity of the Cas effector protein of the present invention is higher.

[0241] The embodiments of the present invention will be described in detail below with reference to the accompanying drawings and examples, but it will be understood by those skilled in the art that the following drawings and examples are intended only to illustrate the present invention and are not intended to limit the scope of the invention. Various objects and advantages of the present invention will become apparent to those skilled in the art based on the following detailed description of the accompanying drawings and preferred embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

[0242] Figure 1 shows the structure of the C2c11 protein. Figure 1A shows the predicted three-dimensional structure of the C2c11 protein; Figure 1B compares the structural domains of C2c11 with those of the closely related C2C9 and C2C10 systems; and Figure 1C compares the key zinc ion coordination residues in the zinc finger domains of C2c11 and Cas12f1 (C2C10) (left: Cas12f1 (C2C10), purple-red; right: C2c11, green).

[0243] Figure 2 is a plasmid map constructed based on pET28a(+).

[0244] Figure 3 is a map of the C2c11 family protein sequence alignment and clustering.

[0245] Figure 4 shows the results of C2c11 protein purification, where Figure 4A shows the SDS-PAGE results of elution with C2c11-A4 molecular sieve; Figure 4B shows the SDS-PAGE results of elution with C2c11-A5 molecular sieve; Figure 4C shows the SDS-PAGE results of elution with C2c11-L1 molecular sieve; Figure 4D shows the SDS-PAGE results of elution with C2c11-L2 molecular sieve; Figure 4E shows the SDS-PAGE results of elution with C2c11-B2 molecular sieve; and Figure 4F shows the SDS-PAGE results of elution with C2c11-B9 molecular sieve.

[0246] Figure 5 shows the electrophoresis results of the products obtained after the CRISPR-C2c11 system cuts single-stranded DNA.

[0247] Figure 6 shows the fluorescence signal generated after the CRISPR-C2c11 system cuts the targeted single-stranded DNA, where Figure 6A is the fluorescence curve of C2c11-L2 and Figure 6B is the fluorescence curve of C2c11-L3; the abscissa is time (in minutes) and the ordinate is the fluorescence value.

[0248] Figure 7 shows the electrophoresis results of the products obtained after double-stranded DNA cleavage by the CRISPR-C2c11 system; wherein, the substrate used in Figure 7A is the 7N PAM library and target sequence, the substrate used in Figure 7B is the 5'TTCC PAM sequence and target sequence, and the substrate used in Figure 7C is the 5'TTTC PAM sequence and target sequence.

[0249] Figure 8 shows the conserved, consistent amino acid sites in the zinc finger domain of the C2c11 protein.

[0250] Figure 9 is an exemplary structural diagram of the guide RNA of the present application. Wherein, the guide RNA comprises in sequence from 5' end to 3' end: the non-complementary repeat sequence of tracrRNA (the portion where the red segment forms a stem-loop structure in Figure 9), the complementary repeat sequence of tracrRNA (the portion where the red segment is complementary to the blue segment in Figure 9), the linker sequence (linker in Figure 9), the repeat sequence of crRNA (the portion where the blue segment is complementary to the red segment in Figure 9), the guide sequence of crRNA (the yellow segment in Figure 9). In such exemplary embodiments, the non-complementary repeat sequence of tracrRNA, the complementary repeat sequence of tracrRNA, the linker sequence, and the repeat sequence of crRNA serve together as the backbone sequence of the guide RNA, and the guide sequence of crRNA serves as the guide sequence of the guide RNA.

[0251] Figure 10 is a map of pX458-C2c11, the plasmid used for C2c11 editing in cells.

[0252] Figure 11 is the electrophoresis diagram of T7E1 enzyme digestion of AAVS1 gene editing products.

[0253] Figure 12 is a map of the CMV-BFP-P2A-mcherry plasmid.

[0254] Figure 13 is the electrophoresis diagram of T7E1 enzyme digestion of mCherry gene editing products.

[0255] Sequence information

[0256] Information on the partial sequences involved in the present invention is provided in Table 1 below.

[0257] Table 1: Description of sequences

Note: 1. N in a nucleic acid sequence can be any of A, U, C or G.

[0258] 2. The italicized and underlined sequence in SEQ ID NO: 101 is the target sequence in the targeting chain.

[0259] 3. The italicized and bold sequence in SEQ ID NO: 108 is the L1-gRNA coding sequence, the bold and underlined sequence is the sequence of the genomic region targeted by the gRNA, and the italicized and underlined sequence is the human codon-optimized C2c11L1 protein sequence.

DETAILED DESCRIPTION

[0260] The invention will now be described with reference to the following examples which are intended to illustrate the invention but not to limit it.

[0261] Unless otherwise specified, the experiments and methods described in the embodiments are carried out substantially according to conventional methods well known in the art and described in various references. For example, conventional techniques such as immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and recombinant DNA used in the present invention can be found in Sambrook, Fritsch and Maniatis, MOLECULAR CLONING: A LABORATORY MANUAL, 2nd edition (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (F. M. Ausubel et al., eds. (1987)); the METHODS IN ENZYMOLOGY series (Academic Press, Inc.): PCR 2: A PRACTICAL APPROACH (M. J. MacPherson, B. D. Hames and G. R. Taylor, eds. (1995)); Harlow and Lane, eds. (1988) ANTIBODIES, A LABORATORY MANUAL; and ANIMAL CELL CULTURE (R. I. Freshney, ed. (1987)).

[0262] In addition, if specific conditions are not specified in the examples, the experiments were performed under conventional conditions or the conditions recommended by the manufacturer. If the manufacturer of the reagents or instruments is not specified, they are all conventional products that can be obtained commercially. It is understood that the examples describe the present invention by way of example and are not intended to limit the scope of the present invention. All publications and other references mentioned herein are incorporated herein by reference in their entirety.

[0263] Example 1. Acquisition of nuclease

[0264] Using the bioinformatics prediction process disclosed in the patent "A method and apparatus for screening novel CRISPR-Cas systems" (application number: 201610741844.0), a new family of Cas12f proteins was discovered in multiple strains (see Table 2 for details), which was named C2c11. The three-dimensional structure of this family of proteins was subsequently predicted using AlphaFold2 (Figure 1A). It was found that compared to previously reported Cas12f proteins, the zinc finger domain of this family of proteins does not have the key zinc ion coordination residue CXXC (Figure 1B), where C is cysteine (Cys) and X is any amino acid.
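The two sequence-level criteria mentioned above (a length below 700 amino acids and the absence of CXXC from the zinc finger domain) can be checked with a minimal Python sketch. This is only an illustration and is not the screening process of the cited patent; the function names are hypothetical, the zinc finger domain is assumed to be supplied as a separate subsequence whose boundaries were determined elsewhere, and the toy peptides are not sequences of the present application.

```python
import re

# Lookahead so that overlapping occurrences (e.g. "CAACAAC") are all reported.
CXXC_PATTERN = re.compile(r"(?=(C[A-Z]{2}C))")


def find_cxxc(sequence: str):
    """Return (0-based start, matched motif) for every Cys-X-X-Cys occurrence."""
    return [(m.start(), m.group(1)) for m in CXXC_PATTERN.finditer(sequence.upper())]


def fits_size_and_motif_criteria(protein: str, zinc_finger_domain: str,
                                 max_length: int = 700) -> bool:
    """True if the protein is shorter than max_length and its zinc finger
    domain contains no CXXC motif."""
    return len(protein) < max_length and not find_cxxc(zinc_finger_domain)


print(find_cxxc("MKCPECGA"))   # -> [(2, 'CPEC')]
print(find_cxxc("MKHPEHGA"))   # -> []
```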

[0265] By combining previous and optimized molecular experimental techniques, 30 new C2c11 nucleases with potential gene editing functions were obtained.

[0266] The amino acid sequences and nucleotide sequences of these nucleases are shown in Table 1 , and the sequence sources are shown in Table 2 .

[0267] Table 2. Nuclease sequence sources

[0268] Further clustering and multiple sequence alignment of the C2c11 protein sequences, along with the construction of a phylogenetic tree, revealed that the 30 sequences formed three major branches, designated subfamilies A, B, and L. Specifically, C2c11-A1 to A5 belonged to subfamily A, C2c11-B1 to B12 to subfamily B, and C2c11-L1 to L11, as well as Q4 and Q5, to subfamily L (as shown in Figure 3). The amino acid sequence similarity within each subfamily ranged from 0.41 to 0.72, while the similarity between subfamilies ranged from 0.29 to 0.41.
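A comparison of within-subfamily and between-subfamily similarity of the kind reported above can be sketched in Python as follows. The source does not state which similarity metric was used; this sketch uses the fraction of identical positions in pre-aligned sequences as a stand-in, and the sequence names, subfamily labels and toy sequences are hypothetical.

```python
from itertools import combinations


def fraction_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical positions in two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def similarity_ranges(aligned: dict, subfamily: dict):
    """Return ((min, max) within subfamilies, (min, max) between subfamilies)."""
    within, between = [], []
    for name_1, name_2 in combinations(aligned, 2):
        score = fraction_identity(aligned[name_1], aligned[name_2])
        same_group = subfamily[name_1] == subfamily[name_2]
        (within if same_group else between).append(score)
    return (min(within), max(within)), (min(between), max(between))


toy_alignment = {
    "A1": "AAAAAAAAAA", "A2": "AAAAAAACCC",
    "B1": "AAAGGGGGGG", "B2": "AAAGGGGTTT",
}
toy_labels = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}
print(similarity_ranges(toy_alignment, toy_labels))
# -> ((0.7, 0.7), (0.3, 0.3))
```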

[0269] Subsequently, sequences such as A4, A5, L1, L2, B2, B8, and B9 were randomly selected from each subfamily for in vivo and in vitro activity analysis and identification experiments.

[0270] Example 2. Expression and purification of effector proteins of the CRISPR-C2c11 system

[0271] Before protein expression and purification, the physicochemical properties of the protein, including isoelectric point, relative molecular mass, and extinction coefficient, were analyzed based on the protein sequence using the ProtParam tool provided by ExPASy (https://web.expasy.org/protparam/) to adjust the purification process and buffer.
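One of the properties listed above, the extinction coefficient, can be estimated directly from the sequence. The following minimal Python sketch uses the commonly applied residue values at 280 nm (5500 for Trp, 1490 for Tyr, and 125 per cystine, in M^-1 cm^-1); it is an independent illustration rather than the ProtParam implementation, and the function name and toy peptides are hypothetical.

```python
def extinction_coefficient_280(protein: str, cystines: int = 0) -> int:
    """Estimate the molar extinction coefficient at 280 nm (M^-1 cm^-1).

    `cystines` is the number of disulfide-bonded Cys pairs; use 0 when all
    cysteines are assumed to be reduced.
    """
    seq = protein.upper()
    if cystines < 0 or 2 * cystines > seq.count("C"):
        raise ValueError("cystine count is inconsistent with the number of Cys residues")
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines


print(extinction_coefficient_280("MWWYKC"))            # -> 12490
print(extinction_coefficient_280("CWYC", cystines=1))  # -> 7115
```

Together with the relative molecular mass, this value allows a protein concentration to be estimated from an absorbance reading at 280 nm.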

[0272] Plasmid construction: This vector construction strategy is based on the pET28a(+) prokaryotic expression vector and uses λ phage T7 RNA polymerase to express the target protein. The nucleotide sequence of the nuclease was codon-optimized, sequenced, and inserted into the pET28a(+) multiple cloning site. The optimized nucleotide sequence of the nuclease contains the SV40 NLS signal upstream and the thrombin cleavage site and 10×His affinity tag downstream. In addition, the JM23119 promoter and the transcription template of the C2c11 guide RNA were inserted downstream of the protein sequence. The transcribed RNA sequences are SEQ ID NO: 63, SEQ ID NO: 65, SEQ ID NO: 67, and SEQ ID NO: 72-98, each connected to the targeting sequence (SEQ ID NO: 99). This plasmid is used for prokaryotic expression and protein-RNA co-purification. The sequence was submitted to Beijing Liuhe BGI Gene Technology Co., Ltd. for plasmid synthesis. The constructed plasmid map is shown in Figure 2.

[0273] The plasmid was introduced into competent Rosetta (DE3) cells (Takara) by heat shock. 300 μL of antibiotic-free medium was added, and the cells were cultured for 60 minutes. The cells were plated on LB plates containing kanamycin and cultured overnight at 37°C. Single colonies were picked into 50 mL of LB liquid medium (kanamycin 50 μg/mL, chloramphenicol 34 μg/mL) and cultured overnight at 37°C and 200 rpm. The overnight culture was inoculated at a 1% ratio into a 5 L shake flask containing 2 L of LB medium with both kanamycin and chloramphenicol, and the cells were cultured at 37°C and 200 rpm until the OD600 reached approximately 0.6-0.8. The shake flask was then placed in an ice bath for at least 30 minutes. After the cold shock, isopropyl β-D-thiogalactopyranoside (IPTG) was added to a final concentration of 0.5 mM, and expression was induced overnight at 16°C and 200 rpm. The culture was centrifuged at 6000 rpm and 4°C for 10 min, and the cell pellet was collected and frozen at -20°C for later use.
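
The volumes implied by this step can be checked with the dilution relation C1·V1 = C2·V2. The sketch below works out the 1% inoculum for a 2 L culture and the IPTG volume needed for a final concentration of 0.5 mM. The 1 M IPTG stock concentration is an assumption (the text does not state it), and the small volume added by the stock itself is ignored.

```python
# Sketch: inoculum and inducer volumes for the expression culture.
# The 1 M IPTG stock is an assumed value, not stated in the protocol.

def stock_volume_ml(final_conc: float, final_volume_ml: float,
                    stock_conc: float) -> float:
    """Solve C1*V1 = C2*V2 for V1; both concentrations in the same unit."""
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the target")
    return final_conc * final_volume_ml / stock_conc


culture_ml = 2000.0                      # 2 L of LB in a 5 L shake flask
inoculum_ml = 0.01 * culture_ml          # 1% transfer ratio
iptg_ml = stock_volume_ml(0.5, culture_ml, 1000.0)  # 0.5 mM final, 1000 mM stock

print(f"inoculum: {inoculum_ml:.1f} mL, IPTG stock: {iptg_ml:.2f} mL")
```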

[0274] The collected cells were resuspended in a 500 mL beaker at a ratio of 1 g of cells to 10 mL of buffer A (20 mM Tris-HCl (pH 8.0, 25°C), 20 mM imidazole, 1 M NaCl, 1 mM PMSF, 10% glycerol) and incubated on ice for 10 min. The cells were lysed by sonication. The lysate was centrifuged at 16,000 rpm and 4°C for 45 min, and the supernatant was collected.

[0275] Because the target protein carries a His tag, it was first purified by affinity chromatography on a Ni-NTA gravity column. 2 mL of Ni-NTA resin was loaded onto the gravity column (1 mL column volume). After the storage solution had drained, 10 column volumes of buffer A were added to equilibrate the resin. The equilibrated resin was mixed with the centrifuged supernatant and incubated at 4°C for 1 hour. The mixture was then passed through the gravity column, and the flow-through was passed through the column once more. Contaminants were washed off with 30 mL of buffer B (20 mM Tris-HCl (pH 8.0, 25°C), 50 mM imidazole, 1 M NaCl, 10% glycerol), followed by 10 mL of buffer C (20 mM Tris-HCl (pH 8.0, 25°C), 100 mM imidazole, 1 M NaCl, 10% glycerol). The target protein was eluted with buffer D (20 mM Tris-HCl (pH 8.0, 25°C), 1000 mM imidazole, 1 M NaCl, 10% glycerol) in 1 mL fractions. The eluted fractions were collected and their protein concentration was determined. Fractions with higher concentrations were selected for subsequent SEC purification.

[0276] Gel filtration chromatography: A Superdex 200 Increase (10/300) column was used according to the AKTA operating procedure. The working pump and system were rinsed with filtered and degassed Milli-Q water, and the Superdex 200 Increase (10/300) column was connected at a flow rate of 0.3 mL/min. The column was then rinsed with 1 column volume (CV) of Milli-Q water. The A1 pump inlet was placed in SEC buffer (20 mM Tris-HCl (pH 8.0, 25°C), 1 M NaCl, 1 mM DTT, 10% glycerol), and the A1 pump and system were rinsed. The column was equilibrated with 1 CV of SEC buffer at a flow rate of 0.4 mL/min. The liquid outlet was set to the Outlet 1 line until the baseline was stable, and the baseline was then zeroed. 2 mL of the sample with higher concentration and purity was loaded through a 2 mL sample loop at a flow rate of 0.4 mL/min, and the liquid outlet was set to Fraction. After 8 mL had flowed through, fractions of 0.5 mL per tube were collected. After elution, the column was rinsed sequentially with 1 CV of 0.5 M NaOH and 2 CV of Milli-Q water, and then with a 20% ethanol-0.2 M sodium acetate solution, in which it was stored. Fractions under the main elution peak that showed UV absorption were sampled for SDS-PAGE to confirm which fractions contained the target protein. Fractions of good purity identified by SDS-PAGE were pooled and concentrated using a 30K ultrafiltration tube, and the protein concentration was determined using a protein quantification kit (Invitrogen). The concentrated protein was aliquoted, snap-frozen in liquid nitrogen, and stored at -80°C.
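
Because fraction collection starts after 8 mL and proceeds in 0.5 mL tubes, a position on the chromatogram can be mapped to a tube number with simple arithmetic. The sketch below shows one way to do this; the function names and the example peak position of 12.3 mL are hypothetical and serve only to illustrate the bookkeeping.

```python
# Sketch: map an elution volume on the chromatogram to a fraction tube.
# Defaults follow the text (collection from 8 mL, 0.5 mL per tube);
# the example peak position is hypothetical.
from typing import Optional, Tuple


def fraction_tube(elution_ml: float, start_ml: float = 8.0,
                  tube_ml: float = 0.5) -> Optional[int]:
    """Return the 1-based tube number, or None before collection starts."""
    if elution_ml < start_ml:
        return None
    return int((elution_ml - start_ml) // tube_ml) + 1


def tube_range(tube: int, start_ml: float = 8.0,
               tube_ml: float = 0.5) -> Tuple[float, float]:
    """Return the (start, end) elution volume covered by a tube."""
    if tube < 1:
        raise ValueError("tube numbers start at 1")
    lo = start_ml + (tube - 1) * tube_ml
    return lo, lo + tube_ml


peak_tube = fraction_tube(12.3)   # hypothetical peak apex at 12.3 mL
```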

[0277] Through the above experiments, purified C2c11-gRNA ribonucleoprotein complexes (RNPs) were successfully obtained. Exemplary SDS-PAGE results are shown in Figures 4A-4F. Each SDS-PAGE image shows the result of protein purification by gel filtration chromatography, and the numbers in the figures indicate the order in which the fractions eluted from the chromatography column. Figures 4A to 4F show the purification results of C2c11-A4, C2c11-A5, C2c11-L1, C2c11-L2, C2c11-B2, and C2c11-B9, respectively. The position of the target protein is indicated by an arrow. Bands that do not match the expected size are contaminating proteins that eluted during purification.

[0278] Example 3. Editing activity of the CRISPR-C2c11 system on single-stranded DNA

[0279] Perform in vitro cleavage experiments with the CRISPR-C2c11 system according to the following protocol:

[0280] 1. Using double-stranded DNA synthesis, a transcription template was obtained by connecting the backbone sequence (SEQ ID NO: 63, SEQ ID NO: 65, SEQ ID NO: 67, SEQ ID NO: 72-98) to the targeting sequence (SEQ ID NO: 99), and gRNA was obtained by in vitro transcription. Exemplary gRNAs are shown below: C2c11-L1-gRNA (SEQ ID NO: 62), C2c11-L2-gRNA (SEQ ID NO: 64), and C2c11-L3-gRNA (SEQ ID NO: 66). In these exemplary sequences, the italicized and underlined bases are the targeting sequences.

[0281] 2. The above gRNA transcription DNA template was amplified by PCR. Following the Thermo Fisher MEGAshortscript™ Kit protocol, the T7 transcription system was prepared as follows: T7 1 μL, 10× T7 buffer 1 μL, UTP 1 μL, ATP 1 μL, GTP 1 μL, CTP 1 μL, ddH2O 2 μL, and DNA 50-200 ng. The reaction was incubated at 37°C for more than 12 hours, extracted with phenol-chloroform, and precipitated with isopropanol to recover the RNA product. The concentration of the product was determined using the Qubit™ RNA BR Assay Kit, and the product was diluted to a final concentration of 5 μM with nuclease-free water.

[0282] 3. 5'FAM-labeled ssDNA substrate 1 (SEQ ID NO: 101) was synthesized by Beijing Liuhe BGI Genomics Co., Ltd., diluted to 5 μM with Nuclease-Free water, and stored at -20°C.

[0283] 4. 2 μL of 10× reaction buffer (25 mM Tris-HCl, pH 7.5, 200 mM NaCl, 100 mM MgCl2, 25 mM DTT), 2 μL of 5 μM protein obtained in Example 2, and 2 μL of 5 μM RNA product obtained in step 2 were mixed, nuclease-free water was added to 18 μL, and the mixture was shaken and mixed at room temperature for 15 minutes. 2 μL of ssDNA substrate 1 obtained in step 3 was added and incubated at 37°C for 4 hours. Cas12f1 (MeiGe Bio, C015S) was used as the standard positive sample. CRISPR-C2c11 and CRISPR-Cas12f1 cleaved ssDNA substrate 1.
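
For reference, the final concentrations in the 20 μL cleavage reaction follow directly from the stock concentrations and volumes given in this step. The sketch below is a worked check of that arithmetic; the variable names are illustrative. Note that during the 15-minute pre-incubation the volume is 18 μL, so the values below apply only after the substrate has been added.

```python
# Sketch: final concentrations in the 20 uL ssDNA cleavage reaction,
# computed from the stock concentrations and volumes stated in step 4.

TOTAL_UL = 20.0


def final_conc(stock_conc: float, volume_ul: float,
               total_ul: float = TOTAL_UL) -> float:
    """Dilution: final = stock * (added volume / total volume)."""
    return stock_conc * volume_ul / total_ul


# 10x reaction buffer components, in mM
buffer_10x_mM = {"Tris-HCl": 25.0, "NaCl": 200.0, "MgCl2": 100.0, "DTT": 25.0}
buffer_final_mM = {name: final_conc(c, 2.0) for name, c in buffer_10x_mM.items()}

protein_uM = final_conc(5.0, 2.0)    # 2 uL of 5 uM protein
grna_uM = final_conc(5.0, 2.0)       # 2 uL of 5 uM gRNA
substrate_uM = final_conc(5.0, 2.0)  # 2 uL of 5 uM ssDNA substrate 1

for name, conc in buffer_final_mM.items():
    print(f"{name}: {conc} mM")
print(f"protein {protein_uM} uM, gRNA {grna_uM} uM, ssDNA {substrate_uM} uM")
```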

[0284] 5. Add 1 μL of 10 mg / mL RNase A to the reaction mixture in step 4 and incubate at 37°C for 15 min. Then, add 1 μL of 50 mg / mL Proteinase K and incubate at 37°C for 15 min.

[0285] 6. Take 11 μL of the above reaction product, mix it with 2 μL of 10× loading buffer, and run it on a 10% TBU-PAGE gel at 150V for 30 min. Use a Bio-lab gel imaging instrument and Image Lab software to take pictures and analyze the images.

[0286] The above experiments confirmed that the CRISPR-C2c11 system has editing activity on single-stranded DNA. The exemplary results of CRISPR-C2c11-L2 are shown in Figure 5. Compared with the control group, the brightness of the single-stranded DNA fluorescence band was significantly reduced after the action of C2c11 protein and gRNA, demonstrating that the system has editing activity on single-stranded DNA.

[0287] Example 4. Editing activity (trans-cleavage activity) of the CRISPR-C2c11 system on the reporter fluorescent substrate ssDNA

[0288] To test whether the C2c11 protein possesses target-activated trans-cleavage activity, an ssDNA substrate labeled with a FAM fluorophore at the 5' end and a BHQ1 quencher at the 3' end was used as a probe. When the probe is intact, the fluorophore and quencher are in close proximity, and due to fluorescence resonance energy transfer (FRET), the fluorophore does not fluoresce when excited by incident light. When the ssDNA is cleaved, however, the FAM fluorophore at the 5' end and the quencher at the 3' end separate beyond the FRET range, and the FAM fluorophore then emits fluorescence at its own wavelength when excited by incident light.

[0289] When C2c11 binds to targeted single-stranded DNA, its trans-cleavage activity can be activated, so the targeted single-stranded DNA is used to activate its trans-cleavage activity.

[0290] 1. A reporter fluorescent substrate, ssDNA-FQ, labeled with FAM at the 5' end and BHQ1 at the 3' end, was synthesized by Beijing Liuhe BGI Genomics Co., Ltd. The sequence is shown in SEQ ID NO:100 (5'-FAM-AAAAAA-BHQ1-3'). The substrate was diluted to 25 μM with nuclease-free water and stored at -20°C. The target strand (TS) sequence is shown in SEQ ID NO:101. A reverse-complementary non-target strand (NTS) was synthesized as a negative control; its sequence is shown in SEQ ID NO:68.

[0291] 2. Synthesize the C2c11-gRNA complex in vitro, as specifically shown in Example 3.

[0292] 3. The detection used 100 ng of purified C2c11-gRNA complex, 25 pM ssDNA-FQ, 100 ng of TS strand DNA fragment, and 1× buffer (50 mM Tris-HCl, 10 mM MgCl2, 100 μg / ml BSA (pH 8.0)) in a final volume of 20 μL. The reaction was incubated at 37°C. In the detection system, once the specific Cas protein is activated by the target sequence, the ssDNA-FQ will be cleaved. The fluorescence of the detection reaction was measured using a full-wavelength microplate reader with an excitation wavelength of 485 nm and an emission wavelength of 520 nm. The signal was recorded once every minute for 1 hour. Each reaction was set up in three parallel replicates, incubated at 37°C for 1 hour, and the FAM fluorescence signal value was detected every 5 minutes. The fluorescence value-time relationship reaction curve was plotted.
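
The fluorescence-time curve described here is built from three parallel replicates per condition. A minimal sketch of that data reduction is given below: it averages the replicates at each time point and subtracts a negative control. All numbers are made-up placeholder readings, and the function names are illustrative assumptions rather than part of the original analysis.

```python
# Sketch: reduce triplicate fluorescence readings to a mean curve and
# subtract a negative control. All readings below are hypothetical.
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def summarize(replicates: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Return (mean, standard deviation) per time point across replicates."""
    if len({len(r) for r in replicates}) != 1:
        raise ValueError("all replicates must have the same number of points")
    return [(mean(values), stdev(values)) for values in zip(*replicates)]


def background_subtracted(sample: List[Tuple[float, float]],
                          control: List[Tuple[float, float]]) -> List[float]:
    """Subtract the control mean from the sample mean at each time point."""
    return [s_mean - c_mean for (s_mean, _), (c_mean, _) in zip(sample, control)]


time_min = [0, 5, 10]                                  # hypothetical time points
rnp_with_ts = [[100, 400, 900], [110, 420, 880], [90, 380, 920]]
no_grna = [[100, 105, 110], [100, 105, 110], [100, 105, 110]]

curve = summarize(rnp_with_ts)
net = background_subtracted(curve, summarize(no_grna))
for t, value in zip(time_min, net):
    print(f"{t} min: net fluorescence {value}")
```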

[0293] The above experiments revealed that randomly selected CRISPR-C2c11 systems A4, A5, L1, L2, L3, B2, and B8 all possessed trans-cleavage activity that could be activated by target nucleic acids. The exemplary experimental results for CRISPR-C2c11-L2 and CRISPR-C2c11-L3 are shown in Figures 6A and 6B. Compared to negative controls lacking gRNA (L2-TS / L3-TS) or providing non-targeting strands (L2-NTS / L2-gRNA-NTS / L3-NTS / L3-gRNA-NTS), the fluorescence signal generated by substrate cleavage upon the action of C2c11 protein and gRNA increased significantly over time, demonstrating that the system possessed trans-cleavage activity that could be activated by target nucleic acids.

[0294] Example 5. Editing activity of the CRISPR-C2c11 system on double-stranded DNA

[0295] 1. Double-stranded DNA Substrate Sequence Preparation Method: Using a 7N PAM region library plasmid constructed in our laboratory as a template, we generated substrate plasmids with PAM sequences of TTTC and TTCC through site-directed PCR. PCR products were recovered and purified to obtain linear dsDNA. After Cas cleavage, the dsDNA formed two DNA fragments of approximately 1500 and 1100 bp in length.

[0296] 2. Primer sequences used for PCR: LC_target_F2: SEQ ID NO: 102, LC_target_R2: SEQ ID NO: 103. 7N PAM and target sequence: SEQ ID NO: 104. 5'TTTC PAM sequence and target sequence: SEQ ID NO: 105. 5'TTCC PAM sequence and target sequence: SEQ ID NO: 106.

[0297] 3. The cleavage reaction system contained 200 ng of C2c11-gRNA complex, 100 ng of target DNA fragment, and 1× buffer (50 mM Tris-HCl, 10 mM MgCl2, 100 μg/mL BSA (pH 7.9, 25°C)). The volume was brought to 10 μL, and the reaction was incubated at 37°C overnight and then heated at 95°C for 5-10 minutes to inactivate the nuclease. The cleavage products were then separated by agarose gel electrophoresis.

[0298] 4. The above experiments confirmed that the seven representative CRISPR-C2c11 systems purified to date, namely C2c11-A4, A5, L1, L2, L3, B2, and B9, all have editing activity on double-stranded DNA. Exemplary results are shown in Figure 7. The substrates used in Figure 7A are the 7N PAM library and target sequence, the substrates used in Figure 7B are the 5'TTCC PAM sequence and target sequence, and the substrates used in Figure 7C are the 5'TTTC PAM sequence and target sequence. Compared to the dsDNA cleavage substrate bands, the formation of clear small fragment cleavage products can be observed after processing by the C2c11-gRNA complex. This experiment confirms that the C2c11 system has significant in vitro editing activity on double-stranded DNA.

[0299] Example 6. C2c11 system edits the AAVS1 gene in mammalian cells

[0300] To detect the editing activity of the C2c11 system in eukaryotic cells, the C2c11 system with in vitro activity was tested for in vivo editing activity in the human embryonic kidney cell-derived cell line HEK293T.

[0301] Design of editing plasmids

[0302] For editing in HEK293T cells, the endogenous gene AAVS1 (gene bank ID: AC005782.1) was selected for targeted cleavage verification. The nucleotide sequence of the AAVS1 gene is shown in SEQ ID NO: 107.

[0303] Target site sequences for the AAVS1 gene were designed based on the TC-rich PAM characteristics of the C2c11 family (see Table 3). These include seven C2c11-A4 editing target sites, eight C2c11-L1 target sites, three each of C2c11-B9, C2c11-B11, and C2c11-B12 target sites, and two each of C2c11-L2, C2c11-B2, and C2c11-B8 target sites. The corresponding guide RNA scaffolds are shown in Table 1. Gene-editing plasmids for the corresponding proteins were designed and synthesized. The corresponding plasmid information is shown in Tables 3 and 4.
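
Candidate target sites of this kind can be enumerated by scanning both strands of the gene for a 5' PAM followed by a protospacer. The sketch below illustrates the idea using the two PAMs tested in Example 5 (TTTC and TTCC). The 20-nt spacer length, the function names, and the toy input sequence are assumptions for illustration; the actual guide sequences are those listed in Table 3.

```python
# Sketch: enumerate candidate target sites with a 5' PAM on both strands.
# The spacer length (20 nt) and the toy sequence are assumptions.
from typing import List, Sequence, Tuple

PAMS = ("TTTC", "TTCC")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_target_sites(seq: str, pams: Sequence[str] = PAMS,
                      spacer_len: int = 20) -> List[Tuple[str, int, str, str]]:
    """Return (strand, PAM start index on that strand, PAM, spacer) tuples.

    For the "-" strand, the index refers to the reverse-complemented sequence.
    """
    seq = seq.upper()
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s)):
            for pam in pams:
                end = i + len(pam) + spacer_len
                if end <= len(s) and s.startswith(pam, i):
                    sites.append((strand, i, pam, s[i + len(pam):end]))
    return sites


toy = "GG" + "TTTC" + "ACGT" * 5 + "GG"   # hypothetical input, not AAVS1
sites = find_target_sites(toy)
```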

[0304] Taking the C2c11-L1 system editing plasmid px458-C2c11L1-AAVS1-g1 as an example, it contains the following inserts: the gRNA, consisting of the backbone sequence for the C2c11-L1 protein (SEQ ID NO: 63) and the guide sequence (C2c11 L1-AAVS1-g1, see Table 3 for the specific sequence), transcribed from the U6 promoter; and the human codon-optimized C2c11-L1 protein sequence, which carries a nuclear localization signal (NLS). The map of the synthesized C2c11 intracellular editing plasmid px458-C2c11L1-AAVS1-g1 is shown in Figure 10, and the complete sequence is shown in SEQ ID NO: 108. The editing plasmids for the other C2c11 systems were designed in the same way as that for the C2c11-L1 system. The codon-optimized protein sequences of C2c11-L2, C2c11-A4, C2c11-A5, C2c11-B2, C2c11-B8, C2c11-B9, C2c11-B11, and C2c11-B12 are shown in Table 1 as SEQ ID NOs: 109-116, and the corresponding gRNA backbone sequences are also shown in Table 1. All nucleotide sequences were synthesized by Beijing Liuhe BGI Genomics Co., Ltd. The positive control was SpCas9.

[0305] Table 3. Guide sequences of gRNA (AAVS1 gene target site information)

[0306] Furthermore, the following steps were performed to complete intracellular plasmid delivery (transfection), genome editing, and activity verification experiments:

[0307] (1) Human cell culture

[0308] The culture conditions were: DMEM medium (high glucose, Gibco) containing 10% fetal bovine serum (FBS, Gibco), 1% non-essential amino acids (NEAA, Gibco) and 1% glutamine (GlutaMAX, Gibco), 37°C, 5% CO2 concentration.

[0309] (2) Preparation of NC plasmid:

[0310] 15 mL of LB liquid medium (autoclaved in advance and cooled to room temperature) was supplemented with 15 μL of 1000× Amp antibiotic. A 10 μL pipette tip was used to pick the stab culture containing the target plasmid (Beijing Liuhe BGI) into the medium, which was cultured at 37°C and 200 rpm for 12-16 h. The culture was centrifuged at 8000 rpm for 3 min, and the medium was discarded. The target plasmid was extracted using an endotoxin-free kit (Tiangen Biochemical) according to the instructions. After extraction, the DNA concentration was quantified using a Nanodrop (Thermo), and the plasmid was stored at -20°C.

[0311] (3) Editing plasmid construction:

[0312] 1000 ng of NC plasmid was digested with 2 μL of BsaI (NEB) at 37°C, 10× loading buffer (Takara) was added, and electrophoresis was performed on a 0.5% agarose gel. The target fragment was recovered from the gel to obtain a vector with sticky ends. Oligos carrying the sticky ends and the target sequence were synthesized (Beijing Liuhe BGI) and annealed to form double-stranded inserts. The vector backbone and insert were ligated at room temperature for 30 minutes using T4 ligase (NEB), and the product was transformed into DH5α Escherichia coli (Sangon) and plated onto Amp-resistant plates. After overnight culture, single clones were picked and Sanger-sequenced using U6 universal primers (sequencing performed by Beijing Liuhe BGI). Correct clones were expanded, and the plasmid was extracted according to the method described in step (2).

[0313] (4) Plasmid transfection

[0314] One day before transfection, the original culture medium of HEK293T cells (~90% confluence) was removed using a pipette pump. About 2 mL of 37°C preheated DPBS (Gibco) was slowly added along the wall to wash the cell surface and then removed using a pipette pump. 1 mL of preheated digestion solution (TrypLE Express, Gibco) was added for digestion. After about 3 minutes, an appropriate amount of preheated DMEM medium containing serum was added to terminate the digestion. The cells were resuspended by pipetting, and a small amount of the cell suspension was gently pipetted to mix with an equal proportion of trypan blue dye (Solarbio). About 20 μL of the mixture was added to a cell counting plate (Countstar) and viable cells were counted using a cell analyzer (Countstar Rigel S2). Finally, cells were plated and cultured in 24-well cell culture plates, with approximately 0.5–1 × 10⁶ cells per well. Once the cells reached 50%–70% confluency, the cells were transfected with the target plasmid using the Lipofectamine 8000 kit (Beyond Technology) according to the manufacturer's instructions (1 μg of plasmid per well). Following transfection, cells were cultured for 2–3 days to allow for full gene editing. After completion of the cell culture, the transfection efficiency was calculated and the cells were harvested. The cell digestion and resuspension procedures were the same as described above (with the appropriate reagent amounts calculated based on the cell culture area). Approximately 20 μL of the cell suspension was added to a cell counting plate and analyzed using a cell analyzer, using the green fluorescence (GFP) channel to calculate the plasmid transfection efficiency. The remaining cells were transferred to a 1.5 mL centrifuge tube and centrifuged at 12,000 rpm for 1 minute. The supernatant culture medium was removed and the cells harvested.
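
Two simple calculations underlie this step: the volume of cell suspension needed to seed a given number of viable cells, and the transfection efficiency as the fraction of GFP-positive cells. The sketch below illustrates both; the function names and the example counts are hypothetical, and in practice the cell analyzer reports these values directly.

```python
# Sketch: seeding volume and GFP-based transfection efficiency.
# All counts and concentrations below are hypothetical examples.

def seeding_volume_ul(cells_per_well: float, viable_cells_per_ml: float) -> float:
    """Volume of suspension (uL) that contains the desired number of cells."""
    if viable_cells_per_ml <= 0:
        raise ValueError("cell concentration must be positive")
    return 1000.0 * cells_per_well / viable_cells_per_ml


def transfection_efficiency(gfp_positive: int, total_cells: int) -> float:
    """Percentage of counted cells that are GFP-positive."""
    if total_cells <= 0 or not 0 <= gfp_positive <= total_cells:
        raise ValueError("counts must satisfy 0 <= gfp_positive <= total_cells")
    return 100.0 * gfp_positive / total_cells


volume = seeding_volume_ul(5e5, 2e6)            # 5 x 10^5 cells from 2 x 10^6/mL
efficiency = transfection_efficiency(360, 480)  # 360 GFP+ of 480 counted cells
print(f"seed {volume:.0f} uL per well; transfection efficiency {efficiency:.1f}%")
```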

[0315] (5) Identification of genome editing activity

[0316] After harvesting the cells, genome extraction and editing activity detection were performed.

[0317] (a) Genomic DNA extraction: Genomic DNA was extracted using a genomic DNA extraction kit (Tiangen), and the genomic (gDNA) concentration was quantified using Nanodrop and stored at -20°C.

[0318] (b) Targeted region PCR: Target region amplification was performed from gDNA using a high-fidelity amplification enzyme (PrimeSTAR GXL DNA Polymerase, Takara). Primers F (SEQ ID NO: 117) and R (SEQ ID NO: 118) were synthesized at Beijing Liuhe BGI. After a distinct, single target band was visualized on a 1% agarose (TAE) gel electrophoresis (180 V, 20 min), the gel was excised and purified using a PCR purification and gel extraction kit (NucleoSpin Extract, MN). Concentration was measured using a Nanodrop.

[0319] (c) Prepare the annealing reaction system according to Table 4 and perform denaturation and annealing according to Table 5. Add 0.3 μL of T7EI nuclease to the annealed reaction system, for a total of 20 μL. Incubate the reaction at 37°C for 20 min. Then, add 4 μL of 6× Gel Loading Dye (NEB) and examine the bands on an agarose gel (using the same agarose gel electrophoresis steps as above).

[0320] Table 4 Annealing reaction system

[0321] Table 5 Denaturation and annealing conditions

[0322] The genome editing results are shown in Figure 11. In the electropherogram of Figure 11, five target sites (C2c11-L1-AAVS1-g4 to C2c11-L1-AAVS1-g8), six target sites (C2c11-A4-AAVS1-g4 to C2c11-A4-AAVS1-g9), and the sites C2c11-B9-AAVS1-g3, C2c11-B11-AAVS1-g2, C2c11-B11-AAVS1-g3, and C2c11-B12-AAVS1-g1 showed genome editing activity.
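
The T7EI assay is read qualitatively here, from the presence of cleavage bands. Where a rough number is wanted, a commonly used approximation converts band intensities into an indel estimate as 100 × (1 − √(1 − f_cut)), where f_cut is the cleaved fraction of the total signal. The sketch below implements that approximation; it is not the quantification used in this example (NGS was used for that purpose), and the band intensities are hypothetical.

```python
# Sketch: commonly used indel estimate from T7EI gel band intensities.
# Not the method of this example; intensities below are hypothetical.
import math
from typing import Sequence


def t7e1_indel_percent(parent: float, cleaved: Sequence[float]) -> float:
    """Estimate indel % from the uncut band and the cleavage product bands."""
    cut = sum(cleaved)
    total = parent + cut
    if total <= 0 or parent < 0 or any(c < 0 for c in cleaved):
        raise ValueError("band intensities must be non-negative with a positive sum")
    f_cut = cut / total
    return 100.0 * (1.0 - math.sqrt(1.0 - f_cut))


estimate = t7e1_indel_percent(parent=75.0, cleaved=(15.0, 10.0))
print(f"estimated indel frequency: {estimate:.1f}%")
```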

[0323] (d) Amplicon library construction and sequencing.

[0324] The target region was amplified from gDNA using a high-fidelity amplification enzyme (PrimeSTAR GXL DNA Polymerase, Takara). The primers for amplification are shown in Table 6 below and were synthesized at Beijing Liuhe BGI. The PCR products were sequenced by NGS at Wuhan BGI Co., Ltd. The next-generation sequencing library was constructed according to the instructions for the MGI Easy Fast PCR-FREE Enzyme Digestion Library Preparation Kit V2.0. DNA was prepared into a DNBSEQ platform-specific library and sequenced using the MGISEQ2000 platform, using the PE100 sequencing type.

[0325] Analysis of the editing products in the targeted region revealed that all C2c11 systems tested, including C2c11-A4, B5, B2, B8, B9, B11, B12, L1, and L2, exhibited gene editing activity in human cells at one or more AAVS1 sites. B12 exhibited an editing efficiency of 7.87% at the g1 site, L1 had editing efficiencies of 8.27% and 3.51% at the g7 and g8 sites, respectively, and B11 had an editing efficiency of 1.36% at the g3 site. The remaining A4, B5, B2, B8, B9, B11, and L2 systems exhibited effective editing rates ranging from 0.04% to 0.5%.
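
An editing efficiency of this kind is, in essence, the fraction of amplicon reads that carry an insertion or deletion. The sketch below illustrates the calculation in a deliberately simplified form: a read is counted as edited when its length differs from the reference amplicon, and an untreated control is subtracted to allow for background. This length-based call, the function names, and the toy reads are assumptions for illustration; the text does not describe the analysis pipeline, and real analyses align reads to the reference.

```python
# Sketch: simplified indel frequency from amplicon reads.
# Length-based indel calling is an illustrative simplification;
# the reads below are toy data, not results from this example.
from typing import Sequence


def indel_percent(reads: Sequence[str], reference_len: int) -> float:
    """Percentage of reads whose length differs from the reference amplicon."""
    if not reads:
        raise ValueError("no reads supplied")
    edited = sum(1 for read in reads if len(read) != reference_len)
    return 100.0 * edited / len(reads)


def net_editing_percent(treated: Sequence[str], control: Sequence[str],
                        reference_len: int) -> float:
    """Treated indel % minus control indel %, floored at zero."""
    net = indel_percent(treated, reference_len) - indel_percent(control, reference_len)
    return max(net, 0.0)


reference = "ACGTACGTAC"                      # toy reference amplicon
treated_reads = [reference] * 3 + ["ACGTGTAC"]  # one read with a 2-nt deletion
control_reads = [reference] * 4

net = net_editing_percent(treated_reads, control_reads, len(reference))
```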

[0326] Table 6 PCR amplification primers for the targeted region of the AAVS1 gene

[0327] Example 7. C2c11 system edits reporter gene plasmids in mammalian cells

[0328] To detect the editing activity of the C2c11 system in eukaryotic cells, the C2c11 system selected above was used to test the in vivo editing activity of exogenous plasmids in the human embryonic kidney cell-derived cell line HEK293T.

[0329] Construction of targeting plasmids

[0330] The exogenous plasmid CMV-BFP-P2A-mcherry (its sequence is shown in SEQ ID NO: 61, and its map in Figure 12) serves as the targeting plasmid, in which the mCherry gene is the editing target. In SEQ ID NO: 61, the target site region to be edited is represented by consecutive Ns, and the specific target site sequences corresponding to each test system are shown in Table 7. The target site DNA was synthesized as single-stranded DNA by Beijing Liuhe BGI and inserted into the open reading frame of the mCherry gene by restriction digestion and ligation.

[0331] Construction of editing plasmids

[0332] The design of the editing plasmid is essentially the same as in Example 6. Specifically, it comprises a protein coding sequence (the same as in Example 6), a gRNA backbone sequence (the same as in Example 6), and a gRNA guide sequence (as shown in Table 8). The map of the editing plasmid is the same as in Figure 10.

[0333] Table 7 Targeting plasmid mcherry gene target site sequence

[0334] The guide sequence information of the editing plasmids is shown in Table 8:

[0335] Table 8 mCherry gene editing plasmid gRNA guide sequence

[0336] The editing plasmid and the targeting plasmid were constructed separately and co-delivered (transfected) into 293T cells. The cell culture, plasmid delivery and intracellular editing methods were essentially the same as in Example 6.

[0337] After harvesting, cells were lysed to release total DNA, including human genomic and targeted plasmid DNA. The target region was amplified using a high-fidelity amplification enzyme (PrimeSTAR GXL DNA Polymerase, Takara) with primers CMV-F: SEQ ID NO: 70; CMV-R: SEQ ID NO: 71, synthesized at Beijing Liuhe BGI. After electrophoresis, the product was detected as a distinct, single target band, which was then purified and recovered by gel excision. Concentration was measured using a Nanodrop.

[0338] Denaturation, annealing, and renaturation were performed in the same manner as in Example 6, and T7EI nuclease was used to qualitatively detect whether the product had any editing phenomena such as insertion or deletion.

[0339] The editing results are shown in Figure 13. In the electropherogram of Figure 13, C2c11A5-g2 and C2c11B2-g1 show obvious editing activity, C2c11A4-g3, C2c11A5-g2, and C2c11B9-g1 show weak editing activity, and C2c11B11 and C2c11B12 show no obvious editing activity. Because of the limited sensitivity of electrophoretic detection, the target sequence DNA was subsequently sequenced by NGS to assess editing more accurately. First, the target site region was amplified; the amplification primers are shown in Table 9 below and were synthesized by Liuhe BGI. The recovered PCR product was then sequenced by NGS at Wuhan BGI Co., Ltd. The next-generation sequencing library was constructed according to the instructions for BGI's "MGIEasy Fast PCR-FREE Enzyme Digestion Library Preparation Reagent Set V2.0". The DNA was prepared into a DNBSEQ platform-specific library and sequenced on the MGISEQ2000 platform with PE100 sequencing.

[0340] Table 9 PCR amplification primers for the mcherry gene targeted region

[0341] Analysis of the editing products in the targeted region revealed that the C2c11 systems, including C2c11-A4, A5, B1, B2, B8, B9, B11, and B12, all stably edited one or more mCherry sites on the intracellular plasmid. The editing efficiencies for A4 at the TS3 site, B1 at the ST3 site, and B8 at the ST2 site were 3.52%, 3.26%, and 3.47%, respectively. The effective editing rates for B5, B2, B9, B11, and B12 ranged from 0.43% to 1.96%.

[0342] Although specific embodiments of the present invention have been described in detail, those skilled in the art will understand that various modifications and changes can be made to the details in light of all the teachings disclosed, and that such changes are all within the scope of protection of the present invention. The full scope of the invention is given by the appended claims and any equivalents thereof.

Claims

1. A system or composition comprising: a) Cas protein C2c11 or a nucleic acid molecule A1 encoding the Cas protein C2c11, wherein the Cas protein C2c11 is capable of cleaving or fragmenting a target nucleic acid; and the Cas protein C2c11 is less than 700 amino acids (e.g., 300-700 amino acids, 350-650 amino acids, 400-600 amino acids, 450-550 amino acids); wherein the Cas protein C2c11: (i) comprises at least one zinc finger domain capable of recognizing, binding and / or cleaving a target nucleic acid, wherein the zinc finger domain comprises an α-helical structure and a β-sheet structure and does not contain the motif Cys-X1-X2-Cys, wherein X1 and X2 are each independently selected from any amino acid; (ii) comprises at least one RuvC domain, wherein the RuvC domain is capable of binding to and / or cleaving a target nucleic acid; preferably, when the target nucleic acid is a double-stranded target nucleic acid, the RuvC domain is further capable of unwinding the double-stranded target nucleic acid (e.g., a DNA double strand, a DNA-RNA double strand) into a single-stranded target nucleic acid; and (iii) does not contain an HNH domain; and b) one or more guide RNAs or nucleic acid molecules B1 encoding the guide RNAs, wherein the guide RNAs are capable of forming a complex with the Cas protein C2c11, and under conditions that allow nucleic acid hybridization or annealing, the guide RNAs are capable of guiding the complex to hybridize or anneal to the target nucleic acid.

2. The system or composition of claim 1, wherein: the Cas protein C2c11 has one or more characteristics selected from the following: (1) the RuvC domain is capable of cleaving or breaking the phosphodiester bond in the target nucleic acid; (2) the RuvC domain cuts or cleaves the target nucleic acid, generating a target nucleic acid with sticky ends; (3) the Cas protein C2c11 comprises a REC domain; preferably, the REC domain is capable of participating in the recognition of the Cas protein C2c11 and the PAM; preferably, the REC domain comprises one or more (e.g., one, two, three, four, five or more) α-helical structures; (4) the Cas protein C2c11 contains a WED domain; preferably, the WED domain is capable of participating in the recognition of the Cas protein C2c11 and PAM, and / or the processing of crRNA (e.g., processing pre-crRNA into mature crRNA); (5) the Cas protein C2c11 has cis-cleavage activity for cleaving target nucleic acids (e.g., double-stranded target nucleic acids, single-stranded target nucleic acids); (6) the Cas protein C2c11 has a trans-cleavage activity that can be activated by the target nucleic acid; preferably, the Cas protein C2c11 can cut single-stranded nucleic acids (e.g., ssDNA, ssRNA) after being activated by the target nucleic acid; (7) the amino acid residue at position 250 of the Cas protein C2c11 corresponding to SEQ ID NO: 18 is D; (8) the amino acid residue at position 358 of the Cas protein C2c11 corresponding to SEQ ID NO: 18 is E; (9) the amino acid residue at position 425 of the Cas protein C2c11 corresponding to SEQ ID NO: 18 is S; (10) the amino acid residue at position 442 of the Cas protein C2c11 corresponding to SEQ ID NO: 18 is R; (11) the amino acid residue at position 455 of the Cas protein C2c11 corresponding to SEQ ID NO: 18 is D; (12) the amino acid residue at position 460 of the Cas protein C2c11 corresponding to SEQ ID NO: 18 is F; (13) the amino acid residue at position 484 of the Cas protein C2c11 corresponding to SEQ ID NO: 18 is F; (14) the zinc finger domain contained in the Cas protein C2c11 contains the motif Y1-X1-X2-Y2, wherein X1 and X2 are each independently selected from any amino acid, Y1 and Y2 are each independently selected from amino acids other than Cys, and Y1 and Y2 can be the same or different amino acids; for example, containing the motif Gln-X1-X2-Asn, or containing the motif His-X1-X2-Gly.

3. The system or composition of claim 1 or 2, wherein: The Cas protein C2c11 comprises a WED domain, a REC domain, a RuvC domain and a zinc finger domain; Preferably, among the domains contained in the Cas protein C2c11, the zinc finger domain is adjacent to the RuvC domain; Preferably, the WED domain comprises a WED-I subdomain and / or a WED-II subdomain; Preferably, the RuvC domain comprises a RuvC-I subdomain, a RuvC-II subdomain and / or a RuvC-III subdomain; Preferably, the zinc finger domain comprises a zinc finger domain-I subdomain and / or a zinc finger domain-II subdomain; Preferably, the Cas protein C2c11 comprises the amino acid sequence of the following domains and / or subdomains from N-terminus to C-terminus: RuvC-I subdomain, RuvC-II subdomain, zinc finger domain-I subdomain, RuvC-III subdomain and zinc finger domain-II subdomain; Preferably, the Cas protein C2c11 comprises the amino acid sequence of the following domains and / or subdomains from N-terminus to C-terminus: WED-I subdomain, REC domain, WED-II subdomain, RuvC-I subdomain, RuvC-II subdomain, zinc finger domain-I subdomain, RuvC-III subdomain and zinc finger domain-II subdomain; Preferably, the Cas protein C2c11 is a natural protein obtained from an organism (e.g., a microorganism) or a homolog thereof, or is a protein comprising one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) amino acid substitutions, deletions and / or additions relative to the natural protein; preferably, the protein retains the secondary structure (e.g., as defined in claim 1) and / or biological activity (e.g., as defined in claim 2) of the natural protein from which it is derived; Preferably, the microorganism is selected from bacteria, viruses (e.g., bacteriophages) or any combination thereof; Preferably, the microorganism is selected from the genera Prevotella, Bacteroides, Bacillus, Oscillospira, Eubacterium, Ruminococcus, Clostridium (e.g., Clostridium butyricum), Lachnospira, or any combination thereof; Preferably, the microorganism is selected from bacteriophages (e.g., tailed phages); Preferably, the microorganism is a cultured or uncultured microorganism; Preferably, the Cas protein C2c11 has an amino acid sequence as shown in any one of SEQ ID NOs: 1-30, or a sequence having one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10) amino acid substitutions, deletions and / or additions compared thereto; wherein the sequence substantially retains the secondary structure (e.g., as defined in claim 1) and / or biological activity (e.g., as defined in claim 2) of the sequence from which it is derived; Preferably, the Cas protein C2c11 has the amino acid sequence shown in any one of SEQ ID NOs: 1-30 or has at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity compared thereto, and substantially retains the secondary structure (e.g., as defined in claim 1) and / or biological activity (e.g., as defined in claim 2) of the sequence from which it is derived.
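
The sequence identity thresholds recited above (at least 40% up to at least 99%) presuppose some way of computing percent identity. The sketch below is one simple convention, assumed here for illustration only: it takes two sequences that have already been aligned (gaps written as `-`) and divides the number of identical columns by the number of aligned columns, ignoring columns that are gaps in both. The claims do not specify the alignment algorithm or the denominator, and other conventions give different values. The function name and the sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences (gaps written as '-').

    Assumed convention: identical columns / aligned columns, where columns
    that are gaps in both sequences are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # gap-only column carries no information
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


# Hypothetical aligned fragments: 4 identical columns out of 6.
identity = percent_identity("MKT-LV", "MKTALI")
meets_60_percent = identity >= 60.0
```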

4. The system or composition of any one of claims 1 to 3, wherein: The Cas protein C2c11 comprises a modifying moiety; Preferably, the modifying moiety is selected from an additional protein or polypeptide, a detectable label, a purification tag, or any combination thereof; Preferably, the modifying moiety is optionally linked to the N-terminus or C-terminus of the protein via a linker; Preferably, the modifying moiety is fused to the N-terminus or C-terminus of the protein; Preferably, the additional protein or polypeptide is selected from an epitope tag, a reporter gene sequence, a nuclear localization signal (NLS) sequence, a targeting moiety, a transcriptional activation domain (e.g., VP64), a transcriptional repression domain (e.g., a KRAB domain or a SID domain), a nuclease domain (e.g., FokI), a domain having any of the following activities: methylase activity, demethylase activity, transcriptional activation activity, transcriptional repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity, nucleic acid binding activity, or any combination thereof; Preferably, the Cas protein C2c11 comprises an NLS sequence; Preferably, the NLS sequence is located at or near a terminus (e.g., the N-terminus or C-terminus) of the Cas protein C2c11.

5. The system or composition of any one of claims 1 to 4, wherein: The system or composition comprises two or more guide RNAs capable of hybridizing to different target nucleic acids or different regions of the same target nucleic acid; Preferably, the guide RNA comprises a guide sequence, and the guide sequence is capable of hybridizing or annealing to the target nucleic acid under conditions that allow nucleic acid hybridization or annealing; Preferably, the guide sequence is at least 10 nt in length, such as 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt or longer; Preferably, the guide RNA further comprises a backbone sequence; Preferably, the length of the backbone sequence is at least 20 nt, such as 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, 200-300 nt or longer; Preferably, the guide RNA forms a complex with the Cas protein C2c11 through the backbone sequence; Preferably, the guide sequence is located at the 3' end of the backbone sequence; Preferably, the guide RNA is non-naturally occurring or modified; Preferably, the sequence of the guide RNA comprises at least one chemical modification; Preferably, the chemical modification is selected from pseudo-U, 5-methyl-C, methylated nucleotide or nucleotide analog, 2′-O-methyl, 2′-O-methyl 3′ phosphorothioate, 2′-O-methyl 3′ thio PACE, or any combination thereof.
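
The layout recited in claim 5 (a backbone sequence of at least 20 nt followed by a guide sequence of at least 10 nt at its 3' end) can be sketched as a simple string assembly. This is an illustrative sketch only; the function name and the input sequences are hypothetical placeholders and are not taken from the sequence listing. Sequences are written 5' to 3', so placing the guide at the 3' end means appending it on the right.

```python
def assemble_guide_rna(backbone, guide):
    """Join a backbone and a guide sequence into one guide RNA string (5'->3').

    Length limits follow the minimums recited in claim 5; DNA-style input
    (T) is converted to RNA (U).
    """
    backbone = backbone.upper().replace("T", "U")
    guide = guide.upper().replace("T", "U")
    if set(backbone + guide) - set("ACGU"):
        raise ValueError("sequences may contain only A, C, G, U (or T)")
    if len(backbone) < 20:
        raise ValueError("backbone sequence must be at least 20 nt")
    if len(guide) < 10:
        raise ValueError("guide sequence must be at least 10 nt")
    # Guide sequence sits at the 3' end of the backbone sequence.
    return backbone + guide


# Placeholder sequences, not real C2c11 backbone or guide sequences.
grna = assemble_guide_rna("ACGU" * 5, "GATTACAGAT")
```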

6. The system or composition of claim 5, wherein: The guide RNA has one or more characteristics selected from the following: (1) The backbone sequence comprises, or consists of, a sequence selected from the following: (i) the sequence shown in any one of SEQ ID NOs: 63, 65, 67, 72-98; (ii) a sequence having one or more base substitutions, deletions and / or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 base substitutions, deletions and / or additions) compared to the sequence shown in any one of SEQ ID NOs: 63, 65, 67, 72-98; (iii) a sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% sequence identity to the sequence shown in any one of SEQ ID NOs: 63, 65, 67, 72-98; (iv) a sequence that hybridizes under stringent conditions to the sequence described in any one of (i) to (iii); or (v) a complementary sequence of the sequence described in any one of (i) to (iii); wherein the sequence of any one of (ii) to (v) substantially retains the biological activity of the sequence from which it is derived; (2) The guide RNA comprises, or consists of, a sequence selected from the following: (i) the sequence shown in any one of SEQ ID NOs: 62, 64, and 66; (ii) a sequence having one or more base substitutions, deletions and / or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 base substitutions, deletions and / or additions) compared to the sequence shown in any one of SEQ ID NOs: 62, 64, and 66; (iii) a sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% sequence identity to the sequence shown in any one of SEQ ID NOs: 62, 64, and 66; (iv) a sequence that hybridizes under stringent conditions to the sequence described in any one of (i) to (iii); or (v) a complementary sequence of the sequence described in any one of (i) to (iii); wherein the sequence of any one of (ii) to (v) substantially retains the biological function of the sequence from which it is derived.

7. The system or composition of any one of claims 1 to 6, wherein: The target nucleic acid has one or more characteristics selected from the following: (1) The target nucleic acid sequence is a DNA and / or RNA sequence from a prokaryotic cell or a eukaryotic cell; or, the target nucleic acid sequence is a non-naturally occurring DNA and / or RNA sequence; (2) The target nucleic acid is selected from a double-stranded target nucleic acid, a single-stranded target nucleic acid, or any combination thereof; (3) The target nucleic acid is present in a cell; alternatively, the target nucleic acid is present in a nucleic acid molecule in vitro (e.g., a plasmid or genomic DNA collected in vitro by cell lysis or PCR amplification); preferably, the cell is a prokaryotic cell or a eukaryotic cell (e.g., a plant cell or a mammalian cell, such as a human cell); (4) When the target nucleic acid is DNA, the target nucleic acid is located at the 3' end of the protospacer adjacent motif (PAM); preferably, the sequence of the PAM is selected from GTC, TTC, TTTC, TGTC, AGTC or any combination thereof; (5) After hybridization or annealing with the complex formed by the guide RNA and Cas protein C2c11, the target nucleic acid is modified (e.g., cut, edited) by the complex; preferably, the modification causes the expression product of the target nucleic acid to change (e.g., the expression amount of the expression product is increased or decreased).
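
Feature (4) places the target immediately 3' of the PAM and lists GTC, TTC, TTTC, TGTC and AGTC as preferred PAM sequences. The sketch below shows how candidate target sites could be enumerated on one DNA strand under those two statements. It is a minimal illustration with stated assumptions: the function name `find_targets` is hypothetical, the default 20-nt target length is an arbitrary choice within the guide lengths recited in claim 5, only the given strand is scanned (the reverse complement is not), and the test sequence is invented.

```python
# Preferred PAM sequences listed in feature (4) of claim 7.
PAMS = ("GTC", "TTC", "TTTC", "TGTC", "AGTC")


def find_targets(dna, target_len=20):
    """List (PAM start index, PAM, target sequence) for one DNA strand.

    The target is the target_len nucleotides immediately 3' of the PAM.
    Indices are 0-based. Longer PAMs contain shorter ones (e.g. AGTC
    contains GTC), so the same target can be reported under two PAMs.
    """
    dna = dna.upper()
    sites = []
    for i in range(len(dna)):
        for pam in PAMS:
            start = i + len(pam)
            end = start + target_len
            # Keep the site only if the full-length target fits on the strand.
            if dna.startswith(pam, i) and end <= len(dna):
                sites.append((i, pam, dna[start:end]))
    return sites


# Invented sequence: one AGTC (which also contains GTC) followed by 12 nt.
sites = find_targets("AAGTCACGTACGTACGG", target_len=10)
```

Reporting both `AGTC` and `GTC` for the same target is deliberate here, since it keeps every PAM that the list would accept; a caller that wants one entry per target can deduplicate on the target start position.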

8. The system or composition of any one of claims 1 to 7, wherein the system or composition further comprises an additional component; Preferably, the additional components are selected from: (1) an auxiliary protein or a nucleic acid molecule C1 encoding the auxiliary protein, wherein the auxiliary protein can enhance the activity of the Cas protein C2c11; (2) a repressor protein or a nucleic acid molecule D1 encoding the repressor protein, wherein the repressor protein is capable of reducing the activity of the Cas protein C2c11; (3) a nuclease (e.g., Cas1, Cas2, or Cas4) or a nucleic acid molecule E1 encoding the nuclease; (4) Any combination of (1) to (3) above.

9. A vector system, comprising one or more vectors, wherein the one or more vectors comprise: (a) a nucleic acid molecule A1 encoding the Cas protein C2c11 according to any one of claims 1 to 4; optionally, the vector further comprises a first regulatory element operably linked to the nucleic acid molecule A1; and (b) a nucleic acid molecule B1 encoding the guide RNA according to any one of claims 1, 5 and 6; optionally, the vector further comprises a second regulatory element operably linked to the nucleic acid molecule B1; wherein components (a) and (b) are located on the same or different vectors of the vector system; Optionally, the one or more vectors further comprise a nucleic acid molecule C1 encoding an auxiliary protein; preferably, the vector further comprises a third regulatory element operably linked to the nucleic acid molecule C1; Optionally, the one or more vectors further comprise a nucleic acid molecule D1 encoding a repressor protein; preferably, the vector further comprises a fourth regulatory element operably linked to the nucleic acid molecule D1; Optionally, the one or more vectors further comprise a nucleic acid molecule E1 encoding a nuclease; preferably, the vector further comprises a fifth regulatory element operably linked to the nucleic acid molecule E1; Preferably, the first regulatory element, the second regulatory element, the third regulatory element, the fourth regulatory element, and the fifth regulatory element are the same or different; Preferably, the first regulatory element, the second regulatory element, the third regulatory element, the fourth regulatory element, and the fifth regulatory element are each independently selected from a promoter (e.g., an inducible promoter), an enhancer, an internal ribosome entry site (IRES), a terminator, or any combination thereof; Preferably, the vector system comprises a viral vector; Preferably, the viral vector is selected from a retroviral vector, a lentiviral vector, an adenoviral vector, an adeno-associated viral vector, a herpes simplex viral vector, or any combination thereof; Preferably, the nucleotide sequence of the nucleic acid molecule A1 is codon-optimized according to the codon preference of the host cell (e.g., a eukaryotic cell or a prokaryotic cell).

10. A delivery composition comprising: (1) a delivery vehicle, and (2) the system or composition according to any one of claims 1 to 8 or the vector system according to claim 9; Preferably, the delivery vehicle is a particle; Preferably, the delivery vehicle is selected from lipid particles, sugar particles, metal particles, protein particles, liposomes, exosomes, microvesicles, gene guns, viral vectors (e.g., replication-defective retroviruses, lentiviruses, adenoviruses or adeno-associated viruses) or any combination thereof.

11. A kit comprising one or more components selected from the group consisting of: the system or composition of any one of claims 1 to 8, the vector system of claim 9, or the delivery composition of claim 10; Preferably, the kit further comprises instructions for using the system or composition, vector system or delivery composition.

12. A complex comprising: (i) a protein component comprising a Cas protein C2c11, wherein the Cas protein C2c11 is as defined in any one of claims 1 to 4; and (ii) a nucleic acid component; wherein the protein component and the nucleic acid component bind to each other to form the complex; Preferably, the nucleic acid component comprises a guide RNA in a CRISPR-Cas system; preferably, the guide RNA is as defined in claim 5 or 6; Preferably, the protein component further comprises an additional protein; preferably, the additional protein forms a complex with the Cas protein C2c11 in a covalent or non-covalent manner; Preferably, the additional protein is selected from: (1) an auxiliary protein, which can enhance the activity of the Cas protein C2c11; (2) a repressor protein, which can reduce the activity of the Cas protein C2c11; (3) a nuclease (e.g., Cas1, Cas2, or Cas4); (4) Any combination of (1) to (3) above.

13. A method for modifying a target gene, comprising: (1) contacting the system or composition of any one of claims 1 to 8, the vector system of claim 9, the delivery composition of claim 10, the kit of claim 11, or the complex of claim 12 with the target nucleic acid; or (2) delivering the system or composition of any one of claims 1 to 8, the vector system of claim 9, the delivery composition of claim 10, the kit of claim 11, or the complex of claim 12 into a cell containing the target nucleic acid; Preferably, in (1), the target nucleic acid is present in a nucleic acid molecule in vitro (e.g., a plasmid or genomic DNA collected in vitro by cell lysis or PCR amplification); Preferably, in (2), the cell is a prokaryotic cell or a eukaryotic cell (e.g., a plant cell or a mammalian cell, such as a human cell); Preferably, the modification comprises cutting and / or editing; preferably, the modification of the target gene causes a break in the target gene; preferably, the modification further comprises inserting an exogenous nucleic acid into the break; Preferably, the modification causes a change in the expression product of the target gene (e.g., an increase or decrease in the expression level of the expression product).

14. An in vitro, ex vivo or in vivo cell or cell line or progeny thereof, comprising: the system or composition of any one of claims 1 to 8, or the vector system of claim 9, or the delivery composition of claim 10, or the complex of claim 12; Preferably, the cell is a prokaryotic cell or a eukaryotic cell (e.g., a plant cell or a mammalian cell, such as a human cell).

15. A cell, cell line or progeny thereof comprising a modified target gene, wherein the cell or cell line has been modified by the method of claim 13; Preferably, the cell is a prokaryotic cell or a eukaryotic cell (e.g., a plant cell or a mammalian cell, such as a human cell); Preferably, the cell is in vitro, ex vivo or in vivo; Preferably, the modification of the target gene results in altered expression of at least one gene product of the cell; Preferably, the change in expression of the gene product is an increase or decrease in the expression level of the gene product.

16. A gene product derived from the cell, cell line, or progeny thereof according to claim 15; Preferably, the gene product is a protein.

17. Use of the system or composition of any one of claims 1 to 8, the vector system of claim 9, the delivery composition of claim 10, the kit of claim 11, or the complex of claim 12 for nucleic acid editing, or in the preparation of a nucleic acid editing formulation, in the preparation of an in vitro or ex vivo nucleic acid detection formulation, or in the preparation of a medicament for treating a disease or condition in a subject in need thereof; Preferably, the nucleic acid editing comprises gene editing; Preferably, the gene editing comprises single-site and / or multi-site gene modification, single-site and / or multi-site gene knockout, single-site and / or multi-site gene knock-in, single-site and / or multi-site methylation modification (e.g., addition, removal of methylation modification), alteration of gene product expression, repair of mutations, and / or insertion of polynucleotides; Preferably, the in vitro or ex vivo nucleic acid is selected from double-stranded DNA, single-stranded DNA, single-stranded RNA, double-stranded RNA or any combination thereof; Preferably, the system or composition, vector system, delivery composition, kit or complex is used to edit a gene associated with the disease or disorder in a subject to treat the disease or disorder; Preferably, the gene associated with the disease or condition has one or more mutations, or is a gene that causes genetic variation, or is linked to one or more genes that cause genetic variation; Preferably, the disease or condition is a genetic disease or condition; for example, a blood disease or condition, an eye disease or condition, a liver disease or condition, a muscle disease or condition, or a neurological disease or condition; Preferably, the disease or condition is a disease or condition caused by a gene mutation or a pathogenic SNP; for example, cancer.

18. A method for detecting the presence of a target nucleic acid in a sample, comprising the following steps: (1) contacting the sample with a labeled single-stranded DNA probe and any of the following components: the system or composition of any one of claims 1 to 8, the vector system of claim 9, the delivery composition of claim 10, the kit of claim 11, or the complex of claim 12; wherein the system or composition, vector system, delivery composition, kit or complex comprises a guide sequence that is capable of hybridizing to the target nucleic acid, and the single-stranded DNA probe does not hybridize to the guide sequence; (2) detecting a detectable signal generated by cleavage of the single-stranded DNA probe by the Cas protein C2c11 contained in the system or composition, vector system, delivery composition, kit or complex, thereby determining whether the target nucleic acid is present in the sample; Preferably, the target nucleic acid is as defined in claim 7; Preferably, one end (e.g., the 5' end) of the single-stranded DNA probe is labeled with a fluorescent group, and the other end (e.g., the 3' end) is labeled with a quencher group.

19. The method of claim 18, wherein: The target nucleic acid sequence is a sequence obtained from a pathogen; Preferably, the pathogen is selected from viruses, bacteria, fungi, protozoa, parasites or any combination thereof; Optionally, the method further comprises the step of contacting the sample with a reagent for reverse transcription; Preferably, the reagent for reverse transcription is selected from reverse transcriptase, oligonucleotide primers, dNTPs or any combination thereof; Optionally, the method further comprises a step of amplifying the target nucleic acid in the sample before step (1); Preferably, the amplification is selected from nucleic acid sequence-based amplification, recombinase polymerase amplification, loop-mediated isothermal amplification, strand displacement amplification, exonuclease III-assisted signal amplification, hybridization chain reaction, helicase-dependent amplification, isothermal circular strand displacement polymerization, multiple displacement amplification, primase-based whole genome amplification, rolling circle amplification, whole genome amplification or any combination thereof.

20. The method according to claim 18 or 19, wherein: The method further comprises, before step (1), pre-treating the sample to expose the target nucleic acid in the sample; Preferably, the sample is a biological sample or an environmental sample; preferably, the biological sample is an ex vivo biological sample; preferably, the biological sample is selected from blood, plasma, serum, urine, feces, sputum, mucus, lymph, bile, ascites, pleural effusion, saliva, cerebrospinal fluid, any body secretion, transudate or exudate (such as fluid obtained from an abscess or any other infected or inflamed site), a swab of a skin or mucosal surface, or any combination thereof; preferably, the environmental sample is selected from a food sample, a paper surface, a fabric, a metal surface, a wood surface, a plastic surface, a soil sample, a freshwater sample, a wastewater sample, a salt water sample, or any combination thereof.