Polypeptides for gene editing and uses thereof

By providing a novel Cas protein and CRISPR-Cas combination, the complexity and efficiency issues of existing systems are addressed, enabling highly efficient gene editing and nucleic acid detection capabilities suitable for diverse applications.

CN119372179B · Active · Publication Date: 2026-06-12 · YOLTECH THERAPEUTICS CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
YOLTECH THERAPEUTICS CO LTD
Filing Date
2024-09-14
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing CRISPR/Cas systems each have their own advantages and disadvantages, and cannot meet diverse application needs, especially in terms of complexity and efficiency issues related to Cas proteins and RNA.

Method used

A novel Cas protein is provided, whose amino acid sequence has high identity with any one of SEQ ID Nos. 1-4, and whose biological function is retained by introducing amino acid substitutions, deletions or additions. It can also bind to guide RNA to form a CRISPR-Cas composition for gene editing and nucleic acid detection.

Benefits of technology

It achieves highly efficient nuclease activity both in vivo and in vitro, making it suitable for diverse applications, including gene editing and nucleic acid detection, and has broad application prospects.


Abstract

The application provides a CRISPR-Cas effector protein, a composition and application thereof. The CRISPR-Cas effector protein comprises a protein having at least 80% identity with the amino acid sequence described in any one of SEQ ID Nos. 1-4; the application also relates to a complex and composition for nucleic acid editing, comprising the protein or fusion protein of the application, or a nucleic acid molecule encoding the same. The application also relates to a method for nucleic acid editing, using the protein or fusion protein of the application.

Description

Technical Field

[0001] This invention relates to the field of gene editing, and more specifically, to a Cas protein, its gene editing system, and its applications.

Background Technology

[0002] The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) system is a defense mechanism developed by bacteria and archaea to protect against invading bacteriophage DNA. The CRISPR immune interference process mainly consists of three phases: adaptation, expression, and interference. In the adaptation phase, the CRISPR system integrates short DNA fragments from bacteriophages or plasmids between the leader sequence and the first repeat sequence. Each integration is accompanied by replication of the repeat sequence, forming a new repeat-spacer unit. In the expression phase, the CRISPR locus is transcribed into a CRISPR RNA (crRNA) precursor (pre-crRNA). This precursor, in the presence of Cas protein and tracrRNA, is further processed into smaller crRNAs at the repeat sequence. The mature crRNA forms a Cas / crRNA complex with the Cas protein. In the interference phase, the crRNA guides the Cas / crRNA complex to the target site through its complementary region. At the target site, the nuclease activity of the Cas protein causes a double-stranded DNA break, thus rendering the target DNA non-functional.

[0003] The CRISPR system is divided into three families: type I, type II, and type III. The most common type II system is the CRISPR / Cas9 system. The Cas9 protein, with the assistance of trans-activating crRNA (tracrRNA), processes pre-crRNA into mature crRNA that binds to tracrRNA. Later, it was discovered that artificially constructed single-chain chimeric guide RNAs that mimic the crRNA:tracrRNA complex can effectively mediate target recognition and cleavage by the Cas9 protein. The three bases immediately adjacent to the 3′ end of the target must be in the form of 5′-NGG-3′, thus forming the PAM (protospacer adjacent motif) structure required for the Cas / crRNA complex to recognize the target. However, different CRISPR / Cas systems have different advantages and disadvantages. For example, Cas9, C2c1, and CasX all require two RNAs as guide RNAs. Common Cas9, C2c1, CasY, and Cpf1 are typically around 1300 amino acids in size. Furthermore, the PAM sequences of Cas9, Cpf1, CasX, and CasY are all complex and diverse. There is still a need to develop new Cas proteins and CRISPR-Cas systems to meet diverse application requirements.

Summary of the Invention

[0004] The main objective of this invention is to provide a novel Cas protein, its composition, and its applications to meet the aforementioned application needs. Based on this, the invention also provides a novel CRISPR-Cas composition, as well as gene editing methods and nucleic acid detection methods based on this system.

[0005] In one aspect, the present invention provides a Cas protein comprising an amino acid sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity with any one of the amino acid sequences in SEQ ID Nos. 1-4, and substantially retaining the biological function of the sequence from which it is derived;

[0006] In one embodiment, the amino acid sequence of the Cas protein has one or more amino acid substitutions, deletions, or additions compared to any one of the amino acid sequences in SEQ ID Nos. 1-4, and substantially retains the biological function of the sequence from which it is derived.

[0007] In one embodiment, the Cas protein comprises the amino acid sequence shown in any one of SEQ ID Nos. 1-4;

[0008] Or a sequence having one or more amino acid substitutions, deletions, or additions (e.g., substitutions, deletions, or additions of 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids) compared to the sequence shown in any one of SEQ ID Nos. 1-4; or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% amino acid sequence identity with the amino acid sequence shown in any one of SEQ ID Nos. 1-4;

[0009] In one embodiment, the protein has the amino acid sequence shown in any one of SEQ ID Nos. 1-4.
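As a rough illustration of the identity thresholds recited above (at least 80% up to 100%), the following Python sketch computes percent identity over a pair of sequences that have already been aligned. The patent text does not state which alignment program or identity definition is used, so the convention here (identical columns divided by all aligned columns, with gap columns counted as mismatches) is an assumption, and the toy sequences are hypothetical placeholders, not SEQ ID Nos. 1-4.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences ('-' marks a gap).

    Assumed convention: identical columns / all aligned columns * 100.
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # uninformative column, skip it
        columns += 1
        if a == b:
            matches += 1  # a gap can never match here, since both-gap columns were skipped
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the pair reaches the given identity threshold (in percent)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical toy alignment: 5 identical columns out of 6.
candidate = "MKT-LV"
reference = "MKTALV"
identity = percent_identity(candidate, reference)
```

Other identity definitions (for example, dividing by the length of the shorter sequence) give different values for the same alignment, so the convention should be fixed before comparing a candidate against a claimed threshold.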

[0010] Those skilled in the art will understand that the structure of a protein can be altered without adversely affecting its activity and functionality, for example, by introducing one or more conservative amino acid substitutions into the protein's amino acid sequence without adversely affecting the protein molecule's activity and / or three-dimensional structure.

[0011] Those skilled in the art will recognize examples and implementations of conservative amino acid substitutions. Specifically, an amino acid residue can be substituted with another amino acid residue belonging to the same group as the residue to be replaced, i.e., a nonpolar amino acid residue replacing another nonpolar amino acid residue, a polar uncharged amino acid residue replacing another polar uncharged amino acid residue, a basic amino acid residue replacing another basic amino acid residue, and an acidic amino acid residue replacing another acidic amino acid residue. Such substituted amino acid residues may or may not be encoded by the genetic code. A conservative substitution, in which an amino acid is replaced by another amino acid belonging to the same group, falls within the scope of this invention as long as the substitution does not inactivate the protein's biological activity. Therefore, the proteins of this invention can contain one or more conservative substitutions in their amino acid sequence. Furthermore, this invention also covers proteins that contain one or more nonconservative substitutions, provided that such nonconservative substitutions do not significantly affect the desired function and biological activity of the proteins of this invention.

[0012] Conservative amino acid substitutions can occur at one or more predicted non-essential amino acid residues. “Non-essential” amino acid residues are those that can be altered (deleted, substituted, or replaced) without changing biological activity, while “essential” amino acid residues are required for biological activity. A “conservative amino acid substitution” is a substitution in which an amino acid residue is replaced by an amino acid residue with a similar side chain. Amino acid substitutions can occur in non-conserved regions of Cas enzymes. Generally, such substitutions are not performed on conserved amino acid residues, or on amino acid residues located within conserved motifs, where such residues are required for protein activity. However, those skilled in the art will understand that functional variants may have minor conservative or non-conservative alterations in conserved regions.
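The four side-chain groups named in paragraph [0011] (nonpolar, polar uncharged, basic, acidic) can be sketched as a simple lookup. The residue assignments below follow one common textbook classification and are an assumption made for illustration; they are not taken from the patent and do not reproduce its own table of substitution groups.

```python
from typing import Optional

# Assumed textbook grouping of the 20 standard amino acids (one-letter codes).
# This is an illustrative classification, not the patent's own table.
SIDE_CHAIN_GROUPS = {
    "nonpolar": set("AVLIPFWM"),
    "polar_uncharged": set("GSTCYNQ"),
    "basic": set("KRH"),
    "acidic": set("DE"),
}


def side_chain_group(residue: str) -> Optional[str]:
    """Return the group name for a one-letter residue code, or None if unknown."""
    residue = residue.upper()
    for name, members in SIDE_CHAIN_GROUPS.items():
        if residue in members:
            return name
    return None


def is_conservative_substitution(original: str, replacement: str) -> bool:
    """True if two different residues belong to the same side-chain group."""
    group_a = side_chain_group(original)
    group_b = side_chain_group(replacement)
    if group_a is None or group_b is None:
        return False  # non-standard residues are not classified in this sketch
    return original.upper() != replacement.upper() and group_a == group_b
```

Under this grouping, lysine to arginine (K to R) counts as conservative, while aspartate to lysine (D to K) does not. Whether a given substitution actually preserves activity still has to be shown experimentally, as the surrounding paragraphs note.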

[0013] In some implementations, the groups of amino acids considered to be mutually conservative substitutions include:

[0014]

[0015] Those skilled in the art will recognize that one or more amino acid residues can be altered (replaced, deleted, truncated, or inserted) from the N and / or C-terminus of a protein while retaining its functional activity. Therefore, proteins that have one or more amino acid residues altered from the N and / or C-terminus of the Cas protein of this invention while retaining their desired functional activity are also within the scope of this invention. These alterations may include those introduced by modern molecular methods such as PCR, which includes PCR amplification that alters or lengthens the protein-coding sequence by means of oligonucleotides containing amino acid-coding sequences used in the PCR amplification.

[0016] It should be recognized that proteins can be altered in a variety of ways, including amino acid substitution, deletion, truncation, and insertion, and the methods used for such operations are generally known to those skilled in the art.

[0017] For example, amino acid sequence variants of the Cas protein can be prepared by mutating the DNA. This can also be accomplished through other forms of mutagenesis and / or directed evolution, for example, by using known mutagenesis, recombination, and / or shuffling methods, combined with relevant screening methods, to perform one or more amino acid substitutions; or one or more amino acid deletions and / or one or more amino acid insertions.

[0018] Those skilled in the art will understand that these minor amino acid changes in the Cas protein of the present invention can occur (e.g., naturally occurring mutations) or be generated (e.g., using recombinant DNA technology) without loss of protein function or activity. If these mutations occur in the catalytic domain, active site, or other functional domains of the protein, the properties of the polypeptide may be altered, but the polypeptide may retain its activity. If the mutations are not located near the catalytic domain, active site, or other functional domains, a smaller impact can be expected.

[0019] Those skilled in the art can identify the essential amino acids of Cas proteins using methods known in the art, such as site-directed mutagenesis, protein evolution, or bioinformatics analysis. The catalytic domains, active sites, or other functional domains of the protein can also be determined through physical structural analysis, such as by techniques like nuclear magnetic resonance, crystallography, electron diffraction, or photoaffinity labeling, combined with mutations of amino acids at presumed key sites.

[0020] In one aspect, the present invention provides a fusion protein comprising the Cas protein of the present invention and one or more functional domains.

[0021] In one embodiment, the functional domain includes one or more of the following: localization signal, reporter protein, Cas protein targeting portion, DNA binding domain, epitope tag, transcription activation domain, transcription repression domain, nuclease, deamination domain, methyltransferase, demethylase, transcription release factor, HDAC, cleavage-active polypeptide, and ligase.

[0022] In one embodiment, the functional domain is selected from the adenosine deaminase catalytic domain or the cytidine deaminase catalytic domain.

[0023] In one embodiment, the localization signal includes a nuclear localization signal and / or a nuclear export signal;

[0024] Preferably, the nuclear export signal includes human protein tyrosine kinase 2;

[0025] Preferably, the reporter protein includes one or more of glutathione S-transferase, horseradish peroxidase, chloramphenicol acetyltransferase, β-galactosidase, β-glucuronidase, or autofluorescent protein;

[0026] Preferably, the autofluorescent protein includes one or more of green fluorescent protein, HcRed, DsRed, cyan fluorescent protein, yellow fluorescent protein, or blue fluorescent protein;

[0027] Preferably, the DNA binding domain includes one or more of methylation-binding proteins, LexA DBD, or Gal4 DBD;

[0028] Preferably, the epitope tag includes one or more of the following: histidine tag, V5 tag, FLAG tag, influenza virus hemagglutinin tag, Myc tag, VSV-G tag, or thioredoxin tag;

[0029] Preferably, the transcriptional activation domain includes VP64 and / or VPR;

[0030] Preferably, the transcriptional repression domain includes KRAB and / or SID;

[0031] Preferably, the nuclease comprises FokI;

[0032] Preferably, the deamination domain includes one or more of ADAR1, ADAR2, APOBEC, AID, or TAD;

[0033] Preferably, the cleavage-active polypeptide includes a polypeptide with single-stranded RNA cleavage activity, a polypeptide with double-stranded RNA cleavage activity, a polypeptide with single-stranded DNA cleavage activity, or a polypeptide with double-stranded DNA cleavage activity.

[0034] Preferably, the ligase includes DNA ligase and / or RNA ligase.

[0035] In one implementation, the functional domain is the full length or a functional segment of TadA8e;

[0036] In one aspect, the present invention provides a polynucleotide, said polynucleotide being a polynucleotide sequence encoding the Cas protein, or a polynucleotide sequence encoding the aforementioned fusion protein.

[0037] In one embodiment, the polynucleotide is a DNA molecule codon-optimized according to the codon preference of the host cell;

[0038] In one implementation, the host cell includes a prokaryotic cell or a eukaryotic cell;

[0039] In one embodiment, the DNA molecule comprises a nucleotide sequence having at least 70%, preferably at least 90%, more preferably at least 95%, further preferably at least 99%, and even more preferably 100% identity with the nucleotide sequence described in any one of SEQ ID Nos. 5-8.

[0040] In one aspect, the present invention provides a guide RNA comprising a direct repeat (DR) sequence capable of binding the Cas protein and a spacer sequence capable of targeting a target sequence.

[0041] In some embodiments, the 3' end of the direct repeat (DR) sequence includes a stem-loop structure, and further includes a stem formed by the hybridization of a first stem nucleotide chain and a second stem nucleotide chain, wherein the loop nucleotide chain forms the loop of the stem-loop structure.

[0042] In some implementations, the direct repeat (DR) sequence has a stem-loop structure at both the 5' and 3' ends.

[0043] In some embodiments, the stem of the stem-loop structure does not require precise base pairing. Therefore, the stem may include one or more base mismatches. Alternatively, base pairing may be precise, i.e., it may not include any mismatches.

[0044] In some embodiments, the two stem portions of the described stem-loop structure are separated by a loop.

[0045] In one embodiment, the direct repeat (DR) sequence comprises a nucleotide sequence having at least 80% identity with the nucleotide sequences described in SEQ ID Nos. 9-12;

[0046] In one embodiment, the direct repeat (DR) sequence comprises a nucleotide sequence having at least 85%, more preferably 90%, and even more preferably 95% identity with the nucleotide sequences described in SEQ ID Nos. 9-12;

[0047] As used herein, a direct repeat sequence may refer to a DNA-coding sequence at a CRISPR locus or to the RNA encoded by it in crRNA. Therefore, when referring to any of SEQ ID Nos. 9-12 in the context of an RNA molecule (such as crRNA), each T should be understood to represent U.

[0048] In one embodiment, the direct repeat (DR) sequence comprises any of the sequences shown in SEQ ID Nos. 9-12.
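The T-to-U reading rule of paragraph [0047] can be sketched as a small helper that converts a DNA-form direct repeat into its RNA form. The function name is illustrative and the example input is a hypothetical placeholder, not any of SEQ ID Nos. 9-12.

```python
def dr_dna_to_rna(dr_dna: str) -> str:
    """Return the RNA form of a direct repeat given in DNA letters.

    Each T is read as U, as described for SEQ ID Nos. 9-12 when they
    appear in the context of an RNA molecule such as crRNA.
    """
    seq = dr_dna.upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters in DNA sequence: {sorted(invalid)}")
    return seq.replace("T", "U")


# Hypothetical placeholder sequence, for illustration only.
example_rna = dr_dna_to_rna("gttTAC")
```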

[0049] In one implementation, more than 80% of the spacer sequence is complementary to the target nucleic acid;

[0050] In one embodiment, more than 90%, more than 95%, more preferably more than 99%, and even more preferably 100% of the spacer sequence is complementary to the target nucleic acid;

[0051] In one implementation, the length of the spacer sequence is 18-41 nt;

[0052] In one implementation, the spacer sequence is 20 nt in length.
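The spacer constraints above (length of 18-41 nt and more than 80% complementarity to the target) can be checked with a short sketch. It assumes that both sequences are written 5'→3', that they have equal length, and that only Watson-Crick RNA-DNA pairs count as complementary; the function names and example sequences are hypothetical.

```python
# Watson-Crick pairs between an RNA spacer base and a DNA target base.
RNA_DNA_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def spacer_length_ok(spacer_rna: str, min_len: int = 18, max_len: int = 41) -> bool:
    """True if the spacer length falls within the stated 18-41 nt range."""
    return min_len <= len(spacer_rna) <= max_len


def complementarity_fraction(spacer_rna: str, target_dna: str) -> float:
    """Fraction of spacer positions that base-pair with the target strand.

    Both sequences are given 5'->3'; pairing is antiparallel, so the
    spacer is compared against the reversed target.
    """
    if not spacer_rna or len(spacer_rna) != len(target_dna):
        raise ValueError("spacer and target must be non-empty and of equal length")
    paired = sum(
        (s, t) in RNA_DNA_PAIRS
        for s, t in zip(spacer_rna.upper(), reversed(target_dna.upper()))
    )
    return paired / len(spacer_rna)


# Hypothetical 20-nt spacer and a fully complementary target strand.
spacer = "ACGUACGUACGUACGUACGU"
target = "ACGTACGTACGTACGTACGT"
fraction = complementarity_fraction(spacer, target)
```

A spacer would satisfy paragraph [0049] when the returned fraction exceeds 0.8. Real guide design would also need to consider the PAM and off-target sites, which this sketch does not address.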

[0053] In one aspect, the present invention also provides a CRISPR-Cas composition comprising:

[0054] (1) Protein component: the aforementioned Cas protein, or the aforementioned fusion protein; or a nucleic acid molecule encoding the aforementioned Cas protein or the aforementioned fusion protein;

[0055] (2) RNA component: guide RNA, or one or more nucleic acids encoding the guide RNA, or precursor RNA of the guide RNA, or nucleic acid encoding the precursor RNA of the guide RNA;

[0056] The protein component and the RNA component combine to form a complex.

[0057] In one embodiment, the composition is an activated CRISPR complex, the activated CRISPR complex further comprising a target sequence of a target nucleic acid bound to the guide RNA.

[0058] In one embodiment, the CRISPR-Cas composition includes one or more vectors, said one or more vectors comprising:

[0059] (1) A first regulatory element, operatively linked to a nucleotide sequence encoding the Cas protein or a nucleotide sequence encoding the fusion protein; and

[0060] (2) A second regulatory element, operatively linked to a nucleotide sequence encoding the guide RNA, the guide RNA comprising:

[0061] (a) A spacer sequence capable of hybridizing with the target sequence of the target nucleic acid, and

[0062] (b) A direct repeat (DR) sequence attached to the spacer sequence that guides the Cas protein to bind to the guide RNA to form a CRISPR-Cas complex targeting the target sequence;

[0063] The first regulatory element and the second regulatory element are located on the same or different vectors of the CRISPR-Cas vector system.

[0064] In one embodiment, the first or second regulatory element includes a promoter, which includes one or more of an inducible promoter, a constitutive promoter, or a tissue-specific promoter;

[0065] In one embodiment, the promoter includes one or more of T7, SP6, T3, CMV, EF1a, SV40, PGK1, human β-actin, CAG, U6, H1, T7lac, araBAD, trp, lac, or Ptac;

[0066] In one implementation, the first regulatory element and the second regulatory element are located on the same or different vectors.

[0067] In one embodiment, the vector includes a retroviral vector, a lentiviral vector, an adenovirus vector, an adeno-associated virus vector, a herpes simplex virus vector, or a phage particle vector.

[0068] In one embodiment, the vector includes a plasmid vector.

[0069] In one embodiment, the target nucleic acid includes DNA derived from eukaryotes or DNA derived from prokaryotes;

[0070] In one implementation, the eukaryotes include animals or plants;

[0071] In one embodiment, the target nucleic acid includes non-human mammal DNA, human DNA, insect DNA, bird DNA, reptile DNA, amphibian DNA, rodent DNA, fish DNA, worm DNA, nematode DNA, or yeast DNA.

[0072] In one implementation, the non-human mammalian DNA includes non-human primate DNA.

[0073] In one embodiment, the present invention provides a CRISPR-Cas system comprising one or more vectors, said one or more vectors comprising:

[0074] (i) a first nucleic acid, which is a nucleotide sequence encoding the Cas protein or the fusion protein described in this disclosure; optionally, the first nucleic acid is operatively linked to a first regulatory element; and

[0075] (ii) a second nucleic acid encoding a nucleotide sequence comprising the guide RNA described herein; optionally, the second nucleic acid is operatively linked to a second regulatory element;

[0076] wherein:

[0077] The first nucleic acid and the second nucleic acid may exist on the same or different vectors;

[0078] The guide RNA is capable of forming a complex with the Cas protein or fusion protein described in (i).

[0079] In one aspect, the present invention also provides a delivery system comprising the Cas protein or the fusion protein, or the polynucleotide, or the CRISPR-Cas composition.

[0080] In one embodiment, the delivery system further includes a delivery medium, which includes nanoparticles, liposomes, exosomes, microbubbles, gene guns, or electroporation devices.

[0081] In another aspect, the present invention also provides a host cell or its progeny comprising the aforementioned Cas protein, or the aforementioned fusion protein, or the aforementioned polynucleotide, or the aforementioned vector system, or the aforementioned CRISPR-Cas system, or the aforementioned composition, wherein the cell or its progeny contains modifications not present in its wild type.

[0082] In one embodiment, the host cell includes a non-human mammalian, human, insect, avian, reptile, amphibian, rodent, fish, worm, nematode, or yeast cell.

[0083] In one aspect, the present invention also provides a multicellular organism comprising the aforementioned cells or their progeny.

[0084] In one implementation, the multicellular organism is an animal or plant model used for the relevant disease.

[0085] In one aspect, the present invention provides an enzyme preparation comprising the Cas protein, the fusion protein, the complex, the CRISPR-Cas composition, the system, or the delivery composition described herein.

[0086] In one aspect, the present invention also provides a method for targeting and editing a target nucleic acid, the method comprising contacting the target nucleic acid with any of the aforementioned CRISPR-Cas systems or compositions.

[0087] In one aspect, the present invention also provides a method for nonspecifically degrading single-stranded DNA after recognizing a target nucleic acid, the method comprising contacting the target nucleic acid with the aforementioned CRISPR-Cas composition.

[0088] In one aspect, the present invention also provides a method for targeting and creating a nick in a double-stranded target nucleic acid after recognizing the spacer complementary strand of the double-stranded target nucleic acid, the method comprising contacting the double-stranded target nucleic acid with the aforementioned CRISPR-Cas system or composition.

[0089] In one aspect, the present invention also provides a method for targeting and cleaving a double-stranded target nucleic acid, the method comprising contacting the double-stranded target nucleic acid with the aforementioned CRISPR-Cas system or composition.

[0090] In one embodiment, the non-spacer sequence complementary strand of the double-stranded target nucleic acid is nicked before the spacer complementary strand of the double-stranded DNA is nicked.

[0091] In one aspect, the present invention also provides a method for specifically editing double-stranded nucleic acids, the method comprising contacting, under suitable conditions and for a sufficient amount of time:

[0092] (1) the aforementioned Cas protein, or fusion protein, another enzyme with sequence-specific cleavage activity, and the guide RNA, the guide RNA instructing the Cas protein or the fusion protein to cleave the opposite strand relative to the activity of the other sequence-specific cleavage enzyme; and (2) the double-stranded nucleic acid; the method resulting in the formation of double-strand breaks.

[0093] In one aspect, the present invention also provides a method for editing double-stranded nucleic acids, the method comprising contacting, under suitable conditions and for a sufficient amount of time:

[0094] (1) a fusion protein of the aforementioned Cas protein and a protein domain having DNA modification activity, and the guide RNA targeting the double-stranded nucleic acid; and (2) the double-stranded nucleic acid;

[0095] The Cas protein of the fusion protein is modified to create a nick in the non-target strand of the double-stranded nucleic acid.

[0096] In one implementation, the two strands of the double-stranded nucleic acid are cleaved at different sites, resulting in staggered cleavage.

[0097] In one embodiment, the two strands of the double-stranded nucleic acid are cleaved at the same site, resulting in a blunt-ended double-strand break.
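The distinction between staggered cleavage and a blunt (flat) double-strand break can be sketched as a comparison of the two cut positions. The sketch assumes both cuts are expressed in one shared coordinate system (an index along the duplex); the function name and the example positions are hypothetical and do not describe the measured cleavage pattern of any protein in this document.

```python
from typing import Tuple


def classify_double_strand_break(top_cut: int, bottom_cut: int) -> Tuple[str, int]:
    """Classify a double-strand break from the cut position on each strand.

    Positions are assumed to share one coordinate system along the duplex.
    Returns the break type and the overhang length in nucleotides.
    """
    if top_cut < 0 or bottom_cut < 0:
        raise ValueError("cut positions must be non-negative")
    overhang = abs(top_cut - bottom_cut)
    kind = "blunt" if overhang == 0 else "staggered"
    return kind, overhang


# Hypothetical cut positions, for illustration only.
same_site = classify_double_strand_break(20, 20)
offset_sites = classify_double_strand_break(18, 23)
```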

[0098] In one aspect, the present invention also provides a method for targeting and cleaving a single-stranded target nucleic acid, the method comprising contacting the target nucleic acid with the CRISPR-Cas composition of the present invention.

[0099] In one aspect, the present invention also provides a method for inducing changes in cell state, the method comprising contacting the aforementioned CRISPR-Cas composition with the target nucleic acid in the cell.

[0100] In one implementation, the cell state includes apoptosis or dormancy;

[0101] In one embodiment, the cells include eukaryotic cells or prokaryotic cells;

[0102] In one embodiment, the cells include mammalian cells or plant disease cells;

[0103] In one implementation, the cells include cancer cells;

[0104] In one embodiment, the cells include infectious cells or cells infected by an infectious agent;

[0105] In one embodiment, the cells include virus-infected cells and prion-infected cells;

[0106] In one embodiment, the cells include fungal cells, protozoan or parasitic cells.

[0107] In one aspect, the present invention also provides a method for detecting target nucleic acids in a sample, the method comprising contacting the sample with the aforementioned Cas protein, guide RNA, and non-target sequence; detecting a detectable signal generated by the Cas protein cleaving the non-target sequence, thereby detecting the target nucleic acid; wherein the non-target sequence does not hybridize with the guide RNA.

[0108] In one aspect, the present invention provides a kit comprising the aforementioned Cas protein, the aforementioned fusion protein, the aforementioned polynucleotide, the aforementioned CRISPR-Cas composition, and the aforementioned host cells for use in preparing the kit, wherein the components of the kit are in the same or different containers.

[0109] In one aspect, the present invention also provides a container comprising the aforementioned kit.

[0110] In one embodiment, the container includes a sterile container;

[0111] In one embodiment, the container includes a syringe.

[0112] In one aspect, the present invention also provides the use of the aforementioned Cas protein, the aforementioned fusion protein, the aforementioned polynucleotide, the aforementioned CRISPR-Cas composition, and the aforementioned host cell in the preparation of a medicament for treating a condition or disease of a subject in need.

[0113] In one embodiment, the application includes administering the CRISPR-Cas composition to the subject or to ex vivo cells of the subject;

[0114] In one embodiment, the spacer sequence is complementary to at least 15 nucleotides of the target nucleic acid associated with the condition or disease, and the Cas protein or the fusion protein cleaves the target nucleic acid;

[0115] In one implementation, the condition or disease includes cancer or an infectious disease;

[0116] In one embodiment, the cancer includes one or more of the following: Wilms' tumor, Ewing sarcoma, neuroendocrine tumor, glioblastoma, neuroblastoma, melanoma, skin cancer, breast cancer, colon cancer, rectal cancer, prostate cancer, liver cancer, kidney cancer, pancreatic cancer, lung cancer, biliary tract cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, medullary thyroid carcinoma, ovarian cancer, glioma, lymphoma, leukemia, myeloma, acute lymphoblastic leukemia, acute myeloid leukemia, chronic lymphocytic leukemia, chronic myeloid leukemia, Hodgkin's lymphoma, non-Hodgkin's lymphoma, or urinary bladder cancer.

[0117] In one embodiment, the condition or disease includes one or more of the following: cystic fibrosis, Duchenne muscular dystrophy (progressive pseudohypertrophic muscular dystrophy), Becker's muscular dystrophy, α-1-antitrypsin deficiency, Pompe disease, myotonic dystrophy, Huntington's disease, fragile X syndrome, Friedreich's ataxia, amyotrophic lateral sclerosis, frontotemporal dementia, hereditary chronic kidney disease, hyperlipidemia, hypercholesterolemia, Leber's congenital amaurosis, sickle cell disease, transthyretin amyloidosis, or β-thalassemia.

[0118] In one embodiment, the infectious agent of the infectious disease includes one or more of human immunodeficiency virus, herpes simplex virus-1, or herpes simplex virus-2.

[0119] In one aspect, the present invention also provides a method for obtaining a plant with a desired trait, comprising using the aforementioned gene editing system to contact plant cells, modifying the genes of the plant cells or introducing a target gene, wherein the modification or target gene is capable of expressing the desired trait, thereby obtaining modified plant cells.

[0120] The modified plant cells are used for regeneration to obtain plants with the desired traits.

[0121] In one aspect, the present invention also provides a method for identifying a target trait in a plant, wherein a target gene in the plant cell is capable of expressing the target trait, and the target gene is identified by contacting the plant cell with the aforementioned CRISPR-Cas composition.

[0122] In one aspect, the present invention also provides an implantable device comprising the aforementioned CRISPR-Cas composition.

[0123] In one embodiment, the CRISPR-Cas system or composition is within a matrix;

[0124] In one embodiment, the CRISPR-Cas system or composition is stored in a reservoir.

[0125] Beneficial effects of the present invention

[0126] Compared with the Cas enzymes disclosed in the prior art, the Cas protein of the present invention is a novel Cas enzyme that exhibits better nuclease activity in vivo and in vitro and has broad application prospects.

Attached Figure Description

[0127] Figure 1 shows the map of the T7-Cas-T7-crRNA vector.

[0128] Figure 2 shows the secondary structure predictions for the direct repeat (DR) sequences corresponding to YTGE-167, YTGE-174, YTGE-190, and YTGE-192.

[0129] Figure 3 shows the target plasmid map.

[0130] Figure 4 shows the editing efficiency of YTGE-167, YTGE-174, YTGE-190, and YTGE-192 in Escherichia coli.

[0131] Figure 5 shows the cleavage efficiency of YTGE-167, YTGE-174, YTGE-190, and YTGE-192 in mammalian cells.

Detailed Implementation Plan

[0132] The following embodiments are for illustrative purposes only and are not intended to limit the invention. Unless otherwise specified, the experiments and methods described in the embodiments are generally performed in accordance with conventional methods well known in the art and described in various references.

[0133] Furthermore, unless specific conditions are specified in the examples, conventional conditions or conditions recommended by the manufacturer should be followed. Reagents or instruments whose manufacturers are not specified are all commercially available conventional products. Those skilled in the art will understand that the examples are described by way of illustration and are not intended to limit the scope of protection claimed by the invention. All disclosures and other references mentioned herein are incorporated herein by reference in their entirety.

[0134] Cas protein

[0135] In this invention, Cas protein, Cas enzyme, and Cas effector protein can be used interchangeably. Cas protein is used in the broadest sense, including wild-type Cas protein, its derivatives or variants, analogs, and its functional fragments such as oligonucleotide binding fragments.

[0136] The term “wild type” has the meaning commonly understood by those skilled in the art as referring to the typical form of an organism, strain, gene, or protein, or the characteristic that distinguishes it from mutant or variant forms when it exists in nature, which can be isolated from its natural source and has not been intentionally modified by humans.

[0137] In some embodiments, the Cas proteins of the present invention are referred to as YTGE-167, YTGE-174, YTGE-190, and YTGE-192, and their amino acid sequences are shown in SEQ ID Nos. 1-4.

[0138] The terms “variant,” “derivative,” and “analog” refer to polypeptides that substantially retain the function or activity of the Cas protein of the present invention.

[0139] In some embodiments, the Cas protein comprises an amino acid sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, or 100% identity with the amino acid sequence shown in any of SEQ ID Nos. 1-4.

[0140] In some embodiments, the amino acid sequence of the Cas protein has one or more amino acid substitutions, deletions, or additions compared to any of the amino acid sequences in SEQ ID Nos. 1-4, and substantially retains the biological function of the sequence from which it is derived; the one or more amino acid substitutions, deletions, or additions include one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) amino acid substitutions, deletions, or additions.

[0141] In some implementations, the Cas protein is derived from the same or different species as YTGE-167, YTGE-174, YTGE-190, and YTGE-192.

[0142] Generally, protein derivatization does not adversely affect the protein's desired activity (e.g., activity with guide RNA, endonuclease activity, activity with guide RNA-guided binding to and cleavage at specific sites on the target sequence); that is, the protein derivative has the same activity as the original protein. A modified form of "derivative" includes one or more amino acids of the protein that may be deleted, inserted, modified, and / or substituted. The terms "non-natural" or "engineered" are used interchangeably and indicate artificial involvement.

[0143] Orthologue (ortholog)

[0144] As used herein, the term "orthologue" has the meaning commonly understood by those skilled in the art. As further guidance, an "orthologue" of a protein, as described herein, refers to a protein belonging to a different species that performs the same or similar function as the protein that is its orthologue.

[0145] Nucleic acid cleavage in the present invention includes: DNA or RNA breaks generated by the Cas protein in the target nucleic acid (cis cleavage), and DNA or RNA breaks in collateral nucleic acid substrates (single-stranded nucleic acid substrates) caused by the collateral cleavage activity of the Cas protein (i.e., non-specific or non-targeted trans cleavage). In some embodiments, the cleavage is a double-stranded DNA break. In some embodiments, the cleavage is a single-stranded DNA break or a single-stranded RNA break.

[0146] Collateral cleavage refers to the phenomenon in which, under certain conditions, some activated Cas proteins remain active after binding to a target sequence and continue to cleave non-target oligonucleotides nonspecifically. This collateral cleavage activity enables the detection of specific target oligonucleotides using Cas systems. For example, the Cas12i system has been engineered to nonspecifically cleave ssDNA or transcripts. Collateral cleavage activity has been used in a highly sensitive and specific nucleic acid detection platform called SHERLOCK, which can be used in many clinical diagnostics (Gootenberg, J.S. et al., Nucleic acid detection with CRISPR-Cas13a/C2c2. Science 356, 438-442 (2017)).

[0147] CRISPR system

[0148] The terms “clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated (Cas) (CRISPR-Cas) system” and “CRISPR system” are used interchangeably and have the meaning commonly understood by those skilled in the art. Such a system typically includes transcripts or other elements involved in the expression of CRISPR-associated (“Cas”) genes, or transcripts or other elements capable of directing the activity of said Cas genes.

[0149] CRISPR / Cas complex

[0150] The term "CRISPR / Cas complex" refers to a complex formed by the binding of a guide RNA (gRNA) or mature crRNA to the Cas protein. The complex contains a spacer sequence that hybridizes to the target sequence and a direct repeat sequence that binds to the Cas protein. The complex can recognize and cleave a target polynucleotide that hybridizes with the guide RNA or mature crRNA.

[0151] Guide RNA (gRNA, guideRNA)

[0152] The terms “guide RNA (gRNA),” “mature crRNA,” and “guide sequence” are used interchangeably and have the meanings commonly understood by those skilled in the art. Generally, a guide RNA may comprise, consist essentially of, or consist of a direct repeat (DR) sequence and a spacer sequence.

[0153] In some cases, the guide sequence is any polynucleotide sequence that is sufficiently complementary to the target sequence to hybridize with the target sequence and guide the CRISPR-Cas complex to specifically bind to the target sequence. In one embodiment, when optimally aligned, the complementarity between the guide sequence and its corresponding target sequence is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99%.

[0154] The gRNAs of YTGE-167, YTGE-174, YTGE-190, and YTGE-192 of the present invention contain a spacer sequence that hybridizes with a target nucleic acid, wherein the target nucleic acid includes a sequence located 3' of a protospacer adjacent motif (PAM).

[0155] In some implementations, the PAM sequence is 5'-TTN-3', where N is selected from A, T, C, or G.
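By way of illustration only, the relationship between the PAM and the target sequence can be sketched in Python as follows. The sketch scans a single strand of a DNA sequence for 5'-TTN-3' motifs and reports the bases immediately 3' of each motif as a candidate target. The function name `find_protospacers`, the default length of 20 nt, and the restriction to one strand are assumptions made for this example and are not taken from the present disclosure.

```python
import re


def find_protospacers(dna: str, spacer_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (pam_start, pam, candidate_target) for each 5'-TTN-3' PAM.

    Only the strand given in `dna` is scanned; `spacer_len` is an
    illustrative assumption, not a value stated in the specification.
    """
    dna = dna.upper()
    hits = []
    # A lookahead is used so that overlapping PAMs (e.g. within "TTTT") are all found.
    for match in re.finditer(r"(?=(TT[ACGT]))", dna):
        start = match.start()
        # The candidate target lies immediately 3' of the three-base PAM.
        candidate = dna[start + 3 : start + 3 + spacer_len]
        if len(candidate) == spacer_len:
            hits.append((start, match.group(1), candidate))
    return hits


if __name__ == "__main__":
    for pam_start, pam, candidate in find_protospacers("CCTTGAAAAACC", spacer_len=5):
        print(pam_start, pam, candidate)
```

A fuller search would also scan the reverse complement strand and filter candidates by additional criteria such as off-target similarity.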

[0156] Target nucleic acid

[0157] In this invention, the terms "target nucleic acid" and "target sequence" or "target nucleic acid sequence" are used interchangeably to refer to a specific nucleic acid containing a nucleic acid sequence that is wholly or partially complementary to the spacer sequence in the guide RNA. "Target sequence" refers to a polynucleotide targeted by the spacer sequence in the guide RNA, such as a sequence complementary to that spacer sequence, wherein hybridization between the target sequence and the spacer sequence will promote the formation of a CRISPR-Cas complex (including the Cas protein and the guide RNA). Perfect complementarity is not required, as long as sufficient complementarity exists to induce hybridization and promote the formation of a CRISPR-Cas complex. In some embodiments, the target nucleic acid contains a non-coding region (e.g., a promoter or terminator). In some embodiments, the target nucleic acid is single-stranded or double-stranded.

[0158] The target sequence can contain any polynucleotide, such as DNA or RNA. In some cases, the target sequence is located inside or outside the cell. In other cases, the target sequence is located in the cell nucleus, cytoplasm, or organelles (such as mitochondria or chloroplasts).

[0159] The target nucleic acid can be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA). In some cases, the target sequence should be associated with a protospacer adjacent motif (PAM).

[0160] Donor template

[0161] In this invention, the donor template nucleic acid or the donor template can be used interchangeably, meaning that after the Cas protein described herein alters the target nucleic acid, one or more cellular proteins can use it to alter the structure of the target nucleic acid.

[0162] In some embodiments, the donor template nucleic acid is a double-stranded or single-stranded nucleic acid. In some embodiments, the donor template nucleic acid is linear or circular (e.g., a plasmid). In some instances, the donor template nucleic acid is a foreign nucleic acid molecule. In some instances, the donor template nucleic acid is an endogenous nucleic acid molecule (e.g., a chromosome). In some embodiments, gene recombination, specifically homologous recombination, can be achieved using the donor template.

[0163] Cleavage

[0164] Cleavage refers to a break in the DNA of the target nucleic acid produced by the Cas protein described herein. In some embodiments, cleavage is a double-stranded DNA break. In some embodiments, cleavage is a single-stranded DNA break.

[0165] In this invention, the meanings of cleaving target nucleic acids or modifying target nucleic acids can overlap. Modifying target nucleic acids includes not only the modification of single nucleotides, but also the insertion or deletion of nucleic acid fragments.

[0166] Reporter nucleic acid

[0167] A reporter nucleic acid is a molecule that can be cleaved or otherwise inactivated by an activated CRISPR system protein as described herein. A reporter nucleic acid comprises a nucleic acid element that can be cleaved by a CRISPR protein (e.g., a single-stranded, non-targeting nucleic acid molecule with distinct reporter groups or labeled molecules at both ends). Cleavage of the nucleic acid element produces a detectable signal. Prior to cleavage, or while the reporter nucleic acid is in an “active” state, the reporter nucleic acid prevents the generation or detection of a positive detectable signal. It will be understood that in some example embodiments, minimal background signal may be generated in the presence of an active reporter nucleic acid. A positive detectable signal can be any signal that can be detected using optical, fluorescent, chemiluminescent, electrochemical, or other detection methods known in the art. For example, in some embodiments, a first signal (i.e., a negative detectable signal) may be detected in the presence of a reporter nucleic acid, and then converted to a second signal (e.g., a positive detectable signal) upon detection of a target molecule and upon cleavage or inactivation by an activated CRISPR protein. The reporter nucleic acid can be a single-stranded DNA molecule, a single-stranded RNA molecule, or a single-stranded DNA-RNA hybrid.

[0168] The detection method described in this invention can be used for the quantitative detection of target nucleic acids. The quantitative detection index can be determined based on the signal strength of the reporter group, such as the luminescence intensity of the fluorescent group or the width of the colored band.

[0169] Functional domain

[0170] In this invention, the term "functional domain" is used in its broadest sense, encompassing proteins such as enzymes or factors themselves, or fragments / domains with specific functions. Cas proteins (e.g., dCas proteins) are linked / associated with one or more functional domains, selected from one or more of the following: localization signals, reporter proteins, Cas protein targeting portions, DNA-binding domains, epitope tags, transcriptional activation domains, transcriptional repression domains, nucleases, deamination domains, methyltransferases, demethylases, transcription release factors, HDACs, cleavage-active peptides, and ligases. When more than one functional domain is included, the functional domains may be the same or different.

[0171] Deamination domain

[0172] In this invention, the deamination domain includes a deaminase (e.g., adenosine deaminase or cytidine deaminase) catalytic domain. As used herein, "adenosine deaminase" or "adenosine deaminase protein" refers to a protein, polypeptide, or one or more functional domains of a protein or polypeptide capable of catalyzing a hydrolytic deamination reaction that converts adenine (or the adenine portion of a molecule) to hypoxanthine (or the hypoxanthine portion of a molecule), as shown below. In some embodiments, the adenine-containing molecule is adenosine (A), and the hypoxanthine-containing molecule is inosine (I). The adenine-containing molecule may be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).

[0173] Adenosine deaminases include, but are not limited to, members of the enzyme family known as adenosine deaminases acting on RNA (ADAR), members of the enzyme family known as adenosine deaminases acting on tRNA (ADAT), and other family members containing an adenosine deaminase domain (ADAD). According to this disclosure, adenosine deaminases are capable of targeting adenine in RNA / DNA heteroduplexes and RNA duplexes. In certain embodiments, adenosine deaminases have been modified to increase their ability to edit DNA in RNA / DNA heteroduplexes or RNA duplexes.

[0174] In some embodiments, the deaminase is a cytidine deaminase. The term "cytidine deaminase" or "cytidine deaminase protein" refers to a protein, polypeptide, or one or more functional domains of a protein or polypeptide that catalyzes a hydrolytic deamination reaction that converts cytosine (or the cytosine portion of a molecule) to uracil (or the uracil portion of a molecule). In some embodiments, the cytosine-containing molecule is cytidine (C), and the uracil-containing molecule is uridine (U). The cytosine-containing molecule may be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).

[0175] Cytidine deaminases include, but are not limited to, members of an enzyme family known as the apolipoprotein B mRNA editing complex (APOBEC) family of deaminases, activation-induced deaminase (AID), or cytidine deaminase 1 (CDA1). In specific embodiments, APOBEC family deaminases are used.

[0176] Identity

[0177] "Identity" refers to the sequence matching between two polypeptides or two nucleic acids. "Identity" represents the percentage of identical residues in the polypeptide or nucleic acid sequence out of the total number of residues, and is calculated based on mutation type. Mutation types include insertions (extensions) at either end of a sequence, deletions (truncations) at either end of a sequence, substitutions of one or more amino acids / nucleotides, insertions within a sequence, and deletions within a sequence.

[0178] For example, with polypeptide sequences, if the mutation type is one or more of the following: substitution / replacement of one or more amino acids / nucleotides, insertion within the sequence, and deletion within the sequence, the total residue count is calculated based on the larger of the compared molecules. If the mutation type also includes insertions (extensions) or deletions (truncations) at either end of the sequence, the number of amino acids inserted or deleted at either end (e.g., less than 20 at either end) is not included in the total residue count. When calculating the percentage of identity, the sequences being compared are aligned in a manner that produces the maximum match between sequences, and gaps in the alignment (if present) are resolved using a specific algorithm. Nucleotide identity is calculated similarly.
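The calculation described above can be sketched in Python as follows, purely as one possible reading of the rule. The sketch assumes that the two sequences have already been aligned by some external algorithm and are supplied with `-` marking gaps; it drops short terminal overhangs (fewer than `max_end_gap` columns at either end) from the residue count and divides the number of identical positions by the length of the longer remaining molecule. The function name `percent_identity` and the handling of overhangs of 20 or more residues (they are simply kept in the count) are assumptions of this example.

```python
def percent_identity(aln_a: str, aln_b: str, max_end_gap: int = 20) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    cols = list(zip(aln_a.upper(), aln_b.upper()))

    def overhang(columns) -> int:
        # Count leading columns in which exactly one sequence has a gap,
        # i.e. a terminal extension or truncation.
        count = 0
        for x, y in columns:
            if (x == "-") != (y == "-"):
                count += 1
            else:
                break
        return count

    left = overhang(cols)
    right = overhang(reversed(cols))
    # Assumption: only short terminal overhangs are excluded from the count.
    if left >= max_end_gap:
        left = 0
    if right >= max_end_gap:
        right = 0
    core = cols[left : len(cols) - right]

    # The total residue count is based on the larger of the compared molecules.
    len_a = sum(x != "-" for x, _ in core)
    len_b = sum(y != "-" for _, y in core)
    total = max(len_a, len_b)
    if total == 0:
        raise ValueError("no residues left to compare")
    matches = sum(x == y and x != "-" for x, y in core)
    return 100.0 * matches / total


if __name__ == "__main__":
    # One internal deletion relative to a six-residue sequence.
    print(round(percent_identity("MKT-LV", "MKTALV"), 2))
```

Under this reading, a candidate sequence would meet the "at least 80% identity" criterion when the returned value is 80.0 or higher.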

[0179] Vector

[0180] A vector is a nucleic acid molecule that can transport another nucleic acid molecule to which it is linked.

[0181] Vectors include, but are not limited to, single-stranded, double-stranded, or partially double-stranded nucleic acid molecules; nucleic acid molecules including one or more free ends, or those without free ends (e.g., circular); nucleic acid molecules including DNA, RNA, or both; and a wide variety of other polynucleotides known in the art. Vectors can be introduced into host cells through transformation, transduction, or transfection, thereby enabling the expression of their carried genetic material elements in the host cells. A vector can be introduced into a host cell to produce transcripts, proteins, or peptides, including proteins, fusion proteins, isolated nucleic acid molecules, etc., as described herein (e.g., CRISPR transcripts, such as nucleic acid transcripts, proteins, or enzymes). A vector may contain a variety of elements controlling expression, including but not limited to, promoter sequences, transcription initiation sequences, enhancer sequences, selection elements, and reporter genes. Vectors may also contain a replication initiation site.

[0182] Vectors include plasmids and viral vectors. A plasmid is a circular double-stranded DNA loop into which another DNA fragment can be inserted, for example, using standard molecular cloning techniques. A viral vector contains virus-derived DNA or RNA sequences for packaging into a virus; viruses include, for example, retroviruses, replication-defective retroviruses, adenoviruses, replication-defective adenoviruses, and adeno-associated viruses. Viral vectors also contain polynucleotides carried by a virus intended for transfection into a host cell. Some vectors (e.g., bacterial vectors with bacterial origins of replication and episomal mammalian vectors) are capable of autonomous replication in the host cells into which they are introduced.

[0183] Other vectors (e.g., non-episomal mammalian vectors) integrate into the host cell's genome after introduction and thereby replicate along with the host genome. Furthermore, some vectors can direct the expression of genes to which they are operably linked. Such vectors are called "expression vectors."

[0184] Those skilled in the art will understand that the design of expression vectors can depend on factors such as the selection of host cells to be transformed and the desired expression level.

[0185] Regulatory element

[0186] "Regulatory elements" include promoters, enhancers, internal ribosome entry sites (IRES), and other expression control elements (e.g., transcription termination signals, such as polyadenylation signals and poly-U sequences), which are described in detail in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990). In some cases, regulatory elements include those sequences that direct constitutive expression of a nucleotide sequence in many types of host cells and those sequences that direct expression of that nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). Tissue-specific promoters may primarily direct expression in the desired tissue of interest, such as muscle, neurons, bone, skin, blood, specific organs (e.g., liver, pancreas), or specific cell types (e.g., lymphocytes). In other cases, regulatory elements may also direct expression in a time-dependent manner (e.g., in a cell cycle-dependent or developmental stage-dependent manner), which may or may not be tissue- or cell type-specific.

[0187] A promoter is a non-coding nucleotide sequence located upstream of a gene that initiates the expression of downstream genes. A constitutive promoter is a nucleotide sequence that, when operatively linked to a polynucleotide encoding or defining a gene product, will result in the production of that gene product in the cell under most or all physiological conditions. An inducible promoter is a promoter that selectively expresses a coding sequence or functional RNA in response to the presence of endogenous or exogenous stimuli, such as through a chemical compound (chemical inducer), or in response to environmental, hormone, chemical, and / or developmental signals. Inducible or regulatory promoters include promoters induced or regulated, for example, by light, heat, stress, flooding or drought, salt stress, osmotic stress, plant hormones, wounds, or chemicals (such as ethanol, abscisic acid (ABA), jasmonic acid esters, salicylic acid, or safeners).

[0188] Host cell

[0189] "Host cell" refers to a eukaryotic cell (e.g., animal cell, plant cell, fungal cell, etc.), a prokaryotic cell (e.g., some microbial cells, Escherichia coli, Bacillus subtilis, etc.), or a cell derived from a multicellular organism (e.g., a cell line) cultured in the form of a single-celled entity, said cell serving as a recipient of nucleic acid (e.g., an expression vector), and includes the progeny of the original cell that has been genetically modified with nucleic acid.

[0190] It should be understood that the progeny of a single cell, owing to natural, accidental, or intentional mutations, do not necessarily have exactly the same morphology or genome as the original parent cell. A “recombinant host cell” (also known as a “genetically modified host cell”) is a host cell into which a heterologous nucleic acid, such as an expression vector, has been introduced.

[0191] In some implementations, the host cell is derived from a plant or an animal.

[0192] In some embodiments, the plant is a dicotyledonous plant. In some embodiments, the dicotyledonous plant is selected from soybean, cabbage (e.g., Chinese cabbage), rapeseed, Brassica, watermelon, melon, potato, tomato, tobacco, eggplant, pepper, cucumber, cotton, alfalfa, and grape.

[0193] In some implementations, the plant is a monocotyledonous plant. Exemplarily, the monocotyledonous plant is selected from rice, corn, wheat, barley, oats, sorghum, millet, grasses, Poaceae, Zizania, Avena, Coix, Hordeum, Oryza, Panicum (e.g., millet), Secale, Setaria (e.g., foxtail millet), Sorghum, Triticum, Zea, Cymbopogon, and Saccharum (e.g., sugarcane). The genera *Phyllostachys*, *Dendrocalamus*, *Bambusa*, and *Yushania* are all related to bamboo.

[0194] In some embodiments, the animal is selected from pigs, cattle, sheep, goats, mice, rats, alpacas, monkeys, rabbits, chickens, ducks, geese, and fish (e.g., zebrafish). In some embodiments, the cells are eukaryotic cells, for example mammalian cells, including human cells (primary human cells or established human cell lines). In some embodiments, the cells are non-human mammalian cells, such as cells from non-human primates (e.g., monkeys), cows / bulls / cattle, sheep, goats, pigs, horses, dogs, cats, and rodents (such as rabbits, mice, rats, hamsters, etc.). In some embodiments, the cells are derived from fish (such as salmon), birds (such as poultry, including chickens, ducks, geese), reptiles, shellfish (e.g., oysters, clams, lobsters, shrimp), insects, worms, yeast, etc. In some embodiments, the cells are derived from plants, such as monocots or dicots. In some embodiments, the plant is a food crop, such as barley, cassava, cotton, peanut or groundnut, corn, millet, oil palm fruit, potato, dried beans, rapeseed or canola, rice, rye, sorghum, soybean, sugarcane, sugar beet, sunflower, and wheat. In some embodiments, the plant is a cereal (barley, corn, millet, rice, rye, sorghum, and wheat). In some embodiments, the plant is a tuber (cassava and potato). In some embodiments, the plant is a sugar crop (sugar beet and sugarcane). In some embodiments, the plant is an oilseed crop (soybean, peanut or groundnut, rapeseed or canola, sunflower, and oil palm fruit). In some embodiments, the plant is a fiber crop (cotton). In some embodiments, the plant is a tree (such as a peach or nectarine tree, an apple or pear tree, a nut tree (such as an almond, walnut, or pistachio tree), or a citrus tree (e.g., an orange, grapefruit, or lemon tree)), a grass, a vegetable, a fruit, or an alga. In some embodiments, the plant is a plant of the genus *Solanum*, *Brassica*, *Lactuca*, *Spinacia*, or *Capsicum*, or is cotton, tobacco, asparagus, carrot, cabbage, broccoli, cauliflower, tomato, eggplant, pepper, lettuce, spinach, strawberry, blueberry, raspberry, blackberry, grape, coffee, cocoa, etc.

[0195] NLS

[0196] NLS stands for “nuclear localization sequence” or “nuclear localization signal,” which is an amino acid sequence that promotes the entry of a protein into the cell nucleus. Nuclear localization sequences are known in the art (e.g., described in Plank et al., International PCT Application PCT / EP2000 / 011690, filed November 23, 2000, and published as WO / 2001 / 038547 on May 31, 2001, the contents of which are incorporated herein by reference for their disclosure of exemplary nuclear localization sequences). In other embodiments, the NLS is an optimized NLS, for example, as described in Koblan et al., Nature Biotech. 2018 doi:10.1038 / nbt.4172. In some embodiments, the NLS comprises one of the following amino acid sequences: KRTADGSEFESPKKKRKV, AVKRPAATKKAGQAKKKKLD, KRPAATKKAGQAKKKK, KKTELQTTNAENKTKKL, KRGINDRNFWRGENGRKTR, RKSGKIAAIVVKRPRK, PKKKRKV, or MDSLLMNRRKFLYQFKNVRWAKGRRETYLC.

[0197] Operably linked

[0198] "Operably linked" refers to the linkage of a nucleotide sequence of interest to a regulatory element in a manner that allows for the expression of the nucleotide sequence (e.g., in an in vitro transcription / translation system or in a host cell when a vector is introduced into the host cell). Advantageous vectors include lentiviruses and adeno-associated viruses, and the type of these vectors can also be selected to target specific cell types.

[0199] Complementarity

[0200] "Complementarity" refers to the ability of one nucleic acid sequence to form one or more hydrogen bonds with another nucleic acid sequence via conventional Watson-Crick base pairing or other non-conventional types of base pairing. The complementarity percentage represents the percentage of residues in one nucleic acid molecule that can form hydrogen bonds (e.g., Watson-Crick base pairing) with another nucleic acid sequence (e.g., if 5, 6, 7, 8, 9, or 10 out of 10 are complementary, the complementarity percentages are 50%, 60%, 70%, 80%, 90%, and 100%, respectively). "Complete complementarity" means that all consecutive residues in one nucleic acid sequence form hydrogen bonds with the same number of consecutive residues in another nucleic acid sequence. "Substantially complementary" refers to a complementarity of at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% in a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50 or more nucleotides, or to two nucleic acids that hybridize under stringent conditions.

[0201] The term "stringent conditions" in connection with hybridization refers to conditions under which a nucleic acid complementary to a target sequence hybridizes primarily with that target sequence and substantially does not hybridize to non-target sequences. Stringent conditions are typically sequence-dependent and depend on many factors. Generally, the longer the sequence, the higher the temperature at which it specifically hybridizes to its target sequence.
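As a minimal sketch of the complementarity percentage defined above, the following Python function pairs two equal-length sequences in antiparallel orientation and counts the positions that form Watson-Crick pairs. The function name `percent_complementarity` is hypothetical, and the sketch is deliberately restricted to Watson-Crick pairs (A-T, A-U, G-C); non-conventional pairings such as G-U wobble pairs are not counted, which is a simplifying assumption of this example.

```python
# Watson-Crick pairs only; DNA/RNA hybrids are allowed (T or U opposite A).
WATSON_CRICK = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
}


def percent_complementarity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions in seq_a that pair with seq_b.

    Both sequences are written 5'->3'; seq_b is reversed so that the
    two strands are compared in antiparallel orientation.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    antiparallel_b = seq_b.upper()[::-1]
    paired = sum(
        (x, y) in WATSON_CRICK for x, y in zip(seq_a.upper(), antiparallel_b)
    )
    return 100.0 * paired / len(seq_a)


if __name__ == "__main__":
    spacer = "ACGUACGUAC"   # illustrative RNA sequence, not from the sequence listing
    target = "GTACGTACGT"   # illustrative DNA strand, fully complementary to it
    print(percent_complementarity(spacer, target))
```

The same function can be applied to a window of a longer alignment to check whether a region meets one of the "substantially complementary" thresholds listed above.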

[0202] "Hybridization" refers to a reaction in which one or more polynucleotides react to form a complex that is stabilized by hydrogen bonds between the bases of these nucleotide residues. This complex can consist of two strands forming a duplex, three or more strands forming a multi-stranded complex, a single self-hybridizing strand, or any combination thereof. Hybridization can be a step in a broader process, such as the initiation of PCR or the cleavage of a polynucleotide by an enzyme. A sequence capable of hybridizing with a given sequence is called the "complement" of that given sequence.

[0203] Hybridization of the target sequence with gRNA indicates that at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% of the nucleic acid sequences of the target sequence and gRNA can hybridize to form a complex; or it indicates that at least 12, 15, 16, 17, 18, 19, 20, or more bases of the nucleic acid sequences of the target sequence and gRNA can complement each other to form a complex.

[0204] Expression

[0205] Nucleic acid expression includes one or more of the following: generating an RNA template from a DNA sequence (e.g., transcription), processing of RNA transcripts (e.g., by splicing, editing, 5′ cap formation and / or 3′ end processing), translating RNA into a polypeptide or protein, or post-translational modifications of a polypeptide or protein.

[0206] Delivery

[0207] "Delivery" refers to providing an entity (such as a drug) to a destination. For example, components of the CRISPR-Cas system / composition of the present invention can be delivered in various forms, such as DNA / RNA or RNA / RNA or a combination of protein and RNA. For example, Cas proteins can be delivered as DNA or RNA polynucleotides encoding them, or as proteins.

[0208] Linker

[0209] A linker is a linear polypeptide formed by multiple amino acid residues linked together by peptide bonds. Linkers can be artificially synthesized amino acid sequences or naturally occurring polypeptide sequences.

[0210] Treatment

[0211] "Treatment" refers to treating or curing a subject's condition, delaying the onset of symptoms, and / or reducing the severity of the condition. The term "subject" includes, but is not limited to, various animals, plants, and microorganisms. Animals include mammals such as bovines, equines, sheep, swine, canines, felines, lagomorphs, rodents (e.g., mice or rats), non-human primates (e.g., macaques or cynomolgus monkeys), or humans. In some embodiments, the subject (e.g., a human) suffers from a condition (e.g., a condition caused by a disease-related gene defect). A "plant" is any differentiated multicellular organism capable of photosynthesis, including crop plants at any stage of maturity or development.

[0212] Example 1. Obtaining the Cas protein

[0213] The inventors analyzed metagenomes of uncultured organisms and identified four new Cas proteins through redundancy removal and protein clustering. BLAST analysis showed that the Cas proteins had low sequence similarity to previously reported Cas proteins. In this invention, these Cas proteins were named YTGE-167, YTGE-174, YTGE-190, and YTGE-192.

[0214] The amino acid sequences of the aforementioned Cas proteins are shown in SEQ ID Nos. 1-4, and the nucleic acid sequences encoding them are shown in SEQ ID Nos. 5-8. Analysis of the direct repeat (DR) sequences of the guide RNAs corresponding to these proteins revealed the DR sequence for each Cas protein, as shown in SEQ ID Nos. 9-12. The inventors further analyzed the RNA secondary structure of each DR sequence using RNAfold. The results are shown in Figure 2. PAM library depletion experiments (conducted according to Karvelis et al., Methods. May 15, 2017; 121-122:3-8, the entire contents of which are incorporated herein by reference) identified that the PAM corresponding to each of the above Cas proteins was 5'-TTN-3'.

[0215] Example 2. Validation of Cas protein cleavage activity

[0216] 1. Plasmid construction

[0217] (1) With the TTR gene as the target, the spacer sequence TTR-spacer (SEQ ID NO.13) was designed, and the TTR-crRNA sequence for each Cas protein was designed as shown in the table below (the crRNA sequence structure is 5'-DR sequence-spacer sequence-3').
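The 5'-DR sequence-spacer sequence-3' arrangement described above can be illustrated with a short Python sketch. The function name `build_crrna` is hypothetical, and the sequences in the usage example are placeholders chosen only to show the concatenation; they are not the DR sequences of SEQ ID Nos. 9-12 or the TTR-spacer of SEQ ID NO.13.

```python
def build_crrna(dr: str, spacer: str) -> str:
    """Assemble a crRNA as 5'-DR-spacer-3' and return it as an RNA string.

    Input may be written as DNA or RNA; T is converted to U.
    """
    rna = (dr + spacer).upper().replace("T", "U")
    invalid = set(rna) - set("ACGU")
    if invalid:
        raise ValueError(f"unexpected characters in sequence: {sorted(invalid)}")
    return rna


if __name__ == "__main__":
    # Placeholder sequences for illustration only (not from the sequence listing).
    placeholder_dr = "GGAATTCC"
    placeholder_spacer = "ACGTACGT"
    print(build_crrna(placeholder_dr, placeholder_spacer))
```

In practice the DR sequence of the relevant Cas protein and the spacer for the chosen target would be substituted for the placeholders.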

[0218] Table 1

[0219]

[0220] The coding nucleotide sequences (SEQ ID NO. 5-8) of YTGE-167, YTGE-174, YTGE-190, and YTGE-192, along with their corresponding TTR-crRNAs, were cloned into pcDNA3.1-HA (Addgene, #128034). Expression of the crRNA was placed under the control of the T7 promoter to obtain the recombinant expression plasmid, the T7-Cas-T7-crRNA vector (vector map shown in Figure 1).

[0221] (2) The araC-pBAD-CCDB fragment (SEQ ID NO.18) containing the TTR target sequence (SEQ ID NO.13) was synthesized by Suzhou Hongxun Biotechnology Co., Ltd., and the araC-pBAD-CCDB fragment was inserted into positions 1284-1300 of the pKESK22 (Addgene, Plasmid #64857) plasmid to obtain the Target plasmid. The sequence of the Target plasmid is shown in SEQ ID NO.19, and the plasmid map is shown in Figure 3.

[0222] 2. Preparation and transformation of competent Escherichia coli cells

[0223] The Target plasmid was transferred into DH5α competent cells and isolated by streaking with an inoculating loop onto LB agar containing 50 μg / ml kanamycin sulfate. The cells were incubated overnight at 37°C. The next day, a single colony was picked from the plate and inoculated into a test tube containing 4 ml of LB liquid medium containing 50 μg / ml kanamycin sulfate (Sangon Biotech, A100408-0100). The colony was incubated overnight at 37°C with shaking at 200 rpm. The following day, 4 ml of the bacterial culture was inoculated into a 2L shake flask containing 400 ml of LB liquid medium containing 50 μg / ml kanamycin sulfate and incubated at 37°C with shaking at 200 rpm for 2-3 hours.

[0224] When the OD600 of the bacterial suspension reached 0.3-0.5, the shake flask was removed and placed on ice for 10-15 minutes. Under aseptic conditions, the bacterial suspension was poured into a pre-chilled 500 ml centrifuge bottle and centrifuged at 4°C and 3000 rpm for 8 minutes. The supernatant was discarded, approximately 200 ml of pre-chilled CaCl2 solution was added, the cells were resuspended by pipetting, and the suspension was incubated on ice for 30 minutes. The suspension was then centrifuged again at 4°C and 3000 rpm for 8 minutes, the supernatant was discarded, and the cells were resuspended in approximately 8 ml of pre-chilled CaCl2 solution. The resuspended cells were aliquoted into 1.5 ml EP tubes (110 μl per tube) and stored at -80°C for later use.
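The volumes in [0223] and [0224] imply two simple ratios, worked through in the sketch below. The function names are hypothetical. The aliquot count is only an estimate, since the text gives the final resuspension volume as approximately 8 ml and the volume of the cell pellet is ignored.

```python
def medium_to_inoculum_ratio(inoculum_ml: float, medium_ml: float) -> float:
    """Ratio of fresh medium volume to overnight-culture inoculum volume."""
    return medium_ml / inoculum_ml


def aliquot_count(total_ml: float, aliquot_ul: float) -> int:
    """Number of full aliquots obtainable from the resuspension volume."""
    return int((total_ml * 1000) // aliquot_ul)


print(medium_to_inoculum_ratio(4, 400))  # 4 ml into 400 ml -> 100.0
print(aliquot_count(8, 110))             # about 8 ml in 110 ul tubes -> 72
```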

[0225] 3. Determination of in vivo editing efficiency of Escherichia coli

[0226] 100 ng of the T7-Cas-T7-crRNA vector was transformed into the competent E. coli cells. After thawing for half an hour, the cells were plated onto two types of plates: CL plates, containing disodium carbenicillin (Sangon Biotech, A100358-0001) and L-arabinose (Sangon Biotech, A610071-0100), and CK plates, containing kanamycin sulfate (Sangon Biotech, A100408-0100) and disodium carbenicillin (Sangon Biotech, A100358-0001). The plates were incubated overnight for 16 h. On the CL plates, L-arabinose induces expression of the toxic ccdB gene carried on the Target plasmid, so the bacteria survived only if the Cas protein cleaved the Target plasmid. The editing efficiency was calculated as the colony count on the CL plate divided by the colony count on the CK plate. The editing efficiency of each Cas protein was tested using the above method, and the results are shown in Figures 4A-4B. Analysis reveals that each Cas protein, including YTGE-167, YTGE-174, YTGE-190, and YTGE-192, exhibits significant cleavage activity and has potential application prospects.
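The ratio defined in [0226] (colony count on the CL plate divided by colony count on the CK plate) can be written as a small helper. The function name and the colony counts are hypothetical and are not data from the patent.

```python
def editing_efficiency(cl_colonies: int, ck_colonies: int) -> float:
    """Colony count on the CL plate divided by the count on the CK plate."""
    if ck_colonies <= 0:
        raise ValueError("CK plate must have at least one colony")
    if cl_colonies < 0:
        raise ValueError("colony counts cannot be negative")
    return cl_colonies / ck_colonies


# Hypothetical counts for illustration; not data from the patent.
print(f"{editing_efficiency(180, 240):.0%}")  # 75%
```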

[0227] Example 3. Targeting of various nucleases in mammalian cells

[0228] To evaluate the cleavage activity of various nucleases in mammalian cells, the applicant cloned various nucleases (SEQ ID NO. 5-8), including YTGE-167, YTGE-174, YTGE-190, and YTGE-192, into the pcDNA3.1 (Invitrogen, V79020) backbone to obtain nuclease expression plasmids. The HAO1 and TTR genes were selected as targets, and the target sequences are shown in the table below.

[0229] Table 2

[0230]

[0231] The corresponding crRNA sequences are designed as shown in the table below:

[0232] Table 3

[0233]

[0234] The crRNA was cloned into the pUC19 vector (NEB, N3041L), with expression driven by the U6 promoter, to obtain crRNA expression plasmids. The constructed plasmids were then transformed into E. coli DH5α competent cells (Weidi Biotechnology, DL1001). After large-scale amplification, the plasmid concentrations were measured and the plasmids were stored at -20°C for later use.

[0235] HEK293T cells (purchased from ATCC) were cultured in DMEM medium (Gibco, 11965092) supplemented with 10% FBS (v/v) and 1% Penicillin-Streptomycin (v/v) (Gibco, 15140122) in a 37°C cell culture incubator containing 5% CO2. Cells for transfection were seeded in 24-well cell culture plates the day before transfection and observed the next day. Transfection was performed when the cell density reached approximately 80%.

[0236] The nuclease expression plasmid and its corresponding crRNA expression plasmid, along with the EGFP-C1 plasmid (Addgene, Plasmid #54759), were transfected into HEK293T cells. The amount of plasmid used per well in a 24-well plate was 0.3 μg of nuclease expression plasmid, 0.3 μg of crRNA expression plasmid, and 0.3 μg of EGFP-C1 plasmid. The specific transfection procedure is as follows:

[0237] The nuclease expression plasmid, crRNA expression plasmid, and EGFP-C1 plasmid were each mixed separately and diluted with 25 μl of UltraFectin® serum-depleted transfection medium (Yuanpei Biotechnology, L530KJ). Then, 2 μl of Lipofectamine 3000 (Invitrogen, L3000015) reagent was added, and the mixture was thoroughly mixed (Reagent A). The mixture was then incubated for 5 minutes. Simultaneously, 2 μl of Lipofectamine 3000 transfection reagent (Invitrogen, L3000015) was diluted with 25 μl of UltraFectin® serum-depleted transfection medium (Yuanpei Biotechnology, L530KJ) and mixed thoroughly (Reagent B). The mixture was then incubated for 5 minutes.

[0238] Reagent A and Reagent B were mixed thoroughly and left to stand for 20 minutes. The mixture was then added dropwise to the cells to be transfected in the 24-well plate, and the plate was returned to the 37°C, 5% CO2 incubator. Six hours after transfection, the culture medium was replaced with DMEM medium containing 10% FBS.
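The per-well plasmid amounts in [0236] (0.3 μg of each of the three plasmids) can be converted into pipetting volumes once stock concentrations are known. The sketch below assumes hypothetical stock concentrations and helper names; only the 0.3 μg per-well amounts come from the text.

```python
# Plasmid mass per well of a 24-well plate, as stated in [0236].
PER_WELL_UG = {"nuclease": 0.3, "crRNA": 0.3, "EGFP-C1": 0.3}


def plasmid_volumes_ul(stock_ng_per_ul: dict[str, float], wells: int = 1) -> dict[str, float]:
    """Volume of each plasmid stock needed to deliver 0.3 ug per well."""
    return {
        name: round(ug * 1000 * wells / stock_ng_per_ul[name], 2)
        for name, ug in PER_WELL_UG.items()
    }


def total_dna_ug(wells: int = 1) -> float:
    """Total plasmid mass across all three plasmids for the given well count."""
    return round(sum(PER_WELL_UG.values()) * wells, 3)


# Hypothetical stock concentrations in ng/ul; not from the patent.
stocks = {"nuclease": 500.0, "crRNA": 300.0, "EGFP-C1": 600.0}
print(plasmid_volumes_ul(stocks, wells=4))  # {'nuclease': 2.4, 'crRNA': 4.0, 'EGFP-C1': 2.0}
print(total_dna_ug(wells=4))                # 3.6
```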

[0239] Forty-eight hours after transfection, the expression of EGFP fluorescent protein indicated successful cell transfection. Cells expressing EGFP were sorted for editing efficiency testing. Genomic DNA was extracted from these cells (using a TIANGEN DP304-03 genomic DNA extraction kit). The PCR amplification products were used for high-throughput deep sequencing (Qingke Biotechnology Co., Ltd.) to detect cleavage activity. The results are shown in Figure 5.
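The text does not describe how the deep sequencing reads were analyzed. Purely as an illustration of how cleavage activity is commonly summarized, the sketch below estimates an indel frequency as the fraction of reads whose length differs from the reference amplicon. This is a deliberately naive proxy and an assumption rather than the applicant's method: it presumes full-length amplicon reads and ignores substitutions and sequencing errors, whereas dedicated analysis tools align each read to the reference. All names and sequences are invented for the example.

```python
def indel_frequency(reads: list[str], reference: str) -> float:
    """Fraction of reads whose length differs from the reference amplicon.

    Naive proxy for indels: assumes full-length amplicon reads and ignores
    substitutions and sequencing errors.
    """
    if not reads:
        raise ValueError("no reads supplied")
    edited = sum(1 for r in reads if len(r) != len(reference))
    return edited / len(reads)


# Toy amplicon and reads, invented for illustration.
ref = "ACGTTTGACCATGGCA"
reads = [
    "ACGTTTGACCATGGCA",   # unedited
    "ACGTTTGACATGGCA",    # 1-nt deletion
    "ACGTTTGACCATGGCA",   # unedited
    "ACGTTTGACCCATGGCA",  # 1-nt insertion
]
print(f"{indel_frequency(reads, ref):.1%}")  # 50.0%
```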

[0240] Analysis shows that each nuclease in this invention has significant cleavage activity in mammalian cells and has potential application prospects.

[0241] The following is some of the sequence information involved in this invention:

[0242]

Claims

1. A Cas protein, characterized in that the protein is a polypeptide with an amino acid sequence shown in any one of SEQ ID NO: 1-4.

2. A fusion protein comprising the Cas protein of claim 1 and one or more functional domains, wherein the functional domains are selected from localization signals, reporter proteins, and epitope tags.

3. A polynucleotide, characterized in that the polynucleotide is a nucleotide sequence encoding the Cas protein of claim 1 or the fusion protein of claim 2.

4. The polynucleotide as described in claim 3, characterized in that the polynucleotide is a DNA molecule whose codons are optimized according to the codon preferences of the host cell.

5. The polynucleotide as described in claim 4, characterized in that the DNA molecule comprises nucleotides having ≥90% identity with the nucleotide sequence of any one of SEQ ID NO: 5-8.

6. A CRISPR-Cas composition, said composition comprising: (1) the Cas protein of claim 1 or the fusion protein of claim 2, or a nucleic acid molecule encoding the Cas protein of claim 1 or the fusion protein of claim 2; and (2) a guide RNA, or one or more DNA molecules encoding the guide RNA.

7. The CRISPR-Cas composition of claim 6, comprising one or more carriers, said one or more carriers comprising: (1) a first regulatory element, operatively linked to a nucleotide sequence encoding the Cas protein or a nucleotide sequence encoding the fusion protein; and (2) a second regulatory element, operatively linked to a nucleotide sequence encoding the guide RNA, the guide RNA comprising: (a) a spacer sequence capable of hybridizing with the target sequence of the target nucleic acid, and (b) a direct repeat sequence attached to the spacer sequence that guides the Cas protein to bind to the guide RNA to form a CRISPR-Cas complex targeting the target sequence; wherein the first regulatory element and the second regulatory element are located on the same or different carriers.

8. A delivery system, characterized in that the delivery system comprises the Cas protein of claim 1, or the fusion protein of claim 2, or the polynucleotide of any one of claims 3-5, or the CRISPR-Cas composition of any one of claims 6-7.

9. A host cell, characterized in that the host cell comprises the Cas protein of claim 1, or the fusion protein of claim 2, or the polynucleotide of any one of claims 3-5, or the CRISPR-Cas composition of any one of claims 6-7, or the delivery system of claim 8.

10. A method for non-diagnostic, non-therapeutic targeting and editing of target nucleic acids or for cleaving target nucleic acids, characterized in that the method includes contacting a cell containing the target nucleic acid with the Cas protein of claim 1, or the fusion protein of claim 2, or the polynucleotide of any one of claims 3-5, or the CRISPR-Cas composition of any one of claims 6-7, or the host cell of claim 9.

11. A non-diagnostic, non-therapeutic method for targeting and cleaving double-stranded target nucleic acids, characterized in that the method includes contacting the double-stranded target nucleic acid with the CRISPR-Cas composition according to any one of claims 6-7.

12. A non-diagnostic, non-therapeutic method for gene editing of cells, characterized in that the method includes contacting the CRISPR-Cas composition of any one of claims 6-7 with the target nucleic acid in the cell.

13. A kit comprising the Cas protein of claim 1, or the fusion protein of claim 2, or the polynucleotide of any one of claims 3-5, or the CRISPR-Cas composition of any one of claims 6-7, or the host cell of claim 9, wherein the components of the kit are in the same or different containers.

14. The use of the Cas protein of claim 1, or the fusion protein of claim 2, or the polynucleotide of any one of claims 3-5, or the CRISPR-Cas composition of any one of claims 6-7, or the delivery system of claim 8, or the host cell of claim 9, or the kit of claim 13 in nucleic acid detection.