Base editing tools and applications thereof
By constructing a base editing tool based on the Cas12i protein, in which a Cas mutant protein with inactivated nuclease activity is fused with a deaminase, this invention addresses the difficulty of achieving precise single-base modification with existing CRISPR/Cas technology and enables efficient editing of gene sites without cutting the DNA double strand.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: SHANDONG SHUNFENG BIOTECH CO LTD
- Filing Date: 2023-11-09
- Publication Date: 2026-06-12

Figure CN117327679B_ABST
Abstract
Description
[0001] This application claims priority to Chinese invention patent application CN202211451897.0, filed on November 21, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
[0002] This invention relates to the field of gene editing, in particular to CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) technology. Specifically, this invention relates to a base editing tool, in particular a base editing tool based on the Cas12 protein.
Background Art
[0003] CRISPR/Cas technology is a widely used gene editing technology in which an RNA guides the Cas protein to bind specifically to a target sequence in the genome and cleave the DNA, creating a double-strand break; site-specific gene editing is then achieved through the cell's non-homologous end joining or homologous recombination repair pathways.
[0004] Single-base gene editing tools enable precise modification of specific gene sites without causing DNA double-strand breaks. Their basic principle is to fuse a cytosine deaminase (e.g., APOBEC) or an adenosine deaminase with a Cas protein, so that the CRISPR targeting mechanism directs the deaminase to modify a single base at the target site.
[0005] The base editor in a single-base gene editing system consists mainly of two parts: a Cas protein and a DNA-modifying enzyme. Two types of base editors have been developed to date: the cytosine base editor (CBE), which uses a cytosine deaminase to achieve C>T conversion (C to T), and the adenine base editor (ABE), which uses an adenine deaminase to achieve A>G conversion (A to G).
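The C>T and A>G conversions above can be illustrated with a toy model. This is a minimal sketch, not the inventors' method: it assumes, as is commonly described for Cas9-based editors, that the deaminase acts on the displaced non-target (protospacer) strand, and the editing window passed in is a hypothetical parameter, because the window of a Cas12i-based editor is not specified in this section.

```python
# Base conversions named in [0005]: a CBE converts C to T, an ABE converts A to G.
CONVERSIONS = {"CBE": ("C", "T"), "ABE": ("A", "G")}


def predict_edits(protospacer: str, editor: str, window: tuple[int, int]) -> str:
    """Convert every eligible base inside a 1-based, inclusive editing window.

    Assumptions (not from the source): the protospacer is written 5'->3' on the
    strand the deaminase acts on, and `window` is a hypothetical editing window.
    """
    source, product = CONVERSIONS[editor]
    start, end = window
    if start < 1 or end < start:
        raise ValueError("window must be 1-based with start <= end")
    bases = list(protospacer.upper())
    for i in range(start - 1, min(end, len(bases))):
        if bases[i] == source:
            bases[i] = product
    return "".join(bases)


print(predict_edits("ACGTACGTAC", "CBE", (2, 6)))  # ATGTATGTAC
print(predict_edits("ACGTACGTAC", "ABE", (2, 6)))  # ACGTGCGTAC
```

Every eligible base in the window is converted here, which models the possibility of bystander edits; real editors convert bases in the window with varying efficiency.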
[0006] Currently, the Cas enzyme most commonly used in single-base gene editing tools is Cas9 nickase (Cas9n), which cleaves only one DNA strand. This invention uses the Cas12i protein to construct a single-base editing tool capable of base editing. This is the first realization of single-base editing with the Cas12i protein, which greatly expands the application scope of single-base editing technology.
Summary of the Invention
[0007] In one aspect, the present invention provides a Cas mutant protein whose nuclease activity is inactivated.
[0008] In one embodiment, compared with the amino acid sequence of the parental Cas protein, the Cas mutant protein is mutated at the amino acid position corresponding to position 7 of SEQ ID No. 1 and at the amino acid position corresponding to position 619 and/or position 844 of SEQ ID No. 1.
[0009] In one embodiment, the Cas mutant protein is mutated at amino acid positions 7 and 619 corresponding to the amino acid sequence shown in SEQ ID No. 1, compared to the parental Cas protein.
[0010] In one embodiment, the Cas mutant protein is mutated at amino acid positions 7 and 844, corresponding to the amino acid sequence shown in SEQ ID No. 1, compared to the parental Cas protein.
[0011] In one embodiment, the Cas mutant protein is mutated at amino acid positions 7, 619, and 844, corresponding to the amino acid sequence shown in SEQ ID No. 1, compared to the parental Cas protein.
[0012] In one embodiment, the Cas mutant protein, compared with the amino acid sequence of the parental Cas protein, also has mutations at amino acid positions 233, 267, 369, and 433 corresponding to the amino acid sequence shown in SEQ ID No. 1.
[0013] In one embodiment, the Cas mutant protein, compared with the amino acid sequence of the parental Cas protein, has mutations at amino acid positions 7, 619, and 844 corresponding to the amino acid sequence shown in SEQ ID No. 1; furthermore, the Cas mutant protein, compared with the amino acid sequence of the parental Cas protein, also has mutations at amino acid positions 233, 267, 369, and 433 corresponding to the amino acid sequence shown in SEQ ID No. 1.
[0014] In one embodiment, the Cas mutant protein has mutations at amino acid positions 7, 619, 233, 267, 369, and 433 corresponding to the amino acid sequence shown in SEQ ID No. 1, compared to the amino acid sequence of the parental Cas protein.
[0015] In one embodiment, the 7th amino acid is mutated to a non-S amino acid, such as A, V, G, L, Q, F, W, Y, D, N, E, K, M, T, C, P, H, R, I; preferably, R.
[0016] In one embodiment, the 233rd or 267th amino acid is mutated to a non-D amino acid, such as A, V, G, L, Q, F, W, Y, N, S, E, K, M, T, C, P, H, R, I; preferably, the 233rd or 267th amino acid is mutated to R.
[0017] In one embodiment, the 369th amino acid is mutated to a non-N amino acid, such as A, V, G, L, Q, F, W, Y, D, S, E, K, M, T, C, P, H, R, I; preferably, R.
[0018] In one embodiment, the 433rd amino acid is mutated to a non-S amino acid, such as A, V, G, L, Q, F, W, Y, D, N, E, K, M, T, C, P, H, R, I; preferably, R.
[0019] In one embodiment, the amino acid at position 619 is mutated to a non-D amino acid, such as A, V, G, L, Q, F, W, Y, N, S, E, K, M, T, C, P, H, R, I; preferably, A.
[0020] In one embodiment, the 844th amino acid is mutated to a non-E amino acid, such as A, V, G, L, D, F, W, Y, N, S, Q, T, M, K, C, P, H, R, I; preferably, A.
[0021] In one embodiment, the amino acid sequence of the parental Cas protein has at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity with SEQ ID No. 1.
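Percent identity thresholds such as those above depend on how identity is counted. A minimal sketch, assuming the two sequences have already been aligned (for example with one of the tools named in [0064]) and that identity is reported relative to the ungapped length of the reference SEQ ID No. 1; that denominator is an assumption, since alignment tools differ in this choice. The function names are hypothetical.

```python
def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Percent identity of a pairwise alignment, relative to the ungapped reference length.

    Both inputs are rows of the same gapped alignment, with '-' marking a gap.
    Using the reference length as denominator is an assumption; tools differ.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        1 for r, q in zip(aligned_ref, aligned_query) if r == q and r != "-"
    )
    ref_length = sum(1 for r in aligned_ref if r != "-")
    return 100.0 * identical / ref_length


def meets_threshold(aligned_ref: str, aligned_query: str, threshold: float = 80.0) -> bool:
    """True if the query reaches the given identity threshold (e.g., 80%, 90%, 99%)."""
    return percent_identity(aligned_ref, aligned_query) >= threshold


print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))  # 90.0
```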
[0022] In some embodiments, the parental Cas protein is a natural wild-type Cas protein; in other embodiments, the parental Cas protein is an engineered Cas protein.
[0023] Cas proteins or Cas12i proteins from various organisms can be used as parental Cas proteins. In some embodiments, the parental Cas protein or Cas12i protein has nuclease activity. In some embodiments, the parental Cas protein is a nuclease, i.e., it cleaves both strands of a target double-stranded nucleic acid (e.g., double-stranded DNA). In some embodiments, the parental Cas protein is a nickase, i.e., it cleaves a single strand of a target double-stranded nucleic acid (e.g., double-stranded DNA).
[0024] In one embodiment, the amino acid sequence of the parental Cas protein is shown in SEQ ID No. 1.
[0025] In another aspect, the present invention provides a fusion protein comprising a Cas protein and a deaminase.
[0026] In one embodiment, the Cas protein in the fusion protein is a Cas mutant protein with inactivated nuclease activity as described above.
[0027] In one embodiment, the deaminase is an adenosine deaminase or a cytidine deaminase.
[0028] In this invention, adenosine deaminase, also known as adenine deaminase, catalyzes the hydrolytic deamination of adenine or adenosine. The adenosine deaminases provided herein (e.g., engineered adenosine deaminases, evolved adenosine deaminases) can be derived from any organism, such as bacteria. In some embodiments, the adenosine deaminase is a naturally occurring adenosine deaminase, or it may be a mutant that has been mutated but still retains adenosine deaminase activity.
[0029] In some embodiments, adenosine deaminase is derived from prokaryotes. In some embodiments, adenosine deaminase is derived from bacteria. In some embodiments, adenosine deaminase is derived from *Escherichia coli*, *Staphylococcus aureus*, *Salmonella typhi*, *Shewanella putrefaciens*, *Haemophilus influenzae*, *Caulobacter crescentus*, or *Bacillus subtilis*.
[0030] In some embodiments, the adenosine deaminase is a TadA deaminase. In some embodiments, the TadA deaminase is *Escherichia coli* TadA deaminase, or it may be a TadA variant, such as TadA7-10, for example TadA9.
[0031] In one embodiment, the amino acid sequence of the adenosine deaminase has at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or at least 99.9% sequence identity with SEQ ID No. 2.
[0032] SEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYSTFEPCVMCAGAMIHSRIGRVVFGVRNSKRGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPRRVFNAQKKAQSSIN (SEQ ID No. 2).
[0033] In a preferred embodiment, the amino acid sequence of the adenosine deaminase is shown in SEQ ID No. 2.
[0034] In this invention, cytidine deaminase, also known as cytosine deaminase, catalyzes the hydrolytic deamination of cytidine or deoxycytidine to uridine or deoxyuridine. In some embodiments, the cytidine deaminase catalyzes the hydrolytic deamination of cytosine to uracil. In some embodiments, the cytidine deaminase is a naturally occurring deaminase derived from an organism, such as a human, chimpanzee, gorilla, monkey, cow, dog, rat, or mouse. In some embodiments, the deaminase or deaminase domain is a variant of a naturally occurring deaminase derived from an organism that still retains cytidine deaminase activity.
[0035] In another preferred embodiment, the cytosine deaminase comprises APOBEC. In one embodiment, the APOBEC is selected from the group consisting of: APOBEC1 (A1), APOBEC2 (A2), APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3E, APOBEC3F, APOBEC3H, APOBEC4 (A4), activation-induced cytidine deaminase (AID), or combinations thereof.
[0036] In another preferred embodiment, the cytidine deaminase includes CBE2.0, CBE2.1, CBE2.2, CBE2.3, and CBE2.4.
[0037] In one embodiment, the amino acid sequence of the cytidine deaminase has at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or at least 99.9% sequence identity with SEQ ID No. 3.
[0038] STDAEYVRIHEKLDIYTFKKQFSNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWVCKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKRRSELSIMFQVKILHTTKSPAV (SEQ ID No. 3).
[0039] In a preferred embodiment, the amino acid sequence of the cytidine deaminase is shown in SEQ ID No. 3.
[0040] In one embodiment, the Cas protein in the fusion protein is fused to the N-terminus of the deaminase; in other embodiments, the Cas protein in the fusion protein is fused to the C-terminus of the deaminase. In some embodiments, the Cas protein and the deaminase are linked by a linker.
[0041] In this invention, a linker can be used to connect any peptide or protein domain of the invention. In some embodiments, the linker is a polypeptide. In some embodiments, the linker is a covalent bond (e.g., a carbon-carbon bond, disulfide bond, or carbon-heteroatom bond). In some embodiments, the linker is the carbon-nitrogen bond of an amide linkage. In some embodiments, the linker is a cyclic or acyclic, substituted or unsubstituted, branched or unbranched aliphatic or heteroaliphatic linker. In some embodiments, the linker is polymeric (e.g., polyethylene, polyethylene glycol, polyamide, or polyester). In some embodiments, the linker comprises a monomer, dimer, or polymer of an aminoalkanoic acid. In some embodiments, the linker comprises an aminoalkanoic acid (e.g., glycine, acetic acid, alanine, β-alanine, 3-aminopropionic acid, 4-aminobutyric acid, or 5-valeric acid). In some embodiments, the linker comprises a monomer, dimer, or polymer of aminohexanoic acid (Ahx). In some embodiments, the linker is based on a carbocyclic moiety (e.g., cyclopentane or cyclohexane). In other embodiments, the linker comprises a polyethylene glycol (PEG) moiety. In other embodiments, the linker comprises an amino acid. In some embodiments, the linker comprises a peptide. In some embodiments, the linker comprises an aryl or heteroaryl moiety. In some embodiments, the linker is based on a benzene ring. The linker may include a functionalized moiety to facilitate attachment of a nucleophile (e.g., a thiol or amino group) from the peptide to the linker. Any electrophile can be used as part of the linker.
[0042] In one embodiment, the linker is an XTEN linker, preferably with the amino acid sequence shown in SEQ ID No. 4: SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID No. 4).
[0043] In one embodiment, the fusion protein of the present invention further comprises a nuclear localization sequence (NLS). In some embodiments, the NLS is fused to the N-terminus of the fusion protein. In some embodiments, the NLS is fused to the C-terminus of the fusion protein. In other embodiments, both the N-terminus and C-terminus of the fusion protein are connected to an NLS.
[0044] In some embodiments, NLS is fused to the N-terminus of the Cas protein. In some embodiments, NLS is fused to the C-terminus of the Cas protein. In some embodiments, NLS is fused to the N-terminus of a deaminase. In some embodiments, NLS is fused to the C-terminus of a deaminase. In some embodiments, NLS is fused to the fusion protein via one or more linkers. In some embodiments, NLS is fused to the fusion protein without a linker.
[0045] Nuclear localization sequences (NLSs) are well known to those skilled in the art. In some embodiments, the NLS comprises the amino acid sequence PKKKRKV (SEQ ID No. 5) or KRPAATKKAGQAKKKK (SEQ ID No. 12).
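The domain arrangement described in [0040] to [0045] can be sketched as plain string assembly of one-letter amino acid codes, using the sequences quoted in this section. This is a minimal sketch, not the inventors' construct: SEQ ID No. 1 is not reproduced here, so the Cas mutant protein is a hypothetical placeholder string, and the chosen order (deaminase N-terminal to the Cas protein, XTEN linker between them, one NLS at each end) is only one of the arrangements the text allows.

```python
# Segments quoted in this section (one-letter amino acid codes, N- to C-terminal).
TADA = (  # adenosine deaminase, SEQ ID No. 2
    "SEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAH"
    "AEIMALRQGGLVMQNYRLIDATLYSTFEPCVMCAGAMIHSRIGRVVFGVRNSKRGAAGS"
    "LMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPRRVFNAQKKAQSSIN"
)
XTEN = "SGGSSGGSSGSETPGTSESATPESSGGSSGGS"  # XTEN linker, SEQ ID No. 4
NLS_5 = "PKKKRKV"  # NLS, SEQ ID No. 5
NLS_12 = "KRPAATKKAGQAKKKK"  # NLS, SEQ ID No. 12

# Hypothetical placeholder: SEQ ID No. 1 is not given here, so "X" residues
# stand in for the nuclease-inactivated Cas mutant protein.
CAS_MUTANT = "X" * 20

# The 20 standard one-letter codes plus X for the placeholder.
VALID_CODES = set("ACDEFGHIKLMNPQRSTVWYX")


def assemble(*segments: str) -> str:
    """Concatenate protein segments in N- to C-terminal order and check the alphabet."""
    fusion = "".join(segments)
    unexpected = set(fusion) - VALID_CODES
    if unexpected:
        raise ValueError(f"unexpected residue codes: {sorted(unexpected)}")
    return fusion


# Cas protein fused to the C-terminus of the deaminase via XTEN, NLS at both ends.
abe_like = assemble(NLS_5, TADA, XTEN, CAS_MUTANT, NLS_12)
print(len(abe_like))
```

Swapping the order of `TADA` and `CAS_MUTANT` gives the other orientation in [0040]; in a CBE-type construct, UGI segments would be appended in the same way.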
[0046] In one embodiment, the deaminase in the fusion protein is a cytidine deaminase; the fusion protein comprises a Cas protein and a cytidine deaminase and further comprises a uracil glycosylase inhibitor (UGI).
[0047] The term "uracil glycosylase inhibitor" or "UGI" refers to a protein capable of inhibiting uracil-DNA glycosylase, a base excision repair enzyme.
[0048] In some embodiments, UGI is fused to the N-terminus or C-terminus of the Cas protein. In one embodiment, UGI and the Cas protein are fused via a linker; in other embodiments, UGI and the Cas protein are not fused via a linker. The linker is preferably an XTEN linker.
[0049] In some embodiments, the UGI is fused to the N-terminus or C-terminus of the deaminase. In one embodiment, the UGI and deaminase are fused via a linker; in other embodiments, the UGI and deaminase are not fused via a linker. The linker is preferably an XTEN linker.
[0050] In one embodiment, the UGI may be one or more UGIs, for example two, three, four, or more UGIs connected in tandem.
[0051] In one embodiment, the amino acid sequence of the UGI has at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or at least 99.9% sequence identity with SEQ ID No. 6.
[0052] TNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML (SEQ ID No. 6).
[0053] In a preferred embodiment, the amino acid sequence of the UGI is shown in SEQ ID No. 6.
[0054] Those skilled in the art will understand that the structure of a protein can be altered without adversely affecting its activity and function. For example, one or more conservative amino acid substitutions can be introduced into the amino acid sequence of a protein without adversely affecting the activity and/or three-dimensional structure of the protein molecule. Examples and implementations of conservative amino acid substitutions are familiar to those skilled in the art. Specifically, an amino acid residue can be replaced with another residue belonging to the same group as the residue at the substitution site, i.e., a nonpolar residue is replaced with another nonpolar residue, a polar uncharged residue with another polar uncharged residue, a basic residue with another basic residue, and an acidic residue with another acidic residue. Such substituted amino acid residues may or may not be encoded by the genetic code. Conservative substitutions, in which an amino acid is replaced by another amino acid of the same group, fall within the scope of this invention provided that the substitution does not inactivate the biological activity of the protein. Therefore, the proteins of this invention can contain one or more conservative substitutions in their amino acid sequence, preferably generated according to Table 1. Furthermore, this invention also covers proteins that contain one or more non-conservative substitutions, provided that such non-conservative substitutions do not significantly affect the desired function and biological activity of the proteins of this invention.
[0055] Conservative amino acid substitutions can be made at one or more predicted non-essential amino acid residues. "Non-essential" amino acid residues are those that can be altered (deleted, substituted, or replaced) without changing biological activity, whereas "essential" amino acid residues are required for biological activity. A "conservative amino acid substitution" is one in which an amino acid residue is replaced by an amino acid residue with a similar side chain. Amino acid substitutions can be made in the non-conserved regions of the aforementioned Cas mutant or fusion proteins. Generally, such substitutions are not made at conserved amino acid residues, or at amino acid residues within conserved motifs, where such residues are required for protein activity. However, those skilled in the art will understand that functional variants may contain a small number of conservative or non-conservative alterations in conserved regions.
[0056] Table 1
[0057]

| Original residue | Representative substitutions | Preferred substitution |
|---|---|---|
| Ala (A) | Val; Leu; Ile | Val |
| Arg (R) | Lys; Gln; Asn | Lys |
| Asn (N) | Gln; His; Lys; Arg | Gln |
| Asp (D) | Glu | Glu |
| Cys (C) | Ser | Ser |
| Gln (Q) | Asn | Asn |
| Glu (E) | Asp | Asp |
| Gly (G) | Pro; Ala | Ala |
| His (H) | Asn; Gln; Lys; Arg | Arg |
| Ile (I) | Leu; Val; Met; Ala; Phe | Leu |
| Leu (L) | Ile; Val; Met; Ala; Phe | Ile |
| Lys (K) | Arg; Gln; Asn | Arg |
| Met (M) | Leu; Phe; Ile | Leu |
| Phe (F) | Leu; Val; Ile; Ala; Tyr | Leu |
| Pro (P) | Ala | Ala |
| Ser (S) | Thr | Thr |
| Thr (T) | Ser | Ser |
| Trp (W) | Tyr; Phe | Tyr |
| Tyr (Y) | Trp; Phe; Thr; Ser | Phe |
| Val (V) | Ile; Leu; Met; Phe; Ala | Leu |
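The "Preferred substitution" column of Table 1 can be encoded as a lookup table. A minimal sketch with hypothetical function names: it proposes the preferred conservative substitution at a given 1-based position and reports it in the AxxB notation of [0063]. Whether a proposed variant actually retains activity is not something this lookup can decide; that would have to be tested.

```python
# Preferred substitutions from Table 1: original residue -> preferred replacement.
PREFERRED_SUBSTITUTION = {
    "A": "V", "R": "K", "N": "Q", "D": "E", "C": "S",
    "Q": "N", "E": "D", "G": "A", "H": "R", "I": "L",
    "L": "I", "K": "R", "M": "L", "F": "L", "P": "A",
    "S": "T", "T": "S", "W": "Y", "Y": "F", "V": "L",
}


def preferred_conservative_variant(sequence: str, position: int) -> str:
    """Return the Table 1 preferred substitution at a 1-based position, as 'AxxB'."""
    if not 1 <= position <= len(sequence):
        raise IndexError("position outside the sequence")
    original = sequence[position - 1]
    return f"{original}{position}{PREFERRED_SUBSTITUTION[original]}"


print(preferred_conservative_variant("MKV", 2))  # K2R
```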
[0058] As is well known in the art, one or more amino acid residues can be altered (replaced, deleted, truncated, or inserted) at the N- and/or C-terminus of a protein while its functional activity is retained. Therefore, proteins derived from the Cas mutant protein by alteration of the N- and/or C-terminus that retain the desired functional activity are also within the scope of this invention. Such alterations include those introduced by modern molecular methods such as PCR, for example PCR amplification that alters or extends the protein-coding sequence by means of oligonucleotide primers carrying additional amino acid-coding sequences.
[0059] It should be recognized that proteins can be altered in various ways, including amino acid substitutions, deletions, truncations, and insertions, and methods for such operations are generally known in the art. For example, amino acid sequence variants of the aforementioned proteins can be prepared by mutating DNA. This can also be accomplished through other forms of mutagenesis and / or directed evolution, for example, using known mutagenesis, recombination, and / or shuffling methods, combined with relevant screening methods, to perform single or multiple amino acid substitutions, deletions, and / or insertions.
[0060] Those skilled in the art will understand that such minor amino acid changes in the Cas protein of this invention can occur naturally (e.g., naturally occurring mutations) or be generated (e.g., using recombinant DNA technology) without loss of protein function or activity. If these mutations occur in the catalytic domain, active site, or other functional domains of the protein, the properties of the polypeptide may be altered, although the polypeptide may still retain its activity. If the mutations are not located near the catalytic domain, active site, or other functional domains, a smaller impact can be expected.
[0061] Those skilled in the art can identify the essential amino acids of the Cas mutant protein of the present invention using methods known in the art, such as site-directed mutagenesis, protein evolution, or bioinformatic analysis. The catalytic domains, active sites, or other functional domains of the protein can also be determined by physical structural analysis, using techniques such as nuclear magnetic resonance, crystallography, electron diffraction, or photoaffinity labeling, combined with mutation of the presumed key amino acids.
[0062] In this invention, amino acid residues can be represented by one-letter or three-letter codes, for example: alanine (Ala, A), valine (Val, V), glycine (Gly, G), leucine (Leu, L), glutamine (Gln, Q), phenylalanine (Phe, F), tryptophan (Trp, W), tyrosine (Tyr, Y), aspartic acid (Asp, D), asparagine (Asn, N), glutamic acid (Glu, E), lysine (Lys, K), methionine (Met, M), serine (Ser, S), threonine (Thr, T), cysteine (Cys, C), proline (Pro, P), isoleucine (Ile, I), histidine (His, H), and arginine (Arg, R).
[0063] The term "AxxB" indicates that amino acid A at position xx is changed to amino acid B. For example, D619A means that D at position 619 is mutated to A. When multiple amino acid sites are mutated simultaneously, it can be expressed in a form similar to S7R-D619A. For example, S7R-D619A represents that S at position 7 is mutated to R and D at position 619 is mutated to A.
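The AxxB notation above is regular enough to parse mechanically. A minimal sketch with hypothetical function names and a toy sequence: it splits a combined label such as S7R-D619A and applies it, refusing to proceed if the stated original residue is not present. Positions are taken as 1-based positions in the sequence passed in; for a parental protein other than SEQ ID No. 1 they would first have to be mapped by alignment as described in [0064].

```python
import re

# One mutation in AxxB notation: original residue, 1-based position, new residue.
_MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_mutations(label: str) -> list[tuple[str, int, str]]:
    """Split a label such as 'S7R-D619A' into (original, position, new) tuples."""
    mutations = []
    for token in label.split("-"):
        match = _MUTATION.fullmatch(token)
        if match is None:
            raise ValueError(f"not in AxxB notation: {token!r}")
        original, position, new = match.groups()
        mutations.append((original, int(position), new))
    return mutations


def apply_mutations(sequence: str, label: str) -> str:
    """Apply the mutations, checking that each stated original residue is present."""
    residues = list(sequence)
    for original, position, new in parse_mutations(label):
        if not 1 <= position <= len(residues):
            raise IndexError(f"position {position} outside the sequence")
        if residues[position - 1] != original:
            raise ValueError(
                f"expected {original} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = new
    return "".join(residues)


print(parse_mutations("S7R-D619A"))  # [('S', 7, 'R'), ('D', 619, 'A')]
print(apply_mutations("MSKD", "S2R-D4A"))  # MRKA
```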
[0064] The specific amino acid positions (numbers) within the proteins described in this invention are determined by aligning the amino acid sequence of the protein of interest with a reference sequence (e.g., SEQ ID No. 1) using standard sequence alignment tools, such as the Smith-Waterman algorithm or the ClustalW2 algorithm. The sequences are considered aligned when the alignment score is highest. The alignment score can be calculated according to the method described in Wilbur, W. J. and Lipman, D. J. (1983) Rapid similarity searches of nucleic acid and protein databanks. Proc. Natl. Acad. Sci. USA, 80:726-730. In the ClustalW2 (1.82) algorithm, the default parameters are preferably used: protein gap opening penalty = 10.0; protein gap extension penalty = 0.2; protein matrix = Gonnet; protein/DNA end gap = -1; protein/DNA GAPDIST = 4. Preferably, the AlignX program (part of the Vector NTI suite) is used with default parameters suitable for multiple alignment (gap opening penalty: 10; gap extension penalty: 0.05) to determine the position of a specific amino acid in the protein of the present invention by aligning the amino acid sequence of the protein with SEQ ID No. 1. Those skilled in the art can also use common software, such as Clustal Omega, to compare and align the amino acid sequence of any parental Cas protein with SEQ ID No. 1, thereby identifying the amino acid positions in the parental Cas protein that correspond to the positions defined in SEQ ID No. 1 as described in this application.
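Once an alignment is available, mapping a position defined in SEQ ID No. 1 (e.g., position 7, 619, or 844) onto a parental Cas protein amounts to walking the alignment columns and counting non-gap residues in each row. A minimal sketch, assuming the gapped alignment rows are supplied as strings with '-' for gaps (for example, exported from one of the tools above); the function name and toy sequences are hypothetical.

```python
from typing import Optional


def map_position(aligned_ref: str, aligned_query: str, ref_position: int) -> Optional[int]:
    """Return the 1-based query position aligned to a 1-based reference position.

    Returns None when the reference residue is aligned to a gap in the query.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    ref_index = 0
    query_index = 0
    for r, q in zip(aligned_ref, aligned_query):
        if r != "-":
            ref_index += 1
        if q != "-":
            query_index += 1
        if r != "-" and ref_index == ref_position:
            return query_index if q != "-" else None
    raise ValueError("ref_position lies beyond the reference sequence")


# Toy alignment: the query carries an insertion after reference position 2.
print(map_position("MS-KD", "MSAKE", 3))  # 4
```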
[0065] The fusion protein of the present invention is not limited by its production method. For example, it can be produced by genetic engineering methods (recombinant technology) or by chemical synthesis methods.
[0066] The present invention also provides a base editing tool comprising the above-described fusion protein, such as a single-base editing tool.
[0067] Nucleic acid
[0068] In another aspect, the present invention provides an isolated polynucleotide comprising:
[0069] (a) a polynucleotide sequence encoding the Cas mutant protein or fusion protein of the present invention; or
[0070] (b) a polynucleotide complementary to the polynucleotide described in (a).
[0071] In one embodiment, the nucleotide sequence is codon-optimized for expression in prokaryotic cells. In another embodiment, the nucleotide sequence is codon-optimized for expression in eukaryotic cells.
[0072] In one embodiment, the cell is an animal cell, such as a mammalian cell.
[0073] In one embodiment, the cell is a human cell.
[0074] In one embodiment, the cell is a plant cell, such as the cell of a cultivated plant (e.g., cassava, corn, sorghum, wheat, or rice), algae, tree, or vegetable.
[0075] In one embodiment, the polynucleotide is preferably single-stranded or double-stranded.
[0076] Guide RNA (gRNA)
[0077] In another aspect, the present invention provides a gRNA comprising a first segment and a second segment. The first segment is also referred to as the "backbone region", "protein-binding region", "protein-binding sequence", or "direct repeat sequence"; the second segment is also referred to as the "nucleic-acid-targeting sequence", "nucleic-acid-targeting segment", or "guide sequence targeting the target sequence".
[0078] The first segment of the gRNA can interact with the Cas protein in the fusion protein of the present invention, thereby enabling the Cas protein and gRNA to form a complex.
[0079] The nucleic-acid-targeting sequence (targeting region) of the gRNA of this invention comprises a nucleotide sequence complementary to a sequence in the target nucleic acid. In other words, it interacts with the target nucleic acid in a sequence-specific manner through hybridization (i.e., base pairing). Therefore, the nucleic-acid-targeting sequence can be altered or modified to hybridize with any desired sequence within the target nucleic acid. The target nucleic acid is DNA or RNA.
[0080] The percentage of complementarity between the nucleic-acid-targeting sequence of the gRNA and the target sequence of the target nucleic acid may be at least 60% (e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100%).
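The complementarity percentage above can be computed position by position. A minimal sketch, assuming equal-length sequences, Watson-Crick pairing only (wobble pairs are counted as mismatches), a guide written 5'→3' as RNA, and a target strand written 5'→3' as DNA; because the two strands pair antiparallel, the target is read in reverse. The function name is hypothetical.

```python
# Watson-Crick pairs between a guide RNA base and a target-strand DNA base.
_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(guide_rna: str, target_strand_dna: str) -> float:
    """Percent of positions at which the guide and target strand form Watson-Crick pairs.

    Both sequences are written 5'->3'; since they pair antiparallel, the target
    is reversed before comparison. Wobble pairs count as mismatches.
    """
    if len(guide_rna) != len(target_strand_dna):
        raise ValueError("sequences must have equal length")
    if not guide_rna:
        raise ValueError("empty sequences")
    pairs = zip(guide_rna.upper(), reversed(target_strand_dna.upper()))
    paired = sum(1 for pair in pairs if pair in _PAIRS)
    return 100.0 * paired / len(guide_rna)


print(percent_complementarity("ACGU", "ACGT"))  # 100.0
print(percent_complementarity("ACGU", "ACGA"))  # 75.0
```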
[0081] The "backbone region," "protein-binding region," "protein-binding sequence," or "direct repeat sequence" of the gRNA of this invention can interact with CRISPR proteins (or Cas proteins). The gRNA of this invention guides the interacting Cas protein to a specific nucleotide sequence within the target nucleic acid through its nucleic-acid-targeting sequence.
[0082] Preferably, the guide RNA comprises a first segment and a second segment in the 5' to 3' direction.
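The 5'→3' layout above (first segment, then second segment) can be expressed as simple string assembly. A minimal sketch: the direct-repeat string below is a hypothetical placeholder, since the Cas12i direct repeat is not given in this section, and the spacer is assumed to be supplied as the DNA protospacer (non-target strand) sequence, which the guide matches except for T→U so that it base-pairs with the target strand.

```python
# Hypothetical placeholder: the Cas12i direct repeat is not given in this section.
DIRECT_REPEAT = "GGAA"


def _to_rna(sequence: str) -> str:
    """Convert a DNA-alphabet string to RNA by replacing T with U."""
    return sequence.upper().replace("T", "U")


def build_guide_rna(direct_repeat: str, protospacer_dna: str) -> str:
    """Return the gRNA 5'->3': first segment (direct repeat), then second segment (spacer)."""
    return _to_rna(direct_repeat) + _to_rna(protospacer_dna)


print(build_guide_rna(DIRECT_REPEAT, "acgt"))  # GGAAACGU
```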
[0083] In this invention, the second segment can also be understood as a guide sequence for hybridization with the target sequence.
[0084] The gRNA of the present invention can form a complex with the Cas protein.
Vector
[0086] The present invention also provides a vector comprising the Cas mutant protein, fusion protein, isolated nucleic acid molecule, or polynucleotide described above; preferably, the vector further comprises a regulatory element operably linked thereto.
[0087] In one embodiment, the regulatory element is selected from one or more of the following: enhancers, transposons, promoters, terminators, leader sequences, polyadenylation sequences, and marker genes.
[0088] In one embodiment, the vector includes a cloning vector, an expression vector, a shuttle vector, and an integration vector.
[0089] In some embodiments, the vectors included in the system are viral vectors (e.g., retroviral vectors, lentiviral vectors, adenoviral vectors, adeno-associated viral vectors, and herpes simplex viral vectors), and may also be plasmids, viruses, cosmids, bacteriophages, or other vectors well known to those skilled in the art.
[0090] CRISPR system
[0091] The present invention provides an engineered, non-naturally occurring vector system, or a CRISPR-Cas system, comprising the above-described fusion protein or a nucleic acid sequence encoding the fusion protein and a nucleic acid encoding one or more of the above-described guide RNAs.
[0092] In one embodiment, the nucleic acid sequence encoding the fusion protein and the nucleic acid encoding one or more guide RNAs are artificially synthesized.
[0093] In one embodiment, the nucleic acid sequence encoding the fusion protein and the nucleic acid encoding one or more guide RNAs do not coexist naturally.
[0094] The one or more guide RNAs target one or more target sequences in the cell. The one or more target sequences hybridize with the genomic loci of a DNA molecule encoding one or more gene products and guide the fusion protein to the genomic locus of the DNA molecule of the one or more gene products. After reaching the target sequence location, the fusion protein modifies or edits the target sequence, thereby altering or modifying the expression of the one or more gene products.
[0095] The cells of this invention include one or more of animal, plant, and microbial cells.
[0096] In some embodiments, the fusion protein is codon-optimized for expression in cells.
[0097] The present invention also provides an engineered, non-naturally occurring vector system, which may include one or more vectors, the one or more vectors comprising:
[0098] a) a first regulatory element operably linked to the gRNA;
[0099] b) a second regulatory element operably linked to the fusion protein;
[0100] wherein components (a) and (b) are located on the same vector or on different vectors of the system.
[0101] The first and second regulatory elements include promoters (e.g., constitutive or inducible promoters), enhancers (e.g., 35S promoters or 35S enhanced promoters), internal ribosome entry sites (IRES), and other expression control elements (e.g., transcription termination signals, such as polyadenylation signals and polyU sequences).
[0102] In some embodiments, the vector in the system is a viral vector (e.g., a retroviral vector, lentiviral vector, adenoviral vector, adeno-associated viral vector, or herpes simplex viral vector), or it can be a plasmid, virus, cosmid, bacteriophage, or other type known to those skilled in the art.
[0103] In some embodiments, the system provided herein is a delivery system. In some embodiments, the delivery system is a nanoparticle, liposome, exosome, microbubble, or gene gun.
[0104] In one embodiment, the target sequence is a DNA or RNA sequence derived from prokaryotic or eukaryotic cells. In another embodiment, the target sequence is a non-naturally occurring DNA or RNA sequence.
[0105] In one embodiment, the target sequence is present within the cell. In another embodiment, the target sequence is present in the cell nucleus or cytoplasm (e.g., organelles). In one embodiment, the cell is a eukaryotic cell. In other embodiments, the cell is a prokaryotic cell.
[0106] Protein-nucleic acid complexes / compositions
[0107] On the other hand, the present invention provides a complex or composition comprising:
[0108] (i) a protein component selected from: the aforementioned fusion protein; and
[0109] (ii) a nucleic acid component comprising (a) a guide sequence capable of hybridizing with a target sequence; and (b) a direct repeat sequence capable of binding to the Cas protein in the fusion protein of the present invention.
[0110] The protein components and nucleic acid components can combine with each other to form a complex.
[0111] In one embodiment, the nucleic acid component is a guide RNA in a CRISPR-Cas system.
[0112] In one embodiment, the complex or composition is non-natural or modified. In one embodiment, at least one component of the complex or composition is non-natural or modified. In one embodiment, the first component is non-natural or modified; and / or, the second component is non-natural or modified.
[0113] Delivery and delivery composition
[0114] The fusion proteins, gRNAs, nucleic acid molecules, vectors, systems, complexes, and compositions of the present invention can be delivered by any method known in the art. Such methods include, but are not limited to, electroporation, lipofection, nucleofection, microinjection, sonoporation, gene gun, calcium phosphate-mediated transfection, cationic transfection, liposome transfection, dendrimer transfection, heat shock transfection, magnetofection, impalefection, optical transfection, reagent-enhanced nucleic acid uptake, and delivery via liposomes, immunoliposomes, viral particles, artificial viruses, etc.
[0115] Therefore, in another aspect, the present invention provides a delivery composition comprising a delivery vehicle and one or more of the following: the fusion proteins, gRNAs, nucleic acid molecules, vectors, systems, complexes, and compositions of the present invention.
[0116] In one embodiment, the delivery vehicle is a particle.
[0117] In one embodiment, the delivery vehicle is selected from lipid particles, sugar particles, metal particles, protein particles, liposomes, exosomes, microvesicles, gene guns, or viral vectors (e.g., replication-defective retroviruses, lentiviruses, adenoviruses, or adeno-associated viruses).
Host cells
[0119] The present invention also relates to an in vitro, ex vivo, or in vivo cell or cell line, or progeny thereof, comprising the fusion protein, nucleic acid molecule, protein-nucleic acid complex, vector, or delivery composition of the present invention.
[0120] In some embodiments, the cell is a prokaryotic cell.
[0121] In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is a human cell. In some embodiments, the cell is a non-human mammalian cell, such as a cell of a non-human primate, cattle, sheep, pig, dog, monkey, rabbit, or rodent (e.g., rat or mouse). In some embodiments, the cell is a non-mammalian eukaryotic cell, such as a cell of poultry (e.g., chickens), fish, or shellfish (e.g., clams, shrimp). In some embodiments, the cell is a plant cell, such as a cell of a monocotyledonous or dicotyledonous plant, or a cell of a cultivated plant or food crop such as cassava, corn, sorghum, soybeans, wheat, oats, or rice; of algae or trees; or of production plants, fruits, or vegetables (e.g., trees such as citrus trees and nut trees; solanaceous plants; cotton; tobacco; tomatoes; grapes; coffee; cocoa).
[0122] In some embodiments, the cell is a stem cell or stem cell line.
[0123] In some cases, the host cells of the present invention contain genetic or genomic modifications that are not present in their wild-type counterparts.
[0124] Gene editing methods and applications
[0125] The fusion protein, nucleic acid, the above-described composition, the above-described CRISPR / Cas system, the above-described vector system, the above-described delivery composition, or the above-described host cell of the present invention can be used for any one or more of the following purposes: targeting and / or editing target nucleic acids; specifically editing double-stranded nucleic acids; base editing double-stranded nucleic acids; and base editing single-stranded nucleic acids. In other embodiments, they can also be used to prepare reagents or kits for any one or more of the above purposes.
[0126] The present invention also provides a method for editing nucleic acids, the method comprising contacting a target region of a nucleic acid (e.g., a double-stranded DNA sequence) with a complex comprising the aforementioned fusion protein and a gRNA, wherein the target region contains a targeted base pair and the targeted base pair is changed by base substitution. In one embodiment, the deaminase in the fusion protein is an adenosine deaminase, and the targeted A:T base pair is replaced by a G:C base pair; in another embodiment, the deaminase in the fusion protein is a cytidine deaminase, and the targeted C:G base pair is replaced by an A:T base pair.
[0127] The above A:T means that the bases paired in the base pair are A and T; similarly, G:C means that the bases paired in the base pair are G and C, and C:G means that the bases paired in the base pair are C and G.
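The base-pair conversions above can be illustrated with a small string model. This is a minimal sketch under stated assumptions, not the patent's method. It models only the intended outcome on the protospacer (non-target) strand: an adenosine deaminase turns A into inosine, which is read as G, and a cytidine deaminase turns C into uracil, which is read as T. The function name `base_edit` and the `window` parameter are hypothetical. The text does not define an editing window for dCas12i3 editors, so any real window would have to be determined experimentally.

```python
from typing import Tuple

# Intended deamination outcome on the edited (protospacer) strand, read 5'->3':
#   ABE: A -> I (read as G), so an A:T pair ends up as G:C
#   CBE: C -> U (read as T), so a C:G pair ends up as T:A (an A:T pair)
SUBSTITUTIONS = {"ABE": ("A", "G"), "CBE": ("C", "T")}


def base_edit(protospacer: str, editor: str, window: Tuple[int, int]) -> str:
    """Return the protospacer after converting every eligible base in window.

    window is a hypothetical 1-based, inclusive (start, end) range of
    protospacer positions; it is an illustrative assumption, not a
    property of any specific editor described in the text.
    """
    source, product = SUBSTITUTIONS[editor]
    start, end = window
    edited = []
    for position, base in enumerate(protospacer.upper(), start=1):
        if start <= position <= end and base == source:
            edited.append(product)
        else:
            edited.append(base)
    return "".join(edited)


print(base_edit("TTACAGCA", "ABE", (3, 5)))  # TTGCGGCA
print(base_edit("TTACAGCA", "CBE", (1, 8)))  # TTATAGTA
```

The model converts every eligible base inside the window. Real editors act probabilistically and can also produce bystander edits, so the output should be read as the intended product only.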
[0128] The present invention also provides the use of the fusion proteins, nucleic acids, the above-described compositions, the above-described CRISPR / Cas system, the above-described vector system, the above-described delivery compositions, or the above-described host cells in gene editing, or in the preparation of reagents or kits for gene editing.
[0129] In one embodiment, the gene editing is performed intracellularly and / or extracellularly.
[0130] The present invention also provides a method for editing a target nucleic acid, the method comprising contacting the target nucleic acid with the aforementioned fusion protein, nucleic acid, composition, CRISPR / Cas system, vector system, or delivery composition. In one embodiment, the target nucleic acid is edited intracellularly or extracellularly.
[0131] The gene editing or editing of target nucleic acids includes the step of editing a single base of the target gene.
[0132] The editing can be performed in prokaryotic and / or eukaryotic cells.
[0133] In another aspect, the present invention also provides a kit for gene editing, the kit comprising the above-described fusion protein, gRNA, nucleic acid, composition, CRISPR / Cas system, vector system, delivery composition, or host cell.
[0134] In another aspect, the invention provides the use of the above-mentioned fusion protein, nucleic acid, composition, CRISPR / Cas system, vector system, delivery composition, or host cell in the preparation of formulations or kits, wherein the formulations or kits are used for:
[0135] (i) Gene or genome editing;
[0136] (ii) Editing target sequences in target loci to modify organisms;
[0137] (iii) Single-base editing;
[0138] (iv) Treatment of disease.
[0139] Preferably, the above-mentioned gene or genome editing is performed intracellularly or extracellularly.
[0140] Preferably, the treatment of disease is the treatment of a condition caused by a defect in the target sequence at the target locus.
[0141] Methods for specifically modifying target nucleic acids
[0142] In another aspect, the present invention also provides a method for specifically modifying a target nucleic acid, the method comprising contacting the target nucleic acid with the above-mentioned fusion protein, nucleic acid, composition, CRISPR / Cas system, vector system, or delivery composition.
[0143] This specific modification can occur in vivo or in vitro.
[0144] This specific modification can occur either inside or outside the cell.
[0145] In some cases, the cells are selected from prokaryotic or eukaryotic cells, such as animal cells, plant cells, or microbial cells.
[0146] CRISPR system
[0147] As used herein, the terms “clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated (Cas) (CRISPR-Cas) system” and “CRISPR system” are used interchangeably and have the meaning commonly understood by those skilled in the art. The system typically includes transcripts or other elements involved in the expression of CRISPR-associated (“Cas”) genes, or transcripts or other elements capable of directing the activity of said Cas genes. The Cas protein in this invention is a CRISPR-associated protein.
[0148] CRISPR / Cas complex
[0149] As used herein, the term “CRISPR / Cas complex” refers to a complex formed by the binding of guide RNA or mature crRNA to the Cas protein. The guide RNA or mature crRNA contains a guide sequence that hybridizes to the target sequence and a direct repeat sequence that binds to the Cas protein. This complex is capable of recognizing and cleaving polynucleotides that hybridize with the guide RNA or mature crRNA.
[0150] Guide RNA (gRNA)
[0151] As used herein, the terms “guide RNA (gRNA),” “mature crRNA,” and “guide sequence” are used interchangeably and have the meanings commonly understood by those skilled in the art. Generally, a guide RNA may comprise, consist essentially of, or consist of a direct repeat sequence and a guide sequence.
[0152] In some cases, the guide sequence is any polynucleotide sequence that is sufficiently complementary to the target sequence to hybridize with the target sequence and guide the specific binding of the CRISPR / Cas complex to the target sequence. In one embodiment, the complementarity between the guide sequence and its corresponding target sequence is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99% when optimal alignment is achieved. Determining the optimal alignment is within the capabilities of a person skilled in the art. For example, publicly available and commercially available alignment algorithms and programs exist, such as, but not limited to, ClustalW, the Smith-Waterman algorithm in MATLAB, Bowtie, Geneious, Biopython, and SeqMan.
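The paragraph above names the Smith-Waterman algorithm as one way to find an optimal alignment. The following is a minimal textbook sketch of Smith-Waterman local-alignment scoring with a linear gap penalty. It is not the implementation used by ClustalW, Bowtie, Geneious, or any other named program. The function name and the scoring values (`match`, `mismatch`, `gap`) are hypothetical choices for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local-alignment score between sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the dynamic-
    programming matrix. Every cell is floored at zero, which is what makes
    the alignment local (Smith-Waterman) rather than global.
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1] (match or mismatch).
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Gap in b (move down) or gap in a (move right).
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


print(smith_waterman_score("TTACGTT", "ACG"))  # 6
```

Flooring at zero lets an alignment start anywhere, so a short guide-like query can be located inside a longer sequence. A production tool would also keep a traceback to recover the aligned residues.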
Target sequence
[0154] A "target sequence" refers to a polynucleotide targeted by a guide sequence in the gRNA, such as a sequence complementary to that guide sequence, where hybridization between the target and guide sequences will promote the formation of a CRISPR / Cas complex (including the Cas protein and gRNA). Perfect complementarity is not required, as long as sufficient complementarity exists to induce hybridization and promote the formation of a CRISPR / Cas complex.
[0155] The target sequence can contain any polynucleotide, such as DNA or RNA. In some cases, the target sequence is located inside or outside the cell. In some cases, the target sequence is located in the cell nucleus or cytoplasm. In some cases, the target sequence may be located in an organelle of a eukaryotic cell, such as a mitochondrion or chloroplast. The sequence or template that can be used for recombination into a target locus containing the target sequence is referred to as an "edit template," "edit polynucleotide," or "edit sequence." In one embodiment, the edit template is a foreign nucleic acid. In one embodiment, the recombination is homologous recombination.
[0156] In this invention, the "target sequence," "target polynucleotide," or "target nucleic acid" can be any polynucleotide endogenous or exogenous to a cell (e.g., a eukaryotic cell). For example, the target polynucleotide can be a polynucleotide present in the nucleus of a eukaryotic cell. The target polynucleotide can be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA). In some cases, the target sequence should be associated with a protospacer adjacent motif (PAM).
Wild type
[0158] As used herein, the term “wild type” has the meaning commonly understood by those skilled in the art. It refers to the typical form of an organism, strain, gene, or characteristic as it occurs in nature, as distinguished from mutant or variant forms, which can be isolated from a natural source and has not been intentionally modified by humans.
Non-naturally occurring
[0160] As used herein, the terms “non-naturally occurring” and “engineered” are used interchangeably and indicate human intervention. When these terms are used to describe nucleic acid molecules or peptides, they indicate that the nucleic acid molecule or peptide is at least substantially free from at least one other component with which it is naturally associated or found in nature.
Identity
[0162] As used herein, the term "identity" refers to the sequence match between two polypeptides or two nucleic acids. Two compared sequences are identical at a position when the same base or amino acid monomeric subunit occupies that position in both (e.g., a position in each of two DNA molecules is occupied by adenine, or a position in each of two polypeptides is occupied by lysine). The "percentage identity" between two sequences is the number of matching positions shared by the two sequences divided by the number of positions compared, × 100. For example, if six out of ten positions in two sequences match, the two sequences have 60% identity; the DNA sequences CTGACT and CAGGTT share 50% identity (three out of six positions match). Typically, two sequences are aligned to produce maximum identity before they are compared. Such comparisons can be made using readily available methods, for example, computer programs such as the Align program (DNAstar, Inc.), or the method of Needleman et al. (1970) J. Mol. Biol. 48: 443-453. The percentage identity between two amino acid sequences can also be determined using the algorithm of E. Meyers and W. Miller (Comput. Appl. Biosci., 4:11-17 (1988)) incorporated into the ALIGN program (version 2.0), using a PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of 4. Alternatively, the percentage identity between two amino acid sequences can be determined using the Needleman and Wunsch algorithm (J. Mol. Biol. 48:444-453 (1970)) incorporated into the GAP program of the GCG software package (available at www.gcg.com), using a BLOSUM62 matrix or a PAM250 matrix, with gap weights of 16, 14, 12, 10, 8, 6, or 4 and length weights of 1, 2, 3, 4, 5, or 6.
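The percentage-identity definition above reduces to matches ÷ positions compared × 100 once two sequences are aligned. The following is a minimal sketch for sequences that are already aligned to equal length. The gap handling is one common convention and is an assumption here, not something the text specifies: columns where both sequences have a gap ('-') are skipped, and a gap opposite a residue counts as a compared, non-matching position. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: gap-gap columns are ignored; a gap opposite a
    residue is a compared position that does not match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no positions to compare")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    # Multiply before dividing so that exact ratios stay exact floats.
    return 100 * matches / len(columns)


print(percent_identity("CTGACT", "CAGGTT"))  # 50.0, the example in the text
```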
Vector
[0164] The term "vector" refers to a nucleic acid molecule capable of delivering another nucleic acid molecule linked to it. Vectors include, but are not limited to, single-stranded, double-stranded, or partially double-stranded nucleic acid molecules; nucleic acid molecules having one or more free ends or no free ends (e.g., circular); nucleic acid molecules comprising DNA, RNA, or both; and a wide variety of other polynucleotides known in the art. A vector can be introduced into a host cell by transformation, transduction, or transfection so that the genetic elements it carries are expressed in the host cell. A vector can be introduced into a host cell to produce transcripts, proteins, or peptides, including the proteins, fusion proteins, isolated nucleic acid molecules, etc., described herein (e.g., CRISPR transcripts, such as nucleic acid transcripts, proteins, or enzymes). A vector may contain a variety of elements that control expression, including, but not limited to, promoter sequences, transcription initiation sequences, enhancer sequences, selection elements, and reporter genes. Additionally, the vector may contain an origin of replication.
[0165] One type of vector is a "plasmid," which is a circular double-stranded DNA loop into which another DNA fragment can be inserted, for example, using standard molecular cloning techniques.
[0166] Another type of vector is the viral vector, in which virus-derived DNA or RNA sequences are present in the vector for packaging into a virus (e.g., retroviruses, replication-defective retroviruses, adenoviruses, replication-defective adenoviruses, and adeno-associated viruses). Viral vectors also include polynucleotides carried by a virus for transfection into a host cell. Some vectors (e.g., bacterial vectors having a bacterial origin of replication, and episomal mammalian vectors) are capable of autonomous replication in the host cells into which they are introduced.
[0167] Other vectors (e.g., non-episomal mammalian vectors) integrate into the genome of the host cell upon introduction and thereby replicate along with the host genome. Furthermore, some vectors are capable of directing the expression of genes to which they are operably linked. Such vectors are referred to herein as "expression vectors."
Host cell
[0169] As used herein, the term “host cell” refers to a cell that can be used to introduce a vector, including but not limited to prokaryotic cells such as Escherichia coli or Bacillus subtilis, and eukaryotic cells such as microbial cells, fungal cells, animal cells, and plant cells.
[0170] Those skilled in the art will understand that the design of expression vectors can depend on factors such as the selection of host cells to be transformed and the desired expression level.
Regulatory element
[0172] As used herein, the term "regulatory element" is intended to include promoters, enhancers, internal ribosome entry sites (IRES), and other expression control elements (e.g., transcription termination signals such as polyadenylation signals and poly-U sequences), which are described in detail in Goeddel, *Gene Expression Technology: Methods in Enzymology*, 185, Academic Press, San Diego, California (1990). In some cases, regulatory elements include sequences that direct constitutive expression of a nucleotide sequence in many types of host cells and sequences that direct expression of that nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). Tissue-specific promoters can primarily direct expression in a desired tissue of interest, such as muscle, neurons, bone, skin, blood, specific organs (e.g., liver, pancreas), or specific cell types (e.g., lymphocytes). In some cases, regulatory elements can also direct expression in a time-dependent manner (such as in a cell cycle-dependent or developmental stage-dependent manner), which may or may not be tissue- or cell type-specific. In some cases, the term "regulatory element" covers enhancer elements such as WPRE; the CMV enhancer; the R-U5' fragment in the LTR of HTLV-I (Mol. Cell. Biol., Vol. 8(1), pp. 466-472, 1988); the SV40 enhancer; and the intron sequence between exons 2 and 3 of rabbit β-globin (Proc. Natl. Acad. Sci. USA., Vol. 78(3), pp. 1527-31, 1981).
Promoter
[0174] As used herein, the term "promoter" has the meaning known to those skilled in the art, referring to a non-coding nucleotide sequence located upstream of a gene that initiates the expression of a downstream gene. A constitutive promoter is a nucleotide sequence that, when operably linked to a polynucleotide encoding or defining a gene product, results in the production of the gene product in the cell under most or all physiological conditions of the cell. An inducible promoter is a nucleotide sequence that, when operably linked to a polynucleotide encoding or defining a gene product, results in the production of the gene product in the cell substantially only when an inducer corresponding to the promoter is present in the cell. A tissue-specific promoter is a nucleotide sequence that, when operably linked to a polynucleotide encoding or defining a gene product, results in the production of the gene product in the cell substantially only when the cell is a cell of the tissue type corresponding to that promoter.
[0175] NLS
[0176] A “nuclear localization signal” or “nuclear localization sequence” (NLS) is an amino acid sequence that “tags” a protein to allow it to be transported to the nucleus via nuclear transport; that is, a protein with an NLS is transported to the nucleus. Typically, an NLS contains positively charged Lys or Arg residues exposed on the protein surface. Exemplary nuclear localization sequences include, but are not limited to, NLS from the following: SV40 large T antigen, EGL-13, c-Myc, and TUS protein. In some embodiments, the NLS contains the PKKKRKV sequence. In some embodiments, the NLS contains the AVKRPAATKKAGQAKKKKLD sequence. In some embodiments, the NLS contains the PAAKRVKLD sequence. In some embodiments, the NLS contains the MSRRRKANPTKLSENAKKLAKEVEN sequence. In some embodiments, the NLS contains the KLKIKRPVK sequence. Other nuclear localization sequences include, but are not limited to, the acidic M9 domain of hnRNP A1, the KIPIK sequence in the yeast transcriptional repressor Matα2, and PY-NLS.
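A fusion-protein design can be checked for the NLS sequences listed above with a simple exact-match scan. The sketch below is minimal and its assumptions are labeled: it finds only exact occurrences of the five motifs quoted in the text, so it cannot predict new or degenerate NLSs. The function name `find_nls` and the example fusion sequence are hypothetical.

```python
from typing import List, Tuple

# Exact NLS motifs quoted in the text above.
NLS_MOTIFS = (
    "PKKKRKV",
    "AVKRPAATKKAGQAKKKKLD",
    "PAAKRVKLD",
    "MSRRRKANPTKLSENAKKLAKEVEN",
    "KLKIKRPVK",
)


def find_nls(protein: str) -> List[Tuple[str, int]]:
    """Return (motif, 0-based start) for every exact motif occurrence, by start."""
    protein = protein.upper()
    hits = []
    for motif in NLS_MOTIFS:
        start = protein.find(motif)
        while start != -1:
            hits.append((motif, start))
            start = protein.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy fusion fragment, not a sequence from this application.
print(find_nls("MPKKKRKVGGSPAAKRVKLD"))  # [('PKKKRKV', 1), ('PAAKRVKLD', 11)]
```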
Operably linked
[0178] As used herein, the term “operably linked” is intended to mean that the nucleotide sequence of interest is linked to one or more regulatory elements in a manner that allows the expression of that nucleotide sequence (e.g., in an in vitro transcription / translation system or in the host cell when the vector is introduced into the host cell).
[0179] Complementarity
[0180] As used herein, the term "complementarity" refers to the ability of a nucleic acid to form one or more hydrogen bonds with another nucleic acid sequence through conventional Watson-Crick base pairing or other non-conventional types of pairing. The percentage of complementarity indicates the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 5, 6, 7, 8, 9, or 10 out of 10 correspond to 50%, 60%, 70%, 80%, 90%, and 100% complementarity). "Complete complementarity" means that all consecutive residues in a nucleic acid sequence form hydrogen bonds with the same number of consecutive residues in a second nucleic acid sequence. As used herein, “substantially complementary” refers to a complementarity of at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50 or more nucleotides, or to two nucleic acids that hybridize under stringent conditions.
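The "8 out of 10 is 80%" arithmetic above can be made concrete for a guide and its target strand. The following is a minimal sketch assuming an ungapped, equal-length comparison of two sequences written 5'->3'. Because the strands pair antiparallel, guide position i is compared with target position len - 1 - i. Only Watson-Crick pairs are counted. G/U wobble pairs, which the hybridization definition allows, are left out as a simplifying assumption. The function name is hypothetical.

```python
# Watson-Crick partners; RNA U is treated as DNA T before comparison.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(guide: str, target: str) -> float:
    """Percent of guide positions that Watson-Crick pair with the target strand.

    Both sequences are given 5'->3' and must have equal length (no gaps).
    Hybridization is antiparallel, so the target is read in reverse.
    """
    guide = guide.upper().replace("U", "T")
    target = target.upper().replace("U", "T")
    if not guide or len(guide) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum((g, t) in PAIRS for g, t in zip(guide, reversed(target)))
    return 100 * paired / len(guide)


print(percent_complementarity("ACGUACGUAC", "GTACGTACGT"))  # 100.0
print(percent_complementarity("ACGUACGUAC", "CAACGTACGT"))  # 80.0 (8 of 10)
```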
Stringent conditions
[0182] As used herein, “stringent conditions” for hybridization refer to conditions under which a nucleic acid complementary to the target sequence hybridizes primarily with the target sequence and substantially does not hybridize to non-target sequences. Stringent conditions are typically sequence-dependent and vary depending on many factors. Generally, the longer the sequence, the higher the temperature at which it specifically hybridizes to its target sequence.
[0183] Hybridization
[0184] The terms “hybridization,” “complementary,” and “substantially complementary” refer to a nucleic acid (such as RNA or DNA) containing a nucleotide sequence that enables it to bind non-covalently to another nucleic acid. That is, it forms Watson-Crick base pairs and / or G / U base pairs with the other nucleic acid in a sequence-specific, antiparallel manner (i.e., specific binding of complementary nucleic acids), which is also known as “annealing” or “hybridization”.
[0185] Hybridization requires two nucleic acids to contain complementary sequences, although mismatches between bases may exist. Suitable conditions for hybridization between two nucleic acids depend on their length and degree of complementarity, which are well-known variables in the art. Typically, hybridizable nucleic acids are 8 nucleotides or longer (e.g., 10 nucleotides or longer, 12 nucleotides or longer, 15 nucleotides or longer, 20 nucleotides or longer, 22 nucleotides or longer, 25 nucleotides or longer, or 30 nucleotides or longer).
[0186] It should be understood that the sequence of a polynucleotide does not need to be 100% complementary to the sequence of its target nucleic acid for specific hybridization. The polynucleotide may have 60% or higher, 65% or higher, 70% or higher, 75% or higher, 80% or higher, 85% or higher, 90% or higher, 95% or higher, 98% or higher, 99% or higher, 99.5% or higher, or 100% sequence complementarity with the target region of the target nucleic acid sequence with which it hybridizes.
[0187] Hybridization of the target sequence with gRNA means that at least 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% of the nucleic acid sequences of the target sequence and gRNA can hybridize to form a complex; or it means that at least 12, 15, 16, 17, 18, 19, 20, 21, 22, or more bases of the nucleic acid sequences of the target sequence and gRNA can be complementary and hybridize to form a complex.
Expression
[0189] As used herein, the term "expression" refers to the process by which a DNA template is transcribed into polynucleotides (such as mRNA or other RNA transcripts) and / or the transcribed mRNA is subsequently translated into peptides, polypeptides, or proteins. Transcripts and encoded polypeptides can be collectively referred to as "gene products." If the polynucleotides are derived from genomic DNA, expression can include the splicing of mRNA in eukaryotic cells.
Linker
[0191] As used herein, the term "linker" refers to a linear polypeptide formed by the linkage of multiple amino acid residues via peptide bonds. The linkers of this invention can be synthetically produced amino acid sequences or naturally occurring polypeptide sequences, such as polypeptides with hinge region functions. Such linker polypeptides are well known in the art (see, for example, Holliger, P. et al. (1993) Proc. Natl. Acad. Sci. USA 90:6444-6448; Poljak, RJ et al. (1994) Structure 2:1121-1123).
Treatment
[0193] As used herein, the term "treatment" means treating or curing a disease, delaying the onset of symptoms of a disease, and / or slowing the progression of a disease.
Subject
[0195] As used herein, the term “subject” includes, but is not limited to, various animals, plants and microorganisms.
Animal
[0197] For example, mammals, such as bovines, equines, ovines, porcines, canines, felines, lagomorphs, rodents (e.g., mice or rats), non-human primates (e.g., macaques or cynomolgus monkeys), or humans. In some embodiments, the subject (e.g., a human) suffers from a condition (e.g., a condition caused by a disease-related gene defect).
Plant
[0199] The term "plant" should be understood as any differentiated multicellular organism capable of photosynthesis. This includes crop plants at any stage of maturity or development, particularly monocotyledonous or dicotyledonous plants. Examples are vegetable crops, including artichokes, kohlrabi, arugula, leeks, asparagus, lettuce (e.g., head lettuce, leaf lettuce, longleaf lettuce), bok choy, taro, cucurbits (e.g., melons, watermelons, crenshaw, cantaloupes, Roman melons), cole crops (e.g., Brussels sprouts, cabbage, cauliflower, broccoli, kale, headless cabbage, Chinese cabbage, baby bok choy), artichokes, carrots, napa cabbage, okra, onions, celery, parsley, chickpeas, parsnip, chicory, peppers, potatoes, gourds (e.g., zucchini, cucumbers, baby zucchini, squash, pumpkin), radishes, dry bulb onions, rutabagas, eggplant, ginseng, lettuce, scallions, chicory, garlic, spinach, green onions, squash, leafy greens, beets (sugar beets and fodder beets), sweet potatoes, romaine lettuce, wasabi, tomatoes, turnips, and spices. Further examples are fruits and / or vine crops, such as apples, apricots, cherries, nectarines, peaches, pears, plums, prunes, quince, almonds, chestnuts, hazelnuts, pecans, pistachios, walnuts, citrus fruits, blueberries, boysenberries, raspberries, cranberries, currants, strawberries, blackberries, grapes, avocados, bananas, kiwis, persimmons, pomegranates, pineapples, tropical fruits, pears, melons, mangoes, papayas, and lychees. Other examples are field crops, such as clover, alfalfa, evening primrose, silvergrass, corn / maize (feed corn, sweet corn, popcorn), hops, jojoba, peanuts, rice, safflower, small grain cereals (barley, oats, rye, wheat, etc.), sorghum, tobacco, kapok, legumes (beans, lentils, peas, soybeans), oil plants (rapeseed, mustard, olive, sunflower, coconut, castor oil plants, cocoa beans, peanuts), Arabidopsis, fiber plants (cotton, flax, jute), Lauraceae (cinnamon, camphor), or plants such as coffee, sugarcane, tea, and natural rubber plants. The term also covers bedding plants, such as flowering plants, cacti, succulents, and / or ornamental plants, and trees, such as forest trees (broadleaf trees and evergreen trees, such as conifers), fruit trees, ornamental trees, and nut-bearing trees, as well as shrubs and other seedlings.
[0200] Beneficial effects of the invention
[0201] This invention improves the Cas protein by fusing it with a deaminase, enabling it to be used for single-base editing of target nucleic acids, and has broad application prospects.
[0202] The embodiments of the present invention will now be described in detail with reference to the accompanying drawings and examples. However, those skilled in the art will understand that the following drawings and examples are for illustrative purposes only and are not intended to limit the scope of the invention. Various objects and advantages of the present invention will become apparent to those skilled in the art from the following detailed description of the drawings and preferred embodiments.
[0203] The sequence information involved in this application is as follows:
[0204]
[0205]
[0206]
[0207]
[0208]
[0209] Description of the Drawings
[0210] Figure 1: Results of verification of the nuclease activity of the mutant Cas proteins.
[0211] Figure 2: Schematic diagram of the ABE editing tool.
[0212] Figure 3: Schematic diagram of the CBE editing tool structure.
[0213] Figure 4: Single-base editing efficiency of nuclease-inactivated Cas proteins; BE4max is the CBE editing vector and ABE9 is the ABE editing vector used in the examples.
[0214] Figure 5: Single-base editing efficiency of optimized nuclease-inactivated Cas proteins; S7R-D233R-D267R-N369R-S433R-D619A-ABE is an ABE editing vector composed of the mutant protein S7R-D233R-D267R-N369R-S433R-D619A and adenosine deaminase.
Detailed Description
[0215] The following examples are for illustrative purposes only and are not intended to limit the invention. Unless otherwise specified, the experiments and methods described in the examples are generally performed according to conventional methods that are well known in the art and described in various references. For example, the conventional techniques in immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics, and recombinant DNA used in this invention can be found in the following references: Sambrook, Fritsch, and Maniatis, *Molecular Cloning: A Laboratory Manual*, 2nd edition (1989); *Current Protocols in Molecular Biology* (F. M. Ausubel et al., eds. (1987)); the *Methods in Enzymology* series (Academic Press, Inc.); *PCR 2: A Practical Approach* (M. J. MacPherson, B. D. Hames and G. R. Taylor, eds. (1995)); Harlow and Lane (1988), *Antibodies: A Laboratory Manual*; and *Animal Cell Culture* (R.R. Freshney, ed. (1987)).
[0216] Furthermore, unless specific conditions are specified in the examples, conventional conditions or conditions recommended by the manufacturer should be followed. Reagents or instruments whose manufacturers are not specified are all commercially available conventional products. Those skilled in the art will understand that the examples are described by way of illustration and are not intended to limit the scope of protection claimed by the invention. All disclosures and other references mentioned herein are incorporated herein by reference in their entirety.
[0217] Example 1. Obtaining Cas protein with inactivated nuclease activity
[0218] The known Cas protein Cas12f.4 described in CN111757889B is referred to as Cas12i3 in this embodiment. The amino acid sequence of the wild-type Cas12i3 protein is shown in SEQ ID No. 1. The applicant used bioinformatics to predict key amino acid sites that might affect its biological function and then mutated these sites. D at position 619 was changed to A (D619A) and E at position 844 was changed to A (E844A), giving the Cas12i3 mutant proteins D619A and E844A. Cas protein variants carrying these amino acid substitutions can be generated by PCR-based site-directed mutagenesis using methods commonly used in the art. Specifically, the DNA sequence encoding the Cas12i3 protein (SEQ ID No. 1) is divided into two parts centered on the mutation site. Two pairs of primers are designed to amplify the two DNA fragments, one pair per fragment, with the mutated sequence introduced on the primers. Combined mutants are constructed by splitting the DNA into multiple fragments and assembling them by PCR and Gibson cloning. Fragment amplification kit: TransStart FastPfu DNA Polymerase (containing 2.5 mM dNTPs); for detailed experimental procedures, refer to the instruction manual. Gel recovery kit: Gel DNA Extraction Mini Kit; for detailed experimental procedures, refer to the instruction manual. Vector construction kit: pEASY-Basic Seamless Cloning and Assembly Kit (CU201-03); for detailed experimental procedures, refer to the instruction manual.
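The two-fragment mutagenesis strategy described above depends on a pair of inner primers that carry the mutation and overlap at the mutated codon. The following is a minimal sketch of that primer logic. It assumes a toy coding sequence, a fixed symmetric flank length, and a chosen replacement codon (GCT is one of the alanine GCN codons; aspartate is GAT/GAC). All function names, the flank length, and the toy sequence are hypothetical and are not the applicant's actual primers or SEQ ID No. 1. A real design would also check melting temperature, GC content, and secondary structure.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mutate_codon(cds: str, aa_position: int, new_codon: str) -> str:
    """Replace the codon at 1-based amino-acid position aa_position."""
    start = (aa_position - 1) * 3
    if aa_position < 1 or start + 3 > len(cds):
        raise ValueError("amino-acid position lies outside the coding sequence")
    return cds[:start] + new_codon + cds[start + 3:]


def inner_primers(mutant_cds: str, aa_position: int, flank: int = 15) -> Tuple[str, str]:
    """Return (forward, reverse) mutagenic primers centered on the mutated codon.

    The forward primer (paired with an outer reverse primer) amplifies the
    downstream fragment. The reverse primer, its reverse complement, is
    paired with an outer forward primer to amplify the upstream fragment.
    Both fragments therefore share the mutated overlap needed for joining.
    """
    start = (aa_position - 1) * 3
    window = mutant_cds[max(0, start - flank):start + 3 + flank]
    return window, reverse_complement(window)


# Hypothetical toy CDS: codon 6 is GAT (Asp); change it to GCT (Ala).
toy_cds = "ATG" + "AAA" * 4 + "GAT" + "AAA" * 4 + "TAA"
mutant = mutate_codon(toy_cds, 6, "GCT")
print(inner_primers(mutant, 6, flank=6))  # ('AAAAAAGCTAAAAAA', 'TTTTTTAGCTTTTTT')
```

For the D619A change, the same calls would take the real coding sequence with `aa_position=619`.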
[0219] To verify the nuclease cleavage activity of the Cas12i3 mutant proteins D619A and E844A, the PCR double-stranded DNA product shown in SEQ ID No. 11 was used as the substrate, and a gRNA targeting this PCR product was designed. The gRNA targeting sequence is agagaaugugugcauagucacacaugcagaguucacuuuug.
[0220] Experimental method: The final concentrations of Cas protein and gRNA were each 50 nM, and 300 ng of dsDNA substrate was used per reaction. T7 enzyme digestion buffer was added for digestion. After 30 min of digestion, 10 μL of each sample was loaded onto a gel for analysis.
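To compare the substrate with the 50 nM Cas protein and gRNA, the 300 ng of dsDNA can be expressed as a molar concentration. The sketch below does this. The substrate length (1,000 bp), the 20 μL reaction volume and the 1 μM protein stock are hypothetical, because this paragraph states none of them. It also uses the common approximation of about 650 g/mol per base pair.

```python
# Sketch: express a dsDNA mass as a molar concentration and compute stock
# volumes via C1*V1 = C2*V2. Substrate length, reaction volume and stock
# concentration used below are hypothetical values, not from the patent.

AVG_BP_MASS = 650.0  # g/mol per base pair, a common approximation for dsDNA


def dsdna_nanomolar(mass_ng: float, length_bp: int, volume_ul: float) -> float:
    """Molar concentration (nM) of `mass_ng` of a `length_bp` dsDNA in `volume_ul`."""
    moles = (mass_ng * 1e-9) / (length_bp * AVG_BP_MASS)
    litres = volume_ul * 1e-6
    return moles / litres * 1e9


def stock_volume_ul(final_nm: float, stock_nm: float, volume_ul: float) -> float:
    """Volume of a stock (nM) needed to reach `final_nm` in `volume_ul`."""
    return final_nm * volume_ul / stock_nm


if __name__ == "__main__":
    # Hypothetical: a 1,000 bp substrate in a 20 uL reaction.
    print(round(dsdna_nanomolar(300, 1000, 20), 1))  # prints: 23.1
    # Hypothetical 1 uM (1000 nM) Cas protein stock -> uL needed for 50 nM.
    print(stock_volume_ul(50, 1000, 20))  # prints: 1.0
```

Under these assumed values, the Cas protein and gRNA would be in roughly two-fold molar excess over the substrate. A different substrate length or reaction volume changes this ratio proportionally.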
[0221] The results are shown in Figure 1. Lane 1 is the wild-type Cas12i3 experimental group, which showed significant cleavage of the double-stranded DNA. Lane 2 is the Cas12i3 control group without gRNA, in which the double-stranded DNA was not cleaved. Lane 3 is the blank (water) control group. Lane 4 is the D619A experimental group (with gRNA), in which the double-stranded DNA was not cleaved. Lane 5 is the E844A experimental group (with gRNA), in which the double-stranded DNA was not cleaved. As can be seen from Figure 1, compared with wild-type Cas12i3, the D619A and E844A mutant proteins (referred to as dCas12i3) have completely lost their cleavage activity and can be used as Cas proteins for single-base editing.
[0222] Example 2. Establishment of a single-base editing system based on dCas12i3
[0223] A single-base editing system was constructed using dCas12i3 (D619A or E844A) with inactivated nuclease activity obtained in Example 1 and deaminase (adenosine deaminase or cytidine deaminase).
[0224] In this embodiment, the adenosine deaminase used is TadA9 (amino acid sequence as shown in SEQ ID No. 2), and the cytidine deaminase used is BE4max (amino acid sequence as shown in SEQ ID No. 3). The above-mentioned adenosine deaminase and cytidine deaminase are only exemplary deaminases. In other embodiments, other adenosine deaminases and cytidine deaminases may also be used.
[0225] A schematic diagram of the ABE editing element constructed with the adenosine deaminase TadA9 is shown in Figure 2. A schematic diagram of the CBE editing element constructed with the cytidine deaminase BE4max is shown in Figure 3. In both elements, GFP is a tag used to screen for positive cells.
[0226] As shown in Figures 2-3, the deaminase (adenosine deaminase or cytidine deaminase) is connected to the N-terminus of dCas12i3 (D619A or E844A) via an XTEN linker. An NLS is also connected to the other end of the deaminase and to the other end of dCas12i3. In the CBE editing element, a UGI is additionally connected to the C-terminus of dCas12i3 via a linker; in this embodiment, two UGIs are used in tandem. The above is only an exemplary arrangement of the deaminase and dCas12i3; in other embodiments, those skilled in the art can adjust the position or order of these elements.
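The domain order described above can be sketched as a simple concatenation from the N- to the C-terminus. This is a hypothetical illustration only. The XTEN sequence (the 16-aa linker used in earlier base editors) and the SV40 NLS are common choices assumed here; the patent gives the full constructs only as SEQ ID Nos. 7-10. The SGGS inter-UGI linker and the placement of the C-terminal NLS after the UGIs are further assumptions, and the GFP tag is omitted.

```python
# Hypothetical sketch of the ABE/CBE domain order. XTEN, NLS and the
# "SGGS" linker sequences and the NLS placement are assumptions.
from dataclasses import dataclass
from typing import List, Optional, Tuple

XTEN = "SGSETPGTSESATPES"  # 16-aa XTEN linker used in earlier base editors
SV40_NLS = "PKKKRKV"       # SV40 large T antigen NLS


@dataclass(frozen=True)
class Domain:
    name: str
    sequence: str


def assemble(domains: List[Domain]) -> Tuple[str, str]:
    """Concatenate domains N- to C-terminus; return (protein, layout label)."""
    protein = "".join(d.sequence for d in domains)
    layout = "-".join(d.name for d in domains)
    return protein, layout


def build_editor(deaminase: Domain, dcas: Domain, ugi: Optional[Domain] = None,
                 ugi_copies: int = 2, ugi_linker: str = "SGGS") -> Tuple[str, str]:
    """NLS-deaminase-XTEN-dCas12i3[-linker-UGI x n]-NLS (GFP tag omitted)."""
    domains = [Domain("NLS", SV40_NLS), deaminase, Domain("XTEN", XTEN), dcas]
    if ugi is not None:  # CBE only: tandem UGIs on the C-terminus of dCas12i3
        for _ in range(ugi_copies):
            domains += [Domain("linker", ugi_linker), ugi]
    domains.append(Domain("NLS", SV40_NLS))
    return assemble(domains)


if __name__ == "__main__":
    # Placeholder sequences stand in for TadA9, dCas12i3 and UGI.
    _, abe = build_editor(Domain("TadA9", "AAAA"), Domain("dCas12i3", "CCCC"))
    _, cbe = build_editor(Domain("deaminase", "AAAA"), Domain("dCas12i3", "CCCC"),
                          ugi=Domain("UGI", "GGGG"))
    print(abe)
    print(cbe)
```

Keeping each element as a named domain makes it straightforward to reorder elements. The paragraph above notes that such rearrangements are within the scope of the invention.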
[0227] The amino acid and DNA sequences of the ABE editing element designed in this embodiment are shown in SEQ ID No. 7 and SEQ ID No. 8, respectively; the amino acid and DNA sequences of the CBE editing element designed are shown in SEQ ID No. 9 and SEQ ID No. 10, respectively. The Cas12i3 in the above sequences is the wild-type sequence, which can be replaced with D619A or E844A protein in actual use.
[0228] The activity of the above single-base editing systems was verified in animal cells. Two target sites, FUT8-6 and FUT8-3, were designed in the FUT8 gene of Chinese hamster ovary (CHO) cells: FUT8-6: TTG AAGCCAAGCTTCTTGGTGGTTTC; FUT8-3: TTC CAGCCAAGGTTGTGGACGGATCA. In each sequence, the first three nucleotides (TTG or TTC) are the PAM sequence and the remainder is the gRNA target region. The DR region of the gRNA (the region that binds to the Cas protein) is GUCUAACUGCCAGAGAAUCGUGCCUGCAAUUGGCAC.
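The target-site layout (a 5'-TTN PAM followed by a 23-nt target) and the gRNA composition can be sketched as below. This is a minimal illustration under several assumptions: the DR sits 5' of the spacer, as in other Cas12 crRNAs; the space printed inside the DR in some renderings is a formatting artifact; and only the given strand is scanned. The function names are hypothetical.

```python
# Sketch: find 5'-TTN PAMs followed by a full 23-nt target on one strand and
# assemble a crRNA as DR + spacer. DR placement (5' of the spacer) and the
# single-strand scan are assumptions.
import re
from typing import Iterator, Tuple

DR_RNA = "GUCUAACUGCCAGAGAAUCGUGCCUGCAAUUGGCAC"


def find_ttn_sites(dna: str, spacer_len: int = 23) -> Iterator[Tuple[str, str, int]]:
    """Yield (PAM, target, PAM start index) for each TTN PAM with a full-length target."""
    dna = dna.upper()
    for match in re.finditer(r"(?=(TT[ACGT]))", dna):  # lookahead keeps overlapping hits
        target_start = match.start() + 3
        target = dna[target_start:target_start + spacer_len]
        if len(target) == spacer_len:
            yield match.group(1), target, match.start()


def build_crrna(target: str, dr: str = DR_RNA) -> str:
    """Return 5'-DR-spacer-3' with the spacer written as RNA (T -> U)."""
    return dr + target.upper().replace("T", "U")


if __name__ == "__main__":
    for pam, target, pos in find_ttn_sites("TTGAAGCCAAGCTTCTTGGTGGTTTC"):  # FUT8-6
        print(pam, target, build_crrna(target))
```

The zero-width lookahead is used so that overlapping PAM candidates, such as the TTT runs common in TTN-rich sequence, are all reported.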
[0229] In this embodiment, the target site of the CBE vector for both D619A and E844A proteins is FUT8-6; the target site of the ABE vector for D619A protein is FUT8-6; and the target site of the ABE vector for E844A protein is FUT8-3.
[0230] The pcDNA3.3 vector was used. The ABE or CBE editing element was inserted via the XbaI and PstI restriction sites, and the U6 promoter and gRNA sequence were inserted via the MfeI restriction site. The EF-1α promoter drives expression of the puromycin resistance gene. Plating: CHO cells were plated when confluence reached 70-80%, at 8×10^4 cells/well in 12-well plates. Transfection: Transfection was performed 24 h after plating. 6.25 μl of Hieff Transfection reagent (a liposome nucleic acid transfection reagent) was added to 100 μl Opti-MEM and mixed thoroughly, and 2.5 μg of plasmid was added to 100 μl Opti-MEM and mixed well. The diluted transfection reagent and the diluted plasmid were mixed thoroughly and incubated at room temperature for 20 min. The mixture was then added to the culture medium of the plated cells. Puromycin selection: 24 h after transfection, puromycin was added at a final concentration of 10 μg/ml. After 24 h of puromycin treatment, the medium was replaced with normal medium, and the cells were cultured for another 24 h. 48 h after transfection, the cells were digested with trypsin-EDTA (0.05%), and cells exhibiting GFP signals were sorted by flow cytometry (FACS).
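The per-well transfection recipe above can be scaled to several wells as a master mix. The recipe corresponds to 2.5 μl of reagent per μg of plasmid. The sketch below is a hypothetical helper, not part of the described protocol. The 10% pipetting overage and the plasmid stock concentration are assumptions. The plasmid stock volume is reported separately and is not subtracted from the Opti-MEM volume.

```python
# Sketch: scale the per-well (12-well plate) transfection recipe to n wells.
# The overage fraction and plasmid stock concentration are assumptions.
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PerWell:
    reagent_ul: float = 6.25           # Hieff transfection reagent
    reagent_diluent_ul: float = 100.0  # Opti-MEM for the reagent
    plasmid_ug: float = 2.5
    plasmid_diluent_ul: float = 100.0  # Opti-MEM for the plasmid


def master_mix(n_wells: int, plasmid_stock_ug_per_ul: float,
               overage: float = 0.10, recipe: PerWell = PerWell()) -> Dict[str, float]:
    """Reagent/diluent volumes and plasmid amount for n wells plus overage."""
    factor = n_wells * (1.0 + overage)
    plasmid_ug = recipe.plasmid_ug * factor
    return {
        "reagent_ul": recipe.reagent_ul * factor,
        "reagent_diluent_ul": recipe.reagent_diluent_ul * factor,
        "plasmid_ug": plasmid_ug,
        "plasmid_stock_ul": plasmid_ug / plasmid_stock_ug_per_ul,
        "plasmid_diluent_ul": recipe.plasmid_diluent_ul * factor,
    }


if __name__ == "__main__":
    # Hypothetical: 4 wells, plasmid stock at 0.5 ug/uL, 10% overage.
    for key, value in master_mix(4, 0.5).items():
        print(f"{key}: {value:.2f}")
```

Keeping the recipe in a frozen dataclass means each per-well amount is stated once, in line with the paragraph, and is never mutated while scaling.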
[0231] DNA extraction, PCR amplification of the region near the editing site, and Hi-TOM sequencing: cells were collected after trypsin digestion, and genomic DNA was extracted using a cell/tissue genomic DNA extraction kit (Biotech). The genomic region near the target site was amplified by PCR, and the PCR products were sequenced by Hi-TOM sequencing. A-to-G conversions (ABE) and C-to-T conversions (CBE) within the target site were then statistically analyzed. As shown in Figure 4, neither the ABE vectors nor the CBE vectors constructed with the D619A or E844A proteins showed single-base editing activity. In Figure 4, BE4max-i3(D619A) is the CBE vector constructed with the D619A protein, BE4max-i3(E844A) is the CBE vector constructed with the E844A protein, ABE9-i3(D619A) is the ABE vector constructed with the D619A protein, and ABE9-i3(E844A) is the ABE vector constructed with the E844A protein. A possible explanation is as follows. Each of the two single-site mutations yields a nuclease-inactivated Cas protein, but neither is sufficient to confer single-base editing activity when fused with adenosine deaminase or cytidine deaminase.
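A per-position editing efficiency can be computed as the percentage of reads carrying the editor's conversion at each candidate base. The sketch below is a simplified stand-in, not the Hi-TOM analysis pipeline. It assumes the reads are already aligned, trimmed to the target region and free of indels, and are written on the same strand as the target sequence. The function name is hypothetical.

```python
# Sketch: per-position base-editing efficiency from pre-aligned, indel-free
# reads covering the target region (a simplification of amplicon analysis).
from typing import Dict, Iterable

CONVERSIONS = {"ABE": ("A", "G"), "CBE": ("C", "T")}


def editing_efficiency(reference: str, reads: Iterable[str], editor: str) -> Dict[int, float]:
    """Return {1-based position: % of reads converted} for each candidate base."""
    src, dst = CONVERSIONS[editor]
    usable = [r for r in reads if len(r) == len(reference)]  # drop length mismatches
    if not usable:
        raise ValueError("no reads matching the reference length")
    result = {}
    for i, ref_base in enumerate(reference, start=1):
        if ref_base != src:
            continue  # only A (ABE) or C (CBE) positions can be edited
        edited = sum(1 for r in usable if r[i - 1] == dst)
        result[i] = 100.0 * edited / len(usable)
    return result


if __name__ == "__main__":
    toy_reads = ["GAGC", "AAGC", "GAGC", "AAGC"]  # illustrative reads only
    print(editing_efficiency("AAGC", toy_reads, "ABE"))  # prints: {1: 50.0, 2: 0.0}
```

Restricting the count to A (for ABE) or C (for CBE) reference positions mirrors the A-to-G and C-to-T statistics described in the paragraph.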
[0232] To obtain a Cas protein with single-base editing activity, the applicant further optimized the Cas protein. Specifically, on the basis of the D619A or E844A protein, amino acid 7 of SEQ ID No. 1 was mutated to R, yielding two double-mutant proteins. In S7R-D619A, relative to SEQ ID No. 1, amino acid 7 is mutated to R and amino acid 619 is mutated to A. In S7R-E844A, relative to SEQ ID No. 1, amino acid 7 is mutated to R and amino acid 844 is mutated to A.
[0233] Using the same method as above, the single-base editing activity of the S7R-D619A and S7R-E844A proteins in combination with adenosine deaminase or cytidine deaminase was verified. As shown in Figure 4, the S7R-D619A and S7R-E844A proteins exhibited significant single-base editing activity in both ABE and CBE vectors. In Figure 4, BE4max-i3(S7R-D619A) is the CBE vector constructed with the S7R-D619A protein, BE4max-i3(S7R-E844A) is the CBE vector constructed with the S7R-E844A protein, ABE9-i3(S7R-D619A) is the ABE vector constructed with the S7R-D619A protein, and ABE9-i3(S7R-E844A) is the ABE vector constructed with the S7R-E844A protein.
[0234] Sequencing results showed the following edits, with positions counted 5'→3' from the first nucleotide of the target site. The CBE single-base editing vectors of S7R-D619A and S7R-E844A converted the C at position 9 of the FUT8-6 target site (AAGCCAAGCTTCTTGGTGGTTTC) to T. The ABE single-base editing vector of S7R-D619A converted the A at position 1 of the FUT8-6 target site (AAGCCAAGCTTCTTGGTGGTTTC) to G. The ABE single-base editing vector of S7R-E844A converted the A at position 16 of the FUT8-3 target site (CAGCCAAGGTTGTGGACGGATCA) to G.
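The bases named for these edits match the listed target sequences only when positions are counted 5'→3' from the first (PAM-proximal) nucleotide of the target. Counting from the 3' end would place a G, not a C, at position 9 of FUT8-6. The sketch below checks each reported edit against the sequence before applying it. The helper names are hypothetical.

```python
# Sketch: verify and apply the reported edits, counting positions 5'->3' from
# the first (PAM-proximal) nucleotide of each target site.
FUT8_6 = "AAGCCAAGCTTCTTGGTGGTTTC"
FUT8_3 = "CAGCCAAGGTTGTGGACGGATCA"


def base_at(target: str, position: int) -> str:
    """Base at a 1-based position counted from the 5' end of the target."""
    if not 1 <= position <= len(target):
        raise ValueError("position outside target")
    return target[position - 1]


def apply_edit(target: str, position: int, expected: str, new: str) -> str:
    """Substitute one base after confirming the expected reference base."""
    found = base_at(target, position)
    if found != expected:
        raise ValueError(f"expected {expected} at {position}, found {found}")
    return target[:position - 1] + new + target[position:]


if __name__ == "__main__":
    print(apply_edit(FUT8_6, 9, "C", "T"))   # CBE at FUT8-6
    print(apply_edit(FUT8_6, 1, "A", "G"))   # ABE (S7R-D619A) at FUT8-6
    print(apply_edit(FUT8_3, 16, "A", "G"))  # ABE (S7R-E844A) at FUT8-3
```

Checking the expected reference base first turns any off-by-one or wrong-end counting error into an explicit failure instead of a silently wrong sequence.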
[0235] The results show that base editing tools for single-base editing can be constructed using S7R-D619A or S7R-E844A with adenosine deaminase or cytidine deaminase.
[0236] Example 3. Validation of a further optimized single-base editing system
[0237] Based on the D619A protein obtained in Example 1, the amino acid positions 7, 233, 267, 369, and 433 of SEQ ID No. 1 were all mutated to R, resulting in a protein with six amino acid mutations: S7R-D233R-D267R-N369R-S433R-D619A (relative to SEQ ID No. 1, the 7th amino acid is mutated to R, the 233rd amino acid is mutated to R, the 267th amino acid is mutated to R, the 369th amino acid is mutated to R, the 433rd amino acid is mutated to R, and the 619th amino acid is mutated to A).
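Compound variant names such as S7R-D233R-D267R-N369R-S433R-D619A encode each substitution as wild-type residue, position and new residue, relative to SEQ ID No. 1. The sketch below parses such a name and applies it to a protein sequence, refusing to proceed if a wild-type residue does not match. SEQ ID No. 1 is not reproduced in this section, so the test uses a placeholder sequence. The function names are hypothetical.

```python
# Sketch: parse hyphen-separated substitution names (e.g. "S7R-D619A") and apply
# them to a protein sequence, checking each wild-type residue first.
import re
from typing import List, Tuple

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutations(name: str) -> List[Tuple[str, int, str]]:
    """Split "S7R-D619A" into [("S", 7, "R"), ("D", 619, "A")]."""
    parsed = []
    for token in name.split("-"):
        m = MUTATION.match(token)
        if m is None:
            raise ValueError(f"unrecognised substitution {token!r}")
        parsed.append((m.group(1), int(m.group(2)), m.group(3)))
    return parsed


def apply_mutations(protein: str, name: str) -> str:
    """Return the mutant sequence; raise if a position or wild-type residue mismatches."""
    residues = list(protein)
    for wt, pos, mut in parse_mutations(name):
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} outside sequence")
        if residues[pos - 1] != wt:
            raise ValueError(f"expected {wt} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = mut
    return "".join(residues)


if __name__ == "__main__":
    print(parse_mutations("S7R-D233R-D267R-N369R-S433R-D619A"))
```

Checking the wild-type residue before substitution catches numbering mistakes, for example an off-by-one caused by a missing initial methionine.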
[0238] An ABE vector was constructed using the same method as in Example 2. The single-base editing efficiency of S7R-D233R-D267R-N369R-S433R-D619A in combination with adenosine deaminase was then verified in 293T cells. Three target sites were designed for the CHEK2, KLF4, and PCSK9 genes in 293T cells: CHEK2: TGTTTCAACATTGAGAGCTGGGTC; KLF4: GTTTAAACACACCGGGTTAA; PCSK9: CCCAGAGCATCCCGTGGAAC. Each target site was analyzed in triplicate, and the average value was taken. As shown in Figure 5, the single-base editing efficiency of the S7R-D233R-D267R-N369R-S433R-D619A ABE vector was 9.15% at the CHEK2 target, 38.43% at the KLF4 target, and 8.74% at the PCSK9 target.
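Summarizing triplicate measurements per target site can be sketched with the Python standard library. This is only an illustration of the averaging step. The patent reports only the mean values, so any replicate numbers passed to this helper are the user's own. Reporting a sample standard deviation alongside the mean is an addition not stated in the text.

```python
# Sketch: mean and sample standard deviation of replicate editing efficiencies
# (%) per target site. Input values are supplied by the user, not the patent.
from statistics import mean, stdev
from typing import Dict, List, Tuple


def summarize_replicates(values: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of replicate efficiencies."""
    if len(values) < 2:
        raise ValueError("need at least two replicates for a standard deviation")
    return mean(values), stdev(values)


def summarize_sites(per_site: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Apply summarize_replicates to every target site."""
    return {site: summarize_replicates(v) for site, v in per_site.items()}
```

`statistics.stdev` uses the n-1 (sample) denominator, which suits a small number of biological replicates.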
[0239] As can be seen from Figure 5, the ABE vector composed of S7R-D233R-D267R-N369R-S433R-D619A and adenosine deaminase exhibits high single-base editing efficiency at different target sites. Its editing efficiency is higher than that of the S7R-D619A or S7R-E844A proteins of Example 2 when used in combination with adenosine deaminase or cytidine deaminase.
[0240] Using the same method as in Example 2, a CBE vector was constructed with the S7R-D233R-D267R-N369R-S433R-D619A protein. Its editing activity in combination with cytidine deaminase was then verified in CHO cells. The CBE single-base editing vector converted the C at position 9 (counting 5'→3' from the first nucleotide of the target site) of the FUT8-6 target site (AAGCCAAGCTTCTTGGTGGTTTC) to T.
[0241] In addition, the applicant constructed a CBE vector with the S7R-D233R-D267R-N369R-S433R-D619A protein and performed single-base editing of the ALS gene in soybean. The resulting ALS-edited soybeans exhibited herbicide resistance.
[0242] The results show that S7R-D233R-D267R-N369R-S433R-D619A, along with adenosine deaminase or cytidine deaminase, can be used to construct a base editing tool for single-base editing. Furthermore, its editing efficiency is higher than that of S7R-D619A or S7R-E844A proteins when used in combination with adenosine deaminase or cytidine deaminase for single-base editing.
[0243] Although specific embodiments of the invention have been described in detail, those skilled in the art will understand that various modifications and variations can be made to the details based on all the published teachings, and all such changes are within the scope of protection of the invention. The entire scope of the invention is given by the appended claims and any equivalents thereof.
Claims
1. A Cas mutant protein with inactivated nuclease activity, characterized in that, relative to the amino acid sequence shown in SEQ ID No. 1, amino acid position 7 of the Cas mutant protein is mutated to R and amino acid position 619 is mutated to A.
2. A Cas mutant protein with inactivated nuclease activity, characterized in that, relative to the amino acid sequence shown in SEQ ID No. 1, amino acid positions 7, 233, 267, 369, and 433 of the Cas mutant protein are mutated to R and amino acid position 619 is mutated to A.
3. A fusion protein, characterized in that the fusion protein comprises the Cas mutant protein according to any one of claims 1-2 and a deaminase.
4. The fusion protein according to claim 3, characterized in that the deaminase is adenosine deaminase or cytidine deaminase.
5. The fusion protein according to claim 3 or 4, characterized in that the Cas mutant protein and the deaminase are connected by a linker.
6. The fusion protein according to claim 3 or 4, characterized in that the fusion protein further comprises a nuclear localization sequence (NLS).
7. The fusion protein according to claim 4, characterized in that the deaminase is cytidine deaminase, and the fusion protein further comprises a uracil glycosylase inhibitor (UGI).
8. An isolated polynucleotide, characterized in that the polynucleotide encodes the Cas mutant protein according to any one of claims 1-2 or the fusion protein according to any one of claims 3-7.
9. A vector, characterized in that the vector comprises the polynucleotide of claim 8 and a regulatory element operably linked thereto.
10. A CRISPR-Cas system, characterized in that the system comprises the fusion protein according to any one of claims 3-7 and at least one gRNA, wherein the gRNA is capable of binding the Cas mutant protein in the fusion protein.
11. A composition, characterized in that the composition comprises: (i) a protein component selected from the Cas mutant protein of any one of claims 1-2 or the fusion protein of any one of claims 3-7; and (ii) a nucleic acid component, which is a gRNA capable of binding the Cas mutant protein of any one of claims 1-2.
12. An engineered host cell, characterized in that the host cell comprises the Cas mutant protein of any one of claims 1-2, or the fusion protein of any one of claims 3-7, or the polynucleotide of claim 8, or the vector of claim 9, or the CRISPR-Cas system of claim 10, or the composition of claim 11.
13. The use of the Cas mutant protein of any one of claims 1-2, or the fusion protein of any one of claims 3-7, or the polynucleotide of claim 8, or the vector of claim 9, or the CRISPR-Cas system of claim 10, or the composition of claim 11, or the host cell of claim 12 in gene editing, wherein the use is for purposes other than disease diagnosis and treatment; or, in the preparation of reagents or kits for gene editing.
14. The use according to claim 13, characterized in that the gene editing is single-base editing of a target gene.
15. A kit for gene editing, characterized in that the kit comprises the Cas mutant protein of any one of claims 1-2, or the fusion protein of any one of claims 3-7, or the polynucleotide of claim 8, or the vector of claim 9, or the CRISPR-Cas system of claim 10, or the composition of claim 11, or the host cell of claim 12.
16. The use of the Cas mutant protein of any one of claims 1-2, or the fusion protein of any one of claims 3-7, or the polynucleotide of claim 8, or the vector of claim 9, or the CRISPR-Cas system of claim 10, or the composition of claim 11, or the host cell of claim 12 in the preparation of a formulation or kit for: editing a target sequence in a target locus to modify an organism, or for the treatment of a disease.
17. A method for editing a nucleic acid for purposes other than the diagnosis and treatment of disease, the method comprising the step of contacting a target region of the nucleic acid with the fusion protein according to any one of claims 3-7 and a gRNA. The gRNA comprises a segment capable of binding the Cas mutant protein in the fusion protein and a segment capable of binding the target region of the nucleic acid. The target region contains a targeted base pair, and the fusion protein is capable of substituting the targeted base pair.
18. The method according to claim 17, characterized in that the deaminase in the fusion protein is adenosine deaminase, and the targeted A:T base pair is replaced by a G:C base pair.
19. The method according to claim 17, characterized in that the deaminase in the fusion protein is cytidine deaminase, and the targeted C:G base pair is replaced by a T:A base pair.