A Cas mutant protein and its applications

By designing specific amino acid mutations and fusion proteins in Cas12i.3, the gene editing efficiency of Cas12i.3 was improved, solving the problem of low editing efficiency in existing technologies and achieving more efficient gene editing results.

CN120665841B · Active · Publication Date: 2026-06-30 · CHINA AGRI UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHINA AGRI UNIV
Filing Date
2025-06-03
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

The existing Cas12i.3 nuclease has low editing efficiency, which limits its widespread application.

Method used

By mutating amino acids 168, 273, 332, 478, and 599 of Cas12i.3 to arginine, a mutant protein N168R+S273R+L332R+G478R+S599R was formed. This mutant protein was then fused with a nuclear localization signal, tag, linker, and T5 exonuclease to form the fusion protein NLS-N168R+S273R+L332R+G478R+S599R-T5, thus improving editing efficiency.

Benefits of technology

It significantly improves the efficiency of gene editing and is suitable for multi-gene editing applications.


Abstract

This invention discloses a Cas mutant protein and its applications, belonging to the field of gene editing technology. The Cas mutant protein provided by this invention is a mutant protein in which amino acids 168, 273, 332, 478, and 599 of SEQ ID NO:1 are all mutated to arginine, while the other amino acid sequences remain unchanged. This invention improves editing efficiency by mutating multiple amino acids in the wild-type Cas12i protein, and can be used for multi-gene editing.

Description

Technical Field

[0001] This invention belongs to the field of gene editing technology, specifically relating to a Cas mutant protein and its applications.

Background Technology

[0002] CRISPR-Cas is an adaptive immune system developed by prokaryotes to defend against viral infection or bacteriophage invasion. It recognizes DNA targets via RNA and generates double-strand breaks, subsequently performing site-specific gene editing through non-homologous end joining or homologous recombination.

[0003] The CRISPR-Cas system comprises six types (I-VI). Among them, the type II SpyCas9 and type V AsCas12a gene editing systems are widely used in microbiology, botany, and animal fields due to their relatively simple composition, high editing efficiency, and ease of operation. Chinese invention patent CN111757889B discloses a type V Cas protein, Cas12f.4, which is referred to herein as Cas12i.3. Compared to SpCas9 and AsCas12a, Cas12i.3 is a relatively small Cas nuclease (1,045 amino acids) with a 5'-TTN PAM motif and the ability to autonomously process pre-crRNA. Despite these advantages, the editing efficiency of Cas12i.3 remains a significant factor limiting its widespread application.

[0004] Therefore, it is necessary to provide a Cas nuclease with higher editing efficiency.

Summary of the Invention

[0005] The technical problem to be solved by this invention is to provide a Cas enzyme with higher editing efficiency. The technical problem to be solved is not limited to the described technical subject matter; other technical subject matter not mentioned herein will be clearly understood by those skilled in the art through the following description.

[0006] To solve the above-mentioned technical problems, the present invention provides the following technical solutions:

[0007] The present invention provides a Cas mutant protein, wherein the Cas mutant protein is a mutant protein in which the 168th, 273rd, 332nd, 478th and 599th amino acids of SEQ ID NO:1 are all mutated to arginine, while the other amino acid sequences remain unchanged.

[0008] Specifically, the Cas mutant protein mentioned above is a mutant protein obtained by mutating the 168th amino acid of SEQ ID NO:1 from asparagine to arginine, the 273rd amino acid from serine to arginine, the 332nd amino acid from leucine to arginine, the 478th amino acid from glycine to arginine, and the 599th amino acid from serine to arginine, while keeping the other amino acid sequences unchanged.

[0009] Specifically, the amino acid sequence of the aforementioned Cas mutant protein is SEQ ID NO:6. The aforementioned Cas mutant protein will be referred to below as N168R+S273R+L332R+G478R+S599R or N168R+S273R+L332R+G478R+S599R mutants.

[0010] Furthermore, the coding sequence of the aforementioned Cas mutant protein is obtained from SEQ ID NO:2 by replacing the codon encoding amino acid N at position 168 of SEQ ID NO:1 (nucleotides 502-504, aac, in SEQ ID NO:2) with a codon encoding R (aga); replacing the codon encoding amino acid S at position 273 of SEQ ID NO:1 (nucleotides 817-819, agc, in SEQ ID NO:2) with a codon encoding R (aga); replacing the codon encoding amino acid L at position 332 of SEQ ID NO:1 (nucleotides 994-996, ctg, in SEQ ID NO:2) with a codon encoding R (aga); replacing the codon encoding amino acid G at position 478 of SEQ ID NO:1 (nucleotides 1432-1434, ggc, in SEQ ID NO:2) with a codon encoding R (aga); and replacing the codon encoding amino acid S at position 599 of SEQ ID NO:1 (nucleotides 1795-1797, agc, in SEQ ID NO:2) with a codon encoding R (aga), while keeping the other nucleotides of SEQ ID NO:2 unchanged.
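The nucleotide ranges in this paragraph follow from the residue numbers, because residue n of a coding sequence is encoded by nucleotides 3n-2 to 3n. The following Python sketch reproduces that arithmetic. It assumes 1-based numbering that starts at the first codon of SEQ ID NO:2; the helper name `codon_range` is illustrative and does not come from the patent.

```python
# Map a 1-based residue number to the 1-based, inclusive nucleotide
# range of its codon in the coding sequence.
def codon_range(residue: int) -> tuple[int, int]:
    if residue < 1:
        raise ValueError("residue numbers are 1-based")
    return 3 * residue - 2, 3 * residue


# Substitutions described in the text:
# residue -> (wild-type codon, replacement codon)
SUBSTITUTIONS = {
    168: ("aac", "aga"),  # N -> R
    273: ("agc", "aga"),  # S -> R
    332: ("ctg", "aga"),  # L -> R
    478: ("ggc", "aga"),  # G -> R
    599: ("agc", "aga"),  # S -> R
}

# Nucleotide range of each mutated codon.
RANGES = {residue: codon_range(residue) for residue in SUBSTITUTIONS}

if __name__ == "__main__":
    for residue, (start, end) in RANGES.items():
        wild_type, mutant = SUBSTITUTIONS[residue]
        print(f"residue {residue}: nucleotides {start}-{end}, {wild_type} -> {mutant}")
```

For residue 599 the sketch gives nucleotides 1795-1797, matching the agc codon position stated in the text.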

[0011] The present invention also provides a fusion protein, which is a protein comprising the aforementioned Cas mutant protein. The fusion protein is hereinafter referred to as NLS-N168R+S273R+L332R+G478R+S599R-T5.

[0012] Furthermore, the modified portion includes, but is not limited to, tags, nuclear localization signals, linkers, and T5 exonucleases, and may also include other sequences that facilitate gene editing of the fusion protein. Specifically, the modified portion may be one or more of the aforementioned components.

[0013] Furthermore, the nuclear localization signal is located at, near, or close to the end of the Cas mutant protein (e.g., N-terminus, C-terminus, or both ends).

[0014] Furthermore, the epitope tag is well known to those skilled in the art, including but not limited to His, V5, FLAG, HA, Myc, VSV-G, Trx, etc., and those skilled in the art can choose other suitable epitope tags (e.g., for purification, detection, or tracing).

[0015] Furthermore, the Cas mutant protein may optionally be coupled, conjugated, or fused to the modified portion via a linker.

[0016] Furthermore, the modified portion is directly linked to the N-terminus or C-terminus of the Cas mutant protein.

[0017] Furthermore, the modified portion is attached to the N-terminus or C-terminus of the Cas protein of the present invention via a linker. Such linkers are well known in the art, and examples include, but are not limited to, those containing one or more (e.g., 1, 2, 3, 4, or 5) amino acids (e.g., Gly or Ser).

[0018] As is well known to those skilled in the art, the aforementioned Cas mutant proteins and fusion proteins are not limited by their production method. For example, they can be produced by genetic engineering methods (recombinant technology) or by chemical synthesis methods.

[0019] Furthermore, the fusion protein is obtained by linking the 22nd amino acid residue of the tag protein with the 1st amino acid residue of the first nuclear localization signal sequence via a peptide bond, the 31st amino acid residue of the first nuclear localization signal sequence with the 1st amino acid residue of the first linker via a peptide bond, the 8th amino acid residue of the first linker with the 1st amino acid residue of the Cas mutant protein via a peptide bond, the 1045th amino acid residue of the Cas mutant protein with the 1st amino acid residue of the second linker via a peptide bond, the 10th amino acid residue of the second linker with the 1st amino acid residue of the T5 exonuclease via a peptide bond, and the 291st amino acid residue of the T5 exonuclease with the 1st amino acid residue of the second nuclear localization sequence via a peptide bond, totaling 1423 amino acids.

[0020] The tag protein in the fusion protein comprises 22 amino acids, encoded by nucleotides 35-100 of SEQ ID NO:3; the first nuclear localization signal sequence in the fusion protein comprises 31 amino acids, encoded by nucleotides 101-193 of SEQ ID NO:3; the first linker, connecting the first nuclear localization signal sequence and the Cas mutant protein (amino acid sequence SEQ ID NO:6), comprises 8 amino acids, with the amino acid sequence GHIHGVPAA, encoded by nucleotides 194-217 of SEQ ID NO:3; the second linker, connecting the Cas mutant protein and the T5 exonuclease, comprises 10 amino acids, with the amino acid sequence SGGSGGSGGS, encoded by nucleotides 3353-3382 of SEQ ID NO:3; the T5 exonuclease in the fusion protein comprises 291 amino acids, encoded by nucleotides 3383-4255 of SEQ ID NO:3; and the second nuclear localization sequence in the fusion protein comprises 16 amino acids, encoded by nucleotides 4256-4303 of SEQ ID NO:3.

[0021] Furthermore, the recombinant vector expressing the aforementioned fusion protein (N168R+S273R+L332R+G478R+S599R mutant) expresses the fusion protein.

[0022] The present invention also provides biological materials related to mutant proteins, wherein the mutant protein is the aforementioned Cas mutant protein, and the biological material is any one of the following:

[0023] B1) a nucleic acid molecule encoding the aforementioned Cas mutant protein;

[0024] B2) an expression cassette containing the nucleic acid molecule described in B1);

[0025] B3) a recombinant vector containing the nucleic acid molecule described in B1), or a recombinant vector containing the expression cassette described in B2);

[0026] B4) a recombinant microorganism containing the nucleic acid molecule described in B1), or a recombinant microorganism containing the expression cassette described in B2), or a recombinant microorganism containing the recombinant vector described in B3).

[0027] In the above-mentioned biological material related to mutant proteins, the nucleotide sequence of the nucleic acid molecule described in B1) is obtained from SEQ ID NO:2 by replacing the codon encoding amino acid N at position 168 of SEQ ID NO:1 (nucleotides 502-504, aac, of SEQ ID NO:2) with the codon encoding R (aga), the codon encoding amino acid S at position 273 of SEQ ID NO:1 (nucleotides 817-819, agc, of SEQ ID NO:2) with the codon encoding R (aga), the codon encoding amino acid L at position 332 of SEQ ID NO:1 (nucleotides 994-996, ctg, of SEQ ID NO:2) with the codon encoding R (aga), the codon encoding amino acid G at position 478 of SEQ ID NO:1 (nucleotides 1432-1434, ggc, of SEQ ID NO:2) with the codon encoding R (aga), and the codon encoding amino acid S at position 599 of SEQ ID NO:1 (nucleotides 1795-1797, agc, of SEQ ID NO:2) with the codon encoding R (aga), while keeping the other nucleotides of SEQ ID NO:2 unchanged.

[0028] The present invention also provides biomaterials related to fusion proteins, wherein the fusion protein is the aforementioned fusion protein, and the biomaterial is any one of the following:

[0029] B1) a nucleic acid molecule encoding the aforementioned fusion protein;

[0030] B2) an expression cassette containing the nucleic acid molecule described in B1);

[0031] B3) a recombinant vector containing the nucleic acid molecule described in B1), or a recombinant vector containing the expression cassette described in B2);

[0032] B4) a recombinant microorganism containing the nucleic acid molecule described in B1), or a recombinant microorganism containing the expression cassette described in B2), or a recombinant microorganism containing the recombinant vector described in B3).

[0033] In the above-mentioned biological materials related to fusion proteins, the nucleotide sequence of the nucleic acid molecule described in B1) is obtained by replacing nucleotides 218 to 3352 of SEQ ID NO:3 with nucleotides 1-3135 of the coding sequence of the aforementioned Cas mutant protein while keeping the other nucleotides of SEQ ID NO:3 unchanged.

[0034] In the aforementioned biological materials, the expression cassette containing the nucleic acid molecule described in B2) refers to DNA capable of expressing the RNA molecules described above in host cells. The expression cassette may also include single-stranded or double-stranded nucleic acid molecules containing all regulatory sequences necessary for expressing the nucleic acid molecule of any of the aforementioned proteins or the DNA of the RNA molecule. The regulatory sequences, under compatible conditions, direct the coding sequence to express the DNA of any of the aforementioned proteins or the RNA molecule in a suitable host cell. The regulatory sequences include, but are not limited to, leader sequences, polyadenylation sequences, propeptide sequences, promoters, signal sequences, and transcription terminators. At a minimum, the regulatory sequences must include a promoter and termination signals for transcription and translation. The regulatory sequences may be provided with linkers in order to introduce specific restriction enzyme sites for ligating the regulatory sequences to the coding region of the nucleic acid sequence encoding the protein or the DNA of the RNA molecule. The regulatory sequence may be a suitable promoter sequence, i.e., a nucleic acid sequence that is recognized by the host cell expressing the nucleic acid sequence. The promoter sequence contains transcriptional regulatory sequences that mediate expression of the protein or the DNA of the RNA molecule. The promoter can be any nucleic acid sequence that is transcriptionally active in the selected host cell, including mutated, truncated, and hybrid promoters, and can be derived from genes encoding extracellular or intracellular proteins that are homologous or heterologous to the host cell. The regulatory sequence can also be a suitable transcription termination sequence, i.e., a sequence that is recognized by the host cell to terminate transcription.
The termination sequence is operatively linked to the 3' end of the nucleic acid sequence encoding the protein or the DNA of the RNA molecule. Any terminator that can function in the selected host cell can be used in this invention. The regulatory sequence can also be a suitable leader sequence, i.e., an untranslated region of mRNA that is crucial for translation in the host cell. The leader sequence is operatively linked to the 5' end of the nucleic acid sequence encoding the protein or the DNA of the RNA molecule. Any leader sequence that can function in the selected host cell can be used in this invention. The regulatory sequence can also be a signal peptide coding region, which encodes an amino acid sequence linked to the amino terminus of a protein, capable of guiding the DNA encoding the protein or the RNA molecule into the cellular secretion pathway. Signal peptide coding regions that can guide the expressed protein or the DNA of the RNA molecule into the secretion pathway of the host cell used can be used in this invention. Adding regulatory sequences that can modulate the expression of proteins or RNA molecules according to the growth status of the host cell may also be necessary. Examples of regulatory sequences are systems that respond to chemical or physical stimuli (including in the presence of regulatory compounds), thereby turning gene expression on or off. Other examples of regulatory sequences are those that enable gene amplification.

[0035] In the aforementioned biological materials, the vector may be a plasmid, a cosmid, a bacteriophage, or a viral vector.

[0036] In the above-mentioned biological materials, the microorganisms may be yeast, bacteria, algae or fungi.

[0037] The present invention also provides a composition for gene editing, the composition comprising the aforementioned Cas mutant protein and at least one gRNA; the gRNA being capable of binding to the aforementioned Cas mutant protein.

[0038] In the above composition, the gRNA includes a first segment and a second segment; the first segment is also referred to as a "backbone region", "protein binding segment", "protein binding sequence", or "direct repeat sequence"; the second segment is also referred to as a "target sequence for targeting nucleic acid", "target segment for targeting nucleic acid", or "guide sequence for targeting target sequence".

[0039] The first segment of the gRNA can interact with the aforementioned Cas mutant protein or fusion protein, thereby enabling the Cas mutant protein and gRNA to form a complex or enabling the fusion protein to form a complex with the gRNA.

[0040] The present invention also provides the application of Cas protein in gene editing or in the preparation of products for gene editing, wherein the Cas protein is the aforementioned Cas mutant protein.

[0041] The present invention also provides the application of a fusion protein in gene editing or in the preparation of products for gene editing, wherein the fusion protein is the aforementioned fusion protein.

[0042] The product is a combination reagent or kit for gene editing.

[0043] The present invention also provides an application of a biomaterial, wherein the biomaterial is the aforementioned biomaterial related to mutant proteins or biomaterial related to fusion proteins, and the application is the use of the biomaterial in gene editing or in the preparation of products for gene editing.

[0044] The present invention also provides a kit for gene editing, the kit comprising the aforementioned Cas mutant protein, the aforementioned fusion protein, or the aforementioned composition.

[0045] The kit also includes reagents necessary for gene editing, such as containers, reagents, culture media, cytokines, buffer solutions, and antibodies.

[0046] This invention improves editing efficiency by mutating multiple amino acids in the wild-type Cas12i protein, and can be used for multi-gene editing.

Attached Figure Description

[0047] Figure 1 shows the editing efficiency of each combination obtained by adding the efficient mutations of regions 1 and 3 to the 168 / 599 combination of region 2. WT represents the recombinant vector expressing the Cas12i.3 mutant S273R, and 168 / 599 represents the recombinant vector expressing the mutant protein 168 / 599, which is obtained by mutating amino acids 168, 273, and 599 of SEQ ID NO:1 to arginine (R) while keeping the other amino acids unchanged. 1-1 to 1-8 represent the recombinant vectors expressing the following mutant proteins: 1-1, N168R+L332R+S599R+S273R; 1-2, N168R+S599R+D851R+S273R; 1-3, N168R+G478R+S599R+S273R; 1-4, N168R+G478R+D551R+S599R+S273R; 1-5, N168R+L332R+G478R+S599R+S273R; 1-6, N168R+L332R+G478R+D551R+S599R+S273R; 1-7, N168R+G478R+S599R+D851R+S273R; 1-8, N168R+G478R+D551R+S599R+D851R+S273R.

[0048] Figure 2 shows the editing efficiencies of the mutant N168R+S273R+L332R+G478R+S599R at ADRB2, CHRM4, FANCF, and CXCR4. For each of the four genes, the control group represents the S273R gene editing vector targeting that gene, and 168 / 332 / 478 / 599 represents the N168R+S273R+L332R+G478R+S599R gene editing vector targeting that gene.

[0049] Figure 3 illustrates the construction of vectors with 20 and 30 target sites using the mutant N168R+S273R+L332R+G478R+S599R.

[0050] Figure 4 is a gel image showing T7E1 digestion at targets 3, 5, 7, 8, 9, and 12 of the 20-target-site vector for the mutant N168R+S273R+L332R+G478R+S599R. In the image, 3 represents target GAPDH-1, 5 represents target LMNA-1, 7 represents target AR, 8 represents target ADRB2, 9 represents target CCR4, and 12 represents target CD2. Each lane has three bands marked with an asterisk (*). The band with the largest molecular weight is the uncleaved specific band, and the two bands with smaller molecular weights are the cleaved specific bands.

[0051] Figure 5 is a gel image showing T7E1 digestion at targets 3, 5, 7, 8, 9, 12, 21, 24, 28, and 30 of the 30-target-site vector for the mutant N168R+S273R+L332R+G478R+S599R. In the image, 3 represents target GAPDH-1, 5 represents target LMNA-1, 7 represents target AR, 8 represents target ADRB2, 9 represents target CCR4, 12 represents target CD2, 21 represents target DNMT1-1, 24 represents target EMX1-1, 28 represents target FANCF-3, and 30 represents target VEGFA. Each lane has three bands marked with an asterisk (*). The band with the largest molecular weight is the uncleaved specific band, and the two bands with smaller molecular weights are the cleaved specific bands.

Detailed Implementation

[0052] Definitions:

[0053] In this invention, amino acid residues can be represented by a single letter or by three letters, for example: alanine (Ala, A), valine (Val, V), glycine (Gly, G), leucine (Leu, L), glutamine (Gln, Q), phenylalanine (Phe, F), tryptophan (Trp, W), tyrosine (Tyr, Y), aspartic acid (Asp, D), asparagine (Asn, N), glutamic acid (Glu, E), lysine (Lys, K), methionine (Met, M), serine (Ser, S), threonine (Thr, T), cysteine (Cys, C), proline (Pro, P), isoleucine (Ile, I), histidine (His, H), and arginine (Arg, R).

[0054] The present invention will now be described in further detail with reference to specific embodiments. The given embodiments are merely illustrative of the invention and not intended to limit its scope. The embodiments provided below can serve as a guide for further improvements by those skilled in the art and do not constitute a limitation on the invention in any way.

[0055] Unless otherwise specified, the experimental methods used in the following examples are conventional methods, performed according to the techniques or conditions described in the literature in this field or according to the product instructions. Unless otherwise specified, the materials and reagents used in the following examples are commercially available.

[0056] The following examples used GraphPad Prism 8 statistical software to process the data. The experimental results are expressed as mean ± standard deviation. Unpaired t-tests were used, and P < 0.05 (*) indicates a significant difference.

[0057] Example 1: Screening for high-efficiency Cas12i mutant combinations in three regions

[0058] I. Obtaining mutant proteins

[0059] 1. Obtain the S273R mutant

[0060] In earlier research, the inventors provided the amino acid sequence of wild-type Cas12i.3 (amino acid sequence SEQ ID NO:1; CDS sequence SEQ ID NO:2). Later, during the experiments, a mutation was unintentionally introduced: the third base of the codon for the 273rd amino acid of Cas12i.3 changed from C to A, converting the amino acid at that position from serine (S) to arginine (R) (S273R). Hereinafter, this variant of Cas12i.3 is referred to as S273R.

[0061] 2. Obtain L332R and other mutants

[0062] The tag, nuclear localization signal, linker, T5 exonuclease sequence, and S273R were synthesized and assembled into the coding sequence of a fusion protein of the Cas protein variant S273R (nucleotide sequence SEQ ID NO:3, 4303 bp in total). Nucleotides 1-31 are vector homologous sequences (including gccaccatgg, the Kozak sequence), nucleotides 32-34 are the start codon ATG, nucleotides 35-100 are the 3×FLAG tag sequence (encoding 22 amino acids), nucleotides 101-193 are the first nuclear localization signal sequence (C-Myc NLS and BP NLS, encoding 31 amino acids), nucleotides 194-217 are the first linker (encoding the amino acid sequence GHIHGVPAA, totaling 8 amino acids), nucleotides 218-3352 are the nucleotide sequence of S273R (encoding the 1045-amino-acid sequence obtained by mutating serine S at position 273 of SEQ ID NO:1 to arginine R while keeping the other amino acids unchanged), nucleotides 3353-3382 are the nucleotide sequence of the second linker (encoding the amino acid sequence SGGSGGSGGS, a total of 10 amino acids), nucleotides 3383-4255 are the T5 exonuclease coding sequence (encoding 291 amino acids), and nucleotides 4256-4303 are the second nuclear localization signal sequence (nucleoplasmin NLS, encoding 16 amino acids). The synthesized nucleotide sequence is the DNA fragment of SEQ ID NO:3 (encoding 1423 amino acids).
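The segment boundaries listed for SEQ ID NO:3 can be checked against the stated residue counts with simple arithmetic. The sketch below uses only the coordinates given in the text. The segment labels and helper names are illustrative, and the counts cover the seven listed segments only, not the start codon at nucleotides 32-34.

```python
# Segments of SEQ ID NO:3 as listed in the text:
# (label, first nucleotide, last nucleotide), 1-based and inclusive.
SEGMENTS = [
    ("3xFLAG tag", 35, 100),
    ("first NLS", 101, 193),
    ("first linker", 194, 217),
    ("Cas12i.3 S273R", 218, 3352),
    ("second linker", 3353, 3382),
    ("T5 exonuclease", 3383, 4255),
    ("second NLS", 4256, 4303),
]


def residue_count(first: int, last: int) -> int:
    """Number of codons in an inclusive nucleotide range."""
    length = last - first + 1
    if length % 3 != 0:
        raise ValueError("range is not a whole number of codons")
    return length // 3


def check_contiguous(segments) -> None:
    """Raise if consecutive segments leave a gap or overlap."""
    for (_, _, previous_last), (label, first, _) in zip(segments, segments[1:]):
        if first != previous_last + 1:
            raise ValueError(f"gap or overlap before {label}")


check_contiguous(SEGMENTS)
COUNTS = {label: residue_count(first, last) for label, first, last in SEGMENTS}
TOTAL = sum(COUNTS.values())

if __name__ == "__main__":
    for label, count in COUNTS.items():
        print(f"{label}: {count} amino acids")
    print(f"total: {TOTAL} amino acids")
```

The per-segment counts (22, 31, 8, 1045, 10, 291, 16) sum to 1423, the total length stated for the fusion protein.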

[0063] The nucleotide sequence of S273R is obtained by replacing the codon encoding amino acid S at position 273 of SEQ ID NO:1 (nucleotides 817-819, agc, of SEQ ID NO:2) with the codon encoding R (aga). That is, the nucleotide sequence of S273R differs from SEQ ID NO:2 only at position 819, where cytosine deoxyribonucleotide (C) is replaced by adenine deoxyribonucleotide (A).

[0064] Based on the coding sequence SEQ ID NO:3 of the aforementioned synthesized Cas protein variant S273R fusion protein, the mutation sites tested on the S273R background were divided into three regions (Region 1, Region 2, Region 3) according to spatial location, and efficient mutation combinations were sought in each region. The mutations and mutation combinations distributed in the three regions are listed below. It should be noted that all of the mutant proteins listed below are based on S273R, so the S273R mutation site is not shown in the name. For example, L332R is actually a mutant protein obtained by mutating the 273rd amino acid of wild-type Cas12i.3 (amino acid sequence SEQ ID NO:1) from S (serine) to R (arginine) and the 332nd amino acid from L (leucine) to R (arginine) while keeping other amino acids unchanged.

[0065] Combinations distributed in region 1: L332R, T850R, D851R, L332R+T850R, L332R+D851R, L332R+T850R+D851R.

[0066] Combinations distributed in region 2: N168R, D233R, T235R, D267R, N168R+D267R, N168R+D233R, N168R+T235R, D233R+D267R, T235R+D267R, N168R+T235R+D267R, S7R, T505R, S599R, S7R+N168R, N168R+T505R, N168R+S599R, S7R+N168R+T505R, N168R+T505R+S599R, S7R+N168R+D267R, N168R+D267R+T505R, N168R+D267R+S599R, S7R+N168R+D267R+T505R, N168R+D267R+T505R+S599R.

[0067] Combinations distributed in region 3: S477R, G478R, D551R, L662R, D551R+L662R, S477R+L662R, G478R+L662R, S477R+D551R, G478R+D551R, S477R+D551R+L662R, G478R+D551R+L662R.

[0068] The mutant proteins corresponding to the combinations of the above regions are named according to the mutation positions. Taking T850R, L332R+T850R, S477R+D551R+L662R, and N168R+D267R+T505R+S599R as examples, T850R is a mutant protein obtained by mutating amino acid position 273 of wild-type Cas12i.3 (amino acid sequence SEQ ID NO:1) from S to R and amino acid position 850 from T to R, while keeping other amino acids unchanged. L332R+T850R is a mutant protein obtained by mutating amino acid position 273 of wild-type Cas12i.3 (amino acid sequence SEQ ID NO:1) from S to R, amino acid position 332 from L to R, and amino acid position 850 from T to R, while keeping other amino acids unchanged. S477R+D551R+L662R is a mutant protein obtained by mutating amino acid position 273 (S to R), amino acid position 477 (S to R), amino acid position 551 (D to R), and amino acid position 662 (L to R) of wild-type Cas12i.3 (amino acid sequence SEQ ID NO:1) while keeping other amino acids unchanged. N168R+D267R+T505R+S599R is a mutant protein obtained by mutating amino acid position 273 (S to R), amino acid position 168 (N to R), amino acid position 267 (D to R), amino acid position 505 (T to R), and amino acid position 599 (S to R) of wild-type Cas12i.3 (amino acid sequence SEQ ID NO:1) while keeping other amino acids unchanged. The aforementioned combinations will be used to refer to the corresponding mutant proteins thereafter.
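Because every named mutant carries S273R without showing it in the name, a combination name can be expanded mechanically into its full set of substitutions. The sketch below illustrates that naming convention only. The function names `parse` and `expand` are hypothetical, and ordering the result by residue position is a presentation choice, not something the patent specifies.

```python
import re

# S273R is present in every named mutant but omitted from the name (per the text).
BACKGROUND = "S273R"

# A substitution is written as <wild-type residue><position><new residue>.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse(substitution: str) -> tuple[str, int, str]:
    """Split e.g. 'L332R' into ('L', 332, 'R')."""
    match = _SUBSTITUTION.match(substitution)
    if match is None:
        raise ValueError(f"unrecognised substitution: {substitution!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def expand(name: str) -> list[str]:
    """Return every substitution in a named mutant, S273R included,
    sorted by residue position."""
    substitutions = set(name.split("+")) | {BACKGROUND}
    return sorted(substitutions, key=lambda item: parse(item)[1])


if __name__ == "__main__":
    for name in ("T850R", "L332R+T850R", "N168R+D267R+T505R+S599R"):
        print(name, "->", "+".join(expand(name)))
```

For example, `expand("L332R")` yields S273R and L332R, matching the description of the L332R mutant in the text.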

[0069] The preparation methods for the DNA fragments of the mutant proteins in the above combinations are similar. Taking L332R as an example, the DNA sequence encoding the aforementioned L332R is based on SEQ ID NO:2, with the nucleotides 817-819 in SEQ ID NO:2 (encoding amino acid S at position 273 of SEQ ID NO:1) replaced with AGA (encoding arginine), and the nucleotides 994-996 in SEQ ID NO:2 (encoding amino acid L at position 332 of SEQ ID NO:1) replaced with AGA (encoding arginine). Centered on the aforementioned AGA, the DNA sequence encoding the aforementioned L332R is divided into two parts, and two pairs of primers are designed for amplifying these two parts of the DNA sequence.

[0070] First pair of primers:

[0071] Primer-F: 5'-TCACTTTTTTTCAGGTTGGACCGGTGCC-3';

[0072] L332R-R: 5'-GTCGCCCTCGCTCAGTCTGCCtctCAGCTCCAC-3';

[0073] Second pair of primers:

[0074] L332R-F: 5'-GTGGAGCTGagaGGCAGACTGAGCGAGGGCGAC-3';

[0075] Primer-R: 5'-CTTTTTCTTTTTTGCCTGGCCGGCCT-3'.

[0076] L332R-R and L332R-F are each designed to contain the mutated codon for amino acid position 332 (shown in lowercase letters in L332R-R and L332R-F), and each is 33 nt long.
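The two mutagenic primers can be checked for consistency: for an overlap-based assembly they should be reverse complements of each other, with the lowercase triplet marking the arginine codon aga on the coding strand. The following sketch performs that check on the primer sequences given above; the helper names are illustrative.

```python
# Complement table that preserves upper/lower case, so the lowercase
# mutant codon stays visible after reverse-complementing.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    return sequence.translate(_COMPLEMENT)[::-1]


def lowercase_part(sequence: str) -> str:
    """Return the lowercase (mutated) bases of a primer."""
    return "".join(base for base in sequence if base.islower())


# Primer sequences as given in the text (5' to 3').
L332R_F = "GTGGAGCTGagaGGCAGACTGAGCGAGGGCGAC"
L332R_R = "GTCGCCCTCGCTCAGTCTGCCtctCAGCTCCAC"

PRIMERS_COMPLEMENTARY = reverse_complement(L332R_F) == L332R_R

if __name__ == "__main__":
    print("length F / R:", len(L332R_F), len(L332R_R))
    print("mutant codon on coding strand:", lowercase_part(L332R_F))
    print("R is the reverse complement of F:", PRIMERS_COMPLEMENTARY)
```

Both primers are 33 nt, and the lowercase tct in L332R-R is the reverse complement of the aga codon in L332R-F.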

[0077] Using S273R as a template, the first nucleotide sequence of the L332R mutant protein was amplified using Primer-F and L332R-R, and the second nucleotide sequence was amplified using L332R-F and Primer-R, resulting in two fragments containing overlapping sequences. The fragment amplification kit used was PrimeSTAR Max DNA Polymerase (Baori Biotechnology Co., Ltd., R045A); detailed experimental procedures are available in the instruction manual. The two fragments were then ligated using the pEASY-Basic Seamless Cloning and Assembly Kit (Beijing TransGen Biotech Co., Ltd., CU201-02); detailed experimental procedures are available in the instruction manual. A PCR product containing the nucleotide sequence encoding the L332R mutant protein was obtained.

[0078] Using the methods and examples described above, target mutations can be accurately and efficiently introduced to obtain DNA fragments of mutant proteins from various combinations of distribution regions 1, 2, and 3, as well as any combination thereof.

[0079] It should be noted that the DNA fragments of the above mutant proteins all contain sequences such as tags, nuclear localization signals, linkers, and T5 exonucleases.

[0080] II. Constructing a gene-editing plasmid targeting tdTomato and containing mutant proteins.

[0081] 1. Construct the U6-CBh-Cas9-T2A-EGFP-bGH polyA vector

[0082] PX458: also known as U6-sgRNA-CBh-Cas9-T2A-EGFP-bGH polyA, purchased from the Addgene vector repository, catalog number 48138.

[0083] PX458 was double-digested with the restriction endonucleases BbsI (NEB) and XbaI (NEB) to remove the sgRNA scaffold sequence, yielding a linearized PX458 vector. Primers TF and TR were synthesized and annealed to generate a double-stranded DNA fragment (the annealed product) whose ends are complementary to those of the digested, linearized PX458 vector. Annealing system: 2.5 μL TF (100 μM), 2.5 μL TR (100 μM), 1 μL T4 ligase buffer, ddH2O to 10 μL. Annealing program: 95℃ for 5 min in a metal bath, then open the lid, switch off the metal bath, and allow the mixture to cool to room temperature. The recovered linearized PX458 vector (double-digested with BbsI and XbaI) was ligated with the annealed product using a T4 ligase kit (Baori Biotechnology Co., Ltd.) to obtain the vector U6-CBh-Cas9-T2A-EGFP-bGH polyA.

[0084] TF: 5'-CACCACTAGTT-3';

[0085] TR: 5'-CTAGAACTAGT-3'.

[0086] The difference between U6-CBh-Cas9-T2A-EGFP-bGH polyA and PX458 is that U6-CBh-Cas9-T2A-EGFP-bGH polyA is a recombinant vector obtained by replacing the sgRNA scaffold sequence between the BbsI and XbaI restriction sites of PX458 with a DNA fragment formed by annealing TF and TR, while keeping other nucleotides of PX458 unchanged.
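The structure of the annealed TF/TR product can be sketched in Python. This is a minimal illustration under one stated assumption: each oligo carries a 4-nt 5' overhang (the usual form of sticky ends for this kind of insert), so only the remaining bases are expected to pair. The function name `annealed_duplex` and the constant `OVERHANG` are hypothetical.

```python
# Illustrative model of the TF/TR annealed product.
# Assumption (not stated in the text): each oligo has a 4-nt 5' overhang.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

TF = "CACCACTAGTT"
TR = "CTAGAACTAGT"
OVERHANG = 4  # hypothetical parameter for the assumed overhang length


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def annealed_duplex(top: str, bottom: str, overhang: int = OVERHANG):
    """Return (top 5' overhang, paired core, bottom 5' overhang).

    Raises ValueError if the bases outside the assumed overhangs do not pair.
    """
    core = top[overhang:]
    if core != reverse_complement(bottom[overhang:]):
        raise ValueError("oligos do not pair outside the assumed overhangs")
    return top[:overhang], core, bottom[:overhang]


if __name__ == "__main__":
    top_overhang, core, bottom_overhang = annealed_duplex(TF, TR)
    print(top_overhang, core, bottom_overhang)
    # ACTAGT is the SpeI recognition sequence.
    print("ACTAGT" in core)
```

Under this assumption the paired core contains ACTAGT, the SpeI recognition sequence, which is consistent with the SpeI digestion used in the next construction step.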

[0087] 2. Construct a gene-editing plasmid targeting tdTomato and containing mutant proteins.

[0088] (1) Construction of plasmids targeting tdTomato

[0089] This embodiment designs a gRNA targeting tdTomato based on the coding sequence of tdTomato (positions 2529-3959 of GenBank Accession No. KT878736.1 (Update Date 06-OCT-2015)). The target sequence is 5'-AAGACCAUCUACAUGGCCAAGAA-3' (targeting nucleotides 3063 to 3085 of GenBank Accession No. KT878736.1). The U6-CBh-Cas9-T2A-EGFP-bGH polyA vector was double-digested with KpnI (NEB) and SpeI (NEB), whose restriction sites lie after the U6 promoter, and the large fragment containing the U6 promoter was recovered as the linearized U6-CBh-Cas9-T2A-EGFP-bGH polyA vector. A PCR product containing the DNA fragment expressing the gRNA sequence targeting tdTomato was amplified using primers tdTomato-F and tdTomato-R. The PCR product was recovered with a product recovery kit (Guangzhou Meiji Biotechnology Co., Ltd., catalog number D2111-02), and its concentration was determined. This DNA fragment was then assembled with the linearized U6-CBh-Cas9-T2A-EGFP-bGH polyA vector by homologous recombination using a seamless cloning kit to form the U6-crRNA-CBh-Cas9-T2A-EGFP-bGH polyA vector.

[0090] The DNA fragment expressing the gRNA sequence targeting tdTomato is 5'-aaaggacgaaacaccGAGAGAATGTGcGCATAGTCgCACAAGACCATCTACATGGCCAAGAATTTTTTTgtacccgttacataa-3' (84bp). Nucleotides 17 to 39 are direct repeats, and nucleotides 40 to 62 are the target sequence, targeting bases 3063 to 3085 of GenBank Accession No. KT878736.1. Nucleotides 63 to 69 are transcription termination signals; the lowercase letters indicate sequences homologous to the U6-CBh-Cas9-T2A-EGFP-bGH polyA vector, and the additional G at position 16 enhances transcription.

[0091] tdTomato-F: 5'-aaaggacgaaacaccGAGAGAATGTGCGCATAGTCGCAC-3';

[0092] tdTomato-R: 5'-ttatgtaacgggtacAAAAAAATTCTTGGCCATGTAGATGGTCTTGTGCGACTATGCGCA-3'.
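The layout of the 84-bp gRNA expression fragment can be reproduced from the two primers alone. The Python sketch below is illustrative: it assumes the fragment is generated by extension of the two primers over their complementary 3' ends, and the names `assemble`, `feature`, and `FEATURES` are hypothetical. The coordinates are those given in paragraph [0090].

```python
# Illustrative reconstruction of the tdTomato gRNA expression fragment.
# Assumption: the two primers overlap at their 3' ends and are extended.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

TDTOMATO_F = (
    "aaaggacgaaacacc"            # left vector homology
    "G"                          # extra G
    "AGAGAATGTGCGCATAGTCGCAC"    # direct repeat
)
TDTOMATO_R = (
    "ttatgtaacgggtac"            # right vector homology (reverse strand)
    "AAAAAAA"                    # terminator (reverse strand)
    "TTCTTGGCCATGTAGATGGTCTT"    # target (reverse strand)
    "GTGCGACTATGCGCA"            # 3' part of the direct repeat (reverse strand)
)

# 1-based inclusive coordinates from paragraph [0090].
FEATURES = {
    "left_homology": (1, 15),
    "extra_G": (16, 16),
    "direct_repeat": (17, 39),
    "target": (40, 62),
    "terminator": (63, 69),
    "right_homology": (70, 84),
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, preserving upper/lower case."""
    return seq.translate(_COMPLEMENT)[::-1]


def assemble(forward: str, reverse: str):
    """Join two primers over their longest 3' overlap; return (fragment, overlap)."""
    bottom = reverse_complement(reverse)
    for k in range(min(len(forward), len(bottom)), 0, -1):
        if forward.upper().endswith(bottom[:k].upper()):
            return forward + bottom[k:], k
    raise ValueError("primers do not overlap")


def feature(seq: str, name: str) -> str:
    """Slice a named feature using 1-based inclusive coordinates."""
    start, end = FEATURES[name]
    return seq[start - 1:end]


if __name__ == "__main__":
    fragment, overlap = assemble(TDTOMATO_F, TDTOMATO_R)
    print(len(fragment), overlap)
    for name in FEATURES:
        print(name, feature(fragment, name))
```

The slice for `target`, written as RNA, corresponds to the target sequence given in paragraph [0089].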

[0093] The difference between the U6-crRNA-CBh-Cas9-T2A-EGFP-bGH polyA vector and the U6-CBh-Cas9-T2A-EGFP-bGH polyA vector is that the U6-crRNA-CBh-Cas9-T2A-EGFP-bGH polyA vector is a recombinant vector obtained by replacing the DNA fragment between the KpnI and SpeI restriction sites of the U6-CBh-Cas9-T2A-EGFP-bGH polyA vector with a DNA fragment expressing a gRNA sequence targeting tdTomato, while keeping the other nucleotide sequences of the U6-CBh-Cas9-T2A-EGFP-bGH polyA vector unchanged.

[0094] (2) Insertion of mutant proteins and plasmid construction

[0095] The plasmid U6-crRNA-CBh-Cas9-T2A-EGFP-bGH polyA was digested with the restriction endonucleases AgeI (NEB) and FseI (NEB) to remove the Cas9 coding sequence, resulting in a linearized plasmid without the Cas9 coding sequence.

[0096] The DNA fragments of the mutant proteins from "I. Obtaining mutant proteins" above were each seamlessly cloned into the aforementioned linearized plasmid lacking the Cas9 coding sequence, resulting in a set of U6-crRNA-CBh-T2A-EGFP-bGH polyA vectors, each expressing one of the mutant proteins.

[0097] For example, the DNA fragment of the L332R mutant protein was seamlessly cloned into the U6-crRNA-CBh-T2A-EGFP-bGH polyA vector to obtain a recombinant vector that can express the L332R mutant protein.

[0098] Recombinant vectors expressing the aforementioned mutant proteins can be prepared using the methods described above.

[0099] The aforementioned recombinant vectors may be referred to in the following text as gene editing vectors containing Cas protein variants.

[0100] 2. Cell culture and plasmid transfection

[0101] The tdTomato sheep fibroblast fluorescent reporter cell line (Wang Linli, 2024, Establishment and Application of CRISPR / Cas12i.3 High-Efficiency Multiplex Gene Editing System, China Agricultural University, 2024 [D]) was cultured in DMEM (Gibco) containing 1% penicillin-streptomycin (Gibco) and 10% fetal bovine serum (Gibco). Cells in good condition were transferred to 10 cm culture dishes (Corning) and cultured until the cell confluence was approximately 80%. Cells were collected by digestion with trypsin-EDTA (0.25%, Gibco) into EP tubes. The cells were resuspended in 100 μL of electroporation buffer (Beijing Enggen Biotechnology Co., Ltd., catalog number 98668-20), and 7 μg of plasmid (the gene editing vector containing the Cas protein variant constructed above) was added and mixed thoroughly. The cells were then placed in a Lonza Amaxa Nucleofector 2B transfection instrument and electroporated using program A-033. Immediately after electroporation, 500 μL of LMEM high-glucose medium was added, and the cells were incubated at 37°C for 10 min. Cells were then seeded into 6-well plates using complete medium containing 20% FBS. Six hours after transfection, the medium was replaced with complete medium containing 15% FBS. Forty-eight hours after transfection, cells were digested with trypsin-EDTA (0.25%, Gibco), and flow cytometry was used to analyze changes in tdTomato fluorescence intensity in EGFP-positive cells, specifically the average quenching rate of tdTomato fluorescence intensity and the percentage of cells with weak fluorescence intensity.

[0102] 3. Summary of Results

[0103] In region 1, the L332R and D851R mutants showed the highest average fluorescence intensity quenching rates. In region 2, the N168R and N168R+D267R mutants were first found to have the highest average fluorescence intensity quenching rates; after the remaining mutations were introduced, the N168R+S599R and S7R+N168R+T505R mutants ultimately showed the highest average fluorescence intensity quenching rates. In region 3, the G478R and G478R+D551R mutants did not have the highest average fluorescence intensity quenching rates, but they had the highest proportion of cells with weak fluorescence intensity.

[0104] Example 2: Further optimization based on the selected high-efficiency mutation combinations within the three regions.

[0105] Through Example 1, it was found that the L332R and D851R mutants in Region 1, the N168R+S599R and S7R+N168R+T505R mutants in Region 2, and the G478R and G478R+D551R mutants in Region 3 have a good effect on improving editing efficiency. Furthermore, the N168R+S599R mutant in Region 2 shows a better efficiency improvement through a two-step combination. Therefore, based on the N168R+S599R mutant in Region 2, further combining the efficient combinations in Regions 1 and 3 is expected to further improve editing efficiency.

[0106] Based on the N168R+S599R mutant in region 2, and using a similar mutation introduction method as described in Example 1, a mutant was constructed.

[0107] Table 1. Mutation combination types of N168R+S599R in Region 2

[0108]
Name       Amino acid mutation types
168/599    N168R+S599R+S273R
1-1        N168R+L332R+S599R+S273R
1-2        N168R+S599R+D851R+S273R
1-3        N168R+G478R+S599R+S273R
1-4        N168R+G478R+D551R+S599R+S273R
1-5        N168R+L332R+G478R+S599R+S273R
1-6        N168R+L332R+G478R+D551R+S599R+S273R
1-7        N168R+G478R+S599R+D851R+S273R
1-8        N168R+G478R+D551R+S599R+D851R+S273R

[0109] Plasmids were constructed following the experimental method described in Example 1 and were then used for cell culture and plasmid transfection, likewise following the method described in Example 1.

[0110] In the above plasmids:

[0111] The recombinant vector expressing the N168R+S273R+L332R+G478R+S599R mutant (fusion protein NLS-N168R+S273R+L332R+G478R+S599R-T5): this recombinant vector differs from the U6-crRNA-CBh-Cas9-T2A-EGFP-bGH polyA vector only in that recombinant fragment M replaces the coding sequence of the Cas9 protein between the AgeI and FseI restriction sites of the U6-crRNA-CBh-Cas9-T2A-EGFP-bGH polyA vector; the other nucleotide sequences are identical. Recombinant fragment M is obtained by replacing nucleotides 218 to 3352 of SEQ ID NO:3 with the DNA fragment of the N168R+S273R+L332R+G478R+S599R mutant (excluding the stop codon, 3135 bp in total) while keeping the other nucleotides of SEQ ID NO:3 unchanged. The nucleotide sequence of this mutant DNA fragment (excluding the stop codon, 3135 bp in total) is obtained from SEQ ID NO:2 by replacing the codon encoding amino acid N at position 168 of SEQ ID NO:1 (nucleotides 502-504, aac, of SEQ ID NO:2) with the codon encoding R (aga), replacing the codon encoding amino acid S at position 273 of SEQ ID NO:1 (nucleotides 817-819, agc, of SEQ ID NO:2) with the codon encoding R (aga), replacing the codon encoding amino acid L at position 332 of SEQ ID NO:1 (nucleotides 994-996, ctg, of SEQ ID NO:2) with the codon encoding R (aga), replacing the codon encoding amino acid G at position 478 of SEQ ID NO:1 (nucleotides 1432-1434, ggc, of SEQ ID NO:2) with the codon encoding R (aga), and replacing the codon encoding amino acid S at position 599 of SEQ ID NO:1 (nucleotides 1795-1797, agc, of SEQ ID NO:2) with the codon encoding R (aga), while keeping the other nucleotides of SEQ ID NO:2 unchanged.
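The nucleotide coordinates quoted for the five substitutions follow directly from the amino acid positions. The short Python sketch below reproduces that arithmetic; the names `codon_span`, `mutation_label`, `WILD_TYPE`, and `CODON_TABLE_SUBSET` are hypothetical, and the codon table contains only the five codons mentioned in the text (standard genetic code).

```python
# Illustrative arithmetic for the codon coordinates quoted in the text.
# Only the five codons named in the paragraph are included in the table.

CODON_TABLE_SUBSET = {"aac": "N", "agc": "S", "ctg": "L", "ggc": "G", "aga": "R"}

# Amino acid position in SEQ ID NO:1 -> wild-type codon in SEQ ID NO:2.
WILD_TYPE = {168: "aac", 273: "agc", 332: "ctg", 478: "ggc", 599: "agc"}

MUTANT_CODON = "aga"
PROTEIN_LENGTH = 1045  # residues in the Cas mutant protein


def codon_span(aa_position: int) -> tuple:
    """Return the 1-based inclusive nucleotide range of a codon."""
    start = 3 * (aa_position - 1) + 1
    return start, start + 2


def mutation_label(aa_position: int) -> str:
    """Build a label such as 'N168R' from the codon table."""
    old = CODON_TABLE_SUBSET[WILD_TYPE[aa_position]]
    new = CODON_TABLE_SUBSET[MUTANT_CODON]
    return f"{old}{aa_position}{new}"


if __name__ == "__main__":
    for position in sorted(WILD_TYPE):
        print(mutation_label(position), codon_span(position))
    # Coding length without the stop codon.
    print(3 * PROTEIN_LENGTH)
```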

[0112] The recombinant vector expressing the N168R+S273R+L332R+G478R+S599R mutant expresses the fusion protein NLS-N168R+S273R+L332R+G478R+S599R-T5. This fusion protein consists of 1423 amino acids and is obtained by linking, via peptide bonds, the 22nd amino acid residue of the tag protein to the 1st amino acid residue of the first nuclear localization signal sequence, the 31st amino acid residue of the first nuclear localization signal sequence to the 1st amino acid residue of the first linker, the 8th amino acid residue of the first linker to the 1st amino acid residue of the N168R+S273R+L332R+G478R+S599R mutant, the 1045th amino acid residue of the N168R+S273R+L332R+G478R+S599R mutant to the 1st amino acid residue of the second linker, the 10th amino acid residue of the second linker to the 1st amino acid residue of the T5 exonuclease, and the 291st amino acid residue of the T5 exonuclease to the 1st amino acid residue of the second nuclear localization sequence.

[0113] The tag protein in the fusion protein NLS-N168R+S273R+L332R+G478R+S599R-T5 contains 22 amino acids, encoded by nucleotides 35-100 of SEQ ID NO:3; the first nuclear localization signal sequence in the fusion protein contains 31 amino acids, encoded by nucleotides 101-193 of SEQ ID NO:3; the first linker, which connects the first nuclear localization signal sequence and the N168R+S273R+L332R+G478R+S599R mutant, contains 8 amino acids, with the amino acid sequence GHIHGVPAA, encoded by nucleotides 194-217 of SEQ ID NO:3; the second linker, which connects the mutant and the T5 exonuclease, contains 10 amino acids, with the amino acid sequence SGGSGGSGGS, encoded by nucleotides 3353-3382 of SEQ ID NO:3; the T5 exonuclease in the fusion protein comprises 291 amino acids and is encoded by nucleotides 3383-4255 of SEQ ID NO:3; and the second nuclear localization sequence in the fusion protein comprises 16 amino acids and is encoded by nucleotides 4256-4303 of SEQ ID NO:3.
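The segment coordinates and residue counts given for the fusion protein can be cross-checked with a few lines of Python. The sketch is illustrative: it reads nucleotides 3353-3382 of SEQ ID NO:3 as the coding region of the second linker (the assignment that matches its stated 10-residue length), uses only the stated residue counts and coordinates, and the names `SEGMENTS` and `check_layout` are hypothetical.

```python
# Illustrative consistency check of the fusion protein layout.
# Each entry: (name, first nucleotide, last nucleotide, residues),
# with coordinates referring to SEQ ID NO:3 as stated in the text.

SEGMENTS = [
    ("tag protein", 35, 100, 22),
    ("first nuclear localization signal", 101, 193, 31),
    ("first linker", 194, 217, 8),
    ("Cas mutant protein", 218, 3352, 1045),
    ("second linker", 3353, 3382, 10),   # assumed reading of the coordinates
    ("T5 exonuclease", 3383, 4255, 291),
    ("second nuclear localization sequence", 4256, 4303, 16),
]


def check_layout(segments) -> int:
    """Verify codon-length agreement and contiguity; return total residues."""
    total = 0
    previous_end = None
    for name, start, end, residues in segments:
        if end - start + 1 != 3 * residues:
            raise ValueError(f"{name}: coordinates do not match residue count")
        if previous_end is not None and start != previous_end + 1:
            raise ValueError(f"{name}: not contiguous with previous segment")
        previous_end = end
        total += residues
    return total


if __name__ == "__main__":
    print(check_layout(SEGMENTS))
```

With these inputs every segment spans exactly three nucleotides per residue, the segments are contiguous, and the residue total matches the 1423 amino acids stated for the fusion protein.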

[0114] 2. Summary of Results

[0115] As shown in Figure 1, further cross-regional combination based on N168R+S599R, the region 2 mutant with the greatest efficiency improvement, yields N168R+S273R+L332R+G478R+S599R, which exhibits the most significant improvement in editing activity. This mutant, built on the S273R background, is therefore designated N168R+S273R+L332R+G478R+S599R (its amino acid sequence is SEQ ID NO:6).

[0116] Example 3: Verification of the editing efficiency of engineered Cas12i nuclease N168R+S273R+L332R+G478R+S599R

[0117] The recombinant vector expressing the S273R mutant obtained in Example 1 and the recombinant vector expressing the N168R+S273R+L332R+G478R+S599R mutant obtained in Example 2 were digested with KpnI (NEB) and SpeI (NEB) and recovered (there are SpeI and KpnI restriction sites after the U6 promoter) to obtain the digested S273R recombinant vector and the digested N168R+S273R+L332R+G478R+S599R recombinant vector.

[0118] Targets were designed for ADRB2, CHRM4, FANCF, and CXCR4 in HEK293T cells. DNA fragments expressing gRNA sequences targeting ADRB2, CHRM4, FANCF, and CXCR4 were amplified using primer F together with ADRB2-R, CHRM4-R, FANCF-R, and CXCR4-R, respectively. The PCR products were recovered with a product recovery kit (Guangzhou Meiji Biotechnology Co., Ltd., catalog number D2111-02), and their concentrations were determined. Each DNA fragment was assembled by homologous recombination, using a seamless cloning kit, with the enzyme-digested S273R recombinant vector or the enzyme-digested N168R+S273R+L332R+G478R+S599R recombinant vector, giving S273R gene editing vectors targeting the ADRB2, CHRM4, FANCF, or CXCR4 gene and N168R+S273R+L332R+G478R+S599R gene editing vectors targeting the ADRB2, CHRM4, FANCF, or CXCR4 gene (the gRNA DNA fragments for the four genes could be paired with either of the two enzyme-digested recombinant vectors). HEK293T cells were transfected using Lipofectamine 3000, and 20,000 EGFP-positive cells were collected for each combination. Primer information is as follows:

[0119] F: 5'-aaaggacgaaacaccGAGAGAATGTGCGCATAGTCGCAC-3';

[0120] ADRB2-R: 5'-taagttatgtaacggAAAAAAAACCGAGGCACGCACATACAGGCAGTGcGACTATGCgCA-3';

[0121] CHRM4-R: 5'-taagttatgtaacggAAAAAAACGTGTCTGGGGAGGAAGGGGAGAGTGcGACTATGCgCA-3';

[0122] FANCF-R: 5'-taagttatgtaacggAAAAAAAGTGCTAGTCCACTGGCTTCTGGGGTGcGACTATGCgCA-3';

[0123] CXCR4-R: 5'-taagttatgtaacggAAAAAAACTTCAGGCGCATCCCGCTTCCCTGTGcGACTATGCgCA-3'.

[0124] Table 2. DNA fragments expressing gRNA sequences targeting the above genes.

[0125]

[0126] In the above sequences, nucleotides 1 to 15 are the left homologous sequence of the vector, the G at position 16 is used to enhance RNA transcription, nucleotides 17 to 39 are the direct repeat sequence, nucleotides 40 to 62 are the target sequence (nucleotides 40 to 62 in ADRB2-R target nucleotides 148826203 to 148826225 of NC_000005.10, nucleotides 40 to 62 in CHRM4-R target nucleotides 46386582 to 46386604 of NC_000011.10, nucleotides 40 to 62 in FANCF-R target nucleotides 22625083 to 22625105 of NC_000011.10, and nucleotides 40 to 62 in CXCR4-R target nucleotides 136118388 to 136118410 of NC_000002.12), nucleotides 63 to 69 are transcription termination signals, and nucleotides 70 to 84 are the right homologous sequence of the vector.

[0127] DNA was extracted from the cells collected above. The target regions were amplified by PCR using the primers listed in the table below and then subjected to amplicon sequencing. The editing efficiency of each target was calculated using CRISPResso2 software: editing efficiency (%) = mutated reads / total reads × 100%.
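The editing-efficiency ratio is simple enough to state as code. The Python sketch below is a minimal illustration of the formula only; it does not call CRISPResso2, the function name `editing_efficiency` is hypothetical, and the read counts in the usage lines are made-up placeholders rather than data from this document.

```python
# Illustrative implementation of: mutated reads / total reads x 100%.
# This does not invoke CRISPResso2; it only restates the ratio.


def editing_efficiency(mutated_reads: int, total_reads: int) -> float:
    """Return the editing efficiency in percent."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mutated_reads <= total_reads:
        raise ValueError("mutated_reads must lie between 0 and total_reads")
    return mutated_reads / total_reads * 100


if __name__ == "__main__":
    # Placeholder counts for demonstration only.
    print(editing_efficiency(250, 1000))
    print(editing_efficiency(0, 1000))
```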

[0128] Table 3. PCR reaction primers

[0129]
Primer name            Sequence (5'-3')
HEK293T_ADRB2_seq_F    GGAGGGTGTGTCTCAGTGTC
HEK293T_ADRB2_seq_R    GCTTTTGGCTCTTCTGTGGC
HEK293T_CHRM4_seq_F    ATTCTGCCAGAGAATGTCCCTC
HEK293T_CHRM4_seq_R    ATTTCCACCGTCTCATAGCGA
HEK293T_CXCR4_seq_F    CCTGGGCCTCAGTGTCTCTA
HEK293T_CXCR4_seq_R    CAGGGGACCCTGCTGTTTG
HEK293T_VEGFA_seq_F    GAGAAGGCCAGGGGTCACTC
HEK293T_VEGFA_seq_R    AACTCTGTCCAGAGACACGC

[0130] As shown in Figure 2, compared with S273R, the editing efficiency of the Cas mutant protein N168R+S273R+L332R+G478R+S599R is significantly improved.

[0131] Example 4: Application of the engineered Cas12i nuclease (N168R+S273R+L332R+G478R+S599R) in multi-gene editing

[0132] The recombinant vector expressing the N168R+S273R+L332R+G478R+S599R mutant obtained in Example 3 was digested with SacI (NEB) and recovered to obtain the digested N168R+S273R+L332R+G478R+S599R recombinant vector.

[0133] Triple-crRNA arrays targeting 20 sites (in 12 genes) and 30 sites (in 17 genes), in genes such as ACTB, GAPDH, and LMNA, were synthesized and designated T20 and T30, respectively. Each array was then assembled by homologous recombination with the enzyme-digested N168R+S273R+L332R+G478R+S599R recombinant vector to form the gene editing vectors T20-U6 and T30-U6, which target 20 sites (12 genes) and 30 sites (17 genes), respectively (T20-U6 is shown as an example in Figure 3).

[0134] T20-U6 is a recombinant expression vector obtained by inserting a DNA fragment with the nucleotide sequence SEQ ID NO:4 (T20, the lowercase sequence agagaatgtgcgcatagtcgcac is a homologous repeat sequence in crRNA, totaling 1128 bp) into the SacI restriction site of the recombinant vector N168R+S273R+L332R+G478R+S599R while keeping other nucleotide sequences unchanged.

[0135] In SEQ ID NO:4, positions 1-25 are the vector homologous sequence used for seamless cloning with the enzyme-digested vector backbone. Nucleotides 26-135 form a triplet structure (which can stabilize mRNA lacking a poly(A) tail). Nucleotides 136-185 are nucleotides with no practical significance. Within nucleotides 186-1128, the lowercase letters are the repeat sequences, and the remaining positions are, in order, target ACTB-1, target ACTB-2, target GAPDH-1, target GAPDH-2, target LMNA-1, target LMNA-2, target AR, target ADRB2, target CCR4, target CCR10-1, target CCR10-2, target CD2, target CHRM4-1, target CHRM4-2, target CXCR4-1, target CXCR4-2, target HBB-1, target HBB-2, target IL1RN-1, and target IL1RN-2.

[0136] T30-U6 is a recombinant expression vector obtained by inserting a DNA fragment with the nucleotide sequence SEQ ID NO:5 (T30, the lowercase sequence agagaatgtgcgcatagtcgcac is a homologous repeat sequence in crRNA, totaling 1588bp) into the SacI restriction site of the recombinant vector N168R+S273R+L332R+G478R+S599R while keeping other nucleotide sequences unchanged.

[0137] In SEQ ID NO:5, positions 1-25 are the vector homologous sequence, nucleotides 26-135 form the triplet structure, and nucleotides 136-185 are nucleotides with no practical significance. Within nucleotides 186-1588, the lowercase letters are the repeat sequences, and the remaining positions are, in order, target ACTB-1, target ACTB-2, target GAPDH-1, target GAPDH-2, target LMNA-1, target LMNA-2, target AR, target ADRB2, target CCR4, target CCR10-1, target CCR10-2, target CD2, target CHRM4-1, target CHRM4-2, target CXCR4-1, target CXCR4-2, target HBB-1, target HBB-2, target IL1RN-1, target IL1RN-2, target DNMT1-1, target DNMT1-2, target DNMT1-3, target EMX1-1, target EMX1-2, target FANCF-1, target FANCF-2, target FANCF-3, target GRIN2B, and target VEGFA. The repeat sequences connect the target sites. The Cas12i protein possesses RNase activity and can cleave the expressed crRNA array at the 5' end of each repeat sequence, generating multiple crRNAs that target different genes.
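The way a repeat-spacer array is split into individual crRNAs can be illustrated with a simplified Python model. Several points are assumptions rather than statements from the text: every target is taken to be 23 nt long (as for the single-target fragments described earlier), the array is taken to end with one closing repeat, and processing is modelled as a clean cut at the 5' end of every repeat with the full repeat retained. The function names and the placeholder spacers are hypothetical, and DNA letters are used instead of RNA for simplicity.

```python
# Simplified, illustrative model of crRNA array processing.
# Assumptions: 23-nt targets, one closing repeat, cut at the 5' end of
# each repeat. Real crRNA maturation may trim the repeat differently.

REPEAT = "agagaatgtgcgcatagtcgcac"  # lowercase repeat quoted in the text
SPACER_LENGTH = 23                   # assumed target length


def array_length(n_targets: int, spacer_length: int = SPACER_LENGTH) -> int:
    """Length of n repeat-spacer units followed by one closing repeat."""
    return n_targets * (len(REPEAT) + spacer_length) + len(REPEAT)


def build_array(spacers) -> str:
    """Concatenate repeat + spacer units and add a closing repeat."""
    return "".join(REPEAT + spacer.upper() for spacer in spacers) + REPEAT


def process_array(array: str) -> list:
    """Cut at the 5' end of every repeat; return repeat + spacer units."""
    pieces = array.split(REPEAT)
    return [REPEAT + spacer for spacer in pieces if spacer]


# First spacer is the tdTomato target from Example 1; the other two are
# placeholders with no biological meaning.
EXAMPLE_SPACERS = ["AAGACCATCTACATGGCCAAGAA", "C" * 23, "G" * 23]

if __name__ == "__main__":
    for crrna in process_array(build_array(EXAMPLE_SPACERS)):
        print(crrna)
    # Compare with the repeat/target regions 186-1128 and 186-1588.
    print(array_length(20), 1128 - 186 + 1)
    print(array_length(30), 1588 - 186 + 1)
```

Under these assumptions the modelled lengths for 20 and 30 targets equal the lengths of the repeat and target regions stated for SEQ ID NO:4 and SEQ ID NO:5, which suggests, but does not prove, that both arrays follow this repeat-target-repeat pattern.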

[0138] HEK293T cells were transfected using Lipofectamine 3000, and 100,000 EGFP-positive cells were collected. T7E1 assays were performed on the targets at positions 3, 5, 7, 8, 9, and 12 of the 20-target array and at positions 3, 5, 7, 8, 9, 12, 21, 24, 28, and 30 of the 30-target array. Gene editing efficiency was calculated as $\%\,\text{indel} = 100 \times \left(1 - \left(1 - \text{Fraction Cleaved}\right)^{1/2}\right)$, where Fraction Cleaved = sum of gray values of the cleaved bands / sum of gray values of the cleaved bands and the uncleaved specific band. The target sequence information for the above crRNAs is shown in the table below, where targets 1-20 are the first 20 targets in Table 4, and targets 1-30 are all 30 targets in Table 4.
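The T7E1 calculation can be written as a short Python sketch. It reads the formula in its standard form, % indel = 100 × (1 − (1 − Fraction Cleaved)^(1/2)); the function names are hypothetical and the gray values in the usage lines are placeholders, not measurements from this document.

```python
# Illustrative T7E1 indel estimate from gel band intensities.
# Gray values used below are placeholders for demonstration only.

import math


def fraction_cleaved(cleaved_bands, uncleaved_band: float) -> float:
    """Cleaved gray values divided by cleaved plus uncleaved gray values."""
    cleaved = sum(cleaved_bands)
    total = cleaved + uncleaved_band
    if total <= 0:
        raise ValueError("no band intensity detected")
    return cleaved / total


def indel_percent(fraction: float) -> float:
    """Estimate % indel as 100 * (1 - sqrt(1 - fraction cleaved))."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction cleaved must be between 0 and 1")
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction))


if __name__ == "__main__":
    f = fraction_cleaved([30.0, 45.0], 25.0)
    print(f, indel_percent(f))
```

The square root reflects the fact that T7E1 cleaves heteroduplexes formed after re-annealing, so the cleaved fraction is not a direct count of edited alleles.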

[0139] The T7E1 detection method is as follows: genomic DNA was extracted from cells 48 hours after transfection using a genomic DNA extraction kit (Guangzhou Meiji Biotechnology Co., Ltd., catalog number D3018-02). 100 ng of the extracted genomic DNA was used as a template for PCR amplification with the primers listed in Table 5. The 50 μL amplification reaction contained 100 ng DNA template, 1 μL each of 10 μmol / L forward and reverse primers, and 25 μL PrimeSTAR (Baori Biotechnology Co., Ltd.), and the volume was brought to 50 μL with sterile deionized water. The PCR program was: 98℃ pre-denaturation for 3 min; 33 cycles of 98℃ denaturation for 10 s, 60℃ annealing for 15 s, and 72℃ extension for 30 s; and a final extension at 72℃ for 5 min. After PCR, the PCR product was recovered using a product recovery kit, and its concentration was determined.

[0140] Take the PCR product recovered from the previous step and prepare the following enzyme digestion system: 500 ng of amplified product, 1.1 μL of Cutsmart, and ddH2O to a final volume of 10.5 μL. After mixing thoroughly, follow the hybridization program: 95℃ for 10 min; -2℃ / s to 85℃; -0.1℃ / s to 25℃. Add 0.5 μL of T7E1 (NEB), digest at 37℃ for 15 min, immediately add 2 μL of Loading Buffer, prepare 2% agarose gel for electrophoresis analysis, and observe and analyze the results after enzyme digestion using a gel imaging system.

[0141] Table 4. Gene Targets

[0142]

[0143]

[0144]

[0145] In Table 4, NC_000007.14 and the other identifiers are the NCBI GenBank accession numbers for the corresponding genes. The sequences corresponding to these NCBI GenBank accession numbers are those of the update date closest to the filing date of this application.

[0146] Table 5. Amplification Primers

[0147]

[0148] The results are shown in Figure 4 and Figure 5, in which 3 represents target GAPDH-1, 5 represents target LMNA-1, 7 represents target AR, 8 represents target ADRB2, 9 represents target CCR4, 12 represents target CD2, 21 represents target DNMT1-1, 24 represents target EMX1-1, 28 represents target FANCF-3, and 30 represents target VEGFA. This demonstrates that all selected sites exhibited effective cleavage, indicating that the engineered Cas12i nuclease (N168R+S273R+L332R+G478R+S599R) can be used for multi-gene editing.

[0149] Table 6. Editing efficiency (Indel%) of each target in the 20-target array

[0150]
Target Name    Position in the array    Editing efficiency (Indel%)
GAPDH-1        3                        36.0
LMNA-1         5                        *
AR             7                        26.4
ADRB2          8                        30.8
CCR4           9                        12.2
CD2            12                       41.3

[0151] In the table above, "*" indicates that no target band was detected, resulting in no editing efficiency.

[0152] Table 7. Editing efficiency (Indel%) of each target in the 30-target array.

[0153]

[0154] SEQ ID NO:1 (Wild-type amino acid sequence)

[0155]

[0156] SEQ ID NO:2 (Wild-type nucleotide sequence)

[0157]

[0158] SEQ ID NO:3(3xFLAG-Myc NLS-BP NLS-Cas12i3(S273R)-linker(SGGSGGSGGS)-T5

[0159] exo-nucleoplasmin NLS (a marker of mutations in the DNA sequence) 4303bp

[0160]

[0161] SEQ ID NO:4(T20, the lowercase sequence indicates a homologous repeat sequence in crRNA)

[0162]

[0163] SEQ ID NO:5(T30, the lowercase sequence indicates a homologous repeat sequence in crRNA)

[0164]

[0165] SEQ ID NO:6 (protein sequence of the engineered Cas12i nuclease N168R+S273R+L332R+G478R+S599R, with the amino acid mutations marked)

[0166]

[0167] The present invention has been described in detail above. Those skilled in the art will recognize that the invention can be practiced in a wide range of ways with equivalent parameters, concentrations, and conditions without departing from its spirit and scope, and without requiring unnecessary experiments. While specific embodiments have been provided, it should be understood that further modifications can be made to the invention. In summary, according to the principles of the invention, this application is intended to include any changes, uses, or improvements to the invention, including changes made using conventional techniques known in the art that depart from the scope disclosed herein.

Claims

1. A Cas mutant protein, characterized in that the Cas mutant protein is a mutant protein in which amino acids 168, 273, 332, 478 and 599 of SEQ ID NO:1 are all mutated to arginine, while the other amino acid sequences remain unchanged.

2. A fusion protein, characterized in that the fusion protein is a protein comprising the Cas mutant protein of claim 1; the fusion protein is obtained by linking the 22nd amino acid residue of the tag protein with the 1st amino acid residue of the first nuclear localization signal sequence via a peptide bond, the 31st amino acid residue of the first nuclear localization signal sequence with the 1st amino acid residue of the first linker via a peptide bond, the 8th amino acid residue of the first linker with the 1st amino acid residue of the Cas mutant protein via a peptide bond, the 1045th amino acid residue of the Cas mutant protein with the 1st amino acid residue of the second linker via a peptide bond, the 10th amino acid residue of the second linker with the 1st amino acid residue of the T5 exonuclease via a peptide bond, and the 291st amino acid residue of the T5 exonuclease with the 1st amino acid residue of the second nuclear localization sequence via a peptide bond, for a total of 1423 amino acids. The tag protein in the fusion protein comprises 22 amino acids, encoded by nucleotides 35-100 of SEQ ID NO:3; the first nuclear localization signal sequence in the fusion protein comprises 31 amino acids, encoded by nucleotides 101-193 of SEQ ID NO:3; the first linker comprises 8 amino acids with the amino acid sequence GHIHGVPAA; the second linker comprises 10 amino acids with the amino acid sequence SGGSGGSGGS; the T5 exonuclease in the fusion protein comprises 291 amino acids, encoded by nucleotides 3383-4255 of SEQ ID NO:3; and the second nuclear localization sequence in the fusion protein comprises 16 amino acids, encoded by nucleotides 4256-4303 of SEQ ID NO:3.

3. A biomaterial associated with the Cas mutant protein, characterized in that the biomaterial is any one of the following: B1) a nucleic acid molecule encoding the Cas mutant protein of claim 1; B2) an expression cassette, recombinant vector, or recombinant microorganism containing the nucleic acid molecule described in B1).

4. A biomaterial associated with the fusion protein, characterized in that the biomaterial is any one of the following: B1) a nucleic acid molecule encoding the fusion protein of claim 2; B2) an expression cassette, recombinant vector, or recombinant microorganism containing the nucleic acid molecule described in B1).

5. A composition for gene editing, characterized in that the composition comprises the Cas mutant protein of claim 1 and at least one gRNA; the gRNA is capable of binding to the Cas mutant protein of claim 1.

6. A composition for gene editing, characterized in that the composition comprises the fusion protein of claim 2 and at least one gRNA; the gRNA is capable of binding to the fusion protein of claim 2.

7. Use of a Cas protein in gene editing or in the preparation of products for gene editing, characterized in that the Cas protein is the Cas mutant protein of claim 1.

8. Use of a fusion protein in gene editing or in the preparation of products for gene editing, characterized in that the fusion protein is the fusion protein of claim 2.

9. The use of the biomaterial of claim 3 or 4 in gene editing or in the preparation of products for gene editing.

10. A kit for gene editing, characterized in that, The kit comprises the Cas mutant protein of claim 1, the fusion protein of claim 2, or the composition of claim 5 or 6.