A Cas protein, a fusion protein, its corresponding gene editing system and applications
By modifying the Cas9d protein and gRNA scaffold, a compact Cas9d system and its derivatives were developed, solving the problems of large size and limited targeting range, and achieving efficient and precise genome editing and base conversion, which is suitable for the delivery of gene editing tools and the establishment of disease models.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- JINHUA INSTITUTE OF ZHEJIANG UNIVERSITY
- Filing Date
- 2026-03-03
- Publication Date
- 2026-06-30
AI Technical Summary
Existing DNA base editor tools suffer from limitations in size and target range, leading to complex delivery and increased off-target effects, making it difficult to achieve efficient and precise applications in genome editing.
Develop a compact Cas protein Cas9d and its derivatives, and through amino acid sequence mutation and gRNA scaffold optimization, combine it with a deaminase catalytic domain to form a compact and efficient genome editing system.
It enables efficient and precise genome cutting and base switching in mammalian cells, reduces off-target effects, and is suitable for the delivery of gene editing tools and the establishment of disease models.
Figure CN121780486B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of gene editing, specifically to a Cas protein, a fusion protein, a corresponding gene editing system, and its applications.
Background Technology
[0002] DNA base editors (BEs) are promising tools that correct or disrupt gene expression by inducing targeted single nucleotide transitions in the genome. The main advantages of BEs are their high editing efficiency and the fact that they do not require DNA double-strand breaks (DSBs), DNA donor templates, or specific cell cycles; these characteristics highlight their enormous potential in treating genetic diseases.
[0003] However, the large size of Cas9 and its derivatives makes delivery with a single adeno-associated virus (AAV) vector challenging: AAV is the most commonly used vector for in vivo gene editing, and its cargo capacity is typically limited to approximately 4.7 kb. Furthermore, the efficiency of DNA base editing is highly dependent on enzymes with HNH nuclease activity. Among the well-characterized CRISPR systems, type II Cas9 is the most widely used for this purpose. Cas9 consists of three RuvC-like domains and one HNH-like endonuclease domain, which cleave the non-target DNA strand and the target DNA strand, respectively. This mechanism has been successfully used to develop base editors and prime editors. In contrast, the type V Cas12 system, which is also used in genome editing, lacks the HNH nuclease domain, making it less effective for base editing.
[0004] The relatively large size of the Cas9 system has prompted efforts to evolve smaller Cas9 protein variants or develop alternative systems to replace SpCas9. Among these alternatives, which include IscB, TnpB, Cas12f, and Fanzor, only IscB possesses an HNH nuclease domain. IscB (~500 aa) belongs to the obligate mobile element-guided activity (OMEGA) systems and is considered an evolutionary ancestor of Cas9, sharing similar domains and functions.
[0005] Although previous studies have successfully engineered OgeuIscB (Han, D., Xiao, Q., Wang, Y. et al. Development of miniature base editors using engineered IscB nickase. Nat Methods 20, 1029–1036 (2023)) and IscB.m16 (Xiao, Q., Li, G., Han, D. et al. Engineered IscB–ωRNA system with expanded target range for base editing. Nat Chem Biol 21, 100–108 (2025)) to develop miniature DNA editors, their short spacer sequences (14–16 nt) and limited targeting range are significant limitations. Short spacers increase the number of potential off-target sites for IscB-derived tools, and the insufficient targeting flexibility leaves many pathogenic sites inaccessible.
[0006] Therefore, developing compact genome editing tools that are efficient and precise with broad targeting capabilities remains crucial for advancing research and therapeutic applications.
Summary of the Invention
[0007] To address the aforementioned technical problems, this invention provides a Cas protein.
[0008] A Cas protein, wherein the Cas protein is a mutant of a reference Cas protein, the amino acid sequence of which is shown in SEQ ID NO.1, and the Cas protein, compared with the amino acid sequence of the reference Cas protein, contains the following mutation sites:
[0009] (A) D10A and / or H375A; and / or
[0010] (B) Amino acid substitutions at any one or more of the following amino acid sites: E278, V281, G492, K543, M544, D545, S548, A552, F570, E576, G578, D582, C630, I632, E671, E683, T734; and / or
[0011] (C) Deletion of amino acids at positions 95-136 or positions 100-129.
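The mutation notation used in (A) to (C) can be illustrated with a minimal sketch. It assumes that substitutions are written as wild-type residue, 1-based position, new residue (for example D10A) and that deletion ranges are inclusive and numbered against the reference sequence. The function names and the toy sequence are hypothetical; the actual SEQ ID NO. 1 sequence is not reproduced here.

```python
import re


def apply_substitutions(reference: str, substitutions: list[str]) -> str:
    """Apply point substitutions such as 'D10A' (1-based positions) to a protein sequence."""
    residues = list(reference)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub!r}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        # Check against the reference so that a numbering mistake is caught early.
        if reference[position - 1] != wild_type:
            raise ValueError(f"{sub}: reference has {reference[position - 1]} at {position}")
        residues[position - 1] = replacement
    return "".join(residues)


def apply_deletion(reference: str, start: int, end: int) -> str:
    """Delete residues start..end (1-based, inclusive) from a protein sequence."""
    if not 1 <= start <= end <= len(reference):
        raise ValueError("deletion range is outside the sequence")
    return reference[: start - 1] + reference[end:]


# Hypothetical 21-residue toy sequence with D at position 10 (NOT SEQ ID NO. 1).
TOY_REFERENCE = "ACDEFGHIKDLMNPQRSTVWY"
nickase_like = apply_substitutions(TOY_REFERENCE, ["D10A"])
truncated = apply_deletion(TOY_REFERENCE, 12, 15)
print(nickase_like, truncated)
```

Because the positions refer to the reference numbering, substitutions should be applied before any deletion in this sketch; deleting first would shift every downstream position.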
[0012] In this invention, the reference Cas protein is MG34-1 Cas9d (hereinafter referred to as Cas9d (SEQ ID NO. 1)) from the phylum Deltaproteobacteria. It is a programmable, RNA-guided nuclease with a length of 747 amino acids (aa), smaller than SpCas9 (1368 aa). It represents an evolutionary bridge between the IscB and Cas9 systems and combines the advantages of both: a compact protein size and a relatively long spacer sequence of up to 20 nt.
[0013] It is currently believed that Cas9 evolved from IscB, which contains a small effector protein (approximately 420-500 amino acids in length) and a relatively large guide RNA scaffold sequence. With the evolution of IscB, the size of its protein domains increased, while the length of the guide RNA decreased. For example, OgeuIscB, a representative IscB, is 496 amino acids long, and its guide RNA scaffold sequence is 206 nucleotides long, while SpCas9, a representative Cas9, is 1368 amino acids long, and its guide RNA scaffold sequence is 76 nucleotides long. Unlike IscB and Cas9, the Cas9d (SEQ ID NO. 1; 747 aa) disclosed herein is longer than OgeuIscB (496 amino acids) but shorter than SpCas9 (1368 amino acids), and its guide RNA scaffold sequence (135 nt) is shorter than OgeuIscB (206 nt) but longer than SpCas9 (76 nt). Therefore, without being bound by theory, it is believed that the Cas9d (SEQ ID NO. 1) and its variants disclosed herein are evolutionary intermediates between Cas9 and its ancestor IscB, but do not belong to Cas9 or IscB.
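The size comparison above can be put in rough numerical terms against the approximately 4.7 kb AAV cargo limit mentioned in the background. The sketch below is only a back-of-the-envelope estimate: it counts three nucleotides per amino acid plus a stop codon and adds the guide RNA scaffold, and it deliberately ignores promoters, NLS tags, deaminase domains, polyadenylation signals, and ITRs, all of which consume additional capacity. The names `SYSTEMS`, `coding_length_bp`, and `remaining_cargo_bp` are illustrative.

```python
AAV_CARGO_LIMIT_BP = 4700  # approximate single-AAV cargo capacity quoted in the background

# Protein lengths (aa) and guide RNA scaffold lengths (nt) quoted in the description.
SYSTEMS = {
    "OgeuIscB": {"protein_aa": 496, "scaffold_nt": 206},
    "Cas9d": {"protein_aa": 747, "scaffold_nt": 135},
    "SpCas9": {"protein_aa": 1368, "scaffold_nt": 76},
}


def coding_length_bp(protein_aa: int) -> int:
    """Length of the open reading frame: 3 nt per residue plus one stop codon."""
    return 3 * (protein_aa + 1)


def remaining_cargo_bp(name: str) -> int:
    """Capacity left after the ORF and the scaffold; regulatory elements are NOT counted."""
    system = SYSTEMS[name]
    return AAV_CARGO_LIMIT_BP - coding_length_bp(system["protein_aa"]) - system["scaffold_nt"]


for name, system in SYSTEMS.items():
    print(name, coding_length_bp(system["protein_aa"]), remaining_cargo_bp(name))
```

Under these simplifying assumptions, the Cas9d reading frame leaves a little over 2 kb for fused domains and regulatory elements, whereas SpCas9 leaves only a few hundred base pairs.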
[0014] Preferably, the amino acid substitution refers to substitution with arginine.
[0015] More preferably, the Cas protein, compared to the reference Cas protein, contains the following mutation sites:
[0016] (A) D10A; and / or
[0017] (B) Any one or more of the following amino acid substitutions: V281R, G492R, M544R, E576R, D582R, I632R; and / or
[0018] (C) Deletion of amino acids at positions 95-136 or positions 100-129.
[0019] More preferably, the Cas protein, compared with the amino acid sequence of the reference Cas protein, contains the following mutation sites: (A) V281R, G492R, M544R and D582R; or (B) D10A, V281R, G492R, M544R and D582R.
[0020] More preferably, the Cas protein is selected from any one of the following (i) to (iv):
[0021] (i) The Cas protein, compared with the reference Cas protein, contains the following mutation sites: V281R, G492R, M544R, and D582R, the amino acid sequence of which is shown in SEQ ID NO. 8; or
[0022] (ii) The Cas protein, compared with the reference Cas protein, contains the following mutation sites: D10A, V281R, G492R, M544R, and D582R, the amino acid sequence of which is shown in SEQ ID NO. 9; or
[0023] (iii) A Cas protein that has the mutation sites described in (i) and, compared with the Cas protein described in (i), additionally lacks the amino acids at positions 95-136, the amino acid sequence of which is shown in SEQ ID NO. 18; or
[0024] (iv) A Cas protein that has the mutation sites described in (i) and, compared with the Cas protein described in (i), additionally lacks the amino acids at positions 100-129, the amino acid sequence of which is shown in SEQ ID NO. 19.
[0025] In this invention, arginine (Arg, R) substitutions were made in the RuvC, WED, and / or PI domains of the Cas9d protein (SEQ ID NO. 1), resulting in various Cas9d variants. One non-limiting example is the variant Cas9d-V281R / G492R / M544R / D582R (referred to as Cas9d Plus; SEQ ID NO. 8). The Cas9d Plus variant (SEQ ID NO. 8) was converted into a D10A nickase to obtain a variant with the sequence shown in SEQ ID NO. 9 (referred to as Cas9d Ultra). In addition, by deleting a portion of the Cas9d Plus sequence, truncated Cas9d variants were constructed, with sequences shown in SEQ ID NO. 18 (ST153; deletion of amino acids 95-136) and SEQ ID NO. 19 (ST160; deletion of amino acids 100-129), respectively.
[0026] The present invention also provides a fusion protein comprising the above-described Cas protein and one or more functional domains.
[0027] Preferably, the functional domain is selected from: nuclear localization signal (NLS), nuclear export signal (NES), base editing domain (e.g., deaminase or its catalytic domain), base excision domain, uracil glycosylase inhibitor (UGI) or its catalytic domain, uracil glycosylase (UNG) or its catalytic domain, methylpurine glycosylase (MPG) or its catalytic domain, methyltransferase or its catalytic domain, demethylase or its catalytic domain, transcription activation domain (e.g., VP64 or VPR), transcription repression domain (e.g., KRAB moiety or SID moiety), reverse transcriptase or its catalytic domain, exonuclease (e.g., T5E) or its catalytic domain, destabilizing domains (e.g., the destabilizing domain (DD) of E. coli dihydrofolate reductase (ecDHFR)), histone residue modification domains, nuclease catalytic domains (e.g., FokI), transcriptional modifiers, light-gated factors, chemically inducible factors, chromatin visualization factors, targeting peptides for providing binding to cell surface moieties on target cells or target cell types, reporter (e.g., fluorescent) peptides or detection markers (e.g., GST, HRP, CAT, GFP, HcRed, DsRed, CFP, YFP, BFP), localization signals, peptide targeting moieties, DNA-binding domains (e.g., MBP, LexA DBD, Gal4 DBD), epitope tags (e.g., His, myc, V5, FLAG, HA, VSV-G, Trx, etc.), transcription release factors, HDAC, ssRNA cleaving fragments, dsRNA cleaving fragments, ssDNA cleaving fragments, dsDNA cleaving fragments, DNA or RNA ligases, functional domains exhibiting activity in modifying target DNA, the catalytic domains and functional fragments (e.g., functional truncated fragments) of any of the foregoing, and any combination thereof.
[0028] More preferably, the activity in modifying target DNA is selected from: methyltransferase activity, DNA repair activity, DNA damage activity, superoxide dismutase activity, alkylation activity, dealkylation activity, depurination activity, oxidation activity, deoxygenation activity, pyrimidine dimer formation activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, glycosylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitination activity, adenylation activity, deadenylation activity, SUMOylation activity, deSUMOylation activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity (e.g., from O-GlcNAc transferase), and deglycosylation activity.
[0029] More preferably, the fusion protein comprises the aforementioned Cas protein and a deaminase catalytic domain.
[0030] More preferably, the deaminase domain is an adenosine deaminase catalytic domain or a cytidine deaminase catalytic domain.
[0031] More preferably, the adenosine deaminase catalytic domain is TadA deaminase.
[0032] More preferably, the adenosine deaminase catalytic domain comprises the sequence shown in SEQ ID NO. 10, or a sequence having at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100% homology with SEQ ID NO. 10.
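Percentage thresholds of this kind are commonly evaluated as percent identity over a pairwise alignment. The following is a minimal sketch under two assumptions that the text itself does not state: that "homology" is measured as sequence identity, and that identity is counted over all alignment columns (other conventions divide by the shorter sequence or exclude gap columns). The function name and the toy sequences are illustrative; the two inputs must already be aligned, with "-" marking gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    # A column counts as identical only when both residues match and neither is a gap.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True when the percent identity reaches the chosen threshold (80% is the lowest listed)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy sequences only; these are not fragments of SEQ ID NO. 10.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
print(meets_threshold("ACDE-FGH", "ACDEWFGH", threshold=90.0))
```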
[0033] More preferably, the catalytic domain of the cytidine deaminase is an APOBEC deaminase, such as human APOBEC3A deaminase.
[0034] More preferably, the cytidine deaminase catalytic domain comprises the sequence shown in SEQ ID NO. 12, or a sequence having at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100% homology with SEQ ID NO. 12.
[0035] More preferably, the fusion protein further includes a uracil DNA glycosylase inhibitor (UGI) domain, the amino acid sequence of which is shown in SEQ ID NO. 13.
[0036] More preferably, the fusion protein comprises a sequence as shown in any of SEQ ID NO. 11, SEQ ID NO. 14-17, or a sequence having at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100% homology with any of the sequences shown in SEQ ID NO. 11, SEQ ID NO. 14-17.
[0037] The present invention also provides a polynucleotide comprising a sequence encoding the aforementioned Cas protein or fusion protein.
[0038] The present invention also provides a vector comprising the above-described polynucleotide.
[0039] Preferably, the vector is a plasmid vector, a viral vector, a ribonucleoprotein (RNP), or a lipid nanoparticle (LNP).
[0040] More preferably, the viral vector is a recombinant adeno-associated virus (rAAV) vector or a recombinant lentivirus vector.
[0041] The present invention also provides a CRISPR-Cas system, comprising:
[0042] (a) the Cas protein described above, the fusion protein described above, or the polynucleotide described above; and
[0043] (b) A guide RNA or a polynucleotide containing a sequence encoding the guide RNA, the guide RNA comprising a guide sequence and a scaffold sequence located at the 3' end of the guide sequence, wherein the scaffold sequence is capable of forming a complex with the Cas protein or the fusion protein described above, and the guide sequence is capable of hybridizing with a target sequence in the target DNA, thereby guiding the complex to the target DNA.
[0044] Preferably, the scaffold sequence is selected from any one of the following (I) to (III):
[0045] (I) Any of the sequences shown in SEQ ID NO. 2-7; or
[0046] (II) A sequence having essentially the same secondary structure as any of the scaffold sequences shown in (I); or
[0047] (III) A sequence having at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100% homology with any of the scaffold sequences shown in (I).
[0048] For DNA targeting, Cas9d (SEQ ID NO. 1) requires a specific NGG protospacer adjacent motif (PAM) at the 3' end of the target DNA sequence for guide RNA (gRNA)-directed binding. The cognate wild-type chimeric gRNA scaffold sequence of Cas9d (135 nt; SEQ ID NO. 2) consists of a CRISPR RNA (crRNA), a trans-activating crRNA (tracrRNA), and a GAAA linker. The folded gRNA acts as a structural scaffold, interacting with a small, flexible REC domain to form a functional recognition module for DNA targeting. These features make Cas9d well-suited for developing genome modification tools. Unfortunately, although Cas9d exhibits staggered dsDNA cleavage activity and lacks collateral ssDNA cleavage activity in vitro and in E. coli cells, it has not shown nuclease activity in mammalian cells.
[0049] In order to address these limitations, this invention uses the predicted structure and cryo-electron microscopy structure of the Cas9d-gRNA-dsDNA ternary complex as guidance to comprehensively optimize the Cas9d system by modifying the scaffold sequence of Cas9d gRNA (SEQ ID NO. 2) and key protein residues of Cas9d (SEQ ID NO. 1) to enhance its nuclease activity.
[0050] This invention removes redundant bases from the scaffold sequence of gRNA (SEQ ID NO. 2), reducing its size from 135 nucleotides to 110 nucleotides. Furthermore, the applicant enhances the thermal stability of the gRNA by strengthening the hydrogen bonds in the stem-loop structure and pseudo-knot region to stabilize its three-dimensional structure. This results in several scaffold sequence variants (SEQ ID NO. 3~7).
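As a quick arithmetic check on the lengths stated above (only the 135 nt and 110 nt values come from the text), the relative reduction in scaffold length is:

```latex
\[
\frac{135 - 110}{135} = \frac{25}{135} \approx 0.185
\]
```

That is, the truncation removes 25 nt, or roughly 18.5% of the wild-type scaffold length.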
[0051] Preferably, the length of the guide sequence is 17~24 nt, more preferably 19~24 nt, more preferably 20~23 nt, and most preferably 20 nt.
[0052] In some embodiments, a PAM sequence is present at the 3' end of the protospacer, on the strand of the target DNA that is complementary to the target sequence, wherein the PAM sequence is NGG, where N represents A, T, G, or C.
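The targeting rule described here, a protospacer immediately followed on its 3' side by an NGG PAM, can be sketched as a simple scan over both strands of a DNA sequence. This is an illustrative sketch rather than the applicant's design pipeline: the function names are hypothetical, the default protospacer length of 20 nt follows the most preferred guide length stated above, and no off-target or efficiency scoring is attempted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_ngg_targets(dna: str, spacer_len: int = 20) -> list[tuple[str, int, str, str]]:
    """Return (strand, start, protospacer, pam) for every protospacer followed by NGG.

    'start' is the 0-based index within the scanned strand, read 5'->3'; for the
    '-' strand it therefore refers to the reverse-complemented sequence.
    """
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        # The last valid start leaves room for the protospacer plus a 3-nt PAM.
        for start in range(len(seq) - spacer_len - 2):
            pam = seq[start + spacer_len : start + spacer_len + 3]
            if pam[0] in "ACGT" and pam[1:] == "GG":
                hits.append((strand, start, seq[start : start + spacer_len], pam))
    return hits


# Toy input: a 20-nt poly-A protospacer followed by a TGG PAM.
example = "A" * 20 + "TGG" + "A" * 5
for hit in find_ngg_targets(example):
    print(hit)
```

Changing `spacer_len` to any value in the 17~24 nt range stated above lets the same scan list candidate sites for other guide lengths.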
[0053] More preferably, in the CRISPR-Cas system:
[0054] (a) The sequence of the Cas protein is as shown in any of SEQ ID NO. 8, 9, 18 or 19; and
[0055] (b) The scaffold sequence is as shown in SEQ ID NO. 6.
[0056] The present invention also provides a method for modifying target DNA, comprising contacting the target DNA with the aforementioned CRISPR-Cas system, wherein the guide sequence in the CRISPR-Cas system is capable of hybridizing with a target sequence in the target DNA, thereby modifying the target DNA by the CRISPR-Cas system.
[0057] Preferably, the modification is performed in vitro.
[0058] Preferably, the modification refers to cleavage or base editing of the target DNA.
[0059] The present invention also provides a cell or its progeny comprising the above-described Cas protein, fusion protein, polynucleotide, vector, CRISPR-Cas system or target DNA modified by the above-described methods.
[0060] The cells or their progeny are T cells or NK cells.
[0061] The present invention also provides the use of the above-mentioned Cas protein, fusion protein, polynucleotide, vector or CRISPR-Cas system in the preparation of drugs or reagents for modifying target DNA.
[0062] Preferably, the modification refers to cleavage or base editing of the target DNA.
[0063] Compared with the prior art, the beneficial effects of the present invention are as follows:
[0064] While DNA base editors hold great promise for a wide range of genome editing applications, their practicality is limited by two factors: the use of short spacer sequences in miniature systems exacerbates off-target effects, and the large size of Cas9-derived editors complicates adeno-associated virus (AAV) delivery. Guided by structural insights into the compact nuclease MG34-1 Cas9d from the phylum Deltaproteobacteria, this invention develops a highly efficient Cas9d system (Cas9d Ultra) by engineering its RNA scaffold and protein, and further develops base editors (Ultra-BE). This invention demonstrates that Cas9d Ultra and the Ultra-BE editors (including Ultra-CBE) achieve efficient and precise genome cleavage and base conversion, respectively, in human cells. The optimized gRNA scaffold was shortened by approximately 20% while exhibiting enhanced cleavage activity. Importantly, Ultra-CBE induced premature stop codons in 89% of mouse pups via microinjection, enabling robust disease model establishment. Overall, this invention establishes a series of compact and efficient genome editing tools, which hold promise for advancing biological and biomedical research.
[0065] In this invention, the cleavage efficiency of Cas9d (SEQ ID NO. 1) in HEK293T cells was greatly improved through engineering of the gRNA scaffold sequence or the nuclease amino acid sequence, for example in Cas9d Plus (SEQ ID NO. 8). Furthermore, functional nickase variants were developed, including the Cas9d Ultra nickase (SEQ ID NO. 9), which was fused to deaminases to develop highly efficient and precise compact base editors, including Ultra-ABE (SEQ ID NO. 11) and Ultra-CBE (SEQ ID NO. 14). The resulting base editors are comparable in base editing efficiency to the widely used SpG-BE but are significantly smaller, which is highly advantageous for AAV delivery, where cargo size is a constraint.
[0066] Furthermore, the present invention also observed that, within the Cas9d Ultra system, the improvement in Cas9d activity resulting from protein engineering was greater than that resulting from gRNA truncation. This may be because the compact structure of Cas9d requires a longer gRNA scaffold sequence to compensate for its inherent functional and structural limitations. In addition, this invention confirms that Cas9d activity requires a spacer sequence of at least 17 nucleotides and finds that a spacer length of 20 nt yields optimal nuclease performance.
[0067] In summary, through structure-guided design combined with gRNA scaffold optimization and protein engineering, this invention successfully constructed compact, efficient, and highly specific Cas9d variants and their derived base editors. These tools demonstrated excellent performance in mammalian cells and mouse embryos, showcasing their potential as a multifunctional platform for basic research and therapeutic applications.
Attached Figure Description
[0068] Figure 1 illustrates the effect of engineered gRNA scaffold sequences on Cas9d cleavage activity. Panel a shows the evaluation of Cas9d-mediated DNA cleavage in HEK293T cells using the GFxxFP reporter system, which can be activated by DNA double-strand breaks (DSBs). Panel b shows the secondary structure of the Cas9d gRNA scaffold sequence (SEQ ID NO. 2) predicted using RNAfold and AlphaFold3, where P1-P4 represent different stem-loop sequences, and dashed boxes indicate regions that can be truncated through modification. Panel c shows the editing efficiency statistics of gRNA variants with scaffold sequence truncation, substitution, or mutation. Dashed lines represent the editing level when using the wild-type gRNA scaffold sequence (SEQ ID NO. 2), M indicates the substitution of U with C at the G·U mismatch site, and R indicates the substitution of A:U and U:A with G:C base pairs. Panel d shows the optimized scaffold sequences gRNA-v2 / v3 / v4 (sequences shown in SEQ ID NO. 5-7) generated by different combinations of beneficial mutations. Data are expressed as mean ± standard deviation (sd), n = 3 independent biological replicates.
[0069] Figure 2 illustrates the effect of engineering the Cas9d protein on its activity in mammalian cells. Panel a shows the effect of combinations of beneficial mutations in the RuvC domain on cleavage activity, with the Cas9d-v1 variant (V281R / G492R / M544R) showing the highest efficiency among the variants. Panel b shows the effect of single arginine substitutions within the WED and PI domains on the efficiency of Cas9d. Panel c shows the cleavage activity of combined mutations in the RuvC, WED, and PI domains on the GFxxFP reporter. Panel d shows the cleavage efficiency of different Cas9d-gRNA combinations; an unpaired two-tailed Student's t-test was used to compare the means, ****P < 0.0001. Panel e shows the cleavage efficiency of the Cas9d and Cas9d Ultra systems fused with different nuclear localization signals (NLS), tested at eight endogenous genomic sites, where bpNLS represents bipartite NLS and npNLS represents nucleoplasmin NLS. Panel f shows the statistical analysis of the cleavage efficiency data for the Cas9d and Cas9d Ultra systems in e. Data are expressed as mean ± standard deviation (sd). One-way ANOVA was used to assess statistical significance, followed by Dunnett's T3 post-hoc test for multiple comparisons; **P < 0.01, ***P < 0.001, ns, no significant difference. In Figure 2 a~e, n = 3 independent biological replicates.
[0070] Figure 3 shows the test results of genome cleavage in human cells using the Cas9d Ultra system. Panel a illustrates the design of spacer sequences of different lengths targeting the target site. Panels b and c show the cleavage efficiency of gRNAs with different spacer sequence lengths at two endogenous sites, VEGFA-S1 and CCR5-S1; Unt indicates untreated. Panels d and e show the cleavage efficiency of Cas9d Ultra and SpG Cas9 at 14 endogenous sites, as efficiency at individual sites (Figure 3 d) and as a statistical comparison (Figure 3 e); the statistical analysis in Figure 3 e employed an unpaired two-tailed Student's t-test, where ns indicates no significance, and n = 14 loci. Panel f shows the indel spectra of Cas9d Ultra and SpG Cas9 at the BCL11A-S3 site in HEK293T cells. Panel g shows the assessment of the mismatch tolerance of the Cas9d Ultra system within the spacer sequence at the endogenous EMX1-S1 site; lowercase magenta letters indicate mismatched bases in the spacer sequence. Panel h shows the total number of putative off-target sites detected by GUIDE-seq for the Cas9d Ultra system and the SpG Cas9 system at the same target sites. The data in Figure 3 b~e and g are expressed as mean ± standard deviation (sd). In Figure 3 e, each point represents the average editing of n = 3 independent biological replicates. In Figure 3 b~d and g, n = 3 independent biological replicates.
[0071] Figure 4 presents the test data for base editing by Ultra-BE in human cells. Panels a and c are line graphs showing the overall editing window of ABE and CBE within the target and the average base editing efficiency at each position. Data are expressed as mean ± sem, where the values represent the average editing efficiency at each A / C position within the target, derived from three independent biological replicates of 12 endogenous sites (ABE group) and 17 endogenous sites (CBE group). Panels b and d are summary histograms of the base editing efficiency of ABE and CBE at endogenous sites, with data derived from 12 endogenous sites (ABE group) and 17 endogenous sites (CBE group), respectively, expressed as mean ± standard deviation (sd). Each point represents the average optimal base editing efficiency of three independent biological replicates at each given genomic site. Panels e and f assess the gRNA-dependent off-target effects of Ultra-ABE and SpG-ABE when targeting the CCR5-S1 locus; data are presented as mean ± standard deviation (sd). Panels g and h are statistical plots of the gRNA-independent off-target editing levels of ABE and CBE at three endogenous sites and multiple 5R-loop regions; data are presented as mean ± standard deviation (sd), and each point represents the mean optimal base editing efficiency at each endogenous site for three independent biological replicates. To compare the means in Figure 4 b, d, g, and h, an unpaired two-tailed Student's t-test was used, and ns indicates no significance.
[0072] Figure 5 shows the efficient generation of disease models by embryo microinjection using Cas9d Ultra and Ultra-CBE. Panel a is a schematic diagram of the mouse Tyr locus and the experimental design, in which six gRNAs target exon 1 to induce premature termination codons or disrupt splicing, with target bases and PAM highlighted in cyan and orange, respectively. Panel b shows the efficiency of Ultra-CBE-mediated targeted C-to-T editing in N2a cells (n = 3 independent biological replicates). Panel c shows images of blastocysts derived from mouse zygotes microinjected with Tyr-targeting Cas9d Ultra-sg3 or Ultra-CBE-sg3; scale bar, 100 μm. Panel d is a statistical graph of the blastocyst development rate in c. Panel e is a statistical graph of embryonic development and F0 mouse yield. Panel f shows representative fur phenotypes of the F0 mice generated after editing Tyr with Cas9d Ultra (n = 47 mice) and Ultra-CBE (n = 56 mice). Panels g and h show the Tyr gene editing efficiency in F0 mice produced with Cas9d Ultra (n = 47 mice) and Ultra-CBE (n = 56 mice). The data in Figure 5 b are expressed as mean ± standard deviation (sd). The data in Figure 5 g and h are expressed as mean ± sem.
[0073] Figure 6 shows the phylogenetic tree and protein size distribution of Cas9d. Panel a is a phylogenetic tree of various Cas9, Cas9d, and IscB proteins, with representative Cas9, Cas9d, and IscB proteins highlighted with dots. Panel b shows the size distribution of the proteins shown in a. Cas9d proteins are significantly smaller than other subtypes (average size approximately 750 aa). Dots represent the size of individual proteins, and data are expressed as mean ± standard deviation (sd).
[0074] Figure 7 shows the cleavage activity of the wild-type Cas9d system (Cas9d protein sequence SEQ ID NO. 1, gRNA scaffold sequence SEQ ID NO. 2) assessed using the GFxxFP fluorescent reporter system. Panel a shows wide-field bright-field and fluorescence images of HEK293T cells 48 hours after co-transfection with the Cas9d or SpG Cas9 system and the GFxxFP reporter plasmid. T: targeting guide sequence, as used in previous studies; NT: non-targeting guide sequence; scale bar, 100 μm. Panel b shows the editing efficiency quantified by FACS. Data are expressed as mean ± standard deviation (sd), n = 3 independent biological replicates.
[0075] Figure 8 evaluates the impact of structure-guided gRNA optimization on the activity of Cas9d, where a is the three-dimensional structure of the Cas9d-gRNA-dsDNA ternary complex predicted by AlphaFold3; b is a detailed view of the gRNA-dsDNA interaction region within the complex; c shows the editing efficiency of Cas9d (SEQ ID NO. 1) used in combination with truncated / mutated gRNA scaffold sequences compared to the wild-type scaffold sequence (SEQ ID NO. 2); and d evaluates the impact of rationally designed combinations of gRNA-v0 (SEQ ID NO. 3) with different mutations on the cleavage efficiency of Cas9d (SEQ ID NO. 1). Data are expressed as mean ± standard deviation (sd), n = 3 independent biological replicates.
[0076] Figure 9 shows the impact of protein engineering on the activity of Cas9d (SEQ ID NO. 1). Panel a illustrates the dual-plasmid system used for Cas9d variant screening. Panel b shows the effect of arginine substitutions in the three RuvC nuclease domains on Cas9d activity, where NT represents the non-targeting guide sequence, WT represents the wild-type Cas9d system, and each point represents the average editing efficiency of n = 3 independent biological replicates for a given variant; data are expressed as mean ± standard deviation. Panel c shows the evaluation of Cas9d activity based on combinations of beneficial mutations, with data expressed as mean ± standard deviation, n = 3 independent biological replicates. Panel d shows WebLogo plots of the PAMs recognized by wild-type Cas9d (SEQ ID NO. 1) and the variant Cas9d Plus (SEQ ID NO. 8), characterized by an E. coli plasmid depletion experiment.
[0077] Figure 10 shows the results of PAM preference identification, comprising a schematic diagram of the two-plasmid system used in the Cas9d PAM depletion experiment and scatter plots of the depletion data distribution for wild-type Cas9d and Cas9d Plus. In the scatter plots, the diagonal in the coordinate system represents a ratio of 1 between the normalized PAM abundance of the experimental group and that of the control group, σ represents the standard deviation (STD), pink dots represent PAMs with a ratio ≥ -3σ, and blue dots represent PAMs with a ratio < -3σ.
[0078] Figure 11 For Cas9d Ultra Evaluation plot of the system's cleavage characteristics at endogenous sites, where a represents Cas9d sequences with different length intervals. Ultra In HEK293T cells PCSK9 - Histogram of cutting efficiency at S7 site; b is Cas9 Ultra and SpG-Cas9 in EMX1 -S4 and HBB -Indel pattern generated at site S1; c represents Cas9d Ultra exist HBB - Mismatch tolerance assessment map within the spacer sequence at the S1 site Figure 11 The data in a and c are expressed as mean ± standard deviation (sd), and n = 3 independent biological replicates.
[0079] Figure 12 shows the assessment of the genome-wide specificity of the Cas9d Ultra system, where a and b show the off-target sites of Cas9d Ultra identified by GUIDE-seq at PCSK9-S7 (Figure 12 a; 12 off-target sites) and BCL11A-S2 (Figure 12 b; 20 off-target sites); and c and d show the off-target sites of SpG Cas9 identified by GUIDE-seq at PCSK9-S7 (Figure 12 c; 52 off-target sites) and BCL11A-S2 (Figure 12 d). The off-target sites are represented by dots and lines, while the target sites are represented by dots. Mismatched bases in the off-target sequences are highlighted in color. Sites with ≥ 2 reads and ≤ 8 mismatches are included.
[0080] Figure 13 shows the evaluation of the nickase system and the base editors derived from the Cas9d Ultra system, where a shows the nickase activity of Cas9d Ultra variants (D10A and / or H375A) evaluated using a dual-target reporter system, with SpG Cas9 variants as positive controls; ST, single target; FDT, flanking dual targets; data are expressed as mean ± standard deviation (sd), n = 3 independent biological replicates; and b is a schematic diagram of the Ultra-ABE and Ultra-CBE constructs; NLS, nuclear localization signal.
[0081] Figure 14 shows the A:T to G:C base conversion mediated by ABE, where a~c show the base editing efficiency of Ultra-ABE and SpG-ABE at 12 endogenous sites in HEK293T cells. The values and error bars show the mean and standard deviation, n = 3 independent biological replicates.
[0082] Figure 15 shows the C:G to T:A base conversion mediated by CBE, where a~c show the base editing efficiency of Ultra-CBE and SpG-CBE at 17 endogenous sites in HEK293T cells. All values are mean ± standard deviation, n = 3 independent biological replicates.
[0083] Figure 16 shows the assessment of the insertion and deletion (indel) frequencies of the base editors, where a and b are a heatmap and a statistical analysis plot of the average indel frequency generated by ABE at 12 human target sites, n = 3 independent biological replicates; and c and d are a heatmap and a statistical analysis plot of the average indel frequency generated by CBE at 17 human genomic sites. The values in Figure 16 a and c represent the average of n = 3 independent replicates. The values in Figure 16 b and d are expressed as mean ± standard deviation, and each point represents the mean indel frequency of 3 independent samples.
[0084] Figure 17 shows the detection of gRNA-dependent off-target effects of the ABE system, where a and b show the off-target effects of Ultra-ABE and SpG-ABE at Cas-OFFinder-predicted sites in HEK293T cells, evaluated by targeted deep sequencing. The values and error bars reflect the mean ± standard deviation, n = 3 independent biological replicates.
[0085] Figure 18 shows the detection of gRNA-dependent off-target effects of the CBE system, where a and b show the off-target effects of Ultra-CBE and SpG-CBE at Cas-OFFinder-predicted sites in HEK293T cells, evaluated by targeted deep sequencing. The data are expressed as mean ± standard deviation (sd), n = 3 independent biological replicates.
[0086] Figure 19 shows the orthogonal R-loop assay for gRNA-independent off-target editing by ABE and CBE. Deep sequencing was used to analyze gRNA-independent editing at genomic sites in five orthogonal R-loop regions (each corresponding to three target sites) in HEK293T cells. In the figure, a and b show that similar editing levels were observed between the Ultra-BE and SpG-BE systems for ABE and CBE. Data are expressed as mean ± standard deviation, n = 3 independent biological replicates.
[0087] Figure 20 shows the efficient editing of the Tyr locus in mouse embryos using Cas9d Ultra and Ultra-CBE, where a shows the efficiency of frameshift mutations (non-3n indels) generated by Cas9d Ultra at six Tyr target sites in N2a cells, n = 3 independent biological replicates; b is a workflow diagram for generating Tyr-modified mouse models by microinjecting Cas9d Ultra-sg3 RNA or Ultra-CBE-sg3 RNA into fertilized eggs; c shows the developmental statistics of blastocysts after microinjection of Cas9d Ultra-sg3 or Ultra-CBE-sg3 targeting Tyr; d and e show the editing efficiency in mouse embryos of Cas9d Ultra-sg3 (Figure 20 d; n = 18 embryos) and Ultra-CBE-sg3 (Figure 20 e; n = 23 embryos); and f shows representative Sanger sequencing chromatograms of the premature stop codons (TAA) robustly induced by Ultra-CBE at the Tyr locus. The data in Figure 20 a, d, and e are expressed as mean ± standard deviation.
[0088] Figure 21 shows the genotyping of representative F0 mice generated using Cas9d Ultra and Ultra-CBE, where a shows the sequencing results of three representative Cas9d Ultra-sg3 F0 pups, with dashed lines representing deletions (mainly frameshift mutations); and b shows the genotypes of three Ultra-CBE-sg3 F0 mice, with the target cytosines indicated by red arrows.
[0089] Figure 22 shows the in vivo specificity assessment of Cas9d Ultra and Ultra-CBE. Targeted deep sequencing was used to analyze the gRNA-dependent off-target effects of Cas9d Ultra (Figure 22 a) and Ultra-CBE (Figure 22 b) at Cas-OFFinder-predicted sites. Data are expressed as mean ± standard deviation (n = 3 mice per group).
[0090] Figure 23 shows an exemplary target DNA and a system comprising (1) an exemplary guide nucleic acid comprising a guide sequence and a scaffold sequence, and (2) an exemplary napDNAbp (in this invention, a programmable RNA-guided DNA endonuclease or a mutant thereof, such as Cas9d).
Detailed Implementation
[0091] The present invention will be further described in detail below with reference to the embodiments, but the implementation of the present invention is not limited to the following embodiments.
[0092] All raw materials used in this invention are commercially available.
[0093] Similar to the programmable RNA-guided DNA endonucleases Cas9, Cas12, and IscB, the Cas9d and its variants of the present invention are capable of binding to target DNA (e.g., dsDNA), as directed by a guide nucleic acid (e.g., guide RNA) containing a guide sequence targeting that DNA. The Cas9d or its variants of the present invention can be associated with a guide nucleic acid that positions / targets the Cas9d or its variants to target DNA containing a DNA strand (i.e., the target strand) that is reverse complementary to the guide nucleic acid or a portion thereof (e.g., the guide sequence of the guide RNA). In other words, the guide nucleic acid is "programmed" to position and bind the Cas9d or its variants of the present invention to the target DNA. The binding of the Cas9d or its variants of the present invention to the target DNA enables the Cas9d or its variants, or constructs containing it, to access and act on the target DNA.
[0094] For this purpose, the guide nucleic acid comprises a scaffold sequence that is responsible for (capable of) forming a complex with the Cas9d or a variant thereof of the present invention, and a guide sequence that is intentionally designed to (capable of) hybridize with a target sequence of the target DNA, thereby guiding the complex comprising the Cas9d or a variant thereof of the present invention and the guide nucleic acid to the target DNA, such that the Cas9d or a variant thereof of the present invention binds indirectly to the target DNA. The ability of the Cas9d or a variant thereof of the present invention to bind to the target DNA through such a guide nucleic acid makes it a nucleic acid-programmable DNA-binding protein (napDNAbp) or a nucleic acid-programmable DNA-binding domain (napDNAbd) similar to Cas9, Cas12, and IscB.
[0095] Referring to Figure 23, as an example, the dsDNA is depicted as comprising a 5' to 3' single DNA strand and a 3' to 5' single DNA strand, the 5' to 3' single DNA strand comprising an exemplary first deoxynucleotide dA, and the 3' to 5' single DNA strand comprising an exemplary second deoxynucleotide dT that base-pairs with the dA.
[0096] For example, a guide nucleic acid is described as comprising a guide sequence and a scaffold sequence. The guide sequence is designed according to base pairing principles to hybridize with a portion of a 3' to 5' single DNA strand, and thus the guide sequence "targets" that portion. Therefore, the 3' to 5' single DNA strand is called the "target strand (TS)" of the dsDNA, while the opposite 5' to 3' single DNA strand is called the "non-target strand (NTS)" of the dsDNA. The portion of the target strand that serves as the basis for the design of the guide sequence and is capable of hybridizing with the guide sequence is called the "target sequence," while the opposite portion on the non-target strand is called the "protospacer," which is typically 100% (perfectly) reverse complementary to the target sequence unless there is an intentional or unintentional mismatch.
[0097] Typically, as is customary in the art, unless otherwise explicitly instructed, nucleic acid sequences (e.g., DNA, RNA) are written in a 5' to 3' orientation.
[0098] For example, the DNA sequence for ATGC is generally understood as 5'-ATGC-3' unless otherwise specified. Its reverse sequence is 5'-CGTA-3'. Its complete complement is 5'-TACG-3'. Its complete reverse complement is 5'-GCAT-3' (3'-TACG-5'). Note that complete complement sequences generally do not have the ability to pair / hybridize with the original sequence.
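As a non-limiting illustration of the sequence conventions in the preceding paragraph, the following Python sketch derives the reverse, the complement, and the reverse complement of 5'-ATGC-3'. The function names are illustrative only and do not form part of the invention.

```python
# Translation table mapping each unambiguous DNA base to its pairing partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse(seq: str) -> str:
    """Read the same strand backwards."""
    return seq[::-1]


def complement(seq: str) -> str:
    """Swap each base for its partner without changing the reading order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, written 5' to 3'."""
    return complement(seq)[::-1]


seq = "ATGC"
print(reverse(seq))             # CGTA
print(complement(seq))          # TACG
print(reverse_complement(seq))  # GCAT
```

Only the reverse complement corresponds to a strand that can pair with the original sequence in the antiparallel orientation.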
[0099] Typically, unless otherwise indicated, the double-stranded sequence of dsDNA can be represented by the sequence of its 5' to 3' single DNA strand, written conventionally in the 5' to 3' direction / orientation.
[0100] For example, a dsDNA having the 5' to 3' single DNA strand and the 3' to 5' single DNA strand shown below can be simply represented as 5'-ATGC-3'.
[0101] 5' ----- ATGC ----- 3'
[0102] 3' ----- TACG ----- 5'
[0103] It should be noted that either the 5' to 3' single DNA strand or the 3' to 5' single DNA strand of the dsDNA can be the non-target strand from which the protospacer sequence is selected.
[0104] In a base editing scenario, the strand containing the target nucleotide to be edited (e.g., the deoxynucleotide dA in Figure 23) is called the edited strand, and the opposite strand is called the non-edited strand. As used herein, the non-target strand is the edited strand, and the target strand is the non-edited strand.
[0105] Typically, for genes, the 5' to 3' single DNA strand is the sense strand, and the 3' to 5' single DNA strand is the antisense strand. Either the sense strand or the antisense strand can be the non-target strand from which the protospacer sequence is selected.
[0106] In order to hybridize with a dsDNA, such as the dsDNA 5'-ATGC-3', the guide sequence of the gRNA can be designed in one embodiment to have the sequence 5'-AUGC-3', which is completely reverse complementary to the 3' to 5' strand (3'-TACG-5') of the dsDNA.
[0107] When the guide sequence of the gRNA is completely reverse complementary to the target sequence of the target DNA, and the target sequence of the target DNA is completely reverse complementary to the protospacer sequence of the target DNA, the guide sequence of the gRNA is identical to the protospacer sequence of the target DNA, except that the guide sequence contains U (because it is RNA) wherever the protospacer sequence contains T (because it is DNA).
[0108] As used in this invention, if a DNA sequence (e.g., 5'-ATGC-3') is transcribed into an RNA sequence in which each dT (deoxythymidine, or simply "T") in the primary sequence of the DNA sequence is replaced with U (uridine), and each dA (deoxyadenosine, or simply "A"), dG (deoxyguanosine, or simply "G") and dC (deoxycytidine, or simply "C") is replaced with A (adenosine), G (guanosine), and C (cytidine), respectively, to produce 5'-AUGC-3', then in this disclosure the DNA sequence is referred to as "encoding" the RNA sequence.
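The relationship between the protospacer, the target sequence, and the guide sequence can be sketched in Python as follows. This is a minimal, non-limiting illustration using the 5'-ATGC-3' example above; the helper names are hypothetical, and only Watson-Crick pairing is modelled.

```python
# RNA base -> DNA base it pairs with (Watson-Crick pairing only).
RNA_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_complement_dna(seq: str) -> str:
    """Opposite DNA strand, written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def encode_rna(dna: str) -> str:
    """RNA sequence 'encoded' by a DNA sequence: every T becomes U."""
    return dna.replace("T", "U")


def hybridizes(guide_5to3: str, target_strand_5to3: str) -> bool:
    """True if the guide pairs base by base with the target strand
    when the two are aligned in antiparallel orientation."""
    target_3to5 = target_strand_5to3[::-1]
    return len(guide_5to3) == len(target_3to5) and all(
        RNA_DNA_PAIR[g] == t for g, t in zip(guide_5to3, target_3to5)
    )


protospacer = "ATGC"                                   # non-target strand, 5'->3'
target_sequence = reverse_complement_dna(protospacer)  # target strand, 5'->3'
guide = encode_rna(protospacer)                        # same letters, U for T

print(target_sequence)                    # GCAT
print(guide)                              # AUGC
print(hybridizes(guide, target_sequence)) # True
```

The guide therefore reads like the protospacer (with U in place of T) and pairs with the target sequence on the opposite strand.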
[0109] As used in this invention, "nucleic acid" and "polynucleotide" can be used interchangeably; "protein" and "polypeptide" can be used interchangeably; "programmable nucleic acid", "guide nucleic acid" and "programmable guide nucleic acid" can be used interchangeably; "guide RNA", "gRNA" and "sgRNA" can be used interchangeably; "guide sequence" and "spacer sequence" can be used interchangeably.
[0110] As described herein, the guide sequence is designed to hybridize with the target sequence. As used herein, the term "hybridize (hybridize, hybridizing, or hybridization)" refers to a reaction in which one or more polynucleotide sequences react to form a complex that is stabilized via hydrogen bonds between the bases of the polynucleotide sequences. Hydrogen bonding can occur through Watson-Crick base pairing, Hoogsteen binding, or any other sequence-specific manner. A polynucleotide sequence capable of hybridizing with a given polynucleotide sequence is referred to as a "complement" of that given polynucleotide sequence. As used herein, the hybridization of the guide sequence with the target sequence is sufficiently stable to allow an effector polypeptide (e.g., Cas9d) complexed with the nucleic acid containing the guide sequence, or a functional domain associated with it (e.g., by fusion), to act on (e.g., cleave or deaminate) the target sequence, its complement, or a nearby sequence.
[0111] As used in this invention, the term "and / or" in phrases such as "A and / or B" is intended to indicate one or both of the options, including both A and B, A or B, A (alone), and B (alone). Similarly, the term "and / or" in phrases such as "A, B, and / or C" is intended to cover each of the following embodiments: A, B, and C; A, B, or C; A or C; A or B; B or C; A and C; A and B; B and C; A (alone); B (alone); and C (alone).
[0112] As used in this invention, a numerical range includes the endpoints of the range as well as each specific value within the range. For example, "16 to 100 nucleotides" includes 16 nucleotides and 100 nucleotides as well as each specific value between 16 and 100, such as 17, 23, 34, 52, and 78.
[0113] It should be understood that the embodiments described in this disclosure include "consisting of" embodiments and / or "consisting essentially of" embodiments.
[0114] Experimental methods
[0115] I. Plasmid Cloning
[0116] All PCR amplifications were performed using Phanta Max Super-Fidelity DNA Polymerase (Vazyme). PCR products were separated by agarose gel electrophoresis and purified using a gel extraction kit (OMEGA Bio-tek). The human codon-optimized Cas9d gene and its corresponding gRNA scaffold were synthesized by Tsingke Biotech. Plasmids pQH440, pTH03, SpGCas9, and their derived base editors (BEs) were provided by Yang Hui's laboratory. The Cas9d coding sequence was cloned into the XhoI and EcoRI restriction sites of the CBh promoter-driven mammalian cell expression backbone (pQH440) using the 2xpEASY Basic Seamless Cloning Assembly Kit (TransGen Biotech). After Cas9d assembly, the gRNA scaffold was cloned by XbaI and KpnI digestion. The GFxxFP reporter gene plasmid was modified from pTH03. For spacer sequence clones, complementary oligonucleotides were annealed and ligated into BpiI-digested pU6-gRNA_scaffold-pCBh-Cas9d-bGH_poly A expression cassettes using T4 DNA ligase (Thermo Fisher Scientific). All oligonucleotides were synthesized by Tsingke Biotech and Genewiz. All clones were validated from the promoter to the poly(A) tail using Sanger sequencing (Tsingke Biotech and Genewiz).
[0117] II. Cell Culture and Transfection
[0118] HEK293T and Neuro-2a (N2a) cell lines were provided by Yang Hui's laboratory. Both cell lines were cultured in high-glucose DMEM (Gibco) containing 10% FBS (Gibco), 1% penicillin-streptomycin-glutamine (Gibco), and 1% NEAA (Gibco) in a humidified incubator at 37°C and 5% CO2. To assess the cleavage activity of the Cas9d system and its engineered variants during optimization, HEK293T cells were seeded in poly-D-lysine-coated 48-well plates (Corning) and transfected at approximately 70% confluence. Briefly, transfection was performed using polyethyleneimine (PEI), with a total of 800 ng of plasmid DNA added per well, comprising the Cas9d-associated system and the GFxxFP reporter system at a 1:1 molar ratio. For endogenous gene editing, transfection was performed in 24-well plates: following the manufacturer's instructions, PEI was used to transfect 1.6 μg of the integrated plasmid encoding the editing system per well into cells at approximately 70% confluence.
[0119] To edit the mouse Tyr locus effectively, six single guide RNAs (sgRNAs) were screened in cultured N2a cells. Briefly, N2a cells were seeded in 24-well plates and transfected approximately 12 hours post-seeding at about 70% confluence. A total of 1.6 μg of the all-in-one editor plasmid was transfected per well using Lipofectamine 2000 reagent (Invitrogen) according to the manufacturer's instructions.
[0120] III. FACS Analysis and Cell Sorting
[0121] To quantify the editing efficiency of Cas9d and its related variants on the GFxxFP reporter, flow cytometry was performed on a Beckman CytoFlex S instrument to measure GFP fluorescence intensity in BFP+mCherry+ cells. Briefly, 48 hours after transfection, cells were collected by trypsin digestion and resuspended in medium containing 10% FBS to neutralize the trypsin. Data were acquired using the FSC, SSC, and fluorescence channels (emissions at 450 nm, 525 nm, and 561 nm, respectively), and acquisition was stopped after 20,000 events had been recorded within the single-cell gate. Editing efficiency was then quantified in FlowJo X (v10.0.7) as the percentage of GFP+ cells in the BFP+mCherry+ population. To assess endogenous editing efficiency, cells underwent the same treatment 48 hours post-transfection. Approximately 15,000 cells from the top 35% of the mCherry-positive population were sorted into 500 μL of medium using a BD FACSAria Fusion flow cytometer. To release genomic DNA, the sorted cells were lysed in 25 μL of lysis buffer containing proteinase K (Vazyme) by incubation at 55 °C for 50 min followed by 95 °C for 10 min. The cell lysates were stored at -20 °C for later use.
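The reporter readout described above reduces to a simple ratio, which can be sketched in Python as follows. This is a non-limiting illustration; the event counts are hypothetical placeholders rather than measured data, and the function name is not part of the invention.

```python
from statistics import mean, stdev


def editing_efficiency(gfp_positive: int, gated_total: int) -> float:
    """Percentage of GFP+ events among gated BFP+mCherry+ events."""
    if gated_total <= 0:
        raise ValueError("gated_total must be positive")
    return 100.0 * gfp_positive / gated_total


# Hypothetical counts (GFP+ events, BFP+mCherry+ events) for three replicates.
replicates = [(2050, 10000), (1980, 10000), (2010, 10000)]
efficiencies = [editing_efficiency(g, t) for g, t in replicates]

# Reported in the same form as the figures: mean ± standard deviation, n = 3.
print(f"{mean(efficiencies):.2f} ± {stdev(efficiencies):.2f} %")
```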
[0122] IV. Targeted Deep Sequencing and Analysis
[0123] To assess editing efficiency at endogenous sites, genomic regions of interest were amplified from cell lysates by PCR using 2×Taq mix (Vazyme) and barcoded primers. PCR products were separated by agarose gel electrophoresis, and the target bands were purified using a gel extraction kit (Omega Bio-tek). For library preparation, end repair, adapter ligation, and amplification of the pooled DNA fragments were performed using the VAHTS Universal Pro DNA Library Prep Kit for Illumina (Vazyme, ND608) according to the manufacturer's protocol. Paired-end sequencing of the library was performed on an Illumina NovaSeq 6000 (Genewiz). The deep sequencing data were first demultiplexed by sample barcode using a custom script. CRISPResso2 (v2.3.1) was used to analyze indel frequency and base editing efficiency. The key parameters were as follows: --plot_window_size 20 --window_around_sgRNA 15 --quantification_window_center -4 (for cleavage efficiency analysis), and --plot_window_size 14 --window_around_sgRNA 12 --quantification_window_center -10 (for base editing frequency analysis).
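For illustration only, the two CRISPResso2 parameter sets stated above can be assembled into command lines with a short Python helper. The three window parameters are taken from the text; the input options --fastq_r1, --amplicon_seq, and --guide_seq are standard CRISPResso2 options added here as an assumption, and the file name and sequences are hypothetical placeholders. The sketch only builds and prints the command and does not run CRISPResso2.

```python
import shlex

# Parameter sets quoted in the text.
CLEAVAGE_PARAMS = {
    "plot_window_size": 20,
    "window_around_sgRNA": 15,
    "quantification_window_center": -4,
}
BASE_EDITING_PARAMS = {
    "plot_window_size": 14,
    "window_around_sgRNA": 12,
    "quantification_window_center": -10,
}


def crispresso_command(fastq_r1, amplicon_seq, guide_seq, params):
    """Build an argument list for CRISPResso2 (inputs are placeholders)."""
    cmd = [
        "CRISPResso",
        "--fastq_r1", fastq_r1,
        "--amplicon_seq", amplicon_seq,
        "--guide_seq", guide_seq,
    ]
    for flag, value in params.items():
        cmd += [f"--{flag}", str(value)]
    return cmd


# Hypothetical file name, amplicon, and guide; replace with real inputs.
cmd = crispresso_command("sample_R1.fastq.gz", "ACGTACGTACGT", "ACGTACGT",
                         CLEAVAGE_PARAMS)
print(shlex.join(cmd))
```

The same helper can be called with BASE_EDITING_PARAMS for the base editing analysis; the resulting list could be passed to subprocess.run where CRISPResso2 is installed.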
[0124] V. Guide RNA-Dependent Off-Target Detection
[0125] To assess the specificity of the base editors, Cas-OFFinder (v2.4) was used to identify potential gRNA-dependent off-target sites in the human reference genome (hg38). The search parameters allowed ≤ 5 mismatches within the target region and required canonical PAM sequences: NGG for Cas9d-derived BEs and NGN for SpG Cas9-derived BEs. For ABEs, off-target effects were analyzed at the potential off-target sites associated with three on-target sites (CCR5-S1, EMX1-S1, and TTR-S1). Similarly, for CBEs, the potential off-target sites associated with three on-target sites (PCSK9-S3, TTR-S1, and VEGFA-S4) were analyzed. The top 10 potential off-target sites for each on-target site were subsequently validated by PCR amplification and high-throughput sequencing (HTS). HTS data were analyzed using CRISPResso2 (v2.3.1) with the same parameters used for the on-target BE analysis.
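The search criteria stated above (a PAM pattern plus a mismatch ceiling) can be expressed as a small Python predicate. This is a simplified sketch of the criteria only, not Cas-OFFinder's implementation: it ignores bulges, treats N as the only wildcard, and uses a hypothetical 20-nt spacer.

```python
def pam_matches(pam: str, pattern: str) -> bool:
    """True if pam fits the pattern, where N in the pattern matches any base."""
    return len(pam) == len(pattern) and all(
        p == "N" or p == b for b, p in zip(pam, pattern)
    )


def count_mismatches(spacer: str, site: str) -> int:
    """Number of positions at which the genomic site differs from the spacer."""
    if len(spacer) != len(site):
        raise ValueError("spacer and site must have the same length")
    return sum(a != b for a, b in zip(spacer, site))


def is_candidate_off_target(spacer, site, pam, pam_pattern, max_mismatches=5):
    """Apply both criteria: matching PAM and at most max_mismatches."""
    return (pam_matches(pam, pam_pattern)
            and count_mismatches(spacer, site) <= max_mismatches)


spacer = "GACGTTACCGGATCAAGCTT"  # hypothetical spacer
site = "GACGTTACCGGATCAAGCAA"    # hypothetical genomic site, 2 mismatches

print(is_candidate_off_target(spacer, site, "TGG", "NGG"))  # True
print(is_candidate_off_target(spacer, site, "TGA", "NGG"))  # False
print(is_candidate_off_target(spacer, site, "TGA", "NGN"))  # True
```

The last two calls show why the NGN pattern used for SpG Cas9-derived editors admits more candidate sites than the NGG pattern used for Cas9d-derived editors.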
[0126] VI. Orthogonal R-loop Experiment
[0127] To evaluate the Cas-independent DNA off-target editing of ABE and CBE, an orthogonal R-loop assay was used to assess the ability of the BEs to mutate single-stranded DNA within off-target R-loop regions generated by an orthogonal, catalytically inactivated Staphylococcus aureus Cas9 (dSaCas9). Specifically, HEK293T cells were seeded in 24-well plates. At ~70% confluence, a total of 1.6 μg of plasmid DNA (including both the integrated BE and dSaCas9 systems) was co-transfected into the cells with 3.2 μL of PEI according to the manufacturer's protocol. As in the on-target assay, ~35% of GFP and mCherry double-positive cells were sorted 48 h after transfection. Subsequently, the target site and five associated off-target SaCas9 R-loop sites were amplified using 2×Taq mix, followed by deep sequencing and further analysis of off-target effects.
[0128] VII. GUIDE-seq Experiment
[0129] Briefly, HEK293T cells were seeded in 10 cm dishes (Corning) before transfection. At 70%–80% confluence, 15 μg of the integrated editing plasmid and 50 pmol of annealed double-stranded oligodeoxynucleotides (dsODN) were co-transfected into the cells using PEI. After 72 h, approximately 1 × 10⁶ cells from the top 30% of the mCherry-positive population were collected by FACS. Genomic DNA was extracted using a genomic DNA purification kit (Sangon Biotech) according to the manufacturer's protocol and sheared to an average size of 500 bp using NEBNext dsDNA Fragmentase (NEB). Libraries were prepared using the VAHTS Universal DNA Library Prep Kit for Illumina V3 (Vazyme) via end repair, dA tailing, and adapter ligation. Fragments containing dsODN integration were enriched by two rounds of nested PCR using anchored primers. Sequencing was performed in paired-end mode on an Illumina NovaSeq 6000 (GentleGen). Data were analyzed using a GUIDE-seq analysis pipeline, specifying an NGG PAM for Cas9d Ultra and an NGN PAM for SpG Cas9, and allowing up to eight mismatches in both cases.
[0130] VIII. PAM Library Generation
[0131] Briefly, a randomized PAM plasmid library was constructed using synthetic oligonucleotides (HuaGene) containing eight random nucleotides downstream of the Cas9d target sequence. The randomized single-stranded DNA (ssDNA) oligonucleotides were annealed to a short primer and converted to double-stranded DNA (dsDNA) by synthesizing the complementary strand with DNA Polymerase I, Large (Klenow) Fragment (New England Biolabs). The resulting dsDNA fragment was purified and then assembled into a HindIII- and EcoRI-linearized pACYC184 vector backbone using NEBuilder HiFi DNA Assembly (New England Biolabs). After purification, the assembly product was electroporated into TransforMax EC100 electrocompetent E. coli according to the manufacturer's protocol. The transformed cells were plated on square (22 cm × 22 cm) LB agar plates supplemented with chloramphenicol and incubated at 37°C. After 13 hours of incubation, colonies were collected by scraping, and plasmid DNA was extracted using the NucleoBond Xtra Midiprep kit (Macherey-Nagel).
[0132] IX. PAM Depletion Assay and Bioinformatics Analysis
[0133] Briefly, 200 ng of PAM library plasmid and 300 ng of the integrated Cas9d system expression plasmid were co-electroporated into TransforMax EC100 electrocompetent E. coli. Transformed cells were recovered in antibiotic-free SOC medium at 37°C for 1 hour to allow expression of the antibiotic resistance genes, plated on 22 cm × 22 cm LB agar plates containing ampicillin and chloramphenicol for selection, and cultured at 37°C for 13 hours. Colonies were then collected, and plasmid DNA was extracted using the same methods as for PAM library preparation. Sequences containing the target and PAM were amplified and subjected to paired-end sequencing (150 bp) on an Illumina NovaSeq 6000 platform (Genewiz). The randomized PAM regions were extracted, counted, and normalized to the total PAM count of each sample. Only PAMs observed more than once were retained, and the log fold change (logFC) of each retained PAM was calculated as the log ratio relative to the non-target control, with a pseudo-count of 1e-9. PAMs with logFC < -3σ (sd) were considered significantly depleted, and these sequences were collected to construct sequence logos using WebLogo (v3.7.12).
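The counting, normalization, and thresholding steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the log base is not given in the text and log2 is assumed; the "observed more than once" filter is assumed to apply to the non-target control counts; σ is taken as the sample standard deviation of the logFC values; and the toy counts use 3-nt strings in place of the 8-nt randomized PAMs.

```python
import itertools
import math
from statistics import stdev

PSEUDOCOUNT = 1e-9  # pseudo-count stated in the text


def normalize(counts):
    """Convert raw PAM counts to frequencies within one sample."""
    total = sum(counts.values())
    return {pam: n / total for pam, n in counts.items()}


def depleted_pams(experiment, control, min_count=2):
    """Return (set of depleted PAMs, logFC per retained PAM)."""
    exp_freq = normalize(experiment)
    ctrl_freq = normalize(control)
    log_fc = {
        pam: math.log2((exp_freq.get(pam, 0.0) + PSEUDOCOUNT)
                       / (ctrl_freq[pam] + PSEUDOCOUNT))
        for pam, n in control.items()
        if n >= min_count  # assumption: keep PAMs seen more than once in control
    }
    sigma = stdev(list(log_fc.values()))
    depleted = {pam for pam, value in log_fc.items() if value < -3 * sigma}
    return depleted, log_fc


# Toy data: 20 background PAMs that are not depleted, plus one (AGG) that
# disappears completely from the experimental sample.
background = ["".join(p) for p in itertools.product("ACT", repeat=3)][:20]
control = {pam: 100 for pam in background}
control["AGG"] = 100
experiment = {pam: 100 for pam in background}

depleted, log_fc = depleted_pams(experiment, control)
print(depleted)  # {'AGG'}
```

The depleted set would then be the input for a sequence logo, for example with WebLogo as described above.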
[0134] X. In Vitro Transcription of Editor mRNA and Tyr-gRNA
[0135] Briefly, a T7 promoter sequence was added to the 5' end of the Cas9d Ultra, Ultra-CBE, and gRNA transcription templates by PCR amplification. PCR products were purified using a gel extraction kit (OMEGA Bio-tek) and used as transcription templates. mRNA was transcribed using the mMESSAGE mMACHINE T7 Ultra Transcription Kit (Life Technologies), and gRNA was transcribed using the MEGAshortscript T7 Transcription Kit (Life Technologies), both following the manufacturer's recommended protocols. RNA was purified using the MEGAclear Transcription Clean-Up Kit (Life Technologies) and eluted with nuclease-free water (Ambion). The RNA solution was aliquoted and stored at -80°C for later use. For microinjection, 1 μL of Cas9d Ultra mRNA or Ultra-CBE mRNA (500 ng / μL) was mixed with 1 μL of gRNA (500 ng / μL) and 3 μL of nuclease-free water. The mixture was centrifuged at 14,000 rpm and 4°C for 10 minutes, and the supernatant was transferred to a new 200 μL RNase-free centrifuge tube for microinjection.
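The final concentrations in the injection mixture follow from simple dilution arithmetic, sketched below in Python using only the volumes and stock concentrations stated above. The function name is illustrative.

```python
def final_concentration(stock_ng_per_ul: float,
                        stock_volume_ul: float,
                        total_volume_ul: float) -> float:
    """Concentration after diluting a stock into the total mixture volume."""
    return stock_ng_per_ul * stock_volume_ul / total_volume_ul


# 1 uL mRNA + 1 uL gRNA + 3 uL nuclease-free water.
total_volume_ul = 1.0 + 1.0 + 3.0

mrna_final = final_concentration(500.0, 1.0, total_volume_ul)
grna_final = final_concentration(500.0, 1.0, total_volume_ul)

print(mrna_final, grna_final)  # 100.0 100.0 (ng / uL)
```

Both components end up at 100 ng / μL, which agrees with the mRNA and gRNA concentrations given for the injected RNA mixture in the microinjection method.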
[0136] XI. Phylogenetic Tree Construction
[0137] To construct representative phylogenetic trees for Cas9, Cas9d, and IscB, multiple sequence alignment of all proteins was performed using MAFFT-fftnsi (v7.526) with a maximum of 1000 iterations. Alignment columns with a gap frequency exceeding 50% were trimmed using trimAl (v1.5.rev0) with the parameter -gt 0.5. The phylogenetic trees were then constructed from the trimmed alignments using IQ-TREE3 (v3.0.1), employing 1000 ultrafast bootstrap replicates and the --alrt 1000 and -bnni parameters to optimize the tree structure; other parameters remained at their default settings.
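The column-trimming criterion can be illustrated with a few lines of Python. This sketch only mimics the stated threshold (columns in which more than 50% of the sequences carry a gap are removed); it is not trimAl's implementation, and the toy alignment is hypothetical.

```python
def trim_gappy_columns(alignment, min_non_gap_fraction=0.5):
    """Keep alignment columns whose fraction of non-gap residues is at
    least min_non_gap_fraction; gaps are written as '-'."""
    if not alignment or len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must contain equal-length sequences")
    n_seqs = len(alignment)
    keep = [
        i for i in range(len(alignment[0]))
        if sum(seq[i] != "-" for seq in alignment) / n_seqs >= min_non_gap_fraction
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]


# Hypothetical four-sequence alignment; the third column is 75% gaps.
toy_alignment = ["MK-LV", "MK-LV", "M--LI", "MKAL-"]
print(trim_gappy_columns(toy_alignment))  # ['MKLV', 'MKLV', 'M-LI', 'MKL-']
```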
[0138] XII. Animals
[0139] In this study, female B6D2F1 (C57BL/6J × DBA2) mice (7-9 weeks old) were superovulated by treatment with exogenous gonadotropins (PMSG and hCG) and then mated with male B6D2F1 mice (10-15 weeks old). Two-cell embryos were transferred to 8-week-old pseudopregnant female ICR mice at 0.5 days post-mating. All mouse experiments were approved by the Biomedical Research Ethics Committee of the College of Animal Science and Technology, Northwest A&F University. Mice were housed under a 12-hour light-dark cycle with free access to water and food.
[0140] XIII. Microinjection of One-Cell Fertilized Eggs and Embryo Transfer
[0141] Briefly, fertilized eggs were isolated from the oviducts 21 hours after human chorionic gonadotropin (hCG) injection and transparentized in M2 and M16 media (supplemented with amino acids), respectively. The RNA mixture of the editing system (mRNA: 100 ng / μL; gRNA: 100 ng / μL) was injected into the cytoplasm of fertilized eggs with recognizable pronuclei. Microinjection was performed using a piezoelectric microinjector (HARIOLAB) in droplets of HEPES-CZB medium (containing 5 μg / mL cytochalasin B). The injected fertilized eggs were cultured in M16 medium containing amino acids at 37°C in a humidified incubator with 5% CO2 for approximately 1.5 days until they reached the 2-cell stage. Subsequently, 25 2-cell embryos were transferred into the oviducts of pseudopregnant ICR female mice at 0.5 days post-mating.
[0142] XIV. Embryo and Mouse Genotyping
[0143] To evaluate the editing efficiency of Cas9d Ultra and Ultra-CBE at the Tyr locus in injected embryos, single 2-cell-stage embryos or blastocysts were lysed in 2 μL of lysis buffer to release genomic DNA. To genotype the Tyr-modified offspring, tail tissue was collected from pups on postnatal day 7, and genomic DNA was extracted using a tissue lysis kit (Vazyme) according to the manufacturer's instructions. Target regions in embryos and offspring mice were amplified by two rounds of PCR using 2×Taq mix (Vazyme), followed by the addition of barcodes and Illumina adapters. The resulting libraries were subjected to high-throughput sequencing. CRISPResso2 (v2.3.1) was used to analyze the editing efficiency and genotypes induced by Cas9d Ultra and Ultra-CBE. Sanger sequencing data were analyzed using EditR (v1.0.10).
[0144] XV. Data Analysis
[0145] All values are expressed as mean ± standard deviation (sd), except for measurements of the base editor's editing window and mouse editing efficiency, which are presented as mean ± standard error of the mean (sem). Detailed descriptions of the statistical methods used are shown in the corresponding figures. Two-tailed unpaired Student's t-tests were used for comparisons between two independent groups. One-way ANOVA was used for multiple comparisons. A p-value < 0.05 was considered statistically significant. Experiments were not randomized, and researchers were not blinded during the experimental and outcome assessment processes. Statistical analysis was performed using GraphPad Prism (version 8.2.1).
[0146] Example 1: Modifying RNA scaffolds to reduce size and enhance cleavage activity
[0147] The average size of various Cas9ds is approximately 750 aa (Figure 6). Previous studies have reported that although MG34-1 Cas9d (SEQ ID NO. 1; in the following examples, "Cas9d" refers to this specific Cas9d) exhibits activity in vitro, it cannot directly cleave the human genome via mRNA delivery. To date, whether Cas9d can be engineered for mammalian genome editing remains unknown. To sensitively detect the cleavage activity of Cas9d in mammalian cells, this invention uses a previously developed fluorescent reporter system (referred to as GFxxFP). This system includes a blue fluorescent protein (BFP) and an activatable enhanced green fluorescent protein (EGFP) (Figure 1 a). EGFP is activated via a single-strand annealing-mediated repair pathway after Cas9d cleaves the target sequence.
[0148] To evaluate the cleavage efficiency of MG34-1 Cas9d (SEQ ID NO. 1), this invention co-expressed a human codon-optimized Cas9d and its corresponding gRNA (previously used for in vitro cleavage) with the GFxxFP reporter in HEK293T cells. The gRNA consists of a guide sequence, such as a guide sequence targeting the GFxxFP reporter, and a scaffold sequence as shown in SEQ ID NO. 2 located at the 3' end of the guide sequence. However, this Cas9d-gRNA system could not cleave the reporter (Figure 7). Considering potential sequence incompatibility, this invention then targeted a different sequence. 48 hours post-transfection, fluorescence-activated cell sorting (FACS) experiments showed that GFP signal was detected in approximately 20% of BFP+mCherry+ double-positive cells (Figure 8 c; the column marked "WT" indicates a cleavage efficiency of approximately 20%). These results suggest that Cas9d has the potential to be engineered for gene editing in mammalian cells.
[0149] This invention sought to enhance the activity of the programmable endonuclease Cas9d through guide RNA engineering. To gain a deeper understanding of Cas9d gRNA folding, this invention first used AlphaFold3 to predict the three-dimensional structure of the Cas9d-gRNA-dsDNA ternary complex (Figure 1b and Figure 8a~b). The gRNA scaffold sequence (SEQ ID NO. 2) consists of four main parts, named P1~P4, comprising four stems and three main loops (Figure 1b). Notably, a loop inside P3 forms a pseudoknot by pairing with five bases at the 3' tail of the gRNA, further stabilizing the overall structure (Figure 1b and Figure 8a~b). To improve the compatibility of the gRNA with Cas9d, this invention comprehensively engineered the gRNA scaffold sequence (SEQ ID NO. 2), including truncating base pairs in the P1 and P4 stems and removing nucleotides (nt) 127~135; replacing the mismatched U at G·U wobble sites with C to form a G:C base pair; and individually replacing A:U or U:A base pairs with G:C base pairs (Figure 1c). Notably, some gRNA variants showed improved cleavage activity (Figure 1c). Moreover, combining these beneficial mutations resulted in enhanced activity and a 25 nt size reduction. Specifically, the scaffold sequence gRNA-v0 (SEQ ID NO. 3; Figure 8c) contains an 8-nt deletion in P1, an 8-nt deletion in P4, a 9-nt tail removal, and the M6 mutation (67U->C). Furthermore, replacing the 73U:126A base pair in the pseudoknot of gRNA-v0 (SEQ ID NO. 3) with 73G:126C yielded the scaffold sequence gRNA-v1 (SEQ ID NO. 4). Introducing the M9 mutation (112U->C) into gRNA-v1 yielded the even more efficient scaffold sequence gRNA-v2 (SEQ ID NO. 5) (Figure 8d). Furthermore, introducing the R3 mutation (4G:35C) or the combined R3+R4 mutations (4G:35C + 6G:33C) into gRNA-v2 yielded the scaffold sequences gRNA-v3 (SEQ ID NO. 6) and gRNA-v4 (SEQ ID NO. 7), respectively. These scaffold sequence variants maintained cleavage efficiency comparable to gRNA-v2, but with an increased G:C content (Figure 1d). In summary, these results demonstrate that the optimized gRNA scaffold reduces the gRNA size by approximately 20% while enhancing the activity of Cas9d (SEQ ID NO. 1). Note that gRNA-v0, gRNA-v1, gRNA-v2, gRNA-v3, gRNA-v4, and gRNA-v5 refer to scaffold sequences of the gRNA, not to full-length gRNAs. These scaffold sequences can be combined with any suitable guide sequence to form a complete gRNA. The gRNAs carrying these scaffold sequences were all tested with wild-type Cas9d (SEQ ID NO. 1).
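The substitution notation used for the scaffold variants (for example 67U->C, or replacing the 73U:126A pair with 73G:126C) can be applied mechanically to a scaffold string. The following Python sketch is illustrative only: the function name `apply_substitutions` and the 10-nt toy sequence are assumptions introduced here and are not taken from the sequence listing; positions are treated as 1-based, as in the text.

```python
import re

def apply_substitutions(seq, substitutions):
    """Apply 1-based point substitutions such as '67U->C' to an RNA string.

    Each entry names the position, the expected reference base and the
    replacement base. A reference mismatch raises ValueError so that a
    numbering error is caught instead of silently editing the wrong base.
    """
    bases = list(seq)
    for sub in substitutions:
        match = re.fullmatch(r"(\d+)([ACGU])->([ACGU])", sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub}")
        pos, ref, alt = int(match.group(1)), match.group(2), match.group(3)
        if pos < 1 or pos > len(bases):
            raise ValueError(f"position {pos} is outside the sequence")
        if bases[pos - 1] != ref:
            raise ValueError(f"position {pos} is {bases[pos - 1]}, expected {ref}")
        bases[pos - 1] = alt
    return "".join(bases)

# Hypothetical toy sequence, NOT SEQ ID NO. 2.
toy_scaffold = "GAUUCAGGUA"

# A base-pair replacement is written as two coordinated substitutions,
# analogous to 73U:126A -> 73G:126C in the description.
print(apply_substitutions(toy_scaffold, ["3U->G", "10A->C"]))
```

Checking the reference base before each edit is a design choice of this sketch; it makes the position numbering self-verifying when the same notation is applied to a truncated scaffold whose numbering has shifted.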
[0150] Example 2: Protein modification to improve Cas9d cleavage efficiency
[0151] To enhance the activity of Cas9d (SEQ ID NO. 1), this invention developed a co-expression system (Figure 9a) that ensures consistent gRNA and reporter dosage across all Cas9d variants.
[0152] Given that the RuvC domain is crucial for the activity of Cas9d (SEQ ID NO. 1), this invention first performed an arginine (Arg) scan by individually replacing non-arginine residues with Arg in the three RuvC domains of Cas9d. Multiple single Arg substitutions, such as E278R, V281R, G492R, K543R, M544R, D545R, and S548R, were found to enhance the cleavage efficiency (Figure 9b). Then, by combining these beneficial mutations, this invention obtained the most active mutant, V281R/G492R/M544R (named Cas9d-v1) (Figure 2a).
[0153] This invention then extended the Arg scan to the WED and PAM-interacting (PI) domains of Cas9d. Several single mutations were observed to enhance cleavage activity (e.g., A552R, F570R, E576R, G578R, D582R, C630R, I632R, E671R, E683R, T734R), particularly I632R, which achieved 51.06% ± 0.03% cleavage activity (Figure 2b).
[0154] Further combining the mutations from these first two rounds of optimization yielded two variants that exhibited higher efficiency: one is V281R/G492R/M544R/D582R (called Cas9d Plus; SEQ ID NO. 8), and the other is E576R/I632R (called Cas9d Pro) (Figure 2c and Figure 9c).
[0155] Since these Cas9d variants were characterized in conjunction with wild-type gRNA (whose scaffold sequence is shown in SEQ ID NO. 2), it is unclear whether combining them with the optimized gRNA scaffold in Example 1 could further improve editing efficiency.
[0156] Therefore, using the same GFxxFP reporter, this invention evaluated combinations of four gRNA scaffold variants (gRNA-v1, gRNA-v2, gRNA-v3, gRNA-v4) and two Cas9d mutants (Cas9d Plus and Cas9d Pro) (Figure 2d). Notably, compared with the other combinations, the combination of gRNA-v3 (SEQ ID NO. 6) and Cas9d Plus (collectively referred to as the Cas9d Ultra system) demonstrated exceptionally high editing efficiency (63.32% ± 0.47%; Figure 2d), a 3.1-fold improvement over the wild-type Cas9d system (20.44% ± 0.58%).
[0157] In addition, this invention systematically compared wild-type Cas9d (wtCas9d), Cas9d Ultra (with an N-terminal SV40 NLS and a C-terminal nucleoplasmin NLS (npNLS)), and a construct carrying a bipartite NLS (bpNLS) at both the N-terminus and the C-terminus of Cas9d Ultra (bpNLS-Cas9d Ultra-bpNLS) at eight genomic loci in HEK293T cells. The results indicate that synergistic engineering of the gRNA structure and key protein residues enhanced cleavage efficiency (Figure 2e~f). The construct with bpNLS at both termini (bis-bpNLS) performed better than the conventional NLS configuration (Figure 2e~f).
[0158] In addition, an unbiased bacterial depletion assay (Figure 10a) was used to characterize the PAM recognition of wtCas9d and Cas9d Plus; enrichment analysis showed that Cas9d Plus exhibits enhanced activity in prokaryotes (Figure 10b~c). Both nucleases recognize the canonical NGG PAM (Figure 9d).
[0159] Therefore, the integration of rationally designed gRNA scaffold sequences, directed evolution of the Cas9d protein sequence, and an optimized NLS configuration together improved the nuclease activity of Cas9d in both prokaryotic and eukaryotic systems.
[0160] Example 3: Genome cleavage in human cells using the Cas9d Ultra system
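As a quick check of the screening arithmetic, the short Python sketch below enumerates the scaffold/mutant combinations named in this paragraph and recomputes the reported fold improvement from the two mean efficiencies. The variable and function names are illustrative assumptions; only the two percentages come from the text.

```python
from itertools import product

scaffolds = ["gRNA-v1", "gRNA-v2", "gRNA-v3", "gRNA-v4"]
mutants = ["Cas9d Plus", "Cas9d Pro"]

# Every scaffold is paired with every protein mutant: 4 x 2 = 8 combinations.
combinations = list(product(scaffolds, mutants))
print(len(combinations))

def fold_change(test_pct, reference_pct):
    """Ratio of two mean editing efficiencies given in percent."""
    return test_pct / reference_pct

# Mean efficiencies reported in the description.
ultra_pct = 63.32      # gRNA-v3 + Cas9d Plus
wild_type_pct = 20.44  # wild-type Cas9d system
print(round(fold_change(ultra_pct, wild_type_pct), 1))  # 3.1
```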
[0161] Although previous studies have confirmed that Cas9d possesses in vitro DNA cleavage activity with 17–20 nt spacer sequences, the optimal spacer length for efficient endogenous genome editing remained undetermined. To address this issue, this invention systematically investigated spacer sequences of different lengths targeting three endogenous sites (VEGFA-S1, CCR5-S1, and PCSK9-S7) in HEK293T cells (Figure 3a). The results show that the minimum spacer length required for Cas9d Ultra to cleave genomic DNA is 17 nt (Figure 11a). Notably, the 20 nt spacer exhibited high editing efficiency at all of these sites (Figure 3b~c and Figure 11a), indicating that the 20 nt spacer performs best.
[0162] Considering the compact size of Cas9d and the enormous potential of programmable genome editing in biotechnology and clinical applications, this invention compared the cleavage efficiency of Cas9d Ultra with that of SpG Cas9 at 14 endogenous human sites. Both systems efficiently produced small insertions and deletions (indels) (Cas9d Ultra: 39.90% ± 17.89%; SpG Cas9: 44.78% ± 18.48%; Figure 3d~e). Analysis of the indel profiles at the BCL11A-S3, HBB-S1, and EMX1-S4 sites showed that the deletions produced by Cas9d Ultra are mainly centered at position 15 (the PAM occupies positions 21~23), with the deletion window covering positions -1 to 26 (Figure 3f and Figure 10b).
[0163] Cas9 can tolerate up to three mismatches between the gRNA and genomic DNA, which raises concerns about off-target effects. To assess the mismatch tolerance of Cas9d Ultra, this invention examined consecutive double mismatches along the spacer sequence at two endogenous sites (EMX1-S1 and HBB-S1) in HEK293T cells. The results showed that Cas9d Ultra exhibited minimal mismatch tolerance at these sites (Figure 3g and Figure 11c). To further evaluate genome-wide specificity, this invention performed unbiased GUIDE-seq analysis at the PCSK9-S7 and BCL11A-S2 loci in HEK293T cells. Cas9d Ultra induced off-target editing at 12 and 20 sites, respectively (off-target sites were called with ≤8 mismatches and ≥2 reads; Figure 3h and Figure 12a~b), substantially fewer than the 52 and 36 off-target sites induced by SpG Cas9, respectively (Figure 3h and Figure 12c~d), indicating that Cas9d Ultra has higher editing specificity and accuracy than SpG Cas9.
[0164] Example 4: Cas9d Ultra-derived Ultra-BE enables highly efficient base editing
[0165] Base editing represents a direct method for correcting disease-related mutations in model systems. Given that nicking the non-edited strand in mammalian cells can improve base editing efficiency, this invention constructed nickases by mutating conserved catalytic residues in Cas9d, namely Asp10 in the RuvC domain and/or His375 in the HNH domain, to alanine. These include the Cas9d Ultra nickase (SEQ ID NO. 9; Cas9d-D10A, V281R, G492R, M544R, D582R), which is still used in combination with a gRNA carrying the gRNA-v3 scaffold sequence (SEQ ID NO. 6). The nickase activity of these variants was evaluated using a dual-target reporter system containing a 15 bp or 20 bp insert. On both reporter systems, Cas9d Ultra-derived nickases showed higher nicking efficiency than SpG Cas9-derived nickases (Figure 13a), indicating that Cas9d Ultra is a promising candidate for developing base editors.
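The mismatch-tolerance experiment relies on a panel of guides that each carry two adjacent mismatches at successive positions along the spacer. A minimal Python sketch of how such a panel could be generated is given below. It rests on several assumptions that are not stated in the description: each mismatched base is replaced by its complement, the mismatch pairs tile the spacer without overlap (`step=2`), and the 20-nt spacer shown is a hypothetical placeholder rather than an EMX1-S1 or HBB-S1 sequence.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def double_mismatch_variants(spacer, step=2):
    """Return (start_position, variant) pairs with two adjacent mismatches.

    start_position is 1-based. Each variant differs from the input at
    exactly two neighbouring positions, where the base is swapped for its
    complement (an assumption; any non-matching base would also work).
    """
    spacer = spacer.upper()
    variants = []
    for i in range(0, len(spacer) - 1, step):
        mutated = (
            spacer[:i]
            + COMPLEMENT[spacer[i]]
            + COMPLEMENT[spacer[i + 1]]
            + spacer[i + 2:]
        )
        variants.append((i + 1, mutated))
    return variants

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical 20-nt spacer used only for illustration.
toy_spacer = "ACGTACGTACGTACGTACGT"
for start, variant in double_mismatch_variants(toy_spacer):
    print(start, variant, hamming(toy_spacer, variant))
```

Setting `step=1` would instead produce overlapping pairs (positions 1-2, 2-3, and so on); which tiling the experiment used is not specified in the text.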
[0166] To develop Cas9d Ultra-derived adenine and cytosine base editors (ABE/CBE), this invention fused an evolved TadA variant, TadA8e V106W (SEQ ID NO. 10), to the N-terminus of Cas9d Ultra-D10A, thereby generating Cas9d Ultra-ABE (named Ultra-ABE; SEQ ID NO. 11) (Figure 13b). TadA8e V106W is a high-fidelity variant that has been reported to reduce RNA off-target effects. Similarly, by fusing an optimized human apolipoprotein B mRNA editing enzyme catalytic polypeptide-like 3A (APOBEC3A, or A3A) carrying the W104A mutation (A3A-W104A; SEQ ID NO. 12) and a uracil DNA glycosylase inhibitor (UGI; SEQ ID NO. 13) to Cas9d Ultra-D10A, this invention constructed Cas9d Ultra-CBE (named Ultra-CBE; SEQ ID NO. 14) (Figure 13b). Note that the lengths of the Cas9d Ultra-D10A-based Ultra-ABE and Ultra-CBE shown in Figure 13b do not include the N-terminal Met residue and are therefore one amino acid shorter than the lengths of Ultra-ABE and Ultra-CBE shown in SEQ ID NO. 11 and SEQ ID NO. 14, respectively.
[0167] To systematically characterize the editing profiles of these two types of BEs, this invention first designed dozens of PAM-matched endogenous sites for Ultra-ABE, Ultra-CBE, SpG-ABE, and SpG-CBE. Ultra-ABE exhibited an editing window covering positions A2~A14 (the PAM occupies positions 21~23), with optimal editing efficiency at positions A6~A11 (Figure 4a and Figure 14). In contrast, SpG-ABE showed a narrower window (A2~A11) and maximum efficiency at A6. Across 12 sites, Ultra-ABE achieved a higher average A-to-G conversion efficiency than SpG-ABE (Ultra-ABE: 54.08% ± 14.04%; SpG-ABE: 44.41% ± 19.60%; Figure 4b). Furthermore, both ABEs induced comparable insertion/deletion (indel) frequencies (Figure 16a~b). Regarding CBE editing at 17 endogenous sites, both Ultra-CBE and SpG-CBE edited cytosines at positions 2~14 (Figure 4c and Figure 15), but Ultra-CBE exhibited a wider effective editing window than SpG-CBE (limited to C4~C9). On average, the C-to-T editing efficiencies of the two CBEs were comparable (Ultra-CBE: 56.12% ± 15.67%; SpG-CBE: 56.87% ± 18.67%; Figure 4d), and their indel frequencies were similar (Figure 16c~d). Therefore, these results indicate that the Cas9d Ultra-derived BEs achieve editing efficiency comparable to the SpG-BEs while being much more compact.
[0168] To assess specificity, this invention evaluated the gRNA-dependent and gRNA-independent off-target effects of Ultra-BE and SpG-BE in HEK293T cells. Potential gRNA-dependent off-target sites were predicted using Cas-OFFinder, while gRNA-independent off-target editing was evaluated using orthogonal R-loop experiments. Targeted deep sequencing detected one off-target editing site for Ultra-ABE (Figure 4e and Figure 16a), whereas no off-target editing was detected for SpG-ABE (Figure 4f and Figure 16b). Notably, SpG-CBE exhibited more frequent off-target editing than Ultra-CBE (Figure 18). The R-loop experiments showed that all BEs had comparable on-target editing efficiency and similarly low levels of gRNA-independent off-target effects (Figure 4g~h and Figure 19). In summary, these results indicate that Cas9d Ultra-BE is a highly efficient and compact base editor with gRNA-dependent specificity comparable to SpG-BE at the tested genomic sites.
[0169] Example 5: Application of Cas9d Ultra and Ultra-CBE in mouse model construction
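To make the editing-window comparison concrete, the following Python sketch lists which target bases in a protospacer fall inside a given window. Positions are counted from the PAM-distal end so that a 20-nt protospacer is followed by the PAM at positions 21~23, as in the description; the window tuples restate the ranges reported for the two ABEs. The function name and the toy protospacer are assumptions introduced for illustration.

```python
def editable_positions(protospacer, target_base, window):
    """Return 1-based positions of target_base inside an inclusive window.

    Position 1 is the PAM-distal end of the protospacer.
    """
    first, last = window
    return [
        pos
        for pos, base in enumerate(protospacer.upper(), start=1)
        if base == target_base and first <= pos <= last
    ]

# Windows as reported in the description for adenine base editing.
ULTRA_ABE_WINDOW = (2, 14)
SPG_ABE_WINDOW = (2, 11)

# Hypothetical 20-nt protospacer, not one of the tested sites.
toy_site = "GACCATGAAGTCAGCTAGCA"

print(editable_positions(toy_site, "A", ULTRA_ABE_WINDOW))  # [2, 5, 8, 9, 13]
print(editable_positions(toy_site, "A", SPG_ABE_WINDOW))    # [2, 5, 8, 9]
```

In this toy site, the adenine at position 13 lies inside the wider Ultra-ABE window but outside the SpG-ABE window, which is the kind of difference the comparison above describes. Being inside a window does not by itself predict editing efficiency.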
[0170] The ability of Cas9d and its derived base editors to perform efficient genome editing across various cell types is crucial for advancing their applications in research and therapy. To evaluate these tools in mouse neuroblastoma Neuro-2a (N2a) cells, this invention designed six gRNA guide sequences (sg1~sg6; SEQ ID NO. 20~25) targeting exon 1 of the tyrosinase gene (Tyr), which encodes a key enzyme in melanin biosynthesis (Figure 5a). Each corresponding full-length gRNA consists of the respective guide sequence followed at its 3' end by the scaffold sequence gRNA-v3 (SEQ ID NO. 6); for example, the full-length gRNA carrying the Tyr guide sequence sg3 is shown in SEQ ID NO. 26.
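The full-length gRNA layout described here, a guide sequence followed at its 3' end by the scaffold, amounts to a simple concatenation. The Python sketch below illustrates it under stated assumptions: the function name `assemble_grna` is hypothetical, the sequences are short placeholders rather than SEQ ID NO. 6 or SEQ ID NO. 22, and DNA-style input is converted to RNA letters for convenience.

```python
def assemble_grna(guide, scaffold):
    """Return a full-length gRNA written 5'->3': guide first, then scaffold.

    Inputs may use T or U; the output uses RNA letters only.
    """
    def to_rna(seq):
        rna = seq.upper().replace("T", "U")
        unexpected = set(rna) - set("ACGU")
        if unexpected:
            raise ValueError(f"unexpected characters: {sorted(unexpected)}")
        return rna

    return to_rna(guide) + to_rna(scaffold)

# Placeholder sequences for illustration only.
toy_guide = "ACGTACGTACGTACGTACGT"   # 20-nt guide
toy_scaffold = "GUUUGAGAGC"           # stands in for a scaffold such as gRNA-v3

full_length = assemble_grna(toy_guide, toy_scaffold)
print(full_length)
print(len(full_length))  # 30
```

Because the scaffold is independent of the target, the same scaffold string can be reused with each of several guide sequences.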
[0171] Following gRNA screening, the guide sequence sg3 (SEQ ID NO. 22) exhibited the best performance in both systems: Cas9d Ultra produced approximately 12% frameshift mutations (non-3n insertions/deletions) (Figure 20a), while Ultra-CBE induced premature stop codons in approximately 20% of alleles (Figure 5b). Next, this invention microinjected the sg3 gRNA (SEQ ID NO. 26) together with Cas9d Ultra mRNA or Ultra-CBE mRNA into one-cell embryos to assess editing efficiency in mouse zygotes (Figure 20b). The blastocyst development rates were 80% (8/10) and 93% (13/14), respectively (Figure 5c~d and Figure 20c). Targeted deep sequencing showed that zygotes injected with Cas9d Ultra-sg3 carried an average of 49.61% frameshift mutations (Figure 20d), while zygotes treated with Ultra-CBE-sg3 showed 53.19% C-to-T editing (Figure 20e~f). These results demonstrate the powerful editing capabilities of these two systems in early mouse embryos.
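Frameshift mutations are defined here as non-3n insertions/deletions, that is, indels whose net length is not a multiple of three. A minimal Python sketch of that classification follows. The function names and the read counts are hypothetical and serve only to show the calculation; they are not data from the experiments.

```python
def is_frameshift(net_indel_length):
    """True if the net length change is not a multiple of 3.

    Insertions are positive, deletions negative, and 0 means no indel.
    """
    return net_indel_length % 3 != 0

def frameshift_fraction(allele_counts):
    """Fraction of reads carrying a frameshift indel.

    allele_counts maps net indel length to read count.
    """
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no reads")
    shifted = sum(n for length, n in allele_counts.items() if is_frameshift(length))
    return shifted / total

# Hypothetical read counts: unedited, 1-bp deletion, 3-bp deletion, 2-bp insertion.
toy_counts = {0: 50, -1: 20, -3: 10, 2: 20}
print(frameshift_fraction(toy_counts))  # 0.4
```

The in-frame 3-bp deletion is counted as edited but not as a frameshift, which is why a frameshift percentage can be lower than the total indel percentage at the same site.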
[0172] Because loss of TYR function results in oculocutaneous albinism type 1 (OCA1) in humans and a similar albinism phenotype in mice, coat pigmentation was used as a phenotypic readout of editing efficiency. To construct a Tyr loss-of-function disease model, this invention harvested zygotes (C57BL/6J female × DBA/2 male) of black-coated D2B6F1 mice and microinjected them with Cas9d Ultra-sg3 or Ultra-CBE-sg3 (Figure 20b). 87% (242/277; Cas9d Ultra-sg3) and 92% (251/273; Ultra-CBE-sg3) of the embryos developed to the two-cell stage (Figure 5e). In each group, 125 two-cell-stage embryos were transferred into female mice, yielding 47 (Cas9d Ultra-sg3) and 56 (Ultra-CBE-sg3) F0 pups (Figure 5e~f). Mutation analysis confirmed that 85% (40/47) of the pups treated with Cas9d Ultra-sg3 exhibited Tyr disruption (average editing efficiency of 71.36%), while this proportion was 89% (50/56) among the pups treated with Ultra-CBE-sg3 (Figure 5g~h). The editing outcomes show that the completely albino mice produced by Cas9d Ultra mainly carried frameshift deletions (Figure 21a), while the albino mice produced by Ultra-CBE almost all carried premature stop codons (Figure 21b). The gRNA-dependent off-target effects in completely albino F0 mice of each group (n=3 mice) were further evaluated by deep sequencing, and no off-target editing was detected (Figure 22). In summary, these results demonstrate that the Cas9d Ultra and Ultra-CBE systems enable efficient and precise genome editing in mouse embryos, supporting their application in disease model construction and treatment.
[0173] Example 6: Base editors constructed by embedding TadA deaminase into Cas9d Ultra
[0174] This invention also constructed additional ABEs by embedding TadA deaminase into Cas9d Ultra (SEQ ID NO. 9); their sequences are shown in SEQ ID NO. 15-17. Their base editing efficiency in HEK293 cells, tested with a fluorescent reporter system, is shown in the table below.
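The percentages in this paragraph follow directly from the reported counts. The short Python sketch below recomputes them as a consistency check; the dictionary keys and the helper name `percent` are labels chosen here for illustration, while the counts themselves are those stated in the text.

```python
def percent(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)

# (numerator, denominator) pairs as reported in the description.
reported_counts = {
    "two-cell stage, Cas9d Ultra-sg3": (242, 277),
    "two-cell stage, Ultra-CBE-sg3": (251, 273),
    "Tyr disruption, Cas9d Ultra-sg3 pups": (40, 47),
    "Tyr disruption, Ultra-CBE-sg3 pups": (50, 56),
}

for label, (num, den) in reported_counts.items():
    print(f"{label}: {percent(num, den)}% ({num}/{den})")
```

Running the sketch reproduces the 87%, 92%, 85%, and 89% figures given above.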
[0175] Table 1: Base editing efficiency of the ABEs with sequences shown in SEQ ID NO. 11 and 15-17
[0176]
[0177] Example 7: Truncated Cas9d Plus variants
[0178] This invention also removed partial sequences from Cas9d Plus (SEQ ID NO. 8) to construct truncated Cas9d Plus variants, with sequences as shown in SEQ ID NO. 18 (ST153; amino acids 95-136 deleted) or SEQ ID NO. 19 (ST160; amino acids 100-129 deleted), which were assembled with NLSs to test DNA cleavage. Their cleavage efficiency in HEK293 cells, measured with a fluorescent reporter system, is shown in the table below.
[0179] Table 2: Cleavage efficiency of truncated Cas9d variants
[0180]
[0181] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A Cas protein, characterized in that the Cas protein is a mutant of a reference Cas protein whose amino acid sequence is shown in SEQ ID NO. 1, and the amino acid sequence of the Cas protein is shown in SEQ ID NO. 8, 9, 18 or 19.
2. A fusion protein, characterized in that the amino acid sequence of the fusion protein is shown in any one of SEQ ID NO. 11 and SEQ ID NO. 14-17.
3. A polynucleotide, characterized in that the polynucleotide comprises a sequence encoding the Cas protein of claim 1 or the fusion protein of claim 2.
4. A vector, characterized in that it comprises the polynucleotide of claim 3.
5. A CRISPR-Cas system, characterized in that it comprises: (a) the Cas protein of claim 1, the fusion protein of claim 2, or the polynucleotide of claim 3; and (b) a guide RNA or a polynucleotide containing a sequence encoding the guide RNA, the guide RNA comprising a guide sequence and a scaffold sequence located at the 3' end of the guide sequence, wherein the scaffold sequence is capable of forming a complex with the Cas protein or the fusion protein, and the guide sequence is capable of hybridizing with a target sequence in the target DNA, thereby guiding the complex to the target DNA.
6. An in vitro method for modifying target DNA, characterized in that the method comprises contacting the target DNA with the CRISPR-Cas system of claim 5, wherein the guide sequence in the CRISPR-Cas system hybridizes with the target sequence in the target DNA, such that the target DNA is modified by the CRISPR-Cas system.
7. The use of the Cas protein of claim 1, the fusion protein of claim 2, the polynucleotide of claim 3, the vector of claim 4, or the CRISPR-Cas system of claim 5 in the preparation of drugs or reagents for modifying target DNA.