PPR protein having reduced off-target effects and use for same
By mutating amino acids at positions 2 and 13 in PPR motifs, the off-target effects of PPR proteins are mitigated, enhancing their specificity and utility in RNA editing applications.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- EDITFORCE INC
- Filing Date
- 2025-12-26
- Publication Date
- 2026-07-02
AI Technical Summary
Existing PPR proteins exhibit significant off-target effects due to their RNA binding specificity, which complicates their application in technological development.
Introduce specific amino acid mutations at positions 2 and 13 within the PPR motifs to reduce off-target effects while maintaining on-target editing efficiency, utilizing a modified PPR protein design.
The modified PPR proteins demonstrate reduced off-target activity with minimal impact on on-target editing efficiency, enabling precise RNA targeting and broader applicability in fields like medicine, agriculture, and chemistry.
Abstract
Description
PPR proteins with reduced off-target effects and their applications
[0001] This invention relates to a PPR protein capable of binding to target RNA. This invention is useful in fields such as medicine (drug discovery support, treatment), agriculture (agricultural, fishery, and livestock product production, breeding), and chemistry (biological substance production).
[0002] Pentatricopeptide repeat (PPR) proteins are involved in eukaryotic gene regulation at the RNA level (Non-Patent Literature 1). PPR proteins are RNA-binding proteins with a tandem repeat structure of PPR motifs, each consisting of 31 to 36 amino acids. The first of the two helices constituting each motif interacts with the RNA molecule. Two amino acids within the motif (the 5th and last positions) recognize RNA in a base-specific manner according to the nucleic acid recognition code (Non-Patent Literature 2, 3). PPR proteins are classified into two subclasses based on the composition of the PPR motif: the P class, which consists only of the standard P motif, and the PLS class, which has L1 and S1 motifs in addition to P1 (P) (Non-Patent Literature 4). Most PLS class proteins have PPR-like motifs (P2, L2, S2, E1, E2) and part or all of a cytidine deaminase-like domain (DYW) at the C-terminus. The DYW domain has three subclasses: the PG and WW subclasses are involved in cytidine-to-uridine (C-to-U) editing of RNA, and the KP subclass is involved in uridine-to-cytidine (U-to-C) editing (Non-Patent Literature 5, 6). The present inventors previously created a "designer PPR editor" by ligating a PLS domain, composed of artificially designed PLS repeats, with a C-terminal RNA editing domain consisting of PPR-like motifs (P2, L2, S2, E1, E2) and a DYW domain. This protein edited target bases on foreign mRNA introduced into the cytoplasm of cultured human cells (Non-Patent Literature 7, Patent Literature 1, Non-Patent Literature 9).
[0003] A study using in vitro bind-n-seq showed that artificial P arrays (repeating structures of P motifs) are highly likely to tolerate mismatches near the 3' end of the binding site (Non-Patent Literature 8). According to the same study, the optimal number of motifs for the PPR domain is 10, and misalignment between the PPR protein and the target sequence is suggested to occur with a larger number of motifs (14 motifs in that study).
[0004] Patent Literature 1: International Publication WO2021-201198
[0005]
Non-Patent Literature 1: Barkan A, Small I. Pentatricopeptide repeat proteins in plants. Annu Rev Plant Biol. 2014;65:415-42.
Non-Patent Literature 2: Barkan A, Rojas M, Fujii S, Yap A, Chong YS, Bond CS, Small I. A combinatorial amino acid code for RNA recognition by pentatricopeptide repeat proteins. PLoS Genet. 2012;8(8):e1002910.
Non-Patent Literature 3: Shen C, Zhang D, Guan Z, Liu Y, Yang Z, Yang Y, Wang X, Wang Q, Zhang Q, Fan S, Zou T, Yin P. Structural basis for specific single-stranded RNA recognition by designer pentatricopeptide repeat proteins. Nat Commun. 2016 Apr 18;7:11285.
Non-Patent Literature 4: Cheng S, Gutmann B, Zhong X, Ye Y, Fisher MF, Bai F, Castleden I, Song Y, Song B, Huang J, Liu X, Xu X, Lim BL, Bond CS, Yiu SM, Small I. Redefining the structural motifs that determine RNA binding and RNA editing by pentatricopeptide repeat proteins in land plants. Plant J. 2016 Feb;85(4):532-547.
Non-Patent Literature 5: Gerke P, Szovenyi P, Neubauer A, Lenz H, Gutmann B, McDowell R, Small I, Schallenberg-Rudinger M, Knoop V. Towards a plant model for enigmatic U-to-C RNA editing: the organelle genomes, transcriptomes, editomes and candidate RNA editing factors in the hornwort Anthoceros agrestis. New Phytol. 2020 Mar;225(5):1974-1992.
Non-Patent Literature 6: Gutmann B, Royan S, Schallenberg-Rudinger M, Lenz H, Castleden IR, McDowell R, Vacher MA, Tonti-Filippini J, Bond CS, Knoop V, Small ID. The Expansion and Diversification of Pentatricopeptide Repeat RNA-Editing Factors in Plants. Mol Plant. 2020 Feb 3;13(2):215-230.
Non-Patent Literature 7: Ichinose M, Gutmann B, Yagi Y, Akaiwa Y, Shimajiri Y, Nakamura T. Method for editing target RNA. 2021.
Non-Patent Literature 8: Miranda RG, McDermott JJ, Barkan A. RNA-binding specificity landscapes of designer pentatricopeptide repeat proteins elucidate principles of PPR-RNA interactions. Nucleic Acids Res. 2018 Mar 16;46(5):2613-2623.
Non-Patent Literature 9: Ichinose M, Kawabata M, Akaiwa Y, Shimajiri Y, Nakamura I, Tamai T, Nakamura T, Yagi Y, Gutmann B. U-to-C RNA editing by synthetic PPR-DYW proteins in bacteria and human culture cells. Commun Biol. 2022 Sep 15;5(1):968.
[0006] It would be desirable to have a method that reduces off-target effects of PPR proteins while maintaining their length.
[0007] The PLS domain is difficult to apply to technological development because the mechanism of action of the L motif has not been elucidated. To circumvent this problem, the PLS domain was replaced with the P domain, which caused only a slight decrease in RNA editing efficiency (Non-Patent Literature 7, Patent Literature 1). In this study, with the aim of reducing the off-target effects of PPR proteins, we investigated how editing efficiency is affected by mutating amino acids at positions involved in binding to RNA molecules in a designer PPR editor in which the C-terminal domain (P2, L2, S2, E1, E2, DYW) was linked to the PPR-P domain. First, we showed that introducing mutations in amino acids at positions 2 or 13 in multiple PPR motifs reduced or eliminated off-target editing activity. In particular, introducing amino acid mutations at position 13 in two motifs significantly reduced off-target effects while maintaining on-target editing activity. Furthermore, combining amino acid mutations at position 2 in two different motifs with amino acid mutations at position 13 in two motifs also significantly reduced off-target effects. Finally, human transcriptome analysis showed that introducing amino acid mutations at position 13 into two motifs reduced the number of off-target sites to between one-sixth and one-eighth. Based on these findings, the present invention was completed.
[0008] The present invention provides the following:
[1] A method for modifying a PPR protein, comprising: substituting at least one of the amino acids at positions 2 and 13 in the amino acid sequence of one or more PPR motifs of a PPR protein containing multiple PPR motifs. Here, the PPR motif is a motif classified into any subclass selected from the P, P1, L1, S1, SS, P2, L2, S2, E1, and E2 motifs, and consists of a polypeptide with a total length of 31 to 36 amino acids represented by the following formula 1, where each amino acid is numbered A1, A2, A3, A4... in order. (In Formula 1: Helix A is a portion capable of forming an α-helix structure, consisting of 13 or 14 amino acids in length; X1 consists of 1 to 9 amino acids, preferably 1 to 3 amino acids; Helix B is a portion capable of forming an α-helix structure, consisting of 10 to 14 amino acids in length; X2 consists of 1 to 9 amino acids, preferably 4 to 9 amino acids, where the C-terminal amino acid in X2 is represented by L.) The amino acid combination of A5 and L functions for selective binding to RNA bases. More specifically, the bases capable of binding to the two-amino-acid combination of A5 and L satisfy one of (2-1) to (2-42) in the following table, depending on the subclass to which the motif is classified.
[2] The method according to item 1, wherein the amino acid substitution is at least one substitution selected from V2 and K13.
[3] The method according to item 1 or 2, wherein the amino acid substitution is at least one substitution selected from V2F, V2S, V2Y, K13Q, and K13S.
[4] The method according to any one of items 1 to 3, wherein the PPR protein contains 14 or more PPR motifs.
[5] A method for reducing the off-target effects of a PPR protein, comprising the steps of: carrying out the modification method according to any one of items 1 to 4; and allowing the resulting modified PPR protein to act on a target RNA in a prokaryotic or eukaryotic cell.
[6] A polypeptide consisting of any of the following amino acid sequences:
VFTYNTLIDG LCKAGRLDEA EELLEEMEEK GIKPD (SEQ ID NO:5)
VFTYNTLIDG LCKSGKIEEA LKLFKEMEEK GITPS (SEQ ID NO:6)
VFTYTTLIDG LCKAGKVDEA LELFDEMKER GIKPD (SEQ ID NO:7)
VFTYTTLIDG LCKAGKVDEA LELFKEMRSK GVKPN (SEQ ID NO:8)
VSTYNTLIDG LCKAGRLDEA EELLEEMEEK GIKPD (SEQ ID NO:9)
VSTYNTLIDG LCKSGKIEEA LKLFKEMEEK GITPS (SEQ ID NO:10)
VSTYTTLIDG LCKAGKVDEA LELFDEMKER GIKPD (SEQ ID NO:11)
VSTYTTLIDG LCKAGKVDEA LELFKEMRSK GVKPN (SEQ ID NO:12)
VVTYNTLIDG LCQAGRLDEA EELLEEMEEK GIKPD (SEQ ID NO:13)
VVTYNTLIDG LCQSGKIEEA LKLFKEMEEK GITPS (SEQ ID NO:14)
VVTYTTLIDG LCQAGKVDEA LELFDEMKER GIKPD (SEQ ID NO:15)
VVTYTTLIDG LCQAGKVDEA LELFKEMRSK GVKPN (SEQ ID NO:16)
VVTYNTLIDG LCSAGRLDEA EELLEEMEEK GIKPD (SEQ ID NO:17)
VVTYNTLIDG LCSSGKIEEA LKLFKEMEEK GITPS (SEQ ID NO:18)
VVTYTTLIDG LCSAGKVDEA LELFDEMKER GIKPD (SEQ ID NO:19)
VVTYTTLIDG LCSAGKVDEA LELFKEMRSK GVKPN (SEQ ID NO:20)
[7] A PPR protein capable of binding to target RNA, comprising 10 or more PPR motifs, of which 2 to 6 are PPR motifs selected from the following: the polypeptide described in item 6, and a polypeptide consisting of a sequence in which at least one substitution selected from V2F, V2S, V2Y, K13Q, and K13S is made in the amino acid sequence of the PPR motif. Here, the PPR motif consists of a polypeptide with a total length of 31 to 36 amino acids, represented by the formula 1 below, and each amino acid is numbered sequentially as A1, A2, A3, A4, etc. (In Formula 1: Helix A is a portion capable of forming an α-helix structure, consisting of 13 or 14 amino acids in length; X1 consists of 1 to 9 amino acids, preferably 1 to 3 amino acids in length; Helix B is a portion capable of forming an α-helix structure, consisting of 10 to 14 amino acids in length; X2 consists of 1 to 9 amino acids, preferably 4 to 9 amino acids in length, where the C-terminal amino acid in X2 is represented by L.) The amino acid combination of A5 and L functions for selective binding to RNA bases.
[8] The PPR protein according to 6, comprising 14 PPR motifs.
[9] The PPR protein modified by the method described in any one of items 1 to 4 and expressed in prokaryotic or eukaryotic cells.
[10] The polypeptide according to 6, or the nucleic acid encoding the PPR protein according to 7 or 8.
[11] The vector comprising the nucleic acid according to 10.
[12] The cell comprising the vector according to 11.
[0009] This study strongly suggests that reducing the stability of the PPR/RNA complex, by introducing mutations into the amino acids involved in its stabilization, is important for reducing the off-target effects of PPR editors.
[0010] The modification method of the present invention, namely mutation of the PPR domain, is a novel approach to reduce off-target effects of PPR proteins, and this method, which maintains the length of the PPR domain, enables the design of PPR proteins that specifically target a wider range of targets.
[0011] A designer PPR editor targeting the CTNNB1-T41I site. The PPR editor has a PPR array consisting of 16 PPR-P motifs, followed by a C-terminal domain containing five PPR-like motifs and a DYW domain (a). (b) shows the amino acids at positions 2 and 13, and at positions 5 and 35 (involved in specific RNA recognition), in each of the 16 PPR-P motifs. Alignment of the nucleotide sequences of the nine off-target editing sites analyzed in this study and the target CTNNB1 editing site of the PPR editor (c). Arrows indicate editing sites.
Frequency of occurrence of the amino acid at position 2 in the P motif (35 amino acid length) of the Arabidopsis thaliana PPR protein (Non-Patent Literature 3).
Effect of amino acid mutation at position 2 in a single PPR motif on off-target effects. In the 16P-WW protein targeting the CTNNB1-T41I site, valine (V) at position 2 of the 6th, 8th, 10th, 12th, or 14th PPR motif was replaced with phenylalanine (V2F), serine (V2S), or tyrosine (V2Y). The editing efficiency of the PPR mutants at the on-target (CTNNB1) and off-target sites was compared with that of the unmutated 16P-WW (WT). Gray bars represent the mean editing efficiency of the WT, and white bars represent the mean editing efficiency of the mutants. The results of three experiments are shown. Significant differences between WT and mutants are indicated by *: *P<0.05, **P<0.01 (Student's t-test). Mutants showing good performance are boxed.
Effect of amino acid mutations at position 2 in two PPR motifs on off-target effects. In the 16P-WW protein targeting the CTNNB1-T41I site, valine (V) at position 2 of the 6th/14th or 8th/14th PPR motifs was replaced with phenylalanine (V2F), serine (V2S), or tyrosine (V2Y). The editing efficiency of the mutants at on-target (CTNNB1) and off-target sites was compared with that of the unmutated 16P-WW (WT). Gray bars represent the mean editing efficiency of the WT, and white bars represent the mean editing efficiency of the mutants. The results of three experiments are shown. Significant differences between WT and mutants are indicated by *: *P<0.05, **P<0.01 (Student's t-test). Mutants showing good performance are boxed.
Frequency of amino acid occurrence at position 13 in PPR motifs. The frequency of amino acid occurrence at position 13 in the P motif identified in Arabidopsis thaliana (a), or in the P1 (b), L1 (c), and S1 (d) motifs identified from the transcriptome of early land plants is shown.
Effect of amino acid mutations at position 13 in a single PPR motif on off-target effects. In 16P-PG (a) and 16P-WW (b) proteins targeting the CTNNB1-T41I site, lysine (K), the amino acid at position 13 of the 3rd, 7th, 11th, or 15th PPR motif, was substituted with glutamine (K13Q) or serine (K13S). The editing efficiency of these PPR mutants at the on-target (CTNNB1) and off-target sites was compared with that of unmutated 16P-PG or 16P-WW (WT). Gray bars represent the mean editing efficiency of the WT, and white bars represent the mean editing efficiency of the mutants. The results of three experiments are shown. Significant differences between WT and mutants are indicated by *: *P<0.05, **P<0.01, ***P<0.001, ****P<0.0001 (Student's t-test). Mutants showing good performance are boxed.
Effect of amino acid mutations at position 13 in two PPR motifs on off-target effects. In 16P-PG (a) and 16P-WW (b) proteins targeting the CTNNB1-T41I site, lysine (K), the amino acid at position 13 of the 3rd and 7th, 3rd and 11th, 7th and 15th, or 11th and 15th PPR motifs, was substituted with glutamine (K13Q) or serine (K13S). The editing efficiency of the PPR mutants at the on-target (CTNNB1) and off-target sites was compared with that of unmutated 16P-PG or 16P-WW (WT). Gray bars represent the mean editing efficiency of the WT, and white bars represent the mean editing efficiency of the mutants. The results of three experiments are shown. Significant differences between WT and mutants are indicated by *: *P<0.05, **P<0.01, ***P<0.001, ****P<0.0001 (Student's t-test). Mutants showing good performance are boxed.
Effect of amino acid mutations at position 13 in four PPR motifs on off-target effects. In 16P-PG (a) and 16P-WW (b) proteins targeting the CTNNB1-T41I site, lysine (K) at position 13 of the 3rd, 7th, 11th, and 15th PPR motifs was substituted with glutamine (K13Q) or serine (K13S). The editing efficiency of the PPR mutants at the on-target (CTNNB1) and off-target sites was compared with that of unmutated 16P-PG or 16P-WW (WT). Gray bars represent the mean editing efficiency of the WT, and white bars represent the mean editing efficiency of the mutants. The results of three experiments are shown. Significant differences between WT and mutants are indicated by *: **P<0.01, ***P<0.001, ****P<0.0001 (Student's t-test).
Effect of amino acid mutations at positions 2 and 13 in two PPR motifs on off-target effects. (a) In the 16P-WW protein targeting the CTNNB1-T41I site, valine (V) at position 2 of the 6th or 8th PPR motif was replaced with serine (V2S) or tyrosine (V2Y), and lysine (K) at position 13 of the 15th motif was replaced with glutamine (K13Q) or serine (K13S). (b) K13Q or K13S mutations were introduced into the 7th or 11th motif of 16P-WW, and V2S or V2Y mutations were introduced into the 14th motif. The editing efficiency of the PPR mutants at the on-target (CTNNB1) and off-target sites was compared with that of unmutated 16P-WW (WT). Gray bars represent the mean editing efficiency of the WT, and white bars represent the mean editing efficiency of the mutants. The results of three experiments are shown. Significant differences between WT and mutants are indicated by *: *P<0.05, **P<0.01, ***P<0.001 (Student's t-test). Mutants showing good performance are boxed.
Effect of amino acid mutations at positions 2 and 13 in four PPR motifs within the 16P-WW protein on off-target effects. V2S or V2Y mutations were introduced at amino acid position 2 of motifs 6 and 14 (a) or motifs 8 and 14 (b), and K13Q or K13S mutations were introduced at amino acid position 13 of motifs 7 and 15 (a) or motifs 11 and 15 (b). The editing efficiency of these mutants at on-target (CTNNB1) and off-target sites was compared with that of unmutated 16P-WW (WT). Gray bars represent the mean editing efficiency of the WT, and white bars represent the mean editing efficiency of the mutants. The results of three experiments are shown. Significant differences between WT and mutants are indicated by *: *P<0.05, **P<0.01, ***P<0.001 (Student's t-test). Mutants showing good performance are boxed.
Effect of PPR domain length on off-target effects. Four different PPR-P domains with varying motif numbers (10, 12, 14, or 16 motifs) were ligated to a C-terminal WW domain modified to enhance editing activity (WW2). The editing activity of these proteins at on-target (CTNNB1) and off-target sites was measured. The results of three experiments are shown. The motif numbers showing good performance are boxed.
Effect of amino acid mutations at position 13 in the 14P-WW2 protein on off-target effects. In the 14P-WW2 protein targeting the CTNNB1-T41I site, lysine (K) at position 13 within one (a) or two (b) motifs was substituted with glutamine (K13Q) or serine (K13S). The editing efficiency of these PPR mutants at the on-target (CTNNB1) and off-target sites was compared with that of unmutated 14P-WW2 (WT). Gray bars represent the mean editing efficiency of the WT, and white bars represent the mean editing efficiency of the mutants. The results of three experiments are shown. Significant differences between WT and mutants are indicated by *: *P<0.05, **P<0.01, ***P<0.001, ****P<0.0001 (Student's t-test). Mutants showing good performance are boxed.
The editing efficiency and target specificity of the 14P-WW protein, which targets the CTNNB1-T41I site, were compared with those of the 14P-WW2 variant and of mutants in which a mutation (K13Q or K13S) was introduced into the amino acid at position 13 of two PPR motifs (7th and 13th motifs or 9th and 13th motifs) within the variant (a). The Venn diagram shows the results of RNA-seq analysis of the number of off-target sites for four types of 14P-WW proteins in the transcriptome of HEK293T cells (b). Bases that were significantly edited in the three experiments were defined as off-target editing sites.
Frequency of occurrence of amino acids in the E2 motif used with DYW:PG. Visualized using a sequence logo created with WebLogo, similar to Figure 4 of Patent Literature 1.
Frequency of occurrence of amino acids in the E2 motif used with DYW:WW. Visualized using a sequence logo created with WebLogo, similar to Figure 4 of Patent Literature 1.
Analysis of the effect of changes in PPR domain length (10, 12, 14, 16 P motifs) on off-target RNA editing using the editing tools PG2 (A) and WW2 (B) (HEK293T cells). The results of three experiments are shown. The letters on the bars indicate significant differences based on PPR domain length (one-way ANOVA, Tukey's multiple comparison test, P<0.05). Multiple letters indicate no significant difference.
Time course analysis of each RNA editing technology. A, Cytotoxicity of cells transfected with vectors expressing each RNA editing technology was evaluated at 24, 48, and 72 hours using an LDH assay. Each data point shows the mean ± standard deviation of three biological replicates (n=3). B, Time course of editing efficiency at the CTNNB1-T41I site. Asterisks indicate significant differences between time points (*P<0.05, **P<0.01; one-way ANOVA, Tukey's test). C, Relative β-catenin protein accumulation level (corrected for total protein, then further corrected for the blank vector sample). The letters above the bars indicate significant differences between variants (one-way ANOVA, Tukey's test, P<0.05). Multiple letters indicate no significant difference. D, Relative CTNNB1 mRNA expression level (corrected for GAPDH gene expression level, then further corrected for the blank vector sample). Asterisks indicate significant differences compared to the blank vector control (*P<0.05; Welch's t-test). B-D, All data are shown as the mean ± standard deviation of three biological replicates (n=3). Each data point corresponds to a single biological replicate.
In vitro RNA binding activity of the editing tool RECODE (RNA Editor for C-to-U with an Optimized DYW Enzyme). RNA electrophoretic mobility shift assay to evaluate the RNA binding activity of editing tools targeting CTNNB1-T41I. Binding curves show the binding rate of the RNA probe at protein concentrations of 0, 0.5, 1, 2, 5, 10, 20, 50, 100, and 200 nM. Each data point represents the mean ± standard deviation of three biological replicates (n=3). The calculated equilibrium dissociation constant (Kd) and maximum binding amount (Bmax) are also shown.
RECODE edits a wide range of targets in human cells. A, The heatmap shows the RNA editing efficiency (mean of three biological replicates, n=3) at endogenous target mRNAs for each RECODE variant and RESCUE-S. The targets were selected to represent all four types of nucleotides from position -6 to +5 relative to the editing site (indicated by arrows). B, Off-target editing sites detected in the amplification products of RECODE-WW2 (Figure 9A) are aligned to the on-target editing sites (arrows) (upper panel). RECODE-WW2 was mutated to give RECODE-WW2-PPRm1 (K13S mutation in P motifs 7 and 13) and RECODE-WW2-PPRm2 (K13Q mutation in P motifs 9 and 13), and the RNA editing efficiencies at the on-target (on) and off-target (off) sites were evaluated. The editing efficiencies of these RECODE mutants were validated at off-target sites within the KRAS-Q25X and SMARCA4-P88L amplification products. All editing efficiencies are shown as the mean of three biological replicates (n=3), with each point representing a single replicate. Significant differences between wild-type and mutant proteins (Welch's t-test) are indicated by asterisks: *P<0.05 and ****P<0.0001.
[0012] I. Method for Modifying PPR Protein (Embodiment 1) This embodiment relates to a method for modifying a PPR protein. The PPR protein according to this embodiment is modified so as to reduce off-target effects (or actions; the same applies hereinafter), as described later. Off-target effects refer to any effects on targets other than the intended target, and include off-target binding and off-target editing.
[0013] PPR proteins contain a binding region consisting of an array of PPR motifs capable of binding to target RNA, and may have a C-terminal domain. The binding region may consist of a P array, i.e., simple repeats of a standard 35-amino acid PPR motif (P), or a PLS array, i.e., in addition to P, it may contain two similar motifs called L and S, and may be composed of repeating units of PLS, or more specifically, three repeating units of P1 (approximately 35 amino acids), L1 (approximately 35 amino acids), and S1 (approximately 31 amino acids). Unless otherwise specified, the C-terminal domain consists of a P2 motif, an L2 motif, an S2 motif, an E1 motif, an E2 motif, and a DYW domain. The portion consisting of the P array is sometimes called the P domain, and the portion consisting of the PLS array is sometimes called the PLS domain.
[0014] (PPR motif) A PPR motif refers to a polypeptide consisting of 30 to 38 amino acids whose amino acid sequence, when analyzed using a web-based protein domain search program, has an E value for PF01535 in Pfam and for PS51375 in Prosite that is below a predetermined value (preferably E-03). In this application, the position numbers of the amino acids constituting the PPR motif are almost synonymous with those of PF01535, but correspond to the number obtained by adding 1 to the amino acid position of PF01535 (e.g., position 5 in this invention → position 4 in PF01535). For Pfam, see http://pfam.sanger.ac.uk/, and for Prosite, see http://www.expasy.org/prosite/.
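The numbering offset described above can be written as a pair of small conversion helpers. This is only an illustrative sketch: the function names and the constant are hypothetical, and the only fact taken from the text is that a position in this application equals the PF01535 position plus 1.

```python
# Offset stated in the text: position in this application = PF01535 position + 1.
PFAM_OFFSET = 1  # hypothetical constant name


def to_pfam_position(position: int) -> int:
    """Convert a motif position used in this application to PF01535 numbering."""
    if position < 1:
        raise ValueError("motif positions in this application are numbered from 1")
    return position - PFAM_OFFSET


def from_pfam_position(pfam_position: int) -> int:
    """Convert a PF01535 position to the numbering used in this application."""
    return pfam_position + PFAM_OFFSET
```

For instance, `to_pfam_position(5)` gives the PF01535 position corresponding to A5, matching the example in the paragraph.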
[0015] In relation to the present invention, the position of an amino acid on the sequence of a PPR motif is represented by position x.
[0016] In relation to the present invention, when simply referring to a PPR motif, unless otherwise specified, it includes all subclasses of PPR motifs, specifically the P, P1, L1, S1, SS, P2, L2, S2, E1, and E2 motifs. The P2, L2, S2, E1, and E2 motifs may also be referred to as PPR-like motifs.
[0017] More specifically, the PPR motif consists of a polypeptide with a total length of 31 to 36 amino acids, represented by the following formula 1, where each amino acid is numbered sequentially as A1, A2, A3, A4, and so on.
[0018] (Helix A)-X1-(Helix B)-X2 (Formula 1)
[0019] (In Formula 1: Helix A is a portion capable of forming an α-helix structure, consisting of 13 or 14 amino acids in length; X1 consists of 1 to 9 amino acids in length, preferably 1 to 3 amino acids; Helix B is a portion capable of forming an α-helix structure, consisting of 10 to 14 amino acids in length; X2 consists of 1 to 9 amino acids in length, preferably 4 to 9 amino acids, where the C-terminal amino acid in X2 is represented by L.) The amino acid combination of A5 and L functions for selective binding to RNA bases.
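The length constraints stated for Formula 1 can be checked mechanically. The sketch below is illustrative only: the class and function names are hypothetical, and the segment lengths used in any call are assumptions chosen by the caller, since the text gives ranges rather than the segmentation of a particular motif.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotifSegments:
    """Lengths (in amino acids) of the four segments named in Formula 1."""
    helix_a: int
    x1: int
    helix_b: int
    x2: int

    @property
    def total(self) -> int:
        return self.helix_a + self.x1 + self.helix_b + self.x2


def satisfies_formula_1(seg: MotifSegments) -> bool:
    """Return True if every segment and the total fall in the stated ranges."""
    return (
        seg.helix_a in (13, 14)        # Helix A: 13 or 14 amino acids
        and 1 <= seg.x1 <= 9           # X1: 1 to 9 amino acids
        and 10 <= seg.helix_b <= 14    # Helix B: 10 to 14 amino acids
        and 1 <= seg.x2 <= 9           # X2: 1 to 9 amino acids, ending in L
        and 31 <= seg.total <= 36      # total motif length: 31 to 36
    )
```

Note that the total-length condition is not redundant: segment lengths that are each within range can still add up to more than 36 amino acids.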
[0020] In one embodiment, the Helix A, X1, Helix B, X2, and total length of the PPR motif in each subclass are as follows:
[0021]
[0022] The PPR motif relies on the combination of amino acids A5 and L for selective binding to RNA bases. The relationship between the two amino acid combinations A5 and L and the bindable base (referred to as Base in the table below) is known as the PPR code. The table below shows the bindable bases and the two amino acid combinations A5 and L that constitute the PPR motif.
[0023]
[0024] In one embodiment, the PPR motif may have the following combinations of amino acids: A2, which is involved in the stability of binding to the RNA molecule, and A5 and L, which are involved in the specific recognition of RNA bases.
(3-1) In a PPR motif that selectively binds to U, the combination of the three amino acids A2, A5, and L is, in order, valine, asparagine, and aspartic acid;
(3-2) In a PPR motif that selectively binds to A, the combination of the three amino acids A2, A5, and L is, in order, valine, threonine, and asparagine;
(3-3) In a PPR motif that selectively binds to C, the combination of the three amino acids A2, A5, and L is, in order, valine, asparagine, and asparagine;
(3-4) In a PPR motif that selectively binds to G, the combination of the three amino acids A2, A5, and L is, in order, glutamic acid, glycine, and aspartic acid;
(3-5) In a PPR motif that selectively binds to C or U, the combination of the three amino acids A2, A5, and L is, in order, isoleucine, asparagine, and asparagine;
(3-6) In a PPR motif that selectively binds to G, the combination of the three amino acids A2, A5, and L is, in order, valine, threonine, and aspartic acid;
(3-7) In a PPR motif that selectively binds to G, the combination of the three amino acids A2, A5, and L is, in order, lysine, threonine, and aspartic acid;
(3-8) In a PPR motif that selectively binds to A, the combination of the three amino acids A2, A5, and L is, in order, phenylalanine, serine, and asparagine;
(3-9) In a PPR motif that selectively binds to C, the combination of the three amino acids A2, A5, and L is, in order, valine, asparagine, and serine;
(3-10) In a PPR motif that selectively binds to A, the combination of the three amino acids A2, A5, and L is, in order, phenylalanine, threonine, and asparagine;
(3-11) In a PPR motif that selectively binds to U or A, the combination of the three amino acids A2, A5, and L is, in order, isoleucine, asparagine, and aspartic acid;
(3-12) In a PPR motif that selectively binds to A, the combination of the three amino acids A2, A5, and L is, in order, threonine, threonine, and asparagine;
(3-13) In a PPR motif that selectively binds to U or C, the combination of the three amino acids A2, A5, and L is, in order, isoleucine, methionine, and aspartic acid;
(3-14) In a PPR motif that selectively binds to U, the combination of the three amino acids A2, A5, and L is, in order, phenylalanine, proline, and aspartic acid;
(3-15) In a PPR motif that selectively binds to U, the combination of the three amino acids A2, A5, and L is, in order, tyrosine, proline, and aspartic acid;
(3-16) In a PPR motif that selectively binds to G, the combination of the three amino acids A2, A5, and L is, in order, leucine, threonine, and aspartic acid.
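Combinations (3-1) to (3-16) amount to a lookup from the residues at A2, A5, and L to the bound base. A minimal sketch of that lookup follows, using one-letter amino acid codes. The names `PPR_CODE_3`, `motif_key`, and `bindable_bases` are hypothetical, and the table covers only the sixteen combinations listed in this paragraph, not the full PPR code.

```python
from typing import Optional, Tuple

# (A2, A5, L) -> base(s), transcribed from combinations (3-1) to (3-16).
PPR_CODE_3 = {
    ("V", "N", "D"): "U",    # (3-1)
    ("V", "T", "N"): "A",    # (3-2)
    ("V", "N", "N"): "C",    # (3-3)
    ("E", "G", "D"): "G",    # (3-4)
    ("I", "N", "N"): "C/U",  # (3-5)
    ("V", "T", "D"): "G",    # (3-6)
    ("K", "T", "D"): "G",    # (3-7)
    ("F", "S", "N"): "A",    # (3-8)
    ("V", "N", "S"): "C",    # (3-9)
    ("F", "T", "N"): "A",    # (3-10)
    ("I", "N", "D"): "U/A",  # (3-11)
    ("T", "T", "N"): "A",    # (3-12)
    ("I", "M", "D"): "U/C",  # (3-13)
    ("F", "P", "D"): "U",    # (3-14)
    ("Y", "P", "D"): "U",    # (3-15)
    ("L", "T", "D"): "G",    # (3-16)
}


def motif_key(motif: str) -> Tuple[str, str, str]:
    """Extract (A2, A5, L) from a motif; L is the C-terminal residue."""
    residues = motif.replace(" ", "")
    return residues[1], residues[4], residues[-1]


def bindable_bases(motif: str) -> Optional[str]:
    """Return the base(s) for a listed combination, or None if not listed."""
    return PPR_CODE_3.get(motif_key(motif))
```

Applied to the U-binding example motif of this document (SEQ ID NO:4, `VVTYNTLIDG LCKAGRLDEA EELLEEMEEK GIKPD`), the key is valine, asparagine, aspartic acid, which is combination (3-1).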
[0025] (Modification of PPR motif) In this embodiment, at least one of the amino acids at positions 2 and 13 is substituted in the amino acid sequence of the PPR motif described above. The PPR motif specifically recognizes one RNA base according to a nucleic acid recognition code determined by the combination of amino acids at two locations (A5 and L) within the motif (Non-Patent Literature 2, 3). Although the amino acids A2 and A13 are also involved in RNA binding, their recognition of nucleic acids is nonspecific (Non-Patent Literature 3). The amino acids at position 2 of two adjacent motifs sandwich one nucleic acid base via van der Waals forces (see Supplementary Fig. 7b in Non-Patent Literature 3). The amino acid at position 13 forms a salt bridge with the phosphate group of the single-stranded RNA (ssRNA) backbone (see Supplementary Fig. 8b in Non-Patent Literature 3).
[0026] In one embodiment, the amino acid substitution in the PPR motif is at least one substitution selected from V2 and K13. The substituted amino acid is not particularly limited. With respect to V2, the substituted amino acid may be A (alanine), L (leucine), R (arginine), K (lysine), N (asparagine), M (methionine), D (aspartic acid), F (phenylalanine), C (cysteine), P (proline), Q (glutamine), S (serine), E (glutamic acid), T (threonine), G (glycine), W (tryptophan), H (histidine), Y (tyrosine), or I (isoleucine). With respect to K13, the substituted amino acid may be A, L, R, N, M, D, F, C, P, Q, S, E, T, G, W, H, Y, I, or V (valine).
[0027] In one embodiment, the amino acid substitution in the PPR motif is at least one substitution selected from V2F, V2S, V2Y, K13Q, and K13S.
[0028] In one embodiment, the amino acid substitution in the PPR motif is at least one substitution selected from V2F, V2S, and V2Y. In another embodiment, the amino acid substitution in the PPR motif is at least one substitution selected from K13Q and K13S. In yet another embodiment, the amino acid substitution in the PPR motif is a combination of at least one substitution selected from V2F, V2S, and V2Y with at least one substitution selected from K13Q and K13S.
[0029] In one embodiment, the above substitution may be performed on the following PPR motif sequences:
VVTYTTLIDG LCKAGKVDEA LELFKEMRSK GVKPN (SEQ ID NO:1) (Example of A-binding motif)
VVTYNTLIDG LCKSGKIEEA LKLFKEMEEK GITPS (SEQ ID NO:2) (Example of C-binding motif)
VVTYTTLIDG LCKAGKVDEA LELFDEMKER GIKPD (SEQ ID NO:3) (Example of G-binding motif)
VVTYNTLIDG LCKAGRLDEA EELLEEMEEK GIKPD (SEQ ID NO:4) (Example of U-binding motif)
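As an illustration of how a substitution code such as V2F or K13Q maps onto these sequences, the sketch below applies a single substitution to a motif. The helper name `apply_substitution` is hypothetical. Positions are 1-based, and the function refuses to act when the residue at that position is not the one the code expects.

```python
import re

# Unmodified motif examples from SEQ ID NOs: 1-4 (spaces removed).
MOTIFS = {
    "A": "VVTYTTLIDGLCKAGKVDEALELFKEMRSKGVKPN",
    "C": "VVTYNTLIDGLCKSGKIEEALKLFKEMEEKGITPS",
    "G": "VVTYTTLIDGLCKAGKVDEALELFDEMKERGIKPD",
    "U": "VVTYNTLIDGLCKAGRLDEAEELLEEMEEKGIKPD",
}


def apply_substitution(motif: str, code: str) -> str:
    """Apply a substitution written as <old residue><1-based position><new residue>."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"unrecognized substitution code: {code}")
    old, pos, new = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(motif):
        raise ValueError(f"position {pos} is outside the motif")
    if motif[pos - 1] != old:
        raise ValueError(f"expected {old} at position {pos}, found {motif[pos - 1]}")
    return motif[: pos - 1] + new + motif[pos:]


v2f_a = apply_substitution(MOTIFS["A"], "V2F")
k13q_u = apply_substitution(MOTIFS["U"], "K13Q")
```

With these inputs, `v2f_a` reproduces the sequence given as SEQ ID NO: 5 and `k13q_u` reproduces SEQ ID NO: 16.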
[0030] (Number and position of modified PPR motifs) This embodiment relates to a PPR protein containing multiple PPR motifs. In the PPR protein of this embodiment, one or more PPR motifs are modified from the viewpoint of reducing off-target effects.
[0031] The number of PPR motifs contained in the PPR protein of this embodiment is not particularly limited, but is preferably 10 or more, regardless of the type of modification and the number and position of the modified motifs (that is, which motifs in the array are modified), and may be 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, etc. From the viewpoint of editing efficiency, 14 or more is more preferable.
[0032] In PPR proteins containing multiple PPR motifs, the number of modified motifs may be one or multiple, regardless of the type and location of the modification. From the perspective of maintaining on-target editing activity, only some, not all, PPR motifs are modified. When one motif is modified, it may be any motif, but for example, in a PPR protein containing 16 or more PPR motifs, the modified motif may be the sixth or later motif.
[0033] The position of the modified motif can be any motif, regardless of the type of modification or the number of motifs modified. The modified motifs may be consecutive or not. For example, in a PPR protein containing 16 or more PPR motifs, if two motifs are modified, they could be the 3rd and 13th, the 7th and 13th, the 7th and 15th, the 11th and 13th, or the 11th and 15th.
[0034] Regardless of the type of modification, the N-terminal portion of the PPR protein is crucial for initial RNA recognition, and weakening this portion can reduce on-target activity. From this perspective, it is preferable to leave the N-terminal region (for example, the first to third motifs) untouched and to modify motifs located after it.
[0035] In one embodiment, for a PPR editor protein containing 16 PPR motifs, the following may be effective: Substitution of K13Q or K13S in the 3rd, 7th, 11th, and 15th PPR motifs; Substitution of V2S or V2Y in the 6th or 8th PPR motif, and substitution of K13Q or K13S in the 15th motif; Substitution of K13Q or K13S in the 7th or 11th motif, and substitution of V2S or V2Y in the 14th motif; Substitution of V2S or V2Y in the 6th and 14th motifs or the 8th and 14th motifs, and substitution of K13Q or K13S in the 7th and 15th motifs or the 11th and 15th motifs.
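One way to picture these combinations is as a plan that maps motif numbers to substitution codes. The sketch below applies the first combination (K13Q in the 3rd, 7th, 11th, and 15th motifs) to a 16-motif array. The array is a hypothetical stand-in built by repeating the A-binding example motif of SEQ ID NO: 1; a real design would choose each motif according to the target RNA. The name `apply_plan` is also an assumption.

```python
from typing import Dict, List

A_MOTIF = "VVTYTTLIDGLCKAGKVDEALELFKEMRSKGVKPN"  # SEQ ID NO: 1


def apply_plan(motifs: List[str], plan: Dict[int, str]) -> List[str]:
    """Apply {motif number (1-based): substitution code} to a copy of the array."""
    if len(plan) >= len(motifs):
        # The description modifies only some, not all, of the motifs.
        raise ValueError("the plan must leave at least one motif unmodified")
    result = list(motifs)
    for number, code in plan.items():
        old, pos, new = code[0], int(code[1:-1]), code[-1]
        motif = result[number - 1]
        if motif[pos - 1] != old:
            raise ValueError(f"motif {number}: expected {old} at position {pos}")
        result[number - 1] = motif[: pos - 1] + new + motif[pos:]
    return result


# Hypothetical 16-motif array (placeholder, not a designed protein).
array = [A_MOTIF] * 16
plan = {3: "K13Q", 7: "K13Q", 11: "K13Q", 15: "K13Q"}
modified = apply_plan(array, plan)
changed = [i + 1 for i, (a, b) in enumerate(zip(array, modified)) if a != b]
```

The other combinations can be written the same way, for example `{6: "V2S", 15: "K13S"}`. Because the plan maps each motif number to one code, this sketch does not cover two substitutions in the same motif.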
[0036] In one embodiment, for a PPR editor protein containing 14 PPR motifs, the following may be effective: substitution of K13Q or K13S in the 11th or 13th motif; substitution of K13Q or K13S in the 7th and 13th motifs; substitution of K13Q or K13S in the 9th and 13th motifs.
[0037] II. Modified PPR Motif, Modified PPR Protein (Embodiment 2) (Modified PPR Motif) This embodiment relates to a PPR motif in which at least one amino acid at positions 2 and 13 is substituted in the amino acid sequence of the PPR motif. In one embodiment, the amino acid substitution is at least one substitution selected from V2 and K13. The substituted amino acid is not particularly limited. In one embodiment, the amino acid substitution is at least one substitution selected from V2F, V2S, V2Y, K13Q, and K13S.
[0038] In one embodiment, the modified PPR motif is any of the following: (1-1) a polypeptide consisting of any one sequence selected from SEQ ID NOs: 5, 9, 13, and 17; (1-2) a polypeptide consisting of a sequence with high sequence identity to any one sequence selected from SEQ ID NOs: 5, 9, 13, and 17, specifically a sequence with 88% or more, preferably 90% or more, more preferably 94% or more, and even more preferably 97% or more, provided that if the selected sequence is 5 or 9, the amino acids corresponding to positions 2, 5, and 35 are identical to those of the selected sequence, and if the selected sequence is 13 or 17, the amino acids corresponding to positions 5, 13, and 35 are identical to those of the selected sequence, and which is capable of functioning as an A-binding PPR motif; (1-3) a polypeptide comprising a sequence in which 1 to 4 amino acids, preferably 1 to 3 amino acids, more preferably 1 to 2 amino acids, and even more preferably 1 amino acid are substituted, deleted, or added in any one sequence selected from SEQ ID NOs: 5, 9, 13, and 17, provided that if the selected sequence is 5 or 9, the amino acids corresponding to positions 2, 5, and 35 are the same as those in the selected sequence, and if the selected sequence is 13 or 17, the amino acids corresponding to positions 5, 13, and 35 are the same as those in the selected sequence, and which is capable of functioning as an A-binding PPR motif;
[0039] (2-1) A polypeptide comprising any one sequence selected from SEQ ID NOs: 6, 10, 14, and 18; (2-2) A polypeptide comprising a sequence having high sequence identity with any one sequence selected from SEQ ID NOs: 6, 10, 14, and 18, specifically, 88% or more, preferably 90% or more, more preferably 94% or more, and even more preferably 97% or more, provided that if the selected sequence is 6 or 10, the amino acids corresponding to positions 2, 5, and 35 are identical to those of the selected sequence, and if the selected sequence is 14 or 18, the amino acids corresponding to positions 5, 13, and 35 are identical to those of the selected sequence, and which is capable of functioning as a C-binding PPR motif; (2-3) A polypeptide comprising a sequence in which 1 to 4 amino acids, preferably 1 to 3 amino acids, more preferably 1 to 2 amino acids, and even more preferably 1 amino acid are substituted, deleted, or added in any one sequence selected from SEQ ID NOs: 6, 10, 14, and 18, provided that if the selected sequence is 6 or 10, the amino acids corresponding to positions 2, 5, and 35 are the same as those in the selected sequence, and if the selected sequence is 14 or 18, the amino acids corresponding to positions 5, 13, and 35 are the same as those in the selected sequence, and which is capable of functioning as a C-binding PPR motif;
[0040] (3-1) A polypeptide consisting of any one sequence selected from SEQ ID NOs: 7, 11, 15, and 19; (3-2) A polypeptide consisting of a sequence with high sequence identity to any one sequence selected from SEQ ID NOs: 7, 11, 15, and 19, specifically a sequence with 88% or more, preferably 90% or more, more preferably 94% or more, and even more preferably 97% or more, provided that if the selected sequence is 7 or 11, the amino acids corresponding to positions 2, 5, and 35 are the same as those in the selected sequence, and if the selected sequence is 15 or 19, the amino acids corresponding to positions 5, 13, and 35 are the same as those in the selected sequence, and which is capable of functioning as a G-binding PPR motif; (3-3) A polypeptide consisting of a sequence in which 1 to 4 amino acids, preferably 1 to 3 amino acids, more preferably 1 to 2 amino acids, and even more preferably 1 amino acid are substituted, deleted, or added in any one sequence selected from SEQ ID NOs: 7, 11, 15, and 19, provided that if the selected sequence is 7 or 11, the amino acids corresponding to positions 2, 5, and 35 are identical to those in the selected sequence, and if the selected sequence is 15 or 19, the amino acids corresponding to positions 5, 13, and 35 are identical to those in the selected sequence, and which is capable of functioning as a G-binding PPR motif;
[0041] (4-1) A polypeptide comprising any one sequence selected from sequence numbers 8, 12, 16, and 20; (4-2) A polypeptide comprising a sequence having high sequence identity with any one sequence selected from sequence numbers 8, 12, 16, and 20, specifically, 88% or more, preferably 90% or more, more preferably 94% or more, and even more preferably 97% or more, provided that if the selected sequence is 8 or 12, the amino acids corresponding to positions 2, 5, and 35 are identical to those of the selected sequence; if the selected sequence is 16 or 20, the amino acids corresponding to positions 5, 13, and 35 are identical to those of the selected sequence; and a polypeptide capable of functioning as a U-binding PPR motif; (4-3) A polypeptide comprising a sequence in which one of the sequences selected from Sequence IDs 8, 12, 16, and 20 has 1 to 4 amino acids, preferably 1 to 3 amino acids, more preferably 1 to 2 amino acids, and even more preferably 1 amino acid substituted, deleted, or added, provided that if the selected sequence is 8 or 12, the amino acids corresponding to positions 2, 5, and 35 are the same as those in the selected sequence, and if the selected sequence is 16 or 20, the amino acids corresponding to positions 5, 13, and 35 are the same as those in the selected sequence, and which is functional as a U-binding PPR motif;
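The identity conditions in items (1-2) to (4-2) pair a percentage threshold with a set of positions that must not change. The sketch below checks both. It is a simplification under stated assumptions: identity is computed as an ungapped, position-by-position comparison of equal-length sequences (the definition of sequence identity used in this document is given elsewhere and may differ), and it says nothing about whether the candidate actually functions as a binding motif. The function names and the two candidate variants are hypothetical.

```python
from typing import Iterable


def identity(a: str, b: str) -> float:
    """Fraction of identical positions; equal-length, ungapped comparison only."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares sequences of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def meets_identity_condition(candidate: str, reference: str,
                             fixed_positions: Iterable[int],
                             threshold: float = 0.88) -> bool:
    """True if the fixed positions (1-based) are unchanged and identity >= threshold."""
    if len(candidate) != len(reference):
        return False
    if any(candidate[p - 1] != reference[p - 1] for p in fixed_positions):
        return False
    return identity(candidate, reference) >= threshold


SEQ5 = "VFTYTTLIDGLCKAGKVDEALELFKEMRSKGVKPN"  # SEQ ID NO: 5 (V2F, A-binding)
one_change = SEQ5[:19] + "S" + SEQ5[20:]       # hypothetical: position 20 A -> S
reverted = "VV" + SEQ5[2:]                      # hypothetical: position 2 back to V
```

For a 35-residue motif compared this way, the thresholds of 88%, 90%, 94%, and 97% correspond to at most 4, 3, 2, and 1 differing positions, which lines up with the "1 to 4 amino acids" wording of items (1-3) to (4-3).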
[0042] In a preferred embodiment, the modified PPR motif is a polypeptide consisting of one of the following amino acid sequences:
VFTYTTLIDG LCKAGKVDEA LELFKEMRSK GVKPN (SEQ ID NO:5) (Example of A-binding motif)
VFTYNTLIDG LCKSGKIEEA LKLFKEMEEK GITPS (SEQ ID NO:6) (Example of C-binding motif)
VFTYTTLIDG LCKAGKVDEA LELFDEMKER GIKPD (SEQ ID NO:7) (Example of G-binding motif)
VFTYNTLIDG LCKAGRLDEA EELLEEMEEK GIKPD (SEQ ID NO:8) (Example of U-binding motif)
VSTYTTLIDG LCKAGKVDEA LELFKEMRSK GVKPN (SEQ ID NO:9) (Example of A-binding motif)
VSTYNTLIDG LCKSGKIEEA LKLFKEMEEK GITPS (SEQ ID NO:10) (Example of C-binding motif)
VSTYTTLIDG LCKAGKVDEA LELFDEMKER GIKPD (SEQ ID NO:11) (Example of G-binding motif)
VSTYNTLIDG LCKAGRLDEA EELLEEMEEK GIKPD (SEQ ID NO:12) (Example of U-binding motif)
VVTYTTLIDG LCQAGKVDEA LELFKEMRSK GVKPN (SEQ ID NO:13) (Example of A-binding motif)
VVTYNTLIDG LCQSGKIEEA LKLFKEMEEK GITPS (SEQ ID NO:14) (Example of C-binding motif)
VVTYTTLIDG LCQAGKVDEA LELFDEMKER GIKPD (SEQ ID NO:15) (Example of G-binding motif)
VVTYNTLIDG LCQAGRLDEA EELLEEMEEK GIKPD (SEQ ID NO:16) (Example of U-binding motif)
VVTYTTLIDG LCSAGKVDEA LELFKEMRSK GVKPN (SEQ ID NO:17) (Example of A-binding motif)
VVTYNTLIDG LCSSGKIEEA LKLFKEMEEK GITPS (SEQ ID NO:18) (Example of C-binding motif)
VVTYTTLIDG LCSAGKVDEA LELFDEMKER GIKPD (SEQ ID NO:19) (Example of G-binding motif)
VVTYNTLIDG LCSAGRLDEA EELLEEMEEK GIKPD (SEQ ID NO:20) (Example of U-binding motif)
[0043] (Modified PPR Protein) This embodiment relates to a PPR protein consisting of 2 to 20 PPR motifs. The portion consisting of 2 to 20 PPR motifs is a binding region consisting of an array of PPR motifs (P array or PLS array) that can bind to a target RNA. The PPR protein may have a C-terminal domain.
[0044] (P Array) P-type PPR proteins consist of a simple repeat (P array) of a standard 35-amino acid PPR motif (P). The DYW domain of this embodiment can be used by ligating it to a P-type PPR protein.
[0045] (PLS array) PLS-type PPR proteins are composed of repeating PPR motifs (PLS array) consisting of P1, L1, and S1.
[0046] The total length of P1 is not particularly limited as long as it can bind to the target base, but is for example 33 to 37 amino acids long, preferably 34 to 36 amino acids long, and more preferably 35 amino acids long. The total length of L1 is not particularly limited as long as it can bind to the target base, but is for example 33 to 37 amino acids long, preferably 34 to 36 amino acids long, and more preferably 35 amino acids long. The total length of S1 is not particularly limited as long as it can bind to the target base, but is for example 30 to 33 amino acids long, preferably 30 to 32 amino acids long, and more preferably 31 amino acids long.
[0047] In PLS-type PPR proteins, the P1L1S1 repeat portion and the portion up to P2 can be designed according to the PPR-code rules described above, depending on the sequence of the target RNA.
[0048] The number of P1L1S1 repeats is not particularly limited as long as it can bind to the target base sequence, but is for example 1 to 5, preferably 2 to 4, and more preferably 3. In principle, even one unit (3 repeats) can be used. MEF8 (L1-S1-P2-L2-S2-E-DYW), which consists of 5 PPR motifs, is known to be involved in approximately 60 editing sites.
[0049] In natural PPR proteins, the first and last P1L1S1 units show clear differences in the amino acid residues at specific positions, distinguishing them from the internal P1L1S1 units. From the perspective of designing an artificial PLS array that is as close as possible to naturally occurring ones, it is advisable to design three types of P1L1S1 units corresponding to the positions of the PPR motif: the first (N-terminal) P1L1S1 unit, the internal P1L1S1 unit, and the last (C-terminal) P1L1S1 unit located immediately before P2L2S2. In addition to those composed of repeating PLS units, natural PPR proteins also sometimes contain repeating SS units (31 amino acids), and these can also be used in this embodiment.
[0050] (C-terminal domain) The C-terminal domain consists of the P2 motif (P2), L2 motif (L2), S2 motif (S2), E1 motif (E1), E2 motif (E2), and the DYW domain.
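The overall layout of a PLS-type editor described above (P1L1S1 repeats followed by the C-terminal domain) can be written out as an ordered list of elements. The sketch below does only that. The function name and the "first", "internal", and "last" labels are illustrative assumptions that mirror the three unit types discussed above, and no sequences are generated.

```python
from typing import List


def pls_architecture(n_repeats: int = 3) -> List[str]:
    """List the elements of a PLS-type editor from N-terminus to C-terminus."""
    if n_repeats < 1:
        raise ValueError("at least one P1L1S1 unit is required")
    elements: List[str] = []
    for i in range(n_repeats):
        if i == 0:
            kind = "first"      # N-terminal unit (also used when there is only one unit)
        elif i == n_repeats - 1:
            kind = "last"       # unit immediately before P2L2S2
        else:
            kind = "internal"
        elements += [f"P1({kind})", f"L1({kind})", f"S1({kind})"]
    # C-terminal domain: PPR-like motifs followed by the DYW domain.
    elements += ["P2", "L2", "S2", "E1", "E2", "DYW"]
    return elements


arch = pls_architecture(3)
```

With the more preferred value of three repeats, the list holds nine P1/L1/S1 motifs, the five PPR-like motifs, and the DYW domain.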
[0051] (P2, L2, S2, E1, E2) The total length of P2 is not particularly limited as long as it can bind to the target base, but is for example 33 to 37 amino acids long, preferably 34 to 36 amino acids long, and more preferably 35 amino acids long.
[0052] The total length of L2 is not particularly limited as long as it can bind to the target base, but is, for example, 34 to 38 amino acids long, preferably 35 to 37 amino acids long, and more preferably 36 amino acids long.
[0053] The total length of S2 is not particularly limited as long as it can bind to the target base, but is, for example, 30 to 34 amino acids long, preferably 31 to 33 amino acids long, and more preferably 32 amino acids long.
[0054] In the S2 motif, correlation with nucleotides is observed at the L amino acid (Takenaka, M. et al. (2013). PLoS One 8:e65343.). Furthermore, the S2 motif can be incorporated into the design of a PLS-type PPR protein by noting that, in the target sequence of the PPR editor protein, the C located four positions downstream of the base targeted by the S2 motif is the base edited by the DYW domain.
[0055] The total length of E1 is not particularly limited as long as it can bind to the target base, but is, for example, 32 to 36 amino acids long, preferably 33 to 35 amino acids long, and more preferably 34 amino acids long.
[0056] In the E1 motif, correlation with nucleotides is observed only with the A5 amino acid (Ruwe et al. (2019) New Phytol. 222 218-229).
[0057] The total length of E2 is not particularly limited as long as it can bind to the target base, but is, for example, 32 to 36 amino acids long, preferably 33 to 35 amino acids long, and more preferably 34 amino acids long.
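The length ranges given for each motif type follow the same three-tier pattern (an example range, a preferred range, and a more preferred single value). The sketch below collects them in one table for quick checks. The dictionary and function names are hypothetical, and because the text says the lengths are not particularly limited, a length outside the example range is reported rather than rejected.

```python
# motif type -> (example range, preferred range, more preferred length),
# transcribed from the length statements for P1, L1, S1, P2, L2, S2, E1, and E2.
LENGTH_TIERS = {
    "P1": ((33, 37), (34, 36), 35),
    "L1": ((33, 37), (34, 36), 35),
    "S1": ((30, 33), (30, 32), 31),
    "P2": ((33, 37), (34, 36), 35),
    "L2": ((34, 38), (35, 37), 36),
    "S2": ((30, 34), (31, 33), 32),
    "E1": ((32, 36), (33, 35), 34),
    "E2": ((32, 36), (33, 35), 34),
}


def classify_length(kind: str, length: int) -> str:
    """Report which tier a motif length falls into for the given motif type."""
    example, preferred, best = LENGTH_TIERS[kind]
    if length == best:
        return "more preferred"
    if preferred[0] <= length <= preferred[1]:
        return "preferred"
    if example[0] <= length <= example[1]:
        return "example range"
    return "outside example range"
```

For instance, `classify_length("S2", 33)` returns "preferred", while `classify_length("L2", 33)` returns "outside example range".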
[0058] The A5 and last (L) amino acids in the E2 motif are highly conserved and are not involved in specific PPR-RNA recognition (Non-Patent Literature 2).
[0059] The frequency of amino acids in the E2 motif selected from those used with DYW:PG, and the frequency of amino acids in the E2 motif selected from those used with DYW:WW, are visualized using sequence logos created with WebLogo and are shown in Figures 14-1 and 14-2, respectively. From Figures 14-1 and 14-2, the conserved and non-conserved positions in the E2 sequence can be understood.
[0060] Generally, the amino acid sequence of the E2 motif used with DYW:PG is as follows:
[0061] AAxYVLLSNIYAAAGRWDExAKVRKLMKERGVKK (SEQ ID NO:31)
[0062] Generally, the amino acid sequence of the E2 motif used together with DYW:WW is as follows.
[0063] AAAYVLMSNIYADAHMWEERDKIQAMRKNARAWK (SEQ ID NO:32)
[0064] In the above sequence, x independently represents any amino acid. Specifically, x can be any one selected from alanine, valine, glycine, isoleucine, leucine, phenylalanine, proline, tryptophan, tyrosine, arginine, asparagine, aspartic acid, glutamic acid, glutamine, lysine, serine, threonine, cysteine, histidine, and methionine.
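To make the role of x concrete, the sketch below turns an E2 sequence containing x into a regular expression in which each x accepts any of the 20 amino acids listed above. The helper names are hypothetical. Exact matching is stricter than the word "generally" in the text implies, so this is an illustration of the wildcard positions rather than a test for a valid E2 motif.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 amino acids listed above

E2_PG = "AAxYVLLSNIYAAAGRWDExAKVRKLMKERGVKK"  # SEQ ID NO: 31 (used with DYW:PG)
E2_WW = "AAAYVLMSNIYADAHMWEERDKIQAMRKNARAWK"  # SEQ ID NO: 32 (used with DYW:WW)


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Replace each x with a character class that accepts any amino acid."""
    parts = [f"[{AMINO_ACIDS}]" if ch == "x" else re.escape(ch) for ch in consensus]
    return re.compile("".join(parts))


def matches(consensus: str, sequence: str) -> bool:
    """True if the whole sequence fits the consensus, with x as a wildcard."""
    return consensus_to_regex(consensus).fullmatch(sequence) is not None


# Hypothetical instance: both x positions filled with alanine.
example = E2_PG.replace("x", "A")
```

Both sequences are 34 residues long, which agrees with the more preferred E2 length stated above.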
[0065] (DYW domain) The DYW domain used in this embodiment has a structure that was specified in more detail and provided by the method described in Patent Literature 1, and consists of either of the following amino acid sequences.
x_{a1}PGx_{a2}SWIEx_{a3}-x_{a16}HP - first linker - Hx_{aa}E - second linker - Cx_{a17}x_{a18}CH - third linker - DYW
x_{b1}PGx_{b2}SWWTDx_{b3}-x_{b16}HP - first linker - Hx_{bb}E - second linker - Cx_{b17}x_{b18}CH - third linker - DYW
[0066] In the sequences, x independently represents any amino acid, and the first linker, the second linker, and the third linker each independently represent a polypeptide fragment consisting of an amino acid sequence of any length. The DYW domain consisting of x_{a1}PGx_{a2}SWIEx_{a3}-x_{a16}HP - first linker - Hx_{aa}E - second linker - Cx_{a17}x_{a18}CH - third linker - DYW is called DYW:PG (sometimes simply PG, or the PG domain), and the DYW domain consisting of x_{b1}PGx_{b2}SWWTDx_{b3}-x_{b16}HP - first linker - Hx_{bb}E - second linker - Cx_{b17}x_{b18}CH - third linker - DYW is called DYW:WW (sometimes simply WW, or the WW domain).
[0067] The DYW domain has three regions: a region containing a PG box of approximately 15 amino acids at the N-terminus, a central zinc-binding domain (HxEx_{n}CxxCH, where x_{n} is a sequence of any number n of amino acids), and DYW at the C-terminus. The zinc-binding domain can be further divided into the HxE region and the CxxCH region. These regions of each DYW domain can be represented as shown in the table below. Since x is any amino acid, it can be selected from alanine, valine, glycine, isoleucine, leucine, phenylalanine, proline, tryptophan, tyrosine, arginine, asparagine, aspartic acid, glutamic acid, glutamine, lysine, serine, threonine, cysteine, histidine, and methionine.
[0068]
[0069] (DYW:PG) DYW:PG is a polypeptide consisting of x_{a1}PGx_{a2}SWIEx_{a3}-x_{a16}HP - first linker - Hx_{aa}E - second linker - Cx_{a17}x_{a18}CH - third linker - DYW. Preferably, it has x_{a1}PGx_{a2}SWIEx_{a3}-x_{a16}HP - first linker - Hx_{aa}E - second linker - Cx_{a17}x_{a18}CH - third linker - DYW, has sequence identity with the sequence at positions 172-307 of SEQ ID NO: 49 (sequence identity is detailed in the [Terminology] section), and exhibits C-to-U editing activity when used in the C-terminal domain.
[0070] The total length of DYW:PG is not particularly limited as long as it exhibits C-to-U editing activity, but is for example 110 to 160 amino acids long, preferably 124 to 148 amino acids long, more preferably 128 to 144 amino acids long, and even more preferably 132 to 140 amino acids long, for example 136 amino acids long.
[0071] In the region containing the PG box of DYW:PG (x_{a1}PGx_{a2}SWIEx_{a3}-x_{a16}HP): x_{a1} is not particularly limited as long as the domain exhibits C-to-U editing activity as DYW:PG, but is preferably E (glutamic acid) or an amino acid with similar properties, and more preferably E. x_{a2} is not particularly limited as long as the domain exhibits C-to-U editing activity as DYW:PG, but is preferably C (cysteine) or an amino acid with similar properties, and more preferably C. Each of x_{a3} to x_{a16} is not particularly limited as long as the domain exhibits C-to-U editing activity as DYW:PG, but is preferably the same as or similar in properties to the corresponding amino acid at positions 180-193 in the sequence of SEQ ID NO: 49, and more preferably is the same as the corresponding amino acid at positions 180-193 in the sequence of SEQ ID NO: 49.
[0072] In one preferred embodiment, the HxE region of DYW:PG is HSE regardless of the other regions.
[0073] In the CxxCH region of DYW:PG, i.e., Cx_{a17}x_{a18}CH: x_{a17} is not particularly limited as long as the domain exhibits C-to-U editing activity as DYW:PG, but is preferably G (glycine) or an amino acid with similar properties, and more preferably G. x_{a18} is not particularly limited as long as the domain exhibits C-to-U editing activity as DYW:PG, but is preferably D (aspartic acid) or an amino acid with similar properties, and more preferably D.
[0074] In DYW:PG, the portion connecting the region containing the PG box and the Hx_{aa}E region, the portion connecting the Hx_{aa}E region and the CxxCH region, and the portion connecting the CxxCH region and DYW are referred to as the first linker (linkage), the second linker (linkage), and the third linker (linkage), respectively (the same applies to other DYW domains).
[0075] The total length of the first linkage portion of DYW:PG is not particularly limited as long as it exhibits C-to-U editing activity as DYW:PG, but is, for example, 39 to 47 amino acids long, preferably 40 to 46 amino acids long, more preferably 41 to 45 amino acids long, and even more preferably 42 to 44 amino acids long. The amino acid sequence of the first linkage portion is not particularly limited as long as it exhibits C-to-U editing activity as DYW:PG, but is preferably the same as the portion at positions 196 to 238 of the sequence of Sequence ID No. 49, or a sequence in which 1 to 22 amino acids are substituted, deleted, or added in that subsequence, or a sequence having sequence identity with that subsequence, and more preferably the same as that subsequence.
[0076] One preferred embodiment of the first ligation site of DYW:PG is a polypeptide represented by the following formula, which is 43 amino acids long, regardless of the sequence of the other parts of the DYW domain.
[0077] N_{a25}-N_{a26}-N_{a27}- … -N_{a65}-N_{a66}-N_{a67}
[0078] The polypeptide described above is preferably a sequence that is the same as the portion at positions 196-238 of the sequence of SEQ ID NO: 49, or a sequence in which multiple amino acids are substituted in that subsequence and which can exhibit C-to-U editing activity as DYW:PG. In this case, it is preferable that the amino acids with a large bits value at the corresponding positions in Figure 4-1 of Patent Literature 1 (for example, N_{a29}, N_{a30}, N_{a32}, N_{a33}, N_{a35}, N_{a36}, N_{a40}, N_{a44}, N_{a45}, N_{a47}, N_{a48}, N_{a52}, N_{a53}, N_{a54}, N_{a55}, N_{a58}, N_{a61}, N_{a65}, N_{a67}) remain the same as in Figure 4-1 of Patent Literature 1, and that the substitutions are made in the other amino acids.
[0079] The total length of the second linkage of DYW:PG is not particularly limited as long as it can exhibit C-to-U editing activity as DYW:PG, but is, for example, 21 to 29 amino acids long, preferably 22 to 28 amino acids long, more preferably 23 to 27 amino acids long, and even more preferably 24 to 26 amino acids long. The amino acid sequence of the second linkage is not particularly limited as long as it can exhibit C-to-U editing activity as DYW:PG, but is preferably the same as the portion of the sequence of Sequence ID No. 49 at positions 242 to 266, or a sequence in which 1 to 13 amino acids are substituted, deleted, or added in that subsequence, or a sequence having sequence identity with that subsequence, and more preferably the same as that subsequence.
[0080] One preferred embodiment of the second ligation site of DYW:PG is a polypeptide represented by the following formula, which is 25 amino acids long, regardless of the sequence of the other parts of the DYW domain.
[0081] N_{a71}-N_{a72}-N_{a73}- … -N_{a93}-N_{a94}-N_{a95}
[0082] The polypeptide described above is preferably a sequence that is the same as the portion at positions 242-266 of the sequence of SEQ ID NO: 49, or a sequence in which multiple amino acids are substituted in that subsequence and which can exhibit C-to-U editing activity as DYW:PG. In this case, it is preferable that the amino acids with a large bits value at the corresponding positions in Figure 4-1 of Patent Literature 1 (for example, N_{a71}, N_{a72}, N_{a73}, N_{a76}, N_{a77}, N_{a78}, N_{a79}, N_{a81}, N_{a82}, N_{a86}, N_{a88}, N_{a89}, N_{a91}, N_{a92}, N_{a93}, N_{a94}) remain the same as in Figure 4-1 of Patent Literature 1, and that the substitutions are made in the other amino acids.
[0083] The total length of the third linker of DYW:PG is not particularly limited as long as it can exhibit C-to-U editing activity as DYW:PG. For example, it is 29 to 37 amino acids long, preferably 30 to 36 amino acids long, more preferably 31 to 35 amino acids long, and even more preferably 32 to 34 amino acids long. The amino acid sequence of the third linker is not particularly limited as long as it can exhibit C-to-U editing activity as DYW:PG, but is preferably the same as the portion at positions 272 to 304 of the sequence of SEQ ID NO: 49, or a sequence in which 1 to 17 amino acids are substituted, deleted, or added in that partial sequence, or a sequence having sequence identity with that partial sequence, and more preferably, the same sequence as that partial sequence.
[0084] One preferred embodiment of the third linker of DYW:PG is a polypeptide represented by the following formula with a length of 33 amino acids, regardless of the sequence of other parts of the DYW domain.
[0085] N_{a101}-N_{a102}-N_{a103}- … -N_{a131}-N_{a132}-N_{a133}
[0086] The polypeptide described above is preferably a sequence that is the same as the portion at positions 272-304 of the sequence of SEQ ID NO: 49, or a sequence in which multiple amino acids are substituted in that subsequence and which can exhibit C-to-U editing activity as DYW:PG. In this case, it is preferable that the amino acids with a large bits value at the corresponding positions in Figure 4-1 of Patent Literature 1 (for example, N_{a102}, N_{a104}, N_{a107}, N_{a112}, N_{a114}, N_{a117}, N_{a118}, N_{a121}, N_{a122}, N_{a123}, N_{a124}, N_{a125}, N_{a128}, N_{a130}, N_{a131}, N_{a132}) remain the same as in Figure 4-1 of Patent Literature 1, and that the substitutions are made in the other amino acids.
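The DYW:PG architecture described above (fixed signature residues separated by three linkers of bounded length) can be expressed as a single regular expression. The sketch below is one such encoding under stated assumptions: the linker lengths use the broadest example ranges given above (39 to 47, 21 to 29, and 29 to 37 residues), every x is treated as a free position, and the test sequence is a synthetic placeholder padded with alanine rather than any real DYW domain. A match shows only that a sequence has this layout, not that it has editing activity.

```python
import re

DYW_PG_PATTERN = re.compile(
    r".PG.SWIE.{14}HP"  # region containing the PG box: x PG x SWIE x(14) HP
    r".{39,47}"         # first linker
    r"H.E"              # HxE region
    r".{21,29}"         # second linker
    r"C..CH"            # CxxCH region
    r".{29,37}"         # third linker
    r"DYW"              # C-terminal DYW
)


def has_dyw_pg_layout(sequence: str) -> bool:
    """True if the whole sequence fits the DYW:PG layout and is 110-160 residues long."""
    if not 110 <= len(sequence) <= 160:
        return False
    return DYW_PG_PATTERN.fullmatch(sequence) is not None


# Synthetic placeholder using the preferred residues and linker lengths (43, 25, 33).
placeholder = (
    "EPGCSWIE" + "A" * 14 + "HP"
    + "A" * 43 + "HSE"
    + "A" * 25 + "CGDCH"
    + "A" * 33 + "DYW"
)
```

The placeholder is 136 residues long, which agrees with the example total length given for DYW:PG.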
[0087] (DYW:WW) DYW:WW is a polypeptide consisting of x_{b1}PGx_{b2}SWWTDx_{b3}-x_{b16}HP - first linker - Hx_{bb}E - second linker - Cx_{b17}x_{b18}CH - third linker - DYW. Preferably, it has x_{b1}PGx_{b2}SWWTDx_{b3}-x_{b16}HP - first linker - Hx_{bb}E - second linker - Cx_{b17}x_{b18}CH - third linker - DYW, has sequence identity with the sequence at positions 172-308 of SEQ ID NO: 50 (sequence identity is detailed in the [Terminology] section), and exhibits C-to-U editing activity when used in the C-terminal domain.
[0088] The total length of DYW:WW is not particularly limited as long as it exhibits C-to-U editing activity, but is, for example, 110 to 160 amino acids long, preferably 125 to 149 amino acids long, more preferably 129 to 145 amino acids long, and even more preferably 133 to 141 amino acids long, for example 137 amino acids long.
[0089] In the region containing the PG box of DYW:WW, i.e., x_{b1}PGx_{b2}SWWTDx_{b3}-x_{b16}HP, the portion consisting of WTD may also be WSD.
[0090] In the region containing the PG box of DYW:WW: x_{b1} is not particularly limited as long as the domain exhibits C-to-U editing activity as DYW:WW, but is preferably K (lysine) or an amino acid with similar properties, and more preferably K. x_{b2} is not particularly limited as long as the domain exhibits C-to-U editing activity as DYW:WW, but is preferably Q (glutamine) or an amino acid with similar properties, and more preferably Q. Each of x_{b3} to x_{b16} is not particularly limited as long as the domain exhibits C-to-U editing activity as DYW:WW, but is preferably the same as or similar in properties to the corresponding amino acid at positions 181-194 in the sequence of SEQ ID NO: 50, and more preferably is the same as the corresponding amino acid at positions 181-194 in the sequence of SEQ ID NO: 50.
[0091] In one preferred embodiment, the HxE region of DYW:WW is HSE regardless of the arrangement of the other parts.
[0092] In the CxxCH region of DYW:WW, i.e., Cx_{b17}x_{b18}CH: x_{b17} is not particularly limited as long as the domain exhibits C-to-U editing activity as DYW:WW, but is preferably D (aspartic acid) or an amino acid with similar properties, and more preferably D. x_{b18} is not particularly limited as long as the domain exhibits C-to-U editing activity as DYW:WW, but is preferably D or an amino acid with similar properties, and more preferably D.
[0093] The total length of the first linkage of DYW:WW is not particularly limited as long as it can exhibit C-to-U editing activity as DYW:WW, but is, for example, 39 to 47 amino acids long, preferably 40 to 46 amino acids long, more preferably 41 to 45 amino acids long, and even more preferably 42 to 44 amino acids long. The amino acid sequence of the first linkage is not particularly limited as long as it can exhibit C-to-U editing activity as DYW:WW, but is preferably the same as the portion of the sequence of Sequence ID No. 50 at positions 197 to 239, or a sequence in which 1 to 22 amino acids are substituted, deleted, or added in that subsequence, or a sequence having sequence identity with that subsequence, and more preferably the same as that subsequence.
[0094] One preferred embodiment of the first ligation site of DYW:WW is a polypeptide represented by the following formula, which is 43 amino acids long, regardless of the sequence of the other parts of the DYW domain.
[0095] N_{b26}-N_{b27}-N_{b28}- … -N_{b66}-N_{b67}-N_{b68}
[0096] The polypeptide described above is preferably a sequence that is the same as the portion at positions 197-239 of the sequence of SEQ ID NO: 50, or a sequence in which multiple amino acids are substituted in that subsequence and which can exhibit C-to-U editing activity as DYW:WW. In this case, it is preferable that the amino acids with a large bits value at the corresponding positions in Figure 4-2 of Patent Literature 1 (for example, N_{b26}, N_{b30}, N_{b33}, N_{b34}, N_{b37}, N_{b41}, N_{b45}, N_{b46}, N_{b48}, N_{b49}, N_{b51}, N_{b52}, N_{b53}, N_{b55}, N_{b56}, N_{b57}, N_{b59}, N_{b61}, N_{b62}, N_{b63}, N_{b64}, N_{b66}, N_{b67}, N_{b68}) remain the same as in Figure 4-2 of Patent Literature 1, and that the substitutions are made in the other amino acids.
[0097] The total length of the second linkage of DYW:WW is not particularly limited as long as it can exhibit C-to-U editing activity as DYW:WW, but is, for example, 21 to 29 amino acids long, preferably 22 to 28 amino acids long, more preferably 23 to 27 amino acids long, and even more preferably 24 to 26 amino acids long. The amino acid sequence of the second linkage is not particularly limited as long as it can exhibit C-to-U editing activity as DYW:WW, but is preferably the same as the portion at positions 243 to 267 of the sequence of Sequence ID No. 50, or a sequence in which 1 to 13 amino acids are substituted, deleted, or added in that subsequence, or a sequence having sequence identity with that subsequence, and more preferably the same as that subsequence.
[0098] One preferred embodiment of the second ligation site of DYW:WW is a polypeptide represented by the following formula, which is 25 amino acids long, regardless of the sequence of the other parts of the DYW domain.
[0099] N_{b72}-N_{b73}-N_{b74}- … -N_{b94}-N_{b95}-N_{b96}
[0100] The polypeptide described above is preferably a sequence that is the same as the portion at positions 243-267 of the sequence of SEQ ID NO: 50, or a sequence in which multiple amino acids are substituted in that subsequence and which can exhibit C-to-U editing activity as DYW:WW. In this case, it is preferable that the amino acids with a large bits value at the corresponding positions in Figure 4-2 of Patent Literature 1 (for example, N_{b72}, N_{b73}, N_{b74}, N_{b75}, N_{b77}, N_{b78}, N_{b79}, N_{b81}, N_{b82}, N_{b84}, N_{b88}, N_{b89}, N_{b90}, N_{b91}, N_{b92}, N_{b93}, N_{b94}, N_{b95}, N_{b96}) remain the same as in Figure 4-2 of Patent Literature 1, and that the substitutions are made in the other amino acids.
[0101] The total length of the third linkage of DYW:WW is not particularly limited as long as it can exhibit C-to-U editing activity as DYW:WW, but is, for example, 29 to 37 amino acids long, preferably 30 to 36 amino acids long, more preferably 31 to 35 amino acids long, and even more preferably 32 to 34 amino acids long. The amino acid sequence of the third linkage is not particularly limited as long as it can exhibit C-to-U editing activity as DYW:WW, but is preferably the same as the portion of the sequence of Sequence ID No. 50 at positions 273 to 305, or a sequence in which 1 to 17 amino acids are substituted, deleted, or added in that subsequence, or a sequence having sequence identity with that subsequence, and more preferably the same as that subsequence.
[0102] One preferred embodiment of the third ligation site of DYW:WW is a polypeptide represented by the following formula, which is 33 amino acids long, regardless of the sequence of the other parts of the DYW domain.
[0103] N b102 -N b103 -N b104 - … -N b132 -N b133 -N b134
[0104] The polypeptide described above is preferably a sequence that is the same as the portion of the sequence of SEQ ID NO: 50 at positions 273 to 305, or a sequence in which multiple amino acids are substituted in that subsequence, and which can exhibit C-to-U editing activity as DYW:WW. In this case, it is preferable that the amino acids with large bits values at the corresponding positions in Figure 4-2 of Patent Document 1 (for example, N b104, N b105, N b107, N b108, N b109, N b110, N b111, N b113, N b115, N b116, N b117, N b118, N b119, N b121, N b122, N b123, N b124, N b126, N b129, N b131, N b132, N b133, N b134) are kept the same as in Figure 4-2 of Patent Document 1, and that the substitutions are made at the other amino acid positions.
[0105] (Reduction of Off-Target Effects) This embodiment also relates to a method for reducing the off-target effects of a PPR protein, comprising the steps of carrying out the modification method described above; and allowing the resulting modified PPR protein to act on target RNA in prokaryotic or eukaryotic cells.
[0106] Unlike natural PPR-P motifs, artificial PPR-P motifs used in the development of biotechnology tools are designed based on consensus sequences, resulting in extremely high homology between adjacent motifs. A prior study (Non-Patent Literature 7) has shown, through in vitro bind-n-seq analysis, that artificial P arrays (repeating structures of P motifs) are highly likely to tolerate mismatches near the 3' end of the binding site. According to the same study, the optimal number of motifs for the PPR domain is 10, and it is suggested that misalignment occurs between the PPR protein and the target sequence when the number of motifs is larger (14 motifs in the same study). However, minimizing the number of motifs is not necessarily a solution to minimizing off-target effects. The method of this embodiment involves substituting amino acids at positions 2 and 13 of the PPR motif, providing an effective means of reducing off-target effects without compromising the stability and target specificity of the PPR / RNA complex while maintaining the length of the PPR domain.
[0107] III. Other Embodiments
The present invention also provides:
• Nucleic acids encoding the PPR protein of Embodiment 2
• Vectors containing the above nucleic acids
• Cells containing the above vectors.
[0108] Vectors include viral vectors. Vectors for amplification can use E. coli or yeast as hosts. In this specification, an expression vector means a vector that includes, for example, DNA having a promoter sequence, DNA encoding a desired protein, and DNA having a terminator sequence from upstream, but the sequences do not necessarily have to be in this order as long as the desired function is performed. In this embodiment, various vectors that are commonly used by those skilled in the art can be rearranged and used.
[0109] Specifically, this embodiment provides a nucleotide sequence encoding a PPR editor, comprising at least one PPR motif, an RNA-binding domain (for example, an RNA-binding domain which is a PLS-type PPR protein) which is sequence-specifically bound to a target RNA (preferably an animal target RNA) according to the rules of the PPR-code, and a DYW domain which is one of the aforementioned DYW:PG, DYW:WW, or DYW:KP.
[0110] Another form of the present invention provides a vector for editing target RNA, comprising a nucleotide sequence encoding a PPR editor, which includes at least one PPR motif and an RNA-binding domain (preferably an RNA-binding domain which is a PLS-type PPR protein) capable of sequence-specifically binding to target RNA (preferably an animal target RNA) according to the rules of the PPR-code, and a DYW domain which is one of the aforementioned DYW:PG, DYW:WW, or DYW:KP.
[0111] The PPR editor of Embodiment 2 may function in eukaryotic cells (e.g., animal, plant, microorganism (yeast, etc.), protist). In particular, the PPR editor of this embodiment may function in animal cells (in vitro or in vivo). Examples of animal cells into which the PPR editor, or a vector expressing the PPR editor, may be introduced include cells derived from humans, monkeys, pigs, cattle, horses, dogs, cats, mice, and rats. Examples of cultured cells into which the PPR editor, or a vector expressing the PPR editor, may be introduced include, but are not limited to, Chinese hamster ovary (CHO) cells, COS-1 cells, COS-7 cells, VERO (ATCC CCL-81) cells, BHK cells, canine kidney-derived MDCK cells, hamster AV-12-664 cells, HeLa cells, WI38 cells, HEK293 cells, HEK293T cells, and PER.C6 cells.
[0112] The PPR editor of Embodiment 2 can convert editing target C to U, or editing target U to C, within the target RNA. RNA-binding PPR proteins are involved in all RNA processing steps found in organelles: cleavage, RNA editing, translation, splicing, and RNA stabilization.
[0113] The PPR editor of Embodiment 2 allows for single-base editing of mitochondrial RNA. Mitochondria have their own genomes and encode constituent proteins of important complexes involved in respiration and ATP production. Mutations in these proteins are known to cause various diseases. Mutation repair using this embodiment is expected to provide treatment for a variety of diseases.
[0114] The improved RNA base editing methods achieved by embodiments 1 and 2 described above are expected to have the following applications in various fields.
[0115] (1) Medical treatment: - Recognize and edit specific RNAs related to specific diseases. By using this embodiment, it is possible to treat genetic diseases caused by single nucleotide mutations. Many mutations in genetic diseases are in the direction of C to U mutations. Therefore, the method of this embodiment, which can convert U to C, may be particularly useful.
[0116] - Create cells with controlled RNA suppression and expression. Such cells include stem cells (e.g., iPS cells) whose differentiated and undifferentiated states are monitored, model cells for evaluating cosmetics, and cells in which the expression of functional RNA can be switched ON / OFF for the purpose of elucidating drug discovery mechanisms and conducting pharmacological tests.
[0117] (2) Agriculture, forestry and fisheries: - Improve yield and quality in crops, forest products, fishery products, etc. - Improve disease resistance and environmental tolerance, and breed organisms with improved or new functionalities.
[0118] For example, with regard to first-generation hybrid (F1) crops, it may be possible to artificially create F1 crops by editing mitochondrial RNA with a PPR editor, thereby improving yield and quality. RNA editing with a PPR editor allows for more accurate and rapid improvement of biological varieties and breeding (genetically improving organisms) than conventional techniques. Furthermore, since RNA editing with a PPR editor does not involve altering traits with foreign genes like genetic modification, it is closer to traditional breeding methods such as mutant selection and backcrossing. Therefore, it can reliably and quickly address global food and environmental problems.
[0119] (3) Chemistry: - In the production of useful substances using microorganisms, cultured cells, plants, and animals (e.g., insects), protein expression levels are controlled by manipulating RNA. This can improve the productivity of useful substances. Examples of useful substances include proteinaceous substances such as antibodies, vaccines, and enzymes, as well as relatively low-molecular-weight compounds such as pharmaceutical intermediates, fragrances, and dyes.
[0120] - Improve the efficiency of biofuel production by modifying the metabolic pathways of algae and microorganisms.
[0121] IV. Unless otherwise specified, the numerical range x to y includes the values x and y at both ends.
[0122] In this specification, claims, and drawings, bases or nucleosides in nucleic acids are represented by a single letter of the alphabet. Unless otherwise specified, A represents adenine or adenosine, C represents cytosine or cytidine, G represents guanine or guanosine, U represents uracil or uridine, T represents uracil or uridine in RNA sequences, and thymine or thymidine in DNA sequences. In this specification, claims, and drawings, unless otherwise specified, amino acids are represented by a single letter of the alphabet. Specifically, A represents alanine, L represents leucine, R represents arginine, K represents lysine, N represents asparagine, M represents methionine, D represents aspartic acid, F represents phenylalanine, C represents cysteine, P represents proline, Q represents glutamine, S represents serine, E represents glutamic acid, T represents threonine, G represents glycine, W represents tryptophan, H represents histidine, Y represents tyrosine, I represents isoleucine, and V represents valine.
[0123] Furthermore, when a variant of a protein or enzyme is represented by a string consisting of one letter of the alphabet, a number, and another letter of the alphabet following the number, the leftmost letter indicates the amino acid before the mutation, the middle number indicates the position of the amino acid, and the rightmost letter indicates the amino acid after the mutation, meaning that the leftmost amino acid has been replaced by the rightmost amino acid. For example, M10I indicates that methionine at position 10 in the amino acid sequence has been replaced with isoleucine.
[0124] In relation to proteins or enzymes, specific amino acids in the amino acid sequence may be represented by a single letter of the alphabet and a number. For example, M10 refers to the M at position 10 in the amino acid sequence. Similarly, 10I indicates that the amino acid at position 10 in the amino acid sequence has been substituted with I. The original amino acid is not relevant.
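The notation described above can be made concrete with a short Python sketch. The function name `apply_mutation` and the toy sequence are illustrative assumptions and do not appear in the specification; only the meaning of strings such as M10I and 10I is taken from the text.

```python
import re

# Hypothetical helper; only the notation (e.g. "M10I", "10I") comes from the text.
_MUTATION = re.compile(r"^([A-Z])?(\d+)([A-Z])$")


def apply_mutation(sequence: str, mutation: str) -> str:
    """Apply 'M10I' (original residue is checked) or '10I' (original residue ignored)."""
    match = _MUTATION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation string: {mutation!r}")
    before, position, after = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    index = position - 1  # positions in the text are 1-based
    if before is not None and sequence[index] != before:
        raise ValueError(
            f"expected {before} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + after + sequence[index + 1:]


# Toy 12-residue sequence with M at position 10 (illustrative only).
toy = "AAAAAAAAAMAA"
print(apply_mutation(toy, "M10I"))  # methionine at position 10 replaced by isoleucine
print(apply_mutation(toy, "10I"))   # same substitution; the original residue is not checked
```

A string such as M10, which only names a residue and does not describe a substitution, is rejected by this sketch.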
[0125] In relation to the amino acid sequences of proteins and polypeptides, amino acid residues are sometimes simply referred to as amino acids.
[0126] With respect to base sequences (sometimes called nucleotide sequences) or amino acid sequences, "identity," unless otherwise specified, refers to the percentage of matching bases or amino acids shared between two sequences when the two sequences are aligned in the most optimal manner. That is, identity can be calculated as (number of matching positions / total number of positions) × 100, and can be calculated using commercially available algorithms. Such algorithms are incorporated into the NBLAST and XBLAST programs described in Altschul et al., J.Mol.Biol. 215(1990) 403-410. More specifically, the search and analysis of identity between base sequences or amino acid sequences can be performed using algorithms or programs well known to those skilled in the art (e.g., BLASTN, BLASTP, BLASTX, ClustalW). When using a program, the parameters can be appropriately set by those skilled in the art, or the default parameters of each program may be used. The specific methods of these analysis methods are also well known to those skilled in the art.
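As a minimal illustration of the identity formula above, the following Python sketch computes identity for two sequences that have already been aligned (for example, by one of the programs named above). The function name and the handling of gap columns (counted in the total but never as a match) are assumptions made for this sketch; the specification does not fix these details, and real programs may report identity differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity of two pre-aligned sequences: matching positions / total positions x 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    # Assumption: a gap character '-' never counts as a match,
    # but gap columns still count toward the total number of positions.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy examples (illustrative sequences, not taken from the sequence listing).
print(percent_identity("MKVL", "MKIL"))
print(percent_identity("ACGU-ACGU", "ACGUUACCU"))
```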
[0127] With respect to the base sequence or amino acid sequence, a high degree of sequence identity is preferred unless otherwise specified. Specifically, it is preferable to have 40% or more, more preferably 45% or more, even more preferably 50% or more, even more preferably 55% or more, even more preferably 60% or more, and even more preferably 65% or more. Furthermore, it is preferable to have 70% or more, more preferably 80% or more, even more preferably 85% or more, even more preferably 90% or more, even more preferably 95% or more, and even more preferably 97.5% or more.
[0128] With respect to a polypeptide or protein, the number of amino acids substituted, deleted, or added in a "substituted, deleted, or added sequence" is not particularly limited in any motif or protein, as long as the motif or protein consisting of that amino acid sequence has the desired function, unless otherwise specified. However, it is usually around 1 to 9 or 1 to 4 amino acids, or even more if the substitutions are with similar amino acids. Means for preparing polynucleotides or proteins relating to such amino acid sequences are well known to those skilled in the art.
[0129] Similar amino acids refer to amino acids with similar physical properties such as hydropathy, charge, pKa, and solubility. Examples include the following: Hydrophobic (nonpolar) amino acids: alanine, valine, glycine, isoleucine, leucine, phenylalanine, proline, tryptophan, tyrosine; Nonhydrophobic amino acids: arginine, asparagine, aspartic acid, glutamic acid, glutamine, lysine, serine, threonine, cysteine, histidine, methionine; Hydrophilic amino acids: arginine, asparagine, aspartic acid, glutamic acid, glutamine, lysine, serine, threonine; Acidic amino acids: aspartic acid, glutamic acid; Basic amino acids: lysine, arginine, histidine; Neutral amino acids: alanine, asparagine, cysteine, glutamine, glycine, isoleucine, leucine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine, valine; Sulfur-containing amino acids: methionine, cysteine; Aromatic ring amino acids: tyrosine, tryptophan, phenylalanine.
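The groups listed above can be encoded as a lookup table, for example to check which groups two residues share. The following Python sketch is only an illustration: the dictionary keys and the function name `shared_groups` are assumptions, while the group memberships are transcribed from the list above using one-letter codes.

```python
# Group memberships transcribed from the list above (one-letter codes).
SIMILARITY_GROUPS = {
    "hydrophobic": set("AVGILFPWY"),
    "nonhydrophobic": set("RNDEQKSTCHM"),
    "hydrophilic": set("RNDEQKST"),
    "acidic": set("DE"),
    "basic": set("KRH"),
    "neutral": set("ANCQGILMFPSTWYV"),
    "sulfur-containing": set("MC"),
    "aromatic": set("YWF"),
}


def shared_groups(residue_a: str, residue_b: str) -> list:
    """Return the sorted names of all groups that contain both residues."""
    return sorted(
        name
        for name, members in SIMILARITY_GROUPS.items()
        if residue_a in members and residue_b in members
    )


print(shared_groups("K", "R"))  # lysine and arginine
print(shared_groups("V", "S"))  # valine and serine
print(shared_groups("V", "D"))  # valine and aspartic acid
```

Because the groups overlap, a pair of residues can share several groups or none; how many shared groups count as "similar" is left open here, as it is in the text.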
[0130] [Results] (Design of Designer PPR Editors) In this study, following 16 PPR-P motifs, we designed PPR editors (16P-PG and 16P-WW, respectively) that have 5 PPR-like motifs (P2, L2, S2, E1, E2) and a PG or WW domain (Non-Patent Literature 7) as a DYW domain at the C-terminus (Figure 1a). These were designed to edit cytidine in human CTNNB1 mRNA to uridine and to replace threonine, the 41st amino acid of β-catenin encoded by the CTNNB1 gene, with isoleucine (Figure 1a, b).
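The link between a single C-to-U edit and the threonine-to-isoleucine change can be checked against the standard genetic code. The Python sketch below is illustrative only: the specification does not state which threonine codon encodes position 41 of β-catenin, so all four threonine codons are tried, and the edited position (the second base of the codon) is an assumption of this sketch.

```python
# Entries of the standard genetic code needed here (RNA alphabet).
CODON_TO_AA = {
    "ACU": "T", "ACC": "T", "ACA": "T", "ACG": "T",  # threonine
    "AUU": "I", "AUC": "I", "AUA": "I",              # isoleucine
    "AUG": "M",                                      # methionine
}


def edit_c_to_u(codon: str, position: int) -> str:
    """Return the codon after a C-to-U edit at the given 1-based codon position."""
    index = position - 1
    if codon[index] != "C":
        raise ValueError(f"no cytidine at position {position} of {codon}")
    return codon[:index] + "U" + codon[index + 1:]


# Edit the second base of every threonine codon and report the encoded amino acid.
for codon in ("ACU", "ACC", "ACA", "ACG"):
    edited = edit_c_to_u(codon, 2)
    print(codon, CODON_TO_AA[codon], "->", edited, CODON_TO_AA[edited])
```

Three of the four threonine codons (ACU, ACC, ACA) become isoleucine codons after this edit, while ACG becomes the methionine codon AUG.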
[0131] To investigate the target specificity of the designer PPR editors, HEK293T cells were transfected with them, and 24 hours after transfection, the editing efficiency of the target CTNNB1 mRNA and the mRNA of other genes (DNAJA1, HAX1, PPA1, SF3B2, UBE2D3) where off-target effects were expected was measured by direct sequencing (Figure 1c).
[0132] (Mutations of the amino acid at position 2 in 1 or 2 PPR motifs reduce off-target effects) The amino acid at position 2 within a PPR motif acts in RNA binding by sandwiching nucleic acids with van der Waals forces (Non-Patent Literature 3). The amino acid at position 2 is usually a small, hydrophobic amino acid such as valine (V) or isoleucine (I) (Figure 2). To investigate the effect of amino acids with different properties on RNA editing site specificity, the valine (V) amino acid at position 2 was substituted with phenylalanine (F), tyrosine (Y), or serine (S) (V2F, V2Y, V2S, respectively). These are amino acids that account for approximately 20% of the amino acids at the same position in the Arabidopsis thaliana PPR protein.
[0133] First, we investigated the effects of single mutations in which the amino acid at position 2 was substituted in one motif (any of the 6th, 8th, 10th, 12th, or 14th PPR motifs) within the 16P-WW protein (motifs SEQ ID NOs. 5-12). The results showed that these mutants did not affect the editing efficiency of the CTNNB1 site. On the other hand, off-target effects were slightly reduced, and in particular, the V2S mutation showed a significant decrease in editing efficiency at multiple off-target sites (Figure 3). Next, when the amino acid at position 2 was substituted in two PPR motifs (either the 6th / 14th or the 8th / 14th motifs), the V2S double mutation reduced off-target effects similarly to the single mutation, but also decreased editing efficiency at the PPA1-1 site (Figure 4).
[0134] (Mutations to the amino acid at position 13 significantly reduce off-target effects) The amino acids at position 13 of the P array are mainly basic lysine (K) and arginine (R), which form salt bridges with the negatively charged phosphate group of RNA (see Supplementary Fig. 8b, 5a of Non-Patent Literature 3). On the other hand, in the PLS array, only the S1 motif has a basic amino acid at position 13 (Fig. 5d), while the P1 motif mostly has a non-basic, hydrophilic amino acid such as glutamine (Q) or glutamic acid (E) at this position, and the L1 motif mostly has serine (S) (Figs. 5b, c). Based on previous findings, it is thought that the PLS domain has weaker and more transient binding to the target compared to the P domain. Therefore, we hypothesized that by substituting the K at position 13 of the PPR motif in the 16P-PG and 16P-WW proteins with Q (K13Q) or S (K13S), the stability of the PPR / RNA complex can be reduced, thereby reducing off-target effects.
[0135] To test this hypothesis, we first introduced K13Q and K13S mutations into only one PPR motif (either motif 3, 7, 11, or 15) within the 16P-PG or 16P-WW protein. Specifically, we used the motifs of SEQ ID NOs. 13–20. As a result, these mutations introduced into the 16P-PG protein reduced editing efficiency at five of the six known off-target sites, but maintained editing efficiency at the sixth off-target site (PPA1-2) and the on-target site (CTNNB1) at levels equivalent to or slightly (5%) lower than the wild type (WT) (Figure 6a). On the other hand, similar mutations in the 16P-WW protein did not reduce off-target effects, and when the mutation was introduced into motif 3, it reduced editing activity at the on-target site (Figure 6b).
[0136] Next, when mutations were introduced into the amino acid at position 13 of two PPR motifs (any combination of motifs 3 and 7, 3 and 11, 7 and 15, or 11 and 15), these double mutations significantly reduced or eliminated off-target effects in the 16P-PG and 16P-WW proteins (Figure 7). However, combinations containing motif 3 also reduced on-target editing efficiency.
[0137] Finally, mutations were introduced into four motifs (3rd, 7th, 11th, and 15th) in the 16P-PG and 16P-WW editors. As a result, on-target and off-target editing activity was reduced or eliminated in all mutants (Figure 8). This is presumed to be due to amino acid substitution within the 3rd motif, similar to the case of double mutations.
[0138] (Combining amino acid mutations at positions 2 and 13 further reduces off-target effects) Mutations were introduced at amino acids at positions 2 and 13 within different motifs of the 16P-WW protein, and their impact on off-target effects was analyzed. In the preceding experiments, a single mutation at either position 2 or position 13 did not significantly reduce off-target effects (Figure 3, Figure 6b), but combining mutations at both positions reduced off-target effects to the same or a greater extent than double mutations at either position 2 or 13 (Figure 9). In mutants containing the V2Y substitution, off-target effects were not reduced, or if they were reduced, on-target editing activity was also decreased. There was no significant difference between the K13Q and K13S mutations. When an amino acid mutation at position 13 of the 11th motif was combined with an amino acid mutation at position 2 of the 14th motif, on-target editing was maintained at the same level as the wild type, while editing activity was significantly reduced at five off-target sites.
[0139] Next, we investigated the introduction of amino acid mutations into four motifs, combining amino acid mutations at position 2 (V2S or V2Y) in motifs 6 and 14 or 8 and 14, and amino acid mutations at position 13 (K13Q or K13S) in motifs 7 and 15 or 11 and 15 (Figure 10). Mutants with two V2Y substitutions had no effect on the on-target and only slightly reduced off-target effects. On the other hand, mutants with two V2S substitutions eliminated off-target effects at five sites and significantly reduced them at the remaining site. However, when mutations in motif 6 (V2S) and motif 7 (K13Q) were included, the on-target editing efficiency was also significantly reduced. Since V2S mutations in motifs 8 and 14 and K13S or K13Q mutations in motifs 11 and 15 reduced off-target effects while maintaining on-target editing activity, it was suggested that introducing mutations into motifs in the latter half of the PPR domain is an effective method for reducing off-target effects.
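The four-motif design described above (V2S in motifs 8 and 14 combined with K13S in motifs 11 and 15) can be expressed as position-specific substitutions on an array of motifs. The Python sketch below only illustrates the indexing. The placeholder motif is NOT a real PPR-P sequence; it is a 35-residue dummy string with V at position 2 and K at position 13, and the function and variable names are assumptions.

```python
# Placeholder 35-residue motif with V at position 2 and K at position 13.
# This is a dummy string for indexing purposes, not a real PPR-P consensus motif.
PLACEHOLDER_MOTIF = "AV" + "A" * 10 + "K" + "A" * 22


def mutate_array(motifs, substitutions):
    """Apply {(motif_number, position): (expected, replacement)}, all indices 1-based."""
    mutated = list(motifs)
    for (motif_number, position), (expected, replacement) in substitutions.items():
        motif = mutated[motif_number - 1]
        if motif[position - 1] != expected:
            raise ValueError(
                f"motif {motif_number}: expected {expected} at position {position}"
            )
        mutated[motif_number - 1] = (
            motif[:position - 1] + replacement + motif[position:]
        )
    return mutated


# A 16-motif array built from the placeholder, mirroring the 16P layout.
array_16p = [PLACEHOLDER_MOTIF] * 16

# V2S in motifs 8 and 14, K13S in motifs 11 and 15 (combination named in the text).
design = {
    (8, 2): ("V", "S"),
    (14, 2): ("V", "S"),
    (11, 13): ("K", "S"),
    (15, 13): ("K", "S"),
}
variant = mutate_array(array_16p, design)
for number in (8, 11, 14, 15):
    print(number, variant[number - 1])
```

Checking the expected residue before substituting guards against off-by-one errors between motif numbering and residue numbering.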
[0140] (The amino acid mutation at position 13 reduces off-target effects at the transcriptome level.) Prior to transcriptome analysis, the optimal length of the PPR-P domain of a PPR editor (WW protein) that minimizes off-target effects and exhibits high on-target editing activity was investigated. In this investigation, a modified WW domain (WW2, with the D13A mutation introduced into the E2 motif of WW; Sequence ID No. 52) was used to increase editing efficiency, and the number of P motifs was changed from 16 to 10, 12, or 14 (Figure 11). These PPR editors were transfected into HEK293T cells, and editing efficiency was measured after 48 hours. As a result, on-target editing efficiency increased with increasing motif number, reaching a plateau at 14 motifs. Off-target effects were similar for 10P-WW2 and 12P-WW2 proteins, slightly higher for 14P-WW2, and significantly increased for 16P-WW2. Due to the balance between high on-target editing activity and low off-target effects, the 14P-WW2 protein was chosen for subsequent transcriptome-level analysis of off-target effects. The sequence of the 14P portion is shown in SEQ ID NO: 53.
[0141] Since a comprehensive comparative study of amino acid mutations in the PPR backbone due to differences in PPR array length has not been conducted, we introduced K13Q or K13S mutations into one or two motifs (odd-numbered motifs) of 14P-WW2 and analyzed the off-target effects. Because 14P-WW2 showed lower off-target effects compared to 16P-WW, HAX1 was excluded from the analysis, and three new targets (ITCH, RBPP8, and TSPAN33) were added (Figure 1c).
[0142] Only when mutations were introduced into motifs 11 and 13 did off-target effects decrease while maintaining on-target effects (Figure 12a). When double mutations were introduced, off-target effects decreased in all combinations, but the reduction was greatest in combinations including motif 13. The optimal balance between high on-target editing activity and low off-target effects was found when the two mutated motifs were separated by 3 to 5 motifs.
[0143] To investigate the impact of a double mutation at position 13 on off-target effects across the entire human transcriptome, RNA sequencing was performed on the K13Q and K13S mutants ('PPR-9 / 13 K13Q' and 'PPR-7 / 13 K13S') that exhibited the lowest off-target effects. Unmutated 14P-WW and modified 14P-WW2 proteins were used as controls. RNA-seq analysis revealed that the mutants showed approximately 10% lower on-target editing efficiency compared to 14P-WW2 (Figure 13a), but the number of off-target sites was reduced 6- to 8-fold (Figure 13b).
[0144] Furthermore, the effect of four lengths of PPR domains on the activity of DYW mutants modified to improve editing efficiency was re-evaluated. PG2 (a variant of PG with a P3A mutation introduced into the E2 motif; Sequence ID No. 51) was used as the PG domain. WW2 was used as the WW domain. PG2 and WW2, which ligated 14 P motifs, showed a reduced number of off-target sites (Figures 15A and B).
[0145] To further study the potential toxicity of PG2 and WW2 linked to 14 P motifs, and to evaluate their temporal stability, time-course analyses were performed at 24, 48, and 72 hours, using RESCUE-S (RNA Editing for Specific C-to-U Exchange, an RNA editing technology fusing CRISPR-Cas13 with a modified ADAR (adenosine deaminase acting on RNA) enzyme) [14] as a control. Cell death was lower compared to cells transfected with an empty vector, but increased slightly over time, reaching 7–11% at 24 hours and 12–25% at 48 hours (Figure 16A). At 72 hours, cell death increased further, with a larger range of variation (12–40%), likely due to the influence of cell density. The rate of cell death was lower at all time points compared to RESCUE-S. The editing efficiency of the CTNNB1-T41I site was already high at 24 hours and plateaued at 48 hours (Figure 16B). After editing, accumulation of β-catenin protein was observed in all mutants (Figure 16C). This protein increase, along with the fact that there was no significant difference in total CTNNB1 mRNA expression levels at the three time points compared to the blank vector control (Figure 16D), supports the idea that 14P-PG2 and -WW2 do not affect overall gene expression. Furthermore, at all three time points, the 14P-PG2 and the three -WW2 mutants accumulated more β-catenin than cells transfected with 14P-PG and 14P-WW, confirming that mutations to improve editing efficiency have a beneficial effect on the translation of edited RNA (Figure 16C).
[0146] To investigate the relationship between observed functional changes (increased β-catenin protein accumulation in some mutants, decreased off-target editing in others) and PPR protein binding affinity, RNA electrophoretic mobility shift assays (REMSA) were performed using six developed editing tools (RECODE (RNA Editor for C-to-U with an Optimized DYW Enzyme)) and their target (CTNNB1-T41I). The results showed that all RECODE mutants had K(d) values ranging from 1.6 to 6 nM (Figure 17). The RECODE-PG and RECODE-PG2 mutants exhibited the strongest and weakest binding affinities in the series, respectively, suggesting that the E2:P3A mutation leads to a slight but measurable decrease in binding affinity. In contrast, the RECODE-WW mutant group showed very similar K(d) values (4.4–5.4 nM), suggesting that subtle amino acid changes in these mutants do not have a detectable effect on the overall binding affinity to the target RNA.
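To give a sense of what a K(d) in the 1.6 to 6 nM range means, the Python sketch below evaluates a simple one-site equilibrium binding model, in which the fraction of RNA bound equals [P] / ([P] + K(d)) when the protein is in large excess over the RNA. This model is a textbook simplification used here as an assumption; the specification does not state how the K(d) values were fitted, and the protein concentrations are arbitrary illustrative values.

```python
def fraction_bound(protein_nm: float, kd_nm: float) -> float:
    """Fraction of RNA bound in a one-site model with protein in large excess."""
    if protein_nm < 0 or kd_nm <= 0:
        raise ValueError("protein concentration must be >= 0 and Kd must be > 0")
    return protein_nm / (protein_nm + kd_nm)


# Compare the two ends of the reported K(d) range at arbitrary protein concentrations.
for kd_nm in (1.6, 6.0):
    row = [round(fraction_bound(p, kd_nm), 2) for p in (1.0, 5.0, 25.0)]
    print(f"Kd = {kd_nm} nM:", row)
```

In this model, half of the RNA is bound when the protein concentration equals K(d), so a lower K(d) corresponds to tighter binding.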
[0147] (Highly active RECODE with broad sequence recognition capability) In nature, PPR editors exhibit specific nucleotide restriction upstream and downstream of the editing site [13]. To characterize the sequence restriction of RECODE-PG, RECODE-PG2, RECODE-WW, and RECODE-WW2, the editing efficiency at 15 endogenous targets (SEQ ID NOs.: 56-70) was analyzed. These targets were selected to represent a broad diversity of nucleotides at positions -6 to +5 relative to the editing site. Their activity was directly compared with RESCUE-S, for which optimized gRNAs were pre-designed for most sites [14]. RECODE consistently outperformed RESCUE-S, with at least one RECODE variant showing editing efficiency of 50% or more at all sites except PPIB-A19V (Figure 9A). Only RECODE-WW2 edited sites with G at position -1 at both targets, but PPIB-A19V showed low efficiency. These observations are consistent with off-target analysis in CTNNB1-T41I, showing that only 10% of off-target sites have G at position -1 (Figure 6D). Variations in target editing efficiency may be due to suboptimal PPR codes or combinations of adjacent PPR codes. To further reduce off-target editing, we designed RECODE-WW2-PPRm1 mutants (K13S mutation introduced into motifs 7 and 13) and RECODE-WW2-PPRm2 mutants (K13Q mutation introduced into motifs 9 and 13) of RECODE-WW2 targeting KRAS-Q25X and SMARCA4-P88L [14]. These mutants significantly reduced the off-target effects of RECODE-WW2 [KRAS-Q25X] and RECODE-WW2 [SMARCA4-P88L], but with RECODE-WW2-PPRm2 [KRAS-Q25X], on-target editing efficiency was significantly reduced (Figure 9B). These results demonstrate RECODE's ability to specifically edit arbitrary RNA sequences and confirm that mutations in the PPR domain can regulate off-target activity.
[0148] [Discussion] (Application of PPR-P domains to genetic engineering techniques) Unlike natural PPR-P motifs, artificial PPR-P motifs used in the development of biotechnology tools are designed based on consensus sequences and therefore exhibit extremely high homology between adjacent motifs. Studies using in vitro bind-n-seq have shown that artificial P arrays (repeating structures of P motifs) are highly likely to tolerate mismatches near the 3' end of the binding site (Non-Patent Literature 8). According to the same study, the optimal number of motifs for a PPR domain is 10, and it is suggested that misalignment occurs between the PPR protein and the target sequence when the number of motifs is larger (14 motifs in the same study).
[0149] Similar findings were obtained in this study using a PPR editor (P-WW2) targeting the CTNNB1-T41I site. Off-target editing activity increased significantly at one site when the number of PPR motifs was 14, and at three additional sites when the number of motifs was 16. On the other hand, reducing the number of motifs also decreased on-target editing efficiency, so minimizing the number of motifs is not necessarily a solution to minimize off-target effects.
[0150] (The amino acid substitution at position 13 in the C-terminal PPR motif enhances target specificity.) Generally, PLS-class PPR proteins are mainly involved in RNA editing, while P-class PPR proteins play an important role in stabilizing RNA molecules (such as controlling splicing) or inhibiting enzyme reactions (such as protecting the ends from exonucleases) (Non-patent Literature 1). Therefore, PPR / RNA complexes containing P arrays are considered to be more stable than those containing PLS arrays.
[0151] The basic amino acid at position 13 interacts with the RNA backbone and stabilizes the PPR / RNA complex. Substitution of lysine at position 13 with another amino acid in all P motifs inhibits RNA binding (Non-Patent Literature 3) [10]. In this study, substitution of the amino acid at position 13 in the first half of the PPR-P array reduced on-target editing efficiency, suggesting that the first half of the PPR motif is important for initiating binding to the target. Conversely, substitution of the amino acid at position 13 in the second half of the P motif reduced off-target effects. This suggests that the C-terminal P motif is important for editing target bases.
[0152] Mutations to the amino acid at position 13 are thought to prevent misalignment between PPR and RNA by reducing target specificity, but this point was not clarified in this study. This possibility has been supported by a previous report of bind-n-seq analysis using PPR10, a P-class PPR protein [11]. In that study, it was shown that only three of the 19 PPR motifs of PPR10 (motifs 1, 8, and 18) have a non-basic amino acid at position 13, and these motifs have low specificity for the target base and adjacent bases.
[0153] (Optimizing the balance between target specificity and off-target effects through amino acid mutations at multiple positions) Probabilistic studies have shown that the amino acid at position 2, inserted between two bases of the target sequence in a P motif, affects specific nucleic acid recognition by position 5 and the last amino acid [12]. Interestingly, it was expected that replacing the valine at position 2 with aromatic amino acids such as phenylalanine (F) or tyrosine (Y) would inhibit nucleic acid recognition, but in reality, it only slightly reduced target selectivity. On the other hand, substituting the amino acid at position 2 with serine reduced off-target effects, similar to the amino acid substitution at position 13.
[0154] To maintain on-target editing activity, only a limited number of motif substitutions are possible at amino acids at positions 2 or 13. However, targeting mutations at both positions in a PPR array is considered an effective means of increasing the number of mutations introduced without compromising the stability and target specificity of the PPR / RNA complex.
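As an illustration of the position numbering used for these substitutions, the following minimal Python sketch applies a substitution written in the form V2F or K13Q to a single P motif. The parent motif sequence is an assumption: it is inferred by reverting the V2F substitution in SEQ ID NO: 5, and the function names are hypothetical helpers, not part of the disclosed method.

```python
import re

# Assumed parent P motif, inferred by reverting V2F in SEQ ID NO: 5.
PARENT_MOTIF = "VVTYNTLIDGLCKAGRLDEAEELLEEMEEKGIKPD"

# Substitution notation: original residue, 1-based position, new residue.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(motif: str, substitution: str) -> str:
    """Apply one substitution such as 'V2F' using 1-based motif numbering."""
    match = SUBSTITUTION.match(substitution)
    if match is None:
        raise ValueError(f"Unrecognized substitution: {substitution!r}")
    expected, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(motif):
        raise ValueError(f"Position {position} is outside the motif")
    index = position - 1
    if motif[index] != expected:
        raise ValueError(f"Expected {expected} at position {position}, found {motif[index]}")
    return motif[:index] + replacement + motif[index + 1:]


def key_residues(motif: str) -> dict:
    """Return the residues at positions 2, 5, 13 and the last position."""
    return {"pos2": motif[1], "pos5": motif[4], "pos13": motif[12], "last": motif[-1]}


if __name__ == "__main__":
    for sub in ("V2F", "V2S", "K13Q", "K13S"):
        print(sub, apply_substitution(PARENT_MOTIF, sub))
    print(key_residues(PARENT_MOTIF))
```

Under this assumption, applying V2F, V2S, K13Q, and K13S to the parent motif reproduces the sequences listed as SEQ ID NOs: 5, 9, 13, and 17, respectively. Substitutions at both positions can be combined by calling the function twice.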
[0155] (Constraints in this study) This study can be positioned as an important initial step in reducing off-target effects in PPR-based editors. The findings obtained are applicable to other P-class PPR proteins, regardless of the presence or absence of an effector domain, but there are some constraints. First, since the PPR protein focused on in this study is an RNA editor, PPR-like motifs and a DYW domain are fused to the C-terminal side of the P array, and these may interact with the P array. The precise positioning of the DYW domain on the target base and the nucleic acid recognition mode of the PPR-like motifs may affect RNA recognition by the PPR-P array.
[0156] The second point is that this study only examined a limited number of amino acid mutations based on natural P-class or PLS-class PPR proteins. Because natural PPR motifs exhibit extreme diversity in length and amino acid composition, other amino acid mutations may be more suitable for the synthesis of artificial PPR proteins.
[0157] Therefore, high-throughput research is essential to generalize the findings of this study. The next step requires analysis that covers diversity in PPR domain length, target sequence, and amino acid mutation types. The ultimate goal is to determine one or more amino acid mutation profiles that reduce off-target effects of PPR-P arrays.
[0158] [Methods] (Design of Artificial PPR Protein) Based on previously reported information, the C-terminal domain of the PPR editor used in this study consists of five PPR-like motifs (P2, L2, S2, E1, E2) and a DYW domain (PG or WW) (Non-Patent Literature 7). First, this C-terminal domain was cloned into the mammalian cell expression vector PM18033, which contains the immediate-early promoter of human cytomegalovirus (CMV), a β-globin chimeric intron, and an SV40 polyadenylation signal, using the Golden Gate method with restriction enzyme Esp3I. Next, a PPR-P domain [2] designed to recognize the sequence upstream of the target CTNNB1-T41I editing site was inserted using the Golden Gate method with restriction enzyme BpiI. Amino acid mutations in the PPR editor PG and WW proteins were introduced by site-directed mutagenesis.
[0159] (Mammalian Cell Culture) In this study, human cultured cells HEK293T (RIKEN, RCB2202) were used. Cells were cultured in Dulbecco's modified Eagle medium (DMEM) (Fujifilm Wako Pure Chemical Industries) containing high glucose, L-glutamine, phenol red, and sodium pyruvate, supplemented with 10% fetal bovine serum (Capricorn) and 1% penicillin-streptomycin solution (Fujifilm Wako Pure Chemical Industries). The cells were maintained at 37°C with 5% CO2 in a humidified atmosphere. Cells were subcultured every 2-3 days when they reached 80-90% confluence.
[0160] (Transfection) HEK293T cells were seeded in a 24-well flat-bottom cell culture plate (ThermoFisher) at approximately 8.0 × 10⁴ cells / well 24 hours before transfection. A mixture of 500 ng of plasmid, Opti-MEM (trademark) I low-serum medium (ThermoFisher), and 1.5 μL of FuGENE (trademark) HD transfection reagent (Promega) was prepared at a total volume of 25 μL per well, left to stand at room temperature for 10 minutes, and then added to the cultured cells. After transfection, the cells were incubated at 37°C for 24 or 48 hours.
[0161] (RNA editing activity measurement) Cells harvested after culture were lysed with the 1-Thioglycerol / Homogenization Solution included with the Maxwell® RSC simplyRNA Tissue Kit (Promega), and RNA was extracted using the same kit. cDNA was synthesized from 300 ng of RNA using ReverTra Ace® (Toyobo) and 1.25 μM random hexamer primers. The editing site was amplified by PCR using 1 μL of cDNA as a template with PrimeSTAR® Max DNA Polymerase (Takara Bio) and primers flanking the target site on human mRNA. The PCR product was purified with ExoSAP-IT™ Express PCR Cleanup Reagent (ThermoFisher) and then Sanger sequenced (Azenta) using target gene-specific forward or reverse primers. From the resulting chromatograms, the peak area of each base at the RNA editing site was quantified using EditR software (http://baseeditr.com) [16]. After trimming low-quality data regions (P-value cutoff: 0.01), the C-to-U editing efficiency was calculated as the percentage of the thymidine (T) peak area relative to the sum of the cytidine (C) and thymidine (T) peak areas. Peak area values that were not considered statistically significant due to background noise were treated as 0.
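The efficiency calculation described above can be sketched in Python as follows. This is a minimal illustration only: the function name and the boolean significance flags are hypothetical, and the peak areas and significance calls are assumed to have been obtained beforehand (for example, from EditR output).

```python
def c_to_u_editing_efficiency(c_area: float, t_area: float,
                              c_significant: bool = True,
                              t_significant: bool = True) -> float:
    """Return the T peak area as a percentage of the C + T peak areas.

    Peak areas flagged as not statistically significant are treated as 0,
    following the rule stated in the text.
    """
    c = c_area if c_significant else 0.0
    t = t_area if t_significant else 0.0
    total = c + t
    if total == 0:
        raise ValueError("No significant C or T signal at the editing site")
    return 100.0 * t / total


if __name__ == "__main__":
    # Hypothetical peak areas at one editing site.
    print(c_to_u_editing_efficiency(750.0, 250.0))
    # A T peak judged to be background noise counts as no editing.
    print(c_to_u_editing_efficiency(980.0, 20.0, t_significant=False))
```

The explicit error for a zero denominator is a design choice of this sketch; the text does not state how sites without significant C or T signal were handled.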
[0162] (Off-target analysis by RNA-seq) Total RNA was extracted as described above, and polyadenylated RNA was enriched to prepare a library. Sequencing was performed on a NovaSeq 6000 (Illumina) (Azenta), and the obtained reads were aligned to the reference genome GRCh38.105 using STAR (v2.7.10) [17] (parameters: --quantMode TranscriptomeSAM --outFilterType BySJout --outFilterMultimapNmax 1 --outSAMstrandField intronMotif --outSAMattributes All). Duplicate reads were removed using Picard MarkDuplicates (http://broadinstitute.github.io/picard/). RNA editing candidate sites were detected using REDItools [18] (v1.3) (parameters: -t 17 -e -d -l -U [AG,TC,CT,GA] -G path/to/gtf -p -u -m 30 -T 6-0 -W -v 10 -n 0 -g 2 -s 1). Sites with read counts less than 10 and minimum mapping quality scores less than 30 were removed. Base frequencies were counted at all positions in the transcript sequence. Positions with significant differences in base frequencies compared to the reference were identified using Fisher's exact test and Benjamini-Hochberg correction (p-value < 0.01).
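The statistical step above (Fisher's exact test per site followed by Benjamini-Hochberg correction) can be sketched in pure Python as follows. This is an illustrative sketch, not the pipeline actually used: the per-site tuple layout, the comparison against counts from a control sample, and the function names are assumptions, and only the read-count filter (not the mapping-quality filter) is modeled.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    total = sum(p for p in map(prob, range(lowest, highest + 1))
                if p <= p_observed * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(p_values: list) -> list:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_sites(sites: list, alpha: float = 0.01, min_reads: int = 10) -> list:
    """Return IDs of sites whose adjusted p-value is below alpha.

    Each site is (site_id, sample_alt, sample_ref, control_alt, control_ref),
    a hypothetical layout chosen for this sketch.
    """
    kept = [s for s in sites
            if s[1] + s[2] >= min_reads and s[3] + s[4] >= min_reads]
    p_values = [fisher_exact_two_sided(s[1], s[2], s[3], s[4]) for s in kept]
    adjusted = benjamini_hochberg(p_values)
    return [s[0] for s, q in zip(kept, adjusted) if q < alpha]


if __name__ == "__main__":
    example = [("site_a", 50, 50, 0, 100),
               ("site_b", 1, 99, 0, 100),
               ("site_c", 3, 0, 0, 3)]
    print(significant_sites(example))
```

In practice a library routine such as a SciPy or R implementation would normally be used for both steps; the pure-Python version is shown only to keep the example self-contained.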
[0163] (Cytotoxicity assessment using LDH assay) Cytotoxicity was assessed using the Cytotoxicity LDH Assay Kit-WST (Dojindo) according to the manufacturer's heterogeneous assay protocol. Briefly, HEK293T cells were seeded in collagen-coated 24-well plates (IWAKI) at 1 × 10⁵ cells / well 24 hours before transfection. Cytotoxicity was measured at 24, 48, and 72 hours post-transfection. At each time point, 100 μL of cell culture supernatant was transferred to a 96-well plate and mixed with 100 μL of working solution. The plate was shielded from light and incubated at room temperature for 30 minutes, after which 50 μL of stop solution was added to each well to halt the reaction. Absorbance at 490 nm was measured using a microplate reader. Cells transfected with an empty vector under identical conditions were used as a control, and cell-free wells were used as background. All experiments were repeated three times. Cells remaining on the plate after transfection were used to evaluate editing efficiency and β-catenin protein accumulation (described in the sections "Analysis of in vitro RNA editing by Sanger sequencing" and "Protein analysis of in vitro samples").
[0164] (RNA electrophoretic mobility shift assay (REMSA)) Cy3- or Cy5-labeled synthetic 31-nucleotide RNA oligonucleotides, CTNNB1-T41I (5'-Cy3-UCUGGAAUCCAUUCUGGUGCCACUACCACAG, edited site is underlined. SEQ ID NO.: 54) and SMARCA4-P88L (5'-Cy5-CAUGAGAAGGGCAUGUCGGACGACCCGCGCU, negative control. SEQ ID NO.: 55), were used. RNA probe (10 nmol) and purified protein at specified concentrations (0, 0.5, 1, 2, 5, 10, 20, 50, 100, 200 nM) were mixed in 10 μL of reaction buffer (20 mM Tris-HCl, pH 8.0, 150 mM NaCl, 0.5% NP40, 1 mM EDTA, pH 8.0, 1 mM dithiothreitol) to a final reaction volume of 20 μL. The reaction was incubated at 25°C for 20 minutes, after which 2 μL of 80% glycerol was added. A 10 μL aliquot of each reaction mixture was loaded onto a 5-20% e-PAGEL gel (ATTO), and electrophoresis was performed in 1× TBE buffer at 4°C. The gels were imaged using the iBright imaging system (Invitrogen), and the percentage of bound oligonucleotides was quantified using Image Lab software (Bio-Rad).
[0165] (Statistical Analysis) Two-tailed unpaired t-tests and one-way analysis of variance (ANOVA) were performed using GraphPad Prism 10 software (GraphPad Software, Inc., https://www.graphpad.com/). The significance level was set at p < 0.05 for all statistical tests.
[0166]
10. Coquille S, Filipovska A, Chia T, Rajappa L, Lingford JP, Razif MF, Thore S, Rackham O. An artificial PPR scaffold for programmable RNA recognition. Nat Commun. 2014;5:5729.
11. Miranda RG, Rojas M, Montgomery MP, Gribbin KP, Barkan A. RNA-binding specificity landscape of the pentatricopeptide repeat protein PPR10. RNA. 2017;23(4):586-599.
12. Kobayashi T, Yagi Y, Nakamura T. Comprehensive Prediction of Target RNA Editing Sites for PLS-Class PPR Proteins in Arabidopsis thaliana. Plant Cell Physiol. 2019;60(4):862-874.
13. Maeda A, Takenaka S, Wang T, Frink B, Shikanai T, Takenaka M. The DYW deaminase domain has a distinct preference for neighboring nucleotides of the target RNA editing sites. Plant J. 2022;111:756-767.
14. Abudayyeh OO, Gootenberg JS, Franklin B, Koob J, Kellner MJ, Ladha A, Joung J, Kirchgatterer P, Cox DBT, Zhang F. A cytosine deaminase for programmable single-base RNA editing. Science. 2019;365:382-386.
15. Yagi Y, Teramoto T, Kaieda S, Imai T, Sasaki T, Yagi M, Maekawa N, Nakamura T. Construction of a Versatile, Programmable RNA-Binding Protein Using Designer PPR Proteins and Its Application for Splicing Control in Mammalian Cells. Cells. 2022;11(22):3529.
16. Kluesner MG, et al. EditR: A method to quantify base editing from Sanger sequencing. CRISPR J. 2018;1:239-250.
17. Dobin A, et al. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15-21.
18. Picardi E, Pesole G. REDItools: High-throughput RNA editing detection made easy. Bioinformatics. 2013;29:1813-1814.
[0167]
SEQ ID NOs: 1-4: PPR motif sequences
SEQ ID NOs: 5-20: PPR motif sequences
SEQ ID NOs: 21-30: Target sequences
SEQ ID NO: 31: E2 motif for DYW:PG
SEQ ID NO: 32: E2 motif for DYW:WW
SEQ ID NOs: 33-48: PPR motif sequences
SEQ ID NO: 49: C-terminal domain containing DYW:PG
SEQ ID NO: 50: C-terminal domain containing DYW:WW
SEQ ID NO: 51: PG2
SEQ ID NO: 52: WW2
SEQ ID NO: 53
SEQ ID NO: 54: CTNNB1-T41I (31 bp)
SEQ ID NO: 55: SMARCA4-P88L (31 bp)
SEQ ID NOs: 56-72: Target sequences
Claims
1. A method for modifying a PPR protein, comprising: substituting at least one of the amino acids at positions 2 and 13 in the amino acid sequence of one or more PPR motifs in a PPR protein containing multiple PPR motifs.
2. The method according to claim 1, wherein the amino acid substitution is a substitution of at least one amino acid selected from V2 and K13.
3. The method according to claim 1, wherein the amino acid substitution is at least one substitution selected from V2F, V2S, V2Y, K13Q, and K13S.
4. The method according to any one of claims 1 to 3, wherein the PPR protein has a P domain, and the P domain contains 14 or more PPR motifs.
5. A method for reducing the off-target effects of a PPR protein, comprising the steps of: carrying out the modification method described in any one of claims 1 to 4; and allowing the modified PPR protein obtained to act on a target RNA in a prokaryotic or eukaryotic cell.
6. A polypeptide consisting of any one of the following amino acid sequences:
VFTYNTLIDG LCKAGRLDEA EELLEEMEEK GIKPD (SEQ ID NO:5)
VFTYNTLIDG LCKSGKIEEA LKLFKEMEEK GITPS (SEQ ID NO:6)
VFTYTTLIDG LCKAGKVDEA LELFDEMKER GIKPD (SEQ ID NO:7)
VFTYTTLIDG LCKAGKVDEA LELFKEMRSK GVKPN (SEQ ID NO:8)
VSTYNTLIDG LCKAGRLDEA EELLEEMEEK GIKPD (SEQ ID NO:9)
VSTYNTLIDG LCKSGKIEEA LKLFKEMEEK GITPS (SEQ ID NO:10)
VSTYTTLIDG LCKAGKVDEA LELFDEMKER GIKPD (SEQ ID NO:11)
VSTYTTLIDG LCKAGKVDEA LELFKEMRSK GVKPN (SEQ ID NO:12)
VVTYNTLIDG LCQAGRLDEA EELLEEMEEK GIKPD (SEQ ID NO:13)
VVTYNTLIDG LCQSGKIEEA LKLFKEMEEK GITPS (SEQ ID NO:14)
VVTYTTLIDG LCQAGKVDEA LELFDEMKER GIKPD (SEQ ID NO:15)
VVTYTTLIDG LCQAGKVDEA LELFKEMRSK GVKPN (SEQ ID NO:16)
VVTYNTLIDG LCSAGRLDEA EELLEEMEEK GIKPD (SEQ ID NO:17)
VVTYNTLIDG LCSSGKIEEA LKLFKEMEEK GITPS (SEQ ID NO:18)
VVTYTTLIDG LCSAGKVDEA LELFDEMKER GIKPD (SEQ ID NO:19)
VVTYTTLIDG LCSAGKVDEA LELFKEMRSK GVKPN (SEQ ID NO:20)
7. A PPR protein capable of binding to a target RNA, comprising 10 or more PPR motifs, wherein 2 to 6 of the included PPR motifs are selected from the following: the polypeptide according to claim 6, and a polypeptide comprising a sequence in which at least one substitution selected from V2F, V2S, V2Y, K13Q and K13S is made in the amino acid sequence of a PPR motif, wherein the PPR motif consists of a polypeptide with a total length of 31 to 36 amino acids represented by the following Formula 1, and the amino acids are numbered sequentially as A1, A2, A3, A4...:
(Helix A)-X1-(Helix B)-X2 (Formula 1)
(In Formula 1: Helix A is a portion capable of forming an α-helix structure, consisting of 13 or 14 amino acids in length; X1 consists of 1 to 9 amino acids in length, preferably 1 to 3 amino acids; Helix B is a portion capable of forming an α-helix structure, consisting of 10 to 14 amino acids in length; X2 consists of 1 to 9 amino acids in length, preferably 4 to 9 amino acids, where the C-terminal amino acid in X2 is represented by L.)
The amino acid combination of A5 and L functions for selective binding to RNA bases.
8. The PPR protein according to claim 7, comprising 14 PPR motifs.
9. A PPR protein modified by the method according to any one of claims 1 to 4 and expressed in prokaryotic or eukaryotic cells.
10. A nucleic acid encoding a polypeptide according to claim 6, or a PPR protein according to claim 7 or 8.
11. A vector comprising the nucleic acid described in claim 10.
12. A cell comprising the vector according to claim 11.