Orthogonal tRNA generation
Through computational analysis and structural engineering, orthogonal tRNAs are generated with cloverleaf folding and minimized host identity, addressing the challenge of discovering orthogonal tRNAs for genetic code expansion and enabling efficient non-canonical amino acid incorporation.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- SPINCK MARTIN
- Filing Date
- 2025-12-12
- Publication Date
- 2026-06-18
AI Technical Summary
Existing methods are inadequate for the scalable and reliable discovery of orthogonal transfer RNAs (tRNAs) and their cognate aminoacyl-tRNA synthetases, which are crucial for genetic code expansion and reprogramming, as they rely on rare and challenging-to-identify pre-existing orthogonal tRNAs.
A method involving computational analysis and structural engineering of tRNA sequences to promote cloverleaf folding and minimize host-specific identity elements, combined with directed evolution of synthetases, to generate orthogonal and active tRNA/synthetase pairs de novo.
This approach effectively generates orthogonal tRNAs capable of directing stop codon suppression and incorporating non-canonical amino acids, enhancing the efficiency and reliability of genetic code expansion.
Abstract
Description
[0001] Orthogonal tRNA Generation
[0002] The invention describes a method for generating orthogonal tRNA molecules for assembly of proteins comprising any non-canonical amino acid. The method is based on computational analysis of data derived from thousands of tRNA structures and an understanding of the influence of parts of the tRNA molecule on both activity and orthogonality.
[0003] Introduction
[0004] The discovery of orthogonal transfer RNAs (tRNAs) and their cognate aminoacyl-tRNA synthetases provides a foundation for genetic code expansion and genetic code reprogramming [1-4]. Orthogonal tRNAs must be transcribed and correctly processed in the host cell, and must not be appreciably aminoacylated by endogenous aminoacyl-tRNA synthetases (aaRSs). To be useful for genetic code reprogramming, orthogonal tRNAs must also be directed to an otherwise unassigned codon, commonly through anticodon mutation, and must fold into the correct three-dimensional structure to enable aminoacylation by their cognate aminoacyl-tRNA synthetase. Orthogonal tRNAs have commonly been discovered by ad hoc processes [5-16]. We previously performed a systematic two-step computational and experimental search of 2.8 million tRNA sequences, from diverse bacteria, archaea, chloroplasts, and bacteriophage, for tRNAs that are orthogonal in Escherichia coli [17]. We experimentally characterized 231 of the resulting tRNAs, for which we identified the corresponding synthetase gene. These experiments defined tRNA genes (with their native anticodon sequences) for which no transcript was detected (defined as undetectable), and tRNA genes with a detectable transcript that was (1) aminoacylated by E. coli synthetases (defined as non-orthogonal); (2) not aminoacylated in E. coli, even in the presence of the cognate synthetase gene (defined as orthogonal inactive); or (3) not aminoacylated by E. coli synthetases but aminoacylated by a coexpressed cognate synthetase (defined as orthogonal active).
[0005] The properties of several orthogonal active tRNAs have been altered by anticodon mutation. For example, the orthogonal active tRNAs in the Sorangium cellulosum (Sc) AspRS / ScAsptRNAGUC, Ilumatobacter nonamiensis (In) GlnRS / InGlntRNAUUG, and Coprobacillus sp. D7 (Cs) ProRS / CsProtRNAUGG pairs were converted to orthogonal inactive tRNAs upon mutating their anticodon to CUA [17]. In some cases (for example, ScAsptRNACUA or InGlntRNACUA), we could convert these orthogonal inactive tRNAs into orthogonal active tRNAs by evolving the anticodon recognition of the cognate synthetase. However, in other cases (for example, CsProtRNACUA), we could not reactivate the orthogonal inactive tRNAs by directed evolution. Some orthogonal active tRNAs were instead converted to non-orthogonal tRNAs upon mutating their anticodon to CUA. In some cases (for example, A IRNACUA), we could convert these non-orthogonal tRNAs into orthogonal active tRNAs by directed evolution of the tRNA, followed by directed evolution of the anticodon recognition of the cognate synthetase. However, in other cases (for example, ApHistRNACUA), we did not regenerate orthogonal active tRNAs.
[0006] Generally, methods for genetic code expansion require orthogonal and active aaRS / tRNA pairs as starting points, but few methods exist for discovering orthogonal and active tRNAs in cells. Cervettini et al. [17] have developed a technique for analytically assessing the acylation status of a tRNA, which assists in identifying orthogonal aaRS / tRNA pairs that can be used to incorporate non-canonical amino acids (ncAAs) into proteins. This technique is known as tRNA extension, or tREX. Once an orthogonal tRNA and its corresponding orthogonal aaRS are identified, methods have been developed for modulating the activity and specificity of the aaRS towards non-canonical monomers (ncMs). In a recent development, Dunkelmann et al. (2024, Nature 626:603-610) coupled the tREX acylation determination technique with a split-RNA based system that selects for orthogonal aaRSs which specifically acylate cognate orthogonal tRNAs with amino acids or non-canonical monomers. By providing a library of aaRS genes coupled to tRNA molecules, novel orthogonal aaRS / tRNA pairs can be selected on the basis of the ability of the aaRS to acylate a tRNA with a given substrate. Both the tRNA and the sequence of the candidate aaRS are encoded within the same mRNA construct. This construct is processed in cells to generate stmRNAs, which can then be selectively recovered on the basis of the acylation status of the tRNA portion of the stmRNA. This procedure is known as tRNA display.
[0007] While tREX serves to evaluate the activity and orthogonality of pre-existing tRNAs, and aaRS engineering methods such as tRNA display enable the discovery of aaRS variants for novel ncMs, neither of these techniques can by itself generate new orthogonal tRNAs and cognate aaRSs to serve as starting points for further engineering. Rather, they rely on pre-existing orthogonal tRNAs, which are rare in nature and challenging to identify reliably and scalably. There is therefore a need in the art for a method that can be used for the de novo generation of orthogonal tRNAs and for the modification of existing tRNAs to improve orthogonality and / or activity.
[0008] Summary of the Invention
[0009] According to a first aspect of the present invention, there is provided a method for assessing the functionality of a tRNA sequence in a non-native host, comprising the steps of:
[0010] (a) calculating the predicted minimum free energy (MFE) structure for the sequence; and (b) assessing whether the predicted structure matches the canonical cloverleaf structure of the corresponding isoacceptor class for the tRNA sequence.
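Step (b) can be sketched as a check on the dot-bracket string that a folding program such as RNAfold reports for the MFE structure. The criterion below is a simplified, hypothetical one, not the test used in this document: the structure must contain a single outer (acceptor) stem enclosing exactly three unbranched stem-loops (D, anticodon and T arms). The `n_arms` parameter is an assumed option for isoacceptor classes with a long variable arm, bulges inside the acceptor stem are not tolerated, and all function names are illustrative.

```python
def pair_table(dot_bracket: str) -> dict[int, int]:
    """Map each paired index to its partner; raise on unbalanced input."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at index {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unmatched '('")
    return pairs


def helices_in(pairs: dict[int, int], start: int, end: int) -> list[tuple[int, int]]:
    """Opening pairs of the helices that sit directly inside [start, end)."""
    found = []
    k = start
    while k < end:
        if k in pairs and pairs[k] > k:
            found.append((k, pairs[k]))
            k = pairs[k] + 1  # skip over the whole nested helix
        else:
            k += 1
    return found


def is_unbranched(pairs: dict[int, int], i: int, j: int) -> bool:
    """True if the helix closed by (i, j) is a simple stem-loop (bulges allowed)."""
    inner = helices_in(pairs, i + 1, j)
    if not inner:
        return True
    if len(inner) > 1:
        return False
    return is_unbranched(pairs, *inner[0])


def is_cloverleaf(dot_bracket: str, n_arms: int = 3) -> bool:
    """Hypothetical cloverleaf test: one acceptor stem enclosing n_arms stem-loops."""
    pairs = pair_table(dot_bracket)
    outer = helices_in(pairs, 0, len(dot_bracket))
    if len(outer) != 1:  # exactly one acceptor stem
        return False
    i, j = outer[0]
    # Walk down the stacked acceptor-stem pairs to reach the multiloop.
    while pairs.get(i + 1) == j - 1:
        i, j = i + 1, j - 1
    arms = helices_in(pairs, i + 1, j)
    return len(arms) == n_arms and all(is_unbranched(pairs, a, b) for a, b in arms)


# Idealized 76-nt cloverleaf: acceptor 5', bases 8-9, D arm, base 26,
# anticodon arm, variable loop, T arm, acceptor 3', discriminator + CCA.
IDEAL = ("(((((((" ".." "((((" "........" "))))" "." "(((((" "......."
         ")))))" "....." "(((((" "......." ")))))" ")))))))" "....")
print(is_cloverleaf(IDEAL))               # True
print(is_cloverleaf("((((((....))))))"))  # False: a single hairpin
```

In practice the dot-bracket input would come from RNAfold (or its Python bindings); it is passed in as a string here so that the check stays independent of any particular folding package.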
[0011] Preferably, the tRNA is non-native to bacteria, advantageously E. coli. Preferably, the tRNA is devoid of bacterial identity elements. Preferably, the tRNA is devoid of identity elements found in tRNAs of E. coli.
[0012] We have found that only 40% of (active) native E. coli tRNAs and 38% of all tREX-tested tRNAs (243 tRNAs) are predicted to fold into minimum free energy (MFE) cloverleaf structures; that is, the majority of tRNA sequences in tRNA databases are not predicted to adopt MFE cloverleaf structures. Among orthogonal active tRNAs, however, we found that 80% are predicted to fold into a cloverleaf structure, significantly more than expected (P < 0.001).
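The enrichment reported above can be tested in outline with a two-tailed Fisher's exact test, which is the test named in the legend to Fig. 1. The exact counts behind the percentages are not listed in this section, so the numbers in the example call are hypothetical and only illustrate the computation. The sketch sums the hypergeometric probabilities of every 2x2 table with the same margins that is no more probable than the observed one.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: orthogonal active tRNAs vs. all other tested tRNAs.
    Columns: predicted MFE cloverleaf vs. non-cloverleaf.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x cloverleaf tRNAs in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = prob(x)
        # Count every table no more probable than the observed one; the relative
        # tolerance keeps tables that tie with it despite floating-point rounding.
        if p <= p_observed * (1 + 1e-7):
            p_value += p
    return min(1.0, p_value)


# Hypothetical counts for illustration only (not data from this document):
# 8 of 10 OA tRNAs fold into a cloverleaf vs. 80 of 220 other tRNAs.
print(fisher_exact_two_tailed(8, 2, 80, 140))
```

Where SciPy is available, `scipy.stats.fisher_exact` with `alternative="two-sided"` should give the same two-sided p-value and can be used instead.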
[0013] This suggests that MFE cloverleaf folding is a distinct feature necessary for the activity of orthogonal tRNAs, which may not benefit from host-specific folding mechanisms in the intended host.
[0014] In another embodiment, there is provided a method for converting an inactive orthogonal tRNA to an active orthogonal tRNA, comprising introducing mutations into the sequence of the orthogonal tRNA to promote folding into an MFE cloverleaf structure.
[0015] We have found that fixing the cloverleaf structure of an orthogonal inactive tRNA generated an orthogonal active tRNA capable of directing stop codon suppression through stop codon read-through.
[0016] Identity elements in tRNA have been identified in the literature as specific nucleotide sequences and structural features within a tRNA molecule that enable its recognition by the corresponding aminoacyl-tRNA synthetase enzyme. These elements ensure the correct amino acid is attached to the tRNA, maintaining the fidelity of protein synthesis. Identity elements can include specific base pairs in the acceptor stem, the anticodon loop, and other regions that interact uniquely with the synthetase enzyme. Identity elements can be a single nucleotide, or a set of nucleotides or nucleotide pairs. Preferably, the position of the nucleotide(s) is also defined.
[0017] Identity elements can be identified which are specific for the aaRSs of any given organism, including E. coli. Identity elements have been catalogued in the art [21-33], and further identity elements may be identified and used in the context of the present invention. In a further embodiment, the invention provides a method for converting a non-orthogonal tRNA to an orthogonal tRNA. In the context of the invention, orthogonality is the property of a tRNA of being aminoacylated only by tRNA synthetases that are heterologous to the organism in which the tRNA is used, and not by that organism's endogenous synthetases. Thus, if the host is E. coli, an orthogonal tRNA is not aminoacylated by E. coli tRNA synthetases. Within this definition, an orthogonal tRNA can also be referred to as a non-native or heterologous tRNA.
[0018] We have found that certain elements in tRNAs are permissive elements which allow aminoacylation by host aaRS enzymes, and that although these elements may not be specifically identified, their effects can be overcome by the creation of chimeric tRNAs, in which elements of the sequence are substituted by elements derived from tRNAs of the same isoacceptor class with minimal host identity elements present. Selection of chimeric tRNAs can then be used to identify an orthogonal tRNA.
[0019] Accordingly, the invention provides a method for increasing the orthogonality of a tRNA comprising the steps of creating a plurality of chimeras of a tRNA by substituting sections of the sequence thereof with analogous sections derived from tRNAs of the same isoacceptor class or alternative isoacceptor classes with minimal host identity elements, selecting for positive MFE cloverleaf folding, and screening a select set of chimeric tRNAs for aminoacylation by their native aaRS over the host aaRSs.
[0020] Furthermore, there is provided a method for converting a non-orthogonal tRNA to an orthogonal tRNA, comprising comparing the sequences of chimeric tRNAs identified as orthogonal in the previous embodiment with the sequence of their non-orthogonal parent tRNA, identifying sequence elements that are consistently present or absent in the orthogonal tRNAs, and adding or removing said sequence elements in the tRNA to be converted.
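The comparison step can be sketched as a column-by-column scan of aligned sequences. This is a minimal illustration under stated assumptions: all sequences are already aligned to the same length, `-` marks a deletion, and "consistently present or absent" is taken to mean that every orthogonal chimera carries the same nucleotide (or gap) at a column where the parent differs. The helper names `consistent_changes` and `apply_changes` are hypothetical.

```python
def consistent_changes(parent: str, orthogonal: list[str]) -> dict[int, tuple[str, str]]:
    """1-based alignment columns where all orthogonal chimeras agree with each
    other but differ from the non-orthogonal parent ('-' marks a deletion)."""
    if not orthogonal:
        return {}
    if any(len(seq) != len(parent) for seq in orthogonal):
        raise ValueError("sequences must be aligned to the same length")
    changes: dict[int, tuple[str, str]] = {}
    for col, ref in enumerate(parent, start=1):
        observed = {seq[col - 1] for seq in orthogonal}
        if len(observed) == 1:
            (alt,) = observed
            if alt != ref:
                changes[col] = (ref, alt)
    return changes


def apply_changes(aligned: str, changes: dict[int, tuple[str, str]]) -> str:
    """Transfer the consistent changes onto an aligned tRNA and drop gap columns."""
    seq = list(aligned)
    for col, (_, alt) in changes.items():
        seq[col - 1] = alt
    return "".join(seq).replace("-", "")


# Toy example: both orthogonal chimeras delete column 3 and carry G at column 5.
changes = consistent_changes("ACGUA", ["AC-UG", "AG-UG"])
print(changes)                          # {3: ('G', '-'), 5: ('A', 'G')}
print(apply_changes("ACGUA", changes))  # ACUG
```

Column 2 is ignored because the two chimeras disagree there, which is the sense in which only consistent elements are transferred.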
[0021] In a further embodiment, the invention provides a method for increasing the activity of an orthogonal tRNA / aaRS pair, comprising the steps of creating a repertoire of mutant aaRS by randomizing the positions responsible for anticodon recognition, and selecting the repertoire for synthetase activity in promoting readthrough of a stop codon in a host expression system. Anticodon recognition can limit the activity of a de novo designed tRNA, and this can be corrected by anticodon recognition engineering in the cognate synthetase.
[0022] The foregoing embodiments of the invention establish that orthogonal tRNAs and their cognate synthetases can be effectively engineered by:
- Designing tRNAs that retain identity elements for their cognate synthetases and adopt MFE cloverleaf folding.
[0023] - Removing host identity and permissive elements to prevent mis-acylation by endogenous synthetases.
[0024] - Utilizing directed evolution to enhance synthetase recognition of the tRNA anticodon, thereby improving pair activity.
[0025] On the basis of these principles, we have developed a procedure for generating orthogonal aaRS / tRNA pairs, in which orthogonal tRNAs are developed de novo from sequence parts derived from existing natural tRNAs. According to a first embodiment of this aspect of the invention, there is provided a method which comprises the following steps:
[0026] (a) collecting tRNA sequence data from databases of tRNAs, comprising both a first general tRNA database and a second database of tRNAs which are to serve as starting points for computational re-design into orthogonal and active tRNAs;
[0027] (b) partitioning the tRNA sequences of the second database into parts, identifying identity elements in those parts which are responsible for functionality, and thus dividing the parts into the fixed parts which comprise identity elements, and a repertoire of variable parts which do not;
[0028] (c) selecting alternative sequences from the first database for variable parts identified in step (b), and adding said parts to the repertoire of variable parts to introduce diversity;
[0029] (d) applying filters and calculating identity scores to select the most suitable variable parts;
(e) optionally, clustering the variable parts and selecting the most diverse set of these suitable variable part sequences for chimeric tRNA assembly;
[0030] (f) assembling chimeric tRNAs from fixed parts and the repertoire of variable parts to create a repertoire of chimeric tRNAs, and filtering the chimeric tRNAs based on identity scores calculated by comparing against a database of tRNAs native to the intended host;
[0031] (g) using computational tools, predicting the structural viability of chimeric tRNAs;
[0032] (h) modifying anticodons and ensuring structural viability across all variants;
[0033] (i) preferably, clustering the chimeric tRNA sequences;
[0034] (j) selecting the most diverse set of tRNA sequences for further testing;
(k) generating the final sequences ready for experimental validation; and
[0035] (l) determining the source organisms for the fixed parts in the candidate chimeric tRNAs and providing a tRNA synthetase from said source organism to generate a tRNA synthetase / tRNA pair.
[0036] In the method of the invention, the databases used can be databases of tRNA sequences from any organism, including prokaryotic or eukaryotic organisms and viruses, such as bacteria, archaea, chloroplasts and bacteriophage, or mammals, fungi, yeasts, plants, fish, reptiles and birds; for example, one database of tRNA sequences is the tRNADB-CE database [27], which comprises tRNA sequences derived from prokaryotes, eukaryotes, viruses and chloroplasts. General tRNA databases can contain the sequences of all tRNAs. Databases of tRNAs serving as starting points can consist of a single tRNA or of sets of sequences comprising a plurality of tRNA sequences. Advantageously, the starting-point tRNA database comprises tRNAs that lack undesired identity elements, that is, tRNAs that are more likely to be orthogonal to the intended host. Partitioning the sequences in the databases involves separating each sequence into parts. The parts can be any segments of the tRNA, but in an advantageous embodiment the parts substantially comprise the loops and stems of the tRNA cloverleaf structure, as well as the unpaired bases in the sequence. In one embodiment, referring to canonical tRNA numbering, the parts comprise the following elements: the acceptor stem (1-7 and 66-72), unpaired bases 8 and 9, D-arm (10-13 and 22-25), D / T loops (14-21 and 54-60), unpaired base and variable loop (26 and 44-48), anticodon stem (27-31 and 39-43), anticodon stem loop (32-38), T-arm (49-53 and 61-65), and the discriminator base with CCA (73-76). These parts are defined structurally within the tRNA and can be referred to as the canonical parts.
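The canonical parts listed above can be captured as a table of position ranges, which makes partitioning and chimera assembly mechanical. The sketch below is a hypothetical illustration: it assumes each tRNA has already been aligned to the 76 canonical positions (real tRNAs vary in D-loop and variable-loop length), uses `-` for gaps so that parts carrying deletions can still be combined, and the part names and helpers (`PARTS`, `partition`, `assemble`, `chimeras`) are invented for the example.

```python
from itertools import product

# Canonical parts as 1-based, inclusive ranges in canonical tRNA numbering.
PARTS: dict[str, list[tuple[int, int]]] = {
    "acceptor_stem": [(1, 7), (66, 72)],
    "unpaired_8_9": [(8, 9)],
    "d_arm": [(10, 13), (22, 25)],
    "d_t_loops": [(14, 21), (54, 60)],
    "base26_variable_loop": [(26, 26), (44, 48)],
    "anticodon_stem": [(27, 31), (39, 43)],
    "anticodon_loop": [(32, 38)],
    "t_arm": [(49, 53), (61, 65)],
    "discriminator_cca": [(73, 76)],
}
CANONICAL_LENGTH = 76

Parts = dict[str, tuple[str, ...]]


def partition(aligned: str) -> Parts:
    """Split a tRNA aligned to the 76 canonical positions ('-' = gap) into parts."""
    if len(aligned) != CANONICAL_LENGTH:
        raise ValueError("expected a sequence aligned to 76 canonical positions")
    return {name: tuple(aligned[s - 1:e] for s, e in spans) for name, spans in PARTS.items()}


def assemble(parts: Parts) -> str:
    """Inverse of partition(): order segments by canonical start, then drop gaps."""
    segments = sorted(
        (start, segment)
        for name, spans in PARTS.items()
        for (start, _), segment in zip(spans, parts[name])
    )
    return "".join(segment for _, segment in segments).replace("-", "")


def chimeras(fixed: Parts, variable: dict[str, list[tuple[str, ...]]]):
    """Yield one chimera per combination of variable-part options."""
    names = list(variable)
    for combo in product(*(variable[name] for name in names)):
        yield assemble({**fixed, **dict(zip(names, combo))})
```

Because `assemble` concatenates segments in canonical order rather than writing into fixed slots, a variable part with a deletion simply yields a shorter chimera; the number of chimeras generated is the product of the number of options supplied for each variable part.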
[0037] Identity elements can be identified in the parts as is known in the art [21-33], and their presence used to partition the parts into those which are essential for tRNA function (parts containing identity elements, referred to as the fixed parts) and those which are not (parts that do not contain identity elements, referred to as variable parts). Filters can be applied to sort the variable parts in order to select parts more likely to yield a functional orthogonal tRNA. For example, sequences can be filtered for the presence of only the bases A, C, U and G; for paired sequences being correctly Watson-Crick base paired; for variable loops shorter than 8 nt; and for the presence of conserved sequence elements specific for the desired host. For example, if the target host is E. coli, parts can be filtered for the presence of at least one U in unpaired bases 8-9, and for a D-loop starting with A. The variable and fixed parts can then be assembled into a library of chimeric tRNAs.
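The example filters above translate directly into small predicates. This is a minimal sketch assuming ungapped RNA strings for each part, stem strands written 5' to 3', and strict Watson-Crick pairing (G-U wobble pairs rejected, since the text asks for correct Watson-Crick pairing); the function names are hypothetical.

```python
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def only_acgu(seq: str) -> bool:
    """Reject parts containing anything other than A, C, G and U."""
    return bool(seq) and set(seq) <= set("ACGU")


def watson_crick_paired(five_prime: str, three_prime: str) -> bool:
    """Both stem strands are written 5'->3', so the first nucleotide of the
    5' strand pairs with the last nucleotide of the 3' strand."""
    return len(five_prime) == len(three_prime) and all(
        (a, b) in WC_PAIRS for a, b in zip(five_prime, reversed(three_prime))
    )


def variable_loop_ok(loop: str, max_len: int = 7) -> bool:
    """'Shorter than 8 nt' is read as at most 7 nt."""
    return len(loop) <= max_len


def ecoli_host_filters(unpaired_8_9: str, d_loop: str) -> bool:
    """E. coli examples from the text: at least one U at positions 8-9,
    and a D-loop that starts with A."""
    return "U" in unpaired_8_9 and d_loop.startswith("A")


print(watson_crick_paired("GCGGAUU", "AAUCCGC"))  # True
print(watson_crick_paired("GG", "UC"))            # False: G-U wobble
```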
[0038] Chimeras sharing the largest number of identity elements with tRNA from the intended host are discarded, leaving the tRNAs most likely to be orthogonal.
[0039] Computational folding programs such as RNAfold can be used to predict the minimum free energy (MFE) structure of each sequence; sequences whose MFE structures match the cloverleaf model are selected as the most likely to be active, and the remainder are discarded.
[0040] The anticodon of the chimeric tRNAs can be altered to match the desired anticodon; following alteration, the structural predictions and filtering described above are repeated.
[0041] Clustering may then be applied to the remaining sequences, to generate a set of sequences representative of the tRNA space in the chimeric tRNA database.
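One simple way to pick a small, maximally diverse subset is greedy farthest-point (max-min) selection on Hamming distance. It is shown here as a stand-in for the clustering step, not as the clustering procedure used in this document, and it assumes that the sequences are aligned to equal length. The names `hamming` and `max_min_diverse` are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched columns between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def max_min_diverse(seqs: list[str], k: int) -> list[str]:
    """Greedy farthest-point selection: start from the first sequence, then
    repeatedly add the sequence whose nearest already-chosen neighbour is farthest."""
    if not seqs or k <= 0:
        return []
    chosen = [seqs[0]]
    # Distance from each candidate to its nearest chosen sequence.
    nearest = [hamming(s, seqs[0]) for s in seqs]
    while len(chosen) < min(k, len(seqs)):
        idx = max(range(len(seqs)), key=nearest.__getitem__)
        if nearest[idx] == 0:
            break  # only duplicates of already-chosen sequences remain
        chosen.append(seqs[idx])
        nearest = [min(d, hamming(s, seqs[idx])) for d, s in zip(nearest, seqs)]
    return chosen


print(max_min_diverse(["AAAA", "AAAU", "UUUU", "UUUA", "AAAA"], 2))  # ['AAAA', 'UUUU']
```

The user-defined number of sequences to test corresponds to `k`; each round costs one distance computation per candidate, which stays tractable for the library sizes a filtering pipeline leaves behind.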
[0042] Selected sequences can be analyzed for identity elements present in the fixed parts of the tRNA. Suitable aaRS for the tRNA are more likely to be selected from the source organisms of those fixed parts; therefore, aaRS for the tRNAs selected can be screened from pools derived from the identified source organisms.
[0043] Accordingly, there is provided a method for de novo generation of an aaRS / tRNA pair orthogonal to an intended host, comprising the steps of:
[0044] (a) providing a first database of aligned tRNA sequences and a second database of tRNA sequences;
[0045] (b) partitioning the tRNA sequences in the second database into structural parts;
[0046] (c) separating a first set of structural parts comprising identity elements from a second set of structural parts which do not comprise identity elements, and defining the first set of structural parts as fixed parts for all chimeric tRNA structures to be synthesized;
[0047] (d) defining each part of the second set of structural parts as a variable part, identifying corresponding variable parts in the sequences of the first database optionally in the same isoacceptor class as the sequences in the second database, and sectioning said variable parts with the second set of structural parts;
[0048] (e) filtering the second set of structural part sequences to identify parts likely to assemble into a functional tRNA;
[0049] (f) for each filtered part sequence, comparing the part sequence to a third database of tRNAs from an intended host, and calculating an identity score which reflects the number of shared identity elements;
[0050] (g) selecting the filtered part sequences with the lowest identity scores;
(h) assembling a chimeric tRNA repertoire comprising the fixed parts and the filtered variable part sequences with low identity scores;
[0051] (i) calculating an identity score across the entire sequence of each chimeric tRNA, and selecting the sequences with the lowest identity scores;
[0052] (j) for each chimeric tRNA in the repertoire, determining the minimum free energy structure of the tRNA;
[0053] (k) determining whether the minimum free energy structure matches a cloverleaf configuration, and determining the ensemble diversity of the chimeric tRNA and the frequency of the cloverleaf structure in its structural ensemble;
[0054] (l) discarding sequences which do not have a cloverleaf structure match, or which have a low frequency of the cloverleaf structure or a high ensemble diversity;
[0055] (m) introducing variation into the anticodon sequence of the remaining chimeric tRNAs, and repeating steps (k) and (l);
[0056] (n) calculating average frequency and ensemble diversity across the anticodon variants, and retaining the desired tRNA sequences;
[0057] (o) clustering the retained chimeric tRNA sequences to generate a user-defined number of sequences with maximal sequence diversity, in order to define a finite set of chimeric tRNA sequences to be tested experimentally while maximally representing the sequence space;
[0058] (p) determining the source organism for the fixed identity parts of any one of the chimeric tRNA sequences, and testing said tRNA sequences with candidate synthetases derived from said organism to provide an orthogonal aaRS / tRNA pair.
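Steps (k) to (n) amount to a structural filter followed by averaging over anticodon variants. The sketch below assumes that, for each sequence, a folding tool has already reported whether the MFE structure is a cloverleaf, the frequency of the MFE structure in the ensemble and the ensemble diversity (RNAfold reports the latter two when run with partition-function output). The numerical cutoffs, the rule that every anticodon variant must still fold into a cloverleaf, and all names are hypothetical, since no thresholds are stated here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class FoldSummary:
    cloverleaf: bool           # MFE structure matches the cloverleaf model
    mfe_frequency: float       # frequency of the MFE structure in the ensemble (0-1)
    ensemble_diversity: float  # mean base-pair distance within the ensemble


# Hypothetical cutoffs; the source does not state numerical thresholds.
MIN_MFE_FREQUENCY = 0.2
MAX_ENSEMBLE_DIVERSITY = 10.0


def passes(summary: FoldSummary) -> bool:
    """Steps (k)-(l): keep a single prediction only if it passes every criterion."""
    return (
        summary.cloverleaf
        and summary.mfe_frequency >= MIN_MFE_FREQUENCY
        and summary.ensemble_diversity <= MAX_ENSEMBLE_DIVERSITY
    )


def retain_chimera(variants: dict[str, FoldSummary]) -> tuple[bool, float, float]:
    """Steps (m)-(n): every anticodon variant must still fold into a cloverleaf,
    and the averaged frequency and diversity must meet the cutoffs."""
    if not variants:
        return False, 0.0, float("inf")
    avg_freq = mean(v.mfe_frequency for v in variants.values())
    avg_div = mean(v.ensemble_diversity for v in variants.values())
    keep = (
        all(v.cloverleaf for v in variants.values())
        and avg_freq >= MIN_MFE_FREQUENCY
        and avg_div <= MAX_ENSEMBLE_DIVERSITY
    )
    return keep, avg_freq, avg_div


# Made-up fold summaries for two anticodon variants of one chimera.
variants = {"CUA": FoldSummary(True, 0.45, 4.2), "CGA": FoldSummary(True, 0.31, 5.0)}
print(retain_chimera(variants))
```

The averages returned alongside the keep/discard decision can also be used to rank the retained chimeras before the clustering in step (o).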
[0059] The method of the invention is advantageously a computer-implemented method.
[0060] Preferably, the first database is the tRNADB-CE database [27]. Preferably, the second database is a single tRNA sequence or a set of tRNA sequences having advantageous features for genetic code reprogramming, such as a low number of shared identity elements with the desired host, or a library of tRNA sequences having these desirable qualities, optionally in the same isoacceptor class. The second database may thus be a single tRNA or a set of tRNAs; it need not be aligned or annotated, and is not restricted to being a database as such.
[0061] In one embodiment, the second database may be prepared by:
[0062] 1. providing a database of candidate starting tRNAs for an isoacceptor class of interest;
2. breaking down the tRNAs into the canonical parts and identifying identity elements in the parts;
3. optionally, clustering the candidate tRNAs on the basis of the similarity of their identity elements and their identity parts as a whole, and selecting diverse candidate tRNAs from the different clusters;
[0063] 4. deriving an overall identity score of the candidate tRNAs with respect to the tRNAs of the intended host (e.g. E. coli);
[0064] 5. selecting a subset of tRNAs with minimal identity scores, and using those as starting points for chimeric tRNA generation.
[0065] Preferably, partitioning into parts is carried out as described above, in which the parts comprise the following elements: the acceptor stem (1-7 and 66-72), unpaired bases 8 and 9, D-arm (10-13 and 22-25), D / T loops (14-21 and 54-60), unpaired base and variable loop (26 and 44-48), anticodon stem (27-31 and 39-43), anticodon stem loop (32-38), T-arm (49-53 and 61-65), and the discriminator base with CCA (73-76). Sequences in the databases are aligned and partitioned into the canonical structural parts. Alignment may be carried out by aligning the GG motif in the D-loop of the tRNA.
[0066] The fixed parts are parts which comprise identity elements which are involved in aaRS recognition of the candidate tRNA. Preferably, these identity elements differ from the identity elements found in the intended host. These parts are fixed in the chimeric tRNAs, with the intention of maximizing the chances of obtaining an orthogonal tRNA.
[0067] Fixed parts can be selected on the basis of any desired candidate tRNA. For example, fixed parts can be based on EhTrptRNA, PhTrptRNA or TbTrptRNA. Fixed parts can also be selected on the basis of an exemplar from a selected cluster in a database of candidate tRNAs of the appropriate isoacceptor class.
[0068] An identity score can be generated for each chimeric tRNA part, by assessing the occurrence of identity elements from the intended host in the variable parts. An exemplary identity score is described in the methodology section; however, other calculations may be possible. The cutoff for an identity score for the selected variable parts can vary as appropriate, but for example can be the top 65 sequences for each part, wherein the top sequence has the lowest identity score. A number of sequences should be chosen to produce a computationally feasible number of combinations in the chimeric database combining variable sequences.
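The exemplary identity score is described elsewhere and is not reproduced in this section, so the sketch below uses a deliberately simple, hypothetical stand-in: host identity elements are represented as (canonical position, nucleotide) pairs, and a segment's score is the number of those elements it carries. It then keeps the top-N lowest-scoring candidates (N = 65 in the example above) and counts the resulting combinations.

```python
from math import prod

# Hypothetical representation of host identity elements: (canonical position, nucleotide).
HostElements = set[tuple[int, str]]


def identity_score(segment: str, start: int, host_elements: HostElements) -> int:
    """Count host identity elements present in a segment that begins at
    canonical position `start` (gap characters never match)."""
    return sum((start + i, nt) in host_elements for i, nt in enumerate(segment))


def top_n(candidates: list[str], start: int, host_elements: HostElements, n: int = 65) -> list[str]:
    """Keep the n candidates with the lowest identity score (ties keep input order)."""
    return sorted(candidates, key=lambda s: identity_score(s, start, host_elements))[:n]


def library_size(options_per_part: list[int]) -> int:
    """Number of chimeras when one option is chosen independently per variable part."""
    return prod(options_per_part)


host = {(8, "U"), (9, "A")}  # toy elements at positions 8-9, for illustration only
print(top_n(["UA", "GA", "GC"], 8, host, n=2))  # ['GC', 'GA']
print(library_size([65] * 4))                   # 17850625
```

The last line shows why the per-part cutoff matters: with 65 options for each of four variable parts the library already holds 65^4 = 17,850,625 chimeras, and every additional variable part multiplies that number by 65.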
[0069] The library of chimeric tRNAs can be screened using RNAfold, or any suitable software capable of calculating folding energy, to produce MFE structures. These structures can be examined for cloverleaf folding; non-cloverleaf structures are discarded. Anticodon sequence variants can be generated by selective mutation with all possible bases, leading to a repertoire of anticodon sequences. Once the anticodon has been changed, the structural analysis of the tRNA should be repeated to ensure that the tRNA retains favorable folding characteristics and forms an MFE cloverleaf structure.
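Generating anticodon variants is a substitution at canonical positions 34-36. The anticodon that reads a given codon is its reverse complement, which reproduces the examples used in this document: the amber codon UAG is read by anticodon CUA, and the TCG (UCG) codon by CGA. The sketch assumes a 76-nt sequence in canonical numbering; the helper names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGU", "UGCA")
ANTICODON = slice(33, 36)  # canonical positions 34-36 as a 0-based slice


def anticodon_for(codon: str) -> str:
    """Anticodon (5'->3') that base-pairs with a codon (5'->3'): the reverse complement."""
    return codon.upper().replace("T", "U").translate(COMPLEMENT)[::-1]


def with_anticodon(trna: str, anticodon: str) -> str:
    """Replace positions 34-36 of a 76-nt tRNA in canonical numbering."""
    if len(trna) != 76 or len(anticodon) != 3:
        raise ValueError("expected a 76-nt canonical tRNA and a 3-nt anticodon")
    return trna[:ANTICODON.start] + anticodon + trna[ANTICODON.stop:]


def all_anticodon_variants(trna: str) -> dict[str, str]:
    """All 64 anticodon variants of one tRNA, keyed by anticodon."""
    return {"".join(ac): with_anticodon(trna, "".join(ac)) for ac in product("ACGU", repeat=3)}


print(anticodon_for("TAG"))  # CUA (amber stop codon)
print(anticodon_for("TCG"))  # CGA
```

Each variant would then be refolded and passed through the same cloverleaf check as the parent sequence.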
[0070] In another embodiment, there is provided a data processing system comprising a processor configured to perform the method of the preceding aspects of the invention. Such a system is typically a computer which has access to the required tRNA databases and sufficient user input. Nucleic acid and protein synthesizers, which can produce custom nucleic acids and / or polypeptides, can also be associated with the system and directed by the system to produce tRNA or aaRS molecules for testing.
[0071] In a further embodiment, there is provided a computer program comprising instructions which, when executed by a computer, carry out the method of the preceding aspects of the invention. Further provided is a computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out the method of the preceding aspects of the invention.
[0072] In another aspect, the present invention provides CsProtRNACUAfix, a rationally designed mutant of the Coprobacillus sp. D7 proline tRNA (CsProtRNACUA) that comprises 2-3 point mutations to fix the predicted cloverleaf structure and increase folding stability.
[0073] The mutations convert an orthogonal inactive tRNA into an orthogonal active tRNA in E. coli. CsProtRNACUAfix can incorporate proline at amber stop codons when paired with its cognate synthetase (CsProRS).
[0074] In another aspect, the invention provides ApHistRNACUAfix, a modified version of the Afifella pfennigii DSM 17143 histidine tRNA (ApHistRNACUA). Rationally designed mutations stabilize the cloverleaf structure and increase activity compared to the parent tRNA, although the modified tRNA remains non-orthogonal.
[0075] A library of 192 chimeric tRNA sequences was created by replacing sections of ApHistRNACUAfix with sequences from 12 selected histidine tRNAs. In this library, the variants T2H6, T2H7 and T2H9 demonstrated ApHisRS-dependent read-through of amber stop codons and were identified as orthogonal active tRNAs.
[0076] These tRNAs contained specific deletions in the D-loop and variable loop that contributed to orthogonality by removing permissive elements.
In a further aspect, there is provided the tRNA molecule 1092TrptRNACUA, which comprises fixed identity elements from the Entamoeba histolytica Trp tRNA (EhTrptRNA) and variable parts optimized for minimal E. coli identity elements and robust cloverleaf folding.
[0077] The tRNA is orthogonal and active in E. coli: it is aminoacylated by Pyrococcus horikoshii TrpRS (PhTrpRS) and directs incorporation of tryptophan at amber stop codons, and it was used, together with a mutated PhTrpRS, to incorporate 5-hydroxy-L-tryptophan (an ncAA).
[0078] In a further aspect, the invention provides 1081ArgtRNACUA, which comprises fixed identity elements from the Fusobacterium thermophilum Arg tRNA (FtArgtRNA). This tRNA is orthogonal and active in E. coli; it is aminoacylated by Coprobacillus sp. D7 ArgRS (CsArgRS) and enables incorporation of arginine at amber stop codons.
[0079] In a further aspect, there is provided the tRNA CsProtRNACUAfix (G37A). This tRNA is a spontaneous mutant of CsProtRNACUAfix carrying a G37A mutation. It showed increased activity when paired with evolved CsProRS variants.
[0080] In a further aspect, there are provided anticodon variants of 1092TrptRNA, which carry different anticodons (e.g., CGA). These variants demonstrate that 1092TrptRNA is robust to anticodon mutation, and were used to decode other codons (e.g., TCG codons in E. coli Syn61Δ3ev5).
[0081] In another aspect, the invention provides evolved variants of ApHisRS, the histidyl-tRNA synthetase from Afifella pfennigii DSM 17143; preferably, this synthetase comprises the mutation E97K. An anticodon recognition (ACR) library was created by mutating five key residues involved in anticodon recognition (Ser482, Asp483, Glu484, Gly491 and Arg493), giving a library size of approximately 10^7 mutants. Mutants were selected on the basis of their ability to aminoacylate the engineered tRNA T2H9HistRNACUAfix and to facilitate read-through of an amber stop codon in a chloramphenicol acetyltransferase (CAT) reporter gene. The selected variants showed significantly improved activity with T2H9HistRNACUAfix and maintained orthogonality, as they did not aminoacylate endogenous E. coli tRNAs.
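The quoted library size can be sanity-checked with simple arithmetic. The randomization scheme is not stated in this section, so the sketch assumes, as a hypothetical, that each of the five positions was randomized with NNK codons (32 codons covering all 20 amino acids).

```python
# Five randomized residues in the ACR library (Ser482, Asp483, Glu484, Gly491, Arg493).
RANDOMIZED_SITES = 5
NNK_CODONS = 32   # assumption: NNK degenerate codons (4 * 4 * 2)
AMINO_ACIDS = 20

dna_variants = NNK_CODONS ** RANDOMIZED_SITES       # distinct DNA sequences
protein_variants = AMINO_ACIDS ** RANDOMIZED_SITES  # distinct protein sequences
print(f"{dna_variants:.2e} DNA variants, {protein_variants:.2e} protein variants")
# prints: 3.36e+07 DNA variants, 3.20e+06 protein variants
```

Under this assumption, both the DNA-level and protein-level diversity are of the same order as the stated library size of approximately 10^7 mutants.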
[0082] Also provided are evolved variants of CsProRS (CsProRS v1-v8). CsProRS is the prolyl-tRNA synthetase from Coprobacillus sp. D7. An ACR Library was created by mutating five residues (Glu340, Arg347, Lys370, Glu349, Asp354) critical for anticodon recognition.
[0083] CsProRS v1 showed a fivefold increase in activity with CsProtRNACUAfix.
[0084] Introduction of an additional mutation (S261G) into CsProRS v1 (creating CsProRS v1*) further enhanced activity. CsProRS v2 was selected alongside a spontaneous G37A mutation in CsProtRNACUAfix, leading to even higher activity levels.
[0085] The variants demonstrated improved efficiency in incorporating proline at amber stop codons. The further mutated variants CsProRS v2* and CsProRS v3* comprise mutations, including S261G, introduced into CsProRS v2 and v3 to enhance activity. They showed substantial increases in activity with CsProtRNACUAfix (G37A); CsProRS v2* exhibited a 12-fold increase in activity over the wild-type pair.
[0086] Further provided is the engineered variant PhTrpRS*. PhTrpRS is the tryptophanyl-tRNA synthetase from Pyrococcus horikoshii. The mutations Y78F, T79A, I212G and A214C were introduced to alter substrate specificity. PhTrpRS* is engineered to incorporate 5-hydroxy-L-tryptophan (an ncAA) into proteins and demonstrated the ability to aminoacylate the computationally designed tRNAs 1092TrptRNACUA and 1092TrptRNACGA. Also provided is PhTrpRS* v1, a further evolved mutant prepared by mutating residues involved in anticodon recognition (T283H, R286Q). This mutant showed a 30% increase in activity compared to PhTrpRS* and efficiently incorporated 5-hydroxy-L-tryptophan, with activity levels comparable to established systems.
[0087] The present invention moreover provides orthogonal aaRS / tRNA pairs, including PhTrpRS with 1092TrptRNACUA, which demonstrated successful aminoacylation and incorporation of tryptophan at amber stop codons; and CsArgRS with 1081ArgtRNACUA, which enabled incorporation of arginine in response to amber codons.
[0088] Brief Description of the Figures
[0089] Fig. 1 | Active and orthogonal tRNAs commonly form predicted cloverleaf MFE structures.
[0090] a, Classification of 243 tRNAs into four groups on the basis of previous tREX data [17] that identified whether a tRNA was present in E. coli and whether it was acylated by endogenous synthetases or in the presence of the gene for its cognate synthetase (NO, non-orthogonal; OA, orthogonal active; OI, orthogonal inactive; U, undetectable). Orthogonal tRNAs for which the cognate synthetase could not be identified (unk) were excluded from further analysis.
[0091] For each group, the predicted MFE structure was calculated using RNAfold. We determined by inspection whether the predicted structure matched the expected cloverleaf fold of the corresponding isoacceptor class (yes / no). MFE abundance (percent contribution to the ensemble) and diversity (base pair distance between possible structures within the ensemble) were determined for all tRNAs. b, Percentage of sequences within each group that are predicted to form MFE cloverleaf structures; the corresponding percentage for E. coli tRNAs is also shown. The number of MFE cloverleaf tRNAs and the total number of members of each group are shown. The observed distribution in the OA group diverges significantly (P = 55.2 × 10⁻⁵) from the expected distribution found for all other tested tRNAs. Statistical testing was performed using the two-tailed Fisher’s exact test.
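The comparison above can be set up as a 2 × 2 table, with rows for OA tRNAs versus all other tested tRNAs and columns for cloverleaf versus non-cloverleaf MFE structure. The per-group counts are not restated in this text, so the sketch below shows only the two-tailed Fisher's exact test itself. It implements the standard definition: sum the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed table. It is checked against a textbook table rather than the patent's data. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same row and
    column totals whose probability does not exceed that of the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x: int) -> float:
        # P(top-left cell == x) given the fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so that tables tied with the observed one are counted.
    tail = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, tail)


# Textbook example table, not data from this document.
print(fisher_exact_two_sided(1, 9, 11, 3))  # approximately 0.00276
```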
[0092] Fig. 2 | Fixing the predicted cloverleaf structure of an orthogonal inactive tRNA generates an orthogonal active tRNA, but fixing the cloverleaf structure of a non-orthogonal active tRNA is not sufficient to generate an orthogonal active tRNA.
[0093] a, Frequency-diversity plots for the indicated tRNAs. CsProtRNACUA and ApHistRNACUA, which exhibit weak or no cloverleaf folding, are connected to their structurally fixed derivatives by arrows. Cloverleaf MFE structures are indicated with large dots; non-cloverleaf MFE structures are indicated with small dots. b, Fixing the predicted cloverleaf structure of CsProtRNACUA (an orthogonal inactive tRNA) generates an orthogonal active tRNA, CsProtRNACUAfix. GFP fluorescence was measured in cells containing sfGFP3TAGHis6 and CsProtRNACUA or CsProtRNACUAfix, with or without CsProRS. GFP fluorescence generated from sfGFP3TAGHis6 by the MmPylRS / MmPyltRNACUA pair with 2 mM AllocK under the same conditions was approximately 20,000 a.u. The experiments were performed in three independent replicates. The individual data points are shown as dots, the bars represent mean values and the error bars indicate the standard deviation. c, Fixing the predicted cloverleaf structure of ApHistRNACUA (a non-orthogonal active tRNA) generates a more active non-orthogonal tRNA, ApHistRNACUAfix. GFP fluorescence was measured in cells containing sfGFP3TAGHis6 and ApHistRNACUA or ApHistRNACUAfix, with or without ApHisRS. GFP fluorescence generated from sfGFP3TAGHis6 by the MmPylRS / MmPyltRNACUA pair with 2 mM AllocK under the same conditions was approximately 20,000 a.u. The experiments were performed in three independent replicates. The individual data points are shown as dots, the bars represent mean values and the error bars indicate the standard deviation.
[0094] Fig. 3 | Generating an orthogonal active tRNA from a non-orthogonal active tRNA via CRtM. a, A heterologous tRNA of interest (green), with minimal E. coli identity elements and with identity elements for the cognate synthetase (green circles), is aminoacylated with a canonical amino acid (gray star) by E. coli aminoacyl-tRNA synthetases (gray). This acylation is a consequence of permissive tRNA elements (gray circles) and makes the tRNA non-orthogonal. We generated an orthogonal tRNA through CRtM, which systematically replaces sections of a tRNA with the corresponding sections of other, diverse tRNAs of the same isoacceptor class. This approach aims to remove permissive elements while maintaining an MFE cloverleaf structure. Some chimeric tRNAs no longer have permissive elements but maintain activity with the cognate orthogonal aminoacyl-tRNA synthetase; these tRNAs are orthogonal and active. b, We computationally replaced defined regions of the ApHistRNACUAfix sequence (green) with the corresponding regions from 12 histidine tRNAs with minimal E. coli identity elements (colored). These replacements followed 16 defined replacement schemes and generated 192 chimeric sequences. c, We used RNAfold to predict the frequency and diversity of all 192 chimeric tRNAs. Chimeras that fold into a cloverleaf are indicated by larger circles; all other folds are depicted as small circles. The diversity and frequency of the final set of 12 chimeric tRNAs (T2H1–T2H12) selected for experimental validation are shown in orange. d, GFP fluorescence was measured in cells containing sfGFP3TAGHis6 and the indicated ApHistRNACUAfix derivative, with or without ApHisRS. The experiments were performed in three independent replicates. The individual data points are shown as dots, the bars represent mean values and the error bars indicate the standard deviation. e, MFE structure prediction for T2H9HistRNACUAfix. Base (un)pairing probability is indicated by color.
Sequence changes with respect to ApHistRNACUAfix are indicated by arrows: gray arrows indicate changes unique to T2H9HistRNACUAfix, and black arrows indicate changes found in all orthogonal tRNAs (T2H9, T2H6 and T2H7). Encoding of histidine (green star) was confirmed for T2H6, T2H7 and T2H9 by mass spectrometry.
[0095] Fig. 4 | Directed evolution of synthetase anticodon recognition (ACR) generates synthetases that function with orthogonal tRNAs with activity similar to that of benchmark genetic code expansion systems. a, T2H9HistRNACUAfix and CsProtRNACUAfix contain altered anticodons that may not be efficiently recognized by ApHisRS and CsProRS, respectively. b, Anticodon recognition in the T. thermophilus (Tt)HisRS / TtHistRNAGUG pair (PDB: 4RDX). Anticodon bases are shown in white. Residues in TtHisRS that recognize the anticodon are shown in orange, and the numbering of the corresponding residues in ApHisRS is shown in brackets; these residues were targeted for mutagenesis in the ACR library of ApHisRS. c, Sequences of ApHisRS variants obtained from the ACR library after a single round of selection. d, GFP fluorescence was measured in cells containing T2H9HistRNACUAfix and an sfGFP3TAGHis6 gene. Cells contained the indicated ApHisRS variant: no synthetase (-), ApHisRS (wt) or its evolved variants (v1-3). The dashed line indicates the level of GFP fluorescence generated from sfGFP3TAGHis6 by the MmPylRS / MmPyltRNACUA pair with 2 mM AllocK. The experiments were performed in three independent replicates. The individual data points are shown as dots, the bars represent mean values and the error bars indicate the standard deviation. e, ACR in the TtProRS / TtProtRNAUGG pair (PDB: 1H4Q). Anticodon bases and base G37 are shown in white. Residues in TtProRS that recognize the anticodon are shown in orange, and the numbering of the corresponding residues in CsProRS, where different, is shown in brackets; these residues were targeted for mutagenesis in the ACR library of CsProRS. f, Sequences of CsProRS variants obtained from the ACR library after a single round of selection. The CsProRS v2 and v3 sequences were isolated with CsProtRNACUAfix (G37A), a spontaneous mutant of the tRNA used as an input for the selection.
We also generated variants of the selected CsProRS mutants, indicated with an *, into which we transplanted the S261G mutation (corresponding to EcProRS C443G26). g, GFP fluorescence from sfGFP3TAGHis6 measured in cells containing CsProtRNACUAfix or CsProtRNACUAfix (G37A) and the indicated CsProRS variant: no synthetase (-), CsProRS (wt) or its evolved variants (v1 and v2). The dashed line indicates the level of GFP fluorescence generated from sfGFP3TAGHis6 by the MmPylRS / MmPyltRNACUA pair with 2 mM AllocK. The experiments were performed in three independent replicates. The individual data points are shown as dots, the bars represent mean values and the error bars indicate the standard deviation. PDB, Protein Data Bank; wt, wild type.
[0097] Fig. 5 | Chi-T, a computational algorithm for the de novo design of codon-reassigned, orthogonal tRNAs. a, The Chi-T workflow. b, RNAfold-predicted MFE structure of the E. histolytica tryptophanyl-tRNA with its anticodon (black line) changed from CCA to CUA. Base coloring represents the predicted probability that a base has the pairing status predicted by the MFE structure. c, GFP fluorescence from sfGFP3TAGHis6 measured in cells containing E. histolytica (Eh)TrptRNACUA with no synthetase (-) or with the tryptophanyl-tRNA synthetase from E. histolytica, T. brucei or P. horikoshii. The experiments were performed in three independent replicates. The individual data points are shown as dots, the bars represent mean values and the error bars indicate the standard deviation. d, Selection of TrptRNACUAs that have fixed EhTrptRNA identity parts and minimal E. coli identity elements and that form cloverleaf structures. Sequences whose MFE structure is a cloverleaf are shown as blue dots; all other sequences are shown as red dots. In total, 642 tRNAs were selected (solid blue dots); these had a cloverleaf frequency of at least 40% and a diversity (Δbp) of 9 bp or less. All other sequences are represented by translucent dots. For simplicity of rendering, the graph shows data for 10⁵ randomly chosen chimeras. e, Selection of TrptRNACUAs with fixed EhTrptRNA identity parts that are robust to anticodon mutation. The anticodons of the tRNAs from the selection step shown in d were varied to CUA, UGA and CGA. The 169 chimeric tRNA sequences passing the filters at this step are shown as blue dots in the frequency-diversity plot. f, Multidimensional scaling plot of the relationships among the final 169 EhTrptRNA chimeric sequences. Distances were computed as the Levenshtein distance between sequences, and the sequences were clustered. Sequences are colored by cluster.
Exemplar sequences for each of the 33 clusters are represented by solid triangles; all other sequences are represented by translucent dots. The four final tRNAs (1091, 1092, 1093 and 1094) are circled.
[0098] Multidimensional scaling distances are not linearly correlated with spatial distances on the plot. g, The RNAfold-predicted MFE structures of the indicated tRNAs (sequences in Fig. 8).
Fig. 6 | Characterizing a new orthogonal pair and engineering it for ncAA incorporation. a, GFP fluorescence from sfGFP3TAGHis6 measured in cells containing 1092TrptRNACUA, with or without the P. horikoshii synthetase (PhTrpRS). The experiments were performed in three independent replicates. The individual data points are shown as dots, the bars represent mean values and the error bars indicate the standard deviation. b, Deconvoluted ESI-MS of GFP purified from cells containing sfGFP150TAGHis6, 1092TrptRNACUA and PhTrpRS. Expected mass, sfGFP150Trp: 27,900 Da; observed: 27,900 Da. The experiment was performed in one replicate. c, Structure of 5-hydroxy-L-tryptophan (1). d, GFP fluorescence from sfGFP3TAGHis6 was measured in cells containing 1092TrptRNACUA and the indicated PhTrpRS variant, in the presence and absence of 1. The dashed line indicates the level of GFP fluorescence generated from sfGFP3TAGHis6 by the MmPylRS / MmPyltRNACUA pair with 2 mM AllocK. The experiments were performed in three independent replicates. The individual data points are shown as dots, the bars represent mean values and the error bars indicate the standard deviation. e, Deconvoluted ESI-MS of GFP purified from cells containing
[0099] sfGFP150TAGHis6, 1092TrptRNACUA, PhTrpRS* and 5-hydroxy-L-tryptophan (1). Expected mass, GFP150Trp-OH: 27,915 Da; observed: 27,916 Da. The experiment was performed in one replicate. Exp., expected; Obs., observed; wt, wild type; v, variant.
[0100] Figure 7 | Cloverleaf structure definition.
[0101] a) The canonical tRNA cloverleaf secondary structure, with part types distinguished by colour. Blue indicates the required number of unpaired bases, and red indicates the number of paired bases (‘Base number variation’ and ‘allowed (un)paired’). The variable loop can adopt a number of structures featuring paired and unpaired bases and only requires a minimum of one unpaired base at the 3’ end of the part, i.e. an unpaired base must separate the anticodon stem and the T-arm. b) RNAfold outputs MFE structures in dot-bracket notation, in which unpaired bases are denoted by ‘.’ and paired bases by ‘(’ or ‘)’, depending on whether they pair with a base 3’ or 5’ of themselves, respectively (‘MFE string’). We defined a cloverleaf structure based on the order and number of dots and brackets, and we searched for these structures using a regular expression (‘Universal cloverleaf tRNA search pattern’). In addition, every cloverleaf structure must have a first continuous unpaired region containing at least one uracil (unpaired bases 8-9 in the canonical structure) and a second continuous unpaired region whose first nucleotide is an adenine (nucleotide 14, D-loop, in the canonical structure).
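Fig. 7b defines a cloverleaf in two ways: by the order and number of dots and brackets in the RNAfold MFE string, and by two sequence checks (a U in the first unpaired region and an A at the start of the D-loop). The actual ‘Universal cloverleaf tRNA search pattern’ is not reproduced in the text. The regular expression below is therefore a simplified, hypothetical stand-in. It accepts only simple stem-loop arms without bulges and a variable loop with no paired bases, so class II tRNAs with a long variable arm would be rejected. The pattern and function names are assumptions for illustration.

```python
import re

# Hypothetical, simplified pattern; NOT the pattern used in the source.
CLOVERLEAF = re.compile(
    r"^(?P<acc5>\(+)"      # acceptor stem, 5' strand
    r"(?P<j1>\.+)"         # first unpaired region (canonical bases 8-9)
    r"(?P<d>\(+\.+\)+)"    # D-arm: stem, D-loop, stem
    r"\.*"                 # optional unpaired base(s) (canonical base 26)
    r"(?P<ac>\(+\.+\)+)"   # anticodon arm
    r"\.+"                 # variable loop: at least one unpaired base
    r"(?P<t>\(+\.+)"       # T-stem 5' strand and T-loop
    r"(?P<close>\)+)"      # T-stem 3' strand followed by acceptor stem 3' strand
    r"\.*$"                # unpaired 3' tail
)


def _balanced(segment: str) -> bool:
    return segment.count("(") == segment.count(")")


def is_cloverleaf(seq: str, structure: str) -> bool:
    """Check a dot-bracket MFE string (and its sequence) against the simplified pattern."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    m = CLOVERLEAF.match(structure)
    if m is None:
        return False
    # The D-arm and anticodon arm must close on themselves, and the final run of
    # ')' must close exactly the T-stem plus the acceptor stem.
    if not (_balanced(m["d"]) and _balanced(m["ac"])):
        return False
    if len(m["close"]) != m["t"].count("(") + len(m["acc5"]):
        return False
    rna = seq.upper().replace("T", "U")
    # Sequence checks: a U in the first unpaired region, and an A as the
    # first base of the second unpaired region (the D-loop).
    if "U" not in rna[m.start("j1"):m.end("j1")]:
        return False
    d_loop_start = structure.index(".", m.start("d"))
    return rna[d_loop_start] == "A"
```

For a canonical 76-nt layout (7-bp acceptor stem, bases 8-9 unpaired, 4-bp D-stem with an 8-nt loop, base 26, 5-bp anticodon stem with a 7-nt loop, a 5-nt variable loop, 5-bp T-stem with a 7-nt loop and a 4-nt 3' tail), the D-loop starts at index 13, which is position 14 in 1-based numbering.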
[0102] Figure 8 | Alignment of Chi-T input tRNAs and generated designs.
[0103] The sequences of the cognate tRNAs for the Entamoeba histolytica (Eh)TrpRS, Trypanosoma brucei (Tb)TrpRS and Pyrococcus horikoshii (Ph)TrpRS, and the sequences of the TrptRNAs generated by Chi-T. CUA anticodon variants are shown. a) Sequence alignment of TbTrptRNACUA, EhTrptRNACUA and PhTrptRNACUA. b-d) Sequence alignment of the Chi-T-designed tRNA sequences for all three input tRNAs. Green shading: identity elements for the tryptophanyl-tRNA synthetases; blue: CUA anticodon; red text: differences between the TbTrptRNA / EhTrptRNA identity elements and the PhTrptRNA identity elements; *: conserved bases.
[0104] Detailed Description of the Invention
[0105] In several instances in the prior art, directed evolution of tRNAs or aaRSs has not succeeded in generating active orthogonal tRNAs and tRNA / aaRS pairs. In our previous work17, we could not reactivate orthogonal inactive tRNAs by directed evolution. In addition, some orthogonal active tRNAs became non-orthogonal when their anticodon was mutated to CUA. In some cases (for example, AfTyrtRNACUA), we could convert these non-orthogonal tRNAs into orthogonal active tRNAs by directed evolution of the tRNA, followed by directed evolution of the anticodon recognition of the cognate synthetase. In other cases (for example, ApHistRNACUA), however, we did not regenerate orthogonal active tRNAs.
[0106] We therefore set out to understand why some tRNAs could be made orthogonal and others could not, and why some tRNAs were active rather than inactive.
[0107] Here we show that tRNAs that are orthogonal and active in E. coli are more likely than tRNAs that are non-orthogonal or undetectable in E. coli to fold into a predicted minimum free energy (MFE) structure that is a cloverleaf. We used these insights into the properties of orthogonal tRNAs to create orthogonal active tRNAs from both an orthogonal inactive tRNA (CsProtRNACUA) and a non-orthogonal tRNA (ApHistRNACUA). We developed Chi-T, a computational tool for automatically generating chimeric orthogonal tRNAs. Chi-T generates millions of chimeric isoacceptor tRNA sequences and then filters them to directly identify a small number of diverse sequences that have minimal identity elements for E. coli synthetases and are predicted to fold into a robust cloverleaf. We also developed RS-ID to identify synthetases that may acylate the orthogonal tRNAs discovered through Chi-T. Using this approach, we directly discovered two orthogonal tRNAs: 1092TrptRNACUA, which is active with Pyrococcus horikoshii TrpRS, and 1081ArgtRNACUA, which is active with Capnocytophaga sp. ArgRS. Overall, we generated four new active and orthogonal tRNAs (CsProtRNACUAfix, T2H9HistRNACUAfix, 1081ArgtRNACUA and 1092TrptRNACUA) with altered anticodons that redirect them to the amber stop codon. For three of these pairs, we evolved the anticodon recognition of the cognate synthetase to create pairs that function with an efficiency similar to that of benchmark genetic code expansion systems. We further engineered the PhTrpRS / 1092TrptRNACUA orthogonal pair for non-canonical amino acid (ncAA) incorporation.
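The Fig. 5 caption gives Chi-T's structural filters. A chimera is kept if its predicted MFE structure is a cloverleaf with a frequency of at least 40% and a diversity of at most 9 bp, and it must still pass when its anticodon is set to CUA, UGA and CGA. The minimal sketch below encodes only that two-step structural logic, under stated assumptions. The fold predictor is supplied by the caller (for example, a wrapper around RNAfold, not shown). The same thresholds are assumed to apply at the anticodon step. The identity-element filter, clustering and exemplar selection are omitted. All function names are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

# A fold predictor returns (is_cloverleaf, mfe_frequency_percent, diversity_bp).
FoldFn = Callable[[str], Tuple[bool, float, float]]


def passes_structure_filter(fold: FoldFn, seq: str,
                            min_freq: float = 40.0, max_div: float = 9.0) -> bool:
    """Cloverleaf MFE with frequency >= min_freq (%) and diversity <= max_div (bp)."""
    is_cloverleaf, freq, div = fold(seq)
    return is_cloverleaf and freq >= min_freq and div <= max_div


def with_anticodon(seq: str, ac_slice: slice, anticodon: str) -> str:
    """Return seq with the anticodon positions replaced."""
    return seq[:ac_slice.start] + anticodon + seq[ac_slice.stop:]


def robust_chimeras(chimeras: Iterable[str], fold: FoldFn, ac_slice: slice,
                    anticodons: Tuple[str, ...] = ("CUA", "UGA", "CGA")) -> List[str]:
    """Keep chimeras that pass the structure filter with every listed anticodon."""
    kept = []
    for seq in chimeras:
        variants = [with_anticodon(seq, ac_slice, ac) for ac in anticodons]
        if all(passes_structure_filter(fold, v) for v in variants):
            kept.append(seq)
    return kept
```

Passing the predictor in as a function keeps the filtering logic independent of how folding is computed. This makes the filter easy to test with a toy predictor and to run over very large chimera sets in parallel.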
[0108] Databases
[0109] The present invention makes use of databases of tRNA sequences and structures. Preferred databases for use with the present invention include:
[0110] GtRNAdb (Genomic tRNA Database)
[0111] Offers a comprehensive collection of tRNA genes from various organisms, including bacteria, archaea, and eukaryotes. The database provides detailed annotations and predicted secondary structures using the tRNAscan-SE tool.
[0112] [http://gtrnadb.ucsc.edu/](http://gtrnadb.ucsc.edu/)
[0113] tRNAdb
[0114] A curated database containing tRNA sequences and gene information from a wide range of organisms. It includes details about tRNA modifications and their corresponding enzymes.
[0115] [http://trnadb.bioinf.uni-leipzig.de/](http://trnadb.bioinf.uni-leipzig.de/)
[0116] Sprinzl tRNA Database
[0117] An older but still valuable resource that catalogs tRNA sequences and their characteristics. Although not actively updated, it serves as a historical reference for tRNA research.
[0118] [http://trna.bioinf.uni-leipzig.de/DataOutput/](http://trna.bioinf.uni-leipzig.de/DataOutput/)
MODOMICS
[0119] Focuses on RNA modification pathways, providing information on tRNA sequences along with their chemical modifications and associated enzymes.
[0120] [http://modomics.genesilico.pl/](http://modomics.genesilico.pl/)
Rfam
[0121] A collection of RNA families, including tRNAs, annotated with secondary structures and conserved sequence motifs. Useful for comparative analysis and for identifying RNA elements in genomic sequences. [http://rfam.xfam.org/](http://rfam.xfam.org/)
[0122] GenBank (NCBI)
[0123] A comprehensive nucleotide sequence database that includes tRNA sequences submitted by researchers worldwide. Accessible via the NCBI website.
[0124] [https://www.ncbi.nlm.nih.gov/genbank/](https://www.ncbi.nlm.nih.gov/genbank/)
European Nucleotide Archive (ENA)
[0125] Offers a vast repository of nucleotide sequences, including tRNAs, from a variety of species. Provides tools for sequence retrieval and analysis.
[0126] [https://www.ebi.ac.uk/ena](https://www.ebi.ac.uk/ena)
[0127] tRNAscan-SE Web Server
[0128] While primarily a tool for detecting tRNA genes in genomic sequences, it also provides access to predicted tRNA sequences and annotations. [http://lowelab.ucsc.edu/tRNAscan-SE/](http://lowelab.ucsc.edu/tRNAscan-SE/)
[0129] Ensembl Genomes
[0130] Provides genome annotations for a wide range of species, including tRNA genes. Useful for exploring tRNA sequences in the context of genomic data.
[0131] [https://www.ensembl.org/](https://www.ensembl.org/)
[0132] RNAcentral
[0133] A comprehensive database of non-coding RNA sequences, including tRNAs, aggregated from multiple expert databases. [https://rnacentral.org/](https://rnacentral.org/)
tRNADB-CE
[0134] tRNADB-CE is the preferred database27. It was constructed by analyzing 534 complete prokaryotic genomes, 394 draft genomes in the WGS (Whole Genome Shotgun) division of DDBJ / EMBL / GenBank and approximately 6.2 million DNA fragment sequences obtained from metagenomic analyses, and it provides aligned tRNA sequences.
[0135] [https://trna.ie.niigata-u.ac.jp/](https://trna.ie.niigata-u.ac.jp/)
Cloverleaf Structure
[0136] The tRNA cloverleaf refers to the characteristic secondary structure of transfer RNA (tRNA) molecules, which resembles a cloverleaf when depicted in two dimensions. This structure is important for the tRNA's role in protein synthesis, as it allows the molecule to properly interact with mRNA codons and the ribosome during translation.
[0137] The cloverleaf structure comprises four main arms (the acceptor stem and three stem-loops) and a variable loop:
[0138] 1. Acceptor Stem: Formed by the pairing of the 5' and 3' ends of the tRNA, this stem is where an amino acid is attached to the tRNA's 3' terminal adenine nucleotide. The acceptor stem is crucial for the attachment of the correct amino acid by aminoacyl-tRNA synthetases.
[0139] 2. D-arm: Named after the dihydrouridine residues it contains, the D-arm consists of a stem and loop structure. It plays a role in the proper folding and stabilization of the tRNA molecule and is involved in recognition by aminoacyl-tRNA synthetases.
3. Anticodon Arm: This arm contains the anticodon loop, which holds a specific sequence of three nucleotides (the anticodon) complementary to an mRNA codon. This complementarity ensures that the tRNA delivers the correct amino acid corresponding to the mRNA codon during protein synthesis.
[0140] 4. Variable Loop: The size of this loop varies among different tRNAs and can influence the overall three-dimensional structure of the tRNA. It can range from 3 to 21 nucleotides in length.
[0141] 5. TΨC Arm or T-arm: This arm is characterized by the conserved sequence thymidine (T), pseudouridine (Ψ) and cytidine (C). The TΨC arm is important for the binding of the tRNA to the ribosome and contributes to the proper folding of the tRNA molecule.
[0142] The cloverleaf model provides a simplified representation of the tRNA's complex three-dimensional structure, which is critical for its function during translation. The three-dimensional structure of tRNA has been determined by crystallography, and the capacity of a tRNA sequence to adopt the cloverleaf structure when folded can be predicted.
[0143] Minimum Free Energy
[0144] The minimum free energy of a tRNA structure can be predicted using tools such as RNAfold. RNAfold is a computational tool used for predicting the secondary structure of RNA molecules based on their nucleotide sequences. It is a part of the ViennaRNA Package, a collection of programs and libraries designed for RNA secondary structure prediction and analysis. RNAfold calculates the Minimum Free Energy (MFE) structure of an RNA sequence, which represents the most thermodynamically stable conformation the RNA can adopt under given conditions. The MFE is the lowest possible free energy state of an RNA molecule's secondary structure. It reflects the most stable configuration that the RNA can form due to intramolecular base pairing and stacking interactions. MFE is computed using thermodynamic parameters derived from experimental data. These parameters consider various contributions to the RNA's free energy, including base pairing, loops, bulges, and mismatches. See Lorenz, R., Bernhart, S. H., Höner zu Siederdissen, C., Tafer, H., Flamm, C., Stadler, P. F., & Hofacker, I. L. (2011). ViennaRNA Package 2.0. Algorithms for Molecular Biology, 6, 26.
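RNAfold minimises free energy using a nearest-neighbour thermodynamic model, and that model is too large to reproduce here. The dynamic-programming idea behind secondary-structure prediction can still be illustrated with a much simpler, swapped-in technique: Nussinov base-pair maximisation. This method finds the nested structure with the most Watson-Crick and G-U pairs, with a minimum hairpin loop of three bases. It is explicitly not RNAfold's algorithm and will generally not reproduce RNAfold's MFE structures. The function names are hypothetical.

```python
# Nussinov base-pair maximisation; NOT the nearest-neighbour energy model used by RNAfold.
ALLOWED_PAIRS = {frozenset("AU"), frozenset("GC"), frozenset("GU")}


def can_pair(a: str, b: str) -> bool:
    """Watson-Crick or G-U wobble pair."""
    return frozenset((a, b)) in ALLOWED_PAIRS


def nussinov_fold(seq: str, min_loop: int = 3) -> str:
    """Return a dot-bracket structure with the maximum number of nested base pairs."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                 # case 1: j is unpaired
            for k in range(i, j - min_loop):       # case 2: j pairs with k
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    # Traceback to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if can_pair(seq[k], seq[j]):
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    stack.append((k + 1, j - 1))
                    if k > i:
                        stack.append((i, k - 1))
                    break
    return "".join(structure)


print(nussinov_fold("GGGAAACCC"))  # prints (((...)))
```

RNAfold keeps the same interval-based recursion but scores stacked pairs and loops with experimentally derived free energies rather than counting pairs. This is why it can prefer fewer, better-stacked pairs.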
[0145] We have found that adoption of a cloverleaf fold by the predicted MFE structure of a tRNA correlates with functionality in orthogonal tRNAs; this property is also referred to as structural viability.
[0146] Identity Elements
[0147] Identity elements for prokaryotic and eukaryotic tRNAs have been defined in the art; see Giege33.
[0148] Orthogonality
[0149] As used herein, an “orthogonal” tRNA or tRNA synthetase is functional in a host cell only in conjunction with its cognate synthetase or tRNA, respectively, which is not native to the host cell. Thus, the orthogonal tRNA / synthetase pair functions together and independently of the host tRNA synthetases and tRNA molecules. Orthogonality is expressed with respect to a host cell, which is advantageously a prokaryotic cell, such as a bacterial cell, for example an E. coli cell.
[0150] Functionality
[0151] A functional tRNA / synthetase pair is a pair in which the synthetase can charge the tRNA with the correct amino acid (including an ncAA) or non-amino-acid monomer. Preferably, the functional tRNA can successfully base-pair with its cognate codon on an mRNA molecule.
[0152] We have observed that, in orthogonal tRNAs, functionality is associated with the predicted MFE structure of the tRNA adopting the canonical cloverleaf fold, a property referred to as structural viability. To investigate the relationship among tRNA structure, expression and activity, we calculated (using RNAfold18) the predicted MFE structure for each of the 231 tRNA sequences that we previously investigated17 and for a reference set of 50 native E. coli tRNAs. We then checked whether the structure predicted by RNAfold matched the cloverleaf structure for a tRNA of the corresponding isoacceptor class. We also extracted the frequency (the percentage of the MFE structure in the predicted ensemble) and the diversity (the average base pair distance between possible alternative structures in the ensemble) (Fig. 1a).
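Both quantities are properties of the Boltzmann ensemble of secondary structures. The frequency is the Boltzmann probability of the MFE structure. The diversity is the mean base-pair distance between two structures drawn independently from the ensemble. RNAfold computes them over all possible structures via the partition function when run with `-p`. The sketch below instead uses a small, explicitly listed ensemble with made-up free energies, purely to show the definitions. The toy ensemble and the function names are hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872  # gas constant, kcal/(mol*K)


def base_pairs(structure: str) -> set:
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def frequency_and_diversity(ensemble: dict, temp_c: float = 37.0):
    """ensemble maps dot-bracket structures to free energies (kcal/mol).

    Returns (MFE structure, MFE frequency in %, ensemble diversity in bp).
    """
    rt = R_KCAL_PER_MOL_K * (temp_c + 273.15)
    g_min = min(ensemble.values())
    # Boltzmann weights, shifted by the minimum energy to avoid overflow.
    weights = {s: math.exp(-(g - g_min) / rt) for s, g in ensemble.items()}
    z = sum(weights.values())                       # (shifted) partition function
    prob = {s: w / z for s, w in weights.items()}   # Boltzmann probabilities
    mfe = min(ensemble, key=ensemble.get)
    pairs = {s: base_pairs(s) for s in ensemble}
    # Base-pair distance = number of pairs present in exactly one of the two structures.
    diversity = sum(prob[a] * prob[b] * len(pairs[a] ^ pairs[b])
                    for a in ensemble for b in ensemble)
    return mfe, 100.0 * prob[mfe], diversity
```

A high frequency combined with a low diversity means the ensemble is dominated by one structure. This is the property the fixed tRNAs were designed to have.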
[0153] Approximately 40% of the primary sequences of native E. coli tRNAs were predicted to fold into cloverleaf tRNAs. This provides a benchmark for the percentage of tRNA sequences predicted to fold into cloverleaf structures in the absence of host-specific factors. Thirty-eight percent of the 231 tRNAs in our dataset were also predicted to fold into cloverleaf tRNAs.
[0154] Among primary sequences of non-orthogonal active tRNAs, which are sufficiently similar to E. coli tRNAs to be aminoacylated by endogenous synthetases, 40.2% were predicted to fold into cloverleaf tRNAs. These tRNAs may be similar enough to E. coli tRNAs to benefit from the mechanisms that help fold E. coli tRNAs.
[0155] Therefore, the majority of native tRNA sequences are not, on their own, predicted to adopt a cloverleaf MFE structure.
[0156] Strikingly, approximately 80% of orthogonal active tRNAs are predicted to fold into a cloverleaf structure (Fig. 1b). This is a significantly higher percentage than expected (P < 0.001) and suggests that predicted MFE cloverleaf folding is a distinct feature of active and orthogonal tRNAs. These tRNAs may need the ability to fold into a cloverleaf to be hardwired into their primary sequence, because they may be sufficiently distinct from E. coli tRNAs that they do not benefit from the mechanisms that help fold endogenous tRNAs19,20.
Converting orthogonal tRNAs from inactive to active
[0157] Our analysis prompted us to investigate the effect of introducing mutations predicted to increase the percentage of cloverleaf structures and to decrease the diversity of other structures into two tRNAs: CsProtRNACUA and ApHistRNACUA. The parent tRNAs (CsProtRNAUGG and ApHistRNAGUG) are orthogonal with respect to E. coli synthetases and are aminoacylated by their cognate synthetases (Coprobacillus sp. D7 ProRS and Afifella pfennigii DSM 17143 HisRS); they are, therefore, classed as orthogonal active tRNAs in E. coli. However, CsProtRNACUA does not direct stop codon read-through in E. coli in the presence of the cognate synthetase of its parent tRNA, and we class it as orthogonal inactive. ApHistRNACUA, by contrast, directs stop codon read-through in the absence of the cognate synthetase of its parent tRNA, and we class it as non-orthogonal active.
[0158] CsProtRNACUA has a low percentage of predicted cloverleaf structure in the ensemble, and ApHistRNACUA does not fold into an unambiguous predicted cloverleaf structure. We fixed both tRNA structures into high-frequency / low-diversity predicted cloverleaf structures with 2-3 rationally designed point mutations (Fig. 2). We expressed the derivatives, CsProtRNACUAfix and ApHistRNACUAfix, and measured the ability of each tRNA (with and without the cognate synthetase of the parent tRNA) to read through an amber stop codon at position 3 of GFP, in sfGFP3TAGHis6, and generate a fluorescent signal (Fig. 2b, c).
[0159] CsProtRNACUAfix exhibited a six-fold increase in activity over CsProtRNACUA in the presence of CsProRS and had little activity in the absence of this synthetase. We note that this pair is 10-fold less active than the PylRS / PyltRNACUA pair currently used for genetic code expansion. The encoding of proline with the CsProRS / CsProtRNACUAfix pair was confirmed by mass spectrometry. We conclude that introducing mutations that fix the predicted cloverleaf structure of this tRNA is sufficient to convert it from an orthogonal inactive tRNA to an orthogonal active tRNA in E. coli. This is consistent with our hypothesis that CsProtRNACUA is inactive because of misfolding, although other interpretations are formally possible. ApHistRNACUAfix was more active than ApHistRNACUA but remained non-orthogonal. We conclude that fixing the predicted cloverleaf structure of a non-orthogonal active tRNA is not sufficient to generate an orthogonal active tRNA. Overall, we suggest that fixing the cloverleaf structure of a tRNA has the potential to increase its activity but does not necessarily increase its orthogonality.
[0160] Converting non-orthogonal tRNAs to orthogonal tRNAs
[0161] We demonstrate that ApHistRNACUAfix is aminoacylated with either lysine or glutamine, as previously found for ApHistRNACUA17. ApHistRNACUA and ApHistRNACUAfix do not contain the set of classical identity elements for E. coli lysyl-tRNA or glutaminyl-tRNA synthetases. We therefore hypothesized that unknown sequence elements within these tRNAs were responsible for their aminoacylation by E. coli synthetases. As the location and nature of these permissive elements were unknown, we aimed to diversify the whole sequence of ApHistRNACUAfix by a strategy we refer to as Chimeric Replacement tRNA Mutagenesis (CRtM) (Fig. 3a). This approach aims to generate chimeric sequences that (1) retain a high-frequency MFE cloverleaf structure, (2) cover diverse sequences and (3) minimize the canonical identity elements for E. coli synthetases. We anticipate that CRtM may provide a pool of sequences enriched in active and orthogonal tRNAs.
[0162] We generated chimeras of ApHistRNACUAfix by substituting defined sections of the tRNA body, using 16 replacement schemes (Fig. 3b), with sequences from 12 selected histidyl-tRNAs. These tRNAs were chosen because they have minimal canonical E. coli identity element nucleotides (Supplementary Table 1 in ref. 17). This created 192 chimeric tRNA sequences (16 × 12). We calculated the frequency of the most abundant predicted structure and the diversity of predicted structures for each sequence using RNAfold (Fig. 3c).
[0163] We chose chimeric sequences to characterize such that (1) each of the 12 selected tRNAs was represented, (2) the sequences had a predicted cloverleaf structure as their most abundant species and (3) the predicted cloverleaf folding was unambiguous. For five of the 12 HistRNAs, such a chimeric tRNA was directly generated by one of the replacement schemes. We further optimized the remaining seven chimeric tRNAs by manually introducing point mutations (Fig. 3c) that made the predicted cloverleaf folding less ambiguous or removed potential identity element nucleotides for E. coli synthetases.
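The chimera-generation step of CRtM can be sketched as a product over replacement schemes and donor tRNAs: 16 schemes × 12 donors gives the 192 chimeras described above. This minimal sketch makes several assumptions that the source does not state explicitly. It assumes all sequences are rows of one multiple alignment ('-' marks gaps). It assumes each scheme is a list of alignment-column ranges copied from the donor into the parent. It assumes gaps are removed afterwards. The function name and the toy inputs are hypothetical.

```python
from itertools import product


def make_chimeras(parent: str, donors: dict, schemes: list) -> dict:
    """Build one chimera per (replacement scheme, donor tRNA) combination.

    parent and donors are rows of the same multiple alignment ('-' = gap);
    each scheme is a list of half-open (start, end) alignment-column ranges
    that are copied from the donor into the parent.
    """
    width = len(parent)
    if any(len(seq) != width for seq in donors.values()):
        raise ValueError("all sequences must be aligned to the same width")
    chimeras = {}
    for (scheme_id, regions), (donor_id, donor) in product(enumerate(schemes), donors.items()):
        columns = list(parent)
        for start, end in regions:
            columns[start:end] = donor[start:end]   # copy donor columns, gaps included
        chimeras[(scheme_id, donor_id)] = "".join(columns).replace("-", "")
    return chimeras
```

Each chimera would then be folded and screened for a high-frequency, unambiguous cloverleaf before experimental testing, as described above.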
[0164] The 12 selected and optimized chimeric tRNAs were tested for their ability to mediate read-through of the amber stop codon in sfGFP3TAGHis6 with and without ApHisRS (Fig. 3d). Six of the HistRNACUAs (T2H1, T2H3, T2H4, T2H8, T2H11 and T2H12) exhibited little activity regardless of whether ApHisRS was present. Three tRNAs (T2H2, T2H5 and T2H10) were active and exhibited little change in activity upon addition of ApHisRS; these tRNAs are non-orthogonal and were not considered further. The remaining three tRNAs (T2H6, T2H7 and T2H9) exhibited ApHisRS-dependent read-through of the amber codon (Fig. 3d) and therefore exhibited some orthogonality. Unlike ApHistRNACUAfix, these tRNAs directed the incorporation of histidine in response to the amber stop codon in sfGFP3TAGHis6 when paired with ApHisRS.
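The triage above assigns each tRNA to a class from two read-through measurements: fluorescence without and with the cognate synthetase. The sketch below encodes that logic. The cut-offs (an activity floor in a.u. and a minimum synthetase-dependent fold change) are hypothetical values chosen for illustration, not values given in the source. GFP read-through alone also cannot separate orthogonal inactive from undetectable tRNAs, because that distinction requires measuring the transcript. The function name is hypothetical.

```python
def classify_trna(gfp_without_rs: float, gfp_with_rs: float,
                  active_min: float = 1000.0, fold_min: float = 3.0) -> str:
    """Coarse class from amber read-through with and without the cognate synthetase.

    active_min (a.u.) and fold_min are hypothetical cut-offs, not values from the source.
    """
    if gfp_without_rs >= active_min:
        return "non-orthogonal"        # read-through without the cognate synthetase
    if gfp_with_rs < active_min:
        return "inactive"              # little read-through either way
    if gfp_with_rs >= fold_min * max(gfp_without_rs, 1.0):
        return "orthogonal active"     # synthetase-dependent read-through
    return "ambiguous"                 # active, but without a clear synthetase dependence
```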
[0165] We compared the sequence of ApHistRNACUAfix, which is aminoacylated by E. coli synthetases with lysine or glutamine, with those of the T2H6, T2H7 and T2H9 chimeras, which are selectively aminoacylated by ApHisRS with histidine. All the chimeric tRNAs share a 2-base pair (bp) deletion in the D-loop and have a variable loop that is 1 nucleotide (nt) shorter (Fig. 3e).
[0166] The deleted bases of the D-loop are not canonical identity elements for the E. coli aminoacyl-tRNA synthetases that direct the aminoacylation of their cognate tRNAs with lysine or glutamine21. Instead, the deleted bases are likely ‘permissive elements’ that permit the mis-aminoacylation of these tRNAs by particular E. coli synthetases22,23. Within the context of these chimeric tRNAs, the deletions may act as anti-determinants for mis-aminoacylation by these E. coli synthetases21.
[0167] Our data demonstrate how permissive elements for the mis-aminoacylation of a tRNA, which are much more poorly understood and delineated than identity elements, can be removed through the targeted exploration of tRNA sequence diversity.
[0168] Evolved anticodon recognition enhances pair activity
[0169] The ApHisRS / T2H9HistRNACUAfix pair and the CsProRS / CsProtRNACUAfix pair that we developed exhibited 5-10% of the activity of the PylRS / PyltRNACUA pair commonly used for genetic code expansion. Because the anticodon of their cognate tRNAs is a recognition element for HisRS24 and ProRS25, and this anticodon had been changed to CUA, we hypothesized that the activity of these pairs may be limited by inefficient aminoacylation of their tRNAs. Therefore, we investigated the directed evolution of ApHisRS and CsProRS for improved activity with T2H9HistRNACUAfix and CsProtRNACUAfix, respectively (Fig. 4a).
[0170] The structure of HisRS / HistRNAGUG reveals how the synthetase recognizes the anticodon of its cognate tRNA24 (Fig. 4b). We created an anticodon recognition (ACR) library in ApHisRS by mutating residues Ser482, Asp483, Glu484 (responsible for recognition of G34 in the tRNA), Gly491 (proximal to G34) and Arg493 (which interacts with G36). All five positions were mutated to the 20 canonical amino acids, creating a library of 10⁷ mutants. The ACR library was subjected to selection for the ability to read through an amber stop codon at position 111 of chloramphenicol acetyltransferase (CAT111TAG) and confer resistance to chloramphenicol, when provided with T2H9HistRNACUAfix. After selection, we isolated three distinct ApHisRS variants (v1, v2 and v3; Fig. 4c).
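As a rough check on the quoted library size, the theoretical diversity of a five-position saturation library can be computed at the protein and DNA levels. This is a minimal sketch: the helper names are hypothetical, and the 32-codon default assumes NNK degenerate codons, which the text does not specify (it states only that degenerate primers were used).

```python
"""Theoretical diversity of a five-position saturation-mutagenesis library (illustrative sketch)."""


def protein_diversity(n_positions: int, n_amino_acids: int = 20) -> int:
    """Distinct protein variants when every randomized position samples all amino acids."""
    return n_amino_acids ** n_positions


def dna_diversity(n_positions: int, codons_per_position: int = 32) -> int:
    """Distinct DNA variants; 32 codons per position assumes NNK codons (an assumption)."""
    return codons_per_position ** n_positions


if __name__ == "__main__":
    # Five ACR positions were randomized in both ApHisRS and CsProRS.
    print(f"protein-level variants: {protein_diversity(5):.2e}")  # 3.20e+06
    print(f"DNA-level variants (NNK): {dna_diversity(5):.2e}")    # 3.36e+07
```

Both figures lie within an order of magnitude of the approximately 10⁷ mutants quoted for each ACR library.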
[0171] The ApHisRS(v1-3) / T2H9HistRNACUAfix pairs were approximately seven times more active than the ApHisRS / T2H9HistRNACUAfix pair, as measured by read-through of an amber stop codon at position 3 of sfGFP (sfGFP3TAG; Fig. 4d).
[0172] The structure of ProRS / tRNACGG reveals how the synthetase recognizes the anticodon of its cognate tRNA25 (Fig. 4e). We created an ACR library by targeting residues Glu340, Arg347, Lys370, Glu349 and Asp354 in CsProRS. All five positions were mutated to the 20 canonical amino acids, creating a library of 10⁷ mutants. We subjected the ACR library to selection for the ability to read through the amber stop codon in CAT111TAG and confer chloramphenicol resistance, when provided with CsProtRNACUAfix. From the selection and subsequent screening, we identified eight CsProRS variants (v1-8; Fig. 4f). The CsProRSv1 / CsProtRNACUAfix pair showed a five-fold increase in activity with respect to the CsProRS / CsProtRNACUAfix pair. We introduced an S261G mutation (homologous to the C443G mutation previously introduced into EcProRS26) into CsProRSv1, creating CsProRSv1*. The CsProRSv1* / CsProtRNACUAfix pair was 1.5 times more active than the CsProRSv1 / CsProtRNACUAfix pair.
[0173] During the evolution of CsProRSv2 and CsProRSv3, CsProtRNACUAfix spontaneously acquired a G37A mutation. This mutation is known to improve amber decoding in other tRNAs, including Archaeoglobus fulgidus ProtRNACUA7. We confirmed that CsProtRNACUAfix (G37A) was orthogonal with respect to E. coli aminoacyl-tRNA synthetases (Fig. 4g). The CsProRSv2 / CsProtRNACUAfix (G37A) pair showed a four-fold increase in activity with respect to the CsProRS / CsProtRNACUAfix pair. We introduced the S261G mutation into CsProRSv2, creating CsProRSv2*. The resulting CsProRSv2* / CsProtRNACUAfix (G37A) pair was 12 times more active than the CsProRS / CsProtRNACUAfix pair.
[0174] The activities of the ApHisRS(v1–3) / T2H9HistRNACUAfix and CsProRSv2* / CsProtRNACUAfix (G37A) pairs are similar to the activity of the PylRS / PyltRNACUA pair commonly used for genetic code expansion. We conclude that orthogonal tRNAs and their cognate synthetases can be generated by (1) creating tRNAs that retain the identity elements for their cognate aminoacyl-tRNA synthetases and fold into a cloverleaf; (2) removing host (E. coli) identity elements and permissive elements to create tRNAs that are not aminoacylated by endogenous synthetases; and then (3) evolving the ACR of the cognate synthetase to improve aminoacylation efficiency.
[0175] Chi-T and RS-ID generate orthogonal pairs
[0176] Next, we aimed to combine what we had learned about the properties of orthogonal tRNAs to create an approach for the computational generation of potentially active and orthogonal tRNAs from many more tRNA sequences. We started with the following postulates: (1) sequences that are predicted to fold into unambiguous cloverleaf structures will favor activity in E. coli; (2) recognition of a chimeric tRNA by a synthetase is favored when the identity elements of the cognate tRNA for the synthetase are present in the chimeric tRNA; (3) orthogonality of a cloverleaf tRNA will be determined by its sequence differences from E. coli tRNAs, particularly at identity element nucleotides; and (4) tRNA sequences that minimize permissive elements (which allow non-orthogonal interactions) and inhibitory elements (which abrogate activity) may be discovered by sampling diverse sequences from the same isoacceptor class. These postulates were incorporated into Chi-T, a computational algorithm that automatically generates a user-defined number of chimeric, unambiguous cloverleaf tRNA designs for a target anticodon or anticodons (Fig. 5a).
[0177] Chi-T takes the 10 million annotated and aligned tRNA sequences in tRNADB-CE27 as input and automatically generates chimeric tRNA sequences. These sequences are assembled from nine distinct parts. These parts and their canonical tRNA numbering are as follows: acceptor stem {1-7 + 66-72}, unpaired bases 8-9 {8-9}, D-arm {10-13 + 22-25}, D / T-loops {14-21 + 54-60}, unpaired base and variable loop {26 + 44-48}, anticodon stem {27-31 + 39-43}, anticodon stem loop {32-38}, T-arm {49-53 + 61-65} and the discriminator base with CCA {73-76}. The parts are structurally defined, so that base-pairing and tertiary interactions occur between nucleotide positions within a single part (for example, the D-loop and T-loop, which interact in the folded tRNA, form one part). Chi-T extracts the tRNA part sequences known to contain identity elements21 from a user-defined tRNA gene of interest and fixes these part sequences for all the chimeric tRNA sequences that it generates (Fig. 5a). Chi-T then varies the sequences of the remaining parts, which do not contain known identity elements for the synthetases in the isoacceptor class of the user-defined tRNA. To choose the variable sequences for each part, Chi-T extracts all other tRNA sequences in the same isoacceptor class from the tRNADB-CE database and sections them into parts, which are then filtered based on a number of user-defined criteria. The default filters are as follows: part sequences contain only nucleotides A, C, U and G (some database sequences contain ambiguity codes such as N or R); sequences of paired parts (for example, the D-arm) must be composed of Watson-Crick base pairs; variable loops must be shorter than 8 nt; unpaired bases 8-9 must contain at least one uracil nucleotide; and the D-loop must start with an adenine nucleotide.
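The part layout and default filters above can be sketched as follows. This is a minimal illustration, not the Chi-T source: the part keys, the `half5_half3` string layout for split parts (borrowed from the merged-part example in the database-processing section), and the decision to ignore alignment gaps in the nucleotide check are all assumptions.

```python
"""Chi-T-style part definitions and default part filters (illustrative sketch)."""

# Canonical tRNA numbering of the nine parts, as listed in the text (inclusive ranges).
PARTS = {
    "acceptor_stem": [(1, 7), (66, 72)],
    "unpaired_8_9": [(8, 9)],
    "d_arm": [(10, 13), (22, 25)],
    "d_t_loops": [(14, 21), (54, 60)],
    "unpaired_26_variable_loop": [(26, 26), (44, 48)],
    "anticodon_stem": [(27, 31), (39, 43)],
    "anticodon_stem_loop": [(32, 38)],
    "t_arm": [(49, 53), (61, 65)],
    "discriminator_cca": [(73, 76)],
}
PAIRED_PARTS = {"acceptor_stem", "d_arm", "anticodon_stem", "t_arm"}
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def covered_positions():
    """All canonical positions covered by the nine parts, sorted."""
    return sorted(p for ranges in PARTS.values() for lo, hi in ranges for p in range(lo, hi + 1))


def normalise(seq: str) -> str:
    """Upper-case and convert DNA letters (T, as stored in tRNADB-CE) to RNA letters (U)."""
    return seq.upper().replace("T", "U")


def is_watson_crick(part_seq: str) -> bool:
    """True if the 5' half pairs base-by-base with the reversed 3' half using only A-U/G-C."""
    halves = normalise(part_seq).split("_")
    if len(halves) != 2 or len(halves[0]) != len(halves[1]):
        return False
    five, three = halves
    return all((a, b) in WATSON_CRICK for a, b in zip(five, reversed(three)))


def passes_default_filters(part: str, part_seq: str) -> bool:
    """Apply the default filters described in the text to one candidate part sequence."""
    halves = normalise(part_seq).split("_")
    bases = "".join(halves).replace("-", "")  # alignment gaps ignored (assumption)
    if set(bases) - set("ACGU"):
        return False  # ambiguity codes such as N or R
    if part in PAIRED_PARTS and not is_watson_crick(part_seq):
        return False
    if part == "unpaired_26_variable_loop" and len(halves[-1].replace("-", "")) >= 8:
        return False  # variable loop must be shorter than 8 nt
    if part == "unpaired_8_9" and "U" not in bases:
        return False  # at least one uracil at positions 8-9
    if part == "d_t_loops" and not halves[0].startswith("A"):
        return False  # D-loop must start with adenine
    return True


if __name__ == "__main__":
    print(covered_positions() == list(range(1, 77)))                  # True: parts tile 1-76
    print(passes_default_filters("d_t_loops", "AACTGGCA_TTCGAGC"))    # True
    print(passes_default_filters("acceptor_stem", "GGGGGCT_AGCCCTC")) # False
```

The last call uses the EhTrptRNA acceptor stem quoted later in the text. It fails the strict Watson-Crick check because of its G2·U71 wobble pair. In Chi-T that part is a fixed identity part, so this filter would not be applied to it.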
[0178] Chi-T then compares each filtered part sequence to the corresponding part sequence of all E. coli tRNAs and thereby generates a part identity score; this score reflects the number of identity element nucleotides that each part sequence shares with the corresponding part sequences of all E. coli tRNAs. Part sequences with the lowest part identity scores contain the fewest E. coli identity elements. We therefore predicted that assembling chimeric tRNAs from part sequences with low part identity scores would favor the generation of tRNAs that are not aminoacylated by E. coli synthetases. For each part, Chi-T selects several part sequences (on the order of 10¹–10²) with the lowest part identity scores. The selected part sequences are then clustered by affinity propagation, and Chi-T selects one exemplar part sequence from each cluster with which to generate chimeric tRNAs. These steps aim to maximize both the dissimilarity of parts to their corresponding E. coli part sequences and the sequence diversity covered for each part.
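One way to compute a part identity score and keep the lowest-scoring candidates is sketched below. The text does not give the exact formula, so this sketch assumes the score is a plain count of matching nucleotides at host identity-element indices, summed over all host tRNAs. The `HostTRNA` record and both function names are hypothetical. The subsequent clustering step is not reproduced. scikit-learn's `AffinityPropagation(affinity="precomputed")` accepts a similarity matrix (for example, negated pairwise distances) and is one off-the-shelf option, although the text does not say which implementation Chi-T uses.

```python
"""Hypothetical part identity score and lowest-score selection (illustrative sketch)."""
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class HostTRNA:
    """One host (E. coli) tRNA: part sequences plus identity-element indices per part (hypothetical layout)."""
    name: str
    parts: Dict[str, str]
    identity_idx: Dict[str, Set[int]] = field(default_factory=dict)


def part_identity_score(part: str, candidate: str, host_trnas: List[HostTRNA]) -> int:
    """Count identity-element nucleotides the candidate shares with the same part of every host tRNA."""
    score = 0
    for trna in host_trnas:
        host_seq = trna.parts.get(part, "")
        for i in trna.identity_idx.get(part, set()):
            if i < len(candidate) and i < len(host_seq) and candidate[i] == host_seq[i]:
                score += 1
    return score


def lowest_scoring(part: str, candidates: List[str], host_trnas: List[HostTRNA], k: int) -> List[str]:
    """Keep the k candidates sharing the fewest host identity nucleotides (ties broken alphabetically)."""
    return sorted(candidates, key=lambda s: (part_identity_score(part, s, host_trnas), s))[:k]
```

With real data, `k` would be of the order of 10¹–10², as stated in the text. The EhTrp run below used 65.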
[0179] Chi-T then combines the fixed and variable part sequences to form a library of chimeric tRNA sequences (Fig. 5a). The library is then optionally filtered by length (by default, tRNAs >78 nt are filtered out). The sequence identity score17 is determined for each chimera, and the tRNA sequences with the highest sequence identity scores (sharing the most identity elements with E. coli tRNAs) are discarded. The stringency of this filter is left to the user; however, Chi-T aims to retain about 2.5 million tRNAs to balance low sequence identity scores, sufficient sequence diversity, and computational memory and time.
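Combinatorial assembly with the default length filter can be sketched as a generator over the Cartesian product of per-part options. This sketch assumes the hypothetical part keys and `half5_half3` layout used earlier, with `assemble` interleaving the split halves into 5'→3' order. The sequence identity score filter is omitted because its definition is given in ref. 17, not here.

```python
"""Combinatorial chimera assembly with a length filter (illustrative sketch)."""
from itertools import product
from typing import Dict, Iterator, List


def assemble(p: Dict[str, str]) -> str:
    """Concatenate one sequence per part in 5'->3' order; split parts use 'half5_half3' (assumed layout)."""
    acc5, acc3 = p["acceptor_stem"].split("_")
    d5, d3 = p["d_arm"].split("_")
    d_loop, t_loop = p["d_t_loops"].split("_")
    n26, v_loop = p["unpaired_26_variable_loop"].split("_")
    ac5, ac3 = p["anticodon_stem"].split("_")
    t5, t3 = p["t_arm"].split("_")
    ordered = [acc5, p["unpaired_8_9"], d5, d_loop, d3, n26, ac5, p["anticodon_stem_loop"],
               ac3, v_loop, t5, t_loop, t3, acc3, p["discriminator_cca"]]
    return "".join(ordered).replace("-", "")  # drop alignment gaps


def generate_chimeras(options: Dict[str, List[str]], max_len: int = 78) -> Iterator[str]:
    """Yield every combination of part sequences that passes the length filter (default: <= 78 nt)."""
    names = list(options)
    for combo in product(*(options[n] for n in names)):
        seq = assemble(dict(zip(names, combo)))
        if len(seq) <= max_len:
            yield seq
```

A generator is used because real runs combine millions of part combinations. Streaming them lets later filters discard most sequences without holding the whole library in memory.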
[0180] The resulting library of chimeric tRNA sequences is then computationally folded with RNAfold to obtain the MFE structure, its frequency in the ensemble and the ensemble diversity for each chimeric sequence. Sequences whose MFE structures match the general stable cloverleaf structure, which is defined within Chi-T but can be modified as a user input, are selected (the default cutoffs are diversity <10 Δbp and cloverleaf frequency >30%, but these can be varied within Chi-T). All tRNAs for which the predicted MFE structure is not a cloverleaf, for which the cloverleaf structure forms only a small part of the predicted conformational ensemble (low frequency), or for which the cloverleaf structure is not predicted to be stable (high diversity) are discarded.
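The folding step can be sketched with the ViennaRNA Python bindings, which expose the MFE structure, its ensemble probability (`pr_structure`) and the ensemble diversity (`mean_bp_distance`). The cloverleaf test here is a deliberately crude stand-in for Chi-T's own cloverleaf definition. It requires one enclosing acceptor helix containing exactly three hairpins and is an assumption, not a reproduction of Chi-T. `fold_metrics` needs the `RNA` package installed. The two pure helpers do not.

```python
"""Structure-based filtering of chimeric tRNAs (illustrative sketch)."""
import re
from typing import Tuple


def fold_metrics(seq: str) -> Tuple[str, float, float]:
    """Return (MFE dot-bracket, MFE frequency in ensemble, ensemble diversity) via ViennaRNA."""
    import RNA  # ViennaRNA Python bindings; imported lazily so the helpers below work without it
    fc = RNA.fold_compound(seq.upper().replace("T", "U"))
    structure, mfe = fc.mfe()
    fc.exp_params_rescale(mfe)  # rescale Boltzmann factors before computing the partition function
    fc.pf()
    frequency = fc.pr_structure(structure)  # probability (0-1) of the MFE structure
    diversity = fc.mean_bp_distance()       # ensemble diversity (mean base-pair distance)
    return structure, frequency, diversity


def is_cloverleaf(db: str) -> bool:
    """Crude check: one top-level (acceptor) helix enclosing exactly three hairpin loops."""
    depth, top_level = 0, 0
    for ch in db:
        if ch == "(":
            if depth == 0:
                top_level += 1
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    hairpins = len(re.findall(r"\(\.+\)", db))
    return depth == 0 and db.startswith("(") and top_level == 1 and hairpins == 3


def passes_structure_filter(db: str, frequency: float, diversity: float,
                            min_freq: float = 0.30, max_div: float = 10.0) -> bool:
    """Defaults mirror the text: cloverleaf frequency >30% and diversity <10 delta-bp."""
    return is_cloverleaf(db) and frequency > min_freq and diversity < max_div
```

RNAfold reports the frequency as a percentage, whereas `pr_structure` returns a probability. This is why the 30% cutoff appears here as 0.30.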
[0181] Chi-T then changes the anticodon of the tRNA sequences to each user-defined anticodon in turn and repeats the structural prediction and filtering. The tRNA sequences with an average frequency above, and an average ensemble diversity below, user-defined thresholds (across the defined anticodons) are taken forward. This kind of intrinsic robustness to anticodon reassignment is found in the PylRS / PyltRNACUA pair and its CGA and UGA anticodon variants: all three PyltRNA anticodon variants are predicted by RNAfold to form robust cloverleaf structures, and the resulting pairs can decode their cognate codons3.
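The anticodon-robustness step can be sketched as a loop that swaps in each anticodon, refolds, rejects any non-cloverleaf result, and then averages frequency and diversity. The folding function and cloverleaf test are passed in (for example, the ViennaRNA-based helpers sketched above). The `ac_index` argument (the 0-based index of anticodon position 34 in the assembled sequence) is an assumption of this sketch. The default thresholds follow the EhTrp run described below.

```python
"""Anticodon-robustness filter (illustrative sketch)."""
from statistics import mean
from typing import Callable, Sequence, Tuple

FoldResult = Tuple[str, float, float]  # (dot-bracket structure, MFE frequency, ensemble diversity)


def with_anticodon(seq: str, anticodon: str, ac_index: int) -> str:
    """Return seq with the three anticodon bases starting at 0-based ac_index replaced."""
    return seq[:ac_index] + anticodon + seq[ac_index + 3:]


def robust_to_anticodons(
    seq: str,
    ac_index: int,
    anticodons: Sequence[str],
    fold: Callable[[str], FoldResult],
    is_cloverleaf: Callable[[str], bool],
    min_avg_freq: float = 0.50,
    max_avg_div: float = 8.0,
) -> bool:
    """Keep a tRNA only if every anticodon variant folds as a cloverleaf and the averages pass."""
    freqs, divs = [], []
    for anticodon in anticodons:
        structure, freq, div = fold(with_anticodon(seq, anticodon, ac_index))
        if not is_cloverleaf(structure):
            return False  # misfolds with this anticodon
        freqs.append(freq)
        divs.append(div)
    return mean(freqs) >= min_avg_freq and mean(divs) <= max_avg_div
```

Passing the folding function in as an argument keeps the filter testable without ViennaRNA and makes it possible to swap in a different structure predictor.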
[0182] The resulting chimeric tRNA sequences are then clustered by affinity propagation to generate a set of tRNA sequences that represent the remaining sequence space. A user-defined number of output sequences is then identified for experimental characterization by calculating the group of tRNA sequences that maximize dissimilarity between any pair of output sequences. To identify synthetases to test with the sequences generated by Chi-T, we developed RS-ID. This script takes all tRNA sequences from the same isoacceptor class used by Chi-T and identifies the subset of tRNAs with similar or identical identity element sequences and identity element parts to those of the tRNA used to define the fixed identity parts used in Chi-T. The organisms from which these tRNAs are derived provide candidate synthetases to test for activity with Chi-T-derived tRNAs. Testing several synthetases may increase the chance of finding a synthetase that expresses in a functional form in E. coli and that acylates the chimeras.
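RS-ID's core idea, finding same-isoacceptor tRNAs whose identity parts match those of the query tRNA and collecting their source organisms, can be sketched as follows. The `TRNARecord` layout, the Hamming-distance tolerance and the function names are hypothetical. The text says only "similar or identical" identity element sequences and does not define the similarity measure.

```python
"""RS-ID-style candidate synthetase search (illustrative sketch; record layout is hypothetical)."""
from typing import Dict, List, NamedTuple


class TRNARecord(NamedTuple):
    organism: str
    isoacceptor: str
    parts: Dict[str, str]


def hamming(a: str, b: str) -> int:
    """Mismatches between equal-length strings; unequal lengths count as maximally different."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def candidate_organisms(query: Dict[str, str], records: List[TRNARecord], isoacceptor: str,
                        identity_parts: List[str], max_mismatches: int = 0) -> List[str]:
    """Organisms whose same-isoacceptor tRNAs match the query's identity parts within a tolerance."""
    hits = set()
    for rec in records:
        if rec.isoacceptor != isoacceptor:
            continue
        dist = sum(hamming(query[p], rec.parts.get(p, "")) for p in identity_parts)
        if dist <= max_mismatches:
            hits.add(rec.organism)
    return sorted(hits)
```

Each returned organism points to a synthetase gene that could be tested with the Chi-T chimeras.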
[0183] We tested Chi-T using the non-anticodon identity parts of TrptRNA from Entamoeba histolytica: the acceptor stem part (nucleotides 1-7 GGGGGCT and nucleotides 66-72 AGCCCTC) and the discriminator base with CCA part (nucleotides 73-76 ACCA). These identity parts differ from E. coli TrptRNA at four of the seven non-anticodon identity element positions for Trp isoacceptors (nucleotides 1, 2, 3, 70, 71, 72 and 73) and do not contain a full set of identity element nucleotides recognized by any E. coli synthetase. RNAfold predicts a non-cloverleaf MFE structure for EhTrptRNACUA (Fig. 5b; frequency: 13.3%, diversity: 5.3 Δbp, cloverleaf: no).
[0184] Expression of EhTrptRNACUA in E. coli does not lead to read-through of the amber stop codon in sfGFP150TAGHis6. Combining EhTrpRS, which can be expressed in E. coli28, with EhTrptRNACUA did not lead to enhanced GFP fluorescence from sfGFP150TAGHis6 (Fig. 5c). These observations suggest that EhTrptRNACUA is not produced in a functional, stable form in E. coli.
[0185] To generate chimeric tRNAs with the identity elements from EhTrptRNA, Chi-T first extracted the two parts of EhTrptRNA that contain the non-anticodon identity elements for tryptophanyl-tRNA synthetases from tRNADB-CE and fixed these parts for all chimeric tRNAs. For the remaining seven parts, Chi-T extracted all unique sequences derived from tryptophanyl tRNAs from tRNADB-CE and created a library of variable part sequences for each part.
[0186] Chi-T scored all variable part sequences by their part identity score and selected the 65 best-scoring (lowest part identity score) sequences for each part for clustering by affinity propagation (the number 65 was user-defined to keep the number of chimeras produced by combining variable part sequences computationally feasible). Parts with fewer than 15 unique sequences, namely unpaired bases 8-9 with six unique sequences, were not clustered. Part sequences for all other parts were clustered individually. Cluster exemplars (D-arm: 8 part sequences, D / T-loop: 15 part sequences, unpaired base / variable loop: 14 part sequences, anticodon stem: 11 part sequences, anticodon stem loop: 9 part sequences, T-arm: 9 part sequences), along with the non-clustered variable sequences (unpaired bases 8-9: 6 part sequences) and the fixed identity parts (acceptor stem and discriminator base with CCA), were used for chimera generation.
[0187] Combining the fixed and variable parts, and setting the anticodon to CUA, generated a library of 9.0 × 10⁶ (1 × 6 × 8 × 15 × 14 × 11 × 9 × 9 × 1) chimeric sequences, of which 7.5 × 10⁶ were shorter than 79 nt and were retained. In total, 2.6 × 10⁶ of these sequences passed the sequence identity score filter (sequences with a sequence identity score >0 for any isoacceptor class were discarded). We computationally folded the remaining sequences using RNAfold and discarded sequences whose MFE structures were not cloverleaf. Of the MFE cloverleaf tRNAs, we discarded those with a diversity >9 Δbp or a cloverleaf structure frequency <40%. This step removed more than 99.9% of sequences, leaving 642 chimeric sequences that passed the first round of computational selection (Fig. 5d).
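The library sizes quoted above can be checked directly. The product of the per-part counts gives the combinatorial library size, and comparing the 642 survivors with the approximately 2.6 × 10⁶ folded inputs gives the fraction removed by the structure filter. The dictionary keys are the same hypothetical part names used in the earlier sketches.

```python
"""Arithmetic check of the library sizes quoted for the EhTrp Chi-T run."""

# Number of part sequences carried forward per part (fixed identity parts contribute 1).
PART_COUNTS = {
    "acceptor_stem": 1, "unpaired_8_9": 6, "d_arm": 8, "d_t_loops": 15,
    "unpaired_26_variable_loop": 14, "anticodon_stem": 11, "anticodon_stem_loop": 9,
    "t_arm": 9, "discriminator_cca": 1,
}


def library_size(counts):
    """Product of per-part option counts = number of possible chimeras."""
    size = 1
    for n in counts.values():
        size *= n
    return size


def percent_removed(before, after):
    """Percentage of sequences discarded by a filtering step."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    print(library_size(PART_COUNTS))  # 8981280, i.e. ~9.0 x 10^6
    print(f"{percent_removed(2.6e6, 642):.3f}% removed by the fold filter")  # 99.975%
```

Because 2.6 × 10⁶ is itself a rounded figure, the exact percentage is approximate, but it confirms the "more than 99.9%" statement.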
[0188] We then iteratively changed the anticodon of the chimeric tRNAs to UGA and CGA and removed tRNA sequences predicted to misfold. Requiring robustness to these anticodon changes reduced the number of tRNA sequences to 489 and 361 after successive iterations with the UGA and CGA anticodons, respectively. Finally, the ensemble of anticodon variants of each tRNA was filtered such that any tRNA with an average diversity >8 Δbp or an average frequency <50% was discarded, leaving 169 structurally robust tRNA sequences (Fig. 5e). These filtering thresholds were set so that 10²–10³ sequences were passed on to the next step.
[0189] Chi-T then clustered these 169 tRNA sequences, using affinity propagation, to give 33 cluster exemplars. From these 33 exemplars, a group of four tRNA sequences that maximized the minimum Levenshtein distance between any pair in the four was returned; that is, Chi-T picked the most diverse set by ensuring that no two tRNAs were similar. The Levenshtein distance between any two of the four tRNA sequences in the group was >18 bases (Fig. 5f).
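The final selection step (choosing k sequences that maximize the minimum pairwise Levenshtein distance) can be sketched by brute force. For 33 exemplars and k = 4 there are only 40,920 subsets, so exhaustive search is cheap. Whether Chi-T searches exhaustively or uses a heuristic is not stated, so treat this as one possible implementation.

```python
"""Max-min diverse subset selection by Levenshtein distance (illustrative brute-force sketch)."""
from itertools import combinations
from typing import Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def most_diverse_subset(seqs: Sequence[str], k: int) -> Tuple[Tuple[str, ...], int]:
    """Return the k-subset with the largest minimum pairwise distance, and that distance."""
    if k < 2 or k > len(seqs):
        raise ValueError("k must be between 2 and len(seqs)")
    dist = {(i, j): levenshtein(seqs[i], seqs[j]) for i, j in combinations(range(len(seqs)), 2)}
    best, best_min = None, -1
    for combo in combinations(range(len(seqs)), k):
        m = min(dist[pair] for pair in combinations(combo, 2))
        if m > best_min:
            best, best_min = combo, m
    return tuple(seqs[i] for i in best), best_min
```

Pairwise distances are computed once (528 pairs for 33 sequences) and then reused across all candidate subsets.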
[0190] Using RS-ID, we identified 80 potential tryptophanyl-tRNA synthetases (with cognate tRNAs bearing eight distinct sets of identity parts) to test with the tRNAs output by Chi-T (Fig. 8). We tested three of these synthetases (EhTrpRS, PhTrpRS and TbTrpRS), chosen based on their known expression in E. coli, with the four tRNAs output by Chi-T (Fig. 5g).
[0191] One of the tRNA sequences, 1092TrptRNACUA, produced low fluorescence from sfGFP3TAG, and addition of PhTrpRS led to a 44-fold increase in GFP fluorescence; these experiments demonstrated that 1092TrptRNACUA is an active and orthogonal tRNA (Fig. 6a). Mass spectrometry confirmed that the PhTrpRS / 1092TrptRNACUA pair led to the selective incorporation of Trp in response to the amber codon in sfGFP150TAGHis6 (Fig. 6b).
[0192] We also ran Chi-T with fixed identity parts derived from PhTrptRNA and TbTrptRNA and identified four tRNAs from each run. These tRNAs were not active with EhTrpRS, PhTrpRS or TbTrpRS. Overall, we tested 12 chimeras in combination with three synthetases identified by RS-ID to discover an active and orthogonal tRNA.
[0193] We note that variants of 1092TrptRNACUA in which the identity parts derived from EhTrptRNA were replaced with those from PhTrptRNA or TbTrptRNA were orthogonal and active with PhTrpRS. These experiments suggest that, as expected, the tRNA sequences generated by Chi-T are sensitive to the choice of starting identity parts, but that orthogonal and active tRNAs identified with one set of identity parts may be functional with closely related sets of identity parts.
[0194] In additional experiments, we used Chi-T and RS-ID to generate an active and orthogonal 1081ArgtRNACUA, which is aminoacylated by CsArgRS17. The starting point for discovering this pair was the identity parts from FfArgtRNACUA; this tRNA did not fold into an MFE cloverleaf structure and was not active with FfArgRS or CsArgRS. To identify the active and orthogonal tRNA, we tested four chimeras with FfArgtRNA identity parts, in combination with two synthetases identified by RS-ID. These experiments demonstrated that Chi-T can be used to generate orthogonal and active tRNAs for an additional isoacceptor class.
[0195] The active and orthogonal tRNAs that we discovered via Chi-T contain 28 mutations (1092TrptRNACUA) and 14 mutations (1081ArgtRNACUA) with respect to EhTrptRNA and FfArgtRNA, respectively. It would be exceptionally challenging to discover these sequences by directed evolution, and we suggest that Chi-T enables the exploration of regions of sequence space not commonly accessible by other methods. Overall, using Chi-T and RS-ID, we discovered active and orthogonal tRNAs and cognate synthetases in one out of 36 combinations tested with identity parts from EhTrptRNA, PhTrptRNA and TbTrptRNA, and in one out of eight combinations tested with FfArgtRNA identity parts. We do not expect all combinations of identified synthetases and chimeric tRNAs to function, but we suggest that the sequence space sampled through Chi-T and RS-ID may be substantially enriched in orthogonal and active chimeric tRNAs and in synthetases that may acylate them.
ncAA incorporation with an orthogonal pair from Chi-T / RS-ID
[0196] To demonstrate that the PhTrpRS / 1092TrptRNACUA pair can be engineered to incorporate ncAAs, we introduced mutations into the active site of PhTrpRS (Y78F T79A I212G A214C), creating PhTrpRS*. The analogous amino acid mutations in EcTrpRS29 generate a variant that directs the incorporation of 5-hydroxytryptophan (1; Fig. 6c). Cells containing the PhTrpRS* / 1092TrptRNACUA pair and sfGFP150TAGHis6 exhibited low fluorescence in the absence of 1 and an increase in fluorescence upon addition of 1 (Fig. 6d). Mass spectrometry confirmed that the PhTrpRS* / 1092TrptRNACUA pair directs the incorporation of 1 into GFP (Fig. 6e).
[0197] To increase the activity of the PhTrpRS* / 1092TrptRNACUA pair, we evolved the ACR of PhTrpRS* to accommodate the C35U mutation in 1092TrptRNACUA with respect to PhTrptRNACCA. We identified a variant, PhTrpRS*v1 (Y78F T79A I212G A214C T283H R286Q), whose pair with 1092TrptRNACUA exhibited a 30% increase in activity over the progenitor PhTrpRS* / 1092TrptRNACUA pair. The PhTrpRS*v1 / 1092TrptRNACUA pair with 2 mM 1 produced approximately 80% of the fluorescence produced by the MmPylRS / MmPyltRNACUA pair with 2 mM AllocK, via read-through of the amber codon in sfGFP150TAGHis6 (Fig. 6d). These experiments demonstrate that PhTrpRS and its derivatives are orthogonal in E. coli.
[0198] In additional experiments, we used a PhTrpRS*v1 / 1092TrptRNACGA pair to decode TCG codons in Syn61Δ3(ev5) E. coli. This is consistent with the Chi-T optimization of the 1092TrptRNA cloverleaf structure with a CGA anticodon and demonstrates that PhTrpRS*v1 is tolerant to a single mutation in the anticodon of 1092TrptRNACUA. We also showed that the PhTrpRS*v1 / 1092TrptRNACGA pair is mutually orthogonal in its acylation specificity with respect to the widely used MmPylRS / tRNA pair and the four other pairs developed herein.
[0199] Methodology
[0200] Materials
[0201] Antibiotics and arabinose were obtained from Sigma-Aldrich. 5-Hydroxy-L-tryptophan was purchased from Acros Organics (CAS 4350-09-8, code 148290050). The RNAfold webserver was used to create tRNA secondary structure illustrations34. A local installation of the ViennaRNA Package18 (version 2.4.18) was used to calculate all values given for tRNA diversity and frequency. Molecular graphics and analyses were performed with UCSF Chimera, developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, with support from National Institutes of Health P41-GM103311 (ref. 35). GraphPad Prism version 9 was used to graph all collected data.
[0202] Plasmid generation
[0203] These plasmids were obtained or adapted from the following: pKW1 (ref. 17) (spectinomycin resistance); p15A CAT111TAG sfGFP150TAGHis6 (ref. 17) (tetracycline resistance); p15A 1R26PylRS CAT111TAG sfGFP150TAGHis6 (ref. 15) (tetracycline resistance); p15A sfGFP3TAGHis6 (ref. 3) (apramycin resistance); and pKW1 Methanosarcina mazei pyrrolysyl-tRNA synthetase (MmPylRS) with optimized PyltRNACUA4,36,37. Rationally designed tRNAs (ApHistRNACUAfix and CsProtRNACUAfix) were obtained by QuikChange mutagenesis using the respective pKW-(XxxtRNACUA) and pKW-(tRNAX)-aaRS plasmids as templates38. All Chi-T-generated tRNAs were obtained as duplex oligos (Integrated DNA Technologies (IDT)) with matching overhangs for ligation into a NotI / BglII (New England Biolabs (NEB), FastDigest) digested and gel-extracted pKW1 (tRNAX) vector. The coding sequences for each tryptophanyl-tRNA synthetase (EhTrpRS, TbTrpRS and PhTrpRS) were obtained as codon-optimized and recoded (Syn61) gBlocks (IDT) and inserted into a p15A vector containing CAT111TAG and sfGFP150TAGHis6 by Gibson Assembly (NEB). All synthetase libraries were generated in the pKW-(XxxtRNACUA)-aaRS vector. The EhTrpRS coding sequence was introduced into pKW-1092TrptRNACUA through Gibson Assembly39 (NEB). Mutants of EhTrpRS were generated using a QuikChange protocol. ACR libraries were generated using Gibson Assembly or type IIS restriction digest (BsaI or SapI) and ligation (T4; NEB) of a polymerase chain reaction (PCR) product using degenerate primers targeting the residues of interest. For experiments in Syn61Δ3(ev5) (Addgene, bacterial strain 174514), derivatives of the p15A sfGFP3TCG4TAGHis6 (ref. 3) reporter plasmids (apramycin resistance; Addgene, 174518) and a recoded version of the pKW1 vector (spectinomycin resistance) were used.
[0204] RNAfold analysis of tRNAs
[0205] All tRNA sequences for which tREX measurements were available were collected and folded using a local installation of RNAfold (version 2.4.18)18,34. E. coli K12 tRNA sequences were obtained from the Genomic tRNA Database (GtRNAdb)40. For all sequences, the MFE structure and its diversity and frequency were determined using the default settings of RNAfold. By manual inspection of the MFE structure, we assessed whether a sequence showed a cloverleaf fold. Small deviations from the theoretical optimal structure were accepted as long as the overall structure showed a clearly defined tRNA acceptor stem, D-arm, T-arm and anticodon loop, with the anticodon unpaired.
[0206] Examples of small changes that were accepted include additional base pairing within a loop, pairing with the variable loop or bases 8-9, and changes in line with the cloverleaf structure of tRNAs other than the predicted class. Not accepted were mismatches in the tRNA stems and any pairings between main tRNA domains (stem, D / T-arm or AC loop) or involving the tRNA anticodon. All tRNAs judged to be compliant with these criteria were labeled as predicted cloverleaf tRNAs (cloverleaf: yes), whereas any other structure is referred to as a non-cloverleaf tRNA (cloverleaf: no).
[0207] Rational design of structurally fixed tRNAs
[0208] To stabilize the tRNA structure, we targeted elements of the tRNA body that do not carry identity elements — for example, the variable loop or anticodon stem. We aimed to use these mutations to generate unambiguous cloverleaf folds and minimize the possibility of other structures. The RNAfold prediction for the diversity and frequency was used as a measure of structural ambiguity, and we aimed to find high-frequency / low-diversity cloverleaf structures. In most cases, we found an unambiguous, folded cloverleaf tRNA by introducing 2-3-bp changes.
[0209] Quantification of amber decoding
[0210] Chemically competent E. coli DH10B or Syn61Δ3(ev5) cells (20 µl) containing a transformed p15A GFP reporter plasmid (sfGFP150TAGHis6-CAT111TAG, tetracycline resistant, or sfGFP3TAGHis6, apramycin resistant) were transformed with the pKW1 plasmid (conferring spectinomycin resistance) containing only the tRNA or the tRNA and its cognate synthetase. Syn61Δ3(ev5) was transformed with the reporter plasmid, followed by the pKW1 plasmid. Cells were recovered in 200 µl of SOB medium (1,000 r.p.m., 1 h, 37 °C) before transfer into 2 ml of LB medium containing the appropriate antibiotics for selection. Cells were grown overnight (48 h for Syn61Δ3(ev5)) at 37 °C, 220 r.p.m. Then, 50 µl of each culture was diluted in triplicate into 450 µl of LB medium containing 0.2% L-arabinose and the appropriate antibiotics. Cultures were grown in a 96-well plate with 1.2-ml wells. For synthetases that recognize a non-canonical amino acid, a second triplicate expression containing 2 mM non-canonical amino acid (AllocK for MmPylRS or 5-hydroxytryptophan for PhTrpRS* and its variants) was performed. The wells were sealed with air-permeable foil, and the plate was incubated for 20 h in a shaking incubator at 37 °C, 1,000 r.p.m. Cells were then harvested by centrifugation (3,000g, 10 min); the medium was discarded; and the inverted plate was briefly placed on paper towels (1-2 min). The cell pellets were resuspended in 150 µl of PBS, of which 100 µl was transferred to a clear, flat-bottom, 96-well plate (Nunc96). The optical density at 600 nm (OD600) and GFP fluorescence of each well were measured with a PHERAstar FS (BMG Labtech) plate reader. OD600 was measured using a 600-nm light source, and GFP fluorescence was measured using an optical module with an excitation wavelength of 485 nm and an emission wavelength of 520 nm (gain was set to 0).
Plots show the average GFP fluorescence normalized by OD600 as a bar graph; the individual data points are shown as dots; and the standard deviation of the triplicate measurement is shown.
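The normalization behind these plots can be sketched in a few lines: divide each well's GFP signal by its OD600, then report the mean and standard deviation of the triplicate ratios. This sketch assumes the sample (n − 1) standard deviation and no blank subtraction. The text specifies neither, and the function name is hypothetical.

```python
"""GFP/OD600 normalization of triplicate plate-reader measurements (illustrative sketch)."""
from statistics import mean, stdev
from typing import List, Tuple


def normalised_fluorescence(gfp: List[float], od600: List[float]) -> Tuple[float, float, List[float]]:
    """Return mean, sample standard deviation and per-well GFP/OD600 ratios."""
    if len(gfp) != len(od600):
        raise ValueError("gfp and od600 must contain one value per well")
    ratios = [g / od for g, od in zip(gfp, od600)]
    return mean(ratios), stdev(ratios), ratios
```

The per-well ratios correspond to the dots in the plots, the mean to the bar height and the standard deviation to the error bar.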
[0211] Screening of Chi-T-generated tRNAs by GFP expression
Chemically competent DH10B cells were doubly transformed by heat shock (42 °C, 45 s, 20 µl of cells) with the pKW1-tRNA plasmid and the p15A-synthetase GFP reporter plasmid. Cells were recovered for 1 h (200 µl of SOC medium, 37 °C, 850 r.p.m.). Each transformation was added to a different well of a 96-well plate (Nunc96, 1.2 ml or 2.2 ml), diluted to 1 ml in LB containing tetracycline (10 µg ml-1) and spectinomycin (50 µg ml-1), and grown overnight (37 °C, 300 r.p.m.). Overnight cultures were diluted 10-fold into 450 µl of LB containing selection antibiotics and 0.2% L-arabinose to induce GFP expression and incubated at 37 °C for 20 h. Measurements were performed as described above, and heatmaps show individual measurements of the transformed pool.
[0212] GFP purification and mass spectrometry
[0213] Expression of sfGFP-His6 for purification was identical to that used during tRNA screening. Expression was scaled to larger volumes (0.5 ml to 10 ml of LB) if necessary, and non-canonical amino acids were added at 2 mM. Cells were harvested by centrifugation, and the cell pellets were used directly for purification or stored frozen at −20 °C. The pellet was resuspended in 1 ml of 20 mM Tris-HCl pH 8, 150 mM NaCl containing 1× BugBuster Protein Extraction Reagent and lysed by agitation for 10 min at room temperature. The lysate was cleared by centrifugation (10 min, 15,000g), and Ni-NTA beads (20 µl of slurry) were added to the supernatant. The slurry was incubated with agitation for 1 h at room temperature. The beads were collected using a fritted spin filter (300g, 10 s) and washed three times with 1 ml of wash buffer (20 mM Tris-HCl pH 8, 150 mM NaCl, 40 mM imidazole). GFP was eluted in 50 µl of elution buffer (20 mM Tris-HCl pH 8, 150 mM NaCl, 200 mM imidazole). The elution buffer was exchanged for 20 mM Tris-HCl pH 8, 150 mM NaCl using a 10-kDa spin concentrator. High-resolution mass spectra of GFP were obtained by electrospray ionization mass spectrometry (ESI-MS) using a Waters Xevo G2 MS with a modified nanoAcquity LC system, as previously reported3,17. In brief, injected proteins were separated on a BEH C4 UPLC column (1.7 µm; 1.0 × 100 mm; Waters) at a flow rate of 50 µl min-1 using an acetonitrile gradient from 2% v / v to 80% v / v (0.1% v / v formic acid) over 20 min. The column outlet was directly interfaced via an ESI source with a hybrid quadrupole time-of-flight mass spectrometer (Waters). A cone voltage of 30 V was used during data acquisition in positive ion mode with a range of 300-2,000 m/z. The scans were deconvoluted using the MaxEnt1 function within MassLynx software (Waters).
Spectra were also obtained using an Agilent 1200 liquid chromatography-mass spectrometry (LC-MS) system equipped with a 6130 Quadrupole spectrometer. Samples (10 µl) were applied to a Phenomenex Jupiter C4 column (150 × 2 mm, 5 µm), and a gradient of Buffer A (0.2% formic acid in water) and Buffer B (0.2% formic acid in acetonitrile (MeCN)) from 10% to 90% B over 6 min was used for reverse-phase high-performance liquid chromatography (HPLC). Mass spectra were acquired in positive mode and analyzed with MS ChemStation software (Agilent Technologies). The deconvolution program provided in the software was used to obtain the entire mass spectra. The expected, theoretical mass was calculated using ProtParam (Expasy) and adjusted for the mass difference expected for a given non-canonical amino acid and for GFP maturation.
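The mass adjustment described above can be sketched as simple arithmetic on average masses: swap the residue at the amber site, then subtract the loss associated with chromophore maturation. The constants are assumptions supplied for illustration. They are the standard average residue mass of Trp, the average atomic mass of oxygen (5-hydroxytryptophan carries one extra O), and the commonly cited loss of about 20 Da (one H₂O plus one H₂) on GFP chromophore formation. The starting mass would come from ProtParam, and the function name is hypothetical.

```python
"""Expected intact GFP mass after amber suppression (illustrative sketch; constants are assumptions)."""

TRP_RESIDUE_AVG = 186.2132                      # average residue mass of Trp (C11H10N2O), Da
OXYGEN_AVG = 15.9994                            # average atomic mass of O, Da
HTP_RESIDUE_AVG = TRP_RESIDUE_AVG + OXYGEN_AVG  # 5-hydroxytryptophan residue (C11H10N2O2)
CHROMOPHORE_LOSS = 20.03                        # approx. H2O + H2 lost on GFP chromophore maturation


def expected_mass(unmatured_avg_mass: float, wt_residue: float, new_residue: float,
                  mature: bool = True) -> float:
    """Swap the residue at the amber site; optionally subtract the chromophore maturation loss."""
    mass = unmatured_avg_mass - wt_residue + new_residue
    return mass - CHROMOPHORE_LOSS if mature else mass
```

Comparing a Trp-containing and a 5-hydroxytryptophan-containing GFP with the same backbone, the expected masses differ by one oxygen atom (about 16 Da). This is the shift that distinguishes incorporation of 1 from incorporation of Trp.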
[0214] ACR selections
[0215] The ACR libraries were transformed into freshly made electrocompetent cells. The electrocompetent cells were prepared from an overnight culture of DH10B containing the p15A- CAT111TAG -sfGFP150TAGHis6selection plasmid. The overnight culture was diluted 1:100 in LB containing 10 µg ml-1 tetracycline and grown to OD 0.3 to 0.5 at 37 °C. The cells were harvested by centrifugation and washed three times with ice-cold water. After the final wash, all cells were resuspended in 200 pl of water and electroporated with 4 pg of the ACR library plasmid (4 x 1 pg in 50 pl of competent cells) using standard conditions for 2-mm electroporation cuvettes (2,500 V). One milliliter of SOB medium was added immediately after electroporation, and the cells were left shaking at 37 °C for 1 h to recover. A dilution series (10“4to 10“7) was plated on selective plates to investigate library coverage (>108transformants). All cells were grown overnight and diluted 1:20 into selective media (20 ml, 2 mM 1 added for P / ? TrpRS*) on the following day. Once cells reached exponential phase (OD600> 0.5), they were harvested, and approximately 108cells were plated onto Cm selection plates (150 pg ml-1 chloramphenicol, 0.2% arabinose, 50 µg ml-1spectinomycin, 10 µg ml-1 tetracycline and 2 mM 1 in case of Ph TrpRS*, 25 × 25 cm). The plates were incubated at 37 °C for 20 h. Generally, plasmid was isolated from colonies and retransformed into chemically competent DH10B containing the p15A sfGFP3TAGHis6(apramycin resistance), and GFP expression was tested. For all clones showing enhanced activity, the plasmid was isolated again from the retransformed colonies and sequenced to confirm that the orthogonal tRNA was intact and identify mutations in the synthetase. The order of these steps was changed depending on the selection outcome. 
Clones obtained from the Ph TrpRS* ACR selection were pre-screened for 1-dependent GFP expression before plasmid isolation, because of the large number of escape mutants arising through tRNA mutation.
[0216] Chi-T
[0217] Chi-T version 1 and version 1.1 were written in Python 3.7 to generate, filter and select chimeric tRNAs; both versions are publicly available in a GitHub repository (https://github.com/JWChin-Lab/). The following subsections were implemented within Chi-T: ‘tRNA database processing’, ‘Part scoring and selection and chimeric tRNA generation’, ‘Scoring and selecting chimeric tRNAs with minimal host identity elements’, ‘Folding and cloverleaf MFE structure-based filtering’ and ‘Chimeric tRNA clustering and selection’. Chi-T version 1.1 builds on version 1 by adding the ability to iterate the processes described in ‘Part scoring and selection and chimeric tRNA generation’.
[0218] tRNA database processing. Approximately 10,000,000 aligned tRNAs from bacteria, archaea, plants, fungi, viruses, phage, plasmids and chloroplasts were downloaded from tRNADB-CE27 (http://trna.ie.niigata-u.ac.jp/cgi-bin/trnadb/index.cgi). tRNA sequences in this database have been aligned and sectioned into the canonical structural parts. These tRNAs were cleaned using the cleanup.py script provided in Chi-T. This script removed any entries with missing information and then attempted to align D-loop sequences (because D-loops vary in length, the common GG motif was used to align them). First, D-loop sequences from the tRNADB-CE dataset were checked against a manually curated D-loop alignment dictionary17. Those not in the dictionary were aligned automatically as follows. Shorter sequences (n = 6 or 7) were extended to the consensus length of 8 by adding two hyphens (‘-’) between the 2nd and 3rd nucleotides (n = 6) or one hyphen (‘-’) between the 3rd and 4th nucleotides (n = 7). Then, sequences were searched for an ‘XXXGGX’ string (where X is any nucleotide or ‘-’) and aligned such that the GG motif comprised the 5th and 6th nucleotides of the D-loop (tRNA consensus nucleotides 18 and 19). Nucleotides 74-76 were replaced with CCA, and the anticodon was replaced with CTA by default. Finally, sequences were merged into parts; for example, the D-loop (nucleotides 14-21, for example AACTGGCA) and the T-loop (nucleotides 54-60, for example TTCGAGC) were merged into a single part (AACTGGCA_TTCGAGC). Nine parts, defined in Fig. 7, were used for chimera generation. Additional sequences (for example, from E. histolytica) were added using the tRNA_adder.py script provided in Chi-T.
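The D-loop padding and motif check described above can be sketched in Python, the language Chi-T is written in. This is a minimal illustration, not the cleanup.py code: the function names are hypothetical, input is assumed to be in the DNA alphabet, and a D-loop whose GG motif does not fall at consensus positions 18-19 after padding is simply flagged (None) here rather than realigned, which is an assumption about how such cases are handled.

```python
import re
from typing import Optional

CONSENSUS_LEN = 8  # consensus D-loop length (positions 14-21)
GG_INDEX = 4       # 0-based index of the first G (consensus position 18)
# Lookahead finds every, possibly overlapping, 'XXXGGX' window,
# where X is any nucleotide or a gap character.
MOTIF = re.compile(r"(?=[ACGTU-]{3}GG[ACGTU-])")


def pad_d_loop(seq: str) -> str:
    """Extend 6- or 7-nt D-loops to the 8-nt consensus with gap characters."""
    if len(seq) == 6:
        return seq[:2] + "--" + seq[2:]  # two gaps between nucleotides 2 and 3
    if len(seq) == 7:
        return seq[:3] + "-" + seq[3:]   # one gap between nucleotides 3 and 4
    return seq


def align_d_loop(seq: str) -> Optional[str]:
    """Return the padded D-loop if its GG motif sits at consensus 18-19, else None."""
    padded = pad_d_loop(seq.upper())
    if len(padded) != CONSENSUS_LEN:
        return None
    for match in MOTIF.finditer(padded):
        if match.start() + 3 == GG_INDEX:  # GG follows the three leading X
            return padded
    return None


def merge_loop_parts(d_loop: str, t_loop: str) -> str:
    """Merge the D-loop (14-21) and T-loop (54-60) into a single part."""
    return f"{d_loop}_{t_loop}"


print(align_d_loop("ACTGGCA"))                  # ACT-GGCA
print(merge_loop_parts("AACTGGCA", "TTCGAGC"))  # AACTGGCA_TTCGAGC
```

A 7-nt loop such as ACTGGCA gains one gap after its third base, which places GG at consensus positions 18 and 19.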
[0219] Part scoring and selection and chimeric tRNA generation. tRNA sequences were filtered to remove those for isoacceptors other than the one specified; for example, to generate tRNAs using a tryptophanyl-tRNA synthetase as a target, the cleaned tRNA dataset was filtered to include only entries from natural tryptophanyl-tRNAs. At this point, identity parts and variable parts were defined based on the identity element positions for the given isoacceptor. All sequence parts were then scored individually. For a single query sequence part, there are k positions that are identity elements for at least one isoacceptor; these positions form the set j. For instance, a D-loop/T-loop part comprises positions 14-21 and 54-60 and may have the sequence AACTGGCA_TTCGAGC.
[0220] Positions 14 (Leu), 15 (Cys, Leu, Pro), 16 (Leu), 20 (Ala, Arg, Phe), 59 (Phe) and 60 (Phe) are identity elements for at least one isoacceptor (specified in brackets); therefore, $k = 6$ and $j = \{14, 15, 16, 20, 59, 60\}$. The identity of the nucleotide in the E. coli tRNA at position $i$ (where $i \in j$) for isoacceptor $s$ is $m_{i,s}$, and so, at each position $i$, we define a multiset $M_i$ containing the base identities of all the relevant isoacceptors from E. coli tRNAs. The corresponding nucleotide identity of the query part at position $i$ is given as $n_i$. Following on from the example above, the base identity of position 20 in the query sequence is C ($n_{20} = \mathrm{C}$). The multiset $M_{20}$ contains $m_{20,\mathrm{A}}$, $m_{20,\mathrm{R}}$ and $m_{20,\mathrm{F}}$ (the 20th nucleotide in the E. coli tRNAs for alanine, arginine and phenylalanine). In E. coli, these correspond to G, A and T, respectively, and so $M_{20} = \{\mathrm{G}, \mathrm{A}, \mathrm{T}\}$. In E. coli, there are two alanine and four arginine tRNAs; however, the 20th nucleotide is G and A, respectively, in all tRNAs of the same isoacceptor. In cases where tRNAs of the same isoacceptor differ at an identity element, all unique base identities are counted once. The score at a single position $i$ is defined as the number of times $n_i$ appears in the multiset $M_i$, divided by the size of $M_i$. Because C appears zero times in $M_{20}$, the score assigned to position 20 is

$$\mathrm{score}_{20} = \frac{1}{|M_{20}|} \sum_{m \in M_{20}} \mathbf{1}[m = n_{20}] = \frac{0}{3} = 0.$$

[0224] The overall part identity score for the query part sequence is the average score over the positions in $j$, so that no element bears more weight than any other, because the relative importance of the various identity elements to synthetase recognition is currently unknown. This scoring can be summarized in the following formula:

$$S_{\mathrm{part}} = \frac{1}{k} \sum_{i \in j} \frac{1}{|M_i|} \sum_{m \in M_i} \mathbf{1}[m = n_i]$$
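The part identity score above can be expressed as a short Python sketch. This assumes the E. coli bases are supplied as a nested dictionary (position → isoacceptor → base in each E. coli tRNA gene). That data layout and the function names are illustrative, not taken from Chi-T. Each isoacceptor contributes each of its distinct bases once to $M_i$, matching the rule for isoacceptors whose tRNAs differ at an identity element.

```python
from collections.abc import Mapping, Sequence


def position_multiset(bases_by_isoacceptor: Mapping[str, Sequence[str]]) -> list[str]:
    """Build M_i: every relevant isoacceptor contributes each distinct base once."""
    multiset: list[str] = []
    for bases in bases_by_isoacceptor.values():
        multiset.extend(sorted(set(bases)))
    return multiset


def position_score(query_base: str, multiset: Sequence[str]) -> float:
    """Occurrences of n_i in M_i divided by |M_i|."""
    return multiset.count(query_base) / len(multiset)


def part_identity_score(
    query: Mapping[int, str],
    host_bases: Mapping[int, Mapping[str, Sequence[str]]],
) -> float:
    """Unweighted mean of the per-position scores over the identity positions j."""
    scores = [position_score(query[i], position_multiset(host_bases[i]))
              for i in host_bases]
    return sum(scores) / len(scores)


# Worked example from the text: the query base at position 20 is C; the E. coli
# alanine tRNAs (two) carry G, the arginine tRNAs (four) carry A, and the
# phenylalanine tRNA carries T. The Phe gene list is illustrative.
host = {20: {"Ala": ["G", "G"], "Arg": ["A", "A", "A", "A"], "Phe": ["T"]}}
print(position_multiset(host[20]))           # ['G', 'A', 'T']
print(part_identity_score({20: "C"}, host))  # 0.0
```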
[0227] For each part type, the lowest-scoring sequences were chosen for clustering (the default number of sequences in Chi-T is 200). For each variable part type, affinity propagation clustering was performed with the AffinityPropagation class from the scikit-learn (sklearn) package on the sequence distance matrix, defined as the pairwise Levenshtein distances between sequences. The unpaired bases 8-9 part, which comprises six sequences, was excluded from clustering. Cluster exemplars were chosen by the clustering algorithm for each part type, and chimeric tRNAs were generated by combining every exemplar of each part type with every exemplar of every other part type (that is, the Cartesian product of all exemplars across part types).
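The distance matrix and the Cartesian-product assembly can be sketched with the standard library alone. The exemplar-picking step is left to scikit-learn. One plausible wiring, which is our assumption rather than the documented Chi-T call, is `AffinityPropagation(affinity="precomputed").fit(-D)`, because that estimator expects similarities rather than distances. The exemplar indices would then be read from `cluster_centers_indices_`. Part names and sequences below are hypothetical. Chimeras are returned as part-to-sequence mappings because parts such as the acceptor stem are not contiguous in the tRNA.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def distance_matrix(seqs: list[str]) -> list[list[int]]:
    """Pairwise Levenshtein distances between part sequences."""
    return [[levenshtein(x, y) for y in seqs] for x in seqs]


def assemble_combinations(exemplars: dict[str, list[str]]) -> list[dict[str, str]]:
    """Cartesian product of exemplars across part types; each dict is one chimera."""
    names = list(exemplars)
    return [dict(zip(names, combo))
            for combo in product(*(exemplars[name] for name in names))]


# Hypothetical exemplars for two part types: 2 x 3 = 6 chimeric combinations.
chimeras = assemble_combinations({
    "anticodon_stem": ["GCGGG_CCCGC", "ACGGG_CCCGT"],
    "t_arm": ["GGTTC_GAACC", "CGGTC_GACCG", "AGGTC_GACCT"],
})
print(len(chimeras))  # 6
```

The number of chimeras grows as the product of the exemplar counts per part type. This is why the exemplar counts, rather than the raw part pools, govern the size of the chimeric library.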
[0228] Chi-T version 1.1 provides an option to iterate chimeric tRNA generation and structural filtering to generate more diverse tRNAs. In this approach, the variable parts selected for chimeric tRNA generation (cluster exemplars) are excluded from the parts used in subsequent iterations, in which the revised sets of parts are clustered and new cluster exemplars are chosen. This process may be repeated for several iterations.
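The iteration scheme for a single part type can be sketched as follows. The helper `iterate_exemplars` is hypothetical. The `pick_exemplars` argument is a stand-in for the affinity propagation step, and the toy picker in the example only illustrates how earlier exemplars drop out of the pool.

```python
from collections.abc import Callable


def iterate_exemplars(
    pool: list[str],
    pick_exemplars: Callable[[list[str]], list[str]],
    num_iterations: int,
) -> list[list[str]]:
    """Return one exemplar set per iteration, excluding exemplars used earlier."""
    used: set[str] = set()
    rounds: list[list[str]] = []
    for _ in range(num_iterations):
        remaining = [seq for seq in pool if seq not in used]
        if not remaining:
            break  # the pool is exhausted before all iterations have run
        exemplars = pick_exemplars(remaining)
        used.update(exemplars)
        rounds.append(exemplars)
    return rounds


def first_two(seqs: list[str]) -> list[str]:
    """Toy stand-in for clustering: take the first two sequences (illustration only)."""
    return seqs[:2]


print(iterate_exemplars(["A", "B", "C", "D", "E"], first_two, 3))
# [['A', 'B'], ['C', 'D'], ['E']]
```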
[0229] Scoring and selecting chimeric tRNAs with minimal host identity elements.
[0230] Assembled chimeric tRNAs were first filtered by length (tRNAs <79 nt were kept) and then scored for sequence identity17. Unlike the part identity score, which is one aggregate score across all isoacceptors, a sequence identity score consists of one score for each isoacceptor, and the highest score for any isoacceptor is used for filtering. In brief, to score a query tRNA sequence for a single isoacceptor, for example alanine, the alanine score is increased by 1 for each alanine identity element (nucleotides 2, 3, 4, 20, 69, 70, 71 and 73) at which the query nucleotide matches the E. coli sequence, and decreased by 1 for each at which it does not. This cumulative total is then divided by the number of elements (eight in this case). For isoacceptors with multiple tRNAs, the scoring is performed for each E. coli tRNA and then averaged to give an isoacceptor score. This process is repeated for all isoacceptors. Here, the scoring was modified such that identity elements within the identity parts fixed for a given Chi-T run were excluded from scoring, because these sequences were constant throughout the process. All remaining chimeric sequences were scored, and the lowest-scoring tRNAs were put forward for folding and cloverleaf MFE structure-based filtering.
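The sequence identity score can be sketched in Python as below. This assumes each tRNA is represented as a mapping from canonical position to base. The alanine identity-element positions come from the text; the E. coli bases in the example are hypothetical, and the function names are not Chi-T's.

```python
from collections.abc import Mapping, Sequence

Trna = Mapping[int, str]  # canonical position -> base


def isoacceptor_score(query: Trna, host_trnas: Sequence[Trna],
                      elements: Sequence[int]) -> float:
    """+1 per matching identity element, -1 per mismatch, divided by the number
    of elements; averaged over all E. coli tRNAs of the isoacceptor."""
    per_trna = [
        sum(1 if query[p] == host[p] else -1 for p in elements) / len(elements)
        for host in host_trnas
    ]
    return sum(per_trna) / len(per_trna)


def sequence_identity_score(
    query: Trna,
    host_db: Mapping[str, Sequence[Trna]],
    identity_elements: Mapping[str, Sequence[int]],
    fixed_positions: frozenset[int] = frozenset(),
) -> tuple[float, dict[str, float]]:
    """Score every isoacceptor, skipping positions inside fixed identity parts;
    the highest isoacceptor score is the value used for filtering."""
    scores: dict[str, float] = {}
    for iso, host_trnas in host_db.items():
        elements = [p for p in identity_elements[iso] if p not in fixed_positions]
        if elements:
            scores[iso] = isoacceptor_score(query, host_trnas, elements)
    return max(scores.values()), scores


ALA_ELEMENTS = [2, 3, 4, 20, 69, 70, 71, 73]   # alanine identity elements (text)
host_ala = {p: "G" for p in ALA_ELEMENTS}       # hypothetical E. coli bases
query = {**host_ala, 3: "C", 70: "C"}           # two mismatches out of eight
best, per_iso = sequence_identity_score(query, {"Ala": [host_ala]},
                                        {"Ala": ALA_ELEMENTS})
print(best)  # (6 - 2) / 8 = 0.5
```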
[0231] Folding and cloverleaf MFE structure-based filtering. All tRNAs were computationally folded with RNAfold using the default parameters for MFE structure prediction. MFE structures were filtered based on their structure, their frequency in the ensemble and the ensemble diversity. The secondary structure string representation was filtered against the regular expression provided in Fig. 7. Sequences that formed cloverleaf tRNAs but whose structures did not correspond to their parts (for example, where part of the D-loop had paired into the D-arm) were accepted, provided there was at least one thymine/uracil in the first unpaired section (unpaired bases 8-9) and the most 5' nucleotide of the D-loop was an adenine. tRNAs whose individual sequences passed structural filtering were further filtered, with stricter thresholds, on the average structural metrics of their anticodon-varied ensemble.
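A dot-bracket cloverleaf check can be sketched with Python's `re` module. The pattern below is a hypothetical stand-in written for illustration. It is not the Fig. 7 expression, and it omits the 8-9 uracil and D-loop adenine rules. The structure string, frequency and ensemble diversity are assumed to come from `RNAfold -p`, which reports the MFE frequency as a fraction, so values are multiplied by 100 before comparison with the percentage cut-offs. The default thresholds mirror the per-sequence settings of the tryptophanyl run.

```python
import re

# Hypothetical cloverleaf pattern (illustration only, not the Fig. 7 regex).
CLOVERLEAF = re.compile(
    r"^\({7}"                    # acceptor stem, 5' strand (1-7)
    r"\.{2}"                     # unpaired bases 8-9
    r"\({3,5}\.{5,12}\){3,5}"    # D-arm and D-loop (tolerates partial loop pairing)
    r"\."                        # unpaired base 26
    r"\({5}\.{7}\){5}"           # anticodon stem and loop (27-43)
    r"\.{3,8}"                   # variable loop
    r"\({5}\.{7}\){5}"           # T-arm and T-loop (49-65)
    r"\){7}"                     # acceptor stem, 3' strand (66-72)
    r"\.{4}$"                    # discriminator base and CCA (73-76)
)


def passes_structure_filter(structure: str, frequency: float, diversity: float,
                            min_frequency: float = 40.0,
                            max_diversity: float = 9.0) -> bool:
    """Keep a tRNA whose MFE structure is a cloverleaf that dominates the ensemble."""
    return (CLOVERLEAF.fullmatch(structure) is not None
            and frequency >= min_frequency
            and diversity <= max_diversity)


# A canonical 76-nt cloverleaf in dot-bracket notation.
canonical = ("(((((((" + ".." + "((((" + "........" + "))))" + "."
             + "(((((" + "......." + ")))))" + "....."
             + "(((((" + "......." + ")))))" + ")))))))" + "....")
print(passes_structure_filter(canonical, frequency=45.0, diversity=8.0))  # True
```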
[0232] Chimeric tRNA clustering and selection. Before the clustering described in this subsection, the output from folding and cloverleaf MFE structure-based filtering was optionally filtered by sequence identity to the user-defined tRNA: tRNAs with more mismatches to the given reference tRNA than a user-defined threshold were discarded. When run in automatic mode, Chi-T aims to pass fewer than 200 sequences to clustering, which it achieves by iteratively removing the sequences with the largest number of mismatches from the parent tRNA until fewer than 200 sequences remain.
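The automatic-mode trimming can be sketched as lowering a mismatch cap one level at a time. This is a minimal sketch under two assumptions. Mismatches are counted position by position against the parent (the sketch counts any length difference as additional mismatches). All sequences at the current maximum are removed together. The names are hypothetical.

```python
def mismatches(seq: str, parent: str) -> int:
    """Position-by-position mismatches to the parent; length differences count too."""
    return sum(a != b for a, b in zip(seq, parent)) + abs(len(seq) - len(parent))


def trim_for_clustering(seqs: list[str], parent: str,
                        limit: int = 200) -> tuple[list[str], int]:
    """Drop the most divergent sequences, one mismatch level at a time, until fewer
    than `limit` remain. Returns the kept sequences and the mismatch cap applied."""
    kept = list(seqs)
    cap = max((mismatches(s, parent) for s in kept), default=0)
    while len(kept) >= limit and cap > 0:
        cap -= 1
        kept = [s for s in kept if mismatches(s, parent) <= cap]
    return kept, cap


print(trim_for_clustering(["AAAA", "AAAT", "AATT", "ATTT"], "AAAA", limit=3))
# (['AAAA', 'AAAT'], 1)
```

In the arginyl run, the equivalent end state is a cap of 14 mismatches with respect to the parent tRNA.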
[0233] tRNA sequences passing all previous filters were clustered using the same function and parameters as for the part sequences above, and the cluster exemplar sequences were taken forward. The final group of output tRNAs (Chi-T default, four tRNAs) was chosen from this pool as the group with the largest minimum pairwise Levenshtein distance.
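Selecting the group with the largest minimum pairwise Levenshtein distance can be sketched as an exhaustive max-min search. The helper names are hypothetical, and we do not know whether Chi-T searches exhaustively. Enumerating all groups of four is only practical when the exemplar pool is modest, because the number of combinations grows roughly as the fourth power of the pool size.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def most_distant_subset(seqs: list[str], size: int = 4) -> tuple[str, ...]:
    """Return the size-`size` group whose closest pair is as far apart as possible."""
    def min_pairwise(group: tuple[str, ...]) -> int:
        return min(levenshtein(a, b) for a, b in combinations(group, 2))
    return max(combinations(seqs, size), key=min_pairwise)


print(most_distant_subset(["AAAA", "AAAT", "TTTT"], size=2))  # ('AAAA', 'TTTT')
```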
[0234] Parameters used for generating chimeric tRNAs through Chi-T. TrptRNACUA chimeras were generated in Chi-T version 1 with the following settings and parameters. The target TrptRNA sequence was specified by its tRNADB-CE ID (tRNAs not in the database can be added using the tRNA_adder.py script for Chi-T), and the target isoacceptor class was set to Trp to extract all tryptophanyl-tRNAs for part scoring. For ‘Part scoring and selection and chimeric tRNA generation’, cluster_parts was set to 65 to generate chimeric tRNAs. For ‘Scoring and selecting chimeric tRNAs with minimal host identity elements’, length_filt was kept at its default (79 nt), and the default parameters for minimizing host identity were used: cervettini_filt 0.5 (starting_stringency), 0 (minimum_stringency), 2,500,000 (target number of tRNAs) and 0.05 (step size).
[0235] This setting starts with an identity score of no more than 0.5 (for any host synthetase) and lowers the identity score threshold used for filtering until 2.5 million sequences are left. ‘Folding and cloverleaf MFE structure-based filtering’ was done over three anticodons (CTA, TGA and CGA) with a frequency cutoff of 40% and a diversity cutoff of 9 Abp for each individual tRNA sequence. To filter the anticodon-varied ensemble, a final_frequency of 50% and a final_diversity of 8 Abp were set. In ‘Chimeric tRNA clustering and selection’, the automatic mode was used; it was not necessary to filter out tRNA sequences to obtain fewer than 200 sequences for clustering. From the resulting cluster exemplars, the set of four most mutually distant tRNAs was selected (num_tRNAs, 4).
[0236] ArgtRNACUA chimeras were generated in Chi-T version 1.1 with the following settings and parameters. The target FtArgtRNA sequence was specified by its tRNADB-CE ID, and the target isoacceptor class was set to Arg to extract all arginyl-tRNAs for part scoring.
[0237] For ‘Part scoring and selection and chimeric tRNA generation’, cluster_parts was set to 60 to generate chimeric tRNAs. Chimeric tRNA generation was run in three iterations, each excluding the exemplar parts used in all previous iterations (num_iterations, 3). For ‘Scoring and selecting chimeric tRNAs with minimal host identity elements’, length_filt was kept at its default (79 nt), and the following parameters for minimizing host identity were used: cervettini_filt 0.5 (starting_stringency), 0 (minimum_stringency), 2,500,000 (target number of tRNAs) and 0.05 (step size). ‘Folding and cloverleaf MFE structure-based filtering’ was done over seven anticodons (CTA, TGA, CGA, TGC, CGC, AGA and GGA) with the default frequency cutoff of 30% and a diversity cutoff of 10 Abp for each individual tRNA sequence. To filter the anticodon-varied ensemble, a final_frequency of 50% and a final_diversity of 3 Abp were set. In ‘Chimeric tRNA clustering and selection’, the automatic mode was used; to obtain fewer than 200 sequences for clustering, sequences with more than 14 mismatches with respect to HArgtRNA were filtered out. From the resulting cluster exemplars, the set of four most mutually distant tRNAs was selected (num_tRNAs, 4).
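The anticodon-varied ensemble filter can be sketched as a two-tier check. This assumes that every anticodon variant must pass the per-sequence cut-offs and that the ensemble means must then pass the stricter final thresholds. The defaults mirror the arginyl run, and the frequencies are in percent. The anticodon index depends on each chimera's alignment, so the `start` argument is an assumption. The helper names are hypothetical.

```python
from statistics import mean

ANTICODONS = ["CTA", "TGA", "CGA", "TGC", "CGC", "AGA", "GGA"]  # arginyl run


def swap_anticodon(seq: str, start: int, anticodon: str) -> str:
    """Replace the three anticodon bases beginning at 0-based index `start`."""
    return seq[:start] + anticodon + seq[start + 3:]


def passes_anticodon_ensemble(metrics: list[tuple[float, float]],
                              min_frequency: float = 30.0,
                              max_diversity: float = 10.0,
                              final_frequency: float = 50.0,
                              final_diversity: float = 3.0) -> bool:
    """metrics holds (frequency %, ensemble diversity) for each anticodon variant."""
    if not all(f >= min_frequency and d <= max_diversity for f, d in metrics):
        return False  # some variant fails the per-sequence cut-offs
    return (mean(f for f, _ in metrics) >= final_frequency
            and mean(d for _, d in metrics) <= final_diversity)


variants = [swap_anticodon("AAACTAAAA", 3, ac) for ac in ANTICODONS[:2]]
print(variants)  # ['AAACTAAAA', 'AAATGAAAA']
print(passes_anticodon_ensemble([(60, 2.5), (55, 3.0), (50, 2.0)]))  # True
```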
[0238] Synthetase Identification with RS-ID
[0239] The inputs for RS-ID are (1) the processed database output of cleanup.py (above); (2) an isoacceptor class; (3) the name(s) of the synthetase cognate to the tRNA(s) carrying the identity parts of interest; and, optionally, (4) the threshold for identity element mismatches and (5) the sequence similarity threshold for identity parts.
[0240] Within RS-ID, the tRNA database is filtered for sequences in the specified isoacceptor class. The identity parts of these tRNAs, and the identity elements within them, are identified. All identity parts in a given tRNA are concatenated, and unique concatenated sequences are identified; these are termed ‘concatenated identity parts’. These concatenated identity parts are compared with the concatenated identity parts of the user-defined tRNA (or tRNAs). Concatenated identity parts with more than one mismatch (a user-defined parameter) at identity element nucleotides, relative to the user-defined tRNA(s), are discarded. The remaining concatenated identity parts are stored in FASTA format and input into the calc_distmx function of the USEARCH software suite (version 11)41 to generate a sequence dissimilarity matrix for all unique concatenated identity parts. Sequences similar to the user-defined concatenated identity parts are identified using a similarity threshold (default 0.2, where 0 is identical and 1 is no homology) and returned. The matrix is converted into a two-dimensional projection using the uniform manifold approximation and projection (UMAP) algorithm42, and clusters are identified using HDBSCAN43 on the UMAP embeddings. The outputs of RS-ID are a UMAP projection and a spreadsheet containing the remaining filtered tRNA gene IDs, their identity part sequences, any HDBSCAN-assigned cluster and the organisms from which the remaining filtered tRNAs originate. The synthetase genes were then extracted from the genomes of the organisms output by RS-ID.
References
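The steps before the USEARCH, UMAP and HDBSCAN stages can be sketched as follows: concatenating identity parts, collapsing duplicates, and applying the identity-element mismatch filter. The external tools are not reproduced here. The function names and the example gene IDs, parts and element indices are hypothetical. Skipping concatenations whose length differs from the reference is our simplifying assumption.

```python
def concatenated_identity_parts(parts_by_gene: dict[str, list[str]]) -> dict[str, list[str]]:
    """Concatenate each tRNA's identity parts; map each unique concatenation to gene IDs."""
    unique: dict[str, list[str]] = {}
    for gene_id, parts in parts_by_gene.items():
        unique.setdefault("".join(parts), []).append(gene_id)
    return unique


def filter_by_identity_mismatches(unique: dict[str, list[str]], reference: str,
                                  element_idx: list[int],
                                  max_mismatches: int = 1) -> dict[str, list[str]]:
    """Keep concatenations with at most `max_mismatches` differences from the
    reference at identity-element indices."""
    kept: dict[str, list[str]] = {}
    for concat, genes in unique.items():
        if len(concat) != len(reference):
            continue  # assumption: only equal-length concatenations are compared
        if sum(concat[i] != reference[i] for i in element_idx) <= max_mismatches:
            kept[concat] = genes
    return kept


unique = concatenated_identity_parts({"g1": ["GGA", "CCA"], "g2": ["GGA", "CCA"],
                                      "g3": ["GCA", "CTA"]})
print(filter_by_identity_mismatches(unique, "GGACCA", element_idx=[1, 4]))
# {'GGACCA': ['g1', 'g2']}
```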
[0241] 1. de la Torre, D. & Chin, J. W. Reprogramming the genetic code. Nat. Rev. Genet. 22, 169-184 (2021).
[0242] 2. Liu, C. C. & Schultz, P. G. Adding new chemistries to the genetic code. Annu. Rev. Biochem. 79, 413-444 (2010).
[0243] 3. Robertson, W. E. et al. Sense codon reassignment enables viral resistance and encoded polymer synthesis. Science 372, 1057-1062 (2021).
[0244] 4. Spinck, M. et al. Genetically programmed cell-based synthesis of non-natural peptide and depsipeptide macrocycles. Nat. Chem. 15, 61-69 (2023).
[0245] 5. Wang, L., Magliery, T. J., Liu, D. R. & Schultz, P. G. A new functional suppressor tRNA / aminoacyl-tRNA synthetase pair for the in vivo incorporation of unnatural amino acids into proteins. J. Am. Chem. Soc. 122, 5010-5011 (2000).
[0246] 6. Anderson, J. C. & Schultz, P. G. Adaptation of an orthogonal archaeal leucyl-tRNA and synthetase pair for four-base, amber, and opal suppression. Biochemistry 42, 9598-9608 (2003).
[0247] 7. Chatterjee, A., Xiao, H. & Schultz, P. G. Evolution of multiple, mutually orthogonal prolyl-tRNA synthetase / tRNA pairs for unnatural amino acid mutagenesis in Escherichia coli. Proc. Natl Acad. Sci. USA 109, 14841-14846 (2012).
[0248] 8. Neumann, H., Peak-Chew, S. Y. & Chin, J. W. Genetically encoding Nε-acetyllysine in recombinant proteins. Nat. Chem. Biol. 4, 232-234 (2008).
[0249] 9. Rogerson, D. T. et al. Efficient genetic encoding of phosphoserine and its nonhydrolyzable analog. Nat. Chem. Biol. 11, 496-503 (2015).
[0250] 10. Hughes, R. A. & Ellington, A. D. Rational design of an orthogonal tryptophanyl nonsense suppressor tRNA. Nucleic Acids Res. 38, 6813-6830 (2010).
[0251] 11. Chatterjee, A., Xiao, H., Yang, P.-Y., Soundararajan, G. & Schultz, P. G. A tryptophanyl-tRNA synthetase / tRNA pair for unnatural amino acid mutagenesis in E. coli. Angew. Chem. Int. Ed. 52, 5106-5109 (2013).
[0252] 12. Wu, N., Deiters, A., Cropp, T. A., King, D. & Schultz, P. G. A genetically encoded photocaged amino acid. J. Am. Chem. Soc. 126, 14306-14307 (2004).
13. Edwards, H. & Schimmel, P. An E. coli aminoacyl-tRNA synthetase can substitute for yeast mitochondrial enzyme function in vivo. Cell 51, 643-649 (1987).
[0253] 14. Willis, J. C. W. & Chin, J. W. Mutually orthogonal pyrrolysyl-tRNA synthetase / tRNA pairs. Nat. Chem. 10, 831-837 (2018).
[0254] 15. Dunkelmann, D. L., Willis, J. C. W., Beattie, A. T. & Chin, J. W. Engineered triply orthogonal pyrrolysyl-tRNA synthetase / tRNA pairs enable the genetic encoding of three distinct non-canonical amino acids. Nat. Chem. 12, 535-544 (2020).
[0255] 16. Beattie, A. T., Dunkelmann, D. L. & Chin, J. W. Quintuply orthogonal pyrrolysyl-tRNA synthetase / tRNAPyl pairs. Nat. Chem. 15, 948-959 (2023).
[0256] 17. Cervettini, D. et al. Rapid discovery and evolution of orthogonal aminoacyl-tRNA synthetase-tRNA pairs. Nat. Biotechnol. 38, 989-999 (2020).
[0257] 18. Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol. Biol. 6, 26 (2011).
[0258] 19. Helm, M. et al. The presence of modified nucleotides is required for cloverleaf folding of a human mitochondrial tRNA. Nucleic Acids Res. 26, 1636-1643 (1998).
[0259] 20. Lorenz, C., Lünse, C. E. & Mörl, M. tRNA modifications: impact on structure and thermal adaptation. Biomolecules 7, 35 (2017).
[0260] 21. Giege, R., Sissler, M. & Florentz, C. Universal rules and idiosyncratic features in tRNA identity. Nucleic Acids Res. 26, 5017-5035 (1998).
[0261] 22. Frugier, M., Helm, M., Felden, B., Giege, R. & Florentz, C. Sequences outside recognition sets are not neutral for tRNA aminoacylation. Evidence for nonpermissive combinations of nucleotides in the acceptor stem of yeast tRNAPhe. J. Biol. Chem. 273, 11605-11610 (1998).
[0262] 23. Fukunaga, J., Ohno, S., Nishikawa, K. & Yokogawa, T. A base pair at the bottom of the anticodon stem is reciprocally preferred for discrimination of cognate tRNAs by Escherichia coli lysyl- and glutaminyl-tRNA synthetases. Nucleic Acids Res. 34, 3181-3188 (2006).
24. Tian, Q., Wang, C., Liu, Y. & Xie, W. Structural basis for recognition of G-1-containing tRNA by histidyl-tRNA synthetase. Nucleic Acids Res. 43, 2980-2990 (2015).
[0263] 25. Cusack, S., Yaremchuk, A., Krikliviy, I. & Tukalo, M. tRNAPro anticodon recognition by Thermus thermophilus prolyl-tRNA synthetase. Structure 6, 101-108 (1998).
26. Beuning, P. J. & Musier-Forsyth, K. Hydrolytic editing by a class II aminoacyl-tRNA synthetase. Proc. Natl Acad. Sci. USA 97, 8916-8920 (2000).
[0264] 27. Abe, T. et al. tRNADB-CE 2011: tRNA gene database curated manually by experts. Nucleic Acids Res. 39, D210-D213 (2010).
[0265] 28. Merritt, E. A. et al. Crystal structures of three protozoan homologs of tryptophanyl-tRNA synthetase. Mol. Biochem. Parasitol. 177, 20-28 (2011).
[0266] 29. Italia, J. S. et al. An orthogonalized platform for genetic code expansion in both bacteria and eukaryotes. Nat. Chem. Biol. 13, 446-450 (2017).
[0267] 30. Lin, B. Y., Chan, P. P. & Lowe, T. M. tRNAviz: explore and visualize tRNA sequence features. Nucleic Acids Res. 47, W542-W547 (2019).
[0268] 31. Uhlenbeck, O. C. & Schrader, J. M. Evolutionary tuning impacts the design of bacterial tRNAs for the incorporation of unnatural amino acids by ribosomes. Curr. Opin. Chem. Biol. 46, 138-145 (2018).
[0269] 32. Boccaletto, P. et al. MODOMICS: a database of RNA modification pathways. 2021 update. Nucleic Acids Res. 50, D231-D235 (2022).
[0270] 33. Giege, R. & Eriani, G. The tRNA identity landscape for aminoacylation and beyond. Nucleic Acids Res. 51, 1528-1570 (2023).
[0271] 34. Gruber, A. R., Lorenz, R., Bernhart, S. H., Neuböck, R. & Hofacker, I. L. The Vienna RNA Websuite. Nucleic Acids Res. 36, W70-W74 (2008).
[0272] 35. Pettersen, E. F. et al. UCSF Chimera — a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605-1612 (2004).
[0273] 36. Fan, C., Xiong, H., Reynolds, N. M. & Söll, D. Rationally evolving tRNAPyl for efficient incorporation of noncanonical amino acids. Nucleic Acids Res. 43, e156 (2015).
[0274] 37. Dunkelmann, D. L. et al. Adding α,α-disubstituted and β-linked monomers to the genetic code of an organism. Nature 625, 603-610 (2024).
[0275] 38. Weiner, M. P. et al. Site-directed mutagenesis of double-stranded DNA by the polymerase chain reaction. Gene 151, 119-123 (1994).
[0276] 39. Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343-345 (2009).
40. Chan, P. P. & Lowe, T. M. GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res. 37, D93-D97 (2009).
[0277] 41. Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460-2461 (2010).
[0279] 42. McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at https://doi.org/10.48550/arXiv.1802.03426 (2018).
[0280] 43. McInnes, L., Healy, J. & Astels, S. hdbscan: hierarchical density based clustering. J. Open Source Softw. 2, 205 (2017).
Claims
1. A method for assessing the functionality of a tRNA sequence in a non-native host cell, comprising the steps of: (a) calculating the predicted minimum free energy (MFE) structure for the sequence; and (b) assessing whether the predicted structure matches the canonical cloverleaf structure of the corresponding isoacceptor class for the tRNA sequence.
2. A method according to claim 1, wherein the tRNA is orthogonal to the host cell.
3. A method according to claim 2, wherein the host cell is a bacterium, preferably wherein the bacterium is E. coli.
4. A method according to any preceding claim, wherein functionality is measured as the ability to charge a tRNA with a correct monomer when used in conjunction with a cognate orthogonal tRNA synthetase in a host cell.
5. A method according to claim 1, wherein the MFE is calculated using structural prediction software, advantageously wherein said software is RNAfold.
6. A method according to any preceding claim, wherein the cloverleaf structure comprises the acceptor stem, D and T arms, anticodon loop and variable loop of the tRNA cloverleaf.
7. A method for converting an inactive orthogonal tRNA to an active orthogonal tRNA, comprising introducing mutations into the sequence of the orthogonal tRNA to promote folding into an MFE cloverleaf structure.
8. A method according to claim 7, wherein the mutations are identified by rational design.
9. A method for converting a non-orthogonal tRNA to an orthogonal tRNA, comprising the steps of substituting elements of the sequence of the non-orthogonal tRNA with elements derived from tRNAs with minimal host identity elements present, and testing one or more tRNA molecules from the resulting chimeric tRNAs to identify an orthogonal tRNA.
10. A method for converting a non-orthogonal tRNA to an orthogonal tRNA, comprising analyzing the sequences of chimeric tRNAs identified as orthogonal in the method of claim 9 with the sequence of their non-orthogonal parent tRNA, identifying sequence elements which are consistently present or absent in the orthogonal tRNAs, and adding or removing said sequence elements from the tRNA to be converted.
11. A method for increasing the activity of an orthogonal tRNA / aaRS pair, comprising selecting a tRNA selected or prepared by the method of any preceding claim, creating a repertoire of mutant cognate tRNA synthetases by randomizing the positions responsible for anticodon recognition, and selecting the repertoire for synthetase activity in promoting readthrough of a stop codon in a host expression system using the selected tRNA.
12. A method for generating orthogonal aaRS / tRNA pairs, comprising the steps of: (a) collecting tRNA sequence data from databases of tRNA sequences, comprising both a first general tRNA database and a second database of tRNAs which are to serve as starting points for computational re-design into orthogonal and active tRNAs; (b) partitioning the tRNA sequences of the second database into parts, identifying identity elements in those parts which are responsible for functionality, and thus dividing the parts into the fixed parts which comprise identity elements, and a repertoire of variable parts which do not; (c) selecting alternative sequences from the first database for variable parts identified in step (b), and adding said parts to the repertoire of variable parts to introduce diversity; (d) applying filters and calculating identity scores to select the most suitable variable parts; (e) optionally clustering the variable parts, and selecting the most diverse set of these suitable variable part sequences for chimeric tRNA assembly; (f) assembling chimeric tRNAs from fixed parts and the repertoire of variable parts to create a repertoire of chimeric tRNAs, and filtering the chimeric tRNAs based on identity scores calculated by comparing against a database of tRNAs native to the intended host; (g) using computational tools, predicting the structural viability of chimeric tRNAs; (h) modifying anticodons and ensuring structural integrity across all variants; (i) clustering and selecting the most diverse set of tRNA sequences for further testing; (j) generating the final sequences ready for experimental validation; and (k) determining the source organisms for the fixed parts in the candidate chimeric tRNAs and providing a tRNA synthetase from said source organism to generate a tRNA synthetase / tRNA pair.
13. A method according to claim 12, wherein the first database is the tRNADB-CE database.
14. A method according to claim 12 or claim 13, wherein the second database is a single tRNA sequence.
15. A method according to claim 12 or claim 13, wherein the second database comprises a plurality of sequences of tRNAs, optionally of the same isoacceptor class.
16. A method according to any one of claims 12 to 15, wherein sequences selected from a database are clustered, and exemplars from the clustering are used.
17. A method according to any one of claims 12 to 16, wherein the sequences are partitioned into parts comprising the acceptor stem (1-7 and 66-72), unpaired bases 8 and 9, D-arm (10-13 and 22-25), D / T loops (14-21 and 54-60), unpaired base and variable loop (26 and 44-48), anticodon stem (27-31 and 39-43), anticodon stem loop (32-38), T-arm (49-53 and 61-65), and the discriminator base with CCA (73-76), numbered according to canonical tRNA numbering.
18. A method according to any one of claims 12 to 17, wherein variable parts are selected for the presence of elements selected from the group consisting of the presence of only bases A, C, U and G; paired sequences being correctly Watson-Crick base paired; variable loops shorter than 8nt; other host-specific elements, such as the presence of at least one U in unpaired bases 8-9 in E. coli, for example.
19. A method according to any one of claims 12 to 18, wherein the identity score is calculated by analyzing the presence of identity elements in the variable parts of the chimeric tRNA, with a higher presence of identity elements being indicative of a higher identity score.
20. A method according to any one of claims 12 to 19, wherein structural viability is assessed by determining whether the MFE form of a candidate tRNA conforms to the canonical tRNA cloverleaf structure.
21. A method according to any one of claims 12 to 20, wherein anticodons are varied by sequentially altering the anticodon bases to match a user-defined anticodon sequence, and repeating step (f) to determine structural viability.
22. A method for de novo synthesis of an orthogonal aaRS / tRNA pair, comprising the steps of: (a) providing a first database of aligned tRNA sequences and a second database of tRNA sequences with the desired anticodon specificity; (b) partitioning the tRNA sequences in the second database into structural parts; (c) separating a first set of structural parts comprising identity elements from a second set of structural parts which do not comprise identity elements, and defining the first set of structural parts as fixed parts for all chimeric tRNA structures to be synthesized; (d) defining each part of the second set of structural parts as a variable part, identifying corresponding variable parts in the sequences of the first database in the same isoacceptor class as the sequences in the second database, and sectioning said variable parts with the second set of structural parts; (e) filtering the second set of structural part sequences to identify parts likely to assemble into a functional tRNA; (f) for each filtered part sequence, comparing the part sequence to a third database of tRNAs from an intended host, and calculating an identity score which reflects the number of shared identity elements; (g) selecting the filtered part sequences with the lowest identity scores; (h) assembling a chimeric tRNA repertoire comprising the fixed parts and the filtered part sequences with low identity scores; (i) calculating an identity score across the entire sequence of each chimeric tRNA, and selecting the sequences with the lowest identity scores; (j) for each chimeric tRNA in the repertoire, determining the minimum free energy structure of the tRNA; (k) determining if the minimum free energy structure matches a cloverleaf configuration, and determining the ensemble diversity of the cloverleaf structure and the frequency of the cloverleaf structure in the chimeric tRNA; (l) discarding sequences which do not have a cloverleaf structure match, or have a low frequency of cloverleaf structure, or a high ensemble diversity; (m) introducing variation into the anticodon sequence of the remaining chimeric tRNAs, and repeating steps (j) and (k); (n) calculating average frequency and ensemble diversity across the anticodon variants, and retaining the desired tRNA sequences; (o) clustering the retained chimeric tRNA sequences to generate a user-defined number of sequences with maximal sequence diversity, in order to define a finite set of chimeric tRNA sequences to be tested experimentally while maximally representing the sequence space; and (p) determining the source organism for the fixed identity parts of any one of the chimeric tRNA sequences, and testing said tRNA sequences with candidate synthetases derived from said organism to provide an orthogonal aaRS / tRNA pair.
23. A method according to any one of claims 12 to 22, which is a computer-implemented method.
24. A data processing system comprising a processor configured to perform the method of claim 23.
25. A computer program comprising instructions which, when executed by a computer, carry out the method of claim 23.
26. A computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out the method of claim 23.
27. A tRNA molecule selected from the group consisting of CsProtRNACUAfix, CsProtRNACUAfix(G37A) and ApHistRNACUAfix.
28. A tRNA molecule selected from the group consisting of 1092TrptRNACUA, 1081ArgtRNACUA, and anticodon variants of 1092TrptRNACUA.
29. A tRNA synthetase molecule selected from the group consisting of ApHisRS v1, ApHisRS v2 and ApHisRS v3.
30. A tRNA synthetase molecule selected from the group consisting of CsProRS v1, CsProRS v2, CsProRS v3, CsProRS v4, CsProRS v5, CsProRS v6, CsProRS v7, CsProRS v8, CsProRS v2* and CsProRS v3*.
31. A tRNA synthetase molecule selected from the group consisting of PhTrpRS* and PhTrpRS* v1.
32. A tRNA molecule which is selected from the group consisting of T2H6HistRNACUA, T2H7HistRNACUA, T2H9HistRNACUA and T2H9HistRNACUAfix.
33. A tRNA synthetase / tRNA pair selected from the group consisting of PhTrpRS / 1092TrpRNACUA, PhTrpRS* / 1092TrpRNACUA, PhTrpRS*v1 / 1092TrpRNACUA, CsProRS / CsProtRNACUA, ApHisRS / ApHistRNACUA and CsArgRS / 1081ArgRNACUA.