Methods for (poly)peptide ligation

Modified Connectase (MmCNT) enzyme enables precise and efficient peptide ligation by recognizing specific motifs, addressing substrate specificity and processivity issues in existing technologies, facilitating intracellular and diverse biotechnological applications.

WO2026127824A1 · PCT designated stage · Publication Date: 2026-06-18 · NANYANG TECH UNIV

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
NANYANG TECH UNIV
Filing Date
2025-12-09
Publication Date
2026-06-18


Abstract

The present invention relates generally to methods of enzymatically ligating (poly)peptides using an enzyme that acts as a peptide ligase and recognises specific N- and C-terminal motifs for binding and ligation. More particularly, the present invention relates to the use of a Connectase (MmCNT), which may be modified to improve activity and substrate specificity, in methods for highly precise, irreversible ligation of two or more (poly)peptides.

Description

[0001] METHODS FOR (POLY)PEPTIDE LIGATION

[0002] CROSS-REFERENCE TO RELATED APPLICATION

[0003]

[0001] This application makes reference to and claims the benefit of priority of Singapore Patent Application No. 10202403901Q filed on 11 December 2024, the content of which is incorporated herein by reference for all purposes.

[0004] FIELD OF THE INVENTION

[0005]

[0002] The present invention relates generally to methods of enzymatically ligating (poly)peptides using an enzyme that acts as a peptide ligase and recognises specific N- and C-terminal motifs for binding and ligation. More particularly, the present invention relates to the use of a Connectase (MmCNT), which may be modified to improve activity and substrate specificity, in methods for highly precise, irreversible ligation of two or more (poly)peptides.

[0006] BACKGROUND OF THE INVENTION

[0007]

[0003] Since the 1990s, biochemists have endeavoured to develop enzymes capable of bio-orthogonal protein and peptide modifications. Subtiligase1,2 and Sortase A3, both bacterial in origin, were repurposed early on to facilitate peptide ligation, significantly advancing protein engineering and biochemical research4,5. More recently, research teams from Singapore and Australia characterised plant-based asparaginyl endopeptidases (AEPs), demonstrating their potential as efficient protein ligases6. In 2017, the structure of the first AEP-like protein ligase, OaAEP1, was resolved, and a hyperactive variant, OaAEP1 (C247A), was engineered by elucidating its catalytic mechanism7, leading to several biotechnological applications8–10. Despite these advances, current enzymatic ligation strategies face limitations that hinder their broader biomedical application11, particularly regarding the substrate specificity of plant AEPs and the processivity of Sortase A, which require further enhancement for direct in vivo use12.

[0008]

[0004] The discovery of ligase activity in the Connectase family of enzymes (CNTs) from archaea has expanded protease repurposing strategies13. However, original CNTs exhibit residual protease activity, leading to non-processive (reversible) ligation and limiting their application. A detailed structural understanding is needed to elucidate the differences between protease and ligase activities. The substrate recognition grooves of Connectase enzymes suggest potential for engineering diverse substrate specificities, transforming occasional ligase activity in specialised proteases into a versatile biotechnological platform. To date, only one inactive analogue, MjCNT, a phylogenetic sister of Connectase, has been structurally characterized.

[0009]

[0005] However, existing technologies suffer from non-specific recognition of molecular features, resulting in a wide range of undesirable products, which defeats the purpose of ‘specific’ intracellular protein engineering. Furthermore, existing protein ligation technologies require a well-controlled biochemical environment to be fully functional, and none of the existing publications feature intracellular ligations conducted by a co-expressed, non-purified protein ligase.

[0010]

[0006] Therefore, there exists a need for novel improved methods for ligating peptides, especially intracellularly, taking advantage of the superior site specificity and catalytic efficiency identified for a connectase enzyme acting as a protein ligase.

[0011] SUMMARY OF THE INVENTION

[0012]

[0007] The present invention satisfies the aforementioned need in the art by providing the methods and peptide ligase described herein.

[0013]

[0008] In one aspect, there is provided a method of (poly)peptide ligation, comprising, providing a first (poly)peptide comprising a C-terminal K-D-X-G-A motif, wherein X is any amino acid, providing a second (poly)peptide comprising an N-terminal P1'-P2'-P3' motif, wherein P1' is A, V, G, S, T, L, or M, P2' is G or an analogue of G, and P3' is A or an analogue of A, wherein the second (poly)peptide may be the same as or different from the first (poly)peptide, and contacting the first (poly)peptide and the second (poly)peptide with a peptide ligase having the activity of MmConnectase (MmCNT), under conditions suitable for a cleavage and ligation reaction, wherein the peptide ligase cleaves the bond between the C-terminal D and X residue in the first (poly)peptide and ligates the C-terminal D residue of the first (poly)peptide to the N-terminal P1'-P2'-P3' motif of the second (poly)peptide to form a ligated (poly)peptide.

[0014]

[0009] In various embodiments, P1' is A, V, G, S, T, or L, P2' is G, and P3' is A.

[0015]

[0010] In various embodiments, the N-terminal P1'-P2'-P3' motif is G-G-A.

[0016]

[0011] In various embodiments, the second (poly)peptide comprises an N-terminal amino acid sequence P1'-P2'-P3'-(X)q, wherein q is an integer selected from 1 to 15, and X can be any amino acid.

[0017]

[0012] In various embodiments, the second (poly)peptide comprises an N-terminal amino acid sequence set forth in any one of SEQ ID NOs: 75, 80, 84, 85, 90-92 and 101, or variants thereof.

[0018]

[0013] In various embodiments, the C-terminal K-D-X-G-A motif is K-D-P-G-A or K-D-G-G-A.

[0019]

[0014] In various embodiments, the first (poly)peptide comprises a C-terminal amino acid sequence (X)n-K-D-X-G-A-(X)m, wherein n is an integer selected from 0 to 5, m is an integer selected from 0 to 10, and X can be any amino acid, preferably n is 5 and m is 10.
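By way of non-limiting illustration, the sequence-level outcome of the cleavage and ligation reaction summarised above can be sketched in Python. The sketch models only one-letter sequences: it looks for K-D-X-G-A followed by at most ten residues at the C-terminus of the first (poly)peptide, checks that the second (poly)peptide starts with a P1'-G-A motif, and joins the retained D residue to the second (poly)peptide. The function name `ligate`, the regular expression, and the second example sequence are assumptions made for illustration; analogues of G and A, enzyme kinetics, and structural accessibility are not modelled.

```python
import re

# P1' residues permitted by the method as summarised above; P2' = G, P3' = A.
P1_ALLOWED = set("AVGSTLM")

# K-D-X-G-A followed by 0 to 10 trailing residues, anchored at the C-terminus.
C_TERMINAL_MOTIF = re.compile(r"KD[A-Z]GA[A-Z]{0,10}$")


def ligate(first: str, second: str) -> tuple[str, str]:
    """Return (ligated product, released C-terminal fragment).

    Both arguments are one-letter amino acid sequences written N- to C-terminus.
    """
    match = C_TERMINAL_MOTIF.search(first)
    if match is None:
        raise ValueError("first sequence lacks a C-terminal K-D-X-G-A motif")
    if len(second) < 3 or second[0] not in P1_ALLOWED or second[1:3] != "GA":
        raise ValueError("second sequence lacks an N-terminal P1'-G-A motif")
    # The scissile bond lies between D and X, so keep everything up to and
    # including the D of K-D.
    cut = match.start() + 2
    return first[:cut] + second, first[cut:]


if __name__ == "__main__":
    # The first sequence appears in FIG. 3B; the second is a hypothetical
    # G-G-A substrate chosen only to exercise the function.
    product, released = ligate("RELASKDPGAFDADPLVVEI", "GGAFDADPLVVEISEEGE")
    print(product)   # RELASKDGGAFDADPLVVEISEEGE
    print(released)  # PGAFDADPLVVEI
```

The released fragment begins with the X-G-A residues of the original motif, which is why the choice of P1' in the incoming substrate matters for whether the product can be recognised again.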

[0020]

[0015] In various embodiments, the first (poly)peptide comprises a C-terminal amino acid sequence set forth in any one of SEQ ID NOs: 69-72 and 95-97, or variants thereof.

[0016] In various embodiments, the first and second (poly)peptides are termini of the same peptide such that the method cyclizes said peptide.

[0021]

[0017] In various embodiments, the first and / or second (poly)peptide further comprises a labelling moiety, preferably wherein the labelling moiety is an affinity tag, therapeutic agent, detectable label, or scaffold molecule.

[0022]

[0018] In various embodiments, the first and / or second (poly)peptide is a cellular surface protein such that the method results in the modification or tagging of the cellular surface protein and the cellular surface.

[0023]

[0019] In various embodiments, the first and / or second (poly)peptides are intracellular proteins of a host cell, and the protein ligase is comprised within said host cell.

[0024]

[0020] In various embodiments, the first and / or second (poly)peptide is coupled to a solid support material.

[0025]

[0021] In various embodiments, the method comprises coupling the second (poly)peptide to the solid support material; and ligating the first (poly)peptide to the second (poly)peptide by the peptide ligase.

[0026]

[0022] In various embodiments, the analogue of Gly and Ala is selected or validated based on molecular modelling or docking analysis using the atomic coordinates of the peptide ligase deposited under PDB accession number 8WKD or 8JTU.

[0027]

[0023] In various embodiments, the ligation of the first (poly)peptide to the second (poly)peptide is irreversible.

[0028]

[0024] In various embodiments, the first (poly)peptide is operably fused to the C-terminus of the peptide ligase, wherein the fusion maintains the ligase activity of the peptide ligase and the accessibility of the C-terminal K-D-X-G-A motif of the first (poly)peptide.

[0029]

[0025] In another aspect, there is provided a method for modifying or tagging the surface of a target cell by one or more (poly)peptides of interest, the method comprising, providing the one or more (poly)peptides of interest having a C-terminal K-D-X-G-A motif and / or an N-terminal P1'-P2'-P3' motif, wherein X is any amino acid, P1' is A, V, G, S, T, L, or M, P2' is G or an analogue of G, and P3' is A or an analogue of A; providing a peptide ligase having the activity of MmCNT; contacting the target cell with the one or more (poly)peptides of interest and the peptide ligase; and subjecting the target cell to conditions that allow the peptide ligase to catalyse the ligation of the one or more (poly)peptides of interest to a cellular surface protein of the target cell, wherein the cellular surface protein comprises a MmCNT recognition motif complementary to that comprised in the one or more (poly)peptides of interest.

[0030]

[0026] In various embodiments, the one or more (poly)peptides of interest comprise a labelling moiety, therapeutic agent, detectable label, or scaffold molecule.

[0031]

[0027] In various embodiments, the cellular surface protein comprises a C-terminal K-D-X-G-A motif and / or an N-terminal P1'-P2'-P3' motif.

[0032]

[0028] In another aspect, there is provided a method for intracellular (poly)peptide ligation, comprising, providing a host cell that comprises, within its intracellular environment, a first (poly)peptide comprising a C-terminal K-D-X-G-A motif, wherein X is any amino acid, a second (poly)peptide comprising an N-terminal P1'-P2'-P3' motif, wherein P1' is A, V, G, S, T, L, or M, P2' is G or an analogue of G, and P3' is A or an analogue of A, and a peptide ligase having the activity of MmCNT, and subjecting the host cell to conditions that allow the peptide ligase to catalyse the ligation of the first and second (poly)peptide within the intracellular environment of the host cell.

[0033]

[0029] In various embodiments, the host cell is a mammalian cell, preferably a human cell.

[0034]

[0030] In various embodiments, the method comprises introducing into the host cell, one or more nucleic acid molecules comprising a nucleotide sequence encoding the first (poly)peptide, a nucleotide sequence encoding the second (poly)peptide, and a nucleotide sequence encoding the peptide ligase, optionally wherein expression of the first (poly)peptide, the second (poly)peptide, and / or the peptide ligase are under inducible or constitutive control within the host cell.

[0035]

[0031] In another aspect, there is provided a method for tandem ligation, comprising, providing a first (poly)peptide (A) comprising an N-terminal P1'-P2'-P3' motif and a C-terminal K-D-X-G-A motif, providing a second (poly)peptide (B) comprising an N-terminal P1'-P2'-P3' motif or a C-terminal K-D-X-G-A motif, providing a third (poly)peptide (C) comprising an N-terminal P1'-P2'-P3' motif or a C-terminal K-D-X-G-A motif, wherein X is any amino acid, P1' is A, V, G, S, T, L, or M, P2' is G or an analogue of G, and P3' is A or an analogue of A, wherein the third (poly)peptide comprises a different MmCNT recognition motif to the second (poly)peptide, contacting the first (poly)peptide (A) with the second (poly)peptide (B) and the peptide ligase having the activity of MmCNT under conditions that allow ligation of the second (poly)peptide to the C- or N-terminal of the first (poly)peptide to yield a modified first (poly)peptide; and contacting the modified first (poly)peptide with the third (poly)peptide (C) and the peptide ligase having the activity of MmCNT under conditions that allow ligation of the third (poly)peptide to the N- or C-terminal of the first (poly)peptide to yield a dually modified first (poly)peptide.

[0036]

[0032] In various embodiments, the second (poly)peptide comprises the C-terminal K-D-X-G-A motif and is ligated to the N-terminal of the first (poly)peptide, and the third (poly)peptide comprises the N-terminal P1'-P2'-P3' motif and is ligated to the C-terminal of the first (poly)peptide; or the second (poly)peptide comprises the N-terminal P1'-P2'-P3' motif and is ligated to the C-terminal of the first (poly)peptide, and the third (poly)peptide comprises the C-terminal K-D-X-G-A motif and is ligated to the N-terminal of the first (poly)peptide.

[0037]

[0033] In another aspect, there is provided a method of preparing a dimer, oligomer, or multimer of one or more (poly)peptides of interest, comprising, providing one or more (poly)peptides of interest having a C-terminal K-D-X-G-A motif, and an N-terminal P1'-P2'-P3' motif, wherein X is any amino acid, P1' is A, V, G, S, T, L, or M, P2' is G or an analogue of G, and P3' is A or an analogue of A, providing a peptide ligase having the activity of MmCNT, contacting the one or more (poly)peptides of interest, and the peptide ligase having the activity of MmCNT under conditions that allow the peptide ligase to catalyze ligation of one (poly)peptide of interest with another (poly)peptide of interest to form a dimer, oligomer, or multimer of the one or more (poly)peptides of interest.

[0038]

[0034] In various embodiments, the method further comprises immobilizing a scaffold molecule on to a solid support material, wherein the scaffold molecule comprises one or more copies of the N-terminal P1'-P2'-P3' motif; or one or more copies of the C-terminal K-D-X-G-A motif.

[0039]

[0035] In another aspect, there is provided a method for preparing a dimer, oligomer, or multimer of two different (poly)peptides comprising, providing at least one first (poly)peptide having an N-terminal P1'-P2'-P3' motif or a C-terminal K-D-X-G-A motif, and a C- or N-terminal recognition motif for a second peptide ligase, providing at least one second (poly)peptide having an N-terminal P1'-P2'-P3' motif or a C-terminal K-D-X-G-A motif, and a C- or N-terminal recognition motif for a second peptide ligase, wherein X is any amino acid, P1' is A, V, G, S, T, L, or M, P2' is G or an analogue of G, and P3' is A or an analogue of A, providing a first peptide ligase having the activity of MmCNT, providing a second peptide ligase, wherein the second peptide ligase is different to the first peptide ligase, wherein each of the at least one first (poly)peptide has different C- and N-terminal recognition motifs to each of the at least one second (poly)peptide, and contacting the at least one first (poly)peptide and the at least one second (poly)peptide with the first and second peptide ligase, under conditions suitable for a cleavage and ligation reaction, to form a dimer, oligomer, or multimer of the two different (poly)peptides.

[0040]

[0036] In various embodiments, the method further comprises immobilizing a scaffold molecule on to a solid support material, wherein the scaffold molecule comprises one or more copies of the N-terminal P1'-P2'-P3' motif or one or more copies of the C-terminal K-D-X-G-A motif, or the scaffold molecule comprises one or more copies of the N-terminal recognition motif for the second peptide ligase, or one or more copies of the C-terminal recognition motif for the second peptide ligase.

[0041]

[0037] In various embodiments, the second peptide ligase is OaAEP1 (C247A) or a functional variant thereof, wherein the N-terminal recognition motif for the second peptide ligase is GL, and the C-terminal recognition motif for the second peptide ligase is NGL.
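As a hedged illustration of how two ligases with distinct recognition motifs can be used side by side, the following Python sketch reports which ligase, if any, could join a given pair of sequences. The patterns are taken from the motifs stated above (K-D-X-G-A with up to ten trailing residues and P1'-G-A for MmCNT; NGL and GL for OaAEP1 (C247A)); the table `LIGASE_RULES`, the function `ligases_for`, and the example sequences are hypothetical, and real substrate tolerance is broader and more context-dependent than simple pattern matching.

```python
import re

# Recognition patterns for each ligase: the C-terminal motif on the first
# (poly)peptide and the N-terminal motif on the second (poly)peptide.
LIGASE_RULES = {
    "MmCNT": {
        "c_term": re.compile(r"KD[A-Z]GA[A-Z]{0,10}$"),
        "n_term": re.compile(r"^[AVGSTLM]GA"),
    },
    "OaAEP1(C247A)": {
        "c_term": re.compile(r"NGL$"),
        "n_term": re.compile(r"^GL"),
    },
}


def ligases_for(first: str, second: str) -> list[str]:
    """List the ligases whose motif pair is present on the two sequences."""
    return [
        name
        for name, rule in LIGASE_RULES.items()
        if rule["c_term"].search(first) and rule["n_term"].search(second)
    ]


if __name__ == "__main__":
    print(ligases_for("MQIFKDPGA", "GGAMQIF"))  # MmCNT pair
    print(ligases_for("MQIFNGL", "GLMQIF"))     # OaAEP1 pair
    print(ligases_for("MQIFNGL", "GGAMQIF"))    # mismatched pair
```

Because neither motif pair matches the other ligase's patterns, a building block carrying one motif of each type can be extended stepwise by alternating the two enzymes.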

[0038] In various embodiments, the peptide ligase having the activity of MmCNT comprises or consists of an amino acid sequence set forth in SEQ ID NO: 1, 2 or 3, or functional variants, or fragments thereof.

[0042]

[0039] In various embodiments, the peptide ligase comprises an amino acid residue A at the position corresponding to position 1 of SEQ ID NO:1, and / or an amino acid residue S at the position corresponding to position 192 of SEQ ID NO:1.

[0043]

[0040] In various embodiments, the peptide ligase having the activity of MmCNT comprises or consists of an amino acid sequence set forth in SEQ ID NO: 3.

[0044]

[0041] In various embodiments, the peptide ligase having the activity of MmCNT comprises an amino acid mutation at the position corresponding to position 81 and / or 125 of SEQ ID NO:1, preferably the mutation is S81A and / or N125S.
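Point mutations written in the form S81A (wild-type residue, 1-based position, replacement residue) can be applied to a sequence programmatically. The sketch below is a minimal example of that bookkeeping; the function name `apply_mutation` is hypothetical, and the short toy sequence stands in for SEQ ID NO:1, which is not reproduced in this section.

```python
import re


def apply_mutation(sequence: str, mutation: str) -> str:
    """Apply a point mutation such as 'S81A' to a one-letter sequence.

    Positions are 1-based. The wild-type residue is checked so that a
    mis-numbered mutation is rejected rather than silently applied.
    """
    parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if parsed is None:
        raise ValueError(f"unrecognised mutation format: {mutation!r}")
    wild_type, replacement = parsed.group(1), parsed.group(3)
    index = int(parsed.group(2)) - 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"position out of range for {mutation!r}")
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {index + 1}, "
            f"found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


if __name__ == "__main__":
    toy = "MSTN"  # hypothetical sequence, not SEQ ID NO:1
    double_mutant = apply_mutation(apply_mutation(toy, "S2A"), "N4S")
    print(double_mutant)  # MATS
```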

[0045]

[0042] In various embodiments, the peptide ligase having the activity of MmCNT has a three-dimensional structure corresponding to the atomic coordinates deposited under Protein Data Bank (PDB) accession number 8JTU or 8WKD, preferably wherein the peptide ligase having the activity of MmCNT comprises or consists of an amino acid sequence that adopts an overall tertiary structure analogous to the enzymatic scaffold defined by Protein Data Bank accession number 8JTU or 8WKD, chain A, residues 1–192, wherein the root mean square deviation (RMSD) of backbone atoms following structural alignment is within 1.5 Å.
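The RMSD criterion can be made concrete with a short calculation. The following Python sketch computes the root mean square deviation between two equally sized sets of backbone atom coordinates that have already been superposed, and compares it with the 1.5 Å cutoff. The function names and the example coordinates are hypothetical; the superposition step itself (for example, a least-squares fit) is assumed to have been carried out beforehand with a structural biology toolkit.

```python
import math

Coord = tuple[float, float, float]


def backbone_rmsd(reference: list[Coord], model: list[Coord]) -> float:
    """RMSD in the input units for paired, pre-superposed coordinates."""
    if not reference or len(reference) != len(model):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = sum(
        (a - b) ** 2
        for ref_atom, model_atom in zip(reference, model)
        for a, b in zip(ref_atom, model_atom)
    )
    return math.sqrt(squared_sum / len(reference))


def within_scaffold_tolerance(
    reference: list[Coord], model: list[Coord], cutoff: float = 1.5
) -> bool:
    """True when the backbone RMSD does not exceed the cutoff (in Å)."""
    return backbone_rmsd(reference, model) <= cutoff


if __name__ == "__main__":
    # Hypothetical coordinates: every atom displaced by 1 Å along z.
    ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    mod = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(backbone_rmsd(ref, mod))
    print(within_scaffold_tolerance(ref, mod))
```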

[0046]

[0043] In another aspect, there is provided a peptide ligase having the activity of MmCNT in accordance with the present application, which comprises or consists of: (i) the amino acid sequence set forth in SEQ ID NO: 3; (ii) an amino acid sequence that shares at least 65%, preferably at least 75%, even more preferably at least 85%, most preferably at least 95% sequence identity with, or at least 80%, preferably at least 90%, more preferably at least 95% sequence homology with the amino acid sequence as set forth in SEQ ID NO: 3; or (iii) a functional fragment of (i) or (ii), wherein the peptide ligase comprises an amino acid residue S at the position corresponding to position 192 of SEQ ID NO:3.
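For orientation only, percent sequence identity between two sequences that have already been aligned can be computed as in the Python sketch below. The function name `percent_identity` and the example strings are hypothetical. The denominator used here (alignment columns in which neither sequence has a gap) is one of several conventions in use and is an assumption of this sketch, not a definition taken from the present application.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of a pairwise alignment.

    Both strings must have the same length, with '-' marking gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"
    ]
    if not columns:
        raise ValueError("alignment has no gap-free columns")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Hypothetical alignment: five gap-free columns, four identical.
    print(percent_identity("ACDE-G", "ACDFHG"))  # 80.0
```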

[0047]

[0044] In various embodiments, the peptide ligase comprises or consists of an amino acid sequence set forth in SEQ ID NO: 3.

[0048]

[0045] In various embodiments, the peptide ligase comprises an amino acid mutation at position 81 and / or 125 of SEQ ID NO:3, preferably the mutation is S81A and / or N125S.

[0049]

[0046] In another aspect, there is provided a fusion protein comprising the peptide ligase disclosed herein operably fused to a ligation substrate.

[0047] In another aspect, there is provided a nucleic acid molecule encoding the peptide ligase disclosed herein, optionally wherein said nucleic acid molecule is comprised in a vector.

[0050]

[0048] In another aspect, there is provided a host cell comprising the nucleic acid molecule disclosed herein.

[0051]

[0049] In another aspect, there is provided a use of a peptide ligase disclosed herein for the ligation of two (poly)peptides, optionally wherein the two (poly)peptides are intracellular proteins of a host cell.

[0052]

[0050] In another aspect, there is provided a crystalline form of a peptide ligase having the activity of MmCNT, wherein the crystalline form is characterized by atomic coordinates corresponding to those deposited under Protein Data Bank (PDB) accession number 8JTU or 8WKD, and comprises or consists of an amino acid sequence having at least 80% amino acid sequence identity to the amino acid sequence set forth in SEQ ID NO:2 or 3.

[0053]

[0051] In various embodiments, the crystalline form adopts an overall tertiary structure analogous to the enzymatic scaffold defined by PDB: 8JTU, chain A, residues 1–192, wherein the RMSD of backbone atoms following structural alignment is within 1.5 Å.

[0054]

[0052] In various embodiments, the crystalline form is characterized with space group P3121, and has unit cell parameters of a=91 Å, b=91 Å, c=90 Å, α=β=90°, γ=120°.

[0055]

[0053] In various embodiments, the crystalline form adopts an overall tertiary structure analogous to the enzymatic scaffold defined by PDB: 8WKD, chain A, residues 1–192, wherein the RMSD of backbone atoms following structural alignment is within 1.5 Å when bound to a substrate.

[0056]

[0054] In various embodiments, the crystalline form is characterized with space group P21212, and has unit cell parameters of a=54 Å, b=101 Å, c=33 Å, α=β=γ=90°.
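As a simple consistency check on the unit cell parameters given for the two crystalline forms, the cell volume can be computed from the general triclinic formula V = abc·sqrt(1 − cos²α − cos²β − cos²γ + 2·cosα·cosβ·cosγ). The Python sketch below applies that formula; the function name is hypothetical, and because the cell edges above are quoted to the nearest ångström the resulting volumes are approximate.

```python
import math


def unit_cell_volume(
    a: float, b: float, c: float, alpha: float, beta: float, gamma: float
) -> float:
    """Unit cell volume in Å^3 from edges in Å and angles in degrees."""
    cos_a, cos_b, cos_g = (
        math.cos(math.radians(angle)) for angle in (alpha, beta, gamma)
    )
    return a * b * c * math.sqrt(
        1.0 - cos_a**2 - cos_b**2 - cos_g**2 + 2.0 * cos_a * cos_b * cos_g
    )


if __name__ == "__main__":
    # Parameters as stated for the two crystalline forms above.
    print(unit_cell_volume(91, 91, 90, 90, 90, 120))
    print(unit_cell_volume(54, 101, 33, 90, 90, 90))
```

With the stated parameters this gives roughly 6.45 × 10^5 Å^3 for the trigonal cell and roughly 1.80 × 10^5 Å^3 for the orthorhombic cell.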

[0057]

[0055] In various embodiments, the crystalline form is for use in a computer-assisted method of structure-based design, docking, simulation, screening, and engineering of peptide ligases having the activity of MmCNT and their ligation substrates.

[0058]

[0056] It is understood that all embodiments disclosed herein in relation to one aspect of the invention are similarly applicable to all other aspects of the invention.

[0059] BRIEF DESCRIPTION OF THE DRAWINGS

[0060]

[0057] The invention will be better understood with reference to the detailed description when considered in conjunction with the non-limiting examples and the accompanying drawings.

[0058] FIG. 1A-1B MmConnectase mutant panel: ligation of well-folded proteins. Residues surrounding the catalytic pocket affect catalytic activity, as demonstrated by protein-protein ligation experiments.

[0061]

[0059] FIG. 2A-2D shows the overall X-ray crystal structure of mmConnectase (T1A) in complex with one of the enhanced recognition motifs: FIG. 2A Side view and front view of the complex electron density (transparent light grey) and the assigned atomic models (connectase in cyan and substrate peptide in yellow); FIG. 2B The side view of the atomic model with a zoom-in illustration of the N-terminal substrate (highlighted in yellow) recognition groove (enzyme residues shown in light blue); FIG. 2C The amino acid sequence of the enhanced substrate that was co-crystallized, showing the C-terminal recognition motif, catalytic pocket motif and N-terminal recognition motif. The map shows the alanine substitution at the indicated position of the substrate recognition motif. The measurement was determined by the ratio of the ligated product against substrates, where low values indicate unfavorable substitution for alanine in the relevant position (FIG. 1A-1B). Shaded regions indicate substitutions that were eliminated from the analysis because they are key positions for ligase recognition. The right panel shows a cartoon illustration of mmConnectase encompassing an extended complex substrate recognition groove; and FIG. 2D Cartoon illustrations and structural views of Sortase A and OaAEP1 highlighting their substrate binding pockets.

[0063]

[0060] FIG. 3A-3C Mass spectra of substrate and product peptides: FIG. 3A PGA FDADP LVVEI SEEGE (SEQ ID NO: 87); FIG. 3B RELAS KDPGA FDADP LVVEI (SEQ ID NO: 96); and FIG. 3C RELAS KD (SEQ ID NO: 116), PGA FDADP LVVEI (SEQ ID NO: 110), PGA FDADP LVVEI SEEGE, RELAS KDPGA FDADP LVVEI, and RELAS KDPGA FDADP LVVEI SEEGE (SEQ ID NO: 117). Illustration of the mass-spec results, explaining how the results are interpreted and how the 'apparent' catalytic efficiency is calculated. The charts are the raw data of the efficiency plots.

[0065]

[0061] FIG. 4A-4D Structure-based modifications of the N-terminal recognition motif of mmConnectase rendering an irreversible protein ligation activity: FIG. 4A The side view of the mmConnectase with substrate peptide atomic model, together with a zoom-in illustration of the C-terminal, N-terminal and Pro-Gly-Ala catalytic pocket of the substrate within the mmConnectase recognition groove; FIG. 4B Atomic model illustrating that the Pro-Gly-Ala substrate motif is under high constraint within the mmConnectase recognition groove; by switching the first residue to a smaller amino acid (Gly-Gly-Ala), a new stable alpha-helical secondary structure can be formed, which is less likely to be cleaved; FIG. 4C Screening of the first amino acid residue at the N-terminal recognition motif, where various XGA N-terminal substrates were used; and FIG. 4D Time-based ligation assay comparing two prominent N-terminal substrates: PGA and GGA. The ligations were analysed by MS, and the measurement was calculated based on the ratio of the ligated product against C-terminal KDPGA substrates (FIG. 3A-3C). The assays were performed as n = 3 independent assays; the graph bars represent the mean with standard deviation (SD) and the circles represent individual data.

[0062] FIG. 5A-5D Ligase activity and buffer compatibility of mmConnectase using well-folded proteins as substrates: FIG. 5A Schematic representation of the irreversible ligation activity of GGA N-terminal substrates with KDPGA C-terminal substrate. SDS-PAGE results demonstrating the good yield of final product and ligation achieved by mmConnectase at 15 min and 60 min reaction time; FIG. 5B Schematic representation of the reversible ligation activity of PGA N-terminal substrates with KDPGA C-terminal substrate. SDS-PAGE results demonstrating the poor yield of the final product and ligation achieved by mmConnectase at 15 min and 60 min reaction time; FIG. 5C Schematic representation of weak cleavage of KDGGA C-terminal substrate as compared with KDPGA C-terminal substrate. SDS-PAGE results demonstrating the yield of the final product and ligation achieved by mmConnectase with different C-terminal substrates; and FIG. 5D Compatibility of various cell culture medium conditions with the ligation efficiency of Cnt wt. SDS-PAGE results demonstrating the ligation of two substrates denoted with black arrows: Ub(5)KDPGA(10) and GGA(10)ub-mCherry in various conditions as indicated. The ligated product is denoted with arrows.

[0066]

[0063] FIG. 6A-6C Fluorescence imaging and flow cytometry analysis to validate cellular surface ligation of Connectase (mmCNT) in HEK293T cells. The analysis was carried out for samples: FIG. 6A with the C-terminal recognition motif (EGFP-TM-(5)KDPGA(15)) and the N-terminal recognition motif (Ub mCherry) without mmCNT; FIG. 6B with the C-terminal recognition motif (EGFP-FasL TM-t / srich-(5)KDPGA(15)), the N-terminal recognition motif (PGA(10)Ub mCherry) with mmCNT. The cells were fixed overnight using 2% PFA in PBS and counterstained with DAPI; and FIG. 6C Flow cytometry analysis of cellular surface ligation of Connectase (mmCNT) in HEK293T cells.

[0067]

[0064] FIG. 7A-7C Target protein ligation on cellular surface: FIG. 7A Expression and purification of targeted anti-CD19 scFv protein with the improved N-terminal recognition motif; FIG. 7B Immunoblot detected with anti-FLAG probe showing the ligation validation test of the target protein with Ub(5)KDPGA(10); and FIG. 7C Immunoblot detected with anti-FLAG probe showing cellular surface ligation with target protein.

[0068]

[0065] FIG. 8A-8C Protein immobilization and polymerization with controlled sequence using connectase and OaAEP1 verified by AFM-SMFS: FIG. 8A Ubiquitin (Ub) is immobilized on a glass surface by connectase, as verified by AFM-SMFS protein unfolding experiments. The GL-Ub-(5)KDPGA(10) protein is first ligated onto a GGA(10)-functionalized surface using connectase (Step 0), and then capped with Coh-NGL using OaAEP1 (Step 1) between GL and NGL for precise AFM measurement with a GB1-XDoc-modified AFM tip; FIG. 8B Left panel: Representative force-extension curves showing the expected number and type of protein unfolding peaks from the immobilized (poly)protein, including Ub (contour length increment ΔLc of 24 nm, blue), I27 (28 nm, red), GB1 (18 nm, black), and a final peak from the dissociation of the Coh-XDoc complex. Curves 1-5 correspond to (poly)proteins (Ub)1, (Ub)1-(I27)1, (Ub)2-(I27)1, (Ub)2-(I27)2, and (Ub)3-(I27)2, respectively. Right panel: The corresponding scatter plot showing the relationship between unfolding force and ΔLc for GB1, Ub and I27. For clarity, the force value is shifted by 500 pN for each sample; and FIG. 8C Stepwise protein ligation using connectase and OaAEP1 to build polyproteins with controlled sequence.

[0069]

[0066] FIG. 9A-9C Intracellular protein-protein ligation: FIG. 9A The illustration on top shows the transfection of three plasmids within a single mammalian cell: pCDNA3.1_EGFP_9aalinker_(5)KDPGA(15)VDAK (found in the cell cytoplasm), pCDNA3.1_GGA(15)_Ub_mCherry_ NLS Cmyc (localised to the nucleus) and pCDNA3.1_mmCNT_6his (involved in the intracellular ligation). HEK293T cells were transfected with various combinations of plasmids, and fluorescence images of EGFP, mCherry and Ph1 were taken 24 hours post-transfection. The top 4 images are the EGFP + MmCNT + mCherry sample, while the lower 4 images are the EGFP + mCherry (no ligase) control sample; FIG. 9B EGFP alone and mCherry alone samples, with the top 4 images being pCDNA3.1_EGFP_9aalinker_(5)KDPGA(15)VDAK and the bottom 4 images being pCDNA3.1_GGA(15)_Ub_mCherry_ NLS Cmyc; and FIG. 9C Zoomed-in views of EGFP + MmCNT + mCherry, and EGFP + mCherry samples. MmCET is used interchangeably with mmCNT.

[0070]

[0067] FIG. 10 SDS-PAGE analysis to identify intracellular protein ligation. Various transfected cells were harvested and lysed with RIPA lysis buffer prior to the SDS-PAGE analysis. The SDS-PAGE gel was imaged using a Chemidoc under Alexa594 for detecting mCherry, Alexa488 for detecting EGFP, and Coomassie Blue staining. SDS-PAGE results demonstrating the ligation of two substrates intracellularly denoted with arrows: EGFP(5)KDPGA(10) and GGA(10)ub-mCherry. Arrow indicates successful ligation of intracellular substrates. 1 = pCDNA3.1_GGA(15)-Ub-mCherry-NLS Cmyc transfected HEK293T; 2 = pCDNA3.1_MmCNT_6his transfected HEK293T; 3 = pCDNA3.1_EGFP_9aalinker_(5)KDPGA(15)VDAK transfected HEK293T; 4 = pCDNA3.1_GGA(15)-Ub-mCherry-NLS Cmyc / pCDNA3.1_EGFP_9aalinker_(5)KDPGA(15)VDAK / HEK293T; 5 = pCDNA3.1_GGA(15)-Ub-mCherry-NLS Cmyc / pCDNA3.1_EGFP_9aalinker_(5)KDPGA(15)VDAK / pCDNA3.1_MmCNT_6his transfected HEK293T (1:1:1); 6 = pCDNA3.1_GGA(15)-Ub-mCherry-NLS Cmyc / pCDNA3.1_EGFP_9aalinker_(5)KDPGA(15)VDAK / pCDNA3.1_MmCNT_6his transfected HEK293T (2:2:1); and 7 = Lysate of pCDNA3.1_GGA(15)-Ub-mCherry-NLS Cmyc transfected HEK293T / pCDNA3.1_EGFP_9aalinker_(5)KDPGA(15)VDAK transfected HEK293T / pCDNA3.1_MmCNT_6his transfected HEK293T.

[0071] DETAILED DESCRIPTION OF THE INVENTION

[0072]

[0068] The following detailed description refers to, by way of illustration, specific details and embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. Other embodiments may be utilized, and structural and logical changes may be made without departing from the scope of the invention. The various embodiments are not necessarily mutually exclusive, as some embodiments can be combined with one or more other embodiments to form new embodiments.

[0069] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. The singular terms "a," "an," and "the" include plural referents unless context clearly indicates otherwise. Similarly, the word "or" is intended to include "and" unless the context clearly indicates otherwise. The term "comprises" means "includes." In case of conflict, the present specification, including explanations of terms, will control.

[0073]

[0070] Enzyme-catalysed protein modifications have become invaluable in diverse applications, outperforming chemical methods in precision, conjugation efficiency, and biological compatibility. Despite significant advances with ligases such as Sortase A and OaAEP1, their use in heterogeneous biological environments remains constrained by limited target sequence specificity. In 2021, Andrei Lupas' group introduced Connectase, a family of repurposed archaeal proteases for protein ligation, but its low processivity and the lack of structural information have impeded further engineering for practical biological and biophysical applications. The identification of Connectase has significantly expanded the toolbox of protein ligases with a novel scaffold, opening new frontiers in biochemical research and applications. Like plant AEPs, this enzyme family was initially recognised for its protease activity. The research disclosed herein reveals that the key to transforming proteases into highly efficient ligases lies in the precise modification of critical structural features.

[0074]

[0071] Here, the X-ray crystallographic structures of Connectase (CNT) from Methanosarcina mazei (MmCNT) in both apo (PDB: 8JTU) and substrate-bound (PDB: 8WKD) forms are provided. Comparative analysis with its inactive paralog, MjCNT (Methanococcus jannaschii), reveals the structural basis of MmCNT's high-precision ligation activity. By solving the crystal structure of MmCNT, both with and without a preferred peptide substrate at its catalytic transition state, the essential elements responsible for the high-precision substrate recognition of N-terminal motifs have been discovered. Capitalizing on these newly characterized properties of MmCNT, the present inventors successfully addressed a major limitation of this enzyme, enhancing its processivity and demonstrating its superior performance in protein ligation through single-molecule experiments. The insights gained from this study bring us closer to designing tailor-made, highly specific protein ligase tools for a wide range of biological applications. The versatile structural features of MmCNT, particularly its highly adaptable N-terminal recognition groove, promise unprecedented opportunities for specificity engineering. It is believed that mastering the specificity of substrate recognition in MmCNT will revolutionize the field, enabling the development of next-generation protein ligases with unparalleled precision and efficiency, poised to transform biotechnological and therapeutic practices.

[0075]

[0072] Accordingly, the object of the present invention is to provide a method for ligating peptides, taking advantage of the superior site specificity and catalytic efficiency of MmCNT acting as a protein ligase enzyme. In particular, the peptide ligase MmCNT represents a highly precise protein ligase capable of conducting protein ligation in various in vitro, in vivo and ex vivo conditions.

[0073] Thus, the methods disclosed herein advantageously employ the use of the MmCNT peptide ligase, or variants thereof having the activity of MmCNT, for performing ligation reactions of two or more (poly)peptides.

[0076] ❖ Method of Peptide Ligation

[0077]

[0074] To this end, provided in a first aspect of the present invention is a method for (poly)peptide ligation, comprising,

[0078] providing a first (poly)peptide comprising a C-terminal K-D-X-G-A motif, wherein X is any amino acid,

[0079] providing a second (poly)peptide comprising an N-terminal P1'-P2'-P3' motif, wherein P1' is A, V, G, S, T, L, or M, P2' is G or an analogue of G, and P3' is A or an analogue of A,

[0080] wherein the second (poly)peptide may be the same as or different from the first (poly)peptide, and contacting the first (poly)peptide and the second (poly)peptide with a peptide ligase having the activity of MmConnectase (MmCNT), under conditions suitable for a cleavage and ligation reaction, wherein the peptide ligase cleaves the bond between the C-terminal D and X residues in the first (poly)peptide and ligates the C-terminal D residue of the first (poly)peptide to the N-terminal P1'-P2'-P3' motif of the second (poly)peptide to form a ligated (poly)peptide.
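By way of illustration only, the cleavage and ligation outcome recited above can be sketched on one-letter sequences in Python. The sketch makes several simplifying assumptions that are not part of the disclosure: the K-D-X-G-A motif sits at the extreme C-terminus, only the 20 natural residues are used (analogues of G or A are not modelled), and the function name `ligate` and the example sequences are hypothetical. It models the sequence bookkeeping only, not enzyme kinetics.

```python
import re

# P1' residues recited for the first aspect: A, V, G, S, T, L or M.
P1_ALLOWED = frozenset("AVGSTLM")


def ligate(first: str, second: str) -> str:
    """Return the ligated sequence (A)-K-D-P1'-P2'-P3'-(B).

    Assumes one-letter codes and a K-D-X-G-A motif at the very end of `first`.
    """
    # Locate the C-terminal K-D-X-G-A motif (X = any residue).
    match = re.search(r"KD[A-Z]GA$", first)
    if match is None:
        raise ValueError("first (poly)peptide lacks a C-terminal K-D-X-G-A motif")

    # Check the N-terminal P1'-P2'-P3' motif with P2' = G and P3' = A.
    if len(second) < 3 or second[0] not in P1_ALLOWED or second[1:3] != "GA":
        raise ValueError("second (poly)peptide lacks an N-terminal P1'-G-A motif")

    # Cleave between D and X: keep everything up to and including K-D,
    # discarding the X-G-A leaving group.
    acyl_donor = first[: match.start() + 2]

    # Join the C-terminal D to the N-terminal P1' residue.
    return acyl_donor + second


if __name__ == "__main__":
    print(ligate("MSEQKDPGA", "GGAFDADPLWEISEEGE"))
```

In this sketch the X-G-A tripeptide is simply dropped from the output; a fuller model could return it separately as the leaving group.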

[0081]

[0075] The term “(poly)peptide”, as used herein, refers to peptides and polypeptides. In the context of the whole application, the terms "peptide", "polypeptide", and "protein" are used interchangeably to refer to polymers of amino acids of any length connected by peptide bonds. The polymer may comprise modified amino acids, it may be linear or branched, and it may be interrupted by non-amino acids. The terms also encompass an amino acid polymer that has been modified naturally or artificially; for example, by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation to a labelling moiety. However, in preferred embodiments, these terms relate to polymers of naturally occurring amino acids, as defined below, which may optionally be modified as defined above, but do not comprise non-amino acid moieties in the polymer backbone. The (poly)peptides, as defined herein, can comprise more than 50 amino acids, preferably 100 or more amino acids. "Peptides", as used herein, relates to polymers made from amino acids connected by peptide bonds. The peptides, as defined herein, can comprise 2 or more amino acids, preferably 5 or more amino acids, more preferably 10 or more amino acids, for example 10 to 50 amino acids.

[0082]

[0076] The term “amino acid”, as used herein, refers to natural and / or unnatural or synthetic amino acids, including both the D and L optical isomers, amino acid analogues (for example norleucine is an analogue of leucine) and derivatives known in the art. The term “natural amino acid”, as used herein, relates to the 20 naturally occurring L-amino acids, namely Gly (G), Ala (A), Val (V), Leu (L), Ile (I), Phe (F), Cys (C), Met (M), Pro (P), Thr (T), Ser (S), Glu (E), Gln (Q), Asp (D), Asn (N), His (H), Lys (K), Arg (R), Tyr (Y), and Trp (W). Generally, in the context of the present application, the polypeptides are shown in the N- to C-terminal orientation. All amino acid residues are generally referred to herein by reference to their one letter code and, in some instances, their three letter code. This nomenclature is well known to those skilled in the art and used herein as understood in the field. As a person skilled in the art would appreciate, amino acids can be categorized in different classes depending upon the chemical and physical properties of the amino acid residue. Amino acids may be grouped according to similarities in the properties of their side chains (A. L. Lehninger, Biochemistry, second ed., pp. 73-75, Worth Publishers, New York (1975)): (1) non-polar: Ala, Val, Leu, Ile, Pro, Phe, Trp, Met; (2) uncharged polar: Gly, Ser, Thr, Cys, Tyr, Asn, Gln; (3) acidic: Asp, Glu; and (4) basic: Lys, Arg, His. Alternatively, naturally occurring residues may be divided into groups based on common side-chain properties: (1) hydrophobic: Norleucine, Met, Ala, Val, Leu, Ile; (2) neutral hydrophilic: Cys, Ser, Thr, Asn, Gln; (3) acidic: Asp, Glu; (4) basic: His, Lys, Arg; (5) residues that influence chain orientation: Gly, Pro; and (6) aromatic: Trp, Tyr, Phe. Typically, hydrophobic amino acids can be further classified as having an aliphatic side chain or an aromatic side chain. Aliphatic amino acids and aromatic amino acids are known to one of skill in the art. Typically, aliphatic amino acids have a side chain containing hydrogen and carbon atoms. Examples of aliphatic amino acids include alanine (Ala, A), isoleucine (Ile, I), proline (Pro, P), and valine (Val, V). Typically, aromatic amino acids contain a side chain comprising an aromatic ring. Examples of aromatic amino acids include phenylalanine (Phe, F), tyrosine (Tyr, Y), histidine (His, H), and tryptophan (Trp, W).
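For convenience, the first (Lehninger) grouping recited above may be captured as a simple lookup table. The following Python sketch is illustrative only; the names `LEHNINGER_CLASSES` and `side_chain_class` are hypothetical, and only the 20 natural residues in one-letter code are covered.

```python
# Side-chain classes per the Lehninger grouping recited above,
# written with one-letter codes.
LEHNINGER_CLASSES = {
    "non-polar": "AVLIPFWM",
    "uncharged polar": "GSTCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}


def side_chain_class(residue: str) -> str:
    """Return the class name for a one-letter natural amino acid code."""
    for name, members in LEHNINGER_CLASSES.items():
        if residue in members:
            return name
    raise ValueError(f"{residue!r} is not one of the 20 natural amino acids")


if __name__ == "__main__":
    for aa in "KDPGA":
        print(aa, side_chain_class(aa))
```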

[0083]

[0077] The term “peptide bond” refers to a covalent amide linkage formed by loss of a molecule of water between the carboxyl group of one amino acid and the amino group of a second amino acid. Generally, in all amino acid sequences or motifs depicted herein, the peptides are shown in the N- to C-terminal orientation.

[0084]

[0078] Without wishing to be bound to any particular theory, it is believed that the peptide ligase having the activity of MmCNT as described herein can also catalyse the ligation between any one substance or object comprising a C-terminal “K-D-X-G-A” motif and any one substance or object comprising an N-terminal “P1'-P2'-P3'” motif. In this connection, it should be noted that the method described herein also applies mutatis mutandis to embodiments wherein one or both of the first and second (poly)peptides may be substituted with any substance or object other than a (poly)peptide, which is also within the scope of the present application.

[0085]

[0079] In various embodiments, the method enzymatically ligates the two (poly)peptides in an irreversible manner using the peptide ligase, that is, the ligation reaction between the recognition motifs catalysed by the peptide ligase is an irreversible ligation reaction. It is of note that such an irreversible ligation does not occur in nature. As used herein, the term “irreversible” refers to a reaction that proceeds in one direction only and is highly resistant to further cleavage and to reversible protein ligation, such that the ligation does not revert to regenerate the original substrates under physiological or reaction conditions. In the context of the present invention, the ligation reaction catalysed by the peptide ligase is irreversible in that the newly formed covalent amide bond between the C-terminal aspartate residue of a first peptide and the N-terminal residue of a second peptide cannot be cleaved or exchanged once formed. Unlike naturally occurring peptide exchange or transpeptidation reactions that operate through reversible acyl-enzyme intermediates, the Connectase-catalysed reaction proceeds to completion without regeneration of a reactive intermediate or leaving group, and the product remains hydrolytically stable due to conformational rearrangements within the catalytic pocket that exclude water molecules. Consequently, the ligation constitutes a non-natural, thermodynamically stable, and unidirectional reaction yielding a permanently fused peptide product. In this regard, the methods disclosed herein lead to a processive ligation of (poly)peptides as opposed to a non-processive (reversible) ligation.

[0086]

[0080] In various embodiments, the C-terminal K-D-X-G-A motif may be referred to as a C-terminal MmCNT recognition motif that is a binding and ligation site for the peptide ligase having the activity of MmCNT. The term “C-terminal” refers to the carboxy-terminal end of a peptide or polypeptide chain, typically terminating with a free carboxyl group (-COOH) or a carboxamide group (-CONH2). In various embodiments, the K-D-X-G-A motif may be termed a pentapeptide motif and may also be written as KDXGA.

[0087]

[0081] In various embodiments of the C-terminal K-D-X-G-A motif, the X residue may be Gly (G) or any uncharged polar residue, such as Ala (A), Ser (S), Thr (T), Cys (C), Asn (N) or Gln (Q).

[0088]

[0082] In various embodiments of the C-terminal K-D-X-G-A motif, the X residue may be Pro (P) or any non-polar residue such as Ala (A), Val (V), Leu (L), Ile (I), Pro (P), Phe (F), Trp (W), Met (M).

[0089]

[0083] In various embodiments, the C-terminal K-D-X-G-A (SEQ ID NO: 9) motif is K-D-P-G-A (SEQ ID NO: 10) or K-D-G-G-A (SEQ ID NO: 11).

[0090]

[0084] The C-terminal region of the (poly)peptide may generally encompass the final 1 to 30 amino acids of the peptide sequence. In various embodiments, the K-D-X-G-A motif is located proximal to but not necessarily at the extreme C-terminus of the (poly)peptide. For instance, the motif may be positioned within the final 5 to 15 amino acid residues from the C-terminus end of the (poly)peptide, preferably about 10 or 15 amino acids upstream of the terminal residue.

[0091]

[0085] In various embodiments, the C-terminal K-D-X-G-A motif further comprises one or more amino acid residues immediately upstream (N-terminal side) of the K residue and / or immediately downstream (C-terminal side) of the A residue. In this regard, the C-terminal MmCNT recognition motif may extend beyond the minimal K-D-X-G-A sequence motif to include additional amino acid residues that provide structural or contextual compatibility for MmCNT recognition.

[0092]

[0086] In various embodiments, the C-terminus of the first (poly)peptide, and more particularly the C-terminal MmCNT recognition motif, may comprise or consist of an amino acid sequence (X)n-K-D-X-G-A-(X)m, wherein n is an integer selected from 0 to 5, m is an integer selected from 0 to 10, and X can be any amino acid. In various embodiments, n may be 5 and m may be 10.
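The (X)n-K-D-X-G-A-(X)m pattern lends itself to a regular expression. The Python sketch below is one possible reading, offered under stated assumptions: sequences are upper-case one-letter codes, the downstream stretch (X)m must run to the end of the chain, and the names `C_TERMINAL_PATTERN` and `c_terminal_recognition` are hypothetical rather than taken from the disclosure.

```python
import re
from typing import Optional, Tuple

# (X)n-K-D-X-G-A-(X)m with n = 0..5 and m = 0..10, anchored at the C-terminus.
C_TERMINAL_PATTERN = re.compile(
    r"(?P<up>[A-Z]{0,5})(?P<motif>KD[A-Z]GA)(?P<down>[A-Z]{0,10})$"
)


def c_terminal_recognition(seq: str) -> Optional[Tuple[str, str, str]]:
    """Return (upstream, motif, downstream) or None if no motif is found.

    `upstream` holds at most 5 residues preceding K; `downstream` holds the
    0 to 10 residues between the motif's A and the end of the chain.
    """
    match = C_TERMINAL_PATTERN.search(seq)
    if match is None:
        return None
    return match.group("up"), match.group("motif"), match.group("down")


if __name__ == "__main__":
    print(c_terminal_recognition("MSEQAAAKDPGAHHHHHH"))
```

A chain whose motif is followed by more than 10 residues returns `None` under this reading, since m is capped at 10.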

[0093]

[0087] In various embodiments, the C-terminus of the first (poly)peptide may comprise an amino acid sequence set forth in any one of SEQ ID NOs: 69-72, and 95-97, or variants thereof. The variants are invariable with respect to the inclusion of the K-D-X-G-A motif.

[0094]

[0088] The inventors discovered that the N-terminal substrate recognition motifs for MmCNT may be engineered and designed to suppress MmCNT's reversible protease activity, thus enabling high-precision protein ligations in complex biological environments, such as serum-containing cell cultures. In various embodiments, the N-terminal P1'-P2'-P3' motif may be referred to as an N-terminal MmCNT recognition motif that is a binding and ligation site present on the second (poly)peptide for the peptide ligase having the activity of MmCNT. The term “N-terminal” refers to the amino-terminal end of a peptide or polypeptide chain, typically terminating with a free amino group (-NH2) or an N-acylated derivative thereof. In various embodiments, the P1'-P2'-P3' motif may be termed a tripeptide motif that defines the reactive N-terminal sequence context recognised by the peptide ligase during amide bond formation with the C-terminal K-D-X-G-A motif of the first (poly)peptide.

[0095]

[0089] In various embodiments, P1' may be selected from Ala (A), Val (V), Gly (G), Ser (S), Thr (T), Leu (L), Met (M), Pro (P) or an analogue thereof. In various embodiments, P1' may be selected from Ala (A), Val (V), Gly (G), Ser (S), Thr (T), Leu (L), Met (M) or an analogue thereof. In various embodiments, P1' may be Gly (G) or an analogue of Gly.

[0096]

[0090] An “amino acid analogue” as used herein refers to any naturally occurring amino acid, non-canonical amino acid, amino acid derivative, isotopically labeled amino acid, or non-peptidic chemical moiety that possesses a structural, conformational, or physicochemical resemblance to a reference amino acid residue such that, when incorporated into a peptide or polypeptide sequence, it can substantially reproduce the steric, electronic, or geometric characteristics of that residue and maintain an equivalent functional or structural role in the context of enzyme recognition, substrate binding, and catalysis. The term encompasses, without limitation, D- and L-forms, side-chain modified variants (e.g., methylated, hydroxylated, halogenated, alkylated derivatives, or isotopically substituted derivatives), backbone-modified residues (e.g., β-amino acids, aza- or dehydro-amino acids), and peptidomimetic surrogates that preserve the spatial orientation, steric compatibility, and backbone flexibility required for recognition by the connectase catalytic pocket (i.e. catalytic pocket of the MmCNT having a three-dimensional structure corresponding to the atomic coordinates deposited under Protein Data Bank (PDB) accession number 8JTU or 8WKD) or for maintenance of peptide conformation.

[0097]

[0091] In various embodiments, the determination or validation of an amino acid analogue that is compatible with the catalytic activity of MmCNT may be guided or confirmed using the X-ray crystallographic structures of the enzyme, as deposited under Protein Data Bank (PDB) accession numbers 8JTU or 8WKD. The atomic coordinates of these structures define the spatial organisation of the MmCNT catalytic pocket, including the N-terminal recognition groove that accommodates the N-terminal P1'-P2'-P3' motif of the ligation substrate. By analysing or modelling the steric and electronic interactions between candidate amino acid residues or chemical moieties and the active-site environment, it is possible to predict and validate which substituents may functionally substitute for the amino acid (e.g. Gly or Ala). In various embodiments, in silico mutagenesis, molecular docking, or molecular dynamics (MD) simulations may be employed to assess whether a given residue or analogue maintains the permissive conformational space, torsional flexibility, and minimal steric interference characteristic of the amino acid being substituted (e.g. Gly or Ala) within the active-site geometry defined by PDB 8JTU or 8WKD. Candidate analogues identified through computational analysis may further be experimentally validated using enzymatic ligation assays or kinetic measurements to confirm that catalytic efficiency, substrate turnover, or product yield are substantially equivalent to those observed with the reference motif. Accordingly, the structural data provided by PDB 8JTU and 8WKD serve as template models for rational design and empirical confirmation of functionally compatible amino acid analogues within the N-terminal recognition motif of substrates for the MmCNT-catalysed ligation reaction.

[0098]

[0092] In this regard, the analogue of Gly may refer to a residue, amino acid derivative, or chemical moiety that exhibits a steric profile and backbone geometry similar to that of Gly, characterized by minimal side-chain volume, high conformational flexibility, and lack of branching at the β-carbon position. Such analogues may include, for example, α-methylglycine (sarcosine), β-alanine, N-methylglycine, aminooxyacetyl, or other small, uncharged moieties that reproduce the conformational permissiveness and compactness of Gly. In the context of connectase-catalysed ligation, an analogue of Gly is any substituent that preserves the motif's accessibility to the MmConnectase (MmCNT) active site and maintains catalytic ligation efficiency, yielding substantially the same structural and functional outcome as Gly. In various embodiments, the analogue of Gly refers to any chemical moiety or amino acid residue that can functionally substitute for Gly within the N-terminal recognition motif, provided that such substitution is sterically and chemically compatible with the catalytic pocket of the MmCNT enzyme, the three-dimensional structure of which corresponds to the atomic coordinates deposited under Protein Data Bank (PDB) accession number 8JTU or 8WKD. In various embodiments, the analogue of Gly is selected or validated based on molecular modelling or docking analysis using the atomic coordinates of MmCNT deposited under PDB accession number 8WKD or 8JTU.

[0099]

[0093] In various embodiments, P2' may be any uncharged polar amino acid or any amino acid that influences chain orientation selected from Gly (G), Ser (S), Thr (T), Cys (C), Tyr (Y), Asn (N), Gln (Q), and Pro (P), or analogues thereof.

[0100]

[0094] In various embodiments, P2' is Gly (G) or an analogue of Gly (G).

[0095] In various embodiments, P3' may be any nonpolar amino acid, or hydrophobic amino acid, or aliphatic amino acid, selected from Ala (A), Val (V), Leu (L), Ile (I), Pro (P), Phe (F), Trp (W), and Met (M), or analogues thereof.

[0101]

[0096] In various embodiments, P3' is Ala (A) or an analogue of Ala (A).

[0102]

[0097] In this regard, the analogue of Ala refers to a residue, amino acid derivative, or chemical moiety that resembles Ala in steric volume, side-chain geometry, and hydrophobic character, typically possessing a small, non-polar substituent at the α-carbon that does not introduce steric clash or polarity incompatible with the connectase substrate-binding groove. Such analogues include, but are not limited to, valine analogues with reduced branching, α-ethylglycine, 2-aminobutyric acid, or fluoroalanine, as well as other minimal aliphatic side-chain residues that conserve the structural compactness and orientation of Ala while maintaining efficient recognition and catalysis by MmConnectase. In various embodiments, the analogue of Ala refers to any chemical moiety or amino acid residue that can functionally substitute for Ala within the N-terminal recognition motif, provided that such substitution is sterically and chemically compatible with the catalytic pocket of the MmCNT enzyme, the three-dimensional structure of which corresponds to the atomic coordinates deposited under Protein Data Bank (PDB) accession number 8JTU or 8WKD. In various embodiments, the analogue of Ala is selected or validated based on molecular modelling or docking analysis using the atomic coordinates of MmCNT deposited under PDB accession number 8WKD or 8JTU.

[0103]

[0098] In various embodiments, the N-terminal P1'-P2'-P3' motif may be P1'-G-A, where P1' may be selected from A, V, G, S, T, L, or M, P2' is G, and P3' is A. In various embodiments, the N-terminal P1'-P2'-P3' motif may be selected from A-G-A, G-G-A, V-G-A, S-G-A and T-G-A; preferably, the P1'-P2'-P3' motif is G-G-A.

[0104]

[0099] The N-terminal region of the (poly)peptide may generally encompass the initial 1 to 30 amino acid residues of the (poly)peptide sequence. In various embodiments, the P1'-P2'-P3' motif is located proximal to, and preferably at, the extreme N-terminus of the (poly)peptide. For instance, P1' may correspond to the N-terminal residue of the (poly)peptide, thereby defining the free amino group that participates in ligation by the peptide ligase.

[0105]

[0100] In various embodiments, the N-terminal P1'-P2'-P3' motif further comprises one or more amino acid residues immediately downstream (C-terminal side) of the P3' residue. In this regard, the N-terminal recognition motif may extend beyond the minimal P1'-P2'-P3' motif to include additional amino acid residues that may provide structural or contextual compatibility for MmCNT recognition.

[0106]

[0101] In various embodiments, the N-terminal motif further comprises 1 to 15 or more amino acid residues immediately downstream of P3', such that the recognition motif may be represented as P1'-P2'-P3'-(X)q, wherein q is an integer selected from 1 to 15, and X can be any amino acid. In various embodiments, q may be 10 or 15. In various embodiments, (X)q may be FDADPLWEISEEGE (SEQ ID NO:98) or variants thereof. In various embodiments, the variants of FDADPLWEISEEGE may comprise one or two amino acid substitutions, for example VDAKPLWEISEEGE (SEQ ID NO:99).
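To make the "one or two amino acid substitutions" criterion concrete, the following Python sketch lists the positions at which a candidate (X)q extension differs from FDADPLWEISEEGE (SEQ ID NO:98). It is a minimal illustration that assumes substitutions only (equal lengths, no insertions or deletions); the function names are hypothetical and not part of the disclosure.

```python
from typing import List, Tuple

REFERENCE_EXTENSION = "FDADPLWEISEEGE"  # SEQ ID NO:98 as recited above


def substitutions(reference: str, variant: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, reference residue, variant residue) differences.

    Assumes both sequences have the same length (substitutions only).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length")
    return [
        (index + 1, ref, var)
        for index, (ref, var) in enumerate(zip(reference, variant))
        if ref != var
    ]


def is_substitution_variant(variant: str, max_substitutions: int = 2) -> bool:
    """True if `variant` differs from SEQ ID NO:98 by 1..max_substitutions residues."""
    if len(variant) != len(REFERENCE_EXTENSION):
        return False
    count = len(substitutions(REFERENCE_EXTENSION, variant))
    return 1 <= count <= max_substitutions


if __name__ == "__main__":
    # SEQ ID NO:99 compared against SEQ ID NO:98
    print(substitutions(REFERENCE_EXTENSION, "VDAKPLWEISEEGE"))
```

Applied to VDAKPLWEISEEGE (SEQ ID NO:99), the comparison reports two differences, F to V at position 1 and D to K at position 4, consistent with the example given in the text.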

[0107]

[0102] In various embodiments, the N-terminus of the second (poly)peptide may comprise an amino acid sequence P1'-P2'-P3'-V-X1-X2-K-(X3)r, where r is an integer from 6 to 11 and X1, X2, and X3 can be any amino acid.
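The P1'-P2'-P3'-V-X1-X2-K-(X3)r pattern can likewise be written as a regular expression. The Python sketch below assumes P1' is drawn from A, V, G, S, T, L or M with P2' = G and P3' = A (the preferred embodiments above), that the input is the N-terminal segment on its own rather than a full-length chain, and that analogues are not modelled; the names used are hypothetical.

```python
import re

# P1'-G-A-V-X1-X2-K-(X3)r with r = 6..11, applied to an isolated
# N-terminal segment in upper-case one-letter code.
N_TERMINAL_PATTERN = re.compile(r"[AVGSTLM]GAV[A-Z]{2}K[A-Z]{6,11}")


def matches_extended_n_motif(segment: str) -> bool:
    """True if the whole segment fits P1'-P2'-P3'-V-X1-X2-K-(X3)r."""
    return N_TERMINAL_PATTERN.fullmatch(segment) is not None


if __name__ == "__main__":
    # G-G-A followed by VDAKPLWEISEEGE (SEQ ID NO:99): V-D-A-K then 10 residues
    print(matches_extended_n_motif("GGAVDAKPLWEISEEGE"))
    # G-G-A followed by FDADPLWEISEEGE (SEQ ID NO:98): no V after P3'
    print(matches_extended_n_motif("GGAFDADPLWEISEEGE"))
```

`fullmatch` is used so that the upper bound on r is actually enforced; a prefix match on a longer chain would make that bound meaningless.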

[0108]

[0103] In various embodiments, the N-terminal of the second (poly)peptide, and more particularly the N-terminal MmCNT recognition motif, may comprise or consist of an amino acid sequence set forth in any one of SEQ ID NOs: 58-68, 75-94, and 100-101 or variants thereof. In various embodiments, the N-terminal of the second (poly)peptide, and more particularly the N-terminal MmCNT recognition motif, may comprise or consist of an amino acid sequence set forth in any one of SEQ ID NOs: 75-86, 88-94 and 101. In various embodiments, the N-terminal of the second (poly)peptide, and more particularly the N-terminal MmCNT recognition motif, may comprise or consist of an amino acid sequence set forth in any one of SEQ ID NOs: 75, 80, 84, 85, 90, 91, 92 and 101.

[0109]

[0104] As used herein, the term “contacting” or “contacting step” refers to any process by which the first (poly)peptide, the second (poly)peptide, and the peptide ligase having the activity of MmConnectase (MmCNT) are brought into physical or functional proximity under conditions that permit enzymatic interaction, substrate recognition, and catalysis of peptide bond formation. The contacting step may comprise, for example, mixing, combining, incubating, or co-localizing the first and second (poly)peptides with the peptide ligase in a common reaction medium or environment suitable for ligation, which may be in solution, on a solid support, at an interface, or within a cellular or subcellular compartment such as the cytosol, nucleus, organellar lumen, or extracellular milieu. In various embodiments, the contacting step may further comprise adjusting one or more physicochemical conditions conducive to MmCNT activity, including but not limited to pH, ionic strength, buffer composition, temperature, metal ion concentration, redox state, or the presence of cofactors or stabilizing agents. In various embodiments, contacting may comprise co-expression or co-transfection of nucleic acids encoding the first (poly)peptide, the second (poly)peptide, and the peptide ligase in a host cell, such that the resulting polypeptides are expressed and interact within the intracellular environment to permit ligation in situ. The contacting step may be performed simultaneously, wherein all components are combined in a single step, or sequentially, such as pre-incubating the peptide ligase with one (poly)peptide substrate to form an intermediate complex before addition of the other substrate. 
Accordingly, the contacting step encompasses any configuration or method of bringing the reactants together in a manner sufficient to allow the peptide ligase to mediate cleavage and ligation between the C-terminal D-X bond of the first (poly)peptide and the N-terminal P1'-P2'-P3' motif of the second (poly)peptide, thereby forming a ligated (poly)peptide product.

[0105] In various embodiments of the methods of ligation, the second (poly)peptide comprising the N-terminal P1'-P2'-P3' motif may be the same as, or different from, the first (poly)peptide comprising the C-terminal K-D-X-G-A motif, and may participate in intermolecular or intramolecular peptide ligation to yield a ligated or cyclic product, respectively.

[0110]

[0106] Such intermolecular ligation may enable the assembly of modular peptide conjugates, protein-protein fusions, or synthetic peptide-tag linkages, and may be employed for site-specific conjugation, labelling, immobilisation, or construction of defined multi-domain polypeptides. Such intramolecular ligation may be used to produce backbone-cyclised polypeptides, head-to-tail macrocycles, or conformationally constrained scaffolds that exhibit enhanced thermal stability, proteolytic resistance, and structural rigidity relative to their linear counterparts. Thus, in various embodiments, the intermolecular ligation reaction facilitates transpeptidation between two separate peptide substrates, thereby enabling programmable peptide-peptide assembly, whereas the intramolecular ligation reaction facilitates self-cyclisation, resulting in a monomeric cyclic product.
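The intramolecular case, in which a single chain carries both recognition motifs, can be illustrated with a short Python sketch. It assumes the P1'-G-A motif is at the extreme N-terminus and the K-D-X-G-A motif at the extreme C-terminus of the same chain, represents the head-to-tail macrocycle by its ring residues written from the former N-terminus, and treats two such strings as the same ring if one is a rotation of the other. The function names are hypothetical and the sketch does not model competition between cyclisation and intermolecular ligation.

```python
import re

P1_ALLOWED = frozenset("AVGSTLM")


def cyclise(seq: str) -> str:
    """Return the ring residues of the head-to-tail cyclic product.

    The X-G-A tail is released on cleavage, and the C-terminal D is joined
    to the chain's own N-terminal P1' residue.
    """
    match = re.search(r"KD[A-Z]GA$", seq)
    has_n_motif = len(seq) >= 3 and seq[0] in P1_ALLOWED and seq[1:3] == "GA"
    if match is None or not has_n_motif:
        raise ValueError("chain must carry both N- and C-terminal motifs")
    return seq[: match.start() + 2]


def same_cycle(ring_a: str, ring_b: str) -> bool:
    """True if two ring sequences are rotations of one another."""
    return len(ring_a) == len(ring_b) and ring_a in ring_b + ring_b


if __name__ == "__main__":
    ring = cyclise("GGAWKDPGA")
    print(ring, same_cycle(ring, "AWKDGG"))
```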

[0111]

[0107] The peptide ligase having the activity of MmCNT for use in the present invention and methods may be any peptide ligase having the desired activity of the peptide ligase MmCNT in recognising specific C- and N-terminal binding and ligation motifs. The peptide ligase has an ability to site-specifically break a peptide bond and then reform a new bond with an incoming nucleophile. The peptide ligase MmCNT, and peptide ligases having the activity of MmCNT, recognise the C-terminal K-D-X-G-A motif of a first (poly)peptide and mediate peptide ligation by cleaving the bond between the C-terminal D and X residues in the first (poly)peptide and ligating the C-terminal D residue of the first (poly)peptide to an N-terminal P1'-P2'-P3' motif of a second (poly)peptide to form a ligated (poly)peptide that may be represented by the formula (A)-K-D-P1'-P2'-P3'-(B), wherein (A) represents the first (poly)peptide and (B) represents the second (poly)peptide.

[0112]

[0108] In various embodiments, the peptide ligase having the activity of MmCNT in accordance with the present application comprises or consists of an amino acid sequence set forth in SEQ ID NO: 1, 2 or 3, as outlined in Table 1 below, or functional variants or fragments thereof. The variants are at least 65%, 75%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 90.5%, 91%, 91.5%, 92%, 92.5%, 93%, 93.5%, 94%, 94.5%, 95%, 95.5%, 96%, 96.5%, 97%, 97.5%, 98%, 98.5%, 99%, 99.25%, or at least 99.5% identical or homologous to the amino acid sequence set forth in SEQ ID NO:1, 2 or 3 over their entire length.

[0113]

[0109] Alternatively or additionally, the peptide ligase may be characterized in that it is obtainable from a peptide ligase contemplated herein as an initial molecule by fragmentation or by deletion, insertion, or substitution mutagenesis, and encompasses an amino acid sequence that matches the initial molecule over a length of at least 150, 160, 170, 180, 185, 186, 187, 188, 189, 190, 191, or 192 continuously connected amino acids. The functional fragments of the peptide ligases described herein retain enzymatic activity. It is preferred that they have at least 50%, more preferably at least 70%, most preferably at least 90% of the protein ligase activity of the initial molecule, preferably of the peptide ligase having the amino acid sequence of SEQ ID NO:1, 2 or 3. The functional fragments are preferably at least 150 amino acids in length, more preferably at least 180, 190, 191 or 192. In various embodiments, the peptide ligase may be truncated at the C-terminus, such that the residue at position 192 of SEQ ID NO: 1 represents the C-terminus of the MmCNT.

[0114]

[0110] The peptide ligase having the activity of MmCNT may be a naturally occurring enzyme and may be provided in isolated form, and may also be referred to as a polypeptide having the peptide ligase activity of MmCNT. The peptide ligase or polypeptide may be “isolated”, wherein “isolated”, as used herein, relates to the polypeptide in a form where it has been at least partially separated from other cellular components with which it may naturally occur or associate. The peptide ligase may be a recombinant polypeptide, i.e. a polypeptide produced in a genetically engineered organism that does not naturally produce said polypeptide. Both native and recombinant polypeptides may be post-translationally modified by N-linked glycosylation.

[0115]

[0111] Table 1: Amino Acid Sequences of MmCNT

SEQ ID NO: | Description / Source | Amino Acid Sequence
1 | MmCNT | TLVIAFIGKNGAVMAGDMREITFEGEKPDREKLEKELYSGSIVTDEEMQKKAEEFGVKITVADCKEKVSERNGVLVGEVSSAEGGWKKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGNEFTKQVANKCFKDNWTKKSNLQDAVKILILCMETVARKTASVSKQFMIVQTASNADVLKWEKDRNC
2 | MmCNT T1A | ALVIAFIGKNGAVMAGDMREITFEGEKPDREKLEKELYSGSIVTDEEMQKKAEEFGVKITVADCKEKVSERNGVLVGEVSSAEGGWKKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGNEFTKQVANKCFKDNWTKKSNLQDAVKILILCMETVARKTASVSKQFMIVQTASNADVLKWEKDRNC
3 | MmCNT T1A / C192S | ALVIAFIGKNGAVMAGDMREITFEGEKPDREKLEKELYSGSIVTDEEMQKKAEEFGVKITVADCKEKVSERNGVLVGEVSSAEGGWKKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGNEFTKQVANKCFKDNWTKKSNLQDAVKILILCMETVARKTASVSKQFMIVQTASNADVLKWEKDRNS

[0121]

[0122]

[0112] In various embodiments, the peptide ligase having the activity of MmCNT in accordance with the present application comprises or consists of:

[0123] (i) the amino acid sequence set forth in SEQ ID NO:1, 2 or 3;

[0124] (ii) an amino acid sequence that shares at least 65%, preferably at least 75%, even more preferably at least 85%, most preferably at least 95% sequence identity with, or at least 80%, preferably at least 90%, more preferably at least 95% sequence homology with the amino acid sequence as set forth in SEQ ID NO:1, 2 or 3; or

[0125] (iii) a functional fragment of (i) or (ii).

[0113] In various embodiments, the peptide ligase comprises or consists of the amino acid sequence as set forth in SEQ ID NO:3.

[0126]

[0114] The identity of nucleic acid or amino acid sequences is generally determined by means of a sequence comparison. This sequence comparison is based on the BLAST algorithm that is established in the existing art and commonly used (cf. for example Altschul et al. (1990) “Basic local alignment search tool”, J. Mol. Biol. 215:403-410, and Altschul et al. (1997): “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs”; Nucleic Acids Res., 25, p. 3389-3402) and is effected in principle by mutually associating similar successions of nucleotides or amino acids in the nucleic acid sequences and amino acid sequences, respectively. A tabular association of the relevant positions is referred to as an "alignment." Sequence comparisons (alignments), in particular multiple sequence comparisons, are commonly prepared using computer programs which are available and known to those skilled in the art.

[0127]

[0115] A comparison of this kind also allows a statement as to the similarity of the compared sequences to one another. This is usually indicated as a percentage identity, which is calculated in relation to a reference sequence and its entire length. The term "sequence identity" refers to the extent that sequences are identical on a nucleotide-by-nucleotide or amino acid-by-amino acid basis over a window of comparison. Thus, a "percentage of sequence identity" is calculated by comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity. The more broadly construed term "homology", in the context of amino acid sequences, also incorporates consideration of the conserved amino acid exchanges, i.e. amino acids having a similar chemical activity, since these usually perform similar chemical activities within the protein. The similarity of the compared sequences can therefore also be indicated as a "percentage homology" or "percentage similarity." Indications of identity and / or homology can be encountered over entire polypeptides or genes, or only over individual regions. Homologous and identical regions of various nucleic acid sequences or amino acid sequences are therefore defined by way of matches in the sequences. Such regions often exhibit identical functions. They can be small, and can encompass only a few nucleotides or amino acids. Small regions of this kind often perform functions that are essential to the overall activity of the protein. It may therefore be useful to relate sequence matches only to individual, and optionally small, regions. Unless otherwise indicated, however, indications of identity and homology herein refer to the full length of the respectively indicated nucleic acid sequence or amino acid sequence.
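The percentage calculation described above may be illustrated by the following minimal sketch. It assumes that the two sequences have already been optimally aligned to equal length (with "-" marking gaps) and that the whole alignment is taken as the window of comparison. The function name is hypothetical, and the sketch is not the BLAST algorithm itself.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of sequence identity over a pre-aligned window.

    Assumption: both strings come from an existing alignment, so they have
    the same length and use '-' for gap positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("window of comparison is empty")
    # Count positions where the identical residue occurs in both sequences;
    # a gap aligned with a gap is not counted as a match.
    matched = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    # Divide by the window size and multiply by 100.
    return 100.0 * matched / len(aligned_a)
```

For example, comparing the aligned strings "KKRR" and "KKRA" gives three matched positions in a window of four.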

[0128]

[0116] While it is recognised that various peptide ligases as described above may be suitable for the practice of the present invention, it is preferable to use one with potent protein ligase activity. In various embodiments, this means that it can ligate a given peptide with an efficiency of at least 50%, preferably at least 70%, more preferably at least 90%, most preferably at least 95%. Methods to determine such efficiency, for example by ligating substrate (100 pM) in the presence of said peptide ligase (50 nM) for 30 min in a standard reaction buffer at neutral pH and room temperature, are well known in the art and can be routinely applied by those skilled in the art. It is preferred that the peptide ligases of the invention have at least 50%, more preferably at least 70%, most preferably at least 90% of the protein ligase activity of the enzyme having the amino acid sequence of SEQ ID NO:1, 2 or 3.

[0129]

[0117] The amino acid position numbering referenced throughout this disclosure is relative to the amino acid sequence set forth in SEQ ID NO:1, 2 or 3, where position 1 is T or A and position 192 is C or S.

[0130]

[0118] In various embodiments, the peptide ligase comprises the amino acid sequence set forth in SEQ ID NO: 4 (KKRRLYASAGNFAIAE) at the positions corresponding to residues 88-103 of SEQ ID NO:1. These residues were shown to represent N-terminal substrate recognition groove features with the amino acid side chains forming specific interactions with the ligation substrate.

[0131]

[0119] In various embodiments, the peptide ligase comprises the amino acid sequence set forth in SEQ ID NO: 5 (NEFTKQVANKCFKDNW) at the positions corresponding to residues 125-140 of SEQ ID NO:1. These residues were shown to represent N-terminal substrate recognition groove features with the amino acid side chains forming specific interactions with the ligation substrate. In particular, these residues form the α-helix 3 of the MmCNT.

[0132]

[0120] In various embodiments, the peptide ligase comprises the amino acid sequences set forth in SEQ ID NO: 4 and 5 at the positions corresponding to residues 88-103 and 125-140 of SEQ ID NO:1, respectively. These define a surface structural feature of the enzyme that serves as the binding site for the approaching N-terminal amino group. When bound to a substrate, these residues form an N-terminal binding groove that adopts a β-sheet secondary structure.

[0133]

[0121] In various embodiments, the peptide ligase comprises the amino acid sequence set forth in SEQ ID NO: 6 (KKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGNEFTKQVANKCFKDNW) at the positions corresponding to residues 88-140 of SEQ ID NO:1. This sequence defines the N-terminal substrate recognition groove of the MmCNT.

[0134]

[0122] A number of amino acid residues of the MmCNT have been found to be relevant for interacting with the substrate and performing the ligation reaction. Thus, in various embodiments, the peptide ligase may comprise one or more or all of the following residues at the designated positions, wherein position numbering is in accordance with SEQ ID NO:1. These residues may be considered as invariable in relation to any variant used for the methods disclosed herein:

[0135] an amino acid residue F at the position corresponding to position 99 of SEQ ID NO:1;

[0136] an amino acid residue I at the position corresponding to position 101 of SEQ ID NO:1; an amino acid residue E at the position corresponding to position 103 of SEQ ID NO:1;

[0137] an amino acid residue N at the position corresponding to position 119 of SEQ ID NO:1; an amino acid residue I at the position corresponding to position 121 of SEQ ID NO:1;

[0138] an amino acid residue F at the position corresponding to position 123 of SEQ ID NO:1;

[0139] an amino acid residue N at the position corresponding to position 133 of SEQ ID NO:1;

[0140] an amino acid residue K at the position corresponding to position 137 of SEQ ID NO:1; and an amino acid residue W at the position corresponding to position 140 of SEQ ID NO:1.
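The residue list above can be checked against the recognition groove sequence by a short sketch such as the following. It uses SEQ ID NO:6 as quoted in this description, which is stated to occupy residues 88-140 of SEQ ID NO:1, and converts the 1-based position numbering into string indices. The function and constant names are hypothetical and serve only for illustration.

```python
# SEQ ID NO:6 as quoted in the description (residues 88-140 of SEQ ID NO:1).
SEQ_ID_NO_6 = "KKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGNEFTKQVANKCFKDNW"
FRAGMENT_START = 88  # position of the first residue of the fragment

# Residues listed in the description as invariable, keyed by position.
INVARIANT_RESIDUES = {
    99: "F", 101: "I", 103: "E", 119: "N", 121: "I",
    123: "F", 133: "N", 137: "K", 140: "W",
}


def residue_at(position, fragment=SEQ_ID_NO_6, start=FRAGMENT_START):
    """Return the residue at a 1-based position of SEQ ID NO:1 numbering."""
    index = position - start
    if not 0 <= index < len(fragment):
        raise IndexError(f"position {position} lies outside the fragment")
    return fragment[index]


def altered_invariant_residues(fragment=SEQ_ID_NO_6, start=FRAGMENT_START):
    """Map each deviating position to (expected, found); empty if none."""
    return {
        pos: (expected, residue_at(pos, fragment, start))
        for pos, expected in INVARIANT_RESIDUES.items()
        if residue_at(pos, fragment, start) != expected
    }
```

Applied to the unmodified fragment, the check reports no deviations. A hypothetical variant carrying a different residue at position 140 would be reported with the expected and the observed residue.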

[0141]

[0123] In various embodiments, amino acid residues C at the positions corresponding to positions 64 and 192 of SEQ ID NO:1 form a disulfide bridge.

[0142]

[0124] Peptide ligases having the activity of MmCNT according to the present application can comprise amino acid modifications, in particular amino acid substitutions, insertions, or deletions. Such peptide ligases are, for example, further developed by targeted genetic modification, i.e. by way of mutagenesis methods, and optimized for specific purposes or with regard to special properties (for example, with regard to their catalytic activity, stability, etc.). The objective may be to introduce targeted mutations, such as substitutions, insertions, or deletions, into the known molecules in order, for example, to alter substrate specificity and / or improve the catalytic activity. For this purpose, in particular, the surface charges and / or isoelectric point of the molecules, and thereby their interactions with the substrate, can be modified. Alternatively or additionally, the stability of the peptide ligase can be enhanced by way of one or more corresponding mutations, and its catalytic performance thereby improved. Advantageous properties of individual mutations, e.g. individual substitutions, can supplement one another. In various embodiments, the peptide ligase may be characterised in that it is obtainable from a peptide ligase as described above as an initial molecule by single or multiple conservative amino acid substitution. The term "conservative amino acid substitution" means the exchange (substitution) of one amino acid residue for another amino acid residue, where such exchange does not lead to a change in the polarity or charge at the position of the exchanged amino acid, e.g. the exchange of a nonpolar amino acid residue for another nonpolar amino acid residue. Conservative amino acid substitutions in the context of the invention encompass, for example, G=A=S, I=V=L=M, D=E, N=Q, K=R, Y=F, and S=T.
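The exemplary substitution groups named above may be encoded as in the following minimal sketch, which reads the second group as isoleucine, valine, leucine and methionine. The groups are treated as non-transitive, so that serine may be exchanged with glycine, alanine or threonine, whereas glycine and threonine are not treated as exchangeable with each other. This reading and the function name are assumptions made for illustration.

```python
# Exemplary conservative groups from the description, one set per group.
CONSERVATIVE_GROUPS = [
    set("GAS"),
    set("IVLM"),
    set("DE"),
    set("NQ"),
    set("KR"),
    set("YF"),
    set("ST"),
]


def is_conservative_substitution(original: str, replacement: str) -> bool:
    """True if both residues share at least one of the listed groups."""
    if original == replacement:
        # Identical residues are not a substitution at all.
        return False
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS
    )
```

Under these assumptions, K to R and S to T are conservative, whereas D to K is not.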

[0143]

[0125] Accordingly, the peptide ligase may comprise one or more mutations, preferably substitutions that enhance ligation activity and / or stability of the enzyme. The one or more mutations may be at a position selected from 1, 38, 79, 81, 83, 125, 192 and combinations thereof, wherein position numbering is relative to the amino acid sequence set forth in SEQ ID NO:1.

[0144]

[0126] In various embodiments, the peptide ligase comprises an amino acid residue S, A, H or D at the position corresponding to position 38 of SEQ ID NO:1, an amino acid residue A at the position corresponding to position 79 of SEQ ID NO:1, an amino acid residue A, G or D at the position corresponding to position 81 of SEQ ID NO:1, an amino acid residue A at the position corresponding to position 83 of SEQ ID NO:1, and / or an amino acid residue S, A or G at the position corresponding to position 125 of SEQ ID NO:1.

[0145]

[0127] In various embodiments, the peptide ligase comprises an amino acid residue A at the position corresponding to position 1 of SEQ ID NO:1. In various embodiments, the peptide ligase comprises an amino acid residue A at the position corresponding to position 81 of SEQ ID NO:1. In various embodiments, the peptide ligase comprises an amino acid residue S at the position corresponding to position 125 of SEQ ID NO:1. In various embodiments, the peptide ligase comprises an amino acid residue S at the position corresponding to position 192 of SEQ ID NO:1.

[0146]

[0128] In various embodiments, the peptide ligase comprises an amino acid residue A at the position corresponding to position 1, an amino acid residue S at the position corresponding to position 192, and an amino acid mutation at position 81 and / or 125, preferably the mutation is S81A and / or N125S.
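The substitution notation used above (original residue, position, replacement residue, e.g. N125S) can be applied to a sequence as in the following sketch. For illustration it operates on SEQ ID NO:6 as quoted in this description (residues 88-140 of SEQ ID NO:1), so that only N125S lies within range; positions 1, 81 and 192 fall outside this fragment. The function name and the `start` parameter are hypothetical.

```python
import re

# SEQ ID NO:6 as quoted in the description (residues 88-140 of SEQ ID NO:1).
SEQ_ID_NO_6 = "KKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGNEFTKQVANKCFKDNW"

# Notation such as "N125S": original residue, 1-based position, replacement.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_point_mutation(sequence: str, mutation: str, start: int = 1) -> str:
    """Return a copy of `sequence` carrying the given point mutation.

    `start` is the position number of the first residue of `sequence`
    (1 for a full-length sequence, 88 for the SEQ ID NO:6 fragment).
    """
    match = MUTATION_PATTERN.match(mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation notation: {mutation!r}")
    original, position, replacement = (
        match.group(1), int(match.group(2)), match.group(3)
    )
    index = position - start
    if not 0 <= index < len(sequence):
        raise IndexError(f"position {position} lies outside the sequence")
    if sequence[index] != original:
        raise ValueError(
            f"expected {original} at position {position}, "
            f"found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]
```

The check on the original residue guards against numbering mismatches, for example when a fragment is passed without the matching `start` value.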

[0147]

[0129] In various embodiments, the peptide ligase may comprise an affinity tag and optionally a cleavage sequence positioned at the N- or C-terminus of the amino acid sequence as set forth in (i)-(iii) above. In various embodiments, the cleavage sequence is positioned between the affinity tag and the amino acid sequence as set forth in (i)-(iii). In various embodiments, the peptide ligase may comprise a His-tag, and optionally a TEV cleavage site.

[0148]

[0130] In various embodiments, the peptide ligase may be operably fused to one of the ligation substrates (i.e. (poly)peptides) to form a fusion protein. The fusion may be effected at either the N- or C-terminus of the peptide ligase, optionally through a linker sequence that maintains the catalytic activity of the enzyme and the accessibility of the MmCNT recognition motif of the first or second (poly)peptide. Suitable linker sequences include flexible glycine- or serine-rich linkers, or other amino acid spacers of 5-30 residues that reduce steric interference between the enzyme and substrate domains.

[0149]

[0131] In various embodiments, the peptide ligase may be operably fused to the first (poly)peptide. The fusion protein may comprise the peptide ligase covalently linked, either directly or through a flexible linker, to the first (poly)peptide. The fusion is configured such that the peptide ligase retains its catalytic activity and the C-terminal K-D-X-G-A motif of the first (poly)peptide remains sterically accessible for substrate recognition and ligation. In this regard, the first (poly)peptide may function either as a ligase substrate or as a targeting protein depending on the application. In various embodiments, the first (poly)peptide serves as a C-terminal substrate for the peptide ligase, facilitating proximity-enhanced or self-templated ligation. In other embodiments, the first (poly)peptide may function as a targeting protein, such as an antibody or antibody fragment, receptor-binding domain, cell-penetrating peptide (CPP), or other localization sequence, thereby enabling site-specific or targeted ligation of a complementary N-terminal substrate (i.e. second (poly)peptide) at a defined molecular or cellular site.

[0132] In various embodiments, the peptide ligase may be fused to the second (poly)peptide. The fusion may be effected at the N-terminus of the peptide ligase, optionally through a linker sequence that maintains the catalytic activity of the enzyme and the accessibility of the N-terminal recognition motif of the second (poly)peptide. The fusion is configured such that the peptide ligase retains its catalytic activity and the N-terminal P1'-P2'-P3' motif of the second (poly)peptide remains sterically accessible for substrate recognition and ligation.

[0150]

[0133] In various embodiments, the first and second (poly)peptides are termini of the same peptide (i.e. first and second (poly)peptide combine to form a single core peptide sequence) such that the method cyclizes said peptide.

[0151]

[0134] In various embodiments, the (poly)peptides to be ligated in accordance with the present application may be modified by, for example, conjugation to a labelling moiety, either covalently or non-covalently. A labelling moiety may be any molecule such as, without limitation, an affinity tag, therapeutic agent, detectable label, or scaffold molecule.

[0152]

[0135] The term “affinity tag” as used herein refers to a moiety such as biotin that can be used to separate a molecule to which the affinity tag is attached from other molecules that do not contain the affinity tag. Exemplary affinity tags include a polyhistidine tag (His-tag), a FLAG-tag, an HA-tag, a Strep-tag, or other peptide sequences that permit affinity purification or immunodetection using corresponding antibodies or binding partners.

[0153]

[0136] The term “detectable label” is intended to mean at least one label capable of directly or indirectly generating a detectable signal. In non-limiting examples, a detectable label can be an enzyme producing a detectable signal, for example by colorimetry, fluorescence, or luminescence; a chromophore, such as a fluorescent protein, luminescent or dye compound (e.g. green fluorescent proteins such as GFP or EGFP, red fluorescent proteins such as mCherry, or self-labelling proteins such as SNAP-tag that can covalently bind fluorophore substrates); a group with an electron density detectable by electron microscopy or by virtue of its electrical property, such as conductivity, amperometry, voltammetry or impedance; a detectable group, for example the molecules of which are sufficiently large to induce detectable modifications of their physical and / or chemical characteristics (this detection can be carried out by optical methods such as diffraction, surface plasmon resonance, surface variation or contact angle variation, or physical methods such as atomic force spectroscopy or the tunnel effect); or a radioactive molecule such as 32P, 35S or 125I. Detectable labels useful in the methods and uses of the invention may allow, using a single reagent, imaging of tumours in vivo by PET or SPECT followed by fluorescent detection in organ sections or biopsies.

[0154]

[0137] The term “scaffold molecule” as used herein refers to a compound, macromolecule, or molecular framework to which one or more other moieties may be attached covalently or non-covalently, thereby serving as a structural or organizational platform for subsequent assembly or modification. The scaffold molecule may be organic, polymeric, peptidic, or inorganic in nature, and may include natural, semi-synthetic, or synthetic materials. Exemplary scaffold molecules include, but are not limited to, dendrimers, branched polymers, peptides, proteins, nucleic acids, polysaccharides, or surface-bound linkers such as those immobilized on functionalized glass, resin, or nanoparticle substrates. In certain embodiments, the scaffold molecule presents two or more recognition motifs for enzymatic ligation (e.g., peptide ligase motifs), thereby enabling the assembly or organization of dimers, oligomers, or polyproteins in a defined spatial arrangement.

[0155]

[0138] In the methods described herein, at least one of the (poly)peptides to be ligated may be further conjugated to an organic moiety. For this purpose, the (poly)peptide may comprise a reactive group, typically not at the terminus to be ligated. Said reactive group, which may also be a side chain of an amino acid, may then be conjugated to an organic moiety of interest in a further step of the method. The organic moiety may be any molecule or group and comprises pharmaceutically active agents and detectable markers, such as fluorescent markers or biotin. In various embodiments, the active agent may be a small organic molecule pharmaceutical, such as a cancer therapeutic agent.

[0156]

[0139] In various embodiments, the first and / or second (poly)peptide can be an antibody, an antibody fragment or an antibody mimetic.

[0157]

[0140] In various embodiments, the first and / or second (poly)peptides may be a cellular surface protein such that the method results in the modification or tagging of the cellular surface protein and the cellular surface.

[0158]

[0141] In various embodiments, the first and / or second (poly)peptides may be intracellular proteins of a host cell, and the protein ligase is comprised within said host cell, preferably a mammalian host cell.

[0159]

[0142] In various embodiments, the first and / or second peptide may be coupled to a solid support material.

[0160]

[0143] The term “solid support material" as used herein refers to a solid or semi-solid (e.g., a hydrogel) material onto which the peptide can be immobilized. Non-limiting examples include solid supports for peptide synthesis, magnetic beads, glass fibres, and resins. The solid support materials described above can be used for on-column cyclization and / or ligation of at least one substrate peptide or in a method for the cyclisation or ligation of at least one substrate peptide, comprising contacting a solution comprising the at least one substrate peptide with the solid support material described above under conditions that allow cyclization and / or ligation of the at least one substrate peptide. The substrate peptides are those described above and include also the above polypeptide substrate.

[0161]

[0144] In various embodiments, the second (poly)peptide may be coupled on the solid support material, and the first (poly)peptide may be ligated to the second (poly)peptide by the peptide ligase.

❖ Modifying or Tagging a Cell Surface

[0162]

[0145] In various embodiments, the method may result in the modification or tagging of the cellular surface protein and the modification or tagging of the cellular surface of a target cell. In particular, one of the first and second (poly)peptide may be the cellular surface protein, while the other is a peptide of interest to be ligated to said cellular surface protein.

[0163]

[0146] Accordingly, in another aspect, the present invention may be directed to methods for modifying or tagging the surface of a target cell by one or more (poly)peptides of interest, the method comprising: providing the one or more (poly)peptides of interest having a C-terminal K-D-X-G-A motif and / or an N-terminal P1'-P2'-P3' motif, wherein X is any amino acid, P1' is A, V, G, S, T, L, or M, P2' is G or an analogue of G, and P3' is A or an analogue of A;

[0164] providing a peptide ligase having the activity of MmCNT;

[0165] contacting the target cell with the one or more (poly)peptides of interest and the peptide ligase; and

[0166] subjecting the target cell to conditions that allow the peptide ligase to catalyse the ligation of the one or more (poly)peptides of interest to a cellular surface protein of the target cell.
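The two recognition motifs recited in the method above can be expressed as simple patterns, as in the following minimal sketch. It assumes one-letter amino acid codes restricted to the twenty standard residues and does not model analogues of G or A at P2' and P3'. The function names and the example peptide strings other than the motifs themselves are hypothetical.

```python
import re

# C-terminal K-D-X-G-A, where X is any of the twenty standard amino acids.
C_TERMINAL_MOTIF = re.compile(r"KD[ACDEFGHIKLMNPQRSTVWY]GA$")

# N-terminal P1'-P2'-P3': P1' is A, V, G, S, T, L or M; P2' is G; P3' is A.
# Analogues of G and A are not modelled in this sketch.
N_TERMINAL_MOTIF = re.compile(r"^[AVGSTLM]GA")


def has_c_terminal_motif(peptide: str) -> bool:
    """True if the peptide ends with a K-D-X-G-A motif."""
    return C_TERMINAL_MOTIF.search(peptide) is not None


def has_n_terminal_motif(peptide: str) -> bool:
    """True if the peptide starts with a P1'-P2'-P3' motif."""
    return N_TERMINAL_MOTIF.match(peptide) is not None


def are_complementary(peptide_a: str, peptide_b: str) -> bool:
    """True if one peptide offers the C-terminal motif and the other the
    N-terminal motif, in either order."""
    return (
        has_c_terminal_motif(peptide_a) and has_n_terminal_motif(peptide_b)
    ) or (
        has_c_terminal_motif(peptide_b) and has_n_terminal_motif(peptide_a)
    )
```

Such a check may be useful when deciding which of the two motifs a (poly)peptide of interest should carry for a given cellular surface protein.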

[0167]

[0147] In various embodiments, the one or more (poly)peptides of interest may be functionalized to bind a variety of cargo molecules. In various embodiments, the one or more (poly)peptides of interest comprise a labelling moiety such as an affinity tag, therapeutic agent, detectable label, or scaffold molecule. In various embodiments, the one or more (poly)peptides of interest may include but are not limited to antibodies or antibody fragments (e.g. anti-CD19 scFv), cytokines, cytokine receptors, interferon receptors, T-cell receptors (TCRs), B-cell receptors (BCRs), or cell-penetrating peptides (CPPs).

[0168]

[0148] In various embodiments, the target cell recombinantly expresses a cellular surface protein having an N-terminal P1'-P2'-P3' motif or a C-terminal K-D-X-G-A motif for ease of tagging by the one or more peptides of interest. The term “recombinantly express” as used herein refers to the expression of said protein by recombinant DNA technology. The one or more (poly)peptides of interest comprise a different (i.e. complementary) MmCNT recognition motif to that of the cellular surface protein.

[0169]

[0149] In various embodiments, the target cell may express endogenous surface proteins suited for ligation to the one or more (poly)peptides of interest.

[0170]

[0150] In various embodiments, the cellular surface protein is a transmembrane, membrane-anchored or membrane-associated protein that naturally presents, or is engineered to comprise, an MmCNT recognition motif, such as a C-terminal K-D-X-G-A motif and / or an N-terminal P1'-P2'-P3' motif, to serve as the site for ligation of the peptide of interest. The cellular surface protein may be an integral membrane protein, receptor, transporter, adhesion molecule, or extracellularly exposed enzyme located at the plasma membrane or an organelle membrane of the target cell. Examples include, but are not limited to, CD, integrin, or cadherin family members, or engineered derivatives thereof, in which the recognition motif is introduced into an extracellular domain, luminal loop, or terminal region exposed on the cell surface.

[0171]

[0151] Accordingly, the cellular surface protein may function as an anchoring scaffold for enzymatic attachment of diverse peptide species. Such a configuration permits site-specific, enzyme-mediated modification of living cells, enabling applications in targeted cytokine delivery, receptor modulation, cell labelling, immune-cell labelling, tracking, imaging, targeted delivery, or bio-orthogonal surface functionalization with high selectivity and minimal disruption of native protein architecture. In various embodiments, modifying or tagging the surface of a target cell may encompass, without limitation, (i) the formation of antibody-cytokine fusion products on the surface of the target cell for targeted cytokine delivery, sequestration, or modulation within the tumor microenvironment (TME); (ii) the ligation of masking or activating (poly)peptides of interest to cytokines or cytokine receptors that constitute or are associated with the cellular surface proteins of the target cell, thereby regulating local cytokine activity and immune responses; (iii) the ligation of cell-penetrating (poly)peptides of interest to cytokines or receptor ligands to generate CPP-cytokine conjugates capable of facilitating intracellular delivery; and (iv) the site-specific modification of cellular surface proteins comprising T-cell receptors (TCRs), B-cell receptors (BCRs), cytokine receptors, or interferon receptors through covalent attachment of one or more (poly)peptides of interest, for therapeutic, diagnostic, targeting, or labelling applications.

[0172]

[0152] In various embodiments, the cellular surface protein may be genetically modified to contain a C-terminal KDXGA motif appended to its extracellular tail (e.g. EGFP-tagged type II transmembrane protein with the C-terminal (5)KDPGA(10) recognition sequence). Upon exposure of the target cell to a peptide ligase having the activity of MmCNT and to one or more (poly)peptides of interest containing an N-terminal P1'-P2'-P3'-based motif (e.g. GGA(15)-Ub-mCherry or GGA-biotin), the ligase catalyses the formation of an amide bond between the aspartate residue of the surface protein motif and the N-terminal residue of the (poly)peptide of interest. The result is a covalently attached peptide label, therapeutic moiety, or scaffold molecule displayed on the external surface of the target cell.

[0173]

[0153] In various embodiments, the cellular surface protein may present an N-terminal P1'-P2'-P3' motif at an extracellular domain, which allows the ligation of a (poly)peptide of interest carrying a C-terminal KDXGA motif. In this configuration, the MmCNT enzyme catalyses the ligation, linking the C-terminus of the peptide substrate to the N-terminal motif of the cell-surface protein.

[0174]

[0154] In various embodiments, the method further comprises removing the unligated one or more (poly)peptides of interest from the target cell after the “subjecting” step.

[0155] The target cell may be any prokaryotic or eukaryotic cell, including but not limited to bacterial, yeast, plant, or mammalian cells. In various embodiments, the target cell is a mammalian cell, such as a human or animal cell, which may be a cancer cell, immune cell, or somatic cell. In various embodiments, the target cell may include, without limitation, T lymphocytes or B lymphocytes (e.g., cells used in chimeric antigen receptor (CAR-T) therapy or monoclonal antibody production), monocytes, macrophages, dendritic cells or other antigen-presenting cells, endothelial cells, neural cells, hepatocytes, or renal epithelial cells. In various embodiments, the target cell may be an oocyte, embryonic stem cell, hematopoietic stem cell, or any other differentiated or undifferentiated cell. The target cell may be naturally occurring or genetically engineered, provided that it expresses or is modified to express a cellular surface protein comprising a C-terminal or N-terminal recognition motif for the peptide ligase having the activity of MmCNT (e.g., an MmCNT recognition motif), wherein such motif is accessible to the one or more (poly)peptides of interest for enzymatic ligation.

[0175]

[0156] In various embodiments, the contacting step may comprise exposing or incubating the target cell with an effective concentration of the peptide ligase and the one or more (poly)peptides of interest under conditions suitable for cell viability and enzymatic ligase activity. The contacting may be carried out in suspension or adherent cell culture, wherein the ligase and peptide are delivered directly to the extracellular milieu such that the ligase can access cell-surface proteins presenting a complementary recognition motif (e.g., N-terminal P1'-P2'-P3' or C-terminal K-D-X-G-A) to that comprised in the (poly)peptide of interest. In various embodiments, the contacting may be achieved by co-incubation of the target cells with the peptide ligase and the (poly)peptides of interest in buffered saline or physiological medium, optionally supplemented with cofactors or stabilizers to enhance ligase activity. In various embodiments, the contacting step may comprise co-delivery or co-expression of the peptide ligase and (poly)peptide of interest within the same culture system, such as by transfection, viral transduction, or cell-penetrating peptide conjugates, allowing ligation to occur on the extracellular surface or at the plasma-membrane interface. The contacting step may further include washing or incubation phases of defined duration and temperature to permit efficient enzymatic recognition and covalent attachment of the peptide(s) of interest to cell-surface proteins, thereby yielding a surface-modified or peptide-tagged cell. In various embodiments, the contacting step may comprise expression of the peptide ligase in the host cell that also expresses or contains the one or more (poly)peptides of interest to be ligated, such that ligation occurs in vivo or in situ, whereby said peptide ligase is nonpurified.

[0176]

[0157] The step of subjecting the target cell to conditions that allow the peptide ligase to catalyse ligation may refer to maintaining or treating the target cell under environmental, physicochemical, or biological conditions that permit the peptide ligase to retain catalytic activity and access the cellular surface proteins. Such conditions may include, but are not limited to, incubation at physiological or near-physiological temperature (for example, about 25 °C to 40 °C) and pH (for example, about pH 6.5 to 8.0) in a buffered aqueous medium that is compatible with cell viability, optionally supplemented with cofactors, salts, divalent metal ions (e.g., Mg2+, Mn2+, or Ca2+), or reducing agents that stabilize the ligase. In various embodiments, the conditions may comprise culturing the cells in a suitable growth or reaction medium after administration of the peptide ligase and one or more (poly)peptides of interest, thereby enabling surface-localised enzymatic ligation. In various embodiments, the conditions may further involve transient permeabilisation or membrane association of the peptide ligase to facilitate access to extracellular or membrane-anchored substrates. The step encompasses any incubation, treatment, or environmental adjustment sufficient to promote ligase-mediated cleavage of the C-terminal D-X bond of the peptide substrate and ligation of the resulting D residue to the N-terminal P1'-P2'-P3' motif of the (poly)peptide of interest on the cellular surface.
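The outcome of the reaction described above, namely cleavage of the D-X bond of the C-terminal K-D-X-G-A motif and joining of the resulting D residue to the N-terminal P1'-P2'-P3' motif, may be pictured at the sequence level by the following sketch. It only manipulates one-letter strings and says nothing about kinetics or reaction conditions. Analogues of G and A are not modelled, and the function name and the example flanking sequences ("MSEQ", "HHH") are hypothetical.

```python
import re
from typing import Tuple

# C-terminal K-D-X-G-A (X: any standard amino acid) and N-terminal
# P1'-P2'-P3' (P1': A, V, G, S, T, L or M; P2': G; P3': A).
C_TERMINAL_MOTIF = re.compile(r"KD[ACDEFGHIKLMNPQRSTVWY]GA$")
N_TERMINAL_MOTIF = re.compile(r"^[AVGSTLM]GA")


def ligation_product(c_motif_peptide: str,
                     n_motif_peptide: str) -> Tuple[str, str]:
    """Return (ligated product, released fragment) as one-letter strings."""
    if C_TERMINAL_MOTIF.search(c_motif_peptide) is None:
        raise ValueError("first peptide lacks a C-terminal K-D-X-G-A motif")
    if N_TERMINAL_MOTIF.match(n_motif_peptide) is None:
        raise ValueError("second peptide lacks an N-terminal P1'-P2'-P3' motif")
    # Cleavage of the D-X bond releases the last three residues (X-G-A);
    # the retained part now ends in D.
    retained = c_motif_peptide[:-3]
    released = c_motif_peptide[-3:]
    # The C-terminal D is joined to the N-terminal P1' residue.
    return retained + n_motif_peptide, released
```

With the hypothetical inputs "MSEQKDPGA" and "GGAHHH", the sketch joins the two peptides at a K-D-G-G-A junction and returns the tripeptide P-G-A as the released fragment.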

[0177]

[0158] It is therefore believed that the present invention provides a versatile and fast-acting technology for modifying or tagging the cell surface by attaching modified or unmodified (poly)peptides of interest. Compared to conventional chemical labelling strategies, this method enables specific and fast conjugation to the N- and / or C-terminus of surface proteins.

[0178]

[0159] It is also within the scope of the present invention that the one or more (poly)peptides of interest may be endogenously or recombinantly expressed on the surface of the target cell, in which case no additional said peptides need to be provided and the method described herein results in hetero- or homo-dimerization, oligomerization, or multimerization of surface proteins of the target cell.

[0179] ❖ Intracellular Peptide Ligation

[0180]

[0160] It is also contemplated that the methods of peptide ligation may be performed directly within the cellular environment of a host cell, in embodiments where the first and / or second (poly)peptides are intracellular proteins of the host cell.

[0181]

[0161] Accordingly, in another aspect, the present invention may be directed to methods for intracellular peptide ligation, comprising,

[0182] providing a host cell that comprises, within its intracellular environment, the first and second (poly)peptide, and the peptide ligase having the activity of MmCNT, such that the peptide ligase may contact the first and second (poly)peptide, and

[0183] subjecting the host cell to conditions that allow the peptide ligase to catalyse the ligation of the first and second (poly)peptide within the intracellular environment of the host cell.

[0184]

[0162] The method described herein may be applicable to all types of host cells in vitro, ex vivo, or in vivo. All cells are in principle suitable as host cells, including prokaryotic or eukaryotic cells that can be manipulated or genetically engineered in an advantageous fashion. In various embodiments, the host cell may be a bacterial, yeast, plant, mammalian or human cell. In various embodiments, the host cell is a mammalian cell, such as a T lymphocyte or B lymphocyte (for example, cells used in chimeric antigen receptor (CAR-T) therapies or in monoclonal antibody production), a monocyte, macrophage, dendritic cell or other antigen-presenting cell, an endothelial cell, neural cell, hepatocyte, or renal epithelial cell. In various embodiments, the host cell may be a cancer cell or a cell derived from a cancer cell line, an oocyte, an embryonic stem cell, a hematopoietic stem cell, or any other differentiated or undifferentiated cell suitable for genetic manipulation. Host cells contemplated herein can be modified in terms of their requirements for culture conditions, can comprise other or additional selection markers, or can also express other or additional proteins. They can, in particular, be those host cells that transgenically express multiple proteins or enzymes, including those participating in biosynthetic, signalling, or ligase-mediated pathways.

[0185]

[0163] In various embodiments, the first and / or second (poly)peptide and the peptide ligase may be endogenously expressed in the host cell, for example, through the use of a genetically engineered cell line or organism in which the corresponding genes are integrated into the genome or maintained as episomal elements.

[0186]

[0164] In various embodiments, the first and second (poly)peptides, and the peptide ligase, are introduced into the host cell exogenously by any method well known to the skilled person, such as by transfection, transduction, electroporation, microinjection, lipid-mediated delivery, or viral vector-based expression systems. Accordingly, the method may comprise introducing into the host cell one or more nucleic acid molecules comprising nucleotide sequences encoding the first and second (poly)peptides, and the peptide ligase.

[0187]

[0165] In various embodiments, the peptide ligase is recombinantly expressed in the host cell in a non-purified form such that ligation occurs in vivo or in situ without the need for isolation or exogenous addition of the peptide ligase. In various embodiments, the host cell may comprise a nucleic acid molecule encoding the peptide ligase, enabling intracellular expression and ligase activity on (poly)peptides that are co-expressed or present within the same cell. The intrinsically expressed peptide ligase may therefore catalyse intermolecular or intramolecular ligation reactions directly in the cellular environment. This eliminates the need for enzyme purification and enables continuous or self-contained peptide assembly within the host cell, in particular within mammalian host cells.

[0188]

[0166] In various embodiments, the nucleotide sequence encoding the first and second (poly)peptides, and the peptide ligase may be comprised in separate, distinct nucleic acid molecules. In particular, the method may comprise introducing into the host cell a first nucleic acid molecule comprising a nucleotide sequence encoding the first (poly)peptide, a second nucleic acid molecule comprising a nucleotide sequence encoding the second (poly)peptide, and a third nucleic acid molecule comprising a nucleotide sequence encoding the peptide ligase.

[0189]

[0167] The nucleic acid molecules can be DNA molecules or RNA molecules. They can exist as an individual strand, as an individual strand complementary to said individual strand, or as a double strand. With DNA molecules in particular, the sequences of both complementary strands in all three possible reading frames are to be considered in each case. Also to be considered is the fact that different codons, i.e. base triplets, can code for the same amino acids, so that a specific amino acid sequence can be coded by multiple different nucleic acids. As a result of this degeneracy of the genetic code, all nucleic acid sequences that can encode the (poly)peptides and peptide ligases are included. The skilled artisan is capable of unequivocally determining these nucleic acid sequences, since despite the degeneracy of the genetic code, defined amino acids are to be associated with individual codons. The skilled artisan can therefore, proceeding from an amino acid sequence, readily ascertain nucleic acids coding for that amino acid sequence. In addition, in the context of nucleic acids, according to the present invention one or more codons can be replaced by synonymous codons. This aspect refers in particular to heterologous expression of the enzymes contemplated herein. For example, every organism, e.g. a host cell of a production strain, possesses a specific codon usage. "Codon usage" is understood as the translation of the genetic code into amino acids by the respective organism. Bottlenecks in protein biosynthesis can occur if the codons located on the nucleic acid are confronted, in the organism, with a comparatively small number of loaded tRNA molecules. The result is that such a codon is translated in the organism less efficiently than a synonymous codon that codes for the same amino acid.
Because of the presence of a larger number of tRNA molecules for the synonymous codon, the latter can be translated more efficiently in the organism. By way of methods commonly known today such as, for example, chemical synthesis or the polymerase chain reaction (PCR) in combination with standard methods of molecular biology or protein chemistry, a skilled artisan has the ability to manufacture, on the basis of known DNA sequences and / or amino acid sequences, the corresponding nucleic acids all the way to complete genes. Such methods are known, for example, from Sambrook, J., Fritsch, E. F., and Maniatis, T., 2001, Molecular cloning: a laboratory manual, 3rd edition, Cold Spring Harbor Laboratory Press.
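
By way of non-limiting illustration only, the degeneracy described above can be sketched in Python. The sketch assumes the standard genetic code (NCBI translation table 1); the `preferred` codon mapping passed to `back_translate` is a hypothetical stand-in for a host-specific codon usage table and is not taken from the present disclosure.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its amino acid.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group synonymous codons by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def count_encodings(peptide):
    """Number of distinct DNA sequences that encode the given peptide."""
    return prod(len(SYNONYMS[aa]) for aa in peptide)


def back_translate(peptide, preferred=None):
    """Pick one codon per residue; 'preferred' (hypothetical) maps residue -> codon."""
    preferred = preferred or {}
    return "".join(preferred.get(aa, SYNONYMS[aa][0]) for aa in peptide)


def translate(dna):
    """Translate a DNA string in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


motif = "KDPGA"  # example instance of a K-D-X-G-A motif with X = P
dna = back_translate(motif, preferred={"K": "AAG"})
print(count_encodings(motif), dna, translate(dna))
```

Replacing one codon by a synonymous codon changes the nucleic acid sequence but leaves the translated amino acid sequence unchanged, which is the property relied upon for codon adaptation to a given host cell.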

[0190]

[0168] In various embodiments, the nucleic acid molecule may be a vector. "Vectors" are understood for purposes herein as elements, made up of nucleic acids, that contain a nucleic acid molecule contemplated herein as a characterising nucleic acid region. They enable said nucleic acid to be established as a stable genetic element in a species or a cell line over multiple generations or cell divisions. In the context herein, a nucleic acid as contemplated herein may be cloned into a vector. Included among the vectors are, for example, those whose origins are bacterial plasmids, viruses, or bacteriophages, or predominantly synthetic vectors or plasmids having elements of widely differing derivations. Using the further genetic elements present in each case, vectors are capable of establishing themselves as stable units in the relevant host cells over multiple generations.

[0191]

[0169] In various embodiments, the nucleic acid molecule may be an expression vector. Expression vectors encompass nucleic acid molecules which are capable of replicating in the host cells that contain them, and expressing therein a contained nucleic acid. In various embodiments, the vectors described herein thus also contain regulatory elements that control expression of the nucleic acids encoding a polypeptide of the invention. Expression is influenced in particular by the promoter or promoters that regulate transcription. Expression can occur in principle by means of the natural promoter originally located in front of the nucleic acid to be expressed, but also by means of a host-cell promoter furnished on the expression vector or also by means of a modified, or entirely different, promoter of another organism or of another host cell. In the present case at least one promoter for expression of a nucleic acid as contemplated herein is made available and used for expression thereof. Expression vectors can furthermore be regulated, for example by way of a change in culture conditions or when the host cells containing them reach a specific cell density, or by the addition of specific substances, in particular activators of gene expression. One example of such a substance is the galactose derivative isopropyl-beta-D-thiogalactopyranoside (IPTG), which is used as an activator of the bacterial lactose operon (lac operon). In contrast to expression vectors, the contained nucleic acid is not expressed in cloning vectors.

[0192]

[0170] In various embodiments, the nucleotide sequences encoding the first and second (poly)peptide and the peptide ligase may be comprised in expression plasmids, each under the control of either constitutive or inducible promoters (for example, CMV, EF1a, T7, Tet-On / Tet-Off, arabinose, or lac-based systems). The use of inducible promoters enables temporal control of expression, such that the peptide ligase or the (poly)peptides may be expressed sequentially or in response to an external stimulus (e.g., addition of doxycycline or IPTG). In various embodiments, all three components may be encoded on a single multicistronic vector, optionally separated by internal ribosome entry sites (IRES) or self-cleaving peptide sequences (e.g., P2A, T2A), allowing co-expression within the same cellular compartment.

[0193]

[0171] In various embodiments, the first and second (poly)peptide and the peptide ligase may each be expressed with localisation sequences (e.g., nuclear localisation signals, mitochondrial targeting peptides, or secretion signals) to direct the ligation reaction to a defined subcellular compartment. Alternatively, the peptide ligase may be delivered as a purified protein or via cell-penetrating peptide (CPP) conjugation for transient intracellular activity, while the peptides are endogenously expressed.

[0194]

[0172] Following expression or delivery, the host cell may be subjected to suitable culture conditions (e.g., temperature, pH, and ionic strength compatible with enzyme activity) for a period sufficient to permit the peptide ligase to catalyse in situ ligation of the first and second (poly)peptides, thereby forming a ligated product or polyprotein within the cell.

[0195] ❖ Method of Tandem Ligation

[0196]

[0173] In various embodiments, the peptide ligase having the activity of MmCNT may be used for the tandem ligation of three (poly)peptides to form a covalently linked trimeric construct. Thus, the method of ligation disclosed herein may advantageously provide efficient N- and C-terminal modification of target biological molecules, using a single enzyme and a single reactive (poly)peptide-based ligand with minimal off-target results.

[0197]

[0174] Thus, the peptide ligase disclosed herein may be used in a method of tandem peptide ligation, wherein a first (poly)peptide (A) functions as a central or anchoring scaffold comprising both an N-terminal P1'-P2'-P3' motif and a C-terminal K-D-X-G-A motif. The method further comprises providing a second (poly)peptide (B) and a third (poly)peptide (C), each comprising a complementary MmCNT recognition motif to one of the MmCNT recognition motifs present on the first (poly)peptide, wherein the third (poly)peptide (C) comprises a different MmCNT recognition motif to the second (poly)peptide (B). The peptide ligase catalyses sequential or simultaneous ligation of the second and third (poly)peptides to the N- and / or C-terminal ends of the first (poly)peptide, thereby generating a dually ligated or tandemly assembled complex.

[0198]

[0175] Accordingly, in another aspect, the present invention may be directed to methods for tandem ligation, comprising,

[0199] providing a first (poly)peptide (A) comprising an N-terminal P1'-P2'-P3' motif and a C-terminal K-D-X-G-A motif,

[0200] providing a second (poly)peptide (B) comprising an N-terminal P1'-P2'-P3' motif or a C-terminal K-D-X-G-A motif,

[0201] providing a third (poly)peptide (C) comprising an N-terminal P1'-P2'-P3' motif or a C-terminal K-D-X-G-A motif, wherein the third (poly)peptide comprises a different MmCNT recognition motif to the second (poly)peptide,

[0202] wherein X is any amino acid, P1' is A, V, G, S, T, L, or M, P2' is G or an analogue of G, and P3' is A or an analogue of A,

[0203] contacting the first (poly)peptide (A) with the second (poly)peptide (B) and the peptide ligase having the activity of MmCNT under conditions that allow ligation of the second (poly)peptide to the C- or N-terminal of the first (poly)peptide to yield a modified first (poly)peptide; and

[0204] contacting the modified first (poly)peptide with the third (poly)peptide (C) and the peptide ligase having the activity of MmCNT under conditions that allow ligation of the third (poly)peptide to the N- or C-terminal of the first (poly)peptide to yield a dually modified first (poly)peptide.
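
By way of non-limiting illustration only, the two contacting steps recited above can be sketched at the level of one-letter amino acid sequences. The sketch rests on stated assumptions: the motifs are encoded as simple patterns, analogues of G and A are not modelled, and ligation is modelled as cleavage of the D–X bond of the C-terminal K-D-X-G-A motif with release of the X-G-A fragment. The function names and the toy sequences are hypothetical and do not correspond to any sequence of the present disclosure; side reactions such as self-ligation of the bifunctional first (poly)peptide are ignored.

```python
import re

# Assumed pattern encodings of the MmCNT recognition motifs.
C_TERMINAL_MOTIF = re.compile(r"KD[A-Z]GA$")    # K-D-X-G-A, X = any residue
N_TERMINAL_MOTIF = re.compile(r"^[AVGSTLM]GA")  # P1'-P2'-P3' with P2' = G, P3' = A


def has_c_motif(seq: str) -> bool:
    return C_TERMINAL_MOTIF.search(seq) is not None


def has_n_motif(seq: str) -> bool:
    return N_TERMINAL_MOTIF.match(seq) is not None


def ligate(donor: str, acceptor: str) -> str:
    """Join the C-terminus of 'donor' to the N-terminus of 'acceptor'."""
    if not has_c_motif(donor):
        raise ValueError("donor lacks a C-terminal K-D-X-G-A motif")
    if not has_n_motif(acceptor):
        raise ValueError("acceptor lacks an N-terminal P1'-P2'-P3' motif")
    # Assumption: the X-G-A tripeptide leaves as the released fragment.
    return donor[:-3] + acceptor


def tandem_ligate(first: str, second: str, third: str) -> str:
    """Two-step tandem ligation of 'second' and 'third' onto 'first'."""
    if not (has_n_motif(first) and has_c_motif(first)):
        raise ValueError("first (poly)peptide must carry both motifs")
    if has_c_motif(second) == has_n_motif(second):
        raise ValueError("second (poly)peptide must carry exactly one motif")
    if has_c_motif(third) == has_n_motif(third):
        raise ValueError("third (poly)peptide must carry exactly one motif")
    if has_c_motif(third) == has_c_motif(second):
        raise ValueError("second and third (poly)peptides must carry different motifs")
    if has_c_motif(second):
        # Second (poly)peptide goes onto the N-terminal of the first.
        modified = ligate(second, first)
        return ligate(modified, third)
    # Second (poly)peptide goes onto the C-terminal of the first.
    modified = ligate(first, second)
    return ligate(third, modified)


scaffold = "AGAWWWWKDPGA"  # toy first (poly)peptide (A), both motifs
tag_b = "MHHHHKDSGA"       # toy second (poly)peptide (B), C-terminal motif only
tag_c = "GGAYYYY"          # toy third (poly)peptide (C), N-terminal motif only
print(tandem_ligate(scaffold, tag_b, tag_c))
```

In this simplified model, the order in which the second and third (poly)peptides are supplied changes the intermediate but not the final dually modified sequence.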

[0205]

[0176] In various embodiments, the first, second, and third (poly)peptides may encompass: (i) antibody or nanobody assemblies, wherein the first (poly)peptide (A) comprises an antibody heavy chain, Fc scaffold, or hinge domain, and the second and third (poly)peptides (B, C) comprise nanobody (VHH) fragments, single-chain variable fragments (scFv), or antigen-binding domains, thereby forming multivalent or multispecific antibody complexes such as bispecific T-cell engagers (BiTEs) or nanobody-conjugated constructs; (ii) antibody-drug or antibody-cytokine conjugates, wherein the first (poly)peptide (A) is an antibody or fragment thereof and the second and third (poly)peptides (B, C) comprise therapeutic payloads, cytokines, or targeting peptides, enabling controlled dual-site functionalization of antibody scaffolds; or (iii) enzymatic or metabolic complexes, wherein the first (poly)peptide (A) is a catalytic or structural subunit, such as a fatty acid synthase, polyketide synthase, or multienzyme scaffold component, and the second and third (poly)peptides (B, C) correspond to distinct enzymatic domains, cofactors, or accessory proteins, thereby reconstructing or extending the native multienzyme architecture through defined covalent linkages. The resulting tandem-ligated product is a dually modified first (poly)peptide exhibiting enhanced structural stability, multivalency, or cooperative catalytic activity relative to its individual components.

[0177] In order to avoid side reactions, the method of tandem ligation described herein may be performed in two separate steps. In the first step, the second (poly)peptide is ligated to the first (poly)peptide. Non-ligated peptides may be removed after this step and the ligation product isolated. In a second step, the product of the first step may then be ligated to the third (poly)peptide to yield a linear dually modified (poly)peptide, typically with the second and third (poly)peptide ligated to the N- and C-terminal, respectively, of the first (poly)peptide or vice versa, or to yield a cyclised product.

[0206]

[0178] In various embodiments, the C-terminal of the second (poly)peptide may be ligated to the N-terminal of the first (poly)peptide. The main product of such a reaction would thus be a ligation product where the second (poly)peptide is linked to the N-terminal of the first (poly)peptide by a peptide bond. Alternatively, in said first step the main reaction may be the reaction of the C-terminal of the first (poly)peptide with the N-terminal of the second (poly)peptide.

[0207]

[0179] In the second step, either the C-terminal of the ligation product of the first step is ligated to the N-terminal of the third (poly)peptide or the C-terminal of the third (poly)peptide is ligated to the N-terminal of the first (poly)peptide. This may be dependent on whether the second (poly)peptide has been ligated to the N- or C-terminal of the first (poly)peptide in the first step, as the third (poly)peptide is preferably ligated to that end of the first (poly)peptide that has not been ligated to the second (poly)peptide to yield a dually modified, i.e. a C- and N-terminally modified, (poly)peptide.

[0208]

[0180] In various embodiments, the second (poly)peptide has a C-terminal K-D-X-G-A motif and is ligated to the N-terminal of the first (poly)peptide. Such methods are also referred to herein as “N-to-C tandem ligation”, since the N-terminal of the first (poly)peptide is ligated first.

[0209]

[0181] In various other embodiments, the second (poly)peptide has an N-terminal P1'-P2'-P3' motif and is ligated to the C-terminal of the first (poly)peptide. Such methods are also referred to herein as “C-to-N tandem ligation”, since the C-terminal of the first (poly)peptide is ligated first.

[0210] ❖ Method of Polymerisation

[0211]

[0182] It is also contemplated that the peptide ligations disclosed herein may be performed iteratively or using bifunctional peptide substrates bearing both N- and C-terminal recognition motifs, such that the ligation reaction can be extended to generate dimers, oligomers, or multimers of (poly)peptide(s) of interest. Such higher-order constructs are often referred to as (poly)proteins, particularly when the resulting polypeptide chain comprises repeated or modular units covalently joined via peptide bonds.

[0212]

[0183] The term “(poly)protein” refers to a polymeric protein construct formed by ligation of two or more individual (poly)peptide subunits, through enzyme-catalysed peptide bond formation. In this regard, the (poly)protein is generated via ligation reactions mediated by either a single peptide ligase or two distinct peptide ligases acting at orthogonal recognition motifs, such that successive or site-specific ligation events yield a continuous polypeptide chain of defined sequence and orientation. The resulting (poly)protein may comprise identical or heterologous polypeptide units, optionally arranged in a head-to-tail configuration, and may preserve the native folding or functional properties of each constituent domain while providing a covalently linked macromolecular assembly suitable for biophysical, structural, or mechanochemical studies.

[0213]

[0184] Accordingly, in another aspect, the present invention may be directed to methods for preparing a dimer, oligomer, or multimer of one or more peptides using the peptide ligases having the activity of MmCNT, and may be also described as a method of forming a (poly)protein or protein polymerization, where the enzymatic ligation reaction is employed for controlled protein polymerization or assembly.

[0214]

[0185] In various embodiments, the method of preparing a dimer, oligomer, or multimer of one or more (poly)peptides of interest, comprises the steps of:

[0215] providing one or more (poly)peptides of interest having a C-terminal K-D-X-G-A motif, and an N-terminal P1'-P2'-P3' motif, wherein X is any amino acid, P1' is A, V, G, S, T, L, or M, P2' is G or an analogue of G, and P3' is A or an analogue of A,

[0216] providing a peptide ligase having the activity of MmCNT; and

[0217] contacting the one or more (poly)peptides of interest, and the peptide ligase having the activity of MmCNT under conditions that allow the peptide ligase to catalyse ligation of one (poly)peptide of interest with another (poly)peptide of interest to form a dimer, oligomer, or multimer (poly)protein. In particular, the peptide ligase catalyses intermolecular ligation between the C-terminal D–X bond of one (poly)peptide and the N-terminal P1'-P2'-P3' motif of another (poly)peptide.
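
By way of non-limiting illustration only, head-to-tail polymerisation of bifunctional (poly)peptides can be sketched as repeated joining of sequence strings. The sketch assumes that each ligation event cleaves the D–X bond and releases the three residues X-G-A, that P2' is G and P3' is A (analogues are not modelled), and that units are joined in the order supplied. The names `polymerise` and `expected_length` and the toy monomer are hypothetical.

```python
import re

C_MOTIF = re.compile(r"KD[A-Z]GA$")    # assumed encoding of K-D-X-G-A
N_MOTIF = re.compile(r"^[AVGSTLM]GA")  # assumed encoding of P1'-P2'-P3'
RELEASED = 3                           # residues X-G-A assumed lost per junction


def is_bifunctional(seq):
    """True if the sequence carries both the N- and C-terminal motif."""
    return bool(N_MOTIF.match(seq)) and bool(C_MOTIF.search(seq))


def polymerise(units):
    """Join bifunctional units head-to-tail in the order given."""
    if not units:
        raise ValueError("at least one unit is required")
    for unit in units:
        if not is_bifunctional(unit):
            raise ValueError(f"unit {unit!r} lacks one of the two motifs")
    chain = units[0]
    for nxt in units[1:]:
        # The growing chain donates its C-terminal motif; the next unit
        # contributes its N-terminal motif and a fresh C-terminal motif.
        chain = chain[:-RELEASED] + nxt
    return chain


def expected_length(units):
    """Length of the product: total residues minus those released at junctions."""
    return sum(len(u) for u in units) - RELEASED * (len(units) - 1)


monomer = "AGAWWWKDPGA"  # toy bifunctional unit
trimer = polymerise([monomer] * 3)
print(trimer, len(trimer), expected_length([monomer] * 3))
```

Because the product retains a C-terminal K-D-X-G-A motif and an N-terminal P1'-P2'-P3' motif, it remains a substrate for further extension in this model, which is why a terminating step is needed to fix the degree of polymerisation.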

[0218]

[0186] In various embodiments, the method further comprises immobilising a scaffold molecule onto a solid support material, thereby providing a foundation for enzymatic assembly of peptides into a polyprotein. The immobilised scaffold molecule may present one or more MmCNT recognition motifs for the peptide ligase, such that upon exposure to one or more (poly)peptides of interest bearing the corresponding complementary recognition motif, the peptide ligase catalyses the formation of a peptide bond between the scaffold and the peptide. Subsequent rounds of ligation with additional peptide units may then proceed at the available recognition sites, resulting in the stepwise formation of a dimer, oligomer, or multimer. The immobilisation of the scaffold molecule thus effectively functionalizes the support, enabling controlled, surface-anchored protein polymerisation and providing spatial orientation for downstream analyses or applications.

[0219]

[0187] In various embodiments, the scaffold molecule comprises one or more copies of the N-terminal P1'-P2'-P3' motif, or one or more copies of the C-terminal K-D-X-G-A motif, and the one or more (poly)peptides of interest and the peptide ligase are contacted with the solid support material comprising the scaffold molecule immobilised thereon.

[0188] In various embodiments, the method further comprises a capping step to terminate further polymerisation or undesired ligation reactions. The capping step may comprise introducing a capping peptide or molecule having a single MmCNT recognition motif (e.g., an N-terminal P1'-P2'-P3' motif or a C-terminal K-D-X-G-A motif) and a non-reactive counterpart terminus to prevent additional rounds of ligation.

[0220]

[0189] In various embodiments, the cap comprises a cohesin (Coh) motif or a functionally equivalent tag that sterically or chemically prevents access of the peptide ligase to the reactive terminus. For example, the capping element may form part of a Coh-Xdoc complex, wherein a dockerin (Xdoc) domain fused to the terminal end of the polyprotein specifically binds to a complementary Coh domain provided on a capping molecule or scaffold. A polyprotein bearing a C-terminal Xdoc domain could be “capped” by adding a Coh-tagged molecule, preventing further enzymatic ligation while simultaneously providing a handle for immobilisation or detection. Alternative capping motifs that function analogously to the Coh-Xdoc system, including orthogonal dockerin-cohesin pairs or other high-affinity binding tags (e.g., SpyTag / SpyCatcher, SnoopTag / SnoopCatcher, or HaloTag systems), may also be employed to achieve selective termination or anchoring of the polymerised product. Other examples of capping strategies include (i) using peptides bearing blocked or acetylated N- or C-termini, (ii) introducing short spacer peptides terminating in a non-recognised motif, or (iii) adding competitive substrate analogues that bind but are not ligated by the peptide ligase.

[0221]

[0190] In various embodiments, the one or more (poly)peptides of interest may comprise any polypeptide, peptide domain, or protein module that is to be covalently joined to form a higher-order construct. The (poly)peptides of interest may be any folded protein possessing a defined secondary and / or tertiary structure, and belong to one or more categories, including structural or mechanical proteins such as ubiquitin (Ub), the 27th immunoglobulin domain of titin (I27), the B1 domain of streptococcal protein G (GB1), fibronectin type III domains, ankyrin repeats, or leucine-rich repeat modules; enzymatic or catalytic proteins such as oxidoreductases, hydrolases, transferases, kinases, phosphatases, or proteases, including truncated or engineered fragments thereof that retain catalytic or binding activity; binding or recognition proteins such as antibody fragments (e.g., scFv, Fab, or nanobody domains), receptor-ligand interaction domains (e.g., SH2, PDZ, or WW domains), or DNA / RNA-binding motifs (e.g., zinc fingers, RRM domains, or PUF domains); reporter or tag proteins such as fluorescent proteins (e.g., GFP, mCherry), luminescent reporters (e.g., luciferase), or affinity tags (e.g., His-tag, FLAG-tag, or Strep-tag) useful for detection, purification, or visualization of the ligated constructs; regulatory or signalling domains including phosphorylation motifs, ubiquitination sites, docking domains, or intrinsically disordered regions that modulate molecular interactions, phase separation, or post-translational regulation; and engineered or synthetic peptides including designed repeat proteins, de novo folded mini-proteins, or peptide scaffolds containing reactive termini for controlled ligation.

[0191] The (poly)peptides of interest may be monomeric units capable of repetitive ligation to form dimers, oligomers, or multimers, or may comprise distinct functional or structural domains to generate hetero-multimeric assemblies. In various embodiments, the (poly)peptides of interest include folded protein domains, enzymatic domains, fluorescent tags, or mechanically stable modules suitable for single-molecule force spectroscopy, protein immobilization, or modular bioconjugation assays.

[0222]

[0192] It is also contemplated that the peptide ligase having the activity of MmCNT disclosed herein may be used in conjunction with another peptide ligase (i.e. peptide asparaginyl ligase (PAL)) thereby performing stepwise ligations of two different (poly)peptides. The other peptide ligase (i.e. PAL) may be termed as a second peptide ligase herein that is different from the first peptide ligase (i.e. MmCNT) and recognises a different binding and ligation site to the first peptide ligase, to enable alternating, orthogonal chain extension.

[0223]

[0193] Accordingly, the method for preparing a dimer, oligomer, or multimer of two different (poly)peptides may use the peptide ligases having the activity of MmCNT in conjunction with a second peptide ligase. In various embodiments, the second peptide ligase is selected to catalyse site-specific peptide bond formation at a recognition motif distinct from that recognised by the MmCNT, thereby enabling sequential or orthogonal ligation reactions for controlled assembly of multicomponent (poly)proteins. It would be appreciated by a person skilled in the art that different peptide ligases and variants thereof having the desired protein ligase activity may be suitable for the practice of such methods. In various embodiments, the second peptide ligase may be Oldenlandia affinis asparaginyl endopeptidase 1 (OaAEP1) or a functional variant thereof (e.g., OaAEP1b-C247A), which recognises N-terminal GL or NGL motifs and C-terminal [N / D]HV, [N / D]HL, or similar motifs.

[0224]

[0194] In various embodiments, the second peptide ligase may be selected from butelase-1, butelase-2, VyPAL2, VyPAL3, HeAEP3, AtLEGy, VuPALI, HaPAL, OaAEP1b or a functional fragment or variant or homologue thereof exhibiting equivalent substrate specificity. In various embodiments, the second peptide ligase is selected from the group comprising butelase-1 comprising the amino acid sequence set forth in SEQ ID NO: 12, butelase-2 comprising the amino acid sequence set forth in SEQ ID NO: 13 or SEQ ID NO: 14, VyPAL2 comprising the amino acid sequence set forth in SEQ ID NO: 15, VyPAL3 comprising the amino acid sequence set forth in SEQ ID NO: 16, HeAEP3 comprising the amino acid sequence set forth in SEQ ID NO: 17, AtLEGy comprising the amino acid sequence set forth in SEQ ID NO: 18, VuPALI comprising the amino acid sequence set forth in SEQ ID NO: 19, HaPALI comprising the amino acid sequence set forth in SEQ ID NO: 20, OaAEP1b comprising the amino acid sequence set forth in SEQ ID NO: 21, and a functional variant or a fragment thereof. In various embodiments, the second peptide ligase may be OaAEP1b comprising the amino acid sequence set forth in SEQ ID NO: 21, or a functional variant, or homologue, or a fragment thereof.

[0226]

[0195] Table 2: Amino acid sequences of the second peptide ligases.

SEQ ID NO: 12
Description: butelase-1 (Clitoria ternatea)
Amino acid sequence: MKNPLAILFLIATWAWSGIRDDFLRLPSQASKFFQADDNVEGTRWAV LVAGSKGYVNYRHQADVCHAYQILKKGGLKDENIIVFMYDDIAYNESNP HPGVIINHPYGSDVYKGVPKDYVGEDINPPNFYAVLLANKSALTGTGS GKVLDSGPNDHVFIYYTDHGGAGVLGMPSKPYIAASDLNDVLKKKHAS GTYKSIVFYVESCESGSMFDGLLPEDHNIYVMGASDTGESSWVTYCPL QHPSPPPEYDVCVGDLFSVAWLEDCDVHNLQTETFQQQYEVVKNKTI VALIEDGTHVVQYGDVGLSKQTLFVYMGTDPANDNNTFTDKNSLGTP RKAVSQRDADLIHYWEKYRRAPEGSSRKAEAKKQLREVMAHRMHIDN SVKHIGKLLFGIEKGHKMLNNVRPAGLPVVDDWDCFKTLIRTFETHCG SLSEYGMKHMRSFANLCNAGIRKEQMAEASAQACVSIPDNPWSSLHA GFSV

SEQ ID NO: 13
Description: butelase-2 (Clitoria ternatea), His tag, G252V and G182A mutations
Amino acid sequence: MGHHHHHHSSGVDLGTENLYFQSMARLNPQKEWDSVIRLPTEPVDA DTDEVGTRWAVLVAGSNGYENYRHQADVCHAYQLLIKGGLKEENIW FMYDDIAWHELNPRPGVIINNPRGEDVYAGVPKDYTGEDVTAENLFAVI LGDRSKVKGGSGKVINSKPEDRIFIFYSDHGAPGVLGMPNEQILYAMD FIDVLKKKHASGGYREMVIYVEACESGSLFEGIMPKDLNVFVTTASNAQ ENSWVTYCPGTEPSPPPEYTTCLGDLYSVAWMEDSESHNLRRETVN QQYRSVKERTSNFKDYAMGSHVMQYGDTNITAEKLYLFQGFDPATVN LPPHNGRIEAKMEVVHQRDAELLFMWQMYQRSNHLLGKKTHILKQIAE TVKHRNHLDGSVELIGVLLYGPGKGSPVLQSVRDPGLPLVDNWACLK SMVRVFESHCGSLTQYGMKHMRAFANICNSGVSESSMEEACMVACG GHDAGHL

SEQ ID NO: 14
Description: butelase-2 (Clitoria ternatea), His tag, G252V and P183A mutations
Amino acid sequence: MGHHHHHHSSGVDLGTENLYFQSMARLNPQKEWDSVIRLPTEPVDA DTDEVGTRWAVLVAGSNGYENYRHQADVCHAYQLLIKGGLKEENIW FMYDDIAWHELNPRPGVIINNPRGEDVYAGVPKDYTGEDVTAENLFAVI LGDRSKVKGGSGKVINSKPEDRIFIFYSDHGGAGVLGMPNEQILYAMD FIDVLKKKHASGGYREMVIYVEACESGSLFEGIMPKDLNVFVTTASNAQ ENSWVTYCPGTEPSPPPEYTTCLGDLYSVAWMEDSESHNLRRETVN QQYRSVKERTSNFKDYAMGSHVMQYGDTNITAEKLYLFQGFDPATVN LPPHNGRIEAKMEVVHQRDAELLFMWQMYQRSNHLLGKKTHILKQIAE TVKHRNHLDGSVELIGVLLYGPGKGSPVLQSVRDPGLPLVDNWACLK SMVRVFESHCGSLTQYGMKHMRAFANICNSGVSESSMEEACMVACG GHDAGHL

SEQ ID NO: 15
Description: VyPAL2 (Viola yedoensis)
Amino acid sequence: MQLFAAGVILFFLLALSGTIAGGLDVDSLQLPSEAAKFFHNDNSTNDDD SIGTRWAVLIAGSKGYHNYRHQADVCHMYQILRKGGVKDENIIVFMYD DIAYNESNPFPGIIINKPGGENVYKGVPKDYTGEDINNVNFLAAILGNKS AIIGGSGKVLDTSPNDHIFIYYADHGAPGKIGMPSKPYLYADDLVDTLKQ KAATGTYKSMVFYVEACNAGSMFEGLLPEGTNIYAMAASNSTEGSWIT YCPGTPDFPPEFDVCLGDLWSITFLEDCDAHNLRTETVHQQFELVKKK IAYASTVSQYGDIPISKDSLSVYMGTDPANDNRTFVDENSLRPPLKVIH QHDADLYHIWCKYNMAPEGSSKKIEAQKQLLELMSHRAHVDNSITLIG KLLFGVNKASKVLNTVRPVGQPLVDDWQCLKAMIRTFETHCGSLSEY GMKHTLSFANMCNAGIQKEQLAEAAAQACVTFPSNPYSSLAEGFSA

SEQ ID NO: 16
Description: VyPAL3 (Viola yedoensis)
Amino acid sequence: MQLFAAGVILFFLLALSGTIAGGLDVDSLQLPSEAAKFFHNDNSTNDDS SAGTKWAVLIAGSKGYQNYRHQADVCHAYQILRRGGVKDENIIVFMYD DIAYDIRNPYPGTITNSPDKKDVYKGVPKDYTGEDVNVQNFLAVILGNK TALTGGSGKVLDTRPNDHIFIYYTDHGYAGVLGMPTQPYLYANDLIDTL KKKHASGTYESLVFYVEACESASIFEGLLPDGLNIYVSTAAKAGEGSW WYCPTQQPPVPAEYGTCVGDLYSVTWMEDCDLYNLRTQTLHQQYE MVKKKIAYASTVSQFGDLTITKDSLFEYMGTDPANEKHHYEDQENSLR PHVDAVHQREADLYHFWDKYQKASEGSRNKVAARKQLVEVMLHRMH VDDSIESIAKLLFGSDAKASEMMNTIRPPGQPLVSDWDCLKTMVRTFE THCGSLSEYGMKYTRFLA

SEQ ID NO: 17
Description: HeAEP3 (Hybanthus enneaspermus)
Amino acid sequence: MKLLVPGVLLLFLLALSGIAAGRPDDFLRLPSEAAKSFLHNDDDSVGTR WAVLIAGSKGWQNYRHQADVCHAYQILKKGGLKDENIIVFMYDDIAYN ESNPRPGIVINKPKGEDVYKGVPKDYTGENVNAVNFLAVLLANRSALT GGSGKVLDSGPNDRIFIYYTDHGAPVTIGMPSKPYLVAKDLVDTLKKKH AAGTYKSMVFYIESCESGSMFDGLLPEDANIYGMTATNSTEGSWVTY CPGQTDDYPEDDEYDVCFGDLWSVAWLEDCDAHNLRTETLDQQYEV VKKKIEYAHIPAQYGNVSLAKDSLFVYMGTDPANDNKTFVEENTLRRPL KAVHSRDADLLHFWHKYHKAPEGTSRKIDAQKQLVEVLSHRTHVDNSI KLVGELLFGVGKASEVLNTIRPAGQPLVDDWDCLKTMVRTFETHCGSL SEYGMKHMRSFANMCNAGVQKEQMAVAAGQACVTFPSNPWSSLDE GFSV

SEQ ID NO: 18
Description: AtLEGy (Arabidopsis thaliana)
Amino acid sequence: SLEHHHHHHENLYFQGVGTRWAVLVAGSSGYGNYRHQADVCHAYQI LRKGGLKEENIVVLMYDDIANHPLNPRPGTLINHPDGDDVYAGVPKDY TGSSVTAANFYAVLLGDQKAVKGGSGKVIASKPNDHIFVYYAXHGGPG VLGMPNTPHIYAADFIETLKKKHASGTYKEMVIYVEAAESGSIFEGIMPK DLNIYVTTASNAQESSYGTYCPGMNPSPPSEYITCLGDLYSVAWMEDS ETHNLKKETIKQQYHTVKMRTSNYNTYSGGSHVMEYGNNSIKSEKLYL YQGFDPATVNLPLNELPVKSKIGVVNQRDADLLFLWHMYRTSEDGSR KKDDTLKELTETTRHRKHLDASVELIATILFGPTMNVLNLVREPGLPLVD DWECLKSMVRVFEEHCGSLTQYGMKHMRAFANVCNNGVSKELMEEA STAACGGYSEARYTVHPSILGYSA

SEQ ID NO: 19
Description: VuPALI (Viola uliginosa)
Amino acid sequence: MKLLAAGVILVSLLALSGTVAGGLDVDPLRLPSEAAKFFHNDNSTNDD DSIGTRWAVLIAGSKDYHNYRHQADVCHMYQILRKGGVKDENIIVFMY DDIAYNESNPHPGIIINKPGGEDVYKGVPKDYTGEDVNNINFLAAILGNK SAIIGGSGKVLDTSPNDHIFIYYTDHGAPGKIGMPSKPYLYADDLVDTLK QKAATGTYKSMVFYVEACNAGSMFEGLLPEGTNIYAMAASNSTEGSW ITYCPGATPDFPPEYDICLGDLWSITFLEDCDAHNLRTETVHQQFELVK KNIAYASTVSQYGDIPISKDSLSVYMGTDPANDNRTFVDENSLKPPLKV IHQRDADLYHLWYKYNKAPEGSSKKIEAQKQLLELMSHRAHVDNSITLI GKLLFGVDKASKVLNTVRPVGQPLVDDWQCLKAMIRTFETHCGSLSE YGMKHTLSFANMCNAGIQKEQLAEAAAQACVTFPSNSYSSLAEGFSA

SEQ ID NO: 20
Description: HaPALI (Helianthus annuus)
Amino acid sequence: MACFSYRLICLLLVLMMVMALPNGAAAARRGSDYWDPFIRSPVDLEDD ELGNGTRWALLVAGSKGYQSYRHQANVCHAYQILKRGGLKDENIWF MYDDIATCDENPRPGTIIHHPEGGDVYAGVPKDYTGDAVTADNFFAVIL GDKSSVKGGSGKVIDSKPDDRIFLYYTDHGAAGLLGMPEKPYWANDF VEVLKKKHAMGTYKEMVIYLEACESGSIFEGLLPEDLNIYAITSTKPEEP SYIIYCPDMNPPPPPEYTTCLGDTFSVAWMEDSETHNLKKESLAQQIN KVKERTSMFGTYANGSHVMEYGTKVIKPEKVYLYQGYNPETANLPAN RIHFDKKMESVNQRDGDLIYLWQKYKRSSVSNRAEALKQMTETLRYM AHLDSSVDMIGVLLFGPQNGGSILRSSRGRGLPLVDDWDCLKSMTRLF EKHCGLLTEYGMKHMRAFANICNNLVEETEVEEAIIATCSGKNIGPYAS LGAYSV

SEQ ID NO: 21
Description: OaAEP1b (Oldenlandia affinis)
Amino acid sequence: MGMAHHHHHHMQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPD QQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGGARDGDYLHLPS EVSRFFRPQETNDDHGEDSVGTRWAVLIAGSKGYANYRHQAGVCHA YQILKRGGLKDENIVVFMYDDIAYNESNPRPGVIINSPHGSDVYAGVPK DYTGEEVNAKNFLAAILGNKSAITGGSGKVVDSGPNDHIFIYYTDHGAA GVIGMPSKPYLYADELNDALKKKHASGTYKSLVFYLEACESGSMFEGIL PEDLNIYALTSTNTTESSWCYYCPAQENPPPPEYNVCLGDLFSVAWLE DSDVQNSWYETLNQQYHHVDKRISHASHATQYGNLKLGEEGLFVYM GSNPANDNYTSLDGNALTPSSIVVNQRDADLLHLWEKFRKAPEGSAR KEEAQTQIFKAMSHRVHIDSSIKLIGKLLFGIEKCTEILNAVRPAGQPLVD DWACLRSLVGTFETHCGSLSEYGMRHTRTIANICNAGISEEQMAEAAS QACASIP

[0256]

[0196] It will be appreciated that recognition motifs at the C- and N-termini for the above well-characterized peptide ligases are well established in the art and readily available to the skilled person. For example, butelase-1 and butelase-2 recognise a C-terminal Asn / Asp-His-Val (NHV) or related motif and ligate to an N-terminal residue bearing a small side chain such as Gly, Ala, Ser or Thr. VyPAL2 and VyPAL3 display similar substrate preferences, acting on C-terminal NHV or DHV motifs. OaAEP1b recognises C-terminal NGL or NXL motifs, while HeAEP3 and AtLEGy act on analogous Asn- or Asp-terminated motifs. Other ligases such as VuPALI and HaPAL accept related C-terminal Asn-containing tripeptides (e.g., NGL, NDV, NHV) and ligate to N-terminal nucleophiles beginning with small, uncharged residues (typically Gly, Ala, Ser, or Thr). These C- and N-terminal recognition motifs, along with corresponding substrate specificities and kinetic parameters, are comprehensively documented in the peptide-ligase literature and provide ready-to-use templates for engineering or screening of ligation reactions by those skilled in the field.

[0257]

[0197] In various embodiments, the method for preparing a dimer, oligomer, or multimer of two different (poly)peptides comprises,

[0258] providing a first peptide ligase having the activity of MmCNT,

[0259] providing a second peptide ligase, wherein the second peptide ligase is different to the first peptide ligase,

[0260] providing at least one first (poly)peptide having a C- or N-terminal MmCNT recognition motif, and a C- or N-terminal recognition motif for the second peptide ligase,

[0261] providing at least one second (poly)peptide having a C- or N-terminal MmCNT recognition motif, and a C- or N-terminal recognition motif for the second peptide ligase,

[0262] wherein the at least one first and second (poly)peptide each comprise one recognition motif for each of the first and second peptide ligase, and the first (poly)peptide has different C- and N-terminal recognition motifs to the second (poly)peptide,

[0263] contacting the first (poly)peptide and the second (poly)peptide with the first and second peptide ligase, under conditions suitable for a cleavage and ligation reaction, to form a dimer, oligomer, or multimer of two different (poly)peptides.

[0264]

[0198] In various embodiments, the method further comprises immobilising a scaffold molecule onto a solid support material, thereby providing a foundation for enzymatic assembly of the two (poly)peptides into a polyprotein.

[0265]

[0199] In various embodiments, the scaffold molecule comprises one or more copies of the N-terminal P1'-P2'-P3' motif, or one or more copies of the C-terminal K-D-X-G-A motif. Alternatively, the scaffold molecule may comprise one or more copies of the N-terminal recognition motif for the second peptide ligase, or one or more copies of the C-terminal recognition motif for the second peptide ligase.

[0266]

[0200] In various embodiments, the method comprises preparing a mixture of the first and second (poly)peptides, and the first and second peptide ligases; and subjecting the mixture to conditions that allow the peptide ligases to concurrently catalyse the ligation of the first and second (poly)peptides to form the polyprotein. This enables concurrent or overlapping ligation events under compatible reaction conditions. When incubated together under mild aqueous buffer conditions permissive for both enzymes, the two ligases catalyse orthogonal ligation reactions, each proceeding selectively at its own recognition motifs without substantial cross-reactivity. This results in the simultaneous formation of multiple covalent linkages and the assembly of hybrid polyproteins comprising alternating or interspersed ligation junctions derived from each peptide ligase. The use of orthogonal enzymes in a single reaction permits one-pot synthesis of complex, multi-domain polyproteins with reduced processing time and simplified workflow, while maintaining positional control via selective motif design.

[0267]

[0201] In various embodiments, the ligation reactions may be performed in a stepwise or serial manner, allowing controlled and programmable elongation of the polyprotein chain. In one implementation, a first ligation step may be carried out using the first peptide ligase, catalysing bond formation between a first pair of complementary recognition motifs. Non-ligated peptides may be removed after this step and the ligation product isolated. Following completion of this step, the reaction mixture may be subjected to washing, buffer exchange, or enzyme deactivation, after which a second ligation step may be performed using the second peptide ligase and corresponding recognition motifs. These alternating steps may be repeated multiple times to achieve a polyprotein of predefined length or composition, with each round of ligation extending the chain by one or more (poly)peptide units. The stepwise process affords precise temporal control over each ligation event, minimises undesired side reactions, and enables analytical confirmation of intermediate products between steps. After each ligation step, the non-ligated peptides may be removed and the ligation product isolated to avoid side reactions.

[0268]

[0202] In various embodiments, the method may comprise,

[0269] immobilising a scaffold molecule onto a solid support material, wherein the scaffold molecule comprises one or more copies of the N-terminal MmCNT recognition motif,

[0270] contacting a first (poly)peptide comprising a C-terminal MmCNT recognition motif and an N-terminal recognition motif for the second peptide ligase, with the first peptide ligase, under conditions suitable for a cleavage and ligation reaction to ligate the first (poly)peptide to the scaffold molecule to form an immobilised first (poly)peptide,

[0271] contacting a second (poly)peptide comprising a C-terminal recognition motif for the second peptide ligase and an N-terminal MmCNT recognition motif, with the second peptide ligase, under conditions suitable for a cleavage and ligation reaction to ligate the immobilised first (poly)peptide to the second (poly)peptide to form a dimer polyprotein.

[0272]

[0203] To further form oligomers or multimers of a desired number of (poly)peptides, after forming the dimer, the method may further comprise,

[0273] contacting the first (poly)peptide comprising a C-terminal MmCNT recognition motif and an N-terminal recognition motif for the second peptide ligase, with the first peptide ligase, and the dimer polyprotein, under conditions suitable for a cleavage and ligation reaction to ligate the first (poly)peptide to the dimer polyprotein to form a trimer polyprotein, and

[0274] optionally, contacting the second (poly)peptide comprising a C-terminal recognition motif for the second peptide ligase and an N-terminal MmCNT recognition motif, with the second peptide ligase and the trimer polyprotein, under conditions suitable for a cleavage and ligation reaction to ligate the second (poly)peptide to the trimer polyprotein to form a tetramer polyprotein, and

[0275] optionally, repeating these contacting steps to extend the polyprotein to form oligomers or multimers of a desired length.

[0204] As will be appreciated, the optional subsequent repeating ligation steps may be carried out with alternating use of the first and second peptide ligases such that successive additions of the first and second (poly)peptides are catalysed in turn under conditions selective for each ligase, thereby extending the chain in a controlled manner to yield a pentamer, or other oligomer and multimer (greater than 10 (poly)peptides ligated together) polyproteins.
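The alternating order of the stepwise method can be illustrated with a minimal Python sketch. It tracks only which ligase's recognition motif is exposed at the free N-terminal end of the immobilised chain; the names Unit, MMCNT, SECOND and assemble are hypothetical labels introduced for illustration, and no sequence-level chemistry is modelled.

```python
from dataclasses import dataclass

# Abstract motif labels (illustrative tags, not amino acid sequences).
MMCNT = "MmCNT"
SECOND = "second ligase"


@dataclass(frozen=True)
class Unit:
    name: str
    n_motif: str  # ligase whose recognition motif sits at the N-terminus
    c_motif: str  # ligase whose recognition motif sits at the C-terminus


def assemble(n_units: int) -> list[str]:
    """Return a ligation log for a chain of n_units grown from the scaffold."""
    first = Unit("first", n_motif=SECOND, c_motif=MMCNT)
    second = Unit("second", n_motif=MMCNT, c_motif=SECOND)
    # The immobilised scaffold exposes an N-terminal MmCNT recognition motif.
    free_n_motif = MMCNT
    log = []
    for step in range(n_units):
        unit = first if step % 2 == 0 else second
        # The incoming unit's C-terminal motif must match the free chain end.
        if unit.c_motif != free_n_motif:
            raise ValueError("incoming unit does not match the free chain end")
        log.append(f"step {step + 1}: {free_n_motif} ligates {unit.name} (poly)peptide")
        free_n_motif = unit.n_motif  # new free end for the next round
    return log


for line in assemble(4):
    print(line)
```

Because each added unit exposes the motif of the other ligase, the ligase required at each step alternates automatically, which is the positional control the stepwise method relies on.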

[0276]

[0205] The first and second (poly)peptide may be selected from any one of those listed as the one or more (poly)peptides of interest above.

[0277]

[0206] In various embodiments, the two peptide ligases are orthogonal such that each ligase catalyses bond formation only at its cognate recognition motifs without substantially perturbing or re-editing bonds formed by the other ligase.

[0278]

[0207] In various embodiments, the second peptide ligase may be OaAEP1 (C247A) or a functional variant thereof. The OaAEP1 (C247A) comprises or consists of the amino acid sequence set forth in SEQ ID NO: 7 or 8, or a functional variant thereof. OaAEP1 (C247A) has the PDB accession code 5H0I, and further information can be found in Yang R, et al. (2017) J Am Chem Soc 139(15):5351–5358.

[0279]

[0208] Table 3: Amino acid sequences of OaAEP1 (C247A). The bold letter indicates the C247A mutation.

[0280] SEQ ID NO | Description | Amino acid sequence

[0282] 7 | OaAEP1b-C247A | MVRYLAGAVLLLVVLSVAAAVSGARDGDYLHLPSEVSRFFRPQ ETNDDHGEDSVGTRWAVLIAGSKGYANYRHQAGVCHAYQILKR GGLKDENIVVFMYDDIAYNESNPRPGVIINSPHGSDVYAGVPKD YTGEEVNAKNFLAAILGNKSAITGGSGKVVDSGPNDHIFIYYTDH GAAGVIGMPSKPYLYADELNDALKKKHASGTYKSLVFYLEACES GSMFEGILPEDLNIYALTSTNTTESSWAYYCPAQENPPPPEYNV CLGDLFSVAWLEDSDVQNSWYETLNQQYHHVDKRISHASHAT QYGNLKLGEEGLFVYMGSNPANDNYTSLDGNALTPSSIVVNQR DADLLHLWEKFRKAPEGSARKEEAQTQIFKAMSHRVHIDSSIKLI GKLLFGIEKCTEILNAVRPAGQPLVDDWACLRSLVGTFETHCGS LSEYGMRHTRTIANICNAGISEEQMAEAASQACASIP

[0283] 8 | Core domain + Linker + Cap domain of OaAEP1b-C247A | GTRWAVLIAGSKGYANYRHQAGVCHAYQILKRGGLKDENIVVF MYDDIAYNESNPRPGVIINSPHGSDVYAGVPKDYTGEEVNAKNF LAAILGNKSAITGGSGKVVDSGPNDHIFIYYTDHGAAGVIGMPSK PYLYADELNDALKKKHASGTYKSLVFYLEACESGSMFEGILPED LNIYALTSTNTTESSWAYYCPAQENPPPPEYNVCLGDLFSVAWL EDSDVQNSWYETLNQQYHHVDKRISHASHATQYGNLKLGEEGL FVYMGSNPANDNYTSLDGNALTPSSIVVNQRDADLLHLWEKFR KAPEGSARKEEAQTQIFKAMSHRVHIDSSIKLIGKLLFGIEKCTEIL NAVRPAGQPLVDDWACLRSLVGTFETHCGSLSEYGMRHTRTIA NICNAGISEEQMAEAASQACASIP

[0286]

[0209] Accordingly, the "variants" disclosed herein share a % sequence identity or % sequence homology with the reference amino acid sequence set forth in SEQ ID NO: 7 or 8. In particular, the variants may comprise an amino acid sequence that shares at least 65%, preferably at least 75%, even more preferably at least 85%, most preferably at least 95% sequence identity with, or at least 80%, preferably at least 90%, more preferably at least 95% sequence homology with the amino acid sequence as set forth in SEQ ID NO: 7 or 8; or a functional fragment thereof.
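By way of illustration only, a percentage sequence identity of the kind referred to above may be computed from two pre-aligned sequences as in the following Python sketch. The function name percent_identity and the choice of denominator (all aligned columns, excluding columns where both sequences carry a gap) are assumptions; other conventions exist and the specification does not prescribe one.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns; '-' denotes a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Drop columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Eight-residue window around the C247A site, taken from the sequences above.
print(percent_identity("SSWAYYCP", "SSWCYYCP"))  # 87.5
```

The example window differs at a single position, so seven of eight columns match.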

[0287]

[0210] The peptide ligase OaAEP1(C247A) has the ability to site-specifically break a peptide bond and then reform a new bond with an incoming nucleophile. It is Asx-specific in that the C-terminal amino acid to which ligation occurs, i.e. the C-terminal end of the peptide that is ligated, is either N (Asn) or D (Asp), preferably N (Asn). OaAEP1(C247A) recognises the motif N / D-G-L at the C-terminus of a peptide, and mediates peptide ligation by cleaving off the sorting signal GL and ligating the N / D to the N-terminal residue of a second peptide beginning with GL or GG to form a ligated peptide (i.e. first peptide-N / D-G-L-second peptide), thereby joining the two peptides in a head-to-tail orientation.
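The cleavage and ligation rule described above can be sketched at the level of one-letter sequences as follows. This is a simplified string model under stated assumptions: the function name oaaep1_ligate and the example peptides are hypothetical, and reaction kinetics, reversibility and side products are not modelled.

```python
def oaaep1_ligate(first: str, second: str) -> str:
    """Join two one-letter sequences following the N/D-G-L rule."""
    if len(first) < 3 or first[-3] not in "ND" or first[-2:] != "GL":
        raise ValueError("first peptide lacks a C-terminal N/D-G-L motif")
    if second[:2] not in ("GL", "GG"):
        raise ValueError("second peptide lacks an N-terminal GL or GG")
    # The GL sorting signal is cleaved off; the Asn/Asp is then joined
    # to the N-terminal residue of the second peptide.
    return first[:-2] + second


# Hypothetical peptides, for illustration only.
print(oaaep1_ligate("AAKNGL", "GLTTT"))  # AAKNGLTTT
```

With an incoming GL nucleophile the product reads first peptide-N-G-L-second peptide, consistent with the head-to-tail junction described above.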

[0288]

[0211] Accordingly, in various embodiments, the first (poly)peptide has a C- or N-terminal recognition motif for OaAEP1 (C247A), and the second (poly)peptide has a C- or N-terminal recognition motif for OaAEP1 (C247A).

[0289]

[0212] In various embodiments, the method may comprise, immobilising a first (poly)peptide onto a functionalised substrate by the first (poly)peptide ligase (MmCNT), ligating a second (poly)peptide to the first (poly)peptide by a second peptide ligase (OaAEP1 (C247A)), where the ligation steps are repeated alternately using the first and second peptide ligases (i.e. MmCNT and OaAEP1) to extend the chain in a controlled manner, yielding polyproteins.

[0290]

[0213] In various embodiments, the method for preparing a dimer, oligomer, or multimer of two different (poly)peptides comprises,

[0291] immobilising a scaffold molecule onto a solid support material, wherein the scaffold molecule comprises one or more copies of the N-terminal P1'-P2'-P3' motif, to form an immobilised scaffold molecule,

[0292] providing at least one first (poly)peptide having a C-terminal K-D-X-G-A motif, and an N-terminal GL motif,

[0293] providing at least one second (poly)peptide having an N-terminal P1'-P2'-P3' motif, and a C-terminal NGL motif,

[0294] wherein X is any amino acid, P1' is A, V, G, S, T, L, or M, P2' is G or an analogue of G, and P3' is A or an analogue of A,

[0295] providing a first peptide ligase having the activity of MmCNT,

[0296] providing a second peptide ligase that is OaAEP1 (C247A) or a functional variant thereof, and contacting the at least one first (poly)peptide and the at least one second (poly)peptide with the immobilised scaffold molecule, and the first and second peptide ligase, under conditions suitable for a cleavage and ligation reaction, to form a dimer, oligomer, or multimer of the two different (poly)peptides.
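The MmCNT recognition motifs recited in this method can be screened for in candidate one-letter sequences with simple patterns, as in the following minimal sketch. It assumes that the N-terminal motif begins at the first residue and the C-terminal motif ends at the last residue, and it covers only the canonical residues: analogues of G and A are not representable in a one-letter pattern. The helper names are hypothetical.

```python
import re

# P1'-P2'-P3' with P1' in {A, V, G, S, T, L, M}, P2' = G and P3' = A.
N_TERM_MOTIF = re.compile(r"^[AVGSTLM]GA")
# K-D-X-G-A, where X is any amino acid (one-letter code).
C_TERM_MOTIF = re.compile(r"KD[A-Z]GA$")


def has_n_terminal_motif(seq: str) -> bool:
    """True if seq starts with a canonical P1'-P2'-P3' motif."""
    return bool(N_TERM_MOTIF.search(seq))


def has_c_terminal_motif(seq: str) -> bool:
    """True if seq ends with a K-D-X-G-A motif."""
    return bool(C_TERM_MOTIF.search(seq))


# Hypothetical sequences, for illustration only.
print(has_n_terminal_motif("AGAKK"), has_c_terminal_motif("MMKDPGA"))
```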

[0297] ❖ Peptide Ligase having the activity of MmCNT

[0214] In another aspect, the present invention may be directed to the peptide ligase having the activity of MmCNT, comprising or consisting of an amino acid sequence set forth in SEQ ID NO: 2 or 3, or variants, or fragments thereof.

[0298]

[0215] In various embodiments, the peptide ligase having the activity of MmCNT in accordance with the present application comprises or consists of:

[0299] (i) the amino acid sequence set forth in SEQ ID NO: 2;

[0300] (ii) an amino acid sequence that shares at least 65%, preferably at least 75%, even more preferably at least 85%, most preferably at least 95% sequence identity with, or at least 80%, preferably at least 90%, more preferably at least 95% sequence homology with the amino acid sequence as set forth in SEQ ID NO: 2; or

[0301] (iii) a functional fragment of (i) or (ii),

[0302] wherein the peptide ligase comprises an amino acid residue A at the position corresponding to position 1 of SEQ ID NO:2.

[0303]

[0216] In various embodiments, the peptide ligase having the activity of MmCNT in accordance with the present application comprises or consists of:

[0304] (i) the amino acid sequence set forth in SEQ ID NO: 3;

[0305] (ii) an amino acid sequence that shares at least 65%, preferably at least 75%, even more preferably at least 85%, most preferably at least 95% sequence identity with, or at least 80%, preferably at least 90%, more preferably at least 95% sequence homology with the amino acid sequence as set forth in SEQ ID NO: 3; or

[0306] (iii) a functional fragment of (i) or (ii),

[0307] wherein the peptide ligase comprises an amino acid residue S at the position corresponding to position 192 of SEQ ID NO:3.

[0308]

[0217] In various embodiments, the peptide ligase comprises or consists of the amino acid sequence as set forth in SEQ ID NO:2. In various embodiments, the peptide ligase comprises or consists of the amino acid sequence as set forth in SEQ ID NO:3.

[0309]

[0218] In various embodiments, the peptide ligase comprises the amino acid sequence set forth in SEQ ID NO: 4 (KKRRLYASAGNFAIAE) at the positions corresponding to residues 88-103 of SEQ ID NO:2 or 3.

[0310]

[0219] In various embodiments, the peptide ligase comprises the amino acid sequence set forth in SEQ ID NO: 5 (NEFTKQVANKCFKDNW) at the positions corresponding to residues 125-140 of SEQ ID NO:2 or 3.

[0311]

[0220] In various embodiments, the peptide ligase comprises the amino acid sequence set forth in SEQ ID NO: 4 and 5 at the positions corresponding to residues 88-103 and 125-140 of SEQ ID NO:2 or 3.

[0221] In various embodiments, the peptide ligase comprises the amino acid sequence set forth in SEQ ID NO: 6 (KKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGNEFTKQVANKCFKDNW) at the positions corresponding to residues 88-140 of SEQ ID NO:2 or 3.

[0312]

[0222] In various embodiments, the peptide ligase may comprise one or more mutations at a position selected from position 1, 38, 79, 81, 83, 125, 192 and combinations thereof, wherein position numbering is relative to the amino acid sequence set forth in SEQ ID NO: 2 or 3.

[0313]

[0223] In various embodiments, the peptide ligase comprises an amino acid residue A at the position corresponding to position 1 of SEQ ID NO:2 or 3. In various embodiments, the peptide ligase comprises an amino acid residue A at the position corresponding to position 81 of SEQ ID NO:2 or 3. In various embodiments, the peptide ligase comprises an amino acid residue S at the position corresponding to position 125 of SEQ ID NO:2 or 3. In various embodiments, the peptide ligase comprises an amino acid residue S at the position corresponding to position 192 of SEQ ID NO:2 or 3.

[0314]

[0224] In various embodiments, the peptide ligase comprises an amino acid residue S, A, H or D at the position corresponding to position 38 of SEQ ID NO:2 or 3, an amino acid residue A at the position corresponding to position 79 of SEQ ID NO:2 or 3, an amino acid residue A, G or D at the position corresponding to position 81 of SEQ ID NO:2 or 3, an amino acid residue A at the position corresponding to position 83 of SEQ ID NO:2 or 3, and / or an amino acid residue S, A or G at the position corresponding to position 125 of SEQ ID NO:2 or 3.
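The residues listed above for positions 38, 79, 81, 83 and 125 can be captured in a small lookup, sketched below in Python. Substitutions are written in the common original-position-replacement notation (for example S81A); the table name ALLOWED and the function is_listed_substitution are hypothetical, and the wild-type letter is not verified because the full reference sequence is not reproduced in this section.

```python
import re

# Position -> residues listed for that position (numbering per SEQ ID NO: 2 or 3).
ALLOWED = {38: "SAHD", 79: "A", 81: "AGD", 83: "A", 125: "SAG"}


def is_listed_substitution(mutation: str) -> bool:
    """True if the replacement residue is among those listed for the position."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation format: {mutation!r}")
    position, new_residue = int(match.group(2)), match.group(3)
    return new_residue in ALLOWED.get(position, "")


print(is_listed_substitution("S81A"), is_listed_substitution("N125W"))
```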

[0315]

[0225] In various embodiments, the peptide ligase comprises an amino acid residue A at the position corresponding to position 1, an amino acid residue S at the position corresponding to position 192, and an amino acid mutation at position 81 and / or 125, preferably the mutation is S81A and / or N125S.

[0316]

[0226] In various embodiments, the peptide ligase may comprise one or more or all of the following residues at the designated positions, wherein position numbering is in accordance with SEQ ID NO:2 or 3. These residues may be considered as invariable:

[0317] an amino acid residue F at the position corresponding to position 99 of SEQ ID NO:2 or 3;

[0318] an amino acid residue I at the position corresponding to position 101 of SEQ ID NO:2 or 3;

[0319] an amino acid residue E at the position corresponding to position 103 of SEQ ID NO:2 or 3;

[0320] an amino acid residue N at the position corresponding to position 119 of SEQ ID NO:2 or 3, an amino acid residue I at the position corresponding to position 121 of SEQ ID NO:2 or 3;

[0321] an amino acid residue F at the position corresponding to position 123 of SEQ ID NO:2 or 3;

[0322] an amino acid residue N at the position corresponding to position 133 of SEQ ID NO:2 or 3; an amino acid residue K at the position corresponding to position 137 of SEQ ID NO:2 or 3; and an amino acid residue W at the position corresponding to position 140 of SEQ ID NO:2 or 3.
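As a consistency check, the invariable residues listed above can be compared against the SEQ ID NO: 6 segment, which spans residues 88-140. The following sketch assumes only that numbering offset; the names SEGMENT_START, INVARIANT, residue_at and mismatches are introduced for illustration.

```python
# SEQ ID NO: 6, corresponding to residues 88-140 of SEQ ID NO: 2 or 3.
SEQ_ID_NO_6 = "KKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGNEFTKQVANKCFKDNW"
SEGMENT_START = 88

# Residues described as invariable, keyed by position.
INVARIANT = {99: "F", 101: "I", 103: "E", 119: "N", 121: "I",
             123: "F", 133: "N", 137: "K", 140: "W"}


def residue_at(position: int) -> str:
    """Residue of the segment at a position numbered per SEQ ID NO: 2 or 3."""
    return SEQ_ID_NO_6[position - SEGMENT_START]


def mismatches() -> dict[int, str]:
    """Positions whose residue in the segment differs from the listed one."""
    return {pos: residue_at(pos) for pos, aa in INVARIANT.items()
            if residue_at(pos) != aa}


print(mismatches())  # {}
```

The empty result indicates that every listed residue agrees with the segment as written, and residue_at(125) returns N, consistent with the N125S substitution mentioned elsewhere in the description.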

[0227] In various embodiments, the present invention thus also relates to functional fragments of the peptide ligases described herein, with said fragments retaining enzymatic activity. It is preferred that they have at least 50%, more preferably at least 70%, most preferably at least 90% of the protein ligase activity of the initial molecule, preferably of the peptide ligase having the amino acid sequence of SEQ ID NO: 2 or 3. The functional fragments are preferably at least 150 amino acids in length, more preferably at least 180 or 190.

[0323]

[0228] In various embodiments, the peptide ligase may be truncated at the C-terminus, such that the residue at position 192 of SEQ ID NO: 2 or 3 represents the C-terminus of the MmCNT.

[0324]

[0229] In various embodiments, the peptide ligase may comprise an affinity tag and optionally a cleavage sequence positioned at the N- or C-terminus of the amino acid sequence as set forth in (i)-(iii) above. In various embodiments, the cleavage sequence is positioned between the affinity tag and the amino acid sequence as set forth in (i)-(iii). In various embodiments, the peptide ligase may comprise a His-tag, and optionally a TEV cleavage site.

[0325]

[0230] The structures of the MmCNT peptide ligase in both its apo and substrate-bound forms are also described herein.

[0326]

[0231] The apo form of MmCNT adopts a dimeric conformation stabilised by disulfide bonds C64-C192, exhibiting a crocodile-like fold with a substrate-binding groove. Accordingly, in various embodiments, the peptide ligase has a three-dimensional structure corresponding to the atomic coordinates deposited under Protein Data Bank (PDB) accession number 8JTU. The amino acid sequence of the apo form of MmCNT (PDB: 8JTU) is that of SEQ ID NO:2, and the X-ray Crystallography data of MmCNT (PDB: 8JTU) is set forth in Table 6 below.

[0327]

[0232] In various embodiments, the peptide ligase has an amino acid sequence that adopts an overall tertiary structure analogous to the enzymatic scaffold defined by Protein Data Bank PDB: 8JTU, chain A, residues 1-192, wherein the root mean square deviation (RMSD) of backbone atoms following structural alignment is within 1.5 Å. In this regard, upon structural alignment with PDB: 8JTU, chain A, residues 1-192, the peptide ligase may exhibit a root-mean-square deviation (RMSD) of not more than 1.5 Å across backbone atoms, thereby representing an analogous enzymatic scaffold.

[0328]

[0233] The substrate-bound form of MmCNT captures a catalytic intermediate with the substrate peptide. Accordingly, in various embodiments, the peptide ligase has a three-dimensional structure corresponding to the atomic coordinates deposited under Protein Data Bank (PDB) accession number 8WKD. The amino acid sequence of the substrate-bound form of MmCNT (PDB: 8WKD) is that of SEQ ID NO:3, and the X-ray Crystallography data of MmCNT (PDB: 8WKD) is set forth in Table 6 below.

[0234] In various embodiments, the peptide ligase has an amino acid sequence that adopts an overall tertiary structure analogous to the enzymatic scaffold defined by Protein Data Bank PDB: 8WKD, chain A, residues 1-192, wherein the root mean square deviation (RMSD) of backbone atoms following structural alignment is within 1.5 Å. In this regard, upon structural alignment with PDB: 8WKD, chain A, residues 1-192, the peptide ligase may exhibit a root-mean-square deviation (RMSD) of not more than 1.5 Å across backbone atoms, thereby representing an analogous enzymatic scaffold.

[0329]

[0235] The term "atomic coordinates" refers to the Cartesian coordinates corresponding to an atom's spatial relationship to other atoms in a molecule or molecular complex. Various software programs allow for the graphical representation of a set of structural coordinates to obtain a three-dimensional representation of a molecule or molecular complex. The atomic coordinates of the present disclosure may be modified from the original set by mathematical manipulation, such as by inversion or integer additions or subtractions. As such, it is recognised that the structural coordinates of the present disclosure are relative, and are in no way specifically limited by the actual x, y, z coordinates.
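The statement that structural coordinates are relative can be illustrated numerically: interatomic distances are unchanged when every coordinate is inverted through the origin or shifted by the same integer offset. The sketch below uses three hypothetical points rather than deposited coordinates, and the helper names are introduced for illustration.

```python
import math


def pairwise_distances(coords):
    """All interatomic distances, in a fixed pair order."""
    return [math.dist(p, q)
            for i, p in enumerate(coords) for q in coords[i + 1:]]


def translate(coords, shift):
    """Add the same offset to every coordinate triple."""
    return [tuple(c + s for c, s in zip(p, shift)) for p in coords]


def invert(coords):
    """Invert every coordinate through the origin."""
    return [tuple(-c for c in p) for p in coords]


# Hypothetical coordinates, for illustration only.
atoms = [(0, 0, 0), (3, 4, 0), (1, 1, 1)]
reference = pairwise_distances(atoms)
for moved in (translate(atoms, (10, -5, 2)), invert(atoms)):
    assert all(math.isclose(d, r)
               for d, r in zip(pairwise_distances(moved), reference))
```

Inversion preserves distances but reverses handedness, so it is a manipulation of the coordinate set and not a physically equivalent model of a chiral molecule.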

[0330]

[0236] The term "atomic structure" refers to a three dimensional representation of the atoms in a molecule or molecular complex. An atomic structure may be derived from atomic coordinates as described above. An atomic structure may also be derived from computational manipulation of a received or previously obtained set of atomic coordinates. Such computational manipulation may be performed to produce an alternative or new atomic structure of a previously derived atomic structure based on new information. An alternative or new atomic structure of an initially modelled molecule may represent an alternate conformation of the modelled molecule or the conformation of a second molecule that is closely related to the initially modelled molecule. The new information used to inform manipulation may be obtained from practical data, for example electron density maps derived from electron microscopy. New information that may inform computational manipulation may also be obtained from computational data, for example the results of computationally docking two atomic structures or molecular models. A computationally manipulated atomic structure may be utilised to produce a new set of atomic coordinates representing the newly derived three dimensional molecule or molecular complex.

[0331]

[0237] The term "root mean square deviation" refers to the square root of the arithmetic mean of the squares of the deviations from the mean, and is a way of expressing deviation or variation from the structural coordinates described herein. The present disclosure includes all embodiments comprising conservative substitutions of the noted amino acid residues resulting in the same structural coordinates within the stated root mean square deviation. It will be apparent to the skilled practitioner that the numbering of the amino acid residues of MmCNT may be different from that set forth herein, and may contain certain conservative amino acid substitutions that yield the same three dimensional structures as those defined by Table 6. Corresponding amino acids and conservative substitutions in other isoforms or analogues are easily identified by visual inspection of the relevant amino acid sequences or by using commercially available homology software programs.
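For two sets of matched backbone atom coordinates that have already been superposed, the root mean square deviation may be computed as in the following sketch. The superposition step itself is assumed to have been performed elsewhere; the function names and the example coordinates are hypothetical, and the 1.5 Å threshold is the value recited in the description.

```python
import math

RMSD_THRESHOLD = 1.5  # angstroms, as recited above


def rmsd(coords_a, coords_b) -> float:
    """RMSD between matched, pre-superposed coordinate lists (in angstroms)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = [sum((x - y) ** 2 for x, y in zip(p, q))
               for p, q in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))


def is_analogous_scaffold(coords_a, coords_b) -> bool:
    """True if the RMSD does not exceed the recited threshold."""
    return rmsd(coords_a, coords_b) <= RMSD_THRESHOLD


# Hypothetical coordinates: every atom is displaced by 1 angstrom along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model, reference))  # 1.0
```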

[0238] It is also contemplated that there is provided a fusion protein comprising the peptide ligase operably fused to one of the ligation substrates comprising the N- or C-terminal MmCNT recognition motif. The fusion may be effected at either the N- or C-terminus of the peptide ligase, optionally through a linker sequence that maintains the catalytic activity of the enzyme and the accessibility of the C-terminal recognition motif of the first (poly)peptide. Suitable linker sequences include flexible glycine- or serine-rich linkers, or other amino acid spacers of 5-30 residues that reduce steric interference between the enzyme and substrate domains.

[0332]

[0239] In various embodiments of the fusion protein, the peptide ligase is operably fused to a (poly)peptide of interest comprising an N- or C-terminal MmCNT recognition motif. The fusion protein may comprise the peptide ligase covalently linked, either directly or through a flexible linker, to the (poly)peptide of interest. The fusion is configured such that the peptide ligase retains its catalytic activity and the N- or C-terminal MmCNT recognition motif remains sterically accessible for substrate recognition and ligation. In this regard, the (poly)peptide may function either as a ligase substrate or as a targeting protein depending on the application. In various embodiments, the (poly)peptide serves as a C-terminal substrate for the peptide ligase, facilitating proximity-enhanced or self-templated ligation. In other embodiments, the (poly)peptide may function as a targeting protein, such as an antibody or antibody fragment, receptor-binding domain, cell-penetrating peptide (CPP), or other localization sequence, thereby enabling site-specific or targeted ligation of a complementary N-terminal substrate at a defined molecular or cellular site.

[0333]

[0240] In another aspect, the invention relates to the use of the peptide ligase having the activity of MmCNT for (i) the ligation of two peptides, (ii) modifying or tagging a cell surface, (iii) intracellular ligation of two peptides, (iv) tandem ligation, and / or (v) (poly)peptide polymerization, as described herein.

[0334]

[0241] In another aspect, the invention relates to the use of the peptide ligase having the activity of MmCNT in combination with a different peptide ligase, preferably OaAEP1 (C247A), for (i) the ligation of two peptides, (ii) modifying or tagging a cell surface, (iii) intracellular ligation of two peptides, (iv) tandem ligation, and / or (v) (poly)peptide polymerization, as described herein.

[0335]

[0242] Nucleic acid molecules encoding the peptide ligase disclosed herein are also provided. All embodiments disclosed above in relation to the polypeptide and fusion protein similarly apply to the nucleic acid molecules and vice versa.

[0336]

[0243] The nucleic acid molecules encoding the MmCNT peptide ligase described herein, as well as a circular DNA molecule containing such a nucleic acid, in particular a plasmid, vector, cosmid, bacterial artificial chromosome (BAC), bacteriophage, viral vector or hybrids thereof also form part of the present invention.

[0244] In various embodiments, the nucleic acid molecule may be comprised in a bacterial plasmid or is a bacterial plasmid. The term “bacterial plasmid" as used herein refers to a circular DNA molecule capable of replication in a bacterial host cell. A bacterial plasmid may contain an appropriate origin of replication, which is a sequence of DNA sufficient to enable the replication of the plasmid in a host bacterial cell. A bacterial plasmid may also contain a selectable marker sequence, which encodes a selectable marker conferring cellular resistance to antibiotics such as ampicillin, kanamycin, chloramphenicol, and tetracycline.

[0337]

[0245] In various embodiments, the nucleic acid molecule, or vector, or plasmid, containing the nucleic acid molecule, further comprises regulatory elements for controlling expression of said nucleic acid molecule.

[0338]

[0246] The term "operably linked" as used herein refers to the relationship between two or more nucleotide sequences that interact physically or functionally. For example, a promoter or regulatory nucleotide sequence is said to be operably linked to a nucleotide sequence that codes for an RNA or a protein if the two sequences are situated such that the regulatory nucleotide sequence will affect the expression level of the coding or structural nucleotide sequence. “Regulatory nucleotide sequences” or “regulatory elements” as used herein refer to nucleotide sequences that influence the timing and level / amount of transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include promoters; translation leader sequences; introns; enhancers; stem-loop structures; repressor binding sequences; termination sequences; and polyadenylation recognition sequences. Particular regulatory sequences may be located upstream and / or downstream of a coding sequence operably linked thereto.

[0339]

[0247] Another aspect of the invention relates to a host cell comprising the peptide ligase or nucleic acid molecule encoding the same disclosed herein. All embodiments disclosed above in relation to the peptide ligase and nucleic acid molecule encoding the same disclosed herein, similarly apply to the host cell, and vice versa.

[0340]

[0248] Host cells disclosed herein may be used in the methods disclosed herein, such as the methods for tagging the surface of a cell, and the method for intracellular (poly)peptide ligation.

[0341]

[0249] The host cells may also be used to manufacture the peptide ligase described herein. Accordingly, a further aspect of the invention is a method of producing / manufacturing a peptide ligase as disclosed herein, comprising culturing a host cell contemplated herein under conditions that allow expression of the peptide ligase; and isolating the peptide ligase from the culture medium or from the host cell. Culture conditions and media can be selected by those skilled in the art based on the host organism used by resorting to general knowledge and techniques known in the art.

[0250] In this regard, the host cell may be readily manipulated in microbiological and biotechnological terms. This refers, for example, to easy culturability, high growth rates, low demands in terms of fermentation media, and good production and secretion rates for the peptide ligase. The peptide ligase can furthermore be modified, after its manufacture, by the cells producing it, for example by the addition of sugar molecules, formylation, amination, etc. Post-translational modifications of this kind can functionally influence the polypeptide.

[0342]

[0251] In various embodiments, the method may further comprise purifying the isolated peptide ligase disclosed herein. “Purification” or “purifying” herein means the process of removing components from a host cell or culture, the presence of which is not desired. Purification is a relative term and does not require that all traces of the undesirable component be removed.

[0343]

[0252] There is also provided a composition comprising the peptide ligase, nucleic acid molecule encoding the peptide ligase, or the host cell comprising the nucleic acid molecule encoding the peptide ligase disclosed herein.

[0344]

[0253] In various embodiments, the composition may comprise one or more additional components or agents that enhance the function, delivery, or stability of the peptide ligase, or the host cell comprising the nucleic acid molecule encoding the peptide ligase, for one or more of the methods and uses described above. Accordingly, the compositions disclosed herein may be formulated and adapted for use in a wide range of applications and methods.

[0345]

[0254] There is also provided a solid support material comprising the peptide ligase having the activity of MmCNT disclosed herein immobilized thereon. In another aspect, the invention relates to the use of the solid support material for the on-column cyclization and / or ligation of at least one substrate peptide, wherein a method for the cyclization or ligation of the at least one substrate peptide may comprise contacting a solution comprising the at least one substrate peptide with the solid support material under conditions that allow cyclization and / or ligation of the at least one substrate peptide.

[0346] ❖ Crystalline Form of the Peptide ligase having the activity of MmCNT

[0347]

[0255] In another aspect, there is also provided a crystalline form of the peptide ligase having the activity of MmCNT comprising or consisting of an amino acid sequence set forth in SEQ ID NO: 2, or variants, or fragments thereof. The crystalline form is characterized by atomic coordinates corresponding substantially to those deposited under Protein Data Bank (PDB) accession number 8JTU, representing the apo-enzyme conformation of MmCNT. In various embodiments, the crystalline form comprises or consists of an amino acid sequence having at least 65%, 70%, 75%, 80%, 85%, 90%, or 95% amino acid sequence identity to the amino acid sequence set forth in SEQ ID NO: 2. The X-ray crystallography data of the MmCNT (PDB: 8JTU) are set forth in Table 6 below.
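The sequence identity thresholds recited above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length (gaps written as `-`); the function names and the convention of counting only columns where both sequences carry a residue are assumptions for illustration and are not definitions taken from this disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where both sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are skipped (assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 65.0) -> bool:
    """True if the pair reaches the chosen identity threshold (65% is the lowest recited)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Other identity conventions (for example, dividing by the length of the shorter sequence) give different values, so the convention should be fixed before comparing a variant against a threshold.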

[0256] In various embodiments, the crystalline form adopts an overall tertiary structure analogous to the enzymatic scaffold defined by PDB: 8JTU, chain A, residues 1–192, wherein the RMSD of backbone atoms following structural alignment is within 1.5 Å.
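As a worked illustration of the RMSD criterion, the sketch below computes the root-mean-square deviation between two sets of backbone coordinates and checks it against a 1.5 Å cutoff. It assumes the two coordinate lists are already superposed and paired atom by atom; the superposition step itself (for example, a least-squares fit) is not shown, and the function names are hypothetical.

```python
import math


def backbone_rmsd(coords_a, coords_b):
    """RMSD in Å between two equal-length lists of (x, y, z) tuples.

    Assumes the structures are already superposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


def within_rmsd_cutoff(coords_a, coords_b, cutoff=1.5):
    """True if the backbone RMSD does not exceed the cutoff (1.5 Å in the text)."""
    return backbone_rmsd(coords_a, coords_b) <= cutoff
```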

[0348]

[0257] In various embodiments, the crystalline form is characterized by space group P3₁ 2 1, and has unit cell parameters of a = 91 Å, b = 91 Å, c = 90 Å, α = β = 90°, γ = 120°, and wherein said peptide ligase comprises or consists of an amino acid sequence having at least 65% amino acid sequence identity to the amino acid sequence set forth in SEQ ID NO: 2.

[0349]

[0258] In another aspect, there is also provided a crystalline form of the peptide ligase having the activity of MmCNT comprising or consisting of an amino acid sequence set forth in SEQ ID NO: 3, or variants, or fragments thereof. The crystalline form is characterized by atomic coordinates corresponding substantially to those deposited under Protein Data Bank (PDB) accession number 8WKD, representing the substrate-bound conformation of MmCNT. In various embodiments, the crystalline form comprises or consists of an amino acid sequence having at least 65%, 70%, 75%, 80%, 85%, 90%, or 95% amino acid sequence identity to the amino acid sequence set forth in SEQ ID NO: 3. The X-ray crystallography data of the MmCNT (PDB: 8WKD) are set forth in Table 6 below. In various embodiments, the crystalline form is a co-crystal comprising the crystalline form characterized by atomic coordinates corresponding to those deposited under PDB 8WKD bound to a substrate, wherein the substrate may comprise a C-terminal K-D-X-G-A motif and / or an N-terminal P1'-P2'-P3' motif as disclosed herein.

[0350]

[0259] In various embodiments, the crystalline form adopts an overall tertiary structure analogous to the enzymatic scaffold defined by PDB: 8WKD, chain A, residues 1–192, wherein the RMSD of backbone atoms following structural alignment is within 1.5 Å.

[0351]

[0260] In various embodiments, the crystalline form is characterized by space group P2₁ 2₁ 2, and has unit cell parameters of a = 54 Å, b = 101 Å, c = 33 Å, α = β = γ = 90°, and wherein said peptide ligase comprises or consists of an amino acid sequence having at least 65% amino acid sequence identity to the amino acid sequence set forth in SEQ ID NO: 3.
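The unit cell parameters recited for the two crystalline forms can be turned into a cell volume with the general triclinic formula V = abc·sqrt(1 − cos²α − cos²β − cos²γ + 2·cosα·cosβ·cosγ). The sketch below is illustrative only; the function name is hypothetical, and the rounded cell edges quoted in the embodiments are used as example input.

```python
import math


def unit_cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Unit cell volume in Å^3 from cell edges (Å) and angles (degrees)."""
    cos_a = math.cos(math.radians(alpha_deg))
    cos_b = math.cos(math.radians(beta_deg))
    cos_g = math.cos(math.radians(gamma_deg))
    factor = 1.0 - cos_a ** 2 - cos_b ** 2 - cos_g ** 2 + 2.0 * cos_a * cos_b * cos_g
    return a * b * c * math.sqrt(factor)


# Rounded parameters quoted in the embodiments (apo and substrate-bound forms).
apo_volume = unit_cell_volume(91, 91, 90, 90, 90, 120)
bound_volume = unit_cell_volume(54, 101, 33, 90, 90, 90)
```

For the orthorhombic cell the formula reduces to a·b·c, and for the trigonal setting with γ = 120° it reduces to a·b·c·sin(120°).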

[0352]

[0261] As used herein, the term “crystalline form” refers to a solid state of a polypeptide or protein, such as a peptide ligase having the activity of MmCNT, in which the constituent molecules are arranged in a periodic three-dimensional lattice detectable by X-ray diffraction. The crystalline form may comprise a single-component protein crystal or a multiple-component crystal, including, but not limited to, apoenzyme crystals (enzyme alone), substrate-bound or inhibitor-bound complexes, and co-crystals comprising the enzyme in association with one or more ligands, substrates, peptides, cofactors, or buffer components. Thus, the crystalline forms may correspond substantially to the apo crystal structure deposited under PDB 8JTU or the substrate-bound crystal structure deposited under PDB 8WKD, or to a crystal having an equivalent or isomorphous lattice. The crystalline form may exist as different polymorphs, solvates, hydrates, or crystal complexes containing ordered solvent molecules or cocrystallised ligands within the asymmetric unit. In various embodiments, the crystalline form is substantially free of amorphous protein or other crystalline forms, whereas in other embodiments, the crystalline preparation may comprise up to about 50% of alternative crystalline or amorphous forms. The term “co-crystal” as used herein refers to a crystalline molecular complex comprising MmCNT and one or more substrate or analogue molecules within the same crystal lattice, wherein the components are associated through non-covalent, non-ionic interactions such as hydrogen bonding, van der Waals contacts, or coordination interactions.

[0353]

[0262] In various embodiments, the crystalline form disclosed herein may be used for structure-based design, docking, simulation, screening, and engineering of peptide ligases and their ligation substrates, supporting identification of new peptide ligases or variants of pre-existing peptide ligases, modulators, or substrates with the desired ligation activity. Such methods may be computer-assisted and employ atomic coordinates derived from the apo (PDB: 8JTU) and substrate-bound (PDB: 8WKD) structures. As will be understood by the person skilled in the art, the use of crystallographic coordinate data for molecular docking, dynamics simulation, or virtual screening represents standard and routine techniques in structural biology and computational enzymology. Such methods allow prediction of substrate or ligand interactions, conformational changes, and catalytic effects using established computational suites (for example, AutoDock, Rosetta, Schrödinger, GROMACS, or AMBER), and do not require inventive skill to perform once the atomic coordinates of the enzyme are available.

[0354]

[0263] The crystalline forms disclosed herein may be used in a method for screening, or designing, a candidate substrate (poly)peptide that fits within or binds to the peptide ligase having the activity of MmCNT. The method may be a computer-assisted method.

[0355]

[0264] Accordingly, in one aspect, the present invention provides a method for screening, or designing, a candidate substrate (poly)peptide of the peptide ligase having the activity of MmCNT comprising or consisting of an amino acid sequence set forth in SEQ ID NO: 2 or 3, or variants, or fragments thereof, comprising:

[0356] (a) providing a crystalline form of the peptide ligase characterised by atomic coordinates corresponding substantially to those deposited under PDB accession number 8JTU (apo form) and / or 8WKD (substrate-bound form);

[0357] (b) providing or generating a molecular model or three-dimensional structure of the candidate substrate (poly)peptide;

[0358] (c) computationally fitting or docking the candidate substrate (poly)peptide to the catalytic pocket and / or N- or C-terminal substrate-binding groove of the peptide ligase, wherein said fitting or docking comprises determining one or more interactions between atoms of the candidate substrate (poly)peptide and atoms of the peptide ligase to predict whether the candidate substrate (poly)peptide can adopt a binding conformation compatible with a ligation reaction; and

(d) selecting the candidate substrate (poly)peptide predicted to fit within and / or bind to the catalytic pocket and / or substrate-binding groove of the peptide ligase as a potential substrate for ligation.

[0359]

[0265] The structural information suitable for use in such methods comprises the atomic coordinate data derived from the X-ray crystal structures of MmCNT, as set forth in PDB accession numbers 8JTU and 8WKD. Each atom may be defined by its element type, residue number, chain identifier, Cartesian coordinates (X, Y, Z) in angstroms relative to the crystallographic axes, occupancy, and isotropic displacement parameter. The apo structure (PDB: 8JTU) provides coordinates of the unbound enzyme, while the substrate-bound structure (PDB: 8WKD) defines the geometry of the active site in complex with a substrate (poly)peptide comprising a C-terminal K-D-X-G-A motif and / or an N-terminal P1'-P2'-P3' motif. The apo structure of MmCNT (PDB: 8JTU) may be employed to analyse conformational flexibility and to identify regions involved in substrate recognition or induced-fit movements, while the substrate-bound structure (PDB: 8WKD) may be used to define the catalytic geometry, hydrogen-bonding network, and substrate orientation that support enzymatic ligation. The combination of these structures enables a structure-guided approach for rational engineering of substrates with improved catalytic performance or modified substrate specificity.
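The per-atom fields listed above (element type, residue number, chain identifier, Cartesian coordinates, occupancy, and isotropic displacement parameter) map directly onto the fixed-column ATOM / HETATM records of the legacy PDB file format. The following is a minimal sketch of reading one such record, assuming a legacy-format coordinate file is used; the `Atom` type and `parse_atom_record` function are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    element: str
    residue_number: int
    chain_id: str
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float  # isotropic displacement parameter (temperature factor)


def parse_atom_record(line: str) -> Atom:
    """Parse one ATOM/HETATM line using the legacy PDB fixed-column layout."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        element=line[76:78].strip(),      # columns 77-78
        residue_number=int(line[22:26]),  # columns 23-26
        chain_id=line[21],                # column 22
        x=float(line[30:38]),             # columns 31-38
        y=float(line[38:46]),             # columns 39-46
        z=float(line[46:54]),             # columns 47-54
        occupancy=float(line[54:60]),     # columns 55-60
        b_factor=float(line[60:66]),      # columns 61-66
    )
```

Coordinate files distributed in mmCIF format carry the same fields under named columns and would need a different reader.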

[0360]

[0266] In various embodiments, the methods described herein involve structure-based substrate design, wherein virtual peptide or chemical libraries containing the N- and C-terminal recognition motifs are computationally fitted into the MmCNT peptide ligase substrate-binding pocket as defined by the coordinates of PDB 8WKD. The term “fitting” refers to determining, by automatic or semi-automatic computational means, the stability and complementarity of interactions between the candidate and the enzyme active site, including steric compatibility, hydrogen bonding, hydrophobic packing, and electrostatic complementarity. Such analyses allow prediction of amino acid analogues (e.g., analogues of Gly or Ala) or sequence variants that retain binding and catalytic compatibility with the MmCNT catalytic pocket.
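Before any docking is run, a virtual peptide library can be pre-filtered for candidates that carry the K-D-X-G-A recognition motif. The sketch below is a sequence-level filter only and is not a substitute for the fitting step described above; it assumes X stands for any of the 20 standard amino acids, and the function names are hypothetical.

```python
import re

# K-D-X-G-A, with X assumed to be any of the 20 standard amino acids.
KDXGA_MOTIF = re.compile(r"KD[ACDEFGHIKLMNPQRSTVWY]GA")


def has_kdxga_motif(sequence: str) -> bool:
    """True if the one-letter sequence contains a K-D-X-G-A motif."""
    return KDXGA_MOTIF.search(sequence.upper()) is not None


def filter_library(candidates):
    """Keep only candidate sequences that contain the motif, preserving order."""
    return [seq for seq in candidates if has_kdxga_motif(seq)]
```

For example, `filter_library(["RELASKDPGAFDADPLVVEI", "RELASAAPGAFDADPLVVEI"])` keeps only the first sequence, which contains KDPGA.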

[0361]

[0267] The crystalline forms disclosed herein may be used in a method for screening a candidate modulator of the peptide ligase disclosed herein. The method may be a computer-assisted method.

[0362]

[0268] Accordingly, in one aspect, the present invention provides a method of screening or identifying a candidate modulator of the peptide ligase having the activity of MmCNT comprising or consisting of an amino acid sequence set forth in SEQ ID NO: 2 or 3, or variants, or fragments thereof, the method comprising:

[0363] (a) providing a crystalline form of the peptide ligase characterised by atomic coordinates corresponding substantially to those deposited under PDB accession number 8JTU (apo form) and / or 8WKD (substrate-bound form);

[0364] (b) providing one or more candidate modulators;

(c) computationally docking, simulating, or evaluating the interaction of each candidate modulator with the peptide ligase, wherein said evaluation comprises determining changes in conformational stability, or active-site accessibility; and

[0365] (d) selecting one or more candidate modulators that are predicted to reduce or enhance substrate binding, or ligase activity of the peptide ligase.

[0366]

[0269] In various embodiments, step (c) comprises in silico docking or molecular dynamics simulation using the atomic coordinates of the crystalline enzyme to model the binding orientation and energetic fit of candidate molecules. In various embodiments, inhibitors identified by this method may act by occluding the substrate-binding groove, disrupting the catalytic pocket, or stabilising an inactive conformation, whereas activators may promote conformational ordering or enhanced substrate accessibility as observed in the substrate-bound crystal structure (PDB: 8WKD).

[0367]

[0270] In various embodiments, the candidate modulators may include inhibitors or activators that alter one or more catalytic properties such as substrate affinity, reaction rate, or overall ligation efficiency. Modulators may interact directly with the catalytic pocket or indirectly through allosteric regions influencing the N- or C-terminal substrate-recognition grooves. The activity of a modulator may be validated by in vitro ligation assays (e.g., Ub-EGFP and mCherry-Ub substrates), mass spectrometry, or fluorescence analysis, wherein decreased ligation yield indicates inhibition and increased yield indicates activation. Such modulators are useful for regulating or tuning MmCNT-mediated ligation reactions in biochemical, industrial, or therapeutic applications.

[0368]

[0271] The crystalline forms disclosed herein may be used in a method for engineering or screening a variant of the peptide ligase having the activity of MmCNT. The method may be a computer-assisted method.

[0369]

[0272] Accordingly, in one aspect, the present invention provides a method for engineering or screening variants of the peptide ligase having the activity of MmCNT, the method comprising:

[0370] (a) providing a crystalline form of the peptide ligase characterised by atomic coordinates corresponding substantially to those deposited under PDB accession number 8JTU (apo form) or 8WKD (substrate-bound form);

[0371] (b) identifying one or more amino acid residues within the catalytic pocket and / or N- or C-terminal substrate-binding groove, and / or structural scaffolding region of the peptide ligase that contribute to substrate recognition and ligase activity;

[0372] (c) generating one or more amino acid mutations at the identified positions, thereby producing a set of candidate variants of the peptide ligase;

[0373] (d) modelling, using the atomic coordinates of the crystalline form, the impact of each amino acid mutation on catalytic geometry, substrate accessibility, or overall folding stability; and

[0374] (e) selecting one or more variants predicted to retain, enhance, or alter ligase activity and / or substrate specificity.

[0273] In certain embodiments, step (c) comprises computational mutagenesis or energy-minimisation modelling based on the crystal structure coordinates, followed by in silico scoring of the predicted structural stability (ΔΔG) and retention of active-site geometry.
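The ΔΔG scoring mentioned above can be sketched as a simple ranking step. The sketch assumes that a separate modelling tool has already produced a predicted folding free energy for the reference enzyme and for each variant, that ΔΔG is defined as ΔG(variant) − ΔG(reference) so that negative values are stabilising, and that a cutoff of 1.0 kcal/mol is acceptable; the sign convention, the cutoff, and all names are assumptions for illustration, not values from this disclosure.

```python
def ddg(dg_variant: float, dg_reference: float) -> float:
    """Predicted change in folding free energy (assumed convention: negative = stabilising)."""
    return dg_variant - dg_reference


def rank_stable_variants(predicted_dg: dict, dg_reference: float, max_ddg: float = 1.0):
    """Return (name, ddG) pairs with ddG <= max_ddg, most stabilising first.

    predicted_dg maps a variant name to its predicted folding free energy;
    the values must come from an external modelling step not shown here.
    """
    scored = [(name, ddg(value, dg_reference)) for name, value in predicted_dg.items()]
    kept = [item for item in scored if item[1] <= max_ddg]
    return sorted(kept, key=lambda item: item[1])
```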

[0375]

[0274] In various embodiments, variants engineered according to this method may exhibit enhanced thermostability, broadened substrate tolerance, or modified catalytic selectivity while preserving the structural architecture defined by the crystalline forms (8JTU or 8WKD).

[0376]

[0275] In various embodiments, the method further comprises comparing the predicted ligase activity of each candidate variant with that of a reference peptide ligase corresponding to the MmConnectase (MmCNT), having the amino acid sequence set forth in SEQ ID NO:1, 2 or 3 and / or structural coordinates substantially corresponding to PDB accession numbers 8JTU and 8WKD. The activity of each candidate is determined under identical reaction conditions and normalised relative to the reference enzyme, wherein a variant exhibiting equivalent, enhanced, or reduced activity is identified respectively as retaining, improving, or diminishing the catalytic performance of the peptide ligase.
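The normalisation against the reference enzyme can be expressed as a ratio of activities measured under identical conditions. The sketch below assumes a tolerance band of ±10% around the reference within which a variant counts as retaining activity; that tolerance and the function name are hypothetical choices and are not specified in the text.

```python
def classify_variant(variant_activity: float, reference_activity: float, tolerance: float = 0.1):
    """Return (relative activity, label) for a variant versus the reference enzyme.

    Both activities must be measured under identical reaction conditions.
    """
    if reference_activity <= 0:
        raise ValueError("reference activity must be positive")
    relative = variant_activity / reference_activity
    if relative > 1.0 + tolerance:
        label = "improved"
    elif relative < 1.0 - tolerance:
        label = "diminished"
    else:
        label = "retained"
    return relative, label
```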

[0377]

[0276] The present invention is further illustrated by the following examples. However, it should be understood that the invention is not limited to the exemplified embodiments.

[0378] EXAMPLES

[0379] Materials and methods

[0380]

[0277] Plasmid construction: To generate expression plasmids for wild-type Connectase (residues 1-193), the nucleotide sequence was obtained from an NCBI search (Accession #: AAM32605.1) and synthesized by the Biobasic gene synthesis platform in a pET47b(+) backbone containing a C-terminal hexa-His tag for purification. To generate Connectase mutants, site-directed mutagenesis was performed using the Q5® High-Fidelity PCR kit (New England Biolabs, SG) and specific base-substitution primers. mCherry, SNAP and Ubiquitin ligation substrates for protein-protein ligation assays were cloned into pET47b(+) plasmid backbones by double restriction digestion and T4 ligation, followed by site-directed mutagenesis to insert the N-terminal Connectase recognition sequence (PGA FDADP LVVEI, SEQ ID NO:100) and the C-terminal Connectase recognition sequence (RELAS KDPGA FDADP LVVEI, SEQ ID NO:96), respectively. Nucleotide and protein sequences are listed in Table 4.

[0381]

[0278] Table 4: Nucleotide and protein sequences

Nucleotide sequences

| Construct | Name | Sequence (5' - 3') | SEQ ID NO: |
|---|---|---|---|
| M. mazei Connectase | T1A | GGAGATATACATATG GCACTGGTTATCGCGTTCATCGG | 22 |
| M. mazei Connectase | E34A | GATCGTGAAAAACTGGCAAAAGAACTGTACAGCGGCAGCATC | 23 |
| M. mazei Connectase | Y38A | CTGGAAAAAGAACTGGCAAGCGGCAGCATCGTTACCGATGAAG | 24 |
| M. mazei Connectase | Y38D | GAAAAACTGGAAAAAGAACTGGACAGCGGCAGCATCGTTACCG | 25 |
| M. mazei Connectase | V79A | GTTCTGGTTGGCGAAGCAAGCAGCGCGGAAGGC | 26 |
| M. mazei Connectase | S81A | GTTGGCGAAGTTAGCGCGGCGGAAGGCGGCGTTG | 27 |
| M. mazei Connectase | S81G | GTTGGCGAAGTTAGCGGAGCGGAAGGCGGCGTTG | 28 |
| M. mazei Connectase | S81D | GTTGGCGAAGTTAGCGACGCGGAAGGCGGCGTTG | 29 |
| M. mazei Connectase | E83A | GTTGGCGAAGTTAGCAGCGCGGCAGGCGGCGTTGTTAAAAAG | 30 |
| M. mazei Connectase | N125S | CTTCATCGCGTTCGGT AGC GAATTCACCAAACAGGTTGCG | 31 |
| M. mazei Connectase | N125A | CTTCATCGCGTTCGGT GCG GAATTCACCAAACAGGTTGCG | 32 |
| M. mazei Connectase | N125G | CTTCATCGCGTTCGGT GGC GAATTCACCAAACAGGTTGCG | 33 |
| Ubiquitin | (5)KDPGA(10) | CCTGCGTCTGCGCGGTGGTAATGGTCTTAGAGAGCTAGCAAGCAAGGATCCAGGTGCTTTCGACGCAGATCCACTAGTAGTCGAAATATGAGGATCCTAACTCGAGGC | 34 |
| SNAP | PGA(10) | GAAGGAGATATACATATGCCAGGTGCTTTCGACGCAGATCCACTAGTAGTCGAAATAGGTCTCCCCGTGG | 35 |
| mCherry | PGA(10)Ub | GTCGAAATACCCGTGGATATCAAGATGCAGATCTTCGTGAAAACCCTGACCGGCAAGACCATCACCCTCGAGGTGGAGCCCAGTGACACCATCGAGAATGTCAAGGCAAAGATCCAAGATAAGGAAGGCATCCCTCCTGATCAGCAGAGGTTGATCTTTGCTGGGAAACAGCTGGAAGATGGACGCACCCTGTCTGACTACAACATCCAGAAAGAGTCCACTCTGCACTTGGTCCTGCGTCTGCGCCTTATCGGTGAATTCGCTGGCTCAGGATCCGGATCAGTGAGCAAGGGC | 36 |
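A mutagenic primer can be sanity-checked by translating it with the standard genetic code and confirming that the intended substitution appears. The sketch below is one way to do this; it assumes the primer is read in frame from its first base, which happens to hold for the E83A primer (SEQ ID NO: 30) used as the example, and the function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, listed in TCAG order for first, second and third base.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate a DNA string (whitespace ignored) from the given frame offset."""
    dna = "".join(dna.split()).upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# E83A primer: the wild-type stretch VGEVSSAEGGVVKK becomes VGEVSSAAGGVVKK.
e83a_peptide = translate("GTTGGCGAAGTTAGCAGCGCGGCAGGCGGCGTTGTTAAAAAG")
```

Primers that do not start on a codon boundary need a non-zero `frame`, and a trailing partial codon is ignored.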

[0403] Amino Acid Sequences

| Plasmid | Sequence | SEQ ID NO: |
|---|---|---|
| pET47b(+)_m. mazei_Connectase_6His | MTLVIAFIGKNGAVMAGDMREITFEGEKPDREKLEKELYSGSIVTDEEMQKKAEEFGVKITVADCKEKVSERNGVLVGEVSSAEGGVVKKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGNEFTKQVANKCFKDNWTKKSNLQDAVKILILCMETVARKTASVSKQFMIVQTASNADVLKVVEKDRNCGSHHHHHH* | 37 |
| pET47b(+)_m. mazei_Connectase T1A_6His | MALVIAFIGKNGAVMAGDMREITFEGEKPDREKLEKELYSGSIVTDEEMQKKAEEFGVKITVADCKEKVSERNGVLVGEVSSAEGGVVKKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGNEFTKQVANKCFKDNWTKKSNLQDAVKILILCMETVARKTASVSKQFMIVQTASNADVLKVVEKDRNCGSHHHHHH* | 38 |
| pET47b(+)_m. mazei_Connectase_E34A_6His | MTLVIAFIGKNGAVMAGDMREITFEGEKPDREKLAKELYSGSIVTDEEMQKKAEEFGVKITVADCKEKVSERNGVLVGEVSSAEGGVVKKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGNEFTKQVANKCFKDNWTKKSNLQDAVKILILCMETVARKTASVSKQFMIVQTASNADVLKVVEKDRNCGSHHHHHH* | 39 |
| pET47b(+)_m. mazei_Connectase_Y38A_6His | MTLVIAFIGKNGAVMAGDMREITFEGEKPDREKLEKELASGSIVTDEEMQKKAEEFGVKITVADCKEKVSERNGVLVGEVSSAEGGVVKKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGNEFTKQVANKCFKDNWTKKSNLQDAVKILILCMETVARKTASVSKQFMIVQTASNADVLKVVEKDRNCGSHHHHHH* | 40 |
| pET47b(+)_m. mazei_Connectase_Y38D_6His | MTLVIAFIGKNGAVMAGDMREITFEGEKPDREKLEKELDSGSIVTDEEMQKKAEEFGVKITVADCKEKVSERNGVLVGEVSSAEGGVVKKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGNEFTKQVANKCFKDNWTKKSNLQDAVKILILCMETVARKTASVSKQFMIVQTASNADVLKVVEKDRNCGSHHHHHH* | 41 |
| pET47b(+)_m. mazei_Connectase_V79A_6His | MTLVIAFIGKNGAVMAGDMREITFEGEKPDREKLEKELYSGSIVTDEEMQKKAEEFGVKITVADCKEKVSERNGVLVGEASSAEGGVVKKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGNEFTKQVANKCFKDNWTKKSNLQDAVKILILCMETVARKTASVSKQFMIVQTASNADVLKVVEKDRNCGSHHHHHH* | 42 |
| pET47b(+)_m. mazei_Connectase S81A_6His | MTLVIAFIGKNGAVMAGDMREITFEGEKPDREKLEKELYSGSIVTDEEMQKKAEEFGVKITVADCKEKVSERNGVLVGEVSAAEGGVVKKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGNEFTKQVANKCFKDNWTKKSNLQDAVKILILCMETVARKTASVSKQFMIVQTASNADVLKVVEKDRNCGSHHHHHH* | 43 |
| pET47b(+)_m. mazei_Connectase S81G_6His | MTLVIAFIGKNGAVMAGDMREITFEGEKPDREKLEKELYSGSIVTDEEMQKKAEEFGVKITVADCKEKVSERNGVLVGEVSGAEGGVVKKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGNEFTKQVANKCFKDNWTKKSNLQDAVKILILCMETVARKTASVSKQFMIVQTASNADVLKVVEKDRNCGSHHHHHH* | 44 |
| pET47b(+)_m. mazei_Connectase S81D_6His | MTLVIAFIGKNGAVMAGDMREITFEGEKPDREKLEKELYSGSIVTDEEMQKKAEEFGVKITVADCKEKVSERNGVLVGEVSDAEGGVVKKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGNEFTKQVANKCFKDNWTKKSNLQDAVKILILCMETVARKTASVSKQFMIVQTASNADVLKVVEKDRNCGSHHHHHH* | 45 |
| pET47b(+)_m. mazei_Connectase E83A_6His | MTLVIAFIGKNGAVMAGDMREITFEGEKPDREKLEKELYSGSIVTDEEMQKKAEEFGVKITVADCKEKVSERNGVLVGEVSSAAGGVVKKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGNEFTKQVANKCFKDNWTKKSNLQDAVKILILCMETVARKTASVSKQFMIVQTASNADVLKVVEKDRNCGSHHHHHH* | 46 |
| pET47b(+)_m. mazei_Connectase_N125S_6His | MTLVIAFIGKNGAVMAGDMREITFEGEKPDREKLEKELYSGSIVTDEEMQKKAEEFGVKITVADCKEKVSERNGVLVGEVSSAEGGVVKKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGSEFTKQVANKCFKDNWTKKSNLQDAVKILILCMETVARKTASVSKQFMIVQTASNADVLKVVEKDRNCGSHHHHHH* | 47 |
| pET47b(+)_m. mazei_Connectase_N125G_6His | MTLVIAFIGKNGAVMAGDMREITFEGEKPDREKLEKELYSGSIVTDEEMQKKAEEFGVKITVADCKEKVSERNGVLVGEVSSAEGGVVKKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGGEFTKQVANKCFKDNWTKKSNLQDAVKILILCMETVARKTASVSKQFMIVQTASNADVLKVVEKDRNCGSHHHHHH* | 48 |
| pET47b(+)_m. mazei_Connectase_N125A_6His | MTLVIAFIGKNGAVMAGDMREITFEGEKPDREKLEKELYSGSIVTDEEMQKKAEEFGVKITVADCKEKVSERNGVLVGEVSSAEGGVVKKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGAEFTKQVANKCFKDNWTKKSNLQDAVKILILCMETVARKTASVSKQFMIVQTASNADVLKVVEKDRNCGSHHHHHH* | 49 |
| pSNAP_PGA_(FDADP LVVEI)_SNAP_6His | MPGAFDADPLVVEIGLPVDIKLTGEFAMDKDCEMKRTTLDSPLGKLELSGCEQGLHEIKLLGKGTSAADAVEVPAPAAVLGGPEPLMQATAWLNAYFHQPEAIEEFPVPALHHPVFQQESFTRQVLWKLLKVVKFGEVISYQQLAALAGNPAATAAVKTALSGNPVPILIPCHRVVSSSGAVGGYEGGLAVKEWLLAHEGHRLGKPGLGPAGGSHHHHHH* | 50 |
| pET47b(+)_6His_3C_Ubquitin_(RELAS) KDPGA (FDADP LVVEI) | MAHHHHHHSAALEVLFQGPGMQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGGNGLRELASKDPGAFDADPLVVEI* | 51 |
| pET47b(+)_PGA (FDADPLVVEI)_Ub_mCherry_6His | MPGAFDADPLVVEIPVDIKMQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRLIGEFAGSGSGSVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKGGPLPFAWDILSPQFMYGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRGTNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKKPVQLPGAYNVNIKLDITSHNEDYTIVEQYERAEGRHSTGGMDELYKHHHHHH* | 52 |

[0431]

[0279] To generate mammalian expression plasmids for wild-type Connectase (residues 1-193), the nucleotide sequence was obtained from an NCBI search (Accession #: AAM32605.1) and synthesized by the Avenir gene synthesis platform in a pCDNA3.1 backbone. mCherry and EGFP substrates for protein-protein ligation assays were cloned into pCDNA3.1 plasmid backbones by double restriction digestion and T4 ligation, followed by site-directed mutagenesis to insert the N-terminal Connectase recognition sequence (GGA FDADP LVVEI, SEQ ID NO:101) and the C-terminal Connectase recognition sequence (RELAS KDPGA FDADP LVVEI, SEQ ID NO:96), respectively. To generate plasmids for bacterial cell-surface expression, proteins were selected and synthesized into pET28b plasmid backbones containing the LppOmpA sequence for docking of the protein to the outer membrane of cells and a C-terminal recognition-sequence fusion to mmCNT (RELASKD-mmCNT WT), by the Biobasic gene synthesis platform. All plasmid sequences are listed in Table 5.

[0432]

[0280] Table 5: Plasmid protein sequences

| Name | Amino acid sequence in the ORF of the plasmids | SEQ ID NO: |
|---|---|---|
| pCDNA_EGFP_7aalinker_KDCnt | MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKGSSGSGSRELASKDTLVIAFIGKNGAVMAGDMREITFEGEKPDREKLEKELYSGSIVTDEEMQKKAEEFGVKITVADCKEKVSERNGVLVGEVSSAEGGVVKKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGNEFTKQVANKCFKDNWTKKSNLQDAVKILILCMETVARKTASVSKQFMIVQTASNADVLKVVEKDRNCGS* | 53 |
| pCDNA3.1(+) Cnt WT_6his | MTLVIAFIGKNGAVMAGDMREITFEGEKPDREKLEKELYSGSIVTDEEMQKKAEEFGVKITVADCKEKVSERNGVLVGEVSSAEGGVVKKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIAFGNEFTKQVANKCFKDNWTKKSNLQDAVKILILCMETVARKTASVSKQFMIVQTASNADVLKVVEKDRNCGSLEHHHHHH* | 54 |
| pCDNA3.1(+) GGA Ub VDAK mcherry NLS Cmyc | MGGAVDAKPLVVEIPVDIKMQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRLIGEFAGSGSGSVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKGGPLPFAWDILSPQFMYGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRGTNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKKPVQLPGAYNVNIKLDITSHNEDYTIVEQYERAEGRHSTGGMDELYKLESRGPVPAAKRVKLD* | 55 |
| pCDNA_EGFP_9aalinker_(5)KDPGA(15) VDAK | MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKGGGSKKRGNRELASKDPGAVDAKPLVVEISEEGE* | 56 |
| Pet47b GGA(15) VDAK Ub(12) Mcherry | MGGAVDAKPLVVEIPVDIKMQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRLIGEFAGSGSGSVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKGGPLPFAWDILSPQFMYGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRGTNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKKPVQLPGAYNVNIKLDITSHNEDYTIVEQYERAEGRHSTGGMDELYKHHHHHH* | 57 |

[0448]

[0281] Protein expression and purification: To obtain protein substrates, the constructed plasmids were transformed into BL21(DE3) (NEB, C2527) and grown in Miller’s LB broth (LB, Biobasic) at 37 °C for 16-18 h in a shaking incubator (220 rpm). The starter culture was then used to inoculate a larger volume of lysogeny broth, which was grown to an OD600 of ~0.6-0.8. The temperature was then lowered to 16 °C, protein expression was induced with 0.5 mM IPTG and the bacteria were cultured overnight. Bacteria were harvested at 4,000 × g (Avanti JXN series, Beckman Coulter) for 15 minutes and then resuspended in lysis buffer (50 mM Tris-HCl, pH 7.4, 150 mM NaCl, 0.05% (v / v) CHAPS, 10% (v / v) glycerol). Cells were then lysed with an Emulsiflex-C3 homogenizer and centrifuged at 30,000 × g (Avanti JXN series, Beckman Coulter) for 30 minutes at 4 °C. The supernatant was passed by gravity through a pre-equilibrated column containing Ni-NTA agarose beads (BioBasic, SA005100). The column was then washed with lysis buffer (50 mM Tris-HCl, pH 8.0, 150 mM NaCl, 0.05% (v / v) CHAPS, 10% (v / v) glycerol) containing 20 mM imidazole to remove non-specifically bound proteins. His-tagged proteins were then eluted with increasing imidazole concentrations (100 mM-300 mM, in 50 mM steps), and the eluted fractions were analysed by Sodium Dodecyl-Sulfate Polyacrylamide Gel Electrophoresis (SDS-PAGE). Fractions containing the protein were then concentrated using an Amicon Ultra concentrator (Merck Millipore, USA) and further purified on a size-exclusion column (Superdex™ 200 10 / 300 GL, GE Healthcare, USA) in elution buffer (20 mM Tris and 100 mM NaCl, pH 8.0).

[0449]

[0282] Crystallization of Connectase T1A, crystallographic data collection and structure determination: Protein at a concentration of 30 mg / ml was subjected to robotic crystallization trials using the sitting-drop vapor-diffusion method and a Mosquito instrument (TTP Biotech UK). Two protein concentrations (15 mg / mL and 30 mg / mL) and a protein-free control were tested with commercial crystallization screens (Molecular Dimensions, Hampton Research) in intelli 96-3 well sitting-drop plates. Optimized Connectase T1A crystals were obtained by mixing equal volumes of protein and a precipitant solution containing 0.1 M Tris-HCl, pH 8.5, 150 mM NaCl, 12% PEG 20k. Crystals were cryo-protected by a brief soak in the precipitant solution supplemented with 10% (v / v) glycerol and rapidly frozen in liquid nitrogen. The diffraction intensities of the crystal were collected at the MX1 beamline, Australian Synchrotron (Melbourne, Australia) to a resolution of 3.15 Å. A total of 720 images of 0.25° oscillation each were processed with the CCP4 package. Diffraction intensities were integrated with the program iMosflm and scaled with the program SCALA from the CCP4 suite. The crystal structure of Connectase T1A was determined by molecular replacement using M. J connectase as a search probe (PDB code 6ZW0). Model rebuilding sessions at the computer graphics, using the program Coot, were interspersed with refinement to a resolution of 2.0 Å using Phenix. Data collection statistics are summarized in Table 6. Figure illustrations were produced using Pymol. The atomic coordinates and structure factors are deposited with the Protein Data Bank under accession codes 8JTU and 8WKD.

[0283] Table 6: X-ray Crystallography data collection and refinement statistics

| | mmCNT T1A | mmCNT T1A / C192S with substrate |
|---|---|---|
| Crystal parameter | | |
| Space group | P 3₁ 2 1 | P 2₁ 2₁ 2 |
| Cell dimensions a, b, c (Å) | 90.72, 90.72, 90.46 | 54.10, 100.97, 32.60 |
| Cell dimensions α, β, γ (°) | 90, 90, 120 | 90, 90, 90 |
| Subunit | 2 | 2 |
| Data collection | | |
| Beamline | Aus Sync MX1 | Aus Sync MX1 |
| Wavelength (Å) | 1.0 | 0.9537 |
| Resolution (Å) | 45.36 - 3.40 | 47.69 - 1.98 |
| No. observations | 25516 | 61160 |
| Rsym or Rmergeᵃ | 0.14 - 13.17 | 1.13 |
| Completeness (%)ᵃ | 99.7 | 99.4 |
| Redundancy | 36.5 | 9.9 |
| Refinement | | |
| Resolution (Å) | 3.40 | 1.98 |
| No. unique reflections | 11443 | 23726 |
| Rwork / Rfree | 0.207 / 0.285 | 0.231 / 0.270 |
| Wilson B-factor (Å²) | 118.2 | 34.9 |
| Anisotropy | 0.468 | 0.337 |
| Bulk solvent ksol (e / Å³), Bsol (Å²) | 0.27, 161.9 | 0.34, 37.4 |
| Total number of atoms | 2944 | 1664 |
| B-factors | 226.0 | 46 |
| R.m.s. deviationsᶜ | | |
| Bond lengths (Å) | 0.86 | 0.43 |
| Bond angles (°) | 1.20 | 0.60 |
| Ramachandran plot (%)ᵈ | 97.56 / 1.95 / 0.49 | 96.59 / 1.95 / 1.46 |
| PDB accession code | 8JTU | 8WKD |

ᵃ The values in parentheses correspond to the highest resolution shell.

ᵇ Intensities estimated from amplitudes.

ᶜ Deviations from ideal bond lengths / angles.

ᵈ Percentage of residues in favoured region / allowed region / outlier region.

[0284] Protein-Protein Ligation Assay: The protein-protein ligation assay involves ligating two proteins using Connectase. A standard protein-protein ligation reaction consists of two substrates, e.g., PGA(10) SNAP and Ubi (5)KDPGA(10), and Connectase in Phosphate-Buffered Saline (PBS). The concentration used for both proteins was 10 µM, whereas the concentration of Connectase varied. The reaction mix was then incubated at 37 °C for 15 min. After incubation, BG-647 was added and left to sit for 2 min. SDS-PAGE analysis was then performed to determine whether protein ligation was successful, by imaging the gel for BG-647 fluorescence and with Coomassie Blue staining. To compare the ligation efficiency of Connectase mutants with the WT, all assay components remained the same except for the Connectase enzyme used. Different PGA(10) substrates were also tested: PGA(10) Ub FLAG, PGA(10) GSGSGS (SEQ ID NO:115) mCherry and PGA(10) Ub mCherry. Similarly, all components remained the same except for the type of PGA(10) substrate tested.
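Setting up a reaction at a fixed final substrate concentration comes down to the dilution relation C1·V1 = C2·V2. The sketch below is a small helper for that arithmetic; the stock concentrations and reaction volume used when calling it are whatever the experimenter has on hand and are not specified in the text, and the function name is hypothetical.

```python
def stock_volume(final_conc: float, final_volume: float, stock_conc: float) -> float:
    """Volume of stock needed so that final_conc is reached in final_volume.

    Concentrations must share one unit, and the result is in the unit of final_volume.
    """
    if stock_conc <= 0 or final_conc < 0 or final_volume <= 0:
        raise ValueError("concentrations and volume must be positive")
    if final_conc > stock_conc:
        raise ValueError("stock is more dilute than the requested final concentration")
    return final_conc * final_volume / stock_conc
```

The buffer volume is then the reaction volume minus the sum of the stock volumes of the two substrates and the enzyme.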

[0473]

[0285] Peptide synthesis: All Fmoc amino acid powders and benzotriazol-1-yloxytripyrrolidinophosphonium hexafluorophosphate (PyBOP) were purchased from GL Biochem, Shanghai. Solvents such as dimethylformamide (DMF), N,N-Diisopropylethylamine (DIEA) and dichloromethane (DCM) were purchased from Merck-Millipore, USA. Trifluoroacetic acid (TFA), 2,2'-(Ethylenedioxy)diethanethiol (DODT), triisopropylsilane (TIS), N,N'-Diisopropylcarbodiimide (DIC) and diethyl ether were purchased from Sigma Aldrich. OxymaPure was purchased from CEM. Piperidine was purchased from Acros. Biotin was purchased from IBA life sciences. All peptides used in this manuscript were synthesized in-house on an Fmoc-based automated CEM Liberty peptide synthesizer with DIC / OxymaPure as coupling reagent and were HPLC purified. All peptide identities were confirmed by MALDI TOF MS in reflector mode (ABI 4800 MALDI TOF / TOF). MBHA rink amide resin was used for solid phase synthesis with the double coupling method. 20% piperidine in DMF was used as the deprotection agent. Resin-conjugated peptide was repeatedly washed with DMF and DCM before cleavage. A freshly prepared cleavage cocktail (94% TFA / 2.5% H2O / 2.5% DODT / 1% TIS) was added to the resin for a minimum of 2 hours of cleavage at ambient temperature. Four volumes of cold ether were added to the TFA peptide solution to precipitate the crude peptide. The crude peptide was purified by reverse phase HPLC on a BioRad NGC system with a C18 analytical column (Phenomenex Jupiter 5 µm C18 300 Å, 250 x 10 mm). Fractions were analysed by MALDI-TOF MS to confirm the presence of the target peptide. Fractions with the target peptide were lyophilised for a minimum of 24 hours. Dried peptide powder was stored at -20 °C. All synthesised peptides are listed in Table 7.

[0474]

[0286] Table 7: Peptides Synthesis

[0475] Name | Peptide sequence | No. of AA | MW | SEQ ID NO
PGA (P4') | PGA ADADP LVVEI SEEGE | 18 | 1797 | 58
PGA (V P4') | PGA VDADP LVVEI SEEGE | 18 | 1825 | 59
PGA (K P7') | PGA FDAKP LVVEI SEEGE | 18 | 1887 | 60
PGA (P5') | PGA FAADP LVVEI SEEGE | 18 | 1829 | 61
PGA (P8') | PGA FDADA LVVEI SEEGE | 18 | 1847 | 62
PGA (P9') | PGA FDADP AVVEI SEEGE | 18 | 1831 | 63
PGA (P10') | PGA FDADP LAVEI SEEGE | 18 | 1845 | 64
PGA (P11') | PGA FDADP LVAEI SEEGE | 18 | 1845 | 65
PGA (P12') | PGA FDADP LVVAI SEEGE | 18 | 1815 | 66
PGA (P13') | PGA FDADP LVVEA SEEGE | 18 | 1831 | 67
PGA (P14') | PGA FDADP LVVEI AEEGE | 18 | 1857 | 68
5KDPGA10 (P7) | AELAS KDPGA FDADP LVVEI | 20 | 2057 | 69
5KDPGA10 (P6) | RALAS KDPGA FDADP LVVEI | 20 | 2084 | 70
5KDPGA10 (P5) | REAAS KDPGA FDADP LVVEI | 20 | 2100 | 71
5KDPGA10 (P3) | RELAA KDPGA FDADP LVVEI | 20 | 2126 | 72
5KDPGA10 (P2) | RELAS ADPGA FDADP LVVEI | 20 | 2085 | 73
5KDPGA10 (P1) | RELAS RDPGA FDADP LVVEI | 20 | 2170 | 74
AGA 15aa | AGA FDADP LVVEI SEEGE | 18 | 1847 | 75
CGA 15aa | CGA FDADP LVVEI SEEGE | 18 | 1880 | 76
DGA 15aa | DGA FDADP LVVEI SEEGE | 18 | 1891 | 77
EGA 15aa | EGA FDADP LVVEI SEEGE | 18 | 1904 | 78
FGA 15aa | FGA FDADP LVVEI SEEGE | 18 | 1923 | 79
GGA 15aa | GGA FDADP LVVEI SEEGE | 18 | 1832 | 80
HGA 15aa | HGA FDADP LVVEI SEEGE | 18 | 1914 | 81
IGA 15aa | IGA FDADP LVVEI SEEGE | 18 | 1890 | 82
KGA 15aa | KGA FDADP LVVEI SEEGE | 18 | 1904 | 83
LGA 15aa | LGA FDADP LVVEI SEEGE | 18 | 1890 | 84
MGA 15aa | MGA FDADP LVVEI SEEGE | 18 | 1908 | 85
NGA 15aa | NGA FDADP LVVEI SEEGE | 18 | 1890 | 86
PGA 15aa | PGA FDADP LVVEI SEEGE | 18 | 1873 | 87
QGA 15aa | QGA FDADP LVVEI SEEGE | 18 | 1905 | 88
RGA 15aa | RGA FDADP LVVEI SEEGE | 18 | 1932 | 89
SGA 15aa | SGA FDADP LVVEI SEEGE | 18 | 1862 | 90
TGA 15aa | TGA FDADP LVVEI SEEGE | 18 | 1877 | 91
VGA 15aa | VGA FDADP LVVEI SEEGE | 18 | 1874 | 92
WGA 15aa | WGA FDADP LVVEI SEEGE | 18 | 1962 | 93
YGA 15aa | YGA FDADP LVVEI SEEGE | 18 | 1939 | 94
5KDGGA10 | RELAS KDGGA FDADP LVVEI | 20 | 2101 | 95
5KDPGA10 | RELAS KDPGA FDADP LVVEI | 20 | 2141 | 96
5KDPGA10 VK | RELAS KDPGA VDAKP LVVEI | 20 | 2107 | 97
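The residue counts and molecular weights in Table 7 can be cross-checked with a short calculation. The sketch below sums standard average residue masses and, as an assumption based on the Rink amide resin named in the synthesis description, treats the C-terminus as an amide. The function name `peptide_stats` is illustrative only.

```python
# Average residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153        # added once for the free N- and C-termini
AMIDE_SHIFT = -0.9847  # C-terminal -OH replaced by -NH2 (assumed, Rink amide)


def peptide_stats(sequence, c_terminal_amide=True):
    """Return (number of residues, average molecular weight in Da)."""
    residues = sequence.replace(" ", "").upper()
    mass = sum(RESIDUE_MASS[r] for r in residues) + WATER
    if c_terminal_amide:
        mass += AMIDE_SHIFT
    return len(residues), mass


# SGA 15aa row of Table 7 (SEQ ID NO: 90), tabulated as 18 AA, MW 1862.
n_residues, mw = peptide_stats("SGA FDADP LVVEI SEEGE")
```

Under these assumptions the SGA 15aa peptide comes out at 18 residues and within about 1 Da of the tabulated value, which suggests the table lists average masses of the amidated peptides.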

[0487]

[0287] Peptide-Peptide Ligation Assay: Peptide-peptide ligation assay involves ligating two peptides using Connectase. Native and modified amino acids were purchased from Sigma Aldrich (USA). All peptides used in this manuscript were synthesised by the lab using the solid phase method with the Liberty Blue™ Automated Microwave Peptide Synthesiser (CEM Corporation, USA) and HPLC purified.

[0488] The identity of each HPLC-purified peptide was confirmed by MALDI-TOF MS (ABI 4800 MALDI TOF / TOF). A standard peptide-peptide ligation reaction consists of two peptides, e.g., PGA(15) and (5)KDPGA(10), and Connectase in PBS. Both peptides were used at 50 µM, whereas Connectase was used at 0.5 µM. The reaction mix was incubated at 37°C for 15 min. The samples were spotted directly on a Matrix-Assisted Laser Desorption / Ionization Mass Spectrometry (MALDI MS) sample plate in triplicate, and recrystallized α-cyano-4-hydroxycinnamic acid (CHCA), used as the MALDI MS matrix, was spotted on the samples. Finally, the samples were analysed using MALDI MS in linear mode, with a mass range of 820-5000 Da. To compare the ligation efficiency of Connectase mutants with that of the WT, all assay components remained the same except for the Connectase enzyme used.
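A minimal sketch of how the triplicate MALDI spots might be reduced to a single ligation readout is given below, using the product / substrate ratio that serves as the efficiency measure in the Examples. The peak intensities are hypothetical, and `ligation_readout` is an illustrative name. MALDI peak intensities are only semi-quantitative, so such ratios are best compared between reactions run with the same peptides.

```python
from statistics import mean, stdev


def ligation_readout(spots):
    """Summarise replicate spots given as (product, substrate) peak intensities."""
    ratios = [product / substrate for product, substrate in spots]
    fractions = [product / (product + substrate) for product, substrate in spots]
    return {
        "ratio_mean": mean(ratios),        # product / substrate
        "ratio_sd": stdev(ratios),         # spread across the replicate spots
        "fraction_mean": mean(fractions),  # product / (product + substrate)
    }


# Hypothetical intensities for three replicate spots of one reaction.
readout = ligation_readout([(50.0, 100.0), (60.0, 100.0), (40.0, 100.0)])
```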

[0489]

[0288] Cell culture and transfection: The cell lines HEK293T and HaCaT were obtained from ATCC (Manassas, VA). Both cell lines were maintained at 37 °C with 5% CO2 in DMEM (Gibco) supplemented with 10% (v / v) fetal bovine serum (Gibco). Cell viability was checked by staining the cells with Trypan Blue (ThermoFisher Scientific, SG). The stained cells were loaded into a Countess™ Cell Counting Chamber Slide (ThermoFisher Scientific, SG) for use with the Countess™ Automated Cell Counter (ThermoFisher Scientific, SG). HEK293T cells were plated at a density of 5 x 10^5 cells per well in poly-L-lysine-coated (Sigma-Aldrich) six-well plates prior to transfection. Plasmids were transfected into HEK293T cells using FuGENE HD (Promega) at a "5:1" ratio according to the manufacturer's instructions.

[0490]

[0289] Cellular Surface Ligation: The transfected cells were harvested 24 hours post-transfection using a cell scraper. The medium was replaced with reaction buffer containing 0.1% BSA in 1x PBS, pH 7.4. Connectase (1 µM) and ligating substrates (10 µM) were added and incubated at 37°C for 20 min. The cells were washed thrice with 1x PBS, pH 7.4, before further assays.

[0491]

[0290] Flow cytometric assay and analysis: The following reagents were added to the cell pellet: ice-cold Fluorescence-Activated Cell Sorting (FACS) buffer containing 1x PBS and 0.5% Bovine Serum Albumin (BSA), primary mouse anti-FLAG M2 antibody (1:1000 dilution) (Sigma-Aldrich, SG) and secondary Alexa 647 goat anti-mouse IgG (1:250 dilution) (ThermoFisher Scientific, SG). This was followed by incubation at 4°C for 1 hour in the dark. After incubation, the cells were washed and resuspended in ice-cold FACS buffer. The cells were passed through a 5 mL Polystyrene Round-Bottom Tube with Cell-Strainer Cap to obtain single cells. Finally, the cells were analysed using a BD LSRFortessa™ X-20 Cell Analyzer, and the .fcs files were processed using FlowJo Version 10 (FlowJo, LLC).

[0492]

[0291] Immunofluorescence imaging and staining: HEK293T cells were seeded onto 8-chamber µ-Slides (ibidi) and transfected according to the protocol described above. Ligation of Ub-mCherry bearing the improved N-terminal recognition motif was carried out according to the cellular surface ligation method. Unbound Ub-mCherry was washed off briefly with PBS before the addition of 4% paraformaldehyde in PBS for 15 minutes, and the cells were then washed thrice with PBS. Lastly, the slides were coverslipped with ProLong™ Gold Antifade Mountant with DAPI (ThermoFisher Scientific). Fluorescence images were captured using a Zeiss Axio Observer Z1 and processed with ZEISS ZEN (blue edition) imaging software.

[0493]

[0292] Immunofluorescence imaging and SDS-PAGE analysis: HEK293T cells were seeded onto 8-chamber µ-Slides (ibidi) and transfected according to the protocol described above. Fluorescence and phase-contrast images were captured using a Zeiss Axio Observer Z1 and processed with ZEISS ZEN (blue edition) imaging software. Transfected cells were harvested using a cell scraper and lysed with ice-cold RIPA lysis buffer (20 mM Na2H2PO4, 250 mM NaCl, 1% Triton X-100, 0.1% SDS, pH 8.0). Total protein lysates were resolved by 15% SDS-PAGE. The gel was then analysed to determine whether intracellular protein ligation was successful, by imaging in the Alexa594 channel to detect mCherry and in the Alexa488 channel to detect EGFP, and after Coomassie Blue staining.

[0494]

[0293] Immunoblotting: Cells were lysed with ice-cold RIPA lysis buffer (20 mM Na2H2PO4, 250 mM NaCl, 1% Triton X-100, 0.1% SDS, pH 8.0). Total protein lysates were resolved by 15% SDS-PAGE and electro-transferred onto Immun-Blot® polyvinylidene fluoride (PVDF) membrane (Bio-Rad, USA). Membranes were blocked with 5% Blotting-Grade Blocker (#1706404, Bio-Rad, USA) diluted with TBST (50 mM Tris-HCl, pH 7.6, 150 mM NaCl, and 0.05% Tween-20) for 1 h at room temperature. The membranes were then incubated overnight at 4°C with the indicated primary antibodies in 5% BSA diluted with TBST. Membranes were washed thrice with TBST and incubated with the appropriate HRP-conjugated anti-IgG secondary antibodies (Santa Cruz Biotechnology, USA) for 1 h at room temperature. Protein bands were revealed using the ECL™ Prime Western Blotting System (RPN2232, GE Healthcare Bio-Sciences, USA) and imaged using a CCD-based ChemiDoc™ Imaging System (Bio-Rad, USA).

[0495]

[0294] Table 8: List of CNT-related family variants and their amino acid sequences, including validated inactive homologs.

[0496] Description | Amino acid sequence | SEQ ID NO
WP_048036387.1 DUF2121 domain-containing protein [Methanosarcina mazei] | MTLVIAFIGKNGAVMAGDMREITFEGEKPDREKLEKELYSG SIVTDEKMQKKAEEFGVKITVADCKEKVSERNGVLVGEVSS AEGGWKKRRLYASAGNFAIAELINTEMTLTSQGKGSNFIA FGNEFTKQVANKCFKDNWTKKSNLQDAVKILILCMETVARK TASVSKQFMIVQTASNADVLKWEKDRNC | 102
WP_010870052.1 DUF2121 family protein [Methanocaldococcus jannaschii] | MSLIICYYGKNGAVIGGDRRQIFFRGSEENRKILEEKLYSGE IKSEEELYKLAEKLNIKIIIEDDREKVRKISDSWCGEVRSLGI DAKRRRVYATKGKCAIVDILNDTVTNQTIKEGFGIWLGNRF LKKKAEEELKRTAKLFPMMPIQQIEDAIKEIFEKLKWHPTVS KEYDIYSVNKYEKNFEEVIKKDIESLFKYREQLRKQLIDFGK VMSIVNKIVKNGEIGVIKDGKLHLYDDYIAIDKIDPNPKVFKV VDVEGNFKDGDIWIENGDMKIKGTNEKVTTKYIIIHK | 103
WP_048172501.1 DUF2121 domain-containing protein [Methanosarcina siciliae] | MTLVIAFIGKNGAIMAGDMREITFEGEKPDREKLEKELYNGT IVTDEELARKAEEAGVKITVTDCKNKVSERNGILVGEVSSVE GGIVKKRRLYASAGAYAIAELRDLELTLISQGKSSNFIAFGN EFTKQVANKYFKDNWTKKSKLQDAVKILMLCMETAAKKTA SVSKQFVIVQTSSNADVLKLVEKDRKS | 104
MDI9395351.1 MAG: DUF2121 domain-containing protein [Euryarchaeota archaeon] | MTLVIAFIGKNGAVMAGDMREITFEGEKPDREKLEKELYSG AIITDEELAKKAEEFGVKITIADCKEKVAERDGVLVGEASSIE DGWKKRRLYASAGNYAIAELRDSEMTLTTHGKGSNFIAFG NEFTKQVANKCFKKHWTKKSNLQDAVKILIMCMESAGEKT ASVSRQFMIVQTASNADVLKWEKDRNG | 105
WP_048125541.1 DUF2121 domain-containing protein [Methanosarcina lacustris] | MTLVIAFIGKNGAVMAGDLREITFDGEKPDMEKLEKELYSG TIVTDEELTKKAEEFGVKIMVTDCKSKVSERNGILVGEVSSV EGSAVKKRRLYASAGNYAIAELRDSELTMTSQGKGSNFIAF GNEFTKQIANKCFTDNWTKKSKLQDAVKILMLCMETAARET ASVSKQFVIMQTASNVDVLKWEKDRKNCIKC | 106
WP_011022290.1 DUF2121 family protein [Methanosarcina acetivorans] | MTLVIAFIGKNGAVMAGDMREITFEGEKPDREKLEKELYNG AIVTDEELARKAEEAGVKITVTDCKNKVSERNGILVGEVSSV ERGIVKKRRLYASAGSYAIAELRDSELTLTSQGKGSNFIAFG NEFTKQVANKCFKDNWTKKSKLQDAVKVLMLCMETAAKKT ASVSKQFVIVQTSSNVDVLKLVEKDGGT | 107
MCO5382431.1 MAG: DUF2121 domain-containing protein [Methanosarcina barkeri] | MTLVIAFIGKNGAVMTGDLREITFEGDKSDRERLEKELYSG AIVTDEELVKKAEEFGVGITVTDCKSKVSERNGVLMGEVSS IEGGWKKRRLYASAGNYAITELRDIEIILTSHGKGSNFIAFG NEFTKQIANKCFKDNWTKKSNFQDAVKILMLCMETAARKT ASVSKQFFLIQTALNVDVLKWEEDLKE | 108
WP_048123851.1 DUF2121 domain-containing protein [Methanosarcina vacuolata] | MTLVIAFIGKNGAVMTGDLREITFEGEKQNREKLEKDLYNG TIVTDDELAKKAREFGVGITVTDCKSKISERDGILIGEVSSIE GGVTKKRRLYASAGNHAIAEIRDSEITLTSHTKGSNLIVLGN DFTKQVANKCFKDNWTKKSTFQDAIKILILCMETAARKTAS VSKQFFLIQTTSNVDVLKIVEKDRNN | 109
WP_095644589.1 DUF2121 domain-containing protein [Methanosarcina spelaei] | MTLVIAFIGKNGAVMTGDLREITFEGEKQNRDKLEKELYNG TIVTDDELANKAREFGIGITVTDCKSKISERDGVLIGEVSSIE GGVTKKRRLYASAGNYAIAELRNSEITLTSHAKGSNLIVLGN DFTKQVANKCFKDNWTKKSTFQDAVKILILCIETAARKTASV SKQFFLIQTASNVDVLKIVEKDRNS | 110
AYK16464.1 DUF2121 domain-containing protein [Methanosarcina flavescens] | MTLVIAFIGKNGAVMTGDLREITFEGDKQDREKLEKELYSG AIVTDDELAEKARKFGVGITVTDCKSKISEKNGVLVGEVSSI EGGVIKKRKLYASAGNYAIAELRDSELTLTSHAKGSTLIVLG NEFTKQIANKCFKDNWTKKSTFQDAVKILMLCMETAARKTA SVSQQFYLIQTTSNVDVLKAVEMDKTQKA | 111
WP_048168365.1 DUF2121 domain-containing protein [Methanosarcina thermophila] | MTLVIAFIGKNGAVMTGDLREITFEGDKQDREKLEKELYSG AIVTDDELAEKAREFGVSITVKDCKSKIFERDGVLIGEVSSIE GGWKKRRLYASAGNYAIAEIRDSELTLTSHAKGSTLIVLGN EFTKQIANKCFKDNWTKKSTFQDAVKILMLCMETAARKTAS VSQQFYLIQTTSNVDVLKAMEIDKTQEA | 112
WP_303716285.1 DUF2121 domain-containing protein [Methanoculleus marisnigri] | MIAFIGRQATVMAGDMREIAFEGDDPCIEELERELYSGSITS DTELAERADEIGVTIRVRDDKAKVSQQGGVLIGEVTETEGS VTRRKRLYATAGSYAIAEAIDSRLRVTQRGRASNFWLGNE VTKRIANQCIQGMWEGGTIQDAMRLLMLTMQITASVTASVS RTFILVHTDLAANLVDAINQDSRK | 113
WP_269899706.1 DUF2121 family protein [Methanothermobacter thermautotrophicus] T2GHE8 | MSLIITYISSKGCVIAGDKRRIAYFGDKSSREVLEEELYTGKI KSDEELQRRASELGVNIKVTDDTCKVRSLGDVWGEISQKT PFETRRRRIYATTGAYQIIELTGSKITSMEKGDTAIIVFGNKIA KEITNRFLKKRWKTKTSLKDVADLFRELMDHVSSQTPSVG SEYDLFIKSPSLDKKSAHKLLSDTIVRDVRLLQKWRAKLKQ EMLDRREEMKLASRILTEGEVGRVMRQDGSHVEVKLAGD VEAYDTRWKKVAGPGDMVLMRVADGGTISPGEKIAVRDE NLCVEGRDIKVDCDVIICRREE | 114

[0516] Example 1: Overall structure and comparison with Mj Connectase and sortase.

[0517]

[0295] Understanding the structural features that distinguish ligase and protease activities is crucial for advancing enzymatic protein ligation techniques. To this end, we first focused on structural studies to capture the pro-ligase MmConnectase (MmCNT) in both its apo form and peptide substrate-bound form.

[0518]

[0296] Despite repeated co-crystallization attempts, holoenzyme crystals were obtained only when using the native peptide substrate. The MmCNT apo form (8JTU) adopts a crocodile-shaped conformation, featuring a long groove with less refined sidechain lining. In this conformation, cysteines in MmCNT form stable disulfide bonds (C65-C193), locking the enzyme in a dimer conformation and restricting the crystal lattice. Interestingly, removing the cysteines did not resolve this issue, and empty enzymes were still stuck in the same crystal form. It appears that the native substrate recognition motif is not the ‘optimal’ combination of amino acids for MmCNT recognition. Volume analysis of the apo MmCNT structure revealed significant potential for designing more favourable peptide substrates.

[0519]

[0297] A series of peptides was synthesized and evaluated using a peptide binding and ligation assay, with mass spectrometry used to monitor peptide ligation efficacy (FIG. 1A-1B). Swapping two amino acids at the P4' and P7' positions (F→V and D→K, respectively) of the MmCNT recognition motif (bold) significantly enhanced ligation efficacy (peptide sequence: PGAVDAKPLVVEI SEQ ID NO:100). Subsequently, we obtained a crystal structure of the binary complex, capturing MmCNT(T1A) in its catalytic intermediate state (PDB code: 8WKD). The resulting structure at 2.1 Å resolution revealed the atomic details of how MmCNT forms intricate interactions with its preferred substrates. In contrast to the originally reported MjCNT structure, in which the bound peptide interacted only loosely with the enzyme, here the entire peptide was embedded in the substrate-recognizing groove (FIG. 2A, density view in yellow colour), with the peptide sequence at the P7-P1 positions (RELASKD SEQ ID NO:116) sitting in a C-terminal substrate pocket, the P1'-P3' positions (PGA) inside the catalytic pocket, and the P4'-P13' positions (VDAKPLVVEI SEQ ID NO:102) fitting perfectly in the N-terminal substrate binding groove. The overall shape of MmCNT resembled a crocodile biting a buffalo leg. In addition, we conducted a series of mutagenesis experiments to demonstrate that the observed complex structure is indeed the captured catalytic intermediate state: the residues observed to form intricate interactions with the substrate have a correlated impact on MmCNT's protein ligase activity (FIG. 3A-3C).
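The position labels used throughout the Examples (P7-P1 before the scissile bond, P1'-P13' after it) can be made explicit with a small helper. The sketch below is illustrative only: `label_positions` is a hypothetical name, and the input is the 5KDPGA10 VK peptide of Table 7, assuming seven residues (RELASKD) precede the cleavage site.

```python
def label_positions(substrate, n_unprimed):
    """Map position labels (P7..P1, P1'..) to residues of a substrate.

    n_unprimed is the number of residues before the scissile bond; they are
    numbered towards the N-terminus, and the rest are primed positions.
    """
    sequence = substrate.replace(" ", "")
    labels = {}
    for index, residue in enumerate(sequence):
        if index < n_unprimed:
            labels[f"P{n_unprimed - index}"] = residue
        else:
            labels[f"P{index - n_unprimed + 1}'"] = residue
    return labels


# 5KDPGA10 VK (Table 7): cleavage is taken to occur between Asp (P1) and Pro (P1').
positions = label_positions("RELAS KDPGA VDAKP LVVEI", n_unprimed=7)
```

With this numbering, the two substitutions described above fall at `positions["P4'"]` (Val) and `positions["P7'"]` (Lys).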

[0521] Example 2: The substrate binding grooves.

[0522]

[0298] Notably, the C-terminal substrate pocket of MmCNT exhibits relatively loose interactions with the bound peptide. Sequential alanine scanning of the substrate (replacing canonical recognition sequence amino acids with alanine) at the P3-P6 positions resulted in only minor changes in catalytic efficiency (FIG. 1A-1B, FIG. 2C), as measured by mass spectrometry detection of the product / substrate ratio. This agrees with our observation that the sequences of peptide / protein substrates at the P7-P2 positions are relatively loosely recognized. In contrast, the N-terminal substrate recognition groove features two lines of side chains that form specific interactions with the substrate. Side chains from residues 88-103, as well as N119, I121, and H123, form the upper surface of this groove, while side chains from α-helix 3 (residues 125-140) constitute the bottom surface. The bound peptide substrate within this N-terminal binding groove adopts a β-sheet secondary structure, interacting with these upper and lower surface side chains.

[0523]

[0299] Among these substrate residues, the upward-facing amino acids (P7', P8', P10', and P12') interact with the upper residues. For instance, F99 and I121 from MmCNT interact with the residue at the P10' position, which prefers small-sized hydrophobic residues. I101 and F123 interact with the substrate's P8' residue, which is compatible with either proline or another mid-sized hydrophobic residue. E103 from MmCNT interacts with the P7' residue of the substrate, where lysine is preferable, explaining why an alanine replacement at P7' is highly unfavourable. Similarly, the downward-facing residues of the substrate interact with the bottom surface. MmCNT W140 and K137 interact with the substrate's P11' residue, which does not accommodate bulky features at this position. N133 from MmCNT is in close proximity to the P9' position of the substrate, where smaller residues are preferred. Closer to the catalytic pocket, the substrate's P4' and P5' residues interact with K88 from MmCNT, which accommodates amino acids with mid-sized or negatively charged side chains at these locations. Our alanine scan experiment, using peptide ligation efficiency as the readout, confirmed these observations.

[0524]

[0300] Overall, the sequences of peptide / protein substrates at the P7-P2 positions are loosely recognized, while the N-terminal substrate recognition groove (reading the P4'-P13' residues from the substrate) forms a highly specific interaction, resembling a 'keyhole' that only allows binding of substrates with a stringent combination of amino acids, like a 'key' (FIG. 2C). This observation contrasts with previously known protein ligation enzymes (FIG. 2D). In Sortase A, the N-terminal end of the substrate (P2'-P5') interacts loosely with 2-3 surface residues (highlighted in blue in FIG. 2D, middle panel), allowing any N-terminal sequence with two small amino acids (Gly-Gly at P1' and P2') to be recognized to some extent, with no specificity for the positions from P3' onwards. This permissive recognition allows repeated recognition of the ligation products as substrates, resulting in a highly reversible reaction. A similar issue exists with OaAEP1 and other plant AEP-derived ligases, where the N-terminal substrate recognition pocket (P1'-P5') is absent, leading to less restricted substrate recognition. In comparison, the extensive N-terminal substrate recognition groove in the MmCNT scaffold holds significant potential for specificity engineering.

[0525] Example 3: The catalytic pocket.

[0526]

[0301] Through the analysis of the MmCNT-substrate complex structure, we learned about the structural origin of substrate specificity. To comprehend the catalytic mechanism and distinguish a protein ligase from a protease scaffold, we conducted an in-depth analysis of the catalytic pocket.

[0302] During either a protease or protein ligation reaction, the C-terminal substrate sits in the catalytic pocket of MmCNT. As shown in FIG. 4A, the aspartic acid residue at the P1 position is attacked by threonine 1 from MmCNT (note that the T1A variant in the actual crystal structure is inactive), forming a transition state complex in which the C-terminal substrate is directly linked to Thr1 of MmCNT via a peptidyl bond, while the proline at the P1' position is cleaved off.

[0527]

[0303] For this reaction to be energetically favoured, the peptidyl bond between Asp at P1 and Pro at P1' must be destabilized. Indeed, by observing the conformation of the peptide substrate captured in our complex structure, we noticed that the Pro-Gly-Ala sequence at the P1'-P3' positions is highly unstable (FIG. 5B, orange colour) based on its bond angle constraints. We therefore sought to understand how a less stressed amino acid at P1' would impact MmCNT's processivity. We prepared 20 different peptide ligands, each with a different amino acid at the P1' position, at the very N-terminus. Once such a peptide is consumed in the first round of ligation, a more stabilized P1' residue would prevent the ligation product from being cleaved again. Interestingly, smaller amino acids at the P1' position, such as Gly or Ala, significantly improved the eventual peptide ligation efficiency (FIG. 4C, as measured by the product / substrate ratio using mass spectrometry).

[0528]

[0304] During Coot refinement, mutating the Pro residue at P1' to a Gly residue and rationalizing the model resulted in a relatively stable secondary structure with minimal constraints at the P1-P1' junction of the peptide substrate (shown in FIG. 4B, grey colour). Based on this observation, we predict that a Gly-Gly-Ala sequence at the P1'-P3' positions of the substrate would be unfavourable for MmCNT-catalyzed protease or ligase activity. Interestingly, validating this finding helped us obtain a much more processive protease.

[0529] Example 4: Structure based modification of substrate recognition motifs to suppress reversible protein ligation

[0530]

[0305] In a series of peptide substrate screenings, we fixed the P1'-P3' positions of the C-terminal substrate as Pro-Gly-Ala, making them easier to cleave. We then matched this with 20 different N-terminal substrate peptides, each with X-Gly-Ala at the N-terminal end (FIG. 4C), and evaluated peptide ligation efficiency by monitoring the ratio of ligated product to remaining substrate. Smaller residues such as Gly, Ala, and Val at the N-terminal end, instead of Pro, improved ligation efficiency. To uncover the source of this enhancement, we conducted a time-series ligation experiment, again monitoring the output / input ratio (FIG. 4D). Gly-Gly-Ala was significantly more processive than Pro-Gly-Ala. This is because, in the ligated products, internal peptide sequences can be recognized by MmCNT. If Pro-Gly-Ala remains at the P1'-P3' positions, the ligated product can be easily cleaved and reused as a substrate. In the case of the N-terminal GGA peptide, once ligated with the C-terminal substrate, Gly-Gly-Ala in the catalytic pocket is stable, and the peptidyl bond between Asp (P1) and Gly (P1') is not stressed, preventing further cleavage. After 30 minutes, the product and substrate for the PGA design reach equilibrium, whereas the GGA design allows the ligation reaction to continue, consuming the substrate and accumulating the products. Thus, using a GGA N-terminal substrate or similar residues that better fit the catalytic pocket conformation makes the MmCNT-catalyzed ligation reaction irreversible and highly processive.
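The difference between a plateauing and a continuing reaction can be illustrated with a toy kinetic model. The sketch below is not the measured kinetics: it assumes a simple bimolecular scheme A + B ⇌ P + L (ligated product P plus released peptide L) with equal starting concentrations, hypothetical rate constants, and the enzyme treated implicitly. A re-cleavable product (PGA-like) is modelled with a reverse rate, and a cleavage-resistant product (GGA-like) with a reverse rate of zero.

```python
def simulate_yield(k_forward, k_reverse, conc0_um, minutes, dt=0.01):
    """Euler integration of A + B <=> P + L with [A]0 = [B]0 = conc0_um.

    Returns the fraction of substrate converted to product after `minutes`.
    Rate constants are in µM^-1 min^-1 and are purely illustrative.
    """
    product = 0.0
    for _ in range(int(round(minutes / dt))):
        substrate = conc0_um - product
        # One leaving peptide is released per ligation, so [L] equals [P].
        rate = k_forward * substrate * substrate - k_reverse * product * product
        product += rate * dt
    return product / conc0_um


K = 0.05  # hypothetical rate constant, µM^-1 min^-1
pga_like = simulate_yield(K, K, conc0_um=10.0, minutes=60)    # product can be re-cleaved
gga_like = simulate_yield(K, 0.0, conc0_um=10.0, minutes=60)  # product resists cleavage
```

In this toy model the reversible case levels off at an equilibrium set by the ratio of the two rate constants, while the irreversible case keeps consuming substrate, which mirrors the qualitative behaviour described for the PGA and GGA designs.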

[0531]

[0306] This finding overcomes a major disadvantage of MmCNT and provides a potential strategy to enhance repurposed proteases as efficient protein ligases.

[0532] Example 5: Applications in protein ligation experiments.

[0533]

[0307] Bearing the GGA-enhanced processivity in mind, we designed protein ligation experiments to test whether our findings hold true for large, well-folded protein substrates. First, we confirmed that substrates with a C-terminal motif containing Pro-Gly-Ala at the P1'-P3' positions are readily recognized and compatible with MmCNT-catalyzed ligation using a Gly-Gly-Ala N-terminal motif (FIG. 5A). Each protein substrate (Ub-KDPGA and GGA-Ub-mCherry, respectively) at 10 µM was incubated with 2 µM MmCNT, resulting in more than 40% (estimated from band intensity) of the substrate being ligated after 15 minutes, with the ligation yield increasing to over 60% after 60 minutes. This demonstrated that a combination of the C-terminal KDPGA sequence and an N-terminal GGA motif is highly effective in MmCNT-catalyzed protein-protein ligation experiments.

[0534]

[0308] To verify whether the processivity of MmCNT is indeed improved by swapping to the GGA N-terminal motif, we performed the same ligation experiment using a combination of C-terminal KDPGA and N-terminal PGA motifs. The ligation yield stagnated at approximately 40%, perfectly depicting the equilibrium state of a reversible ligation reaction (FIG. 5B). To further confirm that the enhanced processive protein ligation is due to the cleavage-resistant property of a C-terminal or internal GGA at the P1'-P3' location, we engineered two ubiquitin constructs with C-terminal KDPGA and KDGGA, respectively. We attempted ligation reactions with these two different C-terminal substrates side-by-side and tracked the ligation product using a fluorescently labelled SNAP N-terminal substrate (FIG. 5C). The results confirmed our hypothesis and demonstrated that C-terminal KDGGA is highly resistant to further cleavage and reversible protein ligations. Thus, our structural analysis of the MmCNT-substrate complex identified a method to enhance the processivity of MmCNT-catalyzed protein ligations.

[0536]

[0309] With this knowledge, we tested whether these modified substrate recognition motifs would facilitate efficient target protein modifications in a highly heterogeneous background, including serum conditions. Typically, Sortase A and OaAEP1(C247A) ligation reactions must be conducted in relatively clean environments, as the background peptide and protein environment in fetal bovine serum (FBS) can partially inhibit and contaminate the labelling reaction. We conducted a series of protein-protein ligations in the presence of FBS up to 100%. MmCNT demonstrated little reduction in efficiency, precisely recognizing Ub-KDPGA and GGA-Ub-mCherry, and ligating them smoothly. This demonstrated the unprecedented in vivo compatibility of the structure-inspired enhanced MmCNT ligation protocol.

Example 6: Applications in cellular surface labelling.

[0537]

[0310] MmCNT presents a novel receptor engineering tool with the potential to modify cellular surfaces. We first transfected mammalian cells with plasmids encoding an EGFP-tagged type II transmembrane protein carrying the C-terminal (5)KDPGA(10) recognition sequence. To test the cellular surface ligation potential, small protein cargo substrates containing the improved N-terminal recognition sequence (e.g., Ub-mCherry and FLAG, each bearing the N-terminal recognition motif) were produced and used for ligation. Both fluorescence microscopy imaging and flow cytometry (FACS) analysis were used to validate the activity of MmCNT on the EGFP-TM-(5)KDPGA(10)-transfected cells. The surface of EGFP-TM-(5)KDPGA(10)-transfected cells ligated with the mCherry tag can be visualized by fluorescence microscopy (FIG. 6A and 6B), while the surface of EGFP-TM-(5)KDPGA(10)-transfected cells ligated with the FLAG tag can be immunostained and analysed by FACS to identify cellular surface ligation (FIG. 6C).

[0538]

[0311] Building on these studies, we investigated whether MmCNT can precisely ligate a target cargo protein onto the cellular surface. First, we produced FLAG-tagged anti-CD19 scFv antibodies with the improved N-terminal recognition motif in-house (FIG. 7A). Reactions involving the substrates Ub(5)KDPGA(10) and anti-CD19 scFv bearing an improved N-terminal recognition sequence, in the presence of MmCNT, resulted in the ligated product Ub(5)KDPGA(21)-anti-CD19 scFv, whereas no reaction occurred without MmCNT (FIG. 7B). Interestingly, EGFP-TM-(5)KDPGA(10)-transfected cells can also be ligated with anti-CD19 scFv bearing an improved N-terminal recognition sequence in the presence of MmCNT (FIG. 7C). Hence, there is potential for anti-CD19 scFv to be ligated to the surface membranes of cells transduced with the KDPGA motif.

[0539] Example 7: High precision single molecule experiments enabled by enhanced MmCNT catalysed protein ligation

[0540]

[0312] In AFM-SMFS studies of proteins, the target protein is immobilized in the system and a polymerized protein (polyprotein) is often constructed for precise single-molecule measurement; its unfolding leads to characteristic sawtooth-like force-extension curves that serve as reliable signals (refs. 23-28). To demonstrate the potential of connectase for robust protein immobilization and polymerization, we combined it with OaAEP1(C247A) for stepwise protein ligation using ubiquitin (Ub) and the 27th Ig domain (I27) of titin (ref. 29). The two protein monomers with specific enzymatic recognition peptide sequences on both termini, GL-Ub-(5)KDPGA(10) and GGA(10)-I27-NGL, were constructed for enzymatic ligation.

[0541]

[0313] First, we immobilized Ub on a glass surface using connectase (FIG. 8A). Peptide GGA(10)-N3 was coated on a DBCO-functionalized glass coverslip via click chemistry as previously reported (SI; ref. 30). Then, the ubiquitin monomer GL-Ub-(5)KDPGA(10) was immobilized on the coverslip in a connectase-catalysed reaction (Step 0). For precise AFM-SMFS measurement, Coh-NGL was ligated to Ub via OaAEP1, between the GL and NGL peptide sequences, to serve as a pulling handle (Step 1), and an AFM tip functionalized with GB1-XDoc was used to pick up the Ub via the reversible Coh-XDoc interaction (refs. 31, 32). Upon stretching, the representative force-extension curve showed two unfolding peaks (FIG. 8B, Curve 1 and Plot 1), corresponding to Ub (contour length increment from protein unfolding ΔLc, ~24 nm) and the unfolding fingerprint GB1 (~18 nm) with normal unfolding force (ref. 33), respectively, confirming successful protein immobilization.

[0542]

[0314] Next, we demonstrated the stepwise ligation of proteins on the surface. After Ub immobilization, we skipped the ligation of Coh-NGL (Step 1) used for direct AFM measurement. Instead, the GGA(10)-I27-NGL monomer was ligated first using OaAEP1, leading to the construction of (Ub)1-(I27)1 (FIG. 8C, Step 2). Then, Coh-NGL was added as a cap for AFM measurement as before, which showed an additional unfolding peak (28 nm) from I27, as expected.

[0543]

[0315] Following this strategy, we repeated this stepwise enzymatic ligation procedure (FIG. 8C). As a result, polyproteins with controlled sequences, (Ub)2-(I27)1, (Ub)2-(I27)2, and (Ub)3-(I27)2, were all obtained through this method using OaAEP1 and MmCNT (Steps 3-5). Their AFM-SMFS measurements all showed the expected unfolding events from Ub and I27. Thus, the results demonstrated that our connectase is an excellent enzyme for protein immobilization and polymerization, and that it is bio-orthogonal with other enzymes.
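As a rough aid for interpreting such curves, the expected number of unfolding peaks and the summed contour length increment for a given polyprotein can be tallied from the per-domain ΔLc values quoted above (about 24 nm for Ub, 28 nm for I27 and 18 nm for the GB1 fingerprint). The sketch below assumes every domain unfolds once per curve and that the GB1 fingerprint from the tip is present; `expected_unfolding` is an illustrative name.

```python
# Approximate contour length increments (nm) quoted in the description.
DELTA_LC_NM = {"Ub": 24.0, "I27": 28.0, "GB1": 18.0}


def expected_unfolding(n_ub, n_i27, with_gb1_fingerprint=True):
    """Return (peaks per domain type, total contour length increment in nm)."""
    peaks = {"Ub": n_ub, "I27": n_i27}
    if with_gb1_fingerprint:
        peaks["GB1"] = 1  # single GB1 on the functionalized AFM tip (assumed)
    total_nm = sum(DELTA_LC_NM[domain] * count for domain, count in peaks.items())
    return peaks, total_nm


# (Ub)3-(I27)2, the longest construct described, picked up with the GB1 handle.
peaks, total_nm = expected_unfolding(n_ub=3, n_i27=2)
```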

[0544] Example 8: Intracellular protein-protein ligation in live mammalian cells

[0545]

[0316] Studying proteins within their native cellular environment is fundamental to advancing our understanding of eukaryotic cell biology. In live mammalian cells, tracking or manipulating a protein of interest requires an experimental handle, typically achieved by appending amino acid sequences ranging from short peptides to large fusion proteins. While these tags have transformed cellular protein research, they present notable limitations: they can alter the behaviour of the target protein and offer limited temporal resolution for probing dynamic processes. Chemical labelling strategies, prized for their small size and structural diversity, offer an alternative but remain challenging to implement in live mammalian systems. These constraints underscore the urgent need for innovative technologies that enable precise, minimally invasive, and temporally controlled labelling and manipulation of proteins in living mammalian cells.

[0546]

[0317] We present here an ultra-stable version of a novel enzyme entity that is distinct from all previous protein ligation platforms and is compatible with serum conditions and even with in vivo use within cellular compartments. It is hence capable of both extracellular and intracellular engineering of proteins in their 'native' states.

[0547]

[0318] We introduced intracellular protein-protein ligation in vivo using MmCNT, which acts with precision and specificity on intracellular proteins that contain the C-terminal and N-terminal recognition motifs.

[0548]

[0319] FIG. 9A-9C and 10 demonstrate intracellular protein ligation catalysed by MmConnectase (MmCNT) between two complementary peptide substrates expressed in mammalian cells. In this system, an EGFP fusion construct bearing a C-terminal KDPGA motif and an mCherry fusion construct bearing an N-terminal GGA motif were co-expressed together with the MmCNT enzyme. The schematic of FIG. 9A illustrates the mechanism of action, whereby MmCNT recognizes and cleaves the peptide bond between the aspartate (D) and the following residue (X) in the KDPGA motif of one substrate and subsequently ligates the liberated C-terminal D residue to the N-terminal GGA motif of the second substrate, resulting in the formation of a covalently linked EGFP-KDGGA-Ub-mCherry fusion product within the intracellular environment.
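The cleave-then-ligate mechanism described above can be sketched at the sequence level. The code below is a simplified illustration, not the enzyme's actual recognition logic: it locates the KDPGA motif, cuts after the Asp, and joins the N-terminal substrate. The re-cleavage check (a Pro-Gly-Ala junction after the Asp) is a simplification of the structural argument made in Example 4, and the function name `ligate` and the short test peptides are illustrative.

```python
def ligate(c_substrate, n_substrate, motif="KDPGA"):
    """Return (product, leaving peptide, recleavable) for one ligation event.

    The C-terminal substrate is cut after the Asp of the K-D-P-G-A motif and
    the N-terminal substrate is joined to that Asp.
    """
    c_seq = c_substrate.replace(" ", "")
    n_seq = n_substrate.replace(" ", "")
    site = c_seq.find(motif)
    if site < 0:
        raise ValueError("C-terminal recognition motif not found")
    cut = site + 2                      # position just after the Asp (P1)
    product = c_seq[:cut] + n_seq
    leaving = c_seq[cut:]
    # Simplified rule from the Examples: a P-G-A junction can be cut again,
    # whereas a G-G-A junction is treated as cleavage resistant.
    recleavable = product[cut:cut + 3] == "PGA"
    return product, leaving, recleavable


# 5KDPGA10 (Table 7) ligated to a GGA-led peptide (illustrative cargo sequence).
product, leaving, recleavable = ligate("RELAS KDPGA FDADP LVVEI", "GGA FDADP LVVEI")
```

Swapping the second argument for a PGA-led peptide returns `recleavable = True`, which corresponds to the reversible behaviour reported for the PGA design.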

[0549]

[0320] Fluorescence microscopy (FIG. 9A-9C) confirms this intracellular ligation event. In cells co-transfected with all three plasmids (EGFP-KDPGA, GGA-mCherry, and MmCNT), overlapping green (EGFP) and red (mCherry) fluorescence signals are observed. This colocalization indicates the successful formation of a single polypeptide containing both fluorescent domains. In contrast, in the absence of MmCNT, EGFP and mCherry fluorescence remain distinct and spatially separated (FIG. 9B and 9C), confirming that spontaneous ligation does not occur and that ligation is dependent on the catalytic activity of MmCNT. These findings demonstrate that MmCNT is capable of mediating efficient peptide ligation in living cells, enabling the covalent tagging or modification of intracellular proteins.

[0550]

[0321] Supporting SDS-PAGE analysis (FIG. 10) provides biochemical evidence of this intracellular ligation. Lysates from transfected HEK293T cells were analysed under fluorescence detection for mCherry (Alexa 594 channel) and EGFP (Alexa 488 channel). A distinct dual-fluorescent band at higher molecular weight, visible in both the green and red channels, appears only in samples co-expressing the ligase and both substrates. The same band is absent in control lanes lacking MmCNT, confirming that the observed product arises from enzyme-dependent ligation. The Coomassie-stained gel verifies comparable protein loading across all samples. Collectively, the microscopy and SDS-PAGE results confirm that MmCNT retains catalytic activity within mammalian cells and can promote the formation of covalently linked protein multimers through recognition of its cognate KDPGA-GGA peptide motifs.

[0551]

[0322] Accordingly, we present a method for intracellular protein-protein ligation in live mammalian cells, enabling the construction of protein complexes and the integration of functional labels directly within the cellular environment. This approach allows for the precise labelling and manipulation of both exogenous and endogenous proteins in a minimally disruptive and temporally controlled manner. By expanding the toolkit for studying proteins in situ, this ligation strategy offers a versatile and broadly applicable platform for editing, observing, and regulating a wide array of target proteins across diverse biological applications.

[0552]

[0323] Further, we identified that intracellularly expressed MmCNT, without purification, can ligate protein substrates and alter protein boundaries with certain compatible recognition motifs. As a result, it overcomes a major challenge in cellular protein modification by minimizing unintended alterations and preserving native cellular function. Furthermore, the method is applicable to a diverse set of proteins that occupy distinct locations and perform varied roles in the cell, such as cytoskeleton components, kinases, and even transcription factors, broadening its potential across biomedical and biotechnological applications. As a result of intracellular ligation, the cellular localization, biochemical tags, biochemical properties, stability, activation states, etc. of a target protein could be altered. In addition, the substrate could be delivered directly into the host cell, and ligation activity could be used to indirectly probe the cross-membrane delivery efficiency of proteins and peptides.

[0553]

[0324] This demonstrates that co-expressed MmCNT functions directly inside the host cell without a purification step, and the approach has, at least in part, the following novel features:

[0554] • Crude expression of MmCNT and/or its variants in the host cell without the need for calibrating the reaction conditions or purifying the MmCNT or its variants; and

[0555] • Precise ligation of target proteins/peptides inside the host cell. Depending on the biochemical properties of the target proteins/peptides, such ligation reactions and the ligated products may have the desirable cellular localization, clustering/binding partners, post-translational modifications, etc. in their native cellular environment.

[0556]

[0325] This platform represents a significant advancement in protein ligation technology, offering robust, site-specific conjugation intracellularly. When the host cell expresses MmCNT or its variants, either spontaneously or upon induction by certain signals, corresponding substrate proteins/peptides containing specific amino acid sequences (recognition motifs) will be ligated and irreversibly form a protein/peptide product.

[0557]

[0326] This enables a one-step approach for protein labelling within living cells, including modification of intracellular and organelle or cellular membrane surfaces, with minimal off-target modification. It also provides direct ligation or coupling of two polypeptides possessing compatible terminal motifs in the presence of a complex intracellular background, wherein the reaction proceeds efficiently despite the presence of numerous endogenous proteins and peptides in their native physiological environment. This system thus represents a potentially accurate and efficient intracellular labelling and protein-tracing technology applicable to living host cells. Furthermore, the ligation reaction can be precisely controlled by regulating the expression timing and level of the peptide ligase, allowing temporal control over protein tagging and modification events.

[0558]

[0327] The invention has been described broadly and generically herein. Each of the narrower species and subgeneric groupings falling within the generic disclosure also forms part of the invention. This includes the generic description of the invention with a proviso or negative limitation removing any subject matter from the genus, regardless of whether or not the excised material is specifically recited herein. Other embodiments are within the following claims. In addition, where features or aspects of the invention are described in terms of Markush groups, those skilled in the art will recognize that the invention is also thereby described in terms of any individual member or subgroup of members of the Markush group.

[0328] One skilled in the art would readily appreciate that the present invention is well adapted to carry out the objects and obtain the ends and advantages mentioned, as well as those inherent therein. Further, it will be readily apparent to one skilled in the art that varying substitutions and modifications may be made to the invention disclosed herein without departing from the scope and spirit of the invention. The methods, uses and peptide ligases described herein are presently representative of preferred embodiments, are exemplary, and are not intended as limitations on the scope of the invention. Changes therein and other uses that are encompassed within the spirit of the invention, as defined by the scope of the claims, will occur to those skilled in the art. The listing or discussion of a previously published document in this specification should not necessarily be taken as an acknowledgement that the document is part of the state of the art or is common general knowledge.

[0559]

[0329] The invention illustratively described herein may suitably be practiced in the absence of any element or elements, limitation or limitations, not specifically disclosed herein. Thus, for example, the terms "comprising", "including", "containing", etc. shall be read expansively and without limitation. The word "comprise" or variations such as "comprises" or "comprising" will accordingly be understood to imply the inclusion of a stated integer or groups of integers but not the exclusion of any other integer or group of integers. Additionally, the terms and expressions employed herein have been used as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed. Thus, it should be understood that although the present invention has been specifically disclosed by exemplary embodiments and optional features, modification and variation of the inventions embodied and disclosed herein may be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this invention.

[0560]

[0330] The content of all documents and patent documents cited herein is incorporated by reference in their entirety.

[0561] REFERENCES

[0562] (1) Chang, T. K.; Jackson, D. Y.; Burnier, J. P.; Wells, J. A. Subtiligase: A Tool for Semisynthesis of Proteins. Proceedings of the National Academy of Sciences 1994, 91 (26), 12544-12548. https://doi.org/10.1073/pnas.91.26.12544.

[0563] (2) Evans, T. C.; Xu, M. Q. Intein-Mediated Protein Ligation: Harnessing Nature's Escape Artists. Biopolymers 1999, 51 (5), 333-342.

[0564] (3) Pishesha, N.; Ingram, J. R.; Ploegh, H. L. Sortase A: A Model for Transpeptidation and Its Biological Applications. Annu Rev Cell Dev Biol 2018, 34, 163-188. https://doi.org/10.1146/annurev-cellbio-100617-062527.

[0565] (4) Rehm, F. B. H.; Harmand, T. J.; Yap, K.; Durek, T.; Craik, D. J.; Ploegh, H. L. Site-Specific Sequential Protein Labeling Catalyzed by a Single Recombinant Ligase. J Am Chem Soc 2019, 141 (43), 17388-17393. https://doi.org/10.1021/jacs.9b09166.

[0566] (5) Witte, M. D.; Wu, T.; Guimaraes, C. P.; Theile, C. S.; Blom, A. E. M.; Ingram, J. R.; Li, Z.; Kundrat, L.; Goldberg, S. D.; Ploegh, H. L. Site-Specific Protein Modification Using Immobilized Sortase in Batch and Continuous-Flow Systems. Nat Protoc 2015, 10 (3), 508-516. https://doi.org/10.1038/nprot.2015.026.

[0567] (6) Nguyen, G. K. T.; Wang, S.; Qiu, Y.; Hemu, X.; Lian, Y.; Tam, J. P. Butelase 1 Is an Asx-Specific Ligase Enabling Peptide Macrocyclization and Synthesis. Nat Chem Biol 2014, 10 (9), 732-738. https://doi.org/10.1038/nchembio.1586.

(7) Yang, R.; Wong, Y. H.; Nguyen, G. K. T.; Tam, J. P.; Lescar, J.; Wu, B. Engineering a Catalytically Efficient Recombinant Protein Ligase. J Am Chem Soc 2017, 139 (15), 5351-5358. https://doi.org/10.1021/jacs.6b12637.

(8) Harris, K. S.; Durek, T.; Kaas, Q.; Poth, A. G.; Gilding, E. K.; Conlan, B. F.; Saska, I.; Daly, N. L.; van der Weerden, N. L.; Craik, D. J.; Anderson, M. A. Efficient Backbone Cyclization of Linear Peptides by a Recombinant Asparaginyl Endopeptidase. Nat Commun 2015, 6 (1), 10199. https://doi.org/10.1038/ncomms10199.

[0568] (9) Hu, S.; El Sahili, A.; Kishore, S.; Wong, Y. H.; Hemu, X.; Goh, B. C.; Zhipei, S.; Wang, Z.; Tam, J. P.; Liu, C.-F.; Lescar, J. Structural Basis for Proenzyme Maturation, Substrate Recognition, and Ligation by a Hyperactive Peptide Asparaginyl Ligase. Plant Cell 2022, 34 (12), 4936-4949. https://doi.org/10.1093/plcell/koac281.

(10) Harmand, T. J.; Pishesha, N.; Rehm, F. B. H.; Ma, W.; Pinney, W. B.; Xie, Y. J.; Ploegh, H. L. Asparaginyl Ligase-Catalyzed One-Step Cell Surface Modification of Red Blood Cells. ACS Chem Biol 2021, 16 (7), 1201-1207. https://doi.org/10.1021/acschembio.1c00216.

[0569] (11) Lau, J. L.; Dunn, M. K. Therapeutic Peptides: Historical Perspectives, Current Development Trends, and Future Directions. Bioorg Med Chem 2018, 26 (10), 2700-2707. https://doi.org/10.1016/j.bmc.2017.06.052.

(12) Schmidt, M.; Toplak, A.; Quaedflieg, P. J.; Nuijens, T. Enzyme-Mediated Ligation Technologies for Peptides and Proteins. Curr Opin Chem Biol 2017, 38, 1-7. https://doi.org/10.1016/j.cbpa.2017.01.017.

[0570] (13) Fuchs, A. C. D.; Ammelburg, M.; Martin, J.; Schmitz, R. A.; Hartmann, M. D.; Lupas, A. N. Archaeal Connectase Is a Specific and Efficient Protein Ligase Related to Proteasome β Subunits. Proceedings of the National Academy of Sciences 2021, 118 (11). https://doi.org/10.1073/pnas.2017871118.

Claims

CLAIMS

What is claimed is:

1. A method of (poly)peptide ligation, comprising, providing a first (poly)peptide comprising a C-terminal K-D-X-G-A motif, wherein X is any amino acid, providing a second (poly)peptide comprising an N-terminal P1'-P2'-P3' motif, wherein P1' is A, V, G, S, T, L, or M, P2' is G or an analogue of G, and P3' is A or an analogue of A, wherein the second (poly)peptide may be the same or different to the first (poly)peptide, and contacting the first (poly)peptide and the second (poly)peptide with a peptide ligase having the activity of MmConnectase (MmCNT), under conditions suitable for a cleavage and ligation reaction, wherein the peptide ligase cleaves the bond between the C-terminal D and X residue in the first (poly)peptide and ligates the C-terminal D residue of the first (poly)peptide to the N-terminal P1'-P2'-P3' motif of the second (poly)peptide to form a ligated (poly)peptide.

2. The method of claim 1, wherein P1' is A, V, G, S, T, or L, P2' is G, and P3' is A.

3. The method of claim 2, wherein the N-terminal P1'-P2'-P3' motif is G-G-A.

4. The method of claim 1, wherein the second (poly)peptide comprises an N-terminal amino acid sequence P1'-P2'-P3'-(X)q, wherein q is an integer selected from 1 to 15, and X can be any amino acid.

5. The method of claim 4, wherein the second (poly)peptide comprises an N-terminal amino acid sequence set forth in any one of SEQ ID NOs: 75, 80, 84, 85, 90-92 and 101, or variants thereof.

6. The method of any one of claims 1-5, wherein the C-terminal K-D-X-G-A motif is K-D-P-G-A or K-D-G-G-A.

7. The method of claim 1, wherein the first (poly)peptide comprises a C-terminal amino acid sequence (X)n-K-D-X-G-A-(X)m, wherein n is an integer selected from 0 to 5, m is an integer selected from 0 to 10, and X can be any amino acid, preferably n is 5 and m is 10.

8. The method of claim 7, wherein the first (poly)peptide comprises a C-terminal amino acid sequence set forth in any one of SEQ ID NOs: 69-72 and 95-97, or variants thereof.

9. The method of any one of claims 1-8, wherein the first and second (poly)peptides are termini of the same peptide such that the method cyclizes said peptide.

10. The method of any one of claims 1-9, wherein the first and/or second (poly)peptide further comprises a labelling moiety, preferably wherein the labelling moiety is an affinity tag, therapeutic agent, detectable label, or scaffold molecule.

11. The method of any one of claims 1-10, wherein the first and/or second (poly)peptide is a cellular surface protein such that the method results in the modification or tagging of the cellular surface protein and cellular surface.

12. The method of any one of claims 1-11, wherein the first and/or second (poly)peptides are intracellular proteins of a host cell, and the protein ligase is comprised within said host cell.

13. The method of any one of claims 1-12, wherein the first and/or second (poly)peptide is coupled to a solid support material.

14. The method of claim 13, comprising coupling the second peptide on the solid support material; and ligating the first (poly)peptide to the second (poly)peptide by the peptide ligase.

15. The method of any one of claims 1-14, wherein the analogue of Gly and Ala is selected or validated based on molecular modelling or docking analysis using the atomic coordinates of the peptide ligase deposited under PDB accession number 8WKD or 8JTU.

16. The method of any one of claims 1-15, wherein the ligation of the first (poly)peptide to the second (poly)peptide is irreversible.

17. The method of any one of claims 1-16, wherein the first (poly)peptide is operably fused to the C-terminus of the peptide ligase, wherein the fusion maintains the ligase activity of the peptide ligase and the accessibility of the C-terminal K-D-X-G-A motif of the first (poly)peptide.

18. A method for modifying or tagging the surface of a target cell by one or more (poly)peptides of interest, the method comprising, providing the one or more (poly)peptides of interest comprising a C-terminal K-D-X-G-A motif and/or an N-terminal P1'-P2'-P3' motif, wherein X is any amino acid, P1' is A, V, G, S, T, L, or M, P2' is G or an analogue of G, and P3' is A or an analogue of A; providing a peptide ligase having the activity of MmCNT; contacting the target cell with the one or more (poly)peptides of interest and the peptide ligase; and subjecting the target cell to conditions that allow the peptide ligase to catalyse the ligation of the one or more (poly)peptides of interest to a cellular surface protein of the target cell, wherein the cellular surface protein comprises a MmCNT recognition motif complementary to that comprised in the one or more (poly)peptides of interest.

19. The method of claim 18, wherein the one or more (poly)peptides of interest comprise a labelling moiety, therapeutic agent, detectable label, or scaffold molecule.

20. The method of claim 18, wherein the cellular surface protein comprises a C-terminal K-D-X-G-A motif and/or an N-terminal P1'-P2'-P3' motif.

21. A method for intracellular (poly)peptide ligation, comprising, providing a host cell that comprises, within its intracellular environment, a first (poly)peptide comprising a C-terminal K-D-X-G-A motif, wherein X is any amino acid, a second (poly)peptide comprising an N-terminal P1'-P2'-P3' motif, wherein P1' is A, V, G, S, T, L, or M, P2' is G or an analogue of G, and P3' is A or an analogue of A, and a peptide ligase having the activity of MmCNT, and subjecting the host cell to conditions that allow the peptide ligase to catalyse the ligation of the first and second (poly)peptides within the intracellular environment of the host cell.

22. The method of claim 21, wherein the host cell is a mammalian cell, preferably a human cell.

23. The method of claim 21, comprising introducing into the host cell, one or more nucleic acid molecules encoding the first (poly)peptide, the second (poly)peptide, and the peptide ligase, optionally wherein expression of the first (poly)peptide, the second (poly)peptide, and/or the peptide ligase is under inducible or constitutive control within the host cell.

24. A method for tandem ligation, comprising, providing a first (poly)peptide (A) comprising an N-terminal P1'-P2'-P3' motif and a C-terminal K-D-X-G-A motif, providing a second (poly)peptide (B) comprising an N-terminal P1'-P2'-P3' motif or a C-terminal K-D-X-G-A motif, providing a third (poly)peptide (C) comprising an N-terminal P1'-P2'-P3' motif or a C-terminal K-D-X-G-A motif, wherein X is any amino acid, P1' is A, V, G, S, T, L, or M, P2' is G or an analogue of G, and P3' is A or an analogue of A, wherein the third (poly)peptide comprises a different MmCNT recognition motif to the second (poly)peptide, contacting the first (poly)peptide (A) with the second (poly)peptide (B) and the peptide ligase having the activity of MmCNT under conditions that allow ligation of the second (poly)peptide to the C- or N-terminal of the first (poly)peptide to yield a modified first (poly)peptide; and contacting the modified first (poly)peptide with the third (poly)peptide (C) and the peptide ligase having the activity of MmCNT under conditions that allow ligation of the third (poly)peptide to the N- or C-terminal of the first (poly)peptide to yield a dually modified first (poly)peptide.

25. The method of claim 24, wherein the second (poly)peptide comprises the C-terminal K-D-X-G-A motif and is ligated to the N-terminal of the first (poly)peptide, and the third (poly)peptide comprises the N-terminal P1'-P2'-P3' motif and is ligated to the C-terminal of the first (poly)peptide; or the second (poly)peptide comprises the N-terminal P1'-P2'-P3' motif and is ligated to the C-terminal of the first (poly)peptide, and the third (poly)peptide comprises the C-terminal K-D-X-G-A motif and is ligated to the N-terminal of the first (poly)peptide.

26. A method of preparing a dimer, oligomer, or multimer of one or more (poly)peptides of interest, comprising, providing one or more (poly)peptides of interest having a C-terminal K-D-X-G-A motif, and an N-terminal P1'-P2'-P3' motif, wherein X is any amino acid, P1' is A, V, G, S, T, L, or M, P2' is G or an analogue of G, and P3' is A or an analogue of A, providing a peptide ligase having the activity of MmCNT, contacting the one or more (poly)peptides of interest, and the peptide ligase having the activity of MmCNT under conditions that allow the peptide ligase to catalyse ligation of one (poly)peptide of interest with another (poly)peptide of interest to form a dimer, oligomer, or multimer of the one or more (poly)peptides of interest.

27. The method of claim 26, further comprising immobilizing a scaffold molecule on to a solid support material, wherein the scaffold molecule comprises one or more copies of the N-terminal P1'-P2'-P3' motif, or one or more copies of the C-terminal K-D-X-G-A motif.

28. A method for preparing a dimer, oligomer, or multimer of two different (poly)peptides comprising, providing a first peptide ligase having the activity of MmCNT, providing a second peptide ligase, wherein the second peptide ligase is different to the first peptide ligase, providing at least one first (poly)peptide having an N-terminal P1'-P2'-P3' motif or a C-terminal K-D-X-G-A motif, and a C- or N-terminal recognition motif for the second peptide ligase, providing at least one second (poly)peptide having an N-terminal P1'-P2'-P3' motif or a C-terminal K-D-X-G-A motif, and a C- or N-terminal recognition motif for the second peptide ligase, wherein X is any amino acid, P1' is A, V, G, S, T, L, or M, P2' is G or an analogue of G, and P3' is A or an analogue of A, wherein each of the at least one first (poly)peptide has different C- and N-terminal recognition motifs to each of the at least one second (poly)peptide, and contacting the at least one first (poly)peptide and the at least one second (poly)peptide with the first and second peptide ligase, under conditions suitable for a cleavage and ligation reaction, to form a dimer, oligomer, or multimer of the two different (poly)peptides.

29. The method of claim 28, further comprising immobilising a scaffold molecule on to a solid support material, wherein the scaffold molecule comprises one or more copies of the N-terminal P1'-P2'-P3' motif, or one or more copies of the C-terminal K-D-X-G-A motif, or the scaffold molecule comprises one or more copies of the N-terminal recognition motif for the second peptide ligase, or one or more copies of the C-terminal recognition motif for the second peptide ligase.

30. The method of claim 28, wherein the second peptide ligase is OaAEP1(C247A) or a functional variant thereof, wherein the N-terminal recognition motif for the second peptide ligase is GL, and the C-terminal recognition motif for the second peptide ligase is NGL.

31. The method of any one of claims 1-30, wherein the peptide ligase having the activity of MmCNT comprises or consists of an amino acid sequence set forth in SEQ ID NO: 1, 2 or 3, or functional variants, or fragments thereof.

32. The method of claim 31, wherein the peptide ligase comprises an amino acid residue A at the position corresponding to position 1 of SEQ ID NO: 1, and/or an amino acid residue S at the position corresponding to position 192 of SEQ ID NO: 1.

33. The method of claim 31 or 32, wherein the peptide ligase having the activity of MmCNT comprises or consists of an amino acid sequence set forth in SEQ ID NO: 3.

34. The method of any one of claims 31-33, wherein the peptide ligase having the activity of MmCNT comprises an amino acid mutation at the position corresponding to position 81 and/or 125 of SEQ ID NO: 1, preferably the mutation is S81A and/or N125S.

35. The method of any one of claims 31-34, wherein the peptide ligase having the activity of MmCNT has a three-dimensional structure corresponding to the atomic coordinates deposited under Protein Data Bank (PDB) accession number 8JTU or 8WKD, preferably wherein the peptide ligase having the activity of MmCNT comprises or consists of an amino acid sequence that adopts an overall tertiary structure analogous to the enzymatic scaffold defined by Protein Data Bank accession number 8JTU or 8WKD, chain A, residues 1–192, wherein the root mean square deviation (RMSD) of backbone atoms following structural alignment is within 1.5 Å.

36. A peptide ligase having the activity of MmCNT in accordance with the present application comprises or consists of: (i) the amino acid sequence set forth in SEQ ID NO: 3; (ii) an amino acid sequence that shares at least 65%, preferably at least 75%, even more preferably at least 85%, most preferably at least 95% sequence identity with, or at least 80%, preferably at least 90%, more preferably at least 95% sequence homology with the amino acid sequence as set forth in SEQ ID NO: 3; or (iii) a functional fragment of (i) or (ii), wherein the peptide ligase comprises an amino acid residue S at the position corresponding to position 192 of SEQ ID NO: 3.

37. The peptide ligase of claim 36, wherein the peptide ligase comprises or consists of an amino acid sequence set forth in SEQ ID NO: 3.

38. The peptide ligase of claim 36 or 37, wherein the peptide ligase comprises an amino acid mutation at position 81 and/or 125 of SEQ ID NO: 3, preferably the mutation is S81A and/or N125S.

39. A fusion protein comprising the peptide ligase of any one of claims 36-38 operably fused to a ligation substrate.

40. A nucleic acid molecule encoding the peptide ligase according to any one of claims 36-38, optionally wherein said nucleic acid molecule is comprised in a vector.

41. A host cell comprising the nucleic acid molecule of claim 40.

42. Use of a peptide ligase according to any one of claims 36-38 for the ligation of two (poly)peptides, optionally wherein the two (poly)peptides are intracellular proteins of a host cell.

43. A crystalline form of a peptide ligase having the activity of MmCNT, wherein the crystalline form is characterized by atomic coordinates corresponding to those deposited under Protein Data Bank (PDB) accession number 8JTU or 8WKD, and comprises or consists of an amino acid sequence having at least 80% amino acid sequence identity to the amino acid sequence set forth in SEQ ID NO: 2 or 3.

44. The crystalline form of claim 43, wherein the crystalline form adopts an overall tertiary structure analogous to the enzymatic scaffold defined by PDB: 8JTU, chain A, residues 1–192, wherein the RMSD of backbone atoms following structural alignment is within 1.5 Å.

45. The crystalline form of claim 44, wherein the crystalline form is characterized with space group P3₁ 2 1, and has unit cell parameters of a=91 Å, b=91 Å, c=90 Å, α=β=90°, γ=120°.

46. The crystalline form of claim 43, wherein the crystalline form adopts an overall tertiary structure analogous to the enzymatic scaffold defined by PDB: 8WKD, chain A, residues 1–192, wherein the RMSD of backbone atoms following structural alignment is within 1.5 Å when bound to a substrate.

47. The crystalline form of claim 46, wherein the crystalline form is characterized with space group P2₁ 2₁ 2, and has unit cell parameters of a=54 Å, b=101 Å, c=33 Å, α=β=γ=90°.

48. The crystalline form according to any one of claims 43-47, for use in a computer-assisted method of structure-based design, docking, simulation, screening, and/or engineering of peptide ligases having the activity of MmCNT and their ligation substrates.