Enzymatic ligation of peptides and / or proteins
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- MAX PLANCK GESELLSCHAFT ZUR FOERDERUNG DER WISSENSCHAFTEN EV
- Filing Date
- 2025-07-11
- Publication Date
- 2026-06-18
AI Technical Summary
Existing methods for polypeptide ligation, particularly using Connectases, result in approximately 50% yield due to reversible reactions, limiting the efficiency and specificity of polypeptide fusion.
A method involving the modification of a partial N-terminal DUF2121 recognition motif in a first cleavage product to prevent recognition by a polypeptide with an N-terminal DUF2121 domain, allowing for the specific fusion of two polypeptides.
Enhances the yield and specificity of polypeptide ligation by ensuring a higher percentage of fusion product formation, overcoming the limitations of reversible reactions in existing methods.
Abstract
Description
[0001] Enzymatic ligation of peptides and / or proteins
[0002] The present invention relates to the provision of means and methods for the enzymatic ligation of peptides and / or proteins. In particular, herein provided are means and methods for the efficient coupling of peptides and / or proteins using transpeptidase enzymes belonging to the Connectase family, which are characterized by an N-terminal DUF2121 domain with an N-terminal serine or threonine residue. Also provided are, inter alia, compositions, polypeptides, and kits for carrying out the herein provided methods.
[0003] This application claims benefit of priority to EP24188474.1, filed July 12, 2024, the entire contents of which are hereby incorporated by reference.
[0004] Protein conjugations are used, inter alia, to fuse fluorophores to proteins and to immobilize proteins on beads, microplates, EM grids, or SPR chips. They also enable the generation of antibody conjugates or antibody-like constructs and of segmentally labeled proteins for NMR, or, for example, the fusion of proteins to the cell surface. A number of methods are in use for such applications, each with its own advantages and disadvantages. For example, N-hydroxysuccinimidyl (NHS) labeling of lysine residues and maleimide labeling of cysteine residues allow easy but unspecific labeling of, e.g., proteins1. Click chemistry is considered specific but requires the introduction of often undesired non-biological chemical groups on the surface of the proteins to be fused2,3. Split domain methods, such as ‘SpyTag-SpyCatcher’, are considered more specific but tend to introduce long sequences between the fusion partners, which may be undesired4,5. Split intein methods introduce only a short "ligation scar" but may suffer from solubility problems and varying efficiency for different substrate pairs, limiting their applicability6,7. Enzymatic methods are simple and, in some cases, specific, but the resulting fusions are reversible and therefore incomplete (i.e., as further explained herein below, only about 50% of the substrates are coupled using such methods)8,9.
[0005] The industrially employed enzyme Staphylococcus aureus Sortase A catalyzes the reversible fusion of substrates A and B in the form of A-LPXTG (X = any amino acid; SEQ ID NO: 610) and G-B to yield A-LPXTG-B (usually with additional linker sequences)9,10. Because the recognition motifs differ in length, Sortase A is specific for substrate A but comparably unspecific towards substrate B. Accordingly, the glycine (i.e., “G” in “G-B”) may be replaced by lysine side chains or by other amines without preventing Sortase A activity11. Due to its low affinity for its substrates, Sortase A requires high substrate concentrations for merely moderate activity12. It also catalyzes the irreversible hydrolysis of both educts and products12. These undesired properties constrain the applicability of Sortase A, meaning that it is primarily used for protein-peptide fusions and requires high substrate concentrations with an excess of peptide educts. A number of other enzymes share certain characteristics with Sortase A. These enzymes belong to the cysteine proteases (i.e., Sortase A, butelase, asparaginyl endopeptidase) or serine proteases (i.e., trypsiligase, subtiligase) and bind their substrate through a (thio-)ester bond, which can react with H₂O (hydrolysis, irreversible) or with the H₂N group of substrates (aminolysis, reversible)8,9. Recently, a group of ligases with characteristics entirely different from those of, for example, Sortase A was employed for the fusion of peptides and / or proteins (see WO 2021 / 099484 and Fuchs et al., 2021)13. These enzymes, termed “Connectases”, bind their substrates through a hydrolysis-resistant amide bond. Thus, Connectases exclusively catalyze conjugations with the H₂N groups or imino groups of one of their substrates. In addition, Connectases recognize a longer substrate recognition sequence, leading to higher substrate specificity and higher catalytic efficiency compared with Sortase A.
This opens up an entirely different set of technical applications relevant to, for example, industry and academia. For instance, Connectase makes it feasible to specifically label proteins with, e.g., fluorophores directly in cell extracts. This allows, inter alia, the in-gel detection of such labeled proteins with significantly higher sensitivity and signal-to-noise ratio than Western blots14 (see also Figures 7 and 16).
[0006] However, as discussed in further detail herein below, Connectases catalyze a reversible reaction13.
[0007] Accordingly, when equimolar quantities of two educts are used, Connectases establish an equilibrium of approximately 50% fusion product and 50% educts, resulting in only approx. 50% of the maximum product yield.
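The approx. 50% figure follows from mass action under one simplifying assumption that is not stated in the source. The assumption is that the overall exchange A-X + Y ⇌ A-Y + X has an equilibrium constant K of about 1, because the bond that is broken and the bond that is formed are of the same kind. In this notation, A-X stands for the first substrate polypeptide, Y for the second substrate polypeptide, A-Y for the fusion product, and X for the released first cleavage product. The sketch below neglects enzyme-bound species, which is reasonable only for catalytic amounts of enzyme. The function name is illustrative.

```python
import math


def equilibrium_yield(k_eq: float) -> float:
    """Fraction of equimolar educts converted into fusion product at equilibrium.

    Hypothetical model: A-X + Y <=> A-Y + X with
    K = [A-Y][X] / ([A-X][Y]) = x**2 / (1 - x)**2, where x is the converted
    fraction. Solving for x gives x = sqrt(K) / (1 + sqrt(K)).
    Enzyme-bound species are neglected (catalytic amounts of enzyme).
    """
    root = math.sqrt(k_eq)
    return root / (1.0 + root)


print(equilibrium_yield(1.0))  # 0.5
print(equilibrium_yield(4.0))
```

With K = 1 the function returns 0.5. Even a fourfold preference for the product (K = 4) would only raise the yield to about 67%. This illustrates why the method below takes the first cleavage product out of the equilibrium instead of relying on K.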
[0008] Thus, there is a need in the art to increase the potential maximum product yield for polypeptide ligation reactions, in particular, for polypeptide ligation reactions employing the Connectase transpeptidase.
[0009] Accordingly, the technical problem underlying the present invention is the provision of means and methods for the efficient (e.g., resource-efficient) and site-specific ligation of two polypeptides.
[0010] The technical problem is solved by provision of the embodiments provided herein below and as characterized in the appended claims.
[0011] Accordingly, the present invention provides for means and methods for the enzymatic ligation of two polypeptides. In particular, the present invention provides for a method for the ligation of two polypeptides and / or a method for the production of a fusion polypeptide, wherein said method comprises the following steps (i) to (iii):
[0012] (i) contacting a first substrate polypeptide with a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, thereby cleaving said first substrate polypeptide into a first cleavage product and a second cleavage product;
[0013] (ii) modifying a partial N-terminal DUF2121 recognition motif comprised in said first cleavage product so that said modified partial N-terminal DUF2121 recognition motif is not recognized by said polypeptide comprising an N-terminal DUF2121 domain; and
[0014] (iii) contacting said second cleavage product and / or said polypeptide comprising an N-terminal DUF2121 domain with a second substrate polypeptide, whereby said second cleavage product is fused to said second substrate polypeptide.
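The gain from step (ii) can be illustrated with a toy mass-action model. This is a sketch under assumptions that are not taken from the source:

- The Connectase (E) forms a covalent intermediate (EA) with the second cleavage product, in line with the amide-bound substrate described in [0005].
- All four ligation and cleavage reactions share one hypothetical rate constant k.
- Step (ii) is an irreversible first-order reaction with a hypothetical constant k_mod. It acts on the first cleavage product only, never on the second substrate polypeptide.
- Concentrations are in arbitrary units.

Steps (i) to (iii) run simultaneously in one reaction, which is one of the embodiments contemplated in the items. The function and parameter names are illustrative.

```python
def simulate(k_mod: float, k: float = 100.0, e0: float = 0.01, c0: float = 1.0,
             dt: float = 0.002, t_end: float = 100.0) -> dict[str, float]:
    """Forward-Euler integration of a hypothetical mass-action model of steps (i) to (iii).

    Species (arbitrary concentration units):
      s1  first substrate polypeptide        s2  second substrate polypeptide
      e   free Connectase                    ea  Connectase bound to the second cleavage product
      p1  first cleavage product             p1m first cleavage product after step (ii)
      f   fusion product
    """
    s1, s2, e, ea, p1, p1m, f = c0, c0, e0, 0.0, 0.0, 0.0, 0.0
    for _ in range(round(t_end / dt)):
        r_cleave = k * e * s1      # E + S1 -> EA + P1  (step i)
        r_rejoin = k * ea * p1     # EA + P1 -> E + S1  (reverse of step i)
        r_fuse = k * ea * s2       # EA + S2 -> E + F   (step iii)
        r_recleave = k * e * f     # E + F -> EA + S2   (reverse of step iii)
        r_mod = k_mod * p1         # P1 -> P1m, irreversible (step ii)
        s1 += dt * (r_rejoin - r_cleave)
        p1 += dt * (r_cleave - r_rejoin - r_mod)
        p1m += dt * r_mod
        s2 += dt * (r_recleave - r_fuse)
        f += dt * (r_fuse - r_recleave)
        ea += dt * (r_cleave - r_rejoin - r_fuse + r_recleave)
        e += dt * (r_rejoin - r_cleave + r_fuse - r_recleave)
    return {"fusion": f / c0, "first_substrate": s1 / c0, "modified": p1m / c0}


without_step_ii = simulate(k_mod=0.0)
with_step_ii = simulate(k_mod=1.0)
print(f"fusion yield without step (ii): {without_step_ii['fusion']:.2f}")
print(f"fusion yield with step (ii):    {with_step_ii['fusion']:.2f}")
```

With these illustrative parameters, the run without step (ii) settles near 50% fusion product, as stated in [0007]. The run with step (ii) climbs above 90%, because every modified first cleavage product is permanently withdrawn from the reverse reaction. The numbers depend entirely on the made-up constants and are not predictions for any real Connectase.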
[0015] See, for example, Figure 17 for a non-limiting, illustrative representation of the herein provided means and methods.
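To make the bookkeeping of fragments concrete, here is a string-level toy of steps (i) to (iii). It rests on three deliberate simplifications:

- The recognition motif is reduced to the partial motifs named in the items: KD or RD at the C-terminus of the second cleavage product, and XGA or XAA at the N-terminus of the first cleavage product and of the second substrate polypeptide.
- Cleavage occurs directly after that aspartate.
- Removing an N-terminal proline (the proline aminopeptidase variant of step (ii)) is enough to abolish recognition.

The actual motifs are defined by sequence listings not reproduced here. The substrate sequences and all function names are arbitrary placeholders.

```python
import re

# Toy stand-in for a DUF2121 recognition motif, built only from the partial
# motifs named in the items: "KD"/"RD" directly followed by "XGA"/"XAA".
# The real motifs are defined in the sequence listing and are longer.
CLEAVAGE_SITE = re.compile(r"[KR]D(?=[A-Z][GA]A)")
PARTIAL_N_MOTIF = re.compile(r"[A-Z][GA]A")


def cleave(substrate: str) -> tuple[str, str]:
    """Step (i): split directly after the aspartate of the toy motif.

    Returns (second_cleavage_product, first_cleavage_product), i.e. the
    N-terminal and the C-terminal fragment of the substrate.
    """
    match = CLEAVAGE_SITE.search(substrate)
    if match is None:
        raise ValueError("no toy recognition motif found")
    return substrate[: match.end()], substrate[match.end():]


def starts_with_partial_n_motif(fragment: str) -> bool:
    """Toy recognition test: does the fragment begin with XGA or XAA?"""
    return PARTIAL_N_MOTIF.match(fragment) is not None


def trim_n_terminal_proline(fragment: str) -> str:
    """Step (ii), peptidase variant: remove a single N-terminal proline."""
    return fragment[1:] if fragment.startswith("P") else fragment


first_substrate = "HHHHHHGSKDPGAGSWSHPQFEK"   # placeholder sequence
second_substrate = "AGAGSSGYPYDVPDYA"         # placeholder sequence

second_product, first_product = cleave(first_substrate)  # step (i)
modified = trim_n_terminal_proline(first_product)        # step (ii)
fusion = second_product + second_substrate               # step (iii)

print(second_product, first_product, modified, fusion)
print(starts_with_partial_n_motif(first_product),    # True: could re-ligate
      starts_with_partial_n_motif(modified),         # False: no longer recognized
      starts_with_partial_n_motif(second_substrate))  # True
```

The fusion product again carries a toy motif (KD followed by AGA), so `cleave(fusion)` returns the same two pieces. This mirrors the reversibility described in [0006]. Modifying the competing first cleavage product, rather than the product, is what shifts the balance.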
[0016] The present invention provides for the following items:
[0017] 1. A method for the production of a fusion polypeptide, wherein the method comprises the following steps (i) to (iii):
[0018] (i) contacting a first substrate polypeptide with a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, thereby cleaving said first substrate polypeptide into a first cleavage product and a second cleavage product;
[0019] (ii) modifying a partial N-terminal DUF2121 recognition motif comprised in said first cleavage product so that said modified partial N-terminal DUF2121 recognition motif is not recognized by said polypeptide comprising an N-terminal DUF2121 domain; and
[0020] (iii) contacting said second cleavage product and / or said polypeptide comprising an N-terminal DUF2121 domain with a second substrate polypeptide, whereby said second cleavage product is fused to said second substrate polypeptide.
[0021] 2. A method for the production of a circular polypeptide, wherein said method comprises the following steps (i) to (iii):
[0022] (i) contacting a first substrate polypeptide with a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, thereby cleaving said first substrate polypeptide into a first cleavage product and a second cleavage product;
(ii) modifying a partial N-terminal DUF2121 recognition motif comprised in said first cleavage product so that said partial N-terminal DUF2121 recognition motif is not recognized by said polypeptide comprising an N-terminal DUF2121 domain; and
[0023] (iii) contacting said second cleavage product and / or said polypeptide comprising an N-terminal DUF2121 domain with a second substrate polypeptide, whereby said second cleavage product is fused to said second substrate polypeptide, wherein said first substrate polypeptide and said second substrate polypeptide are covalently linked, preferably wherein the N-terminus of said first substrate polypeptide is linked to the C-terminus of said second substrate polypeptide.
3. A method for the immobilization of a polypeptide, wherein the method comprises the steps of (0) and (i) to (iii):
[0024] (0) immobilizing the N-terminus of a first substrate polypeptide on a solid carrier or obtaining a first substrate polypeptide immobilized on a solid carrier via its N-terminus, or immobilizing the C-terminus of a second substrate polypeptide on a solid carrier or obtaining a second substrate polypeptide immobilized on a solid carrier via its C-terminus;
[0025] (i) contacting said first substrate polypeptide with a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, thereby cleaving said first substrate polypeptide into a first cleavage product and a second cleavage product;
[0026] (ii) modifying a partial N-terminal DUF2121 recognition motif comprised in said first cleavage product so that said partial N-terminal DUF2121 recognition motif is not recognized by said polypeptide comprising an N-terminal DUF2121 domain; and
[0027] (iii) contacting said second cleavage product and / or said polypeptide comprising an N-terminal DUF2121 domain with a second substrate polypeptide, whereby said second cleavage product is fused to said second substrate polypeptide.
4. The method according to any one of items 1 to 3, wherein said first substrate polypeptide comprises a DUF2121 recognition motif.
5. The method according to item 4, wherein during step (i) said polypeptide comprising an N-terminal DUF2121 domain cleaves said DUF2121 recognition motif comprised in said first substrate polypeptide into said partial N-terminal DUF2121 recognition motif comprised in said first cleavage product and a partial C-terminal DUF2121 recognition motif comprised in said second cleavage product, thereby producing said first cleavage product and said second cleavage product.
6. The method according to any one of items 1 to 5, wherein said first cleavage product is the C-terminal cleavage product of said first substrate polypeptide.
7. The method according to any one of items 1 to 6, wherein said second cleavage product is the N-terminal cleavage product of said first substrate polypeptide.
8. The method according to any one of items 1 to 7, wherein said second cleavage product comprises a partial C-terminal DUF2121 recognition motif.
9. The method according to any one of items 1 to 8, wherein said second substrate polypeptide comprises a partial N-terminal DUF2121 recognition motif.
10. The method according to item 9, wherein said partial N-terminal DUF2121 recognition motif of said first cleavage product and said partial N-terminal DUF2121 recognition motif of said second substrate polypeptide are not identical.
11. The method according to any one of items 1 to 10, wherein the N-terminal amino acid of said first cleavage product and the N-terminal amino acid of said second substrate polypeptide are selected from proline, alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, tryptophan, or tyrosine, and wherein said N-terminal amino acid of said first cleavage product and said N-terminal amino acid of said second substrate polypeptide are not identical.
12. The method according to any one of items 1 to 11, wherein the N-terminal amino acid of said first cleavage product and the N-terminal amino acid of said second substrate polypeptide are selected from proline, alanine, cysteine, serine, or valine, and wherein said N-terminal amino acid of said first cleavage product and said N-terminal amino acid of said second substrate polypeptide are not identical.
13. The method according to any one of items 1 to 12, wherein the N-terminal amino acid of said first cleavage product and the N-terminal amino acid of said second substrate polypeptide are selected from proline or alanine, and wherein said N-terminal amino acid of said first cleavage product and said N-terminal amino acid of said second substrate polypeptide are not identical.
14. The method according to any one of items 1 to 13, wherein the N-terminal amino acid of said second substrate polypeptide is not modified, preferably wherein the N-terminal amino acid of said second substrate polypeptide is not acetylated.
15. The method according to any one of items 1 to 14, wherein in step (ii) said partial N-terminal DUF2121 recognition motif is modified by one or more enzyme(s) and / or one or more chemical(s), preferably an enzyme or a chemical.
16. The method according to item 15, wherein said modifying enzyme to be employed in step (ii) is selected from the group consisting of a peptidase and an N-acetyltransferase.
17. The method according to item 15 or 16, wherein said modifying enzyme to be employed in step (ii) has substrate specificity for the N-terminal amino acid residue of said first cleavage product.
18. The method according to any one of items 15 to 17, wherein the N-terminal amino acid residue of said second substrate polypeptide is not a substrate for said modifying enzyme to be employed in step (ii).
19. The method according to any one of items 4 to 18, wherein the DUF2121 recognition motif comprised in said first substrate polypeptide comprises an amino acid sequence selected from the group consisting of SEQ ID NO: 361 to 364 and 579 to 582, preferably SEQ ID NO: 361 to 364, more preferably SEQ ID NO: 363 and 364, most preferably SEQ ID NO: 364.
20. The method according to any one of items 9 to 19, wherein the partial N-terminal DUF2121 recognition motif of said second substrate polypeptide comprises the amino acid sequence XGA or XAA, preferably XGA, wherein X is the N-terminal amino acid of said second substrate polypeptide and is as defined in any one of items 11 to 14.
21. The method according to any one of items 1 to 20, wherein the partial N-terminal DUF2121 recognition motif of said first cleavage product comprises the amino acid sequence XGA or XAA, preferably XGA, wherein X is the N-terminal amino acid of said first cleavage product and is as defined in any one of items 11 to 14.
22. The method according to any one of items 8 to 20, wherein said partial C-terminal DUF2121 recognition motif of said second cleavage product comprises the amino acid sequence KD or RD, preferably KD.
23. The method according to any one of items 9 to 22, wherein the DUF2121 recognition motif comprised in said first substrate polypeptide, the partial N-terminal DUF2121 recognition motif comprised in said first cleavage product, and / or the partial N-terminal DUF2121 recognition motif comprised in said second substrate polypeptide comprise(s) at least 1, preferably at least 2, even more preferably at least 3, even more preferably at least 4, even more preferably at least 5, even more preferably at least 6, even more preferably at least 7, even more preferably at least 8, even more preferably at least 9, even more preferably at least 10, even more preferably at least 15, and even more preferably at least 20 amino acids C-terminally of said DUF2121 recognition motif or of said partial N-terminal DUF2121 recognition motif.
24. The method according to any one of items 9 to 23, wherein the DUF2121 recognition motif comprised in said first substrate polypeptide, the partial N-terminal DUF2121 recognition motif comprised in said first cleavage product, and / or the partial N-terminal DUF2121 recognition motif comprised in said second substrate polypeptide comprise(s) at least 10, at least 15 or at least 20 amino acids C-terminally of said DUF2121 recognition motif or of said partial N-terminal DUF2121 recognition motif.
25. The method according to any one of items 9 to 23, wherein the DUF2121 recognition motif comprised in said first substrate polypeptide, the partial N-terminal DUF2121 recognition motif comprised in said first cleavage product, and / or the partial N-terminal DUF2121 recognition motif comprised in said second substrate polypeptide comprise(s) a sequence identical to or at least 60% identical to a sequence as defined by position(s) 21 to 30, 21 to 29, 21 to 28, 21 to 27, 21 to 26, 21 to 25, 21 to 24, 21 to 23, 21 to 22 or 21 of any one of SEQ ID NO: 365 to 578 C-terminally of said DUF2121 recognition motif or of said partial N-terminal DUF2121 recognition motif; and / or wherein the DUF2121 recognition motif comprised in said first substrate polypeptide, the partial N-terminal DUF2121 recognition motif comprised in said first cleavage product, and / or the partial N-terminal DUF2121 recognition motif comprised in said second substrate polypeptide comprise(s) a sequence identical to or at least 60% identical to a sequence as defined by position(s) 21 to 40, 21 to 39, 21 to 38, 21 to 37, 21 to 36, 21 to 35, 21 to 34, 21 to 33, 21 to 32, or 21 to 31 of any one of SEQ ID NO: 511 to 578 C-terminally of said DUF2121 recognition motif or of said partial N-terminal DUF2121 recognition motif.
26. The method according to any one of items 8 to 25, wherein the DUF2121 recognition motif of the first substrate polypeptide and / or the partial C-terminal DUF2121 recognition motif of said second cleavage product comprise(s) at least 4 amino acids N-terminally of said DUF2121 recognition motif or of said partial C-terminal DUF2121 recognition motif.
27. The method according to any one of items 8 to 25, wherein the DUF2121 recognition motif of the first substrate polypeptide and / or the partial C-terminal DUF2121 recognition motif of said second cleavage product comprise(s) a sequence identical to or at least 60% identical to a sequence as defined by position(s) 1 to 15, 2 to 15, 3 to 15, 4 to 15, 5 to 15, 6 to 15, 7 to 15, 8 to 15, 9 to 15, 10 to 15, 11 to 15, 12 to 15, 13 to 15, 14 to 15 or 15 of any one of SEQ ID NO: 365 to 578 N-terminally of said DUF2121 recognition motif or of said partial C-terminal DUF2121 recognition motif.
28. The method according to any one of items 4 to 27, wherein the DUF2121 recognition motif of the first substrate polypeptide comprises the sequence as defined in any one of SEQ ID NO: 365 to 578 or a sequence having at least 60% sequence identity to said sequence.
29. The method according to any one of items 4 to 28, wherein the first substrate polypeptide comprises an N-terminal part defined from the N-terminus of the first substrate polypeptide to the aspartate residue in position 2 of SEQ ID NO: 361 to 364.
30. The method according to any one of items 1 to 29, wherein said polypeptide comprising an N-terminal DUF2121 domain has transpeptidase activity, preferably sequence-specific transpeptidase activity and most preferably DUF2121 transpeptidase activity.
31. The method according to any one of items 1 to 30, wherein said polypeptide comprising an N-terminal DUF2121 domain does not have protease activity.
32. The method according to any one of items 1 to 31, wherein said N-terminal DUF2121 domain comprises an amino acid sequence as depicted in SEQ ID NO: 56 or an amino acid sequence having at least 20% sequence identity to SEQ ID NO: 56.
33. The method according to any one of items 1 to 32, wherein said N-terminal DUF2121 domain comprises an amino acid sequence selected from the group consisting of the following (a) to (c):
[0028] (a) SEQ ID NO: 57 to 196;
[0029] (b) an amino acid sequence having at least 60% sequence identity to the amino acid sequences of (a); and
[0030] (c) an amino acid sequence as defined in (a) or (b) wherein 1 to 10 amino acid residues are deleted, inserted or added; and wherein said N-terminal DUF2121 domain has transpeptidase activity.
34. The method according to any one of items 1 to 33, wherein the polypeptide comprising an N-terminal DUF2121 domain has an amino acid sequence selected from the group consisting of the following (a) to (c):
[0031] (a) SEQ ID NO: 139 to 278;
(b) an amino acid sequence having at least 60% sequence identity to the amino acid sequences of (a); and
[0032] (c) an amino acid sequence as defined in (a) or (b) wherein 1 to 10 amino acid residues are deleted, inserted or added; and wherein the polypeptide and / or said N-terminal DUF2121 domain has transpeptidase activity.
35. The method according to any one of items 30 to 34, wherein the transpeptidase activity comprises the capability of catalyzing the formation of a peptide bond between said first cleavage product and said second substrate polypeptide, thereby fusing / ligating said first cleavage product N-terminally to said second substrate polypeptide.
36. The method according to any one of items 30 to 35, wherein the polypeptide comprising an N-terminal DUF2121 domain further comprises a C-terminal OB-like domain, preferably a C-terminal OB-like domain having an amino acid sequence selected from the group consisting of SEQ ID NO: 279 to 360 or an amino acid sequence having at least 60% sequence identity to any one of SEQ ID NO: 279 to 360.
37. The method according to any one of items 1 to 36, wherein linking and / or fusing the second cleavage product and the second substrate polypeptide produces a fusion polypeptide.
38. The method according to any one of items 1 to 37, wherein in step (iii) a fusion polypeptide is obtained.
39. The method according to item 37 or 38, wherein said fusion polypeptide comprises:
[0033] (a) a part of the first substrate polypeptide and a part of the second substrate polypeptide; or
[0034] (b) a part of the first substrate polypeptide and the entire second substrate polypeptide.
40. The method according to any one of items 37 to 39, wherein said fusion polypeptide comprises the second substrate polypeptide C-terminally fused to said second cleavage product, preferably via a covalent bond, more preferably via a peptide bond.
41. The method according to any one of items 1 to 40, wherein said first substrate polypeptide comprises a non-proteinaceous moiety N-terminally to said DUF2121 recognition motif, and / or wherein said second substrate polypeptide comprises a non-proteinaceous moiety C-terminally to said partial N-terminal DUF2121 recognition motif, so that the produced fusion polypeptide comprises said non-proteinaceous moiety.
42. The method according to item 41, wherein said non-proteinaceous moiety is selected from the group consisting of a fluorophore, a drug, a toxin, a carbohydrate, a lipid, a solid carrier, and an oligonucleotide.
43. The method according to any one of items 1 to 42, wherein said first substrate polypeptide comprises an antibody, or an antigen-binding fragment thereof, N-terminally to said DUF2121 recognition motif, and / or wherein said second substrate polypeptide comprises an antibody, or an antigen-binding fragment thereof, C-terminally to said partial N-terminal DUF2121 recognition motif, so that the produced fusion polypeptide comprises said antibody, or said antigen-binding fragment thereof.
44. The method according to any one of items 1 to 43, wherein said first substrate polypeptide comprises an enzyme N-terminally to said DUF2121 recognition motif, and / or wherein said second substrate polypeptide comprises an enzyme C-terminally to said partial N-terminal DUF2121 recognition motif, so that the produced fusion polypeptide comprises said enzyme.
45. The method according to any one of items 1 to 40, wherein the part of the first substrate polypeptide or the part of the second substrate polypeptide forming part of the produced fusion polypeptide comprises a protein and wherein the part of the other substrate polypeptide forming part of the produced fusion polypeptide has a solid carrier attached thereto, wherein the produced fusion polypeptide comprises the protein immobilized on the solid carrier, preferably wherein the protein is an enzyme.
46. The method according to any one of items 1 to 45, wherein the first substrate polypeptide and / or the second substrate polypeptide is / are isotopically labeled, preferably wherein either the first or the second substrate polypeptide is isotopically labeled.
47. The method according to any one of items 1 to 46, wherein the part of the first substrate polypeptide or the part of the second substrate polypeptide forming part of the produced fusion polypeptide is part of a virus-like particle and wherein the part of the other substrate polypeptide forming part of the produced fusion polypeptide comprises an immunogenic structure.
48. The method according to any one of items 1 to 47, wherein the N-terminus of said first substrate polypeptide and / or the C-terminus of said second substrate polypeptide are linked to a membrane, preferably a vesicle membrane.
49. The method according to any one of items 1 to 48, wherein the first substrate polypeptide comprises an intramolecular disulfide bond, preferably wherein the first cysteine residue forming the disulfide bond is located N-terminally of the DUF2121 recognition motif and the second cysteine residue forming the disulfide bond is located C-terminally of the DUF2121 recognition motif.
50. The method according to any one of items 1 to 49, wherein said first substrate polypeptide comprises an affinity tag N-terminally to said DUF2121 recognition motif, and / or wherein said second substrate polypeptide comprises an affinity tag C-terminally to said partial N-terminal DUF2121 recognition motif.
51. The method according to any one of items 1 to 50, wherein said first substrate polypeptide comprises an affinity tag N-terminally to said DUF2121 recognition motif, and wherein said second substrate polypeptide comprises an affinity tag C-terminally to said partial N-terminal DUF2121 recognition motif, preferably wherein the first and the second affinity tag are not identical.
52. The method according to any one of items 1 to 51, wherein the method is an in vivo method.
53. The method according to item 52, wherein said in vivo method comprises expressing said polypeptide comprising an N-terminal DUF2121 domain, said modifying enzyme to be employed in step (ii), said first substrate polypeptide, and said second substrate polypeptide in a host or a host cell.
54. The method according to item 53, wherein said host or host cell is prokaryotic or eukaryotic, preferably wherein said host or host cell is selected from the group consisting of: Escherichia coli, Bacillus subtilis, Pseudomonas fluorescens, Sulfolobus solfataricus, Pichia pastoris, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Arabidopsis thaliana, Zea mays, Oryza sativa, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus, Danio rerio, Homo sapiens, T4 phage, and TEV virus.
55. The method according to item 53 or 54, wherein said in vivo method further comprises obtaining the produced fusion polypeptide from said host or host cell.
56. The method according to any one of items 1 to 51, wherein the method is an in vitro method.
57. The method according to any one of items 1 to 56, wherein step (i) is carried out before step (ii) and step (ii) is carried out before step (iii).
58. The method according to any one of items 1 to 56, wherein steps (i), (ii), and (iii) are carried out simultaneously and / or in a single reaction.
59. The method according to any one of items 1 to 58, wherein the method further comprises collecting the produced fusion polypeptide.
60. The method according to any one of items 15 to 59, wherein said modifying enzyme to be employed in step (ii) is capable of cleaving off at least the N-terminal amino acid residue of the first cleavage product, preferably only the N-terminal amino acid residue of the first cleavage product.
61. The method according to any one of items 15 to 60, wherein said modifying enzyme to be employed in step (ii) is a peptidase.
62. The method according to item 61, wherein said peptidase is an exopeptidase.
63. The method according to item 62, wherein said exopeptidase is a proline aminopeptidase and wherein the N-terminal amino acid of said first cleavage product is proline.
64. The method according to item 63, wherein said proline aminopeptidase is a proline aminopeptidase from Bacillus, preferably from Bacillus coagulans.
65. The method according to item 64, wherein said proline aminopeptidase comprises an amino acid sequence as defined in SEQ ID NO: 52 or wherein said proline aminopeptidase comprises an amino acid sequence having at least about 60% sequence identity to SEQ ID NO: 52 and comprises proline aminopeptidase activity.
66. The method according to item 64 or 65, wherein the N-terminal amino acid of said second substrate polypeptide is not proline.
67. The method according to any one of items 63 to 66, wherein the N-terminal amino acid of said second substrate polypeptide is selected from alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, tryptophan, or tyrosine, preferably alanine, cysteine, serine, or valine, more preferably alanine.
68. The method according to item 62, wherein said exopeptidase is an alanine aminopeptidase and wherein the N-terminal amino acid of said first cleavage product is alanine.
69. The method according to item 68, wherein the N-terminal amino acid of said second substrate polypeptide is not alanine.
70. The method according to item 68 or 69, wherein the N-terminal amino acid of said second substrate polypeptide is selected from proline, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, tryptophan, or tyrosine, preferably proline, cysteine, serine, or valine, more preferably proline.
71. The method according to any one of items 15 to 59, wherein said modifying enzyme to be employed in step (ii) is an N-acetyltransferase.
72. The method according to item 71, wherein said modifying enzyme to be employed in step (ii) or said N-acetyltransferase is capable of acetylating the N-terminal amino acid residue of the first cleavage product, preferably only the N-terminal amino acid residue of the first cleavage product.
73. The method according to item 71 or 72, wherein the N-terminal amino acid of said first cleavage product is not proline.
74. The method according to any one of items 71 to 73, wherein the N-terminal amino acid of said first cleavage product is selected from alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, tryptophan, or tyrosine, preferably alanine or cysteine, more preferably alanine.
75. The method according to any one of items 71 to 74, wherein the N-terminal amino acid of said second substrate polypeptide is proline.
76. The method according to any one of items 15 to 75, wherein said polypeptide comprising an N-terminal DUF2121 domain and said modifying enzyme to be employed in step (ii) are covalently linked.
The method according to any one of items 15 to 76, wherein said polypeptide comprising an N-terminal DUF2121 domain and said modifying enzyme to be employed in step (ii) are covalently linked via a linker.
The method according to item 77, wherein said linker is a polypeptide linker, a polyethylene glycol linker, or an alkyl linker.
The method according to item 78, wherein said polypeptide linker comprises a sequence selected from GGGGS (SEQ ID NO: 586), GGGGSGGGGS (SEQ ID NO: 587), or GGGGSGGGGSGGGGS (SEQ ID NO: 588).
The method according to any one of items 15 to 59, wherein said modifying chemical to be employed in step (ii) specifically modifies the N-terminal amino acid residue of the first cleavage product.
The method according to any one of items 15 to 59 and 80, wherein said modifying chemical to be employed in step (ii) does not modify and / or is not capable of modifying the N-terminal amino acid residue of the second substrate polypeptide.
The method according to any one of items 15 to 59, 80 and 81, wherein said modifying chemical is a benzaldehyde comprising an ortho-boronic acid substituent, or a composition comprising an aldehyde and an organoboronic acid, preferably a benzaldehyde comprising an ortho-boronic acid substituent.
The method according to item 82, wherein said benzaldehyde comprising an ortho-boronic acid substituent is 2-formylphenylboronic acid (2-FPBA), wherein said aldehyde is salicylaldehyde, 2-pyridinecarbaldehyde, glyoxylic acid, or 3-hydroxy-2-pyridinecarbaldehyde, and / or wherein said organoboronic acid is phenylboronic acid or para-methoxyboronic acid.
The method according to item 82, wherein said modifying chemical is a benzaldehyde comprising an ortho-boronic acid substituent, preferably 2-formylphenylboronic acid (2-FPBA), wherein the N-terminal amino acid residue of said first cleavage product is cysteine, and preferably wherein the N-terminal amino acid residue of said second substrate polypeptide is not cysteine, more preferably wherein the N-terminal amino acid residue of said second substrate polypeptide is selected from the group of proline, alanine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, or tyrosine.
The method according to item 82, wherein said modifying chemical is a composition comprising an aldehyde and an organoboronic acid, preferably wherein said aldehyde is salicylaldehyde, 2-pyridinecarbaldehyde, glyoxylic acid, or 3-hydroxy-2-pyridinecarbaldehyde, preferably wherein said organoboronic acid is phenylboronic acid or para-methoxyboronic acid, wherein the N-terminal amino acid residue of said first cleavage product is proline, and preferably wherein the N-terminal amino acid residue of said second substrate polypeptide is not proline, more preferably wherein the N-terminal amino acid residue of said second substrate polypeptide is selected from the group of alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, or tyrosine.
The method according to any one of items 1 to 85, wherein said second substrate polypeptide further comprises one or more N-terminal protecting residue(s).
The method according to item 86, wherein said one or more N-terminal protecting residue(s) are cleaved off before step (i) or during step (i) and / or step (ii).
The method according to item 86 or 87, wherein said one or more N-terminal protecting residue(s) is / are identical to the N-terminal residue of the first cleavage product.
The method according to item 88, wherein said modifying enzyme to be employed in step (ii) is capable of cleaving and / or cleaves said one or more N-terminal protecting residue(s) off the second substrate polypeptide, preferably wherein said modifying enzyme is as defined in any one of items 63 to 67.
The method according to item 86 or 87, wherein said one or more N-terminal protecting residue(s) are a cleavable tag.
The method according to item 90, wherein said cleavable tag further comprises an N-terminal affinity tag.
The method according to item 91, wherein said affinity tag is selected from the group consisting of: Streptavidin tag, FLAG tag, HA tag, Myc tag, SUMO, or polyhistidine tag, preferably a polyhistidine tag.
The method according to any one of items 90 to 92, wherein said method further comprises contacting said second substrate polypeptide comprising said cleavable tag with an enzyme capable of cleaving said cleavable tag off of said second substrate polypeptide.
The method according to any one of items 90 to 93, wherein said cleavable tag comprises an amino acid sequence consisting of SEQ ID NO: 50.
The method according to item 94, wherein said enzyme capable of cleaving off said cleavable tag is Tobacco Etch Virus protease (TEV protease).
The method for the production of a circular polypeptide according to any one of items 2 and 4 to 95, wherein the N-terminus of said first substrate polypeptide or the N-terminus of said second cleavage product is covalently linked to the C-terminus of said second substrate polypeptide via a linker.
The method for the production of a circular polypeptide according to item 96, wherein said linker is a polypeptide linker, a polyethylene glycol linker, or an alkyl linker.
The method for the production of a circular polypeptide according to item 97, wherein said polypeptide linker comprises a sequence selected from GGGGS, GGGGSGGGGS, or GGGGSGGGGSGGGGS.
The method for the production of a circular polypeptide according to any one of items 2 and 4 to 98, wherein linking and / or fusing said second cleavage product to said second substrate polypeptide comprises producing a circular polypeptide.
The method for the immobilization of a polypeptide according to any one of items 3 to 95, wherein said first substrate polypeptide or said second cleavage product remains immobilized on said solid carrier during steps (0) and (i) to (iii).
The method for the immobilization of a polypeptide according to any one of items 3 to 95 and 100, wherein said method fuses said immobilized second cleavage product and said second substrate polypeptide, thereby producing an immobilized polypeptide.
The method for the immobilization of a polypeptide according to any one of items 3 to 95, 100, and 101, wherein said immobilized polypeptide comprises a DUF2121 recognition motif, preferably wherein said DUF2121 recognition motif is as defined in any one of items 19 and 23 to 27.
The method for the immobilization of a polypeptide according to any one of items 3 to 95 and 100 to 102, wherein said method further comprises removing undesired reagents and / or contaminants from the immobilized polypeptide after step (iii), preferably by washing said immobilized polypeptide with a buffer after step (iii).
A method for the purification of a polypeptide, wherein said method comprises steps (0) and (i) to (iii) of the method according to item 96 and further comprises the following step (iv):
[0035] (iv) contacting the immobilized polypeptide with a third substrate polypeptide, with said polypeptide comprising an N-terminal DUF2121 domain, and optionally with a modifying enzyme as defined in any one of items 16 to 18 and 60 to 79 or with a modifying chemical as defined in any one of items 80 to 85.
A fusion polypeptide obtained and / or obtainable by the method according to any one of items 1 and 4 to 95, a circular polypeptide obtained and / or obtainable by the method according to any one of items 2 and 4 to 99, an immobilized polypeptide obtained and / or obtainable by the method according to any one of items 3, 4 to 95, and 100 to 103, or a purified polypeptide obtained and / or obtainable by the method according to item 104.
A composition comprising a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue and an enzyme selected from the group consisting of a peptidase and an N-acetyltransferase and / or a modifying chemical as defined in any one of items 80 to 85, preferably a peptidase.
The composition according to item 106, wherein the polypeptide comprising an N-terminal DUF2121 domain is as defined in any one of items 30 to 36 and wherein said peptidase is as defined in any one of items 63 to 70 and / or wherein said N-acetyltransferase is as defined in any one of items 71 to 75.
A polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue and a polypeptide selected from the group consisting of a peptidase and an N-acetyltransferase, preferably a peptidase, preferably wherein said polypeptide is as defined in any one of items 76 to 79.
The polypeptide according to item 108, wherein the polypeptide comprising an N-terminal DUF2121 domain is as defined in any one of items 30 to 36 and wherein said peptidase is as defined in any one of items 63 to 70 and / or wherein said N-acetyltransferase is as defined in any one of items 71 to 75.
A nucleic acid encoding the polypeptide according to item 108 or 109, or a nucleic acid encoding the following (a) and (b): (a) a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, preferably wherein said polypeptide comprising an N-terminal DUF2121 domain is as defined in any one of items 30 to 36; and
[0036] (b) an enzyme selected from the group consisting of a peptidase and an N-acetyltransferase, preferably wherein said peptidase is as defined in any one of items 63 to 70, or preferably wherein said N-acetyltransferase is as defined in any one of items 71 to 75, more preferably a peptidase as defined in any one of items 63 to 70.
A nucleic acid vector comprising the nucleic acid according to item 110.
A host or host cell comprising the nucleic acid according to item 110, the nucleic acid vector according to item 111, and / or the polypeptide according to item 108 or 109.
A host or host cell comprising at least (a) and (b), or at least (c) and (d), or at least (e) and (f) from the group consisting of the following (a) to (f):
[0037] (a) a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, preferably wherein said polypeptide comprising an N-terminal DUF2121 domain is as defined in any one of items 30 to 36;
[0038] (b) an enzyme selected from the group consisting of a peptidase and an N-acetyltransferase, preferably wherein said peptidase is as defined in any one of items 63 to 70, or preferably wherein said N-acetyltransferase is as defined in any one of items 71 to 75, more preferably a peptidase as defined in any one of items 63 to 70;
[0039] (c) a nucleic acid encoding the polypeptide comprising an N-terminal DUF2121 domain according to (a);
[0040] (d) a nucleic acid encoding the enzyme selected from the group consisting of a peptidase and an N-acetyltransferase according to (b);
[0041] (e) a nucleic acid vector comprising the nucleic acid according to (c); and
[0042] (f) a nucleic acid vector comprising the nucleic acid according to (d).
The host or host cell according to item 113, wherein the nucleic acids of (c) and (d) are comprised in a single nucleic acid.
The host or host cell according to item 113, wherein the nucleic acids of (c) and (d) are comprised in a single nucleic acid vector.
The host or host cell according to any one of items 113 to 115, wherein said host cell is as defined in item 54.
A kit comprising one or more selected from the group consisting of the following (a) to (g):
[0043] (a) a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue and an enzyme or a chemical, wherein said enzyme is selected from the group consisting of a peptidase and an N-acetyltransferase, preferably wherein said polypeptide comprising an N-terminal DUF2121 domain is as defined in any one of items 30 to 36, and preferably wherein said peptidase is as defined in any one of items 63 to 70, or preferably wherein said N-acetyltransferase is as defined in any one of items 71 to 75, preferably wherein said chemical is as defined in any one of items 80 to 86, more preferably a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue as defined in any one of items 30 to 36 and a peptidase as defined in any one of items 63 to 70;
[0044] (b) the composition according to item 106 or 107;
[0045] (c) the polypeptide according to item 108 or 109;
[0046] (d) the nucleic acid according to item 110;
[0047] (e) the nucleic acid vector according to item 111;
[0048] (f) the host cell according to item 112; and
[0049] (g) the host cell according to item 113.
A polypeptide comprising a partial N-terminal DUF2121 recognition site and one or more protecting residue(s) N-terminal to said partial N-terminal DUF2121 recognition site.
The polypeptide according to item 118, wherein said partial N-terminal DUF2121 recognition site is as defined in any one of items 20 and 23 to 25.
The polypeptide according to item 118 or 119, wherein said one or more N-terminal protecting residue(s) is as defined in any one of items 85, 87 to 89, and 91.
The polypeptide according to any one of item 118 or 120, wherein said polypeptide comprises an N-terminal amino acid sequence selected from SEQ ID NO: 50, 53, 54, or 55 or an N-terminal amino acid sequence consisting of P, PP, or PPP.
A nucleic acid encoding the polypeptide according to any one of items 118 to 121.
A nucleic acid vector comprising the nucleic acid according to item 122.
124. A host or host cell comprising the nucleic acid according to item 122, the nucleic acid vector according to item 123, and / or the polypeptide according to any one of item 118 or 121.
[0050] 125. A composition comprising the polypeptide according to any one of item 118 or 121.
[0051] 126. The composition according to item 125, wherein the composition further comprises one or more of the following (a) to (b):
[0052] (a) a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, preferably wherein said polypeptide comprising an N-terminal DUF2121 domain is as defined in any one of items 30 to 36; and
[0053] (b) an enzyme or a chemical, wherein said enzyme is selected from the group consisting of a peptidase and an N-acetyltransferase, preferably wherein said peptidase is as defined in any one of items 63 to 70, or preferably wherein said N-acetyltransferase is as defined in any one of items 71 to 75, preferably wherein said chemical is as defined in any one of items 80 to 85.
[0054] In the context of the present invention, a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue has / comprises transpeptidase activity, specifically sequence-specific transpeptidase activity. Accordingly, such polypeptides are enzymes that catalyze transpeptidation reactions, i.e., they link two peptides (or parts thereof) that each comprise a DUF2121 recognition sequence (which is further detailed herein below). The polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue may herein also be referred to as “polypeptide comprising an N-terminal DUF2121 domain”, “Connectase”, “Conectase”, “Adriase” (abbreviation for “Archaeal Peptide Recombinase”), “Jugase”, “transpeptidase”, “sequence-specific transpeptidase”, or “polypeptide recombinase”. Connectases are members of the Ntn (N-terminal nucleophile) hydrolase superfamily (the name is misleading in this case, as Connectases have no hydrolysis activity). The signature feature of this superfamily is an N-terminal active-site residue, usually a serine or a threonine. Such residues have two functional chemical groups, an (N-terminal) amino group and a hydroxyl group (on their amino acid side chain), which distinguishes Connectases from other transferases (such as Sortase A). In the case of Connectases, this architecture allows a unique reaction mechanism, in which the substrates / educts are bound by Connectases via an amide bond. Because amide bonds do not undergo spontaneous hydrolysis, Connectases can fuse substrates without side reactions involving hydrolysis (unlike, e.g., Sortase A). Such Connectases and their reaction mechanisms are further detailed herein below and in WO 2021 / 099484 (which is herein incorporated by reference in its entirety). The Connectases described therein are particularly useful, inter alia, in the herein provided methods. 
Generally, the term "transpeptidase activity" refers to the enzymatic capability to catalyze the transfer of a peptide fragment from a donor substrate to an acceptor substrate, thereby forming a fusion polypeptide comprising parts of the donor substrate and parts of the acceptor substrate. In the context of the present invention, this activity (in particular “sequence-specific transpeptidase activity”) specifically involves:
[0055] 1. The cleavage of a peptide bond within a DUF2121 recognition motif of a first substrate polypeptide;
[0056] 2. The formation of a covalent, hydrolysis-resistant amide-linked intermediate between the enzyme and the resulting second cleavage product; and
[0057] 3. The transfer of this second cleavage product to a second substrate polypeptide to form a new peptide bond, thereby creating a fusion polypeptide.
[0058] Crucially, for the purposes of this invention, this activity is distinct from protease activity as it does not involve hydrolysis (i.e., water is not the acceptor molecule) and does not result in the degradation of the substrates or products. This is a key characteristic of the polypeptide comprising an N-terminal DUF2121 domain as described herein.
[0059] Connectases offer several advantages compared to previously known transpeptidases / ligases (such as Sortase A, or the other enzymes mentioned herein above). For example, Connectases are characterized by their high substrate specificity, high catalytic efficiency, and high versatility (i.e., Connectases couple peptides, polypeptides, proteins, and the like). Further, Connectases catalyze no side reactions, such as hydrolysis of educts and / or products (as opposed to the above-mentioned hydrolysis of products and educts catalyzed by Sortase A). However, the reaction of Connectases (as detailed in WO 2021 / 099484 and Fuchs et al., 2021, both of which are herein incorporated by reference in their entirety) is reversible, so that two equally abundant educts (i.e., “a first substrate polypeptide” and “a second substrate polypeptide” in the context of the present invention) yield only approx. 50% fusion product. This means that an equilibrium of approx. 50% educts and 50% products will be reached when employing equimolar quantities of both educts. This reaction mechanism is further illustrated and exemplified in appended Example 2 and Figure 1:
[0060] In a first step, Connectases cleave the first substrate polypeptide into two intermediates (i.e., a first cleavage product and a second cleavage product). The second cleavage product remains bound (via the catalytically active serine or threonine residue) to the Connectase, and the first cleavage product dissociates from the Connectase.
[0061] In a second step, the Connectase may then react the (bound) second cleavage product either with a second substrate polypeptide (resulting in the production of a fusion polypeptide) or with said first cleavage product (resulting in the back-reaction of the first step, i.e., in the formation of the first substrate polypeptide again). The invention is illustrated by way of example of a Connectase from M. mazei (known from WO 2021 / 099484 and shown herein in SEQ ID NO: 161).
[0062] This specific Connectase binds the amino acid sequence (i.e., Connectase recognition motif) ELASKDPGAFDADPLVVEI (as, e.g., comprised in the exemplary substrate A-ELASKDPGAFDADPLVVEI, with A being any desired polypeptide / polypeptide stretch / chemical molecule). It then cleaves off the C-terminal peptide PGAFDADPLVVEI (i.e., a “first cleavage product”) and forms a covalent intermediate, A-ELASKD-Connectase, with the N-terminal part of this sequence (i.e., with “A-ELASKD”, which is considered “a second cleavage product”). This reaction works both ways, meaning that PGAFDADPLVVEI (i.e., “the first cleavage product”) can react with A-ELASKD-Connectase to restore the Connectase and its substrate, A-ELASKDPGAFDADPLVVEI. However, when a second substrate B (with B being any desired polypeptide / polypeptide stretch / chemical molecule) in the form of PGAFDADPLVVEI-B (i.e., “a second substrate polypeptide” comprising a partial N-terminal DUF2121 recognition sequence) is added to the reaction, it may be used instead of the peptide PGAFDADPLVVEI (i.e., the “first cleavage product”) to form the fusion product A-ELASKDPGAFDADPLVVEI-B (also “A-B” herein below). In other words, Connectases lead to (desired) fusion proteins / fusion polypeptides in the form of fused single molecules, wherein the educts (A and B) are combined / fused / ligated to a molecule / fusion product in the form of “A-B”. The fusion product A-B may comprise a residual Connectase recognition site (as in the above example, in the form of “ELASKDPGAFDADPLVVEI”) linking “A” and “B”.
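The sequence bookkeeping of this M. mazei example can be traced with plain strings. The following Python sketch is only an illustration under stated assumptions: it tracks sequences, not chemistry or kinetics; it places the cut directly after ‘ELASKD’ as described above; and the placeholders ‘[A]’ and ‘[B]’ as well as the function names cleave and transfer are hypothetical.

```python
# Hypothetical sequence-level sketch of the M. mazei Connectase example.
# The placeholders "[A]" and "[B]" stand for the attached moieties A and B.
RECOGNITION = "ELASKDPGAFDADPLVVEI"
CUT_AFTER = "ELASKD"  # cleavage between ...ELASKD and PGAFDADPLVVEI


def cleave(first_substrate: str) -> tuple[str, str]:
    """Return (second cleavage product, first cleavage product)."""
    cut = first_substrate.index(RECOGNITION) + len(CUT_AFTER)
    return first_substrate[:cut], first_substrate[cut:]


def transfer(bound_part: str, acceptor: str) -> str:
    """Join the enzyme-bound second cleavage product to the N-terminus of an acceptor."""
    return bound_part + acceptor


first_substrate = "[A]" + RECOGNITION
second_substrate = "PGAFDADPLVVEI" + "[B]"

bound, byproduct = cleave(first_substrate)
print(bound)      # [A]ELASKD  (second cleavage product, held by the Connectase)
print(byproduct)  # PGAFDADPLVVEI  (first cleavage product, released)

# Forward reaction: the second substrate is the acceptor -> fusion product A-B
print(transfer(bound, second_substrate))  # [A]ELASKDPGAFDADPLVVEI[B]

# Back-reaction: the released byproduct is the acceptor -> first substrate restored
print(transfer(bound, byproduct) == first_substrate)  # True
```

Because both possible acceptors start with the same partial recognition sequence, nothing in this picture favors one over the other, which is the competition underlying the approx. 50% equilibrium.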
[0063] When using equimolar quantities of educts A and B (i.e., of the [desired] first and of the [desired] second substrate polypeptides, respectively), Connectases catalyze an equilibrium of approximately 50% fusion product A-B and 50% educts. In the specific example of the M. mazei Connectase, this is because equal amounts of PGAFDADPLVVEI peptide byproduct (i.e., the first cleavage product) and PGAFDADPLVVEI-B educt (i.e., the second substrate polypeptide) compete for the A-ELASKD-Connectase intermediate. See also Figure 17 for a non-limiting and illustrative representation of the means and methods of the present invention.
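The approx. 50% figure can be reproduced with a simple equilibrium calculation. The Python sketch below rests on an assumption that is not given as a measured value in this application: the overall reaction is lumped as S1 + S2 ⇌ F + P1 (first substrate, second substrate, fusion product, first cleavage product) with an equilibrium constant K of about 1, on the reasoning that P1 and S2 present the same partial recognition sequence to the A-ELASKD-Connectase intermediate. The function name is hypothetical.

```python
import math


def equilibrium_fusion_fraction(c_s1: float, c_s2: float, k_eq: float = 1.0) -> float:
    """Fraction of the first substrate converted to fusion product at equilibrium.

    Simplified overall reaction (assumption): S1 + S2 <=> F + P1, starting
    with no F or P1, so K = x**2 / ((c_s1 - x) * (c_s2 - x)), which rearranges to
    (1 - K) x**2 + K (c_s1 + c_s2) x - K c_s1 c_s2 = 0.
    """
    a = 1.0 - k_eq
    b = k_eq * (c_s1 + c_s2)
    c = -k_eq * c_s1 * c_s2
    disc = b * b - 4.0 * a * c
    # Numerically stable form of the physically meaningful root; it also covers
    # K == 1, where the quadratic degenerates into a linear equation.
    x = -2.0 * c / (b + math.sqrt(disc))
    return x / c_s1


if __name__ == "__main__":
    # Equimolar educts: the equilibrium sits at 50 % fusion product.
    print(round(equilibrium_fusion_fraction(1.0, 1.0), 3))  # 0.5
    # A 3-fold excess of the second substrate raises the yield, but not to 100 %.
    print(round(equilibrium_fusion_fraction(1.0, 3.0), 3))  # 0.75
```

Under this assumption, an excess of the second substrate raises the yield only gradually (to c_S2 / (c_S1 + c_S2) for K = 1), which is why removing the first cleavage product is the more direct lever.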
[0064] The gist of the present invention was to shift this equilibrium of reactions catalyzed by Connectases towards more than about 50% of the desired fusion products, i.e., more than about 50% of the A-B fusion products. It was surprisingly found that with the means and methods provided herein, at least 55%, and even up to nearly 100%, fusion products can be achieved; i.e., nearly all educts are ligated / fused / coupled to the desired fusion protein. In the context of the invention, means and methods are provided that favorably shift the equilibrium towards the product side, i.e., a higher yield of the desired fusion protein / fusion product / fusion peptide can be achieved. This higher yield can be achieved in Connectase reactions by removing or inactivating the peptide byproduct (i.e., the first cleavage product) from the reaction mixture / the pool of reactants. This “removal” or “inactivation” of the peptide byproduct / first cleavage product can, for example, be achieved by specific enzymatic proteolysis of the peptide byproduct / first cleavage product. The enzymatic proteolysis can comprise the use of specific proteases (such as, e.g., a proline aminopeptidase) that act only on the first cleavage product (in this exemplified case of the M. mazei Connectase, the PGAFDADPLVVEI peptide) and not on the second substrate polypeptide (in this exemplified case, the AGAFDADPLVVEI-B educt). This inventive principle can also be used for other Connectase recognition sequences as shown herein (e.g., for the Connectase recognition sequences shown in SEQ ID NO: 34 to 36 and 365 to 578). The removal of the peptide byproduct / first cleavage product, as illustrated herein above for the specific example of the M. mazei Connectase, can readily be generalized to other Connectase reactions, where it is likewise expected to lead to a higher yield of desired fusion products / fusion proteins / fusion peptides.
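A toy mass-action simulation shows why removing the first cleavage product shifts the yield. It is a deliberately simplified model and is not derived from the data of this application: it lumps the two-step Connectase mechanism into a single reversible reaction S1 + S2 ⇌ F + P1 with equal forward and reverse rate constants (K = 1), and represents the modifying enzyme as a hypothetical first-order removal of P1 with rate constant k_mod. The function name and all parameter values are arbitrary illustrations.

```python
def simulate_fusion_yield(c0: float = 1.0, k: float = 1.0, k_mod: float = 0.0,
                          dt: float = 1e-3, t_end: float = 50.0) -> float:
    """Toy model: return the fusion-product yield F / c0 at t_end.

    Assumptions (illustrative only):
      * S1 + S2 <=> F + P1 lumped into one reversible step with
        k_forward = k_reverse = k (equilibrium constant K = 1);
      * the modifying enzyme irreversibly inactivates P1 at rate k_mod * [P1];
      * S1 and S2 start at c0, F and P1 at zero.
    Integrated with the explicit Euler method.
    """
    s1 = s2 = c0
    f = p1 = 0.0
    for _ in range(int(round(t_end / dt))):
        net = k * s1 * s2 - k * f * p1  # net forward ligation flux
        removal = k_mod * p1            # inactivation of the first cleavage product
        s1 -= net * dt
        s2 -= net * dt
        f += net * dt
        p1 += (net - removal) * dt
    return f / c0


if __name__ == "__main__":
    print(f"without modifying enzyme: {simulate_fusion_yield():.2f}")  # 0.50
    print(f"with modifying enzyme:    {simulate_fusion_yield(k_mod=5.0):.2f}")
```

With k_mod = 0 the model settles at 0.50; with a non-zero k_mod the first cleavage product is continuously withdrawn, so the back-reaction loses its substrate and the yield keeps rising towards 1. The model says nothing about the actual rates of the M. mazei Connectase or of any particular modifying enzyme.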
[0065] In light of the above, the present invention is also illustrated in one specific, yet not limiting, embodiment employing a specific Connectase of M. mazei (i.e., the Connectase as shown in SEQ ID NO: 161) and (a) modifying enzyme(s). Accordingly, the inventive method for the ligation of two polypeptides, the method for the cyclization of a polypeptide, or the method for the immobilization of a polypeptide may comprise the following steps (i) to (iii):
[0066] (i) contacting a first substrate polypeptide [i.e., a C-terminal Cnt-Tag; e.g., A-ELASKDPGAFDADPLVVEI (SEQ ID NO: 35)] with a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue [e.g., a Connectase from M. mazei as shown in SEQ ID NO: 161], thereby cleaving said first substrate polypeptide into a first cleavage product [e.g., PGAFDADPLVVEI (SEQ ID NO: 16)] and a second cleavage product [e.g., A-ELASKD (SEQ ID NO: 46)];
[0067] (ii) modifying a partial N-terminal DUF2121 recognition motif [e.g., PGAFDADPLVVEI] comprised in said first cleavage product so that said modified partial N-terminal DUF2121 recognition motif is not recognized by said polypeptide comprising an N-terminal DUF2121 domain, preferably wherein said partial N-terminal DUF2121 recognition motif is modified by a proline aminopeptidase [e.g., a proline aminopeptidase from Bacillus coagulans (BcPAP), as, e.g., shown in SEQ ID NO: 52]; and
[0068] (iii) contacting said second cleavage product and / or said polypeptide comprising an N-terminal DUF2121 domain with a second substrate polypeptide [i.e., an N-terminal Cnt-Tag; e.g., AGAFDADPLVVEI-B (SEQ ID NO: 4)], thereby fusing said second cleavage product to said second substrate polypeptide [e.g., resulting in the fusion polypeptide A-ELASKDAGAFDADPLVVEI-B (SEQ ID NO: 598)].
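Step (ii) relies on the modifying enzyme acting on the first cleavage product but not on the second substrate polypeptide. The Python sketch below illustrates this at the sequence level under a simplifying assumption: the proline aminopeptidase is modeled as removing N-terminal proline residues one at a time and stopping at the first non-proline residue. Real enzyme specificity is not modeled; the function name is hypothetical, and the ‘PP’ protecting residues in the third example mirror one of the options (P, PP, or PPP) listed in the items above.

```python
def proline_aminopeptidase(peptide: str) -> str:
    """Hypothetical model of step (ii): strip N-terminal prolines one at a time.

    Simplifying assumption: cleavage occurs only while the N-terminal residue
    is proline (P) and stops at the first non-proline residue.
    """
    while peptide.startswith("P"):
        peptide = peptide[1:]
    return peptide


first_cleavage_product = "PGAFDADPLVVEI"              # released in step (i)
second_substrate = "AGAFDADPLVVEI" + "[B]"            # N-terminal alanine
protected_second_substrate = "PP" + second_substrate  # "PP" protecting residues

print(proline_aminopeptidase(first_cleavage_product))      # GAFDADPLVVEI
print(proline_aminopeptidase(second_substrate))            # AGAFDADPLVVEI[B]
print(proline_aminopeptidase(protected_second_substrate))  # AGAFDADPLVVEI[B]
```

The constraint the sketch makes visible is that the N-terminal residues of the first cleavage product and of the second substrate polypeptide must differ, which is why the exemplary second substrate in step (iii) begins with alanine rather than proline.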
[0069] As above, A may be any desired polypeptide / polypeptide stretch / chemical molecule and B may be any desired polypeptide / polypeptide stretch / chemical molecule. As will be detailed herein below, the herein provided means and methods are also particularly useful in the production of a circular peptide (e.g., where the N-terminus of A is fused / ligated / coupled to the C-terminus of B prior to the Connectase reaction) and in the immobilization of a polypeptide / polypeptide stretch / chemical molecule (e.g., where either the N-terminus of A or the C-terminus of B is fused / ligated / coupled to a solid carrier prior to the Connectase reaction).
[0070] Accordingly, in the context of the present invention, it was surprisingly found that the equilibrium of the reaction can be shifted from the educt side towards the (desired) product side (the product may be referred to as a “fusion polypeptide” herein). In other words, the present invention provides means and methods to improve the biochemical production of (fusion) peptides and / or (fusion) proteins by shifting the equilibrium of the reaction towards the product side. Thus, the herein provided means and methods surprisingly improve the efficacy of Connectases. In particular, it was surprisingly found that the herein provided means and methods can result in about 100% fusion products (irrespective of whether peptides or proteins were employed as educts). As will be detailed below, attempts to shift the equilibrium to the product side of reactions catalyzed by the above-mentioned Sortase A resulted in substantially lower product yield. Accordingly, in the context of the present invention, a ‘fusion polypeptide’ is a single, continuous polypeptide chain formed by the (covalent) ligation of the second cleavage product and a second substrate polypeptide, optionally including a DUF2121 recognition motif at the ligation junction. As will be further illustrated herein below, the term ‘fusion polypeptide’ does not exclude the further presence of, e.g., additional proteinaceous (such as antibodies, enzymes, therapeutically relevant proteins, etc.) or non-proteinaceous moieties (such as solid carriers / supports or drugs) linked / coupled to (e.g., the N-terminus or the C-terminus of) this fusion polypeptide. Further, in the context of the present invention, the term ‘fusion polypeptide’ can relate to a circular polypeptide that has been produced / formed / generated by fusing / coupling / linking the C-terminus of a polypeptide to the N-terminus of the same polypeptide.
[0071] Accordingly, and as also further detailed herein below, the herein provided first substrate polypeptide and the herein provided second substrate polypeptide may further comprise additional proteinaceous (such as antibodies, enzymes, etc.) or non-proteinaceous moieties (such as solid carriers / supports, fluorophores, or drugs) linked / coupled to (e.g., the N-terminus of) the first substrate polypeptide or to (e.g., the C-terminus of) the second substrate polypeptide, as long as such additional moieties do not substantially hinder the coupling / fusion of said substrate polypeptides (by the herein provided means and methods), e.g., by interfering with the DUF2121 recognition motifs comprised therein. Accordingly, a ‘first substrate polypeptide’ is a polypeptide containing a complete DUF2121 recognition motif that can be cleaved by the Connectase. A ‘second substrate polypeptide’ is a polypeptide containing at least a partial N-terminal DUF2121 recognition motif to which the second cleavage product can be fused by the Connectase. As is illustratively shown in the enclosed examples and figures and is characterized in specific embodiments of the present invention, herein provided are various means (such as diverse enzymes and chemicals) having the capability to deplete and / or to modify one side product of the reaction (i.e., the first cleavage product). Such means capable of depleting and / or modifying said side product are not particularly limited in the context of the present invention as long as they (specifically) deplete the first cleavage product from the pool of reactants / educts / substrate polypeptides available to the Connectase. In particular, such enzymes (also referred to as “modifying enzymes” herein) may (at least partially) cleave or N-acetylate the (N-terminus of the) first cleavage product. A ‘modifying enzyme’ is any enzyme capable of performing the ‘modifying’ step as defined herein. 
The selection of the modifying enzyme depends on the N-terminal amino acid of the first cleavage product. This category includes, but is not limited to, peptidases (such as proline aminopeptidase or alanine aminopeptidase), N-acetyltransferases, kinases, and glycosyltransferases. Accordingly, the modifying enzyme may be a peptidase or an N-acetyltransferase. In a preferred embodiment, the modifying enzyme may be an aminopeptidase, preferably a proline aminopeptidase (EC 3.4.11.5), in particular a proline aminopeptidase from, e.g., Bacillus coagulans or Flavobacterium meningosepticum, as exemplified in SEQ ID NO: 52 and 589, respectively. (Modifying) chemicals capable of depleting and / or modifying the first cleavage product include, for example, 2-formylphenylboronic acid (2-FPBA; CAS number 40138-16-7; available from numerous distributors, e.g., Sigma-Aldrich). An accordingly modified first cleavage product (i.e., an inactivated by-product) may, for example, comprise an amino acid sequence according to SEQ ID NO: 611.
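The dependence of the modifying means on the N-terminal residues can be captured as a small lookup table. The Python sketch below paraphrases only the pairings stated in the method items above (N-terminal residue of the first cleavage product versus N-terminal residue of the second substrate polypeptide, in one-letter code). The names PAIRING_RULES, is_compatible, and suitable_means are hypothetical, and enzyme classes without stated pairings, such as kinases and glycosyltransferases, are left out.

```python
from typing import Callable

# Pairings paraphrased from the method items above (one-letter amino acid codes).
# Each entry: (condition on the N-terminal residue of the first cleavage product,
#              condition on the N-terminal residue of the second substrate polypeptide)
Rule = tuple[Callable[[str], bool], Callable[[str], bool]]

PAIRING_RULES: dict[str, Rule] = {
    "proline aminopeptidase": (lambda r: r == "P", lambda r: r != "P"),
    "alanine aminopeptidase": (lambda r: r == "A", lambda r: r != "A"),
    "N-acetyltransferase": (lambda r: r != "P", lambda r: r == "P"),
    "2-FPBA": (lambda r: r == "C", lambda r: r != "C"),
    "aldehyde + organoboronic acid": (lambda r: r == "P", lambda r: r != "P"),
}


def is_compatible(means: str, first_cleavage_product: str, second_substrate: str) -> bool:
    """True if the stated pairing for `means` fits both N-terminal residues."""
    first_ok, second_ok = PAIRING_RULES[means]
    return first_ok(first_cleavage_product[0]) and second_ok(second_substrate[0])


def suitable_means(first_cleavage_product: str, second_substrate: str) -> list[str]:
    """List every modifying means whose stated pairing fits both N-termini."""
    return [m for m in PAIRING_RULES
            if is_compatible(m, first_cleavage_product, second_substrate)]


print(suitable_means("PGAFDADPLVVEI", "AGAFDADPLVVEI"))
# ['proline aminopeptidase', 'aldehyde + organoboronic acid']
print(suitable_means("AGAFDADPLVVEI", "PGAFDADPLVVEI"))
# ['alanine aminopeptidase', 'N-acetyltransferase']
```

Keeping the rules as data rather than as branching code makes it easy to see that every listed means is defined by the same two-sided condition: it must act on the byproduct's N-terminus and spare the second substrate's.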
[0072] In particular, such (modifying) enzymes or (modifying) chemicals may modify, (partially) degrade, or (partially) cleave a recognition motif for the Connectases comprised in one intermediate / side product of the reaction (i.e., in said first cleavage product). Thereby, said modified Connectase recognition motif (also termed “DUF2121 recognition motif” herein) cannot be recognized by the Connectase. Consequently, the Connectase cannot utilize the first cleavage product (having an altered / modified / (partially) degraded / (partially) cleaved Connectase recognition sequence) as an educt for the back-reaction. Accordingly, such (modifying) enzymes or (modifying) chemicals shift the equilibrium of the reaction catalyzed by the Connectases from the educt side to the product side by modifying the Connectase (or DUF2121) recognition motif comprised in said first cleavage product. Accordingly, the term “modifying”, as used, for example, in step (ii) of the herein provided methods, refers to any enzymatic or chemical alteration of the first cleavage product, or of the partial N-terminal DUF2121 recognition motif comprised therein, that results in a substantial reduction of its ability to be recognized and processed by the polypeptide comprising an N-terminal DUF2121 domain for the reverse ligation reaction. Non-limiting examples of such modifications include proteolytic cleavage, acetylation, phosphorylation, glycosylation, or the covalent addition of a chemical moiety. WO 2021 / 099484 provides not only Connectases to be used in the context of the present invention but also the specific “Connectase recognition sequences” / “Connectase recognition motifs”. These sequences / motifs are also known as “DUF2121 recognition sequences” and are, inter alia, illustrated in appended SEQ ID NO: 34 to 36 and 365 to 578. 
They are highly conserved, and each comprises, as conserved part, a consensus amino acid sequence of ‘X1DPX2A’ (with X1 being either K or R and X2 being either G or A, as also illustrated in appended SEQ ID NO: 361 to 364). Illustrative “DUF2121 recognition sequences / motifs” are provided in SEQ ID NO: 34 to 36 and 365 to 578, reflecting sequences derived from phylogenetically distant taxa that diverged over many millions of years. Until the present invention, it was evident to the skilled artisan that such conserved sequences / motifs (i.e., ‘X1DPX2A’) are relevant for the biological activity of Connectases, and the skilled artisan would not have expected sufficient enzymatic activity of the Connectases if such motifs were modified. In particular, until the findings of the present invention, the presence of highly conserved single amino acid residues (i.e., at least amino acids “D”, “P”, and “A” in ‘X1DPX2A’) was considered to be of high relevance for the catalytic activity / function of Connectases. Accordingly, it was considered unlikely that any of these highly conserved single amino acid residues (i.e., at least “D”, “P”, and “A”) in the ‘X1DPX2A’ stretch in / on educts / polypeptide substrates of the Connectase could be substituted with any other amino acid without impeding / reducing / negatively affecting the catalytic activity of the Connectase. In other words, until the present invention, it was understood that at least “D”, “P”, and “A” in the ‘X1DPX2A’ stretch in / on educts / polypeptide substrates of the Connectases are essential for the reaction catalyzed by Connectases as described in WO 2021 / 099484. In particular, the “P” (i.e., proline) in ‘X1DPX2A’ was considered an essential amino acid in the motif, as proline has a peculiar structure among the standard proteinogenic amino acids: its N-terminus consists of a secondary amine (instead of a primary amino group).
It is of further note that it was known in the art that the cleavage of the first substrate polypeptide by Connectases occurs between ‘X1D’ and ‘PX2A’ (in ‘X1DPX2A’). Accordingly, the peptide bond between said proline and the N-terminally adjacent aspartate residue (i.e., “D”) engages directly in the enzymatic reactions catalyzed by Connectases. This made it even less likely for the skilled artisan that the proline or said aspartate residue in the X1DPX2A stretch / motif might be exchanged or substituted without impeding / reducing / negatively affecting the catalytic activity of Connectases.
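The consensus core can be expressed as a regular expression. The Python sketch below, with the hypothetical helper `find_core`, only screens for the five-residue core ‘X1DPX2A’ defined above; it does not model the full recognition motifs of SEQ ID NO: 34 to 36 and 365 to 578, so a match is at most a necessary, not a sufficient, indication of a Connectase substrate. The example sequences are invented.

```python
import re

# Conserved core 'X1DPX2A': X1 is K or R, X2 is G or A.
CANONICAL_CORE = re.compile(r"[KR]DP[GA]A")


def find_core(seq: str) -> list[int]:
    """Return 0-based start positions of the canonical core in seq."""
    return [m.start() for m in CANONICAL_CORE.finditer(seq.upper())]


print(find_core("MSTKDPGAEEK"))  # [3]
print(find_core("MSTRDAGAEEK"))  # [] : the proline is replaced by alanine
```

The second call shows that a pattern fixed on the conserved proline does not accept a substituted motif, which mirrors the prior-art view summarized above.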
[0073] In contrast to the prior art, it was surprisingly found in the context of the present invention that “P” in the ‘X1DPX2A’ motif (i.e., the highly conserved proline residue comprised in Connectase recognition motifs) may be substituted by other amino acid residues with only minor effects on reactions catalyzed by Connectases. This is, inter alia, documented in appended Example 3 and Figure 2 herein. It is surprisingly shown herein that replacing the proline residue in the ‘X1DPX2A’ stretch / motif with an alanine residue merely reduced the relative ligation rate (i.e., a metric indicating the efficiency of the coupling of two polypeptides; the higher the relative ligation rate, the more efficient the coupling) by a negligible 10% compared to a polypeptide comprising the originally defined and conserved Connectase recognition sequence (i.e., the ‘X1DPX2A’ motif comprising the central proline).
[0074] Furthermore, and as known in the art, the cleavage of the first substrate polypeptide by Connectases occurs within the Connectase recognition sequence (between “D” and “P” of the ‘X1DPX2A’ stretch / motif). This results in the formation of two intermediates, namely a first and a second cleavage product, each of which comprises a part of the ‘X1DPX2A’ stretch / motif, i.e., a partial DUF2121 recognition sequence. Accordingly, enzymatic cleavage by the Connectase yields a first cleavage product (that may subsequently be degraded in accordance with the teachings of the present invention) and a second cleavage product that comprises ‘X1D’ at its C-terminus (i.e., the second cleavage product comprises the “partial C-terminal DUF2121 recognition motif” ‘X1D’). The first cleavage product, as also taught in the prior art, comprises N-terminally the “partial N-terminal DUF2121 recognition motif” ‘PX2A’. In the context of this invention, it was surprisingly found that the proline in the ‘X1DPX2A’ stretch / motif can be replaced and substituted by another amino acid, e.g., alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, or isoleucine. This substitution / replacement not only still provides for decent Connectase enzymatic (i.e., transferase) activity but also allows for a surprising, yet desired, shift of the equilibrium in the Connectase reaction towards the products (for example protein-protein or protein-peptide fusion products), since one educt (i.e., the first cleavage product) of the back-reaction may (in accordance with the present invention) be depleted from the Connectase reaction.
In detail and in the context of the present invention, it was found that “P” (i.e., the N-terminal amino acid) in said (partial N-terminal) DUF2121 recognition motif may be substituted by other amino acids such as, e.g., alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, or isoleucine. In the context of the present invention, it was found that the second substrate polypeptide (i.e., the educt to be reacted by the Connectase with the second cleavage product to form a fusion polypeptide) may comprise not only a “partial N-terminal DUF2121 recognition motif” ‘PX2A’ but also a modified ‘PX2A’ stretch in which the proline is replaced / substituted by another amino acid, such as, e.g., alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, or isoleucine. It was also surprisingly found that the first substrate polypeptide (i.e., the educt to be reacted by the Connectase to form the second cleavage product) may also comprise a DUF2121 recognition motif (i.e., ‘X1DPX2A’) in which the amino acid in position 3 is replaced / substituted by another amino acid, such as, e.g., alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, or isoleucine.
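At the sequence level, the cleavage described above can be sketched as splitting the first substrate polypeptide between ‘X1D’ and the residue at position 3. This Python sketch is a string model only: it assumes cleavage at the first core motif found, widens position 3 to proline or one of the substitutes listed above, and uses invented sequences. The helper name `cleave` is hypothetical.

```python
import re

# Core 'X1-D-(P or substitute)-X2-A'; the substitutes are those listed in the text.
CORE = re.compile(r"[KR]D[PACSVWMGLFI][GA]A")


def cleave(first_substrate: str) -> tuple[str, str]:
    """Split between 'X1D' and the next residue of the first core motif found.

    Returns (second_cleavage_product, first_cleavage_product): the former ends
    with 'X1D', the latter starts with the partial N-terminal motif.
    """
    match = CORE.search(first_substrate)
    if match is None:
        raise ValueError("no core motif found")
    cut = match.start() + 2  # position after X1 and D
    return first_substrate[:cut], first_substrate[cut:]


print(cleave("MTEKDPGAHHHH"))  # ('MTEKD', 'PGAHHHH')
print(cleave("MTERDAGAHHHH"))  # ('MTERD', 'AGAHHHH')
```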
[0075] The present invention, therefore, paves the way to (i) re-engineer the DUF2121 recognition sequence of Connectase educts (i.e., of the second substrate polypeptide and / or of the first substrate polypeptide, and thus of the first cleavage product of the Connectase) and (ii) subsequently develop means and methods (i.e., various modifying enzymes and modifying chemicals) to modify the first cleavage product, the DUF2121 recognition motif comprised therein, or the N-terminal amino acid comprised therein, so that they cannot be recognized by the Connectases, thereby reducing / blocking the back-reaction.
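The specificity requirement can be illustrated with an idealized string model of a proline aminopeptidase that removes N-terminal proline residues and nothing else. This Python sketch is an assumption-laden toy: `proline_aminopeptidase` and `starts_with_partial_motif` are hypothetical helpers, the check inspects only the first three residues of the partial N-terminal motif (not the full motif), and the sequences are invented.

```python
import re

# Toy check of the first three residues of a partial N-terminal motif:
# position-3 residue (P or a listed substitute), then X2 (G or A), then A.
PARTIAL_N = re.compile(r"[PACSVWMGLFI][GA]A")


def proline_aminopeptidase(seq: str) -> str:
    """Idealized model: remove N-terminal proline residues and nothing else."""
    return seq.lstrip("P")


def starts_with_partial_motif(seq: str) -> bool:
    return PARTIAL_N.match(seq) is not None


first_cleavage_product = "PGAHHHH"  # invented, starts with 'PX2A'
second_substrate = "AGAEKLYF"       # invented, starts with 'AX2A'

for name, seq in [("first cleavage product", first_cleavage_product),
                  ("second substrate", second_substrate)]:
    modified = proline_aminopeptidase(seq)
    print(name, modified, starts_with_partial_motif(modified))
# first cleavage product GAHHHH False
# second substrate AGAEKLYF True
```

Only the proline-initiated educt is altered, so in this toy model the second substrate polypeptide keeps its partial motif while the first cleavage product loses it.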
[0076] Therefore, and in accordance with the present invention, it is preferred that the N-terminal DUF2121 recognition motif of the first cleavage product (the ‘PX2A’ stretch / motif) and the N-terminal DUF2121 recognition motif of the second substrate polypeptide are not identical, namely (for example and not limiting) that the (N-terminal) proline in the ‘PX2A’ stretch / motif is replaced / substituted by another amino acid, e.g., by alanine, by cysteine, by serine, by valine, by tryptophan, by methionine, by glycine, by leucine, by phenylalanine, or by isoleucine, in one of the two educts / substrates (or in both educts, as long as the substituting amino acids in both educts are not identical). Accordingly, in a preferred embodiment, the N-terminal amino acid comprised in the first cleavage product of Connectases may be proline (i.e., ‘PX2A’) and the N-terminal amino acid comprised in the second substrate polypeptide of Connectase may be alanine (i.e., ‘AX2A’). Means and methods for such a “proline substitution / replacement” are known in the art and further indicated in Example 1.1. In the context of this invention, proline aminopeptidases (i.e., enzymes cleaving N-terminal proline residues off peptides / polypeptides / proteins; EC: 3.4.11.5) may be employed to specifically modify / partially cleave / alter the Connectase recognition motif comprised in the first cleavage product; see also, e.g., appended Examples 4 and 9. This allowed for highly efficient coupling of two educts (specifically the second cleavage product and the second substrate polypeptide), drastically increasing the product yield from about 50% (as achieved maximally in the prior art) to about 60%, about 70%, about 80%, about 90%, or even up to about / nearly 100% (see, e.g., Figures 6 and 11, both documenting a product yield of about 100%).
This was not anticipated in the prior art, and such a high product yield of protein-protein, protein-peptide, peptide-protein, or peptide-peptide fusions has not been achieved with Connectase reactions described in the art or with other transferases, including Sortase A. For example, Arnott et al. Angew. Chem. Int. Ed. (2024) 63, e202310862 attempted to shift the equilibrium of the reaction catalyzed by Sortase A towards the product side. The authors fused polypeptide sequences such as ‘PanZ-SLPETGASHHHHHH’ (SEQ ID NO: 601) and ‘GVSKYG’ (SEQ ID NO: 602) using Sortase A, thereby producing the fusion polypeptide ‘PanZ-SLPETGVSKYG’ and the side product ‘GASHHHHHH’ (SEQ ID NO: 603). The peptide by-product (i.e., ‘GASHHHHHH’) was digested with an (unspecific) aminopeptidase (Ochrobactrum D-aminopeptidase; DmpA). This aminopeptidase showed a preference for polypeptides starting with GG or GA over polypeptides starting with GV and therefore displayed a lower activity towards the ‘GVSKYG’ educt. However, DmpA still degrades polypeptides starting with ‘GV’, albeit at a lower rate than polypeptides starting with ‘GG’ or ‘GA’. Accordingly, the ‘GVSKYG’ substrate is continuously degraded by DmpA, which in turn leads to a continuous reduction in product yield (further differences between the herein provided means and methods and the teaching of Arnott et al. 2024 are discussed in Example 11, herein below). In contrast, in the context of the present invention, various modifying enzymes (such as, e.g., a proline aminopeptidase) and modifying chemicals (such as, e.g., 2-FPBA) that are all highly specific for the first cleavage product (i.e., for the undesired product of the Connectase reaction) were employed to deplete the first cleavage products.
This is achieved in that these ‘modifying enzymes’ and / or ‘modifying chemicals’ specifically modify the first cleavage products of the Connectase reaction in such a way that they are not processed by Connectases; thereby, the ‘first cleavage product’ is depleted from the pool of reactants. In particular, such modifying enzymes and chemicals modify at least the N-terminal amino acid residue comprised in the partial N-terminal Connectase recognition sequence (e.g., the ‘PX2A’ stretch / motif) of the first cleavage product. Concrete, yet non-limiting, examples of such ‘depletions of the pool of reactants’ are provided in, e.g., appended Examples 4, 9, and 10 and in Figures 3, 11, and 12. In other words, herein provided are means and methods that specifically modify the first cleavage product of Connectases without modifying the second substrate polypeptide. This could not have been foreseen and allowed for a product yield of more than 50%. As illustrated in the appended examples, up to nearly 100% product yield can be obtained with the means and methods described herein, in particular by the targeted modification of the DUF2121 recognition motif, i.e., the substitution / replacement of proline in the ‘X1DPX2A’ stretch / motif. In accordance with the present invention, said proline may be replaced and / or substituted by another amino acid, e.g., alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, or isoleucine. This amino acid substitution, in interplay with the modifying enzymes or modifying chemicals (which are all highly specific for the first cleavage product / the undesired intermediate), leads to the surprisingly high yield of products documented herein, such as protein-protein fusions, protein-peptide fusions, and the like.
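Why a reversible transpeptidation stalls near 50% and why specific depletion of the first cleavage product helps can be sketched with a toy mass-action model of S1 + S2 ⇌ F + C1 (first substrate, second substrate, fusion product, first cleavage product). Everything in this Python sketch is an assumption: equal forward and reverse rate constants (an equilibrium constant of 1, as would be expected if C1 and S2 carry equivalent N-terminal motifs), equimolar starting amounts, simple Euler integration, and the hypothetical parameters `k_mod` (specific removal of C1) and `k_leak` (unspecific degradation of S2, as discussed above for DmpA). It is not fitted to any data in the examples.

```python
def simulate(k_mod: float = 0.0, k_leak: float = 0.0,
             t_end: float = 100.0, dt: float = 0.005) -> float:
    """Toy mass-action model of S1 + S2 <=> F + C1 (all values are assumptions).

    k_mod  : rate at which C1 is modified so the enzyme no longer uses it.
    k_leak : rate at which S2 is also degraded (an unspecific modifier).
    Returns the amount of fusion product F, starting from S1 = S2 = 1.
    """
    k_f = k_r = 1.0  # equal forward and reverse rates, i.e. equilibrium constant 1
    s1, s2, f, c1 = 1.0, 1.0, 0.0, 0.0
    for _ in range(round(t_end / dt)):
        net = k_f * s1 * s2 - k_r * f * c1  # net forward ligation flux
        s1 -= net * dt
        s2 -= (net + k_leak * s2) * dt      # optional unspecific loss of S2
        f += net * dt
        c1 += (net - k_mod * c1) * dt       # optional specific removal of C1
    return f


print(f"{simulate():.2f}")  # 0.50: reversible reaction only
print(f"{simulate(k_mod=1.0):.2f}")  # approaches 1 when C1 is depleted
print(f"{simulate(k_mod=1.0, k_leak=0.05):.2f}")  # lower: S2 is also consumed
```

With `k_mod = 0` the model settles at half conversion; a modifier specific for C1 drives it towards completion; adding a slow loss of S2 lowers the final amount of F, mirroring the qualitative argument above.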
[0077] Accordingly, the present invention provides for and describes engineered substrates for the Connectase and methods employing the same, inter alia, in the production of fusion polypeptides / fusion proteins. As is detailed herein below, such engineered substrates and the herein provided methods result in the fast and / or cost-effective production of fusion polypeptides / fusion proteins. Further, such methods allow for the ligation of at least the majority of the polypeptide educts / substrates. Accordingly, it is herein desired that the herein provided means and methods result in at least about 50% product yield, preferably at least about 55% product yield, more preferably at least about 60% product yield, more preferably at least about 65% product yield, more preferably at least about 70% product yield, more preferably at least about 75% product yield, more preferably at least about 80% product yield, more preferably at least about 85% product yield, more preferably at least about 90% product yield, more preferably at least about 95% product yield, more preferably at least about 96% product yield, more preferably at least about 97% product yield, more preferably at least about 98% product yield, more preferably at least about 99% product yield, most preferably up to about 100% product yield. The skilled person is aware of means and methods to quantify the product yield in enzymatic reactions, such as the herein relevant Connectase reactions. Such means and methods are illustrated in the appended examples; in particular, appended Example 4 and Figure 3 show a densitometric analysis of product yields based on SDS-PAGE band quantification. In the context of the present invention, product yield may be determined according to the following Formula I:
[0078] Formula I:
[0079] Product yield [%] = 100% x AB / (A + B + AB)
[0080] Wherein “A” and “B” indicate the band intensities of the two educts / substrates / substrate polypeptides and “AB” indicates the band intensity of the product / fusion polypeptide. The skilled person is aware that SDS-PAGE band intensities generally correlate with molar protein quantities (in appropriately designed experiments). The person skilled in the art is further aware of means and methods to design such experiments and to determine molar quantities of the two educts and the product. In particular, the person skilled in the art is aware that SDS-PAGE band quantification may be employed to determine molar quantities herein.
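Formula I translates directly into a small function. In this Python sketch the helper `product_yield` is hypothetical and the band intensities in the usage line are invented; converting raw band intensities into molar quantities is assumed to have been done beforehand, as described above.

```python
def product_yield(a: float, b: float, ab: float) -> float:
    """Formula I: product yield [%] = 100% x AB / (A + B + AB)."""
    total = a + b + ab
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return 100.0 * ab / total


print(product_yield(10.0, 10.0, 80.0))  # 80.0
```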
[0081] For example, a Connectase reaction may be separated by SDS-PAGE, stained with Coomassie blue R250, destained, and imaged with a fluorescence scanner with an excitation wavelength of about 680 nm and an emission wavelength of about 720 nm. The wavelengths to be used evidently depend on the staining method / the employed staining reagent. The density of the obtained bands can be integrated / quantified using publicly available software, such as ImageJ. The product yield may then be computed using Formula I based on the quantified band intensities (here corresponding to the molar quantities of the two educts [A and B] and the product [AB]).
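The quantification step can be sketched as integrating background-subtracted pixel intensities within three band windows of a lane profile and then applying Formula I. This Python sketch is a simplified stand-in for gel analysis in software such as ImageJ, not a reproduction of its algorithm; the lane profile, window boundaries, and constant background are invented, and no correction for differing molecular weights of the bands is applied.

```python
def band_intensity(profile: list[float], start: int, stop: int, background: float) -> float:
    """Sum background-subtracted pixel values in the window [start, stop)."""
    return sum(max(value - background, 0.0) for value in profile[start:stop])


# Invented lane profile along the migration direction (top of the gel first).
lane = [2, 2, 9, 12, 9, 2, 2, 6, 8, 6, 2, 2, 5, 7, 5, 2]

ab = band_intensity(lane, 1, 6, background=2.0)   # fusion product (largest, runs highest)
a = band_intensity(lane, 6, 11, background=2.0)   # first educt
b = band_intensity(lane, 11, 16, background=2.0)  # second educt

yield_percent = 100.0 * ab / (a + b + ab)  # Formula I
print(round(yield_percent, 1))  # 49.0
```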
[0082] The person skilled in the art is also aware of other means and methods to determine / quantify the product yield (of e.g., Connectase reactions) for example by determining / quantifying the molar protein quantities or the molar peptide quantities.
[0083] In contrast to the present invention, Chong et al., 2025, employed non-quantitative methods to assess the efficiency of Connectase-catalyzed ligations. This, among other experimental flaws, led them to conclude that substituting the proline residue in the N-terminal position of the second substrate polypeptide with glycine would improve catalytic efficiency (see Example 14 for a detailed discussion of Chong et al. 2025). As demonstrated in Example 2 and Figure 2 of the present invention, such a substitution in fact reduces the ligation rates by approx. 30-fold. However, as illustrated herein, reduced ligation rates may not hinder the application of accordingly substituted substrate polypeptides in the context of the herein provided methods, as these methods generally shift the product yield to more than about 50%, which can be preferable to increased reaction speed / ligation rates. The present invention provides for a method for the production of a fusion polypeptide, wherein the method comprises the following steps (i) to (iii):
[0084] (i) contacting a first substrate polypeptide with a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, thereby cleaving said first substrate polypeptide into a first cleavage product and a second cleavage product;
[0085] (ii) modifying a partial N-terminal DUF2121 recognition motif comprised in said first cleavage product so that / to achieve that / thereby it is achieved that said modified partial N-terminal DUF2121 recognition motif is not recognized by said polypeptide comprising an N-terminal DUF2121 domain; and
[0086] (iii) contacting said second cleavage product and / or said polypeptide comprising an N-terminal DUF2121 domain with a second substrate polypeptide, whereby said second cleavage product is fused to said second substrate polypeptide.
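Steps (i) to (iii) can be sketched end to end at the sequence level. This Python string model is an assumption-laden illustration, not the claimed method: the helper `ligate` is hypothetical, step (ii) is modelled as an idealized proline aminopeptidase, the steps are run sequentially although in practice they may proceed concurrently in one reaction, and the sequences are invented.

```python
import re

# Core 'X1-D-(P or substitute)-X2-A', as described in the text.
CORE = re.compile(r"[KR]D[PACSVWMGLFI][GA]A")


def ligate(first_substrate: str, second_substrate: str) -> tuple[str, str]:
    """Return (fusion_polypeptide, inactivated_first_cleavage_product)."""
    if second_substrate.startswith("P"):
        # Step (ii) would then also modify the second substrate polypeptide.
        raise ValueError("second substrate must not start with proline here")
    match = CORE.search(first_substrate)
    if match is None:
        raise ValueError("first substrate lacks a core motif")
    cut = match.start() + 2
    # Step (i): cleavage between 'X1D' and 'PX2A'.
    second_cleavage_product = first_substrate[:cut]
    first_cleavage_product = first_substrate[cut:]
    # Step (ii): idealized proline aminopeptidase removes N-terminal prolines.
    inactivated = first_cleavage_product.lstrip("P")
    # Step (iii): the 'X1D' C-terminus is joined to the N-terminus of the second substrate.
    fusion = second_cleavage_product + second_substrate
    return fusion, inactivated


print(ligate("MTEKDPGAHHHH", "AGAEKLYF"))  # ('MTEKDAGAEKLYF', 'GAHHHH')
```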
[0087] Accordingly, the present invention also provides in particular and in one embodiment for a method for the production of a fusion polypeptide, wherein the method comprises the following steps (i) to (iii):
[0088] (i) cleaving a first substrate polypeptide into a first cleavage product and a second cleavage product by contacting said first substrate polypeptide with a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue;
[0089] (ii) modifying a partial N-terminal DUF2121 recognition motif comprised in said first cleavage product so that / to achieve that / thereby it is achieved that said partial N-terminal DUF2121 recognition motif is not recognized by said polypeptide comprising an N-terminal DUF2121 domain; and
[0090] (iii) fusing said second cleavage product to said second substrate polypeptide by contacting said second cleavage product and / or said polypeptide comprising an N-terminal DUF2121 domain with a second substrate polypeptide.
[0091] It is herein envisaged that during step (iii) of the herein provided method two polypeptides (i.e., the second substrate polypeptide and the second cleavage product) are fused and a fusion polypeptide (comprising said two polypeptides, i.e., the second substrate polypeptide and the second cleavage product) is produced. Accordingly, during step (iii) of the herein provided method, a fusion polypeptide may be obtained. It is illustratively demonstrated in appended Example 4 that the herein provided means and methods efficiently couple, e.g., two proteins or a protein and a peptide. Accordingly, in the context of the present invention, the term “polypeptide” encompasses peptides and proteins. Polypeptides, peptides, and proteins are polymers of at least two amino acids linked via amide bonds that are formed between an amino group of one amino acid and a carboxy group of another amino acid. Herein, the term “peptide” refers to such a polymer consisting of 50 or fewer but at least two amino acids, whereas the term “protein” refers to such a polymer comprising more than 50 amino acids.
The amino acids comprised in the peptide or protein, which are also referred to as amino acid residues, may be selected from the 20 standard proteinogenic α-amino acids (i.e., Ala, Arg, Asn, Asp, Cys, Glu, Gln, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr, and Val) but also from non-proteinogenic and / or non-standard α-amino acids (such as, e.g., ornithine, citrulline, homolysine, pyrrolysine, 4-hydroxyproline, α-methylalanine (i.e., 2-aminoisobutyric acid), norvaline, norleucine, terleucine (tert-leucine), labionin, or an alanine or glycine that is substituted at the side chain with a cyclic group (e.g., a cycloalkyl group, a heterocycloalkyl group, an aryl group, or a heteroaryl group) like, e.g., cyclopentylalanine, cyclohexylalanine, phenylalanine, naphthylalanine, pyridylalanine, thienylalanine, cyclohexylglycine, or phenylglycine) as well as β-amino acids (e.g., β-alanine), γ-amino acids (e.g., γ-aminobutyric acid, isoglutamine, or statine) and δ-amino acids. Preferably, the amino acid residues comprised in the peptide or protein are selected from α-amino acids, more preferably from the 20 standard proteinogenic α-amino acids (which can be present as the L-isomer or the D-isomer, and are preferably all present as the L-isomer).
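The length-based distinction between peptides and proteins given above can be written as a small check. The Python helper `classify_polymer` is hypothetical; it counts residues in a one-letter sequence string and applies the thresholds stated in the text (at least two residues; 50 or fewer for a peptide; more than 50 for a protein).

```python
def classify_polymer(seq: str) -> str:
    """Classify a one-letter amino acid sequence as 'peptide' or 'protein'."""
    n = len(seq)
    if n < 2:
        raise ValueError("a polypeptide comprises at least two amino acids")
    return "peptide" if n <= 50 else "protein"


print(classify_polymer("GVSKYG"))  # peptide
print(classify_polymer("A" * 51))  # protein
```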
[0092] In the context of the present invention, the production of a fusion polypeptide may comprise the ligation of two independent polypeptides. In this context, the term “independent” indicates that the two (substrate) polypeptides are not linked / coupled / fused prior to being contacted with the Connectase (by contrast, a method for the cyclisation of a polypeptide is also provided herein below). In one embodiment, the present invention further provides a method for the ligation of two independent peptides, wherein the method comprises the following steps (i) to (iii):
[0093] (i) contacting a first substrate polypeptide with a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, thereby cleaving said first substrate polypeptide into a first cleavage product and a second cleavage product;
[0094] (ii) modifying a partial N-terminal DUF2121 recognition motif comprised in said first cleavage product so that / to achieve that / thereby it is achieved that said partial N-terminal DUF2121 recognition motif is not recognized by said polypeptide comprising an N-terminal DUF2121 domain; and
[0095] (iii) contacting said second cleavage product and / or said polypeptide comprising an N-terminal DUF2121 domain with a second substrate polypeptide, whereby said second cleavage product is ligated to said second substrate polypeptide. Accordingly, the present invention also provides in particular and in one embodiment for a method for the ligation of two independent peptides, wherein the method comprises the following steps (i) to (iii):
[0096] (i) cleaving a first substrate polypeptide into a first cleavage product and a second cleavage product by contacting said first substrate polypeptide with a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue;
[0097] (ii) modifying a partial N-terminal DUF2121 recognition motif comprised in said first cleavage product so that / to achieve that / thereby it is achieved that said partial N-terminal DUF2121 recognition motif is not recognized by said polypeptide comprising an N-terminal DUF2121 domain; and
[0098] (iii) fusing said second cleavage product to said second substrate polypeptide by contacting said second cleavage product and / or said polypeptide comprising an N-terminal DUF2121 domain with a second substrate polypeptide.
[0099] The herein provided means and methods allow for the efficient coupling of two polypeptides (i.e., of a second substrate polypeptide and a second cleavage product). As mentioned above, said two polypeptides may be independent, meaning that they are not connected / linked / fused prior to being contacted with the Connectase (i.e., with a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue). Said two polypeptides may also be connected / linked / fused (prior to being contacted with the Connectase) as long as the connection / the link / the linker between the two polypeptides does not interfere with the Connectase recognition site and / or the enzymatic activity of the Connectase. The (partial N-terminal) Connectase recognition sequence of the second substrate polypeptide is located at its N-terminus, and the (partial C-terminal) Connectase recognition sequence is located at the C-terminus of the second cleavage product (with said second cleavage product corresponding to an N-terminal fraction / part of the first substrate polypeptide). Accordingly, in order not to disturb the Connectase recognition sequences (comprised in the two substrate polypeptides), it is herein preferred that the N-terminus of said first substrate polypeptide is (covalently) linked to the C-terminus of said second substrate polypeptide. It is also envisaged in the context of the present invention that the two substrate polypeptides may be linked via, for example, amino acid side chains or via an amino acid side chain and one N- or C-terminus (preferably, the C-terminus of the second substrate polypeptide and / or the N-terminus of the first substrate polypeptide). The efficient cyclisation of polypeptides in accordance with the present invention is illustratively shown in appended Example 6.
[0100] In a further embodiment, the present invention also provides for a method for the production of a circular polypeptide and / or for a method for the cyclisation of a polypeptide, wherein said method comprises the following steps (i) to (iii): (i) contacting a first substrate polypeptide with a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, thereby cleaving said first substrate polypeptide into a first cleavage product and a second cleavage product;
[0101] (ii) modifying a partial N-terminal DUF2121 recognition motif comprised in said first cleavage product so that / to achieve that / thereby it is achieved that said partial N-terminal DUF2121 recognition motif is not recognized by said polypeptide comprising an N-terminal DUF2121 domain; and
[0102] (iii) contacting said second cleavage product and / or said polypeptide comprising an N-terminal DUF2121 domain with a second substrate polypeptide, whereby said second cleavage product is fused to said second substrate polypeptide, wherein said first substrate polypeptide and said second substrate polypeptide are covalently linked, preferably wherein the N-terminus of said first substrate polypeptide is linked to the C-terminus of said second substrate polypeptide.
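The preferred linkage orientation can be illustrated with a string model of cyclisation, in which the second substrate polypeptide forms the N-terminal part of a single chain and the first substrate polypeptide its C-terminal part. In this Python sketch, the helper `cyclise` is hypothetical, the last core motif in the chain is assumed to be the cleavage site, step (ii) is modelled as an idealized proline aminopeptidase, the circular product is represented as a linear string whose last residue is bonded to its first, and the sequences are invented.

```python
import re

CORE = re.compile(r"[KR]D[PACSVWMGLFI][GA]A")


def cyclise(chain: str) -> tuple[str, str]:
    """Return (circular_polypeptide, inactivated_first_cleavage_product).

    chain = second substrate (N-terminal part) + first substrate (C-terminal part).
    """
    matches = list(CORE.finditer(chain))
    if not matches:
        raise ValueError("no core motif found")
    cut = matches[-1].start() + 2       # after X1D of the last motif
    ring = chain[:cut]                  # its X1D end ligates to the chain's N-terminus
    released = chain[cut:].lstrip("P")  # first cleavage product after step (ii)
    return ring, released


ring, released = cyclise("AGAEKLYF" + "MTEKDPGAHHHH")
print(ring, released)        # AGAEKLYFMTEKD GAHHHH
print(ring[-2:] + ring[:3])  # KDAGA : the newly formed junction
```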
[0103] Accordingly, the present invention also provides in particular and in one embodiment for a method for the production of a circular polypeptide and / or for a method for the cyclisation of a polypeptide, wherein said method comprises the following steps (i) to (iii):
[0104] (i) cleaving a first substrate polypeptide into a first cleavage product and a second cleavage product by contacting said first substrate polypeptide with a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue;
[0105] (ii) modifying a partial N-terminal DUF2121 recognition motif comprised in said first cleavage product so that / to achieve that / thereby it is achieved that said partial N-terminal DUF2121 recognition motif is not recognized by said polypeptide comprising an N-terminal DUF2121 domain; and
[0106] (iii) fusing said second cleavage product to said second substrate polypeptide by contacting said second cleavage product and / or said polypeptide comprising an N-terminal DUF2121 domain with a second substrate polypeptide, wherein said first substrate polypeptide and said second substrate polypeptide are covalently linked, preferably wherein the N-terminus of said first substrate polypeptide is linked to the C-terminus of said second substrate polypeptide.
[0107] The herein provided means and methods may also be particularly useful in the immobilization of a polypeptide. It may, for example, be desired to immobilize a polypeptide, e.g., as comprised in the first substrate polypeptide (preferably at the N-terminus of the first substrate polypeptide) or as comprised in the second substrate polypeptide (preferably at the C-terminus of the second substrate polypeptide), on a solid carrier. Non-limiting examples for a solid carrier according to the present invention are a polymer (such as polyethylene glycol (PEG) or a polysaccharide), a hydrogel, a microparticle, a nanoparticle, a sphere (including nano- and microspheres), beads (such as microbeads or magnetic beads), a liposome, a micelle, quantum dots, prosthetics, and a solid surface (like a microplate, a membrane, a microarray chip, or a biosensor surface). Beads (in particular magnetic beads) are herein particularly preferred solid carriers. In a preferred embodiment, the carrier is a bead (e.g., a microbead), such as an agarose bead. In a further particularly preferred embodiment, the solid carrier is a microtiter plate. In the context of the present invention, “comprises a solid carrier”, “is linked to a solid carrier”, and “is immobilized on a solid carrier” can be used interchangeably.
[0108] In a further embodiment the present invention also provides for a method for the immobilization of a polypeptide and / or for a method for the ligation of a polypeptide to a solid carrier, wherein the method comprises the steps of (0) and (i) to (iii):
[0109] (0) immobilizing the N-terminus of a first substrate polypeptide on a solid carrier or obtaining a first substrate polypeptide immobilized on a solid carrier via its N-terminus, or immobilizing the C-terminus of a second substrate polypeptide on a solid carrier or obtaining a second substrate polypeptide immobilized on a solid carrier via its C-terminus;
[0110] (i) contacting said first substrate polypeptide with a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, thereby cleaving said first substrate polypeptide into a first cleavage product and a second cleavage product;
[0111] (ii) modifying a partial N-terminal DUF2121 recognition motif comprised in said first cleavage product so that / to achieve that / thereby it is achieved that said partial N-terminal DUF2121 recognition motif is not recognized by said polypeptide comprising an N-terminal DUF2121 domain; and
[0112] (iii) contacting said second cleavage product and / or said polypeptide comprising an N-terminal DUF2121 domain with a second substrate polypeptide, whereby said second cleavage product is fused to said second substrate polypeptide.
[0113] Accordingly, the present invention also provides in particular and in one embodiment for a method for the immobilization of a polypeptide and / or for a method for the ligation of a polypeptide to a solid carrier, wherein the method comprises the steps of (0) and (i) to (iii): (0) immobilizing the N-terminus of a first substrate polypeptide on a solid carrier or obtaining a first substrate polypeptide immobilized on a solid carrier via its N-terminus, or immobilizing the C-terminus of a second substrate polypeptide on a solid carrier or obtaining a second substrate polypeptide immobilized on a solid carrier via its C-terminus;
[0114] (i) cleaving a first substrate polypeptide into a first cleavage product and a second cleavage product by contacting said first substrate polypeptide with a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue;
[0115] (ii) modifying a partial N-terminal DUF2121 recognition motif comprised in said first cleavage product so that / to achieve that / thereby it is achieved that said partial N-terminal DUF2121 recognition motif is not recognized by said polypeptide comprising an N-terminal DUF2121 domain; and
[0116] (iii) fusing said second cleavage product to said second substrate polypeptide by contacting said second cleavage product and / or said polypeptide comprising an N-terminal DUF2121 domain with a second substrate polypeptide.
[0117] The herein provided means and methods (e.g., the method for the production of a fusion polypeptide, the method for the production of a circular polypeptide, the method for the immobilization of a polypeptide, and the like) all share the same general inventive principle: the depletion of a first cleavage product from the pool of reactants in a Connectase reaction, which, as detailed above, shifts the equilibrium towards the product side, thereby increasing the product yield. Accordingly, any definition and specification (in particular definitions and specifications regarding steps (i) to (iii) of any of such methods) herein below or above may apply mutatis mutandis to any of such methods.
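The equilibrium argument can be illustrated with a deliberately simplified mass-action model. The sketch below assumes that the Connectase reaction can be lumped into a single reversible step S1 + S2 <-> F + P1 (first substrate polypeptide plus second substrate polypeptide giving fusion product plus first cleavage product) with equal forward and reverse rate constants, and it ignores enzyme intermediates. The modification of step (ii) is modelled as a hypothetical first-order conversion of P1 into a form that the Connectase no longer processes. The function name, rate constants, and concentrations are illustrative assumptions, not measured values.

```python
def fusion_yield(k_mod: float, k_f: float = 1.0, k_r: float = 1.0,
                 s1: float = 1.0, s2: float = 1.0,
                 dt: float = 0.01, t_end: float = 200.0) -> float:
    """Fraction of the first substrate converted into fusion product.

    Simplified model (assumption, not the enzyme mechanism):
        S1 + S2 <-> F + P1   Connectase ligation (forward k_f, reverse k_r)
        P1 -> P1*            modification of the first cleavage product (k_mod)
    P1* is assumed to be invisible to the Connectase, so it leaves the pool.
    Integrated with a fixed-step explicit Euler scheme.
    """
    s1_0 = s1
    f = p1 = 0.0
    for _ in range(int(t_end / dt)):
        v_fwd = k_f * s1 * s2   # ligation forms fusion product and P1
        v_rev = k_r * f * p1    # reverse reaction needs unmodified P1
        v_mod = k_mod * p1      # depletion of P1 by the modifying agent
        s1 += (v_rev - v_fwd) * dt
        s2 += (v_rev - v_fwd) * dt
        f += (v_fwd - v_rev) * dt
        p1 += (v_fwd - v_rev - v_mod) * dt
    return f / s1_0


if __name__ == "__main__":
    print(f"without depletion: {fusion_yield(k_mod=0.0):.2f}")  # 0.50
    print(f"with depletion:    {fusion_yield(k_mod=1.0):.2f}")  # close to 1
```

With equimolar substrates and equal rate constants, the model settles at about 50% fusion product, matching the yield limit described for reversible Connectase reactions; removing P1 from the pool lets the forward reaction run towards completion.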
[0118] The Connectases (i.e., the polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue) can recognize DUF2121 recognition sequences (comprised in Connectase substrates) and may catalyze the ligation of two polypeptides comprising such Connectase recognition sequences. Accordingly, the first substrate polypeptide may comprise a (complete) DUF2121 recognition motif. Such complete DUF2121 recognition sequences are exemplified in SEQ ID NO: 34 to 36 and 365 to 578. Such complete DUF2121 recognition motifs preferably comprise 19 amino acids. In the context of the present invention, the terms “recognition motif”, “recognition sequence”, “DUF2121 recognition motif”, “DUF2121 recognition sequence”, “Connectase recognition motif”, and “Connectase recognition sequence” may be used interchangeably. Accordingly, a “DUF2121 recognition motif” is any amino acid sequence that can be specifically recognized and cleaved by a polypeptide comprising an N-terminal DUF2121 domain as defined herein. Such a motif may comprise the consensus sequence “X1DPX2A”, or a variant thereof in which one or more amino acids, including the proline at position 3, are substituted by other natural or non-proteinogenic amino acids, provided that the variant still functions as a substrate for the enzyme. In the context of the present invention, when Connectases are said to “not recognize” (or the like) e.g., a (partial N-terminal) DUF2121 recognition sequence (as e.g., comprised in a first cleavage product), this means that such Connectases cannot process the respective DUF2121 recognition sequence. In other words, the Connectase cannot couple / fuse / ligate the respective polypeptide (e.g., the first cleavage product).
A modified partial N-terminal DUF2121 recognition motif is considered “not recognized” by the polypeptide comprising an N-terminal DUF2121 domain if the rate of the reverse reaction using the modified first cleavage product as a substrate is less than 10%, preferably less than 5%, and most preferably less than 1% of the rate of the forward ligation reaction with the intended second substrate polypeptide under identical reaction conditions. Preferably, no or substantially no reverse reaction occurs once the first cleavage product has been modified.
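The "not recognized" criterion is a ratio test against three thresholds. The sketch below assumes both rates have already been measured under identical conditions and in the same units; the function name and tier labels are hypothetical.

```python
from typing import Optional

# Thresholds from the definition: reverse rate (modified first cleavage
# product as substrate) relative to the forward ligation rate.
THRESHOLDS = ((0.01, "<1% (most preferred)"),
              (0.05, "<5% (preferred)"),
              (0.10, "<10% (not recognized)"))


def recognition_tier(reverse_rate: float, forward_rate: float) -> Optional[str]:
    """Return the strictest threshold met, or None if the motif still counts as recognized."""
    if forward_rate <= 0:
        raise ValueError("forward_rate must be positive")
    ratio = reverse_rate / forward_rate
    for limit, label in THRESHOLDS:  # checked from strictest to loosest
        if ratio < limit:
            return label
    return None


print(recognition_tier(0.3, 10.0))  # <5% (preferred)
print(recognition_tier(2.0, 10.0))  # None
```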
[0119] As mentioned above, Connectases may cleave the (complete) DUF2121 recognition sequence comprised in a first substrate polypeptide resulting in the formation of two cleavage products each comprising a (partial) DUF2121 recognition sequence. Accordingly, during step (i) of the herein provided method, said polypeptide comprising an N-terminal DUF2121 domain may cleave said DUF2121 recognition motif comprised in said first substrate polypeptide into said partial N-terminal DUF2121 recognition motif comprised in said first cleavage product and a partial C-terminal DUF2121 recognition motif comprised in said second cleavage product, thereby producing said first cleavage product and said second cleavage product. Accordingly, the sequence of the partial DUF2121 recognition sequences (comprised in the first or the second cleavage product) can readily be derived from the sequence of the (complete) to-be-cleaved DUF2121 recognition sequence.
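Deriving the partial motifs can be sketched for the preferred 19-residue motif ELASKDPGAFDADPLVVEI (SEQ ID NO: 35). The sketch assumes the preferred lengths stated in this description (a 6-residue partial C-terminal motif ending in KD or RD, and a 13-residue partial N-terminal motif starting with XGA or XAA); the function name is hypothetical and other motif lengths are not handled.

```python
import re

MOTIF_SEQ35 = "ELASKDPGAFDADPLVVEI"  # SEQ ID NO: 35


def split_motif(motif: str) -> tuple[str, str]:
    """Split a 19-residue recognition motif into its two partial motifs.

    Returns (partial C-terminal motif, partial N-terminal motif): the first
    ends up at the C-terminus of the second cleavage product, the second at
    the N-terminus of the first cleavage product.
    """
    if len(motif) != 19:
        raise ValueError("this sketch only handles 19-residue motifs")
    c_part, n_part = motif[:6], motif[6:]
    if not c_part.endswith(("KD", "RD")):
        raise ValueError(f"{c_part!r} does not end in KD or RD")
    if not re.fullmatch(r".[GA]A", n_part[:3]):
        raise ValueError(f"{n_part!r} does not start with XGA or XAA")
    return c_part, n_part


print(split_motif(MOTIF_SEQ35))  # ('ELASKD', 'PGAFDADPLVVEI')
```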
[0120] The first cleavage product may be the C-terminal cleavage product of said first substrate polypeptide.
[0121] The second cleavage product may be the N-terminal cleavage product of said first substrate polypeptide.
[0122] The second cleavage product may comprise a partial C-terminal DUF2121 recognition motif. Such partial C-terminal DUF2121 recognition motifs are exemplified by amino acids 16 to 17, 15 to 17, 14 to 17, 13 to 17, 12 to 17, 11 to 17, 10 to 17, 9 to 17, 8 to 17, 7 to 17, 6 to 17, 5 to 17, 4 to 17, 3 to 17, 2 to 17, or 1 to 17 of SEQ ID NO: 365 to 578.
[0123] Preferably a partial C-terminal DUF2121 recognition motif comprises 6 amino acids.
[0124] Accordingly, in a particularly preferred embodiment, the partial C-terminal DUF2121 recognition motif comprises a C-terminal amino acid sequence of KD, and further comprises an amino acid sequence as defined in SEQ ID NO: 599 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) N-terminally adjacent to the sequence of KD.
[0125] The second substrate polypeptide may comprise a partial N-terminal DUF2121 recognition motif. Such partial N-terminal DUF2121 recognition motifs are exemplified by amino acids 18 to 20, 18 to 21, 18 to 22, 18 to 23, 18 to 24, 18 to 25, 18 to 26, 18 to 27, 18 to 28, 18 to 29, 18 to 30, 18 to 31, 18 to 32, 18 to 33, 18 to 34, 18 to 35, 18 to 36, 18 to 37, 18 to 38, 18 to 39, or 18 to 40 of SEQ ID NO: 365 to 578.
[0126] Preferably a partial N-terminal DUF2121 recognition motif comprises 13 amino acids.
[0127] Accordingly, in a particularly preferred embodiment, the partial N-terminal DUF2121 recognition motif (comprised in said second substrate polypeptide or in said first cleavage product) comprises an N-terminal amino acid sequence of XGA or XAA, and further comprises an amino acid sequence as defined in SEQ ID NO: 600 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) C-terminally adjacent to the sequence of XGA or XAA.
[0128] In a further particularly preferred embodiment, the partial N-terminal DUF2121 recognition motif (comprised in said second substrate polypeptide or in said first cleavage product) comprises an N-terminal amino acid sequence of XGA or XAA, and further comprises an amino acid sequence as defined in SEQ ID NO: 600 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) C-terminally adjacent to the sequence of XGA or XAA, wherein the first amino acid residue in the amino acid sequence of XGA or XAA corresponds to proline, alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, or tyrosine.
[0129] In a further particularly preferred embodiment, the partial N-terminal DUF2121 recognition motif (comprised in said second substrate polypeptide or in said first cleavage product) comprises an N-terminal amino acid sequence of XGA or XAA, and further comprises an amino acid sequence as defined in SEQ ID NO: 600 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) C-terminally adjacent to the sequence of XGA or XAA, wherein the first amino acid residue in the amino acid sequence of XGA or XAA corresponds to proline, alanine, cysteine, serine, valine, tryptophan, or methionine.
[0130] In a further particularly preferred embodiment, the partial N-terminal DUF2121 recognition motif (comprised in said second substrate polypeptide or in said first cleavage product) comprises an N-terminal amino acid sequence of XGA or XAA, and further comprises an amino acid sequence as defined in SEQ ID NO: 600 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) C-terminally adjacent to the sequence of XGA or XAA, wherein the first amino acid residue in the amino acid sequence of XGA or XAA corresponds to proline, alanine, cysteine, serine, or valine.
[0131] In a further particularly preferred embodiment, the partial N-terminal DUF2121 recognition motif (comprised in said second substrate polypeptide or in said first cleavage product) comprises an N-terminal amino acid sequence of XGA or XAA, and further comprises an amino acid sequence as defined in SEQ ID NO: 600 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) C-terminally adjacent to the sequence of XGA or XAA, wherein the first amino acid residue in the amino acid sequence of XGA or XAA corresponds to proline or alanine.
[0132] The present invention provides means and methods to deplete the first cleavage product from Connectase reactions, so that the Connectase cannot process the first cleavage product. Such means include (modifying) enzymes and (modifying) chemicals that specifically modify the partial N-terminal DUF2121 recognition motif of said first cleavage product (without modifying the partial N-terminal DUF2121 recognition motif of said second substrate polypeptide). Accordingly, said partial N-terminal DUF2121 recognition motif of said first cleavage product and said partial N-terminal DUF2121 recognition motif of said second substrate polypeptide are preferably not identical.
[0133] In the context of the present invention, the specificity of such (modifying) enzymes or (modifying) chemicals preferably depends on the N-terminal amino acid (comprised in the partial N-terminal DUF2121 recognition motif) of the first cleavage product and / or of the second substrate polypeptide. Accordingly, the N-terminal amino acid of said first cleavage product and the N-terminal amino acid of said second substrate polypeptide may be selected from proline, alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, or tyrosine, wherein said N-terminal amino acid of said first cleavage product and said N-terminal amino acid of said second substrate polypeptide are preferably not identical. In this context, the N-terminal amino acid of said first cleavage product and the N-terminal amino acid of said second substrate polypeptide may be selected from proline, alanine, cysteine, serine, valine, tryptophan, or methionine, wherein said N-terminal amino acid of said first cleavage product and said N-terminal amino acid of said second substrate polypeptide are preferably not identical. Preferably, the N-terminal amino acid of said first cleavage product and the N-terminal amino acid of said second substrate polypeptide are selected from the pairs proline and alanine, proline and serine, or proline and cysteine, wherein said N-terminal amino acid of said first cleavage product and said N-terminal amino acid of said second substrate polypeptide are not identical.
[0134] More preferably, the N-terminal amino acid of said first cleavage product and the N-terminal amino acid of said second substrate polypeptide are selected from proline and alanine, wherein said N-terminal amino acid of said first cleavage product and said N-terminal amino acid of said second substrate polypeptide are not identical.
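The nested preferences for the two N-terminal residues can be encoded as a small lookup. This is a sketch under the assumption that the residue lists of the embodiments above form nested tiers (broadest first); the function name and the integer tier index are hypothetical conventions, and identical residues are rejected because a modifying agent keyed to the N-terminal residue could then not tell the first cleavage product from the second substrate polypeptide.

```python
# One-letter codes transcribed from the embodiments above, broadest first.
TIERS = [
    set("PACSVWMGLFIDEHNQY"),
    set("PACSVWM"),
    set("PACSV"),
    set("PA"),
]


def pair_tier(first_nterm: str, second_nterm: str) -> int:
    """Index of the narrowest tier containing both residues, or -1 if none.

    first_nterm:  N-terminal residue of the first cleavage product
    second_nterm: N-terminal residue of the second substrate polypeptide
    """
    if first_nterm == second_nterm:
        return -1  # not distinguishable by an N-terminus-specific modifier
    best = -1
    for index, allowed in enumerate(TIERS):
        if first_nterm in allowed and second_nterm in allowed:
            best = index
    return best


print(pair_tier("P", "A"))  # 3
print(pair_tier("P", "S"))  # 2
print(pair_tier("P", "P"))  # -1
```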
[0135] Polypeptides (such as, e.g., the second substrate polypeptide) may be N-terminally acetylated depending on their N-terminal amino acid. For example, an N-terminal alanine residue may be acetylated (as shown in appended Example 4), which may affect the specificity / activity of the herein provided (modifying) enzymes or (modifying) chemicals. Accordingly, it is preferred herein that the N-terminal amino acid of said second substrate polypeptide is not modified, in particular not acetylated. Means and methods to remove an N-terminal acetylation / to de-acetylate an N-terminal amino acid are provided herein below.
[0136] The DUF2121 recognition motif comprised in said first substrate polypeptide may comprise an amino acid sequence selected from the group consisting of SEQ ID NO: 361 to 364 and 579 to 582, preferably SEQ ID NO: 361 to 364, more preferably SEQ ID NO: 363 and 364, most preferably SEQ ID NO: 364.
[0137] The partial N-terminal DUF2121 recognition motif of said second substrate polypeptide may comprise the amino acid sequence XGA or XAA, preferably XGA, wherein X is the N-terminal amino acid of said second substrate polypeptide and may be defined as anywhere herein above or below. XGA or XAA in the partial N-terminal DUF2121 recognition motif of said second substrate polypeptide corresponds to amino acids at positions 18 to 20 in any one of SEQ ID NO: 365 to 578, wherein the amino acid at position 18 (i.e., proline) may be substituted by another amino acid residue, in particular wherein said amino acid at position 18 (in any one of SEQ ID NO: 365 to 578) may be the N-terminal amino acid of said second substrate polypeptide and may be defined as anywhere herein above or below.
[0138] The partial N-terminal DUF2121 recognition motif of said first cleavage product may comprise the amino acid sequence XGA or XAA, preferably XGA, wherein X is the N-terminal amino acid of said first cleavage product and may be defined as anywhere herein above or below. XGA or XAA in the partial N-terminal DUF2121 recognition motif of said first cleavage product corresponds to amino acids at positions 18 to 20 in any one of SEQ ID NO: 365 to 578, wherein the amino acid at position 18 (i.e., proline) may be substituted by another amino acid residue, in particular wherein said amino acid at position 18 (in any one of SEQ ID NO: 365 to 578) may be the N-terminal amino acid of said first cleavage product and may be defined as anywhere herein above or below.
[0139] The partial C-terminal DUF2121 recognition motif of said second cleavage product may comprise the amino acid sequence KD or RD, preferably KD. In this context, KD or RD are the C-terminal amino acids of said second cleavage product. KD or RD in the partial C-terminal DUF2121 recognition motif of said second cleavage product corresponds to amino acids at positions 16 to 17 in any one of SEQ ID NO: 365 to 578.
[0140] The DUF2121 recognition motif comprised in said first substrate polypeptide, the partial N-terminal DUF2121 recognition motif comprised in said first cleavage product, and / or the partial N-terminal DUF2121 recognition motif comprised in said second substrate polypeptide may comprise at least 1, preferably at least 2, even more preferably at least 3, even more preferably at least 4, even more preferably at least 5, even more preferably at least 6, even more preferably at least 7, even more preferably at least 8, even more preferably at least 9, even more preferably at least 10, even more preferably at least 15, and even more preferably at least 20 amino acids C-terminally of said DUF2121 recognition motif or of said partial N-terminal DUF2121 recognition motif.
[0141] The DUF2121 recognition motif comprised in said first substrate polypeptide, the partial N-terminal DUF2121 recognition motif comprised in said first cleavage product, and / or the partial N-terminal DUF2121 recognition motif comprised in said second substrate polypeptide may comprise at least 10, at least 15 or at least 20 amino acids C-terminally of said DUF2121 recognition motif or of said partial N-terminal DUF2121 recognition motif.
[0142] The DUF2121 recognition motif comprised in said first substrate polypeptide, the partial N-terminal DUF2121 recognition motif comprised in said first cleavage product, and / or the partial N-terminal DUF2121 recognition motif comprised in said second substrate polypeptide may comprise a sequence identical to or at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or up to about 100% identical to a sequence as defined by position(s) 21 to 30, 21 to 29, 21 to 28, 21 to 27, 21 to 26, 21 to 25, 21 to 24, 21 to 23, 21 to 22 or 21 of any one of SEQ ID NO: 365 to 578 C-terminally of said DUF2121 recognition motif or of said partial N-terminal DUF2121 recognition motif; and / or the DUF2121 recognition motif comprised in said first substrate polypeptide, the partial N-terminal DUF2121 recognition motif comprised in said first cleavage product, and / or the partial N-terminal DUF2121 recognition motif comprised in said second substrate polypeptide may comprise a sequence identical to or at least 60% identical to a sequence as defined by position(s) 21 to 40, 21 to 39, 21 to 38, 21 to 37, 21 to 36, 21 to 35, 21 to 34, 21 to 33, 21 to 32, 21 to 31 of any one of SEQ ID NO: 511 to 578 C-terminally of said DUF2121 recognition motif or of said partial N-terminal DUF2121 recognition motif.
[0143] The DUF2121 recognition motif of the first substrate polypeptide and / or the partial C-terminal DUF2121 recognition motif of said second cleavage product may comprise at least 4 or 5 amino acids N-terminally of said DUF2121 recognition motif or of said partial C-terminal DUF2121 recognition motif.
[0144] The DUF2121 recognition motif of the first substrate polypeptide and / or the partial C-terminal DUF2121 recognition motif of said second cleavage product may comprise a sequence identical to or at least 60% identical to a sequence as defined by position(s) 1 to 15, 2 to 15, 3 to 15, 4 to 15, 5 to 15, 6 to 15, 7 to 15, 8 to 15, 9 to 15, 10 to 15, 11 to 15, 12 to 15, 13 to 15, 14 to 15 or 15 of any one of SEQ ID NO: 365 to 578 N-terminally of said DUF2121 recognition motif or of said partial C-terminal DUF2121 recognition motif.
[0145] The DUF2121 recognition motif of the first substrate polypeptide comprises or consists of the sequence as defined in any one of SEQ ID NO: 34 to 36 and 365 to 578 or a sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or up to about 100% sequence identity to said sequence. Preferably, the DUF2121 recognition motif of the first substrate polypeptide comprises or consists of the sequence as defined in any one of SEQ ID NO: 34 to 36 or a sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or up to about 100% sequence identity to said sequence. More preferably, the DUF2121 recognition motif of the first substrate polypeptide comprises or consists of the sequence ELASKDPGAFDADPLVVEI (SEQ ID NO: 35) or a sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or up to about 100% sequence identity to said sequence. The length of the DUF2121 recognition sequences is not particularly limited as is shown in appended Example 7.
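The identity thresholds can be illustrated on SEQ ID NO: 35. The description does not state how sequence identity is computed; the sketch below assumes a simple ungapped comparison of two equal-length sequences, and the variant (proline at position 7 of the 19-mer, i.e. position 3 of the conserved motif, replaced by alanine) is a hypothetical example.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length sequences.

    Assumption: no alignment or gaps; real analyses typically use an
    alignment tool, which this sketch does not reproduce.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


seq35 = "ELASKDPGAFDADPLVVEI"            # SEQ ID NO: 35
variant = seq35[:6] + "A" + seq35[7:]   # hypothetical P7A variant
print(f"{percent_identity(seq35, variant):.1f}%")  # 94.7%
```

A single substitution in the 19-residue motif therefore still meets the "at least about 90%" threshold but not the "at least about 95%" threshold under this ungapped definition.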
[0146] The first substrate polypeptide may comprise an N-terminal part defined from the N-terminus of the first substrate polypeptide to the aspartate residue in position 2 of SEQ ID NO: 361 to 364.
[0147] As indicated herein above, the sequence of the DUF2121 recognition sequence comprises a conserved motif having the sequence of any one of SEQ ID NO: 579 to 582, with the amino acid in position 3 in nature corresponding to a proline residue (i.e., as exemplified in SEQ ID NO: 361 to 364). However, in the context of the present invention, it was surprisingly found that the amino acid in position 3 can also be substituted by other amino acids (such as, e.g., by alanine, cysteine, serine, or valine). Accordingly, in a particularly preferred embodiment, the first substrate polypeptide comprises a DUF2121 recognition motif as defined in any one of SEQ ID NO: 579 to 582, with the amino acid in position 3 corresponding to proline, alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, or tyrosine.
[0148] Accordingly, in a particularly preferred embodiment, the first substrate polypeptide comprises a DUF2121 recognition motif as defined in any one of SEQ ID NO: 579 to 582, with the amino acid in position 3 corresponding to proline, alanine, cysteine, serine, valine, tryptophan, or methionine. Accordingly, in a further preferred embodiment, the first substrate polypeptide comprises a DUF2121 recognition motif as defined in any one of SEQ ID NO: 579 to 582, with the amino acid in position 3 corresponding to proline, alanine, cysteine, serine, or valine.
[0149] Accordingly, in a further preferred embodiment, the first substrate polypeptide comprises a DUF2121 recognition motif as defined in any one of SEQ ID NO: 579 to 582, with the amino acid in position 3 corresponding to proline or alanine.
[0150] Accordingly, in a further preferred embodiment, the first substrate polypeptide comprises a DUF2121 recognition motif as defined in any one of SEQ ID NO: 579 to 582, with the amino acid in position 3 corresponding to alanine.
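Candidate cleavage sites with a substituted position 3 can be located by a pattern search. The sketch assumes the consensus X1-D-X3-X2-A given earlier, takes X2 to be G or A (inferred from the XGA / XAA start of the partial N-terminal motif), and places the scissile bond after the aspartate; the tier names and the function are hypothetical.

```python
import re

# Allowed residues at position 3 of the conserved motif, per the embodiments above.
POSITION3_TIERS = {
    "broad": "PACSVWMGLFIDEHNQY",
    "narrower": "PACSVWM",
    "preferred": "PACSV",
    "more preferred": "PA",
    "alanine only": "A",
}


def find_cleavage_sites(seq: str, allowed: str) -> list[int]:
    """0-based indices of the first residue after each assumed scissile bond.

    A lookahead is used so that overlapping motifs are all reported;
    seq[:i] then ends with the aspartate of X1-D-X3-X2-A.
    """
    pattern = re.compile(rf"(?=.D[{allowed}][GA]A)")
    return [m.start() + 2 for m in pattern.finditer(seq)]


seq35 = "ELASKDPGAFDADPLVVEI"
print(find_cleavage_sites(seq35, POSITION3_TIERS["more preferred"]))  # [6]
print(find_cleavage_sites(seq35, POSITION3_TIERS["alanine only"]))    # []
```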
[0151] Accordingly, in a particularly preferred embodiment, the first substrate polypeptide comprises a DUF2121 recognition motif as defined in any one of SEQ ID NO: 579 to 582, with the amino acid in position 3 corresponding to proline, alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, or tyrosine, wherein said DUF2121 recognition motif further comprises an amino acid sequence as defined in SEQ ID NO: 600 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) C-terminally adjacent to the sequence of SEQ ID NO: 579 to 582, and wherein said DUF2121 recognition motif further comprises an amino acid sequence as defined in SEQ ID NO: 599 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) N-terminally adjacent to the sequence of SEQ ID NO: 579 to 582.
[0152] Accordingly, in a particularly preferred embodiment, the first substrate polypeptide comprises a DUF2121 recognition motif as defined in any one of SEQ ID NO: 579 to 582, with the amino acid in position 3 corresponding to proline, alanine, cysteine, serine, valine, tryptophan, or methionine, wherein said DUF2121 recognition motif further comprises an amino acid sequence as defined in SEQ ID NO: 600 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) C-terminally adjacent to the sequence of SEQ ID NO: 579 to 582, and wherein said DUF2121 recognition motif further comprises an amino acid sequence as defined in SEQ ID NO: 599 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) N-terminally adjacent to the sequence of SEQ ID NO: 579 to 582.
[0153] Accordingly, in a further preferred embodiment, the first substrate polypeptide comprises a DUF2121 recognition motif as defined in any one of SEQ ID NO: 579 to 582, with the amino acid in position 3 corresponding to proline, alanine, cysteine, serine, or valine, wherein said DUF2121 recognition motif further comprises an amino acid sequence as defined in SEQ ID NO: 600 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) C-terminally adjacent to the sequence of SEQ ID NO: 579 to 582, and wherein said DUF2121 recognition motif further comprises an amino acid sequence as defined in SEQ ID NO: 599 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) N-terminally adjacent to the sequence of SEQ ID NO: 579 to 582.
[0154] Accordingly, in a further preferred embodiment, the first substrate polypeptide comprises a DUF2121 recognition motif as defined in any one of SEQ ID NO: 579 to 582, with the amino acid in position 3 corresponding to proline or alanine, wherein said DUF2121 recognition motif further comprises an amino acid sequence as defined in SEQ ID NO: 600 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) C-terminally adjacent to the sequence of SEQ ID NO: 579 to 582, and wherein said DUF2121 recognition motif further comprises an amino acid sequence as defined in SEQ ID NO: 599 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) N-terminally adjacent to the sequence of SEQ ID NO: 579 to 582.
[0155] Accordingly, in a further preferred embodiment, the first substrate polypeptide comprises a DUF2121 recognition motif as defined in any one of SEQ ID NO: 579 to 582, with the amino acid in position 3 corresponding to alanine, wherein said DUF2121 recognition motif further comprises an amino acid sequence as defined in SEQ ID NO: 600 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) C-terminally adjacent to the sequence of SEQ ID NO: 579 to 582, and wherein said DUF2121 recognition motif further comprises an amino acid sequence as defined in SEQ ID NO: 599 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) N-terminally adjacent to the sequence of SEQ ID NO: 579 to 582.
[0156] In a further preferred embodiment, the first substrate polypeptide comprises a DUF2121 recognition motif as defined in any one of SEQ ID NO: 579 to 582, wherein said DUF2121 recognition motif further comprises an amino acid sequence as defined in SEQ ID NO: 600 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) C-terminally adjacent to the sequence of SEQ ID NO: 579 to 582, and wherein said DUF2121 recognition motif further comprises an amino acid sequence as defined in SEQ ID NO: 599 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) N-terminally adjacent to the sequence of SEQ ID NO: 579 to 582.
[0157] The herein provided methods employ Connectases (i.e., a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue).
[0158] In the context of the present invention, the polypeptide comprising an N-terminal DUF2121 domain has transpeptidase activity, preferably sequence-specific transpeptidase activity and most preferably DUF2121 transpeptidase activity.
[0159] In the context of the present invention, the sequence specificity of Connectases is conferred by the recognition of a DUF2121 recognition motif, or the C-terminal portion thereof, in a substrate polypeptide by the DUF2121 domain of the polypeptide of the invention. Thus, the sequence-specific transpeptidase activity according to the invention may comprise the capability of catalyzing the formation of a peptide bond between the most C-terminally positioned residue of an N-terminal portion of a first substrate polypeptide and the most N-terminally positioned residue of a C-terminal portion of a second substrate polypeptide so as to form a fusion polypeptide comprising the N-terminal portion of the first substrate polypeptide and the C-terminal portion of the second substrate polypeptide C-terminally fused thereto. The transpeptidase activity may comprise the capability of catalyzing the formation of a peptide bond between said second cleavage product and said second substrate polypeptide, thereby fusing / ligating said second cleavage product N-terminally to said second substrate polypeptide.
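At the sequence level, the transpeptidation can be pictured as a cut-and-join. The sketch below is a string model only (no kinetics, no enzyme intermediate); the function name and the `cut` parameter are hypothetical, and the example pairs SEQ ID NO: 35 with an assumed second substrate that starts with alanine, following the preferred proline / alanine pairing.

```python
def transpeptidate(first_substrate: str, second_substrate: str, cut: int) -> tuple[str, str]:
    """String-level sketch of a Connectase transpeptidation.

    cut: index of the first residue after the scissile bond (after ...KD).
    Returns (fusion_product, first_cleavage_product).
    """
    second_cleavage_product = first_substrate[:cut]  # N-terminal part, ends in ...KD
    first_cleavage_product = first_substrate[cut:]   # C-terminal part, starts with XGA/XAA
    # New peptide bond: C-terminal D of the second cleavage product to the
    # N-terminal residue of the second substrate polypeptide.
    fusion = second_cleavage_product + second_substrate
    return fusion, first_cleavage_product


fusion, released = transpeptidate("ELASKDPGAFDADPLVVEI", "AGAFDADPLVVEI", cut=6)
print(fusion)    # ELASKDAGAFDADPLVVEI
print(released)  # PGAFDADPLVVEI
```

In this example the fusion product again carries a complete motif (with alanine at position 3 of the conserved motif), which is consistent with the reaction being reversible, while the released first cleavage product keeps a distinct N-terminal proline that a residue-specific modifying agent could, in principle, target.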
[0160] As mentioned above, Connectases (as opposed to other transpeptidases, such as, e.g., Sortase A) do not have protease activity. Accordingly, said polypeptide comprising an N-terminal DUF2121 domain does not have protease activity. The term “protease activity” refers to the enzymatic capability to catalyse the hydrolytic cleavage of peptide bonds, where water is the acceptor of the peptide group, resulting in the degradation of the polypeptide substrate. Accordingly, said polypeptide comprising an N-terminal DUF2121 domain does not degrade said first cleavage product or said second cleavage product.
[0161] The N-terminal DUF2121 domain may comprise or consist of an amino acid sequence as depicted in SEQ ID NO: 56 or an amino acid sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 46%, at least about 47%, or about 48% sequence identity to SEQ ID NO: 56, and has a sequence-specific transpeptidase activity according to the invention.
[0162] Preferably, the N-terminal DUF2121 domain may comprise or consist of an amino acid sequence as depicted in SEQ ID NO: 56 or an amino acid sequence having at least about 40% sequence identity to SEQ ID NO: 56, and has a sequence-specific transpeptidase activity according to the invention.
[0163] More preferably, the N-terminal DUF2121 domain may comprise or consist of an amino acid sequence as depicted in SEQ ID NO: 56 or an amino acid sequence having about 48% sequence identity to SEQ ID NO: 56, and has a sequence-specific transpeptidase activity according to the invention.
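Percent sequence identity thresholds such as those above are usually evaluated on a pairwise alignment. The following Python sketch assumes the alignment has already been computed (for example with a global alignment tool) and uses one common convention, identical positions divided by alignment columns. The function names are hypothetical, the sequences are made-up fragments rather than SEQ ID NO: 56, and other normalizations (for example dividing by the length of the shorter sequence) can give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two sequences taken from the same pairwise alignment.

    Convention assumed here: identical residue pairs divided by the number of
    alignment columns, skipping columns that are gaps in both sequences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment contains no residues")
    identical = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * identical / len(columns)


def meets_identity_threshold(aligned_query: str, aligned_reference: str, threshold: float) -> bool:
    """True if the query reaches the given percent identity (e.g. 40.0 or 48.0)."""
    return percent_identity(aligned_query, aligned_reference) >= threshold


# Hypothetical aligned fragments (not SEQ ID NO: 56).
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKM"))               # 90.0
print(meets_identity_threshold("ACDE-GHIKL", "ACDEFGWWKL", 48.0))  # True (70%)
```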
[0164] The N-terminal DUF2121 domain comprises an amino acid sequence selected from the group consisting of the following (a) to (c):
[0165] (a) SEQ ID NO: 57 to 196;
[0166] (b) an amino acid sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or up to about 100% sequence identity to the amino acid sequences of (a); and
[0167] (c) an amino acid sequence as defined in (a) or (b) wherein 1 to 10 amino acid residues are deleted, inserted or added; and wherein said N-terminal DUF2121 domain has sequence specific transpeptidase activity.
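Clause (c) counts single-residue deletions, insertions and additions. Read strictly as insertions and deletions only, this count can be computed from the longest common subsequence (LCS). The sketch below implements that reading, which is an assumption and not the claimed definition; under it, a substitution would count as one deletion plus one insertion. The function names and sequences are hypothetical.

```python
def indel_distance(reference: str, variant: str) -> int:
    """Minimum number of single-residue deletions and insertions/additions that
    turn `reference` into `variant` (no substitutions):
    len(reference) + len(variant) - 2 * LCS(reference, variant).
    """
    m, n = len(reference), len(variant)
    # lcs[i][j]: length of the longest common subsequence of reference[:i] and variant[:j]
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if reference[i - 1] == variant[j - 1]:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    return m + n - 2 * lcs[m][n]


def within_clause_c(reference: str, variant: str, max_changes: int = 10) -> bool:
    """True if the variant differs from the reference by 1 to max_changes indels."""
    return 1 <= indel_distance(reference, variant) <= max_changes


# Hypothetical sequences, not taken from the sequence listing.
ref = "MKTAYIAKQR"
print(indel_distance(ref, "MKAYIAKQR"))     # 1 (T deleted)
print(indel_distance(ref, "MKTAYIAKQRGS"))  # 2 (GS added)
print(within_clause_c(ref, ref))            # False (no change)
```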
[0168] The polypeptide comprising an N-terminal DUF2121 domain has an amino acid sequence selected from the group consisting of the following (a) to (c):
[0169] (a) SEQ ID NO: 139 to 278;
[0170] (b) an amino acid sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or up to about 100% sequence identity to the amino acid sequences of (a); and
(c) an amino acid sequence as defined in (a) or (b) wherein 1 to 10 amino acid residues are deleted, inserted or added; and wherein the polypeptide and / or said N-terminal DUF2121 domain has sequence specific transpeptidase activity.
[0171] The polypeptide comprising an N-terminal DUF2121 domain may further comprise a C-terminal OB-like domain, preferably a C-terminal OB-like domain having an amino acid sequence selected from the group consisting of SEQ ID NO: 279 to 360 or an amino acid sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or up to about 100% sequence identity to any one of SEQ ID NO: 279 to 360. An “OB-like domain” in the context of the invention relates to an amino acid sequence having a fold similar to the OB-fold. “OB” stands for “oligosaccharide binding”; accordingly, an OB-like domain generally has oligosaccharide binding activity. The OB-like domains in these DUF2121 domain-containing proteins are not mandatory for DUF2121 transpeptidase activity, as demonstrated in the appended examples. However, their presence may facilitate substrate binding and thus the transpeptidase reaction.
[0172] The polypeptide comprising an N-terminal DUF2121 domain may comprise or consist of an amino acid sequence as depicted in SEQ ID NO: 212 or an amino acid sequence having at least 20%, preferably at least 30%, preferably at least 40%, even more preferably at least 50%, even more preferably at least 60%, even more preferably at least 70%, even more preferably at least 80%, even more preferably at least 90%, even more preferably at least 95% and most preferably at least 99% sequence identity to said amino acid sequence and has a sequence specific transpeptidase activity according to the invention.
[0173] Preferably, the polypeptide comprising an N-terminal DUF2121 domain comprises or consists of an amino acid sequence as depicted in SEQ ID NO: 161 or an amino acid sequence having at least 20%, preferably at least 30%, preferably at least 40%, even more preferably at least 50%, even more preferably at least 60%, even more preferably at least 70%, even more preferably at least 80%, even more preferably at least 90%, even more preferably at least 95% and most preferably at least 99% sequence identity to said amino acid sequence and has a sequence specific transpeptidase activity according to the invention.
[0174] Preferably, the herein employed Connectases, as well as the other herein employed enzymes, are recombinant, non-naturally occurring and / or man-made. They may further comprise a proteinaceous or non-proteinaceous moiety as defined herein below. As also indicated herein below and above, the herein employed enzymes (in particular the herein employed Connectase) may also be fusion enzymes. The herein employed enzymes may further comprise tags, such as affinity tags (e.g., His-tag, Strep-tag, etc.). In preferred embodiments, the herein employed enzymes are isolated enzymes (in particular, when employed in the context of the herein provided in vitro methods).
[0175] In the context of the herein provided methods, linking and / or fusing the second cleavage product and the second substrate polypeptide may produce a fusion polypeptide. Accordingly, in step (iii) of the herein provided methods a (desired) fusion polypeptide may be obtained.
[0176] The (to be produced) fusion polypeptide may comprise:
[0177] (a) a part of the first substrate polypeptide and a part of the second substrate polypeptide; or
[0178] (b) a part of the first substrate polypeptide and the entire second substrate polypeptide.
[0179] The (to be produced) fusion polypeptide comprises the second substrate polypeptide C-terminally fused to said second cleavage product, preferably via a covalent bond, more preferably via a peptide bond.
[0180] The present invention provides means and methods to ligate two polypeptides (i.e., to produce a fusion polypeptide). These two polypeptides (i.e., the first substrate polypeptide and the second substrate polypeptide) may each further comprise one or more proteinaceous and / or non-proteinaceous moieties. As illustratively shown in the enclosed examples, the nature of these moieties is not particularly limited, as the ligation of such substrate polypeptides could be demonstrated, for example, for polypeptides further comprising antibodies (Example 5), small molecules such as biotin (Example 12), or micro-titer plates (Example 13), and for polypeptides (previously) coupled to themselves (resulting in the formation of a circular polypeptide; Example 6). Such proteinaceous or non-proteinaceous moieties may be coupled to the first or second substrate polypeptide by any means or methods known in the art (e.g., via click chemistry in the context of non-proteinaceous moieties, as illustrated in Example 13, or via recombinant expression in the context of proteinaceous moieties), as long as the coupling does not interfere with the coupling of the first and the second substrate polypeptide (e.g., by interfering with the DUF2121 recognition motifs comprised therein). Accordingly, such proteinaceous or non-proteinaceous moieties can preferably be linked to the N-terminus of the first substrate polypeptide or to the C-terminus of the second substrate polypeptide. However, it is herein also envisaged that such proteinaceous or non-proteinaceous moieties can be linked / coupled / fused to any suitable amino acid side chain (e.g., lysine or cysteine side chains) of the first or of the second substrate polypeptide. Coupling of such moieties to the substrate polypeptides can occur directly (i.e., without any linker) or via a suitable linker. Suitable linkers can be any one of the linkers detailed herein.
A proteinaceous or non-proteinaceous moiety carried by the first substrate polypeptide is comprised in the fusion polypeptide to be produced only if it is located in the part of the first substrate polypeptide that forms part of the second cleavage product (and is thus fused with the second substrate polypeptide in the context of the present invention). The first substrate polypeptide employed herein may further comprise a proteinaceous and / or a non-proteinaceous moiety at its C-terminus (i.e., at the C-terminus of the first cleavage product produced in the context of the present invention), as long as said moiety / moieties do not interfere with the DUF2121 recognition motif comprised in said first substrate polypeptide or with said partial N-terminal DUF2121 recognition motif comprised in said first cleavage product.
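The placement rule above can be summarized in a small Python model. It tracks only which product ends up carrying each attached moiety after ligation and ignores chemistry and reaction yield; the class and function names and the example moieties are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FirstSubstrate:
    n_moiety: Optional[str] = None  # attached N-terminally of the recognition motif
    c_moiety: Optional[str] = None  # attached at the C-terminus (first cleavage product side)


@dataclass
class SecondSubstrate:
    c_moiety: Optional[str] = None  # attached C-terminally of the partial motif


def place_moieties(first: FirstSubstrate, second: SecondSubstrate) -> dict[str, list[str]]:
    """Report which product carries each attached moiety after ligation."""
    fusion = [m for m in (first.n_moiety, second.c_moiety) if m is not None]
    released = [m for m in (first.c_moiety,) if m is not None]
    return {"fusion polypeptide": fusion, "first cleavage product": released}


outcome = place_moieties(
    FirstSubstrate(n_moiety="antibody", c_moiety="His6 tag"),
    SecondSubstrate(c_moiety="biotin"),
)
print(outcome)
# {'fusion polypeptide': ['antibody', 'biotin'], 'first cleavage product': ['His6 tag']}
```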
[0181] Exemplary proteinaceous moieties in the context of the present invention are an antibody or an antigen binding fragment thereof (such as a Fab fragment, a single-chain variable fragment (scFv), or a Nanobody), an antibody-like scaffold (such as a bispecific T-cell engager (BiTE), a DARPin, or an Affibody), a cytokine, a therapeutic enzyme (such as Alpha-galactosidase A or Asparaginase), a transport protein (such as FABS for fatty acid transport), a storage protein (such as ferritin), a mechanical support protein (such as collagen), a growth factor, a hormone (such as insulin, TSH, a GLP-1 analogue, or Erythropoietin (EPO)), an interferon, a glycoprotein, a vaccine component (such as a multivalent subunit antigen or a protein-based adjuvant), a diagnostic or research reagent (such as a reporter enzyme like Horseradish Peroxidase or a fluorescent protein like Green Fluorescent Protein), a synthetically engineered protein, a virus-like particle, elastin-like proteins, or a fragment of any of the foregoing.
[0182] Exemplary non-proteinaceous moieties in the context of the present invention are a solid carrier, such as a solid surface (like a microplate, a membrane, a microarray chip, or a biosensor surface), a microparticle, a nanoparticle, a sphere (including nano- and microspheres), or a bead (such as a magnetic bead, or a bead for affinity purification, such as an agarose bead); a polymer (such as polyethylene glycol (PEG) or a polysaccharide); a hydrogel; a liposome; a micelle; a quantum dot; a drug, toxin, or therapeutic payload (such as a cytotoxic agent); an imaging agent / detection label (such as a fluorophore or a radiolabel); a small molecule; a small molecule ligand or affinity label (such as biotin); a nucleic acid (such as an oligonucleotide, siRNA, or mRNA); a lipid; a carbohydrate; a dendrimer; or a prosthetic or other implantable medical device. Beads (in particular magnetic beads, or agarose beads), fluorophores (such as, e.g., Cy5.5), affinity labels (such as biotin), and solid carriers (such as, a microplate / micro-titer plate) are herein particularly preferred non-proteinaceous moieties.
[0183] Exemplary fluorophores include cyanine dyes (Cy3, Cy5, Cy5.5), rhodamine dyes (Rhodamine, TRITC), fluorescein, Alexa Fluor series (Alexa Fluor 405, 488, 532, 555, 568, 594, 647, 680, 700, 750), Qdot quantum dots (Qdot 525, 565, 605, 655, 705, 800), Texas Red, allophycocyanin (APC, APC-eFluor 780), Brilliant Ultra Violet dyes (BUV395, BUV496, BUV563, BUV615, BUV661, BUV737, BUV805), and NovaFluor dyes (NovaFluor Yellow 570, 590, 610, 660, 690, 700, 730, 755, 810). Exemplary affinity labels in the context of the present invention include biotin, alkyne and azide groups (for click chemistry), epoxides, peptidyl acyloxymethyl ketones, halomethyl ketones, phosphofluorides, and photoaffinity labels (e.g., nitrenes, 2-aryl-5-carboxytetrazoles), that specifically and reversibly or irreversibly bind to target molecules, thereby enabling selective capture, labeling, or modification in biochemical processes.
[0184] Solid carriers, such as micro-titer plates, may, for example, be fabricated from materials like polystyrene, polypropylene, or vinyl. Micro-titer plates may, for example, be available in formats including 24-well, 48-well, 96-well, 384-well, and 1536-well plates.
[0185] Accordingly, the herein provided methods may be, inter alia, particularly useful for labeling desired proteins (such as, e.g., an antibody, or an antigen-binding fragment thereof) with, e.g., fluorophores or the like. Accordingly, the first substrate polypeptide may (further) comprise a non-proteinaceous moiety N-terminally to said DUF2121 recognition motif, and / or the second substrate polypeptide may (further) comprise a non-proteinaceous moiety C-terminally to said partial N-terminal DUF2121 recognition motif, so that the produced fusion polypeptide comprises said non-proteinaceous moiety.
[0186] The non-proteinaceous moiety is not particularly limited in the context of the present invention. Preferably said non-proteinaceous moiety may be selected from the group consisting of a fluorophore, a drug, a toxin, a carbohydrate, a lipid, a solid carrier, and an oligonucleotide.
[0187] The first substrate polypeptide may (further) comprise an antibody, or an antigen-binding fragment thereof, N-terminally to said DUF2121 recognition motif, and / or the second substrate polypeptide may (further) comprise an antibody, or an antigen-binding fragment thereof, C-terminally to said partial N-terminal DUF2121 recognition motif, so that the produced fusion polypeptide comprises said antibody, or said antigen-binding fragment thereof. Likewise, the first substrate polypeptide may (further) comprise an enzyme N-terminally to said DUF2121 recognition motif, and / or the second substrate polypeptide may (further) comprise an enzyme C-terminally to said partial N-terminal DUF2121 recognition motif, so that the produced fusion polypeptide comprises said enzyme.
[0188] The part of the first substrate polypeptide or the part of the second substrate polypeptide forming part of the produced fusion polypeptide may comprise a protein, and the part of the other substrate polypeptide (i.e., the substrate polypeptide not comprising the protein) forming part of the produced fusion polypeptide may have / may comprise a solid carrier attached thereto, so that the produced fusion polypeptide comprises the protein immobilized on the solid carrier; preferably, the protein is an enzyme. Accordingly, if the part of the first substrate polypeptide forming part of the produced fusion polypeptide comprises a protein, the part of the second substrate polypeptide forming part of the produced fusion polypeptide comprises a solid carrier, and vice versa. Said protein may also be any other proteinaceous moiety envisaged herein (such as, for example, an antibody). Accordingly, the produced fusion polypeptide may comprise said protein linked to its C-terminus and said solid carrier linked to its N-terminus, or said protein linked to its N-terminus and said solid carrier linked to its C-terminus.
[0189] As illustrated in Example 13, the herein provided means and methods can be employed in the immobilization of a polypeptide, in particular in the immobilization of a polypeptide (i.e., a first substrate polypeptide comprising an antibody, e.g., an anti-HER2 antibody) in the wells of a microtiter plate (e.g., a 96-well plate). Such an approach is particularly suitable for the preparation of an antibody-coated plate, such as an ELISA plate. Accordingly, a particularly preferred non-proteinaceous moiety to be comprised in the first or the second substrate polypeptide is a microtiter plate, with the other substrate polypeptide in this context preferably comprising a proteinaceous moiety (such as an antibody).
[0190] The first substrate polypeptide and / or the second substrate polypeptide may comprise a label. Accordingly, the first substrate polypeptide and / or the second substrate polypeptide may be isotopically labeled; preferably, the first or the second substrate polypeptide is isotopically labeled. The part of the first substrate polypeptide or the part of the second substrate polypeptide forming part of the produced fusion polypeptide may be part of a virus-like particle, and the part of the other substrate polypeptide forming part of the produced fusion polypeptide may comprise an immunogenic structure.
[0191] The N-terminus of said first substrate polypeptide and / or the C-terminus of said second substrate polypeptide may be linked to a membrane, preferably a vesicle membrane.
[0192] The first substrate polypeptide may comprise an intramolecular disulfide bond; preferably, the first cysteine residue forming the disulfide bond may be located N-terminally of the DUF2121 recognition sequence and the second cysteine residue forming the disulfide bond may be located C-terminally of the DUF2121 recognition motif.
[0193] The first substrate polypeptide may comprise an affinity tag N-terminally to said DUF2121 recognition motif, and / or the second substrate polypeptide may comprise an affinity tag C-terminally to said partial N-terminal DUF2121 recognition motif.
[0194] The first substrate polypeptide may comprise an affinity tag N-terminally to said DUF2121 recognition motif, and the second substrate polypeptide may comprise an affinity tag C-terminally to said partial N-terminal DUF2121 recognition motif; preferably, the first and the second affinity tags are not identical.
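With two different tags placed as described above, only the fusion polypeptide carries both, so two consecutive affinity steps can separate it from unreacted substrates. The Python sketch below models that logic only. It assumes a His6-tag on the first substrate and a Strep-tag on the second, that the Connectase and any modifying enzyme carry neither tag (or are removed separately), and that each affinity step is perfectly selective; the names are hypothetical.

```python
# Simplified post-reaction mixture: each species mapped to the affinity tags it carries.
# Assumption: His6 sits N-terminally of the motif in the first substrate, Strep sits
# C-terminally of the partial motif in the second substrate, and the enzymes are untagged.
mixture = {
    "unreacted first substrate": {"His6"},
    "unreacted second substrate": {"Strep"},
    "first cleavage product": set(),
    "fusion polypeptide": {"His6", "Strep"},
}


def affinity_step(pool: dict[str, set[str]], tag: str) -> dict[str, set[str]]:
    """Keep only the species retained by the affinity matrix for `tag`."""
    return {name: tags for name, tags in pool.items() if tag in tags}


after_ni_nta = affinity_step(mixture, "His6")              # Ni2+-NTA step
after_streptavidin = affinity_step(after_ni_nta, "Strep")  # streptavidin step
print(sorted(after_streptavidin))  # ['fusion polypeptide']
```

Identical tags on both substrates would make the two steps indistinguishable, which is consistent with the preference for non-identical tags.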
[0195] The herein provided methods may preferably be in vitro methods. The herein provided in vitro methods are carried out in suitable conditions (i.e., conditions suitable for the catalysis of the reaction by the employed enzymes and / or chemicals). Such conditions are illustrated in the enclosed examples. For example, such in vitro methods may be carried out in a suitable buffer / buffer system. The term "buffer" or "buffer solution", as used herein, refers to an aqueous solution that resists significant changes in pH upon the addition of small quantities of an acid or a base. In the context of the present invention, a buffer is used to provide a stable chemical environment with a specific pH that is optimal for the activity, stability, and structural integrity of the enzymes employed, such as the polypeptide comprising an N-terminal DUF2121 domain (Connectase) and the various modifying enzymes (e.g., proline aminopeptidase, N-acetyltransferase). Such buffers are not particularly limited. A buffer typically comprises a weak acid and its conjugate base, or a weak base and its conjugate acid. Furthermore, as used herein, a "buffer" or "buffer solution" may also contain other dissolved compounds that support the desired reaction conditions, including but not limited to: salts (such as sodium chloride or potassium chloride) to maintain a specific ionic strength, chelating agents (such as EDTA) to prevent interference from metal ions, and other additives required for enzyme stability, purification, or activity (such as imidazole or glycerol).
Non-limiting examples of buffering agents suitable for use in the methods of the present invention, as demonstrated in the Examples, include: Tris (Tris(hydroxymethyl)aminomethane), HEPES (4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid), MES (2-(N-morpholino)ethanesulfonic acid), MOPS (3-(N-morpholino)propanesulfonic acid), phosphate buffers (e.g., derived from sodium or potassium phosphate salts like KH2PO4 and Na2HPO4), and acetate buffers (e.g., sodium acetate). For instance, a suitable buffer for carrying out the conjugation reactions described herein could be a neutral (pH 7.0) buffer containing 50 mM sodium acetate, 50 mM MES, 50 mM HEPES, 150 mM NaCl, and 50 mM KCl, as described in Example 1.2 of the application.
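The Example 1.2 buffer can be converted into amounts to weigh with the relation mass = concentration × volume × molar mass. The Python sketch below is a convenience calculation under stated assumptions: the molar masses are those of the anhydrous free acid or salt forms (hydrates such as sodium acetate trihydrate or MES monohydrate have higher molar masses), and the dictionary and function names are hypothetical.

```python
# Molar masses in g/mol of the anhydrous forms (assumption: adjust for hydrates).
MOLAR_MASS_G_PER_MOL = {
    "sodium acetate": 82.03,
    "MES": 195.24,
    "HEPES": 238.30,
    "NaCl": 58.44,
    "KCl": 74.55,
}

# Composition of the Example 1.2 buffer in mM, as given in the text.
EXAMPLE_1_2_MM = {
    "sodium acetate": 50,
    "MES": 50,
    "HEPES": 50,
    "NaCl": 150,
    "KCl": 50,
}


def grams_to_weigh(composition_mm: dict[str, float], volume_l: float) -> dict[str, float]:
    """Mass (g) of each component: (mM / 1000) * volume (L) * molar mass (g/mol)."""
    return {
        name: (mm / 1000.0) * volume_l * MOLAR_MASS_G_PER_MOL[name]
        for name, mm in composition_mm.items()
    }


for name, grams in grams_to_weigh(EXAMPLE_1_2_MM, volume_l=1.0).items():
    print(f"{name}: {grams:.3f} g")
```

The pH of 7.0 stated in the text still has to be set, for example by titration, which this calculation does not cover.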
[0196] The herein provided in vitro methods may be carried out sequentially. Accordingly, in the context of the herein provided in vitro methods, step (i) may be carried out before step (ii) and step (ii) may be carried out before step (iii).
[0197] Alternatively, in the context of the herein provided in vitro methods, steps (i), (ii), and (iii) may be carried out simultaneously and / or in a single reaction.
[0198] Irrespective of whether the herein provided in vitro methods are carried out sequentially or simultaneously, the methods may further comprise collecting the produced fusion polypeptide. In the context of the present invention, the term “produced fusion polypeptide” may refer to e.g., the produced fusion polypeptide (i.e., a polypeptide as produced by the herein provided method for the production of a fusion polypeptide), the produced circular / circularized polypeptide (i.e., a polypeptide as produced by the herein provided method for the production of a circular polypeptide), or the immobilized polypeptide (i.e., a polypeptide as produced by the herein provided method for the immobilization of a polypeptide).
[0199] The terms “recovering”, “purifying”, “collecting” and “isolating” are used interchangeably herein. The produced polypeptide may be recovered by methods known in the art. For example, the polypeptide may be recovered from the reaction composition by conventional procedures including, but not limited to, centrifugation, filtration, extraction, spray-drying, evaporation, or precipitation. In another aspect, the produced fusion polypeptide may be purified by chromatography (e.g., ion exchange, affinity, hydrophobic, chromatofocusing, and size exclusion), electrophoretic procedures (e.g., preparative isoelectric focusing), differential solubility (e.g., ammonium sulfate precipitation), SDS-PAGE, or extraction (see for example Jansen (1989) Protein Purification, VCH Publishers, New York). The skilled person is aware of suitable affinity tags. An 'affinity tag' is any polypeptide sequence or chemical moiety fused or conjugated to a polypeptide of interest for the primary purpose of purification or detection. The tag functions by binding with high specificity to a corresponding immobilization matrix or detection agent. Non-limiting examples of affinity tags used in affinity chromatography are the His6-tag and the Strep-tag. The affinity tags bind to the corresponding affinity matrix. The affinity matrix may be a solid carrier comprising the structure having affinity for the affinity tag. Said structure may be Ni2+-NTA for the His6-tag and streptavidin for the Strep-tag. Definitions and examples of solid carriers are provided herein.
[0200] The herein provided methods may also be in vivo methods.
[0201] The herein provided in vivo methods may comprise expressing the polypeptide comprising an N-terminal DUF2121 domain, the modifying enzyme to be employed in step (ii), the first substrate polypeptide, and the second substrate polypeptide in a host or a host cell.
[0202] The herein provided in vivo methods may comprise expressing the polypeptide comprising an N-terminal DUF2121 domain, the first substrate polypeptide, and the second substrate polypeptide in a host or a host cell and providing the modifying chemical to said host or host cell. Providing the modifying chemical to said host or host cell is not particularly limited as long as said modifying chemical may come into contact with the (N-terminal amino acid of the) first cleavage product (as produced by the Connectase).
[0203] Expressing the above polypeptides or enzymes in a host cell comprises expression from one or more nucleic acid(s). Accordingly, the present invention also relates to nucleic acids encoding such polypeptides and enzymes. In particular, the present invention provides for a nucleic acid encoding the first substrate polypeptide. Said nucleic acid may further comprise a multiple cloning site (upstream of the DUF2121 recognition sequence comprised in said first substrate polypeptide), thereby allowing the integration of a desired target polypeptide. Further, the present invention provides for a nucleic acid encoding the second substrate polypeptide. Said nucleic acid may further comprise a multiple cloning site (downstream of the DUF2121 recognition sequence comprised in said second substrate polypeptide), thereby allowing the integration of a desired target polypeptide / polypeptide of interest. In the context of the present invention, a multiple cloning site (MCS) is a synthetically engineered segment of nucleic acid which can be incorporated into a vector, such as a plasmid. An MCS is characterized by comprising a plurality of distinct restriction enzyme recognition sites that are arranged in close proximity to one another. The primary function of an MCS is to provide a versatile region for the insertion of a heterologous nucleic acid sequence, or 'insert', into the vector's backbone. For the practice of the present disclosure, conventional compositions and methods for preparing and using constructs and host cells are well known to one skilled in the art; see, for example, Molecular Cloning: A Laboratory Manual, 3rd edition, Volumes 1, 2, and 3, J. F. Sambrook, D. W. Russell, and N. Irwin, Cold Spring Harbor Laboratory Press, 2000.
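Following the definition above, an MCS can be recognized computationally as a short stretch containing several distinct restriction enzyme recognition sites. The Python sketch below scans a DNA sequence for a handful of well-known type II sites. The enzyme selection, the 60 bp window, the three-enzyme threshold, the function names and the example insert are illustrative assumptions, and the check does not test whether each site is unique within the vector backbone.

```python
import re

# A few well-known type II restriction sites (illustrative selection, assumption).
RESTRICTION_SITES = {
    "NdeI": "CATATG",
    "NcoI": "CCATGG",
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
}


def find_sites(dna: str) -> list[tuple[int, str]]:
    """Sorted (0-based position, enzyme) pairs for every site occurrence."""
    dna = dna.upper()
    hits = []
    for enzyme, site in RESTRICTION_SITES.items():
        # The lookahead also reports overlapping occurrences.
        for match in re.finditer(f"(?={site})", dna):
            hits.append((match.start(), enzyme))
    return sorted(hits)


def looks_like_mcs(dna: str, min_distinct: int = 3, window_bp: int = 60) -> bool:
    """True if >= min_distinct different enzymes have sites starting within window_bp."""
    hits = find_sites(dna)
    for i, (start, _) in enumerate(hits):
        enzymes = {enzyme for pos, enzyme in hits[i:] if pos - start <= window_bp}
        if len(enzymes) >= min_distinct:
            return True
    return False


# Hypothetical insert: NdeI, BamHI, EcoRI, HindIII and XhoI sites back to back.
mcs = "CATATGGGATCCGAATTCAAGCTTCTCGAG"
print(find_sites(mcs))
print(looks_like_mcs(mcs), looks_like_mcs("ATATATATATATATAT"))  # True False
```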
[0204] In the context of the present invention, a host is non-human.
[0205] The host or host cell may be prokaryotic or eukaryotic; preferably, said host or host cell is selected from the group consisting of: Escherichia coli, Bacillus subtilis, Pseudomonas fluorescens, Sulfolobus solfataricus, Pichia pastoris, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Arabidopsis thaliana, Zea mays, Oryza sativa, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus, Danio rerio, Homo sapiens, T4 phage, and TEV virus.
[0206] The in vivo method further comprises obtaining the produced fusion polypeptide from said host or host cell. The produced fusion polypeptide may be obtained by methods known in the art, in particular by the methods mentioned herein above in the context of the herein provided in vitro methods. Obtaining the produced fusion polypeptide may further comprise, as a first step, extracting the fusion polypeptide or a solution comprising the fusion polypeptide from the host or host cell. Extracting the fusion polypeptide may, for example, comprise lysis of the host cell by methods known in the art, such as enzymatic, chemical, or physical lysis or the like.
[0207] As mentioned above, the herein provided means and methods are, inter alia, based on the modification of (the N-terminal amino acid comprised in) a partial N-terminal DUF2121 recognition sequence comprised in the first cleavage product (of a Connectase reaction). Such modification may be performed / carried out by (modifying) enzymes or (modifying) chemicals. Various enzymes and chemicals are identified herein that may specifically modify the N-terminal amino acid of the first cleavage product, thereby depleting said first cleavage product from the pool of reactants accessible to the Connectase. The person skilled in the art can readily obtain modifying enzymes and / or modifying chemicals to be employed in the context of the present invention. The selection of such modifying enzymes and / or modifying chemicals may depend on the amino acid sequence (in particular the N-terminal amino acid) of the first cleavage product to be modified and also on the selected second substrate polypeptide. For example, a proline aminopeptidase may be particularly useful in the context of the present invention when employed in the herein provided methods in combination with a first cleavage product comprising a proline residue as N-terminal amino acid and a second substrate comprising an alanine residue (or an amino acid residue selected from cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, or tyrosine) as N-terminal amino acid. Even though proline aminopeptidases (also referred to as “PAP” herein) are a structurally diverse group of enzymes, they are defined based on their function (i.e., cleavage of an N-terminal proline residue from a polypeptide), and they are known in the art (see, e.g., Enzyme Commission number EC 3.4.11.5).
The appended examples illustratively demonstrate that proline aminopeptidases sharing only about 24% sequence identity can both be employed effectively in the herein provided methods. In particular, a proline aminopeptidase from Bacillus coagulans (also referred to as BcPAP herein; exemplified in SEQ ID NO: 52 and GenBank accession number BAA01792.1) resulted in increased fusion product (i.e., fusion polypeptide) formation as compared to a Connectase reaction lacking BcPAP, as conducted in the state of the art; this is detailed in, inter alia, appended Example 4. A proline aminopeptidase from Flavobacterium meningosepticum (also referred to as FmPAP herein; exemplified in SEQ ID NO: 589 and GenBank accession number BAA19688.1) also increased product yield, similar to BcPAP (see appended Example 9). Thus, it is demonstrated herein that structurally diverse proline aminopeptidases may be readily employed in the herein provided methods, resulting in increased product yield. Accordingly, the skilled person is aware that structurally different enzymes with comparable enzymatic activity may be employed in this invention. For example, the skilled person may readily identify alternative proline aminopeptidases to be employed in the context of the present invention. In particular, Enzyme Commission number EC 3.4.11.5 functionally classifies proline aminopeptidases. The skilled person can readily identify the respective amino acid sequences and nucleic acid sequences encoding the same. For example, the Swiss Bioinformatics Resource Portal (see, e.g., https://www.expasy.org/) from the Swiss Institute of Bioinformatics provides such sequence information (in particular, for amino acid sequences, see the UniProtKB / Swiss-Prot information) on enzymes as classified by Enzyme Commission numbers. It is noted that the enzymes as classified by Enzyme Commission numbers on the Swiss Bioinformatics Resource Portal are also functionally characterized.
For example, BcPAP can be found under entry “P46541, PIP_HEYCO” and FmPAP can be found under entry “O05420, PIP_ELIME” at the Swiss Bioinformatics Resource Portal. Accordingly, the skilled person can readily identify further amino acid sequences of proline aminopeptidases to be employed in (step (ii) of) the herein provided methods. The proline aminopeptidases employed herein may preferably be bacterial proline aminopeptidases.
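Why depleting the first cleavage product raises the yield can be illustrated with a deliberately simplified mass-action model in Python. Because the fusion polypeptide can again be recognized and cleaved by the Connectase, and the first cleavage product can again act as a substrate, the ligation is modeled here as reversible with equal forward and reverse rate constants, which caps the yield at 50% for equimolar substrates (the limitation noted for reversible enzymatic ligation). An irreversible modification step, for example removal of the N-terminal proline by a proline aminopeptidase, drains the first cleavage product and pulls the reaction forward. The model, rate constants, time scale and function name are hypothetical and not fitted to the Examples.

```python
def simulate(k_mod: float, k: float = 1.0, t_end: float = 100.0, dt: float = 0.01) -> float:
    """Fraction of first substrate converted to fusion polypeptide at t_end.

    Simplified mass-action model (assumption, not fitted to any data):
        S1 + S2 <-> F + C1   (forward and reverse rate constant k, i.e. K = 1)
        C1 -> C1_mod         (rate k_mod; C1_mod is no longer a Connectase substrate)
    Concentrations are in arbitrary units; S1 = S2 = 1 at t = 0.
    Integrated with a simple explicit Euler scheme.
    """
    s1 = s2 = 1.0
    f = c1 = 0.0
    for _ in range(int(round(t_end / dt))):
        net = k * s1 * s2 - k * f * c1  # net forward ligation rate
        mod = k_mod * c1                # irreversible modification of C1
        s1 -= net * dt
        s2 -= net * dt
        f += net * dt
        c1 += (net - mod) * dt
    return f


print(f"without modification: {simulate(k_mod=0.0):.2f}")  # 0.50
print(f"with modification:    {simulate(k_mod=1.0):.2f}")
```

In the first case the reaction settles at equilibrium; in the second, the fusion fraction keeps rising toward completion because the reverse reaction loses its substrate.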
[0208] Further, e.g., a first cleavage product comprising a cysteine residue as N-terminal amino acid and a second substrate comprising a proline residue (or another amino acid residue selected from alanine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, or tyrosine) as N-terminal amino acid may be particularly useful in the context of the present invention when employed in the herein provided methods in combination, e.g., with a modifying chemical such as 2-formylphenylboronic acid (2-FPBA; CAS No.: 40138-16-7), which specifically modifies N-terminal cysteine residues. This is illustratively shown in Example 10 herein below.
[0209] The above exemplified, and further herein envisaged combinations of modifying enzymes or modifying chemicals in combination with suitable N-terminal amino acids (as comprised in the first cleavage product or the second substrate polypeptide) are summarized in Table 1, below.
[0210] In the context of the herein provided methods, in step (ii) said partial N-terminal DUF2121 recognition motif may be modified by one or more enzyme(s) and / or one or more chemical(s), preferably by an enzyme or a chemical.
Table 1: Suitable combinations of modifying enzymes / modifying chemicals, N-terminal amino acids in the second substrate polypeptide, and N-terminal amino acids in the first cleavage product to be employed in the herein provided methods.
The modifying enzyme to be employed in step (ii) may be selected from the group consisting of a peptidase and an N-acetyltransferase. Said peptidase may be an exopeptidase. The modifying enzyme to be employed in step (ii) preferably has substrate specificity for said first cleavage product, or for the N-terminal amino acid residue of said first cleavage product. The term “substrate specificity”, in the context of the present invention, refers to the ability of a (modifying) enzyme or (modifying) chemical agent employed in step (ii) to selectively recognize and catalyze a reaction on the N-terminal amino acid residue of the first cleavage product, thereby modifying it, while exhibiting substantially no activity towards the N-terminal amino acid residue of the second substrate polypeptide under the same reaction conditions (i.e., without modifying said second substrate polypeptide).
[0211] Accordingly, the N-terminal amino acid residue of said second substrate polypeptide is preferably not a substrate for said modifying enzyme to be employed in step (ii). Accordingly, said modifying enzyme to be employed in step (ii) of the herein provided methods (or the herein provided modifying chemical to be employed in step (ii) of the herein provided methods) does not modify and / or is not able to modify said second substrate polypeptide.
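The specificity requirement of paragraphs [0210] and [0211] amounts to a simple rule: the modifying agent must act on the N-terminal residue of the first cleavage product but not on that of the second substrate polypeptide. The Python sketch below encodes this rule for the agent/residue pairings described in this section, using one-letter amino acid codes. The lookup table is a simplified, hypothetical stand-in for Table 1: it assumes each agent acts only on the residues listed and, following paragraph [0225], treats the N-acetyltransferase as acting on every standard residue except proline. Real enzymes may be more selective.

```python
# One-letter codes of the 20 standard proteinogenic amino acids.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")

# Simplified, hypothetical stand-in for Table 1: N-terminal residues each agent is
# assumed to modify (cleave off, acetylate, or chemically cap).
MODIFIER_TARGETS = {
    "proline aminopeptidase": frozenset("P"),
    "alanine aminopeptidase": frozenset("A"),
    "N-acetyltransferase": STANDARD_AA - {"P"},  # all except proline, per [0225]
    "2-FPBA": frozenset("C"),
    "aldehyde + organoboronic acid": frozenset("P"),
}


def is_compatible(modifier: str, first_cleavage_nterm: str, second_substrate_nterm: str) -> bool:
    """True if the agent modifies the first cleavage product's N-terminal residue
    but leaves the second substrate polypeptide's N-terminal residue untouched."""
    targets = MODIFIER_TARGETS[modifier]
    return first_cleavage_nterm in targets and second_substrate_nterm not in targets
```

For example, `is_compatible("proline aminopeptidase", "P", "A")` is true, whereas `is_compatible("N-acetyltransferase", "A", "S")` is false because the serine of the second substrate would be acetylated as well; in this model, proline is the only second-substrate N-terminus compatible with an N-acetyltransferase.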
[0212] Preferably, the modifying enzyme / the peptidase / the exopeptidase is an aminopeptidase. Aminopeptidases (Enzyme Commission number: EC 3.4.11) may be specific for a certain N-terminal amino acid (for example, proline aminopeptidases may specifically cleave N-terminal proline residues, and alanine aminopeptidases may specifically cleave N-terminal alanine residues). Table 1 provides exemplary aminopeptidases that may function as a modifying enzyme in the context of the present invention.
[0213] Accordingly, said modifying enzyme to be employed in step (ii) may be capable of cleaving off at least the N-terminal amino acid residue of the first cleavage product, preferably only the N-terminal amino acid residue of the first cleavage product.
[0214] In a particularly preferred embodiment, the exopeptidase to be employed in the context of the present invention is a proline aminopeptidase, and the N-terminal amino acid of said first cleavage product is proline. This is further illustrated in Table 1. The term “proline aminopeptidase” may be used interchangeably herein with the terms “proline iminopeptidase”, “prolyl aminopeptidase”, and “prolyl iminopeptidase”.
[0215] The proline aminopeptidase may comprise an amino acid sequence as defined in any one of SEQ ID NO: 52 and 589 to 591, or the proline aminopeptidase may comprise an amino acid sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or up to about 100% sequence identity to a sequence as defined in any one of SEQ ID NO: 52 and 589 to 591 and comprises proline aminopeptidase activity.
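The identity thresholds recited here (from at least about 60% upward) presuppose a way of computing percent sequence identity. Because several conventions exist, the following Python sketch adopts one of them as an assumption: identical residues in a given pairwise alignment divided by the total alignment length, gap columns included. How the alignment itself is produced (alignment algorithm, scoring matrix, gap penalties) lies outside the sketch, and any definition of sequence identity given elsewhere in this application would take precedence.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: identical, non-gap columns divided by all alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 60.0) -> bool:
    """Check an aligned candidate against an identity threshold such as 'at least about 60%'."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

A strict `>=` comparison does not capture the tolerance implied by "about" in the text; the threshold is therefore left as a parameter.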
[0216] The proline aminopeptidase may be a proline aminopeptidase from Flavobacterium, preferably from Flavobacterium meningosepticum. Flavobacterium meningosepticum may also be referred to as Elizabethkingia meningoseptica or Chryseobacterium meningosepticum. Accordingly, the proline aminopeptidase may be a proline aminopeptidase from Elizabethkingia or Chryseobacterium, preferably from Elizabethkingia meningoseptica or Chryseobacterium meningosepticum. Preferably, the proline aminopeptidase comprises an amino acid sequence as defined in SEQ ID NO: 589 or the proline aminopeptidase comprises an amino acid sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or up to about 100% sequence identity to SEQ ID NO: 589 and comprises proline aminopeptidase activity.
[0217] The proline aminopeptidase may be a proline aminopeptidase from Bacillus, preferably from Bacillus coagulans. Bacillus coagulans may also be referred to as Heyndrickxia coagulans or Weizmannia coagulans. Accordingly, the proline aminopeptidase may be a proline aminopeptidase from Heyndrickxia or Weizmannia, preferably from Heyndrickxia coagulans or Weizmannia coagulans. Preferably, the proline aminopeptidase comprises an amino acid sequence as defined in SEQ ID NO: 52 or the proline aminopeptidase comprises an amino acid sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or up to about 100% sequence identity to SEQ ID NO: 52 and comprises proline aminopeptidase activity.
[0218] As indicated for example in Table 1, when employing a proline aminopeptidase (that preferably specifically cleaves off proline), it is preferred that the N-terminal amino acid of the second substrate polypeptide is not proline (and is thus not cleaved by said proline aminopeptidase). In the context of the herein provided methods employing proline aminopeptidases, the N-terminal amino acid of said second substrate polypeptide may be selected from alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, tryptophan, or tyrosine, preferably alanine, cysteine, serine, or valine, more preferably alanine. In this context, the N-terminal amino acid of said second substrate polypeptide may also be selected from alanine, cysteine, serine, valine, tryptophan, or methionine, preferably alanine, cysteine, serine, or valine, more preferably alanine.
[0219] In a further preferred embodiment, the exopeptidase (to be employed in step (ii) of the herein provided methods) is an alanine aminopeptidase, and the N-terminal amino acid of said first cleavage product is alanine. Alanine aminopeptidases (herein also interchangeably referred to as “alanyl aminopeptidases”) are functionally grouped under Enzyme Commission number: EC 3.4.11.2. Accordingly, and as detailed above, the skilled person can readily identify alanine aminopeptidases to be employed in the context of the herein provided methods, for example via the Swiss Bioinformatics Resource Portal.
[0220] The alanine aminopeptidase may comprise an amino acid sequence as defined in any one of SEQ ID NO: 592 and 593, or the alanine aminopeptidase may comprise an amino acid sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or up to about 100% sequence identity to a sequence as defined in any one of SEQ ID NO: 592 and 593 and comprises alanine aminopeptidase activity.
[0221] As indicated for example in Table 1, when employing an alanine aminopeptidase (that preferably specifically cleaves off alanine) it is preferred that the N-terminal amino acid of said second substrate polypeptide is not alanine. In the context of the herein provided methods employing alanine aminopeptidases, the N-terminal amino acid of said second substrate polypeptide is selected from proline, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, tryptophan, or tyrosine, preferably proline, cysteine, serine, or valine, more preferably proline. In this context, the N-terminal amino acid of said second substrate polypeptide may also be selected from proline, cysteine, serine, valine, tryptophan, or methionine, preferably proline, cysteine, serine, or valine, more preferably proline.
[0222] In a further preferred embodiment, the modifying enzyme to be employed in step (ii) is an N-acetyltransferase. The terms “N-acetyltransferase” and “amino-acid N-acetyltransferase” may be used interchangeably herein. N-acetyltransferases (as functionally grouped under Enzyme Commission number: EC 2.3.1) can transfer acetyl groups to the N-terminus of amino acids (such as to the amino terminus of amino acids located at the N-terminus of polypeptides, such as the first substrate polypeptide). Acetylation of the N-terminal amino acid of the first cleavage product modifies it, thereby depleting the first cleavage product from the pool of reactants. The examples show the effect of N-terminal acetylation of, e.g., an N-terminal alanine residue of the first cleavage product on the product yields of Connectase reactions (see in particular Examples 3 and 4). The skilled person can readily identify amino-acid N-acetyltransferases (for example via the Swiss Bioinformatics Resource Portal and EC 2.3.1) to be employed in the context of the herein provided methods.
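Why depleting the first cleavage product raises the yield can be seen in a deliberately simplified mass-action model. The Python sketch below treats the Connectase reaction as a single reversible exchange, first substrate + second substrate ⇌ fusion product + first cleavage product, with equal forward and reverse rate constants (an assumption reflecting that the newly formed junction is again a full recognition motif). The modifying agent is modelled as an irreversible first-order removal of the first cleavage product. The enzyme-bound intermediate and all real rate constants are ignored, and the parameter values are arbitrary.

```python
def simulate_yield(k: float = 1.0, k_mod: float = 0.0,
                   t_end: float = 50.0, dt: float = 1e-3) -> float:
    """Fraction of second substrate converted to fusion product (toy model, Euler integration).

    s1: first substrate, s2: second substrate, p: fusion product,
    c1: first cleavage product still able to re-enter the reaction.
    """
    s1 = s2 = 1.0  # equimolar substrates (arbitrary units)
    p = c1 = 0.0
    for _ in range(int(round(t_end / dt))):
        v = k * s1 * s2 - k * p * c1  # net ligation flux; same k in both directions (assumption)
        m = k_mod * c1                # irreversible modification of the first cleavage product
        s1 -= v * dt
        s2 -= v * dt
        p += v * dt
        c1 += (v - m) * dt
    return p


no_modifier = simulate_yield()             # reversible reaction only
with_modifier = simulate_yield(k_mod=1.0)  # first cleavage product drained away
```

Without a modifying agent (`k_mod = 0`), the model settles at about half conversion, mirroring the roughly 50% limit of existing Connectase reactions; with `k_mod > 0`, the product fraction keeps rising toward completion because the reverse reaction loses its substrate.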
[0223] The N-acetyltransferase may comprise an amino acid sequence as defined in any one of SEQ ID NO: 594 and 595, or the N-acetyltransferase may comprise an amino acid sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or up to about 100% sequence identity to a sequence as defined in any one of SEQ ID NO: 594 and 595 and comprises N-acetyltransferase activity.
[0224] In the context of the herein provided methods employing N-acetyltransferases, said N-acetyltransferase is capable of acetylating the N-terminal amino acid residue of the first cleavage product, preferably only the N-terminal amino acid residue of the first cleavage product. Accordingly, in accordance with the present invention, the N-acetyltransferase may be unable to acetylate the N-terminal amino acid residue of the second substrate polypeptide.
[0225] Given the above-mentioned peculiar structure of the N-terminus of proline residues, N-acetyltransferases may acetylate the N-terminus of all standard proteinogenic amino acids except proline. In other words, the N-acetyltransferase may not acetylate the N-terminus of proline. Accordingly, the N-terminal amino acid of said first cleavage product is preferably not proline. The N-terminal amino acid of said first cleavage product may be selected from the group consisting of alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, tryptophan, and tyrosine, preferably alanine or cysteine, more preferably alanine. The N-terminal amino acid of said first cleavage product may also be selected from the group consisting of alanine, cysteine, serine, valine, tryptophan, and methionine.
[0226] In the context of the herein provided methods employing N-acetyltransferases (for modifying the N-terminus of the first cleavage product), the N-terminal amino acid of said second substrate polypeptide preferably is proline.
[0227] In the context of the present invention, said polypeptide comprising the N-terminal DUF2121 domain and the modifying enzyme to be employed in step (ii) may be covalently linked.
[0228] In the context of the present invention, when covalently linked, the polypeptide comprising an N-terminal DUF2121 domain and the modifying enzyme to be employed in step (ii) may collectively be referred to herein as a “fusion enzyme”. The polypeptide comprising an N-terminal DUF2121 domain and the modifying enzyme to be employed in step (ii) may be covalently linked via a linker. Such linkers are not particularly limited in the context of the present invention as long as they do not hinder / substantially reduce / abolish the enzymatic activity of the linked enzymes. The skilled person is aware of suitable linkers for linking / fusing / coupling two polypeptides / enzymes / proteins. Said linker may be a polypeptide linker, a polyethylene glycol linker, or an alkyl linker. Exemplary polypeptide linkers routinely comprise multiple small amino acids, such as one or more glycine residues and / or one or more serine residues. However, alternative linkers are also envisaged herein. Said polypeptide linker may comprise a sequence selected from GGGGS (SEQ ID NO: 586), GGGGSGGGGS (SEQ ID NO: 587), or GGGGSGGGGSGGGGS (SEQ ID NO: 588). A preferred polypeptide linker in the context of the herein provided polypeptide comprising an N-terminal DUF2121 domain and the modifying enzyme is GSGSGSGSG (SEQ ID NO: 606).
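The fusion-enzyme layout can be sketched at the sequence level. Because the polypeptide must carry an N-terminal DUF2121 domain with an N-terminal serine or threonine, the Python sketch below places the Connectase part first, followed by the linker and then the modifying enzyme. This ordering is an inference from that requirement, not a construct disclosed here. The linker strings are those of SEQ ID NOs: 586 to 588 and 606; the helper names are hypothetical, and the enzyme sequences in the usage line are short placeholders, not real sequences.

```python
PREFERRED_LINKER = "GSGSGSGSG"  # SEQ ID NO: 606


def g4s_linker(repeats: int) -> str:
    """(GGGGS)n linker; n = 1, 2, 3 give SEQ ID NOs: 586, 587 and 588."""
    if repeats < 1:
        raise ValueError("at least one repeat required")
    return "GGGGS" * repeats


def build_fusion_enzyme(connectase: str, modifier: str, linker: str = PREFERRED_LINKER) -> str:
    """Connectase, then linker, then modifying enzyme, keeping the Connectase N-terminus free."""
    if not connectase or connectase[0] not in "ST":
        raise ValueError("Connectase part must begin with serine or threonine")
    return connectase + linker + modifier


# Hypothetical placeholder sequences, for illustration only.
construct = build_fusion_enzyme("SXXXX", "MXXXX")
```

The start-residue check encodes only the serine/threonine requirement stated above; it says nothing about whether a given linker preserves the activity of both enzymes, which would have to be tested.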
[0229] The present invention further provides for modifying chemicals to be employed in step (ii) of the herein provided methods. As mentioned above, suitable combinations of such modifying chemicals and N-terminal amino acids in the second substrate polypeptide and N-terminal amino acids in the first cleavage products to be employed in the herein provided methods are summarized in Table 1. Accordingly, a 'modifying chemical' is any chemical compound or composition of matter capable of performing the 'modifying' step as defined herein. Non-limiting examples include benzaldehydes comprising an ortho-boronic acid substituent like 2-FPBA, and compositions comprising an aldehyde and an organoboronic acid.
[0230] The present invention is also illustrated in one specific, yet not limiting, embodiment employing a specific Connectase of M. mazei (i.e., the Connectase as shown in SEQ ID NO: 161) and (a) modifying chemical(s). Accordingly, the inventive method for the ligation of two polypeptides, the method for the cyclization of a polypeptide or the method for the immobilization of a polypeptide may comprise the following steps (i) to (iii):
[0231] (i) contacting a first substrate polypeptide [e.g., A-ELASKDCGAFDADPLVVEI (SEQ ID NO: 597)] with a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue [e.g., a Connectase from M. mazei as shown in, e.g., SEQ ID NO: 161], thereby cleaving said first substrate polypeptide into a first cleavage product [e.g., CGAFDADPLVVEI (SEQ ID NO: 5)] and a second cleavage product [e.g., A-ELASKD (SEQ ID NO: 46)];
[0232] (ii) modifying a partial N-terminal DUF2121 recognition motif [e.g., CGAFDADPLVVEI] comprised in said first cleavage product such that said modified partial N-terminal DUF2121 recognition motif is not recognized by said polypeptide comprising an N-terminal DUF2121 domain, preferably wherein said partial N-terminal DUF2121 recognition motif is modified by a benzaldehyde comprising an ortho-boronic acid substituent [e.g., 2-Formylphenylboronic acid]; and
[0233] (iii) contacting said second cleavage product and / or said polypeptide comprising an N-terminal DUF2121 domain with a second substrate polypeptide [e.g., PGAFDADPLVVEI-B (SEQ ID NO: 16)], whereby said second cleavage product is fused to said second substrate polypeptide [e.g., resulting in the fusion polypeptide A-ELASKDPGAFDADPLVVEI-B (SEQ ID NO: 35)].
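The bookkeeping of steps (i) to (iii) in this example can be traced at the string level. In the Python sketch below, the split of the recognition motif into an N-terminal part ELASKD, one variable position, and a C-terminal part GAFDADPLVVEI is inferred from the example sequences (SEQ ID NOs: 597, 5, 46, 16 and 35); "A-" and "-B" stand for the attached moieties as in the text. The 2-FPBA modification is represented only by a hypothetical "2FPBA-" prefix that makes the fragment unrecognizable; no chemistry or enzyme kinetics is modelled, and all function names are illustrative.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
MOTIF_N = "ELASKD"        # N-terminal part of the motif (inferred from the example)
MOTIF_C = "GAFDADPLVVEI"  # C-terminal part after the variable position (inferred)


def cleave(first_substrate: str) -> tuple[str, str]:
    """Step (i): split after MOTIF_N into (second cleavage product, first cleavage product)."""
    cut = first_substrate.index(MOTIF_N) + len(MOTIF_N)
    return first_substrate[:cut], first_substrate[cut:]


def recognized(fragment: str) -> bool:
    """Unmodified partial motif: one free standard residue followed by MOTIF_C."""
    return (len(fragment) > len(MOTIF_C)
            and fragment[0] in STANDARD_AA
            and fragment[1:1 + len(MOTIF_C)] == MOTIF_C)


def modify_with_2fpba(fragment: str) -> str:
    """Step (ii): cap an N-terminal cysteine (hypothetical marker); other N-termini are untouched."""
    return "2FPBA-" + fragment if fragment.startswith("C") else fragment


def ligate(second_cleavage: str, second_substrate: str) -> str:
    """Step (iii): join the second cleavage product to a recognized second substrate."""
    if not recognized(second_substrate):
        raise ValueError("second substrate is not recognized")
    return second_cleavage + second_substrate


second_cleavage, first_cleavage = cleave("A-ELASKDCGAFDADPLVVEI")
blocked = modify_with_2fpba(first_cleavage)
product = ligate(second_cleavage, modify_with_2fpba("PGAFDADPLVVEI-B"))
```

The point of the example is visible in the two calls to `modify_with_2fpba`: the cysteine-terminated first cleavage product is blocked, while the proline-terminated second substrate passes through unchanged and can still be ligated.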
[0234] Example 10 shows that modifying chemicals such as those included in Table 1 and further detailed herein below can be effectively employed to modify the N-terminal amino acid of the first cleavage product without modifying the second substrate polypeptide (and thus to increase the product yield of Connectase reactions). The modifying chemical to be employed in step (ii) may (specifically) modify the N-terminal amino acid residue of the first cleavage product. In that case, said modifying chemical preferably does not modify, and / or is not capable of modifying, the N-terminal amino acid residue of the second substrate polypeptide.
[0235] The prior art provides various chemicals that may be employed as modifying chemicals in step (ii) of the herein provided methods. In particular, Bandyopadhyay et al. (2016) Chem. Sci. 7(7):4589-4593 describes a benzaldehyde comprising an ortho-boronic acid substituent (i.e., 2-Formylphenylboronic acid). According to Bandyopadhyay et al. (2016), 2-Formylphenylboronic acid (2-FPBA; CAS No.: 40138-16-7) can be used to modify N-terminal cysteine residues. Further, Sim et al. (2020) Chem. Sci. 11:53-61 describes a composition comprising an aldehyde and an organoboronic acid, which may be employed to modify N-terminal proline residues. Both Bandyopadhyay et al. (2016) and Sim et al. (2020) are herewith incorporated by reference in their entirety. Accordingly, the modifying chemical (to be employed in step (ii) of the herein provided methods) may be a benzaldehyde comprising an ortho-boronic acid substituent, or a composition comprising an aldehyde and an organoboronic acid, preferably a benzaldehyde comprising an ortho-boronic acid substituent. Accordingly, “a chemical” or “a modifying chemical” may in the context of the present invention also refer to a composition comprising one or more components (such as a composition comprising an aldehyde and an organoboronic acid).
[0236] The benzaldehyde comprising an ortho-boronic acid substituent may be 2-Formylphenylboronic acid (2-FPBA); the aldehyde may be salicylaldehyde, 2-pyridinecarbaldehyde, glyoxylic acid, or 3-hydroxy-2-pyridinecarbaldehyde; and / or the organoboronic acid may be phenylboronic acid or para-methoxyboronic acid. All of these exemplified modifying chemicals to be employed in step (ii) of the present invention are readily commercially available. The skilled person can readily obtain them via the following CAS numbers: 2-Formylphenylboronic acid (2-FPBA; CAS No.: 40138-16-7), Salicylaldehyde (CAS No.: 90-02-8), 2-pyridinecarbaldehyde (CAS No.: 1121-60-4), Glyoxylic acid (CAS No.: 79-14-1), 3-hydroxy-2-pyridinecarbaldehyde (CAS No.: 1849-55-4), Phenylboronic acid (CAS No.: 98-80-6), and Para-methoxyboronic acid (i.e., 4-methoxyphenylboronic acid; CAS No.: 5720-07-0).
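Since the modifying chemicals are identified by CAS Registry Numbers, a transcription error in such a number can be caught with the CAS check-digit rule: each digit before the final one is multiplied by its position counted from the right (starting at 1), and the sum modulo 10 must equal the final digit. The Python sketch below implements that rule; the function name is hypothetical. A passing check only rules out common typing errors and cannot confirm that a number belongs to the intended chemical.

```python
def cas_is_valid(cas: str) -> bool:
    """Validate the format and check digit of a CAS Registry Number such as '40138-16-7'."""
    parts = cas.replace(" ", "").split("-")  # tolerate stray spaces, e.g. '90-02- 8'
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        return False
    first, second, check = parts
    if not (2 <= len(first) <= 7 and len(second) == 2 and len(check) == 1):
        return False
    body = first + second
    total = sum(position * int(digit)
                for position, digit in enumerate(reversed(body), start=1))
    return total % 10 == int(check)


# CAS numbers as listed in the paragraph above.
LISTED_CAS = ["40138-16-7", "90-02-8", "1121-60-4", "79-14-1",
              "1849-55-4", "98-80-6", "5720-07-0"]
all_valid = all(cas_is_valid(number) for number in LISTED_CAS)
```

For instance, for 90-02-8 the weighted sum is 2·1 + 0·2 + 0·3 + 9·4 = 38, whose last digit matches the check digit 8.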
[0237] In a further particularly preferred embodiment, the modifying chemical (to be employed in step (ii) of the herein provided methods) is a benzaldehyde comprising an ortho-boronic acid substituent, preferably 2-Formylphenylboronic acid (2-FPBA); the N-terminal amino acid residue of said first cleavage product is cysteine; and the N-terminal amino acid residue of said second substrate polypeptide preferably is not cysteine, more preferably it is selected from the group consisting of proline, alanine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, tryptophan, and tyrosine. In this context, the N-terminal amino acid residue of said second substrate polypeptide may also be selected from the group consisting of proline, alanine, serine, valine, tryptophan, and methionine.
[0238] It may be further preferred that the modifying chemical (to be employed in step (ii) of the herein provided methods) is a benzaldehyde comprising an ortho-boronic acid substituent, preferably 2-Formylphenylboronic acid (2-FPBA); the N-terminal amino acid residue of said first cleavage product is cysteine; and the N-terminal amino acid residue of said second substrate polypeptide preferably is not cysteine, more preferably it is selected from the group consisting of proline, alanine, serine, and valine.
[0239] It may be further preferred that the modifying chemical (to be employed in step (ii) of the herein provided methods) is a benzaldehyde comprising an ortho-boronic acid substituent, preferably 2-Formylphenylboronic acid (2-FPBA); the N-terminal amino acid residue of said first cleavage product is cysteine; and the N-terminal amino acid residue of said second substrate polypeptide preferably is not cysteine, more preferably it is selected from the group consisting of proline and alanine, and most preferably it is alanine.
[0240] In a further preferred embodiment, the modifying chemical is a composition comprising an aldehyde and an organoboronic acid, preferably wherein the aldehyde is salicylaldehyde, 2-pyridinecarbaldehyde, glyoxylic acid, or 3-hydroxy-2-pyridinecarbaldehyde and the organoboronic acid is phenylboronic acid or para-methoxyboronic acid; the N-terminal amino acid residue of said first cleavage product is proline; and preferably the N-terminal amino acid residue of said second substrate polypeptide is not proline, more preferably it is selected from the group consisting of alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, tryptophan, and tyrosine. In this context, the N-terminal amino acid residue of said second substrate polypeptide may also be selected from the group consisting of alanine, cysteine, serine, valine, tryptophan, and methionine.
[0241] It may be further preferred that the modifying chemical (to be employed in step (ii) of the herein provided methods) is a composition comprising an aldehyde and an organoboronic acid, preferably wherein the aldehyde is salicylaldehyde, 2-pyridinecarbaldehyde, glyoxylic acid, or 3-hydroxy-2-pyridinecarbaldehyde and the organoboronic acid is phenylboronic acid or para-methoxyboronic acid; the N-terminal amino acid residue of said first cleavage product is proline; and preferably the N-terminal amino acid residue of said second substrate polypeptide is not proline, more preferably it is selected from the group consisting of alanine, cysteine, serine, and valine. It may be further preferred that the modifying chemical (to be employed in step (ii) of the herein provided methods) is a composition comprising an aldehyde and an organoboronic acid, preferably wherein the aldehyde is salicylaldehyde, 2-pyridinecarbaldehyde, glyoxylic acid, or 3-hydroxy-2-pyridinecarbaldehyde and the organoboronic acid is phenylboronic acid or para-methoxyboronic acid; the N-terminal amino acid residue of said first cleavage product is proline; and preferably the N-terminal amino acid residue of said second substrate polypeptide is not proline, more preferably it is alanine.
[0242] In the context of the present invention, it was surprisingly found that a second substrate polypeptide (e.g., comprising an N-terminal alanine residue) may be acetylated by (naturally occurring) N-acetyltransferases when expressed in vivo (see Example 4). As mentioned above, N-terminal acetylation of Connectase substrates may remove them from the pool of reactants. In other words, if a Connectase substrate (such as a second substrate polypeptide) is N-acetylated, it may not be processed / fused / coupled by Connectases. Accordingly, in the context of the present invention, acetylation of the N-terminal amino acid of a second substrate polypeptide is undesired. To overcome this difficulty, it was further surprisingly found that N-terminal acetylation of a second substrate polypeptide can be effectively avoided when said second substrate polypeptide further comprises one or more N-terminal protecting residue(s) (N-terminal to the partial N-terminal DUF2121 recognition sequence). This is particularly useful when, e.g., the second substrate polypeptide is expressed in vivo. Accordingly, in such embodiments, the second substrate polypeptide further comprises one or more N-terminal protecting residue(s). The N-terminal protecting residue(s) may be cleaved off before step (i), or during step (i) and / or step (ii).
[0243] An 'N-terminal protecting residue' is one or more amino acid residues added to the N-terminus of a second substrate polypeptide, upstream of the partial DUF2121 recognition motif. Its function is to prevent undesired modification of the recognition motif, such as N-terminal acetylation, for example during expression in a host cell. These protecting residues are preferably substrates for the modifying enzyme (e.g., proline for a proline aminopeptidase), allowing for their removal during the ligation reaction to expose the desired / adequate / correct N-terminus for fusion. The one or more N-terminal protecting residue(s) may be identical to the N-terminal residue of the first cleavage product. In particular, in the context of the herein provided methods employing peptidases (such as, e.g., BcPAP or other proline aminopeptidases), it may be especially advantageous to employ one or more N-terminal protecting residue(s) that are identical to the N-terminal residue of the first cleavage product. When employing, e.g., BcPAP, such an N-terminal protecting residue may preferably comprise or consist of one or more proline residues, and the N-terminal amino acid residue of the first cleavage product may also preferably be a proline residue. In this example, BcPAP would be capable of cleaving off both the N-terminal protecting residue of the second substrate polypeptide (thereby allowing said second substrate polypeptide to be fused / coupled / linked to a second cleavage product by Connectases) and the N-terminal amino acid residue of the first cleavage product (thereby depleting the first cleavage product from the pool of reactants). This exemplary approach can be generalized to any suitable combination of peptidases (in particular, e.g., aminopeptidases) and N-terminal amino acids of the first cleavage product (such as, e.g., an alanine aminopeptidase and alanine as the N-terminal amino acid of the first cleavage product).
This innovative approach is illustrated in Example 4.
[0244] Accordingly, it is preferred that the modifying enzyme to be employed in step (ii) is capable of cleaving and / or cleaves said one or more N-terminal protecting residue(s) off the second substrate polypeptide, preferably wherein said modifying enzyme is an aminopeptidase as defined herein above or below, more preferably a proline aminopeptidase.
[0245] In the context of the herein provided methods employing proline aminopeptidases, the N-terminal protecting residue(s) may be selected from SEQ ID NO: 50, 53, 54, or 55 or an N-terminal amino acid sequence consisting of P, PP, or PPP, with P being the most preferred option. In this context, also longer amino acid stretches are envisaged, as long as they only comprise proline residues.
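The dual role of the proline aminopeptidase when proline protecting residues are used (paragraphs [0243] to [0245]) can be illustrated with an idealized string model. The Python sketch below assumes that the enzyme removes N-terminal prolines one at a time until a non-proline residue is exposed, and reuses the motif split inferred from the example of paragraphs [0231] to [0233]. The second substrate carrying an N-terminal alanine behind a PPP protecting sequence is a hypothetical construct, "-B" again stands for an attached moiety, and the function names are illustrative.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
MOTIF_C = "GAFDADPLVVEI"  # C-terminal part of the partial motif (inferred from the example)


def proline_aminopeptidase(fragment: str) -> str:
    """Idealized exopeptidase: strip N-terminal prolines until another residue is exposed."""
    i = 0
    while i < len(fragment) and fragment[i] == "P":
        i += 1
    return fragment[i:]


def recognized(fragment: str) -> bool:
    """Partial motif intact: one standard residue followed by MOTIF_C."""
    return (len(fragment) > len(MOTIF_C)
            and fragment[0] in STANDARD_AA
            and fragment[1:1 + len(MOTIF_C)] == MOTIF_C)


protected_second_substrate = "PPP" + "AGAFDADPLVVEI-B"  # hypothetical construct
first_cleavage_product = "PGAFDADPLVVEI"               # N-terminal proline

exposed = proline_aminopeptidase(protected_second_substrate)  # protecting residues removed
depleted = proline_aminopeptidase(first_cleavage_product)     # variable position removed
```

In this model the protected substrate is not recognized until trimmed, the trimmed substrate `AGAFDADPLVVEI-B` becomes ligation-competent, and the first cleavage product loses its variable position and can no longer re-enter the reaction.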
[0246] The one or more N-terminal protecting residue(s) may also be a cleavable tag. The cleavable tag is not particularly limited as long as it can be cleaved off by adequate means. Accordingly, in the context of the herein provided methods, the removal of the cleavable tag from the second substrate polypeptide may comprise contacting said second substrate polypeptide comprising said cleavable tag with an enzyme capable of cleaving said cleavable tag off said second substrate polypeptide. In a preferred embodiment, the cleavable tag may comprise a Tobacco Etch Virus (TEV) protease cleavage site (as exemplified in SEQ ID NO: 50). Accordingly, said cleavable tag may comprise an amino acid sequence consisting of SEQ ID NO: 50, and the enzyme capable of cleaving said cleavable tag off said second substrate polypeptide may be Tobacco Etch Virus protease (TEV protease; see Enzyme Commission number: EC 3.4.22.44). An exemplary TEV protease sequence is shown in SEQ ID NO: 27.
[0247] The cleavable tag may further comprise an N-terminal affinity tag. The affinity tag may be selected from the group consisting of a Streptavidin tag, FLAG tag, HA tag, Myc tag, SUMO, and polyhistidine tag, preferably a polyhistidine tag.
[0248] As mentioned herein above, in the context of the herein provided method for the production of a circular polypeptide and / or in the context of a method for the cyclisation of a polypeptide, the first substrate polypeptide may be (covalently) linked to the second substrate polypeptide. In this context, it may be preferred that the N-terminus of said first substrate polypeptide or the N-terminus of said second cleavage product is covalently linked to the C-terminus of said second substrate polypeptide via a linker. Such linkers are not particularly limited in the context of the present invention as long as they do not interfere with the coupling / circularization reaction of this polypeptide (e.g., by interfering with the DUF2121 recognition sequences). The skilled person is aware of suitable linkers. Said linker may be a polypeptide linker, a polyethylene glycol linker, or an alkyl linker. Exemplary polypeptide linkers routinely comprise multiple small amino acids, such as one or more glycine residues and / or one or more serine residues. However, alternative linkers are also envisaged herein. Said polypeptide linker may comprise a sequence selected from GGGGS (SEQ ID NO: 586), GGGGSGGGGS (SEQ ID NO: 587), or GGGGSGGGGSGGGGS (SEQ ID NO: 588). Further envisaged herein is a polypeptide linker having the sequence GSGSGSGSG (SEQ ID NO: 606).
[0249] In the context of the herein provided method for the production of a circular polypeptide and / or in the context of a method for the cyclisation of a polypeptide, linking and / or fusing said second cleavage product to said second substrate polypeptide may comprise producing a circular polypeptide. The method may further comprise obtaining the produced circular polypeptide (for example by means detailed herein above).
[0250] In the context of the herein provided method for the production of a fusion polypeptide, it is envisaged herein that the first substrate polypeptide may be (covalently) linked / coupled / fused to the second substrate polypeptide (prior to the herein provided methods; e.g., prior to step (i) of the herein provided method). The means for linking the first substrate polypeptide to the second substrate polypeptide (prior to the herein provided methods) are not particularly limited as long as they do not interfere with the coupling / circularization reaction of these polypeptides (e.g., by interfering with the DUF2121 recognition sequences comprised therein). For example, the first and the second substrate polypeptides may be coupled (prior to the herein provided methods) via amino acid side chains (e.g., cysteine side chains) comprised therein. In this context, it is preferred that the N-terminus of said first substrate polypeptide or the N-terminus of said second cleavage product is covalently linked to the C-terminus of said second substrate polypeptide (optionally via a linker). The skilled person is aware that the herein provided method for the production of a fusion polypeptide may produce a circular polypeptide if (i) the N-terminus of said first substrate polypeptide or the N-terminus of said second cleavage product is covalently linked to the C-terminus of said second substrate polypeptide (optionally via a linker), and (ii) the C-terminus of the second cleavage product is further linked / fused / coupled to the N-terminus of the second substrate polypeptide in accordance with the present invention (i.e., using the herein provided methods).
Accordingly, any definition or specification mentioned herein above of how the first substrate polypeptide may be (covalently) linked to the second substrate polypeptide (e.g., in the context of the herein provided method for the production of a circular polypeptide) may also apply to the herein provided method for the production of a fusion polypeptide.
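The circularization condition of [0250] can be checked mechanically by treating each polypeptide as a linear segment and each covalent bond as a link from one segment's C-terminus to another segment's N-terminus. The sketch below is illustrative only: the segment labels (`CP2` for the second cleavage product, `S2` for the second substrate polypeptide) and the helper `is_circular` are hypothetical, and it models only end-to-end (C-to-N) links, not the side-chain couplings also mentioned above.

```python
def is_circular(segments, links):
    """Return True if the segments form one closed ring of C-to-N links.

    segments: labels of linear polypeptide segments (hypothetical names).
    links: (a, b) pairs meaning the C-terminus of a is bonded to the N-terminus of b.
    """
    next_of = {}
    for c_side, n_side in links:
        if c_side in next_of:
            raise ValueError(f"C-terminus of {c_side} is linked more than once")
        next_of[c_side] = n_side

    seen = []
    current = segments[0]
    while current not in seen:
        seen.append(current)
        if current not in next_of:
            return False  # a free C-terminus remains: the product is linear
        current = next_of[current]
    # Circular only if the walk closes on the start segment and covers every segment.
    return current == segments[0] and set(seen) == set(segments)


# Pre-existing link, condition (i): C-terminus of S2 to N-terminus of CP2.
pre_link = ("S2", "CP2")
# Connectase ligation, condition (ii): C-terminus of CP2 to N-terminus of S2.
ligation = ("CP2", "S2")

print(is_circular(["CP2", "S2"], [pre_link, ligation]))  # True
print(is_circular(["CP2", "S2"], [ligation]))            # False
```

Only when both conditions hold does the walk return to its starting segment; with the ligation alone, `S2` keeps a free C-terminus and the product is a linear fusion.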
[0251] As mentioned herein above, in the context of the herein provided method for the immobilization of a polypeptide and / or in the context of the herein provided method for the ligation of a polypeptide to a solid carrier, the N-terminus of a first substrate polypeptide or the C-terminus of a second substrate polypeptide may be immobilized on a solid carrier, or the method may comprise obtaining a first substrate polypeptide immobilized via its N-terminus, or a second substrate polypeptide immobilized via its C-terminus, on a solid carrier. If the first substrate polypeptide is fused to the solid carrier (via its N-terminus), the first substrate polypeptide or the second cleavage product may remain immobilized on said solid carrier during steps (0) and (i) to (iii). This method may fuse said immobilized second cleavage product and said second substrate polypeptide, thereby producing an immobilized polypeptide.
[0252] If the second substrate polypeptide is fused to the solid carrier (via its C-terminus), the second substrate polypeptide may remain immobilized on said solid carrier during steps (0) and (i) to (iii). This method may fuse said second cleavage product and said immobilized second substrate polypeptide, thereby producing an immobilized polypeptide.
[0253] As mentioned above, a Connectase reaction in the context of the present invention results in the ligation / fusion / coupling of two polypeptides, wherein said two polypeptides may be separated by a (complete) DUF2121 recognition sequence. Accordingly, in the context of the herein provided method for the immobilization of a polypeptide and / or in the context of any herein provided method, said fusion polypeptide / said polypeptide product / said circularized polypeptide / said immobilized polypeptide may comprise a (complete) DUF2121 recognition motif, preferably wherein said (complete) DUF2121 recognition motif is as defined anywhere herein above.
[0254] The method for the immobilization of a polypeptide and / or the method for the ligation of a polypeptide to a solid carrier may further comprise removing undesired reagents and / or contaminants from the immobilized polypeptide after step (iii), preferably by washing said immobilized polypeptide with a buffer. Such buffers are not particularly limited and can, for example, include phosphate buffers (e.g., derived from sodium or potassium phosphate salts like KH2PO4 and Na2HPO4, such as phosphate buffered saline), Tris (tris(hydroxymethyl)aminomethane), HEPES (4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid), MES (2-(N-morpholino)ethanesulfonic acid), MOPS (3-(N-morpholino)propanesulfonic acid), or acetate buffers (e.g., sodium acetate). A preferred buffer in this context is phosphate buffered saline (PBS). Such buffers may further comprise suitable concentrations of suitable detergents (such as Tween-20) that facilitate the removal of undesired reagents / contaminants but do not affect, e.g., the structure or function of the immobilized polypeptide. As demonstrated in Example 13, a suitable buffer may be PBS comprising 0.1% Tween-20. PBS can comprise 137 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4, and 1.8 mM KH2PO4.
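The PBS composition above can be converted into weighed amounts for a given volume. This is a convenience sketch under stated assumptions: it uses standard molar masses of the anhydrous salts (hydrated forms such as Na2HPO4·7H2O need their own molar mass), it assumes "0.1% Tween-20" means 0.1% (v/v), and the helper names are hypothetical.

```python
# Molar masses (g/mol) of the anhydrous salts; hydrated salts need their own values.
MOLAR_MASS_G_PER_MOL = {
    "NaCl": 58.44,
    "KCl": 74.55,
    "Na2HPO4": 141.96,
    "KH2PO4": 136.09,
}

# PBS composition from [0254], in mM.
PBS_MM = {"NaCl": 137.0, "KCl": 2.7, "Na2HPO4": 10.0, "KH2PO4": 1.8}


def salt_grams(recipe_mm, volume_l):
    """Grams of each salt for `volume_l` litres: (mM / 1000) * g/mol * L."""
    return {
        salt: conc_mm / 1000.0 * MOLAR_MASS_G_PER_MOL[salt] * volume_l
        for salt, conc_mm in recipe_mm.items()
    }


def detergent_ml(percent_v_v, volume_l):
    """Millilitres of detergent for a final % (v/v) concentration (v/v is assumed)."""
    return percent_v_v / 100.0 * volume_l * 1000.0


grams = salt_grams(PBS_MM, 1.0)
for salt, g in grams.items():
    print(f"{salt}: {g:.3f} g")
print(f"Tween-20: {detergent_ml(0.1, 1.0):.1f} mL")
```

For 1 L of wash buffer this comes to roughly 8.01 g NaCl, 0.20 g KCl, 1.42 g Na2HPO4 and 0.24 g KH2PO4, plus 1 mL Tween-20.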
[0255] In the context of the herein provided method for the production of a fusion polypeptide, it is envisaged herein that the N-terminus of the first substrate polypeptide may be immobilized on a solid carrier or that the C-terminus of the second substrate polypeptide may be immobilized on a solid carrier. The skilled person is aware that the herein provided method for the production of a fusion polypeptide may produce an immobilized polypeptide if (i) the N-terminus of the first substrate polypeptide or the C-terminus of the second substrate polypeptide is immobilized on a solid carrier, and (ii) the C-terminus of the second cleavage product is further linked / fused / coupled to the N-terminus of the second substrate polypeptide in accordance with the present invention (i.e., using the herein provided methods). Accordingly, any definition or specification mentioned herein above of how the first substrate polypeptide or the second substrate polypeptide may be immobilized on a solid carrier (e.g., in the context of the herein provided method for the immobilization of a polypeptide) may also apply to the herein provided method for the production of a fusion polypeptide.
[0256] The present invention further relates to a method for the purification of a polypeptide, wherein said method comprises steps (0) and (i) to (iii) of the method for the immobilization of a polypeptide and / or of the method for the ligation of a polypeptide to a solid carrier and further comprises the following step (iv):
[0257] (iv) contacting the immobilized polypeptide with a third substrate polypeptide, with said polypeptide comprising an N-terminal DUF2121 domain, and optionally with a modifying enzyme as defined anywhere herein above or with a modifying chemical as defined anywhere herein above.
[0258] If, in step (0) of the herein provided method, the first substrate polypeptide was immobilized on a solid carrier (or was obtained immobilized on a solid carrier) via its N-terminus, said third substrate polypeptide may be defined as the second substrate polypeptide (e.g., in this case, the third substrate polypeptide may comprise a partial N-terminal DUF2121 recognition sequence as defined anywhere herein above). If, in step (0) of the herein provided method, the second substrate polypeptide was immobilized on a solid carrier (or was obtained immobilized on a solid carrier) via its C-terminus, said third substrate polypeptide may be defined as the first substrate polypeptide (e.g., in this case, the third substrate polypeptide may comprise a complete DUF2121 recognition sequence as defined anywhere herein above).
[0259] Accordingly, if the second cleavage product is immobilized (e.g., to a solid carrier), the herein provided method for the production of a fusion polypeptide (linking / fusing said second substrate polypeptide and said second cleavage product) may further comprise the following step (iv):
[0260] (iv) contacting the immobilized polypeptide with a third substrate polypeptide, with said polypeptide comprising an N-terminal DUF2121 domain, and optionally with a modifying enzyme as defined anywhere herein above or with a modifying chemical as defined anywhere herein above. Such a step (iv) would free / uncouple the previously linked / fused second substrate polypeptide from the immobilized second cleavage product and couple said third substrate polypeptide to said immobilized second cleavage product. The third substrate polypeptide also comprises a partial N-terminal DUF2121 recognition sequence, similar to said first cleavage product or said second substrate polypeptide. Suitably, the N-terminal amino acid of said partial N-terminal DUF2121 recognition sequence comprised in said third substrate polypeptide differs from the N-terminal amino acid of the partial N-terminal DUF2121 recognition sequence comprised in said second substrate polypeptide, and the modifying enzyme or modifying chemical employed in step (iv) can modify the N-terminal amino acid of the partial N-terminal DUF2121 recognition sequence comprised in said second substrate polypeptide, thereby depleting it from the pool of reactants (similar to the herein provided methods). Suitable combinations of modifying enzymes or modifying chemicals and N-terminal amino acids can be found in Table 1.
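The selection rule of [0260] (the modifier must act on the N-terminal residue of the released second substrate polypeptide but not on that of the incoming third substrate polypeptide) can be expressed as a small filter. This is a hypothetical sketch: Table 1 is not reproduced here, and the lookup assumes only that each aminopeptidase named in this document removes the N-terminal residue its name refers to (proline or alanine), ignoring any cross-reactivity and any modifying chemicals.

```python
# Hypothetical lookup, NOT Table 1: assumes each aminopeptidase named in this
# document removes the N-terminal residue its name refers to.
MODIFIER_TARGET = {
    "proline aminopeptidase": "P",
    "alanine aminopeptidase": "A",
}


def compatible_modifiers(released_nterm, incoming_nterm):
    """Modifiers that deplete the released second substrate polypeptide
    (N-terminal residue `released_nterm`) while sparing the incoming third
    substrate polypeptide (N-terminal residue `incoming_nterm`)."""
    released, incoming = released_nterm.upper(), incoming_nterm.upper()
    if released == incoming:
        return []  # [0260] requires the two N-terminal residues to differ
    return sorted(
        name for name, target in MODIFIER_TARGET.items() if target == released
    )


print(compatible_modifiers("P", "A"))  # ['proline aminopeptidase']
print(compatible_modifiers("A", "P"))  # ['alanine aminopeptidase']
```

An empty result signals that the chosen pair of N-terminal residues cannot be resolved by the modifiers in the lookup, so the exchange would remain reversible.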
[0261] However, the present invention also envisages other means to uncouple an immobilized polypeptide from the solid carrier. For example, the addition of a modifying enzyme or modifying chemical is optional, as the mere addition of a polypeptide comprising an N-terminal DUF2121 domain as defined anywhere herein above and a suitable third substrate polypeptide would be sufficient to uncouple said second substrate polypeptide; for example, said third substrate polypeptide could be added in molar excess.
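Why a molar excess of the third substrate polypeptide can drive the exchange of [0261] without a modifying enzyme can be illustrated with a simple equilibrium model. The model is an assumption, not part of the source: the exchange I-S2 + S3 <=> I-S3 + S2 is treated as a single equilibrium with constant K, starting from 1 equivalent of immobilized I-S2, n equivalents of free S3 and no free S2. With x the exchanged fraction, K = x^2 / ((1 - x)(n - x)); for K = 1 this reduces to x = n / (n + 1), i.e. 50% at equimolar amounts (in line with the roughly 50% coupling of reversible enzymatic methods noted in [0004]) and about 91% at a 10-fold excess. The function name is hypothetical.

```python
import math


def exchanged_fraction(excess, K=1.0):
    """Equilibrium fraction of immobilized product carrying the third substrate.

    Assumed model: I-S2 + S3 <=> I-S3 + S2 with constant K, starting from
    1 equivalent of I-S2, `excess` equivalents of free S3 and no free S2.
    With x exchanged: K = x**2 / ((1 - x) * (excess - x)), i.e.
    (1 - K) x**2 + K (1 + excess) x - K * excess = 0.
    """
    if excess <= 0 or K <= 0:
        raise ValueError("excess and K must be positive")
    b = K * (1.0 + excess)
    disc = (K * (1.0 - excess)) ** 2 + 4.0 * K * excess
    # Physical root in a cancellation-free form; it also covers K == 1,
    # where the equation becomes linear and x = excess / (1 + excess).
    return 2.0 * K * excess / (b + math.sqrt(disc))


for n in (1, 2, 5, 10):
    print(f"{n:>2}-fold excess: {exchanged_fraction(n):.1%} exchanged")
```

Under this model the exchanged fraction approaches, but never reaches, 100%; removing the released S2 (the role of the modifying enzyme or chemical in [0260]) is what removes the back reaction.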
[0262] The present invention further provides for a fusion polypeptide, a circular polypeptide, an immobilized polypeptide, and / or a purified polypeptide, wherein said fusion polypeptide, said circular polypeptide, said immobilized polypeptide, and / or said purified polypeptide comprises a DUF2121 recognition sequence as defined in any one of SEQ ID NO: 579 to 582, wherein the amino acid in position 3 of any one of SEQ ID NO: 579 to 582 is not proline. As mentioned herein above, the present inventors have surprisingly found that the conserved proline residue in Connectase recognition sequences may be substituted by other amino acids without impeding the catalytic activity of Connectases. In the context of the herein provided fusion polypeptide, circular polypeptide, immobilized polypeptide, and / or purified polypeptide, the amino acid in position 3 of any one of SEQ ID NO: 579 to 582 may preferably be selected from the group consisting of alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, and tyrosine, preferably alanine, cysteine, serine, or valine, more preferably alanine or cysteine, most preferably alanine. In this context, the amino acid in position 3 of any one of SEQ ID NO: 579 to 582 may also be selected from the group consisting of alanine, cysteine, serine, valine, tryptophan, and methionine. The fusion polypeptide, the circular polypeptide, the immobilized polypeptide, and / or the purified polypeptide may be as defined in the context of the herein provided methods. The present invention further provides for a fusion polypeptide, a circular polypeptide, an immobilized polypeptide, and / or a purified polypeptide (directly) obtained and / or (directly) obtainable by the herein provided methods.
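The position-3 rule of [0262] can be written as a small validator. The sketch below is hypothetical: it does not reproduce SEQ ID NO: 579 to 582 (the caller supplies the motif string), it uses standard one-letter amino acid codes, and its preference tiers follow the order in which [0262] and [0264] name them.

```python
from typing import Optional

# One-letter codes for the position-3 residues named in [0262], from the
# narrowest (most preferred) to the broadest group listed in the text.
POSITION3_TIERS = [
    ("most preferred", set("A")),
    ("more preferred", set("AC")),
    ("preferred", set("ACSV")),
    ("narrower group", set("ACSVWM")),
    ("listed", set("ACSVWMGLFIDEHNQY")),
]


def position3_tier(motif: str) -> Optional[str]:
    """Return the narrowest tier containing residue 3 (1-based) of `motif`,
    or None if that residue is proline or not listed at all.
    `motif` is supplied by the caller; SEQ ID NO: 579 to 582 are not included."""
    if len(motif) < 3:
        raise ValueError("motif must have at least three residues")
    residue = motif[2].upper()
    if residue == "P":
        return None  # the conserved proline is excluded by [0262]
    for label, allowed in POSITION3_TIERS:
        if residue in allowed:
            return label
    return None
```

Residues not named in the text (threonine, lysine, arginine) return None, just as proline does, so the function only reports membership in the groups the paragraph actually lists.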
[0263] Accordingly, the present invention further provides for a polypeptide (e.g., as obtained or obtainable through the herein provided methods) comprising a DUF2121 recognition motif as defined in any one of SEQ ID NO: 579 to 582, with the amino acid in position 3 corresponding to alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, or tyrosine, wherein said DUF2121 recognition motif further comprises an amino acid sequence as defined in SEQ ID NO: 630 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) C-terminally adjacent to the sequence of SEQ ID NO: 579 to 582, and wherein said DUF2121 recognition motif further comprises an amino acid sequence as defined in SEQ ID NO: 629 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) N-terminally adjacent to the sequence of SEQ ID NO: 579 to 582.
[0264] Accordingly, the present invention further provides for a polypeptide (e.g., as obtained or obtainable through the herein provided methods) comprising a DUF2121 recognition motif as defined in any one of SEQ ID NO: 579 to 582, with the amino acid in position 3 corresponding to alanine, cysteine, serine, valine, tryptophan, or methionine, wherein said DUF2121 recognition motif further comprises an amino acid sequence as defined in SEQ ID NO: 630 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) C-terminally adjacent to the sequence of SEQ ID NO: 579 to 582, and wherein said DUF2121 recognition motif further comprises an amino acid sequence as defined in SEQ ID NO: 629 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) N-terminally adjacent to the sequence of SEQ ID NO: 579 to 582.
[0265] Accordingly, the present invention further provides for a polypeptide (e.g., as obtained or obtainable through the herein provided methods) comprising a DUF2121 recognition motif as defined in any one of SEQ ID NO: 579 to 582, with the amino acid in position 3 corresponding to alanine, cysteine, serine, or valine, wherein said DUF2121 recognition motif further comprises an amino acid sequence as defined in SEQ ID NO: 630 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) C-terminally adjacent to the sequence of SEQ ID NO: 579 to 582, and wherein said DUF2121 recognition motif further comprises an amino acid sequence as defined in SEQ ID NO: 629 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) N-terminally adjacent to the sequence of SEQ ID NO: 579 to 582.
[0266] Accordingly, the present invention further provides for a polypeptide (e.g., as obtained or obtainable through the herein provided methods) comprising a DUF2121 recognition motif as defined in any one of SEQ ID NO: 579 to 582, with the amino acid in position 3 corresponding to alanine, wherein said DUF2121 recognition motif further comprises an amino acid sequence as defined in SEQ ID NO: 630 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) C-terminally adjacent to the sequence of SEQ ID NO: 579 to 582, and wherein said DUF2121 recognition motif further comprises an amino acid sequence as defined in SEQ ID NO: 629 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) N-terminally adjacent to the sequence of SEQ ID NO: 579 to 582.
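The thresholds in [0263] to [0266] are expressed as percent sequence identity. As a minimal sketch, assuming the query and the reference (e.g., a candidate flank compared with SEQ ID NO: 629 or 630) are already aligned without gaps and have equal length, identity is simply the share of matching positions. This is a simplification: practical identity calculations first align the sequences (e.g., with BLAST or a Needleman-Wunsch alignment) and account for gaps. The function names are hypothetical.

```python
from typing import Optional

# Identity thresholds named in [0263] to [0266], highest first.
IDENTITY_THRESHOLDS = [100, 99, 98, 97, 96, 95, 90, 85, 80]


def percent_identity(query: str, reference: str) -> float:
    """Percent identical positions in a gap-free, equal-length comparison."""
    if not reference or len(query) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def highest_threshold_met(query: str, reference: str) -> Optional[int]:
    """Highest listed threshold the query reaches, or None below 80%."""
    pid = percent_identity(query, reference)
    for threshold in IDENTITY_THRESHOLDS:
        if pid >= threshold:
            return threshold
    return None
```

For short flanks the thresholds are coarse: in a 10-residue sequence a single substitution already drops identity to 90%, so "at least about 95%" effectively demands an exact match.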
[0267] As indicated above, the herein provided polypeptide comprising a partial N-terminal DUF2121 recognition motif may, for example, further comprise one or more proteinaceous or non-proteinaceous moieties (e.g., linked to the C-terminus of said polypeptide comprising a partial N-terminal DUF2121 recognition motif).
[0268] The present invention further provides for nucleic acids encoding the herein provided polypeptide comprising a DUF2121 recognition motif. The present invention further provides for cells comprising said nucleic acid or said polypeptide comprising a DUF2121 recognition motif. The present invention further envisages the use of such polypeptides in the herein provided methods.
[0269] The present invention further provides for a composition comprising the polypeptide comprising a DUF2121 recognition site. Said composition may further comprise one or both of the following (a) and (b):
[0270] (a) a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, preferably wherein said polypeptide comprising an N-terminal DUF2121 domain is as defined anywhere herein above; and
[0271] (b) an enzyme or a chemical, wherein said enzyme is selected from the group consisting of a peptidase and an N-acetyltransferase, preferably wherein said peptidase is as defined anywhere herein above, or preferably wherein said N-acetyltransferase is as defined anywhere herein above, preferably wherein said chemical is as defined anywhere herein above.
[0272] The present invention further provides for a composition comprising a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue and an enzyme selected from the group consisting of a peptidase and an N-acetyltransferase, preferably a peptidase, and / or a modifying chemical as defined anywhere herein above.
[0273] In the context of the herein provided composition, the polypeptide comprising an N-terminal DUF2121 domain is as defined anywhere herein above, said peptidase is as defined anywhere herein above, and / or said N-acetyltransferase is as defined anywhere herein above.
[0274] In a preferred embodiment, the composition comprises a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, wherein said domain comprises or consists of an amino acid sequence as depicted in SEQ ID NO: 159, and a proline aminopeptidase, preferably a proline aminopeptidase according to SEQ ID NO: 52.
[0275] In a further preferred embodiment, the composition comprises a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, wherein said domain comprises or consists of an amino acid sequence as depicted in SEQ ID NO: 159, and 2-formylphenylboronic acid.
[0276] The present invention further relates to a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue and a polypeptide selected from the group consisting of a peptidase and an N-acetyltransferase, preferably a peptidase, preferably wherein the polypeptide comprising an N-terminal DUF2121 domain is as defined anywhere herein above and wherein said peptidase is as defined anywhere herein above and / or wherein said N-acetyltransferase is as defined anywhere herein above. The polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue and a polypeptide selected from the group consisting of a peptidase and an N-acetyltransferase may herein also be referred to as “fusion enzyme”. Examples 12 and 13 illustratively demonstrate the production of such a fusion enzyme and its employment in a method for producing a fusion polypeptide (in particular, a biotinylated fusion polypeptide and an immobilized fusion polypeptide). Accordingly, in a preferred embodiment, the present invention also provides for a fusion enzyme comprising the polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue and a proline aminopeptidase. In a particularly preferred embodiment, the present invention provides for a fusion enzyme comprising or consisting of:
[0277] (i) a polypeptide comprising an N-terminal DUF2121 domain comprising or consisting of an amino acid sequence according to SEQ ID NO: 161 or an amino acid sequence having at least 20%, preferably at least 30%, preferably at least 40%, even more preferably at least 50%, even more preferably at least 60%, even more preferably at least 70%, even more preferably at least 80%, even more preferably at least 90%, even more preferably at least 95% and most preferably at least 99% sequence identity to said amino acid sequence and has a sequence specific transpeptidase activity according to the invention; and
[0278] (ii) a proline aminopeptidase comprising or consisting of an amino acid sequence as defined in SEQ ID NO: 52 or the proline aminopeptidase comprises an amino acid sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or up to about 100% sequence identity to SEQ ID NO: 52 and comprises proline aminopeptidase activity, preferably said proline aminopeptidase is linked C-terminally to said polypeptide comprising an N-terminal DUF2121 domain, optionally linked via a linker.
[0279] Accordingly, in a further particularly preferred embodiment, the present invention provides for a fusion enzyme comprising or consisting of:
[0280] (i) a polypeptide comprising an N-terminal DUF2121 domain comprising or preferably consisting of an amino acid sequence according to SEQ ID NO: 161 or an amino acid sequence having at least 20%, preferably at least 30%, preferably at least 40%, even more preferably at least 50%, even more preferably at least 60%, even more preferably at least 70%, even more preferably at least 80%, even more preferably at least 90%, even more preferably at least 95% and most preferably at least 99% sequence identity to said amino acid sequence and has a sequence specific transpeptidase activity according to the invention; and
[0281] (ii) a proline aminopeptidase comprising or preferably consisting of an amino acid sequence as defined in SEQ ID NO: 52 or the proline aminopeptidase comprises an amino acid sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or up to about 100% sequence identity to SEQ ID NO: 52 and comprises proline aminopeptidase activity, preferably said proline aminopeptidase is linked C-terminally to said polypeptide comprising an N-terminal DUF2121 domain via a linker comprising or preferably consisting of an amino acid sequence according to SEQ ID NO: 606.
[0282] Accordingly, in a further particularly preferred embodiment, the present invention provides for a fusion enzyme comprising or consisting of an amino acid sequence according to SEQ ID NO: 605.
[0283] Accordingly, in a further preferred embodiment, the present invention provides for a fusion enzyme comprising or consisting of an amino acid sequence according to SEQ ID NO: 604.
[0284] Further, in a preferred embodiment, the present invention also provides for a fusion enzyme comprising the polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue and an alanine aminopeptidase. In a particularly preferred embodiment, the present invention provides for a fusion enzyme comprising or consisting of:
[0285] (i) a polypeptide comprising an N-terminal DUF2121 domain comprising or consisting of an amino acid sequence according to SEQ ID NO: 161 or an amino acid sequence having at least 20%, preferably at least 30%, preferably at least 40%, even more preferably at least 50%, even more preferably at least 60%, even more preferably at least 70%, even more preferably at least 80%, even more preferably at least 90%, even more preferably at least 95% and most preferably at least 99% sequence identity to said amino acid sequence and has a sequence specific transpeptidase activity according to the invention; and
[0286] (ii) an alanine aminopeptidase comprising or consisting of an amino acid sequence as defined in SEQ ID NO: 592 or 593, or the alanine aminopeptidase comprises an amino acid sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or up to about 100% sequence identity to SEQ ID NO: 592 or 593, and comprises alanine aminopeptidase activity, preferably said alanine aminopeptidase is linked C-terminally to said polypeptide comprising an N-terminal DUF2121 domain, optionally linked via a linker.
[0287] Accordingly, in a further particularly preferred embodiment, the present invention provides for a fusion enzyme comprising or consisting of:
[0288] (i) a polypeptide comprising an N-terminal DUF2121 domain comprising or preferably consisting of an amino acid sequence according to SEQ ID NO: 161 or an amino acid sequence having at least 20%, preferably at least 30%, preferably at least 40%, even more preferably at least 50%, even more preferably at least 60%, even more preferably at least 70%, even more preferably at least 80%, even more preferably at least 90%, even more preferably at least 95% and most preferably at least 99% sequence identity to said amino acid sequence and has a sequence specific transpeptidase activity according to the invention; and
[0289] (ii) an alanine aminopeptidase comprising or preferably consisting of an amino acid sequence as defined in SEQ ID NO: 592 or 593, or the alanine aminopeptidase comprises an amino acid sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or up to about 100% sequence identity to SEQ ID NO: 592 or 593, and comprises alanine aminopeptidase activity, preferably said alanine aminopeptidase is linked C-terminally to said polypeptide comprising an N-terminal DUF2121 domain via a linker comprising or preferably consisting of an amino acid sequence according to SEQ ID NO: 606.
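The fusion enzymes of [0276] to [0289] share one architecture: the DUF2121 domain at the N-terminus and the aminopeptidase C-terminal to it, optionally joined via a linker such as SEQ ID NO: 606. A hypothetical sketch of that N-to-C assembly follows; the sequences are placeholders, not SEQ ID NO: 161, 52, 592/593 or 606, and the serine/threonine check assumes the supplied DUF2121 sequence is the mature domain (how the N-terminal residue arises upon expression is not addressed here).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Part:
    name: str
    sequence: str  # one-letter amino acid code, N- to C-terminus


def assemble_fusion(duf2121: Part, aminopeptidase: Part,
                    linker: Optional[Part] = None) -> str:
    """N- to C-terminal order: DUF2121 domain, optional linker, aminopeptidase."""
    first = duf2121.sequence[:1].upper()
    if first not in ("S", "T"):
        raise ValueError(f"{duf2121.name} must start with serine or threonine")
    pieces = [duf2121.sequence]
    if linker is not None:
        pieces.append(linker.sequence)
    pieces.append(aminopeptidase.sequence)  # aminopeptidase is C-terminal
    return "".join(pieces)


# Placeholder sequences only; substitute the actual SEQ ID NO entries.
connectase = Part("DUF2121 domain (placeholder)", "SAKLE")
linker = Part("linker (placeholder)", "GGSGG")
pap = Part("aminopeptidase (placeholder)", "MKQLV")
print(assemble_fusion(connectase, pap, linker))
```

Placing the aminopeptidase C-terminally, as the text prefers, keeps the N-terminal serine or threonine of the DUF2121 domain free.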
[0290] The present invention further relates to a nucleic acid encoding the fusion enzyme provided herein above.
[0291] The present invention further relates to a nucleic acid encoding the following (a) and (b):
[0292] (a) a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, preferably wherein said polypeptide comprising an N-terminal DUF2121 domain is as defined anywhere herein above; and
(b) an enzyme selected from the group consisting of a peptidase and an N-acetyltransferase, preferably wherein said peptidase is as defined anywhere herein above, or preferably wherein said N-acetyltransferase is as defined anywhere herein above, more preferably a peptidase as defined anywhere herein above.
[0293] In a preferred embodiment, the herein provided nucleic acid encodes the following (a) and (b):
[0294] (a) an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, wherein said domain comprises or consists of an amino acid sequence as depicted in SEQ ID NO: 159; and
[0295] (b) a proline aminopeptidase, preferably a proline aminopeptidase according to SEQ ID NO: 52.
[0296] The present invention further relates to a nucleic acid vector comprising any one of the herein above provided nucleic acid molecules.
[0297] The present invention further provides for a host or host cell comprising any one of the herein above provided nucleic acid molecules, the herein above provided nucleic acid vector, and / or the fusion enzyme provided herein above.
[0298] The present invention provides for a further host cell comprising at least (a) and (b), or at least (c) and (d), or at least (e) and (f) from the group consisting of the following (a) to (f):
[0299] (a) a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, preferably wherein said polypeptide comprising an N-terminal DUF2121 domain is as defined anywhere herein above;
[0300] (b) an enzyme selected from the group consisting of a peptidase and an N-acetyltransferase, preferably wherein said peptidase is as defined anywhere herein above, or preferably wherein said N-acetyltransferase is as defined anywhere herein above, more preferably a peptidase as defined anywhere herein above;
[0301] (c) a nucleic acid encoding the polypeptide comprising an N-terminal DUF2121 domain according to (a);
[0302] (d) a nucleic acid encoding the enzyme selected from the group consisting of a peptidase and an N-acetyltransferase according to (b);
[0303] (e) a nucleic acid vector comprising the nucleic acid according to (c); and
[0304] (f) a nucleic acid vector comprising the nucleic acid according to (d).
[0305] In a further preferred embodiment, the herein provided host cell comprises at least (a) and (b), or at least (c) and (d), or at least (e) and (f) from the group consisting of the following (a) to (f):
(a) a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, wherein said domain comprises or consists of an amino acid sequence as depicted in SEQ ID NO: 159;
[0306] (b) a proline aminopeptidase, preferably a proline aminopeptidase according to SEQ ID NO: 52;
[0307] (c) a nucleic acid encoding the polypeptide comprising an N-terminal DUF2121 domain according to (a);
[0308] (d) a nucleic acid encoding the proline aminopeptidase according to (b);
[0309] (e) a nucleic acid vector comprising the nucleic acid according to (c); and
[0310] (f) a nucleic acid vector comprising the nucleic acid according to (d).
[0311] In the context of the herein provided host cell, the nucleic acids of (c) and (d) may be comprised in a single nucleic acid.
[0312] In the context of the herein provided host cell, the nucleic acids of (c) and (d) may be comprised in a single nucleic acid vector.
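Read literally, [0298] and [0305] require at least one complete pair: the polypeptides (a) and (b), the nucleic acids (c) and (d), or the vectors (e) and (f). A minimal sketch of that literal reading follows (the function name is hypothetical); it deliberately does not count mixed combinations such as (a) with (d), which reflects a strict reading of the wording rather than any statement about what such a cell would express.

```python
# Literal reading of [0298] / [0305]: at least one complete pair must be present.
REQUIRED_PAIRS = [("a", "b"), ("c", "d"), ("e", "f")]


def host_cell_qualifies(components):
    """True if `components` (item letters present in the cell) contains at
    least one complete pair: (a)+(b), (c)+(d) or (e)+(f)."""
    present = {item.lower() for item in components}
    return any(x in present and y in present for x, y in REQUIRED_PAIRS)


print(host_cell_qualifies({"c", "d"}))  # True
print(host_cell_qualifies({"a", "d"}))  # False
```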
[0313] The herein above provided host cell may be defined as the host cell in the context of the herein above provided methods.
[0314] The present invention further relates to a kit comprising one or more selected from the group consisting of the following (a) to (g):
[0315] (a) a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue and an enzyme or a chemical, wherein said enzyme is selected from the group consisting of a peptidase and an N-acetyltransferase, preferably wherein said polypeptide comprising an N-terminal DUF2121 domain is as defined anywhere herein above, and preferably wherein said peptidase is as defined anywhere herein above, or preferably wherein said N-acetyltransferase is as defined anywhere herein above, preferably wherein said (modifying) chemical is as defined anywhere herein above, more preferably a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue is as defined anywhere herein above and a peptidase is as defined anywhere herein above;
[0316] (b) the herein above provided composition;
[0317] (c) the herein above provided fusion enzyme;
[0318] (d) the herein above provided nucleic acid;
[0319] (e) the herein above provided nucleic acid vector;
[0320] (f) the herein above provided host cell; and
[0321] (g) the herein above provided further host cell.
The present invention further relates to a kit for the production of a fusion polypeptide, the kit comprising the following (a) to (e):
[0322] (a) a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, preferably wherein said polypeptide comprising an N-terminal DUF2121 domain is as defined anywhere herein above, more preferably a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue is as defined anywhere herein above;
[0323] (b) a modifying enzyme selected from the group consisting of a peptidase as defined anywhere herein above and an N-acetyltransferase as defined anywhere herein above, preferably a peptidase;
[0324] (c) preferably,
[0325] (c1) a first substrate polypeptide as defined anywhere herein above,
[0326] (c2) a nucleic acid encoding (c1), preferably wherein said nucleic acid comprises a multiple cloning site, or
[0327] (c3) a host cell comprising (c1) or (c2); and
[0328] (d) preferably,
[0329] (dl) a second substrate polypeptide as defined anywhere herein above,
[0330] (d2) a nucleic acid encoding (dl), preferably wherein said nucleic acid comprises a multiple cloning site, or
[0331] (d3) a host cell comprising (dl) or (d2);
[0332] (e) optionally, a buffer as defined anywhere herein above.
[0333] The present invention further relates to a kit for the production of a fusion polypeptide, the kit comprising the following (a) to (e):
[0334] (a) a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, preferably wherein said polypeptide comprising an N-terminal DUF2121 domain is as defined anywhere herein above, more preferably a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue is as defined anywhere herein above;
[0335] (b) a modifying chemical as defined anywhere herein above;
[0336] (c) preferably,
[0337] (c1) a first substrate polypeptide as defined anywhere herein above,
[0338] (c2) a nucleic acid encoding (c1), preferably wherein said nucleic acid comprises a multiple cloning site, or
[0339] (c3) a host cell comprising (c1) or (c2); and
(d) preferably,
[0340] (d1) a second substrate polypeptide as defined anywhere herein above,
[0341] (d2) a nucleic acid encoding (d1), preferably wherein said nucleic acid comprises a multiple cloning site, or
[0342] (d3) a host cell comprising (d1) or (d2);
[0343] (e) optionally, a buffer as defined anywhere herein above.
[0344] The present invention further relates to a kit for the production of a fusion polypeptide, the kit comprising the following (a) to (d):
[0345] (a) a fusion enzyme as defined anywhere herein above, in particular a fusion enzyme comprising a modifying enzyme and a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue;
[0346] (b) preferably,
[0347] (b1) a first substrate polypeptide as defined anywhere herein above,
[0348] (b2) a nucleic acid encoding (b1), preferably wherein said nucleic acid comprises a multiple cloning site, or
[0349] (b3) a host cell comprising (b1) or (b2); and
[0350] (c) preferably,
[0351] (c1) a second substrate polypeptide as defined anywhere herein above,
[0352] (c2) a nucleic acid encoding (c1), preferably wherein said nucleic acid comprises a multiple cloning site, or
[0353] (c3) a host cell comprising (c1) or (c2);
[0354] (d) optionally, a buffer as defined anywhere herein above.
[0355] The present invention further relates to a kit for the immobilization of a polypeptide, the kit comprising the following (a) to (d):
[0356] (a) a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, preferably wherein said polypeptide comprising an N-terminal DUF2121 domain is as defined anywhere herein above, more preferably a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue is as defined anywhere herein above;
[0357] (b) a modifying enzyme selected from the group consisting of a peptidase as defined anywhere herein above and an N-acetyltransferase as defined anywhere herein above, preferably a peptidase;
[0358] (c) a first substrate polypeptide as defined anywhere herein above comprising a solid carrier (e.g., a microtiter plate, magnetic beads, agarose beads, chips, or EM grids) and optionally a second substrate polypeptide as defined anywhere herein above (e.g., comprising a proteinaceous moiety, such as an antibody or an antigen-binding fragment thereof), or a second substrate polypeptide as defined anywhere herein above comprising a solid carrier (e.g., a microtiter plate, magnetic beads, agarose beads, chips, or EM grids) and optionally a first substrate polypeptide as defined anywhere herein above (e.g., comprising a proteinaceous moiety, such as an antibody or an antigen-binding fragment thereof); and
[0359] (d) optionally, a buffer as defined anywhere herein above.
[0360] The present invention further relates to a kit for the immobilization of a polypeptide, the kit comprising the following (a) to (d):
[0361] (a) a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, preferably wherein said polypeptide comprising an N-terminal DUF2121 domain is as defined anywhere herein above, more preferably a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue is as defined anywhere herein above;
[0362] (b) a modifying chemical as defined anywhere herein above;
[0363] (c) a first substrate polypeptide as defined anywhere herein above comprising a solid carrier (e.g., a microtiter plate, magnetic beads, agarose beads, chips, or EM grids) and optionally a second substrate polypeptide as defined anywhere herein above (e.g., comprising a proteinaceous moiety, such as an antibody or an antigen-binding fragment thereof), or a second substrate polypeptide as defined anywhere herein above comprising a solid carrier (e.g., a microtiter plate, magnetic beads, agarose beads, chips, or EM grids) and optionally a first substrate polypeptide as defined anywhere herein above (e.g., comprising a proteinaceous moiety, such as an antibody or an antigen-binding fragment thereof); and
[0364] (d) optionally, a buffer as defined anywhere herein above.
[0365] The present invention further relates to a kit for the immobilization of a polypeptide, the kit comprising the following (a) to (c):
[0366] (a) a fusion enzyme as defined anywhere herein above, in particular a fusion enzyme comprising a modifying enzyme and a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue;
[0367] (b) a first substrate polypeptide as defined anywhere herein above comprising a solid carrier (e.g., a microtiter plate, magnetic beads, agarose beads, chips, or EM grids) and optionally a second substrate polypeptide as defined anywhere herein above (e.g., comprising a proteinaceous moiety, such as an antibody or an antigen-binding fragment thereof), or a second substrate polypeptide as defined anywhere herein above comprising a solid carrier (e.g., a microtiter plate, magnetic beads, agarose beads, chips, or EM grids) and optionally a first substrate polypeptide as defined anywhere herein above (e.g., comprising a proteinaceous moiety, such as an antibody or an antigen-binding fragment thereof); and
[0368] (c) optionally, a buffer as defined anywhere herein above.
[0369] In the context of the herein provided kits for the immobilization of a polypeptide, the kit may also comprise a solid carrier not linked to the first substrate polypeptide or the second substrate polypeptide, wherein the kit further comprises means for linking said solid carrier to said first substrate polypeptide or said second substrate polypeptide.
[0370] As illustrated in Example 5 and Figures 7 and 16, the means and methods employed herein are also particularly useful for the detection and / or quantification of a polypeptide, in particular the in-gel detection and / or quantification of a polypeptide.
[0371] Accordingly, the present invention further relates to a kit for the detection and / or quantification of a polypeptide (e.g., the in-gel detection and / or quantification of a polypeptide), the kit comprising the following (a) to (d):
[0372] (a) a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, preferably wherein said polypeptide comprising an N-terminal DUF2121 domain is as defined anywhere herein above, more preferably a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue is as defined anywhere herein above;
[0373] (b) a modifying enzyme selected from the group consisting of a peptidase as defined anywhere herein above and an N-acetyltransferase as defined anywhere herein above, preferably a peptidase;
[0374] (c) a first substrate polypeptide as defined anywhere herein above comprising a detection label (such as a fluorophore) and optionally a second substrate polypeptide as defined anywhere herein above (e.g., comprising a proteinaceous moiety, such as an antibody or an antigen-binding fragment thereof), or a second substrate polypeptide as defined anywhere herein above comprising a detection label (such as a fluorophore) and optionally a first substrate polypeptide as defined anywhere herein above (e.g., comprising a proteinaceous moiety, such as an antibody or an antigen-binding fragment thereof); and
[0375] (d) optionally, a buffer as defined anywhere herein above.
[0376] The present invention further relates to a kit for the detection and / or quantification of a polypeptide (e.g., the in-gel detection and / or quantification of a polypeptide), the kit comprising the following (a) to (d):
[0377] (a) a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, preferably wherein said polypeptide comprising an N-terminal DUF2121 domain is as defined anywhere herein above, more preferably a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue is as defined anywhere herein above;
[0378] (b) a modifying chemical as defined anywhere herein above;
[0379] (c) a first substrate polypeptide as defined anywhere herein above comprising a detection label (such as a fluorophore) and optionally a second substrate polypeptide as defined anywhere herein above (e.g., comprising a proteinaceous moiety, such as an antibody or an antigen-binding fragment thereof), or a second substrate polypeptide as defined anywhere herein above comprising a detection label (such as a fluorophore) and optionally a first substrate polypeptide as defined anywhere herein above (e.g., comprising a proteinaceous moiety, such as an antibody or an antigen-binding fragment thereof); and
[0380] (d) optionally, a buffer as defined anywhere herein above.
[0381] The present invention further relates to a kit for the detection and / or quantification of a polypeptide (e.g., the in-gel detection and / or quantification of a polypeptide), the kit comprising the following (a) to (c):
[0382] (a) a fusion enzyme as defined anywhere herein above, in particular a fusion enzyme comprising a modifying enzyme and a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue;
[0383] (b) a first substrate polypeptide as defined anywhere herein above comprising a detection label (such as a fluorophore) and optionally a second substrate polypeptide as defined anywhere herein above (e.g., comprising a proteinaceous moiety, such as an antibody or an antigen-binding fragment thereof), or a second substrate polypeptide as defined anywhere herein above comprising a detection label (such as a fluorophore) and optionally a first substrate polypeptide as defined anywhere herein above (e.g., comprising a proteinaceous moiety, such as an antibody or an antigen-binding fragment thereof); and
[0384] (c) optionally, a buffer as defined anywhere herein above.
[0385] In the context of the herein provided kits (in particular the kits for the production of a fusion polypeptide, for the immobilization of a polypeptide, and for the detection and / or quantification of a polypeptide), the first or second substrate polypeptide may also be replaced by a nucleic acid encoding the same. In particular, this may be a nucleic acid comprising a multiple cloning site that allows a sequence encoding a polypeptide of interest to be integrated into the sequence encoding said first or second substrate polypeptide.
[0386] The present invention further provides for a polypeptide comprising a partial N-terminal DUF2121 recognition site and one or more protecting residue(s) N-terminal to said partial N-terminal DUF2121 recognition site. Said partial N-terminal DUF2121 recognition site may be as defined anywhere herein above. Said one or more N-terminal protecting residue(s) may be as defined anywhere herein above. Said polypeptide may comprise an N-terminal amino acid sequence (i.e., a sequence corresponding to said N-terminal protecting residue(s)) selected from SEQ ID NO: 50, 53, 54, or 55, or an N-terminal amino acid sequence consisting of P, PP, or PPP. Such polypeptides comprising a partial N-terminal DUF2121 recognition site and one or more protecting residue(s) N-terminal to said partial N-terminal DUF2121 recognition site may be further defined as the second substrate polypeptide in the context of the herein above provided methods. In a preferred embodiment, the N-terminal DUF2121 recognition motif comprises a sequence as defined by the amino acid sequence from positions 3 to 5 of any one of SEQ ID NO: 579 to 582, wherein the amino acid corresponding to position 3 of any one of SEQ ID NO: 579 to 582 is preferably not proline. Such polypeptides comprising a partial N-terminal DUF2121 recognition site and one or more protecting residue(s) N-terminal to said partial N-terminal DUF2121 recognition site may, inter alia, be particularly useful for the efficient ligation of two polypeptides and in the herein provided methods.
[0387] The present invention further provides for a nucleic acid encoding the polypeptide comprising a partial N-terminal DUF2121 recognition site and one or more protecting residue(s) N-terminal to said partial N-terminal DUF2121 recognition site. The present invention further provides for a nucleic acid vector comprising said nucleic acid. The present invention further provides for a host or host cell comprising said nucleic acid, said nucleic acid vector, and / or the polypeptide comprising a partial N-terminal DUF2121 recognition site and one or more protecting residue(s) N-terminal to said partial N-terminal DUF2121 recognition site.
[0388] The present invention further provides for a composition comprising the polypeptide comprising a partial N-terminal DUF2121 recognition site and one or more protecting residue(s) N-terminal to said partial N-terminal DUF2121 recognition site. Said composition may further comprise one or both of the following (a) and (b):
[0389] (a) a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, preferably wherein said polypeptide comprising an N-terminal DUF2121 domain is as defined anywhere herein above; and
[0390] (b) an enzyme or a chemical, wherein said enzyme is selected from the group consisting of a peptidase and an N-acetyltransferase, preferably wherein said peptidase is as defined anywhere herein above, or preferably wherein said N-acetyltransferase is as defined anywhere herein above, preferably wherein said chemical is as defined anywhere herein above.
Accordingly, the present invention further provides for a polypeptide comprising a partial N-terminal DUF2121 recognition motif, wherein said polypeptide comprises an N-terminal amino acid sequence of XGA or XAA, and further comprises an amino acid sequence as defined in SEQ ID NO: 630 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) C-terminally adjacent to the sequence of XGA or XAA, wherein the first amino acid residue in the amino acid sequence of XGA or XAA corresponds to alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, or tyrosine.
[0391] Accordingly, the present invention further provides for a polypeptide comprising a partial N-terminal DUF2121 recognition motif, wherein said polypeptide comprises an N-terminal amino acid sequence of XGA or XAA, and further comprises an amino acid sequence as defined in SEQ ID NO: 630 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) C-terminally adjacent to the sequence of XGA or XAA, wherein the first amino acid residue in the amino acid sequence of XGA or XAA corresponds to alanine, cysteine, serine, valine, tryptophan, or methionine.
[0392] Accordingly, the present invention further provides for a polypeptide comprising a partial N-terminal DUF2121 recognition motif, wherein said polypeptide comprises an N-terminal amino acid sequence of XGA or XAA, and further comprises an amino acid sequence as defined in SEQ ID NO: 630 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) C-terminally adjacent to the sequence of XGA or XAA, wherein the first amino acid residue in the amino acid sequence of XGA or XAA corresponds to alanine, cysteine, serine, or valine.
[0393] Accordingly, the present invention further provides for a polypeptide comprising a partial N-terminal DUF2121 recognition motif, wherein said polypeptide comprises an N-terminal amino acid sequence of XGA or XAA, and further comprises an amino acid sequence as defined in SEQ ID NO: 630 (or an amino acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or preferably about 100% sequence identity thereto) C-terminally adjacent to the sequence of XGA or XAA, wherein the first amino acid residue in the amino acid sequence of XGA or XAA corresponds to alanine.
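The nested residue lists of paragraphs [0390] to [0393] can be expressed as a simple membership test. The Python sketch below is illustrative only. It assumes one-letter codes written N- to C-terminally and checks only the N-terminal XGA or XAA tripeptide; the sequence of SEQ ID NO: 630 is not reproduced here and is not checked. The names ALLOWED_X and starts_with_partial_motif are hypothetical.

```python
# Sketch: does a polypeptide start with an XGA / XAA tripeptide as in [0390]-[0393]?
# ASSUMPTIONS (not from the source): one-letter codes, N- to C-terminal order;
# only the first three residues are checked (the SEQ ID NO: 630 part is not);
# ALLOWED_X and starts_with_partial_motif are hypothetical names.

ALLOWED_X: dict[str, frozenset[str]] = {
    "0390": frozenset("ACSVWMGLFIDEHNQY"),  # broadest list; a set holds each residue once
    "0391": frozenset("ACSVWM"),
    "0392": frozenset("ACSV"),
    "0393": frozenset("A"),
}


def starts_with_partial_motif(seq: str, allowed_x: frozenset[str]) -> bool:
    """True if seq begins with X + 'GA' or X + 'AA' and X is in allowed_x."""
    seq = seq.upper()
    return len(seq) >= 3 and seq[0] in allowed_x and seq[1:3] in ("GA", "AA")


if __name__ == "__main__":
    for peptide in ("AGAFDADPLVVEI", "PGAFDADPLVVEI"):
        hits = [p for p, allowed in ALLOWED_X.items()
                if starts_with_partial_motif(peptide, allowed)]
        print(peptide, hits)
```

With these sets, the N-terminal tripeptide of AGAFDADPLVVEI fits all four lists. PGAFDADPLVVEI fits none, because proline does not appear in any of the lists.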
[0394] The herein provided polypeptide comprising a partial N-terminal DUF2121 recognition motif may further comprise one or more N-terminal protecting residues as defined anywhere herein above.
[0395] The herein provided polypeptide comprising a partial N-terminal DUF2121 recognition motif may be further characterized as the first cleavage product of the first substrate polypeptide employed in any of the herein provided methods. Accordingly, the herein provided polypeptide comprising a partial N-terminal DUF2121 recognition motif may, for example, further comprise one or more proteinaceous or non-proteinaceous moieties (e.g., linked to the C-terminus of said polypeptide comprising a partial N-terminal DUF2121 recognition motif).
[0396] The present invention further provides for nucleic acids encoding the herein provided polypeptide comprising a partial N-terminal DUF2121 recognition motif. The present invention further provides for cells comprising said nucleic acid or said polypeptide comprising a partial N-terminal DUF2121 recognition motif. The present invention further envisages the use of such polypeptides in the herein provided methods.
[0397] The present invention further provides for a composition comprising the polypeptide comprising a partial N-terminal DUF2121 recognition site. Said composition may further comprise one or both of the following (a) and (b):
[0398] (a) a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, preferably wherein said polypeptide comprising an N-terminal DUF2121 domain is as defined anywhere herein above; and
[0399] (b) an enzyme or a chemical, wherein said enzyme is selected from the group consisting of a peptidase and an N-acetyltransferase, preferably wherein said peptidase is as defined anywhere herein above, or preferably wherein said N-acetyltransferase is as defined anywhere herein above, preferably wherein said chemical is as defined anywhere herein above.
[0400] In the context of the present invention, “so that” can be used interchangeably with “to achieve that”, “thereby it is achieved that”, or “thereby achieving” and indicates that a respective result is achieved.
[0401] The term “sequence identity”, as used herein, refers to the sequence match between two (poly)peptides or nucleic acids. The (poly)peptide or nucleic acid sequences to be compared are aligned to give maximum identity. This may be done, for example, with bioinformatics tools for pairwise alignment such as EMBOSS Needle (https://www.ebi.ac.uk/Tools/psa/emboss_needle/; see also Madeira F, et al. The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Research. 2019 Jul;47(W1):W636-W641. DOI: 10.1093/nar/gkz268). When the same position in the sequences to be compared is occupied by the same nucleobase or amino acid residue, the respective molecules are identical at that position. Accordingly, the “sequence identity”, “percent identity” or “percent sequence identity” is the number of matching positions divided by the number of positions compared, multiplied by 100%. For example, if 6 out of 10 sequence positions are identical, the identity is 60%. The “identity” or “percent (%) identity” between two amino acid sequences can, e.g., be determined using the Needleman-Wunsch algorithm (Needleman, S.B. and Wunsch, C.D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970;48(3):443-53. DOI: 10.1016/0022-2836(70)90057-4), which has been incorporated into EMBOSS Needle. The settings are a BLOSUM62 matrix, a "gap open penalty" of 10, a "gap extend penalty" of 0.5, an "end gap penalty" set to false, an "end gap open penalty" of 10 and an "end gap extend penalty" of 0.5. The percent (%) identity is typically determined over the entire length of the query sequence on which the analysis is performed. Two molecules having the same primary amino acid or nucleic acid sequence are identical irrespective of any chemical and / or biological modification.
For example, two antibodies having the same primary amino acid sequence, but different glycosylation patterns are identical by this definition. In case of nucleic acids, for example, two molecules having the same sequence but different linkage components such as thiophosphate instead of phosphate are identical by this definition.
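The percent-identity arithmetic can be illustrated with the Python sketch below. It aligns two sequences globally and then divides identical positions by aligned positions. It is a simplified stand-in for EMBOSS Needle, not an equivalent. It assumes a hypothetical scoring scheme (match +1, mismatch -1, linear gap -2) instead of the BLOSUM62 matrix and affine gap penalties given above, so gapped alignments may differ from Needle's output. Identity is taken over all aligned columns. Dividing by the query length instead gives the same value whenever the alignment contains no gaps. The function names are illustrative.

```python
# Sketch: percent identity after a simple global (Needleman-Wunsch) alignment.
# ASSUMPTIONS (not from the source): hypothetical scores MATCH/MISMATCH/GAP with a
# linear gap penalty, instead of BLOSUM62 and the affine gap settings of EMBOSS Needle.

MATCH, MISMATCH, GAP = 1, -1, -2


def _pair_score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def global_align(a: str, b: str) -> tuple[str, str]:
    """Return one optimal global alignment of a and b, with '-' marking gaps."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + GAP,  # gap in b
                score[i][j - 1] + GAP,  # gap in a
            )
    # Trace back from the bottom-right cell, preferring the diagonal move on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical (non-gap) positions divided by aligned positions, times 100."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    print(percent_identity("ACDEFGHIKL", "ACDEFGWWWW"))  # 6 of 10 identical -> 60.0
    a, b = global_align("PGAFDADPLVVEI", "AGAFDADPLVVEI")
    print(a, b, round(percent_identity(a, b), 1))  # ungapped, 12 of 13 -> 92.3
```

The first call reproduces the 6-out-of-10 example from the text. The two 13-residue peptides XGAFDADPLVVEI with X = P and X = A differ at one position, which gives 12/13, or about 92.3%.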
[0402] The term “about” when used in connection with a numerical value is meant to encompass numerical values within a range having a lower limit that is 10% smaller than the indicated numerical value and an upper limit that is 10% larger than the indicated numerical value.
In the foregoing detailed description of the invention, a number of individual elements, characterizing features, techniques and / or steps are disclosed. It is readily recognized that each of these has benefit not only individually when considered or used alone, but also when considered and used in combination with one another. Accordingly, to avoid exceedingly repetitious and redundant passages, this description has refrained from reiterating every possible combination and permutation. Nevertheless, whether expressly recited or not, it is understood that such combinations are entirely within the scope of the presently disclosed subject matter.
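The ±10% definition of “about” in paragraph [0402] amounts to simple arithmetic, as the Python sketch below shows. It returns the inclusive range covered by “about <value>”. The function names are hypothetical. The use of the absolute value, which keeps the range well-ordered for negative numbers, is an assumption that the text does not address.

```python
# Sketch of the "about" definition: from value - 10 % to value + 10 %, inclusive.
# ASSUMPTIONS: names are hypothetical; abs() keeps low <= high for negative values.


def about_range(value: float) -> tuple[float, float]:
    """Return (low, high) for 'about value' under the +/-10 % definition."""
    delta = abs(value) / 10
    return value - delta, value + delta


def is_about(x: float, value: float) -> bool:
    """True if x lies within the range encompassed by 'about value'."""
    low, high = about_range(value)
    return low <= x <= high


if __name__ == "__main__":
    print(about_range(80))   # (72.0, 88.0)
    print(is_about(85, 95))  # False: "about 95" spans 85.5 to 104.5
```

Under this reading, “at least about 80%” sequence identity starts at 72%, and “about 95%” does not reach down to 85%.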
[0403] All technical and scientific terms used herein, unless otherwise defined, are intended to have the same meaning as commonly understood by one of ordinary skill in the art. References to techniques employed herein are intended to refer to the techniques as commonly understood in the art, including variations on those techniques or substitutions of equivalent techniques that would be apparent to one of skill in the art.
[0404] All amino acid sequences provided herein are presented starting with the most N-terminal residue and ending with the most C-terminal residue, as customarily done in the art, and the one-letter or three-letter code abbreviations used to identify amino acids throughout the present invention correspond to those commonly used for amino acids. As used herein, the singular forms “a”, “an”, and “the” include the plural referents unless the context clearly indicates otherwise. The terms “include”, “such as”, and “the like” are intended to convey inclusion without limitation, unless otherwise specifically indicated.
[0405] As used herein, the term “or” is generally employed in its usual sense including “and / or” unless the context clearly dictates otherwise. The term “and / or” means one or all of the listed elements or a combination of any two or more of the listed elements.
[0406] As used herein, the term “comprising” also specifically includes embodiments “consisting of” and “consisting essentially of” the recited elements, unless specifically indicated otherwise.
[0407] All publications, patent applications, patents, and patent documents referred to in this document are incorporated by reference herein in their entirety, as though individually incorporated by reference. In the event of inconsistent usages between this document and those documents so incorporated by reference, the usage in the incorporated reference(s) should be considered supplementary to that of this document; for irreconcilable inconsistencies, the usage in this document is authoritative.
[0408] Table 2: Herein disclosed, non-limiting amino acid sequences. All sequences disclosed in Table 2 are further disclosed as part of the enclosed sequence listing. If not indicated differently, all sequences are amino acid sequences in the single-letter code and in N-terminal to C-terminal orientation. “1” in the Note column indicates that the respective sequence represents the amino acid sequence as expressed and thus, e.g., comprises an N-terminal methionine residue. “2” in the Note column indicates that the respective sequence was used in the experiments detailed in the appended examples.
[0411] References
[0412] 1 Stephanopoulos, N. & Francis, M. B. Choosing an effective protein bioconjugation strategy. Nat Chem Biol 7, 876-884, doi: 10.1038/nchembio.720 (2011).
[0413] 2 Kaur, J., Saxena, M. & Rishi, N. An Overview of Recent Advances in Biomedical Applications of Click Chemistry. Bioconjug Chem 32, 1455-1471, doi: 10.1021/acs.bioconjchem.1c00247 (2021).
[0414] 3 Kolb, H. C., Finn, M. G. & Sharpless, K. B. Click Chemistry: Diverse Chemical Function from a Few Good Reactions. Angew Chem Int Ed Engl 40, 2004-2021, doi: 10.1002/1521-3773(20010601)40:11<2004::AID-ANIE2004>3.0.CO;2-5 (2001).
[0415] 4 Zakeri, B. et al. Peptide tag forming a rapid covalent bond to a protein, through engineering a bacterial adhesin. Proc Natl Acad Sci USA 109, E690-697, doi: 10.1073/pnas.1115485109 (2012).
[0416] 5 Sutherland, A. R., Alam, M. K. & Geyer, C. R. Post-translational Assembly of Protein Parts into Complex Devices by Using SpyTag/SpyCatcher Protein Ligase. ChemBioChem 20, 319-328, doi: 10.1002/cbic.201800538 (2019).
[0417] 6 Wu, H., Hu, Z. & Liu, X. Q. Protein trans-splicing by a split intein encoded in a split DnaE gene of Synechocystis sp. PCC6803. Proc Natl Acad Sci USA 95, 9226-9231, doi: 10.1073/pnas.95.16.9226 (1998).
[0418] 7 Anastassov, S., Filo, M. & Khammash, M. Inteins: A Swiss army knife for synthetic biology. Biotechnol Adv 73, 108349, doi: 10.1016/j.biotechadv.2024.108349 (2024).
[0419] 8 Schmidt, M., Toplak, A., Quaedflieg, P. J. & Nuijens, T. Enzyme-mediated ligation technologies for peptides and proteins. Curr Opin Chem Biol 38, 1-7, doi: 10.1016/j.cbpa.2017.01.017 (2017).
[0420] 9 Morgan, H. E., Turnbull, W. B. & Webb, M. E. Challenges in the use of sortase and other peptide ligases for site-specific protein modification. Chem Soc Rev 51, 4121-4145, doi: 10.1039/d0cs01148g (2022).
[0421] 10 Mazmanian, S. K., Liu, G., Ton-That, H. & Schneewind, O. Staphylococcus aureus sortase, an enzyme that anchors surface proteins to the cell wall. Science 285, 760-763, doi: 10.1126/science.285.5428.760 (1999).
[0422] 11 Heck, T., Pham, P. H., Yerlikaya, A., Thöny-Meyer, L. & Richter, M. Sortase A catalyzed reaction pathways: a comparative study with six SrtA variants. Catal Sci Technol 4, 2946-2956, doi: 10.1039/c4cy00347k (2014).
[0423] 12 Frankel, B. A., Kruger, R. G., Robinson, D. E., Kelleher, N. L. & McCafferty, D. G. Staphylococcus aureus sortase transpeptidase SrtA: Insight into the kinetic mechanism and evidence for a reverse protonation catalytic mechanism. Biochemistry 44, 11188-11200, doi: 10.1021/bi050141j (2005).
[0424] 13 Fuchs, A. C. D. et al. Archaeal Connectase is a specific and efficient protein ligase related to proteasome β subunits. Proc Natl Acad Sci USA 118, e2017871118, doi: 10.1073/pnas.2017871118 (2021).
[0425] 14 Fuchs, A. C. D. Specific, sensitive and quantitative protein detection by in-gel fluorescence. Nat Commun 14, 2505, doi: 10.1038/s41467-023-38147-8 (2023).
[0426] 15 Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583-589, doi: 10.1038/s41586-021-03819-2 (2021).
[0427] 16 Kitazono, A., Yoshimoto, T. & Tsuru, D. Cloning, sequencing, and high expression of the proline iminopeptidase gene from Bacillus coagulans. J Bacteriol 174, 7919-7925, doi: 10.1128/jb.174.24.7919-7925.1992 (1992).
[0428] 17 Lapteva, Y. S. et al. In Vitro N-Terminal Acetylation of Bacterially Expressed Parvalbumins by N-Terminal Acetyltransferases from Escherichia coli. Appl Biochem Biotechnol 193, 1365-1378, doi: 10.1007/s12010-020-03324-8 (2021).
[0429] 18 Bandyopadhyay, A., Cambray, S. & Gao, J. Fast and selective labeling of N-terminal cysteines at neutral pH via thiazolidino boronate formation. Chem Sci 7, 4589-4593, doi: 10.1039/C6SC00172F (2016).
[0430] 19 Gonzales, T. & Robert-Baudouy, J. Bacterial aminopeptidases: properties and functions. FEMS Microbiol Rev 18, 319-344, doi: 10.1111/j.1574-6976.1996.tb00247.x (1996).
[0431] 20 Cunningham, D. F. & O'Connor, B. Proline specific peptidases. Biochim Biophys Acta 1343, 160-186, doi: 10.1016/s0167-4838(97)00134-9 (1997).
Dong, Z. X. et al. Prolyl aminopeptidases: Reclassification, properties, production and industrial applications. Process Biochem 118, 121-132, doi: 10.1016/j.procbio.2022.04.025 (2022).
Yoshimoto, T. & Tsuru, D. Proline iminopeptidase from Bacillus coagulans: purification and enzymatic properties. J Biochem 97, 1477-1485, doi: 10.1093/oxfordjournals.jbchem.a135202 (1985).
Kitazono, A., Ito, K. & Yoshimoto, T. Prolyl aminopeptidase is not a sulfhydryl enzyme: identification of the active serine residue by site-directed mutagenesis. J Biochem 116, 943-945, doi: 10.1093/oxfordjournals.jbchem.a124649 (1994).
Beck, A., Goetsch, L., Dumontet, C. & Corvaia, N. Strategies and challenges for the next generation of antibody-drug conjugates. Nat Rev Drug Discov 16, 315-337, doi: 10.1038/nrd.2016.268 (2017).
Liu, J., Barfield, R. M. & Rabuka, D. Site-Specific Bioconjugation Using SMARTag® Technology: A Practical and Effective Chemoenzymatic Approach to Generate Antibody-Drug Conjugates. Methods Mol Biol 2033, 131-147, doi: 10.1007/978-1-4939-9654-4_10 (2019).
Wijdeven, M. A. et al. Enzymatic glycan remodeling-metal free click (GlycoConnect) provides homogenous antibody-drug conjugates with improved stability and therapeutic index without sequence engineering. MAbs 14, 2078466, doi: 10.1080/19420862.2022.2078466 (2022).
Sadowsky, J. D. et al. Development of Efficient Chemistry to Generate Site-Specific Disulfide-Linked Protein- and Peptide-Payload Conjugates: Application to THIOMAB Antibody-Drug Conjugates. Bioconjug Chem 28, 2086-2098, doi: 10.1021/acs.bioconjchem.7b00258 (2017).
Zimmerman, E. S. et al. Production of site-specific antibody-drug conjugates using optimized nonnatural amino acids in a cell-free expression system. Bioconjug Chem 25, 351-361, doi: 10.1021/bc400490z (2014).
Gebleux, R., Briendl, M., Grawunder, U. & Beerli, R. R. Sortase A Enzyme-Mediated Generation of Site-Specifically Conjugated Antibody-Drug Conjugates. Methods Mol Biol 2012, 1-13, doi: 10.1007/978-1-4939-9546-2_1 (2019).
Purkayastha, A. & Kang, T. J. Stabilization of Proteins by Covalent Cyclization. Biotechnol Bioproc E 24, doi: 10.1007/s12257-019-0363-4 (2019).
[0432] Fierer, J. O., Veggiani, G. & Howarth, M. SpyLigase peptide-peptide ligation polymerizes affibodies to enhance magnetic cancer cell capture. Proc Natl Acad Sci USA 111, E1176-1181, doi: 10.1073/pnas.1315776111 (2014).
van 't Hof, W., Hansenova Manaskova, S., Veerman, E. C. & Bolscher, J. G. Sortase-mediated backbone cyclization of proteins and peptides. Biol Chem 396, 283-293, doi: 10.1515/hsz-2014-0260 (2015).
Arnott, Z. L. P. et al. Quantitative N- or C-Terminal Labelling of Proteins with Unactivated Peptides by Use of Sortases and a D-Aminopeptidase. Angew Chem Int Ed Engl 63, e202310862, doi: 10.1002/anie.202310862 (2024).
Chong et al. Structural Basis of High-Precision Protein Ligation and Its Application. J Am Chem Soc 147, 1604-1611 (2025).
[0433] The present invention is further described with reference to the following non-limiting figures. The figures show:
[0434] Figure 1: Schematic representation of the Connectase reaction mechanism as previously detailed in WO 2021/099484. The protein structures were predicted with AlphaFold 2 (reference 15) and represent the Connectase (i.e., a polypeptide comprising an N-terminal DUF2121 domain in the context of the present invention). For simplicity, the structures are shown mirror-inverted, so that the Connectase binding channel is visible and the Connectase recognition sequences (e.g., ELASKDPGAFDADPLVVEI, SEQ ID NO: 35) can be read from left (N-terminus) to right (C-terminus). A and B symbolize peptides or proteins (and, in this context, do not refer to amino acids in the single-letter code). “Cnt” refers to the Connectase. The reaction mechanism is further detailed in Example 2.
[0435] Figure 2: Mutagenic analysis of the Connectase recognition sequence. A first substrate polypeptide (“Ub-Strep”) consisting of ubiquitin linked to a (complete) C-terminal Connectase recognition sequence (i.e., ELASKDPGAFDADPLVVEI, SEQ ID NO: 35) followed by a Streptavidin tag (“Strep”) was fused to various second substrate polypeptides, each consisting of a (partial) N-terminal Connectase recognition sequence variant (XGAFDADPLVVEI, with “X” being any of the 20 standard proteinogenic amino acids; SEQ ID NO: 4-23). The second substrate polypeptides differ in the first amino acid (“X”): proline (i.e., “P” in “PGAFDADPLVVEI”, SEQ ID NO: 16) was replaced by each of the other 19 standard proteinogenic amino acids. Shown are the results of an SDS-PAGE time course analysis of each reaction (1 eq. Ub-Strep, 1 eq. peptide, 0.01 eq. Connectase, 22°C). The gradual emergence of the fusion product (i.e., “Ub-Peptide”) is shown over time. Based on densitometric analyses of these gels, the relative reaction rates with the different educts (i.e., second substrate polypeptides) were estimated as 90% (X = A), 80% (X = C), 0% (X = D), 0% (X = E), 2% (X = F), 3% (X = G), 0% (X = H), 1% (X = I), 0% (X = K), 2% (X = L), 3% (X = M), 0% (X = N), 100% (X = P), 0% (X = Q), 0% (X = R), 30% (X = S), 4% (X = T), 8% (X = V), 0% (X = W), and 0% (X = Y). The peptide educts (XGAFDADPLVVEI) and byproducts (PGAFDADPLVVEI-Strep; top panel) are not visible on the gels due to their low molecular weight. Relative ligation rates were normalized to the fastest reaction (X = Pro). A high relative ligation rate indicates efficient coupling of the two polypeptides.
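For working with the Figure 2 data programmatically, the relative rates listed above can be held in a simple lookup table. The following Python sketch is illustrative only; the name `efficient_residues` and its default 30% cut-off for calling a substitution "efficient" are hypothetical choices, not criteria defined in this application.

```python
# Relative ligation rates (%) of XGAFDADPLVVEI peptides as reported for Figure 2,
# normalized to the fastest reaction (X = Pro = 100%).
RELATIVE_RATE = {
    "A": 90, "C": 80, "D": 0, "E": 0, "F": 2, "G": 3, "H": 0,
    "I": 1, "K": 0, "L": 2, "M": 3, "N": 0, "P": 100, "Q": 0,
    "R": 0, "S": 30, "T": 4, "V": 8, "W": 0, "Y": 0,
}


def efficient_residues(threshold: int = 30) -> list[str]:
    """Return residues X whose relative rate is >= threshold, fastest first.

    The default threshold (30%) is a hypothetical cut-off for illustration.
    """
    hits = [aa for aa, rate in RELATIVE_RATE.items() if rate >= threshold]
    return sorted(hits, key=lambda aa: RELATIVE_RATE[aa], reverse=True)


if __name__ == "__main__":
    print(efficient_residues())    # ['P', 'A', 'C', 'S']
    print(efficient_residues(50))  # ['P', 'A', 'C']
```

With the default cut-off, only proline and the three residues discussed in Example 2 (A, C, S) remain.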
[0436] Figure 3: Complete protein-protein ligations. Shown are SDS-PAGE time course analyses of ligation reactions using Lysine-tRNA ligase (LysS) and maltose-binding protein (MBP; A, D, E, F), Glutathione-S-Transferase (GST) and MBP (B), or Ubiquitin with Streptavidin tag (Ub-Strep) and AGAFDADPLVVEI peptide (C) as educts. The resulting fusion polypeptides are accordingly labeled “LysS-MBP”, “GST-MBP”, “Ub-Peptide”, “LysS-MBP”, “LysS-MBP”, and “LysS-MBP” in panels (A) to (F), respectively. Each reaction was performed with 1 eq. N-terminal fusion partner (LysS, GST, or Ubiquitin, each having a C-terminal ELASKDPGAFDADPLVVEI sequence; i.e., first substrate polypeptide), 1 eq. C-terminal fusion partner (MBP with N-terminal AGAFDADPLVVEI sequence or AGAFDADPLVVEI peptide, SEQ ID NO: 40 and 4, respectively; i.e., second substrate polypeptide), 0.033 eq. Connectase, and 0.066 eq. BcPAP. The educt concentration was 100 µM (except for D: 10 µM) and the incubation temperature 22°C (except for E: 10°C). In (F), an MBP protein with an additional N-terminal Tobacco Etch Virus protease (TEV protease) recognition sequence (ENLYFQA; SEQ ID NO: 50) linked N-terminally to the partial N-terminal Connectase recognition sequence (i.e., MENLYFQ|AGAFDADPLVVEI-MBP, with “|” representing the cleavage site of the TEV protease; SEQ ID NO: 26) was used, and TEV protease (0.01 eq.; SEQ ID NO: 27) was added to the reaction. A densitometric analysis of the protein bands is shown below each experiment. For the educts, the values reflect the educt band density relative to the educt band in the control sample (0 min); for the products, the values reflect the product band density relative to the total band density (educt band density + product band density). In (C), no band corresponding to the AGAFDADPLVVEI peptide is shown due to its low molecular weight. More sample was loaded for (C) than for (A) and (B), so that the substrate bands appear equally intense.
[0437] Figure 4 (related to Figure 3): Complete protein-protein ligations. Shown are SDS-PAGE time course analyses of ligation reactions using Lysine-tRNA ligase (LysS; i.e., first substrate polypeptide; C-terminal ELASKDPGAFDADPLVVEI sequence, SEQ ID NO: 24) and Ubiquitin (Ub; i.e., second substrate polypeptide; N-terminal AGAFDADPLVVEI (A) or PAGAFDADPLVVEI (B; “Pro-Ub substrate”, as its partial N-terminal Connectase recognition sequence carries an additional N-terminal proline residue) sequence, SEQ ID NO: 41 and 42, respectively) as educts. The resulting fusion polypeptides are accordingly labeled “LysS-Ub”. Both reactions were performed with 1 eq. of each educt (100 µM), 0.033 eq. Connectase, and 0.066 eq. BcPAP at 22°C. A densitometric analysis of the protein bands is shown below each experiment. For the educts, the values reflect the educt band density relative to the educt band in the control sample (0 min); for the products, the values reflect the product band density relative to the total band density (educt band density + product band density).
[0438] Figure 5: LC-MS analysis of an equimolar Ub-MBP mixture before (A) and after (B, C) conjugation. A mixture of 100 µM Ub-ELASKDPGAFDADPLVVEI-Strep (SEQ ID NO: 3) and 100 µM AGAFDADPLVVEI-MBP (SEQ ID NO: 40) was analyzed using LC-MS before (A) and after (B) incubation with 0.033 eq. Connectase and 0.066 eq. BcPAP. The reaction byproduct GAFDADPLVVEI-Strep (C; SEQ ID NO: 49) appeared as an extra peak upon conjugation. MBP was detected both as a full-length version and as an N-terminally truncated version (MBP Al-221; SEQ ID NO: 51). The signal intensities in the plots were normalized to the most intense peak. Figure 6: Effect of protease inhibitors on BcPAP activity. Shown is the conjugation of Ub-Strep (educt, i.e., “first substrate polypeptide”) to AGAFDADPLVVEI peptide (SEQ ID NO: 4; i.e., “second substrate polypeptide”) in the presence of different protease inhibitors. The reaction catalyzed by Connectase alone (i.e., without BcPAP; upper gel) is not inhibited by these substances and results in an equilibrium between Ub-Strep educt and Ub-Peptide product (as in Figure 2). In a reaction with Connectase and BcPAP (lower gel), up to 100% Ub-Peptide product is formed. Lower product yields indicate an inhibition of BcPAP. This effect is most pronounced for the serine protease inhibitors AEBSF and PMSF and for a commercial AEBSF-containing inhibitor mix ("Complete"). ZnCl2, which had been reported previously as a BcPAP inhibitor16, led to the precipitation of Connectase and BcPAP.
[0439] Figure 7: Quantification of αHER2 antibodies in cell culture medium. Heavy (HC) and light (LC) chains of an antibody against human epidermal growth factor receptor 2 (αHER2), each with a (complete) C-terminal Connectase recognition sequence (i.e., ELASKDPGAFDADPLVVEI, SEQ ID NO: 35), were expressed in HEK293 cells. The medium containing the secreted antibodies was exchanged daily, allowing antibody expression levels to be monitored by in-gel fluorescence (A). In this alternative to Western blotting, Connectase is used to fuse fluorophores (i.e., Cy5.5) to the target proteins (HC, LC) and to a reference protein (“Ref”). By comparing the intensities of the resulting fluorescent bands, antibody expression levels can be determined (lower panel; “αHER2 [mg/l]”). A Coomassie stain of the same gel (B) shows all proteins in the cell culture medium samples.
[0440] Figure 8: Antibody conjugation. Shown are SDS-PAGE time course (A, B) and LC-MS analyses (C) of αHER2 (anti-human epidermal growth factor receptor 2) antibody conjugations. The αHER2 heavy (HC) and light chains (LC) were produced with a (complete) C-terminal Connectase recognition sequence (i.e., ELASKDPGAFDADPLVVEI, SEQ ID NO: 35) followed C-terminally by a Streptavidin tag (HC-Strep and LC-Strep, respectively). In the reactions (25 µM αHER2, 0.033 eq. Connectase, 0.066 eq. BcPAP, 22°C), the Streptavidin tag is replaced by Ubiquitin (A; 1 eq.) or by the shorter AGAFDADPLVVEI peptide (B; 1 eq.; SEQ ID NO: 4). A densitometric quantification of the product bands (“HC-Ub” and “LC-Ub”, or “HC-peptide” and “LC-peptide”, in A and B, respectively) relative to the educt bands (LC-Strep, HC-Strep) is shown below the gels. For the calculation, the BcPAP density in the control lane was subtracted from the combined LC-Strep / BcPAP bands. The LC-MS analyses (C) show the assemblies (i.e., fusion polypeptides or polymers) in the unconjugated antibody sample (top panel) and a shift of the detected masses consistent with near-complete (i.e., lacking only a C-terminal lysine residue) conjugation to peptide (middle panel) or ubiquitin (lower panel).
[0441] Figure 9: Protein cyclization. Shown are SDS-PAGE time course analyses of a Ubiquitin cyclization reaction. The Ubiquitin educt employed (SEQ ID NO: 43) was produced with both a (partial) N-terminal (AGAFDADPLVVEI, SEQ ID NO: 4) and a (complete) C-terminal (ELASKDPGAFDADPLVVEI, SEQ ID NO: 35) Connectase recognition sequence. This allows the formation of linear polymers (L1 - L4, formed by 1 - 4 Ubiquitin proteins, respectively; i.e., fusion polypeptides or assemblies), which are observed in the early stages of the time course. The N-terminus of a given polymer can be fused to its C-terminus, resulting in cyclic assemblies (C2 - C6, formed by 2 - 6 Ubiquitin proteins, respectively), which represent the end products of the reaction. A lower educt concentration (A, 10 µM) results in smaller assemblies, and a higher educt concentration (B, 100 µM) in larger assemblies. The assignment of the gel bands (upper panels) is consistent with the LC-MS data shown in the lower panels of (A) and (B).
[0442] Figure 10: Determination of the minimal Connectase recognition sequence. Connectase acts on a linker sequence derived from its physiological interaction partner, methyltransferase A (MtrA). In an initial characterization13, this sequence was identified as RELASKDPGAFDADPLVVEI. It remained unclear whether it could be further shortened from the N-terminal side. The depicted gel shows the Connectase-mediated conjugation of Ub-Strep (SEQ ID NO: 3) to RELASKDPGAFDADPLVVEI (peptide 1; SEQ ID NO: 34), ELASKDPGAFDADPLVVEI (peptide 2; SEQ ID NO: 35), or LASKDPGAFDADPLVVEI (peptide 3; SEQ ID NO: 36). The product, Ub-peptide, is formed at a similar rate with peptide 1 (relative ligation rate determined by densitometric analysis: 94%) and peptide 2 (100%), but at a reduced rate with peptide 3 (37%). This suggests that ELASKDPGAFDADPLVVEI is sufficient for efficient conjugation reactions. The protein substrates employed herein carry the C-terminal RELASKDPGAFDADPLVVEI sequence, with the additional N-terminal arginine serving as a short linker.
[0443] Figure 11: Complete protein conjugations with FmPAP. Shown is the conjugation of Ub-Strep (educt, i.e., first substrate polypeptide) to AGAFDADPLVVEI peptide (SEQ ID NO: 4; i.e., second substrate polypeptide). In the absence of FmPAP (left gel), Connectase catalyzes an equilibrium between Ub-Strep educt and Ub-Peptide product (as in Figure 2). Upon addition of FmPAP (right gel), the equilibrium is shifted, so that up to 100% Ub-Peptide product is formed.
[0444] Figure 12: Chemicals can be used to shift the Connectase reaction equilibrium. Shown is the conjugation of Ub-Strep P95C ("Educt 1"; SEQ ID NO: 47; first substrate polypeptide) to a Ubiquitin substrate with an N-terminal Connectase recognition sequence ("Educt 2"; SEQ ID NO: 48; second substrate polypeptide). The reactions were conducted in the presence of 0 - 1500 µM 2-formylphenylboronic acid (2-FPBA), incubated for 2 - 16 h, and analyzed by SDS-PAGE. The addition of 2-FPBA leads to increased Ub-Ub fusion product yields.
[0445] Figure 13: Schematic representation of protein immobilization using a Connectase-BcPAP fusion enzyme. Shown is the conjugation of a protein of interest (POI) to a biotinylated peptide using a Connectase-BcPAP fusion enzyme over the course of 90 min. Figure 14: Schematic representation of protein immobilization using a Connectase-BcPAP fusion enzyme. This figure summarizes the workflows performed in Example 13.
[0446] Figure 15: Results of an ELISA assay after protein immobilization using a Connectase-BcPAP fusion enzyme. Shown is a comparison of HRP signal intensities over a range of HER2 antigen concentrations applied to a “Connectase plate” produced in accordance with the present invention and to a commercially available plate (“Commercial plate”), each functionalized with anti-HER2 antibodies. The lower curve corresponds to the “Connectase plate”.
[0447] Figure 16: Comparison of Connectase-mediated protein detection and conventional Western blotting. Left panel: Coupling of fluorophore-containing substrate polypeptides to a protein of interest comprising a DUF2121 recognition sequence using Connectase enables competition-based in-gel quantification. Here, a substrate polypeptide linked to a fluorophore is transferred both to unknown amounts of the protein of interest and to predefined quantities of a reference protein, and is coupled thereto using the Connectase (top scheme). After separation by SDS-PAGE, both proteins appear as fluorescent bands (middle gel). The ratio of signal intensities correlates directly with the amount of protein of interest, enabling reliable, accurate, and highly sensitive quantification without the need for blotting (bottom plot). Right panel: In Western blotting, a target protein is routinely detected using primary and secondary antibodies (top scheme). Signal intensity increases with protein quantity (middle blot), but this relationship is often non-linear and variable between replicates, resulting in lower reliability, accuracy, and sensitivity (bottom plot).
[0448] Figure 17: Schematic representation of a method of coupling two proteins (A and B) using the methods provided herein. Protein A comprises a C-terminal Cnt-Tag (i.e., a first substrate polypeptide comprising a DUF2121 recognition sequence) and Protein B comprises an N-terminal Cnt-Tag (i.e., a second substrate polypeptide comprising a partial N-terminal DUF2121 recognition motif). After binding of the second cleavage product to the Connectase (thereby forming a covalent Protein A-Connectase intermediate), the first cleavage product (i.e., the released byproduct) is modified / cleaved by a modifying enzyme (here, a proline aminopeptidase; PAP). This inactivates the first cleavage product (thereby producing an inactivated byproduct), so that it is no longer recognized by the Connectase. Subsequently, the partial N-terminal and the partial C-terminal DUF2121 recognition motifs are linked by the Connectase, resulting in a fusion polypeptide comprising Proteins A and B. Instead of Proteins A and B, the substrate polypeptides may also comprise other proteinaceous or non-proteinaceous moieties.
[0449] Examples
[0450] Certain embodiments of the invention will now be described with reference to the following examples, which are intended for the purpose of illustration only and are not intended to limit the scope of the generality hereinbefore described.
[0451] Example 1: Methods
[0452] Example 1.1: Cloning, Expression, and Purification
[0453] The sequences of all proteins and peptides used in this example are listed in Table 2. The peptide for the in-gel fluorescence assay (Figure 7) was synthesized by Intavis, while all other peptides were synthesized by Genecust. Genes were synthesized by Biocat using optimized codon frequencies for E. coli or human expression. These genes were cloned into the pET30b(+) vector (restriction sites: NdeI, XhoI) for expression in E. coli or the pcDNA3.1 vector (restriction sites: HindIII, XhoI) for expression in HEK293 cells.
[0454] For recombinant expression in E. coli (for all proteins except the αHER2 antibodies in Example 5), BL21-Gold cells were transformed with the respective plasmids and grown in lysogeny broth medium with 50 µg/l kanamycin at 22°C. Protein expression was induced at an optical density of 0.4 at 600 nm with 500 µM isopropyl-β-D-thiogalactoside. Cells expressing soluble proteins were harvested after 16 h, resuspended in buffer (100 mM Tris-HCl, 1x complete EDTA-free protease inhibitor cocktail (Roche; no inhibitor was added to cells expressing BcPAP, SEQ ID NO: 2), 0.02 g/l DNase, pH 8.0), lysed by French press, and cleared from cell debris by ultracentrifugation (120,000 g, 45 min, 4°C).
[0455] For recombinant expression of αHER2 antibodies (Example 5; SEQ ID NO: 31-32), HEK293 cells were cultured at 37°C in ten 75 cm2 flasks with Dulbecco's Modified Eagle Medium (DMEM) supplemented with fetal calf serum. At 70% confluency, they were transfected with plasmids encoding the αHER2 light and heavy chains using Lipofectamine 2000 (Thermo) according to the manufacturer's instructions (47 µl Lipofectamine, 55 µg of each plasmid). The cells were grown for ten days without splitting, and the medium was exchanged daily. The medium samples were used for αHER2 detection by in-gel fluorescence (Figure 7, described below), then pooled, centrifuged (6000 g, 10 min, 4°C), and filtered (0.45 µm) before protein purification.
[0456] For protein purification, His6-tagged proteins were applied to HisTrap HP columns (20 mM Tris-HCl pH 8.0, 250 mM NaCl, 20 - 250 mM imidazole). Strep-tagged proteins were instead purified with StrepTrap XT columns (1.8 mM KH2PO4, 10 mM Na2HPO4, 2.7 mM KCl, 138 mM NaCl, 0 - 50 mM biotin, pH 7.4). After this initial purification step, proteins with N-terminal TEV recognition sequences were incubated with TEV protease at a 1:100 molar ratio. The reaction was performed overnight in dialysis tubes (dialysis buffer: 20 mM Tris-HCl pH 8.0, 250 mM NaCl) at 4°C. The processed proteins were separated from His6-tagged TEV protease, N-terminal fragments (MHHHHHHENLYFQ; SEQ ID NO: 37), and residual unprocessed proteins by another purification step. For this, the reactions were applied a second time to HisTrap HP columns (as above), and the flow-through was collected. All chromatography steps were performed on an ÄKTA Purifier FPLC (GE Healthcare) using Unicorn v5.1.0 software. Purified proteins were supplemented with 15% glycerol, flash-frozen in liquid nitrogen, and stored at -80°C. Protein concentrations were determined by measuring sample absorbance at 280 nm. The measured values were compared with theoretical absorbance values based on the number of tryptophan, tyrosine, phenylalanine, and cysteine residues in the primary amino acid sequence. For the concentration determination of the αHER2 antibody preparation, the calculation was based on the sequence of the HC2LC2 assembly.
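The concentration determination described above follows the Beer-Lambert law, c = A280 / (ε · l), with ε estimated from the amino acid composition. The Python sketch below assumes the widely used per-residue coefficients of Pace et al. (Trp 5500, Tyr 1490, and 125 M⁻¹ cm⁻¹ per cystine); these values and the function names are assumptions of the sketch, not taken from this application. Phenylalanine, which the text also lists, absorbs only weakly at 280 nm and is ignored here.

```python
# Per-residue molar extinction coefficients at 280 nm (M^-1 cm^-1), assumed
# here from Pace et al. (1995); not specified in the application itself.
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125  # per disulfide-bonded pair of cysteines


def extinction_coefficient(sequence: str, disulfides: bool = True) -> int:
    """Estimate the molar extinction coefficient at 280 nm from a one-letter sequence."""
    seq = sequence.upper()
    eps = seq.count("W") * EPS_TRP + seq.count("Y") * EPS_TYR
    if disulfides:
        # Assumes every pair of cysteines forms one disulfide bond.
        eps += (seq.count("C") // 2) * EPS_CYSTINE
    return eps


def molar_concentration(a280: float, sequence: str, path_cm: float = 1.0,
                        disulfides: bool = True) -> float:
    """Beer-Lambert law: c = A280 / (epsilon * path length), returned in mol/l."""
    eps = extinction_coefficient(sequence, disulfides)
    if eps == 0:
        raise ValueError("sequence has no Trp, Tyr, or cystine; A280 is not usable")
    return a280 / (eps * path_cm)
```

For an HC2LC2 antibody assembly, the coefficient of the whole assembly can be approximated by passing the concatenated sequence (2 × HC + 2 × LC), which sums the Trp and Tyr contributions of all four chains.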
[0457] Example 1.2: Biochemical Assays
[0458] Unless noted otherwise, all conjugation reactions were performed at 22°C in neutral (pH 7.0) buffer containing 50 mM sodium acetate, 50 mM MES, 50 mM HEPES, 150 mM NaCl, and 50 mM KCl. They were stopped with SDS loading buffer (final concentration: 50 mM Tris-HCl, 2% SDS, 10% glycerol, 25 mM β-mercaptoethanol, 0.01% bromophenol blue, pH 6.8; final protein concentration approx. 0.1 g/l) and incubated at 90°C for 10 minutes. The samples were separated using mPAGE 12% Bis-Tris gels (Merck; 5 µl loading volume per sample) with MOPS running buffer (50 mM MOPS, 50 mM Tris, 0.1% SDS, 1 mM EDTA; no pH adjustment). The gels were stained with Coomassie blue (25% ethanol, 25% methanol, 10% acetate, 0.25% Coomassie R-250) and subsequently with Coomassie colloidal solution (20% ethanol, 10% ammonium sulfate, 5.8% phosphoric acid, 5% methanol, 0.12% Coomassie G-250). They were destained with 10% acetic acid and imaged with an Azure Sapphire NIR fluorescence scanner (excitation at 685 nm, emission at 725 nm, 25 - 50 µm resolution, intensity 8, highest scanning speed). Densitometric band quantification was performed with Image Studio Lite 5.2.
[0459] For Figure 2, 20 experiments were set up, each with a different XGAFDADPLVVEI peptide (X = any of the 20 amino acids; SEQ ID NO: 4 to 23). The reactions contained 20 µM Ub-Strep (SEQ ID NO: 3), 20 µM peptide (SEQ ID NO: 4-23), and 0.2 µM Connectase (SEQ ID NO: 39). Samples were taken before the addition of Connectase (0 min) and after the indicated times (0.1 - 96 h). After SDS-PAGE and densitometric analyses, the relative reaction rates in each experiment were estimated. An exact determination was not possible because the reaction is reversible, and the emerging peptide side product (PGAFDADPLVVEI-Strep; SEQ ID NO: 38) competes with the assayed peptide (XGAFDADPLVVEI; SEQ ID NO: 4-23; X = any canonical proteinogenic amino acid) for the enzyme binding sites. To reduce this effect, the estimates were based on the time required to reach 10% fusion product yield in each reaction.
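Because the reaction is reversible, the rates were estimated from the time needed to reach 10% fusion product rather than from final yields. One simple way to turn such time points into relative rates, assuming the early-phase rate is inversely proportional to that time, is sketched below in Python. The function name and the example times are hypothetical placeholders, not values from this application.

```python
import math


def relative_rates(t10_minutes: dict[str, float], reference: str = "P") -> dict[str, float]:
    """Estimate relative ligation rates (%) from the time needed to reach 10% product.

    Assumption: early in the reaction, the ligation rate is inversely
    proportional to the time t10 needed to reach 10% fusion product, so
    rate_X / rate_ref = t10_ref / t10_X. Reactions that never reach 10%
    can be entered as math.inf and yield a relative rate of 0.
    """
    t_ref = t10_minutes[reference]
    return {aa: 100.0 * t_ref / t for aa, t in t10_minutes.items()}


if __name__ == "__main__":
    # Hypothetical t10 values for illustration only (not measured data).
    example = {"P": 5.0, "S": 20.0, "D": math.inf}
    print(relative_rates(example))  # {'P': 100.0, 'S': 25.0, 'D': 0.0}
```

Reading the rates at a low, fixed conversion keeps the estimate in the regime where the competing PGAFDADPLVVEI-Strep byproduct is still scarce.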
[0460] For Figure 3 and Figure 4, 8 experiments were set up with different educt pairs (Figure 3A, D, E: LysS (SEQ ID NO: 24) / MBP (SEQ ID NO: 40); 3B: GST (SEQ ID NO: 25) / MBP (SEQ ID NO: 40); 3C: Ub-Strep (SEQ ID NO: 3) / AGAFDADPLVVEI peptide (SEQ ID NO: 4); 3F: LysS (SEQ ID NO: 24) / MBP before TEV cleavage (see Example 1.1; SEQ ID NO: 26); Figure 4A: LysS (SEQ ID NO: 24) / Ub (SEQ ID NO: 41); 4B: LysS (SEQ ID NO: 24) / Pro-Ub (SEQ ID NO: 42)). Each educt was used at 100 µM (except for Figure 3D: 10 µM). The reactions were started by addition of 0.033 molar equivalents (eq.) Connectase (SEQ ID NO: 39) and 0.066 eq. BcPAP (SEQ ID NO: 2). For the experiment shown in Figure 3F, 0.01 eq. TEV protease (SEQ ID NO: 27) was also added. The reactions were incubated at 22°C (except for Figure 3E: 10°C). Samples were taken before (0 min) and after the addition of Connectase (SEQ ID NO: 39) / BcPAP (SEQ ID NO: 2) (3.8 - 960 min). After SDS-PAGE, the educt band densities relative to the control sample (0 min) were determined. The product band densities were determined relative to the total band density (educt band density + product band density).
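The two normalizations used for the densitometric read-out can be written as small helper functions. This Python sketch assumes baseline-corrected band densities in arbitrary but consistent units; the function names and the optional background term are illustrative. Note that Coomassie-based densities are roughly mass-weighted, so the product fraction is not strictly a molar fraction when educt and product differ in size.

```python
def educt_fraction(educt_band: float, control_educt_band: float) -> float:
    """Educt band density relative to the same educt band in the 0 min control."""
    return educt_band / control_educt_band


def product_fraction(product_band: float, educt_band: float,
                     educt_background: float = 0.0) -> float:
    """Product band density relative to the total (educt + product) band density.

    educt_background lets a co-migrating band be subtracted from the educt band
    first (e.g. BcPAP overlapping LC-Strep, as described for Figure 8).
    """
    corrected_educt = educt_band - educt_background
    total = corrected_educt + product_band
    if total <= 0:
        raise ValueError("total band density must be positive")
    return product_band / total
```

For example, a product band of 75 units next to an educt band of 25 units corresponds to a product fraction of 0.75.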
[0461] For Figure 6, Connectase (2 µM; SEQ ID NO: 39) was used without (A) or with BcPAP (7 µM, B; SEQ ID NO: 2). Both solutions were incubated at 22°C for 30 min with the following compounds: buffer (control reaction), 1 mM ZnCl2, "Complete" protease inhibitor mix (Roche, 1 tablet per 25 ml), 4-(2-Aminoethyl)-benzenesulfonyl fluoride hydrochloride (AEBSF), ALLN (Calpain inhibitor), Antipain dihydrochloride, Aprotinin, Bestatin, Chymostatin, E-64, EDTA-Na2, Leupeptin, Pepstatin, Phosphoramidon, and phenylmethylsulfonyl fluoride (PMSF). The concentrations of the latter compounds are unknown, as the supplier (G-Biosciences, Protease Inhibitor Set; inhibitors used at "2x" concentration) declined to provide them on request. After the incubation, the Connectase / BcPAP / inhibitor mixture was added to an equal volume of conjugation educts (20 µM Ub-Strep (SEQ ID NO: 3), 20 µM AGAFDADPLVVEI peptide (SEQ ID NO: 4)). After 2 h at 22°C, the reactions were analyzed by SDS-PAGE.
[0462] For Figure 7, the cell culture medium samples taken during αHER2 expression (Example 1.1) were analyzed. They were centrifuged (1 min, 10,000 g), and 2.5 µl of each supernatant was mixed with 300 fmol reference protein (MBP with a C-terminal Connectase recognition sequence; SEQ ID NO: 40). The mixture was incubated with 1 nM Connectase (SEQ ID NO: 39) and 10 nM fluorescent peptide educt (RELASKDPGAFDADPLVVEISEEGE-Cy5.5; SEQ ID NO: 30; synthesized by Intavis, Tuebingen, Germany) for 20 min at 22°C. The reactions were separated by SDS-PAGE and imaged before and after Coomassie staining. The band densities corresponding to the reference protein and to the αHER2 light chains (“LC”; SEQ ID NO: 45) and heavy chains (“HC”; SEQ ID NO: 44) were determined and used to estimate the expressed αHER2 quantities over time. The approach is described in detail in reference14.
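The quantification uses the reference protein as an internal standard: because the target chains and the reference protein are labeled with the same fluorescent peptide in the same reaction, the ratio of band intensities can be converted into an amount. A minimal Python sketch of that conversion follows. The one-label-per-chain assumption, the function names, and the molecular weight passed in are illustrative assumptions, and the approach of reference 14 may include further corrections.

```python
def target_amount_fmol(target_signal: float, reference_signal: float,
                       reference_fmol: float = 300.0) -> float:
    """Amount of target chain, scaled from the co-labeled reference protein.

    Assumes one fluorophore per labeled chain and equal labeling efficiency,
    so that fluorescence is proportional to the number of molecules.
    """
    return reference_fmol * target_signal / reference_signal


def concentration_mg_per_l(amount_fmol: float, sample_volume_ul: float,
                           molecular_weight_da: float) -> float:
    """Convert an amount in a given sample volume into a mass concentration (mg/l)."""
    molar = (amount_fmol * 1e-15) / (sample_volume_ul * 1e-6)  # mol/l
    return molar * molecular_weight_da * 1e3  # g/l -> mg/l
```

For example, a band twice as bright as the 300 fmol reference corresponds to 600 fmol in the 2.5 µl sample; the molecular weight argument (e.g. 25,000 Da) is a hypothetical placeholder to be replaced by the value for the chain being quantified.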
[0463] For Figure 8, 25 µM Strep-tagged αHER2 antibody was mixed with 3.33 µM Connectase (SEQ ID NO: 39), 6.66 µM BcPAP (SEQ ID NO: 2), and either 100 µM AGAFDADPLVVEI-Ubiquitin (SEQ ID NO: 41) or 100 µM AGAFDADPLVVEI peptide (SEQ ID NO: 4). The reactions were incubated at 22°C, and samples were taken after the indicated times (0 - 120 min). After SDS-PAGE, the product band quantities (i.e., LC-Ub, HC-Ub, or LC-peptide, HC-peptide) were determined relative to the educt band quantities (LC-Strep, HC-Strep).
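The molar equivalents quoted in the Figure 8 legend (0.033 eq. Connectase, 0.066 eq. BcPAP, 1 eq. substrate) match the concentrations given here if they are expressed relative to the recognition sequences rather than to the antibody: a 25 µM HC2LC2 preparation carries four C-terminal recognition sequences per antibody, i.e., 100 µM sites. This reading is an interpretation of the stated numbers; the short Python sketch below, with hypothetical function names, makes the arithmetic explicit.

```python
def recognition_sites_uM(antibody_uM: float, tagged_chains: int = 4) -> float:
    """Concentration of C-terminal recognition sequences in an HC2LC2 antibody sample.

    Assumes every heavy and light chain carries one recognition sequence.
    """
    return antibody_uM * tagged_chains


def equivalents(component_uM: float, sites_uM: float) -> float:
    """Molar equivalents of a reaction component relative to the recognition sites."""
    return component_uM / sites_uM


if __name__ == "__main__":
    sites = recognition_sites_uM(25.0)  # four tagged chains per antibody
    for name, conc in [("Connectase", 3.33), ("BcPAP", 6.66), ("Substrate", 100.0)]:
        print(f"{name}: {equivalents(conc, sites):.4f} eq.")
```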
[0464] For Figure 9, a ubiquitin variant with a (partial) N-terminal (AGAFDADPLVVEI) and a (complete) C-terminal (ELASKDPGAFDADPLVVEI) Connectase recognition sequence was employed (SEQ ID NO: 43). This educt was used at a concentration of 10 µM (first experiment; Figure 9A) and 100 µM (second experiment; Figure 9B). Both reactions were conducted with 0.033 eq. Connectase (SEQ ID NO: 39) and 0.066 eq. BcPAP (SEQ ID NO: 2) at 22°C. Samples were taken after the indicated times and analyzed by SDS-PAGE.
[0465] For Figure 10, Ub-Strep (SEQ ID NO: 3; 10 µM) was mixed with RELASKDPGAFDADPLVVEI (SEQ ID NO: 34), ELASKDPGAFDADPLVVEI (SEQ ID NO: 35), or LASKDPGAFDADPLVVEI (SEQ ID NO: 36) peptides (10 µM) and 0.25 µM Connectase (SEQ ID NO: 39). Reaction samples were taken after the indicated times (0 - 60 min) and analyzed by SDS-PAGE.
[0466] For Figure 11, reactions with 10 µM Ub-Strep (SEQ ID NO: 3), 10 µM AGAFDADPLVVEI (SEQ ID NO: 4), 1 µM Connectase (SEQ ID NO: 1), and 0 µM or 0.1 µM FmPAP comprising a C-terminal His-tag (SEQ ID NO: 583) were set up, incubated for up to 15 h, and analyzed by SDS-PAGE.
[0467] For Figure 12, reactions with 10 µM Ub-Strep P95C (SEQ ID NO: 584), 10 µM Ubiquitin with N-terminal Connectase recognition sequence (SEQ ID NO: 585), 1 µM Connectase (SEQ ID NO: 1), and 0 - 1500 µM 2-formylphenylboronic acid (2-FPBA; Sigma Aldrich) were set up, incubated for up to 16 h, and analyzed by SDS-PAGE.
[0468] Example 1.3: Liquid Chromatography-Mass Spectrometry (LC-MS)
[0469] For LC-MS, the following samples and conditions were used:
[0470] For Figure 5, 100 µM Ub-Strep (SEQ ID NO: 3), 100 µM MBP (SEQ ID NO: 40), 3.33 µM Connectase (SEQ ID NO: 39), and 6.66 µM BcPAP (SEQ ID NO: 2) were mixed. The reaction was incubated for 4 h at 22°C and then used for LC-MS analysis.
[0471] For Figure 8, 25 µM Strep-tagged αHER2 antibody (an assembly comprising two heavy and two light chains, according to SEQ ID NO: 44 and 45, respectively, was assumed for protein concentration determination) was mixed with 3.33 µM Connectase (SEQ ID NO: 39), 6.66 µM BcPAP (SEQ ID NO: 2), and either 100 µM AGAFDADPLVVEI-Ubiquitin (SEQ ID NO: 41) or 100 µM AGAFDADPLVVEI peptide (SEQ ID NO: 4). The reaction was incubated for 4 h at 22°C and then used for LC-MS analysis. Unconjugated αHER2 antibody (SEQ ID NO: 44-45) was used as a control. All samples were deglycosylated with PNGase F (R&D Systems) for 16 h at 37°C.
[0472] For Figure 9, a ubiquitin variant with (partial) N- and (complete) C-terminal Connectase recognition sequences (SEQ ID NO: 43) was used at two different concentrations, 10 µM (A) and 100 µM (B). Both samples were incubated with 0.033 eq. Connectase (SEQ ID NO: 39) and 0.066 eq. BcPAP (SEQ ID NO: 2) for 4 h at 22°C and then used for LC-MS analysis.
[0473] The samples were stored at 4°C for up to 6 hours before they were applied (1.6 µg protein) to an Acquity BEH C4 column using an UltiMate 3000 UHPLC. They were eluted with a 0 - 50% H2O / acetonitrile gradient in the presence of 0.1% formic acid over 7 minutes. The eluted molecules were analyzed with a maXis HD UHR q-TOF spectrometer. Mass spectrometer parameters were adapted to the size of the molecule and the chromatography flow rate (by default 0.15 ml/min). Data analysis was performed using Bruker Compass DataAnalysis v6.1 software (Bruker Daltonik, Bremen, Germany). Charge deconvolution of the m/z spectra was performed with the MaxEnt deconvolution algorithm (Bruker Daltonik, Bremen, Germany). Deconvolution artifacts without a corresponding m/z series were excluded.
[0474] Example 2: Modification of the Connectase recognition sequence
[0475] Like other enzymatic ligases, Connectase catalyzes a reversible reaction13. Connectase from M. mazei (SEQ ID NO: 161), for example, binds substrates with the sequence A-ELASKDPGAFDADPLVVEI (Figure 1, step 1; i.e., a first substrate polypeptide comprising the complete DUF2121 recognition motif ELASKDPGAFDADPLVVEI (SEQ ID NO: 35) in the context of the present invention, with A being any desired chemical molecule / polypeptide / polypeptide stretch). It then cleaves off the C-terminal peptide PGAFDADPLVVEI (Figure 1, steps 2-3; i.e., “a first cleavage product” in the context of the present invention; SEQ ID NO: 16) and forms a covalent intermediate, A-ELASKD-Connectase, with the N-terminal part of this sequence (i.e., with “A-ELASKD”, which is considered “a second cleavage product” in the context of the present invention; ELASKD is shown in SEQ ID NO: 46). This reaction works both ways, meaning that PGAFDADPLVVEI (i.e., the first cleavage product; SEQ ID NO: 16) can react with A-ELASKD-Connectase to restore the Connectase and its substrate, A-ELASKDPGAFDADPLVVEI. However, when a second substrate B in the form of PGAFDADPLVVEI-B (i.e., “a second substrate polypeptide” comprising a partial N-terminal DUF2121 recognition sequence according to SEQ ID NO: 16, with B being any desired chemical molecule / polypeptide / polypeptide stretch) is added to the reaction, it may be used instead of the peptide PGAFDADPLVVEI (i.e., the first cleavage product; SEQ ID NO: 16) to form the fusion product A-ELASKDPGAFDADPLVVEI-B (also “A-B” herein below; Figure 1, steps 4-5; i.e., a fusion polypeptide comprising the second cleavage product of the first substrate polypeptide linked to the N-terminus of the second substrate polypeptide in the context of the present invention).
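The cleavage and transfer steps described above can be followed at the level of the amino acid sequences. The Python sketch below is a deliberately simplified string model, not a model of the enzyme: it assumes one-letter sequences, cleavage between ELASKD and the following residue, and a second substrate whose partial motif consists of any first residue followed by GAFDADPLVVEI. The function names and the flanking sequences in the usage note are hypothetical placeholders.

```python
# Simplified string model of Connectase-mediated transpeptidation (illustrative).
RECOGNITION = "ELASKDPGAFDADPLVVEI"   # complete motif in the first substrate
CLEAVE_AFTER = "ELASKD"               # N-terminal part kept on the enzyme
PARTIAL_MOTIF_TAIL = "GAFDADPLVVEI"   # follows residue X in XGAFDADPLVVEI


def cleave_first_substrate(first_substrate: str) -> tuple[str, str]:
    """Split A-ELASKD|PGAFDADPLVVEI into (second cleavage product, first cleavage product)."""
    idx = first_substrate.find(RECOGNITION)
    if idx < 0:
        raise ValueError("no complete Connectase recognition motif found")
    cut = idx + len(CLEAVE_AFTER)
    return first_substrate[:cut], first_substrate[cut:]


def ligate(second_cleavage_product: str, second_substrate: str) -> str:
    """Join A-ELASKD (enzyme-bound) to the N-terminus of XGAFDADPLVVEI-B."""
    if not second_substrate[1:].startswith(PARTIAL_MOTIF_TAIL):
        raise ValueError("second substrate lacks the partial N-terminal motif")
    return second_cleavage_product + second_substrate
```

For example, cleaving "MQIF" + ELASKDPGAFDADPLVVEI + "WSHPQFEK" yields "MQIFELASKD" and "PGAFDADPLVVEIWSHPQFEK"; transferring the first fragment to "AGAFDADPLVVEIKIEE" gives "MQIFELASKDAGAFDADPLVVEIKIEE", in which the junction restores a complete recognition sequence carrying alanine in place of proline.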
[0476] When using equimolar quantities of educts A and B (i.e., of the first and second substrate polypeptides, respectively), Connectase catalyzes an equilibrium of approximately 50% fusion product A-B and 50% educts. In the specific example of the M. mazei Connectase, this is because equal amounts of PGAFDADPLVVEI peptide byproduct (i.e., the first cleavage product; SEQ ID NO: 16) and PGAFDADPLVVEI-B educt (i.e., the second substrate polypeptide) compete for the A-ELASKD-Connectase intermediate (Figure 1, step 3; i.e., the enzyme-bound second cleavage product). The aim of the present invention was to shift the equilibrium of reactions catalyzed by Connectases towards more than about 50% of the desired fusion products, i.e., more than about 50% of the A-B fusion products. It was surprisingly found that, with the means and methods provided herein, at least 55% and up to nearly 100% fusion product can be achieved; i.e., nearly all educts are ligated / fused / coupled to the desired fusion protein. In the context of the invention, means and methods are provided that favorably shift the equilibrium towards the product side, i.e., a higher yield of the desired fusion protein / fusion product / fusion peptide can be achieved. This higher yield can be achieved in Connectase reactions by removing or inactivating the peptide byproduct (i.e., the first cleavage product) from the reaction mixture / the pool of reactants. Such “removal” or “inactivation” of the peptide byproduct / first cleavage product can, for example, be achieved by specific enzymatic proteolysis of this byproduct. The enzymatic proteolysis can comprise the use of specific proteases (such as, e.g., a proline aminopeptidase) that act only on the first cleavage product (in this exemplified case of the M. mazei Connectase, the PGAFDADPLVVEI peptide) and not on the second substrate polypeptide (in the examples herein, an AGAFDADPLVVEI-B educt, in which the N-terminal proline is replaced by alanine; see Figure 2).
This inventive principle can also be used for other Connectase recognition sequences as shown herein (e.g., for the Connectase recognition sequences shown in SEQ ID NO: 365 to 578). The removal of the peptide byproduct / first cleavage product, as illustrated above for the M. mazei Connectase, can readily be generalized to other Connectase reactions, where it is likewise expected to lead to a higher yield of the desired fusion products / fusion proteins / fusion peptides.
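Why equimolar educts stall at about 50%, and why removing the byproduct helps, can be illustrated with a deliberately simplified mass-action model. It assumes (this is not stated in the source) that the overall exchange A-ELASKDPGAFDADPLVVEI + PGAFDADPLVVEI-B ⇌ A-B + PGAFDADPLVVEI has equal forward and reverse rate constants (equilibrium constant 1, since the bond formed and the bond broken are chemically equivalent), lumps the catalytic Connectase into these constants, and treats byproduct inactivation as an irreversible first-order step with a hypothetical rate constant `k_rem`. Units are arbitrary.

```python
def ligation_yield(k_rem: float, c0: float = 1.0, k: float = 1.0,
                   t_end: float = 50.0, dt: float = 1e-3) -> float:
    """Fraction of educt converted to A-B after t_end (explicit Euler integration).

    s1: A-ELASKDPGAFDADPLVVEI   s2: PGAFDADPLVVEI-B (equimolar, c0 each)
    f:  fusion product A-B      p:  free PGAFDADPLVVEI byproduct
    """
    s1 = s2 = c0
    f = p = 0.0
    for _ in range(int(t_end / dt)):
        net = k * s1 * s2 - k * f * p   # forward minus reverse exchange (K = 1)
        removal = k_rem * p             # irreversible byproduct inactivation
        s1 -= net * dt
        s2 -= net * dt
        f += net * dt
        p += (net - removal) * dt
    return f / c0


print(f"no byproduct removal:   {ligation_yield(k_rem=0.0):.2f}")  # about 0.50
print(f"with byproduct removal: {ligation_yield(k_rem=1.0):.2f}")  # above 0.9
```

Without removal the model settles at 50% conversion, matching the equilibrium described above; with any nonzero `k_rem` the reverse term keeps shrinking and conversion keeps approaching 100%, which is the qualitative effect the byproduct-removal strategy relies on.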
[0477] To enable peptide byproduct removal from Connectase reactions, potential ways to alter the Connectase recognition sequence were investigated. For this, ubiquitin (Ub) carrying a complete C-terminal Connectase recognition sequence (i.e., ELASKDPGAFDADPLVVEI, SEQ ID NO: 35) followed by a Streptavidin tag (Figure 2, “Ub-Strep”; i.e., the first substrate polypeptide; SEQ ID NO: 3) was used. As the second reaction educt (i.e., the second substrate polypeptide), peptides derived from the Connectase recognition sequence (XGAFDADPLVVEI, X = any of the 20 standard amino acids; SEQ ID NO: 4-23) were used. Conjugation of the two educts yields a ubiquitin product that lacks the Streptavidin tag (Figure 2, “Ub-Peptide”) and is therefore shorter than the Ub-Strep educt. The conjugation rate with the different peptides could thus be determined by following Ub-Peptide formation in SDS-PAGE time-course analyses (Figure 2). The experiment with the original recognition sequence peptide (X = proline) resulted in the rapid formation of an equilibrium between approx. 0.5 equivalents (eq.) Ub-Strep and 0.5 eq. Ub-Peptide, in accordance with the approx. 50% product yield expected from equally abundant educts (1 eq. each; see introduction). While substituting proline with X = D, E, F, G, H, I, K, L, M, N, R, T, W, or Y reduced ligation rates, substitutions with S, C, and A resulted in surprisingly high ligation rates. This was surprising because the KDPGA sequence is highly conserved in the physiological Connectase target, mtrA (methyltransferase A)13. Owing to its special structure and chemistry, proline had so far been expected to be crucial for the reaction catalyzed by the Connectase. The results illustrated in Figure 2 therefore show that other amino acids, which have a β-carbon (present in all amino acids except G) but lack a γ-carbon (present in all amino acids except G, S, C, and A), may also be used in this position.
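The structural criterion above (a β-carbon but no γ-carbon) can be checked mechanically against the 20 standard amino acids. The sketch below hard-codes, from standard side-chain atom nomenclature rather than from the source, which residues carry a carbon atom at the γ position (e.g., Thr has Cγ2, Pro has Cγ in its ring, Asp and Asn have their carboxyl/amide carbon as Cγ); the table and the function name are illustrative.

```python
STANDARD = "ACDEFGHIKLMNPQRSTVWY"

# Residues whose side chain contains a carbon at the gamma position
# (Ser and Cys carry O-gamma / S-gamma instead; Gly has no side chain at all).
HAS_GAMMA_CARBON = set("DEFHIKLMNPQRTVWY")
HAS_BETA_CARBON = set(STANDARD) - {"G"}


def beta_without_gamma() -> list[str]:
    """Residues with a beta-carbon but no gamma-carbon."""
    return sorted(aa for aa in STANDARD
                  if aa in HAS_BETA_CARBON and aa not in HAS_GAMMA_CARBON)


print(beta_without_gamma())  # ['A', 'C', 'S']
```

The result matches the three substitutions (S, C, A) reported above to give high ligation rates. Proline itself has a γ-carbon, so the criterion describes the tolerated substitutions rather than the native residue.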
[0478] Example 3: Discrimination between different recognition sequences
[0479] This unexpected finding allowed the design of reactions in which educts (i.e., second substrate polypeptides) and peptide byproducts (i.e., first cleavage products) differ in their N-terminal amino acid. This discrimination criterion could then be used to specifically inactivate the X1GAFDADPLVVEI peptide byproduct (e.g., PGAFDADPLVVEI-Strep in Figure 2) without affecting the X2GAFDADPLVVEI-B educt (e.g., the “XGAFDADPLVVEI” peptides without B in Figure 2). First, it was hypothesized that an N-acetyltransferase that acetylates the N-terminal alanine of the peptide byproduct (X1 = A) while leaving the N-terminal proline of educts (X2 = P) unmodified might be employed here17. Another possibility may be the use of chemicals that form ring structures with the amino and sulfhydryl groups of N-terminal cysteines (X1 = C, X2 = A / P)18. Further, it was speculated that aminopeptidases acting exclusively on the peptide byproduct residue X1 (and thus not on X2) might be used. Many of these enzymes have no absolute specificity for just one amino acid19. Proline residues, however, are structurally distinct from the other standard proteinogenic amino acids; they are not modified by many promiscuous enzymes, but instead by a set of proline-specific enzymes20.
[0480] Based on these considerations, it was decided to search for a proline aminopeptidase21 that removes the N-terminal proline from PGAFDADPLVVEI sequences with suitable efficiency (and thereby depletes PGAFDADPLVVEI, i.e., the first cleavage product in the context of the present invention, from the pool of reactants), but is inactive towards all other residues (including X2 = A). A literature search identified Bacillus coagulans proline aminopeptidase (BcPAP) as a candidate. This enzyme had been shown to cleave N-terminal proline from peptides consisting of 2-4 amino acids, while being inactive towards other N-terminal amino acids16,22. BcPAP (SEQ ID NO: 2) was recombinantly produced as a soluble monomer (33 kDa) in E. coli (>40 mg from 1 L culture), and its suitability for shifting the Connectase reaction equilibrium was subsequently tested.
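The discrimination strategy can be sketched as follows, assuming (as reported for BcPAP) that the enzyme removes a single N-terminal proline and acts on nothing else, and using the XGA pattern of claims 9 and 10 as a simplified stand-in for whether a peptide can still serve as a Connectase acceptor. The helper names are hypothetical.

```python
def bcpap(seq: str) -> str:
    """Simplified proline aminopeptidase: remove one N-terminal Pro, else no-op."""
    return seq[1:] if seq.startswith("P") else seq


def is_acceptor(seq: str) -> bool:
    """Simplified check for a partial N-terminal recognition motif XGA."""
    return len(seq) >= 3 and seq[1:3] == "GA"


byproduct = "PGAFDADPLVVEI"   # first cleavage product (X1 = P)
educt = "AGAFDADPLVVEI-B"     # second substrate polypeptide (X2 = A)

for name, seq in [("byproduct", byproduct), ("educt", educt)]:
    after = bcpap(seq)
    print(f"{name}: {seq} -> {after}, acceptor: {is_acceptor(after)}")
# byproduct: PGAFDADPLVVEI -> GAFDADPLVVEI, acceptor: False
# educt: AGAFDADPLVVEI-B -> AGAFDADPLVVEI-B, acceptor: True
```

The pattern check is only a proxy; whether the trimmed byproduct is actually no longer used by the Connectase has to be established experimentally.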
[0481] Example 4: A method for complete protein-protein fusions
[0482] Next, the effect of BcPAP was assessed in ligation reactions with A-ELASKDPGAFDADPLVVEI substrates (i.e., first substrate polypeptides with A = LysS (lysine-tRNA ligase; SEQ ID NO: 24), GST (glutathione S-transferase; SEQ ID NO: 25), or Ub (ubiquitin; SEQ ID NO: 3)) and AGAFDADPLVVEI-B substrates (i.e., second substrate polypeptides with B = MBP (maltose-binding protein; SEQ ID NO: 40), Ub (SEQ ID NO: 41), or B being absent, i.e., just the AGAFDADPLVVEI peptide (SEQ ID NO: 4)). The reactions were performed at room temperature (22°C) in neutral buffer (pH 7.0) with moderate amounts of salt (150 mM NaCl, 50 mM KCl), 100 µM of each educt (A-ELASKDPGAFDADPLVVEI and AGAFDADPLVVEI-B), 0.033 molar equivalents of Connectase (SEQ ID NO: 39), and 0.066 molar equivalents of BcPAP (SEQ ID NO: 2). The reaction products were separated by SDS-PAGE, stained with Coomassie G-250, and imaged with a fluorescence scanner (excitation 685 nm, emission 725 nm), which allowed densitometric quantification of the resulting protein bands with good accuracy. In each case (Figure 3A-C, Figure 4A), 98-100% conversion of the less abundant educt and no reaction side products were observed. In protein-protein ligations (Figure 3A-B, Figure 4A), approx. 90% fusion product was obtained after one hour of incubation and approx. 95% after two hours; protein-peptide ligations (Figure 3C) were even faster. The reaction was around four times faster at high educt concentrations (100 µM (Figure 3A) instead of 10 µM (Figure 3D)) and around eight times faster at ambient temperature (22°C (Figure 3A) instead of 10°C (Figure 3E)), but all reactions resulted in equally complete protein-protein fusions by the final measured time point, if not earlier. Surprisingly, the AGAFDADPLVVEI-B substrate could also be generated during the reaction by TEV protease cleavage of MENLYFQ|AGAFDADPLVVEI-B (Figure 3F).
This approach avoids the potential acetylation of N-terminal alanine residues during expression of (M)AGAFDADPLVVEI-B substrates (after removal of the initiator methionine by methionine aminopeptidase; MAGAFDADPLVVEI is shown in SEQ ID NO: 47). It was further surprisingly found that an additional N-terminal proline residue (i.e., “P” in P|AGAFDADPLVVEI-B; PAGAFDADPLVVEI is shown in SEQ ID NO: 48), which is removed by BcPAP during the reaction (Figure 4B), may be used to prevent N-terminal acetylation of the alanine.
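Both in-situ routes for generating the AGAFDADPLVVEI-B acceptor described above, TEV cleavage of a MENLYFQ|AGAFDADPLVVEI-B precursor and BcPAP removal of an N-terminal protecting proline, can be sketched as string operations. The TEV model below only searches for the ENLYFQ site and cuts after the Q, a simplification of the protease's actual specificity; all names are hypothetical.

```python
TEV_SITE = "ENLYFQ"   # TEV protease cleaves after the Q of this site


def tev_cleave(seq: str) -> str:
    """Return the C-terminal fragment released by TEV (simplified model)."""
    i = seq.find(TEV_SITE)
    if i < 0:
        return seq
    return seq[i + len(TEV_SITE):]


def bcpap(seq: str) -> str:
    """Remove a single N-terminal proline (protecting residue), else no-op."""
    return seq[1:] if seq.startswith("P") else seq


print(tev_cleave("MENLYFQAGAFDADPLVVEI-B"))  # AGAFDADPLVVEI-B
print(bcpap("PAGAFDADPLVVEI-B"))            # AGAFDADPLVVEI-B
```

In both routes the newly exposed N-terminal residue is alanine, which BcPAP does not remove, so the acceptor persists in the reaction while the proline-initiated byproduct is trimmed.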
[0483] To further substantiate these findings, a liquid chromatography-mass spectrometry (LC-MS) analysis of a similar reaction with Ub-ELASKDPGAFDADPLVVEI-Strep (SEQ ID NO: 3; i.e., first substrate polypeptide) and AGAFDADPLVVEI-MBP (SEQ ID NO: 40; i.e., second substrate polypeptide) was performed. These educts were converted almost entirely to the fusion polypeptide Ub-ELASKDAGAFDADPLVVEI-MBP. The N-terminal proline was removed almost entirely from the peptide byproduct (i.e., from the first cleavage product), resulting in the formation of a GAFDADPLVVEI-Strep peptide (SEQ ID NO: 49; i.e., a first cleavage product comprising a modified partial N-terminal Connectase recognition motif that may no longer be recognized by the Connectase), but no other amino acids were removed (Figure 5).
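The LC-MS readout relies on the mass difference caused by removing a single proline. A minimal sketch, using standard monoisotopic residue masses (common reference values, not taken from the source) and ignoring the Strep tag, which is identical in both species and therefore cancels out of the difference:

```python
# Monoisotopic residue masses in Da (standard reference values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once for the free N- and C-termini


def monoisotopic_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


before = monoisotopic_mass("PGAFDADPLVVEI")
after = monoisotopic_mass("GAFDADPLVVEI")
print(f"{before:.4f} -> {after:.4f} Da (shift {before - after:.4f} Da)")
```

A shift of one proline residue mass (about 97.05 Da) and no further loss is what distinguishes removal of only the N-terminal proline from more extensive trimming.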
[0484] Next, to further substantiate the above findings, the effect of serine-, cysteine-, and metalloprotease inhibitors on the reaction was tested. While Connectase is not affected by any of these compounds13, the equilibrium shift associated with BcPAP activity could be suppressed with the serine protease inhibitors PMSF and AEBSF. These results contrast with previous studies, in which BcPAP was found to be more susceptible to cysteine protease inhibitors16. They are, however, consistent with the classification of BcPAP as a serine protease23.
[0485] Taken together, these observations support the finding16,19,22 that BcPAP acts exclusively on N-terminal proline residues. They show that BcPAP is also active on long peptides and proteins, and that this activity can be used to obtain up to 100% fusion product in Connectase-mediated ligations (a drastic and surprising increase in product yield of up to about 100%, i.e., up to about 2-fold). In this context, a 100% product yield means that all educts are used up in the reaction.
Example 5: Homogeneous antibody conjugates
[0486] Antibodies are the most relevant protein conjugation target24. Many applications require their conjugation to bulky molecules (e.g., horseradish peroxidase) and / or to a defined number of molecules. The number of molecules attached to an antibody, known as the “drug : antibody ratio”, serves as a benchmark for several specialized techniques. Methods to couple drugs to antibodies include, for example, formyl-glycine insertion25 (Catalent), sugar engineering26 (Mersana), cysteine engineering27 (Genentech), the introduction of unnatural amino acids28 (Sutro), and Sortase-mediated conjugations29 (NBE), all of which find commercial use. The large number of approaches in use suggests that none of them is entirely satisfactory. One problem is that even a relatively high conjugation ratio of 90% for each of the four antibody chains (i.e., for the two heavy and the two light chains; “HC” and “LC”, respectively) only leads to the desired drug : antibody ratio in approximately 66% of antibodies having two heavy and two light chains (i.e., an HC2L...
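The approximately 66% figure follows from treating the four chain conjugations as independent events that each succeed with probability 0.9 (an assumption implicit in the calculation, not an experimental result), so that only 0.9^4 ≈ 0.656 of antibodies carry all four conjugates. A short sketch computing the full binomial distribution of conjugates per antibody:

```python
from math import comb


def conjugate_distribution(p_chain: float, n_chains: int = 4) -> list[float]:
    """P(k of n_chains conjugated) for k = 0..n_chains, assuming independent chains."""
    return [comb(n_chains, k) * p_chain**k * (1 - p_chain) ** (n_chains - k)
            for k in range(n_chains + 1)]


dist = conjugate_distribution(0.9)
for k, prob in enumerate(dist):
    print(f"{k} conjugates per antibody: {prob:.4f}")
# last line printed: 4 conjugates per antibody: 0.6561
```

The remaining roughly 34% of antibodies are spread over lower drug : antibody ratios (mostly three conjugates, about 29%), which is why per-chain efficiency compounds so strongly for multi-chain targets.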
Claims
1. A method for the production of a fusion polypeptide, wherein the method comprises the following steps (i) to (iii): (i) contacting a first substrate polypeptide with a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue, thereby cleaving said first substrate polypeptide into a first cleavage product and a second cleavage product; (ii) modifying a partial N-terminal DUF2121 recognition motif comprised in said first cleavage product so that said modified partial N-terminal DUF2121 recognition motif is not recognized by said polypeptide comprising an N-terminal DUF2121 domain; and (iii) contacting said second cleavage product and / or said polypeptide comprising an N-terminal DUF2121 domain with a second substrate polypeptide, whereby said second cleavage product is fused to said second substrate polypeptide.
2. The method according to claim 1, wherein said first substrate polypeptide comprises a DUF2121 recognition motif.
3. The method according to claim 2, wherein during step (i) said polypeptide comprising an N-terminal DUF2121 domain cleaves said DUF2121 recognition motif comprised in said first substrate polypeptide into said partial N-terminal DUF2121 recognition motif comprised in said first cleavage product and a partial C-terminal DUF2121 recognition motif comprised in said second cleavage product, thereby producing said first cleavage product and said second cleavage product.
4. The method according to any one of claims 1 to 3, wherein said first cleavage product is the C-terminal cleavage product of said first substrate polypeptide, and wherein said second cleavage product is the N-terminal cleavage product of said first substrate polypeptide.
5. The method according to any one of claims 1 to 4, wherein said second cleavage product comprises a partial C-terminal DUF2121 recognition motif.
6. The method according to any one of claims 1 to 5, wherein the N-terminal amino acid of said first cleavage product and the N-terminal amino acid of said second substrate polypeptide are selected from proline, alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, tryptophan, or tyrosine, and wherein said N-terminal amino acid of said first cleavage product and said N-terminal amino acid of said second substrate polypeptide are not identical, and / or wherein the N-terminal amino acid of said first cleavage product and the N-terminal amino acid of said second substrate polypeptide are selected from proline, alanine, cysteine, serine, or valine, and wherein said N-terminal amino acid of said first cleavage product and said N-terminal amino acid of said second substrate polypeptide are not identical, and / or wherein the N-terminal amino acid of said first cleavage product and the N-terminal amino acid of said second substrate polypeptide are selected from proline or alanine, and wherein said N-terminal amino acid of said first cleavage product and said N-terminal amino acid of said second substrate polypeptide are not identical.
7. The method according to any one of claims 1 to 6, wherein in step (ii) said partial N-terminal DUF2121 recognition motif is modified by one or more enzyme(s) and / or one or more chemical(s), preferably an enzyme or a chemical, preferably wherein said modifying enzyme to be employed in step (ii) is selected from the group consisting of a peptidase and an N-acetyltransferase.
8. The method according to claim 7, wherein said modifying enzyme to be employed in step (ii) has substrate specificity to the N-terminal amino acid residue of said first cleavage product, and / or wherein the N-terminal amino acid residue of said second substrate polypeptide is not a substrate for said modifying enzyme to be employed in step (ii).
9. The method according to any one of claims 1 to 8, wherein the partial N-terminal DUF2121 recognition motif of said second substrate polypeptide comprises the amino acid sequence XGA or XAA, preferably XGA, wherein X is the N-terminal amino acid of said second substrate polypeptide and is as defined in claim 6.
10. The method according to any one of claims 1 to 9, wherein the partial N-terminal DUF2121 recognition motif of said first cleavage product comprises the amino acid sequence XGA or XAA, preferably XGA, wherein X is the N-terminal amino acid of said first cleavage product and is as defined in claim 6.
11. The method according to any one of claims 1 to 10, wherein said partial C-terminal DUF2121 recognition motif of said second cleavage product comprises the amino acid sequence KD or RD, preferably KD.
12. The method according to any one of claims 1 to 11, wherein the DUF2121 recognition motif of the first substrate polypeptide comprises the sequence as defined in any one of SEQ ID NO: 365 to 578 or a sequence having at least 60% sequence identity to said sequence.
13. The method according to any one of claims 1 to 12, wherein said N-terminal DUF2121 domain comprises an amino acid sequence as depicted in SEQ ID NO: 56 or an amino acid sequence having at least 20% sequence identity to SEQ ID NO: 56.
14. The method according to any one of claims 1 to 13, wherein said first substrate polypeptide comprises a non-proteinaceous moiety N-terminally to said DUF2121 recognition motif, and / or wherein said second substrate polypeptide comprises a non-proteinaceous moiety C-terminally to said partial N-terminal DUF2121 recognition motif, wherein the produced fusion polypeptide comprises said non-proteinaceous moiety.
15. The method according to any one of claims 1 to 14, wherein said first substrate polypeptide comprises an antibody or an antigen-binding fragment thereof N-terminally to said DUF2121 recognition motif, and / or wherein said second substrate polypeptide comprises an antibody or an antigen-binding fragment thereof C-terminally to said partial N-terminal DUF2121 recognition motif, so that the produced fusion polypeptide comprises said antibody or antigen-binding fragment thereof.
16. The method according to any one of claims 1 to 15, wherein said first substrate polypeptide comprises an enzyme N-terminally to said DUF2121 recognition motif, and / or wherein said second substrate polypeptide comprises an enzyme C-terminally to said partial N-terminal DUF2121 recognition motif, wherein the produced fusion polypeptide comprises said enzyme.
17. The method according to any one of claims 1 to 13, wherein the part of the first substrate polypeptide or the part of the second substrate polypeptide forming part of the produced fusion polypeptide comprises a protein and wherein the part of the other substrate polypeptide forming part of the produced fusion polypeptide has a solid carrier attached thereto, wherein the produced fusion polypeptide comprises the protein immobilized on the solid carrier, preferably wherein the protein is an enzyme.
18. The method according to any one of claims 7 to 17, wherein said modifying enzyme to be employed in step (ii) is capable of cleaving off at least the N-terminal amino acid residue of the first cleavage product, preferably only the N-terminal amino acid residue of the first cleavage product.
19. The method according to any one of claims 7 to 18, wherein said modifying enzyme to be employed in step (ii) is a peptidase.
20. The method according to claim 19, wherein said peptidase is an exopeptidase.
21. The method according to claim 20, wherein said exopeptidase is a proline aminopeptidase and wherein the N-terminal amino acid of said first cleavage product is proline.
22. The method according to claim 21, wherein said proline aminopeptidase is a proline aminopeptidase from Bacillus, preferably from Bacillus coagulans, more preferably wherein said proline aminopeptidase comprises an amino acid sequence as defined in SEQ ID NO: 52 or wherein said proline aminopeptidase comprises an amino acid sequence having at least about 60% sequence identity to SEQ ID NO: 52 and comprises proline aminopeptidase activity.
23. The method according to claim 21 or 22, wherein the N-terminal amino acid of said second substrate polypeptide is not proline, preferably wherein the N-terminal amino acid of said second substrate polypeptide is selected from alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, tryptophan, or tyrosine, more preferably alanine, cysteine, serine, or valine, even more preferably alanine.
24. The method according to claim 20, wherein said exopeptidase is an alanine aminopeptidase and wherein the N-terminal amino acid of said first cleavage product is alanine.
25. The method according to claim 24, wherein the N-terminal amino acid of said second substrate polypeptide is not alanine, preferably wherein the N-terminal amino acid of said second substrate polypeptide is selected from proline, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, tryptophan, or tyrosine, more preferably proline, cysteine, serine, or valine, even more preferably proline.
26. The method according to any one of claims 7 to 17, wherein said modifying enzyme to be employed in step (ii) is an N-acetyltransferase, preferably wherein said modifying enzyme to be employed in step (ii) or said N-acetyltransferase is capable of acetylating the N-terminal amino acid residue of the first cleavage product, more preferably only the N-terminal amino acid residue of the first cleavage product.
27. The method according to claim 26, wherein the N-terminal amino acid of said first cleavage product is not proline, preferably wherein the N-terminal amino acid of said first cleavage product is selected from alanine, cysteine, serine, valine, tryptophan, methionine, glycine, leucine, phenylalanine, isoleucine, aspartate, glutamate, histidine, asparagine, glutamine, tryptophan, or tyrosine, more preferably alanine or cysteine, even more preferably alanine, and wherein the N-terminal amino acid of said second substrate polypeptide is proline.
28. The method according to any one of claims 7 to 27, wherein said polypeptide comprising an N-terminal DUF2121 domain and said modifying enzyme to be employed in step (ii) are covalently linked, optionally via a linker.
29. The method according to any one of claims 7 to 17, wherein said modifying chemical to be employed in step (ii) specifically modifies the N-terminal amino acid residue of the first cleavage product, and wherein said modifying chemical to be employed in step (ii) does not modify and / or is not capable of modifying the N-terminal amino acid residue of the second substrate polypeptide.
30. The method according to claim 29, wherein said modifying chemical is a benzaldehyde comprising an ortho-boronic acid substituent, or a composition comprising an aldehyde and an organoboronic acid, preferably a benzaldehyde comprising an ortho-boronic acid substituent.
31. The method according to claim 30, wherein said benzaldehyde comprising an ortho-boronic acid substituent is 2-formylphenylboronic acid (2-FPBA), wherein said aldehyde is salicylaldehyde, 2-pyridinecarbaldehyde, glyoxylic acid, or 3-hydroxy-2-pyridinecarbaldehyde, and / or wherein said organoboronic acid is phenylboronic acid or para-methoxyboronic acid.
32. The method according to any one of claims 18 to 25, wherein said second substrate polypeptide further comprises one or more N-terminal protecting residue(s).
33. The method according to claim 32, wherein said modifying enzyme to be employed in step (ii) is capable of cleaving and / or cleaves said one or more N-terminal protecting residue(s) off the second substrate polypeptide, preferably wherein said modifying enzyme is as defined in any one of claims 18 to 25.
34. The method according to claim 32 or 33, wherein said one or more N-terminal protecting residue(s) are a cleavable tag.
35. The method according to any one of claims 1 to 13 and 18 to 34, wherein the N-terminus of said first substrate polypeptide or the N-terminus of said second cleavage product is covalently linked to the C-terminus of said second substrate polypeptide, optionally via a linker.
36. The method for the production of a circular polypeptide according to claim 35, wherein linking and / or fusing said second cleavage product to said second substrate polypeptide comprises producing a circular polypeptide.
37. The method according to any one of claims 1 to 13 and 18 to 34, wherein said first substrate polypeptide or said second cleavage product is immobilized on a solid carrier, preferably via its N-terminus.
38. The method according to claim 37, wherein fusing said immobilized second cleavage product and said second substrate polypeptide produces an immobilized polypeptide.
39. The method according to claim 38, wherein said method further comprises removing undesired reagents and / or contaminants from the immobilized polypeptide after step (iii), preferably by washing said immobilized polypeptide with a buffer after step (iii).
40. The method according to claim 39, wherein said method further comprises the following step (iv): (iv) contacting said immobilized polypeptide with a third substrate polypeptide, with said polypeptide comprising an N-terminal DUF2121 domain as defined in any one of the preceding claims, and optionally with a modifying enzyme as defined in any one of the preceding claims or with a modifying chemical as defined in any one of the preceding claims.
41. A fusion polypeptide obtained and / or obtainable by the method according to any one of claims 1 to 39, or a purified polypeptide obtained and / or obtainable by the method according to claim 40, wherein said fusion polypeptide comprises a DUF2121 recognition sequence as defined in any one of SEQ ID NO: 579 to 582, wherein the amino acid in position 3 of any one of SEQ ID NO: 579 to 582 is not proline.
42. A polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue and a polypeptide selected from the group consisting of a peptidase and an N-acetyltransferase, preferably a peptidase, more preferably wherein said polypeptide is as defined in claim 28.
43. A nucleic acid encoding: (i) the polypeptide according to claim 42; or (ii) a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue as defined in any one of the preceding claims, and a modifying enzyme as defined in any one of the preceding claims.
44. A nucleic acid vector comprising the nucleic acid according to claim 43.
45. A host or host cell comprising: (i) the nucleic acid according to claim 43; (ii) the nucleic acid vector according to claim 44; and / or (iii) the polypeptide according to claim 42, or a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue as defined in any one of the preceding claims, and a modifying enzyme as defined in any one of the preceding claims.
46. A composition comprising: (i) the polypeptide according to claim 42; or (ii) a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue as defined in any one of the preceding claims, and a modifying enzyme as defined in any one of the preceding claims.
47. A kit for the production of a fusion polypeptide, the kit comprising one or more selected from the group consisting of the following (i) to (vi) and one or more selected from the group consisting of the following (vii) to (viii): (i) a polypeptide comprising an N-terminal DUF2121 domain having an N-terminal serine or threonine residue as defined in any one of the preceding claims, and a modifying enzyme as defined in any one of the preceding claims or a modifying chemical as defined in any one of the preceding claims; (ii) the polypeptide according to claim 42; (iii) the nucleic acid according to claim 43; (iv) the nucleic acid vector according to claim 44; (v) the host cell according to claim 45; and (vi) the composition according to claim 46; and (vii) a first substrate polypeptide as defined in any one of the preceding claims; and (viii) a second substrate polypeptide as defined in any one of the preceding claims.