Method for polypeptide nanopore sequencing

The peptide-linker construct method maintains sequence order by attaching linkers between amino acid pairs, allowing accurate polypeptide characterization using nanopore sequencing, addressing inefficiencies in existing methods.

WO2026132348A1 · PCT designated stage · Publication Date: 2026-06-25 · OXFORD NANOPORE TECH LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: OXFORD NANOPORE TECH LTD
Filing Date: 2025-12-18
Publication Date: 2026-06-25

IPC: G01N33/68; C12Q1/6869; G01N33/487

CPC: G01N33/6818; C12Q1/6869; G01N33/48721

AI Tagging

Application Domain

Microbiological testing/measurement; Biological testing

Technology Topics

Nanopore; Peptide

AI Technical Summary

Technical Problem

Existing methods for characterizing polypeptides, such as mass spectrometry and Edman degradation, are inefficient and costly, and nanopore sequencing techniques face challenges with maintaining sequence fidelity of long polypeptides due to random shearing and scrambling of fragments.

Method used

A method involving the expansion of a target polypeptide into a peptide-linker construct by attaching linkers between amino acid pairs and cleaving between them, maintaining sequence order, which is then characterized using nanopore sequencing.

Benefits of technology

Preserves the sequence order of polypeptides, enabling accurate characterization and identification of polypeptides through nanopore sensing, overcoming the limitations of existing techniques.

Abstract

Provided herein are methods of characterising a target polypeptide by expanding the target polypeptide to form a peptide linker construct. Also provided are methods of moving one or more peptide portions of a target polypeptide with respect to a nanopore; and to methods of forming a peptide-linker construct. Also provided are related conjugates, constructs and kits.

Description

[0001] METHOD

[0002] Field

[0003] The present disclosure relates to methods of characterising a target polypeptide by expanding the target polypeptide to form a peptide linker construct. The present disclosure also relates to methods of moving one or more peptide portions of a target polypeptide with respect to a nanopore; and to methods of forming a peptide-linker construct. Also provided are related conjugates, constructs and kits.

[0004] The characterisation of biological molecules is of increasing importance in biomedical and biotechnological applications. For example, sequencing of nucleic acids allows the study of genomes and the proteins they encode and, for example, allows correlation between nucleic acid mutations and observable phenomena such as disease indications. Nucleic acid sequencing can be used in evolutionary biology to study the relationship between organisms. Metagenomics involves identifying organisms present in samples, for example microbes in a microbiome, with nucleic acid sequencing allowing the identification of such organisms.

[0005] Whilst techniques to characterise (e.g. sequence) polynucleotides have been extensively developed, techniques to characterise polypeptides are less advanced, despite being of very significant biotechnological importance. For example, knowledge of a protein sequence can allow structure-activity relationships to be established and has implications in rational drug development strategies for developing ligands for specific receptors. Identification of post-translational modifications is also key to understanding the functional properties of many proteins. For example, typically 30-50% of protein species are phosphorylated in eukaryotes. Some proteins may have multiple phosphorylation sites, serving to activate or inactivate a protein, promote its degradation, or modulate interactions with protein partners.

[0006] Known methods of characterising polypeptides include mass spectrometry and Edman degradation.

[0007] Protein mass spectrometry involves characterising whole proteins or fragments thereof in an ionised form. Known methods of protein mass spectrometry include electrospray ionisation (ESI) and matrix-assisted laser desorption / ionisation (MALDI). Mass spectrometry has some benefits, but results obtained can be affected by the presence of contaminants and it can be difficult to process fragile molecules without their fragmentation. Moreover, mass spectrometry is not a single molecule technique and provides only bulk information about the sample interrogated. Mass spectrometry is unsuitable for characterising differences within a population of polypeptide samples and is unwieldy when seeking to distinguish neighbouring residues.

[0008] Edman degradation is an alternative to mass spectrometry which allows the residue-by-residue sequencing of polypeptides. Edman degradation sequences polypeptides by sequentially cleaving the N-terminal amino acid and then characterising the individually cleaved residues using chromatography or electrophoresis. However, Edman sequencing is slow, involves the use of costly reagents, and like mass spectrometry is not a single molecule technique.

[0009] As such, there remains a pressing need for new techniques to characterise polypeptides, especially at the single molecule level. Single molecule techniques for characterising biomolecules such as polynucleotides have proven to be particularly attractive due to their high fidelity and avoidance of amplification bias.

[0010] One attractive method of single molecule characterization of biomolecules such as polypeptides is nanopore sensing. Nanopore sensing is an approach to analyte detection and characterization that relies on the observation of individual binding or interaction events between the analyte molecules and an ion conducting channel. Nanopore sensors can be created by placing a single pore of nanometre dimensions in an electrically insulating membrane and measuring voltage-driven ion currents through the pore in the presence of analyte molecules. The presence of an analyte inside or near the nanopore will alter the ionic flow through the pore, resulting in altered ionic or electric currents being measured over the channel. The identity of an analyte is revealed through its distinctive current signature, notably the duration and extent of current blocks and the variance of current levels during its interaction time with the pore. Nanopore sensing has the potential to allow rapid and cheap polypeptide characterisation.
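By way of illustration only, the current signature described above (the duration and extent of current blocks, and the variance of current levels) could be extracted from a sampled trace along the following lines. This is a minimal sketch and not a method taken from the disclosure: the function name `find_blockades`, the fixed threshold fraction, the sample rate and the toy trace are all assumptions chosen for the example.

```python
from statistics import mean, pvariance

def find_blockades(trace, open_level, threshold_fraction=0.8, sample_rate_hz=1000.0):
    """Return blockade events found in an ionic-current trace.

    A sample counts as blocked when it falls below
    threshold_fraction * open_level (an assumed, simplistic criterion).
    """
    samples = list(trace)
    cutoff = threshold_fraction * open_level
    events = []
    start = None
    # A sentinel open-pore sample closes any event still running at the end.
    for i, current in enumerate(samples + [open_level]):
        blocked = current < cutoff
        if blocked and start is None:
            start = i
        elif not blocked and start is not None:
            segment = samples[start:i]
            events.append({
                "start_index": start,
                "duration_s": len(segment) / sample_rate_hz,
                "mean_block_fraction": 1.0 - mean(segment) / open_level,
                "variance": pvariance(segment),
            })
            start = None
    return events

# Toy trace (arbitrary units): open pore near 100, with two blockades.
trace = [100, 101, 99, 40, 42, 41, 100, 100, 60, 61, 99]
events = find_blockades(trace, open_level=100.0)
for event in events:
    print(event)
```

Real traces are noisy and would normally call for filtering and a more robust segmentation; the fixed threshold is used here only to keep the example short.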

[0011] Nanopore sensing and characterisation of polypeptides has been proposed in the art. For example, WO 2013/123379 discloses the use of an NTP-driven protein processing unfoldase enzyme to process a protein to be translocated through a nanopore. WO 2021/111125 discloses methods in which a target polypeptide may be characterised as it moves through a nanopore using a polynucleotide-handling protein. WO 2021/133168 discloses protein and polypeptide fingerprinting and sequencing by nanopore translocation. WO 2024/094986 discloses methods of characterising a target polypeptide as it moves with relation to a nanopore. Each of these documents is incorporated by reference in its entirety.

[0012] These methods have provided useful techniques for characterising polypeptides using nanopores. However, some challenges remain. In particular, long polypeptides can have significant secondary and tertiary structure which can hamper characterisation methods. Random shearing of long polypeptides (e.g. by exposing the polypeptide to a change in pH, or to a chemical reagent) can generate random shorter oligopeptides which avoid or reduce such structures. Such shearing, whether random or controlled e.g. using proteinases, is typically required in known methods. In some known methods, a resulting peptide fragment may then be associated with another polymer such as an oligonucleotide to form a conjugate of a peptide portion and an oligonucleotide portion. For example, this can assist with the processing of the polypeptide using a motor protein such as a polynucleotide-handling protein. However, whether or not such a conjugate is formed, such approaches are associated with challenges. In particular, shearing a longer polypeptide into shorter fragments can result in a loss of sequence fidelity as the fragments can become scrambled in order. In other words, whilst such methods can allow the characteristics of each individual fragment to be determined, information about the overall characteristics of the longer polypeptide from which the fragments are derived may be lost or degraded. For example, if a desired characteristic is the sequence of the longer polypeptide, then random fragmentation of the polypeptide and scrambling before it is characterised typically hinders obtaining this information, because the order of the fragments is itself typically scrambled such that the overall sequence of the longer polypeptide is no longer preserved. Whilst there are approaches to address this, such approaches may be costly, e.g. in terms of computer processing requirements, and / or require redundancy in the data acquisition to try to obtain overlapping fragments from which the original polypeptide can be reconstructed. Accordingly, there remains a need for further methods.
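The scale of the scrambling problem can be sketched with a short example. Assuming, purely for illustration, that the 20-residue test peptide sequence used in the later figures (EEALYAKAGNNYGKLAQYVA) has been cut after each lysine into three fragments whose order is then lost, the following enumerates every full-length sequence consistent with those fragments. The function name `candidate_orderings` is hypothetical, and the sketch ignores any terminal information that might narrow the choice in practice.

```python
from itertools import permutations

def candidate_orderings(fragments):
    """All distinct sequences obtainable by joining the fragments in any order."""
    return {"".join(order) for order in permutations(fragments)}

original = "EEALYAKAGNNYGKLAQYVA"
# Fragments as they would appear after cutting C-terminal to each Lys (K).
fragments = ["EEALYAK", "AGNNYGK", "LAQYVA"]

orderings = candidate_orderings(fragments)
print(len(orderings))          # number of candidate sequences
print(original in orderings)   # the true sequence is only one of them
```

For n distinct fragments the number of candidate orders grows as n!, which is one way of seeing why reconstruction from unordered fragments tends to need overlapping reads or additional computation.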

[0013] The inventors have recognised that significant technical advantages would arise from constructs of fragments of a target polypeptide if the sequence order of the target polypeptide is retained in such constructs.

[0014] Accordingly, an aspect of the disclosure relates to methods of characterising a target polypeptide. The target polypeptide comprises one or more target amino acid pairs, wherein each target amino acid pair in the target polypeptide comprises a first amino acid attached to a second amino acid. The attachment between the first amino acid and the second amino acid is typically, but not necessarily, via a peptide bond such that the first amino acid and the second amino acid are adjacent to one another in the target polypeptide. The method comprises expanding the target polypeptide to form a peptide-linker construct. Expanding the target polypeptide comprises attaching a linker between the first and second amino acids in each target amino acid pair, and cleaving the target polypeptide between the first and second amino acids in each target amino acid pair. By doing this, a peptide-linker construct is formed in which the sequence order of the amino acids in the target polypeptide is maintained. The construct can be characterised by taking one or more measurements characteristic of the construct. For example, the measurements can be taken as the construct moves with respect to a nanopore. By characterising the construct the target polypeptide is characterised. The methods may allow properties of the target polypeptide to be determined, such as its identity, length, composition, amino acid sequence and / or whether and how the polypeptide is modified.
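The order-preserving nature of the expansion described above can be illustrated with a simple string model. The sketch below is an assumption-laden toy, not the disclosed chemistry: it treats every lysine (K) followed by another residue as the first amino acid of a target pair, cuts after it, and joins the resulting fragments in their original order with a placeholder linker token. The function name `expand` and the token `-[Link]-` are hypothetical.

```python
def expand(sequence, first_amino_acid="K", linker="-[Link]-"):
    """Split `sequence` after each occurrence of `first_amino_acid` and
    rejoin the fragments, in their original order, with a linker token."""
    fragments = []
    start = 0
    # The final residue has no following partner, so it cannot start a pair.
    for i, residue in enumerate(sequence[:-1]):
        if residue == first_amino_acid:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments, linker.join(fragments)

target = "EEALYAKAGNNYGKLAQYVA"
fragments, construct = expand(target)
print(fragments)
print(construct)

# Removing the linker tokens recovers the target sequence, i.e. the
# amino acid order is unchanged by the modelled expansion.
print(construct.replace("-[Link]-", "") == target)
```

Because each linker bridges the two residues that were adjacent before cleavage, the fragments never need to be re-ordered, which is the contrast with the unordered fragments produced by shearing alone.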

[0015] Another aspect of the disclosure relates to methods of moving peptide portions of a target polypeptide with respect to a nanopore. In these methods the target polypeptide is often as described herein and is expanded as described above. A motor protein may be used to control the movement of peptide portions of the construct formed by expanding the polypeptide with respect to the nanopore. Another aspect of the disclosure relates to methods of producing a peptide-linker construct from a target polypeptide. The method involves expanding the target polypeptide as described herein.

[0016] Accordingly, provided herein is a method of characterising a target polypeptide, wherein the target polypeptide comprises one or more target amino acid pairs, wherein each target amino acid pair in the target polypeptide comprises a first amino acid attached to a second amino acid; the method comprising expanding the target polypeptide to form a peptide-linker construct; wherein expanding the target polypeptide comprises attaching a linker between the first and second amino acids in each target amino acid pair, and cleaving the target polypeptide between the first and second amino acids in each target amino acid pair; thereby forming a peptide-linker construct in which the sequence order of the amino acids in the target polypeptide is maintained; and wherein characterising the target polypeptide comprises taking one or more measurements characteristic of the construct as the construct moves with respect to a nanopore; thereby characterising the target polypeptide.

[0017] In some embodiments the target polypeptide comprises a plurality of target amino acid pairs. In some embodiments, in each target amino acid pair the first and second amino acids are adjacent. In some embodiments, the second amino acid is N-terminal to the first amino acid. In some embodiments the second amino acid is C-terminal to the first amino acid. In some embodiments, each first amino acid in each target amino acid pair is the same. In some embodiments, said plurality of target amino acid pairs comprises at least two different first amino acids. In some embodiments, the second amino acids in each of the one or more target amino acid pairs may be the same or different.

[0018] In some embodiments, expanding the target polypeptide to form a peptide-linker construct comprises attaching a first linker between the first and second amino acids in a first target amino acid pair and attaching a second linker between the first and second amino acids in a second target amino acid pair. In some embodiments, the first linker and the second linker are the same. In some embodiments, the first linker and the second linker are different. In some embodiments, the or each linker independently comprises a multifunctional molecule. In some embodiments, the or each linker independently comprises a polymer. In some embodiments, the or each linker independently comprises a polynucleotide, a polypeptide and / or a polysaccharide. In some embodiments, the or each linker independently comprises a hairpin.

[0019] In some embodiments, the or each linker comprises a first end and a second end, and wherein attaching a linker between the first and second amino acids in each target amino acid pair comprises attaching the first end of the linker to the first amino acid and attaching the second end of the linker to the second amino acid. In some embodiments, the or each first amino acid comprises a reactive side chain. In some embodiments, the method comprises activating the side chain of the or each first amino acid for reaction with the first end of the linker. In some embodiments, the method comprises reacting the first end of the or each linker with the side chain of the or each first amino acid. In some embodiments, the first end of the or each linker independently comprises a first reactive group for reacting with the or each first amino acid. In some embodiments, the first reactive group is an amine-reactive group, a thiol-reactive group, a carbonyl-reactive group, a carboxyl-reactive group, a hydroxyl-reactive group, an imidazole-reactive group, or a click-chemistry reactive group. In some embodiments, the or each first amino acid is independently selected from Lys, Arg, Glu, Asp, Cys, Ser, Thr, Tyr and His.

[0020] In some embodiments, the method comprises activating the or each second amino acid for reaction with the second end of the or each linker. In some embodiments, the method comprises a step of activating the second end of the or each linker for reaction with the or each second amino acid. In some embodiments, the second end of the or each linker comprises a second reactive group for reacting with the or each second amino acid. In some embodiments, the second reactive group is an amine-reactive group, a thiol-reactive group, a carbonyl-reactive group, a carboxyl-reactive group, a hydroxyl-reactive group, an imidazole-reactive group, or a click-chemistry reactive group.

[0021] In some embodiments, (i) the first amino acid and / or the second amino acid each comprise an amine group; (ii) the method comprises activating said amine group(s) by reaction with an activating agent, and (iii) the first end of the linker and / or the second end of the linker each comprise a thiol-reactive group. In some embodiments the activating agent is Traut’s Reagent (2-iminothiolane, or a salt thereof, such as 2-iminothiolane hydrochloride). In some embodiments the thiol-reactive group is a maleimide or haloacetamide group.
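Since activation of this kind is typically followed by mass spectrometry, a rough calculation of the expected mass change may be helpful. The sketch below is illustrative only and rests on stated assumptions: it uses standard monoisotopic residue masses, assumes 2-iminothiolane (C4H7NS) adds its full mass of about 101.03 Da to each primary amine on ring opening, and assumes every lysine side chain and any free N-terminus reacts exactly once. The function names are hypothetical, and the acetylated, amidated 13-residue sequence is the test peptide named in the figure descriptions.

```python
# Standard monoisotopic residue masses in Da (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
ACETYL_CAP = 42.01057    # N-terminal acetylation adds C2H2O
AMIDE_CAP = -0.98402     # C-terminal amidation replaces OH with NH2
TRAUT_SHIFT = 101.02992  # assumed net addition of 2-iminothiolane (C4H7NS)

def peptide_mass(sequence, acetylated=False, amidated=False):
    """Monoisotopic mass of a linear peptide with optional terminal caps."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    if acetylated:
        mass += ACETYL_CAP
    if amidated:
        mass += AMIDE_CAP
    return mass

def mass_after_traut(sequence, acetylated=False, amidated=False):
    """Mass assuming each Lys side chain, and the N-terminus if it is not
    acetylated, is modified exactly once (a simplifying assumption)."""
    amines = sequence.count("K") + (0 if acetylated else 1)
    return peptide_mass(sequence, acetylated, amidated) + amines * TRAUT_SHIFT

unmodified = peptide_mass("EEALYAKAGNNYG", acetylated=True, amidated=True)
thiolated = mass_after_traut("EEALYAKAGNNYG", acetylated=True, amidated=True)
print(round(unmodified, 3), round(thiolated, 3))
```

A subsequent reaction of the new thiol with a maleimide would add the mass of the maleimide reagent in the same additive way; that step is omitted because the reagent mass depends on the particular linker.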

[0022] In some embodiments the or each linker comprises a plurality of linking portions. In some embodiments, the method comprises attaching the linking portions together prior to attaching the second end of the linker to the second amino acid. In some embodiments, the method comprises attaching the linking portions together after attaching the second end of the linker to the second amino acid.

[0023] In some embodiments, expanding the target polypeptide comprises: i) attaching a first end of each linker to the first amino acid in each target amino acid pair; ii) cleaving the target polypeptide between the first and the second amino acids in each target amino acid pair; and iii) attaching a second end of each linker to the second amino acid in each target amino acid pair.

[0024] In some embodiments, cleaving the target polypeptide comprises contacting the target polypeptide with one or more proteolytic enzymes. In some embodiments, cleaving the target polypeptide comprises contacting the target polypeptide with a chemical reagent. In some embodiments, cleaving the target polypeptide comprises contacting the target polypeptide with one or more of LysC, LysN, trypsin, ArgC, clostripain, gingisrex, GluC, glutamyl endopeptidase, granzyme B, staphylococcal peptidase I, AspN, caspase 1, caspase 2, caspase 3, caspase 4, caspase 5, caspase 6, caspase 7, caspase 8, caspase 9, caspase 10, enterokinase, factor Xa, formic acid, granzyme B, 2-nitro-5-thiocyanatobenzoic acid, papain, thrombin (PeptideCutter), thrombin SG, Asp-N endopeptidase, and LysArgiNase. In some embodiments, the method comprises reacting the second end of the or each linker with the N-terminal amine group of the or each second amino acid.
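To illustrate how the choice of cleavage reagent determines which residue ends up at each new terminus, the following sketch performs a simplified in-silico digestion for a handful of the listed enzymes. The rule table reflects commonly cited, idealised specificities (for example LysC cutting C-terminal to Lys and LysN cutting N-terminal to Lys) and deliberately ignores missed cleavages, neighbouring-residue effects and buffer dependence; it is an assumption for illustration, not data from the disclosure. The names `CLEAVAGE_RULES` and `digest` are hypothetical.

```python
# Idealised specificities: (recognised residues, side of the residue that is cut).
CLEAVAGE_RULES = {
    "LysC": ("K", "after"),
    "LysN": ("K", "before"),
    "trypsin": ("KR", "after"),   # simplified: proline effects ignored
    "ArgC": ("R", "after"),
    "GluC": ("E", "after"),       # simplified: Asp cleavage ignored
    "AspN": ("D", "before"),
}

def digest(sequence, enzyme):
    """Return the ordered fragments from complete, idealised digestion."""
    residues, side = CLEAVAGE_RULES[enzyme]
    cut_points = []
    for i, aa in enumerate(sequence):
        if aa in residues:
            cut = i + 1 if side == "after" else i
            # Cuts at the very start or end would give empty fragments.
            if 0 < cut < len(sequence):
                cut_points.append(cut)
    bounds = [0] + cut_points + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]

peptide = "EEALYAKAGNNYG"
print(digest(peptide, "LysC"))  # Lys stays at the C-terminus of the first fragment
print(digest(peptide, "LysN"))  # Lys becomes the N-terminus of the second fragment
```

In the LysC case the second fragment begins with a newly exposed N-terminal amine, which is the group referred to in the final sentence of the paragraph above.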

[0025] In some embodiments, expanding the target polypeptide comprises: i) attaching a first end of each linker to each first amino acid in each target amino acid pair; and ii) attaching a second end of each linker to each second amino acid in each target amino acid pair, thereby cleaving the target polypeptide between the first and the second amino acids in each target amino acid pair.

[0026] In some embodiments reacting the second end of the linker with the second amino acid causes cleavage of the target polypeptide. In some embodiments, the method comprises reacting the second end of the linker with a peptide bond between the first and second amino acids.

[0027] In some embodiments, expanding the target polypeptide comprises: i) attaching a first end of each linker to each first amino acid in each target amino acid pair; ii) attaching a second end of each linker to each second amino acid in each target amino acid pair; and iii) cleaving the target polypeptide between the first and the second amino acids in each target amino acid pair.

[0028] In some embodiments, the method comprises reacting the second end of the linker with the side chain of the second amino acid.

[0029] In some embodiments, the peptide-linker construct comprises a concatemer of contiguous peptide fragments, wherein proximate peptide fragments are linked together, and wherein the sequence order of the amino acids in the concatemer is the same as the sequence order of the amino acids in the target polypeptide. In some embodiments, the concatemer comprises n contiguous peptide fragments, and wherein the concatemer comprises a structure: or

[0030] N-...[PEP_{x}]^{2A}-^{2E}Link^{1E}-^{1A}[PEP_{x+1}]^{2A}-^{2E}Link^{1E}-...-^{1A}[PEP_{x=n}]...-C wherein

[0031] [PEP_{x}], [PEP_{x+1}], ..., [PEP_{x=n}] represent the n contiguous peptide fragments;

[0032] ^{1A} represents the first amino acid in each peptide fragment;

[0033] ^{2A} represents the second amino acid in each peptide fragment; each Link represents a linker;

[0034] ^{1E} represents the first end of each linker;

[0035] ^{2E} represents the second end of each linker;

[0036] N represents the N-terminus of the concatemer; and C represents the C-terminus of the concatemer.
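The labelling convention defined above can be made concrete with a small rendering helper. The sketch below is an illustration under assumptions: it writes out the orientation in which the second amino acid (2A) closes one fragment and the first amino acid (1A) opens the next, uses placeholder fragment names rather than real sequences, and the function name `render_concatemer` is hypothetical.

```python
def render_concatemer(fragments):
    """Write an ordered list of fragment labels in the 2A-2E Link 1E-1A notation."""
    if not fragments:
        raise ValueError("at least one peptide fragment is required")
    parts = []
    last = len(fragments) - 1
    for index, fragment in enumerate(fragments):
        # Interior junctions carry the amino acid labels; the outer termini do not.
        left = "1A" if index > 0 else ""
        right = "2A" if index < last else ""
        parts.append(f"{left}[{fragment}]{right}")
    return "N-" + "-2E{Link}1E-".join(parts) + "-C"

structure = render_concatemer(["PEP1", "PEP2", "PEP3"])
print(structure)
```

For n fragments the rendered structure contains n - 1 linkers, one at each position where the target polypeptide was cleaved.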

[0037] In some embodiments, the method comprises attaching a sequencing adapter to the construct. In some embodiments, the method comprises loading a motor protein onto the construct or onto a sequencing adapter attached to the construct.

[0038] Also provided is a method of moving one or more peptide portions of a target polypeptide with respect to a nanopore, wherein the target polypeptide comprises one or more target amino acid pairs, wherein each target amino acid pair in the target polypeptide comprises a first amino acid attached to a second amino acid; the method comprising expanding the target polypeptide to form a peptide-linker construct; and contacting the peptide-linker construct with a motor protein under conditions such that the motor protein controls the movement of one or more peptide portions of the construct with respect to a nanopore; wherein expanding the target polypeptide comprises attaching a linker between the first and second amino acids in each target amino acid pair, and cleaving the target polypeptide between the first and second amino acids in each target amino acid pair; thereby forming a peptide-linker construct in which the sequence order of the amino acids in the target polypeptide is maintained.

[0039] Also provided is a method of producing a peptide-linker construct from a target polypeptide, wherein the target polypeptide comprises one or more target amino acid pairs, wherein each target amino acid pair in the target polypeptide comprises a first amino acid attached to a second amino acid; the method comprising expanding the target polypeptide by attaching a linker between the first and second amino acids in each target amino acid pair, and cleaving the target polypeptide between the first and second amino acids in each target amino acid pair; thereby forming a peptide-linker construct in which the sequence order of the amino acids in the target polypeptide is maintained.

[0040] In some embodiments of such methods, the target polypeptide and / or the linker are as defined herein; cleaving the target polypeptide is as defined herein; the peptide-linker construct is as defined herein; the method comprises attaching a sequencing adapter to the construct; and / or the method comprises loading a motor protein onto the construct or onto a sequencing adapter attached to the construct.

[0041] Also provided is a conjugate, comprising a target polypeptide comprising one or more target amino acid pairs, wherein each target amino acid pair in the target polypeptide comprises a first amino acid attached to a second amino acid; and one or more linkers attached to the first amino acid in each target amino acid pair, wherein the one or more linkers are each also optionally attached to the second amino acid in each target amino acid pair.

[0042] In some embodiments the conjugate comprises a sequencing adaptor; and / or the conjugate comprises a motor protein capable of controlling the movement of a peptide-linker construct with respect to a nanopore.

[0043] Also provided is a peptide-linker construct comprising a concatemer of contiguous peptide fragments, wherein proximate peptide fragments are linked together by a plurality of linkers. In some embodiments the construct comprises a sequencing adaptor; and / or the construct comprises a motor protein capable of controlling the movement of the construct with respect to a nanopore; and / or the construct is anchored to a membrane comprising a nanopore.

[0044] Also provided is a kit for modifying a target polypeptide, comprising a linker having a first end capable of selectively reacting with an optionally-activated first amino acid comprised in a target amino acid pair comprised in the target polypeptide, and a second end capable of reacting with an optionally-activated second amino acid comprised in the target amino acid pair; and a chemical or enzymatic reagent capable of cleaving the target polypeptide between the first amino acid and the second amino acid.

[0045] In some embodiments the kit comprises one or more of a sequencing adapter capable of selectively reacting with the target polypeptide or the linker; a motor protein capable of controlling the movement of a peptide-linker construct with respect to a nanopore; and a nanopore capable of detecting one or more characteristics of a peptide-linker construct as the construct moves with respect to the nanopore.

[0046] Figure 1 shows a non-limiting example of a schematic reaction scheme in which an exemplary target polypeptide (shown comprising four amino acids, with side chains denoted R1, A, B and R2) is expanded by a linker (Link) having a first end (1E) and a second end (2E). In the illustrated scheme, the first end (1E) of the linker (Link) is attached to the first amino acid (A) in a target amino acid pair ([A-B]) and the peptide bond between the first amino acid (A) and the second amino acid (B) is cleaved as indicated schematically by the scissor symbol, e.g. by using a proteolytic enzyme, e.g. a proteolytic enzyme selective for the first amino acid. For example, the first end (1E) of the linker (Link) may be attached to the side chain of the first amino acid (A). Cleavage of the target polypeptide liberates the N-terminal amine group of the second amino acid (B) for attachment to the second end (2E) of the linker. Any or all of the first end of the linker, the second end of the linker, the first amino acid and the second amino acid may be activated to facilitate the attachment of the linker to the amino acids. The peptide linker construct that is formed by expansion of the target polypeptide retains the amino acid order of the target polypeptide, here R1-A-B-R2.

[0047] Figure 2 shows a non-limiting example of a schematic reaction scheme in which an exemplary target polypeptide (shown comprising four amino acids, with side chains denoted R1, B, A and R2) is expanded by a linker (Link) having a first end (1E) and a second end (2E). In the illustrated scheme, the first end (1E) of the linker (Link) is attached to the first amino acid (A) in a target amino acid pair ([B-A]). For example, the first end (1E) of the linker (Link) may be attached to the side chain of the first amino acid (A). The second end (2E) of the linker reacts with the first amino acid (here illustrated as reacting with the peptide bond between the first and second amino acids). Reaction of the second end (2E) of the linker with the second amino acid (B) causes cleavage of the peptide bond between the first amino acid (A) and the second amino acid (B), as indicated schematically by the scissor symbol. Any or all of the first end of the linker, the second end of the linker, the first amino acid and the second amino acid may be activated to facilitate the attachment of the linker to the amino acids. The peptide linker construct that is formed by expansion of the target polypeptide retains the amino acid order of the target polypeptide, here R1-B-A-R2.

Figure 3 shows a non-limiting example of a schematic reaction scheme in which an exemplary target polypeptide (shown comprising four amino acids, with side chains denoted R1, A, B and R2) is expanded by a linker (Link) having a first end (1E) and a second end (2E). In the illustrated scheme, the first end (1E) of the linker (Link) is attached to the first amino acid (A) in a target amino acid pair ([A-B]) and the second end (2E) of the linker is attached to the second amino acid (B) in the target amino acid pair. For example, the first end (1E) of the linker (Link) may be attached to the side chain of the first amino acid (A). For example, the second end (2E) of the linker (Link) may be attached to the side chain of the second amino acid (B). The peptide bond between the first amino acid (A) and the second amino acid (B) is cleaved as indicated schematically by the scissor symbol, e.g. by using a proteolytic enzyme, e.g. a proteolytic enzyme selective for the first amino acid. Any or all of the first end of the linker, the second end of the linker, the first amino acid and the second amino acid may be activated to facilitate the attachment of the linker to the amino acids. The peptide linker construct that is formed by expansion of the target polypeptide retains the amino acid order of the target polypeptide, here R1-A-B-R2.

[0048] Figure 4 shows a non-limiting example of a schematic reaction scheme for the preparation of a peptide-linker construct using single-stranded DNA hairpin linkers and enzymatic cleavage.

[0049] Figure 5 shows a non-limiting example of a schematic reaction scheme for the preparation of a peptide-linker construct using single-stranded DNA hairpin linkers and chemical cleavage.

[0050] Figure 6 shows a non-limiting example of a schematic reaction scheme for the preparation of a peptide-linker construct using a linker and enzymatic cleavage.

[0051] Figure 7 shows a non-limiting example of a schematic reaction scheme for the preparation of a peptide-linker construct using a linker and chemical cleavage.

[0052] Figure 8 shows a schematic of the experimental design used in Example 1 to assess the ability of LysC enzyme to process lysine residues modified with Traut’s reagent.

[0053] Figure 9 shows HPLC trace and mass spectrometry data for the starting peptide Ac-EEALYAKAGNNYG-CONH2.

[0054] Figure 10 shows HPLC trace and mass spectrometry data for the LysC-digested starting peptide Ac-EEALYAKAGNNYG-CONH2.

[0055] Figure 11 shows HPLC trace and mass spectrometry data for the starting peptide Ac-EEALYAKAGNNYG-CONH2 following modification with Traut’s reagent and maleimide.

Figure 12 shows HPLC trace and mass spectrometry data for the peptide Ac-EEALYAKAGNNYG-CONH2, modified with Traut’s reagent and maleimide, and digested by LysC in the presence of excess Traut’s maleimide.

[0056] Figure 13 shows a schematic of the experimental design used in Example 2 to assess the sequence identity of the peptide after peptide-linker construct formation.

[0057] Figure 14 shows potential products formed during peptide-linker reaction (Example 2).

[0058] Figure 15 shows HPLC trace and mass data for the starting peptide Ac-EEALYAKAGNNYGKLAQYVA-CONH2.

[0059] Figure 16 shows HPLC trace and mass spectrometry data for the starting peptide Ac-EEALYAKAGNNYGKLAQYVA-CONH2, modified with Traut’s reagent and bismaleimidoethane (BMOE).

[0060] Figure 17 shows HPLC trace (A) and mass spectrometry data (B-F) for the starting peptide Ac-EEALYAKAGNNYGKLAQYVA-CONH2, modified with Traut’s reagent and BMOE, followed by LysC digestion in the presence of excess Traut’s reagent.

[0061] Figure 18 shows data from a large-scale HPLC purification of the starting peptide Ac-EEALYAKAGNNYGKLAQYVA-CONH2, modified with Traut’s reagent and BMOE, followed by LysC digestion in the presence of excess Traut’s reagent.

[0062] Figure 19 shows high-resolution mass spectrometry data, fragmentation patterns, and major products for each peak identified in Fig. 18.

[0063] Figure 20 shows an example signal trace obtained from nanopore characterisation of the test peptide Tetrazine-EEALYAKAGNNYGK(N3)-CONH2 provided in a control construct. Data described in Example 3.

[0064] Figure 21 shows an example signal trace obtained from nanopore characterisation of the test peptide Tetrazine-EEALYAKAGNNYGK(N3)-CONH2 following its cleavage and formation into a peptide-linker ‘concatemer’ construct. Data described in Example 3.

[0065] Detailed Description

[0066] The present invention will be described with respect to particular embodiments and with reference to certain drawings but the invention is not limited thereto but only by the claims. Any reference signs in the claims shall not be construed as limiting the scope. Of course, it is to be understood that not necessarily all aspects or advantages may be achieved in accordance with any particular embodiment of the invention. Thus, for example those skilled in the art will recognize that the invention may be embodied or carried out in a manner that achieves or optimizes one advantage or group of advantages as taught herein without necessarily achieving other aspects or advantages as may be taught or suggested herein.

[0067] The invention, both as to organization and method of operation, together with features and advantages thereof, may best be understood by reference to the following detailed description when read in conjunction with the accompanying drawings. The aspects and advantages of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter. Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment, but may. Similarly, it should be appreciated that in the description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment.

[0068] It should be appreciated that “embodiments” of the disclosure can be specifically combined together unless the context indicates otherwise. The specific combinations of all disclosed embodiments (unless implied otherwise by the context) are further disclosed embodiments of the claimed invention.

[0069] In addition, as used in this specification and the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a polynucleotide” includes two or more polynucleotides, reference to “a motor protein” includes two or more such proteins, reference to “a helicase” includes two or more helicases, reference to “a monomer” includes two or more monomers, reference to “a pore” includes two or more pores and the like.

[0070] All publications, patents and patent applications cited herein, whether supra or infra, are hereby incorporated by reference in their entirety.

Definitions

[0071] Where an indefinite or definite article is used when referring to a singular noun e.g. "a" or "an", "the", this includes a plural of that noun unless something else is specifically stated. Where the term "comprising" is used in the present description and claims, it does not exclude other elements or steps. Furthermore, the terms first, second, third and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the invention described herein are capable of operation in other sequences than described or illustrated herein. The following terms or definitions are provided solely to aid in the understanding of the invention. Unless specifically defined herein, all terms used herein have the same meaning as they would to one skilled in the art of the present invention. Practitioners are particularly directed to Sambrook et al., Molecular Cloning: A Laboratory Manual, 4th ed., Cold Spring Harbor Press, Plainsview, New York (2012); and Ausubel et al., Current Protocols in Molecular Biology (Supplement 114), John Wiley & Sons, New York (2016), for definitions and terms of the art. The definitions provided herein should not be construed to have a scope less than understood by a person of ordinary skill in the art.

[0072] "About" as used herein when referring to a measurable value such as an amount, a temporal duration, and the like, is meant to encompass variations of ± 20 % or ± 10 %, more preferably ± 5 %, even more preferably ± 1 %, and still more preferably ± 0.1 % from the specified value, as such variations are appropriate to perform the disclosed methods.

[0073] “Nucleotide sequence”, “DNA sequence” or “nucleic acid molecule(s)” as used herein refers to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. This term refers only to the primary structure of the molecule. Thus, this term includes double- and single-stranded DNA, and RNA. The term “nucleic acid” as used herein, is a single or double stranded covalently-linked sequence of nucleotides in which the 3' and 5' ends on each nucleotide are joined by phosphodiester bonds. The polynucleotide may be made up of deoxyribonucleotide bases or ribonucleotide bases. Nucleic acids may be manufactured synthetically in vitro or isolated from natural sources. Nucleic acids may further include modified DNA or RNA, for example DNA or RNA that has been methylated, or RNA that has been subject to post-transcriptional modification, for example 5’-capping with 7-methylguanosine, 3’-processing such as cleavage and polyadenylation, and splicing. Nucleic acids may also include synthetic nucleic acids (XNA), such as hexitol nucleic acid (HNA), cyclohexene nucleic acid (CeNA), threose nucleic acid (TNA), glycerol nucleic acid (GNA), locked nucleic acid (LNA) and peptide nucleic acid (PNA). Sizes of nucleic acids, also referred to herein as “polynucleotides”, are typically expressed as the number of base pairs (bp) for double stranded polynucleotides, or in the case of single stranded polynucleotides as the number of nucleotides (nt). One thousand bp or nt equal a kilobase (kb). Polynucleotides of less than around 40 nucleotides in length are typically called “oligonucleotides” and may comprise primers for use in manipulation of DNA such as via polymerase chain reaction (PCR).

[0074] The term “amino acid” in the context of the present disclosure is used in its broadest sense and is meant to include organic compounds containing amine (NH2) and carboxyl (COOH) functional groups, along with a side chain (e.g., an R group) specific to each amino acid. In some embodiments, the amino acids refer to naturally occurring L α-amino acids or residues. The commonly used one and three letter abbreviations for naturally occurring amino acids are used herein: A=Ala; C=Cys; D=Asp; E=Glu; F=Phe; G=Gly; H=His; I=Ile; K=Lys; L=Leu; M=Met; N=Asn; P=Pro; Q=Gln; R=Arg; S=Ser; T=Thr; V=Val; W=Trp; and Y=Tyr (Lehninger, A. L., (1975) Biochemistry, 2d ed., pp. 71-92, Worth Publishers, New York). The general term “amino acid” further includes D-amino acids, retro-inverso amino acids as well as chemically modified amino acids such as amino acid analogues, naturally occurring amino acids that are not usually incorporated into proteins such as norleucine, and chemically synthesised compounds having properties known in the art to be characteristic of an amino acid, such as β-amino acids. For example, analogues or mimetics of phenylalanine or proline, which allow the same conformational restriction of the peptide compounds as do natural Phe or Pro, are included within the definition of amino acid. Such analogues and mimetics are referred to herein as "functional equivalents" of the respective amino acid. Other examples of amino acids are listed by Roberts and Vellaccio, The Peptides: Analysis, Synthesis, Biology, Gross and Meienhofer, eds., Vol. 5 p. 341, Academic Press, Inc., N.Y. 1983, which is incorporated herein by reference.

[0075] The terms “polypeptide”, and “peptide” are interchangeably used herein to refer to a polymer of amino acid residues and to variants and synthetic analogues of the same. Thus, these terms apply to amino acid polymers in which one or more amino acid residues is a synthetic non-naturally occurring amino acid, such as a chemical analogue of a corresponding naturally occurring amino acid, as well as to naturally-occurring amino acid polymers. Polypeptides can also undergo maturation or post-translational modification processes that may include, but are not limited to: glycosylation, proteolytic cleavage, lipidization, signal peptide cleavage, propeptide cleavage, phosphorylation, and such like. A peptide can be made using recombinant techniques, e.g., through the expression of a recombinant or synthetic polynucleotide. A recombinantly produced peptide is typically substantially free of culture medium, e.g., culture medium represents less than about 20 %, more typically less than about 10 %, and most typically less than about 5 % of the volume of the protein preparation.

[0076] The term “protein” is used to describe a folded polypeptide having a secondary, tertiary, or quaternary structure. The protein may be composed of a single polypeptide, or may comprise multiple polypeptides that are assembled to form a multimer. The multimer may be a homooligomer or a heterooligomer. The protein may be a naturally occurring, or wild type, protein, or a modified, or non-naturally occurring, protein. The protein may, for example, differ from a wild type protein by the addition, substitution or deletion of one or more amino acids.

[0077] A “variant” of a protein encompasses peptides, oligopeptides, polypeptides, proteins and enzymes having amino acid substitutions, deletions and / or insertions relative to the unmodified or wild-type protein in question and having similar biological and functional activity as the unmodified protein from which they are derived. The term "amino acid identity" as used herein refers to the extent that sequences are identical on an amino acid-by-amino acid basis over a window of comparison. Thus, a "percentage of sequence identity" is calculated by comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical amino acid residue (e.g., Ala, Pro, Ser, Thr, Gly, Val, Leu, Ile, Phe, Tyr, Trp, Lys, Arg, His, Asp, Glu, Asn, Gln, Cys and Met) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity.

[0078] For all aspects and embodiments of the present invention, a “variant” has at least 40%, 50%, 60%, 70%, 80%, 90%, 95% or 99% complete sequence identity to the amino acid sequence of the corresponding wild-type protein. Sequence identity can also be to a fragment or portion of the full length polynucleotide or polypeptide. Hence, a sequence may have only 50 % overall sequence identity with a full length reference sequence, but a sequence of a particular region, domain or subunit could share 80 %, 90 %, or as much as 99 % sequence identity with the reference sequence.
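By way of illustration only, the percentage of sequence identity calculation described above can be sketched in Python as follows. The sketch assumes that the two sequences have already been optimally aligned and padded to equal length (the alignment step itself is not shown), and the function name, the gap symbol and the K-to-R example variant are hypothetical choices made for this illustration rather than part of the disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of matched positions over a window of comparison.

    Both inputs are assumed to be already aligned and of equal length.
    '-' is used here as a gap symbol and never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("window of comparison is empty")
    # Number of positions at which the identical residue occurs in both sequences.
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    # Matched positions divided by the window size, multiplied by 100.
    return 100.0 * matches / len(aligned_a)


reference = "EEALYAKAGNNYG"
candidate = "EEALYARAGNNYG"  # hypothetical variant with a single K-to-R substitution

# Identity over the full length, and over a shorter region of the same pair.
overall = percent_identity(reference, candidate)
region = percent_identity(reference[:6], candidate[:6])
```

Here the full-length comparison has 12 matched positions out of 13, whereas the first six residues are identical, which mirrors the point that a particular region may share higher identity with the reference than the sequence as a whole.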

[0079] The term “wild-type” refers to a gene or gene product isolated from a naturally occurring source. A wild-type gene is that which is most frequently observed in a population and is thus arbitrarily designated the “normal” or “wild-type” form of the gene. In contrast, the term “modified”, “mutant” or “variant” refers to a gene or gene product that displays modifications in sequence (e.g., substitutions, truncations, or insertions), post-translational modifications and / or functional properties (e.g., altered characteristics) when compared to the wild-type gene or gene product. It is noted that naturally occurring mutants can be isolated; these are identified by the fact that they have altered characteristics when compared to the wild-type gene or gene product. Methods for introducing or substituting naturally-occurring amino acids are well known in the art. For instance, methionine (M) may be substituted with arginine (R) by replacing the codon for methionine (ATG) with a codon for arginine (CGT) at the relevant position in a polynucleotide encoding the mutant monomer. Methods for introducing or substituting non-naturally-occurring amino acids are also well known in the art. For instance, non-naturally-occurring amino acids may be introduced by including synthetic aminoacyl-tRNAs in the IVTT system used to express the mutant monomer. Alternatively, they may be introduced by expressing the mutant monomer in E. coli that are auxotrophic for specific amino acids in the presence of synthetic (i.e. non-naturally-occurring) analogues of those specific amino acids. They may also be produced by native chemical ligation if the mutant monomer is produced using partial peptide synthesis. Conservative substitutions replace amino acids with other amino acids of similar chemical structure, similar chemical properties or similar side-chain volume. The amino acids introduced may have similar polarity, hydrophilicity, hydrophobicity, basicity, acidity, neutrality or charge to the amino acids they replace. Alternatively, the conservative substitution may introduce another amino acid that is aromatic or aliphatic in the place of a pre-existing aromatic or aliphatic amino acid. Conservative amino acid changes are well-known in the art and may be selected in accordance with the properties of the 20 main amino acids as defined in Table 1 below. Where amino acids have similar polarity, this can also be determined by reference to the hydropathy scale for amino acid side chains in Table 2.

Table 1 - Chemical properties of amino acids

[0080] Table 2 - Hydropathy scale

Side Chain    Hydropathy
Ile            4.5
Val            4.2
Leu            3.8
Phe            2.8
Cys            2.5
Met            1.9
Ala            1.8
Gly           -0.4
Thr           -0.7
Ser           -0.8
Trp           -0.9
Tyr           -1.3
Pro           -1.6
His           -3.2
Glu           -3.5
Gln           -3.5
Asp           -3.5
Asn           -3.5
Lys           -3.9
Arg           -4.5
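As a non-limiting illustration of how the hydropathy scale of Table 2 might be consulted when considering whether two side chains have similar polarity, a minimal Python sketch is given below. The dictionary reproduces the Table 2 values; the function names and the ranking approach are assumptions made for this illustration only, and hydropathy is just one of the properties (alongside structure, charge and side-chain volume) that may inform a conservative substitution.

```python
# Hydropathy values as listed in Table 2.
HYDROPATHY = {
    "Ile": 4.5, "Val": 4.2, "Leu": 3.8, "Phe": 2.8, "Cys": 2.5,
    "Met": 1.9, "Ala": 1.8, "Gly": -0.4, "Thr": -0.7, "Ser": -0.8,
    "Trp": -0.9, "Tyr": -1.3, "Pro": -1.6, "His": -3.2, "Glu": -3.5,
    "Gln": -3.5, "Asp": -3.5, "Asn": -3.5, "Lys": -3.9, "Arg": -4.5,
}


def hydropathy_difference(a: str, b: str) -> float:
    """Absolute difference between the Table 2 values of two residues."""
    return abs(HYDROPATHY[a] - HYDROPATHY[b])


def closest_by_hydropathy(residue: str, k: int = 3) -> list:
    """The k other residues whose hydropathy is nearest to `residue`.

    Illustrative ranking only; it ignores size, charge and aromaticity.
    """
    others = [r for r in HYDROPATHY if r != residue]
    others.sort(key=lambda r: hydropathy_difference(residue, r))
    return others[:k]


nearest_to_leu = closest_by_hydropathy("Leu")
lys_arg_gap = hydropathy_difference("Lys", "Arg")
```

On these values the residues nearest to Leu are the other aliphatic side chains Val and Ile followed by Phe, and Lys and Arg differ by about 0.6 units.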

[0102] A mutant or modified protein, monomer or peptide can also be chemically modified in any way and at any site. A mutant or modified monomer or peptide may be chemically modified by attachment of a molecule to one or more cysteines (cysteine linkage), attachment of a molecule to one or more lysines, attachment of a molecule to one or more non-natural amino acids, enzyme modification of an epitope or modification of a terminus. Suitable methods for carrying out such modifications are well-known in the art. The mutant or modified protein, monomer or peptide may be chemically modified by the attachment of any molecule. For instance, the mutant or modified protein, monomer or peptide may be chemically modified by attachment of a dye or a fluorophore.

[0103] As used herein, an alkylene group is a bidentate moiety derived by abstraction of two hydrogen atoms from a linear or branched alkyl group. Typically an alkylene group comprises from 1 to 10 carbon atoms and is referred to as a C1-10 alkylene group. A C1-10 alkylene group is often a C1-4 alkylene group, or a C1-3 alkylene group. Examples of C1-4 alkylene groups include methylene, ethylene, n-propylene, iso-propylene, n-butylene, sec-butylene, and tert-butylene.

[0104] As used herein, an alkenylene group is a bidentate moiety derived by abstraction of two hydrogen atoms from a linear or branched alkenyl group having one or more, e.g. one or two, typically one, double bonds. Typically an alkenylene group comprises from 2 to 10 carbon atoms and is referred to as a C2-10 alkenylene group. A C2-10 alkenylene group is often a C2 to C4 alkenylene group or a C2 to C3 alkenylene group. Examples of C2 to C4 alkenylene groups include ethenylene, propenylene and butenylene.

[0105] As used herein, an alkynylene group is a bidentate moiety derived by abstraction of two hydrogen atoms from a linear or branched alkynyl group having one or more, e.g. one or two, typically one, triple bonds. Typically an alkynylene group comprises from 2 to 10 carbon atoms and is referred to as a C2-10 alkynylene group. A C2-10 alkynylene group is often a C2 to C4 alkynylene group or a C2 to C3 alkynylene group. Examples of C2 to C4 alkynylene groups include ethynylene, propynylene and butynylene.

[0106] An arylene group is a bidentate moiety derived from an aryl group. As used herein, an aryl group is often a C6 to C10 aryl group which may be a substituted or unsubstituted, monocyclic or fused polycyclic aromatic group containing from 6 to 10 carbon atoms in the ring portion. Examples include monocyclic groups such as phenyl and fused bicyclic groups such as naphthyl and indenyl.

[0107] A heteroarylene group is a bidentate moiety derived from a heteroaryl group. As used herein, a heteroaryl group is often a 5- to 10-membered heteroaryl group which may be a substituted or unsubstituted monocyclic or fused polycyclic aromatic group containing from 5 to 10 atoms in the ring portion, including at least one heteroatom, for example 1, 2 or 3 heteroatoms, typically selected from O, S and N. A heteroaryl group is typically a 5- or 6-membered heteroaryl group or a 9- or 10-membered heteroaryl group. Examples include imidazole, pyridine, pyrimidine and pyrazine. A carbocyclylene group is a bidentate moiety derived from a carbocyclyl group. As used herein, a carbocyclyl group is often a 4- to 10-membered or 4- to 6-membered carbocyclic group containing from 4 to 10 carbon atoms. A carbocyclic group may be saturated or partially unsaturated, but is typically saturated. Examples of carbocyclic groups include cyclobutyl, cyclopentyl and cyclohexyl groups.

[0108] A heterocyclylene group is a bidentate moiety derived from a heterocyclyl group. As used herein, a heterocyclyl group is often a 4- to 10-membered or 4- to 6-membered heterocyclic group containing from 4 to 10 atoms in the ring portion, including at least one heteroatom, for example 1, 2 or 3 heteroatoms, typically selected from O, S and N. A heterocyclic group may be saturated or partially unsaturated, but is typically saturated. Examples of heterocyclic groups include azetidine, morpholine, 1,4-oxazepane, octahydropyrrolo[3,4-c]pyrrole, piperazine, piperidine, and pyrrolidine.

[0109] Disclosed Methods

[0110] In an aspect, provided herein is a method of characterising a target polypeptide, the method comprising expanding the target polypeptide to form a peptide-linker construct as described herein, and characterising the target polypeptide by taking one or more measurements characteristic of the construct as the construct moves with respect to a nanopore; thereby characterising the target polypeptide.

[0111] In another aspect, provided herein is a method of moving one or more peptide portions of a target polypeptide with respect to a nanopore, the method comprising expanding the target polypeptide to form a peptide-linker construct as described herein, and contacting the peptide-linker construct with a motor protein under conditions such that the motor protein controls the movement of one or more peptide portions of the construct with respect to a nanopore.

[0112] In another aspect, provided herein is a method of producing a peptide-linker construct from a target polypeptide, the method comprising expanding the target polypeptide as described herein; thereby forming a peptide-linker construct in which the sequence order of the amino acids in the target polypeptide is maintained.

[0113] The disclosed methods thus comprise expanding a target polypeptide to form a peptide-linker construct.

[0114] In the disclosed methods, any suitable target polypeptide can be used. Target polypeptides are described in more detail herein. The target polypeptide comprises one or more target amino acid pairs. Target amino acid pairs are pairs of target amino acids as described herein, with each target amino acid pair comprising a first amino acid attached to a second amino acid. This is described in more detail herein.

[0115] The target polypeptide is expanded using a linker, thereby forming a peptide-linker construct. Any suitable linker can be used. Exemplary linkers that can be used in the disclosed methods are described in more detail herein.

[0116] The linker is attached between the first and second amino acids in each target amino acid pair. Thus, each target amino acid pair in the target polypeptide is expanded using a linker. In some embodiments the linker used to expand each target amino acid pair is the same; that is, a plurality of target amino acid pairs may each be expanded using a linker wherein the linkers used to expand each of the plurality of target amino acid pairs are the same. In other embodiments different linkers may be used to expand each target amino acid pair; that is, a plurality of target amino acid pairs may each be expanded using a linker, but not all linkers are the same. For example, a first target amino acid pair may be expanded using a first linker and a second target amino acid pair may be expanded using a second linker. This is described in more detail herein.

[0117] For avoidance of doubt, and as described in more detail herein, the target amino acid pairs may be located at any position within the target polypeptide. In some embodiments a plurality of target amino acid pairs are distributed randomly or pseudo-randomly throughout the target polypeptide. In some embodiments target amino acid pairs may be adjacent to each other, such that the second amino acid in a first target amino acid pair is adjacent to the first amino acid in a second target amino acid pair in the sequence of the target polypeptide. In some embodiments target amino acid pairs may overlap, such that the second amino acid in a first target amino acid pair is the first amino acid in a second target amino acid pair.
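The distinction between adjacent and overlapping target amino acid pairs can be illustrated with a short Python sketch. The sketch assumes, purely for illustration, that a target amino acid pair is any lysine followed by another residue (i.e. the first amino acid is K and is N-terminal to the second amino acid); the test sequence, function names and the label "separate" are hypothetical and not taken from the disclosure.

```python
def target_pairs(sequence: str, first: str = "K") -> list:
    """Index pairs (i, i + 1) where the residue at i is the first amino acid.

    Assumption for this sketch: any residue may serve as the second amino acid.
    """
    return [(i, i + 1) for i in range(len(sequence) - 1) if sequence[i] == first]


def relation(pair_a: tuple, pair_b: tuple) -> str:
    """Classify two consecutive target amino acid pairs."""
    if pair_a[1] == pair_b[0]:
        # The second amino acid of one pair is the first amino acid of the next.
        return "overlapping"
    if pair_a[1] + 1 == pair_b[0]:
        # The second amino acid of one pair sits next to the first of the next.
        return "adjacent"
    return "separate"


sequence = "AKKGKALLKG"  # hypothetical test sequence
pairs = target_pairs(sequence)
relations = [relation(p, q) for p, q in zip(pairs, pairs[1:])]
```

For this sequence the two consecutive lysines give overlapping pairs, the following K-A pair is adjacent to the K-G pair before it, and the final K-G pair is separated from the others by intervening residues.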

[0118] Some of the disclosed methods involve contacting the target polypeptide with one or more proteolytic enzymes. Some of the disclosed methods involve contacting the target polypeptide with a chemical reagent to cleave the target polypeptide. Suitable proteolytic enzymes and suitable chemical reagents are described in more detail herein.

[0119] Expanding the target polypeptide generates a peptide-linker construct. The expansion of the target polypeptide maintains the sequence order of the amino acids in the target polypeptide. The amino acids which are comprised in each target amino acid pair may be separated by one or more linkers as described herein, but the order of the target amino acid sequence is not altered. The methods thus can be understood in terms of inserting linkers between amino acids in a target polypeptide sequence without altering the amino acid sequence of the target polypeptide. This is advantageous because it means that information which derives from the sequence order of the amino acids in the target polypeptide is preserved. Accordingly, the disclosed methods allow for (inter alia) improved characterisation methods of a target polypeptide, as the need for reconstruction of the original amino acid sequence of the target polypeptide from characterisation of individual fragments of the target polypeptide is reduced or eliminated.

[0120] In some embodiments, the cleavage of the target polypeptide in the production of the peptide linker construct generates a plurality of contiguous peptide fragments each comprising a first end comprising the first amino acid, and a second end comprising the second amino acid. In the peptide linker construct, the plurality of contiguous peptide fragments are each separated by a linker as described herein. However, the order of the peptide fragments maintains the amino acid sequence order of the target polypeptide.

[0121] Therefore, in some embodiments of the disclosed methods, the peptide-linker construct comprises a concatemer of contiguous peptide fragments, wherein proximate peptide fragments are linked together, and wherein the sequence order of the amino acids in the concatemer is the same as the sequence order of the amino acids in the target polypeptide.

[0122] The peptide linker construct can be described as a concatemer, i.e. a concatemer of n contiguous peptide fragments each separated by a linker. In such embodiments, n is a positive integer.
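The order-preserving nature of the expansion can be illustrated with a simple bookkeeping sketch in Python. It assumes cleavage on the C-terminal side of each lysine (the specificity of LysC) and represents each linker by a placeholder token; the token, the function names and the use of the 20-residue test peptide without its terminal modifications are assumptions made for illustration. The sketch models only the sequence order and not the chemistry, in which the linker is attached to both amino acids of a pair so that the fragments are never released from one another.

```python
from typing import List

LINKER = "~Link~"  # hypothetical placeholder token standing in for a linker


def cleave_after(sequence: str, residue: str = "K") -> List[str]:
    """Split a sequence on the C-terminal side of each internal `residue`."""
    fragments, start = [], 0
    for i, amino_acid in enumerate(sequence):
        # A residue at the very C-terminus has no following bond to cleave.
        if amino_acid == residue and i < len(sequence) - 1:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


def expand(sequence: str, residue: str = "K") -> str:
    """Join the contiguous fragments, in their original order, via linkers."""
    return LINKER.join(cleave_after(sequence, residue))


def strip_linkers(construct: str) -> str:
    """Remove the linker tokens to recover the underlying amino acid order."""
    return construct.replace(LINKER, "")


target = "EEALYAKAGNNYGKLAQYVA"
fragments = cleave_after(target)
construct = expand(target)
```

In this sketch the construct is a concatemer of n = 3 contiguous fragments, and removing the linker tokens returns the original sequence unchanged, so no reassembly of fragment order is needed.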

[0123] As explained below in more detail, in some embodiments the target amino acid pair comprises a first amino acid N-terminal to a second amino acid. In some such embodiments, expanding the target polypeptide generates a concatemer comprising n contiguous peptide fragments, and wherein the concatemer comprises a structure:

[0124] $\mathrm{N}\text{-}\ldots[\mathrm{PEP}_{x}]^{1A}\text{-}{}^{1E}\mathrm{Link}^{2E}\text{-}{}^{2A}[\mathrm{PEP}_{x+1}]^{1A}\text{-}{}^{1E}\mathrm{Link}^{2E}\text{-}\ldots\text{-}{}^{2A}[\mathrm{PEP}_{n}]\ldots\text{-}\mathrm{C}$

wherein $[\mathrm{PEP}_{x}]$, $[\mathrm{PEP}_{x+1}]$, ..., $[\mathrm{PEP}_{n}]$ represent the n contiguous peptide fragments; $^{1A}$ represents the first amino acid in each peptide fragment; $^{2A}$ represents the second amino acid in each peptide fragment; each Link represents a linker; $^{1E}$ represents the first end of each linker; $^{2E}$ represents the second end of each linker; N represents the N-terminus of the concatemer; and C represents the C-terminus of the concatemer.

[0125] In some embodiments the target amino acid pair comprises a first amino acid C-terminal to a second amino acid. In some such embodiments, expanding the target polypeptide generates a concatemer comprising n contiguous peptide fragments, and wherein the concatemer comprises a structure: wherein $[\mathrm{PEP}_{x}]$, $[\mathrm{PEP}_{x+1}]$, ..., $[\mathrm{PEP}_{n}]$ represent the n contiguous peptide fragments; $^{1A}$ represents the first amino acid in each peptide fragment; $^{2A}$ represents the second amino acid in each peptide fragment; each Link represents a linker; $^{1E}$ represents the first end of each linker; $^{2E}$ represents the second end of each linker; N represents the N-terminus of the concatemer; and C represents the C-terminus of the concatemer.

[0126] These features are described in more detail herein.

[0127] Some of the disclosed methods involve attaching a sequencing adapter to the peptide linker construct that is formed via expansion of the target polypeptide. Suitable sequencing adapters are described in more detail herein.

[0128] Some of the disclosed methods involve loading a motor protein onto the peptide linker construct that is formed via expansion of the target polypeptide, or onto a sequencing adapter attached thereto. Suitable motor proteins are described in more detail herein.

[0129] The peptide linker construct has a variety of uses. In some embodiments the linker may provide a binding site for a motor protein. The motor protein may therefore be used to control the movement of the peptide linker construct, for example with respect to a nanopore. If measurements characteristic of the peptide linker construct are taken as the peptide linker construct moves with respect to the nanopore, the peptide linker construct can be characterised; and in doing so the target polypeptide may be characterised. However, the methods are not limited to generation of peptide linker constructs for characterisation (whether or not using a nanopore).

[0130] When the peptide linker construct is to be characterised, it can be characterised by any suitable method. Most generally, it can be characterised by contacting it with a suitable detector. As further described herein, a nanopore is provided as an exemplary detector which can be used in the disclosed methods. Thus, whilst embodiments described herein refer to characterisation of an oligopeptide-adapter adduct using a nanopore, the methods provided herein are also amenable to other detectors including (i) a zero-mode waveguide, (ii) a field-effect transistor, optionally a nanowire field-effect transistor; (iii) an AFM tip; (iv) a nanotube, optionally a carbon nanotube and (v) a nanopore. The disclosed methods are particularly amenable to methods in which a peptide linker construct is moved through a detector or through a structure containing a detector, e.g. a well in a detector chip.

[0131] Some disclosed methods involve taking one or more measurements characteristic of the polypeptide portion of the peptide linker construct as the construct moves with respect to a nanopore. Some suitable measurements are described in more detail herein.

[0132] Thus, the disclosed methods have several advantages. They allow for efficient generation of peptide linker constructs which may be used in characterisation of a polypeptide. They allow for the controlled insertion of linkers into a polypeptide, which may have a variety of uses. They may allow simplified sample preparation and can reduce the data processing needed in known methods.

[0133] Accordingly, the disclosed methods offer a beneficial method for generating peptide linker constructs, which are suitable for characterisation e.g. using a nanopore.

[0134] Further details of the disclosed methods are described in more detail herein.

[0135] Target Polypeptide

[0136] The disclosed methods comprise expanding a target polypeptide with a linker as described in more detail herein. Any suitable target polypeptide can be addressed in such methods.

[0137] As used herein, the term polypeptide refers to a peptide, polypeptide or protein which may, for example, be intended for characterisation in accordance with the present methods. The term polypeptide and peptide, polypeptide or protein can be used interchangeably unless implied otherwise by the context, but typically a peptide is a portion of a target polypeptide which may be derived by cleavage of a polypeptide as described in more detail herein.

[0138] In some embodiments the target polypeptide is an unmodified protein or a portion thereof, or a naturally occurring polypeptide or a portion thereof.

[0139] In some embodiments the target polypeptide is a modified protein or a portion thereof.

[0140] In some embodiments the target polypeptide is a denatured protein or a portion thereof. In some embodiments a protein may be denatured by contacting it with one or more denaturing conditions. Suitable denaturing conditions include chaotropic agents such as urea (often used at a concentration of about 5 to about 10 M, e.g. about 6-8 M), guanidinium chloride (often used at a concentration of about 5 to about 10 M, e.g. about 5-7 M) and lithium perchlorate (often used at a concentration of about 3 to about 6 M, e.g. about 4.5 M); and detergents such as sodium dodecyl sulfate and the like.

[0141] In some embodiments the target polypeptide is secreted from cells. Alternatively, the polypeptide can be produced inside cells such that it must be extracted from cells for use in the disclosed methods. The target polypeptide may comprise the products of cellular expression of a plasmid, e.g. a plasmid used in cloning of proteins in accordance with the methods described in Sambrook et al., Molecular Cloning: A Laboratory Manual, 4th ed., Cold Spring Harbor Press, Plainview, New York (2012); and Ausubel et al., Current Protocols in Molecular Biology (Supplement 114), John Wiley & Sons, New York (2016).

[0142] In some embodiments, for example, the expression is expression in a bacterial cell, a yeast cell, an insect cell or a mammalian cell; or may be a cell free expression method such as a translation system selected from rabbit reticulocyte lysate, wheat germ extract, and E. coli cell-free systems (available commercially, such as from the PURExpress® systems available from New England Biolabs (Ipswich, MA, USA)). In some embodiments the expression is from the genomic DNA of an organism.

[0143] The target polypeptide may be obtained from or extracted from any organism or microorganism. The target polypeptide may be obtained from a human or animal, e.g. from urine, lymph, saliva, mucus, seminal fluid or amniotic fluid, or from whole blood, plasma or serum. The target polypeptide may be obtained from a plant e.g. a cereal, legume, fruit or vegetable. The target polypeptide may be obtained from bacteria, protozoa, algae or fungi.

[0144] The target polypeptide can be provided as an impure mixture of one or more polypeptides and one or more impurities. Impurities may comprise truncated forms of the polypeptide which are distinct from the intended polypeptide for use in the disclosed methods. For example, the target polypeptide may be a full length protein and impurities may comprise fractions of the protein. Impurities may also comprise proteins other than the polypeptide e.g. which may be co-purified from a cell culture or obtained from a sample.

[0145] A target polypeptide may comprise any combination of any amino acids, amino acid analogs and modified amino acids (i.e. amino acid derivatives). Amino acids (and derivatives, analogs etc) in the polypeptide can be distinguished by their physical size and charge.

[0146] The amino acids / derivatives / analogs can be naturally occurring or artificial. In some embodiments the target polypeptide may comprise any naturally occurring amino acid. Twenty amino acids are encoded by the universal genetic code. These are alanine (A), arginine (R), asparagine (N), aspartic acid (D), cysteine (C), glutamic acid / glutamate (E), glutamine (Q), glycine (G), histidine (H), isoleucine (I), leucine (L), lysine (K), methionine (M), phenylalanine (F), proline (P), serine (S), threonine (T), tryptophan (W), tyrosine (Y) and valine (V). Other naturally occurring amino acids include selenocysteine and pyrrolysine.

[0147] In some embodiments the target polypeptide is modified. In some embodiments the target polypeptide is modified for detection using the disclosed methods. In some embodiments the disclosed methods are for characterising modifications in the target polypeptide.

[0148] In some embodiments one or more of the amino acids / derivatives / analogs in the target polypeptide is modified. In some embodiments one or more of the amino acids / derivatives / analogs in the target polypeptide is post-translationally modified. As such, the methods disclosed herein can be used to detect the presence, absence, number or positions of post-translational modifications in a target polypeptide. The disclosed methods can be used to characterise the extent to which a target polypeptide has been post-translationally modified.

[0149] Any one or more post-translational modifications may be present in the target polypeptide. Typical post-translational modifications include modification with a hydrophobic group, modification with a cofactor, addition of a chemical group, glycation (the non-enzymatic attachment of a sugar), biotinylation and pegylation. Post-translational modifications can also be non-natural, such that they are chemical modifications done in the laboratory for biotechnological or biomedical purposes. This can allow monitoring the levels of the laboratory made polypeptide in contrast to the natural counterparts.

[0150] Examples of post-translational modification with a hydrophobic group include myristoylation, attachment of myristate, a C14 saturated acid; palmitoylation, attachment of palmitate, a C16 saturated acid; isoprenylation or prenylation, the attachment of an isoprenoid group; farnesylation, the attachment of a farnesol group; geranylgeranylation, the attachment of a geranylgeraniol group; and glypiation, glycosylphosphatidylinositol (GPI) anchor formation via an amide bond.

[0151] Examples of post-translational modification with a cofactor include lipoylation, attachment of a lipoate (C8) functional group; flavination, attachment of a flavin moiety (e.g. flavin mononucleotide (FMN) or flavin adenine dinucleotide (FAD)); attachment of heme C, for instance via a thioether bond with cysteine; phosphopantetheinylation, the attachment of a 4'-phosphopantetheinyl group; and retinylidene Schiff base formation.

[0152] Examples of post-translational modification by addition of a chemical group include acylation, e.g. O-acylation (esters), N-acylation (amides) or S-acylation (thioesters); acetylation, the attachment of an acetyl group for instance to the N-terminus or to lysine; formylation; alkylation, the addition of an alkyl group, such as methyl or ethyl; methylation, the addition of a methyl group for instance to lysine or arginine; amidation; butyrylation; gamma-carboxylation; glycosylation, the enzymatic attachment of a glycosyl group for instance to arginine, asparagine, cysteine, hydroxylysine, serine, threonine, tyrosine or tryptophan; polysialylation, the attachment of polysialic acid; malonylation; hydroxylation; iodination; bromination; citrullination; nucleotide addition, the attachment of any nucleotide such as any of those discussed above, ADP ribosylation; oxidation; phosphorylation, the attachment of a phosphate group for instance to serine, threonine or tyrosine (O-linked) or histidine (N-linked); adenylylation, the attachment of an adenylyl moiety for instance to tyrosine (O-linked) or to histidine or lysine (N-linked); propionylation; pyroglutamate formation; S-glutathionylation; sumoylation; S-nitrosylation; succinylation, the attachment of a succinyl group for instance to lysine; selenoylation, the incorporation of selenium; and ubiquitinylation, the addition of ubiquitin subunits (N-linked).

[0153] It is within the scope of the methods provided herein that the target polypeptide is labelled with a molecular label. A molecular label may be a modification to the target polypeptide which promotes the detection of the polypeptide in the methods provided herein. For example the label may be a modification to the polypeptide which alters the signal obtained as a conjugate comprising the polypeptide (e.g. an oligopeptide-adapter adduct) is characterised. For example, the label may interfere with a flux of ions through the nanopore. In such a manner, the label may improve the sensitivity of the methods.

[0154] In some embodiments the target polypeptide contains one or more cross-linked sections, e.g. C-C bridges. In some embodiments the target polypeptide is not cross-linked prior to the disclosed methods.

[0155] In some embodiments the target polypeptide comprises sulphide-containing amino acids and thus has the potential to form disulphide bonds. Typically, in such embodiments, the target polypeptide is reduced using a reagent such as DTT (Dithiothreitol) or TCEP (tris(2-carboxyethyl)phosphine) prior to being characterised using the disclosed methods. In some embodiments the target polypeptide is a full length protein or naturally occurring polypeptide.

[0156] The target polypeptide can be a polypeptide of any suitable length. In some embodiments the target polypeptide has a length of from about 10 to about 40,000 peptide units. In some embodiments the target polypeptide has a length of from about 50 to about 40,000 peptide units. In some embodiments the target polypeptide has a length of from about 50 to about 35,000 peptide units. In some embodiments the target polypeptide has a length of from about 50 to about 20,000 peptide units. In some embodiments the target polypeptide has a length of from about 50 to about 10,000 peptide units. In some embodiments the target polypeptide has a length of from about 100 to about 5,000 peptide units, for example from about 200 to about 1000 peptide units, e.g. from about 300 to about 500 peptide units, such as from about 200 to about 400 peptide units.

[0157] The target polypeptide is cleaved in the disclosed methods as described in more detail herein, to form a plurality of peptide fragments (which may also be referred to as oligopeptide fragments). In some embodiments the number of peptide fragments that arise from a given target polypeptide may be chosen or determined by the user according to the choice of first and second amino acids in the target amino acid pair. In some embodiments there is no need for the number of peptide fragments to be determined. In some embodiments at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 500 or at least 1000, at least 5000, at least 10000, or more peptide fragments may be formed from the target polypeptide in the disclosed methods.

[0158] As described herein, a peptide fragment in the disclosed methods may have a length of at least 1 peptide unit. A peptide fragment may have a length of from about 1 peptide unit to about 100, about 200, about 300 or about 400 peptide units. A peptide fragment in the disclosed methods typically has a length of from about 2 to about 50 peptide units, such as from about 5 to about 30 peptide units. Accordingly, the target polypeptide has a length greater than the length of the peptide fragments generated by cleaving the target polypeptide.
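The dependence of the number and length of peptide fragments on the choice of first amino acid can be illustrated with a minimal Python sketch. It is not part of the disclosed methods. It assumes, purely for illustration, that the first and second amino acids are adjacent, that the second amino acid is the residue immediately C-terminal to the first, and that cleavage occurs at every such position. The function name `fragment_lengths` and the example sequence are hypothetical.

```python
from typing import List


def fragment_lengths(sequence: str, first: str) -> List[int]:
    """Lengths of the fragments obtained by cleaving after each `first` residue.

    Assumption for illustration only: a cleavage site exists after every
    occurrence of `first` that has a C-terminal neighbour.
    """
    lengths: List[int] = []
    current = 0
    last_index = len(sequence) - 1
    for i, residue in enumerate(sequence):
        current += 1
        # A terminal residue has no C-terminal neighbour, so it forms no pair.
        if residue == first and i < last_index:
            lengths.append(current)
            current = 0
    if current:
        lengths.append(current)
    return lengths


# Hypothetical one-letter sequence, chosen only to show the effect.
seq = "GAKKSLKTEDK"
for candidate in ("K", "E", "W"):
    print(candidate, fragment_lengths(seq, candidate))
```

Under these assumptions, choosing a frequently occurring first amino acid yields more, shorter fragments, and a fragment can be as short as one peptide unit when two first amino acids are adjacent.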

[0159] Any number of polypeptides can be used in the disclosed methods. For instance, the method may comprise processing 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 50, 100, 200, 500, 1000, 2000 or more polypeptides or about 10, about 50, about 100, about 200, about 500, about 1000, or about 2000 polypeptides. If two or more polypeptides are used, they may be different polypeptides or two or more instances of the same polypeptide. In some embodiments the target polypeptide to be processed in the disclosed methods is present in a sample. In some embodiments the sample comprises a plurality of different polypeptides. In some embodiments the plurality may comprise at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 50, at least 100, at least 500, at least 1000, at least 5000, at least 10000, at least 100000, at least 1000000 or more polypeptides.

[0160] Target amino acid pairs

[0161] In the disclosed methods, a target polypeptide as described herein is expanded by attaching a linker between first and second amino acids in a target amino acid pair.

[0162] A target amino acid pair is a pair of two amino acids present in the target polypeptide. A target amino acid pair thus comprises a first amino acid attached to a second amino acid. In some embodiments the first amino acid is attached to the second amino acid via a peptide bond.

[0163] In some embodiments a target polypeptide comprises a plurality of target amino acid pairs. In some embodiments a target polypeptide comprises at least one, at least two, at least 3, at least 4, at least 5, at least 10, at least 20, at least 50, at least 100, at least 500, at least 1000, at least 5000, at least 10000, or more target amino acid pairs.

[0164] For avoidance of doubt, and as described in more detail herein, the one or more target amino acid pairs may be located at any position within the target polypeptide. In some embodiments a plurality of target amino acid pairs are distributed randomly or pseudo-randomly throughout the target polypeptide. Thus, in some embodiments the target polypeptide comprises

[0165] ...Xn-[A-B]-Xn-[A-B]-Xn... wherein X represents an amino acid, [A-B] represents a target amino acid pair as described herein, and n is a positive integer, wherein each X and n may differ, and wherein each [A-B] may be the same or different.

[0166] In some embodiments two or more target amino acid pairs may be adjacent to each other, such that the second amino acid in a first target amino acid pair is adjacent to the first amino acid in a second target amino acid pair in the sequence of the target polypeptide. Thus, in some embodiments the target polypeptide comprises

[0167] ...Xn-[A-B]-[A-B]-Xn... wherein X represents an amino acid, [A-B] represents a target amino acid pair as described herein, and n is a positive integer, wherein each X and n may differ, and wherein each [A-B] may be the same or different. In some embodiments target amino acid pairs may overlap, such that the second amino acid in a first target amino acid pair is the first amino acid in a second target amino acid pair. Thus, in some embodiments the target polypeptide comprises ...Xn-[A-B-C]-Xn... wherein X represents an amino acid, [A-B] represents a first target amino acid pair and [B-C] represents a second target amino acid pair as described herein, such that [A-B-C] represents overlapping first and second target amino acid pairs, and n is a positive integer, wherein each X and n may differ, and wherein each [A-B] and [B-C] may be the same or different.
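The separated, adjacent and overlapping arrangements of target amino acid pairs can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than a definitive implementation: the first amino acid is fixed, the second amino acid is taken to be the residue immediately C-terminal to it, and the function name `find_target_pairs` and the example sequence are hypothetical.

```python
from typing import List, Optional, Tuple


def find_target_pairs(
    sequence: str, first: str, second: Optional[str] = None
) -> List[Tuple[int, str]]:
    """Return (index, pair) for each adjacent pair beginning with `first`.

    `index` is the 0-based position of the first amino acid. If `second`
    is given, only pairs with that second amino acid are reported;
    otherwise any C-terminal neighbour is accepted.
    """
    pairs: List[Tuple[int, str]] = []
    for i in range(len(sequence) - 1):
        a, b = sequence[i], sequence[i + 1]
        if a == first and (second is None or b == second):
            pairs.append((i, a + b))
    return pairs


# Hypothetical sequence used only for illustration.
seq = "GAKKSLKTE"
print(find_target_pairs(seq, first="K"))
print(find_target_pairs(seq, first="K", second="S"))
```

In this example the pairs at positions 2 and 3 overlap, since the second amino acid of the first pair is the first amino acid of the next, which corresponds to the [A-B-C] arrangement. The pair at position 6 is separated from the others by intervening residues.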

[0168] In some embodiments the first amino acid and the second amino acid in each target amino acid pair are separated by at most 10, e.g. at most 5, e.g. at most 2, e.g. at most 1, e.g. zero amino acids.

[0169] In some embodiments the first amino acid and the second amino acid in each target amino acid pair are adjacent to one another in the target polypeptide. Thus, in some embodiments, in each target amino acid pair the first and second amino acids are adjacent.

[0170] In some embodiments the second amino acid is C-terminal to the first amino acid in the target amino acid pair. Thus, in some embodiments an amino acid pair consists of amino acids A and B wherein A is the first amino acid and B is the second amino acid in the target amino acid pair, and the target polypeptide comprises

[0171] N-...Xn-[A-B]-Xn...-C wherein X represents an amino acid, [A-B] represents a target amino acid pair, n is a positive integer, wherein each X and n may differ, and wherein N represents the N-terminal end of the target polypeptide and C represents the C-terminal end of the target polypeptide. In some embodiments the target polypeptide comprises a plurality of such target amino acid pairs. In some embodiments, therefore, the first amino acid is at the N-terminus of a first peptide fragment and the second amino acid is at the C-terminus of the adjacent peptide fragment in the target polypeptide.

[0172] In some embodiments the second amino acid is N-terminal to the first amino acid in the target amino acid pair. Thus, in some embodiments an amino acid pair consists of amino acids A and B wherein A is the first amino acid and B is the second amino acid in the target amino acid pair, and the target polypeptide comprises

[0173] N-...Xn-[B-A]-Xn...-C wherein X represents an amino acid, [B-A] represents a target amino acid pair, n is a positive integer, wherein each X and n may differ, and wherein N represents the N-terminal end of the target polypeptide and C represents the C-terminal end of the target polypeptide. In some embodiments the target polypeptide comprises a plurality of such target amino acid pairs. In some embodiments, therefore, the first amino acid is at the C-terminus of a first peptide fragment and the second amino acid is at the N-terminus of the adjacent peptide fragment in the target polypeptide.

[0174] In some embodiments each first amino acid in each target amino acid pair is the same. Thus, in some embodiments the target polypeptide may comprise a plurality of target amino acid pairs wherein each target amino acid pair comprises a moiety [A-B] or [B-A], wherein A is the first amino acid and B is the second amino acid, and wherein each A in each target amino acid pair is the same.

[0175] Typically there is no particular limitation on the identity of the second amino acid in the target amino acid pair. For example, in some embodiments the second amino acid may be identified as the amino acid immediately C-terminal to the first amino acid at each occurrence in the target polypeptide.

[0176] In some embodiments the second amino acid is defined such that the or each target amino acid pair comprises specific first and second amino acids. Thus in some embodiments the target polypeptide may comprise a plurality of target amino acid pairs wherein each target amino acid pair comprises a moiety [A-B] or [B-A], wherein A is the first amino acid and B is the second amino acid, and wherein each A in each target amino acid pair is the same, and wherein each B in each target amino acid pair is the same.

[0177] In some embodiments the target polypeptide comprises a plurality of different target amino acid pairs. Those skilled in the art will appreciate that in these embodiments each type of amino acid pair may be different. Thus, for example, in some embodiments the target polypeptide comprises one or more first target amino acid pairs each of which comprises a moiety [A-B] or [B-A], wherein A is the first amino acid and B is the second amino acid, and wherein each A in each target amino acid pair is the same, and wherein each B in each target amino acid pair is optionally the same as described above; and the target polypeptide also comprises one or more second target amino acid pairs each of which comprises a moiety [C-D] or [D-C], wherein C is the first amino acid and D is the second amino acid, and wherein each C in each target amino acid pair is the same, and wherein each D in each target amino acid pair is optionally the same as described above. Further types of target amino acid pair may also be present. Typically the first amino acid in each type of target amino acid pair is different. Thus, for example, in some embodiments the target polypeptide comprises one or more first target amino acid pairs each of which comprises a moiety [A-B] or [B-A], wherein A is the first amino acid and B is the second amino acid; and the target polypeptide also comprises one or more second target amino acid pairs each of which comprises a moiety [C-D] or [D-C], wherein C is the first amino acid and D is the second amino acid; and wherein each A in each target amino acid pair is the same, and each B in each target amino acid pair is optionally the same as described above; and wherein each C in each target amino acid pair is the same, and wherein each D in each target amino acid pair is optionally the same as described above; and wherein A is different to C, and optionally B may be the same or different to D.

[0178] Thus, in some embodiments the target polypeptide comprises a plurality of target amino acid pairs as described herein. Each target amino acid pair in the target polypeptide comprises a first amino acid attached to a second amino acid. In some embodiments the plurality of target amino acid pairs comprises at least two different first amino acids. In some embodiments the second amino acids in each of the one or more target amino acid pairs may be the same or different. Thus in some embodiments the plurality of target amino acid pairs comprises at least two different second amino acids. In some embodiments the second amino acids in the plurality of target amino acid pairs may be any amino acid present in the target polypeptide.

[0179] In some embodiments the first amino acid in each amino acid pair is as described herein. In some embodiments the first amino acid in each amino acid pair is selected from Lys, Arg, Glu, Asp, Cys, Ser, Thr, Tyr and His. In some embodiments the first amino acid in each amino acid pair is Lys. In some embodiments the first amino acid in each amino acid pair is selected from Lys, Arg, Glu, Asp, Cys, Ser, Thr, Tyr and His and the second amino acid in each amino acid pair is the amino acid N-terminal adjacent or C-terminal adjacent to the first amino acid in the target polypeptide. In some embodiments the first amino acid in each amino acid pair is selected from Lys, Arg, Glu, Asp, Cys, Ser, Thr, Tyr and His and the second amino acid in each amino acid pair is the amino acid C-terminal adjacent to the first amino acid in the target polypeptide. In some embodiments the first amino acid in each amino acid pair is Lys and the second amino acid in each amino acid pair is the amino acid N-terminal adjacent or C-terminal adjacent to the first amino acid in the target polypeptide. In some embodiments the first amino acid in each amino acid pair is Lys and the second amino acid in each amino acid pair is the amino acid C-terminal adjacent to the first amino acid in the target polypeptide. In some embodiments the first amino acid in each amino acid pair is selected from Lys, Arg, Glu, Asp, Cys, Ser, Thr, Tyr and His and the second amino acid in each amino acid pair is different from the first amino acid. In some embodiments the first amino acid in each amino acid pair is Lys and the second amino acid in each amino acid pair is different from the first amino acid.

[0180] Linker

[0181] The methods disclosed herein comprise expanding the target polypeptide to form a peptide-linker construct by attaching a linker between the first and second amino acids in each target amino acid pair, and cleaving the target polypeptide between the first and second amino acids in each target amino acid pair.

[0182] In some embodiments the target polypeptide comprises a plurality of target amino acid pairs as described herein. Thus, in some embodiments, expanding the target polypeptide to form a peptide-linker construct comprises attaching a first linker between the first and second amino acids in a first target amino acid pair and attaching a second linker between the first and second amino acids in a second target amino acid pair. Those skilled in the art will appreciate that if further types of target amino acid pairs are present in the target polypeptide then further linkers may be used accordingly.

[0183] Accordingly, in some embodiments of the disclosed methods, the target polypeptide may be considered as comprising a plurality of contiguous peptide fragments as described in more detail herein. In some embodiments, the first end of each polymer reacts with the first end of the proximate peptide fragment in the plurality of contiguous peptide fragments. In some embodiments the second end of each polymer reacts with the second end of the proximate peptide fragment in the plurality of contiguous peptide fragments.

[0184] In some embodiments the linker used to link the first and second amino acids in a first target amino acid pair may be the same or different to the linker used to link the first and second amino acids in a second target amino acid pair. In some embodiments, each type of target amino acid pair in the target polypeptide is linked with a different linker. In other words, in some embodiments expanding the target polypeptide to form a peptide-linker construct comprises attaching a first linker between the first and second amino acids in a first target amino acid pair and attaching a second linker between the first and second amino acids in a second target amino acid pair, and the first linker and the second linker are different. In some embodiments each type of target amino acid pair in the target polypeptide is linked with the same linker. In other words, in some embodiments expanding the target polypeptide to form a peptide-linker construct comprises attaching a first linker between the first and second amino acids in a first target amino acid pair and attaching a second linker between the first and second amino acids in a second target amino acid pair, and the first linker and the second linker are the same.
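The expansion of a target polypeptide with the same or different linkers can be modelled symbolically, as in the following Python sketch. It is a bookkeeping illustration only and does not represent any chemistry. It assumes that each type of target amino acid pair is identified by its first amino acid, that the second amino acid is the residue immediately C-terminal to the first, and that linkers are represented by placeholder tokens. The function name `expand_with_linkers`, the tokens `<L1>` and `<L2>`, and the example sequence are hypothetical.

```python
from typing import Dict, List


def expand_with_linkers(sequence: str, linkers: Dict[str, str]) -> List[str]:
    """Interleave linker tokens between peptide fragments, preserving order.

    `linkers` maps a first amino acid to the token of the linker attached
    between that residue and its C-terminal neighbour. Mapping every key to
    one token models the same-linker case; distinct tokens model the
    different-linker case.
    """
    construct: List[str] = []
    fragment = ""
    last_index = len(sequence) - 1
    for i, residue in enumerate(sequence):
        fragment += residue
        # No linker after the C-terminal residue: it has no second amino acid.
        if residue in linkers and i < last_index:
            construct.extend([fragment, linkers[residue]])
            fragment = ""
    if fragment:
        construct.append(fragment)
    return construct


# Hypothetical sequence with two pair types (first amino acids K and C).
seq = "GAKSCLKTE"
construct = expand_with_linkers(seq, {"K": "<L1>", "C": "<L2>"})
print(construct)

# Removing the linker tokens recovers the original residue order.
fragments = [t for t in construct if not t.startswith("<")]
print("".join(fragments) == seq)
```

Because the fragments remain in their original order within the construct, the sequence of the target polypeptide can be read back by skipping the linker tokens. With different linkers, each token also records which type of target amino acid pair was present at that position.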

[0185] In some embodiments each type of target amino acid pair in the target polypeptide is linked with a linker comprising a common feature (e.g. a specific polynucleotide and / or polypeptide sequence) but the linkers are not identical. For example, in some embodiments the linker used to link the first and second amino acids in a first target amino acid pair and the linker used to link the first and second amino acids in a second target amino acid pair each comprise a polynucleotide and / or a polypeptide, but the two linkers are not identical. In some embodiments the linker used to link the first and second amino acids in a first target amino acid pair and the linker used to link the first and second amino acids in a second target amino acid pair each comprise a different polynucleotide and / or a different polypeptide. In some embodiments the linker used to link the first and second amino acids in a first target amino acid pair and the linker used to link the first and second amino acids in a second target amino acid pair each comprise the same polynucleotide and / or polypeptide, but the linkers each further comprise one or more additional moieties such as one or more additional reactive groups which are different.

[0186] In the disclosed methods, any suitable linker can be used.

[0187] Generally speaking, in some embodiments the or each linker independently comprises a multi-functional molecule. As used herein, a multi-functional molecule is a molecule having at least two reactive functional groups. In some embodiments the multifunctional molecule has a first reactive group capable of attaching to the first amino acid in a target amino acid pair and a second reactive group capable of attaching to the second amino acid in the target amino acid pair. In some embodiments the first and second reactive functional groups are the same. In some embodiments the first and second reactive functional groups are different.

[0188] In some embodiments the multi-functional molecule comprises a linear or branched, unsubstituted or substituted alkylene, alkenylene, alkynylene, arylene, heteroarylene, carbocyclylene or heterocyclylene moiety. In some embodiments the multi-functional molecule comprises an unsubstituted or substituted alkylene, alkenylene, or alkynylene moiety. In some embodiments the multi-functional molecule comprises an unsubstituted or substituted alkylene or alkenylene moiety. In some embodiments the multi-functional molecule comprises an unsubstituted or substituted alkylene moiety. Typically, an alkylene group is a C1-10 alkylene group. Typically, an alkenylene group is a C2-10 alkenylene group. Typically, an alkynylene group is a C2-10 alkynylene group. Typically, an arylene group is a C6-12 arylene group. Typically, a heteroarylene group is a 5- to 12-membered heteroarylene group. Typically, a carbocyclylene group is a C5-12 carbocyclylene group. Typically, a heterocyclylene group is a 5- to 12-membered heterocyclylene group.

[0189] An alkylene, alkenylene, or alkynylene moiety may be uninterrupted or interrupted by or terminate in one or more atoms or groups selected from maleimide, O, N(R), S, C(O), C(O)NR, C(O)O, phosphate, thiophosphate, unsubstituted or substituted arylene, unsubstituted or substituted heteroarylene, unsubstituted or substituted carbocyclylene and unsubstituted or substituted heterocyclylene; wherein R is selected from H, unsubstituted or substituted alkyl, and unsubstituted or substituted aryl.

[0190] An exemplary multifunctional molecule which can be used as a linker in the disclosed methods comprises a linear or branched, unsubstituted or substituted alkylene, alkenylene, alkynylene, arylene, heteroarylene, carbocyclylene or heterocyclylene moiety comprising at least two reactive groups, denoted herein as first reactive groups and second reactive groups. The first and second reactive groups may be the same or different. In some embodiments the first and second reactive groups are each independently selected from maleimide groups, carboxyl groups, amine groups, hydroxy groups, azide groups and alkyne groups. In some embodiments the first and second reactive groups are each maleimide groups. In some embodiments the linker comprises a multifunctional molecule which comprises a linear or branched, unsubstituted or substituted alkylene, alkenylene, alkynylene, arylene, heteroarylene, carbocyclylene or heterocyclylene moiety comprising two reactive maleimide groups. An exemplary linker is a bis-maleimido-hydrocarbon such as a bis-maleimido-alkane, e.g. bis-maleimidoethane (BMOE).

[0191] In some embodiments a linker comprises a multifunctional molecule such as glutaraldehyde, disuccinimidyl suberate (DSS), bismaleimidoethane (BMOE), sulfo-SMCC (sulfosuccinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxylate), sulfo-EMCS (N-ε-maleimidocaproyl-oxysulfosuccinimide ester), EDC (1-ethyl-3-carbodiimide), DMP (Dess-Martin periodinane; 1,1,1-Tris(acetyloxy)-1,1-dihydro-1,2-benziodoxol-3-(1H)-one), bis(sulfosuccinimidyl) suberate (BS3), DTME (dithiobismaleimidoethane), SPDP (succinimidyl 3-(2-pyridyldithio)propionate), BMH (bismaleimidohexane), sulfo-GMBS (N-γ-maleimidobutyryl-oxysulfosuccinimide ester), MPBH (4-(4-N-maleimidophenyl)butyric acid hydrazide), NHS-PC (e.g., PC Biotin-PEG-NHS ester, e.g. CAS number 2353409-93-3), and NHS-PEG-maleimide (e.g. CAS number 1325208-25-0).

[0192] In some embodiments, a linker is or comprises a polymer. In some embodiments, a linker is or comprises a biopolymer. Suitable polymers include polynucleotides, polypeptides, polysaccharides, poly(ethylene glycols) and the like. In some embodiments the or each linker independently comprises a polynucleotide, a polypeptide and / or a polysaccharide. In some embodiments the or each linker independently comprises a polynucleotide, and / or a polypeptide. In some embodiments the or each linker independently comprises or consists of a polynucleotide. In some embodiments the or each linker independently comprises or consists of a polypeptide.

[0193] In some embodiments a linker comprises a charged polymer. In some embodiments a linker comprises a polynucleotide, or a charged polypeptide. In some embodiments a linker comprises a polynucleotide or a polypeptide comprising polyglutamic acid, polyaspartic acid, polyarginine, and / or polylysine.

[0194] In some embodiments a linker comprises a synthetic polymer. In some embodiments a linker comprises or consists of a polymer such as polyethylene glycol (PEG), poly(vinyl alcohol) (PVA), poly(ethylene oxide) (PEO), poly(acrylic acid) (PAA), poly(propylene glycol) (PPG), poly(caprolactone) (PCL), polydimethylsiloxane (PDMS), poly(methacrylate) and derivatives thereof, polyurethane, poly(2-oxazoline), poly(N-isopropylacrylamide) (PNIPAM), and / or polyethyleneimine (PEI). In some embodiments a linker comprises a dendrimer.

[0195] A linker for use in the disclosed methods can be chosen or selected according to the applications for which the peptide linker construct is intended.

[0196] For example, in some embodiments the peptide linker construct is to be characterised by taking one or more measurements characteristic of the construct as the construct moves with respect to a nanopore. In some embodiments the construct moves with respect to the nanopore under the control of a motor protein. In some embodiments the motor protein is a polynucleotide motor protein (also known as a polynucleotide-handling protein) as described herein. In such embodiments, the linker may comprise or consist of a polynucleotide. For example, this may be useful as a polynucleotide-containing linker may provide a loading site or binding site for the polynucleotide motor protein. In some embodiments the motor protein is a polypeptide motor protein. In some such embodiments, the linker may comprise or consist of a polypeptide. For example, this may be useful as a polypeptide-containing linker may provide a loading site or binding site for the polypeptide motor protein.

[0197] When the linker is a polymer, the length of the linker is not particularly limited and can be chosen or selected as required by the operator of the disclosed methods. The choice of linker is an operational parameter within the control of the skilled user.

[0198] In some embodiments, however, the linker is a polymer (e.g. a polynucleotide and / or a polypeptide) comprising from about 2 to about 1000 monomer units, such as from about 5 to about 500 monomer units, e.g. from about 10 to about 100 monomer units, e.g. from about 10 to about 50 monomer units, e.g. about 10, about 15, about 20, about 30, about 40 or about 50 monomer units. In some embodiments the linker comprises or consists of a polynucleotide comprising from about 2 to about 1000 nucleotides, such as from about 5 to about 500 nucleotides, e.g. from about 10 to about 100 nucleotides, e.g. from about 10 to about 50 nucleotides, e.g. about 10, about 15, about 20, about 30, about 40 or about 50 nucleotides. In some embodiments the linker comprises or consists of a polypeptide comprising from about 2 to about 1000 amino acids (optionally including amino acid analogs), such as from about 5 to about 500 amino acids, e.g. from about 10 to about 100 amino acids, e.g. from about 10 to about 50 amino acids, e.g. about 10, about 15, about 20, about 30, about 40 or about 50 amino acids.
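The nested length ranges recited above can be sketched as a simple lookup. This is an illustration only: the function name `narrowest_range` is hypothetical, and the "about X to about Y" ranges are treated as exact inclusive bounds, which is an assumption, since the text does not define "about" numerically.

```python
# Illustrative sketch only. The ranges are those recited in the text, ordered
# narrowest first. Treating "about" as an exact inclusive bound is an assumption.
LENGTH_RANGES = [
    (10, 50),
    (10, 100),
    (5, 500),
    (2, 1000),
]


def narrowest_range(monomer_units):
    """Return the narrowest recited range containing the length, or None.

    `monomer_units` may count nucleotides, amino acids, or other monomers.
    """
    for low, high in LENGTH_RANGES:
        if low <= monomer_units <= high:
            return (low, high)
    return None


if __name__ == "__main__":
    for n in (1, 3, 7, 20, 75, 1001):
        print(n, narrowest_range(n))
```

A length that falls outside every recited range returns `None`; this does not imply the length is excluded, since the preceding paragraph states that linker length is not particularly limited.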

[0199] In some embodiments the linker comprises a polymer comprising at least two reactive groups, denoted herein as first reactive groups and second reactive groups. The first and second reactive groups may be the same or different. In some embodiments the first and second reactive groups are each independently selected from maleimide groups, carboxyl groups, amine groups, hydroxy groups, azide groups and alkyne groups. In some embodiments the first and second reactive groups are each maleimide groups. In some embodiments the linker comprises a polymer (e.g. a polynucleotide or polypeptide) comprising first and second reactive groups selected from maleimide groups, carboxyl groups, amine groups, hydroxy groups, azide groups and alkyne groups. In some embodiments the linker comprises a polymer (e.g. a polynucleotide or polypeptide) comprising first and second reactive maleimide groups.

[0200] As discussed in more detail herein, the or each linker typically comprises a first end and a second end, and attaching a linker between the first and second amino acids in each target amino acid pair comprises attaching the first end of the linker to the first amino acid and attaching the second end of the linker to the second amino acid. The first and second ends of the linkers are described in more detail herein.

[0201] In some embodiments, a linker as described herein comprises a hairpin. As used herein, the term “hairpin” indicates that the first and second ends of the linker are oriented in the same direction or substantially the same direction with respect to the linker. Those skilled in the art will appreciate that as used herein the term “hairpin” does not require that the linker comprises a polymer, although polymers are provided as exemplary linkers. A multi-functional molecule as defined herein can in some embodiments be considered as comprising a hairpin.

[0202] Thus, in one embodiment, the first and second ends of the linker are oriented in the same direction or substantially the same direction with respect to the linker. For example, in one embodiment, the linker comprises a hairpin configuration such that the first end and the second end of the linker are oriented in the same general direction. In one embodiment, the hairpin may be conceptually considered as forming a U-shape or V-shape, with the first and second ends extending proximally toward the same direction relative to the central axis of the linker. In one embodiment, the linker is configured such that the first end and the second end are substantially parallel to one another, and the linker comprises a bend or loop forming the curvature of a hairpin. In some embodiments a linker comprising a hairpin is advantageous in the disclosed methods because reaction of a first end of the linker with a first amino acid in a target amino acid pair positions the second end of the linker proximate to the second amino acid in the target amino acid pair. This can favour preferential reaction of the second end of the linker with the second amino acid in the target amino acid pair.

[0203] In some embodiments the or each linker comprises a plurality of linking portions. In some embodiments the or each linker comprises from about 2 to about 10 linking portions, such as from about 2 to about 5 linking portions, e.g. about 2, 3 or 4 linking portions. In some embodiments a first linking portion comprises a first reactive group for reacting with the first amino acid of the or each target amino acid pair. In some embodiments a second linking portion comprises a second reactive group for reacting with the second amino acid of the or each target amino acid pair. In some embodiments the linker comprises a first such linking portion, a second such linking portion, and one or more further linking portions. For example, in some embodiments a linker may comprise a first linking portion comprising a first reactive group for reacting with the first amino acid of the or each target amino acid pair and a second linking portion comprising a second reactive group for reacting with the second amino acid of the or each target amino acid pair. In some embodiments a linker may comprise a first linking portion comprising a first reactive group for reacting with the first amino acid of the or each target amino acid pair, attached to a second linking portion comprising a second reactive group for reacting with the second amino acid of the or each target amino acid pair via one or more central linking portions. Any suitable attachment chemistry can be used to attach linking portions together. Some exemplary reaction chemistries are described herein, but the methods are not limited to such methods.

[0204] In some embodiments the method comprises attaching the linking portions together prior to attachment of the linker to the target amino acid pair (e.g. to either the first or second amino acids in the target amino acid pair). Thus, in some embodiments the methods comprise assembling the linker from a plurality of linking portions before attaching the assembled linker to the first and second amino acids in the target amino acid pair.

[0205] In some embodiments the method comprises attaching the linking portions together after the first end of the linker has been attached to the first amino acid in the target amino acid pair. In some embodiments the method comprises attaching the linking portions together prior to attaching the second end of the linker to the second amino acid. Thus, in some embodiments the methods comprise attaching a first portion of the linker to the first amino acid in the target amino acid pair, and then assembling the linker from a plurality of linking portions before attaching the assembled linker to the second amino acid in the target amino acid pair. Of course the reverse approach is also possible. Accordingly, in some embodiments the methods comprise attaching a portion of the linker to the second amino acid in the target amino acid pair, and then assembling the linker from a plurality of linking portions before attaching the assembled linker to the first amino acid in the target amino acid pair.

[0206] In some embodiments the method comprises attaching the linking portions together after attaching the second end of the linker to the second amino acid. Thus, in some embodiments the methods comprise attaching a first linking portion to the first amino acid in the target amino acid pair, attaching a second linking portion to the second amino acid in the target amino acid pair, and then assembling the linker by attaching the first linking portion to the second linking portion (optionally via one or more further linking portions).

[0207] In some embodiments the linker comprises one or more cleavable moieties. In some embodiments the linker comprises one or more photocleavable moieties. In some embodiments the linker comprises one or more enzyme-cleavable moieties, such as one or more protease recognition sites (e.g. when the linker comprises a polypeptide) and / or one or more restriction sites (e.g. when the linker comprises a polynucleotide). Protease recognition sequences and restriction enzyme binding sequences are well known in the art and are described in standard reference texts and in resources such as the Alphabetized List of Recognition Sequences accessible at https://www.neb.com/en-gb/tools-and-resources/selection-charts/alphabetized-list-of-recognition-specificities and the MEROPS database of proteolytic enzymes (Rawlings et al, Nucleic Acids Research, 46, D1, 2018, D624-632), each incorporated by reference.

[0208] Reaction of linker with target amino acid pair

[0209] As explained above, the disclosed methods comprise attaching a linker between the first and second amino acids in each target amino acid pair, thereby expanding the target polypeptide by forming a peptide linker construct.

[0210] In some embodiments, the or each linker comprises a first end and a second end. In such embodiments, attaching a linker between the first and second amino acids in a target amino acid pair may comprise attaching the first end of the linker to the first amino acid and attaching the second end of the linker to the second amino acid.

[0211] In some embodiments, the first end of the or each linker independently comprises a first reactive group for reacting with the or each first amino acid.

[0212] The first reactive group can be chosen or determined by the user of the disclosed methods to ensure selective reaction with the or each first amino acid.

[0213] For example, in some embodiments the or each first amino acid may comprise a reactive side chain. Those skilled in the art will appreciate that the “side chain” of an amino acid refers to the chemical moiety attached to the α-carbon of the amino acid. In some embodiments the disclosed methods comprise reacting the first end of the or each linker with the side chain of the or each first amino acid. In some embodiments the disclosed methods comprise reacting a first reactive group at the first end of the or each linker with the side chain of the or each first amino acid.

[0214] In some embodiments the or each first amino acid natively comprises a reactive side chain. In some embodiments the or each first amino acid is modified to comprise a reactive side chain. In some embodiments the methods comprise activating the side chain of the or each first amino acid for reaction with the first end of the linker. In some embodiments the methods comprise activating the first end of the linker for reaction with the first amino acid. In some embodiments the methods comprise activating a first reactive group at the first end of the linker for reaction with the first amino acid.

[0215] As explained above, in some embodiments the first end of the or each linker comprises a first reactive group for reacting with the or each first amino acid. In some embodiments the first reactive group comprises a nucleophilic group. In some embodiments the first reactive group comprises an electrophilic group. In some embodiments the second end of the or each linker comprises a second reactive group for reacting with the or each second amino acid. In some embodiments the second reactive group comprises a nucleophilic group. In some embodiments the second reactive group comprises an electrophilic group.

[0216] In some embodiments the linker comprises additional reactive groups, such as those described herein. Such additional reactive groups may be used to facilitate incorporation of additional functionality into the linker, e.g. to allow for attachment of further moieties.

[0217] The attachment chemistry between the linker and the first and / or second amino acids in the target amino acid pair is not particularly limited and any suitable chemistry can be used. Practitioners are directed to texts such as March's Advanced Organic Chemistry: Reactions, Mechanisms, and Structure (2019), ed. Smith, Wiley; and to G. Hermanson, Bioconjugate Techniques, 3rd Edition (2013), each of which are hereby incorporated by reference in their entirety. Practitioners are directed particularly to discussion in those texts of reactions of amines and guanidines, carboxylic acids, alcohols, and thiols, especially amines and guanidines.

[0218] Reactive groups and their corresponding targets include aryl azides which may react with amines, carbodiimides which may react with amines and carboxyl groups, hydrazides which may react with carbohydrates, hydroxymethyl phosphines which may react with amines, imidoesters which may react with amines, isocyanates which may react with hydroxyl groups, carbonyls which may react with hydrazines, maleimides which may react with sulfhydryl groups, NHS-esters which may react with amines, PFP-esters which may react with amines, psoralens which may react with thymine, pyridyl disulfides which may react with sulfhydryl groups, vinyl sulfones which may react with sulfhydryl, amine and hydroxyl groups, vinylsulfonamides, and the like. In some embodiments a reactive group (e.g. a first and / or second reactive group) is selected from NHS esters, maleimides, imido esters, aldehydes, hydrazides, epoxides, isocyanates, activated carboxylic acids, azides, thiols, iodoacetamides, bromoacetamides, diazirines, photoreactive benzophenones, alkyl halides, succinimides, pyridyldisulfides, amines, carbodiimides, and oxiranes.
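The reactive group and target pairings listed in this paragraph can be captured as a lookup table, which makes it easy to ask which groups address a given target. The sketch below is illustrative only: the names `REACTIVE_GROUP_TARGETS` and `groups_targeting` are hypothetical, the target labels are simplified to singular nouns, and the vinyl sulfone entry assumes the reading "sulfhydryl, amine and hydroxyl groups". Vinylsulfonamides are omitted because the text gives no target for them.

```python
# Illustrative sketch only: pairings transcribed from the paragraph above.
# Names and label spellings are assumptions made for this example.
REACTIVE_GROUP_TARGETS = {
    "aryl azide": {"amine"},
    "carbodiimide": {"amine", "carboxyl"},
    "hydrazide": {"carbohydrate"},
    "hydroxymethyl phosphine": {"amine"},
    "imidoester": {"amine"},
    "isocyanate": {"hydroxyl"},
    "carbonyl": {"hydrazine"},
    "maleimide": {"sulfhydryl"},
    "NHS-ester": {"amine"},
    "PFP-ester": {"amine"},
    "psoralen": {"thymine"},
    "pyridyl disulfide": {"sulfhydryl"},
    # Assumed reading: sulfhydryl, amine and hydroxyl groups.
    "vinyl sulfone": {"sulfhydryl", "amine", "hydroxyl"},
}


def groups_targeting(target):
    """Return the listed reactive groups that may react with `target`."""
    matches = [
        group
        for group, targets in REACTIVE_GROUP_TARGETS.items()
        if target in targets
    ]
    return sorted(matches, key=str.lower)


if __name__ == "__main__":
    print(groups_targeting("sulfhydryl"))
    print(groups_targeting("hydroxyl"))
```

An empty list means only that the paragraph names no group for that target, not that no suitable chemistry exists.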

[0219] Other suitable chemistry for attaching the first and / or second end of the linker to the first and / or second amino acids includes click chemistry. Many suitable click chemistry reagents are known in the art. Suitable examples of click chemistry include, but are not limited to, the following:

[0220] (a) copper(I)-catalyzed azide-alkyne cycloadditions (azide alkyne Huisgen cycloadditions);

[0221] (b) strain-promoted azide-alkyne cycloadditions; including alkene and azide [3+2] cycloadditions; alkene and tetrazine inverse-demand Diels-Alder reactions; and alkene and tetrazole photoclick reactions;

[0222] (c) copper-free variant of the 1,3 dipolar cycloaddition reaction, where an azide reacts with an alkyne under strain, for example in a cyclooctane ring such as in bicyclo[6.1.0]nonyne (BCN);

[0223] (d) the reaction of an oxygen nucleophile on one linker with an epoxide or aziridine reactive moiety on the other; and

[0224] (e) the Staudinger ligation, where the alkyne moiety can be replaced by an aryl phosphine, resulting in a specific reaction with the azide to give an amide bond. Any reactive group may be used in the reaction between the linker and the amino acids. Some suitable reactive groups include 1,4-Bis[3-(2-pyridyldithio)propionamido]butane; 1,11-bis-maleimidotriethyleneglycol; 3,3’-dithiodipropionic acid di(N-hydroxysuccinimide ester); ethylene glycol-bis(succinic acid N-hydroxysuccinimide ester); 4,4’-diisothiocyanatostilbene-2,2’-disulfonic acid disodium salt; Bis[2-(4-azidosalicylamido)ethyl] disulphide; 3-(2-pyridyldithio)propionic acid N-hydroxysuccinimide ester; 4-maleimidobutyric acid N-hydroxysuccinimide ester; Iodoacetic acid N-hydroxysuccinimide ester; S-acetylthioglycolic acid N-hydroxysuccinimide ester; azide-PEG-maleimide; and alkyne-PEG-maleimide. The reactive group may be any of those disclosed in WO 2010 / 086602, particularly in Table 3 of that application.

[0225] In one embodiment the method comprises reacting the linker with an amine group present in the first and / or second amino acids. For example, the method may comprise reacting the second end of the linker with the N-terminal amine group of the second amino acid. Examples of amine-reactive groups include: carboxylic acids and activated derivatives thereof (e.g. NHS-esters), which can form amide bonds with amine groups; thiols or activated derivatives thereof, which can form thioether bonds by reaction with amine groups; squaramates, which react with amines to form squaramides; amine coupling agents (e.g. N,N’-Disuccinimidyl carbonate, 1,1'-Carbonyldiimidazole) which react with amines to form a urea linkage; and aldehydes and ketones, which react with amines via reductive amination (e.g. via reduction using a reducing agent such as NaBH4 or NaBH3CN).

[0226] In one embodiment an amine group may be activated for reaction with the linker. In one embodiment an amine group may be activated by reaction with a maleimide-containing compound such as a maleimide-NHS-ester (e.g. 3-maleimido-propionic NHS ester), with the amine group reacting with the NHS-ester to form an amide, followed by reaction of the free maleimide group with the linker; for example with a thiol group of the linker, or with a diene such as a furan group.

[0227] In one embodiment an amine group may be activated for reaction with the linker using a reagent such as an imidothioester, for example Traut’s reagent. Traut’s reagent can be used to convert an amine group to a reactive thiol group as described in more detail herein.

[0228] In one embodiment an amino acid may be activated for reaction with a linker in a click chemistry reaction with the linker. In one embodiment a linker may be activated for reaction with the first or second amino acids in a click chemistry reaction. For example, in one embodiment the click chemistry reaction is a copper-catalysed azide / alkyne cycloaddition (CuAAC) reaction. In one embodiment an amine may be activated by reaction with an azide-containing compound, such as an azidoacetic acid NHS-ester, followed by reaction of the free azide group with the linker, for example with an alkyne group of the sequencing adapter. In one embodiment an amine may be activated by reaction with an alkyne-containing compound, followed by reaction of the alkyne group with the linker, for example with an azide group of the linker, such as an azidoacetic acid NHS-ester. In one embodiment the click chemistry reaction is a strain-promoted azide-alkyne reaction. In one embodiment an amine may be activated by reaction with an azide-containing compound, such as an azidoacetic acid NHS-ester, followed by reaction of the free azide group with the linker, for example with a cyclooctynyl group of the linker (e.g. a bicyclononyne, BCN, group). In one embodiment an amine may be activated by reaction with a cyclooctynyl-containing compound, such as a bicyclononyne (BCN) group, followed by reaction of the alkyne group with the linker, for example with an azide group of the linker, such as an azidoacetic acid NHS-ester. In one embodiment the click chemistry reaction is an inverse electron demand Diels-Alder reaction (iEDDA). In one embodiment an amine may be activated by reaction with a tetrazine-containing compound, such as a methyltetrazine-containing compound, followed by reaction of the free tetrazine group with the linker, for example with a trans-cyclooctene group of the linker. In one embodiment an amine may be activated by reaction with a trans-cyclooctene (TCO)-containing compound, followed by reaction of the TCO group with the linker, for example with a tetrazine group (e.g. a methyltetrazine group) of the linker.
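The three click chemistries described in this paragraph each pair one handle installed on the activated amine with a complementary handle on the linker, in either orientation. A minimal sketch of that pairing logic follows. The names `CLICK_REACTIONS` and `linker_handle` are hypothetical, and "SPAAC" is used here as shorthand for the strain-promoted azide-alkyne reaction; that abbreviation does not appear in the text.

```python
# Illustrative sketch only: complementary click-chemistry handles as described
# in the paragraph above. Keys and handle labels are assumptions for this example.
CLICK_REACTIONS = {
    "CuAAC": ("azide", "alkyne"),
    "SPAAC": ("azide", "cyclooctyne"),  # e.g. a bicyclononyne (BCN) group
    "iEDDA": ("tetrazine", "trans-cyclooctene"),
}


def linker_handle(reaction, amino_acid_handle):
    """Return the handle the linker needs, given the handle on the amino acid.

    Either partner may sit on the amino acid; the linker carries the other one.
    """
    first, second = CLICK_REACTIONS[reaction]
    if amino_acid_handle == first:
        return second
    if amino_acid_handle == second:
        return first
    raise ValueError(
        f"{amino_acid_handle!r} is not a partner in the {reaction} reaction"
    )


if __name__ == "__main__":
    print(linker_handle("CuAAC", "azide"))
    print(linker_handle("iEDDA", "trans-cyclooctene"))
```

The symmetric lookup reflects that the paragraph describes each reaction with the handles placed both ways round.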

[0229] In one embodiment the method comprises reacting the linker with a guanidine group present in the first and / or second amino acids (e.g. present in the side chain of the first and / or second amino acids). In some embodiments the first and / or second amino acids comprise Arg. Examples of guanidine-reactive groups include diketones such as a 1,2-diketone or 1,3-diketone; and NHS-esters, which can form amide bonds with guanidine groups. In one embodiment a guanidine group may be activated for reaction with the linker. In one embodiment a guanidine group may be activated by citrullination via arginine deaminase to form a carbamide group, followed by reaction of the carbamide group with the linker. In one embodiment a guanidine group may be activated using an aldehyde such as formaldehyde, followed by reaction of the carbamide group with the linker, for example with an amine group of the linker. In one embodiment a guanidine group may be activated by reaction with a glyoxal-containing compound. For example, a compound comprising a glyoxal group coupled to a click chemistry group such as an azide or alkyne group may be used, with the guanidine group reacting with the glyoxal group, followed by reaction of the free click chemistry group with the linker; for example with an azide or alkyne group of the linker. Analogous strategies can be used to those described above in the context of activating amine groups.

[0230] In one embodiment the method comprises reacting the linker with a carboxyl group present in the first and / or second amino acids (e.g. present in the side chain of the first and / or second amino acids). In some embodiments the first and / or second amino acids comprise Asp or Glu. Examples of carboxyl-reactive groups include amines, which can form amide bonds with carboxyl groups; and alcohols, which can form esters with carboxyl groups. In one embodiment a carboxyl group may be activated for reaction with the linker. In one embodiment a carboxyl group may be activated using a reagent such as a carbodiimide, followed by reaction with a nucleophilic group (e.g. an amine) on the linker. In one embodiment a compound comprising a nucleophilic group such as an amine group coupled to a click chemistry group such as an azide or alkyne group may be used, with the carboxyl group being activated by the carbodiimide, the amine group reacting with the activated carboxyl group; and followed by reaction of the free click chemistry group with the linker; for example with an azide or alkyne group of the linker. Analogous strategies can be used to those described above in the context of activating amine groups.

[0231] In one embodiment the method comprises reacting the linker with a hydroxyl group present in the first and / or second amino acids (e.g. present in the side chain of the first and / or second amino acids). In some embodiments the first and / or second amino acids comprise Ser or Thr. Examples of hydroxyl-reactive groups include carboxyl groups which can react with hydroxyl groups to form esters; isocyanates which react with hydroxyl groups to form urethanes; and vinyl sulfones. In one embodiment a hydroxyl group may be activated for reaction with the linker. In one embodiment a compound comprising a carboxyl, isocyanate or vinyl sulfone group coupled to a click chemistry group such as an azide or alkyne group may be used, with the hydroxyl group being activated, followed by reaction of the free click chemistry group with the linker; for example with an azide or alkyne group of the linker. Analogous strategies can be used to those described above in the context of activating amine groups.

[0232] In one embodiment the method comprises reacting the linker with a thiol group present in the first and / or second amino acids (e.g. present in the side chain of the first and / or second amino acids). In some embodiments the first and / or second amino acids comprise Cys. Examples of thiol-reactive groups include maleimides, haloacetamides, pyridyl disulfides and vinyl sulfones. In one embodiment a thiol group may be activated for reaction with the linker. In one embodiment a compound comprising a maleimide group, a haloacetamide group, a pyridyl disulfide group or a vinyl sulfone group coupled to a click chemistry group such as an azide or alkyne group may be used, with the thiol group being activated, followed by reaction of the free click chemistry group with the linker; for example with an azide or alkyne group of the linker. Analogous strategies can be used to those described above in the context of activating amine groups.

[0233] Accordingly, in some embodiments the first end of the linker comprises an amine-reactive group, a thiol-reactive group, a carbonyl-reactive group, a carboxyl-reactive group, a hydroxyl-reactive group, an imidazole-reactive group, or a click-chemistry reactive group. In some embodiments such groups are suitable for reaction with the first amino acid (e.g. with the side chain of the first amino acid) in the target amino acid pair.

[0234] In some embodiments the or each first amino acid (e.g. the side chain of the or each first amino acid) comprises an amine group, a thiol group, a carbonyl group, a carboxyl group, a hydroxyl group, an imidazole group, or a click-chemistry group. In some embodiments the or each first amino acid (e.g. the side chain of the or each first amino acid) is modified (e.g. is activated) to comprise an amine group, a thiol group, a carbonyl group, a carboxyl group, a hydroxyl group, an imidazole group, or a click-chemistry group.

[0235] In some embodiments the or each first amino acid (e.g. the side chain of the or each first amino acid) comprises an amine group, a thiol group, a carbonyl group, a carboxyl group, a hydroxyl group, an imidazole group, or a click-chemistry group; and the first end of the linker comprises a first reactive group capable of reacting therewith. In some embodiments the first amino acid comprises an amine group, a thiol group, a carbonyl group, a carboxyl group, a hydroxyl group, an imidazole group, or a click-chemistry group; and the first end of the linker comprises an amine-reactive group, a thiol-reactive group, a carbonyl-reactive group, a carboxyl-reactive group, a hydroxyl-reactive group, an imidazole-reactive group, or a click-chemistry reactive group.

[0236] Suitable first amino acids include Lys, Arg, Glu, Asp, Cys, Ser, Thr, Tyr and His. In some embodiments a first amino acid is Lys and the first end of a linker for reaction with the first amino acid comprises a first reactive group which is an amine-reactive group. In some embodiments a first amino acid is Arg and the first end of a linker for reaction with the first amino acid comprises a first reactive group which is an amine-reactive group (e.g. a guanidine-reactive group). In some embodiments a first amino acid is Glu and the first end of a linker for reaction with the first amino acid comprises a first reactive group which is a carboxyl-reactive group. In some embodiments a first amino acid is Asp and the first end of a linker for reaction with the first amino acid comprises a first reactive group which is a carboxyl-reactive group. In some embodiments a first amino acid is Cys and the first end of a linker for reaction with the first amino acid comprises a first reactive group which is a thiol-reactive group. In some embodiments a first amino acid is Ser and the first end of a linker for reaction with the first amino acid comprises a first reactive group which is a hydroxyl-reactive group. In some embodiments a first amino acid is Thr and the first end of a linker for reaction with the first amino acid comprises a first reactive group which is a hydroxyl-reactive group. In some embodiments a first amino acid is His and the first end of a linker for reaction with the first amino acid comprises a first reactive group which is an imidazole-reactive group. In some embodiments a first amino acid comprises a click-chemistry group and the first end of a linker for reaction with the first amino acid comprises a first reactive group which is a click-chemistry reactive group.
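The residue-by-residue pairings in this paragraph amount to a small lookup from the first amino acid to the class of first reactive group. A minimal sketch follows, with the hypothetical names `FIRST_REACTIVE_GROUP` and `first_reactive_group`. Tyr is listed as a suitable first amino acid, but the paragraph assigns it no reactive-group class, so it is deliberately left out of the table rather than guessed.

```python
# Illustrative sketch only: pairings transcribed from the paragraph above,
# keyed by three-letter residue code. Names are assumptions for this example.
FIRST_REACTIVE_GROUP = {
    "Lys": "amine-reactive",
    "Arg": "amine-reactive (e.g. guanidine-reactive)",
    "Glu": "carboxyl-reactive",
    "Asp": "carboxyl-reactive",
    "Cys": "thiol-reactive",
    "Ser": "hydroxyl-reactive",
    "Thr": "hydroxyl-reactive",
    "His": "imidazole-reactive",
}


def first_reactive_group(residue):
    """Return the reactive-group class paired with `residue`, or None.

    None means the paragraph states no pairing for that residue (e.g. Tyr).
    """
    return FIRST_REACTIVE_GROUP.get(residue)


if __name__ == "__main__":
    for residue in ("Lys", "Cys", "His", "Tyr"):
        print(residue, first_reactive_group(residue))
```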

[0237] In some embodiments the first end of a linker reacts with the C-terminus (carbonyl group) of the first amino acid. In some such embodiments the first end of the linker may comprise a reactive group which is a carbonyl-reactive group.

[0238] In some embodiments the second end of the or each linker reacts with the second amino acid in each target amino acid pair.

[0239] In some embodiments, the second end of the or each linker independently comprises a second reactive group for reacting with the or each second amino acid.

[0240] The second reactive group can be chosen or determined by the user of the disclosed methods to ensure selective reaction with the or each second amino acid.

[0241] In some embodiments, the second end of the or each linker reacts with the side chain of the or each second amino acid. In some embodiments, the second end of the or each linker reacts with the N-terminal amine group of the or each second amino acid. In some embodiments, the second end of the or each linker reacts with the C-terminal carbonyl group of the or each second amino acid.

[0242] In some embodiments, the or each second amino acid natively comprises a reactive group capable of reacting with the second end of the or each linker. In some embodiments the or each second amino acid is modified to comprise a reactive group. In some embodiments the methods comprise activating the N-terminal amine group of the or each second amino acid for reaction with the second end of the linker. In some embodiments the methods comprise activating the C-terminal carbonyl group of the or each second amino acid for reaction with the second end of the linker. In some embodiments the methods comprise activating the side chain of the or each second amino acid for reaction with the second end of the linker. In some embodiments the methods comprise activating the second end of the or each linker for reaction with the or each second amino acid.

[0243] Any suitable activating agent can be used. Suitable activating agents are discussed above in the context of the reaction of the first end of the linker with the first amino acid.

[0244] In some embodiments, the second end of the or each linker comprises a second reactive group for reacting with the or each second amino acid.

[0245] In some embodiments the second end of the linker comprises an amine-reactive group, a thiol-reactive group, a carbonyl-reactive group, a carboxyl-reactive group, a hydroxyl-reactive group, an imidazole-reactive group, or a click-chemistry reactive group. In some embodiments such groups are suitable for reaction with the second amino acid (e.g. with the side chain of the second amino acid) in the target amino acid pair. In some embodiments the second end of the linker comprises an amine-reactive group capable of reacting with the N-terminal amine group of the second amino acid in the target amino acid pair.

[0246] In some embodiments the or each second amino acid (e.g. the side chain of the or each second amino acid) comprises an amine group, a thiol group, a carbonyl group, a carboxyl group, a hydroxyl group, an imidazole group, or a click-chemistry group. In some embodiments the or each second amino acid (e.g. the side chain of the or each second amino acid) is modified (e.g. is activated) to comprise an amine group, a thiol group, a carbonyl group, a carboxyl group, a hydroxyl group, an imidazole group, or a click-chemistry group.

[0247] In some embodiments the or each second amino acid (e.g. the side chain of the or each second amino acid) comprises an amine group, a thiol group, a carbonyl group, a carboxyl group, a hydroxyl group, an imidazole group, or a click-chemistry group; and the second end of the linker comprises a second reactive group capable of reacting therewith. In some embodiments the second amino acid comprises an amine group, a thiol group, a carbonyl group, a carboxyl group, a hydroxyl group, an imidazole group, or a click-chemistry group; and the second end of the linker comprises an amine-reactive group, a thiol-reactive group, a carbonyl-reactive group, a carboxyl-reactive group, a hydroxyl-reactive group, an imidazole-reactive group, or a click-chemistry reactive group.

[0248] These groups may be used to react with any suitable second amino acids as described above. Thus, when the second end of the linker comprises an amine-reactive group, the linker may be used to attach to any second amino acid comprising a reactive amine group. The reactive amine group may be in some embodiments the N-terminal amine group of the second amino acid. The reactive amine group may be in some embodiments a reactive amine group comprised in the side chain of the second amino acid. For example, the second amino acid may be Lys or Arg.

[0249] When the second end of the linker comprises a thiol-reactive group, the linker may be used to attach to any second amino acid comprising a reactive thiol group. The reactive thiol group may be in some embodiments a reactive thiol group comprised in the side chain of the second amino acid. For example, the second amino acid may be Cys.

[0250] When the second end of the linker comprises a carbonyl-reactive group, the linker may be used to attach to any second amino acid comprising a reactive carbonyl group. The reactive carbonyl group may be in some embodiments the C-terminal carbonyl group of the second amino acid. The reactive carbonyl group may be in some embodiments a reactive carbonyl group comprised in the side chain of the second amino acid.

[0251] When the second end of the linker comprises a carboxyl-reactive group, the linker may be used to attach to any second amino acid comprising a reactive carboxyl group. The reactive carboxyl group may be in some embodiments a reactive carboxyl group comprised in the side chain of the second amino acid. For example, the second amino acid may be Glu or Asp.

[0252] When the second end of the linker comprises a hydroxyl-reactive group, the linker may be used to attach to any second amino acid comprising a reactive hydroxyl group. The reactive hydroxyl group may be in some embodiments a reactive hydroxyl group comprised in the side chain of the second amino acid. For example, the second amino acid may be Ser or Thr.

[0253] When the second end of the linker comprises an imidazole-reactive group, the linker may be used to attach to any second amino acid comprising a reactive imidazole group. The reactive imidazole group may be in some embodiments a reactive imidazole group comprised in the side chain of the second amino acid. For example, the second amino acid may be His.

[0254] When the second end of the linker comprises a click-chemistry reactive group, the linker may be used to attach to any second amino acid comprising a click-chemistry group. The click-chemistry group may be in some embodiments comprised in the side chain of the second amino acid. The click-chemistry group can be introduced into the side chain of the second amino acid by activating the side chain of the second amino acid with an appropriate reagent, e.g. a reagent comprising a first group capable of reacting with the side chain of the second amino acid and a click chemistry group.
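
By way of non-limiting illustration only, the example pairings given above between the reactive group at the second end of the linker and the side chain of the second amino acid can be summarised as a simple lookup. The following Python sketch is not part of the disclosed methods; the names EXAMPLE_RESIDUES and candidate_positions are hypothetical and are introduced here purely for illustration. Carbonyl-reactive and click-chemistry reactive groups are omitted because no example residues are named for them above, and reaction at an N-terminal amine group is not modelled.

```python
# Illustrative lookup only: example residues named in the text for each
# kind of reactive group at the second end of the linker.
# All names here are hypothetical and are not part of the disclosure.
EXAMPLE_RESIDUES = {
    "amine-reactive": ("K", "R"),     # Lys or Arg
    "thiol-reactive": ("C",),         # Cys
    "carboxyl-reactive": ("E", "D"),  # Glu or Asp
    "hydroxyl-reactive": ("S", "T"),  # Ser or Thr
    "imidazole-reactive": ("H",),     # His
}


def candidate_positions(sequence: str, linker_group: str) -> list[int]:
    """Return 0-based positions whose residue is an example target for linker_group."""
    residues = EXAMPLE_RESIDUES[linker_group]
    return [i for i, aa in enumerate(sequence) if aa in residues]
```

For instance, calling candidate_positions on a one-letter-code sequence with "thiol-reactive" lists the positions of the Cys residues in that sequence.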

[0255] In one exemplary embodiment, the first amino acid and / or the second amino acid each comprise an amine group; and the method comprises activating said amine group(s) by reaction with an activating agent. In some embodiments the activating agent is Traut’s Reagent (2-iminothiolane), or a salt thereof. In some embodiments the activating agent is 2-iminothiolane hydrochloride. In some embodiments the first end of the linker and / or the second end of the linker each comprise a thiol-reactive group. In some embodiments the first end of the linker and / or the second end of the linker each comprise a maleimide or haloacetamide group. In some embodiments the first end of the linker and / or the second end of the linker each comprise a maleimide group. In some embodiments, the first amino acid in the target amino acid pair is lysine and the second amino acid is the amino acid adjacent C-terminal to the lysine in the target polypeptide. In some embodiments the methods comprise reacting the Lys residue in each target amino acid pair with Traut’s Reagent (2-iminothiolane), or a salt thereof. In some embodiments reacting the Lys residue in each target amino acid pair with Traut’s Reagent (2-iminothiolane), or a salt thereof activates each Lys residue thereby forming a sulfhydryl group attached to the first amino acid. In some embodiments the sulfhydryl group is captured by reaction with a thiol-reactive group. In some embodiments the sulfhydryl group is captured by reaction with a maleimide group. In some embodiments the maleimide group is comprised in the linker. In some embodiments the maleimide group is comprised in the first reactive group at the first end of the linker. In some embodiments the linker comprises a further maleimide group at the second end of the linker. In some embodiments the methods comprise cleaving the target amino acid pair using a proteolytic enzyme as described herein. In some embodiments the proteolytic enzyme cleaves the or each target amino acid pair C-terminal to the Lys residue in each target amino acid pair. In some embodiments the cleaving of each target amino acid pair reveals the N-terminal amine group of the second amino acid in the target amino acid pair. In some embodiments the N-terminal amine group is activated, e.g. by reaction with Traut’s Reagent (2-iminothiolane), or a salt thereof. In some embodiments reacting the N-terminal amine group of the second amino acid with Traut’s Reagent (2-iminothiolane), or a salt thereof activates the amine group thereby forming a sulfhydryl group attached to the second amino acid. In some embodiments the sulfhydryl group is reacted with a thiol-reactive group comprised in the linker. In some embodiments the thiol-reactive group is a maleimide group comprised in the second reactive group at the second end of the linker. This embodiment of the disclosed methods may therefore be used to expand a target polypeptide C-terminal to each Lys amino acid in the target polypeptide, by using Traut’s reagent and a bismaleimide linker comprising a maleimide group at the first and second ends of the linker. The nature of the remainder of the linker between the reactive groups is not particularly limited and can be any suitable linking moiety, including a polymeric or molecular species as described in more detail herein.

[0256] In some embodiments the target polypeptide is modified to prevent cross-reaction of the linker with functional groups in the target polypeptide apart from the first and second amino acids in the target amino acid pair. For example, in some embodiments the N-terminal amine group of the target polypeptide is modified, e.g. by being capped. In some embodiments the N-terminal amine group of the target polypeptide is modified by being acetylated. In some embodiments the C-terminal carboxyl group of the target polypeptide is modified, e.g. by being capped. In some embodiments the C-terminal carboxyl group of the target polypeptide is modified by being amidated. Capping of the N- and / or C-terminals of the target polypeptide can prevent reaction of the linker with the terminals in embodiments wherein this is not required, e.g. in embodiments where it may be desirable for the N- and / or C-terminals of the target polypeptide to remain unmodified with the linker so that they can be modified with one or more sequencing adapters as described herein.

[0257] Cleavage of the target polypeptide

[0258] The disclosed methods comprise cleaving the target polypeptide between the first and second amino acids in the or each target amino acid pair. Any suitable cleavage conditions can be used.

[0259] In some embodiments the disclosed methods comprise (i) attaching a first end of the linker to the first amino acid in each target amino acid pair; (ii) cleaving the target polypeptide between the first and the second amino acids in each target amino acid pair; and (iii) attaching a second end of each linker to the second amino acid in each target amino acid pair. Thus, in some embodiments the method comprises attaching a first end of the linker to the first amino acid in each target amino acid pair; and subsequently cleaving the target polypeptide between the first and the second amino acids in each target amino acid pair; and subsequently attaching a second end of each linker to the second amino acid in each target amino acid pair.

[0260] Such methods are particularly suitable for embodiments in which cleaving the target polypeptide comprises contacting the target polypeptide with one or more proteolytic enzymes and / or with one or more chemical reagents. In some embodiments, contacting the target polypeptide with one or more proteolytic enzymes and / or with one or more chemical reagents generates a reactive group at the second amino acid for attachment of the second end of the linker.

[0261] For example, in some embodiments contacting the target polypeptide with one or more proteolytic enzymes and / or with one or more chemical reagents generates a reactive N-terminal amine group for reaction with the second end of the linker. Thus, in some embodiments the disclosed methods comprise reacting the second end of the or each linker with the N-terminal amine group of the or each second amino acid. In some embodiments the method comprises cleaving the target polypeptide with a reagent that cleaves C-terminal to the first amino acid, thereby liberating the N-terminal amine group of the second amino acid for reaction with the second end of the or each linker. This is shown schematically in Figure 1. In some embodiments of this method the first amino acid is Lys, and the method comprises activating the first amino acid for reaction with the linker using a reagent such as Traut’s reagent or a salt thereof. The first end of the linker may comprise a maleimide group for reacting with the activated first amino acid. The cleavage of the target polypeptide generates a reactive N-terminal amine group at the second amino acid which may also be activated using a reagent such as Traut’s reagent or a salt thereof. The second end of the linker may comprise a maleimide group for reacting with the activated second amino acid.

[0262] In some embodiments the method comprises attaching a first end of the linker to the first amino acid in each target amino acid pair; and subsequently cleaving the target polypeptide between the first and the second amino acids in each target amino acid pair as the second end of each linker is attached to the second amino acid in each target amino acid pair. In some embodiments the method comprises (i) attaching a first end of each linker to each first amino acid in each target amino acid pair; and (ii) attaching a second end of each linker to each second amino acid in each target amino acid pair, thereby cleaving the target polypeptide between the first and the second amino acids in each target amino acid pair. Thus, in some embodiments the cleavage of the target polypeptide occurs simultaneously or substantially simultaneously with the attachment of the second end of the linker to the second amino acid.

[0263] Such methods are particularly suitable for embodiments in which cleaving the target polypeptide comprises contacting the target polypeptide (e.g. the backbone of the target polypeptide) with the second end of the linker. For example, as described herein, the second end of the linker may comprise a group capable of reacting with a peptide bond between the first and second amino acids in the target amino acid pair thereby cleaving the peptide bond. Such a group may be a carbonyl-reactive group as described herein. In some embodiments the peptide bond between the first and second amino acids in the target amino acid pair is activated for cleavage by the second end of the linker.

[0264] Thus, in some embodiments the disclosed methods comprise reacting the second end of the or each linker with a peptide bond between the first and second amino acids (e.g. with the carbonyl group of a peptide bond between the first and second amino acids), thereby cleaving the peptide bond. This is shown schematically in Figure 2. In some embodiments, the disclosed methods comprise (i) linking the first and second amino acids in each target amino acid pair and (ii) cleaving the target polypeptide between the first and second amino acids in each target amino acid pair. In some embodiments the disclosed methods comprise (i) attaching a first end of the linker to the first amino acid in each target amino acid pair; (ii) attaching a second end of each linker to the second amino acid in each target amino acid pair; and (iii) cleaving the target polypeptide between the first and the second amino acids in each target amino acid pair. Thus, in some embodiments the method comprises linking the first and second amino acids in each target amino acid pair and subsequently cleaving the target polypeptide between the first and second amino acids in each target amino acid pair.

[0265] Such methods are particularly suitable for embodiments in which the second end of the linker is attached to a side chain (e.g. a reactive side chain) of the second amino acid. In some embodiments the side chain of the second amino acid is activated for attachment of the second end of the linker. Thus, in some embodiments the disclosed methods comprise reacting the second end of the or each linker with the side chain of the or each second amino acid. In some embodiments the disclosed methods comprise reacting the second end of the or each linker with the side chain of the or each second amino acid, and subsequently cleaving the target polypeptide between the first and second amino acids. This is shown schematically in Figure 3.

[0266] In some embodiments the method comprises (i) cleaving the target polypeptide between the first and second amino acids in each target amino acid pair and (ii) linking the first and second amino acids in each target amino acid pair. In some embodiments the disclosed methods comprise (i) cleaving the target polypeptide between the first and the second amino acids in each target amino acid pair; (ii) attaching a first end of the linker to the first amino acid in each target amino acid pair; and (iii) attaching a second end of each linker to the second amino acid in each target amino acid pair. Thus, in some embodiments the method comprises cleaving the target polypeptide between the first and second amino acids in each target amino acid pair and subsequently linking the first and second amino acids in each target amino acid pair.

[0267] In some embodiments cleaving the target polypeptide comprises contacting the target polypeptide with one or more proteolytic enzymes. In some embodiments cleaving the target polypeptide comprises contacting the target polypeptide with one or more chemical reagents (i.e. with one or more chemical reagents capable of cleaving the target polypeptide). The term contacting is used herein in its broadest sense to refer to subjecting the polypeptide to the proteolytic enzyme or chemical reagent under conditions such that the enzyme or reagent is capable of exerting its polypeptide-cleavage activity. Thus, in embodiments in which the polypeptide is contacted with one or more proteolytic enzymes (also known as proteases), the contacting takes place under conditions in which the or each enzyme is catalytically competent to act on the polypeptide to enzymatically cleave the polypeptide at a suitable cleavage site, also known as a recognition site.

[0268] In the disclosed methods, cleavage of the target polypeptide results in peptide fragments that are connected together by linkers as described herein, comprised in a peptide linker construct as described herein. The sequence order of the target polypeptide is retained in the peptide linker construct as described herein.
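
By way of non-limiting illustration only, the retention of sequence order can be sketched in silico. The following Python sketch assumes a one-letter-code sequence and a cleavage C-terminal to a chosen first amino acid (here lysine); the placeholder string LINKER and the function names are hypothetical and are not part of the disclosure. The sketch is a simplification: it represents each linker as a token placed between consecutive fragments and does not model the chemistry of attachment (e.g. to a side chain or to an N-terminal amine group).

```python
# Hypothetical placeholder for a linker joining two consecutive fragments.
LINKER = "-[linker]-"


def cleave_c_terminal_to(sequence: str, first_aa: str = "K") -> list[str]:
    """Split the sequence after every internal occurrence of first_aa."""
    fragments = []
    start = 0
    for i, aa in enumerate(sequence):
        # A first_aa at the very end has no second amino acid, so no cut there.
        if aa == first_aa and i < len(sequence) - 1:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


def build_construct(sequence: str, first_aa: str = "K") -> str:
    """Join the fragments, in their original order, with the linker placeholder."""
    return LINKER.join(cleave_c_terminal_to(sequence, first_aa))


def recover_sequence(construct: str) -> str:
    """Remove the linker placeholders to read back the fragment order."""
    return construct.replace(LINKER, "")
```

In this sketch, removing the linker placeholders from the output of build_construct returns the input sequence, which reflects the point that the fragments remain in their original order in the construct.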

[0269] In brief, however, and by way of non-limiting illustration, the length of the peptide fragments in the peptide linker construct is typically a function of the frequency of the target amino acid pairs in the target polypeptide. Accordingly, in some embodiments the target amino acid pair can be chosen or selected to result in a peptide linker construct which comprises peptide fragments of a desired length. For example, it has been observed that lysine residues typically occur roughly every 10-15 amino acids in a random or pseudo-random polypeptide sequence. Thus, if the first amino acid in the target amino acid pair is a lysine, then statistically the average length of the peptide fragments in the peptide linker construct will be about 10-15 amino acids. Alternatively, if peptide fragments in the peptide linker construct having an average length of about 10-15 amino acids are required for the desired utility of the construct, then the methods could be operated such that lysine is chosen as the first amino acid in the target amino acid pair. If shorter fragments are required then an amino acid with a higher frequency in the target polypeptide could be chosen as the first amino acid. If longer fragments are required then an amino acid with a lower frequency in the target polypeptide could be chosen as the first amino acid.
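
By way of non-limiting illustration only, the relationship between residue frequency and fragment length can be sketched as follows. The Python sketch below approximates the mean fragment length as the sequence length divided by the number of occurrences of the chosen first amino acid, and then selects the residue whose mean spacing is closest to a desired fragment length. The function names are hypothetical, and the approximation assumes cleavage at every occurrence of the chosen residue, which is a simplifying assumption rather than a feature of the disclosed methods.

```python
def mean_fragment_length(sequence: str, first_aa: str) -> float:
    """Approximate mean fragment length as len(sequence) / count(first_aa)."""
    count = sequence.count(first_aa)
    if count == 0:
        return float("inf")  # the residue is absent, so no cleavage occurs
    return len(sequence) / count


def choose_first_amino_acid(sequence: str, desired_length: float) -> str:
    """Pick the residue whose approximate mean spacing is closest to desired_length."""
    candidates = sorted(set(sequence))
    return min(
        candidates,
        key=lambda aa: abs(mean_fragment_length(sequence, aa) - desired_length),
    )
```

A residue that is more frequent in the sequence gives a smaller value from mean_fragment_length, consistent with choosing a more frequent first amino acid when shorter fragments are required.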

[0270] In some embodiments the or each peptide fragment in the peptide linker construct independently has a length (e.g. an average length, e.g. a mean length) of from about 2 to about 50 amino acids. In some embodiments the or each oligopeptide independently has a length of from about 5 to about 30 amino acids, such as from about 6 to about 28 amino acids, e.g. from about 8 to about 25 amino acids, e.g. from about 10 to about 20 amino acids, e.g. about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 amino acids. In some embodiments at least about 50%, e.g. at least about 60%, e.g. at least about 70%, e.g. at least about 80%, e.g. at least about 90%, e.g. at least about 95%, 96%, 97%, 98% or 99% of the peptides in the peptide linker construct have a length of from about 2 to about 50 amino acids, such as from about 5 to about 30 amino acids, such as from about 6 to about 28 amino acids, e.g. from about 8 to about 25 amino acids, e.g. from about 10 to about 20 amino acids, e.g. about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 amino acids.

[0271] In embodiments in which the polypeptide is contacted with one or more proteolytic enzymes, any suitable such enzymes can be used. Suitable proteinases are described in detail in Handbook of Proteolytic Enzymes, Rawlings and Salvesen (eds), Elsevier 2013, the entire contents of which are hereby incorporated by reference.

[0272] In some embodiments a proteolytic enzyme is an endo-protease. In some embodiments a proteolytic enzyme is an exo-proteinase.

[0273] As those skilled in the art will appreciate, choice of a suitable proteolytic enzyme is an operational parameter of the disclosed methods which can be made by the user of the method according to the polypeptide to be processed and the nature of the linker that is used to expand the target polypeptide in the disclosed methods.

[0274] As explained above, a proteolytic enzyme is typically selective for a cleavage site, also known as a recognition site. This is described in more detail herein. In brief, however, and by way of non-limiting illustration, the statistical distribution of a given recognition site in a polypeptide comprising a random or pseudo-random arrangement of amino acids (e.g. in a full length protein or other polypeptide as described herein) can be calculated based on the length and sequence of the cleavage site and the composition of the polypeptide.
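
By way of non-limiting illustration only, one simple way to perform such a calculation is sketched below in Python. The sketch assumes that residues occur independently with the frequencies observed in the polypeptide, so that the probability of a recognition site at a given position is the product of the frequencies of its residues. This independence model and the function name expected_site_count are assumptions introduced here for illustration and are not taken from the disclosure.

```python
from collections import Counter
from math import prod


def expected_site_count(sequence: str, site: str) -> float:
    """Expected number of occurrences of `site` under an independence model.

    Residue frequencies are estimated from `sequence` itself; each of the
    (n - k + 1) possible start positions is treated as equally likely to
    carry the site.
    """
    n, k = len(sequence), len(site)
    if k == 0 or k > n:
        return 0.0
    freq = Counter(sequence)
    p_site = prod(freq[aa] / n for aa in site)
    return (n - k + 1) * p_site
```

Under this model a longer recognition site, or one containing rarer residues, has a lower expected count and therefore yields longer fragments on average.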

[0275] The selective action of a proteolytic enzyme at a cleavage site typically results in peptide fragments which comprise a uniform terminal amino acid motif. The choice of proteolytic enzyme can therefore be made by the user of the disclosed methods in order to provide the desired terminal amino acid motif for reaction with the linker as described herein.

[0276] The proteolytic enzyme selectively cleaves the target polypeptide at a cleavage site in the target polypeptide. Typically in the disclosed methods the cleavage site corresponds to a bond between the first and second amino acids in the target amino acid pair. In some embodiments the proteinase selectively cleaves the polypeptide at at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% of the cleavage sites in the polypeptide. In some embodiments the proteinase selectively cleaves the polypeptide at substantially all or all of the cleavage sites in the polypeptide.

[0277] In some embodiments of the disclosed methods, the cleavage site comprises lysine or arginine. In some embodiments of the disclosed methods, the cleavage site comprises lysine. In some embodiments of the disclosed methods, the cleavage site comprises arginine. In some embodiments the cleavage site consists of lysine or arginine. In some embodiments the cleavage site consists of lysine. In some embodiments, the cleavage site consists of arginine.

[0278] In embodiments in which the polypeptide is contacted with one or more chemical reagents capable of cleaving the target polypeptide between the first and second amino acids, any suitable chemical reagent can be used.

[0279] As those skilled in the art will appreciate, choice of a suitable chemical reagent is an operational parameter of the disclosed methods which can be made by the user of the method according to the polypeptide to be processed and the nature of the linker that is used to expand the target polypeptide in the disclosed methods.

[0280] A chemical reagent may be chosen or selected to be selective for a cleavage site. A chemical reagent may be chosen or selected to be specific for a first amino acid. A chemical reagent may be chosen or selected to be specific for a target amino acid pair comprising specific first and second amino acids.

[0281] For example, in some embodiments the cleavage site comprises an internal carbonyl group in the backbone of the polypeptide.

[0282] In some embodiments the internal carbonyl group is targeted and / or activated using a metal complex. Without being bound by theory, metal complexes typically activate internal carbonyl groups within peptide backbones by enhancing the electrophilicity of the carbonyl carbon, making it more susceptible to nucleophilic attacks. Accordingly, when a nucleophile is positioned in close proximity, it can react with the activated carbonyl, leading to cleavage of the peptide bond. In some embodiments the nucleophile is comprised in the linker. In some embodiments the nucleophile is comprised at the second end of the linker. In some embodiments the second reactive group at the second end of the linker comprises or consists of the nucleophile. In some embodiments the first end of the linker may be attached to the first amino acid in the target peptide and a metal complex may be used to activate an internal carbonyl group of the peptide backbone for reaction with a nucleophile comprised at the second end of the linker.

[0283] In some embodiments the chemical reagent is a peptide that mimics the catalytic triad of an enzyme such as a proteolytic enzyme. Examples include peptides as described in Singh et al, Proc Natl Acad Sci 121 (31) e2321396121 (2024).

[0284] In some embodiments the chemical reagent is a bromoindole-aryl sulfide such as BNPS-Skatole (3-Bromo-3-methyl-2-(2-nitrophenylthio)-3H-indole) or a derivative thereof.

[0285] In some embodiments, cleaving the target polypeptide comprises contacting the target polypeptide with one or more of Arg-C, Asp-N, BNPS-Skatole (3-Bromo-3-methyl-2-(2-nitrophenylthio)-3H-indole), Bromelain, Caspase 1, Caspase 2, Caspase 3, Caspase 4, Caspase 5, Caspase 6, Caspase 7, Caspase 8, Caspase 9, Caspase 10, Chymotrypsin high specificity, Chymotrypsin low specificity, Clostripain, CNBr (cyanogen bromide), Enterokinase, Factor Xa, Ficin, Formic acid, Gingisrex, Glu-C, Glutamyl endopeptidase, Granzyme B, Hydroxylamine, Iodosobenzoic acid, Lys-C, Lys-N, Neutrophil elastase, NTCB (2-Nitro-5-thiocyanatobenzoic acid), Papain, Pepsin pH 1.3, Pepsin pH >=2, Proline-endopeptidase, Proteinase K, Staphylococcal peptidase I, Thermolysin, Thrombin (PeptideCutter), Thrombin SG, Tobacco etch virus protease, Trypsin, Asp-N Endopeptidase, ProAlanase, Elastase, LysArgiNase and subtilisin.

[0286] In some embodiments the reagent used to cleave the target polypeptide may be chosen according to the first and / or second amino acid in the target amino acid pair. In some embodiments the reagent used to cleave the target polypeptide may be chosen according to the first and / or second amino acid in the target amino acid pair as set out in the following table. Typically the reagent used to cleave the target polypeptide is chosen according to the first amino acid in the target amino acid pair. The reagents are typically available from commercial suppliers such as New England Biolabs (Ipswich, MA, USA), Sigma Aldrich (St. Louis, MO, United States) and Promega (Madison, WI, USA).

[0287] In some embodiments, cleaving the target polypeptide comprises contacting the target polypeptide with one or more of Arg-C, Asp-N, Caspase 1, Caspase 2, Caspase 3, Caspase 4, Caspase 5, Caspase 6, Caspase 7, Caspase 8, Caspase 9, Caspase 10, Clostripain, Enterokinase, Factor Xa, Formic acid, Gingisrex, Glu-C, Glutamyl endopeptidase, Granzyme B, Lys-C, Lys-N, NTCB (2-Nitro-5-thiocyanatobenzoic acid), Papain, Staphylococcal peptidase I, Thrombin (PeptideCutter), Thrombin SG, Trypsin, Asp-N Endopeptidase, and LysArgiNase.

[0288] In some embodiments, cleaving the target polypeptide comprises contacting the target polypeptide with one or more of Arg-C, Asp-N, Clostripain, Formic acid, Gingisrex, Glu-C, Glutamyl endopeptidase, Lys-C, Lys-N, NTCB (2-Nitro-5-thiocyanatobenzoic acid), Staphylococcal peptidase I, Thrombin (PeptideCutter), Trypsin, Asp-N Endopeptidase, and LysArgiNase.

[0289] In some embodiments, cleaving the target polypeptide comprises contacting the target polypeptide with one or more of LysC (cleavage site: K), LysN (cleavage site: K), ArgC (cleavage site: R), and ArgN (cleavage site: R), or a functional analog, fragment or variant thereof. In some embodiments cleaving the target polypeptide comprises contacting the target polypeptide with LysC or a functional analog, fragment or variant thereof.

[0290] LysC (lysyl endopeptidase; E.C. 3.4.21.50) is a lysine-specific proteinase which cleaves polypeptides C-terminal to lysine residues. LysC and equivalent enzymes may be isolated from Achromobacter lyticus, Lysobacter enzymogenes and Pseudomonas aeruginosa. LysC is a serine protease that hydrolyzes specifically at the carboxyl side of lysines. LysC typically retains proteolytic activity under strong protein denaturing conditions such as 8 M urea, which can be used to improve digestion of proteolytically resistant proteins. LysC typically has optimal activity in the range of pH 7.0 to 9.0. LysC is available from e.g. Promega and New England Biolabs.

Sequencing adapter

[0291] In some embodiments, the disclosed methods comprise attaching a sequencing adapter to the peptide linker construct formed by expanding the target polypeptide as described herein.

[0292] In some embodiments the attachment between the construct and a sequencing adapter is covalent. In some embodiments the attachment is non-covalent. In some embodiments the attachment comprises ligating the sequencing adapter to the construct.

[0293] As will be apparent from the discussion herein, the attachment between a sequencing adapter and the construct is not especially limited and any suitable attachment means can be used.

[0294] In some embodiments, a sequencing adapter may be attached at the N-terminal of the construct. In some embodiments a sequencing adapter may be attached at the C-terminal of the construct. In some embodiments, sequencing adapters may be attached at the N-terminal and the C-terminal of the construct.

[0295] Any suitable sequencing adapter may be used. For example, if the construct is for characterisation using a nanopore, the choice of the sequencing adapters may be dependent on the nanopore characterisation methods that are envisaged.

[0296] Typically, if sequencing adapters are attached at the N-terminal and the C-terminal of the construct, the sequencing adapter attached at the N-terminal is different to the sequencing adapter attached at the C-terminal of the construct.

[0297] In some embodiments a sequencing adapter comprises a polynucleotide and / or a polypeptide.

[0298] In one embodiment, the or each adapter is synthetic or artificial. Typically, the or each adapter comprises a polymer as described herein. In some embodiments, the or each adapter comprises a spacer as described herein. In some embodiments, the or each adapter comprises a polynucleotide. The or each polynucleotide adapter may comprise DNA, RNA, modified DNA (such as abasic DNA), PNA, LNA, BNA and / or PEG. Usually, the or each adapter comprises single stranded and / or double stranded DNA or RNA.

[0299] In some embodiments, an adapter is a linear adapter. A linear adapter may be bound to either or both ends of an oligopeptide.

[0300] A linear adapter may comprise a leader sequence as described herein. A linear adapter may comprise a portion for hybridisation with a tag (such as a pore tag) as described herein. A linear adapter may be 10 to 150 nucleotides in length, such as from 20 to 120, e.g. 30 to 100, for example 40 to 80 such as 50 to 70 nucleotides in length. A linear adapter may be single stranded. A linear adapter may be double stranded.

[0301] In some embodiments, an adapter may be a Y adapter. A Y adapter is typically a polynucleotide adapter. A Y adapter is typically double stranded and comprises (a) at one end, a region where the two strands are hybridised together and (b), at the other end, a region where the two strands are not complementary. The non-complementary parts of the strands typically form overhangs. The presence of a non-complementary region in the Y adapter gives the adapter its Y shape since the two strands typically do not hybridise to each other unlike the double stranded portion. The two single-stranded portions of the Y adapter may be the same length, or may be different lengths. For example, one single-stranded portion of the Y adapter may be 10 to 150 nucleotides in length, such as from 20 to 120, e.g. 30 to 100, for example 40 to 80 such as 50 to 70 nucleotides in length and the other single-stranded portion of the Y adapter may independently be 10 to 150 nucleotides in length, such as from 20 to 120, e.g. 30 to 100, for example 40 to 80 such as 50 to 70 nucleotides in length. The double-stranded “stem” portion of the Y adapter may be e.g. from 10 to 150 nucleotides in length, such as from 20 to 120, e.g. 30 to 100, for example 40 to 80 such as 50 to 70 nucleotides in length. A Y adapter may be attached to either or both ends of a barcode as described herein.

[0302] An adapter may be linked to the construct by any suitable means known in the art, according to the present disclosure. The adapter may be synthesized separately and chemically attached or enzymatically ligated to the strand as described herein.

[0303] An adapter suitable for use in the described methods may in some embodiments comprise a leader. A leader may be useful to assist the capture of the adapter and thus of the barcode by a nanopore as described herein.

[0304] In some embodiments the leader may be from about 10 to 150 nucleotides (e.g. DNA and / or RNA nucleotides) in length, such as from 20 to 120, e.g. 30 to 100, for example 40 to 80 such as 50 to 70 nucleotides in length, or from about 10 to about 60 nucleotides in length, e.g. from about 20 to about 50, such as from about 20 to about 40, e.g. about 30 nucleotides in length.

[0305] In some embodiments the leader is a charged polymer, e.g. a negatively charged polymer. In some embodiments the leader comprises a polymer such as PEG or a polysaccharide. In such embodiments the leader may be from 10 to 150 monomer units (e.g. ethylene glycol or saccharide units) in length, such as from 20 to 120, e.g. 30 to 100, for example 40 to 80 such as 50 to 70 monomer units (e.g. ethylene glycol or saccharide units) in length.

[0306] An adapter may comprise a polypeptide. An adapter may comprise a leader comprising a polypeptide. The polypeptide may in some embodiments have a net negative charge or it may have a specific recognition sequence, e.g. a sequence specific for a motor protein such as an unfoldase enzyme. In such embodiments the leader may be from 10 to 150 amino acids in length, such as from 20 to 120, e.g. 30 to 100, for example 40 to 80 such as 50 to 70 amino acids in length.

[0307] Motor proteins

[0308] In some embodiments the disclosed methods involve allowing the peptide linker construct or a peptide-containing portion thereof to move with respect to a nanopore, and taking measurements during said movement thereby characterising the construct. This is described in more detail herein.

[0309] The movement of the construct with respect to the nanopore may be driven by any suitable means. In some embodiments, the movement of the construct is driven by a physical or chemical force (potential). In some embodiments the physical force is provided by an electrical (e.g. voltage) potential or a temperature gradient, etc.

[0310] In some embodiments, the movement of the construct comprises mechanically manipulating the construct thereby moving said construct with respect to the nanopore. In some embodiments, movement of the construct by mechanical manipulation does not comprise using a polynucleotide-handling protein.

[0311] In some embodiments the construct is moved by mechanical manipulation in a direction opposite to a potential applied across said nanopore. In some embodiments, the potential is a voltage potential applied across said nanopore. In some embodiments, the construct is moved with respect to the nanopore as described in WO 2020 / 128517, the entire contents of which are hereby incorporated by reference, particularly in regards to discussion in that document of movements of polynucleotides with respect to nanoreactors.

[0312] In some embodiments, the construct moves with respect to the nanopore as an electrical potential is applied across the nanopore. In some embodiments the construct is charged (e.g. negatively charged), and so applying a voltage potential across a nanopore will cause the construct to move with respect to the nanopore under the influence of the applied voltage potential. For example, if a positive voltage potential is applied to the trans side of the nanopore relative to the cis side of the nanopore, then this will induce a negatively charged construct to move from the cis side of the nanopore to the trans side of the nanopore. Similarly, if a positive voltage potential is applied to the trans side of the nanopore relative to the cis side of the nanopore then this will impede the movement of a negatively charged construct from the trans side of the nanopore to the cis side of the nanopore. The opposite will occur if a negative voltage potential is applied to the trans side of the nanopore relative to the cis side of the nanopore. Apparatuses and methods of applying appropriate voltages are described in more detail herein.
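The polarity rules set out in this paragraph can be summarised as a small lookup. The following Python sketch is illustrative only: the function name, the argument names and the sign convention (trans-side potential measured relative to the cis side) are assumptions made for the example and are not part of the disclosure. It considers only the electrophoretic effect of the applied voltage on a charged construct and ignores other contributions such as electro-osmotic flow.

```python
def electrophoretic_direction(trans_voltage_mv: float, construct_charge: int) -> str:
    """Return the direction in which an applied voltage drives a charged construct.

    trans_voltage_mv: potential of the trans side relative to the cis side
                      (hypothetical convention chosen for this sketch).
    construct_charge: net charge of the construct; only its sign is used.
    """
    drive = trans_voltage_mv * construct_charge
    if drive == 0:
        # No applied voltage, or an uncharged construct.
        return "no net electrophoretic drive"
    # Opposite signs attract: a negative construct moves towards a positive
    # trans side, and a positive construct moves towards a negative trans side.
    return "cis-to-trans" if drive < 0 else "trans-to-cis"


if __name__ == "__main__":
    # Positive trans side, negatively charged construct.
    print(electrophoretic_direction(+180.0, -1))
    # Negative trans side, negatively charged construct (the opposite case).
    print(electrophoretic_direction(-180.0, -1))
```

The 180 mV magnitude in the usage lines is an arbitrary placeholder; only the sign affects the result.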

[0313] In some embodiments the chemical force is provided by a concentration (e.g. pH) gradient.

[0314] In some embodiments the movement of the construct with respect to the nanopore is controlled using a method as described in WO 2020 / 016573, the entire contents of which are incorporated herein by reference.

[0315] In some embodiments the movement of the construct is controlled using a method as disclosed in any of WO 2021 / 111125, WO 2021 / 133168, or PCT / GB2023 / 052838, the entire contents of which are incorporated herein by reference.

[0316] In some embodiments the movement of the construct with respect to the nanopore is controlled using a motor protein.

[0317] In some embodiments a motor protein is present (e.g. prior to the contact of the construct with the nanopore) on a sequencing adapter comprised in or attached to the construct, as described in more detail herein. In some embodiments a motor protein is present on a polypeptide portion of the construct. In some embodiments a motor protein is present on a linker portion of the construct.

[0318] In some embodiments a motor protein controls the movement of the construct in the same direction as the physical or chemical force (potential). For example, in some embodiments a positive voltage potential is applied to the trans side of the nanopore relative to the cis side of the nanopore, and a motor protein controls the movement of the construct from the cis side of the nanopore to the trans side of the nanopore. In some embodiments a positive voltage potential is applied to the cis side of the nanopore relative to the trans side of the nanopore, and a motor protein controls the movement of the construct from the trans side of the nanopore to the cis side of the nanopore.

[0319] In some embodiments a motor protein controls the movement of the construct in the opposite direction to the physical or chemical force (potential). For example, in some embodiments a positive voltage potential is applied to the trans side of the nanopore relative to the cis side of the nanopore, and the motor protein controls the movement of the construct from the trans side of the nanopore to the cis side of the nanopore. In some embodiments a positive voltage potential is applied to the cis side of the nanopore relative to the trans side of the nanopore, and the motor protein controls the movement of the construct from the cis side of the nanopore to the trans side of the nanopore.
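The four voltage and motor combinations described in the two preceding paragraphs can be tabulated programmatically. The Python sketch below is a hedged illustration: the function name and string labels are hypothetical, and it assumes a negatively charged construct, so that the applied potential drives the construct towards whichever side is held positive.

```python
def motor_versus_force(positive_side: str, motor_direction: str) -> str:
    """Classify motor-controlled movement relative to the applied potential.

    positive_side:   "cis" or "trans", the side held at positive potential
                     relative to the other side.
    motor_direction: "cis-to-trans" or "trans-to-cis", the direction in which
                     the motor protein moves the construct.
    Assumption (not stated in the source paragraphs): the construct is
    negatively charged, so the force points towards the positive side.
    """
    if positive_side not in ("cis", "trans"):
        raise ValueError("positive_side must be 'cis' or 'trans'")
    if motor_direction not in ("cis-to-trans", "trans-to-cis"):
        raise ValueError("motor_direction must be 'cis-to-trans' or 'trans-to-cis'")
    force_direction = "cis-to-trans" if positive_side == "trans" else "trans-to-cis"
    if motor_direction == force_direction:
        return "with the force"
    return "against the force"


if __name__ == "__main__":
    for side in ("trans", "cis"):
        for direction in ("cis-to-trans", "trans-to-cis"):
            print(side, direction, "->", motor_versus_force(side, direction))
```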

[0320] In some embodiments the movement of the construct is driven by the motor protein in the absence of an applied potential.

[0321] In embodiments of the disclosed methods which comprise the use of a motor protein, the motor protein is typically capable of controlling the movement of the construct with respect to a nanopore. In other words, the motor protein is capable of controlling the movement of the construct.

[0322] Suitable motor proteins are in some embodiments also known as polynucleotide-handling proteins or polynucleotide-handling enzymes, or polypeptide-handling proteins or polypeptide-handling enzymes. Suitable proteins are known in the art and some exemplary motor proteins are described in more detail below.

[0323] In one embodiment, a motor protein is or is derived from a polynucleotide handling enzyme. A polynucleotide handling enzyme is a polypeptide that is capable of interacting with and modifying at least one property of a polynucleotide. The enzyme may modify the polynucleotide by cleaving it to form individual nucleotides or shorter chains of nucleotides, such as di- or trinucleotides. The enzyme may modify the polynucleotide by orienting it or moving it to a specific position.

[0324] In some embodiments, a motor protein can be present on a construct or an adapter attached thereto prior to its contact with a nanopore. For example, a motor protein can be present on a polynucleotide portion of an adapter.

[0325] In one embodiment, the motor protein is derived from a member of any of the Enzyme Classification (EC) groups 3.1.11, 3.1.13, 3.1.14, 3.1.15, 3.1.16, 3.1.21, 3.1.22, 3.1.25, 3.1.26, 3.1.27, 3.1.30, 3.1.31 and 3.4.21.

[0326] In some embodiments of the claimed methods, the motor protein is a helicase, a polymerase, an exonuclease, a topoisomerase, or a variant thereof.

[0327] In some embodiments the motor protein is designed, configured or selected to prevent the motor protein disengaging from the construct (other than by passing off the end of the construct). Such modified polynucleotide-handling proteins are particularly suitable for use in the disclosed methods. This is particularly useful in the methods disclosed herein which comprise characterising the construct using a nanopore. Thus, in some embodiments of such methods, the construct does not disengage from the motor protein.

[0328] As used herein, the term “disengaging” refers to the dissociation of the motor protein from the construct. Thus, a motor protein may be modified to prevent it from dissociating from the construct, e.g. into the reaction medium. It is important to distinguish potential “disengagement” of a motor protein from “unbinding” of a motor protein from a construct. As used herein, “unbinding” refers to the transient release of the construct from the active site of the motor protein but does not imply disengagement. Thus, for example, a motor protein may be modified to prevent the motor protein from disengaging from a construct, but without preventing the motor protein from unbinding from the construct. When unbound, the motor protein remains engaged with the construct. For example, the motor protein may remain engaged with the construct (i.e. it may be prevented from disengaging from the construct) because it is topologically closed around the construct. The polynucleotide binding site may remain free to bind or unbind the construct such that the motor protein may bind or unbind to the construct, whilst the motor protein remains engaged with the construct. When the motor protein is unbound from the construct it may be able to move on (e.g., along) the construct under an applied force and may be capable of re-binding to the construct. When engaged on the construct but unbound from the construct, the motor protein is not capable of dissociating from the construct.

[0329] The motor protein can be adapted to prevent disengagement in any suitable way. For example, the motor protein can be loaded on the construct and then modified in order to prevent it from disengaging from the construct. Alternatively, the motor protein can be modified to prevent it from disengaging from the construct before it is loaded onto the construct. Modification of a motor protein in order to prevent it from disengaging from a construct can be achieved using methods known in the art, such as those discussed in WO 2014 / 013260 and WO 2021 / 255476, each of which is hereby incorporated by reference in its entirety, and with particular reference to passages describing the modification of motor proteins such as helicases in order to prevent them from disengaging from constructs, conjugates and polynucleotide strands. For example, a motor protein can be modified by treating with tetramethylazodicarboxamide (TMAD) or various other closing moieties.

[0330] When a polynucleotide motor protein is used, it may have a polynucleotide-unbinding opening; e.g. a cavity, cleft or void through which a polynucleotide strand may pass when the motor protein disengages from the strand. In some embodiments, the polynucleotide-unbinding opening is the opening through which a polynucleotide may pass when the motor protein disengages from the polynucleotide. In some embodiments, the polynucleotide-unbinding opening for a given motor protein can be determined by reference to its structure, e.g. by reference to its X-ray crystal structure. The X-ray crystal structure may be obtained in the presence and / or the absence of a polynucleotide substrate. In some embodiments, the location of a polynucleotide-unbinding opening in a given motor protein may be deduced or confirmed by molecular modelling using standard packages known in the art. In some embodiments, the polynucleotide-unbinding opening may be transiently produced by movement of one or more parts e.g. one or more domains of the motor protein.

[0331] The motor protein may be modified by closing the polynucleotide-unbinding opening. The polynucleotide-unbinding opening may be closed with a closing moiety. Closing the polynucleotide-unbinding opening may therefore prevent the motor protein from disengaging from the construct. For example, the motor protein may be modified by covalently closing the polynucleotide-unbinding opening. However, as explained above closing the polynucleotide-unbinding opening does not necessarily prevent the construct from unbinding from the polynucleotide binding site of the motor protein. Accordingly, in some embodiments of the disclosed methods, the motor protein is modified to wholly or partially close an opening existing in at least one conformation state of the unmodified protein through which a polynucleotide or polypeptide strand can unbind. In some embodiments, a preferred protein for addressing in this way is a helicase.

[0332] In embodiments in which the motor protein is a polypeptide motor protein, the motor protein may similarly have a polypeptide-unbinding opening. This can be similarly closed to prevent disengagement of the motor protein from the construct in the same way as for a polynucleotide motor protein.

[0333] In one embodiment, the motor protein is an exonuclease. Suitable enzymes include, but are not limited to, exonuclease I from E. coli, exonuclease III enzyme from E. coli, RecJ from T. thermophilus and bacteriophage lambda exonuclease, TatD exonuclease and variants thereof.

[0334] In one embodiment, the motor protein is a polymerase. The polymerase may be PyroPhage® 3173 DNA Polymerase (which is commercially available from Lucigen® Corporation), SD Polymerase (commercially available from Bioron®), Klenow from NEB or variants thereof. In one embodiment, the enzyme is Phi29 DNA polymerase or a variant thereof. Modified versions of Phi29 polymerase that may be used in the disclosed methods are disclosed in US Patent No. 5,576,204.

[0335] In some embodiments the motor protein is a polymerase, e.g. a polymerase as described herein.

[0336] In one embodiment the motor protein is a topoisomerase. In one embodiment, the topoisomerase is a member of any of the Enzyme Classification (EC) groups 5.99.1.2 and 5.99.1.3. The topoisomerase may be a reverse transcriptase, which is an enzyme capable of catalysing the formation of cDNA from an RNA template. Reverse transcriptases are commercially available from, for instance, New England Biolabs® and Invitrogen®.

[0337] In one embodiment the motor protein is a translocase. Examples include translocases in the FtsK and SpoIII families.

[0338] In one embodiment, the motor protein is a helicase. Any suitable helicase can be used in accordance with the methods provided herein. For example, the or each motor protein used in accordance with the present disclosure may be independently selected from a Hel308 helicase, a RecD helicase, a TraI helicase, a TrwC helicase, an XPD helicase, and a Dda helicase, or a variant thereof. Monomeric helicases may comprise several domains attached together. For instance, TraI helicases and TraI subgroup helicases may contain two RecD helicase domains, a relaxase domain and a C-terminal domain. The domains typically form a monomeric helicase that is capable of functioning without forming oligomers. Particular examples of suitable helicases include Hel308, NS3, Dda, UvrD, Rep, PcrA, Pif1 and TraI. These helicases typically work on single stranded DNA. Examples of helicases that can move along both strands of a double stranded DNA include FtsK and hexameric enzyme complexes, or multisubunit complexes such as RecBCD, and are particularly suited to some embodiments disclosed herein. NS3 helicases are particularly suitable for use in the disclosed methods as they are capable of processing both DNA and RNA and so can be used in embodiments of the disclosed methods in which the target double stranded nucleic acid is a DNA-RNA hybrid.

[0339] Hel308 helicases are described in publications such as WO 2013 / 057495, the entire contents of which are incorporated by reference. RecD helicases are described in publications such as WO 2013 / 098562, the entire contents of which are incorporated by reference. XPD helicases are described in publications such as WO 2013 / 098561, the entire contents of which are incorporated by reference. Dda helicases are described in publications such as WO 2015 / 055981 and WO 2016 / 055777, the entire contents of each of which are incorporated by reference. In some embodiments a motor protein (e.g. a helicase) can control the movement of a strand in at least two active modes of operation (when the motor protein is provided with all the necessary components to facilitate movement, e.g. fuel and cofactors such as ATP and Mg2+ discussed herein) and one inactive mode of operation (when the motor protein is not provided with the necessary components to facilitate movement).

[0340] When provided with all the necessary components to facilitate movement (i.e. in the active modes), the motor protein (e.g. helicase) moves along the construct in a 5’ to 3’ or a 3’ to 5’ direction (depending on the motor protein). The motor protein can be used either to move the construct away from (e.g. out of) the pore (e.g. against an applied force) or to move the construct towards (e.g. into) the pore (e.g. with an applied force). For example, when the end of the construct towards which the motor protein moves is captured by a pore, the motor protein works against the direction of the force and pulls the threaded construct out of the pore (e.g. into the cis chamber). However, when the end away from which the motor protein moves is captured in the pore, the motor protein works with the direction of the force and pushes the threaded construct into the pore (e.g. into the trans chamber).

[0341] When the motor protein (e.g. helicase) is not provided with the necessary components to facilitate movement (i.e. in the inactive mode) it can bind to the construct and act as a brake slowing the movement of the construct when it is moved with respect to a nanopore, e.g. by being pulled into the pore by a force. In the inactive mode, it does not matter which end of the construct is captured, it is the applied force which determines the movement with respect to the pore, and the motor protein acts as a brake. When in the inactive mode, the movement control by the motor protein can be described in a number of ways including ratcheting, sliding and braking.
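The active and inactive modes described in the two preceding paragraphs can be expressed as a simple decision rule. The Python sketch below is illustrative only; the function name, argument names and returned strings are hypothetical, and the construct ends are labelled 5' and 3' following the wording of the paragraphs above.

```python
def movement_control(active: bool, motor_moves_towards: str, captured_end: str) -> str:
    """Describe how a motor protein controls movement of a captured construct.

    active:              True if fuel and cofactors are supplied (active modes),
                         False otherwise (inactive mode).
    motor_moves_towards: the end of the construct towards which the motor
                         translocates, "5'" or "3'".
    captured_end:        the end of the construct captured by the pore.
    """
    ends = ("5'", "3'")
    if motor_moves_towards not in ends or captured_end not in ends:
        raise ValueError("ends must be 5' or 3'")
    if not active:
        # Inactive mode: the captured end does not matter; the applied force
        # sets the direction and the bound motor slows the movement.
        return "brake"
    if captured_end == motor_moves_towards:
        # The motor walks towards the captured end, so it works against the force.
        return "against the force: construct pulled out of the pore"
    # The motor walks away from the captured end, so it works with the force.
    return "with the force: construct pushed into the pore"


if __name__ == "__main__":
    # A 5'-to-3' motor moves towards the 3' end of the strand.
    print(movement_control(True, "3'", "3'"))
    print(movement_control(True, "3'", "5'"))
    print(movement_control(False, "3'", "5'"))
```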

[0342] In another embodiment the motor protein is a protein translocase. Protein translocases are protein-binding polypeptides which are able to control movement of a protein substrate, for example an enzyme, enzyme complex, or a part of an enzyme complex that operates on a protein substrate and moves it relative to the enzyme in a processive manner, i.e. as a function of enzymatic activity.

[0343] In some embodiments the motor protein is an NTP driven unfoldase. NTP driven unfoldases are NTP-dependent enzymes that catalyze protein unfolding. NTP driven unfoldases include ATP-dependent proteases, such as proteasomal ATPases, AAA proteases, AAA+ enzymes; membrane fusion proteins, such as NSF (N-ethylmaleimide-sensitive fusion protein) / Sec18p (N-ethylmaleimide-sensitive fusion protein homologue in yeast) or p97 / VCP / Cdc48p (97-kDa valosin-containing protein); Pex1p and Pex6p (peroxisomal ATPase); Katanin and SKD1 (Vps4p homolog in mouse) / Vps4p (Vacuolar protein sorting 4 homolog in yeast); Dynein (motor protein); DNA replication proteins, such as ORC (origin recognition complex), Cdc6 (cell division control protein 6), MCM (minichromosome maintenance protein), DnaA, or RFC (replication factor C) / clamp-loader; RuvB (Holliday junction ATP-dependent DNA helicase RuvB, EC=3.6.4.12); TIP49a / TIP49 and TIP49b / TIP48 (eukaryotic RuvB-like protein).

[0344] In some embodiments the motor protein is an AAA+ enzyme. AAA+ enzymes are members of the AAA+ superfamily of enzymes. AAA+ is an abbreviation for ATPases Associated with diverse cellular Activities. They share a common conserved module of approximately 230 amino acid residues. This is a large, functionally diverse protein family belonging to the AAA+ superfamily of ring-shaped P-loop NTPases, which exert their activity through the energy-dependent remodeling or translocation of macromolecules. Examples include ClpAP, ClpXP, ClpCP, HslVU and Lon in bacteria and their homologues in mitochondria and chloroplasts. With the exception of Lon, AAA+ enzymes (sometimes referred to as unfoldases or proteases) consist of regulatory (ATPase) and proteolytic subunits, while Lon is a single polypeptide containing both regulatory and proteolytic domains. ClpX and ClpA dock with ClpP to form ClpXP and ClpAP proteases, whereas HslU docks with HslV to form another protease, HslVU. ClpA and ClpX form hexamers, in contrast to ClpP which forms heptamers. HslU and HslV each form hexamers, although HslU heptamers have also been reported. The regulatory subunits ClpA, ClpX and HslU function as chaperones.

[0345] AAA+ enzymes may also be referred to as AAA+ molecular motors.

[0346] HslU is a member of the Hsp100 and Clp family of ATPases. It can also form a complex with HslV to act as an unfoldase.

[0347] Lon proteases are ATP-dependent serine peptidases belonging to the MEROPS peptidase family S16 (Lon protease family, clan SF).

[0348] In some embodiments the motor protein is ClpX or is a derivative thereof. ClpX is a member of the HSP (heat-shock protein) 100 family having the UniProt designation clpX and having the 424 amino acid sequence given there, processed into mature form, as a subunit. ClpX subunits associate to form a six-membered (homohexameric) ring that is stabilized by binding of ATP or nonhydrolysable analogs of ATP. The N-terminal domain of ClpX is a C4-type zinc binding domain (ZBD) involved in substrate recognition. ZBD forms a very stable dimer that is essential for promoting the degradation of some typical ClpXP substrates such as λO and MuA. In some embodiments the motor protein is E. coli ClpX. E. coli ClpX generates sufficient mechanical force (>20 pN) to denature stable protein folds, and translocates along proteins at a suitable rate for primary sequence analysis by nanopore sensors (up to 80 amino acids per second). ClpX is part of the ClpXP proteasome-like complex. ClpP is composed of a diheptameric cylinder-like protease that binds at one or both ends a regulatory hexameric ATP-dependent unfoldase / translocase complex (e.g. ClpX). ClpX acts as a gate that allows for tagged proteins to enter into the inner lumen of the ClpP protease complex for subsequent degradation. The ATP-dependent unfoldase / translocase activity of the hexameric protein complex, ClpX, is employed to unfold and thread proteins through a nanopore.
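As a rough worked example of the translocation rate quoted above, the Python sketch below estimates the shortest time a ClpX-type motor would need to translocate a polypeptide of a given length. The function name and the choice of example length are assumptions made for illustration; the only figure taken from the text is the upper rate of 80 amino acids per second, so the result is a lower bound and not a measured value.

```python
MAX_RATE_AA_PER_S = 80.0  # upper translocation rate quoted in the text


def min_translocation_time_s(length_aa: int, rate_aa_per_s: float = MAX_RATE_AA_PER_S) -> float:
    """Lower bound on the time needed to translocate `length_aa` residues.

    Assumes uninterrupted translocation at a constant rate, which is an
    idealisation; pauses or slower stepping would lengthen the time.
    """
    if length_aa < 0:
        raise ValueError("length_aa must be non-negative")
    if rate_aa_per_s <= 0:
        raise ValueError("rate_aa_per_s must be positive")
    return length_aa / rate_aa_per_s


if __name__ == "__main__":
    # Example: a 424-residue polypeptide (the length quoted above for the ClpX subunit).
    print(f"{min_translocation_time_s(424):.1f} s")
```

For a 424-residue polypeptide this gives 424 / 80 = 5.3 seconds at the maximum quoted rate.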

[0349] In some embodiments the motor protein comprises ClpX-deltaN subunits, lacking N-terminal amino acids 1-60, linked with a 20 amino acid long linker and prepared as a single polypeptide chain.

[0350] In some embodiments the motor protein is a Clp / Hsp100 ATPase. Clp / Hsp100 ATPases are responsible for selecting protein targets. For example, the two different bacterial ATPases ClpX and ClpA impart distinct substrate preferences to the ClpP peptidase.

[0351] In some embodiments the motor protein is a mitochondrial protein translocase. Examples include TOM or TIM from human or eukaryotic cells, such as TOMM20 (translocase of outer mitochondrial membrane homolog), TOMM22 (mitochondrial import receptor subunit 22 homolog), TOMM40 (translocase of outer mitochondrial membrane 40 homolog), TOM7 (translocase of mitochondrial outer membrane 7), TOMM7 (translocase of outer mitochondrial membrane 7 homolog), TIMM8A (translocase of inner mitochondrial membrane 8 homolog A), TIMM50 (translocase of inner mitochondrial membrane 50 homolog).

[0352] Another alternative protein translocase may be prepared from the Sec family of translocases. These include SecB (chaperone protein), SecA (ATPase), SecY (internal membrane complex in prokaryotes), SecE (internal membrane complex in prokaryotes), SecG (internal membrane complex in prokaryotes) or Sec61 (internal membrane complex in eukaryotes), SecD (membrane protein), and SecF (membrane protein).

[0353] Another alternative protein translocase is Type III Secretion System (TTS) Translocase, such as HrcN and any of the subunits of the TTS translocases, or Sec-independent periplasmic protein translocase TatC. Examples of suitable protein translocases, such as NTP driven unfoldases as described above, are described in WO 2013 / 123379, hereby incorporated by reference.

[0354] A motor protein typically requires fuel in order to handle the processing of polynucleotides and / or polypeptides. Fuel is typically free nucleotides or free nucleotide analogues. The free nucleotides may be one or more of, but are not limited to, adenosine monophosphate (AMP), adenosine diphosphate (ADP), adenosine triphosphate (ATP), guanosine monophosphate (GMP), guanosine diphosphate (GDP), guanosine triphosphate (GTP), thymidine monophosphate (TMP), thymidine diphosphate (TDP), thymidine triphosphate (TTP), uridine monophosphate (UMP), uridine diphosphate (UDP), uridine triphosphate (UTP), cytidine monophosphate (CMP), cytidine diphosphate (CDP), cytidine triphosphate (CTP), cyclic adenosine monophosphate (cAMP), cyclic guanosine monophosphate (cGMP), deoxyadenosine monophosphate (dAMP), deoxyadenosine diphosphate (dADP), deoxyadenosine triphosphate (dATP), deoxyguanosine monophosphate (dGMP), deoxyguanosine diphosphate (dGDP), deoxyguanosine triphosphate (dGTP), deoxythymidine monophosphate (dTMP), deoxythymidine diphosphate (dTDP), deoxythymidine triphosphate (dTTP), deoxyuridine monophosphate (dUMP), deoxyuridine diphosphate (dUDP), deoxyuridine triphosphate (dUTP), deoxycytidine monophosphate (dCMP), deoxycytidine diphosphate (dCDP) and deoxycytidine triphosphate (dCTP). The free nucleotides are usually selected from AMP, TMP, GMP, CMP, UMP, dAMP, dTMP, dGMP or dCMP. The free nucleotides are typically adenosine triphosphate (ATP).
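The free nucleotides listed above follow a regular naming pattern (base, number of phosphates, optional deoxy prefix, plus two cyclic forms), so the list can be regenerated programmatically, for example when building a reagent lookup table. The Python sketch below is illustrative only; the dictionary and function names are hypothetical and it simply reproduces the 32 abbreviations and names given in the paragraph.

```python
from itertools import product

# Base letters and phosphate suffixes used in the abbreviations listed in the text.
BASES = {"A": "adenosine", "G": "guanosine", "T": "thymidine", "U": "uridine", "C": "cytidine"}
PHOSPHATES = {"MP": "monophosphate", "DP": "diphosphate", "TP": "triphosphate"}


def fuel_candidates() -> dict:
    """Map each nucleotide abbreviation from the text to its full name."""
    names = {}
    # Ribo- forms have no prefix; deoxy- forms take the prefix "d".
    for prefix, prefix_name in (("", ""), ("d", "deoxy")):
        for (base, base_name), (suffix, suffix_name) in product(BASES.items(), PHOSPHATES.items()):
            names[f"{prefix}{base}{suffix}"] = f"{prefix_name}{base_name} {suffix_name}"
    # The two cyclic nucleotides named in the text do not follow the pattern.
    names["cAMP"] = "cyclic adenosine monophosphate"
    names["cGMP"] = "cyclic guanosine monophosphate"
    return names


if __name__ == "__main__":
    candidates = fuel_candidates()
    print(len(candidates))
    print(candidates["ATP"])
```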

[0355] A cofactor for the motor protein is a factor that allows the motor protein to function. The cofactor is often a divalent metal cation. The divalent metal cation is often Mg2+, Mn2+, Ca2+ or Co2+. The cofactor is most typically Mg2+.

[0356] Detector

[0357] Embodiments described herein refer to movement of an analyte such as a peptide linker construct as described herein with respect to a nanopore. However, whilst the disclosure provides nanopores as exemplary detectors, the methods provided herein are also amenable to other detectors including (i) a zero-mode waveguide; (ii) a field-effect transistor, optionally a nanowire field-effect transistor; (iii) an AFM tip; (iv) a nanotube, optionally a carbon nanotube; and (v) a nanopore. The disclosed methods are particularly amenable to methods in which a polypeptide is moved through a detector or through a structure containing a detector, e.g. a well in a detector chip.

Nanopore

[0358] In the disclosed methods, any suitable nanopore can be used. In one embodiment a nanopore is a transmembrane pore.

[0359] A transmembrane pore is a structure that crosses the membrane to some degree. It permits hydrated ions driven by an applied potential to flow across or within the membrane. The transmembrane pore typically crosses the entire membrane so that hydrated ions may flow from one side of the membrane to the other side of the membrane. However, the transmembrane pore does not have to cross the membrane. It may be closed at one end. For instance, the pore may be a well, gap, channel, trench or slit in the membrane along which or into which hydrated ions may flow.

[0360] Any suitable transmembrane pore may be used in the methods provided herein. The pore may be biological or artificial. Suitable pores include, but are not limited to, protein pores, polynucleotide pores, and solid state pores.

[0361] A solid state pore may, in one embodiment, comprise a nanochannel. In some embodiments the solid state pore is a pore disclosed in WO 2003 / 003446, WO 2009 / 020682 or WO 2016 / 187519, each of which is incorporated by reference in their entirety.

[0362] In one embodiment, the pore may be a DNA origami pore (Langecker et al., Science, 2012; 338: 932-936). Suitable DNA origami pores are disclosed in WO 2013 / 083983, WO 2018 / 011603 and WO 2020 / 025974, each of which is incorporated by reference in their entirety.

[0363] In one embodiment, the nanopore is a scaffolded polypeptide nanopore. In some embodiments the pore is a scaffolded polypeptide nanopore as disclosed in WO 2020 / 025909 or WO 2020 / 074399, each of which is incorporated by reference in their entirety.

[0364] In one embodiment, the nanopore is a transmembrane protein pore. A transmembrane protein pore is a polypeptide or a collection of polypeptides that permits hydrated ions, such as polynucleotides, to flow from one side of a membrane to the other side of the membrane. In the methods provided herein, the transmembrane protein pore is capable of forming a pore that permits hydrated ions driven by an applied potential to flow from one side of the membrane to the other. The transmembrane protein pore typically permits polynucleotides and polypeptides to flow from one side of the membrane, such as a polymer membrane, to the other. The transmembrane protein pore allows a polynucleotide or polypeptide to be moved through the pore.

[0365] In one embodiment, the nanopore is a transmembrane protein pore which is a monomer or an oligomer. The pore is typically made up of several repeating subunits, such as at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, or at least 16 subunits. The pore is typically a hexameric, heptameric, octameric or nonameric pore. The pore may be a homo-oligomer or a hetero-oligomer.

[0366] In one embodiment, the transmembrane protein pore comprises a barrel or channel through which the ions may flow. The subunits of the pore typically surround a central axis and contribute strands to a transmembrane β-barrel or channel or a transmembrane α-helix bundle or channel.

[0367] Typically, the barrel or channel of the transmembrane protein pore comprises amino acids that facilitate interaction with an analyte, such as a target polypeptide (as described herein). These amino acids are typically located near a constriction of the barrel or channel. The transmembrane protein pore typically comprises one or more positively charged amino acids, such as arginine, lysine or histidine, or aromatic amino acids, such as tyrosine or tryptophan. These amino acids typically facilitate the interaction between the pore and nucleotides, polynucleotides, nucleic acids and polypeptides.

[0368] The transmembrane protein pore may be from or derived from Wza, Iota toxin, Anthrax protective antigen, Vibrio cholerae cytolysin, Cytotoxin K (CytK), CELIII, CsgG, CsgF, CsgG-CsgF, Aerolysin, alpha hemolysin, MspA, MspB, MspC, PorARr, PorBRr, PorARc, PilQ, necrotic enteritis B-like toxin (NetB), FraC, portal proteins including G20c, P23_45, T4, SPP1, P22 and Phi29, gamma hemolysin, Monalysin, Lysenin, ClyA, an actinoporin, Clostridium perfringens beta toxin, parasporin-2, epsilon toxin, lectin from the parasitic mushroom Laetiporus sulphureus (LSL), volvatoxin, Cry toxins, Cytl Aa, Cyt2Aa, Complement component 9 (C9), Perfringolysin O, Pleurotolysin, Listeriolysin, Perforin-2, Gasdermin- A3, L-, P- and M-ring protein, Type II secretion system protein D, GspD, InvG, VirB7, SpoIIIAG, Cag8, Cag3, Cag or other proteins in the Type IV secretion system apparatus protein CagY, WzzB, Pentraxin, Afp2, Major vault protein, Thioredoxindependent peroxidase reductase, Arf-GAP, Respiratory syncytial virus ribonucleoprotein, Chikungunya virus nonstructural protein 1, PRC, YaxA, XaxA, HfaB, NfpAB, leukocidin and PrgH. Suitable transmembrane protein pores for use in the invention include those described in WO 2016 / 034591, WO 2017 / 149316, WO 2017 / 149317, WO 2017 / 149318, WO 2018 / 211241, WO 2019 / 002893, WO 2023 / 118404, WO 2023 / 198911, WO 2024 / 033421, WO 2024 / 033422, WO 2024 / 033443, and WO 2024 / 089270 (all incorporated by reference herein in their entirety).

[0369] The transmembrane protein pore may also be any of the CsgG pores described in WO 2023 / 060420, WO 2023 / 060418, WO 2023 / 060422, WO 2023 / 060421, WO 2023 / 019470, CN114957412, WO 2023 / 019471, WO 2023 / 060419 and WO 2023 / 050031 (all incorporated herein by reference in their entireties) or a variant thereof.

[0370] The transmembrane protein pore may be any of the pores described in WO 2023 / 123370, WO 2024 / 138470, WO 2024 / 138472, WO 2024 / 138424, WO 2024 / 138425, WO 2024 / 138512 and WO 2024 / 138565 or a variant thereof.

[0371] The transmembrane pore may be formed from a chimeric pore monomer comprising two or more regions, wherein at least two of the two or more regions are from at least two different pores. The chimeric pore monomer may comprise any number of regions, such as three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more or ten or more regions, from different pores. The chimeric pore monomer may comprise two or three regions. The regions are preferably selected from a cap region, a constriction region, and a transmembrane region. The regions may be a cap region and a constriction region. The regions may be a cap region, a constriction region, and a transmembrane region. The at least two different pores are typically at least two different pores that appear in nature. The at least two different pores are typically at least two different wild-type or naturally occurring pores. The at least two different pores are preferably different before any artificial or synthetic modifications, such as additions, deletions and / or substitutions, are made to them. The at least two different pores are preferably homologues, for example structural homologues. A structural homologue refers to a protein or molecule that shares a similar three-dimensional structure with another protein or molecule. This can be determined using standard methods in the art (e.g., AlphaFold or PSIPRED). Structural homologues typically have similar sequences. Structural homologues are normally identified in similar species. The at least two different pores may be selected from any of the pores listed above. The at least two different pores may be two different PorARc pores or three different PorARc pores. The at least two different pores may be two different CsgG pores or three different CsgG pores. The chimeric pore monomer may be any of those described in PCT / EP2023 / 080135 (incorporated by reference herein in its entirety).

[0372] The transmembrane pore may be formed from a pore monomer comprising (a) a CsgG monomer and (b) a fusion polypeptide comprising a first portion comprising a CsgF peptide and a second portion comprising a helix-forming auxiliary protein, wherein the fusion protein is attached to the pore monomer. The pore monomer may be derived from a protein transmembrane pore complex comprising (a) a CsgG transmembrane pore comprising a lumen and (b) a fusion polypeptide comprising a first portion comprising a CsgF protein and a second portion comprising a helix-forming auxiliary protein, wherein the fusion protein is attached to the transmembrane pore. The auxiliary protein can be designed de novo using computer-based structural analysis tools to confer certain desirable features to the CsgG monomer (e.g., modulation of pore width, lengthening of pore lumen, formation of one or more additional constrictions, etc.). The de novo designed auxiliary protein may form one or more additional constrictions in the lumen of a CsgG pore formed from the monomer, and improve discrimination of polymer units as an analyte moves through the pore. The pore monomer may be any of the pore monomers described in WO 2024 / 033447 (incorporated by reference herein in its entirety).

[0373] Tags

[0374] In some embodiments of the methods provided herein, a tag on the nanopore can be used, e.g. to promote the capture of the peptide linker construct.

[0375] The interaction between a tag on a nanopore and a binding site on a construct (e.g. an adapter attached thereto) may be reversible. A strong non-covalent bond (e.g., biotin / avidin) is still reversible and can be useful in some embodiments of the methods described herein. For example, a pair of pore tag and construct or adaptor can be designed to provide a sufficient interaction with the nanopore such that the construct is held close to the nanopore (without detaching from the nanopore and diffusing away) but is able to release from the nanopore as it is processed.

[0376] A pore tag and adaptor can be configured such that the binding strength or affinity of a binding site on the construct (e.g., a binding site provided by an anchor or a leader sequence of an adaptor or by a capture sequence within the duplex stem of an adaptor) to a tag on a nanopore is sufficient to maintain the coupling between the nanopore and construct until an applied force is placed on it to release the bound construct from the nanopore. One or more molecules that attract or bind the construct or an adapter attached thereto may be linked to the nanopore. Any molecule that hybridizes to the construct and / or adaptor may be used. The molecule attached to the pore may be selected from a PNA tag, a PEG linker, a short oligonucleotide, a positively charged amino acid and an aptamer. Pores having such molecules linked to them are known in the art. For example, pores having short oligonucleotides attached thereto are disclosed in Howorka et al (2001) Nature Biotech. 19: 636-639 and WO 2010 / 086620, and pores comprising PEG attached within the lumen of the pore are disclosed in Howorka et al (2000) J. Am. Chem. Soc. 122(11): 2411-2416. In some embodiments, a tag or tether may be uncharged. This can ensure that the tags or tethers are not drawn into the nanopore under the influence of a potential difference if present.

[0377] A short oligonucleotide attached to the nanopore, which comprises a sequence complementary to a sequence in the construct (e.g. in a leader sequence or another single stranded sequence in an adaptor) may be used to enhance capture of the construct or adapter attached thereto in the methods described herein.

[0378] Anchors

[0379] In some embodiments of the methods provided herein, an anchor on the construct can be used, e.g. to promote the localisation of the construct to a membrane in which a nanopore may be present.

[0380] Thus, in one embodiment, a construct or adapter attached thereto may comprise a membrane anchor or a transmembrane pore anchor. The anchor may be a polypeptide anchor and / or a hydrophobic anchor that can be inserted into the membrane. In one embodiment, the hydrophobic anchor is a lipid, fatty acid, sterol, carbon nanotube, polypeptide, protein or amino acid, for example cholesterol, palmitate or tocopherol. The anchor may comprise thiol, biotin or a surfactant. In one aspect the anchor may be biotin (for binding to streptavidin), amylose (for binding to maltose binding protein or a fusion protein), Ni-NTA (for binding to poly-histidine or poly-histidine tagged proteins) or peptides (such as an antigen).

[0381] In one embodiment, the anchor is or comprises cholesterol or a fatty acyl chain. For example, any fatty acyl chain having a length of from 6 to 30 carbon atoms, such as hexadecanoic acid, may be used. Examples of suitable anchors and methods of attaching anchors to adapters are disclosed in WO 2012 / 164270 and WO 2015 / 150786.

Membrane

[0382] The detector or nanopore is typically present in a membrane. Any suitable membrane may be used.

[0383] The membrane is preferably an amphiphilic layer. An amphiphilic layer is a layer formed from amphiphilic molecules, such as phospholipids, which have both hydrophilic and lipophilic properties. The amphiphilic molecules may be synthetic or naturally occurring. Non-naturally occurring amphiphiles and amphiphiles which form a monolayer are known in the art and include, for example, block copolymers (Gonzalez-Perez et al., Langmuir, 2009, 25, 10447-10450). Block copolymers are polymeric materials in which two or more monomer sub-units are polymerized together to create a single polymer chain. Block copolymers typically have properties that are contributed by each monomer sub-unit. However, a block copolymer may have unique properties that polymers formed from the individual sub-units do not possess. Block copolymers can be engineered such that one of the monomer sub-units is hydrophobic (i.e., lipophilic), whilst the other sub-unit(s) are hydrophilic whilst in aqueous media. In this case, the block copolymer may possess amphiphilic properties and may form a structure that mimics a biological membrane. The block copolymer may be a diblock (consisting of two monomer sub-units) but may also be constructed from more than two monomer sub-units to form more complex arrangements that behave as amphiphiles. The copolymer may be a triblock, tetrablock or pentablock copolymer. The membrane may be a triblock copolymer membrane.

[0384] Archaebacterial bipolar tetraether lipids are naturally occurring lipids that are constructed such that the lipid forms a monolayer membrane. These lipids are generally found in extremophiles that survive in harsh biological environments, such as thermophiles, halophiles and acidophiles. Their stability is believed to derive from the fused nature of the final bilayer. It is straightforward to construct block copolymer materials that mimic these biological entities by creating a triblock polymer that has the general motif hydrophilic-hydrophobic-hydrophilic. This material may form monomeric membranes that behave similarly to lipid bilayers and encompass a range of phase behaviours from vesicles through to laminar membranes. Membranes formed from these triblock copolymers hold several advantages over biological lipid membranes. Because the triblock copolymer is synthesised, the exact construction can be carefully controlled to provide the correct chain lengths and properties required to form membranes and to interact with pores and other proteins. Block copolymers may also be constructed from sub-units that are not classed as lipid sub-materials; for example, a hydrophobic polymer may be made from siloxane or other non-hydrocarbon-based monomers. The hydrophilic sub-section of the block copolymer can also possess low protein binding properties, which allows the creation of a membrane that is highly resistant when exposed to raw biological samples. This head group unit may also be derived from non-classical lipid head-groups.

[0385] Triblock copolymer membranes also have increased mechanical and environmental stability compared with biological lipid membranes, for example a much higher operational temperature or pH range. The synthetic nature of the block copolymers provides a platform to customise polymer-based membranes for a wide range of applications.

[0386] The membrane may be one of the membranes disclosed in International Application No. WO2014 / 064443 or WO2014 / 064444 (both of which are incorporated herein by reference in their entireties).

[0387] The amphiphilic molecules may be chemically modified or functionalised to facilitate coupling of the polynucleotide. The amphiphilic layer may be a monolayer or a bilayer. The amphiphilic layer is typically planar. The amphiphilic layer may be curved. The amphiphilic layer may be supported.

[0388] Amphiphilic membranes are typically naturally mobile, essentially acting as two-dimensional fluids with lipid diffusion rates of approximately 10⁻⁸ cm s⁻¹. This means that the pore and coupled polynucleotide can typically move within an amphiphilic membrane.

[0389] The membrane may be a lipid bilayer. Lipid bilayers are models of cell membranes and serve as excellent platforms for a range of experimental studies. For example, lipid bilayers can be used for in vitro investigation of membrane proteins by single-channel recording. Alternatively, lipid bilayers can be used as biosensors to detect the presence of a range of substances. The lipid bilayer may be any lipid bilayer. Suitable lipid bilayers include, but are not limited to, a planar lipid bilayer, a supported bilayer, or a liposome. The lipid bilayer is preferably a planar lipid bilayer. Suitable lipid bilayers are disclosed in WO 2008 / 102121, WO 2009 / 077734, and WO 2006 / 100484 (incorporated herein by reference in their entireties).

[0390] Methods for forming lipid bilayers are known in the art. Lipid bilayers are commonly formed by the method of Montal and Mueller (Proc. Natl. Acad. Sci. USA., 1972; 69: 3561-3566). A lipid bilayer may be formed as described in WO 2009 / 077734 (incorporated herein by reference in its entirety). In this method, the lipid bilayer is formed from dried lipids. A lipid bilayer may be formed across an opening as described in WO 2009 / 077734.

[0391] The membrane may comprise a solid-state layer. Solid-state layers can be formed from both organic and inorganic materials including, but not limited to, microelectronic materials, insulating materials such as Si3N4, Al2O3, and SiO, organic and inorganic polymers such as polyamide, plastics such as Teflon® or elastomers such as two-component addition-cure silicone rubber, and glasses. The solid-state layer may be formed from graphene. Suitable graphene layers are disclosed in WO 2009 / 035647 (incorporated herein by reference in its entirety). If the membrane comprises a solid-state layer, the pore is typically present in an amphiphilic membrane or layer contained within the solid-state layer, for instance within a hole, well, gap, channel, trench or slit within the solid-state layer. The skilled person can prepare suitable solid state / amphiphilic hybrid systems. Suitable systems are disclosed in WO 2009 / 020682 and WO 2012 / 005857 (incorporated herein by reference in their entireties). Any of the amphiphilic membranes or layers discussed above may be used.

[0392] The methods disclosed herein are typically carried out using (i) an artificial amphiphilic layer comprising a pore, (ii) an isolated, naturally occurring lipid bilayer comprising a pore, or (iii) a cell having a pore inserted therein. The methods are typically carried out using an artificial amphiphilic layer, such as an artificial triblock copolymer layer. The layer may comprise other transmembrane and / or intramembrane proteins as well as other molecules in addition to the pore. Suitable apparatus and conditions are discussed below. The method of the invention is typically carried out in vitro.

[0393] Characterisation

[0394] As discussed in more detail herein, in some embodiments the disclosed methods comprise taking one or more measurements as a peptide linker construct moves with respect to a nanopore. The one or more measurements are typically one or more measurements characteristic of the polypeptide portion of the peptide linker construct. By taking one or more measurements characteristic of the peptide linker construct, characteristics of the target polypeptide can be determined.

[0395] Many different characteristics can be determined. Any suitable measurements can be taken. For example, in some embodiments characterising the peptide linker construct comprises determining (i) the length of the construct, (ii) the identity of the construct, (iii) the sequence of the construct, (iv) the secondary structure of the construct; (v) whether or not and / or to the extent to which the construct is modified; (vi) the presence, absence, concentration or relative abundance of the construct in a sample comprising multiple such constructs (e.g. derived from a sample comprising multiple target polypeptides). Those skilled in the art will appreciate that characterising a peptide linker construct or target polypeptide does not necessarily comprise determining any or all of these features. Many characterisation measurements can be made as the peptide linker construct moves with respect to a nanopore.

[0396] Because the peptide linker construct is derived from the target polypeptide, the characteristics of the peptide linker construct can inform on the characteristics of the target polypeptide. Thus, the disclosed methods can be used to inform regarding (i) the length of the target polypeptide, (ii) the identity of the target polypeptide, (iii) the sequence of the target polypeptide, (iv) the secondary structure of the target polypeptide; (v) whether or not and / or to the extent to which the target polypeptide is modified, e.g. by one or more post-translational modifications; (vi) the presence, absence, concentration or relative abundance of the target polypeptide in a sample comprising multiple polypeptides.

[0397] In some embodiments the measurements are characteristic of the sequence of the polypeptide of the peptide linker construct. In some embodiments the measurements are characteristic of the sequence of the target polypeptide.

[0398] In some embodiments the measurements are characteristic of whether or not the polypeptide portion of the peptide linker construct is modified. In some embodiments the measurements are characteristic of whether or not the target polypeptide is modified.

[0399] Conditions

[0400] The disclosed methods may be carried out using any apparatus that is suitable for investigating a membrane / pore system in which a nanopore is inserted into a membrane. The characterisation method may be carried out using any apparatus that is suitable for transmembrane pore sensing. For example, the apparatus may comprise a chamber comprising an aqueous solution and a barrier that separates the chamber into two sections. The barrier may have an aperture in which a membrane containing a transmembrane pore is formed. Transmembrane pores are described herein. The characterisation methods may be carried out using the apparatus described in WO 2008 / 102120, WO 2010 / 122293 or WO 00 / 28312.

[0401] The characterisation methods may comprise optical measurements, for example such as described in WO 2016 / 009180 and WO 2021 / 198695.

[0402] The characterisation methods may involve measuring the ion current flow through the pore, typically by measurement of a current. Alternatively, the ion flow through the pore may be measured optically, such as disclosed by Heron et al, J. Am. Chem. Soc., Vol. 131, No. 5, 2009. Where the measurement is electrical, the apparatus may also comprise an electrical circuit capable of applying a potential and measuring an electrical signal across the membrane and pore. The characterisation methods may be carried out using a patch clamp or a voltage clamp. The characterisation methods typically involve the use of a voltage clamp.

[0403] The characterisation methods may be carried out on a silicon-based array of wells where each array comprises 128, 256, 512, 1024, 2000, 3000, 4000, 6000, 10000, 12000, 15000 or more wells.

[0404] The characterisation methods may involve the measuring of a current flowing through the pore. The method is typically carried out with a voltage applied across the membrane and pore. The voltage used is typically from +2 V to -2 V, typically -400 mV to +400 mV. The voltage used is typically in a range having a lower limit selected from -400 mV, -300 mV, -200 mV, -150 mV, -100 mV, -50 mV, -20 mV and 0 mV and an upper limit independently selected from +10 mV, +20 mV, +50 mV, +100 mV, +150 mV, +200 mV, +300 mV and +400 mV. The voltage used is more typically in the range 100 mV to 240 mV and most typically in the range of 120 mV to 220 mV. It is possible to increase discrimination between different nucleotides by a pore by using an increased applied potential.
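The independently selected lower and upper limits listed above define a finite set of candidate voltage windows. As a minimal sketch only (the names `LOWER_MV`, `UPPER_MV`, `windows` and `in_window` are illustrative and not part of the disclosure), the pairings can be enumerated and a chosen operating voltage checked against any one of them:

```python
from itertools import product

# Lower and upper limits (in mV) as listed in the paragraph above.
LOWER_MV = [-400, -300, -200, -150, -100, -50, -20, 0]
UPPER_MV = [10, 20, 50, 100, 150, 200, 300, 400]

# Each lower limit may be paired with each independently selected upper limit.
windows = list(product(LOWER_MV, UPPER_MV))

# Every pairing is a well-formed range: all lower limits are <= 0 mV
# and all upper limits are > 0 mV.
assert all(lo < hi for lo, hi in windows)


def in_window(voltage_mv, window):
    """Return True if voltage_mv lies within the (lower, upper) window, inclusive."""
    lo, hi = window
    return lo <= voltage_mv <= hi
```

For example, an applied voltage of 180 mV, which lies within the most typical 120 mV to 220 mV range, falls inside the (0, 200) window but outside the (-400, 150) window.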

[0405] The characterisation methods are typically carried out in the presence of any charge carriers, such as metal salts, for example alkali metal salts, halide salts, for example chloride salts, such as alkali metal chloride salt. Charge carriers may include ionic liquids or organic salts, for example tetramethyl ammonium chloride, trimethylphenyl ammonium chloride, phenyltrimethyl ammonium chloride, or 1-ethyl-3-methyl imidazolium chloride. In the exemplary apparatus discussed above, the salt is present in the aqueous solution in the chamber. Potassium chloride (KCl), sodium chloride (NaCl) or caesium chloride (CsCl) is typically used. KCl is typical. The salt may be an alkaline earth metal salt such as calcium chloride (CaCl2). The salt concentration may be at saturation. The salt concentration may be 3 M or lower and is typically from 0.1 to 2.5 M, from 0.3 to 1.9 M, from 0.5 to 1.8 M, from 0.7 to 1.7 M, from 0.9 to 1.6 M or from 1 M to 1.4 M. The salt concentration is typically from 150 mM to 1 M. The characterisation method may be carried out using a salt concentration of at least 0.3 M, such as at least 0.4 M, at least 0.5 M, at least 0.6 M, at least 0.8 M, at least 1.0 M, at least 1.5 M, at least 2.0 M, at least 2.5 M or at least 3.0 M. High salt concentrations provide a high signal to noise ratio and allow for currents indicative of binding / no binding to be identified against the background of normal current fluctuations.

[0406] The characterisation methods are typically carried out in the presence of a buffer. In the exemplary apparatus discussed above, the buffer is present in the aqueous solution in the chamber. Any suitable buffer may be used. Typically, the buffer is HEPES. Another suitable buffer is Tris-HCl buffer. The methods are typically carried out at a pH of from 4.0 to 12.0, from 4.5 to 10.0, from 5.0 to 9.0, from 5.5 to 8.8, from 6.0 to 8.7 or from 7.0 to 8.8 or 7.5 to 8.5. The pH used may be about 7.5.

[0407] The characterisation methods may be carried out at from 0 °C to 100 °C, from 15 °C to 95 °C, from 16 °C to 90 °C, from 17 °C to 85 °C, from 18 °C to 80 °C, 19 °C to 70 °C, or from 20 °C to 60 °C. The characterisation methods are typically carried out at room temperature. The characterisation methods are optionally carried out at a temperature that supports enzyme function, such as about 37 °C.
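The voltage, salt, pH and temperature conditions set out in the preceding paragraphs can be collected into a simple range check. The following is a minimal sketch under stated assumptions: the class `RunConditions`, the table `BROAD_RANGES` and the function `out_of_range` are hypothetical names introduced here for illustration, and the bounds are the broadest ranges quoted above (for salt, the "typically from 0.1 to 2.5 M" range is used, although the text also permits concentrations up to saturation).

```python
from dataclasses import dataclass
from typing import List

# Broadest ranges quoted in the text above (illustrative table, not a specification).
BROAD_RANGES = {
    "voltage_mV": (-2000.0, 2000.0),   # from +2 V to -2 V
    "salt_M": (0.1, 2.5),              # "typically from 0.1 to 2.5 M"
    "pH": (4.0, 12.0),                 # pH of from 4.0 to 12.0
    "temperature_C": (0.0, 100.0),     # from 0 °C to 100 °C
}


@dataclass
class RunConditions:
    """Hypothetical container for one set of measurement conditions."""
    voltage_mV: float
    salt_M: float
    pH: float
    temperature_C: float


def out_of_range(conditions: RunConditions) -> List[str]:
    """Return the names of parameters falling outside the broad ranges."""
    flagged = []
    for name, (low, high) in BROAD_RANGES.items():
        value = getattr(conditions, name)
        if not low <= value <= high:
            flagged.append(name)
    return flagged
```

For instance, conditions of 180 mV, 0.5 M salt, pH 7.5 and 25 °C pass every check, whereas a pH of 3.0 or a temperature of 120 °C would be flagged.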

[0408] Further aspects

[0409] Also provided herein is a method of moving one or more peptide portions of a target polypeptide with respect to a nanopore, wherein the target polypeptide comprises one or more target amino acid pairs, wherein each target amino acid pair in the target polypeptide comprises a first amino acid attached to a second amino acid; the method comprising expanding the target polypeptide to form a peptide-linker construct; and contacting the peptide-linker construct with a motor protein under conditions such that the motor protein controls the movement of one or more peptide portions of the construct with respect to a nanopore; wherein expanding the target polypeptide comprises attaching a linker between the first and second amino acids in each target amino acid pair, and cleaving the target polypeptide between the first and second amino acids in each target amino acid pair; thereby forming a peptide-linker construct in which the sequence order of the amino acids in the target polypeptide is maintained. In such methods, the target polypeptide is typically a target polypeptide as described herein. The target polypeptide typically comprises one or more target amino acid pairs as described herein. Expanding the target polypeptide using a linker is typically conducted as described herein. The linker is typically as described herein. Cleaving the target polypeptide is typically conducted as described herein. The peptide linker construct may be moved with respect to a nanopore which may be a nanopore as described herein using methods as described herein, for example under the control of a motor protein as described herein.

[0410] Moving one or more peptide portions of the peptide linker construct with respect to the nanopore may allow one or more measurements to be taken as the peptide linker construct moves with respect to the nanopore. In some embodiments the one or more measurements are one or more measurements characteristic of the polypeptide portions of the peptide linker construct. In some embodiments the one or more measurements are one or more measurements characteristic of the target polypeptide from which the peptide linker construct is derived. In some embodiments the one or more measurements are one or more measurements as described herein. In some embodiments the one or more measurements are one or more electrical or optical measurements. In some embodiments the one or more measurements are one or more electrical measurements. In some embodiments the one or more measurements are one or more optical measurements. In some embodiments the one or more measurements comprise measuring the ion current flow through the nanopore.

[0411] Also provided is a method of producing a peptide-linker construct from a target polypeptide, wherein the target polypeptide comprises one or more target amino acid pairs, wherein each target amino acid pair in the target polypeptide comprises a first amino acid attached to a second amino acid; the method comprising expanding the target polypeptide by attaching a linker between the first and second amino acids in each target amino acid pair, and cleaving the target polypeptide between the first and second amino acids in each target amino acid pair; thereby forming a peptide-linker construct in which the sequence order of the amino acids in the target polypeptide is maintained. In such methods, the target polypeptide is typically a target polypeptide as described herein. The target polypeptide typically comprises one or more target amino acid pairs as described herein. Expanding the target polypeptide using a linker is typically conducted as described herein. The linker is typically as described herein. Cleaving the target polypeptide is typically conducted as described herein.

[0412] The peptide linker construct produced in the provided methods can be used in any application for which a peptide linker construct is useful. In some embodiments the peptide linker construct is used in a method of movement of a peptide linker construct with respect to a nanopore. In some embodiments the peptide linker construct is used in a method of characterising a target polypeptide using a nanopore. For example, in some embodiments the peptide linker construct is used in a method comprising moving the peptide linker construct with respect to a nanopore using methods as described herein, for example under the control of a motor protein as described herein. As described herein, the peptide linker construct can be characterised. Any suitable characterisation methods can be used. An exemplary characterisation method comprises the use of HPLC. Another exemplary characterisation method comprises allowing the peptide linker construct to move with respect to a nanopore and taking one or more measurements characteristic of the target polypeptide as the peptide linker construct moves with respect to the nanopore. In some embodiments the one or more measurements are one or more measurements as described herein.

[0413] Also provided is a conjugate, comprising a target polypeptide comprising one or more target amino acid pairs, wherein each target amino acid pair in the target polypeptide comprises a first amino acid attached to a second amino acid; and one or more linkers attached to the first amino acid in each target amino acid pair, wherein the one or more linkers are each also optionally attached to the second amino acid in each target amino acid pair.

[0414] In some embodiments the conjugate is an intermediate in the production of the peptide linker construct as described herein. In some embodiments the target polypeptide in the conjugate is not cleaved, e.g. is not cleaved using a proteolytic enzyme as described herein. The skilled person will understand that the provided conjugate may however be applied in methods in which the polypeptide of the conjugate is cleaved (e.g. is cleaved in a method as described herein). In some embodiments the conjugate comprises a sequencing adapter as described herein. In some embodiments the conjugate comprises a motor protein as described herein. In some embodiments the conjugate comprises one or more sequencing adapters and one or more motor proteins as described herein. In some embodiments the one or more motor proteins are each bound to one or more sequencing adapters.

[0415] Also provided is a peptide-linker construct comprising a concatemer of contiguous peptide fragments, wherein proximate peptide fragments are linked together by a plurality of linkers. In some embodiments the peptide fragments comprise a sequence order, e.g. a sequence order which may correspond to the sequence order of a target polypeptide such as a protein. In some embodiments the peptide linkers in the peptide linker construct are linkers as described herein.

[0416] In some embodiments the peptide linker construct comprises a sequencing adapter as described herein. In some embodiments the peptide linker construct comprises a motor protein as described herein. In some embodiments the peptide linker construct comprises one or more sequencing adapters and one or more motor proteins as described herein. In some embodiments the one or more motor proteins are each bound to one or more sequencing adapters. In some embodiments the peptide linker construct is anchored to a membrane comprising a nanopore, e.g. a membrane and / or nanopore as described herein.

[0417] Also provided is a kit for modifying a target polypeptide, comprising a linker having a first end capable of selectively reacting with an optionally-activated first amino acid comprised in a target amino acid pair comprised in the target polypeptide, and a second end capable of reacting with an optionally-activated second amino acid comprised in the target amino acid pair; and a chemical or enzymatic reagent capable of cleaving the target polypeptide between the first amino acid and the second amino acid.

[0418] In some embodiments the kit comprises one or more of: a sequencing adapter capable of selectively reacting with the target polypeptide or the linker; a motor protein capable of controlling the movement of a peptide-linker construct with respect to a nanopore; and a nanopore capable of detecting one or more characteristics of a peptide-linker construct as the construct moves with respect to the nanopore. In some embodiments the linker is a linker as described herein. In some embodiments the chemical or enzymatic reagent is a reagent as described herein, for example is a proteolytic enzyme as described herein. In some embodiments the sequencing adapter, nanopore and / or motor protein are as described herein.

[0419] The kit may comprise instructions for preparing a peptide linker construct from a target polypeptide as described herein.

[0420] Also provided is a system, comprising a library of peptide linker constructs as described herein, which may be the same or different, and a nanopore. In some embodiments the nanopore is as described herein. In some embodiments the system further comprises a motor protein capable of controlling the movement of the constructs in the library with respect to the nanopore. In some embodiments the system comprises computing means configured to detect information characteristic of the constructs in the library and to selectively process the signal obtained as said constructs move with respect to the nanopore. In some embodiments the system comprises receiving means for receiving data from detection of the oligopeptides, processing means for processing the signal obtained as the constructs move with respect to the nanopore, and output means for outputting the characterisation information thus obtained.

[0421] Exemplary workflows

[0422] The following workflows are provided to illustrate the methods described herein.

[0423] One embodiment of the disclosed methods is illustrated in Figure 1. This workflow provides an example of the disclosed methods in which expanding the target polypeptide comprises: (i) attaching a first end of each linker to the first amino acid in each target amino acid pair; (ii) cleaving the target polypeptide between the first and the second amino acids in each target amino acid pair; and (iii) attaching a second end of each linker to the second amino acid in each target amino acid pair.

[0424] With reference to Figure 1, a larger protein may optionally be denatured or linearised. The protein, whether native, denatured or linearised, constitutes the target polypeptide. The target polypeptide comprises a plurality of amino acids. A target amino acid pair comprising a first amino acid (A) and a second amino acid (B) is defined by the user of the method. For example, in some embodiments the target amino acid pair is defined by the user of the method as comprising lysine as the first amino acid (A) and the second amino acid in each target amino acid pair is the amino acid C-terminal adjacent to the Lys in each amino acid pair. Thus, in some embodiments the target polypeptide comprises one or more moieties of form N-...-R1-Lys-B-R2-...-C wherein R1 and R2 are amino acids, Lys is the first amino acid and B is the second amino acid, and N and C represent the N- and C-terminals of the target polypeptide, respectively. The method may then comprise attaching a linker to the first amino acid, e.g. the Lys. The Lys may be activated e.g. using Traut’s reagent as described herein. The linker may comprise a first reactive group such as a maleimide group capable of reacting with the Traut’s-activated lysine. The method may then comprise reacting the conjugate formed by attaching the linker to the first amino acid with a reagent capable of cleaving the target polypeptide between the first and second amino acids. A suitable reagent is a proteolytic enzyme which is specific for Lys, such as LysC as described herein. When LysC is used to cleave the target polypeptide, the polypeptide which comprises the second amino acid (B) comprises a free amino group at its N-terminus. The method may then comprise attaching the second end of the linker to the free amino group of the second amino acid (B). The amino group may be activated e.g. using Traut’s reagent as described herein. The linker may thus comprise a second reactive group such as a maleimide or haloacetamide group capable of reacting with the Traut’s-activated amino group. The reaction of the target polypeptide with the reagents may be conducted in the presence of a suitable solvent such as DMSO.
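As an illustration only, the selection of target amino acid pairs can be sketched in Python. The function name `find_target_pairs` and its parameters are hypothetical and not part of the disclosure; the sequence used is the model peptide of SEQ ID NO: 5, with one-letter codes and 0-based positions assumed.

```python
from typing import List, Tuple


def find_target_pairs(sequence: str, first: str = "K",
                      second_side: str = "C") -> List[Tuple[int, str, str]]:
    """Return (position_of_first, first, second) for each adjacent target pair.

    second_side="C": the second amino acid is C-terminal adjacent to the first.
    second_side="N": the second amino acid is N-terminal adjacent to the first.
    Positions are 0-based indices into `sequence` (an assumption of this sketch).
    """
    if second_side not in ("C", "N"):
        raise ValueError("second_side must be 'C' or 'N'")
    offset = 1 if second_side == "C" else -1
    pairs = []
    for i, residue in enumerate(sequence):
        j = i + offset
        # Skip a first amino acid that has no neighbour on the requested side.
        if residue == first and 0 <= j < len(sequence):
            pairs.append((i, residue, sequence[j]))
    return pairs


# Model peptide of SEQ ID NO: 5 (terminal caps omitted).
c_side_pairs = find_target_pairs("EEALYAKAGNNYGKLAQYVA", first="K", second_side="C")
n_side_pairs = find_target_pairs("EEALYAKAGNNYGKLAQYVA", first="K", second_side="N")
```

With lysine as the first amino acid, the C-terminal setting corresponds to moieties of the form Lys-B and the N-terminal setting to moieties of the form B-Lys.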

[0425] One embodiment of the disclosed methods is illustrated in Figure 2. This workflow provides an example of the disclosed methods in which expanding the target polypeptide comprises: (i) attaching a first end of each linker to each first amino acid in each target amino acid pair; and (ii) attaching a second end of each linker to each second amino acid in each target amino acid pair, thereby cleaving the target polypeptide between the first and the second amino acids in each target amino acid pair.

[0426] With reference to Figure 2, a larger protein may optionally be denatured or linearised. The protein, whether native, denatured or linearised, constitutes the target polypeptide. The target polypeptide comprises a plurality of amino acids. A target amino acid pair comprising a first amino acid (A) and a second amino acid (B) is defined by the user of the method. For example, in some embodiments the target amino acid pair is defined by the user of the method as comprising lysine as the first amino acid (A) and the second amino acid in each target amino acid pair is the amino acid N-terminal adjacent to the Lys in each amino acid pair. Thus, in some embodiments the target polypeptide comprises one or more moieties of form N-...-R1-B-Lys-R2-...-C wherein R1 and R2 are amino acids, Lys is the first amino acid and B is the second amino acid, and N and C represent the N- and C-terminals of the target polypeptide, respectively. The method may then comprise attaching a linker to the first amino acid, e.g. the Lys. The Lys may be activated e.g. using Traut’s reagent as described herein. The linker may comprise a first reactive group such as a maleimide group capable of reacting with the Traut’s-activated lysine. The linker may comprise a second reactive group such as a carbonyl-reactive group capable of reacting with the carbonyl group of a peptide bond. The method may then comprise reacting the second end of the linker with the carbonyl group of the peptide bond between the Lys and the second amino acid (B). The amide bond may be activated for reaction with the linker. Reaction with the linker causes the cleavage of the peptide bond between the Lys and the second amino acid (B).

[0427] One embodiment of the disclosed methods is illustrated in Figure 3. This workflow provides an example of the disclosed methods in which expanding the target polypeptide comprises: (i) attaching a first end of each linker to each first amino acid in each target amino acid pair; (ii) attaching a second end of each linker to each second amino acid in each target amino acid pair; and (iii) cleaving the target polypeptide between the first and the second amino acids in each target amino acid pair.

[0428] With reference to Figure 3, a larger protein may optionally be denatured or linearised. The protein, whether native, denatured or linearised, constitutes the target polypeptide. The target polypeptide comprises a plurality of amino acids. A target amino acid pair comprising a first amino acid (A) and a second amino acid (B) is defined by the user of the method. For example, in some embodiments the target amino acid pair is defined by the user of the method as comprising lysine as the first amino acid (A) and the second amino acid in each target amino acid pair is the amino acid C-terminal adjacent to the Lys in each amino acid pair. Thus, in some embodiments the target polypeptide comprises one or more moieties of form N-...-R1-Lys-B-R2-...-C wherein R1 and R2 are amino acids, Lys is the first amino acid and B is the second amino acid, and N and C represent the N- and C-terminals of the target polypeptide, respectively. The method may then comprise attaching a linker to the first amino acid, e.g. the Lys. The Lys may be activated e.g. using Traut’s reagent as described herein. The linker may comprise a first reactive group such as a maleimide group capable of reacting with the Traut’s-activated lysine. The second amino acid may be defined by the user as required. For example, the second amino acid may be defined by the user as being a further Lys such that the target amino acid pair comprises [Lys-Lys]. In another example the second amino acid may be defined by the user as being Cys such that the target amino acid pair comprises [Lys-Cys]. In another example the second amino acid may be defined by the user as being Asp such that the target amino acid pair comprises [Lys-Asp]. The choice of second amino acid is a parameter which can be chosen by the user of the methods. The second end of the linker is then reacted with the second amino acid (e.g. with the side chain of the second amino acid). By way of non-limiting example, when the second amino acid is Lys the Lys may be activated e.g. using Traut’s reagent as described herein. The linker may comprise a second reactive group such as a maleimide or haloacetamide group capable of reacting with the Traut’s-activated lysine. By way of a second non-limiting example, when the second amino acid is Cys the linker may comprise a second reactive group such as a maleimide or haloacetamide group capable of reacting with the Cys. By way of a third non-limiting example, when the second amino acid is Asp the linker may comprise a second reactive group such as an amine group capable of reacting with the Asp (optionally wherein the Asp is first activated using a carboxyl-activating agent as described herein). The method may then comprise reacting the conjugate formed by attaching the linker to the first and second amino acids with a reagent capable of cleaving the target polypeptide between the first and second amino acids. A suitable reagent is a proteolytic enzyme which is specific for Lys, such as LysC as described herein.
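The three non-limiting examples above can be summarised as a small lookup table. This is a sketch only: the dictionary `SECOND_END_CHEMISTRY` and the function `second_end_options` are hypothetical names introduced for illustration, and the table records only the pairings named in this paragraph.

```python
# Second amino acid (one-letter code) -> optional activation step and the
# second reactive groups named in the paragraph above. Illustrative only.
SECOND_END_CHEMISTRY = {
    "K": {"activation": "Traut's reagent",
          "linker_groups": ("maleimide", "haloacetamide")},
    "C": {"activation": None,
          "linker_groups": ("maleimide", "haloacetamide")},
    "D": {"activation": "carboxyl-activating agent (optional)",
          "linker_groups": ("amine",)},
}


def second_end_options(second_residue: str) -> dict:
    """Look up the example chemistry for a user-chosen second amino acid."""
    try:
        return SECOND_END_CHEMISTRY[second_residue]
    except KeyError:
        raise KeyError(
            f"no example chemistry recorded for residue {second_residue!r}"
        ) from None


options_for_cys = second_end_options("C")
```

Other second amino acids are not excluded by the method; they are simply absent from this illustrative table.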

[0429] It is to be understood that although particular embodiments, specific configurations as well as materials and / or molecules, have been discussed herein for methods according to the present invention, various changes or modifications in form and detail may be made without departing from the scope and spirit of this invention. The preceding embodiments and subsequent examples are provided for illustration only, and should not be considered as limiting the application. The application is limited only by the claims.

Examples

[0430] Example 1

[0431] This Example demonstrates that the enzyme LysC can cleave peptides at lysine residues that have been modified with Traut’s reagent (2-iminothiolane). A schematic of the experimental design is shown in Fig. 8.

[0432] Four experiments were carried out using HPLC and mass spectrometry to assess LysC processing of modified lysine residues.

[0433] Experiment 1: A peptide with the sequence Ac-EEALYAKAGNNYG-CONH2 (SEQ ID NO: 1) was synthesised and characterised using HPLC and mass spectrometry (Fig. 9). The initial peptide exhibited a retention time of 9.911 minutes, with corresponding masses observed at 1442.5 (MH+) and 720.9 ((M2H+) / 2), where M is the molecular weight of the desired product.

[0434] Experiment 2: LysC enzyme was added to the starting peptide (Fig. 10). The resulting peptide fragments showed retention times of 2.854 minutes for NH2-AGNNYG-CONH2 (SEQ ID NO: 2) and 9.660 minutes for Ac-EEALYAK-COOH (SEQ ID NO: 3), with corresponding masses of 594.2 (MH+) / 616.2 (MNa+) and 865.5 (MH+), respectively. These values confirmed complete cleavage of the starting peptide, yielding the expected fragment sizes.

[0435] Experiment 3: The starting peptide was treated with an excess of Traut’s reagent and maleimide (Fig. 11). The peptide contained a single lysine residue, and with the N-terminus capped by an acetyl group, only this lysine side chain was available for functionalisation with Traut’s reagent. Maleimide was introduced to capture the thiol formed by the reaction between the lysine side chain and Traut’s reagent. The retention time of the modified peptide shifted from 9.911 minutes for the starting peptide to 10.754 minutes, with masses observed at 820.0 ((M2H+) / 2) and 1640.5 (MH+) (Fig. 4). The expected mass of 1639.5 confirmed complete modification.

[0436] Experiment 4: The reaction mixture from Experiment 3 was buffer-exchanged into amine-free PBS buffer at pH 8.5 to remove unreacted excess Traut’s reagent and maleimide. Following this purification step, LysC enzyme was introduced to cleave the modified peptide (Fig. 12). The HPLC trace revealed two main peptide-related peaks at retention times of 2.845 and 10.706 minutes, corresponding to masses of 594.2 and 1063.5, respectively. The peak at 2.845 minutes matched the fragment observed in Experiment 2, validating enzyme cleavage of the original lysine-based site. The mass of the 10.706-minute peak corresponded to the Traut’s-maleimide-modified fragment (expected molecular weight (M) for the Traut’s-maleimide-modified fragment Ac-EEALYAK(Traut’s-Maleimide)-COOH (SEQ ID NO: 4) is 1062.6), confirming that LysC retained recognition ability for the modified peptide. This result demonstrated that the lysine side-chain modification, which maintained a positive charge, did not interfere with the ability of LysC to process the peptide, thus preserving enzyme recognition and cleavage functionality.

[0437] Example 2

[0438] This Example demonstrates that the cleaved fragments obtained by the method of Example 1 can be captured by a linker-attached lysine to form a peptide-linker construct (or ‘concatemer’) in which the cleaved fragments of the original peptide are linked together while maintaining their order from the original peptide.

[0439] A model peptide with the sequence Ac-EEALYAKAGNNYGKLAQYVA-CONH2 (SEQ ID NO: 5) was used. To minimise side reactions and simplify the analysis, the peptide’s N- and C-termini were capped with acetyl and amide groups, respectively. The peptide contained two lysine residues, with the cleavage sites located at the C-terminus of each lysine, leading to the formation of three distinct fragments upon cleavage. The objective was to evaluate the method by analysing the mass of the resulting products to confirm that sequence order of the peptide was preserved in the peptide-linker construct (Fig. 13).
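The fragment count stated above can be reproduced with a simple in-silico cleavage. This is a minimal sketch under a simplifying assumption: cleavage occurs C-terminal to every lysine, and any further specificity rules of the real enzyme are ignored. The function name `cleave_c_terminal_to` is hypothetical.

```python
from typing import List


def cleave_c_terminal_to(sequence: str, residue: str = "K") -> List[str]:
    """Split a one-letter sequence after each occurrence of `residue`.

    A residue at the very C-terminus has no bond after it, so it does not
    produce an extra (empty) fragment.
    """
    fragments = []
    start = 0
    for i, aa in enumerate(sequence):
        if aa == residue and i < len(sequence) - 1:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


# Model peptide of SEQ ID NO: 5 (terminal caps omitted): two lysines.
fragments = cleave_c_terminal_to("EEALYAKAGNNYGKLAQYVA")
print(fragments)  # ['EEALYAK', 'AGNNYGK', 'LAQYVA']
```

Joining the fragments in list order returns the original sequence, which is the order that the peptide-linker construct is intended to preserve.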

[0440] A mass corresponding to the predicted product was expected to be observed only if the sequence order was preserved (Fig. 13). The potential reaction products are illustrated in Fig. 14. As shown, the only scenario in which all fragments connect into a single construct is if they assemble in the correct order.

[0441] Fig. 15 shows the HPLC trace and mass data for the starting peptide, confirming the correct peptide identity. The molecular weight (M) of the peptide is 2213.1 Da. The upper detection limit of the MS system is set at 2000 Da, so the MH+ peak lies above the detection range and the doubly charged ion is expected instead. The observed mass was 1108.1 ((M2H+) / 2).
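The relationship between the molecular weight, the charge state and the 2000 Da detection limit can be checked numerically. The sketch below assumes simple protonation, m/z = (M + z × 1.00728) / z; the function names and the `max_charge` cut-off are hypothetical choices for illustration.

```python
from typing import List

PROTON_MASS = 1.00728  # Da, approximate mass of a proton


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of an ion carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def observable_charge_states(neutral_mass: float, upper_limit: float = 2000.0,
                             max_charge: int = 4) -> List[int]:
    """Charge states whose m/z falls at or below the detector's upper limit."""
    return [z for z in range(1, max_charge + 1)
            if mz(neutral_mass, z) <= upper_limit]


# Starting peptide, M = 2213.1 Da: the singly charged ion (about 2214.1)
# exceeds 2000, while the doubly charged ion (about 1107.6) does not.
states = observable_charge_states(2213.1)
```

The calculated doubly charged value of about 1107.6 is close to the reported 1108.1, which is within what might be expected of low-resolution data.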

[0442] To initiate the reaction, a solution of the original peptide (Fig. 15) was prepared in DMSO. Traut’s reagent (10 equivalents) and the crosslinker bismaleimidoethane (BMOE) were added, and the mixture was incubated for 10 minutes at 37°C. During this initial step, the lysine residues first react with Traut’s reagent, introducing a thiol group that subsequently binds to one of the maleimide termini of the BMOE linker. Product formation was monitored by HPLC and mass spectrometry (Fig. 16). The retention time shifted from 13.338 minutes for the unmodified peptide to 16.462 minutes for the modified peptide, with observed masses of (M2H+) / 2 = 1429.2 and (M3H+) / 3 = 953.1. The molecular weight (M) for Ac-EEALYAK(Traut’s-BMOE)AGNNYGK(Traut’s-BMOE)LAQYVA-CONH2 (SEQ ID NOs: 6-8) is 2855.8.
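The stated molecular weight of the doubly modified peptide is consistent with simple mass bookkeeping. The sketch below assumes that each lysine gains one whole molecule of 2-iminothiolane (free base, C4H7NS) and one whole molecule of BMOE (C10H8N2O4), since both reactions are additions; the average masses used are standard values rather than figures from the disclosure, and the function name is hypothetical.

```python
TRAUT_MASS = 101.17  # Da, 2-iminothiolane free base (C4H7NS), average mass
BMOE_MASS = 220.18   # Da, bismaleimidoethane (C10H8N2O4), average mass


def modified_peptide_mass(peptide_mass: float, n_lysines: int) -> float:
    """Mass after each lysine has added one Traut's unit and one BMOE unit."""
    return peptide_mass + n_lysines * (TRAUT_MASS + BMOE_MASS)


# Starting peptide M = 2213.1 Da with two lysines.
expected = modified_peptide_mass(2213.1, 2)
print(round(expected, 1))  # 2855.8
```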

[0443] Following the reaction, the mixture was buffer-exchanged into amine-free PBS buffer (pH 8.5) to remove excess Traut’s reagent and unreacted BMOE. LysC enzyme was then introduced in the presence of additional Traut’s reagent. LysC cleaves the peptide at the C-terminus of modified lysine residues, exposing the N-terminus of the adjacent amino acid and generating a new amine group. This amine reacts with excess Traut’s reagent in the solution to produce a new thiol group, which is subsequently captured by the maleimide terminus of the BMOE attached to the neighbouring lysine side chain. This reaction forms a crosslink joining the peptide fragments and so forming a peptide-linker construct, or concatemer.
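As a purely notational sketch, the concatemer can be written out from the ordered fragments by placing one linker label between each lysine and the fragment that follows it. The helper names and the text notation `(Traut's-BMOE-Traut's)` are assumptions for illustration; the code manipulates strings only and does not model the chemistry.

```python
from typing import List

LINK = "(Traut's-BMOE-Traut's)"  # label for side chain -> new N-terminus bridge


def build_concatemer(fragments: List[str], n_cap: str = "Ac",
                     c_cap: str = "CONH2") -> str:
    """Join fragments in their original order with a linker label after each lysine."""
    body = f"{LINK}-".join(fragments)
    return f"{n_cap}-{body}-{c_cap}"


def recover_sequence(construct: str, n_cap: str = "Ac",
                     c_cap: str = "CONH2") -> str:
    """Strip the caps and linker labels to read back the amino acid order."""
    body = construct[len(n_cap) + 1: -(len(c_cap) + 1)]
    return body.replace(f"{LINK}-", "")


construct = build_concatemer(["EEALYAK", "AGNNYGK", "LAQYVA"])
original = recover_sequence(construct)
```

Reading the construct from the N-terminal cap to the C-terminal cap returns the residues in the same order as the starting peptide.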

[0444] Fig. 17 shows the analysis from this concatemerisation reaction. As previously described, the exact mass of the desired product will only appear if the fragments are correctly connected in sequence. Multiple products are anticipated due to potential ring-opening or hydrolysis of the maleimide rings in the two BMOE linkers. However, this hydrolysis will result only in the ring opening of the maleimide groups, preserving the linkage within the concatemer. This will lead to a series of peaks (Fig. 17 A-F), each corresponding to a product. The desired product, Ac-EEALYAK(Traut’s-BMOE-Traut’s)-AGNNYGK(Traut’s-BMOE-Traut’s)-LAQYVA-CONH2 (SEQ ID NOs: 9-11) has an expected molecular weight of 3094.1 Da. Since it contains four maleimide rings, the hydrolysis of each ring would increase the molecular weight by 18 Da. Consequently, the total molecular weight will vary depending on the degree of maleimide hydrolysis in the product. The low-resolution mass spectra show the expected mass of the desired product (Fig. 17 E) as the major species, along with masses corresponding to ring-opening hydrolysis of the maleimide rings of the desired product (Fig. 17 A-D), plus possible deamidation-cyclisation at N / Q residues of the desired product (Fig. 17 F). Further confirmation is provided by high-resolution mass data in Figs. 18 and 19.
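The expected mass and the hydrolysis series can be worked through as follows. This is a sketch under stated assumptions: each cleavage site is taken to add one water (peptide bond hydrolysis), one BMOE and two Traut's units (one on the lysine side chain, one on the new N-terminus), using standard average masses that are not taken from the disclosure; the 18 Da increment per opened ring is the value given in the text. Function names are hypothetical.

```python
from typing import List

TRAUT_MASS = 101.17  # Da, 2-iminothiolane free base (C4H7NS), average mass
BMOE_MASS = 220.18   # Da, bismaleimidoethane (C10H8N2O4), average mass
WATER_MASS = 18.015  # Da, average mass of water


def concatemer_mass(peptide_mass: float, n_sites: int) -> float:
    """Expected mass with every site cleaved and bridged by one linker."""
    per_site = WATER_MASS + BMOE_MASS + 2 * TRAUT_MASS
    return peptide_mass + n_sites * per_site


def hydrolysis_series(base_mass: float, n_rings: int,
                      shift: float = 18.0) -> List[float]:
    """Masses for 0..n_rings opened rings, using the shift stated in the text."""
    return [base_mass + k * shift for k in range(n_rings + 1)]


# Starting peptide M = 2213.1 Da, two cleavage sites, four rings in total.
base = concatemer_mass(2213.1, 2)        # about 3094.2, close to the stated 3094.1
series = hydrolysis_series(3094.1, 4)    # five values spaced 18 Da apart
```

The series has five members because zero to four rings may be opened; which member corresponds to which peak in Fig. 17 is not inferred here.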

[0445] A large-scale reaction was performed, and peptide-related products from the mixture were isolated and subjected to high-resolution mass spectrometry to verify product formation and confirm the sequence order of linking. Fig. 18 shows the HPLC trace from the large-scale purification of the starting peptide Ac-EEALYAKAGNNYGKLAQYVA-CONH2 (SEQ ID NO: 5), modified with Traut’s reagent and BMOE, followed by LysC digestion in the presence of excess Traut’s reagent. Peaks labelled 1-7 were collected and submitted for high-resolution mass spectrometry analysis.

[0446] Fig. 19 presents the data from high-resolution mass spectra, fragmentation patterns, and proposed major products corresponding to each peak in Fig. 18. Due to limitations in the purification method, some separated peaks may contain multiple products. In such cases, peaks are annotated with the major product as “2082-product” and any minor products as “by-product_2” based on chromatographic area analysis. As previously explained, multiple products can form, with some sharing the same molecular weight but displaying different retention times depending on the specific sites of maleimide ring hydrolysis. These results confirm the successful concatemerisation process and demonstrate the preservation of the original peptide sequence integrity.

[0447] All peptides used in these Examples were synthesised through solid-phase peptide synthesis. Starting materials and products were analysed by High Performance Liquid Chromatography (HPLC) and Mass Spectrometry (MS). All reagents, including enzymes, were sourced commercially. For high-resolution mass spectrometry, where an observed mass fell within ±5 ppm of the expected mass, this was accepted as confirmation that the sample contained the target molecule.
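The ±5 ppm acceptance criterion can be expressed as a short check. The function names below are hypothetical, and the masses in the usage lines are made-up illustrative values rather than data from the Examples.

```python
def ppm_error(observed: float, expected: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - expected) / expected * 1e6


def within_tolerance(observed: float, expected: float,
                     tolerance_ppm: float = 5.0) -> bool:
    """True if the observed mass lies within +/- tolerance_ppm of the expected mass."""
    return abs(ppm_error(observed, expected)) <= tolerance_ppm


# Hypothetical values: an expected mass of 1000.0000 Da.
accepted = within_tolerance(1000.0040, 1000.0000)   # about 4 ppm
rejected = within_tolerance(1000.0060, 1000.0000)   # about 6 ppm
```

At a mass near 3000 Da, a 5 ppm window corresponds to roughly ±0.015 Da, which is why such a check is only meaningful for the high-resolution data.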

[0448] Example 3

[0449] This Example demonstrates that a test peptide expanded to form a peptide-linker ‘concatemer’ construct following the methods described herein can be characterised on a nanopore sequencing device. The test peptide is cleaved into two fragments which are linked together while maintaining their order from the original peptide. The resulting peptide-linker ‘concatemer’ construct is characterised on a nanopore sequencing device using a motor protein to control the movement of the construct with respect to the nanopore, such as described in WO 2021 / 111125.

[0450] The test peptide Tetrazine-EEALYAKAGNNYGK(N3)-CONH2 was used, incorporating a tetrazine ‘handle’ at the N-terminus and an azide group at the C-terminus via an azido-lysine residue, with an amidated C-terminus.

[0451] Control construct: A DNA tail was attached to the N-terminus and a polynucleotide nanopore sequencing adapter to the C-terminus of the test peptide. The peptide with attached DNA tail and nanopore sequencing adapter was characterised on a MinION nanopore sequencing device (Oxford Nanopore Technologies) using custom flow cells.

[0452] Data are shown in Figure 20, where the signal from the translocated peptide portion of the construct is highlighted.

[0453] Concatemer construct:

[0454] The same test peptide Tetrazine-EEALYAKAGNNYGK(N3)-CONH2 was subjected to a concatemerisation process as described in Examples 1 and 2, in which cleavage occurs at the C-terminus of the lysine residue at position 7, followed by linkage of the lysine side chain to the N-terminus of the resulting fragment using a bismaleimidoethane (BMOE) linker. A DNA tail and a polynucleotide nanopore sequencing adapter were added as per the control construct. This ‘concatemer’ construct was then characterised on a MinION nanopore sequencing device (Oxford Nanopore Technologies) using custom flow cells.

[0455] Data are shown in Figure 21. In the highlighted peptide region of the signal trace, the altered peptide signal shape relative to the control trace indicates the successful translocation of the two linked peptide fragments.

Claims

1. A method of characterising a target polypeptide, wherein the target polypeptide comprises one or more target amino acid pairs, wherein each target amino acid pair in the target polypeptide comprises a first amino acid attached to a second amino acid; the method comprising expanding the target polypeptide to form a peptide-linker construct; wherein expanding the target polypeptide comprises attaching a linker between the first and second amino acids in each target amino acid pair, and cleaving the target polypeptide between the first and second amino acids in each target amino acid pair; thereby forming a peptide-linker construct in which the sequence order of the amino acids in the target polypeptide is maintained; and wherein characterising the target polypeptide comprises taking one or more measurements characteristic of the construct as the construct moves with respect to a nanopore; thereby characterising the target polypeptide.

2. A method according to claim 1, wherein the target polypeptide comprises a plurality of target amino acid pairs.

3. A method according to any one of the preceding claims, wherein in each target amino acid pair the first and second amino acids are adjacent.

4. A method according to any one of the preceding claims, wherein the second amino acid is N-terminal to the first amino acid.

5. A method according to any one of claims 1 to 3, wherein the second amino acid is C-terminal to the first amino acid.

6. A method according to any one of claims 2 to 5 wherein each first amino acid in each target amino acid pair is the same.

7. A method according to any one of claims 2 to 5 wherein said plurality of target amino acid pairs comprises at least two different first amino acids.

8. A method according to any one of the preceding claims wherein the second amino acids in each of the one or more target amino acid pairs may be the same or different.

9. A method according to any one of claims 2 to 8, wherein expanding the target polypeptide to form a peptide-linker construct comprises attaching a first linker between the first and second amino acids in a first target amino acid pair and attaching a second linker between the first and second amino acids in a second target amino acid pair.

10. A method according to claim 9, wherein the first linker and the second linker are the same.

11. A method according to claim 9, wherein the first linker and the second linker are different.

12. A method according to any one of the preceding claims, wherein the or each linker independently comprises a multi-functional molecule.

13. A method according to any one of the preceding claims, wherein the or each linker independently comprises a polymer.

14. A method according to any one of the preceding claims, wherein the or each linker independently comprises a polynucleotide, a polypeptide and / or a polysaccharide.

15. A method according to any one of the preceding claims, wherein the or each linker independently comprises a hairpin.

16. A method according to any one of the preceding claims, wherein the or each linker comprises a first end and a second end, and wherein attaching a linker between the first and second amino acids in each target amino acid pair comprises attaching the first end of the linker to the first amino acid and attaching the second end of the linker to the second amino acid.

17. A method according to any one of the preceding claims, wherein the or each first amino acid comprises a reactive side chain.

18. A method according to any one of the preceding claims, comprising activating the side chain of the or each first amino acid for reaction with the first end of the linker.

19. A method according to any one of the preceding claims, comprising reacting the first end of the or each linker with the side chain of the or each first amino acid.

20. A method according to any one of the preceding claims, wherein the first end of the or each linker independently comprises a first reactive group for reacting with the or each first amino acid.

21. A method according to claim 20, wherein the first reactive group is an amine-reactive group, a thiol-reactive group, a carbonyl-reactive group, a carboxyl-reactive group, a hydroxyl-reactive group, an imidazole-reactive group, or a click-chemistry reactive group.

22. A method according to any one of the preceding claims, wherein the or each first amino acid is independently selected from Lys, Arg, Glu, Asp, Cys, Ser, Thr, Tyr and His.

23. A method according to any one of the preceding claims, comprising activating the or each second amino acid for reaction with the second end of the or each linker.

24. A method according to any one of the preceding claims, comprising a step of activating the second end of the or each linker for reaction with the or each second amino acid.

25. A method according to any one of the preceding claims, wherein the second end of the or each linker comprises a second reactive group for reacting with the or each second amino acid.

26. A method according to claim 25, wherein the second reactive group is an amine-reactive group, a thiol-reactive group, a carbonyl-reactive group, a carboxyl-reactive group, a hydroxyl-reactive group, an imidazole-reactive group, or a click-chemistry reactive group.

27. A method according to any one of the preceding claims, wherein:
- the first amino acid and / or the second amino acid each comprise an amine group;
- the method comprises activating said amine group(s) by reaction with an activating agent, preferably wherein said activating agent is Traut’s Reagent (2-iminothiolane, or a salt thereof, preferably 2-iminothiolane hydrochloride); and
- the first end of the linker and / or the second end of the linker each comprise a thiol-reactive group, preferably a maleimide or haloacetamide group.

28. A method according to any one of the preceding claims, wherein the or each linker comprises a plurality of linking portions.

29. A method according to claim 28, comprising attaching the linking portions together prior to attaching the second end of the linker to the second amino acid.

30. A method according to claim 28, comprising attaching the linking portions together after attaching the second end of the linker to the second amino acid.

31. A method according to any one of the preceding claims, wherein expanding the target polypeptide comprises: i) attaching a first end of each linker to the first amino acid in each target amino acid pair; ii) cleaving the target polypeptide between the first and the second amino acids in each target amino acid pair; and iii) attaching a second end of each linker to the second amino acid in each target amino acid pair.

32. A method according to any one of the preceding claims, wherein cleaving the target polypeptide comprises contacting the target polypeptide with one or more proteolytic enzymes.

33. A method according to any one of the preceding claims, wherein cleaving the target polypeptide comprises contacting the target polypeptide with a chemical reagent.

34. A method according to any one of the preceding claims, wherein cleaving the target polypeptide comprises contacting the target polypeptide with one or more of LysC, LysN, trypsin, ArgC, clostripain, gingisrex, GluC, glutamyl endopeptidase, granzyme B, staphylococcal peptidase I, AspN, caspase 1, caspase 2, caspase 3, caspase 4, caspase 5, caspase 6, caspase 7, caspase 8, caspase 9, caspase 10, enterokinase, factor Xa, formic acid, granzyme B, 2-Nitro-5-thiocyanatobenzoic acid, papain, thrombin (PeptideCutter), thrombin SG, Asp-N endopeptidase, and LysArgiNase.

35. A method according to any one of the preceding claims, comprising reacting the second end of the or each linker with the N-terminal amine group of the or each second amino acid.

36. A method according to any one of claims 1 to 30, wherein expanding the target polypeptide comprises: i) attaching a first end of each linker to each first amino acid in each target amino acid pair; and ii) attaching a second end of each linker to each second amino acid in each target amino acid pair, thereby cleaving the target polypeptide between the first and the second amino acids in each target amino acid pair.

37. A method according to claim 36, wherein reacting the second end of the linker with the second amino acid causes cleavage of the target polypeptide.

38. A method according to claim 36 or 37, comprising reacting the second end of the linker with a peptide bond between the first and second amino acids.

39. A method according to any one of claims 1 to 30 or 32 to 35, wherein expanding the target polypeptide comprises: i) attaching a first end of each linker to each first amino acid in each target amino acid pair; ii) attaching a second end of each linker to each second amino acid in each target amino acid pair; and iii) cleaving the target polypeptide between the first and the second amino acids in each target amino acid pair.

40. A method according to claim 39, comprising reacting the second end of the linker with the side chain of the second amino acid.

41. A method according to any one of the preceding claims, wherein the peptide-linker construct comprises a concatemer of contiguous peptide fragments, wherein proximate peptide fragments are linked together, and wherein the sequence order of the amino acids in the concatemer is the same as the sequence order of the amino acids in the target polypeptide.

42. A method according to any one of the preceding claims, wherein the concatemer comprises n contiguous peptide fragments, and wherein the concatemer comprises a structure: N-...[PEPx]1A-1ELink2E-2A[PEPx+1]1A-1ELink2E-...-2A[PEPn]...-C or wherein [PEPx], [PEPx+1], ..., [PEPn] represent the n contiguous peptide fragments; 1A represents the first amino acid in each peptide fragment; 2A represents the second amino acid in each peptide fragment; each Link represents a linker; 1E represents the first end of each linker; 2E represents the second end of each linker; N represents the N-terminus of the concatemer; and C represents the C-terminus of the concatemer.

43. A method according to any one of the preceding claims, comprising attaching a sequencing adapter to the construct.

44. A method according to any one of the preceding claims, comprising loading a motor protein onto the construct or onto a sequencing adapter attached to the construct.

45. A method of moving one or more peptide portions of a target polypeptide with respect to a nanopore, wherein the target polypeptide comprises one or more target amino acid pairs, wherein each target amino acid pair in the target polypeptide comprises a first amino acid attached to a second amino acid; the method comprising expanding the target polypeptide to form a peptide-linker construct; and contacting the peptide-linker construct with a motor protein under conditions such that the motor protein controls the movement of one or more peptide portions of the construct with respect to a nanopore; wherein expanding the target polypeptide comprises attaching a linker between the first and second amino acids in each target amino acid pair, and cleaving the target polypeptide between the first and second amino acids in each target amino acid pair; thereby forming a peptide-linker construct in which the sequence order of the amino acids in the target polypeptide is maintained.

46. A method of producing a peptide-linker construct from a target polypeptide, wherein the target polypeptide comprises one or more target amino acid pairs, wherein each target amino acid pair in the target polypeptide comprises a first amino acid attached to a second amino acid; the method comprising expanding the target polypeptide by attaching a linker between the first and second amino acids in each target amino acid pair, and cleaving the target polypeptide between the first and second amino acids in each target amino acid pair; thereby forming a peptide-linker construct in which the sequence order of the amino acids in the target polypeptide is maintained.

47. A method according to claim 45 or 46, wherein:
- the target polypeptide and / or the linker are as defined in any one of claims 2 to 30; and / or cleaving the target polypeptide is as defined in any one of claims 31 to 40; and / or
- the peptide-linker construct is as defined in any one of claims 41 to 42; and / or
- the method comprises attaching a sequencing adapter to the construct; and / or
- the method is as defined in claim 46 and the method comprises loading a motor protein onto the construct or onto a sequencing adapter attached to the construct.

48. A conjugate, comprising a target polypeptide comprising one or more target amino acid pairs, wherein each target amino acid pair in the target polypeptide comprises a first amino acid attached to a second amino acid; and one or more linkers attached to the first amino acid in each target amino acid pair, wherein the one or more linkers are each also optionally attached to the second amino acid in each target amino acid pair; optionally wherein:
- the conjugate comprises a sequencing adaptor; and / or
- the conjugate comprises a motor protein capable of controlling the movement of a peptide-linker construct with respect to a nanopore.

49. A peptide-linker construct comprising a concatemer of contiguous peptide fragments, wherein proximate peptide fragments are linked together by a plurality of linkers; optionally wherein:
- the construct comprises a sequencing adaptor; and / or
- the construct comprises a motor protein capable of controlling the movement of the construct with respect to a nanopore; and / or
- the construct is anchored to a membrane comprising a nanopore.

50. A kit for modifying a target polypeptide, comprising a linker having a first end capable of selectively reacting with an optionally-activated first amino acid comprised in a target amino acid pair comprised in the target polypeptide, and a second end capable of reacting with an optionally-activated second amino acid comprised in the target amino acid pair; and a chemical or enzymatic reagent capable of cleaving the target polypeptide between the first amino acid and the second amino acid; and optionally comprising one or more of: a sequencing adapter capable of selectively reacting with the target polypeptide or the linker; a motor protein capable of controlling the movement of a peptide-linker construct with respect to a nanopore; a nanopore capable of detecting one or more characteristics of a peptide-linker construct as the construct moves with respect to the nanopore.