Machine learning model for generation of recombinant polyclonal proteins and methods of use thereof
ML/AI methods enable precise design of recombinant polyclonal proteins, addressing supply and quality issues of plasma-derived antibodies, ensuring consistent and effective therapeutic performance.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- GIGAGEN INC
- Filing Date
- 2025-12-16
- Publication Date
- 2026-06-25
AI Technical Summary
Plasma-derived antibody therapeutics face challenges such as supply shortages, impurities, allergic reactions, suboptimal effector properties, and batch-to-batch variation, limiting their effectiveness and safety.
Generation of recombinant polyclonal proteins (RPPs) using machine learning (ML) and artificial intelligence (AI) methods to design antibodies with controlled parameters, predicting desired properties and reducing development risks.
ML/AI-designed RPPs offer precise antibody design, reducing costs and mitigating risks by predicting issues like cross-reactivity and immunogenicity, resulting in consistent and effective therapeutic outcomes.
Abstract
Description
Attorney Docket No.: 28152-59621/WO
Client Ref. No.: 024WO
MACHINE LEARNING MODEL FOR GENERATION OF RECOMBINANT POLYCLONAL PROTEINS AND METHODS OF USE THEREOF
1. CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Patent Application No. 63/734,683, filed on December 16, 2024, the contents of which are incorporated by reference herein in their entirety.
2. FIELD
[0002] Provided herein are recombinant polyclonal proteins (RPPs), also called recombinant polyclonal antibody proteins, recombinant hyperimmune globulins, or simply recombinant hyperimmunes, with binding specificity for a target molecule or complex of molecules. Specifically, the present disclosure provides custom recombinant polyclonal proteins (custom RPPs or cRPPs) generated by machine learning or artificial intelligence (AI) methods. Also provided are methods of making RPPs, and methods of using RPPs, for example, for therapeutic purposes.
3. BACKGROUND
[0003] Many diseases, such as those caused by infectious viruses or bacteria with many variants or serotypes, are best treated by drugs that target multiple epitopes. An established therapeutic modality is multispecific (multivalent) antibodies derived from human or animal plasma, such as intravenous immunoglobulin (IVIG). Polyclonal antibody drugs with higher potency, known as hyperimmune globulins, are often derived from the plasma of recently vaccinated human donors, for example, HepaGam B against hepatitis B virus (HBV) and BabyBIG against infant botulism. In diseases for which human vaccination is not possible, hyperimmune globulins can be generated by immunizing animals, for example, rabbit-derived thymoglobulin ('rabbit-ATG') against human thymocytes for transplant tolerance. For rapid response to emerging pathogens with poorly characterized neutralizing epitopes, many groups have developed hyperimmune globulins derived from immunized animal plasma or convalescent human serum, for example, hyperimmune globulins against Zika virus or severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2).
[0004] Plasma-derived antibody therapeutics have substantial drawbacks. First, demand for normal and convalescent donor plasma often outstrips supply. Plasma-derived drugs have suffered from impurities, including infectious viruses and clotting factors, that have resulted in serious adverse events. Antibody drugs derived from animal plasma occasionally cause allergic reactions, lead to antidrug antibodies, and have suboptimal effector properties. Because they are derived from naturally occurring proteins, plasma-derived drugs are not easily engineered; for example, it is not possible to modify Fc sequences to improve mechanism of action or drug half-life. Finally, each batch of plasma-derived drug is usually derived from a different cohort of human donors or animals, resulting in batch-to-batch variation.
[0005] Many of these problems could be solved by generating multivalent hyperimmune globulins using recombinant DNA technology.
4. SUMMARY
[0006] Provided herein are RPPs generated, at least in part, by machine learning (ML) or artificial intelligence (AI) methods. Generation of RPPs by the ML/AI method has several significant advantages. First, use of AI/ML methods allows for a high degree of control over the antibody design process. It is possible to predict numerous parameters to select antibodies with desired properties such as binding affinity, specificity, and stability. It is also possible to predict overall characteristics of an ABP library to select and optimize RPPs. Additionally, by narrowing down the possibilities to a smaller set of highly promising candidates before moving on to actual synthesis and testing, ML/AI design can help to significantly reduce the costs associated with antibody development. With ML/AI design, researchers can predict potential issues related to cross-reactivity, immunogenicity, stability, and manufacturability, thereby mitigating risks in early stages of development.
[0007] The present disclosure provides various ML/AI methods for designing RPPs, and RPPs generated by the ML/AI approach. The RPPs comprise recombinant ABPs, and their sequences can be derived from any number of suitable sources, including human or other mammalian samples, known antibodies, synthetic antibodies, antibody databases, and any combinations thereof, or they can also be generated by ML/AI models.
[0008] In some embodiments, provided herein is a method of training a machine-learning model for predicting a characteristic descriptor for an antigen binding protein (ABP) comprising: (1) obtaining a training dataset including a plurality of samples corresponding to a plurality of ABPs, each sample including a token sequence encoding a partial or full sequence of the respective ABP and a label indicating a value of the characteristic descriptor for the ABP; (2) accessing a pre-trained transformer model and a prediction layer; (3) dividing the training dataset into one or more batches of samples for one or more iterations; and (4) for each of one or more iterations: (a) obtaining a set of output embedding sequences for the respective batch of samples for a current iteration, wherein the set of output embedding sequences are generated by applying the transformer model to the token sequences for the batch of samples, (b) for each sample in the batch, applying the parameters of the prediction layer to the output embedding sequence for the sample to generate an estimated output, (c) computing a loss function indicating differences between the labels and the estimated outputs for the batch of samples, and (d) updating the parameters of the prediction layer by backpropagating error terms obtained from the loss function.
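The training loop in steps (1) through (4) can be illustrated with a minimal, dependency-free sketch. Everything named here is an assumption made for illustration: `frozen_transformer` is a toy stand-in for a pre-trained model, the prediction layer is a single linear unit applied to a mean-pooled embedding, the loss is mean squared error, and the toy labels are invented. The disclosure does not specify any of these choices.

```python
import math
import random

EMBED_DIM = 4  # hypothetical embedding width

def frozen_transformer(tokens):
    """Toy stand-in for a pre-trained transformer: one embedding per token."""
    return [[math.sin((t + 1) * (d + 1)) for d in range(EMBED_DIM)] for t in tokens]

def mean_pool(embeddings):
    """Collapse an output embedding sequence into one vector (an assumption)."""
    n = len(embeddings)
    return [sum(vec[d] for vec in embeddings) / n for d in range(EMBED_DIM)]

def predict(weights, bias, pooled):
    """Linear prediction layer."""
    return sum(w * x for w, x in zip(weights, pooled)) + bias

def train_prediction_layer(samples, batch_size=2, epochs=200, lr=0.05, seed=0):
    """samples: list of (token_sequence, label). Only the prediction layer is updated."""
    rng = random.Random(seed)
    weights, bias = [0.0] * EMBED_DIM, 0.0
    # The transformer is frozen, so its outputs can be computed once up front.
    features = [(mean_pool(frozen_transformer(toks)), label) for toks, label in samples]
    losses = []
    for _ in range(epochs):
        rng.shuffle(features)
        epoch_loss = 0.0
        for start in range(0, len(features), batch_size):
            batch = features[start:start + batch_size]
            grad_w, grad_b = [0.0] * EMBED_DIM, 0.0
            for pooled, label in batch:
                err = predict(weights, bias, pooled) - label
                epoch_loss += err * err
                # Gradient of the batch-mean squared error w.r.t. each parameter.
                for d in range(EMBED_DIM):
                    grad_w[d] += 2 * err * pooled[d] / len(batch)
                grad_b += 2 * err / len(batch)
            weights = [w - lr * g for w, g in zip(weights, grad_w)]
            bias -= lr * grad_b
        losses.append(epoch_loss / len(features))
    return weights, bias, losses

# Invented toy data: (token sequence, descriptor label).
toy_samples = [([0, 5, 7, 3], 1.0), ([2, 2, 9], 0.2), ([11, 4, 4, 8, 1], 0.7), ([6, 0, 13], 0.4)]
weights, bias, losses = train_prediction_layer(toy_samples)
```

Because the transformer parameters stay fixed in this sketch, only `weights` and `bias` change between iterations, which mirrors step (4)(d) updating the prediction layer alone.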
[0009] In some embodiments, according to any of the methods described above, (a) the characteristic descriptor for the respective ABP is selected from: (i) a binding affinity of the respective ABP for a respective target antigen; (ii) an effector activity of the respective ABP against the target molecule or complex; (iii) a solubility score of the respective ABP; (iv) an aggregation score of the respective ABP; (v) a hydrophobicity score of the respective ABP; (vi) an isoelectric point of the respective ABP; (vii) a stability score of the respective ABP; (viii) a molecular weight of the respective ABP; (ix) a number of unpaired cysteine residues in the respective ABP; (x) an abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen; (xi) a fold-change of the increase in the abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen, as compared to the abundance frequency prior to enrichment; (xii) a number of non-canonical glycosylation sites in the respective ABP; (xiii) a number of cleavage sites in the respective ABP; (xiv) a number of deamidation sites in the respective ABP; (xv) a number of isomerization sites in the respective ABP; (xvi) a number of oxidation sites in the respective ABP; (xvii) CDR3H length of the respective ABP; (xviii) binding specificity of the respective ABP; (xix) immunogenicity of the respective ABP; (xx) polyspecificity of the respective ABP; and (xxi) a respective epitope that the respective ABP binds to.
[0010] In some embodiments, according to any of the methods described above, the characteristic descriptors comprise the binding affinity of the respective ABP for a respective target antigen.
[0011] In some embodiments, according to any of the methods described above, the binding affinity is determined by a PolyMap assay comprising the steps of: providing a library of target-decorated cells, wherein each of the target-decorated cells presents the target molecule or complex on the membrane; contacting the library of target-decorated cells with a plurality of ABP-ribosome-mRNA (ARM) complexes corresponding to the one or more of the plurality of ABPs, thereby inducing binding between the target-decorated cells and the ARM complexes; generating a plurality of monodisperse or polydisperse emulsion microdroplets, wherein each microdroplet contains a single cell out of the target-decorated cells, one or more ARM complexes bound to the single cell, and a lysis reagent inducing lysis of the single cell; capturing RNA released from the single cell on a solid surface or within a semi-permeable shell; generating a library of hybrid polynucleic acids that comprise a sequence from a transcript of the single cell and/or a sequence from the mRNA of the ARM complex; sequencing the library of hybrid polynucleic acids; and determining a presence or absence of binding of each of the one or more of the plurality of ABPs to their respective target antigen.
[0012] In some embodiments, according to any of the methods described above, the characteristic descriptor comprises the effector activity of the respective ABP against the target molecule or target molecule complex.
[0013] In some embodiments, according to any of the methods described above, the target molecule or target molecule complex comprises a virus, and the effector activity is a neutralization activity determined by a pseudovirus neutralization assay or a live virus neutralization assay.
[0014] In some embodiments, according to any of the methods described above, the target molecule or target molecule complex comprises a bacterium, and the effector activity is a bactericidal activity determined by a serum bactericidal assay (SBA) or an opsonophagocytic killing assay (OPKA).
[0015] In some embodiments, according to any of the methods described above, the characteristic descriptor comprises the solubility score, optionally wherein the solubility score is determined using SKADE.
[0016] In some embodiments, according to any of the methods described above, the characteristic descriptor comprises the aggregation score, optionally wherein the aggregation score corresponds to the number of residues predicted to have a propensity to aggregate and is determined by a method comprising the steps of: determining a 3D structure of the ABP, optionally wherein the 3D structure is determined using ABodyBuilder2; and determining the aggregation score based on the 3D structure, optionally wherein the aggregation score is determined using Aggrescan3D.
[0017] In some embodiments, according to any of the methods described above, the characteristic descriptor comprises the hydrophobicity score, optionally wherein the hydrophobicity score is determined as the grand average of hydropathy (GRAVY), optionally wherein the hydropathy value of each amino acid is calculated using the Eisenberg scale.
[0018] In some embodiments, according to any of the methods described above, the characteristic descriptor comprises the isoelectric point, optionally wherein the isoelectric point is determined as EMBOSS pK values.
[0019] In some embodiments, according to any of the methods described above, the characteristic descriptor comprises the stability score, optionally wherein the stability score is determined by a method comprising the steps of: calculating an aliphatic index by determining the relative volume of A, V, L, and I residues, wherein the stability score corresponds to the aliphatic index.
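As an illustration of the stability score described above, the following minimal sketch computes an aliphatic index using the widely cited Ikai formulation, in which the mole percentages of alanine, valine, isoleucine, and leucine are weighted by relative side-chain volume (coefficients 2.9 for V and 3.9 for I and L). The disclosure does not state which coefficients it uses, so the weights and the function name `aliphatic_index` are assumptions.

```python
def aliphatic_index(sequence):
    """Aliphatic index of an amino acid sequence (Ikai-style weighting, assumed)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)

    def mole_percent(residue):
        # Percentage of positions in the sequence occupied by this residue.
        return 100.0 * seq.count(residue) / n

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )
```

For example, a sequence with no A, V, I, or L residues scores 0, and the value grows as the aliphatic fraction increases.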
[0020] In some embodiments, according to any of the methods described above, the characteristic descriptor comprises an abundance frequency or fold-change of the increase in the abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen, optionally wherein the sorting process is fluorescence-activated cell sorting (FACS) or magnetic-activated cell sorting (MACS), further optionally wherein the sorting is carried out by yeast display.
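The abundance frequency and fold-change descriptors can be sketched as follows, assuming that each sequencing read has already been assigned to an ABP clone identifier. The function names and the choice to return `None` for clones that were not observed before sorting are illustrative assumptions, not part of the disclosure.

```python
from collections import Counter

def abundance_frequencies(clone_ids):
    """Fraction of reads assigned to each clone."""
    counts = Counter(clone_ids)
    total = sum(counts.values())
    return {clone: n / total for clone, n in counts.items()}

def enrichment_fold_change(pre_sort_reads, post_sort_reads):
    """Post-sort frequency divided by pre-sort frequency, per clone.

    Clones absent from the pre-sort sample have no defined fold-change,
    so they map to None here (an assumption; pseudocounts are another option).
    """
    pre = abundance_frequencies(pre_sort_reads)
    post = abundance_frequencies(post_sort_reads)
    return {
        clone: (f_post / pre[clone] if clone in pre else None)
        for clone, f_post in post.items()
    }
```

A clone whose share of reads rises from 20% before sorting to 50% after sorting would have a fold-change of 2.5.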
[0021] In some embodiments, according to any of the methods described above, the characteristic descriptor comprises the number of cleavage sites, optionally wherein the cleavage site is a DP motif in the variable heavy or variable light chain region of the respective ABP.
[0022] In some embodiments, according to any of the methods described above, the characteristic descriptor comprises the number of deamidation sites, optionally wherein the deamidation site is an NG, NS, or NA motif in CDR2H or CDR1L of the respective ABP.
[0023] In some embodiments, according to any of the methods described above, the characteristic descriptor comprises the number of isomerization sites, optionally wherein the isomerization site is a DG or DS motif in CDR2H, CDR3H, or CDR1L of the respective ABP.
[0024] In some embodiments, according to any of the methods described above, the characteristic descriptor comprises the number of oxidation sites, optionally wherein the oxidation site is a W or M residue in the CDRHs or CDRLs of the respective ABP.
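The sequence-liability counts in the preceding paragraphs (cleavage, deamidation, isomerization, and oxidation sites) reduce to motif searches within specific regions. The sketch below assumes the ABP has already been annotated into regions; the dictionary keys such as `"VH"` and `"CDR2H"` and the function names are hypothetical, and region annotation itself is outside the sketch.

```python
ALL_CDRS = ("CDR1H", "CDR2H", "CDR3H", "CDR1L", "CDR2L", "CDR3L")

def count_motifs(seq, motifs):
    """Count occurrences of any of the motifs, scanning every start position."""
    return sum(1 for i in range(len(seq)) for m in motifs if seq.startswith(m, i))

def liability_counts(regions):
    """regions: dict mapping a region name to its amino acid sequence."""
    def get(name):
        return regions.get(name, "").upper()

    return {
        # DP motif in the variable heavy or variable light chain region
        "cleavage": sum(count_motifs(get(r), ["DP"]) for r in ("VH", "VL")),
        # NG, NS, or NA motif in CDR2H or CDR1L
        "deamidation": sum(
            count_motifs(get(r), ["NG", "NS", "NA"]) for r in ("CDR2H", "CDR1L")
        ),
        # DG or DS motif in CDR2H, CDR3H, or CDR1L
        "isomerization": sum(
            count_motifs(get(r), ["DG", "DS"]) for r in ("CDR2H", "CDR3H", "CDR1L")
        ),
        # W or M residue in any heavy or light chain CDR
        "oxidation": sum(count_motifs(get(r), ["W", "M"]) for r in ALL_CDRS),
    }
```

Restricting each search to the listed regions follows the optional limitations in the text; a stricter or looser region set would only change the tuples passed to each count.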
[0025] In some embodiments, according to any of the methods described above, the characteristic descriptor comprises the binding specificity, optionally wherein the binding specificity corresponds to the number of variants of a target antigen capable of being targeted by the respective ABP, further optionally wherein the binding specificity is determined by a PolyMap assay.
[0026] In some embodiments, according to any of the methods described above, the method further comprises: (5) obtaining another training dataset including one or more token sequences for one or more ABPs, each token sequence encoding a partial or full sequence of a respective ABP; (6) dividing the training dataset into one or more batches of token sequences for one or more iterations; and (7) for each of one or more iterations: (a) obtaining an estimated output embedding sequence by applying parameters of the transformer model to the token sequence; (b) for each token sequence, mapping the estimated output embedding sequence to an estimated token sequence; (c) computing a loss function indicating differences between the estimated token sequences and the token sequences for the one or more ABPs; and (d) updating the parameters of the transformer model by backpropagating error terms from the loss function.
[0027] In some embodiments, according to any of the methods described above, the transformer model includes one or more attention layers, an attention layer coupled to receive a query, a key, and a value, and generate an attention output by combining the query, the key, and the value.
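One common way for an attention layer to combine a query, a key, and a value is scaled dot-product attention. The disclosure does not say which form of attention is used, so the following dependency-free sketch is only an assumed example of the general idea: each query is scored against every key, the scores are normalized with a softmax, and the values are averaged using those weights.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(key dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs
```

A query that is equally similar to all keys returns the plain average of the values, while a query strongly aligned with one key returns an output close to that key's value.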
[0028] In some embodiments, according to any of the methods described above, the token sequence of each sample includes a set of tokens, each token numerically encoding a respective residue of the full sequence or the partial sequence of the ABP of the sample.
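A token sequence of this kind can be produced by mapping each residue to an integer index. The vocabulary below (the 20 standard amino acids plus `<pad>` and `<unk>` tokens) and the padding behavior are assumptions for illustration; the disclosure does not fix a particular vocabulary.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SPECIAL_TOKENS = ["<pad>", "<unk>"]  # hypothetical special tokens
VOCAB = {tok: idx for idx, tok in enumerate(SPECIAL_TOKENS + list(AMINO_ACIDS))}

def tokenize(sequence, max_len=None):
    """Encode a full or partial ABP sequence as a list of integer tokens.

    Unknown residues map to <unk>. If max_len is given, the result is
    truncated or right-padded with <pad> to exactly max_len tokens.
    """
    ids = [VOCAB.get(res, VOCAB["<unk>"]) for res in sequence.upper()]
    if max_len is not None:
        ids = ids[:max_len] + [VOCAB["<pad>"]] * max(0, max_len - len(ids))
    return ids
```

With this vocabulary, `tokenize("ACD")` yields `[2, 3, 4]`, since indices 0 and 1 are reserved for the special tokens.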
[0029] In some embodiments, provided herein is a computer readable medium storing the parameters for the prediction layer trained according to any of the methods described above.
[0030] In some embodiments, provided herein is a method of predicting the characteristic descriptor for a particular ABP, comprising (5) obtaining a token sequence encoding a partial or full sequence of the particular ABP; and (6) applying the machine-learning model created by any of the methods described above to the token sequence for the particular ABP to generate a prediction for the characteristic descriptor.
[0031] In some embodiments, provided herein is an RPP comprising at least a set of ABPs specific for a target molecule or complex, wherein the RPP is formed according to any of the methods described above.
[0032] In some embodiments, provided herein is a method of selecting a filtered antigen binding protein (ABP) library dataset from a set of candidate ABP library datasets, comprising: (1) obtaining an input ABP library dataset including an ABP profile for each of a plurality of ABPs; (2) generating a set of candidate ABP library datasets each corresponding to a respective subset of ABPs from the plurality of ABPs; (3) for each ABP in a respective candidate ABP library dataset, obtaining values for a plurality of characteristic descriptors for the respective ABP, wherein the plurality of characteristic descriptors is selected from: (i) a binding affinity of the respective ABP for the respective target antigen; (ii) an effector activity of the respective ABP against the target molecule or complex; (iii) a solubility score of the respective ABP; (iv) an aggregation score of the respective ABP; (v) a hydrophobicity score of the respective ABP; (vi) an isoelectric point of the respective ABP; (vii) a stability score of the respective ABP; (viii) a molecular weight of the respective ABP; (ix) a number of unpaired cysteine residues in the respective ABP; (x) an abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen; (xi) a fold-change of the increase in the abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen, as compared to the abundance frequency prior to enrichment; (xii) a number of non-canonical glycosylation sites in the respective ABP; (xiii) a number of cleavage sites in the respective ABP; (xiv) a number of deamidation sites in the respective ABP; (xv) a number of isomerization sites in the respective ABP; (xvi) a number of oxidation sites in the respective ABP; (xvii) CDR3H length of the respective ABP; (xviii) binding specificity of the respective ABP; (xix) immunogenicity of the respective ABP; (xx) polyspecificity of the respective ABP; and (xxi) a respective epitope that the respective ABP binds to, wherein a value for at least one characteristic descriptor for the respective ABP is generated by applying a machine-learning model; (4) for each ABP in a respective candidate ABP library dataset, obtaining values associated with one or more preferred library properties of the candidate ABP library dataset, wherein the one or more preferred library properties are selected from: (i) the set of heavy chain CDR3 sequences contained in the subset of the plurality of ABPs comprises at least about 10, 20, 50, 100, 200, or 1000 unique sequences; (ii) the subset of the plurality of ABPs specifically bind to at least two unique epitopes associated with the target molecule or complex; (iii) the subset of the plurality of ABPs is capable of modulating at least two target antigen variants; (iv) the set of heavy chain V genes represented in the subset of the plurality of ABPs comprises at least two unique V genes; (v) the set of light chain V genes represented in the subset of the plurality of ABPs comprises at least two unique V genes; (vi) the set of heavy chain J genes represented in the subset of the plurality of ABPs comprises at least two unique J genes; (vii) the set of light chain J genes represented in the subset of the plurality of ABPs comprises at least two unique J genes; (viii) the average percent germline identity of heavy chain V genes represented in the subset of the plurality of ABPs is between about 50% and about 100%; (ix) the average percent germline identity of light chain V genes represented in the subset of the plurality of ABPs is between about 50% and about 100%; (x) the average percent germline identity of heavy chain J genes represented in the subset of the plurality of ABPs is between about 50% and about 100%; and (xi) the average percent germline identity of light chain J genes represented in the subset of the plurality of ABPs is between about 50% and about 100%; (5) for each candidate ABP library dataset from the set of candidate ABP library datasets, generating a score based on the values of the plurality of characteristic descriptors of each of the respective subset of ABPs for the candidate ABP library dataset, and the values associated with the one or more preferred library properties of the candidate ABP library dataset; and (6) selecting a candidate ABP library dataset as the filtered ABP library dataset based on the generated scores for the set of candidate ABP library datasets, wherein the selected candidate ABP library dataset is associated with a respective score that is equal to or above a threshold.
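The selection procedure in steps (1) through (6) can be sketched as an exhaustive search over small candidate subsets. Every specific choice below is an assumption made for illustration: the ABP profile fields, the three library properties that are computed, the scoring weights, and the rule of returning the highest-scoring candidate among those at or above the threshold. The disclosure only requires that the selected candidate meet the threshold.

```python
from itertools import combinations

def library_properties(library):
    """Diversity-style properties of a candidate library (a few of those listed)."""
    return {
        "unique_cdr3h": len({abp["cdr3h"] for abp in library}),
        "unique_epitopes": len({abp["epitope"] for abp in library}),
        "unique_vh_genes": len({abp["vh_gene"] for abp in library}),
    }

def score_library(library):
    """Hypothetical score: reward affinity and diversity, penalize aggregation."""
    n = len(library)
    mean_affinity = sum(abp["affinity"] for abp in library) / n
    mean_aggregation = sum(abp["aggregation"] for abp in library) / n
    props = library_properties(library)
    return (
        mean_affinity
        - 0.5 * mean_aggregation
        + 0.2 * props["unique_epitopes"]
        + 0.1 * props["unique_vh_genes"]
    )

def select_filtered_library(abps, subset_size, threshold):
    """Return (score, library) for the best candidate at or above threshold, else None."""
    best = None
    for subset in combinations(abps, subset_size):
        score = score_library(subset)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, list(subset))
    return best

# Invented example profiles; descriptor values would come from assays or a model.
example_abps = [
    {"name": "A", "affinity": 0.9, "aggregation": 0.2, "cdr3h": "ARDY", "epitope": "E1", "vh_gene": "IGHV1-69"},
    {"name": "B", "affinity": 0.8, "aggregation": 0.1, "cdr3h": "ARGG", "epitope": "E2", "vh_gene": "IGHV3-23"},
    {"name": "C", "affinity": 0.4, "aggregation": 0.6, "cdr3h": "ARDY", "epitope": "E1", "vh_gene": "IGHV1-69"},
]
selection = select_filtered_library(example_abps, subset_size=2, threshold=1.0)
```

Enumerating every subset is only practical for very small inputs; a realistic library would need sampling or a heuristic search, which this sketch does not attempt.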
[0033] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the binding affinity of the respective ABP for a respective target antigen.
[0034] In some embodiments, according to any of the methods described above, the binding affinity is determined by a PolyMap assay comprising the steps of: providing a library of target-decorated cells, wherein each of the target-decorated cells presents the target molecule or complex on the membrane; contacting the library of target-decorated cells with a plurality of ABP-ribosome-mRNA (ARM) complexes corresponding to the one or more of the plurality of ABPs, thereby inducing binding between the target-decorated cells and the ARM complexes; generating a plurality of monodisperse or polydisperse emulsion microdroplets, wherein each microdroplet contains a single cell out of the target-decorated cells, one or more ARM complexes bound to the single cell, and a lysis reagent inducing lysis of the single cell; capturing RNA released from the single cell on a solid surface or within a semi-permeable shell; generating a library of hybrid polynucleic acids that comprise a sequence from a transcript of the single cell and/or a sequence from the mRNA of the ARM complex; sequencing the library of hybrid polynucleic acids; and determining a presence or absence of binding of each of the one or more of the plurality of ABPs to their respective target antigen.
[0035] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the effector activity of the respective ABP against the target molecule or target molecule complex.
[0036] In some embodiments, according to any of the methods described above, the target molecule or target molecule complex comprises a virus, and the effector activity is a neutralization activity determined by a pseudovirus neutralization assay or a live virus neutralization assay.
[0037] In some embodiments, according to any of the methods described above, the target molecule or target molecule complex comprises a bacterium, and the effector activity is a bactericidal activity determined by a serum bactericidal assay (SBA) or an opsonophagocytic killing assay (OPKA).
[0038] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the solubility score, optionally wherein the solubility score is determined using SKADE.
[0039] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the aggregation score, optionally wherein the aggregation score corresponds to the number of residues predicted to have a propensity to aggregate and is determined by a method comprising the steps of: determining a 3D structure of the ABP, optionally wherein the 3D structure is determined using ABodyBuilder2; and determining the aggregation score based on the 3D structure, optionally wherein the aggregation score is determined using Aggrescan3D.
[0040] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the hydrophobicity score, optionally wherein the hydrophobicity score is determined as the grand average of hydropathy (GRAVY), optionally wherein the hydropathy value of each amino acid is calculated using the Eisenberg scale.
[0041] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the isoelectric point, optionally wherein the isoelectric point is determined as EMBOSS pK values.
[0042] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the stability score, optionally wherein the stability score is determined by a method comprising the steps of: calculating an aliphatic index by determining the relative volume of A, V, L, and I residues, wherein the stability score corresponds to the aliphatic index.
[0043] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises an abundance frequency or fold-change of the increase in the abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen, optionally wherein the sorting process is fluorescence-activated cell sorting (FACS) or magnetic-activated cell sorting (MACS), further optionally wherein the sorting is carried out by yeast display.
[0044] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the number of cleavage sites, optionally wherein the cleavage site is a DP motif in the variable heavy or variable light chain region of the respective ABP.
[0045] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the number of deamidation sites, optionally wherein the deamidation site is an NG, NS, or NA motif in CDR2H or CDR1L of the respective ABP.
[0046] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the number of isomerization sites, optionally wherein the isomerization site is a DG or DS motif in CDR2H, CDR3H, or CDR1L of the respective ABP.
[0047] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the number of oxidation sites, optionally wherein the oxidation site is a W or M residue in the CDRHs or CDRLs of the respective ABP.
[0048] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the binding specificity, optionally wherein the binding specificity corresponds to the number of variants of a target antigen capable of being targeted by the respective ABP, further optionally wherein the binding specificity is determined by a PolyMap assay.
[0049] In some embodiments, according to any of the methods described above, the value for the at least one characteristic descriptor for the respective ABP is generated by: (7) obtaining a token sequence for the respective ABP encoding a partial or full sequence of the respective ABP; (8) generating an output embedding sequence by applying a trained model to the token sequence for the respective ABP; and (9) predicting the value for the at least one characteristic descriptor by applying the trained model to the token sequence for the respective ABP.
[0050] In some embodiments, according to any of the methods described above, the token sequence of each sample includes a set of tokens, each token numerically encoding a respective residue of the full sequence or the partial sequence of the ABP.
[0051] In some embodiments, according to any of the methods described above, the machine-learning model is configured as a neural network model.
[0052] In some embodiments, according to any of the methods described above, wherein (1) the score for each candidate ABP library dataset decreases when the stability score associated with (3)(vii) of each ABP in the respective subset of ABPs increases, (2) the score for each candidate ABP library dataset increases when a diversity associated with (4)(ii) increases, (3) the score for each candidate ABP library dataset increases when an immunogenicity of the respective subset of ABPs decreases, (4) the score for each candidate ABP library dataset increases when a diversity associated with (4)(iv) or (4)(v) increases, when a higher score for an ABP library dataset indicates better performance of the ABP library dataset.
[0053] In some embodiments, provided herein is a recombinant polyclonal protein (RPP) comprising at least a set of ABPs specific for a target molecule or complex, wherein the RPP is formed according to any of the methods described above.
[0054] In some embodiments, provided herein is a method of predicting a characteristic descriptor for an ABP, comprising: (1) obtaining a set of tokens encoding a partial or full sequence of the ABP; (2) applying a transformer model to the set of tokens to generate a set of output embeddings for the ABP; (3) determining a representation for the ABP from at least the set of output embeddings for the ABP; and (4) generating a prediction for the characteristic descriptor by applying one or more machine-learning models to the representation for the ABP.
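The four prediction steps above can be outlined in a short sketch. The names and design choices are assumptions: `toy_transformer` is a stand-in for a real transformer, the representation is a mean-pooled embedding optionally extended with extra features, and the "one or more machine-learning models" are arbitrary callables whose predictions are averaged.

```python
import math

def toy_transformer(tokens, dim=3):
    """Hypothetical stand-in for a transformer: one output embedding per token."""
    return [[math.cos(t * (d + 1)) for d in range(dim)] for t in tokens]

def represent(embeddings, extra_features=()):
    """Mean-pool the output embeddings and append any additional features."""
    dim = len(embeddings[0])
    pooled = [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]
    return pooled + list(extra_features)

def predict_descriptor(tokens, models, extra_features=()):
    """Average the predictions of one or more models applied to the representation."""
    representation = represent(toy_transformer(tokens), extra_features)
    predictions = [model(representation) for model in models]
    return sum(predictions) / len(predictions)
```

Here each model is any function that accepts the representation vector and returns a number, so a trained regressor could be dropped in without changing the surrounding steps.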
[0055] In some embodiments, provided herein is a method of training a machine-learning model for selecting a filtered ABP library dataset, comprising: (1) obtaining an input antigen binding protein (ABP) library dataset including an ABP profile for each of a plurality of ABPs; (2) obtaining a training dataset including a plurality of sample ABP library datasets each corresponding to a respective subset of ABPs, wherein each sample ABP library dataset includes token sequences encoding a partial or full sequence of each ABP in the respective subset of ABPs for the sample; (3) for each ABP in a sample candidate ABP library dataset, obtaining values for a plurality of characteristic descriptors for the respective ABP, wherein the plurality of characteristic descriptors is selected from: (i) a binding affinity of the respective ABP for the respective target antigen; (ii) an effector activity of the respective ABP against the target molecule or complex; (iii) a solubility score of the respective ABP; (iv) an aggregation score of the respective ABP; (v) a hydrophobicity score of the respective ABP; (vi) an isoelectric point of the respective ABP; (vii) a stability score of the respective ABP; (viii) a molecular weight of the respective ABP; (ix) a number of unpaired cysteine residues in the respective ABP; (x) an abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen; (xi) a fold-change of the increase in the abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen, as compared to the abundance frequency prior to enrichment; (xii) a number of non-canonical glycosylation sites in the respective ABP; (xiii) a number of cleavage sites in the respective ABP; (xiv) a number of deamidation sites in the respective ABP; (xv) a number of isomerization sites in the respective ABP; (xvi) a number of oxidation sites in the respective ABP; (xvii) CDR3H length of the respective ABP; (xviii) binding specificity of the respective ABP; (xix) immunogenicity of the respective ABP; (xx) polyspecificity of the respective ABP; and (xxi) a respective epitope that the respective ABP binds to; (4) for each ABP in a sample candidate ABP library dataset, obtaining values associated with one or more preferred library properties of the sample candidate ABP library dataset, wherein the one or more preferred library properties are selected from: (i) the set of heavy chain CDR3 sequences contained in the subset of the plurality of ABPs comprises at least about 10, 20, 50, 100, 200, or 1000 unique sequences; (ii) the subset of the plurality of ABPs specifically bind to at least two unique epitopes associated with the target molecule or complex; (iii) the subset of the plurality of ABPs is capable of modulating at least two target antigen variants; (iv) the set of heavy chain V genes represented in the subset of the plurality of ABPs comprises at least two unique V genes; (v) the set of light chain V genes represented in the subset of the plurality of ABPs comprises at least two unique V genes; (vi) the set of heavy chain J genes represented in the subset of the plurality of ABPs comprises at least two unique J genes; (vii) the set of light chain J genes represented in the subset of the plurality of ABPs comprises at least two unique J genes; (viii) the average percent germline identity of heavy chain V genes represented in the subset of the plurality of ABPs is between about 50% and about 100%; (ix) the average percent germline identity of light chain V genes represented in the subset of the plurality of ABPs is between about 50% and about 100%; (x) the average percent germline identity of heavy chain J genes represented in the subset of the plurality of ABPs is between about 50% and about 100%; and (xi) the average percent germline 
identity of light chain J genes represented in the subset of the plurality of ABPs is between about 50% and about 100%; and (5) for each sample ABP library dataset, computing a score based on the values of the plurality7of characteristic descriptors of each of the respective subset of ABP’s for the sample ABP library7dataset, and the values associated with the one or more preferred library properties of the sample ABP library dataset; (6) dividing the training dataset into one or more batches of samples for one or more iterations; and (7) for each of one or more iterations: (a) for each sample ABP library dataset in the batch, applying parameters of a machine-learning model to the token sequences for the sample ABP library7dataset to generate an estimated output, (b) computing a loss function indicating differences between the scores and the estimated outputs for the batch of sample ABP library datasets, and (c) updating the weights of the machine-learning model by backpropagating error terms obtained from the loss function.
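Steps (6) and (7) of the training method can be illustrated with a deliberately small sketch. It is not the disclosed model: the composition-based featurization, the linear scorer, the mean-squared-error loss, and every name below (`featurize`, `train`, `mse`, the toy libraries and scores) are assumptions made only for illustration. For a linear scorer, backpropagation reduces to the single chain-rule step shown in the inner loop.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_ID = {aa: i for i, aa in enumerate(AMINO_ACIDS)}  # hypothetical vocabulary


def featurize(library):
    """Residue-composition vector pooled over every token sequence in one library."""
    counts = [0.0] * len(AMINO_ACIDS)
    total = 0
    for seq in library:
        for aa in seq:
            counts[TOKEN_ID[aa]] += 1.0
            total += 1
    return [c / total for c in counts]


def predict(weights, bias, features):
    """Estimated output of the (assumed) linear scoring model."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def mse(weights, bias, samples, scores):
    """Loss: mean squared difference between target scores and estimated outputs."""
    errs = [predict(weights, bias, featurize(s)) - y for s, y in zip(samples, scores)]
    return sum(e * e for e in errs) / len(errs)


def train(samples, scores, epochs=200, batch_size=2, lr=0.2, seed=0):
    """Mini-batch gradient descent: batch the samples, compute the loss gradient, update."""
    rng = random.Random(seed)
    weights, bias = [0.0] * len(AMINO_ACIDS), 0.0
    feats = [featurize(s) for s in samples]
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        # Step (6): divide the training dataset into batches.
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grad_w, grad_b = [0.0] * len(weights), 0.0
            for i in batch:
                # Step (7)(a)-(b): estimated output and its error against the score.
                err = predict(weights, bias, feats[i]) - scores[i]
                # Chain rule for the squared error, averaged over the batch.
                for j, x in enumerate(feats[i]):
                    grad_w[j] += 2.0 * err * x / len(batch)
                grad_b += 2.0 * err / len(batch)
            # Step (7)(c): update the weights from the error terms.
            weights = [w - lr * g for w, g in zip(weights, grad_w)]
            bias -= lr * grad_b
    return weights, bias


# Toy data: four sample libraries with made-up target scores.
samples = [["ACDE", "ACDF"], ["WWYY", "WYWY"], ["KKRR", "KRKR"], ["GGSS", "GSGS"]]
scores = [0.9, 0.2, 0.7, 0.4]
weights, bias = train(samples, scores)
```

In practice a neural network over the token sequences and an automatic-differentiation framework would replace the hand-written gradient; the loop structure (batching, forward pass, loss, weight update) is the part this sketch is meant to show.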
[0056] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the binding affinity of the respective ABP for a respective target antigen.
[0057] In some embodiments, according to any of the methods described above, the binding affinity is determined by a PolyMap assay comprising the steps of: providing a library of target-decorated cells, wherein each of the target-decorated cells presents the target molecule or complex on the membrane; contacting the library of target-decorated cells with a plurality of ABP-ribosome-mRNA (ARM) complexes corresponding to the one or more of the plurality of ABPs, thereby inducing binding between the target-decorated cells and the ARM complexes; generating a plurality of monodisperse or polydisperse emulsion microdroplets, wherein each microdroplet contains a single cell out of the target-decorated cells, one or more ARM complexes bound to the single cell, and a lysis reagent inducing lysis of the single cell; capturing RNA released from the single cell on a solid surface or within a semi-permeable shell; generating a library of hybrid polynucleic acids that comprise a sequence from a transcript of the single cell and/or a sequence from the mRNA of the ARM complex; sequencing the library of hybrid polynucleic acids; and determining a presence or absence of binding of each of the one or more of the plurality of ABPs to their respective target antigen.
[0058] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the effector activity of the respective ABP against the target molecule or target molecule complex.
[0059] In some embodiments, according to any of the methods described above, the target molecule or target molecule complex comprises a virus, and the effector activity is a neutralization activity determined by a pseudovirus neutralization assay or a live virus neutralization assay.
[0060] In some embodiments, according to any of the methods described above, the target molecule or target molecule complex comprises a bacterium, and the effector activity is a bactericidal activity determined by a serum bactericidal assay (SBA) or an opsonophagocytic killing assay (OPKA).
[0061] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the solubility score, optionally wherein the solubility score is determined using SKADE.
[0062] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the aggregation score, optionally wherein the aggregation score corresponds to the number of residues predicted to have a propensity to aggregate and is determined by a method comprising the steps of: determining a 3D structure of the ABP, optionally wherein the 3D structure is determined using ABodyBuilder2; and determining the aggregation score based on the 3D structure, optionally wherein the aggregation score is determined using Aggrescan3D.
[0063] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the hydrophobicity score, optionally wherein the hydrophobicity score is determined as the grand average of hydropathy (GRAVY), optionally wherein the hydropathy value of each amino acid is calculated using the Eisenberg scale.
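As a minimal sketch of the GRAVY calculation, the score is the arithmetic mean of per-residue hydropathy values over the sequence. The `EISENBERG` table below lists the Eisenberg consensus values as they are commonly tabulated and should be checked against the primary source before use; the function name `gravy` and the `scale` parameter are illustrative assumptions, not part of the disclosure.

```python
# Eisenberg consensus hydropathy values as commonly tabulated (assumption: verify before use).
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def gravy(sequence, scale=EISENBERG):
    """Grand average of hydropathy: mean per-residue hydropathy over the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(scale[aa] for aa in seq) / len(seq)
```

Passing a different dictionary as `scale` (for example, Kyte-Doolittle values) changes the hydropathy scale without changing the averaging.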
[0064] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the isoelectric point, optionally wherein the isoelectric point is determined as EMBOSS pK values.
[0065] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the stability score, optionally wherein the stability score is determined by a method comprising the steps of: calculating an aliphatic index by determining the relative volume of A, V, L, and I residues, wherein the stability score corresponds to the aliphatic index.
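One widely used definition of the aliphatic index is Ikai's formula, AI = X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where each X is a mole percent. The sketch below assumes that definition; the disclosure does not state which coefficients are used, so `a` and `b` are exposed as parameters and the function name is hypothetical.

```python
def aliphatic_index(sequence, a=2.9, b=3.9):
    """Aliphatic index per Ikai's formula (assumed): mole-percent A, V, I, L weighted by relative volume."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return mole_percent("A") + a * mole_percent("V") + b * (mole_percent("I") + mole_percent("L"))
```

For the four-residue toy sequence `"AVIL"` each residue contributes 25 mole percent, giving 25 + 2.9·25 + 3.9·50 = 292.5.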
[0066] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises an abundance frequency or fold-change of the increase in the abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen, optionally wherein the sorting process is fluorescence-activated cell sorting (FACS) or magnetic-activated cell sorting (MACS), further optionally wherein the sorting is carried out by yeast display.
[0067] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the number of cleavage sites, optionally wherein the cleavage site is a DP motif in the variable heavy or variable light chain region of the respective ABP.
[0068] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the number of deamidation sites, optionally wherein the deamidation site is an NG, NS, or NA motif in CDR2H or CDR1L of the respective ABP.
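The abundance frequency and its fold-change can be sketched from per-clone read counts taken before and after a sort. The count dictionaries, the optional `pseudocount`, and the choice to report an infinite fold-change for a clone absent before sorting are all assumptions for illustration.

```python
def abundance_frequencies(counts, clones, pseudocount=0.0):
    """Fraction of total reads attributed to each clone, with an optional pseudocount."""
    total = sum(counts.get(c, 0) for c in clones) + pseudocount * len(clones)
    return {c: (counts.get(c, 0) + pseudocount) / total for c in clones}


def enrichment(pre_counts, post_counts, pseudocount=0.0):
    """Map each clone to (post-sort frequency, fold-change versus pre-sort frequency)."""
    clones = sorted(set(pre_counts) | set(post_counts))
    pre = abundance_frequencies(pre_counts, clones, pseudocount)
    post = abundance_frequencies(post_counts, clones, pseudocount)
    result = {}
    for c in clones:
        # A clone unseen before sorting has no finite fold-change unless a pseudocount is used.
        fold = post[c] / pre[c] if pre[c] > 0 else float("inf")
        result[c] = (post[c], fold)
    return result
```

With hypothetical counts `{"A": 50, "B": 50}` before sorting and `{"A": 90, "B": 10}` after, clone A has a post-sort frequency of 0.9 and a fold-change of 1.8.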
[0069] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the number of isomerization sites, optionally wherein the isomerization site is a DG or DS motif in CDR2H, CDR3H, or CDR1L of the respective ABP.
[0070] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the number of oxidation sites, optionally wherein the oxidation site is a W or M residue in the CDRHs or CDRLs of the respective ABP.
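The four sequence-liability counts described above (cleavage, deamidation, isomerization, and oxidation sites) amount to motif counting within specific regions. The sketch below assumes the regions have already been delimited by an antibody numbering tool and are supplied as a dictionary; the key names (`"VH"`, `"CDR2H"`, and so on) and function names are hypothetical.

```python
ALL_CDRS = ("CDR1H", "CDR2H", "CDR3H", "CDR1L", "CDR2L", "CDR3L")


def count_motifs(seq, motifs):
    """Total occurrences of each motif in seq (the motifs used here cannot self-overlap)."""
    return sum(seq.count(m) for m in motifs)


def liability_counts(regions):
    """Count liability motifs in the regions named in the optional embodiments."""
    def total(region_names, motifs):
        return sum(count_motifs(regions.get(r, ""), motifs) for r in region_names)

    return {
        "cleavage": total(("VH", "VL"), ("DP",)),
        "deamidation": total(("CDR2H", "CDR1L"), ("NG", "NS", "NA")),
        "isomerization": total(("CDR2H", "CDR3H", "CDR1L"), ("DG", "DS")),
        "oxidation": total(ALL_CDRS, ("W", "M")),
    }


# Toy, made-up region sequences for illustration only.
example_regions = {
    "VH": "QVQLDPSGAEV", "VL": "DIQMTQSP",
    "CDR1H": "GYTFTSYW", "CDR2H": "INPSNGGT", "CDR3H": "ARDGYDSMDY",
    "CDR1L": "QSVSNA", "CDR2L": "WAS", "CDR3L": "QQYM",
}
example_counts = liability_counts(example_regions)
```

Keeping the region-to-motif mapping in one place makes it straightforward to adjust which regions are scanned for each liability.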
[0071] In some embodiments, according to any of the methods described above, the plurality of characteristic descriptors in (3) comprises the binding specificity, optionally wherein the binding specificity corresponds to the number of variants of a target antigen capable of being targeted by the respective ABP, further optionally wherein the binding specificity is determined by a PolyMap assay.
[0072] In some embodiments, according to any of the methods described above, the token sequences of each sample ABP library dataset include a set of tokens, each token numerically encoding a respective residue of the full sequence or the partial sequence of the respective subset of ABPs.
[0073] In some embodiments, according to any of the methods described above, the machine-learning model is configured as a neural network model.
[0074] In some embodiments, according to any of the methods described above, (1) the score for each sample ABP library dataset decreases when the stability score associated with (3) (vii) of each ABP in the respective subset of ABPs increases, (2) the score for each sample ABP library dataset increases when a diversity associated with (4) (ii) increases, (3) the score for each sample ABP library dataset increases when an immunogenicity of the respective subset of ABPs decreases, and (4) the score for each sample ABP library dataset increases when a diversity associated with (4) (iv) or (4) (v) increases.
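A minimal sketch of the numerical token encoding follows. The vocabulary, the reserved padding and unknown-residue ids, and the fixed `max_len` are assumptions chosen for illustration; the disclosure does not specify a particular tokenizer.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD, UNK = 0, 1  # hypothetical reserved ids
VOCAB = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}


def tokenize(sequence, max_len):
    """Encode each residue as an integer id, truncating or right-padding to max_len."""
    ids = [VOCAB.get(aa, UNK) for aa in sequence.upper()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def tokenize_library(sequences, max_len):
    """Token sequences for every ABP (full or partial sequence) in one library dataset."""
    return [tokenize(s, max_len) for s in sequences]
```

For instance, `tokenize("ACX", 5)` maps A and C to their ids, the non-standard residue X to the unknown id, and pads the remainder.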
[0075] In some embodiments, provided herein is a non-transitory computer readable medium storing the parameters of the machine-learning model trained according to any of the methods described above.
[0076] In some embodiments, provided herein is a method of predicting scores for a plurality of candidate ABP library datasets, comprising: (8) for each candidate ABP library dataset, applying parameters of the machine-learning model trained according to any of the methods described above to token sequences for the respective candidate ABP library dataset to generate a predicted score for the candidate ABP library dataset; and (9) selecting a candidate ABP library dataset based on the generated predicted scores for the plurality of candidate ABP library datasets.
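Steps (8) and (9) can be sketched as scoring every candidate and keeping the highest-scoring one. Selecting the maximum is only one possible selection rule, and `toy_score` is a hypothetical stand-in for the trained model; both are assumptions.

```python
def select_library(candidates, score_fn):
    """Score each candidate library's token sequences and return (best name, all scores)."""
    scores = {name: score_fn(tokens) for name, tokens in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


def toy_score(token_sequences):
    """Hypothetical stand-in for a trained model: mean token id across the library."""
    flat = [t for seq in token_sequences for t in seq]
    return sum(flat) / len(flat)


# Made-up candidate libraries, each a list of token sequences.
candidates = {
    "lib_a": [[2, 3], [4, 5]],
    "lib_b": [[10, 11], [12, 13]],
}
best_name, predicted = select_library(candidates, toy_score)
```

Returning the full score dictionary alongside the winner keeps the option of applying a threshold or a top-k rule instead of a single maximum.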
[0077] In some embodiments, provided herein is a recombinant polyclonal protein (RPP) comprising at least a set of ABPs specific for a target molecule or complex, wherein the RPP is formed according to any of the methods described above.
[0078] In some embodiments, provided herein is a method of treating a patient in need thereof by administering an effective amount of the pharmaceutical composition according to any of the embodiments described above. In some embodiments, the patient has been exposed to the target molecule or complex or a variant thereof. In some embodiments, the patient has a disease associated with the target molecule or complex or a variant thereof. In some embodiments, the pharmaceutical composition is administered intramuscularly, subcutaneously, intravenously, intradermally, orally, or through inhalation. In some embodiments, the effective amount is sufficient to treat the disease associated with the target molecule or complex or a variant thereof.
5. BRIEF DESCRIPTION OF THE DRAWINGS AND TABLES
[0079] FIG. 1 illustrates an exemplary process of generating a recombinant polyclonal protein (RPP) specific for a target molecule or complex of target molecules using an in silico method disclosed herein.
[0080] FIG. 2 provides a heatmap visualizing FACS-enriched clones in COVID antibody libraries.
[0081] FIG. 3 shows V and J gene usage in COVID antibody libraries.
[0082] FIG. 4 shows COVID antibody V/J percent identity to germline sequences.
[0083] FIG. 5 shows a PolyMap profile of a COVID antibody library.
[0084] FIG. 6 shows computationally determined COVID antibody developability profiles overlapped with developability profiles of clinical antibodies (Jain).
[0085] FIG. 7 illustrates an inference process of a transformer-based model for predicting a characteristic descriptor, according to an embodiment.
[0086] FIG. 8 illustrates a method of predicting a characteristic descriptor for an ABP using sequence-based features and structure-based features, according to an embodiment.
[0087] FIGs. 9A-9B illustrate experimental results for epitope prediction using the pipeline illustrated in FIG. 8. FIG. 9A provides average validation loss and average validation accuracy (%) of epitope predictions by (i) PLM alone; (ii) the structure embedding model alone; or (iii) PLM in combination with the structure embedding model. FIG. 9B shows the result by confusion matrix.
[0088] FIG. 10 illustrates a method of predicting hydrophobicity of a respective ABP using the predicted structure of the ABP, according to an embodiment.
[0089] FIGs. 11A-11C illustrate experimental results for hydrophobicity score prediction using the pipeline, either sequence alone or sequence in combination with the structure embedding model involving the solvent-accessible surface area (SASA) normalization as illustrated in FIG. 10. FIG. 11A shows correlations between various hydrophobic score predictions and the HIC retention times disclosed in Jain et al. FIG. 11B shows correlations between various hydrophobic score predictions and the HIC retention times from GDPA1 dataset. FIG. 11C shows the correlations as a scatter plot.
[0090] FIG. 12 illustrates a method of predicting aggregation of a respective ABP using a predicted structure of the respective ABP, according to an embodiment.
[0091] FIG. 13 shows correlations between the aggregation scores predicted using the pipeline illustrated in FIG. 12 and the HIC retention times.
[0092] FIG. 14 illustrates prediction accuracy of polyreactivity using a fine-tuned protein language classification model. The model was trained and validated using Dataset S1 from Chen et al., Cell Reports, 2024.
[0093] FIG. 15 illustrates predictions of the model from FIG. 14 when applied to an unrelated set of sequences (Dataset S2 from Chen et al., Cell Reports, 2024). Sequences were clustered and subsampled to maximize diversity in the pool. Although accuracy is reduced, the results show that the model can be applied to unseen sequences.
[0094] FIG. 16 illustrates a method of selecting a filtered ABP library dataset for producing an RPP, according to one or more embodiments.
[0095] FIG. 17 illustrates a distribution of values for a set of input features in an example dataset for training an example scoring model, in accordance with an embodiment.
[0096] FIG. 18 illustrates a distribution of values for one or more experimentally measured performance metrics in the dataset for training the example scoring model, in accordance with an embodiment.
[0097] FIG. 19 illustrates weight values for a set of three example models trained using the example dataset, in accordance with an embodiment.
[0098] FIG. 20 illustrates experimental results of applying the trained scoring model to ABP library datasets to obtain library scores, in accordance with an embodiment.
[0099] FIG. 21 illustrates a method of training a machine-learning model for selecting a filtered ABP library dataset for producing an RPP, according to one or more embodiments.
[0100] FIG. 22 shows prediction of properties of COVID antibodies (binding, neutralization, ACE2 competition, epitope) using ML models.
[0101] FIG. 23A shows the computationally derived developability parameters (aggregation, hydrophobicity, skade_solubility) of the tested antibodies (indicated by star icons) relative to clinical antibodies (histogram). FIG. 23B shows Tm (melting temperature, °C), Tagg (aggregation temperature, °C) or Z-ave diameter (the average size of the antibody particles in solution, measured using dynamic light scattering, nm) of different pools of antibodies selected based on developability criteria (antibodies selected for “good” developability, “bad” developability, or “all”).
[0102] FIG. 24 provides data from the binding ELISA 1 assay described in Example 8. The graph shows the raw absorbance values of each sample's titration curves normalized for ease of comparison.
[0103] FIG. 25 provides EC50 values determined from the binding ELISA 1 assay described in Example 8.
[0104] FIG. 26 provides data from the binding ELISA 2 assay described in Example 8. The graph shows the raw absorbance values of each sample’s titration curves normalized for ease of comparison.
[0105] FIG. 27 provides data from the pseudoviral neutralization assay targeting the Wuhan-Hu-1 strain.
[0106] FIG. 28 shows that a pool with ten clones selected using the PolyMap approach demonstrated neutralization potency that was comparable to, or in some cases exceeded, that of the top ten clones from the GIGA-2050 pool.
[0107] FIG. 29 provides pseudoviral neutralization (IC50) measured in IVIG, strain GIGA 2025, Pool 1 through 9 and Pool 16 and set of 4 mAbs against Wuhan variant.
[0108] FIG. 30 provides data from the neutralization potency assay tested against the Omicron variant and subsequent variants.
[0109] FIG. 31 provides the neutralization data from FIG. 19, further normalized to the Wuhan strain (set as 1).
[0110] FIG. 32 illustrates a workflow of a polyreactivity assay that can be used to determine the polyreactivity of antibodies.
[0111] FIG. 33 shows binding reactions between representative monoclonal antibodies and different PRR (insulin, BSA, KLH, ovalbumin, or CTLA-4) measured by ELISA.
[0112] FIGs. 34A-34D show binding reactions between representative monoclonal antibodies and different PRR - insulin (FIG. 34A), BSA (FIG. 34B), KLH (FIG. 34C) or ovalbumin (FIG. 34D) - measured by FACS assay described in Example 9. The FACS data are presented as mean fluorescence intensity (MFI) plots.
[0113] FIG. 35 shows FACS data from the polyreactivity assay described in Example 10. It shows binding of various monoclonal antibodies (Lenzilumab, Ixekizumab, Gantenerumab, Elotuzumab, Duligotuzumab, Atezolizumab, Pembrolizumab) to different PRR reagents.
[0114] FIG. 36 shows reactivity of Lenzilumab and Atezolizumab to different concentrations of ovalbumin or KLH measured by the polyreactivity assay described in Example 11. FACS data from the assay are represented as mean fluorescence intensity (MFI). Ovalbumin at 800 nM and KLH at 100 nM produced signals that distinguished Lenzilumab (sticky) from Atezolizumab (non-sticky).
[0115] FIG. 37 shows FACS data from the polyreactivity assay described in Example 11. It shows the binding reaction of Lenzilumab scFv and Fab (sticky) and Atezolizumab scFv and Fab (non-sticky) to ovalbumin at different concentrations. Increasing ovalbumin concentration increased sticky and non-sticky control separation up to 800 nM before background signal rose.
[0116] FIG. 38 shows FACS data from the polyreactivity assay described in Example 11. It shows the binding reaction of Lenzilumab scFv and Fab (sticky) and Atezolizumab scFv and Fab (non-sticky) using KLH at different concentrations. KLH showed maximum separation near 100 nM, with higher concentrations increasing C3 background.
6. DETAILED DESCRIPTION
6.1. Definitions
[0117] Unless otherwise defined herein, scientific and technical terms used in connection with the present invention shall have the meanings that are commonly understood by those of ordinary skill in the art. Further, unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular. Generally, nomenclatures used in connection with, and techniques of, cell and tissue culture, molecular biology, immunology, microbiology, genetics and protein and nucleic acid chemistry and hybridization described herein are those well-known and commonly used in the art. The methods and techniques of the present invention are generally performed according to conventional methods well known in the art and as described in various general and more specific references that are cited and discussed throughout the present specification unless otherwise indicated. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989); Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing Associates (1992); and Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1990), which are incorporated herein by reference. Enzymatic reactions and purification techniques are performed according to manufacturer's specifications, as commonly accomplished in the art or as described herein. The terminology used in connection with, and the laboratory procedures and techniques of, analytical chemistry, synthetic organic chemistry, and medicinal and pharmaceutical chemistry described herein are those well-known and commonly used in the art. Standard techniques can be used for chemical syntheses, chemical analyses, pharmaceutical preparation, formulation, and delivery, and treatment of patients.
[0118] The following terms, unless otherwise indicated, shall be understood to have the following meanings:
[0119] The term “recombinant polyclonal protein” or “RPP” refers to more than one recombinant antigen binding proteins (ABPs), collectively comprising more than one antigen-binding domains that specifically bind to an antigen or epitope, or multiple antigens and epitopes. The recombinant polyclonal protein or RPP can be antibodies or variants or derivatives thereof. In some embodiments, the antigen-binding domains bind an antigen or epitope with specificity and affinity similar to that of a naturally occurring antibody. In some embodiments, an RPP comprises antibodies. In some embodiments, the RPP consists essentially of antibodies. In some embodiments, an RPP is a mixture of antibodies. In some embodiments, an RPP comprises scFvs. In some embodiments, the RPP comprises an alternative scaffold. In some embodiments, the RPP consists of alternative scaffolds. In some embodiments, the RPP consists essentially of alternative scaffolds. In some embodiments, the RPP comprises an antibody fragment. In some embodiments, the RPP consists of antibody fragments. In some embodiments, the RPP consists essentially of antibody fragments.
[0120] The term “ABP library” refers to a library comprising more than one ABPs, collectively comprising more than one antigen-binding domains that specifically bind to an antigen or epitope, or multiple antigens and epitopes. In some embodiments, all the ABPs in the library have binding specificity or affinity against the same antigen. In some embodiments, the ABPs in the library have binding specificity or affinity against different antigens. In some embodiments, the ABP library comprises ABPs selected for having preferred characteristics (e.g., binding affinity, binding specificity, neutralizing activity, etc.). In some embodiments, the ABP library comprises randomly selected ABPs.
[0121] The term “candidate ABP library” is an ABP library used in various embodiments disclosed herein for selecting an ABP library having one or more preferred properties.
[0122] The term “antigen binding protein” or “ABP” as used herein refers to a protein comprising one or more antigen-binding domains that specifically bind to an antigen or epitope. In some embodiments, the ABP comprises an antibody. In some embodiments, the ABP consists of an antibody. In some embodiments, the ABP consists essentially of an antibody. In some embodiments, the ABP comprises an alternative scaffold. In some embodiments, the ABP consists of an alternative scaffold. In some embodiments, the ABP consists essentially of an alternative scaffold. In some embodiments, the ABP comprises an antibody fragment. In some embodiments, the ABP consists of an antibody fragment. In some embodiments, the ABP consists essentially of an antibody fragment.
[0123] The term “antibody” is used herein in its broadest sense and includes certain types of immunoglobulin molecules comprising one or more antigen-binding domains that specifically bind to an antigen or epitope. An antibody specifically includes intact antibodies (e.g., intact immunoglobulins), antibody fragments, and multi-specific antibodies. One example of an antigen-binding domain is an antigen-binding domain formed by a VH-VL dimer.
[0124] The term “alternative scaffold” refers to a molecule in which one or more regions may be diversified to produce one or more antigen-binding domains that specifically bind to an antigen or epitope. In some embodiments, the antigen-binding domain binds the antigen or epitope with specificity and affinity similar to that of naturally occurring antibodies. Exemplary alternative scaffolds include those derived from fibronectin (e.g., Adnectins™), the β-sandwich (e.g., iMab), lipocalin (e.g., Anticalins®), EETI-II/AGRP, BPTI/LACI-D1/ITI-D2 (e.g., Kunitz domains), thioredoxin peptide aptamers, protein A (e.g., Affibody®), ankyrin repeats (e.g., DARPins), gamma-B-crystallin/ubiquitin (e.g., Affilins), CTLD3 (e.g., Tetranectins), Fynomers, and LDLR-A module (e.g., Avimers). Additional information on alternative scaffolds is provided in Binz et al., Nat. Biotechnol., 2005, 23:1257-1268; Skerra, Current Opin. in Biotech., 2007, 18:295-304; and Silacci et al., J. Biol. Chem., 2014, 289:14392-14398; each of which is incorporated by reference in its entirety. Alternative scaffolds comprise one type of RPP.
[0125] The term “antigen-binding domain” means the portion of an antibody that is capable of specifically binding to an antigen or epitope.
[0126] The terms “full length antibody,” “intact antibody,” and “whole antibody” are used herein interchangeably to refer to an antibody having a structure substantially similar to a naturally occurring antibody structure and having heavy chains that comprise an Fc region.
[0127] The term “immunoglobulin” refers to a class of structurally related proteins, e.g., antibodies, generally comprising two pairs of polypeptide chains: one pair of light (L) chains and one pair of heavy (H) chains. In an “intact immunoglobulin,” all four of these chains are interconnected by disulfide bonds. The structure of immunoglobulins has been well characterized. See, e.g., Paul, Fundamental Immunology 7th ed., Ch. 5 (2013) Lippincott Williams & Wilkins, Philadelphia, PA. Briefly, each heavy chain typically comprises a heavy chain variable region (VH) and a heavy chain constant region (CH). The heavy chain constant region typically comprises three domains, abbreviated CH1, CH2, and CH3. Each light chain typically comprises a light chain variable region (VL) and a light chain constant region. The light chain constant region typically comprises one domain, abbreviated CL.
[0128] The term “Fc region” means the C-terminal region of an immunoglobulin heavy chain that, in naturally occurring antibodies, interacts with Fc receptors and certain proteins of the complement system. The structures of the Fc regions of various immunoglobulins, and the glycosylation sites contained therein, are known in the art. See Schroeder and Cavacini, J. Allergy Clin. Immunol., 2010, 125:S41-52, incorporated by reference in its entirety. The Fc region may be a naturally occurring Fc region, or an Fc region modified as described elsewhere in this disclosure.
[0129] The VH and VL regions may be further subdivided into regions of hypervariability (“hypervariable regions (HVRs)” also called “complementarity determining regions” (CDRs)) interspersed with regions that are more conserved. The more conserved regions are called framework regions (FRs). Each VH and VL generally comprises three CDRs and four FRs, arranged in the following order (from N-terminus to C-terminus): FR1 - CDR1 - FR2 - CDR2 - FR3 - CDR3 - FR4. The CDRs are involved in antigen binding, and influence antigen specificity and binding affinity of the antibody. See Kabat et al., Sequences of Proteins of Immunological Interest 5th ed. (1991) Public Health Service, National Institutes of Health, Bethesda, MD, incorporated by reference in its entirety.
[0130] The light chain from any vertebrate species can be assigned to one of two types, called kappa (κ) and lambda (λ), based on the sequence of its constant domain.
[0131] The heavy chain from any vertebrate species can be assigned to one of five different classes (or isotypes): IgA, IgD, IgE, IgG, and IgM. These classes are also designated α, δ, ε, γ, and μ, respectively. The IgG and IgA classes are further divided into subclasses on the basis of differences in sequence and function. Humans express the following subclasses: IgG1, IgG2, IgG3, IgG4, IgA1, and IgA2.
[0132] The amino acid sequence boundaries of a CDR can be determined by one of skill in the art using any of a number of known numbering schemes, including those described by Kabat et al., supra (“Kabat” numbering scheme); Al-Lazikani et al., 1997, J. Mol. Biol., 273:927-948 (“Chothia” numbering scheme); MacCallum et al., 1996, J. Mol. Biol. 262:732-745 (“Contact” numbering scheme); Lefranc et al., Dev. Comp. Immunol., 2003, 27:55-77 (“IMGT” numbering scheme); and Honegger and Plückthun, J. Mol. Biol., 2001, 309:657-70 (“AHo” numbering scheme); each of which is incorporated by reference in its entirety.
[0133] Table 1 provides the positions of CDR1-L (CDR1 of VL), CDR2-L (CDR2 of VL), CDR3-L (CDR3 of VL), CDR1-H (CDR1 of VH), CDR2-H (CDR2 of VH), and CDR3-H (CDR3 of VH), as identified by the Kabat and Chothia schemes. For CDR1-H, residue numbering is provided using both the Kabat and Chothia numbering schemes.
[0134] CDRs may be assigned, for example, using antibody numbering software, such as Abnum, available at www.bioinf.org.uk / abs / abnum / , and described in Abhinandan and Martin. Immunology, 2008, 45:3832-3839, incorporated by reference in its entirety.24 28152 / 59621 / FW / 25024150.11Attorney Docket No.: 28152-59621 / WOClient Ref. No.: 024WO* The C-terminus of CDR1-H, when numbered using the Kabat numbering convention, varies between 32 and 34, depending on the length of the CDR.
[0135] The ELJ numbering scheme'’ is generally used when referring to a residue in an antibody heavy chain constant region (e.g., as reported in Kabat et al., supra).
[0136] An “antibody fragment” comprises a portion of an intact antibody, such as the antigen-binding or variable region of an intact antibody. Antibody fragments include, for example, Fv fragments, Fab fragments, F(ab’)2 fragments, Fab’ fragments, scFv (sFv) fragments, and scFv-Fc fragments.
[0137] A “monospecific RPP” is an RPP that comprises a binding site that specifically binds to a single epitope on a single antigen. An example of a monospecific RPP is a naturally occurring IgG molecule which, while divalent, recognizes the same epitope at each antigen-binding domain. The binding specificity may be present in any suitable valency.
[0138] A “polyspecific RPP” is an RPP that binds to more than one epitope on the same antigen, or more than one epitope on more than one antigen. An example of a polyspecific RPP is a mixture of antibodies that bind to different serotypes of pneumococcal bacteria.
[0139] The term “monoclonal antibody” refers to an antibody from a population of substantially homogeneous antibodies. A population of substantially homogeneous antibodies comprises antibodies that are substantially similar and that bind the same epitope(s), except for variants that may normally arise during production of the monoclonal antibody. Such variants are generally present in only minor amounts. A monoclonal antibody is typically obtained by a process that includes the selection of a single antibody from a plurality of antibodies. For example, the selection process can be the selection of a unique clone from a plurality of clones, such as a pool of hybridoma clones, phage clones, yeast clones, bacterial clones, or other recombinant DNA clones. The selected antibody can be further altered, for example, to improve affinity for the target (“affinity maturation”), to humanize the antibody, to improve its production in cell culture, and/or to reduce its immunogenicity in a subject.
[0140] The term “polyclonal antibody” refers to a mixture of at least two monoclonal antibodies. Polyclonal antibodies may be either monospecific or polyspecific.
[0141] An “isolated RPP” or “isolated nucleic acid” is an RPP or nucleic acid that has been separated and/or recovered from a component of its natural environment. Components of the natural environment may include enzymes, hormones, and other proteinaceous or nonproteinaceous materials. In some embodiments, an isolated RPP is purified to a degree sufficient to obtain at least 15 residues of N-terminal or internal amino acid sequence, for example by use of a spinning cup sequenator. In some embodiments, an isolated RPP is purified to homogeneity by gel electrophoresis (e.g., SDS-PAGE) under reducing or nonreducing conditions, with detection by Coomassie blue or silver stain. An isolated RPP includes an RPP in situ within recombinant cells, since at least one component of the RPP's natural environment is not present. In some aspects, an isolated RPP or isolated nucleic acid is prepared by at least one purification step. In some embodiments, an isolated RPP or isolated nucleic acid is purified to at least 80%, 85%, 90%, 95%, or 99% by weight. In some embodiments, an isolated RPP or isolated nucleic acid is purified to at least 80%, 85%, 90%, 95%, or 99% by volume. In some embodiments, an isolated RPP or isolated nucleic acid is provided as a solution comprising at least 85%, 90%, 95%, 98%, 99% to 100% RPP or nucleic acid by weight. In some embodiments, an isolated RPP or isolated nucleic acid is provided as a solution comprising at least 85%, 90%, 95%, 98%, 99% to 100% RPP or nucleic acid by volume.
[0142] “Affinity” refers to the strength of the sum total of non-covalent interactions between a single binding site of a molecule (e.g., an RPP) and its binding partner (e.g., an antigen or epitope). Unless indicated otherwise, as used herein, “affinity” refers to intrinsic binding affinity, which reflects a 1:1 interaction between members of a binding pair (e.g., RPP and antigen or epitope). The affinity of a molecule X for its partner Y can be represented by the dissociation equilibrium constant (KD). The kinetic components that contribute to the dissociation equilibrium constant are described in more detail below. Affinity can be measured by common methods known in the art, including those described herein. Affinity can be determined, for example, using surface plasmon resonance (SPR) technology (e.g., BIACORE®) or biolayer interferometry (e.g., FORTEBIO®).
[0143] With regard to the binding of an RPP to a target molecule, the terms “bind,” “specific binding,” “specifically binds to,” “specific for,” “selectively binds,” and “selective for” a particular antigen (e.g., a polypeptide target) or an epitope on a particular antigen mean binding that is measurably different from a non-specific or non-selective interaction (e.g., with a non-target molecule). Specific binding can be measured, for example, by measuring binding to a target molecule and comparing it to binding to a non-target molecule. Specific binding can also be determined by competition with a control molecule that mimics the epitope recognized on the target molecule. In that case, specific binding is indicated if the binding of the RPP to the target molecule is competitively inhibited by the control molecule.
[0144] The term “kd” (sec⁻¹), as used herein, refers to the dissociation rate constant of a particular ABP-antigen interaction. This value is also referred to as the koff value.
[0145] The term “ka” (M⁻¹×sec⁻¹), as used herein, refers to the association rate constant of a particular ABP-antigen interaction. This value is also referred to as the kon value.
[0146] The term “KD” (M), as used herein, refers to the dissociation equilibrium constant of a particular ABP-antigen interaction. KD = kd/ka.
[0147] The term “KA” (M⁻¹), as used herein, refers to the association equilibrium constant of a particular ABP-antigen interaction. KA = ka/kd.
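As an illustration of the relationships among the four constants defined above, the following minimal Python sketch computes KD and KA from a pair of rate constants. The function name and the numeric rate constants are hypothetical and chosen only for the example; they are not measured values from this disclosure.

```python
def equilibrium_constants(ka: float, kd: float) -> tuple[float, float]:
    """Return (KD, KA) given ka in M^-1 sec^-1 and kd in sec^-1.

    KD = kd / ka has units of M; KA = ka / kd has units of M^-1.
    """
    if ka <= 0 or kd <= 0:
        raise ValueError("rate constants must be positive")
    return kd / ka, ka / kd


# Hypothetical rate constants, for illustration only.
KD, KA = equilibrium_constants(ka=1.0e5, kd=1.0e-3)
print(f"KD = {KD:.2e} M, KA = {KA:.2e} M^-1")
```

Because KD and KA are reciprocals, their product is 1 for any positive pair of rate constants, which provides a quick consistency check on reported values.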
[0148] An “immunoconjugate” is an RPP conjugated to one or more heterologous molecule(s).
[0149] “Effector functions” refer to those biological activities mediated by the Fc region of an antibody, which activities may vary depending on the antibody isotype. Examples of antibody effector functions include C1q binding to activate complement-dependent cytotoxicity (CDC), Fc receptor binding to activate antibody-dependent cellular cytotoxicity (ADCC), and antibody-dependent cellular phagocytosis (ADCP).
[0150] When used herein in the context of two or more RPPs, the term “competes with” or “cross-competes with” indicates that the two or more RPPs compete for binding to an antigen (e.g., pneumococcus polysaccharide). In one exemplary assay, an antigen is coated on a surface and contacted with a first RPP against the antigen, after which a second RPP against the antigen is added. In another exemplary assay, a first RPP against an antigen is coated on a surface and contacted with the antigen, and then a second RPP against the antigen is added. If the presence of the first RPP against an antigen reduces binding of the second RPP, in either assay, then the RPPs compete. The term “competes with” also includes combinations of RPPs where one RPP reduces binding of another RPP, but where no competition is observed when the RPPs are added in the reverse order. However, in some embodiments, the first and second RPPs inhibit binding of each other, regardless of the order in which they are added. In some embodiments, one RPP reduces binding of another RPP to its antigen by at least 25%, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, or at least 95%. A skilled artisan can select the concentrations of the antibodies used in the competition assays based on the affinities of the RPPs for pneumococcus polysaccharide and the valency of the RPPs. The assays described in this definition are illustrative, and a skilled artisan can utilize any suitable assay to determine if antibodies compete with each other. Suitable assays are described, for example, in Cox et al., “Immunoassay Methods,” in Assay Guidance Manual [Internet], Updated December 24, 2014 (www.ncbi.nlm.nih.gov/books/NBK92434/; accessed September 29, 2015); Silman et al., Cytometry, 2001, 44:30-37; and Finco et al., J. Pharm. Biomed. Anal., 2011, 54:351-358; each of which is incorporated by reference in its entirety.
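The percent-reduction thresholds in the competition definition above can be illustrated with a short Python sketch. It assumes that binding of the second RPP is read out as a background-corrected signal measured with and without the first RPP; the function names, the default threshold, and the example signals are hypothetical and do not describe a particular assay of this disclosure.

```python
def percent_binding_reduction(signal_alone: float, signal_with_first: float) -> float:
    """Percent reduction in binding of the second RPP caused by the first RPP.

    Both signals are assumed to be background-corrected and on the same scale.
    """
    if signal_alone <= 0:
        raise ValueError("signal_alone must be positive")
    return 100.0 * (signal_alone - signal_with_first) / signal_alone


def competes(signal_alone: float, signal_with_first: float,
             threshold_pct: float = 50.0) -> bool:
    """True if the reduction meets the chosen threshold (e.g., 25, 50, or 95)."""
    return percent_binding_reduction(signal_alone, signal_with_first) >= threshold_pct


# Hypothetical assay readouts, for illustration only.
print(percent_binding_reduction(2.0, 0.5))
print(competes(2.0, 1.8, threshold_pct=25.0))
```

Running the check in both orders of addition distinguishes one-directional competition from the mutual inhibition described for some embodiments.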
[0151] The term “epitope” means a portion of an antigen that specifically binds to an RPP or an ABP. Epitopes frequently consist of surface-accessible amino acid residues and/or sugar side chains and may have specific three-dimensional structural characteristics, as well as specific charge characteristics. Conformational and non-conformational epitopes are distinguished in that the binding to the former but not the latter may be lost in the presence of denaturing solvents. An epitope may comprise amino acid residues that are directly involved in the binding, and other amino acid residues, which are not directly involved in the binding. The epitope to which an RPP or an ABP binds can be determined using known techniques for epitope determination such as, for example, testing for RPP or ABP binding to an antigen.
[0152] Percent “identity” between a polypeptide sequence and a reference sequence is defined as the percentage of amino acid residues in the polypeptide sequence that are identical to the amino acid residues in the reference sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity. Alignment for purposes of determining percent amino acid sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, BLAST-2, ALIGN, MEGALIGN (DNASTAR), CLUSTALW, CLUSTAL OMEGA, or MUSCLE software. Those skilled in the art can determine appropriate parameters for aligning sequences, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared.
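The percent identity calculation can be sketched in Python as follows. The sketch assumes the two sequences have already been aligned by an external tool and are supplied as equal-length strings with "-" marking gaps. Using the number of residues in the polypeptide (query) sequence as the denominator follows the wording of the definition above, but alignment tools differ on this point, so it is labeled here as an assumption. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_ref: str, gap: str = "-") -> float:
    """Percent of query residues identical to the aligned reference residue.

    Inputs must be pre-aligned, equal-length strings; `gap` marks alignment gaps.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    query_residues = sum(1 for q in aligned_query if q != gap)
    if query_residues == 0:
        raise ValueError("query contains no residues")
    matches = sum(
        1 for q, r in zip(aligned_query, aligned_ref) if q != gap and q == r
    )
    return 100.0 * matches / query_residues


# Hypothetical aligned sequences, for illustration only.
print(round(percent_identity("ACDEYG", "ACD-FG"), 1))
```

In this example the query has six residues, four of which match the reference, so the gap opposite the query's E and the Y/F mismatch both count against identity.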
[0153] A “conservative substitution” or a “conservative amino acid substitution,” refers to the substitution of an amino acid with a chemically or functionally similar amino acid. Conservative substitution tables providing similar amino acids are well known in the art. By way of example, the groups of amino acids provided in TABLES 2-4 are, in some embodiments, considered conservative substitutions for one another.
[0154] Additional conservative substitutions may be found, for example, in Creighton, Proteins: Structures and Molecular Properties 2nd ed. (1993) W. H. Freeman & Co., New York, NY. An RPP generated by making one or more conservative substitutions of amino acid residues in a parent RPP is referred to as a “conservatively modified variant.”
[0155] The term “treating” (and variations thereof such as “treat” or “treatment”) refers to clinical intervention in an attempt to alter the natural course of a disease or condition in a subject in need thereof. Treatment can be performed both for prophylaxis and during the course of clinical pathology. Desirable effects of treatment include preventing occurrence or recurrence of disease, alleviation of symptoms, diminishment of any direct or indirect pathological consequences of the disease, preventing reinfection or associated symptoms, decreasing the rate of disease progression, amelioration or palliation of the disease state, and remission or improved prognosis. Improvements in any conditions can be readily assessed according to standard methods and techniques known in the art. The population of subjects treated by the method of the disease includes subjects suffering from the undesirable condition or disease, as well as subjects at risk for development of the condition or disease.
[0156] As used herein, the term “therapeutically effective amount” or “effective amount” refers to an amount of an RPP or pharmaceutical composition provided herein that, when administered to a subject, is effective to produce the desired effect for which it is administered. The exact dose or amount will depend on the purpose of the treatment, and will be ascertainable by one skilled in the art using known techniques (see, e.g., Lloyd (1999) The Art, Science and Technology of Pharmaceutical Compounding). A therapeutically effective amount can be a “prophylactically effective amount” as prophylaxis can be considered therapy. The term “sufficient amount” means an amount sufficient to produce a desired effect.
[0157] As used herein, the term “subject” means a mammalian subject. Exemplary subjects include humans, monkeys, dogs, cats, mice, rats, cows, horses, camels, goats, rabbits, and sheep. In certain embodiments, the subject is a human. In some embodiments the subject has a disease or condition that can be treated with an RPP provided herein. In some aspects, the disease or condition is a cancer. In some aspects, the disease or condition is a viral infection.
[0158] The term “pharmaceutical composition” refers to a preparation which is in such form as to permit the biological activity of an active ingredient contained therein to be effective in treating a subject, and which contains no additional components which are unacceptably toxic to the subject.
[0159] The term “plasma cell” refers to white blood cells that secrete large volumes of antibodies. They are transported by the blood plasma and the lymphatic system. B cells (for example, either germinal center naive B cells or memory B cells) differentiate into plasma cells that produce antibody molecules closely modelled after the receptors of the precursor B cell. Once released into the blood and lymph, these antibody molecules bind to the target antigen (foreign substance) and initiate its neutralization or destruction. Terminally differentiated plasma cells express relatively few surface antigens, and do not express common pan-B cell markers, such as CD19 and CD20. Instead, plasma cells are identified through flow cytometry by their additional expression of CD138, CD78, and the Interleukin-6 receptor. In humans, CD27 is a good marker for plasma cells: naive B cells are CD27-, memory B cells are CD27+, and plasma cells are CD27++. The surface antigen CD138 (syndecan-1) is expressed at high levels. Another important surface antigen is CD319 (SLAMF7). This antigen is expressed at high levels on normal human plasma cells. It is also expressed on malignant plasma cells in multiple myeloma. Compared with CD138, which disappears rapidly ex vivo, the expression of CD319 is considerably more stable.
[0160] The term “plasmablast” refers to antibody-secreting cells in the peripheral blood, which differentiate from activated B cells, such as memory B cells, upon stimulation with an antigen. The most immature blood cell that is considered of plasma cell lineage is the plasmablast. Plasmablasts secrete more antibodies than B cells, but less than plasma cells. They divide rapidly and are still capable of internalizing antigens and presenting them to T cells. A cell may stay in this state for several days, and then either die or irrevocably differentiate into a mature, fully differentiated plasma cell. Differentiation of mature B cells into plasma cells is dependent upon the transcription factors Blimp-1/PRDM1 and IRF4.
[0161] The term “memory B cell” refers to a B cell sub-type that is formed within germinal centers following primary infection and is important in generating an accelerated and more robust antibody-mediated immune response in the case of re-infection (also known as a secondary immune response). Memory B cells do not secrete antibody until activated by their specific antigen.
[0162] The term “naive B cell” refers to a B cell that has not been exposed to an antigen. Once exposed to an antigen, the naive B cell either becomes a memory B cell or a plasma cell that secretes antibodies specific to the antigen that was originally bound. Plasma cells do not last long in the circulation; this is in contrast to memory cells, which last for very long periods of time.
[0163] The term “peripheral blood” refers to blood which travels through peripheral vessels. Peripheral blood is typically obtained by venipuncture (also called phlebotomy), or by finger prick for small quantities.
[0164] The term “plasma hyperimmune” refers to a polyclonal antibody preparation similar to intravenous immunoglobulin (IVIg), except that it is prepared from the plasma of donors with high titers of antibody against a specific organism or antigen. The term hyperimmune is often used interchangeably with the terms “hyperimmune gammaglobulin” and “hyperimmune globulin”. Some agents against which hyperimmune globulins are available include hepatitis B, rabies, tetanus toxin, varicella-zoster, etc. Administration of hyperimmune globulin provides “passive” immunity to the patient against an agent. This is in contrast to vaccines that provide “active” immunity. However, vaccines take much longer to achieve that purpose while hyperimmune globulin provides instant “passive” short-lived immunity.
[0165] The term “activity” refers to a quantitative measurement of an RPP or antibody against an antigen, vaccine, protein, epitope, cell, bacterium, or virus. Activity can be assessed using in vivo or in vitro methods.
[0166] The term “recombinant” refers to proteins that result from the expression of recombinant DNA within living cells. Recombinant DNA is the general name for a piece of DNA that has been created by the combination of at least two separate segments of DNA.
[0167] The term “neutralization” refers to the ability of specific antibodies to block the site(s) on viruses that they use to enter their target cell. The effect of a neutralizing antibody can be negligible even with large excesses of antibody production if they lack specificity to this antigen. The production of specific antibodies can be learned for a faster response at the next exposure. The reduction or destruction of a homologous infectious agent can be partial or complete and can make it no longer infectious or pathogenic to other cells.
[0168] A “variant” of a polypeptide (e.g., an antibody) comprises an amino acid sequence wherein one or more amino acid residues are inserted into, deleted from and/or substituted into the amino acid sequence relative to the native polypeptide sequence, and retains essentially the same biological activity as the native polypeptide. The biological activity of the polypeptide can be measured using standard techniques in the art (for example, if the variant is an antibody, its activity may be tested by binding assays, as described herein). Variants of the invention include fragments, analogs, recombinant polypeptides, synthetic polypeptides, and/or fusion proteins.
[0169] A “derivative” of a polypeptide is a polypeptide (e.g., an antibody) that has been chemically modified, e.g., via conjugation to another chemical moiety such as, for example, polyethylene glycol, albumin (e.g., human serum albumin), phosphorylation, and glycosylation. Unless otherwise indicated, the term “antibody” includes, in addition to antibodies comprising two full-length heavy chains and two full-length light chains, derivatives, variants, fragments, and muteins thereof, examples of which are described below.
[0170] A nucleotide sequence is “operably linked” to a regulatory sequence if the regulatory sequence affects the expression (e.g., the level, timing, or location of expression) of the nucleotide sequence. A “regulatory sequence” is a nucleic acid that affects the expression (e.g., the level, timing, or location of expression) of a nucleic acid to which it is operably linked. The regulatory sequence can, for example, exert its effects directly on the regulated nucleic acid, or through the action of one or more other molecules (e.g., polypeptides that bind to the regulatory sequence and/or the nucleic acid). Examples of regulatory sequences include promoters, enhancers and other expression control elements (e.g., polyadenylation signals). Further examples of regulatory sequences are described in, for example, Goeddel, 1990, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, CA and Baron et al., 1995, Nucleic Acids Res. 23:3605-06.
[0171] A “host cell” is a cell that can be used to express a nucleic acid, e.g., a nucleic acid of the invention. A host cell can be a prokaryote, for example, E. coli, or it can be a eukaryote, for example, a single-celled eukaryote (e.g., a yeast or other fungus), a plant cell (e.g., a tobacco or tomato plant cell), an animal cell (e.g., a human cell, a monkey cell, a hamster cell, a rat cell, a mouse cell, or an insect cell) or a hybridoma. Examples of host cells include CS-9 cells, the COS-7 line of monkey kidney cells (ATCC CRL 1651) (see Gluzman et al., 1981, Cell 23:175), L cells, C127 cells, 3T3 cells (ATCC CCL 163), Chinese hamster ovary (CHO) cells or their derivatives such as Veggie CHO and related cell lines which grow in serum-free media (see Rasmussen et al., 1998, Cytotechnology 28:31), HeLa cells, BHK (ATCC CRL 10) cell lines, the CV1/EBNA cell line derived from the African green monkey kidney cell line CV1 (ATCC CCL 70) (see McMahan et al., 1991, EMBO J. 10:2821), human embryonic kidney cells such as 293, 293 EBNA or MSR 293, human epidermal A431 cells, human Colo205 cells, other transformed primate cell lines, normal diploid cells, cell strains derived from in vitro culture of primary tissue, primary explants, HL-60, U937, HaK or Jurkat cells. Typically, a host cell is a cultured cell that can be transformed or transfected with a polypeptide-encoding nucleic acid, which can then be expressed in the host cell.
[0172] The phrase “recombinant host cell” can be used to denote a host cell that has been transformed or transfected with a nucleic acid to be expressed. A host cell also can be a cell that comprises the nucleic acid but does not express it at a desired level unless a regulatory sequence is introduced into the host cell such that it becomes operably linked with the nucleic acid. It is understood that the term host cell refers not only to the particular subject cell but to the progeny or potential progeny of such a cell. Because certain modifications may occur in succeeding generations due to, e.g., mutation or environmental influence, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein.
6.2. Other interpretational conventions
[0173] Ranges recited herein are understood to be shorthand for all of the values within the range, inclusive of the recited endpoints. For example, a range of 1 to 50 is understood to include any number, combination of numbers, or sub-range from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, and 50.
[0174] Unless otherwise indicated, reference to a compound that has one or more stereocenters intends each stereoisomer, and all combinations of stereoisomers, thereof.
6.3. Methods of generating a custom RPP
[0175] In one aspect, the present disclosure provides a method of generating a recombinant polyclonal protein (RPP) specific for a target molecule or complex of target molecules. In some embodiments, the method comprises: (1) obtaining an input antigen binding protein (ABP) library dataset including an ABP profile for each of a plurality of ABPs and (2) generating a filtered ABP library dataset corresponding to a subset of the plurality of ABPs and comprising a reference to each of the subset of the plurality of ABPs. In some embodiments, the filtered ABP library dataset is provided for generation of a composition comprising the ABPs. In some embodiments, the method further comprises the step of (3) generating a composition comprising ABPs corresponding to the ABP references in the filtered ABP library dataset, thereby generating the RPP.
[0176] In some embodiments, the filtered ABP library dataset corresponds to a subset of the plurality of ABPs having at least one of a plurality of characteristic descriptors having a value within a predetermined range for the characteristic. In some embodiments, the filtered ABP library dataset corresponds to a subset of the plurality of ABPs having one or more preferred library properties. In some embodiments, the filtered ABP library dataset corresponds to a subset of the plurality of ABPs having at least one of the plurality of characteristic descriptors having a value within a predetermined range for the characteristic and having one or more preferred library properties.
[0177] In some embodiments, the one or more preferred library properties are selected from:
(i) the set of heavy chain CDR3 sequences contained in the subset of the plurality of ABPs comprises at least about 10, 20, 50, 100, 200, or 1000 unique sequences;
(ii) the subset of the plurality of ABPs specifically bind to at least two unique epitopes associated with the target molecule or complex;
(iii) the subset of the plurality of ABPs is capable of modulating at least two target antigen variants;
(iv) the set of heavy chain V genes represented in the subset of the plurality of ABPs comprises at least two unique V genes;
(v) the set of light chain V genes represented in the subset of the plurality of ABPs comprises at least two unique V genes;
(vi) the set of heavy chain J genes represented in the subset of the plurality of ABPs comprises at least two unique J genes;
(vii) the set of light chain J genes represented in the subset of the plurality of ABPs comprises at least two unique J genes;
(viii) the average percent germline identity of heavy chain V genes represented in the subset of the plurality of ABPs is between about 50% and about 100%;
(ix) the average percent germline identity of light chain V genes represented in the subset of the plurality of ABPs is between about 50% and about 100%;
(x) the average percent germline identity of heavy chain J genes represented in the subset of the plurality of ABPs is between about 50% and about 100%; and
(xi) the average percent germline identity of light chain J genes represented in the subset of the plurality of ABPs is between about 50% and about 100%.
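Several of the library-level properties listed above reduce to counting unique values or averaging a per-ABP quantity across a candidate subset. The following Python sketch shows one way such a check could be expressed for a few of the properties (unique heavy chain CDR3 sequences, unique heavy and light chain V genes, and mean heavy chain V gene germline identity). The record type, field names, default thresholds, and example sequences are hypothetical illustrations and are not the data model of this disclosure.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class AbpRecord:
    """Hypothetical per-ABP profile; field names are illustrative only."""
    abp_id: str
    cdr3h: str                   # heavy chain CDR3 amino acid sequence
    vh_gene: str                 # heavy chain V gene call
    vl_gene: str                 # light chain V gene call
    vh_germline_identity: float  # percent identity to germline, 0-100


def library_properties(subset: list[AbpRecord]) -> dict[str, float]:
    """Summarize a non-empty subset with a few library-level properties."""
    return {
        "unique_cdr3h": len({a.cdr3h for a in subset}),
        "unique_vh_genes": len({a.vh_gene for a in subset}),
        "unique_vl_genes": len({a.vl_gene for a in subset}),
        "mean_vh_germline_identity": mean(a.vh_germline_identity for a in subset),
    }


def meets_preferred_properties(
    subset: list[AbpRecord],
    min_unique_cdr3h: int = 10,
    min_v_genes: int = 2,
    identity_range: tuple[float, float] = (50.0, 100.0),
) -> bool:
    """Check the subset against the illustrative thresholds above."""
    if not subset:
        return False
    props = library_properties(subset)
    low, high = identity_range
    return (
        props["unique_cdr3h"] >= min_unique_cdr3h
        and props["unique_vh_genes"] >= min_v_genes
        and props["unique_vl_genes"] >= min_v_genes
        and low <= props["mean_vh_germline_identity"] <= high
    )


# Hypothetical records, for illustration only.
subset = [
    AbpRecord("ABP-001", "ARDYYGSSYFDY", "IGHV3-23", "IGKV1-39", 92.5),
    AbpRecord("ABP-002", "ARGGWELLRFDP", "IGHV1-69", "IGKV3-20", 88.0),
    AbpRecord("ABP-003", "ARDYYGSSYFDY", "IGHV3-23", "IGLV2-14", 95.0),
]
print(library_properties(subset))
print(meets_preferred_properties(subset, min_unique_cdr3h=2))
```

The sketch requires all of the checked properties at once; because the properties are recited as "one or more" alternatives, an implementation could equally test any chosen combination.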
[0178] In some embodiments, the at least one of a plurality of characteristic descriptors are selected from:
(i) a binding affinity of the respective ABP for the respective target antigen;
(ii) an effector activity of the respective ABP against the target molecule or complex;
(iii) a solubility score of the respective ABP;
(iv) an aggregation score of the respective ABP;
(v) a hydrophobicity score of the respective ABP;
(vi) an isoelectric point of the respective ABP;
(vii) a stability score of the respective ABP;
(viii) a molecular weight of the respective ABP;
(ix) a number of unpaired cysteine residues in the respective ABP;
(x) an abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen;
(xi) a fold-change of the increase in the abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen, as compared to the abundance frequency prior to enrichment;
(xii) a number of non-canonical glycosylation sites in the respective ABP;
(xiii) a number of cleavage sites in the respective ABP;
(xiv) a number of deamidation sites in the respective ABP;
(xv) a number of isomerization sites in the respective ABP;
(xvi) a number of oxidation sites in the respective ABP;
(xvii) CDR3H length of the respective ABP;
(xviii) binding specificity of the respective ABP;
(xix) immunogenicity of the respective ABP;
(xx) polyspecificity of the respective ABP; and
(xxi) a respective epitope that the respective ABP binds to.
[0179] In some embodiments, the plurality of characteristic descriptors for ABPs are obtained experimentally, by in silico methods, by ML/AI methods, from other sources, or a combination thereof. Any method known in the art for obtaining the relevant information can be adopted in various embodiments.
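The filtering step, in which ABPs are retained when characteristic descriptors fall within predetermined ranges, can be sketched in Python as below. This is a minimal illustration under stated assumptions: the library is represented as a mapping from an ABP reference to numeric descriptor values, each range is an inclusive (low, high) pair, and a missing descriptor counts as failing its range. The function name, descriptor keys, and values are hypothetical, and only numeric descriptors are handled.

```python
def filter_library(
    library: dict[str, dict[str, float]],
    ranges: dict[str, tuple[float, float]],
    require_all: bool = True,
) -> list[str]:
    """Return references (IDs) of ABPs whose descriptors fall within the ranges.

    require_all=True keeps ABPs that satisfy every listed range;
    require_all=False keeps ABPs that satisfy at least one listed range.
    """
    selected = []
    for abp_id, descriptors in library.items():
        checks = [
            name in descriptors and low <= descriptors[name] <= high
            for name, (low, high) in ranges.items()
        ]
        passed = all(checks) if require_all else any(checks)
        if passed:
            selected.append(abp_id)
    return selected


# Hypothetical input library dataset, for illustration only.
library = {
    "ABP-001": {"kd_molar": 2e-9, "isoelectric_point": 8.1, "unpaired_cysteines": 0},
    "ABP-002": {"kd_molar": 5e-6, "isoelectric_point": 7.9, "unpaired_cysteines": 0},
    "ABP-003": {"kd_molar": 8e-10, "isoelectric_point": 8.4, "unpaired_cysteines": 1},
}
ranges = {"kd_molar": (0.0, 1e-8), "unpaired_cysteines": (0, 0)}

print(filter_library(library, ranges))                     # every range satisfied
print(filter_library(library, ranges, require_all=False))  # at least one range satisfied
```

The output is a list of references rather than the ABP profiles themselves, mirroring the description of the filtered dataset as comprising a reference to each selected ABP.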
[0180] In some embodiments, the plurality of characteristic descriptors comprises the binding affinity of the respective ABP for the respective target antigen. In some embodiments, the binding affinity is expressed in KD, and the predetermined range for the binding affinity is less than about 10 µM, 1 µM, 100 nM, 10 nM, 1 nM, or lower, optionally wherein the binding affinity is determined by surface plasmon resonance (SPR) or biolayer interferometry (BLI). In some embodiments, the ABP has a KD less than about 100 µM (such as less than about any of 100 µM, 90 µM, 80 µM, 70 µM, 60 µM, 50 µM, 40 µM, 30 µM, 20 µM, 10 µM, 9 µM, 8 µM, 7 µM, 6 µM, 5 µM, 4 µM, 3 µM, 2 µM, 1 µM, 900 nM, 800 nM, 700 nM, 600 nM, 500 nM, 400 nM, 300 nM, 200 nM, 100 nM, 50 nM, 1 nM, 900 pM, 800 pM, 700 pM, 600 pM, 500 pM, 400 pM, 300 pM, 200 pM, 100 pM, 50 pM, 1 pM, or lower). In some embodiments, the ABP has a KD from about 100 µM to about 1 pM (such as about any of 100 µM, 90 µM, 80 µM, 70 µM, 60 µM, 50 µM, 40 µM, 30 µM, 20 µM, 10 µM, 9 µM, 8 µM, 7 µM, 6 µM, 5 µM, 4 µM, 3 µM, 2 µM, 1 µM, 900 nM, 800 nM, 700 nM, 600 nM, 500 nM, 400 nM, 300 nM, 200 nM, 100 nM, 50 nM, 1 nM, 900 pM, 800 pM, 700 pM, 600 pM, 500 pM, 400 pM, 300 pM, 200 pM, 100 pM, 50 pM, or 1 pM, including any ranges between any of these values).
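Because KD thresholds of this kind are stated across several concentration units, comparing a measured KD with a threshold is less error-prone after both are converted to molar. The Python sketch below shows one such conversion; the helper names and the example values are hypothetical, and the accepted unit strings are an assumption of the sketch.

```python
# Conversion factors to molar; "uM" is accepted as an ASCII alias for "µM".
_UNIT_TO_MOLAR = {
    "M": 1.0,
    "mM": 1e-3,
    "µM": 1e-6,
    "uM": 1e-6,
    "nM": 1e-9,
    "pM": 1e-12,
}


def to_molar(value: float, unit: str) -> float:
    """Convert a concentration expressed in `unit` to molar."""
    try:
        factor = _UNIT_TO_MOLAR[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None
    return value * factor


def kd_below(kd_value: float, kd_unit: str,
             limit_value: float, limit_unit: str) -> bool:
    """True if the KD is strictly below the limit (lower KD = tighter binding)."""
    return to_molar(kd_value, kd_unit) < to_molar(limit_value, limit_unit)


# Hypothetical values, for illustration only.
print(kd_below(250, "pM", 1, "nM"))   # 250 pM compared with a 1 nM limit
print(kd_below(5, "uM", 100, "nM"))   # 5 µM compared with a 100 nM limit
```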
[0181] In some embodiments, the binding affinity is determined by a PolyMap assay comprising the steps of:
providing a library of target-decorated cells, wherein each of the target-decorated cells presents the target molecule or complex on the membrane;
contacting the library of target-decorated cells with a plurality of ABP-ribosome-mRNA (ARM) complexes corresponding to the one or more of the plurality of ABPs, thereby inducing binding between the target-decorated cells and the ARM complexes;
generating a plurality of monodisperse or polydisperse emulsion microdroplets, wherein each microdroplet contains a single cell out of the target-decorated cells, one or more ARM complexes bound to the single cell, and a lysis reagent inducing lysis of the single cell;
capturing RNA released from the single cell on a solid surface or within a semi-permeable shell;
generating a library of hybrid polynucleic acids that comprise a sequence from a transcript of the single cell and/or a sequence from the mRNA of the ARM complex;
sequencing the library of hybrid polynucleic acids; and
determining a presence or absence of binding of each of the one or more of the plurality of ABPs to their respective target antigen.
[0182] In some embodiments, the predetermined range for the binding affinity is a presence of binding of the respective ABP to their respective target antigen in the PolyMap assay.
[0183] In some embodiments, the binding affinity is determined by an ML/AI model.
[0184] In some embodiments, the plurality of characteristic descriptors comprises the effector activity of the respective ABP against the target molecule or target molecule complex. In some embodiments, the target molecule or target molecule complex comprises a virus, and the effector activity is a neutralization activity determined by a pseudovirus neutralization assay or a live virus neutralization assay. In some embodiments, the neutralization activity is determined by a pseudovirus neutralization assay and corresponds to an IC50 from about 1 ng/mL to about 500 mg/mL (such as about any of 1 ng/mL, 10 ng/mL, 20 ng/mL, 30 ng/mL, 40 ng/mL, 50 ng/mL, 60 ng/mL, 70 ng/mL, 80 ng/mL, 90 ng/mL, 100 ng/mL, 200 ng/mL, 300 ng/mL, 400 ng/mL, 500 ng/mL, 600 ng/mL, 700 ng/mL, 800 ng/mL, 900 ng/mL, 1 μg/mL, 10 μg/mL, 20 μg/mL, 30 μg/mL, 40 μg/mL, 50 μg/mL, 60 μg/mL, 70 μg/mL, 80 μg/mL, 90 μg/mL, 100 μg/mL, 200 μg/mL, 300 μg/mL, 400 μg/mL, 500 μg/mL, 600 μg/mL, 700 μg/mL, 800 μg/mL, 900 μg/mL, 1 mg/mL, 10 mg/mL, 20 mg/mL, 30 mg/mL, 40 mg/mL, 50 mg/mL, 60 mg/mL, 70 mg/mL, 80 mg/mL, 90 mg/mL, 100 mg/mL, 200 mg/mL, 300 mg/mL, 400 mg/mL, or 500 mg/mL, including any ranges between any of these values). In some embodiments, the neutralization activity is determined by a live virus neutralization assay and corresponds to an IC50 from about 1 ng/mL to about 500 mg/mL (such as about any of 1 ng/mL, 2 ng/mL, 3 ng/mL, 4 ng/mL, 5 ng/mL, 6 ng/mL, 7 ng/mL, 8 ng/mL, 9 ng/mL, 10 ng/mL, 20 ng/mL, 30 ng/mL, 40 ng/mL, 50 ng/mL, 60 ng/mL, 70 ng/mL, 80 ng/mL, 90 ng/mL, 100 ng/mL, 200 ng/mL, 300 ng/mL, 400 ng/mL, 500 ng/mL, 600 ng/mL, 700 ng/mL, 800 ng/mL, 900 ng/mL, 1 μg/mL, 10 μg/mL, 20 μg/mL, 30 μg/mL, 40 μg/mL, 50 μg/mL, 60 μg/mL, 70 μg/mL, 80 μg/mL, 90 μg/mL, 100 μg/mL, 200 μg/mL, 300 μg/mL, 400 μg/mL, 500 μg/mL, 600 μg/mL, 700 μg/mL, 800 μg/mL, 900 μg/mL, 1 mg/mL, 10 mg/mL, 20 mg/mL, 30 mg/mL, 40 mg/mL, 50 mg/mL, 60 mg/mL, 70 mg/mL, 80 mg/mL, 90 mg/mL, 100 mg/mL, 200 mg/mL, 300 mg/mL, 400 mg/mL, or 500 mg/mL, including any ranges between any of these values). In some embodiments, the predetermined range for the target molecule or target molecule complex neutralization activity corresponds to (a) an IC50 from about 0.08 μg/mL to about 900 μg/mL when determined by pseudovirus neutralization assay or (b) an IC50 from about 0.003 μg/mL to about 1350 μg/mL when determined by live virus neutralization assay.
[0185] In some embodiments, the target molecule or target molecule complex comprises a bacterium, and the effector activity is a bactericidal activity determined by a serum bactericidal assay (SBA) or an opsonophagocytic killing assay (OPKA). In some embodiments, the bactericidal activity corresponds to a concentration where 50% bactericidal activity is observed from about 1 ng/mL to about 500 mg/mL (such as about any of 1 ng/mL, 2 ng/mL, 3 ng/mL, 4 ng/mL, 5 ng/mL, 6 ng/mL, 7 ng/mL, 8 ng/mL, 9 ng/mL, 10 ng/mL, 20 ng/mL, 30 ng/mL, 40 ng/mL, 50 ng/mL, 60 ng/mL, 70 ng/mL, 80 ng/mL, 90 ng/mL, 100 ng/mL, 200 ng/mL, 300 ng/mL, 400 ng/mL, 500 ng/mL, 600 ng/mL, 700 ng/mL, 800 ng/mL, 900 ng/mL, 1 μg/mL, 10 μg/mL, 20 μg/mL, 30 μg/mL, 40 μg/mL, 50 μg/mL, 60 μg/mL, 70 μg/mL, 80 μg/mL, 90 μg/mL, 100 μg/mL, 200 μg/mL, 300 μg/mL, 400 μg/mL, 500 μg/mL, 600 μg/mL, 700 μg/mL, 800 μg/mL, 900 μg/mL, 1 mg/mL, 10 mg/mL, 20 mg/mL, 30 mg/mL, 40 mg/mL, 50 mg/mL, 60 mg/mL, 70 mg/mL, 80 mg/mL, 90 mg/mL, 100 mg/mL, 200 mg/mL, 300 mg/mL, 400 mg/mL, or 500 mg/mL, including any ranges between any of these values). In some embodiments, the predetermined range for the bactericidal activity corresponds to a concentration where 50% bactericidal activity is observed from about 0.08 μg/mL to about 3600 μg/mL.
[0186] In some embodiments, the plurality of characteristic descriptors comprises the solubility score, optionally wherein the solubility score is determined using SKADE. In some embodiments, the solubility score is greater than about 0.5 (such as greater than about any of 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, or 0.99). In some embodiments, the solubility score is from about 0.5 to about 0.8 (such as about any of 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, or 0.8, including any ranges between any of these values). In some embodiments, the predetermined range for the solubility score is greater than about 0.5, or between 0.5 and 0.8.
[0187] In some embodiments, the plurality of characteristic descriptors comprises the aggregation score, optionally wherein the aggregation score corresponds to the number of residues predicted to have a propensity to aggregate and is determined by a method comprising the steps of: determining a 3D structure of the ABP, optionally wherein the 3D structure is determined using ABodyBuilder2; and determining the aggregation score based on the 3D structure, optionally wherein the aggregation score is determined using Aggrescan3D. In some embodiments, the aggregation score is fewer than 50 (such as fewer than any of 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1) aggregation-prone sites. In some embodiments, the predetermined range for the aggregation score is fewer than 20, 15, or fewer aggregation-prone sites.
[0188] In some embodiments, the plurality of characteristic descriptors comprises the hydrophobicity score, optionally wherein the hydrophobicity score is determined as the grand average of hydropathy (GRAVY), optionally wherein the hydropathy value of each amino acid is calculated using the Eisenberg scale. In some embodiments, the hydrophobicity score is less than about 0.1 (such as less than about any of 0.1, 0.095, 0.09, 0.085, 0.08, 0.075, 0.07, 0.065, 0.06, 0.055, 0.05, 0.045, 0.04, 0.035, 0.03, 0.025, 0.02, 0.015, 0.01, 0.005, or lower). In some embodiments, the predetermined range for the hydrophobicity score is less than 0.03, less than 0.02, or less than 0.015.
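By way of non-limiting illustration, the GRAVY calculation described above can be sketched in Python as the mean of per-residue hydropathy values. The scale values below are the Eisenberg consensus values as commonly tabulated and should be verified against the scale actually used; the names `EISENBERG`, `gravy`, and `passes_hydrophobicity`, and the default threshold of 0.03, are assumptions made for this sketch only.

```python
from typing import Mapping

# Eisenberg consensus hydrophobicity scale, as commonly tabulated.
# Assumption: verify these values against the scale used in practice.
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def gravy(sequence: str, scale: Mapping[str, float] = EISENBERG) -> float:
    """Grand average of hydropathy: sum of residue values / sequence length."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # A KeyError here signals a non-standard residue code.
    return sum(scale[aa] for aa in seq) / len(seq)


def passes_hydrophobicity(sequence: str, max_score: float = 0.03) -> bool:
    """True when the hydrophobicity score is below the chosen upper bound."""
    return gravy(sequence) < max_score
```

A lysine-rich sequence scores well below the bound, whereas an isoleucine-rich sequence does not, which is the behavior the predetermined range is intended to capture.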
[0189] In some embodiments, the plurality of characteristic descriptors comprises the isoelectric point, optionally wherein the isoelectric point is determined using EMBOSS pK values. In some embodiments, the isoelectric point is between about 6.5 and about 9.5 (such as about any of 6.5, 6.6, 6.7, 6.8, 6.9, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8.0, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, 9.0, 9.1, 9.2, 9.3, 9.4, or 9.5, including any ranges between any of these values). In some embodiments, the predetermined range for the isoelectric point is between about 7.0 and about 9.0 or between about 8.0 and about 8.5.
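By way of non-limiting illustration, an isoelectric point can be estimated from a sequence by finding the pH at which the Henderson-Hasselbalch net charge is zero. The sketch below uses the EMBOSS pK set as commonly tabulated (to be confirmed against the EMBOSS installation in use), treats the input as a single chain with one free N-terminus and one free C-terminus, and treats every cysteine as ionizable; these simplifications, and the names `net_charge`, `isoelectric_point`, and `pi_in_range`, are assumptions of this sketch.

```python
# EMBOSS pK values as commonly tabulated (assumption: verify before use).
PK_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PK_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a single chain at the given pH (Henderson-Hasselbalch)."""
    seq = sequence.upper()
    positive = 1.0 / (1.0 + 10 ** (ph - PK_POSITIVE["Nterm"]))
    negative = 1.0 / (1.0 + 10 ** (PK_NEGATIVE["Cterm"] - ph))
    for aa in seq:
        if aa in ("K", "R", "H"):
            positive += 1.0 / (1.0 + 10 ** (ph - PK_POSITIVE[aa]))
        elif aa in ("D", "E", "C", "Y"):
            negative += 1.0 / (1.0 + 10 ** (PK_NEGATIVE[aa] - ph))
    return positive - negative


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisection on pH in [0, 14]; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def pi_in_range(sequence: str, low: float = 7.0, high: float = 9.0) -> bool:
    """True when the estimated isoelectric point falls within [low, high]."""
    return low <= isoelectric_point(sequence) <= high
```

Bisection is used because the net charge is a strictly decreasing function of pH, so a single sign change exists in the search interval.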
[0190] In some embodiments, the plurality of characteristic descriptors comprises the stability score, optionally wherein the stability score is determined by a method comprising the steps of: calculating an aliphatic index by determining the relative volume of A, V, L, and I residues, wherein the stability score corresponds to the aliphatic index. In some embodiments, the stability score is from about 60 to about 80 (such as about any of 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, or 80, including any ranges between any of these values). In some embodiments, the predetermined range for the stability score is from about 65 to about 73.
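By way of non-limiting illustration, the aliphatic index can be sketched using the conventional Ikai formulation, in which the mole percentages of alanine, valine, isoleucine, and leucine are weighted by the relative volumes of their side chains (1, 2.9, and 3.9, respectively). The present disclosure does not state which coefficients are used, so the formula, the function names, and the default bounds of 65 and 73 are assumptions of this sketch.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index: X(A) + 2.9*X(V) + 3.9*(X(I) + X(L)), X in mole percent."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


def stability_in_range(sequence: str, low: float = 65.0, high: float = 73.0) -> bool:
    """True when the stability score (aliphatic index) lies within [low, high]."""
    return low <= aliphatic_index(sequence) <= high
```

A sequence without A, V, I, or L residues scores zero, and the score rises with the fraction of aliphatic side chains.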
[0191] In some embodiments, the plurality of characteristic descriptors comprises the molecular weight and the predetermined range for the molecular weight is less than about 250 kDa (such as less than about any of 250 kDa, 245 kDa, 240 kDa, 235 kDa, 230 kDa, 225 kDa, 220 kDa, 215 kDa, 210 kDa, 205 kDa, 200 kDa, 195 kDa, 190 kDa, 185 kDa, 180 kDa, 175 kDa, 170 kDa, 165 kDa, 160 kDa, 155 kDa, 150 kDa, 145 kDa, 140 kDa, 135 kDa, 130 kDa, 125 kDa, 120 kDa, 115 kDa, 110 kDa, 105 kDa, 100 kDa, or less). In some embodiments, the predetermined range for the molecular weight is less than about 170 kDa, less than about 160 kDa, less than about 150 kDa, less than about 140 kDa, less than about 130 kDa, less than about 120 kDa, or lower.
[0192] In some embodiments, the plurality of characteristic descriptors comprises a number of unpaired cysteine residues and the predetermined range for the number of unpaired cysteine residues is less than 5, 4, 3, 2, or 1.
[0193] In some embodiments, the plurality of characteristic descriptors comprises an abundance frequency or fold-change of the increase in the abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen, optionally wherein the sorting process is fluorescence-activated cell sorting (FACS) or magnetic-activated cell sorting (MACS), further optionally wherein the sorting is carried out by yeast display.
[0194] In some embodiments, the predetermined range for the post-sort abundance frequency is greater than about 0.01% (such as greater than about any of 0.010%, 0.015%, 0.020%, 0.025%, 0.030%, 0.035%, 0.040%, 0.045%, 0.050%, 0.055%, 0.060%, 0.065%, 0.070%, 0.075%, 0.080%, 0.085%, 0.090%, 0.095%, 0.100%, or greater) within a pool of ABPs obtained after the sorting process. In some embodiments, the predetermined range for the post-sort abundance frequency is greater than about any of 0.01%, 0.02%, 0.03%, 0.04%, 0.05%, or greater within a pool of ABPs obtained after the sorting process. In some embodiments, the predetermined range for the post-sort fold-change in abundance frequency is greater than about 0.5 (such as greater than about any of 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8, 4.0, 4.2, 4.4, 4.6, 4.8, 5.0, 5.2, 5.4, 5.6, 5.8, 6.0, or greater). In some embodiments, the predetermined range for the post-sort fold-change is greater than about any of 1, 1.5, 2, 2.5, 3, 4, or greater. In some embodiments, the enrichment is performed by FACS or MACS.
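By way of non-limiting illustration, post-sort abundance frequency and fold-change can be sketched from per-ABP read counts, taking fold-change as the ratio of an ABP's post-sort frequency to its pre-sort frequency. The count dictionaries, the function names, and the handling of ABPs absent from the pre-sort pool (they are skipped, because their fold-change is undefined) are assumptions of this sketch.

```python
from typing import Dict, List


def frequencies(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-ABP counts into fractions of the whole pool."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("pool must contain at least one count")
    return {abp: n / total for abp, n in counts.items()}


def fold_changes(pre: Dict[str, int], post: Dict[str, int]) -> Dict[str, float]:
    """Post-sort frequency divided by pre-sort frequency for each ABP."""
    pre_freq = frequencies(pre)
    post_freq = frequencies(post)
    # ABPs with zero pre-sort frequency are skipped (fold-change undefined).
    return {
        abp: post_freq.get(abp, 0.0) / f
        for abp, f in pre_freq.items()
        if f > 0
    }


def select_enriched(
    pre: Dict[str, int],
    post: Dict[str, int],
    min_frequency: float = 0.0001,  # 0.01% expressed as a fraction
    min_fold_change: float = 1.0,
) -> List[str]:
    """ABPs whose post-sort frequency and fold-change both exceed the bounds."""
    post_freq = frequencies(post)
    fc = fold_changes(pre, post)
    return sorted(
        abp
        for abp, value in fc.items()
        if post_freq.get(abp, 0.0) > min_frequency and value > min_fold_change
    )
```

For example, with pre-sort counts `{"a": 50, "b": 50}` and post-sort counts `{"a": 90, "b": 10}`, only `"a"` is enriched, since its frequency rises from 50% to 90% while that of `"b"` falls.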
[0195] In some embodiments, the plurality of characteristic descriptors comprises the number of non-canonical glycosylation sites, optionally wherein the predetermined range for the number of non-canonical glycosylation sites is less than 5, 4, 3, 2, or 1.
[0196] In some embodiments, the plurality of characteristic descriptors comprises the number of cleavage sites, optionally wherein the predetermined range for the number of cleavage sites is less than 5, 4, 3, 2, or 1, further optionally wherein the cleavage site is a DP motif in the variable heavy or variable light chain region of the respective ABP.
[0197] In some embodiments, the plurality of characteristic descriptors comprises the number of deamidation sites, optionally wherein the predetermined range for the number of deamidation sites is less than 5, 4, 3, 2, or 1, further optionally wherein the deamidation site is an NG, NS, or NA motif in CDR2H or CDR1L of the respective ABP.
[0198] In some embodiments, the plurality of characteristic descriptors comprises the number of isomerization sites, optionally wherein the predetermined range for the number of isomerization sites is less than 5, 4, 3, 2, or 1, further optionally wherein the isomerization site is a DG or DS motif in CDR2H, CDR3H, or CDR1L of the respective ABP.
[0199] In some embodiments, the plurality of characteristic descriptors comprises the number of oxidation sites, optionally wherein the predetermined range for the number of oxidation sites is less than 5, 4, 3, 2, or 1, further optionally wherein the oxidation site is a W or M residue in the CDRHs or CDRLs of the respective ABP.
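By way of non-limiting illustration, the sequence liabilities described above (DP cleavage motifs, NG/NS/NA deamidation motifs, DG/DS isomerization motifs, and W or M oxidation sites) can be counted per region as sketched below. The region keys (`"VH"`, `"VL"`, `"CDR1H"`, and so on), the function names, and the assumption that region sequences have already been extracted by an upstream annotation step are hypothetical and are not taken from the present disclosure.

```python
import re
from typing import Dict, Iterable

CDRS = ["CDR1H", "CDR2H", "CDR3H", "CDR1L", "CDR2L", "CDR3L"]


def count_motifs(sequence: str, motifs: Iterable[str]) -> int:
    """Count motif occurrences; the lookahead also counts overlapping hits."""
    pattern = "(?=(" + "|".join(re.escape(m) for m in motifs) + "))"
    return len(re.findall(pattern, sequence.upper()))


def liability_counts(regions: Dict[str, str]) -> Dict[str, int]:
    """Count each liability type in the regions named in the description."""

    def total(motifs, names):
        return sum(count_motifs(regions.get(n, ""), motifs) for n in names)

    return {
        "cleavage": total(["DP"], ["VH", "VL"]),
        "deamidation": total(["NG", "NS", "NA"], ["CDR2H", "CDR1L"]),
        "isomerization": total(["DG", "DS"], ["CDR2H", "CDR3H", "CDR1L"]),
        "oxidation": total(["W", "M"], CDRS),
    }


def within_limits(counts: Dict[str, int], limit: int = 5) -> bool:
    """True when every liability count is below the chosen limit."""
    return all(value < limit for value in counts.values())
```

Regions that are missing from the input are treated as empty, so a partial annotation yields a lower bound on the counts rather than an error.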
[0200] In some embodiments, the plurality of characteristic descriptors comprises the CDR3H length, optionally wherein the predetermined range for the CDR3H length is from about 10 to about 14 amino acids.
[0201] In some embodiments, the plurality of characteristic descriptors comprises the binding specificity, optionally wherein the binding specificity corresponds to the number of variants of a target antigen capable of being targeted by the respective ABP and the predetermined range for the binding specificity is the capability of binding to at least 2 (such as at least any of 3, 4, 5, 6, 7, 8, 9, 10, or more) variants of the target antigen, further optionally wherein the binding specificity is determined by a PolyMap assay and/or a polyreactivity assay. In some embodiments, the polyreactivity assay is a binding assay. In some embodiments, the polyreactivity assay is a FACS-based assay. In some embodiments, the polyreactivity assay is a plate-based assay with purified ABPs. Non-limiting examples of binding assays to determine polyreactivity or polyspecificity can be found in Jain et al. (Biophysical properties of the clinical-stage antibody landscape. Proc Natl Acad Sci U S A. 2017 Jan 31; 114(5):944-949); Makowski et al. (Highly sensitive detection of antibody nonspecific interactions using flow cytometry. mAbs, 2021, 13(1)); Kelly et al. (Reduction of nonspecificity motifs in synthetic antibody libraries. Journal of Molecular Biology, Volume 430, Issue 1, 5 January 2018, Pages 119-130); and Chen et al. (Human antibody polyreactivity is governed primarily by the heavy-chain complementarity-determining regions. Cell Reports, Volume 43, Issue 10, 22 October 2024, 114801), each of which is incorporated by reference in its entirety.
[0202] In some embodiments, the polyreactivity assay allows for isolating antibodies with high and low polyreactivity. In some embodiments, the plurality of characteristic descriptors comprises, at least in part, ABP features that result in polyreactivity. In some embodiments, the plurality of characteristic descriptors comprises, at least in part, an oligoclonality assay.
[0203] In some embodiments, the method of the present disclosure comprises generating a dataset for a filtered ABP library comprising selected ABPs, wherein the dataset comprises a reference to each of the selected ABPs and: (a) the filtered ABP library comprises at least 10 selected ABPs, which are a subset of the at least 100 candidate ABPs; and (b) each selected ABP has at least one of the plurality of characteristic descriptors that meets a preferred criterion. In some embodiments, the method further comprises providing the dataset for the filtered ABP library for generation of a composition comprising selected ABPs, thereby generating the RPP. In some embodiments, the method further comprises generating a composition comprising selected ABPs using the dataset.
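By way of non-limiting illustration, generation of the dataset for the filtered ABP library can be sketched as a pass over candidate ABPs that records, for each selected ABP, a reference to the ABP and the criteria it meets. The descriptor names, the representation of each criterion as a predicate, and the `min_met` parameter are assumptions of this sketch; the minimum library sizes recited above (at least 10 selected ABPs from at least 100 candidates) are not enforced here.

```python
from typing import Callable, Dict, List

Criterion = Callable[[float], bool]


def filter_library(
    candidates: Dict[str, Dict[str, float]],
    criteria: Dict[str, Criterion],
    min_met: int = 1,
) -> List[dict]:
    """Return one record per selected ABP, listing the criteria it meets.

    candidates maps an ABP identifier to its descriptor values;
    criteria maps a descriptor name to a predicate on that value.
    """
    dataset = []
    for abp_id, descriptors in candidates.items():
        met = sorted(
            name
            for name, is_met in criteria.items()
            if name in descriptors and is_met(descriptors[name])
        )
        if len(met) >= min_met:
            dataset.append({"abp_id": abp_id, "criteria_met": met})
    return dataset
```

Raising `min_met` turns the "at least one descriptor" selection into a stricter filter that requires several descriptors to meet their criteria at once.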
[0204] In some embodiments, the preferred criterion is that the binding affinity to the respective target antigen is ranked in at least the top 25% among all the candidate ABPs in the input ABP library. In some embodiments, the preferred criterion is that the binding affinity to the respective target antigen is ranked in at least the top 10%, at least the top 20%, at least the top 30%, the top 35%, the top 40%, the top 45%, or the top 50% among all the candidate ABPs in the input ABP library.
[0205] In some embodiments, the preferred criterion is that the effector activity against the target molecule or complex is ranked in at least the top 25% among all the candidate ABPs in the input ABP library. In some embodiments, the preferred criterion is that the effector activity against the target molecule or complex is ranked in at least the top 10%, the top 20%, the top 30%, the top 35%, the top 40%, the top 45%, or the top 50% among all the candidate ABPs in the input ABP library.
[0206] In some embodiments, the preferred criteria is a binding pattern for the respective target antigen and its variants, where the binding pattern is shared with less than 30% of other candidate ABPs in the input ABP library. In some embodiments, the binding pattern corresponds to a subset of the respective target antigen and its variants capable of binding to the respective ABP. In some embodiments, the preferred criteria is a binding pattern for the respective target antigen and its variants, where the binding pattern is shared with less than 20% of other candidate ABPs in the input ABP library. In some embodiments, the preferred criteria is a binding pattern for the respective target antigen and its variants, where the binding pattern is shared with less than 40%, less than 30%, less than 20%, less than 15%, less than 10%, less than 9%, less than 8%, less than 7%, less than 6%, less than 5%, less than 4%, less than 3%, less than 2%, or less than 1% of other candidate ABPs in the input ABP library. In some embodiments, the preferred criteria is that the binding pattern of the respective ABP is shared with at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 other candidate ABPs in the input ABP library. In some embodiments, the preferred criterion is that the binding pattern of the respective ABP is shared with less than 10, less than 9, less than 8, less than 7, less than 6, less than 5, less than 4, less than 3, or less than 2 other candidate ABPs in the input ABP library.
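By way of non-limiting illustration, the binding-pattern criterion can be sketched by representing each ABP's binding pattern as the set of target antigen variants it binds and computing, for each ABP, the fraction of other candidate ABPs that share exactly the same set. Treating two patterns as shared only when the sets are identical, along with the function names and the default bound of 30%, is an assumption of this sketch.

```python
from collections import Counter
from typing import Dict, Iterable, List


def shared_fraction(patterns: Dict[str, Iterable[str]]) -> Dict[str, float]:
    """Fraction of the other candidate ABPs with an identical binding pattern."""
    keyed = {abp: frozenset(variants) for abp, variants in patterns.items()}
    others = len(keyed) - 1
    if others < 1:
        raise ValueError("at least two candidate ABPs are required")
    pattern_counts = Counter(keyed.values())
    # Subtract one so that an ABP is not counted as sharing with itself.
    return {abp: (pattern_counts[p] - 1) / others for abp, p in keyed.items()}


def rare_pattern_abps(
    patterns: Dict[str, Iterable[str]], max_shared: float = 0.30
) -> List[str]:
    """ABPs whose binding pattern is shared with less than max_shared of others."""
    fractions = shared_fraction(patterns)
    return sorted(abp for abp, f in fractions.items() if f < max_shared)
```

Selecting on a low shared fraction favors ABPs with uncommon binding patterns, whereas the alternative embodiments that require a pattern to be shared with at least a given number of other ABPs could be implemented by comparing `pattern_counts[p] - 1` against that number instead.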
[0207] In some embodiments, the preferred criterion is that the abundance frequency of the ABP following sorting the input ABP library or a subset thereof is ranked in at least the top 25% among all the candidate ABPs in the input ABP library or the subset thereof. In some embodiments, the preferred criterion is that the abundance frequency following sorting the input ABP library or a subset thereof is ranked in at least the top 15%, at least the top 20%, at least the top 25%, at least the top 30%, at least the top 35%, at least the top 40%, or at least the top 50% among all the candidate ABPs in the input ABP library or the subset thereof.
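By way of non-limiting illustration, a rank-based criterion such as "ranked in at least the top 25%" can be sketched by sorting candidate ABPs on a descriptor and keeping the leading fraction. Rounding the cutoff up to a whole number of ABPs, breaking ties by input order, and the function name `top_fraction` are assumptions of this sketch.

```python
import math
from typing import Dict, List


def top_fraction(
    scores: Dict[str, float],
    fraction: float = 0.25,
    higher_is_better: bool = True,
) -> List[str]:
    """Return the ABP identifiers ranked within the top fraction of candidates.

    Set higher_is_better=False for descriptors such as KD, where a lower
    value indicates a better-ranked ABP.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scores, key=scores.get, reverse=higher_is_better)
    keep = math.ceil(fraction * len(ranked))
    return ranked[:keep]
```

The same helper applies to binding affinity, effector activity, abundance frequency, or fold-change, provided the direction of "better" is set appropriately for each descriptor.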
[0208] In some embodiments, the preferred criterion is that the fold-change of the increase in the abundance frequency following sorting the input ABP library or the subset thereof is ranked in at least the top 25% among all the candidate ABPs in the input ABP library or the subset thereof. In some embodiments, the preferred criterion is that the fold-change of the increase in the abundance frequency following sorting the input ABP library or the subset thereof is ranked in at least the top 15%, at least the top 20%, at least the top 25%, at least the top 30%, at least the top 35%, at least the top 40%, or at least the top 50% among all the candidate ABPs in the input ABP library or the subset thereof. Fold change can be defined as the ratio of an ABP's post-enrichment frequency to its pre-enrichment frequency.
ABP Characteristics
[0209] In some embodiments, an ABP of an RPP described herein is selected based on a plurality of characteristics selected from binding affinity, effector activity (e.g., neutralization activity or killing activity), solubility, aggregation, hydrophobicity, isoelectric point, stability, molecular weight, number of cysteine residues, abundance frequency and fold-change in abundance following sorting of a library of ABPs, number of glycosylation sites, number of cleavage sites, number of deamidation sites, number of isomerization sites, number of oxidation sites, CDR3H length, binding specificity, and full or partial amino acid sequence.
[0210] In some embodiments, an ABP of an RPP described herein is selected based, at least in part, on a binding affinity of the ABP to a respective target antigen. In some embodiments, the binding affinity is expressed in KD, and the ABP is selected for having a KD less than about 100 μM, 10 μM, 1 μM, 100 nM, 10 nM, 1 nM, or lower. In some embodiments, the ABP has a KD less than about 100 μM (such as less than about any of 100 μM, 90 μM, 80 μM, 70 μM, 60 μM, 50 μM, 40 μM, 30 μM, 20 μM, 10 μM, 9 μM, 8 μM, 7 μM, 6 μM, 5 μM, 4 μM, 3 μM, 2 μM, 1 μM, 900 nM, 800 nM, 700 nM, 600 nM, 500 nM, 400 nM, 300 nM, 200 nM, 100 nM, 50 nM, 1 nM, 900 pM, 800 pM, 700 pM, 600 pM, 500 pM, 400 pM, 300 pM, 200 pM, 100 pM, 50 pM, 1 pM, or lower). In some embodiments, the ABP has a KD from about 100 μM to about 1 pM (such as about any of 100 μM, 90 μM, 80 μM, 70 μM, 60 μM, 50 μM, 40 μM, 30 μM, 20 μM, 10 μM, 9 μM, 8 μM, 7 μM, 6 μM, 5 μM, 4 μM, 3 μM, 2 μM, 1 μM, 900 nM, 800 nM, 700 nM, 600 nM, 500 nM, 400 nM, 300 nM, 200 nM, 100 nM, 50 nM, 1 nM, 900 pM, 800 pM, 700 pM, 600 pM, 500 pM, 400 pM, 300 pM, 200 pM, 100 pM, 50 pM, or 1 pM, including any ranges between any of these values). In some embodiments, such a binding affinity is determined by surface plasmon resonance (SPR) or biolayer interferometry (BLI). In some embodiments, the binding affinity is expressed qualitatively as being able to bind or not being able to bind, and the ABP is selected for being able to bind. In some embodiments, such a binding affinity is determined by a PolyMap assay as described herein and in PCT/US2024/012238, which is incorporated by reference in its entirety herein.
[0211] In some embodiments, an ABP of an RPP described herein is selected based, at least in part, on an effector activity of the ABP. In some embodiments, the target molecule or complex comprises a virus, and the effector activity is a neutralization activity. In some embodiments, the neutralization activity is determined by a pseudovirus neutralization assay or a live virus neutralization assay. In some embodiments, the neutralization activity is determined by a pseudovirus neutralization assay and corresponds to an IC50 from about 1 ng/mL to about 500 mg/mL (such as about any of 1 ng/mL, 10 ng/mL, 20 ng/mL, 30 ng/mL, 40 ng/mL, 50 ng/mL, 60 ng/mL, 70 ng/mL, 80 ng/mL, 90 ng/mL, 100 ng/mL, 200 ng/mL, 300 ng/mL, 400 ng/mL, 500 ng/mL, 600 ng/mL, 700 ng/mL, 800 ng/mL, 900 ng/mL, 1 μg/mL, 10 μg/mL, 20 μg/mL, 30 μg/mL, 40 μg/mL, 50 μg/mL, 60 μg/mL, 70 μg/mL, 80 μg/mL, 90 μg/mL, 100 μg/mL, 200 μg/mL, 300 μg/mL, 400 μg/mL, 500 μg/mL, 600 μg/mL, 700 μg/mL, 800 μg/mL, 900 μg/mL, 1 mg/mL, 10 mg/mL, 20 mg/mL, 30 mg/mL, 40 mg/mL, 50 mg/mL, 60 mg/mL, 70 mg/mL, 80 mg/mL, 90 mg/mL, 100 mg/mL, 200 mg/mL, 300 mg/mL, 400 mg/mL, or 500 mg/mL, including any ranges between any of these values). In some embodiments, the neutralization activity is an IC50 from about 0.08 μg/mL to about 900 μg/mL. In some embodiments, the neutralization activity is determined by a live virus neutralization assay and corresponds to an IC50 from about 1 ng/mL to about 500 mg/mL (such as about any of 1 ng/mL, 2 ng/mL, 3 ng/mL, 4 ng/mL, 5 ng/mL, 6 ng/mL, 7 ng/mL, 8 ng/mL, 9 ng/mL, 10 ng/mL, 20 ng/mL, 30 ng/mL, 40 ng/mL, 50 ng/mL, 60 ng/mL, 70 ng/mL, 80 ng/mL, 90 ng/mL, 100 ng/mL, 200 ng/mL, 300 ng/mL, 400 ng/mL, 500 ng/mL, 600 ng/mL, 700 ng/mL, 800 ng/mL, 900 ng/mL, 1 μg/mL, 10 μg/mL, 20 μg/mL, 30 μg/mL, 40 μg/mL, 50 μg/mL, 60 μg/mL, 70 μg/mL, 80 μg/mL, 90 μg/mL, 100 μg/mL, 200 μg/mL, 300 μg/mL, 400 μg/mL, 500 μg/mL, 600 μg/mL, 700 μg/mL, 800 μg/mL, 900 μg/mL, 1 mg/mL, 10 mg/mL, 20 mg/mL, 30 mg/mL, 40 mg/mL, 50 mg/mL, 60 mg/mL, 70 mg/mL, 80 mg/mL, 90 mg/mL, 100 mg/mL, 200 mg/mL, 300 mg/mL, 400 mg/mL, or 500 mg/mL, including any ranges between any of these values). In some embodiments, the neutralization activity is an IC50 from about 0.003 μg/mL to about 1350 μg/mL. In some embodiments, the target molecule or complex comprises a bacterial protein, and the effector activity is a bactericidal activity. In some embodiments, the bactericidal activity is determined by a serum bactericidal assay (SBA) or an opsonophagocytic killing assay (OPKA). In some embodiments, the bactericidal activity corresponds to a concentration where 50% bactericidal activity is observed from about 1 ng/mL to about 500 mg/mL (such as about any of 1 ng/mL, 2 ng/mL, 3 ng/mL, 4 ng/mL, 5 ng/mL, 6 ng/mL, 7 ng/mL, 8 ng/mL, 9 ng/mL, 10 ng/mL, 20 ng/mL, 30 ng/mL, 40 ng/mL, 50 ng/mL, 60 ng/mL, 70 ng/mL, 80 ng/mL, 90 ng/mL, 100 ng/mL, 200 ng/mL, 300 ng/mL, 400 ng/mL, 500 ng/mL, 600 ng/mL, 700 ng/mL, 800 ng/mL, 900 ng/mL, 1 μg/mL, 10 μg/mL, 20 μg/mL, 30 μg/mL, 40 μg/mL, 50 μg/mL, 60 μg/mL, 70 μg/mL, 80 μg/mL, 90 μg/mL, 100 μg/mL, 200 μg/mL, 300 μg/mL, 400 μg/mL, 500 μg/mL, 600 μg/mL, 700 μg/mL, 800 μg/mL, 900 μg/mL, 1 mg/mL, 10 mg/mL, 20 mg/mL, 30 mg/mL, 40 mg/mL, 50 mg/mL, 60 mg/mL, 70 mg/mL, 80 mg/mL, 90 mg/mL, 100 mg/mL, 200 mg/mL, 300 mg/mL, 400 mg/mL, or 500 mg/mL, including any ranges between any of these values). In some embodiments, the bactericidal activity is a concentration where 50% bactericidal activity is observed from about 0.08 μg/mL to about 3600 μg/mL.
[0212] In some embodiments, an ABP of an RPP described herein is selected based, at least in part, on solubility of the ABP. In some embodiments, the ABP is assigned a solubility score. In some embodiments, the solubility score is determined using SKADE. See, e.g., Raimondi, D., Orlando, G., Fariselli, P., & Moreau, Y. (2020). Insight into the protein solubility driving forces with neural attention. PLoS computational biology, 16(4), e1007722. In some embodiments, the solubility score is greater than about 0.5 (such as greater than about any of 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, or 0.99). In some embodiments, the solubility score is from about 0.5 to about 0.8 (such as about any of 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, or 0.8, including any ranges between any of these values). In some embodiments, the solubility score is greater than about 0.5, or between 0.5 and 0.8.
[0213] In some embodiments, an ABP of an RPP described herein is selected based, at least in part, on aggregation (e.g., predicted aggregation) of the ABP. In some embodiments, the ABP is assigned an aggregation score. In some embodiments, the aggregation score corresponds to the number of residues predicted to have a propensity to aggregate. In some embodiments, the number of residues predicted to have a propensity to aggregate is determined by a method comprising the steps of: determining a 3D structure of the ABP (e.g., using ABodyBuilder2); and determining the aggregation score based on the 3D structure (e.g., using Aggrescan3D). In some embodiments, the aggregation score is fewer than 50 (such as fewer than any of 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1) aggregation-prone sites. In some embodiments, the aggregation score is fewer than 20, 15, or fewer aggregation-prone sites.
[0214] In some embodiments, an ABP of an RPP described herein is selected based, at least in part, on hydrophobicity (e.g., predicted hydrophobicity) of the ABP. In some embodiments, the ABP is assigned a hydrophobicity score. In some embodiments, the hydrophobicity score is determined as the grand average of hydropathy (GRAVY). In some embodiments, the hydropathy value of each amino acid is calculated using the Eisenberg scale. In some embodiments, the hydrophobicity score is less than about 0.1 (such as less than about any of 0.1, 0.095, 0.09, 0.085, 0.08, 0.075, 0.07, 0.065, 0.06, 0.055, 0.05, 0.045, 0.04, 0.035, 0.03, 0.025, 0.02, 0.015, 0.01, 0.005, or lower). In some embodiments, the hydrophobicity score is less than 0.03, less than 0.02, or less than 0.015.
[0215] In some embodiments, an ABP of an RPP described herein is selected based, at least in part, on an isoelectric point of the ABP. In some embodiments, the isoelectric point is determined using EMBOSS pK values. In some embodiments, the isoelectric point is between about 6.5 and about 9.5 (such as about any of 6.5, 6.6, 6.7, 6.8, 6.9, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8.0, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, 9.0, 9.1, 9.2, 9.3, 9.4, or 9.5, including any ranges between any of these values). In some embodiments, the isoelectric point is between about 7.0 and about 9.0, or between about 8.0 and about 8.5.
[0216] In some embodiments, an ABP of an RPP described herein is selected based, at least in part, on a stability (e.g., predicted stability) of the ABP. In some embodiments, the ABP is assigned a stability score. In some embodiments, the stability score is determined by calculating an aliphatic index by determining the relative volume of A, V, L, and I residues, wherein the stability score corresponds to the aliphatic index. In some embodiments, the stability score is from about 60 to about 80 (such as about any of 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, or 80, including any ranges between any of these values). In some embodiments, the stability score is from about 65 to about 73.
[0217] In some embodiments, an ABP of an RPP described herein is selected based, at least in part, on the molecular weight of the ABP. In some embodiments, the molecular weight is less than about 250 kDa (such as less than about any of 250 kDa, 245 kDa, 240 kDa, 235 kDa, 230 kDa, 225 kDa, 220 kDa, 215 kDa, 210 kDa, 205 kDa, 200 kDa, 195 kDa, 190 kDa, 185 kDa, 180 kDa, 175 kDa, 170 kDa, 165 kDa, 160 kDa, 155 kDa, 150 kDa, 145 kDa, 140 kDa, 135 kDa, 130 kDa, 125 kDa, 120 kDa, 115 kDa, 110 kDa, 105 kDa, 100 kDa, or less). In some embodiments, the molecular weight is less than about 170 kDa, less than about 160 kDa, less than about 150 kDa, less than about 140 kDa, less than about 130 kDa, less than about 120 kDa, or lower.
[0218] In some embodiments, an ABP of an RPP described herein is selected based, at least in part, on the number of cysteine residues in the ABP. In some embodiments, the selection is based on the number of unpaired cysteine residues. In some embodiments, the number of unpaired cysteine residues is less than 5, 4, 3, 2, or 1.
[0219] In some embodiments, an ABP of an RPP described herein is selected based, at least in part, on an abundance frequency of the ABP following sorting of a library of ABPs to enrich for binding to the respective target antigen. In some embodiments, the sorting is by fluorescence-activated cell sorting (FACS) or magnetic-activated cell sorting (MACS). In some embodiments, the sorting is carried out by yeast display. The term ‘'frequency” as described herein, refers to a measure of how often the ABP occurs within a defined population of ABPs in a library or sub-library.
[0220] In some embodiments, an ABP is selected when its abundance frequency following sorting the input ABP library or a subset thereof is ranked at least top 30% among all the candidate ABPs in the input ABP library or the subset thereof. In some embodiments, an ABP is selected when its abundance frequency following sorting the input ABP library or a subset thereof is ranked at least top 25% among all the candidate ABPs in the input ABP library or the subset thereof. In some embodiments, an ABP is selected when its abundance frequency following sorting the input ABP library or a subset thereof is ranked at least top 20% among all the candidate ABPs in the input ABP library or the subset thereof. In some embodiments, an ABP is selected when its abundance frequency following sorting the input ABP library or a subset thereof is ranked at least top 15% among all the candidate ABPs in the input ABP library or the subset thereof. In some embodiments, an ABP is selected when its abundance frequency following sorting the input ABP library or a subset thereof is ranked at least top 10% among all the candidate ABPs in the input ABP library or the subset thereof. In some embodiments, an ABP is selected when its abundance frequency following sorting the input ABP library or a subset thereof is ranked at least top 5% among all the candidate ABPs in the input ABP library or the subset thereof.
[0221] In some embodiments, the ABP of an RPP described herein is selected based, at least in part, on binding specificity. In some embodiments, an ABP of an RPP described herein is selected based, at least in part, on polyreactivity. In some embodiments, binding specificity is determined, at least in part, by assessing the polyreactivity of the ABP. In some embodiments, the ABP of an RPP described herein is selected based, at least in part, on cross-reactivity. In some embodiments, polyreactivity is assessed by performing binding assays. In some embodiments, polyreactivity is assessed by performing a FACS-based assay. In some embodiments, the polyreactivity assay requires one or more reagents, e.g., reagents that are used before or during cell sorting. In some embodiments, the one or more reagents is a polyreactivity reagent (PRR) comprising a polyreactive probe. In some embodiments, the PRR is selected from a protein, complex carbohydrate, nucleic acid, and lipid. In some embodiments, the PRR is selected from ovalbumin, KLH, BSA, and insulin.
[0222] In some embodiments, an ABP is selected when it has specific binding to a target antigen or its variant. In some embodiments, an ABP is selected when it has low polyreactivity. In some embodiments, an ABP is excluded when it has low specificity to a target antigen and / or high polyreactivity. In some embodiments, an ABP is selected when it has specificity to a target antigen or its variant with a ranking in the range of 10% to 90% within the input ABP library. In some embodiments, an ABP is selected when it has specificity to a target antigen or its variant with a ranking in the range of 20% to 80% within the input ABP library. In some embodiments, an ABP is selected when it has specificity to a target antigen or its variant with a ranking in the range of 25% to 75% within the input ABP library. In some embodiments, an ABP is selected when it has specificity to a target antigen or its variant with a ranking in the range of 30% to 70% within the input ABP library.
[0223] In some embodiments, the post-sort abundance frequency is greater than about 0.01% (such as greater than about any of 0.010%, 0.015%, 0.020%, 0.025%, 0.030%, 0.035%, 0.040%, 0.045%, 0.050%, 0.055%, 0.060%, 0.065%, 0.070%, 0.075%, 0.080%, 0.085%, 0.090%, 0.095%, 0.100%, or greater) within a pool of ABPs obtained after the sorting process. In some embodiments, the post-sort abundance frequency is greater than about any of 0.01%, 0.02%, 0.03%, 0.04%, 0.05%, or greater within a pool of ABPs obtained after the sorting process.
[0224] In some embodiments, an ABP of an RPP described herein is selected based, at least in part, on a fold-change in an abundance frequency of the ABP following sorting of a library of ABPs to enrich for binding to the respective target antigen. In some embodiments, the sorting is by fluorescence-activated cell sorting (FACS) or magnetic-activated cell sorting (MACS). In some embodiments, the sorting is carried out by yeast display. In some embodiments, the post-sort fold-change in abundance frequency is greater than about 0.5 (such as greater than about any of 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8, 4.0, 4.2, 4.4, 4.6, 4.8, 5.0, 5.2, 5.4, 5.6, 5.8, 6.0, or greater). In some instances, the fold change is infinite if the ABP is not present or detectable in the pre-sort library. In some embodiments, the post-sort fold-change in abundance frequency is greater than about any of 1, 1.5, 2, 2.5, 3, 4, or greater.
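The abundance frequency and fold-change criteria can be illustrated with a minimal sketch, assuming that each ABP is represented by a read or clone count before and after sorting. The function names, the count-based input format, and the default thresholds (0.01% expressed as the fraction 0.0001, and a fold-change of 2) are illustrative assumptions, not part of the disclosure.

```python
import math


def frequencies(counts: dict) -> dict:
    """Convert per-ABP counts into fractions of the whole pool."""
    total = sum(counts.values())
    return {abp: c / total for abp, c in counts.items()} if total else {}


def fold_changes(pre_counts: dict, post_counts: dict) -> dict:
    """Post-sort frequency divided by pre-sort frequency for each ABP.
    The fold-change is infinite when the ABP is absent from the pre-sort pool."""
    pre = frequencies(pre_counts)
    post = frequencies(post_counts)
    result = {}
    for abp, f_post in post.items():
        f_pre = pre.get(abp, 0.0)
        result[abp] = math.inf if f_pre == 0.0 else f_post / f_pre
    return result


def select_enriched(pre_counts, post_counts, min_post_freq=0.0001, min_fold=2.0):
    """Keep ABPs exceeding both the post-sort frequency and fold-change cutoffs."""
    post = frequencies(post_counts)
    fc = fold_changes(pre_counts, post_counts)
    return sorted(a for a in post if post[a] > min_post_freq and fc[a] > min_fold)
```

In this sketch an ABP detected only after sorting receives an infinite fold-change and therefore passes any finite fold-change cutoff, consistent with the treatment of undetectable pre-sort ABPs described above.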
[0225] In some embodiments, an ABP of an RPP described herein is selected based, at least in part, on the number of glycosylation sites (e.g., predicted glycosylation sites) in the ABP. In some embodiments, the selection is based on the number of non-canonical glycosylation sites. In some embodiments, the number of non-canonical glycosylation sites is less than 5, 4, 3, 2, or 1.
[0226] In some embodiments, an ABP of an RPP described herein is selected based, at least in part, on the number of cleavage sites (e.g., predicted cleavage sites) in the ABP. In some embodiments, the number of cleavage sites is less than 5, 4, 3, 2, or 1. In some embodiments, the cleavage sites comprise a DP motif in the variable heavy or variable light chain region of the respective ABP.
[0227] In some embodiments, an ABP of an RPP described herein is selected based, at least in part, on the number of deamidation sites (e.g., predicted deamidation sites) in the ABP. In some embodiments, the number of deamidation sites is less than 5, 4, 3, 2, or 1. In some embodiments, the deamidation sites comprise an NG, NS, or NA motif in CDR2H or CDR1L of the respective ABP.
[0228] In some embodiments, an ABP of an RPP described herein is selected based, at least in part, on the number of isomerization sites (e.g., predicted isomerization sites) in the ABP. In some embodiments, the number of isomerization sites is less than 5, 4, 3, 2, or 1. In some embodiments, the isomerization sites comprise a DG or DS motif in CDR2H, CDR3H, or CDR1L of the respective ABP.
[0229] In some embodiments, an ABP of an RPP described herein is selected based, at least in part, on the number of oxidation sites (e.g., predicted oxidation sites) in the ABP. In some embodiments, the number of oxidation sites is less than 5, 4, 3, 2, or 1. In some embodiments, the oxidation sites comprise a W or M residue in the CDRHs or CDRLs of the respective ABP.
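The motif-based counts for cleavage, deamidation, isomerization, and oxidation sites can be illustrated with a minimal sketch that counts the stated motifs (DP; NG, NS, or NA; DG or DS; W or M) in a sequence region supplied by the caller. Extracting the relevant region (for example, a particular CDR) requires an antibody numbering step that is outside this sketch, and the names used here are illustrative only.

```python
import re

# Motifs as stated in the text; the dictionary keys are illustrative labels.
LIABILITY_PATTERNS = {
    "cleavage": r"DP",
    "deamidation": r"N[GSA]",
    "isomerization": r"D[GS]",
    "oxidation": r"[WM]",
}


def count_liabilities(region: str) -> dict:
    """Count occurrences of each liability motif in one sequence region."""
    seq = region.upper()
    return {name: len(re.findall(pattern, seq))
            for name, pattern in LIABILITY_PATTERNS.items()}


def passes(region: str, max_sites: int = 5) -> bool:
    """True when every motif count is below the cutoff (e.g., less than 5)."""
    return all(n < max_sites for n in count_liabilities(region).values())
```

Because each motif is counted only within the region passed in, the same function can be applied separately to a variable region, to CDR2H, or to CDR1L, depending on which liability is being assessed.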
[0230] In some embodiments, an ABP of an RPP described herein is selected based, at least in part, on the CDR3H length in the ABP. In some embodiments, the CDR3H length is from about 8 to about 40 amino acids.
[0231] In some embodiments, an ABP of an RPP described herein is selected based, at least in part, on the binding specificity of the ABP. For example, in some embodiments, the ABP is capable of binding to at least 1 (such as at least any of 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) variant of a target antigen (e.g., a variant of a viral polypeptide). In some embodiments, the binding specificity is determined by a PolyMap assay as described herein. In some embodiments, the binding specificity is further determined by a polyreactivity assay.
[0232] In some embodiments, an ABP of an RPP described herein is selected based, at least in part, on the full or partial amino acid sequence of the ABP. In some embodiments, the selection is based on the full amino acid sequence of the ABP. In some embodiments, the selection is based on a partial amino acid sequence of the ABP, including, e.g., CDRs of the ABP. In some embodiments, the selection is based on the CDR3 (heavy and / or light chain) sequences of the ABP. In some embodiments, the selection is based on the CDR1, CDR2, and / or CDR3 (heavy and / or light chain) sequences of the ABP. In some embodiments, the selection is based on the CDR1, CDR2, and CDR3 (heavy and / or light chain) sequences of the ABP. In some embodiments, the selection is based on the heavy chain and light chain CDR1, CDR2, and CDR3 sequences of the ABP.
[0233] In some embodiments, an ABP of an RPP described herein is selected based on at least any 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or 19 of the characteristics described herein. The selection characteristics may be individually chosen for each ABP in the RPP, or may be the same for all ABPs in the RPP. For example, in some embodiments, a first ABP of an RPP may be selected based on its binding affinity, solubility, and hydrophobicity, while a second ABP of the RPP may be selected based on its aggregation, isoelectric point, and stability. In other embodiments, all of the ABPs in an RPP may be selected based on their binding affinity, solubility, hydrophobicity, aggregation, isoelectric point, and stability.
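Selection on a combination of characteristics can be illustrated with a minimal sketch, assuming each ABP is represented as a record of precomputed characteristic values. The record fields, the criterion names, and the specific cutoffs (taken from ranges stated earlier in this section) are illustrative; the ABP names in any usage are made up.

```python
from dataclasses import dataclass


@dataclass
class AbpRecord:
    name: str
    stability_score: float
    molecular_weight_kda: float
    unpaired_cysteines: int


# Each criterion is a predicate over one record (hypothetical registry).
CRITERIA = {
    "stability": lambda a: 60.0 <= a.stability_score <= 80.0,
    "molecular_weight": lambda a: a.molecular_weight_kda < 250.0,
    "unpaired_cysteines": lambda a: a.unpaired_cysteines < 5,
}


def select_abps(records, criteria_names):
    """Return names of records that satisfy every named criterion."""
    checks = [CRITERIA[name] for name in criteria_names]
    return [r.name for r in records if all(check(r) for check in checks)]
```

Calling `select_abps` with one list of criterion names for all records corresponds to applying the same characteristics to every ABP, while calling it per record with different names corresponds to choosing characteristics individually for each ABP.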
[0234] In some embodiments, the plurality of characteristic descriptors comprises the immunogenicity of the respective ABP, which refers to the ability of the ABP to provoke an immune response in the subject. This response can result in the production of anti-drug antibodies (ADAs) by the subject’s immune system, which may recognize the therapeutic antibody as foreign. Reducing immunogenicity is a major focus in the development of antibody therapeutics.
[0235] In some embodiments, one or more of the characteristics of an ABP described herein may be determined (e.g., predicted) in silico. For example, in some embodiments, where an ABP characteristic selected from solubility, aggregation, hydrophobicity, isoelectric point, stability, number of cysteine residues, number of glycosylation sites, number of cleavage sites, number of deamidation sites, number of isomerization sites, and number of oxidation sites is used as a basis for selection to be included in an RPP, the ABP characteristic can be determined in silico. Such in silico determinations can be carried out using any convenient means known in the art, such as by using a full or partial sequence of the ABP or a nucleic acid encoding the ABP as an input.
[0236] In some embodiments, one or more of the characteristics of an ABP described herein may be determined (e.g., predicted) by an ML / AI model. For example, in some embodiments, where an ABP characteristic selected from solubility, aggregation, hydrophobicity, isoelectric point, stability, number of cysteine residues, number of glycosylation sites, number of cleavage sites, number of deamidation sites, number of isomerization sites, immunogenicity, and number of oxidation sites is used as a basis for selection to be included in an RPP, the ABP characteristic can be determined by an ML / AI model.
ABP Library Properties
[0237] In some embodiments, the ABPs of an RPP described herein are selected based on one or more ABP library properties selected from number of unique CDR3H sequences, number of unique epitopes, number of antigen variants targeted, number of unique heavy chain V genes, number of unique light chain V genes, number of unique heavy chain J genes, number of unique light chain J genes, heavy chain V gene average percent germline identity, light chain V gene average percent germline identity, heavy chain J gene average percent germline identity, and light chain J gene average percent germline identity.
[0238] In some embodiments, the ABPs of an RPP described herein are selected based, at least in part, on the number of unique CDR3H sequences in the ABPs. In some embodiments, the library of ABPs in the RPP comprises at least 2 (such as at least any of 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, or more) unique CDR3H sequences.
[0239] In some embodiments, the ABPs of an RPP described herein are selected based, at least in part, on the number of unique epitopes targeted by ABPs in the library of ABPs that make up the RPP. In some embodiments, the library of ABPs in the RPP targets at least 2 (such as at least any of 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, or more) unique epitopes.
[0240] In some embodiments, the ABPs of an RPP described herein are selected based, at least in part, on the number of antigen variants targeted by the ABPs. In some embodiments, the library of ABPs in the RPP targets at least 1 (such as at least any of 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) variant of an antigen targeted by the RPP. For example, where the RPP targets a particular viral spike protein, the library of ABPs in the RPP may target one or more variants of the viral spike protein.
[0241] In some embodiments, the ABPs of an RPP described herein are selected based, at least in part, on the number of unique heavy chain V genes represented in the library of ABPs that make up the RPP. In some embodiments, the library of ABPs in the RPP represents at least 2 (such as at least any of 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, or more) unique heavy chain V genes.
[0242] In some embodiments, the ABPs of an RPP described herein are selected based, at least in part, on the number of unique light chain V genes represented in the library of ABPs that make up the RPP. In some embodiments, the library of ABPs in the RPP represents at least 2 (such as at least any of 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, or more) unique light chain V genes.
[0243] In some embodiments, the ABPs of an RPP described herein are selected based, at least in part, on the number of unique heavy chain J genes represented in the library of ABPs that make up the RPP. In some embodiments, the library of ABPs in the RPP represents at least 2 (such as at least any of 3, 4, 5, or 6) unique heavy chain J genes.
[0244] In some embodiments, the ABPs of an RPP described herein are selected based, at least in part, on the number of unique light chain J genes represented in the library of ABPs that make up the RPP. In some embodiments, the library of ABPs in the RPP represents at least 2 (such as at least any of 3, 4, 5, 6, 7, 8, or 9) unique light chain J genes.
[0245] In some embodiments, the ABPs of an RPP described herein are selected based, at least in part, on the heavy chain V gene average percent germline identity represented in the library of ABPs that make up the RPP. In some embodiments, the library of ABPs in the RPP represents a heavy chain V gene average percent germline identity between about 50% and about 100% (such as about any of 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, including any ranges between any of these values).
[0246] In some embodiments, the ABPs of an RPP described herein are selected based, at least in part, on the light chain V gene average percent germline identity represented in the library of ABPs that make up the RPP. In some embodiments, the library of ABPs in the RPP represents a light chain V gene average percent germline identity between about 50% and about 100% (such as about any of 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, including any ranges between any of these values).
[0247] In some embodiments, the ABPs of an RPP described herein are selected based, at least in part, on the heavy chain J gene average percent germline identity represented in the library of ABPs that make up the RPP. In some embodiments, the library of ABPs in the RPP represents a heavy chain J gene average percent germline identity between about 50% and about 100% (such as about any of 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, including any ranges between any of these values).
[0248] In some embodiments, the ABPs of an RPP described herein are selected based, at least in part, on the light chain J gene average percent germline identity represented in the library of ABPs that make up the RPP. In some embodiments, the library of ABPs in the RPP represents a light chain J gene average percent germline identity between about 50% and about 100% (such as about any of 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, including any ranges between any of these values).
[0249] In some embodiments, the ABPs of an RPP described herein are selected based on at least any 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11 of the library properties described herein.
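Several of the library properties listed above can be computed directly from per-ABP annotations. The following is a minimal sketch, assuming each ABP is a dictionary with a CDR3H sequence, a heavy chain V gene assignment, and a percent germline identity for that V gene; the field names are hypothetical, and the annotations themselves would come from an upstream V(D)J assignment step that is not shown.

```python
from statistics import mean


def library_properties(abps: list) -> dict:
    """Summarize diversity and germline identity for a library of ABPs.

    Each entry is assumed to carry the hypothetical keys
    'cdr3h', 'heavy_v_gene', and 'heavy_v_identity' (percent).
    """
    if not abps:
        raise ValueError("library must contain at least one ABP")
    return {
        "unique_cdr3h": len({a["cdr3h"] for a in abps}),
        "unique_heavy_v_genes": len({a["heavy_v_gene"] for a in abps}),
        "heavy_v_mean_germline_identity": mean(a["heavy_v_identity"] for a in abps),
    }
```

The same pattern extends to the light chain V genes and to the heavy and light chain J genes by adding the corresponding fields.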
[0250] In some embodiments, a respective ABP of a filtered ABP library dataset may be associated with an abundance value. In some embodiments, the abundance of each ABP in a respective filtered ABP library dataset is equal. In some embodiments, the abundance of each ABP is separately adjusted. For example, the abundance for an ABP may be adjusted depending on the known or predicted values for one or more characteristic descriptors for the ABP.
Experimental method
[0251] Various experimental methods known in the art can be adopted and used for determination and prediction of characteristics of ABPs. For example, binding affinity of an ABP can be determined by surface plasmon resonance (SPR) or biolayer interferometry (BLI), but other methods can also be used. In some embodiments, binding affinity is determined by Isothermal Titration Calorimetry (ITC), Fluorescence Polarization (FP), Microscale Thermophoresis (MST), Equilibrium Dialysis, Fluorescence Resonance Energy Transfer (FRET), Analytical Ultracentrifugation (AUC), Electrophoretic Mobility Shift Assay (EMSA), Radioactive Ligand Binding Assay, Dual Polarization Interferometry (DPI), or Competition Binding Assays.
PolyMap assay
[0252] In some embodiments, a PolyMap assay is used to determine binding of an ABP to a target antigen. The PolyMap assay is the method described in PCT / US2024 / 012238 filed Jan 19, 2024, and PCT / US2025 / 038958 filed July 23, 2025, which are incorporated by reference in their entirety. Additional ABP characteristics can be found in PCT / US2025 / 038958, which is hereby incorporated by reference in its entirety.
[0253] In short, the PolyMap assay is a method for high-throughput analysis of antibodies. Specifically, the method can comprise the steps of: (i) providing a library of target-decorated cells, wherein each of the target-decorated cells presents a target of interest on the membrane; (ii) contacting the library of target-decorated cells with a plurality of TBP-ribosome-mRNA (TRM) complexes, thereby inducing binding between the target-decorated cells and the TRM complexes; (iii) generating a plurality of emulsion microdroplets, wherein each microdroplet contains a single cell out of the target-decorated cells, one or more TRM complexes bound to the single cell, and a lysis reagent inducing lysis of the single cell; (iv) capturing RNA released from the single cell on a solid surface or within a semi-permeable shell; and (v) generating a library of hybrid polynucleic acids that comprise a sequence from a transcript of the single cell and / or a sequence from the mRNA of the TRM complex.
[0254] The library of hybrid polynucleic acids can be analyzed or sequenced to provide information related to binding between the target of interest and the TRM complex. For example, based on the sequence, a TBP that binds to the target of interest and / or their binding affinity or specificity can be studied. Accordingly, the method can further comprise the step of identifying a target-TBP pair based on the sequencing of the library of hybrid polynucleic acids. In some embodiments, a plurality of target-TBP pairs are identified by sequencing the library of hybrid polynucleic acids. In some embodiments, more than two, three, four, five, six, seven, eight, nine, ten, twenty or more target-TBP pairs are identified. In some embodiments, the method further comprises the step of identifying a target binding protein specific to the target of interest. In some embodiments, the method further comprises the step of identifying binding affinity or specificity of a target binding protein specific to the target of interest.
[0255] In some embodiments, the PolyMap assay comprises the steps of: providing a library of target-decorated cells, wherein each of the target-decorated cells presents the target molecule or complex on the membrane; contacting the library of target-decorated cells with a plurality of ABP-ribosome-mRNA (ARM) complexes corresponding to the one or more of the plurality of ABPs, thereby inducing binding between the target-decorated cells and the ARM complexes; generating a plurality of monodisperse or polydisperse emulsion microdroplets, wherein each microdroplet contains a single cell out of the target-decorated cells, one or more ARM complexes bound to the single cell, and a lysis reagent inducing lysis of the single cell; capturing RNA released from the single cell on a solid surface or within a semi-permeable shell; generating a library of hybrid polynucleic acids that comprise a sequence from a transcript of the single cell and / or a sequence from the mRNA of the ARM complex; sequencing the library of hybrid polynucleic acids; and determining a presence or absence of binding of each of the one or more of the plurality of ABPs to their respective target antigen.
In silico method
[0256] In some embodiments, at least one characteristic descriptor of the plurality of characteristic descriptors is determined using an in silico method. In some embodiments, the in silico method obtains the respective characteristic based on the sequence of the respective ABP, or a nucleic acid encoding the ABP, as an input. In some embodiments, the in silico method obtains the respective characteristic based on the protein structure prediction, paratope-epitope prediction and antibody-antigen docking.
[0257] In some embodiments, one or more in silico methods are used to determine the solubility score, aggregation score, hydrophobicity score, isoelectric point, stability score, number of cysteine residues, number of glycosylation sites, number of cleavage sites, number of deamidation sites, number of isomerization sites, and / or number of oxidation sites of one or more of the ABPs referenced in the input ABP library dataset.
[0258] In some embodiments, the in silico methods are used to determine binding affinity, effector activity (e.g., neutralization activity or killing activity), solubility, aggregation, hydrophobicity, isoelectric point, stability, molecular weight, number of cysteine residues, abundance frequency and fold-change in abundance following sorting of a library of ABPs, number of glycosylation sites, number of cleavage sites, number of deamidation sites, number of isomerization sites, number of oxidation sites, CDR3H length, binding specificity, and full or partial amino acid sequence. In some embodiments, an in silico method is used for epitope mapping, affinity maturation, and humanization while ensuring compatibility for therapeutic use. In some embodiments, the in silico method is used for assessment of antibody developability, including aggregation, solubility, viscosity, and excipient formulation. In some embodiments, an in silico method is used for binding prediction, neutralization prediction, ACE2 competition prediction and / or epitope prediction.
[0259] In some embodiments, an in silico method is used for sequence analysis of a group of ABPs, to determine V and J gene usage and / or V and J gene sequence percent identity to germline sequences. In some embodiments, an in silico method is used for functional analysis of a group of ABPs, e.g., to determine the binding profile and / or effector activity (e.g., neutralization activity or killing activity) profile of the group of ABPs.
[0260] In some embodiments, the in silico method is used to determine one, two, three, four, five, six, seven or more of the characteristic descriptors.
[0261] In some embodiments, the functional characteristics are experimentally generated. In some embodiments, the functional characteristics are computationally calculated. In some embodiments, the functional characteristics are obtained from a database.
[0262] In some embodiments, each antibody sequence is paired with its known functional properties, such as binding affinity, effector activity (e.g., neutralization activity or killing activity), solubility, aggregation, hydrophobicity, isoelectric point, stability, abundance frequency and fold-change in abundance following sorting of a library of ABPs and / or binding specificity of the antibodies. In some embodiments, various preprocessing steps are employed, such as sequence alignment, normalization, and removal of redundant or erroneous data.
[0263] In some embodiments, the model is trained to predict binding affinity, effector activity (e.g., neutralization activity or killing activity), solubility, aggregation, hydrophobicity, isoelectric point, stability, abundance frequency and fold-change in abundance following sorting of a library of ABPs and / or binding specificity of the antibodies. In some embodiments, the model predicts interaction of an antibody with other antibodies in a library. In some embodiments, the model predicts suitability of an antibody for inclusion in a polyclonal antibody.
[0264] In some embodiments, the model is trained using a dataset divided into training, validation, and test subsets. In some embodiments, the model is cross-validated to fine-tune hyperparameters or to select the best-performing model configuration. In some embodiments, data augmentation strategies, such as random perturbations and synthetic sequence generation, are implemented to increase the diversity of the training data. In some embodiments, the performance of the model is assessed using various metrics, including accuracy, precision, recall, and the area under the receiver operating characteristic (ROC) curve.
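The performance metrics named above can be illustrated with a minimal sketch for a binary classification task (for example, binder versus non-binder), assuming labels of 0 and 1. The function names are illustrative, and the area under the ROC curve is computed here through its equivalent pairwise-ranking formulation rather than by tracing the curve.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Report 0.0 when a denominator is empty, to avoid division by zero.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve: the fraction of (positive, negative) pairs
    in which the positive is scored higher, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Accuracy, precision, and recall operate on thresholded predictions, whereas the ROC area operates on the raw scores and is therefore independent of any particular decision threshold.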
[0265] In some embodiments, the model generates an antibody sequence or variation thereof that has preferred functional characteristics and is suitable for generation of a polyclonal antibody.
Machine-learning and artificial intelligence (AI) based methods
[0266] In some embodiments, at least one characteristic descriptor of the plurality of characteristic descriptors is determined using machine-learning models. In some embodiments, the machine-learning model based method obtains the respective characteristic based on the sequence of residues of the respective ABP, or a nucleic acid encoding the ABP, as input. In some embodiments, the machine-learning model based method obtains the respective characteristic based on the protein structure prediction, paratope-epitope prediction, and / or antibody-antigen docking as inputs. In some embodiments, the protein structure prediction for a primary amino acid sequence may be generated using a single or multicomponent AI system that uses one or more machine-learning models to predict a protein’s structure based on the amino acid sequence. In some embodiments, a protein’s structure refers to the 3-D or tertiary structure of the protein. The tertiary structure refers to the structure of the polypeptide chain of the protein. The tertiary structure may also incorporate properties of various interactions that pertain to forces occurring between molecules that contribute to the resulting folding structure of the protein, including the hydrophobic interactions, hydrogen bonding, ionic bonding, and van der Waals forces of the molecules. In some embodiments, a protein’s structure refers to the 4-D or quaternary structure of the protein. The quaternary structure refers to the structure of the protein macromolecule formed by interactions between multiple polypeptide chains, where each polypeptide chain may be referred to as a subunit. The protein with quaternary structure may include more than one of the same type of a protein subunit and / or may include different subunits. The quaternary structure may also incorporate properties of various interactions between the subunits of the protein that result in the structure of the protein.
[0267] In some instances, the output of the structure prediction is the coordinates of each heavy atom in the protein structure. Moreover, the outputs of such models may also capture evolutionary information in addition to the structural information of the protein, and by providing the structure prediction as inputs, the value of the characteristic descriptor can be predicted based on both characteristics in addition to the sequence itself. However, it is appreciated that the protein structure prediction can be in any appropriate format. In some embodiments, the characteristic descriptor is at least one or a combination of the characteristic descriptors described above in conjunction with subsection “ABP Characteristics.”
[0268] In some embodiments, a machine-learning model for predicting a respective characteristic descriptor is configured to receive inputs for an ABP (e.g., the sequence of the respective ABP, the nucleic acid encoding the ABP, the protein structure of the ABP) and generate a predicted value of the characteristic descriptor for the ABP based on the inputs. In some embodiments, the machine-learning model is configured as one or a combination of a neural network, a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a long short-term memory (LSTM) network, a transformer architecture, and the like.
[0269] In one embodiment, a trained machine-learning model includes a set of parameters trained via a training process. The training process is based on a training dataset including a plurality of samples of ABPs and known values of the characteristic descriptor for the ABP. In some embodiments, for one or more characteristic descriptors, a separate model is trained to predict each of the characteristic descriptors. For example, a first machine-learning model is trained to predict neutralization of an ABP and a separate second machine-learning model is trained to predict epitope type of an ABP, each with a different set of parameters. In some embodiments, a machine-learning model is trained to predict two or more characteristic descriptors. The training process is described in further detail below.
[0270] During the inference process, the trained parameters of a machine-learning model for predicting at least one characteristic descriptor are applied to the inputs for an ABP to generate a prediction for the characteristic descriptor. In some embodiments, when the inputs comprise the sequence of the ABP, each residue of the ABP is encoded into a numerical format.
[0271] The training dataset for the model includes a plurality of samples each including inputs for an ABP and corresponding labels including known values of the characteristic descriptor for the ABP. For example, the training dataset for a model to predict a characteristic descriptor of neutralization activity may include a sample including an ABP and a known neutralization activity of 157 ng/mL, another sample including a different ABP and a known neutralization activity of 298 ng/mL, and so on.
[0272] During the training process, the parameters of the model are initialized, and the training dataset is divided into one or more batches for one or more iterations of the training process. At each iteration, the parameters of the model are applied to inputs for the batch to generate estimated outputs. A loss function is computed that indicates a difference between the labels for the batch and the estimated outputs. The parameters of the model are updated to reduce the loss function. This process is repeated for subsequent iterations until a convergence criterion is reached.
Transformer architecture for predicting characteristic descriptors
[0273] FIG. 7 illustrates an inference process of a transformer-based model for predicting a characteristic descriptor, according to an embodiment. In one embodiment, a machine-learning model for predicting a characteristic descriptor of an ABP includes an embedding layer 710, a transformer model 750, and a prediction layer 760. In some embodiments, portions of the model are configured as a neural network that includes a plurality of layers of nodes and is associated with a set of parameters. For example, the transformer model 750 includes a plurality of encoders or decoders each including one or more attention layers. The model 700 shown in FIG. 7 is a model for predicting any one or a combination of the plurality of characteristic descriptors described herein.
[0274] The embedding layer 710 is configured to receive a set of tokens that are numerical representations of inputs for the ABP (e.g., the sequence of the ABP) and generate a set of input embeddings. A token represents a respective residue of the ABP and encodes one amino acid or nucleotide from a dictionary of amino acids or nucleotides. Thus, the input to the embedding layer 710 is a token sequence encoding a sequence of residues of an ABP.
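As a minimal, non-limiting sketch of the tokenization and embedding lookup described above, the following Python example maps residues to integer tokens and retrieves one vector per token. The 20-letter dictionary, the embedding width of 8, the random initialization, and all function names are illustrative assumptions and are not taken from the disclosure.

```python
import random

# Hypothetical dictionary: the 20 standard amino acids, one token id each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_ID = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def tokenize(sequence):
    """Map each residue of the sequence to its integer token id."""
    return [TOKEN_ID[residue] for residue in sequence]


def make_embedding_table(vocab_size, dim, seed=0):
    """Build a randomly initialized table; in a trained model these rows
    would be learned parameters of the embedding layer."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(vocab_size)]


def embed(tokens, table):
    """Look up one embedding vector per token."""
    return [table[t] for t in tokens]


tokens = tokenize("ARGHWEYYFDY")
table = make_embedding_table(len(AMINO_ACIDS), dim=8)
input_embeddings = embed(tokens, table)
```

Identical residues share a token and therefore the same input embedding; positional information would be added separately in a full transformer and is omitted here.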
[0275] The transformer model 750 is configured to receive the set of input embeddings and generate a set of output embeddings. In one embodiment, the transformer model 750 includes a set of attention layers and is configured as a “bidirectional encoder representations from transformers” (BERT) model. Specifically, an attention layer is coupled to receive a query, a key, and a value, and generate an attention output by combining the query, the key, and the value.
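The disclosure does not specify how the query, key, and value are combined. One common choice in transformer models is scaled dot-product attention, sketched below in plain Python under that assumption; the function names and toy inputs are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    For each query, similarity scores against every key are scaled by
    sqrt(d_k), converted to weights with softmax, and used to form a
    weighted average of the value vectors.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Toy example: two positions, two-dimensional queries, keys, and values.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
attn_out = attention(q, k, v)
```

Each output is a convex combination of the value vectors, so the first query, which matches the first key more closely, yields an output nearer to the first value.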
[0276] The prediction layer 760 is configured to receive at least one of the set of output embeddings and generate a set of predictions for one or more of the characteristic descriptors.
[0277] As an example, the model 700 in FIG. 7 is trained to predict binding affinity for an ABP. In the example shown in FIG. 7, the inference process is performed to predict the binding affinity value for an ABP with sequence “ARGHWEYYFDY.” A set of tokens is generated for the sequence. The parameters of the embedding layer 710 are applied to the set of tokens to generate a set of input embeddings. The parameters of the transformer model 750 are applied to the set of input embeddings to generate a set of output embeddings. The parameters of the prediction layer 760 are applied to at least one of the set of output embeddings to generate the prediction. As an example, the prediction for binding affinity is 23 pM for the ABP.
[0278] In some embodiments, the training process of the transformer architecture model comprises two steps: (1) a first step for pre-training the transformer model and (2) a second step for fine-tuning parameters of the prediction layer.
[0279] During each iteration of the first step (i.e., the pre-training step), the training method comprises:
(1) obtaining a training dataset including one or more token sequences for one or more ABPs, each token sequence encoding a partial or full sequence of a respective ABP;
(2) dividing the training dataset into one or more batches of samples for one or more iterations;
(3) for each of one or more iterations:
(a) obtaining an estimated output embedding sequence by applying parameters of the transformer model to the token sequence;
(b) for each token sequence, mapping the estimated output embedding sequence to an estimated token sequence;
(c) computing a loss function indicating differences between the estimated token sequences and the token sequences for the one or more ABPs; and
(d) updating the parameters of the transformer model by backpropagating error terms from the loss function.
This process is repeated for subsequent iterations until a convergence criterion is reached.
[0280] This way, the pre-trained transformer model 750 learns the relationships between different residues and their respective positions that are encoded within the one or more token sequences of the training dataset.
[0281] In one embodiment, the estimated output embeddings include a respective output embedding corresponding to each residue in the ABP sequence. The estimated output embeddings are mapped to estimated residues based on logit probabilities. A loss function is computed that indicates a difference between the estimated residues and the actual residues of the ABP sequences for the batch. The parameters of the model are updated to reduce the loss function.
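The disclosure does not name a specific loss function for the pre-training step. A minimal sketch, assuming a cross-entropy loss over per-position logits (a common choice for residue reconstruction), might look as follows; the three-letter toy dictionary and all names are hypothetical.

```python
import math


def log_softmax(logits):
    """Log-probabilities from raw logits, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reconstruction_loss(logits_per_position, true_tokens):
    """Mean negative log-likelihood of the actual residue at each position."""
    losses = [-log_softmax(logits)[t]
              for logits, t in zip(logits_per_position, true_tokens)]
    return sum(losses) / len(losses)


def predicted_tokens(logits_per_position):
    """Map each position to its most probable residue (arg-max over logits)."""
    return [max(range(len(l)), key=l.__getitem__)
            for l in logits_per_position]


# Toy example: a 3-letter dictionary and a 2-residue sequence.
logits = [[2.0, 0.1, -1.0], [0.0, 0.0, 3.0]]
truth = [0, 2]
loss_good = reconstruction_loss(logits, truth)      # logits agree with truth
loss_bad = reconstruction_loss(logits, [1, 0])      # logits disagree
```

The loss is smaller when the logits favor the actual residues, which is the quantity the parameter update would reduce.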
[0282] During each iteration of the second step (i.e., the fine-tuning step), the training method comprises:
(1) obtaining a training dataset including a plurality of samples corresponding to a plurality of ABPs, each sample including a token sequence encoding a partial and/or full sequence of the respective ABP and a label indicating a value of the characteristic descriptor for the ABP;
(2) accessing the pre-trained transformer model and a prediction layer;
(3) dividing the training dataset into one or more batches of samples for one or more iterations;
(4) for each of one or more iterations:
(a) obtaining a set of output embedding sequences for the respective batch of samples for a current iteration, wherein the set of output embedding sequences is generated by applying the transformer model to the token sequences for the batch of samples,
(b) for each sample in the batch, applying the parameters of the prediction layer to the output embedding sequence for the sample to generate an estimated output,
(c) computing a loss function indicating differences between the labels and the estimated outputs for the batch of samples, and
(d) updating the parameters of the prediction layer by backpropagating error terms obtained from the loss function; and
(5) storing the parameters of the prediction layer on a computer-readable medium.
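The fine-tuning steps above can be sketched, under simplifying assumptions, as mini-batch gradient descent on a linear prediction layer while the transformer outputs are held fixed. The linear layer, the mean-squared-error loss, the learning rate, and the toy data are assumptions made for illustration only.

```python
def predict(weights, bias, embedding):
    """Linear prediction layer applied to one (pooled) output embedding."""
    return sum(w * x for w, x in zip(weights, embedding)) + bias


def fine_tune_head(embeddings, labels, lr=0.1, epochs=2000, batch_size=2):
    """Train only the prediction layer; the embeddings stay fixed."""
    dim = len(embeddings[0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        for start in range(0, len(embeddings), batch_size):
            batch_x = embeddings[start:start + batch_size]
            batch_y = labels[start:start + batch_size]
            grad_w, grad_b = [0.0] * dim, 0.0
            for x, y in zip(batch_x, batch_y):
                # Gradient of the squared error, up to a constant factor.
                err = predict(weights, bias, x) - y
                for j in range(dim):
                    grad_w[j] += err * x[j] / len(batch_x)
                grad_b += err / len(batch_x)
            weights = [w - lr * g for w, g in zip(weights, grad_w)]
            bias -= lr * grad_b
    return weights, bias


# Hypothetical embeddings with labels generated as 2*x0 - x1 + 0.5.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.2]]
y = [-0.5, 2.5, 1.5, 1.3]
w, b = fine_tune_head(X, y)
```

Because the toy labels are an exact linear function of the embeddings, the learned weights approach 2, -1, and a bias of 0.5; real characteristic descriptors would not be fit this cleanly.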
[0283] This way, given a set of output embeddings encoding an ABP sequence, the prediction layer 760 learns how to map the output embeddings to a predicted characteristic descriptor value based on the training dataset.
[0284] In some embodiments, the process further comprises pre-training a transformer model with protein sequences or using an existing pre-trained model trained on general protein sequences. The steps (1) and (2) above are performed using the pre-trained model. This way, a transformer model that encodes relationships between amino acid sequences from a general population of proteins can be used as a starting point for learning the parameter values.
System for predicting characteristic descriptors using sequence and structural information
[0285] FIG. 8 illustrates a method of predicting a characteristic descriptor for an ABP using sequence-based features and structure-based features, according to an embodiment. In some embodiments, the method is performed by executing the pipeline 800 illustrated in FIG. 8. In some embodiments, the pipeline 800 comprises a protein language model (PLM) 820, a structure prediction model 840, a structure embedding model 850, and a model head layer 870 as machine-learned components. In one example, the parameters of the machine-learned components in the pipeline 800 are trained to predict which epitope on a target antigen an input ABP will bind to given the sequence information of the respective ABP.
[0286] The PLM 820 is configured to receive a set of tokens that are numerical representations of inputs for the ABP (e.g., partial or full sequence associated with the ABP) and generate a set of output embeddings. Similar to that described in FIG. 7, a token represents a respective residue of the ABP and encodes one amino acid or nucleotide from a dictionary of amino acids or nucleotides. For example, a token is represented as a vector including a number of elements corresponding to a dictionary of residues, and the respective element in the vector corresponding to the respective residue of the token has a value of 1 (or some nonzero value), while the remaining elements have a value of zero. Thus, the input to the PLM 820 is a token sequence encoding a sequence of residues of an ABP. In some embodiments, at least portions of the PLM 820 are configured as a transformer architecture that includes a set of attention layers. In one embodiment, the transformer model may be configured as a BERT model.
[0287] In some embodiments, the parameters of the PLM 820 are trained on a set of protein sequences, such that the output embeddings encode the relationships between different residues. In some embodiments, the PLM 820 may be the combination of the embedding layer 710 and the transformer model 750 that was pre-trained on general protein sequences, and fine-tuned on antibody sequences as described in detail in conjunction with FIG. 7. In one instance, each output embedding corresponding to a respective position in the original ABP sequence is associated with a vector of dimensionality 1 × d_plm, where d_plm is the dimensionality of the hidden space.
[0288] In some embodiments, the output embeddings generated for a respective ABP are mean-pooled to generate a PLM embedding 830 that represents the entire sequence in a latent space. Thus, the PLM embedding 830 may also have a vector of dimensionality 1 × d_plm. However, it is appreciated that in other embodiments, any other appropriate method for extracting the PLM embedding 830 can be applied when executing the pipeline 800. For example, in some embodiments, the PLM embedding is generated by applying a max function, a min function, or any other appropriate statistical function to the values of the output embeddings. In some embodiments, a special CLS token is input to the PLM 820 along with the other set of tokens for the ABP. In such an embodiment, a respective output embedding that corresponds to the position of the CLS token is identified as the PLM embedding 830.
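The pooling options described above can be illustrated with a short Python sketch. The function names, the position of the CLS token (assumed here to be index 0), and the toy per-residue embeddings are hypothetical.

```python
def mean_pool(embeddings):
    """Average each hidden dimension over all sequence positions."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def max_pool(embeddings):
    """Take the maximum of each hidden dimension over all positions."""
    return [max(column) for column in zip(*embeddings)]


def cls_pool(embeddings, cls_position=0):
    """Use the output embedding at the (assumed) CLS token position."""
    return list(embeddings[cls_position])


# Three positions, hidden dimensionality of 2.
per_residue = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]
plm_mean = mean_pool(per_residue)
plm_max = max_pool(per_residue)
plm_cls = cls_pool(per_residue)
```

Each variant reduces a sequence of per-position vectors to a single vector with the same hidden dimensionality, so the downstream components do not depend on sequence length.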
[0289] In some embodiments, an output embedding corresponding to a respective position in the original ABP sequence is obtained from the output layer of the PLM 820, and the output layer is associated with hidden dimensionality d_plm. In some embodiments, an output embedding is obtained from one or more intermediate layers that are placed before the output layer of the PLM 820, and the intermediate layer is associated with hidden dimensionality d_plm.
[0290] The structure prediction model 840 is configured to receive a set of tokens that are numerical representations of inputs for the ABP (e.g., sequence of ABP) and generate a structural representation of the input ABP. In some embodiments, portions of the structure prediction model 840 are configured as one or more transformer architectures.
[0291] In some embodiments, the structural representation encodes the tertiary (3D) structure of a protein or the quaternary (4D) structure of a protein as defined above. In one instance, the structural representation output by the structure prediction model 840 is formatted as a Protein Data Bank (PDB) file, which is a standardized text file describing the full 3D or 4D arrangement of atoms in a protein or nucleic acid structure. The PDB file for an input ABP may describe the atomic coordinates of each atom of the protein, secondary structure information, connectivity and bond information (e.g., disulfide bonds or ligands), and/or metadata information.
[0292] In some embodiments, parameters of the structure prediction model 840 are trained by learning a mapping from protein sequences to experimentally known 3D or 4D structures using large datasets. During training, the structure prediction model ingests a residue sequence of a protein along with multiple sequence alignments (MSAs) and templates, which capture evolutionary relationships and structural constraints. A deep transformer architecture iteratively reasons over pairwise residue relationships and sequence representations, followed by a structure module that outputs 3D atomic coordinates of the atoms in the protein. The loss function during the training process reduces the differences between predicted and true structures based on distances, angles, and atomic positions, while regularizing for physical plausibility. Through end-to-end training on tens of thousands of annotated structures, the model learns general patterns of protein folding, enabling accurate predictions even for unseen sequences.
[0293] The structure embedding model 850 is coupled to receive the structural representation of the respective ABP and generate a set of structural embeddings. In one embodiment, the structure embedding model 850 converts protein structures predicted by the structure prediction model 840 into discrete structural alphabets to generate structural embeddings for the input ABP. A structure embedding 855 representing the predicted structure of the input ABP sequence is obtained from the structural embeddings. In one instance, the structure embedding 855 has a vector of dimensionality 1 × d_structure.
[0294] In some embodiments, to encode both the sequential and structural features when predicting the characteristic descriptor value for the input ABP, the resulting PLM embedding 830 and the structure embedding 855 are concatenated along the hidden dimension. Therefore, the resulting concatenated embedding 860 for a respective input ABP may be a vector of dimensionality 1 × (d_structure + d_plm). In some embodiments, the concatenated embedding 860 is further generated by applying a scaling function, such as a normalization function or constant scaling function, to the values of the concatenated PLM embedding 830 and the structure embedding 855.
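A minimal sketch of the concatenation step follows. The disclosure leaves the scaling function open; this example assumes L2 normalization applied to each embedding before concatenation, and the ordering (PLM first, structure second) is likewise an arbitrary illustrative choice.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def concatenate(plm_embedding, structure_embedding, normalize=True):
    """Join the two embeddings along the hidden dimension.

    The result has d_plm + d_structure elements. Normalizing each part
    first keeps one embedding from dominating purely because of scale.
    """
    if normalize:
        plm_embedding = l2_normalize(plm_embedding)
        structure_embedding = l2_normalize(structure_embedding)
    return list(plm_embedding) + list(structure_embedding)


plm = [3.0, 4.0]               # hypothetical PLM embedding, d_plm = 2
structure = [0.0, 2.0, 0.0]    # hypothetical structure embedding, d_structure = 3
combined = concatenate(plm, structure)
```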
[0295] In some embodiments, a model head layer 870 is coupled to receive sequence-based features (e.g., PLM embedding) and / or structure-based features (e.g., structure embedding) to generate one or more predictions on the characteristic descriptor. In some embodiments, the trained model head layer 870 is applied to the concatenated embedding 860 to generate one or more predicted values of the characteristic descriptor for the respective ABP.
[0296] In some embodiments, the structure of the model head layer 870 is configured as a neural network with one or more layers of nodes. For example, the architecture of the neural network is a multi-layer perceptron (MLP) that includes an input layer with (d_structure + d_plm) nodes and an output layer with n_classes nodes for epitope predictions, when there are n_classes different epitopes on the target antigen. However, in other embodiments, the model head layer 870 is configured as any other appropriate model architecture, such as random forest classifiers, support vector machines, regression-based models, and the like. The parameters of the model head layer 870 are trained as described in further detail below.
[0297] As an example, the models in the pipeline 800 in FIG. 8 are trained to predict epitope binding on a target antigen for an ABP. In the example shown in FIG. 8, the pipeline 800 is performed on an example ABP with an example sequence “VH: EVQLVESGGGLVQPGGSL . . .” for the variable domain heavy chain sequence in the variable fragment (Fv) of the ABP and an example sequence “VL: DIQMTQSPSSLSASVGDRVTI . . .” for the variable domain light chain sequence in the Fv of the ABP to determine which epitope on a target antigen the respective ABP will bind to.
[0298] A PLM embedding is generated based on the sequence information of the example ABP. The sequence is also used to generate the structural representation and the structural embedding of the input ABP. The two embeddings are concatenated together to generate a concatenated embedding. The concatenated embedding is input to the trained MLP model head layer to generate predictions for a set of classes. Specifically, a prediction for a respective epitope (or class) indicates a likelihood the input ABP will bind to that particular epitope, and may be encoded as a value between 0 and 1, where a higher value indicates higher likelihood.
[0299] In some embodiments, a subset of epitopes with the highest likelihoods are selected for the prediction. For example, epitope region E corresponding to the 5th class may have a predicted value of 0.88 and the prediction may indicate that the example input ABP will likely bind to region E on the target antigen.
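As a non-limiting sketch of how an MLP model head could turn a concatenated embedding into per-epitope likelihoods and select the most likely epitope, consider the following. The single hidden layer, ReLU activation, softmax output, layer sizes, and random (untrained) weights are all assumptions made for illustration; a trained model head would use learned parameters.

```python
import math
import random

EPITOPES = ["A", "B", "C", "D", "E", "F"]  # six example epitope classes


def dense(x, weights, biases):
    """Fully connected layer: one weight row and one bias per output node."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out, n_in, rng):
    """Random, untrained parameters standing in for trained ones."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_head(x, layers):
    """Hidden layers use ReLU; the output layer yields class likelihoods."""
    for weights, biases in layers[:-1]:
        x = relu(dense(x, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(x, weights, biases))


rng = random.Random(0)
d_in, d_hidden = 5, 4
layers = [init_layer(d_hidden, d_in, rng),
          init_layer(len(EPITOPES), d_hidden, rng)]
probs = mlp_head([0.6, 0.8, 0.0, 1.0, 0.0], layers)

# Rank epitopes by predicted likelihood and keep the most likely one.
ranked = sorted(zip(EPITOPES, probs), key=lambda pair: pair[1], reverse=True)
top_epitope, top_prob = ranked[0]
```

Keeping the first few entries of `ranked` instead of only the first would correspond to selecting a subset of epitopes with the highest likelihoods.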
[0300] In some embodiments, the training method for training parameters of the model head layer 870 includes:
(1) obtaining a training dataset including a plurality of samples corresponding to a plurality of ABPs, each sample including a concatenated embedding for the respective ABP obtained by concatenating the PLM embedding and the structure embedding for the ABP, and a label indicating a value of the characteristic descriptor for the ABP;
(2) accessing the model head layer;
(3) dividing the training dataset into one or more batches of samples for one or more iterations;
(4) for each of one or more iterations:
(a) obtaining a set of estimated predictions for the respective batch of samples for a current iteration, wherein the set of estimated predictions is generated by applying the model head layer to the concatenated embeddings for the batch of samples,
(b) computing a loss function indicating differences between the labels and the estimated predictions for the batch of samples, and
(c) updating the parameters of the model head layer by backpropagating error terms obtained from the loss function; and
(5) storing the parameters of the model head layer on a computer-readable medium.
[0301] In some embodiments, the training process includes a K-fold cross validation process that trains and evaluates K separate and distinct versions of the model head layer based on K separate validation datasets. Thus, a model for a respective fold of validation data is associated with a respective set of parameters that are different from other models trained using the other folds of data. This prevents leakage between validation data. In some embodiments, for generating predictions for new data, one of the K models is selected as the model head layer 870, and the selected model is used to predict values for new data. In other embodiments, each of the K models generates a respective prediction, and an overall confidence or voting mechanism is used to generate the final prediction for the characteristic descriptor. For the example on epitope prediction, the epitope that was predicted by a majority of the K models is identified as the final prediction.
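The K-fold split and the majority-voting mechanism can be sketched as follows. The interleaved fold assignment and the tie-breaking behavior (the first-seen label wins a tie) are illustrative assumptions; the disclosure does not prescribe either.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of K folds.

    Every sample appears in exactly one validation fold, so no validation
    sample is shared between folds.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, validation


def majority_vote(predictions):
    """Return the label predicted by the largest number of models."""
    return Counter(predictions).most_common(1)[0][0]


splits = list(k_fold_indices(10, 5))

# Hypothetical epitope predictions from K = 5 model head layers.
final_epitope = majority_vote(["E", "E", "B", "E", "C"])
```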
[0302] This way, the training process for the model head layer effectively learns how to incorporate both sequence and / or structure information when generating predictions for one or more characteristic descriptors for an ABP.
[0303] FIGs. 9A-9B illustrate experimental results for epitope prediction using the pipeline illustrated in FIG. 8. FIG. 9A provides average validation loss and average validation accuracy (%) of epitope predictions by (i) PLM alone; (ii) the structure embedding model alone; or (iii) PLM in combination with the structure embedding model. FIG. 9B shows the results as a confusion matrix. Specifically, the dataset includes sequences for 1,606 antibodies binding to SARS-CoV-2 spike protein receptor-binding domains (RBDs). The dataset characterizes six regions, epitopes A, B, C, D, E, and F, on the spike protein RBD.
[0304] FIG. 9A illustrates experimental results for different model head layers trained using embeddings generated from three different models as features. The three different cases are (i) the PLM 820, (ii) the structure embedding model 850, and (iii) both the PLM 820 and the structure embedding model 850. Specifically, the parameters for the model head layer 870 for case (iii) are trained according to the process described in conjunction with FIG. 8 using both sequence and structure-based features. The parameters of the model head layer for case (i) are trained using only the sequence-based features (e.g., PLM embeddings). The parameters of the model head layer for case (ii) are trained using only the structure-based features (e.g., structure embeddings).
[0305] As shown in FIG. 9A, the average validation loss after performing a 5-fold cross-validation process is lowest for case (iii), which incorporates both sequence and structure-based features when training the model head layer, demonstrating that encoding both types of features is beneficial for characteristic descriptor prediction.
[0306] FIG. 9B illustrates a confusion matrix for an example model head layer trained for case (iii). Specifically, the horizontal axis indicates data samples in the validation dataset including ABPs predicted to bind to respective epitope regions A, B, C, D, E, F. The vertical axis indicates data samples that include ABPs known to bind to the respective epitope regions. Therefore, the diagonal elements in the confusion matrix denote the number of samples that are correctly predicted. As shown in FIG. 9B, the accuracy and performance of the pipeline 800 are relatively high.
System for predicting hydrophobicity using structural information
[0307] FIG. 10 illustrates a method of predicting hydrophobicity of a respective ABP using the predicted structure of the ABP, according to an embodiment. In some embodiments, the method is performed by executing the pipeline 1000 illustrated in FIG. 10. In some embodiments, the pipeline 1000 includes a structure prediction model 1040, a surface exposure computation process 1070, and a hydrophobicity computation process 1080.
[0308] In some embodiments, the structure prediction model 1040 is configured to receive a set of tokens that are numerical representations of inputs for the ABP (e.g., sequence of ABP) and generate a structural representation of the input ABP. In some embodiments, portions of the structure prediction model 1040 are configured as one or more transformer architectures. In some embodiments, the structure of the structure prediction model 1040 is identical or substantially similar to the structure prediction model 840 of the system described in conjunction with FIG. 8, and generates the structural representation as a PDB file that encodes the tertiary (3D) or quaternary (4D) structure of the input ABP.
[0309] In some embodiments, based on the structure representation generated by the structure prediction model 1040, the surface exposure computation process 1070 computes the solvent-accessible surface area (SASA) for each residue in the input ABP sequence. In one instance, the SASA for each residue is computed using a C programming language-based implementation that simulates how a spherical solvent probe “rolls” over the van der Waals surface of a biomolecule. In one instance, the surface exposure computation process 1070 employs the Lee-Richards algorithm with default parameters (probe radius = 1.4 Å; 100 points per atom; 20 slices). The surface exposure computation process 1070 yields a SASA value in Å² for each individual residue in the input ABP, quantifying the extent to which each residue is exposed to a solvent in the predicted structure of the ABP.
[0310] In some embodiments, based on the SASA computations performed during the surface exposure computation process 1070, the hydrophobicity computation process 1080 computes the hydrophobicity score for the input ABP. In some embodiments, the computed SASA values are integrated with one or more amino acid hydrophobicity scales. The surface hydrophobicity score for the input ABP is computed by taking a product of the computed SASA for each residue and the corresponding hydrophobicity index from a given scale, and combining the respective hydrophobicity score for each residue in the sequence. This process may be repeated for different hydrophobicity scales to obtain distinct surface hydrophobicity scores for the input ABP.
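A minimal sketch of the surface hydrophobicity computation follows, assuming that the per-residue products are combined by summation. The Kyte-Doolittle scale is used here purely as an example scale, and the per-residue SASA values are made up for illustration; in practice they would come from the surface exposure computation on the predicted structure.

```python
# Kyte-Doolittle hydropathy index per amino acid (example scale only).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_only_score(sequence, scale=KYTE_DOOLITTLE):
    """Baseline: sum of hydrophobicity indices, ignoring surface exposure."""
    return sum(scale[residue] for residue in sequence)


def surface_hydrophobicity(sequence, sasa, scale=KYTE_DOOLITTLE):
    """Sum over residues of (SASA of residue) x (hydrophobicity index)."""
    if len(sequence) != len(sasa):
        raise ValueError("one SASA value is required per residue")
    return sum(area * scale[residue] for residue, area in zip(sequence, sasa))


# Hypothetical three-residue example: the isoleucine is fully buried
# (SASA of 0), so it does not contribute to the surface score.
example_sequence = "AIG"
example_sasa = [10.0, 0.0, 50.0]  # made-up values in square angstroms
baseline = sequence_only_score(example_sequence)
surface = surface_hydrophobicity(example_sequence, example_sasa)
```

In this toy case the buried hydrophobic residue dominates the sequence-only baseline but contributes nothing to the surface score, which is the effect the structure-adjusted computation is meant to capture.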
[0311] Compared to existing hydrophobicity computation pipelines, the pipeline 1000 for predicting hydrophobicity scores for a given ABP uses the machine-learned structure prediction model 1040 to incorporate structure-based features of the ABP when computing the hydrophobicity score for the ABP. In this manner, the hydrophobicity score for the ABP is structure-adjusted by the predicted structure representation of the ABP, resulting in higher accuracy.
[0312] FIGs. 11A-11C illustrate experimental results for hydrophobicity score prediction using the pipeline, either with sequence alone or with sequence in combination with the structure embedding model involving the solvent-accessible surface area (SASA) normalization as illustrated in FIG. 10. FIG. 11A shows correlations between various hydrophobicity score predictions and the HIC retention times disclosed in Jain et al. FIG. 11B shows correlations between various hydrophobicity score predictions and the HIC retention times from the GDPa1 dataset. FIG. 11C shows the correlations as a scatter plot.
[0313] Specifically, FIG. 11A illustrates experimental results demonstrating the correlation of various hydrophobicity calculations with the hydrophobic interaction chromatography (HIC) retention time for a set of proteins from the Jain et al. dataset. The HIC retention time is indicative of hydrophobicity of antibodies and thus is a good indicator of the prediction performance of the pipeline 1000. Specifically, the points on the horizontal axis associated with the label “Sequence only” each correspond to different hydrophobicity scales, and the vertical axis indicates the coefficient of determination between hydrophobicity scores predicted without computing the SASA values and the HIC retention times obtained from the Jain et al. dataset. The points on the horizontal axis labeled “SASA normalized” each correspond to the different hydrophobicity scales, and the vertical axis indicates the coefficient of determination between hydrophobicity scores obtained by integrating the SASA values computed based on the structure representation predicted using the structure prediction model 1040 and the HIC retention times obtained from the Jain et al. dataset. Similarly, FIG. 11B illustrates the experimental results demonstrating the correlation of various hydrophobicity calculations with HIC retention time for a set of proteins from the GDPa1 dataset.
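The disclosure does not state how the coefficient of determination is computed. One plausible reading, assumed here, is the squared Pearson correlation between predicted scores and measured retention times, which equals the R² of a least-squares line through the points. The data values below are invented for illustration and are not experimental results.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def r_squared(xs, ys):
    """Coefficient of determination of a simple linear fit of ys on xs."""
    return pearson_r(xs, ys) ** 2


# Made-up predicted hydrophobicity scores and retention times.
predicted_scores = [1.0, 2.0, 3.0, 4.0]
retention_times = [9.1, 9.9, 11.2, 11.8]
r2 = r_squared(predicted_scores, retention_times)
```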
[0314] FIG. 11C illustrates data plots for predicted hydrophobicity using the Eisenberg scale, both without computing the SASA values (i.e., “Sequence only”) and using the predicted structure representation of the set of proteins to integrate SASA values into the hydrophobicity score computation. The top two plots illustrate the data points for the Jain et al. dataset, and the bottom two plots illustrate the data points for the GDPa1 dataset. As shown in FIG. 11C, the data points indicate relatively higher correlation between the predicted hydrophobicity scores and the HIC retention time for the “SASA normalized” case.
[0315] As illustrated in FIGs. 11A-11C, incorporating machine-learning model generated predictions of the structure representation of the input ABPs to compute the SASA contributions significantly increases the coefficient of determination between the predicted hydrophobicity scores and the known HIC retention times from these two example datasets. Typically, current methods of computing hydrophobicity consider hydrophobicity scales that attach a hydrophobicity index per residue given a sequence for a protein without considering the surface exposure of atoms in each residue due to the structure of the ABP. By executing the pipeline 1000 described in conjunction with FIG. 10, the hydrophobicity values can be better predicted using structure-based features.
System for predicting aggregation using structural information
[0316] FIG. 12 illustrates a method of predicting aggregation of a respective ABP using a predicted structure of the respective ABP, according to an embodiment. In some embodiments, the method is performed by executing the pipeline 1200 illustrated in FIG. 12. In some embodiments, the pipeline 1200 includes a structure prediction model 1240, an aggregation propensity computation process 1280, and an aggregation score computation process 1290.
[0317] In some embodiments, the structure prediction model 1240 is configured to receive a set of tokens that are numerical representations of inputs for the ABP (e.g., sequence of ABP) and generate a structural representation of the input ABP. In some embodiments, portions of the structure prediction model 1240 are configured as one or more transformer architectures. In some embodiments, the structure of the structure prediction model 1240 is identical or substantially similar to the structure prediction model 840 of the system described in conjunction with FIG. 8, and generates the structural representation as a PDB file that encodes the tertiary or quaternary structure of the input ABP.
[0318] In some embodiments, based on the structure representation generated by the structure prediction model 1240, the aggregation propensity computation process 1280 computes the aggregation propensity per residue relative to surface exposure.
[0319] In some embodiments, based on the computed aggregation propensities calculated during the aggregation propensity computation process 1280, the aggregation score computation process 1290 computes the aggregation score for the input ABP. In some embodiments, the aggregation score is generated by averaging or combining the aggregation propensities for each residue in the input ABP, normalized according to the surface exposure contribution obtained from the predicted structure of the input ABP. In some embodiments, the aggregation score is determined by the number of residues above an aggregation score threshold (>0.5).
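The two scoring variants described above admit several readings. The sketch below assumes one interpretation: an exposure-weighted average of per-residue propensities, and a count of residues whose exposure-scaled propensity exceeds 0.5. The per-residue propensities and relative exposure values (0 for buried, 1 for fully exposed) are hypothetical inputs.

```python
def exposure_scaled(propensities, relative_exposure):
    """Scale each residue's aggregation propensity by its surface exposure."""
    return [p * e for p, e in zip(propensities, relative_exposure)]


def weighted_aggregation_score(propensities, relative_exposure):
    """Average propensity, weighted by how exposed each residue is."""
    total_exposure = sum(relative_exposure)
    if total_exposure == 0:
        return 0.0
    return sum(exposure_scaled(propensities, relative_exposure)) / total_exposure


def count_above_threshold(per_residue_scores, threshold=0.5):
    """Number of residues whose score exceeds the threshold."""
    return sum(1 for score in per_residue_scores if score > threshold)


# Hypothetical four-residue example; the third residue is fully buried.
propensities = [0.2, 0.9, 0.6, 0.7]
relative_exposure = [1.0, 0.8, 0.0, 1.0]

score = weighted_aggregation_score(propensities, relative_exposure)
n_hotspots = count_above_threshold(
    exposure_scaled(propensities, relative_exposure))
```

The buried residue has a propensity above 0.5 on its own, but it is not counted once exposure is taken into account, illustrating how a predicted structure can change the score.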
[0320] The pipeline 1200 for predicting aggregation scores for a given ABP uses the machine-learned structure prediction model 1240 to effectively incorporate structure-based features of an ABP when computing the aggregation score for the ABP. In this manner, the aggregation score for the ABP is structure-adjusted by the predicted structure representation of the ABP, resulting in higher accuracy.
[0321] FIG. 13 shows correlations between the aggregation score predicted using the pipeline illustrated in FIG. 12 and the HIC retention times. Specifically, FIG. 13 illustrates experimental results that demonstrate the correlation of various aggregation score calculations with the HIC retention time for a set of proteins from the GDPal dataset. The HIC retention time is also indicative of aggregation properties of antibodies and thus, is a good indicator of the prediction performance of the pipeline 1200. Specifically, the horizontal axis is associated with the predicted aggregation scores for the set of proteins in the dataset, and the vertical axis indicates the known HIC retention time values for the set of proteins. As illustrated in FIG. 13, the data points indicate relatively higher correlation between the predicted aggregation scores and the HIC retention time.
System for predicting polyreactivity using machine-learned classifier model
[0322] Disclosed herein is a method of predicting polyreactivity of a respective ABP using a machine-learned classifier model. In some embodiments, the method is performed by executing a pipeline. In some embodiments, the pipeline includes a PLM and a classifier model.
[0323] The PLM is configured to receive a set of tokens that are numerical representations of inputs for the ABP (e.g., partial or full sequence of ABP) and generate a set of output embeddings. In some embodiments, some portions of the PLM are configured as a transformer architecture including a set of attention layers. In some embodiments, the PLM is identical or substantially similar to the PLM 820 described in conjunction with FIG. 8. In some embodiments, the output embeddings generated for a respective ABP are mean-pooled to generate a PLM embedding that represents the entire sequence in a latent space.
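As a minimal sketch of the mean-pooling step, the per-token output embeddings can be averaged dimension by dimension to give one fixed-length vector per ABP. The function name and the two-dimensional toy embeddings are assumptions for illustration; a real PLM would emit much higher-dimensional vectors.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]]) -> List[float]:
    """Average per-token embeddings into a single sequence-level embedding."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    n_tokens = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n_tokens for d in range(dim)]


# Three tokens, two embedding dimensions (toy values).
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = mean_pool(tokens)
```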
[0324] In some embodiments, the machine-learned classifier model is coupled to receive the PLM embedding for the respective ABP and generate a prediction on whether the respective ABP will be associated with a high degree of polyreactivity (e.g., classification class 1) or a low degree of polyreactivity (e.g., classification class 0).
[0325] In some embodiments, the training method for training parameters of the classifier model includes (1) obtaining a training dataset including a plurality of samples corresponding to a plurality of ABPs, each sample including a PLM embedding for the respective ABP obtained by applying the PLM to the sequence information of the respective ABP, and a label indicating a value associated with the polyreactivity for the ABP; (2) accessing the classifier model; (3) dividing the training dataset into one or more batches of samples for one or more iterations; and (4) for each of one or more iterations:
(a) obtaining a set of estimated predictions for the respective batch of samples for a current iteration, wherein the set of estimated predictions are generated by applying the classifier model to the PLM embeddings for the batch of samples,
(b) computing a loss function indicating differences between the labels and the estimated predictions for the batch of samples, and
(c) updating the parameters of the prediction layer by backpropagating error terms obtained from the loss function; and
(5) storing the parameters of the classifier model on a computer readable medium.
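The training steps above can be sketched with a single-layer logistic classifier trained by mini-batch gradient descent. This is only an illustration under stated assumptions: the disclosure does not fix the classifier architecture, the binary cross-entropy loss, the learning rate, or the batch size used here, and the two-dimensional "embeddings" are toy stand-ins for real PLM embeddings.

```python
import math
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (PLM embedding, label 0 or 1)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(weights: List[float], bias: float, x: List[float]) -> float:
    """Probability that the ABP belongs to class 1 (high polyreactivity)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_classifier(samples: List[Sample], epochs: int = 200,
                     batch_size: int = 4, lr: float = 0.5):
    dim = len(samples[0][0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        # Step (3): divide the dataset into batches.
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            grad_w, grad_b = [0.0] * dim, 0.0
            for x, y in batch:
                p = predict(weights, bias, x)   # step (4a): estimated prediction
                err = p - y                     # step (4b): cross-entropy gradient w.r.t. the logit
                for d in range(dim):
                    grad_w[d] += err * x[d]
                grad_b += err
            # Step (4c): update parameters from the averaged error terms.
            weights = [w - lr * g / len(batch) for w, g in zip(weights, grad_w)]
            bias -= lr * grad_b / len(batch)
    return weights, bias  # step (5) would persist these parameters


# Toy, linearly separable samples with alternating labels (hypothetical data).
samples = [([0.9, 0.1], 1), ([0.1, 0.9], 0), ([0.8, 0.2], 1), ([0.2, 0.8], 0),
           ([1.0, 0.0], 1), ([0.0, 1.0], 0), ([0.7, 0.3], 1), ([0.3, 0.7], 0)]
weights, bias = train_classifier(samples)
predictions = [1 if predict(weights, bias, x) >= 0.5 else 0 for x, _ in samples]
```

For a single logistic layer the backpropagated error term reduces to the difference between the prediction and the label, which is why no explicit autograd machinery appears in the sketch.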
[0326] In some embodiments, the value associated with the polyreactivity for the ABP is 1 if the ABP is determined to have polyreactivity above a threshold polyreactivity (e.g., top 10% polyreactivity), and 0 if the ABP is determined to have polyreactivity below a threshold polyreactivity (e.g., bottom 10% polyreactivity).
[0327] In some embodiments, the training data for (1) of the training method is obtained by using a polyreactivity assay that allows for isolating antibodies with high and low polyreactivity. In some embodiments, data obtained using such a polyreactivity assay includes a first scFv dataset including 58 adult human spleen / lymph node samples (e.g., > 240,000 ABPs), and a second scFv dataset including 3 human spleen lymphocyte samples (e.g., > 120,000 ABPs). In some embodiments, as described in FIG. 32 below, ABPs with high and low polyreactivity were isolated by sequentially sorting against 4 polyreactivity reagents. The top 10% and bottom 10% ABPs for the first scFv dataset were identified. The ABPs with top 10% polyreactivity were associated with a label of 1, and the ABPs with bottom 10% polyreactivity were associated with a label of 0.
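The labeling rule (top 10% labeled 1, bottom 10% labeled 0, the middle left unlabeled) can be sketched as follows. The sketch assumes each ABP already carries a single numeric polyreactivity measure; how that measure is derived from the sorting assay is not specified here, and the function name and toy values are hypothetical.

```python
from typing import Dict


def label_extremes(measure: Dict[str, float], fraction: float = 0.10) -> Dict[str, int]:
    """Label the lowest `fraction` of ABPs 0 and the highest `fraction` 1."""
    ranked = sorted(measure, key=measure.get)  # ascending by polyreactivity measure
    k = max(1, round(len(ranked) * fraction))
    if 2 * k > len(ranked):
        raise ValueError("fraction too large: top and bottom groups would overlap")
    labels = {name: 0 for name in ranked[:k]}
    labels.update({name: 1 for name in ranked[-k:]})
    return labels


# Twenty toy ABPs whose measure simply equals their index (hypothetical).
measure = {f"abp_{i}": float(i) for i in range(20)}
labels = label_extremes(measure)
```

ABPs between the two extremes receive no label and would be excluded from the training dataset in this reading.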
[0328] FIG. 14 illustrates results for polyreactivity prediction using the pipeline for the first scFv dataset using experimentally determined labels (0 or 1). FIG. 15 illustrates results for polyreactivity prediction using the pipeline for the second scFv dataset. As an example, the parameters of the classifier model were trained using approximately 20,000 samples from the first scFv dataset. As illustrated in FIG. 14, the confusion matrix for a sampled test set of approximately 2,000 samples indicated an accuracy of approximately 98%. Moreover, the trained classifier model for polyreactivity prediction was also tested on the samples in the second scFv dataset. As illustrated in FIG. 15, the confusion matrix for a sampled set of approximately 12,000 samples indicated an accuracy of approximately 83%.
6.4. Methods of selecting and optimizing RPPs
Method of selecting candidate ABP library dataset based on score computation
[0329] FIG. 16 illustrates a method of selecting a filtered ABP library dataset for producing an RPP, according to one or more embodiments. In some embodiments, a method for selecting a filtered ABP library dataset for an RPP is disclosed. In some embodiments, the method comprises identifying one or more candidate libraries of ABPs each corresponding to a respective subset of ABPs, obtaining a set of values for at least one characteristic descriptor of the plurality of characteristic descriptors and / or preferred library properties, and computing a library score for the candidate library. In some embodiments, the method further comprises selecting at least one candidate library based on the generated scores for producing an RPP.
[0330] In some embodiments, the method comprises: (1) obtaining an input ABP library dataset including an ABP profile for each of a plurality of ABPs; (2) generating a set of candidate ABP library datasets each corresponding to a respective subset of ABPs (e.g., 100 ABPs) from the plurality of ABPs. As an example, FIG. 16 illustrates obtaining a candidate ABP library dataset “Candidate Library i,” where i = 1, 2, ..., L, and L is the total number of candidate ABP library datasets. Each candidate ABP library dataset includes a subset of 100 ABPs. In some embodiments, the plurality of candidate ABP library datasets are generated by selecting combinations of a number (e.g., 100) of ABPs from the input ABP library dataset randomly or deterministically.
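A minimal sketch of the random variant of step (2) follows: each candidate library is a sample, drawn without replacement, of a fixed number of ABPs from the input library. The identifiers, the pool size of 1,000, and the fixed seed are assumptions chosen only to make the example reproducible.

```python
import random
from typing import List


def generate_candidate_libraries(abp_ids: List[str], n_candidates: int,
                                 library_size: int = 100, seed: int = 0) -> List[List[str]]:
    """Draw `n_candidates` random subsets of `library_size` distinct ABPs each."""
    if library_size > len(abp_ids):
        raise ValueError("library_size exceeds the number of available ABPs")
    rng = random.Random(seed)  # seeded so the same candidates can be regenerated
    return [rng.sample(abp_ids, library_size) for _ in range(n_candidates)]


# Hypothetical input library of 1,000 ABP identifiers.
abp_ids = [f"ABP_{i:04d}" for i in range(1000)]
candidates = generate_candidate_libraries(abp_ids, n_candidates=5)
```

A deterministic variant could instead enumerate combinations in a fixed order (for example with `itertools.combinations`), although exhaustive enumeration is only practical for small pools.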
[0331] In some embodiments, the method comprises (3) for each ABP in a respective candidate ABP library dataset, obtaining values for a plurality of characteristic descriptors for the respective ABP, wherein the plurality of characteristic descriptors is selected from:
(i) a binding affinity of the respective ABP for the respective target antigen;
(ii) an effector activity of the respective ABP against the target molecule or complex;
(iii) a solubility score of the respective ABP;
(iv) an aggregation score of the respective ABP;
(v) a hydrophobicity score of the respective ABP;
(vi) an isoelectric point of the respective ABP;
(vii) a stability score of the respective ABP;
(viii) a molecular weight of the respective ABP;
(ix) a number of unpaired cysteine residues in the respective ABP;
(x) an abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen;
(xi) a fold-change of the increase in the abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen, as compared to the abundance frequency prior to enrichment;
(xii) a number of non-canonical glycosylation sites in the respective ABP;
(xiii) a number of cleavage sites in the respective ABP;
(xiv) a number of deamidation sites in the respective ABP;
(xv) a number of isomerization sites in the respective ABP;
(xvi) a number of oxidation sites in the respective ABP;
(xvii) CDR3H length of the respective ABP;
(xviii) binding specificity of the respective ABP;
(xix) immunogenicity of the respective ABP;
(xx) polyspecificity of the respective ABP; and
(xxi) a respective epitope that the respective ABP binds to.
In some embodiments, a value for at least one characteristic descriptor for a respective ABP is obtained from the methods described herein with respect to subsection “Methods of generating a custom RPP,” specifically subsection “Experimental method,” subsection “In silico method,” and subsection “Machine-learning and artificial intelligence (AI) based methods.” In some embodiments, values for at least one characteristic descriptor for the respective ABP are generated by applying one or more machine-learning models to at least a full or partial sequence of the respective ABP. In some embodiments, values for at least one characteristic descriptor for the respective ABP are generated by applying the transformer model and the prediction layer trained according to the description in conjunction with FIG. 7. In some embodiments, values for at least one characteristic descriptor for the respective ABP are generated by applying any one or a combination of the pipelines 800, 1000, 1200, or 1600 described in conjunction with FIGs. 8, 10, 12, or 16. This way, the predicted structure of the respective ABP can be integrated when generating the predictions for the at least one characteristic descriptor for the respective ABP.
[0332] In some embodiments, the value for the (vii) stability score of a respective ABP may be generated by one or a combination of the values for (xiii) the number of cleavage sites in the respective ABP, (xiv) the number of deamidation sites in the respective ABP, (xv) the number of isomerization sites in the respective ABP, and / or (xvi) the number of oxidation sites in the respective ABP, as these descriptors are indicative of the stability of the respective ABP.
[0333] In the example shown in FIG. 16, the stability value of a respective ABP in a candidate ABP library dataset is determined by applying a stability machine-learning model 1620 to at least the token sequence representing the respective ABP, the immunogenicity value of the respective ABP is determined by applying an immunogenicity machine-learning model 1630 to at least the token sequence, the isoelectric point value of the respective ABP is determined by applying an isoelectric computation model 1650 to the token sequence, and the aggregation value of the respective ABP is determined by applying an aggregation computation model 1660 to at least the token sequence of the respective ABP.
[0334] In some embodiments, the method comprises (4) for each ABP in a respective candidate ABP library dataset, obtaining values associated with one or more preferred library properties, wherein the one or more preferred library properties are selected from:
(i) the set of heavy chain CDR3 sequences contained in the subset of the plurality of ABPs comprises at least about 10, 20, 50, 100, 200, or 1000 unique sequences;
(ii) the subset of the plurality of ABPs specifically bind to at least two unique epitopes associated with the target molecule or complex;
(iii) the subset of the plurality of ABPs is capable of modulating at least two target antigen variants;
(iv) the set of heavy chain V genes represented in the subset of the plurality of ABPs comprises at least two unique V genes;
(v) the set of light chain V genes represented in the subset of the plurality of ABPs comprises at least two unique V genes;
(vi) the set of heavy chain J genes represented in the subset of the plurality of ABPs comprises at least two unique J genes;
(vii) the set of light chain J genes represented in the subset of the plurality of ABPs comprises at least two unique J genes;
(viii) the average percent germline identity of heavy chain V genes represented in the subset of the plurality of ABPs is between about 50% and about 100%;
(ix) the average percent germline identity of light chain V genes represented in the subset of the plurality of ABPs is between about 50% and about 100%;
(x) the average percent germline identity of heavy chain J genes represented in the subset of the plurality of ABPs is between about 50% and about 100%; and
(xi) the average percent germline identity of light chain J genes represented in the subset of the plurality of ABPs is between about 50% and about 100%.
[0335] In the example shown in FIG. 16, the epitopes that a respective ABP is determined or predicted to specifically bind to are determined by applying an epitope machine-learning model 1640 to at least the token sequence for the ABP. In some embodiments, these epitopes are determined by executing the pipeline 800 described in conjunction with FIG. 8. This way, both the sequence-based features (e.g., PLM embedding) and the structure-based features (e.g., structure embedding) can be integrated when generating the predictions for the respective ABP, and the diversity value for preferred library property (ii), indicating the degree to which the subset of the ABPs in the candidate ABP library dataset specifically binds or is predicted to bind to at least two unique epitopes, can be determined. For example, the diversity value may be given by the number of unique epitopes in the target antigen that the subset of ABPs is predicted to specifically bind to. Similarly, the V genes represented in the respective ABP are determined through a gene sequencing module 1670. This way, the diversity value for preferred library property (iv), indicating the degree to which at least two unique V genes are included in the heavy chain V genes represented in the subset of the ABPs in the candidate library dataset, can be determined. In some embodiments, the diversity value for a respective library property can be computed by using a diversity index such as Shannon's index. However, it is appreciated that the diversity value for the library property can be computed with any other appropriate index.
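As a minimal sketch of a diversity value based on Shannon's index, the function below takes one category label per ABP (for example a predicted epitope or a heavy chain V gene) and returns H = -Σ p_k ln p_k over the observed categories. The label strings are toy values; the use of the natural logarithm is an assumption, since the base is not specified above.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_index(labels: Iterable[str]) -> float:
    """Shannon diversity H = -sum(p * ln p) over the observed categories."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# One predicted epitope label per ABP (hypothetical labels).
epitopes = ["E1", "E1", "E2", "E2"]
epitope_diversity = shannon_index(epitopes)

# The simpler alternative mentioned above: the number of unique epitopes.
n_unique_epitopes = len(set(epitopes))
```

A library in which every ABP maps to the same epitope scores 0, and the index grows as the ABPs spread more evenly over more epitopes.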
[0336] The method further comprises (5) for each candidate ABP library dataset from the set of candidate ABP library datasets, computing a library score based on a set of input features (X) comprising (a) the values of the plurality of characteristic descriptors of each of the respective subset of ABPs for the candidate ABP library dataset, and / or (b) the values of the one or more preferred library properties of the candidate ABP library dataset. In some embodiments, the library score is a loss score computed as a weighted combination of any one or a combination of values associated with factors (a) and / or (b), where a higher weight value for a respective factor indicates that the values for that factor contribute more heavily to the loss score. In such an embodiment, a candidate ABP library dataset is considered a favorable dataset for an RPP if the loss score is lower than a threshold score. However, it is appreciated that in other embodiments, the library score can be configured in any appropriate manner; for example, the library score may be configured such that a candidate ABP library dataset is considered a favorable dataset for an RPP if the library score is higher than a threshold score.
[0337] In the example shown in FIG. 16, when the library score is a loss score, the loss score for the “Candidate Library i” is given by:

$$\mathrm{loss} = -(w_1 \cdot \mathrm{avg\_stability}) + (w_2 \cdot \mathrm{avg\_immunogenicity}) - (w_3 \cdot \mathrm{epitope\_diversity}) - (w_4 \cdot \mathrm{v\_gene\_diversity})$$

where feature “avg_stability” denotes the average of stability values for characteristic descriptor (vii) of the 100 ABPs in the candidate ABP library dataset; feature “avg_immunogenicity” denotes the average of immunogenicity values for characteristic descriptor (xix) of the 100 ABPs in the candidate ABP library dataset; feature “epitope_diversity” denotes the diversity value for preferred library property (ii) indicating the degree to which the subset of the ABPs in the candidate ABP library dataset specifically binds or is predicted to bind to at least two or more unique epitopes associated with the target molecule or complex (e.g., higher value for more diverse epitope distribution); and feature “v_gene_diversity” denotes the diversity value for preferred library property (iv) indicating a degree to which at least two or more unique V genes are included in the heavy chain V genes represented in the subset of the ABPs in the candidate ABP library dataset (e.g., higher value for more diverse V gene distribution).
[0338] In some embodiments, when computing the library score, the average value for a characteristic descriptor (e.g., average stability value, average immunogenicity value) is a weighted combination of the abundance of each respective ABP in the library dataset and the value of the characteristic descriptor for that ABP. This way, the library score weights the characteristic descriptor values for each ABP depending on the abundance of that ABP in the library dataset.
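One way to read this abundance weighting is as a weighted mean in which each ABP's descriptor value is weighted by its abundance in the library. The sketch below follows that reading; the normalization by total abundance and the toy values are assumptions.

```python
from typing import Sequence


def abundance_weighted_mean(values: Sequence[float], abundances: Sequence[float]) -> float:
    """Mean of descriptor values, each weighted by the ABP's abundance."""
    if len(values) != len(abundances):
        raise ValueError("need one abundance per descriptor value")
    total = sum(abundances)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return sum(v * a for v, a in zip(values, abundances)) / total


# Stability values and relative abundances for three ABPs (toy values).
stability = [0.9, 0.5, 0.7]
abundance = [0.5, 0.25, 0.25]
avg_stability = abundance_weighted_mean(stability, abundance)
```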
[0339] Moreover, $w_1$ denotes the respective weight for the avg_stability, $w_2$ denotes the respective weight for the avg_immunogenicity, $w_3$ denotes the respective weight for the epitope_diversity, and $w_4$ denotes the respective weight for the v_gene_diversity factors. Thus, a respective candidate ABP library dataset becomes a more favorable candidate for producing a custom RPP when the average stability value of the subset of ABPs is high, when the average immunogenicity of the subset of ABPs is low, when the diversity or number of unique epitopes the subset of ABPs is determined or predicted to bind to is high, and when the diversity or number of unique V genes represented in the heavy or light chain V genes represented in the subset of the ABPs is high.
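The loss score of the FIG. 16 example, together with the below-threshold test for a favorable candidate, can be sketched as follows. The signs follow the description above (stability and both diversity terms lower the loss, immunogenicity raises it); the weight values, feature values, and threshold are hypothetical and chosen only for illustration.

```python
from typing import Dict

# Sign of each feature's contribution to the loss, per the FIG. 16 example.
FEATURE_SIGNS = {
    "avg_stability": -1.0,
    "avg_immunogenicity": +1.0,
    "epitope_diversity": -1.0,
    "v_gene_diversity": -1.0,
}


def library_loss(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted loss for one candidate library; lower is more favorable."""
    return sum(FEATURE_SIGNS[name] * weights[name] * features[name]
               for name in FEATURE_SIGNS)


# Hypothetical weights and feature values for two candidate libraries.
weights = {name: 1.0 for name in FEATURE_SIGNS}
libraries = {
    "Candidate Library 1": {"avg_stability": 0.8, "avg_immunogenicity": 0.2,
                            "epitope_diversity": 1.0, "v_gene_diversity": 1.5},
    "Candidate Library 2": {"avg_stability": 0.4, "avg_immunogenicity": 0.6,
                            "epitope_diversity": 0.5, "v_gene_diversity": 0.5},
}
losses = {name: library_loss(f, weights) for name, f in libraries.items()}

threshold = -1.0  # assumed threshold; libraries below it count as favorable
selected = [name for name, loss in losses.items() if loss < threshold]
```

With these toy numbers the first library, which is more stable, less immunogenic, and more diverse, has the lower loss and is the only one selected.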
[0340] In some embodiments, the library score for the “Candidate Library i” is given by computing a first score based on a first set of features comprising (a) the values of the plurality of characteristic descriptors of each of the respective subset of ABPs for the sample ABP library dataset, computing a second score based on a second set of features comprising (b) the values of the one or more preferred library properties of the sample ABP library dataset, and combining the first score and the second score together.
[0341] The method further comprises (6) selecting, based on the library scores, a candidate ABP library dataset for forming the RPP. In some embodiments, the method selects one or more candidate ABP library datasets that are associated with a loss score below a predetermined threshold or proportion. The selected one or more candidate ABP library datasets are used to form the RPPs.
Method of selecting candidate ABP library dataset based on scoring model
[0342] In some embodiments, a method for selecting a filtered ABP library dataset for an RPP using a trained scoring model is disclosed. In some embodiments, the method comprises: (1) obtaining an input ABP library dataset including an ABP profile for each of a plurality of ABPs; (2) generating a set of candidate ABP library datasets each corresponding to a respective subset of ABPs (e.g., 100 ABPs) from the plurality of ABPs.
[0343] In some embodiments, the method further comprises (3) for each ABP in a respective candidate ABP library dataset, obtaining values for a plurality of characteristic descriptors for the respective ABP, wherein the plurality of characteristic descriptors is selected from:
(i) a binding affinity of the respective ABP for the respective target antigen;
(ii) an effector activity of the respective ABP against the target molecule or complex;
(iii) a solubility score of the respective ABP;
(iv) an aggregation score of the respective ABP;
(v) a hydrophobicity score of the respective ABP;
(vi) an isoelectric point of the respective ABP;
(vii) a stability score of the respective ABP;
(viii) a molecular weight of the respective ABP;
(ix) a number of unpaired cysteine residues in the respective ABP;
(x) an abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen;
(xi) a fold-change of the increase in the abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen, as compared to the abundance frequency prior to enrichment;
(xii) a number of non-canonical glycosylation sites in the respective ABP;
(xiii) a number of cleavage sites in the respective ABP;
(xiv) a number of deamidation sites in the respective ABP;
(xv) a number of isomerization sites in the respective ABP;
(xvi) a number of oxidation sites in the respective ABP;
(xvii) CDR3H length of the respective ABP;
(xviii) binding specificity of the respective ABP;
(xix) immunogenicity of the respective ABP;
(xx) polyspecificity of the respective ABP; and
(xxi) a respective epitope that the respective ABP binds to.
[0344] In some embodiments, a value for at least one characteristic descriptor for a respective ABP is obtained from the methods described herein with respect to subsection “Methods of generating a custom RPP,” specifically subsection “Experimental method,” subsection “In silico method,” and subsection “Machine-learning and artificial intelligence (AI) based methods.” In some embodiments, values for at least one characteristic descriptor for the respective ABP are generated by applying one or more machine-learning models to at least a full or partial sequence of the respective ABP. In some embodiments, values for at least one characteristic descriptor for the respective ABP are generated by applying the transformer model and the prediction layer trained according to the description in conjunction with FIG. 7. In some embodiments, values for at least one characteristic descriptor for the respective ABP are generated by applying any one or a combination of the pipelines 800, 1000, 1200, or 1600 described in conjunction with FIGs. 8, 10, 12, or 16.
[0345] In some embodiments, the method further comprises (4) for each ABP in a respective candidate ABP library dataset, obtaining values associated with one or more preferred library properties, wherein the one or more preferred library properties are selected from:
(i) the set of heavy chain CDR3 sequences contained in the subset of the plurality of ABPs comprises at least about 10, 20, 50, 100, 200, or 1000 unique sequences;
(ii) the subset of the plurality of ABPs specifically bind to at least two unique epitopes associated with the target molecule or complex;
(iii) the subset of the plurality of ABPs is capable of modulating at least two target antigen variants;
(iv) the set of heavy chain V genes represented in the subset of the plurality of ABPs comprises at least two unique V genes;
(v) the set of light chain V genes represented in the subset of the plurality of ABPs comprises at least two unique V genes;
(vi) the set of heavy chain J genes represented in the subset of the plurality of ABPs comprises at least two unique J genes;
(vii) the set of light chain J genes represented in the subset of the plurality of ABPs comprises at least two unique J genes;
(viii) the average percent germline identity of heavy chain V genes represented in the subset of the plurality of ABPs is between about 50% and about 100%;
(ix) the average percent germline identity of light chain V genes represented in the subset of the plurality of ABPs is between about 50% and about 100%;
(x) the average percent germline identity of heavy chain J genes represented in the subset of the plurality of ABPs is between about 50% and about 100%; and
(xi) the average percent germline identity of light chain J genes represented in the subset of the plurality of ABPs is between about 50% and about 100%.
[0346] The method further comprises (5) for each candidate ABP library dataset from the set of candidate ABP library datasets, computing a library score by applying a trained scoring model to a set of input features of the candidate ABP library dataset to generate a library score for the candidate ABP library dataset. In some embodiments, the set of input features comprises one or a combination of (a) the values of the plurality of characteristic descriptors of each of the respective subset of ABPs for the candidate ABP library dataset, and / or (b) the values of the one or more preferred library properties of the candidate ABP library dataset.
[0347] In some embodiments, the architecture of the scoring model is configured as a regression model, a neural network model, a transformer architecture, a random forest classifier, and the like, each having a respective set of trained weights determined through a training process described in further detail below.
[0348] In some embodiments, applying the trained scoring model to the set of input features of the candidate ABP library dataset further comprises applying a set of learned weights to the set of input features for the candidate ABP library dataset. In some embodiments, the scoring model is associated with a set of trained weights, where each weight (e.g., $w_i$) corresponds to a respective input feature (e.g., $X_i$) and denotes the degree and direction of contribution of the respective input feature to the library score. As described in further detail below, the weights of the scoring model are trained by correlating input features for a set of training ABP library datasets with one or more desired properties of the training ABP library datasets.
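For the simplest case, a linear scoring model, applying the learned weights amounts to a weighted sum in which the sign of each weight gives the direction of a feature's contribution and its magnitude gives the degree. The sketch below assumes such a linear form and also shows selection of a top proportion of candidates; the weights, feature names, and library names are hypothetical.

```python
from typing import Dict, List


def score_library(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Linear library score: sum of w_i * X_i over the weighted features."""
    return sum(weights[name] * features[name] for name in weights)


def select_top(scores: Dict[str, float], proportion: float) -> List[str]:
    """Return the highest-scoring `proportion` of candidate libraries."""
    k = max(1, round(len(scores) * proportion))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Assumed learned weights: solubility helps, hydrophobicity hurts.
weights = {"solubility": 2.0, "hydrophobicity": -1.0}
libraries = {
    "A": {"solubility": 0.9, "hydrophobicity": 0.3},
    "B": {"solubility": 0.5, "hydrophobicity": 0.6},
    "C": {"solubility": 0.7, "hydrophobicity": 0.1},
    "D": {"solubility": 0.2, "hydrophobicity": 0.2},
}
scores = {name: score_library(f, weights) for name, f in libraries.items()}
selected = select_top(scores, proportion=0.5)
```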
[0349] The method further comprises (6) selecting a candidate ABP library dataset based on the library scores for forming the RPP. In some embodiments, the method selects one or more candidate ABP library datasets associated with a library score above a predetermined threshold or proportion. The selected one or more candidate ABP library datasets are used to form the RPPs.
Method of training scoring model for computing library scores
[0350] In some embodiments, a training method for training the weights of the scoring model is disclosed. In some embodiments, the training method for the scoring model comprises (1) obtaining a set of training ABP library datasets.
[0351] The training method further comprises (2) for each training ABP library dataset, obtaining a set of input features (X) for the training ABP library dataset from (a) the values of the plurality of characteristic descriptors of each of the respective subset of ABPs for the ABP library dataset, or (b) the values of the one or more preferred library properties of the ABP library dataset.
[0352] The training method further comprises (3) for each ABP library dataset, obtaining one or more experimentally measured performance metrics (Y) for the ABP library dataset. In some embodiments, the performance metrics (Y) are selected from titer (Y1), percent monomer (Y2), number of neutralized targets (e.g., viral variants) (Y3), neutralization activity (Y4), and binding activity (Y5).
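The relationship between input features (X) and a measured performance metric (Y) can be illustrated with a linear regression fitted by gradient descent, which is one of the model types contemplated for the scoring model. This is a sketch under stated assumptions: the mean-squared-error objective, the learning rate, and the synthetic data (generated from a known linear rule so the fit can be checked) are not taken from the disclosure.

```python
from typing import List, Tuple


def fit_linear_scoring_model(X: List[List[float]], y: List[float],
                             lr: float = 0.1, epochs: int = 5000) -> Tuple[List[float], float]:
    """Fit y ~ b + sum(w_j * x_j) by full-batch gradient descent on squared error."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = b + sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j in range(dim):
                grad_w[j] += 2.0 * err * xi[j] / n
            grad_b += 2.0 * err / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Synthetic training set: six libraries, two features each. The metric stands in
# for a measured value such as titer and follows y = 2*x1 - 1*x2 + 0.5 exactly.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
y = [0.5, -0.5, 2.5, 1.5, 1.0, 0.1]
w, b = fit_linear_scoring_model(X, y)
```

The recovered weights carry the sign and magnitude of each feature's association with the metric. With only a handful of training libraries, as in a small dataset, such weights would be expected to be noisy on real measurements.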
[0353] FIG. 17 illustrates a distribution of values for a set of input features in an example dataset for training an example scoring model, in accordance with an embodiment. FIG. 18 illustrates a distribution of values for one or more experimentally measured performance metrics in the dataset for training the example scoring model, in accordance with an embodiment. Specifically, the dataset includes 9 ABP library datasets, each containing 10 to 50 clones.
[0354] As illustrated in FIG. 17, for a given ABP library dataset, one example input feature is the preferred library property (ix) indicating average percent germline identity of light chain V genes represented in the subset of the plurality of ABPs. In some embodiments, the input feature for the ABP library dataset is determined by experimentally obtaining the percent germline identity of light chain V genes for each ABP in the ABP library dataset, and taking the mean of the values across the ABPs in the ABP library dataset.
[0355] Another example input feature denotes the preferred library property (viii) indicating the average percent germline identity of heavy chain V genes represented in the subset of the plurality of ABPs. In some embodiments, the input feature for the ABP library dataset is determined by experimentally obtaining the percent germline identity of heavy chain V genes for each ABP in the ABP library dataset, and taking the mean of the values across the ABPs in the ABP library dataset.
[0356] Another example input feature denotes the characteristic descriptor (xi) indicating the fold change in abundance frequency of the ABPs following a sorting process to enrich for binding to the respective target antigen, as compared to the abundance frequency prior to enrichment. In some embodiments, the input feature for the ABP library dataset is determined by experimentally determining fold change for each ABP in the ABP library dataset after the enrichment, and taking a mean of the fold change values across the ABPs or a subset of the ABPs in the ABP library dataset.
[0357] Another example input feature denotes the characteristic descriptor (iii) indicating a solubility score of the ABPs. In some embodiments, the input feature for the ABP library7dataset is determined by predicting a solubility7for each ABP in the ABP library dataset by applying a machine-learning neural attention-like architecture to the protein sequence of the ABP. The input feature is obtained by taking a mean of the predicted solubility across the ABPs in the ABP library7dataset to obtain a solubility7score.
[0358] Another example input feature denotes the preferred library property (xi) indicating the average percent germline identity of light chain J genes represented in the subset of the plurality7of ABPs. In some embodiments, the input feature for the ABP library7dataset is determined by experimentally obtaining the percent germline identity of light chain J genes for each ABP in the ABP library dataset, and taking the mean of the values across the ABPs in the ABP library dataset.
[0359] Another example input feature denotes the preferred library property (x) indicating the average percent germline identity of heavy chain J genes represented in the subset of the plurality of ABPs. In some embodiments, the input feature for the ABP library dataset is determined by experimentally obtaining the percent germline identity of heavy chain J genes for each ABP in the ABP library dataset, and taking the mean of the values across the ABPs in the ABP library dataset.
[0360] Another example input feature denotes the characteristic descriptor (vii) indicating the stability score of the ABPs. In some embodiments, the input feature for the ABP library dataset is determined by computationally obtaining a stability of each ABP based on dipeptides that are associated with degradation tendencies. In some embodiments, the input feature is obtained by taking the mean or max of the stability across the ABPs in the ABP library dataset to obtain a stability score.
[0361] Another example input feature denotes the characteristic descriptor (v) indicating the hydrophobicity score of the ABPs. In some embodiments, the input feature for the ABP library dataset is determined by applying the machine-learning pipeline 1000 described in conjunction with FIG. 10 to each ABP in the ABP library dataset. In some embodiments, the hydrophobicity of each ABP is determined using the Eisenberg hydrophobicity scale. In some embodiments, the input feature is obtained by taking the mean or max of the hydrophobicity across the ABPs in the ABP library dataset to obtain a hydrophobicity score.
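The Eisenberg-scale embodiment can be illustrated with a minimal sketch. The per-residue values below are the Eisenberg consensus scale as commonly tabulated; averaging over the residues of each sequence, and the function names `abp_hydrophobicity` and `library_hydrophobicity_score`, are assumptions made for illustration and are not specified in this disclosure.

```python
# Eisenberg consensus hydrophobicity scale, one value per amino acid residue.
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def abp_hydrophobicity(sequence):
    """Mean Eisenberg value over the residues of one ABP sequence (assumed reduction)."""
    values = [EISENBERG[residue] for residue in sequence.upper()]
    return sum(values) / len(values)


def library_hydrophobicity_score(sequences, reduce="mean"):
    """Reduce per-ABP hydrophobicity to one library-level input feature (mean or max)."""
    per_abp = [abp_hydrophobicity(seq) for seq in sequences]
    if reduce == "max":
        return max(per_abp)
    return sum(per_abp) / len(per_abp)
```

The `reduce` argument mirrors the "mean or max" alternatives described above; sequences containing non-standard residue codes would raise a `KeyError` in this sketch.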
[0362] Another example input feature denotes the diversity value of preferred library property (ii), which indicates that the number of unique epitopes on the target molecule or complex to which the subset of ABPs in the ABP library dataset specifically binds or is predicted to bind is at least two (e.g., a higher value for a more diverse epitope distribution). In some embodiments, the input feature for the ABP library dataset is determined by applying the machine-learning pipeline 800 described in conjunction with FIG. 8 to the sequence of each ABP in the ABP library dataset to obtain an epitope prediction on the target antigen for each ABP. In some embodiments, the input feature is obtained by computing the diversity index of the predicted epitopes across the ABPs in the ABP library dataset.
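As a non-limiting illustration of a diversity index, the sketch below computes a Shannon index and a normalized evenness over predicted epitope labels. The disclosure does not state which diversity index is used; the choice of Shannon entropy and Pielou-style evenness, and the function names, are assumptions.

```python
import math
from collections import Counter


def shannon_index(labels):
    """Shannon diversity H = -sum(p * ln p) over the label frequencies."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def evenness(labels):
    """Shannon index divided by ln(number of distinct labels); 0.0 if fewer than two."""
    richness = len(set(labels))
    if richness < 2:
        return 0.0
    return shannon_index(labels) / math.log(richness)
```

Each element of `labels` would be the predicted epitope for one ABP in the library dataset; a library whose ABPs are spread evenly over many epitopes yields a higher value than one dominated by a single epitope.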
[0363] Another example input feature denotes the characteristic descriptor (iv) indicating the aggregation score of the ABPs. In some embodiments, the input feature for the ABP library dataset is determined by applying the machine-learning pipeline 1200 described in conjunction with FIG. 12 to each ABP in the ABP library dataset to obtain an aggregation value for each ABP. In some embodiments, the input feature is obtained by taking the mean or max of the aggregation values across the ABPs in the ABP library dataset.
[0364] As illustrated in FIG. 18, the dataset can also include experimentally measured performance metrics with respect to one or more desired properties of the ABP library dataset. One example performance metric is titer (e.g., mg / L), which is a measurement of how much of the ABP is present in a sample. In some embodiments, the performance metric for an ABP library dataset is obtained by taking the mean of the titer across the ABPs in the ABP library dataset.
[0365] Another example performance metric is percent monomer, which indicates the percentage of the ABP that exists in monomer form as opposed to an aggregated form. In some embodiments, the percent monomer is measured based on the SEC-MALS assay. The performance metric for a given ABP library dataset can be obtained by taking the mean of the percent monomer across the ABPs in the ABP library dataset.
[0366] Another example performance metric is the number of neutralized variants. For one or more target antigen variants, the performance metric indicates how many variants are neutralized after being exposed to an ABP. The performance metric for a given ABP library dataset is obtained by combining the number of neutralized variants across the ABPs in the ABP library dataset.
[0367] In some embodiments, the performance metric comprises a neutralization activity. The neutralization activity can be measured using any method known in the art, such as Plaque Reduction Neutralization Test (PRNT), Microneutralization assay, Focus Reduction Neutralization Test (FRNT), Pseudovirus neutralization assay, Surrogate virus neutralization test (sVNT), Competitive ELISA, and Cell-based neutralization assay. In some embodiments, the neutralization activity is represented as NT50, IC50, PRNT50, PRNT90, Percent inhibition, or Neutralization titer.
[0368] In some embodiments, the performance metric comprises a binding activity. The binding activity can be measured by any method known in the art, such as Surface Plasmon Resonance (SPR), Bio-Layer Interferometry (BLI), Isothermal Titration Calorimetry (ITC), Enzyme-Linked Immunosorbent Assay (ELISA), Microscale Thermophoresis (MST), Fluorescence Polarization (FP), and flow cytometry-based binding assay. In some embodiments, the binding affinity is presented as EC50, Kd, Ka, kon, koff, KDapp, or ΔG.
[0369] In some embodiments, the training method for the scoring model further comprises (4) obtaining a set of weights for the scoring model by reducing one or more loss functions indicating a difference between estimated outputs generated by the scoring model and one or more performance metrics from the dataset. In some embodiments, when the scoring model is configured as a neural network, the set of weights are associated with connections between layers of nodes of the neural network. In some embodiments, when the scoring model is configured as a regression model, the set of weights are coefficients associated with the set of input features.
[0370] In some embodiments, the training method further comprises generating a set of models, in which each model is trained using the set of input features and the values for a respective performance metric to obtain a respective set of weights for the model. In some embodiments, the resulting weights for the scoring model are determined by obtaining the weights for each input feature from the set of models, and taking a mean of the weights for the input feature. Therefore, the weight of a respective input feature of the scoring model determined in this manner takes into account the contribution of the input feature with respect to the one or more desired properties of the ABP library dataset.
[0371] FIG. 19 illustrates weight values for a set of three example models trained using the example dataset, in accordance with an embodiment. As illustrated in FIG. 19, a multi-response regression model including a set of three regression models is constructed. The first model (corresponding to the first column in FIG. 19) is a regression model including a set of weights trained based on the set of input features of each ABP library dataset in the example dataset and the performance metric values for the number of neutralized variants for each ABP library dataset. Thus, there is a learned coefficient for each input feature, so that a first loss function representing a difference between the number of neutralized variants and estimated outputs generated by applying the first model on the input features obtained from the ABP library datasets is reduced or minimized.
[0372] Similarly, the second model (corresponding to the second column in FIG. 19) is also a regression model including a second set of weights trained based on the set of input features of each ABP library dataset and the second performance metric values for percent monomer. There are also learned coefficients for each input feature, so that a second loss function representing a difference between the percent monomer metric and estimated outputs generated by applying the second model on the input features obtained from the ABP library dataset is reduced or minimized.
[0373] Similarly, the third model (corresponding to the third column in FIG. 19) is also a regression model including a third set of weights trained based on the set of input features of each ABP library dataset and the third performance metric values for titer. There are also learned coefficients for each input feature, so that a third loss function representing a difference between the titer and estimated outputs generated by applying the third model on the input features obtained from the ABP library dataset is reduced or minimized.
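The per-metric regression models and the averaging of their weights can be sketched as follows. This is a minimal illustration that assumes ordinary least squares without an intercept, solved through the normal equations; the disclosure does not specify the regression variant, any feature scaling, or any regularization, and all function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes `a` is square and non-singular (a singular matrix divides by zero).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_least_squares(features, targets):
    """Coefficients minimizing the summed squared error (no intercept)."""
    k = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(features, targets))
           for i in range(k)]
    return solve_linear_system(xtx, xty)


def average_weights(features, metrics):
    """Fit one model per performance metric, then average the weights per feature."""
    models = [fit_least_squares(features, values) for values in metrics.values()]
    k = len(features[0])
    return [sum(w[i] for w in models) / len(models) for i in range(k)]
```

Here each row of `features` holds the input features of one ABP library dataset, and `metrics` maps a metric name (for example, titer) to one measured value per library dataset.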
[0374] Moreover, the coefficients for each input feature are combined or averaged to generate the respective weight for the input feature for the scoring model. Therefore, when each coefficient of the scoring model is multiplied with the value of its respective input feature to generate a weighted feature for an ABP library dataset, the sum of the weighted features results in the library score for the ABP library dataset. An example scoring model generated in this manner is given by:
Library score = -3.619 × (top_lib_post_mean) + 3.225 × (j_light_evenness) + 3.162 × (j_heavy_evenness) + 2.934 × (instability_index_z_mean) + 1.794 × (epitope_evenness) + 0.582 × (v_light_shannon) + 0.374 × (j_light_id_z_mean) + 0.311 × (v_light_id_z_mean) - 0.183 × (v_heavy_evenness) - 0.06 × (bind_shannon) + 0.03 × (aggregation_z_max) - 0.01 × (v_heavy_richness),
where (·) indicates an input feature and the coefficient in front of the input feature is the learned weight of the scoring model. Therefore, the output of applying the set of trained weights of the scoring model to the input features of a respective ABP library dataset is the library score of the respective ABP library dataset.
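The example scoring model above is a weighted sum and can be applied as in the following sketch. The coefficients and feature names are taken from the example model; the dictionary layout and the function name `library_score` are illustrative assumptions.

```python
# Learned weights of the example scoring model, keyed by input feature name.
LIBRARY_SCORE_WEIGHTS = {
    "top_lib_post_mean": -3.619,
    "j_light_evenness": 3.225,
    "j_heavy_evenness": 3.162,
    "instability_index_z_mean": 2.934,
    "epitope_evenness": 1.794,
    "v_light_shannon": 0.582,
    "j_light_id_z_mean": 0.374,
    "v_light_id_z_mean": 0.311,
    "v_heavy_evenness": -0.183,
    "bind_shannon": -0.06,
    "aggregation_z_max": 0.03,
    "v_heavy_richness": -0.01,
}


def library_score(features, weights=LIBRARY_SCORE_WEIGHTS):
    """Sum of weight * feature value over every input feature of the model."""
    missing = sorted(set(weights) - set(features))
    if missing:
        raise KeyError(f"missing input features: {missing}")
    return sum(w * features[name] for name, w in weights.items())
```

A caller would pass one dictionary of input feature values per ABP library dataset and rank the datasets by the returned score.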
[0375] In some embodiments, the training method further comprises (5) storing the weights of the scoring model on the computer readable medium.
[0376] FIG. 20 illustrates experimental results of applying the trained scoring model to ABP library datasets to obtain library scores, in accordance with an embodiment. As illustrated in FIG. 20, the example scoring model described above is applied to the set of input features obtained for each ABP library dataset to generate a respective library score for each ABP library dataset. The plot labeled “pool score” plots the generated library scores for each ABP library dataset. As shown in FIG. 20, there is a relatively high correlation between the library scores and the experimental performance metrics (e.g., titer, percent monomer, and number of neutralized variants). This figure illustrates that when the scoring model is applied to input features for other ABP library datasets to generate library scores for those datasets, using the library scores to select ABP library datasets for RPPs will result in effective neutralization of target antigens.
Method of selecting candidate ABP library dataset based on machine-learning based score prediction
[0377] FIG. 21 illustrates a method of training a machine-learning model for selecting a filtered ABP library dataset for producing an RPP, according to one or more embodiments. In some embodiments, a method for selecting a filtered ABP library dataset for an RPP is disclosed. In some embodiments, the method comprises obtaining a training dataset including a plurality of sample ABP library datasets each corresponding to a respective subset of ABPs from the plurality of ABPs, each sample ABP library dataset including token sequences encoding a partial or full sequence of each ABP in the respective subset of ABPs for the sample. The training dataset is used to train parameters of a machine-learning model coupled to receive at least partial or full sequences of each ABP in a candidate ABP library dataset including a respective subset of ABPs and generate a score that is indicative of whether the candidate ABP library dataset should be selected for producing an RPP.
[0378] In some embodiments, the method comprises (1) obtaining an input antigen binding protein (ABP) library dataset including an ABP profile for each of a plurality of ABPs; (2) obtaining a training dataset including a plurality of sample ABP library datasets each corresponding to a respective subset of ABPs (e.g., 100 ABPs) from the plurality of ABPs, each sample ABP library dataset including token sequences encoding a partial or full sequence of each ABP in the respective subset of ABPs for the sample.
[0379] In some embodiments, the method comprises (3) for each ABP in a respective sample ABP library dataset, obtaining values for a plurality of characteristic descriptors for the respective ABP, wherein the plurality of characteristic descriptors is selected from:
(i) a binding affinity of the respective ABP for the respective target antigen;
(ii) an effector activity of the respective ABP against the target molecule or complex;
(iii) a solubility score of the respective ABP;
(iv) an aggregation score of the respective ABP;
(v) a hydrophobicity score of the respective ABP;
(vi) an isoelectric point of the respective ABP;
(vii) a stability score of the respective ABP;
(viii) a molecular weight of the respective ABP;
(ix) a number of unpaired cysteine residues in the respective ABP;
(x) an abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen;
(xi) a fold-change of the increase in the abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen, as compared to the abundance frequency prior to enrichment;
(xii) a number of non-canonical glycosylation sites in the respective ABP;
(xiii) a number of cleavage sites in the respective ABP;
(xiv) a number of deamidation sites in the respective ABP;
(xv) a number of isomerization sites in the respective ABP;
(xvi) a number of oxidation sites in the respective ABP;
(xvii) CDR3H length of the respective ABP;
(xviii) binding specificity of the respective ABP;
(xix) immunogenicity of the respective ABP;
(xx) polyspecificity of the respective ABP; and
(xxi) a respective epitope that the respective ABP binds to.
[0380] In some embodiments, the value for the (vii) stability score of a respective ABP may be generated by one or a combination of the values for (xiii) the number of cleavage sites in the respective ABP, (xiv) the number of deamidation sites in the respective ABP, (xv) the number of isomerization sites in the respective ABP, and / or (xvi) the number of oxidation sites in the respective ABP, as these descriptors are indicative of the stability or degradation of the respective ABP.
[0381] In some embodiments, the characteristic descriptors for the respective candidate ABP are selected from: (i) a binding affinity of the respective candidate ABP to the respective target antigen; (ii) an effector activity of the respective candidate ABP against the target molecule or complex; (iii) a binding pattern of the respective candidate ABP to the respective target antigen and its variants; (iv) an abundance frequency of the respective candidate ABP following sorting the input ABP library or a subset thereof to enrich for binding to the respective target antigen; and (v) a fold-change of the increase in the abundance frequency of the respective candidate ABP following sorting the input ABP library or a subset thereof to enrich for binding to the respective target antigen, as compared to the abundance frequency prior to enrichment.
[0382] In some embodiments, the method comprises (4) for each ABP in a respective sample ABP library dataset, obtaining values associated with one or more preferred library properties of the sample ABP library dataset, wherein the one or more preferred library properties is selected from:
(i) the set of heavy chain CDR3 sequences contained in the subset of the plurality of ABPs comprises at least about 10, 20, 50, 100, 200, or 1000 unique sequences;
(ii) the subset of the plurality of ABPs specifically bind to at least two unique epitopes associated with the target molecule or complex;
(iii) the subset of the plurality of ABPs is capable of modulating at least two target antigen variants;
(iv) the set of heavy chain V genes represented in the subset of the plurality of ABPs comprises at least two unique V genes;
(v) the set of light chain V genes represented in the subset of the plurality of ABPs comprises at least two unique V genes;
(vi) the set of heavy chain J genes represented in the subset of the plurality of ABPs comprises at least two unique J genes;
(vii) the set of light chain J genes represented in the subset of the plurality of ABPs comprises at least two unique J genes;
(viii) the average percent germline identity of heavy chain V genes represented in the subset of the plurality of ABPs is between about 50% and about 100%;
(ix) the average percent germline identity of light chain V genes represented in the subset of the plurality of ABPs is between about 50% and about 100%;
(x) the average percent germline identity of heavy chain J genes represented in the subset of the plurality of ABPs is between about 50% and about 100%; and
(xi) the average percent germline identity of light chain J genes represented in the subset of the plurality of ABPs is between about 50% and about 100%.
[0383] The method further comprises (5) for each sample ABP library dataset, computing a known library score based on (a) the values of the plurality of characteristic descriptors of each of the respective subset of ABPs for the sample ABP library dataset, and / or (b) the values of the one or more preferred library properties of the sample ABP library dataset. In some embodiments, the score is a weighted combination of any one or a combination of values associated with factors (a) and / or (b), where a higher weight value for a respective factor indicates that the values for that factor should be weighted more heavily in the score. The sign of a respective factor is adjusted such that a higher value of the score reflects a more favorable ABP library dataset. As an example, the known library score for a “Sample Library j” in the training dataset is given by:
[equation not reproduced]
where “avg_stability” denotes the average of stability values for characteristic descriptor (vii) of the 100 ABPs in the sample ABP library dataset; “avg_immunogenicity” denotes the average of immunogenicity values for characteristic descriptor (xix) of the 100 ABPs in the sample ABP library dataset; “epitope_diversity” denotes the diversity value for preferred library property (ii) indicating the degree to which the subset of the ABPs in the sample ABP library dataset specifically binds or is predicted to bind to at least two or more unique epitopes associated with the target molecule or complex (e.g., higher value for more diverse epitope distribution); and “v_gene_diversity” denotes the diversity value for preferred library property (iv) indicating a degree to which at least two or more unique V genes are included in the heavy chain V genes represented in the subset of the ABPs in the sample ABP library dataset (e.g., higher value for more diverse V gene distribution).
[0384] In some embodiments, when computing the library score, the average value for a characteristic descriptor (e.g., average stability value, average immunogenicity value) is a weighted combination of the abundance for a respective ABP in the sample library dataset and the value for the characteristic descriptor for the ABP. This way, the library score weights the characteristic descriptor values for each ABP depending on the abundance of that ABP in the sample library dataset.
[0385] Moreover, w1 denotes the respective weight for the avg_stability factor, w2 denotes the respective weight for the avg_immunogenicity factor, w3 denotes the respective weight for the epitope_diversity factor, and w4 denotes the respective weight for the v_gene_diversity factor. Thus, a respective sample ABP library dataset is a more favorable candidate for producing a custom RPP when the average stability values of the subset of ABPs is high, when the average immunogenicity of the subset of ABPs is low, when the diversity or number of unique epitopes the subset of ABPs is determined or predicted to bind to is high, and when the diversity or number of unique V genes represented in the heavy or light chain V genes represented in the subset of the ABPs is high.
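One way such a known library score could be computed is sketched below. The exact expression is not reproduced in this text, so the sign convention (subtracting the immunogenicity term so that a higher score is more favorable), the abundance-weighted mean, the default weights, and the function names are all assumptions made for illustration.

```python
def abundance_weighted_mean(values, abundances):
    """Mean of per-ABP descriptor values, weighted by each ABP's abundance."""
    total = sum(abundances)
    return sum(v * a for v, a in zip(values, abundances)) / total


def known_library_score(stability, immunogenicity, abundances,
                        epitope_diversity, v_gene_diversity,
                        weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted combination of four factors; higher is assumed more favorable."""
    w1, w2, w3, w4 = weights
    avg_stability = abundance_weighted_mean(stability, abundances)
    avg_immunogenicity = abundance_weighted_mean(immunogenicity, abundances)
    # Assumed sign adjustment: low immunogenicity should raise the score.
    return (w1 * avg_stability
            - w2 * avg_immunogenicity
            + w3 * epitope_diversity
            + w4 * v_gene_diversity)
```

Passing equal abundances for every ABP reduces the weighted mean to a plain average, which corresponds to the embodiment in which abundance is not used.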
[0386] In some embodiments, the known library score for the “Candidate Library j” is given by computing a first score based on (a) the values of the plurality of characteristic descriptors of each of the respective subset of ABPs for the sample ABP library dataset, and a second score based on (b) the values of the one or more preferred library properties of the sample ABP library dataset and combining the first score and the second score together.
[0387] The method further comprises (6) dividing the training dataset into one or more batches of samples for one or more iterations.
[0388] Specifically, (7) for each of one or more iterations, the method further comprises (a) for each sample ABP library dataset in the batch, applying parameters of a machine-learning model 2150 to the token sequences for the sample ABP library dataset to generate an estimated output. As shown in the example of FIG. 21, a sample ABP library dataset “Sample Library j” included in a batch of samples for the current iteration includes tokens encoding the partial or full sequences of the respective subset of ABPs in the sample, including ABP sequence 23, ABP sequence 985, . . ., ABP sequence 65. In some embodiments, the inputs to the machine-learning model 2150 include a concatenation of the tokens as a single tensor. The parameters for the machine-learning model 2150 for the current iteration are applied to the tensor to generate an estimated output for the sample.
[0389] In some embodiments, when the abundance of each ABP is used to weight the library score for a sample library dataset, the inputs to the machine-learning model 2150 further include the abundance values for each ABP in the dataset. This way, during inference for a candidate ABP library dataset, the predicted library score also incorporates the abundance of each ABP.
[0390] The method further comprises (b) computing a loss function indicating differences between the scores and the estimated outputs for the batch of sample ABP library datasets. Specifically, the loss function for each sample in the batch for the current iteration is computed, and summed together to generate a loss for the batch. In the example of FIG. 21, the loss function for the “Sample Library j” indicates a difference between the estimated output for the sample and the known library score for the sample in the training dataset. In some embodiments, the difference is encoded using an L2 loss, L1 loss, and the like. The loss function for the remaining samples in the batch for the current iteration is also computed and summed to generate a loss for the batch.
[0391] The method further comprises (c) updating the parameters of the machine-learning model 2150 by backpropagating terms obtained from the loss function for the batch.
[0392] This process described with respect to (a)-(c) is repeated for subsequent iterations until a convergence criterion is reached. For example, the convergence criterion may be met if the change in the values of the parameters of the machine-learning model 2150 is less than a threshold.
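Steps (a)-(c) and the convergence check can be sketched with a deliberately small stand-in for the machine-learning model 2150. The sketch assumes a linear model over token frequencies rather than a network over concatenated tokens, an L2 loss summed over each batch, and plain gradient descent; the function names, learning rate, and tolerance are hypothetical.

```python
def token_frequencies(library, vocab_size):
    """Pool the token sequences of one sample library into token frequencies."""
    counts = [0.0] * vocab_size
    total = 0
    for sequence in library:
        for token in sequence:
            counts[token] += 1.0
            total += 1
    return [c / total for c in counts]


def train(samples, scores, vocab_size, batch_size=2, lr=0.1,
          tol=1e-9, max_epochs=10000):
    """Batch gradient descent on a linear model; returns the learned weights."""
    features = [token_frequencies(s, vocab_size) for s in samples]
    weights = [0.0] * vocab_size
    for _ in range(max_epochs):
        previous = weights[:]
        for start in range(0, len(features), batch_size):
            grad = [0.0] * vocab_size
            for i in range(start, min(start + batch_size, len(features))):
                # (a) apply the current parameters to get an estimated output
                estimate = sum(w * x for w, x in zip(weights, features[i]))
                # (b) L2 loss (estimate - score)^2, summed over the batch
                residual = estimate - scores[i]
                for k in range(vocab_size):
                    grad[k] += 2.0 * residual * features[i][k]
            # (c) update parameters with the gradient of the batch loss
            weights = [w - lr * g for w, g in zip(weights, grad)]
        # Convergence: the largest parameter change over a pass is below tol
        if max(abs(w - p) for w, p in zip(weights, previous)) < tol:
            break
    return weights


def predict(weights, library, vocab_size):
    """Predicted library score for one candidate library."""
    x = token_frequencies(library, vocab_size)
    return sum(w * xi for w, xi in zip(weights, x))
```

For a neural network, the gradient in step (c) would be obtained by backpropagation instead of the closed-form expression used here; the loop structure is otherwise the same.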
[0393] After the training process is completed, the trained parameters of the machine-learning model 2150 may be deployed to select one or more candidate ABP library datasets. Specifically, during the inference process, the method comprises (8) obtaining a plurality of candidate ABP library datasets each including token sequences for a subset of ABPs. In one or more embodiments, a candidate ABP library dataset includes a respective subset of ABPs that are of a different combination than the sample datasets in the training dataset. The method comprises (9) applying the trained machine-learning model 2150 to the token sequences for the candidate ABP library datasets to generate predicted library scores. The method further comprises (10) selecting a candidate ABP library dataset responsive to the predicted library score meeting a threshold value or proportion, and (11) providing the candidate ABP library dataset for generation of a composition comprising ABPs corresponding to the ABP references in the candidate ABP library dataset, thereby generating the RPP.
6.5. Custom RPPs
[0394] The present disclosure provides a custom RPP comprising a plurality of ABPs having specific properties. The custom RPP can be generated by employing any of the processes described herein.
[0395] In some embodiments, the RPP specifically binds a target molecule or complex of molecules (such as molecules or complexes of molecules associated with a pathogen, including, e.g., viral and bacterial pathogens). The RPP comprises a plurality of ABPs that specifically bind to the target molecule or complex and that are selected by the ML/AI model described herein and / or for having one or more characteristics (including, e.g., physical characteristics and functional characteristics) and / or one or more library properties (i.e., properties of the library of ABPs that make up the RPP). The characteristics and library properties can include those determined experimentally or by an in silico process.
[0396] Accordingly, the present disclosure relates to methods of selecting ABPs having desired properties for generation of an RPP. The presence or absence of the desired properties can be determined by an ML/AI model. The present invention also relates to the RPP generated by any of the methods disclosed herein.
RPP size
[0397] In some embodiments, the RPP comprises at least about any of 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, or more ABPs.
[0398] In some embodiments, the RPP comprises at least 100, at least 500, at least 1000, at least 2000, at least 3000, at least 4000, at least 5000, at least 6000, at least 7000, at least 8000, at least 9000, at least 10,000, at least 15,000, at least 20,000, at least 30,000, at least 40,000, at least 50,000, at least 100,000, at least 500,000, or at least 1,000,000 unique ABPs.
ABP Source
[0399] ABPs for generation of an RPP can be obtained from various sources. In some embodiments, a naturally occurring ABP isolated from a donor sample is used. In some embodiments, an ABP synthesized based on known sequences is used. In some embodiments, an ABP having an artificially generated sequence is used. In some embodiments, an ABP having a sequence from an antibody sequence database (e.g., UniProt, IMGT, abYsis, the ABCD database, SAbDab, Thera-SAbDab, Aho’s Amazing Atlas of Antibody Anatomy, Observed Antibody Space database, cAb-Rep) is used.
[0400] In some embodiments, an RPP comprises ABPs originating from the same source. In some embodiments, an RPP comprises ABPs originating from multiple different sources.
[0401] In some embodiments, the ABPs originate from one or more donor samples. In some embodiments, the ABPs obtained from one or more sources are analyzed and selected to be included in the RPP.
[0402] In some embodiments, an ABP of an RPP described herein comprises a cognate pair of heavy chain and light chain variable regions from a single cell out of a blood sample from at least one (such as at least any of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more) donor previously exposed to the target molecule or complex (e.g., target molecule or complex associated with a virus or bacterium). In some embodiments, the cognate pair of heavy chain and light chain variable regions from a single cell is generated by the method described in Adler et al., Rare, high-affinity anti-pathogen antibodies from human repertoires, discovered using microfluidics and molecular genomics. mAbs. 2017 Nov / Dec;9(8):1282-1296, which is incorporated by reference in its entirety.
[0403] In some embodiments, the at least one donor has been previously vaccinated with a vaccine derived from the target molecule or complex. In some embodiments, the blood sample comprises cells purified from peripheral blood mononuclear cells (PBMCs) of the donor. In some embodiments, the single cell is a B cell (e.g., memory B cell), plasma cell, or plasmablast. In some embodiments, the target molecule or complex is associated with a virus. In some embodiments, the target molecule or complex is a viral protein or complex of viral proteins. In some embodiments, the target molecule or complex is associated with a bacterium. In some embodiments, the target molecule or complex is a bacterial protein or complex of bacterial proteins. In some embodiments, the subject is a human. In some embodiments, the subject is a transgenic animal (including, e.g., mice, rats, and chickens) expressing human antibody sequences.
[0404] Fully human monoclonal antibodies may be generated by any number of techniques with which those having ordinary skill in the art will be familiar. Such methods include, but are not limited to, Epstein Barr Virus (EBV) transformation of human peripheral blood cells (e.g., containing B lymphocytes), in vitro immunization of human B-cells, fusion of spleen cells from immunized transgenic mice carrying inserted human immunoglobulin genes, isolation from human immunoglobulin V region phage libraries, or other procedures as known in the art and based on the disclosure herein. For example, fully human monoclonal antibodies may be obtained from transgenic mice that have been engineered to produce specific human antibodies in response to antigenic challenge. Methods for obtaining fully human antibodies from transgenic mice are described, for example, by Green et al., Nature Genet. 7:13, 1994; Lonberg et al., Nature 368:856, 1994; Taylor et al., Int. Immun. 6:579, 1994; U.S. Patent No. 5,877,397; Bruggemann et al., 1997 Curr. Opin. Biotechnol. 8:455-58; Jakobovits et al., 1995 Ann. N. Y. Acad. Sci. 764:525-35. In this technique, elements of the human heavy and light chain locus are introduced into strains of mice derived from embryonic stem cell lines that contain targeted disruptions of the endogenous heavy chain and light chain loci (see also Bruggemann et al., Curr. Opin. Biotechnol. 8:455-58 (1997)). For example, human immunoglobulin transgenes may be mini-gene constructs, or transloci on yeast artificial chromosomes, which undergo B-cell-specific DNA rearrangement and hypermutation in the mouse lymphoid tissue. Fully human monoclonal antibodies may be obtained by immunizing the transgenic mice, which may then produce human antibodies specific for the antigen target or targets. Lymphoid cells of the immunized transgenic mice can be used to produce human antibody-secreting hybridomas according to the methods described herein.
[0405] Another method for generating human antibodies of the invention includes immortalizing human peripheral blood cells by EBV transformation. See, e.g., U.S. Patent No. 4,464,456. Such an immortalized B-cell line (or lymphoblastoid cell line) producing an ABP that specifically binds to the target or targets can be identified by immunodetection methods as provided herein, for example, an ELISA, and then isolated by standard cloning techniques. The stability of the lymphoblastoid cell line producing an ABP may be improved by fusing the transformed cell line with a murine myeloma to produce a mouse-human hybrid cell line according to methods known in the art (see, e.g., Glasky et al., Hybridoma 8:377-89 (1989)). Still another method to generate human ABPs is in vitro immunization, which includes priming human splenic B-cells with antigen targets, followed by fusion of the primed B-cells with a heterohybrid fusion partner. See, e.g., Boerner et al., 1991 J. Immunol. 147:86-95.
[0406] In certain embodiments, B-cells that are producing an ABP are selected and the light chain and heavy chain variable regions are cloned from the B-cell according to molecular biology techniques known in the art (WO 92 / 02551; U.S. Patent 5,627,052; Babcook et al., Proc. Natl. Acad. Sci. USA 93:7843-48 (1996)) and described herein. B-cells from an immunized animal may be isolated from the spleen, lymph node, or peripheral blood sample by selecting a cell that is producing an antibody that specifically binds to the antigen target. B-cells may also be isolated from humans, for example, from a peripheral blood sample.
[0407] Methods for detecting single B-cells that are producing an antibody with the desired specificity are well known in the art, for example, by plaque formation, fluorescence-activated cell sorting, in vitro stimulation followed by detection of specific antibody, and the like. Methods for selection of specific antibody-producing B-cells include, for example, preparing a single cell suspension of B-cells in soft agar that contains the antigen target. Binding of the specific antibodies produced by the B-cell to the antigen results in the formation of a complex, which may be visible as an immunoprecipitate.
[0408] In some embodiments, specific antibody-producing B-cells are selected by using a method that allows identification of natively paired antibodies. For example, a method described in Adler et al., A natively paired antibody library yields drug leads with higher sensitivity and specificity than a randomly paired antibody library, mAbs (2018), which is incorporated by reference in its entirety herein, can be employed. The method combines microfluidic technology, molecular genomics, yeast single-chain variable fragment (scFv) display, fluorescence-activated cell sorting (FACS) and deep sequencing. In short, B cells can be isolated from immunized animals and then pooled. The B cells are encapsulated into droplets with oligo-dT beads and a lysis solution, and mRNA-bound beads are purified from the droplets, and then injected into a second emulsion with an OE-RT-PCR amplification mix that generates DNA amplicons that encode scFv with native pairing of heavy and light chain Ig. Libraries of natively paired amplicons are then electroporated into yeast for scFv display. FACS is used to identify high affinity scFv. Finally, deep antibody sequencing can be used to identify all clones in the pre- and post-sort scFv libraries.
[0409] After the B-cells producing the desired antibody are selected, the specific antibody genes may be cloned by isolating and amplifying DNA or mRNA according to methods known in the art and described herein.
[0410] The methods for obtaining antibodies of the invention can also adopt various phage display technologies known in the art. See, e.g., Winter et al., 1994 Annu. Rev. Immunol. 12:433-55; Burton et al., 1994 Adv. Immunol. 57:191-280. Human or murine immunoglobulin variable region gene combinatorial libraries may be created in phage vectors that can be screened to select Ig fragments (Fab, Fv, sFv, or multimers thereof) that bind specifically to a target antigen. See, e.g., U.S. Patent No. 5,223,409; Huse et al., 1989 Science 246:1275-81; Sastry et al., Proc. Natl. Acad. Sci. USA 86:5728-32 (1989); Alting-Mees et al., Strategies in Molecular Biology 3:1-9 (1990); Kang et al., 1991 Proc. Natl. Acad. Sci. USA 88:4363-66; Hoogenboom et al., 1992 J. Molec. Biol. 227:381-388; Schlebusch et al., 1997 Hybridoma 16:47-52 and references cited therein. For example, a library containing a plurality of polynucleotide sequences encoding Ig variable region fragments may be inserted into the genome of a filamentous bacteriophage, such as M13 or a variant thereof, in frame with the sequence encoding a phage coat protein. A fusion protein may be a fusion of the coat protein with the light chain variable region domain and / or with the heavy chain variable region domain. According to certain embodiments, immunoglobulin Fab fragments may also be displayed on a phage particle (see, e.g., U.S. Patent No. 5,698,426).
[0411] In one embodiment, in a hybridoma the variable regions of a gene expressing a monoclonal antibody of interest are amplified using nucleotide primers. These primers may be synthesized by one of ordinary skill in the art, or may be purchased from commercially available sources. (See, e.g., Stratagene (La Jolla, California), which sells primers for mouse and human variable regions including, among others, primers for VHa, VHb, VHc, VHd, CH1, VL and CL regions.) These primers may be used to amplify heavy or light chain variable regions, which may then be inserted into vectors such as ImmunoZAP™ H or ImmunoZAP™ L (Stratagene), respectively. These vectors may then be introduced into E. coli, yeast, or mammalian based systems for expression. Large amounts of a single chain protein containing a fusion of the VH and VL domains may be produced using these methods (see Bird et al., Science 242:423-426, 1988).
[0412] Once cells producing antibodies according to the invention have been obtained using any of the above described immunization and other techniques, the specific antibody genes may be cloned by isolating and amplifying DNA or mRNA therefrom according to standard procedures as described herein. The antibodies produced therefrom may be sequenced and the CDRs identified and the DNA coding for the CDRs may be manipulated as described previously to generate other antibodies according to the invention.
[0413] Other antibodies according to the invention may be obtained by conventional immunization and cell fusion procedures as described herein and known in the art.
[0414] Molecular evolution of the complementarity determining regions (CDRs) in the center of the antibody binding site also has been used to isolate antibodies with increased affinity, for example, antibodies having increased affinity for c-erbB-2, as described by Schier et al., 1996, J. Mol. Biol. 263:551. It will be appreciated that an antibody of the present invention may have at least one amino acid substitution, providing that the antibody retains binding specificity. Therefore, modifications to the antibody structures are encompassed within the scope of the invention. These may include amino acid substitutions, which may be conservative or non-conservative, that do not destroy the binding capability of an antibody comprising the RPP. Conservative amino acid substitutions may encompass non-naturally occurring amino acid residues, which are typically incorporated by chemical peptide synthesis rather than by synthesis in biological systems. These include peptidomimetics and other reversed or inverted forms of amino acid moieties. A conservative amino acid substitution may also involve a substitution of a native amino acid residue with a non-native residue such that there is little or no effect on the polarity or charge of the amino acid residue at that position.
[0415] Non-conservative substitutions may involve the exchange of a member of one class of amino acids or amino acid mimetics for a member from another class with different physical properties (e.g. size, polarity, hydrophobicity, charge). Such substituted residues may be introduced into regions of the human antibody that are homologous with non-human antibodies, or into the non-homologous regions of the molecule.
[0416] Moreover, one skilled in the art may generate test variants containing a single amino acid substitution at each desired amino acid residue. The variants can then be screened using activity assays known to those skilled in the art. Such variants could be used to gather information about suitable variants. For example, if one discovered that a change to a particular amino acid residue resulted in destroyed, undesirably reduced, or unsuitable activity, variants with such a change may be avoided. In other words, based on information gathered from such routine experiments, one skilled in the art can readily determine the amino acids where further substitutions should be avoided either alone or in combination with other mutations.
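The single-substitution scan described in paragraph [0416] can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `single_substitution_variants`, the restriction to the 20 standard amino acids, and the five-residue input fragment are hypothetical and do not come from the disclosure. The activity screening that follows the scan is an experimental step and is not modeled.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumption)


def single_substitution_variants(
    parent: str, positions: Optional[Iterable[int]] = None
) -> Iterator[Tuple[str, str]]:
    """Yield (label, sequence) for every single-residue substitution.

    Positions are 1-based; None means every position in the parent is scanned.
    Labels use the common form <original><position><replacement>, e.g. "Y2A".
    """
    if positions is None:
        positions = range(1, len(parent) + 1)
    for pos in positions:
        original = parent[pos - 1]
        for aa in AMINO_ACIDS:
            if aa == original:
                continue  # skip the unchanged parent residue
            yield f"{original}{pos}{aa}", parent[: pos - 1] + aa + parent[pos:]


# Hypothetical five-residue fragment, scanned at positions 2 and 4 only.
variants: Dict[str, str] = dict(
    single_substitution_variants("GYTFT", positions=[2, 4])
)
```

Each selected position contributes 19 variants, so the example produces 38 labeled sequences that could then be tracked through an activity assay.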
[0417] A skilled artisan will be able to determine suitable variants of the polypeptide as set forth herein using well-known techniques. In certain embodiments, one skilled in the art may identify suitable areas of the molecule that may be changed without destroying activity by targeting regions not believed to be important for activity. In certain embodiments, one can identify residues and portions of the molecules that are conserved among similar polypeptides. In certain embodiments, even areas that may be important for biological activity or for structure may be subject to conservative amino acid substitutions without destroying the biological activity or without adversely affecting the polypeptide structure.
[0418] Additionally, one skilled in the art can review structure-function studies identifying residues in similar polypeptides that are important for activity or structure. In view of such a comparison, one can predict the importance of amino acid residues in a protein that correspond to amino acid residues which are important for activity or structure in similar proteins. One skilled in the art may opt for chemically similar amino acid substitutions for such predicted important amino acid residues.
[0419] One skilled in the art can also analyze the three-dimensional structure and amino acid sequence in relation to that structure in similar polypeptides. In view of such information, one skilled in the art may predict the alignment of amino acid residues of an antibody with respect to its three-dimensional structure. In certain embodiments, one skilled in the art may choose not to make radical changes to amino acid residues predicted to be on the surface of the protein, since such residues may be involved in important interactions with other molecules.
[0420] A number of scientific publications have been devoted to the prediction of secondary structure. See Moult J., Curr. Op. in Biotech., 7(4):422-427 (1996); Chou et al., Biochem., 13(2):222-245 (1974); Chou et al., Biochem., 13(2):211-222 (1974); Chou et al., Adv. Enzymol. Relat. Areas Mol. Biol., 47:45-148 (1978); Chou et al., Ann. Rev. Biochem., 47:251-276 and Chou et al., Biophys. J., 26:367-384 (1979). Moreover, computer programs are currently available to assist with predicting secondary structure. One method of predicting secondary structure is based upon homology modeling. For example, two polypeptides or proteins which have a sequence identity of greater than 30%, or similarity greater than 40%, often have similar structural topologies. The recent growth of the protein structural database (PDB) has provided enhanced predictability of secondary structure, including the potential number of folds within a polypeptide’s or protein’s structure. See Holm et al., Nucl. Acid. Res., 27(1):244-247 (1999). It has been suggested (Brenner et al., Curr. Op. Struct. Biol., 7(3):369-376 (1997)) that there are a limited number of folds in a given polypeptide or protein and that once a critical number of structures have been resolved, structural prediction will become dramatically more accurate.
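The 30% sequence identity guideline in paragraph [0420] can be illustrated with a short Python sketch. It assumes the two sequences have already been aligned (with "-" marking gaps) and that identity is computed over columns where neither sequence has a gap; other denominators are also in common use, so the exact percentage depends on that choice. The function names and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap columns are excluded from the denominator
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def exceeds_identity_guideline(
    aligned_a: str, aligned_b: str, threshold: float = 30.0
) -> bool:
    """True when identity is greater than the threshold (30% by default)."""
    return percent_identity(aligned_a, aligned_b) > threshold


# Hypothetical pre-aligned fragments: 7 of 9 ungapped columns match.
identity = percent_identity("ACDEFG-HIK", "ACDQFGWHLK")
```

Exceeding the threshold only suggests that homology modeling may be worth attempting; it does not establish that two proteins share a fold.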
[0421] Additional methods of predicting secondary structure include “threading” (Jones, D., Curr. Opin. Struct. Biol., 7(3):377-87 (1997); Sippl et al., Structure, 4(1):15-19 (1996)), “profile analysis” (Bowie et al., Science, 253:164-170 (1991); Gribskov et al., Meth. Enzym., 183:146-159 (1990); Gribskov et al., Proc. Nat. Acad. Sci., 84(13):4355-4358 (1987)), and “evolutionary linkage” (See Holm, supra (1999), and Brenner, supra (1997)).
[0422] In certain embodiments, variants of antibodies include glycosylation variants wherein the number and / or type of glycosylation site has been altered compared to the amino acid sequences of a parent polypeptide. In certain embodiments, variants comprise a greater or a lesser number of N-linked glycosylation sites than the native protein. An N-linked glycosylation site is characterized by the sequence: Asn-X-Ser or Asn-X-Thr, wherein the amino acid residue designated as X can be any amino acid residue except proline. The substitution of amino acid residues to create this sequence provides a potential new site for the addition of an N-linked carbohydrate chain. Alternatively, substitutions which eliminate this sequence will remove an existing N-linked carbohydrate chain. Also provided is a rearrangement of N-linked carbohydrate chains wherein one or more N-linked glycosylation sites (typically those that are naturally occurring) are eliminated and one or more new N-linked sites are created. Additional preferred antibody variants include cysteine variants wherein one or more cysteine residues are deleted from or substituted for another amino acid (e.g., serine) as compared to the parent amino acid sequence. Cysteine variants can be useful when antibodies must be refolded into a biologically active conformation such as after the isolation of insoluble inclusion bodies. Cysteine variants generally have fewer cysteine residues than the native protein, and typically have an even number to minimize interactions resulting from unpaired cysteines.
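The N-linked glycosylation site definition in paragraph [0422] (Asn-X-Ser or Asn-X-Thr, where X is any residue except proline) maps directly onto a regular expression. The following Python sketch is illustrative only; the function name and the example fragments are assumptions, and a matching sequence marks a potential site rather than a confirmed glycan.

```python
import re
from typing import List

# Asn-X-Ser/Thr with X != Pro, in one-letter code. The lookahead allows
# overlapping sites (for example "NNSS") to be reported individually.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_n_glycosylation_sites(seq: str) -> List[int]:
    """Return the 1-based position of the Asn in each potential N-linked site."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


# Illustrative fragment with one potential site (Asn at position 5).
parent_sites = find_n_glycosylation_sites("EEQYNSTYR")

# Substituting the Asn (here with Gln) eliminates the site.
variant_sites = find_n_glycosylation_sites("EEQYQSTYR")
```

Comparing the site lists of a parent and a variant gives a quick check on whether a substitution has created, removed, or moved a potential N-linked site.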
[0423] According to certain embodiments, preferred amino acid substitutions are those which: (1) reduce susceptibility to proteolysis, (2) reduce susceptibility to oxidation, (3) alter binding affinity for forming protein complexes, (4) alter binding affinities, and / or (5) confer or modify other physicochemical or functional properties on such polypeptides. According to certain embodiments, single or multiple amino acid substitutions (in certain embodiments, conservative amino acid substitutions) may be made in the naturally-occurring sequence (in certain embodiments, in the portion of the polypeptide outside the domain(s) forming intermolecular contacts). In certain embodiments, a conservative amino acid substitution typically may not substantially change the structural characteristics of the parent sequence (e.g., a replacement amino acid should not tend to break a helix that occurs in the parent sequence, or disrupt other types of secondary structure that characterize the parent sequence). Examples of art-recognized polypeptide secondary and tertiary structures are described in Proteins, Structures and Molecular Principles (Creighton, Ed., W. H. Freeman and Company, New York (1984)); Introduction to Protein Structure (C. Branden and J. Tooze, eds., Garland Publishing, New York, N.Y. (1991)); and Thornton et al., Nature 354:105 (1991), which are each incorporated herein by reference.
[0424] In certain embodiments, ABPs of the invention may be chemically bonded with polymers, lipids, or other moieties.
[0425] The binding agents may comprise at least one of the CDRs described herein incorporated into a biocompatible framework structure. In one example, the biocompatible framework structure comprises a polypeptide or portion thereof that is sufficient to form a conformationally stable structural support, or framework, or scaffold, which is able to display one or more sequences of amino acids that bind to an antigen (e.g., CDRs, a variable region, etc.) in a localized surface region. Such structures can be a naturally occurring polypeptide or polypeptide “fold” (a structural motif), or can have one or more modifications, such as additions, deletions or substitutions of amino acids, relative to a naturally occurring polypeptide or fold. These scaffolds can be derived from a polypeptide of any species (or of more than one species), such as a human, other mammal, other vertebrate, invertebrate, plant, bacteria or virus.
[0426] Typically, the biocompatible framework structures are based on protein scaffolds or skeletons other than immunoglobulin domains. For example, those based on fibronectin, ankyrin, lipocalin, neocarzinostatin, cytochrome b, CP1 zinc finger, PST1, coiled coil, LACI-D1, Z domain and tendamistat domains may be used (See e.g., Nygren and Uhlen, 1997, Curr. Opin. in Struct. Biol., 7, 463-469).
[0427] It will be appreciated that the ABPs of the invention include the humanized antibodies described herein. Humanized antibodies such as those described herein can be produced using techniques known to those skilled in the art (Zhang, W., et al., Molecular Immunology, 42(12):1445-1451, 2005; Hwang W. et al., Methods, 36(1):35-42, 2005; Dall’Acqua WF, et al., Methods 36(1):43-60, 2005; and Clark, M., Immunology Today, 21(8):397-402, 2000).
ABP Format
[0428] In some embodiments, the RPP comprises scFvs. In some embodiments, the RPP consists of scFvs. In some embodiments, the RPP comprises antibody fragments. In some embodiments, the RPP consists of antibody fragments. In some embodiments, the RPP comprises recombinant full-length antibodies. In some embodiments, the RPP consists of recombinant full-length antibodies. In some embodiments, the RPP comprises human antibodies. In some embodiments, the RPP comprises humanized antibodies. In some embodiments, the RPP comprises monospecific ABPs. In some embodiments, the RPP comprises bispecific ABPs. In some embodiments, the RPP consists of ABPs (individually or in combination) of a human IgG subtype including IgG1, IgG2, IgG3, and IgG4. In some embodiments, the RPP comprises IgM, IgD, IgG, IgA, IgE, or a combination thereof.
[0429] In some embodiments, the RPP comprises antibody fragments. The ABPs of the RPP can be a Fab fragment, a F(ab’)2 fragment, an Fv fragment, or a combination thereof. A Fab fragment is a monovalent fragment having the VL, VH, CL and CH1 domains; a F(ab’)2 fragment is a bivalent fragment having two Fab fragments linked by a disulfide bridge at the hinge region; a Fd fragment has the VH and CH1 domains; an Fv fragment has the VL and VH domains of a single arm of an antibody; and a dAb fragment has a VH domain, a VL domain, or an antigen-binding fragment of a VH or VL domain (US Pat. Nos. 6,846,634, 6,696,245, US App. Pub. Nos. 05 / 0202512, 04 / 0202995, 04 / 0038291, 04 / 0009507, 03 / 0039958, Ward et al., Nature 341:544-546, 1989).
[0430] Naturally occurring immunoglobulin chains exhibit the same general structure of relatively conserved framework regions (FR) joined by three hypervariable regions, also called complementarity determining regions or CDRs. From N-terminus to C-terminus, both light and heavy chains comprise the domains FR1, CDR1, FR2, CDR2, FR3, CDR3 and FR4. The assignment of amino acids to each domain is in accordance with the definitions of Kabat et al. in Sequences of Proteins of Immunological Interest, 5th Ed., US Dept. of Health and Human Services, PHS, NIH, NIH Publication no. 91-3242, 1991.
[0431] In some embodiments, the RPP comprises or consists of humanized antibodies. A humanized antibody has a sequence that differs from the sequence of an antibody derived from a non-human species by one or more amino acid substitutions, deletions, and / or additions, such that the humanized antibody is less likely to induce an immune response, and / or induces a less severe immune response, as compared to the non-human species antibody, when it is administered to a human subject. In one embodiment, certain amino acids in the framework and constant domains of the heavy and / or light chains of the non-human species antibody are mutated to produce the humanized antibody. In another embodiment, the constant domain(s) from a human antibody are fused to the variable domain(s) of a non-human species. In another embodiment, one or more amino acid residues in one or more CDR sequences of a non-human antibody are changed to reduce the likely immunogenicity of the non-human antibody when it is administered to a human subject, wherein the changed amino acid residues either are not critical for immunospecific binding of the antibody to its antigen, or the changes to the amino acid sequence that are made are conservative changes, such that the binding of the humanized antibody to the antigen is not significantly worse than the binding of the non-human antibody to the antigen. Examples of how to make humanized antibodies may be found in U.S. Pat. Nos. 6,054,297, 5,886,152 and 5,877,293.
[0432] Fragments or analogs of antibodies can be readily prepared by those of ordinary skill in the art following the teachings of this specification and using techniques well-known in the art. Preferred amino- and carboxy-termini of fragments or analogs occur near boundaries of functional domains. Structural and functional domains can be identified by comparison of the nucleotide and / or amino acid sequence data to public or proprietary sequence databases. Computerized comparison methods can be used to identify sequence motifs or predicted protein conformation domains that occur in other proteins of known structure and / or function. Methods to identify protein sequences that fold into a known three-dimensional structure are known. See, e.g., Bowie et al., 1991, Science 253:164.
[0433] An ABP of an RPP can also be any synthetic or genetically engineered protein. For example, antibody fragments include isolated fragments consisting of the light chain variable region, “Fv” fragments consisting of the variable regions of the heavy and light chains, recombinant single chain polypeptide molecules in which light and heavy variable regions are connected by a peptide linker (scFv proteins).
[0434] Another form of an antibody fragment is a peptide comprising one or more complementarity determining regions (CDRs) of an antibody. CDRs (also termed “minimal recognition units”, or “hypervariable region”) can be incorporated into a molecule either covalently or noncovalently to make it an antigen binding protein. CDRs can be obtained by constructing polynucleotides that encode the CDR of interest. Such polynucleotides are prepared, for example, by using the polymerase chain reaction to synthesize the variable region using mRNA of antibody producing cells as a template (see, for example, Larrick et al., Methods: A Companion to Methods in Enzymology 2:106, 1991; Courtenay Luck, “Genetic Manipulation of Monoclonal Antibodies,” in Monoclonal Antibodies: Production, Engineering and Clinical Application, Ritter et al. (eds.), page 166 (Cambridge University Press 1995); and Ward et al., “Genetic Manipulation and Expression of Antibodies,” in Monoclonal Antibodies: Principles and Applications, Birch et al., (eds.), page 137 (Wiley Liss, Inc. 1995)).
[0435] The variable region domains of ABPs can be any naturally occurring variable domain or an engineered version thereof. By engineered version is meant a variable region domain that has been created using recombinant DNA engineering techniques. Such engineered versions include those created, for example, from a specific antibody variable region by insertions, deletions, or changes in or to the amino acid sequences of the specific antibody. Particular examples include engineered variable region domains containing at least one CDR and optionally one or more framework amino acids from a first antibody and the remainder of the variable region domain from a second antibody.
[0436] The variable region domain may be covalently attached at a C terminal amino acid to at least one other antibody domain or a fragment thereof. Thus, for example, a VH domain that is present in the variable region domain may be linked to an immunoglobulin CH1 domain, or a fragment thereof. Similarly, a VL domain may be linked to a CK domain or a fragment thereof. In this way, for example, the antibody may be a Fab fragment wherein the antigen binding domain contains associated VH and VL domains covalently linked at their C termini to a CH1 and CK domain, respectively. The CH1 domain may be extended with further amino acids, for example to provide a hinge region or a portion of a hinge region domain as found in a Fab’ fragment, or to provide further domains, such as antibody CH2 and CH3 domains.
[0437] The RPP can include ABPs comprising, e.g., the cognate pairs of heavy and light chain CDR3 sequences disclosed herein. For example, CDRs may be incorporated into known antibody framework regions (IgG1, IgG2, etc.), or conjugated to a suitable vehicle to enhance the half-life thereof. Suitable vehicles include, but are not limited to, Fc, polyethylene glycol (PEG), albumin, transferrin, and the like. These and other suitable vehicles are known in the art. Such conjugated CDR peptides may be in monomeric, dimeric, tetrameric, or other form. In one embodiment, one or more water-soluble polymers are bonded at one or more specific positions, for example at the amino terminus, of a binding agent.
[0438] In certain embodiments, the ABP comprises one or more water soluble polymer attachments, including, but not limited to, polyethylene glycol, polyoxyethylene glycol, or polypropylene glycol. See, e.g., U.S. Pat. Nos. 4,640,835, 4,496,689, 4,301,144, 4,670,417, 4,791,192 and 4,179,337. In certain embodiments, a derivative binding agent comprises one or more of monomethoxy-polyethylene glycol, dextran, cellulose, or other carbohydrate based polymers, poly-(N-vinyl pyrrolidone)-polyethylene glycol, propylene glycol homopolymers, a polypropylene oxide / ethylene oxide co-polymer, polyoxyethylated polyols (e.g., glycerol) and polyvinyl alcohol, as well as mixtures of such polymers. In certain embodiments, one or more water-soluble polymers are randomly attached to one or more side chains. In certain embodiments, PEG can act to improve the therapeutic capacity for a binding agent, such as an antibody. Certain such methods are discussed, for example, in U.S. Pat. No. 6,133,426, which is hereby incorporated by reference for any purpose.
[0439] An ABP of an RPP can have, for example, the structure of a naturally occurring immunoglobulin. An “immunoglobulin” is a tetrameric molecule. In a naturally occurring immunoglobulin, each tetramer is composed of two identical pairs of polypeptide chains, each pair having one “light” (about 25 kDa) and one “heavy” chain (about 50-70 kDa). The amino-terminal portion of each chain includes a variable region of about 100 to 110 or more amino acids primarily responsible for antigen recognition. The carboxy-terminal portion of each chain defines a constant region primarily responsible for effector function. Human light chains are classified as kappa and lambda light chains. Heavy chains are classified as mu, delta, gamma, alpha, or epsilon, and define the antibody’s isotype as IgM, IgD, IgG, IgA, and IgE, respectively. Within light and heavy chains, the variable and constant regions are joined by a “J” region of about 12 or more amino acids, with the heavy chain also including a “D” region of about 10 more amino acids. See generally, Fundamental Immunology Ch. 7 (Paul, W., ed., 2nd ed. Raven Press, N.Y. (1989)) (incorporated by reference in its entirety for all purposes). The variable regions of each light / heavy chain pair form the antibody binding site such that an intact immunoglobulin has two binding sites.
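The chain masses and heavy-chain classes given in paragraph [0439] can be restated as a small Python lookup and a back-of-the-envelope calculation. This is a sketch only: the names are hypothetical, the masses are the approximate figures quoted in the paragraph, and the sum covers one tetramer of two light and two heavy chains without accounting for glycosylation or for multimeric forms.

```python
from typing import Tuple

# Heavy chain class -> isotype, as listed in the paragraph above.
HEAVY_CHAIN_TO_ISOTYPE = {
    "mu": "IgM",
    "delta": "IgD",
    "gamma": "IgG",
    "alpha": "IgA",
    "epsilon": "IgE",
}

LIGHT_CHAIN_KDA = 25            # approximate, from the text
HEAVY_CHAIN_KDA = (50, 70)      # approximate range, from the text


def tetramer_mass_range_kda(
    light: int = LIGHT_CHAIN_KDA, heavy: Tuple[int, int] = HEAVY_CHAIN_KDA
) -> Tuple[int, int]:
    """Approximate mass range of two light plus two heavy chains, in kDa."""
    low, high = heavy
    return (2 * light + 2 * low, 2 * light + 2 * high)


mass_range = tetramer_mass_range_kda()   # (150, 190)
isotype = HEAVY_CHAIN_TO_ISOTYPE["gamma"]  # "IgG"
```

With the quoted figures, a single tetramer therefore falls at roughly 150 to 190 kDa, with the spread coming entirely from the heavy chain class.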
[0440] Different ABPs may bind to different domains of disease targets or act by different mechanisms of action. As indicated herein inter alia, the domain regions are designated such as to be inclusive of the group, unless otherwise indicated. For example, amino acids 4-12 refers to nine amino acids: amino acids at positions 4 and 12, as well as the seven intervening amino acids in the sequence. Other examples include antigen binding proteins that inhibit binding of a pathogen to its target cell, i.e., neutralizing activity. An antigen binding protein need not completely inhibit binding to a target cell to find use in the present invention.
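The inclusive range convention in paragraph [0440] differs from the zero-based, end-exclusive slicing used by most programming languages, which is a common source of off-by-one errors when region boundaries are transferred into code. A minimal Python sketch, using a hypothetical helper name and a made-up 16-residue sequence:

```python
def residues_in_range(seq: str, start: int, end: int) -> str:
    """Residues at 1-based positions start..end, with both endpoints included."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"range {start}-{end} is outside 1-{len(seq)}")
    # Shift only the start: Python slices are 0-based and exclude the end index.
    return seq[start - 1 : end]


# Made-up sequence for illustration; "amino acids 4-12" covers nine residues.
region = residues_in_range("MKTAYIAKQRQISFVK", 4, 12)
```

Here `region` holds positions 4 and 12 plus the seven residues between them, matching the count given in the paragraph.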
[0441] The ABPs described herein can include an Fc region, e.g., a dimer Fc polypeptide. One suitable Fc polypeptide, described in PCT application WO 93 / 10151 (hereby incorporated by reference), is a single chain polypeptide extending from the N-terminal hinge region to the native C-terminus of the Fc region of a human IgG1 antibody. Another useful Fc polypeptide is the Fc mutein described in U.S. Patent 5,457,035 and in Baum et al., 1994, EMBO J. 13:3992-4001. The amino acid sequence of this mutein is identical to that of the native Fc sequence presented in WO 93 / 10151, except that amino acid 19 has been changed from Leu to Ala, amino acid 20 has been changed from Leu to Glu, and amino acid 22 has been changed from Gly to Ala. The mutein exhibits reduced affinity for Fc receptors.
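The three point changes that define the Fc mutein in paragraph [0441] can be written as data and applied programmatically. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, and the input is a placeholder string built only to carry Leu at positions 19 and 20 and Gly at position 22. It is not the Fc sequence of WO 93 / 10151, which is not reproduced here.

```python
from typing import Iterable, Tuple

THREE_TO_ONE = {"Leu": "L", "Ala": "A", "Glu": "E", "Gly": "G"}

# (1-based position, original residue, replacement), as listed in the text.
MUTEIN_SUBSTITUTIONS = [(19, "Leu", "Ala"), (20, "Leu", "Glu"), (22, "Gly", "Ala")]


def apply_substitutions(seq: str, substitutions: Iterable[Tuple[int, str, str]]) -> str:
    """Apply point substitutions, checking the expected original residue first."""
    residues = list(seq)
    for pos, old, new in substitutions:
        found = residues[pos - 1]
        if found != THREE_TO_ONE[old]:
            raise ValueError(f"expected {old} at position {pos}, found {found!r}")
        residues[pos - 1] = THREE_TO_ONE[new]
    return "".join(residues)


# Placeholder only: Ala filler with L19, L20, G21 and G22 (NOT the real Fc).
placeholder = "A" * 18 + "LLGG" + "A" * 4
mutein_like = apply_substitutions(placeholder, MUTEIN_SUBSTITUTIONS)
```

Checking the original residue before each change guards against applying the numbering to a sequence with a different start position, which would silently mutate the wrong residues.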
[0442] Antigen-binding fragments of ABPs of the invention can be produced by conventional techniques. Examples of such fragments include, but are not limited to, Fab and F(ab’)2 fragments. Antibody fragments and derivatives produced by genetic engineering techniques also are contemplated.
[0443] Additional embodiments include chimeric antibodies, e.g., humanized versions of non-human (e.g., murine) monoclonal antibodies. Such humanized antibodies may be prepared by known techniques, and offer the advantage of reduced immunogenicity when the antibodies are administered to humans. In one embodiment, a humanized antibody comprises the variable domain of a murine antibody (or all or part of the antigen binding site thereof) and a constant domain derived from a human antibody. Alternatively, a humanized antibody fragment may comprise the antigen binding site of a murine antibody and a variable domain fragment (lacking the antigen-binding site) derived from a human antibody. Procedures for the production of chimeric and further engineered antibodies include those described in Riechmann et al., 1988, Nature 332:323, Liu et al., 1987, Proc. Nat. Acad. Sci. USA 84:3439, Larrick et al., 1989, Bio / Technology 7:934, and Winter et al., 1993, TIPS 14:139. In one embodiment, the chimeric antibody is a CDR grafted antibody. Techniques for humanizing antibodies are discussed in, e.g., U.S. Pat. Nos. 5,869,619, 5,225,539, 5,821,337, 5,859,205, 6,881,557, Padlan et al., 1995, FASEB J. 9:133-39, and Tamura et al., 2000, J. Immunol. 164:1432-41.
[0444] Procedures have been developed for generating human or partially human antibodies in non-human animals. For example, mice in which one or more endogenous immunoglobulin genes have been inactivated by various means have been prepared. Human immunoglobulin genes have been introduced into the mice to replace the inactivated mouse genes. Antibodies produced in the animal incorporate human immunoglobulin polypeptide chains encoded by the human genetic material introduced into the animal. In one embodiment, a non-human animal, such as a transgenic mouse, is immunized with a vaccine, such that antibodies directed against the vaccine antigen are generated in the animal.
[0445] Examples of techniques for production and use of transgenic animals for the production of human or partially human antibodies are described in U.S. Patents 5,814,318, 5,569,825, and 5,545,806, Davis et al., 2003, Production of human antibodies from transgenic mice in Lo, ed. Antibody Engineering: Methods and Protocols, Humana Press, NJ: 191-200, Kellermann et al., 2002, Curr Opin Biotechnol. 13:593-97, Russel et al., 2000, Infect Immun. 68:1820-26, Gallo et al., 2000, Eur J Immun. 30:534-40, Davis et al., 1999, Cancer Metastasis Rev. 18:421-25, Green, 1999, J Immunol Methods. 231:11-23, Jakobovits, 1998, Advanced Drug Delivery Reviews 31:33-42, Green et al., 1998, J Exp Med. 188:483-95, Jakobovits A, 1998, Exp. Opin. Invest. Drugs. 7:607-14, Tsuda et al., 1997, Genomics. 42:413-21, Mendez et al., 1997, Nat Genet. 15:146-56, Jakobovits, 1994, Curr Biol. 4:761-63, Arbones et al., 1994, Immunity. 1:247-60, Green et al., 1994, Nat Genet. 7:13-21, Jakobovits et al., 1993, Nature. 362:255-58, Jakobovits et al., 1993, Proc Natl Acad Sci U S A. 90:2551-55, Chen, J., M. Trounstine, F. W. Alt, F. Young, C. Kurahara, J. Loring, D. Huszar, Int'l Immunol. 5 (1993): 647-656, Choi et al., 1993, Nature Genetics 4:117-23, Fishwild et al., 1996, Nature Biotech. 14:845-51, Harding et al., 1995, Annals of the New York Academy of Sciences, Lonberg et al., 1994, Nature 368:856-59, Lonberg, 1994, Transgenic Approaches to Human Monoclonal Antibodies in Handbook of Experimental Pharmacology 113:49-101, Lonberg et al., 1995, International Reviews of Immunology 13:65-93, Neuberger, 1996, Nature Biotechnology 14:826, Taylor et al., 1992, Nucleic Acids Res. 20:6287-95, Taylor et al., 1994, Int'l Immunol. 6:579-91, Tomizuka et al., 1997, Nature Genetics 16:133-43, Tomizuka et al., 2000, Proc. Nat'l Acad. Sci. USA 97:722-27, Tuaillon et al., 1993, Proc. Nat'l Acad. Sci. USA 90:3720-24, and Tuaillon et al., 1994, J. Immunol. 152:2912-20.
[0446] ABPs of the invention can comprise any constant region known in the art. The light chain constant region can be, for example, a kappa- or lambda-type light chain constant region, e.g., a human kappa- or lambda-type light chain constant region. The heavy chain constant region can be, for example, an alpha-, delta-, epsilon-, gamma-, or mu-type heavy chain constant region, e.g., a human alpha-, delta-, epsilon-, gamma-, or mu-type heavy chain constant region. In one embodiment, the light or heavy chain constant region is a fragment, derivative, variant, or mutein of a naturally occurring constant region.
[0447] Techniques are known for deriving an antibody of a different subclass or isotype from an antibody of interest, i.e., subclass switching. Thus, IgG antibodies may be derived from an IgM antibody, for example, and vice versa. Such techniques allow the preparation of new antibodies that possess the antigen-binding properties of a given antibody (the parent antibody), but also exhibit biological properties associated with an antibody isotype or subclass different from that of the parent antibody. Recombinant DNA techniques may be employed. Cloned DNA encoding particular antibody polypeptides may be employed in such procedures, e.g., DNA encoding the constant domain of an antibody of the desired isotype. See also Lantto et al., 2002, Methods Mol. Biol. 178:303-16.
[0448] Single chain antibodies (scFv) may be formed by linking heavy and light chain variable domain (Fv region) fragments via an amino acid bridge (short peptide linker, e.g., a synthetic sequence of amino acid residues), resulting in a single polypeptide chain. Such single-chain Fvs (scFvs) have been prepared by fusing DNA encoding a peptide linker between DNAs encoding the two variable domain polypeptides (VL and VH). The resulting polypeptides can fold back on themselves to form antigen-binding monomers, or they can form multimers (e.g., dimers, trimers, or tetramers), depending on the length of a flexible linker between the two variable domains (Kortt et al., 1997, Prot. Eng. 10:423; Kortt et al., 2001, Biomol. Eng. 18:95-108; Bird et al., 1988, Science 242:423-26; and Huston et al., 1988, Proc. Natl. Acad. Sci. USA 85:5879-83). By combining different VL- and VH-comprising polypeptides, one can form multimeric scFvs that bind to different epitopes (Kriangkum et al., 2001, Biomol. Eng. 18:31-40). Techniques developed for the production of single chain antibodies include those described in U.S. Patent No. 4,946,778; Bird, 1988, Science 242:423; Huston et al., 1988, Proc. Natl. Acad. Sci. USA 85:5879; Ward et al., 1989, Nature 334:544; and de Graaf et al., 2002, Methods Mol Biol. 178:379-87.
[0449] An ABP according to the invention may have a binding affinity for antigen target of less than or equal to 5 x 10⁻⁷ M, less than or equal to 1 x 10⁻⁷ M, less than or equal to 0.5 x 10⁻⁷ M, less than or equal to 1 x 10⁻⁸ M, less than or equal to 1 x 10⁻⁹ M, less than or equal to 1 x 10⁻¹⁰ M, less than or equal to 1 x 10⁻¹¹ M, or less than or equal to 1 x 10⁻¹² M.
[0450] The affinity of an ABP, as well as the extent to which the ABP inhibits binding, can be determined by one of ordinary skill in the art using conventional techniques, for example those described by Scatchard et al. (Ann. N.Y. Acad. Sci. 51:660-672 (1949)) or by surface plasmon resonance (SPR; BIAcore, Biosensor, Piscataway, NJ). For surface plasmon resonance, target molecules are immobilized on a solid phase and exposed to ligands in a mobile phase running along a flow cell. If ligand binding to the immobilized target occurs, the local refractive index changes, leading to a change in SPR angle, which can be monitored in real time by detecting changes in the intensity of the reflected light. The rates of change of the SPR signal can be analyzed to yield apparent rate constants for the association and dissociation phases of the binding reaction. The ratio of these values gives the apparent equilibrium constant (affinity) (see, e.g., Wolff et al., Cancer Res. 53:2560-65 (1993)).
6.6. Nucleic acids
[0451] In one aspect, the present invention provides isolated nucleic acid molecules. The nucleic acids comprise, for example, polynucleotides that encode all or part of an RPP, for example, one or both chains of an antibody of the invention, or a fragment, derivative, mutein, or variant thereof, polynucleotides sufficient for use as hybridization probes, PCR primers or sequencing primers for identifying, analyzing, mutating or amplifying a polynucleotide encoding a polypeptide, anti-sense nucleic acids for inhibiting expression of a polynucleotide, and complementary sequences of the foregoing. The nucleic acids can be any length. They can be, for example, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 125, 150, 175, 200, 250, 300, 350, 400, 450, 500, 750, 1,000, 1,500, 3,000, 5,000 or more nucleotides in length, and/or can comprise one or more additional sequences, for example, regulatory sequences, and/or be part of a larger nucleic acid, for example, a vector. The nucleic acids can be single-stranded or double-stranded and can comprise RNA and/or DNA nucleotides, and artificial variants thereof (e.g., peptide nucleic acids).
[0452] Polynucleotides encoding antibody polypeptides (e.g., heavy or light chain, variable domain only, CDRs only, or full length) can be isolated from B cells, plasma cells, or plasmablasts of a subject that has been exposed to an antigen, e.g., by being infected by a virus or immunized with a vaccine. The nucleic acid can be isolated by conventional procedures such as polymerase chain reaction (PCR) or methods described herein (e.g., single cell OE-RT-PCR).
[0453] Polypeptide sequences of the CDR3s of the heavy and light chain variable regions are shown herein. The skilled artisan will appreciate that, due to the degeneracy of the genetic code, each of the polypeptide sequences disclosed herein is encoded by a large number of other nucleic acid sequences. The present invention provides each degenerate nucleotide sequence encoding each RPP of the invention.
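To illustrate how quickly degeneracy multiplies the number of encoding sequences, the following minimal sketch counts the distinct coding sequences for a short peptide under the standard genetic code. The peptide `ARDYW` and the function name are hypothetical and are not taken from the disclosure.

```python
from math import prod

# Number of sense codons per amino acid in the standard genetic code.
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_coding_sequences(peptide: str) -> int:
    """Return how many distinct nucleotide sequences encode `peptide`.

    Stop codons are ignored; every residue must be one of the 20
    standard amino acids (a KeyError is raised otherwise).
    """
    return prod(CODONS_PER_RESIDUE[aa] for aa in peptide.upper())


# Hypothetical five-residue peptide: 4 * 6 * 2 * 2 * 1 codon choices.
print(count_coding_sequences("ARDYW"))  # prints 96
```

Because the count is a product over residues, it grows exponentially with peptide length, which is why the degenerate sequences are described as a class and not listed individually.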
[0454] Methods for hybridizing nucleic acids are well-known in the art. See, e.g., Curr. Prot. in Mol. Biol., John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. As defined herein, a moderately stringent hybridization condition uses a prewashing solution containing 5X sodium chloride/sodium citrate (SSC), 0.5% SDS, 1.0 mM EDTA (pH 8.0), hybridization buffer of about 50% formamide, 6X SSC, and a hybridization temperature of 55° C (or other similar hybridization solutions, such as one containing about 50% formamide, with a hybridization temperature of 42° C), and washing conditions of 60° C, in 0.5X SSC, 0.1% SDS. A stringent hybridization condition hybridizes in 6X SSC at 45° C, followed by one or more washes in 0.1X SSC, 0.2% SDS at 68° C. Furthermore, one of skill in the art can manipulate the hybridization and/or washing conditions to increase or decrease the stringency of hybridization such that nucleic acids comprising nucleotide sequences that are at least 65, 70, 75, 80, 85, 90, 95, 98, or 99% identical to each other typically remain hybridized to each other. The basic parameters affecting the choice of hybridization conditions and guidance for devising suitable conditions are set forth by, for example, Sambrook, Fritsch, and Maniatis (1989, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., chapters 9 and 11; and Curr. Prot. in Mol. Biol. 1995, Ausubel et al., eds., John Wiley & Sons, Inc., sections 2.10 and 6.3-6.4), and can be readily determined by those having ordinary skill in the art based on, for example, the length and/or base composition of the DNA.
[0455] Changes can be introduced by mutation into a nucleic acid, thereby leading to changes in the amino acid sequence of a polypeptide (e.g., an RPP) that it encodes. Mutations can be introduced using any technique known in the art. In one embodiment, one or more particular amino acid residues are changed using, for example, a site-directed mutagenesis protocol. In another embodiment, one or more randomly selected residues are changed using, for example, a random mutagenesis protocol. However it is made, a mutant polypeptide can be expressed and screened for a desired property (e.g., binding to a virus).
[0456] In another aspect, the present invention provides nucleic acid molecules that are suitable for use as primers or hybridization probes for the detection of nucleic acid sequences of the invention. A nucleic acid molecule of the invention can comprise only a portion of a nucleic acid sequence encoding a full-length polypeptide of the invention, for example, a fragment that can be used as a probe or primer or a fragment encoding an active portion (e.g., a virus binding portion) of a polypeptide of the invention.
[0457] Probes based on the sequence of a nucleic acid of the invention can be used to detect the nucleic acid or similar nucleic acids, for example, transcripts encoding a polypeptide of the invention. The probe can comprise a label group, e.g., a radioisotope, a fluorescent compound, an enzyme, or an enzyme co-factor. Such probes can be used to identify a cell that expresses the polypeptide.
[0458] In another aspect, the present invention provides libraries of nu...
Claims
WHAT IS CLAIMED IS:
1. A method of selecting a filtered antigen binding protein (ABP) library dataset from a set of candidate ABP library datasets, comprising:
(1) obtaining an input ABP library dataset including an ABP profile for each of a plurality of ABPs;
(2) generating a set of candidate ABP library datasets each corresponding to a respective subset of ABPs from the plurality of ABPs;
(3) for each ABP in a respective candidate ABP library dataset, obtaining values for a plurality of characteristic descriptors for the respective ABP, wherein the plurality of characteristic descriptors is selected from:
(i) a binding affinity of the respective ABP for the respective target antigen;
(ii) an effector activity of the respective ABP against the target molecule or complex;
(iii) a solubility score of the respective ABP;
(iv) an aggregation score of the respective ABP;
(v) a hydrophobicity score of the respective ABP;
(vi) an isoelectric point of the respective ABP;
(vii) a stability score of the respective ABP;
(viii) a molecular weight of the respective ABP;
(ix) a number of unpaired cysteine residues in the respective ABP;
(x) an abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen;
(xi) a fold-change of the increase in the abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen, as compared to the abundance frequency prior to enrichment;
(xii) a number of non-canonical glycosylation sites in the respective ABP;
(xiii) a number of cleavage sites in the respective ABP;
(xiv) a number of deamidation sites in the respective ABP;
(xv) a number of isomerization sites in the respective ABP;
(xvi) a number of oxidation sites in the respective ABP;
(xvii) CDR3H length of the respective ABP;
(xviii) binding specificity of the respective ABP;
(xix) immunogenicity of the respective ABP;
(xx) polyspecificity of the respective ABP; and
(xxi) a respective epitope that the respective ABP binds to,
wherein a value for at least one characteristic descriptor for the respective ABP is generated by applying a machine-learning model;
(4) for each candidate ABP library dataset, obtaining values associated with one or more preferred library properties of the candidate ABP library dataset, wherein the one or more preferred library properties are selected from:
(i) the set of heavy chain CDR3 sequences contained in the subset of the plurality of ABPs comprises at least about 10, 20, 50, 100, 200, or 1000 unique sequences;
(ii) the subset of the plurality of ABPs specifically bind to at least two unique epitopes associated with the target molecule or complex;
(iii) the subset of the plurality of ABPs is capable of modulating at least two target antigen variants;
(iv) the set of heavy chain V genes represented in the subset of the plurality of ABPs comprises at least two unique V genes;
(v) the set of light chain V genes represented in the subset of the plurality of ABPs comprises at least two unique V genes;
(vi) the set of heavy chain J genes represented in the subset of the plurality of ABPs comprises at least two unique J genes;
(vii) the set of light chain J genes represented in the subset of the plurality of ABPs comprises at least two unique J genes;
(viii) the average percent germline identity of heavy chain V genes represented in the subset of the plurality of ABPs is between about 50% and about 100%;
(ix) the average percent germline identity of light chain V genes represented in the subset of the plurality of ABPs is between about 50% and about 100%;
(x) the average percent germline identity of heavy chain J genes represented in the subset of the plurality of ABPs is between about 50% and about 100%; and
(xi) the average percent germline identity of light chain J genes represented in the subset of the plurality of ABPs is between about 50% and about 100%; and
(5) for each candidate ABP library dataset from the set of candidate ABP library datasets, generating a score by computing a weighted combination of the values of the plurality of characteristic descriptors of each of the respective subset of ABPs for the candidate ABP library dataset, and/or the values associated with the one or more preferred library properties of the candidate ABP library dataset; and
(6) selecting a candidate ABP library dataset as the filtered ABP library dataset based on the generated scores for the set of candidate ABP library datasets, wherein the selected candidate ABP library dataset is associated with a respective score that is equal to or above a threshold.
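Steps (5) and (6) of claim 1 can be sketched as follows. This is only an illustration under stated assumptions: per-ABP descriptor values are averaged across the subset before weighting, the combination is linear, and the highest-scoring library at or above the threshold is returned. The descriptor names, weights, threshold, and data are all hypothetical; the claim itself does not fix any of these choices.

```python
from statistics import mean


def library_score(abps, library_props, descriptor_weights, property_weights):
    """Weighted combination of per-ABP descriptors and library-level properties.

    Assumption: each descriptor is summarized as its mean over the ABPs
    in the candidate library before the weight is applied.
    """
    score = 0.0
    for name, weight in descriptor_weights.items():
        score += weight * mean(abp[name] for abp in abps)
    for name, weight in property_weights.items():
        score += weight * library_props[name]
    return score


def select_library(candidates, descriptor_weights, property_weights, threshold):
    """Return (name, score) of the best candidate scoring >= threshold, else None."""
    passing = {}
    for name, cand in candidates.items():
        s = library_score(cand["abps"], cand["props"],
                          descriptor_weights, property_weights)
        if s >= threshold:
            passing[name] = s
    if not passing:
        return None
    best = max(passing, key=passing.get)
    return best, passing[best]


# Hypothetical candidate libraries with two descriptors and one library property.
candidates = {
    "lib_A": {"abps": [{"solubility": 0.8, "aggregation": 0.2},
                       {"solubility": 0.6, "aggregation": 0.4}],
              "props": {"unique_epitopes": 3}},
    "lib_B": {"abps": [{"solubility": 0.9, "aggregation": 0.7}],
              "props": {"unique_epitopes": 1}},
}
descriptor_weights = {"solubility": 1.0, "aggregation": -1.0}  # illustrative values
property_weights = {"unique_epitopes": 0.5}                    # illustrative value

print(select_library(candidates, descriptor_weights, property_weights, threshold=1.0))
```

With these illustrative numbers only `lib_A` clears the threshold. Negative weights are one simple way to make the score fall as an undesirable descriptor, such as aggregation, rises.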
2. The method of claim 1, wherein the plurality of characteristic descriptors in (3) are selected from: a binding affinity of the respective candidate ABP to the respective target antigen; an effector activity of the respective candidate ABP against the target molecule or complex; a binding pattern of the respective candidate ABP to the respective target antigen and its variants; an abundance frequency of the respective candidate ABP following sorting the input ABP library or a subset thereof to enrich for binding to the respective target antigen; and a fold-change of the increase in the abundance frequency of the respective candidate ABP following sorting the input ABP library or a subset thereof to enrich for binding to the respective target antigen, as compared to the abundance frequency prior to enrichment.
3. The method of claim 1 or 2, wherein the plurality of characteristic descriptors in (3) comprises the binding affinity of the respective ABP for a respective target antigen, optionally wherein the binding affinity is determined by a PolyMap assay.
4. The method of any one of claims 1-3, wherein the characteristic descriptor comprises an output of a PolyMap assay, optionally wherein the output of the PolyMap assay indicates the binding affinity or the binding specificity; further optionally wherein the PolyMap assay comprises the steps of: providing a library of target-decorated cells, wherein each of the target-decorated cells presents the target antigen on the membrane; contacting the library of target-decorated cells with a plurality of ABP-ribosome-mRNA (ARM) complexes corresponding to the one or more of the plurality of ABPs, thereby inducing binding between the target-decorated cells and the ARM complexes; generating a plurality of monodisperse or polydisperse emulsion microdroplets, wherein each microdroplet contains a single cell out of the target-decorated cells, one or more ARM complexes bound to the single cell, and a lysis reagent inducing lysis of the single cell; capturing RNA released from the single cell on a solid surface or within a semi-permeable shell; generating a library of hybrid polynucleic acids that comprise a sequence from a transcript of the single cell and/or a sequence from the mRNA of the ARM complex; sequencing the library of hybrid polynucleic acids; and determining a presence or absence of binding of each of the one or more of the plurality of ABPs to their respective target antigen.
5. The method of any one of claims 1-4, wherein the plurality of characteristic descriptors in (3) comprises the effector activity of the respective ABP against the target molecule or target molecule complex.
6. The method of any one of claims 1-5, wherein the target molecule or target molecule complex comprises a virus, and the effector activity is a neutralization activity determined by a pseudovirus neutralization assay or a live virus neutralization assay; or wherein the target molecule or target molecule complex comprises a bacterium, and the effector activity is a bactericidal activity determined by a serum bactericidal assay (SBA) or an opsonophagocytic killing assay (OPKA).
7. The method of any one of claims 1-6, wherein the plurality of characteristic descriptors in (3) comprises the solubility score, optionally wherein the solubility score is determined using SKADE.
8. The method of any one of claims 1-7, wherein the plurality of characteristic descriptors in (3) comprises the aggregation score of the respective ABP, optionally wherein the aggregation score corresponds to the number of residues predicted to have a propensity to aggregate and is determined by a method comprising the steps of: determining a 3D structure of the ABP, optionally wherein the 3D structure is determined by an antibody-specific structure prediction model configured to generate a 3D representation of the ABP; and determining the aggregation score based on the 3D structure, optionally wherein the aggregation score is determined from the three-dimensional structural representation by computing a structure-based aggregation propensity metric.
9. The method of any one of claims 1-8, wherein the plurality of characteristic descriptors in (3) comprises the hydrophobicity score of the respective ABP, optionally wherein the hydrophobicity score is determined as the grand average of hydropathy (GRAVY), optionally wherein the hydropathy value of each amino acid is calculated using the Eisenberg scale.
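GRAVY is the arithmetic mean of the per-residue hydropathy values of a sequence. The sketch below takes the hydropathy scale as a parameter so that any scale, including the Eisenberg scale named in claim 9, can be supplied by the user. As an explicit stand-in, the default shown here is the Kyte-Doolittle scale and not the Eisenberg scale; the function name and example sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here ONLY as an illustrative default.
# Claim 9 names the Eisenberg scale; pass that scale as `scale` to follow the claim.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str, scale=KYTE_DOOLITTLE) -> float:
    """Grand average of hydropathy: mean per-residue hydropathy value."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(scale[aa] for aa in seq) / len(seq)


# Hypothetical fragments: a hydrophobic stretch scores higher than a charged one.
print(gravy("AVLIV") > gravy("KDERK"))  # prints True
```

Keeping the scale as an argument avoids hard-coding one convention, since different tools compute GRAVY with different hydropathy scales.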
10. The method of any one of claims 1-9, wherein the plurality of characteristic descriptors in (3) comprises the isoelectric point of the respective ABP, optionally wherein the isoelectric point is determined using EMBOSS pK values.
11. The method of any one of claims 1-10, wherein the plurality of characteristic descriptors in (3) comprises the stability score of the respective ABP, optionally wherein the stability score is determined by a method comprising the steps of: calculating an aliphatic index by determining the relative volume of A, V, L, and I residues, wherein the stability score corresponds to the aliphatic index.
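One common definition of the aliphatic index is that of Ikai (1980), which weights the mole percent of Ala, Val, Ile, and Leu by their relative side-chain volumes. The sketch below assumes that definition; claim 11 does not state the coefficients, so the values 2.9 and 3.9 are an assumption drawn from that formula, and the function name is hypothetical.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index per Ikai (1980), assumed here as the intended formula:

        AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu))

    where X(.) is the mole percent of each residue in the sequence.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Hypothetical sequences: no aliphatic residues gives 0, all-alanine gives 100.
print(aliphatic_index("GGSGGS"), aliphatic_index("AAAA"))  # prints 0.0 100.0
```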
12. The method of any one of claims 1-11, wherein the plurality of characteristic descriptors in (3) comprises an abundance frequency or fold-change of the increase in the abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen, optionally wherein the sorting process is fluorescence-activated cell sorting (FACS) or magnetic-activated cell sorting (MACS), further optionally wherein the sorting is carried out by yeast display.
13. The method of any one of claims 1-12, wherein the plurality of characteristic descriptors in (3) comprises the number of cleavage sites, optionally wherein the cleavage site is a DP motif in the variable heavy or variable light chain region of the respective ABP.
14. The method of any one of claims 1-13, wherein the plurality of characteristic descriptors in (3) comprises the number of deamidation sites, optionally wherein the deamidation site is an NG, NS, or NA motif in CDR2H or CDR1L of the respective ABP.
15. The method of any one of claims 1-14, wherein the plurality of characteristic descriptors in (3) comprises the number of isomerization sites, optionally wherein the isomerization site is a DG or DS motif in CDR2H, CDR3H, or CDR1L of the respective ABP.
16. The method of any one of claims 1-15, wherein the plurality of characteristic descriptors in (3) comprises the number of oxidation sites, optionally wherein the oxidation site is a W or M residue in the CDRHs or CDRLs of the respective ABP.
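The sequence-liability counts of claims 13-16 amount to motif searches restricted to particular regions. The sketch below assumes that region boundaries (VH, VL, and the individual CDRs) have already been annotated by some upstream numbering tool and are supplied as a dictionary; the region keys, function name, and example sequences are hypothetical.

```python
# Motifs and the regions in which they are counted, following claims 13-16.
LIABILITY_RULES = {
    "cleavage": (("DP",), ("VH", "VL")),
    "deamidation": (("NG", "NS", "NA"), ("CDR2H", "CDR1L")),
    "isomerization": (("DG", "DS"), ("CDR2H", "CDR3H", "CDR1L")),
    "oxidation": (("W", "M"),
                  ("CDR1H", "CDR2H", "CDR3H", "CDR1L", "CDR2L", "CDR3L")),
}


def count_liabilities(regions: dict) -> dict:
    """Count liability motifs per category in the annotated regions.

    `regions` maps a region name to its amino acid sequence. Missing
    regions contribute zero. Motifs spanning a region boundary are not
    counted in this simplified sketch.
    """
    counts = {}
    for category, (motifs, where) in LIABILITY_RULES.items():
        counts[category] = sum(
            regions.get(region, "").upper().count(motif)
            for region in where
            for motif in motifs
        )
    return counts


# Hypothetical, shortened region sequences for illustration only.
example_regions = {
    "VH": "QVQLDPGS",
    "VL": "DIQMT",
    "CDR2H": "INGSDG",
    "CDR3H": "ARDSWM",
    "CDR1L": "QNAS",
}
print(count_liabilities(example_regions))
# prints {'cleavage': 1, 'deamidation': 2, 'isomerization': 2, 'oxidation': 2}
```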
17. The method of any one of claims 1-16, wherein the plurality of characteristic descriptors in (3) comprises the binding specificity of the respective ABP, optionally wherein the binding specificity corresponds to a number of variants of the target antigen capable of being targeted by the respective ABP, further optionally wherein the binding specificity is determined by a PolyMap assay.
18. The method of any one of claims 1-17, wherein the value for the at least one characteristic descriptor for the respective ABP is generated by: obtaining a token sequence for the respective ABP encoding a partial or full sequence of the respective ABP; generating an output embedding sequence by applying a trained model to the token sequence for the respective ABP; and predicting the value for the at least one characteristic descriptor by applying the trained model to the token sequence for the respective ABP.
19. The method of any one of claims 1-18, wherein a value for at least one characteristic descriptor for the respective ABP is generated by: obtaining a token sequence for the respective ABP, wherein the token sequence for the respective ABP comprises a set of tokens, each token numerically encoding a respective residue of the full sequence or the partial sequence of the ABP; applying a machine-learned protein language model (PLM) to the set of tokens to generate a set of output embeddings for the ABP; obtaining a PLM embedding for the ABP by combining the set of output embeddings, wherein the PLM embedding represents the ABP in a latent space and is associated with a first hidden dimensionality; applying a machine-learned structure prediction model to the set of tokens for the ABP to generate a structural representation of the ABP; obtaining a structure embedding for the ABP by at least applying a structure embedding model to the structural representation of the ABP, wherein the structure embedding represents the structural representation in another latent space and is associated with a second hidden dimensionality; concatenating the PLM embedding and the structure embedding for the ABP together to generate a concatenated embedding; scaling elements of the concatenated embedding to generate a scaled embedding; and applying a machine-learned model head layer to the scaled embedding to generate a prediction for the characteristic descriptor.
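The data flow of claim 19 (pool, concatenate, scale, apply a head) can be sketched without any trained model. In the sketch below the protein language model and the structure embedding model are not implemented; their outputs are passed in as plain lists of numbers. Mean pooling, per-vector standardization, and a single linear head are assumptions chosen for brevity, and all names and values are hypothetical.

```python
import math


def mean_pool(token_embeddings):
    """Combine per-token output embeddings into one vector by averaging."""
    n = len(token_embeddings)
    return [sum(column) / n for column in zip(*token_embeddings)]


def standardize(vector):
    """Scale elements to zero mean and unit variance (one possible scaling)."""
    mu = sum(vector) / len(vector)
    var = sum((x - mu) ** 2 for x in vector) / len(vector)
    sd = math.sqrt(var) or 1.0  # avoid division by zero for constant vectors
    return [(x - mu) / sd for x in vector]


def linear_head(vector, weights, bias):
    """A single linear layer standing in for the model head."""
    return sum(w * x for w, x in zip(weights, vector)) + bias


def predict_descriptor(plm_token_embeddings, structure_embedding, weights, bias):
    plm_embedding = mean_pool(plm_token_embeddings)    # first hidden dimensionality
    fused = plm_embedding + list(structure_embedding)  # concatenation of the two
    scaled = standardize(fused)
    return linear_head(scaled, weights, bias)


# Hypothetical inputs: two tokens with 2-dim PLM embeddings, 2-dim structure embedding.
plm_tokens = [[1.0, 3.0], [3.0, 5.0]]
structure = [0.0, 2.0]
print(predict_descriptor(plm_tokens, structure, weights=[0.0, 1.0, 0.0, 0.0], bias=0.0))
```

In practice the scaling step would more likely use statistics fitted on a training set, and the head weights would be learned; both are fixed by hand here so the example stays self-contained.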
20. The method of any one of claims 1-19, wherein the value for the at least one characteristic descriptor for the respective ABP is a respective epitope of the target antigen or its variants that the respective ABP is predicted to bind to, and wherein for each candidate ABP library dataset, the one or more preferred library properties of the candidate ABP library dataset in (4) comprises (ii) the subset of the plurality of ABPs specifically bind to at least two unique epitopes associated with the target molecule or complex.
21. The method of any one of claims 1-20, wherein the plurality of characteristic descriptors in (3) comprises the hydrophobicity score for the respective ABP, optionally wherein the hydrophobicity score is determined by a method comprising the steps of: obtaining a set of tokens encoding a partial or full sequence of the respective ABP; applying a machine-learned structure prediction model to the set of tokens for the respective ABP to generate a structural representation of the respective ABP; for each residue in the partial or full sequence of the respective ABP, computing a solvent-accessible surface area (SASA) for the residue; for each residue in the partial or full sequence of the respective ABP, computing a product between the SASA for the residue and a hydrophobicity index for the residue to obtain a hydrophobicity score for the residue; and combining the hydrophobicity scores for each of the residues of the partial or full sequence of the respective ABP to generate the hydrophobicity score for the respective ABP.
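The last two steps of claim 21 reduce to a per-residue product followed by a combination. The sketch below assumes the per-residue SASA values have already been computed from a predicted structure by an external tool, and it assumes that "combining" means summation. The three-residue sequence, SASA values, and hydrophobicity index are hypothetical.

```python
def sasa_weighted_hydrophobicity(sequence, sasa_per_residue, hydrophobicity_index):
    """Sum over residues of SASA_i * hydrophobicity_index[residue_i].

    Assumptions: SASA values are supplied (for example in square angstroms)
    and the per-residue scores are combined by summation.
    """
    if len(sequence) != len(sasa_per_residue):
        raise ValueError("need exactly one SASA value per residue")
    per_residue = [
        sasa * hydrophobicity_index[residue]
        for residue, sasa in zip(sequence.upper(), sasa_per_residue)
    ]
    return sum(per_residue)


# Hypothetical index and SASA values; a buried residue (SASA 0) contributes nothing.
index = {"L": 3.8, "K": -3.9, "A": 1.8}
print(sasa_weighted_hydrophobicity("LKA", [10.0, 50.0, 0.0], index))
```

Weighting by SASA means that only surface-exposed residues influence the score, which is the apparent intent of computing it from a structure and not from sequence alone.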
22. The method of any one of claims 1-21, wherein the plurality of characteristic descriptors in (3) comprises the polyreactivity for the respective ABP, optionally wherein the polyreactivity is determined by a method comprising the steps of: obtaining a set of tokens encoding a partial or full sequence of the ABP; applying a machine-learned protein language model (PLM) to the set of tokens to generate a set of output embeddings for the ABP; obtaining a PLM embedding for the ABP by combining the set of output embeddings, wherein the PLM embedding represents the ABP in a latent space; and applying parameters of a machine-learned classifier model to the PLM embedding for the ABP to generate a prediction for the polyreactivity of the ABP.
23. The method of any one of claims 1-22, wherein the token sequence of each sample includes a set of tokens, each token numerically encoding a respective residue of the full sequence or the partial sequence of the ABP.
24. The method of any one of claims 1-23, wherein the machine-learning model is configured as a neural network model.
25. The method of any one of claims 1-24, wherein:
(1) the score for each candidate ABP library dataset decreases when the stability score associated with (3)(vii) of each ABP in the respective subset of ABPs increases,
(2) the score for each candidate ABP library dataset increases when a diversity associated with (4)(ii) increases,
(3) the score for each candidate ABP library dataset increases when an immunogenicity of the respective subset of ABPs decreases,
(4) the score for each candidate ABP library dataset increases when a diversity associated with (4)(iv) or (4)(v) increases.
26. The method of any one of claims 1-25, wherein the score for each candidate ABP library dataset is determined by a method comprising the steps of: obtaining a set of input features for the candidate ABP library dataset based on the values of the plurality of characteristic descriptors of each of the respective subset of ABPs for the candidate ABP library dataset, and optionally, the values of the one or more preferred library properties of the candidate ABP library dataset; and applying a scoring model to the set of input features of the candidate ABP library dataset to generate the score, wherein the scoring model is associated with a set of trained weights.
27. The method of any one of claims 1-26, wherein the set of weights of the scoring model is trained by a method comprising the steps of: obtaining a set of training ABP library datasets; for each training ABP library dataset: obtaining a respective set of input features for the training ABP library dataset from values of the plurality of characteristic descriptors of each of the respective subset of ABPs for the training ABP library dataset, and/or values associated with the one or more preferred library properties of the training ABP library dataset, and obtaining one or more measured performance metrics for the training ABP library dataset, wherein the one or more measured performance metrics is selected from (i) titer, (ii) percent monomer, (iii) number of neutralized targets, (iv) neutralization activity, and (v) binding activity, optionally wherein the binding activity is determined by PolyMap; and training the set of weights for the scoring model by reducing one or more loss functions indicating a difference between estimated outputs generated by the scoring model and the one or more measured performance metrics.
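The training step of claim 27 can be illustrated with the simplest possible scoring model. The sketch below assumes a linear model, a mean squared error loss, and plain batch gradient descent; none of these is specified by the claim. The feature vectors and the single "measured performance metric" are synthetic values invented for the example.

```python
def predict(weights, bias, features):
    """Linear scoring model: weighted sum of input features plus bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def mse(weights, bias, feature_rows, targets):
    """Mean squared error between model outputs and measured metrics."""
    errors = [predict(weights, bias, x) - y for x, y in zip(feature_rows, targets)]
    return sum(e * e for e in errors) / len(errors)


def train_scoring_weights(feature_rows, targets, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the MSE loss."""
    n, d = len(feature_rows), len(feature_rows[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(feature_rows, targets):
            err = predict(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += 2.0 * err * xj / n
            grad_b += 2.0 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Synthetic training libraries: two input features per library, and a
# made-up performance metric generated as 2*x1 - 1*x2 + 0.5.
feature_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [0.5, 2.5, -0.5, 1.5]

weights, bias = train_scoring_weights(feature_rows, targets)
print([round(w, 3) for w in weights], round(bias, 3))
```

On this synthetic data the fitted weights approach the generating coefficients, which is a quick sanity check that the loss is in fact being reduced.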
28. A recombinant polyclonal protein (RPP) comprising at least a set of ABPs specific for a target molecule or complex, wherein the RPP is formed by any of the methods of claims 1-27.
29. A method of predicting a characteristic descriptor for an antigen binding protein (ABP), comprising: (1) obtaining a set of tokens encoding a partial or full sequence of the ABP; (2) applying a transformer model to the set of tokens to generate a set of output embeddings for the ABP; (3) determining a representation for the ABP from at least the set of output embeddings for the ABP; and (4) generating a prediction for the characteristic descriptor by applying one or more machine-learning models to the representation for the ABP.
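By way of non-limiting illustration only, steps (1) through (4) above might be sketched as follows. The sketch is a toy: a single untrained self-attention layer with a residual connection stands in for the transformer model (no positional encoding, feed-forward block, or layer normalization), mean pooling is assumed for the representation, and a linear head is assumed for the prediction. All weights are random, and every function name and dimension is hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_ID = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
DIM = 8  # assumed embedding width


def tokenize(sequence):
    """Step (1): map each residue letter to an integer token."""
    return [TOKEN_ID[aa] for aa in sequence]


def rand_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Step (2): scaled dot-product self-attention plus a residual connection."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    k_t = [list(col) for col in zip(*k)]
    attn = [softmax([s / math.sqrt(DIM) for s in row]) for row in matmul(q, k_t)]
    context = matmul(attn, v)
    return [[xi + ci for xi, ci in zip(xr, cr)] for xr, cr in zip(x, context)]


def mean_pool(embeddings):
    """Step (3): average the per-token output embeddings into one vector."""
    return [sum(col) / len(embeddings) for col in zip(*embeddings)]


def init_params(seed=0):
    """Random, untrained parameters; a real model would be trained."""
    rng = random.Random(seed)
    return {
        "embed": rand_matrix(rng, len(AMINO_ACIDS), DIM),
        "wq": rand_matrix(rng, DIM, DIM),
        "wk": rand_matrix(rng, DIM, DIM),
        "wv": rand_matrix(rng, DIM, DIM),
        "head_w": [rng.gauss(0.0, 0.1) for _ in range(DIM)],
        "head_b": 0.0,
    }


def predict_descriptor(sequence, params):
    """Steps (1)-(4): tokens -> embeddings -> representation -> prediction."""
    x = [params["embed"][t] for t in tokenize(sequence)]
    out = self_attention(x, params["wq"], params["wk"], params["wv"])
    rep = mean_pool(out)
    return sum(w * r for w, r in zip(params["head_w"], rep)) + params["head_b"]


print(predict_descriptor("CARDYW", init_params()))
```

Because the weights are untrained, the printed number carries no meaning; the sketch shows only how data flows from tokens to a descriptor prediction.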
30. A method of training a machine-learning model for selecting a filtered ABP library dataset, comprising:
(1) obtaining an input antigen binding protein (ABP) library dataset including an ABP profile for each of a plurality of ABPs;
(2) obtaining a training dataset including a plurality of sample ABP library datasets each corresponding to a respective subset of ABPs, wherein each sample ABP library dataset includes token sequences encoding a partial or full sequence of each ABP in the respective subset of ABPs for the sample;
(3) for each ABP in a sample candidate ABP library dataset, obtaining values for a plurality of characteristic descriptors for the respective ABP, wherein the plurality of characteristic descriptors is selected from:
(i) a binding affinity of the respective ABP for the respective target antigen;
(ii) an effector activity of the respective ABP against the respective target antigen;
(iii) a solubility score of the respective ABP;
(iv) an aggregation score of the respective ABP;
(v) a hydrophobicity score of the respective ABP;
(vi) an isoelectric point of the respective ABP;
(vii) a stability score of the respective ABP;
(viii) a molecular weight of the respective ABP;
(ix) a number of unpaired cysteine residues in the respective ABP;
(x) an abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen;
(xi) a fold-change of the increase in the abundance frequency of the respective ABP following a sorting process to enrich for binding to the respective target antigen, as compared to the abundance frequency prior to enrichment;
(xii) a number of non-canonical glycosylation sites in the respective ABP;
(xiii) a number of cleavage sites in the respective ABP;
(xiv) a number of deamidation sites in the respective ABP;
(xv) a number of isomerization sites in the respective ABP;
(xvi) a number of oxidation sites in the respective ABP;
(xvii) CDR3H length of the respective ABP;
(xviii) binding specificity of the respective ABP;
(xix) immunogenicity of the respective ABP;
(xx) polyspecificity of the respective ABP; and
(xxi) a respective epitope that the respective ABP binds to;
(4) for each ABP in a sample candidate ABP library dataset, obtaining values associated with one or more preferred library properties of the sample candidate ABP library dataset, wherein the one or more preferred library properties is selected from:
(i) the set of heavy chain CDR3 sequences contained in the subset of the plurality of ABPs comprises at least about 10, 20, 50, 100, 200, or 1000 unique sequences;
(ii) the subset of the plurality of ABPs specifically bind to at least two unique epitopes associated with the target molecule or complex;
(iii) the subset of the plurality of ABPs is capable of modulating at least two target antigen variants;
(iv) the set of heavy chain V genes represented in the subset of the plurality of ABPs comprises at least two unique V genes;
(v) the set of light chain V genes represented in the subset of the plurality of ABPs comprises at least two unique V genes;
(vi) the set of heavy chain J genes represented in the subset of the plurality of ABPs comprises at least two unique J genes;
(vii) the set of light chain J genes represented in the subset of the plurality of ABPs comprises at least two unique J genes;
(viii) the average percent germline identity of heavy chain V genes represented in the subset of the plurality of ABPs is between about 50% and about 100%;
(ix) the average percent germline identity of light chain V genes represented in the subset of the plurality of ABPs is between about 50% and about 100%;
(x) the average percent germline identity of heavy chain J genes represented in the subset of the plurality of ABPs is between about 50% and about 100%; and
(xi) the average percent germline identity of light chain J genes represented in the subset of the plurality of ABPs is between about 50% and about 100%; and
(5) for each sample ABP library dataset, computing a score based on the values of the plurality of characteristic descriptors of each of the respective subset of ABPs for the sample ABP library dataset, and/or the values associated with the one or more preferred library properties of the sample ABP library dataset;
(6) dividing the training dataset into one or more batches of samples for one or more iterations; and
(7) for each of one or more iterations:
(a) for each sample ABP library dataset in the batch, applying parameters of a machine-learning model to the token sequences for the sample ABP library dataset to generate an estimated output,
(b) computing a loss function indicating differences between the scores and the estimated outputs for the batch of sample ABP library datasets, and
(c) updating the weights of the machine-learning model by backpropagating error terms obtained from the loss function.
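By way of non-limiting illustration only, steps (6) and (7) above might be sketched as follows. The sketch assumes that each sample library's token sequences are summarized as amino-acid frequencies, that the model is a small one-hidden-layer network, and that the loss is mean squared error minimized by mini-batch gradient descent. The sequences and the step (5) scores are made-up placeholders, and every function name is hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def token_frequencies(library):
    """Summarize a sample library's token sequences as amino-acid frequencies."""
    counts = [0.0] * len(AMINO_ACIDS)
    total = 0
    for seq in library:
        for aa in seq:
            counts[AMINO_ACIDS.index(aa)] += 1.0
            total += 1
    return [c / total for c in counts]


def init_model(hidden=4, seed=0):
    rng = random.Random(seed)
    n = len(AMINO_ACIDS)
    return {
        "w1": [[rng.gauss(0.0, 0.5) for _ in range(n)] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.gauss(0.0, 0.5) for _ in range(hidden)],
        "b2": 0.0,
    }


def forward(model, x):
    """Step (7)(a): hidden activations and estimated output for one sample."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(model["w1"], model["b1"])]
    return h, sum(w * hi for w, hi in zip(model["w2"], h)) + model["b2"]


def batch_loss(model, batch):
    """Step (7)(b): mean squared difference between outputs and scores."""
    return sum((forward(model, x)[1] - s) ** 2 for x, s in batch) / len(batch)


def train_step(model, batch, lr):
    """Step (7)(c): backpropagate the MSE error terms and update the weights."""
    hidden, n = len(model["w2"]), len(AMINO_ACIDS)
    g_w1 = [[0.0] * n for _ in range(hidden)]
    g_b1, g_w2, g_b2 = [0.0] * hidden, [0.0] * hidden, 0.0
    for x, s in batch:
        h, y = forward(model, x)
        dy = 2.0 * (y - s) / len(batch)  # dLoss/dOutput
        g_b2 += dy
        for j in range(hidden):
            g_w2[j] += dy * h[j]
            dz = dy * model["w2"][j] * (1.0 - h[j] ** 2)  # through tanh
            g_b1[j] += dz
            for i in range(n):
                g_w1[j][i] += dz * x[i]
    for j in range(hidden):
        model["w2"][j] -= lr * g_w2[j]
        model["b1"][j] -= lr * g_b1[j]
        for i in range(n):
            model["w1"][j][i] -= lr * g_w1[j][i]
    model["b2"] -= lr * g_b2


def train(model, samples, batch_size=2, epochs=200, lr=0.1):
    """Step (6): split the samples into batches, then iterate over them."""
    data = [(token_frequencies(lib), score) for lib, score in samples]
    batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    for _ in range(epochs):
        for batch in batches:
            train_step(model, batch, lr)
    return model


# Made-up sample libraries (short CDR3-like strings) with placeholder scores.
samples = [
    (["CARDYW", "CAKGGW"], 0.2),
    (["CARDLW", "CTRGYW"], 0.8),
    (["CASSYW", "CARHYW"], 0.5),
    (["CVRDFW", "CARNYW"], 0.9),
]
model = train(init_model(), samples)
```

A sequence model such as a transformer could replace the frequency summary and the small network without changing the structure of the loop.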