Systems and methods for predicting T-cell receptor engagement
The multi-matrix evaluation approach addresses the limitations of existing peptide interaction algorithms by considering both physical and chemical properties, improving prediction accuracy and resource efficiency in T-cell receptor interactions.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- TEVOGEN BIO INC
- Filing Date
- 2025-12-17
- Publication Date
- 2026-06-25
AI Technical Summary
Existing predictive algorithms for peptide interactions fail to consider the complexity of chemical and physical properties, leading to high failure rates and inefficient resource allocation in peptide development.
A multi-matrix evaluation approach that incorporates physical and chemical properties of peptides to predict T-cell receptor interactions, using machine learning models trained on comprehensive datasets to identify anchor amino acids and binding sites.
Improves the accuracy of peptide interaction predictions, reducing false positives and conserving resources by focusing on likely binding targets, thereby enhancing the efficiency of peptide development.
Abstract
Description
Attorney Docket No.: 767095.000075
SYSTEMS AND METHODS FOR PREDICTING T-CELL RECEPTOR ENGAGEMENT
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to and the benefit of U.S. Provisional Patent Application No. 63/735,193, titled “AI ALGORITHMS TO PREDICT T CELL RECEPTOR ENGAGEMENT TO SPECIFIC HLA+ PEPTIDE COMPLEXES,” filed December 17, 2024, the full disclosure of which is hereby incorporated by reference in its entirety for all purposes.
BACKGROUND
1. Field of Disclosure
[0002] Embodiments of the present disclosure relate to systems and methods to predict T-cell receptor recognition. Specifically, one or more embodiments are directed toward using a combination of physical and chemical interactions to identify peptide complexes and associated binding sites.
2. Description of Related Art
[0003] Predictive algorithms for peptide interactions often fail to consider multiple levels of complexity associated with binding, which may include failures to appreciate relevant binding sites, chemical properties, and physical interactions of various peptides with contact sites. As a result, predicted peptides still demonstrate a high failure rate, even when predictions are generated with advanced models, leading to long lead times for peptide development and wasted resources.
SUMMARY
[0004] Applicant recognized the problems noted above herein and conceived and developed embodiments of systems and methods, according to the present disclosure, for predicting peptide interactions.
[0005] In an embodiment, a computer-implemented method includes receiving a proposed peptide configured to bind to a T-cell receptor to cause an immune response for a target condition. The computer-implemented method also includes determining, from a set of amino acids associated with the proposed peptide, a subset of binding amino acids. The computer-implemented method further includes generating a binding score for the proposed peptide based, at least in part, on at least one physical characteristic of the proposed peptide and at least one biochemical characteristic of the proposed peptide. The computer-implemented method includes generating, based on the binding score, an indication associated with a likelihood of binding for the proposed peptide.
[0006] In another embodiment, a processor includes one or more circuits to receive a target peptide. The one or more circuits may also determine, using a multi-matrix evaluation, one or more anchor amino acids for the target peptide. The one or more circuits may further determine, using the multi-matrix evaluation, one or more binding amino acids for the target peptide. The one or more circuits may determine, using the multi-matrix evaluation, one or more biochemical properties for the target peptide. The one or more circuits may further infer a likelihood of binding between the target peptide and a target T-cell receptor.
[0007] In another embodiment, a computer-implemented method includes receiving a target peptide. The computer-implemented method also includes determining, using a multi-matrix evaluation, one or more anchor amino acids for the target peptide. The computer-implemented method further includes determining, using the multi-matrix evaluation, one or more binding amino acids for the target peptide. The computer-implemented method includes determining, using the multi-matrix evaluation, one or more biochemical properties for the target peptide. The computer-implemented method also includes inferring a likelihood of binding between the target peptide and a target T-cell receptor.
BRIEF DESCRIPTION OF DRAWINGS
[0008] The present technology will be better understood on reading the following detailed description of non-limiting embodiments thereof, and on examining the accompanying drawings, in which:
[0009] FIG. 1A illustrates an example schematic representation of a binding interaction with a T-cell receptor (TCR), in accordance with embodiments of the present disclosure;
[0010] FIG. 1B illustrates an example schematic representation of a binding interaction with a TCR, in accordance with embodiments of the present disclosure;
[0011] FIG. 2A illustrates an example environment for predicting and validating a likelihood of binding between a target peptide and a TCR, in accordance with embodiments of the present disclosure;
[0012] FIG. 2B illustrates an example environment for predicting one or more factors associated with binding between a target peptide and a TCR, in accordance with embodiments of the present disclosure;
[0013] FIG. 3 illustrates an example environment for testing an output result from one or more machine learning systems, in accordance with embodiments of the present disclosure;
[0014] FIG. 4 illustrates an example environment for predicting peptide binding, in accordance with embodiments of the present disclosure;
[0015] FIG. 5A is a flow chart of a process for predicting peptide binding interactions, in accordance with embodiments of the present disclosure;
[0016] FIG. 5B is a flow chart of a process for predicting peptide binding interactions, in accordance with embodiments of the present disclosure; and
[0017] FIG. 6 is an example configuration for a computing device, in accordance with embodiments of the present disclosure.
DETAILED DESCRIPTION
[0018] The foregoing aspects, features, and advantages of the present disclosure will be further appreciated when considered with reference to the following description of embodiments and accompanying drawings. In describing the embodiments of the disclosure illustrated in the appended drawings, specific terminology will be used for the sake of clarity. However, the disclosure is not intended to be limited to the specific terms used, and it is to be understood that each specific term includes equivalents that operate in a similar manner to accomplish a similar purpose. Additionally, like reference numerals may be used for like components, but such use should not be interpreted as limiting the disclosure.
[0019] When introducing elements of various embodiments of the present disclosure, the articles "a", "an", "the", and "said" are intended to mean that there are one or more of the elements. The terms "comprising", "including", and "having" are intended to be inclusive and mean that there may be additional elements other than the listed elements. Any examples of operating parameters and/or environmental conditions are not exclusive of other parameters/conditions of the disclosed embodiments. Additionally, it should be understood that references to "one embodiment", "an embodiment", “certain embodiments”, or “other embodiments” of the present disclosure are not intended to be interpreted as excluding the existence of additional embodiments that also incorporate the recited features. Furthermore, reference to terms such as “above”, “below”, “upper”, “lower”, “side”, “front”, “back”, or other terms regarding orientation or direction are made with reference to the illustrated embodiments and are not intended to be limiting or exclude other orientations or directions. Like numbers may be used to refer to like elements throughout, but it should be appreciated that using like numbers is for convenience and clarity and not intended to limit embodiments of the present disclosure. Moreover, references to “substantially” or “approximately” or “about” may refer to differences within ranges of +/- 10 percent.
[0020] Embodiments of the present disclosure may be directed toward systems and methods for predicting binding interactions with T-cell receptors (TCRs). In at least one embodiment, TCRs present on CD8+ T-cells recognizing specific human leukocyte antigen (HLA)-allele + peptide targets may be sequenced and used, at least in part, to develop one or more predictive models (e.g., machine learning (ML) models, artificial intelligence (AI) models, statistical models, etc.) to predict CD8+ T-cell receptor recognition of specific HLA + peptide complexes based on CD8 TCR sequencing. Systems and methods of the present disclosure may include developing and/or training various models based, at least in part, on one or more databases, data lakes, meshes, and/or the like associated with identified matching regions of TCRs and the portions of various HLA molecules/peptide complexes with which they engage. In this manner, spatial complementarity may be determined, as well as complementarity of charge and other interactions which would make for stronger binding of T-cells and targets. Accordingly, embodiments of the present disclosure may be used to develop one or more algorithms for prediction of key interaction occurrences, as well as strengths of such interactions.
[0021] One or more embodiments of the present disclosure may be associated with one or more pipelines for prediction of binding interactions that may be used as part of an end-to-end development workflow. By way of example, a workspace associated with the pipeline may provide one or more users access to one or more models, which may be trained or fine-tuned for particular desired interactions, specific molecules or peptides, and/or the like. In operation, an input is provided to the pipeline, such as a protein or HLA complex, which may be processed by the model to generate a prediction of a most likely binding site for the input protein. This prediction may then be used in downstream development and testing. Accordingly, embodiments may be provided as an accessible pipeline for providing target proteins/peptides/complexes to identify a predicted binding interaction.
[0022] Systems and methods of the present disclosure address and overcome problems with existing peptide interaction analysis, which may include manual testing or models that do not provide sufficient binding interaction information to confidently expend resources in developing and testing different configurations. For example, existing approaches have failed to identify the importance of both chemical and physical interaction properties, which may lead to predictions that disregard various properties, such as hydropathy, position scoring, and stability, among other properties. These problems were recognized and overcome by embodiments of the present disclosure, which provide improved considerations for a variety of properties to increase a likelihood of a matching interaction, a strong interaction, or both.
[0023] FIG. 1A illustrates a schematic representation 100 of a binding interaction 102 between a T-cell 104 and an antigen-presenting cell 106. In this example, the T-cell 104 is shown as a CD8+ T-cell and includes a TCR 108 along with a CD8 co-receptor 110. The illustrated antigen-presenting cell 106 has a specific HLA class, Class-1 in this example, and includes an HLA Class-1 molecule 112 that may bind with the TCR 108 via one or more antigens 114.
[0024] FIG. 1B illustrates a schematic representation 120 of a binding interaction 122 associated with various receptor sites 124 of the TCR 108 and with HLA pockets 130 of the HLA Class-1 molecule 112. In this example, an antigenic peptide 126 is illustrated with nine amino acids 128, each associated with a different position, labeled as 1-9 (e.g., P1-P9). This example peptide 126 may also be referred to as a 9-mer peptide. In at least one embodiment, the particular antigenic peptide of FIG. 1B may be described as “taking preference” with positions 2 and 9. That is, positions 2 and 9, associated with the amino acids 128, may be referred to as the anchor positions for the peptide 126. Anchor positions imply strong bonding to the HLA molecule 112. Accordingly, the amino acids 128 associated with positions 4-8 are most likely to bind to and interact with the TCR 108.
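As an illustration of the position convention described above, the following minimal Python sketch splits a 9-mer into the anchor positions (2 and 9) and the positions described as most likely to interact with the TCR (4-8). The function name `partition_9mer`, the constants, and the example sequence are assumptions introduced for illustration only and are not taken from the disclosure.

```python
# Minimal sketch: 1-based position bookkeeping for a 9-mer peptide.
# All names and the example sequence below are illustrative assumptions.

ANCHOR_POSITIONS = (2, 9)            # anchor positions named in the text
TCR_FACING_POSITIONS = range(4, 9)   # positions 4-8 (range end is exclusive)


def partition_9mer(peptide: str) -> dict:
    """Map each 1-based position to its residue and group the positions."""
    if len(peptide) != 9:
        raise ValueError("expected a 9-mer peptide")
    residues = {pos: aa for pos, aa in enumerate(peptide.upper(), start=1)}
    return {
        "anchors": {pos: residues[pos] for pos in ANCHOR_POSITIONS},
        "tcr_facing": {pos: residues[pos] for pos in TCR_FACING_POSITIONS},
    }


example = partition_9mer("GILGFVFTL")
print(example["anchors"])      # residues at positions 2 and 9
print(example["tcr_facing"])   # residues at positions 4-8
```

Keeping positions 1-based in the data structure mirrors the P1-P9 labeling in the figure and avoids off-by-one confusion when positions are later used as matrix keys.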
[0025] In at least one embodiment, the illustrated HLA molecule 112 may be an HLA Class-1 complex 112 that prefers a specific amino acid, in this case, one that anchors at positions 2 and 9. Embodiments may consider such a desired interaction as a feature that may be used for interaction predictions, among other various features, which may be used to train and/or develop one or more ML/AI models. While not illustrated in FIG. 1B, various other features may also be used to develop models for interaction prediction, including features such as hydropathy, which may be used to determine hydrophobicity for certain molecules. For example, certain HLA complexes may be hydrophilic, and not hydrophobic, and therefore may be associated with different binding chemistry. By considering both physical interactions (e.g., the position information) and the chemical interactions, systems and methods of the present disclosure provide improved predictive capabilities for T-cell interactions.
[0026] As discussed herein, interaction information may be used to develop one or more amino acid interaction matrices. In at least one embodiment, a heuristic-based matching may be developed for amino acids for an inter-protein chain (for binding sites) and an intra-protein chain (for stability). This interaction information, along with other interaction properties, such as charges at different positions, may be used to train and/or guide predictions from the ML/AI models of the present disclosure. In at least one embodiment, the matrices may be evaluated against and/or compared to structural information associated with one or more peptide and T-cell binding interactions. In this manner, features of the one or more matrices can be scored to generate different composite scores.
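One way to picture a heuristic amino acid interaction matrix is sketched below in Python. The residue classes, the numeric scores, and the names `RESIDUE_CLASS`, `CLASS_PAIR_SCORE`, `pair_score`, and `build_interaction_matrix` are hypothetical choices made for illustration; the disclosure does not specify particular heuristics or values.

```python
# Minimal sketch of a heuristic, symmetric amino acid interaction matrix.
# Class assignments and scores are illustrative assumptions only.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

RESIDUE_CLASS = {}
for cls, members in {
    "positive": "KRH",
    "negative": "DE",
    "hydrophobic": "AVLIMFWP",
    "polar": "STNQYCG",
}.items():
    for aa in members:
        RESIDUE_CLASS[aa] = cls

# frozenset keys make the lookup independent of residue order.
CLASS_PAIR_SCORE = {
    frozenset(["positive", "negative"]): 2.0,  # opposite charges attract
    frozenset(["hydrophobic"]): 1.5,           # hydrophobic-hydrophobic pair
    frozenset(["polar"]): 1.0,                 # polar-polar pair
    frozenset(["positive"]): -1.0,             # like charges repel
    frozenset(["negative"]): -1.0,
}


def pair_score(a: str, b: str) -> float:
    """Heuristic score for one residue pair; unlisted class pairs score 0."""
    key = frozenset([RESIDUE_CLASS[a], RESIDUE_CLASS[b]])
    return CLASS_PAIR_SCORE.get(key, 0.0)


def build_interaction_matrix() -> dict:
    """Return a 20 x 20 matrix keyed by (residue, residue) tuples."""
    return {(a, b): pair_score(a, b) for a in AMINO_ACIDS for b in AMINO_ACIDS}


matrix = build_interaction_matrix()
print(matrix[("K", "D")], matrix[("L", "F")], matrix[("K", "R")])
```

Separate matrices of the same shape could be kept for inter-chain and intra-chain interactions, each with its own heuristic table.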
[0027] One or more embodiments of the present disclosure may be used to determine a complementary position for a given peptide and then to validate the peptide against one or more known combinations. Returning to FIG. 1B, the antigenic peptide 126 includes the amino acids 128, labeled as 1-9. As shown, certain amino acids 128 contact the structural components of the HLA pockets 130, which in this example may be associated with B and F. One or more embodiments may further evaluate the complements to these connections in order to determine amino acid interactions at the TCR 108. As discussed herein, the complements may be specific to each HLA (e.g., interactions with amino acids 128 at positions 4-7) and, therefore, may be used to generate one or more features of the matrices. However, embodiments of the present disclosure may also extend evaluation to positions 2 and 9. Accordingly, by considering physical binding interactions for the remaining amino acids, complementary interactions may be determined and used as a factor, which may be one factor among many, to predict binding interactions. However, as discussed herein, one or more embodiments may use a multi-matrix evaluation that considers, for example, physical and chemical properties associated with binding interactions.
[0028] One or more embodiments of the present disclosure may use the multi-matrix evaluation to develop and deploy one or more ML/AI models that address and overcome various problems in existing systems. For example, as discussed herein, the multi-matrix approach provides additional features for improved matching to likely active peptides, thereby reducing false positive identifications that may consume downstream lab resources for testing. Various embodiments of the present disclosure may also apply features associated with loop structures, as opposed to the linear string-based pattern matching of existing approaches. For example, linear string-based pattern matching to identify binding interactions fails to account for the true physical structures associated with TCRs. In this manner, embodiments may discount potential binding locations that will not be accessible due to the shape of the TCRs.
[0029] FIG. 2A illustrates an example environment 200 that may be used with embodiments of the present disclosure. In this example, a curated set of data 202 may be used to train and/or verify one or more ML/AI models and/or develop and/or verify one or more interaction matrices, as discussed herein. Systems and methods of the present disclosure may train the models on a variety of different features which may, in various embodiments, provide a score or other indication associated with binding interactions. The data may be based on factors or indicators for the individual features. Non-limiting examples include physical data 204, testing data 206, and chemical data 208. It should be appreciated that the information is divided into groups for clarity, but that various types of data may fall into multiple categories. The physical data 204 may include information such as physical binding information 210, anchoring properties 212, and geometry data 214. Physical data may be associated with binding sites or interactions between amino acids and TCRs. Additionally, physical data may be used to categorize HLAs. In at least one embodiment, the physical data 204 may include multi-modal data, such as text data and image data, as non-limiting examples. For example, images of different TCRs may be used to extract different types of geometry data, which as discussed herein, may be used to discount or otherwise place more or less importance on different amino acids, which may consider data associated with probabilities of interaction due to complementary shapes.
[0030] The testing data 206 may include aggregated true positives 216 and true negatives 218. For example, true positives 216 may refer to acquired data associated with T-cell interactions and binding configurations. Similarly, true negatives 218 may refer to known configurations that failed to bond or had weak bonding (e.g., bonding below a threshold). As a result, accumulated data from testing, literature, and/or the like may be used to further refine the models to identify which features are important, how features are associated with one another, and the like.
[0031] The chemical data 208 may include properties such as chemical binding information 220 and hydropathy properties 222, among others. For example, chemical bonding may be associated with particular types of interactions, while hydropathy may refer to whether or not certain peptides are hydrophobic, hydrophilic, somewhere between, and/or an intensity for a categorization. By incorporating the chemical data 208, among other information, a variable scale for binding may be generated. For example, merely classifying a peptide as hydrophobic would be insufficient because the hydropathy identification can produce a more granular typing than a single label.
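The idea of a graded hydropathy value, rather than a single hydrophobic/hydrophilic label, can be sketched as follows. This sketch assumes the widely used Kyte-Doolittle hydropathy scale, which the disclosure does not name; the cutoff of 0.5 and the function names `mean_hydropathy` and `hydropathy_category` are likewise illustrative assumptions.

```python
# Minimal sketch: graded hydropathy using the Kyte-Doolittle scale.
# Assumption: the disclosure does not specify which hydropathy scale is used.

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide: str) -> float:
    """Average hydropathy over all residues (higher means more hydrophobic)."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide.upper()) / len(peptide)


def hydropathy_category(score: float, cutoff: float = 0.5) -> str:
    """Coarse label; the numeric score carries the intensity."""
    if score > cutoff:
        return "hydrophobic"
    if score < -cutoff:
        return "hydrophilic"
    return "intermediate"


for seq in ("III", "KD", "G"):
    value = mean_hydropathy(seq)
    print(seq, round(value, 2), hydropathy_category(value))
```

Retaining the numeric score alongside the label is what allows a variable scale: two peptides that are both labeled hydrophobic can still differ in intensity.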
[0032] In various embodiments of the present disclosure, the data 202 may be aggregated and/or collected and provided to one or more ML systems 224 that may execute an ML service 226 that can be used for inferencing using one or more models from a model datastore 228 and/or for training 230. As discussed herein, different data may be indicative of different properties. For example, if a peptide interaction is illustrated as being incompatible with hydrophobic peptides, other information may be superfluous or may not provide much additional information for that particular interaction. However, in certain embodiments, various features of the data 202 may be used to develop and/or validate 232 one or more matrices that may be used to execute one or more multi-modal peptide evaluations, which may include validations using the models in the model datastore 228. For example, different matrices may be provided from a matrix datastore 234 for validation.
[0033] In this manner, systems and methods may be used to develop a multi-matrix evaluation in which position information is evaluated along with biochemical properties. By applying the secondary evaluation of biochemical properties, which may be separate from or inapplicable to the position information (e.g., hydropathy does not affect physical position information), systems and methods enable a binding prediction from evaluating biochemical properties in view of anchoring preference.
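A multi-matrix evaluation of this kind can be sketched as two independent lookups whose results are combined. In the Python sketch below, the position matrix, the property matrix, the equal weighting, and every name (`position_score`, `property_score`, `multi_matrix_score`) are hypothetical; the disclosure does not state how the matrices are populated or weighted.

```python
# Minimal sketch: combine a position (anchor) matrix with a biochemical
# property matrix. All matrices, weights, and names are illustrative.
from typing import Dict, Iterable, Tuple

PositionMatrix = Dict[int, Dict[str, float]]   # position -> residue -> score
PropertyMatrix = Dict[str, float]              # residue -> property score


def position_score(peptide: str, matrix: PositionMatrix) -> float:
    """Average anchor-preference score over the positions in the matrix."""
    total = sum(matrix[pos].get(peptide[pos - 1], 0.0) for pos in matrix)
    return total / len(matrix)


def property_score(peptide: str, matrix: PropertyMatrix,
                   positions: Iterable[int]) -> float:
    """Average biochemical score over the given 1-based positions."""
    positions = list(positions)
    total = sum(matrix.get(peptide[pos - 1], 0.0) for pos in positions)
    return total / len(positions)


def multi_matrix_score(peptide: str,
                       position_matrix: PositionMatrix,
                       property_matrix: PropertyMatrix,
                       positions: Iterable[int] = range(4, 9),
                       weights: Tuple[float, float] = (0.5, 0.5)) -> float:
    """Weighted sum of the two independent evaluations."""
    w_pos, w_prop = weights
    return (w_pos * position_score(peptide, position_matrix)
            + w_prop * property_score(peptide, property_matrix, positions))


# Toy matrices, invented purely for the example.
toy_position = {2: {"L": 1.0, "M": 0.8}, 9: {"V": 1.0, "L": 0.9}}
toy_property = {"F": 1.0, "W": 1.0, "K": 0.5, "D": 0.5}

score = multi_matrix_score("ALKDFWGSV", toy_position, toy_property)
print(round(score, 3))
```

Because the two scores are computed separately, a biochemical property such as hydropathy never alters the position lookup, which matches the separation described in the paragraph above.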
[0034] FIG. 2B illustrates an example environment 250 that may be used with embodiments of the present disclosure. In this example, a testing service 252 may be used to evaluate an interaction matrix 254 with a proposed peptide 256 in order to validate and/or verify various potential TCR binding configurations. In at least one embodiment, the testing service 252 may be incorporated into the one or more ML systems 224. The interaction matrix 254 may be associated with different amino acid interactions and may be scored, for example, using one or more heuristics to establish a binding score. The binding scores may be developed for different pairs of amino acids, and the binding scores may be associated with an indication of how well binding may occur with a given peptide configuration.
[0035] In at least one embodiment, the proposed peptide 256 may include a proposed 4-mer amino acid sequence. By providing the interaction matrix 254 and the proposed peptide 256 for testing, systems and methods of the present disclosure may be used to extract binding site/location information that may be quantified to provide a likelihood of a positive or desirable interaction. In this example, a variety of different types of interaction information are provided, including a candidate TCR 258, interaction type 260, amino acid position 262, and a score 264. It should be appreciated that more or less information may be provided, and moreover, in certain embodiments, a total score or indication may be provided that aggregates or otherwise considers the pieces of the interaction information.
[0036] For example, candidate TCR data 258 may include information such as top-pick quartet, alternative binding information, and/or the like. The candidate TCR data 258 may be provided on a per-position 262 basis, and may be focused on positions 4-7, as discussed herein, thereby reducing computing resources or evaluation time with positions that may not be relevant to TCR binding. Interaction types 260 may provide biochemical information for interactions, such as an indication for a primary interaction, which may include indications of salt bridge interactions, hydrophobic packing, H-bond networks, and/or the like. A score 264 may then be provided on a per-position basis, with an aggregate total score provided for the proposed peptide 256. Scores that exceed a threshold may be categorized as likely viable, while those below the threshold may be categorized as unlikely to bind. In this manner, peptide configurations may be evaluated and those with a low likelihood of binding may be eliminated from evaluation before lab testing to conserve resources. In other words, effectiveness of target prediction is improved over current approaches. As one non-limiting example, existing approaches may have approximately 10-20 percent effective identification, such as identifying a list of approximately 50 targets with only 5-10 binding effectively. In contrast, embodiments of the present disclosure may be used to provide a smaller, more targeted list of targets, both yielding a higher percentage of effective binding targets and reducing an overall number of tested targets.
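The per-position scoring, aggregation, and thresholding described above, together with the 10-20 percent arithmetic, can be sketched in a few lines of Python. The per-position values, the threshold of 4.0, and the names `classify_peptide` and `hit_rate` are hypothetical and serve only to make the steps concrete.

```python
# Minimal sketch: aggregate per-position scores and apply a threshold.
# Scores, threshold, and function names are illustrative assumptions.
from typing import Dict, Tuple


def classify_peptide(per_position_scores: Dict[int, float],
                     threshold: float) -> Tuple[float, str]:
    """Sum per-position scores and label the peptide against a threshold."""
    total = sum(per_position_scores.values())
    label = "likely viable" if total > threshold else "unlikely to bind"
    return total, label


def hit_rate(effective_binders: int, tested_targets: int) -> float:
    """Fraction of tested targets that bind effectively."""
    return effective_binders / tested_targets


# Hypothetical per-position scores for positions 4-7.
total, label = classify_peptide({4: 1.0, 5: 2.0, 6: 0.5, 7: 1.5}, threshold=4.0)
print(total, label)

# The example from the text: 5 to 10 effective binders out of about 50 targets.
print(hit_rate(5, 50), hit_rate(10, 50))
```

The second part reproduces the stated baseline: 5 of 50 is 10 percent and 10 of 50 is 20 percent, so a shorter list with the same number of true binders would raise the rate while lowering the number of lab tests.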
[0037] FIG. 3 illustrates an example environment 300 that may be used with embodiments of the present disclosure. In this example, an input 302, which may include a novel protein, a known protein, an HLA complex, and/or the like, is provided to the ML system 224. Furthermore, embodiments may include combinations of input information, which may be based on different parameters of a specified workflow. In one or more embodiments, inputs may be limited or restricted to try to reduce a risk of generating false positives. In at least one embodiment, an input may include a combination of features, such as a protein (novel or not) along with a specified HLA class. As discussed herein, specifying an HLA class may be associated with identifying hydrophobic or hydrophilic properties of the protein, which may provide further data that enables improved binding identifications. For example, HLA classes may behave differently, have different anchor amino acids, have different biochemical properties, and/or the like.
[0038] As discussed herein, the ML service 226 may select one or more models from the model datastore 228, and then perform inferencing over the input 302, to generate one or more outputs 304. The outputs may provide information associated with a likelihood that the input protein will bind at a particular binding site. The protein may then go through testing 306 to evaluate binding, which may be stored, along with the initial prediction, in one or more results datastores 308. The one or more results datastores 308 may then be used for additional refinement of the one or more models 228.
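The feedback loop described above, in which each prediction is stored with its test outcome, can be sketched with a small in-memory store. The class names `ResultRecord` and `ResultsDatastore`, the 0-to-1 score scale, and the sample records are assumptions for illustration; the disclosure does not describe a storage schema or a refinement metric.

```python
# Minimal sketch: keep predictions alongside lab outcomes so that false
# positives can be reviewed later. Schema and names are illustrative.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResultRecord:
    peptide_id: str
    predicted_score: float   # model output on a hypothetical 0-1 scale
    bound_in_lab: bool       # outcome of downstream testing


class ResultsDatastore:
    def __init__(self) -> None:
        self.records: List[ResultRecord] = []

    def add(self, record: ResultRecord) -> None:
        self.records.append(record)

    def false_positives(self, threshold: float) -> List[ResultRecord]:
        """Predicted binders that did not bind in testing."""
        return [r for r in self.records
                if r.predicted_score > threshold and not r.bound_in_lab]

    def precision(self, threshold: float) -> Optional[float]:
        """Share of predicted binders that bound; None if none were predicted."""
        predicted = [r for r in self.records if r.predicted_score > threshold]
        if not predicted:
            return None
        return sum(r.bound_in_lab for r in predicted) / len(predicted)


store = ResultsDatastore()
store.add(ResultRecord("candidate_1", 0.9, True))
store.add(ResultRecord("candidate_2", 0.8, False))
store.add(ResultRecord("candidate_3", 0.2, False))
print(store.precision(0.5), [r.peptide_id for r in store.false_positives(0.5)])
```

Records flagged as false positives are natural candidates for inclusion in later training or matrix refinement.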
[0039] FIG. 4 illustrates an example environment 400 that may be used with embodiments of the present disclosure. In this example, a computing device 402 (e.g., user device, compute device, client device, etc.) can submit a request over at least one network 404 to be received by a prediction environment 406. The prediction environment 406 (e.g., environment) may be an online platform provided by a service provider and/or for an affiliate. For example, the environment 406 may be hosted or otherwise provided via one or more cloud resource providers on behalf of a service provider. Authorized users of the prediction environment 406 may be permitted to submit one or more peptide configurations for evaluation using the one or more ML systems 224.
[0040] The client computing device 402 may represent and/or act as a proxy for one or more users that may be submitting requests. For example, a user may navigate to one or more dashboards, web applications, landing pages, or access points using the device to submit a request, among other options. Additionally, in at least one embodiment, the client computing device 402 may act as a proxy to execute stored instructions to make and receive requests. For example, the client computing device 402 may send a request responsive to receiving one or more inputs and/or the like. As another example, a request may be transmitted as part of an automated or semi-automated workflow, which may or may not receive user interaction. For example, upon submitting laboratory testing results and/or selecting a target peptide configuration, one or more workflows may be initiated to select one or more ML models, process one or more datasets and/or inputs associated with the target identification, and then provide a curated set of results. Accordingly, the client computing device 402 may be used with direct input from one or more users, from stored software instructions, from executions of various workflows, or combinations thereof.
[0041] In at least one embodiment, the request can include a request to execute one or more workflows associated with peptide TCR binding interaction prediction, which may be based, at least in part, on an evaluation of a combination of physical and biochemical properties for a given input peptide. It should be appreciated that peptide TCR binding interaction is provided by way of non-limiting example and systems and methods of the present disclosure may be used in a variety of different types of prediction tasks, which may include biological prediction tasks for the development and/or identification of one or more biological compounds that may be used to treat, prevent, or otherwise address one or more diseases or illnesses associated with humans, animals, plant life, and/or the like.
[0042] In many cases, the analysis and/or prediction tasks may include a request to access data (e.g., stored data, streaming data, etc.) and then to process the data using one or more workflows associated with the environment 406. In at least one embodiment, a selected workflow may be based, at least in part, on information provided by the computing device 402, such as a command, or based on data received by the environment 406. The network(s) 404 can include any appropriate network, such as the Internet, a local area network (LAN), a cellular network, an Ethernet, or other such wired and/or wireless network. The prediction environment 406 can include any appropriate resources for accessing data or information, such as laboratory results, prior training data, medical information, and/or the like, as may include various servers, data stores, and other such components known or used for accessing data and/or processing data from across a network (or from the “cloud”). Moreover, the client computing device 402 can be any appropriate computing or processing device, as may include a desktop or notebook computer, smartphone, tablet, wearable computer (e.g., smartwatch, glasses, contacts, headset, etc.), server, or other such system or device.
[0043] An interface layer 408, when receiving a request or call, can determine the type of call or request and cause information to be forwarded to the appropriate component or sub-system. For example, the interface 408 may be associated with one or more landing pages, as an example, to guide a user toward a workflow or action. In at least one embodiment, the interface layer 408 may include other functionality and implementations, such as load balancing and the like.
[0044] Various embodiments of the present disclosure are directed toward predictive systems to score or otherwise provide indications for binding interactions between TCRs and proposed peptides. In at least one embodiment, a manager 410 may be associated with the prediction environment 406 to receive and route incoming requests, for example to determine whether a requestor has authority to make the request by querying a user datastore 412, by accessing one or more third party resources 414 and/or databases 416, and/or by directing the request toward systems and sub-systems of the prediction environment 406. For example, the manager 410 may incorporate an authentication service to verify credentials provided by the client device 402, for example against the user datastore 412, to verify and permit access to the environment 406. Furthermore, in at least one embodiment, verification may also determine a level of accessibility within the environment 406, which may be on an application-basis, a user-basis, or some combination thereof. For example, a first user may have access to the environment, but only have a limited set of applications that are accessible, while a second user may have access to more applications, and a third user may be entirely barred from the environment.
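The three access levels in the example above (limited access, broader access, and no access) can be sketched as a lookup against a user datastore. The datastore layout, the user and application names, and the function `is_authorized` are hypothetical; credential verification itself is outside the scope of this sketch.

```python
# Minimal sketch: application-level access check against a user datastore.
# Datastore contents and names are illustrative assumptions.
from typing import Dict, Optional, Set

# user id -> set of accessible applications; None means barred entirely.
USER_DATASTORE: Dict[str, Optional[Set[str]]] = {
    "first_user": {"binding_prediction"},
    "second_user": {"binding_prediction", "model_training"},
    "third_user": None,
}


def is_authorized(user_id: str, application: str,
                  datastore: Dict[str, Optional[Set[str]]] = USER_DATASTORE) -> bool:
    """Return True only if the user exists, is not barred, and has the app."""
    applications = datastore.get(user_id)
    if applications is None:   # unknown user or barred user
        return False
    return application in applications


print(is_authorized("first_user", "model_training"))    # limited access
print(is_authorized("second_user", "model_training"))   # broader access
print(is_authorized("third_user", "binding_prediction"))  # barred
```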
[0045] Systems and methods may include a web-based or application-based portal that permits receipt of proposed peptides, proteins, or other biological molecules and/or data to refine one or more ML models. At least one embodiment discussed herein may be related to generating predictions for binding interactions for proposed peptide configurations.
[0046] In this example, a data evaluation engine 418 may be used to process data generated by one or more ML models and/or prepare data for execution with one or more ML models. For example, systems and methods of the present disclosure may use the data evaluation engine 418 to process input peptide information, interpret or otherwise prepare output results from the one or more ML systems 224, and/or the like. In this example, a processing engine 420 may receive the input information, such as a proposed peptide configuration, and prepare the information for input to the one or more ML systems 224, which may include adjusting a format or the like. Additionally, one or more properties of the input may be used to select a model from the model datastore 228. The selected model may be loaded to the ML service 226 for inferencing.
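As a minimal sketch of the input preparation and model selection described above, the following assumes that a proposed peptide arrives as a string of one-letter amino acid codes and that peptide length is the input property used to pick a model. Both assumptions, along with the function names and the model names in `MODEL_DATASTORE`, are hypothetical; the disclosure does not specify the input format or the selection property.

```python
# The 20 standard amino acids as one-letter codes.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

# Hypothetical stand-in for the model datastore 228, keyed by peptide length.
MODEL_DATASTORE = {9: "model_9mer", 10: "model_10mer"}


def prepare_peptide(raw):
    """Adjust the input format: drop whitespace, upper-case the sequence,
    and reject anything that is not a standard residue."""
    peptide = "".join(raw.split()).upper()
    if not peptide:
        raise ValueError("empty peptide")
    unknown = set(peptide) - VALID_RESIDUES
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return peptide


def select_model(peptide, datastore=MODEL_DATASTORE, default="general_model"):
    """Pick a model name from a property of the input (here, its length)."""
    return datastore.get(len(peptide), default)
```

For instance, `select_model(prepare_peptide(" gilgfvftl "))` normalizes the input to a nine-residue sequence and selects the length-9 entry, while a length with no dedicated entry falls back to the general model.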
[0047] The ML system 224 may be used to manage and execute one or more ML/AI models, such as transformer-based models, convolutional neural networks, recurrent neural networks, and/or combinations thereof. Various embodiments associated with the ML system 224 may include execution of different software instructions based, at least in part, on a request received from the user device. In this example, an ML/AI model may be selected from one or more models of the model datastore 228, which may include a set of models that may be trained for a domain and/or are general or foundational models that may be used with embodiments of the present disclosure. These models may be trained and/or executed using one or more different datastores, which may include training data, model parameters, model settings, rules, and/or the like. The models in the model datastore 228 may undergo training using the training engine 230, which may use training data from a training datastore 428, which may include outputs or tagged information. The training data may be labeled or unlabeled and may also be augmented or otherwise influenced by one or more human reviewers, but it should be appreciated that raw training data may be used with one or more self-supervised learning processes. Accordingly, models may be trained for specific use cases and/or a general model may be trained for a specific domain, such as for a specific type of ailment (e.g., cancer, viral, etc.). In operation, the ML service 226 may be used to execute and run the model selected from the model datastore 228 and may output one or more predictions 422.
[0048] The one or more predictions 422 may include binding or interaction information for the given input peptide. In at least one embodiment, the data evaluation engine 418 may process the prediction 422 using a score evaluation engine 424 to extract information from the prediction 422 for presentation to the user. Furthermore, the prediction 422 may be stored in a results datastore 426, which may be used to supplement the training datastore 428 used for training the one or more models.
[0049] Various types of architectures may be implemented in various embodiments, and in certain embodiments, architecture may be technique-specific. As one example, architectures may include recurrent neural networks (RNNs) or long short-term memory networks (LSTMs), transformer architectures (e.g., self-attention mechanisms, encoder and/or decoder blocks, etc.), convolutional neural networks (CNNs), and/or the like.
[0050] In various embodiments, the models may be trained using unsupervised learning, in which models learn patterns from large amounts of unlabeled training data (e.g., text, audio, video, image, etc.). Furthermore, one or more models may be task-specific or domain-specific, which may be based on the type of training data used. Additionally, foundational models may be used and then tuned for specific tasks or domains. Some types of foundational models may include question answering, summarization, filling in missing information, and translation. Additionally, specific models may also be used and/or augmented for certain tasks, using techniques like prompt tuning, fine-tuning, retrieval augmented generation (RAG), adding adapters, and/or the like. Systems and methods may incorporate a number of different models to execute one or more operations, including, but not limited to, NetMHC, PSSM, PickPocket, MHCflurry, ALMHC, SYFPEITHI, and ACME.
[0051] FIG. 5A illustrates an example flow chart for a process 500 for predicting a likelihood of binding between a proposed peptide and a TCR. It should be appreciated that steps for the method may be performed in any order, or in parallel, unless otherwise specifically stated. Moreover, the method may include more or fewer steps. In this example, a proposed peptide is received 502, which may be part of an input to one or more machine learning systems. In at least one embodiment, the proposed peptide is configured to bind to a T-cell receptor to cause an immune response for a target condition. A subset of amino acids associated with the proposed peptide may be evaluated to determine a subset of binding amino acids 504. For example, the binding amino acids may be those in positions 4-7. A binding score may be generated for the proposed peptide 506. The binding score may be based, at least in part, on at least one physical characteristic of the proposed peptide and at least one biochemical characteristic of the proposed peptide. The binding score may further be used to generate an indication associated with a likelihood of binding for the proposed peptide 508.
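The steps of process 500 can be sketched as follows, under several loudly stated assumptions. The binding subset is taken as 1-indexed positions 4-7, as in the example above. The physical characteristic is represented by hypothetical per-position weights, the biochemical characteristic by the Kyte-Doolittle hydropathy index, and the indication by a logistic mapping of the summed score. None of these choices, nor the function names, come from the disclosure, which contemplates trained machine learning models rather than a fixed formula; the sketch only shows how per-residue score portions could be combined into a binding score and an indication.

```python
import math

# Kyte-Doolittle hydropathy index, used here as a stand-in biochemical property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hypothetical weights for 1-indexed positions 4-7 (a position-based feature).
POSITION_WEIGHT = {4: 0.5, 5: 1.0, 6: 1.0, 7: 0.5}


def binding_subset(peptide, first=4, last=7):
    """Step 504: return (position, residue) pairs for positions first..last."""
    return [(pos, peptide[pos - 1])
            for pos in range(first, last + 1) if pos <= len(peptide)]


def binding_score(peptide):
    """Step 506: sum per-residue portions (position weight times hydropathy)."""
    return sum(POSITION_WEIGHT[pos] * HYDROPATHY[res]
               for pos, res in binding_subset(peptide))


def binding_indication(score, midpoint=0.0, scale=1.0):
    """Step 508: map the score to a value between 0 and 1 (logistic, assumed)."""
    return 1.0 / (1.0 + math.exp(-(score - midpoint) / scale))
```

A call such as `binding_indication(binding_score("GILGFVFTL"))` walks a nine-residue example through steps 504 to 508; the resulting number has no biological meaning beyond illustrating the data flow.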
[0052] FIG. 5B illustrates an example flow chart for a process 520 for predicting a likelihood of binding for a target peptide. In this example, a target peptide for evaluation is received 522. A multi-matrix evaluation may be executed, for the target peptide, to determine one or more anchor amino acids 524, one or more binding amino acids 526, and one or more biochemical properties for the target peptide 528. A likelihood of binding between the target peptide and a target TCR may then be inferred 530, for example based on properties associated with the one or more binding amino acids.
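One way to picture the multi-matrix evaluation of process 520 is as a series of lookups against separate matrices, one per kind of information. The sketch below is an assumption-laden illustration, not the disclosed method: the three matrices, their contents, the allele key, the `Evaluation` record, and the final likelihood rule (the fraction of binding residues that are uncharged) are all hypothetical placeholders standing in for trained models.

```python
from dataclasses import dataclass

# Hypothetical anchor-position matrix keyed by HLA allele.
# Positions are 1-indexed; -1 stands for the C-terminal residue.
ANCHOR_MATRIX = {"HLA-A*02:01": (2, -1)}

# Hypothetical binding-position matrix (positions 4-7, as in the text's example).
BINDING_MATRIX = (4, 5, 6, 7)

# Hypothetical biochemical matrix: simplified side-chain charge.
CHARGE_MATRIX = {"K": 1, "R": 1, "D": -1, "E": -1}


@dataclass
class Evaluation:
    anchors: dict      # position -> residue (step 524)
    binders: dict      # position -> residue (step 526)
    net_charge: int    # example biochemical property (step 528)
    likelihood: float  # placeholder value in [0, 1] (step 530)


def multi_matrix_evaluation(peptide, hla):
    """Look up anchors, binding residues, and a biochemical property,
    then apply a placeholder rule to produce a likelihood value."""
    n = len(peptide)
    anchor_positions = [n if p == -1 else p for p in ANCHOR_MATRIX[hla]]
    anchors = {p: peptide[p - 1] for p in anchor_positions}
    binders = {p: peptide[p - 1] for p in BINDING_MATRIX
               if p <= n and p not in anchors}
    net_charge = sum(CHARGE_MATRIX.get(r, 0) for r in binders.values())
    # Placeholder inference only: fraction of binding residues that are uncharged.
    uncharged = sum(1 for r in binders.values() if r not in CHARGE_MATRIX)
    likelihood = uncharged / len(binders) if binders else 0.0
    return Evaluation(anchors, binders, net_charge, likelihood)
```

Keeping each matrix separate means an anchor table for another HLA allele, or a different biochemical property, could be swapped in without touching the other lookups.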
[0053] FIG. 6 illustrates a set of general components of an example computing device 600. In this example, the device includes a processor 602 for executing instructions that can be stored in a memory 604. The device can include many types of memory, data storage, or non-transitory computer-readable storage media, such as a first data storage for program instructions for execution by the processor 602, a separate storage for images or data, a removable memory for sharing information with other devices, etc. The device may optionally include a display element 606, such as a touch screen or liquid crystal display (LCD), although devices such as portable media players might convey information via other means, such as through audio speakers, and other devices may not include displays, such as server components executing within data centers, among other options. As discussed, the device in many embodiments will include at least one interaction component 608 able to receive input from a user. This input can include, for example, a push button, touch pad, touch screen, wheel, joystick, keyboard, mouse, keypad, or any other such device or element whereby a user can input a command to the device. In some embodiments, however, such a device might not include any buttons at all and might be controlled only through a combination of visual and audio commands, such that a user can control the device without having to be in contact with the device. In some embodiments, the computing device 600 of FIG. 6 can include one or more network interface or communication components 610 for communicating over various networks, such as Wi-Fi, Bluetooth, RF, wired, or wireless communication systems. The device may be configured to communicate with a network, such as the Internet, and may be able to communicate with other such devices. The device will also include one or more power components 612, such as power cords, power ports, batteries, wirelessly powered or rechargeable receivers, and the like.
[0054] Storage media and other non-transitory computer readable media for containing code, or portions of code, can include any appropriate media known or used in the art, including storage media and communication media, such as but not limited to volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, or other data, including RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a system device. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will appreciate other ways and/or methods to implement the various embodiments.
[0055] Embodiments may also be described in view of the following clauses:
1. A computer-implemented method, comprising: receiving a proposed peptide configured to bind to a T-cell receptor to cause an immune response for a target condition; determining, from a set of amino acids associated with the proposed peptide, a subset of binding amino acids; generating a binding score for the proposed peptide based, at least in part, on at least one physical characteristic of the proposed peptide and at least one biochemical characteristic of the proposed peptide; and generating, based on the binding score, an indication associated with a likelihood of binding for the proposed peptide.
2. The computer-implemented method of clause 1, further comprising: receiving, as an input to one or more trained machine learning models, the proposed peptide.
3. The computer-implemented method of clause 1, further comprising: generating a first set of training data associated with physical data; generating a second set of training data associated with testing data; generating a third set of training data associated with chemical data; and training the one or more trained machine learning models based, at least in part, on the first set of training data, the second set of training data, or the third set of training data.
4. The computer-implemented method of clause 1, wherein the at least one physical characteristic includes at least one of an amino acid position, an amino acid name, a binding characteristic associated with the proposed peptide, or T-cell loop properties.
5. The computer-implemented method of clause 1, wherein the at least one biochemical characteristic includes at least one of an interaction type or hydrophobicity.
6. The computer-implemented method of clause 1, further comprising: generating respective binding score portions for each amino acid of the set of amino acids; and combining the respective binding score portions to produce the binding score.
7. The computer-implemented method of clause 1, further comprising: receiving a human leukocyte antigen (HLA) complex with the proposed peptide.
8. The computer-implemented method of clause 7, wherein the set of amino acids is based, at least in part, on the HLA complex.
9. A processor, comprising: one or more circuits to: receive a target peptide; determine, using a multi-matrix evaluation, one or more anchor amino acids for the target peptide; determine, using the multi-matrix evaluation, one or more binding amino acids for the target peptide; determine, using the multi-matrix evaluation, one or more biochemical properties for the target peptide; and infer a likelihood of binding between the target peptide and a target T-cell receptor.
10. The processor of clause 9, wherein the multi-matrix evaluation is executed by one or more trained machine learning systems trained from, at least in part, physical binding data, testing data, and chemical binding data.
11. The processor of clause 9, wherein the one or more biochemical properties include at least one of an interaction type or hydrophobicity.
12. The processor of clause 9, wherein the one or more processors are further to: generate respective binding score portions for each amino acid of the one or more binding amino acids; and combine the respective binding score portions to produce an overall binding score.
13. The processor of clause 9, wherein the likelihood of binding is based, at least in part, on the overall binding score.
14. The processor of clause 9, wherein the target peptide includes an associated human leukocyte antigen (HLA) complex.
15. The processor of clause 14, wherein the one or more anchor amino acids are based, at least in part, on the associated HLA complex.
16. A computer-implemented method, comprising: determining, using a multi-matrix evaluation, one or more anchor amino acids for the target peptide; determining, using the multi-matrix evaluation, one or more binding amino acids for the target peptide; determining, using the multi-matrix evaluation, one or more biochemical properties for the target peptide; and inferring a likelihood of binding between the target peptide and a target T-cell receptor.
17. The computer-implemented method of clause 16, wherein the multi-matrix evaluation is executed by one or more trained machine learning systems trained from, at least in part, physical binding data, testing data, and chemical binding data.
18. The computer-implemented method of clause 16, wherein the one or more biochemical properties include at least one of an interaction type or hydrophobicity.
19. The computer-implemented method of clause 16, further comprising: generating respective binding score portions for each amino acid of the one or more binding amino acids; and combining the respective binding score portions to produce an overall binding score.
20. The computer-implemented method of clause 16, wherein the likelihood of binding is based, at least in part, on the overall binding score.
21. The computer-implemented method of clause 16, wherein the target peptide includes an associated human leukocyte antigen (HLA) complex.
22. The computer-implemented method of clause 21, wherein the one or more anchor amino acids are based, at least in part, on the associated HLA complex.
23. A computer-implemented method, comprising: receiving a proposed peptide configured to bind to a T-cell receptor to cause an immune response for a target condition; determining, from a set of amino acids associated with the proposed peptide, a subset of binding amino acids; generating a binding score for the proposed peptide based, at least in part, on at least one physical characteristic of the proposed peptide and at least one biochemical characteristic of the proposed peptide; and generating, based on the binding score, an indication associated with a likelihood of binding for the proposed peptide.
24. The computer-implemented method of clause 23, further comprising: receiving, as an input to one or more trained machine learning models, the proposed peptide.
25. The computer-implemented method of any of clauses 23 or 24, further comprising: generating a first set of training data associated with physical data; generating a second set of training data associated with testing data; generating a third set of training data associated with chemical data; and training the one or more trained machine learning models based, at least in part, on the first set of training data, the second set of training data, or the third set of training data.
26. The computer-implemented method of any of clauses 23-25, wherein the at least one physical characteristic includes at least one of an amino acid position, an amino acid name, a binding characteristic associated with the proposed peptide, or T-cell loop properties.
27. The computer-implemented method of any of clauses 23-26, wherein the at least one biochemical characteristic includes at least one of an interaction type or hydrophobicity.
28. The computer-implemented method of any of clauses 23-27, further comprising: generating respective binding score portions for each amino acid of the set of amino acids; and combining the respective binding score portions to produce the binding score.
29. The computer-implemented method of any of clauses 23-28, further comprising: receiving a human leukocyte antigen (HLA) complex with the proposed peptide.
30. The computer-implemented method of clause 29, wherein the set of amino acids is based, at least in part, on the HLA complex.
31. A processor, comprising: one or more circuits to: receive a target peptide; determine, using a multi-matrix evaluation, one or more anchor amino acids for the target peptide; determine, using the multi-matrix evaluation, one or more binding amino acids for the target peptide; determine, using the multi-matrix evaluation, one or more biochemical properties for the target peptide; and infer a likelihood of binding between the target peptide and a target T-cell receptor.
32. The processor of clause 31, wherein the multi-matrix evaluation is executed by one or more trained machine learning systems trained from, at least in part, physical binding data, testing data, and chemical binding data.
33. The processor of any of clauses 31 or 32, wherein the one or more biochemical properties include at least one of an interaction type or hydrophobicity.
34. The processor of any of clauses 31-33, wherein the one or more processors are further to: generate respective binding score portions for each amino acid of the one or more binding amino acids; and combine the respective binding score portions to produce an overall binding score.
35. The processor of any of clauses 31-34, wherein the likelihood of binding is based, at least in part, on the overall binding score.
36. The processor of any of clauses 31-35, wherein the target peptide includes an associated human leukocyte antigen (HLA) complex.
37. The processor of clause 36, wherein the one or more anchor amino acids are based, at least in part, on the associated HLA complex.
38. A computer-implemented method, comprising: determining, using a multi-matrix evaluation, one or more anchor amino acids for the target peptide; determining, using the multi-matrix evaluation, one or more binding amino acids for the target peptide; determining, using the multi-matrix evaluation, one or more biochemical properties for the target peptide; and inferring a likelihood of binding between the target peptide and a target T-cell receptor.
39. The computer-implemented method of clause 38, wherein the multi-matrix evaluation is executed by one or more trained machine learning systems trained from, at least in part, physical binding data, testing data, and chemical binding data.
40. The computer-implemented method of any of clauses 38 or 39, wherein the one or more biochemical properties include at least one of an interaction type or hydrophobicity.
41. The computer-implemented method of any of clauses 38-40, further comprising: generating respective binding score portions for each amino acid of the one or more binding amino acids; and combining the respective binding score portions to produce an overall binding score.
42. The computer-implemented method of any of clauses 38-41, wherein the likelihood of binding is based, at least in part, on the overall binding score.
43. The computer-implemented method of any of clauses 38-42, wherein the target peptide includes an associated human leukocyte antigen (HLA) complex.
44. The computer-implemented method of clause 43, wherein the one or more anchor amino acids are based, at least in part, on the associated HLA complex.
[0056] Although the technology herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present technology. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present technology as defined by the appended claims.
Claims
1. A computer-implemented method, comprising: receiving a proposed peptide configured to bind to a T-cell receptor to cause an immune response for a target condition; determining, from a set of amino acids associated with the proposed peptide, a subset of binding amino acids; generating a binding score for the proposed peptide based, at least in part, on at least one physical characteristic of the proposed peptide and at least one biochemical characteristic of the proposed peptide; and generating, based on the binding score, an indication associated with a likelihood of binding for the proposed peptide.
2. The computer-implemented method of claim 1, further comprising: receiving, as an input to one or more trained machine learning models, the proposed peptide.
3. The computer-implemented method of claim 1, further comprising: generating a first set of training data associated with physical data; generating a second set of training data associated with testing data; generating a third set of training data associated with chemical data; and training the one or more trained machine learning models based, at least in part, on the first set of training data, the second set of training data, or the third set of training data.
4. The computer-implemented method of claim 1, wherein the at least one physical characteristic includes at least one of an amino acid position, an amino acid name, a binding characteristic associated with the proposed peptide, or T-cell loop properties.
5. The computer-implemented method of claim 1, wherein the at least one biochemical characteristic includes at least one of an interaction type or hydrophobicity.
6. The computer-implemented method of claim 1, further comprising: generating respective binding score portions for each amino acid of the set of amino acids; and combining the respective binding score portions to produce the binding score.
7. The computer-implemented method of claim 1, further comprising: receiving a human leukocyte antigen (HLA) complex with the proposed peptide.
8. The computer-implemented method of claim 7, wherein the set of amino acids is based, at least in part, on the HLA complex.
9. A processor, comprising: one or more circuits to: receive a target peptide; determine, using a multi-matrix evaluation, one or more anchor amino acids for the target peptide; determine, using the multi-matrix evaluation, one or more binding amino acids for the target peptide; determine, using the multi-matrix evaluation, one or more biochemical properties for the target peptide; and infer a likelihood of binding between the target peptide and a target T-cell receptor.
10. The processor of claim 9, wherein the multi-matrix evaluation is executed by one or more trained machine learning systems trained from, at least in part, physical binding data, testing data, and chemical binding data.
11. The processor of claim 9, wherein the one or more biochemical properties include at least one of an interaction type or hydrophobicity.
12. The processor of claim 9, wherein the one or more processors are further to: generate respective binding score portions for each amino acid of the one or more binding amino acids; and combine the respective binding score portions to produce an overall binding score.
13. The processor of claim 9, wherein the likelihood of binding is based, at least in part, on the overall binding score.
14. The processor of claim 9, wherein the target peptide includes an associated human leukocyte antigen (HLA) complex.
15. The processor of claim 14, wherein the one or more anchor amino acids are based, at least in part, on the associated HLA complex.