Training method for a relationship prediction model, and method and apparatus for predicting interaction relationships
By constructing local and global feature extraction models and comprehensively utilizing protein topological structure and amino acid sequence information, a relationship prediction model is trained, which solves the problem of low accuracy caused by using only local information in existing technologies and improves the accuracy of protein interaction relationship prediction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BOE TECHNOLOGY GROUP CO LTD
- Filing Date
- 2023-09-27
- Publication Date
- 2026-06-23
AI Technical Summary
Existing modeling methods utilize local protein information while ignoring global information, resulting in low model accuracy.
By constructing local feature extraction models and global feature extraction models, and comprehensively utilizing the global information provided by the topological structure of the protein-protein interaction network and the local information provided by the amino acid sequence, a relationship prediction model is trained.
This improves the accuracy of protein interaction prediction and solves the problem of low accuracy caused by relying only on local information in existing technologies.
[Figure: CN119724370B_ABST]
Description
TECHNICAL FIELD
[0001] Embodiments of the present disclosure relate to the technical field of machine learning, and in particular to a relationship prediction model training method, a relationship prediction model training device, an interaction relationship prediction method, an interaction relationship prediction device, a computer readable storage medium and an electronic device.
BACKGROUND
[0002] Existing modeling methods use the local information of a protein while ignoring its global information, which results in low model accuracy.
[0003] It should be noted that the information disclosed in the above background section is only intended to strengthen the understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art known to those of ordinary skill in the art.
SUMMARY
[0004] The purpose of the present disclosure is to provide a relationship prediction model training method, a relationship prediction model training device, an interaction relationship prediction method, an interaction relationship prediction device, a computer readable storage medium and an electronic device, thereby at least partially overcoming the problem of low accuracy of the model caused by the limitations and defects of the related art.
[0005] According to one aspect of the present disclosure, a relationship prediction model training method is provided, comprising:
[0006] extracting first historical proteins having a correlation relationship and second historical proteins not having a correlation relationship in historical biological data, and constructing a first data set according to the first historical proteins and the actual correlation relationship values between the first historical proteins;
[0007] training a local feature extraction model according to the first data set to obtain a first interaction relationship prediction model, and inputting the second historical proteins into the first interaction relationship prediction model to obtain first predicted correlation relationship values;
[0008] constructing a second data set according to the second historical proteins and the first predicted correlation relationship values, and training a global feature extraction model according to the first data set and the second data set to obtain a second interaction relationship prediction model;
[0009] constructing a target relationship prediction model according to the first interaction relationship prediction model and the second interaction relationship prediction model.
[0010] In an exemplary embodiment of the present disclosure, the first data set is constructed according to the first historical proteins and the actual correlation relationship values between the first historical proteins, comprising:
[0011] First historical protein pairs are formed based on the first historical proteins that have a relationship, and a first dataset is constructed based on the first historical protein pairs and the actual relationship values between the first historical proteins in the first historical protein pairs.
[0012] In one exemplary embodiment of this disclosure, a local feature extraction model is trained based on a first dataset to obtain a first interaction relationship prediction model, including:
[0013] The first historical protein pair in the first dataset is input into the local feature extraction model to obtain the first prediction result between the first historical proteins in the first historical protein pair;
[0014] Based on the first prediction result and the actual correlation value, a first objective loss function is constructed, and the first model parameters in the local feature extraction model are adjusted based on the first objective loss function to obtain the first interaction relationship prediction model.
[0015] In one exemplary embodiment of this disclosure, the local feature extraction model includes a first feature encoding layer, a first feature mapping layer, a first feature sequence extraction layer, and a first classification layer; the first historical protein pair includes a first sub-protein and a second sub-protein.
[0016] Specifically, the first historical protein pairs in the first dataset are input into a local feature extraction model to obtain a first prediction result between the first historical proteins in the first historical protein pair, including:
[0017] Obtain the first amino acid sequence of the first sub-protein and the second amino acid sequence of the second sub-protein in the first historical protein pair in the first dataset, and construct a first amino acid vector and a second amino acid vector based on the first amino acid fragment in the first amino acid sequence and the second amino acid fragment in the second amino acid sequence.
[0018] The first feature encoding layer is used to perform one-hot encoding on the first amino acid vector and the second amino acid vector to obtain the first encoding result and the second encoding result. Then, the first feature mapping layer is used to perform dense mapping on the first encoding result and the second encoding result to obtain the first dense mapping result and the second dense mapping result.
[0019] Temporal features are extracted from the first dense mapping result and the second dense mapping result using the first feature sequence extraction layer to obtain the first feature vector of the first sub-protein based on the first amino acid sequence and the second feature vector of the second sub-protein based on the second amino acid sequence.
[0020] The first feature vector and the second feature vector are input into the first classification layer to obtain the first prediction result between the first sub-protein and the second sub-protein.
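The encoding and mapping steps in paragraphs [0017] and [0018] can be sketched as follows. This is a minimal illustration only: the helper names `build_vocab`, `one_hot` and `dense_map`, the fragment vocabulary, and the use of a plain weight matrix for the dense mapping are assumptions made for the example and are not specified by the disclosure.

```python
def build_vocab(fragment_lists):
    """Assign an integer index to every distinct amino acid fragment (hypothetical helper)."""
    vocab = {}
    for fragments in fragment_lists:
        for fragment in fragments:
            vocab.setdefault(fragment, len(vocab))
    return vocab


def one_hot(fragment, vocab):
    """One-hot encode a single fragment against the vocabulary."""
    vector = [0.0] * len(vocab)
    vector[vocab[fragment]] = 1.0
    return vector


def dense_map(one_hot_vector, weights):
    """Map a one-hot vector to a dense vector by multiplying it with a
    (vocabulary size x dimension) weight matrix; assumed here to stand in
    for the first feature mapping layer."""
    dimension = len(weights[0])
    return [
        sum(x * row[j] for x, row in zip(one_hot_vector, weights))
        for j in range(dimension)
    ]


# Toy fragments for two sub-proteins (illustrative values only).
vocab = build_vocab([["MTA", "QDD"], ["QDD", "SYS"]])
weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # would be learned in practice
encoded = one_hot("QDD", vocab)
dense = dense_map(encoded, weights)
```

Because the input is one-hot, the dense mapping simply selects one row of the weight matrix, which is why such a layer is commonly implemented as an embedding lookup.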
[0021] In one exemplary embodiment of this disclosure, the first feature sequence extraction layer includes multiple feature sequence prediction networks and a first splicing layer, wherein the number of feature sequence prediction networks is consistent with the number of first amino acid fragments in the first amino acid vector;
[0022] Specifically, the first feature sequence extraction layer is used to extract temporal features from the first dense mapping result to obtain a first feature vector of the first sub-protein based on the first amino acid sequence, including:
[0023] Based on the position of the first amino acid fragment in the first amino acid vector, the first sub-mapping results corresponding to each first amino acid fragment in the first dense mapping result are input into the feature sequence prediction network corresponding to the fragment position to obtain the first sub-sequence vector;
[0024] Calculate the first weight value of the first sub-sequence vector, and then perform a weighted summation of the first sub-sequence vector and the first weight value through the first splicing layer to obtain the first feature vector of the first sub-protein based on the first amino acid sequence.
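The weighted summation in paragraph [0024] can be illustrated with the sketch below. The disclosure does not state here how the first weight values are calculated, so the softmax over per-fragment scores is purely an assumption for the example, as are the function names.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1 (assumed weighting scheme)."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(sub_sequence_vectors, weights):
    """Combine the sub-sequence vectors into one feature vector, weighting each by its weight value."""
    dimension = len(sub_sequence_vectors[0])
    return [
        sum(w * vector[j] for w, vector in zip(weights, sub_sequence_vectors))
        for j in range(dimension)
    ]


# Two toy sub-sequence vectors with equal scores, hence equal weights.
sub_vectors = [[1.0, 2.0], [3.0, 4.0]]
weights = softmax([0.0, 0.0])
feature_vector = weighted_sum(sub_vectors, weights)
```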
[0025] In an exemplary embodiment of this disclosure, the first model parameters include a first encoding parameter corresponding to the first feature encoding layer, a first mapping parameter corresponding to the first feature mapping layer, a first feature extraction parameter corresponding to the first feature sequence extraction layer, and a first classification parameter corresponding to the first classification layer;
[0026] The first model parameters in the local feature extraction model are adjusted based on the first objective loss function to obtain the first interaction relationship prediction model, including:
[0027] The first encoding parameter, first mapping parameter, first feature extraction parameter, and first classification parameter in the local feature extraction model are adjusted based on the first objective loss function to obtain the first interaction relationship prediction model.
[0028] In one exemplary embodiment of this disclosure, a second dataset is constructed based on the second historical protein and the first predicted association value, including:
[0029] Determine whether the first predicted association value between the second historical proteins is greater than a preset threshold;
[0030] When it is determined that the first predicted correlation value is greater than the preset threshold, a second historical protein pair is formed based on the second historical protein corresponding to the first predicted correlation value;
[0031] A second dataset is constructed based on the second historical protein pairs and the first predicted association value.
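The threshold-based construction of the second dataset in paragraphs [0029] to [0031] might look like the following sketch. The function name `build_second_dataset`, the `predict` callable standing in for the first interaction relationship prediction model, and the threshold value 0.5 are hypothetical.

```python
def build_second_dataset(candidate_pairs, predict, threshold=0.5):
    """Keep only the candidate pairs whose predicted association value exceeds the threshold.

    `predict(a, b)` is a stand-in for the first interaction relationship
    prediction model; the returned list holds (pair, predicted value) entries.
    """
    dataset = []
    for a, b in candidate_pairs:
        score = predict(a, b)
        if score > threshold:
            dataset.append(((a, b), score))
    return dataset


# Toy predicted values for pairs of second historical proteins (illustrative only).
toy_scores = {("P3", "P4"): 0.9, ("P3", "P5"): 0.2, ("P4", "P5"): 0.7}
second_dataset = build_second_dataset(
    toy_scores.keys(), lambda a, b: toy_scores[(a, b)], threshold=0.5
)
```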
[0032] In one exemplary embodiment of this disclosure, a global feature extraction model is trained based on a first dataset and a second dataset to obtain a second interaction relationship prediction model, including:
[0033] An original feature map is constructed based on the first dataset and the second dataset, and the original feature map is input into the global feature extraction model to obtain a second prediction result;
[0034] Based on the second prediction result, the actual correlation value and / or the first predicted correlation value, a second target loss function is constructed, and the second model parameters in the global feature extraction model are adjusted based on the second target loss function to obtain the second interaction relationship prediction model.
[0035] In one exemplary embodiment of this disclosure, constructing an original feature map based on the first dataset and the second dataset includes:
[0036] Obtain the first historical protein pair and the actual correlation value between the first historical protein pairs in the first dataset, and the second historical protein pair and the first predicted correlation value between the second historical protein pairs in the second dataset;
[0037] The first and second sub-proteins in the first historical protein pair are taken as the first and second vertices, respectively. The actual relationship is taken as the first connecting edge, and the actual relationship value is taken as the first edge weight of the first connecting edge.
[0038] The third and fourth sub-proteins in the second historical protein pair are taken as the third and fourth vertices, respectively; the first predicted association relationship is taken as the second connecting edge, and the first predicted association relationship value is taken as the second edge weight of the second connecting edge;
[0039] Construct the original feature map based on the first vertex, second vertex, third vertex, fourth vertex, first connecting edge, second connecting edge, first edge weight, and second edge weight.
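The construction of the original feature map in paragraphs [0036] to [0039] can be sketched as a weighted, undirected graph. The data layout used here (a vertex set plus a dictionary keyed by unordered protein pairs) and the name `build_feature_graph` are assumptions chosen for illustration.

```python
def build_feature_graph(first_dataset, second_dataset):
    """Build a weighted undirected graph from actual and predicted protein pairs.

    Each dataset is a list of ((protein_a, protein_b), value) entries.
    Edges from the first dataset carry actual values; edges from the second
    dataset carry predicted values. An actual edge is never overwritten.
    """
    vertices = set()
    edges = {}
    for source, dataset in (("actual", first_dataset), ("predicted", second_dataset)):
        for (a, b), weight in dataset:
            vertices.update((a, b))
            edges.setdefault(frozenset((a, b)), {"weight": weight, "source": source})
    return vertices, edges


# Toy data: one known interaction and one pseudo-labelled pair (illustrative only).
first = [(("P1", "P2"), 1.0)]
second = [(("P2", "P3"), 0.8)]
vertices, edges = build_feature_graph(first, second)
```

Recording the origin of each edge keeps it possible to treat actual and predicted weights differently when a loss is later computed over the graph.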
[0040] In one exemplary embodiment of this disclosure, the global feature extraction model includes a graph convolutional neural network and a second classification layer;
[0041] The original feature map is input into the global feature extraction model to obtain a second prediction result, including:
[0042] The original feature map is input into the graph convolutional neural network, and the first connecting edge and / or the second connecting edge in the original feature map are updated to obtain the target feature map;
[0043] The target feature map is input into the second classification layer to obtain the second prediction result.
[0044] In one exemplary embodiment of this disclosure, constructing a target relationship prediction model based on the first interaction relationship prediction model and the second interaction relationship prediction model includes:
[0045] The second historical protein is input into the second interaction prediction model to obtain the second predicted association value, and a third dataset is constructed based on the second historical protein and the second predicted association value.
[0046] The local feature extraction model is trained based on the first and third datasets to obtain the updated first interaction relationship prediction model;
[0047] Based on the updated first interaction prediction model and the second interaction prediction model, a target relationship prediction model is constructed.
[0048] According to one aspect of this disclosure, a method for predicting interaction relationships is provided, comprising:
[0049] Obtain a first protein to be predicted and a second protein to be predicted, and construct a protein pair to be predicted based on the first protein to be predicted and the second protein to be predicted;
[0050] The protein pair to be predicted is input into the target relationship prediction model to obtain the relationship prediction result; wherein, the target relationship prediction model is trained by any of the above-mentioned relationship prediction model training methods;
[0051] Based on the relationship prediction result, the interaction relationship between the first protein to be predicted and the second protein to be predicted is determined.
[0052] In one exemplary embodiment of this disclosure, the target relationship prediction model includes a first interaction relationship prediction model and a second interaction relationship prediction model;
[0053] The process of inputting the protein pair to be predicted into the target relation prediction model to obtain relation prediction results includes:
[0054] The protein pair to be predicted is input into the first interaction prediction model to obtain the first relationship prediction result, and the protein pair to be predicted is input into the second interaction prediction model to obtain the second relationship prediction result;
[0055] The first relationship prediction result and the second relationship prediction result are weighted and summed to obtain the relationship prediction result.
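The weighted summation of the two prediction results in paragraph [0055] can be sketched as below. The weight of 0.5 and the decision threshold of 0.5 are hypothetical values; the disclosure does not fix them here.

```python
def combine_predictions(local_score, global_score, local_weight=0.5):
    """Weighted sum of the first (local) and second (global) relationship prediction results."""
    if not 0.0 <= local_weight <= 1.0:
        raise ValueError("local_weight must lie in [0, 1]")
    return local_weight * local_score + (1.0 - local_weight) * global_score


def interacts(combined_score, threshold=0.5):
    """Turn the combined score into a yes/no interaction decision (assumed decision rule)."""
    return combined_score > threshold


combined = combine_predictions(0.8, 0.4, local_weight=0.5)
decision = interacts(combined)
```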
[0056] In one exemplary embodiment of this disclosure, obtaining a first protein to be predicted and a second protein to be predicted includes:
[0057] In response to a touch operation on a first interactive control on a display interface, a user identifier of the user to be predicted is obtained; wherein the first interactive control includes a user identifier input box;
[0058] In response to a touch operation on a second interactive control on the display interface, a first protein to be predicted that is associated with the user to be predicted, and a second protein to be predicted that is associated with the first protein to be predicted, are obtained based on the user identifier; wherein the second interactive control includes a relationship prediction interactive control.
[0059] In one exemplary embodiment of this disclosure, after determining the interaction relationship between the first protein to be predicted and the second protein to be predicted, the method for predicting the interaction relationship further includes:
[0060] A relationship prediction report is generated based on the interaction relationship and associated with the user to be predicted, and the relationship prediction report is displayed on the display interface.
[0061] According to one aspect of this disclosure, a training apparatus for a relationship prediction model is provided, comprising:
[0062] The first dataset construction module is used to extract the first historical proteins that are related and the second historical proteins that are not related from the historical biological data, and to construct the first dataset based on the first historical proteins and the actual correlation values between the first historical proteins.
[0063] The first model training module is used to train the local feature extraction model based on the first dataset to obtain the first interaction relationship prediction model, and input the second historical protein into the first interaction relationship prediction model to obtain the first predicted association value;
[0064] The second model training module is used to construct a second dataset based on the second historical protein and the first predicted association value, and to train the global feature extraction model based on the first dataset and the second dataset to obtain a second interaction relationship prediction model.
[0065] The target relationship prediction model construction module is used to construct a target relationship prediction model based on the first interaction relationship prediction model and the second interaction relationship prediction model.
[0066] According to one aspect of this disclosure, an apparatus for predicting interaction relationships is provided, comprising:
[0067] The to-be-predicted protein pair construction module is used to obtain a first protein to be predicted and a second protein to be predicted, and to construct a protein pair to be predicted based on the first protein to be predicted and the second protein to be predicted;
[0068] The relation prediction result determination module is used to input the protein pair to be predicted into the target relation prediction model to obtain the relation prediction result; wherein, the target relation prediction model is trained by any of the above-mentioned relation prediction model training methods;
[0069] An interaction relationship determination module is used to determine the interaction relationship between the first protein to be predicted and the second protein to be predicted based on the relationship prediction results.
[0070] According to one aspect of this disclosure, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, implements the training method for the relationship prediction model described in any one of the preceding claims and the prediction method for the interaction relationship described in any one of the preceding claims.
[0071] According to one aspect of this disclosure, an electronic device is provided, comprising:
[0072] a processor; and
[0073] a memory for storing executable instructions of the processor;
[0074] wherein the processor is configured to perform, by executing the executable instructions, the training method of the relationship prediction model described in any one of the preceding claims and the prediction method of the interaction relationship described in any one of the preceding claims.
[0075] This disclosure discloses a method for training a relationship prediction model. On one hand, it extracts first historical proteins with a relationship and second historical proteins without a relationship from historical biological data, and constructs a first dataset based on the first historical proteins and the actual relationship values between them. Then, it trains a local feature extraction model using the first dataset to obtain a first interaction relationship prediction model, and inputs the second historical proteins into the first interaction relationship prediction model to obtain a first predicted relationship value. Next, it constructs a second dataset based on the second historical proteins and the first predicted relationship value, and trains a global feature extraction model using the first and second datasets to obtain a second interaction relationship prediction model. Finally, it constructs a target relationship prediction model based on the first and second interaction relationship prediction models. Because both global and local features of the proteins are considered during model training, this method solves the problem in existing technologies where only local information of proteins is used while global information is ignored, resulting in low model accuracy. On the other hand, it also improves the accuracy of the prediction results obtained by predicting relationships using the target relationship prediction model.
[0076] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
[0077] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure. It is obvious that the drawings described below are merely some embodiments of this disclosure, and those skilled in the art can obtain other drawings based on these drawings without any inventive effort.
[0078] Figure 1 is a flowchart schematically illustrating a method for training a relationship prediction model according to an exemplary embodiment of the present disclosure.
[0079] Figure 2 schematically illustrates a structural example of a local feature extraction model according to an exemplary embodiment of the present disclosure.
[0080] Figure 3 schematically illustrates a structural example of a global feature extraction model according to an exemplary embodiment of the present disclosure.
[0081] Figure 4 schematically illustrates a structural example of a target relationship prediction model according to an exemplary embodiment of the present disclosure.
[0082] Figure 5 is a flowchart illustrating a method, according to an example embodiment of the present disclosure, for inputting a first historical protein pair from the first dataset into a local feature extraction model to obtain a first prediction result between the first historical proteins in the first historical protein pair.
[0083] Figure 6 illustrates an example scenario of the calculation process for a first feature vector according to an exemplary embodiment of the present disclosure.
[0084] Figure 7 is an example diagram of an original feature map according to an exemplary embodiment of the present disclosure.
[0085] Figure 8 illustrates an example scenario of calculating a second prediction result according to an exemplary embodiment of the present disclosure.
[0086] Figure 9 is a flowchart schematically illustrating a method for predicting interaction relationships according to an exemplary embodiment of this disclosure.
[0087] Figure 10 is a diagram schematically illustrating a training apparatus for a relationship prediction model according to an exemplary embodiment of the present disclosure.
[0088] Figure 11 is a block diagram schematically illustrating an interaction relationship prediction device according to an exemplary embodiment of the present disclosure.
[0089] Figure 12 illustrates an electronic device, according to an example embodiment of the present disclosure, for implementing the training method of the above-described relationship prediction model and / or the prediction method of interaction relationships.
DETAILED DESCRIPTION
[0090] Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided to make this disclosure more comprehensive and complete, and to fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics can be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a full understanding of embodiments of this disclosure. However, those skilled in the art will recognize that the technical solutions of this disclosure can be practiced with one or more of the specific details omitted, or other methods, components, apparatus, steps, etc., can be employed. In other instances, well-known technical solutions are not shown or described in detail to avoid obscuring various aspects of this disclosure.
[0091] Furthermore, the accompanying drawings are merely illustrative of this disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and therefore repeated descriptions of them will be omitted. Some block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different network and / or processor devices and / or microcontroller devices.
[0092] Protein-protein interaction studies can reveal protein functions at the molecular level, helping to elucidate the patterns of cellular activities such as growth, development, metabolism, differentiation, and apoptosis. Identifying protein-protein interaction pairs across the entire genome is a crucial step in explaining cellular regulatory mechanisms. Furthermore, with the development of protein-protein interaction experimental techniques, researchers can obtain vast amounts of protein-protein interaction data, even enabling genome-wide analysis. However, due to limitations in experimental techniques, traditional methods are not suitable for large-scale detection. To address this issue, existing protein-protein interaction data can be utilized, and machine learning methods can be employed to achieve large-scale detection.
[0093] In practical applications, traditional methods for analyzing and detecting protein-protein interactions using machine learning can be implemented in the following two ways:
[0094] One approach is to represent the amino acid sequence of a protein as an amino acid sequence vector and then predict protein-protein interactions in the resulting vector space. The amino acid sequence vector can be, for example, an amino acid k-mer vector, where a k-mer is a short fragment of the amino acid sequence and k is the length of the fragment. For example, the amino acid sequence MTAQDD…SYS can be represented as the 3-mer vector [MTA, QDD,…,SYS]. Each dimension of the vector represents a different short amino acid fragment, and the vector elements are treated as independent of each other; that is, the correlation between individual amino acid fragments is disregarded.
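As a small illustration of the k-mer representation, the sketch below splits a sequence into consecutive, non-overlapping fragments of length k, matching the 3-mer example in the text. The function name `kmer_fragments`, the choice to drop a trailing remainder shorter than k, and the nine-residue toy sequence are assumptions for the example; the toy sequence is not the elided sequence quoted above.

```python
def kmer_fragments(sequence, k=3):
    """Split a sequence into consecutive, non-overlapping fragments of length k.

    A trailing remainder shorter than k is dropped in this sketch.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, k)]


# Hypothetical toy sequence chosen so that it splits evenly into 3-mers.
fragments = kmer_fragments("MTAQDDSYS", k=3)
```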
[0095] Another approach is to predict using protein-protein interaction networks. In these networks, vertices represent proteins, and edges represent interactions between corresponding proteins. The prediction task is then accomplished by utilizing the graph's topology through methods such as graph neural networks or random walks.
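To make the network-based approach concrete, the sketch below stores an interaction network as an adjacency mapping and samples a simple random walk over it. It only illustrates the data structure and the idea of a walk; the function names, the seeded generator and the toy protein identifiers are assumptions, and no particular prediction method from the prior art is reproduced.

```python
import random


def build_adjacency(interactions):
    """Map each protein (vertex) to the set of proteins it interacts with (edges)."""
    adjacency = {}
    for a, b in interactions:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def random_walk(adjacency, start, length, seed=0):
    """Sample a walk of at most `length` steps, moving to a uniformly chosen neighbour each step."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length):
        neighbours = sorted(adjacency.get(walk[-1], ()))
        if not neighbours:
            break  # isolated or unknown vertex: the walk cannot continue
        walk.append(rng.choice(neighbours))
    return walk


adjacency = build_adjacency([("P1", "P2"), ("P2", "P3")])
walk = random_walk(adjacency, "P1", length=5, seed=1)
```

Walks of this kind expose which proteins lie close together in the network, which is the global, topological signal that sequence-only methods do not see.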
[0096] However, both of the above implementation methods have their own drawbacks. The first implementation method utilizes local information of the protein while ignoring global information, while the second implementation method utilizes global information while ignoring local information. Both of these methods suffer from low accuracy in prediction results.
[0097] Based on this, the exemplary embodiments of this disclosure first provide a training method for a relationship prediction model. Unlike traditional methods, this disclosure comprehensively utilizes the global information provided by the topological structure of the protein-protein interaction network and the local information provided by the amino acid sequence of the protein to predict protein-protein interactions, thereby improving the accuracy of the prediction.
[0098] In one example embodiment, the training method for the relationship prediction model can run on a server, a server cluster, a cloud server, or the like; those skilled in the art can also run the method disclosed herein on other platforms as needed, and this exemplary embodiment imposes no particular limitation in this regard. Specifically, as shown in Figure 1, the training method for the relationship prediction model may include the following steps:
[0099] Step S110. Extract the first historical proteins that are associated with each other and the second historical proteins that are not associated with each other from the historical biological data, and construct the first dataset based on the first historical proteins and the actual association values between the first historical proteins;
[0100] Step S120. Train the local feature extraction model based on the first dataset to obtain the first interaction relationship prediction model, and input the second historical protein into the first interaction relationship prediction model to obtain the first predicted association value;
[0101] Step S130. Construct a second dataset based on the second historical protein and the first predicted association value, and train the global feature extraction model based on the first dataset and the second dataset to obtain a second interaction relationship prediction model;
[0102] Step S140. Construct a target relationship prediction model based on the first interaction relationship prediction model and the second interaction relationship prediction model.
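Steps S110 to S140 can be outlined as a single orchestration function. This is a sketch under stated assumptions: `train_local` and `train_global` are hypothetical callables standing in for the actual training of the local and global feature extraction models, each trained model is represented as a function returning a score for a protein pair, and the threshold filter and the weighted sum are taken from the exemplary embodiments with illustrative default values.

```python
def train_target_model(positive, unlabeled, train_local, train_global,
                       threshold=0.5, local_weight=0.5):
    """Outline of steps S110 to S140.

    positive:  dict mapping (protein_a, protein_b) to the actual association value
    unlabeled: list of (protein_a, protein_b) pairs with no known association
    """
    # S110: the first dataset holds the pairs with actual association values.
    first_dataset = dict(positive)

    # S120: train the local model, then score the unlabeled pairs with it.
    local_model = train_local(first_dataset)
    predicted = {pair: local_model(*pair) for pair in unlabeled}

    # S130: keep confident predictions as the second dataset, then train the global model.
    second_dataset = {pair: s for pair, s in predicted.items() if s > threshold}
    global_model = train_global(first_dataset, second_dataset)

    # S140: the target model combines both trained models by a weighted sum.
    def target_model(a, b):
        return (local_weight * local_model(a, b)
                + (1.0 - local_weight) * global_model(a, b))

    return target_model


# Stub trainers returning fixed scorers, used only to exercise the control flow.
seen = {}

def stub_train_local(first_dataset):
    return lambda a, b: 0.8 if (a, b) == ("P3", "P4") else 0.2

def stub_train_global(first_dataset, second_dataset):
    seen["second_dataset"] = second_dataset
    return lambda a, b: 0.4

model = train_target_model(
    {("P1", "P2"): 1.0}, [("P3", "P4"), ("P5", "P6")],
    stub_train_local, stub_train_global,
)
```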
[0103] In the training method of the above-mentioned relationship prediction model, on the one hand, a first historical protein with an association and a second historical protein without an association are extracted from historical biological data, and a first dataset is constructed based on the first historical protein and the actual association value between the first historical proteins; then, a local feature extraction model is trained based on the first dataset to obtain a first interaction relationship prediction model, and the second historical protein is input into the first interaction relationship prediction model to obtain a first predicted association value; then, a second dataset is constructed based on the second historical protein and the first predicted association value, and a global feature extraction model is trained based on the first and second datasets to obtain a second interaction relationship prediction model; finally, a target relationship prediction model is constructed based on the first and second interaction relationship prediction models. Since both global and local features of proteins are considered during the model training process, the problem of low model accuracy caused by only using local protein information and ignoring global information in existing technologies is solved; on the other hand, the accuracy of the prediction results obtained by predicting relationships through the target relationship prediction model is also improved.
[0104] The training method of the relationship prediction model described in the exemplary embodiments of this disclosure will be further explained and illustrated below with reference to the accompanying drawings.
[0105] First, the technical implementation principles of the exemplary embodiments of this disclosure will be explained and described. Specifically, the training method of the relationship prediction model described in the exemplary embodiments of this disclosure can predict protein-protein interactions based on topological structure and sequence content. This avoids the problems of low accuracy in traditional methods that use sequence content to represent proteins as vectors and then predict protein-protein interactions in vector space, or that use protein-protein interaction networks to predict interactions. The training method of the relationship prediction model provided by this disclosure can comprehensively utilize the global information provided by the topological structure of the protein interaction network and the local information provided by the amino acid sequence of the protein to predict protein-protein interactions, thereby improving the accuracy of the prediction.
[0106] Secondly, the local feature extraction model and the global feature extraction model involved in the example embodiments of this disclosure will be explained and described.
[0107] In one example embodiment, as shown in Figure 2, the local feature extraction model may include a first input layer 210, a first feature encoding layer 220, a first feature mapping layer 230, a first feature sequence extraction layer 240, a first classification layer 250, and a first output layer 260, connected in sequence. The specific role of each functional layer in the feature extraction process is described in detail below.
[0108] In one example embodiment, as shown in Figure 3, the global feature extraction model may include a second input layer 310, a graph convolutional neural network 320, a second classification layer 330, and a second output layer 340, connected in sequence. The specific role of each functional layer in the feature extraction process is described in detail below.
[0109] In one example embodiment, as shown in Figure 4, the target relationship prediction model described in the example embodiment of this disclosure may include a first interaction relationship prediction model 410 obtained by training the local feature extraction model, a second interaction relationship prediction model 420 obtained by training the global feature extraction model, a weighted summation layer 430, and a third output layer 440; the first interaction relationship prediction model and the second interaction relationship prediction model are each connected to the weighted summation layer, and the weighted summation layer is connected to the third output layer. The specific role of each functional layer is described in detail below.
[0110] The training method of the relationship prediction model shown in Figure 1 is further explained and illustrated below with reference to Figures 2-4. Specifically:
[0111] In step S110, first historical proteins with correlation and second historical proteins without correlation are extracted from historical biological data, and a first dataset is constructed based on the first historical proteins and the actual correlation values between them.
[0112] In this example embodiment, firstly, historical biological data can be obtained from the corresponding database or cluster; wherein, the historical biological data recorded here may include the biological data of the corresponding patient, and the biological data may include multiple sets of different protein data; in the historical biological data, the relationship between each historical protein can be obtained according to the specific disease condition of the patient, and then the historical proteins with the relationship are classified as first historical proteins, and the historical proteins without the relationship are classified as second historical proteins.
[0113] Secondly, a first dataset is constructed based on the first historical proteins and the actual correlation values between them. Specifically, this can be achieved as follows: first historical protein pairs are formed based on the first historical proteins with correlations, and the first dataset is constructed based on the first historical protein pairs and the actual correlation values between the first historical proteins in the first historical protein pairs. That is, in practical applications, the specific presentation form of the obtained first dataset can be as follows: {(p1,p2,y1), (p3,p4,y3), ..., (pN,pN+1,yN)}, which can be labeled as D; at the same time, in this first dataset, p1 and p2 can be used to represent the first historical protein pairs, and y1 can be used to represent the actual correlation value between p1 and p2, which can be obtained from historical biological data.
[0114] In one example embodiment, each protein can be represented by its amino acid sequence; for example, the amino acid sequence of p1 can be MTAQDD…SYS, and the amino acid sequence of p2 can be EAELCP…DRC. In practical applications, the target relationship prediction model obtained through training is used to predict the interaction relationship between two proteins whose relationship is unknown. This can also be regarded as predicting, through the target relationship prediction model, the interaction relationship between two amino acid sequences whose relationship is unknown.
[0115] In step S120, the local feature extraction model is trained based on the first dataset to obtain the first interaction relationship prediction model, and the second historical protein is input into the first interaction relationship prediction model to obtain the first predicted association value.
[0116] In this example embodiment, firstly, the local feature extraction model is trained based on the first dataset to obtain a first interaction relationship prediction model. Specifically, this can be achieved as follows: Firstly, the first historical protein pair in the first dataset is input into the local feature extraction model to obtain a first prediction result between the first historical proteins in the first historical protein pair. Secondly, based on the first prediction result and the actual correlation value, a first target loss function is constructed, and the first model parameters in the local feature extraction model are adjusted based on the first target loss function to obtain the first interaction relationship prediction model. Here, the first historical protein pair mentioned here may include a first sub-protein and a second sub-protein, such as p1 and p2 mentioned above, or p3 and p4, etc. This example does not impose any special restrictions on this.
[0117] In one example embodiment, as shown in Figure 5, inputting the first historical protein pairs from the first dataset into the local feature extraction model to obtain the first prediction results between the first historical proteins in the first historical protein pairs may include the following steps:
[0118] Step S510: Obtain the first amino acid sequence of the first sub-protein and the second amino acid sequence of the second sub-protein in the first historical protein pair in the first dataset, and construct the first amino acid vector and the second amino acid vector based on the first amino acid fragment in the first amino acid sequence and the second amino acid fragment in the second amino acid sequence.
[0119] Specifically, assume that the first amino acid sequence of the first sub-protein p1 in the first historical protein pair is MTAQDD…SYS, and the second amino acid sequence of the second sub-protein p2 is EAELCP…DRC. The first amino acid fragments in the first amino acid sequence can include MTA, QDD, …, SYS, etc., and the second amino acid fragments in the second amino acid sequence can include EAE, LCP, …, DRC, etc. The resulting first amino acid vector can be {MTA, QDD, …,SYS}, and the second amino acid vector can be {EAE, LCP, …, DRC}. It should be noted that the first and second amino acid vectors described here can be considered 3-mer vectors, where 3 represents the length of the amino acid fragment. In practical applications, the first and second amino acid vectors can also be represented as k-mer vectors, where the value of k can be selected according to actual needs; this example does not impose any special restrictions on this.
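The fragment construction in step S510 can be sketched in Python as follows. This is a minimal illustration, not the implementation of the disclosure: the function name `split_kmers` is hypothetical, the input is a toy nine-residue string assembled from the fragments quoted above, and dropping a trailing remainder shorter than k is an assumption, since the text does not say how such a remainder is handled.

```python
def split_kmers(sequence: str, k: int = 3) -> list[str]:
    """Split an amino acid sequence into consecutive, non-overlapping k-mers.

    Assumption: a trailing fragment shorter than k is discarded.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    usable = len(sequence) - len(sequence) % k  # length covered by whole k-mers
    return [sequence[i:i + k] for i in range(0, usable, k)]


# Toy input built from the fragments named in the text (not a real protein).
fragments = split_kmers("MTAQDDSYS")
print(fragments)  # ['MTA', 'QDD', 'SYS']
```

Changing `k` gives the k-mer variants mentioned above without any other change to the function.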
[0120] Step S520: The first feature encoding layer is used to perform one-hot encoding on the first amino acid vector and the second amino acid vector to obtain the first encoding result and the second encoding result. The first feature mapping layer is then used to perform dense mapping on the first encoding result and the second encoding result to obtain the first dense mapping result and the second dense mapping result.
[0121] Specifically, the first feature encoding layer described here can be a one-hot encoding layer, and the first feature mapping layer described here can be an embedding layer. In practical applications, since there are 20 amino acids that make up proteins, there are 20 to the power of 3 (8000) types of 3-mer vectors for amino acids. Therefore, an 8000-dimensional one-hot vector is used here to represent the 3-mer of protein amino acids. That is, the vector dimensions of the first encoding result and the second encoding result are 8000-dimensional. Then, the 3-mer one-hot of amino acids (that is, the first encoding result and the second encoding result) is mapped to a dense embedding vector (that is, the first dense mapping result and the second dense mapping result) through the mapping matrix W1. The specific calculation process of the first dense mapping result and the second dense mapping result can be shown in the following formula (1):
[0122] $e_j = W_1 x_j$ ; Formula (1)
[0123] where $x_j$ denotes the one-hot vector of the j-th 3-mer of the amino acid sequence; $W_1$ is a matrix of size 256 × 8000; and $e_j$ denotes the embedding vector of the j-th 3-mer in the amino acid sequence. That is, in the first and second encoding results, the encoding vector of each 3-mer fragment is independent; likewise, in the first and second dense mapping results, the embedding vector of each 3-mer is independent.
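The one-hot encoding and dense mapping of step S520 can be illustrated with NumPy. This is a sketch under stated assumptions: the ordering of the 8000 3-mers in `KMER_INDEX` is arbitrary, `W1` is filled with random values as a stand-in for the learned mapping matrix, and the helper name `one_hot` is hypothetical.

```python
import itertools

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
# Assign an index to every possible 3-mer: 20 ** 3 = 8000 entries.
KMER_INDEX = {
    "".join(p): i
    for i, p in enumerate(itertools.product(AMINO_ACIDS, repeat=3))
}


def one_hot(kmer: str) -> np.ndarray:
    """Return the 8000-dimensional one-hot vector of a 3-mer."""
    x = np.zeros(len(KMER_INDEX))
    x[KMER_INDEX[kmer]] = 1.0
    return x


rng = np.random.default_rng(0)
# Random stand-in for the learned 256 x 8000 mapping matrix W1.
W1 = rng.normal(scale=0.01, size=(256, 8000))

x = one_hot("MTA")
e = W1 @ x  # 256-dimensional dense embedding of the 3-mer

# Multiplying by a one-hot vector simply selects one column of W1.
assert np.allclose(e, W1[:, KMER_INDEX["MTA"]])
```

Because the product only selects a column, an embedding layer is normally implemented as a table lookup rather than an explicit matrix multiplication.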
[0124] Step S530: The first feature sequence extraction layer is used to extract temporal features from the first dense mapping result and the second dense mapping result to obtain the first feature vector of the first sub-protein based on the first amino acid sequence and the second feature vector of the second sub-protein based on the second amino acid sequence.
[0125] Specifically, the first feature sequence extraction layer described herein may include multiple feature sequence prediction networks and a first splicing layer. The number of feature sequence prediction networks is consistent with the number of first amino acid fragments in the first amino acid vector. That is, each 3-mer embedding vector may correspond to a feature sequence prediction network. At the same time, the feature sequence prediction network described herein may be a Long Short-Term Memory (LSTM) network.
[0126] In practical applications, extracting temporal features from the first dense mapping result through the first feature sequence extraction layer, to obtain the first feature vector of the first sub-protein based on the first amino acid sequence, can be achieved as follows: First, according to the fragment position of each first amino acid fragment in the first amino acid vector, the first sub-mapping result corresponding to that fragment in the first dense mapping result is input into the feature sequence prediction network corresponding to the fragment position to obtain a first sub-sequence vector. Second, the first weight value of each first sub-sequence vector is calculated, and the first sub-sequence vectors are weighted by their first weight values and summed through the first splicing layer to obtain the first feature vector of the first sub-protein based on the first amino acid sequence. That is, the embedding vectors (the first dense mapping result and the second dense mapping result) can be used as the input of the LSTM, and the output is the vector corresponding to each 3-mer (that is, the first sub-sequence vector, which can have a dimension of 128); an example of a specific calculation scenario is shown in Figure 6. Further, assuming that the vector representation of the entire protein sequence is a linear combination, with different weights, of the vectors of the amino acid 3-mers that make up the protein, the specific calculation process of the first feature vector can be shown in the following formula (2):
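[0127] $s = \sum_i \alpha_i h_i$ ; Formula (2)
[0128] where $s$ denotes the first feature vector, $h_i$ denotes the first sub-sequence vector of the i-th amino acid 3-mer, and $\alpha_i$ denotes the weight value (i.e., the first weight value) of the i-th amino acid 3-mer. The specific calculation process for the first weight value can be shown in the following formula (3):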
[0129] ; Formula (3)
[0130] where $w$ is a 128-dimensional constant parameter vector.
[0131] At this point, the calculation of the first feature vector is complete; the calculation of the second feature vector is similar to that of the first feature vector and is not elaborated further here.
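The weighted summation of the per-3-mer vectors can be sketched as follows. The exact form of the weight calculation is not reproduced here, so this example assumes a softmax over the inner products between each 128-dimensional vector and the 128-dimensional parameter vector; that choice, the function name `attention_pool`, and the random inputs standing in for LSTM outputs are all assumptions made for illustration.

```python
import numpy as np


def attention_pool(h: np.ndarray, w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Combine per-3-mer vectors into one protein vector.

    h: array of shape (n, 128), one vector per 3-mer.
    w: array of shape (128,), the parameter vector.
    Assumption: the weights are a softmax over the scores h @ w.
    Returns (weights, pooled_vector).
    """
    scores = h @ w
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights, weights @ h  # weighted sum of the rows of h


rng = np.random.default_rng(0)
h = rng.normal(size=(5, 128))  # random stand-in for LSTM outputs of 5 fragments
w = rng.normal(size=128)       # random stand-in for the parameter vector
alpha, s1 = attention_pool(h, w)
```

Under this assumption the weights are positive and sum to 1, so the protein vector is a convex combination of its fragment vectors.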
[0132] Step S540: Input the first feature vector and the second feature vector into the first classification layer to obtain the first prediction result between the first sub-protein and the second sub-protein.
[0133] Specifically, in practical applications, assuming proteins p1 and p2 are given, the sequence model module can obtain the amino acid sequence-based vector representations s1 (i.e., the first feature vector) and s2 (i.e., the second feature vector) of p1 and p2. Then, the relationship between p1 and p2 is predicted through the first classification layer (also known as the first classifier). Among them, the first classification layer C1 can be implemented by the following formula (3) in the process of classifying s1 and s2:
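[0134] $\hat{y} = \sigma\left(W_{C1}\,[s_1 \oplus s_2] + b_{C1}\right)$ ; Formula (3)
[0135] where $\sigma$ is the sigmoid function; $[s_1 \oplus s_2]$ denotes the concatenation of $s_1$ and $s_2$; and $W_{C1}$ and $b_{C1}$ denote the first classification parameters corresponding to the first classification layer.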
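The first classification layer can be illustrated with a short NumPy sketch: the two protein vectors are concatenated and passed through a linear map followed by a sigmoid. The names `classify_pair`, `weight`, and `bias` are hypothetical, and a single linear layer is an assumption; the text only states that a sigmoid is applied to the concatenated vectors using the classification parameters.

```python
import numpy as np


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))


def classify_pair(s1: np.ndarray, s2: np.ndarray,
                  weight: np.ndarray, bias: float) -> float:
    """Score a protein pair from its two feature vectors.

    Assumption: one linear layer over the concatenation [s1, s2].
    """
    joint = np.concatenate([s1, s2])  # splice the two vectors
    return float(sigmoid(weight @ joint + bias))


rng = np.random.default_rng(0)
s1, s2 = rng.normal(size=128), rng.normal(size=128)
weight = rng.normal(scale=0.1, size=256)  # random stand-in for learned parameters
score = classify_pair(s1, s2, weight, bias=0.0)
```

The same function shape applies to the second classification layer, with the graph-derived vectors in place of the sequence-derived ones.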
[0136] At this point, the calculation of the first prediction result is complete. After obtaining the first prediction result, a first objective loss function can be constructed based on the first prediction result and the actual correlation value; the first objective loss function can be minimized using the stochastic gradient descent method, and its specific form can be shown in the following formula (4):
[0137] ; Formula (4)
[0138] where i = 1, 3, 5, ..., N; $y_i$ denotes the actual correlation value; and $\hat{y}_i$ denotes the interaction between pi and pi+1 predicted by classifier C1, which is also the first prediction result.
[0139] Furthermore, after obtaining the first objective loss function, the first model parameters in the local feature extraction model can be adjusted based on the first objective loss function to obtain the first interaction relationship prediction model. The first model parameters mentioned here may include the first encoding parameters corresponding to the first feature encoding layer, the first mapping parameters corresponding to the first feature mapping layer, the first feature extraction parameters corresponding to the first feature sequence extraction layer, and the first classification parameters corresponding to the first classification layer, etc. Therefore, it can be seen that the process of adjusting the first model parameters in the local feature extraction model based on the first objective loss function to obtain the first interaction relationship prediction model can be achieved in the following way: adjusting the first encoding parameters, first mapping parameters, first feature extraction parameters, and first classification parameters in the local feature extraction model based on the first objective loss function to obtain the first interaction relationship prediction model.
[0140] At this point, the entire training process for the first interaction prediction model has been completed. After obtaining the first interaction prediction model, the first predicted association value between second historical proteins that do not have a direct association can be predicted based on it. The specific calculation process for the first predicted association value is similar to the calculation process for the first prediction result, and will not be elaborated further here. Simultaneously, in the calculation of the first predicted association value, it is first necessary to construct second protein pairs based on the second historical proteins. These second protein pairs can include {(up1,up2),…,(upK,upK+1)}, etc. After obtaining the second protein pairs, the first predicted association value usyi between (up1,up2),…,(upK,upK+1) can be calculated based on the first interaction prediction model.
[0141] In step S130, a second dataset is constructed based on the second historical protein and the first predicted association value, and a global feature extraction model is trained based on the first dataset and the second dataset to obtain a second interaction relationship prediction model.
[0142] In this example embodiment, firstly, a second dataset is constructed based on the second historical proteins and the first predicted association value. Specifically, this can be achieved as follows: First, determine whether the first predicted association value between the second historical proteins is greater than a preset threshold; second, if the first predicted association value is greater than the preset threshold, form second historical protein pairs based on the second historical proteins corresponding to the first predicted association value; then, construct the second dataset based on the second historical protein pairs and the first predicted association value. The resulting second dataset may include: {(up1,up2,usy1), ..., (upK,upK+1,usyK)}, where usyi (i=1,3,...,K) is the predicted label, i.e., the first predicted association value; simultaneously, in the second dataset, second historical protein pairs with usyi ≥ 0.8 need to be selected for the construction of the second dataset; the resulting second dataset can be labeled D1.
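The selection of confidently predicted pairs for the second dataset can be sketched as a simple filter. The function name `select_confident` and the toy scores are hypothetical; the 0.8 cut-off and the "greater than or equal to" comparison follow the usyi ≥ 0.8 condition stated above.

```python
def select_confident(pairs, scores, threshold=0.8):
    """Keep (protein_a, protein_b, score) triples whose score meets the threshold."""
    return [(a, b, s) for (a, b), s in zip(pairs, scores) if s >= threshold]


# Toy unlabeled pairs with made-up predicted association values.
unlabeled_pairs = [("up1", "up2"), ("up3", "up4"), ("up5", "up6")]
predicted = [0.93, 0.41, 0.80]
D1 = select_confident(unlabeled_pairs, predicted)
print(D1)  # [('up1', 'up2', 0.93), ('up5', 'up6', 0.8)]
```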
[0143] Secondly, after obtaining the second dataset, the global feature extraction model can be trained based on the first and second datasets to obtain the second interaction prediction model. Specifically, the training process of the global feature extraction model can be implemented as follows: First, construct an original feature map based on the first and second datasets, and input the original feature map into the global feature extraction model to obtain the second prediction result; second, construct a second target loss function based on the second prediction result, the actual correlation value, and / or the first predicted correlation value, and adjust the second model parameters in the global feature extraction model based on the second target loss function to obtain the second interaction prediction model.
[0144] In one example embodiment, constructing an original feature map based on the first dataset and the second dataset can be achieved as follows: First, obtain the first historical protein pairs and the actual correlation values between them from the first dataset, and the second historical protein pairs and the first predicted correlation values between them from the second dataset; second, use the first sub-protein and the second sub-protein in each first historical protein pair as the first vertex and the second vertex, the actual correlation as the first connecting edge, and the actual correlation value as the first edge weight of the first connecting edge; then, use the third sub-protein and the fourth sub-protein in each second historical protein pair as the third vertex and the fourth vertex, the first predicted correlation as the second connecting edge, and the first predicted correlation value as the second edge weight of the second connecting edge; finally, construct the original feature map based on the first vertex, the second vertex, the third vertex, the fourth vertex, the first connecting edge, the second connecting edge, the first edge weight, and the second edge weight. That is, in practical applications, each sub-protein can be abstracted as a node, and the correlation between sub-proteins can be abstracted as an edge, to construct the original feature map. The resulting original feature map is shown in Figure 7.
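The construction of the original feature map from the two datasets can be sketched with a plain adjacency dictionary. The function name `build_graph` and the toy triples are hypothetical, and treating each edge as undirected is an assumption, since the text does not state edge direction.

```python
def build_graph(labelled, pseudo_labelled):
    """Build a weighted graph from (protein_a, protein_b, value) triples.

    labelled: triples carrying actual correlation values (first dataset).
    pseudo_labelled: triples carrying predicted values (second dataset).
    Assumption: edges are undirected, so each one is stored in both directions.
    """
    adjacency = {}
    for a, b, weight in list(labelled) + list(pseudo_labelled):
        adjacency.setdefault(a, {})[b] = weight
        adjacency.setdefault(b, {})[a] = weight
    return adjacency


D = [("p1", "p2", 1.0), ("p3", "p4", 1.0)]  # toy first dataset
D1 = [("up1", "p2", 0.9)]                     # toy second dataset
graph = build_graph(D, D1)
print(sorted(graph["p2"].items()))  # [('p1', 1.0), ('up1', 0.9)]
```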
[0145] In one example embodiment, inputting the original feature map into the global feature extraction model to obtain a second prediction result can be achieved as follows: First, the original feature map is input into the graph convolutional neural network to update the first and / or second connecting edges in the original feature map, obtaining a target feature map; second, the target feature map is input into the second classification layer to obtain the second prediction result. That is, in practical applications, it is assumed that the protein-protein interaction network (the original feature map) has N1 nodes and M1 edges, and that $h_i^{(t)}$ denotes the vector of node i at step t (the vector dimension is 128); on this premise, the update of the first connecting edge and / or the second connecting edge of the original feature map in the graph convolutional neural network can be shown in the following formulas (5) and (6):
[0146] ; Formula (5)
[0147] ; Formula (6)
[0148] where the activation function is the Leaky ReLU function; the neighbor set of a node is the set of nodes adjacent to that node; the inner product is taken between the vectors of two nodes; the remaining symbols are the parameters of the graph neural network; and the attention weight indicates the strength of the connection between node i and node k. In this example embodiment, first, the parameters of the graph neural network and the initial vector of each node are randomly initialized. Second, the graph convolutional neural network is iterated over the original feature map. Assuming that the maximum value of t is 5, the graph neural network obtains the final vector representation of each node (i.e., each protein), that is, the target feature map, after 5 iterations. The target feature map described here can be represented by g1 and g2.
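The iterative node update can be illustrated with a generic attention-style step. This sketch is not formulas (5) and (6): it assumes that attention scores are Leaky ReLU of the inner product between projected node vectors, normalised by a softmax over each node's neighbors, and that a node's new vector is the attention-weighted sum of its neighbors' projected vectors. The names `attention_step`, `neighbours`, and `W`, the slope 0.2, and the toy graph are all hypothetical, and edge weights are ignored for brevity.

```python
import numpy as np


def leaky_relu(x: np.ndarray, slope: float = 0.2) -> np.ndarray:
    return np.where(x > 0, x, slope * x)


def attention_step(h: np.ndarray, neighbours: list[list[int]],
                   W: np.ndarray) -> np.ndarray:
    """One attention-weighted update of all node vectors (assumed form)."""
    projected = h @ W.T
    updated = np.empty_like(projected)
    for i, nbrs in enumerate(neighbours):
        if not nbrs:  # isolated node: keep its projected vector
            updated[i] = projected[i]
            continue
        scores = leaky_relu(projected[nbrs] @ projected[i])
        scores = scores - scores.max()  # numerical stability
        alpha = np.exp(scores) / np.exp(scores).sum()  # attention weights
        updated[i] = alpha @ projected[nbrs]
    return updated


rng = np.random.default_rng(0)
dim = 128
h = rng.normal(size=(4, dim))                        # random initial node vectors
W = rng.normal(scale=1 / np.sqrt(dim), size=(dim, dim))
neighbours = [[1], [0, 2], [1, 3], [2]]              # toy path graph 0-1-2-3
for _ in range(5):                                   # t runs up to 5, as in the text
    h = attention_step(h, neighbours, W)
```

After the loop, the rows of `h` play the role of the final node vectors that are passed to the second classification layer.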
[0149] In one example embodiment, after obtaining the target feature map, the target feature map can be input into the second classification layer to obtain the second prediction result; wherein, the specific calculation process of the second prediction result can be implemented by the following formula (7):
[0150] $\hat{y} = \sigma\left(W_{C2}\,[g_1 \oplus g_2] + b_{C2}\right)$ ; Formula (7)
[0151] where $\sigma$ is the sigmoid function; $[g_1 \oplus g_2]$ denotes the concatenation of g1 and g2; and $W_{C2}$ and $b_{C2}$ denote the parameters of the second classification layer, also known as the second classification parameters. An example of a specific calculation scenario is shown in Figure 8.
[0152] Meanwhile, after obtaining the second prediction result, the second target loss function can be constructed based on the second prediction result and the actual correlation value and / or the first predicted correlation value; that is, the loss function (second target loss function) can be minimized using the stochastic gradient descent method based on D∪D1; the specific form of the second target loss function can be shown in the following formula (8):
[0153] ; Formula (8)
[0154] where i = 1, 3, ..., L, and L = the number of elements in the training dataset + the number of elements in D1; $\hat{y}_i$ is the interaction relationship between pi and pi+1, or between upK and upK+1, predicted by the second classification layer C2, that is, the second prediction result; and $y_i$ can be used to represent the actual correlation value and / or the first predicted correlation value. In the process of adjusting the second model parameters in the global feature extraction model through the second objective loss function, the second model parameters may include the parameters corresponding to the graph convolutional neural network as well as the second classification parameters corresponding to the second classification layer.
[0155] In step S140, a target relationship prediction model is constructed based on the first interaction relationship prediction model and the second interaction relationship prediction model.
[0156] Specifically, the target relationship prediction model can be constructed based on the first and second interaction relationship prediction models as follows: First, the second historical protein is input into the second interaction relationship prediction model to obtain the second predicted association value, and a third dataset is constructed based on the second historical protein and the second predicted association value. Second, the local feature extraction model is trained based on the first and third datasets to obtain an updated first interaction relationship prediction model. Then, the target relationship prediction model is constructed based on the updated first interaction relationship prediction model and the second interaction relationship prediction model. That is, in practical applications, the second predicted association value ugyi of the second historical protein can be predicted based on the second interaction relationship prediction model, thereby obtaining the third dataset {(up1,up2,ugy1),…,(upK,upK+1,ugyK)}, which can be denoted as D2. During the construction of the third dataset, only the second historical protein pairs with ugyi ≥ 0.8 are selected. Then, by repeating the iteration, the updated first and second interaction relationship prediction models can be obtained, thus yielding the target relationship prediction model.
[0157] The specific generation process of the target relationship prediction model is further explained below with a specific example. First, given training data {(p1,p2,y1), (p3,p4,y3),…,(pN,pN+1,yN)}, denoted as D, and unlabeled data {(up1,up2),…,(upK,upK+1)}, let D2 be initially empty. Second, based on the training dataset D∪D2, the loss function over the pairs (i=1,3,…,N) is minimized using stochastic gradient descent to learn the parameters of the embedding layer, the sequence model, and classifier C1, where $\hat{y}_i$ denotes the interaction between pi and pi+1 predicted by classifier C1. Next, the trained parameters are used to predict the unlabeled dataset to obtain {(up1,up2,usy1),…,(upK,upK+1,usyK)}, where usyi (i=1,3,…,K) are the predicted labels; the items with usyi ≥ 0.8 are selected to obtain a new dataset, denoted as D1. Then, based on D∪D1, the loss function over the pairs (i=1,3,…,L, where L = the number of elements in the training dataset + the number of elements in D1) is minimized using stochastic gradient descent to learn the parameters of the graph neural network and classifier C2, where $\hat{y}_i$ denotes the interaction between pi and pi+1 predicted by classifier C2. The trained parameters are then used to predict the unlabeled dataset to obtain {(up1,up2,ugy1),…,(upK,upK+1,ugyK)}, where ugyi (i=1,3,…,K) are the predicted labels; the items with ugyi ≥ 0.8 are selected to obtain a new dataset, denoted as D2. Finally, the above steps are repeated until the model converges or the maximum number of iterations is reached (the specific value could be 100 or another value; this example does not impose special restrictions). It should be noted that because both global and local features of the protein are considered during model training, the problem of low model accuracy caused by utilizing only local protein information while ignoring global information in existing technologies is solved. Furthermore, because the model can be iteratively trained on unlabeled data, the accuracy of the obtained relationship prediction model can be improved by expanding the amount of training data.
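The alternation between the sequence model and the graph model can be sketched as a loop over two training callables. Everything named here is hypothetical: `fit_sequence` and `fit_graph` stand for whatever routines train the two models and return a scoring function, and stopping when the selected set no longer changes is only one assumed reading of "until the model converges".

```python
def pseudo_label(model, unlabeled, threshold):
    """Score every unlabeled pair and keep those at or above the threshold."""
    kept = []
    for a, b in unlabeled:
        score = model(a, b)
        if score >= threshold:
            kept.append((a, b, score))
    return kept


def co_train(labelled, unlabeled, fit_sequence, fit_graph,
             threshold=0.8, max_rounds=100):
    """Alternate training of the two models on each other's pseudo-labels.

    fit_sequence / fit_graph: callables taking a list of
    (protein_a, protein_b, value) triples and returning a function
    model(protein_a, protein_b) -> score in [0, 1].
    """
    d2 = []  # pseudo-labels from the graph model, empty at the start
    seq_model = graph_model = None
    for _ in range(max_rounds):
        seq_model = fit_sequence(labelled + d2)             # train on D and D2
        d1 = pseudo_label(seq_model, unlabeled, threshold)
        graph_model = fit_graph(labelled + d1)              # train on D and D1
        new_d2 = pseudo_label(graph_model, unlabeled, threshold)
        if new_d2 == d2:  # assumed convergence test: selection is stable
            break
        d2 = new_d2
    return seq_model, graph_model, d2
```

The two returned models correspond to the updated first and second interaction relationship prediction models that are combined at prediction time.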
[0158] At this point, the training method for the relationship prediction model described in the exemplary embodiments of this disclosure has been fully implemented.
[0159] This disclosure also provides an example embodiment of a method for predicting interaction relationships. This method can run on a server, a server cluster, a cloud server, or the like. Of course, those skilled in the art can also run the method of this disclosure on other platforms as needed, and this exemplary embodiment does not impose any special limitations on this. Specifically, as shown in Figure 9, the method for predicting the interaction relationship may include the following steps:
[0160] Step S910. Obtain the first protein to be predicted and the second protein to be predicted, and construct a protein pair to be predicted based on the first protein to be predicted and the second protein to be predicted;
[0161] Step S920. Input the protein pair to be predicted into the target relationship prediction model to obtain the relationship prediction result; wherein, the target relationship prediction model is trained by any of the above-mentioned relationship prediction model training methods; the target relationship prediction model includes a first interaction relationship prediction model and a second interaction relationship prediction model;
[0162] Step S930. Based on the relationship prediction results, determine the interaction relationship between the first protein to be predicted and the second protein to be predicted.
[0163] In one example embodiment, inputting the protein pair to be predicted into the target relationship prediction model to obtain the relationship prediction result can be achieved as follows: First, the protein pair to be predicted is input into the first interaction relationship prediction model to obtain the first relationship prediction result, and into the second interaction relationship prediction model to obtain the second relationship prediction result; second, the first relationship prediction result and the second relationship prediction result are weighted and summed to obtain the relationship prediction result. That is, given a new protein pi (the first protein to be predicted) and a protein pj (the second protein to be predicted), pi and pj can first be input into the first interaction relationship prediction model to obtain the first relationship prediction result y1, then input into the second interaction relationship prediction model to obtain the second relationship prediction result y2, and finally the relationship prediction result y is calculated; the specific calculation process of the relationship prediction result can be achieved by the following formula (9):
[0164] $y = \lambda_1 y_1 + \lambda_2 y_2$, with $\lambda_1 + \lambda_2 = 1$ ; Formula (9)
[0165] Furthermore, if y > 0.5, an interaction relationship is determined to exist between pi and pj; otherwise, no interaction relationship exists. It should be noted that in practical applications, the weights of the first relationship prediction result y1 and the second relationship prediction result y2 can be determined according to actual needs, as long as the sum of the weights is 1. This example does not impose any special restrictions on this.
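The weighted summation and the decision rule can be sketched as follows. The function name `combine_predictions` and the default weight of 0.5 are hypothetical; the text only requires that the two weights sum to 1 and that an interaction is reported when the combined value exceeds 0.5.

```python
def combine_predictions(y1: float, y2: float,
                        weight_local: float = 0.5,
                        threshold: float = 0.5) -> tuple[float, bool]:
    """Weighted sum of the two model outputs, plus the interaction decision.

    weight_local is the weight of y1; y2 receives 1 - weight_local,
    so the two weights always sum to 1.
    """
    if not 0.0 <= weight_local <= 1.0:
        raise ValueError("weight_local must lie in [0, 1]")
    y = weight_local * y1 + (1.0 - weight_local) * y2
    return y, y > threshold


y, interacts = combine_predictions(0.9, 0.3)
print(round(y, 2), interacts)  # 0.6 True
```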
[0166] In one example embodiment, in practical applications, obtaining the first protein to be predicted and the second protein to be predicted can be achieved as follows: In response to a touch operation on a first interactive control on the display interface, the user identifier of the user to be predicted is obtained, wherein the first interactive control includes a user identifier input box; in response to a touch operation on a second interactive control on the display interface, the first protein to be predicted associated with the user to be predicted, and the second protein to be predicted associated with the first protein to be predicted, are obtained based on the user identifier, wherein the second interactive control includes a relationship prediction interactive control. That is, in scenarios involving disease-assisted diagnosis or disease prediction, when a doctor needs to predict, based on the first protein to be predicted, whether a patient (the user to be predicted) has or is likely to have a certain disease, the doctor can input the patient's user identifier on the display interface of the hospital information system; the user identifier can be the patient's user code at the hospital, or the patient's ID card number or phone number, etc., and this example does not impose any special restrictions on this. Then, the first protein to be predicted can be obtained from the database based on the user identifier; and if a certain disease is to be predicted, the second protein to be predicted associated with that disease can be obtained from the database.
[0167] In one example embodiment, during practical application, after determining the interaction relationship between the first and second proteins to be predicted, the method for predicting this interaction relationship may further include: generating a relationship prediction report associated with the user to be predicted based on the interaction relationship, and displaying the relationship prediction report on the display interface. That is, the relationship prediction report can be displayed on the display interface, allowing doctors to comprehensively assess whether the patient has a certain disease based on the relationship prediction report. This approach not only fully utilizes the patient's protein data (the first protein to be predicted), but also allows for a comprehensive assessment of the patient's disease based on multiple different dimensions, thereby improving the accuracy of the obtained disease diagnosis and treatment results.
[0168] In the above-mentioned method for predicting interaction relationships, on the one hand, since both global and local features of the protein are considered during the model training process, the problem of low model accuracy caused by only using local information of the protein and ignoring global information in the existing technology is solved; on the other hand, the accuracy of the prediction results obtained by predicting relationships through the target relationship prediction model is improved, that is, the accuracy of the obtained relationship prediction results is improved.
[0169] It should be further explained here that, in practical applications, whether an interaction exists between the first protein to be predicted and the second protein to be predicted can be used as an aid to diagnosis and treatment, for example in determining whether a certain disease is present, whether an allergic reaction will occur, or how severe the symptoms of a certain disease are.
[0170] The following are embodiments of the apparatus disclosed herein, which can be used to execute embodiments of the method disclosed herein. For details not disclosed in the apparatus embodiments of this disclosure, please refer to the embodiments of the method disclosed herein.
[0171] This disclosure also provides an example embodiment of a training apparatus for a relationship prediction model. Specifically, as shown in Figure 10, the training device for the relationship prediction model may include a first dataset construction module 1010, a first model training module 1020, a second model training module 1030, and a target relationship prediction model construction module 1040. Wherein:
[0172] The first dataset construction module 1010 can be used to extract the first historical proteins that are related and the second historical proteins that are not related from historical biological data, and construct the first dataset based on the first historical proteins and the actual correlation values between the first historical proteins.
[0173] The first model training module 1020 can be used to train the local feature extraction model based on the first dataset to obtain the first interaction relationship prediction model, and input the second historical protein into the first interaction relationship prediction model to obtain the first predicted association value.
[0174] The second model training module 1030 can be used to construct a second dataset based on the second historical protein and the first predicted association value, and to train the global feature extraction model based on the first dataset and the second dataset to obtain a second interaction relationship prediction model.
[0175] The target relationship prediction model construction module 1040 can be used to construct a target relationship prediction model based on the first interaction relationship prediction model and the second interaction relationship prediction model.
[0176] In one exemplary embodiment of this disclosure, constructing a first dataset based on first historical proteins and actual correlation values between first historical proteins includes: forming first historical protein pairs based on first historical proteins with correlation, and constructing the first dataset based on the first historical protein pairs and actual correlation values between first historical proteins in the first historical protein pairs.
[0177] In one exemplary embodiment of this disclosure, training a local feature extraction model based on a first dataset to obtain a first interaction relationship prediction model includes: inputting a first historical protein pair from the first dataset into the local feature extraction model to obtain a first prediction result between the first historical proteins in the first historical protein pair; constructing a first target loss function based on the first prediction result and the actual correlation value; and adjusting the first model parameters in the local feature extraction model based on the first target loss function to obtain the first interaction relationship prediction model.
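As an illustration of building a loss from the first prediction result and the actual correlation value, the sketch below uses binary cross-entropy. This section does not name the loss function, so the choice of binary cross-entropy, the function name and the sample values are all assumptions.

```python
import math
from typing import Sequence


def binary_cross_entropy(predictions: Sequence[float],
                         labels: Sequence[float],
                         eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between predicted scores and 0/1 labels.

    Assumed loss for illustration; the source only states that a loss is
    constructed from the prediction result and the actual value.
    """
    if not predictions or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    total = 0.0
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(predictions)


# Hypothetical first prediction results and actual correlation values.
loss_good = binary_cross_entropy([0.9, 0.2], [1.0, 0.0])
loss_bad = binary_cross_entropy([0.3, 0.7], [1.0, 0.0])
```

Predictions that agree with the labels give a smaller loss, which is the signal used when the first model parameters are adjusted.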
[0178] In an exemplary embodiment of this disclosure, the local feature extraction model includes a first feature encoding layer, a first feature mapping layer, a first feature sequence extraction layer, and a first classification layer; the first historical protein pair includes a first sub-protein and a second sub-protein; wherein inputting the first historical protein pair in the first dataset into the local feature extraction model to obtain a first prediction result between the first historical proteins in the first historical protein pair includes: obtaining the first amino acid sequence of the first sub-protein and the second amino acid sequence of the second sub-protein in the first historical protein pair in the first dataset, and constructing a first amino acid vector and a second amino acid vector based on the first amino acid fragment in the first amino acid sequence and the second amino acid fragment in the second amino acid sequence; using the first feature encoding layer to perform one-hot encoding on the first amino acid vector and the second amino acid vector to obtain the first encoding result and the second encoding result; using the first feature mapping layer to perform dense mapping on the first encoding result and the second encoding result to obtain the first dense mapping result and the second dense mapping result; using the first feature sequence extraction layer to perform temporal feature extraction on the first dense mapping result and the second dense mapping result to obtain the first feature vector of the first sub-protein based on the first amino acid sequence and the second feature vector of the second sub-protein based on the second amino acid sequence; and inputting the first feature vector and the second feature vector into the first classification layer to obtain the first prediction result between the first sub-protein and the second sub-protein.
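The first two layers of this pipeline, one-hot encoding followed by dense mapping, can be sketched as below. The 20-letter alphabet, the embedding width `EMBED_DIM`, the randomly initialised matrix `W` and the fragment "MKV" are illustrative assumptions; the sequence extraction and classification layers are omitted.

```python
import random
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(fragment: str) -> List[List[float]]:
    """Encode each residue as a 20-dimensional indicator vector."""
    rows = []
    for aa in fragment:
        row = [0.0] * len(AMINO_ACIDS)
        row[INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def dense_map(encoded: List[List[float]],
              weights: List[List[float]]) -> List[List[float]]:
    """Matrix product (L x 20) @ (20 x d): maps sparse codes to dense vectors."""
    dim = len(weights[0])
    return [[sum(row[k] * weights[k][j] for k in range(len(weights)))
             for j in range(dim)] for row in encoded]


rng = random.Random(0)   # fixed seed so the sketch is reproducible
EMBED_DIM = 4            # hypothetical dense width
W = [[rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in AMINO_ACIDS]

encoded = one_hot("MKV")        # hypothetical amino acid fragment
mapped = dense_map(encoded, W)  # one dense vector per residue
```

Because each one-hot row contains a single 1, the product simply selects the matching row of `W`, which is why this step is often implemented as an embedding lookup.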
[0179] In one exemplary embodiment of this disclosure, the first feature sequence extraction layer includes multiple feature sequence prediction networks and a first splicing layer, wherein the number of feature sequence prediction networks is consistent with the number of first amino acid fragments in the first amino acid vector; wherein, using the first feature sequence extraction layer to extract temporal features from the first dense mapping result to obtain a first feature vector of the first sub-protein based on the first amino acid sequence includes: according to the fragment position of the first amino acid fragment in the first amino acid vector, inputting the first sub-mapping result corresponding to each first amino acid fragment in the first dense mapping result into the feature sequence prediction network corresponding to the fragment position to obtain a first sub-sequence vector; calculating a first weight value of the first sub-sequence vector, and performing a weighted summation of the first sub-sequence vector and the first weight value through the first splicing layer to obtain the first feature vector of the first sub-protein based on the first amino acid sequence.
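The weighted summation of sub-sequence vectors can be sketched as follows. How the first weight value is calculated is not specified in this paragraph, so the sketch assumes a softmax over dot-product scores against a hypothetical query vector; the names `softmax`, `weighted_sum` and `query` are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax; the outputs are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(sub_vectors: Sequence[Sequence[float]],
                 query: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Fuse per-fragment vectors into one feature vector.

    Each sub-sequence vector is scored by its dot product with `query`
    (an assumed scoring rule), the scores are normalised into weights,
    and the vectors are summed with those weights.
    """
    scores = [sum(v * q for v, q in zip(vec, query)) for vec in sub_vectors]
    weights = softmax(scores)
    dim = len(sub_vectors[0])
    fused = [sum(w * vec[j] for w, vec in zip(weights, sub_vectors))
             for j in range(dim)]
    return fused, weights


# Two hypothetical sub-sequence vectors, one per amino acid fragment.
fused, weights = weighted_sum([[1.0, 0.0], [0.0, 1.0]], query=[0.0, 0.0])
```

With a zero query every fragment receives the same weight, so the fused vector is the plain average of the sub-sequence vectors.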
[0180] In one exemplary embodiment of this disclosure, the first model parameters include a first encoding parameter corresponding to the first feature encoding layer, a first mapping parameter corresponding to the first feature mapping layer, a first feature extraction parameter corresponding to the first feature sequence extraction layer, and a first classification parameter corresponding to the first classification layer; adjusting the first model parameters in the local feature extraction model based on the first target loss function to obtain the first interaction relationship prediction model includes: adjusting the first encoding parameter, the first mapping parameter, the first feature extraction parameter, and the first classification parameter in the local feature extraction model based on the first target loss function to obtain the first interaction relationship prediction model.
[0181] In one exemplary embodiment of this disclosure, constructing a second dataset based on the second historical proteins and a first predicted association value includes: determining whether the first predicted association value between the second historical proteins is greater than a preset threshold; when it is determined that the first predicted association value is greater than the preset threshold, forming a second historical protein pair based on the second historical proteins corresponding to the first predicted association value; and constructing the second dataset based on the second historical protein pair and the first predicted association value.
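The threshold filter that produces the second dataset can be sketched as below. The threshold of 0.5, the protein identifiers and the function name `build_second_dataset` are assumptions for illustration; the source only requires a preset threshold.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def build_second_dataset(predicted: Dict[Pair, float],
                         threshold: float = 0.5) -> List[Tuple[Pair, float]]:
    """Keep only pairs whose first predicted association value exceeds the threshold."""
    return [(pair, score) for pair, score in predicted.items() if score > threshold]


# Hypothetical first predicted association values between second historical proteins.
predicted = {
    ("P1", "P2"): 0.92,
    ("P1", "P3"): 0.31,
    ("P2", "P4"): 0.50,
}
second_dataset = build_second_dataset(predicted, threshold=0.5)
```

A pair scoring exactly at the threshold is excluded, because the condition in the text is "greater than" the preset threshold.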
[0182] In one exemplary embodiment of this disclosure, training a global feature extraction model based on a first dataset and a second dataset to obtain a second interaction relationship prediction model includes: constructing an original feature map based on the first dataset and the second dataset, and inputting the original feature map into the global feature extraction model to obtain a second prediction result; constructing a second target loss function based on the second prediction result and the actual association value and / or the first predicted association value, and adjusting the second model parameters in the global feature extraction model based on the second target loss function to obtain the second interaction relationship prediction model.
[0183] In one exemplary embodiment of this disclosure, constructing an original feature map based on the first dataset and the second dataset includes: obtaining a first historical protein pair and the actual correlation value between the first historical protein pairs in the first dataset, and a second historical protein pair and the first predicted correlation value between the second historical protein pairs in the second dataset; using the first sub-protein and the second sub-protein in the first historical protein pair as the first vertex and the second vertex, the actual correlation as the first connecting edge, and the actual correlation value as the first edge weight of the first connecting edge; using the third sub-protein and the fourth sub-protein in the second historical protein pair as the third vertex and the fourth vertex, the first predicted correlation as the second connecting edge, and the first predicted correlation value as the second edge weight of the second connecting edge; and constructing the original feature map based on the first vertex, the second vertex, the third vertex, the fourth vertex, the first connecting edge, the second connecting edge, the first edge weight, and the second edge weight.
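The graph construction described above can be sketched as an undirected weighted graph. The data layout (pairs with a value), the identifiers and the rule that an actual value overrides a predicted one for the same pair are assumptions made for this example.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Pair = Tuple[str, str]
Sample = Tuple[Pair, float]
Edge = FrozenSet[str]


def build_feature_graph(first_dataset: Iterable[Sample],
                        second_dataset: Iterable[Sample]
                        ) -> Tuple[List[str], Dict[Edge, float]]:
    """Build vertices and weighted edges from both datasets.

    Proteins become vertices; each pair becomes an undirected edge whose
    weight is the actual value (first dataset) or the first predicted
    value (second dataset).
    """
    vertices: Set[str] = set()
    edges: Dict[Edge, float] = {}
    # Predicted edges are added first so that, if a pair appears in both
    # datasets, the actual value overrides it (an assumption of this sketch).
    for dataset in (second_dataset, first_dataset):
        for (a, b), weight in dataset:
            vertices.update((a, b))
            edges[frozenset((a, b))] = weight
    return sorted(vertices), edges


first = [(("P1", "P2"), 1.0)]    # hypothetical actual correlation value
second = [(("P2", "P3"), 0.8)]   # hypothetical first predicted value
vertices, edges = build_feature_graph(first, second)
```

Using a `frozenset` as the edge key makes the edge between P2 and P3 the same entry regardless of the order in which the two proteins are listed.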
[0184] In one exemplary embodiment of this disclosure, the global feature extraction model includes a graph convolutional neural network and a second classification layer; wherein, inputting the original feature map into the global feature extraction model to obtain a second prediction result includes: inputting the original feature map into the graph convolutional neural network, updating the first connecting edge and / or the second connecting edge in the original feature map to obtain a target feature map; and inputting the target feature map into the second classification layer to obtain the second prediction result.
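For orientation, one propagation step of a standard graph convolutional layer can be sketched as below, using symmetric normalisation of the weighted adjacency matrix with self-loops. This is the common textbook rule, which updates vertex features; the way the network in this disclosure updates connecting edges is not specified here and is not reproduced. All names and values are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Return D^(-1/2) (A + I) D^(-1/2) for a non-negative weighted adjacency."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]  # at least 1 because of the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One layer: ReLU(normalized_adjacency @ features @ weights)."""
    propagated = matmul(normalize_adjacency(adj), features)
    transformed = matmul(propagated, weights)
    return [[max(0.0, x) for x in row] for row in transformed]


# Two proteins joined by one edge of weight 1 (hypothetical graph).
adj = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0]]
weights = [[1.0], [1.0]]
output = gcn_layer(adj, features, weights)
```

Each vertex ends up with a mix of its own features and those of its neighbours, which is how topological (global) information enters the representation.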
[0185] In one exemplary embodiment of this disclosure, constructing a target relationship prediction model based on the first interaction relationship prediction model and the second interaction relationship prediction model includes: inputting the second historical protein into the second interaction relationship prediction model to obtain a second predicted association value, and constructing a third dataset based on the second historical protein and the second predicted association value; training a local feature extraction model based on the first dataset and the third dataset to obtain an updated first interaction relationship prediction model; and constructing a target relationship prediction model based on the updated first interaction relationship prediction model and the second interaction relationship prediction model.
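The alternating training flow described above can be sketched as a short loop. The trainer callables, the toy stand-in models, the threshold of 0.5 and the number of rounds are all hypothetical; real trainers would fit the local and global feature extraction models.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]
Sample = Tuple[Pair, float]
Model = Callable[[Pair], float]


def co_train(first_dataset: List[Sample],
             unlabeled_pairs: Sequence[Pair],
             train_local: Callable[[List[Sample]], Model],
             train_global: Callable[[List[Sample], List[Sample]], Model],
             rounds: int = 1,
             threshold: float = 0.5) -> Tuple[Model, Model]:
    """Alternate between the local and global models for `rounds` iterations."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    local_model = train_local(first_dataset)
    global_model = None
    for _ in range(rounds):
        # Second dataset: local predictions above the preset threshold.
        scored = [(pair, local_model(pair)) for pair in unlabeled_pairs]
        second_dataset = [(pair, s) for pair, s in scored if s > threshold]
        global_model = train_global(first_dataset, second_dataset)
        # Third dataset: global predictions on the same unlabeled pairs.
        third_dataset = [(pair, global_model(pair)) for pair in unlabeled_pairs]
        local_model = train_local(first_dataset + third_dataset)
    return local_model, global_model


def toy_train_local(dataset: List[Sample]) -> Model:
    """Stand-in trainer: predicts the mean label of its training data."""
    mean = sum(label for _, label in dataset) / len(dataset)
    return lambda pair: mean


def toy_train_global(first: List[Sample], second: List[Sample]) -> Model:
    """Stand-in trainer: scores pairs seen in the second dataset higher."""
    known = {pair for pair, _ in second}
    return lambda pair: 0.8 if pair in known else 0.1


first = [(("A", "B"), 1.0)]
unlabeled = [("A", "C"), ("B", "C")]
local_model, global_model = co_train(first, unlabeled,
                                     toy_train_local, toy_train_global)
```

The two returned models correspond to the updated first interaction relationship prediction model and the second interaction relationship prediction model, which together form the target relationship prediction model.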
[0186] This disclosure also provides an example embodiment of an apparatus for predicting interaction relationships. Specifically, as shown in Figure 11, the device for predicting the interaction relationship may include a protein pair construction module 1110, a relationship prediction result determination module 1120, and an interaction relationship determination module 1130. Wherein:
[0187] The protein pair construction module 1110 can be used to obtain a first protein to be predicted and a second protein to be predicted, and to construct a protein pair to be predicted based on the first protein to be predicted and the second protein to be predicted.
[0188] The relationship prediction result determination module 1120 can be used to input the protein pair to be predicted into the target relationship prediction model to obtain the relationship prediction result, wherein the target relationship prediction model is trained by any of the above-mentioned relationship prediction model training methods.
[0189] The interaction relationship determination module 1130 can be used to determine the interaction relationship between the first protein to be predicted and the second protein to be predicted based on the relationship prediction result.
[0190] In one exemplary embodiment of this disclosure, the target relationship prediction model includes a first interaction relationship prediction model and a second interaction relationship prediction model; wherein, inputting the protein pair to be predicted into the target relationship prediction model to obtain a relationship prediction result includes: inputting the protein pair to be predicted into the first interaction relationship prediction model to obtain a first relationship prediction result, and inputting the protein pair to be predicted into the second interaction relationship prediction model to obtain a second relationship prediction result; and performing a weighted summation of the first relationship prediction result and the second relationship prediction result to obtain the relationship prediction result.
[0191] In one exemplary embodiment of this disclosure, obtaining a first protein to be predicted and a second protein to be predicted includes: obtaining a user identifier of a user to be predicted in response to a touch operation on a first interactive control on a display interface; wherein the first interactive control includes a user identifier input box; and obtaining a first protein to be predicted associated with the user to be predicted and a second protein to be predicted associated with the first protein to be predicted based on the user identifier in response to a touch operation on a second interactive control on the display interface; wherein the second interactive control includes a relationship prediction interactive control.
[0192] In one exemplary embodiment of this disclosure, the device for predicting the interaction relationship further includes:
[0193] The relationship prediction report display module can be used to generate a relationship prediction report associated with the user to be predicted based on the interaction relationship, and display the relationship prediction report on the display interface.
[0194] The specific details of each module in the training device of the above-mentioned relationship prediction model and the prediction device of interaction relationship have been described in detail in the corresponding training method of the relationship prediction model and the prediction method of interaction relationship, so they will not be repeated here.
[0195] It should be noted that although several modules or units for the device used to perform actions have been mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of this disclosure, the features and functions of two or more modules or units described above can be embodied in one module or unit. Conversely, the features and functions of one module or unit described above can be further divided and embodied by multiple modules or units.
[0196] Furthermore, although the steps of the method in this disclosure are described in a specific order in the accompanying drawings, this does not require or imply that the steps must be performed in that specific order, or that all the steps shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step, and / or one step may be broken down into multiple steps.
[0197] In an exemplary embodiment of this disclosure, an electronic device capable of implementing the above-described method is also provided.
[0198] Those skilled in the art will understand that various aspects of this disclosure can be implemented as systems, methods, or program products. Therefore, various aspects of this disclosure can be specifically implemented in the following forms: entirely in hardware, entirely in software (including firmware, microcode, etc.), or in a combination of hardware and software, collectively referred to herein as “circuit,” “module,” or “system.”
[0199] An electronic device 1200 according to such an embodiment of the present disclosure is described below with reference to Figure 12. The electronic device 1200 shown in Figure 12 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments disclosed herein.
[0200] As shown in Figure 12, the electronic device 1200 takes the form of a general-purpose computing device. The components of the electronic device 1200 may include, but are not limited to: at least one processing unit 1210, at least one storage unit 1220, a bus 1230 connecting different system components (including storage unit 1220 and processing unit 1210), and a display unit 1240.
[0201] The storage unit 1220 stores program code that can be executed by the processing unit 1210, causing the processing unit 1210 to perform the steps described in the "Exemplary Methods" section of this specification according to various exemplary embodiments of this disclosure. For example, the processing unit 1210 can perform the steps shown in Figure 1: Step S110: Extract first historical proteins with correlation and second historical proteins without correlation from historical biological data, and construct a first dataset based on the first historical proteins and the actual correlation values between them; Step S120: Train a local feature extraction model based on the first dataset to obtain a first interaction relationship prediction model, and input the second historical proteins into the first interaction relationship prediction model to obtain a first predicted correlation value; Step S130: Construct a second dataset based on the second historical proteins and the first predicted correlation value, and train a global feature extraction model based on the first dataset and the second dataset to obtain a second interaction relationship prediction model; Step S140: Construct a target relationship prediction model based on the first interaction relationship prediction model and the second interaction relationship prediction model.
[0202] As another example, the processing unit 1210 can perform the steps shown in Figure 9: Step S910: Obtain the first protein to be predicted and the second protein to be predicted, and construct a protein pair to be predicted based on the first protein to be predicted and the second protein to be predicted; Step S920: Input the protein pair to be predicted into the target relationship prediction model to obtain the relationship prediction result, wherein the target relationship prediction model is trained by any of the above-mentioned relationship prediction model training methods; Step S930: Determine the interaction relationship between the first protein to be predicted and the second protein to be predicted based on the relationship prediction result.
[0203] Storage unit 1220 may include readable media in the form of volatile storage units, such as random access memory (RAM) 12201 and / or cache memory 12202, and may further include read-only memory (ROM) 12203.
[0204] Storage unit 1220 may also include a program / utility 12204 having a set (at least one) of program modules 12205, such program modules 12205 including but not limited to: operating system, one or more application programs, other program modules and program data, each or some combination of these examples may include an implementation of a network environment.
[0205] Bus 1230 can represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
[0206] Electronic device 1200 can also communicate with one or more external devices 1300 (e.g., keyboard, pointing device, Bluetooth device, etc.), one or more devices that enable a user to interact with electronic device 1200, and / or any device that enables electronic device 1200 to communicate with one or more other computing devices (e.g., router, modem, etc.). This communication can be performed via input / output (I / O) interface 1250. Furthermore, electronic device 1200 can also communicate with one or more networks (e.g., local area network (LAN), wide area network (WAN), and / or public networks, such as the Internet) via network adapter 1260. As shown, network adapter 1260 communicates with other modules of electronic device 1200 via bus 1230. It should be understood that, although not shown in the figures, other hardware and / or software modules can be used in conjunction with electronic device 1200, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
[0207] From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein can be implemented by software or by combining software with necessary hardware. Therefore, the technical solutions according to the embodiments of this disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, external hard drive, etc.) or on a network, including several instructions to cause a computing device (such as a personal computer, server, terminal device, or network device, etc.) to execute the methods according to the embodiments of this disclosure.
[0208] In exemplary embodiments of this disclosure, a computer-readable storage medium is also provided, on which a program product capable of implementing the methods described above is stored. In some possible implementations, various aspects of this disclosure may also be implemented as a program product including program code that, when the program product is run on a terminal device, causes the terminal device to perform the steps of the various exemplary embodiments of this disclosure described in the "Exemplary Methods" section above.
[0209] Referring to Figure 10, a program product 800 for implementing the above-described method according to an embodiment of the present disclosure is described. This product may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto. In this document, the readable storage medium may be any tangible medium containing or storing a program that may be used by or in conjunction with an instruction execution system, apparatus, or device.
[0210] The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of readable storage media (a non-exhaustive list) include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.
[0211] Computer-readable signal media may include data signals propagated in baseband or as part of a carrier wave, carrying readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium, capable of sending, propagating, or transmitting programs for use by or in conjunction with an instruction execution system, apparatus, or device.
[0212] The program code contained on the readable medium may be transmitted using any suitable medium, including but not limited to wireless, wired, optical fiber, RF, etc., or any suitable combination thereof.
[0213] Program code for performing the operations of this disclosure can be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as C or similar languages. The program code can execute entirely on the user's computing device, partially on the user's computing device, as a standalone software package, partially on the user's computing device and partially on a remote computing device, or entirely on a remote computing device or server. In cases involving remote computing devices, the remote computing device can be connected to the user's computing device via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (e.g., via the Internet using an Internet service provider).
[0214] Furthermore, the above figures are merely illustrative of the processes included in the method according to exemplary embodiments of this disclosure and are not intended to be limiting. It is readily understood that the processes shown in the above figures do not indicate or limit the temporal order of these processes. Additionally, it is readily understood that these processes may be executed synchronously or asynchronously, for example, in multiple modules.
[0215] Other embodiments of this disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention described herein. This application is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not invented by this disclosure. The specification and embodiments are to be considered exemplary only, and the true scope and spirit of this disclosure are indicated by the claims.
Claims
1. A training method for a relationship prediction model, characterized in that it comprises: extracting first historical proteins that are related and second historical proteins that are not related from historical biological data, and constructing a first dataset based on the first historical proteins and the actual correlation values between the first historical proteins;
inputting a first historical protein pair in the first dataset into a local feature extraction model to obtain a first prediction result between the first historical proteins in the first historical protein pair; constructing a first target loss function based on the first prediction result and the actual correlation value, and adjusting first model parameters in the local feature extraction model based on the first target loss function to obtain a first interaction relationship prediction model; and inputting the second historical proteins into the first interaction relationship prediction model to obtain a first predicted association value;
determining whether the first predicted association value between the second historical proteins is greater than a preset threshold; when the first predicted association value is determined to be greater than the preset threshold, forming a second historical protein pair based on the second historical proteins corresponding to the first predicted association value; constructing a second dataset based on the second historical protein pair and the first predicted association value, and constructing an original feature map based on the first dataset and the second dataset;
inputting the original feature map into a global feature extraction model to obtain a second prediction result; constructing a second target loss function based on the second prediction result and the actual association value and / or the first predicted association value, and adjusting second model parameters in the global feature extraction model based on the second target loss function to obtain a second interaction relationship prediction model;
inputting the second historical proteins into the second interaction relationship prediction model to obtain a second predicted association value, and constructing a third dataset based on the second historical proteins and the second predicted association value; training the local feature extraction model based on the first dataset and the third dataset to obtain an updated first interaction relationship prediction model; repeating this process iteratively, and constructing a target relationship prediction model based on the updated first interaction relationship prediction model and the second interaction relationship prediction model;
wherein the first prediction result is obtained as follows: obtaining the first amino acid sequence of the first sub-protein and the second amino acid sequence of the second sub-protein in the first historical protein pair in the first dataset, and constructing a first amino acid vector and a second amino acid vector based on the first amino acid fragment in the first amino acid sequence and the second amino acid fragment in the second amino acid sequence; using the first feature encoding layer to perform one-hot encoding on the first amino acid vector and the second amino acid vector to obtain a first encoding result and a second encoding result; using the first feature mapping layer to perform dense mapping on the first encoding result and the second encoding result to obtain a first dense mapping result and a second dense mapping result; using the first feature sequence extraction layer to extract temporal features from the first dense mapping result and the second dense mapping result to obtain a first feature vector of the first sub-protein based on the first amino acid sequence and a second feature vector of the second sub-protein based on the second amino acid sequence; and inputting the first feature vector and the second feature vector into the first classification layer to obtain the first prediction result between the first sub-protein and the second sub-protein.
2. The training method for the relationship prediction model according to claim 1, characterized in that constructing the first dataset based on the first historical proteins and the actual association values between the first historical proteins comprises:
forming first historical protein pairs from the first historical proteins that have an association relationship, and constructing the first dataset based on the first historical protein pairs and the actual association values between the first historical proteins in the first historical protein pairs.
3. The training method for the relationship prediction model according to claim 1, characterized in that the first feature sequence extraction layer comprises a plurality of feature sequence prediction networks and a first splicing layer, wherein the number of feature sequence prediction networks is the same as the number of first amino acid fragments in the first amino acid vector;
wherein extracting temporal features from the first dense mapping result using the first feature sequence extraction layer to obtain the first feature vector of the first sub-protein based on the first amino acid sequence comprises:
inputting, based on the position of each first amino acid fragment in the first amino acid vector, the first sub-mapping result corresponding to that first amino acid fragment in the first dense mapping result into the feature sequence prediction network corresponding to that fragment position, to obtain first sub-sequence vectors; and
calculating a first weight value for each first sub-sequence vector, and performing a weighted summation of the first sub-sequence vectors and the first weight values through the first splicing layer to obtain the first feature vector of the first sub-protein based on the first amino acid sequence.
4. The training method for the relationship prediction model according to claim 1, characterized in that the first model parameters comprise a first encoding parameter corresponding to the first feature encoding layer, a first mapping parameter corresponding to the first feature mapping layer, a first feature extraction parameter corresponding to the first feature sequence extraction layer, and a first classification parameter corresponding to the first classification layer;
wherein adjusting the first model parameters in the local feature extraction model based on the first target loss function to obtain the first interaction relationship prediction model comprises:
adjusting the first encoding parameter, the first mapping parameter, the first feature extraction parameter, and the first classification parameter in the local feature extraction model based on the first target loss function to obtain the first interaction relationship prediction model.
5. The training method for the relationship prediction model according to claim 1, characterized in that constructing the original feature map based on the first dataset and the second dataset comprises:
obtaining the first historical protein pair and the actual association value of the first historical protein pair in the first dataset, and the second historical protein pair and the first predicted association value of the second historical protein pair in the second dataset;
taking the first sub-protein and the second sub-protein in the first historical protein pair as a first vertex and a second vertex, respectively, taking the actual association relationship as a first connecting edge, and taking the actual association value as a first edge weight of the first connecting edge;
taking a third sub-protein and a fourth sub-protein in the second historical protein pair as a third vertex and a fourth vertex, respectively, taking the first predicted association relationship as a second connecting edge, and taking the first predicted association value as a second edge weight of the second connecting edge; and
constructing the original feature map based on the first vertex, the second vertex, the third vertex, the fourth vertex, the first connecting edge, the second connecting edge, the first edge weight, and the second edge weight.
6. The training method for the relationship prediction model according to claim 5, characterized in that the global feature extraction model comprises a graph convolutional neural network and a second classification layer;
wherein inputting the original feature map into the global feature extraction model to obtain the second prediction result comprises:
inputting the original feature map into the graph convolutional neural network, and updating the first connecting edge and/or the second connecting edge in the original feature map to obtain a target feature map; and
inputting the target feature map into the second classification layer to obtain the second prediction result.
7. A method for predicting interaction relationships, characterized in that the method comprises:
obtaining a first protein to be predicted and a second protein to be predicted, and constructing a protein pair to be predicted based on the first protein to be predicted and the second protein to be predicted;
inputting the protein pair to be predicted into a target relationship prediction model to obtain a relationship prediction result, wherein the target relationship prediction model is trained by the training method for the relationship prediction model according to any one of claims 1-6; and
determining, based on the relationship prediction result, the interaction relationship between the first protein to be predicted and the second protein to be predicted.
8. The method for predicting interaction relationships according to claim 7, characterized in that the target relationship prediction model comprises a first interaction relationship prediction model and a second interaction relationship prediction model;
wherein inputting the protein pair to be predicted into the target relationship prediction model to obtain the relationship prediction result comprises:
inputting the protein pair to be predicted into the first interaction relationship prediction model to obtain a first relationship prediction result, and inputting the protein pair to be predicted into the second interaction relationship prediction model to obtain a second relationship prediction result; and
performing a weighted summation of the first relationship prediction result and the second relationship prediction result to obtain the relationship prediction result.
9. The method for predicting interaction relationships according to claim 7, characterized in that obtaining the first protein to be predicted and the second protein to be predicted comprises:
in response to a touch operation on a first interactive control on a display interface, obtaining a user identifier of a user to be predicted, wherein the first interactive control comprises a user identifier input box; and
in response to a touch operation on a second interactive control on the display interface, obtaining, based on the user identifier, a first protein to be predicted associated with the user to be predicted and a second protein to be predicted associated with the first protein to be predicted, wherein the second interactive control comprises a relationship prediction interactive control.
10. The method for predicting interaction relationships according to claim 9, characterized in that, after determining the interaction relationship between the first protein to be predicted and the second protein to be predicted, the method further comprises:
generating, based on the interaction relationship, a relationship prediction report associated with the user to be predicted, and displaying the relationship prediction report on the display interface.
11. A training device for a relationship prediction model, characterized in that the device comprises:
a first dataset construction module, configured to extract, from historical biological data, first historical proteins that have an association relationship and second historical proteins that do not have an association relationship, and to construct a first dataset based on the first historical proteins and the actual association values between the first historical proteins;
a first model training module, configured to input a first historical protein pair in the first dataset into a local feature extraction model to obtain a first prediction result between the first historical proteins in the first historical protein pair; to construct a first target loss function based on the first prediction result and the actual association value, and adjust first model parameters in the local feature extraction model based on the first target loss function to obtain a first interaction relationship prediction model; and to input the second historical proteins into the first interaction relationship prediction model to obtain a first predicted association value;
a second model training module, configured to determine whether the first predicted association value between the second historical proteins is greater than a preset threshold; when the first predicted association value is determined to be greater than the preset threshold, to form a second historical protein pair from the second historical proteins corresponding to the first predicted association value; to construct a second dataset based on the second historical protein pair and the first predicted association value, and construct an original feature map based on the first dataset and the second dataset; to input the original feature map into a global feature extraction model to obtain a second prediction result; and to construct a second target loss function based on the second prediction result, the actual association value, and/or the first predicted association value, and adjust second model parameters in the global feature extraction model based on the second target loss function to obtain a second interaction relationship prediction model; and
a target relationship prediction model construction module, configured to input the second historical proteins into the second interaction relationship prediction model to obtain a second predicted association value, and construct a third dataset based on the second historical proteins and the second predicted association value; to train the local feature extraction model based on the first dataset and the third dataset to obtain an updated first interaction relationship prediction model; and to repeat the foregoing operations iteratively, and construct a target relationship prediction model based on the updated first interaction relationship prediction model and the second interaction relationship prediction model;
wherein the first prediction result is obtained by:
obtaining a first amino acid sequence of a first sub-protein and a second amino acid sequence of a second sub-protein in the first historical protein pair in the first dataset, and constructing a first amino acid vector and a second amino acid vector based on first amino acid fragments in the first amino acid sequence and second amino acid fragments in the second amino acid sequence;
performing one-hot encoding on the first amino acid vector and the second amino acid vector using a first feature encoding layer to obtain a first encoding result and a second encoding result;
performing dense mapping on the first encoding result and the second encoding result using a first feature mapping layer to obtain a first dense mapping result and a second dense mapping result;
extracting temporal features from the first dense mapping result and the second dense mapping result using a first feature sequence extraction layer to obtain a first feature vector of the first sub-protein based on the first amino acid sequence and a second feature vector of the second sub-protein based on the second amino acid sequence; and
inputting the first feature vector and the second feature vector into a first classification layer to obtain the first prediction result between the first sub-protein and the second sub-protein.
12. A device for predicting interaction relationships, characterized in that the device comprises:
a to-be-predicted protein pair construction module, configured to obtain a first protein to be predicted and a second protein to be predicted, and to construct a protein pair to be predicted based on the first protein to be predicted and the second protein to be predicted;
a relationship prediction result determination module, configured to input the protein pair to be predicted into a target relationship prediction model to obtain a relationship prediction result, wherein the target relationship prediction model is trained by the training method for the relationship prediction model according to any one of claims 1-6; and
an interaction relationship determination module, configured to determine the interaction relationship between the first protein to be predicted and the second protein to be predicted based on the relationship prediction result.
13. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the training method for the relationship prediction model according to any one of claims 1-6 and the method for predicting interaction relationships according to any one of claims 7-10.
14. An electronic device, characterized in that the device comprises:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to execute, by executing the executable instructions, the training method for the relationship prediction model according to any one of claims 1-6 and the method for predicting interaction relationships according to any one of claims 7-10.
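By way of non-limiting illustration only, the data flow recited in claims 1, 5, and 8 may be sketched as follows: thresholding the first predicted association values, constructing the original feature map from the first and second datasets, and performing a weighted summation of the two relationship prediction results. The sketch is not part of the claims. The function names, the protein identifiers, the numeric values, the undirected-graph representation, and the choice of weights that sum to 1 are all assumptions made for the example and are not specified by the claims.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # hypothetical: a protein pair keyed by two identifiers


def select_pseudo_pairs(predicted: Dict[Pair, float], threshold: float) -> Dict[Pair, float]:
    """Keep the pairs whose first predicted association value is greater than the threshold."""
    return {pair: value for pair, value in predicted.items() if value > threshold}


def build_feature_graph(first_dataset: Dict[Pair, float],
                        second_dataset: Dict[Pair, float]) -> Dict[str, Dict[str, float]]:
    """Proteins become vertices; association values become edge weights.

    Assumption: edges are stored as undirected (both directions share one weight).
    """
    graph: Dict[str, Dict[str, float]] = {}
    for dataset in (first_dataset, second_dataset):
        for (a, b), weight in dataset.items():
            graph.setdefault(a, {})[b] = weight
            graph.setdefault(b, {})[a] = weight
    return graph


def combine_predictions(local_score: float, global_score: float,
                        local_weight: float = 0.5) -> float:
    """Weighted summation of the two results (assumption: the weights sum to 1)."""
    return local_weight * local_score + (1.0 - local_weight) * global_score


# Hypothetical data: one known pair and two scored candidate pairs.
first_dataset = {("P1", "P2"): 1.0}
predicted = {("P3", "P4"): 0.9, ("P3", "P5"): 0.2}

second_dataset = select_pseudo_pairs(predicted, threshold=0.5)
graph = build_feature_graph(first_dataset, second_dataset)
score = combine_predictions(0.8, 0.6, local_weight=0.5)
```

In this sketch the pair scored below the threshold never enters the second dataset, so its proteins contribute a vertex to the graph only if they also appear in a retained pair. In an actual embodiment the scores would come from the first and second interaction relationship prediction models rather than from fixed values.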