Protein model quality assessment methods, devices and computer equipment

The method acquires the coordinate and chemical bond information of a protein model, extracts geometric features of atoms with sparse convolution and topological features with a graph convolutional network, and fuses the two to determine an atomic-level quality score. This addresses the insufficient accuracy of existing protein model quality assessment methods and improves both the accuracy and the efficiency of assessment.

CN115938468B · Active · Publication Date: 2026-06-30 · SHANGHAI ZELIXIR BIOTECH CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHANGHAI ZELIXIR BIOTECH CO LTD
Filing Date
2022-05-09
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing protein model quality assessment methods have low accuracy and cannot provide sufficient information to support downstream tasks. In particular, they lack fine-grained accuracy for local geometry.

Method used

By acquiring the coordinate and chemical bond information of the target protein model, geometric features of atoms are extracted using three-dimensional sparse convolution, and topological features are extracted by combining graph convolutional networks. The two are then fused to determine the atomic-level model quality score.

Benefits of technology

It achieves atomic-level protein model quality assessment, improving the accuracy and efficiency of model quality assessment and providing finer-grained predictions.

✦ Generated by Eureka AI based on patent content.

Figure CN115938468B_ABST

Abstract

This application discloses a method, apparatus, and computer device for protein model quality assessment. The method acquires the coordinate and chemical bond information of a target protein model, which comprises multiple amino acid residues. The coordinate information includes the coordinates of each atom within each amino acid residue, and the chemical bond information includes the chemical bonds between atoms in the target protein model. Geometric features of each atom are extracted based on its coordinates, atom type, and the amino acid type of the amino acid residue to which it belongs. Topological features of each atom are extracted based on the chemical bond information between atoms. The geometric features and corresponding topological features of each atom are fused to obtain a fused feature for each atom. The model quality score corresponding to each atom in the target protein model is determined based on the fused feature. This method can improve the accuracy of protein model quality assessment.

Description

Technical Field

[0001] This application relates to the field of biotechnology, specifically to a method, apparatus, and computer equipment for evaluating the quality of a protein model.

Background Technology

[0002] The structure of proteins plays a crucial role in studying their function and developing drugs. Currently, the determination of protein structure generally relies on experimental observation.

[0003] However, observing protein structures experimentally is time-consuming and costly. Therefore, increasing research is focusing on computational methods for protein structure prediction, specifically predicting the three-dimensional structure of proteins from their sequences. Protein quality assessment (Protein QA) is a crucial step in the protein structure prediction process, used to evaluate the accuracy of candidate protein models. Protein QA assesses the difference between the predicted structure and a reference structure, thus helping downstream tasks select the optimal model structure and improve the protein model based on the estimated quality score.

[0004] However, current methods for assessing protein model quality have low accuracy.

Summary of the Invention

[0005] This application provides a method, apparatus, and computer device for protein model quality assessment, which can effectively improve the accuracy of protein model quality assessment.

[0006] The first aspect of this application provides a method for assessing the quality of a protein model, the method comprising:

[0007] The coordinate information and chemical bond information of the target protein model are obtained. The target protein model includes multiple amino acid residues. The coordinate information includes the coordinate information of each atom in each amino acid residue. The chemical bond information includes the chemical bond information between atoms in the target protein model.

[0008] The geometric features of each atom are extracted based on the coordinate information of each atom, the atom type information of each atom, and the amino acid type information of the amino acid residue to which each atom belongs.

[0009] The topological features of each atom are extracted based on the chemical bond information between the atoms;

[0010] The geometric features of each atom are fused with the corresponding topological features to obtain the fused features of each atom.

[0011] The model quality score corresponding to each atom in the target protein model is determined based on the fused features of each atom.
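The topological-feature, fusion, and scoring steps listed above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the text does not specify the graph convolution variant, the fusion operator, or the scoring head, so the symmetric-normalized graph convolution, the concatenation-based fusion, and the sigmoid scoring head below are all assumptions, as are the names `gcn_layer` and `fuse_and_score`. Random arrays stand in for learned weights and for the geometric features that the sparse convolution would produce.

```python
import numpy as np

def gcn_layer(features, bonds, weight):
    """One graph-convolution step over the chemical-bond graph (assumed variant):
    H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = features.shape[0]
    adj = np.eye(n)                          # self-loops keep each atom's own features
    for i, j in bonds:                       # undirected bond between atoms i and j
        adj[i, j] = adj[j, i] = 1.0
    deg = adj.sum(axis=1)                    # >= 1 because of the self-loops
    norm = adj / np.sqrt(np.outer(deg, deg))
    return np.maximum(norm @ features @ weight, 0.0)

def fuse_and_score(geometric, topological, w_out, b_out=0.0):
    """Fuse per-atom features by concatenation (assumption) and map each atom
    to a score in [0, 1] with a linear layer plus sigmoid (assumption)."""
    fused = np.concatenate([geometric, topological], axis=1)
    logits = fused @ w_out + b_out
    return fused, 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
n_atoms = 4
bonds = [(0, 1), (1, 2), (2, 3)]                 # a small chain of bonded atoms
atom_attrs = rng.normal(size=(n_atoms, 5))       # hypothetical per-atom attributes
geometric = rng.normal(size=(n_atoms, 8))        # stand-in for sparse-convolution output
topological = gcn_layer(atom_attrs, bonds, rng.normal(size=(5, 8)))
fused, scores = fuse_and_score(geometric, topological, rng.normal(size=16))
print(fused.shape, scores.shape)
```

Bond attributes such as bond type or length are ignored here for brevity; a fuller version would carry them as edge features.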

[0012] Accordingly, a second aspect of this application provides a protein model quality assessment device, the device comprising:

[0013] The acquisition unit is used to acquire coordinate information and chemical bond information of a target protein model, wherein the target protein model includes multiple amino acid residues, the coordinate information includes the coordinate information of each atom in each amino acid residue, and the chemical bond information includes the chemical bond information between atoms in the target protein model.

[0014] The first extraction unit is used to extract the geometric features of each atom based on the coordinate information of each atom, the atom type information of each atom, and the amino acid type information of the amino acid residue to which each atom belongs.

[0015] The second extraction unit is used to extract the topological features of each atom based on the chemical bond information between the atoms;

[0016] The fusion unit is used to fuse the geometric features of each atom with the corresponding topological features to obtain the fused features of each atom.

[0017] A determining unit is used to determine the model quality score corresponding to each atom in the target protein model based on the fused features of each atom.

[0018] A third aspect of this application also provides a computer-readable storage medium storing a plurality of instructions adapted for loading by a processor to perform steps in the protein model quality assessment method provided in the first aspect of this application.

[0019] The fourth aspect of this application provides a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps in the protein model quality assessment method provided in the first aspect of this application.

[0020] The fifth aspect of this application provides a computer program product, including a computer program / instructions that, when executed by a processor, implement the steps in the protein model quality assessment method provided in the first aspect.

[0021] The protein model quality assessment method provided in this application obtains the coordinate information and chemical bond information of the target protein model, which includes multiple amino acid residues. The coordinate information includes the coordinate information of each atom in each amino acid residue, and the chemical bond information includes the chemical bond information between atoms in the target protein model. Geometric features of each atom are extracted based on the coordinate information, atom type information, and amino acid type information of the amino acid residue to which each atom belongs. Topological features of each atom are extracted based on the chemical bond information between atoms. The geometric features and corresponding topological features of each atom are fused to obtain the fused feature of each atom. The model quality score corresponding to each atom in the target protein model is determined based on the fused feature of each atom.

[0022] Therefore, the protein model quality assessment method provided in this application can extract atomic-level geometric and topological features from a protein model, then jointly learn the two features to obtain atomic-level fused features of the protein model, and determine the atomic-level fine-grained quality score of the protein model based on the fused features. This method provides a fine-grained, atomic-level protein model quality assessment method that can effectively improve the prediction performance at both the residue level and the overall level. In other words, this method can significantly improve the accuracy of protein model quality assessment.

Description of the Drawings

[0023] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0024] Figure 1 is a schematic diagram of a scenario for protein model quality assessment in this application;

[0025] Figure 2 is a flowchart of the protein model quality assessment method provided in this application;

[0026] Figure 3 is a schematic diagram of the protein model quality assessment device provided in this application;

[0027] Figure 4 is a schematic diagram of the structure of the computer device provided in this application.

Detailed Implementation

[0028] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort are within the scope of protection of the present invention.

[0029] This invention provides a method, apparatus, and computer device for protein model quality assessment. The protein model quality assessment method can be used in a protein model quality assessment apparatus. The protein model quality assessment apparatus can be integrated into a computer device, which can be a terminal or a server. The terminal can be a mobile phone, tablet computer, laptop computer, smart TV, wearable smart device, personal computer (PC), or vehicle terminal, etc. The server can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, Content Delivery Network (CDN), and big data and artificial intelligence platforms. The server can also be a node in a blockchain.

[0030] Refer to Figure 1, which is a schematic diagram of a scenario for the protein model quality assessment method provided in this application. As shown, server A receives an assessment request from terminal B, which includes the target protein model to be assessed. Server A obtains the coordinate information and chemical bond information of the target protein model. The target protein model includes multiple amino acid residues. The coordinate information includes the coordinate information of each atom within each amino acid residue, and the chemical bond information includes the chemical bond information between atoms in the target protein model. Based on the coordinate information, atom type information, and amino acid type information of the amino acid residue to which each atom belongs, the geometric features of each atom are extracted. The topological features of each atom are extracted based on the chemical bond information between atoms. The geometric features and corresponding topological features of each atom are fused to obtain the fused feature of each atom. Based on the fused feature of each atom, the model quality score corresponding to each atom in the target protein model is determined. Then, server A returns the model quality score corresponding to each atom in the target protein model as the assessment result to terminal B.

[0031] It should be noted that the protein model quality assessment scenario illustrated in Figure 1 is merely an example. The protein model quality assessment scenario described in the embodiments of this application is intended to more clearly illustrate the technical solution of this application and does not constitute a limitation on the technical solution provided in this application. Those skilled in the art will understand that as protein model quality assessment scenarios evolve and new business scenarios emerge, the technical solution provided in this application is equally applicable to similar technical problems.

[0032] The implementation scenarios described above will be explained in detail below.

[0033] In related technologies, protein model quality assessment typically utilizes trained models to evaluate specific similarity scores between predicted and reference structures, such as the Local Distance Difference Test (lDDT) and the Global Distance Test (GDT). However, GDT only generates overall structural similarity and cannot measure scores for non-overlapping local structures. Furthermore, it is highly sensitive to input data when evaluating proteins composed of multiple domains. In contrast, lDDT scores capture fine-grained accuracy of protein local geometry by considering all predicted atoms (including all side-chain atoms). However, current protein model quality assessment methods, even lDDT, can only perform residue-level protein model quality assessment, resulting in insufficient accuracy. Therefore, this application provides a protein model quality assessment method that can improve the accuracy of protein model quality assessment to a certain extent.

[0034] This application describes the protein model quality assessment method from the perspective of a protein model quality assessment apparatus, which can be integrated into a computer device. The computer device can be a terminal or a server. The terminal can be a mobile phone, tablet computer, laptop computer, smart TV, wearable smart device, personal computer (PC), or in-vehicle terminal, etc. The server can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery network (CDN) acceleration services, and big data and artificial intelligence platforms. Figure 2 is a flowchart of the protein model quality assessment method provided in this application. The method includes:

[0035] Step 101: Obtain the coordinate information and chemical bond information of the target protein model.

[0036] In protein model quality assessment, the local distance difference test (lDDT) is widely adopted and studied because it captures the fine-grained accuracy of local protein geometry. Current technologies employ deep learning to estimate such metrics, including 3DCNN, VoroMQA, DeepAccNet, and graph convolutional network-based methods such as ProteinGCN, GraphQA, and GNNRefine. Specifically, 3DCNN uses three-dimensional convolutional neural networks; VoroMQA uses interatomic contact areas to evaluate protein model structure; DeepAccNet is a deep learning-based method that improves protein structure refinement through accuracy estimation; ProteinGCN uses graph convolutional representations to learn protein structure; GraphQA uses graph convolutional networks to evaluate protein model quality; and GNNRefine uses deep graph networks to refine protein models quickly and effectively.

[0037] DeepAccNet performs 3D convolution operations on the voxelized protein structure to obtain the 3D features of each residue. These 3D features are then stacked onto a 2D feature map, which is processed by a 2D convolutional neural network to output the lDDT score for each residue. ProteinGCN uses a graph convolutional network to encode node features based on the topological graph of the protein structure defined by its chemical bonds, ultimately predicting global and residue-level lDDT scores. However, even the finest-grained of the methods above can only achieve residue-level quality assessment, which lacks accuracy and cannot provide sufficient information for downstream tasks. Therefore, this application provides a method for atomic-level quality assessment, which is described in detail below.

[0038] When a target protein model requiring quality assessment is received, its basic parameters need to be acquired first. In this embodiment, only the coordinate and chemical bond information of the target protein model needs to be acquired, without the need for a large amount of additional information. Therefore, the protein model quality assessment method provided in this application can improve the efficiency of protein model quality assessment from the perspective of acquiring assessment information. The coordinate information of the target protein model can be coordinate information in a preset coordinate system. This coordinate system can be a three-dimensional coordinate system with any position as the origin, or a three-dimensional coordinate system with the geometric center of the target protein model as the origin. The coordinate information of the target protein model includes the three-dimensional coordinate information of each atom in the target protein model. The target protein includes multiple amino acid residues, and each amino acid residue also contains multiple atoms. The atom type information of each atom and the amino acid type information of the amino acid residue constitute the attribute information of that atom. The chemical bond information of the target protein model includes the information of the chemical bonds between every two atoms connected by chemical bonds in the target protein model. The chemical bond information includes the type of chemical bond (i.e., the type of atoms connected at both ends of the chemical bond), the length of the chemical bond, and the bond energy information, etc.
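As a rough illustration of the inputs described above, the sketch below holds per-atom coordinates and attributes together with a bond list, and shifts the coordinates so that the geometric center of the model becomes the origin. The class and field names (`Atom`, `Bond`, `ProteinModel`, `centered_coords`) are hypothetical and chosen only for this example; bond energy and bond type are omitted for brevity.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Atom:
    coord: Tuple[float, float, float]   # three-dimensional coordinates
    atom_type: str                      # e.g. "C", "N", "O", "S"
    residue_type: str                   # amino acid type of the parent residue
    residue_index: int                  # which residue the atom belongs to

@dataclass
class Bond:
    atom_i: int                         # index of the first bonded atom
    atom_j: int                         # index of the second bonded atom
    length: float                       # bond length

@dataclass
class ProteinModel:
    atoms: List[Atom]
    bonds: List[Bond]

    def centered_coords(self):
        """Coordinates expressed relative to the geometric center of all atoms."""
        n = len(self.atoms)
        center = tuple(sum(a.coord[d] for a in self.atoms) / n for d in range(3))
        return [tuple(a.coord[d] - center[d] for d in range(3)) for a in self.atoms]

a0 = Atom((1.0, 2.0, 3.0), "N", "ALA", 0)
a1 = Atom((3.0, 2.0, 1.0), "C", "ALA", 0)
model = ProteinModel(atoms=[a0, a1],
                     bonds=[Bond(0, 1, length=math.dist(a0.coord, a1.coord))])
print(model.centered_coords())   # [(-1.0, 0.0, 1.0), (1.0, 0.0, -1.0)]
```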

[0039] Step 102: Extract the geometric features of each atom based on the coordinate information of each atom, the atom type information of each atom, and the amino acid type information of the amino acid residue to which each atom belongs.

[0040] In this embodiment of the application, after obtaining the coordinate information and chemical bond information of the target protein model to be evaluated, the geometric features of each atom of the target protein model can be extracted based on the obtained information.

[0041] In this approach, the target protein structure can be represented as a three-dimensional point cloud based on its atomic coordinates. Related techniques typically voxelize the target protein model and use traditional 3D CNNs to extract features. However, common voxelization methods in three-dimensional space often produce a large proportion of invalid voxels, and convolving these invalid voxels incurs a significant computational burden. Therefore, the resolution of the 3D voxels used in these methods is usually very low, resulting in the loss of geometric details of the protein structure.

[0042] The protein model quality assessment method provided in this application can utilize three-dimensional sparse convolution to address the aforementioned problems. Specifically, the attribute characteristics of each atom in the target protein model can be determined first. These attribute characteristics can include the atom type and the amino acid type of the corresponding amino acid residue. Specifically, the atom type can be carbon, hydrogen, oxygen, or sulfur, etc. The amino acid type of the residue can be methionine, hexine, or alanine, etc. Then, the geometric characteristics of each atom are determined based on its three-dimensional coordinate characteristics and attribute characteristics.
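One simple way to turn the atom type and amino acid type into a numeric attribute feature is one-hot encoding. The text does not state which encoding is used, so the one-hot scheme, the deliberately truncated vocabularies, and the function names below are assumptions made purely for illustration.

```python
# Illustrative, truncated vocabularies (assumptions; a real model would list
# every atom type and all amino acid types it supports).
ATOM_TYPES = ["C", "N", "O", "S", "H"]
RESIDUE_TYPES = ["ALA", "MET", "GLY"]

def one_hot(value, vocabulary):
    """Return a list with 1.0 at the position of `value` and 0.0 elsewhere."""
    vec = [0.0] * len(vocabulary)
    vec[vocabulary.index(value)] = 1.0
    return vec

def atom_attribute_feature(atom_type, residue_type):
    """Concatenate the atom-type and residue-type encodings into one feature."""
    return one_hot(atom_type, ATOM_TYPES) + one_hot(residue_type, RESIDUE_TYPES)

print(atom_attribute_feature("O", "MET"))
# [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
```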

[0043] In some embodiments, the geometric features of each atom are extracted based on the coordinate information of each atom, the atom type information of each atom, and the amino acid type information of the amino acid residue to which each atom belongs, including:

[0044] 1. Obtain the atom type information of each atom and the amino acid type information of the amino acid residue to which each atom belongs, and obtain the attribute characteristics of each atom.

[0045] 2. Perform voxel transformation on the coordinate information and attribute features of each atom to obtain the voxel features of each atom. The voxel features include voxel coordinate features and voxel attribute features.

[0046] 3. Input the voxel features of each atom into the preset convolutional neural network model to obtain the geometric features of each atom.

[0047] In this embodiment, after obtaining the three-dimensional coordinate information and attribute features of each atom, the coordinate information and attribute features of the atoms can be further voxelized to obtain the voxel features of each atom. Voxelization involves dividing the three-dimensional space of the target protein into multiple interconnected cubes of the same size, and then using the features of these cubes to characterize the features of each atom, i.e., determining the voxel features of each atom. After determining the voxel features of each atom, a pre-trained convolutional neural network model can be used to convolve the voxel features of each atom to obtain the geometric features of each atom. This pre-trained convolutional neural network model can be understood as a sparse convolutional neural network model.

[0048] Since the pre-defined convolutional neural network model used in this application only performs convolution processing on the voxel features corresponding to each atom and does not process invalid voxels, the computational load can be reduced and the evaluation efficiency of protein model quality can be improved.

[0049] In some embodiments, the coordinate information and attribute features of each atom are voxel-converted to obtain voxel features of each atom. The voxel features include voxel coordinate features and voxel attribute features, including:

[0050] 2.1. Based on the preset grid side length, divide the coordinate system of the target protein model into cubic grids;

[0051] 2.2. Determine the cubic grid in which each atom is located, and determine the voxel coordinate features of each atom based on the coordinate information of that cubic grid;

[0052] 2.3. Calculate the average of the attribute features of the atoms contained in each cubic grid to obtain the attribute information of each cubic grid;

[0053] 2.4. Based on the attribute information of each cubic grid, determine the voxel attribute features of each atom. The voxel coordinate features and the voxel attribute features of an atom together constitute the voxel features of that atom.

[0054] In this embodiment, the voxel features of each atom are obtained by first dividing the coordinate system of the target protein model into cubic grids, then determining the cubic grid containing each atom, and using the coordinate information of that cubic grid to characterize the voxel coordinate features of the atoms within it. Furthermore, the attribute features of the cubic grid can also be used to characterize the attribute features of the atoms within it, and these attribute features can be determined as the average of the attribute features of all atoms within the grid.
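Steps 2.1 to 2.4 can be sketched in a few lines. This is a minimal sketch, assuming the coordinates are already expressed in the chosen coordinate system; the function name `voxelize` and its return layout are hypothetical.

```python
import math
from collections import defaultdict

def voxelize(coords, attrs, side_length):
    """coords: list of (x, y, z); attrs: list of equal-length attribute lists.
    Returns per-atom grid coordinates, per-atom voxel attributes, and the
    attribute vector of each non-empty cubic grid."""
    members = defaultdict(list)          # grid coordinate -> indices of atoms inside
    atom_cells = []
    for k, (x, y, z) in enumerate(coords):
        # Steps 2.1 and 2.2: the grid containing an atom is found by floor division.
        cell = (math.floor(x / side_length),
                math.floor(y / side_length),
                math.floor(z / side_length))
        atom_cells.append(cell)
        members[cell].append(k)
    # Step 2.3: each grid's attributes are the mean over the atoms it contains.
    cell_attrs = {}
    for cell, atom_ids in members.items():
        dim = len(attrs[atom_ids[0]])
        cell_attrs[cell] = [sum(attrs[k][d] for k in atom_ids) / len(atom_ids)
                            for d in range(dim)]
    # Step 2.4: every atom inherits the attributes of its grid.
    atom_voxel_attrs = [cell_attrs[cell] for cell in atom_cells]
    return atom_cells, atom_voxel_attrs, cell_attrs

coords = [(0.2, 0.3, 0.1), (0.8, 0.9, 0.4), (1.5, -0.2, 0.0)]
attrs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
atom_cells, atom_voxel_attrs, cell_attrs = voxelize(coords, attrs, side_length=1.0)
print(atom_cells)         # [(0, 0, 0), (0, 0, 0), (1, -1, 0)]
print(atom_voxel_attrs)   # [[0.5, 0.5], [0.5, 0.5], [1.0, 1.0]]
```

The first two atoms fall into the same grid, so they share the averaged attribute vector; only two grids are ever created, however large the bounding box of the model is.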

[0055] Specifically, the three-dimensional point cloud of the target protein can be represented as $\{x_k\} = \{(p_k, f_k)\}$, where $p_k$ is the 3D coordinate of the k-th point in the point cloud and $f_k$ is the feature corresponding to that point, which can be the attribute feature of the atom corresponding to that point. First, all points are translated to a local coordinate system with the geometric center as the origin to obtain the translated point cloud coordinates $\{\hat{p}_k\}$; the features $\{f_k\}$ of all points remain unchanged. Then, the transformed point cloud $\{(\hat{p}_k, f_k)\}$ is converted to a sparse voxel representation with resolution $r$:

[0056] $$c_k = (u_k, v_k, w_k) = \left(\left\lfloor \hat{x}_k / r \right\rfloor, \left\lfloor \hat{y}_k / r \right\rfloor, \left\lfloor \hat{z}_k / r \right\rfloor\right)$$

[0057] $$g_m = \frac{1}{N_m} \sum_{k} \mathbb{I}\left[\hat{p}_k \in V_m\right] f_k$$

[0058] Here, $c_k$ is the voxel coordinate feature of the k-th point, and $\lfloor \cdot \rfloor$ is the round-down (floor) operation. $\hat{x}_k$, $\hat{y}_k$, and $\hat{z}_k$ are the coordinates of the k-th point on the x-axis, y-axis, and z-axis, respectively; $u_k$, $v_k$, and $w_k$ are the voxel coordinates of the point on the x-axis, y-axis, and z-axis, respectively. $g_m$ is the voxel feature corresponding to the m-th voxel, and $N_m$ is the number of points in the m-th voxel. $\mathbb{I}[\cdot]$ is a binary indicator of whether $\hat{p}_k$ belongs to the m-th voxel grid $V_m$. In other words, the voxel feature corresponding to the m-th voxel is the average of the attribute features of all points in that voxel grid.

[0059] Then, after the above operations, the non-empty voxels ($N_m > 0$) are stored in a hash table, and the convolution operation is performed only on these non-empty voxels. This approach can represent point clouds at a higher volumetric resolution while maintaining computational efficiency, and reduces memory usage and computation, offering greater advantages when dealing with high-resolution and large-scale point clouds.
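To illustrate why a hash table keeps the cost proportional to the number of non-empty voxels, the sketch below stores voxels in a Python dictionary and evaluates a 3×3×3 convolution only at occupied positions. It is a simplified sketch rather than the network used in this application: producing outputs only at non-empty input voxels (a submanifold-style convolution) is an assumption, the function name `sparse_conv3d` is hypothetical, and the all-ones kernel stands in for learned weights.

```python
import itertools
import numpy as np

def sparse_conv3d(voxels, kernel):
    """voxels: dict mapping (u, v, w) -> feature vector of shape (C_in,).
    kernel: dict mapping an offset (du, dv, dw) -> weight matrix (C_in, C_out).
    Outputs are computed only at non-empty voxels."""
    c_out = next(iter(kernel.values())).shape[1]
    out = {}
    for (u, v, w) in voxels:
        acc = np.zeros(c_out)
        for (du, dv, dw), weight in kernel.items():
            neighbor = voxels.get((u + du, v + dv, w + dw))   # hash-table lookup
            if neighbor is not None:      # empty voxels are skipped, never stored
                acc += neighbor @ weight
        out[(u, v, w)] = acc
    return out

offsets = list(itertools.product((-1, 0, 1), repeat=3))       # 27 kernel positions
kernel = {off: np.ones((2, 1)) for off in offsets}            # placeholder weights
voxels = {
    (0, 0, 0): np.array([0.5, 0.5]),
    (1, -1, 0): np.array([1.0, 1.0]),    # adjacent to (0, 0, 0)
    (5, 5, 5): np.array([2.0, 0.0]),     # isolated voxel
}
out = sparse_conv3d(voxels, kernel)
for position, value in out.items():
    print(position, value)
```

With the all-ones kernel each output is the sum of all channels over the occupied neighborhood, so the two adjacent voxels both yield 3.0 and the isolated voxel yields 2.0, while none of the surrounding empty voxels is visited.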

[0060] In some embodiments, before inputting the voxel features of each atom into a preset convolutional neural network model to obtain the geometric features of each atom, the method further includes:

[0061] A. Obtain training sample data, which includes coordinate information and chemical bond information of multiple sample protein models;

[0062] B. Determine the sample voxel features of each sample atom in each sample protein model based on the coordinate information and chemical bond information of each sample protein model;

[0063] C. Obtain the label value corresponding to each sample atom in each sample protein model;

[0064] D. Train the preset convolutional neural network model by taking the sample voxel features of each sample atom in each sample protein model as input and the label value corresponding to the sample atom as output.

[0065] In this embodiment, before performing convolution processing on voxel features using a pre-defined convolutional neural network (CNN) model, the model needs to be trained. This training can be supervised. Therefore, training sample data and corresponding label data can be obtained first. The training sample data can include the coordinate information and chemical bond information of multiple sample protein models. As described above regarding the use of the pre-defined CNN model, its input is the voxel features of atoms, and its output is the geometric features of atoms. Therefore, it is necessary to first determine the sample voxel features of each atom in each sample protein model based on its coordinate information. Then, it is necessary to further obtain the label value corresponding to each atom in each sample protein model. This label value is used to supervise the training process of the pre-defined CNN model.

[0066] In some embodiments, obtaining the label value corresponding to each sample atom in each sample protein model includes:

[0067] C1. Obtain the reference protein model corresponding to each sample protein model;

[0068] C2. Calculate the similarity coefficient between each sample atom in each sample protein model and the corresponding atom in the corresponding reference protein model to obtain the label value corresponding to each sample atom in each sample protein model.

[0069] In this embodiment, the similarity coefficient between each atom in the sample protein model and the corresponding atom in the reference protein model can be used as the label value for each sample atom in the sample protein model. The reference protein model can be a reference structure of the sample protein model, or it can be understood as a standard structure. Protein model quality assessment is to evaluate the difference between the protein model structure and the reference structure, or it can be understood as evaluating the similarity score between the protein model structure and the reference structure.

[0070] Therefore, we can first obtain the reference protein model corresponding to each sample protein model, and then calculate the similarity score, or similarity coefficient, between each sample atom in each sample protein model and the corresponding atom in the corresponding reference protein model. This similarity coefficient is then used as the label value corresponding to that sample atom. In some embodiments, the similarity score between the sample atom and the corresponding atom in the corresponding reference protein can be represented by a local distance difference test score.

[0071] In some embodiments, the similarity coefficient between each sample atom in each sample protein model and the corresponding atom in the corresponding reference protein model is calculated to obtain the label value corresponding to each sample atom in each sample protein model, including:

[0072] C21. Obtain the atom index information of the neighboring atoms of the target sample atom within the preset neighborhood radius in the target sample protein model, and calculate the distance between the target sample atom and each neighboring atom to obtain the first distance information;

[0073] C22. Based on the atom index information, determine the reference neighboring atoms in the reference protein model corresponding to the target sample protein model, and calculate the distance between the target reference sample atom corresponding to the target sample atom and each reference neighboring atom to obtain the second distance information;

[0074] C23. Use a preset mapping function to map the difference between the first distance information and the second distance information to obtain the label value corresponding to the target sample atom in the target sample protein model;

[0075] C24. Iterate through each sample atom in each sample protein model and calculate the label value corresponding to each sample atom in each sample protein model.

[0076] In this embodiment, the atomic-level lDDT score can be used as the label value corresponding to the sample atom. Currently, there is no method for calculating the atomic-level lDDT score of proteins; only the residue-level lDDT score can be calculated. This application provides a method for directly calculating the atomic-level lDDT score, which will be described in detail below.

[0077] Specifically, each atom in any target protein model from multiple sample protein models can be numbered first. Similarly, each atom in the corresponding reference model can also be numbered. The atoms in the target protein model and the reference model have the same numbering. Then, for any target atom in the target protein model, its neighboring atoms within a preset neighborhood radius are determined, and the atom numbers of these neighboring atoms are determined. Further, the distance between the target atom and each neighboring atom can be calculated to obtain first distance information. Here, the first distance information includes multiple distances, namely the distance between the target atom and each neighboring atom.

[0078] For the target reference sample atom in the reference model corresponding to the target protein model, its reference neighbor atom in the reference model can be determined based on the aforementioned neighbor atom indices. Then, the distance between the target reference sample atom and its reference neighbor atom is calculated to obtain the second distance information. That is, this second distance information also includes the distance information between the target reference sample atom and each reference neighbor atom. The number of distance data points in the second distance information is the same as the number of distance data points in the first distance information, and there is a one-to-one correspondence between the distance data. Then, the difference between the corresponding distance data points can be calculated, and this difference is input into the mapping function f to obtain a score between 0 and 1. These scores correspond to neighboring atoms. Finally, averaging these scores yields the atomic-level score of the target sample atom, i.e., the atomic-level lDDT score.
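The averaging described above can be written compactly. In the following restatement, K is the number of neighboring atoms of the target sample atom, and d_m^(k) and d_n^(k) are the distances to the k-th neighbor in the sample protein model and in the reference model, respectively. The superscript indexing is an assumption of this sketch, and the concrete form of f is not fixed by the text:

```latex
s_A = \frac{1}{K} \sum_{k=1}^{K} f\left( \left| d_m^{(k)} - d_n^{(k)} \right| \right),
\qquad 0 \le f(\cdot) \le 1
```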

[0079] Then, we can iterate through each atom of each sample protein and use the method described above to calculate the lDDT score of each atom in the target sample protein to obtain the lDDT score of each atom in each sample protein.

[0080] Furthermore, this application may also provide a specific algorithm for calculating atomic-level lDDT scores, which uses the following input parameters: the protein model P_m; the protein reference structure P_n; the neighborhood radius threshold R; the distance-to-score mapping function f; and the Euclidean distance function Dist.

[0081] Here, P_m, P_n ∈ R^(N×3) are the sets of atomic coordinates of the protein model and the protein reference structure, respectively, where N is the total number of atoms. Each protein contains M residues, and the m-th residue contains N_m atoms. P[i, j] denotes the j-th atom in the i-th residue.

[0082] The algorithm's output parameters are the atomic-level lDDT score S_A, the residue-level lDDT score S_R, and the overall lDDT score S_G.

[0083] The following is pseudocode for calculating the atomic-level lDDT score:

[0084]

[0085] Each step of the above pseudocode is explained here:

[0086] S1: Define the atomic-level lDDT score S_A, the residue-level lDDT score S_R, and the overall lDDT score S_G, and initialize them. Then define the protein-level count n_G and initialize it.

[0087] S2: For each residue in the first protein model, perform the following operations S3 through S15.

[0088] S3: Define the residue score s_R and the residue-level count n_R, and initialize each of them.

[0089] S4: For each atom in the i-th residue, repeat the operations S5 through S14.

[0090] S5: Calculate the indices of the neighboring atoms of the j-th atom in the i-th residue within a neighborhood of radius R. Assign these indices to indices.

[0091] S6: Determine the number of adjacent atoms K_j.

[0092] S7: Define the atomic score s_A and the atom-level count n_A, and initialize them.

[0093] S8: For each adjacent atom, perform operations S9 to S13.

[0094] S9: Calculate the distance d_n between the j-th atom of the i-th residue and its k-th adjacent atom in the reference protein structure.

[0095] S10: Calculate the distance d_m between the j-th atom of the i-th residue and its k-th adjacent atom in the protein model.

[0096] S11: Input the absolute value of the difference between d_n and d_m into the mapping function f to obtain a score between 0 and 1, which is the score s corresponding to the k-th adjacent atom.

[0097] S12: The scores are summed up at the atomic, residue, and protein levels, respectively.

[0098] S13: During accumulation, counts are kept at the atomic, residue, and protein levels.

[0099] S14: Calculate the lDDT score for each atom in the protein model.

[0100] S15: Calculate the lDDT score for each amino acid residue in the protein model.

[0101] S16: Calculate the protein-level lDDT score.

[0102] S17: Output the atomic-level, residue-level, and protein-level lDDT scores of the protein model.
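Steps S1 to S17 can be sketched in Python as follows. This is a minimal illustration, not the pseudocode of the application: the function names, the default radius of 15.0, and the threshold-based mapping function (the fraction of the thresholds 0.5, 1, 2 and 4 that the distance difference stays below) are assumptions, since the text does not fix f or R. Neighbors are searched in the protein model, following C21.

```python
import math

def threshold_score(diff, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Hypothetical mapping function f: fraction of thresholds that the
    distance difference stays below (an assumption, not from the source)."""
    return sum(diff < t for t in thresholds) / len(thresholds)

def atomic_lddt(model, reference, radius=15.0, f=threshold_score):
    """model / reference: list of residues, each a list of (x, y, z) atoms.
    Both structures must share the same residue and atom ordering.
    Returns (atom_scores, residue_scores, global_score)."""
    # Flatten to (residue, atom) index pairs so both structures share numbering.
    flat = [(i, j) for i, res in enumerate(model) for j in range(len(res))]
    atom_scores, residue_scores = [], []
    total, count = 0.0, 0                      # S1: protein-level sum and count
    for i, residue in enumerate(model):        # S2
        res_sum, res_n = 0.0, 0                # S3
        per_atom = []
        for j, atom in enumerate(residue):     # S4
            # S5: indices of neighbours within the radius, found in the model
            neighbours = [(p, q) for (p, q) in flat
                          if (p, q) != (i, j)
                          and math.dist(atom, model[p][q]) <= radius]
            a_sum, a_n = 0.0, 0                # S7 (S6: K_j = len(neighbours))
            for p, q in neighbours:            # S8
                d_n = math.dist(reference[i][j], reference[p][q])  # S9
                d_m = math.dist(atom, model[p][q])                 # S10
                s = f(abs(d_n - d_m))                              # S11
                a_sum += s; res_sum += s; total += s               # S12
                a_n += 1; res_n += 1; count += 1                   # S13
            per_atom.append(a_sum / a_n if a_n else 0.0)           # S14
        atom_scores.append(per_atom)
        residue_scores.append(res_sum / res_n if res_n else 0.0)   # S15
    global_score = total / count if count else 0.0                 # S16
    return atom_scores, residue_scores, global_score               # S17
```

In this sketch the residue-level and protein-level scores are averages over all atom-neighbor pairs accumulated in S12 and S13, so they are not necessarily equal to the mean of the atomic scores. The all-pairs neighbor search is quadratic in the number of atoms and would need a spatial index for large structures.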

[0103] In some embodiments, a predefined convolutional neural network model is trained using the sample voxel features of each sample atom in each sample protein model as input and the label value corresponding to the sample atom as output, including:

[0104] D1. Input the sample voxel features of each sample atom in each sample protein model into the preset convolutional neural network model to obtain the output voxel features corresponding to each sample atom output by the preset convolutional neural network model.

[0105] D2. Perform devoxelization on the output voxel features corresponding to each sample atom to obtain the output score of each sample atom.

[0106] D3. Adjust the model parameters in the preset convolutional neural network model based on the difference between the output score of each sample atom and the label value corresponding to each sample atom until the preset convolutional neural network model converges.

[0107] In this embodiment of the application, the sample voxel features of each sample atom in each sample protein model and the label value corresponding to each sample atom are used to train a preset convolutional neural network model. Specifically, the sample voxel features of the sample atoms can be input into the preset convolutional neural network model. Here, the preset convolutional neural network model can be a convolutional neural network model after parameter initialization, or a preset convolutional neural network model after pre-training. Here, the preset convolutional neural network model can be a sparse convolutional neural network model.

[0108] When training the preset convolutional neural network (CNN) model, the sample voxel features of the sample atoms are input into the model to obtain the output voxel features. The output voxel features are then devoxelized to obtain the output scores of the sample atoms. Devoxelization can be performed with a nearest-neighbor interpolation algorithm, which converts the output voxel features back into a point cloud representation and thereby yields an output score for each sample atom. The difference between the output score and the corresponding label value of each sample atom is used as the loss function to calculate the gradient. Backpropagation is then performed based on this gradient to adjust the model parameters of the CNN until it converges, which completes the training of the CNN model.
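The nearest-neighbor devoxelization step can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name `devoxelize_nearest` and the representation of voxels by their center coordinates are hypothetical, and each voxel is assumed to carry a single scalar score.

```python
import math

def devoxelize_nearest(atom_coords, voxel_centers, voxel_scores):
    """Assign each atom the score of its nearest voxel centre.
    atom_coords:   list of (x, y, z) atom positions
    voxel_centers: list of (x, y, z) centres of the occupied voxels
    voxel_scores:  list of floats aligned with voxel_centers"""
    atom_scores = []
    for atom in atom_coords:
        # Index of the voxel centre closest to this atom.
        nearest = min(range(len(voxel_centers)),
                      key=lambda v: math.dist(atom, voxel_centers[v]))
        atom_scores.append(voxel_scores[nearest])
    return atom_scores
```

Atoms that fall into the same voxel receive the same output score in this sketch, which is a direct consequence of nearest-neighbor interpolation.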

[0109] Step 103: Extract the topological features of each atom based on the chemical bond information between atoms.

[0110] While 3DCNN excels at capturing geometric details through regular grids, graph convolution is better suited for processing unstructured data, and proteins are naturally unstructured data. Therefore, embodiments of this application can capture the topological features of proteins by learning atomic-level chemical relationships and close-range interactions. These atomic-level chemical relationships and close-range interactions can be determined through the chemical bond information between atoms in the protein. A graph G = (V, E) represents the protein, where V and E are the vertex set and edge set, respectively. We use the coordinates of atoms as nodes and construct the edges of the graph using chemical bond and inter-atomic distance information. Then, based on this graph structure, the topological features of each atom in the target protein are extracted.

[0111] In some embodiments, the topological features of each atom are extracted based on the chemical bond information between atoms, including:

[0112] 1. Construct a graph structure using atoms in the target protein model as nodes and chemical bond information and distance information between atoms as edges;

[0113] 2. Determine the attribute information of each node in the graph structure based on the atom type information of each atom and the amino acid type information of the amino acid residues of each atom.

[0114] 3. Based on the graph structure and the attribute information of each node in the graph structure, extract the node features of each node in the graph structure to obtain the topological features of each atom in the target protein model.

[0115] In this embodiment, the topological features of each atom are extracted based on the chemical bond information and distance information between atoms. This can be achieved by first constructing a graph structure corresponding to the target protein model based on the chemical bond information and distance information between atoms. Specifically, atoms in the target protein model can be used as nodes in the graph structure, and the chemical bonds and distance information between atoms can be used as edges to construct the graph structure. Each node in this graph structure corresponds to an atom in the protein, and this atom has basic attribute information, including atom type information and the amino acid type information of the amino acid residue to which the atom belongs. Thus, the attribute features of each node can be determined based on the attribute information of the atom corresponding to the node. Furthermore, the node features of each node can be extracted based on the attribute features of the nodes in the graph structure and the edge connections between the nodes, thereby obtaining the topological features of each atom in the target protein model.
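The graph construction described above can be sketched as follows. The function name, the dictionary layout, and the distance cutoff of 4.0 used to decide which non-bonded atom pairs count as close-range interactions are assumptions for illustration; the text does not specify a cutoff.

```python
import math

def build_atom_graph(coords, bonds, atom_types, residue_types, cutoff=4.0):
    """coords: list of (x, y, z); bonds: iterable of (i, j) atom index pairs.
    Returns (node_attrs, edges), where edges maps (i, j) with i < j to a
    dict holding the inter-atomic distance and a 0/1 chemical bond flag."""
    bonded = {tuple(sorted(b)) for b in bonds}
    # Node attributes: atom type and amino acid type of the parent residue.
    node_attrs = [{"atom_type": a, "residue_type": r}
                  for a, r in zip(atom_types, residue_types)]
    edges = {}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            is_bond = (i, j) in bonded
            # Keep an edge for every chemical bond and for every
            # non-bonded pair closer than the (hypothetical) cutoff.
            if is_bond or d <= cutoff:
                edges[(i, j)] = {"distance": d, "bond": int(is_bond)}
    return node_attrs, edges
```

Storing both the distance and the bond flag on each edge keeps the two kinds of edge information available to a later graph convolution step.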

[0116] Step 104: Fuse the geometric features of each atom with the corresponding topological features to obtain the fused features of each atom.

[0117] After the geometric features and corresponding topological features of each atom in the target protein model have been determined, the geometric features of each atom can be fused with the corresponding topological features to obtain the fused features of each atom. In this embodiment, this fusion can be performed by merging the geometric features of each atom into the node features of the corresponding node in the graph structure to obtain new node features.

[0118] In some embodiments, the geometric features of each atom are fused with the corresponding topological features to obtain the fused features of each atom, including:

[0119] 1. Obtain the preset feature splicing rules;

[0120] 2. According to the preset feature splicing rules, the geometric features of each atom are spliced into the topological features of each atom to obtain the fused features of each atom.

[0121] In this embodiment, fusing the geometric features and corresponding topological features of each atom can be achieved by splicing the geometric features and topological features of the atom according to certain splicing rules. Specifically, a splicing rule for splicing the geometric features and topological features of the atom can be obtained first, and then the geometric features and topological features of the atom can be spliced according to the splicing rule to obtain the fused features of the atom.

[0122] Step 105: Determine the model quality score corresponding to each atom in the target protein model based on the fused features of each atom.

[0123] After the geometric and topological features of each atom in the target protein model have been fused to obtain the fused features characterizing each atom, the model quality score corresponding to each atom can be determined based on these fused features, thus obtaining the atomic-level model quality scores of the target protein model. Here, the atomic-level model quality score can be the atomic-level lDDT score.

[0124] The model quality score of an atom can be determined from its fused features either by using a mapping relationship between the fused features of the atom and the corresponding model quality score, or by using a trained neural network model to process the fused features of the atom and obtain the corresponding model quality score.

[0125] In some embodiments, determining the model quality score corresponding to each atom in the target protein model based on the fusion characteristics of each atom includes:

[0126] 1. Replace the node features of the nodes corresponding to each atom in the graph structure with the fusion features of each atom to obtain the updated graph structure;

[0127] 2. Input the updated graph structure into the graph convolutional neural network model for processing to obtain the model quality score of each atom in the output.

[0128] In this embodiment, the node features corresponding to each node in the aforementioned graph structure can be replaced with the fusion features of the atoms corresponding to each node to obtain a new graph structure. Then, a trained graph convolutional neural network model is used to perform convolution processing on this new graph structure, thereby outputting the score value corresponding to each node, which is the model quality score of each atom in the target protein model.

[0129] Specifically, the process of updating node features when a graph convolutional neural network model performs convolution processing on a graph structure can be represented by the following formulas (3) and (4):

[0130]

[0131]

[0132] In these formulas, the output is the updated node features; v_i denotes the node features before the update, which can be the fused features corresponding to the aforementioned atoms; v_j denotes the node features of a neighboring node of this node; N_i denotes the number of adjacent nodes; W_ij denotes the weight corresponding to the j-th neighboring node; D_ij denotes the distance between the two nodes; B_ij is a binary number indicating whether there is a chemical bond between the two nodes; the two remaining symbols are the nonlinear transformation parameters of the graph convolutional neural network model; and [·; ·] is the concatenation operation.

[0133] It is understandable that the above formulas (3) and (4) only illustrate the update process of a single layer of graph convolutional neural network. In some embodiments, the ability of the network to represent atomic features can be enhanced by stacking multiple layers of graph convolutional neural networks. Finally, a fully connected layer can be added at the end of the graph convolutional neural network model to output atomic-level quality assessment.
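The following Python sketch illustrates one possible single-layer update of this kind. Because formulas (3) and (4) are not reproduced in this text, the specific form used here is an assumption: each neighbor's features are scaled by a weight derived from the distance and the bond flag, averaged over the number of neighbors, concatenated with the node's own features, and passed through a nonlinear transformation. The names `graph_conv_layer`, `example_weight` and `relu` are hypothetical.

```python
def example_weight(distance, bond_flag):
    """Hypothetical W_ij: bonded pairs count fully, non-bonded pairs half,
    and both are damped by the inter-atomic distance."""
    return (1.0 if bond_flag else 0.5) / (1.0 + distance)

def relu(vec):
    """Stand-in for a learned nonlinear transformation."""
    return [max(0.0, x) for x in vec]

def graph_conv_layer(features, neighbours, edge_weight, transform):
    """One update over all nodes.
    features:   list of per-node feature vectors (lists of floats)
    neighbours: neighbours[i] is a list of (j, distance, bond_flag)
    edge_weight(distance, bond_flag) -> float
    transform(vector) -> vector"""
    updated = []
    for i, v_i in enumerate(features):
        agg = [0.0] * len(v_i)
        for j, dist, bond in neighbours[i]:
            w = edge_weight(dist, bond)
            agg = [a + w * x for a, x in zip(agg, features[j])]
        n_i = len(neighbours[i])
        if n_i:
            agg = [a / n_i for a in agg]      # normalise by neighbour count
        updated.append(transform(v_i + agg))  # list "+" concatenates [v_i; agg]
    return updated
```

In a trained model the transformation would be a learned layer that maps the concatenated vector back to a fixed width, so that several such layers can be stacked; the ReLU stand-in here leaves the concatenated width unchanged.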

[0134] In some embodiments, before inputting the updated graph structure into a graph convolutional neural network model for processing and obtaining the model quality score for each atom in the output, the method further includes:

[0135] A. Obtain the sample fusion features of each sample atom in the protein model of the training sample data;

[0136] B. Obtain the label value of each atom in the protein model of the training sample data;

[0137] C. Train a graph convolutional neural network model by taking the graph structure constructed from the sample fusion features of each sample atom as input and the label value corresponding to the sample atom as output.

[0138] Before performing convolution processing on the graph structure corresponding to the target protein using a graph convolutional neural network (GNN) model, the GNN model needs to be trained first. As described above, the input to the GNN model is a graph structure constructed with the fused features of each atom as node features and the chemical bonds between atoms as edges; the output is the label value corresponding to each atom. Therefore, it is necessary to first obtain the training sample data and corresponding label values for the GNN model. That is, for each sample protein model, the sample fusion features of each sample atom and the label value corresponding to each sample atom are obtained first. The sample fusion features of each sample atom can be calculated using the method described above for calculating the fused features of atoms in the target protein model, which will not be repeated here. Since the calculation of sample fusion features requires the geometric features of the sample atoms, the training of the preset convolutional neural network model, i.e., the sparse convolutional neural network model, needs to be completed before training the GNN model. Similarly, the method for calculating the label value of each sample atom in the sample protein model has been specifically described in the previous embodiments and will not be repeated here.

[0139] After determining the fusion features and label values corresponding to each sample atom, a graph structure can be constructed based on the fusion features and the chemical bonds between the sample atoms. The graph convolutional neural network model can then be trained using this graph structure as input and the label values corresponding to each sample atom as output.

[0140] In some embodiments, fusing the geometric and topological features of each atom in the target protein model can be achieved by concatenating the atomic-level features of each sparse convolutional layer in a sparse convolutional neural network with the atomic-level features of the corresponding convolutional layer in a graph convolutional neural network, and then performing a nonlinear transformation. Specifically, the feature fusion process can be represented by the following formula (5):

[0141]

[0142] In formula (5), the inputs are the atomic-level features of the l-th layer of the sparse convolutional neural network and the atomic-level features of the l-th layer of the graph convolutional neural network, and the output is the fused atomic-level features of the l-th layer of the graph convolutional neural network. [·; ·] denotes the concatenation operation, which is followed by a learned linear transformation. The updated fused features serve as the node feature input for the next layer of graph convolution.
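The layer-wise fusion can be sketched as follows. Since formula (5) is not reproduced in this text, the exact form is an assumption: the two per-atom feature vectors are concatenated, a learned linear map (represented here by an explicit weight matrix and bias) is applied, and a ReLU is used as the nonlinearity. The function and parameter names are hypothetical.

```python
def fuse_layer_features(geo, topo, weight, bias):
    """Fuse per-atom features of one layer.
    geo, topo: lists of per-atom feature vectors from the sparse-convolution
               and graph-convolution branches (same atom order)
    weight:    matrix as a list of rows, shape
               out_dim x (len(geo[0]) + len(topo[0]))
    bias:      list of out_dim floats"""
    fused = []
    for g, t in zip(geo, topo):
        cat = g + t                                   # [g ; t] concatenation
        linear = [sum(w * x for w, x in zip(row, cat)) + b
                  for row, b in zip(weight, bias)]    # learned linear map
        fused.append([max(0.0, y) for y in linear])   # ReLU (assumed)
    return fused
```

The returned vectors would then replace the node features passed to the next graph convolution layer.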

[0143] In some embodiments, the protein model quality assessment method provided in this application further includes:

[0144] a. Determine the residue-level quality score of each amino acid residue in the target protein model based on the model quality score corresponding to each atom;

[0145] b. Determine the quality score of the target protein model based on the model quality score corresponding to each atom.

[0146] In this embodiment, the protein model quality assessment method provided in this application can calculate not only more fine-grained atomic-level lDDT scores, but also residue-level and protein-level lDDT scores. Specifically, the atomic-level lDDT scores can be averaged at the residue level and the overall protein level to obtain residue-level and protein-level lDDT scores.
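The averaging described here can be sketched in a few lines of Python. The function name and the representation of residue membership as a parallel list of residue indices are assumptions for illustration; the protein-level score is taken as the mean over all atoms.

```python
def aggregate_scores(atom_scores, residue_of_atom):
    """atom_scores:     per-atom quality scores
    residue_of_atom: residue index of each atom, in the same order
    Returns (residue_scores, protein_score)."""
    sums, counts = {}, {}
    for score, res in zip(atom_scores, residue_of_atom):
        sums[res] = sums.get(res, 0.0) + score
        counts[res] = counts.get(res, 0) + 1
    # Residue-level score: mean of the scores of the atoms in that residue.
    residue_scores = {res: sums[res] / counts[res] for res in sums}
    # Protein-level score: mean over all atoms of the protein model.
    protein_score = sum(atom_scores) / len(atom_scores)
    return residue_scores, protein_score
```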

[0147] Therefore, based on the detailed description of the embodiments above, it can be understood that the protein model quality assessment method provided in this application specifically provides an Atom-ProteinQA (atomic-level protein quality assessment) model, which includes a geometric feature extraction module, a topological feature extraction module, and a feature fusion module. When training this model, an atomic-level lDDT score calculation method is first provided to generate atomic-level lDDT scores as supervision for atomic-level predictions. The geometric feature extraction module takes all atomic coordinates and atomic attribute features of the protein as input to generate atomic-level predictions, focusing on capturing the geometric information of the protein and outputting fine-grained atomic geometric features. The topological feature extraction module captures the topological relationships between protein atoms: it uses chemical bonds and proximity interactions as edges to construct an undirected graph, and uses a graph neural network to aggregate atomic neighborhood features and generate the final atomic-level predictions. Between these two modules, the feature fusion module feeds the geometric features extracted by the geometric feature extraction module into the topological feature extraction module for cross-model feature fusion, enhancing their mutual representation.

[0148] The protein model quality assessment method provided in this application breaks with the existing convention of predicting only at the residue level or the overall structure level. The Atom-ProteinQA model in this application can generate fine-grained, atomic-level lDDT scores and uses these scores as supervision, enabling the model to predict accurate atomic-level quality scores. Furthermore, the Atom-ProteinQA model provided in this application incorporates three novel modules, Geometric Perception (GP), Topological Perception (TP), and Cross Model Fusion (CMF), to acquire geometric and chemical information and enhance cross-model mutual information. The Geometric Perception (GP) module incorporates sparse convolution to capture fine-grained local atomic geometry. Compared to traditional 3D convolution and graph convolution, it uses sparse voxel representations, effectively preserving the original geometry of local protein regions. The Topological Perception (TP) module processes graphs generated using chemical bonds and close-range atomic interactions to produce atomic-level predictions, thus fully utilizing the protein's topological structure. Finally, the Cross Model Fusion (CMF) module fuses features from different representations (geometric and chemical features) to improve predictions at the atomic and residue levels. In addition, the protein model quality assessment method provided in this application only requires the protein's atomic coordinates, atom types, amino acid types, and chemical bond connections as input data, without requiring additional information, thus reducing the complexity of data preprocessing and improving the efficiency of protein model quality assessment.

[0149] Please refer to Table 1 for a performance comparison of each protein model quality assessment model on different datasets.

[0150]

[0151]

[0152] In Table 1, CATH-2084 and Decoy-8000 are two different datasets; R_global is the global mean squared error and R_residue is the residue-level mean squared error; P_global is the global Pearson correlation coefficient and P_residue is the residue-level Pearson correlation coefficient.

[0153] As shown in Table 1, the Atom-ProteinQA model provided in this application achieves lower mean squared error and higher Pearson correlation coefficient at both the residue and global levels compared to the baseline model and the aforementioned existing models. In other words, the protein model quality assessment method provided in this application can achieve higher accuracy.
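For reference, the two kinds of metric reported in Table 1 can be computed as in the following sketch, which uses the standard definitions of mean squared error and the Pearson correlation coefficient. The function names are illustrative, and the sketch does not reproduce the evaluation protocol or the values of Table 1.

```python
import math

def mean_squared_error(pred, true):
    """Mean of the squared differences between predicted and true scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

def pearson(pred, true):
    """Pearson correlation coefficient between predicted and true scores.
    Undefined (division by zero) if either input is constant."""
    n = len(true)
    mean_p, mean_t = sum(pred) / n, sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in true))
    return cov / (std_p * std_t)
```

A lower mean squared error and a higher Pearson correlation coefficient both indicate predictions that agree more closely with the reference scores.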

[0154] As described above, the protein model quality assessment method provided in this application obtains the coordinate information and chemical bond information of the target protein model. The target protein model includes multiple amino acid residues. The coordinate information includes the coordinate information of each atom in each amino acid residue, and the chemical bond information includes the chemical bond information between atoms in the target protein model. The method extracts the geometric features of each atom based on its coordinate information, atom type information, and amino acid type information of the amino acid residue to which it belongs. It then extracts the topological features of each atom based on the chemical bond information between atoms. Next, it fuses the geometric features of each atom with its corresponding topological features to obtain the fused feature of each atom. Finally, it determines the model quality score corresponding to each atom in the target protein model based on the fused feature of each atom.

[0155] Therefore, the protein model quality assessment method provided in this application can extract atomic-level geometric and topological features from a protein model, then jointly learn the two features to obtain atomic-level fusion features of the protein model, and determine the atomic-level fine-grained quality score of the protein model based on the fusion features. This method provides a fine-grained, atomic-level protein model quality assessment method that can effectively improve the prediction performance at both the residue level and the overall level. In other words, this method can significantly improve the accuracy of protein model quality assessment.

[0156] To better implement the above methods, this application also provides a protein model quality assessment device, which can be integrated into a terminal or server.

[0157] For example, Figure 3 is a schematic structural diagram of a protein model quality assessment device provided in an embodiment of this application. As shown in Figure 3, the protein model quality assessment device may include an acquisition unit 301, a first extraction unit 302, a second extraction unit 303, a fusion unit 304, and a determination unit 305, as follows:

[0158] The acquisition unit 301 is used to acquire the coordinate information and chemical bond information of the target protein model. The target protein model includes multiple amino acid residues. The coordinate information includes the coordinate information of each atom in each amino acid residue. The chemical bond information includes the chemical bond information between atoms in the target protein model.

[0159] The first extraction unit 302 is used to extract the geometric features of each atom based on the coordinate information of each atom, the atom type information of each atom, and the amino acid type information of the amino acid residue to which each atom belongs.

[0160] The second extraction unit 303 is used to extract the topological features of each atom based on the chemical bond information between atoms;

[0161] The fusion unit 304 is used to fuse the geometric features of each atom with the corresponding topological features to obtain the fused features of each atom.

[0162] The determination unit 305 is used to determine the model quality score corresponding to each atom in the target protein model based on the fused features of each atom.

[0163] In some embodiments, the first extraction unit includes:

[0164] The first acquisition subunit is used to acquire the atom type information of each atom and the amino acid type information of the amino acid residue to which each atom belongs, so as to obtain the attribute characteristics of each atom.

[0165] The transformation subunit is used to perform voxel transformation on the coordinate information and attribute features of each atom to obtain the voxel features of each atom. The voxel features include voxel coordinate features and voxel attribute features.

[0166] The input sub-unit is used to input the voxel features of each atom into a preset convolutional neural network model to obtain the geometric features of each atom.

[0167] In some embodiments, the protein model quality assessment apparatus provided in this application further includes:

[0168] The second acquisition subunit is used to acquire training sample data, which includes coordinate information and chemical bond information of multiple sample protein models.

[0169] The first determining subunit is used to determine the sample voxel features of each sample atom in each sample protein model based on the coordinate information and chemical bond information of each sample protein model.

[0170] The third acquisition subunit is used to acquire the label value corresponding to each sample atom in each sample protein model;

[0171] The first training subunit is used to train a preset convolutional neural network model with the sample voxel features of each sample atom in each sample protein model as input and the label value corresponding to the sample atom as output.

[0172] In some embodiments, the third acquisition subunit includes:

[0173] The acquisition module is used to acquire the reference protein model corresponding to each sample protein model.

[0174] The first calculation module is used to calculate the similarity coefficient between each sample atom in each sample protein model and the corresponding atom in the corresponding reference protein model, so as to obtain the label value corresponding to each sample atom in each sample protein model.

[0175] In some embodiments, the first computing module includes:

[0176] The acquisition submodule is used to acquire the atom index information of the neighboring atoms of the target sample atom within a preset neighborhood radius in the target sample protein model, and to calculate the distance between the target sample atom and each neighboring atom to obtain the first distance information;

[0177] The first calculation submodule is used to determine the reference neighboring atoms in the reference protein model corresponding to the target sample protein model based on the atom index information, and to calculate the distance between the target reference sample atom corresponding to the target sample atom and each reference neighboring atom to obtain the second distance information.

[0178] The processing submodule is used to map the difference between the first distance information and the second distance information using a preset mapping function to obtain the label value corresponding to the target sample atom in the target sample protein model;

[0179] The second calculation submodule is used to traverse each sample atom in each sample protein model and calculate the label value corresponding to each sample atom in each sample protein model.

[0180] In some embodiments, the first training subunit includes:

[0181] The input module is used to input the sample voxel features of each sample atom in each sample protein model into the preset convolutional neural network model, and obtain the output voxel features corresponding to each sample atom output by the preset convolutional neural network model.

[0182] The first processing module is used to devoxelize the output voxel features corresponding to each sample atom to obtain the output score value of each sample atom.

[0183] The adjustment module is used to adjust the model parameters in the preset convolutional neural network model based on the difference between the output score of each sample atom and the label value corresponding to each sample atom, until the preset convolutional neural network model converges.

[0184] In some embodiments, the conversion subunit includes:

[0185] The second processing module is used to perform cubic meshing of the coordinate system of the target protein model according to the preset mesh side length.

[0186] The first determining module is used to determine the cubic grid in which each atom is located, and to determine the voxel coordinate characteristics of each atom based on the coordinate information of the cubic grid.

[0187] The second calculation module is used to calculate the average value of the property characteristics of multiple atoms contained in each cubic grid, so as to obtain the property information of each cubic grid.

[0188] The second determining module is used to determine the voxel attribute features of each atom based on the attribute information of each cubic grid. The voxel coordinate features of the atom and the voxel attribute features of the atom constitute the voxel features of the atom.
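The voxel conversion performed by these modules can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `voxelize`, the default grid side length of 1.0, and the use of integer grid indices as voxel coordinate features are hypothetical choices. Only occupied cells are stored, which matches a sparse voxel representation.

```python
import math

def voxelize(coords, attrs, side=1.0):
    """coords: list of (x, y, z) atom positions
    attrs:  list of per-atom attribute vectors (lists of floats)
    Returns (atom_cells, cell_attrs): the integer grid cell of each atom and
    the mean attribute vector of every occupied cell."""
    # Cubic gridding: the cell index is the floor of coordinate / side length.
    atom_cells = [tuple(math.floor(c / side) for c in xyz) for xyz in coords]
    sums, counts = {}, {}
    for cell, vec in zip(atom_cells, attrs):
        acc = sums.setdefault(cell, [0.0] * len(vec))
        for k, x in enumerate(vec):
            acc[k] += x
        counts[cell] = counts.get(cell, 0) + 1
    # Average the attributes of all atoms that share a cell.
    cell_attrs = {cell: [x / counts[cell] for x in acc]
                  for cell, acc in sums.items()}
    return atom_cells, cell_attrs
```

In this sketch the voxel attribute feature of atom i is `cell_attrs[atom_cells[i]]`, and `atom_cells[i]` serves as its voxel coordinate feature.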

[0189] In some embodiments, the second extraction unit includes:

[0190] The construction subunit is used to construct a graph structure using atoms in the target protein model as nodes and chemical bond information and distance information between atoms as edges;

[0191] The second determining subunit is used to determine the attribute information of each node in the graph structure based on the atom type information of each atom and the amino acid type information of the amino acid residues of each atom.

[0192] The extraction subunit is used to extract the node features of each node in the graph structure based on the graph structure and the attribute information of each node in the graph structure, so as to obtain the topological features of each atom in the target protein model.
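The graph construction and node attribute steps above can be sketched as follows. The sketch makes several assumptions that the text does not fix: an edge is created when two atoms are chemically bonded or lie within a distance cutoff (here 4.0, a hypothetical value), each edge carries a bond flag and a distance, and node attributes are one-hot encodings of atom type and residue type. The names `build_graph`, `one_hot`, and `node_attributes` are illustrative only, and the feature extraction by the network itself is not shown.

```python
from math import dist


def build_graph(coords, bonds, cutoff=4.0):
    """Return edges as {(i, j): (bond_flag, distance)} with i < j.

    Assumption: an edge exists if the two atoms are bonded or if their
    distance does not exceed `cutoff`.
    """
    bonded = {tuple(sorted(pair)) for pair in bonds}
    edges = {}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = dist(coords[i], coords[j])
            is_bonded = (i, j) in bonded
            if is_bonded or d <= cutoff:
                edges[(i, j)] = (1.0 if is_bonded else 0.0, d)
    return edges


def one_hot(value, vocabulary):
    """Encode `value` as a one-hot vector over `vocabulary`."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def node_attributes(atom_types, residue_types, atom_vocab, residue_vocab):
    """Node attribute: one-hot atom type followed by one-hot residue type."""
    return [
        one_hot(a, atom_vocab) + one_hot(r, residue_vocab)
        for a, r in zip(atom_types, residue_types)
    ]


# Toy data: four atoms on a line, two bonds.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
bonds = [(0, 1), (2, 1)]
edges = build_graph(coords, bonds, cutoff=4.0)
attrs = node_attributes(
    ["N", "C", "O", "N"], ["ALA", "ALA", "ALA", "GLY"],
    atom_vocab=["C", "N", "O"], residue_vocab=["ALA", "GLY"],
)
```

The all-pairs loop is quadratic in the number of atoms; for full-size protein models a spatial index would normally replace it.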

[0193] In some embodiments, the fusion unit includes:

[0194] The fourth acquisition subunit is used to acquire preset feature splicing rules;

[0195] The splicing subunit is used to splice the geometric features of each atom into the topological features of each atom according to the preset feature splicing rules, so as to obtain the fused features of each atom.
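As a minimal sketch of the splicing step, the example below assumes that the preset feature splicing rule is plain per-atom concatenation, with the geometric vector placed before the topological vector. The text does not state the rule, so both the ordering and the function name `splice_features` are assumptions.

```python
def splice_features(geometric, topological):
    """Concatenate per-atom geometric and topological feature vectors.

    Assumed rule: fused[i] = geometric[i] followed by topological[i].
    """
    if len(geometric) != len(topological):
        raise ValueError("both feature lists must cover the same atoms")
    return [list(g) + list(t) for g, t in zip(geometric, topological)]


# Toy data: two atoms, 2-dimensional geometric and 1-dimensional topological features.
geometric = [[0.1, 0.2], [0.3, 0.4]]
topological = [[1.0], [2.0]]
fused = splice_features(geometric, topological)
```

With concatenation, the fused dimension is the sum of the two input dimensions, which determines the input width of whatever model consumes the fused features.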

[0196] In some embodiments, the determining unit includes:

[0197] The replacement subunit is used to replace the node features of the corresponding nodes of each atom in the graph structure with the fusion features of each atom, so as to obtain the updated graph structure.

[0198] The processing subunit is used to input the updated graph structure into the graph convolutional neural network model for processing, and to obtain the model quality score of each atom in the output.
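To make the last two subunits concrete, the sketch below replaces node features with fused features and applies one simplified graph convolution step. It is an illustration under stated assumptions, not the network of this application: neighbor features are averaged together with the node's own features (a self-loop), a single linear map with hypothetical weights produces one value per node, and a sigmoid keeps the score between 0 and 1. The name `gcn_score_layer` is invented for this example.

```python
from math import exp


def gcn_score_layer(node_feats, edges, weights, bias=0.0):
    """One simplified graph convolution step producing a score per node.

    node_feats: list of fused feature vectors, one per atom (node).
    edges: iterable of (i, j) index pairs, treated as undirected.
    weights, bias: parameters of the linear map (hypothetical values here;
    in practice they would be learned during training).
    """
    n = len(node_feats)
    neighbors = [[i] for i in range(n)]  # every node includes itself
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    scores = []
    for i in range(n):
        dim = len(node_feats[i])
        # Mean aggregation over the node and its neighbors.
        agg = [
            sum(node_feats[j][d] for j in neighbors[i]) / len(neighbors[i])
            for d in range(dim)
        ]
        z = sum(w * a for w, a in zip(weights, agg)) + bias
        scores.append(1.0 / (1.0 + exp(-z)))  # sigmoid, assumed output range
    return scores


# Updated graph: node features are the fused features of each atom.
fused = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1)]
scores = gcn_score_layer(fused, edges, weights=[2.0, 0.0])
```

A practical model would stack several such layers with nonlinearities between them and would typically be built with a graph learning library rather than written by hand.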

[0199] In some embodiments, the protein model quality assessment apparatus provided in this application further includes:

[0200] The fifth acquisition subunit is used to acquire the sample fusion features of each sample atom of the sample protein model in the training sample data;

[0201] The sixth acquisition subunit is used to acquire the label value of each sample atom in the sample protein model in the training sample data;

[0202] The second training subunit is used to train a graph convolutional neural network model by taking the graph structure constructed from the sample fusion features of each sample atom as input and the label value corresponding to the sample atom as output.

[0203] In some embodiments, the protein model quality assessment apparatus provided in this application further includes:

[0204] The third determining subunit is used to determine the residue-level quality score of each amino acid residue in the target protein model based on the model quality score corresponding to each atom.

[0205] The fourth determining subunit is used to determine the overall quality score of the target protein model based on the model quality score corresponding to each atom.
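One way to derive residue-level and overall values from the per-atom scores is sketched below. The aggregation rule is not specified in the text; this example assumes a plain arithmetic mean over the atoms of each residue and over all atoms of the model. The function name `aggregate_scores` and the residue identifiers are hypothetical.

```python
from collections import defaultdict


def aggregate_scores(atom_scores, residue_ids):
    """Aggregate per-atom scores to residue level and to the whole model.

    Assumption: both aggregations are arithmetic means.
    atom_scores: list of floats; residue_ids: residue identifier per atom.
    Returns (residue_scores, global_score).
    """
    per_residue = defaultdict(list)
    for score, res in zip(atom_scores, residue_ids):
        per_residue[res].append(score)

    residue_scores = {res: sum(v) / len(v) for res, v in per_residue.items()}
    global_score = sum(atom_scores) / len(atom_scores)
    return residue_scores, global_score


# Toy data: four atoms belonging to two residues.
atom_scores = [0.5, 1.0, 0.25, 0.75]
residue_ids = [1, 1, 2, 2]
residue_scores, global_score = aggregate_scores(atom_scores, residue_ids)
```

Other choices, such as averaging only backbone atoms or averaging residue scores rather than atom scores for the overall value, would fit the same interface.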

[0206] In practice, each of the above units can be implemented as an independent entity or can be arbitrarily combined to be implemented as the same or several entities. For the specific implementation of each of the above units, please refer to the previous method embodiments, which will not be repeated here.

[0207] As described above, the protein model quality assessment device provided in this application acquires the coordinate information and chemical bond information of the target protein model through the acquisition unit 301. The target protein model includes multiple amino acid residues, the coordinate information includes the coordinate information of each atom in each amino acid residue, and the chemical bond information includes the chemical bond information between atoms in the target protein model. The first extraction unit 302 extracts the geometric features of each atom based on the coordinate information of each atom, the atom type information of each atom, and the amino acid type information of the amino acid residue to which each atom belongs. The second extraction unit 303 extracts the topological features of each atom based on the chemical bond information between atoms. The fusion unit 304 fuses the geometric features of each atom with the corresponding topological features to obtain the fusion feature of each atom. The determination unit 305 determines the model quality score corresponding to each atom in the target protein model based on the fusion feature of each atom.

[0208] Therefore, the protein model quality assessment method provided in this application can extract atomic-level geometric and topological features from a protein model, then jointly learn the two features to obtain atomic-level fusion features of the protein model, and determine the atomic-level fine-grained quality score of the protein model based on the fusion features. This method provides a fine-grained, atomic-level protein model quality assessment method that can effectively improve the prediction performance at both the residue level and the overall level. In other words, this method can significantly improve the accuracy of protein model quality assessment.

[0209] This application also provides a computer device, which can be a terminal or a server. Figure 4 is a structural schematic of the computer device provided in this application. Specifically:

[0210] The computer device may include components such as a processing unit 401 with one or more processing cores, a storage unit 402 with one or more storage media, a power module 403, and an input module 404. Those skilled in the art will understand that the computer device structure shown in Figure 4 does not constitute a limitation on the computer device, which may include more or fewer components than shown, combine certain components, or have a different arrangement of components. Wherein:

[0211] The processing unit 401 is the control center of the computer device. It connects various parts of the computer device via various interfaces and lines, and performs various functions and processes data by running or executing software programs and / or modules stored in the storage unit 402, and by calling data stored in the storage unit 402. Optionally, the processing unit 401 may include one or more processing cores; preferably, the processing unit 401 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, and applications, and the modem processor mainly handles wireless communication. It is understood that the modem processor may not be integrated into the processing unit 401.

[0212] Storage unit 402 can be used to store software programs and modules. Processing unit 401 executes various functional applications and data processing by running the software programs and modules stored in storage unit 402. Storage unit 402 may mainly include a program storage area and a data storage area. The program storage area may store the operating system, at least one application program required for a function (such as sound playback, image playback, and web page access); the data storage area may store data created based on the use of the computer device. In addition, storage unit 402 may include high-speed random access memory and may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, storage unit 402 may also include a memory controller to provide processing unit 401 with access to storage unit 402.

[0213] The computer equipment also includes a power supply module 403 that supplies power to various components. Preferably, the power supply module 403 can be logically connected to the processing unit 401 through a power management system, thereby enabling functions such as charging, discharging, and power consumption management through the power management system. The power supply module 403 may also include one or more DC or AC power supplies, recharging systems, power fault detection circuits, power converters or inverters, power status indicators, and other arbitrary components.

[0214] The computer device may also include an input module 404, which can be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.

[0215] Although not shown, the computer device may also include a display unit, etc., which will not be described in detail here. Specifically, in this embodiment, the processing unit 401 in the computer device loads the executable files corresponding to the processes of one or more applications into the storage unit 402 according to the following instructions, and the processing unit 401 runs the applications stored in the storage unit 402 to realize various functions, as follows:

[0216] The process involves acquiring the coordinate and chemical bond information of a target protein model, which comprises multiple amino acid residues. The coordinate information includes the coordinates of each atom within each amino acid residue, and the chemical bond information includes the chemical bonds between atoms in the target protein model. Geometric features of each atom are extracted based on its coordinates, atom type, and the amino acid type of the amino acid residue to which it belongs. Topological features of each atom are extracted based on the chemical bond information between atoms. The geometric and topological features of each atom are then fused to obtain a fused feature for each atom. Finally, the model quality score corresponding to each atom in the target protein model is determined based on the fused feature.

[0217] It should be noted that the computer device provided in this application embodiment and the method in the above embodiment belong to the same concept. The specific implementation of each of the above operations can be found in the previous embodiments, and will not be repeated here.

[0218] Those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be performed by instructions, or by instructions controlling related hardware. These instructions can be stored in a computer-readable storage medium and loaded and executed by a processor.

[0219] Therefore, embodiments of the present invention provide a computer-readable storage medium storing a plurality of instructions that can be loaded by a processor to execute steps in any of the methods provided in the embodiments of the present invention. For example, the instructions can execute the following steps:

[0220] The process involves acquiring the coordinate and chemical bond information of a target protein model, which comprises multiple amino acid residues. The coordinate information includes the coordinates of each atom within each amino acid residue, and the chemical bond information includes the chemical bonds between atoms in the target protein model. Geometric features of each atom are extracted based on its coordinates, atom type, and the amino acid type of the amino acid residue to which it belongs. Topological features of each atom are extracted based on the chemical bond information between atoms. The geometric and topological features of each atom are then fused to obtain a fused feature for each atom. Finally, the model quality score corresponding to each atom in the target protein model is determined based on the fused feature.

[0221] For details on the implementation of each of the above operations, please refer to the previous examples, which will not be repeated here.

[0222] The computer-readable storage medium may include: read-only memory (ROM), random access memory (RAM), disk or optical disk, etc.

[0223] Since the instructions stored in the computer-readable storage medium can execute the steps of any of the methods provided in the embodiments of the present invention, the beneficial effects that any of the methods provided in the embodiments of the present invention can achieve can be realized, as detailed in the preceding embodiments, and will not be repeated here.

[0224] According to one aspect of this application, a computer program product or computer program is provided, comprising computer instructions stored in a storage medium. A processor of a computer device reads the computer instructions from the storage medium and executes the computer instructions, causing the computer device to perform the methods provided in various optional implementations of the protein model quality assessment method described above.

[0225] The protein model quality assessment method, apparatus, and computer equipment provided in the embodiments of the present invention have been described in detail above. Specific examples have been used to illustrate the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of the present invention. Therefore, the content of this specification should not be construed as a limitation of the present invention.

Claims

1. A method for assessing the quality of a protein model, characterized in that, The method includes: The coordinate information and chemical bond information of the target protein model are obtained. The target protein model includes multiple amino acid residues. The coordinate information includes the coordinate information of each atom in each amino acid residue. The chemical bond information includes the chemical bond information between atoms in the target protein model. The geometric features of each atom are extracted based on the coordinate information of each atom, the atom type information of each atom, and the amino acid type information of the amino acid residue to which each atom belongs. The topological features of each atom are extracted based on the chemical bond information between the atoms; The geometric features of each atom are fused with the corresponding topological features to obtain the fused features of each atom. The model quality score corresponding to each atom in the target protein model is determined based on the fused features of each atom. The step of extracting the geometric features of each atom based on its coordinate information, atom type information, and amino acid type information of the amino acid residue to which it belongs includes: Obtain the atom type information of each atom and the amino acid type information of the amino acid residue to which each atom belongs, and obtain the attribute features of each atom. The coordinate information and attribute features of each atom are converted into voxels to obtain the voxel features of each atom, wherein the voxel features include voxel coordinate features and voxel attribute features. The voxel features of each atom are input into a preset convolutional neural network model to obtain the geometric features of each atom.
The extraction of topological features of each atom based on the chemical bond information between the atoms includes: A graph structure is constructed using atoms in the target protein model as nodes and chemical bond information and distance information between the atoms as edges. The attribute information of each node in the graph structure is determined based on the atom type information of each atom and the amino acid type information of the amino acid residues of each atom. Based on the graph structure and the attribute information of each node in the graph structure, the node features of each node in the graph structure are extracted to obtain the topological features of each atom in the target protein model. The determination of the model quality score corresponding to each atom in the target protein model based on the fusion characteristics of each atom includes: The node features corresponding to each atom in the graph structure are replaced with the fusion features of each atom to obtain the updated graph structure. The updated graph structure is input into a graph convolutional neural network model for processing, and the model quality score of each atom in the output is obtained.

2. The method according to claim 1, characterized in that, Before inputting the voxel features of each atom into a preset convolutional neural network model to obtain the geometric features of each atom, the process further includes: Acquire training sample data, which includes coordinate information and chemical bond information of multiple sample protein models; The sample voxel features of each sample atom in each sample protein model are determined based on the coordinate information of each sample protein model. Obtain the label value corresponding to each sample atom in each sample protein model; The preset convolutional neural network model is trained by taking the sample voxel features of each sample atom in each sample protein model as input and the label value corresponding to the sample atom as output.

3. The method according to claim 2, characterized in that, The process of obtaining the label value corresponding to each sample atom in each sample protein model includes: Obtain the reference protein model corresponding to each sample protein model; Calculate the similarity coefficient between each sample atom in each sample protein model and the corresponding atom in the corresponding reference protein model to obtain the label value corresponding to each sample atom in each sample protein model.

4. The method according to claim 3, characterized in that, The calculation of the similarity coefficient between each sample atom in each sample protein model and the corresponding atom in the corresponding reference protein model, to obtain the label value corresponding to each sample atom in each sample protein model, includes: Obtain the atom serial number information of the neighboring atoms of the target sample atom within a preset neighborhood radius in the target sample protein model, and calculate the distance between the target sample atom and each neighboring atom to obtain the first distance information; Based on the atom serial number information, reference neighboring atoms are determined in the reference protein model corresponding to the target sample protein model, and the distance between the target reference sample atom corresponding to the target sample atom and each reference neighboring atom is calculated to obtain the second distance information; A preset mapping function is used to map the difference between the first distance information and the second distance information to obtain the label value corresponding to the target sample atom in the target sample protein model; Iterate through each sample atom in each sample protein model and calculate the label value corresponding to each sample atom in each sample protein model.

5. The method according to claim 2, characterized in that, The process of training the preset convolutional neural network model by taking the sample voxel features of each sample atom in each sample protein model as input and the label value corresponding to the sample atom as output includes: The sample voxel features of each sample atom in each sample protein model are input into the preset convolutional neural network model to obtain the output voxel features corresponding to each sample atom output by the preset convolutional neural network model. The output voxel features corresponding to each sample atom are devoxelized to obtain the output score of each sample atom. The model parameters in the preset convolutional neural network model are adjusted based on the difference between the output score of each sample atom and the label value corresponding to each sample atom until the preset convolutional neural network model converges.

6. The method according to claim 1, characterized in that, The conversion of the coordinate information and attribute features of each atom into voxels to obtain the voxel features of each atom, wherein the voxel features include voxel coordinate features and voxel attribute features, includes: The coordinate system of the target protein model is cubically meshed according to the preset mesh side length; Determine the cubic grid where each atom is located, and determine the voxel coordinate features of each atom based on the coordinate information of the cubic grid; Calculate the average value of the attribute features of the multiple atoms contained in each cubic grid to obtain the attribute information of each cubic grid; The voxel attribute features of each atom are determined based on the attribute information of each cubic grid. The voxel coordinate features of the atom and the voxel attribute features of the atom constitute the voxel features of the atom.

7. The method according to claim 1, characterized in that, The process of fusing the geometric features of each atom with its corresponding topological features to obtain the fused features of each atom includes: Obtain preset feature splicing rules; According to the preset feature splicing rules, the geometric features of each atom are spliced into the topological features of each atom to obtain the fused features of each atom.

8. The method according to claim 7, characterized in that, Before inputting the updated graph structure into the graph convolutional neural network model for processing to obtain the model quality score of each atom, the process further includes: Obtain the sample fusion features of each atom in the protein model of the training sample data; Obtain the label value of each atom in the protein model of the training sample data; The graph structure, constructed using the sample fusion features of each sample atom, is used as input, and the label value corresponding to the sample atom is used as output to train the graph convolutional neural network model.

9. The method according to claim 1, characterized in that, The method further includes: The residue-level quality score of each amino acid residue in the target protein model is determined based on the model quality score corresponding to each atom. The overall quality score of the target protein model is determined based on the model quality score corresponding to each atom.

10. A protein model quality assessment device, characterized in that, The device includes: The acquisition unit is used to acquire coordinate information and chemical bond information of a target protein model, wherein the target protein model includes multiple amino acid residues, the coordinate information includes the coordinate information of each atom in each amino acid residue, and the chemical bond information includes the chemical bond information between atoms in the target protein model. The first extraction unit is used to extract the geometric features of each atom based on the coordinate information of each atom, the atom type information of each atom, and the amino acid type information of the amino acid residue to which each atom belongs. The second extraction unit is used to extract the topological features of each atom based on the chemical bond information between the atoms; The fusion unit is used to fuse the geometric features of each atom with the corresponding topological features to obtain the fused features of each atom. A determining unit is used to determine the model quality score corresponding to each atom in the target protein model based on the fused features of each atom. The step of extracting the geometric features of each atom based on its coordinate information, atom type information, and amino acid type information of the amino acid residue to which it belongs includes: Obtain the atom type information of each atom and the amino acid type information of the amino acid residue to which each atom belongs, and obtain the attribute features of each atom. The coordinate information and attribute features of each atom are converted into voxels to obtain the voxel features of each atom, wherein the voxel features include voxel coordinate features and voxel attribute features. The voxel features of each atom are input into a preset convolutional neural network model to obtain the geometric features of each atom.
The extraction of topological features of each atom based on the chemical bond information between the atoms includes: A graph structure is constructed using atoms in the target protein model as nodes and chemical bond information and distance information between the atoms as edges. The attribute information of each node in the graph structure is determined based on the atom type information of each atom and the amino acid type information of the amino acid residues of each atom. Based on the graph structure and the attribute information of each node in the graph structure, the node features of each node in the graph structure are extracted to obtain the topological features of each atom in the target protein model. The determination of the model quality score corresponding to each atom in the target protein model based on the fusion characteristics of each atom includes: The node features corresponding to each atom in the graph structure are replaced with the fusion features of each atom to obtain the updated graph structure. The updated graph structure is input into a graph convolutional neural network model for processing, and the model quality score of each atom in the output is obtained.

11. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a plurality of instructions adapted for loading by a processor to perform the steps of the protein model quality assessment method according to any one of claims 1 to 9.

12. A computer device, characterized in that, The computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the protein model quality assessment method according to any one of claims 1 to 9.