Method and apparatus for analyzing protein structure rigidity weaknesses based on atomic node and network constraint models

An atomic node and network constraint model is used to analyze the rigidity weaknesses of proteins. This addresses the difficulty existing technologies have in pinpointing such weaknesses and enables rapid identification of modification sites for improving protein stability.

CN116665765B · Active · Publication Date: 2026-06-30 · SHENZHEN NEWROSETTA BIOSCIENCES CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHENZHEN NEWROSETTA BIOSCIENCES CO LTD
Filing Date
2023-06-14
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing technologies make it difficult to quickly and accurately analyze the rigidity weaknesses of proteins starting from their primary or secondary structures. Because proteins are easily inactivated under extreme conditions, this difficulty hampers efforts to stabilize them and limits their applications.

Method used

The crystal structure of the target protein molecule is obtained and subjected to a free energy minimization simulation. An atomic node and network constraint model is constructed from the energy-minimized structure, a thermal unfolding molecular dynamics simulation is run on the model until it reaches motion equilibrium, and the rigidity-weak regions of the protein are then identified.

Benefits of technology

It can quickly analyze the rigid weaknesses of protein structure, provide design references for improving protein stability, and identify remodeling sites to enhance protein stability.

✦ Generated by Eureka AI based on patent content.


Abstract

This application relates to a method and apparatus for analyzing the rigidity weaknesses of protein structures based on an atomic node and network constraint model. The method includes: obtaining the first crystal structure of the target protein molecule; performing a free energy minimization simulation on the first crystal structure to obtain an energy-minimized crystal structure; constructing an atomic node and network constraint model of the energy-minimized crystal structure; performing a thermal unfolding molecular dynamics simulation on the atomic node and network constraint model until the model reaches motion equilibrium; analyzing the model in motion equilibrium to obtain the analysis results; and identifying the rigidity-weak regions in the target protein molecule based on the analysis results. The scheme provided in this application can calculate the rigidity weaknesses of the protein structure and find more suitable modification sites or regions, thereby enabling targeted mutation of the protein to improve the stability of the protein molecule.

Description

Technical Field

[0001] This application relates to the fields of computer and computational structural biology, and in particular to a method and apparatus for analyzing the rigidity weaknesses of protein structures based on an atomic node and network constraint model.

Background Technology

[0002] Proteins are macromolecules that perform physiological functions efficiently and with high specificity, and can be widely used in the fields of biomedicine and chemical production. However, proteins typically function under mild conditions and are easily inactivated under extreme conditions, which severely limits their applications.

[0003] A small number of proteins in nature can remain active under extreme conditions. For example, the DNA polymerase used in the polymerase chain reaction comes from a thermophilic bacterium that grows in hot springs, and its thermostability is far higher than that of the Escherichia coli DNA polymerase used originally. Although the two polymerases are similar in sequence, their thermostability differs drastically.

[0004] In 1961, Christian Anfinsen completed denaturation and renaturation experiments on a series of proteins, including bovine pancreatic ribonuclease, and found that unfolded or denatured proteins can recover their original structure under physiological conditions. For example, high temperatures or chemical agents can denature proteins, causing their structure to loosen or disintegrate. When the environment returns to its original state, the loosened or disintegrated protein spontaneously folds back into its original three-dimensional structure, and no matter how many times the experiment is repeated, the protein always adopts only this one three-dimensional structure. Anfinsen then pointed out that the amino acid sequence, that is, the primary structure of a protein, contains all the information about its secondary and higher-order structures; in other words, the primary structure of a protein determines its higher-order structure.

[0005] Therefore, how to quickly and accurately analyze the rigidity weaknesses of proteins by starting with their primary or secondary structure is an urgent problem to be solved.

Summary of the Invention

[0006] To address or partially address the problems existing in related technologies, this application provides a method and apparatus for analyzing the rigidity weaknesses of protein structures based on an atomic node and network constraint model. This method can quickly analyze the rigidity weaknesses of protein structures and provide a reference for rational design aimed at improving protein stability.

[0007] The first aspect of this application provides a method for analyzing the rigidity weaknesses of protein structures based on an atomic node and network constraint model, including:

[0008] Obtain the first crystal structure of the target protein molecule;

[0009] Perform a free energy minimization simulation on the first crystal structure to obtain the energy-minimized crystal structure of the target protein molecule;

[0010] Construct an atomic node and network constraint model of the energy-minimized crystal structure;

[0011] Perform a thermal unfolding molecular dynamics simulation on the atomic node and network constraint model of the energy-minimized crystal structure until the model is in motion equilibrium;

[0012] Analyze the atomic node and network constraint model of the energy-minimized crystal structure in motion equilibrium to obtain the analysis results;

[0013] Based on the analysis results, identify the rigidity-weak regions in the target protein molecule.

[0014] As an optional embodiment, obtaining the first crystal structure of the target protein molecule includes:

[0015] Obtain a second crystal structure containing the target protein molecule;

[0016] The second crystal structure is repaired to obtain the repaired second crystal structure;

[0017] The first crystal structure of the target protein molecule is isolated from the repaired second crystal structure.

[0018] As an optional embodiment, the step of performing free energy minimization simulation on the first crystal structure to obtain the energy-minimized crystal structure of the target protein molecule includes:

[0019] A water molecule model and force field are added to the first crystal structure to construct the topological structure of the target protein molecule;

[0020] Add a simulation box to the topology;

[0021] Preprocess the topology with the added simulation box by running an energy minimization simulation under vacuum conditions using a preset method;

[0022] Formally perform the energy minimization simulation on the topology with the added simulation box under vacuum conditions to obtain the simulation results;

[0023] The topological structure corresponding to the simulation results is converted into the energy-minimized crystal structure of the target protein molecule, and the energy-minimized crystal structure is output.

[0024] As an optional embodiment, constructing the atomic node and network constraint model of the energy-minimized crystal structure includes:

[0025] All atoms in the energy-minimized crystal structure are treated as atomic nodes, and the covalent and non-covalent bonds between the atoms are treated as connecting lines;

[0026] The motion pattern of each connecting line is set according to the type of covalent or non-covalent bond it represents.

[0027] As an optional embodiment, setting the motion pattern of each connecting line according to the type of covalent or non-covalent bond includes:

[0028] If the covalent bond is a peptide bond or a covalent double bond, rotation of the corresponding connecting line is restricted;

[0029] If the covalent bond is a covalent single bond, the corresponding connecting line is set to be rotatable around the bond axis and unbreakable;

[0030] If the non-covalent bond is a hydrogen bond, a salt bridge, or a hydrophobic interaction, the corresponding connecting line is set to be breakable.

[0031] As an optional embodiment, the van der Waals interactions between the atoms are not considered as connections.

[0032] As an optional embodiment, performing the thermal unfolding molecular dynamics simulation on the atomic node and network constraint model of the energy-minimized crystal structure until the model reaches motion equilibrium includes:

[0033] Simulating the breaking of connecting lines in the atomic node and network constraint model of the energy-minimized crystal structure during heating, until the model is in motion equilibrium.

[0034] As an optional embodiment, the step of analyzing the atomic node and network constraint model of the energy-minimizing crystal structure in a state of motion equilibrium and obtaining the analytical results includes:

[0035] The atomic nodes whose connections remain unbroken and whose original networks are still intact are grouped into one category and further divided into different structural clusters.

[0036] By labeling the different structural clusters with different tags, multiple different structural clusters with different tags are obtained.

[0037] As an optional embodiment, identifying the rigidity-weak regions in the target protein molecule based on the analysis results includes:

[0038] Based on the multiple different structural clusters with different labels, identify the structural clusters with relatively weak interactions, and find the secondary structure information of the amino acid residues in the structural clusters with relatively weak interactions.

[0039] A second aspect of this application provides a device for analyzing the rigid weaknesses of protein structures based on an atomic node and network constraint model, comprising:

[0040] The acquisition module is used to acquire the first crystal structure of the target protein molecule;

[0041] The first simulation module is used to perform a free energy minimization simulation on the first crystal structure to obtain the energy-minimized crystal structure of the target protein molecule.

[0042] The building module is used to construct the atomic node and network constraint model of the energy-minimized crystal structure.

[0043] The second simulation module is used to perform a thermal unfolding molecular dynamics simulation on the atomic node and network constraint model of the energy-minimized crystal structure until the model is in motion equilibrium.

[0044] The analysis module is used to analyze the atomic node and network constraint model of the energy-minimized crystal structure in motion equilibrium and obtain the analysis results.

[0045] The determination module is used to identify the rigidity-weak regions in the target protein molecule based on the analysis results.

[0046] A third aspect of this application provides an electronic device, comprising:

[0047] Processor; and

[0048] A memory that stores executable code, which, when executed by the processor, causes the processor to perform the method described above.

[0049] A fourth aspect of this application provides a computer-readable storage medium having executable code stored thereon, which, when executed by a processor of an electronic device, causes the processor to perform the method described above.

[0050] The technical solution provided in this application may include the following beneficial effects:

[0051] In the embodiments of this application, the first crystal structure of the target protein molecule is obtained, and a free energy minimization simulation is performed on it to obtain an energy-minimized crystal structure. An atomic node and network constraint model of the energy-minimized crystal structure is then constructed, transforming the complex protein structure into a simple structure of nodes, connecting lines, and networks. A thermal unfolding molecular dynamics simulation is performed on the model until it reaches motion equilibrium; the model in motion equilibrium is analyzed to obtain the analysis results; and, based on those results, the rigidity-weak regions in the target protein molecule are identified. In this way, the rigidity weaknesses of the target protein structure can be calculated and more suitable modification sites or regions found, so that the protein molecule can be mutated in a targeted manner to improve its stability.

[0052] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit this application.

Description of the Drawings

[0053] The above and other objects, features and advantages of this application will become more apparent from the more detailed description of exemplary embodiments thereof in conjunction with the accompanying drawings, wherein the same reference numerals generally represent the same components in the exemplary embodiments thereof.

[0054] Figure 1 is a flowchart of a method for analyzing the rigidity weaknesses of protein structures based on an atomic node and network constraint model, as shown in an embodiment of this application.

[0055] Figure 2 is another flowchart of a method for analyzing the rigidity weaknesses of protein structures based on an atomic node and network constraint model, as shown in an embodiment of this application.

[0056] Figure 3 is a schematic flow diagram of a method for analyzing the rigidity weaknesses of protein structures based on an atomic node and network constraint model, as shown in an embodiment of this application.

[0057] Figure 4 is a schematic diagram of the rigidity-weak regions of the FGF10 protein molecule, as shown in an embodiment of this application.

[0058] Figure 5 is a schematic diagram of the device for analyzing the rigidity weaknesses of protein structures based on an atomic node and network constraint model, as shown in an embodiment of this application.

[0059] Figure 6 is a schematic diagram of the structure of an electronic device, as shown in an embodiment of this application.

Detailed Implementation

[0060] Embodiments of this application will now be described in more detail with reference to the accompanying drawings. While embodiments of this application are shown in the drawings, it should be understood that this application may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided to make this application more thorough and complete, and to fully convey the scope of this application to those skilled in the art.

[0061] The terminology used in this application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. The singular forms “a,” “an,” and “the” used in this application and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and/or” as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.

[0062] It should be understood that although the terms "first," "second," etc., may be used in this application to describe various information, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this application, "multiple" means two or more, unless otherwise explicitly specified.

[0063] This application provides a method for analyzing the rigidity weaknesses of protein structures based on an atomic node and network constraint model. Based on computer and computational structural biology techniques, it can quickly analyze the rigidity weaknesses of protein structures and provide a reference for rational design aimed at improving protein stability.

[0064] The technical solutions of the embodiments of this application are described in detail below with reference to the accompanying drawings.

[0065] Figure 1 is a flowchart of a method for analyzing the rigidity weaknesses of protein structures based on an atomic node and network constraint model, as shown in an embodiment of this application.

[0066] Referring to Figure 1, this application provides a method for analyzing the rigidity weaknesses of protein structures based on an atomic node and network constraint model, comprising steps S1 to S6:

[0067] Step S1: Obtain the first crystal structure of the target protein molecule.

[0068] In the embodiments of this application, the first crystal structure of the target protein molecule can be obtained as follows: obtaining a second crystal structure containing the target protein molecule; repairing the second crystal structure to obtain the repaired second crystal structure; and isolating the first crystal structure of the target protein molecule from the repaired second crystal structure.

[0069] In this application, the second crystal structure refers to the crystal structure of the target protein molecule resolved by biophysical methods. The second crystal structure is obtained from a protein structure database (PDB). The protein structure database can be pre-constructed. It should also be noted that related modeling methods can be used for modeling, and this application does not limit this approach.

[0070] Step S2: Perform free energy minimization simulation on the first crystal structure to obtain the energy-minimal crystal structure of the target protein molecule.

[0071] The embodiments of this application can obtain the energy-minimized crystal structure in the following way: adding a water molecule model and force field to the first crystal structure to construct the topological structure of the target protein molecule; adding a simulation box to the topological structure; preprocessing the topological structure under vacuum conditions using a preset method to perform energy-minimization simulation; formally performing energy-minimization simulation on the topological structure under vacuum conditions to obtain the simulation results; converting the topological structure corresponding to the simulation results into the energy-minimized crystal structure of the target protein molecule, and outputting the energy-minimized crystal structure.

[0072] Step S3: Construct an atomic node and network constraint model for an energy-minimized crystal structure.

[0073] The embodiments of this application can construct an atomic node and network constraint model of an energy-minimizing crystal structure in the following way: taking all atoms in the energy-minimizing crystal structure as atomic nodes, and taking all covalent and non-covalent bonds between atoms as lines; setting the motion form of the corresponding lines according to the type of covalent and non-covalent bonds.

[0074] Setting the motion pattern of each connecting line according to the type of covalent or non-covalent bond may include: if the covalent bond is a peptide bond or a covalent double bond, rotation of the corresponding connecting line is restricted; if the covalent bond is a covalent single bond, the corresponding connecting line is set to be rotatable around the bond axis and unbreakable; if the non-covalent bond is a hydrogen bond, a salt bridge, or a hydrophobic interaction, the corresponding connecting line is set to be breakable.

[0075] Step S4: Perform a thermal unfolding molecular dynamics simulation on the atomic node and network constraint model of the energy-minimized crystal structure until the model is in motion equilibrium.

[0076] In the embodiments of this application, step S4 can be carried out as follows: simulating the breaking of connecting lines in the atomic node and network constraint model of the energy-minimized crystal structure during heating, until the model is in motion equilibrium.

[0077] Step S5: Analyze the atomic node and network constraint model of the energy-minimizing crystal structure in motion equilibrium and obtain the analytical results.

[0078] The embodiments of this application can be analyzed in the following way: the atomic nodes whose connections are not broken and whose original networks are still maintained are clustered into one class and divided into different structural clusters; different structural clusters are labeled with different tags to obtain multiple different structural clusters with different tags.

[0079] In other words, interacting nodes can be understood as a group of atomic nodes whose connections remain intact and whose original network is maintained during thermal unfolding. Such groups of atomic nodes can therefore be clustered into one class and further divided into different structural clusters.

[0080] Step S6: Based on the analysis results, identify the rigidity-weak regions in the target protein molecule.

[0081] The embodiments of this application can identify structural clusters with relatively weak interactions based on multiple different structural clusters with different labels, and find the secondary structure information of amino acid residues in the structural clusters with relatively weak interactions.

[0082] In the embodiments of this application, the first crystal structure of the target protein molecule is obtained, and a free energy minimization simulation is performed on it to obtain an energy-minimized crystal structure. An atomic node and network constraint model of the energy-minimized crystal structure is then constructed, transforming the complex protein structure into a simple structure of nodes, connecting lines, and networks. A thermal unfolding molecular dynamics simulation is performed on the model until it reaches motion equilibrium; the model in motion equilibrium is analyzed to obtain the analysis results; and, based on those results, the rigidity-weak regions in the target protein molecule are identified. In this way, the rigidity weaknesses of the target protein structure can be calculated and more suitable modification sites or regions found, so that the protein molecule can be mutated in a targeted manner to improve its stability.

[0083] Figure 2 is another flowchart of a method for analyzing the rigidity weaknesses of protein structures based on an atomic node and network constraint model, as shown in an embodiment of this application. Figure 3 is a schematic flow diagram of the same method, as shown in an embodiment of this application.

[0084] Referring to Figure 2 and Figure 3, this application discloses a method for analyzing the rigidity weaknesses of protein structures based on an atomic node and network constraint model, comprising the following steps:

[0085] S10: Obtain the second crystal structure containing the target protein molecule.

[0086] The second crystal structure in this application refers to the crystal structure of the target protein molecule resolved by biophysical methods. The second crystal structure is obtained from a protein structure database (PDB). The protein structure database can be pre-constructed. It should also be noted that related modeling methods can be used for modeling, and this application does not limit this approach.

[0087] S11: Repair the second crystal structure to obtain the repaired second crystal structure.

[0088] For example, in the second crystal structure, the amide groups in the side chains of glutamine and asparagine residues, and the imidazole groups in the side chains of histidine residues, appear highly symmetric in the electron density, so their orientation is ambiguous. It is therefore necessary to flip the side chains of these amino acid residues by 180° and then determine their specific positions by calculating their interactions with surrounding atoms. Other amino acid residues are energy-optimized through subtle rotations of their side chain groups or peptide planes. Finally, the repaired second crystal structure is obtained.
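The 180° flip described in this step can be pictured with a minimal Python sketch that rotates one atom by half a turn about an axis defined by two other atoms. The function name `flip_180` and the choice of axis atoms (for asparagine, the CB to CG bond would be one plausible choice) are illustrative assumptions and are not specified in this application; scoring the two orientations against the surrounding atoms is not shown.

```python
import math

def flip_180(point, axis_start, axis_end):
    """Rotate `point` by 180 degrees about the line through axis_start and axis_end.

    For a half turn, the component of the point along the axis is kept and the
    perpendicular component is negated: p' = s + 2*(r.u)*u - r, with r = p - s.
    """
    axis = [e - s for s, e in zip(axis_start, axis_end)]
    norm = math.sqrt(sum(c * c for c in axis))
    unit = [c / norm for c in axis]
    rel = [p - s for p, s in zip(point, axis_start)]
    along = sum(r * u for r, u in zip(rel, unit))
    return tuple(s + 2.0 * along * u - r for s, u, r in zip(axis_start, unit, rel))

# Hypothetical coordinates: an atom next to an axis running along z.
flipped = flip_180((1.0, 0.0, 0.5), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
print(flipped)
```

In practice both the original and the flipped orientation would be evaluated, and the one with the more favorable interactions kept.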

[0089] S12: The first crystal structure of the target protein molecule is isolated from the repaired second crystal structure.

[0090] The first crystal structure of the target protein molecule can be separated from the repaired second crystal structure using separation methods in related technologies, but this application does not limit this method.
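One simple way to picture the isolation step is to keep only the atom records of a single chain. The sketch below assumes that the structure is given as PDB-format text and that the target protein corresponds to one chain identifier; the function name `isolate_chain` is hypothetical, and this application does not prescribe any particular separation method.

```python
def isolate_chain(pdb_text: str, chain_id: str) -> str:
    """Return only the ATOM records that belong to the given chain."""
    kept = []
    for line in pdb_text.splitlines():
        # In the PDB fixed-column format the chain identifier is column 22
        # (string index 21). HETATM records such as waters and ligands do not
        # start with "ATOM" and are therefore dropped.
        if line.startswith("ATOM") and len(line) > 21 and line[21] == chain_id:
            kept.append(line)
    return "\n".join(kept)
```

Calling `isolate_chain(text, "A")` on a multi-chain entry would return the protein atoms of chain A only, leaving out other chains, solvent, and ligands.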

[0091] S20: Add a water molecule model and force field to the first crystal structure to construct the topological structure of the target protein molecule.

[0092] The embodiments of this application can utilize molecular dynamics simulations, input a first crystal structure, add a water molecule model such as TIP3P and a force field such as Amber99sb-ildn to the first crystal structure, and construct the topological structure of the target protein molecule.

[0093] S21: Add a simulation box to the topology.

[0094] The simulation box in this application embodiment can be a solvent box, and the shape and size of the simulation box can be defined. For example, a cubic simulation box with a boundary distance of 2 nm can be added to the topology of the target protein molecule.
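As a rough illustration of how the box size follows from the boundary distance, the sketch below computes the edge length of a cubic box from atomic coordinates. It assumes that "boundary distance" means the minimum gap between the protein and each box face, and the function name `cubic_box_edge` is hypothetical.

```python
def cubic_box_edge(coords_nm, margin_nm=2.0):
    """Edge length (nm) of a cube that leaves `margin_nm` on every side.

    coords_nm: iterable of (x, y, z) atom positions in nanometres.
    """
    xs, ys, zs = zip(*coords_nm)
    # The largest extent along any axis decides the size of the cube.
    extent = max(max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs))
    return extent + 2.0 * margin_nm

# Hypothetical coordinates spanning 3 nm along x.
print(cubic_box_edge([(0.0, 0.0, 0.0), (3.0, 1.0, 2.0)]))
```

Under this assumption a protein spanning 3 nm at its widest would be placed in a cube with a 7 nm edge.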

[0095] S22: Using a preset method, run a preprocessing energy minimization simulation on the topology with the added simulation box under vacuum conditions.

[0096] The preset method in this application embodiment can be the steepest descent method, and the number of steps for minimizing energy can be set according to the actual situation.

[0097] S23: Perform energy minimization simulation on the topology with added simulation box under vacuum conditions and obtain simulation results.

[0098] The optimal number of steps for energy minimization can be obtained from the preprocessing in step S22. The optimal number of steps for energy minimization is then used to perform energy minimization simulation of the topology under vacuum conditions.
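The two-stage procedure of steps S22 and S23 can be sketched on a toy problem. The example below applies steepest descent to a one-dimensional harmonic energy: a preprocessing run with a generous step limit records how many steps are needed to converge, and the formal run then uses that number. The energy function, step size, and tolerance are illustrative assumptions and do not come from this application.

```python
def gradient(x, k=1.0, x0=1.5):
    """Gradient of the toy energy E(x) = 0.5 * k * (x - x0)**2."""
    return k * (x - x0)

def steepest_descent(x, max_steps, step_size=0.1, force_tol=1e-3):
    """Move against the gradient until it is small or max_steps is reached.

    Returns the final coordinate and the number of steps actually taken.
    """
    steps_used = 0
    for _ in range(max_steps):
        g = gradient(x)
        if abs(g) < force_tol:
            break
        x -= step_size * g
        steps_used += 1
    return x, steps_used

# Preprocessing run (S22): find out how many steps convergence takes.
_, n_steps = steepest_descent(5.0, max_steps=10_000)
# Formal run (S23): repeat the minimization with that number of steps.
x_min, _ = steepest_descent(5.0, max_steps=n_steps)
print(n_steps, x_min)
```

A real force field has many coupled degrees of freedom, but the division of labor is the same: the trial run sizes the job and the formal run produces the structure that is carried forward.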

[0099] S24: Convert the topological structure corresponding to the simulation results into the energy-minimized crystal structure of the target protein molecule, and output the energy-minimized crystal structure.

[0100] Since the simulation results are the topological structure of the target protein molecule, it is necessary to convert the topological structure of the target protein molecule into a crystal structure, and use it as the energy-minimizing crystal structure of the target protein molecule.

[0101] S30: All atoms in the energy-minimized crystal structure are treated as atomic nodes, and all covalent and non-covalent bonds between atoms are treated as lines.

[0102] In this step, all atoms in the energy-minimized crystal structure are treated as nodes, and all covalent and non-covalent bonds between atoms are treated as connecting lines. Optionally, van der Waals interactions between atoms are not treated as connecting lines.

[0103] Constructing the atomic node and network constraint model of the energy-minimized crystal structure transforms the complex protein structure into a simple structure of nodes, connecting lines, and networks, which greatly reduces the subsequent computational load and improves the efficiency of analyzing the rigidity weaknesses of protein structures.
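The node-and-line representation described in step S30 maps naturally onto a graph data structure. The following is a minimal sketch, assuming that atoms are given as identifiers and interactions as `(atom_a, atom_b, kind)` tuples; the class name, function name, and kind strings are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ConstraintNetwork:
    nodes: set = field(default_factory=set)
    # Maps an unordered atom pair to the type of bond it represents.
    edges: dict = field(default_factory=dict)

def build_network(atoms, interactions):
    """Build the network, skipping van der Waals contacts (optional embodiment)."""
    network = ConstraintNetwork(nodes=set(atoms))
    for atom_a, atom_b, kind in interactions:
        if kind == "van_der_waals":
            continue  # not treated as a connecting line
        network.edges[frozenset((atom_a, atom_b))] = kind
    return network

# Hypothetical four-atom fragment.
net = build_network(
    ["N", "CA", "C", "O"],
    [("N", "CA", "single"), ("CA", "C", "single"),
     ("C", "O", "double"), ("N", "O", "van_der_waals")],
)
print(len(net.nodes), len(net.edges))
```

Storing each line under an unordered pair means that the same bond listed in either direction is recorded only once.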

[0104] S31: Set the motion form of the corresponding connection according to the type of covalent and non-covalent bond.

[0105] For example: if the covalent bond is a peptide bond or a covalent double bond, rotation of the corresponding connecting line is restricted. If the covalent bond is a covalent single bond, the corresponding connecting line is allowed to rotate around the bond axis and cannot be broken. If the non-covalent bond is a hydrogen bond, a salt bridge, or a hydrophobic interaction, the corresponding connecting line is allowed to be broken.
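These rules amount to a small lookup table from bond type to motion pattern. The sketch below is one way to encode them; the names are hypothetical. Two entries go beyond what the text states and are assumptions: peptide and double bonds are marked unbreakable because they are covalent, and rotation is left unspecified (`None`) for non-covalent bonds. Hydrophobic interactions are marked breakable in line with the breakage simulation described for hydrophobic connections in paragraph [0112].

```python
from typing import NamedTuple, Optional

class Motion(NamedTuple):
    rotatable: Optional[bool]  # None where the text does not specify rotation
    breakable: bool

MOTION_RULES = {
    "peptide":       Motion(rotatable=False, breakable=False),  # breakable: assumption
    "double":        Motion(rotatable=False, breakable=False),  # breakable: assumption
    "single":        Motion(rotatable=True,  breakable=False),
    "hydrogen_bond": Motion(rotatable=None,  breakable=True),
    "salt_bridge":   Motion(rotatable=None,  breakable=True),
    "hydrophobic":   Motion(rotatable=None,  breakable=True),
}

def breakable_kinds(rules):
    """Return the bond types whose connecting lines may break on heating."""
    return sorted(kind for kind, motion in rules.items() if motion.breakable)

print(breakable_kinds(MOTION_RULES))
```

Keeping the rules in one table makes it straightforward to apply them to every connecting line in the network before the heating simulation starts.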

[0106] S40: Simulate the breaking of connecting lines in the atomic node and network constraint model of the energy-minimized crystal structure during heating, until the model is in motion equilibrium.

[0107] The embodiments of this application simulate the thermal unfolding process and collect a set number, for example 5000, of conformations of the atomic node and network constraint model in motion equilibrium for statistical analysis.
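To make the idea of sampling many conformations and collecting breakage statistics concrete, here is a deliberately simplified toy model in Python. It is not the energy formula of this application: the rule that a connecting line survives when its strength plus Gaussian noise exceeds a temperature-like threshold, the dimensionless strengths, and all names are assumptions made purely for illustration.

```python
import random

def sample_intact_edges(strengths, temperature, sigma, rng):
    """One sampled conformation: the set of connecting lines that survive."""
    intact = set()
    for edge, strength in strengths.items():
        noise = rng.gauss(0.0, sigma)  # Gaussian white noise with mean 0
        if strength + noise > temperature:  # toy survival rule (assumption)
            intact.add(edge)
    return intact

def survival_frequency(strengths, temperature, sigma=0.5, n_samples=5000, seed=0):
    """Fraction of sampled conformations in which each line is still intact."""
    rng = random.Random(seed)
    counts = {edge: 0 for edge in strengths}
    for _ in range(n_samples):
        for edge in sample_intact_edges(strengths, temperature, sigma, rng):
            counts[edge] += 1
    return {edge: count / n_samples for edge, count in counts.items()}

# Hypothetical strengths for two breakable connecting lines.
freq = survival_frequency({("a", "b"): 5.0, ("b", "c"): 1.0}, temperature=3.0)
print(freq)
```

In this toy setting the strong line survives in almost every sample and the weak one in almost none, which is the kind of contrast the statistical analysis is meant to expose.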

[0108] The thermal unfolding process is illustrated below using several types of non-covalent bonds:

[0109] The energy simulation for breaking hydrogen bonds and salt bridges is shown in the following formula:

[0110]

[0111] where the noise term is Gaussian white noise with a mean of 0 and a standard deviation that depends on the Gaussian distribution of hydrogen bonds.

[0112] The energy simulation for the breakage of a hydrophobic connection is given by the following formula:

[0113]

[0114] where one quantity is the distance between nodes and, for the van der Waals interaction, another is the full width at half maximum (FWHM) of the Gaussian function.

[0115] S50: Cluster the atomic nodes whose connections are not broken and whose original networks are still maintained during the analysis into one class and then divide them into different structural clusters.

[0116] Statistical analysis is conducted on the connection breakage status across the set number of conformations of the atomic node and network constraint model in motion equilibrium during the unfolding process. Atomic nodes whose connections did not break during the pyrolysis folding process and which still maintain the original network are clustered into one class and further divided into different structural clusters.

[0117] After pyrolysis folding simulation, the atomic nodes that remain connected and maintain the original network have strong interactions, which support the stability of protein molecules. These interacting nodes are clustered together to form different structural clusters.
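The clustering in S50 can be illustrated with a short Python sketch. It makes two simplifying assumptions that are not stated in the application: a connection counts as "not broken" only if it never broke in any sampled conformation, and plain graph connectivity (a union-find structure) stands in for whatever rigidity analysis the application actually performs. The function name `structural_clusters` and its arguments are hypothetical.

```python
def structural_clusters(nodes, connections, broken_per_conformation):
    """Group nodes joined by connections that never broke in any conformation.

    nodes:                   list of node ids.
    connections:             list of (node_a, node_b) pairs in the original network.
    broken_per_conformation: list of sets, each holding the connections broken
                             in one sampled conformation.
    """
    broken_ever = set().union(*broken_per_conformation)
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in connections:
        if (a, b) in broken_ever:
            continue  # a connection that broke does not hold its ends together
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), []).append(n)
    # Largest clusters first; ties ordered by first member.
    return sorted(clusters.values(), key=lambda c: (-len(c), c[0]))
```

For a six-node chain in which only the connection between nodes 3 and 4 ever breaks, the function returns two clusters, `[1, 2, 3]` and `[4, 5, 6]`.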

[0118] S51: Different structural clusters are labeled with different tags to obtain multiple different structural clusters with different tags.

[0119] The embodiments of this application do not limit the form of the markings; for example, they can be colors or numbers, as long as they can distinguish the structural clusters. By distinguishing different structural clusters through markings, the target structural cluster that meets the requirements can be found more clearly and intuitively.

[0120] S60: Based on multiple different structural clusters with different labels, find the structural clusters with relatively weak interactions, and find the secondary structure information of the amino acid residues in the structural clusters with relatively weak interactions.

[0121] For example, if a protein molecule contains relatively independent structural clusters, the interactions between these clusters are weak, and the connections between them break during the pyrolysis folding simulation. Increasing the interactions between these independent structural clusters can therefore be considered as a way to improve the thermal stability of the protein molecule.
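One way to make "relatively weak interactions" in S60 concrete is to measure how often the connections between two labeled clusters survive across the sampled conformations. The following Python sketch assumes this survival fraction as the weakness measure and a threshold of 0.5; both are illustrative choices, not values from the application. The function names `inter_cluster_survival` and `weak_pairs` are hypothetical, and the labels in the usage note are borrowed from the FGF10 example.

```python
from collections import defaultdict


def inter_cluster_survival(clusters, connections, broken_per_conformation):
    """Fraction of (connection, conformation) pairs that survive, per cluster pair.

    clusters: dict mapping a cluster label to the list of its node ids.
    """
    label_of = {node: label for label, members in clusters.items() for node in members}
    total = defaultdict(int)
    survived = defaultdict(int)
    for a, b in connections:
        label_a, label_b = label_of[a], label_of[b]
        if label_a == label_b:
            continue  # only connections between different clusters are scored
        pair = tuple(sorted((label_a, label_b)))
        for broken in broken_per_conformation:
            total[pair] += 1
            if (a, b) not in broken:
                survived[pair] += 1
    return {pair: survived[pair] / total[pair] for pair in total}


def weak_pairs(survival, threshold=0.5):
    """Cluster pairs whose connecting interactions survive less often than threshold."""
    return sorted(pair for pair, fraction in survival.items() if fraction < threshold)
```

As a usage example, with clusters `{"core": [1, 2, 3], "N-term": [4, 5, 6]}` and a single connection `(3, 4)` that is broken in three of four conformations, the survival fraction for the pair is 0.25, so the pair is reported as weakly interacting.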

[0122] The following embodiments of this application use the human FGF10 protein molecule as an example to illustrate the method for analyzing the rigidity weaknesses of protein structure based on the atomic node and network constraint model, including the following steps:

[0123] 1. First, the second crystal structure 1NUN, a complex of the FGF10 protein molecule with its receptor FGFR2b determined by crystallization and X-ray diffraction, was obtained from the PDB database. The second crystal structure 1NUN was then repaired. Finally, the first crystal structure of the FGF10 protein molecule was isolated from the repaired second crystal structure 1NUN. The repair addressed the highly symmetric electron density of the amide groups in the side chains of glutamine and asparagine residues and of the imidazole groups in the side chains of histidine residues. These side chains may need to be flipped 180°, and their orientations were determined by calculating their interactions with surrounding atoms. Other amino acid residues were energy-optimized through subtle rotations of their side chain groups or peptide planes, yielding the final repaired second crystal structure 1NUN.

[0124] 2. Perform free energy minimization simulations on the first crystal structure to obtain the energy-minimized crystal structure. For example, molecular dynamics simulations can be used. Input the first crystal structure of the FGF10 protein molecule, add a TIP3P water molecule model and an Amber99sb-ildn force field to construct the topological structure of the FGF10 protein molecule. Then add a cubic simulation box with a boundary distance of 2 nm. First, use the steepest descent method to perform energy minimization preprocessing on the topological structure of the FGF10 protein molecule under vacuum conditions. Then, perform formal energy minimization simulations on the topological structure of the FGF10 protein molecule under vacuum conditions to obtain the simulation results. Finally, convert the topological structure of the FGF10 protein molecule corresponding to the simulation results into the energy-minimized crystal structure of the target protein molecule and output the energy-minimized crystal structure.

[0125] 3. Construct an atomic node and network constraint model for the energy-minimized crystal structure of the FGF10 protein molecule. For example, consider all atoms in the energy-minimized crystal structure of the FGF10 protein molecule as atomic nodes, and all covalent and non-covalent bonds between atoms as connections, excluding van der Waals interactions between atoms. Then, define the motion patterns of the connections according to the type of covalent and non-covalent bonds. Specifically, peptide bonds and covalent double bonds in the energy-minimized crystal structure of the FGF10 protein molecule are counted as 6 connections, with strict restrictions on their rotation. Covalent single bonds in the energy-minimized crystal structure of the FGF10 protein molecule are counted as 5 connections, which can rotate around the bond axis and cannot be broken. Hydrogen bonds and salt bonds in the energy-minimized crystal structure of the FGF10 protein molecule are counted as 5 connections, which can be broken. Hydrophobic interactions in the energy-minimized crystal structure of the FGF10 protein molecule are counted as 2 connections, which can be broken.

[0126] 4. Perform pyrolysis folding molecular dynamics simulations on the atomic node and network constraint model of the energy-minimized crystal structure of the FGF10 protein molecule until the energy-minimized crystal structure of the FGF10 protein molecule reaches a state of kinematic equilibrium. For example, simulate the pyrolysis folding process with a sample size of 5000, collecting 5000 conformations of the atomic node and network constraint model in a state of kinematic equilibrium for statistical analysis. Simulate the process of connection breakage of the atomic node and network constraint model of the energy-minimized crystal structure of the FGF10 protein molecule during heating until the atomic node and network constraint model of the energy-minimized crystal structure of the FGF10 protein molecule reaches a state of kinematic equilibrium.

[0127] 5. Analyze the atomic node and network constraint model of the energy-minimized crystal structure of the FGF10 protein molecule in a state of kinematic equilibrium, and obtain the analysis results. For example, statistically analyze the connection breakage status across the 5000 conformations of the atomic node and network constraint model in a state of kinematic equilibrium during the unfolding process. Atomic nodes whose connections do not break during pyrolysis folding and which still maintain the original network are clustered into one class and divided into different structural clusters; different structural clusters are labeled with different colors, resulting in multiple structural clusters labeled with different colors.

[0128] 6. Based on multiple structural clusters with different labels, identify the structural clusters with relatively weak interactions in the FGF10 protein molecule, and find the secondary structure information of the amino acid residues in the structural clusters with relatively weak interactions.

[0129] 7. Using the above method, the FGF10 crystal structure was divided into six relatively independent parts: the core region, A region, B region, C region, D region, and N-terminus. These six parts form six relatively independent structural clusters. Each structural cluster has close internal interactions, while the interactions between the six structural clusters are extremely weak. Further investigation identified the secondary structure information of amino acid residues in the weakly interacting structural clusters: the N-terminus, β2-β3 loop, β3-β4 loop, heparin-binding region, and C-terminus of FGF10 are not closely associated with other regions. These structural clusters and secondary structures represent the weakly rigid regions of FGF10, as shown in Figure 4. By targeting the weakest regions in the rigid structure of a protein molecule, more suitable modification sites or regions can be found, enabling targeted mutations to improve the stability of the protein molecule.

[0130] Corresponding to the aforementioned method embodiments, this application also provides a device and an electronic device for analyzing the rigidity weaknesses of protein structure based on an atomic node and network constraint model, together with corresponding embodiments.

[0131] Figure 5 is a schematic diagram of the device for analyzing the rigidity weaknesses of protein structures based on an atomic node and network constraint model, as shown in an embodiment of this application.

[0132] Referring to Figure 5, a device for analyzing the rigidity weaknesses of protein structure based on an atomic node and network constraint model includes: an acquisition module 50, a first simulation module 51, a construction module 52, a second simulation module 53, an analysis module 54, and a determination module 55.

[0133] The acquisition module 50 is used to acquire the first crystal structure of the target protein molecule.

[0134] The first simulation module 51 is used to perform free energy minimization simulation on the first crystal structure to obtain the energy-minimized crystal structure of the target protein molecule.

[0135] The construction module 52 is used to construct the atomic node and network constraint model of the energy-minimized crystal structure.

[0136] The second simulation module 53 is used to perform pyrolysis folding molecular dynamics simulations on the atomic node and network constraint model of the energy-minimized crystal structure until the atomic node and network constraint model of the energy-minimized crystal structure is in a state of motion equilibrium.

[0137] The analysis module 54 is used to analyze the atomic node and network constraint model of the energy-minimized crystal structure in motion equilibrium and obtain the analysis results.

[0138] The determination module 55 is used to identify rigidity weak regions in the target protein molecule based on the analysis results.
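The six modules described above form a linear pipeline in which each module consumes the output of the previous one. The following Python sketch shows that wiring only. The class name `RigidityWeaknessAnalyzer`, its field names, and the use of plain callables are assumptions made for illustration, and the internals of each module are deliberately left to the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class RigidityWeaknessAnalyzer:
    acquire: Callable[[Any], Any]      # acquisition module 50
    minimize: Callable[[Any], Any]     # first simulation module 51
    build_model: Callable[[Any], Any]  # construction module 52
    unfold: Callable[[Any], Any]       # second simulation module 53
    analyze: Callable[[Any], Any]      # analysis module 54
    determine: Callable[[Any], Any]    # determination module 55

    def run(self, target: Any) -> Any:
        """Pass the target through modules 50 to 55 in order."""
        structure = self.acquire(target)          # first crystal structure
        minimized = self.minimize(structure)      # energy-minimized crystal structure
        model = self.build_model(minimized)       # atomic node and network constraint model
        equilibrium = self.unfold(model)          # model in motion equilibrium
        results = self.analyze(equilibrium)       # labeled structural clusters
        return self.determine(results)            # rigidity weak regions
```

Passing the modules in as callables keeps the sketch independent of any particular simulation package, so each stage can be replaced or tested on its own.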

[0139] The acquisition module 50 includes an acquisition submodule, a repair module, and a separation module.

[0140] The acquisition submodule is used to acquire the second crystal structure containing the target protein molecule; the repair module is used to repair the second crystal structure to obtain the repaired second crystal structure; the separation module is used to separate the first crystal structure of the target protein molecule from the repaired second crystal structure.

[0141] The first simulation module 51 includes a first addition module, a second addition module, a preprocessing module, a formal simulation module, and a conversion output module.

[0142] The first addition module adds a water molecule model and force field to the first crystal structure to construct the topological structure of the target protein molecule; the second addition module adds a simulation box to the topological structure; the preprocessing module uses a preset method to perform energy minimization simulation on the topological structure with added simulation boxes under vacuum conditions; the formal simulation module performs energy minimization simulation on the topological structure with added simulation boxes under vacuum conditions to obtain the simulation results; the conversion output module converts the topological structure corresponding to the simulation results into the energy-minimized crystal structure of the target protein molecule and outputs the energy-minimized crystal structure.

[0143] The construction module 52 includes a modeling module and a setting module.

[0144] The modeling module treats all atoms in the energy-minimized crystal structure as atomic nodes and all covalent and non-covalent bonds between atoms as connections. The setting module sets the motion form of the corresponding connections according to the type of covalent and non-covalent bonds. For example, if the covalent bond is a peptide bond or a covalent double bond, the rotation of the corresponding connection is restricted. If the covalent bond is a covalent single bond, the corresponding connection is set to be able to rotate around the bond axis and cannot be broken. If the non-covalent bond is a hydrogen bond, a salt bond, or a hydrophobic interaction, the corresponding connection is set to be able to break.

[0145] The analysis module 54 includes a clustering module and a labeling module.

[0146] The clustering module is used to group atomic nodes whose connections remain intact and which maintain the original network during analysis into one class, and then divide them into different structural clusters. The labeling module is used to assign different labels to different structural clusters, resulting in multiple structural clusters with different labels. This application does not limit the form of the labels; for example, they can be colors or numbers, as long as they can distinguish the structural clusters. Distinguishing different structural clusters through labels allows for a clearer and more intuitive identification of the target structural cluster that meets the requirements.

[0147] The determination module 55 is used to identify structural clusters with relatively weak interactions based on the multiple structural clusters with different labels, and to find the secondary structure information of amino acid residues in the structural clusters with relatively weak interactions. For example, if there are relatively independent structural clusters in a protein molecule, the interactions between these structural clusters are weak, and the connections between them break during the pyrolysis folding simulation. Increasing the interactions between these independent structural clusters can therefore be considered as a way to improve the thermal stability of the protein molecule.

[0148] The embodiments of this application obtain the first crystal structure of the target protein molecule and perform a free energy minimization simulation on it to obtain an energy-minimized crystal structure. An atomic node and network constraint model of the energy-minimized crystal structure is then constructed, transforming the complex protein structure into a simple structure of nodes, connections, and networks. Pyrolysis folding molecular dynamics simulations are performed on this model until it reaches a state of motion equilibrium, and the model in motion equilibrium is analyzed to obtain the analysis results. Finally, the rigidity weak regions in the target protein molecule are identified based on the analysis results. This allows the rigidity weak regions in the structure of the target protein molecule to be calculated and more suitable modification sites or regions to be identified, so that targeted mutations can be made to improve the stability of the protein molecule.

[0149] Regarding the apparatus in the above embodiments, the specific manner in which each module performs its operation has been described in detail in the embodiments related to the method, and will not be elaborated further here.

[0150] Figure 6 is a schematic diagram of the structure of an electronic device shown in an embodiment of this application.

[0151] Referring to Figure 6, the electronic device 600 includes a memory 610 and a processor 620.

[0152] The processor 620 can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor can be a microprocessor or any conventional processor.

[0153] Memory 610 may include various types of storage units, such as system memory, read-only memory (ROM), and permanent storage devices. ROM may store static data or instructions required by the processor 620 or other modules of the computer. A permanent storage device may be a read-write storage device, and may be a non-volatile storage device that retains stored instructions and data even when the computer is powered off. In some embodiments, a mass storage device (e.g., magnetic or optical disk, flash memory) is used as the permanent storage device. In other embodiments, the permanent storage device may be a removable storage device (e.g., floppy disk, optical drive). System memory may be a read-write storage device or a volatile read-write storage device, such as dynamic random access memory. System memory may store some or all of the instructions and data required by the processor during operation. Furthermore, memory 610 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (e.g., DRAM, SRAM, SDRAM, flash memory, programmable read-only memory); magnetic disks and/or optical disks may also be used. In some embodiments, memory 610 may include a removable storage device that is readable and/or writable, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-high density optical disc, a flash memory card (e.g., SD card, mini SD card, Micro-SD card), or a magnetic floppy disk. Computer-readable storage media do not include carrier waves or transient electronic signals transmitted wirelessly or via wired connections.

[0154] The processor 620 may include an acquisition module 50, a first simulation module 51, a construction module 52, a second simulation module 53, an analysis module 54, and a determination module 55. For their specific functions and connections, please refer to the description of Figure 5, which will not be repeated here.

[0155] The electronic device 600 may also include a display for showing the execution results of the processor 620.

[0156] The memory 610 stores executable code, which, when executed by the processor 620, can cause the processor 620 to execute part or all of the methods described above.

[0157] Furthermore, the method according to this application can also be implemented as a computer program or computer program product, which includes computer program code instructions for performing some or all of the steps in the method described above.

[0158] Alternatively, this application may be implemented as a computer-readable storage medium (or a non-transitory machine-readable storage medium or a machine-readable storage medium) storing executable code (or computer program or computer instruction code) thereon, which, when executed by a processor of an electronic device (or server, etc.), causes the processor to perform part or all of the steps of the methods described above according to this application.

[0159] The various embodiments of this application have been described above. These descriptions are exemplary and not exhaustive, nor are they limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles, practical application, or improvement of the technology in the market, or to enable others skilled in the art to understand the embodiments disclosed herein.

Claims

1. A method for analyzing protein structure rigidity weaknesses based on an atomic node and network constraint model, characterized in that it comprises: obtaining the first crystal structure of the target protein molecule; performing a free energy minimization simulation on the first crystal structure to obtain the energy-minimized crystal structure of the target protein molecule; constructing an atomic node and network constraint model of the energy-minimized crystal structure, which includes: taking all atoms in the energy-minimized crystal structure as atomic nodes, and taking the covalent bonds and non-covalent bonds between all atoms as connecting lines; and setting the motion form of the corresponding connecting lines according to the type of the covalent bonds and the non-covalent bonds; performing pyrolysis folding molecular dynamics simulation on the atomic node and network constraint model of the energy-minimized crystal structure until the atomic node and network constraint model of the energy-minimized crystal structure is in motion equilibrium; analyzing the atomic node and network constraint model of the energy-minimized crystal structure in motion equilibrium and obtaining the analysis results, which includes: grouping the atomic nodes whose connections are not broken and whose original networks are still maintained into one class and dividing them into different structural clusters; and marking the different structural clusters with different labels to obtain multiple different structural clusters with different labels; and identifying the rigidity weak regions in the target protein molecule based on the analysis results, which includes: identifying structural clusters with relatively weak interactions based on the multiple different structural clusters with different labels.

2. The method of claim 1, wherein obtaining the first crystal structure of the target protein molecule includes: obtaining a second crystal structure containing the target protein molecule; repairing the second crystal structure to obtain the repaired second crystal structure; and isolating the first crystal structure of the target protein molecule from the repaired second crystal structure.

3. The method of claim 1, wherein performing a free energy minimization simulation on the first crystal structure to obtain the energy-minimized crystal structure of the target protein molecule includes: adding a water molecule model and a force field to the first crystal structure to construct the topological structure of the target protein molecule; adding a simulation box to the topological structure; using a preset method to perform energy minimization preprocessing on the topological structure with the added simulation box under vacuum conditions; performing a formal energy minimization simulation on the topological structure with the added simulation box under vacuum conditions to obtain the simulation results; and converting the topological structure corresponding to the simulation results into the energy-minimized crystal structure of the target protein molecule and outputting the energy-minimized crystal structure.

4. The method of claim 1, wherein setting the motion form of the corresponding connecting lines according to the type of the covalent bonds and the non-covalent bonds includes: if the covalent bond is a peptide bond or a covalent double bond, restricting the rotation of the connecting line corresponding to the peptide bond or the covalent double bond; if the covalent bond is a covalent single bond, setting the connecting line corresponding to the covalent single bond to be rotatable around the bond axis and unbreakable; and if the non-covalent bond is one of a hydrogen bond, a salt bond, and a hydrophobic interaction, setting the connecting line corresponding to the hydrogen bond or salt bond to be breakable.

5. The method of claim 1, wherein identifying the rigidity weak regions in the target protein molecule based on the analysis results further includes: finding the secondary structure information of amino acid residues in the structural clusters with relatively weak interactions.

6. A device for analyzing the rigidity weaknesses of protein structures based on an atomic node and network constraint model, characterized in that it comprises: an acquisition module, used to acquire the first crystal structure of the target protein molecule; a first simulation module, used to perform a free energy minimization simulation on the first crystal structure to obtain the energy-minimized crystal structure of the target protein molecule; a construction module, used to construct the atomic node and network constraint model of the energy-minimized crystal structure, wherein the construction module includes a modeling module and a setting module, the modeling module is used to treat all atoms in the energy-minimized crystal structure as atomic nodes and all covalent and non-covalent bonds between atoms as connections, and the setting module is used to set the motion form of the corresponding connections according to the type of covalent and non-covalent bonds; a second simulation module, used to perform pyrolysis folding molecular dynamics simulation on the atomic node and network constraint model of the energy-minimized crystal structure until the atomic node and network constraint model of the energy-minimized crystal structure is in a state of motion equilibrium; an analysis module, used to analyze the atomic node and network constraint model of the energy-minimized crystal structure in motion equilibrium and obtain the analysis results, wherein the analysis module includes a clustering module and a labeling module, the clustering module is used to cluster atomic nodes whose connections are not broken and which still maintain the original network in the analysis into one class and divide them into different structural clusters, and the labeling module is used to label different structural clusters with different labels to obtain multiple different structural clusters with different labels; and a determination module, used to identify rigidity weak regions in the target protein molecule based on the analysis results, wherein the determination module is used to find structural clusters with relatively weak interactions based on the multiple different structural clusters with different labels.

7. An electronic device, comprising: a processor; and a memory having executable code stored thereon, which, when executed by the processor, causes the processor to perform the method as described in any one of claims 1-5.

8. A computer-readable storage medium having executable code stored thereon, which, when executed by a processor of an electronic device, causes the processor to perform the method as described in any one of claims 1-5.