Transition state searching method, system and application for complex biological macromolecular conformational change
By combining reversible dimensionality reduction with low-dimensional search methods, the application addresses the curse of dimensionality in transition state searches for complex biomolecules, enabling efficient and accurate identification and generation of transition state structures for the study of biomolecular conformational changes.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- THE CHINESE UNIV OF HONG KONG (SHENZHEN)
- Filing Date
- 2025-03-14
- Publication Date
- 2026-06-19
AI Technical Summary
Existing transition state search methods cannot be effectively applied to complex biological macromolecular systems, especially due to the curse of dimensionality and information loss during dimensionality reduction, which makes it impossible to accurately find the transition state.
A reversible dimensionality reduction method is used to project molecular dynamics data from high-dimensional space to low-dimensional space. Combined with low-dimensional search methods such as GAD, candidate transition state samples are searched in low-dimensional space and then back-projected into high-dimensional space for verification, ensuring the accuracy of the transition state structure.
It achieves efficient and accurate transition state structure search, significantly reduces computational costs, and can generate reasonable transition state conformations, avoiding false positive problems, and is applicable to complex biomolecular systems.
Figure CN120356509B_ABST
Description
Technical Field
[0001] This application relates to the field of computational simulation of biomolecular systems, and more specifically, to a method, system, and application for searching transition states of conformational changes in complex biomacromolecules.
Background Technology
[0002] When biomolecules perform their functions, they are often accompanied by significant structural transformations, known as functional conformational changes. Transition states are crucial for physical chemists to understand and regulate the microscopic mechanisms underlying the functions of biomolecules. Because their existence is extremely short-lived and difficult to capture experimentally, a comprehensive characterization of their structure must be achieved through simulation-driven computational searches based on physical laws. However, unlike chemical reactions which involve only a small number of atoms, the functional conformational changes of biomolecules involve a vast number of atoms and coordinates. Searching for their transition states inevitably encounters the curse of dimensionality, also known as the reaction coordinate problem. Since transition states typically reside in low-probability regions in high-dimensional space, direct search for transition states incurs extremely high computational costs.
[0003] The conformational changes of complex macromolecules involve a vast number of atoms, which in extreme cases can include all the atoms of the solute, or even atoms of lipid and solvent molecules in the environment. The large number of atoms (and their three-dimensional coordinates) poses a significant challenge to the automated analysis of molecular dynamics and the accurate identification of transition states.
[0004] Whether a molecular state (conformation or structure) is a transition state is determined primarily through Committor Analysis (CA). This involves running multiple (tens to thousands of) independent molecular dynamics simulations, each with different initial velocities, starting from the given structure. If approximately 50% of these simulations first fall into stable state A while the other 50% first fall into stable state B, the structure can be identified as a transition state (or at least close to one). However, because the number of possible structures in high-dimensional space is enormous, CA cannot be performed on every one of them. CA therefore cannot serve as a transition state search method; it can only serve as a post-hoc test once candidate transition state samples have been found.
[0005] Existing transition state search methods, such as Gentlest Ascent Dynamics (GAD), have significant drawbacks: (1) they can only be applied to potential energy functions that are described analytically; and (2) they can only be run in low-dimensional spaces. Because complex biomolecules have an enormous number of atoms and therefore high dimensionality, the reaction coordinate problem arises directly: (1) GAD cannot be run directly in the high-dimensional space, so dimensionality reduction is required first; (2) the free energy surface obtained after dimensionality reduction cannot be expressed analytically; and (3) if the dimensionality reduction is not handled carefully, the transition state region is distorted and the relevant information is lost.
[0006] GAD requires frequent calculations of the Hessian matrix, thus its fast search is typically limited to low-dimensional spaces. This characteristic makes GAD difficult to apply directly to biomacromolecule systems, requiring other methods to reduce dimensionality first. However, existing dimensionality reduction algorithms have significant drawbacks. For example, while the commonly used time-lagged Independent Component Analysis (tICA) method utilizes the dynamic information inherent in the data for dimensionality reduction, its low-dimensional coordinates are merely linear combinations of the original high-dimensional space, whereas practical needs may require nonlinear combinations. Other nonlinear dimensionality reduction algorithms only rely on steady-state and density information, completely neglecting dynamic information (the conditional probability of one conformation transitioning to another within a certain relaxation time) during the learning process. Therefore, they cannot guarantee the preservation of dynamic information during dimensionality reduction and cannot be combined with GAD to find transition states.
[0007] Therefore, a scheme is needed to search for transition states of conformational changes in complex biomacromolecules.
Summary of the Invention
[0008] To address the problems existing in current technologies, this application provides a method, system, and application for searching transition states in the conformational changes of complex biomolecules. The specific solution is as follows:
[0009] In a first aspect, this application proposes a transition state search method for conformational changes in complex biological macromolecules, including the following:
[0010] Obtain the input dataset involving molecular dynamics simulation trajectory data;
[0011] The input dataset is reduced in dimensionality by a pre-defined reversible dimensionality reduction method. While preserving the transition state information of the input dataset, the trajectory data is projected from the high-dimensional space to the low-dimensional space to generate a low-dimensional free energy surface.
[0012] A pre-defined low-dimensional search method is used to search for low-dimensional candidate transition state samples on the low-dimensional free energy surface;
[0013] The reversible dimensionality reduction method is used to inversely project the low-dimensional candidate transition state samples back into the high-dimensional space to obtain the complete candidate transition state structure under the high-dimensional representation.
[0014] The candidate transition state structures are verified, and the true transition state structures are selected based on the verification results.
[0015] In some specific embodiments, the dimensionality reduction process specifically includes:
[0016] Based on a preset target molecular system, the relaxation time and the reduced dimension are set so as to map the trajectory data of the input dataset to the latent space of that dimension through a preset first dimensionality reduction method;
[0017] Based on the current molecular system, the relaxation time and the number of dimensions of the low-dimensional space are set so as to map the latent-space data to the low-dimensional space through a preset second dimensionality reduction method.
[0018] In some specific embodiments, the search process for the low-dimensional candidate transition state samples includes:
[0019] Determine the starting position and initial search direction, and optimize each starting position based on the initial search direction;
[0020] The search is performed at all starting positions according to the set search step size and number of iterations to obtain the search trajectory at each step;
[0021] By evaluating the convergence of each search trajectory, corresponding data points are extracted as low-dimensional candidate transition state samples.
[0022] In some specific embodiments, the first dimensionality reduction method includes time-lagged independent component analysis (TICA);
[0023] And/or, the second dimensionality reduction method includes reaction coordinate flow (RCF), a diffusion model, or a flow model based on a variational autoencoder.
[0024] In some specific embodiments, the low-dimensional search method includes GAD, a variant of GAD based on Hessian matrix optimization, and a combination of GAD and gradient descent.
[0025] In some specific embodiments, the committor probability of each candidate transition state structure is calculated. If the committor probability of a candidate transition state structure falls within a preset interval, the candidate transition state structure is identified as a true transition state structure.
[0026] In some specific embodiments, the process of obtaining the true transition state structure includes:
[0027] Molecular dynamics simulations are performed on each candidate transition state structure to obtain multiple molecular dynamics trajectories;
[0028] Determine the metastable state that each molecular dynamics trajectory first approaches, and calculate the committor probability based on the number of trajectories that first approach each metastable state.
[0029] If the committor probability of a candidate transition state structure is within the preset interval, the candidate transition state structure is considered to be a true transition state structure.
[0030] In a second aspect, this application proposes a transition state search system for conformational changes in complex biomolecules, including the following:
[0031] The input unit is used to acquire the input dataset involving molecular dynamics simulation trajectory data;
[0032] The dimension reduction unit is used to reduce the dimension of the input dataset using a preset reversible dimension reduction method. While preserving the transition state information of the input dataset, the trajectory data is projected from the high-dimensional space to the low-dimensional space to generate a low-dimensional free energy surface.
[0033] The search unit is used to search for low-dimensional candidate transition state samples on the low-dimensional free energy surface using a preset low-dimensional search method.
[0034] The candidate unit is used to inversely project the low-dimensional candidate transition state samples back into the high-dimensional space using the reversible dimensionality reduction method to obtain the complete candidate transition state structure under the high-dimensional representation.
[0035] The verification unit is used to verify the candidate transition state structure and select the true transition state structure based on the verification results.
[0036] In a third aspect, this application discloses a computer device comprising:
[0037] One or more processors;
[0038] Memory, used to store one or more programs;
[0039] When the one or more programs are executed by the one or more processors, the one or more processors implement the transition state search method for conformational changes of complex biomacromolecules as described in any embodiment of the first aspect.
[0040] In a fourth aspect, this application proposes a computer program product including executable instructions that, when executed by a processor, implement the transition state search method for conformational changes of complex biomacromolecules as described in any embodiment of the first aspect.
[0041] Beneficial Effects: This application proposes a method, system, and application for searching transition states in the conformational changes of complex biomolecules. Based on generative artificial intelligence, it achieves high computational efficiency and high accuracy in searching high-dimensional transition state structures through reversible dimensionality reduction and low-dimensional search. Furthermore, it exhibits strong scalability and high physical reliability, making it suitable for complex biomolecular systems. By projecting high-dimensional molecular conformational data into a low-dimensional space and parallelizing the transition state search in that space, the search complexity is significantly lower than in the high-dimensional space, thus significantly reducing computational costs. The search process not only detects transition states but also generates reasonable transition state conformations, accurately representing the key structures of the corresponding conformational change process, avoiding false positives caused by abnormal data distribution, and effectively reducing the false positive rate of transition state identification.
[0042] To make the above-mentioned objectives, features and advantages of this application more apparent and understandable, preferred embodiments are described below in detail with reference to the accompanying drawings.
Description of the Drawings
[0043] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.
[0044] Figure 1 is a schematic diagram of the transition state search method of this application;
[0045] Figure 2 is a schematic diagram of the RCF-GAD method of this application;
[0046] Figure 3 is a schematic diagram of the RCF principle of this application;
[0047] Figure 4 is a metastable state distribution diagram of the T4 lysozyme L99A variant during the transition from the ground state to the excited state in the verification experiment of this application;
[0048] Figure 5 is a three-dimensional visualization of the transition state distribution in the 4-reaction-coordinate space in the verification experiment of this application;
[0049] Figure 6 is a schematic diagram of key conformational changes related to the transition states identified in the verification experiment of this application;
[0050] Figure 7 is a schematic diagram of the modules of the transition state search system of this application.
[0051] Figure labels: 1-Input unit; 2-Dimensionality reduction unit; 3-Search unit; 4-Candidate unit; 5-Verification unit.
Detailed Implementation
[0052] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of the embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.
[0053] This application proposes a transition state search method for conformational changes in complex biomacromolecules that combines reversible dimensionality reduction with a low-dimensional search method. The method first reduces the dimensionality of the molecular dynamics dataset of the complex biomacromolecule using a reversible dimensionality reduction method that preserves the dynamic information of the high-dimensional space in the low-dimensional space. It then searches for candidate transition state sample points on the low-dimensional free energy surface. Finally, the low-dimensional candidate samples are inversely projected back into the original high-dimensional space, where the candidate transition states are verified to ensure the accuracy of the resulting high-dimensional transition state structures. A flowchart of the method is shown in Figure 1 and its principle is shown in Figure 2. The specific solution is as follows:
[0054] A method for searching transition states in the conformational changes of complex biological macromolecules, comprising the following:
[0055] 101. Obtain the input dataset involving molecular dynamics simulation trajectory data;
[0056] 102. The input dataset is reduced in dimensionality by a pre-defined reversible dimensionality reduction method. While preserving the transition state information of the input dataset, the trajectory data is projected from the high-dimensional space to the low-dimensional space to generate a low-dimensional free energy surface.
[0057] 103. A pre-defined low-dimensional search method is used to search for low-dimensional candidate transition state samples on the low-dimensional free energy surface;
[0058] 104. By using the reversible dimensionality reduction method, the low-dimensional candidate transition state samples are inversely projected back into the high-dimensional space to obtain the complete candidate transition state structure under the high-dimensional representation.
[0059] 105. Verify the candidate transition state structures and select the true transition state structures based on the verification results.
[0060] This application combines reversible dimensionality reduction methods with low-dimensional search methods such as GAD for transition state search. The dimensionality reduction process preserves both density information and transition pair information, avoiding the loss of transition states caused by traditional dimensionality reduction. The GAD search is performed in the low-dimensional space, which makes the search highly parallel and significantly reduces the time cost. The low-dimensional transition states are reversibly mapped back to the high-dimensional space, ensuring accurate reconstruction of the transition state structure. Taking RCF as the reversible dimensionality reduction method and GAD as the low-dimensional search method as an example, the principle is illustrated in Figure 2. Verification has shown that the combination of RCF and GAD reduces the transition state search time from several hours in high-dimensional systems to just a few minutes.
[0061] In this application, the reversible dimensionality reduction method includes transition state information in the loss function during training so as to preserve the transition state information of the original input dataset. This information is hidden in the transition (conditional) probabilities of the molecule moving from one structure to another within a relaxation time in the high-dimensional trajectory, and cannot be obtained directly. However, because the training of the reversible neural network incorporates the transition probabilities themselves into the loss function (that is, the training objective is to make the transition probabilities in the reduced low-dimensional space as consistent as possible with those in the original high-dimensional space), this hidden information is preserved during dimensionality reduction.
[0062] Step 101 involves preparing the input data. After acquiring sufficient molecular dynamics simulation trajectory data, the input data needs to be preprocessed. For example, the molecular conformations in the trajectory are aligned to eliminate the effects of structural variations. Subsequently, key input information, such as the coordinates of all heavy atoms, is extracted from the aligned trajectory data and organized into a dataset for subsequent training.
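The alignment and feature extraction described in step 101 can be sketched as follows. This is a minimal sketch that assumes a Kabsch rigid-body superposition onto a reference frame, a common choice that the application itself does not name. The function names `kabsch_align` and `build_dataset` and the array layout (frames × heavy atoms × 3) are hypothetical.

```python
import numpy as np

def kabsch_align(mobile, reference):
    """Rigidly superimpose `mobile` (N, 3) onto `reference` (N, 3)."""
    mob_center = mobile.mean(axis=0)
    ref_center = reference.mean(axis=0)
    P = mobile - mob_center        # centered coordinates to be rotated
    Q = reference - ref_center     # centered target coordinates
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return P @ R.T + ref_center

def build_dataset(frames, reference):
    """Align every frame, then flatten each one into a feature vector.

    frames: array of shape (T, N, 3); returns an array of shape (T, 3N).
    """
    aligned = np.stack([kabsch_align(f, reference) for f in frames])
    return aligned.reshape(len(frames), -1)
```

Flattening the aligned heavy-atom coordinates yields one row per trajectory frame, which is the form assumed by the dimensionality reduction sketches further on.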
[0063] Step 102 involves dimensionality reduction of the input dataset. A reversible dimensionality reduction method is used to project the trajectory data into a low-dimensional space, generating a low-dimensional free energy surface. This dimensionality reduction process can simultaneously preserve transition state information such as the system's density and transition pair information.
[0064] In some specific embodiments, the dimensionality reduction process specifically includes: setting the relaxation time and dimensionality reduction dimension based on the preset target molecular system, so as to map the trajectory data of the input dataset to the latent space corresponding to the dimensionality reduction dimension through a preset first dimensionality reduction method; setting the relaxation time and the number of dimensions of the low-dimensional space based on the current molecular system, so as to map the data of the latent space to the low-dimensional space through a preset second dimensionality reduction method.
[0065] In some specific embodiments, the first dimensionality reduction method includes time-lagged independent component analysis (TICA); and/or the second dimensionality reduction method includes reaction coordinate flow (RCF), a diffusion model, or a flow model based on a variational autoencoder. Reversible dimensionality reduction methods such as RCF and diffusion models are employed to ensure that important transition state information is not lost during dimensionality reduction. High-dimensional molecular conformation data is projected into a low-dimensional latent space, and the transition state search is parallelized in that space. Because the search complexity in the low-dimensional space is much lower than in the high-dimensional space, computational costs are significantly reduced. Compared to traditional dimensionality reduction methods, this application can accurately recover the key state conformations in the original high-dimensional space after dimensionality reduction, ensuring the integrity of the transition state information.
[0066] Specifically, based on the characteristics of the target molecular system, an appropriate lag time and dimensionality reduction dimension *d* are selected, and Time-lagged Independent Component Analysis (TICA) is applied to the input dataset for dimensionality reduction. During TICA dimensionality reduction, a transformation matrix is calculated, mapping the original high-dimensional data to a latent space. Typically, the dimension of the latent space is set to retain approximately 80% of the original spatial information. Furthermore, the calculated transformation matrix and the dimensionality-reduced latent space data need to be collected for subsequent analysis and modeling.
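The TICA step can be illustrated with a small NumPy sketch. It assumes the standard formulation (a symmetrized time-lagged covariance matrix and a generalized eigenproblem solved by whitening); the application does not state which estimator it uses, and the name `tica_fit`, the `eps` cutoff, and the return convention are hypothetical. Under this convention the projection is `s = (x - mean) @ M`, i.e. a linear map plus a constant offset.

```python
import numpy as np

def tica_fit(X, lag, n_components, eps=1e-10):
    """Estimate a TICA projection from a trajectory X of shape (T, D).

    Returns (mean, M, eigenvalues); latent data is (X - mean) @ M.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    X0, Xt = Xc[:-lag], Xc[lag:]          # frame pairs separated by the lag time
    n = X0.shape[0]
    # Instantaneous and symmetrized time-lagged covariance matrices.
    C0 = (X0.T @ X0 + Xt.T @ Xt) / (2.0 * n)
    Ct = (X0.T @ Xt + Xt.T @ X0) / (2.0 * n)
    # Whitening transform C0^(-1/2), dropping near-singular directions.
    w, V = np.linalg.eigh(C0)
    keep = w > eps * w.max()
    W = V[:, keep] / np.sqrt(w[keep])
    # Largest eigenvalues of the whitened lagged covariance = slowest modes.
    lam, U = np.linalg.eigh(W.T @ Ct @ W)
    order = np.argsort(lam)[::-1][:n_components]
    return mean, W @ U[:, order], lam[order]
```

The returned matrix plays the role of the transformation matrix that is stored together with the latent-space data for the subsequent RCF stage.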
[0067] Specifically, on the latent space data after initial dimensionality reduction, the Reaction Coordinate Flow (RCF) method is further used for dimensionality reduction. First, a relaxation time suitable for the current molecular system is selected (the value from the previous step can be used), and the number of dimensions of the final required low-dimensional space (Reaction Coordinates, RCs) is set (e.g., 1-20). Then, the RCF model is trained, and the trained model and its corresponding low-dimensional space data are saved for subsequent analysis and application.
[0068] The Reaction Coordinate Flow (RCF) is a normalizing flow that learns dynamic information. Its basic principle is shown in Figure 3. RCF first applies TICA to the input data $x_t$ and, using the resulting transformation matrix $M_{D \times m}$, obtains the latent-space data $s_t = x_t \cdot M + b_t$, with $m < D$, where D is the dimension of the original space and m is the chosen dimension of the latent space, in the range of 2% to 80% of D.
[0069] Next, the RCF model is optimized through its loss function, and, for the chosen τ and d, the latent trajectory $s_t$ is further reduced to the RC space $z_t$, where:
[0070]
[0071] is the average log-likelihood of transition pairs, which represents the dynamic information of the trajectory;
[0072]
[0073] is the average marginal log-likelihood of $s_t$, which represents the density distribution information of the trajectory; d is the dimension of the RC space; and τ is the minimum relaxation time that enables TICA.
[0074] Brownian dynamics is used to describe the reduced dynamics within the RC space; other dynamical models can also be used here.
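As a reading aid, one way to write a loss that is consistent with the verbal description above is a weighted sum of the two log-likelihood terms. Everything in this sketch is an assumption rather than the application's own notation: the symbol names, the model density $p_\theta$, the trajectory length $T$, and in particular the weight $\alpha$ and the way the two terms are combined.

```latex
\mathcal{L}(\theta)
  = -\,\mathrm{LL}_{\mathrm{pair}}(\theta) \;-\; \alpha\,\mathrm{LL}_{\mathrm{marg}}(\theta),
\qquad
\mathrm{LL}_{\mathrm{pair}}(\theta)
  = \frac{1}{T-\tau}\sum_{t=1}^{T-\tau} \log p_\theta\!\left(s_t,\, s_{t+\tau}\right),
\qquad
\mathrm{LL}_{\mathrm{marg}}(\theta)
  = \frac{1}{T}\sum_{t=1}^{T} \log p_\theta\!\left(s_t\right)
```

In this form the first term rewards agreement with observed transition pairs separated by the lag τ (dynamic information), while the second rewards agreement with the sampled density.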
[0075] Step 103 involves searching for low-dimensional candidate transition state samples. The low-dimensional search methods include GAD, a variant of GAD based on Hessian matrix optimization, and a combination of GAD and gradient descent. Preferably, Gentlest Ascent Dynamics (GAD) is used to search for low-dimensional candidate transition state samples. GAD is a representative algorithm for transition state search. Starting from a metastable state or an arbitrary state in a predefined low-dimensional space, it can complete the transition state search directly on the low-dimensional potential energy surface. The principle of the algorithm is as follows: starting from any point on the low-dimensional potential energy surface, the system evolves according to the following rule:
[0076]
[0077] This rule determines the displacement direction at each iteration: the system moves in small steps along the direction in which the gradient of the potential energy function changes most slowly, eventually converging at a saddle point (i.e., the transition state). In the rule, the force is calculated from the potential energy gradient of the molecular system in the current low-dimensional collective variable (CV) space, while the direction vector is driven toward the eigenvector associated with the smallest eigenvalue of the Hessian matrix H of the potential energy function, i.e., the direction of minimum curvature; it converges only through repeated iterations. During this process, γ controls how strongly the Hessian matrix H influences changes in the direction vector and is used to suppress noise in the potential energy function. Put simply, the rule guides the molecule to climb continuously along the gentlest slope of the potential until it converges and comes to rest at the transition state. Note that GAD requires frequent calculations of the Hessian matrix, so its fast search is usually limited to low-dimensional spaces. This makes GAD difficult to apply directly to biomolecular systems; other methods are needed to reduce the dimensionality first.
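The GAD iteration can be illustrated on a toy surface. The sketch below assumes the standard GAD equations, in which the force component along the direction vector n is reversed and n relaxes toward the lowest-curvature eigenvector of the Hessian, and it assumes that γ acts as a relaxation constant dividing the rate of change of n. The analytic double-well surface, the function names, and the explicit Euler integration are illustrative choices; in the application the surface comes from the learned low-dimensional model rather than from a formula.

```python
import numpy as np

def grad(p):
    """Gradient of the toy surface V(x, y) = (x^2 - 1)^2 + y^2."""
    x, y = p
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * y])

def hessian(p):
    """Hessian of the same toy surface."""
    x, _ = p
    return np.array([[12.0 * x**2 - 4.0, 0.0], [0.0, 2.0]])

def gad_search(x0, n0, step=0.05, gamma=10.0, n_iter=300):
    """Explicit-Euler GAD; returns the path (n_iter + 1 points) and final n."""
    x = np.asarray(x0, dtype=float)
    n = np.asarray(n0, dtype=float)
    n = n / np.linalg.norm(n)
    path = [x.copy()]
    for _ in range(n_iter):
        F = -grad(x)                     # force = negative gradient
        Hn = hessian(x) @ n
        # Position update: reverse the force component along n.
        x = x + step * (F - 2.0 * np.dot(F, n) * n)
        # Direction update: relax n toward the softest Hessian eigenvector.
        n = n - (step / gamma) * (Hn - np.dot(n, Hn) * n)
        n = n / np.linalg.norm(n)
        path.append(x.copy())
    return np.array(path), n

# Start displaced from the minimum at (-1, 0); the saddle of V lies at (0, 0).
path, n_final = gad_search([-0.6, 0.3], [1.0, 0.3])
```

The default step size, γ, and iteration count mirror the example values given for the GAD parameters in this description; on a real free energy surface the gradient and Hessian would typically be evaluated numerically or by automatic differentiation.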
[0078] After the dimensionality reduction in step 102, the computational efficiency of GAD is significantly improved due to the lower spatial dimension after dimensionality reduction. This avoids the computational bottleneck of high-dimensional search and allows GAD to be run directly on the equivalent potential energy surface to find candidate transition state sample points. These points are then inversely projected back into the original high-dimensional space to complete the CA test of the candidates and accurately determine the transition state structure.
[0079] In some specific embodiments, the search process for low-dimensional candidate transition state samples includes: determining the starting position and initial search direction of the search, and optimizing each starting position according to the initial search direction; performing a search at all starting positions according to the set search step size and number of iterations to obtain the search trajectory for each step; and extracting the corresponding data points as low-dimensional candidate transition state samples by evaluating the convergence of each search trajectory.
[0080] Choosing the starting location: The starting location can be any point in the low-dimensional space, typically the location of metastable states (MS). The location of the metastable states can be obtained through clustering methods such as density peaks, or determined based on prior knowledge.
[0081] Initialize the search direction: To ensure a comprehensive search within the low-dimensional space, the initial search direction *n* needs to cover multiple possibilities. For example, in a low-dimensional space of dimension *d*, each dimension offers three choices: move one unit along the negative direction of that dimension, move one unit along the positive direction of that dimension, or remain at the origin. Combining these three choices across all dimensions generates $3^d$ initial directions, efficiently covering the possible search paths in the low-dimensional space. For example, in a dimensionality-reduced space with two reaction coordinates (2 RCs), nine initial directions can be generated; with 4 RCs, 81 initial directions can be generated.
[0082] Optimize the starting position: The following two methods can be used to improve search efficiency: (a) Set the endpoint of the first step of the search path to the point farthest from the initial metastable state (MS) within the cluster where it is located, along the specified initial search direction. (b) Set the endpoint of the first step of the search path to a point that has moved a certain distance (1 to 5 units) along the specified initial direction.
[0083] Set GAD parameters: Select an appropriate search step size (e.g., 0.05), set γ = 10, and set the number of iterations, which can be set to 300.
[0084] Perform a GAD search: Perform a GAD search at all starting positions.
[0085] Select candidate transition states: The convergence of each GAD trajectory is evaluated, for example, by computing the root-mean-square distance (RMSD) between the positions at steps 200 and 300. For GAD search paths whose RMSD is below the critical value, the data point at step 200 is extracted as a candidate transition state in the low-dimensional space.
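The direction initialization and the convergence-based selection described in the steps above can be sketched as follows. The function names and the tolerance value `tol` are hypothetical; the application only speaks of a critical value. Each search path is assumed to be an array with one row per iteration, including the starting point.

```python
from itertools import product

import numpy as np

def initial_directions(dim):
    """All combinations of (-1, 0, +1) per dimension: 3**dim directions.

    The all-zero combination is included, matching the counts of 9 and 81.
    """
    return np.array(list(product((-1, 0, 1), repeat=dim)), dtype=float)

def select_converged(paths, early=200, late=300, tol=1e-2):
    """Keep the step-`early` point of every path that has stopped moving."""
    candidates = []
    for path in paths:
        diff = path[late] - path[early]
        rmsd = np.sqrt(np.mean(diff**2))   # distance between the two steps
        if rmsd < tol:
            candidates.append(path[early])
    return np.array(candidates)
```

Because the all-zero combination carries no direction, a practical implementation would probably need to treat it as a special case, for example by skipping it or by letting the first GAD steps pick the direction.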
[0086] Step 104 involves obtaining candidate transition state structures from the low-dimensional candidate transition state samples. Various methods can be used. In some specific embodiments, the reversible dimensionality reduction method is used to restore the low-dimensional candidate transition state samples from the low-dimensional space to the high-dimensional space. Specifically, each candidate transition state in the low-dimensional space is inversely projected through RCF and then TICA into the high-dimensional conformation space, reconstructing the candidate transition state structure. Inversely mapping the low-dimensional transition state back to the high-dimensional space ensures accurate reconstruction of the transition state structure.
[0087] Step 105 involves verifying the candidate transition state structures. In some specific embodiments, the committor probability of each candidate transition state structure is calculated. If the committor probability falls within a preset interval, the candidate transition state structure is identified as a true transition state structure. Calculating the committor probability thus verifies whether the structure is a true transition state.
[0088] In some specific embodiments, the process of obtaining the true transition state structure includes: performing molecular dynamics simulations on each candidate transition state structure to obtain multiple molecular dynamics trajectories; determining the metastable state that each molecular dynamics trajectory first approaches; calculating the committor probability based on the number of trajectories that first approach each metastable state; and identifying the candidate transition state structure as a true transition state structure if the committor probability is within a preset interval. This dynamic verification by committor analysis (CA) ensures that the identified transition state is indeed located at the dynamic boundary between two stable states, which significantly improves the physical reliability of the model. In the molecular dynamics simulations, all trajectories started from a given candidate share the same initial position but have different initial velocities (directions of motion), so that multiple molecular dynamics trajectories are obtained from one candidate structure.
[0089] First, run CMD (Classical Molecular Dynamics) to generate verification data for the candidate transition states: for each candidate transition state, run multiple CMD trajectories (50 to 100), with the simulation time of each trajectory set in the range of 50 ps to 2 ns.
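One way to give each trajectory the same initial position but a different initial velocity is to draw the velocities independently from the Maxwell-Boltzmann distribution. The sketch below is a minimal illustration; the function name, the unit convention (masses in g/mol, velocities in nm/ps), and the seed handling are assumptions, and in practice the MD engine's own velocity initialisation would normally be used.

```python
import numpy as np

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def sample_velocities(masses, temperature, n_trajectories, seed=0):
    """Draw independent Maxwell-Boltzmann velocities for each trajectory.

    masses: per-atom masses in g/mol, shape (n_atoms,)
    returns: velocities in nm/ps, shape (n_trajectories, n_atoms, 3)
    """
    rng = np.random.default_rng(seed)
    masses = np.asarray(masses, dtype=float)
    # Per-component standard deviation sqrt(kB * T / m); with kJ/mol and g/mol
    # this comes out directly in nm/ps.
    sigma = np.sqrt(KB * temperature / masses)
    noise = rng.normal(size=(n_trajectories, masses.size, 3))
    return noise * sigma[None, :, None]

# Example: 50 velocity sets for the same (hypothetical) 2000-atom structure.
velocities = sample_velocities(np.full(2000, 12.0), temperature=300.0, n_trajectories=50)
```

Each of the 50 velocity sets would then be paired with the same candidate coordinates to start one short unbiased trajectory.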
[0090] Next, the CA results are analyzed to determine whether each candidate is a true transition state. First, the metastable state that each molecular dynamics trajectory first approaches is identified: for each trajectory, the first frame whose distance (RMSD) from metastable state A or metastable state B falls below the critical value is located, and the trajectory is assigned to the metastable state it enters at that moment. Then the trajectory outcomes are counted and the committor probability is calculated: the numbers of trajectories that first fall into metastable state A and into metastable state B are counted, and the committor probability is calculated using the formula:
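[0091] $$p_B = \frac{N_B}{N_A + N_B}$$ where $N_A$ and $N_B$ are the numbers of trajectories that first fall into metastable state A and metastable state B, respectively.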
[0092] If the committor probability of a candidate transition state structure is between 40% and 60%, the candidate transition state is considered a true transition state.
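The committor analysis can be sketched as below. The sketch assumes the usual estimate in which the committor is the fraction of committed trajectories that reach state B first; the function names, the RMSD helper (which assumes the frames are already aligned to the references), and the handling of uncommitted trajectories are illustrative assumptions.

```python
import numpy as np

def rmsd(a, b):
    """RMSD between two conformations of shape (n_atoms, 3), assumed pre-aligned."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=-1))))

def first_state_reached(trajectory, ref_a, ref_b, cutoff):
    """Return 'A' or 'B' for the first metastable state entered, or None if neither."""
    for frame in trajectory:
        if rmsd(frame, ref_a) < cutoff:
            return "A"
        if rmsd(frame, ref_b) < cutoff:
            return "B"
    return None

def committor_probability(labels):
    """Fraction of committed trajectories that reached state B first."""
    n_a = labels.count("A")
    n_b = labels.count("B")
    if n_a + n_b == 0:
        raise ValueError("no trajectory reached either metastable state")
    return n_b / (n_a + n_b)

def is_true_transition_state(p_b, low=0.4, high=0.6):
    """Accept the candidate if the committor lies in the preset interval."""
    return low <= p_b <= high

# Toy one-atom example: state A at the origin, state B at x = 1.
ref_a = np.array([[0.0, 0.0, 0.0]])
ref_b = np.array([[1.0, 0.0, 0.0]])
to_a = np.array([[[0.5, 0, 0]], [[0.3, 0, 0]], [[0.05, 0, 0]]], dtype=float)
to_b = np.array([[[0.5, 0, 0]], [[0.8, 0, 0]], [[0.97, 0, 0]]], dtype=float)
labels = [first_state_reached(t, ref_a, ref_b, cutoff=0.1) for t in (to_a, to_b)]
p_b = committor_probability(labels)
```

Trajectories that reach neither state within the simulation time are left out of the ratio here; whether to extend or discard them is a design choice that the text above does not specify.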
[0093] Because the search scheme of this application incorporates generative modeling, it can model stable states and transition states more clearly in the low-dimensional latent space, and can accurately distinguish the key transition regions between different states even for complex biomolecular systems.
[0094] In some embodiments, the search scheme of this application can be extended to other fields, including:
[0095] Chemical reaction kinetics: it can be used to search chemical reaction pathways and to predict reaction intermediates and rate-determining steps.
[0096] Materials science: it can be used to predict material phase transitions and interface migration paths.
[0097] Protein-ligand interaction prediction: it can be used in small-molecule drug design to find key intermediates in receptor-ligand binding.
[0098] Furthermore, in addition to biological macromolecules such as proteins, the search scheme of this application is also applicable to small molecule systems, nanomaterials, or polymer systems. By adjusting the dimensionality reduction strategy, it can be applied to ultra-high-dimensional systems (such as complex biological network modeling).
[0099] In practical applications, the search method of this application is primarily implemented on GPUs or high-performance computing clusters, which may alternatively be replaced by quantum computing optimization or low-power AI chips for efficient computation. Employing a distributed computing scheme further enhances the feasibility of large-scale simulations.
[0100] To verify the effectiveness of the search method of this application, RCF was used for dimensionality reduction and GAD for the search. The method was applied to a relatively complex biomolecular system: the conformational transition of the T4 lysozyme L99A variant (T4L-L99A) in an explicit solvent environment from the ground state (G state) to the excited state (E state), as shown in Figure 4. Figure 4 shows the distribution of metastable states during the transition of T4L-L99A from the ground state to the excited state. The ground state (MS1) and excited state (MS5) structures of T4L-L99A are presented, highlighting the key residues M106, F114, and W138, as well as the α0 and α1 helices. During the transition from the ground state to the excited state, F114 and W138 flip, M106 lifts, and the α0 and α1 helices rearrange. Specifically, the α0 helix corresponds to residues 114-122 in the G state and residues 119-122 in the E state, while the α1 helix corresponds to residues 107-113 in the G state and residues 107-118 in the E state. In a previous study, a low free energy path (LFEP) containing five metastable states (MSs) and four transition states (TSs) was identified using the Traveling-salesman based Automated Path Searching (TAPS) protocol. In this experiment, the LFEP nodes identified by TAPS (i.e., conformations containing the complete system and solvent) were used as the initial conformations. Four sets of unbiased molecular dynamics (MD) simulations, each lasting 50 ns, were performed for each path node, giving a total of 2650 ns of unbiased trajectory data. An RCF model was then trained on the T4L-L99A dataset to reduce it to four reaction coordinates (RCs).
[0101] A GAD search in the four-dimensional space located three transition states that were validated by committor analysis (CA): TS12 (MS1/2), TS23 (MS2/3), and TS45 (MS4/5), as shown in Figure 5. Each subplot of Figure 6 (A-E) illustrates the key conformational changes associated with the identified transition states. (A) TS12, located between MS1 and MS2, shows the downward shift of the F114 benzene ring. (B) TS23, located between MS2 and MS3, depicts the right-flipping motion of the F114 benzene ring. (C) TS23, located between MS2 and MS3, shows a slight increase in the angle between the α0 and α1 helices. (D) TS45, located between MS4 and MS5, captures the flipping transition of W138. (E) TS45 highlights the upward shift of M106. The method thus identified transition states that clearly capture the conformational transitions. Although TS34 (MS3/4) was not identified, possibly because of the high structural similarity of MS3 and MS4 or the low energy barrier between them, the transition states (TSs) obtained by RCF-GAD are highly consistent with the TSs previously identified by TAPS, as shown in Figures 6A-E. Overall, the TS structures resolved by RCF-GAD are more reasonable than those resolved by the TAPS method. For example, in the TS23 resolved by RCF-GAD, F114 lies closer to the middle region between MS2 and MS3 than in the TS23 resolved by TAPS, as shown in Figure 6B. Furthermore, in TS45 (Figure 6D), the W138 position resolved by RCF-GAD is closer to the interface that equally divides MS4 and MS5, whereas the TS45 position resolved by TAPS is slightly off.
[0102] The experimental results show that the TS structures resolved by RCF-GAD are more accurate than those obtained by TAPS, because RCF-GAD combines unbiased MD trajectory data with reaction coordinate mapping, allowing the TS structures to form naturally under realistic kinetic conditions. These results demonstrate that the RCF-GAD method is applicable to real-world large biomolecular systems and can be used to optimize TS structures obtained by path methods based on biased sampling (such as the Finite Temperature String method, Fast Tomographic method, Path Metadynamics method, and TAPS method).
[0103] This application provides a transition state search system for conformational changes in complex biological macromolecules. A schematic diagram of the system's modules is shown in Figure 7. The system includes the following:
[0104] Input unit 1 is used to acquire the input dataset involving molecular dynamics simulation trajectory data;
[0105] Dimensionality reduction unit 2 is used to reduce the dimension of the input dataset using a preset reversible dimension reduction method. While preserving the transition state information of the input dataset, the trajectory data is projected from the high-dimensional space to the low-dimensional space to generate a low-dimensional free energy surface.
[0106] Search unit 3 is used to search for low-dimensional candidate transition state samples on the low-dimensional free energy surface using a preset low-dimensional search method.
[0107] Candidate unit 4 is used to inversely project low-dimensional candidate transition state samples back into high-dimensional space using the reversible dimensionality reduction method to obtain the complete candidate transition state structure under the high-dimensional representation.
[0108] Verification unit 5 is used to verify the candidate transition state structure and select the real transition state structure based on the verification results.
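The division into five units can be sketched as a small pipeline object. This is a structural illustration only: the class name, field names, and callable signatures are hypothetical, and each unit is passed in as a plain function so that any concrete dimensionality reduction, search, or verification routine can be plugged in.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class TransitionStateSearchSystem:
    load: Callable[[], Any]              # input unit 1: acquire trajectory data
    reduce: Callable[[Any], Any]         # dimensionality reduction unit 2
    search: Callable[[Any], List[Any]]   # search unit 3: low-dimensional candidates
    back_project: Callable[[Any], Any]   # candidate unit 4: inverse projection
    verify: Callable[[Any], bool]        # verification unit 5

    def run(self) -> List[Any]:
        """Chain the five units and return the verified structures."""
        data = self.load()
        surface = self.reduce(data)
        low_dim_candidates = self.search(surface)
        structures = [self.back_project(z) for z in low_dim_candidates]
        return [s for s in structures if self.verify(s)]

# Toy wiring with placeholder functions, only to show the data flow.
system = TransitionStateSearchSystem(
    load=lambda: [1.0, 2.0, 3.0, 4.0],
    reduce=lambda data: [x / 2 for x in data],
    search=lambda surface: [x for x in surface if x >= 1.0],
    back_project=lambda z: z * 2,
    verify=lambda structure: structure < 4.0,
)
verified = system.run()
```

Keeping the units as interchangeable callables mirrors the statement that the modules may run on a single computing system or be distributed across several.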
[0109] This application provides a computer program product including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the transition state search method for conformational changes of complex biomolecules. Embodying the method as a computer program product makes it convenient to execute.
[0110] This application also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of a transition state search method for conformational changes of complex biomacromolecules as described above.
[0111] The computer storage medium of this application can be any combination of one or more computer-readable media. The computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. For example, a computer-readable storage medium can be, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media (a non-exhaustive list) include: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this document, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. This application stores a computer program for the transition state search method for conformational changes of complex biomolecules on a computer-readable storage medium. When executed by a processor, this program implements the steps of the transition state search method provided in this application, which is simple, fast, easy to store, and not easily lost.
[0112] This application proposes a method, system, and application for searching transition states in the conformational changes of complex biomolecules. Based on reversible dimensionality reduction and low-dimensional search, it achieves high computational efficiency and high accuracy in searching high-dimensional transition state structures, and is highly scalable and physically reliable, making it applicable to complex biomolecular systems. Because the high-dimensional molecular conformational data is projected into a low-dimensional space and the transition state search is parallelized there, the search complexity is significantly lower than in the high-dimensional space, which significantly reduces computational costs. The search process not only detects transition states but also generates reasonable transition state conformations that accurately represent the key structures of the corresponding conformational change process, which avoids false positives caused by abnormal data distributions and thereby reduces the false positive rate of transition state identification.
[0113] Those skilled in the art will understand that the modules described above can be implemented using general-purpose computing systems. They can be centralized on a single computing system or distributed across a network of multiple computing systems. Optionally, they can be implemented using computer-executable program code, allowing them to be stored in a storage system for execution by the computing system. Alternatively, they can be fabricated as separate integrated circuit modules, or multiple modules or steps can be fabricated as a single integrated circuit module. Thus, this application is not limited to any particular combination of hardware and software.
[0114] Note that the above description is merely a preferred embodiment and the technical principles employed in this application. Those skilled in the art will understand that this application is not limited to the specific embodiments described herein, and various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of this application. Therefore, although this application has been described in detail through the above embodiments, this application is not limited to the above embodiments. Many other equivalent embodiments may be included without departing from the concept of this application, and the scope of this application is determined by the scope of the appended claims.
[0115] The above disclosures are only a few specific implementation scenarios of this application. However, this application is not limited to these. Any variations that can be conceived by those skilled in the art should fall within the protection scope of this application.
Claims
1. A method for searching transition states of conformational changes of complex biological macromolecules, characterized in that the method includes: obtaining an input dataset involving molecular dynamics simulation trajectory data; reducing the dimensionality of the input dataset by a preset reversible dimensionality reduction method, wherein, while the transition state information of the input dataset is preserved, the trajectory data is projected from the high-dimensional space to the low-dimensional space to generate a low-dimensional free energy surface; searching for low-dimensional candidate transition state samples on the low-dimensional free energy surface by a preset low-dimensional search method; inversely projecting the low-dimensional candidate transition state samples back into the high-dimensional space by the reversible dimensionality reduction method to obtain the complete candidate transition state structures under the high-dimensional representation; and verifying the candidate transition state structures and selecting the true transition state structures based on the verification results; wherein the search process for the low-dimensional candidate transition state samples includes: determining the starting positions and the initial search direction, and optimizing each starting position according to the initial search direction; performing a search at all starting positions according to the set search step size and number of iterations to obtain the search trajectory for each step; and extracting the corresponding data points as low-dimensional candidate transition state samples by evaluating the convergence of each search trajectory.
2. The transition state search method for conformational changes of complex biological macromolecules according to claim 1, characterized in that the dimensionality reduction process specifically includes: based on a preset target molecular system, setting the relaxation time and the reduced dimension so as to map the trajectory data of the input dataset to the latent space corresponding to the reduced dimension through a preset first dimensionality reduction method; and, based on the current molecular system, setting the relaxation time and the number of dimensions of the low-dimensional space so as to map the data of the latent space to the low-dimensional space through a preset second dimensionality reduction method.
3. The transition state search method for conformational changes of complex biological macromolecules according to claim 2, characterized in that the first dimensionality reduction method includes time-structure-based independent component analysis (TICA); and/or the second dimensionality reduction method includes reaction coordinate flow (RCF), a diffusion model, or a flow model based on a variational autoencoder.
4. The transition state search method for conformational changes of complex biological macromolecules according to claim 2, characterized in that the low-dimensional search methods include GAD, a GAD variant based on Hessian matrix optimization, and a combination of GAD and gradient descent.
5. The transition state search method for conformational changes of complex biological macromolecules according to claim 1, characterized in that the committor probability of each candidate transition state structure is calculated, and if the committor probability of a candidate transition state structure falls within the preset interval, the candidate transition state structure is identified as a true transition state structure.
6. The transition state search method for conformational changes of complex biological macromolecules according to claim 1, characterized in that the process of obtaining the true transition state structure includes: performing molecular dynamics simulations on each candidate transition state structure to obtain multiple molecular dynamics trajectories; determining the metastable state that each molecular dynamics trajectory first approaches, and calculating the committor probability based on the number of trajectories that first approach each metastable state; and, if the committor probability of a candidate transition state structure is within a preset range, identifying the candidate transition state structure as a true transition state structure.
7. A transition state search system for conformational changes of complex biological macromolecules, characterized in that the system includes: an input unit, used to acquire an input dataset involving molecular dynamics simulation trajectory data; a dimensionality reduction unit, used to reduce the dimensionality of the input dataset using a preset reversible dimensionality reduction method, wherein, while the transition state information of the input dataset is preserved, the trajectory data is projected from the high-dimensional space to the low-dimensional space to generate a low-dimensional free energy surface; a search unit, used to search for low-dimensional candidate transition state samples on the low-dimensional free energy surface using a preset low-dimensional search method; a candidate unit, used to inversely project the low-dimensional candidate transition state samples back into the high-dimensional space using the reversible dimensionality reduction method to obtain the complete candidate transition state structures under the high-dimensional representation; and a verification unit, used to verify the candidate transition state structures and select the true transition state structures based on the verification results; wherein the search process for the low-dimensional candidate transition state samples includes: determining the starting positions and the initial search direction, and optimizing each starting position according to the initial search direction; performing a search at all starting positions according to the set search step size and number of iterations to obtain the search trajectory for each step; and extracting the corresponding data points as low-dimensional candidate transition state samples by evaluating the convergence of each search trajectory.
8. A computer device, characterized in that the computer device includes: one or more processors; and a memory, used to store one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the transition state search method for conformational changes of complex biomacromolecules as described in any one of claims 1-6.
9. A computer program product, characterized in that it includes executable instructions that, when executed by a processor, implement the transition state search method for conformational changes of complex biomacromolecules as described in any one of claims 1-6.