Molecular conformation generation method and device based on evolutionary clustering algorithm

By applying an evolutionary clustering algorithm to molecular conformation generation, the method dynamically adjusts the search strategy, identifies unexplored regions, and prioritizes the exploration of potential low-energy structures. This addresses the inefficiency and local-optimum problems of existing techniques and enables efficient discovery of the global optimum.

CN120877912B · Active · Publication Date: 2026-06-26 · 烟台国工智能科技有限公司

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
烟台国工智能科技有限公司
Filing Date
2025-07-10
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing technologies are inefficient in molecular structure search, prone to getting stuck in local optima, and have low utilization of intermediate structural information, making it difficult to effectively improve the probability of finding the global optimum.

Method used

A molecular conformation generation method based on an evolutionary clustering algorithm is adopted. Local optimization, hierarchical clustering, and an adaptive evolution strategy are used to adjust the search path dynamically, identify unexplored regions of conformation space, and prioritize the exploration of potential low-energy structures. Priority selection of anomalous structures and a clustering penalty mechanism are combined to balance exploration and exploitation.

Benefits of technology

It significantly improves the convergence speed of evolutionary algorithms, reduces redundant calculations, enhances the efficiency and intelligence of structure search, and increases the probability of discovering the global optimum. It is applicable to various systems such as organic molecules and inorganic materials.

✦ Generated by Eureka AI based on patent content.

Abstract figure: CN120877912B_ABST

Abstract

The application discloses a molecular conformation generation method and device based on an evolutionary clustering algorithm. The method optimizes an input initial molecular structure with a local optimization strategy to obtain an optimized initial molecular structure, and generates an initial population from it. Cluster analysis is performed with a hierarchical clustering strategy to obtain a plurality of structure clusters, which are classified as abnormal or normal structure clusters according to their maximum intra-cluster distance. For abnormal structure clusters, a set proportion of the abnormal structures in each cluster is used as parents for population iteration. For normal structure clusters, the search strategy is optimized through an adaptive evolution strategy before population iteration. An algorithm termination condition is generated from an energy convergence criterion and a structure convergence criterion, and iteration stops when the population iteration meets this condition or reaches the maximum number of iterations. Molecular conformations that meet the energy criterion during the iteration process are output to obtain the final molecular conformation. The application can improve the efficiency and level of intelligence of structure search.

Description

Technical Field

[0001] This invention relates to the fields of computational chemistry and artificial intelligence, specifically to a method and apparatus for generating molecular conformations based on an evolutionary clustering algorithm.

Background Technology

[0002] In materials science, chemistry, and molecular biology, atomic-scale structure prediction is crucial for understanding the properties of materials. However, due to the high-dimensional complexity of potential energy surfaces and the existence of numerous local minima, finding the global minimum (GM) structure has always been a major challenge in computational simulations.

[0003] Currently, researchers often rely on manual guesswork and domain knowledge to construct candidate structures, and then use first-principles methods (such as density functional theory, DFT) to optimize them and determine their stability. This approach is inefficient, usually computationally intensive, and prone to getting trapped in local optima. To address this challenge, various automated structure search methods have been developed in recent years, such as molecular dynamics, Monte Carlo simulations, particle swarm optimization, stochastic search (such as random walks on potential energy surfaces), neural network optimization, and evolutionary algorithms.

[0004] Among them, evolutionary algorithms (EA) have become the most promising global optimization tools due to their ability to search without requiring any initial structural information (in practical applications, including the method provided in this patent, an initial molecular structure is still required), their strong search capabilities, and their suitability for parallel computation. EA generates new structures through crossover and mutation operations and evaluates their energy through local optimization, gradually evolving toward better solutions. However, in practical applications, EA generates a large number of intermediate structures during the search for the global optimum; these structures are usually discarded, resulting in low information utilization.

[0005] With the development of machine learning, clustering algorithms, as an unsupervised learning method, can classify intermediate structures and extract latent structural distribution information to guide the evolutionary algorithm (EA) process. Existing research shows that introducing structural clustering analysis into the EA process can identify, classify, and prioritize structural families in the search space, thereby helping to avoid search redundancy, achieve a degree of targeted search, enhance population diversity, and improve overall search efficiency. Although some existing works have applied clustering as a post-processing analysis after EA, embedding clustering results into the EA loop in real time to dynamically guide the search path remains a frontier research problem.

[0006] Therefore, developing a molecular conformation generation method based on an evolutionary clustering algorithm that improves the efficiency and intelligence of structure search has become an urgent problem to be solved.

Summary of the Invention

[0007] To this end, the present invention provides a molecular conformation generation method and apparatus based on evolutionary clustering algorithm. By analyzing the intermediate structures generated by the evolutionary algorithm (EA) in real time, the search strategy is dynamically adjusted to prioritize the exploration of insufficiently covered conformational space regions, which significantly improves the convergence speed of the evolutionary algorithm, reduces redundant calculations, and thus improves the efficiency and intelligence level of structure search.

[0008] To achieve the above objectives, the present invention provides the following technical solution: a molecular conformation generation method based on evolutionary clustering algorithm, comprising:

[0009] The initial molecular structure is optimized by a local optimization strategy to obtain the optimized initial molecular structure.

[0010] Based on the optimized initial molecular structure, an initial population is generated.

[0011] The initial population is clustered using a hierarchical clustering strategy to obtain several structural clusters; the maximum intra-cluster distance of each structural cluster is calculated; the molecular structure within the structural cluster is judged based on the maximum intra-cluster distance to obtain abnormal and normal structural clusters.

[0012] For the abnormal structure cluster, the abnormal structures within the cluster are used as parents according to a set ratio for population iteration; for the normal structure cluster, the search strategy is optimized through an adaptive evolution strategy to obtain an optimized search strategy; and population iteration is performed based on the optimized search strategy.

[0013] Based on the energy convergence criterion and the structural convergence criterion, an algorithm termination condition is generated; iteration stops when the population iteration satisfies the algorithm termination condition or reaches the maximum number of iterations.

[0014] The molecular conformations that meet the energy criterion during the iteration process are output to obtain the final molecular conformation.

[0015] As a preferred scheme for molecular conformation generation methods based on evolutionary clustering algorithms, the local optimization strategies include: first-principles strategies, semi-empirical quantum chemistry strategies, and semi-empirical density functional theory strategies.

[0016] As a preferred scheme for molecular conformation generation method based on evolutionary clustering algorithm, in the process of generating the initial population, the bond distance matrix and bond angle distribution are used as feature vectors of molecular structure.

[0017] The expression for the bond distance matrix is:

[0018] $$DM = (d_{ij})_{n \times n} = \begin{pmatrix} d_{11} & \cdots & d_{1n} \\ \vdots & \ddots & \vdots \\ d_{n1} & \cdots & d_{nn} \end{pmatrix}$$

[0019] In the formula, $DM$ is the bond distance matrix; $d_{ij}$ is the distance between the i-th and j-th atoms; $n$ is the number of atoms.

[0020] The bond angle distribution refers to the bond angles $\theta_{ijk}$ between atoms in the molecular structure, where $\theta_{ijk}$ is the angle formed by the i-th, j-th, and k-th atoms with the j-th atom as the vertex.

[0021] As a preferred embodiment of the molecular conformation generation method based on evolutionary clustering algorithm, the steps of performing cluster analysis on the initial population using the hierarchical clustering strategy to generate several structural clusters are as follows:

[0022] Each of the optimized initial molecular structures is treated as an independent cluster;

[0023] The average similarity among all the independent clusters is calculated using a similarity metric; based on the average similarity, the independent clusters whose similarity reaches a set requirement are merged.

[0024] Set a clustering termination condition; when the clustering analysis reaches the clustering termination condition, the clustering is terminated, and several structural clusters are obtained.

[0025] As a preferred embodiment of the molecular conformation generation method based on evolutionary clustering algorithm, in the process of optimizing the search strategy through the adaptive evolutionary strategy for the normal structure cluster, the search strategy is optimized by dynamically adjusting the selection probability distribution and the cluster penalty function.

[0026] The expression for the selection probability distribution is:

[0027]

[0028] In the formula, $P(x_i)$ is the probability that structure $x_i$ is selected as a parent; $x_i$ denotes a molecular structure in the population; $\beta$ is a pressure parameter: the larger $\beta$ is, the more strongly the algorithm favors low-energy structures, and the smaller $\beta$ is, the more random the selection; $E_j$ and $E_i$ are the original energies of structures $x_j$ and $x_i$, respectively.

[0029] The expression for the cluster penalty function is:

[0030]

[0031] In the formula, $F_i$ is the corrected fitness; $C_k$ is the number of structures in the cluster containing structure $x_i$; $N$ is the total number of structures in the current population; $\lambda$ is the maximum penalty intensity, controlling the influence of cluster size on fitness; a sigmoid function makes the penalty grow smoothly over the generations; $t$ is the number of generations; $\tau$ is a time constant that controls how quickly the penalty grows with the number of generations.

[0032] As a preferred embodiment of the molecular conformation generation method based on evolutionary clustering algorithm, the formula for determining the energy convergence criterion is as follows:

[0033]

[0034] In the formula, the two quantities compared are the optimal structural energy of generation $t_i$ and the optimal structural energy of the preceding generation, $t_i - 1$.

[0035] This invention also provides a molecular conformation generation device based on an evolutionary clustering algorithm, built on the molecular conformation generation method described above and comprising:

[0036] The molecular structure local optimization module is used to optimize the input initial molecular structure through a local optimization strategy to obtain the optimized initial molecular structure.

[0037] An initial population generation module is used to generate an initial population based on the optimized initial molecular structure.

[0038] The structure cluster generation and judgment module is used to perform cluster analysis on the initial population through a hierarchical clustering strategy to obtain several structure clusters; calculate the maximum intra-cluster distance of each structure cluster; and judge the molecular structure within the structure cluster based on the maximum intra-cluster distance to obtain abnormal structure clusters and normal structure clusters.

[0039] The population iteration module is used to perform population iteration for the abnormal structure cluster by using the abnormal structures in the cluster as parents according to a set ratio; for the normal structure cluster, the search strategy is optimized by an adaptive evolution strategy to obtain an optimized search strategy; and population iteration is performed based on the optimized search strategy.

[0040] The population iteration termination module is used to generate algorithm termination conditions based on energy convergence criteria and structural convergence criteria; when the population iteration satisfies the algorithm termination conditions or reaches the maximum number of iterations, the iteration stops.

[0041] The final molecular conformation acquisition module is used to output the molecular conformations that meet the energy standard during the iteration process, and obtain the final molecular conformation.

[0042] As a preferred embodiment of a molecular conformation generation device based on evolutionary clustering algorithm, the local optimization strategy in the molecular structure local optimization module includes: first-principles strategy, semi-empirical quantum chemistry strategy, and semi-empirical density functional theory strategy.

[0043] As a preferred embodiment of a molecular conformation generation device based on evolutionary clustering algorithm, in the initial population generation module, the bond distance matrix and bond angle distribution are used as feature vectors of the molecular structure during the generation of the initial population.

[0044] The expression for the bond distance matrix is:

[0045] $$DM = (d_{ij})_{n \times n} = \begin{pmatrix} d_{11} & \cdots & d_{1n} \\ \vdots & \ddots & \vdots \\ d_{n1} & \cdots & d_{nn} \end{pmatrix}$$

[0046] In the formula, $DM$ is the bond distance matrix; $d_{ij}$ is the distance between the i-th and j-th atoms; $n$ is the number of atoms.

[0047] The bond angle distribution refers to the bond angles $\theta_{ijk}$ between atoms in the molecular structure, where $\theta_{ijk}$ is the angle formed by the i-th, j-th, and k-th atoms with the j-th atom as the vertex.

[0048] As a preferred embodiment of a molecular conformation generation device based on an evolutionary clustering algorithm, the structure cluster generation and judgment module, which performs cluster analysis on the initial population using the hierarchical clustering strategy to generate several structural clusters, includes:

[0049] An independent cluster setting submodule is used to treat each of the optimized initial molecular structures as an independent cluster;

[0050] The independent cluster merging submodule is used to calculate the average similarity between all the independent clusters through a similarity metric; and to merge the independent clusters whose similarity reaches a set requirement based on the average similarity.

[0051] The clustering termination submodule is used to set clustering termination conditions; when the clustering analysis reaches the clustering termination conditions, the clustering is terminated, and several of the structure clusters are obtained.

[0052] As a preferred embodiment of a molecular conformation generation device based on evolutionary clustering algorithm, in the population iteration module, during the process of optimizing the search strategy for the normal structure clusters through the adaptive evolution strategy, the search strategy is optimized by dynamically adjusting the selection probability distribution and the cluster penalty function.

[0053] The expression for the selection probability distribution is:

[0054]

[0055] In the formula, $P(x_i)$ is the probability that structure $x_i$ is selected as a parent; $x_i$ denotes a molecular structure in the population; $\beta$ is a pressure parameter: the larger $\beta$ is, the more strongly the algorithm favors low-energy structures, and the smaller $\beta$ is, the more random the selection; $E_j$ and $E_i$ are the original energies of structures $x_j$ and $x_i$, respectively.

[0056] The expression for the cluster penalty function is:

[0057]

[0058] In the formula, $F_i$ is the corrected fitness; $C_k$ is the number of structures in the cluster containing structure $x_i$; $N$ is the total number of structures in the current population; $\lambda$ is the maximum penalty intensity, controlling the influence of cluster size on fitness; a sigmoid function makes the penalty grow smoothly over the generations; $t$ is the number of generations; $\tau$ is a time constant that controls how quickly the penalty grows with the number of generations.

[0059] As a preferred embodiment of a molecular conformation generation device based on evolutionary clustering algorithm, the energy convergence criterion determination formula in the population iteration termination module is as follows:

[0060]

[0061] In the formula, the two quantities compared are the optimal structural energy of generation $t_i$ and the optimal structural energy of the preceding generation, $t_i - 1$.

[0062] This invention has the following advantages: It optimizes the input initial molecular structure using a local optimization strategy to obtain an optimized initial molecular structure; based on the optimized initial molecular structure, an initial population is generated; the initial population is clustered using a hierarchical clustering strategy to obtain several structural clusters; the maximum intra-cluster distance of each structural cluster is calculated; the molecular structures within each structural cluster are judged based on the maximum intra-cluster distance to obtain abnormal and normal structural clusters; for the abnormal structural clusters, the abnormal structures within the cluster are used as parents according to a set ratio for population iteration; for the normal structural clusters, the search strategy is optimized using an adaptive evolutionary strategy to obtain an optimized search strategy; population iteration is performed based on the optimized search strategy; algorithm termination conditions are generated based on energy convergence and structural convergence criteria; iteration stops when the population iteration meets the algorithm termination conditions or reaches the maximum number of iterations; molecular conformations that meet the energy criteria during iteration are output to obtain the final molecular conformation. This invention identifies unexplored conformational space regions through dynamic clustering, prioritizes the search for potential low-energy structures, significantly improves the convergence speed of the evolutionary algorithm, and reduces redundant computation. This invention combines priority selection of anomalous structures with a clustering penalty mechanism to automatically balance exploration and exploitation, helping the search avoid local optima and increasing the probability of discovering the global optimum.
This invention represents molecular structures with bond distance features, so it can be applied to various systems, including organic molecules and inorganic materials, without relying on prior knowledge or human intervention, which gives it broad applicability. Throughout the conformational screening process, the computational level can be customized: both high-level quantum chemical methods and various lower-cost semi-empirical methods can be used, allowing flexible control of the computational workload.

Attached Figure Description

[0063] To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings in the following description are merely exemplary, and those skilled in the art can derive other embodiments based on the provided drawings without creative effort.

[0064] The structures, proportions, sizes, etc. illustrated in this specification are only for the purpose of assisting those skilled in the art in understanding and reading the content disclosed herein, and are not intended to limit the conditions under which the present invention can be implemented. Therefore, they have no substantial technical significance. Any modifications to the structure, changes in the proportions, or adjustments to the size, without affecting the effects and objectives that the present invention can produce, should still fall within the scope of the technical content disclosed in the present invention.

[0065] Figure 1 is a schematic flowchart of the molecular conformation generation method based on an evolutionary clustering algorithm provided in Embodiment 1 of the present invention;

[0066] Figure 2 is a schematic diagram of the specific implementation process of the molecular conformation generation method based on an evolutionary clustering algorithm provided in Embodiment 1 of the present invention;

[0067] Figure 3 is a schematic diagram of the initial 1,3-butadiene molecular structure in one possible implementation of Embodiment 1 of the present invention;

[0068] Figure 4 is a schematic diagram of the various conformations generated from the input 1,3-butadiene molecule in one possible implementation of Embodiment 1 of the present invention.

[0069] Figure 5 is a schematic diagram of the molecular conformation generation device based on an evolutionary clustering algorithm provided in Embodiment 2 of the present invention.

Detailed Implementation

[0070] The following specific embodiments illustrate the implementation of the present invention. Those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0071] Example 1

[0072] Referring to Figure 1 and Figure 2, Embodiment 1 of the present invention provides a molecular conformation generation method based on an evolutionary clustering algorithm, comprising the following steps:

[0073] S1. Optimize the input initial molecular structure using a local optimization strategy to obtain the optimized initial molecular structure;

[0074] S2. Based on the optimized initial molecular structure, generate an initial population;

[0075] S3. Perform cluster analysis on the initial population using a hierarchical clustering strategy to obtain several structural clusters; calculate the maximum intra-cluster distance for each structural cluster; use the maximum intra-cluster distance to determine the molecular structure within the structural cluster to obtain abnormal and normal structural clusters;

[0076] S4. For the abnormal structure cluster, the abnormal structures within the cluster are used as parents according to a set ratio, and population iteration is performed; for the normal structure cluster, the search strategy is optimized through an adaptive evolution strategy to obtain an optimized search strategy; population iteration is performed based on the optimized search strategy.

[0077] S5. Based on the energy convergence criterion and the structural convergence criterion, generate the algorithm termination condition; when the population iteration satisfies the algorithm termination condition or reaches the maximum number of iterations, stop the iteration;

[0078] S6. Output the molecular conformation that meets the energy standard during the iteration process to obtain the final molecular conformation.
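The control flow of steps S1 to S6 can be summarized in a short Python sketch. It is not the patented implementation: `run_caea` and every callable it receives (`local_opt`, `make_population`, `evolve`, `converged`) are hypothetical placeholders, clustering and adaptive selection (S3 and S4) are assumed to live inside `evolve`, and reading "meets the energy standard" as `energy <= energy_cutoff` is an assumption.

```python
from typing import Callable, List, Tuple

Coords = List[Tuple[float, float, float]]  # one (x, y, z) tuple per atom


def run_caea(
    initial: Coords,
    local_opt: Callable[[Coords], Tuple[Coords, float]],
    make_population: Callable[[Coords], List[Coords]],
    evolve: Callable[[List[Coords], List[float], int], List[Coords]],
    converged: Callable[[List[float], List[Coords]], bool],
    energy_cutoff: float,
    max_iter: int = 100,
) -> List[Tuple[Coords, float]]:
    """Hypothetical driver for steps S1-S6; all callables are placeholders."""
    start, _ = local_opt(initial)            # S1: locally optimise the input
    population = make_population(start)      # S2: build the initial population
    best_history: List[float] = []
    accepted: List[Tuple[Coords, float]] = []
    for generation in range(max_iter):       # S5: hard cap on iterations
        relaxed = [local_opt(s) for s in population]
        population = [s for s, _ in relaxed]
        energies = [e for _, e in relaxed]
        # S6 bookkeeping: record every structure that meets the energy criterion
        accepted.extend((s, e) for s, e in relaxed if e <= energy_cutoff)
        best_history.append(min(energies))
        if converged(best_history, population):  # S5: convergence test
            break
        # S3 + S4: clustering, anomaly handling and adaptive selection are
        # assumed to happen inside `evolve`
        population = evolve(population, energies, generation)
    return sorted(accepted, key=lambda item: item[1])
```

Keeping the chemistry behind callables lets the same loop run with a cheap semi-empirical optimizer during screening and a first-principles one for refinement, in the spirit of the customizable computational level described above.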

[0079] In this embodiment, in step S1, the input initial molecular structure is optimized using a local optimization strategy to obtain the optimized initial molecular structure.

[0080] Input formats for the initial molecular structure include, but are not limited to, any representation of the molecule's XYZ coordinates, crystal structure, or SMILES string.

[0081] Specifically, the local optimization strategies include, but are not limited to, first-principles methods (such as the Hartree-Fock method and density functional theory), semi-empirical quantum chemical methods (such as PM7), and semi-empirical density functional theory methods (such as GFN-xTB).

[0082] In this embodiment, in step S2, an initial population is generated based on the optimized initial molecular structure;

[0083] Specifically, for each molecular structure representation in the generated initial population, the bond distance matrix (DM) and bond angle distribution (BAD) are mainly used as feature vectors. However, it should be noted that the feature vectors representing molecular structures include, but are not limited to, the two methods mentioned above.

[0084] The bond distance matrix DM is defined as follows:

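[0085] $$DM = (d_{ij})_{n \times n} = \begin{pmatrix} d_{11} & \cdots & d_{1n} \\ \vdots & \ddots & \vdots \\ d_{n1} & \cdots & d_{nn} \end{pmatrix}$$

[0086] In the formula, $DM$ is the bond distance matrix; $d_{ij}$ is the distance between the i-th and j-th atoms; $n$ is the number of atoms;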
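A minimal Python sketch of the bond distance matrix, assuming Cartesian coordinates in ångströms and Euclidean interatomic distances. The helper `flatten_upper` turns DM into the one-dimensional feature vector used later for similarity. It keeps only the upper triangle (i < j) row by row, and that ordering is an assumption rather than something the text specifies.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def distance_matrix(coords: Sequence[Point]) -> List[List[float]]:
    """Return the symmetric n x n matrix DM with DM[i][j] = d_ij."""
    n = len(coords)
    dm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])  # Euclidean distance (Python 3.8+)
            dm[i][j] = dm[j][i] = d
    return dm


def flatten_upper(dm: List[List[float]]) -> List[float]:
    """Flatten the strict upper triangle row by row (assumed ordering)."""
    n = len(dm)
    return [dm[i][j] for i in range(n) for j in range(i + 1, n)]
```

Because DM is symmetric with a zero diagonal, the upper triangle carries all n(n - 1)/2 independent distances.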

[0087] The constraints on the bond distance matrix are:

[0088] Define the search space boundaries (e.g., set a size of...) (space);

[0089] Interatomic covalent radius constraint:

[0090] $$d_{\min} = (r_i + r_j) \times 1.05$$

[0091] $$d_{\max} = (r_i + r_j) \times 1.40$$

[0092] In the formula, $r_i$ and $r_j$ are the covalent radii of the i-th and j-th atoms, respectively; the constraint treats two atoms as bonded only when their distance lies between 1.05 and 1.40 times the sum of their covalent radii.
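The bonding window can be checked with a few lines of Python. The 1.05 and 1.40 factors come from the text. The covalent radii in `COVALENT_RADII` are typical single-bond values inserted only for illustration, and the function names are hypothetical.

```python
from typing import Dict, Tuple

# Illustrative single-bond covalent radii in ångströms (assumed values).
COVALENT_RADII: Dict[str, float] = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def bond_window(elem_i: str, elem_j: str,
                lower: float = 1.05, upper: float = 1.40) -> Tuple[float, float]:
    """Return (d_min, d_max) = (lower, upper) x (r_i + r_j)."""
    radius_sum = COVALENT_RADII[elem_i] + COVALENT_RADII[elem_j]
    return lower * radius_sum, upper * radius_sum


def is_bonded(elem_i: str, elem_j: str, distance: float) -> bool:
    """True when the interatomic distance lies inside the bonding window."""
    d_min, d_max = bond_window(elem_i, elem_j)
    return d_min <= distance <= d_max
```

With these illustrative radii, a carbon-carbon pair gets a window of roughly 1.60 to 2.13 Å.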

[0093] In this embodiment, the bond angle distribution refers to the bond angles $\theta_{ijk}$ between atoms in the molecular structure, where $\theta_{ijk}$ is the angle formed by the i-th, j-th, and k-th atoms with the j-th atom as the vertex.
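The bond angle θ_ijk, with atom j as the vertex, follows from the dot product of the vectors from j to i and from j to k. The sketch below also gathers a bond angle distribution. Supplying the bonded pairs as an explicit list of index pairs is an assumed input format, and the function names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, float, float]


def bond_angle(p_i: Point, p_j: Point, p_k: Point) -> float:
    """Angle theta_ijk in degrees, with atom j as the vertex."""
    u = [a - b for a, b in zip(p_i, p_j)]
    v = [a - b for a, b in zip(p_k, p_j)]
    cos_t = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_t))


def bond_angle_distribution(coords: Sequence[Point],
                            bonds: Iterable[Tuple[int, int]]) -> List[float]:
    """Sorted angles for every pair of bonded neighbours i, k around each atom j."""
    neighbours: Dict[int, List[int]] = {j: [] for j in range(len(coords))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    angles = []
    for j, nbrs in neighbours.items():
        for x in range(len(nbrs)):
            for y in range(x + 1, len(nbrs)):
                angles.append(bond_angle(coords[nbrs[x]], coords[j], coords[nbrs[y]]))
    return sorted(angles)
```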

[0094] In this embodiment, the rejection sampling method is used to ensure that the molecules sampled in the space are reasonable. To ensure that the atoms in the sampled molecules meet the distance constraints, the sampling steps are repeated until the desired number of valid initial structures are generated.
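The following is a minimal rejection-sampling loop under stated assumptions. Atoms are placed uniformly in a cubic box. The text elides the box size, so `box` here is an arbitrary illustrative value. A candidate is rejected if any pair is closer than d_min, and it is accepted only if the atoms form one connected bond graph under d_max. The connectivity requirement, the radii, and all names are assumptions, not details taken from the patent.

```python
import itertools
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}  # Å, illustrative


def is_valid(elements: Sequence[str], coords: Sequence[Point]) -> bool:
    """Reject clashes (d < d_min); require a connected bond graph (d <= d_max)."""
    n = len(elements)
    neighbours = {i: set() for i in range(n)}
    for i, j in itertools.combinations(range(n), 2):
        radius_sum = COVALENT_RADII[elements[i]] + COVALENT_RADII[elements[j]]
        d = math.dist(coords[i], coords[j])
        if d < 1.05 * radius_sum:
            return False                      # atoms too close
        if d <= 1.40 * radius_sum:
            neighbours[i].add(j)              # counted as a bond
            neighbours[j].add(i)
    seen, stack = {0}, [0]                    # depth-first search from atom 0
    while stack:
        for nb in neighbours[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == n


def rejection_sample(elements: Sequence[str], n_structures: int, box: float = 3.0,
                     seed: int = 0, max_attempts: int = 100_000) -> List[List[Point]]:
    """Draw random coordinates until n_structures valid candidates are collected."""
    rng = random.Random(seed)
    accepted: List[List[Point]] = []
    for _ in range(max_attempts):
        if len(accepted) == n_structures:
            break
        coords = [(rng.uniform(0, box), rng.uniform(0, box), rng.uniform(0, box))
                  for _ in elements]
        if is_valid(elements, coords):
            accepted.append(coords)
    return accepted
```

The `max_attempts` cap keeps the loop finite when the box is badly sized for the molecule. In that case fewer than `n_structures` candidates are returned.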

[0095] In this embodiment, in step S3, the initial population is clustered using a hierarchical clustering strategy to obtain several structural clusters; the maximum intra-cluster distance of each structural cluster is calculated; the molecular structure within the structural cluster is judged based on the maximum intra-cluster distance to obtain abnormal structural clusters and normal structural clusters.

[0096] Specifically, the steps for performing cluster analysis on the initial population using the hierarchical clustering strategy to generate several structural clusters are as follows:

[0097] S31. Treat each of the optimized initial molecular structures as an independent cluster;

[0098] Specifically, geometrically similar structures in conformational space are collected to form a set, called a cluster. Each optimized initial molecular structure is treated as an independent cluster for initialization.

[0099] S32. Calculate the average similarity among all the independent clusters using a similarity metric; based on the average similarity, merge the independent clusters whose similarity reaches a set requirement.

[0100] Specifically, the similarity metric uses an improved weighted Manhattan distance:

[0101]

[0102] In the formula, A and B are two different molecules; $d_k^A$ is the k-th bond distance of molecule A (the k-th value after flattening the bond distance matrix DM into a one-dimensional vector); $w_k$ is the weight of the k-th term, in whose definition N is the total number of atoms in the molecule, and $N_i$ and $N_j$ are the numbers of atoms of atomic types i and j in the molecule, respectively.
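The exact weight expression did not survive extraction. For that reason, the sketch below takes the weights `w` as an input rather than computing them from N, N_i, and N_j. It assumes both molecules list their atoms in the same order so that the flattened bond distance vectors are aligned. The function name is hypothetical.

```python
from typing import Sequence


def weighted_manhattan(d_a: Sequence[float], d_b: Sequence[float],
                       w: Sequence[float]) -> float:
    """Sum over k of w_k * |d_k^A - d_k^B| for aligned, flattened DM vectors."""
    if not (len(d_a) == len(d_b) == len(w)):
        raise ValueError("feature vectors and weights must have the same length")
    return sum(w_k * abs(a - b) for w_k, a, b in zip(w, d_a, d_b))
```

A smaller value means more similar structures. This section does not state how the distance is converted into the similarity $s_{AB}$ used later.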

[0103] S33. Set clustering termination conditions; when the clustering analysis reaches the clustering termination conditions, terminate the clustering and obtain several of the structure clusters.

[0104] Specifically, the inconsistency coefficient IC is defined as follows:

[0105] $$IC = \frac{s - \mu_s}{\sigma_s}$$

[0106] where $s$ is the average linkage similarity between the two clusters in the current merging step, $\mu_s$ is the average similarity of the historical merging steps, and $\sigma_s$ is the standard deviation of the similarity of the historical merging steps.

[0107] When every inconsistency coefficient satisfies |IC| > 4.8, no further merging can be performed, and clustering terminates.
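Steps S31 to S33 can be sketched as average-linkage agglomerative clustering that stops once every candidate merge is inconsistent with the merge history. Only the 4.8 threshold comes from the text. The rest are assumptions: the sketch works with distances (smaller means more similar) instead of similarities, uses population statistics for μ_s and σ_s, keeps every candidate while fewer than two merges have been recorded or their spread is zero, and uses hypothetical names throughout.

```python
import statistics
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def average_linkage(c1: Sequence[T], c2: Sequence[T],
                    dist: Callable[[T, T], float]) -> float:
    """Mean pairwise distance between two clusters."""
    return sum(dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def cluster_with_ic(items: Sequence[T], dist: Callable[[T, T], float],
                    ic_threshold: float = 4.8, min_history: int = 2) -> List[List[T]]:
    clusters: List[List[T]] = [[x] for x in items]   # S31: one cluster per structure
    history: List[float] = []                         # linkage values of past merges
    while len(clusters) > 1:
        candidates: List[Tuple[float, int, int]] = [
            (average_linkage(clusters[a], clusters[b], dist), a, b)
            for a in range(len(clusters)) for b in range(a + 1, len(clusters))
        ]
        if len(history) >= min_history:
            mu, sigma = statistics.mean(history), statistics.pstdev(history)
            if sigma > 0:                             # IC undefined when sigma == 0
                candidates = [c for c in candidates
                              if abs((c[0] - mu) / sigma) <= ic_threshold]
        if not candidates:                            # S33: every |IC| > threshold
            break
        link, a, b = min(candidates)                  # S32: merge the closest pair
        history.append(link)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                               # b > a, so index a is unaffected
    return clusters
```

On two well-separated groups of points, the merge that would join the groups has a linkage far outside the history, so its |IC| exceeds the threshold and clustering stops with two clusters.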

[0108] In this embodiment, the maximum intra-cluster distance of each structural cluster is calculated as follows.

[0109] For each cluster c, calculate the maximum intra-cluster distance:

[0110] $$D_c = \max_{A,\,B \in c} d(A, B)$$

where $d(A, B)$ is the structural distance between molecules A and B in cluster c.

[0111] If:

[0112]

[0113] In the formula, $\mu_w$ is the average width of all clusters, and $\sigma_w$ is the standard deviation of the cluster widths.
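The abnormality condition in [0112] is missing. The following sketch therefore assumes that a cluster is abnormal when its maximum intra-cluster distance exceeds μ_w + k·σ_w, where the multiplier `k` is hypothetical. It also takes a cluster's width to be its maximum intra-cluster distance. The text confirms neither choice.

```python
import itertools
import statistics
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def max_intra_cluster_distance(cluster: Sequence[T],
                               dist: Callable[[T, T], float]) -> float:
    """D_c: largest pairwise distance inside one cluster (0.0 for singletons)."""
    return max((dist(a, b) for a, b in itertools.combinations(cluster, 2)), default=0.0)


def flag_abnormal(clusters: Sequence[Sequence[T]], dist: Callable[[T, T], float],
                  k: float = 2.0) -> List[bool]:
    """Flag clusters whose width exceeds mu_w + k * sigma_w (assumed rule)."""
    widths = [max_intra_cluster_distance(c, dist) for c in clusters]
    mu_w = statistics.mean(widths)
    sigma_w = statistics.pstdev(widths)
    return [w > mu_w + k * sigma_w for w in widths]
```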

[0114] In this embodiment, in step S4, for the abnormal structure cluster, the abnormal structures within the cluster are used as parents according to a set ratio for population iteration; for the normal structure cluster, the search strategy is optimized through an adaptive evolution strategy to obtain an optimized search strategy; and population iteration is performed based on the optimized search strategy.

[0115] Specifically, for the identified anomalous structures, at least 30% should be retained as parents in the next generation.
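Retaining at least 30% of the anomalous structures as parents can be implemented with integer ceiling arithmetic, which avoids floating-point rounding at the boundary. Choosing which anomalous structures to keep uniformly at random is an assumption, and the function name is hypothetical.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def retain_anomalous_parents(anomalous: Sequence[T], percent: int = 30,
                             rng: Optional[random.Random] = None) -> List[T]:
    """Return ceil(percent% of the anomalous structures), chosen uniformly at random."""
    rng = rng or random.Random()
    k = (len(anomalous) * percent + 99) // 100   # integer ceiling of n * percent / 100
    return rng.sample(list(anomalous), k)
```

For example, 7 anomalous structures yield 3 parents, because 30% of 7 is 2.1, which rounds up to 3.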

[0116] For the normal structure cluster, the search strategy is optimized through an adaptive evolutionary strategy to obtain an optimized search strategy; based on the optimized search strategy, population iteration is performed.

[0117] Specifically, this invention introduces an adaptive evolution strategy into the Clustering Accompanied Evolutionary Algorithm (CAEA), which optimizes search efficiency by dynamically adjusting the selection probability distribution and cluster penalty function.

[0118] The adaptive evolutionary strategy consists of two parts: the selection probability distribution and the cluster penalty function.

[0119] Selection probability distribution:

[0120] During evolution, anomalous structures are preferentially selected to balance exploration and exploitation. For each structure $x_i$ in the population, the probability $P(x_i)$ of being selected as a parent is defined as:

[0121]

[0122] In the formula, $P(x_i)$ is the probability that structure $x_i$ is selected as a parent; $x_i$ denotes a molecular structure in the population; $\beta$ is a pressure parameter: the larger $\beta$ is, the more strongly the algorithm favors low-energy structures, and the smaller $\beta$ is, the more random the selection; $E_j$ and $E_i$ are the original energies of structures $x_j$ and $x_i$, respectively.
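The formula in [0121] did not survive extraction. The way β is described matches a Boltzmann-style softmax, P(x_i) = exp(-βE_i) / Σ_j exp(-βE_j): larger values favor low-energy structures, and smaller values make the choice more random. The sketch below therefore assumes that form. It is a plausible reading, not the confirmed patent formula, and the function names are hypothetical. Subtracting the minimum energy before exponentiating leaves the probabilities unchanged but avoids overflow.

```python
import math
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def selection_probabilities(energies: Sequence[float], beta: float) -> List[float]:
    """Assumed Boltzmann weights exp(-beta * E_i), normalised to sum to 1."""
    e_min = min(energies)
    weights = [math.exp(-beta * (e - e_min)) for e in energies]  # shift for stability
    total = sum(weights)
    return [w / total for w in weights]


def select_parents(population: Sequence[T], energies: Sequence[float], beta: float,
                   k: int, rng: random.Random) -> List[T]:
    """Draw k parents with replacement according to the assumed probabilities."""
    return rng.choices(population, weights=selection_probabilities(energies, beta), k=k)
```

At β = 0 every structure is equally likely. As β grows, the probability mass concentrates on the lowest-energy structures. If energies are in eV, β has units of 1/eV.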

[0123] Cluster penalty function:

[0124] The purpose of this function is to suppress redundant searches in already explored regions and to prevent the algorithm from getting trapped in local optima. For each structure x_i in the population, the corrected fitness F_i is defined as:

[0125]

[0126] In the formula, F_i is the corrected fitness; C_k is the number of structures in the cluster to which structure x_i belongs; N is the total number of structures in the current population; λ is the maximum penalty intensity, which controls the influence of cluster size on fitness; a sigmoid function makes the penalty grow smoothly with the number of generations; t is the number of generations; and τ is a time constant that controls the rate at which the penalty grows with the number of generations.
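The penalty formula of [0125] is likewise not reproduced. The sketch below assumes an energy-like fitness (lower is better, in eV, consistent with the ΔE penalty in the worked example) of the form F_i = E_i + λ · (C_k / N) · sigmoid(t / τ). This matches the listed ingredients: the penalty is bounded by λ, grows with the cluster's share of the population, and rises smoothly with generation t (a plain logistic of t / τ starts at 0.5 when t = 0). The functional form and names are assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def corrected_fitness(energy, cluster_size, population_size, t, lam, tau):
    """Assumed form: F_i = E_i + lam * (C_k / N) * sigmoid(t / tau).

    Lower is better. Crowded clusters receive a larger penalty, and the
    penalty ramps up smoothly as the generation count t grows.
    """
    crowding = cluster_size / population_size  # C_k / N, between 0 and 1
    return energy + lam * crowding * sigmoid(t / tau)


# A structure in a cluster holding half of a 50-member population.
early = corrected_fitness(-10.0, 25, 50, t=0, lam=0.2, tau=20.0)
late = corrected_fitness(-10.0, 25, 50, t=200, lam=0.2, tau=20.0)
print(early, late)  # the penalty is larger in later generations
```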

[0127] In this embodiment, in step S5, an algorithm termination condition is generated based on the energy convergence criterion and the structure convergence criterion; when the population iteration satisfies the algorithm termination condition or reaches the maximum number of iterations, the iteration stops.

[0128] Specifically, the formula for determining the energy convergence criterion is as follows:

[0129]

[0130] The formula compares the energy of the optimal structure in generation t_i with the energy of the optimal structure in the previous generation, t_{i-1}. The energy fluctuation of the optimal structure over the last 5 generations is examined: if the sum of the energy changes over 5 consecutive iterations is less than 0.01 eV, the energy is considered to have converged.
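A minimal check of the energy criterion, assuming "the sum of the energy changes" means the sum of absolute generation-to-generation changes of the best energy over the last five steps (that is, six consecutive best energies). The function name and its list-of-best-energies input are hypothetical.

```python
def energy_converged(best_energies, window=5, tol=0.01):
    """True if the summed |change| of the best energy over the last `window`
    generation-to-generation steps is below `tol` (eV).

    best_energies[g] is the lowest energy found in generation g.
    """
    if len(best_energies) < window + 1:
        return False  # not enough history yet
    recent = best_energies[-(window + 1):]
    total_change = sum(abs(b - a) for a, b in zip(recent, recent[1:]))
    return total_change < tol


history = [-10.0, -10.5, -10.502, -10.503, -10.503, -10.504, -10.504]
print(energy_converged(history))
```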

[0131] The structural convergence criterion is that the pairwise similarity s_AB between the lowest-energy structures of the five best clusters in the current population is greater than 0.95. This indicates that multiple independent evolutionary paths have converged to the same or highly similar structures, suggesting that the global optimum may have been found.

[0132] In this embodiment, iteration stops when the population iteration meets the algorithm termination condition or reaches the maximum number of iterations.

[0133] Specifically, if the two criteria above are still not met when the generation count t exceeds t_max, the algorithm terminates; the maximum number of generations t_max is usually set to 1000.
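The structural criterion and the overall stopping rule can be sketched as follows. The sketch assumes that both convergence criteria must hold before an early stop (the worked example below reports both being met) and that the pairwise similarities s_AB are supplied as a precomputed matrix, since the similarity measure used for this step is not specified here. Names are hypothetical.

```python
import itertools


def structure_converged(similarity, threshold=0.95):
    """similarity[a][b] is s_AB between the lowest-energy structures of the
    five best clusters; every distinct pair must exceed the threshold."""
    n = len(similarity)
    return all(similarity[a][b] > threshold
               for a, b in itertools.combinations(range(n), 2))


def should_stop(energy_ok, structure_ok, t, t_max=1000):
    """Stop early when both criteria hold, or unconditionally once t > t_max."""
    return (energy_ok and structure_ok) or t > t_max


sim = [[1.0 if a == b else 0.97 for b in range(5)] for a in range(5)]
print(should_stop(True, structure_converged(sim), t=122))
```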

[0134] In this embodiment, in step S6, the molecular conformation that meets the energy standard during the iteration process is output to obtain the final molecular conformation.

[0135] Specifically, all conformations with energy E < E_min + 0.5 eV are output as the final molecular conformations.

[0136] Here, E_min is the energy of the global minimum energy conformation found so far.
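Filtering the final output by the 0.5 eV window is straightforward. A sketch with hypothetical (label, energy) pairs in eV:

```python
def within_energy_window(conformers, window=0.5):
    """Return (label, energy) pairs with E < E_min + window (eV), lowest first."""
    e_min = min(e for _, e in conformers)
    kept = [(label, e) for label, e in conformers if e < e_min + window]
    return sorted(kept, key=lambda item: item[1])


found = [("c", -9.4), ("b", -9.7), ("a", -10.0)]
print(within_energy_window(found))
```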

[0137] In one possible embodiment, an example of a 1,3-butadiene molecular conformation is provided below:

[0138] T1. Initial input and parameter settings

[0139] Target molecule: 1,3-butadiene (SMILES: C=CC=C); its molecular structure is shown in Figure 3.

[0140] Initial structure: a linear trans conformation is generated using RDKit (initial XYZ coordinates are shown in the table below) and locally optimized using the GFN2-xTB method.

[0141] Computational level: GFN2-xTB (a semi-empirical method) is used in the global search phase, and DFT (B3LYP/6-31G*) is used for the final conformation optimization.

[0142] Population size: 50.

[0143] Maximum number of generations: 200.

[0144] Spatial constraints: the search space is a cube, and interatomic distances are constrained to 1.05 to 1.40 times the sum of the covalent radii.

[0145] T2. Initial population generation

[0146] The bond distance matrix (DM) and bond angle distribution (BAD) are used as feature vectors, and 50 initial structures satisfying the interatomic distance constraints are generated by rejection sampling.
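A rejection-sampling sketch for the initial population. How the 1.05 to 1.40 window is applied is not spelled out, so this assumes one plausible reading: no atom pair may be closer than 1.05 × (r_i + r_j), and every atom must have at least one partner within 1.40 × (r_i + r_j). Coordinates are drawn uniformly in a cube of side `box`; the covalent radii and box size are caller-supplied, and the function names are hypothetical.

```python
import itertools
import math
import random


def satisfies_distance_window(coords, radii, lo=1.05, hi=1.40):
    """Assumed reading of the constraint: no pair closer than lo * (r_i + r_j),
    and every atom has at least one partner within hi * (r_i + r_j)."""
    n = len(coords)
    has_partner = [False] * n
    for i, j in itertools.combinations(range(n), 2):
        d = math.dist(coords[i], coords[j])
        r_sum = radii[i] + radii[j]
        if d < lo * r_sum:
            return False  # atoms too close: reject immediately
        if d <= hi * r_sum:
            has_partner[i] = has_partner[j] = True
    return all(has_partner)


def rejection_sample(radii, box, n_structures, max_tries=100_000, rng=None):
    """Propose random coordinates in a cube and keep only those that pass
    the distance check, until n_structures are accepted (or tries run out)."""
    rng = rng or random.Random()
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n_structures:
            break
        coords = [tuple(rng.uniform(0.0, box) for _ in range(3)) for _ in radii]
        if satisfies_distance_window(coords, radii):
            accepted.append(coords)
    return accepted
```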

[0147] For example, a cis conformation (C=C-C=C dihedral angle ≈ 0°) and a twisted conformation (dihedral angle ≈ 120°) may be sampled.
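The cis and twisted labels refer to the C1-C2-C3-C4 torsion of the carbon chain. A self-contained sketch that computes this dihedral from four atom positions with the standard atan2 formula (about 0° for cis, 180° for trans); the function name is hypothetical.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Idealized positions: C2 and C3 on the x axis, C1 and C4 in the xy plane.
cis = dihedral_deg((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0))
trans = dihedral_deg((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, -1.0, 0.0))
print(cis, trans)
```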

[0148] T3. Dynamic clustering and evolutionary process

[0149] Similarity is measured using a weighted Manhattan distance (with weights based on the proportion of each atom type), and the inconsistency coefficient threshold is set to 0.3. After clustering, each resulting structural cluster is checked for anomalous structures: if the maximum intra-cluster distance of a cluster exceeds the threshold, the cluster is marked as anomalous, yielding the anomalous structure clusters; the remaining clusters are normal structure clusters.
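A pure-Python sketch of the clustering step: a weighted Manhattan distance between feature vectors, and agglomerative merging by average linkage (each structure starts as its own cluster, as in submodule 031). Note that this sketch stops at a plain distance cutoff rather than the inconsistency-coefficient criterion (threshold 0.3) used in the text; SciPy's `fcluster(..., criterion='inconsistent')` would be the closer match. Feature vectors, weights and the cutoff are illustrative.

```python
import itertools


def weighted_manhattan(a, b, weights):
    """Weighted L1 distance between two feature vectors."""
    return sum(w * abs(x - y) for x, y, w in zip(a, b, weights))


def average_linkage(c1, c2, features, weights):
    """Mean pairwise distance between two clusters (lists of indices)."""
    total = sum(weighted_manhattan(features[i], features[j], weights)
                for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative_clusters(features, weights, cutoff):
    """Start with one cluster per structure and repeatedly merge the closest
    pair (average linkage) until no pair is closer than `cutoff`."""
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > 1:
        pairs = itertools.combinations(range(len(clusters)), 2)
        (a, b), d = min(
            (((i, j), average_linkage(clusters[i], clusters[j], features, weights))
             for i, j in pairs),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break  # termination condition reached
        clusters[a].extend(clusters[b])  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


feats = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(agglomerative_clusters(feats, (1.0, 1.0), cutoff=1.0))
```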

[0150] T4. Population iteration optimized by the adaptive strategy

[0151] For anomalous structure clusters, the anomalous structures within the cluster are used as parents according to a set ratio for population iteration, and 30% are retained for the next generation.

[0152] For clusters with normal structures, the search strategy is optimized through an adaptive evolutionary strategy to obtain an optimized search strategy; population iteration is then performed based on the optimized search strategy.

[0153] Specifically, the selection probability distribution increases the selection probability of the cis conformations (a low-density cluster) by 20%, and the cluster penalty function imposes a penalty of ΔE = 0.1 eV on the trans cluster (a large cluster).
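The two adjustments of the worked example can be illustrated arithmetically. The text does not say whether the 20% is relative or absolute, or how ΔE enters the fitness; this sketch assumes a relative 20% boost followed by renormalization, and an additive 0.1 eV penalty on the energies of the penalized cluster. Indices, values and function names are hypothetical.

```python
def boost_and_renormalize(probs, boosted, factor=1.2):
    """Multiply the probabilities at the indices in `boosted` by `factor`
    (a 20 % relative increase) and renormalize so they sum to 1."""
    scaled = [p * factor if i in boosted else p for i, p in enumerate(probs)]
    total = sum(scaled)
    return [p / total for p in scaled]


def penalize(energies, penalized, delta_e=0.1):
    """Add a flat penalty delta_e (eV) to the structures at the indices in `penalized`."""
    return [e + delta_e if i in penalized else e for i, e in enumerate(energies)]


# Index 0: a cis structure (low-density cluster); index 1: a trans structure (large cluster).
print(boost_and_renormalize([0.25, 0.25, 0.5], boosted={0}))
print(penalize([-10.0, -9.8], penalized={1}))
```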

[0154] T5. Algorithm termination conditions

[0155] Energy convergence: at generation 122, the fluctuation of the optimal energy over 5 consecutive generations is below 0.01 eV.

[0156] Structural convergence: the similarity among the 5 best conformations is greater than 0.95.

[0157] T6. Output: a large number of conformations are produced, including not only the cis and trans structures of 1,3-butadiene but also many other isomers, structures that separate into two molecules (such as ethylene plus acetylene), and various unstable intermediates. Figure 4 illustrates various molecular conformations obtained from the initial 1,3-butadiene structure by this invention.

[0158] In summary, this invention optimizes the input initial molecular structure using a local optimization strategy to obtain an optimized initial molecular structure; generates an initial population based on the optimized initial molecular structure; clusters the initial population using a hierarchical clustering strategy to obtain several structural clusters; calculates the maximum intra-cluster distance of each structural cluster and, based on it, judges the molecular structures within each cluster to obtain anomalous and normal structural clusters; for the anomalous structural clusters, uses the anomalous structures within the cluster as parents according to a set ratio for population iteration; for the normal structural clusters, optimizes the search strategy using an adaptive evolutionary strategy and performs population iteration based on the optimized search strategy; generates algorithm termination conditions from energy convergence criteria and structural convergence criteria, stopping the iteration when the population iteration satisfies the termination conditions or reaches the maximum number of iterations; and outputs the molecular conformations that meet the energy criterion during the iteration process to obtain the final molecular conformations. By identifying unexplored regions of conformational space through dynamic clustering and prioritizing the search for potential low-energy structures, this invention significantly improves the convergence speed of the evolutionary algorithm and reduces redundant computation. By combining anomalous-structure priority selection with the clustering penalty mechanism, it automatically balances exploration and exploitation, effectively avoiding entrapment in local optima and increasing the probability of discovering the global optimum.
This invention represents molecular structures with bond distance features, which makes it applicable to various systems, including organic molecules and inorganic materials, without relying on prior knowledge or human intervention, giving it broad applicability. Throughout the conformational screening process, the computational level can be customized: either high-level quantum chemical methods or various lower-cost semi-empirical methods can be used, giving flexible control over the computational cost.

[0159] It should be noted that the method of this embodiment of the disclosure can be executed by a single device, such as a computer or server. It can also be applied in a distributed scenario in which multiple devices cooperate to complete the task; in that case, one device may execute only one or more steps of the method, and the devices interact with each other to complete the described method.

[0160] It should be noted that the above description describes some embodiments of this disclosure. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recorded in the claims can be performed in a different order than that shown in the above embodiments and still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require a specific or sequential order to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

[0161] Example 2

[0162] Referring to Figure 5, Embodiment 2 of the present invention further provides a molecular conformation generation apparatus based on an evolutionary clustering algorithm, comprising:

[0163] The local molecular structure optimization module 001 is used to optimize the input initial molecular structure through a local optimization strategy to obtain the optimized initial molecular structure.

[0164] The initial population generation module 002 is used to generate an initial population based on the optimized initial molecular structure.

[0165] The structure cluster generation and judgment module 003 is used to perform cluster analysis on the initial population through a hierarchical clustering strategy to obtain several structure clusters; calculate the maximum intra-cluster distance of each structure cluster; and judge the molecular structure within the structure cluster based on the maximum intra-cluster distance to obtain abnormal structure clusters and normal structure clusters.

[0166] The population iteration module 004 is used to perform population iteration for the abnormal structure cluster by taking the abnormal structures in the cluster as parents according to a set ratio; for the normal structure cluster, it optimizes the search strategy through an adaptive evolution strategy to obtain an optimized search strategy; and performs population iteration based on the optimized search strategy.

[0167] The population iteration termination module 005 is used to generate algorithm termination conditions based on energy convergence criteria and structural convergence criteria; when the population iteration satisfies the algorithm termination conditions or reaches the maximum number of iterations, the iteration stops.

[0168] The final molecular conformation acquisition module 006 is used to output the molecular conformation that meets the energy standard during the iteration process, and obtain the final molecular conformation.

[0169] In this embodiment, the local optimization strategy in the molecular structure local optimization module 001 includes: first-principles strategy, semi-empirical quantum chemistry strategy, and semi-empirical density functional theory strategy.

[0170] In this embodiment, in the initial population generation module 002, during the process of generating the initial population, the bond distance matrix and bond angle distribution are used as feature vectors of the molecular structure.

[0171] The expression for the bond distance matrix is:

[0172] $\mathrm{DM} = \left(d_{ij}\right)_{n \times n}, \quad i, j = 1, \dots, n$

[0173] In the formula, DM is the bond distance matrix; d_ij is the distance between the i-th and j-th atoms; and n is the number of atoms.

[0174] The bond angle distribution refers to the bond angles θ_ijk that exist between atoms in the molecular structure, where θ_ijk is the bond angle formed by the i-th, j-th and k-th atoms with the j-th atom as the vertex.
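Both features can be computed directly from Cartesian coordinates. In this sketch, which atoms count as "bonded" when collecting θ_ijk is decided by a simple distance cutoff, an assumption the text does not specify; the function names are hypothetical.

```python
import itertools
import math


def distance_matrix(coords):
    """DM[i][j] = d_ij, the distance between atoms i and j (n x n, symmetric)."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def bond_angle(coords, i, j, k):
    """theta_ijk in degrees: the angle at vertex atom j between atoms i and k."""
    u = [a - b for a, b in zip(coords[i], coords[j])]
    v = [a - b for a, b in zip(coords[k], coords[j])]
    cos_t = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_t))


def angle_list(coords, cutoff):
    """All theta_ijk where both i and k lie within `cutoff` of the vertex j."""
    dm = distance_matrix(coords)
    n = len(coords)
    angles = []
    for j in range(n):
        neighbours = [i for i in range(n) if i != j and dm[i][j] <= cutoff]
        for i, k in itertools.combinations(neighbours, 2):
            angles.append(bond_angle(coords, i, j, k))
    return angles


coords = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(angle_list(coords, cutoff=1.1))
```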

[0175] In this embodiment, the structural cluster generation and judgment module 003 performs cluster analysis on the initial population using the hierarchical clustering strategy to generate several structural clusters, and comprises the following submodules:

[0176] Independent cluster setting submodule 031 is used to treat each of the optimized initial molecular structures as an independent cluster;

[0177] The independent cluster merging submodule 032 is used to calculate the average similarity between all the independent clusters through a similarity metric; and to merge the independent clusters whose similarity reaches a set requirement based on the average similarity.

[0178] Clustering termination submodule 033 is used to set clustering termination conditions; when the clustering analysis reaches the clustering termination conditions, the clustering is terminated, and several of the structure clusters are obtained.

[0179] In this embodiment, in the population iteration module 004, during the process of optimizing the search strategy for the normal structure cluster using the adaptive evolution strategy, the search strategy is optimized by dynamically adjusting the selection probability distribution and the cluster penalty function.

[0180] The expression for the selection probability distribution is:

[0181]

[0182] In the formula, P(x_i) is the probability that structure x_i is selected as a parent; x_i is a molecular structure in the population; β is a pressure parameter: the larger β is, the more the algorithm tends to choose low-energy structures, and the smaller β is, the more random the selection; E_j and E_i are the original energies of structures x_j and x_i, respectively.

[0183] The expression for the cluster penalty function is:

[0184]

[0185] In the formula, F_i is the corrected fitness; C_k is the number of structures in the cluster to which structure x_i belongs; N is the total number of structures in the current population; λ is the maximum penalty intensity, which controls the influence of cluster size on fitness; a sigmoid function makes the penalty grow smoothly with the number of generations; t is the number of generations; and τ is a time constant that controls the rate at which the penalty grows with the number of generations.

[0186] In this embodiment, the formula for determining the energy convergence criterion in the population iteration termination module 005 is as follows:

[0187]

[0188] The formula compares the energy of the optimal structure in generation t_i with the energy of the optimal structure in the previous generation, t_{i-1}.

[0189] It should be noted that the information interaction and execution process between the modules of the above system are based on the same concept as the method embodiment in Embodiment 1 of this application, and the resulting technical effects are the same as those in the method embodiment of this application. For details, please refer to the description in the method embodiment shown above in this application, and it will not be repeated here.

[0190] Example 3

[0191] Embodiment 3 of the present invention provides a non-transitory computer-readable storage medium storing program code for a molecular conformation generation method based on an evolutionary clustering algorithm. The program code includes instructions for executing a molecular conformation generation method based on an evolutionary clustering algorithm according to Embodiment 1 or any possible implementation thereof.

[0192] Computer-readable storage media can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium can be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid-state drives, SSDs).

[0193] Example 4

[0194] Embodiment 4 of the present invention provides an electronic device, including: a memory and a processor;

[0195] The processor and the memory communicate with each other via a bus; the memory stores program instructions that can be executed by the processor, and the processor can execute the molecular conformation generation method based on evolutionary clustering algorithm according to Embodiment 1 or any possible implementation thereof by calling the program instructions.

[0196] Specifically, a processor can be implemented in hardware or software. When implemented in hardware, the processor can be a logic circuit, an integrated circuit, etc. When implemented in software, the processor can be a general-purpose processor that reads software code stored in memory. This memory can be integrated into the processor or located outside the processor and exist independently.

[0197] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present invention are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable system. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another via wired (e.g., coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means.

[0198] It is obvious to those skilled in the art that the modules or steps of the present invention described above can be implemented using general-purpose computing systems. They can be centralized on a single computing system or distributed across a network of multiple computing systems. Optionally, they can be implemented using program code executable by a computing system, thereby storing them in a storage system for execution by the computing system. In some cases, the steps shown or described can be performed in a different order than those presented herein, or they can be fabricated as separate integrated circuit modules, or multiple modules or steps can be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any particular combination of hardware and software.

[0199] Although the present invention has been described in detail above with general descriptions and specific embodiments, modifications or improvements can be made to it, which will be obvious to those skilled in the art. Therefore, all such modifications or improvements made without departing from the spirit of the present invention fall within the scope of protection claimed by the present invention.

Claims

1. A molecular conformation generation method based on an evolutionary clustering algorithm, characterized in that it comprises: optimizing the initial molecular structure by a local optimization strategy to obtain an optimized initial molecular structure; generating an initial population based on the optimized initial molecular structure; performing cluster analysis on the initial population using a hierarchical clustering strategy to obtain several structural clusters, calculating the maximum intra-cluster distance of each structural cluster, and judging the molecular structures within each structural cluster based on the maximum intra-cluster distance to obtain abnormal structure clusters and normal structure clusters; for the abnormal structure clusters, using the abnormal structures within the cluster as parents according to a set ratio for population iteration; for the normal structure clusters, optimizing the search strategy through an adaptive evolutionary strategy to obtain an optimized search strategy, and performing population iteration based on the optimized search strategy; generating an algorithm termination condition based on an energy convergence criterion and a structural convergence criterion, the iteration stopping when the population iteration meets the algorithm termination condition or reaches the maximum number of iterations; and outputting the molecular conformations that meet the energy criterion during the iteration process to obtain the final molecular conformation.

2. The molecular conformation generation method based on an evolutionary clustering algorithm according to claim 1, characterized in that the local optimization strategies include: first-principles strategies, semi-empirical quantum chemistry strategies, and semi-empirical density functional theory strategies.

3. The molecular conformation generation method based on an evolutionary clustering algorithm according to claim 2, characterized in that, in the process of generating the initial population, the bond distance matrix and the bond angle distribution are used as the feature vectors of the molecular structure; the expression for the bond distance matrix is $\mathrm{DM} = \left(d_{ij}\right)_{n \times n}$, where DM is the bond distance matrix, d_ij is the distance between the i-th and j-th atoms, and n is the number of atoms; the bond angle distribution refers to the bond angles θ_ijk that exist between atoms in the molecular structure, where θ_ijk is the bond angle formed by the i-th, j-th and k-th atoms with the j-th atom as the vertex.

4. The molecular conformation generation method based on an evolutionary clustering algorithm according to claim 3, characterized in that the steps of performing cluster analysis on the initial population using the hierarchical clustering strategy to generate several structural clusters are as follows: treating each of the optimized initial molecular structures as an independent cluster; calculating the average similarity among all the independent clusters using a similarity metric, and merging the independent clusters whose similarity reaches a set requirement based on the average similarity; setting a clustering termination condition; and terminating the clustering when the cluster analysis reaches the clustering termination condition, to obtain several structural clusters.

5. The molecular conformation generation method based on an evolutionary clustering algorithm according to claim 4, characterized in that, in the process of optimizing the search strategy for the normal structure cluster using the adaptive evolution strategy, the search strategy is optimized by dynamically adjusting the selection probability distribution and the cluster penalty function. In the expression for the selection probability distribution, P(x_i) is the probability that structure x_i is selected as a parent; x_i is a molecular structure in the population; β is a pressure parameter: the larger β is, the more the algorithm tends to choose low-energy structures, and the smaller β is, the more random the selection; and E_j and E_i are the original energies of structures x_j and x_i, respectively. In the expression for the cluster penalty function, F_i is the corrected fitness; C_k is the number of structures in the cluster to which structure x_i belongs; N is the total number of structures in the current population; λ is the maximum penalty intensity, controlling the effect of cluster size on fitness; a sigmoid function makes the penalty grow smoothly with the number of generations; t is the number of generations; and τ is a time constant that controls the rate at which the penalty grows with the number of generations.

6. The molecular conformation generation method based on an evolutionary clustering algorithm according to claim 5, characterized in that the energy convergence criterion is determined by a formula comparing the optimal structure energy of generation t_i with the optimal structure energy of the previous generation, t_{i-1}.

7. A molecular conformation generation device based on an evolutionary clustering algorithm, employing the molecular conformation generation method based on an evolutionary clustering algorithm according to any one of claims 1-6, characterized in that it comprises: a molecular structure local optimization module, used to optimize the input initial molecular structure through a local optimization strategy to obtain the optimized initial molecular structure; an initial population generation module, used to generate an initial population based on the optimized initial molecular structure; a structure cluster generation and judgment module, used to perform cluster analysis on the initial population through a hierarchical clustering strategy to obtain several structure clusters, calculate the maximum intra-cluster distance of each structure cluster, and judge the molecular structures within each structure cluster based on the maximum intra-cluster distance to obtain abnormal structure clusters and normal structure clusters; a population iteration module, used to perform population iteration for the abnormal structure clusters by taking the abnormal structures in the cluster as parents according to a set ratio, and, for the normal structure clusters, to optimize the search strategy through an adaptive evolutionary strategy to obtain an optimized search strategy and perform population iteration based on the optimized search strategy; a population iteration termination module, used to generate algorithm termination conditions based on energy convergence criteria and structural convergence criteria, the iteration stopping when the population iteration meets the algorithm termination condition or reaches the maximum number of iterations; and a final molecular conformation acquisition module, used to output the molecular conformations that meet the energy criterion during the iteration process and obtain the final molecular conformation.

8. The molecular conformation generation device based on an evolutionary clustering algorithm according to claim 7, characterized in that, in the molecular structure local optimization module, the local optimization strategies include: first-principles strategy, semi-empirical quantum chemistry strategy, and semi-empirical density functional theory strategy.

9. The molecular conformation generation device based on an evolutionary clustering algorithm according to claim 8, characterized in that, in the initial population generation module, during the process of generating the initial population, the bond distance matrix and the bond angle distribution are used as feature vectors of the molecular structure; the expression for the bond distance matrix is $\mathrm{DM} = \left(d_{ij}\right)_{n \times n}$, where DM is the bond distance matrix, d_ij is the distance between the i-th and j-th atoms, and n is the number of atoms; the bond angle distribution refers to the bond angles θ_ijk that exist between atoms in the molecular structure, where θ_ijk is the bond angle formed by the i-th, j-th and k-th atoms with the j-th atom as the vertex.

10. The molecular conformation generation device based on an evolutionary clustering algorithm according to claim 9, characterized in that the structure cluster generation and judgment module, which performs cluster analysis on the initial population using the hierarchical clustering strategy to generate several structural clusters, comprises: an independent cluster setting submodule, used to treat each of the optimized initial molecular structures as an independent cluster; an independent cluster merging submodule, used to calculate the average similarity between all the independent clusters through a similarity metric and to merge the independent clusters whose similarity reaches a set requirement based on the average similarity; and a clustering termination submodule, used to set clustering termination conditions, the clustering being terminated when the cluster analysis reaches the clustering termination conditions, so that several structure clusters are obtained.