A material reverse design method and system based on LLM

CN122201563A · Pending · Publication Date: 2026-06-12 · NORTHEASTERN UNIV AT QINHUANGDAO

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: NORTHEASTERN UNIV AT QINHUANGDAO
Filing Date: 2026-04-03
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

Existing technologies in reverse engineering materials suffer from limitations in modeling dimensions, irreversibility of crystal structure and material property constraint mapping, and dynamic programming capabilities, making it difficult to achieve efficient, operable, and verifiable property-guided crystal structure generation.

Method used

An LLM-based material reverse design method is employed. A correlation model between changes in crystal structure and the evolution of material properties is constructed. Starting from an initial crystal structure, condition-guided iterative structure generation is performed, in which the LLM reshapes the structure along the direction of material property improvement until the target crystal structure is generated.

Benefits of technology

It achieves high stability, high generation efficiency and high success rate in material reverse design, and provides an operable, reliable and verifiable means of new material design, ensuring the physical rationality and functional accessibility of the generated results.

Abstract

The embodiment of the application discloses a material reverse design method and system based on LLM, and relates to the field of material intelligent design. The material reverse design method based on LLM comprises the following steps: taking at least one initial crystal structure from a preset crystal database for a given target property; for any initial crystal structure, using the difference between the material property corresponding to the initial crystal structure and the target property as a guide, performing conditional guided iterative structure generation on the initial crystal structure by using the fine-tuned LLM, and obtaining a target crystal structure.

Description

Technical Field

[0001] This specification relates to the field of intelligent material design, and in particular to a material reverse design method and system based on LLM.

Background Technology

[0002] Reverse engineering, also known as property-guided crystal structure generation, refers to the reverse design and discovery of crystalline materials that meet specific functional requirements under given constraints on target physical or chemical properties. This process requires not only a model that accurately characterizes the periodic atomic arrangement of the crystal in three-dimensional space, but also a systematic depiction of the intrinsic relationship between atomic-level structural features and macroscopic material properties. It represents a key technological challenge in achieving the integration of microscopic material structure design with macroscopic application performance.

[0003] Materials reverse engineering is highly valuable across various materials design fields. In the field of energy materials, it is considered a crucial technological path to break through existing performance limits. For example, in lithium-ion battery material design, this technology is expected to synergistically optimize energy density, ion transport performance, and structural stability at the atomic structure level, thereby exploring novel positive and negative electrode material systems with multiple performance advantages. In the field of high-end electronic and information materials, it is expected to provide fundamental support for the targeted design of material properties and device-level applications. By introducing property constraints at the structural level, crystal materials with specific dielectric constants, low dielectric losses, target thermal conductivity, or magnetic responses can be designed as needed, providing structurally controllable material solutions for dielectric materials, packaging materials, and functional thin films in cutting-edge applications such as communication, high-speed computing chips, and quantum information devices. In basic materials science research, it can provide new research methods for systematically exploring the mapping relationship between structure and properties. Researchers can use this technology to reverse deduce the potential crystal structures required to achieve specific physical phenomena (such as superconductivity, electric topological states, giant magnetoresistance effects, etc.), thereby accelerating the discovery of new states of matter and the theoretical verification of their physical mechanisms.

[0004] The various artificial intelligence-based material design methods proposed in existing research can be mainly summarized into the following technical routes.

[0005] The first type of method focuses on the iterative optimization and performance prediction of macroscopically adjustable parameters of materials, abstracting material design into a problem of searching for optimal solutions in a high-dimensional parameter space. Its core lies in constructing an efficient surrogate model that can characterize the mapping relationship between adjustable design variables and the target properties of materials. While this type of method can achieve a certain level of efficiency in specific material systems, its technical approach still has fundamental limitations. Taking the polymer material formulation design optimization method and system disclosed in Chinese patent CN202511365858.2 as an example, this method constructs a data-driven active learning closed loop. Based on limited initial experimental data, it predicts the performance of a preset formulation space and generates virtual samples. Then, it combines real and virtual data to train a prediction model, recommends potential optimal formulations through optimization algorithms, and conducts experimental verification, forming a continuously iterative optimization process. However, this method abstracts material design into an optimization problem of continuously adjustable parameters, failing to address the structurally decisive essence of crystalline material properties. The macroscopic properties of crystalline materials are mainly determined by the periodic lattice arrangement of atoms in three-dimensional space, and not merely by the simple mixing of components. Therefore, this type of method has limitations in design tasks involving structure-sensitive properties such as crystal symmetry, atomic coordination environment, or anisotropy, and its technical effectiveness is significantly affected by the limitations of the model's degrees of freedom. Furthermore, this method relies on active learning loops to iterate through large amounts of experimental data. 
When applied to the exploration of crystal materials, the synthesis and characterization of each new structure is costly and time-consuming, making it difficult to realize the so-called high efficiency advantage in practical applications and hindering the large-scale, high-throughput discovery of new materials.

[0006] Another approach starts at the atomic scale, directly performing generative modeling of crystal structures. This aims to learn the statistical distribution characteristics of existing crystal databases and generate new candidate phases with reasonable structures. While this approach can directly generate crystal structures, it faces fundamental challenges in property-guided generation tasks. For example, a deep learning-based metallic material design method disclosed in Chinese patent CN202510855287.4, and advanced generative models such as MatterGen and UnigenX, generate new structures by learning the structural distribution of known crystal databases. However, their technical bottleneck lies in the irreversibility of the structure-property mapping. These models essentially learn a one-way mapping from structure to property. When used for reverse design, i.e., generating structures from target properties, the same target property may correspond to a vast number of potential structures, making it difficult for the model to converge to a finite optimal solution during end-to-end training. Specifically, in conditional generation tasks, even after generating thousands of samples, structures that simultaneously satisfy the target property constraints and are thermodynamically stable are still very few. For example, among the samples generated by UnigenX under constraints, only about a dozen successfully satisfy both the target properties and stability, resulting in a low success rate. This problem is not simply caused by insufficient engineering optimization, but rather by the inherent limitations of data-driven generative models in inverse mapping problems. Furthermore, such models typically lack hard constraints on fundamental crystallographic rules, and the structural rationality of the generated results highly depends on the coverage of the training data. 
Therefore, their generation capability significantly decreases for chemically sparse or novel structural types.

[0007] Furthermore, Chinese patent CN202510804747.0 discloses a large-language-model-based design method for adsorbent materials targeting perfluorinated compounds, a class of emerging pollutants, and attempts to introduce a hybrid design framework combining domain knowledge and data-driven methods. This method constructs a domain knowledge graph and a semantic vector database and combines them with a retrieval-augmented generation (RAG) mechanism. This allows the large language model, based on its understanding of the design requirements, to draw on structured knowledge and relevant literature and thereby generate design schemes that include material structure suggestions, performance analysis, and potential synthesis paths. However, this framework is essentially an information retrieval and text generation system, rather than a directly executable structural design and optimization engine. Its main limitation is that, although it can provide reference information about the potential properties of materials, it cannot precisely quantify how specific modifications to the structure affect the material properties, nor can it output specific, operable crystal structure adjustment steps that systematically approach the target properties. In addition, large language models still face challenges in materials science, such as insufficient knowledge injection, scarce training data, and a superficial understanding of complex physicochemical rules. In tasks requiring high-precision, verifiable structural reasoning, the generated results can hardly guide practical material design directly.

[0008] Clearly, existing technologies are limited by modeling dimensions, by the irreversibility of the mapping between crystal structure and material property constraints, and by dynamic programming capabilities. They cannot meet the requirements for fine modeling of structure-property relationships needed for property-guided crystal structure generation, nor the need for controllable and interpretable directional adjustments to the crystal structure during the generation process. Therefore, a new method and system are needed for crystal structure design oriented towards target properties.

Summary of the Invention

[0009] This specification provides an LLM-based material reverse design method and system. By constructing a correlation model between changes in crystal structure and the evolution of material properties within a structural cluster, the system infers the target crystal structure from an initial crystal structure with similar material properties to the target structure. This ensures the integrity of structural information and avoids the complexity of constructing structures from random noise. It overcomes the limitations of modeling dimensions, the irreversibility of reverse mapping, and dynamic programming capabilities in material reverse design, providing an operable, reliable, and verifiable technical means for on-demand design of industrial-grade new materials.

[0010] To solve the above-mentioned technical problems, the embodiments in this specification are implemented as follows. This specification provides an LLM-based material reverse design method, including: for a given target property, selecting at least one initial crystal structure from a preset crystal database; and, for any initial crystal structure, using the fine-tuned LLM, guided by the difference between the material properties corresponding to the initial crystal structure and the target properties, to perform condition-guided iterative structure generation on the initial crystal structure to obtain the target crystal structure. The fine-tuned LLM is obtained by perturbation-remodeling training of the LLM. The perturbation-remodeling training is as follows: taking the material property differences between the source sample and the target sample of each sample pair as conditions, the LLM is guided to learn to perform structural remodeling along the direction of material property improvement during the structural transformation from the crystal structure of the source sample to the crystal structure of the target sample. A sample pair consists of any two ordered samples from the same structure cluster. The crystal structures of any two samples from the same structure cluster satisfy the following: when the atoms contained in one crystal structure are compared with the atoms contained in the other, the number of identical atoms is not less than the number of dissimilar atoms. All crystal structures are Wyckoff-serialized crystal structures.

[0011] This specification also provides an LLM-based material reverse design system, including: a selection module, which selects at least one initial crystal structure from a preset crystal database for a given target property; and a fine-tuned LLM, which, for any initial crystal structure, uses the difference between the material properties corresponding to the initial crystal structure and the target properties as a guide to perform condition-guided iterative structure generation on the initial crystal structure to obtain the target crystal structure. The fine-tuned LLM is obtained by perturbation-remodeling training of the LLM. The perturbation-remodeling training is as follows: taking the material property differences between the source sample and the target sample of each sample pair as conditions, the LLM is guided to learn to perform structural remodeling along the direction of material property improvement during the structural transformation from the crystal structure of the source sample to the crystal structure of the target sample. A sample pair consists of any two ordered samples from the same structure cluster. The crystal structures of any two samples from the same structure cluster satisfy the following: when the atoms contained in one crystal structure are compared with the atoms contained in the other, the number of identical atoms is not less than the number of dissimilar atoms. All crystal structures are Wyckoff-serialized crystal structures.

[0012] This invention also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the methods described in this invention.

[0013] The LLM-based material reverse design method and system provided in this specification employ generative artificial intelligence technology. Based on an LLM, they automatically generate Wyckoff-serialized crystal structures by learning the correlation between structural changes and the evolution of material properties. Specifically, they abandon the traditional approach of inferring high-dimensional structures from low-dimensional properties. By focusing the LLM on learning the correlation between structural changes and the evolution of material properties (that is, on mastering the offset path of structural adjustment), they transform the material reverse design process into a structural translation search problem in the continuous crystal structure space, thereby automatically generating the target crystal structure. This overcomes the limitations of existing technologies in terms of modeling dimensionality, irreversibility of reverse mapping, and dynamic programming capabilities. It ensures the physical rationality, synthesizability, and functional attainability of the generated material reverse design results, achieving high stability, high generation efficiency, and a high success rate in material reverse design. This provides an operable, reliable, and verifiable technical means for the on-demand design of industrial-grade new materials.

Description of the Drawings

[0014] To more clearly illustrate the technical solutions in the embodiments or prior art of this specification, the drawings used in the description of the embodiments or prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this specification. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0015] Figure 1 is a schematic diagram of a system architecture for an LLM-based material reverse design method provided in the embodiments of this specification;
Figure 2 is a schematic diagram illustrating the changes in properties caused by modifications to the crystal structure;
Figure 3 is a flowchart of an LLM-based material reverse design method provided in the embodiments of this specification;
Figure 4 is a schematic diagram of an LLM-based material reverse design system provided in the embodiments of this specification.

Detailed Implementation

[0016] To enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this specification, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of this application.

[0017] The embodiments of the present invention provide a system architecture for an LLM-based material reverse design method. As shown in Figure 1, system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. Network 104 serves as the medium for providing communication links between terminal devices 101, 102, and 103 and server 105. Network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

[0018] Terminal devices 101, 102, and 103 interact with server 105 via network 104 to receive or send messages, etc. Various client applications can be installed on terminal devices 101, 102, and 103, such as dedicated programs for LLM-based reverse engineering of materials.

[0019] Terminal devices 101, 102, and 103 can be either hardware or software. When terminal devices 101, 102, and 103 are hardware, they can be various dedicated or general-purpose electronic devices, including but not limited to smartphones, tablets, laptops, and desktop computers. When terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above. They can be implemented as multiple software programs or software modules (e.g., multiple software programs or software modules used to provide distributed services) or as a single software program or software module.

[0020] Server 105 can be a server that provides various services, such as a backend server that provides services to client applications installed on terminal devices 101, 102, and 103. For example, the server can perform LLM-based material reverse design so that the results can be displayed on terminal devices 101, 102, and 103.

[0021] Server 105 can be either hardware or software. When server 105 is hardware, it can be implemented as a distributed server cluster consisting of multiple servers, or as a single server. When server 105 is software, it can be implemented as multiple software programs or software modules (e.g., multiple software programs or software modules used to provide distributed services), or as a single software program or software module.

[0022] The flowchart of the LLM-based material reverse design method provided in this embodiment of the invention is shown in Figure 3. From a programmatic perspective, the execution entity of the process can be a program hosted on an application server or application terminal. It can be understood that this method can be executed by any device, equipment, platform, or cluster of devices with computing and processing capabilities. The method includes steps S301 to S302, specifically:
Step S301: For a given target property, take at least one initial crystal structure from a preset crystal database.

[0023] Step S302: For any initial crystal structure, the fine-tuned Large Language Model (LLM) is guided by the difference between the material properties corresponding to the initial crystal structure and the target properties to perform condition-guided iterative structure generation on the initial crystal structure to obtain the target crystal structure.

[0024] The fine-tuned LLM used in the embodiments of the present invention is obtained by perturbation-remodeling training of the LLM.

[0025] The perturbation-remodeling training is as follows: taking the material property differences between the source sample and the target sample of each sample pair as conditions, the LLM is guided to learn to perform structural remodeling along the direction of material property improvement during the structural transformation from the crystal structure of the source sample to the crystal structure of the target sample. A sample pair consists of any two ordered samples from the same structure cluster. The crystal structures of any two samples from the same structure cluster satisfy the following: when the atoms contained in one crystal structure are compared with the atoms contained in the other, the number of identical atoms is not less than the number of dissimilar atoms. All crystal structures are Wyckoff-serialized crystal structures.

[0026] When the total number of atoms in the two crystal structures is the same, the condition that the number of identical atoms is not less than the number of dissimilar atoms is expressed as $x_1 \ge y_1$, where $x_1$ is the number of identical atoms, $y_1$ is the number of different atoms, and the total number of atoms contained in either crystal structure is $x_1 + y_1$.

[0027] When the total number of atoms contained in the two crystal structures is not the same, the condition that the number of identical atoms is not less than the number of dissimilar atoms is expressed as $x_2 \ge \max(y_2, z)$, where $x_2$ is the number of identical atoms, $x_2 + y_2$ is the total number of atoms contained in one crystal structure, and $x_2 + z$ is the total number of atoms contained in the other crystal structure.
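The membership test in [0026] and [0027] can be sketched as follows. This is a minimal illustration, assuming that "identical atoms" means atoms of the same element counted as a multiset intersection; the function name `same_cluster` and the example compositions are hypothetical and not taken from the specification.

```python
from collections import Counter

def same_cluster(atoms_a, atoms_b):
    """Check x >= max(y, z) for two lists of element symbols (assumed interpretation)."""
    count_a, count_b = Counter(atoms_a), Counter(atoms_b)
    shared = sum((count_a & count_b).values())  # x: atoms common to both structures
    only_a = len(atoms_a) - shared              # y: atoms found only in the first structure
    only_b = len(atoms_b) - shared              # z: atoms found only in the second structure
    return shared >= max(only_a, only_b)

# Hypothetical example: one Co swapped for Ni, so 3 shared atoms versus 1 differing atom.
pair_ok = same_cluster(["Li", "Co", "O", "O"], ["Li", "Ni", "O", "O"])
pair_bad = same_cluster(["Li", "Co", "O", "O"], ["Na", "Cl"])
```

When both structures contain the same number of atoms, `only_a` equals `only_b`, so the check reduces to the equal-size condition of [0026].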

[0028] The material properties include at least one of the following four parameters: Cauchy stress tensor, bulk modulus, magnetic moment, or band gap. Clearly, the material properties include the same parameters as the target properties. For example, if the target property includes the Cauchy stress tensor, then the material properties include the Cauchy stress tensor. The Cauchy stress tensor is hereinafter referred to as the stress tensor.

[0029] The direction of material property improvement is the direction in which the material properties gradually approach the target properties.
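As an illustration of how ordered sample pairs and their conditioning signal might be assembled from one structure cluster, consider the sketch below. The dictionary layout (`structure`, `props`) and the function name `build_sample_pairs` are assumptions made for this example; the specification does not prescribe a data format.

```python
from itertools import permutations

def build_sample_pairs(cluster):
    """Return (source_structure, target_structure, property_delta) for every ordered pair."""
    pairs = []
    for source, target in permutations(cluster, 2):
        # The condition is the property difference target minus source, per property.
        delta = {name: target["props"][name] - source["props"][name]
                 for name in source["props"]}
        pairs.append((source["structure"], target["structure"], delta))
    return pairs

# Hypothetical cluster with three samples and a single property (band gap in eV).
cluster = [
    {"structure": "S_a", "props": {"gap": 1.0}},
    {"structure": "S_b", "props": {"gap": 1.5}},
    {"structure": "S_c", "props": {"gap": 2.0}},
]
pairs = build_sample_pairs(cluster)
```

Because the pairs are ordered, both (a, b) and (b, a) appear, with property differences of opposite sign.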

[0030] In one possible implementation, the method described in this embodiment of the invention further includes: preprocessing each material from its representation in Crystallographic Information File (CIF) format into the Wyckoff-serialized crystal structure, as expressed by the formula:

[0031] In the formula, atom types and fractional coordinates are integrated under specific symmetry constraints and characterized by triples {wy, sym, SITE_i}, where wy is the Wyckoff letter, a classification label for a position within a space group, used to uniquely identify the position type and determine the number of equivalent atoms; sym is the site symmetry, indicating the local symmetry environment of the atoms corresponding to wy, that is, the type of symmetry operations around the atom; SITE_i is the set of atomic information corresponding to wy, including idx (site number), species (atom type), and x/y/z (fractional coordinates of the atom, based on the unit cell parameters); SPG denotes the space group; and L denotes the lattice parameters.

In another possible implementation, the method described in this embodiment of the invention further includes a method for generating any of the structure clusters, specifically: select any crystal structure from the preset crystal database as the parent crystal structure; apply multiple independent, controlled structural perturbations to the parent crystal structure to generate a group of crystal structures; after screening the group of crystal structures for stability, diversity, and repeatability, take each remaining crystal structure and its corresponding material properties as a sample, the collection of all samples constituting a structure cluster.
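The fields named in [0031] (SPG, L, and the triples of wy, sym, and SITE_i) can be pictured as a small data structure that is flattened into text for the LLM. The sketch below is only an assumed layout: the class names, the line-oriented text format, and the rock-salt example values are illustrative and are not the serialization defined by the specification.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Site:
    idx: int                          # site number
    species: str                      # atom type
    xyz: Tuple[float, float, float]   # fractional coordinates

@dataclass
class WyckoffEntry:
    wy: str                           # Wyckoff label, e.g. "4a"
    sym: str                          # site symmetry
    sites: List[Site]

@dataclass
class WyckoffStructure:
    spg: int                          # space group number
    lattice: Tuple[float, ...]        # a, b, c, alpha, beta, gamma
    entries: List[WyckoffEntry]

def serialize(s: WyckoffStructure) -> str:
    """Flatten the structure into one text line per site (hypothetical format)."""
    lines = [f"SPG {s.spg}", "L " + " ".join(f"{v:.3f}" for v in s.lattice)]
    for entry in s.entries:
        for site in entry.sites:
            x, y, z = site.xyz
            lines.append(f"{entry.wy} {entry.sym} {site.idx} {site.species} "
                         f"{x:.3f} {y:.3f} {z:.3f}")
    return "\n".join(lines)

# Rock-salt NaCl (space group 225) used purely as a familiar example.
nacl = WyckoffStructure(
    spg=225,
    lattice=(5.64, 5.64, 5.64, 90.0, 90.0, 90.0),
    entries=[
        WyckoffEntry("4a", "m-3m", [Site(0, "Na", (0.0, 0.0, 0.0))]),
        WyckoffEntry("4b", "m-3m", [Site(1, "Cl", (0.5, 0.5, 0.5))]),
    ],
)
text = serialize(nacl)
```

Storing only one representative site per Wyckoff position keeps the text short, since the space group generates the remaining equivalent atoms.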

[0032] That is, a sample is a pair of crystal structures and material properties of a material.

[0033] The controlled structural perturbation includes at least one of space group adjustment, lattice parameter shift, atomic coordinate fine-tuning, and atom type replacement, expressed by the following formula:

[0034] In the formula, the input is the initial crystal structure and the output is the crystal structure generated after the structural perturbations are added; P(·) is the perturbation operator; θ1 to θ4 are the perturbation parameters corresponding to the space group, the lattice, the atomic coordinates, and the atom type, respectively; and ΔSPG, ΔL, Δr, and ΔZ are the perturbation parameters for the space group, the lattice parameters, the atomic coordinates, and the atom types, respectively.
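A toy version of three of the four perturbation types in [0033] (lattice parameter shift, atomic coordinate fine-tuning, and atom type replacement) might look like the sketch below. Space group adjustment is left out, and nothing here enforces symmetry consistency. The function name `perturb`, the dictionary layout, and the default magnitudes are assumptions for illustration only.

```python
import random

def perturb(structure, d_lattice=0.02, d_coord=0.01, substitutions=None, seed=0):
    """Apply small random changes to lattice lengths, coordinates, and species."""
    rng = random.Random(seed)
    substitutions = substitutions or {}
    # Lattice parameter shift: scale each length by up to +/- d_lattice (relative).
    lattice = [a * (1 + rng.uniform(-d_lattice, d_lattice)) for a in structure["lattice"]]
    sites = []
    for species, xyz in structure["sites"]:
        # Coordinate fine-tuning: small displacement, wrapped back into the unit cell.
        new_xyz = tuple((c + rng.uniform(-d_coord, d_coord)) % 1.0 for c in xyz)
        # Atom type replacement: swap species listed in the substitution map.
        sites.append((substitutions.get(species, species), new_xyz))
    return {"spg": structure["spg"], "lattice": lattice, "sites": sites}

parent = {
    "spg": 225,
    "lattice": [5.64, 5.64, 5.64],
    "sites": [("Na", (0.0, 0.0, 0.0)), ("Cl", (0.5, 0.5, 0.5))],
}
child = perturb(parent, substitutions={"Na": "K"})
```

Calling `perturb` repeatedly with different seeds would give the multiple independent perturbations of one parent that a structure cluster is built from.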

[0035] The stability screening criterion is that the energy difference per atom between the generated crystal structure and the parent crystal structure does not exceed 0.1 electron volts per atom (eV/atom). The diversity screening criterion is that at least one of the following holds: the mean absolute error of the stress tensor of the generated crystal structure relative to the parent crystal structure exceeds 0.05 electron volts per cubic angstrom (eV/Å³); the magnetic moment difference from the parent crystal structure exceeds 0.3 Bohr magnetons (μ_B); or the band gap difference from the parent crystal structure exceeds 0.3 electron volts (eV).
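The thresholds in [0035] translate directly into two predicates. The sketch below assumes that the energy difference is taken as an absolute value and that stress tensors are given as 3×3 nested lists; the function names are hypothetical.

```python
def passes_stability(energy_per_atom, parent_energy_per_atom, tol=0.1):
    """Energy difference from the parent must not exceed tol (eV/atom)."""
    return abs(energy_per_atom - parent_energy_per_atom) <= tol

def passes_diversity(stress, parent_stress, moment, parent_moment, gap, parent_gap):
    """At least one property must differ from the parent by more than its threshold."""
    diffs = [abs(a - b)
             for row, parent_row in zip(stress, parent_stress)
             for a, b in zip(row, parent_row)]
    stress_mae = sum(diffs) / len(diffs)           # eV/A^3
    return (stress_mae > 0.05
            or abs(moment - parent_moment) > 0.3   # Bohr magnetons
            or abs(gap - parent_gap) > 0.3)        # eV

zero = [[0.0] * 3 for _ in range(3)]
shifted = [[0.1] * 3 for _ in range(3)]
diverse = passes_diversity(shifted, zero, 1.0, 1.0, 2.0, 2.0)
too_similar = passes_diversity(zero, zero, 1.0, 1.0, 2.1, 2.0)
```

A candidate is kept only if it passes both predicates, so it stays energetically close to the parent while differing measurably in at least one property.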

[0036] Here, by using Wyckoff serialization and prompts, the symmetry prior and the knowledge prior of the large model are effectively utilized, thereby improving the quality of the perturbed structures.

[0037] In another possible implementation, multiple materials are selected from a pre-defined crystal database, and the crystal structure and material properties of each selected material are used as samples to form a structure cluster. The crystal structures of any two samples in the structure cluster satisfy the following condition: when comparing the atoms contained in one crystal structure with the atoms contained in another crystal structure, the number of identical atoms is not less than the number of dissimilar atoms.

[0038] Here, the preset crystal database is derived from an existing materials database.

[0039] In another possible implementation, the LLM is a pre-trained large language model with 6B, 7B, 8B, or 10B parameters that has learned general material knowledge.

[0040] The objective function for perturbation-remodeling training of LLM is defined as follows:

[0041] In the formula, $\Delta S$ denotes the structural perturbation; $P_\theta(\Delta S \mid \mathrm{Prompt})$ denotes the conditional probability that the model with parameters $\theta$ generates $\Delta S$ under the training prompt Prompt, where Prompt is conditioned on the material property difference between the source sample and the target sample of the sample pair.

[0042] During training, the cross-entropy between the model-generated samples and the target samples is used as the loss function.
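For a language model, the cross-entropy in [0042] is usually computed per token as the negative log-probability that the model assigns to each token of the target sequence. The sketch below shows that arithmetic on made-up probabilities; the function name and the averaging over tokens are assumptions, since the specification does not state the reduction used.

```python
import math

def cross_entropy(token_probs):
    """Mean negative log-probability assigned to the target tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

# Hypothetical probabilities the model gave to each token of the target structure text.
confident = cross_entropy([0.9, 0.8, 0.95])
uncertain = cross_entropy([0.5, 0.5, 0.5])
```

Lower values mean the model places more probability on the target structure given the prompt, which is the quantity that training drives down.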

[0043] The embodiments of the present invention employ the Low-Rank Adaptation (LoRA) method to fine-tune the parameters of the LLM, training only the low-rank parameters of specific layers in the model, thereby significantly reducing memory usage and training time while maintaining the model's expressive power.
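To give a sense of the saving, LoRA keeps a weight matrix of shape d_out × d_in frozen and trains only two low-rank factors of shapes d_out × r and r × d_in. The sketch below counts trainable parameters for one layer; the layer size and rank are illustrative values, not settings disclosed in the specification.

```python
def full_params(d_out, d_in):
    """Parameters updated when the whole weight matrix is fine-tuned."""
    return d_out * d_in

def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)

# Illustrative layer: 4096 x 4096 projection with rank 8.
full = full_params(4096, 4096)
lora = lora_trainable_params(4096, 4096, 8)
ratio = lora / full
```

In this example the trainable parameters for the layer drop from about 16.8 million to 65,536, roughly 0.4% of the original count.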

[0044] In another possible implementation, the initial crystal structure has material properties similar to the target properties. Step S301, which involves selecting at least one initial crystal structure from a preset crystal database, includes steps a1 to b1, specifically: Step a1: Use the Top-K fast screening algorithm to select a set number of candidate initial crystal structures that have the smallest weighted Euclidean distance between the material properties and the target properties.

[0045]

[0046] where $D$ is the weighted Euclidean distance between the material properties and the target properties, dimensionless; $\omega_\sigma$ is the weighting coefficient of the stress tensor, dimensionless; $\sigma$ is the actual stress tensor of the material, a 3×3 matrix in units of eV/Å³; $\sigma^*$ is the target stress tensor, a 3×3 matrix in units of eV/Å³; $\omega_\mu$ is the weighting coefficient of the magnetic moment, dimensionless; $\mu$ is the actual magnetic moment of the material, in $\mu_B$; $\mu^*$ is the target magnetic moment, in $\mu_B$; $\omega_E$ is the weighting coefficient of the band gap, dimensionless; $E_g$ is the actual band gap of the material, in eV; and $E_g^*$ is the target band gap, in eV. Because the different properties differ by orders of magnitude, the values of $\omega_\sigma$, $\omega_\mu$, and $\omega_E$ must account for a scaling factor as well as for the relative importance of the stress tensor, the magnetic moment, and the band gap, so that the result reasonably measures the difference between the two sets of properties.
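Step a1 can be sketched as a distance function plus a Top-K selection. The exact expression for D is not reproduced in the text above, so the form used here (square root of the weighted sum of squared differences, with the stress term summed over all nine tensor components) is an assumption, as are the function names, the dictionary layout, and the default weights.

```python
import heapq
import math

def weighted_distance(props, target, w_sigma=1.0, w_mu=1.0, w_gap=1.0):
    """Assumed form of D: sqrt of weighted squared differences of the three properties."""
    stress_sq = sum((a - b) ** 2
                    for row, target_row in zip(props["stress"], target["stress"])
                    for a, b in zip(row, target_row))
    return math.sqrt(w_sigma * stress_sq
                     + w_mu * (props["moment"] - target["moment"]) ** 2
                     + w_gap * (props["gap"] - target["gap"]) ** 2)

def top_k(database, target, k, **weights):
    """Return the k database entries whose properties are closest to the target."""
    return heapq.nsmallest(
        k, database, key=lambda m: weighted_distance(m["props"], target, **weights))

zero = [[0.0] * 3 for _ in range(3)]
target = {"stress": zero, "moment": 0.0, "gap": 1.0}
database = [
    {"name": "A", "props": {"stress": zero, "moment": 0.0, "gap": 1.2}},
    {"name": "B", "props": {"stress": zero, "moment": 4.0, "gap": 4.0}},
    {"name": "C", "props": {"stress": zero, "moment": 0.5, "gap": 1.0}},
]
closest = [m["name"] for m in top_k(database, target, k=2)]
```

`heapq.nsmallest` avoids sorting the whole database when K is small relative to the number of entries, which suits the fast pre-screening role of this step.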

[0047] Step b1: Calculate the reachability score of each of the selected candidate initial crystal structures, and take the at least one candidate initial crystal structure with the highest score as the initial crystal structure.

[0048] In step a1, the number set is K. Here, K is set based on computing resources, actual design requirements, and experience, and is usually an integer from 1 to 10.

[0049] The reachability score of the candidate initial crystal structure is used to quantify the modifiability potential of the candidate initial crystal structure. It comprehensively considers the closeness of the candidate initial crystal structure to the target properties and the derivation capability of the candidate initial crystal structure. Its calculation formula is expressed as follows:

[0050] In the formula, $A_0$ denotes the reachability score of the candidate initial crystal structure, with a value range of [0, 1]; $S_d$ denotes the derivation capability score of the candidate initial crystal structure; $D$ denotes the weighted Euclidean distance between the material properties of the candidate initial crystal structure and the target properties; and $\alpha$ and $\beta$ are weighting coefficients. If the focus is on quickly approaching the target properties, the value of $\alpha$ can be increased; if the emphasis is on the flexibility and extensibility of structural perturbations, the value of $\beta$ can be increased.

[0051] Specifically, $S_d$ is obtained as follows: based on the crystal structure dataset, a graph neural network (GNN) model is trained using "the number of effective derivatives generated by a single sample" as the label. The derivation capability of each candidate initial crystal structure is then predicted with the trained GNN model, and the output is normalized to obtain $S_d$.
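Steps a1 and b1 can be sketched as a two-stage selection: a Top-K cut on the weighted distance, followed by ranking on the reachability score. The exact reachability formula is not reproduced here; the sketch below assumes the form $A_0 = \alpha/(1+D) + \beta S_d$ with $\alpha + \beta = 1$ and $S_d \in [0, 1]$, chosen only because it keeps $A_0$ in [0, 1] and rewards both closeness and derivation capability. The function names, the tuple layout of `candidates`, and the precomputed `derivation_score` values (standing in for the normalized GNN output) are likewise hypothetical.

```python
import heapq


def reachability(distance, derivation_score, alpha=0.5, beta=0.5):
    """ASSUMED form of the reachability score A_0 (not taken from the source).

    closeness = 1 / (1 + D) lies in (0, 1]; derivation_score (S_d) is expected
    in [0, 1]; with alpha + beta = 1 the result stays in [0, 1].
    """
    closeness = 1.0 / (1.0 + distance)
    return alpha * closeness + beta * derivation_score


def select_initial(candidates, k=5, n_initial=1, alpha=0.5, beta=0.5):
    """candidates: list of (name, distance, derivation_score) tuples."""
    # Step a1: Top-K screening, keep the K entries with the smallest distance D.
    top_k = heapq.nsmallest(k, candidates, key=lambda c: c[1])
    # Step b1: rank the K survivors by reachability, highest score first.
    ranked = sorted(
        top_k,
        key=lambda c: reachability(c[1], c[2], alpha, beta),
        reverse=True,
    )
    return [name for name, _, _ in ranked[:n_initial]]


if __name__ == "__main__":
    pool = [("A", 0.2, 0.1), ("B", 0.5, 0.9), ("C", 3.0, 1.0), ("D", 0.3, 0.2)]
    print(select_initial(pool, k=3, n_initial=1))
```

In this toy pool, structure C has the highest derivation score but is removed by the Top-K cut because it is far from the target, which illustrates why the screening and the scoring are applied in that order.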

[0052] Here, a retrieval-augmented generation (RAG) mechanism is introduced to help users obtain suitable initial crystal structures. This improves the user-friendliness of the system: from the user's perspective, simply inputting the target property parameters yields material structures that match the target properties. Furthermore, the pre-selection of the initial crystal structure achieves high efficiency through rapid screening and gives the model foresight through reachability analysis, balancing search efficiency against the suitability of the initial crystal structure for the target properties.

[0053] In another possible implementation, the condition-guided iterative structure generation described in step S302 includes, in each round of iteration: using the current crystal structure as a condition, predicting a structure increment based on the difference between the material properties corresponding to the current crystal structure and the target properties, and updating the current crystal structure with it; this is repeated until the iteration continuation condition is no longer met, at which point the current round of iteration ends.

[0054] At any iteration step $t$, the update is expressed as:

[0055] $\Delta S_t \sim p(\Delta S \mid P_t \to P, S_t)$, $S_{t+1} = S_t + \Delta S_t$, where $S_t$ denotes the current crystal structure at iteration step $t$ and $S_{t+1}$ denotes the current crystal structure at iteration step $t+1$, with $S_t \in C$ and $S_{t+1} \in C$, where $C$ is the continuous crystal structure space; the structure increment $\Delta S_t$ follows the conditional distribution $p(\Delta S \mid P_t \to P, S_t)$, that is, $\Delta S_t$ is predicted with $S_t$ and its corresponding material properties $P_t$ as input, under the constraint of the target properties $P$.

[0056] The above describes the steps performed at any step in an iteration, until the conditions for continuing the iteration are no longer met, at which point the current iteration ends.

[0057] This application uses the above as a unified mathematical description of structure search and screening. Any iteration step can be abstracted as applying a structure translation operator in the continuous crystal structure space, and one round of iteration is a sequential decision-making process of translating step by step along physically feasible directions in that space.

[0058] Furthermore, the iteration continuation condition is expressed as: if $D_t < D_{t-1}$ and $E_t \le E_{\max}$, the iteration continues;

[0059] where $D_t$ and $D_{t-1}$ denote the weighted Euclidean distances between the material properties of the current crystal structure and the target properties at step $t$ and step $t-1$, respectively; $E_t$ denotes the energy of the current crystal structure at step $t$; and $E_{\max}$ denotes the maximum allowed energy threshold.

[0060] The iteration continuation condition formula is called before each iteration to check whether the current crystal structure meets the condition for continuing the iteration. If it does, the iteration continues.
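One round of the update $S_{t+1} = S_t + \Delta S_t$, with the continuation check made before each step, might be sketched as follows. This is a toy illustration under several assumptions: a crystal structure is represented as a plain list of floats, the fine-tuned LLM is replaced by a caller-supplied `predict_delta` function, property and energy evaluation is delegated to a caller-supplied `evaluate` function, and the continuation condition is read as "the distance decreased at the last step and the energy does not exceed the threshold" (with no previous distance at the first step, only the energy is effectively checked). All names are hypothetical.

```python
def run_round(s0, target, predict_delta, evaluate, e_max, max_steps=20):
    """One round of condition-guided iteration, S_{t+1} = S_t + delta_S_t.

    s0: initial structure as a list of floats (toy stand-in).
    predict_delta(structure, props, target) -> list of floats (LLM stand-in).
    evaluate(structure, target) -> (props, energy, distance).
    Returns the candidates as (structure, energy, distance) tuples.
    """
    structure = list(s0)
    props, energy, dist = evaluate(structure, target)
    prev_dist = float("inf")  # no previous step yet, so the distance test passes
    candidates = []
    for _ in range(max_steps):
        # Continuation check, made before each step (assumed reading).
        if not (dist < prev_dist and energy <= e_max):
            break
        delta = predict_delta(structure, props, target)
        structure = [s + d for s, d in zip(structure, delta)]
        prev_dist = dist
        props, energy, dist = evaluate(structure, target)
        # Every updated structure within the energy threshold is a candidate.
        if energy <= e_max:
            candidates.append((structure, energy, dist))
    return candidates


if __name__ == "__main__":
    # Toy problem: the "property" is the single coordinate, the target is 1.0.
    toy_eval = lambda s, t: (s[0], 0.1 * s[0] ** 2, abs(s[0] - t))
    toy_delta = lambda s, p, t: [0.5 * (t - p)]
    for cand in run_round([0.0], 1.0, toy_delta, toy_eval, e_max=1.0, max_steps=3):
        print(cand)
```

The `max_steps` cap is a safety bound added for the sketch; the description itself relies on the continuation condition to end a round.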

[0061] In practice, structural reshaping requires more than one round of iteration. In each round, every crystal structure whose energy after an update is not higher than the first set threshold is taken as a candidate crystal structure. The candidate crystal structure with the smallest weighted Euclidean distance between its material properties and the target properties is taken as the initial crystal structure for the next round, and the next round of iteration begins; this continues until the preset termination condition is reached.

[0062] The smaller the first set threshold value, the more stable the generated target crystal structure. In application, while setting the first set threshold according to the actual requirements for the stability of the target crystal structure, it is also necessary to consider the computing resources and generation efficiency. Here, the sum of the energy of the initial crystal structure and the maximum allowable energy increment is used as the first set threshold. The maximum allowable energy increment generally does not exceed 1.0 eV / atom.

[0063] Furthermore, the termination condition is either a preset number of iteration rounds or dynamic quality stopping.

[0064] The preset number of iteration rounds is set based on computing resources, design requirements, and experience. It should be sufficient to ensure that the entire iteration process meets the expected requirements in terms of systematicity, controllability, and material optimization depth. Here, it can be set to 5 rounds.

[0065] Dynamic quality stopping is specifically as follows: if, in a new round of iteration, a candidate crystal structure appears whose energy is not higher than the first set threshold and whose weighted Euclidean distance between material properties and target properties is lower than the second set threshold, the iteration is terminated.

[0066] The smaller the value of the second threshold, the closer the material properties of the generated crystal structure are to the target properties. In application, while setting the second threshold according to the actual requirements for the accuracy of material properties, computational resources and generation efficiency also need to be considered.

[0067] Here, the setting of the second threshold also needs to take into account the weighting coefficients of each property parameter used in calculating the weighted Euclidean distance. Generally, when the weighting coefficient of the stress tensor is 1.0e5, and the weighting coefficients of the magnetic moment and the band gap are both 1, the second threshold of 0.5 can meet the needs of most application scenarios.

[0068] Here, dynamic quality stopping is used, which ensures the superior stability and potential variability of the final material while avoiding redundant calculations.
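The multi-round procedure with its two termination conditions can be sketched as an outer loop. The defaults below are taken from the description (first set threshold = initial energy + a maximum energy increment of at most 1.0 eV/atom, 5 rounds, second set threshold 0.5); everything else is an assumption for illustration: the name `design`, the caller-supplied `run_round(structure)` that returns `(structure, energy, distance)` tuples, and the choice to stop early when a round yields no admissible candidate, a case the description does not address.

```python
def design(initial, run_round, e_initial, max_energy_increment=1.0,
           max_rounds=5, second_threshold=0.5):
    """Outer loop over iteration rounds.

    run_round(structure) -> list of (structure, energy, distance) tuples
    (hypothetical interface). Returns (all_candidates, rounds_used).
    """
    # First set threshold: initial energy plus the maximum allowed increment.
    first_threshold = e_initial + max_energy_increment
    current, all_candidates, rounds_used = initial, [], 0
    for _ in range(max_rounds):  # termination 1: preset number of rounds
        rounds_used += 1
        candidates = [c for c in run_round(current) if c[1] <= first_threshold]
        if not candidates:
            break  # assumption: nothing admissible, so stop early
        all_candidates.extend(candidates)
        best = min(candidates, key=lambda c: c[2])
        if best[2] < second_threshold:
            break  # termination 2: dynamic quality stopping
        current = best[0]  # best candidate seeds the next round
    return all_candidates, rounds_used


if __name__ == "__main__":
    # Toy round: from x, propose x + 1.0 and x + 0.5; the target value is 3.0.
    toy_round = lambda x: [
        (x + 1.0, 0.2, abs(3.0 - (x + 1.0))),
        (x + 0.5, 0.2, abs(3.0 - (x + 0.5))),
    ]
    found, rounds = design(0.0, toy_round, e_initial=0.0)
    print(rounds, min(found, key=lambda c: c[2]))
```

Because the second set threshold is compared against the weighted distance, its value is only meaningful together with the weighting coefficients used to compute that distance.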

[0069] In another possible implementation, the condition-guided iterative structure generation described in step S302 to obtain the target crystal structure further includes: taking the candidate crystal structure that is closest to the target properties among all candidate crystal structures in each iteration as the target crystal structure.

[0070] Specifically, selecting the candidate crystal structure that best matches the target properties from all candidate crystal structures generated in each iteration includes steps a2 and b2: Step a2: During the iteration, based on the multi-index parallel computing mechanism, the energy of the candidate crystal structure and the weighted Euclidean distance between the material properties of the candidate crystal structure and the target property are continuously calculated.

[0071] Step b2: From all candidate crystal structures generated in each iteration, select as the target crystal structure the one whose energy is not higher than the first set threshold and whose weighted Euclidean distance between material properties and target properties is the smallest.
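Steps a2 and b2 can be sketched as a scoring pass followed by a filtered minimum. The description does not specify how the multi-index parallel computing mechanism is realized; the sketch below simply submits the energy and distance evaluations to a thread pool from the Python standard library, which is an assumption, as are the function names and the caller-supplied `energy_fn` and `distance_fn`.

```python
from concurrent.futures import ThreadPoolExecutor


def score_candidates(candidates, energy_fn, distance_fn):
    """Step a2: evaluate energy and weighted distance for every candidate.

    Both metric computations are submitted to the pool before either result
    list is collected, so the two indices are evaluated concurrently.
    """
    with ThreadPoolExecutor() as pool:
        energy_iter = pool.map(energy_fn, candidates)
        distance_iter = pool.map(distance_fn, candidates)
        energies = list(energy_iter)
        distances = list(distance_iter)
    return list(zip(candidates, energies, distances))


def select_target(scored, first_threshold):
    """Step b2: lowest-distance candidate whose energy is within the threshold.

    Returns None when no candidate satisfies the energy constraint.
    """
    feasible = [s for s in scored if s[1] <= first_threshold]
    return min(feasible, key=lambda s: s[2], default=None)


if __name__ == "__main__":
    structures = [0.0, 1.0, 2.0, 2.9]           # toy stand-ins for structures
    scored = score_candidates(
        structures,
        energy_fn=lambda x: 0.5 * x * x,          # toy energy
        distance_fn=lambda x: abs(3.0 - x),       # toy distance to target
    )
    print(select_target(scored, first_threshold=2.5))
```

In the toy data the structure closest to the target (2.9) is rejected because its energy exceeds the threshold, so the selection falls back to the next-closest admissible candidate.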

[0072] The preset number is set according to design requirements and generally does not need to exceed 5.

[0073] Here, the iterative optimization closed-loop design method of iterative generation and search-guided scoring not only realizes the efficient, controllable and dynamic generation of crystal structures, but also effectively avoids the exponential computational overhead that may be caused by traditional Monte Carlo tree search, providing a scientific and practical technical path for the reverse design of high-performance materials.

[0074] Based on the generated results, the iterative optimization trend is presented intuitively through multi-dimensional visualization comparison, thereby assisting researchers in quickly identifying the optimal candidate crystal structure. Simultaneously, combining the two optional stopping strategies mentioned above, the system ultimately selects high-quality candidate crystal structures and outputs their complete structural information and property evaluation reports.

[0075] While ensuring the physical rationality and stability of the generated structures, this invention compresses the thousands of search iterations required in traditional material structure design into no more than ten generation-and-evaluation cycles, significantly improving the efficiency and engineering feasibility of targeted material structure design.

[0076] The above describes in detail an LLM-based material reverse design method. Correspondingly, this specification also provides an LLM-based material reverse design system, as shown in Figure 4. Figure 4 is a schematic diagram of an LLM-based material reverse design system provided in this specification, which includes: a selection module 401, configured to retrieve at least one initial crystal structure from a preset crystal database for a given target property; and a fine-tuned LLM 402, configured to, for any initial crystal structure, use the difference between the material properties corresponding to the initial crystal structure and the target properties as a guide to perform condition-guided iterative structure generation on the initial crystal structure and obtain the target crystal structure. The fine-tuned LLM is obtained by perturbation-remodeling training of the LLM. The perturbation-remodeling training is as follows: taking the material property differences between the source sample and the target sample of each sample pair as conditions, the LLM is guided to learn, during the structural transformation from the crystal structure of the source sample to the crystal structure of the target sample, to perform structural remodeling along the direction of material property improvement. A sample pair consists of any two ordered samples from the same structure cluster; the crystal structures of any two samples from the same structure cluster satisfy the following: when the atoms contained in one crystal structure are compared with the atoms contained in the other, the number of identical atoms is not less than the number of differing atoms. All crystal structures are Wyckoff-sequenced crystal structures.

[0077] A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the methods described in the embodiments of the present invention.

[0078] The various embodiments in this specification are described in a progressive manner. Similar or identical parts between embodiments can be referred to interchangeably. Each embodiment focuses on describing the differences from other embodiments. In particular, the embodiments for apparatus, electronic devices, and non-volatile computer storage media are basically similar to the method embodiments, so the descriptions are relatively simple; relevant parts can be referred to the descriptions of the method embodiments.

[0079] The apparatus, electronic device, and non-volatile computer storage medium and method provided in the embodiments of this specification are corresponding. Therefore, the apparatus, electronic device, and non-volatile computer storage medium also have similar beneficial technical effects as the corresponding method. Since the beneficial technical effects of the method have been described in detail above, the beneficial technical effects of the corresponding apparatus, electronic device, and non-volatile computer storage medium will not be repeated here.

[0080] In the 1990s, improvements to a technology could be clearly distinguished as either hardware improvements (e.g., improvements to the circuit structure of diodes, transistors, switches, etc.) or software improvements (improvements to the methodology). However, with technological advancements, many methodological improvements today can be considered direct improvements to the hardware circuit structure. Designers almost always obtain the corresponding hardware circuit structure by programming the improved methodology into the hardware circuit. Therefore, it cannot be said that a methodological improvement cannot be implemented using hardware physical modules. For example, a Programmable Logic Device (PLD) (such as a Field Programmable Gate Array (FPGA)) is such an integrated circuit whose logic function is determined by the user programming the device. Designers can program and "integrate" a digital system onto a PLD themselves, without needing chip manufacturers to design and manufacture dedicated integrated circuit chips. Furthermore, nowadays, instead of manually manufacturing integrated circuit chips, this programming is mostly implemented using "logic compiler" software. Similar to the software compiler used in program development, the original code before compilation must also be written in a specific programming language, called a Hardware Description Language (HDL). There are many HDLs, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language). Currently, the most commonly used are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that by simply performing some logic programming on the method flow using one of these hardware description languages and programming it into an integrated circuit, the hardware circuit implementing the logical method flow can be easily obtained.

[0081] The controller can be implemented in any suitable manner. For example, it can take the form of a microprocessor or processor and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, and embedded microcontrollers. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of the memory. Those skilled in the art will also recognize that, in addition to implementing the controller in purely computer-readable program code form, the same functionality can be achieved by logically programming the method steps to make the controller take the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, and embedded microcontrollers. Therefore, such a controller can be considered a hardware component, and the means included therein for implementing various functions can also be considered as structures within the hardware component. Alternatively, the means for implementing various functions can be considered as both software modules implementing the method and structures within the hardware component.

[0082] The systems, devices, modules, or units described in the above embodiments can be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer. Specifically, a computer can be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or any combination of these devices.

[0083] For ease of description, the above apparatus is described by dividing it into various functional units. Of course, when implementing one or more embodiments of this specification, the functions of each unit can be implemented in one or more software and / or hardware.

[0084] Those skilled in the art will understand that the embodiments of this specification can be provided as methods, systems, or computer program products. Therefore, the embodiments of this specification can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the embodiments of this specification can take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0085] This specification is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this specification. It will be understood that each flow and / or block of the flowchart illustrations and / or block diagrams, and combinations of flows and / or blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0086] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0087] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0088] In a typical configuration, a computing device includes one or more processors (CPU), input / output interfaces, network interfaces, and memory.

[0089] Memory may include non-persistent storage in computer-readable media, such as random access memory (RAM) and / or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.

[0090] Computer-readable media include both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0091] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0092] This specification can be described in the general context of computer-executable instructions that are executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform a specific task or implement a specific abstract data type. This specification can also be practiced in distributed computing environments, where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside on local and remote computer storage media, including storage devices.

[0093] The various embodiments in this specification are described in a progressive manner. Similar or identical parts between embodiments can be referred to interchangeably. Each embodiment focuses on describing the differences from other embodiments. In particular, the system embodiments are basically similar to the method embodiments, so the description is relatively simple; relevant parts can be referred to the descriptions in the method embodiments.

[0094] The above description is merely an embodiment of this specification and is not intended to limit this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principle of this application should be included within the scope of the claims of this application.

Claims

1. A material reverse design method based on LLM, characterized in that it comprises: for a given target property, selecting at least one initial crystal structure from a preset crystal database; and, for any initial crystal structure, using a fine-tuned large language model (LLM), guided by the difference between the material properties corresponding to the initial crystal structure and the target properties, to perform condition-guided iterative structure generation on the initial crystal structure, thereby obtaining the target crystal structure; wherein the fine-tuned LLM is obtained by perturbation-remodeling training of the LLM; the perturbation-remodeling training is as follows: taking the material property differences between the source sample and the target sample of each sample pair as conditions, the LLM is guided to learn, during the structural transformation from the crystal structure of the source sample to the crystal structure of the target sample, to perform structural remodeling along the direction of material property improvement; a sample pair consists of any two ordered samples from the same structure cluster; the crystal structures of any two samples from the same structure cluster satisfy the following: when the atoms contained in one crystal structure are compared with the atoms contained in the other, the number of identical atoms is not less than the number of differing atoms; and all crystal structures are Wyckoff-sequenced crystal structures.

2. The method as described in claim 1, characterized in that the step of selecting at least one initial crystal structure from a preset crystal database includes: using a Top-K fast screening algorithm to select a set number of candidate initial crystal structures with the smallest weighted Euclidean distance between material properties and target properties; and calculating the reachability score of each candidate initial crystal structure and selecting the at least one candidate initial crystal structure with the highest score as the initial crystal structure.

3. The method as described in claim 1, characterized in that the condition-guided iterative structure generation includes, in each iteration: using the current crystal structure as a condition, predicting a structure increment based on the difference between the material properties corresponding to the current crystal structure and the target properties to update the current crystal structure, until the iteration continuation condition is no longer met.

4. The method as described in claim 3, characterized in that, at any iteration step $t$, the update is expressed as: $\Delta S_t \sim p(\Delta S \mid P_t \to P, S_t)$, $S_{t+1} = S_t + \Delta S_t$, where $S_t$ denotes the current crystal structure at iteration step $t$ and $S_{t+1}$ denotes the current crystal structure at iteration step $t+1$, with $S_t \in C$ and $S_{t+1} \in C$, where $C$ is the continuous crystal structure space; the structure increment $\Delta S_t$ follows the conditional distribution $p(\Delta S \mid P_t \to P, S_t)$, that is, $\Delta S_t$ is predicted with $S_t$ and its corresponding material properties $P_t$ as input, under the constraint of the target properties $P$.

5. The method as described in claim 3 or 4, characterized in that the condition-guided iterative structure generation performs at least one round of iteration until a preset termination condition is met; the termination condition is a preset number of iteration rounds or dynamic quality stopping.

6. The method as described in claim 1, characterized in that it further comprises generating any of the aforementioned structure clusters as follows: selecting any crystal structure from the preset crystal database as the parent crystal structure; applying multiple independent, controlled structural perturbations to the parent crystal structure to generate a group of crystal structures; and, after screening the group of crystal structures for stability, diversity, and repeatability, taking each crystal structure together with its corresponding material properties as a sample, the collection of all samples constituting a structure cluster.

7. The method as described in claim 6, characterized in that the controlled structural perturbation includes at least one of space group adjustment, lattice parameter shift, atomic coordinate fine-tuning, and atom type replacement.

8. The method as described in claim 6, characterized in that the stability screening criterion is: the energy difference per atom between the generated crystal structure and the parent crystal structure does not exceed 0.1 eV/atom; and the diversity screening criterion is: the generated crystal structure satisfies at least one of the following: the mean absolute error of its stress tensor relative to the parent crystal structure exceeds 0.05 eV/Å³; its magnetic moment differs from that of the parent crystal structure by more than 0.3 Bohr magnetons ($\mu_B$); or its band gap differs from that of the parent crystal structure by more than 0.3 electron volts (eV).

9. The method as described in claim 1, characterized in that the material properties include at least one of Cauchy stress tensor, bulk modulus, magnetic moment, or band gap.

10. A material reverse design system based on LLM, characterized in that it comprises: a selection module, configured to select at least one initial crystal structure from a preset crystal database for a given target property; and a fine-tuned LLM, configured to, for any initial crystal structure, use the difference between the material properties corresponding to the initial crystal structure and the target properties as a guide to perform condition-guided iterative structure generation on the initial crystal structure to obtain the target crystal structure; wherein the fine-tuned LLM is obtained by perturbation-remodeling training of the LLM; the perturbation-remodeling training is as follows: taking the material property differences between the source sample and the target sample of each sample pair as conditions, the LLM is guided to learn, during the structural transformation from the crystal structure of the source sample to the crystal structure of the target sample, to perform structural remodeling along the direction of material property improvement; a sample pair consists of any two ordered samples from the same structure cluster; the crystal structures of any two samples from the same structure cluster satisfy the following: when the atoms contained in one crystal structure are compared with the atoms contained in the other, the number of identical atoms is not less than the number of differing atoms; and all crystal structures are Wyckoff-sequenced crystal structures.
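For illustration only, the stability and diversity screening criteria recited in claim 8 can be sketched as a single predicate. The numerical thresholds are those of the claim; the function name, the dictionary keys, and the use of the absolute value of the per-atom energy difference are assumptions of this sketch. The repeatability screening of claim 6 is not covered.

```python
def passes_cluster_screening(child, parent,
                             max_energy_diff=0.1,   # eV/atom
                             min_stress_mae=0.05,   # eV/A^3
                             min_moment_diff=0.3,   # mu_B
                             min_gap_diff=0.3):     # eV
    """Stability and diversity screening of a perturbed structure.

    `child` and `parent` are dicts with keys "energy_per_atom" (eV/atom),
    "stress" (3x3 nested list, eV/A^3), "moment" (mu_B) and "gap" (eV).
    """
    # Stability: per-atom energy stays close to the parent (absolute value
    # of the difference is an assumption of this sketch).
    energy_diff = abs(child["energy_per_atom"] - parent["energy_per_atom"])
    stable = energy_diff <= max_energy_diff

    # Diversity: at least one property differs noticeably from the parent.
    stress_mae = sum(
        abs(child["stress"][i][j] - parent["stress"][i][j])
        for i in range(3)
        for j in range(3)
    ) / 9.0
    diverse = (
        stress_mae > min_stress_mae
        or abs(child["moment"] - parent["moment"]) > min_moment_diff
        or abs(child["gap"] - parent["gap"]) > min_gap_diff
    )
    return stable and diverse


if __name__ == "__main__":
    zero = [[0.0] * 3 for _ in range(3)]
    parent = {"energy_per_atom": -5.0, "stress": zero, "moment": 0.0, "gap": 1.0}
    child = {"energy_per_atom": -4.95, "stress": zero, "moment": 0.0, "gap": 1.5}
    print(passes_cluster_screening(child, parent))
```

A perturbed structure is kept only when it is both energetically close to its parent and measurably different in at least one property, so that sample pairs drawn from the cluster carry a usable property difference.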