A descriptor construction method for quantifying host-guest interactions in confined environments

By constructing the ICE descriptor from Voronoi diagram analysis and the Lennard-Jones potential function, the method addresses the problem of quantifying host-guest interactions in MOF materials and improves the prediction accuracy of gas diffusion coefficients and adsorption capacities.

CN122245512A · Pending · Publication Date: 2026-06-19 · HAINAN UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HAINAN UNIV
Filing Date
2026-05-12
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies struggle to efficiently quantify host-guest interactions in confined environments, impacting the accuracy of MOF material performance predictions.

Method used

By constructing ICE descriptors and combining Voronoi diagram analysis and Lennard-Jones potential functions, the interaction forces between host material atoms and guest molecules are calculated, generating ICE descriptors that can be used to predict the diffusion coefficient and adsorption capacity of gases in MOF materials.

Benefits of technology

It enables a detailed characterization of host-guest interactions in MOF materials and improves prediction accuracy. In particular, it outperforms traditional geometric descriptors in predicting gas diffusion coefficients and adsorption capacities, and the combined model reaches a coefficient of determination above 0.85.

✦ Generated by Eureka AI based on patent content.

Figure CN122245512A_ABST

Abstract

This invention discloses a descriptor construction method for quantifying host-guest interactions in confined environments, belonging to the technical field of materials analysis. The method includes the following steps: S1: obtaining atomic coordinate data of the host material; S2: identifying candidate sites within the host material based on the atomic coordinate data, and screening these sites to obtain a set of valid sites; S3: calculating, at each valid site, the interaction forces between the host material atoms and the guest molecule to obtain a net interaction force vector; S4: calculating the root mean square of all net interaction force vectors and normalizing it by the accessible volume fraction to obtain the Interaction in Confined Environment (ICE) descriptor; S5: using the ICE descriptor to describe host-guest interactions in the confined environment. The method vector-encodes local interaction forces, aggregates them as a root mean square, and normalizes the result by the accessible volume fraction, thereby providing a physically interpretable quantification of host-guest interactions.

Description

Technical Field

[0001] This invention relates to the technical fields of chemical engineering and information technology, and in particular to a descriptor construction method for quantifying host-guest interactions in confined environments.

Background Art

[0002] Metal-organic frameworks (MOFs) are a class of crystalline porous materials formed by metal nodes and organic ligands linked by coordination bonds. Due to their high specific surface area, tunable pore size, and rich chemical functionalities, MOFs have broad application prospects in gas storage, separation, and catalysis. In MOF applications, the diffusion coefficient and adsorption capacity of gas molecules are two key physical properties. These two properties are closely related to the host-guest interaction between the guest molecules and the host material. Therefore, quantitatively characterizing the host-guest interaction in confined environments at the atomic scale is crucial for rapidly predicting MOF performance.

[0003] To address this issue, existing technologies fall mainly into two categories. The first is based on geometric descriptors, such as pore size and specific surface area. These methods are computationally efficient but mainly reflect pore size, and they struggle to capture how the environment inside the pores affects host-guest interactions. The second comprises molecular simulation methods, such as grand canonical Monte Carlo and molecular dynamics. These methods are highly accurate but computationally expensive.

Summary of the Invention

[0004] The purpose of this invention is to provide a descriptor construction method for quantifying host-guest interactions in confined environments. Computing directly from atomic coordinate data, the method establishes an Interaction in Confined Environment (ICE) descriptor that quantifies host-guest interactions in confined environments and can be applied to predict the diffusion coefficient and adsorption capacity of guest molecules in host materials.

[0005] To achieve the above objectives, this invention provides a descriptor construction method for quantifying host-guest interactions in confined environments, comprising the following steps: S1: obtain the atomic coordinate data of the host material; S2: based on the atomic coordinate data, identify candidate sites inside the host material by Voronoi diagram analysis, and screen the candidate sites to remove those that do not meet preset conditions, thereby obtaining a set of valid sites; S3: set a cutoff radius; at each valid site, traverse all host material atoms within the cutoff radius, take the negative gradient of the Lennard-Jones potential between the guest molecule and each of these atoms to obtain the host-guest interaction forces, and sum them to obtain the net interaction force vector; S4: calculate the root mean square of the net interaction force vectors over all valid sites and normalize it by the accessible volume fraction to obtain the ICE descriptor; S5: use the ICE descriptor to describe host-guest interactions in the confined environment.

[0006] Preferably, the atomic coordinate data extracted in step S1 specifically include the element type and fractional coordinates of all atoms.
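Step S1 yields fractional coordinates, while the distances used in S2 and S3 are Cartesian, so the fractional coordinates have to be converted with the unit-cell parameters from the CIF. The following is a minimal sketch, assuming NumPy and the common convention that lattice vector a lies along x and b lies in the xy-plane. The helper names `lattice_matrix` and `frac_to_cart` are hypothetical and are not part of the original method.

```python
import numpy as np

def lattice_matrix(a, b, c, alpha, beta, gamma):
    """Rows are the lattice vectors (same length unit as a, b, c; angles in degrees).
    Convention assumed here: a along x, b in the xy-plane."""
    al, be, ga = np.radians([alpha, beta, gamma])
    va = np.array([a, 0.0, 0.0])
    vb = np.array([b * np.cos(ga), b * np.sin(ga), 0.0])
    cx = c * np.cos(be)
    cy = c * (np.cos(al) - np.cos(be) * np.cos(ga)) / np.sin(ga)
    cz = np.sqrt(c**2 - cx**2 - cy**2)
    return np.vstack([va, vb, np.array([cx, cy, cz])])

def frac_to_cart(frac, lattice):
    """Convert (N, 3) fractional coordinates to Cartesian: r = f_a*a + f_b*b + f_c*c."""
    return np.asarray(frac, dtype=float) @ lattice
```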

[0007] Preferably, in step S2, candidate sites are identified by Voronoi diagram analysis as follows: traverse each host material atom and all of its neighboring atoms; for each atom and any three of its neighbors that together form a non-coplanar group of four atoms, find the intersection point of the perpendicular bisector planes between the atom and these three neighbors (i.e., the center of the sphere circumscribing the four atoms); and use these intersection points as candidate sites. The candidate sites are then screened to obtain valid sites under the following preset conditions. The zero-potential distance of the Lennard-Jones potential between the guest molecule and the host material atom is obtained. A first threshold is set according to the type of target guest molecule; it is the sum of the van der Waals radii of the target guest molecules, the van der Waals radius being determined by …. When the distance between any two candidate sites is less than the first threshold, only the one farther from the host material atoms is retained. Then the … of the guest molecule and the host material atom are obtained, and the sum of these values is used as a second threshold; when the distance between a candidate site and any atom of the host material is less than this second threshold, the candidate site is eliminated. Sites that pass both thresholds are defined as valid sites.
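Geometrically, each candidate site described above is the circumcenter of a non-coplanar four-atom group. Equating the squared distances from a point x to p0 and to each pk gives the linear system 2(pk - p0)·x = |pk|^2 - |p0|^2 for k = 1, 2, 3. This system has a unique solution exactly when the four points are non-coplanar.

The sketch below assumes NumPy, Cartesian coordinates, and a precomputed neighbor list. The `neighbors` dictionary and both helper names are hypothetical, and periodic images are not handled. The same site can be produced from several atoms, and such duplicates are left for the screening step. For reference, the vertices returned by a standard Voronoi tessellation such as `scipy.spatial.Voronoi` are exactly such circumcenters of Delaunay tetrahedra.

```python
from itertools import combinations

import numpy as np

def circumcenter(p0, p1, p2, p3, tol=1e-10):
    """Common point of the perpendicular bisector planes of p0-p1, p0-p2, p0-p3,
    i.e. the center of the sphere through all four points.
    Returns None when the points are (nearly) coplanar."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    # |x - p0|^2 = |x - pk|^2  <=>  2 (pk - p0) . x = |pk|^2 - |p0|^2
    A = 2.0 * np.array([p1 - p0, p2 - p0, p3 - p0])
    b = np.array([p1 @ p1 - p0 @ p0, p2 @ p2 - p0 @ p0, p3 @ p3 - p0 @ p0])
    if abs(np.linalg.det(A)) < tol:
        return None  # coplanar group: no unique intersection point
    return np.linalg.solve(A, b)

def candidate_sites(coords, neighbors):
    """coords: (N, 3) Cartesian positions; neighbors: {atom index: [neighbor indices]}.
    Returns every circumcenter found; duplicates are left for the screening step."""
    sites = []
    for i, nbrs in neighbors.items():
        for j, k, l in combinations(nbrs, 3):
            c = circumcenter(coords[i], coords[j], coords[k], coords[l])
            if c is not None:
                sites.append(c)
    return np.array(sites)
```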

[0008] Preferably, in step S3, the net interaction force vector is calculated as follows. The zero-potential distance $\sigma_{ij}$ and well depth $\varepsilon_{ij}$ of the Lennard-Jones potential are calculated using the Lorentz-Berthelot mixing rules:
$$\sigma_{ij} = \frac{\sigma_g + \sigma_j}{2}, \qquad \varepsilon_{ij} = \sqrt{\varepsilon_g\,\varepsilon_j}$$
where $\sigma_g$ and $\varepsilon_g$ are the Lennard-Jones zero-potential distance and well depth of the guest molecule located at the valid site, and $\sigma_j$ and $\varepsilon_j$ are those of host material atom $j$. The Lennard-Jones potential between valid site $i$ and host material atom $j$ is
$$U_{ij}(r_{ij}) = 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]$$
where $r_{ij}$ is the scalar distance between valid site $i$ and host material atom $j$. Taking the negative gradient with respect to the guest position gives the pairwise force vector between valid site $i$ and host material atom $j$. Summing over all $j$ with $r_{ij} \le r_c$ gives the net interaction force vector at valid site $i$:
$$\mathbf{F}_i = \sum_{j:\, r_{ij} \le r_c} \frac{\mathrm{d}U_{ij}}{\mathrm{d}r_{ij}}\,\hat{\mathbf{e}}_{ij} = -\sum_{j:\, r_{ij} \le r_c} \frac{24\varepsilon_{ij}}{r_{ij}}\left[2\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]\hat{\mathbf{e}}_{ij}$$
where $\hat{\mathbf{e}}_{ij}$ is the unit vector pointing from valid site $i$ to host material atom $j$.
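The Lorentz-Berthelot mixing and the negative gradient of the 12-6 potential can be sketched as follows. The sketch makes several assumptions:

- NumPy is available and all quantities use consistent units.
- The guest is treated as a single Lennard-Jones sphere.
- The host coordinates already include any periodic images needed within the cutoff, since the text above does not say how periodicity is handled.

The function names are hypothetical.

```python
import numpy as np

def lorentz_berthelot(sigma_g, eps_g, sigma_h, eps_h):
    """Mixed LJ parameters: arithmetic mean for sigma, geometric mean for epsilon."""
    sigma_h, eps_h = np.asarray(sigma_h, dtype=float), np.asarray(eps_h, dtype=float)
    return 0.5 * (sigma_g + sigma_h), np.sqrt(eps_g * eps_h)

def net_force(site, host_coords, sigma_h, eps_h, sigma_g, eps_g, r_cut):
    """Net 12-6 LJ force on a guest at `site` from host atoms with r_ij <= r_cut.
    sigma_h, eps_h: per-host-atom parameter arrays aligned with host_coords."""
    d = np.asarray(host_coords, dtype=float) - np.asarray(site, dtype=float)
    r = np.linalg.norm(d, axis=1)
    mask = (r > 0) & (r <= r_cut)
    sig, eps = lorentz_berthelot(sigma_g, eps_g,
                                 np.asarray(sigma_h)[mask], np.asarray(eps_h)[mask])
    d, r = d[mask], r[mask]
    sr6 = (sig / r) ** 6
    # dU/dr for U = 4 eps [(s/r)^12 - (s/r)^6]
    dU_dr = -24.0 * eps / r * (2.0 * sr6**2 - sr6)
    e_ij = d / r[:, None]  # unit vectors from site i toward host atom j
    # F_i = -grad_{r_i} sum_j U_ij = sum_j (dU/dr) e_ij
    return (dU_dr[:, None] * e_ij).sum(axis=0)
```

With this sign convention, a host atom closer than 2^(1/6)·σ_ij pushes the guest away from itself, and a farther one pulls the guest toward itself. Only the magnitudes ||F_i|| enter the root mean square of step S4.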

[0009] Preferably, in step S4, the ICE descriptor is generated as follows. First, the root mean square of the net interaction force vectors over all valid sites is obtained:
$$F_{\mathrm{RMS}}(r_c) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert \mathbf{F}_i \right\rVert^{2}}$$
where $r_c$ is the cutoff radius, which can be 3 Å, 3.5 Å, 5 Å, 8 Å, 12 Å or 20 Å; $\mathbf{F}_i$ is the net interaction force vector of the $i$-th valid site; and $N$ is the total number of valid sites. The ICE descriptor $\mathrm{ICE}(r_c)$ is then obtained by normalizing $F_{\mathrm{RMS}}(r_c)$ by $\phi$, the accessible volume fraction of the host material. The larger the ICE value, the stronger the host-guest interaction; the smaller the ICE value, the weaker the interaction.
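The aggregation in step S4 takes only a few lines. This sketch assumes NumPy and uses hypothetical helper names. The normalization by the accessible volume fraction is taken here to be a division by φ. That choice is an assumption made only for illustration, because the exact normalization formula is not reproduced in the text above.

```python
import numpy as np

def rms_force(forces):
    """F_RMS = sqrt((1/N) * sum_i ||F_i||^2) over the N valid sites; forces is (N, 3)."""
    forces = np.asarray(forces, dtype=float)
    return float(np.sqrt(np.mean(np.sum(forces**2, axis=1))))

def ice_descriptor(forces, phi):
    """ICE from the RMS force and the accessible volume fraction phi.
    ASSUMPTION: normalization implemented as division by phi (illustrative only)."""
    return rms_force(forces) / phi
```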

[0010] Preferably, in step S5, the ICE descriptor is used to describe host-guest interactions in the confined environment and is applied to predicting a target variable as follows. The ICE descriptor, the structural descriptors of the host material, and the attribute descriptors of the guest molecule are combined into a feature vector. The structural descriptors of the host material include the largest cavity diameter (LCD), pore limiting diameter (PLD), density and accessible volume fraction; the attribute descriptors of the guest molecule include molecular mass. The feature vectors and the corresponding target variable are then used to train a machine learning regression model, and the trained model is used to predict the target variable.
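Assembling the feature vector of step S5 is straightforward. The following sketch assumes NumPy and uses hypothetical dictionary keys. It follows the feature order used in Example 2: guest mass, LCD, PLD, density, accessible volume fraction and ICE.

```python
import numpy as np

# Hypothetical keys; order follows the feature combination of Example 2
FEATURES = ["mass", "lcd", "pld", "density", "phi", "ice"]

def feature_vector(guest, host):
    """Concatenate guest attributes (e.g. molecular mass) with host structural
    descriptors and the ICE value into one fixed-order NumPy row."""
    merged = {**guest, **host}
    return np.array([merged[name] for name in FEATURES], dtype=float)
```

Rows built this way can be stacked into a matrix and passed to any of the regressors named in [0011], for example `lightgbm.LGBMRegressor` through its `fit(X, y)` method. When several cutoff radii are used, additional ICE keys can be appended to `FEATURES`.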

[0011] Preferably, the machine learning regression model includes the LightGBM model, the XGBoost model and the RandomForest model.

[0012] Therefore, the descriptor construction method for quantifying host-guest interactions in confined environments adopted by the present invention has the following advantages. (1) The technical solution of the present invention has strong physical interpretability. The ICE descriptor is derived directly from the negative gradient of the Lennard-Jones potential, so its physical meaning is clear. It characterizes the magnitude and direction of the force exerted on a guest molecule at each site in the channel, and thus describes the local environment in more detail.

[0013] (2) The technical solution of this invention has high prediction accuracy. Results show that, when predicting the diffusion coefficients and adsorption capacities of various gases (hydrogen, helium, methane, nitrogen and oxygen) in MOFs, the ICE descriptor used alone correlates better with the targets than traditional geometric descriptors. In the combined model, the coefficient of determination exceeds 0.85 for diffusion and 0.92 for adsorption.

[0014] (3) This invention can reveal the physical mechanism. By comparing diffusion and adsorption tasks, the ICE descriptor can reflect the physical law that diffusion performance is more sensitive to the local environment and adsorption performance is more sensitive to the overall effect, which has guiding significance for material design.

[0015] The technical solution of the present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

Description of the Drawings

[0016] Figure 1 is a flowchart of the descriptor construction method for quantifying host-guest interactions in a confined environment according to the present invention. Figure 2 is a schematic diagram illustrating the use of Voronoi polyhedra to determine candidate sites. Figure 3 is a graph of the trend of ICE values under different cutoff radii. Figure 4 is a Pearson correlation heatmap between the ICE descriptor and the diffusion coefficient and adsorption capacity. Figure 5 is a scatter plot comparing predicted and actual values in the diffusion coefficient prediction task. Figure 6 is a scatter plot comparing predicted and actual values in the adsorption capacity prediction task.

Detailed Description of the Embodiments

[0017] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. The components of the embodiments of the present invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations. Specific model specifications need to be selected and determined according to the actual specifications of the device, etc. The specific selection calculation method adopts existing technology in the art, and therefore will not be described in detail.

[0018] Example 1. As shown in Figure 1, this invention provides a descriptor construction method for quantifying host-guest interactions in confined environments. This embodiment uses a MOF material named ACOLOV_clean from a database as an example to illustrate the construction of the ICE descriptor in detail. ACOLOV_clean is a cadmium (Cd)-containing MOF whose crystal structure data were obtained from a public database. The method includes the following steps. S1: Obtain the atomic coordinate data of the host material. The atomic coordinate data of ACOLOV_clean, including the element type and fractional coordinates of all atoms, are extracted from its crystallographic information file (CIF file). In this embodiment, ACOLOV_clean contains cadmium, oxygen, carbon and hydrogen atoms.

[0019] S2: Based on the atomic coordinate data, candidate sites within the host material ACOLOV_clean are identified through Voronoi diagram analysis. These candidate sites are then screened, and sites that do not meet the preset conditions are removed, yielding a set of valid sites. The specific process is as follows. As shown in Figure 2, each host material atom and all its neighboring atoms are traversed. For each atom and any three of its neighbors that form a non-coplanar group of four atoms, the intersection point of the perpendicular bisector planes between the selected atom and these three neighbors (i.e., the center of the sphere circumscribing the four atoms) is calculated. These intersection points lie in geometrically open regions of the pore space and serve as candidate sites marking potentially accessible locations. To screen the candidate sites, a first threshold is set according to the type of target guest molecule: the sum of the van der Waals radii of the target guest molecules. When the distance between any two candidate sites is less than the first threshold, only the one farther from the host material atoms is retained. Then the zero-potential distance between the guest molecule and the host material atom is calculated, and the sum of these values is used as a second threshold. When the distance between a candidate site and any host material atom is less than this second threshold, the candidate site is eliminated. Sites that pass both thresholds are defined as valid sites.
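The two-threshold screening can be sketched as follows. This is one possible greedy reading of the rule, assuming NumPy. Sites are visited in order from the one farthest from the framework to the nearest. A site is rejected if it lies within the first threshold of an already retained site, which is always at least as far from the framework.

The threshold values `t1` and `t2` are passed in as plain numbers because their exact expressions are not fully recoverable here. The function name is hypothetical, and periodic images are ignored.

```python
import numpy as np

def screen_sites(sites, host_coords, t1, t2):
    """Two-threshold screening of candidate sites (greedy interpretation).
    t1: minimum separation between two retained sites.
    t2: minimum distance between a retained site and any host atom."""
    sites = np.asarray(sites, dtype=float)
    host = np.asarray(host_coords, dtype=float)
    # Distance from each candidate site to its nearest host atom
    d_host = np.linalg.norm(sites[:, None, :] - host[None, :, :], axis=-1).min(axis=1)
    # Threshold 1: visit sites farthest from the framework first; reject a site
    # lying within t1 of an already kept (farther-from-framework) site
    kept = []
    for idx in np.argsort(-d_host):
        if all(np.linalg.norm(sites[idx] - sites[k]) >= t1 for k in kept):
            kept.append(idx)
    kept = np.array(kept, dtype=int)
    # Threshold 2: discard sites that lie too close to any host atom
    kept = kept[d_host[kept] >= t2]
    return sites[kept]
```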

[0020] S3: Set a cutoff radius. For each valid site, traverse all host material atoms within the cutoff radius, take the negative gradient of the Lennard-Jones potential between the guest molecule and each of these atoms, and calculate the host-guest interaction forces to obtain the net interaction force vector. The specific process is as follows. The zero-potential distance $\sigma_{ij}$ and well depth $\varepsilon_{ij}$ of the Lennard-Jones potential are calculated using the Lorentz-Berthelot mixing rules:
$$\sigma_{ij} = \frac{\sigma_g + \sigma_j}{2}, \qquad \varepsilon_{ij} = \sqrt{\varepsilon_g\,\varepsilon_j}$$
where $\sigma_g$ and $\varepsilon_g$ are the Lennard-Jones zero-potential distance and well depth of the guest molecule located at the valid site, and $\sigma_j$ and $\varepsilon_j$ are those of host material atom $j$. The Lennard-Jones potential between valid site $i$ and host material atom $j$ is
$$U_{ij}(r_{ij}) = 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]$$
where $r_{ij}$ is the scalar distance between valid site $i$ and host material atom $j$. Taking the negative gradient gives the pairwise force vector between valid site $i$ and host material atom $j$. Summing over all $j$ with $r_{ij} \le r_c$ yields the net interaction force vector at that site:
$$\mathbf{F}_i = \sum_{j:\, r_{ij} \le r_c} \frac{\mathrm{d}U_{ij}}{\mathrm{d}r_{ij}}\,\hat{\mathbf{e}}_{ij} = -\sum_{j:\, r_{ij} \le r_c} \frac{24\varepsilon_{ij}}{r_{ij}}\left[2\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]\hat{\mathbf{e}}_{ij}$$
where $\hat{\mathbf{e}}_{ij}$ is the unit vector pointing from valid site $i$ to host material atom $j$.

[0021] S4: Calculate the root mean square of the net interaction force vectors $\mathbf{F}_i$ over all valid sites and normalize it by the accessible volume fraction $\phi$:
$$F_{\mathrm{RMS}}(r_c) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert \mathbf{F}_i \right\rVert^{2}}$$
where $r_c$ is the cutoff radius, which can be 3 Å, 3.5 Å, 5 Å, 8 Å, 12 Å or 20 Å; $\mathbf{F}_i$ is the net interaction force vector of the $i$-th valid site; and $N$ is the total number of valid sites. Normalizing $F_{\mathrm{RMS}}(r_c)$ by $\phi$ gives $\mathrm{ICE}(r_c)$, the ICE descriptor value of the MOF material at cutoff radius $r_c$. The larger the ICE value, the stronger the host-guest interaction; the smaller the ICE value, the weaker the interaction.

[0022] As shown in Figure 3, for most MOF structures the ICE value first increases and then decreases as the cutoff radius grows. At small cutoff radii, enlarging the cutoff brings more host material atoms into range, so their contribution increases. At large cutoff radii, the contributions of distant atoms cancel each other out, so the ICE value decreases.

[0023] S5: Use the ICE descriptor to describe the host-guest interaction in the confined environment. The ICE descriptor, the structural descriptors of the host material, and the attribute descriptors of the guest molecule are combined into a feature vector. The structural descriptors of the host material include the largest cavity diameter (LCD), pore limiting diameter (PLD), density and accessible volume fraction; the attribute descriptors of the guest molecule include molecular mass. The feature vectors and the corresponding target variable are then used to train a machine learning regression model, and the trained model is used to predict the target variable. When the guest molecule is a gas, the target variables are the diffusion coefficient and adsorption capacity of the gas in the host material.

[0024] Example 2: Application of ICE descriptors in diffusion coefficient prediction. Step 1: Dataset construction. Structural data for approximately 1500 MOFs, together with the corresponding self-diffusion coefficients of hydrogen, helium, methane, nitrogen and oxygen in each MOF, were obtained from publicly available databases (Daglar et al., ACS Appl. Mater. Interfaces 2022, 14, 32134; Orhan et al., ACS Appl. Mater. Interfaces 2022, 14, 736). MOFs containing open metal sites were removed. The base-10 logarithm of the diffusion coefficient D, denoted lgD, was used as the regression target variable.

[0025] Step 2: Feature engineering. Following the method of Example 1, ICE descriptor values are calculated for cutoff radii of 3 Å, 3.5 Å, 5 Å, 8 Å, 12 Å and 20 Å, denoted ICE3, ICE3.5, ICE5, ICE8, ICE12 and ICE20, respectively (Å denotes the angstrom, 10^-10 m). The combined feature vector is constructed as: gas molecular mass (Mass) + largest cavity diameter (LCD) + pore limiting diameter (PLD) + density + accessible volume fraction + ICE.

[0026] Step 3: Model training and evaluation. The dataset was divided into training and test sets at a 7:3 ratio. A LightGBM model was trained and tested, with hyperparameters optimized by grid search combined with 5-fold cross-validation. The main parameters were num_leaves=63, learning_rate=0.1 and n_estimators=120. Mean absolute error (MAE) and root mean square error (RMSE) were used as evaluation metrics.
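The 7:3 split and the error metrics can be sketched in NumPy as follows; the helper names are hypothetical. The coefficient of determination is included because the results below are reported with it. The LightGBM training itself, for example `lightgbm.LGBMRegressor(num_leaves=63, learning_rate=0.1, n_estimators=120)`, is left out to keep the sketch dependency-free.

```python
import numpy as np

def split_70_30(n, seed=0):
    """Random 7:3 train/test split of n sample indices."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(0.7 * n))
    return idx[:n_train], idx[n_train:]

def regression_metrics(y_true, y_pred):
    """MAE, RMSE and coefficient of determination (R^2) on log10 targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err**2))
    r2 = 1.0 - np.sum(err**2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": float(mae), "RMSE": float(rmse), "R2": float(r2)}
```

The targets would be log-transformed first, e.g. `y = np.log10(D)`, so MAE and RMSE are expressed in log10 units.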

[0027] The results are analyzed as follows. As shown in Figure 4, the ICE descriptor is negatively correlated with lgD, consistent with physical intuition: the stronger the interaction, the more difficult the diffusion. Moreover, ICE correlates more strongly with the target variable than the other descriptors do.
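The per-feature Pearson coefficients behind a heatmap like Figure 4 can be computed with `np.corrcoef`. The following is a minimal sketch with a hypothetical helper name. A negative value for the ICE column against lgD would correspond to the trend described above. A constant feature column has no defined correlation and yields `nan`.

```python
import numpy as np

def pearson_with_target(X, y, names):
    """Pearson r between each feature column of X (n_samples, n_features) and y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return {name: float(np.corrcoef(X[:, k], y)[0, 1]) for k, name in enumerate(names)}
```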

[0028] In the diffusion coefficient prediction task, the combined features on the LightGBM model reached a coefficient of determination of 0.851. As shown in Figure 5, most predicted values fall within half an order of magnitude of the true values, indicating high agreement between the predictions and the molecular simulation results.
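"Within half an order of magnitude" is a statement about log10 errors: |lgD_pred - lgD_true| ≤ 0.5, i.e., a ratio within a factor of 10^0.5 ≈ 3.16. A sketch of the corresponding fraction follows, assuming NumPy; the helper name is hypothetical.

```python
import numpy as np

def fraction_within(lg_true, lg_pred, tol=0.5):
    """Share of predictions within `tol` log10 units of the reference value;
    tol=0.5 means within half an order of magnitude (a factor of about 3.16)."""
    diff = np.abs(np.asarray(lg_pred, dtype=float) - np.asarray(lg_true, dtype=float))
    return float(np.mean(diff <= tol))
```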

[0029] Example 3: Application of ICE descriptors in adsorption capacity prediction. This embodiment is the same as Example 2, except that the prediction target is the adsorption capacity (uptake, abbreviated U), which is likewise log-transformed (base 10) to obtain lgU.

[0030] The results are analyzed as follows. As shown in Figure 6, in the adsorption capacity prediction task, the ICE feature alone on the LightGBM model reached a coefficient of determination of 0.72, and the combined features on the LightGBM model reached 0.924. Most predicted values fall within half an order of magnitude of the actual values, indicating high agreement between the predictions and the molecular simulation results.

[0031] Mechanism analysis: The ICE descriptor directly quantifies host-guest interactions and is therefore suited to physical processes dominated by interaction forces. The examples verify it on adsorption and diffusion, two typical scenarios. Adsorption, a thermodynamic process, accumulates the contributions of all accessible sites and is insensitive to their spatial distribution. Diffusion, a kinetic process, demands a more accurate characterization of the local environment. Therefore, the ICE descriptor predicts adsorption capacity (coefficient of determination 0.924) more accurately than the diffusion coefficient (0.851).

[0032] With the same set of ICE descriptor inputs, the machine learning model exhibits opposite feature contribution patterns for different physical tasks: in adsorption systems, ICE makes a positive contribution, meaning that strong interactions favor adsorption; in diffusion systems, ICE makes a negative contribution, meaning that strong interactions hinder diffusion.

[0033] These results show that the ICE descriptor constructed in this invention responds differently to different physical mechanisms and can distinguish between mass transfer processes.

[0034] Therefore, this invention provides a descriptor construction method for quantifying host-guest interactions in confined environments. Computing directly from atomic coordinate data, it establishes an ICE descriptor that quantifies host-guest interactions in confined environments and can be applied to predicting the diffusion coefficient and adsorption capacity of gases in host materials.

[0035] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the technical solutions of the present invention, and these modifications or equivalent substitutions cannot cause the modified technical solutions to deviate from the spirit and scope of the technical solutions of the present invention.

Claims

1. A descriptor construction method for quantifying host-guest interactions in a confined environment, characterized in that it comprises the following steps: S1: obtaining the atomic coordinate data of the host material; S2: based on the atomic coordinate data, identifying candidate sites inside the host material by Voronoi diagram analysis, and screening the candidate sites to remove those that do not meet preset conditions, thereby obtaining a set of valid sites; S3: setting a cutoff radius; at each valid site, traversing all host material atoms within the cutoff radius, taking the negative gradient of the Lennard-Jones potential between the guest molecule and each of these atoms to obtain the host-guest interaction forces, and summing them to obtain the net interaction force vector; S4: calculating the root mean square of the net interaction force vectors over all valid sites and normalizing it by the accessible volume fraction to obtain the ICE descriptor; S5: using the ICE descriptor to describe host-guest interactions in the confined environment.

2. The descriptor construction method for quantifying host-guest interactions in a confined environment according to claim 1, characterized in that, in step S1, the extracted atomic coordinate data specifically include the element type and fractional coordinates of all atoms.

3. The descriptor construction method for quantifying host-guest interactions in a confined environment according to claim 1, characterized in that, in step S2, candidate sites are identified by Voronoi diagram analysis as follows: traversing each host material atom and all of its neighboring atoms; for each atom and any three of its neighbors that together form a non-coplanar group of four atoms, finding the intersection point of the perpendicular bisector planes between the atom and these three neighbors; and using these intersection points as candidate sites. Valid sites are obtained by screening the candidate sites under the following preset conditions. The zero-potential distance of the Lennard-Jones potential between the guest molecule and the host material atom is obtained. A first threshold is set according to the type of target guest molecule; it is the sum of the van der Waals radii of the target guest molecules, the van der Waals radius being determined by …. When the distance between any two candidate sites is less than the first threshold, only the one farther from the host material atoms is retained. Then the … of the guest molecule and the host material atom are obtained, and the sum of these values serves as a second threshold; when the distance between a candidate site and any atom of the host material is less than this second threshold, the candidate site is eliminated. Sites that pass both thresholds are defined as valid sites.

4. The descriptor construction method for quantifying host-guest interactions in a confined environment according to claim 1, characterized in that, in step S3, the net interaction force vector is calculated as follows. The zero-potential distance $\sigma_{ij}$ and well depth $\varepsilon_{ij}$ of the Lennard-Jones potential are calculated using the Lorentz-Berthelot mixing rules:
$$\sigma_{ij} = \frac{\sigma_g + \sigma_j}{2}, \qquad \varepsilon_{ij} = \sqrt{\varepsilon_g\,\varepsilon_j}$$
where $\sigma_g$ and $\varepsilon_g$ are the Lennard-Jones zero-potential distance and well depth of the guest molecule located at the valid site, and $\sigma_j$ and $\varepsilon_j$ are those of host material atom $j$. The Lennard-Jones potential between valid site $i$ and host material atom $j$ is
$$U_{ij}(r_{ij}) = 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]$$
where $r_{ij}$ is the scalar distance between valid site $i$ and host material atom $j$. Taking the negative gradient gives the pairwise force vector between valid site $i$ and host material atom $j$. Summing over all $j$ with $r_{ij} \le r_c$ gives the net interaction force vector at valid site $i$:
$$\mathbf{F}_i = \sum_{j:\, r_{ij} \le r_c} \frac{\mathrm{d}U_{ij}}{\mathrm{d}r_{ij}}\,\hat{\mathbf{e}}_{ij} = -\sum_{j:\, r_{ij} \le r_c} \frac{24\varepsilon_{ij}}{r_{ij}}\left[2\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]\hat{\mathbf{e}}_{ij}$$
where $\hat{\mathbf{e}}_{ij}$ is the unit vector pointing from valid site $i$ to host material atom $j$.

5. The descriptor construction method for quantifying host-guest interactions in a confined environment according to claim 1, characterized in that, in step S4, the ICE descriptor is generated as follows. First, the root mean square of the net interaction force vectors over all valid sites is obtained:
$$F_{\mathrm{RMS}}(r_c) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert \mathbf{F}_i \right\rVert^{2}}$$
where $r_c$ is the cutoff radius, which can be 3 Å, 3.5 Å, 5 Å, 8 Å, 12 Å or 20 Å; $\mathbf{F}_i$ is the net interaction force vector of the $i$-th valid site; and $N$ is the total number of valid sites. The ICE descriptor $\mathrm{ICE}(r_c)$ is then obtained by normalizing $F_{\mathrm{RMS}}(r_c)$ by $\phi$, the accessible volume fraction of the host material. The larger the ICE value, the stronger the host-guest interaction; the smaller the ICE value, the weaker the interaction.

6. The descriptor construction method for quantifying host-guest interactions in a confined environment according to claim 5, characterized in that, in step S5, the ICE descriptor is used to describe host-guest interactions in the confined environment and is applied to predicting a target variable as follows. The ICE descriptor, the structural descriptors of the host material, and the attribute descriptors of the guest molecule are combined into a feature vector. The structural descriptors of the host material include the largest cavity diameter, pore limiting diameter, density and accessible volume fraction; the attribute descriptors of the guest molecule include molecular mass. The feature vectors and the corresponding target variable are then used to train a machine learning regression model, and the trained model is used to predict the target variable.

7. The descriptor construction method for quantifying host-guest interactions in a confined environment according to claim 6, characterized in that the machine learning regression model includes the LightGBM model, the XGBoost model and the RandomForest model.