A knowledge graph driven molecular intelligent design method for oil displacement agents

By constructing a knowledge graph and intelligent algorithms in the field of oil displacement, we can mine the correlation rules between molecular structure, performance and environment, and generate suitable oil displacement agent molecular schemes. This solves the problems of low efficiency and poor adaptability in traditional oil displacement agent design, and realizes efficient and precise oil displacement agent research and development.

CN122245508A · Pending · Publication Date: 2026-06-19 · TONGJI UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
TONGJI UNIV
Filing Date
2026-03-26
Publication Date
2026-06-19


Abstract

This invention relates to a knowledge graph-driven intelligent molecular design method for oil displacement agents. It involves integrating multi-source heterogeneous data from experiments, literature, and oilfield sites. Through cleaning, standardization, and entity relationship extraction, a knowledge graph for oil displacement is constructed, deeply integrating molecular structure, functional group attributes, performance indicators, and reservoir environmental parameters. Using graph computing and graph neural network technologies, hidden association rules are mined from the graph, establishing an interpretable mapping model from molecular microstructure to macroscopic performance and environmental adaptability. Finally, based on the established mapping relationships, and with target reservoir conditions and performance requirements as multiple constraints, candidate molecular structural features are generated through reverse reasoning, achieving knowledge-guided precise molecular design. This invention constructs a knowledge graph for oil displacement, revealing the complex intrinsic relationships between molecular functional groups, reservoir environment, and oil displacement performance. Through the combination of knowledge graphs and intelligent algorithms, it achieves intelligent and automated molecular design of oil displacement agents.

Description

Technical Field

[0001] This invention relates to the field of chemical flooding technology for enhancing oil and gas recovery, specifically to an intelligent molecular design method for oil displacement agents that integrates knowledge graph and machine learning technologies.

Background Technology

[0002] With the continued growth of global energy demand and the gradual expansion of China's oil and gas resource development into deeper and unconventional areas, improving oil recovery has become a core strategic measure to ensure national energy security. Chemical flooding technology, represented by surfactants, is one of the leading technologies for improving the recovery rate of mature oilfields because it significantly reduces oil-water interfacial tension and alters rock wettability. Among surfactants, Gemini surfactants have a unique structure with two hydrophilic head groups and two hydrophobic tails. Compared with traditional single-chain surfactants, they exhibit superior interfacial activity, lower critical micelle concentration, and stronger salt and temperature resistance, and have become a cutting-edge direction in the research and development of ultra-low interfacial tension oil displacement agents.

[0003] However, the development of traditional oil displacement agent molecules relies heavily on the experience and trial-and-error experiments of researchers, and faces three major bottlenecks: 1) Lack of theoretical guidance for molecular design: The structure-activity relationship of molecules is unclear, and the influence mechanism of key structural parameters such as the configuration of hydrophilic and hydrophobic groups and the type of linker on macroscopic oil displacement performance is complex and difficult to quantify and characterize; 2) Poor adaptability to reservoir environment: Molecules synthesized in the laboratory often have excellent performance at specific temperatures and salinities, but they are difficult to adapt to the harsh environment of real reservoirs with high temperature, high salinity, and strong heterogeneity, resulting in a significant decrease in the effectiveness of field applications; 3) Long development cycle and high cost: From molecular design, synthesis, performance testing to reservoir adaptability evaluation, the entire process takes 3-5 years, and the success rate is less than 20%.

[0004] In recent years, the application of artificial intelligence technology, especially the AI for Science paradigm, in materials research and development has provided new pathways to solving the aforementioned problems. Existing research attempts to use machine learning algorithms such as support vector machines and random forests to predict the performance of oil displacement agents, or to use generative adversarial networks to generate molecular structures. However, these methods still have significant limitations: first, the models are mostly "black boxes," lacking embedding and explanation of physicochemical mechanisms; second, the training data mostly comes from experiments under ideal conditions and is not deeply integrated with real reservoir environmental parameters; third, the feasibility and economy of synthesizing reverse-generated molecules are not effectively constrained, making them difficult to engineer.

[0005] Therefore, there is an urgent need for an intelligent design method that can integrate multi-source knowledge, connect the "molecular structure-performance-environment" relationship, and have closed-loop optimization capabilities to achieve efficient and precise research and development of oil displacement agents.

Summary of the Invention

[0006] The purpose of this invention is to provide a knowledge graph-driven intelligent design method for oil displacement agent molecules. The method addresses the long trial-and-error cycles and low success rates that arise in traditional oil displacement agent molecular design from unclear molecular structure-activity relationships and poor reservoir adaptability.

[0007] The technical solution adopted by this invention to solve its technical problem is as follows. This knowledge graph-driven intelligent design method for oil displacement agents constructs a knowledge graph for the oil displacement field from multi-source data and, based on the graph, mines the complex mapping relationships among molecular structure, performance, and environmental parameters to realize the intelligent design of oil displacement agent molecules. Specifically, it includes the following steps:

Step S1: Multi-source oil displacement data acquisition and knowledge extraction. Collect historical production data, reservoir data, oil displacement agent experimental data, and oil displacement agent literature data for the target oil reservoir block; at the same time, extract entities and relationships related to the molecular design of oil displacement agents from the literature data through entity recognition technology.

Step S2: Construct a knowledge graph for the field of oil displacement agents. Based on the structured data extracted in step S1, a graph database is used to construct a knowledge network of "entity-relationship-attribute" triples specific to the field of oil displacement agents. Through this knowledge network, molecular structure features, functional group chemical properties, macroscopic performance indicators, and reservoir environmental parameters are deeply integrated to form a knowledge graph that comprehensively represents the complex relationships in the field of oil displacement agents.

Step S3: Intelligent design of oil displacement agent molecules based on the knowledge graph. A graph convolutional neural network model is trained on the knowledge graph to mine the association rules among molecular structure, performance, and environment. A Transformer architecture model is used for forward performance prediction. At the same time, with the reservoir data and performance indicators of the target oil reservoir block as constraints, candidate molecular structures are generated inversely through a generative adversarial network, forming a candidate molecule library.

Step S4: Verify and evaluate the candidate molecules generated in step S3, and screen out the preferred molecules.

Step S5: Iteratively optimize the design parameters of the preferred molecules, dynamically adjusting those parameters, and finally output a technically feasible, economically reasonable, and environmentally adaptable oil displacement agent molecule design scheme.

[0008] In the above scheme, the experimental data in step S1 include physical property parameters, such as critical micelle concentration and oil-water interfacial tension; the literature data include characteristic parameters, such as emulsification rate and salt tolerance, extracted by natural language processing technology; and the reservoir data include environmental parameters, such as salinity and temperature. The Z-Score normalization method is used for preprocessing to eliminate the influence of dimensions, outlier removal adopts the 3σ criterion, and the entity recognition technology adopts a BERT-based pre-trained model.

[0009] The entities in step S2 of the above scheme include: oil displacement agent molecules, functional groups, physical property parameters, reservoir data, and experimental conditions. The relationships are semantic relationships, including "molecule-include-functional group", "functional group-determine-salt tolerance", "temperature-influence-interfacial tension", and "salinity-constraint-molecular stability". The oil displacement agent molecule entities include Gemini-type surfactants; the functional group entities include sulfonic acid groups, carboxyl groups, and quaternary ammonium salt groups; the graph database is Neo4j. There are 12 types of semantic relationship edges: "include", "influence", "determine", "constraint", "characterize", "located in", "have", "depend on", "optimize", "inhibit", "promote", and "equivalent to".

[0010] In the above scheme, step S3 specifically involves: training a graph convolutional neural network model on the knowledge graph to mine the association rules among molecular structure, performance, and environment; updating features by aggregating neighborhood node information; and obtaining the weight matrix and bias values from data in the oil displacement agent domain. Based on the mined association rules, a Transformer architecture model is used for forward performance prediction, taking the SMILES sequence of the oil displacement agent molecule as input and outputting predicted values of physical property parameters. Simultaneously, using the reservoir data and performance indicators of the target oil reservoir block as constraints, a generative adversarial network is used to inversely generate candidate molecular structures. In the optimization objective of the generation model, both real molecules and generated molecules are limited to molecular types that conform to the structural characteristics of oil displacement agents, forming a candidate molecule library for subsequent verification. The graph convolutional neural network model has a 3-layer structure, and its association prediction accuracy is no less than 90%. The Transformer architecture model is used for forward prediction of critical micelle concentration and interfacial tension. When designing the generative adversarial network, multiple constraints are imposed, including interfacial tension of less than 0.01 mN/m, salinity of no less than 20000 mg/L, and temperature of no less than 80℃.
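The three constraints quoted above can be read as a simple acceptance test on each generated candidate. The following is a minimal sketch of that reading, not the generator's actual loss term: the `Candidate` fields, the function name, and the placeholder identifiers are all hypothetical, and the tolerance fields assume that each candidate carries the salinity and temperature at which its predicted interfacial tension was evaluated.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; all names below are hypothetical.
MAX_IFT_MN_PER_M = 0.01        # interfacial tension must be below this
MIN_SALINITY_MG_PER_L = 20000  # salinity must be at least this
MIN_TEMPERATURE_C = 80         # temperature must be at least this


@dataclass(frozen=True)
class Candidate:
    name: str                # placeholder identifier, not a real molecule
    predicted_ift: float     # mN/m, predicted at the conditions below
    salinity: float          # mg/L at which the prediction was made
    temperature: float       # deg C at which the prediction was made


def satisfies_constraints(c: Candidate) -> bool:
    """Return True when the candidate meets all three quoted constraints."""
    return (
        c.predicted_ift < MAX_IFT_MN_PER_M
        and c.salinity >= MIN_SALINITY_MG_PER_L
        and c.temperature >= MIN_TEMPERATURE_C
    )


# Illustrative values only; they are not results from the source.
candidates = [
    Candidate("cand-A", 0.0085, 20000, 85),
    Candidate("cand-B", 0.0200, 25000, 90),
    Candidate("cand-C", 0.0050, 15000, 85),
]
library = [c for c in candidates if satisfies_constraints(c)]
```

In a real generative adversarial network these conditions would more likely enter as conditioning inputs or penalty terms during training; a post-hoc filter is simply the easiest form to inspect.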

[0011] In the above scheme, step S4 specifically involves: using a synthetic pathway prediction tool to evaluate the synthetic feasibility of candidate molecules and screening molecules with fewer than 5 synthetic pathway steps; simultaneously using a molecular simulation tool with a dedicated force field to calculate the total energy of the molecules, verifying the chemical stability of the molecules, and ensuring that the generated molecular structure has low energy and can exist stably; and through the above verification, selecting the preferred molecular scheme that combines synthetic feasibility, structural stability, and performance advantages.

[0012] In step S4 of the above scheme, the synthesis feasibility assessment specifically uses the Synthia synthesis pathway prediction tool, with a synthesis pathway of fewer than 5 steps as the feasibility criterion; the chemical stability verification specifically uses the COMPASS force field in Materials Studio software to perform molecular dynamics simulations, calculating the total energy of the system and screening molecules with lower energies to confirm their stability. The total molecular energy $E_{total}$ is calculated as $E_{total} = E_{bond} + E_{vdw} + E_{coulomb}$, where $E_{bond}$ is the bond energy, $E_{vdw}$ is the van der Waals energy, and $E_{coulomb}$ is the Coulomb energy.

[0013] In the above scheme, step S5 specifically involves: inputting the preferred molecule into a virtual optimization environment, using key molecular parameters as the state space, including hydrophilic-lipophilic balance value, carbon chain length, and linker length; employing an intelligent optimization algorithm for multi-objective iterative optimization, with the optimization objectives comprehensively considering oil displacement performance, raw material cost, and environmental friendliness; dynamically adjusting molecular parameters through a cumulative reward function and a policy function; and finally outputting a technically feasible, economically reasonable, and environmentally adaptable oil displacement agent molecular design scheme.

[0014] In step S5 of the above scheme, the hydrophilic-lipophilic balance value ranges from 3 to 18, the carbon chain length ranges from 8 to 20, and the linker length ranges from 2 to 8. The intelligent optimization algorithm specifically employs a deep deterministic policy gradient algorithm. The multi-objective optimization is specifically achieved through a reward function, which comprehensively considers the reduction in interfacial tension, raw material cost, and environmental toxicity. The weights for interfacial tension reduction, raw material cost, and environmental toxicity are allocated as 60%, 20%, and 20%, respectively. In the reward function, $R$ represents the reward value, $\Delta IFT$ represents the interfacial tension reduction, $IFT_0$ represents the reference interfacial tension, $Cost$ represents the estimated raw material cost of the current molecular design scheme, $Cost_{max}$ represents the preset cost ceiling threshold, $Toxicity$ represents the estimated environmental toxicity of the current molecular design scheme, and $Toxicity_{max}$ represents the preset upper limit threshold for toxicity.

Beneficial effects

[0015] 1. This invention, by constructing a knowledge graph in the field of oil displacement, systematically reveals the complex intrinsic relationship between molecular functional groups, reservoir environment, and oil displacement performance, breaking through the bottleneck of unclear molecular structure-activity relationships in traditional trial-and-error research and development. Mapping rules based on graph mining provide interpretable theoretical guidance for molecular design, realizing a paradigm shift from "experience-driven" to "knowledge-driven." This invention significantly improves the design efficiency and success rate of oil displacement agent molecules, greatly shortening the traditional three-to-five-year research and development cycle, while effectively reducing research and development costs. It provides a novel technical solution for the precise and efficient development of ultra-low interfacial tension chemicals in complex reservoir environments such as high temperature and high salinity.

[0016] 2. This invention combines knowledge graphs with intelligent algorithms to achieve intelligent and automated R&D processes, providing strong technical support for cost reduction and efficiency improvement in oil and gas fields.

[0017] 3. This invention deeply integrates multi-source knowledge to construct an interpretable mapping model linking "molecular structure-performance-environment," enabling knowledge-guided targeted molecular design and significantly improving design efficiency and success rate. This method...

4. The principles of this invention are clear; it is highly operable, has a high degree of intelligence, and is easy to implement in engineering. To make the above-mentioned objectives, features, and advantages of this invention more apparent and understandable.

[0018] 5. The core of this invention lies in constructing a dedicated knowledge graph for the research and development of oil displacement agents, and based on this graph, achieving a precise and efficient mapping from reservoir requirements to molecular structures.

Attached Figure Description

[0019] Figure 1 is a knowledge graph of oil displacement agents constructed from literature in this invention.
Figure 2 is the overall flowchart of the knowledge graph-constrained intelligent molecular design method for Gemini oil displacement agents provided in Embodiment 1 of the present invention.
Figure 3 shows the changes in training error and goodness of fit of the random forest model in this invention, where (a) is the mean squared error (MSE) as a function of the number of decision trees, and (b) is the coefficient of determination (R²) as a function of the number of decision trees.
Figure 4 is a heatmap of the self-attention mechanism of the Transformer model in this invention.
Figure 5 is a schematic diagram of the feature importance ranking based on the random forest model in this invention.
Figure 6 is a schematic diagram of the molecular performance forward prediction process in this invention, where (a) represents the input molecular structure parameters and (b) represents the predicted surface tension result.
Figure 7 is a schematic diagram of the candidate molecule structure generated by reverse design in this invention.

Detailed Implementation

[0020] The present invention will be further described below with reference to the accompanying drawings. This knowledge graph-driven intelligent molecular design method for oil displacement agents includes the following steps:

Step S1: Multi-source oil displacement data acquisition and knowledge extraction

[0021] For the target reservoir block under study, three types of core data were systematically collected. First, experimental data: at least 1200 sets of experimental data on Gemini surfactants were collected, with key characteristic parameters including critical micelle concentration and oil-water interfacial tension. Second, literature data: at least 5000 relevant research articles were collected, and key performance parameters such as emulsification rate and salt tolerance were extracted using natural language processing technology. Third, reservoir field data: actual production data from at least 30 blocks, focusing on environmental parameters such as salinity and temperature.

[0022] These multi-source heterogeneous data underwent rigorous preprocessing. For numerical data, the Z-Score standardization method was used to eliminate the influence of dimensions, and the mean and standard deviation involved in the calculation formula adopted industry standard values in the field of oil displacement agents. Simultaneously, the 3σ criterion was used to remove outliers, ensuring that the data quality met the accuracy requirements of oil displacement agent research. Entity recognition technology based on a BERT pre-trained model was used to accurately extract entities and semantic relationships closely related to the molecular design of oil displacement agents from massive literature data. These entities include core concepts such as oil displacement agent molecules, functional groups, performance indicators, and reservoir parameters, while the relationships include key correlations such as functional groups determining salt tolerance and temperature affecting interfacial tension. This series of professional data processing steps laid a solid and reliable data foundation for the subsequent construction of a high-quality knowledge graph in the field of oil displacement agents.

Step S2: Construction of a knowledge graph in the field of oil displacement agents

[0023] Based on the structured information extracted in step S1, this step utilizes the Neo4j graph database to construct a knowledge graph specific to oil displacement agents; Figure 1 shows such a knowledge graph constructed from literature. The knowledge graph employs an entity-relationship-attribute triple model, whose mathematical expression clearly defines the graph's structural composition. The entity set contains seven carefully designed core node types, each carrying specialized information required for oil displacement agent research and development.

[0024] Specifically, the molecule node for the oil displacement agent not only records the molecule name but also details key structural parameters such as carbon chain length and hydrophilic-lipophilic balance, as well as synthesis steps. The functional group node further subdivides the core functional groups of the oil displacement agent, such as sulfonic acid and carboxyl groups. The sulfonic acid group node explicitly indicates its salt tolerance critical value of 20,000 mg/L and its activity temperature range of 60-90℃, directly corresponding to the application requirements of high-temperature, high-salinity reservoirs. Performance index nodes include interfacial tension and critical micelle concentration, with the interfacial tension node setting a target threshold of no more than 0.01 mN/m to meet the technical requirements of ultra-low interfacial tension oil displacement. Reservoir parameter nodes include salinity and temperature, clearly indicating the source block and applicable oil displacement agent type. Experimental condition nodes cover synthesis temperature and reaction pH, while also indicating key process indicators such as functional group conversion rate and molecular stability. Characteristic parameter nodes, such as emulsification rate, are strictly correlated with their test conditions to ensure scenario adaptability. Application scenario nodes clearly indicate the main reservoir types, providing a basis for targeted reservoir lookup in the graph.

[0025] The relation set, serving as the core link connecting entities, contains twelve types of semantic relationships that strictly adhere to the development logic of oil displacement agents. These relationships specifically include: molecules containing functional groups; functional groups determining performance indicators; reservoir parameters constraining performance indicators; and experimental conditions affecting functional group activity. Each relation type corresponds to specific design requirements, forming a complete logical connection from molecular design to performance prediction and reservoir adaptation. The attribute set further enhances the practicality of entity information; all attributes focus on the key requirements of oil displacement agent design, collectively forming the foundation of the reliability of the graph data.
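The entity-relationship-attribute model described above can be illustrated without a graph database. The sketch below is a minimal in-memory stand-in, assuming a simple adjacency layout; the `TripleStore` class and its method names are hypothetical and are not part of Neo4j. The entities, relation labels, and the sulfonic acid attribute values are the ones quoted in the text.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


class TripleStore:
    """Hypothetical in-memory stand-in for the graph database."""

    def __init__(self):
        self._by_head = defaultdict(list)  # head entity -> outgoing triples
        self.attributes = {}               # entity -> attribute dict

    def add(self, head, relation, tail):
        self._by_head[head].append(Triple(head, relation, tail))

    def set_attributes(self, entity, **attrs):
        self.attributes.setdefault(entity, {}).update(attrs)

    def tails(self, head, relation):
        """Entities reachable from `head` over edges labelled `relation`."""
        return [t.tail for t in self._by_head.get(head, []) if t.relation == relation]


kg = TripleStore()
kg.add("Gemini surfactant", "include", "sulfonic acid group")
kg.add("sulfonic acid group", "determine", "salt tolerance")
kg.add("temperature", "influence", "interfacial tension")
kg.add("salinity", "constraint", "molecular stability")

# Attribute values quoted in the text for the sulfonic acid group node.
kg.set_attributes(
    "sulfonic acid group",
    salt_tolerance_mg_per_l=20000,
    active_temp_range_c=(60, 90),
)

# Two-hop lookup: which properties do the groups of a molecule determine?
groups = kg.tails("Gemini surfactant", "include")
determined = [p for g in groups for p in kg.tails(g, "determine")]
```

The two-hop lookup at the end mirrors, in miniature, how a query can walk from a molecule through its functional groups to the properties those groups determine.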

[0026] The final constructed knowledge graph for oil displacement agents encompasses over 5,800 entity nodes and over 12,600 semantic relationship edges, forming a comprehensive knowledge support capability covering the entire process from reservoir requirements to functional group selection and molecular design. In practical applications, researchers input the salinity and temperature parameters of the target reservoir, and the graph automatically matches performance requirements through the relationships between entities, recommending suitable functional group and carbon chain combinations, and simultaneously predicting the corresponding performance results, significantly shortening the matching cycle from reservoir requirements to molecular solutions.

Step S3: Intelligent molecular design of oil displacement agents based on knowledge graphs

[0027] This step is the core of achieving intelligent design, and it is divided into three progressive sub-steps that fully utilize the domain knowledge contained in the knowledge graph to drive the molecular design process.

[0028] (a) Discovering Association Rules. A graph convolutional neural network model was used to deeply train the constructed knowledge graph. By learning the update process of node feature vectors, the model can capture the complex association patterns hidden in the knowledge graph. During training, the weight matrix and bias values were optimized based on data from the oil displacement agent domain to ensure the model's accurate grasp of domain characteristics. The trained model exhibited strong association reasoning ability, with a graph association prediction accuracy of no less than 90%. Specifically, the model was able to quantify the association rule that sulfonic acid groups determine salt tolerance with a confidence level of up to 92.3%, and the confidence level of temperature affecting interfacial tension reached 89.7%.
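The node feature update described above, in which each node aggregates information from its neighbors and passes it through a learned weight matrix and bias, can be sketched in a few lines of NumPy. This is an illustrative forward pass only, with no training loop. The symmetric normalization with self-loops is an assumption borrowed from the common GCN formulation, since the text does not give the exact update rule; the toy graph, layer sizes, and random weights are likewise hypothetical.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 (assumed normalization, with self-loops)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(adj, features, weights, biases):
    """Stacked graph convolutions: H <- ReLU(A_norm @ H @ W + b) per layer.

    The final layer is left linear so its output can feed a task-specific head.
    """
    a_norm = normalize_adjacency(adj)
    h = features
    for i, (w, b) in enumerate(zip(weights, biases)):
        h = a_norm @ h @ w + b          # aggregate neighbors, then transform
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)      # ReLU on hidden layers only
    return h


# Toy graph: node 1 is connected to nodes 0, 2 and 3.
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 1],
     [0, 1, 0, 0],
     [0, 1, 0, 0]],
    dtype=float,
)
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))      # 8 input features per node
dims = [8, 16, 16, 4]                   # three layers, as in the text
weights = [rng.normal(scale=0.1, size=(dims[i], dims[i + 1])) for i in range(3)]
biases = [np.zeros(dims[i + 1]) for i in range(3)]

embeddings = gcn_forward(adj, features, weights, biases)
```

With three layers, each node's final embedding depends on nodes up to three hops away, which is one plausible reason for choosing that depth on a graph whose chains run molecule, functional group, performance indicator, reservoir parameter.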

[0029] (b) Forward prediction of molecular performance. Based on the mined association rules, a deep learning model with a Transformer architecture is used for performance prediction. This model takes the SMILES sequence of the oil displacement agent molecule as input and captures the long-range dependencies between atoms in the molecular sequence through its multi-head self-attention mechanism (the heatmap of its self-attention weights is shown in Figure 4). As the heatmap shows, the intrinsic relationships between key characteristics, such as carbon atoms and hydrophobicity or oxygen atoms and hydrophilicity, are particularly evident. During the model's calculations, the query matrix and key matrix represent the atomic characteristics of the Gemini molecule, while the value matrix corresponds to the oil displacement performance characteristics. After training, the model can accurately predict key performance indicators of the molecule under specific reservoir conditions, including critical micelle concentration and interfacial tension. Taking a typical Gemini surfactant molecule as an example, under conditions of 20000 mg/L salinity and 85℃, the model predicts its critical micelle concentration to be 0.032 mmol/L and its interfacial tension to be 0.0085 mN/m, fully meeting the target requirement of ultra-low interfacial tension.
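The role of the query, key, and value matrices can be made concrete with a single attention head. The sketch below implements standard scaled dot-product self-attention in NumPy; it uses one head rather than the multi-head mechanism described, and the character-level tokenization, the toy input string, the embedding size, and the random projection matrices are all assumptions for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    Returns the attended output and the attention weight matrix, whose
    entry [i, j] says how strongly token i attends to token j.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    attn = softmax(scores, axis=-1)
    return attn @ v, attn


# Toy input: character-level tokens of a short SMILES string (ethanol),
# used only as a placeholder and not as an oil displacement agent.
tokens = list("CCO")
vocab = {ch: i for i, ch in enumerate(sorted(set(tokens)))}

rng = np.random.default_rng(0)
d_model = 8
embedding_table = rng.normal(size=(len(vocab), d_model))
x = embedding_table[[vocab[t] for t in tokens]]     # shape (3, d_model)

w_q = rng.normal(scale=0.3, size=(d_model, d_model))
w_k = rng.normal(scale=0.3, size=(d_model, d_model))
w_v = rng.normal(scale=0.3, size=(d_model, d_model))

output, attn = self_attention(x, w_q, w_k, w_v)
```

Plotting `attn` as an image gives the kind of token-by-token heatmap the paragraph refers to; a trained model would show structure there, whereas these random weights will not.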

[0030] (c) Reverse Design Process. Using target reservoir parameters and performance indicators as multiple constraints, a generative adversarial network (GAN) is driven to perform reverse molecular generation. This generative model learns the structural distribution characteristics of real Gemini displacement agent molecules through an adversarial training process between the discriminator and the generator. During optimization, the generator aims to produce candidate molecules that meet strict constraints such as interfacial tension not exceeding 0.01 mN/m, salinity not less than 20000 mg/L, and temperature not less than 80℃, while also possessing typical structural characteristics of displacement agents.

Step S4: Verification and Evaluation of Design Results

[0031] This step involves a systematic and rigorous validation and evaluation of the numerous candidate molecules generated through reverse engineering, ensuring that the resulting molecules not only meet performance standards but also possess the potential for practical synthesis and application. Validation is primarily conducted from two dimensions: synthetic feasibility and chemical stability.

[0032] For synthetic feasibility assessment, the Synthia synthetic pathway prediction tool was used to systematically evaluate all candidate molecules. Based on a rich organic synthesis knowledge base and reaction rules, this tool can intelligently plan the synthetic pathway for each molecule and estimate the synthetic difficulty. The evaluation results showed that among the 100 generated candidate molecules, 68 molecules had synthetic pathways with fewer than 5 steps, indicating that most candidate molecules have good synthetic feasibility and can meet the economic requirements for industrial production. In particular, molecules with relatively simple structures and high functional group reactivity exhibited even simpler and more efficient synthetic pathways.

[0033] For chemical stability verification, the COMPASS force field in Materials Studio molecular simulation software was used to perform molecular dynamics simulations of the system. By calculating the total energy of the system for each molecule, including important energy terms such as bond energy, van der Waals energy, and Coulomb energy, the thermodynamic stability of the molecules can be comprehensively evaluated. The calculation results show that most of the generated molecules have low total energy, indicating that these molecules are thermodynamically stable and can maintain structural integrity under practical application conditions.
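The two screening dimensions can be combined into one small routine: keep candidates whose predicted route has fewer than 5 steps, then rank the survivors by total energy. The sketch below assumes the route lengths and energy terms have already been obtained from the external tools; it does not call Synthia or Materials Studio. The record layout, the `keep` cutoff, and every numeric value are hypothetical placeholders in arbitrary units.

```python
from dataclasses import dataclass

MAX_ROUTE_STEPS = 5  # feasibility criterion from the text: fewer than 5 steps


@dataclass(frozen=True)
class Evaluated:
    name: str         # placeholder identifier
    route_steps: int  # length of the predicted synthesis route
    e_bond: float     # bond energy term
    e_vdw: float      # van der Waals energy term
    e_coulomb: float  # Coulomb energy term

    @property
    def e_total(self) -> float:
        """Total energy as the sum of the three terms named in the text."""
        return self.e_bond + self.e_vdw + self.e_coulomb


def screen(candidates, keep=2):
    """Drop infeasible routes, then return the `keep` lowest-energy molecules."""
    feasible = [c for c in candidates if c.route_steps < MAX_ROUTE_STEPS]
    return sorted(feasible, key=lambda c: c.e_total)[:keep]


# Illustrative numbers only; they are not simulation results.
pool = [
    Evaluated("cand-A", 3, 10.0, -5.0, -20.0),   # total -15.0
    Evaluated("cand-B", 6, 5.0, -5.0, -30.0),    # rejected: 6 steps
    Evaluated("cand-C", 4, 12.0, -3.0, -12.0),   # total -3.0
    Evaluated("cand-D", 4, 8.0, -6.0, -25.0),    # total -23.0
]
preferred = screen(pool, keep=2)
```

Ranking by energy rather than thresholding is a design choice made here because the text speaks only of "lower" energy and gives no cutoff value.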

[0034] Through rigorous screening across the two dimensions mentioned above, a number of preferred molecular schemes that combine synthetic feasibility, structural stability, and performance advantages were finally obtained.

Step S5: Iterative optimization of design parameters

[0035] The validated and selected molecules were placed in a virtual optimization environment built on OpenAI Gym, where in-depth design parameter optimization iterations were carried out. Three key molecular descriptors were designed for the optimization system's state space: hydrophilic-lipophilic balance, carbon chain length, and linker length. The reasonable range for the hydrophilic-lipophilic balance was set to 3 to 18, the carbon chain length to 8 to 20 carbon atoms, and the linker length to 2 to 8 carbon atoms.

[0036] A deep deterministic policy gradient algorithm is employed as the optimization engine. This algorithm dynamically adjusts molecular parameters to seek the optimal solution through the synergistic effect of the cumulative reward function and the policy function. The reward function design comprehensively considers three key objectives: oil displacement performance, feedstock cost, and environmental friendliness, assigning corresponding weight coefficients. Specifically, the reduction in interfacial tension accounts for 60% of the weight, feedstock cost control accounts for 20%, and environmental toxicity indicators also account for 20%. This weight allocation achieves a balance between economic efficiency and environmental friendliness while ensuring oil displacement performance.
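The state bounds and the 60/20/20 weighting can be written down directly, although the exact reward formula is not reproduced in the text. The sketch below therefore assumes one plausible form: the normalized interfacial tension reduction is rewarded, and normalized cost and toxicity are subtracted as penalties. The function names and the clipping helper are hypothetical, and the deep deterministic policy gradient agent itself is not implemented here.

```python
# Weights and parameter ranges quoted in the text.
W_IFT, W_COST, W_TOX = 0.6, 0.2, 0.2
BOUNDS = {
    "hlb": (3.0, 18.0),            # hydrophilic-lipophilic balance value
    "chain_length": (8.0, 20.0),   # carbon atoms
    "linker_length": (2.0, 8.0),   # carbon atoms
}


def clip_state(state):
    """Project a proposed parameter set back into the allowed ranges."""
    return {k: min(max(state[k], lo), hi) for k, (lo, hi) in BOUNDS.items()}


def reward(delta_ift, ift0, cost, cost_max, toxicity, toxicity_max):
    """Assumed reward form (not confirmed by the source):

    R = 0.6 * dIFT/IFT0 - 0.2 * Cost/Cost_max - 0.2 * Toxicity/Toxicity_max
    """
    return (
        W_IFT * delta_ift / ift0
        - W_COST * cost / cost_max
        - W_TOX * toxicity / toxicity_max
    )


# An out-of-range proposal is clipped before it is scored.
proposal = clip_state({"hlb": 25, "chain_length": 16, "linker_length": 1})

# Illustrative inputs only: half of the reference IFT removed, cost at 50%
# of its ceiling, toxicity at 20% of its ceiling.
r = reward(delta_ift=0.5, ift0=1.0, cost=50, cost_max=100,
           toxicity=20, toxicity_max=100)
```

Dividing each term by its reference value keeps the three contributions on a comparable scale, so the weights express relative priority rather than compensating for unit differences.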

[0037] After multiple rounds of iterative optimization, the algorithm ultimately optimized the hydrophilic-lipophilic balance value to 10.5, the carbon chain length to 16 carbon atoms, and the linker length to 4 carbon atoms, achieving a significant reduction in interfacial tension. Simultaneously, in terms of economic efficiency, it reduced raw material costs by 22% compared to the estimated cost of conventional sulfonate-type Gemini surfactants. This step, together with the knowledge graph in step S2 and the prediction model in step S3, forms a closed loop, with the optimized molecular parameters fed back into the knowledge graph, enabling the model's self-evolution and continuous optimization.

[0038] The final oil displacement agent molecular design scheme achieves comprehensive optimization in terms of technical feasibility, economic rationality, and environmental adaptability, forming a complete closed-loop optimization solution from molecular design to industrial application. It not only meets the stringent requirements of high-temperature and high-salinity reservoir environments but also offers comprehensive advantages for large-scale application.

Example

[0039] This embodiment uses a high-temperature, high-salinity block in the Shengli Oilfield of China as the target reservoir, with a salinity of 20,000 mg/L and a temperature of 85℃. A Gemini surfactant designed according to this invention exhibits an interfacial tension of less than 0.01 mN/m under these harsh conditions. The specific application process of the method is shown in Figure 2, and the steps are as follows:

Step S1: Acquisition of multi-source oil displacement data and knowledge extraction.

[0040] The system collected historical production data, publicly available experimental data on Gemini surfactants from both domestic and international sources, and relevant research literature for the target block. It collected no fewer than 1200 sets of experimental data, covering core physical properties such as critical micelle concentration (CMC) and interfacial tension (IFT); no fewer than 5000 literature articles, from which characteristic parameters such as emulsification rate and salt tolerance were extracted using natural language processing; and reservoir data from no fewer than 30 blocks, including environmental parameters such as salinity and temperature. For numerical data, Z-Score normalization was used for preprocessing to eliminate dimensional influences; the calculation formula is as follows:

$$x' = \frac{x - \mu}{\sigma} \tag{1}$$

[0041] In the formula, x represents the original data related to the oil displacement agent, μ represents the industry mean of this type of data, σ represents the industry standard deviation of this type of data, and x′ represents the standardized data. The reservoir parameters obtained after preprocessing are: salinity 20,000 mg/L, temperature 85℃. Simultaneously, entity recognition technology based on a BERT pre-trained model is used to extract entities and relationships such as "sulfonic acid group - salt tolerance" and "carbon chain length - interfacial tension" from the literature data. The 3σ criterion is used to remove outliers and ensure data quality. The data processing procedure and some features are shown in Table 1.
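The preprocessing described above can be sketched as follows. This is a minimal illustration, assuming NumPy and assuming that the mean and standard deviation are estimated from the sample itself when no industry values are supplied; the function names `remove_outliers_3sigma` and `z_score` and the sample values are hypothetical, not part of the described method.

```python
import numpy as np

def remove_outliers_3sigma(values):
    """Keep only values within three standard deviations of the sample mean."""
    values = np.asarray(values, dtype=float)
    mu, sigma = values.mean(), values.std()
    if sigma == 0:
        return values  # all values identical, nothing to remove
    return values[np.abs(values - mu) <= 3 * sigma]

def z_score(values, mu=None, sigma=None):
    """Z-Score normalization x' = (x - mu) / sigma.

    mu and sigma default to the sample statistics (an assumption);
    pass industry-level values to use those instead.
    """
    values = np.asarray(values, dtype=float)
    mu = values.mean() if mu is None else mu
    sigma = values.std() if sigma is None else sigma
    return (values - mu) / sigma

# Illustrative (made-up) measurements: twenty consistent readings and one gross outlier.
readings = [1.0] * 20 + [100.0]
cleaned = remove_outliers_3sigma(readings)
standardized = z_score([1.0, 2.0, 3.0])
```

Outlier removal is applied before normalization here so that a single extreme value does not distort the mean and standard deviation used for scaling.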

[0042]

| Data type | Sample size | Core features | Processing method |
| --- | --- | --- | --- |
| Experimental data | 1200+ | CMC, IFT | Outlier removal (3σ criterion) |
| Literature data | 5000+ articles | Emulsification rate, salt resistance | BERT entity recognition |
| Reservoir data | 30+ blocks | Mineralization, temperature | Z-Score standardization |

Step S2: Construction of a knowledge graph for the oil displacement field.

[0043] Using the Neo4j graph database as a carrier, the experimental data, literature extraction information, and reservoir parameters preprocessed in step S1 are integrated and transformed into a structured knowledge network of oil displacement agents. This network is implemented through an "entity-relationship-attribute" triple model, with the mathematical expression as follows:

$$G = (E, R, A) \tag{2}$$

[0044] Here, G represents the overall knowledge graph in the field of oil displacement agents, E is the entity set, R is the relation set, and A is the attribute set. This model establishes the entire link for oil displacement agent molecules, from microstructure to macroscopic performance and reservoir environment adaptation.

[0045] The entity set E of the triple model contains 7 types of nodes with clearly defined functions, each carrying attributes directly related to oil displacement performance. Gemini molecular nodes, such as "Gemini-001", record structural parameters such as carbon chain length and hydrophilic-lipophilic balance and also indicate the synthesis steps. Functional group nodes are subdivided into the core functional groups of the oil displacement agent, such as sulfonic acid groups and carboxyl groups. The sulfonic acid group node specifies a salt tolerance critical value of 20,000 mg/L and an activity temperature range of 60-90℃, matching the requirements of high-temperature, high-salinity reservoirs, while the carboxyl group node corresponds to a salt tolerance critical value of 15,000 mg/L; this difference in attributes directly reflects the influence of functional groups on the environmental resistance of the oil displacement agent. Performance index nodes include "interfacial tension-01" and "CMC-02". The interfacial tension node sets a target threshold of ≤0.01 mN/m to meet the technical requirement of ultra-low interfacial tension oil displacement, and the critical micelle concentration node sets a threshold of ≤0.05 mmol/L to ensure stable oil displacement at low dosages. Reservoir parameter nodes include "mineralization - 20000" and "temperature - 85"; the former records its source as the Shengli Oilfield block, and the latter records compatibility with Gemini-type oil displacement agents, directly linking the reservoir environment to molecular design. Experimental condition nodes cover "synthesis temperature - 80" and "reaction pH - 7" and record key indicators such as functional group conversion rate ≥90% and molecular stability ≥85%, providing data support for subsequent synthesis feasibility assessment. Characteristic parameter nodes, such as "emulsification rate - 92", must be associated with the specific test condition of 20,000 mg/L mineralization so that each parameter is tied to its scenario. Application scenario nodes, such as "Shengli Oilfield block - 01", record the main reservoir type as high-temperature and high-salinity, providing a basis for targeted reservoir queries in the graph.

[0046] The relation set R is the core link connecting entities. All 12 semantic relations follow the logical chain of oil displacement agent R&D, and each corresponds to a specific R&D decision-making need. The "Gemini molecule - inclusion - functional group" relation, typified by "Gemini-001 - inclusion - sulfonic acid group - 01", establishes which functional groups belong to which molecular structure, providing a path for later inferring molecular performance from functional groups. The "functional group - determines - performance index" relation, such as "sulfonic acid group - 01 - determines - CMC - 02 ≤ 0.04 mmol/L" and "carboxyl group - 02 - determines - salt tolerance ≤ 15000 mg/L", quantifies the mapping between functional group type and core performance, which is key to structure-activity relationship mining. The "reservoir parameter - constraint - performance index" relation, such as "mineralization - 20000 - constraint - salt tolerance ≥ 20000 mg/L" and "temperature - 85 - constraint - interfacial tension stability ≥ 8 h", turns the stringent requirements of the reservoir environment into hard constraints on performance indicators, ensuring that the designed molecules can adapt to real reservoirs. The "experimental conditions - influence - functional group activity" relation, such as "synthesis temperature - 80 - influence - sulfonic acid group - 01 activity ≥ 90%", reveals the impact of the preparation process on functional group effectiveness, providing a reference for optimizing the synthesis process. In addition, the "performance indicators - characterization - oil displacement effect" relation, such as "interfacial tension - 01 ≤ 0.01 mN/m - characterization - oil displacement efficiency ≥ 85%", the "application scenario - adaptation - molecular type" relation, such as "Shengli Oilfield block - 01 - adaptation - Gemini-001", and the "carbon chain length - correlation - hydrophobic properties" and "linker length - adjustment - HLB value" relations all revolve around the entire oil displacement agent development process of "molecular design → performance prediction → reservoir adaptation → cost control", forming a closed-loop correlation logic.

[0047] The attribute set A makes the entity information more practical. All attributes focus on key indicators of oil displacement agents. For example, "Gemini-001 - Carbon chain length: C16" specifies the core hydrophobic parameter of the molecule, "Sulfonic acid group - 01 - Salt tolerance threshold: 20000 mg/L" relates directly to the salt tolerance requirements of the reservoir, and "Mineralization - 20000 - Test time: 2024" records how recent the reservoir data is. Together, these attributes form the basis for the credibility of the graph data.

[0048] Ultimately, the knowledge graph for oil displacement agents constructed in this embodiment contains over 5,800 entity nodes (covering mainstream Gemini oil displacement agent molecules, 12 core functional groups, and parameters of over 30 typical reservoir blocks) and over 12,600 semantic relationship edges, forming a comprehensive knowledge support capability covering the entire process from "reservoir requirements → functional group selection → molecular design → performance prediction." In practical applications, by inputting the parameters of "mineralization 20,000 mg / L + temperature 85℃" for the target reservoir, the graph can automatically match the performance requirement of "salt tolerance ≥ 20,000 mg / L" through the correlation between entities, and then recommend a suitable "sulfonic acid group + C16 carbon chain" combination, while simultaneously predicting the corresponding performance result of "interfacial tension ≤ 0.01 mN / m," significantly shortening the matching cycle from reservoir requirements to molecular solutions, and truly realizing knowledge-driven development of oil displacement agents.
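The matching from reservoir parameters to a recommended functional group can be sketched with a small in-memory triple store. This is only an illustration of the "constraint" and "determines" relations described above, not the Neo4j implementation; the class `TripleGraph`, the function `recommend_groups`, and the simplified node names are hypothetical, while the two salt tolerance values (20,000 and 15,000 mg/L) are taken from the embodiment.

```python
from collections import defaultdict

class TripleGraph:
    """Minimal (head, relation, tail) store; a stand-in for a graph database."""

    def __init__(self):
        self._out = defaultdict(list)

    def add(self, head, relation, tail):
        self._out[(head, relation)].append(tail)

    def tails(self, head, relation):
        return list(self._out.get((head, relation), []))

def recommend_groups(graph, reservoir_node, candidate_groups):
    """Return functional groups whose determined performance meets every
    requirement that the reservoir node imposes through 'constraint' edges."""
    required = dict(graph.tails(reservoir_node, "constraint"))
    picks = []
    for group in candidate_groups:
        provided = dict(graph.tails(group, "determines"))
        if all(provided.get(name, float("-inf")) >= minimum
               for name, minimum in required.items()):
            picks.append(group)
    return picks

graph = TripleGraph()
# Reservoir requirement: salt tolerance of at least 20,000 mg/L.
graph.add("mineralization-20000", "constraint", ("salt tolerance", 20000))
# Functional group attributes from the embodiment.
graph.add("sulfonic acid group", "determines", ("salt tolerance", 20000))
graph.add("carboxyl group", "determines", ("salt tolerance", 15000))

picks = recommend_groups(graph, "mineralization-20000",
                         ["sulfonic acid group", "carboxyl group"])
print(picks)  # ['sulfonic acid group']
```

In a graph database the same lookup would be a two-hop path query; the dictionary version only shows the logic of following relation edges from a reservoir node to a compatible functional group.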

[0049] Step S3: Intelligent design of oil displacement agent molecules based on knowledge graph.

[0050] The constructed knowledge graph is trained using a Graph Convolutional Neural Network (GCN) model. GCN updates node features by aggregating neighborhood node information, and its feature update formula is as follows:

[0051] In the formula, $H_i^{(l+1)}$ is the feature vector of oil displacement agent entity i (e.g., "sulfonic acid group - 01") at layer l+1 after aggregating neighborhood information; N(i) represents the neighborhood entities of node i (e.g., "salt tolerance - 20000 mg/L" and "mineralization - 20000"); $W^{(l)}$ is the oil displacement agent domain-specific weight matrix (e.g., "sulfonic acid group - salt tolerance" association weight 0.92, "carboxyl group - salt tolerance" association weight 0.75); $b^{(l)}$ represents the domain bias; and σ is the ReLU activation function. After training, the model exhibits strong associative reasoning capabilities, with a graph association prediction accuracy of no less than 90%, uncovering association rules specific to oil displacement agents. For example, the confidence level for predicting the association rule "sulfonic acid group - determines - salt tolerance" is as high as 92.3%, and the confidence level for "temperature - influence - interfacial tension" is 89.7%.
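A minimal NumPy sketch of one neighborhood-aggregation update of this kind is given below. It assumes mean aggregation over N(i) together with a self-loop, followed by a linear map, a bias, and ReLU; the normalization and the self-loop are assumptions, as are the function name `gcn_layer` and the toy inputs.

```python
import numpy as np

def gcn_layer(H, neighbors, W, b):
    """One graph-convolution update.

    H         : (n_nodes, d_in) node feature matrix at layer l
    neighbors : list of neighbor index lists, neighbors[i] = N(i)
    W         : (d_in, d_out) weight matrix W^(l)
    b         : (d_out,) bias b^(l)
    Returns the (n_nodes, d_out) feature matrix at layer l+1.
    """
    n_nodes = H.shape[0]
    out = np.zeros((n_nodes, W.shape[1]))
    for i in range(n_nodes):
        idx = list(neighbors[i]) + [i]      # N(i) plus a self-loop (assumption)
        aggregated = H[idx].mean(axis=0)    # mean aggregation (assumption)
        out[i] = aggregated @ W + b
    return np.maximum(out, 0.0)             # ReLU activation

# Toy graph with three entities: 0 - 1 - 2 (a chain).
H0 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
N = [[1], [0, 2], [1]]
H1 = gcn_layer(H0, N, W=np.eye(2), b=np.zeros(2))
```

Stacking this function three times, with a separate `W` and `b` per layer, would give the 3-layer depth that the verification example later identifies as the best trade-off.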

[0052] Based on the discovered association rules, molecular performance prediction and inverse design are performed. A deep learning model based on the Transformer architecture is used for forward performance prediction. Taking the SMILES sequence of a typical Gemini surfactant molecule as input, the model uses its multi-head self-attention mechanism (as shown in Figure 4) to capture long-range dependencies in the molecule such as "carbon atom-hydrophobicity" and "oxygen atom-hydrophilicity". The calculation formula is as follows:

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{4}$$

[0053] In the formula, Q (query) and K (key) are the atomic feature matrices of the Gemini molecule, V (value) is the oil displacement performance feature matrix, and $d_k$ represents the molecular feature dimension. The model outputs the predicted performance values of the molecule under the conditions of 20,000 mg/L mineralization and 85℃: the predicted critical micelle concentration (CMC) is 0.032 mmol/L, and the predicted interfacial tension (IFT) is 0.0085 mN/m, which meets the target requirements.
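For readers who want to see the attention computation concretely, the following is a minimal NumPy sketch of single-head scaled dot-product attention, softmax(QKᵀ/√d_k)V. The function names and the toy matrices are illustrative assumptions; a multi-head layer would apply this to several learned projections of the token features and concatenate the results.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Return (output, weights) for single-head attention.

    Q : (n_query, d_k), K : (n_key, d_k), V : (n_key, d_v)
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise similarity of tokens
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

# Toy example: 3 query tokens attending over 5 key/value tokens.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 2))
output, weights = scaled_dot_product_attention(Q, K, V)
```

Each row of `weights` shows how strongly one token of the SMILES sequence attends to every other token, which is what allows distant atoms in the chain to influence one another's representation.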

[0054] The training process of the forward prediction model is shown in Figure 3: the mean squared error (MSE) decreased rapidly and then stabilized, while the coefficient of determination (R²) rose to a high level, indicating that the model training was effective and the prediction accuracy was good. Figure 5 shows the ranking of model feature importance, revealing that key structural parameters such as the number of carbon atoms and the lipid-water partition coefficient contribute the most to surface tension prediction, consistent with physicochemical rules mined from the knowledge graph, such as "functional groups determine salt tolerance" and "carbon chain length affects interfacial tension". The forward prediction process for molecular properties is shown in Figure 6, which traces the path from molecular structure feature input to performance index prediction output.

[0055] Then, reverse design is initiated, driving a generative adversarial network (GAN) under multiple constraints on target reservoir parameters and performance indicators, such as IFT ≤ 0.01 mN/m, salinity ≥ 20,000 mg/L, and temperature ≥ 80℃. This generative model learns the distribution of real Gemini displacement agent molecules through adversarial training, and its optimization objective can be expressed as:

$$\min_{G}\max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right] \tag{5}$$

[0056] In the formula, x represents the actual Gemini oil displacement agent molecule structure, z represents the molecular random noise vector, G(z) represents the generated candidate Gemini oil displacement agent molecule, and D is the discriminator. In this embodiment, 100 candidate molecules covering different hydrophilic head groups such as sulfonates and sulfates, as well as different linking groups, were generated in reverse. Among them, 68 molecules meet the core structure requirements of "amphiphilic structure + sulfonic acid group / C14-C18 carbon chain" for oil displacement agents, so the structural qualification rate of the generated molecules is not less than 68%.

[0057] Step S4: Verification and evaluation of the design results.

[0058] Of the 100 candidate molecules generated, 68 were evaluated using the Synthia synthetic pathway prediction tool and showed good synthetic feasibility, with fewer than 5 steps in their synthetic pathways. Furthermore, the stability of the generated molecules was evaluated using the COMPASS force field from Materials Studio; the total energy was calculated using the following formula:

$$E_{\text{total}} = E_{\text{bond}} + E_{\text{vdw}} + E_{\text{coulomb}} \tag{6}$$

In the formula, $E_{\text{bond}}$ is the bond energy, $E_{\text{vdw}}$ is the van der Waals energy, and $E_{\text{coulomb}}$ is the Coulomb energy. Calculations show that the generated molecular structures have low energy and good stability. These candidate molecules cover different hydrophilic head groups such as sulfonates, sulfates, and carboxylates, as well as flexible or rigid linkers of different lengths. Some of the generated molecules and their performance examples are shown in Tables 2 and 3. A partial visualization of the two-dimensional structures of these candidate molecules is shown in Figure 7, which visually demonstrates the structural diversity of molecules generated by reverse design.

Step S5: Iterative optimization of design parameters.

[0060] The candidate molecules were input into a virtual optimization environment built on OpenAI Gym. Its state space includes key molecular descriptors such as hydrophilic-lipophilic balance (HLB), carbon chain length, and linker length. Their value ranges are shown in Table 4.

[0061] Table 4 State-space parameters and ranges

| Parameter | Range | Physical meaning |
| --- | --- | --- |
| HLB value | 3-18 | Hydrophilic-lipophilic balance |
| Carbon chain length | 8-20 | Hydrophobic chain carbon number |
| Linker length | 2-8 | Number of carbon atoms between the amphiphilic groups |

The Deep Deterministic Policy Gradient (DDPG) algorithm is used for optimization, and its cumulative reward and policy function are as follows:

$$G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \tag{7}$$

$$a = \mu(s \mid \theta^{\mu}) \tag{8}$$

[0062] In Equation 7, $G_t$ is the accumulated reward at time t, $R_{t+1}$ is the immediate reward, and γ (0 < γ < 1) is the discount factor. In Equation 8, μ is the Actor policy function, s is the state, a is the action output by the policy, and $\theta^{\mu}$ denotes the network parameters. The reward function integrates the reduction in interfacial tension (weighted at 60%), cost (20%), and toxicity (20%), and its specific mathematical expression is given in Equation (9).

[0063] $\text{Cost}_{\max}$ and $\text{Toxicity}_{\max}$ are preset thresholds that can be set according to the economic evaluation standards and environmental regulations of specific oil reservoir projects. After multiple rounds of iterative optimization, the algorithm finally optimized the hydrophilic-lipophilic balance (HLB) value to 10.5, the carbon chain length to 16, and the linker length to 4. For the final molecular scheme selected according to these optimal parameter boundaries, the raw material cost is reduced by 22% compared with the estimated cost of conventional sulfonate-type Gemini surfactants. The oil displacement agent molecule obtained in this embodiment is a Gemini-type surfactant, which is used to formulate an ultra-low interfacial tension oil displacement agent. It can reduce the oil-water interfacial tension to below 0.01 mN/m and is suitable for high-temperature and high-salinity oil reservoir environments with a salinity of not less than 20,000 mg/L and a temperature of not less than 85°C.
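One way the weighted reward and the discounted cumulative reward could be computed is sketched below. It assumes that the interfacial tension reduction ΔIFT is normalized by a reference value IFT0 and that cost and toxicity enter as penalties normalized by Cost_max and Toxicity_max; the signs and the normalization are assumptions, as are the function names and the numeric inputs. Only the 60% / 20% / 20% weights and the discount factor of 0.95 come from the text.

```python
def reward(delta_ift, ift0, cost, cost_max, toxicity, toxicity_max,
           weights=(0.6, 0.2, 0.2)):
    """Weighted multi-objective reward.

    Assumed form: R = w1 * dIFT/IFT0 - w2 * Cost/Cost_max - w3 * Tox/Tox_max.
    A larger interfacial tension reduction raises the reward; higher cost
    or toxicity lowers it.
    """
    w_ift, w_cost, w_tox = weights
    return (w_ift * delta_ift / ift0
            - w_cost * cost / cost_max
            - w_tox * toxicity / toxicity_max)

def discounted_return(rewards, gamma=0.95):
    """Cumulative reward G_t = sum_k gamma^k * R_{t+k+1} for a finite episode."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Illustrative (made-up) episode of three optimization steps.
step_rewards = [
    reward(delta_ift=0.2, ift0=1.0, cost=0.9, cost_max=1.0, toxicity=0.5, toxicity_max=1.0),
    reward(delta_ift=0.6, ift0=1.0, cost=0.8, cost_max=1.0, toxicity=0.4, toxicity_max=1.0),
    reward(delta_ift=0.9, ift0=1.0, cost=0.78, cost_max=1.0, toxicity=0.3, toxicity_max=1.0),
]
episode_return = discounted_return(step_rewards, gamma=0.95)
```

With this form, a candidate that exceeds the cost or toxicity threshold is penalized by more than the full 0.2 weight, so the thresholds act as soft limits rather than hard cut-offs.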

[0064] Verification example: Hyperparameter sensitivity analysis and optimization

This validation example evaluates the impact of different settings of key hyperparameters in the model on the overall performance of the method, so as to ensure the robustness and reproducibility of the method.

[0065] (1) Verification of the impact of GCN layer count: The depth of a graph convolutional neural network (GCN) directly affects its ability to extract features from a knowledge graph. The performance of 2-layer, 3-layer, and 4-layer GCN models in knowledge graph association prediction was tested. As shown in Table 5, the 3-layer GCN achieves the best balance between test accuracy and training efficiency. When the number of layers increases to 4, although the training accuracy improves slightly, the test accuracy decreases, indicating overfitting. Simultaneously, the training time also increases significantly. Therefore, the 3-layer GCN is the better choice.

[0066] Table 5 Performance comparison of different GCN layer numbers

| GCN layers | Training accuracy | Test accuracy | Training time (s/epoch) |
| --- | --- | --- | --- |
| 2 | 92.1% | 88.5% | 15 |
| 3 | 96.8% | 90.3% | 18 |
| 4 | 98.5% | 89.7% | 25 |

(2) Validation of the reinforcement learning discount factor γ: The optimization effects under three settings, γ=0.90, 0.95, and 0.99, were compared. As shown in Table 6, γ=0.95 best balances immediate performance and long-term stability: it obtains the highest cumulative reward (168) and the lowest cost index (0.78), with a final interfacial tension of 0.008 mN/m that is matched only by γ=0.99. When γ=0.90, both the final interfacial tension and the cost index are higher, while when γ=0.99, the optimization process converges slowly and the overall result is not optimal.

Table 6 Optimization results under different discount factors γ

| γ | Final IFT (mN/m) | Final cost index | Cumulative reward |
| --- | --- | --- | --- |
| 0.90 | 0.009 | 0.85 | 152 |
| 0.95 | 0.008 | 0.78 | 168 |
| 0.99 | 0.008 | 0.82 | 155 |

In summary, the above verification examples demonstrate that the performance of the method provided by this invention is significantly affected by the settings of key hyperparameters. Based on the data scale and complexity of the specific task, an appropriate GCN depth (e.g., 3 layers) and reinforcement learning discount factor (e.g., 0.95) should be selected to achieve the optimal balance between performance and efficiency.

Claims

1. A knowledge graph-driven intelligent design method for oil displacement agent molecules, characterized in that: a knowledge graph for oil displacement is constructed from multi-source data, and complex mapping relationships between molecular structure, performance, and environmental parameters are mined based on the graph to achieve intelligent design of oil displacement agent molecules. The specific steps include the following:
Step S1: Multi-source oil displacement data acquisition and knowledge extraction: collect historical production data, reservoir data, oil displacement agent experimental data, and oil displacement agent literature data of the target oil reservoir block; at the same time, extract entities and relationships related to the molecular design of oil displacement agents from the literature data through entity recognition technology;
Step S2: Construct a knowledge graph for the field of oil displacement agents: based on the structured data extracted in step S1, a graph database is used to construct an "entity-relationship-attribute" triple knowledge network for the field of oil displacement agents. This network deeply integrates molecular structure features, functional group chemical properties, macroscopic performance indicators, and reservoir environmental parameters to form a knowledge graph that comprehensively characterizes the complex relationships in the field of oil displacement agents;
Step S3: Intelligent design of oil displacement agent molecules based on the knowledge graph: the knowledge graph is trained using a graph convolutional neural network model to mine the association rules between molecular structure, performance, and environment, and to make forward performance predictions. At the same time, with reservoir data and performance indicators of the target reservoir block as constraints, candidate molecular structures are generated in reverse through a generative adversarial network, and a candidate molecular library is formed;
Step S4: Verify and evaluate the candidate molecules generated in step S3, and screen out the preferred molecules;
Step S5: Iteratively optimize the design parameters of the preferred molecule, dynamically adjust the parameters of the preferred molecule, and dynamically output the design scheme of the oil displacement agent molecule.

2. The knowledge graph-driven intelligent design method for oil displacement agent molecules according to claim 1, characterized in that: The experimental data in step S1 includes physical property parameters, such as critical micelle concentration and oil-water interfacial tension; the literature data includes characteristic parameters, such as emulsification rate and salt tolerance extracted by natural language processing technology; the reservoir data includes environmental parameters, such as salinity and temperature; and the Z-Score normalization method is used for preprocessing to eliminate the influence of dimensions, the outlier removal adopts the 3σ criterion, and the entity recognition technology adopts a BERT-based pre-trained model.

3. The knowledge graph-driven intelligent design method for oil displacement agents according to claim 2, characterized in that: The entities in step S2 include: oil displacement agent molecules, functional groups, physical property parameters, reservoir data, and experimental conditions; the relationships are semantic relationships, including "molecule-include-functional group", "functional group-determine-salt tolerance", "temperature-influence-interfacial tension" and "mineralization-constraint-molecular stability"; the oil displacement agent molecule entity includes Gemini-type surfactants; the functional group entities include sulfonic acid groups, carboxyl groups, and quaternary ammonium salt groups; the graph database is Neo4j; the 12 types of semantic relationship edges include "include", "influence", "determine", "constraint", "characterize", "located in", "have", "depend on", "optimize", "inhibit", "promote", and "equivalent to".

4. The knowledge graph-driven intelligent design method for oil displacement agents according to claim 3, characterized in that: Step S3 specifically involves: training the knowledge graph using a graph convolutional neural network model to mine the association rules between molecular structure, performance, and environment. The graph convolutional neural network model updates features by aggregating neighborhood node information, and the weight matrix and bias values are obtained based on data from the oil displacement agent domain. Based on the mined association rules, a Transformer architecture model is used for forward performance prediction, taking the SMILES sequence of the oil displacement agent molecule as input and outputting predicted values of physical property parameters. Simultaneously, using reservoir data and performance indicators of the target oil reservoir block as constraints, a generative adversarial network is used to inversely generate candidate molecular structures. In the optimization objective of the generation model, both real molecules and generated molecules are limited to molecular types that conform to the structural characteristics of oil displacement agents, forming a candidate molecule library that can be verified later. The graph convolutional neural network model has a 3-layer structure and its correlation prediction accuracy is no less than 90%. The Transformer architecture model is used for forward prediction of critical micelle concentration and interfacial tension. When designing the generative adversarial network, multiple constraints are imposed, including interfacial tension of less than 0.01 mN/m, mineralization of no less than 20,000 mg/L, and temperature of no less than 80℃.

5. The knowledge graph-driven intelligent design method for oil displacement agent molecules according to claim 4, characterized in that: Step S4 specifically involves: using a synthetic pathway prediction tool to evaluate the synthetic feasibility of candidate molecules and screening molecules with fewer than 5 synthetic pathway steps; simultaneously using a molecular simulation tool with a dedicated force field to calculate the total energy of the molecules and verify their chemical stability, ensuring that the generated molecular structure has low energy and can exist stably; and through the above verification, selecting the preferred molecular scheme that combines synthetic feasibility, structural stability, and performance advantages.

6. The knowledge graph-driven intelligent design method for oil displacement agents according to claim 5, characterized in that: In step S4, the synthesis feasibility assessment specifically employs the Synthia synthesis pathway prediction tool, with a synthesis pathway of fewer than 5 steps used as the feasibility criterion. Chemical stability verification specifically utilizes molecular dynamics simulations performed using the COMPASS force field within Materials Studio software. The total energy of the system is calculated, and molecules with lower energies are screened to confirm their stability. The calculation formula for the total molecular energy $E_{\text{total}}$ is: $E_{\text{total}} = E_{\text{bond}} + E_{\text{vdw}} + E_{\text{coulomb}}$. In the formula, $E_{\text{bond}}$ is the bond energy, $E_{\text{vdw}}$ is the van der Waals energy, and $E_{\text{coulomb}}$ is the Coulomb energy.

7. The knowledge graph-driven intelligent design method for oil displacement agents according to claim 6, characterized in that: Step S5 specifically involves: inputting the preferred molecule into a virtual optimization environment, using key molecular parameters as the state space, including hydrophilic-lipophilic balance value, carbon chain length, and linker length; employing an intelligent optimization algorithm for multi-objective iterative optimization, with the optimization objectives comprehensively considering oil displacement performance, raw material cost, and environmental friendliness; dynamically adjusting molecular parameters through a cumulative reward function and a policy function; and finally outputting a technically feasible, economically reasonable, and environmentally adaptable oil displacement agent molecular design scheme.

8. The knowledge graph-driven intelligent design method for oil displacement agents according to claim 7, characterized in that: In step S5, the hydrophilic-lipophilic balance value ranges from 3 to 18, the carbon chain length ranges from 8 to 20, and the linker length ranges from 2 to 8. The intelligent optimization algorithm specifically employs a deep deterministic policy gradient algorithm. The multi-objective optimization is achieved through a reward function, which specifically considers the reduction in interfacial tension, raw material cost, and environmental toxicity. The weights for interfacial tension reduction, raw material cost, and environmental toxicity are allocated as 60%, 20%, and 20%, respectively; where R represents the reward value, ΔIFT represents the interfacial tension reduction, $\text{IFT}_0$ represents the reference interfacial tension, Cost represents the estimated raw material cost of the current molecular design scheme, $\text{Cost}_{\max}$ represents the preset cost ceiling threshold, Toxicity represents the estimated environmental toxicity of the current molecular design scheme, and $\text{Toxicity}_{\max}$ represents the preset upper limit threshold for toxicity.