Method for constructing and screening potential environmental factors of deep-sea species based on mechanism inference
By constructing a deep-sea ecology knowledge graph and screening candidate factors through multi-objective optimization, the method generates an optimal set of environmental factors. This addresses the lack of mechanistic interpretability and the data scarcity that affect environmental factor selection in deep-sea species distribution models, and improves prediction accuracy and reliability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANDONG UNIV OF SCI & TECH
- Filing Date
- 2026-05-20
- Publication Date
- 2026-06-19
AI Technical Summary
Existing deep-sea species distribution models lack mechanistic explanations for the selection of environmental factors, have poor adaptability to data scarcity, and fail to effectively identify and quantify the complex characteristics of the deep-sea environment, resulting in low prediction accuracy.
A mechanism-based inference method is used to construct a deep-sea ecological knowledge graph. By constructing rules for potential environmental factors and using multi-objective optimization screening, an optimal set of environmental factors is generated, including potential factors such as deep-sea physiological stress index and benthic resource utilization potential. Multi-objective optimization algorithms are then used for screening and ranking.
It improves the scientific rigor, interpretability, and robustness of deep-sea species distribution prediction, enhances prediction accuracy and reliability, effectively eliminates redundancy in ecological mechanisms and underlying data, and provides clear biological explanations.
Figure CN122242774A_ABST
Description
Technical Field
[0001] This invention relates to the technical field of marine organism distribution prediction, in particular the prediction of the distribution of key deep-sea species, and specifically to a method for constructing and screening potential environmental factors of deep-sea species based on mechanism inference.
Background Technology
[0002] The unique and extreme environmental conditions of the deep sea have fostered a diverse range of life forms. With the increasing exploration and development of deep-sea resources, accurate prediction and assessment of the distribution patterns of key deep-sea species have become increasingly important. Currently, deep-sea species distribution models are one of the key tools for predicting the distribution of deep-sea organisms; however, the construction of these models faces a severe challenge in selecting environmental factors. Existing methods for selecting environmental factors mostly rely on statistical correlation analysis or general machine learning feature selection techniques, which have the following significant technical limitations:
[0003] (1) Existing environmental factors lack mechanistic explanation. The environmental factors screened by traditional methods often lack a clear connection with the physiological and ecological mechanisms of deep-sea organisms. As a result, model results cannot provide in-depth biological explanations or effectively guide specific conservation strategies, and the model's ability to predict responses to future environmental change is limited. For example, a simple temperature gradient may be correlated with species distribution, but it does not explain how temperature drives distribution by affecting physiological processes of deep-sea organisms such as enzyme activity, protein folding, or metabolic rate.
[0004] (2) Existing environmental factor selection methods depend heavily on data and adapt poorly to the data-scarce deep-sea environment. Deep-sea environmental data are extremely costly to acquire, so data are scarce and spatial coverage is uneven. With scarce, noisy data, traditional factor selection methods are prone to overfitting or to selecting spurious factors, leading to unstable model performance and poor generalization. They are even less able to capture the complex environmental pressures unique to the deep sea that are difficult to measure directly (such as physiological stress and resource availability).
[0005] Current methods mostly combine, weight, or directly apply existing general-purpose algorithms for factor selection. They do not provide solutions tailored to the complex environmental characteristics and species-specific physiological and ecological mechanisms of the deep sea, and they fail to identify and quantify the key mechanistic factors that actually drive species distribution under deep-sea conditions such as high pressure, low temperature, absence of light, and reliance on chemosynthetic energy. This creates a "technological blind spot" and hinders the further development of deep-sea species distribution modeling technology.
[0006] In view of the above problems, there is an urgent need for a method that constructs and screens potential environmental factors with clear ecological significance in a systematic, mechanism-driven way, in order to improve the scientific rigor, interpretability, and robustness of deep-sea species distribution prediction.
Summary of the Invention
[0007] The purpose of this invention is to provide a method for constructing and screening potential environmental factors for deep-sea species based on mechanism inference, so as to solve the problems of low accuracy in predicting the distribution of deep-sea organisms due to the lack of explanatory mechanisms, poor adaptability to data scarcity, and failure to effectively identify and quantify deep-sea environmental characteristics in existing environmental factor selection methods.
[0008] To achieve the above objective, this invention provides a method for constructing and screening potential environmental factors for deep-sea species based on mechanism inference, comprising:
S1. Construction of a deep-sea ecology knowledge graph: characterize the original environmental factors and the ecological characteristics of the target species in the target sea area, and establish the mechanism association mapping relationship between the original environmental factors and the target species;
S2. Construction of potential environmental factors: derive the construction rules for potential environmental factors from the deep-sea ecology knowledge graph built in S1; build a derivation algorithm for potential environmental factors from these rules; and, taking the original environmental factors as input, construct multiple potential environmental factors with the derivation algorithm;
S3. Mechanism inference-driven screening and prioritization of potential environmental factors: calculate the mechanism relevance score, mechanism redundancy, and data proxy redundancy for each potential environmental factor; prioritize the potential environmental factors on this basis to generate multiple sets of potential environmental factors; construct a comprehensive objective function that maximizes the mechanism relevance score while minimizing the mechanism redundancy and the data proxy redundancy; and output the set of potential environmental factors with the largest comprehensive objective function value as the optimal set of environmental factors. A mechanistic justification is provided for each potential environmental factor in the optimal set.
[0009] S1 includes:
S1.1. Characterize the original environmental factors of the target sea area: collect the original environmental factors of the target sea area, construct a list of them, and record for each original environmental factor its measurement time, measurement range, variability, measurement uncertainty, and availability in the target sea area;
S1.2. Describe the ecological characteristics of the target species in the target sea area: collect these characteristics and record the physiological tolerance range, reproductive mode, feeding strategy, dispersal ability, habitat preference, and key ecological interaction information of the target species;
S1.3. Construct a structured deep-sea ecology knowledge graph: based on the descriptions of S1.1 and S1.2, establish the mechanism association mapping relationship between the original environmental factors and the target species, and formally represent the original environmental factors of S1.1, the ecological characteristics of S1.2, and the mechanism association mapping relationship between them.
The deep-sea ecology knowledge graph consists of entity nodes and relation edges. Entity nodes include original environmental factors, ecological characteristics of the target species, and potential environmental factors. Original environmental factors include physical factors, chemical factors, topographic and geological factors, and deep-sea event factors. Ecological characteristics of the target species include species characteristics, physiological processes, and ecological processes. Relation edges represent the mechanism association mapping relationship between original environmental factors and ecological characteristics of the target species.
[0010] Mechanism association mapping relationships include mechanistic connections and mechanistic hypotheses between original environmental factors and the ecological characteristics of target species. A mechanistic hypothesis consists of multiple mechanistic connections, and each mechanistic connection in a mechanistic hypothesis is accompanied by an attribute used to quantify the reliability of the mechanistic hypothesis. The construction rule for potential environmental factors is to traverse the deep-sea ecology knowledge graph, identify all complete paths that can reach the physiological and ecological processes of the target species from a set of original environmental factors through mechanistic connections, characterize each complete path obtained through the traversal as a Link Ecological Factor (LEF), define a calculation function for each LEF to capture the nonlinear response, threshold effect, interaction, and weight allocation of mechanistic connections, and derive the potential environmental factors.
[0011] Potential environmental factors include the deep-sea physiological stress index and the benthic resource utilization potential. The deep-sea physiological stress index is given by: [formulas not reproduced in the source text]. Its inputs are the temperature, pressure, dissolved oxygen level, and pH of the target sea area, which are combined through a nonlinear weighting function. The combined stress index value represents the combined stress of temperature, pressure, and dissolved oxygen; it is built from the stress response functions for temperature, pressure, and dissolved oxygen and from the interaction function of temperature and pressure, each with its own weighting coefficient.
[0012] The benthic resource utilization potential is given by: [formulas not reproduced in the source text]. Its inputs are the organic carbon flux, water exchange rate, sulfide concentration, methane concentration, and sediment type of the target sea area. A second formula gives the contribution of organic carbon flux and water exchange rate to the complete function; it uses a weighting coefficient for the organic carbon flux and a turbidity factor for the water exchange rate.
[0013] The mechanism relevance score in S3 is given by: [formula not reproduced in the source text]. It is defined for the i-th potential environmental factor over its set of mechanism paths, that is, all complete paths connecting that factor to the physiological and ecological processes of the target species. For any path in this set and any relation edge on that path, the formula uses the mechanistic connection strength of the relation edge.
[0014] The mechanism redundancy is given by: [formula not reproduced in the source text]. It is defined for a pair of potential environmental factors and uses their corresponding sets of mechanism paths, their corresponding sets of target physiological and ecological processes, and their corresponding mechanism subgraphs, together with a similarity measure between the mechanism subgraphs. Three weight coefficients apply to the mechanism path term, the ecological characteristic term, and the mechanism subgraph similarity term, respectively. The data proxy redundancy is given by: [formula not reproduced in the source text]. It uses the sets of original environmental factors from which the two potential environmental factors are constructed, the expected correlation between any original environmental factor in the first set and any original environmental factor in the second set, and weighting coefficients.
[0015] The objective of maximizing the mechanism relevance score is: [formula not reproduced in the source text]. It gives the overall mechanism relevance score of a set of potential environmental factors, computed from the mechanism relevance score of each factor in the set.
[0016] The objective of minimizing mechanism redundancy is: [formula not reproduced in the source text]. Its value for a set of potential environmental factors characterizes the degree of repetition among the selected factors at the ecological mechanism level, and is computed from the mechanism redundancy between pairs of potential environmental factors in the set. The objective of minimizing data proxy redundancy is: [formula not reproduced in the source text]. Its value for a set of potential environmental factors characterizes the degree of duplication among the selected factors at the level of the original input data, and is computed from the data proxy redundancy between pairs of potential environmental factors in the set.
[0017] The comprehensive objective function is: [formulas not reproduced in the source text]. The first formula gives the comprehensive objective function value of a set of potential environmental factors. The second gives the coverage diversity of the set, defined as the number of ecological processes covered by the potential environmental factors in the set.
[0018] Compared with the prior art, the present invention has the following advantages:
[0019] This invention combines knowledge graph-based path derivation with multi-objective optimization screening. Using a dual-track evaluation that pairs mechanism relevance scoring with redundancy assessment, it generates a global set of candidate potential environmental factors, a relevance score list, and a redundancy matrix in system memory for subsequent use. An established multi-objective optimization algorithm is then applied to solve the model and find the best trade-off among the objectives. Without increasing redundancy, the method outputs the set of potential environmental factors with the highest relevance and diversity, together with a ranking. At the feature construction level, this moves from raw data to mechanism-driven potential environmental factors; at the evaluation level, it removes redundancy in both ecological mechanisms and underlying data. The resulting combination of environmental factors has a clear deep-sea biological interpretation and improves the prediction accuracy, robustness, and reliability of deep-sea species distribution models.
Attached Figure Description
[0020] Figure 1 shows the comparison of AUC between the method of this invention and the traditional method under different data scenarios; Figure 2 shows the comparison of TSS between the method of this invention and the traditional method under different data scenarios; Figure 3 shows the calculation process of DPPI among the potential environmental factors provided by this invention; Figure 4 is a flowchart of the mechanism relevance score calculation provided by this invention; Figure 5 is a flowchart of the redundancy evaluation provided by this invention; Figure 6 is a flowchart of the prioritization and selection provided by this invention.
Detailed Implementation
[0021] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention are described clearly and completely below. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0022] The method for constructing and screening potential environmental factors for deep-sea species based on mechanism inference comprises:
S1. Construction of a deep-sea ecology knowledge graph: characterize the original environmental factors and the ecological characteristics of the target species in the target sea area, and establish the mechanism association mapping relationship between the original environmental factors and the target species;
S2. Construction of potential environmental factors: derive the construction rules for potential environmental factors from the deep-sea ecology knowledge graph built in S1; build a derivation algorithm for potential environmental factors from these rules; and, taking the original environmental factors as input, construct multiple potential environmental factors with the derivation algorithm;
S3. Screening of potential environmental factors: calculate the mechanism relevance score, mechanism redundancy, and data proxy redundancy of each potential environmental factor; based on the results, prioritize the potential environmental factors and generate the set of potential environmental factors, the relevance score of each potential environmental factor, and the redundancy matrix; perform hard constraint checks and filter out potential environmental factors that are unavailable or that would exceed the set size limit;
S4. Construct a multi-objective optimization model: implement the priority ranking through the comprehensive objective function, which maximizes the mechanism relevance score while minimizing the mechanism redundancy and the data proxy redundancy; select the set of potential environmental factors with the largest comprehensive objective function value as the optimal environmental factor set; and provide a mechanistic justification for each potential environmental factor in the optimal environmental factor set.
[0023] S1 includes:
S1.1. Characterize the original environmental factors of the target sea area: collect the original environmental factors of the target sea area, construct a list of them, and record for each original environmental factor its measurement time, measurement range, variability, measurement uncertainty, and availability in the target sea area;
S1.2. Describe the ecological characteristics of the target species in the target sea area: collect these characteristics and record the physiological tolerance range, reproductive mode, feeding strategy, dispersal ability, habitat preference, and key ecological interaction information of the target species;
S1.3. Construct a structured deep-sea ecology knowledge graph: based on the descriptions of S1.1 and S1.2, establish the mechanism association mapping relationship between the original environmental factors and the target species, and formally represent the original environmental factors of S1.1, the ecological characteristics of S1.2, and the mechanism association mapping relationship between them.
The deep-sea ecology knowledge graph consists of entity nodes and relation edges. Entity nodes include original environmental factors, ecological characteristics of the target species, and potential environmental factors. Original environmental factors include physical factors, chemical factors, topographic and geological factors, and deep-sea event factors. Ecological characteristics of the target species include species characteristics, physiological processes, and ecological processes. Relation edges represent the mechanism association mapping relationship between original environmental factors and ecological characteristics of the target species.
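As an illustration of the node-and-edge structure described above, the following is a minimal sketch rather than the invention's actual data model. The class names, the node-kind labels, and the choice to store a strength and a confidence value on each edge are assumptions made for this example; the numeric values are invented.

```python
from dataclasses import dataclass, field

# Node categories named in the text; the string labels themselves are hypothetical.
NODE_KINDS = {"original_factor", "ecological_characteristic", "potential_factor"}


@dataclass(frozen=True)
class Edge:
    source: str        # node the mechanistic connection starts from
    target: str        # node the mechanistic connection points to
    strength: float    # mechanistic connection strength, assumed to lie in [0, 1]
    confidence: float  # reliability of the underlying hypothesis, assumed in [0, 1]


@dataclass
class KnowledgeGraph:
    nodes: dict = field(default_factory=dict)  # node name -> node kind
    edges: list = field(default_factory=list)  # directed relation edges

    def add_node(self, name, kind):
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[name] = kind

    def add_edge(self, source, target, strength, confidence):
        # Both endpoints must already be registered as entity nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append(Edge(source, target, strength, confidence))

    def successors(self, name):
        """Return the edges leaving the given node."""
        return [e for e in self.edges if e.source == name]


# Illustrative content only: one original factor linked to one physiological process.
kg = KnowledgeGraph()
kg.add_node("temperature", "original_factor")
kg.add_node("metabolic_rate", "ecological_characteristic")
kg.add_edge("temperature", "metabolic_rate", strength=0.8, confidence=0.9)
```

Keeping edges as directed records makes it straightforward to walk from original environmental factors toward the processes of the target species in later steps.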
[0024] Mechanism association mapping relationships include mechanistic connections and mechanistic hypotheses between original environmental factors and the ecological characteristics of target species. A mechanistic hypothesis consists of multiple mechanistic connections, and each mechanistic connection in a mechanistic hypothesis is accompanied by an attribute used to quantify the reliability of the mechanistic hypothesis. The construction rule for potential environmental factors is to traverse the deep-sea ecology knowledge graph, identify all complete paths that can reach the physiological and ecological processes of the target species from a set of original environmental factors through mechanistic connections, characterize each complete path obtained through the traversal as a Link Ecological Factor (LEF), define a calculation function for each LEF to capture the nonlinear response, threshold effect, interaction, and weight allocation of mechanistic connections, and derive the potential environmental factors.
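The traversal rule above can be sketched as a depth-first enumeration of directed paths. This is one possible reading under stated assumptions: the graph is given as a plain adjacency dictionary, a path is "complete" once it reaches a target process node, and the node names are invented for illustration.

```python
def complete_paths(adjacency, sources, targets):
    """Return every simple directed path from a source node to a target node."""
    paths = []

    def walk(node, path):
        if node in targets:
            paths.append(tuple(path))
            return  # stop at the first target process reached on this branch
        for nxt in adjacency.get(node, ()):
            if nxt not in path:  # skip nodes already on the path to avoid cycles
                walk(nxt, path + [nxt])

    for src in sources:
        walk(src, [src])
    return paths


# Hypothetical mechanistic connections (node names are illustrative only).
adjacency = {
    "temperature": ["enzyme_activity"],
    "pressure": ["enzyme_activity", "membrane_fluidity"],
    "enzyme_activity": ["metabolic_rate"],
    "membrane_fluidity": ["metabolic_rate"],
}

# Each returned path is a candidate Link Ecological Factor (LEF).
lefs = complete_paths(adjacency, ["temperature", "pressure"], {"metabolic_rate"})
```

Each path would then receive its own calculation function; that step is not shown here because the text does not specify the functional forms.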
[0025] Establishing the mechanistic association mapping relationship between the original environmental factors and the target species in S1.3 includes identifying and evaluating the causal relationship among the original environmental factors, the target species, and the ecological characteristics (such as physiological processes) of the target species. By connecting the original environmental factors, the physiological processes of the target species, and the target species with directed edges, the causal relationship among the three is formally encoded, and the mechanistic association mapping relationship between the original environmental factors and the target species is established.
[0026] Potential environmental factors include the deep-sea physiological stress index and the benthic resource utilization potential. The deep-sea physiological stress index is given by: [formulas not reproduced in the source text]. Its inputs are the temperature, pressure, dissolved oxygen level, and pH of the target sea area, which are combined through a nonlinear weighting function. The combined stress index value represents the combined stress of temperature, pressure, and dissolved oxygen; it is built from the stress response functions for temperature, pressure, and dissolved oxygen and from the interaction function of temperature and pressure, each with its own weighting coefficient.
[0027] The benthic resource utilization potential is given by: [formulas not reproduced in the source text]. Its inputs are the organic carbon flux, water exchange rate, sulfide concentration, methane concentration, and sediment type of the target sea area. A second formula gives the contribution of organic carbon flux and water exchange rate to the complete function; it uses a weighting coefficient for the organic carbon flux and a turbidity factor for the water exchange rate.
[0028] The mechanism relevance score in S3 is given by: [formula not reproduced in the source text]. It is defined for the i-th potential environmental factor over its set of mechanism paths, that is, all complete paths connecting that factor to the physiological and ecological processes of the target species. For any path in this set and any relation edge on that path, the formula uses the mechanistic connection strength of the relation edge. In the mechanistic justification of S4, this formula explains how the mechanism relevance of a potential environmental factor is characterized: over all complete paths from environmental factors to the core physiological and ecological processes of the target species, the mechanistic connection strengths of the relation edges on each path and the confidence of the corresponding mechanistic hypotheses are multiplied and then accumulated.
[0029] The mechanism redundancy is given by: [formula not reproduced in the source text]. It is defined for a pair of potential environmental factors and uses their corresponding sets of mechanism paths, their corresponding sets of target physiological and ecological processes, and their corresponding mechanism subgraphs, together with a similarity measure between the mechanism subgraphs. Three weight coefficients apply to the mechanism path term, the ecological characteristic term, and the mechanism subgraph similarity term, respectively. In the mechanistic justification of S4, this formula explains how the redundancy of potential environmental factors at the mechanism level is quantified from the mechanism paths they share, the target physiological and ecological processes they share, and the similarity of their mechanism subgraphs.
[0030] The data proxy redundancy is given by: [formula not reproduced in the source text]. It uses the sets of original environmental factors from which the two potential environmental factors are constructed, the expected correlation between any original environmental factor in the first set and any original environmental factor in the second set, and a weighting coefficient. In the mechanistic justification of S4, this formula explains how the data proxy redundancy between two potential environmental factors is quantified from the overlap of the original environmental factor sets that constitute them and the expected correlation between the original input factors.
[0031] The objective of maximizing the mechanism relevance score is: [formula not reproduced in the source text]. It gives the overall mechanism relevance score of a set of potential environmental factors, computed from the mechanism relevance score of each factor in the set.
[0032] The objective of minimizing mechanism redundancy is: [formula not reproduced in the source text]. Its value for a set of potential environmental factors characterizes the degree of repetition among the selected factors at the ecological mechanism level, and is computed from the mechanism redundancy between pairs of potential environmental factors in the set. The objective of minimizing data proxy redundancy is: [formula not reproduced in the source text]. Its value for a set of potential environmental factors characterizes the degree of duplication among the selected factors at the level of the original input data, and is computed from the data proxy redundancy between pairs of potential environmental factors in the set.
[0033] The comprehensive objective function is: [formulas not reproduced in the source text]. The first formula gives the comprehensive objective function value of a set of potential environmental factors. The second gives the coverage diversity of the set, defined as the number of ecological processes covered by the potential environmental factors in the set.
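To make the structure of the deep-sea physiological stress index concrete, here is a hedged sketch. The text names the ingredients (three stress response functions, a temperature-pressure interaction function, four weighting coefficients, and a nonlinear weighting function that also takes pH) but the exact forms are not available, so every functional form below is an assumption: a Gaussian-shaped stress response, a product interaction, a weighted sum, and an exponential squashing with a pH amplification. All parameter values and units are invented for illustration.

```python
import math


def stress(value, optimum, tolerance):
    """Hypothetical stress response: 0 at the optimum, approaching 1 far from it."""
    return 1.0 - math.exp(-(((value - optimum) / tolerance) ** 2))


def combined_stress(temp, pressure, oxygen, weights, optima, tolerances):
    """Assumed weighted sum of the three stress responses and their interaction."""
    s_t = stress(temp, optima["temp"], tolerances["temp"])
    s_p = stress(pressure, optima["pressure"], tolerances["pressure"])
    s_o = stress(oxygen, optima["oxygen"], tolerances["oxygen"])
    interaction = s_t * s_p  # assumed form of the temperature-pressure interaction
    return (weights["temp"] * s_t
            + weights["pressure"] * s_p
            + weights["oxygen"] * s_o
            + weights["interaction"] * interaction)


def stress_index(temp, pressure, oxygen, ph, weights, optima, tolerances, ph_ref=8.0):
    """Assumed nonlinear weighting: pH deviation amplifies the combined stress,
    and the result is squashed into the interval [0, 1)."""
    s = combined_stress(temp, pressure, oxygen, weights, optima, tolerances)
    return 1.0 - math.exp(-s * (1.0 + abs(ph - ph_ref)))


# Illustrative parameters only (temperature in deg C, pressure in MPa, oxygen in mL/L).
weights = {"temp": 0.3, "pressure": 0.3, "oxygen": 0.3, "interaction": 0.1}
optima = {"temp": 2.0, "pressure": 30.0, "oxygen": 4.0}
tolerances = {"temp": 2.0, "pressure": 10.0, "oxygen": 2.0}

at_optimum = stress_index(2.0, 30.0, 4.0, 8.0, weights, optima, tolerances)
stressed = stress_index(6.0, 45.0, 1.0, 7.6, weights, optima, tolerances)
```

In practice the optima and tolerances would come from the physiological tolerance ranges recorded for the target species in S1.2.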
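The "multiply along a path, accumulate across paths" idea can be sketched as follows. This is an assumed reading, not the patent's exact formula: each relation edge carries a (strength, confidence) pair, a path contributes the product of strength times confidence over its edges, and the score is the sum of these contributions. Node names and numbers are hypothetical.

```python
import math


def relevance_score(paths, edge_attrs):
    """Sum over paths of the product of strength * confidence along each path."""
    total = 0.0
    for path in paths:
        edges = zip(path, path[1:])  # consecutive node pairs form the relation edges
        total += math.prod(edge_attrs[e][0] * edge_attrs[e][1] for e in edges)
    return total


# Hypothetical (strength, confidence) values for each directed relation edge.
edge_attrs = {
    ("temperature", "enzyme_activity"): (0.8, 1.0),
    ("pressure", "enzyme_activity"): (0.5, 0.5),
    ("enzyme_activity", "metabolic_rate"): (0.5, 1.0),
}
paths = [
    ("temperature", "enzyme_activity", "metabolic_rate"),
    ("pressure", "enzyme_activity", "metabolic_rate"),
]
score = relevance_score(paths, edge_attrs)
```

Under this reading, a long chain of weak or uncertain connections contributes little, while several strong independent paths raise the score.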
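A possible way to turn the three terms above into a number is shown below. The text states that shared paths, shared target processes, and subgraph similarity are combined with three weight coefficients; the use of Jaccard similarity for all three terms, the representation of a mechanism subgraph as a set of edges, and the default weights are assumptions of this sketch.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections; defined as 0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mechanism_redundancy(fi, fj, weights=(0.4, 0.3, 0.3)):
    """Weighted combination of path overlap, process overlap, and subgraph similarity."""
    w_path, w_proc, w_graph = weights
    return (w_path * jaccard(fi["paths"], fj["paths"])
            + w_proc * jaccard(fi["processes"], fj["processes"])
            + w_graph * jaccard(fi["edges"], fj["edges"]))


# Hypothetical factor descriptions: paths, target processes, and subgraph edges.
stress_factor = {
    "paths": {("temperature", "enzyme_activity", "metabolic_rate")},
    "processes": {"metabolic_rate"},
    "edges": {("temperature", "enzyme_activity"), ("enzyme_activity", "metabolic_rate")},
}
resource_factor = {
    "paths": {("organic_carbon_flux", "food_supply", "growth")},
    "processes": {"growth"},
    "edges": {("organic_carbon_flux", "food_supply"), ("food_supply", "growth")},
}

same = mechanism_redundancy(stress_factor, stress_factor)
different = mechanism_redundancy(stress_factor, resource_factor)
```

With weights that sum to one, the value ranges from 0 for mechanistically unrelated factors to 1 for identical ones.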
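The two ingredients named above, overlap of input sets and expected correlation between inputs, can be combined as in the following sketch. The specific choices are assumptions: Jaccard overlap of the input sets, the mean absolute expected correlation over all cross pairs, a single mixing weight `lam`, and a correlation of 1 for a factor paired with itself. The factor names and the correlation value are invented.

```python
from itertools import product


def data_proxy_redundancy(inputs_i, inputs_j, expected_corr, lam=0.5):
    """Mix input-set overlap with the mean expected correlation across input pairs."""
    inputs_i, inputs_j = set(inputs_i), set(inputs_j)
    union = inputs_i | inputs_j
    overlap = len(inputs_i & inputs_j) / len(union) if union else 0.0

    def corr(a, b):
        if a == b:
            return 1.0  # a factor is perfectly correlated with itself
        # Unlisted pairs are assumed uncorrelated.
        return abs(expected_corr.get(frozenset((a, b)), 0.0))

    pairs = list(product(inputs_i, inputs_j))
    mean_corr = sum(corr(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return lam * overlap + (1.0 - lam) * mean_corr


# Hypothetical expected correlation between two original factors.
expected_corr = {frozenset(("pressure", "depth")): 0.9}

redundancy = data_proxy_redundancy(
    {"temperature", "pressure"}, {"pressure", "depth"}, expected_corr
)
```

Using expected rather than observed correlations keeps the measure usable when field data are too sparse to estimate correlations reliably, which matches the data-scarcity motivation of the method.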
[0034] In S4, a mechanistic justification is provided for each potential environmental factor in the optimal set of environmental factors. The justification states the original environmental factors that influence each potential environmental factor and how the calculation formula for the potential environmental factor is built from the stress response functions, interaction function, and weighting coefficients of those original environmental factors. Multiple potential environmental factors are generated through these calculation formulas. For the generated potential environmental factors, the strength of their mechanistic association with the target species and their independence at the mechanistic and data levels are calculated. A multi-objective optimization function then selects, from the generated potential environmental factors, the set that best reflects the ecological mechanism and has explanatory power, thus constructing the potential environmental factor set. The strength of the mechanistic association between a potential environmental factor and the survival, growth, reproduction, or distribution of the target species is quantified directly by the mechanism relevance score. The independence of potential environmental factors at the mechanistic and data levels is characterized by the mechanism redundancy and the data proxy redundancy: mechanism redundancy assesses the degree of mechanistic overlap between two potential environmental factors in the knowledge graph, and data proxy redundancy assesses the degree of overlap between two potential environmental factors at the level of the original input data. A comprehensive objective function integrating mechanism relevance, diversity, mechanism redundancy, and data proxy redundancy is used to maximize the overall mechanism relevance and the ecological process coverage diversity while minimizing mechanism redundancy and data proxy redundancy; on this basis, the optimal set of potential environmental factors is selected.
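The selection step can be sketched as follows, under explicit assumptions. The multi-objective problem is collapsed here into a single weighted-sum score (relevance plus coverage diversity minus the two redundancies), and the best subset is found by exhaustive search, which is only practical for small candidate pools; the text refers to a multi-objective optimization algorithm without naming one, so this is a stand-in. The hard constraints are modelled as an availability flag and a maximum set size. All names, weights, and numbers are hypothetical.

```python
from itertools import combinations


def objective(subset, relevance, mech_red, data_red, processes, weights):
    """Assumed weighted-sum form of the comprehensive objective for one subset."""
    w_rel, w_div, w_mech, w_data = weights
    rel = sum(relevance[f] for f in subset)
    # Coverage diversity: number of distinct ecological processes covered.
    diversity = len(set().union(*(processes[f] for f in subset)))
    pairs = list(combinations(sorted(subset), 2))
    mech = sum(mech_red[p] for p in pairs)
    data = sum(data_red[p] for p in pairs)
    return w_rel * rel + w_div * diversity - w_mech * mech - w_data * data


def select_optimal(candidates, available, max_size, **kw):
    """Exhaustively score every admissible subset and return the best one."""
    usable = sorted(f for f in candidates if available[f])  # hard constraint: availability
    best, best_value = (), float("-inf")
    for k in range(1, max_size + 1):                         # hard constraint: set size
        for subset in combinations(usable, k):
            value = objective(subset, **kw)
            if value > best_value:
                best, best_value = subset, value
    return best, best_value


candidates = ["depth_proxy", "light_index", "resource_potential", "stress_index"]
available = {"depth_proxy": True, "light_index": False,
             "resource_potential": True, "stress_index": True}
relevance = {"depth_proxy": 0.8, "light_index": 0.95,
             "resource_potential": 0.7, "stress_index": 0.9}
processes = {"depth_proxy": {"metabolism"}, "light_index": {"vision"},
             "resource_potential": {"feeding"}, "stress_index": {"metabolism"}}
# Pairwise redundancies keyed by alphabetically sorted factor pairs.
mech_red = {("depth_proxy", "resource_potential"): 0.1,
            ("depth_proxy", "stress_index"): 0.9,
            ("resource_potential", "stress_index"): 0.1}
data_red = {("depth_proxy", "resource_potential"): 0.2,
            ("depth_proxy", "stress_index"): 0.8,
            ("resource_potential", "stress_index"): 0.0}

best, best_value = select_optimal(
    candidates, available, max_size=2,
    relevance=relevance, mech_red=mech_red, data_red=data_red,
    processes=processes, weights=(1.0, 0.5, 1.0, 1.0),
)
```

In this toy setup the two most relevant available factors are not chosen together because they target the same process and are highly redundant; the pair covering two distinct processes scores higher. For larger candidate pools, a Pareto-based evolutionary algorithm or a greedy heuristic would replace the exhaustive loop.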
[0035] The construction of a knowledge graph for deep-sea ecology and the formalization of mechanism hypotheses involve the structured storage and formal expression of the mechanistic connections and hypotheses of deep-sea environmental elements, characteristics of key deep-sea species, and their interactions. The construction of potential environmental factors involves systematically deriving and defining a series of rules for constructing potential environmental factors based on knowledge graphs and mechanism hypotheses. These rules are used to calculate and generate potential environmental factors with clear deep-sea ecological significance from original deep-sea environmental factors. The mechanism inference-driven screening and prioritization of potential environmental factors is achieved by calculating the mechanism relevance score of each constructed potential environmental factor, conducting redundancy assessment of potential environmental factors, and selecting an optimal set of potential environmental factors from all constructed potential environmental factors.
[0036] In S1, the mechanistic connections and hypotheses of deep-sea environmental elements, characteristics of key deep-sea species, and their interactions are structurally stored and formally expressed. Through the listing and characterization of original environmental factors, description of key deep-sea species characteristics and niches, and the mapping and knowledge graph construction of mechanistic associations, the deep-sea ecology knowledge graph is constructed and the mechanistic hypotheses are formalized, including the following features: Step 1.1, Original Environmental Factor Inventory and Characterization, which involves collecting all available or inferred original environmental factors within the target deep-sea area or ecosystem, and describing their measurement range, variability, and measurement uncertainty; Step 1.2, Description of characteristics and niche of key deep-sea species, that is, for the target key deep-sea species, collect its known or hypothetical characteristics, including physiological tolerance range, feeding strategies, reproductive methods, dispersal ability, habitat preference and key ecological interaction information. Step 1.3, Mechanism Association Mapping and Knowledge Graph Construction, that is, constructing a structured deep-sea ecology knowledge graph or ontology to formally represent the original environmental factors, the species characteristics, and the known or assumed mechanistic connections and mechanistic hypotheses between them.
[0037] Data source collection: Systematically collect all available or inferred raw environmental factor data and their descriptive information for the target deep-sea area (such as seamounts, hydrothermal vents, cold seeps, and abysses) from global ocean databases (such as World Ocean Database, GEBCO, EMODnet), deep-sea exploration reports, historical documents, and expert interviews.
[0038] Factor types include, but are not limited to, physical factors (such as depth, temperature, salinity, pressure, ocean current velocity, dissolved oxygen, pH value, redox potential, and light intensity), chemical factors (such as methane concentration, sulfide concentration, organic carbon flux, and nutrient concentration), geological and topographic factors (such as sediment type, slope, roughness, seabed hardness, and seafloor features), and specific deep-sea events (such as turbidity currents and seismic disturbance frequency).
[0039] The characterization includes, for each original environmental factor, recording its unit of measurement, spatial / temporal resolution, typical range of variation, expected variability, measurement uncertainty, and assessment of its availability or inference in the target sea area.
[0040] Species selection: Collect the biological and ecological characteristics of key deep-sea species (such as specific dominant species, indicator species, flagship species, or threatened species), including the following. Physiological tolerance range: the physiological limits and optimal ranges for environmental factors such as temperature, pressure, dissolved oxygen, pH, and sulfides. Feeding strategies: filter feeding, deposit feeding, predation, chemoautotrophy, saprophagy (scavenging), etc., and their dependence on food sources (organic carbon flux, chemical substrates) and substrate conditions (sediment grain size, ocean currents). Reproduction and dispersal: reproductive mode (such as oviparous or viviparous), dispersal ability (larval planktonic duration and range), and the resulting need for habitat connectivity. Habitat preferences: preferences for substrate type, topography, hydrological conditions, and specific geological features (such as hydrothermal vents). Key ecological interactions: competition, predation, symbiosis, parasitism, and the species' dependence on other species or the environment. Information sources: deep-sea biology monographs, species genome and physiology studies, species distribution databases, and expert experience. Expert experience refers to the knowledge and skills accumulated through long-term study and practice, which are used to solve complex problems and support decision-making.
[0041] In step 2 (S2), the potential environmental factors (LEFs) are constructed. Based on the knowledge graph and the mechanism hypothesis, a series of rules for constructing potential environmental factors are systematically derived and defined. These rules are used to calculate and generate potential environmental factors with clear deep-sea ecological significance from the original deep-sea environmental factors. The construction is achieved by defining the derivation rules or algorithms for the potential environmental factors and by defining their attributes. This includes: Step 2.1, definition of potential environmental factor derivation rules or algorithms, that is, based on the knowledge graph and the mechanism hypothesis, a set of systematic rules or algorithms is defined to synthesize, derive, or transform potential environmental factors with clear deep-sea ecological significance from the original environmental factors; Step 2.2, definition of the attributes of potential environmental factors, that is, for each constructed LEF, clarifying its ecological meaning (e.g., "this index represents the physiological burden of a species in the current environment"), its expected species response pattern (e.g., "high values of the deep-sea physiological stress index (DPPI) are expected to lead to reduced species abundance or restricted individual growth"), and its expected variability in the deep-sea environment (e.g., "DPPI is expected to show dramatic gradient changes near hydrothermal vents").
[0042] The mechanism inference-driven screening and prioritization of potential environmental factors involves calculating the mechanism relevance score of each constructed potential environmental factor, performing redundancy assessment of the potential environmental factors, and selecting an optimal set of potential environmental factors from all constructed potential environmental factors. This screening and prioritization of potential environmental factors is completed through mechanism relevance score calculation, redundancy assessment, priority ranking selection, and output of screening results along with mechanistic justification. The process includes: Step 3.1, Mechanism relevance score calculation, that is, for each constructed potential environmental factor, based on the knowledge graph and the mechanism hypothesis, quantify its mechanism relevance score with the survival, growth, reproduction or distribution of the target deep-sea key species. Step 3.2, Redundancy assessment, which assesses the potential redundancy among the potential environmental factors, including mechanism redundancy assessment and data proxy redundancy assessment; Step 3.3, Priority ranking selection, that is, based on the mechanism relevance score and the redundancy assessment result, priority ranking is performed to select an optimal set of potential environmental factors; Step 3.4: Output of Screening Results and Mechanism Justification. This step outputs the final selected set of potential environmental factors (LEFs) and provides a detailed mechanism justification for each LEF. The final selected LEF set is output, including the ID of each LEF, its constituent original environmental factors, the calculation formula, and its mechanism relevance score. A detailed mechanism justification is provided for each selected LEF, explaining why it is important, the mechanism by which it affects the target species, and its fit with biological background knowledge.
[0043] Step 1.3 includes: Step 1.3.1: A structured deep-sea ecology knowledge graph (KG) is constructed using a triple structure (subject-predicate-object). Each triple consists of two entity nodes joined by a relation edge, and the relation edges carry the mechanistic connections and mechanistic hypotheses.
[0044] Define multiple types of entity nodes, including: Raw Environmental Factor (physical, chemical, or biological conditions that exist directly in nature and are not generated by biological activities, such as temperature, pH, light, and soil salinity); Species Characteristic (the observable attributes of a species in terms of morphology, physiology, behavior, or life history, such as physiological tolerance and feeding strategies); Physiological Process (the physical and chemical activities and functional mechanisms that occur in an organism, such as enzyme activity and metabolic rate); Ecological Process (dynamic processes involving interactions between organisms and their environment, and between organisms themselves, that affect populations, communities, or ecosystems, such as energy acquisition, reproductive success, and habitat connectivity); Potential Environmental Factor (LEF) (environmental conditions that have not yet had a direct impact but may be transformed into actual effects under certain conditions, such as the nutrient reserves in the soil that can be used by plants, future climate fluctuation trends, the Deep-Sea Physiological Stress Index (DPPI), and the Resource Availability Potential (RAP)); Geological Feature (the long-term structure, composition, and landforms of the Earth's crust, such as rock strata types and hydrothermal vents).
[0045] Define multiple types of relation edges, including: Affects (indicating the influence of one entity on another, which can be further divided into positive effects and negative effects. Positive effects are used to quantify promoting, enhancing, or beneficial effects, while negative effects are used to quantify inhibiting, weakening, or harmful effects). Regulates (meaning one entity controls or maintains the dynamic equilibrium, rate, or state of another entity, such as hormones regulating metabolism or environmental factors regulating enzyme activity). Depends On (meaning that the existence, occurrence, or state of one entity depends on another entity, such as species distribution depending on temperature range). Comprises (indicating the relationship between the whole and its parts, such as a physiological process being composed of multiple subprocesses); Induces (meaning one entity triggers or initiates the occurrence of another entity, such as low temperature-induced dormancy or pollutant-induced gene expression). Tolerance Range For (Tolerance range refers to the range of values of a specific environmental factor that an organism or process can maintain normal function, such as the tolerance range of a species to salinity). Required For (necessary for, required for, indicating that one entity is a prerequisite for the realization or maintenance of another entity, such as water being a prerequisite for photosynthesis).
[0046] Step 1.3.2: The mechanistic connection may be accompanied by a confidence or strength of evidence attribute to quantify the reliability of the mechanistic hypothesis. The description of how environmental factors influence species through specific mechanisms in domain knowledge is formalized into attributed relation edges. For example: (Raw Environmental Factor: High Pressure) -- [Negative Effects (Mechanism: Protein Denaturation)] --> (Physiological Process: Enzyme Activity); (Raw Environmental Factor: Organic Carbon Flux) -- [Positive Affects (RequiredFor: Energy Acquisition)] --> (Ecological Process: Energy Acquisition); (Physiological Process: enzyme activity) -- [Regulates] --> (SpeciesCharacteristic: metabolic rate);
[0047] As shown in Figure 2, the example fragment of the deep-sea ecology knowledge graph illustrates the mechanistic transmission pathways between environmental factors and deep-sea biological characteristics. Bottom water temperature, dissolved oxygen concentration, and deep-sea pressure serve as the initial environmental factor nodes. They are connected to physiological characteristic nodes, such as metabolic oxygen demand and cell membrane fluidity, via relation edges that carry weights and confidence values, and these nodes ultimately point to core physiological and ecological processes (such as energy metabolism and growth, and habitat distribution). This graph structure lays the foundation for subsequently transforming prior ecological knowledge into computer-executable mathematical calculations.
[0048] Each mechanistic connection (relationship edge) can be accompanied by an attribute, such as a Confidence Score or Evidence Strength, to quantify the reliability of the mechanism hypothesis. This helps assign different weights to different mechanism paths in subsequent LEF assessments. A mechanistic connection is an edge in the deep-sea ecological knowledge graph, serving as a basic causal unit. A mechanism hypothesis is a complete explanatory path or scientific narrative composed of one or more mechanistic connections. Typically, a mechanism hypothesis consists of one or more mechanistic connections linked together in a temporal, spatial, or causal order.
[0049] Knowledge graph building tools: can be stored and managed using graph databases or graph processing libraries.
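The triple structure of Step 1.3 can be sketched in a few lines of Python. This is a minimal illustration only, not the storage scheme of the invention: the `Edge` class, the `Type:Name` node labels, the relation names, and all confidence values are assumptions introduced here, and a graph database or graph processing library could hold the same data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One (subject, predicate, object) triple with optional attributes."""
    subject: str             # source entity node, written as "Type:Name"
    relation: str            # relation edge type (names are illustrative)
    obj: str                 # target entity node
    mechanism: str = ""      # free-text mechanism label
    confidence: float = 1.0  # hypothetical confidence attribute in [0, 1]


# Triples based on the examples in the text; confidence values are made up.
TRIPLES = [
    Edge("RawEnvironmentalFactor:HighPressure", "NegativelyAffects",
         "PhysiologicalProcess:EnzymeActivity", "protein denaturation", 0.9),
    Edge("RawEnvironmentalFactor:OrganicCarbonFlux", "PositivelyAffects",
         "EcologicalProcess:EnergyAcquisition", "required for energy acquisition", 0.8),
    Edge("PhysiologicalProcess:EnzymeActivity", "Regulates",
         "SpeciesCharacteristic:MetabolicRate", "", 0.7),
]


def build_adjacency(triples):
    """Index edges by subject node so outgoing edges can be looked up quickly."""
    adjacency = {}
    for edge in triples:
        adjacency.setdefault(edge.subject, []).append(edge)
    return adjacency


def successors(adjacency, node):
    """Return the object nodes reachable from `node` in one step."""
    return [edge.obj for edge in adjacency.get(node, [])]


graph = build_adjacency(TRIPLES)
print(successors(graph, "RawEnvironmentalFactor:HighPressure"))
```

Keeping the confidence on the edge itself means later steps, such as path scoring, can read it without a separate lookup table.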
[0050] Step 2.1 includes the following features: Step 2.1.1, the latent environmental factor derivation rules or algorithms include path derivation based on the knowledge graph, that is, traversing the knowledge graph, identifying the path from a set of original environmental factors to a certain core physiological and ecological process of a species through continuous mechanism paths, and converting it into the construction rules of the latent environmental factors; By performing path search in the knowledge graph, a complete path can be identified from a set of raw environmental factor nodes, through a series of mechanistic connections such as effects or regulations, to finally reach a physiological process or ecological process node.
[0051] The comprehensive ecological concepts represented by these mechanistic pathways are extracted and defined as potential environmental factors (LEFs). For example, identifying the pathway "temperature -> enzyme activity -> metabolic rate" allows a "deep-sea physiological stress index" to be derived. LEFs are path-level abstractions of multiple mechanistic relationships, used to quantify or characterize a class of multi-step coupled ecological response mechanisms. A computational function or algorithm is defined for each LEF. These functions should capture the nonlinear responses, threshold effects, interactions, and weight assignments described by the mechanistic hypothesis.
[0052] For example, the function for the deep-sea physiological stress index is a nonlinear weighted function constructed from the species' physiological tolerance curves for these factors (such as a Sigmoid function, Gaussian function, or piecewise function) and from mechanistic hypotheses (such as the synergistic effect of high pressure on protein structure). Its components are as follows. The temperature stress function describes the intensity of physiological stress caused solely by temperature changes, such as the increase in stress when temperature deviates from the optimum (commonly using bimodal curves, exponential functions, or threshold functions). The pressure stress function describes the intensity of physiological stress caused solely by changes in pressure (deep-sea organisms are sensitive to pressure, and stress increases when pressure deviates from the level to which they are adapted). The dissolved oxygen stress function describes the intensity of physiological stress caused solely by changes in dissolved oxygen (both hypoxia and supersaturation can produce stress, typically modeled using threshold or Michaelis-Menten functions). The temperature-pressure interaction function represents the non-additive effects produced when temperature and pressure act together (e.g., high pressure alters enzyme responses to temperature; the two can be synergistic or antagonistic).
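A possible shape for such an index is sketched below. The functional forms are assumptions chosen for illustration (a Gaussian-type deviation stress for temperature and pressure, a Michaelis-Menten-type term that covers hypoxia only, and a simple product as the temperature-pressure interaction); the parameter names and values in `PARAMS` are hypothetical and would in practice come from the species' tolerance curves.

```python
import math


def deviation_stress(value, optimum, tolerance):
    """Stress in [0, 1): 0 at the optimum, rising as `value` departs from it."""
    return 1.0 - math.exp(-((value - optimum) ** 2) / (2.0 * tolerance ** 2))


def hypoxia_stress(dissolved_oxygen, half_saturation):
    """Michaelis-Menten style term: 1 at zero oxygen, falling toward 0 as oxygen rises.

    Supersaturation stress, which the text also mentions, is not modeled here.
    """
    return half_saturation / (half_saturation + dissolved_oxygen)


def dppi(temperature, pressure, dissolved_oxygen, params):
    """Weighted sum of single-factor stresses plus a temperature-pressure interaction."""
    s_t = deviation_stress(temperature, params["t_opt"], params["t_tol"])
    s_p = deviation_stress(pressure, params["p_opt"], params["p_tol"])
    s_o = hypoxia_stress(dissolved_oxygen, params["o_half"])
    interaction = s_t * s_p  # assumed synergistic form; antagonism would need another form
    w = params["weights"]
    return w["t"] * s_t + w["p"] * s_p + w["o"] * s_o + w["tp"] * interaction


# Hypothetical parameters for an imaginary species (not taken from the source).
PARAMS = {
    "t_opt": 4.0, "t_tol": 2.0,    # degrees Celsius
    "p_opt": 20.0, "p_tol": 5.0,   # MPa
    "o_half": 1.0,                 # same unit as the dissolved oxygen input
    "weights": {"t": 0.3, "p": 0.3, "o": 0.3, "tp": 0.1},
}

at_optimum = dppi(4.0, 20.0, 3.0, PARAMS)
stressed = dppi(9.0, 35.0, 0.5, PARAMS)
print(f"DPPI at optimum: {at_optimum:.3f}, under stress: {stressed:.3f}")
```

With temperature and pressure at their optima, only the oxygen term contributes, which makes the role of each weight easy to check by hand.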
[0053] The function for the benthic resource utilization potential combines the original environmental factors according to the species' feeding strategy (e.g., filter feeders are sensitive to water flow velocity and organic carbon flux, while chemoautotrophs are sensitive to sulfide concentration).
[0054] Step 2.1.2, the potential environmental factor derivation rules or algorithms include the construction based on predefined ecological process templates, that is, predefined deep-sea common ecological process templates, and the selection and combination of original environmental factors from the knowledge graph according to the template requirements to construct the potential environmental factors.
[0055] Step 3.1 includes: Step 3.1.1, the quantification of the mechanism relevance score includes calculation based on the cumulative mechanistic connection strength from the constituent factors of the potential environmental factors to the core physiological and ecological processes of the species in the knowledge graph; for each LEF, a path search is performed in the knowledge graph to find all mechanistic paths from the constituent environmental factors of the LEF to the core physiological and ecological processes of the target species (such as energy acquisition, reproductive success, and physiological adaptation). The strength of each path is calculated by accumulating the strength of all mechanistic connections (relationship edges) on the path (e.g., product or summation) and combining it with the confidence of the mechanism hypothesis. The final LEF mechanism relevance score is a comprehensive reflection of the strength of all relevant paths.
[0056] Step 3.1.2, the quantification of the mechanism relevance score includes allocation based on the confidence or strength of the mechanism hypothesis. When calculating the cumulative path strength, priority is given to or weighting of mechanism hypotheses with higher confidence or stronger evidence support in the knowledge graph.
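Steps 3.1.1 and 3.1.2 can be sketched as a path search over a small graph. This is one reading under stated assumptions, not the claimed formula: each edge contributes the product of its strength and its confidence, edge contributions are multiplied along a path, and path values are summed. The text also allows summation along a path, and the node names and numbers below are illustrative.

```python
def path_strengths(graph, source, targets, visited=None):
    """Yield the strength of every simple path from `source` to a node in `targets`.

    `graph` maps a node to a list of (next_node, strength, confidence) tuples.
    A path's strength is the product of strength * confidence over its edges.
    """
    visited = (visited or set()) | {source}
    for nxt, strength, confidence in graph.get(source, []):
        if nxt in visited:
            continue  # do not revisit nodes, so cycles cannot loop forever
        edge_value = strength * confidence
        if nxt in targets:
            yield edge_value
        for downstream in path_strengths(graph, nxt, targets, visited):
            yield edge_value * downstream


def mechanism_relevance(graph, constituent_factors, core_processes):
    """Sum the strengths of all paths from an LEF's inputs to the core processes."""
    return sum(
        strength
        for factor in constituent_factors
        for strength in path_strengths(graph, factor, core_processes)
    )


# Illustrative graph; strengths and confidences are invented for the example.
GRAPH = {
    "temperature": [("enzyme_activity", 0.8, 0.9)],
    "pressure": [("enzyme_activity", 0.6, 0.5)],
    "enzyme_activity": [("metabolic_rate", 0.5, 1.0)],
}

score = mechanism_relevance(GRAPH, ["temperature", "pressure"], {"metabolic_rate"})
print(f"mechanism relevance score: {score:.2f}")
```

Because low-confidence edges shrink every path that passes through them, well-supported mechanism hypotheses dominate the score, in line with Step 3.1.2.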
[0057] Step 3.2 includes the following features:
[0058] Step 3.2.1, Mechanism Redundancy Assessment, which assesses whether two potential environmental factors share most of the mechanism pathways or ultimately affect the same species physiological and ecological processes in the knowledge graph. This can be done using a graph similarity algorithm. For any two LEFs, extract the mechanism subgraphs they are involved in in the knowledge graph, that is, the set of all mechanism pathways from their respective constituent factors to the ecological processes they represent.
[0059] The similarity between the two mechanism subgraphs is calculated using a graph similarity algorithm. If the similarity score is higher than a preset threshold, they are considered to have mechanism redundancy. The preset threshold needs to be set according to the specific application scenario and the selected graph similarity algorithm. Generally, the preset threshold is set to 0.80 to 0.90. When the preset threshold is specifically set to 0.85, it indicates that the calculated similarity score of the mechanism subgraph is greater than or equal to 0.85, which means that the two environmental factors have more than 85% high structural overlap in the knowledge graph path affecting the physiology and ecology of the target species. At this point, it can be clearly determined that they have significant mechanism redundancy and will be removed in subsequent screening.
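The text does not name a specific graph similarity algorithm. As a stand-in, the sketch below uses the Jaccard similarity of the two subgraphs' edge sets and applies the 0.85 threshold mentioned above; the edge lists are invented for illustration.

```python
def jaccard(edges_a, edges_b):
    """Jaccard similarity of two edge sets: |A & B| / |A | B|."""
    a, b = set(edges_a), set(edges_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_mechanism_redundant(edges_a, edges_b, threshold=0.85):
    """Flag two LEFs as mechanism-redundant when similarity reaches the threshold."""
    return jaccard(edges_a, edges_b) >= threshold


# Mechanism subgraphs written as sets of (source, target) edges (illustrative).
SUBGRAPH_A = {
    ("temperature", "enzyme_activity"),
    ("pressure", "enzyme_activity"),
    ("enzyme_activity", "metabolic_rate"),
    ("oxygen", "metabolic_rate"),
    ("metabolic_rate", "energy_metabolism"),
    ("energy_metabolism", "growth"),
    ("growth", "habitat_distribution"),
}
SUBGRAPH_B = SUBGRAPH_A - {("oxygen", "metabolic_rate")}  # shares 6 of 7 edges
SUBGRAPH_C = {
    ("organic_carbon_flux", "energy_acquisition"),
    ("energy_metabolism", "growth"),
}

print(is_mechanism_redundant(SUBGRAPH_A, SUBGRAPH_B))
print(is_mechanism_redundant(SUBGRAPH_A, SUBGRAPH_C))
```

Jaccard similarity ignores edge weights and node types; a weighted or structure-aware measure would change the scores and, as the text notes, the appropriate threshold.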
[0060] Step 3.2.2, Data Proxy Redundancy Assessment, involves analyzing the construction formulas of the potential environmental factors and the expected correlations of their original input factors to predict the potential statistical correlations between the potential environmental factors. Based on the construction formula of the LEF, its original input factors are analyzed. Combining the expected statistical correlations among these original environmental factors, the potential statistical correlations between two LEFs are predicted. For example, if two LEFs share a large number of identical and highly correlated original input factors, their data proxy redundancy is high.
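One way to turn this description into a number is sketched below. The scoring rule is an assumption introduced here, not the formula of the invention: for each input of one LEF, take the largest expected absolute correlation with any input of the other LEF (1.0 for an identical input), average these values, and symmetrize over both directions. The correlation values are illustrative.

```python
def expected_corr(a, b, corr):
    """Expected absolute correlation between two raw factors (1.0 if identical)."""
    if a == b:
        return 1.0
    return abs(corr.get((a, b), corr.get((b, a), 0.0)))


def directed_overlap(inputs_a, inputs_b, corr):
    """Average, over inputs of A, of the best-matching correlation with an input of B."""
    return sum(
        max(expected_corr(a, b, corr) for b in inputs_b) for a in inputs_a
    ) / len(inputs_a)


def data_proxy_redundancy(inputs_a, inputs_b, corr):
    """Symmetric overlap score in [0, 1]; higher means more redundant at the data level."""
    return 0.5 * (
        directed_overlap(inputs_a, inputs_b, corr)
        + directed_overlap(inputs_b, inputs_a, corr)
    )


# Hypothetical expected correlations between raw factors.
EXPECTED_CORR = {
    ("pressure", "depth"): 0.9,
    ("temperature", "depth"): -0.6,
}

lef_a_inputs = ["temperature", "pressure"]
lef_b_inputs = ["temperature", "depth"]
redundancy = data_proxy_redundancy(lef_a_inputs, lef_b_inputs, EXPECTED_CORR)
print(f"data proxy redundancy: {redundancy:.2f}")
```

Two LEFs that share one input and whose remaining inputs are strongly correlated score close to 1, matching the example in the text of factors built from identical and highly correlated inputs.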
[0061] Step 3.2.3: Calculate the mechanism redundancy between two potential environmental factors. The weight coefficients in the mechanism redundancy formula take values greater than zero and satisfy [...]; the target set of physiological and ecological processes of a potential environmental factor is the set of all physiological and ecological processes to which that factor is mechanistically linked.
[0062] Step 3.2.4: Calculate the data proxy redundancy, with each weight parameter taking a value between 0 and 1 inclusive.
[0063] Step 3.3 includes: Step 3.3.1: Maximize the overall mechanism relevance score of the selected set of potential environmental factors S; Step 3.3.2: Minimize the mechanism redundancy and data proxy redundancy within the selected set S. The mechanism redundancy objective value of the selected set characterizes the degree of repetition among the selected potential environmental factors at the ecological mechanism level; the smaller the value, the less the mechanism overlap among them. The data proxy redundancy objective value of the selected set characterizes the degree of duplication among the selected potential environmental factors at the level of the original input data; the smaller the value, the lower the data correlation among them. The pairwise mechanism redundancy between two potential environmental factors measures their degree of similarity in the knowledge graph, such as shared mechanism paths, jointly affected species physiological and ecological processes, or shared mechanism action chains; the larger the value, the more redundant the two are at the mechanism level. The pairwise data proxy redundancy between two potential environmental factors measures their degree of overlap in terms of the composition of the original input environmental factors, the correlation of input variables, or substitutability; the larger the value, the more redundant the two are at the data level.
[0064] Step 3.3.3: Ensure that the selected potential environmental factors cover the diversity of key ecological processes of the species; Step 3.3.4 further considers the availability or inferability of the original environmental factor data required to construct the potential environmental factors in the target sea area as a constraint.
[0065] The maximum quantity Kmax is a preset, externally defined parameter that caps the number of selected potential environmental factors so that the final screening results remain practical and operable. It needs to be set and adjusted according to the specific application; for deep-sea species distribution prediction tasks, Kmax is typically set between 2 and 10. For example, when predicting the distribution of a deep-sea coral species from extremely sparse observation data (e.g., only 20% of the data is retained), very strict feature dimensionality reduction is needed to preserve the model's generalization ability. Setting Kmax=2 forces the multi-objective optimization algorithm to eliminate as much mechanism redundancy and data redundancy as possible while pursuing mechanism relevance, so that it identifies the two core mechanism factors driving the survival of this coral (such as the combination of DPPI and respiratory stress) and maintains high AUC and TSS prediction scores even under extremely scarce data. As a second example, in a regional, large-scale habitat suitability assessment of deep-sea benthic communities, Kmax is set to 5. This allows the model to retain sufficient diversity to cover different physiological mechanism pathways (improving the physiological diversity score), while keeping the input within 5 dimensions to prevent severe feature collinearity when the factors are fed into random forest or maximum entropy models, so that the output results can be used to guide the actual delineation of deep-sea marine protected areas.
[0066] Step 3.3.5: Priority ranking is achieved through a multi-objective optimization model.
[0067] It is important to note that, since deep-sea feature screening involves a massive combinatorial explosion problem, mature multi-objective optimization algorithms from the computer science field (such as, but not limited to, the non-dominated sorting genetic algorithm NSGA-II or multi-objective particle swarm optimization) are applied to solve the model in order to find the optimal Pareto Front among the various objectives. Through Pareto Front optimization, the system can output the optimal set of potential environmental factors with the highest relevance and diversity, along with the ranking results, without increasing redundancy, thereby significantly improving the prediction accuracy and robustness of the deep-sea species distribution model.
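For a small candidate pool, the Pareto front can be illustrated by exhaustive enumeration instead of NSGA-II or particle swarm optimization; the sketch below does exactly that and is not a substitute for those algorithms on large pools. The four objectives (total relevance, number of covered processes, summed pairwise mechanism redundancy, summed pairwise data proxy redundancy), the candidate names, and all numbers are assumptions for illustration.

```python
from itertools import combinations


def evaluate(subset, relevance, processes, mech_red, data_red):
    """Return an objective vector in which larger is better for every entry."""
    pairs = list(combinations(subset, 2))
    covered = set().union(*(processes[f] for f in subset))
    return (
        sum(relevance[f] for f in subset),          # overall mechanism relevance
        len(covered),                               # ecological process coverage
        -sum(mech_red.get(p, 0.0) for p in pairs),  # negated mechanism redundancy
        -sum(data_red.get(p, 0.0) for p in pairs),  # negated data proxy redundancy
    )


def dominates(a, b):
    """True if vector `a` is at least as good as `b` everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates, k_max, relevance, processes, mech_red, data_red):
    """Score every subset of size 1..k_max and keep the non-dominated ones."""
    scored = {
        subset: evaluate(subset, relevance, processes, mech_red, data_red)
        for k in range(1, k_max + 1)
        for subset in combinations(sorted(candidates), k)
    }
    return {
        subset: vector
        for subset, vector in scored.items()
        if not any(dominates(other, vector) for other in scored.values())
    }


# Hypothetical candidates; redundancy dictionaries are keyed by sorted name pairs.
RELEVANCE = {"DPPI": 0.9, "DPPI_alt": 0.7, "RAP": 0.6}
PROCESSES = {
    "DPPI": {"physiological adaptation"},
    "DPPI_alt": {"physiological adaptation"},
    "RAP": {"energy acquisition"},
}
MECH_RED = {("DPPI", "DPPI_alt"): 0.9, ("DPPI", "RAP"): 0.1, ("DPPI_alt", "RAP"): 0.1}
DATA_RED = {("DPPI", "DPPI_alt"): 0.95, ("DPPI", "RAP"): 0.2, ("DPPI_alt", "RAP"): 0.2}

front = pareto_front(RELEVANCE, 2, RELEVANCE, PROCESSES, MECH_RED, DATA_RED)
for subset, vector in sorted(front.items()):
    print(subset, vector)
```

The front contains several trade-offs rather than a single winner, so a final ranking still needs a preference among the objectives, for example a weighted score or a rule that rejects pairs above the redundancy threshold.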
[0068] In the method described above for constructing and screening potential environmental factors for key deep-sea species based on mechanism inference, the potential environmental factors include, but are not limited to, any one or a combination of the following: deep-sea physiological stress index, resource availability potential, habitat complexity proxy, or physical disturbance intensity index.
[0069] Through the aforementioned path derivation and multi-objective optimization screening based on knowledge graphs, an innovative leap has been achieved from simple raw data to mechanism-driven potential environmental factors at the feature construction level. At the evaluation strategy level, the dual redundancy of ecological mechanisms and underlying data has been effectively eliminated, and the optimal combination of environmental factors with clear deep-sea biological interpretability can be output. This significantly improves the scientificity, robustness and reliability of species distribution pattern prediction under deep-sea sparse data conditions.
[0070] A simulation experiment was conducted to predict the survival of deep-sea cold-water corals. The simulation data included two kinds of features. The first kind is features associated with biological mechanisms (such as temperature, pressure, and dissolved oxygen), which directly affect the survival of corals.
[0071] The second kind is noise features unrelated to coral survival (such as salinity and silicates), which may create spurious statistical associations in the data.
[0072] The experiment was conducted under two data scenarios: abundant data (100% training data) and extremely scarce data (only 20% training data) to simulate the real challenges of deep-sea research. The performance of the prediction models built using the present invention and traditional methods was compared and analyzed through simulation experiments. Traditional methods directly use all raw data features (including mechanistic and noise features) for modeling, representing the current mainstream pure data-driven prediction modeling approach. The present invention strictly follows the steps of S1 (construction of a deep-sea ecological knowledge graph), S2 (construction of potential environmental factors), and S3 (mechanism-inference-driven screening and priority ranking of potential environmental factors), systematically selecting a few key features most relevant to the target for prediction modeling. The present invention encodes the biological prior knowledge related to the target species, coral, by constructing a mechanistic knowledge graph and deletes any mechanistic paths that have no connection to coral survival, systematically eliminating noise features and preventing their influence on prediction.
[0073] Figure 1 compares the AUC (area under the receiver operating characteristic curve) of the method of this invention and the traditional method under different data scenarios. Figure 2 compares the TSS (True Skill Statistic) of the two methods under the same scenarios. In both figures, blue rectangles represent the method of this invention and red rectangles represent the traditional method at each data volume. The ROC (receiver operating characteristic) curve evaluates the performance of a binary classification model by plotting the true positive rate (TPR) against the false positive rate (FPR). In classification problems, the area under the ROC curve (AUC) is a commonly used metric for evaluating binary classification models, and it is especially suitable for evaluating a model's ability to distinguish between positive and negative samples. The AUC score ranges from 0 to 1, where an AUC score of 0.5 indicates that the model's performance is no different from random guessing, and an AUC score of 1.0 indicates that the model can perfectly distinguish between positive and negative samples. A higher AUC score indicates better classification performance, meaning the model achieves a better balance between correctly identifying the positive class (e.g., species presence) and correctly identifying the negative class (e.g., species absence). As shown in Figure 1, in the full data scenario, the AUC value of the traditional method (red) is 0.960, and the AUC value of the method of this invention (blue) is 0.961. 
This indicates that when data is sufficient, the performance of the two methods is similar and both exhibit excellent classification performance (AUC values are close to 1.0). In the sparse data scenario, the AUC score of the traditional method (red) is 0.950, and the AUC score of the method of this invention (blue) is 0.955. This shows that when the training data is reduced to 20%, the AUC scores of both methods decrease slightly but remain at a high level. Compared with the traditional method, the performance decline of the method of this invention under sparse data is smaller, indicating that it may be more robust to incomplete data. In ecological and species distribution models, the True Skill Statistic (TSS) is another important model performance indicator. It combines the model's sensitivity (the proportion of presence points that are correctly predicted) and specificity (the proportion of absence points that are correctly predicted) as TSS = sensitivity + specificity - 1. The TSS score ranges from -1 to 1. A TSS score of 0 indicates that the model performance is no different from random guessing. A TSS score of 1 indicates that the model perfectly predicts the distribution of species. A negative TSS score indicates that the model performs worse than random guessing. A higher TSS score indicates higher predictive accuracy, and in particular greater stability when dealing with imbalanced data (e.g., presence records are far fewer than absence records). As shown in Figure 2, in the full data scenario, the TSS score of the traditional method (red) is 0.763, while the TSS score of the method of this invention (blue) is 0.768. Both methods demonstrate good predictive ability (far above 0), with the method of this invention showing a slight advantage. In the sparse data scenario, the TSS score of the traditional method (red) is 0.718, while the TSS score of the method of this invention (blue) is 0.761. 
In the case of sparse data, the TSS score of the traditional method decreases significantly (from 0.763 to 0.718). Compared to the traditional method, the TSS score of the method of this invention decreases much less (from 0.768 to 0.761), and in the sparse data scenario, the TSS score of the method of this invention (0.761) is significantly better than that of the traditional method (0.718), with a more significant performance difference. This indicates that compared to the traditional method, the method of this invention can better maintain its predictive ability and reliability when facing the challenge of data sparsity.
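The two evaluation metrics can be computed as sketched below. The AUC here uses the rank-based definition (the probability that a randomly chosen presence record scores higher than a randomly chosen absence record, with ties counted as one half), and TSS is sensitivity plus specificity minus one. The labels, scores, and the 0.5 threshold are toy values, not the data of the simulation experiment.

```python
def auc(y_true, scores):
    """Rank-based AUC over all presence/absence pairs (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def tss(y_true, y_pred):
    """True Skill Statistic: sensitivity + specificity - 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


# Toy presence (1) / absence (0) labels and predicted suitability scores.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # illustrative threshold

print(f"AUC = {auc(y_true, scores):.3f}, TSS = {tss(y_true, y_pred):.3f}")
```

AUC is threshold-free while TSS depends on the chosen threshold, which is one reason the two metrics can respond differently to sparse data.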
[0074] Figure 3 shows the calculation and mechanistic justification process of the Deep-Sea Physiological Stress Index (DPPI). Taking the DPPI as a concrete example, it illustrates how a potential environmental factor (LEF) with higher-order biological or ecological significance is constructed from multiple raw environmental parameters through knowledge graph-based mechanism inference and algorithmic calculation. As shown in Figure 3, temperature, dissolved oxygen, and pressure serve as input data, and an algorithm derived from the knowledge graph constructs the calculation formula for the DPPI as a potential environmental factor. The process comprises: using nonlinear functions to build a nonlinear mapping between the deep-sea temperature field and the original dissolved oxygen data; calculating the dissolved oxygen and temperature co-metabolic index; calculating the pressure compensation coefficient; using deep-sea pressure data to compensate the nonlinear mapping between the deep-sea temperature field and the dissolved oxygen data; constructing the corresponding potential environmental factor calculation formula based on the environmental pressure interaction mechanism, so that the effects of temperature, dissolved oxygen, and pressure on the target species are considered together; outputting the DPPI together with its mechanistic evidence; and constructing a prediction model based on the DPPI that outputs the expected species response pattern and ecologically significant attributes.
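The flow of Figure 3 can be sketched as follows. This is only an illustrative sketch: the text above does not give the functional forms, so the Q10-style oxygen demand, the saturating oxygen supply term, the quadratic pressure compensation, the final squashing to [0, 1), all parameter values, and the units (degrees Celsius, mL/L, dbar) are assumptions introduced here, as are the function names.

```python
import math


def co_metabolic_index(temp_c, oxygen_ml_l, q10=2.0, t_ref=2.0, k_half=1.0):
    """Assumed form: temperature-driven oxygen demand that supply does not meet."""
    demand = q10 ** ((temp_c - t_ref) / 10.0)      # equals 1.0 at the reference temperature
    supply = oxygen_ml_l / (oxygen_ml_l + k_half)  # saturating supply term in [0, 1)
    return demand * (1.0 - supply)


def pressure_compensation(pressure_dbar, p_opt=3000.0, width=1500.0):
    """Assumed form: coefficient >= 1 that grows away from an optimum pressure."""
    return 1.0 + ((pressure_dbar - p_opt) / width) ** 2


def dppi(temp_c, oxygen_ml_l, pressure_dbar):
    """Pressure-compensated stress index, squashed into [0, 1) (assumed scaling)."""
    raw = co_metabolic_index(temp_c, oxygen_ml_l) * pressure_compensation(pressure_dbar)
    return 1.0 - math.exp(-raw)


# Hypothetical sites: a benign one and a warmer, oxygen-poor, deeper one.
benign = dppi(temp_c=2.0, oxygen_ml_l=6.0, pressure_dbar=3000.0)
stressed = dppi(temp_c=4.0, oxygen_ml_l=1.0, pressure_dbar=5000.0)
print(benign, stressed)
```

The point of the sketch is the structure rather than the numbers: three raw inputs are first combined pairwise through nonlinear terms and then merged into one index that a downstream distribution model can use as a single predictor.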
[0075] Figure 4 is the flowchart for calculating the mechanism relevance score provided by this invention. As shown in Figure 4, for each path from an environmental factor to the core physiological and ecological processes of the target species, the mechanistic connection strengths of the relation edges on the path are multiplied together with the confidences of the corresponding mechanism hypotheses (a confidence-weighted product). The products of all paths are then summed to output the mechanism relevance score, which characterizes the mechanism relevance of a potential environmental factor.
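The calculation described for Figure 4 can be sketched as a sum over paths of a product over edges. The `Edge` class, the choice to attach one confidence value to each edge, and the example strengths and confidences are assumptions made for illustration only.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    strength: float    # mechanistic connection strength, assumed to lie in [0, 1]
    confidence: float  # confidence of the supporting mechanism hypothesis, in [0, 1]


def path_score(path):
    """Confidence-weighted product of the edge strengths along one path."""
    if not path:
        return 0.0  # an empty path carries no mechanistic support
    return prod(edge.strength * edge.confidence for edge in path)


def mechanism_relevance(paths):
    """Sum of the path scores over all complete paths of one factor."""
    return sum(path_score(path) for path in paths)


# Hypothetical paths from raw factors to a core process of the target species.
paths = [
    [Edge("temperature", "metabolic rate", 0.9, 0.8),
     Edge("metabolic rate", "survival", 0.8, 0.9)],
    [Edge("dissolved oxygen", "respiration", 0.7, 1.0),
     Edge("respiration", "survival", 0.5, 0.8)],
]
score = mechanism_relevance(paths)
print(round(score, 4))
```

With a product along each path, one weak or poorly supported edge lowers the contribution of the whole path, while summing over paths rewards factors that reach the target processes through several independent mechanisms.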
[0076] Figure 5 is the flowchart provided by this invention for the redundancy assessment at the mechanism level and the data level, which characterizes the redundancy of potential environmental factors (the output process of mechanism redundancy and data proxy redundancy). The prediction system built on the method of this invention includes a redundancy assessment module. As shown in Figure 5, this module calculates the redundancy between any two candidate factors by comparing the intersection of their knowledge graph mechanism subgraphs (mechanism redundancy) and the overlap of their underlying raw input data (data proxy redundancy). It outputs both mechanism redundancy and data proxy redundancy, and generates a global set of candidate potential environmental factors, a list of relevance scores, and a redundancy matrix in system memory for subsequent calls. Calculating mechanism redundancy includes constructing knowledge graph subgraphs for the two potential environmental factors, calculating the intersection-over-union of their mechanism paths, and outputting the mechanism redundancy based on the mechanism path intersection-over-union and the similarity of the knowledge graph subgraphs. Calculating data proxy redundancy includes obtaining the underlying raw input data, the overlap of input factors, and the expected correlation, and outputting the data proxy redundancy.
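The two redundancy measures described for Figure 5 can be sketched as below. The sketch follows only the flowchart description: using edge-set intersection-over-union as the subgraph similarity, averaging the expected correlation over all input pairs, combining the terms as weighted sums, and the equal weights are all assumptions, and the paths, inputs, and correlation value are hypothetical.

```python
from itertools import product


def jaccard(a, b):
    """Intersection-over-union of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edges_of(paths):
    """Collect the directed edges that make up a set of node paths."""
    return {(p[k], p[k + 1]) for p in paths for k in range(len(p) - 1)}


def mechanism_redundancy(paths_i, paths_j, w_path=0.5, w_graph=0.5):
    path_iou = jaccard(paths_i, paths_j)
    graph_sim = jaccard(edges_of(paths_i), edges_of(paths_j))  # assumed similarity
    return w_path * path_iou + w_graph * graph_sim


def data_proxy_redundancy(inputs_i, inputs_j, expected_corr, w_overlap=0.5, w_corr=0.5):
    def corr(a, b):
        # A shared input is fully redundant; unknown pairs default to 0.
        return 1.0 if a == b else abs(expected_corr.get(frozenset((a, b)), 0.0))

    pairs = list(product(inputs_i, inputs_j))
    mean_corr = sum(corr(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return w_overlap * jaccard(inputs_i, inputs_j) + w_corr * mean_corr


# Hypothetical mechanism paths and raw inputs of two candidate factors.
paths_a = {("temperature", "metabolism", "survival"),
           ("oxygen", "respiration", "survival")}
paths_b = {("oxygen", "respiration", "survival"),
           ("organic_carbon_flux", "feeding", "growth")}
inputs_a = {"temperature", "oxygen", "pressure"}
inputs_b = {"oxygen", "organic_carbon_flux"}
expected_corr = {frozenset(("temperature", "oxygen")): -0.6}

mech = mechanism_redundancy(paths_a, paths_b)
data = data_proxy_redundancy(inputs_a, inputs_b, expected_corr)
print(round(mech, 4), round(data, 4))
```

Evaluating both functions for every pair of candidates yields the two redundancy matrices that the later selection step consumes.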
[0077] Figure 6 is the flowchart for priority ranking and selection provided by the present invention. As shown in Figure 6, the system first takes as input the global candidate potential environmental factor set, the relevance scores, and the redundancy matrix generated in the previous steps. It then performs hard constraint checks, filtering out candidates whose data are unavailable or that exceed the set size limit. Four sub-objectives (maximizing overall mechanism relevance, maximizing process coverage diversity, minimizing overall mechanism redundancy, and minimizing overall data redundancy) are combined into a multi-objective optimization function, on which a multi-objective optimization model is built. An optimization algorithm is applied to this model to solve for the optimal weight combination. The output is the optimal set of potential environmental factors, or the optimal solutions on the Pareto front, together with the ranking of the potential environmental factors within the set.
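A minimal sketch of the selection step in Figure 6 follows. The text does not name the optimization algorithm, so this example substitutes an exhaustive search over small subsets with a fixed weighted sum of the four sub-objectives. The weights, the factor labels `BRUP`, `TEMP_PROXY`, and `VENT_INDEX`, and all scores are hypothetical.

```python
from itertools import combinations


def objective(subset, relevance, processes, mech_red, data_red, weights):
    """Weighted sum: reward relevance and coverage, penalize both redundancies."""
    w_rel, w_div, w_mech, w_data = weights
    pairs = list(combinations(sorted(subset), 2))
    total_relevance = sum(relevance[f] for f in subset)
    coverage = len(set().union(*(processes[f] for f in subset)))
    total_mech = sum(mech_red[p] for p in pairs)
    total_data = sum(data_red[p] for p in pairs)
    return (w_rel * total_relevance + w_div * coverage
            - w_mech * total_mech - w_data * total_data)


def select_factors(available, max_size, relevance, processes, mech_red, data_red, weights):
    # Hard constraints: keep only factors with available data, cap the set size.
    feasible = sorted(f for f, ok in available.items() if ok)
    subsets = [s for k in range(1, max_size + 1) for s in combinations(feasible, k)]
    best = max(subsets, key=lambda s: objective(
        s, relevance, processes, mech_red, data_red, weights))
    ranking = sorted(best, key=lambda f: relevance[f], reverse=True)
    return best, ranking


# Hypothetical candidates; redundancy tables are keyed by sorted name pairs.
available = {"DPPI": True, "BRUP": True, "TEMP_PROXY": True, "VENT_INDEX": False}
relevance = {"DPPI": 0.80, "BRUP": 0.60, "TEMP_PROXY": 0.70, "VENT_INDEX": 0.90}
processes = {"DPPI": {"respiration", "metabolism"}, "BRUP": {"feeding", "growth"},
             "TEMP_PROXY": {"metabolism"}, "VENT_INDEX": {"chemosynthesis"}}
mech_red = {("BRUP", "DPPI"): 0.10, ("BRUP", "TEMP_PROXY"): 0.05,
            ("DPPI", "TEMP_PROXY"): 0.90}
data_red = {("BRUP", "DPPI"): 0.20, ("BRUP", "TEMP_PROXY"): 0.00,
            ("DPPI", "TEMP_PROXY"): 0.80}

best, ranking = select_factors(available, 2, relevance, processes,
                               mech_red, data_red, weights=(1.0, 0.2, 1.0, 1.0))
print(best, ranking)
```

In this toy case the pair with the two highest relevance scores among available factors, `DPPI` and `TEMP_PROXY`, is rejected because the two factors are highly redundant with each other, which is the behavior the redundancy sub-objectives are meant to produce. Exhaustive search is only practical for a handful of candidates; a larger candidate pool would need a dedicated multi-objective optimizer.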
[0078] The above embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention.
Claims
1. A method for constructing and screening potential environmental factors for deep-sea species based on mechanism inference, characterized in that it comprises:
S1. Construction of a deep-sea ecology knowledge graph, including describing the features of the original environmental factors and the ecological characteristics of the target species in the target sea area, and establishing the mechanism association mapping relationship between the original environmental factors and the target species;
S2. Construction of potential environmental factors: based on the deep-sea ecology knowledge graph constructed in S1, deriving the rules for constructing potential environmental factors; based on the derived rules, constructing a derivation algorithm for potential environmental factors; and, with the original environmental factors as the input of the algorithm, constructing multiple potential environmental factors through the derivation algorithm;
S3. Mechanism inference-driven screening and prioritization of potential environmental factors: calculating the mechanism relevance score, mechanism redundancy, and data proxy redundancy of each potential environmental factor; prioritizing the potential environmental factors based on the mechanism relevance score, mechanism redundancy, and data proxy redundancy to generate multiple sets of potential environmental factors; constructing a comprehensive objective function that maximizes the mechanism relevance score, minimizes the mechanism redundancy, and minimizes the data proxy redundancy; and outputting the set of potential environmental factors with the largest comprehensive objective function value as the optimal set of environmental factors, wherein each potential environmental factor in the optimal set is accompanied by a mechanism-based justification.
2. The method for constructing and screening potential environmental factors for deep-sea species based on mechanism inference according to claim 1, characterized in that S1 includes:
S1.1. Characterizing the original environmental factors of the target sea area, including collecting the original environmental factors of the target sea area, constructing a list of them, and recording, for each original environmental factor, its measurement time, measurement range, variability, measurement uncertainty, and availability in the target sea area;
S1.2. Describing the ecological characteristics of the target species in the target sea area, including collecting those ecological characteristics and recording the physiological tolerance range, reproductive mode, feeding strategy, dispersal ability, habitat preference, and key ecological interaction information of the target species;
S1.3. Constructing a structured deep-sea ecology knowledge graph, including establishing the mechanism association mapping relationship between the original environmental factors and the target species based on the feature descriptions of S1.1 and S1.2, and formally representing the original environmental factors, the ecological characteristics of the target species, and the mechanism association mapping relationship between the original environmental factors of S1.1 and the ecological characteristics of S1.2;
The deep-sea ecology knowledge graph consists of entity nodes and relation edges. Entity nodes include original environmental factors, ecological characteristics of the target species, and potential environmental factors. Original environmental factors include physical factors, chemical factors, topographic and geological factors, and deep-sea event factors. Ecological characteristics of the target species include species characteristics, physiological processes, and ecological processes. Relation edges represent the mechanism association mapping relationship between original environmental factors and ecological characteristics of the target species.
3. The method for constructing and screening potential environmental factors for deep-sea species based on mechanism inference according to claim 2, characterized in that the mechanism association mapping relationship includes mechanistic connections and mechanistic hypotheses between the original environmental factors and the ecological characteristics of the target species. A mechanistic hypothesis consists of multiple mechanistic connections, and each mechanistic connection in a mechanistic hypothesis carries an attribute that quantifies the reliability of the mechanistic hypothesis. The construction rule for potential environmental factors is: traverse the deep-sea ecology knowledge graph; identify all complete paths that lead, through mechanistic connections, from a set of original environmental factors to the physiological and ecological processes of the target species; characterize each complete path found by the traversal as a potential environmental factor (LEF); define a calculation function for each LEF that captures the nonlinear response, threshold effect, interaction, and weight allocation of the mechanistic connections; and thereby derive the potential environmental factors.
4. The method for constructing and screening potential environmental factors for deep-sea species based on mechanism inference according to claim 3, characterized in that the potential environmental factors include a deep-sea physiological stress index and a benthic resource utilization potential; the deep-sea physiological stress index is: [formula]; [formula]; where the symbols denote, in order: the temperature, the pressure, the dissolved oxygen level, and the pH of the target sea area; a nonlinear weighting function; the combined stress index value, which represents the combined stress of temperature, pressure, and dissolved oxygen; the stress response functions for temperature, pressure, and dissolved oxygen, respectively; the interaction function of temperature and pressure; and four weighting coefficients, each assigned to one of four terms of the formula.
5. The method for constructing and screening potential environmental factors for deep-sea species based on mechanism inference according to claim 4, characterized in that the benthic resource utilization potential is: [formula]; where the symbols denote, in order: the organic carbon flux, the water exchange rate, the sulfide concentration, the methane concentration, and the sediment type of the target sea area; [formula]; where the symbols denote, in order: the contribution within the complete function that is related to organic carbon flux and water exchange rate; the weighting coefficient of the organic carbon flux; and the turbidity factor associated with the water exchange rate.
6. The method for constructing and screening potential environmental factors for deep-sea species based on mechanism inference according to claim 5, characterized in that the mechanism relevance score in S3 is: [formula]; where the symbols denote, in order: the i-th potential environmental factor; the set of mechanism paths of that potential environmental factor, i.e., all complete paths connecting it to the ecological and physiological processes of the target species; any path in the set of mechanism paths; a relation edge on that path; and the mechanistic connection strength of that relation edge.
7. The method for constructing and screening potential environmental factors for deep-sea species based on mechanism inference according to claim 6, characterized in that the mechanism redundancy is: [formula]; where the symbols denote, in order: the i-th potential environmental factor; the sets of mechanism paths corresponding to the two potential environmental factors being compared; the sets of target physiological and ecological processes corresponding to those two factors; the mechanism subgraphs corresponding to those two factors; the similarity of the mechanism subgraphs; and the weight coefficients of the mechanism path direction term, the ecological feature term, and the mechanism subgraph similarity term, respectively. The data proxy redundancy is: [formula]; where the symbols denote, in order: the sets of original environmental factors from which the two potential environmental factors are constructed; any original environmental factor in the first set and any original environmental factor in the second set; the expected correlation between those two original environmental factors; and the weighting coefficients.
8. The method for constructing and screening potential environmental factors for deep-sea species based on mechanism inference according to claim 7, characterized in that the objective of maximizing the mechanism relevance score is: [formula]; where the symbols denote, in order: the overall mechanism relevance score of the set of potential environmental factors; the mechanism relevance score of the i-th factor; and the set of potential environmental factors.
9. The method for constructing and screening potential environmental factors for deep-sea species based on mechanism inference according to claim 8, characterized in that the objective of minimizing mechanism redundancy is: [formula]; where the symbols denote, in order: the mechanism redundancy objective function value of the set of potential environmental factors, which characterizes the degree of repetition among the selected potential environmental factors at the ecological mechanism level; and the mechanism redundancy between two potential environmental factors in the set. The objective of minimizing data proxy redundancy is: [formula]; where the symbols denote, in order: the data proxy redundancy objective function value of the set of potential environmental factors, which characterizes the degree of duplication among the selected potential environmental factors at the level of the original input data; and the data proxy redundancy between two potential environmental factors in the set.
10. The method for constructing and screening potential environmental factors for deep-sea species based on mechanism inference according to claim 9, characterized in that the comprehensive objective function is: [formula]; [formula]; where the symbols denote, in order: the comprehensive objective function value of the set of potential environmental factors; and the coverage diversity of that set, i.e., the number of ecological processes covered by the potential environmental factors in it.