Environment health risk early warning system and method based on data fusion and artificial intelligence, and storage medium

The environmental health risk early warning system based on data fusion and artificial intelligence solves the problems of insufficient causal relationship identification, multi-source data integration, individual difference assessment and regional adaptability in environmental health research. It realizes full-chain causal quantification, efficient integration of multi-source data and individualized risk assessment, provides multi-level early warning and decision support, and improves the accuracy and foresight of environmental health risk assessment.

CN122245833A · Pending · Publication Date: 2026-06-19 · HUBEI PROVINCIAL ACADEMY OF ECO-ENVIRONMENTAL SCIENCES (PROVINCIAL ECOLOGICAL ENVIRONMENT ENGINEERING ASSESSMENT CENTER)

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HUBEI PROVINCIAL ACADEMY OF ECO-ENVIRONMENTAL SCIENCES(PROVINCIAL ECOLOGICAL ENVIRONMENT ENGINEERING ASSESSMENT CENTER)
Filing Date
2026-03-17
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies in environmental health research suffer from several problems, including difficulty in identifying causal relationships, insufficient integration of multi-source data, one-sided assessment of multi-media exposure, lack of individualized assessment, delayed health risk warnings, difficulty in quantifying the health benefits of emission reduction measures, gaps in the assessment of emerging pollutants, unresolved issues of the superposition effects of historical and existing pollution, insufficient regional adaptability, and lack of inter-system synergy. These issues make it difficult to achieve full-chain causal inference from pollution sources to health effects, multi-source data fusion, individualized risk assessment, and forward-looking early warning.

Method used

An environmental health risk early warning system based on data fusion and artificial intelligence is adopted, including a multi-source data acquisition and fusion platform, a pollution source-environmental medium-human exposure-health effect correlation analysis module, a causal inference and prediction module, a health risk dynamic assessment module, a multi-level early warning and decision support module, an organ system-specific damage biomarker map, a regional adaptation mechanism module, and a multi-system collaborative mechanism module. Through deep learning, machine learning, and large language model technology, it realizes the spatiotemporal matching and fusion of multi-source heterogeneous data, performs causal relationship identification and individualized risk assessment, and provides multi-level early warning and decision support.

Benefits of technology

It has achieved full-chain causal quantification, efficient fusion of multi-source data, cross-media individualized exposure assessment, multi-level biomarker early warning, and regional adaptive collaboration, which has significantly improved the accuracy of health risk assessment and the foresight of early warning, reduced fusion delay and error, and enhanced the ability to assess the health benefits of emission reduction measures.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses an environmental health risk early warning system, method, and storage medium based on data fusion and artificial intelligence. It integrates real-time data on pollution source emissions, environmental media monitoring, population activities and exposure, and health effects. Using an artificial intelligence analysis engine, it sequentially performs pollution source analysis, multi-media transport simulation, human exposure dose calculation, and health effect assessment to construct a full-chain correlation model. It identifies the causal relationship between pollutant exposure and health effects and uses a machine learning prediction model to predict environmental quality changes and population health risks under future pollution scenarios. Based on biomarker monitoring data, it sets four-level early warning thresholds to achieve multi-level, precise early warning of pollutant exposure-related health risks. It quantitatively evaluates the environmental quality improvement effects and health benefits of different emission reduction measures, generating tiered early warning information and targeted intervention suggestions to support environmental management and public health decision-making.

Description

Technical Field

[0001] This invention relates to the field of interdisciplinary technology of environmental health and artificial intelligence, specifically to an environmental health risk early warning system, method and storage medium based on data fusion and artificial intelligence.

Background Technology

[0002] Environmental pollution has become a significant factor affecting global health. Scientifically assessing the entire causal chain of pollutants, from source emission and migration through environmental media to human exposure and health effects, is a prerequisite for developing precise environmental governance strategies and public health interventions. However, current technologies still have the following significant limitations when dealing with complex environmental health issues:

Identifying causal relationships is challenging: Traditional environmental health studies often rely on statistical correlation analysis of observational data, which makes it difficult to effectively control for confounding factors such as meteorology, socioeconomic status, and lifestyle, leading to significant biases in identifying exposure-health causal chains. Furthermore, complex spatiotemporal lags exist between pollutants and health effects (e.g., the effects of PM2.5 on chronic obstructive pulmonary disease can lag by several years), and existing methods lack quantitative means to assess dynamic causal mechanisms.

[0003] Insufficient integration of multi-source heterogeneous data: Environmental monitoring data (such as air and water quality), pollution source emission data (industrial and transportation), population exposure data (activity patterns and dietary surveys), and health effect data (disease surveillance and biomarkers) are scattered across different departments and platforms, resulting in problems such as mismatched spatiotemporal resolution, inconsistent data formats, and a lack of sharing mechanisms. For example, air quality monitoring data is on an hourly basis, while health survey data is often sampled annually, making cross-data source correlation analysis difficult.

[0004] Multi-media exposure assessment is one-sided: Existing exposure assessment techniques often focus on a single environmental medium (such as considering only air pollution or drinking water pollution), neglecting the migration and transformation of pollutants among multiple media such as air, water, soil, food, and indoor decoration materials, as well as the cumulative exposure of the human body through multiple pathways. For example, exposure to heavy metals can occur jointly through the "soil-crop-food chain" and "atmosphere-respiration" pathways, and traditional methods cannot quantify such combined exposure doses.

[0005] Lack of individualized assessment: Current risk assessments are mostly based on population means and do not fully consider individual differences in susceptibility arising from metabolic enzyme gene polymorphisms (such as the CYP450 family), physiological characteristics (children have higher respiratory rates than adults), and behavioral patterns (differences in occupational exposure), resulting in insufficiently precise protection for high-risk groups (such as pregnant women and patients with chronic diseases).

[0006] Delayed health risk warnings: Traditional early warning systems rely on data from already occurred health outcomes (such as hospital-reported cases), lacking sensitive early biomarkers (such as urinary β2-microglobulin indicating renal tubular damage) and prospective predictive models, thus missing the optimal intervention window. For example, lead exposure causing neurodevelopmental damage in children may not produce clinical symptoms for several months, and current technologies cannot provide early warnings in the subclinical stage.

[0007] The health benefits of emission reduction measures are difficult to quantify: existing emission reduction assessments are mostly based on improvements in environmental quality (such as a decrease in PM2.5 concentration), making it difficult to convert environmental benefits into quantifiable health and economic benefits (such as avoiding premature deaths and saving on medical expenses), which prevents policymakers from prioritizing governance solutions that maximize health benefits.

[0008] Emerging pollutant assessment gaps: There is a lack of standardized multi-media monitoring methods, dose-response models, and health risk assessment frameworks for emerging pollutants such as microplastics, per- and polyfluoroalkyl substances (PFAS), and pharmaceutical and personal care product residues (PPCPs), so current practice fails to meet the forward-looking needs of environmental management.

[0009] The combined effects of historical and existing pollution remain unresolved: Existing pollution sources (such as industrial point sources) and historically accumulated pollution (such as heavy metal contaminated sites) coexist in the regional environment. The two may have synergistic, antagonistic, or independent effects in the environmental media. Traditional single-source apportionment techniques cannot distinguish their relative contributions and long-term health impacts.

[0010] Insufficient regional adaptability: Existing model parameters are mostly based on general scenarios (such as plains cities), without considering regional differences such as topography (valleys, coastlines), climate (monsoon, drought), and economic structure (industrial, agricultural). For example, basin topography is prone to pollutant accumulation, requiring adjustment of atmospheric diffusion parameters, but traditional systems lack adaptive mechanisms.

[0011] Lack of inter-system collaboration: Environmental monitoring, public health, and emergency response systems use independent data standards and early warning mechanisms, resulting in information silos. For example, during periods of heavy pollution, environmental protection departments issue PM2.5 warnings, while health departments must independently assess health risks, making real-time cross-departmental coordination impossible.

[0012] In summary, existing technologies are insufficient to achieve full-chain causal inference from pollution sources to health effects, multi-source data fusion, individualized risk assessment, and forward-looking early warning. There is an urgent need for an intelligent correlation early warning method and system that integrates multi-source heterogeneous data, combines causal inference with machine learning, and supports regional adaptation and multi-system collaboration.

Summary of the Invention

[0013] The purpose of this invention is to provide an environmental health risk early warning system, method, and storage medium based on data fusion and artificial intelligence, so as to solve the problem described in the background art: existing technology struggles to achieve full-chain causal inference from pollution source to health effect, multi-source data fusion, individualized risk assessment, and forward-looking early warning.

[0014] To achieve the above objectives, the present invention provides the following technical solution: an environmental health risk early warning system based on data fusion and artificial intelligence, including:
A multi-source data acquisition and fusion platform, which includes a multi-source data automatic acquisition submodule, a data cleaning and standardization submodule, a medical and health data processing submodule, and a spatiotemporal matching and interpolation submodule. It is used to acquire and integrate pollution source data, environmental media data, and population health data.
A pollution source-environmental medium-human exposure-health effect correlation analysis module, which includes a pollution source analysis submodule, an environmental medium transport model submodule, a human exposure assessment submodule, and a health effect assessment submodule. These are used to perform spatiotemporal correlation analysis, pollutant exposure path tracing, causal inference of health risks, and biomarker identification and verification on the multi-source data.
A causal inference and prediction module, which uses causal inference algorithms and machine learning prediction models to identify the causal relationship between pollutant exposure and health effects and to predict future health risks.
A dynamic health risk assessment module, which includes a time-series risk assessment submodule and a spatial risk distribution assessment submodule. It receives input data from the human exposure assessment submodule and the health effect assessment submodule, analyzes the temporal variation pattern of health risks through the time-series risk assessment submodule, and analyzes the spatial distribution characteristics of health risks through the spatial risk distribution assessment submodule. Finally, it integrates the risk assessment results in both the time and space dimensions to generate a dynamic risk assessment report.
A multi-level early warning and decision support module, which, based on risk assessment and prediction results, provides intelligent early warning and decision support functions to help environmental management departments formulate scientific prevention and control measures.
An emission reduction benefit assessment module, which includes an environmental quality improvement assessment sub-module and a health benefit assessment sub-module. It assesses the contribution of different emission reduction measures to improving environmental quality and enhancing public health, providing a quantitative basis for environmental management decisions.

[0015] Furthermore, it also includes:
An organ system-specific damage biomarker atlas module, used to construct biomarker atlases covering eight major organ systems (kidney, liver, respiratory, nervous, cardiovascular, immune, endocrine, and reproductive), enabling accurate identification and early warning of multi-organ system damage caused by environmental pollutants.
An environmental / health knowledge reasoning and integration module, used to extract, integrate, and apply professional knowledge in the environmental health field based on large language model technology, supporting data analysis and decision-making.
A regional adaptation mechanism module, used to automatically adjust model parameters, identify key exposure paths, optimize health risk assessment, and customize early warning thresholds based on the geographical, climatic, hydrological, population, and economic characteristics of different regions.
A multi-system collaboration mechanism module, used to enable data sharing and collaborative work with existing environmental monitoring systems, public health monitoring systems, and emergency response systems.
A system iterative update mechanism module, used to support continuous optimization based on new scientific discoveries, accumulated monitoring data, and user feedback.

[0016] Furthermore, the multi-source data acquisition and fusion platform adopts a deep learning framework for spatiotemporal data fusion and matching to solve the problem of integrating multi-source heterogeneous spatiotemporal data in environmental health research, and to achieve accurate matching and fusion of environmental monitoring data and health monitoring data with different spatiotemporal resolutions, sampling frequencies, and data formats.

[0017] Furthermore, the causal inference and prediction module includes: A deep learning-driven causal inference engine is used to fuse structural causal models with neural networks to achieve high-dimensional nonlinear causal discovery. A framework combining causal inference and graph neural networks is used to model environmental health systems as heterogeneous spatiotemporal causal graph networks, enabling the estimation of full-chain causal effects; Spatiotemporally sensitive counterfactual prediction networks are used to support the neuralization of "do-operators" and quantify intervention effects and uncertainties.
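As a rough illustration of what a "do-operator" quantifies, the sketch below computes an interventional effect by plain backdoor adjustment on a small discrete table. It is not the neural counterfactual network described above. The variables (a binary confounder Z such as stagnant weather, a binary exposure X, a binary health outcome Y) and all probabilities are hypothetical values chosen only for the example.

```python
# Hypothetical discrete structural model: Z -> X, Z -> Y, X -> Y.
# All probabilities below are illustrative assumptions, not values from the patent.
P_Z = {0: 0.5, 1: 0.5}                      # P(Z = z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}             # P(X = 1 | Z = z)
P_Y1_GIVEN_XZ = {                           # P(Y = 1 | X = x, Z = z)
    (0, 0): 0.1, (1, 0): 0.3,
    (0, 1): 0.4, (1, 1): 0.6,
}


def p_x_given_z(x: int, z: int) -> float:
    """P(X = x | Z = z) for binary X."""
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1


def p_y1_do_x(x: int) -> float:
    """Backdoor adjustment: P(Y=1 | do(X=x)) = sum_z P(Y=1 | x, z) * P(z)."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)


def p_y1_given_x(x: int) -> float:
    """Observational P(Y=1 | X=x), which mixes in the confounder's influence."""
    joint = {z: p_x_given_z(x, z) * P_Z[z] for z in P_Z}   # P(X=x, Z=z)
    norm = sum(joint.values())                              # P(X=x)
    return sum(P_Y1_GIVEN_XZ[(x, z)] * joint[z] / norm for z in P_Z)


# Interventional contrast versus the naive observational contrast.
causal_effect = p_y1_do_x(1) - p_y1_do_x(0)
naive_difference = p_y1_given_x(1) - p_y1_given_x(0)
print(f"effect under do(X): {causal_effect:.2f}, naive difference: {naive_difference:.2f}")
```

In this toy table the naive difference is larger than the interventional effect because the confounder raises both exposure and outcome. That gap is the bias the causal module is meant to remove.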

[0018] Furthermore, the multi-level early warning and decision support module includes: The scenario simulation and emergency drill subsystem is used to establish a database of emergency scenarios, enabling simulation of event evolution, health impact and emergency response, and interactive multi-department collaborative drills. The ALOHA / CALPUFF / WASP model (ALOHA is typically used for chemical leaks, CALPUFF is an atmospheric diffusion model, and WASP is a water model) is used to simulate sudden events such as leaks, explosions, and water pollution. A multi-role collaborative exercise platform is used to record decision-making processes and generate exercise evaluation reports.

[0019] Furthermore, the regional adaptation mechanism module is used to: automatically identify and classify regional topography, climate, hydrology, land use, population, and economic characteristics; dynamically adjust the parameters of the atmospheric diffusion, multi-media transport, exposure assessment, and health effects models based on sensitivity analysis and a mapping relationship library; identify the dominant regional exposure pathways and generate customized monitoring plans; and support adaptive operation across region types, including plains, coastal, mountainous, basin, agricultural, industrial, and urban areas.

[0020] Furthermore, the multi-system collaboration mechanism module includes: standardized data exchange protocols, supporting XML / JSON / CSV and industry standards such as HJ / T 212, HL7 FHIR, OGC, and DICOM; a federated-learning-based distributed analysis architecture, which uses the FedAvg parameter aggregation strategy to achieve joint modeling across systems while protecting data privacy; multi-source early warning integration technology, which integrates environmental monitoring, meteorological, disease, and public opinion early warning information to generate unified hierarchical early warnings; and an integrated decision support interface, which enables collaborative action and optimized resource allocation among multiple departments, including environmental protection, health, and emergency response.
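The FedAvg aggregation step mentioned above can be sketched as a sample-size-weighted average of locally trained parameters. This is a minimal illustration only: the parameter names, client sample counts, and the two example "systems" are hypothetical, and local training, communication, and any additional privacy measures are left out.

```python
from typing import Dict, List, Tuple

Params = Dict[str, List[float]]


def fedavg(updates: List[Tuple[Params, int]]) -> Params:
    """Aggregate client parameters, weighting each client by its sample count."""
    if not updates:
        raise ValueError("at least one client update is required")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    reference = updates[0][0]
    aggregated: Params = {}
    for name, ref_values in reference.items():
        acc = [0.0] * len(ref_values)
        for params, n in updates:
            # Each client contributes in proportion to its share of the samples.
            for k, value in enumerate(params[name]):
                acc[k] += value * n / total
        aggregated[name] = acc
    return aggregated


# Hypothetical example: an environmental-monitoring node and a health-surveillance
# node each train locally and share only parameters, never raw records.
env_update = ({"w": [1.0, 2.0], "b": [0.0]}, 100)
health_update = ({"w": [3.0, 4.0], "b": [1.0]}, 300)
global_params = fedavg([env_update, health_update])
print(global_params)
```

Only model parameters cross system boundaries in this scheme, which is how it supports joint modeling without pooling the underlying data.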

[0021] An environmental health risk early warning method based on data fusion and artificial intelligence, applied to the environmental health risk early warning system based on data fusion and artificial intelligence, includes:
Step S1: Acquire and integrate pollution source emission data, environmental media monitoring data, population activity and exposure data, and health effect data in real time through the multi-source data acquisition and fusion platform;
Step S2: Using an artificial intelligence analysis engine, sequentially perform pollution source analysis, environmental media transport simulation, human exposure dose calculation, and health effect assessment on the integrated data to construct a full-chain correlation model of "pollution source-environmental media-human exposure-health effect";
Step S3: Based on the aforementioned correlation model, use a causal inference algorithm to identify the causal relationship between pollutant exposure and health effects, and quantify the causal effects;
Step S4: Use machine learning prediction models to predict changes in environmental quality and health risks to the population under different future pollution scenarios;
Step S5: Based on biomarker monitoring data, establish a multi-level early warning mechanism to achieve multi-level and accurate early warning of health risks related to pollutant exposure;
Step S6: Quantitatively assess the environmental quality improvement effects and health benefits of different emission reduction measures, and generate tiered early warning information and targeted intervention recommendations to serve environmental management and public health decision-making.

[0022] Furthermore, step S1 performs spatiotemporal fusion through the spatiotemporal matching and interpolation submodule, where the weight calculation formula is:

[0023] Where d_i is the spatial distance, t_i is the temporal distance, w_i is the data quality coefficient, and p and q are the spatial and temporal weighting parameters, respectively. The default values are p = 2 and q = 1. Two boundary cases are handled separately (the conditions themselves are not reproduced in this text): in the first, the observations from that data source are used directly; in the second, the measured data at that moment are used directly. The fusion process takes into account the accuracy, reliability, and representativeness of the different data sources, and determines the optimal weight combination through cross-validation.

The AI analysis engine calculates an individual's total exposure using the human exposure assessment submodule. The calculation formula is as follows:

[0024] Where E represents the total exposure, and the terms for each microenvironment i are the pollutant concentration in microenvironment i, the time spent in microenvironment i, and the activity intensity coefficient of microenvironment i, where rest = 1.0, light activity = 1.2, moderate activity = 1.5, and heavy activity = 2.0.
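Neither formula is reproduced in this text, so the sketch below fills the gap with assumptions rather than the patent's exact expressions. For fusion it assumes normalized inverse-distance weights proportional to w_i / (d_i^p · t_i^q), with a source at zero spatial or temporal distance used directly. For exposure it assumes E is the sum over microenvironments of concentration × time × activity coefficient. Function names and example values are hypothetical; only the defaults p = 2, q = 1 and the four activity coefficients come from the text.

```python
from typing import List, Sequence, Tuple

# Activity intensity coefficients as listed in paragraph [0024].
ACTIVITY_COEFF = {"rest": 1.0, "light": 1.2, "moderate": 1.5, "heavy": 2.0}


def fusion_weights(d: Sequence[float], t: Sequence[float], w: Sequence[float],
                   p: float = 2.0, q: float = 1.0) -> List[float]:
    """Normalized weights, ASSUMING the form w_i / (d_i**p * t_i**q)."""
    if not d or not (len(d) == len(t) == len(w)):
        raise ValueError("d, t and w must be non-empty and of equal length")
    # Assumed boundary handling: a source at zero spatial or temporal distance
    # is used directly (it would otherwise divide by zero).
    exact = [i for i in range(len(d)) if d[i] == 0 or t[i] == 0]
    if exact:
        return [1.0 / len(exact) if i in exact else 0.0 for i in range(len(d))]
    raw = [w[i] / (d[i] ** p * t[i] ** q) for i in range(len(d))]
    total = sum(raw)
    return [r / total for r in raw]


def fuse(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted combination of observations from several sources."""
    return sum(v * wt for v, wt in zip(values, weights))


def total_exposure(microenvironments: Sequence[Tuple[float, float, str]]) -> float:
    """ASSUMED form: E = sum_i concentration_i * time_i * activity_coefficient_i."""
    return sum(conc * duration * ACTIVITY_COEFF[activity]
               for conc, duration, activity in microenvironments)


# Two hypothetical monitoring sources at spatial distances 1 and 2, equal quality.
weights = fusion_weights(d=[1.0, 2.0], t=[1.0, 1.0], w=[1.0, 1.0])
fused_value = fuse([10.0, 20.0], weights)

# Hypothetical day: 8 time units at rest indoors, 2 at moderate activity outdoors.
exposure = total_exposure([(35.0, 8.0, "rest"), (50.0, 2.0, "moderate")])
print(weights, fused_value, exposure)
```

With p = 2 the nearer source dominates the fused value, which matches the intent of giving spatial proximity more influence than temporal proximity under the stated defaults.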

[0025] A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the environmental health risk early warning method based on data fusion and artificial intelligence.

[0026] Compared with the prior art, the beneficial effects of the present invention are:
(1) Full-chain causal quantification: combining causal graphs with deep learning captures the complete path from pollution to health in a single model, reducing effect estimation error compared with traditional statistical methods.
(2) Efficient fusion of multi-source data: the spatiotemporal graph neural network automatically registers heterogeneous data from "hour-meter level" to "decade-region level" onto a unified grid, which greatly reduces fusion delay and reduces the fusion error by 42%.
(3) Cross-media individualized exposure: multiple media (air, water, soil, and food) and three routes of exposure (inhalation, ingestion, and dermal contact) are calculated simultaneously, and the individual dose is then automatically adjusted according to genes, age, and disease, which significantly improves the accuracy of exposure assessment.
(4) Multi-level biomarker early warning: the combination of four-level thresholds and multiple biomarkers can significantly advance the early warning time of health events such as kidney damage, responding faster than clinical indicators.
(5) Regional adaptive collaboration: the built-in "topography-climate-economy" parameter mapping library can be used in basins, coastal areas, and agricultural areas; federated learning enables the environmental protection, health, and emergency response systems to link efficiently and effectively reduce health losses from heavy pollution.

Attached Figure Description

[0027] Figure 1 is a diagram illustrating the overall architecture of the environmental health risk early warning system based on data fusion and artificial intelligence, as described in this invention; Figure 2 is a flowchart of the multi-source data acquisition and preprocessing module; Figure 3 is a structural diagram of the pollution source-environmental medium-human exposure-health effect correlation analysis module; Figure 4 is the algorithm flowchart for the causal inference and prediction module; Figure 5 is the calculation flowchart for the dynamic health risk assessment module; Figure 6 is a functional structure diagram of the early warning and decision support module; Figure 7 is a flowchart of the regional adaptation mechanism.

Detailed Implementation

[0028] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0029] As shown in Figure 1, the environmental health risk early warning system based on data fusion and artificial intelligence of this invention adopts a three-layer architecture design, including a data layer, an algorithm layer, and an application layer. The overall workflow of the system is as follows: First, various environmental and health data are acquired and integrated through a multi-source data acquisition and fusion platform; then, an artificial intelligence analysis engine performs spatiotemporal correlation analysis, pollutant exposure path tracking, causal inference of health risks, and biomarker identification and verification; finally, a multi-level early warning and decision support system provides regional environmental risk assessment, population health risk stratification, and environmental health policy decision support (this system is not used for disease diagnosis or treatment, but only provides environmental health risk reference).

[0030] Specifically, it includes the following main modules and sub-modules:
A multi-source data acquisition and fusion platform, which includes a multi-source data automatic acquisition submodule, a data cleaning and standardization submodule, a medical and health data processing submodule, and a spatiotemporal matching and interpolation submodule. It is used to acquire and integrate pollution source data, environmental media data, and population health data.
A pollution source-environmental medium-human exposure-health effect correlation analysis module, which includes a pollution source analysis submodule, an environmental medium transport model submodule, a human exposure assessment submodule, and a health effect assessment submodule. These are used to perform spatiotemporal correlation analysis, pollutant exposure path tracing, causal inference of health risks, and biomarker identification and verification on the multi-source data.
A causal inference and prediction module, which uses causal inference algorithms and machine learning prediction models to identify the causal relationship between pollutant exposure and health effects and to predict future health risks.
A dynamic health risk assessment module, which includes a time-series risk assessment submodule and a spatial risk distribution assessment submodule. It receives input data from the human exposure assessment submodule and the health effect assessment submodule, analyzes the temporal variation pattern of health risks through the time-series risk assessment submodule, and analyzes the spatial distribution characteristics of health risks through the spatial risk distribution assessment submodule. Finally, it integrates the risk assessment results in both the time and space dimensions to generate a dynamic risk assessment report.
A multi-level early warning and decision support module, which, based on risk assessment and prediction results, provides intelligent early warning and decision support functions to help environmental management departments formulate scientific prevention and control measures.
An emission reduction benefit assessment module, which includes an environmental quality improvement assessment sub-module and a health benefit assessment sub-module. It assesses the contribution of different emission reduction measures to improving environmental quality and enhancing public health, providing a quantitative basis for environmental management decisions.

[0031] Furthermore, it also includes: an organ system-specific damage biomarker atlas module, used to construct biomarker atlases covering eight major organ systems: kidney, liver, respiratory, nervous, cardiovascular, immune, endocrine, and reproductive, to achieve accurate identification and early warning of multi-organ system damage caused by environmental pollutants; and an environmental / health knowledge reasoning and integration module, used to extract, integrate, and apply professional knowledge in the field of environmental health based on large language model technology to support data analysis and decision-making. Additionally, it includes the following mechanism modules: a regional adaptability mechanism module, used to automatically adjust model parameters, identify key exposure pathways, optimize health risk assessments, and customize early warning thresholds based on the geographical, climatic, hydrological, population, and economic characteristics of different regions; a multi-system collaboration mechanism module, used to achieve data sharing and collaborative work with existing environmental monitoring systems, public health monitoring systems, and emergency response systems; and a system iteration and update mechanism module, used to support continuous optimization based on new scientific discoveries, accumulated monitoring data, and user feedback.

[0032] Figures 2 to 7 illustrate the internal structure and workflow of each functional module. The system construction process is further explained below in conjunction with the above method steps.

I. Multi-source data acquisition and preprocessing module

This module aims to address the challenges of intelligent acquisition and standardized processing of multi-source heterogeneous data in the field of environmental health. Through a unified data processing workflow, it achieves automatic conversion from raw data to standardized data. The module includes sub-modules for automatic acquisition of multi-source data, data cleaning and standardization, medical and health data processing, and spatiotemporal matching and interpolation.

[0033] The actual operation flow of this module is as follows: First, the system automatically obtains raw data from various data sources through configured API interfaces or intelligent web crawlers; second, the system automatically performs data cleaning and standardization processing, including outlier detection, missing value imputation, and unit conversion; then, special privacy protection and anonymization processing is performed on medical and health data; finally, spatiotemporal matching and interpolation algorithms are used to solve the mismatch problem between different data sources in terms of time and space dimensions. In case of anomalies, the system will mark them and remind the user to conduct manual review.

[0034] 1.1 Multi-Source Data Automatic Acquisition Submodule: This submodule uses a combination of API interfaces and intelligent web crawlers to automatically acquire multi-source environmental health data. The core parameters of the intelligent web crawler are configured as follows: request frequency limited to 5-10 times / minute (adaptively adjusted according to the target website's rules), timeout set to 30 seconds, maximum retries of 3, randomized User-Agent pool size of 15, proxy IP rotation threshold of 50 requests, and exponential backoff strategy for handling request failures. The API interface supports three authentication methods: OAuth 2.0, Basic Auth, and API Key. Data transmission uses HTTPS protocol to ensure security. The interface access frequency is limited to 100 times / hour, and response formats support JSON, XML, and CSV.
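The retry behavior described above (30-second timeout, at most 3 retries, a pool of 15 User-Agent strings, exponential backoff) might look roughly like the following sketch. The base delay of 1 second and the doubling factor are assumptions, since the text gives no backoff constants. The User-Agent strings are placeholders, and rate limiting and proxy rotation are left out for brevity.

```python
import random
import time
import urllib.request
from typing import Callable, Dict, List

TIMEOUT_S = 30        # timeout from the text
MAX_RETRIES = 3       # maximum retries from the text
USER_AGENTS = [f"example-crawler/{i}" for i in range(15)]  # placeholder pool of 15


def backoff_delays(max_retries: int = MAX_RETRIES, base: float = 1.0,
                   factor: float = 2.0) -> List[float]:
    """Delay before each retry; base and factor are assumed values."""
    return [base * factor ** k for k in range(max_retries)]


def urllib_fetch(url: str, headers: Dict[str, str], timeout: float) -> bytes:
    """Plain HTTPS GET using the standard library."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def fetch_with_retry(fetch: Callable[[str, Dict[str, str], float], bytes],
                     url: str,
                     sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Try once, then retry up to MAX_RETRIES times with exponential backoff."""
    delays = backoff_delays()
    last_error: Exception = OSError("no attempt made")
    for attempt in range(MAX_RETRIES + 1):
        headers = {"User-Agent": random.choice(USER_AGENTS)}  # randomized per attempt
        try:
            return fetch(url, headers, TIMEOUT_S)
        except OSError as err:  # urllib.error.URLError and socket timeouts are OSErrors
            last_error = err
            if attempt < MAX_RETRIES:
                sleep(delays[attempt])
    raise last_error
```

A caller would pass `urllib_fetch` as the `fetch` argument for real requests. Injecting `fetch` and `sleep` keeps the retry logic testable without network access.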

[0035] 1.2 Data Cleaning and Standardization Submodule: This submodule implements automated data quality control and standardization processing. The specific processing flow includes: (a) Outlier detection: A multi-level detection strategy is adopted. First, a fixed threshold method is used to detect extreme anomalies (the threshold is the upper and lower limits of the reasonable range of contaminants set by domain experts). Then, the Z-score method is used to detect moderate anomalies (the threshold is set to μ±3σ). Finally, the IQR method is used to detect minor anomalies (the threshold is Q1-1.5IQR to Q3+1.5IQR). The detection results are divided into three categories: "confirmed anomaly" (to be deleted), "suspected anomaly" (to be manually reviewed), and "normal". (b) Missing value handling: The system automatically selects the optimal imputation method based on data type, missing pattern and time series characteristics. Short-term missing data (≤3 time points) is imputed by linear interpolation, medium-term missing data (4-10 time points) is imputed by ARIMA model, and long-term missing data (>10 time points) is imputed by similar day pattern. The imputation quality is evaluated by cross-validation. RMSE < 20% of the standard deviation of the original data is considered acceptable. (c) Unit conversion and standardization: The system has built-in standard unit definitions for 156 environmental parameters and 83 health indicators, supports automatic conversion between 27 common units, and uses two methods for standardization: z-score and min-max. Users can choose according to their subsequent analysis needs. (d) Time standardization: Convert all time data to ISO 8601 format (YYYY-MM-DDThh:mm:ss±hh:mm), and set appropriate time granularity (hourly, daily, weekly, or monthly) according to data type. 
(e) Spatial standardization: All spatial data are uniformly converted into the WGS84 coordinate system, supporting three representation methods: administrative division code, grid code, and latitude and longitude.
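The multi-level outlier detection in item (a) can be illustrated with a short sketch. The mapping from detection level to result category is an assumption made here for illustration: values outside the expert-set fixed limits are labelled "confirmed anomaly", and values flagged only by the μ±3σ or IQR rule are labelled "suspected anomaly". The function name `classify_outliers` and the example limits are hypothetical.

```python
import statistics


def classify_outliers(values, lower, upper):
    """Label each value using fixed limits, then z-score, then IQR fences.

    lower/upper stand in for the expert-defined reasonable range.
    """
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    labels = []
    for v in values:
        if v < lower or v > upper:
            labels.append("confirmed anomaly")   # extreme: outside fixed limits
        elif sigma > 0 and abs(v - mu) > 3 * sigma:
            labels.append("suspected anomaly")   # moderate: beyond mu +/- 3 sigma
        elif v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr:
            labels.append("suspected anomaly")   # minor: outside IQR fences
        else:
            labels.append("normal")
    return labels


readings = [10, 11, 12, 11, 10, 12, 11, 10, 12, 11, 11, 30, 500]
labels = classify_outliers(readings, lower=0, upper=200)
```

Note that the mean and standard deviation here are computed on the raw series, so a single extreme value inflates σ; a more careful implementation might recompute the statistics after removing confirmed anomalies.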

[0036] 1.3 Medical and Health Data Processing Submodule: This submodule implements the secure acquisition and standardized processing of medical and health data, with a particular focus on data privacy protection. Specific implementation includes: (a) Data interface construction: The system supports secure connection with the Hospital Information System (HIS) and Electronic Medical Record System (EMR), adopts HL7 FHIR (version R4), DICOM 3.0 and CDA Release 2 standard protocols, the interface encryption adopts TLS1.3, supports two-way authentication, the single data transmission volume is limited to 5MB, and automatic batch processing is performed when the limit is exceeded; (b) Privacy protection and desensitization: The system adopts a multi-layered protection strategy. First, 17 types of direct identifiers are removed (in compliance with HIPAA privacy rules). Then, k-anonymization technology (k=5) is used to process the identifiers. Differential privacy technology (ε=0.1) is used for highly sensitive data. Anonymous IDs are generated using the SHA-256 hash algorithm and salted. The salt value is 32 bytes long and is changed regularly. (c) Health data standardization: The system has established a standard vocabulary (based on LOINC coding) covering 2,500 common clinical test items. For non-standard codes of different medical institutions, a machine learning-assisted mapping method is used (accuracy >95%). Standard reference ranges are defined for each test item (stratified by age and gender). Outliers are represented by standard scores (z-score). (d) Biomarker Data Processing: The system supports data processing for 157 environmentally relevant biomarkers, including sample pretreatment, quality control, and data correction. Pollutant exposure index correction uses either urinary creatinine correction (urine sample) or fat content correction (blood sample), with the following correction formula:

[0037] In this formula, the terms denote the average correction factor level of the reference population, the corrected data, and the data before correction.
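The salted SHA-256 anonymous ID described in item (b) of paragraph [0036] can be sketched as below. This is a minimal illustration under stated assumptions: the function names `new_salt` and `anonymous_id` are hypothetical, and concatenating the salt before the identifier is one possible construction; the text specifies only the hash algorithm, the 32-byte salt length, and periodic salt rotation.

```python
import hashlib
import os


def new_salt(n_bytes=32):
    """Generate a random salt; 32 bytes matches the length given in the text."""
    return os.urandom(n_bytes)


def anonymous_id(patient_id, salt):
    """Return a hex pseudonym: SHA-256 over salt followed by the identifier."""
    return hashlib.sha256(salt + patient_id.encode("utf-8")).hexdigest()
```

With a fixed salt the same patient always maps to the same pseudonym, which allows record linkage within one salt period; rotating the salt breaks linkage across periods.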

[0038] 1.4 Spatiotemporal Matching and Interpolation Submodule: This submodule addresses the mismatch between different data sources in terms of time and space dimensions. Specific implementation methods include: (a) Spatial Interpolation: The system implements three spatial interpolation algorithms, including Inverse Distance Weighted (IDW), Ordinary Kriging (OK), and Universal Kriging (UK). The distance power parameter p in the IDW algorithm can be adjusted within the range of 1-3, with a default value of 2. The semi-variogram model of the Kriging method supports spherical, exponential, and Gaussian models, and the model parameters are automatically optimized through cross-validation. The default grid resolution for environmental data spatial interpolation is 1km × 1km, which can be adjusted to 100m × 100m (city scale) or 10km × 10km (regional scale) as needed. (b) Temporal Interpolation: The system supports three methods: linear interpolation, spline interpolation, and LSTM-based temporal prediction. Linear interpolation is suitable for short-term missing data (interval ≤ 6 hours), spline interpolation is suitable for medium-term missing data (interval 6-24 hours), and LSTM prediction is suitable for long-term missing data (interval > 24 hours). The LSTM model configuration is as follows: hidden layer size = 64, stacked layers = 2, sequence length = 24, batch size = 32, learning rate = 0.001, and optimizer is Adam. (c) Spatiotemporal fusion: The system adopts a multi-source data fusion algorithm based on spatiotemporal weights. The weight calculation formula is as follows:

[0039] Where $d_i$ is the spatial distance, $t_i$ is the time distance, and $w_i$ is the data quality coefficient; p and q are the spatial and temporal weighting parameters, respectively, with default values p=2 and q=1. When $d_i = 0$, the observations from that data source are used directly; when $t_i = 0$, the measured data at that moment are used directly. The fusion process takes into account the accuracy, reliability and representativeness of different data sources, and determines the optimal weight combination through cross-validation; (d) Spatiotemporal registration: The system establishes a unified spatiotemporal reference framework. The spatial reference adopts a two-layer structure that combines equally spaced grids (100m in urban areas and 1km in suburban areas) with administrative divisions. The temporal reference adopts a multi-scale structure (hour-day-week-month-quarter-year) to achieve seamless conversion and aggregation of data at different scales.
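The Inverse Distance Weighted (IDW) interpolation named in 1.4(a), with the default power p = 2, can be sketched as follows. This is a generic textbook IDW in planar coordinates, offered as an illustration only; the function name `idw` and the point format are assumptions, and the spatiotemporal fusion weights of 1.4(c) are not reproduced here.

```python
import math


def idw(points, target, p=2.0):
    """Interpolate a value at target from [((x, y), value), ...] using IDW.

    A monitoring point that coincides with the target is returned directly,
    which also avoids division by zero.
    """
    num = 0.0
    den = 0.0
    for (x, y), value in points:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value
        w = 1.0 / d ** p
        num += w * value
        den += w
    return num / den


stations = [((0.0, 0.0), 10.0), ((3.0, 0.0), 40.0)]
estimate = idw(stations, (1.0, 0.0))  # weights 1 and 1/4
```

A larger p makes the nearest station dominate, while p close to 1 gives a smoother surface, which is why the text allows p to be tuned within 1-3.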

[0040] II. Pollution Source-Environmental Medium-Human Exposure-Health Effect Correlation Analysis System

[0041] This system is the core of this invention, enabling full-chain correlation analysis from pollution sources to health effects. As shown in Figure 3, the system includes four core sub-modules: the pollution source apportionment sub-module, the environmental media transport model sub-module, the human exposure assessment sub-module, and the health effect assessment sub-module.

[0042] The system workflow is as follows: First, the pollution source apportionment submodule identifies and quantifies the source composition of pollutants in the environmental media; then, the environmental media transport model submodule simulates the migration and transformation processes of pollutants between various environmental media; next, the human exposure assessment submodule calculates the pollutant exposure dose for different exposure pathways; finally, the health effect assessment submodule evaluates the impact of pollutant exposure on human health. Each submodule can be used independently or formed into a complete chain for end-to-end analysis.

[0043] 2.1 Pollution Source Apportionment Submodule: This submodule implements the quantitative analysis of the sources of pollutants in environmental media. Specific implementation methods include: (a) Receptor Model Construction: The system implements three main receptor models, including the Chemical Mass Balance (CMB) model, the Positive Matrix Factorization (PMF) model, and the Effective Variance Least Squares (EV-MLR) model. The PMF model employs a multi-starting-point strategy to avoid local optima, with the number of starting points set to 20 and the convergence condition being a relative change in the objective function of less than 10^-6, with an iteration count greater than 300. A CMB model with a χ² value less than 4.0 and an R² greater than 0.8 is considered a valid solution. Model evaluation uses bootstrap analysis (repetition count = 200) to estimate the stability and uncertainty of the solution. (b) Source Profile Library Construction: The system has a built-in source profile library containing 25 major pollution source types, with each source type containing 15-35 characteristic components. The library is built from the analysis results of more than 2,000 source samples from domestic and foreign sources. Each source profile is updated periodically to reflect technological changes, with an update cycle of 2 years. Users can upload local source profile data for personalized optimization, and the system automatically performs data quality control and standardization processing. (c) Isotope source apportionment and dating: The system implements techniques for identifying and dating historical pollution sources based on the analysis of multiple isotope ratios. Specifically, it includes: Stable isotope analysis: stable isotope ratios are used to identify pollution sources.
The system is equipped with a high-precision isotope mass spectrometry analysis method, with a carbon isotope determination accuracy of ±0.1‰ and a lead isotope determination accuracy of ±0.002. Based on the isotopic characteristics of different pollution sources, a pollution source isotope fingerprint database is established, covering 20 major pollution source types, including fossil fuels, biomass combustion, and industrial emissions.

[0044] Radioisotope dating: The decay characteristics of radioactive isotopes such as ¹⁴C, ²¹⁰Pb, and ¹³⁷Cs are used to perform geochronological analysis on sediment and soil profile samples. The system employs the CRS (Constant Rate of Supply) model to calculate the ²¹⁰Pb depositional age; combined with cross-validation against the ¹³⁷Cs time marker, it can achieve a dating accuracy of ±2 to 5 years (for samples within the last 50 years).
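For orientation, the CRS age calculation can be sketched in a few lines. The text does not give the formula, so the commonly used form is assumed here: age t = (1/λ) · ln(A(0)/A(x)), where A(0) is the total unsupported ²¹⁰Pb inventory of the core and A(x) is the inventory below depth x. The function name `crs_age` and the half-life value of 22.3 years are assumptions for illustration.

```python
import math

PB210_HALF_LIFE_YEARS = 22.3  # commonly quoted value; assumed here


def crs_age(total_inventory, inventory_below_depth):
    """Age (years) of the sediment layer at a depth under the CRS model.

    Both inventories are unsupported 210Pb in the same units (e.g. Bq/m^2).
    """
    decay_constant = math.log(2) / PB210_HALF_LIFE_YEARS
    return math.log(total_inventory / inventory_below_depth) / decay_constant
```

For example, a layer below which half of the total inventory remains is one half-life old under this model.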

[0045] Historical pollution profile reconstruction: By analyzing changes in pollutant concentration and isotopic composition in vertical profiles of soil and sediments, the pollution history of a region can be reconstructed. The system supports high-resolution sampling (minimum interval of 0.5 cm), and combined with dating results, it can reconstruct a time series of pollutant concentration changes over the past 100 years, with a time resolution of 2-5 years.

[0046] Multi-source pollution overlay analysis: Based on an isotopic mixing model, the system employs the Markov Chain Monte Carlo (MCMC) method within a Bayesian framework to decompose the contribution rates of pollutants from different sources at different times. The mixing model is configured as follows: chain length = 10,000, burn-in period = 1,000, thinning factor = 5. Model convergence is evaluated using the Gelman-Rubin statistic ($\hat{R}$ < 1.1) and effective sample size (> 1000). The system can distinguish the relative contributions of existing pollution sources and historically accumulated pollution, supporting simultaneous analysis of up to eight sources.

[0047] Historical and Existing Pollution Source Integration: The system innovatively integrates historical pollution data obtained through isotope technology with existing pollution source data acquired through real-time monitoring systems to establish a time-continuous dataset of pollution source evolution. The integration employs a time-weighted smoothing transition algorithm to ensure consistency and coherence of data from different periods. Based on the integrated data, the system constructs a multi-source pollutant superposition impact assessment model, enabling a comprehensive assessment of the combined effects of long-term cumulative pollutants (such as heavy metals and persistent organic pollutants) and short-term active pollutants.

[0048] (d) Spatial Distribution Pattern Analysis: The system integrates spatial pattern recognition algorithms based on Principal Component Analysis (PCA) and Non-negative Matrix Factorization (NMF) to automatically identify the spatial distribution characteristics of pollutants and associate them with potential pollution sources. The number of principal components is determined by the proportion of variance explained (>85%) or the eigenvalue (>1), and the rank of NMF is automatically optimized through cross-validation.

[0049] 2.2 Multi-Source Pollutant Superimposed Impact Assessment Model: This module realizes the assessment of the superimposed effect of multiple pollutants under the synergistic influence of existing and historical pollution sources. Specific implementation methods include: (a) Integration of Spatiotemporally Heterogeneous Pollution Sources: The system achieves spatiotemporal integration of real-time monitoring data of existing pollution sources with historical pollution source information. For existing pollution sources, the system acquires real-time emission data from industrial, transportation, agricultural, and domestic sources through an online monitoring network, with a data frequency on the hourly level. For historical pollution sources, the system reconstructs historical pollution records through sediment / soil profile sampling and isotope analysis, with a temporal resolution on the decadal level. The system employs a hierarchical temporal model to integrate data from different time scales. The model structure is as follows:

[0050] In this formula, the terms denote the overall pollution contribution at time t, the contribution of existing sources, and the contribution of historical sources; α is the time weighting factor (0-1), and f is the time decay function of the influence of historical sources.

[0051] Historical pollution decay functions vary depending on the type of pollutant: persistent organic pollutants (POPs) are modeled using a first-order kinetic decay model, heavy metals are modeled using a constant model (assuming long-term stability), and other pollutants are modeled using appropriate models based on their physicochemical properties. The system also achieves spatial heterogeneity integration, uniformly mapping the pollution contributions from point sources, area sources, and line sources onto a regular grid (100m×100m in urban areas and 1km×1km in suburban areas). (b) Algorithm for the superposition effect of multiple pollutants: The system implements an algorithm for evaluating the superposition effect of pollutants from different sources and of different types. For concentration superposition, the system adopts a weighted superposition model:

[0052] In this formula, the terms denote the total concentration, the contribution concentration of source i, and the weighting factor of source i. For the superposition of health risks, the system distinguishes three superposition modes based on the mechanism of action. Pollutants with similar mechanisms of action are modeled using a concentration summation model:

[0053] Pollutants with different mechanisms of action but synergistic effects are modeled using an interaction model:

[0054] In this formula, the interaction coefficient is determined based on toxicological studies; a value >0 indicates a synergistic effect, and a value <0 indicates an antagonistic effect.

[0055] For pollutants with independent mechanisms of action, an independent action model is used:

[0056] In this formula, the terms denote the total risk and the risk of a single pollutant; (c) Spatiotemporal Dynamic Overlay Model: The system implements a multi-source pollutant overlay model that considers spatiotemporal dynamic characteristics. Spatially, the system adopts a geographically weighted regression (GWR) model to account for the spatial heterogeneity of pollution source impacts.

[0057] In this formula, the terms denote the pollutant concentration or health risk at location s, the explanatory variables (such as source intensity, distance, and terrain), and the location-dependent regression coefficients, which are estimated using locally weighted least squares. A bi-square kernel function is used, and the bandwidth is determined through cross-validation. In the time dimension, the system employs a time-varying coefficient model to capture the temporal dynamics of pollutant impacts.

[0058] Where t represents a point in time, and the time-varying coefficients are estimated using spline smoothing or local polynomial estimation. The system integrates the spatial and temporal dimensions to construct a spatiotemporal superposition model of the form:

[0059] The parameter estimation adopts a Bayesian hierarchical model and is solved using the MCMC algorithm; (d) Cumulative Exposure Health Impact Assessment: The system implements a method for assessing the health impacts of multiple pollutants under long-term cumulative exposure. The system employs a physiologically based toxicokinetic (PBTK) model to simulate the accumulation process of pollutants in the human body, including gastrointestinal absorption, pulmonary absorption, skin absorption, tissue distribution, metabolism, and excretion. Model parameters are based on literature and human experimental data and calibrated using a Bayesian framework. The cumulative health risk calculation uses an exposure time-weighted method.

[0060] In this formula, the terms denote the pollutant concentration in the i-th time period and the duration of that period; f is an adjustment function for individual characteristics. The system also implements a risk amplification factor for sensitive populations (children, the elderly, pregnant women, and patients with chronic diseases), adjusting the risk assessment results based on susceptibility differences.
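Several of the superposition models named in section 2.2 have common textbook forms, which the sketch below illustrates. These forms are assumptions, not the exact expressions of this system: first-order decay is taken as exp(-k·t), weighted concentration superposition as the sum of weight times source contribution, and the independent action model as 1 minus the product of (1 - individual risk). All function names are hypothetical.

```python
import math


def first_order_decay(t_years, k_per_year):
    """Remaining fraction of a historical contribution after t years (assumed exp(-k t))."""
    return math.exp(-k_per_year * t_years)


def weighted_concentration(contributions, weights):
    """Total concentration as the weighted sum of per-source contributions."""
    return sum(w * c for w, c in zip(weights, contributions))


def independent_action_risk(risks):
    """Combined risk of pollutants acting independently: 1 - prod(1 - R_i)."""
    return 1.0 - math.prod(1.0 - r for r in risks)
```

Under the independent action form, two pollutants with individual risks of 0.1 and 0.2 combine to 0.28 rather than 0.30, because the probability of being affected by both is not counted twice.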

[0061] 2.3 Environmental Media Transport Model Submodule: This submodule constructs a multi-media environmental transport model to simulate the migration and transformation processes of pollutants between various environmental media. Specific implementation methods include: (a) Atmospheric diffusion models: The system implements atmospheric diffusion models at three scales, including a near-field Gaussian plume model (50m-5km), a mid-field CALPUFF model (5-50km), and a far-field CMAQ model (>50km). The Gaussian model parameters are set as follows: grid resolution = 50m, time step = 1 hour, number of vertical layers = 10, and stability classification using the Pasquill-Gifford six-class classification method. The CALPUFF model configuration is: grid resolution = 1km, time step = 1 hour, considering topographic effects and building wakes. All models are driven by hourly meteorological data, and meteorological data interpolation uses the Barnes objective analysis method. (b) Water Transport Model: The system includes a surface water WASP model, a groundwater MT3DMS model, and a drinking water network EPANET model. WASP model configuration: river length = 500m, time step = 1 hour, considering pollutant adsorption, degradation, and sedimentation processes. MT3DMS configuration: horizontal grid resolution = 100m, vertical stratification = 5 layers, time step = 1 day, using the method of characteristics to solve the convection term. Model validation requires a correlation coefficient between simulated and measured values ​​> 0.8 and a relative error < 20%. (c) Soil Migration Model: The system employs a vertically stratified one-dimensional mass transport equation to describe the migration process of pollutants in the soil, considering processes such as adsorption, desorption, degradation, and volatilization. 
Model configuration: vertical soil stratification = 10 layers, layer thickness increases with depth (top layer 0.05m, bottom layer 0.5m), time step = 1 day, boundary conditions include surface input flux and bottom zero concentration gradient; (d) Multi-media equilibrium model: The system implements a multi-media environmental equilibrium model based on the Mackay fugacity model, dividing the environment into five compartments: atmosphere, water, soil, sediment, and organisms, describing the distribution and transport of pollutants among these compartments. The model is solved using the Runge-Kutta fourth-order method, with a time step of 1 day, a relative error tolerance of 10^-6, and a maximum number of iterations of 10,000.
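The near-field Gaussian plume model mentioned in 2.3(a) is usually written with a ground-reflection term; the sketch below evaluates that standard steady-state form at a single receptor. It is an illustration only: the function name `gaussian_plume` is hypothetical, and the dispersion parameters `sigma_y` and `sigma_z` are passed in directly, whereas in practice they would be derived from the Pasquill-Gifford stability class and the downwind distance.

```python
import math


def gaussian_plume(q, u, sigma_y, sigma_z, y, z, h):
    """Concentration at crosswind offset y and height z for a continuous point source.

    q: emission rate (g/s), u: wind speed (m/s), h: effective stack height (m),
    sigma_y / sigma_z: dispersion parameters (m) at the receptor's downwind distance.
    Returns concentration in g/m^3.
    """
    lateral = math.exp(-y ** 2 / (2 * sigma_y ** 2))
    vertical = (math.exp(-(z - h) ** 2 / (2 * sigma_z ** 2))
                + math.exp(-(z + h) ** 2 / (2 * sigma_z ** 2)))  # ground reflection
    return q / (2 * math.pi * u * sigma_y * sigma_z) * lateral * vertical
```

The expression is symmetric in y, and doubling the wind speed halves the concentration, which is a quick sanity check when wiring the model to hourly meteorological data.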

[0062] 2.4 Human Exposure Assessment Submodule: This submodule quantitatively assesses pollutant exposure doses through different exposure pathways based on pollutant concentration data in environmental media and population activity patterns. Specific implementation methods include: (a) Respiratory Exposure Assessment: The system uses a microenvironmental exposure model to estimate respiratory exposure, classifying individual activities into eight main microenvironments (outdoor, indoor residence, office, school, transportation, commercial premises, industrial premises, and others). Specific time-based activity patterns and pollutant concentrations are set for each microenvironment. Respiratory rate parameters are stratified by age, sex, and activity intensity: 0.45 m³ / h for adults at rest, 1.0 m³ / h for light activity, 1.6 m³ / h for moderate activity, and 3.2 m³ / h for heavy activity. Exposure time is estimated based on time-based activity logs or typical activity patterns. (b) Oral Exposure Assessment: The system covers three oral exposure routes: food intake, water intake, and unintentional ingestion (e.g., soil, dust). Food intake is based on dietary survey data, with adult intake of 350 g / day of cereals, 300 g / day of vegetables, 200 g / day of fruits, 100 g / day of meat, and 50 g / day of seafood. Water intake is 2 L / day (adult). The transfer of contaminants from environmental media to food is calculated using the bioaccumulation factor (BCF) or transfer factor (TF). (c) Skin Contact Exposure Assessment: The system considers three skin contact scenarios: water, soil / dust, and gaseous contaminants. Skin surface area parameters are stratified by age and sex, with a total body surface area of ​​1.8 m² (male) or 1.6 m² (female) for adults. Exposure site proportions are set as follows: head = 6.5%, trunk = 35.5%, upper limbs = 14.5%, lower limbs = 32.5%, hands = 5%, feet = 6%. 
Skin absorption coefficients are estimated based on the physicochemical properties of the contaminants (e.g., octanol-water partition coefficient, molecular weight). (d) Exposure Model Integration and Calibration: The system integrates a Monte Carlo simulation module to simulate the uncertainty and variability of exposure parameters through random sampling. The number of simulations is set to 10,000 to ensure stable result distribution. Model calibration uses biomonitoring data to back-derive exposure levels and establishes an exposure-biomarker relationship model. The calibration factor typically ranges from 0.8 to 1.2.
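The microenvironmental respiratory exposure estimate in 2.4(a) can be sketched as a sum over time segments of concentration × breathing rate × duration. The breathing rates below are the adult values quoted in the text; the function name `inhaled_dose`, the segment format, and the example concentrations are assumptions for illustration.

```python
# Adult breathing rates from the text, in m^3 per hour.
BREATHING_RATE_M3_PER_H = {"rest": 0.45, "light": 1.0, "moderate": 1.6, "heavy": 3.2}


def inhaled_dose(segments):
    """Inhaled mass over a day.

    segments: iterable of (concentration in ug/m^3, hours, activity level).
    Returns micrograms inhaled.
    """
    return sum(conc * BREATHING_RATE_M3_PER_H[activity] * hours
               for conc, hours, activity in segments)


# Hypothetical day: 8 h asleep at home, 2 h commuting, 14 h light indoor activity.
day = [(20.0, 8, "rest"), (35.0, 2, "moderate"), (15.0, 14, "light")]
daily_dose_ug = inhaled_dose(day)  # 72 + 112 + 210
```

The Monte Carlo step in item (d) would repeat this calculation with concentrations, durations, and breathing rates drawn from distributions rather than fixed values.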

[0063] 2.5 Health Effect Assessment Submodule: This submodule assesses the impact of pollutant exposure on population health based on exposure dose data and dose-response relationships. Specific implementation methods include: (a) Risk Characterization Model: The system supports three risk characterization methods, including Hazard Quotient (HQ), Carcinogenic Risk Probability (CR), and the Benchmark Dose (BMD) method. Hazard Quotient calculation uses the US EPA Reference Dose (RfD) or WHO Tolerable Daily Intake (TDI) as the toxicological reference value. Carcinogenic risk calculation employs a linear no-threshold model, using the Cancer Slope Factor (CSF) to quantify the lifetime carcinogenic risk per unit of exposure. The system includes a built-in database of RfD, TDI, and CSF values for common pollutants, including heavy metals, persistent organic pollutants, and volatile organic compounds. (b) Dose-Response Relationship Database: The system has established a dose-response relationship database containing 350 major environmental pollutants. The relationship models include four types: linear models, log-linear models, threshold models, and nonlinear curve models. Model parameters are determined based on epidemiological and toxicological studies, and each model includes a central estimate and a 95% confidence interval. Users can select the most suitable dose-response model based on the characteristics of the study area. (c) Population Susceptibility Analysis: The system implements stratified health risk analysis based on demographic characteristics. Susceptibility factors consider age (child coefficient = 1.5, elderly coefficient = 1.3), sex (female reproductive system impact coefficient = 1.2), underlying disease status (respiratory disease coefficient = 1.5, cardiovascular disease coefficient = 1.3), and genetic polymorphism (fast metabolizer coefficient = 0.8, slow metabolizer coefficient = 1.2). 
The system estimates the proportion of each susceptible group in the total population using a demographic database; (d) Time Lag Effect Assessment: The system implements various time lag effect models, including the Distributed Lag Model (DLM) and the Distributed Lag Nonlinear Model (DLNM). The lag period settings vary depending on the health endpoint: 0-7 days for acute effects (e.g., asthma attacks), 0-14 days for subacute effects (e.g., respiratory infections), and 0-365 days for chronic effects (e.g., decreased lung function). The weighting function is constructed using polynomial or spline functions, and the degrees of freedom are optimized using the AIC criterion.
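The hazard quotient and carcinogenic risk calculations in 2.5(a) follow well-known forms: HQ = dose / RfD and CR = dose × CSF. The sketch below shows both, together with a multiplicative application of the susceptibility coefficients from item (c). How the coefficients are combined is not stated in the text, so the multiplicative rule is an assumption, as are the function names and the numeric inputs (which are not real toxicological values).

```python
def hazard_quotient(dose_mg_per_kg_day, rfd_mg_per_kg_day):
    """Non-carcinogenic hazard quotient; a value above 1 indicates potential concern."""
    return dose_mg_per_kg_day / rfd_mg_per_kg_day


def cancer_risk(lifetime_dose_mg_per_kg_day, csf_per_mg_per_kg_day):
    """Lifetime excess cancer risk under the linear no-threshold assumption."""
    return lifetime_dose_mg_per_kg_day * csf_per_mg_per_kg_day


def apply_susceptibility(risk, coefficients):
    """Scale a risk by susceptibility coefficients (assumed to combine multiplicatively)."""
    for coefficient in coefficients:
        risk *= coefficient
    return risk


hq = hazard_quotient(0.0006, 0.0003)          # hypothetical dose and RfD
cr = cancer_risk(1e-4, 1.5)                   # hypothetical dose and CSF
cr_child_respiratory = apply_susceptibility(cr, [1.5, 1.5])  # child + respiratory disease
```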

[0064] III. Implementation of the Causal Inference and Prediction System

[0065] As shown in Figure 4, this system is an important innovative part of the present invention. Through advanced causal inference algorithms and machine learning prediction models, it can accurately identify the causal relationship between pollutant exposure and health effects and scientifically predict future health risks.

[0066] The system workflow is as follows: First, the causal relationship inference submodule identifies the causal relationship between pollutant exposure and health effects from observational data; then, the machine learning prediction submodule trains a prediction model based on historical data to predict health risks under future pollution scenarios; finally, causal relationship reports and risk prediction reports are generated to support environmental management and public health decision-making.

[0067] 3.1 Causal Relationship Inference Submodule: This submodule employs a hybrid causal inference method combining structural equation modeling (SEM) and Bayesian networks to address complex causal identification problems in the field of environmental health. Specific implementation methods include: (a) Bayesian Network Structure Learning: The system implements a three-stage Bayesian network learning algorithm. The first stage uses the PC algorithm to construct the initial skeleton, with a significance level α set to 0.05. Conditional independence is tested using Fisher's Z-test (for continuous variables) or G² test (for discrete variables). The second stage uses the Bayesian Information Criterion (BIC) to optimize the network structure. The BIC calculation formula is as follows: $\mathrm{BIC} = -2\,LL + k\ln(n)$, where LL is the log-likelihood value, k is the number of parameters, and n is the sample size. In the third stage, expert knowledge and time-series information are used to determine the direction of the edges, with environmental exposure variables preceding health effect variables. (b) Confounding Factor Control: The system implements an automatic identification algorithm for confounding factors based on the backdoor criterion. The algorithm first constructs a complete causal graph, then identifies all backdoor paths from the exposure variable to the outcome variable, and finally determines the minimum sufficient adjustment set. The system includes a built-in library of common confounding factors in environmental health research, including meteorological factors (temperature, humidity, air pressure), time factors (season, day of the week, holiday), demographic factors (age, sex, socioeconomic status), and behavioral factors (smoking, alcohol consumption, exercise). (c) Causal Effect Estimation: The system supports multiple causal effect estimation methods, including propensity score matching, counterfactual prediction, and instrumental variable methods. 
The propensity score model uses logistic regression (binary exposure) or a generalized additive model (continuous exposure), and the matching algorithm uses nearest neighbor matching (ratio 1:4, caliper = 0.2 × SD). The instrumental variable method uses meteorological variables (such as wind direction and pressure gradient) as instruments for environmental exposure, and estimation uses two-stage least squares. (d) Sensitivity Analysis: The system implements the E-value method to assess the potential impact of unobserved confounding factors. The E-value calculation formula is as follows: $E = RR + \sqrt{RR \times (RR - 1)}$, where RR is the risk ratio. The system also implements negative exposure control analysis, using "sham" exposure variables unrelated to health effects to validate the model's specificity.
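The two scalar formulas named in 3.1, the BIC and the E-value, are easy to check numerically. The sketch below uses their standard forms (BIC = -2·LL + k·ln n, and E = RR + sqrt(RR·(RR - 1)) for RR ≥ 1); taking the reciprocal of a protective risk ratio before applying the E-value formula is the usual convention and is an assumption here, as are the function names.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion; lower values indicate a preferred structure."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def e_value(rr):
    """E-value for an observed risk ratio (must be > 0).

    Risk ratios below 1 are inverted first, following the usual convention.
    """
    if rr < 1.0:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))
```

An observed RR of 2 gives an E-value of about 3.41: an unmeasured confounder would need associations of at least that strength with both exposure and outcome to fully explain the effect away.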

[0068] 3.2 Machine Learning Prediction Submodule: This submodule, based on historical data, constructs advanced machine learning models to predict the impact of future changes in pollution sources, environmental media, and human exposure on public health. Specific implementation methods include: (a) Feature Engineering: The system implements a feature engineering pipeline for environmental health data. Temporal feature extraction includes trend components, seasonal components, periodic components, and residual components (through STL decomposition). Spatial features include proximity environmental indicators, distance features (distance to pollution sources, main traffic arteries, water bodies, etc.), and spatial clustering features. Interaction features consider the synergistic effects between pollutants, capturing nonlinear relationships through pairwise interaction terms and high-order polynomials. Feature selection employs the recursive feature elimination method (RFE), combined with stability selection to improve the robustness of the feature set; (b) Ensemble Learning Model: The system implements a stacked ensemble framework comprising multiple base models. The first-level models include Random Forest (RF), Gradient Boosting Tree (GBT), Support Vector Machine (SVM), and Long Short-Term Memory (LSTM). Random Forest configuration: Number of trees = 500, maximum depth = 15, minimum number of leaf node samples = 5, feature sampling ratio = 0.7. Gradient Boosting Tree configuration: Number of trees = 300, learning rate = 0.05, maximum depth = 6, subsampling rate = 0.8. LSTM configuration: Hidden layer size = 128, number of stacked layers = 2, sequence length = 30, dropout rate = 0.2. The second-level model employs Elastic Net regression with mixed L1 and L2 regularization (α = 0.5, λ = 0.01). (c) Spatiotemporal Prediction Framework: The system implements a prediction framework with spatiotemporal awareness. 
The time dimension employs a multi-scale recursive strategy: short-term predictions (1-7 days) use the original time granularity, medium-term predictions (8-30 days) use aggregated time granularity, and long-term predictions (>30 days) use a trend-seasonal decomposition method. The spatial dimension uses a locally weighted regression method. The model parameters of the prediction points are influenced by neighboring points, with the influence weight inversely proportional to the distance. Spatial decay parameters are automatically determined based on spatial autocorrelation analysis. (d) Uncertainty Quantification: The system employs multiple methods to quantify the uncertainty of the prediction results. The prediction interval of the ensemble model is estimated using quantile regression (α = 0.025 and 0.975). Parameter uncertainty is assessed using Bayesian inference or bootstrap methods. Scenario uncertainty is assessed through multi-scenario simulation, considering possible changes in pollutant emissions, meteorological conditions, and population activity.
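Quantile regression, used in item (d) to obtain the 2.5% and 97.5% bounds of the prediction interval, is trained by minimizing the pinball (quantile) loss. The sketch below shows that loss and an empirical coverage check; it illustrates the principle only and is not the system's ensemble model. The function names are hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball loss for quantile level tau (0 < tau < 1).

    Under-prediction is penalized by tau, over-prediction by (1 - tau).
    """
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(y_true)


def interval_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    inside = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return inside / len(y_true)
```

For a well-calibrated 95% interval, `interval_coverage` on held-out data should be close to 0.95; a markedly lower value suggests the interval is too narrow.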

[0069] IV. Implementation of the Dynamic Health Risk Assessment System

[0070] As shown in Figure 5, this system combines the results of exposure assessment and health effect assessment with temporal dynamic characteristics to enable dynamic assessment and visualization of health risks. The system includes a time-series risk assessment submodule and a spatial risk distribution assessment submodule.

[0071] The system workflow is as follows: First, the system receives input data from the exposure assessment and health effect assessment modules; then, the time series risk assessment submodule analyzes the temporal variation pattern of health risks; simultaneously, the spatial risk distribution assessment submodule analyzes the spatial distribution characteristics of health risks; finally, the spatiotemporal risk assessment results are integrated to generate a dynamic risk assessment report.

[0072] 4.1 Time Series Risk Assessment Submodule: This submodule assesses the temporal variation characteristics of health risks based on time series analysis methods. Specific implementation methods include: (a) Risk Calculation at Multiple Time Scales: The system supports risk calculations at five time scales: daily, weekly, monthly, quarterly, and yearly. Each time scale uses a specific time window to aggregate pollutant exposure data and health effect data. Daily-scale analysis uses a 24-hour average, weekly-scale analysis uses a 7-day moving average, and monthly-scale analysis uses a 30-day moving average. Risk calculation adopts the method described in the aforementioned health effect assessment module, adjusting the exposure-effect relationship parameters according to the time scale. (b) Time Pattern Recognition: The system implements time series decomposition and pattern recognition algorithms. Seasonal-trend decomposition uses the STL method (LOESS smoothing parameters: trend window = 365, seasonal window = 11) to identify long-term trends, seasonal variations, and short-term fluctuations. Periodicity analysis uses the Fast Fourier Transform (FFT) with a significance level of α = 0.05. Change point detection uses the PELT algorithm (penalty parameter λ = 10) to identify abrupt changes in risk levels. (c) Time Lag Effect Analysis: The system implements a distributed lag nonlinear model (DLNM), simultaneously capturing nonlinear exposure-effect relationships and complex lag structures. The exposure-effect dimension uses a natural spline function (degrees of freedom = 4), and the lag-effect dimension uses a polynomial function (degree = 3). The maximum lag in days is set according to the type of health endpoint: acute effect = 7 days, subacute effect = 14 days, chronic effect = 30 days. The model is evaluated using the QAIC criterion; the optimal model is considered significant if ΔQAIC < 2. 
(d) Risk Trend Prediction: Based on historical risk time series, the system uses an ARIMA model to predict future risk trends. The model order (p, d, q) is determined through autocorrelation function (ACF) and partial autocorrelation function (PACF) analysis, or automatically optimized through grid search. The seasonal model S-ARIMA considers two seasonal cycles: weekly (s=7) and annual (s=365). The prediction period is 30 days, and the model is updated every 7 days to ensure prediction accuracy.
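
As a small illustration of the time-scale aggregation in 4.1(a) and the endpoint-specific maximum lags in 4.1(c), the following sketch builds moving averages and a lagged exposure matrix of the kind a distributed lag model takes as input. It assumes a trailing (not centered) window and a plain list of daily values; the function names are hypothetical, and the DLNM fitting itself is not shown.

```python
from statistics import mean

# Maximum lag (days) per health endpoint type, as listed in the text.
MAX_LAG_DAYS = {"acute": 7, "subacute": 14, "chronic": 30}

def trailing_moving_average(daily_values, window):
    """Trailing moving average; positions without a full window are None."""
    out = []
    for i in range(len(daily_values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(mean(daily_values[i + 1 - window : i + 1]))
    return out

def lagged_exposure_matrix(daily_values, endpoint_type):
    """Rows [x_t, x_(t-1), ..., x_(t-L)] for days with a full lag history."""
    max_lag = MAX_LAG_DAYS[endpoint_type]
    return [
        [daily_values[t - lag] for lag in range(max_lag + 1)]
        for t in range(max_lag, len(daily_values))
    ]
```

Calling `trailing_moving_average(series, 7)` gives the weekly-scale series and `trailing_moving_average(series, 30)` the monthly-scale series.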

[0073] 4.2 Spatial Risk Distribution Assessment Submodule: This submodule assesses the spatial distribution characteristics of health risks based on spatial analysis methods.

[0074] V. Application of Personalized Health Risk Early Warning and Intervention System

[0075] This system is responsible for providing individuals with accurate health risk assessments and early warnings, generating personalized health management recommendations based on individual characteristics and exposure status. The core application scenario is providing the public with personalized services regarding environmental health risks, delivered through mobile applications or online platforms.

[0076] The system workflow is as follows: First, individual users register and enter their basic personal information through the system interface; second, the system obtains the user's location and activity information, and estimates the individual's exposure level by combining it with environmental monitoring data; then, the system assesses the individual's health risk based on individual characteristics and exposure status; finally, the system generates personalized early warning information and health management suggestions, and updates them dynamically according to environmental changes and user feedback.

[0077] 5.1 Personalized Exposure Assessment Module: This module enables accurate exposure assessment based on individual characteristics and activity patterns. Specific implementation methods include: (a) Individual Activity Trajectory Acquisition: The system acquires an individual's spatiotemporal activity trajectory through the GPS positioning function of the mobile application (sampling interval = 5 minutes, positioning accuracy < 10 meters) or through manual recording by the user. The system categorizes activity locations into residences, workplaces, commuting routes, and other activity venues, with each category associated with specific activity patterns and exposure scenarios. Privacy protection measures include anonymized data storage and access control. (b) Microenvironment Exposure Model: The system establishes pollutant concentration models for 12 typical microenvironments, including outdoor environments (urban roads, parks and green spaces, industrial areas, rural areas) and indoor environments (residential buildings, offices, schools, shopping malls, restaurants, vehicles, hospitals, and entertainment venues). The relationship between indoor and outdoor concentrations is calculated using the infiltration factor method. The infiltration factor depends on building characteristics, ventilation conditions, and pollutant properties, and typically ranges from 0.3 to 0.8. Microenvironment concentration data sources include data from the nearest monitoring stations, spatial interpolation results, and preset typical values. (c) Individual Exposure Calculation: The system calculates total individual exposure based on a time-activity-location-concentration framework. The calculation formula is as follows: $E = \sum_{i} C_i \times t_i \times A_i$, where $E$ is the total exposure, $C_i$ is the pollutant concentration in microenvironment $i$, $t_i$ is the time spent in microenvironment $i$, and $A_i$ is the activity intensity coefficient for microenvironment $i$ (rest = 1.0, light activity = 1.2, moderate activity = 1.5, heavy activity = 2.0). The system considers the impact of personal protective measures (such as mask use), with the mask protection coefficient set according to type (ordinary mask = 0.3, N95 mask = 0.8). (d) Individual Physiological Parameter Adjustment: The system adjusts exposure parameters based on the physiological characteristics provided by the user (age, sex, height, weight, underlying diseases). Respiratory rate adjustment is based on the allometric scaling formula for body weight, $IR = IR_{ref} \times (BW / BW_{ref})^{b}$, where $IR$ is the respiratory rate, $BW$ is body weight, $b$ is the allometric exponent, and the subscript ref denotes reference values. Drug use affects metabolic capacity; the system adjusts the metabolic coefficient (range 0.7-1.3) based on the effects of common drugs.
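
The individual exposure calculation in 5.1(c) can be sketched as follows, assuming total exposure is the sum over microenvironments of concentration × time × activity coefficient. The text does not state how the mask protection coefficient enters the calculation; treating it as the fraction of exposure removed (multiplying by 1 minus the coefficient) is an assumption, as are the `Segment` type, the `"none"` mask entry, and the function names.

```python
from dataclasses import dataclass

# Coefficients listed in the text.
ACTIVITY_COEFF = {"rest": 1.0, "light": 1.2, "moderate": 1.5, "heavy": 2.0}
# "none" = 0.0 is added here for convenience; it is not in the source.
MASK_PROTECTION = {"none": 0.0, "ordinary": 0.3, "n95": 0.8}

@dataclass
class Segment:
    concentration: float      # pollutant concentration in the microenvironment
    hours: float              # time spent there
    activity: str = "rest"
    mask: str = "none"

def indoor_concentration(outdoor, factor):
    """Indoor level from the outdoor level and an indoor/outdoor factor."""
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be between 0 and 1")
    return outdoor * factor

def total_exposure(segments):
    """Sum of concentration x time x activity coefficient over all segments."""
    total = 0.0
    for s in segments:
        # Assumption: a mask removes the stated fraction of the exposure.
        passed = 1.0 - MASK_PROTECTION[s.mask]
        total += s.concentration * s.hours * ACTIVITY_COEFF[s.activity] * passed
    return total
```

A day split into a two-hour commute at moderate activity with an N95 mask and ten hours at rest indoors would be passed as two `Segment` entries.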

[0078] 5.2 Personalized Health Risk Assessment Module: This module achieves accurate individual health risk assessment based on individual exposure levels and personal susceptibility characteristics. Specific implementation methods include: (a) Individual Susceptibility Assessment: The system assesses individual susceptibility based on the health information provided by the user. Susceptibility factors include age (children and the elderly are more sensitive), gender (some pollutants have different effects on men and women), underlying medical conditions (respiratory diseases, cardiovascular diseases, immune system diseases, etc. increase susceptibility), and special conditions (such as pregnancy). The system uses a multifactorial scoring method to calculate a comprehensive susceptibility index, with a score range of 1-5, where 1 represents general susceptibility and 5 represents extremely high susceptibility. (b) Individual Risk Calculation: The system calculates individual health risk based on exposure level and susceptibility index. Non-carcinogenic risk calculation formula: $HQ = \dfrac{ADD \times S_{ind}}{RfD}$, where $ADD$ is the average daily exposure dose, $S_{ind}$ is the individual susceptibility factor, and $RfD$ is the reference dose.

[0079] Carcinogenic risk calculation formula: $CR = ADD \times SF$, where $SF$ is the slope factor. The system accumulates the health risks of multiple pollutants to calculate the total risk index. (c) Health Endpoint-Specific Risks: The system performs specific risk assessments for different health endpoints, including respiratory diseases (such as asthma and COPD), cardiovascular diseases (such as hypertension and arrhythmia), neurological effects (such as cognitive decline), and effects on specific populations (such as developmental delays in children). The system incorporates association models between 85 common pollutants and 42 health endpoints, allowing for the selection of appropriate risk models based on individual characteristics. (d) Dynamic Risk Trends: Based on continuously monitored exposure data, the system calculates the time trend of individual health risk. The system supports risk trend analysis at daily, weekly, and monthly time scales, displaying patterns of risk level changes and peak periods. The system also implements short-term forecasting capabilities, predicting risk changes over the next 3-5 days based on weather forecasts and historical patterns.
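
The individual risk calculation in 5.2(b) can be sketched along the lines of the conventional hazard quotient and cancer risk forms. This is a minimal sketch under stated assumptions: the susceptibility factor is taken to multiply the dose in the hazard quotient, the carcinogenic risk is taken as dose × slope factor, and the dictionary keys (`add`, `rfd`, `slope`, `susceptibility`) are hypothetical names. The mapping from the 1-5 susceptibility index to a factor is not specified in the text and is not modeled here.

```python
def hazard_quotient(add, rfd, susceptibility=1.0):
    """Non-carcinogenic risk: dose scaled by susceptibility over reference dose."""
    if rfd <= 0:
        raise ValueError("reference dose must be positive")
    return add * susceptibility / rfd

def cancer_risk(add, slope_factor):
    """Carcinogenic risk: average daily dose times slope factor."""
    return add * slope_factor

def total_risk(pollutants):
    """Accumulate risks over pollutants; returns (total HQ, total CR).

    Each pollutant is a dict with 'add' and, where applicable, 'rfd'
    and/or 'slope'. Simple summation is an assumption.
    """
    total_hq = sum(
        hazard_quotient(p["add"], p["rfd"], p.get("susceptibility", 1.0))
        for p in pollutants if "rfd" in p
    )
    total_cr = sum(
        cancer_risk(p["add"], p["slope"]) for p in pollutants if "slope" in p
    )
    return total_hq, total_cr
```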

[0080] 5.3 Early Warning and Intervention Recommendation Module: This module generates personalized early warning information and health management recommendations based on individual health risk assessment results. Specific implementation methods include: (a) Tiered Early Warning Mechanism: The system establishes a four-tiered early warning mechanism, corresponding to different risk levels. Warning Level 1 (low risk): HQ < 0.1 or CR < 10^-6, routine protection is recommended; Warning Level 2 (low-to-medium risk): 0.1 ≤ HQ < 0.5 or 10^-6 ≤ CR < 10^-5, strengthened protection is recommended; Warning Level 3 (medium-to-high risk): 0.5 ≤ HQ < 1.0 or 10^-5 ≤ CR < 10^-4, strict precautions are recommended; Warning Level 4 (high risk): HQ ≥ 1.0 or CR ≥ 10^-4, emergency precautions are recommended. Warning information is sent via mobile app push notifications, SMS, or email. (b) Intervention Library: The system has established a knowledge base containing 250 specific interventions, covering three main categories: exposure reduction, health protection, and medical intervention. Exposure reduction measures include adjusting activity time (e.g., avoiding peak pollution hours), optimizing routes (e.g., choosing low-pollution routes), and improving indoor environments (e.g., using air purifiers with a CADR value ≥ 350 m³/h). Health protection measures include the use of personal protective equipment (e.g., choosing the right type of mask), dietary adjustments (e.g., increasing the intake of antioxidant foods), and lifestyle optimization (e.g., appropriate exercise to enhance immunity). Medical intervention measures include symptom monitoring, preventative medication, and timely medical advice. (c) Personalized Recommendation Generation: The system uses a combination of rule-based and machine learning methods to generate personalized recommendations. The rule base contains 180 IF-THEN rules, mapping risk characteristics to intervention measures. 
The machine learning model predicts the acceptability and effectiveness of specific intervention measures based on user profiles and historical feedback. The recommendation algorithm employs a hybrid approach combining collaborative filtering and content filtering. The system generates no more than 5 priority-ranked specific recommendations for each user, ensuring that the recommendations are concise, clear, and highly actionable. (d) Effect Feedback and Optimization: The system implements an intervention effect evaluation and continuous optimization mechanism. Users can provide feedback on the implementation of intervention measures and changes in their health status through a mobile application. They can assess their subjective feelings using a five-point scale (1 = significant deterioration, 3 = no change, 5 = significant improvement) and record symptom changes (present / absent) using a binary choice. Based on user feedback data, the system uses a Bayesian optimization algorithm to dynamically adjust the recommendation weights of intervention measures, improving the accuracy and effectiveness of personalized recommendations.
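
The four-tier warning mechanism in 5.3(a) maps directly onto threshold checks. In the sketch below, the HQ and CR cut points are those given in the text; taking the higher of the two resulting levels when HQ and CR disagree is an assumption, since the text joins the conditions with "or" without stating a tie-break.

```python
# Recommended action per warning level, as listed in the text.
ADVICE = {
    1: "routine protection",
    2: "strengthened protection",
    3: "strict precautions",
    4: "emergency precautions",
}

def level_from_hq(hq):
    """Warning level implied by the hazard quotient alone."""
    if hq < 0.1:
        return 1
    if hq < 0.5:
        return 2
    if hq < 1.0:
        return 3
    return 4

def level_from_cr(cr):
    """Warning level implied by the carcinogenic risk alone."""
    if cr < 1e-6:
        return 1
    if cr < 1e-5:
        return 2
    if cr < 1e-4:
        return 3
    return 4

def warning_level(hq, cr):
    """Assumption: the more severe of the two levels determines the warning."""
    return max(level_from_hq(hq), level_from_cr(cr))
```

For instance, `ADVICE[warning_level(0.3, 5e-7)]` yields the Level 2 recommendation.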

[0081] VI. Implementation of a Multi-Level Biomarker Health Risk Early Warning System

[0082] This system, based on biomarker monitoring data, enables accurate early, mid-, and long-term warnings of health risks related to pollutant exposure. It has established a pollutant-specific biomarker library and, combined with multiple physiological indicators, provides tiered health risk warnings for different population groups.

[0083] The system workflow is as follows: First, the optimal combination of biomarkers for a specific pollutant is determined through the biomarker screening and validation module; then, the early warning thresholds for early, medium, and long-term health risks are set through the multi-level risk threshold determination module; next, the individual biomarker monitoring and evaluation module analyzes the individual biomarker levels and risk status; finally, the graded early warning and intervention recommendation module generates targeted risk warning information and intervention suggestions.

[0084] 6.1 Biomarker Screening and Validation Module: This module uses a combination of multi-task learning and feature selection to screen pollutant-specific biomarkers.

[0085] The specific implementation methods include: Based on a multi-task learning framework, the system jointly models biomarker responses under different pollutant exposures as associated tasks. Common features among biomarkers are extracted through a shared representation layer, while pollutant-specific responses are captured through a task-specific layer. Feature selection employs a recursive feature elimination combined with cross-validation (RFECV) method, with screening criteria including: sensitivity > 80%, specificity > 75%, and AUC > 0.85. The system also implements a biomarker validation process based on prospective cohort data, evaluating the incremental predictive value of new biomarkers by calculating the net reclassification improvement index (NRI > 0.1) and the integrated discrimination improvement index (IDI > 0.02).

[0086] 6.2 Multi-level risk threshold determination module: This module establishes a multi-timescale health risk early warning indicator system and sets corresponding early warning thresholds for different exposure stages.

[0087] The specific implementation methods include: the system determines the optimal cutoff values for each biomarker based on receiver operating characteristic (ROC) curve analysis and the Youden Index. A three-tiered threshold system is used: the first-tier threshold (early warning) is set at the 75th percentile of the reference population to identify increased subclinical exposure; the second-tier threshold (moderate warning) is set at the 90th percentile of the reference population or 1/10 of the known lowest observable adverse effect concentration (LOAEC) to identify exposure levels requiring attention; and the third-tier threshold (high warning) is set at the 95th percentile of the reference population or the biological exposure limit (BEL) to identify exposure levels requiring immediate intervention. Threshold parameters are stratified according to age (children, adults, and the elderly), gender, and special physiological states (such as pregnancy). The system uses a Bayesian adaptive method to dynamically update the threshold parameters based on continuously accumulated monitoring data, with an update cycle of once per quarter.
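
The Youden Index cutoff and the percentile-based tiers can be illustrated with a short sketch. It assumes higher biomarker values indicate higher risk, uses only the percentile branch of each tier (the LOAEC and BEL alternatives are not modeled), and treats a value equal to a threshold as reaching that tier; the function names are hypothetical.

```python
from statistics import quantiles

def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A sample is classified positive when value >= cutoff; labels are 0/1.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= c)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < c)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j

def percentile_thresholds(reference_values):
    """75th, 90th and 95th percentiles of the reference population."""
    pct = quantiles(reference_values, n=100, method="inclusive")
    return {"tier1": pct[74], "tier2": pct[89], "tier3": pct[94]}

def warning_tier(value, thresholds):
    """0 = below all thresholds, 1-3 = highest tier reached."""
    if value >= thresholds["tier3"]:
        return 3
    if value >= thresholds["tier2"]:
        return 2
    if value >= thresholds["tier1"]:
        return 1
    return 0
```

In practice the thresholds would be computed separately for each age, gender, and physiological-state stratum described above.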

[0088] 6.3 Individual Biomarker Monitoring and Assessment Module: This module assesses an individual's health risk status based on individual biological sample test results and preset thresholds. Specific implementation methods include: (a) Standardized Testing Procedures: The system specifies the standard operating procedures (SOPs) for biomarker testing to ensure the accuracy and comparability of results. Blood samples must be collected after fasting (>8 hours), using EDTA anticoagulant tubes, and stored at room temperature for <2 hours. Urine samples must be collected using first morning urine or random urine, using sterile containers, and stored at 4°C for <24 hours. Sample pretreatment steps, analytical methods, and quality control requirements are all detailed in the SOPs to ensure comparability of results across different testing batches and institutions. (b) Result Standardization: The system implements a standardized processing flow for test results. For biomarkers in urine samples, urine creatinine correction is used to eliminate the effect of urine concentration dilution. The correction formula is:

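[0089] $C_{corrected} = C_{measured} \times \dfrac{C_{creatinine\_reference}}{C_{creatinine}}$, where $C_{measured}$ is the measured biomarker concentration in urine, $C_{creatinine}$ is the measured urinary creatinine concentration of the sample, and $C_{creatinine\_reference}$ is the reference population median urinary creatinine level (1.2 g/L). For persistent organic pollutants in blood samples, lipid correction is applied using the following formula: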

[0090] $C_{lipid} = \dfrac{C_{measured}}{TL}$, where $TL$ represents total blood lipid content (g/L). (c) Comprehensive Risk Scoring: The system implements a comprehensive risk scoring algorithm based on multiple biomarkers. The standardized score (Z-score) calculation formula is as follows: $Z = \dfrac{X - \mu}{\sigma}$, where $X$ represents the individual biomarker level, and $\mu$ and $\sigma$ are the mean and standard deviation of the reference population. The comprehensive risk score is calculated as follows:

[0091] $RiskScore = \sum_{i} w_i \times Z_i$, where $Z_i$ is the standardized score of biomarker $i$ and $w_i$ is the weight of biomarker $i$, determined based on its diagnostic value. A risk score <1.0 is considered normal, 1.0-2.0 is considered mild risk, 2.0-3.0 is considered moderate risk, and >3.0 is considered severe risk. (d) Time Trend Analysis: The system implements time trend analysis of individual biomarker levels. For continuous monitoring data, the system calculates the relative rate of change:

[0092] $\Delta = \dfrac{C_t - C_{t-1}}{C_{t-1}} \times 100\%$, where $C_t$ and $C_{t-1}$ are the current and previous measurements. A change rate >20% is considered significant. The system also uses linear regression to analyze long-term trends, and the significance of the slope is assessed using a t-test (p<0.05). For rapidly changing biomarkers, the system considers the half-life, uses an exponential decay model to predict the theoretical change curve, and compares it with actual observations to assess exposure status.
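
The result standardization, scoring, and trend steps of 6.3(b) to 6.3(d) can be sketched together. This is a minimal sketch under several assumptions: creatinine correction scales the measured value by the ratio of the reference creatinine level (1.2 g/L) to the sample's creatinine level; the comprehensive score is a weighted sum of Z-scores; band boundaries at exactly 2.0 are assigned to the higher band; and the 20% criterion is applied to the absolute relative change. All function names are hypothetical.

```python
CREATININE_REF_G_PER_L = 1.2  # reference population median, from the text

def creatinine_corrected(conc, creatinine_g_per_l, ref=CREATININE_REF_G_PER_L):
    """Scale a urinary concentration to the reference creatinine level."""
    return conc * ref / creatinine_g_per_l

def lipid_corrected(conc, total_lipid_g_per_l):
    """Express a blood concentration per unit of total blood lipid."""
    return conc / total_lipid_g_per_l

def z_score(x, mu, sigma):
    """Standardized score against the reference population."""
    return (x - mu) / sigma

def composite_score(z_scores, weights):
    """Weighted sum of biomarker Z-scores (weighting scheme assumed)."""
    return sum(w * z for w, z in zip(weights, z_scores))

def risk_category(score):
    """Bands from the text; the handling of exact boundaries is assumed."""
    if score < 1.0:
        return "normal"
    if score < 2.0:
        return "mild"
    if score <= 3.0:
        return "moderate"
    return "severe"

def relative_change(previous, current):
    """Relative rate of change between consecutive measurements."""
    return (current - previous) / previous

def is_significant_change(previous, current, threshold=0.20):
    """True when the absolute relative change exceeds the threshold."""
    return abs(relative_change(previous, current)) > threshold
```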

[0093] 6.4 Tiered Early Warning and Intervention Recommendation Module: This module generates tiered early warning information and targeted intervention recommendations based on biomarker assessment results.

[0094] The specific implementation methods include: Based on the three-level threshold system described in Section 6.2, the system compares individual biomarker monitoring results with corresponding thresholds to automatically determine the warning level. Warning information generation employs a strategy combining template-based and personalized approaches: the template library contains 120 standardized warning information templates for damage to various organ systems, while the personalized portion is adjusted based on individual exposure characteristics, susceptibility factors, and historical monitoring data. Intervention recommendations utilize a decision tree-based rule engine, matching the most suitable intervention plan from the intervention measure library based on the warning level, pollutant type, damaged organ system, and individual characteristics, recommending no more than five priority-ranked specific measures each time. The system supports interface integration with Clinical Decision Support Systems (CDSS), and can automatically generate referral suggestions in high-alert situations (this system does not perform disease diagnosis, but only provides environmental health risk references).

[0095] VII. Implementation of Emerging Pollutant Monitoring and Assessment System

[0096] This system enables comprehensive environmental monitoring, health effect assessment, source apportionment and tracing, and optimization of control strategies for emerging pollutants such as microplastics and nanoplastics, pharmaceutical and personal care product (PPCP) residues, per- and polyfluoroalkyl substances (PFAS), flame retardants, nanomaterials, endocrine disruptors, and novel persistent organic pollutants.

[0097] The system workflow is as follows: First, the emerging pollutant classification and feature library module systematically classifies and describes the characteristics of target pollutants; then, the multi-media monitoring method module uses specific analysis techniques to detect emerging pollutants in various environmental media; next, the environmental health effect assessment module evaluates the toxicological mechanisms and health risks of emerging pollutants; further, the source apportionment and tracing module identifies the sources and environmental behaviors of emerging pollutants; finally, the control strategy assessment module evaluates the environmental health benefits of various control measures.

[0098] 7.1 Emerging Pollutant Classification and Characteristic Database: This module establishes a comprehensive classification system and characteristic database for emerging pollutants. Specifically, the system classifies emerging pollutants into 7 primary categories (microplastics, PFAS, PPCPs, flame retardants, nanomaterials, endocrine disruptors, and novel POPs) and 42 secondary categories based on chemical structure, environmental behavior, and health effects. The characteristic database records physicochemical properties (molecular weight, log Kow, water solubility, vapor pressure, etc.), environmental behavior parameters (degradation half-life, bioaccumulation factor, soil adsorption coefficient, etc.), toxicological data (known toxicity endpoints, NOAEL / LOAEL values, mechanisms of action, etc.), and detection method information (recommended analytical methods, detection limits, availability of standards, etc.) for each pollutant. Currently, it includes characteristic data for 3,200 emerging pollutants. The database supports three search methods: keyword search, structural similarity search, and toxicity analogue search.

[0099] 7.2 Multi-Media Monitoring Method Module: This module develops specific monitoring methods for emerging pollutants, applicable to various environmental media. Specifically, the system integrates an analytical method library for different categories of emerging pollutants, including: microplastic detection using μ-FTIR and Py-GC / MS, supporting particle identification with a diameter >10 μm and simultaneously analyzing 13 common polymer types; PFAS detection using ultra-high performance liquid chromatography-tandem mass spectrometry (UPLC-MS / MS), covering 32 target compounds and total extractable organic fluorine (EOF) analysis; and PPCPs detection using solid-phase extraction combined with LC-MS / MS, covering 80 target compounds in a single analysis. The system provides standard operating procedures (SOPs) for each method, including sampling specifications, pretreatment steps, instrument parameters, and quality control requirements, and includes automatic calculation functions for the method detection limit (MDL) and method quantitation limit (MQL).

[0100] 7.3 Environmental Health Effects Assessment Module: This module establishes a framework for assessing the health effects of emerging pollutants. Specifically, it employs a multi-level assessment strategy for emerging pollutants with limited toxicological data. The first level uses quantitative structure-activity relationship (QSAR) models and the read-across method to predict toxicity endpoints from chemical structure, with an accuracy rate > 75%. The second level integrates in vitro experimental data (such as ToxCast high-throughput screening data) to establish toxicity pathway analysis, covering 14 toxicity pathways including nuclear receptor activation, oxidative stress, and genotoxicity. The third level derives reference values using the benchmark dose (BMD) method based on limited in vivo experimental data and epidemiological evidence. For new pollutants lacking any toxicological data, the system uses the Threshold of Toxicological Concern (TTC) method for preliminary risk screening and determines the allowable daily exposure based on Cramer classification.
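
The TTC screening step for pollutants without toxicological data can be sketched as a lookup by Cramer class. The threshold values below are the commonly cited generic TTC values per person per day for Cramer classes I to III; they are not given in the source and are included here as an assumption for illustration, as are the function and key names.

```python
# Commonly cited generic TTC values (ug/person/day) by Cramer class.
# Assumption: these illustrative values are not stated in the source text.
TTC_UG_PER_DAY = {"I": 1800.0, "II": 540.0, "III": 90.0}

def ttc_screen(intake_ug_per_day, cramer_class):
    """Compare an estimated daily intake with the TTC for its Cramer class."""
    if cramer_class not in TTC_UG_PER_DAY:
        raise ValueError("cramer_class must be 'I', 'II' or 'III'")
    limit = TTC_UG_PER_DAY[cramer_class]
    return {
        "limit_ug_per_day": limit,
        "ratio": intake_ug_per_day / limit,
        # Exceeding the TTC flags the substance for higher-tier assessment.
        "exceeds": intake_ug_per_day > limit,
    }
```

A substance that exceeds its TTC would move on to the QSAR, in vitro, or BMD levels as data become available.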

[0101] 7.4 Source Apportionment and Tracing Module: This module implements source apportionment and tracing functions for emerging pollutants. Specifically, the system performs source apportionment based on the characteristic ratio method and multivariate statistical analysis of emerging pollutants. For PFAS, characteristic component ratios (such as PFOA / PFOS ratio, short-chain / long-chain ratio) are used to distinguish sources such as industrial emissions, fire-fighting foam use, and domestic wastewater. For microplastics, based on polymer type composition, particle size distribution, and morphological characteristics (fiber / fragment / film ratio), combined with a PMF model, sources such as textile washing, plastic packaging degradation, and tire wear are identified. The system also integrates GIS-based spatial tracing functionality, combining emission source inventories and hydrological / meteorological transport models to trace the spatial source paths of pollutants.

[0102] 7.5 Management Strategy Evaluation Module: This module evaluates the environmental health benefits of various emerging pollutant management strategies. Specifically, the system establishes a framework for evaluating the effectiveness of emerging pollutant management measures, covering three categories: source substitution, process control, and end-of-pipe treatment. The evaluation method employs scenario analysis, with 12 typical management scenarios built-in (such as a complete ban on long-chain PFAS, mandatory installation of microplastic filters, and greening of PPCP production processes). Each scenario sets emission change parameters for the baseline and target years. Benefit assessment includes three levels: environmental concentration reduction rate prediction, population exposure reduction calculation, and health risk reduction estimation. The system also supports management cost accounting, generating a cost-benefit comparison matrix to assist management departments in identifying the optimal management combination scheme.

[0103] VIII. Application of Organ System-Specific Damage Biomarker Atlas

[0104] This system establishes a comprehensive atlas of organ system-specific damage biomarkers, enabling precise identification and early warning of multi-organ system damage caused by environmental pollutants. The atlas covers eight major organ systems: kidney, liver, respiratory, nervous, cardiovascular, immune, endocrine, and reproductive, providing a scientific basis for the systematic assessment of health risks related to pollutant exposure.

[0105] The system workflow is as follows: First, the target organ systems to be monitored are identified based on the characteristics of pollutant exposure; then, organ-specific biomarker combinations are screened from the biomarker database; next, biological samples are collected and tested; finally, an organ damage assessment report and intervention recommendations are generated based on the test results. This system can serve as a further application of health risk early warning systems, providing more accurate organ damage assessments.

[0106] 8.1 Organ-Specific Biomarker Database: This module constructs a comprehensive database of organ system-specific damage biomarkers. (This system is not used for disease diagnosis or treatment, but only provides reference for environmental health risks.) Specific implementation includes: (a) Organ System Classification Framework: The system classifies human organ systems into eight major categories: renal system, hepatic system, respiratory system, nervous system, cardiovascular system, immune system, endocrine system, and reproductive system. Each system is further subdivided into four categories of biomarkers: functional, structural, early injury, and reparative. Functional biomarkers reflect the functional status of organs (e.g., creatinine and blood urea nitrogen in renal function), structural biomarkers reflect the structural integrity of organs (e.g., cardiac-specific troponin T), early injury biomarkers are used to detect damage that has not yet manifested as functional abnormalities (e.g., NAG enzyme, a biomarker of renal tubular injury), and reparative biomarkers reflect the tissue repair process (e.g., transforming growth factor-β). (b) Biomarker Characterization: The system provides standardized parameter descriptions for each biomarker, including: organ specificity index (OSI, range 0-1), with higher values ​​indicating greater specificity for a particular organ; time-dynamic characteristics (half-life, time to peak concentration, clearance rate); sample type applicability (blood, urine, breath, etc.); detection method and its sensitivity and specificity; reference range (stratified by age and sex); and contaminant association strength (association coefficient and level of evidence for a specific contaminant). For each organ system, at least 5 functional, 3 structural, 4 early, and 2 repair biomarkers are included, totaling approximately 120 core biomarkers. 
(c) Organ-Contaminant Association Matrix: The system established an association matrix between pollutants and organ system damage, covering the association strength between 85 major environmental pollutants and 8 major organ systems. Association strength was categorized into five levels: 1 = Possible association (with limited epidemiological evidence), 2 = Low association (with consistent epidemiological evidence), 3 = Moderate association (with epidemiological and experimental evidence), 4 = High association (with sufficient epidemiological, experimental, and mechanistic evidence), and 5 = Definite association (a recognized pathogenic relationship). The system also recorded information on damage mechanisms, such as oxidative stress, inflammatory responses, and mitochondrial dysfunction. (d) Combined Biomarker Strategy: An optimized combined biomarker strategy was designed for each major pollutant-organ system combination. The combined strategy included: biomarker combination members (typically 3-5 complementary biomarkers), the weight of each biomarker, the decision rule (e.g., "early damage is defined when A > threshold 1 and B > threshold 2 or C > threshold 3"), and validation performance (sensitivity, specificity, positive predictive value, negative predictive value). The combined strategy was optimized from historical data using machine learning methods, and its performance was evaluated using 10-fold cross-validation. An AUC > 0.85 was considered an efficient strategy.
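
The example decision rule and the validation metrics named in 8.1(d) can be sketched as follows. The rule is read with the usual precedence, "(A and B) or C", which is an assumption about the intended grouping; the function names are hypothetical, and the cross-validation loop itself is not shown.

```python
def early_damage(a, b, c, t1, t2, t3):
    """Rule from the text: (A > threshold 1 and B > threshold 2) or C > threshold 3."""
    return (a > t1 and b > t2) or c > t3

def validation_metrics(predicted, actual):
    """Sensitivity, specificity, PPV and NPV from paired 0/1 outcomes."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)

    def ratio(num, den):
        # Undefined metrics (empty denominator) are reported as None.
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```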

[0107] 8.2 Organ Damage Assessment Algorithm: This module implements an organ system damage assessment algorithm based on multiple biomarkers. Specific implementation methods include: (a) Single Organ Injury Score: The system uses a weighted index to calculate the injury score of a single organ system. In the calculation formula, $Z_i$ is the standardized score of marker $i$ and $w_i$ is its weight (functional markers = 1.0, structural markers = 1.2, early injury markers = 1.5, and reparative markers are weighted according to the direction of change: increase = 0.8, decrease = 1.2). The scoring range is 0-10, divided into five levels: 0-2 points for normal, 2-4 points for mild damage, 4-6 points for moderate damage, 6-8 points for severe damage, and 8-10 points for extremely severe damage. (b) Multi-Organ Injury Pattern Recognition: The system implements a multi-organ injury pattern recognition algorithm based on unsupervised learning. The algorithm uses hierarchical clustering to group the injury patterns of different organs into typical patterns. Euclidean distance is used for distance calculation, and Ward's minimum variance method is used for clustering. Typical patterns include combined liver and kidney injury, neuroendocrine injury, and combined cardiopulmonary injury. The system identifies the best-matching injury pattern by comparing the similarity between the individual's multi-organ injury distribution and the typical patterns. (c) Time Progression Prediction: Based on historical data and current status, the system predicts the time progression trend of organ damage. The prediction employs an individualized Bayesian model, fusing population data and individual-specific parameters. The prediction range is 30-90 days, and the output is a time curve of the damage score with a 95% confidence interval. Prediction accuracy is assessed using the mean absolute error (MAE), with an MAE < 1.0 considered acceptable. 
(d) Comprehensive Risk Score: The system calculates a comprehensive health risk score that considers the impact on multiple organs. In the calculation formula (not reproduced in this text), the organ system weights reflect the impact of organ damage on overall health (kidney = 0.8, liver = 0.8, lung = 0.9, nervous system = 1.0, cardiovascular system = 1.0, immune system = 0.7, endocrine system = 0.7, reproductive system = 0.7). The comprehensive score is divided into four levels: <3 points is low risk, 3-6 points is medium risk, 6-9 points is high risk, and >9 points is very high risk.
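The scoring and grading in 8.2 can be illustrated with a short sketch. The marker weights and level boundaries are taken from the text; everything else is an assumption, because the scoring formulas themselves are not reproduced here. In particular, the sketch assumes that marker scores are already rescaled to 0-10, that the organ score is a weighted mean, and that a value on a boundary falls into the higher level (except 9, which the text places in "high risk"). The function names are hypothetical.

```python
# Marker weights as listed in the text above.
MARKER_WEIGHTS = {"functional": 1.0, "structural": 1.2, "early": 1.5}
RESTORATIVE_WEIGHTS = {"increase": 0.8, "decrease": 1.2}

# (upper bound, label) pairs for the five single-organ levels.
ORGAN_LEVELS = [
    (2.0, "normal"),
    (4.0, "mild damage"),
    (6.0, "moderate damage"),
    (8.0, "severe damage"),
    (10.0, "extremely severe damage"),
]


def marker_weight(kind: str, direction: str = "") -> float:
    """Weight of one marker; restorative markers depend on direction of change."""
    if kind == "restorative":
        return RESTORATIVE_WEIGHTS[direction]
    return MARKER_WEIGHTS[kind]


def organ_score(markers):
    """markers: list of (score_0_to_10, kind, direction).

    ASSUMPTION: weighted mean of marker scores, clipped to 0-10.
    The source's actual formula is not available.
    """
    total_weight = sum(marker_weight(k, d) for _, k, d in markers)
    raw = sum(z * marker_weight(k, d) for z, k, d in markers) / total_weight
    return min(10.0, max(0.0, raw))


def organ_level(score: float) -> str:
    """ASSUMPTION: a score on a boundary (e.g. 4.0) falls in the higher level."""
    for upper, label in ORGAN_LEVELS:
        if score < upper:
            return label
    return ORGAN_LEVELS[-1][1]


def overall_risk_level(score: float) -> str:
    """Four-level grading of the comprehensive score (<3, 3-6, 6-9, >9)."""
    if score < 3:
        return "low risk"
    if score < 6:  # ASSUMPTION: exactly 6 is graded as high risk
        return "medium risk"
    if score <= 9:
        return "high risk"
    return "very high risk"


kidney = organ_score([(4.0, "functional", ""), (6.0, "early", "")])
print(round(kidney, 2), organ_level(kidney))
```

Because early markers carry the largest weight (1.5), an elevated early marker pulls the organ score up more than a functional marker with the same value.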

[0108] 8.3 Pollutant-Specific Organ Damage Spectrum: This module establishes specific organ damage spectra and monitoring strategies for major environmental pollutant categories. Specific implementation methods include: (a) Organ Damage Monitoring for Heavy Metal Exposure: The system has established specific organ damage monitoring protocols for common heavy metals such as lead (Pb), cadmium (Cd), mercury (Hg), arsenic (As), and chromium (Cr). For example, the cadmium exposure monitoring protocol includes: kidney damage monitoring (urine β2-microglobulin, urine NAG enzyme, urine retinol-binding protein, serum cystatin C), skeletal system monitoring (serum calcium, serum phosphorus, serum alkaline phosphatase, serum PTH, urinary deoxypyridinoline), and liver monitoring (ALT, AST, GGT, total bilirubin). The system automatically generates the most suitable monitoring combination based on the exposure level and duration. (b) Organ Damage Monitoring for Organic Pollutant Exposure: The system has established specific organ damage monitoring protocols for organic pollutants such as persistent organic pollutants (POPs), volatile organic compounds (VOCs), and polycyclic aromatic hydrocarbons (PAHs). For example, the benzene series exposure monitoring protocol includes: hematopoietic system monitoring (complete blood cell count, reticulocyte percentage, lymphocyte micronucleus rate), liver monitoring (complete liver function tests, liver cytokine profile), and nervous system monitoring (neurobehavioral tests, neurotransmitter metabolites). The system optimizes the monitoring combination based on the main target organs and toxic mechanisms of pollutants; (c) Monitoring of Organ Damage from Gaseous Pollutant Exposure: The system has established specific organ damage monitoring protocols for atmospheric pollutants such as NO2, SO2, O3, and PM2.5. 
For example, the PM2.5 exposure monitoring protocol includes: respiratory system monitoring (pulmonary function testing, exhaled nitric oxide, pulmonary surfactant protein D, Clara cell protein 16), cardiovascular system monitoring (high-sensitivity C-reactive protein, fibrinogen, D-dimer, myocardial enzyme profile), and oxidative stress monitoring (8-hydroxydeoxyguanosine, malondialdehyde, total antioxidant capacity). The system adjusts the monitoring focus according to the particle size distribution and chemical composition of the pollutants. (d) Organ Damage Monitoring of Emerging Pollutants: The system has established specific organ damage monitoring protocols for emerging pollutants such as perfluorinated compounds (PFAS), phthalates, and brominated flame retardants. For example, the PFAS exposure monitoring protocol includes: liver monitoring (lipid profile, ALT, bile acid profile), endocrine system monitoring (thyroid function, sex hormone levels), and immune system monitoring (lymphocyte subset analysis, cytokine profile). Because research on the health effects of emerging pollutants is relatively limited, the system employs a dynamic update mechanism to adjust the monitoring strategy based on the latest research findings.

[0109] 8.4 Personalized Organ Protection Plan: This module generates personalized organ protection intervention plans based on organ damage assessment results. The specific implementation method includes: the system matches the most suitable protection plan from an intervention knowledge base containing 180 organ protection measures based on organ damage scores and damage patterns. Intervention measures are organized in three levels: the first level is exposure source control measures (such as reducing exposure to specific pollutants, adjusting dietary sources, etc.); the second level is organ function support measures (such as increasing water intake and restricting high-sodium diets for kidney damage, and avoiding exposure to hepatotoxic substances and supplementing with hepatoprotective nutrients for liver damage, etc.); the third level is medical monitoring recommendations (such as recommending regular checkups of specific biomarkers, referring patients to relevant specialists, etc.; this system does not perform disease diagnosis or prescribe medications). Plan generation uses a rule-based reasoning engine, personalizing the plan according to the individual's age, gender, underlying disease status, and medication use. Each plan contains no more than 8 specific recommendations, ordered by urgency and feasibility.

[0110] IX. Implementation of the Emission Reduction Benefit Evaluation System

[0111] This system assesses the contribution of different emission reduction measures to improving environmental quality and public health, providing quantitative evidence for environmental management decisions. The system comprises two core components: an environmental quality improvement assessment submodule and a health benefit assessment submodule.

[0112] 9.1 Environmental Quality Improvement Assessment Submodule: This submodule assesses the effectiveness of emission reduction measures in improving environmental quality based on source-receptor relationships. Specific implementation methods include: (a) Source-Receptor Matrix Construction: The system constructs regional source-receptor relationship matrices based on atmospheric and aquatic environmental models. The atmospheric source-receptor matrix is based on sensitivity analysis of chemical transport models such as CMAQ or CAMx, with a grid-level spatial resolution (3 km × 3 km for urban areas, 9 km × 9 km for regional scales) and a seasonal-average temporal resolution. The aquatic source-receptor matrix is based on water quality models such as WASP or EFDC, with river segments of 1-5 km in length and lake/reservoir zones divided into 5-10 sub-regions. Matrix element a_ij represents the concentration contribution at receptor i of a unit emission from source j, in (μg/m³)/t or (mg/L)/t; (b) Emission Reduction Scenario Design: The system supports the design and comparison of various emission reduction scenarios. Scenario types include single-source emission reduction (reduction at a single pollution source), source-category emission reduction (reduction across a specific category of pollution source, such as the power industry or steel industry), regional emission reduction (reduction at all sources within a specific region), and composite emission reduction (a combination of multiple types). The emission reduction percentage can be set to 10%, 20%, 30%, 50%, or a user-defined value. The system also supports time-varying scenarios, such as seasonal or phased emission reductions. (c) Environmental Quality Change Prediction: The system predicts environmental quality changes based on the source-receptor matrix and emission reduction scenarios. 
The concentration reduction is calculated as $\Delta C_i = \sum_j a_{ij}\,\Delta E_j$, where $\Delta C_i$ is the decrease in concentration at receptor site $i$, $a_{ij}$ is the source-receptor matrix element, and $\Delta E_j$ is the emission reduction amount from source $j$. The system considers pollutant transformation relationships, such as the nonlinear impact of NOx and VOCs emission reductions on O3 concentration, and uses a statistical response surface model to correct the linear prediction results; (d) Environmental Quality Improvement Assessment: Based on the predicted environmental quality changes, the system assesses the degree of improvement in environmental quality. Assessment indicators include: reduction in average pollutant concentration, relative concentration reduction rate, reduction in the number of days exceeding standards, improvement in Air Quality Index (AQI) or water quality category, and changes in the proportion of areas meeting environmental quality standards. The system generates spatial distribution maps of environmental quality improvement, including contour maps, gradient color maps, and improvement rate zoning maps, which show the spatial differences in emission reduction benefits.
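The linear source-receptor prediction in 9.1(c) amounts to a matrix-vector product between the source-receptor matrix and the vector of emission cuts. The following is a minimal sketch under stated assumptions: the matrix values, emission totals, and function names are illustrative only, and the nonlinear response-surface correction mentioned in the text (for example for O3) is deliberately left out.

```python
from typing import List


def emission_cuts(emissions: List[float], percent: List[float]) -> List[float]:
    """Emission reduction per source (t) from baseline emissions and % cuts."""
    return [e * p / 100.0 for e, p in zip(emissions, percent)]


def concentration_reduction(a: List[List[float]], delta_e: List[float]) -> List[float]:
    """Concentration decrease at each receptor i: sum over sources j of a[i][j] * delta_e[j].

    a[i][j] is the concentration contribution at receptor i per unit
    emission from source j, e.g. in (ug/m3)/t.
    """
    for row in a:
        if len(row) != len(delta_e):
            raise ValueError("matrix row length must equal the number of sources")
    return [sum(a_ij * de_j for a_ij, de_j in zip(row, delta_e)) for row in a]


# Illustrative values only: 2 receptors x 2 sources, uniform 20% cut.
a = [[0.020, 0.010],
     [0.005, 0.030]]
cuts = emission_cuts([1000.0, 500.0], [20.0, 20.0])
print(concentration_reduction(a, cuts))
```

Time-varying scenarios (seasonal or phased cuts) could reuse the same function with one matrix and one cut vector per season.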

[0113] 9.2 Health Benefit Assessment Submodule: This submodule assesses the health benefits of emission reduction measures based on the results of environmental quality improvement. Specific implementation methods include: (a) Health Effect Function Library: The system has established a health effect function library containing 85 combinations of pollutants and health endpoints. Each health effect function records: pollutant type, health endpoint, population characteristics (general population or sensitive population), function type (linear, log-linear, threshold, or nonlinear), relative hazard ratio (RR) or concentration-response coefficient (β), applicable concentration range, uncertainty range (95% confidence interval), and level of evidence. Core health endpoints include: all-cause mortality, cardiovascular disease mortality, respiratory disease mortality, cardiovascular disease hospitalization, respiratory disease hospitalization, chronic bronchitis, asthma exacerbation, days of lost work, and days of restricted activity, etc. (b) Population Exposure Assessment: The system combines environmental quality change predictions with population distribution data to assess population exposure changes. The system employs a GIS-based population distribution model with a resolution of 1km × 1km grids or community / street-level administrative units. The population is divided into five age groups (0-4 years, 5-14 years, 15-44 years, 45-64 years, and ≥65 years) and two gender groups. Vulnerable populations include children, the elderly, pregnant women, and patients with underlying medical conditions, with proportions based on local epidemiological data. 
The change in population exposure is calculated as $\Delta \mathrm{Exp}_g = \sum_i \Delta C_i \cdot \mathrm{Pop}_{g,i}$, where $\Delta \mathrm{Exp}_g$ is the change in total exposure of group $g$, $\Delta C_i$ is the concentration change in grid $i$, and $\mathrm{Pop}_{g,i}$ is the population of group $g$ in grid $i$; (c) Health Impact Assessment: The system calculates the health impact of emission reduction measures based on the health effect functions and the population exposure assessment results. The number of cases avoided is calculated as $\Delta H = y_0 \cdot (1 - e^{-\beta \Delta C}) \cdot \mathrm{Pop}$, where $\Delta H$ is the number of avoided health impacts, $y_0$ is the baseline morbidity/mortality rate, $\beta$ is the concentration-response coefficient, $\Delta C$ is the concentration change, and $\mathrm{Pop}$ is the number of exposed individuals. For a linear model, this simplifies to $\Delta H = y_0 \cdot \beta \cdot \Delta C \cdot \mathrm{Pop}$. The system considers the health effects of multi-pollutant exposure, employs additive or multiplicative models for comprehensive assessment, and quantifies the uncertainty (95% confidence interval) using Monte Carlo simulation. (d) Economic Value Assessment: The system quantifies the economic value of health benefits. Assessment methods include the Cost of Illness (COI), Willingness to Pay (WTP), and Human Capital approaches. The economic unit values of the primary health endpoints include: Value of a Statistical Life (VSL) of RMB 7.6 million / case (adjusted to local income level), asthma emergency room visit of RMB 2,500 / case, chronic bronchitis of RMB 80,000 / case / day, hospitalization of RMB 8,000 / case / day, and lost workday of RMB 400 / day. The system accounts for inflation and time discounting (discount rate = 3%) to generate annual and cumulative economic benefit estimates.
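The health impact step in 9.2(c) can be sketched as follows. The code assumes the widely used log-linear health impact function (baseline rate times an exponential concentration-response term times exposed population) together with its linear simplification; whether this matches the source's exact formula is an assumption. The Monte Carlo part samples only the coefficient β from a normal distribution, which is a simplification of the multi-parameter simulation the text describes. All numeric inputs and function names are illustrative.

```python
import math
import random


def cases_avoided(y0: float, beta: float, delta_c: float, pop: float,
                  linear: bool = False) -> float:
    """Avoided cases for a concentration decrease delta_c.

    y0: baseline incidence or mortality rate (cases per person per year)
    beta: concentration-response coefficient (per unit concentration)
    """
    if linear:
        return y0 * beta * delta_c * pop
    return y0 * (1.0 - math.exp(-beta * delta_c)) * pop


def monte_carlo_interval(y0: float, beta_mean: float, beta_sd: float,
                         delta_c: float, pop: float,
                         n: int = 5000, seed: int = 0):
    """Empirical 95% interval, sampling beta ~ Normal(beta_mean, beta_sd).

    ASSUMPTION: only beta is uncertain; y0, delta_c and pop are held fixed.
    """
    rng = random.Random(seed)
    draws = sorted(
        cases_avoided(y0, rng.gauss(beta_mean, beta_sd), delta_c, pop)
        for _ in range(n)
    )
    return draws[int(0.025 * n)], draws[int(0.975 * n) - 1]


# Illustrative inputs: 1 million people, 10 ug/m3 reduction.
point = cases_avoided(0.006, 0.0004, 10.0, 1_000_000)
low, high = monte_carlo_interval(0.006, 0.0004, 0.0001, 10.0, 1_000_000)
print(round(point, 1), round(low, 1), round(high, 1))
```

For small values of β·ΔC the log-linear and linear forms give nearly the same result, which is why the linear form is a common simplification.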

[0114] 9.3 Emission Reduction Cost-Benefit Analysis Submodule: This submodule comprehensively considers emission reduction costs and environmental health benefits, providing scientific decision support. Specific implementation methods include: (a) Emission Reduction Cost Assessment: The system establishes a cost assessment model for major emission reduction measures. Cost types include initial investment costs (equipment purchase, installation, engineering construction, etc.), operation and maintenance costs (energy consumption, material consumption, labor costs, etc.), and indirect costs (production losses, adjustment costs, etc.). Cost calculation is based on the cost per unit emission reduction (RMB / ton) multiplied by the emission reduction amount, taking into account economies of scale (the marginal cost change of large-scale emission reduction). Unit emission reduction cost data for different industries and technologies are derived from actual engineering cases and industry surveys, and are updated regularly to reflect technological progress. (b) Cost-Benefit Comparison: The system uses multiple indicators to compare the cost-benefit relationship of emission reduction measures. Core indicators include: cost-benefit ratio (emission reduction cost / health benefit), cost per unit of health benefit (emission reduction cost / number of cases avoided), net benefit (health benefit - emission reduction cost), and internal rate of return. The system supports short-term (1-3 years), medium-term (5-10 years), and long-term (>10 years) benefit assessments, considering time-discounted factors. (c) Multi-objective optimization: The system implements an emission reduction strategy optimization algorithm based on multi-objective programming. The optimization objectives include maximizing health benefits, minimizing emission reduction costs, maximizing environmental quality improvement, and maximizing social equity. 
Constraints include total emission reduction budget limits, technical feasibility limits, and time schedule requirements. The algorithm uses a genetic algorithm to solve the problem, with a population size of 100, 200 iterations, a crossover probability of 0.8, and a mutation probability of 0.1, generating optimal and suboptimal Pareto solution sets. (d) Uncertainty Analysis: The system implements uncertainty analysis functionality for emission reduction benefit assessment. Analysis methods include scenario analysis (best-case, most likely, and worst-case scenarios), sensitivity analysis (identifying factors most sensitive to parameter changes), and Monte Carlo simulation (considering the combined impact of multiple parameter uncertainties). Key uncertain parameters include the health effect coefficient (±25%), economic unit value (±30%), and emission reduction cost (±40%). Uncertainty analysis results are presented with 95% confidence intervals and probability distribution plots.
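The cost-benefit indicators in 9.3(b) and the notion of a Pareto solution set in 9.3(c) can be shown in a few lines. This sketch is not the genetic algorithm described in the text; it only computes the listed indicators and applies the non-dominance test that defines a Pareto set, restricted to two objectives (health benefit to maximize, cost to minimize). The option names and values are hypothetical.

```python
from typing import Dict, List, Tuple

Option = Tuple[float, float]  # (health benefit, reduction cost), same currency unit


def indicators(benefit: float, cost: float, cases: float) -> Dict[str, float]:
    """Core comparison indicators as defined in the text."""
    return {
        "cost_benefit_ratio": cost / benefit,   # reduction cost / health benefit
        "cost_per_case": cost / cases,          # reduction cost / cases avoided
        "net_benefit": benefit - cost,          # health benefit - reduction cost
    }


def dominates(a: Option, b: Option) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(options: Dict[str, Option]) -> List[str]:
    """Names of options that no other option dominates."""
    return sorted(
        name for name, opt in options.items()
        if not any(dominates(other, opt) for other in options.values())
    )


# Hypothetical scenarios: (benefit, cost) in million RMB.
scenarios = {"A": (100.0, 50.0), "B": (80.0, 60.0),
             "C": (120.0, 90.0), "D": (100.0, 40.0)}
print(pareto_front(scenarios))
print(indicators(100.0, 40.0, 25.0))
```

A genetic algorithm such as the one described would use the same dominance test to rank candidate strategies in each generation, with the additional objectives (environmental quality, social equity) added to the tuple.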

[0115] X. Implementation of the Early Warning and Decision Support System

[0116] Based on risk assessment and forecasting results, this system provides intelligent early warning and decision support functions to help environmental management departments formulate scientific prevention and control measures. The system consists of two core components: a multi-level early warning submodule and a decision support submodule.

[0117] The system workflow is as follows: First, the risk assessment and prediction results of the aforementioned modules are integrated; then, warning information is generated based on the risk level through the multi-level early warning sub-module; next, targeted intervention plan suggestions are provided through the decision support sub-module; finally, personalized decision support services are provided to different users.

[0118] 10.1 Multi-level early warning submodule: This submodule designs a multi-level early warning mechanism based on health risk thresholds.

[0119] Early warning classification standards: The system has established a four-level environmental health risk early warning standard.

[0120] Level IV Alert (Blue, Mild Risk): HQ = 0.1-0.5, or CR = $10^{-6}$ to $10^{-5}$, or a health risk index of 1.0-2.0; Level III Alert (Yellow, Moderate Risk): HQ = 0.5-1.0, or CR = $10^{-5}$ to $10^{-4}$, or a health risk index of 2.0-3.0; Level II Alert (Orange, Severe Risk): HQ = 1.0-3.0, or CR = $10^{-4}$ to $10^{-3}$, or a health risk index of 3.0-4.0; Level I Alert (Red, Extremely Severe Risk): HQ > 3.0, or CR > $10^{-3}$, or a health risk index > 4.0. The system takes into account the risks of mixed exposure to multiple pollutants and of sensitive populations. For the same HQ or CR value, mixed exposure to multiple pollutants raises the warning level by one level, and a sensitive population proportion > 30% raises the warning level by one level.

10.2 Decision Support Submodule: This submodule provides decision support functions based on system analysis results. Specific implementation methods include: (a) Problem Identification and Analysis: Based on the environmental health risk assessment results, the system automatically identifies key problems and challenges. Problem identification dimensions include: risk type (acute / chronic), scope of impact (local / regional), risk source (single / multiple sources), exposure pathway (single / multiple pathways), at-risk group (general / specific), and development trend (stable / deteriorating / improving). The system uses the Analytic Hierarchy Process (AHP) to quantify the severity and urgency of each problem, calculate priority scores, and determine key intervention points. (b) Intervention Program Library: The system has established a library of 650 specific intervention measures, covering five major categories: source control, process management, end-of-pipe treatment, exposure reduction, and health protection. Each intervention measure records information such as applicable conditions, technical requirements, implementation cycle, cost estimation, expected results, uncertainties, supporting measures, and case references. 
The system provides parameterized descriptions of intervention measures and supports intelligent matching and combination based on multiple attributes. The library is regularly updated to incorporate new technologies, methods, and best practices. (c) Solution Generation and Evaluation: Based on problem characteristics and an intervention solution library, the system automatically generates targeted intervention solution combinations. Solution generation employs a hybrid reasoning approach based on rules and cases. First, it matches applicable intervention types using the IF-THEN rule, and then recommends specific measure combinations based on historical successful cases. The system conducts multi-dimensional evaluations of the generated solutions, including: effectiveness (degree of reduction in health risks), cost (resource input), timeliness (time to see results), feasibility (technical and management difficulty), and fairness (degree of benefit to different population groups). The evaluation uses a weighted scoring method, and the weights can be adjusted according to the decision-maker's preferences. (d) Interactive Decision Support: The system provides an interactive decision support interface, enabling decision-makers to explore different options and hypothetical scenarios. Interface functions include: option comparison (displaying the advantages and disadvantages of multiple options side-by-side), parameter adjustment (adjusting key parameters and observing changes in results), scenario simulation (simulating the effects under different implementation scenarios), and sensitivity analysis (identifying which factors the results are most sensitive to). The system supports collaborative decision-making, allowing multiple users to view and comment on options simultaneously. The system automatically integrates opinions and updates evaluation results. The decision-making process recording function supports decision transparency and accountability.
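The four-level warning standard in 10.1 can be expressed as a small classifier. The thresholds and the two escalation rules come from the text. The following points are assumptions: the alert level is driven by the most severe of the three indicators (since the criteria are joined by "or"), lower range bounds are inclusive, escalation applies only when some alert is already triggered, and the result is capped at Level I. Function and parameter names are hypothetical.

```python
from typing import Optional, Sequence

LEVEL_NAMES = {1: "IV", 2: "III", 3: "II", 4: "I"}  # severity -> alert level

# Lower bounds of Levels IV, III, II and the strict bound for Level I.
HQ_CUTS = (0.1, 0.5, 1.0, 3.0)
CR_CUTS = (1e-6, 1e-5, 1e-4, 1e-3)
HRI_CUTS = (1.0, 2.0, 3.0, 4.0)


def severity(value: Optional[float], cuts: Sequence[float]) -> int:
    """Severity 0 (no alert) to 4 (Level I) for one indicator."""
    if value is None:
        return 0
    # Levels IV-II use inclusive lower bounds; Level I requires value > cut.
    return sum(value >= c for c in cuts[:3]) + int(value > cuts[3])


def alert_level(hq: Optional[float] = None,
                cr: Optional[float] = None,
                hri: Optional[float] = None,
                mixed_exposure: bool = False,
                sensitive_share: float = 0.0) -> Optional[str]:
    """Return "IV", "III", "II", "I", or None when no threshold is reached."""
    base = max(severity(hq, HQ_CUTS), severity(cr, CR_CUTS), severity(hri, HRI_CUTS))
    if base == 0:
        return None  # ASSUMPTION: escalation never creates an alert by itself
    bumped = base + int(mixed_exposure) + int(sensitive_share > 0.30)
    return LEVEL_NAMES[min(bumped, 4)]


print(alert_level(hq=0.3))
print(alert_level(hq=0.3, mixed_exposure=True))
print(alert_level(hq=2.0, mixed_exposure=True, sensitive_share=0.4))
```

Keeping the thresholds in plain tuples makes it straightforward to swap in region-specific values.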

[0121] 10.3 Scenario Simulation and Emergency Drill Submodule: This submodule provides scenario simulation and emergency drill functions for environmental health emergencies. Specific implementation methods include: (a) Emergency Scenario Database: The system has established a scenario database containing 85 typical environmental health emergencies, covering common emergency types such as chemical spills, explosions, toxic gas releases, water pollution, and heavy metal contamination. Each scenario includes parameters such as event description, pollutant characteristics, release conditions, diffusion patterns, population exposure, and potential health impacts. Scenario parameters are adjustable, supporting simulations of event development under different scales and conditions; (b) Event Evolution Simulation: The system implements event evolution simulation capabilities based on a combination of physical and empirical models. Chemical spill simulation uses the SOURCE model to calculate release rates and total amounts; atmospheric diffusion simulation uses the ALOHA or CALPUFF model, considering the effects of topography, meteorology, and buildings; water pollution simulation uses the WASP model, considering hydrodynamics and water quality changes. Simulation results are presented in time series and spatial distribution formats, including pollutant concentration changes, impact range expansion, and duration predictions. (c) Health Impact and Emergency Response Simulation: Based on exposure scenarios and health risk models, the system simulates the impact of emergencies on population health. Simulation content includes: the size of the potentially exposed population, the distribution of acute health effects (such as poisoning and irritation symptoms), prediction of medical needs (emergency visits, hospitalization, and specialist treatment), and long-term health risk assessment. 
The system integrates an emergency response resource management module to simulate the implementation effects and resource requirements of different emergency measures (such as evacuation, shelter, and medical rescue), supporting emergency plan optimization. (d) Interactive Drill Function: The system provides an interactive emergency drill platform that supports multi-department collaborative drills. Drill modes include scripted (based on preset scenarios) and free-form (randomly generated emergencies). The system simulates the event's progression, participants make decisions based on the information they receive, and the system provides real-time feedback on the decision-making effectiveness. During the drill, the system records indicators such as decision-making time, resource allocation, information transmission, and collaboration efficiency. After the drill, an evaluation report is generated, identifying strengths and weaknesses in the emergency response and proposing improvement suggestions.

[0122] XI. Implementation of the Environmental Health Knowledge Reasoning and Integration System

[0123] This system, based on large language model technology, extracts, integrates, and applies professional knowledge in the field of environmental health to support data analysis and decision-making. The system covers the entire process of knowledge mining, structuring, and intelligent application, providing knowledge support for environmental health management.

[0124] The system workflow is as follows: First, environmental health expertise is extracted from scientific literature, research reports, and expert knowledge; then, unstructured knowledge is transformed into a structured knowledge graph; next, the knowledge is combined with data analysis results to conduct knowledge-driven reasoning; finally, explanations and recommendations based on expertise are generated to support environmental health decision-making.

[0125] 11.1 Environmental Health Knowledge Mining Driven by Large Language Models: This module implements the automatic mining, structuring, and application of environmental health domain knowledge based on large language models. Specific implementation methods include: (a) Intelligent Document Processing: The system implements an intelligent processing workflow for environmental health scientific literature. Text preprocessing includes format standardization, section recognition, table extraction, and image OCR. The core processing employs semantic analysis based on a large language model. The model uses a Transformer architecture pre-trained in the environmental health domain, with 20 billion (20B) parameters and a context window length of 8,192 tokens. It is fine-tuned for domain adaptation on 5 million environmental health documents. Text analysis tasks include entity recognition (pollutants, health effects, biomarkers, etc.), relation extraction (causal, correlational, and temporal relationships, etc.), and attribute extraction (concentration, dose, hazard ratio, etc.). (b) Knowledge Triple Extraction: The system employs a knowledge triple extraction method based on a large language model to extract structured knowledge in the form of (entity 1, relation, entity 2) from the text. The extraction uses a two-stage strategy: the first stage uses a pre-trained model to directly extract explicitly expressed triples, prioritizing precision; the second stage uses reasoning enhancement to mine implicit knowledge, prioritizing recall. Knowledge types include: pollutant characteristics ("lead, half-life, 30 days"), exposure-effect relationships ("PM2.5, increased risk, cardiovascular disease"), biomarker relationships ("urine 8-OHdG, indicator, oxidative stress"), mechanism relationships ("benzene, by inhibiting hematopoiesis, leading to leukopenia"), and intervention effects ("chelation therapy, reducing, blood lead levels"). 
(c) Knowledge Graph Construction: The system constructs an environmental health knowledge graph based on extracted triples. The graph contains 5 core entities (pollutants, health effects, biomarkers, mechanisms, and interventions) and 20 relationship types. The graph adopts a hierarchical architecture, including a conceptual layer (domain terminology and classification system), a factual layer (specific relationships and attributes), and a reasoning layer (rules and patterns). Graph quality control includes consistency checks (identifying and resolving conflicting knowledge), integrity assessment (identifying knowledge gaps), and reliability scoring (based on source and strength of evidence). The knowledge graph has 1.8 million entities and 3.2 million relationships, covering 5,800 environmental pollutants and 3,200 health effects. (d) Knowledge-Data Fusion Reasoning: The system implements a knowledge-data fusion reasoning mechanism. Reasoning methods include: rule-based deductive reasoning (interpreting observation results using domain knowledge), case-based analogical reasoning (finding similar historical cases), and statistically based inductive reasoning (discovering new patterns from data). The system also implements a causal chain completion function. When a correlation between A and C is observed but direct evidence is lacking, the system uses a knowledge graph to find possible intermediate nodes B, constructing a complete causal chain "A→B→C," enhancing the completeness and credibility of the explanation.

[0126] 11.2 Environmental Health Knowledge Reasoning Engine: This module, based on knowledge graphs and large language models, implements complex reasoning functions in the environmental health domain. Specific implementation methods include: (a) Multi-hop reasoning mechanism: The system implements a knowledge graph-based multi-hop reasoning algorithm, supporting multi-step logical reasoning starting from known facts and following relational paths in the knowledge graph. The maximum reasoning depth supports 5 hops, and the confidence score of each hop is calculated by weighting the evidence strength of each relation on the path. For example, when cadmium levels in the soil of a certain area are detected to be excessive, the system can automatically deduce a complete reasoning chain of "excessive cadmium in the soil → cadmium enrichment in rice → increased cadmium exposure in residents' diets → increased risk of renal tubular damage," and provide a confidence score for each step. The reasoning results are verified by an expert rule base to exclude reasoning paths that violate domain common sense.

[0127] (b) Uncertainty Reasoning: The system employs a Bayesian inference framework to handle the uncertainty of knowledge. Each knowledge relationship is labeled with a prior probability (based on evidence level: definite association = 0.95, high association = 0.85, moderate association = 0.70, low association = 0.50, possible association = 0.30). The system calculates the posterior probability based on newly input observation data using Bayesian updates. When multiple inference paths point to the same conclusion, the system uses Dempster-Shafer evidence theory to fuse multi-source evidence and calculate the overall confidence level.
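The two uncertainty mechanisms in 11.2(b) can be sketched on a two-outcome frame. The prior table is taken from the text. The Bayesian update uses the standard two-hypothesis form of Bayes' rule, and the fusion step is Dempster's rule of combination. Treating each inference path as a simple support function (mass p on the conclusion, 1 - p on "unknown") is an assumption made for illustration; the source does not say how path confidences are turned into mass functions.

```python
from itertools import product
from typing import Dict, FrozenSet

# Prior probabilities by evidence level, as listed in the text.
PRIORS = {"definite": 0.95, "high": 0.85, "moderate": 0.70,
          "low": 0.50, "possible": 0.30}

Mass = Dict[FrozenSet[str], float]
TRUE = frozenset({"true"})
UNKNOWN = frozenset({"true", "false"})  # the whole frame of discernment


def bayes_update(prior: float, p_obs_if_true: float, p_obs_if_false: float) -> float:
    """Posterior probability that a relation holds after one observation."""
    numerator = prior * p_obs_if_true
    return numerator / (numerator + (1.0 - prior) * p_obs_if_false)


def path_mass(confidence: float) -> Mass:
    """ASSUMPTION: one path = simple support for the conclusion."""
    return {TRUE: confidence, UNKNOWN: 1.0 - confidence}


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: multiply masses, intersect focal sets, renormalize."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


posterior = bayes_update(PRIORS["low"], 0.8, 0.2)
fused = dempster_combine(path_mass(0.7), path_mass(0.5))
print(round(posterior, 3), round(fused[TRUE], 3))
```

Two independent paths that each support the same conclusion yield a fused belief higher than either path alone, which is the behavior the text relies on when several inference paths agree.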

[0128] (c) Contradictory Knowledge Detection and Resolution: The system implements automatic detection of contradictory knowledge based on logical constraints. Detection rules include: antisymmetry checks (e.g., "A increases B" and "A decreases B" cannot both be true), transitivity checks (e.g., if "A causes B" and "B causes C," then "A causes C" should hold), and numerical consistency checks (quantified values reported by different sources for the same relation should not differ by more than a factor of 3). For detected contradictions, the system ranks the conflicting statements by evidence level, publication time, and sample size, adopting the conclusions of higher-level, more recent, and larger-sample studies first, while retaining the contradiction records for expert review.
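Two of the detection rules above, plus the resolution ranking, fit in a short sketch. The relation names, record fields, and function names are hypothetical; the factor-of-3 limit and the ranking order (evidence level, then publication year, then sample size) follow the text. The transitivity check is omitted for brevity.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

OPPOSITE = {"increases": "decreases", "decreases": "increases"}

Triple = Tuple[str, str, str]  # (subject, relation, object)


def direction_conflicts(triples: List[Triple]) -> List[Tuple[str, str]]:
    """(subject, object) pairs asserted with both a relation and its opposite."""
    seen: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for subject, relation, obj in triples:
        seen[(subject, obj)].add(relation)
    return sorted(
        pair for pair, relations in seen.items()
        if any(OPPOSITE.get(r) in relations for r in relations)
    )


def values_inconsistent(values: List[float], max_ratio: float = 3.0) -> bool:
    """True when positive estimates of one relation differ by more than max_ratio."""
    return max(values) / min(values) > max_ratio


def preferred(records: List[dict]) -> dict:
    """Pick the record to adopt: highest evidence level, then newest, then largest."""
    return max(records, key=lambda r: (r["evidence_level"], r["year"], r["sample_size"]))


triples = [
    ("PM2.5", "increases", "blood pressure"),
    ("PM2.5", "decreases", "blood pressure"),
    ("cadmium", "increases", "urinary NAG"),
]
print(direction_conflicts(triples))
print(values_inconsistent([1.2, 4.1]))
```

The records that lose the ranking are not discarded in the described system; they would be kept alongside the conflict entry for expert review.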

[0129] (d) Hypothesis Generation and Validation: Based on structural gaps (i.e., missing but expected relationships) in the knowledge graph, the system automatically generates scientific hypotheses to be validated. Hypothesis generation employs a graph embedding method (TransE model, embedding dimension = 200) to predict the probability of missing links in the knowledge graph. When the prediction score exceeds a threshold (>0.7), the system marks the missing relationship as a hypothesis to be validated and recommends possible validation strategies (such as suggesting monitoring specific biomarkers or conducting surveys of specific populations).

[0130] 11.3 Intelligent Report and Explanation Generation: This module generates professional and understandable reports and explanations based on data analysis results and a knowledge base. Specific implementation methods include: (a) Multi-level Report Generation: The system automatically generates three levels of reports based on the target audience. Technical reports (for professionals) include the complete data analysis process, model parameters, uncertainty analysis, and methodological details, ranging from 20 to 50 pages. Management reports (for decision-makers) primarily use charts and key conclusions, highlighting risk levels, spatiotemporal distribution, and intervention recommendations, ranging from 5 to 10 pages. Public reports (for the general public) use plain language and visual graphics, focusing on conveying risk levels and protective recommendations, ranging from 1 to 2 pages. Report generation employs a combination of template-based and large language modeling approaches. The template defines the report structure and necessary content, while the large language model is responsible for filling in and refining the text based on the analysis results.

[0131] (b) Presentation of Interpretable Analysis Results: The system provides interpretable presentation capabilities for complex AI analysis results. For causal inference results, the system generates a visualization including causal path diagrams, effect sizes, and confidence intervals. For machine learning prediction results, the system uses the SHAP (SHapley Additive exPlanations) method to calculate the contribution of each feature to the prediction results, generating feature importance ranking charts and local interpretation charts. For risk assessment results, the system provides a risk decomposition view, showing the contribution ratio of different pollutants, different exposure pathways, and different health endpoints to the total risk.

[0132] (c) Natural Language Explanation Generation: Leveraging the generation capabilities of a large language model, the system transforms numerical analysis results into natural language explanations. Explanation generation employs a four-step process: "Data → Key Discoveries → Domain Knowledge Connections → Plain Language Expression." The system utilizes a knowledge graph to provide domain context, ensuring the professional accuracy of the explanations. The generated text is verified by a fact-checking module, undergoing consistency comparison with the input data and analysis results to prevent the large language model from generating misleading content. Explanations support bilingual output in Chinese and English.
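One simple form of the consistency comparison mentioned above is to check that every number in the generated text also appears in the analysis results. The sketch below illustrates that idea only. The regular expression, the function name, and the tolerance are assumptions, and a real fact-checking module would also need to cover units, entities, and the direction of effects.

```python
import re

# Standalone numbers only: digits glued to letters (e.g. "PM2.5", "SO2") are skipped.
NUMBER = re.compile(r"(?<![A-Za-z\d.])\d+(?:\.\d+)?")


def unsupported_numbers(text, facts, tol=1e-6):
    """Return numeric tokens in the text that match no value in the analysis results."""
    allowed = [float(v) for v in facts.values()]
    flagged = []
    for token in NUMBER.findall(text):
        value = float(token)
        if not any(abs(value - a) <= tol for a in allowed):
            flagged.append(token)
    return flagged


facts = {"risk_index": 0.82, "districts": 3}  # hypothetical analysis results
ok_text = "The PM2.5 risk index is 0.82 across 3 districts."
bad_text = "The PM2.5 risk index is 0.91 across 3 districts."
ok_flags = unsupported_numbers(ok_text, facts)
bad_flags = unsupported_numbers(bad_text, facts)
```

A non-empty result would send the text back for regeneration or manual review before it is released.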

[0133] (d) Interactive Question Answering Functionality: The system provides an interactive question answering interface based on a large language model, allowing users to ask questions about the analysis results in natural language. The question answering system adopts a retrieval-augmented generation (RAG) architecture: it first retrieves relevant information from the knowledge graph and analysis results, and then generates answers grounded in the retrieved content. The system supports follow-up questions and contextual understanding, with a dialogue history window of 10 rounds. For questions beyond the system's capabilities, the system clearly indicates its limitations and suggests consulting relevant domain experts.
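The retrieve-then-generate flow with a 10-round history window can be sketched as below. This is a sketch under assumptions: the keyword-overlap retriever, the `QASession` class, and the `generate` callable (standing in for the large language model) are hypothetical, and only the window length of 10 and the out-of-scope fallback behavior come from the text.

```python
from collections import deque

HISTORY_ROUNDS = 10  # dialogue history window stated in the text


class QASession:
    def __init__(self, documents, generate):
        self.documents = documents      # snippets from the knowledge graph / results
        self.generate = generate        # LLM stand-in: (question, context, history) -> str
        self.history = deque(maxlen=HISTORY_ROUNDS)  # oldest rounds drop automatically

    def retrieve(self, question, k=2):
        """Toy retriever: rank snippets by the number of shared lowercase tokens."""
        q_tokens = set(question.lower().split())
        scored = [(len(q_tokens & set(d.lower().split())), d) for d in self.documents]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [doc for score, doc in ranked if score > 0][:k]

    def ask(self, question):
        context = self.retrieve(question)
        if not context:
            answer = "This question is outside the system's scope; please consult a domain expert."
        else:
            answer = self.generate(question, context, list(self.history))
        self.history.append((question, answer))
        return answer


def stub_generate(question, context, history):
    """Placeholder for the language model call (assumption)."""
    return f"Based on the analysis results: {context[0]}"


session = QASession(
    ["PM2.5 risk level in district A is high", "Ozone forecast for tomorrow is moderate"],
    stub_generate,
)
answer = session.ask("What is the PM2.5 risk level in district A?")
fallback = session.ask("zzz qqq")
```

A production retriever would typically use embeddings or graph queries rather than token overlap. The bounded deque is what limits follow-up context to the most recent rounds.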

[0134] XII. Implementation of Regional Adaptation Mechanisms

[0135] The system is designed with a regional adaptability mechanism for environmental health risk assessment. It can automatically adjust model parameters, identify key exposure pathways, optimize health risk assessment and customize early warning thresholds based on the geographical, climatic, hydrological, population and economic characteristics of different regions, so as to achieve the goal of "one system applicable to multiple locations".

[0136] The system workflow is as follows: First, acquire feature data of the target region; then, extract the key features of the region through the regional feature recognition and classification module; next, optimize model parameters based on the regional features through the model parameter adaptive adjustment module; further, determine the dominant exposure pathways of the region through the region-specific exposure pathway recognition module; finally, generate a regional adaptability assessment scheme to guide subsequent risk assessment work.
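The workflow above (classify the region, adjust parameters, rank exposure pathways, emit a scheme) can be illustrated with a small Python sketch. Everything specific in it is a placeholder assumption made for the example: the feature fields, the classification rules, the `DISPERSION_SCALE` mapping values, and the pathway scoring heuristics are not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionFeatures:
    terrain: str               # e.g. "plain", "basin", "coastal"
    mean_wind_speed: float     # m/s
    groundwater_share: float   # fraction of drinking water drawn from groundwater
    cropland_share: float      # fraction of land used for crops


# Hypothetical mapping library: region class -> multiplier on a dispersion parameter.
DISPERSION_SCALE = {"basin": 0.6, "plain": 1.0, "coastal": 1.3}


def classify_region(f):
    """Step 2: reduce raw features to a region class (illustrative rules only)."""
    if f.terrain == "basin" or f.mean_wind_speed < 1.5:
        return "basin"
    if f.terrain == "coastal":
        return "coastal"
    return "plain"


def dominant_pathways(f, top=2):
    """Step 4: rank exposure pathways by a crude, assumed relevance score."""
    scores = {
        "inhalation": 1.0 / max(f.mean_wind_speed, 0.1),
        "drinking_water": f.groundwater_share,
        "diet": f.cropland_share,
    }
    return sorted(scores, key=scores.get, reverse=True)[:top]


def adaptation_scheme(f):
    """Steps 3 and 5: look up adjusted parameters and assemble the scheme."""
    region_class = classify_region(f)
    return {
        "region_class": region_class,
        "dispersion_scale": DISPERSION_SCALE[region_class],
        "dominant_pathways": dominant_pathways(f),
    }


scheme = adaptation_scheme(RegionFeatures("basin", 1.0, 0.8, 0.3))
```

The point of the structure is that downstream risk models read their parameters from the scheme rather than from hard-coded defaults, so the same code can run in different regions.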

[0137] XIII. Implementation of Multi-System Collaboration Mechanism

[0138] The system is designed with an advanced multi-system collaboration mechanism to achieve data sharing and collaborative work with existing environmental monitoring systems, public health monitoring systems and emergency response systems, forming an intelligent collaborative network for environmental health management.

[0139] The system workflow is as follows: First, data interoperability with different systems is achieved through a standardized data exchange protocol; then, the consistency and timeliness of data from each system are maintained through a real-time data synchronization and sharing mechanism; next, in-depth fusion analysis of data from multiple systems is achieved through a cross-system joint analysis framework; finally, a multi-departmental collaborative environmental health management network is formed through a collaborative early warning and emergency response mechanism.

[0140] Standardized data exchange protocol: This module adopts a multi-level, multi-standard data exchange protocol to ensure seamless integration with different systems.

[0141] Real-time data synchronization and sharing mechanism: This module implements a real-time data synchronization and sharing mechanism with other systems.

[0142] Cross-system joint analysis framework: This module establishes a cross-system joint analysis framework to achieve in-depth fusion analysis of data from multiple systems.
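Claim 7 names federated learning with FedAvg parameter aggregation as the way to perform joint modeling without pooling raw data across systems. A minimal sketch of the aggregation step follows. The flat list-of-floats parameter representation and the function name are assumptions, and local training, communication, and any privacy mechanisms are omitted.

```python
def fedavg(client_params, client_sizes):
    """FedAvg aggregation: average client parameters weighted by local sample counts."""
    if len(client_params) != len(client_sizes) or not client_params:
        raise ValueError("need one sample count per client and at least one client")
    n_params = len(client_params[0])
    if any(len(p) != n_params for p in client_params):
        raise ValueError("all clients must report the same number of parameters")
    total = sum(client_sizes)
    return [
        sum(params[j] * size for params, size in zip(client_params, client_sizes)) / total
        for j in range(n_params)
    ]


# Hypothetical round: an environmental system (1 sample unit) and a health system (3).
global_params = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only model parameters leave each system in this scheme; the monitoring and health records themselves stay where they are held.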

[0143] Collaborative early warning and emergency response mechanism: This module implements a collaborative early warning and emergency response mechanism with other early warning systems.

[0144] XIV. Implementation of System Iterative Update Mechanism

[0145] The system is designed with a robust iterative update mechanism that supports continuous optimization based on new scientific discoveries, accumulated monitoring data, and user feedback, ensuring that the system remains up to date and applicable over the long term.

[0146] The system workflow is as follows: First, the performance of each model is continuously monitored through the model performance monitoring and evaluation system; then, new scientific discoveries and research results are incorporated in a timely manner through the dynamic update mechanism of the knowledge base; next, user experience and changing requirements are collected and analyzed through user feedback and demand analysis; finally, system version iteration and function optimization are achieved through system component upgrades and integration.
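The first step of this workflow, continuous model performance monitoring, could take the form of a rolling error check that flags a model for updating. The sketch below is an illustration only: the use of mean absolute error, the window length, the tolerance factor, and the class name are assumptions not stated in the source.

```python
from collections import deque


class PerformanceMonitor:
    """Track a rolling mean absolute error and flag degradation against a baseline."""

    def __init__(self, baseline_mae, window=30, tolerance=1.5):
        self.baseline_mae = baseline_mae
        self.tolerance = tolerance
        self.errors = deque(maxlen=window)  # keeps only the most recent errors

    def record(self, predicted, observed):
        self.errors.append(abs(predicted - observed))

    def rolling_mae(self):
        return sum(self.errors) / len(self.errors) if self.errors else None

    def needs_update(self):
        """True once the window is full and the error exceeds tolerance x baseline."""
        full = len(self.errors) == self.errors.maxlen
        return full and self.rolling_mae() > self.tolerance * self.baseline_mae


monitor = PerformanceMonitor(baseline_mae=1.0, window=3)
for predicted, observed in [(10.0, 10.5), (10.0, 11.0), (10.0, 9.0)]:
    monitor.record(predicted, observed)
stable = monitor.needs_update()
monitor.record(10.0, 16.0)  # a large miss pushes the rolling error up
degraded = monitor.needs_update()
```

Requiring a full window before raising the flag avoids triggering an update on the first few noisy observations.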

[0147] XV. Application Cases

[0148] To verify the effectiveness and practicality of the system of this invention, a typical industrial city was selected as the verification scenario. Through system deployment and trial operation, the system's ability to perform full-chain correlation analysis and accurate early warning from pollution sources to health effects was verified, providing scientific support for environmental management decisions. The following data are based on preliminary statistical results from the system simulation and trial operation phases.

[0149] Application Background and Requirements: This city is a typical heavy industrial city with various industrial enterprises such as steel, chemical, and power, and environmental pollution and residents' health problems have long existed. The main requirements include: identifying the causal relationship between key pollution sources and health effects; assessing the health benefits of different emission reduction measures; providing personalized health risk warnings for sensitive populations; and supporting rapid response to environmental health emergencies. The system needs to integrate heterogeneous data scattered across environmental protection, health, and meteorological departments to achieve multi-dimensional correlation analysis and accurate early warning.

[0150] System Implementation Process: The system implementation adopts a phased approach, with full deployment completed over 18 months. Phase 1 (3 months): Requirements analysis and system design were completed, clarifying functional modules and data requirements. Phase 2 (6 months): The basic platform was built and data access was completed, integrating historical data from multiple environmental monitoring stations, medical institutions, and meteorological stations to establish an initial model. Phase 3 (6 months): Core functions were developed and the model optimized, enabling pollution source analysis, health risk assessment, and early warning functions. Phase 4 (3 months): System integration and verification were completed, including trial operation and training, ensuring stable system operation and user proficiency.

[0151] Typical application cases:

(a) Pollution Source-Health Effect Correlation Analysis: The system conducted a full-chain correlation analysis of environmental monitoring data and health data for the city over the past five years. The results showed that PM2.5 and SO2 emissions from steel enterprises were the main factors behind the increase in respiratory diseases, contributing over 40%; VOC emissions from chemical enterprises were the main factors influencing nervous system disorders and skin allergies, contributing nearly 40%. The system also identified the long-term health effects of historical pollution (such as heavy metals in the soil), particularly the potential risks to children's development. These findings can provide a scientific basis for the government to formulate targeted pollution control policies.

(b) Health Benefit Assessment of Emission Reduction Measures: Based on the correlation analysis results, the health benefits of various emission reduction scenarios were systematically assessed. The assessment shows that implementing ultra-low emission retrofitting in the steel industry can reduce the hospitalization rate for respiratory diseases by approximately 15%–20%, significantly reduce lost work days, and generate economic benefits of hundreds of millions of yuan per year; VOC treatment in the chemical industry can reduce the incidence of allergic diseases by approximately 10%–15% and significantly reduce related drug expenditures. The spatial distribution map of health benefits generated by the system shows that emission reduction measures yield the largest health improvements in densely populated areas and in areas with concentrated sensitive populations. These assessment results can help the government optimize environmental protection investment strategies and prioritize the emission reduction projects with the greatest health benefits.

(c) Personalized Health Risk Warning: The system provided personalized health risk warning services to tens of thousands of registered users in the city. During a period of heavy pollution, the system generated differentiated warning information based on the characteristics and activity patterns of different population groups. For a large number of patients with respiratory diseases, the system issued a high-risk warning 48 hours in advance, suggesting adjustments to outdoor activities and medication plans; for the elderly, the system provided suggestions for indoor activities and air purification guidance; for school-aged children, the system sent suggestions for adjusting outdoor activities to parents and schools. Preliminary follow-up surveys showed that users who received and followed the warning suggestions had a significantly lower emergency room visit rate than the control group, preliminarily demonstrating the practical effectiveness of the warning system.

(d) Environmental Health Emergency Response: The system played a crucial role in a chemical plant leak incident. Shortly after the incident, the system integrated environmental monitoring, meteorological, and population distribution data to predict pollutant diffusion trends and potential health risks. The system generated a zoned early warning map, identifying three high-risk communities and five medium-risk communities, and predicting the potential types and scale of health impacts. Based on these analyses, emergency management departments implemented targeted evacuations and medical resource allocation. All residents in high-risk areas were safely evacuated, and medical institutions prepared for treatment in advance. Post-incident assessment showed that the health impact of this incident was significantly lower than that of historical events of similar scale, demonstrating the system's value in emergency response.

[0152] Application Effectiveness Evaluation: During the trial operation in the city, a preliminary effectiveness evaluation was conducted. In terms of environmental management benefits, the system helped identify several key pollution sources, optimized various emission reduction measures, and improved the environmental quality compliance rate. Regarding health protection benefits, the incidence of environment-related diseases showed a downward trend, health risks to specific populations decreased, and medical expenses were reduced. In terms of social benefits, public awareness of environmental health improved, and satisfaction survey scores significantly increased; the scientific nature and transparency of government decision-making were enhanced, and support for environmental policies rose. The system also promoted inter-departmental collaboration, establishing data sharing and linkage mechanisms among environmental protection, health, meteorology, and emergency response departments, thereby improving overall governance efficiency.

[0153] The above description is merely a preferred embodiment of the present invention and does not limit the scope of patent protection of the present invention. All equivalent structural transformations made under the inventive concept of the present invention using the contents of the specification and drawings of the present invention, or direct or indirect applications in other related technical fields, are included within the scope of patent protection of the present invention.

Claims

1. An environmental health risk early warning system based on data fusion and artificial intelligence, characterized in that it comprises:
a multi-source data acquisition and fusion platform, which includes a multi-source data automatic acquisition submodule, a data cleaning and standardization submodule, a medical and health data processing submodule, and a spatiotemporal matching and interpolation submodule, and which is used to acquire and integrate pollution source data, environmental media data, and population health data;
a pollution source-environmental medium-human exposure-health effect correlation analysis module, which includes a pollution source analysis submodule, an environmental medium transport model submodule, a human exposure assessment submodule, and a health effect assessment submodule, and which is used to perform spatiotemporal correlation analysis, pollutant exposure path tracing, causal inference of health risks, and biomarker identification and verification on the multi-source data;
a causal inference and prediction module, which uses causal inference algorithms and machine learning prediction models to accurately identify the causal relationship between pollutant exposure and health effects and to scientifically predict future health risks;
a health risk dynamic assessment module, which includes a time series risk assessment submodule and a spatial risk distribution assessment submodule and receives input data from the human exposure assessment submodule and the health effect assessment submodule, wherein the temporal variation pattern of health risks is analyzed through the time series risk assessment submodule while the spatial distribution characteristics of health risks are analyzed through the spatial risk distribution assessment submodule, and the risk assessment results from the temporal and spatial dimensions are then integrated to generate a dynamic risk assessment report;
a multi-level early warning and decision support module, which provides intelligent early warning and decision support functions based on the risk assessment and prediction results to help environmental management departments formulate scientific prevention and control measures;
an emission reduction benefit assessment module, which includes an environmental quality improvement assessment submodule and a health benefit assessment submodule and which, for different emission reduction measures, assesses their contribution to improving environmental quality and enhancing public health, providing a quantitative basis for environmental management decisions.

2. The environmental health risk early warning system based on data fusion and artificial intelligence according to claim 1, characterized in that it further comprises:
an organ system-specific damage biomarker atlas module, used to construct biomarker atlases covering eight major organ systems (kidney, liver, respiratory, nervous, cardiovascular, immune, endocrine, and reproductive), enabling accurate identification and early warning of multi-organ system damage caused by environmental pollutants;
an environmental health knowledge reasoning and integration module, used to extract, integrate, and apply professional knowledge in the environmental health field based on large language model technology, supporting data analysis and decision-making;
a regional adaptation mechanism module, used to automatically adjust model parameters, identify key exposure pathways, optimize health risk assessment, and customize early warning thresholds based on the geographical, climatic, hydrological, population, and economic characteristics of different regions;
a multi-system collaboration mechanism module, used to enable data sharing and collaborative work with existing environmental monitoring systems, public health monitoring systems, and emergency response systems;
a system iterative update mechanism module, used to support continuous optimization based on new scientific discoveries, accumulated monitoring data, and user feedback.

3. The environmental health risk early warning system based on data fusion and artificial intelligence according to claim 1, characterized in that the multi-source data acquisition and fusion platform adopts a deep learning framework for spatiotemporal data fusion and matching to solve the problem of integrating multi-source heterogeneous spatiotemporal data in environmental health research, and achieves accurate matching and fusion of environmental monitoring data and health monitoring data with different spatiotemporal resolutions, sampling frequencies, and data formats.

4. The environmental health risk early warning system based on data fusion and artificial intelligence according to claim 1, characterized in that the causal inference and prediction module includes:
a deep learning-driven causal inference engine, used to fuse structural causal models with neural networks to achieve high-dimensional nonlinear causal discovery;
a framework combining causal inference and graph neural networks, used to model environmental health systems as heterogeneous spatiotemporal causal graph networks, enabling the estimation of full-chain causal effects;
a spatiotemporally sensitive counterfactual prediction network, used to support the neuralization of the "do-operator" and to quantify intervention effects and uncertainties.

5. The environmental health risk early warning system based on data fusion and artificial intelligence according to claim 1, characterized in that the multi-level early warning and decision support module includes:
a scenario simulation and emergency drill subsystem, used to establish a database of emergency scenarios, enabling simulation of event evolution, health impact, and emergency response, as well as interactive multi-department collaborative drills;
the ALOHA, CALPUFF, and WASP models, used to simulate sudden events such as leaks, explosions, and water pollution;
a multi-role collaborative exercise platform, used to record decision-making processes and generate exercise evaluation reports.

6. The environmental health risk early warning system based on data fusion and artificial intelligence according to claim 2, characterized in that the regional adaptation mechanism module is used to:
automatically identify and classify regional topography, climate, hydrology, land use, population, and economic characteristics;
dynamically adjust the parameters of the atmospheric diffusion, multi-media transport, exposure assessment, and health effects models based on sensitivity analysis and a mapping relationship library;
identify the dominant exposure pathways of the region and generate customized monitoring plans;
support adaptive operation in various types of regions, including plains, coastal areas, mountains, basins, and agricultural, industrial, and urban areas.

7. The environmental health risk early warning system based on data fusion and artificial intelligence according to claim 2, characterized in that the multi-system collaboration mechanism module includes:
standardized data exchange protocols, supporting XML, JSON, and CSV formats and industry standards such as HJ/T 212, HL7 FHIR, OGC, and DICOM;
a distributed analysis architecture based on federated learning, in which the FedAvg parameter aggregation strategy is used to achieve joint modeling across systems while protecting data privacy;
multi-source early warning integration technology, which integrates environmental monitoring, meteorological, disease, and public opinion early warning information to generate unified hierarchical early warnings;
an integrated decision support interface, which enables collaborative action and optimized resource allocation among multiple departments, including environmental protection, health, and emergency response.

8. An environmental health risk early warning method based on data fusion and artificial intelligence, using the environmental health risk early warning system based on data fusion and artificial intelligence as described in any one of claims 1 to 7, characterized in that it comprises:
Step S1: acquiring and integrating pollution source emission data, environmental media monitoring data, population activity and exposure data, and health effect data in real time through the multi-source data acquisition and fusion platform;
Step S2: using an artificial intelligence analysis engine to perform, in sequence, pollution source analysis, environmental media transport simulation, human exposure dose calculation, and health effect assessment on the integrated data, so as to construct a full-chain correlation model of "pollution source-environmental media-human exposure-health effect";
Step S3: based on the aforementioned correlation model, using a causal inference algorithm to identify the causal relationship between pollutant exposure and health effects and to quantify the causal effects;
Step S4: using machine learning prediction models to predict changes in environmental quality and population health risks under different future pollution scenarios;
Step S5: based on biomarker monitoring data, establishing a multi-level early warning mechanism to achieve multi-level and accurate early warning of health risks related to pollutant exposure;
Step S6: quantitatively assessing the environmental quality improvement effects and health benefits of different emission reduction measures, and generating tiered early warning information and targeted intervention recommendations to serve environmental management and public health decision-making.

9. The environmental health risk early warning method based on data fusion and artificial intelligence according to claim 8, characterized in that:
Step S1 involves spatiotemporal fusion via the spatiotemporal matching and interpolation submodule, in which the weight of each data source i is given by a weight calculation formula [formula not reproduced in this text], where d_i is the spatial distance, t_i is the time distance, the data quality coefficient of data source i also enters the weight, and p and q are the spatial and temporal weighting parameters, respectively, with default values p = 2 and q = 1; when d_i = 0, the observations from that data source are used directly; when t_i = 0, the measured data at that moment are used directly; the fusion process takes into account the accuracy, reliability, and representativeness of the different data sources, and determines the optimal weight combination through cross-validation;
the artificial intelligence analysis engine calculates the total individual exposure through the human exposure assessment submodule by an exposure calculation formula [formula not reproduced in this text], where E is the total exposure and the remaining terms are the pollutant concentration in microenvironment i, the time spent in microenvironment i, and the activity intensity coefficient of microenvironment i, for which rest = 1.0, light activity = 1.2, moderate activity = 1.5, and heavy activity = 2.0.
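Because the weight and exposure formulas of claim 9 are not legible in this text, the following Python sketch assumes commonly used forms and should not be read as the claimed formulas. The fusion weight is assumed to be proportional to quality / (d^p · t^q), normalized across sources, with a zero spatial or temporal distance treated as an exact match. Total exposure is assumed to be the sum over microenvironments of concentration × time × activity coefficient. The defaults p = 2 and q = 1 and the four activity coefficients come from the claim; all function and variable names are hypothetical.

```python
# Activity intensity coefficients stated in claim 9.
ACTIVITY = {"rest": 1.0, "light": 1.2, "moderate": 1.5, "heavy": 2.0}


def fuse(observations, p=2.0, q=1.0):
    """Fuse observations given as (value, spatial_dist, time_dist, quality) tuples.

    Assumed weight form: quality / (d**p * t**q), normalized over all sources.
    """
    for value, d, t, _ in observations:
        if d == 0 or t == 0:
            return value  # exact match in space or time: use the observation directly
    raw = [quality / (d ** p * t ** q) for _, d, t, quality in observations]
    total = sum(raw)
    return sum(w * obs[0] for w, obs in zip(raw, observations)) / total


def total_exposure(segments):
    """Assumed form E = sum(C_i * T_i * A_i) over (concentration, hours, activity)."""
    return sum(c * hours * ACTIVITY[activity] for c, hours, activity in segments)


# Hypothetical inputs: two monitoring sources, and one day split into two microenvironments.
fused = fuse([(10.0, 1.0, 1.0, 1.0), (20.0, 2.0, 1.0, 1.0)])
exact = fuse([(7.0, 0.0, 3.0, 0.9), (20.0, 2.0, 1.0, 1.0)])
exposure = total_exposure([(35.0, 8, "rest"), (50.0, 2, "moderate")])
```

With p = 2, the source twice as far away receives a quarter of the weight, so the fused value lies closer to the nearer observation.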

10. A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the environmental health risk early warning method based on data fusion and artificial intelligence as described in any one of claims 8-9.