A method and system for quantifying the synergistic effects of multiple factors on ozone pollution

By combining a photochemical box model with machine learning methods, key factors are screened and their quantitative impact on ozone concentration and their synergistic effect index are calculated. This addresses the accuracy and computational-complexity problems that existing technologies face when identifying ozone formation factors and their synergistic effects, and supports science-based prevention and control of ozone pollution.

CN122245491A · Pending · Publication Date: 2026-06-19 · CHINA UNIV OF MINING & TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHINA UNIV OF MINING & TECH
Filing Date
2026-05-25
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies struggle to accurately identify and quantify the key drivers of ozone formation and the synergistic effects of multiple factors, especially in the face of meteorological interference and unknown reactions, leading to problems of misjudgment and high computational complexity.

Method used

By combining the photochemical box model with machine learning methods, a causal learning model is constructed to screen key influencing factors and calculate their quantitative impact on ozone concentration and synergistic effect index, thus achieving a combination of chemical mechanism constraints and data-driven analysis.

Benefits of technology

Effectively separating the influence of meteorological factors reduces computational costs, enabling quantitative assessment of ozone formation and accurate judgment of the synergistic effects of multiple factors, thus providing a scientific basis for prevention and control.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a quantitative method and system for assessing the synergistic effects of multiple factors on ozone pollution. The method includes: acquiring ozone, meteorological, and precursor data for a target area; converting VOCs into ozone formation potential to construct a time-series dataset; constructing and optimizing a machine learning model; screening key influencing factors based on SHAP values; constructing a causal learning model based on double machine learning (DML) to estimate the quantitative impact of a unit change in each key factor on ozone concentration; and calculating the relative effect index based on the relative contribution method to quantitatively assess control zone types and the synergistic effects of multiple factors. This invention combines chemical mechanism constraints with data-driven analysis, achieving accurate identification, unified quantification, and synergistic effect analysis of the main controlling factors of ozone pollution.

Description

Technical Field

[0001] This invention belongs to the field of atmospheric environment monitoring and pollution prevention technology, specifically relating to a quantitative method and system for the synergistic influence of multiple factors on ozone pollution.

Background Technology

[0002] In recent years, atmospheric ozone (O3) pollution has become the primary issue affecting air quality in China. Since 2022, the percentage of exceedance days with ozone as the primary pollutant has surpassed that of fine particulate matter (PM2.5). Ozone is a secondary pollutant, and its formation is influenced not only by emissions of precursors such as volatile organic compounds (VOCs), nitrogen oxides (NOx), methane (CH4), and carbon monoxide (CO), but also by meteorological conditions such as temperature, relative humidity, wind speed, and boundary layer height, as well as by regional transport processes. Because these factors interact in complex nonlinear ways, accurately identifying the key driving factors and quantitatively assessing both the impact of each factor and the synergistic effects of multiple factors on ozone formation remains a critical technical challenge in atmospheric environmental research and pollution control.

[0003] In existing technologies, ozone formation analysis mainly relies on the following two types of methods:

[0004] The first category is mechanistic analysis based on photochemical box models. Using either detailed chemical mechanisms (such as MCM) or lumped mechanisms (such as RACM, SAPRC, and CB), these methods take observed meteorological parameters and pollutant concentrations as input and simulate the photochemical oxidation of VOCs and NOx under sunlight. Ozone formation sensitivity analysis is then performed by changing the initial conditions (for example, adjusting NOx or VOC concentrations) to obtain the relative contributions of key precursors. However, this type of method has significant shortcomings. First, the box model relies mainly on known chemical reaction mechanisms and ignores regional transport and the evolution of the meteorological field. In areas heavily influenced by external transport, it cannot effectively separate local formation from external transport, so meteorological interference is difficult to remove. Second, the model cannot identify or quantify reaction processes that are not yet understood, which may lead to misjudgment of ozone formation rates. Third, ozone concentration is controlled by multiple factors such as meteorology, precursors, regional transport, and chemical reactions, and the box model has difficulty quantifying the nonlinear synergistic contributions among them. Fourth, simulating long-term pollution processes with a detailed chemical mechanism requires coupling thousands of chemical reactions over a large number of time steps, resulting in high computational complexity, long processing times, and demanding server requirements.

[0005] The second category is data-driven analysis based on machine learning. These methods collect meteorological parameters and pollutant concentration data for the study area, use ozone concentration as the prediction target, and construct machine learning models such as random forests, gradient boosting decision trees, support vector regression, or neural networks. Feature importance scores or SHAP (SHapley Additive exPlanations) values are then used to identify the key meteorological factors and precursors affecting ozone concentration. Although machine learning methods can capture complex nonlinear statistical relationships among multiple factors, they have the following limitations. First, the model input lacks chemical activity characteristics: most studies directly use the concentrations of dozens or even hundreds of VOC species as input, ignoring the significant differences in ozone formation potential among species, so the models lack a sound chemical-mechanism interpretation. Second, existing methods mostly determine the contribution of each factor qualitatively through feature importance or SHAP values; they do not state how much ozone concentration changes for an actual change in a factor. Third, existing machine learning models focus mainly on the impact of single factors on ozone and lack methods for quantifying the synergistic contributions of multiple factors, making it difficult to determine whether different factors enhance or inhibit one another.

[0006] In summary, existing box-model methods struggle to simultaneously address meteorological interference removal, unknown process identification, multi-factor synergistic quantification, and computational efficiency, while data-driven models lack chemical mechanism constraints, quantitative attribution capabilities, and synergistic impact assessment methods. Therefore, a technical solution is urgently needed that combines chemical mechanism constraints with data-driven analysis capabilities to systematically identify how different precursors and meteorological factors influence ozone formation. This would enable accurate identification of the key influencing factors in ozone formation, quantitative assessment of their impact, and quantitative analysis of multi-factor synergistic effects, thereby providing technical support for the scientific prevention and precise control of ozone pollution.

Summary of the Invention

[0007] To address the aforementioned technical problems, this invention proposes a quantitative method and system for the synergistic effects of multiple factors on ozone pollution. By coupling a photochemical box model with a causal learning model, this method takes into account both chemical mechanism constraints and data-driven analysis, achieving accurate identification, unified quantification, and synergistic effect analysis of the main controlling factors of ozone pollution.

[0008] To achieve the above effects, the first technical solution of this application discloses a quantitative method for the synergistic influence of multiple factors on ozone pollution, comprising the following steps:

[0009] S1. Obtain ozone observation data of the target area, as well as meteorological data and precursor data related to ozone generation, and construct time-series datasets for each meteorological data and precursor data; wherein, the precursor data includes nitrogen oxides (NOx) and volatile organic compounds (VOCs), and the time-series datasets of each volatile organic compound are converted into corresponding ozone generation chemical activity parameters through a photochemical box model.

[0010] S2. Using the time-series dataset as input and ozone concentration as output, construct a machine learning model and optimize the model's hyperparameters.

[0011] S3. Perform feature importance analysis based on the optimized machine learning model, and screen key influencing factors by ranking. The key influencing factors include at least one nitrogen oxide and at least one volatile organic compound.

[0012] S4. Construct a causal learning model to estimate the causal effect of each key influencing factor, and obtain the quantitative impact value of a unit change in each key factor on ozone concentration.

[0013] S5. Calculate the relative effect index of each key factor on ozone formation at a predetermined rate of change, thereby obtaining the quantitative relationship of the influence of each key factor on ozone formation in the target area.

[0014] Furthermore, the quantitative relationship between the key factors in the target area and the ozone formation can be used to distinguish the ozone control zone type in the target area and / or to perform joint perturbation analysis between different influencing factors in the target area.

[0015] Furthermore, the time-series datasets of each volatile organic compound in S1 are constructed by converting them into corresponding ozone generation chemical activity parameters, which include ozone generation potential and / or hydroxyl radical reactivity.

[0016] The ozone generation potential is obtained by multiplying the maximum incremental reactivity calculated based on the photochemical box model with the concentration of each volatile organic compound.

[0017] The hydroxyl radical reactivity is obtained by multiplying the hydroxyl radical reaction rate constant by the concentration of each volatile organic compound.

[0018] Furthermore, the machine learning model described in S2 is an ensemble learning model or a neural network model, and hyperparameter optimization is performed using ten-fold cross-validation.

[0019] Furthermore, the quantitative impact value $\theta_i$ on ozone concentration mentioned in S4 is calculated using the following causal learning model:

[0020] $O_3 = \theta_i Z_i + g(X) + \varepsilon$;

[0021] where $\theta_i$ represents the marginal impact coefficient of key factor i on ozone, $X$ represents the remaining variables other than the key influencing factor, $Z_i$ represents key factor i, $g(X)$ represents the nonlinear influence function of the control variables other than $Z_i$, and $\varepsilon$ represents the random error term.
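The estimation idea can be illustrated with a small sketch. It assumes the partially linear form ozone = theta * Z + g(X) + error and uses the usual double machine learning recipe: predict both the key factor and ozone from the control variables on held-out folds, then regress the ozone residuals on the factor residuals. Plain least squares stands in for the machine learning nuisance learners, and the data are synthetic; the function names `fit_line` and `dml_theta` are hypothetical and not taken from the patent.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y ~ a + b*x; returns a prediction function."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda v: intercept + slope * v


def dml_theta(x, z, y, n_folds=2):
    """Cross-fitted residual-on-residual estimate of theta in y = theta*z + g(x) + eps."""
    n = len(y)
    res_z, res_y = [0.0] * n, [0.0] * n
    for k in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == k]
        train = [i for i in range(n) if i % n_folds != k]
        x_train = [x[i] for i in train]
        # Nuisance models are fitted on the training folds only (cross-fitting).
        predict_z = fit_line(x_train, [z[i] for i in train])
        predict_y = fit_line(x_train, [y[i] for i in train])
        for i in held_out:
            res_z[i] = z[i] - predict_z(x[i])
            res_y[i] = y[i] - predict_y(x[i])
    return sum(a * b for a, b in zip(res_z, res_y)) / sum(a * a for a in res_z)


# Synthetic data (assumption): x is a confounder such as temperature,
# z is the key factor, and the true unit effect of z on ozone is 2.0.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(400)]
z = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [2.0 * zi + 3.0 * xi + rng.gauss(0, 1) for zi, xi in zip(z, x)]

theta_hat = dml_theta(x, z, y)
print(f"estimated unit effect: {theta_hat:.3f}")
```

In practice the nuisance models would be flexible learners (for example the optimized model from S2), and an established DML library would normally be preferred over hand-written code.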

[0022] Furthermore, the relative effect index $RE_i$ described in S5 is calculated as:

[0023] $RE_i = \dfrac{\Delta O_{3,i} / O_{3,0}}{\Delta Z_i / Z_i}$, with $\Delta O_{3,i} = O_3(Z_i + f \cdot Z_i) - O_{3,0}$;

[0024] where $\Delta O_{3,i}$ represents the change in ozone concentration when key factor i is perturbed alone; $O_{3,0}$ represents the ozone concentration when no variable is changed; $O_3(Z_i + f \cdot Z_i)$ represents the ozone concentration after factor i changes by $f \cdot Z_i$, where f is the predetermined percentage change of each key factor and $Z_i$ is the concentration of the key factor; and $\Delta Z_i / Z_i$ represents the relative change in the concentration of key factor i, which is numerically equal to f.
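A minimal sketch of the relative effect calculation follows. It assumes that RE is the relative ozone change divided by the relative change f of the factor, and that the ozone change under a single-factor perturbation comes from a linear unit effect `theta` such as the one estimated in S4. The factor names and all numbers are illustrative only.

```python
def relative_effect(theta, z, o3_base, f):
    """Relative ozone change per relative change of one factor.

    Assumed form: RE = (delta_O3 / O3_base) / (delta_Z / Z),
    with delta_Z = f * Z and delta_O3 = theta * delta_Z.
    """
    if o3_base == 0 or z == 0 or f == 0:
        raise ValueError("o3_base, z and f must all be non-zero")
    delta_z = f * z
    delta_o3 = theta * delta_z
    return (delta_o3 / o3_base) / (delta_z / z)


# Hypothetical unit effects (theta) and mean levels (z) for three factors.
factors = {
    "temperature": {"theta": 2.0, "z": 30.0},
    "NO2": {"theta": -0.5, "z": 20.0},
    "OFP": {"theta": 0.25, "z": 80.0},
}
o3_base = 100.0  # baseline ozone concentration, arbitrary units

re_index = {
    name: relative_effect(v["theta"], v["z"], o3_base, f=0.10)
    for name, v in factors.items()
}
for name, value in re_index.items():
    print(f"RE[{name}] = {value:+.2f}")
```

Because the index is a ratio of relative changes, factors with different units (degrees, ppb, OFP) become directly comparable. Under the linear assumption used here the result does not depend on f; with a nonlinear response it would.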

[0025] Furthermore,

[0026] The ozone control zone type of the target area is distinguished as follows:

[0027] when [condition missing in source], the target area is a VOCs control zone;

[0028] when [condition missing in source], the target area is a NOx control zone;

[0029] when [condition missing in source], the target area is a transition zone, that is, a zone jointly controlled by NOx and VOCs;

[0030] The joint perturbation analysis of different key influencing factors in the target area specifically involves quantitatively determining the multi-factor synergistic influence relationship of ozone pollution by calculating the synergistic influence index among different key influencing factors; wherein the formula for calculating the synergistic influence index is:

[0031] [formula missing in source];

[0032] where $SI_{ij}$ is the synergistic influence index of key influencing factor i and key influencing factor j; $\Delta O_{3,ij}$ represents the change in ozone concentration when key influencing factors i and j are perturbed simultaneously; and $\Delta O_{3,i}$ and $\Delta O_{3,j}$ represent the changes in ozone concentration caused by perturbing key influencing factor i and key influencing factor j individually.

[0033] The basis for quantitatively judging the synergistic influence of multiple factors on ozone pollution is as follows:

[0034] when [condition missing in source], key influencing factors i and j have a synergistic enhancing effect on ozone formation;

[0035] when [condition missing in source], key influencing factors i and j have a mutually inhibitory effect on ozone formation;

[0036] when [condition missing in source], key influencing factors i and j have virtually no synergistic effect on ozone formation.
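The three-way judgment above can be sketched in code. The index formula and its thresholds are not legible in this text, so the sketch assumes a simple additive-deviation form, SI = joint change minus the sum of the two individual changes, with a hypothetical tolerance `tol` around zero. The actual definition may differ (for example, a ratio compared against 1).

```python
def synergy_index(d_o3_joint, d_o3_i, d_o3_j):
    """Assumed form: deviation of the joint ozone change from simple additivity."""
    return d_o3_joint - (d_o3_i + d_o3_j)


def classify_synergy(si, tol=1e-6):
    """Map an index value to the three cases in the text (thresholds are assumed)."""
    if si > tol:
        return "synergistic enhancement"
    if si < -tol:
        return "mutual inhibition"
    return "no appreciable synergy"


# Illustrative ozone changes: (joint, factor i alone, factor j alone).
cases = {
    "temperature + OFP": (5.0, 2.0, 2.0),
    "NOx + OFP": (3.0, 2.0, 2.0),
    "wind speed + CO": (4.0, 2.0, 2.0),
}
for pair, (joint, alone_i, alone_j) in cases.items():
    si = synergy_index(joint, alone_i, alone_j)
    print(f"{pair}: SI = {si:+.1f} -> {classify_synergy(si)}")
```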

[0037] The second technical solution of this application discloses a quantitative system for the synergistic effects of multiple factors on ozone pollution, including:

[0038] The data acquisition module is used to acquire ozone observation data of the target area, as well as meteorological data and precursor data related to ozone generation, and to construct time-series datasets for each meteorological data and precursor data; wherein, the precursor data includes nitrogen oxides (NOx) and volatile organic compounds (VOCs), and the time-series datasets for each volatile organic compound are constructed by converting them into corresponding ozone generation chemical activity parameters.

[0039] The model building module is used to build a machine learning model with the time series dataset as input and ozone change as output, and to optimize the hyperparameters of the model.

[0040] The factor screening module is used to perform feature importance analysis based on the optimized machine learning model and screen key influencing factors by ranking. The key influencing factors include at least one nitrogen oxide and at least one volatile organic compound.

[0041] The causal estimation module is used to build a causal learning model, estimate the causal effect of each key influencing factor, and obtain the quantitative impact value of a unit change in each key factor on ozone concentration.

[0042] The quantitative analysis module is used to calculate the relative effect index of each key factor on ozone generation based on the relative contribution method, so as to obtain the quantitative relationship of the influence of each key factor on ozone generation in the target area.

[0043] The control zone differentiation module is used to differentiate the ozone control zone type of the target area according to the quantitative relationship; and / or the synergistic analysis module is used to perform joint perturbation analysis between different key influencing factors in the target area to obtain the synergistic influence index between each key influencing factor.

[0044] Furthermore, the data acquisition module includes:

[0045] The photochemical calculation unit is used to calculate the chemical activity coefficients of each volatile organic compound based on the photochemical box model. The chemical activity coefficients include the maximum incremental reaction activity and / or the hydroxyl radical reaction rate constant.

[0046] An activity conversion unit is used to multiply the chemical activity coefficient by the concentration of each volatile organic compound to obtain ozone generation chemical activity parameters, wherein the ozone generation chemical activity parameters include ozone generation potential and / or hydroxyl radical reactivity.

[0047] An electronic device is also disclosed, including a processor and a memory, the memory storing a computer program, wherein the processor executes the computer program to implement the method described in the first technical solution.

[0048] Compared with the prior art, the present invention has the following beneficial effects:

[0049] (1) Effectively separating the influence of meteorological factors on ozone formation

[0050] To address the problem that existing box models struggle to remove meteorological interference, this invention introduces meteorological parameters (temperature, relative humidity, wind speed, boundary layer height, etc.) as machine learning input features and uses a double machine learning (DML) causal framework to statistically control for meteorological confounders. This effectively separates the independent influence of meteorological conditions on ozone formation and addresses the simulation bias caused by the box model neglecting meteorological field evolution and regional transport.

[0051] (2) Balancing chemical mechanism constraints with big data analysis capabilities

[0052] To address the lack of chemical mechanism constraints in existing machine learning models, this invention calculates the localized MIR and OFP for each VOC species using a box model, replacing the original VOC concentration with OFP as the model input, thus providing the data-driven model with photochemical mechanism constraints. Simultaneously, based on machine learning methods, this invention captures the influence of various factors on ozone formation through a big data-driven approach, providing indicative evidence for the existence of unknown reactive species and avoiding errors caused by the unclear chemical mechanisms in the box model.

[0053] (3) Significantly reduce computational costs and improve analysis efficiency

[0054] To address the issues of long computation time and high operating costs in existing box models due to the complexity of chemical reactions, this invention utilizes machine learning methods and SHAP analysis to screen key features, reducing high-dimensional input variables to the top N main influencing factors. This retains the key factors for ozone formation while reducing data dimensionality and runtime. Compared to traditional box models that simulate tens of thousands of chemical reactions, this method significantly reduces computation time while maintaining analytical accuracy, thus significantly reducing reliance on high-performance computing resources.

[0055] (4) To achieve unified quantification of the influence of different dimensional factors on ozone formation.

[0056] Different influencing factors (such as temperature, NO2, and OFP) have different dimensions, making direct comparison impossible. Existing methods can only provide a qualitative ranking of importance, without specifying how much a change in a particular factor will affect ozone formation. This invention standardizes each factor, uniformly calculating the ozone change when they change by the same proportion (e.g., 1%, 2%, 5%, or 10%), avoiding misleading conclusions due to different dimensions and achieving a unified quantification of the impact of factors with different dimensions on ozone formation.

[0057] (5) To achieve quantitative assessment of the synergistic effects of multiple factors on ozone pollution.

[0058] To address the difficulty of quantifying the synergistic effects of multiple factors using existing methods, this invention proposes a Synergistic Index (SI) method. This method can perform joint perturbation analysis on two or more influencing factors, quantitatively assessing the synergistic enhancement or mutual inhibition effects between different factors. This method overcomes the limitation of existing technologies that can only analyze the effects of single factors, providing a quantitative basis for judging the synergistic effects of multiple factors in ozone pollution control.

Attached Figure Description

[0059] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments of the present invention will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0060] Figure 1 is a flowchart illustrating the technical process of the present invention.

[0061] Figure 2 is a chart comparing the machine learning model, trained on the photochemical-parameter training set, before and after hyperparameter optimization.

Detailed Implementation

[0062] To make the technical problems solved, the technical solutions, and the beneficial effects of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

[0063] First, the technical terms used in this application will be explained.

[0064] The term "O3" stands for ozone, a key secondary pollutant in the atmosphere, with concentrations typically expressed in μg/m³ or ppb.

[0065] The term "VOCs" stands for Volatile Organic Compounds, which are important precursors to ozone formation.

[0066] The term "NOx" stands for nitrogen oxides, which include NO and NO2 and are important precursors to ozone formation.

[0067] The term "MIR" stands for Maximum Incremental Reactivity, which represents the maximum amount of ozone generated per unit concentration of pollutant at different NOx concentrations.

[0068] The term "OFP" stands for Ozone Formation Potential, which indicates the contribution of a pollutant to ozone formation.

[0069] The term "DML" stands for Double Machine Learning, which is used to estimate the causal effect of a variable on a target variable.

[0070] The term "SHAP" stands for SHapley Additive exPlanations, a method used to interpret the contribution of variables in a machine learning model.

[0071] The term "RE" stands for Relative Effect, which is used to evaluate the relative change in ozone concentration caused by a given proportional change in one influencing factor while the other variables are held constant.

[0072] The term "SI" stands for Synergy Index, which measures the strength of synergistic or inhibitory effects among different influencing factors.

[0073] As shown in Figure 1, the first embodiment of this application discloses a quantitative method for the synergistic influence of multiple factors on ozone pollution, including the following steps.

[0074] S1. Constructing a training dataset based on a localized photochemical box model: Obtain ozone observation data, meteorological data, and precursor data related to ozone generation in the target area, and construct time-series datasets for each meteorological data and precursor data; wherein, the precursor data includes nitrogen oxides (NOx) and volatile organic compounds (VOCs), and the time-series datasets of each VOC are converted into corresponding ozone generation chemical activity parameters through a photochemical box model.

[0075] In this step, the “time series dataset” refers to a multivariate observation data matrix arranged in a time series, with each row corresponding to an observation time and each column corresponding to a meteorological parameter, precursor concentration or ozone concentration variable, used to reflect the dynamic evolution of the ozone formation process over time.

[0076] The "ozone generation chemical activity parameter" refers to a parameterized index obtained by combining the concentration of volatile organic compounds (VOCs) with their chemical reactivity. It is used to characterize the ability of different VOC species to participate in ozone generation reactions under specific environmental conditions, which is different from directly using VOC concentration as model input.

[0077] The activity parameters include, but are not limited to, ozone formation potential (OFP) and / or hydroxyl radical reactivity (LOH). The "ozone formation potential (OFP)" refers to the maximum potential contribution of a VOC species to ozone formation under specific nitrogen oxide (NOx) conditions, expressed in μg/m³ or ppb; the "hydroxyl radical reactivity (LOH)" refers to the product of the reaction rate constant (kOH) of a VOC species with hydroxyl radicals (OH) and the concentration of that species, reflecting how actively the species participates in photochemical reactions through the OH radical pathway.

[0078] The "maximum incremental reactivity (MIR)" refers to the maximum increase in ozone caused by a change in a unit mass (or concentration) of VOCs under different NOx concentration scenarios, with the unit being g O3 / g VOC.

[0079] In one specific embodiment, this step specifically includes:

[0080] 1. Obtain hourly observation data of the target area through environmental monitoring stations and meteorological observation stations.

[0081] The meteorological data includes temperature (T), relative humidity (RH), wind speed (WS), wind direction (WD), surface solar radiation (SSR), boundary layer height (BLH), total cloud cover (TCC), and total precipitation (TP).

[0082] The precursor data includes nitrogen oxides (NOx), carbon monoxide (CO), methane (CH4), sulfur dioxide (SO2), fine particulate matter (PM2.5), inhalable particulate matter (PM10), and the concentrations of various VOC species; the ozone observation data refers to the hourly concentration of ozone (O3).

[0083] All data undergo quality control to remove outliers and missing values, after which an hourly time-series dataset is constructed.
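A minimal sketch of this quality-control step, assuming that outliers are detected by simple plausibility ranges. The text does not state the actual QC rules, so the limits in `PLAUSIBLE_RANGE`, the variable subset, and the sample rows are all hypothetical.

```python
# Assumed plausibility limits per variable; the real QC criteria are not given.
PLAUSIBLE_RANGE = {
    "O3": (0.0, 1000.0),   # ug/m3
    "NOx": (0.0, 2000.0),  # ug/m3
    "T": (-50.0, 60.0),    # degrees Celsius
}


def quality_control(rows):
    """Drop rows with a missing or out-of-range value, then sort by time."""
    clean = []
    for row in rows:
        values_ok = all(
            row.get(name) is not None and low <= row[name] <= high
            for name, (low, high) in PLAUSIBLE_RANGE.items()
        )
        if values_ok:
            clean.append(row)
    # ISO 8601 timestamps sort correctly as plain strings.
    return sorted(clean, key=lambda r: r["time"])


raw = [
    {"time": "2024-07-01T01:00", "O3": 80.0, "NOx": 30.0, "T": 27.5},
    {"time": "2024-07-01T00:00", "O3": 75.0, "NOx": 35.0, "T": 28.0},
    {"time": "2024-07-01T02:00", "O3": None, "NOx": 28.0, "T": 27.0},  # missing value
    {"time": "2024-07-01T03:00", "O3": 70.0, "NOx": -5.0, "T": 26.5},  # outlier
]
dataset = quality_control(raw)
print([row["time"] for row in dataset])
```

Each surviving row corresponds to one observation hour and each key to one column of the time-series dataset.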

[0084] 2. For VOCs data, this implementation method calculates the ozone generation chemical activity parameters using a photochemical box model.

[0085] (1) Calculate the maximum incremental reactivity (MIR) of each VOC species:

[0086] ① Perform diurnal variation calculations on the observed data (obtain 24 sets of hourly average data from 0:00 to 23:00) and use them as input for the photochemical box model;

[0087] ② By changing the NOx concentration input settings in the model, the photochemical reactions under different NOx scenarios are simulated, and the ratio of the change in ozone concentration ΔO3 to the change in VOCs concentration ΔVOCs is obtained. The maximum value is taken as the MIR for this species, i.e.:

[0088] $MIR_i = \max_{NO_x}\left(\dfrac{\Delta O_3}{\Delta VOC_i}\right)$;

[0089] where $MIR_i$ is the maximum incremental reactivity of the i-th VOC; $\Delta VOC_i$ represents the change in concentration of the i-th VOC, expressed in grams; $\Delta O_3$ represents the change in ozone concentration, expressed in grams; and $NO_x$ denotes the different nitrogen oxide scenarios.
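The MIR selection can be sketched as follows, assuming the box model has already been run for a base case and a perturbed-VOC case under several NOx scenarios. The box-model runs themselves are not shown; the scenario names and ozone values are invented for illustration.

```python
def incremental_reactivity(o3_base, o3_perturbed, delta_voc):
    """Return delta-O3 / delta-VOC for a single NOx scenario."""
    if delta_voc == 0:
        raise ValueError("delta_voc must be non-zero")
    return (o3_perturbed - o3_base) / delta_voc


def maximum_incremental_reactivity(runs, delta_voc):
    """MIR is the largest incremental reactivity across all NOx scenarios."""
    ratios = {
        scenario: incremental_reactivity(base, perturbed, delta_voc)
        for scenario, (base, perturbed) in runs.items()
    }
    best_scenario = max(ratios, key=ratios.get)
    return ratios[best_scenario], best_scenario


# Hypothetical box-model output for one VOC species: ozone (g) in the base run
# and in a run with 0.5 g of that species added, for three NOx scenarios.
runs = {
    "low_NOx": (100.0, 100.5),
    "mid_NOx": (120.0, 121.5),
    "high_NOx": (90.0, 90.25),
}
mir, scenario = maximum_incremental_reactivity(runs, delta_voc=0.5)
print(f"MIR = {mir:.2f} g O3 / g VOC, reached under {scenario}")
```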

[0090] (2) Multiply the MIR of each VOC species by its hourly concentration to obtain the ozone formation potential $OFP_i$:

[0091] $OFP_i = MIR_i \times [VOC]_i$;

[0092] where $[VOC]_i$ represents the concentration of the i-th VOC species, expressed in ppb, and $OFP_i$ represents the ozone formation potential of the i-th VOC, expressed in g.

[0093] In another equivalent embodiment, step (2) can use the reaction rate constant $k_{OH,i}$ of each VOC with OH radicals in place of MIR to calculate hydroxyl radical reactivity:

[0094] $L_{OH,i} = k_{OH,i} \times [VOC]_i$;

[0095] where $L_{OH,i}$ denotes the hydroxyl radical reactivity of the i-th VOC.
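Both conversions are simple element-wise products, as the sketch below shows. The species names, MIR values, rate constants, and concentrations are placeholders rather than real chemical data, and units are left to whatever the MIR and kOH inputs are expressed in.

```python
def ozone_formation_potential(mir, concentrations):
    """OFP at each hour = MIR of the species times its hourly concentration."""
    return [mir * c for c in concentrations]


def oh_reactivity(k_oh, concentrations):
    """OH reactivity at each hour = kOH of the species times its hourly concentration."""
    return [k_oh * c for c in concentrations]


# Placeholder inputs for two hypothetical VOC species (three hours each).
species = {
    "species_A": {"mir": 2.0, "k_oh": 10.0, "conc": [1.0, 2.0, 0.5]},
    "species_B": {"mir": 4.0, "k_oh": 5.0, "conc": [0.5, 0.25, 1.0]},
}

# Build activity-based feature columns that replace the raw VOC concentrations.
features = {}
for name, s in species.items():
    features[f"OFP_{name}"] = ozone_formation_potential(s["mir"], s["conc"])
    features[f"LOH_{name}"] = oh_reactivity(s["k_oh"], s["conc"])

for column, values in features.items():
    print(column, values)
```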

[0096] Ultimately, the input feature matrix of the training dataset consists of the ozone generation chemical activity parameters (OFP and / or LOH of each VOC), the concentrations of other ozone precursors (NOx, CO, CH4, SO2, PM2.5, PM10), and meteorological parameters; the target variable (output) is ozone concentration or its change.

[0097] This step transforms VOCs concentration into a chemically active parameter with photochemical significance, enabling the input features of subsequent machine learning models to reflect the essential differences in ozone generation potential among different species. This overcomes the attribution bias caused by existing pure data-driven models ignoring differences in chemical activity. At the same time, the MIR calculated based on the localized photochemical box model can accurately reflect the emission characteristics and environmental conditions of the target area, improving the relevance and reliability of the training dataset.

[0098] S2. Optimization and construction of the machine learning model: Using the time series dataset as input and ozone change as output, a machine learning model is constructed, and the model is optimized for hyperparameters.

[0099] The "machine learning model" refers to a computational model that learns the mapping relationship between input features and output variables through a data-driven approach. In this invention, the machine learning model includes, but is not limited to, ensemble learning models (such as Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), etc.) or neural network models (such as Deep Neural Networks (DNN), etc.).

[0100] The "ten-fold cross-validation" refers to randomly dividing the original dataset into 10 non-overlapping subsets, selecting 9 subsets as the training set and the remaining subset as the validation set each time, and performing the training and validation process 10 times in a loop. The average error of each validation result is used as the model performance evaluation index, thereby improving the stability and reliability of the model results.

[0101] The term "hyperparameter optimization" refers to optimizing parameters (such as the number of trees, maximum depth, etc.) that need to be pre-set in the model using automatic parameter tuning algorithms (such as Bayesian optimization, grid search, random search, etc.) to avoid overfitting caused by relying solely on training error.

[0102] In one specific implementation, the step specifically includes:

[0103] 1. Using the time-series dataset as input and ozone concentration (or ozone change) as output, construct a machine learning model. In a preferred embodiment, a random forest (RF) model is used; in other equivalent embodiments, GBDT, XGBoost, LightGBM, or a neural network model may be used.

[0104] During model training, the dataset is randomly divided into training and test sets according to a preset ratio (e.g., 8:2). Ten-fold cross-validation is used to evaluate model performance and optimize hyperparameters. The average error of cross-validation is calculated using the following formula:

[0105] $CV = \frac{1}{K}\sum_{k=1}^{K} E_k$;

[0106] Wherein, $CV$ is the average error of cross-validation; $K$ is the number of folds, and in this invention $K = 10$; $E_k$ is the error on the validation set of the k-th fold.
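The split-and-validate procedure of [0104] can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation of this invention: the helper names (`train_test_split_indices`, `kfold_indices`, `cv_mean_error`), the fixed random seed, and the toy per-fold errors are all hypothetical.

```python
import random

def train_test_split_indices(n, test_ratio=0.2, seed=0):
    """Randomly split range(n) into train/test index lists (8:2 by default)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_ratio)
    return idx[n_test:], idx[:n_test]

def kfold_indices(indices, k=10, seed=0):
    """Shuffle the indices and deal them into k non-overlapping folds."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_mean_error(fold_errors):
    """Average cross-validation error: (1/K) * sum of the K per-fold errors."""
    return sum(fold_errors) / len(fold_errors)

train_idx, test_idx = train_test_split_indices(1000)
folds = kfold_indices(train_idx, k=10)

# Toy per-fold errors (hypothetical numbers) standing in for real validation errors.
toy_errors = [14.0, 15.0, 13.5, 14.5, 14.2, 13.8, 14.9, 14.1, 13.9, 14.6]
cv = cv_mean_error(toy_errors)
```

In each of the 10 rounds, one fold serves as the validation set and the other nine are concatenated into the training set; the held-out test set is not touched until the final evaluation.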

[0107] 2. Hyperparameter optimization can employ Bayesian optimization, with the average error of 10-fold cross-validation as the optimization objective. Taking random forest as an example, the main hyperparameters to be optimized include: the number of trees (n_estimators), the maximum tree depth (max_depth), the minimum number of samples required to split a node (min_samples_split), and the minimum number of samples per leaf (min_samples_leaf). Selecting the set of hyperparameters that minimizes the average cross-validation error optimizes the machine learning model and avoids the overfitting that results from relying solely on training error.
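The selection rule described above (keep the hyperparameter set with the lowest average cross-validation error) can be sketched as follows. For brevity the sketch uses an exhaustive grid search, which [0101] lists as an alternative tuning algorithm, rather than Bayesian optimization. The search ranges and the `toy_cv_error` objective are hypothetical; in practice the objective would train the model and return its 10-fold cross-validation error.

```python
from itertools import product

def select_hyperparameters(grid, cv_error):
    """Return (best_params, best_error): the grid point with the lowest CV error."""
    names = list(grid)
    best_params, best_error = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        err = cv_error(params)
        if err < best_error:
            best_params, best_error = params, err
    return best_params, best_error

# Hypothetical search ranges, for illustration only.
grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [5, 10, 20],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
}

def toy_cv_error(p):
    """Toy stand-in for a real 10-fold CV error; lowest at 300 trees and depth 10."""
    return (abs(p["n_estimators"] - 300) / 100
            + abs(p["max_depth"] - 10) / 5
            + 0.1 * p["min_samples_split"]
            + 0.1 * p["min_samples_leaf"])

best_params, best_error = select_hyperparameters(grid, toy_cv_error)
```

Because the objective is evaluated on validation folds rather than on the training data, a configuration that merely memorizes the training set is not rewarded.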

[0108] This step compares multiple models under cross-validation to select the one with the best ozone concentration prediction performance (the random forest model in the preferred embodiment); hyperparameter optimization then further reduces prediction error and improves the model's generalization ability, providing a reliable base model for the subsequent key factor screening and causal effect estimation.

[0109] S3. Screening key ozone-influencing factors: Based on the optimized machine learning model, feature importance analysis is performed, and key influencing factors are screened by ranking. The key influencing factors include at least one nitrogen oxide and at least one volatile organic compound.

[0110] The "feature importance analysis" refers to analytical methods used to explain and evaluate the contribution of each input feature in a machine learning model to the prediction results, including but not limited to the SHAP (Shapley Additive Explanations) method, permutation importance method, and Gini importance method. The SHAP method is based on the Shapley value theory in cooperative game theory, and achieves additive decomposition of the model's predicted values by calculating the marginal contribution of each feature in all possible feature combinations.

[0111] Based on the S2-optimized machine learning model, importance analysis is performed on each input feature. In a preferred embodiment, the SHAP method is used to calculate the SHAP value of each feature in each sample. For the j-th sample, the model prediction can be decomposed into:

[0112] $\hat{y}_j = \phi_0 + \sum_{k=1}^{M} \phi_{j,k}$;

[0113] Wherein, $\hat{y}_j$ represents the model's prediction for the j-th sample; $\phi_0$ is the mean of all sample predictions, also called the baseline prediction; $\phi_{j,k}$ represents the contribution of the k-th feature variable to the predicted value of the j-th sample; and $M$ is the number of feature variables. By comparing the SHAP values of different feature variables, the qualitative contribution and relative importance of each factor to ozone concentration can be determined.

[0114] The overall importance score of each feature is obtained by averaging the absolute values of its SHAP values across all samples, and the features are then sorted from highest to lowest score. The top N (e.g., 30) features are selected as key influencing factors. The selection criteria ensure that these key influencing factors include at least one nitrogen oxide (e.g., NOx) and at least one volatile organic compound (e.g., the OFP or OH reactivity of a VOC), in order to support the subsequent control zone judgment and synergy analysis.
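The ranking and screening rule can be sketched as follows, given a matrix of SHAP values that has already been computed. The function names, the toy SHAP values, and in particular the swap rule used to guarantee one NOx and one VOC feature are assumptions for illustration; the text above states only that the selection must include both.

```python
def rank_features_by_shap(shap_values, feature_names):
    """Rank features by mean |SHAP| over samples (rows = samples, columns = features)."""
    n = len(shap_values)
    scores = {
        name: sum(abs(row[k]) for row in shap_values) / n
        for k, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def select_key_factors(ranked, top_n, required_groups):
    """Take the top_n features, then swap in the best member of any missing group."""
    order = [name for name, _ in ranked]
    selected = order[:top_n]
    for group in required_groups:
        if any(name in group for name in selected):
            continue
        best = next(name for name in order if name in group)
        # Replace the lowest-ranked selected feature that belongs to no required group.
        for victim in reversed(selected):
            if not any(victim in g for g in required_groups):
                selected[selected.index(victim)] = best
                break
    return selected

# Toy SHAP matrix (hypothetical values): 2 samples x 5 features.
names = ["T", "RH", "NOx", "OFP_toluene", "CO"]
shap_matrix = [
    [4.0, -2.0, -0.5, 1.0, 1.5],
    [-6.0, 3.0, 0.5, -1.0, -2.5],
]
ranked = rank_features_by_shap(shap_matrix, names)
key_factors = select_key_factors(ranked, top_n=3,
                                 required_groups=[{"NOx"}, {"OFP_toluene"}])
```

Averaging absolute values matters here: the signed SHAP values of a feature can cancel across samples even when the feature strongly moves individual predictions.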

[0115] In this step, because the number of input variables is large (8 meteorological variables, hundreds of VOC species, and 6 other pollutants), feeding all variables directly into the double machine learning model would cause computational complexity to grow exponentially, greatly lengthen model training time, and carry a high risk of overfitting. Furthermore, the many noisy variables in the high-dimensional sparse data would interfere with accurate estimation of causal effects. To address these issues, this method uses the SHAP results to select the top N feature variables (N chosen according to actual conditions) that contribute most to ozone formation as input to the double machine learning model. This avoids omitting key factors of ozone formation while reducing feature dimensionality, improving computational efficiency while maintaining analytical accuracy and making double machine learning practical in real applications.

[0116] S4. Construct a causal learning model to calculate the impact of key factors on ozone: Estimate the causal effect of each key influencing factor to obtain the quantitative impact on ozone concentration of a unit change in each key factor.

[0117] The “causal learning model” refers to a statistical model used to estimate the causal effect of the target treatment variable on the outcome variable under the condition of controlling for the influence of confounding variables. It includes, but is not limited to, causal inference methods such as the Double Machine Learning (DML) framework, instrumental variable method, and propensity score matching.

[0118] The "marginal impact coefficient" refers to the average causal impact on ozone concentration when the key influencing factor changes by one unit, while keeping other control variables constant.

[0119] The "quantitative impact value" refers to the change in ozone concentration caused by a unit change in each key factor, estimated through the causal model, in units of μg/m³ or ppb.

[0120] This implementation preferably uses the double machine learning framework to construct the causal learning model, which takes the following form:

[0121] $Y = \theta_i D_i + g(X) + \varepsilon$;

[0122] Wherein, $Y$ is the ozone concentration; $\theta_i$ represents the marginal impact coefficient of key factor i on ozone; $D_i$ represents key factor i; $X$ represents the remaining variables besides key factor i; $g(X)$ represents the nonlinear influence function of the control variables other than $D_i$; and $\varepsilon$ represents the random error term.

[0123] By changing the concentration of each of the screened key factors in turn, the quantitative impact of a unit change in each factor on ozone concentration can be obtained. For example, one can obtain how many micrograms per cubic meter the ozone concentration changes when the temperature changes by 1 degree Celsius.

[0124] This step differs from traditional machine learning models that only reflect statistical correlation. By using a causal inference framework to mathematically control the interference of confounding factors, it obtains the independent causal effects of each key factor on ozone concentration, thus achieving a leap from "correlation" to "causation".
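The difference between a correlational slope and the partialled-out causal coefficient can be illustrated with a small double machine learning sketch using cross-fitting. Several things here are assumptions: a single confounder, simple least-squares lines standing in for the machine learning nuisance models, noise-free synthetic data with a true coefficient of 2.0, and all function and variable names.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y ~ a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def dml_theta(y, d, x, n_folds=2):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(x) + eps."""
    n = len(y)
    res_y, res_d = [0.0] * n, [0.0] * n
    for fold in range(n_folds):
        test = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        # Nuisance models are fitted on the other folds only (cross-fitting).
        ay, by = fit_line([x[i] for i in train], [y[i] for i in train])
        ad, bd = fit_line([x[i] for i in train], [d[i] for i in train])
        for i in test:
            res_y[i] = y[i] - (ay + by * x[i])   # outcome with confounder removed
            res_d[i] = d[i] - (ad + bd * x[i])   # treatment with confounder removed
    # Regress outcome residuals on treatment residuals.
    return sum(rd * ry for rd, ry in zip(res_d, res_y)) / sum(rd * rd for rd in res_d)

rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(200)]             # confounder
d = [0.5 * xi + rng.gauss(0, 1) for xi in x]             # key factor, driven by x
y = [2.0 * di + 3.0 * xi + 1.0 for di, xi in zip(d, x)]  # outcome, true theta = 2.0

theta_hat = dml_theta(y, d, x)
_, naive_slope = fit_line(d, y)   # plain regression of y on d, confounded by x
```

On this synthetic data `theta_hat` recovers the true coefficient, while `naive_slope` is inflated because the confounder drives both the key factor and the outcome.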

[0125] S5. Quantify the impact of each factor on ozone: Calculate the relative effect index of each key factor on ozone formation at a predetermined rate of change, to obtain the quantitative relationship of the influence of each key factor in the target area on ozone formation.

[0126] Because different ozone formation influencing factors have different dimensions, they cannot be compared directly and may even bias model attribution. To address this, this step standardizes the variation range of each influencing factor based on the results of the double machine learning model and uniformly calculates the ozone change corresponding to a given percentage change in each factor, making the degree of influence comparable across variables of different magnitudes.

[0127] The relative effect (RE) of each influencing factor on ozone formation is quantified by calculating the change in ozone concentration when each influencing factor changes by a predetermined percentage (preferably 10%, but can also be 1%, 2%, 5%, 20%, etc., depending on the needs). The expression for RE is as follows:

[0128] $RE_i = \dfrac{\Delta O_{3,i} / O_{3,0}}{\Delta Z_i / Z_i}$, with $\Delta O_{3,i} = O_3(Z_i + f \cdot Z_i) - O_{3,0}$;

[0129] Wherein, $\Delta O_{3,i}$ represents the change in ozone concentration when key factor i is perturbed alone; $O_{3,0}$ represents the ozone concentration without changing any variable; $O_3(Z_i + f \cdot Z_i)$ represents the ozone concentration after factor i changes by $f \cdot Z_i$, where $f$ is the predetermined change percentage of each key factor and $Z_i$ is the concentration of key factor i; $\Delta Z_i / Z_i$ represents the relative change in key factor i and is numerically equal to $f$.

[0130] The above calculations output a unified quantitative impact index, which is used to determine the control intensity of different pollutants and meteorological factor disturbances on the ozone formation process, and serves as the basis for the formulation of pollution control strategies.
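The unit conversion behind this step can be sketched as follows. The sketch assumes that the relative effect is the fractional ozone change divided by the fractional factor change f, and that the ozone response is approximated linearly from the per-unit effect obtained in S4; the invention may instead evaluate the perturbed model directly. The function name and the baseline values are hypothetical.

```python
def relative_effect(theta, z, o3_baseline, f=0.10):
    """Return (percent ozone change, relative effect) for a fractional change f.

    theta: ozone change per unit change of the factor (from the causal model)
    z: baseline level of the factor
    o3_baseline: ozone concentration with no variable changed
    """
    delta_o3 = theta * (f * z)            # linear approximation of the response
    frac_change = delta_o3 / o3_baseline  # fractional change in ozone
    return 100.0 * frac_change, frac_change / f

# Hypothetical baselines: factor level 20.0, ozone 100.0 (same ozone unit as theta).
pct, re_index = relative_effect(theta=3.75, z=20.0, o3_baseline=100.0, f=0.10)
```

Because both numerator and denominator are fractions of a baseline, the result is dimensionless, which is what allows a temperature in °C to be compared with a concentration in μg/m³.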

[0131] In a further embodiment, the quantitative relationship between the key factors in the target area and the influence of ozone generation can be used to distinguish the type of ozone control zone in the target area and / or to perform joint perturbation analysis between different influencing factors in the target area.

[0132] The distinction between ozone control zone types in the target area is specifically as follows:

[0133] when At that time, the study area was a VOCs control zone;

[0134] when At that time, the study area was a NOx control area;

[0135] when At that time, the study area was a transition zone, that is, a zone where NOx and VOCs are jointly controlled.

[0136] The joint perturbation analysis of different key influencing factors in the target area specifically involves quantitatively determining the multi-factor synergistic influence relationship of ozone pollution by calculating the synergistic influence index among different key influencing factors; wherein the formula for calculating the synergistic influence index is:

[0137] ;

[0138] in, The index representing the synergistic impact of key impact factor i and key impact factor j; This represents the change in ozone concentration when both key influencing factor i and key influencing factor j are simultaneously disturbed. and These represent the changes in ozone concentration caused by individual perturbations of key influencing factor i and key influencing factor j, respectively.

[0139] The basis for quantitatively judging the synergistic influence of multiple factors on ozone pollution is as follows:

[0140] when The results indicate that key influencing factors i and j have a synergistic enhancing effect on ozone formation;

[0141] when This indicates that key influencing factor i and key influencing factor j have a mutually inhibitory effect on ozone formation;

[0142] when This indicates that factors i and j have virtually no synergistic effect on ozone formation.

[0143] By differentiating control zones, it is possible to quickly determine whether a target area belongs to a VOCs control zone, NOx control zone, or transition zone, providing a direct basis for the precise prevention and control of regional ozone pollution by zone and classification. Through the synergistic impact index, it is possible to quantitatively assess the synergistic or inhibitory effects between meteorological factors and precursor factors, as well as between different precursors, making up for the shortcomings of existing technologies that can only analyze the impact of single factors, and providing a quantitative judgment tool for the formulation of multi-factor synergistic control strategies.

[0144] The technical methods and effects of this application will be described in detail below through specific embodiments.

[0145] Example 1

[0146] This embodiment uses the ozone pollution season (April-October) of a typical polluted city in China in a certain year as an example to further illustrate the present invention. The technical flowchart of this embodiment is shown in Figure 1.

[0147] Step 1: Construct a training dataset based on the localized photochemical box model

[0148] (1) Acquisition of relevant meteorological parameters and precursor data affecting ozone formation

[0149] Collect hourly meteorological data and pollutant concentration data for ozone pollution season in a sample city within a given year.

[0150] There are a total of 8 meteorological parameters, including: temperature (T, unit: °C), relative humidity (RH, unit: %), wind speed (WS, unit: m / s), wind direction (WD, unit: °), boundary layer height (BLH, unit: m), surface solar radiation (SSR, unit: W / m²), total cloud cover (TCC, dimensionless), and total precipitation (TP, unit: mm).

[0151] Data for a total of 108 pollutants were collected, including hourly concentrations (unit: μg/m³ or ppb) of O3, NOx, CO, CH4, SO2, PM2.5, and PM10, and hourly concentrations (unit: ppb) of 100 VOC species.

[0152] All data undergo quality control to remove outliers and missing values.

[0153] (2) Calculation of localized VOCs ozone maximum increment reactivity parameters based on box model

[0154] Hourly averages were performed on the collected meteorological and pollutant data to obtain 24 rows of diurnal variation data from 0:00 to 23:00. The daily data were input into the photochemical box model to simulate ozone formation on a daily scale. By changing the NOx concentration input to the photochemical box model, the maximum incremental ozone reactivity was calculated under the meteorological and pollutant conditions of the example city. The localized MIR results for some VOCs in the example are shown in Table 1.
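The hourly averaging that produces the 24-row diurnal profile can be sketched as below. The sketch assumes that values sharing the same hour of day are averaged across days; the function name and the toy records are hypothetical.

```python
from collections import defaultdict

def diurnal_profile(records):
    """Average (hour_of_day, value) pairs into a 24-entry list indexed by hour 0-23."""
    buckets = defaultdict(list)
    for hour, value in records:
        buckets[hour].append(value)
    # Hours with no observations yield None rather than a fabricated value.
    return [
        sum(buckets[h]) / len(buckets[h]) if buckets[h] else None
        for h in range(24)
    ]

# Two toy days of hourly data (hypothetical values).
records = [(h, float(h)) for h in range(24)] + [(h, float(h) + 2.0) for h in range(24)]
profile = diurnal_profile(records)
```

Each of the 24 entries then supplies one row of box model input for the corresponding hour.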

[0155] Table 1. Comparison of localized MIR values and SAPRC11 mechanism MIR values for the studied city

[0156] .

[0157] Comparing the calculated localized MIR results with the default MIR values for the SAPRC11 mechanism commonly used in existing studies revealed significant differences. This indicates that using localized MIR parameters can more accurately reflect the ozone formation process under different pollutant emission characteristics and environmental conditions in different Chinese cities, thereby improving the relevance and reliability of ozone formation analysis results.

[0158] (3) Construct training data with photochemical significance

[0159] The obtained localized MIR values are multiplied by the VOCs concentrations to generate hourly datasets of ozone formation potential (OFP) for each VOC species. In this embodiment, these are merged with hourly data for the meteorological parameters (T, RH, WS, BLH, SSR, TCC, TP) and the other pollutants (NOx, CO, SO2, PM2.5, PM10, …) to form the feature matrix X of the training dataset, with the measured ozone concentration as the target variable.
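The conversion from VOC concentration to ozone formation potential is a per-species multiplication by the localized MIR, which can be sketched as follows. The MIR values and concentrations shown are hypothetical placeholders, not the values of Table 1, and unit conversion between ppb and μg/m³ is not handled here.

```python
def ozone_formation_potential(mir, concentrations):
    """OFP of each species at each hour: localized MIR times VOC concentration."""
    return {
        species: [mir[species] * c for c in series]
        for species, series in concentrations.items()
    }

# Hypothetical MIR values and hourly concentrations, for illustration only.
mir = {"toluene": 4.0, "acetone": 0.5}
conc = {"toluene": [1.0, 2.0, 0.5], "acetone": [3.0, 4.0, 2.0]}
ofp = ozone_formation_potential(mir, conc)
```

The resulting OFP series replace the raw VOC concentrations as columns of the feature matrix, so each VOC feature already carries its photochemical reactivity.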

[0160] All valid samples are randomly divided into training and test sets in an 8:2 ratio.

[0161] Step 2: Optimizing and Building the Machine Learning Model

[0162] (1) Machine learning model selection

[0163] This embodiment compares six machine learning models: Deep Learning (DL), Decision Tree (DT), Gradient Boosting Decision Tree (GBDT), Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting (XGBoost), and Random Forest (RF). Ten-fold cross-validation is used to evaluate the performance of each model, with the coefficient of determination (R²) as the evaluation metric. The performance comparison of the machine learning models in this example is shown in Table 2.

[0164] Table 2 Performance Comparison and Selection of Machine Learning Models

[0165] .

[0166] The results show that the RF model performs best, with R² = 0.914 and RMSE = 14.39 μg/m³. Therefore, RF was chosen as the base model.

[0167] (2) Hyperparameter optimization of machine learning models

[0168] The Bayesian optimization method was used to tune the hyperparameters of the random forest model. Four hyperparameters were optimized: the number of trees (n_estimators), the maximum tree depth (max_depth), the minimum number of samples required to split a node (min_samples_split), and the minimum number of samples per leaf (min_samples_leaf). The test ranges and optimal values are shown in Table 3.

[0169] Table 3 Test range and selection for hyperparameter optimization of Random Forest (RF) model

[0170] .

[0171] The optimized result is shown in Figure 2. The random forest R² is 0.925 and the RMSE is 13.45 ppb, indicating an improvement in simulation performance.

[0172] Step 3: Screening key ozone-influencing factors based on the SHAP method

[0173] The Shapley Additive Explanations (SHAP) method was used to interpret the optimized random forest model. The mean absolute SHAP value of each feature across all samples was calculated, and after ranking, the top 30 input factors were selected as key influencing factors. The top 10 input factors and their SHAP importance are shown in Table 4.

[0174] Table 4. Top 10 input factors and their SHAP importance

[0175] .

[0176] Step 4: Construct a causal learning model to calculate the impact of key factors on ozone.

[0177] Based on the top 30 key factors identified through SHAP analysis, a Double Machine Learning (DML) model was constructed to obtain the quantitative impact of a unit change in each key factor on ozone concentration. For example, in the example city, a 1 °C increase in temperature changes ozone by +3.75 μg/m³; a 1% increase in relative humidity changes ozone by -0.51 μg/m³; a 1 μg/m³ increase in NOx concentration changes ozone by -0.64 μg/m³; a 1 μg/m³ increase in the ozone formation potential of 2,3-dimethylbutane changes ozone by +10.47 μg/m³; a 1 μg/m³ increase in the ozone formation potential of toluene changes ozone by +0.13 μg/m³; and a 1 μg/m³ increase in the ozone formation potential of acetone changes ozone by +7.37 μg/m³. This indicates that 2,3-dimethylbutane, toluene, acetone, and increased temperature all promote ozone formation, while increased relative humidity and NOx inhibit it.
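The sign-based reading of these per-unit effects can be reproduced with a few lines of Python. The numbers are the ones reported above for the example city; the dictionary layout and variable names are illustrative choices.

```python
# Per-unit effects reported for the example city (ozone change in μg/m³).
unit_effects = {
    "temperature (per 1 °C)": 3.75,
    "relative humidity (per 1%)": -0.51,
    "NOx (per 1 μg/m³)": -0.64,
    "OFP 2,3-dimethylbutane (per 1 μg/m³)": 10.47,
    "OFP toluene (per 1 μg/m³)": 0.13,
    "OFP acetone (per 1 μg/m³)": 7.37,
}

# A positive coefficient promotes ozone formation; a negative one inhibits it.
promoters = [name for name, effect in unit_effects.items() if effect > 0]
inhibitors = [name for name, effect in unit_effects.items() if effect < 0]
largest_per_unit = max(unit_effects, key=lambda name: abs(unit_effects[name]))
```

Note that each coefficient is expressed per unit of its own factor, so the magnitudes are not yet comparable across factors.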

[0178] Step 5: Quantify the impact of each factor on ozone using the relative contribution method.

[0179] Step 4 shows that ozone increases sharply for every 1 μg/m³ increase in the ozone formation potential of 2,3-dimethylbutane. In reality, however, the atmospheric concentration of 2,3-dimethylbutane is very low, so its ozone formation potential rarely changes by more than 1 μg/m³. Furthermore, because the influencing factors have different dimensions, their impacts on ozone formation are difficult to compare directly. Therefore, the relative contribution method is used to quantify them uniformly, calculating the percentage change in ozone for every 10% change in each factor.

[0180] Calculations show that, for the example city, a 10% increase in temperature results in a +14.25% change in ozone; a 10% increase in humidity results in a -6.92% change in ozone; a 10% increase in NOx concentration results in a -1.18% change in ozone; a 10% increase in the ozone formation potential of 2,3-dimethylbutane results in a +0.35% change in ozone; a 10% increase in the ozone formation potential of toluene results in a +0.14% change in ozone; and a 10% increase in the ozone formation potential of acetone results in a +1.70% change in ozone.
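Once every factor is expressed as a percentage change in ozone per 10% change in the factor, the factors can be ranked on a common scale. The short sketch below does this with the values reported above; the variable names are illustrative.

```python
# Percentage change in ozone for a 10% increase in each factor (example city).
relative_changes = {
    "temperature": 14.25,
    "relative humidity": -6.92,
    "NOx": -1.18,
    "OFP 2,3-dimethylbutane": 0.35,
    "OFP toluene": 0.14,
    "OFP acetone": 1.70,
}

# Rank by magnitude so that promoting and inhibiting factors share one scale.
ranking = sorted(relative_changes, key=lambda k: abs(relative_changes[k]), reverse=True)
```

On this scale 2,3-dimethylbutane, which has the largest per-unit effect in Step 4, ranks fifth of the six factors, reflecting how little its ozone formation potential actually varies.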

[0181] At this point, the ozone changes caused by all VOCs changes are summed and assessed together with the ozone change caused by the NOx change. According to the control zone criteria, the example city was a VOCs control zone during the observation period.

[0182] Step Six: Construction of a Quantitative Method for the Synergistic Influence of Multiple Factors on Ozone Pollution

[0183] This embodiment selects two key factors, temperature and toluene, for synergistic effect analysis. When each factor changes individually, a 10% increase in temperature results in a +14.25% change in ozone, and a 10% increase in the ozone formation potential of toluene results in a +0.14% change in ozone. When temperature and the ozone formation potential of toluene both change by 10% simultaneously, ozone increases by 17.35%.

[0184] ;

[0185] This indicates that simultaneous changes in temperature and toluene have a synergistic enhancing effect on ozone formation.
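The arithmetic behind this conclusion can be checked by comparing the joint response with the sum of the two individual responses. The sketch below performs only that comparison with the numbers reported in this embodiment; it does not reproduce the synergistic influence index formula, and the function name is hypothetical.

```python
def joint_vs_additive(delta_i, delta_j, delta_ij):
    """Return (sum of individual changes, joint change minus that sum)."""
    additive = delta_i + delta_j
    return additive, delta_ij - additive

# Percentage changes in ozone: temperature alone, toluene OFP alone, both together.
additive, excess = joint_vs_additive(14.25, 0.14, 17.35)
```

The joint change of 17.35% exceeds the additive expectation of 14.39% by about 2.96 percentage points, which is consistent with a synergistic enhancing effect.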

[0186] The above-described embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.

Claims

1. A quantitative method for the synergistic effects of multiple factors on ozone pollution, characterized in that it includes the following steps: S1. Obtain ozone observation data of the target area, as well as meteorological data and precursor data related to ozone generation, and construct time-series datasets for each meteorological data and precursor data; wherein, the precursor data includes nitrogen oxides (NOx) and volatile organic compounds (VOCs), and the time-series datasets of each volatile organic compound are converted into corresponding ozone generation chemical activity parameters through a photochemical box model; S2. Using the time-series dataset as input and ozone concentration as output, construct a machine learning model and optimize the model's hyperparameters; S3. Perform feature importance analysis based on the optimized machine learning model, and screen key influencing factors by ranking, wherein the key influencing factors include at least one nitrogen oxide and at least one volatile organic compound; S4. Construct a causal learning model to estimate the causal effect of each key influencing factor, and obtain the quantitative impact on ozone concentration of a unit change in each key factor; S5. Calculate the relative effect index of each key factor on ozone formation at a predetermined rate of change, to obtain the quantitative relationship of the influence of each key factor in the target area on ozone formation.

2. The method according to claim 1, characterized in that, The quantitative relationship between the key factors in the target area and their impact on ozone formation can be used to distinguish the type of ozone control zone in the target area and / or to perform joint perturbation analysis between different influencing factors in the target area.

3. The method according to claim 1, characterized in that, The time-series datasets of each volatile organic compound in S1 are constructed by converting them into corresponding ozone generation chemical activity parameters, which include ozone generation potential and / or hydroxyl radical reactivity. The ozone generation potential is obtained by multiplying the maximum incremental reactivity calculated based on the photochemical box model with the concentration of each volatile organic compound. The hydroxyl radical reactivity is obtained by multiplying the hydroxyl radical reaction rate constant by the concentration of each volatile organic compound.

4. The method according to claim 1, characterized in that, The machine learning model described in S2 is an ensemble learning model or a neural network model, and hyperparameter optimization is performed using 10-fold cross-validation.

5. The method according to claim 1, characterized in that the quantitative influence value on ozone concentration mentioned in S4 is calculated using the following causal learning model: $Y = \theta_i D_i + g(X) + \varepsilon$; wherein, $Y$ is the ozone concentration; $\theta_i$ represents the marginal impact coefficient of key factor i on ozone; $D_i$ represents key factor i; $X$ represents the remaining variables besides key factor i; $g(X)$ represents the nonlinear influence function of the control variables other than $D_i$; and $\varepsilon$ represents the random error term.

6. The method according to claim 1, characterized in that the relative effect index $RE_i$ mentioned in S5 is calculated by the formula: $RE_i = \dfrac{\Delta O_{3,i} / O_{3,0}}{\Delta Z_i / Z_i}$, with $\Delta O_{3,i} = O_3(Z_i + f \cdot Z_i) - O_{3,0}$; wherein, $\Delta O_{3,i}$ represents the change in ozone concentration when key factor i is perturbed alone; $O_{3,0}$ represents the ozone concentration without changing any variable; $O_3(Z_i + f \cdot Z_i)$ represents the ozone concentration after factor i changes by $f \cdot Z_i$, where $f$ is the predetermined change percentage of each key factor and $Z_i$ is the concentration of key factor i; $\Delta Z_i / Z_i$ represents the relative change in key factor i and is numerically equal to $f$.

7. The method according to claim 2, characterized in that, The distinction between ozone control zone types in the target area is specifically as follows: when At that time, the study area was a VOCs control zone; when At that time, the study area was a NOx control area; when At that time, the study area was a transition zone, that is, a zone jointly controlled by NOx and VOCs; The joint perturbation analysis of different key influencing factors in the target area specifically involves quantitatively determining the multi-factor synergistic influence relationship of ozone pollution by calculating the synergistic influence index among different key influencing factors; wherein the formula for calculating the synergistic influence index is: ; in, The index representing the synergistic impact of key impact factor i and key impact factor j; This represents the change in ozone concentration when both key influencing factor i and key influencing factor j are simultaneously disturbed. and These represent the changes in ozone concentration caused by individual perturbations of key influencing factor i and key influencing factor j, respectively. The basis for quantitatively judging the synergistic influence of multiple factors on ozone pollution is as follows: when The results indicate that key influencing factors i and j have a synergistic enhancing effect on ozone formation; when This indicates that key influencing factor i and key influencing factor j have a mutually inhibitory effect on ozone formation; when This indicates that factors i and j have virtually no synergistic effect on ozone formation.

8. A quantitative system for the synergistic effects of multiple factors on ozone pollution, characterized in that it includes: a data acquisition module, used to acquire ozone observation data of the target area, as well as meteorological data and precursor data related to ozone generation, and to construct time-series datasets for each meteorological data and precursor data; wherein, the precursor data includes nitrogen oxides (NOx) and volatile organic compounds (VOCs), and the time-series datasets for each volatile organic compound are constructed by converting them into corresponding ozone generation chemical activity parameters; a model building module, used to build a machine learning model with the time-series dataset as input and ozone change as output, and to optimize the hyperparameters of the model; a factor screening module, used to perform feature importance analysis based on the optimized machine learning model and screen key influencing factors by ranking, wherein the key influencing factors include at least one nitrogen oxide and at least one volatile organic compound; a causal estimation module, used to build a causal learning model, estimate the causal effect of each key influencing factor, and obtain the quantitative impact value on ozone concentration of a unit change in each key factor; a quantitative analysis module, used to calculate the relative effect index of each key factor on ozone generation based on the relative contribution method, so as to obtain the quantitative relationship of the influence of each key factor on ozone generation in the target area; and a control zone differentiation module, used to differentiate the ozone control zone type of the target area according to the quantitative relationship, and / or a synergistic analysis module, used to perform joint perturbation analysis between different key influencing factors in the target area to obtain the synergistic influence index between each pair of key influencing factors.

9. The system according to claim 8, characterized in that, The data acquisition module includes: The photochemical calculation unit is used to calculate the chemical activity coefficients of each volatile organic compound based on the photochemical box model. The chemical activity coefficients include the maximum incremental reaction activity and / or the hydroxyl radical reaction rate constant. An active conversion unit is used to multiply the chemical activity coefficient by the concentration of each volatile organic compound to obtain ozone generation chemical activity parameters, wherein the ozone generation chemical activity parameters include ozone generation potential and / or hydroxyl radical reactivity.

10. An electronic device comprising a processor and a memory, the memory storing a computer program, characterized in that, When the processor executes the computer program, it implements the method of claim 1.