Intelligent atmospheric pollution tracing and early warning method and system
By integrating multivariate data and applying the CTL-Hybrid model, the issues of accuracy and interpretability in air pollution source tracing and early warning technologies have been resolved, enabling high-precision pollutant concentration prediction and quantitative source tracing, and supporting precise pollution prevention and control decisions.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CCTEG SHENYANG ENG CO
- Filing Date
- 2026-02-04
- Publication Date
- 2026-06-19
Figure CN122241046A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of environmental protection and air pollution prevention and control technology, and specifically relates to an intelligent source tracing and early warning method and system for air pollution.
Background Technology
[0002] With urbanization and industrialization, regional complex air pollution problems are becoming increasingly prominent, characterized by complex sources, diverse formation mechanisms, and significant spatiotemporal dynamics. Accurate prediction and scientific source tracing of pollution processes are crucial for effective prevention and control decisions. Currently, air pollution source tracing analysis mainly relies on forecasting systems based on numerical models and statistical methods, as well as source apportionment techniques based on backward trajectories and receptor models. However, existing technologies have significant limitations. First, traditional numerical forecasting models are computationally expensive and heavily dependent on emission source data, and often exhibit large deviations across different spatial scales of prediction. Second, statistical or machine learning models largely rely on limited local monitoring and meteorological data and fail to adequately fuse multi-source heterogeneous data (such as remote sensing and socioeconomic activity data), making it difficult to accurately describe the complex spatiotemporal relationships and long-distance transmission impacts of pollution, which results in limited prediction accuracy and short lead times. Third, existing source tracing methods are mostly offline and qualitative or semi-quantitative. For example, backward trajectory clustering can only indicate the possible direction of origin and cannot be quantitatively correlated with real-time pollution characteristics and contribution intensity. In addition, model predictions are mostly "black boxes" that lack interpretable output on the causes of pollution, making it difficult to directly link early warning with precise control measures.
[0003] Therefore, there is an urgent need to develop an intelligent system capable of deeply integrating multi-dimensional data and achieving high-precision prediction and quantitative source tracing, in order to improve the precision and intelligence of air pollution prevention and control. To this end, this invention proposes an intelligent source tracing and early warning method and system for air pollution, aiming to solve the aforementioned technical bottlenecks.
Summary of the Invention
[0004] To address the shortcomings of existing technologies, the purpose of this invention is to provide an intelligent method and system for tracing and early warning of air pollution, which solves the problem that existing air pollution early warning and tracing technologies are isolated and difficult to coordinate.
[0005] The technical solution adopted in this invention is an intelligent method and system for tracing and early warning of air pollution, the key technical points of which include: acquiring and integrating multi-dimensional data from environmental monitoring, weather forecasting, remote sensing observation and socio-economic activity monitoring, and performing spatiotemporal alignment and gridding on the multi-dimensional data to generate a standardized spatiotemporal feature dataset covering the target area; inputting the standardized spatiotemporal feature dataset into the integrated prediction and analysis model for processing, and simultaneously outputting the predicted spatial distribution of pollutant concentration in the target area within a specified future time period and the quantitative contribution distribution results of the features constituting the standardized spatiotemporal feature dataset to the prediction results; performing a quantitative source tracing analysis of the pollution source based on the quantitative contribution distribution results, combined with the pollution transmission path information obtained from backward trajectory model simulation analysis; and automatically generating and outputting early warning and decision support information based on the predicted spatial distribution of pollutant concentration and the quantitative source tracing analysis results.
[0006] In the above scheme, the step of inputting the standardized spatiotemporal feature dataset into the integrated prediction and analysis model for processing includes: constructing the standardized spatiotemporal feature dataset into a three-dimensional data tensor with a temporal dimension, a spatial grid dimension, and a feature dimension; and processing the three-dimensional data tensor sequentially through a convolutional neural network layer, a Transformer encoder layer, and a recurrent neural network layer. The convolutional neural network layer extracts the spatial correlation features of pollutant concentration, the Transformer encoder layer captures the long-range dependencies between multivariate features, and the recurrent neural network layer learns the temporal evolution of pollutant concentration and outputs the predicted spatial distribution of pollutant concentration.
[0007] In the above scheme, the integrated prediction and analysis model is also used to perform, based on the Shapley value (SHAP) interpretability algorithm, a contribution backtracking calculation on the feature data processed by the convolutional neural network layer, the Transformer encoder layer and the recurrent neural network layer, thereby generating the quantitative contribution distribution result.
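The Shapley value underlying the contribution calculation can be illustrated with a minimal Python sketch. It computes exact Shapley values by enumerating all feature subsets, which is only feasible for a handful of features; practical SHAP implementations for a model of the size described here rely on approximations. The function `shapley_values`, the stand-in `toy_model`, its coefficients, and the baseline input are all hypothetical and are not taken from the invention.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    A feature that is "absent" from a coalition takes its baseline value.
    Cost grows as 2**n, so this is for illustration only.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi


def toy_model(z):
    """Hypothetical stand-in for the trained model: concentration from 3 features."""
    wind, traffic, power = z
    return 40.0 - 5.0 * wind + 0.3 * traffic + 0.01 * traffic * power


x = [1.0, 80.0, 50.0]          # current grid features (illustrative values)
baseline = [3.0, 40.0, 30.0]   # reference features (illustrative values)
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction at `x` and the prediction at `baseline`, which is the property that makes the per-feature values usable as a quantitative contribution distribution.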
[0008] In the above scheme, the quantitative source tracing analysis of pollution sources based on the quantitative contribution distribution results and the pollution transmission path information obtained from backward trajectory model simulation analysis specifically includes: selecting, based on a preset contribution threshold, high-contribution features whose contribution values exceed the threshold from the quantitative contribution distribution results, and determining their corresponding spatial grid positions to form a first location set; running the backward trajectory model to simulate the air masses arriving at the monitoring points in the target area, and clustering the trajectories to extract the spatial grid sequences representing the main transmission paths, forming a second location set; and performing spatial matching and correlation analysis on the first location set and the second location set to identify key grids that are both high-contribution grids and located on the main transmission paths, and generating the quantitative source tracing analysis results based on their associated feature contributions.
[0009] In the above scheme, the spatiotemporal alignment and gridding processing of the multi-dimensional data includes: interpolating data sources with different spatial resolutions onto a preset regular geographic grid using spatial interpolation methods; and unifying data sequences with different time frequencies into a preset continuous time series through time interpolation.
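The two alignment operations can be sketched in plain Python as follows. This is a minimal illustration that assumes bilinear interpolation in space and linear interpolation in time; the helper names `bilinear` and `to_hourly` and the sample values are hypothetical, and a real pipeline would more likely use a gridding library.

```python
def _cell(axis, v):
    """Index of the lower corner of the cell containing v (axis ascending)."""
    for k in range(len(axis) - 2, -1, -1):
        if axis[k] <= v:
            return k
    return 0


def bilinear(grid, lats, lons, lat, lon):
    """Interpolate a coarse field onto one target grid point.

    grid[i][j] holds the value at (lats[i], lons[j]).
    """
    i, j = _cell(lats, lat), _cell(lons, lon)
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    low_row = grid[i][j] * (1 - tx) + grid[i][j + 1] * tx
    high_row = grid[i + 1][j] * (1 - tx) + grid[i + 1][j + 1] * tx
    return low_row * (1 - ty) + high_row * ty


def to_hourly(samples, start_hour, end_hour):
    """Linearly interpolate sorted (hour, value) samples to one value per hour."""
    hourly = []
    for h in range(start_hour, end_hour + 1):
        for (h0, v0), (h1, v1) in zip(samples, samples[1:]):
            if h0 <= h <= h1:
                hourly.append(v0 + (v1 - v0) * (h - h0) / (h1 - h0))
                break
        else:
            raise ValueError("hour %d lies outside the sampled range" % h)
    return hourly


# A 1-degree field resampled at a cell centre, and a 3-hourly series made hourly.
coarse = [[0.0, 10.0], [20.0, 30.0]]
centre_value = bilinear(coarse, [30.0, 31.0], [120.0, 121.0], 30.5, 120.5)
hourly_series = to_hourly([(0, 10.0), (3, 16.0)], 0, 3)
```

Applying `bilinear` at every node of the preset grid and `to_hourly` to every low-frequency series yields values that share one grid and one hourly time axis.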
[0010] In the above scheme, the spatial resolution of the preset regular geographic grid is a latitude and longitude grid of 0.01° to 0.5°, and the temporal resolution of the preset continuous time series is on the hourly level.
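To relate these angular resolutions to distances on the ground, the following short Python sketch converts a grid spacing in degrees to approximate kilometres. It assumes a spherical Earth with about 111.32 km per degree of latitude; the function name `cell_size_km` and the example latitude are illustrative only.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude (assumption)


def cell_size_km(resolution_deg, latitude_deg):
    """Return (north-south, east-west) cell size in km for a lat/lon grid."""
    north_south = resolution_deg * KM_PER_DEG_LAT
    # Meridians converge toward the poles, so east-west extent shrinks with cos(lat)
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


finest = cell_size_km(0.01, 40.0)    # lower bound of the stated range
coarsest = cell_size_km(0.5, 40.0)   # upper bound of the stated range
```

Under this approximation the stated range of 0.01° to 0.5° corresponds to cells of roughly 1 km to 56 km north to south, with the east-west extent becoming shorter at higher latitudes.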
[0011] An intelligent source tracing and early warning system for air pollution, the key technical points of which include: a data fusion processing unit configured to perform the steps of acquiring and fusing multi-dimensional data as described in claim 1 and performing spatiotemporal alignment and gridding processing; an integrated prediction and parsing unit configured to perform the step described in claim 1 of inputting the standardized spatiotemporal feature dataset into the model and simultaneously outputting the prediction results and the contribution distribution results; and an intelligent traceability decision-making unit configured to perform the steps of quantitative source tracing analysis and generating early warning and decision support information as described in claim 1.
[0012] In the above scheme, the integrated prediction and parsing unit includes the following modules, communicatively connected in sequence: a convolution computation module configured to receive the three-dimensional data tensor and perform spatial feature extraction; an attention computation module configured to perform self-attention computation on the extracted features to capture long-range dependencies; a recurrent computation module configured to learn temporal patterns and output the predicted spatial distribution of pollutant concentration; and an interpretability computation module configured to calculate and output the quantitative contribution distribution results based on the Shapley value algorithm.
[0013] In the above scheme, the intelligent traceability decision-making unit includes: a contribution filtering module configured to filter high-contribution features and their grid positions based on a threshold, generating a first location set; a trajectory analysis module configured to run the backward trajectory model and perform cluster analysis, generating a second location set representing the main transmission paths; a spatial association module configured to perform spatial matching and association analysis on the first location set and the second location set to generate the quantitative source tracing analysis results; and a report generation module configured to generate early warning and decision support reports based on the prediction results and the source tracing analysis results.
[0014] A computer-readable storage medium having a computer program stored thereon, wherein the key feature is that, when the computer program is executed by a processor, it implements the method as described in any one of claims 1 to 6.
[0015] The beneficial effects of this invention are as follows: This invention constructs a regional three-dimensional terrain model based on GIS technology, establishes a unified geographic grid by using Kriging interpolation, bilinear interpolation or deep learning super-resolution technology, integrates geographic information such as administrative divisions, transportation networks, pollution source locations, and monitoring station locations, and uses time series interpolation and high temporal resolution meteorological fields to assimilate the data, generating hourly continuous datasets. Multi-dimensional pollution emission characteristic values such as pollutant concentration monitoring results, wind speed / direction parameters, boundary layer height, road traffic flow, and real-time power consumption in industrial areas are imported in grid units.
[0016] This invention integrates simulation prediction, contribution analysis, source tracing, and decision support, making it applicable to air pollution prevention and control research and applications in any scenario. By fusing and aligning multi-source spatiotemporal feature data and constructing a CNN-Transformer-LSTM cascade model, the system significantly improves the accuracy and lead time of air pollution prediction. The system innovatively couples interpretable AI with backward trajectory analysis, generating quantitative visualizations of pollution source analysis. The entire system achieves full automation and intelligence from data fusion, intelligent prediction, causal analysis to decision recommendations, providing an efficient technical support platform for precise pollution control and demonstrating significant application value and environmental benefits.
[0017] Based on the CTL-Hybrid model, a 2D convolutional neural network is used to capture the spatial diffusion and aggregation patterns of pollutants. A Transformer encoder layer is introduced to perform self-attention calculation on multi-source feature sequences, capturing the long-distance dependence between feature parameters such as meteorological conditions and social activities and pollution concentration. A Long Short-Term Memory (LSTM) network is used to capture the dynamic temporal evolution of the pollution process, and at the same time, the predicted atmospheric pollutant concentration values for each grid in the next 6, 12, 24, 48 and 72 hours are output.
[0018] SHAP (SHapley Additive exPlanations), an interpretable artificial intelligence algorithm based on game-theoretic Shapley values, is used to calculate the quantitative contribution of each feature value to the predicted pollutant concentration. At the same time, a 48-hour backward trajectory simulation centered on the receptor is performed using a backward trajectory model. The clustered pollution trajectories are spatially correlated with the key pollution feature values identified by SHAP, and the complete source tracing analysis results are presented in numerical and visual form.
Attached Figure Description
[0019] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0020] Figure 1 is a system structure flowchart in an embodiment of the present invention.
Detailed Implementation
[0021] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention is described in further detail below with reference to Figure 1 and specific embodiments.
[0022] This embodiment provides an intelligent method and system for tracing and early warning of air pollution, comprising the following steps:
S1: Steps for system initialization and data acquisition verification.
[0023] Define the geographical coordinates of the target research area, the types of target pollutants, the grid feature indicators, and the corresponding preset contribution thresholds. The system then starts a scheduled task that retrieves at least 24 hours of data from the various interfaces, depending on data availability and richness, for training and prediction with the integrated prediction and analysis model. The retrieved data includes: concentrations of the target pollutants (commonly PM2.5, PM10, O3, NO2, and SO2) from environmental monitoring stations; meteorological parameters such as wind speed, wind direction, temperature, humidity, boundary layer height, and pressure field from numerical weather prediction products; remote sensing data such as aerosol optical thickness from NASA or ESA satellite platforms; geographic information such as administrative divisions, road networks, pollution source locations, and monitoring station locations; and other characteristic parameters such as traffic flow indices from traffic management departments, regional industrial electricity consumption from power grid companies, and factory coal consumption. The system automatically checks the missing rate and plausibility of the data. If the missing rate of any data source exceeds 15%, the system automatically starts a data imputation routine that fills each missing value with the average of the data before and after it.
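The missing-rate check and neighbor-average imputation described in this step can be sketched in Python as follows. The 15% limit comes from the text; everything else is an assumption made for illustration: missing values are represented as `None`, a gap at the start or end of a series is filled from its single available neighbor, and consecutive gaps use the nearest valid values on either side. The text does not say how sources at or below the limit are handled, so the sketch leaves them unchanged.

```python
MISSING_RATE_LIMIT = 0.15  # from the text: imputation starts above 15 %


def missing_rate(series):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in series) / len(series)


def impute_neighbor_mean(series):
    """Fill each gap with the mean of the nearest valid values before and after it."""
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        before = next((series[j] for j in range(i - 1, -1, -1)
                       if series[j] is not None), None)
        after = next((series[j] for j in range(i + 1, len(series))
                      if series[j] is not None), None)
        neighbors = [n for n in (before, after) if n is not None]
        if neighbors:  # assumption: edge gaps fall back to the one neighbor present
            filled[i] = sum(neighbors) / len(neighbors)
    return filled


def check_and_impute(series):
    """Return (series, missing_rate); impute only when the rate exceeds the limit."""
    rate = missing_rate(series)
    if rate > MISSING_RATE_LIMIT:
        return impute_neighbor_mean(series), rate
    return list(series), rate


pm25 = [35.0, None, 41.0, None, None, 50.0]   # illustrative hourly readings
filled_pm25, pm25_rate = check_and_impute(pm25)
```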
[0024] S2: Steps for data alignment.
[0025] A regional 3D terrain model is constructed based on GIS. Kriging interpolation, bilinear interpolation, or deep learning super-resolution techniques are used to establish a unified geographic grid. All data is interpolated onto a latitude and longitude grid of preset resolution, such as 0.1° × 0.1°, which corresponds to a grid cell of approximately 11 km × 11 km. All data is timestamped at the hourly level (UTC+8), and low-frequency data is converted into hourly sequences using time interpolation. The hourly feature values of each grid point collectively constitute a standardized spatiotemporal feature dataset covering the entire region, with temporal, spatial, and feature dimensions; that is, a 3D data tensor of (temporal length × number of grids × number of features). An example of the hourly feature vector for each grid point is:
[Pollutant concentration, wind speed U, wind speed V, relative humidity, boundary layer height, aerosol optical thickness, traffic flow index, industrial electricity consumption index, coal consumption]
S3: Steps for performing model predictions.
[0026] The integrated prediction and analysis model in this embodiment, namely the CTL-Hybrid prediction model, processes the data sequentially through a convolutional neural network (CNN) layer, a Transformer encoder layer, and a long short-term memory (LSTM) network layer: the CNN layer first extracts the spatial correlation features of pollutant concentration; the Transformer encoder layer then captures the long-range dependencies between these spatial features and the other multivariate features; finally, the LSTM network layer learns and predicts the temporal evolution of pollutant concentration.
[0027] The CTL-Hybrid prediction model is continuously trained on feature sequence data covering at least the past 24 hours, shaped as [data duration, number of grids, number of features]. The trained model then generates predicted pollutant concentration values for each grid at 6, 12, 24, 48, and 72 hours ahead.
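How such a tensor could be cut into training samples is sketched below in plain Python. The sketch covers only sample preparation, not the CNN, Transformer or LSTM layers themselves. The 24-hour window and the five lead times come from the text; the function name `make_samples`, the use of nested lists, and the choice of feature index 0 for the pollutant concentration (following the example feature ordering in step S2) are assumptions.

```python
HORIZONS = (6, 12, 24, 48, 72)  # forecast lead times from the text, in hours
WINDOW = 24                     # the text requires at least 24 h of history


def make_samples(tensor, target_feature=0):
    """Slice an hourly tensor[t][g][f] into (input window, targets) pairs.

    Inputs have shape WINDOW x grids x features; targets have shape
    len(HORIZONS) x grids and hold the target feature at each lead time,
    counted from the last hour of the input window.
    """
    n_grids = len(tensor[0])
    samples = []
    last_start = len(tensor) - WINDOW - max(HORIZONS)
    for start in range(last_start + 1):
        end = start + WINDOW                 # first hour after the input window
        inputs = tensor[start:end]
        targets = [[tensor[end - 1 + lead][g][target_feature]
                    for g in range(n_grids)]
                   for lead in HORIZONS]
        samples.append((inputs, targets))
    return samples


# Synthetic tensor: 100 hours, 2 grids, 3 features; feature 0 encodes (hour, grid).
demo = [[[t * 10 + g, 0.0, 0.0] for g in range(2)] for t in range(100)]
demo_samples = make_samples(demo)
```

With 100 hours of data, a 24-hour window and a longest lead of 72 hours, only five windows have all targets available, which shows why far more than the minimum 24 hours is needed to train for the 72-hour horizon.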
[0028] S4: Steps for real-time analysis of pollution causes.
[0029] After the prediction is completed, the system automatically starts the pollution cause analysis. It calls an interpretability algorithm based on the Shapley value (SHAP) to perform contribution backtracking calculations on the model's processing, obtaining the quantitative contribution of each input feature to the concentration of each grid and generating the quantitative contribution distribution results. At the same time, the system runs a 48-hour backward trajectory simulation and clusters the trajectories to obtain the main transmission paths (5 to 8 paths). Based on a preset contribution threshold, the system automatically selects the features whose contributions exceed the threshold together with their corresponding grid locations (the first location set), and performs spatial overlay and correlation analysis with the clustered transmission paths (the second location set) to identify key transmission channels and potential source areas, generating the quantitative pollution source analysis results.
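The matching of the two location sets can be sketched in Python as follows. This is a simplified illustration that treats spatial matching as exact grid-cell intersection and ranks key grids by the sum of their above-threshold contributions; both choices, along with the function names and the sample values, are assumptions rather than details given in the text.

```python
def first_location_set(contributions, threshold):
    """Keep features whose contribution exceeds the threshold, grouped by grid.

    contributions maps (grid_id, feature_name) to a contribution value.
    """
    selected = {}
    for (grid, feature), value in contributions.items():
        if value > threshold:
            selected.setdefault(grid, {})[feature] = value
    return selected


def second_location_set(cluster_paths):
    """Union of the grid cells crossed by the clustered main transmission paths."""
    return {grid for path in cluster_paths for grid in path}


def key_grids(contributions, threshold, cluster_paths):
    """Grids that are both high-contribution and on a main path, ranked by total."""
    high = first_location_set(contributions, threshold)
    on_path = second_location_set(cluster_paths)
    ranked = [(grid, sum(features.values()), features)
              for grid, features in high.items() if grid in on_path]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Illustrative inputs: grid ids are (row, col) tuples.
demo_contributions = {
    ((2, 3), "traffic"): 8.0,
    ((2, 3), "coal"): 5.0,
    ((4, 1), "power"): 9.0,
    ((0, 0), "traffic"): 1.0,
    ((1, 2), "coal"): 6.5,
}
demo_paths = [[(0, 0), (1, 2), (2, 3)], [(5, 5), (3, 3)]]
demo_key_grids = key_grids(demo_contributions, 4.0, demo_paths)
```

In this example grid (4, 1) has a high contribution but lies on no clustered path, and grid (0, 0) lies on a path but contributes little, so neither is reported as a key grid.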
[0030] S5: Steps for generating the early warning report. Based on the predicted concentration peak, duration, spatial range, and causal analysis results, and combined with a threshold table for comprehensive judgment, the system generates an early warning and decision support report that includes the heavy pollution period, peak concentration, impact range, and quantitative contribution ranking.
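A report of this kind could be assembled as in the following Python sketch. The text does not disclose the threshold table, so the `LEVELS` table, the concentration limit, the level names, and the report field names below are all hypothetical placeholders.

```python
# Hypothetical threshold table: (lower bound in ug/m3, warning level).
LEVELS = [(250.0, "red"), (150.0, "orange"), (115.0, "yellow")]


def warning_level(peak):
    """Map a peak concentration to a level using the hypothetical table."""
    for lower_bound, level in LEVELS:
        if peak >= lower_bound:
            return level
    return "none"


def build_report(forecast, contributions, limit=115.0):
    """Summarise a forecast {lead_hour: {grid_id: concentration}}.

    contributions maps feature names to quantitative contribution values.
    """
    peak, peak_lead, _ = max((conc, lead, grid)
                             for lead, field in forecast.items()
                             for grid, conc in field.items())
    polluted_leads = sorted(lead for lead, field in forecast.items()
                            if any(conc >= limit for conc in field.values()))
    affected = sorted({grid for field in forecast.values()
                       for grid, conc in field.items() if conc >= limit})
    ranking = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return {
        "level": warning_level(peak),
        "peak_concentration": peak,
        "peak_lead_hour": peak_lead,
        "polluted_lead_hours": polluted_leads,
        "affected_grids": affected,
        "contribution_ranking": ranking,
    }


demo_forecast = {
    6: {"A": 80.0, "B": 90.0},
    12: {"A": 120.0, "B": 160.0},
    24: {"A": 100.0, "B": 130.0},
}
demo_report = build_report(demo_forecast,
                           {"traffic": 12.0, "coal": 20.0, "wind_u": -5.0})
```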
[0031] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A method for intelligent source tracing and early warning of air pollution, characterized in that it comprises: acquiring and integrating multi-dimensional data from environmental monitoring, weather forecasting, remote sensing observation and socio-economic activity monitoring, and performing spatiotemporal alignment and gridding on the multi-dimensional data to generate a standardized spatiotemporal feature dataset covering the target area; inputting the standardized spatiotemporal feature dataset into an integrated prediction and analysis model for processing, and simultaneously outputting the predicted spatial distribution of pollutant concentration in the target area within a specified future time period and the quantitative contribution distribution results of the features constituting the standardized spatiotemporal feature dataset to the prediction results; performing a quantitative source tracing analysis of pollution sources based on the quantitative contribution distribution results, combined with the pollution transmission path information obtained from backward trajectory simulation analysis; and automatically generating and outputting early warning and decision support information based on the predicted spatial distribution of pollutant concentration and the quantitative source tracing analysis results.
2. The intelligent source tracing and early warning method for air pollution according to claim 1, characterized in that the step of inputting the standardized spatiotemporal feature dataset into the integrated prediction and analysis model for processing includes: constructing the standardized spatiotemporal feature dataset into a three-dimensional data tensor with a temporal dimension, a spatial grid dimension, and a feature dimension; and processing the three-dimensional data tensor sequentially through a convolutional neural network layer, a Transformer encoder layer, and a recurrent neural network layer; wherein the convolutional neural network layer is used to extract the spatial correlation features of pollutant concentration, the Transformer encoder layer is used to capture the long-range dependencies between multivariate features, and the recurrent neural network layer is used to learn the temporal evolution of pollutant concentration and output the predicted spatial distribution of pollutant concentration.
3. The intelligent source tracing and early warning method for air pollution according to claim 2, characterized in that the integrated prediction and analysis model is also used to perform, based on the Shapley value (SHAP) interpretability algorithm, a contribution backtracking calculation on the feature data processed by the convolutional neural network layer, the Transformer encoder layer and the recurrent neural network layer, thereby generating the quantitative contribution distribution result.
4. The intelligent source tracing and early warning method for air pollution according to claim 1, characterized in that the quantitative source tracing analysis of pollution sources, based on the quantitative contribution distribution results and combined with the pollution transmission path information obtained from backward trajectory simulation analysis, specifically includes: selecting, based on a preset contribution threshold, high-contribution features whose contribution values exceed the threshold from the quantitative contribution distribution results, and determining their corresponding spatial grid positions to form a first location set; running the backward trajectory model to simulate the air masses arriving at the monitoring points in the target area, and clustering the trajectories to extract the spatial grid sequences representing the main transmission paths, forming a second location set; and performing spatial matching and correlation analysis on the first location set and the second location set to identify key grids that are both high-contribution grids and located on the main transmission paths, and generating the quantitative source tracing analysis results based on their associated feature contributions.
5. The intelligent source tracing and early warning method for air pollution according to claim 1, characterized in that the process of performing spatiotemporal alignment and gridding on the multi-dimensional data includes: interpolating data sources with different spatial resolutions onto a preset regular geographic grid using spatial interpolation methods; and unifying data sequences with different time frequencies into a preset continuous time series through time interpolation.
6. The intelligent source tracing and early warning method for air pollution according to claim 5, characterized in that the spatial resolution of the preset regular geographic grid is a latitude and longitude grid of 0.01° to 0.5°, and the temporal resolution of the preset continuous time series is at the hourly level.
7. An intelligent air pollution source tracing and early warning system, characterized in that it comprises: a data fusion processing unit configured to perform the steps of acquiring and fusing multi-dimensional data as described in claim 1 and performing spatiotemporal alignment and gridding processing; an integrated prediction and parsing unit configured to perform the step described in claim 1 of inputting the standardized spatiotemporal feature dataset into the model and simultaneously outputting the prediction results and the contribution distribution results; and an intelligent traceability decision-making unit configured to perform the steps of quantitative source tracing analysis and generating early warning and decision support information as described in claim 1.
8. The intelligent air pollution source tracing and early warning system according to claim 7, characterized in that the integrated prediction and parsing unit includes the following modules, communicatively connected in sequence: a convolution computation module configured to receive the three-dimensional data tensor and perform spatial feature extraction; an attention computation module configured to perform self-attention computation on the extracted features to capture long-range dependencies; a recurrent computation module configured to learn temporal patterns and output the predicted spatial distribution of pollutant concentration; and an interpretability computation module configured to calculate and output the quantitative contribution distribution results based on the Shapley value algorithm.
9. The intelligent air pollution source tracing and early warning system according to claim 7, characterized in that the intelligent traceability decision-making unit includes: a contribution filtering module configured to filter high-contribution features and their grid positions based on a threshold, generating a first location set; a trajectory analysis module configured to run the backward trajectory model and perform cluster analysis, generating a second location set representing the main transmission paths; a spatial association module configured to perform spatial matching and association analysis on the first location set and the second location set to generate the quantitative source tracing analysis results; and a report generation module configured to generate early warning and decision support reports based on the prediction results and the source tracing analysis results.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the method as described in any one of claims 1 to 6.