An air quality data screening method based on data consistency checking

By using data consistency checks and 1D convolutional neural network models, air quality prediction models are automatically selected, solving the problem of reliance on human experience in existing technologies and improving the accuracy and efficiency of air quality prediction.

CN122196571A · Pending · Publication Date: 2026-06-12 · CHINA NAT ENVIRONMENTAL MONITORING CENT

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: CHINA NAT ENVIRONMENTAL MONITORING CENT
Filing Date: 2026-03-09
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

Existing air quality forecasting systems rely on human experience in model selection, which is inefficient and highly subjective. They are unable to achieve high-frequency dynamic optimization on a daily basis and cannot adapt to sudden pollution events or abrupt changes in meteorological and climatic conditions.

Method used

Using a data consistency test-based approach, air quality indicator features are extracted through a 1D convolutional neural network model. The optimal model is automatically selected by combining the sliding time window and the error, trend correlation, and accuracy indicators of various air quality prediction models.

Benefits of technology

It has enabled automated screening of air quality prediction models, improving the accuracy and efficiency of predictions, adapting to the rapid evolution of pollution processes, and providing more accurate prediction results.

Generated by Eureka AI based on patent content.

Figure CN122196571A_ABST

Abstract

The application provides an air quality data screening method based on data consistency checking, and relates to the technical field of atmospheric pollution control and environmental monitoring. The method comprises the following steps: obtaining predicted air index data obtained by a plurality of air quality prediction models; obtaining measured air index data at a plurality of time points in a first preset time period in the past; determining an error index; determining a trend correlation index; screening a target air quality prediction model; and obtaining air index prediction data. According to the application, the error index and the trend correlation coefficient of the predicted air index data and the measured air index data can be calculated, the target air quality prediction model with the minimum data error index and the consistent change trend is screened out, and the air index prediction data is obtained based on the target air quality prediction model. The application breaks through the traditional static weight or manual screening mode, improves the real-time performance and efficiency of screening the air quality prediction model, and improves the accuracy of air quality prediction.

Description

Technical Field

[0001] This invention relates to the field of air pollution control and environmental monitoring technology, and in particular to an air quality data screening method based on data consistency verification.

Background Technology

[0002] Current ambient air quality prediction systems typically employ multiple numerical models, statistical / machine learning models, or ensemble models of these to improve accuracy. However, the performance of different models is significantly affected by spatiotemporal differences; for example, some models perform better in specific seasons or regions. Traditional methods often rely on human experience to select the optimal model, which suffers from low efficiency and high subjectivity, and cannot adapt to scenarios involving sudden pollution events or abrupt changes in meteorological and climatic conditions.

[0003] In ambient air quality forecasting operations, dynamic optimization and automated screening of multi-model forecast results are key bottlenecks in improving forecast accuracy. This is mainly reflected in the following aspects. Different air quality forecasting models exhibit spatiotemporal heterogeneity: their forecasting effectiveness varies across seasons and across primary pollutants. For PM2.5 in particular, forecasting capability varies significantly with season, region, and pollution stage. Therefore, the forecasting performance of a single model may differ greatly across forecast ranges and forecast periods.

[0004] Manual model selection suffers from insufficient efficiency and objectivity. Traditional methods require manual comparison of the forecast performance of multiple air quality prediction models, manual selection of the model with the best forecast performance, and manual correction of the predicted results from that model. This method is time-consuming and experience-dependent, making it difficult to achieve high-frequency, dynamic optimization on a daily basis.

[0005] The information disclosed in the background section of this application is intended only to enhance the understanding of the general background of this application and should not be construed as an admission or in any way implying that the information constitutes prior art known to those skilled in the art.

Summary of the Invention

[0006] This invention provides an air quality data screening method based on data consistency verification, which can solve the technical problem that related technologies are unable to achieve daily high-frequency dynamic selection of air quality forecast models.

[0007] According to a first aspect of the present invention, an air quality data screening method based on data consistency verification is provided, comprising: acquiring multiple types of predicted air quality index data, produced by multiple air quality prediction models, for multiple time points within a past first preset time period; acquiring multiple types of measured air quality index data at the multiple time points within the past first preset time period; determining an error index between the predicted air quality index data output by each air quality prediction model and the measured air quality index data at the multiple time points; determining a trend correlation index based on the change trend of the predicted air quality index data output by each air quality prediction model across the multiple time points and the change trend of the measured air quality index data across the same time points; selecting a target air quality prediction model from the multiple air quality prediction models based on the error index and the trend correlation index; and predicting, with the target air quality prediction model, air quality index data for multiple time points within a future second preset time period to obtain air quality index forecast data.

[0008] According to the present invention, determining the error index between the predicted air quality index data output by each air quality prediction model and the measured air quality index data at the multiple time points includes: selecting a reference time period from the historical time periods based on the multiple types of measured air quality index data within the first preset time period, and obtaining reference air quality index data within the reference time period; and determining the error index of each air quality prediction model based on the predicted air quality index data output by each air quality prediction model and the measured air quality index data within the first preset time period, together with the reference time period.

[0009] According to the present invention, selecting a reference time period from the historical time periods based on the multiple types of measured air quality index data within the first preset time period, and obtaining reference air quality index data within the reference time period, includes: obtaining a second time period preceding the first preset time period, where the end of the second time period is adjacent to the start of the first preset time period; acquiring multiple types of measured air quality index data within the second time period, and forming measured air quality index data sequences from the measured air quality index data within the second time period followed by the measured air quality index data within the first preset time period; forming a measured air quality index data matrix from the multiple measured air quality index data sequences; processing the measured air quality index data matrix with a 1D convolutional neural network model to obtain measured air quality index feature information; and processing, with the same 1D convolutional neural network model, the historical air quality index data matrix formed from the multiple types of historical air quality index data within a sliding time window located before the start of the second time period, to obtain historical air quality index feature information. At its first position, the sliding time window starts at the start of the historical time period; at its last position, it ends at the moment before the start of the second time period. The length of the sliding time window equals the combined duration of the second time period and the first preset time period. The method further includes: determining, for each position of the sliding time window, the feature similarity between the historical air quality index feature information and the measured air quality index feature information; and, for the position with the highest feature similarity, determining the target start point and target end point of the sliding time window. The target end point is used as the end point of the reference time period, and the target end point minus the duration of the first preset time period is used as its start point. This yields the reference time period, from which the reference air quality index data are obtained.
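The sliding-window search for the reference time period can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the application's implementation: `extract_features` is a hypothetical stand-in for the trained 1D convolutional neural network, cosine similarity is an assumed choice of feature similarity (the text does not name one), and the function names and array layout are illustrative.

```python
import numpy as np

def extract_features(window: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the trained 1D CNN (assumption).

    window has shape (n_indicators, n_time_points); the stand-in returns
    the per-indicator mean and standard deviation as a feature vector.
    """
    return np.concatenate([window.mean(axis=1), window.std(axis=1)])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Assumed similarity measure between two feature vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def find_reference_period(history: np.ndarray, recent: np.ndarray, first_len: int):
    """Slide a window over the history and keep the most similar position.

    history: (n_indicators, T) data strictly before the second time period.
    recent:  (n_indicators, W) data of the second time period followed by
             the first preset time period, so W is the window length.
    first_len: number of time points in the first preset time period.
    Returns (reference_start, reference_end, best_similarity), with the
    end index exclusive.
    """
    window_len = recent.shape[1]
    target = extract_features(recent)
    best_sim, best_start = -np.inf, 0
    for start in range(history.shape[1] - window_len + 1):
        candidate = extract_features(history[:, start:start + window_len])
        sim = cosine_similarity(candidate, target)
        if sim > best_sim:
            best_sim, best_start = sim, start
    target_end = best_start + window_len
    # Reference period: the last first_len time points of the best window.
    return target_end - first_len, target_end, best_sim
```

With the window covering the second time period plus the first preset time period, the returned reference period is the tail of the best-matching window, matching the rule that its start is the target end point minus the duration of the first preset time period.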

[0010] According to the present invention, the training steps of the 1D convolutional neural network model include: acquiring multiple types of air quality index data within a first training time period and within a second training time period; obtaining the relative deviation between same-type air quality index data at corresponding time points in the first and second training time periods, and from these the average relative deviation index over the corresponding time points, as well as the trend consistency index of same-type air quality index data in the first and second training time periods; obtaining the regression equation for the i-th type of air quality index data according to a formula [formula not reproduced in the source text], whose terms are a label for the similarity of the i-th type of air quality index data within the first and second training time periods, the fitting coefficients, the average relative deviation index of the i-th type of air quality index data, and the trend consistency index of the i-th type of air quality index data; solving the fitting coefficients based on multiple types of air quality index data within multiple first training time periods and multiple second training time periods, and substituting the solved values into the regression equation for each type of air quality index data to obtain the similarity solution function for each type; acquiring multiple types of air quality index data within a third training time period and a fourth training time period, and solving the average relative deviation index and trend consistency index of each type; substituting the average relative deviation index and trend consistency index of each type into the similarity solution function of the corresponding type to obtain the similarity judgment result for each type; computing a weighted sum of the similarity judgment results to obtain the similarity reference value of the air quality index data in the third and fourth training time periods; processing, with the 1D convolutional neural network model, the air quality index data matrix formed from the data in the third training time period to obtain third training feature information, and the matrix formed from the data in the fourth training time period to obtain fourth training feature information; solving the feature similarity between the third and fourth training feature information; obtaining the loss function of the 1D convolutional neural network model based on the feature similarity and the similarity reference value; and training the 1D convolutional neural network model according to the loss function to obtain the trained 1D convolutional neural network model.

[0011] According to the present invention, determining the error index of each air quality prediction model based on the predicted air quality index data output by each air quality prediction model and the measured air quality index data within the first preset time period, together with the reference time period, includes: acquiring first predicted data, produced by each air quality prediction model for the multiple air quality indexes within a second time period after the reference time period, and first measured data for the same air quality indexes within that second time period; determining, based on the first predicted data and the first measured data, the first mean absolute error of each air quality prediction model for each air quality index; determining the pollution levels of the multiple air quality indexes within the second time period, and selecting comparison time periods from the historical time periods based on the pollution levels; obtaining second predicted data, produced by each air quality prediction model for the air quality indexes within the comparison time periods, and second measured data for those air quality indexes; obtaining the second mean absolute error based on the second predicted data and the second measured data; obtaining the first weight of the air quality prediction model based on the first mean absolute error and the second mean absolute error; determining the third mean absolute error based on the predicted air quality index data and the measured air quality index data; and determining the error index of each air quality prediction model based on the first weight and the third mean absolute error.
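The first, second, and third mean absolute errors share one computation applied to different time periods. A minimal Python sketch follows; the function names and the nested-dictionary layout (model, then air quality index) are illustrative assumptions rather than anything specified in the application.

```python
import numpy as np

def mean_absolute_error(predicted, measured) -> float:
    """Mean of |predicted - measured| over aligned time points."""
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    if predicted.shape != measured.shape:
        raise ValueError("predicted and measured must be aligned")
    return float(np.mean(np.abs(predicted - measured)))

def mae_table(predictions: dict, measured: dict) -> dict:
    """MAE for every (model, index) pair.

    predictions: {model_name: {index_name: sequence}}
    measured:    {index_name: sequence}
    Returns      {model_name: {index_name: mae}}
    """
    return {
        model: {
            index: mean_absolute_error(series, measured[index])
            for index, series in by_index.items()
        }
        for model, by_index in predictions.items()
    }
```

Calling `mae_table` once per period (reference, comparison, and first preset time period) would give the three error values that feed the first weight and the error index.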

[0012] According to the present invention, obtaining the first weight of the air quality prediction model based on the first mean absolute error and the second mean absolute error includes: obtaining, according to a formula [formula not reproduced in the source text], the first weight of the j-th air quality prediction model for the i-th air quality index. The quantities in the formula are the first mean absolute error of the j-th air quality prediction model in predicting the i-th air quality index, the second mean absolute error of the j-th air quality prediction model for the i-th air quality index, and m, the number of air quality prediction models.

[0013] According to the present invention, determining the error index of each air quality prediction model based on the first weight and the third mean absolute error includes: determining, according to a formula [formula not reproduced in the source text], the error index of the j-th air quality prediction model for the i-th air quality index. The quantities in the formula are the third mean absolute error of the j-th air quality prediction model in predicting the i-th air quality index, the minimum of the third mean absolute error among the air quality prediction models for the i-th air quality index, the maximum of the third mean absolute error among the air quality prediction models for the i-th air quality index, and the first weight of the j-th air quality prediction model for the i-th air quality index.
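Because the formula is not reproduced in this text, the exact combination is unknown. The Python sketch below only illustrates one plausible reading, assuming that the minimum and maximum of the third mean absolute error are used for min-max scaling and that the scaled value is multiplied by the first weight. Both the inverted scaling (smaller error gives a larger index) and the multiplication are assumptions, as is the function name.

```python
def error_index(third_mae: dict, first_weight: dict) -> dict:
    """Hypothetical error index per model for one air quality index.

    third_mae:    {model_name: third mean absolute error}
    first_weight: {model_name: first weight}
    Assumption: index = first weight * inverted min-max scaled MAE,
    so a larger index means a better model.
    """
    lowest = min(third_mae.values())
    highest = max(third_mae.values())
    span = highest - lowest
    result = {}
    for model, mae in third_mae.items():
        # Inverted min-max scaling: smallest MAE maps to 1, largest to 0.
        scaled = 1.0 if span == 0 else (highest - mae) / span
        result[model] = first_weight[model] * scaled
    return result
```

Orienting the index so that larger is better keeps it consistent with selecting the model whose combined selection index is the maximum.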

[0014] According to the present invention, determining the trend correlation index based on the change trend of the predicted air quality index data output by each air quality prediction model across the multiple time points and the change trend of the measured air quality index data across the same time points includes: determining, based on the first predicted data and the first measured data, the first correlation coefficient of each air quality prediction model for each air quality index; obtaining the second correlation coefficient based on the second predicted data and the second measured data; obtaining the second weight of the air quality prediction model based on the first and second correlation coefficients; determining the third correlation coefficient based on the predicted air quality index data and the measured air quality index data; and determining, according to a formula [formula not reproduced in the source text], the trend correlation index of the j-th air quality prediction model for the i-th air quality index. The quantities in the formula are the first, second, and third correlation coefficients of the j-th air quality prediction model in predicting the i-th air quality index, and the minimum and maximum of the third correlation coefficient among the air quality prediction models for the i-th air quality index.
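The text does not say which correlation coefficient is meant. As a sketch only, the Python snippet below assumes the Pearson correlation between the predicted and measured series, computed with NumPy's `corrcoef`; the function name and the convention of returning 0.0 for a constant series are illustrative assumptions.

```python
import numpy as np

def trend_correlation(predicted, measured) -> float:
    """Pearson correlation between predicted and measured series (assumed).

    Returns 0.0 when either series is constant, since the coefficient
    is undefined in that case (a convention chosen for this sketch).
    """
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    if predicted.std() == 0 or measured.std() == 0:
        return 0.0
    return float(np.corrcoef(predicted, measured)[0, 1])
```

A value near 1 means the forecast rises and falls with the observations even if its absolute level is biased, which is why this indicator complements the mean absolute error.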

[0015] According to the present invention, the method further includes: determining, based on the first predicted data, the first measured data, and a preset accuracy range, the first accuracy rate of each air quality prediction model for each air quality index; determining, based on the second predicted data, the second measured data, and the preset accuracy range, the second accuracy rate of each air quality prediction model for each air quality index; obtaining the third weight of the air quality prediction model based on the first and second accuracy rates; determining, based on the predicted air quality index data, the measured air quality index data, and the preset accuracy range, the accuracy score of each air quality prediction model for each air quality index; and determining, according to a formula [formula not reproduced in the source text], the accuracy index of the j-th air quality prediction model for the i-th air quality index. The quantities in the formula are the first accuracy rate, the second accuracy rate, and the accuracy score of the j-th air quality prediction model in predicting the i-th air quality index.
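An accuracy rate against a preset accuracy range can be sketched as the share of time points whose forecast falls within that range of the measurement. In the Python snippet below, the relative tolerance form of the range and its default value of 25% are assumptions for illustration; the application does not state how the range is defined.

```python
import numpy as np

def accuracy_rate(predicted, measured, rel_tolerance: float = 0.25) -> float:
    """Fraction of time points with |predicted - measured| within tolerance.

    rel_tolerance is a hypothetical parameter: a forecast counts as
    accurate when its absolute error is at most rel_tolerance times
    the measured value.
    """
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    within = np.abs(predicted - measured) <= rel_tolerance * np.abs(measured)
    return float(np.mean(within))
```

For example, forecasts of 100, 80, 50, and 10 against measurements of 90, 100, 52, and 20 give three hits out of four under a 25% tolerance.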

[0016] According to the present invention, selecting a target air quality prediction model from the multiple air quality prediction models based on the error index and the trend correlation index includes: obtaining the selection index of the j-th air quality prediction model for the i-th air quality index as a weighted sum of the error index, the trend correlation index, and the accuracy index of the j-th air quality prediction model for the i-th air quality index; and using the air quality prediction model with the maximum selection index for the i-th air quality index as the target air quality prediction model for predicting the i-th air quality index.
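The selection step is a weighted sum followed by an argmax per air quality index. A minimal Python sketch follows; the equal default weights, the function name, and the dictionary layout are assumptions, since the application does not give the weights of the sum here.

```python
def select_target_models(error_idx: dict, trend_idx: dict, accuracy_idx: dict,
                         weights=(1 / 3, 1 / 3, 1 / 3)) -> dict:
    """Pick, for each air quality index, the model with the largest selection index.

    Each input has the layout {index_name: {model_name: value}}.
    weights are the hypothetical coefficients of the weighted sum
    (error, trend correlation, accuracy).
    """
    w_error, w_trend, w_accuracy = weights
    selected = {}
    for index_name, by_model in error_idx.items():
        scores = {
            model: w_error * by_model[model]
            + w_trend * trend_idx[index_name][model]
            + w_accuracy * accuracy_idx[index_name][model]
            for model in by_model
        }
        # The model with the maximum selection index becomes the target model.
        selected[index_name] = max(scores, key=scores.get)
    return selected
```

Because the choice is made separately for each air quality index, different indexes can end up with different target models.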

[0017] According to a second aspect of the present invention, an air quality data screening system based on data consistency verification is provided, comprising: a first acquisition module, which acquires multiple types of predicted air quality index data, produced by multiple air quality prediction models, for multiple time points within a past first preset time period; a second acquisition module, which acquires multiple types of measured air quality index data at the multiple time points within the past first preset time period; an error index module, which determines the error index between the predicted air quality index data output by each air quality prediction model and the measured air quality index data at the multiple time points; a trend correlation index module, which determines the trend correlation index based on the change trend of the predicted air quality index data output by each air quality prediction model across the multiple time points and the change trend of the measured air quality index data across the same time points; a screening module, which selects the target air quality prediction model from the multiple air quality prediction models based on the error index and the trend correlation index; and a prediction module, which predicts, with the target air quality prediction model, air quality index data for multiple time points within a future second preset time period to obtain air quality index forecast data.

[0018] By adopting the above technical solution, the present invention can achieve the following technical effects. Valuable reference time periods can first be selected from historical time periods. Multiple screening indicators are then determined from the performance of each air quality prediction model within the first preset time period and the reference time periods, and the air quality prediction model suited to predicting the time period after the first preset time period is selected automatically. This breaks through the traditional static weighting or manual screening mode, is compatible with multi-source, multi-dimensional data, and enables real-time tracking of model performance and dynamic elimination of models. The screening adapts to the rapid evolution of pollution processes, is more accurate and automated, provides more accurate prediction results, and significantly improves operational efficiency.

When screening reference time periods, features of the various air quality indicators can be extracted with a 1D convolutional neural network model, and the feature similarity is calculated at each position of a sliding time window. The screening selects time periods that are not only most similar to the air quality indicators within the first preset time period but also show a similar pattern of change from the second time period to the first preset time period. This yields the reference time period with the highest overall similarity across the air quality indicators and provides more valuable reference air quality indicator data for selecting the optimal model.

When training the 1D convolutional neural network model, a similarity function for each air quality indicator can be solved through fitting. This function supplies the annotation information for training, so a large number of annotations is obtained with minimal manual annotation. The larger volume of training data strengthens the training of the model and improves its effectiveness. Furthermore, compared with using the similarity function directly, the trained model can integrate the relationships among multiple air quality indicators, adds nonlinearity, and represents the combined characteristics of multiple air quality indicators over a given time period more accurately.

When determining the error index, reference time periods with high overall similarity and comparison time periods with similar pollution levels can be selected, and the first weight is assigned based on the accuracy of each model within the reference and comparison time periods. This raises the probability of selecting the model that is more accurate under similar objective conditions and thereby improves prediction accuracy.

When determining the trend correlation index, the second weight can be calculated by combining each air quality prediction model's trend prediction accuracy in the environment with the highest overall similarity of air indicators and its trend prediction accuracy in the environment with the most similar pollution levels. Models with higher trend prediction accuracy in similar objective environments receive higher weight, which raises the probability of selecting more accurate models and improves the prediction accuracy of future air indicator data.

When determining the accuracy index, the third weight can be calculated by combining each air quality prediction model's prediction accuracy in the environment with the highest overall similarity of air indicators and its prediction accuracy in the environment with the most similar pollution level. The air quality sub-index is then used to convert the air indicator data into a score that can be used for air quality comparison, improving the applicability and objectivity of the accuracy score. Based on the air quality sub-index, the prediction accuracy of each model in the current environment can be determined. Models with higher prediction accuracy in environments similar to the current one, and with similar pollution levels, receive more weight, which raises the probability of selecting them and improves the accuracy of subsequent predictions.

[0019] It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only, and are not intended to limit the invention. Other features and aspects of the invention will become clearer from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Attached Figure Description

[0020] To illustrate the technical solutions in the embodiments of the present invention or the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort. Figure 1 shows an exemplary flowchart of an air quality data screening method based on data consistency verification according to an embodiment of the present invention. Figure 2 shows an exemplary schematic diagram of the application of an air quality data screening method based on data consistency verification according to an embodiment of the present invention. Figure 3 shows an exemplary block diagram of an air quality data screening system based on data consistency verification according to an embodiment of the present invention.

Detailed Implementation

[0021] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0022] The technical solution of the present invention will be described in detail below with reference to specific embodiments. These specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments.

[0023] Figure 1 shows an exemplary flowchart of an air quality data screening method based on data consistency verification according to an embodiment of the present invention. The method comprises:
Step S1: Obtain multiple types of predicted air quality index data, produced by multiple air quality prediction models, for multiple time points within the past first preset time period;
Step S2: Obtain multiple types of measured air quality index data at the multiple time points within the past first preset time period;
Step S3: Determine the error index between the predicted air quality index data output by each air quality prediction model and the measured air quality index data at the multiple time points;
Step S4: Determine the trend correlation index based on the change trend of the predicted air quality index data output by each air quality prediction model across the multiple time points and the change trend of the measured air quality index data across the same time points;
Step S5: Select a target air quality prediction model from the multiple air quality prediction models based on the error index and the trend correlation index;
Step S6: Predict, with the target air quality prediction model, the air quality index data for multiple time points within the future second preset time period to obtain air quality index forecast data.

[0024] Example 1: According to an embodiment of the present invention, in step S1, multiple types of predicted air quality index data, produced by multiple air quality prediction models, are obtained for multiple time points within a first preset time period. Air quality data vary strongly over time, so data from the most recent three days can represent air quality changes over a short period. Accordingly, ecological and environmental management focuses primarily on 72-hour forecast results, which effectively support air pollution control and heavy pollution emergency response. The first preset time period can therefore be three days. Within the first preset time period, multiple air quality prediction models (e.g., numerical models including CAMx, CMAQ, NAQPMS, and WRF-Chem; statistical models including XGBoost, LightGBM, DNN, and LSTM; and ensemble models such as OEF) predict multiple types of air quality index data (e.g., the concentrations of major air pollutants such as PM2.5, PM10, maximum 8-hour ozone, sulfur dioxide, nitrogen dioxide, and carbon monoxide) to obtain the multiple types of predicted air quality index data. The time interval between adjacent time points within the first preset time period can be 3 hours, and each air quality prediction model produces one set of predicted air quality index data per time point. For example, if the first preset time period is 3 days and the interval between adjacent time points is 1 hour or 3 hours, one air quality prediction model yields 72 or 24 sets of predicted air quality index data, respectively. This invention does not limit the time interval between adjacent time points, the type of air quality prediction model, the type of air quality index data, or the length of the first preset time period.
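The number of prediction sets per model follows directly from the period length and the interval between time points. The short Python check below works through that arithmetic; the function name is illustrative.

```python
def sets_per_model(period_days: int, interval_hours: int) -> int:
    """Number of time points (prediction sets) one model yields in the period."""
    total_hours = period_days * 24
    if total_hours % interval_hours != 0:
        raise ValueError("interval must divide the period evenly")
    return total_hours // interval_hours

print(sets_per_model(3, 1))  # 72 sets at a 1-hour interval
print(sets_per_model(3, 3))  # 24 sets at a 3-hour interval
```

A 3-day period therefore gives 72 sets at a 1-hour interval and 24 sets at a 3-hour interval.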

[0025] According to one embodiment of the present invention, in step S2, multiple types of measured air quality index data at the multiple time points within the past first preset time period are also obtained. Measuring the level of each air quality index at those time points gives its actual value, that is, the measured air quality index data, which is compared with the predicted air quality index data.

[0026] Example 2: According to an embodiment of the present invention, in step S3, determining the error index between the predicted air quality index data output by each air quality prediction model and the measured air quality index data at the multiple time points includes: selecting a reference time period from the historical time periods based on the multiple types of measured air quality index data within the first preset time period, and obtaining reference air quality index data within the reference time period; and determining the error index of each air quality prediction model based on the predicted air quality index data output by each air quality prediction model and the measured air quality index data within the first preset time period, together with the reference time period.

[0027] According to one embodiment of the present invention, the historical time period can be the past year or the past three years. A time period whose air quality is similar to that of the most recent first preset time period can be selected from it as the reference time period. Because the applicable scenarios of the air quality prediction models differ (for example, some models perform better in certain seasons), the performance of each model within the reference time period provides a reference for selecting among the air quality prediction models.

[0028] According to an embodiment of the present invention, screening the reference time period from the historical time period based on the multiple measured air quality index data within the first preset time period, and obtaining the reference air quality index data within the reference time period, includes: obtaining a second time period prior to the first preset time period, wherein the end time of the second time period is adjacent to the start time of the first preset time period; obtaining the multiple measured air quality index data within the second time period, and forming, for each air quality index, a measured air quality index data sequence from the measured data within the second time period and within the first preset time period; obtaining a measured air quality index data matrix from the multiple measured air quality index data sequences; processing the measured air quality index data matrix with a 1D convolutional neural network model to obtain measured air quality index feature information; processing, with the same 1D convolutional neural network model, the historical air quality index data matrix composed of the historical air quality index data inside a sliding time window located before the start time of the second time period, to obtain historical air quality index feature information, wherein the starting point of the sliding time window at its initial position is the starting point of the historical time period, the ending point of the sliding time window after its last slide is the moment before the start of the second time period, and the length of the sliding time window equals the total duration of the second time period and the first preset time period; determining the feature similarity between the measured air quality index feature information and the historical air quality index feature information at each position of the sliding time window; determining the target starting point and target ending point of the sliding time window at the position where the feature similarity is highest; taking the target ending point as the ending point of the reference time period, and the target ending point minus the duration of the first preset time period as the starting point of the reference time period, to obtain the reference time period; and obtaining the reference air quality index data within the reference time period.

[0029] According to one embodiment of the present invention, the duration of the second time period is the same as that of the first preset time period, for example, the duration of the second time period is also 72 hours, and the end time of the second time period is adjacent to the start time of the first preset time period. Furthermore, a measured air index data sequence composed of measured air index data in the second time period and measured air index data in the first preset time period can be obtained, and a set of measured air index data sequences can be obtained for each air index.

[0030] According to one embodiment of the present invention, each measured air quality index sequence can be used as a row vector of a measured air quality index data matrix, thereby combining multiple measured air quality index sequences to obtain a measured air quality index data matrix. The measured air quality index data matrix can then be input into a 1D convolutional neural network model for processing. In the example, the convolution kernel of the 1D convolutional neural network model can process multiple measured air quality index data in the measured air quality index data matrix according to the temporal direction. Furthermore, the 1D convolutional neural network model can have multiple convolution kernels and multiple levels, ultimately obtaining a 1-dimensional vector of measured air quality index feature information.
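The temporal convolution described above can be illustrated with a minimal sketch. It assumes a single convolution layer with ReLU activation and global average pooling, whereas the embodiment allows multiple kernels and multiple levels; the function name `conv1d_features`, the kernel weights, and the sample values are hypothetical and untrained.

```python
from typing import List

Matrix = List[List[float]]  # rows = air quality indexes, columns = time points


def conv1d_features(matrix: Matrix, kernels: List[Matrix]) -> List[float]:
    """Slide each kernel along the time axis and pool to one value per kernel.

    Each kernel holds one row of weights per index, so it mixes all indexes
    while moving only in the temporal direction.
    """
    n_times = len(matrix[0])
    features = []
    for kernel in kernels:
        width = len(kernel[0])
        responses = []
        for t in range(n_times - width + 1):
            acc = 0.0
            for row, weights in zip(matrix, kernel):
                for k in range(width):
                    acc += row[t + k] * weights[k]
            responses.append(max(acc, 0.0))  # ReLU activation
        features.append(sum(responses) / len(responses))  # global average pooling
    return features


# Two indexes (e.g. PM2.5 and ozone) over six time points; values are illustrative.
measured = [
    [35.0, 40.0, 55.0, 60.0, 48.0, 42.0],
    [80.0, 82.0, 90.0, 95.0, 88.0, 85.0],
]
# Hypothetical, untrained kernels of width 3 (one weight row per index).
kernels = [
    [[-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]],      # responds to a rise in the first index
    [[0.0, 0.0, 0.0], [1 / 3, 1 / 3, 1 / 3]],  # moving average of the second index
]
feature_vector = conv1d_features(measured, kernels)
print(feature_vector)  # one feature per kernel, i.e. a 1-dimensional vector
```

The output has one element per kernel regardless of how many indexes the matrix contains, which is what allows two time windows to be compared through a single vector.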

[0031] According to one embodiment of the present invention, a sliding time window can be set. At its initial position, the starting point of the sliding time window is the starting point of the historical time period; after the last slide, its ending point is the moment preceding the start of the second time period. The window slides by one moment at a time, that is, after each slide both its starting point and its ending point move one moment later. The length of the sliding time window equals the total duration of the second time period and the first preset time period. At each position of the sliding time window, the historical air quality index data matrix composed of the historical air quality index data within the window can be processed by the 1D convolutional neural network model to obtain historical air quality index feature information. The historical air quality index feature information is a 1-dimensional vector with the same number of elements as the measured air quality index feature information.

[0032] According to one embodiment of the present invention, the historical air quality index feature information corresponding to each position of the sliding time window can be compared with the measured air quality index feature information. The higher the similarity, the more similar the air quality index at that position of the sliding time window is to the air quality index in the second time period and the first preset time period, that is, the more similar the external scene is. Therefore, the feature similarity (e.g., cosine similarity) between the historical air quality index feature information and the measured air quality index feature information corresponding to each sliding of the sliding time window can be calculated, and the time period included in the sliding time window with the highest feature similarity is taken as the reference time period. That is, the target endpoint is taken as the endpoint of the reference time period, and the difference between the target endpoint and the duration of the first preset time period is taken as the starting point of the reference time period, thus obtaining the reference time period. Within the reference time period, the air quality index is most similar to the air quality index in the first preset time period, and the change pattern of the air quality index before and within the reference time period is also most similar to the change pattern of the air quality index in the second time period and the first preset time period. Reference air quality index data within the reference time period can be obtained as a reference.
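The sliding-window search for the reference time period can be sketched as follows. This is an illustration under stated assumptions, not the embodiment itself: `toy_extract` is a hypothetical stand-in for the trained 1D convolutional neural network model, `history` is assumed to end at the moment before the second time period begins, and periods are expressed as index ranges with an exclusive end.

```python
import math
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]  # rows = air quality indexes, columns = time points


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_reference_period(
    history: Matrix,   # historical data ending just before the second time period
    recent: Matrix,    # second time period followed by first preset time period
    first_len: int,    # number of time points in the first preset time period
    extract: Callable[[Matrix], List[float]],
) -> Tuple[int, int, float]:
    """Return (start, end, similarity) of the reference period, end exclusive."""
    window = len(recent[0])
    target = extract(recent)
    best = (-1, -1, -math.inf)
    for start in range(len(history[0]) - window + 1):  # slide one moment at a time
        candidate = [row[start:start + window] for row in history]
        sim = cosine_similarity(extract(candidate), target)
        if sim > best[2]:
            end = start + window                 # target ending point
            best = (end - first_len, end, sim)   # keep the last first_len points
    return best


def toy_extract(matrix: Matrix) -> List[float]:
    """Hypothetical feature extractor: step-to-step changes of every index."""
    return [b - a for row in matrix for a, b in zip(row, row[1:])]


history = [[10, 12, 11, 15, 22, 30, 24, 18, 12, 11]]
recent = [[14, 21, 29, 23]]  # 2 points of second period + 2 points of first period
reference = find_reference_period(history, recent, first_len=2, extract=toy_extract)
print(reference)  # window [3, 7) matches best, so the reference period is [5, 7)
```

Because the window covers both the second time period and the first preset time period, a high similarity means that the lead-up to the reference period also resembles the lead-up to the first preset time period.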

[0033] In this way, features can be extracted from data of various air quality indicators using a 1D convolutional neural network model. The feature similarity of air quality indicators can be calculated successively by using a sliding time window. During the screening process, not only are the time periods with the highest similarity to air quality indicators within the first preset time period selected, but also the time periods with similar change patterns from the second time period to the first preset time period are selected. This allows for the selection of reference time periods with the highest comprehensive similarity to various air quality indicators, providing more valuable reference air quality indicator data for selecting the optimal model.

[0034] Example 3: According to one embodiment of the present invention, the above-described 1D convolutional neural network model can integrate multiple types of air quality index data at multiple time points to form 1D measured air quality index feature information, thereby facilitating the comprehensive comparison of multiple types of air quality index data. The 1D convolutional neural network model can be trained before performing the above processing.

[0035] According to an embodiment of the present invention, the training steps of the 1D convolutional neural network model include: acquiring multiple air quality index data within a first training time period and multiple air quality index data within a second training time period; for each type of air quality index data, acquiring the relative deviation values between the data at corresponding times within the first and second training time periods, obtaining the average relative deviation index over the multiple corresponding times, and obtaining the trend consistency index of the data within the first and second training time periods; obtaining the regression equation of the i-th type of air quality index data according to formula (1), which relates the similarity label of the i-th type of air quality index data within the first and second training time periods to the average relative deviation index and the trend consistency index of the i-th type of air quality index data through fitting coefficients; solving for the fitting coefficients based on the multiple air quality index data within multiple first training time periods and multiple second training time periods, and substituting the solved values into the regression equation of each type of air quality index data to obtain the similarity solution function of each type of air quality index data; acquiring multiple air quality index data within a third training time period and within a fourth training time period, and solving for the average relative deviation index and trend consistency index of each type of air quality index data; substituting the average relative deviation index and trend consistency index of each type of air quality index data into the similarity solution function of the corresponding type to obtain the similarity judgment result of each type of air quality index data; computing a weighted sum of the similarity judgment results of the multiple types of air quality index data to obtain a similarity reference value for the air quality index data within the third and fourth training time periods; processing, with the 1D convolutional neural network model, the air quality index data matrix composed of the multiple air quality index data within the third training time period to obtain third training feature information, and the air quality index data matrix composed of the multiple air quality index data within the fourth training time period to obtain fourth training feature information; determining the feature similarity between the third training feature information and the fourth training feature information; obtaining the loss function of the 1D convolutional neural network model based on the feature similarity and the similarity reference value; and training the 1D convolutional neural network model based on the loss function to obtain the trained 1D convolutional neural network model.

[0036] According to one embodiment of the present invention, the average relative deviation index and the trend consistency index of the same type of air quality index data within a first training time period and a second training time period can be obtained. The average relative deviation index can be obtained by taking the difference between the air quality index data at each pair of corresponding times, squaring the difference, and averaging the squared differences over the multiple corresponding times. In addition, the changes in the air quality index data between adjacent times can be calculated within the first training time period and within the second training time period, and the Pearson correlation coefficient between the changes at corresponding times can be calculated and used as the trend consistency index of the air quality index data within the first training time period and the second training time period.
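The two indexes described above can be sketched in a few lines. The function names and the sample series are illustrative assumptions; the computation follows the description (mean of squared differences at corresponding times, and Pearson correlation of changes between adjacent times).

```python
import math
from typing import List, Sequence


def average_deviation_index(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean of the squared differences at corresponding times."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = math.sqrt(sum((p - mx) ** 2 for p in x))
    sy = math.sqrt(sum((q - my) ** 2 for q in y))
    return cov / (sx * sy) if sx and sy else 0.0


def trend_consistency_index(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of the changes between adjacent times."""
    da: List[float] = [q - p for p, q in zip(a, a[1:])]
    db: List[float] = [q - p for p, q in zip(b, b[1:])]
    return pearson(da, db)


# One index over two training time periods; the second is the first shifted by 2.
first_period = [30.0, 36.0, 50.0, 44.0, 38.0]
second_period = [32.0, 38.0, 52.0, 46.0, 40.0]

deviation = average_deviation_index(first_period, second_period)
trend = trend_consistency_index(first_period, second_period)
print(deviation, trend)
```

In this example the two series differ by a constant offset, so the deviation index is nonzero while the trend consistency index is at its maximum, which shows why the two indexes carry complementary information.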

[0037] According to one embodiment of the present invention, the similarity of various air quality data in the first training period and the second training period can be obtained by manual annotation. The similarity annotation can be 1 or 0. For example, 1 indicates that the change pattern of the air quality data in the first training period and the second training period is similar, and 0 indicates that the change pattern of the air quality data in the first training period and the second training period is dissimilar. The annotation can be done manually in small quantities, and only annotating 0 or 1 can reduce the annotation difficulty.

[0038] According to one embodiment of the present invention, the fitting coefficients in the regression equation can be solved according to formula (1). That is, the average relative deviation index and trend consistency index of the i-th type of air quality index data for multiple pairs of first and second training time periods, together with the corresponding similarity labels, are substituted into formula (1), and the values of the fitting coefficients are solved by fitting, giving the similarity solution function of the i-th type of air quality index data. Once the similarity solution function is available, the similarity of the i-th type of air quality index data in any two time periods can be calculated automatically: the average relative deviation index and trend consistency index of the data in the two time periods are substituted into the similarity solution function, and no manual labeling of the two time periods is needed. The calculated similarity serves as the similarity label of the i-th type of air quality index data in the two time periods in the subsequent training of the 1D convolutional neural network model. The amount of training data can thus be greatly increased without additional manual labeling or labor cost, improving the thoroughness of training and the model performance. For example, substituting the average relative deviation index and trend consistency index of the i-th type of air quality index data within the third and fourth training time periods into the similarity solution function of the i-th type gives the similarity of the variation patterns of that data within the third and fourth training time periods, that is, the similarity judgment result. The similarity solution function of every type of air quality index data can be obtained in the same way, and the similarity judgment result of every type within the third and fourth training time periods can be calculated. Furthermore, the similarity judgment results of the multiple types of air quality index data can be weighted and summed to obtain a similarity reference value for the air quality index data within the third and fourth training time periods. This similarity reference value represents the comprehensive similarity of the multiple types of air quality index data within the third and fourth training time periods and can be used as a label when training the 1D convolutional neural network model.
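The fitting and weighting steps can be sketched as below. The form of formula (1) is not available here, so the sketch assumes, purely for illustration, a linear regression `label ≈ a * deviation + b * trend + c` solved by ordinary least squares; the function names, sample values, and per-index weights are likewise hypothetical.

```python
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_similarity(samples: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Least-squares fit over (deviation index, trend index, manual 0/1 label)."""
    rows = [(d, t, 1.0) for d, t, _ in samples]
    labels = [s for _, _, s in samples]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(3)]
    return solve_linear(ata, atb)  # assumed fitting coefficients [a, b, c]


def similarity(coeffs: Sequence[float], deviation: float, trend: float) -> float:
    a, b, c = coeffs
    return a * deviation + b * trend + c


# Manually labeled pairs of training periods for one index (illustrative values).
samples = [
    (5.0, 0.95, 1.0), (2.0, 0.92, 1.0), (8.0, 0.98, 1.0),
    (100.0, 0.1, 0.0), (80.0, -0.1, 0.0), (120.0, 0.3, 0.0),
]
coeffs = fit_similarity(samples)

# Weighted sum of per-index similarity judgment results (hypothetical weights).
judgments = [0.9, 0.5]
weights = [0.6, 0.4]
reference_value = sum(w * s for w, s in zip(weights, judgments))
print(coeffs, reference_value)
```

After fitting, `similarity(coeffs, deviation, trend)` can label any pair of periods for that index without further manual annotation, which is the role the similarity solution function plays in the text.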

[0039] According to one embodiment of the present invention, an air index data matrix composed of multiple air index data in a third training time period and an air index data matrix composed of multiple air index data in a fourth training time period can be processed by a 1D convolutional neural network model to obtain third training feature information and fourth training feature information. The feature similarity (e.g., cosine similarity) between the third training feature information and the fourth training feature information is then calculated. The loss function is then calculated using the feature similarity and the aforementioned similarity reference value. For example, the similarity reference value is used as a label, and the absolute value of the error between the feature similarity obtained by the 1D convolutional neural network model and the similarity reference value is used as the loss function. By backpropagating the loss function, the parameters of the 1D convolutional neural network model are adjusted using the gradient descent method. After multiple training iterations, a trained 1D convolutional neural network model is obtained, which can be used in the above-mentioned feature similarity calculation process.
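The example loss mentioned above (absolute error between the cosine similarity of the two feature vectors and the similarity reference value) can be written as a short sketch. The feature vectors and the reference value are illustrative; in practice the loss would be computed inside a framework that supports backpropagation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_loss(
    third_features: Sequence[float],
    fourth_features: Sequence[float],
    reference_value: float,
) -> float:
    """Absolute error between the feature similarity and its label."""
    return abs(cosine_similarity(third_features, fourth_features) - reference_value)


third_features = [1.0, 0.0, 2.0]
fourth_features = [2.0, 0.0, 4.0]   # same direction, so cosine similarity is 1
loss = similarity_loss(third_features, fourth_features, reference_value=0.8)
print(loss)
```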

[0040] In this way, the similarity function for each air quality index can be solved by fitting, and used as annotation information for training the 1D convolutional neural network model. This can significantly increase the amount of annotation with only a small amount of manual annotation, thereby increasing the amount of training data and improving the training strength and effect of the 1D convolutional neural network model. Moreover, compared with directly using the similarity function, the trained 1D convolutional neural network model can further integrate the relationships between multiple air quality index data, improve the nonlinearity within the model, and more accurately express the comprehensive characteristics of multiple air quality index data within a time period.

[0041] Example 4: According to one embodiment of the present invention, the error index reflects the prediction error level of each model and thus describes its prediction accuracy. Determining the error index of each air quality prediction model based on the predicted air quality index data output by that model within the first preset time period, the measured air quality index data within the first preset time period, and the reference time period includes: acquiring first predicted data produced by each air quality prediction model for multiple air quality indicators within a second time period following the reference time period, and first measured data of the multiple air quality indicators within that second time period; determining, from the first predicted data and the first measured data, a first mean absolute error of each air quality prediction model for each air quality indicator; determining the pollution level of each air quality indicator within the second time period, and selecting a comparison time period from the historical time period based on the pollution level; acquiring second predicted data produced by each air quality prediction model for the air quality indicators within the comparison time period, and second measured data of the air quality indicators within the comparison time period; obtaining a second mean absolute error from the second predicted data and the second measured data; obtaining a first weight of the air quality prediction model from the first and second mean absolute errors; determining a third mean absolute error from the predicted air quality index data and the measured air quality index data within the first preset time period; and determining the error index of each air quality prediction model based on the first weight and the third mean absolute error.

[0042] According to one embodiment of the present invention, the comprehensive similarity of the air quality indicators between the reference time period and the first preset time period is the highest. The prediction accuracy of each model for the air quality indicators within the second preset time period (e.g., 72 hours) after the first preset time period can therefore be estimated by referring to the prediction accuracy of each model within the second time period after the reference time period. To this end, the first mean absolute error between the first predicted data and the first measured data within the second time period after the reference time period can be determined. The first mean absolute error is calculated by taking the absolute value of the error between the first predicted data and the first measured data at each corresponding time and averaging the absolute values over the multiple times within the second time period.
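A minimal sketch of the mean absolute error calculation described above follows; the function name and the sample values are illustrative.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Average of |predicted - measured| over corresponding times."""
    if not predicted or len(predicted) != len(measured):
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)


# One model, one indicator, four time points in the period after the reference period.
first_predicted = [42.0, 55.0, 61.0, 48.0]
first_measured = [40.0, 58.0, 65.0, 47.0]
first_mae = mean_absolute_error(first_predicted, first_measured)
print(first_mae)  # 2.5
```

The same function applies unchanged to the second and third mean absolute errors, since only the time period over which the data are taken differs.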

[0043] According to one embodiment of the present invention, a comparison time period can be selected from the historical time period based on the pollution levels of the air quality indicators within the second time period after the reference time period. For example, for one air quality indicator, the average of its measured data within the second time period can be determined, and the pollution level of that indicator within the second time period can be determined from this average. Since the comprehensive similarity of the air quality indicator data between the reference time period and the first preset time period is the highest, the pollution level in the second preset time period after the first preset time period is likely to be similar to the pollution level within the second time period after the reference time period. Models with higher prediction accuracy at that pollution level can therefore be preferred. For this purpose, a comparison time period with the same pollution level can be selected to determine the prediction accuracy of each model at that pollution level. In the example, a sliding time window (whose duration equals that of the second time period) can be used, and the position of the sliding time window at which the pollution level of the air quality indicator is closest to the pollution level within the second time period is taken as the comparison time period. If multiple candidate time periods have the same pollution level, the one whose average air quality indicator value is closest to the average within the second time period can be selected as the comparison time period.
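The selection of the comparison time period can be sketched as follows for a single indicator. The breakpoints in `LEVEL_BREAKPOINTS` are hypothetical placeholders rather than values taken from any standard, and the function names are illustrative; windows are ranked first by how close their pollution level is to the target level and then by how close their mean is to the target mean.

```python
from typing import List, Sequence, Tuple

# Hypothetical upper bounds of pollution levels 0, 1, 2, ... for one indicator.
LEVEL_BREAKPOINTS: List[float] = [35.0, 75.0, 115.0, 150.0, 250.0]


def pollution_level(mean_value: float) -> int:
    for level, upper in enumerate(LEVEL_BREAKPOINTS):
        if mean_value <= upper:
            return level
    return len(LEVEL_BREAKPOINTS)


def select_comparison_period(
    history: Sequence[float], target: Sequence[float]
) -> Tuple[int, int]:
    """Return (start, end) of the best-matching history window, end exclusive."""
    width = len(target)
    target_mean = sum(target) / width
    target_level = pollution_level(target_mean)

    def rank(start: int) -> Tuple[int, float]:
        mean = sum(history[start:start + width]) / width
        return (abs(pollution_level(mean) - target_level), abs(mean - target_mean))

    best_start = min(range(len(history) - width + 1), key=rank)
    return best_start, best_start + width


history = [20.0, 30.0, 60.0, 80.0, 90.0, 70.0, 40.0, 25.0]
second_period = [78.0, 88.0, 72.0]   # measured data after the reference period
comparison = select_comparison_period(history, second_period)
print(comparison)  # (3, 6): same level as the target and the closest mean
```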

[0044] According to one embodiment of the present invention, second predicted data and second measured data within a comparison time period can be obtained, and then a second mean absolute error can be calculated. The method for calculating the second mean absolute error is similar to that for calculating the first mean absolute error, and will not be repeated here. The second mean absolute error can reflect the prediction accuracy of each model under a specific pollution level.

[0045] According to one embodiment of the present invention, the prediction accuracy of each model for air quality indicators in the second preset time period after the first preset time period and the prediction accuracy of each model under a specific pollution level can be combined to determine the weight of each model. That is, the first weight can give higher weight to the model with higher prediction accuracy for air quality indicators in the second preset time period after the first preset time period and the model with higher prediction accuracy under a specific pollution level, thereby increasing the probability of the model with higher prediction accuracy being selected, and thus improving the prediction accuracy.

[0046] According to an embodiment of the present invention, obtaining the first weight of the air quality prediction model based on the first mean absolute error and the second mean absolute error includes: obtaining the first weight of the j-th air quality prediction model for the i-th air quality index according to formula (2), which is expressed in terms of the first mean absolute error of the j-th air quality prediction model in predicting the i-th air quality index, the second mean absolute error of the j-th air quality prediction model in predicting the i-th air quality index, and m, the number of air quality prediction models.

[0047] According to one embodiment of the present invention, formula (2) is the product of two factors. The factor derived from the first mean absolute error represents the weight corresponding to the prediction accuracy of the j-th air quality prediction model for the air quality index in the second preset time period after the first preset time period; the higher the prediction accuracy, the higher the weight. The factor derived from the second mean absolute error represents the weight corresponding to the prediction accuracy of the j-th air quality prediction model at a specific pollution level; again, the higher the prediction accuracy, the higher the weight. Multiplying the two yields the first weight of the j-th air quality prediction model for the i-th air quality index. The higher the overall prediction accuracy, the higher the first weight.

[0048] According to one embodiment of the present invention, determining the error index of each air quality prediction model based on the first weight and the third mean absolute error includes: determining the error index of the j-th air quality prediction model for the i-th air quality index according to formula (3), which is expressed in terms of the third mean absolute error of the j-th air quality prediction model in predicting the i-th air quality index, the minimum and the maximum of the third mean absolute errors of the multiple air quality prediction models for the i-th air quality index, and the first weight of the j-th air quality prediction model for the i-th air quality index.

[0049] According to one embodiment of the present invention, the third mean absolute error between the predicted air quality index data and the measured air quality index data is the prediction error of each model within a first preset time period, and can be used as error data for screening models. The solution method for the third mean absolute error is similar to the solution method for the first mean absolute error described above, and will not be repeated here.

[0050] According to one embodiment of the present invention, the third mean absolute error is normalized using the minimum and maximum of the third mean absolute errors. The normalized third mean absolute error represents the prediction accuracy of the j-th air quality prediction model for the i-th air quality index within the first preset time period. Combining it with the aforementioned first weight yields the error index of the j-th air quality prediction model for the i-th air quality index. The higher the error index, the more accurate the prediction of the j-th air quality prediction model for the i-th air quality index under similar environmental conditions and similar pollution levels.
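The combination of the first weight with the normalized third mean absolute error can be sketched as below. The exact form of formula (3) is not available here, so the sketch assumes a min-max normalization oriented so that a smaller error gives a larger value, which is consistent with the statement that a higher error index indicates a more accurate model. The first weights are taken as given inputs, and all numbers are illustrative.

```python
from typing import Dict


def error_index(
    third_mae: Dict[str, float], first_weight: Dict[str, float]
) -> Dict[str, float]:
    """First weight times the (assumed) min-max normalized third MAE, per model."""
    lowest, highest = min(third_mae.values()), max(third_mae.values())
    span = highest - lowest
    result = {}
    for model, mae in third_mae.items():
        # Assumption: smaller error maps to a larger normalized value in [0, 1];
        # if all models have the same error, every model gets 1.0.
        normalized = (highest - mae) / span if span else 1.0
        result[model] = first_weight[model] * normalized
    return result


# Third MAE of three models for one air quality index (illustrative values).
third_mae = {"CMAQ": 8.0, "XGBoost": 5.0, "LSTM": 11.0}
first_weight = {"CMAQ": 0.30, "XGBoost": 0.40, "LSTM": 0.30}

indexes = error_index(third_mae, first_weight)
best_model = max(indexes, key=indexes.get)
print(indexes, best_model)
```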

[0051] In this way, reference time periods with high overall similarity can be selected, and comparison time periods with similar pollution levels can be selected. The first weight is set based on the accuracy of each model in the reference and comparison time periods, which increases the probability of selecting the more accurate model under similar objective conditions, thereby improving the prediction accuracy.

[0052] Example 5: According to an embodiment of the present invention, in step S4, the trend correlation index represents the similarity between the changing trend of the predicted air quality index data and that of the measured air quality index data. Determining the trend correlation index based on the changing trends of the predicted air quality index data output by each air quality prediction model at multiple times and the changing trends of the measured air quality index data at multiple times includes: determining, from the first predicted data and the first measured data, a first correlation coefficient of each air quality prediction model for each air quality index; obtaining a second correlation coefficient from the second predicted data and the second measured data; obtaining a second weight of the air quality prediction model from the first and second correlation coefficients; determining a third correlation coefficient from the predicted air quality index data and the measured air quality index data; and determining the trend correlation index $T_{ij}$ of the j-th air quality prediction model for the i-th air quality index according to formula (4):

$$T_{ij} = r^{(1)}_{ij}\, r^{(2)}_{ij} \cdot \frac{r^{(3)}_{ij} - r^{(3)}_{i,\min}}{r^{(3)}_{i,\max} - r^{(3)}_{i,\min}} \tag{4}$$

where $r^{(1)}_{ij}$ is the first correlation coefficient of the j-th air quality prediction model for the i-th air quality index, $r^{(2)}_{ij}$ is the second correlation coefficient of the j-th air quality prediction model for the i-th air quality index, $r^{(3)}_{ij}$ is the third correlation coefficient of the j-th air quality prediction model for the i-th air quality index, and $r^{(3)}_{i,\min}$ and $r^{(3)}_{i,\max}$ are the minimum and maximum of the third correlation coefficients of the air quality prediction models for the i-th air quality index.

[0053] According to one embodiment of the present invention, since the overall similarity of air quality indicators within the reference time period is the highest with that of the first preset time period, the probability that the change pattern of air quality indicators in the second time period after the reference time period is similar to the change pattern in the second preset time period after the first preset time period is relatively high. The accuracy of each model in predicting the change trend of air quality indicators in the second time period can, to a certain extent, reflect the accuracy of each model in predicting the change trend of air quality indicators in the second preset time period. Therefore, based on the accuracy of each model in predicting the change trend of air quality indicators in the second time period, models with higher accuracy in predicting the change trend of air quality indicators in the second preset time period can be selected.

[0054] According to one embodiment of the present invention, the change in the first predicted data at adjacent time points and the change in the first measured data at adjacent time points can be calculated. Further, the Pearson correlation coefficient of the change at corresponding time points can be calculated as the first correlation coefficient. This correlation coefficient can serve as an indicator describing the similarity between the predicted change trend of an air quality indicator by an air quality prediction model over a second time period and the actual change trend of that air quality indicator over the second time period. Similarly, a second correlation coefficient can be obtained, which can be used to describe the similarity between the predicted change trend of an air quality prediction model for an air quality indicator within a comparative time period with the closest pollution levels and the actual change trend of that air quality indicator within the comparative time period. The first and second correlation coefficients can be multiplied to obtain a second weight, which can assign a higher weight to models that predict change trends more accurately within time periods with high overall similarity and similar pollution levels. This increases the probability of selecting models with more accurate prediction trends, thereby improving prediction accuracy.

[0055] According to one embodiment of the present invention, a third correlation coefficient can be calculated in the same way. It describes the similarity between the trend of an air quality index predicted by an air quality prediction model within the first preset time period and the actual trend of that air quality index within the first preset time period. Further, since Pearson correlation coefficients are confined to a narrow range (between -1 and 1), the third correlation coefficients can be standardized to increase the contrast between them. That is, for the multiple air quality prediction models predicting one air quality index, the minimum of the third correlation coefficients is subtracted from the third correlation coefficient of a given model, and the ratio of this difference to the difference between the maximum and minimum of the third correlation coefficients is taken as the standardized correlation coefficient of that model.

[0056] According to one embodiment of the present invention, the product of the first correlation coefficient and the second correlation coefficient is the second weight of the air quality prediction model for predicting the air quality index. Multiplying the second weight by the standardized correlation coefficient gives the trend correlation index of the air quality prediction model for that air quality index. The higher the trend correlation index, the higher the trend accuracy of the air quality prediction model when predicting the air quality index in similar objective environments.
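
As a rough sketch of how the second weight and the standardized correlation coefficient combine, the snippet below computes the trend correlation index for several models at once. The function name, the fallback used when all models share the same third correlation coefficient, and the sample coefficients are assumptions made for illustration only.

```python
import numpy as np

def trend_correlation_index(r1, r2, r3):
    """Second weight (r1 * r2) times the min-max standardized r3, per model."""
    r1, r2, r3 = (np.asarray(x, dtype=float) for x in (r1, r2, r3))
    second_weight = r1 * r2
    span = r3.max() - r3.min()
    # Assumption: if every model has the same r3, use a neutral value of 1.0
    # instead of dividing by zero.
    standardized = (r3 - r3.min()) / span if span > 0 else np.ones_like(r3)
    return second_weight * standardized

# Hypothetical coefficients of three models for one air quality index.
index = trend_correlation_index(r1=[0.9, 0.7, 0.8],
                                r2=[0.8, 0.9, 0.5],
                                r3=[0.6, 0.9, 0.3])
```

Note that min-max standardization maps the model with the lowest third correlation coefficient to zero, so that model's trend correlation index is zero regardless of its second weight.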

[0057] In this way, the second weight can be calculated by combining the trend prediction accuracy of each air quality prediction model in the environment with the highest comprehensive similarity of air indicators and the trend prediction accuracy of each air quality prediction model in the environment with the most similar pollution levels. The air quality prediction model with higher trend prediction accuracy in similar objective environments is given higher weight, which increases the probability of the more accurate model being selected, thereby improving the prediction accuracy of future air indicator data.

[0058] Example 6: According to one embodiment of the present invention, the accuracy index can be used to describe whether the measured data is within a preset range around the predicted data. If it is, the prediction is accurate; if not, the prediction is inaccurate.

[0059] According to an embodiment of the present invention, the method further includes: determining a first accuracy rate for each air quality prediction model to predict each air quality index based on the first predicted data, the first measured data, and a preset accuracy range; determining a second accuracy rate for each air quality prediction model to predict each air quality index based on the second predicted data, the second measured data, and the preset accuracy range; obtaining a third weight of the air quality prediction model based on the first accuracy rate and the second accuracy rate; determining an accuracy score for each air quality prediction model to predict each air quality index based on the predicted air quality index data, the measured air quality index data, and the preset accuracy range; and determining the accuracy index of the j-th air quality prediction model for the i-th air quality index according to formula (5):

$$Z_{i,j} = P^{(1)}_{i,j} \cdot P^{(2)}_{i,j} \cdot S_{i,j} \tag{5}$$

where $Z_{i,j}$ is the accuracy index of the j-th air quality prediction model for the i-th air quality index, $P^{(1)}_{i,j}$ is the first accuracy rate of the j-th air quality prediction model for predicting the i-th air quality index, $P^{(2)}_{i,j}$ is the second accuracy rate of the j-th air quality prediction model for predicting the i-th air quality index, and $S_{i,j}$ is the accuracy score of the j-th air quality prediction model for predicting the i-th air quality index.

[0060] According to one embodiment of the present invention, the preset accuracy range is related to the first predicted data. For example, if the first predicted data for an air quality index belongs to the "excellent" level, the range of ±10% centered on the first predicted data is the preset accuracy range; if the first predicted data for an air quality index belongs to the "lightly polluted" level, the range of ±20% centered on the first predicted data is the preset accuracy range. If the first measured data at the corresponding time point falls within the preset accuracy range, the first predicted data can be determined to be accurate. The share of accurate first predicted data over all time points within the second time period can then be counted to obtain the first accuracy rate of the air quality prediction model for that air quality index. Similarly, a second accuracy rate of the air quality prediction model for that air quality index within the comparison time period can be determined. The first accuracy rate represents the accuracy rate of the air quality prediction model in the environment with the highest overall similarity, and the second accuracy rate represents the accuracy rate of the air quality prediction model in the environment with the closest pollution levels. The product of the two can be used as the third weight, thus assigning higher weights to models with higher prediction accuracy in environments similar to the current environment and with similar pollution levels, increasing the probability of selecting models with higher prediction accuracy, and thereby improving the accuracy of subsequent predictions.
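
The accuracy-rate calculation can be illustrated with a short sketch. It assumes NumPy and that the pollution level of each predicted value is already known; the names `TOLERANCE` and `accuracy_rate`, the sample data, and the value used for the second accuracy rate are hypothetical. Only the two tolerances given as examples in the text are included.

```python
import numpy as np

# Relative tolerances from the two examples in the text; other pollution
# levels would need their own values, which the text does not give.
TOLERANCE = {"excellent": 0.10, "lightly polluted": 0.20}

def accuracy_rate(predicted, measured, levels, tolerance=TOLERANCE):
    """Share of time points whose measured value lies in the preset range."""
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    tol = np.array([tolerance[level] for level in levels])
    # A prediction counts as accurate if the measured value is within
    # +/- tol of the predicted value (range centered on the prediction).
    hits = np.abs(measured - predicted) <= tol * np.abs(predicted)
    return float(hits.mean())

first_rate = accuracy_rate(
    predicted=[30.0, 32.0, 90.0, 100.0],
    measured=[31.0, 40.0, 80.0, 125.0],
    levels=["excellent", "excellent", "lightly polluted", "lightly polluted"],
)
second_rate = 0.75  # hypothetical accuracy rate in the comparison time period
third_weight = first_rate * second_rate
```

In this example two of the four predictions fall inside their ranges, so the first accuracy rate is 0.5 and the third weight is 0.375.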

[0061] According to an embodiment of the present invention, the air quality sub-index of the predicted air quality index data of the j-th air quality prediction model for the i-th air quality index at the t-th time within the first preset time period can be determined according to formula (6):

$$I_{i,j,t} = \frac{I_{Hi} - I_{Lo}}{BP_{Hi} - BP_{Lo}} \left( C_{i,j,t} - BP_{Lo} \right) + I_{Lo} \tag{6}$$

where $I_{i,j,t}$ is the air quality sub-index, $C_{i,j,t}$ is the predicted air quality index data of the j-th air quality prediction model for the i-th air quality index at the t-th time within the first preset time period, $BP_{Hi}$ is the upper concentration limit of the pollution level corresponding to $C_{i,j,t}$, $BP_{Lo}$ is the lower concentration limit of the pollution level corresponding to $C_{i,j,t}$, $I_{Hi}$ is the air quality sub-index corresponding to $BP_{Hi}$, which can be obtained by looking up a table, and $I_{Lo}$ is the air quality sub-index corresponding to $BP_{Lo}$, which can also be obtained by looking up a table. Similarly, the air quality sub-index of the i-th measured air quality index at time t within the first preset time period can be determined. If the air quality sub-index of the measured air quality index data falls within a preset accuracy range of the air quality sub-index of the predicted air quality index data (e.g., ±25% of the air quality sub-index of the predicted air quality index data at the same time), the prediction is determined to be accurate; otherwise, the prediction is determined to be inaccurate. Furthermore, the prediction accuracy rate of the j-th air quality prediction model for the i-th air quality index within the first preset time period can be counted and used as the accuracy score of the j-th air quality prediction model for the i-th air quality index.
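
A minimal sketch of the sub-index lookup and the accuracy check follows. It assumes that formula (6) is the usual linear interpolation between the concentration limits of a pollution level, and it uses an assumed breakpoint table for 24-hour PM2.5; the table, the function names, and the clamping of out-of-range concentrations are illustrative choices, not details given in the application.

```python
import bisect

# Assumed breakpoints for 24-hour PM2.5 (ug/m3) and the matching sub-index
# values; substitute the lookup table that applies to each air index.
CONC_BREAKPOINTS = [0, 35, 75, 115, 150, 250, 350, 500]
IAQI_BREAKPOINTS = [0, 50, 100, 150, 200, 300, 400, 500]

def sub_index(conc):
    """Linearly interpolate the sub-index between the enclosing breakpoints."""
    # Assumption: concentrations outside the table are clamped to its ends.
    conc = min(max(conc, CONC_BREAKPOINTS[0]), CONC_BREAKPOINTS[-1])
    hi = max(1, bisect.bisect_left(CONC_BREAKPOINTS, conc))
    lo = hi - 1
    bp_lo, bp_hi = CONC_BREAKPOINTS[lo], CONC_BREAKPOINTS[hi]
    i_lo, i_hi = IAQI_BREAKPOINTS[lo], IAQI_BREAKPOINTS[hi]
    return (i_hi - i_lo) / (bp_hi - bp_lo) * (conc - bp_lo) + i_lo

def is_accurate(pred_conc, meas_conc, tolerance=0.25):
    """True if the measured sub-index is within +/- tolerance of the predicted one."""
    pred_i, meas_i = sub_index(pred_conc), sub_index(meas_conc)
    return abs(meas_i - pred_i) <= tolerance * pred_i
```

Averaging `is_accurate` over all time points in the first preset time period would give the accuracy score of one model for one air quality index.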

[0062] According to one embodiment of the present invention, the accuracy index of the j-th air quality prediction model for the i-th air quality index can be obtained by multiplying the third weight and the accuracy score, i.e., by formula (5), so as to describe the expected accuracy of the j-th air quality prediction model for the i-th air quality index under similar environmental and pollution levels.

[0063] In this way, the third weight can be calculated by combining the prediction accuracy of each air quality prediction model in the environment with the highest comprehensive similarity of air indicators and the prediction accuracy of each air quality prediction model in the environment with the most similar pollution level. The air quality sub-index is used to convert the air indicator data into a score that can be used for air quality comparison, so as to improve the applicability and objectivity of the accuracy score. Based on the air quality sub-index, the prediction accuracy of each model in the current environment is determined. This can increase the weight of the model with higher prediction accuracy in environments similar to the current environment and with similar pollution levels, increase the probability of the model with higher prediction accuracy being selected, and thus improve the accuracy of subsequent predictions.

[0064] Example 7: According to an embodiment of the present invention, in step S5, a target air quality prediction model is selected from multiple air quality prediction models based on the error index, the trend correlation index, and the accuracy index. This includes: weighting and summing the error index of the j-th air quality prediction model for the i-th air quality index, the trend correlation index of the j-th air quality prediction model for the i-th air quality index, and the accuracy index of the j-th air quality prediction model for the i-th air quality index to obtain a selection index for the j-th air quality prediction model for the i-th air quality index; and selecting the air quality prediction model corresponding to the maximum value of the selection index for the i-th air quality index as the target air quality prediction model for predicting the i-th air quality index.

[0065] According to one embodiment of the present invention, the weight of the error index can be 0.4, the weight of the trend correlation index can be 0.3, and the weight of the accuracy index can be 0.3. After weighted summation, the selection index of the j-th air quality prediction model for the i-th air quality index can be obtained. The air quality prediction model corresponding to the maximum value of the selection index is taken as the target air quality prediction model for predicting the i-th air quality index. Then, in step S6, the i-th air quality index can be predicted within a second preset time period (e.g., 72 hours). The target air quality prediction models corresponding to different air quality indices can be different from each other. Furthermore, the prediction performance of each model for various air quality indices can be tracked in real time based on the above method. Therefore, models can be selected and eliminated in real time under different objective conditions, so that the model best adapted to the real-time objective conditions can be used to predict air quality indices, thereby improving prediction accuracy.
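
The weighted selection can be sketched as below, using the example weights 0.4, 0.3, and 0.3 from the text. The function name `select_models`, the array layout (air quality indices in rows, models in columns), and the sample values are assumptions for illustration; the sketch also assumes that all three indices are scaled so that larger values are better.

```python
import numpy as np

def select_models(error_idx, trend_idx, accuracy_idx, weights=(0.4, 0.3, 0.3)):
    """Return the selection index matrix and the best model per air index.

    Each input has shape (n_indices, n_models); larger values are assumed
    to be better for all three indices.
    """
    w_err, w_trend, w_acc = weights
    selection = (w_err * np.asarray(error_idx, dtype=float)
                 + w_trend * np.asarray(trend_idx, dtype=float)
                 + w_acc * np.asarray(accuracy_idx, dtype=float))
    # The target model for each air index is the column with the largest value.
    return selection, selection.argmax(axis=1)

# Hypothetical indices for 2 air quality indices (rows) and 3 models (columns).
error_idx = [[0.9, 0.5, 0.2], [0.1, 0.6, 0.8]]
trend_idx = [[0.4, 0.7, 0.3], [0.2, 0.9, 0.5]]
accuracy_idx = [[0.5, 0.6, 0.1], [0.3, 0.4, 0.9]]
selection, best = select_models(error_idx, trend_idx, accuracy_idx)
```

Here the first air quality index is assigned model 0 and the second is assigned model 2, which shows how different air quality indices can end up with different target models.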

[0066] Example 8: Figure 2 is an exemplary schematic diagram illustrating the application of an air quality data screening method based on data consistency checks according to an embodiment of the present invention. As shown in Figure 2, first, measured data and prediction data from multiple models can be acquired, and a first weight can be calculated to determine the error index of each model for a specific air quality indicator. Second, a second weight can be calculated to determine the trend correlation index of each model for that air quality indicator. Third, a third weight can be calculated to determine the accuracy index of each model for that air quality indicator. The error index, trend correlation index, and accuracy index can then be standardized and weighted to obtain the selection index of each model for that air quality indicator. The model with the highest selection index can be used as the target model to predict the air quality indicator data for the next 72 hours.

[0067] The air quality data screening method based on data consistency verification according to an embodiment of the present invention can first screen reference time periods with reference value from historical time periods, and then determine multiple screening indicators by utilizing the performance of various air quality prediction models within a first preset time period and reference time periods. This automatically screens air quality prediction models suitable for predicting time periods after the first preset time period, breaking through the traditional static weighting or manual screening mode. It is compatible with multi-source and multi-dimensional data, realizes real-time tracking and dynamic elimination of model performance, adapts to the rapid evolution of pollution processes, makes the screening process more accurate and automated, and provides more accurate prediction results, significantly improving operational efficiency. When screening reference time periods, features of various air quality indicators can be extracted using a 1D convolutional neural network model, and the feature similarity of air quality indicators can be calculated successively using a sliding time window. During the screening process, not only are time periods with the highest similarity to air quality indicators within the first preset time period screened, but also time periods with similar change patterns from the second time period to the first preset time period are screened. This allows for the selection of reference time periods with the highest comprehensive similarity among multiple air quality indicators, providing more valuable reference air quality indicator data for selecting the optimal model. When training a 1D convolutional neural network (CNN) model, a similarity function for each air quality indicator can be solved through fitting. 
This function provides annotation information for training the CNN model, so that a large number of annotations can be generated from only a small amount of manual annotation. This increases the amount of training data and improves the training effect of the CNN model. Furthermore, compared to directly using the similarity function, the trained CNN model can integrate the relationships between multiple air quality indicators and capture nonlinear relationships, and can thus more accurately represent the comprehensive characteristics of multiple air quality indicators over a given time period. When determining the error index, reference time periods with high overall similarity and comparison time periods with similar pollution levels can be selected. A first weight is assigned based on the accuracy of each model within the reference and comparison time periods. This increases the probability of selecting the more accurate model under similar objective conditions, thereby improving prediction accuracy. When determining the trend correlation index, the second weight can be calculated by combining the trend prediction accuracy of each air quality prediction model in the environment with the highest comprehensive similarity of air indicators and the trend prediction accuracy of each air quality prediction model in the environment with the most similar pollution levels. 
This gives higher weight to air quality prediction models with higher trend prediction accuracy in similar objective environments, increases the probability of selecting models with higher accuracy, and thus improves the prediction accuracy of future air indicator data. When determining the accuracy index, the third weight can be calculated by combining the prediction accuracy of each air quality prediction model in the environment with the highest comprehensive similarity of air indicators and the prediction accuracy of each air quality prediction model in the environment with the most similar pollution level. The air quality sub-index is then used to convert the air indicator data into a score that can be used for air quality comparison, thereby improving the applicability and objectivity of the accuracy score. Based on the air quality sub-index, the prediction accuracy of each model in the current environment can be determined. This can increase the weight of models with higher prediction accuracy in environments similar to the current environment and with similar pollution levels, thereby increasing the probability of selecting models with higher prediction accuracy and thus improving the accuracy of subsequent predictions.
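
The sliding-window screening of the reference time period summarized above can be sketched as follows. The trained 1D convolutional neural network is replaced here by a placeholder feature extractor (per-indicator mean and standard deviation), and cosine similarity is used as the feature similarity; both are assumptions for illustration, as are all function names.

```python
import numpy as np

def extract_features(window):
    """Placeholder for the trained 1D CNN: per-indicator mean and std (assumption)."""
    return np.concatenate([window.mean(axis=0), window.std(axis=0)])

def cosine_similarity(a, b):
    """Cosine similarity, used here as an assumed feature similarity measure."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def find_reference_period(history, recent, first_len):
    """Slide a window of len(recent) over history and return the reference period.

    history:   (T, n_indicators) data before the start of the second time period.
    recent:    (L, n_indicators) second time period followed by the first
               preset time period.
    first_len: number of time points in the first preset time period.
    Returns (start, end) row positions in history, with end exclusive.
    """
    target = extract_features(recent)
    length = len(recent)
    sims = [cosine_similarity(extract_features(history[s:s + length]), target)
            for s in range(len(history) - length + 1)]
    best_start = int(np.argmax(sims))
    end = best_start + length       # end of the most similar window
    return end - first_len, end     # reference period is the tail of the window
```

The window length equals the combined length of the second time period and the first preset time period, and the reference time period is taken as the last `first_len` points of the best-matching window, mirroring the start and end point rule described for the reference time period.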

[0068] Figure 3 is an exemplary block diagram of an air quality data screening system based on data consistency verification according to an embodiment of the present invention. The system comprises: a first acquisition module, which acquires multiple types of predicted air quality index data obtained by multiple air quality prediction models at multiple moments within a first preset time period in the past; a second acquisition module, which acquires multiple types of measured air quality index data at the multiple moments within the first preset time period in the past; an error index module, which determines the error index between the predicted air quality index data output by each air quality prediction model at the multiple moments and the measured air quality index data; a trend correlation index module, which determines the trend correlation index between the change trend of the predicted air quality index data output by each air quality prediction model at the multiple moments and the change trend of the measured air quality index data; a screening module, which selects a target air quality prediction model from the multiple air quality prediction models based on the error index and the trend correlation index; and a prediction module, which, based on the target air quality prediction model, predicts air quality index data at multiple moments within a second preset time period in the future to obtain air quality index forecast data.

[0069] This invention can be a method, apparatus, system, and / or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for performing various aspects of the invention.

[0070] Those skilled in the art should understand that the embodiments of the present invention described above and shown in the accompanying drawings are merely examples and do not limit the present invention. The objectives of the present invention have been fully and effectively achieved. The functions and structural principles of the present invention have been demonstrated and explained in the embodiments, and any variations or modifications may be made to the implementation of the present invention without departing from the stated principles.

[0071] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for screening air quality data based on data consistency checks, characterized in that it comprises: acquiring multiple types of predicted air quality index data obtained by multiple air quality prediction models at multiple moments within a first preset time period in the past; acquiring multiple types of measured air quality index data at the multiple moments within the first preset time period in the past; determining the error index between the predicted air quality index data output by each air quality prediction model at the multiple moments and the measured air quality index data; determining the trend correlation index based on the change trend of the predicted air quality index data output by each air quality prediction model at the multiple moments and the change trend of the measured air quality index data at the multiple moments; selecting a target air quality prediction model from the multiple air quality prediction models based on the error index and the trend correlation index; and predicting, based on the target air quality prediction model, air quality index data at multiple moments within a second preset time period to obtain air quality index forecast data.

2. The air quality data screening method based on data consistency verification according to claim 1, characterized in that determining the error index between the predicted air quality index data output by each air quality prediction model at the multiple moments and the measured air quality index data comprises: selecting a reference time period from historical time periods based on the multiple types of measured air quality index data within the first preset time period, and obtaining reference air quality index data within the reference time period; and determining the error index of each air quality prediction model based on the predicted air quality index data output by each air quality prediction model within the first preset time period, the measured air quality index data within the first preset time period, and the reference time period.

3. The air quality data screening method based on data consistency verification according to claim 2, characterized in that selecting a reference time period from historical time periods based on the multiple types of measured air quality index data within the first preset time period, and obtaining reference air quality index data within the reference time period, comprises: obtaining a second time period preceding the first preset time period, wherein the end time of the second time period is adjacent to the start time of the first preset time period; acquiring multiple types of measured air quality index data within the second time period, and obtaining, for each type, a measured air quality index data sequence composed of the measured air quality index data within the second time period and the measured air quality index data within the first preset time period; obtaining a measured air quality index data matrix from the multiple measured air quality index data sequences; processing the measured air quality index data matrix with a 1D convolutional neural network model to obtain measured air quality index feature information; processing, with the 1D convolutional neural network model, a historical air quality index data matrix composed of the multiple types of historical air quality index data within a sliding time window before the start of the second time period, to obtain historical air quality index feature information, wherein the starting point of the sliding time window before the first slide is the starting point of the historical time period, the ending point of the sliding time window after the last slide is the moment before the start of the second time period, and the length of the sliding time window is equal to the total duration of the second time period and the first preset time period; 
determining the feature similarity between the historical air quality index feature information corresponding to each slide of the sliding time window and the measured air quality index feature information; and determining the target start point and target end point of the sliding time window at which the feature similarity is highest, taking the target end point as the end point of the reference time period and the target end point minus the duration of the first preset time period as the start point of the reference time period, thereby obtaining the reference time period and the reference air quality index data within the reference time period.

4. The air quality data screening method based on data consistency verification according to claim 3, characterized in that the training of the 1D convolutional neural network model comprises: acquiring multiple types of air quality index data within a first training time period and multiple types of air quality index data within a second training time period; obtaining the relative deviation values between air quality index data of the same type at corresponding moments within the first and second training time periods, obtaining the average relative deviation index over the multiple corresponding moments, and obtaining the trend consistency index of the same type of air quality index data within the first and second training time periods; obtaining a regression equation for the i-th type of air quality index data, in which the similarity label of the i-th type of air quality index data within the first and second training time periods is expressed, through fitting coefficients, in terms of the average relative deviation index of the i-th type of air quality index data and the trend consistency index of the i-th type of air quality index data; solving for the fitting coefficients based on the multiple types of air quality index data within multiple first training time periods and within multiple second training time periods, and substituting the solved values of the fitting coefficients into the regression equation of each type of air quality index data to obtain the similarity solution function of each type of air quality index data; acquiring multiple types of air quality index data within a third training time period and a fourth training time period, and solving for the average relative deviation index and trend consistency index of each type of air quality index data; 
substituting the average relative deviation index and trend consistency index of each type of air quality index data into the similarity solution function of the corresponding type to obtain the similarity judgment result of each type of air quality index data; weighting and summing the similarity judgment results of the multiple types of air quality index data to obtain the similarity reference value of the air quality index data within the third and fourth training time periods; processing, with the 1D convolutional neural network model, the air quality index data matrix composed of the multiple types of air quality index data within the third training time period to obtain third training feature information, and the air quality index data matrix composed of the multiple types of air quality index data within the fourth training time period to obtain fourth training feature information; determining the feature similarity between the third training feature information and the fourth training feature information; obtaining the loss function of the 1D convolutional neural network model based on the feature similarity and the similarity reference value; and training the 1D convolutional neural network model according to the loss function to obtain the trained 1D convolutional neural network model.

5. The air quality data screening method based on data consistency verification according to claim 2, characterized in that determining the error index of each air quality prediction model based on the predicted air quality index data output by each air quality prediction model within the first preset time period, the measured air quality index data within the first preset time period, and the reference time period comprises: acquiring first predicted data obtained by each air quality prediction model predicting multiple air quality indices within a second time period after the reference time period, and first measured data of the multiple air quality indices within the second time period after the reference time period; determining, based on the first predicted data and the first measured data, the first mean absolute error of each air quality prediction model for each air quality index; determining the pollution levels of the multiple air quality indices within the second time period, and selecting a comparison time period from historical time periods based on the pollution levels; obtaining second predicted data obtained by each air quality prediction model predicting the air quality indices within the comparison time period, and second measured data of the air quality indices; obtaining the second mean absolute error based on the second predicted data and the second measured data; obtaining the first weight of the air quality prediction model based on the first mean absolute error and the second mean absolute error; determining the third mean absolute error based on the predicted air quality index data and the measured air quality index data; and determining the error index of each air quality prediction model based on the first weight and the third mean absolute error.

6. The air quality data screening method based on data consistency verification according to claim 5, characterized in that obtaining the first weight of the air quality prediction model based on the first mean absolute error and the second mean absolute error comprises: obtaining the first weight of the j-th air quality prediction model for the i-th air quality index according to a formula in terms of the first mean absolute error of the j-th air quality prediction model in predicting the i-th air quality index, the second mean absolute error of the j-th air quality prediction model in predicting the i-th air quality index, and m, the number of air quality prediction models.

7. The air quality data screening method based on data consistency verification according to claim 5, characterized in that determining the error index of each air quality prediction model based on the first weight and the third mean absolute error comprises: determining the error index of the j-th air quality prediction model for the i-th air quality index according to a formula in terms of the third mean absolute error of the j-th air quality prediction model in predicting the i-th air quality index, the minimum of the third mean absolute error among the multiple air quality prediction models for the i-th air quality index, the maximum of the third mean absolute error among the multiple air quality prediction models for the i-th air quality index, and the first weight of the j-th air quality prediction model for the i-th air quality index.

8. The air quality data screening method based on data consistency verification according to claim 5, characterized in that determining the trend correlation index based on the change trend of the predicted air quality index data output by each air quality prediction model at the multiple moments and the change trend of the measured air quality index data at the multiple moments comprises: determining, based on the first predicted data and the first measured data, the first correlation coefficient of each air quality prediction model for each air quality index; obtaining the second correlation coefficient based on the second predicted data and the second measured data; obtaining the second weight of the air quality prediction model based on the first correlation coefficient and the second correlation coefficient; determining the third correlation coefficient based on the predicted air quality index data and the measured air quality index data; and determining the trend correlation index $T_{i,j}$ of the j-th air quality prediction model for the i-th air quality index according to the formula

$$T_{i,j} = r^{(1)}_{i,j} \cdot r^{(2)}_{i,j} \cdot \frac{r^{(3)}_{i,j} - r^{(3)}_{i,\min}}{r^{(3)}_{i,\max} - r^{(3)}_{i,\min}}$$

where $r^{(1)}_{i,j}$ is the first correlation coefficient of the j-th air quality prediction model for predicting the i-th air quality index, $r^{(2)}_{i,j}$ is the second correlation coefficient of the j-th air quality prediction model for predicting the i-th air quality index, $r^{(3)}_{i,j}$ is the third correlation coefficient of the j-th air quality prediction model for predicting the i-th air quality index, $r^{(3)}_{i,\min}$ is the minimum of the third correlation coefficients of the air quality prediction models for the i-th air quality index, and $r^{(3)}_{i,\max}$ is the maximum of the third correlation coefficients of the air quality prediction models for the i-th air quality index.

9. The air quality data screening method based on data consistency verification according to claim 5, characterized in that the method further comprises: determining, based on the first predicted data, the first measured data, and a preset accuracy range, the first accuracy rate of each air quality prediction model for each air quality index; determining, based on the second predicted data, the second measured data, and the preset accuracy range, the second accuracy rate of each air quality prediction model for each air quality index; obtaining the third weight of the air quality prediction model based on the first accuracy rate and the second accuracy rate; determining, based on the predicted air quality index data, the measured air quality index data, and the preset accuracy range, the accuracy score of each air quality prediction model for each air quality index; and determining the accuracy index $Z_{i,j}$ of the j-th air quality prediction model for the i-th air quality index according to the formula

$$Z_{i,j} = P^{(1)}_{i,j} \cdot P^{(2)}_{i,j} \cdot S_{i,j}$$

where $P^{(1)}_{i,j}$ is the first accuracy rate of the j-th air quality prediction model for predicting the i-th air quality index, $P^{(2)}_{i,j}$ is the second accuracy rate of the j-th air quality prediction model for predicting the i-th air quality index, and $S_{i,j}$ is the accuracy score of the j-th air quality prediction model for predicting the i-th air quality index.

10. The air quality data screening method based on data consistency verification according to claim 9, characterized in that selecting a target air quality prediction model from the multiple air quality prediction models based on the error index and the trend correlation index comprises: weighting and summing the error index of the j-th air quality prediction model for the i-th air quality index, the trend correlation index of the j-th air quality prediction model for the i-th air quality index, and the accuracy index of the j-th air quality prediction model for the i-th air quality index to obtain the selection index of the j-th air quality prediction model for the i-th air quality index; and taking the air quality prediction model corresponding to the maximum value of the selection index for the i-th air quality index as the target air quality prediction model for predicting the i-th air quality index.