A method for historical monitoring data uncertainty correction for contaminated site assessment

By using a machine learning model and interpolation to correct the uncertainty of historical monitoring data for contaminated sites, the method addresses the failure of existing technologies to correct such uncertainty effectively and achieves high accuracy and reliability in contaminated site assessment.

CN122196335A · Pending · Publication Date: 2026-06-12 · BEIJING INST OF GEOLOGY FOR MINERAL RESOURCES

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
BEIJING INST OF GEOLOGY FOR MINERAL RESOURCES
Filing Date
2026-03-03
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing methods for assessing contaminated sites have failed to effectively correct for uncertainties in historical monitoring data, resulting in inaccurate assessment results and affecting the selection of remediation plans and environmental protection measures.

Method used

Features are extracted from historical monitoring data and used to train a machine learning model that identifies uncertain data, which are then corrected by interpolation. Feature vectors are constructed by combining mean, standard deviation, skewness, sharpness and trend features, and a fully connected neural network serves as the model.

Benefits of technology

It significantly improves the accuracy and reliability of contaminated site assessment data, reduces the impact of human factors and errors on assessment results, and ensures the precision of contaminated site assessment.

Generated by Eureka AI based on patent content.

Figure CN122196335A_ABST

Abstract

The present application relates to a method for correcting the uncertainty of historical monitoring data for contaminated site evaluation, belonging to the technical field of data cleaning. The method comprises: sorting historical monitoring data in chronological order to form a time series of historical monitoring data; dividing the time series into multiple time windows according to a preset interval; extracting the feature values of each time window to construct a feature vector; inputting the feature vectors into a machine learning model and training it to obtain an uncertainty data monitoring model; identifying uncertain data in the historical monitoring data using the uncertainty data monitoring model; and correcting the uncertain data using interpolation. By introducing a machine learning model, the present application effectively identifies and corrects uncertain data in historical monitoring data and can significantly improve data accuracy and reliability in contaminated site evaluation.

Description

Technical Field

[0001] This invention relates to the field of data quality control technology, specifically to a method for correcting the uncertainty of historical monitoring data used in contaminated site assessment.

Background Technology

[0002] Contaminated site assessment is a crucial step in environmental protection and land remediation, aiming to evaluate the degree of contamination, the distribution of pollution sources, and potential environmental risks. Accurate contaminated site assessment relies not only on monitoring data but also on a comprehensive consideration of historical data, geographical information, and environmental conditions. However, in actual contaminated site monitoring, historical monitoring data often contains uncertainties. These uncertainties stem from various factors, including measurement errors, temporal and spatial differences in data collection, equipment calibration errors, changes in the external environment, and human error.

[0003] In the risk assessment of contaminated sites, the accuracy and reliability of historical monitoring data are crucial. If data uncertainties are not effectively corrected, the assessment results may be inaccurate, which in turn affects decision-makers' choices of remediation plans, land use policies, and environmental protection measures. Therefore, effectively correcting uncertainties in historical monitoring data to improve the accuracy and reliability of contaminated site assessments is a pressing problem in environmental science and pollution prevention.

[0004] Existing methods for assessing contaminated sites largely rely on empirical data correction or simple statistical analysis. However, these methods often overlook the complexity of data uncertainty and fail to fully utilize the potential value of historical monitoring data, resulting in insufficient accuracy and reliability of the corrected data. Furthermore, existing technologies generally lack methods for customized corrections based on specific conditions of contaminated sites (such as variations in different pollutant types and monitoring periods), and they also fail to effectively integrate multi-source, multi-dimensional historical monitoring data to improve the accuracy of the correction results.

[0005] Therefore, there is an urgent need for a method that can systematically correct for uncertainties in historical monitoring data of contaminated sites, which can reduce measurement errors and uncertainties in the data while preserving the actual trend of data change, so as to support more accurate risk assessment of contaminated sites and environmental decision-making.

Summary of the Invention

[0006] To address the aforementioned problems, the purpose of this invention is to provide a method for correcting the uncertainty of historical monitoring data used in the assessment of contaminated sites.

[0007] A method for correcting the uncertainty of historical monitoring data for contaminated site assessment includes:

[0008] Step 1: Obtain historical monitoring data for the target contaminated site;

[0009] Step 2: Sort the historical monitoring data in chronological order to form a time series of historical monitoring data;

[0010] Step 3: Divide the time series of historical monitoring data into multiple time windows according to preset intervals;

[0011] Step 4: Extract the feature values of each time window to construct a feature vector;

[0012] Step 5: Input the feature vectors into the machine learning model for training to obtain the uncertainty data monitoring model;

[0013] Step 6: Use the aforementioned uncertainty data monitoring model to identify uncertain data in historical monitoring data;

[0014] Step 7: Use interpolation to correct for uncertain data.

[0015] Preferably, step 4: extracting feature values from each time window to construct a feature vector includes:

[0016] Step 4.1: Calculate the mean and standard deviation of the data within the time window;

[0017] Step 4.2: Calculate the skewness and sharpness of the data distribution within the time window based on the mean and standard deviation of the data;

[0018] Step 4.3: Fit a feature function of the data changing over time within the time window, and obtain trend features based on this feature function;

[0019] Step 4.4: Construct feature vectors using mean, standard deviation, skewness, sharpness, and trend characteristics.

[0020] Preferably, in step 4.2, the formula for calculating the skewness is:

[0021]

[0022] where the terms denote, respectively: the skewness; the number of data points within the time window; the i-th data point within the time window; the mean; and the standard deviation.

[0023] Preferably, in step 4.2, the formula for calculating sharpness is:

[0024]

[0025] where the terms denote, respectively: the sharpness; the number of data points within the time window; the i-th data point within the time window; the mean; and the standard deviation.

[0026] Preferably, step 4.3: fitting a feature function of the data changing over time within the time window, and obtaining trend features based on this feature function, including:

[0027] A first-order linear model is used to fit the data within the time window to obtain a feature function, and the slope and intercept of this feature function are used as trend features; the formula for calculating the trend features is as follows:

[0028]

[0029] where the terms denote, respectively: the set of data points within the time window; the slope; the intercept; the feature function; the data point at time t within the time window; and the mean time within the window.

[0030] Preferably, step 5: inputting the feature vector into a machine learning model for training to obtain an uncertainty data monitoring model includes:

[0031] The feature vector is input into a fully connected neural network, and the weights and biases of the fully connected neural network are updated through backpropagation to obtain an uncertainty data monitoring model; the loss function during training is:

[0032]

[0033] where the terms denote, respectively: the loss function; the number of data points; the predicted value of the i-th data point; and the true value of the i-th data point.

[0034] Preferably, step 7: correcting the uncertain data using interpolation includes:

[0035] Step 7.1: Obtain adjacent data points for uncertain data;

[0036] Step 7.2: Correct the uncertain data by using the linear relationship between adjacent data points; the correction formula is as follows:

[0037]

[0038] where the terms denote, respectively: the corrected data value; the timestamp of the uncertain data point; the timestamp of the first adjacent data point; the timestamp of the second adjacent data point; the value of the first adjacent data point; and the value of the second adjacent data point.

[0039] The present invention also provides an electronic device, including a bus, a transceiver, a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the transceiver, the memory, and the processor are connected via the bus, characterized in that the computer program, when executed by the processor, implements the steps in the above-described method for correcting the uncertainty of historical monitoring data for contaminated site assessment.

[0040] The present invention also provides a computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps in the above-described method for correcting the uncertainty of historical monitoring data for the assessment of contaminated sites.

[0041] According to specific embodiments provided by the present invention, the present invention discloses the following technical effects:

[0042] This invention relates to a method for correcting uncertainties in historical monitoring data for contaminated site assessment. Compared with existing technologies, this invention introduces a machine learning model to effectively identify and correct uncertainties in historical monitoring data, significantly improving the accuracy and reliability of data in contaminated site assessment. Compared with traditional empirical correction methods, this invention can automatically identify and correct uncertainties in the data, reducing the impact of human factors and errors on the assessment results, thereby ensuring the accuracy of contaminated site assessment.

[0043] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, preferred embodiments are described below in detail with reference to the accompanying drawings.

Attached Figure Description

[0044] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0045] Figure 1 is a flowchart of the method for correcting the uncertainty of historical monitoring data for the assessment of contaminated sites provided by the present invention.

Detailed Implementation

[0046] In the description of this invention, it should be understood that the terms "center," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," and "counterclockwise," etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings. They are only for the convenience of describing this invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, they should not be construed as limitations on this invention.

[0047] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this invention, "a plurality of" means two or more, unless otherwise explicitly specified.

[0048] In this invention, unless otherwise explicitly specified and limited, the terms "installation," "connection," "linking," and "fixing," etc., should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal connection of two components. Those skilled in the art can understand the specific meaning of the above terms in this invention according to the specific circumstances.

[0049] Referring to Figure 1, a method for correcting the uncertainty of historical monitoring data for contaminated site assessment comprises:

[0050] Step 1: Obtain historical monitoring data for the target contaminated site;

[0051] Step 2: Sort the historical monitoring data in chronological order to form a time series of historical monitoring data;

[0052] Step 3: Divide the time series of historical monitoring data into multiple time windows according to preset intervals;

[0053] In order to use machine learning models to model time series, historical monitoring data is divided into time windows, and statistical and dynamic features are extracted within each time window.
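Steps 2 and 3 can be sketched in Python as follows. This is a minimal illustration under assumptions the text does not state: each record is a (timestamp, value) pair, and the "preset interval" is a fixed, non-overlapping time span counted from the first sample. The function name `build_windows` is hypothetical.

```python
from datetime import datetime, timedelta


def build_windows(records, window):
    """Steps 2-3 (sketch): sort (timestamp, value) records chronologically and
    group them into consecutive, non-overlapping windows of length `window`.

    records: iterable of (datetime, float) pairs; window: a timedelta.
    Returns the non-empty windows in time order, each a list of pairs.
    """
    series = sorted(records, key=lambda r: r[0])  # Step 2: chronological order
    if not series:
        return []
    start = series[0][0]
    buckets = {}
    for ts, value in series:
        # Index of the fixed-length interval, counted from the first sample.
        idx = int((ts - start) / window)
        buckets.setdefault(idx, []).append((ts, value))
    # Intervals that contain no samples produce no window.
    return [buckets[k] for k in sorted(buckets)]


if __name__ == "__main__":
    t0 = datetime(2020, 1, 1)
    data = [(t0 + timedelta(days=d), float(d)) for d in (9, 0, 3, 1, 8, 4)]
    for w in build_windows(data, timedelta(days=5)):
        print([v for _, v in w])
    # [0.0, 1.0, 3.0, 4.0]
    # [8.0, 9.0]
```

Windows defined by a fixed number of points, or overlapping (sliding) windows, would be equally consistent with the text; the fixed time span is only one reading.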

[0054] Step 4: Extract the feature values of each time window to construct a feature vector;

[0055] Furthermore, step 4 includes:

[0056] Step 4.1: Calculate the mean and standard deviation of the data within the time window;

[0057] Step 4.2: Calculate the skewness and sharpness of the data distribution within the time window based on the mean and standard deviation of the data;

[0058] In step 4.2, the formula for calculating the skewness is:

[0059]

[0060] where the terms denote, respectively: the skewness; the number of data points within the time window; the i-th data point within the time window; the mean; and the standard deviation.
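The formula itself is not legible in this text. Assuming the conventional population-moment definition, which uses exactly the quantities listed above, the skewness of a window of n points could be written as follows (the symbols S, n, x_i, \mu and \sigma are labels chosen here, not taken from the original):

```latex
S = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^{3},
\qquad
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\mu\right)^{2}}
```

A bias-corrected sample estimator would add a small-sample factor; which variant is intended cannot be determined from the text.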

[0061] The formula for calculating sharpness is:

[0062]

[0063] where the terms denote, respectively: the sharpness; the number of data points within the time window; the i-th data point within the time window; the mean; and the standard deviation.
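Under the same assumption, "sharpness" corresponds to the kurtosis, i.e. the fourth standardized moment, which could be written as (symbols chosen here for illustration):

```latex
K = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^{4}
```

Some conventions report the excess kurtosis K - 3, so that a normal distribution scores 0; the text does not say which form is used.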

[0064] Step 4.3: Fit a feature function of the data changing over time within the time window, and obtain trend features based on this feature function;

[0065] In step 4.3, a first-order linear model is used to fit the data within the time window to obtain a feature function, and the slope and intercept of this feature function are used as trend features; wherein, the formula for calculating the trend features is:

[0066]

[0067] where the terms denote, respectively: the set of data points within the time window; the slope; the intercept; the feature function; the data point at time t within the time window; and the mean time within the window.
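The text names a first-order fit but not the estimator. Assuming ordinary least squares, with W the set of data points in the window, \bar{t} the mean time and \bar{x} the mean value in the window (\bar{x} is an added symbol), the slope k, intercept b and feature function f would be:

```latex
k = \frac{\sum_{t \in W}\left(t-\bar{t}\right)\left(x_t-\bar{x}\right)}{\sum_{t \in W}\left(t-\bar{t}\right)^{2}},
\qquad
b = \bar{x} - k\,\bar{t},
\qquad
f(t) = k\,t + b
```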

[0068] Step 4.4: Construct feature vectors using mean, standard deviation, skewness, sharpness, and trend characteristics.

[0069] 1. The mean is the arithmetic average of all values in a dataset and reflects the overall level, or central location, of the data. It describes the overall level of the monitoring data, helps identify systematic deviations in the data (such as long-term exceedance of pollution standards), and captures the overall pollution status of a contaminated site over a specific time period.

[0070] 2. Standard deviation measures the degree of deviation of data from the mean, i.e., the volatility of the data. Monitoring data volatility may be related to environmental changes, seasonal factors, or human intervention. It is used to capture data volatility and help identify abnormal fluctuations and uncertainties.

[0071] 3. Skewness measures the symmetry of the data distribution. A positive skewness indicates a longer right tail (right-skewed); a negative skewness indicates a longer left tail (left-skewed). Skewness helps identify asymmetric distributions in the data and, in turn, pollution peaks or extreme pollution events.

[0072] 4. Sharpness (kurtosis) measures how peaked a data distribution is and how heavy its tails are, reflecting whether extreme values far from the center of the data are significant. Data with high sharpness typically contain more extreme values (e.g., occasional high concentrations of pollutants), so this feature helps identify frequent extreme values in the data, which is crucial for the risk assessment of contaminated sites.

[0073] 5. Trend features describe the long-term direction of data change over time. An upward or downward trend in the monitoring data may be related to external factors (such as seasonality, policy changes, or remediation measures); these features reveal the long-term trend of the monitoring data and help determine whether pollution is increasing or decreasing.

[0074] This invention integrates multiple features such as mean, standard deviation, skewness, sharpness, and trend into a single feature vector, which can comprehensively describe the distribution characteristics, volatility, tail behavior, and long-term trend of data. It provides richer information than a single indicator and helps machine learning models make more accurate judgments.
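The feature vector of Step 4.4 can be sketched as below. It assumes population (1/n) moments, sharpness computed as the non-excess kurtosis, and a least-squares line for the trend features; the function name `window_features` and the zero placeholder for constant windows are illustrative choices, not taken from the text.

```python
import math


def window_features(times, values):
    """Step 4 (sketch): [mean, std, skewness, sharpness, slope, intercept].

    times: numeric timestamps (e.g. hours since series start); values: samples.
    Uses population (1/n) moments; sharpness is the non-excess kurtosis.
    """
    n = len(values)
    if n < 2 or len(times) != n:
        raise ValueError("need at least two paired samples per window")
    mean = sum(values) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    if std == 0:
        # Constant window: shape moments are undefined; 0.0 is a placeholder.
        skew, kurt = 0.0, 0.0
    else:
        skew = sum(((x - mean) / std) ** 3 for x in values) / n
        kurt = sum(((x - mean) / std) ** 4 for x in values) / n
    # Ordinary least-squares line x(t) = slope * t + intercept.
    t_mean = sum(times) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("timestamps within a window must not all be equal")
    slope = sum((t - t_mean) * (x - mean) for t, x in zip(times, values)) / sxx
    intercept = mean - slope * t_mean
    return [mean, std, skew, kurt, slope, intercept]
```

For example, the window times [0, 1, 2, 3] with values [1.0, 2.0, 3.0, 4.0] give a slope and intercept of 1.0 each and a skewness of 0, as expected for a symmetric linear ramp. In practice the features would usually be standardized before being passed to the model.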

[0075] Step 5: Input the feature vectors into the machine learning model for training to obtain the uncertainty data monitoring model;

[0076] In step 5, the feature vector is input into a fully connected neural network, and the weights and biases of the fully connected neural network are updated using the backpropagation algorithm to obtain the uncertainty data monitoring model; wherein, the loss function during the training process is:

[0077]

[0078] where the terms denote, respectively: the loss function; the number of data points; the predicted value of the i-th data point; and the true value of the i-th data point.
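The loss formula is likewise not legible here. Given the quantities listed (the number of data points and the predicted and true values), the mean squared error is the natural reading; this is an assumption, with L, n, \hat{y}_i and y_i as illustrative symbols:

```latex
L = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2}
```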

[0079] Step 6: Use the aforementioned uncertainty data monitoring model to identify uncertain data in historical monitoring data;
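Steps 5 and 6 do not specify the network size, activations, training labels or decision rule. The following dependency-free sketch assumes one tanh hidden layer, a sigmoid output, target values of 1.0 for windows labelled uncertain and 0.0 otherwise, and full-batch gradient descent on the mean squared error; windows scoring above a threshold of 0.5 are then flagged. The names `train_mlp` and `flag_uncertain` and all default hyperparameters are hypothetical.

```python
import math
import random


def train_mlp(X, y, hidden=8, lr=0.1, epochs=2000, seed=0):
    """Step 5 (sketch): fully connected network, one tanh hidden layer and a
    sigmoid output, trained by backpropagation on the mean squared error.

    X: list of feature vectors (standardized); y: targets in [0, 1].
    Returns predict(x) -> score in (0, 1).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xk for w, xk in zip(row, x)) + bj)
             for row, bj in zip(W1, b1)]
        z = sum(w * hj for w, hj in zip(w2, h)) + b2
        z = max(-60.0, min(60.0, z))  # avoid overflow in exp
        return h, 1.0 / (1.0 + math.exp(-z))

    for _ in range(epochs):
        gW1 = [[0.0] * d for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gw2 = [0.0] * hidden
        gb2 = 0.0
        for x, target in zip(X, y):
            h, out = forward(x)
            # dL/dz for L = (1/n) * sum (out - target)^2, through the sigmoid.
            dz = 2.0 * (out - target) / n * out * (1.0 - out)
            gb2 += dz
            for j in range(hidden):
                gw2[j] += dz * h[j]
                dh = dz * w2[j] * (1.0 - h[j] ** 2)  # tanh derivative
                gb1[j] += dh
                for k in range(d):
                    gW1[j][k] += dh * x[k]
        # Update weights and biases once per epoch (full batch).
        b2 -= lr * gb2
        for j in range(hidden):
            w2[j] -= lr * gw2[j]
            b1[j] -= lr * gb1[j]
            for k in range(d):
                W1[j][k] -= lr * gW1[j][k]

    return lambda x: forward(x)[1]


def flag_uncertain(predict, feature_vectors, threshold=0.5):
    """Step 6 (sketch): indices of windows whose score exceeds the threshold."""
    return [i for i, f in enumerate(feature_vectors) if predict(f) > threshold]
```

A library such as PyTorch or scikit-learn would normally replace the hand-written loop; it is spelled out here only to make the backpropagation step visible.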

[0080] Step 7: Use interpolation to correct for uncertain data.

[0081] Furthermore, step 7 includes:

[0082] Step 7.1: Obtain adjacent data points for uncertain data;

[0083] Step 7.2: Correct the uncertain data by using the linear relationship between adjacent data points; the correction formula is as follows:

[0084]

[0085] where the terms denote, respectively: the corrected data value; the timestamp of the uncertain data point; the timestamp of the first adjacent data point; the timestamp of the second adjacent data point; the value of the first adjacent data point; and the value of the second adjacent data point.
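Assuming the correction is ordinary two-point linear interpolation, which is what the listed quantities (two neighbouring timestamps and values) support, the corrected value at the timestamp t of the uncertain point would be (symbols chosen here for illustration):

```latex
\hat{x}(t) = x_1 + \frac{t - t_1}{t_2 - t_1}\,\left(x_2 - x_1\right),
\qquad t_1 < t < t_2
```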

[0086] Uncertainty in data (such as sensor error) often manifests as certain extreme values. By inferring the trend of data change through linear relationships, uncertain data can be locally corrected without destroying the global regularity of the data.
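Steps 7.1 and 7.2 can be sketched as follows. The text says only "adjacent data points"; this sketch assumes they are the nearest preceding and following points that were not themselves flagged, so runs of consecutive uncertain points are bridged, and it leaves a flagged point unchanged when no neighbour exists on one side. The function name `correct_by_interpolation` is hypothetical.

```python
def correct_by_interpolation(times, values, flagged):
    """Step 7 (sketch): replace each flagged sample by linear interpolation
    between its nearest unflagged neighbours before and after it.

    times: strictly increasing numeric timestamps; values: samples;
    flagged: set of indices identified as uncertain in Step 6.
    """
    corrected = list(values)
    n = len(values)
    for i in sorted(flagged):
        # Step 7.1: nearest unflagged neighbours on each side.
        left = next((j for j in range(i - 1, -1, -1) if j not in flagged), None)
        right = next((j for j in range(i + 1, n) if j not in flagged), None)
        if left is None or right is None:
            continue  # no bracketing neighbours: leave the sample unchanged
        # Step 7.2: two-point linear interpolation.
        t1, t2 = times[left], times[right]
        x1, x2 = values[left], values[right]
        corrected[i] = x1 + (times[i] - t1) * (x2 - x1) / (t2 - t1)
    return corrected
```

For example, with timestamps 0 to 4, values [1.0, 2.0, 50.0, 4.0, 5.0] and index 2 flagged, the outlier 50.0 is replaced by 3.0.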

[0087] The present invention also provides an electronic device, including a bus, a transceiver, a memory, a processor, and a computer program stored in the memory and executable on the processor. The transceiver, the memory, and the processor are connected via the bus. The computer program, when executed by the processor, implements the steps in the aforementioned method for correcting the uncertainty of historical monitoring data for contaminated site assessment. Compared with the prior art, the beneficial effects of the electronic device provided by the present invention are the same as those of the aforementioned method for correcting the uncertainty of historical monitoring data for contaminated site assessment, and will not be elaborated upon here.

[0088] The present invention also provides a computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps in the above-described method for correcting the uncertainty of historical monitoring data for contaminated site assessment. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by the present invention are the same as the beneficial effects of the above-described method for correcting the uncertainty of historical monitoring data for contaminated site assessment, and will not be elaborated here.

[0089] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A method for correcting the uncertainty of historical monitoring data for contaminated site assessment, characterized in that it comprises: Step 1: Obtain historical monitoring data for the target contaminated site; Step 2: Sort the historical monitoring data in chronological order to form a time series of historical monitoring data; Step 3: Divide the time series of historical monitoring data into multiple time windows according to preset intervals; Step 4: Extract the feature values of each time window to construct a feature vector; Step 5: Input the feature vectors into the machine learning model for training to obtain the uncertainty data monitoring model; Step 6: Use the aforementioned uncertainty data monitoring model to identify uncertain data in historical monitoring data; Step 7: Use interpolation to correct the uncertain data.

2. The method for correcting the uncertainty of historical monitoring data for contaminated site assessment according to claim 1, characterized in that, Step 4: Extracting feature values from each time window to construct a feature vector, including: Step 4.1: Calculate the mean and standard deviation of the data within the time window; Step 4.2: Calculate the skewness and sharpness of the data distribution within the time window based on the mean and standard deviation of the data; Step 4.3: Fit a feature function of the data changing over time within the time window, and obtain trend features based on this feature function; Step 4.4: Construct feature vectors using mean, standard deviation, skewness, sharpness, and trend characteristics.

3. The method for correcting the uncertainty of historical monitoring data for contaminated site assessment according to claim 2, characterized in that, in step 4.2, the skewness is calculated by a formula whose terms denote, respectively: the skewness; the number of data points within the time window; the i-th data point within the time window; the mean; and the standard deviation.

4. The method for correcting the uncertainty of historical monitoring data for contaminated site assessment according to claim 3, characterized in that, in step 4.2, the sharpness is calculated by a formula whose terms denote, respectively: the sharpness; the number of data points within the time window; the i-th data point within the time window; the mean; and the standard deviation.

5. The method for correcting the uncertainty of historical monitoring data for contaminated site assessment according to claim 4, characterized in that, Step 4.3: Fit a feature function to the data change over time within the time window, and obtain trend features based on this feature function, including: A first-order linear model is used to fit the data within the time window to obtain a feature function, and the slope and intercept of this feature function are used as trend features; the trend features are calculated by a formula whose terms denote, respectively: the set of data points within the time window; the slope; the intercept; the feature function; the data point at time t within the time window; and the mean time within the window.

6. The method for correcting the uncertainty of historical monitoring data for contaminated site assessment according to claim 5, characterized in that, Step 5: Inputting the feature vector into a machine learning model for training to obtain an uncertainty data monitoring model, including: The feature vector is input into a fully connected neural network, and the weights and biases of the fully connected neural network are updated through backpropagation to obtain an uncertainty data monitoring model; the loss function during training is given by a formula whose terms denote, respectively: the loss function; the number of data points; the predicted value of the i-th data point; and the true value of the i-th data point.

7. The method for correcting the uncertainty of historical monitoring data for contaminated site assessment according to claim 1, characterized in that, Step 7: Correcting the uncertain data using interpolation, including: Step 7.1: Obtain adjacent data points for uncertain data; Step 7.2: Correct the uncertain data by using the linear relationship between adjacent data points, according to a correction formula whose terms denote, respectively: the corrected data value; the timestamp of the uncertain data point; the timestamp of the first adjacent data point; the timestamp of the second adjacent data point; the value of the first adjacent data point; and the value of the second adjacent data point.

8. An electronic device comprising a bus, a transceiver, a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the transceiver, the memory, and the processor are connected via the bus, characterized in that, when the computer program is executed by the processor, it implements the steps in the method for correcting the uncertainty of historical monitoring data for the assessment of contaminated sites as described in any one of claims 1-6.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps in the method for correcting the uncertainty of historical monitoring data for the assessment of contaminated sites as described in any one of claims 1-6.