Soil moisture content time series data processing method and device

By combining adaptive analysis windows and soil hydrophysical processes, anomalies in soil moisture content time series data can be identified and removed, solving the problem of low identification accuracy in existing technologies and improving data quality.

CN122019987B · Active · Publication Date: 2026-06-26 · BEIJING GEOLOGY INST

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING GEOLOGY INST
Filing Date
2026-04-14
Publication Date
2026-06-26

Abstract

The application provides a soil moisture content time series data processing method and device, and relates to the technical field of data processing. The method comprises the following steps: collecting initial soil moisture content time series data corresponding to one or more preset monitoring points through a preset sensor; deleting data outside a first preset threshold interval in the initial soil moisture content time series data to obtain first soil moisture content time series data; deleting data repeated for a preset number of times continuously in the first soil moisture content time series data to obtain second soil moisture content time series data; deleting jump values in the second soil moisture content time series data to obtain third soil moisture content time series data; and deleting abnormal fluctuation values in the third soil moisture content time series data to obtain target soil moisture content time series data. The soil moisture content time series data processing method and device can accurately identify and eliminate abnormal data in the soil moisture content time series data, and improve the quality of the processed soil moisture content time series data.

Description

Technical Field

[0001] This invention relates to the field of data processing technology, and in particular to a method and apparatus for processing time-series data of soil moisture content.

Background Technology

[0002] Soil moisture content is one of the core parameters in fields such as geological disaster monitoring and early warning, agricultural production and precision irrigation, and hydrological and meteorological research. The quality of its monitoring data directly affects the accuracy of analytical results and the reliability of application decisions. Therefore, it is necessary to accurately identify anomalies in soil moisture content time-series data and effectively clean the data to improve its quality.

[0003] Currently, most existing methods for identifying soil moisture anomalies rely on range checks based on fixed thresholds or simple statistical outlier detection. While these methods can remove some obvious outliers (such as out-of-range data caused by sensor failure), they have low accuracy in identifying most anomalies, resulting in numerous false positives and false negatives. Consequently, they cannot effectively clean soil moisture time-series data, leading to low-quality soil moisture time-series data.

Summary of the Invention

[0004] This invention provides a method and apparatus for processing soil moisture content time series data, which solves the technical problem that the accuracy of identifying abnormal data in soil moisture content time series data is low in the prior art, resulting in low quality of soil moisture content time series data.

[0005] This invention provides a method for processing time-series soil moisture content data, comprising the following steps:

[0006] The initial soil moisture content time series data corresponding to one or more preset monitoring points are collected by preset sensors;

[0007] Delete data outside the first preset threshold range from the initial soil moisture content time series data to obtain the first soil moisture content time series data; the first preset threshold range is determined based on the range of the preset sensor and the historical soil moisture data of the preset monitoring points;

[0008] Delete the data that is repeated a preset number of times consecutively in the first soil moisture content time series data to obtain the second soil moisture content time series data;

[0009] Remove the jump values from the second soil moisture content time series data to obtain the third soil moisture content time series data;

[0010] Delete the abnormal fluctuation values in the third soil moisture content time series data to obtain the target soil moisture content time series data.

[0011] According to a method for processing soil moisture content time-series data provided by the present invention, the step of determining the jump value includes:

[0012] Based on the second soil moisture content time series data, the width of the first window and the width of the second window are determined respectively.

[0013] The smaller of the first window width and the second window width is used as the adaptive moving window width;

[0014] The jump value is determined based on the width of the moving window.

[0015] According to a method for processing soil moisture content time-series data provided by the present invention, the step of determining a first window width and a second window width based on the second soil moisture content time-series data includes:

[0016] Determine the response data in the second soil moisture content time series data where the moisture content change value is greater than the second preset threshold;

[0017] The preset monitoring point corresponding to the response data is used as the target monitoring point;

[0018] Based on the soil texture type of the target monitoring point, the soil moisture diffusivity is determined;

[0019] The soil moisture response time is determined based on the soil depth and soil moisture diffusivity at the target monitoring point.

[0020] The width of the first window is determined based on the soil moisture response time.

[0021] According to a method for processing soil moisture content time-series data provided by the present invention, the step of determining the width of a first window and the width of a second window based on the second soil moisture content time-series data further includes:

[0022] Determine each monitoring period corresponding to the second soil moisture content time series data;

[0023] The median data for each monitoring period is extracted from the second soil moisture content time series data and used as resampling data;

[0024] Autocorrelation analysis is performed on the resampled data to obtain a target curve of the autocorrelation coefficient as a function of lag time; the lag time is determined based on the monitoring period.

[0025] The horizontal line corresponding to the preset autocorrelation coefficient threshold is intersected with the target curve, and the maximum lag time corresponding to the point above the horizontal line is used as the width of the second window.

[0026] According to a method for processing soil moisture content time-series data provided by the present invention, determining the jump value based on the moving window width includes:

[0027] Extract the time period data corresponding to each data point to be processed from the second soil moisture content time series data; the time period data is the data in the moving interval corresponding to the data point to be processed; the moving interval includes a first interval and a second interval; the first interval is the interval before the acquisition time corresponding to the data point to be processed; the second interval is the interval after the acquisition time corresponding to the data point to be processed; the interval length of the first interval and the second interval is half the width of the moving window;

[0028] Calculate the difference between the mean of the time period data and the data to be processed corresponding to the time period data;

[0029] Calculate the multiple between the first standard deviation of the data for the time period and the difference;

[0030] Calculate the absolute deviation between the mean of the time period data and the data to be processed corresponding to the time period data;

[0031] The data to be processed corresponding to the multiple being greater than or equal to the third preset threshold and the absolute deviation being greater than or equal to the fourth preset threshold is used as the jump value.

[0032] According to a method for processing soil moisture content time-series data provided by the present invention, the step of determining the abnormal fluctuation value includes:

[0033] Based on the moving window width, the third soil moisture content time series data is divided into multiple window data;

[0034] Calculate the data turning rate and second standard deviation of the window data;

[0035] Perform linear trend regression on the window data to obtain the corresponding regression line;

[0036] Perform a t-test on the slope of the regression line to obtain the significance test P-value;

[0037] The window data corresponding to the data turning rate being greater than or equal to the fifth preset threshold, the second standard deviation being greater than or equal to the sixth preset threshold, and the significance test P-value being greater than or equal to the seventh preset threshold, is used as the abnormal fluctuation value.
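The three window-level criteria in [0033] to [0037] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not define the "data turning rate" in this section, so it is assumed here to be the share of interior points at which the direction of change reverses; windows are assumed to be consecutive and non-overlapping; the regression is taken against the sample index; and the threshold defaults (0.5, 3.0, 0.05) are hypothetical placeholders for the fifth, sixth, and seventh preset thresholds. All function names are illustrative.

```python
import math
from statistics import stdev
from typing import List, Sequence


def split_windows(series: Sequence[float], width: int) -> List[List[float]]:
    """Cut the series into consecutive, non-overlapping windows of `width` samples."""
    return [list(series[i:i + width]) for i in range(0, len(series), width)]


def turning_rate(window: Sequence[float]) -> float:
    """Share of interior points where the direction of change reverses (assumed definition)."""
    diffs = [b - a for a, b in zip(window, window[1:])]
    turns = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
    return turns / max(len(window) - 2, 1)


def two_sided_t_p_value(t_stat: float, dof: int, steps: int = 2000) -> float:
    """Two-sided Student's t p-value by Simpson integration (standard library only)."""
    # Substituting x = sqrt(dof) * tan(theta) turns the t density into cos(theta) ** (dof - 1).
    upper = math.atan(abs(t_stat) / math.sqrt(dof))
    coef = math.exp(math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)) / math.sqrt(math.pi)
    h = upper / steps
    area = 1.0 + math.cos(upper) ** (dof - 1)  # the two endpoints; cos(0) ** n == 1
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * math.cos(i * h) ** (dof - 1)
    central_mass = 2 * coef * area * h / 3  # P(|T| <= |t_stat|)
    return min(1.0, max(0.0, 1.0 - central_mass))


def slope_p_value(window: Sequence[float]) -> float:
    """P-value of the t-test on the least-squares slope against the sample index."""
    n = len(window)
    if n < 3:
        raise ValueError("need at least three samples for a slope t-test")
    t_mean = (n - 1) / 2
    y_mean = sum(window) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(window))
    slope = sxy / sxx
    sse = sum((y - y_mean - slope * (t - t_mean)) ** 2 for t, y in enumerate(window))
    if sse == 0:
        return 1.0 if slope == 0 else 0.0  # flat line versus perfect trend
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return two_sided_t_p_value(slope / std_err, n - 2)


def is_abnormal_window(window: Sequence[float], turn_min: float = 0.5,
                       std_min: float = 3.0, p_min: float = 0.05) -> bool:
    """All three criteria must hold: frequent reversals, large spread, no significant trend."""
    return (turning_rate(window) >= turn_min
            and stdev(window) >= std_min
            and slope_p_value(window) >= p_min)


noisy = [30, 38, 29, 39, 28, 40, 29, 38]    # zigzag with no clear trend
wetting = [30, 32, 35, 37, 40, 42, 45, 47]  # steady rise, as during infiltration
flags = [is_abnormal_window(w) for w in split_windows(noisy + wetting, 8)]
```

Requiring a non-significant slope (a large P-value) is what separates erratic fluctuation from a genuine wetting or drying trend, which also has a large standard deviation but a clear direction.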

[0038] The present invention also provides a soil moisture content time-series data processing device, comprising the following modules:

[0039] The data acquisition module is used to acquire time-series data of initial soil moisture content corresponding to one or more preset monitoring points through preset sensors;

[0040] The first processing module is used to delete data outside the first preset threshold range in the initial soil moisture content time series data to obtain the first soil moisture content time series data; the first preset threshold range is determined based on the range of the preset sensor and the historical soil moisture data of the preset monitoring point.

[0041] The second processing module is used to delete data that is repeated a preset number of times consecutively in the first soil moisture content time series data to obtain the second soil moisture content time series data.

[0042] The third processing module is used to delete the jump values in the second soil moisture content time series data to obtain the third soil moisture content time series data.

[0043] The fourth processing module is used to delete abnormal fluctuation values in the third soil moisture content time series data to obtain the target soil moisture content time series data.

[0044] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and running on the processor, wherein the processor executes the computer program to implement the soil moisture content time-series data processing method as described above.

[0045] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the soil moisture content time-series data processing method as described above.

[0046] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the soil moisture content time-series data processing method as described above.

[0047] The present invention provides a method and apparatus for processing soil moisture content time-series data. The method acquires initial soil moisture content time-series data corresponding to one or more preset monitoring points using a preset sensor. Data outside a first preset threshold range are deleted from the initial soil moisture content time-series data to obtain first soil moisture content time-series data; because the first preset threshold range is determined based on the range of the preset sensor and the historical soil moisture data of the preset monitoring points, this step quickly eliminates clearly abnormal data that fall outside a reasonable range according to physical principles and historical patterns. Data repeated a preset number of times consecutively are deleted from the first soil moisture content time-series data to obtain second soil moisture content time-series data, which eliminates invalid and duplicate data caused by communication failures and ensures the validity and variability of the data. Jump values in the second soil moisture content time-series data are deleted to obtain third soil moisture content time-series data, thereby accurately identifying and eliminating short-term drastic change points that are inconsistent with the trend of the surrounding data. Abnormal fluctuation values in the third soil moisture content time-series data are deleted to obtain the target soil moisture content time-series data, thereby accurately identifying and eliminating abnormal data hidden in the series, improving the overall smoothness and reliability of the data, and thus improving the quality of the processed soil moisture content time-series data.

Attached Figure Description

[0048] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0049] Figure 1 is a flowchart illustrating the soil moisture content time-series data processing method provided by the present invention.

[0050] Figure 2 is a schematic diagram showing how the autocorrelation coefficient changes with the number of lag hours, as provided by the present invention.

[0051] Figure 3 is a schematic diagram of the soil moisture content time-series data processing device provided by the present invention.

[0052] Figure 4 is a schematic diagram of the structure of the electronic device provided by the present invention.

Detailed Implementation

[0053] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.

[0054] Currently, most existing methods for identifying soil moisture anomalies rely on range checks based on fixed thresholds or simple statistical outlier detection. While these methods can remove some obvious outliers (such as out-of-range data caused by sensor failure), they have significant limitations, mainly including the following:

[0055] First, existing methods for identifying local anomalies typically rely on a fixed-length analysis window. This window width is often subjectively set by human experience, lacking a physical basis and failing to adapt to the differences in soil moisture transport rates caused by variations in soil texture, soil depth, and other geological conditions at different monitoring points. Because the analysis window is disconnected from the actual hydrological response time of the soil, this leads to numerous misjudgments (misclassifying normal moisture changes as anomalies) and omissions (failing to identify genuine anomalies) in anomaly identification.

[0056] Secondly, soil moisture content changes are continuous and autocorrelated, while existing methods cannot dynamically adapt to the inherent change cycles of different monitoring sequences, resulting in low accuracy in identifying abnormal data.

[0057] Finally, existing methods lack effective and universally applicable methods to identify situations where data exhibits frequent, large, and irregular fluctuations within a short period. Such anomalies may be caused by intermittent sensor connection failures, external environmental interference, or other factors. However, existing methods simply use the fluctuation amplitude for judgment, which can easily be confused with actual rainfall infiltration processes, leading to a high misjudgment rate.

[0058] To address the aforementioned issues, this invention provides a method and apparatus for processing soil moisture content time-series data. By adaptively determining the analysis window and combining it with soil hydrophysical processes, it accurately and reliably identifies various types of abnormal data, thereby effectively cleaning soil moisture content time-series data and improving the quality of the data.

[0059] The method and apparatus for processing soil moisture content time-series data provided by the present invention are described below with reference to Figures 1 to 4.

[0060] Figure 1 is a flowchart illustrating the soil moisture content time-series data processing method provided by the present invention. As shown in Figure 1, the method includes the following steps:

[0061] Step 101: Collect initial soil moisture content time-series data corresponding to one or more preset monitoring points using preset sensors;

[0062] Specifically, a specific frequency or form of electromagnetic signal is emitted into the soil through a preset sensor. Then, the changes in electromagnetic signal parameters (such as propagation time, frequency, amplitude, etc.) in the soil or on the sensor probe are measured. Finally, according to the calibration equation stored in the preset sensor, the measured changes are converted into the apparent dielectric constant of the soil, thereby indirectly obtaining the soil volumetric water content data, which serves as the original observation data of soil water content.

[0063] In this embodiment of the invention, the initial soil moisture content time-series data can be raw soil moisture content observation data from one or more geographical locations, arranged in chronological order. This data is acquired continuously or periodically by a preset sensor within a preset data acquisition duration. The preset sensor can be any sensor based on the dielectric constant principle, such as a frequency domain reflectometry (FDR) sensor or a time domain reflectometry (TDR) sensor. The data acquisition duration can be set according to actual needs, such as any time period like 30, 40, or 60 consecutive days; the monitoring frequency can be set according to sensor performance, communication network quality, or actual needs, such as any monitoring frequency like acquiring one data point every 10 minutes or one data point every hour.

[0064] For example, in one embodiment, three soil volumetric moisture sensors at different depths were deployed at a landslide hazard point, specifically at depths of 20cm, 50cm, and 100cm. Each sensor collected data every 60 minutes and transmitted the data (including monitoring point ID, depth, timestamp, and moisture content value) to the cloud platform via an IoT module to obtain initial soil moisture content time-series data.

[0065] Step 102: Delete the data outside the first preset threshold interval in the initial soil moisture content time series data to obtain the first soil moisture content time series data; the first preset threshold interval is determined based on the range of the preset sensor and the historical soil moisture data of the preset monitoring point;

[0066] Specifically, the first preset threshold range is determined based on the preset sensor's operating range and the long-term historical data distribution of preset monitoring points. It is used to quickly eliminate unreasonable data caused by sensor failure, power outage, transmission errors, etc.

[0067] For example, for a sensor with a working range of 0-100% soil volumetric moisture content, its first preset threshold range should be within 0-100%; if the long-term historical data distribution of the preset monitoring points has never been lower than 5% or higher than 70%, then the first preset threshold range can be further narrowed to within 5%-70%.

[0068] In one embodiment, for a surface sensor in an agricultural area, the theoretical minimum value is 0 (i.e., air-dried soil). If the local long-term field water holding capacity data ranges from 40% to 60%, then, referring to the distribution of historical soil moisture monitoring data and considering oversaturation caused by extreme rainfall, the first preset threshold range for soil volumetric moisture content can be set to 5%-70%. If, after irrigation, a sensor records a volumetric moisture content of 85% due to poor probe contact, this can be identified as unreasonable data and deleted directly.

[0069] Based on physical principles and historical experience, the embodiments of the present invention can quickly and efficiently remove unreasonable data (such as maximum/minimum values caused by sensor failure) from the initial soil moisture content time series data, thereby improving data processing efficiency.
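As an illustration of Step 102, the sketch below builds the first preset threshold interval and filters the series. It assumes that the interval is simply the overlap of the sensor's working range and the historically plausible range, which matches the 0-100% and 5%-70% example above but is only one way to read "determined based on". The function names, the `(timestamp, value)` record layout, and the sample readings are hypothetical.

```python
from typing import List, Tuple

Reading = Tuple[str, float]  # (ISO timestamp, volumetric water content in %)


def threshold_interval(sensor_range: Tuple[float, float],
                       historical_range: Tuple[float, float]) -> Tuple[float, float]:
    """Overlap of the sensor's working range and the historically plausible range."""
    low = max(sensor_range[0], historical_range[0])
    high = min(sensor_range[1], historical_range[1])
    if low > high:
        raise ValueError("sensor range and historical range do not overlap")
    return low, high


def drop_out_of_range(series: List[Reading],
                      interval: Tuple[float, float]) -> List[Reading]:
    """Keep only readings whose value lies inside the closed interval."""
    low, high = interval
    return [(t, v) for t, v in series if low <= v <= high]


initial = [
    ("2026-04-01T08:00", 31.2),
    ("2026-04-01T09:00", 85.0),  # e.g. poor probe contact after irrigation
    ("2026-04-01T10:00", 32.0),
    ("2026-04-01T11:00", -3.0),  # outside the physical range
]
interval = threshold_interval((0.0, 100.0), (5.0, 70.0))
first_series = drop_out_of_range(initial, interval)
```

Here `interval` is `(5.0, 70.0)`, so the 85.0 and -3.0 readings are removed and the two plausible readings remain.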

[0070] Step 103: Delete the data that is repeated a preset number of times consecutively in the first soil moisture content time series data to obtain the second soil moisture content time series data;

[0071] Specifically, due to the continuous dynamic changes in soil moisture and the presence of random electrical noise in the sensor, the probability of the sensor continuously outputting the exact same value is extremely low. If the sensor outputs the same value a preset number of times in a row, the repeated data can be determined to be abnormal. The preset number of times can be determined from how many consecutive repetitions occur under normal conditions in the historical soil moisture data of the preset monitoring points.

[0072] For example, at a landslide monitoring point, the sensor collects data once per hour. During a data transmission module malfunction, the same "29.5%" was reported for six consecutive hours. If the preset number of occurrences is six, this sequence can be identified as abnormal data and deleted.

[0073] This invention improves the accuracy of soil moisture content time-series data by effectively identifying and eliminating invalid data caused by sensor or data acquisition system "freezing".
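A minimal sketch of Step 103 follows. It assumes that a run is "repeated a preset number of times" when its length is at least the preset count, which is consistent with the six-hour example above, and that the whole run is deleted. Timestamps are omitted for brevity; `drop_frozen_runs` and `min_run` are illustrative names.

```python
from itertools import groupby
from typing import List


def drop_frozen_runs(values: List[float], min_run: int) -> List[float]:
    """Remove every run of identical consecutive values whose length is >= min_run."""
    kept: List[float] = []
    for _, group in groupby(values):
        run = list(group)
        if len(run) < min_run:
            kept.extend(run)  # short repeats are treated as normal and kept
    return kept


# Hourly readings: "29.5" is reported six times in a row during a transmission fault.
first_series = [28.9, 29.1] + [29.5] * 6 + [29.8, 29.8, 30.2]
second_series = drop_frozen_runs(first_series, min_run=6)
# second_series == [28.9, 29.1, 29.8, 29.8, 30.2]
```

The two consecutive 29.8 readings survive because their run is shorter than the preset count.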

[0074] Step 104: Delete the jump values in the second soil moisture content time series data to obtain the third soil moisture content time series data;

[0075] Specifically, theoretical equations and autocorrelation analysis are used to determine an adaptive moving window width based on the soil hydrological response time and data autocorrelation patterns of each preset monitoring point. This allows for the accurate identification and deletion of jump values in the second soil moisture content time series data, thereby improving data quality.

[0076] Furthermore, the step of determining the jump value includes:

[0077] Based on the second soil moisture content time series data, the width of the first window and the width of the second window are determined respectively.

[0078] The smaller of the first window width and the second window width is used as the adaptive moving window width;

[0079] The jump value is determined based on the width of the moving window.

[0080] Further, determining the width of the first window and the width of the second window based on the second soil moisture content time-series data includes:

[0081] Determine the response data in the second soil moisture content time series data where the moisture content change value is greater than the second preset threshold;

[0082] The preset monitoring point corresponding to the response data is used as the target monitoring point;

[0083] Based on the soil texture type of the target monitoring point, the soil moisture diffusivity is determined;

[0084] The soil moisture response time is determined based on the soil depth and soil moisture diffusivity at the target monitoring point.

[0085] The width of the first window is determined based on the soil moisture response time.

[0086] Specifically, the first window width is based on the soil moisture response time, reflecting the hydrological response time of the preset monitoring points, typically corresponding to the degree of change in soil moisture content. Therefore, firstly, response data in the second soil moisture content time series data where the change in moisture content exceeds a second preset threshold are identified, and then the preset monitoring points corresponding to these response data are used as target monitoring points. The second preset threshold is used to determine whether the degree of change in moisture content exceeds normal conditions and can be determined based on the data changes in the historical soil moisture data of the preset monitoring points.

[0087] For example, the target monitoring point can be the pre-set monitoring point corresponding to the soil layer depth where the soil moisture changes significantly after the first heavy rain in the flood season. If the flood season is from June 1 to September 15, the target monitoring point can be the pre-set monitoring point corresponding to the soil layer depth where the soil moisture of the deepest layer rises by more than 10% (i.e., the second pre-set threshold) during the first rainfall after June 1 with a cumulative rainfall of more than 100 mm, and the corresponding soil layer depth can be recorded.

[0088] After determining the target monitoring points, the soil moisture diffusivity is determined based on the soil texture type of the target monitoring points. The soil moisture diffusivity is an empirical value. Table 1 shows the correspondence between the reference values of soil moisture diffusivity and soil texture type provided by this invention. As shown in Table 1, the corresponding soil moisture diffusivity can be estimated based on the soil texture type.

[0089] Table 1. Correspondence between reference values of soil moisture diffusivity and soil texture type

[0090]

[0091] Based on the soil depth and soil moisture diffusivity at the target monitoring point, the soil moisture response time can be calculated. The expression is as follows:

[0092]

[0093] In the formula, L represents the soil depth and D represents the soil moisture diffusivity.

[0094] Based on the soil moisture response time, the width of the first window can be further calculated. The expression is as follows:

[0095] $W_1 = k \cdot T$

[0096] In the formula, $W_1$ is the first window width; $k$ is a proportionality coefficient (which can be determined by observation and adjustment based on historical data, usually between 1.5 and 2.5), used to ensure that the data within the first window width can reflect the soil moisture response process; and $T$ is the soil moisture response time.

[0097] The embodiments of the present invention determine the width of the first window based on the physical nature of soil hydrology, so that the data processing process is combined with the actual soil moisture movement pattern. This allows the invention to adapt to the differences in soil moisture transport speed caused by differences in geological conditions such as soil texture and soil depth at different monitoring points, reduce misjudgments and omissions in the identification of jump values, improve the accuracy of data identification, and further improve data quality.
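The physically based window can be sketched as below, under two loudly stated assumptions. First, the response-time expression is not legible in this text, so the sketch uses the standard diffusion time scale (depth squared divided by diffusivity), which is dimensionally consistent with the stated inputs but may differ from the patent's formula by a constant factor. Second, the diffusivity values are hypothetical placeholders, since Table 1 is not reproduced here. The first window width is then taken as the proportionality coefficient times the response time.

```python
# Hypothetical reference diffusivities in cm^2/h. These are placeholders only;
# they are NOT the values from Table 1 of the source.
DIFFUSIVITY_CM2_PER_H = {"sand": 400.0, "loam": 100.0, "clay": 25.0}


def response_time_hours(depth_cm: float, diffusivity_cm2_per_h: float) -> float:
    """Assumed diffusion time scale: depth squared over diffusivity."""
    if depth_cm <= 0 or diffusivity_cm2_per_h <= 0:
        raise ValueError("depth and diffusivity must be positive")
    return depth_cm ** 2 / diffusivity_cm2_per_h


def first_window_hours(depth_cm: float, texture: str, k: float = 2.0) -> float:
    """First window width = proportionality coefficient k times the response time."""
    return k * response_time_hours(depth_cm, DIFFUSIVITY_CM2_PER_H[texture])


# A sensor at 50 cm depth in a soil assumed to behave like the "loam" placeholder.
w1 = first_window_hours(50.0, "loam", k=2.0)  # 2.0 * (2500 / 100) = 50.0 hours
```

Under this reading, a deeper sensor or a finer-textured soil yields a longer response time and therefore a wider window, which is the adaptive behaviour the paragraph above describes.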

[0098] Furthermore, the step of determining the width of the first window and the width of the second window based on the second soil moisture content time-series data further includes:

[0099] Determine each monitoring period corresponding to the second soil moisture content time series data;

[0100] The median data for each monitoring period is extracted from the second soil moisture content time series data and used as resampling data;

[0101] Autocorrelation analysis is performed on the resampled data to obtain a target curve of the autocorrelation coefficient as a function of lag time; the lag time is determined based on the monitoring period.

[0102] The horizontal line corresponding to the preset autocorrelation coefficient threshold is intersected with the target curve, and the maximum lag time corresponding to the point above the horizontal line is used as the width of the second window.

[0103] Specifically, firstly, each monitoring period corresponding to the second soil moisture content time series data is determined. Then, the data corresponding to the median of each monitoring period is extracted from the second soil moisture content time series data as resampled data. Then, autocorrelation analysis is performed on the resampled data to quantify the intrinsic time dependence between data points and obtain the target curve of the autocorrelation coefficient changing with lag time.

[0104] Figure 2 is a schematic diagram illustrating how the autocorrelation coefficient changes with the number of lag hours, as provided by the present invention. Figure 2 shows the target curve of the autocorrelation coefficient as a function of lag time. In Figure 2, the autocorrelation coefficient threshold is set to 0.6, and the horizontal line corresponding to this threshold intersects the target curve. The maximum lag time above the horizontal line represents the maximum time span over which the data maintain a strong correlation. Therefore, the maximum number of lag hours corresponding to the points above the horizontal line can be used as the second window width, i.e., the window width based on data autocorrelation.

[0105] For example, in one embodiment, the second soil moisture content time-series data, collected every 10 minutes, are resampled to one value per hour (the median of all data within that hour). Autocorrelation analysis is performed on the resampled series to obtain a target curve showing how the autocorrelation coefficient changes with lag time. With the threshold set to 0.6, if the autocorrelation coefficient first falls below 0.6 at a lag time of 36 hours, then the second window width is 36 hours, that is, 1.5 days.

[0106] In this embodiment of the invention, the width of the second window is determined based on the time-dependent structure of the data itself, ensuring that the data points within the second window width have statistical correlation, thereby dynamically adapting to the inherent change cycle of different monitoring sequences and improving the accuracy of identifying jump values.
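The data-driven window can be sketched as follows. The sketch assumes a regularly sampled series, uses the common biased sample autocorrelation estimator (the source does not say which estimator is used), and reads the "maximum lag time above the horizontal line" as the last lag reached before the curve first drops below the threshold. The function names are illustrative.

```python
from statistics import median
from typing import List, Sequence


def resample_median(values: Sequence[float], per_period: int) -> List[float]:
    """Collapse each block of `per_period` consecutive samples to its median."""
    return [median(values[i:i + per_period]) for i in range(0, len(values), per_period)]


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    return [sum(dev[i] * dev[i + lag] for i in range(n - lag)) / denom
            for lag in range(max_lag + 1)]


def second_window(acf: Sequence[float], threshold: float = 0.6) -> int:
    """Largest lag reached before the curve first drops below the threshold."""
    width = 0
    for lag, r in enumerate(acf):
        if r < threshold:
            break
        width = lag
    return width


hourly = resample_median([1, 2, 9, 4, 5, 6], per_period=3)  # [2, 5]
acf = autocorrelation([1, 2, 3, 4, 5], max_lag=2)           # acf[0] is 1.0
width = second_window([1.0, 0.9, 0.75, 0.61, 0.55, 0.62])   # 3 (in resampled periods)
```

The worked example in [0105] instead quotes the lag at which the coefficient first falls below the threshold; that reading is one resampling step larger than the value returned here. With hourly resampling the returned lag is already in hours.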

[0107] Further, determining the jump value based on the width of the moving window includes:

[0108] Extract the time period data corresponding to each data point to be processed from the second soil moisture content time series data; the time period data is the data in the moving interval corresponding to the data point to be processed; the moving interval includes a first interval and a second interval; the first interval is the interval before the acquisition time corresponding to the data point to be processed; the second interval is the interval after the acquisition time corresponding to the data point to be processed; the interval length of the first interval and the second interval is half the width of the moving window;

[0109] Calculate the difference between the mean of the time period data and the data to be processed corresponding to the time period data;

[0110] Calculate the multiple between the first standard deviation of the data for the time period and the difference;

[0111] Calculate the absolute deviation between the mean of the time period data and the data to be processed corresponding to the time period data;

[0112] Take as a jump value each data point to be processed for which the multiple is greater than or equal to the third preset threshold and the absolute deviation is greater than or equal to the fourth preset threshold.

[0113] Specifically, for each preset monitoring point, after determining the first window width and the second window width, the smaller of the first window width and the second window width is used as the adaptive moving window width (WS).

[0114] Based on the moving window width, the time period data corresponding to each data point to be processed in the second soil moisture content time series data is extracted. The time period data is the data within the moving interval corresponding to the data point to be processed; the moving interval includes a first interval and a second interval; the first interval is the interval before the acquisition time corresponding to the data point to be processed, centered on the acquisition time; the second interval is the interval after the acquisition time corresponding to the data point to be processed, centered on the acquisition time; the length of both the first and second intervals is half the moving window width.

[0115] After extracting the time period data corresponding to each piece of data to be processed, the specific steps for determining the jump value include:

[0116] First, for each data point to be processed $x_i$, the window data $W_i$ within the time period before and after it is used to calculate the mean $\mu_i$ and the first standard deviation $\sigma_i$;

[0117] Then, calculate the difference between the mean $\mu_i$ and the data to be processed $x_i$;

[0118] Then, calculate the multiple $k_i$ between the difference and the first standard deviation $\sigma_i$. The expression is as follows:

[0119] $$k_i = \frac{|x_i - \mu_i|}{\sigma_i}$$

[0120] In the formula, $|\cdot|$ indicates taking the absolute value.

[0121] Then, calculate the absolute deviation $d_i$ between the mean $\mu_i$ and the data to be processed $x_i$. The expression is as follows:

[0122] $$d_i = |x_i - \mu_i|$$

[0123] Finally, data points whose multiple $k_i$ is greater than or equal to the third preset threshold and whose absolute deviation $d_i$ is greater than or equal to the fourth preset threshold are taken as jump values, and these jump values are removed from the second soil moisture content time series data to obtain the third soil moisture content time series data. Based on historical data analysis, the third preset threshold can be set to 1.5 or 2, and the fourth preset threshold can be set to 10%.

[0124] In this embodiment of the invention, jump values are identified and removed within the window before and after each data point to be processed by jointly applying two thresholds: the multiple of the standard deviation by which the data point deviates from the mean, and the absolute deviation of the data point from the mean. This avoids misjudging normal fluctuations during stable periods, when the standard deviation is small. For example, during periods without rainfall the soil moisture content fluctuates only slightly and the overall standard deviation is low, so a judgment based solely on the standard deviation multiple would flag normal data. It also avoids missing true jump values during periods of change, when the standard deviation is large. Thus, through the adaptive window and the collaborative judgment of dual thresholds (standard deviation multiple and absolute deviation), local pulse jump values are identified accurately, reducing the risk of both misjudging and missing jump values during stable and changing periods.
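The dual-threshold judgment described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: readings are assumed evenly spaced and gap-free so that the window can be expressed as a number of samples, the window is assumed to include the point being tested, and the population standard deviation is used. The function names and default thresholds (2 and 10%) are chosen for illustration from the values mentioned in the text.

```python
from statistics import mean, pstdev


def find_jump_indices(values, half_width, k_threshold=2.0, abs_threshold=10.0):
    """Return indices of points flagged as jump values.

    values        : evenly spaced moisture readings in %, assumed gap-free
    half_width    : half the moving window width, in samples
    k_threshold   : third preset threshold (multiple of the standard deviation)
    abs_threshold : fourth preset threshold (absolute deviation, in %)
    """
    jumps = []
    for i, x in enumerate(values):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        window = values[lo:hi]      # includes the point itself (assumption)
        mu = mean(window)
        sigma = pstdev(window)      # population standard deviation (assumption)
        deviation = abs(x - mu)
        multiple = deviation / sigma if sigma > 0 else 0.0
        # Both conditions must hold for the point to count as a jump value.
        if multiple >= k_threshold and deviation >= abs_threshold:
            jumps.append(i)
    return jumps


def remove_jumps(values, half_width, **thresholds):
    """Drop the flagged points, giving the third time series."""
    flagged = set(find_jump_indices(values, half_width, **thresholds))
    return [v for i, v in enumerate(values) if i not in flagged]
```

In a flat window such as `[20.0, 20.0, 20.0, 20.5, 20.0, 20.0, 20.0]`, the reading 20.5 exceeds the standard deviation multiple but not the absolute deviation threshold, so it is kept; this is the stable-period case the dual thresholds are meant to protect.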

[0125] Step 105: Delete the abnormal fluctuation values in the third soil moisture content time series data to obtain the target soil moisture content time series data.

[0126] Specifically, to identify abnormal fluctuation values that reverse direction frequently and with large amplitude, while avoiding misjudging minor fluctuations in soil moisture content during periods without rainfall as anomalies, this embodiment of the invention uses three indices for coordinated judgment: the Rate of Directional Change (RDC), the Standard Deviation (STD), and the significance test P-value of the trend regression. RDC and STD are used together to identify large fluctuations, while the P-value prevents a window from being misjudged as abnormal because small fluctuations in the first half of the moving window produce a high RDC, or because rainfall in the second half of the window raises the STD.

[0127] Furthermore, the step of determining the abnormal fluctuation value includes:

[0128] Based on the moving window width, the third soil moisture content time series data is divided into multiple window data;

[0129] Calculate the data turning rate and second standard deviation of the window data;

[0130] Perform linear trend regression on the window data to obtain the corresponding regression line;

[0131] Perform a t-test on the slope of the regression line to obtain the significance test P-value;

[0132] Take as abnormal fluctuation values the window data for which the data turning rate is greater than or equal to the fifth preset threshold, the second standard deviation is greater than or equal to the sixth preset threshold, and the significance test P-value is greater than or equal to the seventh preset threshold.

[0133] Specifically, first, the third soil moisture content time series data is divided according to the adaptive moving window width $WS$ to obtain the window data $X = \{x_1, x_2, \dots, x_n\}$ within each of the multiple windows to be analyzed;

[0134] Then, calculate the data turning rate $RDC$ of each window data $X$. The data turning rate characterizes the proportion of the data at which the direction of change reverses, and thus reflects how strongly the data fluctuates. The specific calculation process is as follows: calculate the first-order difference of the window data $X$, expressed as follows:

[0135] $$\Delta x_t = x_t - x_{t-1}$$

[0136] In the formula, $\Delta x_t$ represents the first-order difference result of the t-th data point of the window data $X$; $x_t$ and $x_{t-1}$ represent two adjacent data points in the window data $X$, namely the t-th data point and the (t-1)-th data point, respectively.

[0137] Then, obtain the sign change of the first-order difference sequence. The expression is as follows:

[0138] $$s_t = \begin{cases} 1, & \Delta x_t \cdot \Delta x_{t-1} < 0 \\ 0, & \text{otherwise} \end{cases}$$

[0139] In the formula, $s_t$ indicates the sign change of the first-order difference; $\Delta x_{t-1}$ represents the first-order difference result of the (t-1)-th data point of the window data $X$.

[0140] Slowly varying data such as soil moisture content typically exhibits a trend within a given time window, without frequent reversals in the direction of change. This invention therefore designs a data turning rate that quantifies how often the direction of change of the time-series data reverses, in order to identify fluctuations in the window data that do not conform to slowly varying physical processes. The data turning rate is used in the collaborative judgment of abnormal fluctuation values, thereby improving the accuracy and reliability of abnormal data identification. Specifically, the data turning rate is defined as the proportion of adjacent difference pairs with different signs among the total number of adjacent difference pairs. The calculation expression is as follows:

[0141] $$RDC = \frac{1}{n-2} \sum_{t=3}^{n} s_t$$

[0142] In the formula, $RDC$ is the data turning rate and $n$ is the length of the window data.
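A minimal Python sketch of the data turning rate follows. The function name is hypothetical, and dividing by the number of adjacent difference pairs (the window length minus two) is an assumption; it was chosen because it reproduces the RDC of 1.0 that the embodiment reports for the oscillating window [30%, 15%, 38%, 12%, 33%]. Zero differences are treated as neither a rise nor a fall, which is also an assumption.

```python
def turning_rate(window):
    """Share of adjacent first-difference pairs whose signs differ."""
    # First-order differences between neighbouring readings.
    diffs = [b - a for a, b in zip(window, window[1:])]
    # Each pair of consecutive differences can show at most one reversal.
    pairs = list(zip(diffs, diffs[1:]))
    if not pairs:
        return 0.0  # fewer than three points: no reversal can be observed
    changes = sum(1 for prev, cur in pairs if prev * cur < 0)
    return changes / len(pairs)


print(turning_rate([30, 15, 38, 12, 33]))          # every difference reverses
print(turning_rate([28, 29, 27, 30, 35, 42, 48]))  # reversals only at the start
```

The second call shows why RDC alone is not sufficient: a window that wobbles slightly before a steady rise still has a non-trivial turning rate.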

[0143] After the data turning rate $RDC$ is calculated, the second standard deviation $STD$ of each window data is further calculated. Linear trend regression is then performed on each window data to obtain the corresponding regression line, and finally a t-test is conducted on the slope of the regression line to obtain the significance test P-value ($P$) of the data trend regression.

[0144] Based on the analysis of historical soil moisture data, the fifth preset threshold $T_{RDC}$ corresponding to the data turning rate $RDC$, the sixth preset threshold $T_{STD}$ corresponding to the second standard deviation $STD$, and the seventh preset threshold $T_P$ corresponding to $P$ can be set adaptively.

[0145] When $RDC \ge T_{RDC}$, $STD \ge T_{STD}$, and $P \ge T_P$, the window data within the window to be analyzed fluctuates strongly without a clear trend. It is identified as abnormal fluctuation values and deleted, thus obtaining the target soil moisture content time series data.

[0146] For example, in one embodiment, within a time window (WS = 6 hours), the data exhibits disordered oscillations due to a loose sensor connection: [30%, 15%, 38%, 12%, 33%]. The calculated RDC is 1.0 (greater than the fifth preset threshold), the STD is 10% (greater than the sixth preset threshold), and the P-value is 0.8 (greater than the seventh preset threshold). All data in the window is therefore identified as abnormal fluctuation values and deleted.

[0147] For example, in another embodiment, during a rainfall event, the window data initially fluctuates slightly before steadily increasing: [28%, 29%, 27%, 30%, 35%, 42%, 48%]. Although the RDC and the STD are both greater than their corresponding thresholds, the trend regression shows a very strong upward trend, with a P-value of 0.008 (less than the seventh preset threshold). The fluctuation is therefore identified as a normal hydrological response and all data is retained, effectively avoiding misjudgment.
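The two embodiments above can be reproduced qualitatively with the following Python sketch of the three-index judgment. Everything beyond the three indices themselves is an assumption: the threshold values (0.3, 5% and 0.05) are hypothetical placeholders, the population standard deviation is used, the regression takes the sample index as the time axis, and the two-sided P-value is obtained by numerically integrating the Student t density. The P-values it computes need not match those quoted in the embodiments, which may have been obtained under different conventions.

```python
import math
from statistics import mean, pstdev


def turning_rate(window):
    """Share of adjacent first-difference pairs whose signs differ."""
    diffs = [b - a for a, b in zip(window, window[1:])]
    pairs = list(zip(diffs, diffs[1:]))
    return sum(p * c < 0 for p, c in pairs) / len(pairs) if pairs else 0.0


def t_two_sided_p(t, df, steps=2000):
    """Two-sided P-value of Student's t via Simpson integration of the density."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    area = pdf(0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1 - 2 * area * h / 3)


def slope_p_value(window):
    """P-value of the t-test on the least-squares slope (needs >= 3 points)."""
    n = len(window)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(window)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, window))
    slope = sxy / sxx
    sse = sum((y - y_bar - slope * (x - x_bar)) ** 2 for x, y in zip(xs, window))
    if sse == 0:
        return 0.0 if slope != 0 else 1.0  # perfect line, or flat data
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return t_two_sided_p(slope / std_err, n - 2)


def is_abnormal_fluctuation(window, rdc_min=0.3, std_min=5.0, p_min=0.05):
    """True when the window fluctuates strongly without a significant trend."""
    return (turning_rate(window) >= rdc_min
            and pstdev(window) >= std_min
            and slope_p_value(window) >= p_min)


print(is_abnormal_fluctuation([30, 15, 38, 12, 33]))          # loose connection
print(is_abnormal_fluctuation([28, 29, 27, 30, 35, 42, 48]))  # rainfall response
```

Under these assumed thresholds the oscillating window is flagged and the rainfall window is retained, because only the latter has a statistically significant slope.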

[0148] This invention uses three indicators—data turning rate, standard deviation, and significance test P-value—to make a coordinated judgment, which can accurately identify and eliminate irregular fluctuations that may be caused by factors such as intermittent sensor connection failures or external environmental interference. At the same time, it effectively avoids misjudging normal and trending moisture changes as abnormal fluctuations, thereby improving the accuracy of abnormal fluctuation value identification and thus improving the quality of soil moisture time series data.

[0149] Based on any of the above embodiments, the soil moisture content time-series data processing method provided by the present invention collects initial soil moisture content time-series data corresponding to one or more preset monitoring points using preset sensors; deletes data outside a first preset threshold interval from the initial soil moisture content time-series data to obtain first soil moisture content time-series data, wherein the first preset threshold interval is determined based on the range of the preset sensors and historical soil moisture data of the preset monitoring points, thereby quickly eliminating significantly abnormal data that clearly exceeds a reasonable range based on physical principles and historical patterns; deletes data that are repeated a preset number of times consecutively from the first soil moisture content time-series data to obtain second soil moisture content time-series data, thereby eliminating invalid and duplicate data caused by communication failures and ensuring the validity and variability of the data; deletes jump values in the second soil moisture content time-series data to obtain third soil moisture content time-series data, thereby accurately identifying and eliminating short-term drastic change points that are inconsistent with the trend of surrounding data; and deletes abnormal fluctuation values in the third soil moisture content time-series data to obtain the target soil moisture content time-series data, thereby accurately identifying and eliminating abnormal data hidden in the soil moisture content time-series data, improving the overall smoothness and reliability of the data, and thus improving the quality of the processed soil moisture content time-series data.
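The first two cleaning stages summarized above can be sketched in Python as follows. This is an illustration under assumptions: the function names, the interval bounds and the run length are hypothetical, timestamps are omitted for brevity, and a run of identical readings that reaches the preset length is assumed to be removed in full.

```python
from itertools import groupby


def drop_out_of_range(values, low, high):
    """Stage 1: keep only readings inside the first preset threshold interval."""
    return [v for v in values if low <= v <= high]


def drop_repeated_runs(values, min_run):
    """Stage 2: remove runs of identical readings at least `min_run` long."""
    kept = []
    for _, run in groupby(values):
        run = list(run)
        if len(run) < min_run:
            kept.extend(run)  # short runs are ordinary data and are retained
    return kept


raw = [20, 21, 21, 21, 21, 120, 22, 22, -3, 23]
first = drop_out_of_range(raw, low=0, high=60)   # hypothetical interval
second = drop_repeated_runs(first, min_run=3)    # hypothetical run length
print(second)
```

The output of the second stage would then be passed to the jump value and abnormal fluctuation stages, in that order.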

[0150] The soil moisture content time series data processing device provided by the present invention is described below. The device described below and the soil moisture content time series data processing method described above may be read with reference to each other.

[0151] Figure 3 is a schematic diagram of the soil moisture content time-series data processing device provided by the present invention. As shown in Figure 3, an embodiment of the present invention provides a soil moisture content time-series data processing device, comprising an acquisition module 301, a first processing module 302, a second processing module 303, a third processing module 304, and a fourth processing module 305, wherein:

[0152] The acquisition module 301 is used to acquire initial soil moisture content time-series data corresponding to one or more preset monitoring points through preset sensors; the first processing module 302 is used to delete data outside a first preset threshold range in the initial soil moisture content time-series data to obtain first soil moisture content time-series data; the first preset threshold range is determined based on the range of the preset sensors and the historical soil moisture data of the preset monitoring points; the second processing module 303 is used to delete data that repeats continuously a preset number of times in the first soil moisture content time-series data to obtain second soil moisture content time-series data; the third processing module 304 is used to delete jump values in the second soil moisture content time-series data to obtain third soil moisture content time-series data; the fourth processing module 305 is used to delete abnormal fluctuation values in the third soil moisture content time-series data to obtain target soil moisture content time-series data.

[0153] The soil moisture content time-series data processing device provided by this invention collects initial soil moisture content time-series data corresponding to one or more preset monitoring points through preset sensors; deletes data outside a first preset threshold range from the initial soil moisture content time-series data to obtain first soil moisture content time-series data, wherein the first preset threshold range is determined based on the range of the preset sensors and the historical soil moisture data of the preset monitoring points, thereby quickly eliminating significantly abnormal data that clearly exceeds a reasonable range based on physical principles and historical patterns; deletes data that are continuously repeated a preset number of times from the first soil moisture content time-series data to obtain second soil moisture content time-series data, thereby removing invalid and duplicate data caused by communication failures and ensuring data validity and variability; deletes jump values in the second soil moisture content time-series data to obtain third soil moisture content time-series data, thereby accurately identifying and removing short-term drastic change points that do not conform to the trend of surrounding data; and deletes abnormal fluctuation values in the third soil moisture content time-series data to obtain the target soil moisture content time-series data, thereby accurately identifying and removing abnormal data hidden within the soil moisture content time-series data, improving the overall smoothness and reliability of the data, and thus enhancing the quality of the processed soil moisture content time-series data.

[0154] Figure 4 is a schematic diagram of the physical structure of an example electronic device. As shown in Figure 4, the electronic device may include: a processor 410, a communication interface 420, a memory 430, and a communication bus 440, wherein the processor 410, the communication interface 420, and the memory 430 communicate with each other via the communication bus 440. The processor 410 can call logical instructions in the memory 430 to execute a soil moisture content time-series data processing method, which includes:

[0155] The initial soil moisture content time series data corresponding to one or more preset monitoring points are collected by preset sensors;

[0156] Delete data outside the first preset threshold range from the initial soil moisture content time series data to obtain the first soil moisture content time series data; the first preset threshold range is determined based on the range of the preset sensor and the historical soil moisture data of the preset monitoring points;

[0157] Delete the data that is repeated a preset number of times consecutively in the first soil moisture content time series data to obtain the second soil moisture content time series data;

[0158] Remove the jump values from the second soil moisture content time series data to obtain the third soil moisture content time series data;

[0159] Delete the abnormal fluctuation values in the third soil moisture content time series data to obtain the target soil moisture content time series data.

[0160] Furthermore, the logical instructions in the aforementioned memory 430 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0161] On the other hand, the present invention also provides a computer program product, the computer program product comprising a computer program that can be stored on a non-transitory computer-readable storage medium, wherein when the computer program is executed by a processor, the computer is able to execute the soil moisture content time-series data processing method provided by the above methods, the method comprising:

[0162] The initial soil moisture content time series data corresponding to one or more preset monitoring points are collected by preset sensors;

[0163] Delete data outside the first preset threshold range from the initial soil moisture content time series data to obtain the first soil moisture content time series data; the first preset threshold range is determined based on the range of the preset sensor and the historical soil moisture data of the preset monitoring points;

[0164] Delete the data that is repeated a preset number of times consecutively in the first soil moisture content time series data to obtain the second soil moisture content time series data;

[0165] Remove the jump values from the second soil moisture content time series data to obtain the third soil moisture content time series data;

[0166] Delete the abnormal fluctuation values in the third soil moisture content time series data to obtain the target soil moisture content time series data.

[0167] In another aspect, the present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements a method for processing soil moisture content time-series data provided by the methods described above, the method comprising:

[0168] The initial soil moisture content time series data corresponding to one or more preset monitoring points are collected by preset sensors;

[0169] Delete data outside the first preset threshold range from the initial soil moisture content time series data to obtain the first soil moisture content time series data; the first preset threshold range is determined based on the range of the preset sensor and the historical soil moisture data of the preset monitoring points;

[0170] Delete the data that is repeated a preset number of times consecutively in the first soil moisture content time series data to obtain the second soil moisture content time series data;

[0171] Remove the jump values from the second soil moisture content time series data to obtain the third soil moisture content time series data;

[0172] Delete the abnormal fluctuation values in the third soil moisture content time series data to obtain the target soil moisture content time series data.

[0173] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0174] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.

[0175] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element. Furthermore, it should be noted that the scope of the methods and apparatuses in the embodiments of this application is not limited to performing functions in the order shown or discussed, but may also include performing functions substantially simultaneously or in the reverse order, depending on the functions involved. For example, the described methods may be performed in a different order than described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.

[0176] In this application's embodiments, "determine B based on A" means that factor A must be considered when determining B. It is not limited to "B can be determined based solely on A," but should also include: "determine B based on A and C," "determine B based on A, C, and E," "determine C based on A, and further determine B based on C," etc. Additionally, it can include using A as a condition for determining B, for example, "when A meets the first condition, determine B using the first method"; another example, "when A meets the second condition, determine B," etc.; another example, "when A meets the third condition, determine B based on the first parameter," etc. Of course, it can also be a condition where A is a factor in determining B, for example, "when A meets the first condition, determine C using the first method, and further determine B based on C," etc.

[0177] It should also be noted that the terms "target," "first," and "second" in this invention are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that such terms can be used interchangeably where appropriate so that embodiments of this application can be implemented in orders other than those illustrated or described herein, and the objects distinguished by "first" and "second" are generally of the same class, without limiting the number of objects; for example, the first object can be one or more.

[0178] In this invention, the term "multiple" refers to two or more, and other quantifiers are similar.

[0179] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for processing soil moisture content time-series data, characterized in that it comprises:
The initial soil moisture content time series data corresponding to one or more preset monitoring points are collected by preset sensors;
Delete data outside the first preset threshold range from the initial soil moisture content time series data to obtain the first soil moisture content time series data;
The first preset threshold range is determined based on the range of the preset sensor and the historical soil moisture data of the preset monitoring point;
Delete the data that is repeated a preset number of times consecutively in the first soil moisture content time series data to obtain the second soil moisture content time series data;
Remove the jump values from the second soil moisture content time series data to obtain the third soil moisture content time series data;
Delete the abnormal fluctuation values in the third soil moisture content time series data to obtain the target soil moisture content time series data;
The steps for determining the jump value include:
Based on the second soil moisture content time series data, the width of the first window and the width of the second window are determined respectively;
The smaller of the first window width and the second window width is used as the adaptive moving window width;
The jump value is determined based on the width of the moving window;
The step of determining the width of the first window and the width of the second window based on the second soil moisture content time series data includes:
Determine the response data in the second soil moisture content time series data where the moisture content change value is greater than the second preset threshold;
The preset monitoring point corresponding to the response data is used as the target monitoring point;
Based on the soil texture type of the target monitoring point, the soil moisture diffusivity is determined;
The soil moisture response time is determined based on the soil depth and soil moisture diffusivity at the target monitoring point;
The width of the first window is determined based on the soil moisture response time;
Determine each monitoring period corresponding to the second soil moisture content time series data;
The median data for each monitoring period is extracted from the second soil moisture content time series data and used as resampling data;
Autocorrelation analysis is performed on the resampled data to obtain a target curve of the autocorrelation coefficient as a function of lag time; the lag time is determined based on the monitoring period;
The horizontal line corresponding to the preset autocorrelation coefficient threshold is intersected with the target curve, and the maximum lag time corresponding to the point above the horizontal line is used as the width of the second window;
Determining the jump value based on the width of the moving window includes:
Extract the time period data corresponding to each data point to be processed from the second soil moisture content time series data; the time period data is the data in the moving interval corresponding to the data point to be processed; the moving interval includes a first interval and a second interval; the first interval is the interval before the acquisition time corresponding to the data point to be processed; the second interval is the interval after the acquisition time corresponding to the data point to be processed; the interval length of the first interval and the second interval is half the width of the moving window;
Calculate the difference between the mean of the time period data and the data to be processed corresponding to the time period data;
Calculate the multiple between the first standard deviation of the data for the time period and the difference;
Calculate the absolute deviation between the mean of the time period data and the data to be processed corresponding to the time period data;
The data to be processed corresponding to the multiple being greater than or equal to the third preset threshold and the absolute deviation being greater than or equal to the fourth preset threshold is used as the jump value.

2. The method for processing soil moisture content time-series data according to claim 1, characterized in that the steps for determining the abnormal fluctuation value include:
Based on the moving window width, the third soil moisture content time series data is divided into multiple window data;
Calculate the data turning rate and second standard deviation of the window data;
Perform linear trend regression on the window data to obtain the corresponding regression line;
Perform a t-test on the slope of the regression line to obtain the significance test P-value;
The window data corresponding to the data turning rate being greater than or equal to the fifth preset threshold, the second standard deviation being greater than or equal to the sixth preset threshold, and the significance test P-value being greater than or equal to the seventh preset threshold, is used as the abnormal fluctuation value.

3. A soil moisture content time-series data processing device, characterized in that, include: The data acquisition module is used to acquire time-series data of initial soil moisture content corresponding to one or more preset monitoring points through preset sensors; The first processing module is used to delete data outside the first preset threshold range in the initial soil moisture content time series data to obtain the first soil moisture content time series data. The first preset threshold range is determined based on the range of the preset sensor and the historical soil moisture data of the preset monitoring point; The second processing module is used to delete data that is repeated a preset number of times consecutively in the first soil moisture content time series data to obtain the second soil moisture content time series data. The third processing module is used to delete the jump values ​​in the second soil moisture content time series data to obtain the third soil moisture content time series data. The fourth processing module is used to delete abnormal fluctuation values ​​in the third soil moisture content time series data to obtain the target soil moisture content time series data; The steps for determining the jump value include: Based on the second soil moisture content time series data, the width of the first window and the width of the second window are determined respectively. 
The smaller of the first window width and the second window width is used as the adaptive moving window width; The jump value is determined based on the width of the moving window; The step of determining the width of the first window and the width of the second window based on the second soil moisture content time series data includes: Determine the response data in the second soil moisture content time series data where the moisture content change value is greater than the second preset threshold; The preset monitoring point corresponding to the response data is used as the target monitoring point; Based on the soil texture type of the target monitoring point, the soil moisture diffusivity is determined; The soil moisture response time is determined based on the soil depth and soil moisture diffusivity at the target monitoring point. The width of the first window is determined based on the soil moisture response time; Determine each monitoring period corresponding to the second soil moisture content time series data; The median data for each monitoring period is extracted from the second soil moisture content time series data and used as resampling data; Autocorrelation analysis is performed on the resampled data to obtain a target curve of the autocorrelation coefficient as a function of lag time; the lag time is determined based on the monitoring period. The horizontal line corresponding to the preset autocorrelation coefficient threshold is intersected with the target curve, and the maximum lag time corresponding to the point above the horizontal line is used as the width of the second window. 
Determining the jump value based on the width of the moving window includes:
Extract the time period data corresponding to each data point to be processed from the second soil moisture content time series data, wherein the time period data is the data in the moving interval corresponding to the data point to be processed; the moving interval includes a first interval and a second interval; the first interval is the interval before the acquisition time of the data point to be processed; the second interval is the interval after the acquisition time of the data point to be processed; and the first interval and the second interval each have a length equal to half the width of the moving window;
Calculate the difference between the mean of the time period data and the data point to be processed corresponding to the time period data;
Calculate the multiple of the first standard deviation of the time period data that the difference represents;
Calculate the absolute deviation between the mean of the time period data and the data point to be processed corresponding to the time period data;
The data point to be processed for which the multiple is greater than or equal to the third preset threshold and the absolute deviation is greater than or equal to the fourth preset threshold is taken as the jump value.
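
For illustration only, the jump-value determination recited in claim 3 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names second_window_width and find_jump_values, the expression of window widths in sample counts rather than time units, the exclusion of the data point itself from its time period data, the use of the sample standard deviation, and every numeric value in the example are assumptions that do not come from the claims. The first window width, which the claim derives from the soil moisture response time, is simply passed in as a number.

```python
import numpy as np


def second_window_width(resampled, acf_threshold, max_lag):
    """Largest lag (in monitoring periods) whose autocorrelation
    coefficient lies above the threshold line; 0 if none does."""
    x = np.asarray(resampled, dtype=float)
    x = x - x.mean()
    denom = float(np.dot(x, x))
    if denom == 0.0:  # constant series: autocorrelation undefined
        return 0
    width = 0
    for lag in range(1, min(max_lag, x.size - 1) + 1):
        r = float(np.dot(x[:-lag], x[lag:])) / denom
        if r > acf_threshold:
            width = lag
    return width


def find_jump_values(values, window_width, multiple_threshold, abs_threshold):
    """Boolean mask marking points that satisfy both jump conditions.

    window_width is assumed to be a number of samples; half of it is
    taken before the point and half after it.
    """
    x = np.asarray(values, dtype=float)
    half = max(1, int(window_width) // 2)
    flags = np.zeros(x.size, dtype=bool)
    for i in range(x.size):
        before = x[max(0, i - half):i]       # first interval
        after = x[i + 1:i + 1 + half]        # second interval
        period = np.concatenate([before, after])
        if period.size < 2:
            continue  # too few neighbours for a standard deviation
        abs_dev = abs(period.mean() - x[i])
        std = period.std(ddof=1)
        if std > 0:
            multiple = abs_dev / std
        else:
            multiple = np.inf if abs_dev > 0 else 0.0
        flags[i] = multiple >= multiple_threshold and abs_dev >= abs_threshold
    return flags


# Hypothetical example: volumetric moisture readings with one spike.
readings = [0.20, 0.21, 0.20, 0.22, 0.21, 0.45,
            0.21, 0.20, 0.22, 0.21, 0.20]
first_width, second_width = 6, 8              # assumed window widths
adaptive_width = min(first_width, second_width)
jumps = find_jump_values(readings, adaptive_width, 3.0, 0.05)
cleaned = [v for v, is_jump in zip(readings, jumps) if not is_jump]
```

Requiring both the standard-deviation multiple and the absolute deviation to reach their thresholds means that a point in a very quiet window is not flagged merely because the local standard deviation is tiny, which is the apparent purpose of the fourth preset threshold.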

4. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the soil moisture content time-series data processing method as described in claim 1 or 2.

5. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the soil moisture content time-series data processing method as described in claim 1 or 2.

6. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the soil moisture content time-series data processing method as described in claim 1 or 2.