A soil thallium pollution risk early warning method
By constructing a risk early warning model under a unified grid and time step, combined with soil testing and operating conditions, the problem of quantification and dynamic prediction of soil thallium pollution risk assessment was solved, and a technical solution for automatically selecting priority inspection targets was realized.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SICHUAN NATURAL RESOURCES EXPERIMENTAL TESTING & RES CENT (SICHUAN NUCLEAR EMERGENCY TECH SUPPORT CENT)
- Filing Date
- 2026-03-30
- Publication Date
- 2026-06-12
AI Technical Summary
Existing methods for assessing soil thallium pollution risk lack a technical solution that systematically couples soil testing, rainfall processes, and upstream emission factors under a unified early warning grid and time step, making it difficult to quantify risk evolution and select priority inspection targets.
The system divides the early warning grid, collects soil samples and generates a detection library, calculates operating condition factors by combining meteorological and emission data, fits sensitivity parameters, constructs a risk indicator function, and performs risk prediction and early warning through a Bayesian fusion model and a time evolution model.
It enables stable quantitative prediction of soil thallium pollution risk and automatic screening of priority inspection targets under complex conditions, and provides a traceable early warning process and rule system.

Figure CN122198652A_ABST
Description
Technical Field
[0001] This invention relates to the field of soil pollution risk early warning technology, specifically a method for early warning of soil thallium pollution risk.
Background Technology
[0002] In current soil heavy metal pollution risk management, for highly toxic elements such as thallium, the approach largely relies on periodic or special sampling and testing. The measured concentrations are compared with soil background values and relevant standard limits, supplemented by simple spatial interpolation or regional statistics to provide indicators such as the locations and proportions of exceedances. Some regions have attempted to incorporate information such as rainfall and emission records, but these largely remain at the level of post-event analysis or qualitative judgment, failing to form a data system organized according to a unified grid and time step. Testing results, rainfall processes, enterprise emissions, and upstream transport often belong to different systems and have different definitions, making direct correlation difficult. Existing methods in regulatory practice focus more on static judgments of "whether or not exceedances have occurred." They generally lack a quantitative characterization that can be repeatedly calculated to describe how the risk evolves over time under different conditions such as continuous heavy rainfall, high emissions, or prolonged drought within the same grid, and the impact of thallium load carried by upstream water on downstream grids.
[0003] On the other hand, existing risk assessment and early warning work often constructs risk indices from single-source information, for example relying solely on soil sampling, or on emission inventories plus empirical coefficients. Such approaches lack long-term quantitative verification of the systematic biases, fluctuations, and uncertainties in data from different sources, making it difficult to provide structured indicators such as "risk center level," "potential level under high-level conditions," and "probability of exceeding the warning threshold." When these methods are applied at the city or watershed scale, the common practice is to use administrative regions or functional zones as units for coarse-grained classification. This makes it difficult to produce a spatial risk distribution at the early warning grid scale that reflects both local soil physicochemical characteristics and upstream transport impacts. Furthermore, deciding which grids require more intensive sampling and focused investigation of upstream emissions in the future relies heavily on manual experience and ad-hoc consultations, with no traceable, recomputable rule system.
[0004] Therefore, under current technological conditions, there is still a lack of a technical solution for soil thallium pollution scenarios that can systematically couple working factors such as soil monitoring records, rainfall processes, thallium emissions and upstream contributions with soil physicochemical characteristics on a unified early warning grid and time step, and form a quantifiable grid-level risk index and its posterior characteristics based on fully considering the long-term deviations and uncertainties of various risk sources, thereby supporting the dynamic assessment of grid risks and the selection of priority inspection targets for a future period of time.
Summary of the Invention
[0005] To address the shortcomings of existing technologies, this invention provides a method for early warning of soil thallium pollution risks, thereby solving the problems mentioned in the background section.
[0006] To achieve the above objectives, the present invention provides the following technical solution: a method for early warning of soil thallium pollution risk, comprising:
S1. Divide the area into early warning grids, collect soil samples in each grid, and determine the acid-exchangeable thallium content, pH and clay content. Compile the samples according to the sampling location to generate a soil testing library.
S2. Based on meteorological and emission data, calculate the rainfall index, emission load and upstream contribution of each grid to generate a working condition factor field at the same scale as the soil testing library.
S3. Within grids that have multiple detection records, fit the working condition response relationship with acid-exchangeable thallium content as the response variable and the working condition factors, pH and clay content as independent variables. Extract sensitivity parameters and extend them according to soil type and geomorphology codes to form a sensitivity parameter field.
S4. Based on the soil test results, working condition factors and sensitivity parameters of each grid, construct a risk indicator function to obtain the normalized risk index, and aggregate observation and estimation results from different sources to form a multi-source risk set.
S5. Construct a Bayesian fusion model from the multi-source risk set and the error distribution of each source to obtain the posterior distribution of the grid risk index, and extract risk features as input to the early warning model.
S6. Based on the risk posterior time series and working condition factors, train a constrained time evolution model and output grid risk prediction values. Combine spatial neighborhood characteristics to correct the prediction values through spatial residual regression, and generate graded early warning results for each grid over the future period according to the thresholds.
[0007] Furthermore, in S1, collecting soil samples and generating the soil testing library includes: setting up soil sampling points within the early warning grid and recording their locations and sampling times to form soil samples with sample numbers; testing the soil samples in the laboratory for acid-exchangeable thallium content, pH, and clay content; and transmitting the test results electronically to the monitoring platform, which performs format and range verification. Records that fail verification are stored in the error record database. Records that pass verification are grouped by early warning grid and sampling date, the arithmetic mean of each indicator is calculated as its representative value, and the representative values and sample numbers are written into the soil testing library.
[0008] Furthermore, S2 includes: the monitoring platform collects hourly rainfall and emission records daily from meteorological departments and sewage discharge monitoring systems under a common time synchronization benchmark; the daily rainfall, maximum hourly rainfall, and number of consecutive rainy days are calculated for each grid based on the correspondence between grids and meteorological observation stations; the upstream and downstream relationships of the grids are determined from the digital elevation model, and the emission intensity of each emission unit is attenuated by distance and accumulated into each grid along the water flow path to obtain the emission load and upstream contribution; and working condition factor entries containing the rainfall indices, emission loads, upstream contributions, and rule version numbers are generated by grid and date and written into the working condition factor field.
[0009] Furthermore, S3 includes: the monitoring platform initiates parameter fitting tasks within the statistical period; for each early warning grid, representative values of acid-exchangeable thallium content, working condition factors, pH, and clay content are extracted from the soil testing library and the working condition factor field. Sample records are constructed with the representative value of acid-exchangeable thallium content as the explained variable and the working condition factors, pH, and clay content as the explanatory variables. The statistical fitting module groups the samples by soil type and geomorphology, performs regression fitting on each group, and obtains sensitivity parameters including rainfall leaching sensitivity parameters, emission load response parameters, and soil buffering capacity parameters.
[0010] Furthermore, the monitoring platform divides the early warning area into soil zones according to soil type codes and into geomorphic zones according to geomorphic codes. When the number of effective detections of an early warning grid is lower than the preset lower limit, the sensitivity parameter version that matches the soil type code and geomorphic code of the early warning grid is retrieved from the parameter library and registered as the version used by that grid. When no matching combination exists, the candidate sensitivity parameter versions with the same soil type code and geomorphic code within the preset extension range are sorted by average relative error, the selected version is registered as the alternative version, and the allocation relationship is recorded in the internal mapping table.
[0011] Furthermore, S4 includes: The monitoring platform obtains the most recent representative values of acid-exchangeable thallium content, pH, and clay content from the soil testing database according to the early warning grid and assessment day. Rainfall indices, emission loads, and upstream contributions covering the operating condition observation window are obtained from the operating condition factor field. A risk indication function containing detection risk components, operating condition risk components, and upstream correction components is constructed by combining sensitivity parameters. The normalized risk index is obtained by weighting each component according to the weight coefficient, and the risk level is divided. The grid number, assessment date, normalized risk index, risk level, component value and corresponding version number are written into the risk base library.
[0012] Furthermore, S5 includes: the risk fusion module reads the error feature library version information and fusion rule version information in each fusion task; multi-source risk sets are extracted from the risk database according to the early warning grid number and assessment date; the risk value set is divided by risk source, and sources with missing quality indicators, as well as sources judged unreliable from their quality indicators, are eliminated. Based on the soil type code and landform code, the average deviation and standard deviation of each source are queried in the error characteristic database. The corrected source risk index is obtained by subtracting the average deviation from the source risk index, and the weight of each source is determined by normalizing the reciprocals of the standard deviations.
[0013] Furthermore, after completing the source deviation correction and source weight determination, the risk fusion module calculates the weighted average risk index as the risk center value using the corrected risk indices and source weights, and calculates the comprehensive standard deviation using the source standard deviations and source weights. The risk fluctuation range is calculated from the comprehensive standard deviation according to the multiplier given in the configuration table. The risk center value and the risk fluctuation range are superimposed to obtain the upper quantile estimate. The probability of exceeding the warning threshold is determined from these quantities. The risk center value, upper quantile, probability of exceeding the warning threshold, and version information are written into the risk feature database.
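The fusion arithmetic described in [0012] and [0013] can be sketched as follows. This is only an illustrative reading, not the patented implementation: the text does not give the formula for the comprehensive standard deviation or for the exceedance probability, so the sketch assumes independent sources (variances combined as the sum of squared weighted standard deviations) and a normal posterior. The function name `fuse_sources`, the default multiplier `k`, and the tuple layout are hypothetical.

```python
import math
from statistics import NormalDist


def fuse_sources(sources, threshold, k=1.645):
    """Fuse per-source risk indices for one grid and assessment date.

    sources: list of (risk_index, mean_bias, std_dev) tuples, std_dev > 0.
    threshold: warning threshold on the normalized risk index.
    k: fluctuation multiplier (hypothetical default; the source reads it
       from a configuration table).
    """
    # Deviation correction: subtract each source's average deviation.
    corrected = [risk - bias for risk, bias, _ in sources]

    # Source weights: normalized reciprocals of the standard deviations.
    inverse = [1.0 / std for _, _, std in sources]
    total = sum(inverse)
    weights = [value / total for value in inverse]

    # Risk center value: weighted average of the corrected indices.
    center = sum(w * c for w, c in zip(weights, corrected))

    # ASSUMPTION: independent sources, so the combined variance is
    # sum((w_i * sigma_i)^2). The source does not state this formula.
    sigma = math.sqrt(sum((w * s[2]) ** 2 for w, s in zip(weights, sources)))

    # Upper quantile estimate: center plus the fluctuation range.
    upper = center + k * sigma

    # ASSUMPTION: normal posterior for the exceedance probability.
    p_exceed = 1.0 - NormalDist(center, sigma).cdf(threshold)

    return {"weights": weights, "center": center, "sigma": sigma,
            "upper": upper, "p_exceed": p_exceed}
```

With two equally uncertain sources whose corrected indices coincide with the threshold, the sketch returns equal weights and an exceedance probability of one half, which is a quick sanity check on the weighting and the normal assumption.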
[0014] Furthermore, S6 includes: The monitoring platform extracts time series of risk features and time series of operating conditions covering the observation window from the risk feature library and operating condition factor field on a rolling cycle. By inputting the time series of risk characteristics and the time series of operating conditions into a constrained time evolution model, the predicted risk center values for each future day are obtained. Based on spatial neighborhood, spatial residual regression is performed to correct the risk center value prediction, and the corrected risk center value prediction is mapped to the warning level according to the warning threshold. Generate early warning result records containing priority patrol grid markers and write the early warning result records into the early warning result database.
[0015] Compared with the prior art, the present invention has the following beneficial effects: 1. By linking soil sampling and testing, meteorological and emission condition factor construction, sensitivity parameter fitting and generalization according to soil type and landform, risk indicator function calculation, multi-source error correction and Bayesian fusion, temporal evolution prediction and spatial residual regression into a closed-loop early warning process that is version-manageable and traceable under a unified time synchronization and grid scale, the system can achieve stable quantification, rolling prediction and automatic screening of priority inspection targets for soil thallium pollution risk in each early warning grid even under complex situations such as uneven spatiotemporal distribution of monitoring data and superposition of upstream input and local emissions.
[0016] 2. By uniformly configuring and versioning the early warning grid division, sample numbering and representative value merging rules, operating condition factor weighting and upstream tracking rules, sensitivity parameter fitting and generalization boundary, risk indicator function weighting scheme, error characteristic library and source weight calculation, time evolution model constraints, and the idempotency and quality indicator strategies of the various tasks throughout the entire process, the key algorithm logic and judgment conditions are codified as engineering rules. This allows the solution to be reproduced by those skilled in the art without relying on implicit experience, and facilitates subsequent maintenance and expansion.
Attached Figure Description
[0017] Figure 1 is a flowchart illustrating a method for early warning of soil thallium pollution risks according to the present invention.
Detailed Implementation
[0018] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0019] Example: Figure 1 provides a flowchart of the soil thallium pollution risk early warning method according to the present invention. The method includes: S1. Divide the area into early warning grids, collect soil samples within the grids, and determine the acid-exchangeable thallium content, pH, and clay content. Compile the samples according to their location to generate a soil testing database. The specific implementation is as follows: During the early warning area determination phase, the competent authority first selects the boundary of the target area on the monitoring platform, loads the basic geographic base map and administrative division information of the area into the platform, and divides the area into several early warning grids according to unified rules. Each early warning grid is a geographically closed unit. Preferably, the early warning grids form a regular grid of approximately square cells, with the side length set to about one kilometer in the platform configuration and kept unchanged. A unique grid number is assigned to each early warning grid, and the grid number is stored in the basic configuration library in one-to-one correspondence with the grid's geometric boundary. At the same time, a version identifier is recorded for the current grid division. In subsequent operation, the grid boundaries are not adjusted unless approved, which ensures that every subsequent record falls into exactly one early warning grid and remains traceable.
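As a minimal sketch of how a record could be assigned to exactly one grid, the following maps a projected coordinate to a cell of a regular one-kilometer grid. It assumes coordinates already projected to meters and a rectangular target area; the function name `grid_number`, the origin parameters, and the `G<row><col>` numbering format are hypothetical, since the source does not specify a numbering scheme.

```python
import math

CELL_SIZE_M = 1000.0  # side length of about one kilometer, per the text


def grid_number(x_m, y_m, origin_x, origin_y, n_cols, n_rows):
    """Return the grid number containing point (x_m, y_m), or None.

    Coordinates are assumed to be in a projected system measured in meters;
    (origin_x, origin_y) is the lower-left corner of the target area.
    """
    col = math.floor((x_m - origin_x) / CELL_SIZE_M)
    row = math.floor((y_m - origin_y) / CELL_SIZE_M)
    if not (0 <= col < n_cols and 0 <= row < n_rows):
        return None  # the point lies outside the early warning area
    # Hypothetical numbering format: zero-padded row then column.
    return f"G{row:04d}{col:04d}"
```

Using half-open cells (floor division) means a point on a shared edge belongs to only one cell, which matches the requirement that every record fall into a single grid.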
[0020] During the sampling site layout phase, technicians draw a sampling site map within each early warning grid according to the monitoring plan. The sampling site map provides suggested locations and numbers for the soil sampling points. Preferably, the plan requires at least three soil sampling points within each early warning grid. Each soil sampling point represents the actual location where soil samples are excavated within the early warning grid. For each soil sampling point, at least the planar location, sampling depth, and sampling time are recorded. On-site personnel preferably use a handheld positioning device to record the current latitude and longitude coordinates and a timestamp when the soil samples are bagged. The timestamp is taken from the built-in clock of the positioning device or from a unified time signal, preferably one provided by a unified time synchronization server or a satellite positioning system, so that the sampling time is comparable with subsequent working condition information under the same time reference. The sampling depth is the distance from the ground surface to the soil sample. A depth scale can be marked on the sampling tool on site, and reading the scale shows whether the sampling depth falls within the depth range allowed by the monitoring plan. Preferably, the plan sets the target depth to 0 to 20 centimeters, and a sample whose sampling depth deviates from the target depth by more than 5 centimeters is regarded as a record with a large deviation and is removed when the representative values are later merged. In this way, a group of soil sampling points that meets the quantity requirement and has reasonable depths is formed within each early warning grid.
[0021] The collected soil samples are labeled with sample numbers according to a unified numbering rule. The sample number includes the early warning grid number, sampling point number, and sampling date, and remains unchanged during submission and data entry so that the laboratory test results can be matched one to one with the on-site sampling locations. After the samples are delivered to a qualified laboratory, the laboratory tests each sample according to current soil environmental standards for acid-exchangeable thallium (AET) content, pH, and clay content. AET content refers to the amount of easily migratable thallium released from the soil sample under standard extraction conditions. The laboratory extracts the soil sample with an acidic solution for a specified time, measures the thallium concentration in the solution at the end of the extraction, and calculates the AET content from the extraction solution volume and soil sample mass according to conversion rules. pH refers to the acidity or alkalinity of the soil when mixed with water in a fixed proportion. The laboratory obtains this value by letting the mixture stand for a predetermined time after mixing and measuring the pH of the resulting suspension. Clay content is the mass fraction of fine particles smaller than a predetermined upper size limit in dry soil. The laboratory obtains this value by separating the fine particles, either by sedimentation within a fixed time period or by sieving, and then calculating the ratio of fine particle mass to total mass. Each of the above test values is recorded together with the testing unit and the testing time.
[0022] After the laboratory completes the test, it sends the test results to the monitoring platform using a unified electronic reporting template within a preset time window. The template must include at least the sample number, the corresponding early warning grid number, the sampling date, the laboratory test date, the acid exchangeable thallium content value and its unit, the pH value, the clay content value, and the information of the testing personnel and the review personnel. When the monitoring platform receives the electronic message, it uses the message arrival time as the receiving timestamp for archiving.
[0023] Before writing data into the soil testing database, the monitoring platform performs format and range checks on the submitted records. The format check verifies that each field is complete and correct according to the template, comparing the number and names of the fields to determine whether the report meets the requirements. The range check compares the acid-exchangeable thallium content, pH, and clay content with reasonable upper and lower limits pre-configured in the platform. For example, the limits can be set based on the recommended ranges in current soil environmental standards and on historical testing results for the region. Preferably, the lower and upper pH limits are set to four and nine, the upper limit for acid-exchangeable thallium content is set to no less than the larger of the regional historical maximum value and the relevant standard screening value, and the lower and upper limits of the clay content are set to zero and 100%. The system identifies obviously erroneous values by judging whether the acid-exchangeable thallium content is greater than zero and lower than the set upper limit, whether the pH falls within the set pH range, and whether the clay content falls within the set interval. For records that do not meet these conditions, the platform does not discard the original message; it marks the record as an error and stores it in a dedicated error record database. At the same time, the error type and corresponding sample number are written to the operation log as the basis for subsequent manual review, and the record is excluded from subsequent indicator calculation.
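The format and range checks above might look like the following sketch. The field names, the error labels, and the functions `validate_record` and `route_records` are hypothetical; the pH and clay limits follow the preferred values in the text, and the thallium upper limit is passed in because the text derives it from regional data.

```python
# Hypothetical field names for the reporting template described in [0022].
REQUIRED_FIELDS = (
    "sample_no", "grid_no", "sampling_date", "test_date",
    "aet", "aet_unit", "ph", "clay", "tester", "reviewer",
)


def validate_record(record, aet_upper, ph_range=(4.0, 9.0),
                    clay_range=(0.0, 100.0)):
    """Return a list of error labels; an empty list means the record passes."""
    # Format check: every template field must be present and non-empty.
    missing = [f for f in REQUIRED_FIELDS if record.get(f) is None]
    if missing:
        return [f"missing_field:{name}" for name in missing]

    errors = []
    # Range check: AET must be greater than zero and below the upper limit.
    if not (0 < record["aet"] < aet_upper):
        errors.append("aet_out_of_range")
    if not (ph_range[0] <= record["ph"] <= ph_range[1]):
        errors.append("ph_out_of_range")
    if not (clay_range[0] <= record["clay"] <= clay_range[1]):
        errors.append("clay_out_of_range")
    return errors


def route_records(records, aet_upper):
    """Split records into accepted ones and entries for the error database."""
    accepted, error_db = [], []
    for record in records:
        errors = validate_record(record, aet_upper)
        if errors:
            # The original message is kept alongside the error types.
            error_db.append({"record": record, "errors": errors})
        else:
            accepted.append(record)
    return accepted, error_db
```

Keeping the failed record in `error_db` rather than dropping it mirrors the requirement that the original message be retained for manual review.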
[0024] For records that pass format and range verification, the monitoring platform assigns them to the corresponding early warning grid and sampling day group according to the early warning grid number and sampling date carried in the message, and establishes the main record of the soil testing database on this basis. When multiple qualified testing records exist in the same early warning grid on the same day, the monitoring platform merges these records into a representative value to avoid duplicate counting in subsequent analysis. In the merging process, the platform first removes the sample records judged to deviate significantly based on the sampling depth information, and then sorts the remaining records by laboratory testing date. Duplicate testing is identified by comparing the difference in testing dates between adjacent records and the relative difference in their acid-exchangeable thallium content. Preferably, if the testing dates of two records are no more than one day apart and the relative difference in acid-exchangeable thallium content does not exceed 5%, the records are considered duplicate test results from the same batch. One record meeting this condition is retained, and the remaining records are stored in the soil testing database as supplementary records but are not included in the representative value calculation. For the records that are still retained, representative values are calculated for acid-exchangeable thallium content, pH, and clay content. Preferably, the arithmetic mean is used, that is, the values of the records are added and divided by the number of records. In scenarios where greater resistance to extreme values is needed, the median can be used instead, that is, the values are sorted and the middle value is taken as the representative value. Both representative value calculation methods are explicitly registered in the platform configuration and assigned a version number. When the calculation method needs to be adjusted, the new version takes effect, and the old version is retained for the interpretation of historical results.
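The merging steps can be illustrated with the sketch below, which processes the records of one grid and one sampling day. Several details are assumptions rather than statements from the source: the depth deviation is measured from the 0 to 20 cm target range, the relative difference is taken against the larger of the two values, and each record is compared with the most recently retained record. All function and field names are hypothetical.

```python
from datetime import date
from statistics import mean, median


def depth_deviation_cm(depth_cm, target=(0.0, 20.0)):
    """Distance of a sampling depth from the target range (assumed reading)."""
    return max(0.0, target[0] - depth_cm, depth_cm - target[1])


def is_duplicate(prev, rec, max_days=1, max_rel_diff=0.05):
    """Same-batch retest: dates within one day and AET within 5 percent."""
    days_apart = abs((rec["test_date"] - prev["test_date"]).days)
    largest = max(abs(prev["aet"]), abs(rec["aet"]))
    rel_diff = 0.0 if largest == 0 else abs(prev["aet"] - rec["aet"]) / largest
    return days_apart <= max_days and rel_diff <= max_rel_diff


def merge_representative(records, method="mean"):
    """Return (representative values, supplementary records) for one group."""
    # Step 1: drop records whose depth deviates by more than 5 cm.
    usable = [r for r in records if depth_deviation_cm(r["depth_cm"]) <= 5.0]
    # Step 2: sort by laboratory testing date and flag duplicate retests.
    retained, supplementary = [], []
    for rec in sorted(usable, key=lambda r: r["test_date"]):
        if retained and is_duplicate(retained[-1], rec):
            supplementary.append(rec)  # stored, but not averaged
        else:
            retained.append(rec)
    if not retained:
        return None, supplementary
    # Step 3: arithmetic mean by default, median for robustness.
    aggregate = mean if method == "mean" else median
    values = {key: aggregate([r[key] for r in retained])
              for key in ("aet", "ph", "clay")}
    return values, supplementary
```

Passing `method="median"` switches to the robust variant; in a versioned configuration the chosen method would be stored next to the result.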
[0025] After completing the representative value calculation, the monitoring platform generates a detection record for that early warning grid on that date, writing the representative value, the corresponding sample quantity, the list of sample numbers involved in the merging, the version number of the representative value calculation method, and the sampling and detection time range into the soil testing database. The soil testing database is preferably deployed in a storage system with transaction management capabilities. Each batch import task generates a task identifier at startup and attaches this identifier and the current timestamp to the task context. When processing the reporting messages, the platform writes the task identifier into each newly generated detection record. When the upstream system repeatedly sends the same batch of messages due to network fluctuations, the monitoring platform compares the task identifier and sample number to determine whether this batch has already been written. If a record with the same combination of task identifier and sample number already exists in the database, no new record is generated; instead, the previous processing result status is returned, which ensures that duplicate requests result in only one valid write. For records in the same task that are only partially written because writing fails due to a temporary fault, the monitoring platform automatically re-initiates the writing process according to the configured number of retries and waiting interval. Preferably, the number of retries is set to three and the waiting time between two retries is set to five minutes. Each retry is recorded in the log. When the maximum number of retries is reached and failed records remain, the platform marks the task as failed and leaves the content of the soil testing database generated by the previous successful task unchanged. An alarm message is sent to the operation and maintenance personnel through the log alarm module, prompting them to manually check the data link or storage system status, thereby avoiding the overwriting of historical valid data when the database content is incomplete.
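The idempotent write with bounded retries can be sketched as follows, using an in-memory dictionary in place of the transactional store. The names `import_batch`, `TransientWriteError`, and the injected `write`, `sleep`, and `log` callables are hypothetical; the retry count of three and the five-minute wait follow the preferred values in the text. The sketch treats "three retries" as three attempts after the first failure, which is one possible reading.

```python
import time


class TransientWriteError(Exception):
    """Raised by the storage layer for a temporary, retryable fault."""


def import_batch(store, task_id, records, write, max_retries=3,
                 wait_s=300, sleep=time.sleep, log=print):
    """Write records keyed by (task_id, sample_no); return 'ok' or 'failed'."""
    for record in records:
        key = (task_id, record["sample_no"])
        if key in store:
            continue  # already written by an earlier delivery of this batch
        retries = 0
        while True:
            try:
                write(store, key, record)
                break
            except TransientWriteError as exc:
                if retries >= max_retries:
                    # Earlier successful writes are left untouched.
                    log(f"task {task_id} failed on {key}: {exc}")
                    return "failed"
                retries += 1
                log(f"retry {retries}/{max_retries} for {key}")
                sleep(wait_s)
    return "ok"
```

Injecting `sleep` and `log` keeps the function testable without waiting five minutes per retry; a production version would instead rely on the storage system's own transaction and uniqueness guarantees.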
[0026] Regarding sampling rhythm, the competent authorities can set the sampling frequency based on monitoring tasks and resource conditions. Preferably, the regular sampling frequency is set to complete soil sampling and testing once per quarter for each early warning grid. When there are special investigations or key grids, the sampling frequency of certain grids can be temporarily increased. The monitoring platform records the latest testing date of each early warning grid in the soil testing database and identifies grids that have not been tested for a long time by comparing the current date with the latest testing date, providing identification of sparse data areas for subsequent early warning models. For example, when the area of the early warning region is within the scale of 1,000 to 10,000 square kilometers, and the total amount of soil samples that can be completed each quarter is on the same order of magnitude as the number of early warning grids, the regular sampling rhythm and laboratory testing capabilities can generally support the above-mentioned site layout and testing arrangements.
[0027] Throughout the process, the early warning grid division, sample numbering rules, representative value calculation methods, error record handling strategies, and retry strategies are all centrally managed through configuration files or parameter tables. Each change generates a new version while retaining the old version in the system, so that the specific rules used at the time can be reproduced when interpreting or recalculating historical early warning results. Under the premise that the early warning area is a city or watershed and the sample quantity matches the detection capacity, this implementation method can provide soil testing basic data with complete spatial coverage, clear temporal rhythm, and traceable quality for subsequent risk assessment and early warning steps within the range that the existing sampling capacity and laboratory testing capacity can bear. Those skilled in the art can reproduce the soil testing library construction process in similar scenarios by dividing the grid, setting up sampling points, organizing testing, and running the import and merging logic in the monitoring platform in the above order, thus laying an implementable foundation for realizing soil thallium pollution risk early warning.
[0028] S2. Based on meteorological and emission data, calculate the rainfall index, emission load, and upstream contribution for each grid, generating a working condition factor field at the same scale as the soil monitoring database. The specific implementation is as follows: During the data acquisition and processing phase, the monitoring platform collects daily meteorological and emission records related to thallium migration in the soil from meteorological departments and the pollution discharge monitoring system under a unified time reference. The meteorological records include at least the daily rainfall, the hourly rainfall sequences, and the number of consecutive rainy days derived from the daily rainfall, compiled by several fixed meteorological observation stations at the end of each day. The daily rainfall is the sum of the hourly rainfall amounts recorded within a calendar day: after receiving a station's hourly rainfall records for a given day, the monitoring platform sums them to obtain that station's daily rainfall. The maximum hourly rainfall is the largest value in the hourly rainfall sequence for the same day: the monitoring platform compares the hourly rainfall values of that day and takes the maximum as the station's maximum hourly rainfall. The number of consecutive rainy days is the number of days in the most recent unbroken period in which the daily rainfall is greater than the trace precipitation threshold. When calculating the number of consecutive rainy days for a given day, the monitoring platform reads the daily rainfall records for the station or grid for the most recent 30 days, working backward from that day. Each day whose daily rainfall is greater than the preset trace precipitation threshold is counted toward the consecutive days. Once a day with daily rainfall less than or equal to the threshold is encountered, the accumulation stops, and the accumulated number of days is taken as the number of consecutive rainy days for that day. The trace precipitation threshold is given in advance in the system configuration, and is preferably set to two millimeters.
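The three station-level rainfall quantities described above can be sketched as follows. This is a minimal illustration rather than the platform's actual code: the function names and the latest-first ordering of the daily series are assumptions, while the 2 mm threshold and the 30-day look-back are the preferred values given in the text.

```python
from typing import Sequence

TRACE_THRESHOLD_MM = 2.0  # preferred trace precipitation threshold from the text
LOOKBACK_DAYS = 30        # look-back limit from the text


def daily_rainfall(hourly_mm: Sequence[float]) -> float:
    """Sum the hourly rainfall recorded within one calendar day."""
    return float(sum(hourly_mm))


def max_hourly_rainfall(hourly_mm: Sequence[float]) -> float:
    """Largest hourly rainfall of the day (0.0 when there are no records)."""
    return float(max(hourly_mm, default=0.0))


def consecutive_rainy_days(daily_mm_latest_first: Sequence[float],
                           threshold: float = TRACE_THRESHOLD_MM,
                           lookback: int = LOOKBACK_DAYS) -> int:
    """Count days backward from the current day until one is <= threshold.

    The series is assumed to be ordered with the current day first.
    """
    count = 0
    for value in daily_mm_latest_first[:lookback]:
        if value <= threshold:
            break  # a day at or below the threshold ends the run
        count += 1
    return count
```

Note that a day with exactly 2 mm does not count as rainy, because the text requires the daily rainfall to be strictly greater than the threshold.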
[0029] The emission records must include at least the daily emission intensity, emission period, and emission location reported by the thallium-containing emission unit. The daily emission intensity is the total mass of thallium-containing pollutants emitted into the environment by the emission unit within a calendar day. When the monitoring platform receives records reported at the hourly average emission rate, it multiplies the rate by the number of hours in the corresponding emission period to obtain the daily emission amount, and then converts the units to the form of mass divided by days as agreed. The emission period is the start and end time of the actual emission by the unit on that day. The monitoring platform determines whether the emission falls within the calendar day by comparing the emission period with the time interval of the calendar day to be assessed. The emission location is the longitude and latitude of the unit's emission outlet. When mapping the emission to the warning grid, the monitoring platform uses these coordinates as the basis to determine the warning grid to which it belongs.
[0030] The monitoring platform initiates the operating condition factor update task at a pre-set time each day. When the task starts, it generates a task identifier and a start timestamp. First, based on the early warning grid boundary stored in the basic configuration library, it retrieves meteorological records from each meteorological observation station in the target area from the meteorological department interface according to the configured time range, and retrieves emission records contained within the boundary rectangle of the target area from the pollution discharge supervision system interface. The interface configuration registers the measurement units, reporting cycles, and allowable error ranges of meteorological and emission data, and assigns a version number to each version of the interface configuration. When writing each batch of data to the log, the monitoring platform records the data source organization identifier, interface address, interface configuration version number, and reception time together, so as to trace the measurement caliber used in the batch of data later.
[0031] When mapping meteorological station records to the early warning grid, the monitoring platform determines the grid location of each observation station based on the grid's geometric boundaries. If one or more meteorological observation stations exist within a given early warning grid, the daily rainfall data from these internal stations is directly used in the calculation of that grid's daily rainfall. If multiple observation stations exist, their daily rainfall values are summed and divided by the number of observation stations to obtain the grid's daily rainfall. If no observation stations exist within a given early warning grid, the monitoring platform uses the grid's geometric center coordinates as a reference point and selects several observation stations within a pre-set search radius. Preferably, the search radius can be set to 50 kilometers. The three closest observation stations within the radius are weighted according to the distance between each station and the grid center. The weights can be normalized by the inverse of the distance, that is, the closer the observation station is, the greater the weight, and the farther the observation station is, the smaller the weight. The monitoring platform multiplies the daily rainfall of each observation station by the corresponding weight and sums them up. The sum is taken as the daily rainfall of the grid for that day. The maximum hourly rainfall can be weighted in the same way at the grid level. That is, the maximum hourly rainfall of the day is first selected from the hourly rainfall sequence of each observation station, and then these maximum hourly rainfalls are combined into the maximum hourly rainfall of the grid according to the distance weighting method mentioned above.
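The station-to-grid mapping above can be sketched as follows. The function name and the input layout (rainfall values for stations inside the grid, and distance/rainfall pairs for stations outside it) are assumptions made for illustration; the 50 km radius, the three nearest stations, and the normalized inverse-distance weights follow the preferred values in the text.

```python
from typing import List, Optional, Sequence, Tuple


def grid_daily_rainfall(inside_mm: Sequence[float],
                        outside: Sequence[Tuple[float, float]],
                        radius_km: float = 50.0,
                        k: int = 3) -> Optional[float]:
    """Daily rainfall of one early warning grid.

    inside_mm: daily rainfall of stations located inside the grid.
    outside:   (distance_km_to_grid_center, daily_rainfall_mm) pairs for
               stations outside the grid.
    Returns None when no station is available within the search radius.
    """
    if inside_mm:
        # Stations inside the grid: plain arithmetic mean.
        return sum(inside_mm) / len(inside_mm)

    # No internal station: take the k closest stations within the radius.
    nearest: List[Tuple[float, float]] = sorted(
        (d, r) for d, r in outside if d <= radius_km
    )[:k]
    if not nearest:
        return None

    # Inverse-distance weights, normalized to sum to one. The small floor
    # guards against a zero distance and is an added assumption.
    weights = [1.0 / max(d, 1e-6) for d, _ in nearest]
    total = sum(weights)
    return sum(w * r for w, (_, r) in zip(weights, nearest)) / total
```

The same function can be applied to the maximum hourly rainfall by passing each station's daily maximum instead of its daily total.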
[0032] In the calculation of consecutive rainfall days at the grid level, the monitoring platform retrieves the daily rainfall records of the most recent 30 days from the historical record table of operating conditions for each early warning grid. It compares the daily rainfall of the grid with the trace precipitation threshold day by day from the current day. If it is greater than the threshold, the consecutive days are incremented by one and the process continues to move forward. If it is less than or equal to the threshold, the process stops. When a stopping condition is encountered or the record is retrieved to 30 days ago, the accumulated number of days is taken as the consecutive rainfall days of the grid for the current day and written into the operating condition record for that day.
[0033] To calculate emission load and upstream contribution, the monitoring platform first uses a digital elevation model (DEM) to construct elevation relationships between early warning grids. The DEM is stored in the system as a regular grid, with each DEM cell containing the elevation value of its center point. The DEM resolution and elevation units are specified in the configuration table. The monitoring platform divides each early warning grid into sub-regions corresponding to the DEM cells, collects the elevation values of the DEM cells covering that early warning grid, and uses their average as the representative elevation of the grid. Then, it compares the representative elevation of each early warning grid with those of its adjacent grids, treating grids with lower elevations as downstream candidates for grids with higher elevations. Finally, for each early warning grid, one or more significantly lower grids are selected from the adjacent grids based on the elevation difference and designated as its downstream grids on the natural slope, and a water flow connection is established between the early warning grid and these downstream grids. When the upstream grid set of a given early warning grid needs to be determined, the monitoring platform traces the water flow connections upstream from that grid. Starting from all directly adjacent upstream grids that have a water flow connection with that grid, the platform expands layer by layer toward higher grids, increasing the step number by one at each layer. In each expansion, only the directly adjacent upstream grids of the grids reached in the previous layer are selected, and the current step number is recorded. When the step number reaches the configured upper limit, the expansion stops. Preferably, the upper limit of the step number can be set to ten steps. 
In this way, the upstream warning grid set with hydrological connectivity on the natural slope is determined within a range of no more than ten upstream levels.
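The layer-by-layer upstream tracing can be sketched as a breadth-first search with a step limit. The function name and the `flows_to` mapping (each grid number mapped to its downstream grids) are hypothetical; the limit of ten steps is the preferred value from the text.

```python
from collections import deque
from typing import Dict, List


def upstream_grids(target: str,
                   flows_to: Dict[str, List[str]],
                   max_steps: int = 10) -> Dict[str, int]:
    """Return {upstream_grid: step_number} for grids draining into `target`.

    flows_to maps each grid to the downstream grids it is connected to.
    """
    # Invert the connections so each grid lists its direct upstream grids.
    upstream_of: Dict[str, List[str]] = {}
    for src, dsts in flows_to.items():
        for dst in dsts:
            upstream_of.setdefault(dst, []).append(src)

    steps: Dict[str, int] = {}
    frontier = deque([(target, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_steps:
            continue  # step limit reached: do not expand further
        for up in upstream_of.get(node, []):
            if up != target and up not in steps:
                steps[up] = depth + 1
                frontier.append((up, depth + 1))
    return steps
```

For a chain A → B → C → D, `upstream_grids("D", {"A": ["B"], "B": ["C"], "C": ["D"]}, max_steps=2)` returns `{"C": 1, "B": 2}`, since A lies three steps upstream.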
[0034] When calculating emission-related factors daily, the monitoring platform performs the following steps for each early warning grid: First, it searches that day's emission records for all emission units whose emission location coordinates fall within the grid, and sums the daily emission intensity values of these units to obtain the grid's direct emission amount for that day. Then, for each grid in the upstream early warning grid set obtained above, it looks up the direct emission amount and the upstream cumulative emission amount, and transmits these emissions along the water flow path to the current early warning grid. During transmission along each segment of the water flow path, it multiplies the emissions by an attenuation coefficient set according to distance. Preferably, the attenuation coefficient corresponding to each kilometer of travel distance can be set to 0.8. When the water flow path length exceeds one kilometer, the attenuation coefficient is raised to the power corresponding to the path length, so that the more distant an upstream grid is, the smaller its contribution to the current grid. After summing the attenuated contributions of all upstream early warning grids, the monitoring platform records this sum as the total upstream contribution and adds it to the direct emission of the current early warning grid to obtain the emission load of that grid on that day. When writing the operating condition factor record, the monitoring platform stores the emission load value, the total upstream contribution, and the list of upstream early warning grid numbers in the corresponding record so that the subsequent risk assessment steps can distinguish between direct emissions and upstream migration.
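The distance attenuation can be illustrated with a short sketch. It assumes that each upstream grid contributes its own direct emission multiplied by 0.8 raised to the path length in kilometers; how the platform avoids double counting of upstream cumulative amounts is not specified in the text, so that simplification and the function name are assumptions.

```python
from typing import Iterable, Tuple

ATTENUATION_PER_KM = 0.8  # preferred per-kilometer coefficient from the text


def emission_load(direct_emission: float,
                  upstream: Iterable[Tuple[float, float]],
                  per_km: float = ATTENUATION_PER_KM) -> Tuple[float, float]:
    """Return (emission_load, total_upstream_contribution) for one grid-day.

    upstream: (direct_emission_of_upstream_grid, flow_path_length_km) pairs.
    """
    # Each upstream emission is attenuated by per_km ** path_length_km.
    upstream_total = sum(e * per_km ** km for e, km in upstream)
    return direct_emission + upstream_total, upstream_total
```

With a direct emission of 10 and upstream grids emitting 5 at 1 km and 10 at 2 km, the upstream contribution is 5 × 0.8 + 10 × 0.64 = 10.4 and the emission load is 20.4, in whatever mass-per-day unit the configuration uses.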
[0035] In the daily cycle of operating condition factor updates, after the monitoring platform completes the calculation of daily rainfall, maximum hourly rainfall, consecutive rainfall days, emission load, and upstream contribution, it organizes these quantities according to the combination key of "early warning grid number plus calendar date" to form an operating condition factor field. Each record in the operating condition factor field includes at least the early warning grid number, date, daily rainfall, maximum hourly rainfall, consecutive rainfall days, emission load, total upstream contribution, and the version numbers of the site weighting rule, upstream attenuation rule, and consecutive rainfall day determination rule. At the configuration level, after determining a set of site weighting parameters, upstream attenuation parameters, and consecutive rainfall determination thresholds, the system administrator registers these parameters and their applicable areas in a unified configuration table, assigns a unique version number to the corresponding rule, and makes the new rule effective on the new collection date by adding a new version when the rule is adjusted. The old version configuration is still retained for the interpretation or recalculation of historical operating condition factors.
[0036] The working condition factor update task is preferably triggered automatically at a fixed time each day, such as a fixed time after midnight. After the task starts, the monitoring platform attempts, up to the configured number of retries and within the configured maximum waiting time, to obtain complete data for the day from the meteorological department interface and the pollution discharge supervision system interface. Preferably, the maximum waiting time can be set to two hours, the number of retries can be set to three, and the waiting time between two retries can be set to twenty minutes. When an acquisition attempt succeeds and the proportion of missing stations or emission units does not exceed the preset missing threshold, for example 10% of the total number, the monitoring platform first fills the gap for each missing station or emission unit with that station's or unit's most recent valid record. For locations that cannot be filled directly from a most recent record, spatial interpolation is performed using the same distance weighting method as the aforementioned station weighting: within the configured search radius, several nearby observation stations or emission units with data are selected, and their records from the previous day or the most recent day are weighted by distance to obtain the replacement value for the missing location. Then, the complete operating condition factor field is generated according to the aforementioned mapping and calculation steps. If the number of retries has reached the upper limit within the maximum waiting time and the data missing ratio is still higher than the preset missing threshold, the monitoring platform marks this operating condition factor update task as a failure. No new operating condition factor record is generated for that day. 
Instead, the operating condition factor field of the previous day is marked as a continuation state in the system for the subsequent risk assessment module to use as a temporary reference. The reason for failure and the category of missing data are recorded in the operation and maintenance log. The operation and maintenance personnel are notified through the alarm mechanism to check the status of the meteorological interface and the pollution discharge supervision system.
[0037] Through the aforementioned process of constructing operating condition factors, the monitoring platform, with an observation window preferably of thirty days, can update the operating condition factor field covering all early warning grids within a warning area of 1,000 to 10,000 square kilometers, under the condition that the number of early warning grids is on the same order of magnitude as the number of meteorological observation stations and the number of pollution discharge units, at a fixed time every day. Based on this description, those skilled in the art can configure the communication interface with the meteorological department and the pollution discharge supervision system, the mapping relationship between the early warning grids and the observation stations and emission units, and the digital elevation model data, and implement the program according to the weighting, tracking, and attenuation rules. This allows the same operating condition factor calculation process to be reproduced in similar areas, providing a traceable basis of rainfall and emission factors that are consistent with the rhythm of the soil detection library for subsequent soil thallium pollution risk assessment.
[0038] S3. Within a grid with multiple detection records, the working condition response relationship is fitted using acid-exchangeable thallium content as the response quantity and working condition factor, pH, and clay content as independent variables. Sensitivity parameters are extracted and generalized according to soil type and geomorphological coding to form a sensitivity parameter field. The specific implementation is as follows: During the sensitivity parameter construction phase, the monitoring platform initiates parameter fitting tasks periodically in batch processing based on unified time synchronization and existing soil testing database and working condition factor field. When a task starts, it first reads parameters such as the statistical cycle length, observation window length, and error control limits for one year from the configuration table, and generates a task identifier and timestamp. Then, it iterates through the records of all early warning grids in the soil testing database, sorts each early warning grid by calendar date, and counts the number of valid tests in the most recent year. Valid test records are records of acid-exchangeable thallium content, pH, and clay content, which are all checked against the aforementioned format and range to form representative values. When the number of valid tests for a certain early warning grid in a year is greater than or equal to the preset lower limit, the grid is marked as a candidate grid that can participate in the fitting. The lower limit is given in advance in the system, preferably set to three.
[0039] For each candidate grid, the monitoring platform further selects a continuous observation window in the parameter fitting task. The observation window is a time interval covering all detection dates and appropriately extended forward. Preferably, the start date of the observation window can be set to the start date of the one-year statistical period for that grid, and the end date can be set to the end date of the statistical period or the most recent detection date. This ensures that the observation window includes both the detection dates and the operating factors used to describe rainfall and emission processes before and after these dates. Within this observation window, for each date with available detection records, the monitoring platform reads the representative value of acid-exchangeable thallium content, pH, and clay content for that grid from the soil testing database, and reads the daily rainfall, maximum hourly rainfall, number of consecutive rainfall days, emission load, and upstream contribution factors for the corresponding day from the operating factor field. These values constitute a sample record. Each sample record includes at least one explained quantity and multiple explanatory quantities. The explained quantity is the representative value of the acid-exchangeable thallium content of the grid on that day. The explanatory quantities include the daily rainfall, consecutive rainfall days, emission load and upstream contribution of the grid on that day and for a certain number of days prior, as well as the pH and clay content on that day. In actual implementation, the monitoring platform can preferably construct the rainfall and emission portions of the explanatory quantities in a sliding manner within the observation window. 
For example, the daily rainfall and emission load of the day before, the three days before, and the seven days before the observation date can be summed or averaged respectively, and these statistics can be used together with the consecutive rainfall days on that day as the explanatory quantities related to rainfall and emissions. This processing can be completed by accessing the records of the corresponding date in the working condition factor field and adding them in calendar order.
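The sliding statistics can be sketched as follows. The function name is hypothetical, the windows cover the one, three, and seven days before the observation date as in the example above, and treating a missing day as zero is an assumption that a real implementation might replace with an explicit quality flag.

```python
from datetime import date, timedelta
from typing import Dict, Sequence


def window_sums(series: Dict[date, float],
                obs_date: date,
                spans: Sequence[int] = (1, 3, 7)) -> Dict[int, float]:
    """Sum a daily factor over the n days preceding obs_date, for each span n.

    series maps calendar dates to a daily value such as rainfall or
    emission load. Days without a record count as 0.0 (assumption).
    """
    result: Dict[int, float] = {}
    for n in spans:
        days = [obs_date - timedelta(days=k) for k in range(1, n + 1)]
        result[n] = sum(series.get(d, 0.0) for d in days)
    return result
```

Dividing each sum by its span gives the averaged variant mentioned in the text.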
[0040] To obtain an empirical response relationship between acid-exchangeable thallium content and rainfall and emissions under given soil types and geomorphological conditions, the monitoring platform needs to label each sample with soil type and geomorphological category while constructing sample records. Soil type can be divided into several types based on soil surveys or existing soil maps, such as sandy soil, loam, and clay, and stored in the system in the form of codes, with each warning grid corresponding to a soil type code. Geomorphological conditions can be divided into plains, terraces, gentle slopes, and gentle hills based on elevation, slope, and surface morphology, and are also stored in the configuration in the form of codes, and a corresponding relationship is established with the warning grid. When reading sample records, the monitoring platform simultaneously attaches the soil type code and geomorphological code of the grid to the sample, thereby ensuring that subsequent fitting can be grouped according to soil type and geomorphological conditions.
[0041] After receiving all sample records for a specific soil type and landform combination, the statistical fitting module first checks the sample quantity. If the sample quantity is less than the preset minimum sample size, for example fewer than ten, the data volume for this combination is considered insufficient to fit parameters on its own, and no new sensitivity parameter version is generated for this combination in this round of tasks. For combinations with a sufficient sample quantity, the statistical fitting module arranges the explanatory quantities of each sample into a corresponding numerical matrix and arranges the explained quantities into a one-dimensional numerical sequence. Then, it calculates a set of regression coefficients according to the principle of minimizing the residual sum of squares, so that the residual sum of squares between the estimated acid-exchangeable thallium content, obtained as a linear combination of the explanatory quantities, and the measured content in the samples is the smallest among all possible coefficient combinations. In this process, the module can adjust the coefficient values iteratively: it first initializes a set of coefficients, then repeatedly calculates the estimated values, residuals, and residual sum of squares and compares the new residual sum of squares with the old one, until the reduction of the residual sum of squares is less than the preset convergence threshold. The preset convergence threshold is given in the configuration; preferably, iteration stops when an iteration reduces the residual sum of squares by less than one percent.
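The least-squares principle above can be illustrated with a small dependency-free sketch. Instead of the iterative adjustment described in the text, it solves the normal equations directly by Gauss-Jordan elimination, which minimizes the same residual sum of squares for a linear model; the function names and the added intercept column are assumptions.

```python
from typing import List, Sequence


def fit_least_squares(X: Sequence[Sequence[float]],
                      y: Sequence[float]) -> List[float]:
    """Return [intercept, b1, ..., bp] minimizing the residual sum of squares."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    p = len(rows[0])
    # Augmented normal equations: (A^T A | A^T y).
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("explanatory quantities are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def residual_sum_of_squares(coef: Sequence[float],
                            X: Sequence[Sequence[float]],
                            y: Sequence[float]) -> float:
    """Sum of squared differences between estimated and measured values."""
    est = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]
    return sum((e - t) ** 2 for e, t in zip(est, y))
```

In practice a numerical library routine would normally be used; the explicit version is shown only to make the minimization criterion concrete.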
[0042] The regression coefficients obtained after fitting are used as sensitivity parameters in this scheme. These sensitivity parameters are a set of values, including at least parameters reflecting rainfall leaching sensitivity, emission load response, and soil buffering capacity. The rainfall leaching sensitivity parameter can be understood as the average change in acid-exchangeable thallium content when daily rainfall increases by a certain amount, all other things being equal. To determine this parameter, the statistical fitting module compares the variation of acid-exchangeable thallium content across the samples within the observation window under different rainfall levels and uses the regression model described above to estimate the average contribution of each unit of rainfall to the acid-exchangeable thallium content. The emission load response parameter can be understood as the average impact of changes in emission load and upstream contribution on changes in acid-exchangeable thallium content. During the regression process, the statistical fitting module captures the relationship between the emission-related explanatory quantities and the explained quantity in the corresponding regression coefficients, so that the magnitude of the coefficient values reflects the degree of response. The soil buffering capacity parameter is reflected by the regression coefficients of the explanatory quantities corresponding to pH and clay content. After the fitting is completed, the statistical fitting module analyzes the sign and magnitude of these coefficients to determine the adsorption or buffering contribution of changes in soil pH and clay content to acid-exchangeable thallium content.
[0043] To verify the reliability of the fitting results, after obtaining the sensitivity parameters, the monitoring platform uses a regression model to estimate the acid-exchangeable thallium content for each sample and compares it with the measured content in each sample. The relative error for each sample is calculated, which is the ratio of the absolute value of the difference between the estimated and measured values to the measured value. The average relative error of all samples is then calculated and compared with a preset error control limit, which is given in the configuration and preferably set to 20%. When the average relative error is less than or equal to the error control limit, the sensitivity parameters are considered to have an acceptable fitting quality for that soil type and landform combination and can be effective in this round of tasks. When the average relative error exceeds the control limit or there are obviously unreasonable situations, such as a negative value for the rainfall leaching sensitivity parameter and field experience indicating that increased rainfall in this type of soil does not lead to a decrease in thallium content, the monitoring platform marks the set of parameters as a failed fit, does not write it into the parameter database, and records the corresponding soil type, landform combination, and abnormal situation in the task report for technical personnel to review whether the selection method of explanatory quantities needs to be adjusted or the zoning needs to be re-divided.
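The error check can be written out as a short sketch. The function names are hypothetical, the 20% limit is the preferred value from the text, and measured values are assumed to be strictly positive so that the relative error is defined.

```python
from typing import Sequence


def mean_relative_error(estimated: Sequence[float],
                        measured: Sequence[float]) -> float:
    """Average of |estimated - measured| / measured over all samples."""
    if not measured or len(estimated) != len(measured):
        raise ValueError("need equal, non-empty sample lists")
    return sum(abs(e - m) / m for e, m in zip(estimated, measured)) / len(measured)


def fit_is_acceptable(estimated: Sequence[float],
                      measured: Sequence[float],
                      limit: float = 0.20) -> bool:
    """True when the mean relative error does not exceed the control limit."""
    return mean_relative_error(estimated, measured) <= limit
```

The plausibility review of coefficient signs described in the text is left to technical personnel and is not modeled here.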
[0044] For parameter groups deemed to have acceptable fitting quality, the monitoring platform generates a sensitivity parameter version record. This record includes at least the sensitivity parameter value, applicable soil type code, applicable landform code, a list of warning grids involved in the fitting, the start and end dates of the observation window, the number of samples, the average relative error, and the identification of this fitting task. The system assigns a unique version number to this parameter group and writes it into the parameter library. Multiple historical versions can be retained for the same soil type and landform combination in the parameter library. However, in subsequent working condition response calculations and risk assessments, for a certain warning grid within a certain statistical period, only one specific version that is effective within that period is called. The effective rule can be preferably set to use the version with the latest effective date and satisfactory fitting quality as the currently used version.
[0045] For early warning grids that have fewer than the preset lower limit of detection frequency within a one-year statistical period, the monitoring platform will not individually fit sensitivity parameters for them in this round of tasks. Instead, it will promote existing parameters through soil zoning or geomorphological zoning. Soil zoning involves dividing the early warning area into several zones based on soil type, with the soil type codes of the early warning grids within each zone being the same or similar. Geomorphological zoning involves dividing zones based on geomorphological codes. The zoning information is established in advance in the configuration table. During the promotion process, the monitoring platform first searches the parameter library for a corresponding combination of valid sensitivity parameter versions based on the soil type code and geomorphological code of the early warning grid. If it exists, it will directly apply the corresponding parameters. The parameter version number is registered as the sensitivity parameter version used by the early warning grid in the current statistical period, and this allocation relationship is recorded in the internal mapping table. If there is no perfectly matching combination, a soil type priority strategy can be preferred. First, find parameter groups with the same soil type but whose landform codes are allowed to deviate within a certain range. Within this range, mutual promotion between plains, terraces, gentle slopes and gentle hills is preferred, but these parameters are not applied to significantly different mountain landforms. Then, select a group from these candidate parameter groups according to the average relative error from small to large as the alternative version, so that the boundary of landform promotion remains clear in the configuration and description.
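The promotion lookup can be sketched as follows. The record layout, the landform code strings, and the function name are hypothetical; the sketch assumes that an exact soil and landform match takes the latest effective version, and that otherwise the candidate with the same soil type and the smallest average relative error is chosen from the mutually promotable landforms named in the text.

```python
from typing import Dict, List, Optional, Tuple

# Landforms between which promotion is allowed, per the text.
PROMOTABLE_LANDFORMS = frozenset(
    {"plain", "terrace", "gentle_slope", "gentle_hill"}
)


def select_version(soil: str, landform: str,
                   library: List[Dict]) -> Tuple[Optional[Dict], str]:
    """Pick a sensitivity parameter version for one early warning grid.

    Each library record is assumed to have the keys "soil", "landform",
    "mre" (average relative error) and "effective" (ISO date string).
    Returns (record, source), where source is "exact", "promoted" or "none".
    """
    exact = [r for r in library
             if r["soil"] == soil and r["landform"] == landform]
    if exact:
        return max(exact, key=lambda r: r["effective"]), "exact"

    if landform in PROMOTABLE_LANDFORMS:
        candidates = [r for r in library
                      if r["soil"] == soil
                      and r["landform"] in PROMOTABLE_LANDFORMS]
        if candidates:
            # Smallest average relative error first.
            return min(candidates, key=lambda r: r["mre"]), "promoted"

    return None, "none"
```

The returned source label corresponds to the quality mark that distinguishes promoted parameters from independently fitted ones.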
[0046] The parameter fitting task runs in the background according to a pre-set rhythm, preferably once a quarter. After each run, the monitoring platform generates a task report, which lists the total number of samples participating in the fitting, the fitting pass rate of each soil type and landform combination, the average relative error range of each parameter group, and the list of effective versions used. At the same time, the task report is associated with the version records in the parameter library to form a complete evidence chain of sensitivity parameters from the original detection record to the final version taking effect.
[0047] Through the aforementioned process of constructing and promoting sensitivity parameters, grid-by-grid sensitivity parameters adapted to specific grids and soil conditions can be obtained in areas with abundant data within the early warning region. In areas with relatively sparse data, representative parameters can also be introduced through soil and geomorphological partitioning. This enables each early warning grid in the entire region to complete the response estimation of acid-exchangeable thallium content to rainfall and emissions under clear parameter version constraints in subsequent operational response calculations. Those skilled in the art can reproduce the construction and updating process of the sensitivity parameter field in similar scenarios by configuring sample screening conditions, constructing observation windows, organizing explanatory and explained quantities, performing regression fitting and error testing, and managing parameter versions by following the above steps. This provides a traceable and physically meaningful parameter basis for subsequent risk indication functions and early warning models.
[0048] S4. Based on the soil testing results, working condition factors, and sensitivity parameters of each grid, a risk indicator function is constructed to obtain a normalized risk index. Then, observation and estimation results from different sources are aggregated to form a multi-source risk set. The specific implementation is as follows: During the risk assessment phase, after the daily working condition factor field is updated, the monitoring platform executes risk index calculation tasks sequentially for each early warning grid and each calendar day according to a preset schedule. Upon task initiation, it first retrieves the most recent representative value of acid-exchangeable thallium content, along with the corresponding pH and clay content, from the soil testing database based on the grid number and date. The most recent record refers to a record with a testing date no later than the current assessment date and within the configured maximum look-back period. The maximum look-back period is given in the configuration and is preferably set to one year. If no testing record is found within this look-back period, the detection-based risk component is not calculated in this task; only the inferred component based on operating conditions and sensitivity parameters is retained, and this is noted in the quality label. Subsequently, the monitoring platform extracts the daily rainfall, maximum hourly rainfall, number of consecutive rainfall days, emission load, and upstream contribution sequence for that day and several days prior from the operating condition factor field based on the same grid number and date. The range of the previous several days is used as the operating condition observation window in the risk calculation, preferably set to thirty days. 
When constructing the observation window, the monitoring platform reads the operating condition factor records of that grid day by day, backward from the current assessment date, and arranges the rainfall and emission indicators for each day in calendar order. For quantities related to short-term cumulative exposure, the cumulative exposure value over the past few days is obtained by summing the daily values. For quantities related to process intensity, such as maximum hourly rainfall, the representative process intensity is obtained by taking the maximum value within the observation window. This results in a set of stable operating condition statistics for each assessment day, which can be called by the subsequent risk indication function. After the operating condition information is ready, the monitoring platform reads from the sensitivity parameter field the sensitivity parameter version that matches the soil type code and geomorphological code of the early warning grid and is effective within the current statistical period. The sensitivity parameters include at least rainfall leaching sensitivity parameters, emission load response parameters, and soil buffering capacity parameters. Each parameter group has a version number and an applicable time range. By comparing the current date with the effective start and end dates of the parameter versions, the monitoring platform ensures that only one definite parameter version is selected for calculation within the same statistical period. If no completely matching parameter version is found in the parameter library, a candidate parameter group with the same soil type and an allowable deviation in geomorphological code is selected as a substitute according to the aforementioned zoning promotion logic, choosing the candidate with the smallest average relative error. The quality mark records that the parameters come from zoning promotion rather than from independent fitting for this grid.
[0049] With all three types of information available, the monitoring platform constructs the risk indication function for that day's early warning grid in a fixed order. The risk indication function is a function structure with physical meaning that combines soil testing information, working condition factors, and sensitivity parameters in numerical form on an assessment day. Its calculation process includes three interrelated components.
[0050] The first component is the direct risk component based on a single soil test. In this component, the monitoring platform compares the most recent representative value of acid-exchangeable thallium content with the soil background value and relevant standard limits for the region. The soil background value can be given in the configuration table based on long-term monitoring results for the region. The relevant standard limits can adopt the screening or control values for thallium in the current soil environmental standards. In actual calculation, the monitoring platform first determines the increase factor of the current representative value relative to the background value, and then determines the difference between the representative value and the standard limit. These two comparison relationships are mapped to an initial risk scale between zero and one. Specifically, a dimensionless ratio can be obtained by subtracting the background value from the representative value and dividing the result by the standard limit minus the background value. This ratio is then restricted to the range of zero to one: when the ratio is less than zero, it is counted as zero, and when the ratio is greater than one, it is counted as one, thus obtaining the initial risk index based on the test.
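As a minimal sketch of the scaling just described, assuming the standard limit is strictly greater than the background value (the function name `scaled_risk` is illustrative):

```python
def scaled_risk(value, background, limit):
    """Map a thallium content value to a risk scale in [0, 1].

    ratio = (value - background) / (limit - background), clipped to [0, 1].
    Assumes limit > background; otherwise the ratio is undefined.
    """
    if limit <= background:
        raise ValueError("standard limit must exceed the background value")
    ratio = (value - background) / (limit - background)
    return min(1.0, max(0.0, ratio))
```

For example, with a background value of 0.5 and a limit of 1.5 (illustrative numbers in arbitrary units), a representative value of 1.0 maps to 0.5, and any value at or above 1.5 maps to 1.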
[0051] The second component is the risk estimation component based on the operating condition response relationship. In this component, the monitoring platform does not directly use the daily detection values, but rather estimates the potential level of acid-exchangeable thallium content under the current operating conditions based on the daily rainfall and emission load within the operating condition observation window, as well as the corresponding sensitivity parameters. Specifically, the rainfall index within the observation window is first weighted and accumulated using the rainfall leaching sensitivity parameter. Preferably, the daily rainfall within the most recent three days can be assigned weights of 0.5, 0.3, and 0.2 respectively, while the weight of earlier dates is set to zero. Within the same observation window, the daily rainfall is multiplied by its corresponding weight and then summed to obtain a time-weighted rainfall index. This index is then multiplied by the rainfall leaching sensitivity parameter to obtain an estimate of the rainfall-induced thallium migration contribution. Simultaneously, the emission load and upstream contribution within the observation window are time-weighted in the same way and scaled by the emission load response parameters: the emission load and upstream contribution within the most recent three days are weighted with the same weights and summed to obtain an estimate of the thallium input contribution caused by emissions. Then, based on soil buffering capacity parameters and current pH and clay content, the soil's adsorption or fixation capacity for the above input contribution is estimated. The buffering contribution is subtracted from the total input contribution to form a net cumulative contribution.
The monitoring platform superimposes these net cumulative contributions onto the regional background value to obtain an estimated value of the expected acid-exchangeable thallium content under the current operating conditions and sensitivity. Then, by comparing with the background value and the standard limit, the expected content is mapped to an operating condition risk index between zero and one using the same scaling transformation method as the first component.
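The operating condition component can be sketched as below. Several points are assumptions rather than part of the description: the weights 0.5, 0.3, 0.2 are applied with the most recent day first; the buffering contribution is represented by a single hypothetical `buffer_fraction` in [0, 1], because the description does not give a formula linking pH and clay content to buffering; and all function and parameter names are illustrative.

```python
WEIGHTS = (0.5, 0.3, 0.2)  # assumed order: most recent day first

def time_weighted(series_recent_first):
    """Weighted sum over the three most recent days; earlier days get weight zero."""
    return sum(w * x for w, x in zip(WEIGHTS, series_recent_first))

def condition_risk(rain, emission, upstream, k_rain, k_emis,
                   buffer_fraction, background, limit):
    """Estimate the operating condition risk index in [0, 1].

    rain, emission, upstream: daily values, most recent day first.
    k_rain, k_emis: sensitivity parameters (leaching, emission response).
    buffer_fraction: hypothetical share of the input retained by the soil.
    """
    rain_contrib = k_rain * time_weighted(rain)
    combined = [e + u for e, u in zip(emission, upstream)]
    input_contrib = k_emis * time_weighted(combined)
    total = rain_contrib + input_contrib
    net = total - buffer_fraction * total      # subtract buffering contribution
    expected = background + net                # superimpose on background value
    ratio = (expected - background) / (limit - background)
    return min(1.0, max(0.0, ratio))
```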
[0052] The third component is a correction component based on the superposition of upstream contributions. In this component, the monitoring platform focuses on characterizing the additional risk amplification effect of upstream regional conditions on the current grid. The specific steps are as follows: First, read the daily upstream contribution values within the observation window from the condition factor field. Then, use the same weight sequence as the aforementioned rainfall time weighting to perform weighted summation to obtain the upstream cumulative contribution index. Next, read the emission load of this grid within the observation window and calculate the cumulative emission contribution of this grid using the same time weighting method. Compare the upstream cumulative contribution with the cumulative emission contribution of this grid. When the upstream cumulative contribution is greater than or equal to twice the cumulative emission contribution of this grid, it is considered that the upstream input has a significant advantage in the overall risk. In this case, the monitoring platform adds an upward correction term to the risk indication function. It is preferable to add a correction amount of 0.1 to the normalized risk index based on the original value, while limiting the corrected risk index to a range not exceeding one. When the upstream cumulative contribution is lower than the above-mentioned twice threshold, the upstream correction term can be set to zero, thereby avoiding excessive amplification of the risk index when the upstream influence is not significant.
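A minimal sketch of the upstream correction rule, using the preferred values from the paragraph above (a factor of two and a correction of 0.1). The function names are illustrative. Note that, read literally, the comparison also holds when both cumulative contributions are zero; whether that case should trigger the correction is not stated, so the sketch leaves it as written.

```python
def upstream_correction(upstream_cum, local_cum, step=0.1, factor=2.0):
    """Return the upward correction term for the risk index.

    The term is `step` when the upstream cumulative contribution is at least
    `factor` times the grid's own cumulative emission contribution, else zero.
    """
    return step if upstream_cum >= factor * local_cum else 0.0

def apply_correction(index, correction):
    """Add the correction and keep the result at or below one."""
    return min(1.0, index + correction)
```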
[0053] After calculating the three components, the monitoring platform needs to unify them into a single normalized risk index and clearly record the source and weight of each component. The normalized risk index is a dimensionless value representing the final risk level within a closed interval of zero to one. The calculation process involves first setting a set of weight coefficients for the three components. These weight coefficients are centrally managed in the configuration and assigned a weight scheme version number. The sum of the weights is preferably one. In areas with sufficient detection records, the weight of the direct risk component can be set to 0.5, the weight of the operating condition-estimated risk component can be set to 0.3, and the weight of the upstream correction component can be set to 0.2. In areas with relatively sparse detection records but abundant operating condition data, the weight of the operating condition risk component can be appropriately increased and the weight of the detection component can be decreased in the new weight scheme version. During the specific calculation, the monitoring platform multiplies the three components by their corresponding weights and adds them together to obtain the normalized risk index. At the same time, the source information of each component and the combined index is marked, including whether it is based on actual detection, whether it is based on operating condition response estimation, and whether it includes upstream correction, so that subsequent analysis can understand the composition of each risk index.
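The weighted combination can be illustrated as follows. This sketch assumes each component has already been expressed as a value in [0, 1] and that the weight scheme sums to one; the dictionary keys are hypothetical labels, not identifiers from the description.

```python
def combine_components(components, weights):
    """Weighted sum of risk components, clipped to [0, 1].

    components, weights: dicts keyed by component label (hypothetical keys).
    """
    if set(components) != set(weights):
        raise ValueError("components and weights must use the same keys")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to one")
    index = sum(weights[k] * components[k] for k in weights)
    return min(1.0, max(0.0, index))

# Preferred scheme for areas with sufficient detection records.
DEFAULT_WEIGHTS = {"detection": 0.5, "condition": 0.3, "upstream": 0.2}
```

With components of 0.6, 0.4, and 0.1 and the default scheme, the index is 0.5 × 0.6 + 0.3 × 0.4 + 0.2 × 0.1 = 0.44.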
[0054] Once the mathematical structure and weighting scheme of the risk indicator function are determined, they are fixed in the system with a version number. When adjusting, the new version must be registered in the configuration table and the applicable area and effective date must be specified. The old version is set to read-only but is retained for recalculating or interpreting historical risk indices, forming a complete evolution chain of indicator function parameters.
[0055] After obtaining the normalized risk index, the monitoring platform divides the index into four levels—safe, attention, warning, and high risk—according to preset grading rules. The grading rules are recorded in the configuration as level thresholds. Preferably, the index range of 0 to 0.3 corresponds to the safe level, the range of 0.3 to 0.5 corresponds to the attention level, the range of 0.5 to 0.7 corresponds to the warning level, and the range above 0.7 corresponds to the high risk level. In actual implementation, the monitoring platform compares the normalized risk index with each threshold to determine the level of the index in ascending order and writes the level code into the record. This gives the risk index both continuous numerical information and grading information that is easy for regulators to use.
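The grading step might look like the sketch below. The description does not say which level a value exactly on a threshold belongs to; this sketch assumes the upper bound of each range is inclusive (so 0.7 is still "warning", consistent with "above 0.7" for high risk). The level codes are hypothetical.

```python
from bisect import bisect_left

THRESHOLDS = [0.3, 0.5, 0.7]                       # preferred level thresholds
LEVELS = ["safe", "attention", "warning", "high_risk"]  # hypothetical codes

def grade(index):
    """Return the level code for a normalized risk index in [0, 1].

    bisect_left places a value equal to a threshold in the lower level,
    which is an assumption about boundary handling.
    """
    if not 0.0 <= index <= 1.0:
        raise ValueError("normalized risk index must lie in [0, 1]")
    return LEVELS[bisect_left(THRESHOLDS, index)]
```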
[0056] At the data storage level, the monitoring platform compiles the risk calculation results of each early warning grid for each calendar day into a multi-source risk record, forming a multi-source risk set and writing it into the risk base database. The multi-source risk set is a summary of various risk indices and weight information at the scale of a grid and a day. Its structured fields include at least the early warning grid number, assessment date, normalized risk index value, corresponding risk level code, risk component value based on detection and its source identifier, risk component value based on operating condition response and its source identifier, corrected component value based on upstream contribution and its source identifier, version number of the risk indication function used, version number of the weighting scheme used, and version number of the sensitivity parameters involved in the calculation. In areas with long-term regulatory experience, experience reference values corresponding to the historical risk distribution or regulatory experience of the area can also be written into the same record. For example, the percentile value of the risk index of the grid in the past few years or the risk index value corresponding to the occurrence of an event exceeding the standard can be recorded for horizontal comparison in subsequent analysis.
[0057] On a given assessment date, if one of the three components cannot be calculated due to missing upstream operating condition data, missing sensitivity parameters, or the detection record exceeding the maximum backtracking period, the monitoring platform marks the risk record with a quality flag. The quality flag represents the type of missing information and the degree of credibility as an enumerated code. For example, a missing detection component is marked as insufficient detection, a missing operating condition component is marked as incomplete operating condition, and a missing upstream component is marked as unknown upstream. In the subsequent risk fusion and early warning release stages, such records can be used with reduced weight or excluded outright based on the quality flag, so that the risk is not overestimated or underestimated because key information is missing.
[0058] Through the aforementioned risk indicator function construction, normalization, and multi-source risk set generation process, the monitoring platform can obtain a set of traceable risk indices and their constituent information for each early warning grid and each assessment day in the early warning area, following the same steps and under the same parameter version constraints. Based on this description, those skilled in the art can configure the detection backtracking period, operating condition observation window length, background values and limits, weighting schemes and level thresholds, and sequentially implement data retrieval, component calculation, normalization, and record storage. This allows the operation process of the risk assessment stage to be reproduced in similar scenarios, and the upper-level early warning logic can be extended on this basis.
[0059] S5. Construct a Bayesian fusion model using the multi-source risk set and the error distribution of each source to obtain the posterior distribution of the grid risk index, and extract risk features as input to the early warning model. The specific implementation is as follows: In the risk fusion phase, after completing the daily writing of the basic risk database, the monitoring platform initiates a risk fusion task within the scope of the warning area. Upon task initiation, it first reads the error characteristic database version, fusion rule version, source missing threshold, and the applicable area and time period for this batch of fusion tasks from the configuration table, and generates a task identifier and a start timestamp. The error characteristic database is a set of parameters in the monitoring platform that specifically records the long-term deviation of various risk sources. Each error characteristic record includes at least the risk source type, applicable soil type code, applicable landform code, reference grid list, number of samples participating in the statistics, average historical deviation, dispersion index of historical deviation, and effective start and end dates. Deviation refers to the difference between the risk index value given by a certain source for a certain grid on a certain day and the reference risk index for the same grid on the same day, where the reference risk index is obtained by converting the representative value of acid-exchangeable thallium content measured on the testing date according to predetermined rules. When forming the error characteristic library, in areas and time periods with sufficient soil testing records, the platform aligns the records marked as having sufficient testing components in the risk base library with the test results in the soil testing library one by one.
By calculating the difference between the risk index of the source and the reference risk index for each record, and taking the standard deviation of this difference sequence over time, the standard deviation is used as the dispersion index of the source under the corresponding soil type and landform combination. Preferably, the system defaults to using the standard deviation as the only dispersion index. Only when explicitly specified in the configuration table is it allowed to use a substitute index represented by the quantile interval. Under the default configuration, all error characteristic entries are generated based on the standard deviation, thereby ensuring the comparability between different sources.
[0060] After the monitoring platform completes the calculation of the average deviation and dispersion index, it generates error characteristic entries for the source under the same soil type and landform combination and assigns a version number. The technicians update the version at the end of each statistical period based on the new round of comparison results. The old version is retained in read-only mode to interpret and recalculate the historical fusion results.
[0061] The daily risk fusion task is preferably launched at a fixed time point after the risk base database is written, following a fixed schedule. After the task starts, it traverses and summarizes the multi-source risk sets in the risk base database that are within the applicable time period. For each early warning grid and each assessment date, it first retrieves the normalized risk index, its components, and quality indicators that have been calculated and stored in the previous stage from the risk base database. These records are then broken down into several source risk value sets according to source type. Each source risk value set must contain at least the pre-correction risk index and corresponding quality indicator for the corresponding source on that grid and that date. The monitoring platform checks whether each source risk value is affected by missing or abnormal factors through the quality indicator. If a source is marked as missing or unreliable on that grid and that date, it is considered an unusable source in this fusion and is not included in subsequent fusion calculations.
[0062] In this batch of tasks, the monitoring platform also assesses source missingness as a whole. At the beginning of the task, the platform calculates, for each source, the proportion of missing risk values across all combinations of early warning grid and assessment date in the batch. The source missing threshold is preferably set to 30% of the total number of grid and date combinations. When the missing proportion of a source in this batch exceeds the preset source missing threshold, the monitoring platform does not use that source in this round of fusion; it marks the source as disabled for this batch and writes the reason for the decision in the fusion task record. After the missing data is supplemented, a new fusion task can be launched to recalculate the corresponding time period.
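A minimal sketch of the batch-level missingness check, assuming the batch is held as a mapping from (grid, date) to per-source values with `None` standing for a missing value. This data layout and the function name are illustrative assumptions.

```python
def disabled_sources(records, sources, threshold=0.30):
    """Return the sources whose missing proportion exceeds the threshold.

    records: dict mapping (grid_id, date) -> dict of source -> value or None.
    A source absent from a record is treated as missing.
    """
    total = len(records)
    disabled = []
    for source in sources:
        missing = sum(1 for r in records.values() if r.get(source) is None)
        if total and missing / total > threshold:
            disabled.append(source)
    return disabled
```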
[0063] After determining the set of sources that can participate in this round of fusion, the monitoring platform performs bias correction and posterior feature estimation on the source risk value of each early warning grid for each assessment date according to the pre-configured fusion rules. The specific steps are as follows: First, for each source type, the platform searches for currently effective error characteristic records in the error characteristic library based on the soil type code and landform code of the early warning grid. If there is a perfectly matching combination of soil type and landform, the average deviation and standard deviation in the record are used directly as the basis for correction. If there is no perfectly matching combination, the platform follows the aforementioned zoning promotion strategy of prioritizing soil type and allowing landforms to be promoted among plains, terraces, gentle slopes and gentle hills but not across mountainous landforms. Among these candidate error characteristic entries, they are comprehensively sorted in order of sample size from large to small and standard deviation from small to large, and the record with a larger sample size and smaller standard deviation is selected as the replacement record.
[0064] After obtaining the error characteristic parameters, the monitoring platform corrects the risk index of the source on the current date in the current grid by subtracting the corresponding average deviation from the source's risk index, removing any long-term systematically high or low values, resulting in a corrected risk value that has already deducted the average deviation. Simultaneously, the standard deviation of the source is converted into a credibility weight for that source in the current region. Specifically, the reciprocal of the standard deviation of all sources participating in the fusion is taken, reflecting that sources with smaller standard deviations and less fluctuation are more credible. Then, the sum of these reciprocals is normalized by dividing each reciprocal by the sum of all reciprocals, ensuring that the sum of the resulting set of source weights is strictly equal to one, thus obtaining a set of source weights automatically adjusted according to historical error performance. If a source has no available record in the error characteristic database, the monitoring platform preferably sets its weight to zero and marks the source as having unknown credibility in the quality flag, avoiding the introduction of uncorrected source risk values into the fusion results.
[0065] After completing the deviation correction and weight determination for each source, the monitoring platform estimates the posterior characteristics of the risk index for the current date in the current early warning grid. To calculate the central value of the posterior distribution, the monitoring platform uses a source weighting method to linearly combine the corrected risk values of each source. That is, the corrected risk value of each source is multiplied by its corresponding weight and then summed to obtain a weighted average. The weighted average is used as the risk center value for that grid on that date. This center value, in the range of zero to one, represents the comprehensive risk level after considering the deviation correction of multiple sources. To characterize the overall uncertainty, the monitoring platform further calculates the comprehensive dispersion based on the standard deviation and weight of each source. Specifically, the standard deviations of each source are summed according to the source weight to obtain a comprehensive standard deviation as the comprehensive dispersion index. Then, according to the multiplier given in the configuration table, the comprehensive dispersion is multiplied by the multiplier to obtain a risk fluctuation range representing the distance between the upper quantile and the center value. Preferably, the multiplier can be set to two. For example, when the comprehensive dispersion is 0.05, the risk fluctuation range is 0.1.
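The bias correction, inverse-standard-deviation weighting, and posterior feature steps can be sketched together. This is an illustration under stated assumptions: every usable source has a strictly positive standard deviation, the center value is clipped to [0, 1], and the function and key names are hypothetical.

```python
def fuse_sources(sources, multiplier=2.0):
    """Fuse source risk values for one grid and one date.

    sources: list of (risk_value, mean_bias, std) tuples for usable sources.
    Returns weights, center value, dispersion, fluctuation range, upper quantile.
    """
    if not sources:
        raise ValueError("no usable sources")
    if any(std <= 0 for _, _, std in sources):
        raise ValueError("standard deviations must be positive")
    inverse = [1.0 / std for _, _, std in sources]
    total = sum(inverse)
    weights = [v / total for v in inverse]           # weights sum to one
    corrected = [risk - bias for risk, bias, _ in sources]
    center = sum(w * c for w, c in zip(weights, corrected))
    center = min(1.0, max(0.0, center))              # assumed clipping
    dispersion = sum(w * std for w, (_, _, std) in zip(weights, sources))
    amplitude = multiplier * dispersion              # risk fluctuation range
    upper = min(1.0, center + amplitude)             # upper quantile, capped at one
    return {"weights": weights, "center": center, "dispersion": dispersion,
            "amplitude": amplitude, "upper": upper}
```

With two sources of equal standard deviation 0.05, each receives weight 0.5, which reproduces the worked figure in the text: a comprehensive dispersion of 0.05 gives a fluctuation range of 0.1.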
[0066] Based on this, the monitoring platform obtains an upper quantile estimate by adding the risk center value to the risk fluctuation amplitude, and limits this value to no more than one, thus obtaining an upper quantile estimate representing the risk level that may be reached under typical high-risk conditions. For the probability of exceeding the warning level threshold, the monitoring platform uses the risk threshold corresponding to the warning level in the previous step as the warning threshold, and records this threshold as the warning threshold. It then determines the probability in segments based on the relationship between the risk center value, upper quantile, and risk fluctuation amplitude: when the risk center value is greater than or equal to the warning threshold, it is considered that the probability of exceeding the warning threshold in the current state is relatively high, and the probability of exceeding the warning threshold can be fixed at a high probability value close to one, preferably 0.9; when the risk center value is lower than the warning threshold and the upper quantile does not exceed the warning threshold, it is considered that even considering typical fluctuations, it is difficult to reach the warning threshold, and the probability of exceeding the warning threshold can be set at a low probability value close to zero, preferably 0.05; when the risk center value is lower than the warning threshold but the upper quantile is higher than the warning threshold, it is considered that there is a possibility of exceeding the warning threshold under certain fluctuation conditions. In this case, the monitoring platform first determines the difference between the risk center value and the warning threshold. 
If this difference is no greater than one risk fluctuation range, that is, if the risk center value lies in the interval between the warning threshold minus the risk fluctuation range and the warning threshold, then this interval is used as the interval in which the probability gradually transitions from 0.5 to 0.9. The specific probability value is determined by linear interpolation according to the position of the risk center value within this interval. That is, the closer the risk center value is to the warning threshold, the closer the corresponding probability is to 0.9, and the closer it is to the warning threshold minus the risk fluctuation range, the closer the corresponding probability is to 0.5. If the risk center value is lower than the warning threshold minus the risk fluctuation range but the upper quantile is still higher than the warning threshold, it can be considered that the warning threshold may be exceeded only under more extreme fluctuation conditions. In this case, the probability of exceeding the warning threshold is fixed at a moderate probability value, preferably 0.5. In this way, through clear interval division and linear interpolation rules, a risk characteristic quantity representing the probability of exceeding the warning level threshold can be approximately obtained without introducing complex distribution assumptions.
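The segmented probability rule can be sketched as a single function. The preferred values 0.9, 0.05, and 0.5 come from the text; the function name and the treatment of the upper quantile as an independent input are assumptions made so that every branch in the description is represented.

```python
def exceed_probability(center, upper, amplitude, threshold):
    """Piecewise probability of exceeding the warning threshold.

    center: risk center value; upper: upper quantile estimate;
    amplitude: risk fluctuation range; threshold: warning threshold.
    """
    if center >= threshold:
        return 0.9                       # already at or above the threshold
    if upper <= threshold:
        return 0.05                      # typical fluctuations stay below it
    lower_edge = threshold - amplitude
    if amplitude > 0 and center >= lower_edge:
        # Linear interpolation from 0.5 at lower_edge to 0.9 at the threshold.
        fraction = (center - lower_edge) / amplitude
        return 0.5 + 0.4 * fraction
    return 0.5                           # only extreme fluctuations exceed it
```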
[0067] After calculating the risk center value, upper quantile, and probability of exceeding the warning level threshold, the monitoring platform writes these risk characteristics, along with the fusion rule version number and error characteristic version number used in this fusion, into the risk characteristic database. Each record in the risk characteristic database includes at least the warning grid number, assessment date, risk center value, risk upper quantile, probability of exceeding the warning threshold, list of sources participating in the fusion, corrected risk value of each source, weight of each source, fusion rule version number used, error characteristic version number used, fusion task identifier, and quality flag information. The quality flag is used to indicate whether there are situations such as sources being completely disabled, error characteristics being missing, or weights being forcibly set to zero during the calculation process. This allows the time evolution model to reduce the weight or remove abnormal records based on the quality flag when calling the risk characteristic database, ensuring the stability and interpretability of the time evolution results.
[0068] To ensure the idempotency and resource utilization efficiency of the fusion task, the monitoring platform establishes unique constraints on the early warning grid number, assessment date, and fusion task identifier in the risk feature database. Only one valid record is allowed to be generated for the same grid and the same date in the same task. When the upstream system triggers the same task repeatedly, the monitoring platform checks whether there is a record with the corresponding task identifier and grid date combination in the risk feature database after receiving the duplicate request. If it exists, it directly returns the existing record without performing the fusion calculation again, and records this trigger as a duplicate call in the log, thereby avoiding the repeated consumption of computing resources for the same batch of data.
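The idempotency behavior can be illustrated with an in-memory stand-in for the risk feature database. The class and method names are hypothetical; a real deployment would rely on a unique constraint in the database rather than a Python dictionary.

```python
class RiskFeatureStore:
    """In-memory sketch of a store keyed by (grid, date, task identifier)."""

    def __init__(self):
        self._rows = {}
        self.log = []

    def get_or_compute(self, grid_id, date, task_id, compute):
        """Return the stored record, or run `compute` once and store its result."""
        key = (grid_id, date, task_id)
        if key in self._rows:
            # Duplicate trigger: skip the fusion calculation and log the call.
            self.log.append(("duplicate_call", key))
            return self._rows[key]
        row = compute()
        self._rows[key] = row
        return row
```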
[0069] The risk fusion module itself does not directly expose its interface to the outside world. Instead, it is passively triggered and accessed through the monitoring platform's internal services during daily scheduling and subsequent time evolution model calls. External systems and regulatory personnel can only conduct subsequent early warning analysis by accessing the risk feature quantities that have completed fusion and quality control in the risk feature library. Those skilled in the art can configure the error feature library, source missing threshold, fusion rules, and risk feature library structure according to the above process to reproduce the operation of the risk fusion process in similar scenarios, and realize the entire process of deviation correction, credibility weighting, and posterior risk feature extraction for multi-source risk indices.
[0070] S6. Based on the posterior time series of risk and working condition factors, train a constrained time evolution model and output grid risk prediction values. Combine spatial neighborhood characteristics and correct the prediction values through spatial residual regression. Generate graded early warning results for the grid in the future period according to the threshold. The specific implementation is as follows: In the early warning generation stage, the monitoring platform initiates the early warning assessment task after the daily risk fusion task is completed according to the set rolling cycle. The rolling cycle is preferably set to once a day. When the task is started, it first reads the currently effective time evolution model version, spatial residual regression rule version, early warning threshold and early warning level classification scheme from the configuration table, and generates the early warning task identifier for this time.
[0071] Subsequently, for each early warning grid within the early warning area, the monitoring platform retrieves a time series of risk characteristics covering a specific observation window from the risk characteristic database, based on the grid number and the current assessment date. In the main embodiment, the observation window length is preferably set to thirty days, meaning that the risk center value, risk upper quantile, and probability of exceeding the early warning threshold for each grid are read daily for the thirty calendar days preceding the current assessment date. Simultaneously, the corresponding daily rainfall, maximum hourly rainfall, number of consecutive rainfall days, emission load, and upstream contribution are read from the operating condition factor field according to the same date sequence. These risk characteristics and operating condition factors are then organized into a time series record in chronological order. During the sequence construction process, if a risk characteristic record is missing for a certain day but the operating condition factor is complete, time interpolation can preferably be performed using the risk center value and upper quantile of the previous day, and this interpolation behavior is recorded in the quality flag. If the missing data exceeds a preset threshold for several consecutive days, for example, more than three consecutive days, the time evolution prediction for that grid in this round of tasks is abandoned; only the prediction results from the previous round are retained, and the reason for the termination of the prediction for that grid in this round is recorded in the task log.
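The gap handling for the risk feature series can be sketched as below. It fills a missing day with the previous day's value and abandons the grid when more than three consecutive days are missing. Returning `None` for an abandoned series, and abandoning a series that starts with a missing value, are assumptions of this sketch.

```python
def fill_series(values, max_gap=3):
    """Fill missing days with the previous value.

    values: chronological list of floats, with None for a missing day.
    Returns (filled, interpolated_flags), or None when the series is abandoned.
    """
    filled, flags = [], []
    gap, last = 0, None
    for v in values:
        if v is None:
            gap += 1
            if gap > max_gap or last is None:
                return None              # too many consecutive gaps, or no prior value
            filled.append(last)
            flags.append(True)           # record the interpolation for the quality flag
        else:
            gap, last = 0, v
            filled.append(v)
            flags.append(False)
    return filled, flags
```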
[0072] After obtaining the complete or finitely interpolated time series of risk characteristics and operating condition factors, the monitoring platform feeds these series into the currently effective version of the constrained time evolution model for future risk prediction. The time evolution model has already fitted parameters with historical data during the training phase, and a set of constraint rules are introduced during the training process to ensure that the prediction results do not show a trend that violates physical intuition under typical operating conditions. These constraint rules are recorded and locked in the model version description in the form of explicit clauses.
[0073] Specifically, during the training phase, the monitoring platform identifies historical sample segments containing continuous heavy rainfall and high emissions. Periods where daily rainfall exceeds a preset heavy rainfall threshold for several consecutive days and emission load is in the historically high percentile range are marked as high-input phases. The length of a high-input phase is preferably set to three to seven days. A monotonicity constraint is added during parameter updates: when both rainfall and emission indicators remain high or continue to rise within the input time window, the corresponding predicted risk center value must not show a significant decrease between adjacent days. A significant decrease can be defined as a reduction exceeding a preset range compared to the previous day; for example, a decrease exceeding 0.1% within a single day is not allowed. Similarly, historical segments with prolonged drought and zero emissions are identified. Periods where rainfall is close to zero and emission load is zero for several consecutive days are marked as low-input phases. In the training samples of low-input phases, the time evolution model incorporates a steady or slow decrease constraint during parameter updates. That is, when the operating condition input remains low or even zero for a long period, the predicted risk center value must not show a significant increase within a single day; a significant increase can be defined as an increase exceeding 0.05% compared to the previous day. By imposing the above constraints on the parameter update steps during training, it is ensured that the risk trajectory predicted by the final model version will not exhibit drastic reverse fluctuations that are significantly contrary to the physical process when encountering similar working condition sequences.
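One way to check a predicted trajectory against these constraints is sketched below. It only detects violations; how the constraint is enforced inside the parameter update is not specified in the description. The sketch assumes the tolerances 0.1% and 0.05% are relative to the previous day's value, and the phase labels and function name are hypothetical.

```python
def constraint_violations(predicted, phase, max_drop=0.001, max_rise=0.0005):
    """Return the day indices where the trajectory violates the phase constraint.

    predicted: chronological list of predicted risk center values.
    phase: "high_input" (no significant decrease allowed) or
           "low_input" (no significant increase allowed).
    Tolerances are interpreted as fractions of the previous day's value.
    """
    if phase not in ("high_input", "low_input"):
        raise ValueError("unknown phase label")
    violations = []
    for i in range(1, len(predicted)):
        prev, cur = predicted[i - 1], predicted[i]
        if phase == "high_input" and cur < prev * (1 - max_drop):
            violations.append(i)
        elif phase == "low_input" and cur > prev * (1 + max_rise):
            violations.append(i)
    return violations
```

A training loop could, for instance, penalize or reject parameter updates for which this list is non-empty on the marked sample segments.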
[0074] After the time evolution model is trained, the monitoring platform assigns a unique version number to the model and specifies the currently effective version and applicable region in the configuration. Older versions of the model are kept read-only for recalculation or comparison of historical prediction results. During online prediction, the monitoring platform inputs the risk center value, upper quantile, probability of exceeding the warning threshold, and time series of working condition factors within each early warning grid observation window into the model. The model then outputs the predicted risk center value, upper quantile, and probability of exceeding the warning threshold for each day in the future prediction period. In the main embodiment, the prediction period is preferably set to seven days, that is, the risk characteristic quantity is predicted for each of the next seven calendar days.
[0075] After obtaining the future risk characteristics output by the time evolution model, the monitoring platform constructs a spatial neighborhood around each early warning grid and performs spatial coherence correction on the time evolution results. The construction method of the spatial neighborhood is clearly given in the configuration table. In the main embodiment, the eight-neighborhood rule is preferred, that is, the grids that are adjacent to the target early warning grid in the grid division in the top, bottom, left, right and four diagonal positions are taken as the spatial neighborhood. If some neighborhood grids are located outside the boundary of the early warning area or are marked as unusable due to insufficient data, a ring of usable grids is selected from the remaining neighborhoods to form the actual neighborhood.
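The eight-neighborhood rule with boundary and usability filtering can be sketched as follows. The (row, column) indexing of grids, the function name `usable_neighbors` and the `unusable` set are illustrative assumptions; the embodiment only states that the neighborhood rule is given in the configuration table.

```python
def usable_neighbors(cell, n_rows, n_cols, unusable=frozenset()):
    """Return the usable eight-neighborhood of a grid cell.

    cell     -- (row, col) index of the target early warning grid
    unusable -- cells marked as unusable, e.g. due to insufficient data
    """
    row, col = cell
    neighbors = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the target grid itself
            r, c = row + dr, col + dc
            inside = 0 <= r < n_rows and 0 <= c < n_cols
            if inside and (r, c) not in unusable:
                neighbors.append((r, c))
    return neighbors
```

A grid in the interior of the area gets eight neighbors, a corner grid gets three, and any neighbor marked unusable is simply dropped from the actual neighborhood.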
[0076] In the spatial residual regression step, the monitoring platform first calculates the difference between the risk center value predicted by the time evolution model and the average predicted risk center value of neighboring grids on the same day for each early warning grid and each prediction date. This difference serves as the initial spatial residual. Simultaneously, the working condition factors and geomorphic attributes of each grid in the neighborhood for the corresponding date are organized into a set of feature vectors. Geomorphic attributes may include geomorphic codes, average elevation, and slope. During the training phase, a set of spatial correction coefficients is fitted using historical residuals and these feature vectors. This ensures that the predicted residuals exhibit a relatively smooth spatial distribution in grids with similar geomorphic conditions and working conditions, avoiding unreasonable abrupt differences between adjacent grids. In the online correction phase, the monitoring platform uses the currently effective spatial correction coefficients to adjust the initial spatial residuals of each grid for each prediction date. The adjusted residuals are then added back to the risk center value prediction of the time evolution model to obtain the future risk center value prediction after spatial coherence correction. A similar method can be used to spatially smooth the risk upper quantile, ensuring that the high-risk probability within the neighborhood exhibits continuous changes within the continuous geomorphic region, without producing isolated abnormally high or low values.
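The initial spatial residual described above is a plain difference from the neighborhood average. The sketch below covers only that first step; the function name is hypothetical, and the fitted spatial correction coefficients are not reproduced because the text does not give their functional form.

```python
def initial_spatial_residual(own_prediction, neighbor_predictions):
    """Predicted risk center value minus the neighborhood average
    for the same prediction date."""
    if not neighbor_predictions:
        raise ValueError("at least one usable neighbor is required")
    neighborhood_mean = sum(neighbor_predictions) / len(neighbor_predictions)
    return own_prediction - neighborhood_mean
```

With illustrative values, a grid predicted at 0.62 whose usable neighbors are predicted at 0.58, 0.60 and 0.56 has an initial residual of about 0.04.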
[0077] After the temporal and spatial corrections are completed, the monitoring platform maps the predicted future risk center value for each forecast date of each early warning grid to the corresponding early warning level according to the aforementioned risk level classification scheme. The classification rules are consistent with those in the risk assessment stage described above: a risk index of 0 to 0.3 corresponds to the safety level, 0.3 to 0.5 to the attention level, 0.5 to 0.7 to the early warning level, and above 0.7 to the high-risk level. At the same time, using the predicted probability of exceeding the early warning threshold on each future day, the platform counts, for each early warning grid, the number of days in the forecast period that reach the early warning level or above, and determines whether the overall probability that at least one day in the forecast period reaches the early warning level or above and exceeds the early warning threshold is greater than a preset probability threshold. In the main embodiment, this probability threshold is preferably set to 70%, i.e., 0.7.
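The level mapping and the day count can be sketched as below. The text does not say which level a boundary value belongs to, so the handling here is an assumption: 0.3 and 0.5 are assigned to the higher level, and 0.7 stays at the early warning level because the high-risk range is described as "above 0.7". The function names are hypothetical.

```python
def risk_level(index):
    """Map a normalized risk index to one of the four levels."""
    if index < 0.3:
        return "safety"
    if index < 0.5:
        return "attention"
    if index <= 0.7:
        return "early warning"
    return "high risk"


def days_at_or_above_warning(center_values):
    """Count forecast days whose level is early warning or high risk."""
    return sum(
        1 for value in center_values
        if risk_level(value) in ("early warning", "high risk")
    )
```

For a seven-day forecast such as `[0.60, 0.62, 0.58, 0.52, 0.50, 0.49, 0.48]` (illustrative values), five days reach the early warning level or above under these boundary assumptions.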
[0078] When the monitoring platform detects that a certain early warning grid has a risk level that reaches the early warning level or high risk level on any day within the next seven-day forecast period, and the predicted probability of exceeding the early warning threshold for that day is greater than 0.7, the grid is marked as a priority inspection grid in the current early warning task, and a structured inspection suggestion record is generated. This record includes at least the time period for recommended intensive soil sampling, the upstream emission unit number and emission time range for which the recommendation is to be focused, the future risk characteristics and model version number on which the recommendation is based, so that regulatory personnel can use it directly when arranging on-site verification.
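The priority inspection rule reduces to a check over the daily forecasts. A minimal sketch, assuming each forecast day is represented as a `(level, probability)` pair; the names `WARNING_LEVELS` and `is_priority_grid` are hypothetical.

```python
WARNING_LEVELS = {"early warning", "high risk"}


def is_priority_grid(daily_forecast, prob_threshold=0.7):
    """Return True if any forecast day reaches the early warning level or
    above AND its exceedance probability is greater than the threshold.

    daily_forecast -- iterable of (level, probability) pairs, one per day
    """
    return any(
        level in WARNING_LEVELS and prob > prob_threshold
        for level, prob in daily_forecast
    )
```

Both conditions must hold on the same day: a high-risk day with probability 0.6 and an attention-level day with probability 0.95 do not together make the grid a priority inspection grid.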
[0079] After the early warning generation task is completed, the monitoring platform organizes the risk center value prediction, risk level prediction, probability prediction of exceeding the early warning threshold, priority inspection mark and inspection suggestion of each early warning grid into a set of structured records and writes them into the early warning result database. Each record in the early warning result database includes at least the early warning grid number, prediction date, predicted risk center value, predicted risk level, probability of exceeding the early warning threshold, whether it is a priority inspection grid, associated inspection suggestion number, version number of the time evolution model used, version number of the spatial correction rule used, identifier of this early warning task, and quality mark information.
[0080] The query service for regulatory business systems accesses the early warning result database through a unified interface. When initiating a query, external systems need to provide the grid number and time range. After receiving the query, the monitoring platform retrieves the record generated by the most recently completed early warning task within the time range from the early warning result database and returns it to the caller. If the caller repeatedly requests the results of the same grid and time range within a short period of time, the monitoring platform identifies it as a duplicate request by comparing the grid number, time range, and early warning task identifier, and directly returns the stored result without re-triggering the calculation, thereby ensuring the idempotency of the query interface and the effective utilization of backend computing resources. If the requested time range exceeds the prediction period coverage of the current time evolution model, such as requesting early warning results more than seven days in the future, the monitoring platform returns a prompt message in the response, indicating that it exceeds the model coverage and no additional prediction was performed this time.
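The idempotent query behavior can be sketched with an in-memory stand-in for the early warning result database. Everything named here (`WarningQueryService`, the dictionary layout, the notice text, the `lookups` counter) is hypothetical and only illustrates the duplicate-request key of grid number, time range and early warning task identifier, together with the seven-day coverage check.

```python
from datetime import date, timedelta

HORIZON_DAYS = 7  # prediction period of the main embodiment


class WarningQueryService:
    """Hypothetical read-only front end for stored warning results."""

    def __init__(self, results, latest_task_id, today):
        self._results = results          # {(grid_id, date): record}
        self._task_id = latest_task_id   # most recently completed task
        self._today = today
        self._responses = {}             # responses already served
        self.lookups = 0                 # database lookups performed

    def query(self, grid_id, start, end):
        key = (grid_id, start, end, self._task_id)
        if key in self._responses:       # duplicate request: reuse result
            return self._responses[key]
        self.lookups += 1
        last_covered = self._today + timedelta(days=HORIZON_DAYS)
        records, day = [], start
        while day <= min(end, last_covered):
            if (grid_id, day) in self._results:
                records.append(self._results[(grid_id, day)])
            day += timedelta(days=1)
        notice = None
        if end > last_covered:
            # No extra prediction is triggered beyond the model coverage.
            notice = "requested range exceeds model coverage"
        response = {"task_id": self._task_id, "records": records,
                    "notice": notice}
        self._responses[key] = response
        return response
```

Repeating the same request returns the stored response object without another lookup, and a request reaching past the seven-day period returns only the covered days plus a notice.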
[0081] When the time evolution model version or warning threshold needs to be adjusted, a qualified technical manager initiates a change request through the management interface. The changes include the proposed effective model version number, spatial correction rule version number, warning level threshold, and priority inspection probability threshold. After the change request is submitted, the system enters the approval process. Once approved, the system updates the currently effective version in the configuration table and marks the old version as a historical version, which is retained for the interpretation of historical warning results. At any stage, if the warning task cannot be completed within the specified number of retries and the maximum allowable delay time due to missing upstream operating data, incomplete risk feature database, or internal calculation failure, the monitoring platform will not release warning results generated based on incomplete information. Instead, it will retain the previous round of complete and valid warning results as the current external display result and issue an alarm in the operation and maintenance alarm system, prompting technical personnel to check the data source and computing node status.
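The fallback behavior on task failure can be sketched as a bounded retry loop. This is an illustrative simplification: the exception type `TaskIncomplete` and the function names are hypothetical, and the maximum allowable delay time mentioned above is not modeled.

```python
class TaskIncomplete(Exception):
    """Hypothetical error: missing upstream data, incomplete risk
    feature database, or internal calculation failure."""


def run_warning_task(compute, previous_result, max_retries=3):
    """Return (result_to_display, alarm_raised).

    compute -- callable producing a complete warning result, or raising
               TaskIncomplete when the result would be incomplete
    """
    for _ in range(max_retries):
        try:
            return compute(), False
        except TaskIncomplete:
            continue  # retry up to the configured limit
    # Retries exhausted: keep the previous complete result, raise an alarm.
    return previous_result, True


def missing_data_task():
    """Example task that always fails for lack of upstream data."""
    raise TaskIncomplete("missing upstream operating data")
```

Calling `run_warning_task(missing_data_task, previous)` returns the previous round's result together with an alarm flag, so an incomplete result is never released.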
[0082] The entire early warning service operates within the authorized data range and preset functional boundaries. It does not directly intervene in the enterprise's emission control and mandatory decision-making process. It only provides auxiliary judgment information based on the above model to authorized regulatory agencies and relevant decision support systems. Those skilled in the art can configure the observation window length, prediction period length, constraint rules, spatial neighborhood method and version management strategy according to the above steps to reproduce the operation process of the early warning generation stage in similar scenarios, so as to achieve stable prediction of future risks of the early warning grid and selection of priority inspection targets.
[0083] In the operating scenario shown in this embodiment, during the actual operation of a certain watershed early warning area, the competent authority selected a river valley and alluvial plain area of approximately 3,000 square kilometers as the soil thallium pollution risk early warning area on the monitoring platform. After the area was divided into early warning grids with a side length of 1,000 meters, several farmland early warning grids within approximately 5 kilometers downstream of the upstream smelting industrial park were given special attention. Among them, the early warning grid numbered G-127 is located in the river valley alluvial plain zone, with loamy clay soil and a corresponding plain landform. During the first quarter's routine monitoring, technicians set up three soil sampling points in different small farmland plots within the G-127 grid according to the established layout plan. The sampling depth was controlled within 0 to 20 centimeters below the surface. The latitude and longitude coordinates and sampling time of each sampling point were recorded on-site using handheld positioning devices. The collected soil samples were assigned sample numbers according to the rule of "grid number + sampling point number + date" and then sent to a qualified laboratory for testing. The laboratory tested the acid-exchangeable thallium (AET) content, pH, and clay content of the three samples one by one according to current soil environmental methods. The results showed that the AET content of the three samples ranged from 0.22 to 0.32, the pH was concentrated between 6.2 and 6.5, and the clay content was approximately 35% to 40%. After verifying the results against the platform's preset format and range checks, the monitoring platform removed samples that significantly deviated from the target depth and combined each of the three indicators of the remaining qualified samples into a representative value using the arithmetic mean. The representative value for the AET content of the G-127 grid for this quarter was approximately 0.26, the representative value for pH was approximately 6.3, and the representative value for clay content was approximately 37%. These values, along with the sample number list, testing time range, and representative value calculation method version number, were entered into the soil testing database.
[0084] Within the same quarter, the monitoring platform obtained meteorological and emission records daily from the meteorological department and the pollution discharge supervision system according to a unified time reference, and aggregated and mapped the records of meteorological observation stations and thallium-containing emission units located near and downstream of the smelting park. Taking a certain assessment day at the end of June as an example, when the monitoring platform constructed the operating condition factor field, the daily rainfall calculated for grid G-127 showed that the grid had experienced significant rainfall on the three consecutive days before the assessment date. The daily rainfall from three days before to one day before was approximately 35, 42, and 18 mm respectively, with the corresponding maximum hourly rainfall exceeding 20 mm, and the number of consecutive rainfall days reaching five. The emission records showed that the daily emission intensity of the smelting enterprise located 5 kilometers upstream fluctuated on the order of several kilograms during this period. After constructing a digital elevation model and determining the upstream early warning grid set, the monitoring platform attenuated the direct emissions of the early warning grid where the enterprise was located along the water flow path at an attenuation coefficient of 0.8 per kilometer. The attenuated contributions of multiple upstream early warning grids were added to the local direct emissions of grid G-127 to obtain the emission load of grid G-127 and the total upstream contribution for that day. Together with the daily rainfall, maximum hourly rainfall, and number of consecutive rainfall days, these were written into the operating condition factor field as an operating condition factor record keyed by "grid number plus calendar date".
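The upstream attenuation step can be sketched as below. The paragraph gives the coefficient of 0.8 per kilometer but not the formula, so the multiplicative form (a fraction of 0.8 retained for every kilometer of flow path) is an assumption, as are the function names and the emission amounts used in the usage note.

```python
def attenuated_contribution(emission, distance_km, retention_per_km=0.8):
    """Emission remaining after travelling distance_km along the flow
    path, assuming a fraction retention_per_km survives each kilometer."""
    return emission * retention_per_km ** distance_km


def grid_emission_load(local_emission, upstream_sources, retention_per_km=0.8):
    """Return (emission load, total upstream contribution) for one grid.

    upstream_sources -- iterable of (emission, flow_path_distance_km)
    """
    upstream_total = sum(
        attenuated_contribution(emission, distance, retention_per_km)
        for emission, distance in upstream_sources
    )
    return local_emission + upstream_total, upstream_total
```

Under this assumed form, a hypothetical emission of 10 units located 5 kilometers upstream contributes 10 × 0.8^5, or about 3.28 units, to the downstream grid.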
[0085] During the sensitivity parameter construction phase, the monitoring platform checked the number of valid detections of grid G-127 over the past year as part of the quarterly parameter fitting task. It found that the grid had completed at least one qualified detection in each of the four quarters, meeting the minimum requirement of more than three detections, and that its soil type code was the same as that of several surrounding grids. After merging the samples of these grids belonging to the "loamy clay + plain" combination, the statistical fitting module took the representative value of acid-exchangeable thallium content as the response variable. As explanatory variables it took the daily rainfall, number of consecutive rainfall days, and sliding cumulative amounts of emission load and upstream contribution for the corresponding date and the preceding few days in the observation window, together with the pH and clay content for that day. The sample matrix was constructed and the regression coefficients were solved iteratively. The fitting results showed that the rainfall leaching sensitivity parameters for this soil type and landform combination were positive and relatively large, indicating that, with other conditions held constant, a recent increase in total rainfall significantly increases the acid-exchangeable thallium content. The emission load response parameters were also positive and significant, indicating that increases in emission intensity and upstream contribution raise the acid-exchangeable thallium level in the target grid. The regression coefficients corresponding to pH and clay content showed a certain buffering and adsorption effect. The average relative error calculated over all samples was approximately 15%, which is less than the configured 20% error control limit. 
Therefore, the statistical fitting module recorded this set of sensitivity parameters as having acceptable fitting quality, generated a version record in the parameter library, and marked it as the effective version of the "loamy clay + plain" combination within the current statistical period.
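The fitting quality check can be sketched as a mean relative error compared with the configured control limit. The function names are hypothetical, and the sample values in the usage note are invented for illustration; they are not the data behind the 15% figure in the embodiment.

```python
def mean_relative_error(observed, predicted):
    """Average of |predicted - observed| / observed over all samples."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(p - o) / o for o, p in zip(observed, predicted))
    return total / len(observed)


def fit_is_acceptable(observed, predicted, limit=0.20):
    """True when the mean relative error is below the control limit."""
    return mean_relative_error(observed, predicted) < limit
```

For example, observed contents of `[0.20, 0.25, 0.30, 0.26]` against fitted values of `[0.23, 0.22, 0.33, 0.30]` give a mean relative error of roughly 13%, which would pass a 20% limit.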
[0086] On an assessment day at the end of June, the monitoring platform constructed a risk indicator function for grid G-127 according to the predetermined procedure of the risk assessment phase. First, it reviewed the monitoring records of the past year in the soil testing database and confirmed that the most recent valid test, i.e., the representative value of 0.26 mg/kg for this quarter, fell within the maximum review period. This representative value was compared with the regional soil background value of 0.15 and the environmental standard limit of 0.8. By measuring the increase of the representative value relative to the background value and its distance from the standard limit, the detection-based initial risk was mapped to a detection risk component between zero and one; the result was slightly above the attention level. Subsequently, it retrieved the daily rainfall, maximum hourly rainfall, consecutive rainfall days, emission load, and upstream contribution records for the most recent 30 days from the operating condition factor field to construct an operating condition observation window. The rainfall and emissions of the most recent three days were time-weighted with weights of 0.5, 0.3 and 0.2 and accumulated. Combined with the rainfall leaching sensitivity parameter and the emission load response parameter among the sensitivity parameters, the net cumulative contribution to the acid-exchangeable thallium content relative to the background under the current rainfall and emission combination was estimated. After accounting for the adsorption effect reflected by the soil buffering capacity parameter, the net contribution was superimposed on the background value to obtain an expected acid-exchangeable thallium content. This estimate was then mapped to the operating condition risk component by comparing it with the background value and the standard limit; the result fell roughly between the attention and early warning ranges. At the same time, the monitoring platform performed time-weighted accumulation of the upstream contribution within the observation window and compared it with the cumulative emission contribution of this grid, finding that the upstream cumulative contribution was more than twice the grid's own cumulative emission contribution. According to the preset rules, the upstream correction component was set as a positive correction amount, for example, adding 0.1 to the existing normalized risk. The monitoring platform found in the weighting scheme version record that, for scenarios where detection information is relatively sufficient, the current area adopted a scheme with a detection component weight of 0.5, an operating condition component weight of 0.3, and an upstream correction component weight of 0.2. The three components were weighted and summed accordingly to obtain the normalized risk index of the G-127 grid on that day, which was approximately 0.58, corresponding to the "early warning" level in the early warning level classification. The index, along with the values of each component, the composition information, the risk indicator function version number, and the weighting scheme version number, was written into the risk base database.
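The three-day time weighting used in this paragraph can be worked through with the rainfall figures of the example (approximately 35, 42 and 18 mm from three days before to one day before the assessment date). The sketch assumes that the largest weight, 0.5, applies to the most recent day, which the text does not state explicitly; the function name is hypothetical.

```python
def time_weighted_sum(values_oldest_first, weights_newest_first=(0.5, 0.3, 0.2)):
    """Weighted accumulation over the most recent days.

    values_oldest_first  -- daily values ordered from oldest to newest
    weights_newest_first -- weights for the newest day, the day before
                            it, and so on (ordering is an assumption)
    """
    n = len(weights_newest_first)
    recent = values_oldest_first[-n:]
    if len(recent) != n:
        raise ValueError("not enough days in the observation window")
    return sum(w * v for w, v in zip(weights_newest_first, reversed(recent)))
```

Under this ordering, `time_weighted_sum([35, 42, 18])` evaluates 0.5 × 18 + 0.3 × 42 + 0.2 × 35, or about 28.6 mm.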
[0087] In the risk fusion phase, the monitoring platform, based on the statistical results of long-term deviations of various risk sources within the region from the error characteristic library, performs deviation correction and weighting on the multi-source risk index of grid G-127 for that day. Assuming that the region maintains risk information from two sources simultaneously—one based on local monitoring and model calculations, and the other on regional background risk sources provided by the superior platform—the error characteristic library shows that the local source has a smaller average deviation and a standard deviation of approximately 0.04 for this soil type and landform combination, while the superior platform source has a larger standard deviation of approximately 0.08. Based on this, the monitoring platform subtracts the average deviation from the risk indices of both the local and superior platform sources to obtain corrected risk values after deducting systematic offsets. Then, using reciprocal normalization of the standard deviation, the weight of the local source is set to approximately two-thirds, and the weight of the superior platform source is set to approximately one-third. The weighted sum of the corrected risk values for both types of sources yields a risk center value of approximately 0.60, and an upper quantile estimate of approximately 0.72 is given based on twice the comprehensive standard deviation. Simultaneously, using the risk threshold of 0.5 corresponding to the warning level as the warning threshold, when the risk center value has exceeded the threshold and the upper quantile has significantly exceeded the threshold, the probability of exceeding the warning threshold is set to a high probability value close to one, such as 0.85. 
The aforementioned risk center value, upper quantile, and probability of exceeding the warning threshold, along with the list of sources participating in the fusion, the corrected risk values of each source, their weights, and the versions of the error characteristics and fusion rules used, are written into the risk feature library to provide input for subsequent time evolution models.
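The weighting arithmetic of this fusion example can be checked with a short sketch. It follows the reciprocal-of-standard-deviation normalization described in the paragraph, not inverse-variance weighting. The function names and the source values used in the usage note are hypothetical, and the combined standard deviation is passed in as an input because the text does not state how it is computed.

```python
def reciprocal_std_weights(stds):
    """Normalize 1/std across sources so that the weights sum to one."""
    inverse = [1.0 / s for s in stds]
    total = sum(inverse)
    return [value / total for value in inverse]


def fuse(risk_values, mean_biases, stds):
    """Subtract each source's mean bias, then take the weighted sum.

    Returns (risk center value, weights).
    """
    corrected = [r - b for r, b in zip(risk_values, mean_biases)]
    weights = reciprocal_std_weights(stds)
    center = sum(w * c for w, c in zip(weights, corrected))
    return center, weights


def upper_quantile(center, combined_std, k=2.0):
    """Center value plus k times the combined standard deviation."""
    return center + k * combined_std
```

With standard deviations of 0.04 and 0.08, `reciprocal_std_weights` returns approximately two-thirds and one-third, matching the example. The reported values of 0.60 and 0.72 imply a combined standard deviation of about 0.06, since 0.60 + 2 × 0.06 = 0.72.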
[0088] In the early warning generation phase, after completing the daily risk fusion task, the monitoring platform initiates an early warning assessment task. This involves compiling the time series of risk center values and operating condition factors for the G-127 grid within the 30 days prior to the current assessment date, forming a time series record within the observation window. This record includes the aforementioned continuous heavy rainfall and high emission phases, as well as subsequent changes in operating conditions. During the training phase, the time evolution model has already imposed a constraint on high-input phase samples (e.g., "predicted risk must not decrease by more than 0.1% per day") and a constraint on low-input phase samples (e.g., "predicted risk must not increase by more than 0.05% per day"), and accordingly, when encountering operating condition sequences like those in the G-127 grid, the current effective model version will provide a risk prediction trajectory that slowly increases or remains flat during the continuous high-input phase and gradually decreases during the input weakening phase. At the end of June, the time evolution model predicted the risk center value and the probability of exceeding the warning threshold for the next seven days based on the historical risk center value and operating conditions over the past thirty days. The results showed that the risk center value was between 0.55 and 0.65 for the first three days of the next seven days due to the influence of previous leaching effects and upstream residual emissions. For the next four days, rainfall was significantly reduced and emission intensity decreased, and the predicted risk center value began to slowly fall back to around 0.5. However, it was still higher than 0.5 for several days during the entire seven-day prediction period.
[0089] After obtaining the temporal evolution results, the monitoring platform constructed a spatial neighborhood of the G-127 grid. It found that six of the eight surrounding early warning grids were also affected by emissions from the upstream smelting park and heavy rainfall within a similar timeframe, with risk prediction values close to those of the G-127 grid. However, some upstream slope grids had slightly lower prediction values due to differences in topography and drainage conditions. Using the spatial correction coefficients fitted during the training phase, the monitoring platform adjusted the initial spatial residuals of the G-127 grid and its neighborhood, smoothing predictions that deviated abnormally from the neighborhood average. This ensured that the predicted risk center value for the G-127 grid over the next seven days remained consistent with the overall neighborhood level, without any isolated extremely high or low anomalies. Subsequently, the monitoring platform mapped the predicted risk center value for the G-127 grid over the next seven days to risk levels according to the unified risk level classification scheme. The results showed that at least three of the next seven days would reach the early warning or high-risk level, and the predicted probability of exceeding the early warning threshold on each of these dates was higher than 0.7. According to the priority inspection judgment rules, the monitoring platform marked the G-127 grid as a priority inspection grid in this round of the early warning task and generated an inspection suggestion record. The record specifically recommended increasing the number of soil sampling points in key farmland areas within the G-127 grid within the next two days, and conducting on-site investigations of the emission units corresponding to the two upstream early warning grids with significant upstream contributions to verify the emission intensity and the operation status of treatment facilities. The record also included the version number of the time evolution model and the version number of the spatial correction rule used in this operation.
[0090] At the end of the early warning task, the monitoring platform wrote the predicted risk center value, predicted risk level, probability of exceeding the early warning threshold, priority inspection mark, associated inspection suggestion number and early warning task identifier for each of the next seven days for grid G-127 into the early warning result database. During routine inspections, the regulatory system used the query interface to search the early warning result database by the G-127 grid number and the next seven-day time range. It obtained the most recently completed early warning result, showing that grid G-127 had been marked as a priority inspection grid for the next seven days and would reach the early warning level on multiple days. Following the recommendations in the inspection suggestion regarding intensified soil sampling and upstream emission investigation, the system arranged for the environmental monitoring station to conduct on-site sampling within the suggested time period and to focus on inspecting the upstream emission units. Overall, this implementation case demonstrates the complete operational process in a specific watershed scenario, from early warning grid division, soil testing database construction, operating condition factor field generation, sensitivity parameter fitting and generalization, risk indicator function calculation and multi-source risk set formation, to risk fusion, temporal evolution and spatial correction, and priority inspection and early warning output. Based on this, those skilled in the art can gain an intuitive understanding of how this solution is implemented in real areas, and can configure parameters and deploy systems according to the same steps in different watersheds or administrative regions to achieve similar soil thallium pollution risk early warning capabilities.
[0091] All calculations involved in the embodiments are dimensionless numerical calculations, and the preset parameters and thresholds in the calculations are set by those skilled in the art according to the actual situation.
[0092] It should be noted that this invention can be deployed on the device itself to realize embedded applications, or it can run on a PC or other terminal with a user interface, thereby meeting various hardware environments and usage requirements.
[0093] The above embodiments can be implemented, in whole or in part, by software, hardware, firmware, or any other combination thereof. When implemented using software, the above embodiments can be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wireless or wired transmission; wired transmission methods include optical fiber, twisted pair, coaxial cable, etc.; wireless transmission includes infrared, microwave, etc. The computer-readable storage medium can be any available medium that a computer can access or a data storage device such as a server or data center containing one or more sets of available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. A semiconductor medium can be a solid-state drive.
[0094] Those skilled in the art will understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and modules described above may be understood by reference to the corresponding processes in the foregoing method embodiments, and are not repeated here.
[0095] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of modules is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or modules may be electrical, mechanical, or other forms.
[0096] The modules described as separate components may or may not be physically separate. The components shown as modules may or may not be physical modules; they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.
[0097] In addition, the functional modules in the various embodiments of this application can be integrated into one processing module, or each module can exist physically separately, or two or more modules can be integrated into one module.
[0098] If the aforementioned functions are implemented as software functional modules and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0099] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
[0100] In conclusion, the above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A method for early warning of soil thallium pollution risk, characterized in that it comprises: S1. dividing an early warning grid, collecting soil samples within the grid, determining the acid-exchangeable thallium content, pH, and clay content, and compiling the samples by sampling location to generate a soil testing library; S2. based on meteorological and emission data, calculating the rainfall index, emission load, and upstream contribution of each grid to generate a working condition factor field at the same scale as the soil testing library; S3. within grids having multiple detection records, fitting the working condition response relationship with acid-exchangeable thallium content as the response variable and the working condition factors, pH, and clay content as independent variables, extracting sensitivity parameters, and extending them according to soil type and geomorphology codes to form a sensitivity parameter field; S4. based on the soil testing results, working condition factors, and sensitivity parameters of each grid, constructing a risk indicator function to obtain a normalized risk index, and aggregating observation and estimation results from different sources to form a multi-source risk set; S5. constructing a Bayesian fusion model using the multi-source risk set and the error distribution of each source to obtain the posterior distribution of the grid risk index, and extracting risk features as input to the early warning model; S6. based on the risk posterior time series and the working condition factors, training a constrained time evolution model and outputting grid risk prediction values, correcting the prediction values through spatial residual regression in combination with spatial neighborhood characteristics, and generating graded early warning results for each grid in future periods according to thresholds.
2. The method for early warning of soil thallium pollution risk according to claim 1, characterized in that, in S1, collecting soil samples and generating the soil testing library includes: setting up soil sampling points within the early warning grid and recording their locations and sampling times to form soil samples with sample numbers; testing the soil samples in the laboratory for acid-exchangeable thallium content, pH, and clay content; electronically transmitting the test results to the monitoring platform, which performs format and range verification; storing records that fail verification in the error record database; grouping records that pass verification by early warning grid and sampling date and calculating the arithmetic mean of each indicator as its representative value; and writing the representative values and sample numbers into the soil testing library.
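The verification and aggregation steps of claim 2 can be illustrated with a minimal Python sketch. The field names (`tl_mg_kg`, `ph`, `clay_pct`, `sample_no`), the acceptance ranges in `RANGES`, and the in-memory return values standing in for the soil testing library and the error record database are all assumptions made for illustration; the claim does not specify them.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical acceptance ranges; the claim does not state concrete limits.
RANGES = {"tl_mg_kg": (0.0, 100.0), "ph": (0.0, 14.0), "clay_pct": (0.0, 100.0)}


def verify(record):
    """Format and range check: each indicator must be a number inside its range."""
    for key, (low, high) in RANGES.items():
        value = record.get(key)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if not low <= value <= high:
            return False
    return True


def build_testing_library(records):
    """Return (library, errors).

    library maps (grid, date) to the arithmetic-mean representative values
    and the contributing sample numbers; errors holds the rejected records.
    """
    groups, errors = defaultdict(list), []
    for rec in records:
        if verify(rec):
            groups[(rec["grid"], rec["date"])].append(rec)
        else:
            errors.append(rec)
    library = {}
    for key, recs in groups.items():
        entry = {k: mean(r[k] for r in recs) for k in RANGES}
        entry["samples"] = [r["sample_no"] for r in recs]
        library[key] = entry
    return library, errors
```

Keeping the rejected records rather than discarding them mirrors the claim's error record database and leaves the verification step auditable.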
3. The method for early warning of soil thallium pollution risk according to claim 1, characterized in that S2 includes: the monitoring platform collecting hourly rainfall and emission records daily from meteorological departments and sewage discharge monitoring systems on the basis of a time synchronization benchmark; calculating the daily rainfall, maximum hourly rainfall, and number of consecutive rainy days for each grid based on the correspondence between grids and meteorological observation stations; determining the upstream and downstream relationships of the grids based on a digital elevation model, attenuating the emission intensity of each emission unit by distance, and accumulating it into each grid along the water flow path to obtain the emission load and upstream contribution; and generating, by grid and date, working condition factor entries containing the rainfall indices, emission load, upstream contribution, and rule version number, and writing them into the working condition factor field.
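As an illustration of the factor calculations in claim 3, the sketch below derives the three rainfall indices from hourly data and accumulates distance-attenuated emissions. The wet-day cutoff `wet_day_mm`, the exponential form of the attenuation, and the coefficient `decay_per_km` are assumptions; the claim only states that emission intensity is attenuated by distance.

```python
import math


def rainfall_indices(hourly_by_day, wet_day_mm=0.1):
    """Rainfall indices for the most recent day.

    hourly_by_day: list of days, oldest first; each day is a list of hourly
    rainfall amounts in mm. wet_day_mm is an assumed rainy-day cutoff.
    """
    daily_totals = [sum(day) for day in hourly_by_day]
    consecutive = 0
    for total in reversed(daily_totals):  # count back from the latest day
        if total < wet_day_mm:
            break
        consecutive += 1
    return {
        "daily_rainfall": daily_totals[-1],
        "max_hourly_rainfall": max(hourly_by_day[-1], default=0.0),
        "consecutive_rainy_days": consecutive,
    }


def upstream_contribution(sources, decay_per_km=0.1):
    """Sum emission intensities attenuated along the flow path.

    sources: iterable of (emission_intensity, flow_path_km) pairs for the
    emission units upstream of a grid. Exponential decay is an assumption.
    """
    return sum(q * math.exp(-decay_per_km * d) for q, d in sources)
```

For example, `rainfall_indices([[0.0] * 24, [1.0] + [0.0] * 23, [0.5, 2.0] + [0.0] * 22])` reports 2.5 mm of daily rainfall, a 2.0 mm hourly maximum, and two consecutive rainy days.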
4. The method for early warning of soil thallium pollution risk according to claim 1, characterized in that S3 includes: the monitoring platform initiating a parameter fitting task within the statistical period; for each early warning grid, extracting representative values of acid-exchangeable thallium content, working condition factors, pH, and clay content from the soil testing library and the working condition factor field; constructing sample records with the representative value of acid-exchangeable thallium content as the explained variable and the working condition factors, pH, and clay content as the explanatory variables; and the statistical fitting module grouping the samples by soil type and geomorphology and performing regression fitting on each group to obtain sensitivity parameters including rainfall leaching sensitivity parameters, emission load response parameters, and soil buffering capacity parameters.
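The grouped regression of claim 4 might look like the following sketch, which assumes a linear response model fitted by ordinary least squares through the normal equations. The linear form, the field names, and the reading of the `rain`, `load`, and `ph`/`clay` coefficients as the rainfall leaching, emission load response, and soil buffering parameters are assumptions; the claim does not specify the regression form.

```python
from collections import defaultdict

FEATURES = ("rain", "load", "ph", "clay")  # hypothetical field names


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: samples are too few or too similar")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_group(samples):
    """Ordinary least squares of thallium content on the working condition factors."""
    rows = [[1.0] + [s[k] for k in FEATURES] for s in samples]
    y = [s["tl"] for s in samples]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return dict(zip(("intercept",) + FEATURES, solve(xtx, xty)))


def fit_by_zone(samples):
    """Group samples by (soil type code, geomorphology code) and fit each group."""
    groups = defaultdict(list)
    for s in samples:
        groups[(s["soil"], s["geo"])].append(s)
    return {zone: fit_group(group) for zone, group in groups.items()}
```

The normal equations keep the sketch dependency-free; a production fit would more likely use a numerically robust solver and report goodness-of-fit alongside the parameters.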
5. The method for early warning of soil thallium pollution risk according to claim 4, characterized in that: the monitoring platform divides the early warning area into soil zones according to soil type codes and into geomorphic zones according to geomorphology codes; when the number of valid detections in an early warning grid is below the preset lower limit, the sensitivity parameter version matching the soil type code and geomorphology code of that grid is retrieved from the parameter library and registered as the version used by the grid; when no matching combination exists, the candidate sensitivity parameter versions with the same soil type code and geomorphology code within the preset extension range are sorted by average relative error, one is selected and registered as the alternative version, and the allocation relationship is recorded in the internal mapping table.
6. The method for early warning of soil thallium pollution risk according to claim 1, characterized in that S4 includes: the monitoring platform obtaining, for each early warning grid and assessment day, the most recent representative values of acid-exchangeable thallium content, pH, and clay content from the soil testing library; obtaining the rainfall indices, emission loads, and upstream contributions covering the working condition observation window from the working condition factor field; constructing, in combination with the sensitivity parameters, a risk indicator function containing a detection risk component, a working condition risk component, and an upstream correction component; weighting the components by their weight coefficients to obtain the normalized risk index and assigning the risk level; and writing the grid number, assessment date, normalized risk index, risk level, component values, and corresponding version numbers into the risk base library.
7. The method for early warning of soil thallium pollution risk according to claim 1, characterized in that S5 includes: the risk fusion module reading the error feature library version information and fusion rule version information in each fusion task; extracting the multi-source risk set from the risk base library according to the early warning grid number and assessment date; dividing the multi-source risk set by source and, based on quality indicators, eliminating sources with missing quality indicators and unreliable sources; querying the average deviation and standard deviation of each source in the error feature library according to the soil type code and geomorphology code; obtaining the corrected risk index by subtracting the average deviation from the source risk index; and determining the weight of each source by normalizing the reciprocals of the standard deviations.
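The source screening, deviation correction, and weighting of claim 7 can be sketched as follows. The dictionary layout, the `quality` score, and the cutoff `min_quality` are hypothetical; in the claimed method the average deviation (`bias`) and standard deviation (`std`) would be looked up in the error feature library by soil type code and geomorphology code.

```python
def correct_and_weight(sources, min_quality=0.5):
    """Screen sources, subtract each source's average deviation, and weight by 1/std.

    sources maps a source name to a dict with keys "risk", "bias", "std",
    and "quality"; min_quality is an assumed reliability cutoff.
    """
    usable = {
        name: s for name, s in sources.items()
        if s.get("quality") is not None and s["quality"] >= min_quality and s["std"] > 0
    }
    if not usable:
        raise ValueError("no usable source for this grid and assessment date")
    inverse = {name: 1.0 / s["std"] for name, s in usable.items()}
    total = sum(inverse.values())
    return {
        name: {
            "corrected": s["risk"] - s["bias"],  # deviation-corrected risk index
            "weight": inverse[name] / total,     # normalized reciprocal of the std
            "std": s["std"],
        }
        for name, s in usable.items()
    }
```

With a laboratory source of standard deviation 0.1 and a model source of standard deviation 0.2, the weights come out as 2/3 and 1/3, so the more precise source dominates the fused value.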
8. The method for early warning of soil thallium pollution risk according to claim 7, characterized in that: after completing the source deviation correction and source weight determination, the risk fusion module calculates the weighted average of the corrected risk indices, using the source weights, as the risk center value, and calculates the comprehensive standard deviation from the source standard deviations and source weights; the risk fluctuation range is calculated according to the multiplier given in the configuration table; the risk center value and the risk fluctuation range are superimposed to obtain the upper quantile estimate; the probability of exceeding the warning threshold is determined by combining the risk center value and the upper quantile estimate; and the risk center value, upper quantile estimate, probability of exceeding the warning threshold, and version information are written into the risk feature library.
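A minimal sketch of the fusion step in claim 8 is given below. Two modeling choices are assumptions not stated in the claim: the comprehensive standard deviation is taken as the standard deviation of a weighted sum of independent sources, and the exceedance probability is computed from a normal approximation centered on the risk center value.

```python
import math
from statistics import NormalDist


def fuse(corrected, weights, stds, multiplier, threshold):
    """Fuse corrected source risk indices into risk features.

    corrected, weights, stds: parallel lists, one entry per retained source
    (weights sum to 1, stds are positive). multiplier stands in for the
    configuration-table value; threshold is the warning threshold.
    """
    center = sum(w * r for w, r in zip(weights, corrected))
    # Assumed form: std of a weighted sum of independent sources.
    sigma = math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, stds)))
    fluctuation = multiplier * sigma
    upper = center + fluctuation
    # Assumed normal approximation of the fused risk distribution.
    p_exceed = 1.0 - NormalDist(center, sigma).cdf(threshold)
    return {
        "risk_center": center,
        "comprehensive_std": sigma,
        "upper_quantile": upper,
        "p_exceed": p_exceed,
    }
```

Under the normal assumption, a multiplier of about 1.645 makes the upper quantile estimate correspond to roughly the 95th percentile.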
9. The method for early warning of soil thallium pollution risk according to claim 1, characterized in that S6 includes: the monitoring platform extracting, on a rolling cycle, the time series of risk features and the time series of working condition factors covering the observation window from the risk feature library and the working condition factor field; inputting the time series of risk features and the time series of working condition factors into the constrained time evolution model to obtain the predicted risk center value for each future day; performing spatial residual regression based on the spatial neighborhood to correct the predicted risk center values, and mapping the corrected predictions to warning levels according to the warning thresholds; and generating early warning result records containing priority inspection grid markers and writing the records into the early warning result database.
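The final mapping step of claim 9, from a corrected risk center prediction to a warning level and a priority inspection marker, can be sketched as below. It does not cover the time evolution model or the spatial residual regression. The threshold values, the level names, and the rule that levels at or above `PRIORITY_FROM` are flagged for priority inspection are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical warning thresholds and level names; the claim gives no values.
THRESHOLDS = (0.4, 0.6, 0.8)
LEVELS = ("none", "blue", "yellow", "red")
PRIORITY_FROM = 2  # assumed: "yellow" and above are flagged for inspection


def warning_record(grid, day, corrected_prediction):
    """Map a corrected risk center prediction to an early warning result record."""
    # bisect_right counts how many thresholds the prediction has reached.
    level = bisect_right(THRESHOLDS, corrected_prediction)
    return {
        "grid": grid,
        "date": day,
        "risk_center": corrected_prediction,
        "level": LEVELS[level],
        "priority_inspection": level >= PRIORITY_FROM,
    }
```

Using `bisect_right` means a prediction exactly equal to a threshold is assigned the higher level, which errs on the side of issuing the warning.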