Machine learning based soil pollution evolution prediction system and method

By using machine learning methods to divide soil regions, construct soil pollution correlation maps, and calculate the dynamic pollution impact, this approach solves the problem of unpredictable spatiotemporal evolution patterns of soil pollution in existing technologies, and achieves accurate assessment of pollutant migration and spatial transmission characterization.

CN122241075A · Pending · Publication Date: 2026-06-19 · GUANGZHOU SINO-GERMAN ENVIRONMENTAL TECH RES INST CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: GUANGZHOU SINO-GERMAN ENVIRONMENTAL TECH RES INST CO LTD
Filing Date: 2026-03-16
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing technologies cannot accurately predict the spatiotemporal evolution of soil pollution, ignore the spatial transmission correlation between soil units and the degree of pollution impact, and have poor adaptability, lack of physical mechanism support and weak generalization ability.

Method used

Using machine learning methods, the spatial boundaries of soil areas are delineated and gridded, static soil data are extracted, a soil pollution correlation map is constructed, the dynamic pollution impact is calculated, and the changes in pollution status are predicted by combining the spatial correlation between soil volumetric water content and pollution transmission probability.

Benefits of technology

It achieves precise response to the migration patterns of pollutants in different humidity ranges, improves the accuracy of dynamic pollution assessment, quantifies the spatial transmission of soil pollution, lays the foundation for spatiotemporal coupled evolution prediction, and takes into account both prediction accuracy and adaptability.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention discloses a soil pollution evolution prediction system and method based on machine learning, belonging to the field of soil pollution remediation technology. The invention delineates soil region boundaries and grids them, generating soil units with unique coordinates. It extracts static basic data feature sets for each coordinate, inputs them into a trained machine learning model, and outputs the static pollution impact level. Then, it collects the soil volumetric water content at each coordinate and calculates the dynamic pollution impact level based on the static value. Subsequently, it determines the spatial correlation between coordinates, obtains paired combinations with potential pollution transmission, calculates the pollution impact coefficient of the source on the target, constructs a soil pollution correlation graph based on the coefficient, sets the coordinates as nodes, and configures their attributes. Based on this correlation graph, according to the preset graph calculation logic, it obtains the pollution state change data of each soil coordinate under different time dimensions. This invention significantly improves the accuracy, interpretability, and engineering practicality of soil pollution evolution prediction.

Description

Technical Field

[0001] This invention relates to the field of soil pollution remediation technology, specifically to a soil pollution evolution prediction system and method based on machine learning.

Background Technology

[0002] Soil pollution is characterized by its concealment, accumulation, and spatial heterogeneity. Accurately predicting the spatiotemporal evolution of soil pollution is the core technological support for achieving source control, risk warning, and precise remediation of soil pollution.

[0003] Current technologies primarily focus on static pollution characteristic analysis and severely underestimate the role of soil moisture, a core dynamic factor in pollution migration. They treat it merely as a fixed constant or apply simple linear corrections, which cannot match the nonlinear behavior of pollutant migration across different soil moisture thresholds. Existing technologies also neglect the spatial transmission relationships between soil units: they do not quantify the degree of pollution impact between different soil regions and therefore cannot characterize the spatial pathways of pollution diffusion. Most existing technologies are single models driven purely by mechanisms or purely by data. Pure mechanism models rely on numerous parameters that are difficult to obtain accurately, resulting in computational complexity and poor adaptability to large areas. Pure data-driven models lack systematic feature extraction from multi-source static basic data, lack physical mechanism support, and generalize poorly.

Summary of the Invention

[0004] The purpose of this invention is to provide a soil pollution evolution prediction system and method based on machine learning to solve the problems raised in the prior art.

[0005] To achieve the above objectives, the present invention provides the following technical solution. In a first aspect, this application provides a machine learning-based method for predicting the evolution of soil pollution, comprising the following steps: The spatial boundary of the soil area is delineated, and the area is divided into grids to generate several soil units, each corresponding to a unique soil coordinate. For each soil coordinate, static basic soil data is acquired and the corresponding feature set is extracted. The feature set is used as the input to a trained machine learning algorithm model, and the static pollution impact degree corresponding to each soil coordinate is obtained from the output. The soil volumetric water content corresponding to each soil coordinate is collected, and the dynamic pollution impact degree corresponding to that soil coordinate is calculated in combination with the corresponding static pollution impact degree. The spatial correlation between all soil coordinates within the soil area is determined to obtain soil coordinate pairings with the potential for pollution transmission. For each pairing, the pollution impact coefficient of the source soil coordinate on the target soil coordinate is calculated. Based on the pollution impact coefficient, a soil pollution correlation graph is constructed, in which each soil coordinate within the area to be detected is set as an independent node of the graph structure and assigned attribute values. Based on the soil pollution correlation graph, the pollution state change data corresponding to each soil coordinate under different preset time dimensions is calculated.

[0006] In conjunction with the first aspect, in the first embodiment of the first aspect of this application, the step of defining the spatial boundary of the soil region, dividing it into grids, and generating several soil units, each soil unit corresponding to a unique soil coordinate, includes: The boundary delineation criteria are determined, the geospatial area of the soil region is obtained, and the initial boundary is delineated. The initial boundary is then modified by combining natural topography, land feature boundaries, and pollution diffusion boundaries to eliminate overlapping and blank areas, forming a non-self-intersecting spatial boundary. The division scale and rules of the grid units are set, and the soil region within the boundary is discretized into soil units using the spatial boundary as a constraint. Soil units that intersect the spatial boundary are then trimmed and modified. Using a unified geographic coordinate system, spatial feature parameters are extracted for each soil unit to generate a corresponding unique soil coordinate.

[0007] In conjunction with the first aspect, in the second embodiment of the first aspect of this application, the step of acquiring static soil baseline data and extracting the corresponding feature set for each soil coordinate includes: The static soil baseline data includes static attribute data of pollution sources, soil inherent property data, regional topography data, and soil spatial correlation attribute data, which are preprocessed. Each type of static soil baseline data corresponds to a feature extraction dimension. For each feature extraction dimension, a corresponding feature extraction rule is preset to determine the extraction logic of the feature items to be extracted under that dimension. For each feature extraction dimension, the corresponding feature items are extracted from the matched static soil baseline data according to the extraction rule corresponding to that dimension, and integrated to form a feature set.

[0008] In conjunction with the first aspect, in the third embodiment of the first aspect of this application, the step of using the feature set as input to the trained machine learning algorithm model and calculating the static pollution impact degree corresponding to each soil coordinate by calculating the output includes: Historical soil sample data matching the soil type, pollution source type, and topographic features of the soil region were collected. Each sample contained a set of feature items with dimensions completely consistent with the feature set of the soil coordinates to be detected, as well as the measured static pollution impact label value of the corresponding sample point. The samples were divided into a training subset and a validation subset. The training subset was used for model iterative fitting, and the validation subset was used for model parameter verification. The machine learning algorithm model selected was the XGBoost regression model. The basic parameters of the tree structure, the number of iterations, the loss function type, and the regularization constraint parameters of the model were preset, and iterative training and model verification were performed. After the verification was passed, the feature set was used as the input of the XGBoost regression model, and the static pollution impact degree corresponding to each soil coordinate was output.

[0009] In conjunction with the first aspect, in the fourth embodiment of the first aspect of this application, the step of collecting the soil volumetric water content corresponding to each soil coordinate, and calculating the dynamic pollution impact degree corresponding to that soil coordinate in conjunction with the corresponding static pollution impact degree, includes: For a single soil coordinate, inherent hydrological parameters, including field capacity and saturated water content, are retrieved from the static soil baseline data. Based on expert experience, preset fixed coefficients are determined according to the pollution type and soil texture, including the wilting humidity conversion coefficient, the pollution type humidity influence weighting coefficient, and the saturation segment correction coefficient. The static pollution impact degree, field capacity, saturated water content, and preset fixed coefficients corresponding to the coordinate are matched with the soil volumetric water content to form a single set of calculation parameters for the coordinate. The field capacity in the set of calculation parameters is multiplied by the wilting humidity conversion coefficient to obtain the wilting humidity threshold for the soil coordinate. The soil volumetric water content is compared with the wilting humidity threshold and the field capacity to determine the numerical range to which it belongs. The numerical ranges are: soil volumetric water content less than or equal to the wilting humidity threshold; soil volumetric water content greater than the wilting humidity threshold and less than or equal to field capacity; and soil volumetric water content greater than field capacity. Based on the numerical range of the soil volumetric water content, the corresponding piecewise mathematical formula is selected, all parameters in the set of calculation parameters are substituted into the formula, and the dynamic pollution impact degree corresponding to the soil coordinate is obtained.
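The threshold calculation and range determination described above can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the function names, the range labels ("dry", "intermediate", "wet"), and the numeric values are assumptions introduced here for demonstration only.

```python
def wilting_threshold(field_capacity: float, wilting_coeff: float) -> float:
    """Wilting humidity threshold = field capacity x wilting humidity conversion coefficient."""
    return field_capacity * wilting_coeff


def moisture_range(theta: float, field_capacity: float, wilting_coeff: float) -> str:
    """Classify the soil volumetric water content into one of the three ranges."""
    theta_w = wilting_threshold(field_capacity, wilting_coeff)
    if theta <= theta_w:
        return "dry"            # at or below the wilting humidity threshold
    if theta <= field_capacity:
        return "intermediate"   # above the threshold, at or below field capacity
    return "wet"                # above field capacity


# Illustrative values only (not from the source): field capacity 0.30, coefficient 0.5
for theta in (0.10, 0.20, 0.35):
    print(theta, moisture_range(theta, field_capacity=0.30, wilting_coeff=0.5))
```

The returned label is what selects the piecewise formula applied in the next step.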

[0010] In conjunction with the first aspect, in the fifth embodiment of the first aspect of this application, the step of selecting a corresponding piecewise mathematical formula based on the numerical range of soil volumetric water content, substituting all parameters in the set of calculation parameters into the formula, and obtaining the dynamic pollution impact degree corresponding to the soil coordinate includes the following three cases, where D represents the dynamic pollution impact degree and S represents the static pollution impact degree. When the soil volumetric water content is less than or equal to the wilting humidity threshold, the dynamic pollution impact degree is a fixed proportion of the static value, and the calculation formula [formula not reproduced in this text] relates D and S through the wilting humidity conversion coefficient. When the soil volumetric water content is greater than the wilting humidity threshold and less than or equal to the field capacity, the pollutant migration capacity increases linearly with humidity, and the calculation formula [formula not reproduced in this text] additionally uses the pollution type humidity influence weighting coefficient. When the soil volumetric water content is greater than the field capacity, the pollutant migration capacity continues to increase with increasing humidity, but the marginal effect gradually decreases and eventually stabilizes; the calculation formula [formula not reproduced in this text] additionally uses k, the saturation segment correction coefficient, and the saturated water content.
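Because the formula expressions themselves are not available in this text, the following sketch only illustrates one possible set of functional forms that matches the qualitative behavior described: a fixed proportion of S below the wilting humidity threshold, a linear increase up to field capacity, and a saturating increase above it. The specific expressions, the symbol names `a`, `w`, `theta_fc`, `theta_s`, and the example values are all assumptions, not the formulas of the application.

```python
from math import exp


def dynamic_impact(S, theta, theta_fc, theta_s, a, w, k):
    """Assumed piecewise form for the dynamic pollution impact degree D.

    S        static pollution impact degree
    theta    soil volumetric water content
    theta_fc field capacity
    theta_s  saturated water content
    a        wilting humidity conversion coefficient
    w        pollution type humidity influence weighting coefficient
    k        saturation segment correction coefficient
    """
    theta_w = theta_fc * a                      # wilting humidity threshold
    if theta <= theta_w:
        return a * S                            # fixed proportion of S
    if theta <= theta_fc:
        x = (theta - theta_w) / (theta_fc - theta_w)
        return S * (a + w * x)                  # linear increase with humidity
    x = (theta - theta_fc) / (theta_s - theta_fc)
    # keeps increasing, with a diminishing marginal effect (assumed exponential saturation)
    return S * (a + w) + S * w * (1.0 - exp(-k * x))


# Illustrative parameters only (not from the source)
params = dict(S=1.0, theta_fc=0.30, theta_s=0.45, a=0.5, w=0.4, k=3.0)
for theta in (0.10, 0.225, 0.30, 0.40):
    print(theta, round(dynamic_impact(theta=theta, **params), 4))
```

The three branches are chosen to be continuous at both thresholds, which is a design choice of this sketch and is not stated in the source.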

[0011] In conjunction with the first aspect, in the sixth embodiment of the first aspect of this application, determining the spatial correlation between all soil coordinates within the soil area to obtain soil coordinate pairings with the potential for pollution transmission includes: Retrieve all generated soil coordinates within the soil area, along with the spatial attribute data bound to each soil coordinate, including the spatial extent of the corresponding soil unit, topographic elevation, soil conductivity characteristics, and static pollution impact level. Determine the driving logic of soil pollution transmission as the migration of pollutants driven by the flow of water along the topography and soil pores. Based on the driving logic, preset judgment dimensions for the possibility of pollution transmission, namely spatial proximity, topographic runoff feasibility, soil conductivity adaptability, and pollution load basis. For each judgment dimension, preset corresponding feasibility judgment rules, and filter out soil coordinate pairings with the possibility of pollution transmission.

[0012] Specifically, regarding spatial proximity, for each soil coordinate, the maximum potential pollution conduction influence boundary of the coordinate is preset based on the spatial center point of the soil unit corresponding to the coordinate. This boundary is the maximum spatial range that pollution at the coordinate can reach through hydraulic conduction. For each soil coordinate, all other soil coordinates whose spatial range lies completely within the maximum potential pollution conduction influence boundary of the coordinate are selected to form the initial candidate pairing coordinate set of the coordinate.

For topographic runoff feasibility, for each initial candidate pair of coordinates, topographic elevation data of the soil units corresponding to the two soil coordinates are extracted to determine the relative elevation relationship between the two coordinates and to identify the potential source and target coordinates. The source coordinate is the soil coordinate with the higher elevation, and the target coordinate is the soil coordinate with the lower elevation. When the elevations of the two soil coordinates are identical, or the target coordinate is higher than the source coordinate, the coordinate pair is directly determined to have no possibility of forward pollution transmission and is eliminated. For coordinate pairs whose elevations meet the forward transmission requirement, it is verified whether there is a continuous topographic runoff path between the two soil coordinates, confirming that the two coordinates are within the same catchment unit and that no topographic obstacles such as watersheds or steep slopes block the runoff path. When there is a topographic runoff obstruction between the two coordinates, the coordinate pair is directly determined to have no possibility of pollution transmission and is eliminated. When a continuous runoff path exists, the coordinate pair passes the topographic runoff feasibility assessment and proceeds to the next stage.

For soil conduction adaptability, for coordinate pairs that have passed the topographic runoff feasibility assessment, the inherent conduction characteristics data of all soil units within the continuous spatial range between the two soil coordinates are extracted, including soil texture, porosity, permeability, and soil layer continuity data. It is verified whether there is a continuous permeable soil layer between the two soil coordinates, and whether there are soil conduction barriers such as continuous impermeable layers, bedrock outcrops, or artificial seepage prevention structures. When there is a soil conduction barrier between the two coordinates, the coordinate pair is directly determined to have no possibility of pollution conduction and is eliminated. When a continuous conductive soil layer exists, the coordinate pair passes the soil conduction adaptability assessment and proceeds to the next stage.

For the pollution load basis, for all coordinate pairs that pass the above assessments, the static pollution impact degree of the source soil coordinate in the pair is extracted, and it is verified whether the value reaches the preset minimum pollution transmission load threshold. When the static pollution impact degree of the source soil coordinate does not reach the minimum threshold, the source coordinate is determined not to have the basic load capacity to transmit pollution outwards, and the coordinate pair has no actual possibility of pollution transmission and is removed. When the static pollution impact degree of the source soil coordinate reaches the minimum threshold, the coordinate pair is determined to have the possibility of pollution transmission and is retained.

The retained soil coordinate pairings are then integrated to obtain the pairing combinations with the possibility of pollution transmission.
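The four-stage screening above can be sketched as a chain of filters. This is a simplified illustration under stated assumptions: spatial proximity is approximated by the distance between unit center points rather than by full containment of the unit's spatial range; runoff obstructions and soil conduction barriers are supplied as precomputed sets of blocked pairs; and `SoilUnit`, `screen_pairs`, and all parameter names are hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations
from math import hypot


@dataclass(frozen=True)
class SoilUnit:
    uid: str
    x: float              # metres, projected coordinates (assumption)
    y: float
    elevation: float      # metres
    static_impact: float  # static pollution impact degree, 0..1


def screen_pairs(units, max_reach, min_load,
                 runoff_blocked=frozenset(), soil_blocked=frozenset()):
    """Return (source_id, target_id) pairs that pass all four screening stages."""
    pairs = []
    for src, dst in permutations(units, 2):
        # 1. Spatial proximity: target within the source's maximum conduction reach
        if hypot(src.x - dst.x, src.y - dst.y) > max_reach:
            continue
        # 2. Topographic runoff feasibility: source strictly higher, path not blocked
        if src.elevation <= dst.elevation or (src.uid, dst.uid) in runoff_blocked:
            continue
        # 3. Soil conduction adaptability: no barrier between the two units
        if (src.uid, dst.uid) in soil_blocked:
            continue
        # 4. Pollution load basis: source must reach the minimum load threshold
        if src.static_impact < min_load:
            continue
        pairs.append((src.uid, dst.uid))
    return pairs


units = [
    SoilUnit("A", 0.0, 0.0, 10.0, 0.9),
    SoilUnit("B", 50.0, 0.0, 9.0, 0.5),
    SoilUnit("C", 100.0, 0.0, 8.0, 0.1),
    SoilUnit("D", 5000.0, 0.0, 1.0, 0.2),
]
print(screen_pairs(units, max_reach=120.0, min_load=0.3))
```

Each retained pair is directed from the higher source to the lower target, which is the orientation later used for the edges of the correlation graph.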

[0013] In conjunction with the first aspect, in the seventh embodiment of the first aspect of this application, the step of calculating the pollution impact coefficient of the source soil coordinate on the target soil coordinate for each soil coordinate pairing includes: For each soil coordinate pairing, data on the spatial distance between the source and target coordinates, the relative topographic slope, the permeability characteristics of the intermediate soil layer, and the static pollution impact degree of the source are retrieved. The spatial distance attenuation factor is calculated using an exponential decay function, so that closer distances yield higher factor values. The topographic slope gain factor, the soil permeability and conduction factor, and the source pollution load basis factor are calculated using linear normalization, with each parameter mapped to the standardized interval from 0 to 1. A weighted arithmetic mean combines the factors according to preset fixed weights, and an interval truncation constraint restricts the result to a physically meaningful interval, yielding the pollution impact coefficient of the source soil coordinate on the target soil coordinate.
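A minimal sketch of this coefficient calculation is given below. The decay length, the normalization ranges, the weights, and the function names are hypothetical parameters chosen for illustration; the source does not state their values, and the truncation interval is assumed here to be 0 to 1.

```python
from math import exp


def clamp(value, lo=0.0, hi=1.0):
    """Interval truncation: restrict a value to [lo, hi]."""
    return max(lo, min(hi, value))


def normalize(value, vmin, vmax):
    """Linear normalization of value from [vmin, vmax] to [0, 1]."""
    return clamp((value - vmin) / (vmax - vmin))


def impact_coefficient(distance, slope, permeability, source_impact, *,
                       decay_length, slope_range, perm_range, weights):
    """Weighted mean of four factors, truncated to [0, 1]."""
    f_dist = exp(-distance / decay_length)        # exponential distance attenuation
    f_slope = normalize(slope, *slope_range)      # topographic slope gain factor
    f_perm = normalize(permeability, *perm_range) # soil permeability and conduction factor
    f_load = clamp(source_impact)                 # source load factor, already on a 0..1 scale
    factors = (f_dist, f_slope, f_perm, f_load)
    weighted_mean = sum(w * f for w, f in zip(weights, factors)) / sum(weights)
    return clamp(weighted_mean)


# Illustrative settings only (not from the source)
settings = dict(decay_length=100.0, slope_range=(0.0, 10.0),
                perm_range=(1.0, 3.0), weights=(0.25, 0.25, 0.25, 0.25))
print(impact_coefficient(50.0, 5.0, 2.0, 0.9, **settings))
```

Dividing by the sum of the weights keeps the result a true weighted mean even if the preset weights do not add up to 1.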

[0014] In conjunction with the first aspect, in the eighth embodiment of the first aspect of this application, the step of calculating the pollution state change data corresponding to each soil coordinate under different preset time dimensions based on the soil pollution correlation graph includes: Several time dimensions are preset, and a time step discretization algorithm decomposes each time dimension into equidistant time steps. An attribute value mapping algorithm takes the dynamic pollution impact degree of each node as the initial pollution state value and binds it to the initial time node of the corresponding soil coordinate to form an initial pollution state matrix. A graph convolution iterative algorithm then iterates in units of time steps. In each step, based on the pollution impact coefficients of the directed edges between nodes, a weighted summation transmits the pollution state value of each source node to its target node according to the edge weight, and an exponential decay term corrects for pollution loss during transmission, giving the updated pollution state value of the target node at the current time step. After the iterative calculation of all preset time steps is complete, a time dimension aggregation algorithm integrates the node pollution state values of each time step into the corresponding preset time dimension and outputs the pollution state change data corresponding to each soil coordinate under the different preset time dimensions.
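The iteration described above can be illustrated with a small propagation loop over a directed, weighted graph. The exact update rule and aggregation are not specified in the text, so the sketch makes explicit assumptions: each target node adds the attenuated, weighted sum of its source states to its own state; source nodes do not lose what they transmit; and the value reported for a time dimension is the state at the end of that horizon. The function and parameter names are hypothetical.

```python
from math import exp


def propagate(initial, edges, horizons, dt, loss_rate):
    """Iterate pollution states over time steps.

    initial   {node: dynamic pollution impact degree}
    edges     {(source, target): pollution impact coefficient}
    horizons  preset time dimensions, as multiples of dt
    dt        length of one time step
    loss_rate exponential loss rate during transmission (assumed parameter)
    """
    state = dict(initial)
    attenuation = exp(-loss_rate * dt)
    results, steps_done = {}, 0
    for horizon in sorted(horizons):
        n_steps = round(horizon / dt)
        while steps_done < n_steps:
            incoming = {node: 0.0 for node in state}
            for (src, dst), coeff in edges.items():
                incoming[dst] += coeff * state[src]   # weighted summation along edges
            state = {n: state[n] + attenuation * incoming[n] for n in state}
            steps_done += 1
        results[horizon] = dict(state)                # snapshot for this time dimension
    return results


# Illustrative two-node graph (not from the source)
out = propagate({"A": 1.0, "B": 0.0}, {("A", "B"): 0.5},
                horizons=[1, 2], dt=1, loss_rate=0.0)
print(out)  # {1: {'A': 1.0, 'B': 0.5}, 2: {'A': 1.0, 'B': 1.0}}
```

With this rule the states are not bounded, so a practical implementation would likely need a cap or a mass balance term; that choice is left open here because the source does not describe one.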

[0015] In a second aspect, this application provides a machine learning-based soil pollution evolution prediction system, including the following modules.

The static pollution impact degree calculation module includes: a soil region division unit that delineates the spatial boundary of the soil region, performs grid division, and generates several soil units, each soil unit corresponding to a unique soil coordinate; a feature set extraction unit that, for each soil coordinate, obtains the static basic soil data and extracts the corresponding feature set; and a static pollution impact degree calculation unit that uses the feature set as input to the trained machine learning algorithm model and calculates the static pollution impact degree corresponding to each soil coordinate.

The dynamic pollution impact degree calculation module includes: a dynamic pollution impact degree calculation unit that collects the soil volumetric water content corresponding to each soil coordinate and calculates the dynamic pollution impact degree corresponding to the soil coordinate in combination with the corresponding static pollution impact degree.

The soil pollution correlation graph construction module includes: a soil coordinate pairing unit that determines the spatial correlation between all soil coordinates within the soil area and obtains soil coordinate pairing combinations with the possibility of pollution transmission; a pollution impact coefficient calculation unit that calculates, for each soil coordinate pairing combination, the pollution impact coefficient of the source soil coordinate on the target soil coordinate; and a soil pollution correlation graph construction unit that constructs the soil pollution correlation graph based on the pollution impact coefficient, setting each soil coordinate within the area to be detected as an independent node in the graph structure and assigning attribute values.

The pollution state change calculation module includes: a pollution state change calculation unit that calculates the pollution state change data for each soil coordinate under different preset time dimensions based on the soil pollution correlation graph.

[0016] Compared with the prior art, the beneficial effects of the present invention are: 1. This invention specifically collects soil volumetric moisture content at various soil coordinates, combines inherent soil hydrological parameters with preset fixed coefficients, and calculates the degree of dynamic pollution impact through piecewise mathematical formulas. It accurately responds to the migration patterns of pollutants in different humidity ranges, significantly improving the accuracy of dynamic pollution assessment.

[0017] 2. This invention achieves a quantitative characterization of spatial transmission of soil pollution by determining the spatial correlation between soil coordinates, screening pairings with the possibility of pollution transmission, calculating the pollution impact coefficient of the source on the target, and constructing a soil pollution correlation map, thus laying the foundation for predicting the evolution of spatiotemporal coupling.

[0018] 3. This invention first divides the soil area into grids and extracts feature sets of multi-source static basic data such as pollution sources and soil characteristics. It then uses a pre-trained XGBoost regression model to calculate the degree of static pollution impact, integrating the nonlinear fitting ability of machine learning with the physical mechanism of soil pollution migration, thus taking into account both prediction accuracy and adaptability to different scenarios.

Attached Figure Description

[0019] Figure 1 is a schematic diagram illustrating the steps of the machine learning-based method for predicting the evolution of soil pollution in this invention. Figure 2 is a system structure diagram of the soil pollution evolution prediction system based on machine learning of this invention.

Detailed Implementation

[0020] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0021] Example: As shown in Figures 1-2, the present invention provides the following technical solution. As shown in Figure 1, this application provides a machine learning-based method for predicting the evolution of soil pollution, including the following steps: Step S100: Delineate the spatial boundary of the soil area, perform grid division, and generate several soil units, each soil unit corresponding to a unique soil coordinate; for each soil coordinate, obtain the static basic soil data and extract the corresponding feature set; use the feature set as the input to the trained machine learning algorithm model, and obtain the static pollution impact degree corresponding to each soil coordinate from the output. Specifically, the boundary delineation criteria are determined, the geospatial area of the soil region is obtained, and the initial boundary is delineated. The initial boundary is then modified by combining natural topography, land feature boundaries, and pollution diffusion boundaries to eliminate overlapping and blank areas, forming a non-self-intersecting spatial boundary. The division scale and rules of the grid units are set, and the soil region within the boundary is discretized into soil units using the spatial boundary as a constraint. Soil units that intersect the spatial boundary are then trimmed and modified. Using a unified geographic coordinate system, spatial feature parameters are extracted for each soil unit to generate a corresponding unique soil coordinate.

[0022] Furthermore, the static soil basic data includes static attribute data of pollution sources, soil inherent characteristic data, regional topography and geomorphology data, and soil spatial correlation attribute data, which are preprocessed. Each type of static soil basic data corresponds to a feature extraction dimension. For each feature extraction dimension, a corresponding feature extraction rule is preset, and the extraction logic of the feature items to be extracted under that dimension is determined. For each feature extraction dimension, the corresponding feature items are extracted from the matched static soil basic data according to the extraction rule corresponding to that dimension, and integrated to form a feature set.

[0023] Furthermore, historical soil sample data matching the soil type, pollution source type, and topographic features of the soil region are collected. Each sample contains a set of feature items that are completely consistent with the dimension of the feature set of the soil coordinates to be detected, as well as the measured static pollution impact label value of the corresponding sample point. The sample is divided into a training subset and a validation subset. The training subset is used for model iterative fitting, and the validation subset is used for model parameter verification. The machine learning algorithm model selected is the XGBoost regression model. The basic parameters of the tree structure, the number of iterations, the loss function type, and the regularization constraint parameters of the model are preset. Iterative training is performed, and the model is verified. After the verification is passed, the feature set is used as the input of the XGBoost regression model, and the static pollution impact degree corresponding to each soil coordinate is output.

[0024] In one specific embodiment, a 1km×1km soil area around a coking plant was selected as the area to be tested, and the degree of static pollution impact was calculated.

[0025] Based on the emission diffusion range of the coking plant exhaust gas and the control boundary of the surrounding farmland, an initial rectangular boundary was delineated. The boundary was then modified by combining the natural river on the west side and the municipal road on the east side. After eliminating blank areas, a non-self-intersecting closed boundary enclosing 0.98 km² was formed. A 50 m × 50 m grid division scale was set. After discretization, the 12 grid cells intersecting the river or road were trimmed and modified, ultimately generating 392 soil units. Using the WGS84 geographic coordinate system, the longitude and latitude of the center point of each unit were extracted as unique soil coordinates. For example, the coordinates of the core area unit are 116.5230°E, 39.8725°N, and the coordinates of the far-field unit are 116.5315°E, 39.8802°N.
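The figures in this example can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: `grid_centers` is a hypothetical helper that works in local metre offsets rather than WGS84 longitude and latitude, and it does not reproduce the boundary trimming. It shows that an area of 0.98 km² corresponds to 392 cells of 50 m × 50 m, which agrees with the reported number of soil units.

```python
def grid_centers(width_m: int, height_m: int, cell_m: int) -> list[tuple[float, float]]:
    """Center points (local metres) of a regular grid covering a width x height rectangle."""
    return [
        (col * cell_m + cell_m / 2, row * cell_m + cell_m / 2)
        for row in range(height_m // cell_m)
        for col in range(width_m // cell_m)
    ]


centers = grid_centers(1000, 1000, 50)   # untrimmed 1 km x 1 km rectangle
full_cells = len(centers)                # 20 x 20 cells
equivalent_cells = 980_000 // (50 * 50)  # 0.98 km^2 expressed in whole 50 m cells

print(full_cells, equivalent_cells)      # 400 392
```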

[0026] Four types of static basic data were collected and preprocessed. The static attributes of the pollution sources were the Cd/Pb emission rates (0.02 to 0.05 mg/s) of the three exhaust gas emission outlets in the plant area; the inherent soil characteristics were the porosity (42%) and pH (7.8) of the regional loam; the topography was the average slope of the region (3°, higher in the northwest and lower in the southeast); and the spatial correlation attributes were the distances of each unit from the main emission outlet (20 to 980 m). Features were extracted according to the four dimensions. For example, for the far-field unit 850 m away from the main emission outlet, the feature set was: distance from the main emission outlet 850 m, Cd emission rate 0.02 mg/s, loam, and topographic slope of 2°.
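As an illustration of how such a feature set might be turned into model input, the sketch below flattens the far-field example into a numeric vector. The dictionary keys, the one-hot encoding of soil texture, and the list of texture classes are assumptions made for this example; the source does not describe the encoding.

```python
TEXTURE_CLASSES = ("sand", "loam", "clay")  # hypothetical category list


def to_feature_vector(unit: dict) -> list[float]:
    """Flatten the four feature dimensions of one soil unit into a numeric vector."""
    texture_one_hot = [1.0 if unit["texture"] == t else 0.0 for t in TEXTURE_CLASSES]
    return [
        float(unit["distance_to_outlet_m"]),   # spatial correlation attribute
        float(unit["cd_emission_mg_per_s"]),   # pollution source attribute
        *texture_one_hot,                      # inherent soil characteristic
        float(unit["slope_deg"]),              # topography
    ]


# Far-field unit from the example above
far_field_unit = {
    "distance_to_outlet_m": 850,
    "cd_emission_mg_per_s": 0.02,
    "texture": "loam",
    "slope_deg": 2,
}
print(to_feature_vector(far_field_unit))  # [850.0, 0.02, 0.0, 1.0, 0.0, 2.0]
```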

[0027] Two hundred sets of matched historical soil samples were collected and divided in a 7:3 ratio into a training subset of 140 samples and a validation subset of 60 samples. The model tree depth was preset to 6, the number of iterations to 100, the loss function to mean squared error, and the L2 regularization coefficient to 0.1. After iterative training, the model achieved R² = 0.92 on the validation subset, and the validation passed. The feature sets of the 392 soil units were input into the model, and the static pollution impact degree (values from 0 to 1) was output. The value was 0.98 for the core area unit 20 m from the main emission outlet, 0.52 for the transition area unit 500 m away, and 0.15 for the far-field unit 850 m away.
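The split and the validation metric in this example can be sketched with the standard library alone. The settings are written with XGBoost's scikit-learn style parameter names; mapping "number of iterations" to `n_estimators` is an assumption, and the helper functions `split_samples` and `r_squared` are hypothetical names introduced here. The model itself is not trained in this sketch.

```python
import random

# Settings stated in the text, using XGBoost's scikit-learn style parameter names.
XGB_PARAMS = {
    "max_depth": 6,                   # tree depth
    "n_estimators": 100,              # number of boosting iterations (assumed mapping)
    "objective": "reg:squarederror",  # mean squared error loss
    "reg_lambda": 0.1,                # L2 regularization coefficient
}


def split_samples(samples, train_fraction=0.7, seed=0):
    """Shuffle and split samples into training and validation subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


train, valid = split_samples(range(200))
print(len(train), len(valid))  # 140 60
```

In a full pipeline, `XGB_PARAMS` would be passed to the regressor and `r_squared` evaluated on the 60 validation samples against a pass threshold, which the source does not state.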

[0028] Step S200: Collect the soil volumetric moisture content corresponding to each soil coordinate, and calculate the dynamic pollution impact degree corresponding to the soil coordinate by combining the corresponding static pollution impact degree. Specifically, for a single soil coordinate, inherent hydrological parameters, including field capacity and saturated water content, are retrieved from the static soil baseline data. Based on expert experience, preset fixed coefficients are determined according to the pollution type and soil texture, including wilting humidity conversion coefficient, pollution type humidity influence weighting coefficient, and saturation segment correction coefficient. The static pollution impact level, field capacity, saturated water content, and corresponding preset fixed coefficients corresponding to the coordinate are matched with the soil volumetric water content to form a single set of calculation parameters for the coordinate. The field capacity in the set of calculation parameters is multiplied by the wilting humidity conversion factor in the preset fixed coefficients to obtain the wilting humidity threshold for the soil coordinate. The soil volumetric water content in the set of calculation parameters is compared with the wilting humidity threshold and field capacity to determine the numerical range to which the soil volumetric water content belongs. The numerical ranges are classified as follows: soil volumetric water content is less than or equal to the wilting humidity threshold, soil volumetric water content is greater than the wilting humidity threshold and less than or equal to field capacity, and soil volumetric water content is greater than field capacity. 
Based on the numerical range of soil volumetric moisture content, select the corresponding piecewise mathematical formula, substitute all parameters in the set of calculation parameters into the formula, and obtain the dynamic pollution impact degree corresponding to the soil coordinate.
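The range determination described above can be sketched as a small function. The function name and the three range labels are hypothetical; only the comparison logic (wilting humidity threshold = field capacity × wilting humidity conversion coefficient, then two comparisons) comes from the text.

```python
def moisture_range(theta: float, field_capacity: float, wilting_coeff: float) -> str:
    """Classify a volumetric water content into one of the three ranges."""
    # Wilting humidity threshold = field capacity x wilting conversion coefficient.
    wilting_threshold = field_capacity * wilting_coeff
    if theta <= wilting_threshold:
        return "at_or_below_wilting"
    if theta <= field_capacity:
        return "wilting_to_field_capacity"
    return "above_field_capacity"

# Example inputs (assumed for illustration): field capacity 0.32, coefficient 0.2.
labels = [moisture_range(t, 0.32, 0.2) for t in (0.05, 0.20, 0.40)]
```

The returned label then selects which branch of the piecewise formula is evaluated.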

[0029] Furthermore, let the soil volumetric water content be $\theta$, the wilting humidity threshold be $\theta_w$, the field capacity be $\theta_{fc}$, and the saturated water content be $\theta_s$. When $\theta \le \theta_w$, the dynamic pollution impact degree is a fixed proportion of the static value, and the calculation formula is: $D = S \cdot \alpha$, where $D$ represents the dynamic pollution impact degree, $S$ represents the static pollution impact degree, and $\alpha$ is the wilting humidity conversion coefficient. When $\theta_w < \theta \le \theta_{fc}$, the pollutant migration capacity increases linearly with humidity, and the calculation formula is: $D = S \cdot \left[\alpha + (1-\alpha)\,\beta\,\dfrac{\theta-\theta_w}{\theta_{fc}-\theta_w}\right]$, where $\beta$ is the pollution type humidity influence weighting coefficient. When $\theta > \theta_{fc}$, the pollutant migration capacity continues to increase with increasing humidity, but the marginal effect gradually decreases and eventually stabilizes. The calculation formula is: $D = S \cdot \left[1 - \bigl(1-\alpha-(1-\alpha)\,\beta\bigr)\exp\!\left(-k\,\dfrac{\theta-\theta_{fc}}{\theta_s-\theta_{fc}}\right)\right]$, where $k$ is the saturation segment correction coefficient.

[0030] In one specific embodiment, the dynamic pollution impact was calculated for three typical soil coordinates (core area S=0.98, transition area S=0.52, and far-field area S=0.15) in the loam area surrounding a coking plant. The pollution type was the easily activated heavy metals Cd / Pb. The specific experimental data and implementation process are as follows. The inherent hydrological parameters of the soil were extracted: field capacity $\theta_{fc} = 0.32$ and saturated water content $\theta_s = 0.50$. The preset fixed coefficients were determined based on expert experience: wilting humidity conversion coefficient $\alpha = 0.2$, pollution type humidity influence weighting coefficient $\beta = 0.8$, and saturation segment correction coefficient k=3. The wilting humidity threshold was then calculated as $\theta_w = \theta_{fc} \times \alpha = 0.32 \times 0.2 = 0.064$.

[0031] The soil volumetric moisture content $\theta$ was collected in real time at each coordinate: $\theta = 0.40$ in the core area, $\theta = 0.20$ in the transition zone, and a value at or below the wilting humidity threshold in the far-field area. The moisture content of each coordinate was matched with its static value, hydrological parameters, and fixed coefficients to form three sets of calculation parameters.

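[0032] The moisture content range of each coordinate was determined and the values were substituted into the corresponding formula. Far-field area (moisture content at or below the wilting humidity threshold): D = 0.15 × 0.2 = 0.03. Transition zone (0.064 < 0.20 ≤ 0.32): D = 0.52 × [0.2 + (1 - 0.2) × 0.8 × (0.20 - 0.064) / (0.32 - 0.064)] = 0.28. Core area (0.40 > 0.32): D = 0.98 × [1 - (1 - 0.2 - 0.8 × 0.8) × exp(-3 × (0.40 - 0.32) / (0.50 - 0.32))] = 0.95.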
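The three substitutions above can be reproduced with a short sketch that reads the piecewise formula off the worked arithmetic. The symbol names (`alpha`, `beta`, `theta_fc`, `theta_s`) are assumptions chosen for readability, and the far-field moisture value of 0.05 is a placeholder, since the text only implies that it lies at or below the 0.064 threshold.

```python
import math

def dynamic_impact(S, theta, theta_fc, theta_s, alpha, beta, k):
    """Piecewise dynamic pollution impact degree for one soil coordinate."""
    theta_w = alpha * theta_fc  # wilting humidity threshold
    if theta <= theta_w:
        # Dry range: fixed proportion of the static value.
        return S * alpha
    if theta <= theta_fc:
        # Intermediate range: linear growth with moisture.
        frac = (theta - theta_w) / (theta_fc - theta_w)
        return S * (alpha + (1.0 - alpha) * beta * frac)
    # Wet range: saturating growth above field capacity.
    gap = 1.0 - alpha - (1.0 - alpha) * beta
    return S * (1.0 - gap * math.exp(-k * (theta - theta_fc) / (theta_s - theta_fc)))

params = dict(theta_fc=0.32, theta_s=0.50, alpha=0.2, beta=0.8, k=3.0)
far = dynamic_impact(0.15, 0.05, **params)         # theta = 0.05 is assumed
transition = dynamic_impact(0.52, 0.20, **params)
core = dynamic_impact(0.98, 0.40, **params)
```

Evaluated this way, the far-field and transition values agree with the 0.03 and 0.28 in the text. The core-area expression evaluates to roughly 0.94, slightly below the reported 0.95, so the reported figure may reflect intermediate rounding. The intermediate and wet branches meet at field capacity, which is why the wet branch uses the gap `1 - alpha - (1 - alpha) * beta`.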

[0033] Step S300: Determine the spatial relationship between all soil coordinates within the soil area to obtain soil coordinate pairings with the potential for pollution transmission; for each pairing, calculate the pollution influence coefficient of the source soil coordinates on the target soil coordinates; based on the pollution influence coefficient, construct a soil pollution association graph, setting each soil coordinate within the area to be detected as an independent node in the graph structure and setting attribute values. Specifically, all generated soil coordinates within the soil area are retrieved, along with the spatial attribute data bound to each soil coordinate, including the spatial range of the corresponding soil unit, topographic elevation, soil conductivity characteristics, and static pollution impact level. The driving logic for soil pollution transmission is determined to be the migration of pollutants driven by the flow of water along the topography and soil pores. Based on the driving logic, the determination dimensions for the possibility of pollution transmission are preset, namely spatial proximity, topographic runoff feasibility, soil conductivity adaptability, and pollution load basis. For each determination dimension, corresponding feasibility determination rules are preset, and soil coordinate pairings with the possibility of pollution transmission are selected.

[0034] Furthermore, for each pair of soil coordinates, data on the spatial distance between the source and target coordinates, the relative topographic slope, the permeability characteristics of the intermediate soil layer, and the static pollution impact degree of the source are retrieved. The spatial distance attenuation factor is calculated using an exponential decay function algorithm, with the factor value decreasing as the distance increases. The topographic slope gain factor, soil permeability conduction factor, and source pollution load basis factor are calculated using a linear normalization algorithm, and each parameter is uniformly mapped to a standardized interval of 0 to 1. The weighted arithmetic mean algorithm is used to sum the factors according to preset fixed weights, and the interval truncation constraint algorithm is used to restrict the result to a physically meaningful interval, thus obtaining the pollution impact coefficient of the source soil coordinates on the target soil coordinates.
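The two factor calculations named in this paragraph can be sketched as follows. The text does not give the decay constant or the normalization bounds, so `decay_length_m`, `lower`, and `upper` are hypothetical parameters, and both function names are illustrative.

```python
import math

def distance_factor(distance_m: float, decay_length_m: float) -> float:
    """Exponential distance attenuation: 1.0 at zero distance, falling with distance."""
    return math.exp(-distance_m / decay_length_m)

def min_max(value: float, lower: float, upper: float) -> float:
    """Linear normalization onto [0, 1], clamped at the ends of the interval."""
    scaled = (value - lower) / (upper - lower)
    return min(1.0, max(0.0, scaled))

# Illustrative values only: a 300 m decay length and a 0-5 degree slope range.
near = distance_factor(45.0, 300.0)
far = distance_factor(90.0, 300.0)
slope_factor = min_max(2.0, 0.0, 5.0)
```

A closer pairing always receives the larger distance factor, which is the behaviour the attenuation factor is meant to capture.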

[0035] In one specific embodiment, using the aforementioned 392 soil coordinates around the coking plant, three typical coordinates were selected for the experiment. The specific data and implementation process are as follows. Retrieve coordinates and spatial attributes: core area coordinates A (116.5230°E, 39.8725°N), static S=0.98, elevation 28.5m, soil permeability coefficient 1.2×10⁻⁵ cm / s; adjacent coordinates B (116.5234°E, 39.8721°N), S=0.65, elevation 28.1m; transition zone coordinates C (116.5238°E, 39.8717°N), S=0.52, elevation 27.7m.

[0036] Screening and pairing combinations: According to the judgment dimensions, A and B (distance 45m, A elevation is higher than B, no conduction barrier, A load meets the standard) and B and C (distance 40m, B elevation is higher than C, soil layer is continuous) are valid pairs; A and C (distance 53m, although adjacent, there is a shallow impermeable layer in between) are eliminated, and finally 2 pairs are obtained (A→B, B→C).
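The screening of the three candidate pairs can be illustrated with a simple predicate over the four judgment dimensions. The numeric thresholds (`max_distance_m`, `min_load`) are hypothetical, since the text does not state the feasibility rules in numeric form; the distances, elevations, barrier status, and source loads are taken from the embodiment.

```python
def can_transmit(distance_m, source_elev_m, target_elev_m, barrier, source_load,
                 max_distance_m=60.0, min_load=0.5):
    """Return True when all four judgment dimensions allow transmission."""
    return (distance_m <= max_distance_m          # spatial proximity
            and source_elev_m > target_elev_m     # downhill runoff is feasible
            and not barrier                       # no impermeable layer in between
            and source_load >= min_load)          # source carries enough pollution

candidates = {
    ("A", "B"): can_transmit(45.0, 28.5, 28.1, False, 0.98),
    ("B", "C"): can_transmit(40.0, 28.1, 27.7, False, 0.65),
    ("A", "C"): can_transmit(53.0, 28.5, 27.7, True, 0.98),
}
valid_pairs = [pair for pair, ok in candidates.items() if ok]
```

Under these assumed thresholds the pair A→C is rejected only because of the shallow impermeable layer, leaving A→B and B→C.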

[0037] Calculate the pollution impact coefficient: preset factor weights (distance 0.4, slope 0.2, infiltration 0.2, source load 0.2); A→B: distance factor 0.85, slope factor 0.32, infiltration factor 0.72, source load factor 0.98, the coefficient after weighted summation and truncation is 0.76; B→C: calculated similarly, the coefficient is 0.69.
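The weighted combination for A→B can be checked with a brief sketch. The truncation bounds of 0 and 1 are an assumption, as the text only refers to a physically meaningful interval, and `influence_coefficient` is an illustrative name.

```python
WEIGHTS = {"distance": 0.4, "slope": 0.2, "infiltration": 0.2, "source_load": 0.2}

def influence_coefficient(factors, weights=WEIGHTS, lower=0.0, upper=1.0):
    """Weighted sum of the factors, truncated to [lower, upper]."""
    # The weights sum to 1, so the weighted sum equals the weighted mean.
    total = sum(weights[name] * factors[name] for name in weights)
    return min(upper, max(lower, total))

a_to_b = influence_coefficient(
    {"distance": 0.85, "slope": 0.32, "infiltration": 0.72, "source_load": 0.98}
)
```

A plain weighted sum of the four listed factors gives about 0.744, a little below the reported 0.76, so the reported coefficient may involve rounding of the individual factors or a step that the text does not spell out.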

[0038] Constructing the association graph: Set the 392 coordinates as nodes, and assign the node attributes to the corresponding static (A=0.98, B=0.65, C=0.52) and dynamic pollution impact degree. Construct directed edges between A→B and B→C, and assign edge attributes of 0.76 and 0.69 respectively to complete the construction of the association graph.
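One way to hold such a graph in memory is a pair of dictionaries, one for node attributes and one for directed edge weights. This is only a sketch: the class and field names are hypothetical, and only the three example nodes are registered rather than all 392.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class SoilNode:
    static_impact: float
    dynamic_impact: Optional[float] = None  # set once the dynamic step has run

@dataclass
class PollutionGraph:
    nodes: Dict[str, SoilNode] = field(default_factory=dict)
    edges: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def add_edge(self, source: str, target: str, coefficient: float) -> None:
        """Add a directed edge carrying the pollution impact coefficient."""
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be registered as nodes")
        self.edges[(source, target)] = coefficient

    def incoming(self, target: str) -> List[Tuple[str, float]]:
        """List (source, coefficient) for every edge pointing at target."""
        return [(s, w) for (s, t), w in self.edges.items() if t == target]

graph = PollutionGraph()
graph.nodes["A"] = SoilNode(static_impact=0.98)
graph.nodes["B"] = SoilNode(static_impact=0.65)
graph.nodes["C"] = SoilNode(static_impact=0.52)
graph.add_edge("A", "B", 0.76)
graph.add_edge("B", "C", 0.69)
```

Keeping edges keyed by `(source, target)` makes the direction explicit, which matters because transmission is only considered from the higher coordinate to the lower one.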

[0039] Step S400: Based on the soil pollution correlation map, calculate the pollution status change data corresponding to each soil coordinate under different preset time dimensions.

[0040] Specifically, several time dimensions are preset, and a time step discretization algorithm decomposes each time dimension into equidistant time steps. An attribute value mapping algorithm takes the dynamic pollution impact degree of each node as the initial pollution state value and binds it to the initial time node of the corresponding soil coordinate, forming an initial pollution state matrix. A graph convolution iterative algorithm then iterates in units of time steps: in each step, based on the pollution impact coefficients of the directed edges between nodes, a weighted summation algorithm transmits the pollution state value of each source node to its target node according to the weight, and an exponential decay algorithm corrects for the pollution loss during transmission, yielding the updated pollution state value of the target node at the current time step. After the iterative calculation has traversed all preset time steps, a time dimension aggregation algorithm integrates the node pollution state values of each time step into the corresponding preset time dimension and outputs the pollution state change data corresponding to each soil coordinate under the different preset time dimensions.

[0041] In one specific embodiment, the time dimension and step size are preset: three time dimensions are preset: 1 day, 7 days and 30 days. The time step size discretization algorithm is used to decompose 1 day into 24 equidistant steps (1 step per hour), 7 days into 168 steps, and 30 days into 720 steps.
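The step counts follow directly from an hourly step size, as this minimal sketch shows. The function name `discretize` is illustrative.

```python
def discretize(horizon_days: int, step_hours: int = 1) -> int:
    """Number of equidistant time steps in a horizon given in days."""
    return horizon_days * 24 // step_hours

steps = {days: discretize(days) for days in (1, 7, 30)}
```

This gives 24, 168, and 720 steps for the 1-day, 7-day, and 30-day dimensions respectively.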

[0042] Initial pollution state matrix construction: Using an attribute value mapping algorithm, the dynamic pollution influence degree of nodes A, B, and C (A=0.95, B=0.28, C=0.03) is used as the initial value and bound to the initial time node to form an initial pollution state matrix of 392 nodes.

[0043] Graph convolution iterative calculation: The graph convolution iterative algorithm is adopted. Each step is based on the directed edge coefficients (A→B=0.76, B→C=0.69). The source node state value is transmitted through the weighted summation algorithm, and the loss is corrected by the exponential decay algorithm (decay coefficient 0.98).

[0044] Time-dimensional aggregation and result output: After traversing all step sizes, the time-dimensional aggregation algorithm is used to integrate the results and output the change data in different time dimensions: after 1 day, A=0.94, B=0.90, C=0.26; after 7 days, A=0.92, B=0.93, C=0.65; after 30 days, A=0.90, B=0.94, C=0.88, thus completing the calculation of pollution status changes at each coordinate.
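The iteration and aggregation can be sketched as a loop over directed edges that records a snapshot at the end of each time dimension. The text does not state the exact update rule, so the rule below (target state increased by a small fraction of the decayed, weighted source state, capped at 1) and the `transfer_rate` parameter are assumptions. The sketch illustrates the structure of the computation and is not expected to reproduce the figures reported above.

```python
def simulate(state, edges, horizons, decay=0.98, transfer_rate=0.001):
    """Propagate pollution state along directed edges; snapshot at each horizon."""
    snapshots = {}
    current = dict(state)
    for step in range(1, max(horizons) + 1):
        # Weighted sum of source states arriving at each target, with loss correction.
        inflow = {node: 0.0 for node in current}
        for (source, target), weight in edges.items():
            inflow[target] += weight * decay * current[source]
        # Synchronous update: every node uses the previous step's states.
        current = {node: min(1.0, value + transfer_rate * inflow[node])
                   for node, value in current.items()}
        if step in horizons:
            snapshots[step] = dict(current)
    return snapshots

initial = {"A": 0.95, "B": 0.28, "C": 0.03}
edge_weights = {("A", "B"): 0.76, ("B", "C"): 0.69}
result = simulate(initial, edge_weights, horizons={24, 168, 720})
```

In this simplified rule a node without incoming edges keeps its initial value, while downstream nodes rise over time. A fuller implementation would also need a rule for depleting source nodes, which the text implies but does not specify.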

[0045] As shown in Figure 2, this application provides a machine learning-based soil pollution evolution prediction system, which includes the following modules. The static pollution impact degree calculation module includes: a soil region division unit that delineates the spatial boundaries of the soil region, performs grid division, and generates several soil units, each soil unit corresponding to a unique soil coordinate; a feature set extraction unit that, for each soil coordinate, obtains the static basic data of the soil and extracts the corresponding feature set; and a static pollution impact degree calculation unit that uses the feature set as input to the trained machine learning algorithm model and calculates the static pollution impact degree corresponding to each soil coordinate. The dynamic pollution impact degree calculation module includes a dynamic pollution impact degree calculation unit that collects the soil volumetric moisture content corresponding to each soil coordinate and calculates the dynamic pollution impact degree corresponding to the soil coordinate by combining the corresponding static pollution impact degree. The soil pollution association map construction module includes: a soil coordinate pairing unit that determines the spatial relationship between all soil coordinates within the soil area, obtaining soil coordinate pairing combinations with the possibility of pollution transmission; a pollution impact coefficient calculation unit that calculates the pollution impact coefficient of the source soil coordinates on the target soil coordinates for each soil coordinate pairing combination; and a soil pollution association map construction unit that constructs a soil pollution association map based on the pollution impact coefficient, setting each soil coordinate within the area to be detected as an independent node in the graph structure and setting attribute values. The pollution state change calculation module includes a pollution state change calculation unit that calculates pollution state change data for each soil coordinate under different preset time dimensions based on the soil pollution association map.

[0046] It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments described above, and that the invention can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, the embodiments should be considered in all respects as exemplary and non-limiting, and the scope of the invention is defined by the appended claims rather than the foregoing description. Thus, all variations falling within the meaning and scope of equivalents of the claims are intended to be included within the present invention. No reference numerals in the claims should be construed as limiting the scope of the claims.

Claims

1. A method for predicting the evolution of soil pollution based on machine learning, characterized in that it includes the following steps: delineating the spatial boundary of the soil region and dividing the region into grids to generate several soil units, each corresponding to a unique soil coordinate; for each soil coordinate, obtaining static basic soil data and extracting the corresponding feature set; using the feature set as input to a trained machine learning algorithm model and calculating the static pollution impact degree corresponding to each soil coordinate; collecting the soil volumetric moisture content corresponding to each soil coordinate and calculating the dynamic pollution impact degree corresponding to that soil coordinate by combining the corresponding static pollution impact degree; determining the spatial relationships between all soil coordinates within the soil region to obtain soil coordinate pairings with the possibility of pollution transmission; for each pairing, calculating the pollution influence coefficient of the source soil coordinate on the target soil coordinate; based on the pollution influence coefficient, constructing a soil pollution association graph in which each soil coordinate within the area to be detected is set as an independent node in the graph structure with attribute values set; and, based on the soil pollution association graph, calculating the pollution status change data corresponding to each soil coordinate under different preset time dimensions.

2. The method for predicting the evolution of soil pollution based on machine learning according to claim 1, characterized in that delineating the spatial boundary of the soil region and dividing the region into grids to generate several soil units, each corresponding to a unique soil coordinate, includes: determining the boundary delineation criteria, obtaining the geospatial area of the soil region, and delineating an initial boundary; modifying the initial boundary by combining natural topography, land feature boundaries, and pollution diffusion boundaries to eliminate overlapping and blank areas, forming a non-self-intersecting spatial boundary; setting the division scale and rules of the grid cells and, with the spatial boundary as a constraint, discretizing the soil region within the boundary into soil units; trimming and modifying the soil units that intersect the spatial boundary; and, using a unified geographic coordinate system, extracting spatial feature parameters for each soil unit to generate the corresponding unique soil coordinate.

3. The method for predicting the evolution of soil pollution based on machine learning according to claim 1, characterized in that, For each soil coordinate, static basic soil data is acquired, and the corresponding feature set is extracted, including: The static soil baseline data includes static attribute data of pollution sources, soil inherent property data, regional topography data, and soil spatial correlation attribute data, which are preprocessed. Each type of static soil baseline data corresponds to a feature extraction dimension. For each feature extraction dimension, a corresponding feature extraction rule is preset to determine the extraction logic of the feature items to be extracted under that dimension. For each feature extraction dimension, the corresponding feature items are extracted from the matched static soil baseline data according to the extraction rule corresponding to that dimension, and integrated to form a feature set.

4. The method for predicting the evolution of soil pollution based on machine learning according to claim 1, characterized in that, The process of using the feature set as input to a trained machine learning algorithm model and calculating the static pollution impact level corresponding to each soil coordinate includes: Historical soil sample data matching the soil type, pollution source type, and topographic features of the soil region were collected. Each sample contained a set of feature items with dimensions completely consistent with the feature set of the soil coordinates to be detected, as well as the measured static pollution impact label value of the corresponding sample point. The samples were divided into a training subset and a validation subset. The training subset was used for model iterative fitting, and the validation subset was used for model parameter verification. The machine learning algorithm model selected was the XGBoost regression model. The basic parameters of the tree structure, the number of iterations, the loss function type, and the regularization constraint parameters of the model were preset, and iterative training and model verification were performed. After the verification was passed, the feature set was used as the input of the XGBoost regression model, and the static pollution impact degree corresponding to each soil coordinate was output.

5. The method for predicting the evolution of soil pollution based on machine learning according to claim 1, characterized in that, The process involves collecting the soil volumetric moisture content corresponding to each soil coordinate, combining it with the corresponding static pollution impact level, and calculating the dynamic pollution impact level corresponding to that soil coordinate, including: For a single soil coordinate, inherent hydrological parameters, including field capacity and saturated water content, are retrieved from the static soil baseline data. Based on expert experience, preset fixed coefficients are determined according to the pollution type and soil texture, including wilting humidity conversion coefficient, pollution type humidity influence weighting coefficient, and saturation segment correction coefficient. The static pollution impact level, field capacity, saturated water content, and corresponding preset fixed coefficients corresponding to the coordinate are matched with the soil volumetric water content to form a single set of calculation parameters for the coordinate. The field capacity in the set of calculation parameters is multiplied by the wilting humidity conversion factor in the preset fixed coefficients to obtain the wilting humidity threshold for the soil coordinate. The soil volumetric water content in the set of calculation parameters is compared with the wilting humidity threshold and field capacity to determine the numerical range to which the soil volumetric water content belongs. The numerical ranges are classified as follows: soil volumetric water content is less than or equal to the wilting humidity threshold, soil volumetric water content is greater than the wilting humidity threshold and less than or equal to field capacity, and soil volumetric water content is greater than field capacity. 
Based on the numerical range of soil volumetric moisture content, select the corresponding piecewise mathematical formula, substitute all parameters in the set of calculation parameters into the formula, and obtain the dynamic pollution impact degree corresponding to the soil coordinate.

6. The method for predicting the evolution of soil pollution based on machine learning according to claim 5, characterized in that selecting the corresponding piecewise mathematical formula based on the numerical range of the soil volumetric moisture content, substituting all parameters in the set of calculation parameters into the formula, and obtaining the dynamic pollution impact degree corresponding to the soil coordinate includes: letting the soil volumetric water content be $\theta$, the wilting humidity threshold be $\theta_w$, the field capacity be $\theta_{fc}$, and the saturated water content be $\theta_s$; when $\theta \le \theta_w$, the dynamic pollution impact degree is a fixed proportion of the static value, and the calculation formula is: $D = S \cdot \alpha$, where $D$ represents the dynamic pollution impact degree, $S$ represents the static pollution impact degree, and $\alpha$ is the wilting humidity conversion coefficient; when $\theta_w < \theta \le \theta_{fc}$, the pollutant migration capacity increases linearly with humidity, and the calculation formula is: $D = S \cdot \left[\alpha + (1-\alpha)\,\beta\,\dfrac{\theta-\theta_w}{\theta_{fc}-\theta_w}\right]$, where $\beta$ is the pollution type humidity influence weighting coefficient; when $\theta > \theta_{fc}$, the pollutant migration capacity continues to increase with increasing humidity, but the marginal effect gradually decreases and eventually stabilizes, and the calculation formula is: $D = S \cdot \left[1 - \bigl(1-\alpha-(1-\alpha)\,\beta\bigr)\exp\!\left(-k\,\dfrac{\theta-\theta_{fc}}{\theta_s-\theta_{fc}}\right)\right]$, where $k$ is the saturation segment correction coefficient.

7. The method for predicting the evolution of soil pollution based on machine learning according to claim 1, characterized in that, The process of determining the spatial correlation between all soil coordinates within a soil area to obtain soil coordinate pairings with the potential for pollution transmission includes: Retrieve all generated soil coordinates within the soil area, along with the spatial attribute data bound to each soil coordinate, including the spatial extent of the corresponding soil unit, topographic elevation, soil conductivity characteristics, and static pollution impact level. Determine the driving logic of soil pollution transmission as the migration of pollutants driven by the flow of water along the topography and soil pores. Based on the driving logic, preset judgment dimensions for the possibility of pollution transmission, namely spatial proximity, topographic runoff feasibility, soil conductivity adaptability, and pollution load basis. For each judgment dimension, preset corresponding feasibility judgment rules, and filter out soil coordinate pairings with the possibility of pollution transmission.

8. The method for predicting the evolution of soil pollution based on machine learning according to claim 1, characterized in that calculating, for each pair of soil coordinates, the pollution influence coefficient of the source soil coordinates on the target soil coordinates includes: for each pair of soil coordinates, retrieving data on the spatial distance between the source and target coordinates, the relative topographic slope, the permeability characteristics of the intermediate soil layer, and the static pollution impact degree of the source; calculating the spatial distance attenuation factor using an exponential decay function algorithm, with higher factor values for closer distances; calculating the topographic slope gain factor, soil permeability conduction factor, and source pollution load basis factor using a linear normalization algorithm, with each parameter uniformly mapped to a standardized interval of 0 to 1; and using the weighted arithmetic mean algorithm to sum the factors according to preset fixed weights and the interval truncation constraint algorithm to restrict the result to a physically meaningful interval, thus obtaining the pollution impact coefficient of the source soil coordinates on the target soil coordinates.

9. The method for predicting the evolution of soil pollution based on machine learning according to claim 1, characterized in that calculating, based on the soil pollution correlation map, the pollution status change data corresponding to each soil coordinate under different preset time dimensions includes: presetting several time dimensions and using a time step discretization algorithm to decompose each time dimension into equidistant time steps; using an attribute value mapping algorithm to take the dynamic pollution impact degree of each node as the initial pollution state value and bind it to the initial time node of the corresponding soil coordinate, forming an initial pollution state matrix; using a graph convolution iterative algorithm to iterate in units of time steps, wherein in each step, based on the pollution impact coefficients of the directed edges between nodes, a weighted summation algorithm transmits the pollution state value of the source node to the target node according to the weight, and an exponential decay algorithm corrects for the pollution loss during transmission to obtain the updated pollution state value of the target node at the current time step; and, after the iterative calculation has traversed all preset time steps, using a time dimension aggregation algorithm to integrate the node pollution state values of each time step into the corresponding preset time dimension and output the pollution state change data corresponding to each soil coordinate under the different preset time dimensions.

10. A machine learning-based soil pollution evolution prediction system, using the machine learning-based soil pollution evolution prediction method according to any one of claims 1-9, characterized in that it includes: a static pollution impact degree calculation module, which includes a soil region division unit that delineates the spatial boundaries of the soil region, performs grid division, and generates several soil units, each soil unit corresponding to a unique soil coordinate, a feature set extraction unit that, for each soil coordinate, obtains the static basic data of the soil and extracts the corresponding feature set, and a static pollution impact degree calculation unit that uses the feature set as input to the trained machine learning algorithm model and calculates the static pollution impact degree corresponding to each soil coordinate; a dynamic pollution impact degree calculation module, which includes a dynamic pollution impact degree calculation unit that collects the soil volumetric moisture content corresponding to each soil coordinate and calculates the dynamic pollution impact degree corresponding to the soil coordinate by combining the corresponding static pollution impact degree; a soil pollution association map construction module, which includes a soil coordinate pairing unit that determines the spatial relationship between all soil coordinates within the soil area, obtaining soil coordinate pairing combinations with the possibility of pollution transmission, a pollution impact coefficient calculation unit that calculates the pollution impact coefficient of the source soil coordinates on the target soil coordinates for each soil coordinate pairing combination, and a soil pollution association map construction unit that constructs a soil pollution association map based on the pollution impact coefficient, setting each soil coordinate within the area to be detected as an independent node in the graph structure and setting attribute values; and a pollution state change calculation module, which includes a pollution state change calculation unit that calculates pollution state change data for each soil coordinate under different preset time dimensions based on the soil pollution association map.