A machine learning-based intelligent decision system for soil pollution remediation of spoil heaps
The system constructs a spatiotemporal graph structure for spoil heaps, adds a local curvature adaptive estimation mechanism, and combines both with graph-structure-driven reinforcement learning to address pollution diffusion assessment and remediation optimization for spoil heaps, with the aim of efficient and reliable remediation decisions.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUIZHOU UNIV
- Filing Date
- 2026-05-20
- Publication Date
- 2026-06-16
AI Technical Summary
Existing intelligent technologies are insufficient to effectively characterize the spatiotemporal coupling relationship of pollution diffusion in spoil heaps. Model training relies on fixed optimization strategies and lacks adaptive control, making it difficult to achieve coordinated optimization of the pollution control process and balance resource constraints.
A spatiotemporal graph structure for spoil heaps is constructed, and a local curvature adaptive estimation mechanism and conditional gradient update are introduced. Combined with a graph structure-driven reinforcement learning decision model, closed-loop optimization of pollution status assessment and remediation strategies is achieved.
It improves the spatial consistency of pollution status assessment and the engineering feasibility of remediation strategies, thereby increasing the efficiency and cost-effectiveness of pollution control.
Figure CN122222425A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of environmental information processing technology, and proposes an intelligent decision-making system for soil pollution remediation of spoil heaps based on machine learning.
Background Technology
[0002] With the expansion of mineral resource development, spoil heaps are prone to pollutant migration and diffusion under the influence of rainfall infiltration, surface runoff, and groundwater transport, causing continuous impacts on the regional ecological environment. While existing intelligent technologies have incorporated the Internet of Things and machine learning methods for assessing pollution status and making remediation decisions, certain limitations remain: multi-source monitoring data exhibit strong heterogeneity across spatiotemporal scales, making it difficult for existing methods to effectively characterize the spatiotemporal coupling relationship of pollution diffusion; conventional neural network-based modeling methods lack the ability to express the spatial topology and propagation paths of spoil heaps, affecting the spatial consistency of assessment results; furthermore, model training often relies on fixed optimization strategies, lacking adaptive control over parameter update processes, resulting in lower convergence stability; and in terms of remediation decisions, existing methods lack the ability to collaboratively model multi-stage remediation processes and resource constraints, making it difficult to obtain optimized solutions that balance effectiveness and feasibility. Therefore, it is necessary to construct an intelligent pollution assessment and remediation decision-making method for complex spatiotemporal environments.
Summary of the Invention
[0003] This invention addresses the difficulty of accurately modeling soil pollution in spoil heaps under complex terrain and multi-media coupling, as well as the shortcomings of existing intelligent methods in assessment consistency and collaborative decision optimization. It proposes a machine-learning-based intelligent decision-making system for spoil heap soil pollution remediation. A spatiotemporal graph structure for spoil heaps is constructed that simultaneously characterizes pollution migration paths, risk diffusion relationships, and construction constraints; multi-source heterogeneous monitoring data are uniformly embedded into the graph topology space, giving an integrated representation of pollution status in both the spatial-structure and temporal-evolution dimensions. On this basis, a local curvature adaptive estimation mechanism oriented towards parameter update trajectories is introduced, transforming historical parameter evolution behavior into dynamic smoothness-scale constraints. Combined with conditional gradient updates and a closed-form step-size solution, this coordinates update direction, update amplitude, and curvature response during model training, improving convergence efficiency and training stability without presetting a global Lipschitz constant or running an additional line search. At the remediation decision-making level, the spatiotemporal graph structure is combined with continuous control strategy learning: a graph-driven reinforcement learning decision model is constructed under the constraint of the candidate remediation action space. Through a multi-step return backup mechanism, the pollution control effect, diffusion suppression capability, and resource consumption across decision cycles are jointly modeled, optimizing the remediation strategy over a long time horizon. Pollution status assessment and remediation strategy generation thus form a closed-loop optimization process with consistent structure and interconnected information, improving the precision of pollution control decisions and their engineering feasibility.
[0004] This invention proposes a machine-learning-based intelligent decision-making system for soil pollution remediation of spoil heaps, comprising a spoil heap collaborative monitoring access module, an edge preprocessing module, a site spatiotemporal graph modeling module, a pollution status assessment module, and a candidate remediation strategy generation module;
[0005] The spoil heap collaborative monitoring access module collects spoil heap monitoring data and obtains raw environmental monitoring data packages;
[0006] The edge preprocessing module performs standardized preprocessing and quality assessment on the raw environmental monitoring data packages, and outputs a standardized site basic dataset with unified structure, standardized attributes, and confidence weight annotations.
[0007] The site spatiotemporal graph modeling module discretizes the spoil heap into several graph node objects; extracts spatial unit features corresponding to each graph node object from the standardized site basic dataset, and constructs a node attribute matrix; constructs graph edges and forms an adjacency matrix; and generates a spoil heap spatiotemporal graph structure containing the node attribute matrix and the adjacency matrix, which serves as the spoil heap spatiotemporal graph data output.
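As a rough illustration of the data structure this module outputs, the following Python sketch builds a node attribute matrix and an adjacency matrix from a list of typed edges. The node names, feature values, and edge-type labels are made-up placeholders, and the nested-list storage is an assumption; the source does not specify a storage format.

```python
# Hypothetical toy site: three graph nodes with two features each.
nodes = ["plot_1", "slope_1", "drain_1"]
features = {
    "plot_1": [0.8, 6.5],   # placeholder values, e.g. a normalized Pb level and pH
    "slope_1": [0.3, 7.1],
    "drain_1": [0.1, 7.4],
}
# Directed, typed edges (source, target, edge type); the types are illustrative.
edges = [
    ("plot_1", "slope_1", "runoff"),
    ("slope_1", "drain_1", "runoff"),
    ("plot_1", "drain_1", "seepage"),
]


def build_graph(nodes, features, edges):
    """Return (node attribute matrix X, adjacency matrix A) as nested lists."""
    index = {name: i for i, name in enumerate(nodes)}
    x = [list(features[name]) for name in nodes]
    a = [[0] * len(nodes) for _ in nodes]
    for src, dst, _edge_type in edges:
        a[index[src]][index[dst]] = 1   # edge type is ignored in this flat matrix
    return x, a


X, A = build_graph(nodes, features, edges)
```

A real implementation would probably keep one adjacency matrix per edge type, or a sparse edge list, so that runoff, seepage, and construction relations can be weighted differently.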
[0008] The pollution status assessment module constructs a pollution assessment model based on GraphSAGE and GRU. The spoil heap spatiotemporal graph data are input into the model, which jointly identifies and assesses the pollution type, pollution level, diffusion risk, migration trend, remediation urgency, ecological sensitivity, and construction sensitivity of each graph node, generating an initial pollution risk profile. An adaptive Lipschitz-free conditional gradient optimization mechanism adaptively optimizes the parameter update process of the pollution assessment model, yielding a converged pollution assessment model. Inference is then performed with the converged model to obtain an optimized pollution risk profile. Based on the optimized pollution risk profile, combined with preset risk classification rules, remediation constraints, and resource allocation constraints, pollution status assessment results are generated. Plots are then ranked by risk level, remediation urgency, and resource constraints, and a remediation demand map with priority labels is output.
[0009] The candidate remediation strategy generation module, based on the remediation demand map, calls a preset remediation process knowledge base to generate a set of candidate remediation strategies for different contaminated sites. It then parameterizes the material type, reagent ratio, dosage, construction equipment, construction sequence, construction period, compliance indicators, resource consumption, and cost boundary for each candidate strategy, outputting a candidate remediation action space. Using the spoil heap spatiotemporal graph data as state input, and under the constraints of the candidate remediation action space, it runs a graph-structure-driven continuous control decision model to solve for the strategy, determine the target remediation scheme, and output the site-level process configuration results, construction scheduling results, and phased resource allocation results corresponding to the target remediation scheme. Finally, it displays the output results graphically, triggers alarms, and generates reports, producing a traceable intelligent decision report for spoil heap soil pollution remediation.
[0010] Furthermore, the process of adaptively optimizing the parameter update process of the pollution assessment model using an adaptive Lipschitz-free conditional gradient optimization mechanism to obtain the converged pollution assessment model includes the following steps:
[0011] Step S1: Extract the current batch of training samples from the spatiotemporal map data of the spoil heap according to the preset sampling strategy; drive the pollution assessment model to perform forward inference and loss calculation based on the current batch of training samples to obtain the current round loss function value and obtain the current round model parameter vector as the current parameter point; and simultaneously extract the parameter difference vector, local curvature estimation results and gradient estimation results of each round in the parameter evolution process of the previous several rounds to construct a historical trajectory cache sequence corresponding to the parameter evolution process;
[0012] Step S2: Based on the historical trajectory cache sequence, extract parameter update information for each round to construct the parameter update trajectory; perform recursive cumulative calculation on the norm information of the parameter difference vector for each round to characterize the geometric change intensity of the parameter update trajectory, and calculate the parameter update trend change term based on the difference in parameter update amplitude between adjacent rounds. Joint modeling is performed based on the geometric change intensity and the parameter update trend change term to generate the local curvature estimation variable for the current training round; on this basis, the trajectory geometric change during parameter evolution is explicitly introduced into the smooth constraint modeling process, and a local smooth scale estimation result dynamically matched to the current training round is generated based on the local curvature estimation variable; the local smooth scale estimation result serves as the local smooth scale in the parameter update process of the current round.
[0013] Step S3: Under the constraint of local smooth scale, combining the stochastic gradient information of the current batch of training samples with the gradient estimation results of the previous rounds adjacent to the current round in the historical trajectory cache sequence, perform recursive estimation update of the gradient direction of the current round to obtain the gradient estimation result of the current round; within the parameter constraint domain of the pollution assessment model, construct a conditional gradient linear minimization subproblem with the gradient estimation result of the current round as the linear objective coefficient; according to the linear descent direction represented by the gradient estimation result of the current round, perform linear minimization solution within the parameter constraint domain to determine the target candidate parameter point corresponding to the parameter update of the current round;
[0014] Step S4: After obtaining the current parameter point and the target candidate parameter point, combine the local smooth scale estimation result obtained in Step S2 and the current round gradient estimation result obtained in Step S3 to construct a quadratic upper bound approximation model for the current round parameter update, which characterizes the local change trend of the loss function in the neighborhood of the current parameter point; define the update ratio coefficient of the current parameter point along the direction of the target candidate parameter point as the step size parameter; analytically minimize the quadratic upper bound approximation model with respect to the step size parameter to obtain the closed-form adaptive update step size of the current round;
[0015] Step S5: Based on the closed-form adaptive update step size, drive the current parameter point to perform conditional gradient advancement along the direction of the target candidate parameter point; after the parameter advancement is completed, extract and record the parameter difference vector before and after the update, and write the resulting parameter movement back to the historical trajectory cache sequence;
[0016] Step S6: Repeat the preceding steps (S1 to S5) to perform multiple rounds of iterative updates on the pollution assessment model parameters and output the converged pollution assessment model.
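The loop in steps S1 to S6 can be sketched in plain Python as a conditional gradient (Frank-Wolfe) iteration over an L2-norm ball. This is only a minimal sketch under stated assumptions, not the patented procedure: the source gives no formula for the trajectory-based curvature estimate, so a simple secant ratio of gradient change to parameter change stands in for it, and the names `rho`, `smooth0`, and `adaptive_frank_wolfe` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def lmo_l2_ball(d, radius):
    """Linear minimization over the ball ||s|| <= radius: s = -radius * d / ||d||."""
    n = norm(d)
    return [0.0] * len(d) if n == 0.0 else [-radius * di / n for di in d]


def adaptive_frank_wolfe(grad_fn, x0, radius, rounds=100, rho=1.0, smooth0=1.0):
    x = list(x0)
    d = grad_fn(x)            # running gradient estimate
    prev_x = prev_g = None    # one-step trajectory cache
    smooth = smooth0          # local smoothness scale
    for _ in range(rounds):
        g = grad_fn(x)        # S1: gradient at the current parameter point
        # S2: local smoothness from the trajectory (ASSUMED secant rule).
        if prev_x is not None:
            dx = norm([a - b for a, b in zip(x, prev_x)])
            if dx > 1e-12:
                dg = norm([a - b for a, b in zip(g, prev_g)])
                smooth = max(dg / dx, 1e-8)
        # S3: recursive gradient estimate, then the linear minimization subproblem.
        d = [(1.0 - rho) * di + rho * gi for di, gi in zip(d, g)]
        s = lmo_l2_ball(d, radius)
        direction = [si - xi for si, xi in zip(s, x)]
        gap = -dot(d, direction)
        sq = dot(direction, direction)
        if gap <= 1e-12 or sq == 0.0:
            break
        # S4: minimize -gamma*gap + 0.5*smooth*gamma^2*sq over gamma in [0, 1].
        gamma = min(1.0, gap / (smooth * sq))
        # S5: move toward the candidate point and record the trajectory.
        prev_x, prev_g = x, g
        x = [xi + gamma * di for xi, di in zip(x, direction)]
    return x


# Toy usage: minimize 0.5 * ||x - target||^2 over the unit ball.
target = [3.0, 4.0]
solution = adaptive_frank_wolfe(
    lambda p: [pi - ti for pi, ti in zip(p, target)], [0.0, 0.0], 1.0
)
```

Because every update is a convex combination of feasible points, the iterate stays inside the constraint ball without a projection step, which is the usual reason for choosing a conditional gradient method over projected gradient descent.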
[0017] Furthermore, the graph-structure-driven continuous control decision model includes:
[0018] The graph structure feature encoding submodule uses a graph convolutional network to encode the spatiotemporal graph data of the spoil heap, extracts the topological embedding features corresponding to each plot node and its adjacency relationships, and constructs the state representation input.
[0019] The policy network submodule uses a policy network to generate initial continuous remediation actions for each plot based on the state representation input, and performs policy optimization using a dual critic network.
[0020] The remediation decision optimization submodule, based on a preset reward function, calculates the corresponding reward information by combining the pollution state changes, pollution diffusion changes, resource consumption results, and construction feedback results produced by each plot after executing the initial continuous remediation actions. Under the constraint of the candidate remediation action space, during the parameter update process of the dual critic network, a multi-step tree-backup return propagation mechanism guided by the target policy is introduced to perform recursive cumulative estimation of reward information over multiple decision cycles and construct the multi-step target return. The return error is calculated from the multi-step target return, the dual critic network is updated, and the policy network is then optimized and updated based on the updated dual critic network to obtain the continuous remediation control actions.
[0021] The decision output submodule determines the target remediation scheme based on the continuous remediation control actions.
[0022] Furthermore, under the constraint of the candidate remediation action space, a multi-step tree-backup return propagation mechanism guided by the target policy is introduced during the parameter update process of the dual critic network. This mechanism performs recursive cumulative estimation of reward information over multiple decision cycles to construct the multi-step target return, and specifically includes the following steps:
[0023] Step T1: Obtain the immediate reward information corresponding to the current decision cycle as the current cycle reward, and obtain the reward feedback information corresponding to multiple subsequent decision cycles as the subsequent cycle rewards; simultaneously obtain the state transition sequence corresponding to the multiple subsequent decision cycles, and input the state representation corresponding to each subsequent decision cycle into the target policy network in sequence to generate the corresponding target continuous remediation action; input the state representation corresponding to each subsequent decision cycle together with the target continuous remediation action into the target critic network to obtain the target action value estimate for each subsequent decision cycle, and take the target action value estimate of the final decision cycle as the final target action value;
[0024] Step T2: According to the preset discount factor, cumulatively discount the current cycle reward, the subsequent cycle rewards, and the final target action value in a recursive manner, processing the return of each level layer by layer to build a multi-step tree-backup return chain for the current state-action pair;
[0025] Step T3: Perform recursive discounted aggregation over the reward terms of each level and the final target value term in the multi-step tree-backup return chain to construct the multi-step target return corresponding to the current state-action pair.
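Steps T1 to T3 amount to folding a short sequence of rewards and one bootstrap value into a single discounted target. The sketch below shows that recursion in Python. Taking the minimum of the two critics' final values is an assumption borrowed from common dual-critic practice; the source does not say how the two critics are combined, and the function name is hypothetical.

```python
def multi_step_target(rewards, final_q1, final_q2, discount):
    """Fold rewards r_0..r_{n-1} and a bootstrap value into one discounted target."""
    # ASSUMPTION: pessimistic combination of the two target critics.
    target = min(final_q1, final_q2)
    # Work backwards from the final cycle, discounting one level at a time.
    for reward in reversed(rewards):
        target = reward + discount * target
    return target


# Toy usage: three decision cycles, discount 0.5, critics predicting 4.0 and 6.0.
example = multi_step_target([1.0, 2.0, 3.0], final_q1=4.0, final_q2=6.0, discount=0.5)
```

With these numbers the recursion gives 3 + 0.5 × 4 = 5, then 2 + 0.5 × 5 = 4.5, then 1 + 0.5 × 4.5 = 3.25. The critic's error for the current state-action pair would then be measured against this target.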
[0026] By adopting the above solution, the beneficial effects achieved by the present invention are as follows:
[0027] This invention constructs a spatiotemporal graph structure for spoil heaps that can uniformly represent pollution migration paths, risk diffusion relationships, and construction constraints. This enables integrated modeling of multi-source heterogeneous monitoring data in both the spatial-topology and temporal-evolution dimensions, enhancing the overall characterization and spatial consistency of spoil heap pollution status. It addresses the data fragmentation and insufficient expression of pollution diffusion relationships in existing intelligent methods, and improves how well pollution risk assessment results match the actual site structure and pollution propagation mechanisms. This provides a more realistic and complete data foundation, usable for engineering guidance, for subsequent remediation decisions.
[0028] This invention introduces a local curvature adaptive estimation mechanism based on the parameter update trajectory, and combines conditional gradient updates with a closed-form adaptive step-size solution to dynamically match update direction, update magnitude, and local curvature response during model training. This improves the convergence efficiency and training stability of the pollution assessment model, addresses the reliance of existing models on preset optimization parameters, their large fluctuations during convergence, and their complex parameter tuning, and enhances the robustness and generalization ability of the model under complex non-stationary environmental data. As a result, the pollution status assessment results remain reliable and stable under different working conditions.
[0029] This invention constructs a graph-driven continuous control decision-making model and introduces a multi-step return backup mechanism under the constraint of the candidate remediation action space. This enables cross-cycle coordinated optimization of pollution control effectiveness, diffusion suppression capability, and resource consumption, improving the overall optimality and feasibility of remediation strategies over a long time horizon. It addresses the difficulty existing intelligent decision-making methods have in balancing control effectiveness against resource constraints, and their lack of overall optimization capability. This improves the feasibility and scheduling rationality of spoil heap soil remediation schemes in actual engineering projects, thereby improving pollution control efficiency and reducing overall control costs.
Attached Figure Description
[0030] Figure 1 is a schematic diagram comparing the training loss convergence curves of the pollution assessment model in Embodiment 1 and Embodiment 2 of the present invention. The horizontal axis represents the training rounds and the vertical axis represents the loss function value. The blue solid line is the loss curve when the pollution assessment model is trained using the adaptive Lipschitz-free conditional gradient optimization mechanism in Embodiment 1, the orange dashed line is the loss curve when the model is trained using the SGD optimization mechanism in Embodiment 2, and the dotted line is the preset convergence threshold;
[0031] Figure 2 is a schematic diagram comparing the spatial distribution of pollution risk generated in Embodiment 1 and Embodiment 2 of the present invention. Panel (a) is the pollution risk distribution map output by the pollution assessment model trained with the adaptive Lipschitz-free conditional gradient optimization mechanism in Embodiment 1, and panel (b) is the pollution risk distribution map output by the pollution assessment model trained with the SGD optimization mechanism in Embodiment 2. The horizontal axis represents the horizontal position on the site, and the vertical axis represents the vertical position on the site. Colors from light to dark represent pollution risk scores from low to high. In panel (a), high-risk area A, high-risk area B, and the second-highest-risk area C are relatively independent and clearly delimited. The transition zones between risk areas are narrow and the risk gradient changes are concentrated, so the location of the pollution sources and their spatial range of influence are depicted accurately. In panel (b), there is an obvious risk diffusion zone between high-risk area A and high-risk area B: the area boundaries stretch and merge, and the boundary between the high-risk areas is unclear. In addition, some medium- and low-risk areas show diffusion tailing, which enlarges the apparent risk distribution range.
Detailed Implementation
[0032] The technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort are within the scope of protection of the present invention.
[0033] Embodiment 1. Referring to Figures 1 and 2, this invention proposes a machine-learning-based intelligent decision-making system for soil pollution remediation of spoil heaps, comprising a spoil heap collaborative monitoring access module, an edge preprocessing module, a site spatiotemporal graph modeling module, a pollution status assessment module, and a candidate remediation strategy generation module.
[0034] This embodiment takes the southern area of the spoil heap of the A open-pit mine as the object. The total area of the spoil heap is 18.6 hm², the average elevation is 1268 m, the slope is 8° to 27°, and the area is equipped with drainage ditches, temporary stockpiling areas, leachate collection ponds and downstream agricultural irrigation sensitive areas.
[0035] The collaborative monitoring access module for spoil heaps is deployed at the on-site monitoring layer of the spoil heap, and is set up in the spoil heap platform, slope area, perimeter of retaining works, leachate collection area, drainage ditch, temporary stockpile area and adjacent area of downstream sensitive receptors. It includes soil in-situ monitoring sensor nodes, groundwater monitoring well acquisition terminals, leachate water quality acquisition terminals, meteorological monitoring terminals, UAV remote sensing acquisition units and mobile sampling terminals. It is used to continuously collect heavy metal concentration, organic pollutant concentration, pH value, moisture content, conductivity, redox potential, soil temperature, groundwater depth, leachate water quality parameters, rainfall, wind speed and direction, surface temperature and humidity, topography slope, vegetation coverage, remote sensing spectral characteristics and historical treatment records for each spatial zone of the spoil heap, and output raw environmental monitoring data packages with unified time labels, spatial coordinate labels and sampling source labels.
[0036] In this embodiment, the spoil heap is divided into 12 plot units, 4 slope units, 3 drainage units, 2 seepage-sensitive units and 3 construction units according to topography, pollution distribution, drainage structure and construction accessibility, forming a total of 24 graph node objects.
[0037] The deployment included: 1. Installing 36 in-situ soil monitoring sensor nodes in key areas to monitor parameters such as lead (Pb), cadmium (Cd), arsenic (As), total petroleum hydrocarbons (TPH), pH, moisture content, conductivity, and redox potential; 2. Deploying 6 groundwater monitoring wells downstream to collect data on groundwater depth, Pb concentration, Cd concentration, sulfate concentration, and conductivity; 3. Setting up 3 leachate water quality collection terminals in the leachate collection area to collect COD, suspended solids, Pb, Cd, and total dissolved solids; 4. Setting up 1 meteorological monitoring terminal in the center of the site to collect rainfall, wind speed, wind direction, and air temperature and humidity; 5. Conducting weekly low-altitude surveys using UAV remote sensing units to acquire orthophotos, slope damage information, vegetation coverage, and multispectral reflectance characteristics. In total, 12,384 raw monitoring records were obtained.
[0038] The edge preprocessing module is deployed at the spoil heap control station and communicates with both the spoil heap collaborative monitoring access module and the site spatiotemporal graph modeling module. It performs standardized preprocessing and quality assessment on the raw environmental monitoring data packages. Specifically, it time-aligns multi-source monitoring data with different sampling frequencies against a unified time synchronization benchmark, and spatially registers them using topographic control points, remote sensing positioning results, and monitoring well coordinates. It then applies drift correction, threshold filtering, spatiotemporal interpolation, and duplicate sample merging to handle sensor drift, outliers, missing sampling points, and duplicate sampling data, respectively. On this basis, it constructs data quality assessment indicators from the sampling method, the accuracy level of the sampling equipment, differences in detection methods, and data integrity, quantifies the credibility of the samples from each plot, and assigns corresponding weights. Finally, it outputs a standardized site basic dataset with a unified structure, standardized attributes, and credibility weight annotations.
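Two of the preprocessing operations named above, threshold filtering and interpolation of missing points, can be illustrated with a short Python sketch. It is a simplification under stated assumptions: it interpolates along time only (the source describes spatiotemporal interpolation), the valid range and readings are made-up placeholders, and `clean_series` is a hypothetical helper name.

```python
def clean_series(values, low, high):
    """Treat out-of-range readings as missing, then fill gaps linearly in time."""
    masked = [v if v is not None and low <= v <= high else None for v in values]
    known = [i for i, v in enumerate(masked) if v is not None]
    if not known:
        return masked   # nothing to interpolate from
    filled = list(masked)
    for i, v in enumerate(masked):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = masked[right]    # leading gap: copy the first valid value
        elif right is None:
            filled[i] = masked[left]     # trailing gap: copy the last valid value
        else:
            w = (i - left) / (right - left)
            filled[i] = masked[left] + w * (masked[right] - masked[left])
    return filled


# Toy usage: a pH series with one missing sample and one impossible reading.
ph_readings = [6.8, None, 7.2, 19.5, 7.0]
cleaned = clean_series(ph_readings, low=0.0, high=14.0)
```

In a fuller pipeline, points filled this way would presumably receive a lower credibility weight than directly measured ones, in line with the quality assessment described in the paragraph.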
[0039] The site spatiotemporal graph modeling module is deployed on a cloud computing platform and communicates with the pollution status assessment module. It discretizes the spoil heap into several graph node objects; extracts the spatial unit features corresponding to each graph node object from the standardized site basic dataset to construct a node attribute matrix; constructs graph edges to form an adjacency matrix; and generates a spoil heap spatiotemporal graph structure containing the node attribute matrix and the adjacency matrix, which is output as the spoil heap spatiotemporal graph data;
[0040] In this embodiment, a total of 5 types of graph edges are constructed: 1. 46 slope runoff transmission edges; 2. 18 underground seepage coupling edges; 3. 52 spatial adjacency edges; 4. 27 transportation connectivity edges; 5. 31 construction coordination edges; finally forming a spatiotemporal graph of the spoil heap with an average node degree of 7.25.
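The node and edge counts quoted in this embodiment can be cross-checked with a few lines of Python. The reported average node degree of 7.25 equals the number of edges divided by the number of nodes, which suggests (the source does not say so explicitly) that each edge is counted once, as for directed edges. Counting each edge at both endpoints would give twice that value.

```python
# Counts taken from the embodiment; the dictionary keys are shorthand labels.
node_counts = {
    "plot": 12, "slope": 4, "drainage": 3,
    "seepage_sensitive": 2, "construction": 3,
}
edge_counts = {
    "slope_runoff": 46, "underground_seepage": 18, "spatial_adjacency": 52,
    "transport_connectivity": 27, "construction_coordination": 31,
}

num_nodes = sum(node_counts.values())
num_edges = sum(edge_counts.values())

edges_per_node = num_edges / num_nodes              # each edge counted once
undirected_mean_degree = 2 * num_edges / num_nodes  # each edge counted at both ends
```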
[0041] The pollution status assessment module is deployed on GPU computing nodes of a cloud computing platform. It constructs a pollution assessment model comprising a graph structure feature encoder based on GraphSAGE and a temporal feature modeling unit based on GRU. The spoil heap spatiotemporal graph data are input into this model to jointly identify and assess the pollution type, pollution level, diffusion risk, migration trend, remediation urgency, ecological sensitivity, and construction sensitivity of each plot's graph node, generating an initial pollution risk profile. An adaptive Lipschitz-free conditional gradient optimization mechanism adaptively optimizes the parameter update process of the pollution assessment model, reducing the dependence on a preset global smoothness constant during training and improving convergence efficiency, and yields a converged pollution assessment model. Inference is then performed with the converged model to obtain an optimized pollution risk profile. Based on the optimized pollution risk profile, combined with preset risk classification rules, remediation constraints, and resource allocation constraints, pollution status assessment results are generated. Plots are then ranked by risk level, remediation urgency, and resource constraints, and a remediation demand map with priority labels is output.
[0042] In this embodiment, the pollution assessment model uses GraphSAGE+GRU, wherein: GraphSAGE has 2 layers; the graph embedding dimension is 64; and the GRU hidden layer dimension is 128;
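For readers unfamiliar with GraphSAGE, the sketch below shows only the neighborhood aggregation step of a mean-aggregator layer: each node's features are concatenated with the average of its neighbors' features. It deliberately omits the learned weight matrix and nonlinearity that would map the result to the 64-dimensional embedding, as well as the GRU that follows; the toy features and neighbor lists are placeholders.

```python
def sage_mean_aggregate(features, neighbors):
    """Concatenate each node's features with the mean of its neighbors' features."""
    out = []
    for node, own in enumerate(features):
        nbrs = neighbors[node]
        if nbrs:
            mean = [
                sum(features[j][k] for j in nbrs) / len(nbrs)
                for k in range(len(own))
            ]
        else:
            mean = [0.0] * len(own)   # isolated node: zero neighborhood summary
        out.append(list(own) + mean)
    return out


# Toy usage: three nodes with two features each; node 2 has no neighbors.
features = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
neighbors = {0: [1, 2], 1: [0], 2: []}
hidden = sage_mean_aggregate(features, neighbors)
```

Stacking two such layers, as in this embodiment, lets each node's embedding depend on nodes up to two edges away, for example a plot, the slope it drains onto, and the ditch below that slope.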
[0043] Training settings: Training, validation, and test sets are divided in a 7:2:1 ratio; batch size is 32; maximum training epochs are 180; the parameter constraint domain uses L2 norm constraints.
[0044] Taking plots A3, B2, C1, and D4 as examples, the evaluation results are shown in Table 1:
[0045] Table 1
[0046] ;
[0047] The system classified A3 as a first-level high-risk priority remediation site based on preset classification rules. The main reason is that its Pb and Cd concentrations are significantly exceeded, and it is located at the intersection of the slope runoff edge and the underground seepage edge, which poses a significant risk of downstream migration.
[0048] The candidate remediation strategy generation module communicates with the pollution status assessment module and the site spatiotemporal graph modeling module. Based on the remediation demand map, it calls a preset remediation process knowledge base to generate a set of candidate remediation strategies for different contaminated sites. For each candidate strategy, it parameterizes the material type, reagent ratio, dosage, construction equipment, construction sequence, construction period, compliance indicators, resource consumption, and cost boundary, outputting the candidate remediation action space. Using the spoil heap spatiotemporal graph data as state input, it runs a graph-structure-driven continuous control decision model under the constraints of the candidate remediation action space to solve for the strategy, determine the target remediation scheme, and output the site-level process configuration results, construction scheduling results, and phased resource allocation results corresponding to the target remediation scheme. The output results are displayed graphically, alarms are triggered, and reports are generated, producing a traceable intelligent decision report for spoil heap soil pollution remediation. The candidate remediation strategy set includes soil replacement strategies, stabilization / solidification strategies, chemical leaching strategies, phytoremediation strategies, microbial remediation strategies, isolation and covering strategies, drainage control strategies, and combined composite process strategies.
[0049] Taking the highest priority plot A3 as an example, plot A3 has an area of 1.42 hm², an average contamination depth of 1.0 m, and an estimated contaminated soil volume of 14,200 m³; the system generates the following candidate remediation strategies:
[0050] Candidate Scheme S1: Stabilization / solidification + soil covering. Agents: phosphate stabilizer + cement-based solidifying agent; dosage: 5.5%; soil cover thickness: 0.45 m; construction period: 21 days; estimated cost: RMB 1.68 million; Pb reduction rate: 31%; leaching risk reduction rate: 79%; ecological disturbance level: medium.
[0051] Candidate Scheme S2: Chemical leaching + stabilization end treatment. Leaching agent: EDTA compound system; liquid-to-solid ratio: 2.8:1; tail liquid recovery rate: 91%; construction period: 28 days; estimated cost: RMB 2.46 million; Pb reduction rate: 58%; Cd reduction rate: 61%; secondary waste liquid treatment pressure: high.
[0052] Candidate Scheme S3: Soil replacement + drainage control. Replaced soil volume: 12,600 m³; drainage ditch reinforcement length: 186 m; drainage layer thickness: 0.20 m; construction period: 19 days; estimated cost: RMB 2.92 million; pollution reduction rate: 92%; construction machinery requirements: high; slope disturbance: high.
[0053] Candidate Scheme S4: Stabilization / solidification + phytoremediation. Solidifying agent dosage: 4.2%; remediation plants: mixed sowing of centipede grass and Bermuda grass; vegetation restoration cycle: 90 days; construction period: 16 days; estimated cost: RMB 1.54 million; Pb leaching risk reduction rate: 73%; long-term surface ecological restoration effect: excellent; short-term compliance speed: medium.
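The parameterized candidate schemes can be held in a simple record type. The following is a minimal sketch, not the system's actual data model: the class name `CandidateScheme`, its field names, and the `unit_cost` helper are illustrative assumptions. It also checks the volume arithmetic of paragraph [0049] (1.42 hm² × 10,000 m²/hm² × 1.0 m = 14,200 m³) and derives a cost per cubic metre of estimated contaminated soil, a figure the source does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateScheme:
    """One parameterized candidate scheme (field names are illustrative)."""
    name: str
    process: str
    period_days: int
    cost_rmb: float


# Plot A3 geometry from paragraph [0049]; 1 hm² = 10,000 m².
AREA_HM2 = 1.42
DEPTH_M = 1.0
volume_m3 = round(AREA_HM2 * 10_000 * DEPTH_M)

# Periods and costs are copied from the four candidate schemes listed above.
schemes = [
    CandidateScheme("S1", "stabilization/solidification + soil covering", 21, 1.68e6),
    CandidateScheme("S2", "chemical leaching + stabilization end treatment", 28, 2.46e6),
    CandidateScheme("S3", "soil replacement + drainage control", 19, 2.92e6),
    CandidateScheme("S4", "stabilization/solidification + phytoremediation", 16, 1.54e6),
]


def unit_cost(scheme: CandidateScheme) -> float:
    """Estimated cost per m³ of contaminated soil (derived, not from the source)."""
    return scheme.cost_rmb / volume_m3


cheapest = min(schemes, key=lambda s: s.cost_rmb)
fastest = min(schemes, key=lambda s: s.period_days)
```

Among the four listed schemes, S4 has both the lowest estimated cost and the shortest construction period. This comparison covers only those two listed parameters and does not reproduce the decision model's scoring.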
[0054] For plot A3, the model output results are shown in Table 2:
[0055] Table 2
[0056] (Table 2 is not reproduced in the source text.)
[0057] Therefore, the system selected candidate scheme S4, "stabilization / solidification + phytoremediation", as the target remediation scheme for plot A3.
[0058] Example 2: Referring to Figures 1 and 2, this embodiment differs from Example 1 in that the pollution status assessment module uses a different model parameter optimization mechanism. The adaptive Lipschitz-free conditional gradient optimization mechanism is not used; instead, a stochastic gradient descent (SGD)-based optimization mechanism trains and updates the pollution assessment model. Specifically, the loss function is calculated on the current batch of training samples, the gradients of the loss with respect to the model parameters are computed by backpropagation, and the model parameters are iteratively updated with a preset learning rate, gradually reducing the training error until the preset convergence condition is met and the converged pollution assessment model is obtained.
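The SGD procedure of this example can be illustrated on a toy problem. The sketch below fits a one-variable linear model rather than the pollution assessment model; the function names, the synthetic data, the learning rate, the batch size, and the stopping tolerance are all assumptions made for illustration.

```python
import random


def mse(w, b, xs, ys):
    """Mean squared error of the linear model y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def sgd_fit(xs, ys, lr=0.1, batch_size=4, max_epochs=300, tol=1e-10, seed=0):
    """Mini-batch SGD with a preset learning rate and a loss-based stop rule."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(max_epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Residuals on the current batch (forward pass).
            errs = [w * xs[i] + b - ys[i] for i in batch]
            # Gradients of the batch MSE with respect to w and b.
            grad_w = 2.0 * sum(e * xs[i] for e, i in zip(errs, batch)) / len(batch)
            grad_b = 2.0 * sum(errs) / len(batch)
            # Fixed-learning-rate parameter update.
            w -= lr * grad_w
            b -= lr * grad_b
        # Preset convergence condition: stop once the full loss is small enough.
        if mse(w, b, xs, ys) < tol:
            break
    return w, b


# Synthetic, noise-free data generated from y = 2x + 1 (illustrative only).
xs = [i / 19 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_fit(xs, ys)
```

Unlike the mechanism of Example 1, the step length here is fixed by the preset learning rate `lr` and does not react to the training trajectory.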
[0059] Example 3, based on Example 1, describes the process of acquiring the spatiotemporal map data of the spoil heap in the site spatiotemporal map modeling module, specifically including the following steps:
[0060] Step E1: Based on the preset spatial division rules, the spoil heap is divided into several plot units, slope units, drainage units, seepage-sensitive units and construction units, and each of the above units is used as a graph node object.
[0061] Step E2: Extract pollution concentration characteristics, soil physicochemical characteristics, environmental disturbance characteristics, historical remediation characteristics, and construction accessibility characteristics of the spatial units corresponding to each graph node object from the standardized site basic dataset, and construct node attribute vectors to form a node attribute matrix;
[0062] Step E3: Construct slope runoff transmission edges based on slope elevation difference and flow direction, construct groundwater seepage coupling edges based on groundwater flow direction and permeability coefficient, construct adjacency edges based on spatial adjacency, construct transportation connectivity edges based on road connectivity, and construct construction coordination edges based on construction sequence and equipment coordination requirements, thereby forming a multi-type graph edge set and constructing an adjacency matrix; on this basis, generate a spoil heap spatiotemporal graph structure containing a node attribute matrix and an adjacency matrix, so that the spatiotemporal graph structure can simultaneously represent pollution migration paths, risk diffusion relationships, and construction constraint relationships; output the spoil heap spatiotemporal graph structure as spoil heap spatiotemporal graph data.
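Steps E1 to E3 can be sketched as building a node attribute matrix plus one adjacency matrix per edge type. The code below is a minimal illustration under stated assumptions: the function name, the edge-type keys, the node names, and every feature value are hypothetical placeholders rather than data from the embodiment, and treating runoff and seepage edges as directed is a modelling choice, not something the source specifies.

```python
EDGE_TYPES = ("runoff", "seepage", "adjacency", "transport", "construction")


def build_spoil_heap_graph(node_features, typed_edges):
    """Return (node order, node attribute matrix, adjacency matrix per edge type)."""
    nodes = sorted(node_features)
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    # Step E2: one attribute vector per graph node object.
    attributes = [list(node_features[name]) for name in nodes]
    # Step E3: one n x n adjacency matrix for each edge type.
    adjacency = {t: [[0] * n for _ in range(n)] for t in EDGE_TYPES}
    for edge_type, src, dst, directed in typed_edges:
        i, j = index[src], index[dst]
        adjacency[edge_type][i][j] = 1
        if not directed:
            adjacency[edge_type][j][i] = 1
    return nodes, attributes, adjacency


# Hypothetical units (Step E1) with placeholder feature vectors.
features = {
    "plot_1": [0.8, 0.3, 0.5],
    "plot_2": [0.2, 0.1, 0.9],
    "slope_1": [0.4, 0.6, 0.2],
    "drain_1": [0.1, 0.2, 0.7],
}
edges = [
    ("runoff", "slope_1", "plot_1", True),      # downslope flow, directed
    ("seepage", "plot_1", "drain_1", True),     # groundwater flow, directed
    ("adjacency", "plot_1", "plot_2", False),   # spatial neighbours
    ("transport", "plot_2", "drain_1", False),  # road connectivity
]
nodes, X, A = build_spoil_heap_graph(features, edges)
```

Keeping the edge types in separate matrices lets a downstream encoder treat pollution migration paths and construction constraints differently; summing them would give a single combined adjacency matrix.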
[0063] Example 4, based on Example 3, describes the process of adaptively optimizing the parameter update process of the pollution assessment model using an adaptive Lipschitz-free conditional gradient optimization mechanism to obtain the converged pollution assessment model. The specific steps include:
[0064] Step S1: Extract the current batch of training samples from the spatiotemporal map data of the spoil heap according to the preset sampling strategy; drive the pollution assessment model to perform forward inference and loss calculation based on the current batch of training samples to obtain the current round loss function value and obtain the current round model parameter vector as the current parameter point; and simultaneously extract the parameter difference vector, local curvature estimation results and gradient estimation results of each round in the parameter evolution process of the previous several rounds to construct a historical trajectory cache sequence corresponding to the parameter evolution process, which serves as the basic input for subsequent execution of local smooth scale adaptive estimation, gradient direction recursive correction and conditional gradient update;
[0065] Step S2: Based on the historical trajectory cache sequence, extract parameter update information for each round to construct the parameter update trajectory; perform recursive cumulative calculation on the norm information of the parameter difference vector for each round to characterize the geometric change intensity of the parameter update trajectory, and calculate the parameter update trend change term based on the difference in parameter update amplitude between adjacent rounds. Based on the geometric change intensity and the parameter update trend change term, perform joint modeling to generate the local curvature estimation variable under the current training round; on this basis, explicitly introduce the trajectory geometric change in the parameter evolution process into the smooth constraint modeling process, and generate the local smooth scale estimation result that dynamically matches the current training round based on the local curvature estimation variable; the local smooth scale estimation result serves as the local smooth scale in the parameter update process of the current round; where the local smooth scale no longer uses the preset global Lipschitz constant, but is adaptively generated through continuous feedback of historical parameter update behavior, so that the curvature constraint in the parameter update process can be adjusted in real time with the change of the training trajectory, thereby constructing a closed-loop adaptive optimization mechanism consisting of "parameter update trajectory - local curvature response - smooth scale adjustment";
[0066] The local curvature estimation variable is generated by the following recursive relationship:
[0067] (The formula is not reproduced in the source text.)
[0068] where the symbols denote, in order: the local curvature estimation variable corresponding to the current training iteration; the initial scale parameter; the index of the historical iteration round; the square of the local curvature estimation variable for the current iteration; the model parameter vector at a given training iteration, i.e., the parameter state of the pollution assessment model after that round's parameter update; the parameter update difference vector between adjacent rounds; the norm of a round's parameter update difference vector; the square of a round's parameter update magnitude; and the trend change adjustment coefficient;
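Because the recursion is not legible in this text, the following sketch shows only one hypothetical form that is consistent with the listed quantities: an initial scale, an accumulated sum of squared update norms, and a trend term weighted by an adjustment coefficient. The exact combination, the use of an absolute value, and the names `local_curvature`, `l0`, and `lam` are assumptions, not the patented formula.

```python
import math


def sq_norm(vec):
    """Squared Euclidean norm of a parameter difference vector."""
    return sum(c * c for c in vec)


def local_curvature(deltas, l0=1.0, lam=0.5):
    """Hypothetical local curvature estimate from a history of update vectors.

    deltas: parameter difference vectors of past rounds, oldest first.
    l0:     initial scale parameter.
    lam:    trend change adjustment coefficient.
    """
    # Accumulated squared update norms (geometric change intensity).
    intensity = sum(sq_norm(d) for d in deltas)
    # Trend term: change in squared update magnitude between the two latest rounds.
    trend = 0.0
    if len(deltas) >= 2:
        trend = abs(sq_norm(deltas[-1]) - sq_norm(deltas[-2]))
    return math.sqrt(l0 * l0 + intensity + lam * trend)


history = [[3.0, 4.0], [0.0, 1.0]]
estimate = local_curvature(history, l0=1.0, lam=0.5)
```

With this form the estimate starts at `l0` when no history exists and grows as updates accumulate, so a step size that is inversely proportional to it shrinks over training without any preset global Lipschitz constant.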
[0069] Step S3: Under the constraint of local smooth scale, combining the stochastic gradient information of the current batch of training samples with the gradient estimation results of the previous rounds adjacent to the current round in the historical trajectory cache sequence, perform recursive estimation update of the gradient direction of the current round to obtain the gradient estimation result of the current round; within the parameter constraint domain of the pollution assessment model, construct a conditional gradient linear minimization subproblem with the gradient estimation result of the current round as the linear objective coefficient; according to the linear descent direction represented by the gradient estimation result of the current round, perform linear minimization solution within the parameter constraint domain to determine the target candidate parameter point corresponding to the parameter update of the current round;
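Step S3 has two parts: a recursive gradient estimate and a linear minimization over the constraint domain. A minimal sketch follows, assuming an exponential-averaging recursion with mixing weight `rho` and an L2-ball constraint of radius `radius`; the source specifies neither the recursion nor the shape of the parameter constraint domain, so both are illustrative choices.

```python
import math


def update_gradient_estimate(prev_estimate, stochastic_grad, rho=0.5):
    """Blend the previous gradient estimate with the current batch gradient."""
    return [(1.0 - rho) * d + rho * g
            for d, g in zip(prev_estimate, stochastic_grad)]


def lmo_l2_ball(grad_estimate, radius=1.0):
    """Linear minimization oracle: argmin of <grad_estimate, v> over ||v|| <= radius."""
    norm = math.sqrt(sum(g * g for g in grad_estimate))
    if norm == 0.0:
        # Every feasible point minimizes a zero linear objective; return the centre.
        return [0.0] * len(grad_estimate)
    # The minimizer lies on the boundary, opposite to the gradient estimate.
    return [-radius * g / norm for g in grad_estimate]


d = update_gradient_estimate([0.0, 0.0], [3.0, 4.0], rho=0.5)
v = lmo_l2_ball(d, radius=5.0)
```

The oracle returns a feasible point directly, which is why conditional gradient methods need no projection step. For other domains, such as an L1 ball or a box, only `lmo_l2_ball` would need to change.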
[0070] Step S4: After obtaining the current parameter point and the target candidate parameter point, combine the local smooth scale estimation result obtained in Step S2 and the current round gradient estimation result obtained in Step S3 to construct a quadratic upper bound approximation model for parameter update in the current round, which is used to characterize the local change trend of the loss function in the neighborhood of the current parameter; define the update ratio coefficient of the current parameter point along the direction of the target candidate parameter point as the step size parameter; perform analytical minimization on the step size parameter based on the quadratic upper bound approximation model to obtain the closed adaptive update step size of the current round; wherein, the closed adaptive update step size is jointly determined by the local smooth scale estimation result, the directional difference between the target candidate parameter point and the current parameter point, and the loss reduction potential of the gradient estimation result in this direction, so that the step size selection process is freed from the dependence on manual empirical parameters or extra line search process, and transformed into an adaptive adjustment method driven by the current training state, thereby realizing the dynamic matching between the parameter update amplitude and the local curvature structure;
[0071] The closed-form adaptive update step size is obtained by analytically minimizing the constructed quadratic upper bound approximation model, and its calculation expression is:
[0072] (The formula is not reproduced in the source text.)
[0073] where the symbols denote, in order: the closed-form adaptive update step size corresponding to the current round's parameter update; the upper bound parameter for the step size; the truncation of the analytically obtained candidate step size at that upper bound; the gradient estimation result corresponding to the current training round; the current parameter point corresponding to the current training round; the target candidate parameter point corresponding to the current training round; the direction difference vector between the current parameter point and the target candidate parameter point; the adjustment coefficient for the loss difference; the loss function; the difference between the loss function values at the current parameter point and at the target candidate parameter point; the squared norm of the direction difference between the current parameter point and the target candidate parameter point; and the gradient stabilization adjustment coefficient;
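The idea behind a closed-form step can be shown with the standard quadratic upper bound used in conditional gradient methods. Minimizing f(x) + η⟨g, v − x⟩ + (L/2)η²‖v − x‖² over η gives η = ⟨g, x − v⟩ / (L‖x − v‖²), which is then truncated at an upper bound. The sketch below implements only this textbook form; it omits the loss-difference term that the patent's expression includes, and the small constant `eps` is a numerical guard added here, not necessarily the source's gradient stabilization coefficient.

```python
def closed_form_step(grad_estimate, x, v, smooth_scale, eta_max=1.0, eps=1e-12):
    """Step size minimizing a quadratic upper bound along the direction v - x."""
    # Direction difference vector between current point and candidate point.
    diff = [xi - vi for xi, vi in zip(x, v)]
    # Loss reduction potential of the gradient estimate along this direction.
    gap = sum(g * d for g, d in zip(grad_estimate, diff))
    # Curvature-weighted squared distance; eps avoids division by zero.
    denom = smooth_scale * sum(d * d for d in diff) + eps
    # Truncate to [0, eta_max] so the update stays a convex combination.
    return max(0.0, min(eta_max, gap / denom))


eta = closed_form_step([1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], smooth_scale=1.0)
eta_clipped = closed_form_step([1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], smooth_scale=0.1)
```

A larger local smooth scale yields a smaller step, so the update amplitude follows the estimated curvature without a line search.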
[0074] Step S5: Based on the closed-form adaptive update step size, drive the current parameter point to perform a conditional gradient advance toward the target candidate parameter point. After the advance is completed, extract and record the parameter difference vector before and after the update, and write the resulting parameter movement back to the historical trajectory cache sequence so that local smooth scale recursive estimation, gradient estimation correction, and step size adaptive adjustment can continue in the next round of training. This forms a closed-loop linkage between parameter trajectory evolution, local curvature estimation, gradient direction correction, and update step size control.
[0075] Step S6: Repeat steps S1 to S5 to perform multiple rounds of iterative updates on the pollution assessment model parameters; during each iteration, continuously adjust the local smooth scale adaptively based on the historical parameter trajectory, dynamically correct the update direction based on the recursive gradient estimation results, and solve the current-round update step size in closed form based on the quadratic upper bound approximation; when the decrease in the loss function, the parameter update magnitude, and the site-specific metrics meet the preset convergence criteria, stop the parameter iteration and output the converged pollution assessment model, thereby completing the collaborative optimization of the parameter update path, update intensity, and convergence stability of the pollution assessment model without the need to preset a global Lipschitz constant, perform complex projection operations, or perform an additional line search process.
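Steps S1 to S6 can be put together in a compact loop. The following is a sketch under several assumptions rather than the patented procedure: the constraint domain is an L2 ball, the smooth scale is the square root of an initial scale squared plus accumulated squared update norms, the gradient estimate is an exponential average, the loop runs for a fixed number of rounds instead of testing convergence criteria, and a deterministic gradient of a toy quadratic stands in for the mini-batch gradient of the assessment model.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def adaptive_conditional_gradient(grad_fn, x0, radius=1.0, l0=1.0, rho=0.5,
                                  eta_max=1.0, iters=50, eps=1e-12):
    """Conditional gradient loop with a history-driven smoothness scale."""
    x = list(x0)            # must start inside the ball ||x|| <= radius
    d = grad_fn(x)          # initial gradient estimate
    sum_sq = 0.0            # accumulated squared update norms (history cache)
    for _ in range(iters):
        # S2: local smooth scale from the update history (assumed form).
        smooth = math.sqrt(l0 * l0 + sum_sq)
        # S3: recursive gradient estimate, then linear minimization on the ball.
        g = grad_fn(x)
        d = [(1.0 - rho) * di + rho * gi for di, gi in zip(d, g)]
        norm = math.sqrt(dot(d, d))
        v = [-radius * di / norm for di in d] if norm > 0.0 else list(x)
        # S4: closed-form step from the quadratic upper bound, truncated.
        diff = [xi - vi for xi, vi in zip(x, v)]
        eta = dot(d, diff) / (smooth * dot(diff, diff) + eps)
        eta = max(0.0, min(eta_max, eta))
        # S5: conditional gradient move, then write the update back to history.
        new_x = [xi + eta * (vi - xi) for xi, vi in zip(x, v)]
        sum_sq += sum((a - b) ** 2 for a, b in zip(new_x, x))
        x = new_x
    return x


# Toy objective f(x) = 0.5 * ||x - c||^2 with c outside the unit ball.
c = [2.0, 0.0]
loss = lambda x: 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
grad = lambda x: [xi - ci for xi, ci in zip(x, c)]
x_final = adaptive_conditional_gradient(grad, [0.0, 0.0])
```

Each iterate is a convex combination of feasible points, so it stays inside the ball without projection. On this toy problem the loop reaches the constrained minimizer (1, 0) after the first step and remains there.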
[0076] Example 5, based on Example 4, describes a graph-structure-driven continuous control decision model that includes:
[0077] The graph structure feature encoding submodule uses a graph convolutional network to encode the spatiotemporal graph data of the spoil heap, extracts the topological embedding features corresponding to the nodes and their adjacency relationships of each plot, and constructs the state representation input.
[0078] The policy network submodule generates initial continuous repair actions for each plot based on the state representation input using a policy network, and performs policy optimization using a dual critic network.
[0079] The remediation decision optimization module, based on a preset reward function, calculates the corresponding reward information by combining the pollution state changes, pollution diffusion changes, resource consumption results, and construction feedback results produced by each plot after executing the initial continuous remediation actions. This reward information characterizes the comprehensive feedback of the current continuous remediation actions in terms of pollution reduction, diffusion suppression, resource consumption, and construction safety. Under the constraint of the candidate remediation action space, a multi-step tree backpropagation benefit propagation mechanism guided by the target policy is introduced into the parameter update process of the dual-critic network. This mechanism performs recursive cumulative estimation of the reward information over multiple decision cycles to construct the multi-step target benefit. The benefit error is calculated from the multi-step target benefit and used to update the dual-critic network, and the updated dual-critic network then serves as the basis for optimizing and updating the policy network to obtain the continuous remediation control actions. The reward function includes a remediation effect reward, a pollution diffusion inhibition reward, a cost penalty, a construction period penalty, an ecological disturbance penalty, and a construction risk penalty.
Among them, the remediation effect reward is used to characterize the positive benefits generated by the decrease in pollution indicators and the improvement of compliance rate; the pollution diffusion inhibition reward is used to characterize the positive benefits generated by the reduction of pollution migration risk; the cost penalty is used to characterize the consumption of materials, equipment and human resources; the construction period penalty is used to characterize the negative impact of the remediation cycle exceeding the target window; the ecological disturbance penalty is used to characterize the adverse impact on the surrounding ecological environment; and the construction risk penalty is used to characterize the safety risks caused by slope operation, chemical application and equipment operation.
[0080] The decision output submodule determines the target repair scheme based on the continuous repair control actions, and outputs the plot-level process configuration results, construction scheduling results, and phased resource allocation results corresponding to the target repair scheme.
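The six-term reward described above can be sketched as a weighted sum of two gains minus four penalties. The linear form, the field names, the assumption that each term is already normalized, and every weight value are illustrative; the source names the terms but does not give their functional form or weights.

```python
from dataclasses import dataclass


@dataclass
class StepFeedback:
    """Per-cycle feedback for one plot; all fields assumed normalized to [0, 1]."""
    pollution_reduction: float   # drop in pollution indicators / compliance gain
    diffusion_reduction: float   # drop in pollution migration risk
    cost: float                  # materials, equipment and labour consumed
    schedule_overrun: float      # amount by which the target window is exceeded
    eco_disturbance: float       # impact on the surrounding ecology
    construction_risk: float     # slope work, chemical application, equipment


# Hypothetical weights; a real system would tune or learn these.
WEIGHTS = {
    "effect": 1.0, "diffusion": 0.5,
    "cost": 0.2, "period": 0.1, "eco": 0.1, "risk": 0.3,
}


def reward(fb: StepFeedback, w=WEIGHTS) -> float:
    """Weighted rewards minus weighted penalties for one decision cycle."""
    gains = (w["effect"] * fb.pollution_reduction
             + w["diffusion"] * fb.diffusion_reduction)
    penalties = (w["cost"] * fb.cost
                 + w["period"] * fb.schedule_overrun
                 + w["eco"] * fb.eco_disturbance
                 + w["risk"] * fb.construction_risk)
    return gains - penalties


r = reward(StepFeedback(0.6, 0.4, 0.5, 0.0, 0.2, 0.1))
```

Raising the `risk` or `eco` weights steers the policy toward gentler schemes at the expense of remediation speed, which is the trade-off the penalty terms are meant to encode.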
[0081] Example 6, based on Example 5, introduces a multi-step tree backpropagation benefit propagation mechanism guided by the target policy during parameter updates of the dual-critic network under the constraint of the candidate repair action space. This mechanism performs recursive cumulative estimation of reward information across multiple decision cycles to construct a multi-step target benefit. Specifically, it includes the following steps:
[0082] Step T1: Obtain the immediate reward information corresponding to the current decision cycle as the current cycle reward, and obtain the reward feedback information corresponding to multiple subsequent decision cycles as the subsequent cycle rewards; simultaneously obtain the state transition sequence corresponding to multiple subsequent decision cycles, and input the state representation corresponding to each subsequent decision cycle into the target policy network to generate the corresponding target continuous repair action; input the state representation corresponding to each subsequent decision cycle and the target continuous repair action into the target critic network to obtain the target action value estimation result corresponding to each subsequent decision cycle, and extract the target action value estimation result corresponding to the final decision cycle as the final target action value; the target policy network is obtained by updating the policy network through parameter delay; the target critic network is obtained by updating the dual critic network through parameter delay.
[0083] Step T2: According to the preset discount factor, the current period reward, subsequent period rewards and the value of the final target action are cumulatively discounted in a recursive manner, and the revenue of each level is processed layer by layer to construct a multi-step tree backpropagation revenue chain for the current state-action pair; wherein, the multi-step tree backpropagation revenue chain is used to characterize the cumulative impact of the current continuous repair action on the pollution control effect, pollution diffusion inhibition effect and resource consumption results in multiple decision cycles;
[0084] Step T3: Perform recursive discount aggregation on the reward items at each level and the final target value item in the multi-step tree backpropagation revenue chain to construct the multi-step target revenue corresponding to the current state-action pair; wherein, the multi-step target revenue is different from the local target value constructed based on the single-step instant reward, and is used to characterize the overall revenue level of continuous repair actions in the long-term governance process.
[0085] The multi-step target benefit can be expressed as:
[0086] (The formula is not reproduced in the source text.)
[0087] where the symbols denote, in order: the multi-step target benefit constructed for the current state-action pair starting from the current decision cycle, which characterizes the cumulative benefit of the current continuous repair action over multiple subsequent decision cycles and serves as the target supervision signal in the parameter update process of the dual-critic network; the level index in the multi-step recursive accumulation process; the discount weight corresponding to the future reward item at each level; the reward information corresponding to each decision cycle, which in this invention is determined by the pollutant concentration change results, pollution diffusion change results, resource consumption results, construction feedback results, ecological disturbance results, and construction risk results within the corresponding cycle, and characterizes the comprehensive governance benefit generated by the continuous remediation actions within that cycle; the overall discount weight corresponding to the final target action value item; the target critic network index; the operation of taking the minimum among the outputs of the two target critic networks; the target action value function estimate output by a given target critic network, which characterizes the expected cumulative benefit of the corresponding action in the subsequent governance process under the given state and target continuous repair action; the state representation corresponding to a given decision cycle; and the target continuous repair action generated by the target policy network for the state of that decision cycle.
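Steps T1 to T3 describe a discounted multi-step return that bootstraps from the smaller of two target critic outputs. A minimal sketch follows, assuming discount weights of the form γ^k and a deterministic target policy, in which case the target reduces to r_t + γ·r_{t+1} + … + γ^(n−1)·r_{t+n−1} + γ^n·min(Q1, Q2). Any additional weighting specific to the patent's tree backpropagation mechanism is not modelled, and the function and argument names are hypothetical.

```python
def multi_step_target(rewards, q1_final, q2_final, gamma=0.99):
    """Discounted n-step target with a clipped double-critic bootstrap.

    rewards:  rewards of the current and subsequent decision cycles, in order.
    q1_final: first target critic's value at the final state and target action.
    q2_final: second target critic's value at the same state and action.
    """
    # Final target action value: minimum over the two target critics.
    target = min(q1_final, q2_final)
    # Recursive discount aggregation, from the last level back to the first.
    for r in reversed(rewards):
        target = r + gamma * target
    return target


g = multi_step_target([1.0, 2.0, 3.0], q1_final=10.0, q2_final=8.0, gamma=0.5)
```

With the inputs shown, the backward recursion gives 3 + 0.5·8 = 7, then 2 + 0.5·7 = 5.5, then 1 + 0.5·5.5 = 3.75. Taking the minimum of the two critics counters value overestimation, and using several reward levels lets the effect of a remediation action on later cycles enter the supervision signal directly.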
[0088] The present invention and its embodiments have been described above. This description is not restrictive. The accompanying drawings are only one embodiment of the present invention, and the actual structure is not limited thereto. In short, if a person skilled in the art is inspired by this description and designs a similar structure and embodiment without departing from the spirit of the present invention, such design should fall within the protection scope of the present invention.
Claims
1. A machine learning-based intelligent decision-making system for soil pollution remediation of spoil heaps, characterized in that it comprises: a spoil heap collaborative monitoring access module, which collects spoil heap monitoring data and obtains raw environmental monitoring data packets; an edge preprocessing module, which preprocesses the raw environmental monitoring data packets and outputs a standardized site basic dataset; a site spatiotemporal graph modeling module, which discretizes the spoil heap into graph node objects, extracts features of the spatial units corresponding to each graph node object from the standardized site basic dataset and constructs a node attribute matrix, constructs an adjacency matrix, generates the spatiotemporal map structure of the spoil heap, and outputs it as the spatiotemporal map data of the spoil heap; a pollution status assessment module, which constructs a pollution assessment model; inputs the spatiotemporal map data of the spoil heap into the pollution assessment model and jointly identifies and assesses the pollution type, pollution degree, diffusion risk, migration trend, remediation urgency, ecological sensitivity and construction sensitivity of each plot map node, generating an initial pollution risk profile; adaptively optimizes the parameter update process of the pollution assessment model using an adaptive Lipschitz-free conditional gradient optimization mechanism to obtain a converged pollution assessment model; performs inference calculations based on the converged pollution assessment model to obtain an optimized pollution risk profile; generates pollution status assessment results based on the optimized pollution risk profile, combined with preset risk classification rules, remediation constraints and resource allocation constraints; and outputs a remediation needs map with priority labels according to the risk level, remediation urgency and resource constraints of each plot; and a candidate repair strategy generation module, which constructs a graph-driven continuous control decision model; calls a preset repair process knowledge base based on the remediation needs map to generate a set of candidate repair strategies, expresses each candidate strategy in a parameterized manner, and outputs the candidate repair action space; and, using the spatiotemporal map data of the spoil heap as the state input, runs the graph-driven continuous control decision model under the constraint of the candidate repair action space to solve for a strategy and determine the target repair scheme.
2. The intelligent decision-making system for soil pollution remediation of spoil heaps based on machine learning according to claim 1, characterized in that: The process of acquiring spatiotemporal map data for spoil heaps includes the following steps: Step E1: Based on the preset spatial division rules, the spoil heap is divided into several plot units, slope units, drainage units, seepage-sensitive units and construction units, and each of the above units is used as a graph node object. Step E2: Extract pollution concentration characteristics, soil physicochemical characteristics, environmental disturbance characteristics, historical remediation characteristics, and construction accessibility characteristics of the spatial units corresponding to each graph node object from the standardized site basic dataset, and construct node attribute vectors to form a node attribute matrix; Step E3: Construct slope runoff transmission edges based on slope elevation difference and flow direction, construct underground seepage coupling edges based on groundwater flow direction and permeability coefficient, construct adjacency edges based on spatial adjacency, construct transportation connectivity edges based on road connectivity, and construct construction coordination edges based on construction sequence and equipment coordination requirements, thereby forming a multi-type graph edge set and constructing an adjacency matrix; on this basis, generate a spoil heap spatiotemporal graph structure containing a node attribute matrix and an adjacency matrix; output the spoil heap spatiotemporal graph structure as spoil heap spatiotemporal graph data.
3. The intelligent decision-making system for soil pollution remediation of spoil heaps based on machine learning as described in claim 1, characterized in that: The pollution assessment model includes a graph structure feature encoder based on GraphSAGE and a time-series feature modeling unit based on GRU.
4. The intelligent decision-making system for soil pollution remediation of spoil heaps based on machine learning according to claim 1, characterized in that: the process of adaptively optimizing the parameter update process of the pollution assessment model using an adaptive Lipschitz-free conditional gradient optimization mechanism to obtain the converged pollution assessment model includes the following steps: Step S1: Extract the current batch of training samples from the spatiotemporal map data of the spoil heap according to the preset sampling strategy, drive the pollution assessment model to perform forward inference and loss calculation, obtain the current round loss function value, and obtain the current round model parameter vector as the current parameter point; extract the parameter difference vector, local curvature estimation result and gradient estimation result of each round in the parameter evolution process, and construct the historical trajectory cache sequence corresponding to the parameter evolution process; Step S2: Based on the historical trajectory cache sequence, extract parameter update information for each round to construct the parameter update trajectory; perform recursive cumulative calculation on the norm information of the parameter difference vector for each round to characterize the geometric change intensity of the parameter update trajectory; calculate the parameter update trend change term based on the difference in parameter update amplitude between adjacent rounds, and perform joint modeling based on the geometric change intensity and the parameter update trend change term to generate the local curvature estimation variable under the current training round; on this basis, explicitly introduce the trajectory geometric change in the parameter evolution process into the smooth constraint modeling process, and generate the local smooth scale estimation result that dynamically matches the current training round based on the local curvature estimation variable; the local smooth scale estimation result serves as the local smooth scale in the parameter update process of the current round; Step S3: Under the constraint of the local smooth scale, combining the stochastic gradient information of the current batch of training samples with the gradient estimation results of the previous rounds adjacent to the current round in the historical trajectory cache sequence, perform recursive estimation update of the gradient direction of the current round to obtain the gradient estimation result of the current round; within the parameter constraint domain of the pollution assessment model, construct a conditional gradient linear minimization subproblem with the gradient estimation result of the current round as the linear objective coefficient; according to the linear descent direction represented by the gradient estimation result of the current round, perform linear minimization solution within the parameter constraint domain to determine the target candidate parameter point corresponding to the parameter update of the current round; Step S4: After obtaining the current parameter point and the target candidate parameter point, construct a quadratic upper bound approximation model by combining the local smooth scale estimation result and the current round gradient estimation result; define the step size parameter; perform analytical minimization on the step size parameter based on the quadratic upper bound approximation model to obtain the closed adaptive update step size; Step S5: Based on the closed adaptive update step size, drive the current parameter point to perform conditional gradient advancement along the direction of the target candidate parameter point, obtain the parameter movement result and write it back to the historical trajectory cache sequence; Step S6: Repeat steps S1 to S5 to perform multiple rounds of iterative updates on the pollution assessment model parameters and output the converged pollution assessment model.
5. The intelligent decision-making system for soil pollution remediation of spoil heaps based on machine learning according to claim 4, characterized in that: The quadratic upper bound approximation model is used to characterize the local variation trend of the loss function in the neighborhood of the current parameters.
6. The intelligent decision-making system for soil pollution remediation of spoil heaps based on machine learning according to claim 4, characterized in that: The update ratio coefficient of the current parameter point along the direction of the target candidate parameter point is defined as the step size parameter.
7. The intelligent decision-making system for soil pollution remediation of spoil heaps based on machine learning according to claim 1, characterized in that: Graph-structure-driven continuous control decision models include: The graph structure feature encoding submodule uses a graph convolutional network to encode the spatiotemporal graph data of the spoil heap, extracts the topological embedding features corresponding to the nodes and their adjacency relationships of each plot, and constructs the state representation input. The policy network submodule generates initial continuous repair actions for each plot based on the state representation input using a policy network, and performs policy optimization using a dual critic network. The repair decision optimization module, based on a preset reward function, executes initial continuous repair actions and calculates the corresponding reward information. Under the constraint of the candidate repair action space, during the parameter update process of the dual critic network, a multi-step tree backpropagation benefit propagation mechanism guided by the target policy is introduced to perform recursive cumulative estimation of reward information in multiple decision cycles to construct multi-step target benefits. Based on the multi-step target benefits, the benefit error is calculated, the dual critic network is updated, and continuous repair control actions are obtained. The decision output submodule determines the target repair scheme based on the continuous repair control actions.
8. The intelligent decision-making system for soil pollution remediation of spoil heaps based on machine learning according to claim 7, characterized in that: The process of constructing a multi-step target return includes the following steps: Step T1: Obtain the current period reward, subsequent period rewards, and the value of the final target action; Step T2: According to the preset discount factor, the current period reward, subsequent period rewards and the value of the final target action are cumulatively discounted in a recursive manner, and the income of each level is processed layer by layer to build a multi-step tree back-transmission income chain; Step T3: Perform recursive discount aggregation on each level of reward items and the final target value item in the multi-step tree backpropagation revenue chain to construct the multi-step target revenue corresponding to the current state-action pair.
9. The intelligent decision-making system for soil pollution remediation of spoil heaps based on machine learning according to claim 7, characterized in that: The reward function includes a reward for remediation effectiveness, a reward for pollution diffusion inhibition, a penalty for cost, a penalty for construction period, a penalty for ecological disturbance, and a penalty for construction risk.