Water quality dynamic regulation method based on water quality change and ecological function recovery

By constructing a three-dimensional spatiotemporal ground state matrix of the water environment and a reinforcement learning decision model, the method quantifies the release flux of endogenous pollutants and optimizes the operation of water body control equipment. This addresses the inability of existing technologies to accurately quantify the release patterns of endogenous pollutants and enables low-energy, high-efficiency water quality control.

CN122243082A · Pending · Publication Date: 2026-06-19 · POWERCHINA CHONGQING ENG CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
POWERCHINA CHONGQING ENG CO LTD
Filing Date
2026-03-20
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies for water environment management in deep reservoirs or lakes fail to accurately quantify the nonlinear release patterns of endogenous pollutants, resulting in control strategies that cannot adaptively match equipment operating parameters. This can easily lead to problems such as excessive aeration, high energy consumption, or substandard water quality.

Method used

A three-dimensional spatiotemporal ground state matrix of the water environment is constructed. The evolution of water stratification and dissolved oxygen at the bottom layer are calculated using a physical information neural network. Dynamic control commands are generated by combining a reinforcement learning decision model. The release flux of endogenous pollutants is quantified through an interface mass transfer kinetic model to optimize equipment energy consumption and water quality compliance.

Benefits of technology

It enables precise quantification of endogenous pollution release, reduces the power consumption of treatment projects, avoids the risk of water quality deterioration, and improves the safety and efficiency of control strategies.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention relates to the field of water environment management and smart water technology, and particularly to a dynamic water quality control method based on water quality changes and ecological function restoration, comprising the following steps: S1, collecting water quality and hydrological parameters at various depth levels of the target water body to construct a three-dimensional spatiotemporal ground state matrix of the water environment; S2, inputting the three-dimensional spatiotemporal ground state matrix of the water environment into a physical information neural network, and performing calculations using the hydrodynamic equations and pollution release kinetic equations embedded in the physical information neural network. In this invention, an interfacial mass transfer kinetic model combining water body stratification stability and anoxic duration is constructed to quantify the release flux of endogenous pollutants in the sediment, and this release flux is fused with the dissolved oxygen state of the bottom layer and input into a reinforcement learning decision model. Then, under the premise of ensuring that the bottom water quality meets standards, dynamic control commands that take into account equipment energy consumption are generated, improving the problem of over-aeration caused by inaccurate pollution source estimation in traditional water quality control.

Description

Technical Field

[0001] This invention relates to the field of water environment management and smart water technology, and in particular to a method for dynamic water quality control based on water quality changes and ecological function restoration.

Background Technology

[0002] In water environment management projects for deep reservoirs or lakes, the release of endogenous pollutants from bottom sediments is a key factor leading to eutrophication and water quality deterioration. Mechanical equipment such as bottom aerators and underwater flow generators are typically used to replenish dissolved oxygen in the bottom water, thereby inhibiting the release of pollutants such as nitrogen and phosphorus by altering the redox potential at the sediment-water interface. This process involves complex hydrothermal stratification and dissolved oxygen biochemical consumption mechanisms. Accurately assessing the intensity of endogenous release and adjusting equipment operation accordingly is crucial for ensuring treatment effectiveness and controlling costs.

[0003] Existing technologies typically overlook the dynamic driving effect of water body thermodynamic stratification stability and the duration of bottom anoxic conditions on sediment release flux. They often rely solely on static monitoring concentrations or fixed empirical coefficients to estimate pollution loads, making it difficult to accurately quantify the nonlinear release patterns of endogenous pollutants. As a result, control strategies fail to adaptively match optimal equipment operating parameters to real-world environmental changes, which easily leads to high energy consumption due to blind over-aeration or to water quality non-compliance caused by delayed responses to release peaks.

Summary of the Invention

[0004] To overcome the above shortcomings, this invention provides a dynamic water quality control method based on water quality changes and ecological function restoration, aiming to improve the problem that existing technologies rely on static monitoring concentrations or fixed empirical coefficients to estimate pollution loads, making it difficult to accurately quantify the nonlinear release patterns of endogenous pollution.

[0005] This invention provides the following technical solution: a method for dynamic water quality control based on water quality changes and ecological function restoration, comprising the following steps:
S1. Collect water quality and hydrological parameters at various water depths of the target water body and construct a three-dimensional spatiotemporal ground state matrix of the water environment.
S2. Input the three-dimensional water environment spatiotemporal ground state matrix into the physical information neural network, perform calculations using the hydrodynamic equation and pollution release kinetic equation embedded in the physical information neural network, and output the water body stratification evolution sequence and the bottom dissolved oxygen prediction sequence.
S3. Extract the stratification stability parameter from the water body stratification evolution sequence, combine it with the anoxic duration parameter from the bottom dissolved oxygen prediction sequence, calculate the endogenous pollution release rate of the sediment under the stratification stability parameter and the anoxic duration parameter, and generate the endogenous pollution release flux sequence.
S4. Fuse the endogenous pollution release flux sequence and the bottom dissolved oxygen prediction sequence to construct environmental state features, and input them into the reinforcement learning decision model. Obtain the energy consumption parameters and the bottom dissolved oxygen target threshold of the water body control equipment corresponding to the target water body. Construct the reward function of the reinforcement learning decision model based on the energy consumption parameters and the bottom dissolved oxygen target threshold, perform the calculation, and output the initial dynamic control instruction set of the water body control equipment.
S5. Simulate the initial dynamic control instruction set with a pre-built hydrodynamic simulation model, and extract the over-limit instructions that cause the convective mixing gradient between the upper and lower water layers to exceed the stratification instability critical value of the target water body during the simulation.
S6. Remove the over-limit instructions from the initial dynamic control instruction set, and interpolate based on the instruction parameters of adjacent time steps to output the target dynamic control instruction set.

[0006] Preferably, in step S1, constructing the three-dimensional water environment spatiotemporal ground state matrix specifically includes the following steps: Outlier removal is performed on the collected water quality parameters and hydrological parameters to obtain a set of environmental parameters; The environmental parameter set is spatiotemporally aligned using spatial interpolation and time series smoothing algorithms to generate spatiotemporal grid data. The spatiotemporal grid data is reconstructed using tensors according to the depth dimension, latitude and longitude dimension, and time dimension to generate a three-dimensional spatiotemporal ground state matrix of the water environment.

[0007] Preferably, in step S2, the calculation using the hydrodynamic equations and pollution release kinetic equations embedded in the physical information neural network to output the water body stratification evolution sequence and the bottom dissolved oxygen prediction sequence specifically includes the following steps: The three-dimensional water environment spatiotemporal ground state matrix is used as the initial and boundary condition and input into the physical information neural network. The momentum conservation residual of the hydrodynamic equation and the mass conservation residual of the pollution release kinetic equation are used as the physical loss function of the physical information neural network. The gradient descent algorithm is used to update the weight parameters of the physical information neural network based on the physical loss function, and the water body stratification evolution sequence and bottom dissolved oxygen prediction sequence are output.

[0008] Preferably, using the momentum conservation residual of the hydrodynamic equation and the mass conservation residual of the pollution release kinetic equation as the physical loss function of the physical information neural network specifically includes the following steps: Extract the predicted spatiotemporal field data output from the hidden layer of the physical information neural network; Calculate the partial derivative matrix of the predicted spatiotemporal field data with respect to spatial coordinates and time steps; Input the partial derivative matrix into the discretized hydrodynamic equation and pollution release kinetic equation to obtain the initial residuals of momentum conservation and mass conservation; Calculate the network backpropagation gradient variance of the initial momentum conservation residual and the network backpropagation gradient variance of the initial mass conservation residual; Configure the inverse of each network backpropagation gradient variance as a penalty weight; Weight and fuse the initial residuals of momentum conservation and mass conservation using the penalty weights to generate the physical loss function.

[0009] Preferably, in step S3, calculating the endogenous pollution release rate of the sediment under the stratification stability parameter and the anoxic duration parameter, and generating the endogenous pollution release flux sequence, specifically includes the following steps: The vertical density gradient in the water body stratification evolution sequence is extracted, and the Schmidt stability corresponding to the vertical density gradient is calculated as the stratification stability parameter. The continuous time segments in the bottom dissolved oxygen prediction sequence during which the dissolved oxygen concentration is below the anoxic critical concentration are counted and used as the anoxic duration parameter. The stratification stability parameter and the anoxic duration parameter are input into the interface mass transfer kinetic model for flux integration, generating the endogenous pollution release flux sequence.
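As an illustration of the anoxic duration parameter described above, the following minimal Python sketch tracks how long the bottom dissolved oxygen has stayed continuously below a critical concentration. The function name `anoxic_durations`, the 2.0 mg/L critical value, and the one-hour time step are assumptions chosen for the example; the patent text does not specify them, and it may count the continuous segments differently.

```python
from typing import List, Sequence


def anoxic_durations(do_series: Sequence[float],
                     critical: float,
                     dt_hours: float = 1.0) -> List[float]:
    """Return, for every time step, the length in hours of the continuous
    anoxic segment that ends at that step (0.0 if the step is not anoxic)."""
    durations: List[float] = []
    run = 0.0
    for do in do_series:
        # Extend the current segment while DO stays below the critical value,
        # otherwise reset it.
        run = run + dt_hours if do < critical else 0.0
        durations.append(run)
    return durations


# Hypothetical bottom-layer DO predictions in mg/L, one value per hour.
bottom_do = [3.1, 1.8, 1.5, 1.9, 2.4, 1.2]
print(anoxic_durations(bottom_do, critical=2.0))
# [0.0, 1.0, 2.0, 3.0, 0.0, 1.0]
```

The resulting sequence could then be passed, together with a stratification stability value, to whatever interface mass transfer kinetic model is used.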

[0010] Preferably, in step S4, constructing the reward function of the reinforcement learning decision model based on the energy consumption parameters and the bottom dissolved oxygen target threshold, performing the calculation, and outputting the initial dynamic control instruction set of the water body control equipment specifically includes the following steps: The energy consumption parameters are configured as the energy consumption penalty term in the reward function of the reinforcement learning decision model, and the bottom dissolved oxygen target threshold is configured as the dissolved oxygen state reward term in the reward function. The policy parameters are updated based on the reward function and the environmental state features using a proximal policy optimization algorithm. The updated policy parameters are used to generate a continuous action sequence, which is then converted into the initial dynamic control instruction set of the water body control equipment.

[0011] Preferably, configuring the energy consumption parameters as the energy consumption penalty term in the reward function of the reinforcement learning decision model, and configuring the bottom dissolved oxygen target threshold as the dissolved oxygen state reward term in the reward function, specifically includes the following steps: Extract the predicted dissolved oxygen concentration corresponding to the bottom dissolved oxygen prediction sequence from the environmental state features; Calculate the concentration difference between the predicted dissolved oxygen concentration and the bottom dissolved oxygen target threshold; Input the concentration difference into a piecewise truncation mapping function: when the concentration difference is less than zero, a first state reward value that is linearly mapped to the concentration difference is output; when the concentration difference is greater than or equal to zero, a constant reward value is output. The first state reward value and the constant reward value are combined to form the dissolved oxygen state reward term; Parse the energy consumption parameters and extract the equipment operating power sequence and the equipment start/stop frequency sequence; The operating power sequence and the start/stop frequency sequence are normalized, weighted, and summed to form the energy consumption penalty term.
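The reward structure described above can be sketched in a few lines of Python. This is only an illustrative reading of the text: the slope and constant of the piecewise mapping, the normalization by rated maxima, the equal weights, and all function names (`do_state_reward`, `energy_penalty`, `step_reward`) are assumptions, not values given in the patent.

```python
from typing import Sequence


def do_state_reward(predicted_do: float, target_do: float,
                    slope: float = 1.0, constant_reward: float = 1.0) -> float:
    """Piecewise truncation mapping: linear in the shortfall below the target,
    constant once the target threshold is met or exceeded."""
    diff = predicted_do - target_do
    if diff < 0:
        return slope * diff  # negative value, proportional to the shortfall
    return constant_reward


def energy_penalty(power_seq: Sequence[float], switch_seq: Sequence[float],
                   max_power: float, max_switches: float,
                   w_power: float = 0.5, w_switch: float = 0.5) -> float:
    """Normalize the power and start/stop sequences by assumed rated maxima,
    then combine their means as a weighted sum."""
    mean_power = sum(p / max_power for p in power_seq) / len(power_seq)
    mean_switch = sum(s / max_switches for s in switch_seq) / len(switch_seq)
    return w_power * mean_power + w_switch * mean_switch


def step_reward(predicted_do: float, target_do: float,
                power_seq: Sequence[float], switch_seq: Sequence[float],
                max_power: float, max_switches: float) -> float:
    """Total reward: dissolved oxygen state reward minus energy penalty."""
    return (do_state_reward(predicted_do, target_do)
            - energy_penalty(power_seq, switch_seq, max_power, max_switches))


# Hypothetical values: DO 0.5 mg/L short of target, half-loaded equipment.
print(step_reward(1.5, 2.0, [5.0, 10.0], [0, 2], max_power=10.0, max_switches=4))
```

Capping the reward at a constant above the threshold is what removes any incentive to keep aerating once the target is met, which matches the stated goal of avoiding over-aeration.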

[0012] Preferably, in step S5, extracting the over-limit instructions that cause the convective mixing gradient between the upper and lower water layers to exceed the stratification instability critical value of the target water body during the simulation specifically includes the following steps: The initial dynamic control instruction set is input into the hydrodynamic simulation model and solved time step by time step, and the three-dimensional flow field simulation matrix is output. The ratio of flow velocity shear to density gradient at vertically adjacent grid nodes in the three-dimensional flow field simulation matrix is calculated to obtain the convective mixing gradient between the upper and lower water layers; The convective mixing gradient is compared with the stratification instability critical value corresponding to the target water body, and the instructions at the time steps in which the convective mixing gradient exceeds this critical value are extracted as over-limit instructions.
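A minimal sketch of this safety check is given below, assuming a single vertical column per time step. The text only states that the convective mixing gradient is a ratio of velocity shear to density gradient at vertically adjacent nodes; taking the maximum ratio over the column, the small `eps` guard, and the names `mixing_gradient` and `over_limit_steps` are assumptions made for illustration.

```python
from typing import List, Sequence


def mixing_gradient(velocity: Sequence[float], density: Sequence[float],
                    dz: float, eps: float = 1e-12) -> float:
    """Largest ratio of velocity shear to density gradient between
    vertically adjacent nodes of one column (assumed aggregation)."""
    ratios = []
    for k in range(len(velocity) - 1):
        shear = abs(velocity[k + 1] - velocity[k]) / dz
        density_gradient = abs(density[k + 1] - density[k]) / dz
        ratios.append(shear / (density_gradient + eps))  # eps avoids division by zero
    return max(ratios)


def over_limit_steps(velocity_profiles: Sequence[Sequence[float]],
                     density_profiles: Sequence[Sequence[float]],
                     dz: float, critical: float) -> List[int]:
    """Indices of time steps whose mixing gradient exceeds the critical value."""
    return [t for t, (vel, rho) in enumerate(zip(velocity_profiles, density_profiles))
            if mixing_gradient(vel, rho, dz) > critical]


# Hypothetical simulated profiles (surface to bottom) for two time steps.
velocities = [[0.02, 0.01, 0.01], [0.30, 0.05, 0.01]]   # m/s
densities = [[998.0, 998.5, 999.0], [998.0, 998.5, 999.0]]  # kg/m^3
print(over_limit_steps(velocities, densities, dz=1.0, critical=0.1))
# [1]
```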

[0013] Preferably, in step S6, the interpolation of instruction parameters based on adjacent time steps to output the target dynamic control instruction set specifically includes the following steps: Determine the time breakpoint position corresponding to the over-limit instruction in the initial dynamic control instruction set; Extract the first instruction parameters of the time step preceding the time breakpoint and the second instruction parameters of the subsequent time step; The first instruction parameter and the second instruction parameter are numerically fitted using a polynomial interpolation algorithm to generate replacement parameters, which are then written into the time breakpoint position of the initial dynamic control instruction set to generate the target dynamic control instruction set.
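The replacement step can be illustrated with a short Python sketch. With only two neighboring parameters available, polynomial interpolation reduces to a first-degree polynomial (a straight line), which is what the example uses. Treating each instruction as a single scalar, searching for the nearest non-flagged neighbors when several consecutive steps are flagged, and the name `repair_commands` are assumptions of this sketch.

```python
from typing import Iterable, List, Sequence


def repair_commands(commands: Sequence[float],
                    over_limit: Iterable[int]) -> List[float]:
    """Replace flagged time steps with values interpolated from the nearest
    valid instruction before and after each time breakpoint."""
    flagged = set(over_limit)
    repaired = list(commands)
    n = len(commands)
    for t in sorted(flagged):
        prev_t = t - 1
        while prev_t >= 0 and prev_t in flagged:
            prev_t -= 1
        next_t = t + 1
        while next_t < n and next_t in flagged:
            next_t += 1
        if prev_t < 0 and next_t >= n:
            raise ValueError("no valid instruction left to interpolate from")
        if prev_t < 0:            # breakpoint at the start: copy the next valid value
            repaired[t] = commands[next_t]
        elif next_t >= n:         # breakpoint at the end: copy the previous valid value
            repaired[t] = commands[prev_t]
        else:                     # first-degree polynomial through both neighbors
            frac = (t - prev_t) / (next_t - prev_t)
            repaired[t] = commands[prev_t] + frac * (commands[next_t] - commands[prev_t])
    return repaired


# Hypothetical aerator power commands in kW; step 2 was flagged as over-limit.
print(repair_commands([2.0, 4.0, 9.0, 8.0], over_limit=[2]))
# [2.0, 4.0, 6.0, 8.0]
```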

[0014] The present invention has the following beneficial effects: 1. In this invention, an interface mass transfer kinetic model combining water body stratification stability and anoxic duration is constructed to quantify the release flux of endogenous pollutants from the sediment. This release flux is then fused with the bottom dissolved oxygen state and input into a reinforcement learning decision model. While ensuring that the bottom water quality meets the standards, dynamic control instructions that take equipment energy consumption into account are generated, thereby mitigating the over-aeration caused by inaccurate estimation of pollution sources in traditional water quality control.

[0015] 2. In this invention, by configuring a reinforcement learning architecture based on a proximal policy optimization algorithm and constructing a multi-objective evaluation function that integrates dissolved oxygen state rewards with equipment start/stop frequency and power penalties, the decision model is guided to autonomously learn the balance between water quality improvement and energy consumption. This reduces frequent start-up and shutdown of the control equipment while water quality standards are maintained and lowers the overall power consumption of the treatment project.

[0016] 3. In this invention, a control instruction safety verification mechanism based on hydrodynamic simulation is introduced to identify and eliminate over-limit actions that could lead to water body stratification instability by calculating the ratio of flow velocity shear to density gradient. A polynomial interpolation algorithm is then used to smoothly reconstruct the resulting gaps in the action sequence, thereby avoiding the risk of water quality deterioration caused by excessive mechanical mixing that disrupts the natural stratification of the water body and improving the safety of the control strategy when executed in an actual water body.

Attached Figure Description

[0017] Figure 1 is a flowchart of the dynamic water quality control method based on water quality changes and ecological function restoration proposed in this invention.

Detailed Implementation

[0018] The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0019] This invention provides a method for dynamic water quality control based on water quality changes and ecological function restoration. As shown in Figure 1, it includes the following steps: S1. Collect water quality and hydrological parameters at various water depths of the target water body and construct a three-dimensional spatiotemporal ground state matrix of the water environment. Furthermore, in step S1, constructing the three-dimensional spatiotemporal ground state matrix of the water environment specifically includes the following steps: Outlier removal is performed on the collected water quality and hydrological parameters to obtain a set of environmental parameters. The environmental parameter set is spatiotemporally aligned using spatial interpolation and time series smoothing algorithms to generate spatiotemporal grid data. The spatiotemporal grid data undergoes tensor reconstruction along the depth, latitude and longitude, and time dimensions to generate a three-dimensional spatiotemporal ground state matrix of the water environment.

[0020] Specifically, water quality and hydrological parameters are collected at various depth levels in the target water body. Raw observation data are obtained through multi-source sensor nodes deployed at different depths in the target water body. The water quality parameters include dissolved oxygen and total phosphorus concentrations at each depth level, while the hydrological parameters include water flow velocity and water temperature data at the corresponding depth levels.

[0021] Outlier removal is performed on the collected water quality and hydrological parameters to obtain a set of environmental parameters. Because underwater sensors are prone to sediment fouling and signal drift, which produce outliers, the data are cleaned using the Laida (3-sigma) criterion. Taking dissolved oxygen concentration as an example, let $x_i$ denote the $i$-th data point in the dissolved oxygen concentration observation sequence at a given depth level. The mean $\bar{x}$ and standard deviation $s$ of this sequence are calculated, and the dispersion of each data point relative to the mean is expressed by the absolute deviation coefficient $\delta_i$, quantified as follows:

$$\delta_i = \frac{\lvert x_i - \bar{x} \rvert}{s}$$

The decision threshold is set to three. If $\delta_i > 3$, the corresponding $x_i$ is identified as abnormal sensor noise and deleted. This procedure is applied to the water quality and hydrological parameter sequences at all acquisition levels to obtain the set of environmental parameters with physical outliers eliminated.
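The 3-sigma cleaning rule can be sketched in Python as follows. The use of the population standard deviation, the function name `remove_outliers`, and the sample readings are assumptions for illustration; the text does not say whether the sample or population form is intended.

```python
import statistics
from typing import List, Sequence


def remove_outliers(values: Sequence[float], threshold: float = 3.0) -> List[float]:
    """Drop points whose absolute deviation from the mean exceeds
    `threshold` standard deviations (Laida / 3-sigma criterion)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    if std == 0:
        return list(values)          # all readings identical, nothing to remove
    return [x for x in values if abs(x - mean) / std <= threshold]


# Hypothetical dissolved oxygen readings (mg/L) with one sensor spike.
readings = [8.0] * 20 + [50.0]
cleaned = remove_outliers(readings)
print(len(readings), len(cleaned))  # 21 20
```

Note that the rule needs a reasonably long sequence to work: with very few samples, a single spike inflates the standard deviation so much that its own deviation coefficient can never reach three.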

[0022] Spatiotemporal grid data is generated by spatiotemporally aligning the environmental parameter set using spatial interpolation and time series smoothing algorithms. Since the sensor deployment points in the water body are discrete and the sampling intervals of the devices are inconsistent, the spatial dimension must be filled first. Taking dissolved oxygen concentration in the environmental parameter set as an example, let the spatial coordinates of the blank grid node to be solved be $(x_0, y_0, z_0)$ and the dissolved oxygen concentration to be predicted at this node be $\hat{c}_0$. The total number of known sensor nodes around it that participate in the calculation is $n$, the valid dissolved oxygen concentration collected at known sensor node $i$ is $c_i$, the spatial Euclidean distance between node $i$ and the node to be solved is $d_i$, and the distance decay weighting exponent is $\beta$, usually preset to two. The spatial interpolation formula is as follows:

$$\hat{c}_0 = \frac{\sum_{i=1}^{n} c_i \, d_i^{-\beta}}{\sum_{i=1}^{n} d_i^{-\beta}}$$

After the global spatial grid has been filled, an exponential smoothing algorithm is used to resample the unaligned time series. The initial smoothed value is set equal to the original spatial interpolation value of the first time step at the current position. Under a uniform time step size, let the smoothed value at the current time step be $S_t$, the smoothed value at the previous time step be $S_{t-1}$, the original physical value obtained through spatial interpolation at the current time step be $X_t$, and the time smoothing coefficient be $\alpha$, with $0 < \alpha < 1$. The time series smoothing formula is as follows:

$$S_t = \alpha X_t + (1 - \alpha) S_{t-1}$$

This two-dimensional (spatial and temporal) alignment transforms discrete parameter observations into continuous spatiotemporal grid data on a preset three-dimensional grid and a fixed time series.
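Both alignment steps are standard techniques, inverse distance weighting in space and simple exponential smoothing in time, and can be sketched with the Python standard library alone. The function names `idw` and `exp_smooth`, the sensor coordinates, and the smoothing coefficient of 0.5 are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def idw(target: Point, nodes: Sequence[Point], values: Sequence[float],
        power: float = 2.0) -> float:
    """Inverse distance weighted estimate at `target` from known sensor nodes."""
    numerator = 0.0
    denominator = 0.0
    for node, value in zip(nodes, values):
        distance = math.dist(target, node)
        if distance == 0:
            return value  # target coincides with a sensor node
        weight = distance ** -power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def exp_smooth(series: Sequence[float], alpha: float) -> List[float]:
    """Simple exponential smoothing, initialized with the first raw value."""
    smoothed = [series[0]]
    for raw in series[1:]:
        smoothed.append(alpha * raw + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical sensors at 1 m and 2 m from the blank grid node.
estimate = idw((0.0, 0.0, 0.0), [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)], [4.0, 8.0])
print(round(estimate, 3))
print(exp_smooth([2.0, 4.0, 4.0], alpha=0.5))  # [2.0, 3.0, 3.5]
```

With a decay exponent of two, the nearer sensor receives four times the weight of the one twice as far away, so the estimate stays close to the nearer reading.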

[0023] A three-dimensional spatiotemporal ground state matrix of the water environment is generated by tensor reconstruction of the spatiotemporal grid data along the depth, latitude and longitude, and time dimensions. Different water quality and hydrological parameters are mapped to the feature channel dimension, and a tensor structure covering the time axis, depth level, longitude axis, latitude axis, and feature channels is established to generate the three-dimensional spatiotemporal ground state matrix $\mathcal{M}$ of the water environment. Its mathematical expression is as follows:

$$\mathcal{M} \in \mathbb{R}^{N_t \times N_z \times N_x \times N_y \times N_c}$$

In the formula, $\mathbb{R}$ represents the set of real numbers; $N_t$ represents the total number of time steps after resampling; $N_z$ represents the total number of depth levels that vertically divide the water body; $N_x$ and $N_y$ represent the total numbers of nodes along the two horizontal directions of the latitude and longitude grid; and $N_c$ represents the total number of feature channels, including dissolved oxygen concentration, total phosphorus concentration, water flow velocity, and water temperature.
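To make the tensor layout concrete, the sketch below stacks four per-parameter grids into a nested-list structure whose last axis is the feature channel. The axis order (time, depth, two horizontal axes, channel), the channel names, and the helper functions are assumptions for illustration; an array library would typically do the same thing with a single stacking call along the last axis.

```python
from typing import Dict, List, Tuple

CHANNELS = ("dissolved_oxygen", "total_phosphorus", "flow_velocity", "water_temperature")


def constant_grid(value: float, n_t: int, n_z: int, n_x: int, n_y: int) -> List:
    """Build a grid indexed [t][z][x][y] filled with one value (demo data only)."""
    return [[[[value for _ in range(n_y)] for _ in range(n_x)]
             for _ in range(n_z)] for _ in range(n_t)]


def stack_channels(grids: Dict[str, List]) -> List:
    """Combine per-parameter grids into a tensor indexed [t][z][x][y][channel]."""
    first = grids[CHANNELS[0]]
    n_t, n_z = len(first), len(first[0])
    n_x, n_y = len(first[0][0]), len(first[0][0][0])
    return [[[[[grids[name][t][z][x][y] for name in CHANNELS]
               for y in range(n_y)]
              for x in range(n_x)]
             for z in range(n_z)]
            for t in range(n_t)]


def shape(tensor: List) -> Tuple[int, ...]:
    """Report the size of each nesting level of a regular nested list."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


grids = {name: constant_grid(float(i), 2, 3, 4, 5) for i, name in enumerate(CHANNELS)}
ground_state = stack_channels(grids)
print(shape(ground_state))  # (2, 3, 4, 5, 4)
```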

[0024] By converting discrete raw monitoring data into structured multidimensional tensors, standardized data is provided for subsequent model derivation.

[0025] S2. Input the three-dimensional water environment spatiotemporal ground state matrix into the physical information neural network, perform calculations using the hydrodynamic equation and pollution release kinetic equation embedded in the physical information neural network, and output the water body stratification evolution sequence and bottom dissolved oxygen prediction sequence. Further, in step S2, this calculation specifically includes the following steps: The three-dimensional water environment spatiotemporal ground state matrix is used as the initial and boundary condition input to the physical information neural network. The momentum conservation residual of the hydrodynamic equation and the mass conservation residual of the pollution release kinetic equation are used as the physical loss function of the physical information neural network. The gradient descent algorithm is used to update the weight parameters of the physical information neural network based on the physical loss function, and the water body stratification evolution sequence and the bottom dissolved oxygen prediction sequence are output.

[0026] Furthermore, using the momentum conservation residual of the hydrodynamic equation and the mass conservation residual of the pollution release kinetic equation as the physical loss function of the physical information neural network specifically includes the following steps: Extract the predicted spatiotemporal field data output from the hidden layer of the physical information neural network; Calculate the partial derivative matrix of the predicted spatiotemporal field data with respect to spatial coordinates and time steps; Input the partial derivative matrix into the discretized hydrodynamic equation and pollution release kinetic equation to obtain the initial residuals of momentum conservation and mass conservation; Calculate the network backpropagation gradient variance of the initial momentum conservation residual and the network backpropagation gradient variance of the initial mass conservation residual; Configure the inverse of each network backpropagation gradient variance as a penalty weight; Weight and fuse the initial residuals of momentum conservation and mass conservation using the penalty weights to generate the physical loss function.

[0027] Specifically, the three-dimensional water environment spatiotemporal ground state matrix generated in the preceding steps is used as the initial and boundary condition input to the physical information neural network. The physical information neural network is built as a fully connected multilayer perceptron comprising an input layer, multiple fully connected hidden layers, and an output layer. The spatiotemporal grid in the matrix, composed of depth coordinates, latitude and longitude coordinates, and the continuous time variable, provides the independent input parameters of the input layer. The measured physical quantities contained in the feature channels of the matrix, such as water flow velocity, water temperature, and dissolved oxygen concentration, are mapped to the initial state labels and boundary constraint data for network training. The network input vector, composed of the three-dimensional spatial coordinates and the continuous time variable, is set as $\mathbf{x} = (x, y, z, t)$. To meet the requirement that the physical quantities be twice continuously differentiable for the Laplace operator in the subsequent hydrodynamic partial differential equations, the fully connected hidden layers use the hyperbolic tangent function as the activation function. Let the physical information neural network have $L$ hidden layers in total. For the $l$-th hidden layer, the weight matrix is $\mathbf{W}^{(l)}$, the bias vector is $\mathbf{b}^{(l)}$, the hyperbolic tangent activation function is $\tanh(\cdot)$, and the output hidden feature vector is $\mathbf{h}^{(l)}$, with the initial hidden feature vector $\mathbf{h}^{(0)}$ equal to the input vector $\mathbf{x}$. The layer-by-layer forward propagation of the hidden layers is calculated as follows:

$$\mathbf{h}^{(l)} = \tanh\left(\mathbf{W}^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)}\right), \quad l = 1, \dots, L$$

After forward propagation through the hidden layers, the output layer of the physical information neural network applies a linear mapping to produce the predicted spatiotemporal field data vector. The output layer contains an independently configured output weight matrix $\mathbf{W}^{\mathrm{out}}$ and output bias vector $\mathbf{b}^{\mathrm{out}}$, and its mapping formula is as follows:

$$\hat{\mathbf{f}} = \mathbf{W}^{\mathrm{out}} \mathbf{h}^{(L)} + \mathbf{b}^{\mathrm{out}}$$

The predicted spatiotemporal field data vector $\hat{\mathbf{f}}$ is then parsed. It covers the predicted three-dimensional velocity field vector, predicted water pressure field value, predicted water temperature field value, and predicted dissolved oxygen concentration field value over the continuous spatiotemporal domain.
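The forward pass described above (tanh hidden layers followed by a linear output layer) can be sketched without any external library. The layer widths, the random initialization range, and the ordering of the six outputs (three velocity components, pressure, temperature, dissolved oxygen) are assumptions for illustration, and the sketch covers only inference, not the automatic differentiation or training used by the method.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight matrix, bias vector)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Create a layer with small random weights and zero biases (assumed scheme)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def affine(layer: Layer, x: Sequence[float]) -> List[float]:
    """Compute W x + b for one layer."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def forward(hidden_layers: Sequence[Layer], output_layer: Layer,
            x: Sequence[float]) -> List[float]:
    """tanh hidden layers followed by a linear output mapping."""
    h = list(x)
    for layer in hidden_layers:
        h = [math.tanh(v) for v in affine(layer, h)]
    return affine(output_layer, h)


rng = random.Random(0)
hidden = [init_layer(4, 16, rng), init_layer(16, 16, rng)]  # input is (x, y, z, t)
output = init_layer(16, 6, rng)                             # u, v, w, p, T, DO (assumed order)
prediction = forward(hidden, output, (10.0, 20.0, -5.0, 0.0))
print(len(prediction))  # 6
```

The smooth tanh activation matters here because the physics residuals need second derivatives of the outputs with respect to the inputs, which a piecewise-linear activation would make zero almost everywhere.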

[0028] The partial derivative matrix of the predicted spatiotemporal field data with respect to the spatial coordinates and the continuous time variable is calculated. Let the continuous time variable be $t$, the three-dimensional spatial coordinates be $(x, y, z)$, the predicted three-dimensional velocity vector be $\mathbf{u}$, the predicted dissolved oxygen concentration be $c$, and the predicted water pressure be $p$. Using automatic differentiation, the first-order partial derivatives of these predicted quantities with respect to time $t$ and their first-order and second-order partial derivatives with respect to the spatial coordinates $(x, y, z)$ are computed and assembled into the partial derivative matrix. This matrix is then input into the discretized hydrodynamic and pollution release kinetic equations to obtain the initial residuals of momentum conservation and mass conservation. The initial momentum conservation residual $\mathbf{R}_m$ of the flow field evolution is calculated as follows:

$$\mathbf{R}_m = \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} + \frac{1}{\rho}\nabla p - \nu \nabla^2 \mathbf{u} - \mathbf{g}$$

In the formula, $\mathbf{g}$ is the preset constant vector of gravitational acceleration; $\nu$ is the preset kinematic viscosity constant corresponding to the target water body; $\rho$ is the dynamic water density calculated numerically from the predicted water temperature field using a preset temperature-density empirical polynomial; $\nabla$ is the three-dimensional gradient operator; and $\nabla^2$ is the Laplace operator. The initial mass conservation residual $R_c$ of water quality transport is calculated as follows:

$$R_c = \frac{\partial c}{\partial t} + \mathbf{u} \cdot \nabla c - K \nabla^2 c + k_s$$

In the formula, $K$ is the preset turbulent diffusion coefficient corresponding to the target water body, and $k_s$ is the preset baseline dissolved oxygen consumption rate constant of the sediment of the target water body.

[0029] The network backpropagation gradient variances are calculated for the initial momentum conservation residual and for the initial mass conservation residual. Let the full set of weight parameters of the physical information neural network at the current iteration step, including all hidden layers and the output layer, be $\theta$. Using the chain rule, the first-order partial derivatives (network gradients) of the mean square values of the two initial residuals with respect to $\theta$ are calculated. The variance of each network gradient is then computed over the batch of spatiotemporal sampling points, giving the momentum conservation gradient variance $V_m$ and the mass conservation gradient variance $V_c$. The inverse of each network backpropagation gradient variance is configured as a penalty weight, with a small constant $\epsilon$, typically $10^{-8}$, added to prevent a zero denominator. The momentum penalty weight is set to $\lambda_m = 1/(V_m + \epsilon)$ and the mass penalty weight is set to $\lambda_c = 1/(V_c + \epsilon)$. The initial residuals of momentum conservation and mass conservation are weighted and fused using the penalty weights to generate the physical loss function $\mathcal{L}_{\mathrm{phys}}$. Its mathematical expression is as follows:

$$\mathcal{L}_{\mathrm{phys}} = \frac{1}{N_f} \sum_{j=1}^{N_f} \left( \lambda_m \left\| \mathbf{R}_m^{(j)} \right\|_2^2 + \lambda_c \left\| R_c^{(j)} \right\|_2^2 \right)$$

In the formula, $N_f$ is the total number of spatiotemporal collocation points involved in the physical loss calculation; $j$ is the traversal index of the collocation points; $\mathbf{R}_m^{(j)}$ and $R_c^{(j)}$ are the initial momentum conservation residual vector and the initial mass conservation residual scalar calculated at the $j$-th collocation point; and $\|\cdot\|_2$ denotes the Euclidean norm.
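The inverse-variance weighting can be illustrated with plain Python. In this sketch the gradients are supplied as flat lists of numbers standing in for the backpropagated gradient values of each residual; how those gradients are sampled, and the names `penalty_weight` and `physics_loss`, are assumptions for illustration. Only the constant 1e-8 comes from the text.

```python
import statistics
from typing import Sequence

EPS = 1e-8  # small constant that prevents division by zero


def penalty_weight(gradients: Sequence[float], eps: float = EPS) -> float:
    """Inverse of the gradient variance, used as the residual's penalty weight."""
    return 1.0 / (statistics.pvariance(gradients) + eps)


def physics_loss(momentum_residuals: Sequence[Sequence[float]],
                 mass_residuals: Sequence[float],
                 w_momentum: float, w_mass: float) -> float:
    """Weighted mean of squared residual norms over all collocation points."""
    total = 0.0
    for r_m, r_c in zip(momentum_residuals, mass_residuals):
        total += w_momentum * sum(c * c for c in r_m) + w_mass * r_c * r_c
    return total / len(mass_residuals)


# Hypothetical gradient samples: the momentum gradients fluctuate more,
# so the momentum residual receives the smaller weight.
w_m = penalty_weight([0.0, 2.0, 0.0, 2.0])      # variance 1.0
w_c = penalty_weight([0.9, 1.1, 0.9, 1.1])      # variance 0.01
print(w_m < w_c)  # True
print(physics_loss([(3.0, 4.0, 0.0)], [2.0], w_momentum=1.0, w_mass=0.5))
```

Down-weighting the residual whose gradients vary most keeps one equation from dominating the parameter updates, which is the usual motivation for this kind of balancing.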

[0030] The mean square error function is used to calculate the data loss term between the predicted spatiotemporal field data and the initial state labels and boundary constraint data. Let the total number of known initial and boundary observation points involved in the constraint calculation be $M$, and let $j$ be the traversal index of the observation points. The true known physical quantity vector at the $j$-th observation point is $\mathbf{y}_j$, and the physical quantity vector predicted by the physics-informed neural network at the corresponding position is $\hat{\mathbf{y}}_j$. The data loss term is calculated as follows:

$$\mathcal{L}_{data} = \frac{1}{M}\sum_{j=1}^{M}\left\|\hat{\mathbf{y}}_j - \mathbf{y}_j\right\|_2^2$$

The data loss term and the physical loss function $\mathcal{L}_{phys}$ are summed directly, in equal proportion, to construct the total loss function of the physics-informed neural network:

$$\mathcal{L}_{total} = \mathcal{L}_{data} + \mathcal{L}_{phys}$$

Based on this total loss function, a gradient descent optimization algorithm performs backpropagation to update the total weight parameter set of the physics-informed neural network. As the network converges over the iterations, the velocity and concentration fields it outputs progressively satisfy the fluid dynamic constraints and boundary conditions of the real water environment. From the predicted spatiotemporal field data output after final convergence, the predicted vertical velocity and predicted water temperature profiles along the time dimension are extracted to construct the water body stratification evolution sequence. At the same time, the predicted dissolved oxygen concentration at the greatest water depth is extracted at consecutive time nodes to generate the bottom dissolved oxygen prediction sequence.

[0031] Through forward derivation of the feature layers and backward differentiation of the multi-objective loss function, this step addresses the failure of training on purely physical partial differential equations to converge when no data constraints are present, and outputs predictions of hydrodynamic stratification and bottom dissolved oxygen evolution that conform to the physical conservation laws.

[0032] S3. Extract the stratification stability parameter from the water body stratification evolution sequence, combine it with the anoxic duration parameter from the bottom dissolved oxygen prediction sequence, calculate the endogenous pollution release rate of the sediment under these two parameters, and generate the endogenous pollution release flux sequence. Further, in step S3, calculating the endogenous pollution release rate of the sediment under the stratification stability parameter and the anoxic duration parameter and generating the endogenous pollution release flux sequence specifically includes the following steps: extract the vertical density gradient from the water body stratification evolution sequence, and calculate the Schmidt stability corresponding to the vertical density gradient as the stratification stability parameter; identify the continuous time segments in the bottom dissolved oxygen prediction sequence in which the dissolved oxygen concentration is below the anoxic critical concentration, and take them as the anoxic duration parameter; and input the stratification stability parameter and the anoxic duration parameter into the interface mass transfer kinetics model and perform flux integration to generate the endogenous pollution release flux sequence.

[0033] Specifically, the predicted water temperature profile data are extracted from the water body stratification evolution sequence generated in the preceding steps. The origin of the water depth coordinate system is placed at the surface of the target water body, with the positive direction pointing vertically downward. The predicted water temperature profile data are converted into the vertical density distribution function of the water body using the temperature-density empirical polynomial. The calculation formula is as follows:

$$\rho(z,t) = \rho_0\left[1 - \alpha\left(T(z,t) - T_{max}\right)^2\right]$$

In the formula, $\rho(z,t)$ denotes the vertical density distribution function of the water body at water depth coordinate $z$ and current time step $t$; $\rho_0$ denotes the preset reference water density constant; $\alpha$ denotes the preset thermal expansion coefficient of the water body; $T(z,t)$ denotes the temperature of the predicted water temperature profile at the corresponding spatiotemporal point; and $T_{max}$ denotes the preset temperature constant of maximum density, usually set to four degrees Celsius.
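As a rough illustration of the temperature-to-density conversion, the sketch below assumes a quadratic polynomial around the temperature of maximum density. The coefficient values are illustrative placeholders, not values given in the source, and the function name `water_density` is hypothetical.

```python
def water_density(temp_c, rho0=1000.0, alpha=6.63e-6, t_max=4.0):
    """Approximate fresh-water density (kg/m^3) from temperature (deg C).

    Assumed quadratic form: rho0 * (1 - alpha * (T - t_max)^2).
    rho0, alpha and t_max are placeholder values for illustration only.
    """
    return rho0 * (1.0 - alpha * (temp_c - t_max) ** 2)


# A warm surface layer over a cold bottom layer gives density increasing with depth.
temperature_profile = [22.0, 18.0, 10.0, 6.0, 4.5]  # surface to bottom
density_profile = [water_density(t) for t in temperature_profile]
```

Because depth is measured downward from the surface, a stably stratified profile produces a positive vertical density gradient in this coordinate system.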

[0034] The vertical density gradient is obtained by taking the first-order partial derivative of the vertical density distribution function of the water body with respect to the water depth coordinate. The calculation formula is as follows:

$$G(z,t) = \frac{\partial \rho(z,t)}{\partial z}$$

In the formula, $G(z,t)$ denotes the vertical density gradient at water depth coordinate $z$ and time step $t$; $\rho(z,t)$ denotes the vertical density distribution function of the water body; and $\partial$ denotes the partial derivative operator.

[0035] Substituting the vertical density gradient into the integral formula for water body stability, the Schmidt stability at the corresponding time step is calculated as the stratification stability parameter. The calculation formula is as follows:

$$S_t = g\int_{0}^{H} z\,\frac{\partial \rho(z,t)}{\partial z}\,dz$$

In the formula, $S_t$ denotes the stratification stability parameter at the current time step $t$; $H$ denotes the maximum calibrated water depth of the target water body; $g$ denotes the preset gravitational acceleration constant; $\partial\rho(z,t)/\partial z$ denotes the vertical density gradient at water depth coordinate $z$; and $\int$ denotes the definite integral operator.
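The gradient and stability calculations can be sketched numerically as follows. The sketch assumes a uniformly spaced depth grid and takes the stability integral to be the depth-weighted integral of the vertical density gradient, which is one reading of the formula rather than a confirmed form; the function names are hypothetical.

```python
G_ACC = 9.81  # gravitational acceleration, m/s^2


def vertical_gradient(values, dz):
    """First derivative along depth: central differences, one-sided at the ends."""
    n = len(values)
    grad = []
    for k in range(n):
        lo = max(k - 1, 0)
        hi = min(k + 1, n - 1)
        grad.append((values[hi] - values[lo]) / ((hi - lo) * dz))
    return grad


def stability_parameter(rho_profile, dz, g=G_ACC):
    """Assumed form: g * integral over depth of z * d(rho)/dz (trapezoidal rule)."""
    grad = vertical_gradient(rho_profile, dz)
    integrand = [k * dz * g_k for k, g_k in enumerate(grad)]
    integral = sum(
        0.5 * (integrand[k] + integrand[k + 1]) * dz
        for k in range(len(integrand) - 1)
    )
    return g * integral
```

A fully mixed column gives a value of zero, and the value grows as denser water concentrates near the bottom.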

[0036] Extract the bottom dissolved oxygen prediction sequence generated in the preceding steps, and set an anoxic critical concentration threshold $C_{crit}$ for determining whether anaerobic release occurs at the sediment interface, with a typical value of two milligrams per liter. Construct a sediment anoxic state indicator function: when the predicted bottom dissolved oxygen concentration $C_b(\tau)$ is less than the anoxic critical concentration threshold, the indicator function equals one; otherwise it equals zero:

$$I(\tau) = \begin{cases} 1, & C_b(\tau) < C_{crit} \\ 0, & C_b(\tau) \ge C_{crit} \end{cases}$$

The continuous time segments in the bottom dissolved oxygen prediction sequence in which the dissolved oxygen concentration is below the anoxic critical concentration are identified, and the indicator function is integrated over the continuous time interval to generate an anoxic duration parameter that reflects the persistent anoxic state. The calculation formula is as follows:

$$D_t = \int_{t_0}^{t} I(\tau)\,d\tau$$

In the formula, $D_t$ denotes the anoxic duration parameter at time step $t$; $t_0$ denotes the most recent time before the current time step $t$ at which the predicted dissolved oxygen concentration was greater than or equal to the anoxic critical concentration threshold; $I(\tau)$ denotes the sediment anoxic state indicator function; and $\tau$ denotes the continuous integration time variable.
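On a discrete time series, the anoxic duration reduces to a running counter that resets whenever the bottom dissolved oxygen recovers to the threshold. The sketch below is a minimal version under that reading; `anoxic_duration` is a hypothetical name, and the 2 mg/L default follows the typical value stated in the text.

```python
def anoxic_duration(do_series, dt, c_crit=2.0):
    """Duration of the current uninterrupted anoxic spell at each time step.

    do_series: predicted bottom dissolved oxygen concentrations (mg/L)
    dt:        time step length, in the time unit of the returned durations
    c_crit:    anoxic critical concentration threshold (mg/L)
    """
    durations = []
    run = 0.0
    for conc in do_series:
        # Indicator equals one strictly below the threshold, zero otherwise.
        run = run + dt if conc < c_crit else 0.0
        durations.append(run)
    return durations
```

For example, `anoxic_duration([3.0, 1.5, 1.0, 2.5, 1.9], 1.0)` returns `[0.0, 1.0, 2.0, 0.0, 1.0]`: the spell is cut at the fourth step, where the concentration recovers above the threshold.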

[0037] The stratification stability parameter and the anoxic duration parameter are input into the interface mass transfer kinetics model for flux integration to generate the endogenous pollutant release flux sequence. The release flux of endogenous pollutants from the sediment consists of a baseline background release flux and an accelerated release flux under anoxic conditions, and the accelerated release process is dynamically attenuated by the physical stratification of the overlying water body. The interface mass transfer kinetics model obtains the total release flux by integrating the stratification-attenuated accelerated release rate over the time span characterized by the anoxic duration parameter. The flux integration formula is as follows:

$$F_t = F_0 + k_a\int_{t-D_t}^{t}\exp\left(-\beta S_\tau\right)d\tau$$

In the formula, $F_t$ denotes the endogenous pollutant release flux at the current time step $t$; $F_0$ denotes the baseline background release flux constant of the sediment of the target water body; $k_a$ denotes the kinetic coefficient of anoxia-accelerated release from the sediment of the target water body; $D_t$ denotes the anoxic duration parameter at time step $t$; $\exp$ denotes the exponential function with the natural constant as its base; $\beta$ denotes the physical stratification attenuation coefficient of the water body; and $S_\tau$ denotes the stratification stability parameter at the continuous integration time variable $\tau$.
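A discrete version of the flux integration might look like the sketch below. It assumes the accelerated term is the anoxic kinetic coefficient times the time integral of an exponential attenuation in the stability parameter, taken over the current anoxic spell with a rectangle rule; this form and all parameter names are assumptions for illustration.

```python
import math


def release_flux(stability, durations, dt, f0, k_acc, beta):
    """Endogenous release flux per time step (assumed model form).

    stability: stratification stability parameter at each time step
    durations: anoxic duration parameter at each time step (same time unit as dt)
    f0:        baseline background release flux constant
    k_acc:     kinetic coefficient of anoxia-accelerated release
    beta:      stratification attenuation coefficient
    """
    fluxes = []
    for t, duration in enumerate(durations):
        # Number of past steps covered by the current anoxic spell.
        steps = min(int(round(duration / dt)), t + 1)
        integral = sum(
            math.exp(-beta * stability[t - m]) * dt for m in range(steps)
        )
        fluxes.append(f0 + k_acc * integral)
    return fluxes
```

With no anoxia the flux stays at the baseline `f0`; during an anoxic spell it grows with the spell length, and a larger stability parameter damps that growth.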

[0038] The endogenous pollutant release flux at each time step is calculated by traversing the time evolution sequence, which yields the complete endogenous pollutant release flux sequence.

[0039] By dynamically quantifying how a persistent anoxic environment drives, and how the thermal stratification of the water body inhibits, the release of endogenous pollutants from the bottom sediment, this step generates a high-precision release flux sequence that provides a quantified bottom-boundary pollution input source for subsequent water quality evolution prediction.

[0040] S4. The endogenous pollution release flux sequence and the bottom dissolved oxygen prediction sequence are fused to construct environmental state features, which are input into the reinforcement learning decision model. The energy consumption parameters of the water body control equipment corresponding to the target water body and the bottom dissolved oxygen target threshold are obtained. The reward function of the reinforcement learning decision model is constructed from the energy consumption parameters and the bottom dissolved oxygen target threshold, and the initial dynamic control command set of the water body control equipment is output. Further, in step S4, constructing the reward function from the energy consumption parameters and the bottom dissolved oxygen target threshold and outputting the initial dynamic control command set of the water body control equipment specifically includes the following steps: configure the energy consumption parameters as the energy consumption penalty term of the reward function, and configure the bottom dissolved oxygen target threshold as the dissolved oxygen state reward term of the reward function; update the policy parameters from the reward function and the environmental state features using the proximal policy optimization algorithm; and use the updated policy parameters to generate a continuous action sequence, which is converted into the initial dynamic control command set of the water body control equipment.

[0041] Furthermore, configuring the energy consumption parameters as the energy consumption penalty term of the reward function and configuring the bottom dissolved oxygen target threshold as the dissolved oxygen state reward term specifically includes the following steps: extract the predicted dissolved oxygen concentration of the bottom dissolved oxygen prediction sequence from the environmental state features; calculate the concentration difference between the predicted dissolved oxygen concentration and the bottom dissolved oxygen target threshold; input the concentration difference into a piecewise truncation mapping function, which outputs a first state reward value linearly mapped from the concentration difference when the difference is less than zero and a constant reward value when the difference is greater than or equal to zero; combine the first state reward value and the constant reward value piecewise to form the dissolved oxygen state reward term; parse the energy consumption parameters to extract the equipment operating power sequence and the equipment start-stop frequency sequence; and normalize the two sequences, take their weighted sum, and configure the result as the energy consumption penalty term.

[0042] Specifically, the endogenous pollutant release flux sequence generated in the preceding steps is concatenated with the bottom dissolved oxygen prediction sequence at the same time step to construct the environmental state feature sequence. Let the current time step be $t$, the corresponding endogenous pollutant release flux be $F_t$, and the predicted dissolved oxygen concentration of the bottom dissolved oxygen prediction sequence be $C_t$; the concatenated environmental state feature vector is $\mathbf{s}_t = [F_t, C_t]$. The reinforcement learning decision model is configured with an actor-critic architecture containing independent policy and value networks, each a multi-layer perceptron with an input layer, fully connected hidden layers, and an output layer. The fully connected hidden layers use the rectified linear unit (ReLU) as the activation function. The environmental state feature vector $\mathbf{s}_t$ is the sole input to the input layer; the output layer of the policy network outputs a continuous action vector $\mathbf{a}_t$, and the output layer of the value network outputs the state value scalar $V(\mathbf{s}_t)$ corresponding to the environmental state. The continuous action vector $\mathbf{a}_t$ is mapped to the physical operating parameters of the water body control equipment corresponding to the target water body, from which the energy consumption parameters of the equipment are obtained. The bottom dissolved oxygen target threshold used to determine whether the water quality meets the standard is obtained, and the reward function of the reinforcement learning decision model is constructed from the energy consumption parameters and the bottom dissolved oxygen target threshold.

[0043] The predicted dissolved oxygen concentration is extracted from the environmental state feature vector, and the concentration difference between the predicted dissolved oxygen concentration and the bottom dissolved oxygen target threshold is calculated using the following formula:

$$\Delta C_t = C_t - C_{target}$$

In the formula, $\Delta C_t$ denotes the concentration difference at time step $t$; $C_t$ denotes the predicted dissolved oxygen concentration at time step $t$; and $C_{target}$ denotes the preset bottom dissolved oxygen target threshold constant.

[0044] The concentration difference is input into a piecewise truncation mapping function. When the concentration difference is less than zero, a first state reward value linearly mapped from the concentration difference is output; when the concentration difference is greater than or equal to zero, a constant reward value is output. The first state reward value and the constant reward value are combined piecewise to form the dissolved oxygen state reward term, calculated using the following formula:

$$R^{DO}_t = \begin{cases} \kappa\,\Delta C_t, & \Delta C_t < 0 \\ R_{const}, & \Delta C_t \ge 0 \end{cases}$$

In the formula, $R^{DO}_t$ denotes the dissolved oxygen state reward term at time step $t$; $\Delta C_t$ denotes the concentration difference between the predicted dissolved oxygen concentration and the bottom dissolved oxygen target threshold at time step $t$; $\kappa$ denotes the preset linear mapping reward coefficient, a positive real number; and $R_{const}$ denotes the preset constant reward value.
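The piecewise truncation mapping is small enough to state directly in code. The sketch below follows the two branches described above; the function name and the default coefficient values are placeholders, not values from the source.

```python
def do_state_reward(c_pred, c_target, kappa=1.0, r_const=1.0):
    """Dissolved oxygen state reward term.

    Below the target the reward is linear in the (negative) concentration
    difference, so it acts as a penalty; at or above the target the reward
    is capped at a constant, so over-aeration earns nothing extra.
    """
    delta = c_pred - c_target
    return kappa * delta if delta < 0 else r_const
```

Capping the reward once the target is met removes any incentive to keep raising dissolved oxygen beyond the threshold, which leaves the energy penalty to decide among compliant actions.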

[0045] The energy consumption parameters obtained by mapping the continuous action vector are parsed to extract the equipment operating power sequence and the equipment start-stop frequency sequence. The two sequences are normalized using min-max normalization, and their weighted sum is configured as the energy consumption penalty term. The calculation formula is as follows:

$$P^{E}_t = w_P\,\frac{P_t - P_{min}}{P_{max} - P_{min}} + w_f\,\frac{f_t - f_{min}}{f_{max} - f_{min}}$$

In the formula, $P^{E}_t$ denotes the energy consumption penalty term at time step $t$; $P_t$ denotes the equipment operating power at time step $t$ obtained by mapping the continuous action vector; $P_{max}$ and $P_{min}$ denote the historical maximum and minimum operating power of the equipment, respectively; $f_t$ denotes the equipment start-stop frequency at time step $t$ obtained by mapping the continuous action vector; $f_{max}$ and $f_{min}$ denote the historical maximum and minimum start-stop frequency of the equipment, respectively; $w_P$ denotes the preset power penalty weighting coefficient; and $w_f$ denotes the preset frequency penalty weighting coefficient.
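A minimal sketch of the energy consumption penalty follows, assuming scalar power and start-stop frequency values for a single time step. The weights and historical bounds are caller-supplied placeholders, and the function names are hypothetical.

```python
def min_max(value, lower, upper):
    """Min-max normalization of a value to the historical range [lower, upper]."""
    if upper == lower:
        raise ValueError("historical maximum and minimum must differ")
    return (value - lower) / (upper - lower)


def energy_penalty(power, freq, p_min, p_max, f_min, f_max, w_p=0.5, w_f=0.5):
    """Weighted sum of normalized operating power and start-stop frequency."""
    return w_p * min_max(power, p_min, p_max) + w_f * min_max(freq, f_min, f_max)
```

Normalizing both terms to a common scale before weighting keeps the power term, which is numerically much larger in raw units, from swamping the start-stop frequency term.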

[0046] The difference between the dissolved oxygen state reward term and the energy consumption penalty term is calculated to generate the total reward function of the reinforcement learning decision model. Let the time step be $t$ and the corresponding total reward value be $R_t$; its formula is $R_t = R^{DO}_t - P^{E}_t$, where $R^{DO}_t$ is the dissolved oxygen state reward term and $P^{E}_t$ is the energy consumption penalty term. The action advantage function is calculated from the state value scalar $V(\mathbf{s}_t)$ output by the value network and the total reward value $R_t$. Let the weight parameter set of the policy network at the current iteration step be $\theta_\pi$, the weight parameter set before the update be $\theta_{\pi,old}$, and the action advantage function at time step $t$ be $A_t$. Using the proximal policy optimization algorithm, whose clipping mechanism limits the magnitude of each policy update, the objective function for updating the policy network parameters is constructed and maximized as follows:

$$J^{CLIP}(\theta_\pi) = \mathbb{E}_t\left[\min\left(r_t(\theta_\pi)\,A_t,\ \operatorname{clip}\left(r_t(\theta_\pi),\,1-\epsilon_c,\,1+\epsilon_c\right)A_t\right)\right]$$

In the formula, $J^{CLIP}(\theta_\pi)$ denotes the proximal policy optimization objective function; $\mathbb{E}_t$ denotes the expectation operator; $r_t(\theta_\pi) = \pi_{\theta_\pi}(\mathbf{a}_t \mid \mathbf{s}_t)/\pi_{\theta_{\pi,old}}(\mathbf{a}_t \mid \mathbf{s}_t)$ denotes the probability ratio of the new and old policies at time step $t$, equal to the probability density of the current action under the new policy network divided by that under the old policy network; $\min$ denotes the minimum operator; $\operatorname{clip}$ denotes the clipping function operator; and $\epsilon_c$ denotes the preset clipping hyperparameter, with a typical value of 0.2.
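The clipped objective can be evaluated on a batch with a few lines of plain Python. The sketch assumes the probability ratios and advantages have already been computed elsewhere; how the advantage is estimated from the value network output is not specified in the text and is left out. The name `clipped_surrogate` is hypothetical.

```python
def clipped_surrogate(ratios, advantages, clip_eps=0.2):
    """Batch mean of the clipped surrogate objective (to be maximized).

    ratios:     new-policy / old-policy probability ratios per sample
    advantages: action advantage estimates per sample
    clip_eps:   clipping hyperparameter, 0.2 as the typical value in the text
    """
    terms = []
    for ratio, adv in zip(ratios, advantages):
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum makes the objective a pessimistic bound.
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)
```

With a positive advantage the objective stops rewarding ratios above `1 + clip_eps`, and with a negative advantage it stops rewarding ratios below `1 - clip_eps`, which is how the update magnitude is bounded.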

[0047] Based on this objective function, the weight parameters of the policy network are updated by gradient ascent via backpropagation, while the weight parameters of the value network are updated by minimizing a mean squared error loss. As the iterations converge, the updated policy network generates the optimal continuous action vectors, which form the continuous action sequence. This continuous action sequence is then converted directly into the oxygenation rate values of the bottom aerator and the operating frequency values of the underwater thruster in the physical water environment, yielding the initial dynamic control command set of the water body control equipment.

[0048] Through feature-to-action mapping and iteration on the multi-objective reward function, this step outputs dynamic control commands that balance the treatment of bottom-layer anoxia against low equipment operating cost, turning the previously derived water quality evolution and pollution release characteristics into a water quality improvement plan that can be implemented in practice.

[0049] S5. Use the constructed hydrodynamic simulation model to simulate the initial dynamic control command set, and extract the over-limit commands that cause the convective mixing gradient between the upper and lower water layers to exceed the pycnocline instability critical value of the target water body during the simulation. Further, in step S5, extracting the over-limit commands specifically includes the following steps: input the initial dynamic control command set into the hydrodynamic simulation model and solve it time step by time step, outputting a three-dimensional flow field simulation matrix; calculate the ratio of the velocity shear force to the density gradient at vertically adjacent grid nodes in the three-dimensional flow field simulation matrix to obtain the convective mixing gradient between the upper and lower water layers; and compare this convective mixing gradient with the pycnocline instability critical value of the target water body, extracting the commands at the time steps in which the gradient exceeds the critical value as over-limit commands.

[0050] Specifically, the initial dynamic control command set generated in the preceding steps, which contains the oxygenation rate values of the bottom aerator and the operating frequency values of the underwater thruster, is extracted and transformed into hydrodynamic boundary disturbance conditions and momentum source terms: the oxygenation rate values of the bottom aerator are mapped to upward vertical momentum source terms, and the operating frequency values of the underwater thruster are mapped to horizontal momentum source terms. These momentum source terms are input into the pre-constructed hydrodynamic simulation model and solved time step by time step. The hydrodynamic simulation model comprises a three-dimensional spatial grid and discrete time steps. By discretely solving the Navier-Stokes equations with the vertical and horizontal momentum source terms embedded, the model outputs the three-dimensional flow field simulation matrix of the target water body at each time step. This matrix is then parsed to extract the water velocity and density parameters at each three-dimensional grid node.

[0051] Based on the water velocity and density parameters of vertically adjacent grid nodes in the three-dimensional flow field simulation matrix, the convective mixing gradient between the upper and lower water layers is calculated at each grid node. Let the current time step be $t$ and the water depth coordinate be $z$. Extract the horizontal lateral velocity component $u_{t,z}$, the horizontal longitudinal velocity component $v_{t,z}$, and the predicted water density $\rho_{t,z}$ at the grid node. The vertical velocity shear force and the vertical density gradient are calculated from spatial partial derivatives, and a small positive constant is added to keep the denominator from being zero. The ratio of the absolute value of the vertical velocity shear force to the absolute value of the vertical density gradient plus the small positive constant gives the convective mixing gradient between the upper and lower water layers. The calculation formula is as follows:

$$M_{t,z} = \frac{\left|\mu\sqrt{\left(\dfrac{\partial u_{t,z}}{\partial z}\right)^2 + \left(\dfrac{\partial v_{t,z}}{\partial z}\right)^2}\right|}{\left|\dfrac{\partial \rho_{t,z}}{\partial z}\right| + \epsilon}$$

In the formula, $M_{t,z}$ denotes the convective mixing gradient between the upper and lower water layers at time step $t$ and water depth coordinate $z$; $\mu$ denotes the preset dynamic viscosity constant of the target water body; $u_{t,z}$ denotes the predicted horizontal lateral velocity at time step $t$ and water depth coordinate $z$; $v_{t,z}$ denotes the predicted horizontal longitudinal velocity at that location; $\rho_{t,z}$ denotes the predicted water density at that location; $|\cdot|$ denotes the absolute value operator; and $\epsilon$ denotes the small positive constant that keeps the denominator from being zero, with a typical value of $10^{-8}$. The partial derivative operator $\partial/\partial z$ with respect to the water depth coordinate is evaluated on the discrete three-dimensional flow field simulation matrix as an equivalent central difference, that is, by dividing the difference in the physical quantity between vertically adjacent grid nodes by the set vertical spatial step size.
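The shear-to-density-gradient ratio can be sketched on a single vertical column as shown below. The sketch assumes a uniform vertical step, central differences at interior nodes, and a shear magnitude built from both horizontal velocity components; the exact form of the numerator is an assumption, and `mixing_gradient` is a hypothetical name.

```python
import math


def mixing_gradient(u, v, rho, dz, mu=1.0e-3, eps=1e-8):
    """Convective mixing gradient at the interior nodes of one vertical column.

    u, v: horizontal lateral / longitudinal velocities, surface to bottom
    rho:  water density at the same nodes
    dz:   vertical grid step
    mu:   dynamic viscosity (placeholder value)
    eps:  small positive constant guarding against a zero denominator
    """
    result = []
    for k in range(1, len(u) - 1):
        du = (u[k + 1] - u[k - 1]) / (2.0 * dz)
        dv = (v[k + 1] - v[k - 1]) / (2.0 * dz)
        drho = (rho[k + 1] - rho[k - 1]) / (2.0 * dz)
        shear = mu * math.hypot(du, dv)  # magnitude of the vertical shear force
        result.append(abs(shear) / (abs(drho) + eps))
    return result
```

In a fully mixed column the density gradient vanishes and the ratio becomes very large rather than raising a division error, which is the behaviour the small constant is meant to provide.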

[0052] Obtain the pre-calibrated pycnocline instability critical value of the target water body in its stable state. Following the time evolution sequence, traverse each grid node along the vertical water depth coordinate to calculate the convective mixing gradient between the upper and lower water layers, and extract the maximum mixing gradient on the water body profile at each time step. Compare this maximum with the pycnocline instability critical value. For every time step in which the convective mixing gradient exceeds the critical value, extract the corresponding control actions from the initial dynamic control command set and output them as over-limit commands.
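The screening step amounts to a per-time-step maximum followed by a threshold comparison. A minimal sketch, assuming the mixing gradients are stored as one list of profile values per time step (the function name is hypothetical):

```python
def over_limit_steps(mixing_profiles, critical_value):
    """Indices of time steps whose profile maximum exceeds the critical value.

    mixing_profiles: one list of convective mixing gradients per time step
    critical_value:  pre-calibrated instability critical value
    """
    return [
        step
        for step, profile in enumerate(mixing_profiles)
        if max(profile) > critical_value
    ]


profiles = [[0.1, 0.2], [0.5, 1.2], [0.3, 0.9]]
flagged = over_limit_steps(profiles, critical_value=1.0)  # only step 1 exceeds 1.0
```

The returned indices identify which entries of the command set are treated as over-limit commands and removed in the following step.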

[0053] This step converts the AI decisions into hydrodynamic boundary conditions and quantifies the actual impact of the shear force generated by the physical control equipment on the existing thermal stratification of the water body. With a guard against division by zero in the gradient ratio, it intercepts over-limit destabilizing actions that could lead to water quality deterioration.

[0054] S6. Remove the over-limit commands from the initial dynamic control command set, and interpolate based on the command parameters of adjacent time steps to output the target dynamic control command set.

[0055] Specifically, in step S6, interpolating based on the command parameters of adjacent time steps to output the target dynamic control command set includes the following steps: determine the time breakpoint positions corresponding to the over-limit commands in the initial dynamic control command set; extract the first command parameters from the time steps preceding each time breakpoint and the second command parameters from the time steps following it; numerically fit the first and second command parameters using a polynomial interpolation algorithm to obtain replacement parameters; and write the replacement parameters into the time breakpoint positions of the initial dynamic control command set to generate the target dynamic control command set.

[0056] Specifically, the initial dynamic control command set and the marked over-limit commands generated in the preceding steps are extracted. The continuous time series of the initial dynamic control command set is traversed, and the over-limit commands are removed from it. The time steps left empty in the sequence after this removal are identified as time breakpoints. The initial dynamic control command set includes the oxygenation rate sequence of the bottom aerator and the operating frequency sequence of the underwater thruster.

[0057] For each time breakpoint, extract the valid action data of its adjacent time steps from the time series to construct an interpolation window. Let the time corresponding to the current time breakpoint be $t_b$. To prevent the Runge phenomenon associated with high-order polynomial interpolation from causing equipment control oscillations, the oxygenation rate of the bottom aerator and the operating frequency of the underwater thruster at the two consecutive valid time steps preceding the time breakpoint are extracted as the first command parameters, and those at the two consecutive valid time steps following the time breakpoint are extracted as the second command parameters. The oxygenation rate of the bottom aerator and the operating frequency of the underwater thruster are decoupled into independent one-dimensional data sequences and fitted separately. Let the total number of preceding and subsequent valid time steps involved in the interpolation be $K$, fixed at four. Let $i$ and $j$ be traversal indices of the valid time steps, let the $i$-th valid time step be $t_i$, and let the corresponding one-dimensional first or second command parameter, collectively referred to as the valid command parameter, be $a_i$.

[0058] A polynomial interpolation algorithm is used to numerically fit the first and second command parameters and calculate the smooth transition action value at each time breakpoint. Specifically, Lagrange polynomial interpolation with a cubic polynomial in time is applied independently to the oxygenation rate of the bottom aerator and to the operating frequency of the underwater thruster. The replacement parameter is calculated as follows:

$$\hat{a}(t_b) = \sum_{i=1}^{K} a_i \prod_{\substack{j=1 \\ j \ne i}}^{K} \frac{t_b - t_j}{t_i - t_j}$$

In the formula, $\hat{a}(t_b)$ denotes the replacement parameter calculated at time breakpoint $t_b$ for a single physical control variable; $\sum$ denotes the summation operator; $\prod$ denotes the product operator; $K$ denotes the total number of valid time steps involved in the interpolation, with a constant value of four; $t_i$ and $t_j$ denote the $i$-th and $j$-th valid time steps in the interpolation window; and $a_i$ denotes the valid command parameter at valid time step $t_i$, that is, either the oxygenation rate of a single bottom aerator or the operating frequency of a single underwater thruster.
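The Lagrange interpolation step can be written as a short function over one control variable. This is a minimal sketch, assuming the four valid time steps and their command values have already been collected into two lists; `lagrange_fill` is a hypothetical name.

```python
def lagrange_fill(times, values, t_break):
    """Evaluate the Lagrange interpolating polynomial at the breakpoint time.

    times:   valid time steps in the interpolation window (four in the text)
    values:  command parameter of one control variable at those time steps
    t_break: time of the breakpoint to be filled
    """
    total = 0.0
    for i, (t_i, a_i) in enumerate(zip(times, values)):
        basis = 1.0
        for j, t_j in enumerate(times):
            if j != i:
                basis *= (t_break - t_j) / (t_i - t_j)
        total += a_i * basis
    return total


# Two valid steps before and two after a breakpoint at t = 2.
oxygenation_rate = lagrange_fill([0, 1, 3, 4], [1.0, 3.0, 7.0, 9.0], 2)
```

With four nodes the interpolant is at most cubic, so it reproduces any linear, quadratic or cubic trend in the neighbouring commands; in the usage line above the neighbours lie on a straight line and the filled value falls on that line.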

[0059] The calculated replacement parameters for the oxygenation rate of the bottom aerator and the operating frequency of the underwater thruster are synchronously written into the corresponding time breakpoints in the initial dynamic control command set, replacing the dangerous control actions that would otherwise cause the water body to become unstable. By traversing all time breakpoints and completing the writing of the replacement parameters, the discontinuous command sequence is reconstructed into a continuous and smooth control sequence in the time dimension, generating and outputting the target dynamic control command set.
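The write-back over a whole command sequence might be organized as in the sketch below. It assumes integer time indices, one control variable per call, and that neighbours are drawn only from valid (non-flagged) steps, so consecutive breakpoints share the same surrounding values; handling of breakpoints with no valid neighbour at all is not covered. All names are hypothetical.

```python
def _interpolate(times, values, t):
    """Lagrange interpolation through (times, values), evaluated at t."""
    total = 0.0
    for i, t_i in enumerate(times):
        basis = 1.0
        for j, t_j in enumerate(times):
            if j != i:
                basis *= (t - t_j) / (t_i - t_j)
        total += values[i] * basis
    return total


def repair_commands(actions, bad_steps):
    """Replace over-limit entries using two valid neighbours on each side."""
    bad = set(bad_steps)
    valid = [t for t in range(len(actions)) if t not in bad]
    repaired = list(actions)
    for t_b in sorted(bad):
        before = [t for t in valid if t < t_b][-2:]  # two nearest earlier steps
        after = [t for t in valid if t > t_b][:2]    # two nearest later steps
        window = before + after
        repaired[t_b] = _interpolate(window, [actions[t] for t in window], t_b)
    return repaired
```

Calling the function once for the aerator oxygenation rate sequence and once for the thruster frequency sequence mirrors the decoupled, per-variable fitting described in the text.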

[0060] A numerical fitting algorithm over a bounded number of stable nodes repairs the time discontinuities left by removing dangerous hydrodynamic actions and converts the discrete actions that risked exceeding the limit into continuous, safe mechanical control signals, thereby supporting the mechanical stability of the water body control equipment and the safety of the physical flow field in actual operation.

[0061] Finally, it should be noted that the above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A dynamic water quality control method based on water quality changes and ecological function restoration, characterized in that it includes the following steps: S1. Collect water quality and hydrological parameters at various water depths of the target water body and construct a three-dimensional spatiotemporal ground state matrix of the water environment. S2. Input the three-dimensional water environment spatiotemporal ground state matrix into the physics-informed neural network, perform calculations using the hydrodynamic equations and pollution release kinetic equations embedded in the physics-informed neural network, and output the water body stratification evolution sequence and the bottom dissolved oxygen prediction sequence. S3. Extract the stratification stability parameter from the water body stratification evolution sequence, combine it with the anoxic duration parameter from the bottom dissolved oxygen prediction sequence, calculate the endogenous pollution release rate of the sediment under the stratification stability parameter and the anoxic duration parameter, and generate the endogenous pollution release flux sequence. S4. Fuse the endogenous pollution release flux sequence and the bottom dissolved oxygen prediction sequence to construct environmental state features, and input them into the reinforcement learning decision model; obtain the energy consumption parameters of the water body control equipment corresponding to the target water body and the bottom dissolved oxygen target threshold; construct the reward function of the reinforcement learning decision model from the energy consumption parameters and the bottom dissolved oxygen target threshold, and output the initial dynamic control command set of the water body control equipment. S5. Use the constructed hydrodynamic simulation model to simulate the initial dynamic control command set, and extract the over-limit commands that cause the convective mixing gradient between the upper and lower water layers to exceed the pycnocline instability critical value of the target water body during the simulation. S6. Remove the over-limit commands from the initial dynamic control command set, and interpolate based on the command parameters of adjacent time steps to output the target dynamic control command set.

2. The water quality dynamic regulation method based on water quality changes and ecological function restoration according to claim 1, characterized in that, in step S1, constructing the three-dimensional water environment spatiotemporal ground state matrix specifically includes the following steps: performing outlier removal on the collected water quality parameters and hydrological parameters to obtain an environmental parameter set; spatiotemporally aligning the environmental parameter set using a spatial interpolation algorithm and a time series smoothing algorithm to generate spatiotemporal grid data; and performing tensor reconstruction on the spatiotemporal grid data along the depth dimension, the latitude and longitude dimension, and the time dimension to generate the three-dimensional water environment spatiotemporal ground state matrix.
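As an illustration only, the three operations of claim 2 can be sketched in Python with NumPy. The claim does not fix the algorithms, so everything specific here is an assumption: the function name `build_ground_state`, the plausibility-range check used for outlier removal, linear vertical interpolation, a centered moving average for time smoothing, and the use of a station index as the latitude and longitude dimension.

```python
import numpy as np

def build_ground_state(profiles, depth_grid, valid_range, window=3):
    """Assemble a (depth, station, time) array from raw depth profiles.

    profiles[s][t] is a (depths, values) pair measured at station s and
    time step t. All names and algorithm choices are illustrative.
    """
    lo, hi = valid_range
    n_station, n_time = len(profiles), len(profiles[0])
    grid = np.empty((len(depth_grid), n_station, n_time))
    for s in range(n_station):
        for t in range(n_time):
            depths, values = (np.asarray(a, dtype=float) for a in profiles[s][t])
            # Outlier removal (assumed rule): drop readings outside a plausible range.
            keep = (values >= lo) & (values <= hi)
            order = np.argsort(depths[keep])
            # Spatial alignment: linear interpolation onto the common depth grid.
            grid[:, s, t] = np.interp(depth_grid, depths[keep][order], values[keep][order])
    # Time series smoothing: centered moving average (odd window) with edge padding.
    pad = window // 2
    padded = np.pad(grid, ((0, 0), (0, 0), (pad, pad)), mode="edge")
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda series: np.convolve(series, kernel, mode="valid"), 2, padded
    )

# Two stations, three time steps; the second station has one implausible reading.
profiles = [
    [([0, 10, 20], [8.0, 6.0, 2.0])] * 3,
    [([0, 10, 20], [8.0, 99.0, 2.0])] * 3,
]
matrix = build_ground_state(profiles, np.array([0, 5, 10, 15, 20]), valid_range=(0, 20))
```

In this sketch the implausible reading is discarded before interpolation, so the second station's profile is rebuilt from its two remaining samples.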

3. The water quality dynamic regulation method based on water quality changes and ecological function restoration according to claim 1, characterized in that, in step S2, performing calculations using the hydrodynamic equation and the pollution release kinetic equation embedded in the physical information neural network and outputting the water body stratification evolution sequence and the bottom dissolved oxygen prediction sequence specifically includes the following steps: inputting the three-dimensional water environment spatiotemporal ground state matrix into the physical information neural network as the initial boundary condition; using the momentum conservation residual of the hydrodynamic equation and the mass conservation residual of the pollution release kinetic equation as the physical loss function of the physical information neural network; and updating the weight parameters of the physical information neural network with a gradient descent algorithm based on the physical loss function, and outputting the water body stratification evolution sequence and the bottom dissolved oxygen prediction sequence.

4. The water quality dynamic regulation method based on water quality changes and ecological function restoration according to claim 3, characterized in that using the momentum conservation residual of the hydrodynamic equation and the mass conservation residual of the pollution release kinetic equation as the physical loss function of the physical information neural network specifically includes the following steps: extracting the predicted spatiotemporal field data output from the hidden layer of the physical information neural network; calculating the partial derivative matrix of the predicted spatiotemporal field data with respect to the spatial coordinates and the time steps; inputting the partial derivative matrix into the discretized hydrodynamic equation and the discretized pollution release kinetic equation to obtain a momentum conservation initial residual and a mass conservation initial residual; calculating the network backpropagation gradient variance of the momentum conservation initial residual and the network backpropagation gradient variance of the mass conservation initial residual; configuring the reciprocal of each network backpropagation gradient variance as the penalty weight of the corresponding initial residual; and performing weighted fusion of the momentum conservation initial residual and the mass conservation initial residual using the penalty weights to generate the physical loss function.
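The weighting described in claim 4 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the residuals and their backpropagated gradients are assumed to be supplied as arrays by an automatic differentiation framework, each residual is reduced to a scalar by its mean square (the claim does not say how), and the function name `physical_loss` and the small constant `eps` are hypothetical.

```python
import numpy as np

def physical_loss(res_momentum, res_mass, grad_momentum, grad_mass, eps=1e-12):
    """Fuse two residuals with inverse-gradient-variance penalty weights.

    res_*  : residual values at the collocation points
    grad_* : backpropagated gradients of each residual loss with respect to
             the network weights, flattened into one vector (assumed input)
    """
    # Penalty weight = reciprocal of the gradient variance; eps avoids division by zero.
    w_momentum = 1.0 / (np.var(grad_momentum) + eps)
    w_mass = 1.0 / (np.var(grad_mass) + eps)
    # Assumed scalar reduction: mean squared residual.
    loss_momentum = np.mean(np.square(res_momentum))
    loss_mass = np.mean(np.square(res_mass))
    loss = w_momentum * loss_momentum + w_mass * loss_mass
    return loss, (w_momentum, w_mass)

loss, weights = physical_loss(
    res_momentum=np.array([1.0, 1.0]),
    res_mass=np.array([2.0, 2.0]),
    grad_momentum=np.array([1.0, -1.0, 1.0, -1.0]),
    grad_mass=np.array([2.0, -2.0, 2.0, -2.0]),
)
```

The residual whose gradients fluctuate more receives the smaller weight, so neither conservation term dominates the parameter update.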

5. The water quality dynamic regulation method based on water quality changes and ecological function restoration according to claim 1, characterized in that, in step S3, calculating the endogenous pollution release rate of the sediment under the stratification stability parameter and the anoxic duration parameter and generating the endogenous pollution release flux sequence specifically includes the following steps: extracting the vertical density gradient from the water body stratification evolution sequence, and calculating the Schmidt stability corresponding to the vertical density gradient as the stratification stability parameter; counting the continuous time periods in the bottom dissolved oxygen prediction sequence during which the dissolved oxygen concentration is lower than the critical anoxic concentration, and using them as the anoxic duration parameter; and inputting the stratification stability parameter and the anoxic duration parameter into an interface mass transfer kinetic model to perform flux integration calculation, generating the endogenous pollution release flux sequence.
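The two parameters of claim 5 can be computed as in the sketch below. It assumes the common Idso form of Schmidt stability, S = (g / A0) ∫ (z − z_v) ρ(z) A(z) dz with depth z measured downward and z_v the depth of the centre of volume. The claims do not define the interface mass transfer kinetic model, so it is passed in as a callable; `rate_model` and all function names are hypothetical.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def _trapezoid(y, x):
    """Trapezoidal integral of y over x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def schmidt_stability(depth, density, area):
    """Schmidt stability (J/m^2) of one profile; depth increases downward."""
    depth, density, area = (np.asarray(a, dtype=float) for a in (depth, density, area))
    z_v = _trapezoid(depth * area, depth) / _trapezoid(area, depth)
    return G / area[0] * _trapezoid((depth - z_v) * density * area, depth)

def anoxic_duration(do_series, critical, dt=1.0):
    """Running length of the current anoxic episode at each time step."""
    duration = np.zeros(len(do_series))
    run = 0.0
    for i, value in enumerate(do_series):
        run = run + dt if value < critical else 0.0
        duration[i] = run
    return duration

def release_flux(stability, duration, rate_model):
    """Apply a user-supplied kinetic model to each (stability, duration) pair."""
    return np.array([rate_model(s, d) for s, d in zip(stability, duration)])

depth = np.linspace(0.0, 20.0, 21)
area = np.ones_like(depth)
mixed = schmidt_stability(depth, np.full_like(depth, 998.0), area)
stratified = schmidt_stability(depth, 998.0 + 0.1 * depth, area)
durations = anoxic_duration([5.0, 1.0, 1.0, 3.0, 1.0], critical=2.0)
# Placeholder kinetic model: release only while the bottom water is anoxic.
flux = release_flux([stratified] * 5, durations, lambda s, d: 0.0 if d == 0 else 1.0)
```

A fully mixed column has zero Schmidt stability, while the profile that is denser at the bottom gives a positive value.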

6. The water quality dynamic regulation method based on water quality changes and ecological function restoration according to claim 1, characterized in that, in step S4, constructing the reward function of the reinforcement learning decision model based on the energy consumption parameters and the bottom dissolved oxygen target threshold and performing the calculation to output the initial dynamic control instruction set of the water body control equipment specifically includes the following steps: configuring the energy consumption parameters as the energy consumption penalty term in the reward function of the reinforcement learning decision model, and configuring the bottom dissolved oxygen target threshold as the dissolved oxygen state reward term in the reward function; updating the policy parameters with a proximal policy optimization algorithm based on the reward function and the environmental state features; and generating a continuous action sequence using the updated policy parameters, and converting the continuous action sequence into the initial dynamic control instruction set of the water body control equipment.

7. The water quality dynamic regulation method based on water quality changes and ecological function restoration according to claim 6, characterized in that configuring the energy consumption parameters as the energy consumption penalty term in the reward function of the reinforcement learning decision model and configuring the bottom dissolved oxygen target threshold as the dissolved oxygen state reward term in the reward function specifically includes the following steps: extracting, from the environmental state features, the predicted dissolved oxygen concentration corresponding to the bottom dissolved oxygen prediction sequence; calculating the concentration difference between the predicted dissolved oxygen concentration and the bottom dissolved oxygen target threshold; inputting the concentration difference into a piecewise truncation mapping function, which outputs a first state reward value linearly mapped from the concentration difference when the concentration difference is less than zero, and outputs a constant reward value when the concentration difference is greater than or equal to zero; combining the first state reward value and the constant reward value to form the dissolved oxygen state reward term; parsing the energy consumption parameters to extract an equipment operating power sequence and an equipment start/stop frequency sequence; and normalizing the equipment operating power sequence and the equipment start/stop frequency sequence and computing their weighted sum to form the energy consumption penalty term.
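The reward construction of claim 7 can be illustrated with a short sketch. The slope, the constant reward value, the normalization constants `rated_power` and `max_switches`, and the weights 0.7 and 0.3 are all hypothetical values chosen for the example; the claim specifies only the piecewise shape and the normalized weighted sum.

```python
import numpy as np

def do_reward(do_pred, target, slope=1.0, bonus=1.0):
    """Piecewise truncation mapping of the dissolved oxygen concentration difference."""
    diff = np.asarray(do_pred, dtype=float) - target
    # Below the target: reward proportional to the (negative) difference.
    # At or above the target: constant reward, so over-aeration earns nothing extra.
    return np.where(diff < 0, slope * diff, bonus)

def energy_penalty(power, switches, rated_power, max_switches,
                   w_power=0.7, w_switch=0.3):
    """Weighted sum of normalized operating power and start/stop frequency."""
    p = np.asarray(power, dtype=float) / rated_power
    s = np.asarray(switches, dtype=float) / max_switches
    return w_power * p + w_switch * s

def step_reward(do_pred, target, power, switches, rated_power, max_switches):
    """Dissolved oxygen state reward term minus energy consumption penalty term."""
    return do_reward(do_pred, target) - energy_penalty(
        power, switches, rated_power, max_switches
    )

rewards = step_reward(
    do_pred=[1.0, 3.0], target=2.0,
    power=[50.0, 50.0], switches=[2, 2],
    rated_power=100.0, max_switches=4,
)
```

Because the reward is capped once the target is met, extra aeration only increases the penalty term, which is how this shape discourages excessive aeration.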

8. The water quality dynamic regulation method based on water quality changes and ecological function restoration according to claim 1, characterized in that, in step S5, extracting the over-limit instructions that cause the convective mixing gradient of the upper and lower water bodies to exceed the critical value of stratification instability of the target water body during the simulation specifically includes the following steps: inputting the initial dynamic control instruction set into the hydrodynamic simulation model for time-step solution, and outputting a three-dimensional flow field simulation matrix; calculating the ratio of velocity shear force to density gradient at vertically adjacent grid nodes in the three-dimensional flow field simulation matrix to obtain the convective mixing gradient of the upper and lower water bodies; and comparing the convective mixing gradient of the upper and lower water bodies with the critical value of stratification instability of the target water body, and extracting, as the over-limit instructions, the instructions corresponding to the time steps at which the convective mixing gradient of the upper and lower water bodies is greater than the critical value of stratification instability.
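A reduced sketch of the check in claim 8 follows. To keep it short, the three-dimensional flow field simulation matrix is replaced by a single vertical column, with `velocity[t, k]` and `density[t, k]` holding the simulated values at time step t and depth node k. That simplification, the function name, the worst-case reduction over depth, and the guard `eps` against a vanishing density gradient are assumptions.

```python
import numpy as np

def over_limit_steps(velocity, density, dz, critical, eps=1e-9):
    """Return the time steps whose mixing gradient exceeds the critical value.

    velocity, density : arrays of shape (n_time, n_depth) from the simulation
    dz                : vertical spacing between adjacent grid nodes
    """
    velocity = np.asarray(velocity, dtype=float)
    density = np.asarray(density, dtype=float)
    # Differences between vertically adjacent nodes.
    shear = np.abs(np.diff(velocity, axis=1)) / dz
    density_gradient = np.abs(np.diff(density, axis=1)) / dz
    # Ratio of shear to density gradient; eps guards a fully mixed layer.
    mixing_gradient = shear / np.maximum(density_gradient, eps)
    # Judge each time step by its worst node pair (assumed reduction).
    worst = mixing_gradient.max(axis=1)
    return np.flatnonzero(worst > critical), worst

velocity = [[0.0, 0.1, 0.2], [0.0, 1.0, 2.0]]
density = [[998.0, 999.0, 1000.0], [998.0, 999.0, 1000.0]]
steps, worst = over_limit_steps(velocity, density, dz=1.0, critical=0.5)
```

The returned indices identify the instructions to remove; here only the second time step, where the shear is ten times larger, is flagged.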

9. The water quality dynamic regulation method based on water quality changes and ecological function restoration according to claim 1, characterized in that, in step S6, interpolating based on the instruction parameters of adjacent time steps to output the target dynamic control instruction set specifically includes the following steps: determining the time breakpoint position corresponding to each over-limit instruction in the initial dynamic control instruction set; extracting a first instruction parameter of the time step preceding the time breakpoint and a second instruction parameter of the time step following the time breakpoint; and numerically fitting the first instruction parameter and the second instruction parameter using a polynomial interpolation algorithm to generate a replacement parameter, and writing the replacement parameter into the time breakpoint position of the initial dynamic control instruction set to generate the target dynamic control instruction set.
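The replacement step of claim 9 can be sketched as below. With only two support points (the preceding and following time steps), the interpolating polynomial is of degree one, so the sketch uses linear interpolation via `numpy.interp`. The function name and the array layout, one row per time step and one column per instruction parameter, are assumptions. At the start or end of the sequence, where one neighbour is missing, `numpy.interp` falls back to the nearest valid value.

```python
import numpy as np

def repair_instructions(instructions, over_limit):
    """Replace over-limit instructions by interpolating their valid neighbours.

    instructions : array of shape (n_time, n_params)
    over_limit   : indices of the time breakpoints to be replaced
    """
    instructions = np.asarray(instructions, dtype=float)
    steps = np.arange(len(instructions))
    valid = np.ones(len(instructions), dtype=bool)
    valid[np.asarray(over_limit, dtype=int)] = False
    repaired = instructions.copy()
    for j in range(instructions.shape[1]):
        # Degree-one polynomial through the nearest valid steps on each side.
        repaired[~valid, j] = np.interp(
            steps[~valid], steps[valid], instructions[valid, j]
        )
    return repaired

initial = [[10.0, 1.0], [99.0, 9.0], [30.0, 3.0]]
target = repair_instructions(initial, over_limit=[1])
```

Interpolating over all valid steps also handles several consecutive breakpoints, which are then filled along the line between the nearest valid instructions.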