Intelligent flood discharge scheduling method for small reservoirs based on reinforcement learning

CN120197889B · Active · Publication Date: 2026-06-16 · CHANGZHOU INST OF TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CHANGZHOU INST OF TECH
Filing Date: 2025-03-11
Publication Date: 2026-06-16

AI Technical Summary

⚠Technical Problem

Small reservoirs face several problems in flood discharge scheduling: insufficient real-time response capability, poor multi-objective optimization capability and insufficient adaptability to complex dynamic environments. Existing methods struggle to meet the diverse needs of flood control, irrigation and ecological maintenance.

⚗Method used

An intelligent flood discharge scheduling method based on reinforcement learning is adopted. An initial scheduling strategy is generated by a quantum genetic algorithm, and multi-round iterative training is then carried out with an improved reinforcement learning model. A multi-objective reward function and a dynamic weight adjustment mechanism are constructed to optimize flood discharge decisions in real time.

🎯Benefits of technology

It improves the efficiency and reliability of flood discharge scheduling, enables rapid response under complex hydrological and meteorological conditions, dynamically balances flood control, irrigation and ecological objectives, and reduces reservoir operation risks.

✦ Generated by Eureka AI based on patent content.


Figure CN120197889B_ABST


Abstract

The application discloses an intelligent flood discharge scheduling method for small reservoirs based on reinforcement learning, comprising: S1. forming a complete reservoir dataset; S2. constructing, from the reservoir dataset, the environmental state vector required for reinforcement learning; S3. generating an initial flood discharge scheduling strategy; S4. establishing an improved reinforcement learning model based on the environmental state vector and the initial flood discharge scheduling strategy; S5. updating the policy network of the improved reinforcement learning model to obtain the optimal flood discharge decision scheme; S6. operating the gates according to the optimal flood discharge decision scheme and monitoring, after execution, the water level change of the small reservoir, the safety state of the downstream basin and the degree to which ecological demand is satisfied; and S7. feeding this feedback information back into the improved reinforcement learning model, which is iteratively updated and whose policy is corrected in combination with the reservoir dataset and the environmental state vector. The application significantly improves the efficiency and reliability of flood discharge scheduling for small reservoirs.


Description

Technical Field

[0001] This invention relates to the field of reservoir technology, and in particular to an intelligent flood discharge scheduling method for small reservoirs based on reinforcement learning.

Background Technology

[0002] With the intensification of climate change and the frequent occurrence of extreme weather events, the role of small reservoirs in flood control, irrigation and ecological maintenance is becoming increasingly prominent. However, due to their small capacity, simple equipment and insufficient monitoring methods, small reservoirs face significant technical challenges in flood discharge scheduling.

[0003] Currently, flood discharge scheduling of small reservoirs usually adopts scheduling methods based on empirical rules or simple rule-based control models. Traditional methods rely on the historical experience of reservoir managers or single-objective rules. Although they can meet basic flood control needs under normal circumstances, they are obviously insufficient when facing complex hydrological and meteorological conditions and multi-objective scheduling requirements.

[0004] In recent years, automated scheduling technology based on optimization algorithms has been gradually applied to reservoir management. For example, scheduling methods using linear programming, genetic algorithms, or rule-based optimization have been adopted. Existing methods introduce mathematical modeling of scheduling constraints and optimization of objective functions. However, there are still many problems in practical applications. On the one hand, these methods rely on pre-set model assumptions to cope with changes in environmental conditions and lack the ability to learn from highly uncertain environments. On the other hand, these methods are usually solved offline, making it difficult to achieve real-time scheduling optimization.

[0005] In summary, existing technologies have significant shortcomings in real-time response capability, multi-objective optimization capability and adaptability to complex dynamic environments, making it difficult to meet the practical needs of intelligent flood discharge scheduling for small reservoirs. A new method is therefore urgently needed to achieve dynamic adaptive scheduling optimization, improve the accuracy, comprehensiveness and real-time performance of flood discharge decisions, and thus better cope with complex hydrological and meteorological conditions and diverse operational objectives.

Summary of the Invention

[0006] One objective of this invention is to propose an intelligent flood discharge scheduling method for small reservoirs based on reinforcement learning. This invention significantly improves the efficiency and reliability of flood discharge scheduling for small reservoirs.

[0007] A method for intelligent flood discharge scheduling of a small reservoir based on reinforcement learning according to an embodiment of the present invention includes the following steps:

[0008] S1. Obtain real-time reservoir data to form a complete reservoir dataset;

[0009] S2. Construct the environmental state vector required for reinforcement learning based on the reservoir dataset;

[0010] S3. Calculate and optimize the environmental state vector using a quantum genetic algorithm to generate an initial flood discharge scheduling strategy;

[0011] S4. Establish an improved reinforcement learning model based on the environmental state vector and the initial flood discharge scheduling strategy;

[0012] S5. Using the initial flood discharge scheduling strategy as the initial parameters of the improved reinforcement learning model, the strategy network in the improved reinforcement learning model is updated through multiple rounds of iterative training in a simulated environment to obtain the optimal flood discharge decision scheme.

[0013] S6. Operate the gates according to the optimal flood discharge decision plan, monitor the water level changes of the small reservoir, the safety status of the downstream basin and the degree of ecological demand satisfaction after the execution, and record the monitoring results and actual operation data as feedback information.

[0014] S7. Input the feedback information back into the improved reinforcement learning model, and iteratively update and correct the policy by combining the reservoir dataset and the environmental state vector.

[0015] Optionally, S1 includes the following steps:

[0016] S11. Acquire real-time hydrological data for the small reservoir and its surrounding area through sensor networks and remote monitoring equipment. The data include the reservoir water level $H_t$, the inflow rate $Q_t$, the rainfall $R_t$ and the evaporation $E_t$ at time $t$;

[0017] S12. Use sensors to monitor the downstream basin in real time and collect the downstream water level $H_{d,t}$ and the downstream flow rate $Q_{d,t}$ at time $t$;

[0018] S13. Obtain the forecast rainfall $R_{f,t+\Delta t}$ over the future interval $\Delta t$ from the meteorological forecasting system, and model it jointly with the real-time rainfall $R_t$ at the current moment;

[0019] S14. From the historical data, extract for the i-th historical dispatch record the reservoir water level $H_{h,i}$, the inflow $Q_{h,i}$, the flood discharge operation parameters $A_{h,i}$ and the downstream water level $H_{d,h,i}$, and construct the complete set of historical scheduling records $H$;

[0020] S15. Integrate real-time reservoir water level data, inflow data, rainfall data, evaporation data, downstream basin status data, meteorological forecast data, and historical dispatch records H to construct a complete reservoir dataset D:

[0021] $D = \{H_t, Q_t, R_t, E_t, H_{d,t}, Q_{d,t}, R_{f,t+\Delta t}, H\}$.
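The dataset $D$ assembled in S15 can be held in a simple record type. The following Python sketch is one possible layout, not the patent's own data model: the class and field names are hypothetical, the operation parameters $A_{h,i}$ are reduced to a single number, and the values in the usage line are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DispatchRecord:
    """i-th historical dispatch record (H_{h,i}, Q_{h,i}, A_{h,i}, H_{d,h,i})."""
    water_level: float       # H_{h,i}, reservoir water level
    inflow: float            # Q_{h,i}, inflow
    operation: float         # A_{h,i}, simplified to one gate-opening value (assumption)
    downstream_level: float  # H_{d,h,i}, downstream water level


@dataclass
class ReservoirDataset:
    """Hypothetical container for D = {H_t, Q_t, R_t, E_t, H_d,t, Q_d,t, R_f,t+dt, H}."""
    H_t: float     # reservoir water level at time t
    Q_t: float     # inflow rate at time t
    R_t: float     # rainfall at time t
    E_t: float     # evaporation at time t
    H_d_t: float   # downstream water level at time t
    Q_d_t: float   # downstream flow rate at time t
    R_f: float     # forecast rainfall over the next interval dt
    history: List[DispatchRecord] = field(default_factory=list)  # the record set H


# Illustrative values only
D = ReservoirDataset(H_t=102.3, Q_t=35.0, R_t=12.0, E_t=0.4,
                     H_d_t=48.1, Q_d_t=60.0, R_f=25.0,
                     history=[DispatchRecord(101.8, 30.0, 0.5, 47.9)])
```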

[0022] Optionally, S2 includes the following steps:

[0023] S21. Extract key environmental state information of a small reservoir at the current time t based on the reservoir dataset D;

[0024] S22. Construct a reservoir capacity constraint model and a gate operation characteristic model based on the physical characteristics of the small reservoir. The reservoir capacity constraint model is based on the relationship between the current reservoir water level and the maximum allowable water level, and is defined as follows:

[0025] $C = H_t - H_{max}, \quad C \le 0$;

[0026] where $C$ is the amount by which the reservoir water level exceeds its maximum allowable level, so that $C > 0$ indicates the limit has been exceeded;

[0027] The gate operation characteristic model, based on the gate opening degree and maximum opening degree, is defined as follows:

[0028] $G_t \in [0, G_{max}]$;

[0029] where $G_t$ is the current actual opening of the gate;

[0030] S23. Based on the flood control and safety requirements of the downstream area, and taking the current downstream water level $H_{d,t}$ and the maximum safe downstream water level $H_{d,max}$ as constraints, the downstream flood control safety model is defined as follows:

[0031] $S = H_{d,t} - H_{d,max}, \quad S \le 0$;

[0032] where $S$ is the amount by which the downstream water level exceeds its maximum safe level, so that $S > 0$ indicates the limit has been exceeded;
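The two constraint models and the gate range reduce to a few lines of arithmetic. In this minimal sketch the function names are hypothetical, and clamping a requested opening into $[0, G_{max}]$ is an assumed way of enforcing the gate model, which the text only states as a range.

```python
def capacity_exceedance(H_t: float, H_max: float) -> float:
    """C = H_t - H_max; the reservoir constraint holds when C <= 0."""
    return H_t - H_max


def downstream_exceedance(H_d_t: float, H_d_max: float) -> float:
    """S = H_{d,t} - H_{d,max}; the downstream constraint holds when S <= 0."""
    return H_d_t - H_d_max


def clamp_gate(G: float, G_max: float) -> float:
    """Hypothetical enforcement of G_t in [0, G_max] by projection onto the range."""
    return min(max(G, 0.0), G_max)


C = capacity_exceedance(H_t=102.3, H_max=103.0)
S = downstream_exceedance(H_d_t=48.1, H_d_max=48.0)
print(C <= 0, S <= 0)  # → True False (reservoir within limit, downstream over limit)
```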

[0033] S24. Based on the extracted environmental state information, the reservoir capacity constraint model, the gate operation characteristic model and the downstream flood control safety model, define the environmental state vector $E_t$ of the reinforcement learning model:

[0034] $E_t = \{H_t, Q_t, R_t, E_t, H_{d,t}, Q_{d,t}, R_{f,t+\Delta t}, C, S, G_t\}$
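One way to turn S24 into numeric input for a learning model is to fix an ordering of the ten components and compute $C$ and $S$ on the fly. The field order, key names and sample values below are assumptions for illustration; note that the key `E_t` here denotes the evaporation, not the state vector.

```python
import numpy as np

# Hypothetical fixed ordering of the ten components of the state vector E_t
STATE_FIELDS = ("H_t", "Q_t", "R_t", "E_t", "H_d_t", "Q_d_t", "R_f", "C", "S", "G_t")


def build_state(obs: dict, H_max: float, H_d_max: float) -> np.ndarray:
    """Assemble E_t from raw observations plus the two constraint models."""
    full = dict(obs)
    full["C"] = obs["H_t"] - H_max        # reservoir capacity constraint model
    full["S"] = obs["H_d_t"] - H_d_max    # downstream flood control safety model
    return np.array([full[k] for k in STATE_FIELDS], dtype=float)


# Illustrative observation; "E_t" here is evaporation
obs = {"H_t": 102.3, "Q_t": 35.0, "R_t": 12.0, "E_t": 0.4,
       "H_d_t": 48.1, "Q_d_t": 60.0, "R_f": 25.0, "G_t": 0.5}
E = build_state(obs, H_max=103.0, H_d_max=48.0)
print(E.shape)  # → (10,)
```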

[0035] Optionally, S3 includes the following steps:

[0036] S31. Based on the environmental state vector $E_t$, generate a population of individuals for intelligent flood discharge scheduling of the small reservoir; each individual in the population is represented in a multi-layer encoding:

[0037] $S_i = \{G, Q, T\}$;

[0038] where $G$ is the gate opening over the scheduling period, which controls the flood discharge capacity in each period $t_i$ so that the discharge volume satisfies the reservoir capacity constraint and the flood control objectives; $Q$ is the discharge volume vector, giving the volume of water released from the reservoir in each period, which affects the reservoir water level and downstream flood control safety; and $T$ is the discharge time allocation, which determines the duration of each discharge action and is optimized for the dynamic changes in the inflow rate $Q_t$ and rainfall $R_t$;

[0039] The population size N is set as the number of candidate reservoir scheduling schemes;

[0040] S32. Each population individual $S_i$ of the multi-level scheduling scheme is adaptively encoded with qubits, and the state of each qubit is defined as follows:

[0041] $\psi_{i,j} = \alpha_{i,j}|0\rangle + \beta_{i,j}|1\rangle$;

[0042] $|\alpha_{i,j}|^2 + |\beta_{i,j}|^2 = 1$;

[0043] where $\psi_{i,j}$ is the quantum state of individual $S_i$ in the j-th dimension, which corresponds to the gate opening, the flood discharge volume or the time interval; $\alpha_{i,j}$ and $\beta_{i,j}$ are the quantum probability amplitudes of the j-th dimension, representing the probabilities of different scheduling decisions in that dimension; and $|\alpha_{i,j}|^2$ and $|\beta_{i,j}|^2$ are the probabilities of observing states $|0\rangle$ and $|1\rangle$, respectively, which are used for random sampling within the ranges of the gate opening and flood discharge parameters;
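The qubit encoding can be sketched with real-valued amplitudes. This example assumes the common quantum genetic algorithm conventions of an equal-superposition start ($\alpha = \beta = 1/\sqrt{2}$) and of decoding each individual's observed bit string linearly into a parameter range; the function names and the 8-bit resolution are hypothetical, and the adaptive dimension $d$ is not modeled.

```python
import numpy as np


def init_qubits(n_individuals: int, n_bits: int) -> np.ndarray:
    """Amplitudes (alpha, beta) for every qubit, all in equal superposition."""
    return np.full((n_individuals, n_bits, 2), 1.0 / np.sqrt(2.0))


def observe(amp: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Collapse each qubit: bit = 1 with probability |beta|^2, else 0."""
    p_one = amp[..., 1] ** 2
    return (rng.random(p_one.shape) < p_one).astype(int)


def decode(bits: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Map each bit string (most significant bit first) linearly onto [lo, hi]."""
    n_bits = bits.shape[-1]
    weights = 2 ** np.arange(n_bits - 1, -1, -1)
    return lo + (hi - lo) * (bits @ weights) / (2 ** n_bits - 1)


rng = np.random.default_rng(0)
amp = init_qubits(n_individuals=4, n_bits=8)
assert np.allclose((amp ** 2).sum(axis=-1), 1.0)   # |alpha|^2 + |beta|^2 = 1
gates = decode(observe(amp, rng), lo=0.0, hi=1.0)   # four candidate gate openings
```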

[0044] To address the dynamic characteristics of small reservoir scheduling, the encoding dimension d of the qubits is adaptively adjusted:

[0045]

[0046] where $T$ is the total scheduling time, i.e., the overall operating horizon of the small reservoir scheduling scheme; $R_t$ is the current rainfall, reflecting the impact of external precipitation on the reservoir's flood discharge demand; $Q_t$ is the current inflow rate, reflecting the real-time hydrological pressure on the reservoir; and $K$ is the scheduling complexity coefficient.

[0047] S33. Based on the actual flood discharge scheduling needs of small reservoirs, construct a dynamic optimization function $f(S_i, t)$ that dynamically balances the benefits of the flood control, irrigation and ecological objectives. The dynamic optimization function is defined as follows:

[0048] $f(S_i, t) = w_1 f_{flood}(S_i, t) + w_2 f_{irrigation}(S_i, t) + w_3 f_{eco}(S_i, t)$;

[0049] where $w_1$, $w_2$ and $w_3$ are the dynamic weights of the flood control, irrigation and ecological objectives, respectively, adjusted according to the current operating state of the reservoir; $f_{flood}(S_i, t)$ is the flood control constraint benefit function; $f_{irrigation}(S_i, t)$ is the irrigation demand benefit function; and $f_{eco}(S_i, t)$ is the ecological water use guarantee function;
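The text does not state how $w_1$, $w_2$ and $w_3$ are adjusted. The sketch below uses a hypothetical rule that shifts weight toward flood control as $H_t$ rises from the warning level $H_{safe}$ toward $H_{max}$, while keeping the three weights summing to 1; the base weights are arbitrary.

```python
def dynamic_weights(H_t, H_safe, H_max, base=(0.4, 0.35, 0.25)):
    """Hypothetical rule for (w1, w2, w3); base weights must sum to 1."""
    # Flood "pressure" in [0, 1]: 0 at or below H_safe, 1 at or above H_max
    pressure = min(max((H_t - H_safe) / (H_max - H_safe), 0.0), 1.0)
    w1 = base[0] + (1.0 - base[0]) * pressure
    # Rescale irrigation and ecology weights to share the remainder
    scale = (1.0 - w1) / (base[1] + base[2])
    return w1, base[1] * scale, base[2] * scale


def fitness(f_flood, f_irrigation, f_eco, weights):
    """f(S_i, t) = w1 * f_flood + w2 * f_irrigation + w3 * f_eco."""
    w1, w2, w3 = weights
    return w1 * f_flood + w2 * f_irrigation + w3 * f_eco


calm = dynamic_weights(H_t=100.0, H_safe=101.0, H_max=103.0)   # base weights
flood = dynamic_weights(H_t=104.0, H_safe=101.0, H_max=103.0)  # all weight on flood control
```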

[0050] S34. Adjust the individual states with quantum rotation gates to drive the population toward the optimal scheduling strategy:

[0051]

[0052] where the rotation angle $\Delta\theta_{ij}$ is calculated from the current population fitness, so that the reservoir capacity constraint, the flood discharge volume and the gate opening strategy are optimized jointly;

[0053] S35. When the population fitness meets the preset convergence condition, output the final initial flood discharge scheduling strategy:

[0054] $S_{opt} = \{G_{opt}, Q_{opt}, T_{opt}\}$;

[0055] If the convergence condition is not met, return to step S32 to continue iterating until the termination condition is met.
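The rotation formula and the rule for computing $\Delta\theta_{ij}$ are not spelled out in the text, so the sketch below uses a common textbook quantum genetic algorithm scheme as a stand-in. Each qubit is stored as an angle $\varphi$ with $\alpha = \cos\varphi$ and $\beta = \sin\varphi$, qubits that disagree with the best individual are rotated by a fixed angle toward its bits, and convergence is declared after `patience` generations without improvement. Each individual is decoded into a single parameter and the fitness is a hypothetical toy function; in the method itself the fitness would be $f(S_i, t)$ over the full $\{G, Q, T\}$ encoding.

```python
import numpy as np


def qga(fitness, n_ind=20, n_bits=10, lo=0.0, hi=1.0, delta=0.05 * np.pi,
        max_gen=200, tol=1e-6, patience=20, seed=0):
    """Minimal quantum genetic algorithm over one decoded parameter (stand-in scheme)."""
    rng = np.random.default_rng(seed)
    phi = np.full((n_ind, n_bits), np.pi / 4)        # alpha = beta = 1/sqrt(2)
    weights = 2 ** np.arange(n_bits - 1, -1, -1)
    best_x, best_f, stall = lo, -np.inf, 0
    best_bits = np.zeros(n_bits, dtype=int)
    for _ in range(max_gen):
        # Observation: bit is 1 with probability |beta|^2 = sin^2(phi)
        bits = (rng.random(phi.shape) < np.sin(phi) ** 2).astype(int)
        xs = lo + (hi - lo) * (bits @ weights) / (2 ** n_bits - 1)
        fs = np.array([fitness(x) for x in xs])
        k = int(np.argmax(fs))
        if fs[k] > best_f + tol:                     # keep the best-so-far individual
            best_x, best_f, best_bits, stall = xs[k], fs[k], bits[k].copy(), 0
        else:
            stall += 1
        if stall >= patience:                        # convergence condition (S35)
            break
        # Rotation gate (S34): rotate mismatching qubits toward the best individual's bits
        d_theta = np.where(best_bits == 1, delta, -delta)
        phi = np.where(bits != best_bits, phi + d_theta, phi)
        phi = np.clip(phi, 0.01, np.pi / 2 - 0.01)   # keep some sampling diversity
    return float(best_x), float(best_f)


# Hypothetical toy fitness with its optimum at 0.3
best_x, best_f = qga(lambda x: -(x - 0.3) ** 2)
```

Storing angles rather than amplitude pairs keeps $|\alpha|^2 + |\beta|^2 = 1$ exact under every rotation, and clipping $\varphi$ away from 0 and $\pi/2$ preserves a little randomness in later generations.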

[0056] Optionally, the flood control constraint benefit function $f_{flood}(S_i, t)$ is defined as the safety benefit that the flood discharge schedule provides for the current reservoir water level and the downstream water level:

[0057]

[0058] where $H_{max}$ is the maximum safe water level allowed for the reservoir; $H_{safe}$ is the reservoir's safety warning water level; $H_{d,max}$ and $H_{d,safe}$ are the maximum safe water level and the safety warning water level of the downstream basin, respectively; $Q_{out,t}$ is the discharge volume, which directly affects downstream water level changes; and $Q_{in,t}$ is the current inflow rate;

[0059] The irrigation demand benefit function $f_{irrigation}(S_i, t)$ measures the degree to which the scheduling scheme meets agricultural irrigation needs:

[0060]

[0061] where $\eta$ is the water conveyance efficiency; $Q_{irrigation}$ is the current agricultural irrigation demand; and $Q_{out,t}$ is the downstream ecological water demand, used to measure the minimum flow required to maintain the river ecosystem;

[0062] The ecological water use guarantee function $f_{eco}(S_i, t)$ evaluates how well the scheduling plan guarantees water supply to the downstream ecosystem:

[0063]

[0064] where $Q_{eco}$ is the minimum amount of water required to maintain the downstream ecosystem, and $|Q_{eco} - Q_{out,t}|$ measures the deviation in ecological water use.
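The exact formulas of the three benefit functions are not reproduced in the text. Purely as illustrative placeholders, the functions below use only the variables named above: a linear safety margin between each warning level and its maximum, the share of irrigation demand met after conveyance loss $\eta$, and one minus the relative ecological deviation $|Q_{eco} - Q_{out,t}|$. None of these forms should be read as the patent's definitions.

```python
def ramp_down(x, warn, limit):
    """1.0 at or below the warning level, 0.0 at or above the limit, linear in between."""
    if x <= warn:
        return 1.0
    if x >= limit:
        return 0.0
    return (limit - x) / (limit - warn)


def f_flood(H, H_safe, H_max, H_d, H_d_safe, H_d_max):
    """Placeholder: mean safety margin of the reservoir and downstream levels."""
    return 0.5 * (ramp_down(H, H_safe, H_max) + ramp_down(H_d, H_d_safe, H_d_max))


def f_irrigation(Q_supply, Q_irrigation, eta):
    """Placeholder: fraction of irrigation demand met after conveyance losses, capped at 1."""
    return min(eta * Q_supply / Q_irrigation, 1.0)


def f_eco(Q_out, Q_eco):
    """Placeholder: 1 - |Q_eco - Q_out| / Q_eco, floored at 0."""
    return max(1.0 - abs(Q_eco - Q_out) / Q_eco, 0.0)
```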

[0065] Optionally, S4 includes the following steps:

[0066] S41. The improved reinforcement learning model is initialized from the environmental state vector and the initial flood discharge scheduling strategy as follows:

[0067] $\theta_0 = \mathrm{Init}(w, b, E_t, S_{opt})$;

[0068] where $\theta_0$ is the initial parameter set of the improved reinforcement learning model, comprising the weights $w$ and biases $b$ used to initialize the reinforcement learning policy network, and $\mathrm{Init}$ is the initialization function, whose inputs are the environmental state vector $E_t$ and the initial flood discharge scheduling strategy $S_{opt}$. Based on the current hydrological conditions of the small reservoir and the preliminary optimization results, it generates the basic architecture of the scheduling policy network.

[0069] S42. The action space $A$ is defined as the set of dynamic ranges of the gate opening $G_t$, the discharge period $\Delta T$ and the discharge volume $Q_{out,t}$, calculated as follows:

[0070] $G_t \in [G_{safe}, G_{max}]$;

[0071]

[0072]

[0073] where $G_{safe}$ is the minimum gate opening required to ensure that gate operation poses no risk to the structure, and $\Delta T$ is the flood discharge period, calculated from the portion of the current reservoir water level $H_t$ that exceeds the limit and from the difference between the inflow $Q_{in,t}$ and the discharge $Q_{out,t}$, which ensures that the scheduling action is completed within a safe timeframe;
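The bounds on $Q_{out,t}$ and the formula for $\Delta T$ are not reproduced in the text. The sketch below is a hypothetical reading of the description: it clips the gate opening to $[G_{safe}, G_{max}]$, clips the discharge to an assumed capacity `Q_max`, and takes $\Delta T$ as the time needed to release the storage above a reference level `H_limit` at the net rate $Q_{out,t} - Q_{in,t}$, assuming a constant reservoir surface area.

```python
from dataclasses import dataclass


@dataclass
class Action:
    gate: float       # G_t
    period: float     # Delta T, in the time unit of the flow rates
    discharge: float  # Q_out,t


def discharge_period(H_t, H_limit, Q_in, Q_out, surface_area):
    """Hypothetical Delta T: time to release storage above H_limit at net rate Q_out - Q_in.

    Assumes a prismatic reservoir (constant surface area). Returns inf when the
    discharge does not exceed the inflow, since the excess would never drain.
    """
    excess_volume = max(H_t - H_limit, 0.0) * surface_area
    if excess_volume == 0.0:
        return 0.0
    net_outflow = Q_out - Q_in
    if net_outflow <= 0.0:
        return float("inf")
    return excess_volume / net_outflow


def project_action(gate, Q_out, G_safe, G_max, Q_max, H_t, H_limit, Q_in, surface_area):
    """Clip a raw policy output into the action space A (Q_max is an assumed capacity)."""
    g = min(max(gate, G_safe), G_max)
    q = min(max(Q_out, 0.0), Q_max)
    return Action(g, discharge_period(H_t, H_limit, Q_in, q, surface_area), q)
```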

[0074] S43. Based on the multi-objective requirements of small reservoirs, construct a reward function $R(E_t, a_t)$ that covers flood control benefit, agricultural irrigation demand satisfaction, ecological water use security and downstream safety:

[0075] $R(E_t, a_t) = \alpha_1 f_{flood}(E_t, a_t) + \alpha_2 f_{irrigation}(E_t, a_t) + \alpha_3 f_{eco}(E_t, a_t) + \alpha_4 f_{safety}(E_t, a_t)$;

[0077] where $f_{flood}(E_t, a_t)$ measures the contribution of the flood discharge action to the safety of the current reservoir water level; $f_{irrigation}(E_t, a_t)$ is the degree to which the current discharge meets irrigation needs; $f_{eco}(E_t, a_t)$ is the guarantee of current ecological water use; $f_{safety}(E_t, a_t)$ measures the impact of flood discharge on downstream water level safety; and $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$ are the weighting coefficients of the respective objectives.

[0078] S44. Combining the action space A from step S42, the reward function from step S43, and the environment state vector, the improved reinforcement learning model is defined as a quadruple:

[0079] $\langle \varepsilon, A, P, R \rangle$;

[0080] Where ε is the environmental state space, corresponding to the current and predicted hydrological information of the small reservoir; A is the action space, corresponding to the gate opening, flood discharge period and discharge volume; P is the environmental state transition probability distribution, which is calculated in combination with the water level evolution and downstream response during the reservoir scheduling process; and R is the reward function, which measures the degree to which each scheduling action meets the multi-objective requirements.
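The quadruple $\langle \varepsilon, A, P, R \rangle$ maps naturally onto a simulator with a `step` method. The toy environment below is a hypothetical sketch, not the patent's simulator: the state is reduced to reservoir level, inflow and downstream level; the action is the discharge alone; $P$ is realized by sampling a random-walk inflow rather than as an explicit distribution; the downstream stage-discharge relation is an invented linear one; and the four reward terms are simple penalty placeholders weighted by $\alpha_1, \ldots, \alpha_4$.

```python
import numpy as np


class ToyReservoirEnv:
    """Hypothetical single-reservoir simulator illustrating <epsilon, A, P, R>."""

    def __init__(self, area=5e4, H_safe=101.0, H_d_max=48.0, Q_eco=10.0,
                 Q_irr=15.0, dt=3600.0, alphas=(0.4, 0.2, 0.2, 0.2), seed=0):
        self.area, self.H_safe, self.H_d_max = area, H_safe, H_d_max
        self.Q_eco, self.Q_irr, self.dt, self.alphas = Q_eco, Q_irr, dt, alphas
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self, H0=101.5, Q0=40.0):
        self.H, self.Q_in = H0, Q0
        self.H_d = self._downstream_level(0.0)
        return self.state()

    def state(self):
        # Reduced state: reservoir level, inflow, downstream level
        return np.array([self.H, self.Q_in, self.H_d])

    def _downstream_level(self, Q_out):
        # Hypothetical linear stage-discharge relation for the downstream channel
        return 46.0 + 0.02 * Q_out

    def step(self, Q_out):
        Q_out = max(Q_out, 0.0)
        # Water balance over one step, assuming a constant surface area
        self.H += (self.Q_in - Q_out) * self.dt / self.area
        self.H_d = self._downstream_level(Q_out)
        reward = self._reward(Q_out)
        # Transition P: next inflow follows a random walk (assumption), floored at zero
        self.Q_in = max(self.Q_in + self.rng.normal(0.0, 5.0), 0.0)
        return self.state(), reward

    def _reward(self, Q_out):
        a1, a2, a3, a4 = self.alphas
        f_flood = -max(self.H - self.H_safe, 0.0)            # level above warning
        f_irr = -max(self.Q_irr - Q_out, 0.0) / self.Q_irr   # unmet irrigation share
        f_eco = -abs(self.Q_eco - Q_out) / self.Q_eco        # ecological deviation
        f_safety = -max(self.H_d - self.H_d_max, 0.0)        # downstream exceedance
        return a1 * f_flood + a2 * f_irr + a3 * f_eco + a4 * f_safety


env = ToyReservoirEnv()
s, r = env.step(40.0)   # discharge equal to inflow: reservoir level unchanged
```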

[0081] S45. Combining steps S41 to S44, based on the environmental state vector and the initial flood discharge scheduling strategy, complete the parameter setting and structure definition of the improved reinforcement learning model, forming a scheduling model that can output flood discharge decisions in real time under multi-objective constraints.

[0082] Optionally, S5 includes the following steps:

[0083] S51. Based on the quadruple $\langle \varepsilon, A, P, R \rangle$ of the improved reinforcement learning model, the hydrological process of the small reservoir is modeled in a simulation environment. The simulation environment extrapolates the water level, the inflow and the downstream basin response at successive times $\{t, t+1, \ldots\}$ according to the reservoir capacity constraints, the gate operation characteristics and historical data.

[0084] S52. The policy network of the improved reinforcement learning model is loaded into the simulation environment with the initial parameter set $\theta_0$ and the initial scheduling strategy $S_{opt}$.

[0085] S53. Run several training rounds in the simulation environment; each round consists of the following process:

[0086] Observe the environmental state vector $E_t$ at time $t$;

[0087] Select an action $a_t \in A$ according to the current policy network $\pi_{\theta_k}(a_t \mid E_t)$, where $\theta_k$ denotes the network parameters updated after the k-th round of training;

[0088] After action $a_t$ is executed, the simulation environment updates the reservoir and downstream basin states to $E_{t+1}$ according to the state transition probability $P$, and the reward $R(E_t, a_t)$ is calculated;

[0089] Using the reward $R(E_t, a_t)$ and the next state $E_{t+1}$, update the policy network by adjusting its parameters from $\theta_k$ to $\theta_{k+1}$. The update uses either gradient-based optimal value function approximation or a temporal-difference-based adaptive update.
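The training round above can be illustrated with the temporal-difference option. This sketch substitutes tabular Q-learning for the policy network and runs it on a hypothetical discretized reservoir (10 level bins, three release settings, random inflow of 0 to 2 units per step); the thresholds, penalties and hyperparameters are arbitrary choices for illustration.

```python
import numpy as np

N_LEVELS, N_ACTIONS = 10, 3        # level bins; actions = release 0, 1 or 2 units
FLOOD_LEVEL, LOW_LEVEL = 8, 2      # hypothetical thresholds (bin indices)


def step(level, action, rng):
    """Toy transition P: random inflow of 0-2 units, release `action` units."""
    inflow = rng.integers(0, 3)
    nxt = int(np.clip(level + inflow - action, 0, N_LEVELS - 1))
    reward = -10.0 if nxt >= FLOOD_LEVEL else 0.0    # flood-control penalty
    reward -= 1.0 if nxt < LOW_LEVEL else 0.0         # shortage penalty (irrigation/ecology)
    return nxt, reward


def train(episodes=2000, horizon=50, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_LEVELS, N_ACTIONS))
    for _ in range(episodes):
        s = int(rng.integers(0, N_LEVELS))
        for _ in range(horizon):
            # Epsilon-greedy action selection from the current value estimates
            a = int(rng.integers(N_ACTIONS)) if rng.random() < eps else int(np.argmax(Q[s]))
            s2, r = step(s, a, rng)
            # Temporal-difference (Q-learning) update toward r + gamma * max_a' Q(s', a')
            Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return Q


Q = train()
policy = Q.argmax(axis=1)   # greedy release for each level bin
```

After training, the greedy action in the highest level bin should be the largest release, because any smaller release leaves the reservoir in the flood zone with certainty.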

[0090] S54. After multiple rounds of training, assess the convergence of the policy network. If the preset convergence condition is met, output the new flood discharge scheduling policy $\pi_{\theta^*}(a_t \mid E_t)$ and its network parameters $\theta^*$; otherwise, return to step S53 and continue training until a better flood discharge scheduling strategy under complex hydrological and meteorological conditions is obtained. The optimal flood discharge decision scheme is then generated:

[0091] $S'_{opt} = \{G'_{opt}, Q'_{opt}, T'_{opt}\}$
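S54 leaves the convergence condition unspecified. A hypothetical test combining two common criteria, a small largest parameter change between rounds and a plateau in mean episode return, might look like this; the thresholds and window size are assumptions.

```python
import numpy as np


def has_converged(theta_prev, theta_new, returns, tol_param=1e-3, window=10, tol_return=1e-2):
    """Hypothetical S54 test.

    Converged when (a) the largest parameter change between two training rounds
    is below tol_param and (b) the mean episode return over the last `window`
    rounds differs from the preceding window by less than tol_return (relative).
    """
    small_step = np.max(np.abs(np.asarray(theta_new) - np.asarray(theta_prev))) < tol_param
    if len(returns) < 2 * window:
        return False          # not enough history to judge a plateau
    recent = np.mean(returns[-window:])
    before = np.mean(returns[-2 * window:-window])
    plateau = abs(recent - before) <= tol_return * max(abs(before), 1e-12)
    return bool(small_step and plateau)
```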

[0092] The beneficial effects of this invention are:

[0093] (1) In the initialization stage of the reinforcement learning model, the present invention introduces a quantum genetic algorithm that optimizes the scheduling strategy efficiently by combining qubit encoding with rotation gate operations. Traditional random initialization often converges slowly and yields unstable initial decisions because the exploration space is large, whereas the quantum genetic algorithm can quickly locate globally optimal or near-optimal scheduling schemes under multi-objective constraints. The adaptive adjustment of the qubit encoding dimension lets the algorithm adapt dynamically to the complex environment of small reservoir flood discharge, and rotation-gate updates of the probability amplitudes evolve the population efficiently, which greatly improves the quality of the initial scheduling strategy.

[0094] (2) This invention constructs a multi-objective reward function covering flood control, irrigation, ecological maintenance and downstream safety and introduces a dynamic weight adjustment mechanism, so that the scheduling strategy can optimize the balance between different objectives in real time according to the current hydrological conditions of the reservoir. The reward function can prioritize the safety of reservoir capacity during the flood peak period, dynamically allocate water resources during the peak period of irrigation demand, and ensure the minimum requirements for downstream ecological water use. Through real-time evaluation of downstream water level changes and ecological flow deviations, the algorithm can accurately regulate the flood discharge volume and avoid the phenomenon of resource waste or increased risk caused by the single objective bias in traditional methods.

[0095] (3) This invention achieves multi-round strategy iteration optimization in a simulated environment through the closed-loop training mechanism of the reinforcement learning model, enabling the model to learn the optimal scheduling scheme autonomously under complex and variable hydrological and meteorological conditions. The reinforcement learning model can not only dynamically adjust the scheduling strategy using historical data and real-time monitoring data, but also improve its decision-making ability in unknown scenarios through adaptive updates of the policy network. It can respond quickly under sudden heavy rainfall or extreme weather conditions and output safe and efficient flood discharge strategies in real time, effectively reducing the risk of reservoir operation.

Attached Figure Description

[0096] The accompanying drawings are provided to further illustrate the invention and form part of the specification. They are used in conjunction with embodiments of the invention to explain the invention and do not constitute a limitation thereof. In the drawings:

[0097] Figure 1 is a flowchart of the intelligent flood discharge scheduling method for small reservoirs based on reinforcement learning proposed in this invention.

Detailed Implementation

[0098] The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic diagrams, illustrating only the basic structure of the invention, and therefore only show the components relevant to the invention.

[0099] Referring to Figure 1, a method for intelligent flood discharge scheduling of small reservoirs based on reinforcement learning includes the following steps:

[0100] S1. Obtain real-time reservoir data to form a complete reservoir dataset;

[0101] S2. Construct the environmental state vector required for reinforcement learning based on the reservoir dataset;

[0102] S3. Utilize quantum genetic algorithms to calculate and optimize the environmental state vector, generating an initial flood discharge scheduling strategy;

[0103] S4. An improved reinforcement learning model is established based on the environmental state vector and the initial flood discharge scheduling strategy;

[0104] S5. Using the initial flood discharge scheduling strategy as the initial parameters of the improved reinforcement learning model, the strategy network in the improved reinforcement learning model is updated through multiple rounds of iterative training in a simulated environment to obtain the optimal flood discharge decision scheme.

[0105] S6. Operate the gates according to the optimal flood discharge decision plan, monitor the water level changes of the small reservoir, the safety status of the downstream basin and the degree of ecological demand satisfaction after the execution, and record the monitoring results and actual operation data as feedback information.

[0106] S7. Input the feedback information back into the improved reinforcement learning model, and iteratively update and correct the policy by combining the reservoir dataset and the environmental state vector.

[0107] In this embodiment, S1 includes the following steps:

[0108] S11. Acquire real-time hydrological data for the small reservoir and its surrounding area through sensor networks and remote monitoring equipment. The data include the reservoir water level $H_t$, the inflow rate $Q_t$, the rainfall $R_t$ and the evaporation $E_t$ at time $t$;

[0109] S12. Use sensors to monitor the downstream basin in real time and collect the downstream water level $H_{d,t}$ and the downstream flow rate $Q_{d,t}$ at time $t$;

[0110] S13. Obtain the forecast rainfall $R_{f,t+\Delta t}$ over the future interval $\Delta t$ from the meteorological forecasting system, and model it jointly with the real-time rainfall $R_t$ at the current moment;

[0111] S14. From the historical data, extract for the i-th historical dispatch record the reservoir water level $H_{h,i}$, the inflow $Q_{h,i}$, the flood discharge operation parameters $A_{h,i}$ and the downstream water level $H_{d,h,i}$, and construct the complete set of historical scheduling records $H$;

[0112] S15. Integrate real-time reservoir water level data, inflow data, rainfall data, evaporation data, downstream basin status data, meteorological forecast data, and historical dispatch records H to construct a complete reservoir dataset D:

[0113] $D = \{H_t, Q_t, R_t, E_t, H_{d,t}, Q_{d,t}, R_{f,t+\Delta t}, H\}$.

[0114] In this embodiment, S2 includes the following steps:

[0115] S21. Extract key environmental state information of a small reservoir at the current time t based on the reservoir dataset D;

[0116] S22. Construct a reservoir capacity constraint model and a gate operation characteristic model based on the physical characteristics of small reservoirs. The reservoir capacity constraint model is based on the relationship between the current reservoir water level and the maximum allowable water level, and is defined as follows:

[0117] $C = H_t - H_{max}, \quad C \le 0$;

[0118] where $C$ is the amount by which the reservoir water level exceeds its maximum allowable level, so that $C > 0$ indicates the limit has been exceeded;

[0119] The gate operation characteristic model, based on the gate opening degree and maximum opening degree, is defined as follows:

[0120] $G_t \in [0, G_{max}]$;

[0121] where $G_t$ is the current actual opening of the gate;

[0122] S23. Based on the flood control and safety requirements of the downstream area, and taking the current downstream water level $H_{d,t}$ and the maximum safe downstream water level $H_{d,max}$ as constraints, the downstream flood control safety model is defined as follows:

[0123] $S = H_{d,t} - H_{d,max}, \quad S \le 0$;

[0124] where $S$ is the amount by which the downstream water level exceeds its maximum safe level, so that $S > 0$ indicates the limit has been exceeded;

[0125] S24. Based on the extracted environmental state information, the reservoir capacity constraint model, the gate operation characteristic model and the downstream flood control safety model, define the environmental state vector $E_t$ of the reinforcement learning model:

[0126] $E_t = \{H_t, Q_t, R_t, E_t, H_{d,t}, Q_{d,t}, R_{f,t+\Delta t}, C, S, G_t\}$

[0127] In this embodiment, S3 includes the following steps:

[0128] S31. Based on the environmental state vector $E_t$, generate a population of individuals for intelligent flood discharge scheduling of the small reservoir; each individual in the population is represented in a multi-layer encoding:

[0129] $S_i = \{G, Q, T\}$;

[0130] where $G$ is the gate opening over the scheduling period, which controls the flood discharge capacity in each period $t_i$ so that the discharge volume satisfies the reservoir capacity constraint and the flood control objectives; $Q$ is the discharge volume vector, giving the volume of water released from the reservoir in each period, which affects the reservoir water level and downstream flood control safety; and $T$ is the discharge time allocation, which determines the duration of each discharge action and is optimized for the dynamic changes in the inflow rate $Q_t$ and rainfall $R_t$;

[0131] The population size N is set as the number of candidate reservoir scheduling schemes;

[0132] S32. Each population individual $S_i$ of the multi-level scheduling scheme is adaptively encoded with qubits, and the state of each qubit is defined as follows:

[0133] $\psi_{i,j} = \alpha_{i,j}|0\rangle + \beta_{i,j}|1\rangle$;

[0134] $|\alpha_{i,j}|^2 + |\beta_{i,j}|^2 = 1$;

[0135] where $\psi_{i,j}$ is the quantum state of individual $S_i$ in the j-th dimension, which corresponds to the gate opening, the flood discharge volume or the time interval; $\alpha_{i,j}$ and $\beta_{i,j}$ are the quantum probability amplitudes of the j-th dimension, representing the probabilities of different scheduling decisions in that dimension; and $|\alpha_{i,j}|^2$ and $|\beta_{i,j}|^2$ are the probabilities of observing states $|0\rangle$ and $|1\rangle$, respectively, which are used for random sampling within the ranges of the gate opening and flood discharge parameters;

[0136] To address the dynamic characteristics of small reservoir scheduling, the encoding dimension d of the qubits is adaptively adjusted:

[0137]

[0138] where $T$ is the total scheduling time, i.e., the overall operating horizon of the small reservoir scheduling scheme; $R_t$ is the current rainfall, reflecting the impact of external precipitation on the reservoir's flood discharge demand; $Q_t$ is the current inflow rate, reflecting the real-time hydrological pressure on the reservoir; and $K$ is the scheduling complexity coefficient.

[0139] S33. Based on the actual needs of flood discharge scheduling for small reservoirs, construct a dynamic optimization function $f(S_i, t)$ that dynamically balances the benefits of the flood control, irrigation, and ecological objectives. The dynamic optimization function is defined as follows:

[0140] $f(S_i, t) = w_1 f_{flood}(S_i, t) + w_2 f_{irrigation}(S_i, t) + w_3 f_{eco}(S_i, t)$;

[0141] Where $w_1$, $w_2$, and $w_3$ are the dynamic weights for the flood control, irrigation, and ecological objectives, respectively, and are adjusted according to the current operating status of the reservoir; $f_{flood}(S_i, t)$ is the flood control constraint benefit function; $f_{irrigation}(S_i, t)$ is the irrigation demand benefit function; and $f_{eco}(S_i, t)$ is the ecological water use guarantee function;
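The weighted objective of S33 is a linear combination of three benefit scores. The sketch below evaluates it together with one possible dynamic-weight rule. The rule itself (shifting weight from irrigation and ecology toward flood control as the reservoir level rises from a warning level toward the maximum level, with the weights summing to 1) is a hypothetical illustration; the text states only that the weights follow the current operating status of the reservoir.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Weights:
    w_flood: float
    w_irrigation: float
    w_eco: float

def dynamic_weights(level: float, warning_level: float, max_level: float) -> Weights:
    """Hypothetical rule: w_flood grows linearly from 0.4 to 0.9 as the level rises
    from the warning level to the maximum level; the remainder is split evenly."""
    if max_level <= warning_level:
        raise ValueError("max_level must exceed warning_level")
    pressure = (level - warning_level) / (max_level - warning_level)
    pressure = min(max(pressure, 0.0), 1.0)
    w_flood = 0.4 + 0.5 * pressure
    rest = 1.0 - w_flood
    return Weights(w_flood, rest / 2, rest / 2)

def objective(f_flood: float, f_irrigation: float, f_eco: float, w: Weights) -> float:
    """f(S_i, t) = w1 * f_flood + w2 * f_irrigation + w3 * f_eco."""
    return w.w_flood * f_flood + w.w_irrigation * f_irrigation + w.w_eco * f_eco

# Usage with hypothetical benefit scores for one candidate S_i at a 19.5 m level.
w = dynamic_weights(level=19.5, warning_level=19.0, max_level=22.0)
score = objective(0.8, 0.6, 0.9, w)
```

Keeping the weights normalised makes fitness values comparable across time steps even as the priorities shift.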

[0142] S34. Adjust the individual states through quantum rotation gates to move the population toward the optimal scheduling strategy:

[0143]

[0144] Where $\Delta\theta_{ij}$ is the rotation angle, calculated from the current population fitness, with collaborative optimization combining reservoir capacity constraints, flood discharge volume, and gate opening strategy;
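A quantum rotation gate is usually written as the $2\times 2$ rotation matrix $\begin{pmatrix}\cos\Delta\theta & -\sin\Delta\theta\\ \sin\Delta\theta & \cos\Delta\theta\end{pmatrix}$ applied to $(\alpha_{i,j}, \beta_{i,j})$. The sketch below uses that standard form with a fixed angle magnitude (0.05π) and a common quantum-genetic-algorithm direction rule: rotate only the qubits whose observed bit differs from the best individual's bit, and only for individuals that are less fit than the best. The fixed magnitude and the direction rule are assumptions standing in for the fitness-based $\Delta\theta_{ij}$ described above.

```python
import math

def rotate(alpha: float, beta: float, dtheta: float) -> tuple[float, float]:
    """Apply [[cos, -sin], [sin, cos]] to (alpha, beta); the norm is preserved."""
    c, s = math.cos(dtheta), math.sin(dtheta)
    return c * alpha - s * beta, s * alpha + c * beta

def rotation_sign(alpha: float, beta: float, target_bit: int) -> int:
    """Direction (+1/-1) that raises P(target_bit); 0 if it is already certain."""
    p_one = beta * beta
    if (target_bit == 1 and p_one >= 1.0 - 1e-12) or (target_bit == 0 and p_one <= 1e-12):
        return 0
    if alpha * beta == 0.0:
        return 1  # qubit lies on an axis away from the target: either direction helps
    sign = 1 if alpha * beta > 0 else -1
    return sign if target_bit == 1 else -sign

def update_individual(qubits, observed_bits, best_bits, fitness, best_fitness,
                      delta=0.05 * math.pi):
    """Rotate qubits whose observed bit differs from the best individual's bit,
    but only when this individual is less fit than the best."""
    updated = []
    for (a, b), x, bb in zip(qubits, observed_bits, best_bits):
        if x != bb and fitness < best_fitness:
            a, b = rotate(a, b, rotation_sign(a, b, bb) * delta)
        updated.append((a, b))
    return updated
```

Because a rotation matrix is orthogonal, the normalisation constraint $|\alpha|^2 + |\beta|^2 = 1$ holds automatically after every update.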

[0145] S35. When the population fitness meets the preset convergence condition, output the final initial flood discharge scheduling strategy:

[0146] $S_{opt} = \{G_{opt}, Q_{opt}, T_{opt}\}$;

[0147] If the convergence condition is not met, return to step S32 to continue iterating until the termination condition is met.
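Steps S32 to S35 together form an observe, evaluate, rotate, and check-convergence loop. The minimal sketch below runs that loop on a single decoded variable. The convergence condition (no improvement for a fixed number of generations), the population size, and the toy fitness that prefers a 60 % gate opening are all hypothetical stand-ins for the multi-objective $f(S_i, t)$ and the unspecified preset condition.

```python
import math
import random

def _rotate(a, b, d):
    """Quantum rotation gate applied to amplitudes (a, b)."""
    c, s = math.cos(d), math.sin(d)
    return c * a - s * b, s * a + c * b

def _sign(a, b, target):
    """Rotation direction that raises P(target); 0 once the target is certain."""
    p1 = b * b
    if (target == 1 and p1 >= 1 - 1e-12) or (target == 0 and p1 <= 1e-12):
        return 0
    if a * b == 0:
        return 1
    s = 1 if a * b > 0 else -1
    return s if target == 1 else -s

def qga_maximise(fitness, n_bits, pop_size=10, max_gen=200,
                 delta=0.05 * math.pi, patience=20, seed=0):
    """Observe -> evaluate -> keep best -> rotate; stop after `patience`
    generations without improvement (the assumed convergence condition)."""
    rng = random.Random(seed)
    amp = 1 / math.sqrt(2)
    pop = [[(amp, amp)] * n_bits for _ in range(pop_size)]
    best_bits, best_fit, stall = None, -math.inf, 0
    for _ in range(max_gen):
        obs = [[1 if rng.random() < b * b else 0 for _, b in ind] for ind in pop]
        fits = [fitness(bits) for bits in obs]
        k = max(range(pop_size), key=lambda i: fits[i])
        if fits[k] > best_fit:
            best_bits, best_fit, stall = list(obs[k]), fits[k], 0
        else:
            stall += 1
            if stall >= patience:
                break
        for i, ind in enumerate(pop):
            if fits[i] >= best_fit:
                continue  # only individuals worse than the best are rotated
            for j, (a, b) in enumerate(ind):
                if obs[i][j] != best_bits[j]:
                    ind[j] = _rotate(a, b, _sign(a, b, best_bits[j]) * delta)
    return best_bits, best_fit, pop

def decode(bits, low=0.0, high=1.0):
    """Unsigned binary (MSB first) mapped linearly onto [low, high]."""
    return low + (high - low) * int("".join(map(str, bits)), 2) / (2 ** len(bits) - 1)

def toy_fitness(bits):
    """Hypothetical single-objective stand-in for f(S_i, t): prefer a 60 % opening."""
    return -(decode(bits) - 0.6) ** 2

best_bits, best_fit, population = qga_maximise(toy_fitness, n_bits=8)
```

In the full method each individual would carry separate qubit blocks for $G$, $Q$, and $T$, and the fitness would be the weighted objective of S33; the returned best individual plays the role of $S_{opt}$.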

[0148] In this embodiment, the flood control constraint benefit function $f_{flood}(S_i, t)$ is defined as the safety guarantee benefit that flood discharge scheduling provides for the current reservoir water level and the downstream water level:

[0149]

[0150] Among them, $H_{max}$ is the maximum safe water level allowed by the reservoir; $H_{safe}$ is the safety warning water level of the reservoir; $H_{d,max}$ and $H_{d,safe}$ are the maximum safe water level and the safety warning water level of the downstream basin, respectively; $Q_{out,t}$ is the discharge volume, which directly affects the change in downstream water level; and $Q_{in,t}$ is the current inflow rate;

[0151] The irrigation demand benefit function $f_{irrigation}(S_i, t)$ measures the degree to which the scheduling scheme meets agricultural irrigation needs:

[0152]

[0153] Where $\eta$ is the water conveyance efficiency, $Q_{irrigation}$ is the current agricultural irrigation demand, and $Q_{out,t}$ is the discharge volume in period $t$;

[0154] The ecological water use guarantee function $f_{eco}(S_i, t)$ evaluates how well the scheduling plan guarantees the water supply of the downstream ecosystem:

[0155]

[0156] Among them, $Q_{eco}$ is the minimum amount of water required to maintain the downstream ecosystem, and $|Q_{eco} - Q_{out,t}|$ measures the degree of deviation in ecological water use.
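The exact formulas of the three benefit functions are not given in this text, only the quantities they depend on. The sketch below shows one hypothetical way to build bounded scores in [0, 1] from those quantities: a linear safety margin between each warning level and maximum level for $f_{flood}$, the capped share of irrigation demand met by $\eta \cdot Q_{out,t}$ for $f_{irrigation}$, and one minus the relative deviation $|Q_{eco} - Q_{out,t}| / Q_{eco}$ for $f_{eco}$. These forms are illustrative assumptions, not the patent's equations; for instance, this $f_{flood}$ uses only water levels and ignores $Q_{in,t}$ and $Q_{out,t}$.

```python
def _margin(level: float, warning: float, maximum: float) -> float:
    """1 at or below the warning level, 0 at or above the maximum, linear in between."""
    if maximum <= warning:
        raise ValueError("maximum must exceed the warning level")
    if level <= warning:
        return 1.0
    if level >= maximum:
        return 0.0
    return (maximum - level) / (maximum - warning)

def f_flood(h, h_safe, h_max, h_d, h_d_safe, h_d_max):
    """Hypothetical: reservoir safety margin times downstream safety margin."""
    return _margin(h, h_safe, h_max) * _margin(h_d, h_d_safe, h_d_max)

def f_irrigation(q_out, q_irrigation, eta):
    """Hypothetical: share of demand Q_irrigation met by eta * Q_out, capped at 1."""
    if q_irrigation <= 0:
        return 1.0
    return min(eta * q_out / q_irrigation, 1.0)

def f_eco(q_out, q_eco):
    """Hypothetical: 1 - |Q_eco - Q_out| / Q_eco, floored at 0."""
    if q_eco <= 0:
        raise ValueError("q_eco must be positive")
    return max(0.0, 1.0 - abs(q_eco - q_out) / q_eco)
```

Bounding every score to [0, 1] keeps the weighted sum in S33 meaningful, since no single objective can dominate merely because of its units.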

[0157] In this embodiment, S4 includes the following steps:

[0158] S41. Based on the environmental state vector and the initial flood discharge scheduling strategy, the improved reinforcement learning model is initialized as follows:

[0159] $\theta_0 = \mathrm{Init}(w, b, E_t, S_{opt})$;

[0160] Where $\theta_0$ is the initial parameter set of the improved reinforcement learning model, including weights $w$ and biases $b$, and is used to initialize the reinforcement learning policy network; $\mathrm{Init}$ is the initialization function, whose inputs are the environment state vector $E_t$ and the initial flood discharge scheduling strategy $S_{opt}$. Based on the current hydrological conditions of the small reservoir and the preliminary optimization results, it generates the basic architecture of the scheduling strategy network.

[0161] S42. The action space $A$ is defined as the set of the gate opening $G_t$, the discharge period $\Delta T$, and the discharge volume $Q_{out,t}$, whose dynamic ranges are calculated as follows:

[0162] $G_t \in [G_{safe}, G_{max}]$;

[0163]

[0164] Among them, $G_{safe}$ is the minimum gate opening required to ensure that gate operation does not pose a risk to the structure, and $\Delta T$ is the flood discharge period, calculated from the amount by which the current reservoir water level $H_t$ exceeds its limit and from the difference between the inflow $Q_{in,t}$ and the discharge $Q_{out,t}$, so that the scheduling action is completed within a safe timeframe;
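A common way to enforce an action space like this is to project each raw policy output onto the feasible set. The sketch below clips $G_t$ to $[G_{safe}, G_{max}]$ and caps $Q_{out,t}$ with a hypothetical linear gate-flow relation, and it reads $\Delta T$ as the time needed to release the excess volume at the net drawdown rate $Q_{out,t} - Q_{in,t}$. The linear relation (real gate discharge also depends on the head), the full-opening capacity `q_full`, the upper bound `dt_max_h`, and this reading of $\Delta T$ are all assumptions.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class ActionBounds:
    g_safe: float    # G_safe: minimum structurally safe gate opening (fraction)
    g_max: float     # G_max: maximum gate opening (fraction)
    q_full: float    # hypothetical discharge capacity at G_max, m^3/s
    dt_max_h: float  # hypothetical upper limit on the discharge period, hours

def clip(x: float, lo: float, hi: float) -> float:
    return min(max(x, lo), hi)

def project_action(gate: float, q_out: float, b: ActionBounds) -> tuple[float, float]:
    """Clip a raw action into A: G_t in [G_safe, G_max] and Q_out,t in
    [0, q_full * G_t / G_max] (hypothetical linear gate-flow relation)."""
    g = clip(gate, b.g_safe, b.g_max)
    q = clip(q_out, 0.0, b.q_full * g / b.g_max)
    return g, q

def discharge_period_h(excess_volume_m3: float, q_in: float, q_out: float) -> float:
    """Hypothetical Delta T: hours needed to release the volume above the limit
    level at the net drawdown rate (Q_out - Q_in); inf if the level is not falling."""
    if excess_volume_m3 <= 0:
        return 0.0
    net = q_out - q_in
    if net <= 0:
        return math.inf
    return excess_volume_m3 / net / 3600.0

# Usage with hypothetical numbers.
bounds = ActionBounds(g_safe=0.1, g_max=1.0, q_full=200.0, dt_max_h=6.0)
gate, q_out = project_action(0.6, 100.0, bounds)
dt_h = min(discharge_period_h(360_000.0, 50.0, q_out), bounds.dt_max_h)
```

Projecting rather than rejecting infeasible actions keeps every step of training usable while still guaranteeing that no executed action violates the gate limits.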

[0165] S43. Based on the multi-objective requirements of small reservoirs, construct a reward function $R(E_t, a_t)$ that encompasses flood control benefits, agricultural irrigation demand satisfaction, ecological water use security, and downstream safety indicators:

[0166] $R(E_t, a_t) = \alpha_1 f_{flood}(E_t, a_t) + \alpha_2 f_{irrigation}(E_t, a_t) + \alpha_3 f_{eco}(E_t, a_t) + \alpha_4 f_{safety}(E_t, a_t)$;

[0168] Among them, $f_{flood}(E_t, a_t)$ measures the contribution of the flood discharge action to the safety of the current reservoir water level; $f_{irrigation}(E_t, a_t)$ indicates the degree to which the current flood discharge meets irrigation needs; $f_{eco}(E_t, a_t)$ measures how well current ecological water use is guaranteed; $f_{safety}(E_t, a_t)$ measures the impact of flood discharge on downstream water level safety; and $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$ are the weighting coefficients of the respective objectives.
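The reward in S43 is a weighted sum of four per-step scores. A minimal sketch follows; the default coefficients and the requirement that they be non-negative and sum to 1 are assumptions made for illustration, since the text does not give values for $\alpha_1$ to $\alpha_4$.

```python
from typing import Sequence

def reward(f_flood: float, f_irrigation: float, f_eco: float, f_safety: float,
           alphas: Sequence[float] = (0.4, 0.2, 0.2, 0.2)) -> float:
    """R(E_t, a_t) = a1*f_flood + a2*f_irrigation + a3*f_eco + a4*f_safety.
    The default alphas are hypothetical; they are required to be
    non-negative and to sum to 1 so the reward stays on the scores' scale."""
    if len(alphas) != 4 or any(a < 0 for a in alphas) or abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("expected four non-negative weights summing to 1")
    terms = (f_flood, f_irrigation, f_eco, f_safety)
    return sum(a * f for a, f in zip(alphas, terms))

# Usage: during a flood peak one might pass a flood-heavy weighting instead.
r_peak = reward(0.7, 0.5, 0.9, 0.6, alphas=(0.6, 0.1, 0.1, 0.2))
```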

[0169] S44. Combining the action space A from step S42, the reward function from step S43, and the environment state vector, the improved reinforcement learning model is defined as a quadruple:

[0170] $\langle \varepsilon, A, P, R \rangle$;

[0171] Where ε is the environmental state space, corresponding to the current and predicted hydrological information of the small reservoir; A is the action space, corresponding to the gate opening, flood discharge period and discharge volume; P is the environmental state transition probability distribution, which is calculated in combination with the water level evolution and downstream response during the reservoir scheduling process; and R is the reward function, which measures the degree to which each scheduling action meets the multi-objective requirements.

[0172] S45. Combining steps S41 to S44, based on the environmental state vector and the initial flood discharge scheduling strategy, complete the parameter setting and structure definition of the improved reinforcement learning model, forming a scheduling model that can output flood discharge decisions in real time under multi-objective constraints.

[0173] In this embodiment, S5 includes the following steps:

[0174] S51. Based on the improved reinforcement learning model quadruple <ε,A,P,R>, the hydrological change process of a small reservoir is modeled in a simulation environment. The simulation environment extrapolates the water level change, inflow change and downstream basin response at different times {t,t+1,…} according to reservoir capacity constraints, gate operation characteristics and historical data.
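A simulation environment of the kind described in S51 needs at least a water-balance transition. The sketch below uses the storage equation $V_{t+1} = V_t + (Q_{in,t} - Q_{out,t})\,\Delta t$ with a deterministic inflow forecast and a hypothetical linear storage-level curve $H = h_0 + V / \text{area}$. Real reservoirs use a measured level-storage curve, and the stochastic transition $P$ and downstream routing are not modelled here; the class and parameter names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class ToyReservoirEnv:
    """Deterministic water-balance stand-in for the transition P:
    V_{t+1} = V_t + (Q_in - Q_out) * dt, with a hypothetical linear
    storage-level curve H = h0 + V / area."""
    inflows: list[float]   # forecast Q_in per step, m^3/s
    storage: float         # current storage above h0, m^3
    h0: float              # level at zero storage, m
    area: float            # hypothetical constant surface area, m^2
    dt_s: float = 3600.0   # step length, s
    t: int = 0

    def level(self) -> float:
        return self.h0 + self.storage / self.area

    def step(self, q_out: float) -> tuple[float, float, bool]:
        """Apply one release; return (new level, actual release, done)."""
        q_in = self.inflows[self.t]
        # The release cannot exceed the water available during this step.
        q_out = min(max(q_out, 0.0), q_in + self.storage / self.dt_s)
        self.storage += (q_in - q_out) * self.dt_s
        self.t += 1
        return self.level(), q_out, self.t >= len(self.inflows)

# Usage with hypothetical values: two hourly inflow steps.
env = ToyReservoirEnv(inflows=[20.0, 100.0], storage=1_000_000.0, h0=0.0, area=100_000.0)
level_1, released_1, done_1 = env.step(50.0)
```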

[0175] S52. Using the initial parameter set $\theta_0$ and the initial scheduling strategy $S_{opt}$, load the policy network of the improved reinforcement learning model into the simulation environment.

[0176] S53. Run several rounds in a simulation environment, each round containing the following training process:

[0177] Observe the environmental state vector $E_t$ at time $t$;

[0178] Based on the current policy network $\pi_{\theta_k}(a_t \mid E_t)$, select an action $a_t \in A$, where $\theta_k$ denotes the network parameters updated after the $k$-th round of training;

[0179] After action $a_t$ is executed, the simulation environment updates the states of the reservoir and the downstream watershed to $E_{t+1}$ based on the state transition probability $P$ and calculates the reward $R(E_t, a_t)$;

[0180] Based on the obtained reward $R(E_t, a_t)$ and the subsequent state $E_{t+1}$, update the policy network, adjusting the network parameters from $\theta_k$ to $\theta_{k+1}$. The update employs either gradient-based approximation of the optimal value function or adaptive temporal-difference (TD) updating.

[0181] S54. After multiple rounds of training are completed, evaluate the degree of convergence of the policy network. If the preset convergence condition is met, output a new flood discharge scheduling policy $\pi_{\theta^*}(a_t \mid E_t)$ and its corresponding network parameters $\theta^*$. If the convergence condition is not met, return to step S53 and continue training until a better flood discharge scheduling strategy for complex hydrological and meteorological conditions is obtained, and generate the optimal flood discharge decision scheme:

[0182] $S'_{opt} = \{G'_{opt}, Q'_{opt}, T'_{opt}\}$
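The observe, act, transition, reward, and update cycle of S53 can be illustrated with tabular Q-learning, a standard temporal-difference method: $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$. This is a swapped-in stand-in for the policy network $\pi_{\theta}$ of the text. The discretised reservoir (10 storage bins, releases of 0 to 3 bins, random inflow of 0 to 2 bins), its penalties, and all hyperparameters are hypothetical, and the S54 convergence check is replaced by a fixed number of episodes.

```python
import random

N_LEVELS, N_RELEASES = 10, 4  # hypothetical: 10 storage bins, release 0-3 bins per step

def toy_step(s: int, a: int, rng: random.Random) -> tuple[int, float]:
    """Hypothetical discretised reservoir standing in for P and R."""
    inflow = rng.choice([0, 1, 2])                   # random inflow, in bins
    s_next = min(max(s + inflow - a, 0), N_LEVELS - 1)
    r = -10.0 if s_next >= N_LEVELS - 2 else 0.0     # flood-risk penalty near the top
    if s_next <= 2:
        r -= 0.5 * a                                 # waste penalty when nearly empty
    return s_next, r

def q_learning(step, n_states, n_actions, episodes=300, horizon=24,
               alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max Q(s',.) - Q(s,a))."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(horizon):
            if rng.random() < epsilon:               # epsilon-greedy exploration
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda k: q[s][k])
            s_next, r = step(s, a, rng)
            td_error = r + gamma * max(q[s_next]) - q[s][a]
            q[s][a] += alpha * td_error
            s = s_next
    return q

q_table = q_learning(toy_step, N_LEVELS, N_RELEASES)
greedy_policy = [max(range(N_RELEASES), key=lambda a: q_table[s][a]) for s in range(N_LEVELS)]
```

A neural policy replaces the table when the state is continuous (water levels, flows, forecasts), but the temporal-difference target has the same structure.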

[0183] Example 1:

[0184] In mid-July 2024, a small reservoir, "XX Reservoir," located in Central China, experienced unusually heavy rainfall. At 8:00 AM on July 15th, the local meteorological department issued a red alert for heavy rain, predicting a cumulative rainfall of 400 mm over the next 48 hours, with a peak rainfall intensity of 60 mm per hour. As a major flood control facility, XX Reservoir has a capacity of 1 million cubic meters and an initial water level of 16 meters, leaving a 6-meter margin from the maximum safe water level of 22 meters. The downstream area includes an ecological protection zone and 1,500 mu (approximately 100 hectares) of farmland. The maximum permissible water level in the ecological protection zone is 22 meters, and the farmland urgently needs 300,000 cubic meters of water.

[0185] At 9:00 AM on July 15th, the system collected the following key data through the sensor network deployed at XX Reservoir: the current reservoir water level is 16.5 meters, the inflow rate is 20 cubic meters per second, and the preliminary forecast is that the inflow rate will gradually increase over the next 6 hours, reaching a peak of 150 cubic meters per second. The downstream water level is 20 meters, which is 2 meters below the maximum safe water level of 22 meters. Based on this real-time monitoring data, the system recorded this key information in the environmental state vector and, combined with the rainfall data from the weather forecast, quickly constructed a complete picture of the reservoir's state.

[0186] Using a quantum genetic algorithm, an initial scheduling strategy was quickly generated: the gate opening was initially set at 30%, the flood discharge period was set at 2 hours, and the initial discharge rate was 50 cubic meters per second. Subsequently, a reinforcement learning model took over the scheduling task and entered real-time decision-making mode. At 9:30 AM, the system calculated the optimal scheduling strategy based on the current state and alerted management personnel with an alarm signal: due to the continuous increase in inflow, the gate opening needed to be adjusted to 50%, and the discharge rate increased to 75 cubic meters per second.

[0187] At 10:00 AM on July 15th, the rainfall intensified further, and the inflow rate rapidly increased to 100 cubic meters per second, with the reservoir water level reaching 18 meters. At this time, the system detected that the inflow rate would continue to rise within the next two hours. Combining the water level status in the downstream area and the irrigation needs of farmland, the reinforcement learning model proposed a scheduling strategy: increase the gate opening to 60%, increase the flood discharge to 100 cubic meters per second, and simultaneously remind management personnel to control the flood discharge duration to not exceed 4 hours to ensure that the water level in the downstream ecological protection area does not exceed 22 meters.

[0188] At 2 PM, the heavy rainfall continued, and the reservoir water level had reached 19.5 meters, with the inflow reaching a peak of 150 cubic meters per second. The system monitored the downstream protected area, noting that the water level was approaching 21.5 meters. The model adjusted its strategy based on multi-objective trade-offs: reducing the gate opening to 40%, controlling the discharge to 60 cubic meters per second, and extending the discharge period to 6 hours to slow the rise in downstream water levels. Simultaneously, the model set up a discharge plan for the next phase based on farmland irrigation needs and ecological water security, ensuring that irrigation demands are fully met after the rain stops.

[0189] After the scheduling process is completed, the following Table 1 records the specific performance of the method of the present invention:

[0190] Table 1 Scheduling process data

[0191]

[0192] As shown in Table 1 above, the invention uses real-time optimization by the reinforcement learning model to adjust the gate opening and discharge volume as the water level and inflow change, keeping the reservoir in a safe operating state. At the inflow peak of 150 cubic meters per second, the reservoir water level is held at 19.5 meters, 2.5 meters below the maximum safe water level of 22 meters. As the reservoir water level approaches the critical value of 19.5 meters, the discharge is rapidly increased to 100 cubic meters per second to relieve reservoir pressure. When the downstream water level reaches a high of 21.5 meters, the discharge is reduced to 60 cubic meters per second to balance downstream ecological security and irrigation needs. The farmland irrigation satisfaction rate and the ecological water use guarantee rate rise gradually during the scheduling process, ultimately reaching 100% and 95%, respectively.

[0193] Compared to the fixed-rule scheduling of traditional methods, Table 2 below shows the performance comparison data of the two methods:

[0194] Table 2 Performance Comparison Data

[0195]

| Index | Traditional method | Method of the present invention |
|---|---|---|
| Peak flood discharge (m³/s) | 50 | 100 |
| Maximum downstream water level (m) | 21.8 | 21.5 |
| Farmland irrigation satisfaction rate (%) | 70 | 100 |
| Decision response time (s) | 300 | 120 |

[0196] At critical moments, the invention increases the peak discharge to 100 cubic meters per second, double the 50 cubic meters per second of the traditional method. This not only relieves reservoir water level pressure effectively but also keeps the downstream area safe, while providing more water to support irrigation and ecological needs. The invention keeps the maximum downstream water level at 21.5 meters, 0.3 meters lower than the traditional method, which indicates that it can predict downstream water level changes and adjust the discharge strategy dynamically, improving the safety of downstream areas. The farmland irrigation satisfaction rate rises from 70% under the traditional method to 100%, and the ecological water security rate rises from 70% to 95%. The decision response time falls from 300 seconds to 120 seconds, a 60% reduction, owing to the real-time update capability of the reinforcement learning model and the high-quality initial strategy generated by the quantum genetic algorithm, which significantly improves scheduling efficiency.
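The ratios quoted in [0196] follow directly from the figures in Table 2; a short check:

```python
# Figures from Table 2: (traditional method, method of the present invention).
TABLE_2 = {
    "peak_discharge_m3s": (50, 100),
    "max_downstream_level_m": (21.8, 21.5),
    "irrigation_satisfaction_pct": (70, 100),
    "decision_response_s": (300, 120),
}

def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) * 100 / old

peak_ratio = TABLE_2["peak_discharge_m3s"][1] / TABLE_2["peak_discharge_m3s"][0]
level_drop_m = TABLE_2["max_downstream_level_m"][0] - TABLE_2["max_downstream_level_m"][1]
response_change_pct = pct_change(*TABLE_2["decision_response_s"])
```

The peak discharge doubles (ratio 2), the downstream peak level drops by 0.3 m, and the response time changes by -60 %, matching the text.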

[0197] As can be seen from the implementation of the above scheduling process, the method of the present invention can accurately and in real time adjust the flood discharge strategy under dynamic and complex hydrological and meteorological conditions, effectively solve the shortcomings of traditional scheduling methods in terms of flood control efficiency, resource utilization and ecological protection, and greatly improve the comprehensive optimization capability of intelligent flood discharge of small reservoirs.

[0198] This invention introduces a quantum genetic algorithm in the initialization stage of the reinforcement learning model. By combining qubit encoding with quantum rotation gate operations, the scheduling strategy is optimized efficiently. Traditional random initialization strategies often suffer from slow convergence and unstable initial decisions because of the large exploration space. In contrast, the quantum genetic algorithm can quickly locate a globally optimal or near-optimal scheduling scheme under multi-objective constraints. Adaptively adjusting the dimension of the qubit encoding enables the algorithm to adapt dynamically to the complex environment of small reservoir flood discharge. At the same time, updating the probability amplitudes through rotation gates drives efficient evolution of the population, which greatly improves the quality of the initial scheduling strategy.

[0199] This invention constructs a multi-objective reward function encompassing flood control, irrigation, ecological maintenance, and downstream safety, and introduces a dynamic weight adjustment mechanism. This enables the scheduling strategy to optimize the balance between different objectives in real time based on the current hydrological conditions of the reservoir. The reward function prioritizes reservoir safety during peak flood seasons, dynamically allocates water resources during peak irrigation demand periods, and ensures the minimum requirements for downstream ecological water use. Through real-time evaluation of downstream water level changes and ecological flow deviations, the algorithm can accurately regulate flood discharge, avoiding the resource waste or increased risk caused by the single-objective bias in traditional methods.

[0200] This invention achieves multi-round strategy iteration optimization in a simulated environment through the closed-loop training mechanism of the reinforcement learning model. This enables the model to autonomously learn the optimal scheduling scheme under complex and variable hydrological and meteorological conditions. The reinforcement learning model can not only dynamically adjust the scheduling strategy through historical data and real-time monitoring data, but also improve its decision-making ability in unknown scenarios through the adaptive update of the policy network. It can respond quickly under sudden heavy rainfall or extreme weather conditions and output safe and efficient flood discharge strategies in real time, effectively reducing the risk of reservoir operation.

[0201] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.

Claims

1. A method for intelligent flood discharge scheduling of small reservoirs based on reinforcement learning, characterized in that it includes the following steps: S1. Obtain real-time reservoir data to form a complete reservoir dataset; S2. Construct the environmental state vector required for reinforcement learning based on the reservoir dataset; S3. Calculate and optimize the environmental state vector using a quantum genetic algorithm to generate an initial flood discharge scheduling strategy; S4. Establish an improved reinforcement learning model based on the environmental state vector and the initial flood discharge scheduling strategy; S5. Using the initial flood discharge scheduling strategy as the initial parameters of the improved reinforcement learning model, update the policy network in the improved reinforcement learning model through multiple rounds of iterative training to obtain the optimal flood discharge decision scheme; S6. Operate the gates according to the optimal flood discharge decision scheme, monitor the water level changes of the small reservoir, the safety status of the downstream basin, and the degree to which ecological demand is satisfied after execution, and record the monitoring results and actual operation data as feedback information; S7. Input the feedback information back into the improved reinforcement learning model, and iteratively update and correct the policy by combining the reservoir dataset and the environmental state vector.

2. The intelligent flood discharge scheduling method for small reservoirs based on reinforcement learning according to claim 1, characterized in that S1 includes the following steps: S11. Acquire real-time hydrological data of the small reservoir and its surrounding area through sensor networks and remote monitoring equipment, the data including the reservoir water level, inflow rate, rainfall, and evaporation at time t; S12. Use sensors to monitor the status of the downstream watershed in real time and collect the water level and flow rate of the downstream watershed at time t; S13. Obtain the rainfall forecast for a future time interval from the weather forecasting system and model it jointly with the real-time rainfall at the current moment; S14. From the historical dispatch records, extract for the i-th record the collected water level data, the inflow data, the parameters of the flood discharge operation, and the downstream water level status data, and construct a complete set of historical scheduling records H; S15. Integrate the real-time reservoir water level data, inflow data, rainfall data, evaporation data, downstream basin status data, meteorological forecast data, and historical dispatch records H to construct a complete reservoir dataset D.

3. The intelligent flood discharge scheduling method for small reservoirs based on reinforcement learning according to claim 1, characterized in that S2 includes the following steps: S21. Extract the key environmental state information of the small reservoir at the current time t from the reservoir dataset D; S22. Construct a reservoir capacity constraint model and a gate operation characteristic model based on the physical characteristics of the small reservoir, the reservoir capacity constraint model being defined from the relationship between the current reservoir water level and the maximum allowable water level, where C indicates that the reservoir water level has exceeded the limit, and the gate operation characteristic model being defined from the relationship between the current actual gate opening and the maximum opening; S23. Based on the flood control and safety needs of the downstream area, and taking the current downstream water level and the maximum safe downstream water level as constraints, define the downstream flood control safety model, where S indicates that the downstream water level has exceeded the limit; S24. Based on the extracted environmental state information, the reservoir capacity constraint model, the gate operation characteristic model, and the downstream flood control safety model, define the environmental state vector $E_t$ of the reinforcement learning model.

4. The intelligent flood discharge scheduling method for small reservoirs based on reinforcement learning according to claim 1, characterized in that S3 includes the following steps: S31. Based on the environment state vector $E_t$, generate a population of individuals for intelligent flood discharge scheduling of the small reservoir, each individual $S_i$ being represented in a multi-layer coding form composed of $G$, $Q$, and $T$, where $G$ is the gate opening degree within the scheduling period, controlling the flood discharge capacity of each period so that the discharge volume meets reservoir capacity constraints and flood control objectives; $Q$ is the discharge volume vector, indicating the amount of water discharged from the reservoir in each time period, which affects the reservoir water level and flood control safety in the downstream area; and $T$ is the discharge time allocation, which determines the duration of each discharge action and is optimized for dynamic changes in the current inflow and rainfall; the population size N is set as the number of candidate reservoir scheduling schemes; S32. Based on the multi-level scheduling scheme, adaptively encode each population individual $S_i$ using qubits, the state of each qubit being defined as $\psi_{i,j} = \alpha_{i,j}|0\rangle + \beta_{i,j}|1\rangle$ with $|\alpha_{i,j}|^2 + |\beta_{i,j}|^2 = 1$, where $\psi_{i,j}$ is the quantum state of individual $S_i$ in the $j$-th dimension, corresponding specifically to the gate opening, the flood discharge volume, or the time interval, $\alpha_{i,j}$ and $\beta_{i,j}$ are the quantum probability amplitudes of the $j$-th dimension, representing the likelihood of different scheduling decisions in that dimension, and $|\alpha_{i,j}|^2$ and $|\beta_{i,j}|^2$ are the probabilities of selecting states $|0\rangle$ and $|1\rangle$, used for random sampling within the ranges of the gate opening and flood discharge parameters; to address the dynamic characteristics of small reservoir scheduling, the encoding dimension d of the qubits is adaptively adjusted according to the total scheduling duration, which represents the overall operation time of the small reservoir scheduling plan, the current rainfall, which represents the impact of current external precipitation on the reservoir's flood discharge demand, the current inflow rate, which represents the real-time hydrological pressure faced by the reservoir, and the scheduling complexity coefficient K; S33. Based on the actual needs of flood discharge scheduling for small reservoirs, construct a dynamic optimization function that dynamically balances the benefits of the flood control, irrigation, and ecological objectives, defined as $f(S_i, t) = w_1 f_{flood}(S_i, t) + w_2 f_{irrigation}(S_i, t) + w_3 f_{eco}(S_i, t)$, where $w_1$, $w_2$, and $w_3$ are the dynamic weights for the flood control, irrigation, and ecological objectives, adjusted according to the current operating status of the reservoir, $f_{flood}(S_i, t)$ is the flood control constraint benefit function, $f_{irrigation}(S_i, t)$ is the irrigation demand benefit function, and $f_{eco}(S_i, t)$ is the ecological water use guarantee function; S34. Adjust the individual states through quantum rotation gates to move the population toward the optimal scheduling strategy, the rotation angle $\Delta\theta_{ij}$ being calculated from the current population fitness, with collaborative optimization combining reservoir capacity constraints, flood discharge volume, and gate opening strategy; S35. When the population fitness meets the preset convergence condition, output the final initial flood discharge scheduling strategy $S_{opt} = \{G_{opt}, Q_{opt}, T_{opt}\}$; if the convergence condition is not met, return to step S32 and continue iterating until the termination condition is met.

5. The intelligent flood discharge scheduling method for small reservoirs based on reinforcement learning according to claim 4, characterized in that the flood control constraint benefit function $f_{flood}(S_i, t)$ is defined as the safety guarantee benefit of flood discharge scheduling for the current reservoir water level and the downstream water level, where $H_{max}$ is the maximum safe water level allowed by the reservoir, $H_{safe}$ is the safety warning water level of the reservoir, $H_{d,max}$ and $H_{d,safe}$ are the maximum safe water level and the safety warning water level of the downstream basin, respectively, the discharge volume $Q_{out,t}$ directly affects downstream water level changes, $Q_{in,t}$ is the current inflow, and $\Delta T$ is the flood discharge period; the irrigation demand benefit function $f_{irrigation}(S_i, t)$ measures the degree to which the scheduling plan meets agricultural irrigation needs, where $\eta$ is the water conveyance efficiency and $Q_{irrigation}$ is the current agricultural irrigation demand; and the ecological water use guarantee function $f_{eco}(S_i, t)$ assesses the water availability of the scheduling plan for the downstream ecosystem, where $Q_{eco}$ is the minimum amount of water required to maintain the downstream ecosystem and $|Q_{eco} - Q_{out,t}|$ measures the degree of deviation in ecological water use.

6. The intelligent flood discharge scheduling method for small reservoirs based on reinforcement learning according to claim 1, characterized in that S4 includes the following steps: S41. Based on the environmental state vector and the initial flood discharge scheduling strategy, initialize the improved reinforcement learning model with the parameters $\theta_0 = \mathrm{Init}(w, b, E_t, S_{opt})$, where $\theta_0$ is the initial parameter set of the improved reinforcement learning model, including weights $w$ and biases $b$, used to initialize the reinforcement learning policy network, and $\mathrm{Init}$ is the initialization function, whose inputs are the environment state vector $E_t$ and the initial flood discharge scheduling strategy $S_{opt}$, and which generates the basic architecture of the scheduling strategy network based on the current hydrological conditions and preliminary optimization results of the small reservoir; S42. Define the action space $A$ as the set of the gate opening $G_t$, the flood discharge period $\Delta T$, and the flood discharge volume $Q_{out,t}$, whose dynamic ranges include $G_t \in [G_{safe}, G_{max}]$, where $G_{safe}$ is the minimum opening required to ensure that gate operation does not pose a risk to the structure, and $\Delta T$ is calculated from the amount by which the current reservoir water level $H_t$ exceeds its limit and from the difference between the inflow $Q_{in,t}$ and the flood discharge $Q_{out,t}$, ensuring that the scheduling action is completed within a safe timeframe; S43. Based on the multi-objective requirements of small reservoirs, construct a reward function encompassing flood control benefits, agricultural irrigation demand satisfaction, ecological water security, and downstream safety indicators, $R(E_t, a_t) = \alpha_1 f_{flood}(E_t, a_t) + \alpha_2 f_{irrigation}(E_t, a_t) + \alpha_3 f_{eco}(E_t, a_t) + \alpha_4 f_{safety}(E_t, a_t)$, where $f_{flood}(E_t, a_t)$ assesses the contribution of flood discharge actions to the current safety of the reservoir water level, $f_{irrigation}(E_t, a_t)$ indicates the degree to which the current flood discharge meets irrigation needs, $f_{eco}(E_t, a_t)$ measures how well current ecological water use is guaranteed, $f_{safety}(E_t, a_t)$ assesses the impact of flood discharge operations on downstream water level safety, and $\alpha_1$ to $\alpha_4$ are the weighting coefficients of the respective objectives; S44. Combining the action space $A$ from step S42, the reward function from step S43, and the environment state vector, define the improved reinforcement learning model as the quadruple $\langle \varepsilon, A, P, R \rangle$, where $\varepsilon$ is the environmental state space, corresponding to the current and predicted hydrological information of the small reservoir, $A$ is the action space, corresponding to the gate opening, flood discharge period, and flood discharge volume, $P$ is the environmental state transition probability distribution, calculated by combining the water level evolution and downstream response during the reservoir scheduling process, and $R$ is the reward function, which measures the degree to which each scheduling action meets the multi-objective requirements; S45. Combining steps S41 to S44, based on the environmental state vector and the initial flood discharge scheduling strategy, complete the parameter setting and structure definition of the improved reinforcement learning model, forming a scheduling model that can output flood discharge decisions in real time under multi-objective constraints.

7. The intelligent flood discharge scheduling method for small reservoirs based on reinforcement learning according to claim 1, characterized in that S5 includes the following steps: S51. Based on the quadruple $\langle \varepsilon, A, P, R \rangle$ of the improved reinforcement learning model, model the hydrological changes of the small reservoir in a simulation environment, which extrapolates the water level changes, inflow changes, and downstream basin response at different time points $\{t, t+1, \dots\}$ based on reservoir capacity constraints, gate operation characteristics, and historical data; S52. Using the initial parameter set $\theta_0$ and the initial scheduling strategy $S_{opt}$, load the policy network of the improved reinforcement learning model into the simulation environment; S53. Run several rounds in the simulation environment, each round containing the following training process: observe the environmental state vector $E_t$ at time $t$; based on the current policy network $\pi_{\theta_k}(a_t \mid E_t)$, select an action $a_t \in A$, where $\theta_k$ denotes the network parameters updated after the $k$-th round of training; after action $a_t$ is executed, the simulation environment updates the states of the reservoir and the downstream watershed to $E_{t+1}$ based on the state transition probability $P$ and calculates the reward $R(E_t, a_t)$; based on the obtained reward and the subsequent state $E_{t+1}$, update the policy network, adjusting the network parameters from $\theta_k$ to $\theta_{k+1}$, the update employing either gradient-based approximation of the optimal value function or adaptive temporal-difference updating; S54. After multiple rounds of training are completed, evaluate the degree of convergence of the policy network; if the preset convergence condition is met, output a new flood discharge scheduling policy $\pi_{\theta^*}(a_t \mid E_t)$ and its corresponding network parameters $\theta^*$; if not, return to step S53 and continue training until a better flood discharge scheduling strategy for complex hydrological and meteorological conditions is obtained, and generate the optimal flood discharge decision scheme $S'_{opt} = \{G'_{opt}, Q'_{opt}, T'_{opt}\}$.