Real-time risk early warning method for ecological flow of river channel
By acquiring historical hydrological data to identify time-delay response relationships, constructing a probabilistic runoff forecasting model and conducting two-layer ensemble sampling, the uncertainty problem in river flow warning during the dry season was solved, achieving accurate risk attribution and highly reliable early warning.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HOHAI UNIV
- Filing Date
- 2026-03-11
- Publication Date
- 2026-06-16
AI Technical Summary
Existing technologies are insufficient for accurately depicting the evolution of upstream outflow during dry seasons in river flow early warning, and are also insufficient for removing background interference from baseflow. This results in non-physical oscillations or divergence in prediction results, making it impossible to effectively quantify the sources of risk.
By acquiring historical hydrological observation data, we identify time-delay response relationships that satisfy physical consistency constraints, generate an evolution parameter set, construct a probabilistic runoff forecast model, combine it with two-layer ensemble sampling to generate cross-sectional flow ensemble forecasts, identify key sources of uncertainty, and output ecological flow risk warning results.
It has solved the problems of non-physical oscillation of evolution parameters and lack of physical constraints in forecasting in the early warning of river ecological flow during the dry season, and achieved accurate attribution of risk sources and high reliability of early warning.
Smart Images

Figure CN121829469B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of ecological flow risk assessment and early warning, and in particular, a method for real-time risk early warning of river ecological flow. Background Technology
[0002] The technical challenge of river ecological flow early warning lies in accurately depicting the downstream response process after the upstream discharge has evolved over a long river channel, and superimposing the predicted component of unknown confluence within the interval to assess the risk of cross-sectional flow meeting standards. Especially during the dry season, slow flow velocity and significant influence from riverbed boundary resistance lead to a highly nonlinear evolution process. Simultaneously, the confluence within the interval is mainly dominated by groundwater baseflow, resulting in weak flow signals that are difficult to monitor directly. Constructing a dynamic model that couples evolutionary hysteresis effects with confluence stochasticity, and systematically quantifying the multi-source uncertainties such as parameter drift and forecast residuals, is the key to achieving high-precision and high-reliability risk early warning.
[0003] In current river evolution calculations, simplified models of the Saint-Venant equations (such as the diffuse wave model) or the Muskingan method are typically used. The evolution parameters are mostly calibrated based on historical typical flood data or assumed to be constant according to river geometry. For runoff forecasting, lumped hydrological models such as the Xin'anjiang River model or standard Long Short-Term Memory (LSTM) deep learning algorithms are commonly used for deterministic point prediction. For uncertainty analysis, existing research mainly utilizes GLUE (Generalized Likelihood Uncertainty Estimation) or Monte Carlo methods to randomly perturb the model inputs or predetermined parameters to generate prediction intervals.
[0004] However, the above-mentioned schemes have significant technical limitations when applied in the weak signal environment of the dry season. Traditional evolution parameter identification methods based on flood data calibration are difficult to adapt to the characteristics of the dry season and cannot effectively isolate the interference of strong baseflow background. This results in the analyzed time-lag response relationship (unit curve) often exhibiting non-physical numerical oscillations or negative values, violating water balance constraints. Purely data-driven forecast models lack hard constraints of physical mechanisms and are prone to outputting prediction trajectories that violate the hydraulic decay law during the receding stage, leading to divergent prediction results over long lead times. In addition, existing uncertainty quantification frameworks usually mix the structural errors of evolution parameters with the input residuals of the forecast model, making it difficult to decouple the independent contributions of inaccurate evolution model parameters and confluence forecast noise at the topological level, thus making it difficult to accurately attribute the sources of risk. Summary of the Invention
[0005] The purpose of this invention is to provide a real-time risk warning method for river ecological flow, in order to solve at least one of the aforementioned problems in the existing technology.
[0006] Technical solution: A method for real-time risk early warning of river ecological flow, comprising:
[0007] Obtain historical hydrological observation data of the river during the dry season, including upstream discharge and downstream cross-sectional observation flow;
[0008] Based on historical hydrological observation data, the time-delay response relationship that satisfies the physical consistency constraint is identified, and a set of evolution parameters representing the uncertainty of the evolution process is generated.
[0009] Based on the evolution parameter set, the inter-regional confluence characteristics are constructed and input into the pre-constructed probabilistic runoff forecasting model to obtain the predicted distribution statistics of the downstream section.
[0010] Two-layer ensemble sampling is performed based on the evolution parameter set and the prediction distribution statistics to generate a cross-sectional flow ensemble forecast that couples evolution uncertainty and forecast uncertainty;
[0011] Based on cross-sectional flow ensemble forecasts, risk probabilities are calculated and key sources of uncertainty are identified, resulting in an ecological flow risk warning.
[0012] Beneficial effects: This invention solves the problems of non-physical oscillations in evolution parameters during the dry season and the lack of physical constraints in forecasting, and achieves accurate attribution of risk sources. Attached Figure Description
[0013] Figure 1 This is a schematic diagram of a real-time risk warning method for river ecological flow provided in an embodiment of this application.
[0014] Figure 2 This is a flowchart illustrating a method for constructing a real-time risk early warning model for river ecological flow under the coupling uncertainty of low water evolution and runoff forecasting, as provided in an embodiment of this application.
[0015] Figure 3 This is a schematic diagram of the uncertainty characterization and simulation process for the evolution of low water levels provided in the embodiments of this application.
[0016] Figure 4 This is a schematic diagram of the runoff forecast uncertainty characterization and simulation process provided in the embodiments of this application. Detailed Implementation
[0017] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.
[0018] Example 1 details the overall process of a real-time risk early warning method for river ecological flow, such as... Figure 1 As shown, by constructing a complete closed-loop system that includes data cleaning, physical identification, probability forecasting, and attribution decision-making, the dual uncertainties of nonlinear evolution parameters and difficulty in monitoring interval water inflow in the early warning of ecological flow risk in rivers during the dry season can be solved.
[0019] Step 101: Obtain historical hydrological observation data of the river during the dry season. The historical hydrological observation data includes upstream discharge flow and downstream cross-sectional observation flow.
[0020] Historical hydrological observation data typically originates from watershed hydrological monitoring networks or reservoir operation records. Specifically, it requires collecting long-sequence daily flow data during the dry season (usually from October to March of the following year). Upstream discharge data reflects boundary input conditions, while downstream cross-sectional observation flow data reflects the comprehensive response after river channel evolution and inter-channel confluence. To eliminate the impact of observation noise on subsequent identification, it is preferable to preprocess the raw observation data, for example, by using moving averages or filtering algorithms to remove high-frequency random fluctuations and ensure data stability.
[0021] Step 102: Based on historical hydrological observation data, identify the time-delay response relationships that satisfy physical consistency constraints, and generate a set of evolution parameters that characterize the uncertainty of the evolution process.
[0022] Due to the slow flow velocity and significant influence from riverbed boundaries during the dry season, the evolution process exhibits strong nonlinearity. This embodiment does not seek a single deterministic parameter, but rather identifies a set of parameters that satisfy physical laws (such as water balance). The time-delay response relationship is specifically characterized by a time-delay kernel weight vector, reflecting the time allocation ratio of upstream flow propagation to downstream. The evolution parameter set is a sampling result of a set of possible time-delay kernel weight vectors, used to quantify the evolutionary uncertainties caused by model structure and parameter selection.
[0023] Step 103: Construct interval confluence characteristics based on the evolution parameter set, and input the interval confluence characteristics into the pre-constructed probabilistic runoff forecasting model to obtain the predicted distribution statistics of the downstream section.
[0024] Probabilistic runoff forecasting models combine physical mechanisms with data-driven approaches. For each sample in the evolution parameter set, a corresponding interval confluence sequence can be derived using the water balance principle. A probabilistic runoff forecasting model is a deep learning model that can output the probability distribution characteristics of the predicted results, such as the mean and variance (or scale parameters) of the predicted values. This model employs a dual-channel input structure, receiving both recession characteristics representing physical laws and interval confluence characteristics representing statistical laws, thus achieving a deep fusion of physical and data information.
[0025] Step 104: Perform two-layer ensemble sampling based on the evolution parameter set and the prediction distribution statistics to generate a cross-sectional flow ensemble forecast that couples evolution uncertainty and forecast uncertainty.
[0026] This step decouples uncertainty through a structured two-layer sampling mechanism. The outer loop samples the evolution parameter set to capture the uncertainty of the evolution process; the inner loop samples the predicted distribution statistics output by the probabilistic runoff forecasting model to capture the uncertainty of the forecasting model itself and the input data. The resulting cross-sectional flow ensemble forecast contains multiple possible future flow trajectories, and its divergence reflects the overall risk level of the system.
[0027] Step 105: Calculate the risk probability based on the cross-sectional flow set forecast and identify key sources of uncertainty, and output the ecological flow risk warning result.
[0028] The system calculates the risk probability based on the proportion of samples in the cross-sectional flow ensemble forecast that are below a preset ecological flow threshold. Simultaneously, by analyzing the variance composition of the two-layer ensemble samples, the contribution rates of evolutionary uncertainty and forecast uncertainty can be quantified, identifying the dominant uncertainty source at the current moment. The ecological flow risk early warning results can specifically include risk level, early warning lead time, and identification of key uncertainty sources, providing an intuitive and interpretable scientific basis for reservoir scheduling decisions.
[0029] Example 2 describes the data preprocessing process based on the stripping of the drainage baseline, elaborates on the preparation stage of the time delay response identification, and solves how to extract the net flow signal that can truly reflect the upstream discharge response from the observation data affected by background base flow and inter-regional water use.
[0030] Step 201: Extract the stable receding section during the dry season from historical hydrological observation data, calibrate the interval confluence receding curve equation, and generate the receding baseline sequence for the entire time period based on the interval confluence receding curve equation.
[0031] In this embodiment, the acquired raw flow sequence first needs to undergo quality control. Preferably, a Savitzky-Golay filter is used to smooth the raw data. The filter window length can be set to 5 to 7 days, and the polynomial fitting order can be set to 2nd or 3rd order to remove measurement noise while preserving waveform characteristics. Periods with no rainfall and stable upstream discharge are identified from the smoothed data as samples for recession analysis. These samples are used to calibrate the interval confluence recession curve equation, which is typically expressed using an exponential decay model:
[0032] Q t =Q0*e -k't ;
[0033] Among them, Qt Let t be the discharge flow rate, Q0 be the initial flow rate, k' be the discharge coefficient, and e be the base of the natural logarithm.
[0034] Based on the parameter k' obtained from calibration, a recession baseline sequence covering the entire historical period can be recursively generated. This sequence represents the background runoff process under the condition of no new upstream inflow and inter-regional replenishment.
[0035] Step 202: Extract the recession baseline sequence from the downstream cross-section observed flow to obtain the unexplained flow sequence characterizing the net confluence response of the interval.
[0036] To accurately identify the impact of upstream discharge on downstream sections, interference from background baseflow and inter-section water use must be eliminated. This embodiment obtains the flow sequence to be interpreted through baseline stripping. The specific calculation formula is as follows:
[0037] y net (t)=Q obs (t)-Q base (t)+W(t);
[0038] Among them, y net (t) represents the flow sequence to be explained at time t, i.e., the net response portion that needs to be explained by the evolution model; Q obs (t) represents the observed flow rate at the downstream cross-section; Q base W(t) is the drainage baseline sequence generated in step 201; W(t) is the interval water flow rate after restoration calculation.
[0039] This step breaks down the complex observed flow into background and response components, reducing the difficulty and interference of subsequent identification.
[0040] The following is a specific numerical example illustrating the above-mentioned baseline stripping process.
[0041] Suppose the observed data for a certain river channel on day t are as follows: Observed flow rate Q at the downstream section. obs (t)=85m 3 / s, receding baseline sequence Q base (t)=72m 3 / s (calculated from the calibrated drainage coefficient k'=0.05 / day), the restored interval water flow rate W(t)=3m 3 / s. Calculate according to the formula in step 202:
[0042] y net (t) = 85 - 72 + 3 = 16m 3 / s.
[0043] The unexplained flow sequence value is 16m. 3 / s represents the net response flow generated by the upstream discharge propagating through the river channel to the downstream section after removing the effects of background drainage and inter-regional water use. This value will serve as the target variable for subsequent time-delay response identification.
[0044] Step 203: Perform time-delayed response identification based on the flow sequence to be explained and the upstream outflow.
[0045] Based on this, the processed clean signal (the flow sequence to be interpreted and the upstream discharge flow) is used as input to enter the next stage of the evolution parameter identification process.
[0046] This step-by-step processing strategy ensures that the identification process focuses on the evolution of the water flow itself, rather than being obscured by baseflow fluctuations.
[0047] Example 3 details how to construct and solve an optimization model with multiple physical constraints to obtain a physically consistent time-delay kernel weight vector.
[0048] Step 301: Based on the preset maximum delay extension length, construct an extended delay response matrix using the upstream outflow.
[0049] To capture the potentially excessively long time delays that may occur during the dry season, a relatively large maximum time delay extension length L is set, such as 10 or 15 days. Based on this length, an extended time delay response matrix X is constructed. This matrix is typically designed as a Toeplitz structure (a diagonal constant matrix), with its column vectors corresponding to the upstream daily outflow under different time delays. Specifically, the matrix elements X... t,k This represents the upstream input flow I(tk) that is lagging by k time steps at time t. This matrix structure transforms the convolution operation into matrix multiplication in linear algebra, which facilitates subsequent optimization solutions.
[0050] Step 302: With the goal of minimizing the error between the traffic sequence to be explained and the reconstructed traffic, a least squares optimization model containing sparse smoothing regularization terms and physical consistency constraints is established.
[0051] Traditional least squares methods are prone to oscillations and negative solutions when processing hydrological data with multicollinearity, violating physical principles. This embodiment establishes a regularized least squares optimization model, whose objective function J(h) has the following specific form:
[0052] J(h) = ||Xh - y net ||2 2 +λ tv *||Dh||1;
[0053] Where J(h) is the objective function value to be minimized; h is the time delay kernel weight vector to be identified; X is the extended time delay response matrix; y netFor the flow sequence to be explained; ||...||2 2 λ represents the square of the Euclidean norm, used to measure the error in data fitting; tv λ is a regularization parameter, preferably ranging from 0.01 to 1.0. This parameter controls the balance between data fitting and curve smoothing: a smaller λ... tv A larger λ value, which emphasizes data fitting, may cause local oscillations in the lag kernel curve; tv The value emphasizes smoothness and may over-smooth out peak characteristics. In practical applications, the optimal λ can be selected through cross-validation or the L-curve criterion. tv The value is preferably 0.1; D is a first-order difference matrix; ||...||1 represents the L1 norm.
[0054] The second term in the formula is the sparse smoothing regularization term, specifically the Total Variation (TV) regularization term. This term effectively suppresses non-physical oscillations by penalizing the gradient of the weight vector (i.e., the difference between adjacent time-delay weights), forcing the time-delay kernel curve to remain smooth.
[0055] Step 303: Solve the least squares optimization model to obtain the physically consistent time-delay kernel weight vector.
[0056] Step 304: The sparse smoothing regularization term is the total variation regularization term, which is used to suppress the non-physical oscillations of the lagging kernel weight vector between adjacent lagging steps.
[0057] As in step 302, the total variation regularization term not only smooths the curve but also preserves edge characteristics to some extent, making it suitable for describing unit linear processes with unimodal or bimodal shapes. Compared to L2 regularization (ridge regression), TV regularization is better at avoiding the phenomenon of excessively smoothing out peaks for the sake of smoothing. In some alternative implementations, ridge regression can also be used as an alternative, but its effect on suppressing oscillations is generally weaker than that of TV regularization.
[0058] Step 305: The physical consistency constraint is a simplex constraint, used to ensure that each element of the time-delay kernel weight vector is non-negative and the sum of the elements is equal to 1.
[0059] To ensure the physical balance of water volume, strict hard constraints, namely physical consistency constraints, must be imposed when solving the objective function J(h). These constraints are specifically expressed as simplex constraints:
[0060] h i >=0 and ∑(h) i )=1;
[0061] Among them, h i It is the i-th element of the time-delay kernel weight vector h.
[0062] The nonnegativity constraint ensures that the water flow will not reverse, and the sum-to-1 constraint ensures that all the water discharged from the upstream will eventually pass through the downstream section (without considering transmission losses). This constraint makes the optimization problem a constrained convex optimization problem, which can be solved using the projective gradient descent method or the interior point method.
[0063] Step 306, the time delay kernel weight vector is determined by repeatedly solving the least squares optimization model after dividing the historical hydrological observation data into multiple sub-sample windows, and then filtering and retaining the effective time delay set based on the stability threshold.
[0064] To improve the robustness of the identification results, this embodiment employs a resampling strategy. Historical hydrological observation data is divided into B sub-sample windows, and the optimization model is solved independently for each window. The frequency at which each time lag step k is identified as having a non-zero weight is counted. The formula for calculating the selected frequency is:
[0065] freq k =(∑ i=1 B I(h k,i >δ)) / B;
[0066] Where, freq k Let k be the frequency at which the delay step size is identified as an effective delay, B be the total number of sub-sample windows, I be the indicator function, which takes a value of 1 when the condition is met and 0 otherwise, h k,i The weight value corresponding to the lag step size k identified in the i-th sub-window is δ, which is the threshold for determining the weight, preferably set to 0.01. Based on the preset stability threshold F... s (Preferred setting: 0.8) Filter the set of valid delays. Only when freq k >F s Only when the time delay step is considered stable and effective is the time delay step size considered stable. The final time delay kernel weight vector can be the average or median of all sub-window results, resulting in a set of evolution parameters that satisfy both physical constraints and statistical stability.
[0067] Example 4 elaborates on the quantification process of evolution parameter uncertainty. By constructing a data-driven adaptive posterior distribution construction method, it solves the problem that the prior parameters are fixed in traditional methods and it is difficult to reflect the real-time identification effect of the model. It achieves the adaptive characteristic that the larger the error, the more dispersed the distribution, and the smaller the error, the more concentrated the distribution.
[0068] Step 401: Calculate the identification residual sequence and its residual variance of the least squares optimization model.
[0069] In this embodiment, the optimal time delay kernel weight vector h is used. optCombined with the extended response matrix X, the reconstructed flow sequence is calculated. The reconstructed flow is then compared with the actual flow sequence to be explained, y. net By comparing the results, the identification residual sequence is obtained. The variance of this residual sequence is calculated and denoted as σ. res 2 This statistic directly reflects the interpretability of the current linear evolution model for the observed data. If the residual variance is small, it indicates that the linear evolution assumption holds and the identification results are highly reliable; if the residual variance is large, it indicates the presence of nonlinear interference or model bias, and the identification results have greater uncertainty.
[0070] Step 402: Construct a posterior probability distribution that satisfies the simplex constraint, using the time-delay kernel weight vector as the distribution center.
[0071] To quantify the uncertainty of the parameters, this embodiment does not treat the evolution parameters as a fixed vector, but rather as random variables following a certain probability distribution. Considering the physical constraint that the evolution coefficients must satisfy non-negativity and sum to 1 (simplex constraint), the Dirichlet distribution is preferably used as the posterior distribution. The Dirichlet distribution is a multivariate probability distribution defined on the simplex, naturally satisfying the physical constraint and avoiding the statistical property destruction caused by the secondary normalization required after sampling using a Gaussian distribution. During construction, the lag kernel weight vector h is... opt Let the expected value (center) of the distribution be set, ensuring that the sampling results fluctuate around the physically consistent optimal solution.
[0072] Step 403: Dynamically adjust the concentration coefficient of the posterior probability distribution based on the residual variance, so that the dispersion of the posterior probability distribution is positively correlated with the residual variance.
[0073] The posterior probability distribution is a Dirichlet distribution. The concentration coefficient of the posterior probability distribution is dynamically adjusted based on the residual variance. Specifically, the parameter vector of the Dirichlet distribution is determined using an inverse proportional mapping function.
[0074] The inverse proportional mapping function is: κ = κ0 / max(σ) res 2 ,ε);
[0075] Where κ is the concentration coefficient, κ0 is the preset baseline scale constant, and σ res 2 ε is the residual variance, and ε is the preset truncation threshold to prevent the denominator from being zero.
[0076] The parameter vector of the Dirichlet distribution is determined by the product of the time-delay kernel weight vector and the concentration coefficient.
[0077] The above steps define in detail the mathematical mechanism of adaptive adjustment. The parameter vector α of the Dirichlet distribution can be expressed as α = κ*h opt The concentration coefficient κ determines the peak level of the distribution. To achieve adaptivity, this embodiment designs the aforementioned inverse mapping function. Specifically, the preset baseline scale constant κ0 can be set empirically to a value between 10 and 100, for example, 50; the preset cutoff threshold ε is a very small positive number, for example, 10. -6 When the model fits well (σ) res 2 When the value of κ is small, the calculated value of κ increases, causing the Dirichlet distribution to be highly concentrated around the center h. opt When the sampled samples show small differences, the κ value decreases and the distribution becomes flatter when the fit is poor, indicating that the sampled samples are divergent and thus reflect the greater uncertainty of the parameters.
[0078] Step 404: Perform multiple samplings from the adjusted posterior probability distribution to generate an evolution parameter set.
[0079] Using a predetermined parameter vector α, M random samples are taken from the Dirichlet distribution, for example, M=100, resulting in a set containing M time-lapse weight vectors, i.e., the evolution parameter set. Furthermore, during the rolling update process, an exponential forgetting mechanism can be used to update the posterior parameters, with the update formula as follows:
[0080] α new =ρ*α old +(1-ρ)*α obs ;
[0081] Where, α new For the updated Dirichlet distribution parameter vector (i.e., the new posterior parameters), α old This is the Dirichlet parameter vector estimated from the previous time step (or the previous time window), representing the accumulation of historical information. ρ is the forgetting coefficient, typically taking a value of 0.2, and α... obs It is an instantaneous parameter estimate calculated based on the latest day's data.
[0082] This mechanism ensures that the parameter set can adapt to the slow changes in river characteristics over time.
[0083] Example 5 details the specific architecture of the probabilistic runoff forecasting model. By designing a dual-channel input structure of physics and data and a distributed output layer, it solves the problems of poor physical interpretability and difficulty in quantifying prediction uncertainty in traditional black-box models.
[0084] Step 501: Based on the evolution parameter samples in the evolution parameter set, the interval confluence sequence is inferred by using the water balance principle, and the sliding statistical characteristics of the interval confluence sequence are calculated.
[0085] In this embodiment, for each sample in the evolution parameter set, the interval confluence sequence is inferred using the water balance equation. To enhance the model's ability to capture local trends, a sliding window, for example, with a window length of 7 days, is applied to the interval confluence sequence. Statistical characteristics within the window are calculated, specifically including the mean, standard deviation, and coefficient of variation. These statistical characteristics constitute the enhanced input for the data-driven part.
[0086] Step 502: Construct a probabilistic runoff forecasting model with a dual-channel input structure, wherein the physical channel inputs the recession baseline sequence extracted from historical hydrological observation data, and the data channel inputs the interval confluence sequence and its sliding statistical characteristics.
[0087] This embodiment uses a Long Short-Term Memory (LSTM) network as the model skeleton, but with specific design modifications to the input layer. Specifically, the model input layer is divided into two independent channels:
[0088] Physical channel: Direct input of the recession baseline sequence Q base And its first-order difference. This part of the information represents the definite physical decay law and constitutes the physical prior basis of the model.
[0089] Data Channel: Input interval confluence sequence and its sliding statistical characteristics. This information represents random fluctuations influenced by rainfall, evaporation, and human activities, and the neural network is responsible for mining its nonlinear patterns.
[0090] In some alternative implementations, in addition to LSTM, probabilistic runoff forecasting models can also use gated recurrent units (GRUs) or Transformer architectures as the basic feature extractors, as long as the dual-channel input structure described above is maintained.
[0091] In one specific implementation, the LSTM network is configured as follows: the physical channel contains one LSTM unit layer with a hidden layer dimension of 32; the data channel contains two stacked LSTM units, each with a hidden layer dimension of 64. The outputs of the two channels are concatenated and then fused through a fully connected layer of dimension 128 before being fed into the mean head and scale head respectively. The sequence input length is set to 30 days, corresponding to a one-month historical window. During training, the Adam optimizer is used with an initial learning rate of 0.001, a batch size of 32, and the number of training epochs is dynamically determined based on the convergence of the validation set loss, typically between 100 and 200 epochs. To prevent overfitting, a Dropout layer is introduced after the fully connected layer, with a dropout probability set to 0.2.
[0092] Step 503: Use the probabilistic runoff forecasting model to output the conditional mean and conditional scale of the predicted cross-sectional flow distribution within the future forecast period.
[0093] To achieve probabilistic prediction, the model's output layer does not directly output a single flow rate value, but instead outputs parameters of the assumed probability distribution. In this embodiment, it is assumed that the prediction error follows a Gaussian distribution (normal distribution), and the model's output layer is designed as two parallel fully connected layers (head):
[0094] Mean Header: Outputs the conditional mean μ t , representing the most likely traffic forecast;
[0095] Scale Head: Outputs conditional scale σ t (i.e., standard deviation), representing the range of uncertainty in the prediction. To ensure the scaling parameter σ... t Always positive, apply the Softplus activation function (an activation function that guarantees a positive output) to the output of the scale head:
[0096] ;
[0097] Where, σ t Let z be the conditional scale (standard deviation) of the model output at time t, guaranteed to be non-negative. t These are the raw output values of the linear layer of the scale head in the neural network.
[0098] In other implementations, taking into account the skewed distribution of hydrological data, it can be assumed that the predicted values follow a Gamma distribution. In this case, the model outputs the shape parameter k. t and scale parameter θ t .
[0099] Example 6: By introducing a physical constraint term controlled by a gated variable into the loss function, the problem of performance degradation caused by imposing physical constraints on traditional PINN (Physical Information Neural Network) during non-physically dominant periods (such as reservoir scheduling and heavy rainfall replenishment) is solved.
[0100] Step 601: Construct a loss function that includes a negative log-likelihood term and a conditional physical consistency constraint term.
[0101] In this embodiment, the total loss function L total It consists of two parts:
[0102] L total =L nll +L phy ;
[0103] Among them, L nllThis is the negative log-likelihood term, used to optimize the probability distribution parameters. For the Gaussian distribution assumption, its formula is:
[0104] ;
[0105] Among them, L nll This is the negative log-likelihood loss term. T is the total number of time steps in the forecast period. σ t y represents the conditional scale of the model prediction at time t. t Let μ be the true value of the flow rate observation at time t. t Let t be the conditional mean of the model prediction at time t.
[0106] This factor drives the mean of the model output to approximate the observed value y. t At the same time, adjust the scale σ t To reflect the magnitude of the error.
[0107] L phy This is a conditional physical consistency constraint term, used to penalize predictions that violate physical laws under certain conditions.
[0108] Step 602: During the training iteration process, monitor in real time whether the physical dominance condition is met in the current time period.
[0109] Step 603: When the physical dominance condition is met, the physical consistency constraint term is activated to force the conditional mean to follow the physical decay law.
[0110] The activation state of the conditional physical consistency constraint is controlled by a gating variable, and the calculation logic of the gating variable is as follows:
[0111] Gated variables are constructed based on the rate of change of upstream outflow and preset interval replenishment indicators.
[0112] The gating variables are constructed based on the rate of change of upstream downstream flow and the preset interval replenishment indication, and specifically include the introduction of interval water use plan change criterion.
[0113] The activation logic of the gated variable is as follows: the gated variable is activated only when the rate of change of the upstream outflow is lower than the preset fluctuation threshold, the interval replenishment indicator shows no obvious rainfall replenishment, and the pre-acquired interval water use plan remains stable.
[0114] The above steps describe the logic of the gating mechanism in detail. To determine when to activate the physical constraint, a binary gating variable g is constructed. t g t The calculation integrates three dimensions of criteria:
[0115] Upstream boundary criterion: Calculate the rate of change of upstream downstream discharge |dQ upIf / dt| is less than the preset fluctuation threshold, such as 5%, it is considered to be boundary stable.
[0116] Inter-regional replenishment criteria: Check the interval rainfall data or soil moisture data. If it is lower than the replenishment threshold, it is considered that there is no significant replenishment.
[0117] Human interference criterion: Check the water use plan of the interval. If the change in water use in adjacent time periods is less than the sudden change threshold, such as 10%, it is considered that the water use is stable.
[0118] Only when all three conditions above are met simultaneously is it determined that the current state is dominated by pure receding water, and g is set. t =1; otherwise set g t =0. The stringent activation logic ensures that physical constraints will not incorrectly suppress the model's dynamic response capability during reservoir peak shaving or heavy rainfall.
[0119] Step 604: Minimize the loss function to optimize the model parameters.
[0120] When the upstream outflow is stable and there is no strong inter-regional replenishment, the gating variable is set to the active state; the specific form of the conditional physical consistency constraint is:
[0121] ;
[0122] Among them, L phy For the conditional physical consistency constraint term, λ phy As a penalty weight, g t For gated variables, and These are the conditional mean values of the current time and the previous time, respectively, e -k' This is the physical attenuation factor determined based on the interval confluence drainage curve equation.
[0123] During training, when g t When =1, L phy The term is now active. This formula uses ReLU logic (i.e., max(0,...)) to check if the predicted mean at adjacent time points conforms to an exponential decay law. Physical decay factor e -k' The k' value can be directly adopted from the drainage coefficient calibrated in the aforementioned embodiments. If the predicted value... Greater than the theoretical attenuation value This indicates that the model's prediction violates the receding water pattern, in which case a positive loss value is generated as a penalty; otherwise, the loss is 0.
[0124] Penalty weight λ phy It is usually set to a large positive number, such as 1.0 or 10.0, to ensure the priority of physical constraints. Specifically, λ phy The selection of λ needs to balance the accuracy of data fitting and the degree to which physical constraints are satisfied: when λphy When λ is too small, the penalty effect of physical constraints is insufficient, and the model may output predictions that violate the decay law during the recession phase; when λ... phy When the value is too large, the physical constraints become too strong, potentially suppressing the model's ability to respond to changes in the actual confluence. In practical applications, it is recommended to compare different values on a validation set. phy For the root mean square error of prediction and the number of physical violations under a given value, select λ that achieves Pareto optimality for both. phy Value. Based on experimental experience, for typical dry season scenarios, λ phy A value of 1.0 usually achieves a good balance.
[0125] By minimizing the total loss function, the model learns to fit the data while also incorporating the physical logic of the dry season.
[0126] Example 7: By constructing a structured two-layer sampling system, the traditional mixed uncertainty is explicitly separated into evolution parameter uncertainty and prediction residual uncertainty, and the computable attribution of key uncertainty sources is realized based on the variance decomposition principle.
[0127] Step 701: Execute the outer loop: Iterate through each evolution parameter sample in the evolution parameter set and use the probabilistic runoff forecasting model to calculate the corresponding conditional mean and conditional scale.
[0128] In this embodiment, the architecture of the two-layer set sampling is designed as a nested loop structure. The outer loop is mainly responsible for capturing the uncertainty of the evolution process. Assume that the evolution parameter set contains M samples, for example, M=50, denoted as {h1,h2,...,h...}. M For each sample h i First, the corresponding inter-regional runoff sequence is inversely derived using the water balance principle and then input into the probabilistic runoff forecasting model. The model outputs the predicted distribution parameters for the next T forecast periods under these evolution parameters: the conditional mean sequence μ. i (t) and conditional scaling sequence σ i (t), where t=1,...,T. This process generates M sets of center prediction trajectories and their corresponding uncertainty bandwidths, each representing the best prediction result under a predetermined evolutionary assumption.
[0129] Step 702, execute the inner loop: for each evolution parameter sample corresponding to the conditional mean and conditional scale, generate multiple sets of forecast residual samples that satisfy the distribution characteristics by random sampling, and superimpose them to obtain the secondary flow prediction value.
[0130] The inner loop is primarily responsible for capturing forecast uncertainties caused by data noise and model structure errors. For each pair of distribution parameters {μ} obtained from the outer loop... i (t),σi For each element (t), perform N random samplings, for example, N=20. The specific sampling formula is:
[0131] q i,j (t)=μ i (t)+σ i (t)*z j ;
[0132] Among them, z j Let be the j-th random number drawn from the standard normal distribution N(0,1). For each time t, this formula generates N random numbers around the mean μ. i (t) The specific flow rate value q of the fluctuation i,j (t). z j This represents the normalized forecast residual sample. Through the overlay operation, the abstract distribution parameters are transformed into specific flow rate numerical samples.
[0133] Step 703: Combine all the secondary flow prediction values generated by the outer and inner loops to form a two-layer ensemble forecast sample.
[0134] After the two-level loop described above, a large set of M*N traffic trajectories is obtained, for example, 50*20=1000 trajectories. These trajectories constitute a two-level ensemble forecast sample, which can be mathematically represented as a three-dimensional matrix P with dimensions M*N*T. Each trajectory corresponds to an evolution parameter-random noise combination. The structured sample set not only provides common probability forecast intervals, such as the 90% confidence interval, through statistical quantiles, but also provides the necessary data foundation for subsequent variance decomposition.
[0135] Step 704: Decompose the total variance of the two-layer ensemble forecast sample using the variance decomposition principle.
[0136] To identify the sources of risk, this embodiment employs ANOVA (Abstract Variation and Variance Analysis). According to the Law of Total Variance, the total variance Var(Y) can be decomposed into the variance of the conditional mean and the mean of the conditional variances. In the context of a two-level set, these two components correspond to evolutionary uncertainty and forecast uncertainty, respectively. This decomposition directly links abstract statistical concepts to physical processes.
[0137] Step 705: Calculate the variance of the conditional mean over the set of evolutionary parameters as a contribution to evolutionary uncertainty.
[0138] In the specific calculation, firstly, for each prediction period t, extract the M conditional mean values μ1(t),...,μ generated by the outer loop. M (t). Calculate the variance of this set of means, which is the contribution V of the evolution uncertainty. routing(t). The formula is:
[0139] ;
[0140] Among them, V routing (t) represents the variance contribution of the uncertainty at time t attributed to the evolution parameters; M is the total number of samples in the outer loop (evolution parameter set); μ i (t) is the conditional mean of the prediction corresponding to the i-th evolution parameter sample; This is the average of the predicted conditional means for all M evolution parameter samples. This metric measures how much the predicted center value has drifted due to changes in the evolution parameter h. A large value indicates that the uncertainty of the evolution model dominates the prediction divergence.
[0141] Step 706: Calculate the mean of the conditional scale squared over the set of evolution parameters as a contribution to the forecast uncertainty.
[0142] Extract the M conditional scales σ1(t),...,σ generated by the outer loop. M (t). Calculate the average of the squares (i.e., variances) of this set of scales, which is the contribution V to the forecast uncertainty. forecast (t). The formula is:
[0143] V forecast (t)=(1 / M)*Σ(σ i (t)) 2 ;
[0144] Among them, V forecast (t) represents the variance contribution of the uncertainty at time t attributed to the prediction model (and input noise), M is the total number of samples in the outer loop (evolutionary parameter set), and σ i (t) represents the prediction condition scale corresponding to the i-th evolution parameter sample.
[0145] This metric measures the average level of uncertainty of the forecast model itself, given a fixed evolution parameter. A large value indicates that no matter how accurately the evolution parameter is selected, noise in the input data or the model's own accuracy limitations can still lead to significant prediction errors.
[0146] Step 707: Compare the contributions of evolution uncertainty and forecast uncertainty, and identify the one with the larger contribution as the key source of uncertainty at the current warning time.
[0147] For each forecast period t, compare V routing (t) and V forecast The magnitude of (t). If V routing (t)>V forecastIf (t), then the current risk is mainly due to the unclear evolution process, and it is recommended to strengthen the monitoring or calibration of the river channel evolution characteristics; otherwise, the risk is due to insufficient forecast accuracy, and it is recommended to optimize the inter-regional confluence forecast model or improve the quality of input data.
[0148] Computable attribution results provide direct guidance for improving the targeting of early warning systems. For example, in a specific numerical case, if the total variance is calculated to be 12 at a certain moment, with evolution contributing 10 and forecast contributing 2, the system will clearly indicate that evolution uncertainty is dominant, and decision-makers should focus on the rationality of the evolution parameters.
[0149] Example 8 details the post-processing and operation mechanism of the early warning system from model output to final decision-making. It solves the probability bias problem through online calibration, ensures the timeliness of the system through rolling updates, and enhances the practical value of the early warning by calculating the early warning lead time.
[0150] Step 801: Calculate the proportion of samples in the two-layer ensemble forecast samples that are below the preset ecological flow threshold to obtain the uncalibrated risk probability.
[0151] After obtaining the two-layer ensemble forecast samples, for any forecast period t, we count all M*N samples where the flow value is less than the preset ecological flow threshold Q. eco Number of samples low Uncalibrated risk probability P raw (t) is calculated as follows:
[0152] P raw (t)=Count low / (M*N).
[0153] For example, if 200 out of 1000 samples are below the ecological threshold, the probability of uncalibrated risk is 20%. Due to the deviation between model assumptions (such as normal distribution) and reality, the original probability often suffers from overconfidence or underestimation and cannot be directly used to issue early warnings.
[0154] Step 802: Obtain the pairing data of historical predicted probabilities and historical actual observed events within the sliding window, and construct the probability calibration mapping function.
[0155] To correct for the aforementioned bias, the system maintains a sliding window, for example, over the past 90 days, collecting historical model prediction probabilities P for the same period. hist And the corresponding actual observation results (1 if water shortage actually occurs, 0 otherwise). Using this set of paired data, a probability calibration mapping function f is constructed. calOrdinal-preserving regression is preferred as the calibration algorithm, as it guarantees that the calibrated probability monotonically increases with the original probability and does not require assumptions about a specific functional form. The calibration function is essentially a lookup table or piecewise linear function that maps the probability assumed by the model to the actual frequency of occurrence in history.
[0156] Step 803: Correct the uncalibrated risk probability using the probability calibration mapping function to obtain the calibrated risk probability, and determine the warning level based on the calibrated risk probability.
[0157] P calculated in step 801 raw (t) Input the calibration function to obtain the calibration risk probability P cal (t)=f cal (P raw (t)). The warning level is determined based on the preset risk classification criteria. For example, it can be set as follows:
[0158] Level IV (Blue Alert): 0.2 <= P cal <0.4;
[0159] Level III (Yellow Alert): 0.4 <= P cal <0.6;
[0160] Level II (Orange Alert): 0.6 <= P cal <0.8;
[0161] Level I (Red Alert): P cal >=0.8.
[0162] This level directly determines the urgency of the response measures.
[0163] Obtain the calibration risk probability sequence corresponding to each future forecast period; retrieve the first forecast period moment in chronological order where the calibration risk probability exceeds the corresponding risk level threshold; determine the time difference between the forecast period moment and the current moment as the early warning lead time, and output it in association with the risk level.
[0164] This process defines the early warning period T. lead The calculation logic is as follows: Assuming the current day is day 0, the model outputs a calibration risk probability sequence for the next 7 days. If the system determines that a Level II warning has been triggered (threshold is 0.6), the system will start scanning backwards from day 1 to find the first condition that satisfies P. cal At time t(t)>=0.6 alert If the threshold is exceeded for the first time on the 3rd day, the warning lead time T is... lead =3 days. This means that the managers have a 3-day window to replenish the reservoir and schedule water supply in order to avoid ecological water shortage events.
[0165] The process proceeds at fixed time steps to obtain the latest upstream discharge flow and downstream cross-sectional observation flow.
[0166] An exponential forgetting mechanism is used to update the posterior distribution parameters in the lag response identification, giving higher weight to the latest observation data.
[0167] The evolution parameter set is regenerated based on the updated posterior distribution parameters, and risk warning is executed for the next time step.
[0168] The system operates on a daily, rolling basis, not as a one-time event. As time progresses to the next day and new measured data is acquired, the posterior distribution parameter α of the evolution model needs to be updated online. The update formula employs an exponential forgetting mechanism:
[0169] α new =ρ*α old +(1-ρ)*α obs ;
[0170] Where, α new For the updated Dirichlet distribution parameter vector (i.e., the new posterior parameters), α old The Dirichlet parameter vector estimated at the previous time step (or the previous time window), α obs The instantaneous parameter estimates are calculated based on the latest day's data, where ρ is the forgetting factor, for example, 0.2. This mechanism causes the weight of old information to decay at an exponential rate, ensuring that the evolution parameter set always reflects the most recent hydraulic characteristics of the river channel, thus achieving adaptive evolution of the model.
[0171] In actual operation, it also includes data quality control and anomaly handling mechanisms:
[0172] In response to abnormal input data: When negative values, abnormal peak values exceeding historical extreme values, or missing data for multiple consecutive days are detected in upstream outflow or downstream cross-sectional observed flow, the system executes the following processing strategies:
[0173] (1) For single-point outliers, linear interpolation between adjacent time points is used for replacement;
[0174] (2) For consecutive missing data (the number of missing days does not exceed 3 days), extrapolation based on the receding water pattern is used to fill the missing data;
[0175] (3) For large-scale data anomalies or long-term missing data, the system will suspend early warning output and issue a data quality alarm.
[0176] Addressing model prediction anomalies: When the conditional scale σ in the output of the probabilistic runoff forecasting model... t Exceeding the conditional mean μ tWhen the preset percentage threshold (e.g., 200%) is reached, it is determined that the prediction confidence is too low. The system will mark the warning information with excessive prediction uncertainty in the warning result, prompting decision-makers to refer to it with caution.
[0177] Example 9: Providing alternative technical implementation methods.
[0178] Regarding alternatives to time-delay response identification, in the aforementioned embodiments, total variation (TV) regularization was used to constrain the smoothness of the time-delay kernel. As an alternative, ridge regression (i.e., L2 regularization) can be used to replace the TV regularization term. In this case, the regularization term in the objective function becomes λ. tv *||h||2 2 Although ridge regression is less effective than TV regularization in suppressing oscillations under certain abrupt signals, it is simpler to calculate and can achieve acceptable results in stable channels. Furthermore, for simplex constraints, in addition to using the projected gradient method, the standard quadratic programming (QP) solver can also be used, as long as the non-negativity and normalization conditions are satisfied.
[0179] Regarding alternatives to the output distribution of the probabilistic forecasting model, the aforementioned embodiments assumed that the flow prediction error follows a Gaussian (normal) distribution. However, in some small and medium-sized rivers, the flow during the dry season may exhibit a significantly skewed distribution. As a preferred alternative, the output distribution can be set to a Gamma distribution. In this case, the output head of the probabilistic runoff forecasting model will be adjusted to output the shape parameter k. t and scale parameter θ t The negative log-likelihood term in the loss function will also be replaced with a likelihood function in the form of a Gamma distribution. This alternative is better suited to hydrological data with strictly non-negativity and skewness.
[0180] Regarding alternatives to the gating variable criterion, in the aforementioned embodiments, the gating variable g... t It is a binary variable based on a hard threshold (0 or 1). In some more refined implementations, g can be... t Designed as a continuous variable between [0,1], it is softened using the Sigmoid function or fuzzy logic. For example, g t =Sigmoid(-α'*(|dQ / dt|-threshold)), where α' is an adjustable scaling parameter (also called the steepness parameter or gain) that controls the steepness of the transition in the Sigmoid function, dQ / dt is the rate of change, and threshold is a preset threshold. This soft gating mechanism allows physical constraints to intervene gradually, avoiding the abrupt gradient changes caused by hard switching, and is more conducive to the stability of model training.
[0181] Example 10: An optional scheme for constructing a real-time risk early warning model for river ecological flow under the coupling uncertainty of low water evolution and runoff forecasting, including the following steps:
[0182] S1. Identification of physically consistent evolution parameters and quantification of uncertainty under the constraint of water discharge: Based on historical data of the stable section of reservoir discharge during the dry season, the equation of the interval confluence water discharge curve is calibrated; the maximum time delay extended response matrix is constructed, and the water discharge baseline stripping, total variation regularization and simplex constraint are introduced to identify the physically consistent time delay kernel weights; through adaptive posterior simplex sampling, the set of evolution parameters is generated to quantify the uncertainty of the dry season evolution process;
[0183] S2, Physical information embedded in distributed output forecast and two-layer ensemble generation: Based on the evolution parameter set, the interval confluence sequence is back-inferred, a dual-channel LSTM model with output conditional mean and scale is constructed, and the retreat gating consistency constraint is introduced in the training; through the two-layer ensemble method of outer evolution parameter sampling and inner forecast residual sampling, a probabilistic ensemble forecast of cross-sectional flow is generated.
[0184] S3, online calibration and calculable attribution rolling risk warning: calculate the uncalibrated risk probability based on two-layer ensemble samples, and perform online probability calibration through historical paired data in a sliding window; perform variance decomposition on the ensemble forecast samples to attribute key sources of uncertainty; combine calibration probability and preset threshold to determine the warning level and warning lead time, and establish a rolling update mechanism to achieve continuous warning.
[0185] According to one aspect of this application, the identification of physically consistent evolution parameters and the quantification of uncertainties under receding water constraints further include:
[0186] The daily discharge process of upstream reservoirs during the dry season, the daily observed flow process of downstream ecological sections, and the water use plan of the interval were selected as historical stable data. The interval confluence and drainage section corresponding to the stable discharge flow of the reservoir was extracted, the drainage curve equation of the interval confluence was calibrated, and the drainage baseline sequence was generated.
[0187] The flow rate to be explained is obtained by subtracting the drainage baseline from the cross-sectional observed flow rate and correcting for the impact of water use; the maximum time delay extended response matrix is constructed, a least squares model with total variation regularity and simplex constraint is established, the time delay kernel weight vector is identified, and the effective time delay set is determined by the stability selection method;
[0188] Centered on the identified time-delay kernel weights, the concentration is adaptively controlled using residual variance. A Dirichlet posterior distribution is constructed and sampled to generate an evolution parameter set. The posterior parameters are then updated in a rolling manner using an exponential forgetting method.
[0189] Determining the effective delay set further includes:
[0190] The discharge at the observation section was stripped using the receding water baseline sequence to obtain the discharge sequence to be explained;
[0191] An extended response matrix is constructed using the maximum time delay L, and its column vectors correspond to the upstream daily outflow at different time delays.
[0192] An optimization model with a total variation regularization term is established, with the constraint that the time delay kernel weights are non-negative and sum to 1. The solution yields a physically consistent evolution coefficient vector.
[0193] Based on the results of multiple subsample window solutions, the selection frequency of each delay is calculated, and delays that exceed the stability threshold are identified as the set of stable and effective delays.
[0194] Generate an evolution parameter set and continuously update the posterior parameters, further including:
[0195] Construct the parameter vector of the Dirichlet distribution using the time-delay kernel weight vector;
[0196] The distribution concentration coefficient is adaptively adjusted using residual variance to make the distribution dispersed when the error is large and concentrated when the error is small.
[0197] Sample from the Dirichlet posterior distribution to generate a set of evolution parameter samples that satisfy the simplex constraint;
[0198] At each rolling moment, the Dirichlet distribution parameters are updated using an exponential forgetting method, which integrates the old and new information.
[0199] According to one aspect of this application, the physical information embedded in the distributed output forecast and the two-layer ensemble generation further includes:
[0200] Based on each set of samples in the evolution parameter set, the corresponding interval confluence sequence is obtained by back-calculation through water balance, and quality control and feature extraction are performed.
[0201] A dual-channel LSTM model is constructed, where the physical channel inputs the receding water sequence and the data channel inputs the interval confluence sequence and its sliding statistical characteristics; the model output is the conditional mean and conditional scale for the forecast period.
[0202] A recession gating consistency constraint term is introduced into the training loss function to force the predicted mean to follow the recession curve decay law only during the period when recession is dominant.
[0203] For each outer evolution parameter sample, a corresponding predicted distribution parameter is generated, and a second-level ensemble forecast sample is formed through inner residual sampling.
[0204] The construction of the dual-channel LSTM model further includes:
[0205] The model output layer is designed with two heads, which output the conditional mean μ(t,τ) and conditional scale s(t,τ) for each prediction period τ, respectively, where s(t,τ)>0;
[0206] The training loss function is composed of a weighted negative log-likelihood term and a retreat consistency constraint term, wherein the retreat consistency term controls the activation timing through a gating variable.
[0207] The gating variable is determined by a combination of the upstream discharge change rate, the interval replenishment indication, and the water use plan change criterion.
[0208] According to one aspect of this application, online calibration and rolling risk warning with computable attribution further include:
[0209] Based on a two-layer ensemble sample, the proportion of samples with cross-sectional flow below the ecological threshold in each forecast period is statistically analyzed to obtain the uncalibrated risk probability.
[0210] By using the historical predicted probabilities within a sliding window and the actual subthreshold event labels, a monotonic piecewise calibration function is fitted to map the uncalibrated probabilities online, thus obtaining the calibration risk probability.
[0211] Variance decomposition is performed on the two-layer ensemble sample to calculate the contribution of evolution uncertainty, forecast uncertainty and interaction term, and the largest contribution term is identified as the key source of uncertainty.
[0212] Based on whether the calibration risk probability exceeds the preset risk classification threshold, the risk level of each forecast period is determined, and the earliest forecast period to reach the predetermined warning level is calculated as the warning lead time.
[0213] The process involves rolling forward at fixed time steps, incorporating the latest observation and scheduling information, repeatedly performing the identification of physically consistent evolution parameters and the quantification of uncertainties under the constraint of receding water, and calculating the earliest prediction period for reaching the predetermined warning level as the warning lead time, thereby realizing the dynamic updating of risk warnings.
[0214] According to one aspect of this application, a method for constructing a real-time risk early warning model for river ecological flow under the coupled uncertainty of drought evolution and runoff forecasting is provided, the overall process of which is as follows: Figure 2 As shown, this method is trained based on historical data and achieves dynamic and probabilistic monitoring and early warning of ecological flow risks by integrating physical consistency evolution parameter identification, distributed output runoff forecasting, two-layer ensemble probability calculation, online calibration, and attributable rolling early warning.
[0215] Step S01: Characterize and simulate the uncertainty of the low water level evolution, such as... Figure 3 As shown.
[0216] Data from a stable period during the dry season (usually from September to February of the following year) with no effective rainfall, no sudden changes in reservoir scheduling, and stable water use plans within the region were selected, including the daily outflow Q from the upstream reservoir. rel (t), Daily observed flow rate at downstream ecological section Q obs (t) and the interval daily water use plan (interval water use flow after restoration calculation) W(t).
[0217] Savitzky-Golay smoothing was applied to the original flow sequence to suppress short-term fluctuations. Based on the principle of water balance, multiple sets of interval confluence sequences were derived through back-calculation. Using the period of continuous stable upstream discharge as a marker, the recession segment of the interval confluence was extracted from the back-calculation results. An exponential or linear model was used to fit the recession segment, and the parameters θ of the recession curve equation were calibrated. rec And generate the full-time recession baseline sequence Q. base (t).
[0218] Furthermore, physical consistency time delay kernel identification under the constraint of water discharge is performed.
[0219] Calculate the unexplained flow y net (t):
[0220] y net (t)=Q obs (t)-Q base (t)+W(t);
[0221] Where W(t)>0 indicates that the water intake in the interval is added back after stripping to restore the natural runoff state.
[0222] Construct the extended response matrix X of the maximum delay L, whose elements are:
[0223] X(t,k)=Q rel (tk), k=0,1,…,L;
[0224] Establish an optimization model with total variation regularization and simplex constraints:
[0225] ;
[0226] Constraints:
[0227] ;
[0228] Where w is the evolution coefficient (weight vector), w k λ is the weighting coefficient corresponding to the time delay k; tv This is the total variation regularization coefficient, used to enhance the continuity of the time-delay kernel.
[0229] Identifying stable and effective time delay:
[0230] The historical data is divided into B sub-sample windows, and the above optimization is solved for each sub-window to obtain the weight vector w. (b) Calculate the selection frequency π for each delay k. k :
[0231] ;
[0232] Define the set of stable effective delays Ω: Ω = {k | π} k ≥ρ};
[0233] In the formula, ε w The weighting threshold is set, for example, 0.01. This is a stability threshold, for example, 0.8.
[0234] Furthermore, adaptive posterior simplex sampling generates an evolution parameter set.
[0235] Constructing the Dirichlet posterior distribution parameters:
[0236] ;
[0237] in, The weights for physical consistency time delay kernel identification are α0, which is the prior baseline parameter and can be 1.0.
[0238] Based on residual variance σ res 2 Adaptive setting of concentration coefficient κ:
[0239] ;
[0240] Where κ0 is the scale constant and ε is a small positive number to avoid division by zero.
[0241] Sample from the Dirichlet distribution to generate a set of evolution parameters:
[0242] ;
[0243] In the rolling update, the posterior parameters are updated using an exponential forgetting method:
[0244] ;
[0245] Where η is the forgetting coefficient, for example, 0.2. It is a vector consisting entirely of 1s.
[0246] Step S02 involves characterizing and simulating the uncertainty in runoff forecasting, such as... Figure 4 As shown.
[0247] For each sample w in the evolution parameter set (m) By inferring the confluence sequence of the intervals through water balance:
[0248] ;
[0249] The reverse sequence is subjected to quality control (negative values and anomalous mutations are removed), and the mean, standard deviation and coefficient of variation within a 7-day sliding window are calculated as input features for the data channel.
[0250] Furthermore, a dual-channel LSTM model with distributed output is constructed and trained.
[0251] Model input:
[0252] Physical channel: Recession baseline sequence Q base (t) and its first difference.
[0253] Data Channel: Interval Convergence Sequence Q lateral (t) and its moving statistical characteristics (mean, standard deviation, coefficient of variation).
[0254] Model output:
[0255] The model is configured with two output heads, which output the conditional mean μ(t,τ) and conditional scale s(t,τ) for each prediction period τ, respectively, where s(t,τ)>0 is guaranteed by the softplus activation function.
[0256] Design the loss function, the total loss function L. total By negative log-likelihood term L nll Consistency constraint term L with drainage rec constitute:
[0257] L total =L nll +β rec ·L rec ;
[0258] in:
[0259] β rec This refers to the weighting coefficient for the water discharge constraint.
[0260] ;
[0261] ;
[0262] H is the set of time ranges for prediction, Q true (t+τ) represents the actual observed flow rate, f rec (μ(t,τ),θ rec ) is the theoretical recession function, θ rec These are the physical parameters of the water receding function (parameters of the water receding curve equation).
[0263] The gated variable g(t)∈{0,1} is determined by a combination of criteria such as the upstream discharge rate of change and the abrupt change in water use in the interval, and is activated (set to 1) only during the period when water recedes.
[0264] Furthermore, a two-layer ensemble forecast is generated.
[0265] Outer loop (evolutionary uncertainty): for each evolution parameter sample w (m) Repeat the above steps to obtain the corresponding forecast distribution parameter μ. (m) (t,τ) and s (m) (t,τ).
[0266] Inner loop (forecast uncertainty): For each outer sample m, sample N standard normal random numbers z. (n) ~N(0,1), generate secondary flow samples:
[0267] ;
[0268] The final result is a two-layer ensemble forecast sample of size M×N.
[0269] Step S03: Construct and continuously update the real-time risk early warning model for river ecological flow.
[0270] Perform online probability calibration.
[0271] Collect sliding windows Historical data (e.g., the last 90 days) ,in, Labels for actual subthreshold events. Q represents the current uncalibrated risk probability. eco This is a preset ecological flow threshold.
[0272] The monotonic calibration function C is fitted using the equal-frequency binning method or ordinal-preserving regression. τ (·)
[0273] Current uncalibrated risk probability Perform calibration:
[0274] .
[0275] Furthermore, attribution of key sources of uncertainty is conducted:
[0276] Calculate the outer conditional mean:
[0277] ;
[0278] Perform variance decomposition:
[0279] ;
[0280] Among them, U evo Contribution to evolutionary uncertainty; U for Contribution to forecast uncertainty; U int The contribution of the interaction term is the remaining part of the total variance after subtracting the sum of the contributions of the first two terms.
[0281] Determine the critical source source(t,τ):
[0282] ;
[0283] Based on this, early warnings are issued and updated on a rolling basis.
[0284] Set risk classification threshold γ r For example, if γ1=0.5 and γ2=0.8, the risk level is determined based on the calibration probability:
[0285] ;
[0286] The calculation shows that the predetermined warning level r has been reached. * Early warning period:
[0287] ;
[0288] Progress is made in a rolling fashion with fixed time steps (usually 1 day):
[0289] 1) Incorporate the latest observed flow rate Q obs (t), discharge flow rate Q rel (t) and the updated scheduling plan;
[0290] 2) Re-execute the online probability calibration to early warning output and rolling update steps, and update the evolution parameter posterior, forecast distribution, calibration probability and attribution results;
[0291] 3) Output the updated risk level, early warning period and key sources of uncertainty to achieve continuous and adaptive ecological flow risk early warning.
[0292] This application employs a receding water baseline stripping technique to purify the confluence signal, and combines this with total variation (TV) regularization and simplex constraints to effectively suppress oscillations in numerical calculations. It also forces the identified time-delay kernel weights to meet the requirements of non-negativity, normalization, and smoothness, ensuring that the evolution model strictly adheres to the water balance principle. This solves the problem of non-physical oscillations and negative values easily occurring in the identification of evolution parameters under weak signals during the dry season.
[0293] This application constructs a dual-channel model (physical and data-driven) and embeds a recession gating consistency constraint into the training loss function. This mechanism automatically activates the physical penalty term during the dominant recession period using gating variables, forcing the model output to follow an exponential decay law, thus eliminating the phenomenon of long-term prediction results diverging or violating physical common sense. It solves the problem of pure data-driven forecast models violating physical decay laws during the recession phase.
[0294] This application establishes a two-layer ensemble sampling architecture for evolution parameters and forecast residuals. Through a nested structure of outer Dirichlet adaptive sampling and inner random sampling, combined with variance decomposition technology, it achieves the separation and independent quantification of two types of uncertainties at the topological level, solving the problem of accurately locating the source of risk. It also addresses the difficulty in decoupling and attributing evolution parameter errors and forecast residuals.
[0295] The preferred embodiments of the present invention have been described in detail above. However, the present invention is not limited to the specific details in the above embodiments. Within the scope of the technical concept of the present invention, various equivalent transformations can be made to the technical solutions of the present invention, and these equivalent transformations all fall within the protection scope of the present invention.
Claims
1. A method for real-time risk early warning of river ecological flow, characterized in that, include: Obtain historical hydrological observation data of the river during the dry season, including upstream discharge and downstream cross-sectional observation flow; Based on historical hydrological observation data, the time-delay response relationship that satisfies the physical consistency constraint is identified, and a set of evolution parameters representing the uncertainty of the evolution process is generated. Based on the evolution parameter set, the inter-regional confluence characteristics are constructed and input into the pre-constructed probabilistic runoff forecasting model to obtain the predicted distribution statistics of the downstream section. Two-layer ensemble sampling is performed based on the evolution parameter set and the prediction distribution statistics to generate a cross-sectional flow ensemble forecast that couples evolution uncertainty and forecast uncertainty; Based on cross-sectional flow ensemble forecasts, risk probabilities are calculated and key sources of uncertainty are identified, resulting in ecological flow risk warnings. Based on historical hydrological observation data, the time-delay response relationships that satisfy physical consistency constraints are identified, including: Extract the stable water receding section during the dry season from historical hydrological observation data, calibrate the interval confluence water receding curve equation, and generate the water receding baseline sequence for the entire time period. By stripping the recession baseline sequence from the downstream cross-sectional flow observations, an unexplained flow sequence characterizing the inter-regional net confluence response is obtained; Based on the flow sequence to be explained and the upstream outflow flow, a time-delayed response identification is performed; Execution delay response identification includes: Based on the preset maximum delay extension length, an extended delay response matrix is constructed using the upstream outflow. To minimize the error between the traffic sequence to be explained and the reconstructed traffic, a least squares optimization model containing sparse smooth regularization terms and physical consistency constraints is established. Solve the least squares optimization model to obtain a physically consistent time-delay kernel weight vector; Inter-regional runoff characteristics are constructed based on the evolution parameter set and input into a pre-constructed probabilistic runoff forecasting model, including: Based on the evolution parameter samples in the evolution parameter set, the interval confluence sequence is inferred by using the water balance principle, and the sliding statistical characteristics of the interval confluence sequence are calculated. A probabilistic runoff forecasting model with a dual-channel input structure was constructed, in which the physical channel inputs the recession baseline sequence extracted from historical hydrological observation data, and the data channel inputs the interval confluence sequence and its sliding statistical characteristics. The conditional mean and conditional scale of the predicted cross-sectional flow distribution within the future forecast period are output using a probabilistic runoff forecast model.
2. The method according to claim 1, characterized in that, The least squares optimization model satisfies the following conditions: The sparse smoothing regularization term is the total variation regularization term, used to suppress non-physical oscillations of the lagging kernel weight vector between adjacent lagging steps; The physical consistency constraint is a simplex constraint, used to ensure that each element of the time-delay kernel weight vector is non-negative and the sum of the elements is equal to 1; The time delay kernel weight vector is determined by repeatedly solving the least squares optimization model after dividing historical hydrological observation data into multiple sub-sample windows, and then filtering and retaining the effective time delay set based on the stability threshold.
3. The method according to claim 1, characterized in that, Generate a set of evolution parameters that characterize the uncertainties in the evolution process, including: Calculate the identification residual sequence and its residual variance of the least squares optimization model; Using the time-delay kernel weight vector as the distribution center, construct a posterior probability distribution that satisfies the simplex constraint; The concentration coefficient of the posterior probability distribution is dynamically adjusted based on the residual variance, so that the dispersion of the posterior probability distribution is positively correlated with the residual variance. Multiple samplings are performed from the adjusted posterior probability distribution to generate a set of evolution parameters.
4. The method according to claim 3, characterized in that, The posterior probability distribution is a Dirichlet distribution; the concentration coefficient of the posterior probability distribution is dynamically adjusted based on the residual variance, which is a parameter vector of the Dirichlet distribution determined by the inverse proportional mapping function; The inverse proportional mapping function is: κ = κ0 / max(σ) res 2 ,ε); Where κ is the concentration coefficient, κ0 is the preset baseline scale constant, and σ res 2 ε is the residual variance, and ε is the preset truncation threshold to prevent the denominator from being zero. The parameter vector of the Dirichlet distribution is determined by the product of the time-delay kernel weight vector and the concentration coefficient.
5. The method according to claim 1, characterized in that, The training process for a probabilistic runoff forecasting model includes: Construct a loss function that includes a negative log-likelihood term and a conditional physical consistency constraint term; During the training iteration process, it is monitored in real time whether the physical dominance condition is met in the current time period; When the physical dominance condition is met, the physical consistency constraint term is activated to force the conditional mean to follow the physical decay law. Minimize the loss function to optimize the model parameters.
6. The method according to claim 5, characterized in that, The activation state of the conditional physical consistency constraint is controlled by a gating variable, and the calculation logic of the gating variable is as follows: A gated variable is constructed based on the rate of change of upstream outflow and a preset interval replenishment indicator; when the upstream outflow is stable and there is no strong interval replenishment, the gated variable is set to the active state. The specific form of the conditional physical consistency constraint is as follows: ; Among them, L phy For the conditional physical consistency constraint term, λ phy As a penalty weight, g t For gated variables, and These are the conditional mean values of the current time and the previous time, respectively, e -k' This is the physical attenuation factor determined based on the interval confluence drainage curve equation.
7. The method according to claim 1, characterized in that, Generate a ensemble forecast of cross-sectional flows that combines evolutionary and forecast uncertainties, including: Execute the outer loop: Iterate through each evolution parameter sample in the evolution parameter set and use the probabilistic runoff prediction model to calculate the corresponding conditional mean and conditional scale; Execute the inner loop: For each evolution parameter sample, the conditional mean and conditional scale are used to generate multiple sets of forecast residual samples that meet the distribution characteristics through random sampling, and then superimposed to obtain the secondary flow prediction value; The second-level flow predictions generated by all outer and inner loops are combined to form a two-layer ensemble forecast sample.
8. The method according to claim 7, characterized in that, Risk probabilities are calculated and key sources of uncertainty are identified based on cross-sectional flow ensemble forecasts, including: The total variance of the two-level ensemble forecast sample is decomposed using the principle of variance decomposition. The variance of the conditional mean over the set of evolution parameters is calculated as a contribution to the evolution uncertainty. The mean of the conditional scale squared over the set of evolution parameters is calculated as a contribution to the forecast uncertainty. By comparing the contributions of evolution uncertainty and forecast uncertainty, the one with the larger contribution is identified as the key source of uncertainty at the current warning time.