A runoff prediction method and system fusing dynamic physical priors and graph networks
By constructing directed graphs and edge-conditional convolutional networks to generate transmission weights and gating coefficients, and combining them with the R-Vine Copula model, the problem of imbalance between the physical structure of the river network and the dynamic evolution of hydrology in runoff prediction is solved. This achieves accurate capture of complex topological structures and spatiotemporal characteristics, and improves the prediction and risk quantification capabilities for extreme flood peaks.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- XIAN UNIV OF TECH
- Filing Date
- 2026-05-20
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies for runoff prediction suffer from several problems, including the imbalance between constraints of river network physical structure and hydrological dynamic evolution, difficulties in capturing long-term spatiotemporal dependencies, data quality defects, and a lack of risk quantification capabilities in deterministic point prediction. These issues make it difficult to accurately capture complex topological structures and spatiotemporal characteristics, resulting in prediction results lacking support from probability distribution information.
By constructing a directed graph, generating transmission weights and gating coefficients, and combining edge-conditional convolutional networks and multilayer perceptrons, dynamic physical prior features are generated. The R-Vine Copula model is then used for probabilistic forecasting. This constructs a runoff prediction method and system that integrates dynamic physical priors and graph networks, enhancing the ability to capture features and quantify risks of extreme flood peaks.
It enables the joint characterization of static river channel attributes and dynamic hydrological evolution processes in a watershed, accurately reconstructs the continuous physical migration process of spatial displacement accumulating over time, provides a scientific and systematic risk assessment reference, and enhances the ability to predict extreme flood peaks.
Figure CN122242570A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to a method and system for runoff prediction that integrates dynamic physical priors and graph networks, belonging to the field of hydrological prediction technology.
Background Technology
[0002] Accurate runoff forecasting is crucial for flood control and disaster reduction, rational water resource allocation, and sustainable ecological development. With advancements in remote sensing technology and perception methods, data-driven models based on deep learning, with their powerful ability to mine high-dimensional nonlinear features, have demonstrated higher computational efficiency and robustness in runoff forecasting compared to traditional physics-driven models. However, traditional machine learning has limitations in capturing spatial dependencies and heterogeneity when dealing with the prevalent non-Euclidean data with complex topological connections in the real world.
[0003] The introduction of Graph Neural Networks (GNNs) provides a novel mathematical framework for solving this problem. This architecture naturally adapts to complex topologies, accurately captures long-range dependencies between nodes, and enhances the model's ability to represent complex network systems. However, existing GNNs still face the following limitations when processing non-Euclidean graph data and node-level runoff prediction tasks:
[0004] (1) Imbalance between the physical structural constraints of the river network and the dynamic evolution of hydrology. On the one hand, some existing models, such as the Diffusion Convolutional Recurrent Neural Network (DCRNN) and the Convolution with Edge-Node Switching Graph Neural Network (CensNet), attempt to deeply integrate edge attributes with the graph structure, but their weight mechanisms are mostly static and struggle to capture the real-time fluctuations in runoff convergence efficiency driven by meteorology, which limits the model's ability to express dynamic hydrological processes. On the other hand, weight-learning models that rely entirely on data-driven approaches, such as the Graph Attention Network (GAT) and AdaTrip, can adjust dynamically but often lose the inherent physical topological constraints of the watershed and lack robustness under abnormal conditions such as extreme rainfall.
[0005] (2) Difficulty in capturing long-term spatiotemporal dependence, and distortion of physical evolution. A GNN essentially extracts features from discrete time slices. This cuts off continuity in the time dimension, so the model cannot capture the long- and short-lag effects and cumulative effects in hydrological evolution. Although some techniques combine GNNs with recurrent neural networks (RNNs), this loosely coupled mode of static spatial aggregation plus general temporal inference leaves the spatial features fed to the temporal layer physically fixed and lagging, making it difficult for the model to reproduce the spatiotemporal cumulative characteristics of watershed runoff during physical evolution.
[0006] (3) Deep learning models face a dual challenge from data quality defects and the distribution characteristics of extreme runoff. On the input side, multi-source heterogeneous observation data often contain missing values and noise. Existing models typically apply simple zero-padding or linear interpolation during preprocessing, which injects accumulated noise into the network during feature aggregation and information transmission. On the optimization side, the runoff sequence itself has a highly asymmetric, long-tailed distribution. The traditional Mean Squared Error (MSE) loss function penalizes all samples equally, so the network tends to fit the normal-flow periods that dominate the record, seriously neglects the low-frequency, high-risk extreme flood peak signals, and cannot avoid the asymmetric risks caused by prediction bias on extreme events.
[0007] (4) Deterministic point prediction only provides a single numerical output, making it difficult to quantify the complex stochastic uncertainties in runoff prediction. This is because, driven by factors such as atmospheric circulation, significant implicit correlations exist between distant stations that do not have direct hydrological connections. These complex implicit dependencies exceed the scope of traditional GNNs, preventing them from effectively characterizing the spatiotemporal uncertainties in runoff evolution and leading to a severe underestimation of systemic risks. This limitation results in predictions lacking probability distribution information, making it difficult to meet the need for uncertainty quantification in high-risk decision-making.
Summary of the Invention
[0008] This invention provides a method and system for runoff prediction that integrates dynamic physical priors and graph networks. It can solve problems such as the imbalance between physical topological constraints and dynamic evolution laws, the distortion of spatiotemporal feature coupling, and the lack of risk quantification capability in deterministic point prediction in the prior art.
[0009] In one aspect, the present invention provides a runoff prediction method that integrates dynamic physical priors and graph networks, the method comprising:
[0010] S1. Construct a directed graph using multiple hydrological stations and multiple meteorological stations within the basin as nodes. The directed graph includes the observation features of each node and the static edge features of each directed edge. The nodes corresponding to the hydrological stations are denoted as hydrological nodes, and the nodes corresponding to the meteorological stations are denoted as meteorological nodes. Determine the observation mask for each observation feature of each node at each time. The observation mask is used to characterize whether the measured data of the corresponding observation feature is valid.
[0011] S2. Based on the static edge features of each directed edge in the directed graph, a transmission weight is generated for the corresponding directed edge using a filter generation network. The transmission weight is used to characterize the hydraulic transmission capacity between the two nodes connected by the corresponding directed edge.
[0012] S3. Based on the static edge features of each directed edge and the observation features of each node at each time step, determine the gating coefficient of each directed edge at each time step using a multilayer perceptron and a gating mechanism.
[0013] S4. Based on the transmission weights, the gating coefficients, and the observation features of each node at each time step, generate the dynamic physical prior features of each node at each time step using the edge-conditional convolutional network.
[0014] S5. Determine the predicted runoff for each hydrological node based on the observation mask and the dynamic physical prior features.
[0015] Optionally, after S5, the method further includes:
[0016] S6. Based on the predicted runoff at each hydrological node, determine the runoff probability forecast interval corresponding to each predicted runoff using the R-Vine Copula model.
[0017] Optionally, S5 specifically includes:
[0018] A runoff prediction model is constructed based on the observation mask;
[0019] Based on the aforementioned dynamic physical prior features, the predicted runoff for each hydrological node is determined using the aforementioned runoff prediction model.
[0020] Optionally, a runoff prediction model is constructed based on the observation mask, specifically including:
[0021] Construct a loss function based on the observation mask;
[0022] By integrating the aforementioned dynamic physical prior features, an initial model for predicting runoff at each hydrological node is constructed based on the aforementioned loss function and time-dimensional neural network.
[0023] The hyperparameter combination of the initial model is optimized using a hyperparameter optimization algorithm, and the optimized initial model is used as the runoff prediction model.
[0024] Optionally, a loss function is constructed based on the observation mask, specifically including:
[0025] Determine the node weight of each hydrological node, determine the basic weight of the sample based on the flood flow threshold, and determine the deviation correction weight based on the prediction deviation.
[0026] A triple nested weight structure is constructed based on the node weight, the sample base weight, and the deviation correction weight;
[0027] A loss function is constructed based on the triple nested weights and the observation mask.
[0028] Optionally, a hyperparameter optimization algorithm is used to optimize the combination of hyperparameters of the initial model, specifically including:
[0029] Multiple hyperparameter combinations are iteratively generated using a hyperparameter optimization algorithm, and the performance index of the initial model under different hyperparameter combinations is determined based on the Nash efficiency coefficient.
[0030] The determined performance indicators are divided into positive indicators that meet the preset performance and negative indicators that do not meet the preset performance. A first probability density model is constructed based on all positive indicators, and a second probability density model is constructed based on all negative indicators.
[0031] The hyperparameter combination of the initial model is optimized with the goal of maximizing the ratio of the first probability density model to the second probability density model.
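As an illustrative, non-limiting sketch of the selection rule in the three steps above, the following Python code scores trials with the Nash efficiency coefficient (commonly known as the Nash-Sutcliffe efficiency), splits them at a preset threshold, and picks the candidate that maximizes the ratio of the two densities. The one-dimensional hyperparameter, the Gaussian kernel density estimate, the bandwidth, and the function names `nse`, `kde`, and `pick_candidate` are assumptions made for illustration; the source does not specify the density models.

```python
import numpy as np

def nse(obs, sim):
    """Nash efficiency coefficient: 1 is a perfect fit, 0 matches the mean of obs."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kde(samples, x, bandwidth):
    """Gaussian kernel density of 1-D samples evaluated at points x (assumed density model)."""
    samples = np.asarray(samples, float)[:, None]
    x = np.asarray(x, float)[None, :]
    kernels = np.exp(-0.5 * ((x - samples) / bandwidth) ** 2)
    return kernels.mean(axis=0) / (bandwidth * np.sqrt(2.0 * np.pi))

def pick_candidate(tried, scores, candidates, threshold, bandwidth=0.1):
    """Return the candidate maximizing density(positive) / density(negative).

    Assumes both groups are non-empty.
    """
    tried, scores = np.asarray(tried, float), np.asarray(scores, float)
    positive = tried[scores >= threshold]   # trials that meet the preset performance
    negative = tried[scores < threshold]    # trials that do not
    ratio = kde(positive, candidates, bandwidth) / (kde(negative, candidates, bandwidth) + 1e-12)
    return candidates[int(np.argmax(ratio))]

# Hypothetical trial history for a single hyperparameter scaled to [0, 1].
tried = [0.1, 0.2, 0.8, 0.9]
scores = [0.90, 0.85, 0.30, 0.20]           # performance index of each trial
candidates = np.array([0.15, 0.5, 0.85])
best = pick_candidate(tried, scores, candidates, threshold=0.7)
```

In this toy history the well-performing trials cluster at small values, so the candidate nearest that cluster is selected.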
[0032] Optionally, the observed characteristics of the hydrological nodes include measured runoff; prior to S2, the method further includes:
[0033] The measured runoff at each hydrological node at each time step is standardized based on the observation mask to obtain the standardized runoff.
[0034] Correspondingly, the observation features in S3 and S4 include the standardized runoff at each hydrological node at each time point.
[0035] Optionally, the measured runoff at each hydrological node at each time point is standardized according to the observation mask to obtain the standardized runoff, specifically including:
[0036] Logarithmically compress the measured runoff at each hydrological node at each time point to obtain the compressed runoff.
[0037] Based on the compressed runoff at all times and the observation mask of the measured runoff at all times for each hydrological node, determine the mean and standard deviation of the compressed runoff at all times for each node.
[0038] The standard score of the compressed runoff at each node at each time step is determined based on the mean and the standard deviation, and the standard score is determined as the standardized runoff of the corresponding measured runoff.
[0039] Optionally, S6 specifically includes:
[0040] Based on the residual between the predicted runoff and the measured runoff at each hydrological node, a residual simulation model is constructed using the R-Vine Copula model.
[0041] The residual simulation model is used to generate simulated residual samples corresponding to each predicted runoff volume;
[0042] The simulated residual samples are superimposed with their corresponding predicted runoff volumes to obtain the runoff probability forecast interval for each predicted runoff volume.
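As an illustrative sketch of the superposition step only, the following Python code adds simulated residual samples to the predicted runoff and reads an interval off the empirical quantiles. Fitting and sampling the R-Vine Copula are not shown: the residual samples are assumed to be supplied as an array, and the function name `forecast_interval`, the interval level, and the sign convention (residual = measured minus predicted) are assumptions.

```python
import numpy as np

def forecast_interval(pred, residual_samples, level=0.9):
    """pred: (N,) predicted runoff at N hydrological nodes.
    residual_samples: (S, N) simulated residuals, one row per joint sample.
    Returns (lower, upper), each of shape (N,)."""
    pred = np.asarray(pred, float)
    ensemble = pred[None, :] + np.asarray(residual_samples, float)  # superimpose residuals
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(ensemble, alpha, axis=0)
    upper = np.quantile(ensemble, 1.0 - alpha, axis=0)
    return lower, upper

# Hypothetical values for two nodes and three simulated residual samples.
pred = np.array([10.0, 20.0])
residuals = np.array([[-1.0, -2.0], [0.0, 0.0], [1.0, 2.0]])
lower, upper = forecast_interval(pred, residuals, level=1.0)
```

Because each row of the residual array is a joint sample across nodes, the inter-station dependence captured by the copula is preserved in the ensemble before the per-node quantiles are taken.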
[0043] In another aspect, the present invention provides a runoff prediction system that integrates dynamic physical priors and graph networks, the system comprising:
[0044] The directed graph generation unit is used to construct a directed graph with multiple hydrological stations and multiple meteorological stations in the basin as nodes. The directed graph contains the observation features of each node and the static edge features of each directed edge. The nodes corresponding to the hydrological stations are recorded as hydrological nodes and the nodes corresponding to the meteorological stations are recorded as meteorological nodes. The observation mask of each node for each observation feature at each time is determined. The observation mask is used to characterize whether the measured data of the corresponding observation feature is valid.
[0045] The transmission weight generation unit is used to generate the transmission weight of the corresponding directed edge based on the static edge features of each directed edge in the directed graph, using the filter generation network. The transmission weight is used to characterize the hydraulic transmission capacity between the two nodes connected by the corresponding directed edge.
[0046] The gating coefficient generation unit is used to determine the gating coefficient of each directed edge at each time step based on the static edge features of each directed edge and the observation features of each node at each time step, using a multilayer perceptron and a gating mechanism.
[0047] The prior feature generation unit is used to generate the dynamic physical prior features of each node at each time step based on the transmission weights, the gating coefficients, and the observation features of each node at each time step, using the edge-conditional convolutional network.
[0048] The runoff prediction unit is used to determine the predicted runoff for each hydrological node based on the observation mask and the dynamic physical prior features.
[0049] The beneficial effects that this invention can produce include:
[0050] This invention achieves a real-time balance between physical topological constraints and dynamic hydrological evolution through edge-conditional convolution and dynamic gating mechanisms. By constructing a spatiotemporal graph neural network architecture that integrates physical priors and state awareness, it achieves a joint representation of the static river channel attributes and dynamic hydrological evolution process of the watershed, accurately reconstructing the continuous physical migration process of spatial displacement accumulating over time. Utilizing a data observation mask and an asymmetric loss function with triple-nested weight constraints, the model's ability to capture features of extreme flood peak samples is significantly enhanced. Finally, by using the R-Vine Copula model to capture the implicit spatial correlations between stations, it transforms traditional deterministic point prediction into probabilistic forecasting that incorporates the spatial correlation characteristics and physical evolution logic of the watershed, providing a more scientific and comprehensive systemic risk assessment reference for the optimal allocation of watershed water resources and for flood control and disaster reduction decisions.
Attached Figure Description
[0051] Figure 1 is a flowchart of the runoff prediction method that integrates dynamic physical priors and graph networks provided in Embodiment 1 of the present invention;
[0052] Figure 2 is an example of a watershed node spatial topology diagram provided in Embodiment 2 of the present invention;
[0053] Figure 3 is a schematic diagram of the structure of the first-layer tree (Tree 1) of the R-Vine Copula provided in Embodiment 2 of the present invention;
[0054] Figure 4 shows the runoff probability forecast intervals for multiple hydrological nodes provided in Embodiment 2 of the present invention.
Detailed Implementation
[0055] The present invention will now be described in detail with reference to the embodiments, but the present invention is not limited to these embodiments.
[0056] Embodiment 1 of this invention provides a runoff prediction method that integrates dynamic physical priors and graph networks. As shown in Figure 1, the method includes:
[0057] S1. Construct a directed graph of the watershed using multiple hydrological stations and multiple meteorological stations as nodes. This directed graph includes the observation features of each node and the static edge features of each directed edge. Nodes corresponding to hydrological stations are denoted as hydrological nodes, and nodes corresponding to meteorological stations are denoted as meteorological nodes. Determine the observation mask for each observation feature of each node at each time point. The observation mask is used to characterize whether the measured data for the corresponding observation feature is valid. Specifically, this includes:
[0058] S1.1: This embodiment addresses the non-Euclidean spatial distribution characteristics of observation stations such as hydrological and meteorological stations within a watershed system. Both hydrological and meteorological stations are abstracted as graph nodes, and the topological connections between nodes are predefined based on runoff evolution patterns, constructing a directed graph based on physical topological constraints. This directed graph breaks down the spatial isolation between observation stations, mapping the observation results of each station to the observation feature time series of each node on the directed graph, and transforming the spatial physical association attributes between observation stations into static edge features. The observation feature time series of each node and the static edge features of the directed edges between nodes together constitute complete directed graph data, laying the structural foundation for subsequent spatiotemporal feature fusion.
[0059] Specifically, the directed graph of the watershed can be represented as $G = (V, E)$, where $V = \{v_1, v_2, \dots, v_N\}$ denotes a set of $N$ nodes comprising $N_h$ hydrological nodes and $N_m$ meteorological nodes, $v_i$ denotes the $i$-th node in the node set $V$, and $E$ denotes the set of all directed edges in the directed graph.
[0060] The observation features of node $v_i$ at time $t$ include runoff, rainfall, temperature, humidity, and static elevation features, represented as the following vector:
[0061] $$\mathbf{x}_i^t = \left[q_i^t,\ p_i^t,\ \tau_i^t,\ h_i^t,\ z_i\right] \tag{1}$$
[0062] In formula (1), $\mathbf{x}_i^t$ denotes the observation features of node $v_i$ at time $t$, with $\mathbf{x}_i^t \in \mathbb{R}^{F}$, where $\mathbb{R}$ denotes the field of real numbers, i.e., $\mathbf{x}_i^t$ is an $F$-dimensional feature vector over the real field; $q_i^t$, $p_i^t$, $\tau_i^t$, $h_i^t$, and $z_i$ denote the runoff, rainfall, temperature, humidity, and static elevation features of node $v_i$ at time $t$, respectively.
[0063] The directed edge connecting node $v_j$ to node $v_i$ is denoted $e_{ji}$. It characterizes water flowing from node $v_j$ to node $v_i$; node $v_j$ is called the source node and node $v_i$ the target node.
[0064] Unlike traditional graph structures that rely solely on 0-1 adjacency matrices (i.e., containing only unweighted edges indicating connectivity), the directed graph structure constructed in this embodiment defines an edge attribute mapping function. For each directed edge $e_{ji}$ in the directed graph, the edge attribute mapping function $\phi$ maps it into a static physical attribute vector space over the real field $\mathbb{R}$, which yields the static edge features of each directed edge.
[0065] The static edge features of the directed edge $e_{ji}$ (from source node $v_j$ to target node $v_i$) include physical features such as the node distance, elevation difference, slope, inverse distance, channel length, and self-loop marker features. They are represented as the following vector:
[0066] $$\mathbf{a}_{ji} = \phi(e_{ji}) = \left[d_{ji},\ \Delta z_{ji},\ s_{ji},\ d_{ji}^{-1},\ l_{ji},\ \delta_{ji}\right] \tag{2}$$
[0067] In formula (2), $\mathbf{a}_{ji}$ denotes the static edge features of the directed edge $e_{ji}$; $\phi$ denotes the edge attribute mapping function; $d_{ji}$, $\Delta z_{ji}$, $s_{ji}$, $d_{ji}^{-1}$, $l_{ji}$, and $\delta_{ji}$ denote the node distance, elevation difference, slope, inverse distance, channel length, and self-loop marker features of the directed edge $e_{ji}$, respectively. These features encode key geographical information within the watershed and can serve as prior knowledge to guide the message passing of the subsequently constructed runoff prediction model.
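As an illustrative, non-limiting sketch of how the six static edge features listed above might be assembled, the following Python code computes them for a few directed edges. The station names, planar coordinates, channel lengths, the sign convention for the elevation difference (source minus target), and the zeroed geometric features on self-loops are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Station:
    name: str
    kind: str           # "hydro" or "meteo"
    x_km: float         # hypothetical planar coordinates
    y_km: float
    elevation_m: float

def static_edge_features(src, dst, channel_km):
    """Features of the directed edge src -> dst, ordered as
    [distance, elevation difference, slope, inverse distance,
     channel length, self-loop marker]."""
    if src.name == dst.name:
        # Self-loop: geometric features set to zero (assumption), marker set to 1.
        return [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
    dist = math.hypot(dst.x_km - src.x_km, dst.y_km - src.y_km)
    drop = src.elevation_m - dst.elevation_m    # source minus target (assumption)
    slope = drop / (dist * 1000.0)              # metres of drop per metre of distance
    return [dist, drop, slope, 1.0 / dist, channel_km, 0.0]

stations = {
    "H1": Station("H1", "hydro", 0.0, 0.0, 500.0),
    "H2": Station("H2", "hydro", 3.0, 4.0, 400.0),
    "M1": Station("M1", "meteo", 0.0, 1.0, 520.0),
}
# Directed edges as (source, target, channel length in km); values are illustrative.
links = [("H1", "H2", 6.0), ("M1", "H1", 0.0), ("H1", "H1", 0.0), ("H2", "H2", 0.0)]
edge_features = {
    (s, d): static_edge_features(stations[s], stations[d], length)
    for s, d, length in links
}
```

Keying the dictionary by (source, target) keeps the edge direction explicit, so downstream aggregation can be restricted to the physical flow direction.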
[0068] S1.2 Determine the observation mask for each observation feature of each node at each time step.
[0069] Missing data and asymmetry across heterogeneous data sources in actual observations degrade the prediction accuracy of the subsequently constructed runoff prediction model. To address this, this embodiment further determines the observation mask for each observation feature of each node at each time step. The observation mask characterizes whether the measured data for the corresponding observation feature is valid. The set of observation masks for all observation features of all nodes at all time steps can be represented as an observation mask matrix, i.e., $\mathbf{M} \in \{0, 1\}^{T \times N \times F}$, where $\mathbf{M}$ denotes the observation mask matrix, $T$ denotes the total number of time steps, $N$ denotes the number of nodes in the node set $V$, and $F$ denotes the total number of observation features. The observation mask of the $k$-th observation feature of node $v_i$ at time $t$ is denoted $m_{i,k}^t$. When the $k$-th observation feature of node $v_i$ ($v_i \in V$) at time $t$ has a valid observation (i.e., the measured data of the $k$-th observation feature is valid), $m_{i,k}^t = 1$; otherwise, $m_{i,k}^t = 0$. Through element-wise multiplication in the subsequent calculations of the runoff prediction model, the observation mask explicitly eliminates the interference of invalid observations, guiding the model to update its parameters only within the valid spatiotemporal domain.
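As an illustrative sketch of the masking idea, the following Python code derives a 0/1 observation mask from an array in which invalid readings are stored as NaN, and shows how element-wise multiplication removes those entries from an error term. Marking invalid data with NaN and the function names `observation_mask` and `masked_mse` are assumptions; the masked mean squared error is only a stand-in for the loss function of the method.

```python
import numpy as np

def observation_mask(obs):
    """obs: array of shape (T, N, F) with NaN marking invalid readings.
    Returns a mask of the same shape: 1.0 where valid, 0.0 otherwise."""
    return np.isfinite(obs).astype(np.float32)

def masked_mse(pred, target, mask):
    """Squared error averaged over valid entries only."""
    # Replace invalid targets first: NaN * 0 would still be NaN.
    target = np.where(mask > 0, target, 0.0)
    err = mask * (pred - target) ** 2       # element-wise masking
    return float(err.sum() / max(float(mask.sum()), 1.0))

# Hypothetical data: T = 2 time steps, N = 1 node, F = 2 features.
obs = np.array([[[1.0, np.nan]], [[5.0, 4.0]]])
pred = np.array([[[1.0, 2.0]], [[3.0, 4.0]]])
mask = observation_mask(obs)
loss = masked_mse(pred, obs, mask)
```

Only the three valid entries contribute to the result, so gradients computed from such a loss would not be driven by missing readings.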
[0070] S1.3: Standardize the measured runoff of each hydrological node at each time step according to the observation mask to obtain the standardized runoff.
[0071] The runoff characteristics observed at hydrological nodes include measured runoff volume. To eliminate dimensional differences between different measured runoff volumes and mitigate the long-tailed distribution of runoff data, this embodiment also performs standardization processing on the measured runoff volume, stabilizing the data variance through nonlinear transformation, and providing a standardized data foundation for the convergence of the subsequently constructed runoff prediction model.
[0072] Specifically, for the measured runoff $q_i^t$ of hydrological node $v_i$ at time $t$, this embodiment suppresses the skewed distribution of high peak flows and mitigates the long-tailed distribution through logarithmic compression, yielding the compressed runoff. The logarithmic compression can be expressed as:
[0073] (3)
[0074] In formula (3), $q_i^t$ denotes the measured runoff of hydrological node $v_i$ at time $t$, and $\tilde{q}_i^t$ denotes the compressed runoff of hydrological node $v_i$ at time $t$ obtained after logarithmic compression.
[0075] Meanwhile, in this embodiment, the mean and standard deviation of the compressed runoff of each hydrological node over all times are determined from the compressed runoff of that node at all times and the observation mask of its measured runoff at all times.
[0076] The formula for calculating the mean is as follows:
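[0077] $$\mu_i = \frac{\sum_{t=1}^{T} m_i^t\, \tilde{q}_i^t}{\sum_{t=1}^{T} m_i^t} \tag{4}$$
[0078] In formula (4), $\mu_i$ denotes the mean of the compressed runoff of hydrological node $v_i$ over all times; $\tilde{q}_i^t$ denotes the compressed runoff of hydrological node $v_i$ at time $t$; $m_i^t$ denotes the observation mask of the measured runoff of hydrological node $v_i$ at time $t$; and $T$ denotes the total number of time steps.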
[0079] The formula for calculating standard deviation is as follows:
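[0080] $$\sigma_i = \sqrt{\frac{\sum_{t=1}^{T} m_i^t \left(\tilde{q}_i^t - \mu_i\right)^2}{\sum_{t=1}^{T} m_i^t}} \tag{5}$$
[0081] In formula (5), $\sigma_i$ denotes the standard deviation of the compressed runoff of hydrological node $v_i$ over all times; $\mu_i$ denotes the mean of the compressed runoff of hydrological node $v_i$ over all times; $\tilde{q}_i^t$ denotes the compressed runoff of hydrological node $v_i$ at time $t$; $m_i^t$ denotes the observation mask of the measured runoff of hydrological node $v_i$ at time $t$; and $T$ denotes the total number of time steps.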
[0082] The standard score (Z-score) of the compressed runoff of each hydrological node at each time step is determined from the mean and standard deviation, and the Z-score is taken as the standardized runoff of the corresponding measured runoff. The formula for calculating the Z-score is:
[0083] $$\hat{q}_i^t = \frac{\tilde{q}_i^t - \mu_i}{\sigma_i} \tag{6}$$
[0084] In formula (6), $\sigma_i$ denotes the standard deviation of the compressed runoff of hydrological node $v_i$ over all times; $\mu_i$ denotes the mean of the compressed runoff of hydrological node $v_i$ over all times; $\tilde{q}_i^t$ denotes the compressed runoff of hydrological node $v_i$ at time $t$; and $\hat{q}_i^t$ denotes the Z-score corresponding to $\tilde{q}_i^t$, i.e., the standardized runoff corresponding to the measured runoff $q_i^t$.
[0085] In this step, this embodiment also standardizes the measured data of all other observed characteristics besides the measured runoff, such as the measured data of rainfall, temperature, humidity, and static elevation. It is worth noting that when standardizing the measured data other than the measured runoff, this embodiment does not perform logarithmic compression on these measured data; instead, it directly calculates the standard score corresponding to each measured data point and uses that score as its standardized data.
[0086] It is worth noting that the observation features used in subsequent steps S3 and S4 are all standardized data.
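As an illustrative sketch of step S1.3, the following Python code compresses the measured runoff logarithmically and computes a masked mean, a masked standard deviation, and the resulting Z-scores per hydrological node. The use of `log(1 + q)` as the compression, the small constant `eps` that guards against a zero standard deviation, and the function name `standardize_runoff` are assumptions; the exact form of the compression is not reproduced here.

```python
import numpy as np

def standardize_runoff(q, mask, eps=1e-8):
    """q, mask: arrays of shape (T, N); mask is 1 where the reading is valid.
    Returns (z, mean, std) with z of shape (T, N) and mean, std of shape (N,)."""
    q_safe = np.where(mask > 0, q, 0.0)        # neutralize NaN at invalid positions
    q_log = np.log1p(q_safe)                   # assumed compression: log(1 + q)
    n_valid = np.maximum(mask.sum(axis=0), 1.0)
    mean = (mask * q_log).sum(axis=0) / n_valid
    var = (mask * (q_log - mean) ** 2).sum(axis=0) / n_valid
    std = np.sqrt(var) + eps
    z = (q_log - mean) / std                   # values at invalid positions are placeholders
    return z, mean, std

# Hypothetical series for one node: two valid readings and one missing value.
q = np.array([[np.expm1(1.0)], [np.expm1(3.0)], [np.nan]])
mask = np.array([[1.0], [1.0], [0.0]])
z, mean, std = standardize_runoff(q, mask)
```

Only valid readings enter the statistics, so a gap in the record neither shifts the mean nor inflates the variance; the placeholder values at invalid positions are expected to be overwritten by the filling step.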
[0087] S1.4: Fill in missing data based on observation mask.
[0088] Furthermore, to address the unavoidable random gaps in meteorological and hydrological monitoring data, this embodiment introduces a nearest-neighbor filling strategy based on the observation mask. The strategy aims to avoid bias caused by one-sided extreme values while preserving the local smoothness of the time series: when valid observations exist both before and after the missing point, the mean of the nearest valid observation on each side is used as a linear transition; when the missing point lies beyond the end of the valid observation range (i.e., valid observations exist on only one side of the missing point), the mean of the two nearest valid observations on that side is used to extrapolate the boundary state; if only a single valid observation exists on that side, its value is carried over as a constant; and in the extremely rare case where the entire sequence is missing, zeros are used as a fallback to preserve the tensor dimensions. This method eliminates, to the greatest extent possible, the computational anomalies caused by missing data without changing the original observation mask. Specifically, it is given by the following equation:
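[0089] $$\hat{x}^{t} = \begin{cases} \tfrac{1}{2}\left(x_1^{-} + x_1^{+}\right), & \sum \mathbf{m}^{<t} \ge 1 \ \text{and} \ \sum \mathbf{m}^{>t} \ge 1 \\ \tfrac{1}{2}\left(x_1^{-} + x_2^{-}\right), & \sum \mathbf{m}^{<t} \ge 2 \ \text{and} \ \sum \mathbf{m}^{>t} = 0 \\ \tfrac{1}{2}\left(x_1^{+} + x_2^{+}\right), & \sum \mathbf{m}^{<t} = 0 \ \text{and} \ \sum \mathbf{m}^{>t} \ge 2 \\ x_1^{-}, & \sum \mathbf{m}^{<t} = 1 \ \text{and} \ \sum \mathbf{m}^{>t} = 0 \\ x_1^{+}, & \sum \mathbf{m}^{<t} = 0 \ \text{and} \ \sum \mathbf{m}^{>t} = 1 \\ 0, & \sum \mathbf{m}^{<t} = 0 \ \text{and} \ \sum \mathbf{m}^{>t} = 0 \end{cases} \tag{7}$$
[0090] In formula (7), $\hat{x}^{t}$ denotes the fill value for the missing data at time $t$; $\mathbf{x}^{<t}$ denotes all observations before time $t$; $\mathbf{m}^{<t}$ denotes the observation masks of all observations before time $t$; $\sum \mathbf{m}^{<t}$ denotes the sum of the observation masks of all observations before time $t$; $x_1^{-}$ and $x_2^{-}$ denote the first and second most recent valid observations before time $t$, respectively, both taken from $\mathbf{x}^{<t}$; $\mathbf{x}^{>t}$ denotes all observations after time $t$; $\mathbf{m}^{>t}$ denotes the observation masks of all observations after time $t$; $\sum \mathbf{m}^{>t}$ denotes the sum of the observation masks of all observations after time $t$; and $x_1^{+}$ and $x_2^{+}$ denote the first and second nearest valid observations after time $t$, respectively, both taken from $\mathbf{x}^{>t}$. Through this imputation strategy, each observation feature has standardized data at each time step, which resolves the missing-data problem and prevents data gaps from affecting the prediction accuracy of the subsequently established runoff prediction model.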
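As an illustrative sketch of the filling strategy described in step S1.4, the following Python code fills a single feature series case by case. The function name `fill_missing` and the choice to draw fill values only from originally valid observations (never from previously filled points) are assumptions; the mask itself is left unchanged, as the text requires.

```python
import numpy as np

def fill_missing(x, mask):
    """x: 1-D series (values at invalid positions are ignored, may be NaN).
    mask: 1-D array, 1 where x is valid and 0 where it is missing.
    Returns a filled copy of x."""
    x = np.asarray(x, dtype=float)
    mask = np.asarray(mask)
    filled = x.copy()
    valid = np.flatnonzero(mask > 0)
    for t in np.flatnonzero(mask == 0):
        before = valid[valid < t][::-1]     # nearest valid index first
        after = valid[valid > t]            # nearest valid index first
        if len(before) >= 1 and len(after) >= 1:
            filled[t] = 0.5 * (x[before[0]] + x[after[0]])    # both sides available
        elif len(before) >= 2:
            filled[t] = 0.5 * (x[before[0]] + x[before[1]])   # extrapolate from the past
        elif len(after) >= 2:
            filled[t] = 0.5 * (x[after[0]] + x[after[1]])     # extrapolate from the future
        elif len(before) == 1:
            filled[t] = x[before[0]]        # single valid value: carry it over
        elif len(after) == 1:
            filled[t] = x[after[0]]
        else:
            filled[t] = 0.0                 # whole series missing: zero fallback
    return filled

# Hypothetical series with gaps at the start, in the middle, and at the end.
series = [np.nan, 2.0, np.nan, 6.0, 10.0, np.nan]
valid_mask = [0, 1, 0, 1, 1, 0]
result = fill_missing(series, valid_mask)
```

Here the leading gap is extrapolated from the two nearest later readings, the middle gap is averaged across its neighbors, and the trailing gap is extrapolated from the two nearest earlier readings.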
[0091] S2. Based on the static edge features of each directed edge in the directed graph, a transmission weight is generated for the corresponding directed edge using a filter generation network. The transmission weight is used to characterize the hydraulic transmission capacity between the two nodes connected by the corresponding directed edge. Specifically, this includes:
[0092] This embodiment utilizes an edge-conditional convolution (ECC) graph neural network to encode the static physical transmission mechanism of the watershed. A filter generation network (FGN) is introduced. The convolutional kernel weights are adaptively generated based on the static edge features (such as channel length and slope features) of each directed edge defined in step S1. This mechanism overcomes the limitations of traditional graph neural networks that use fixed scalar weights, enabling the model to simulate the spatial transport differences determined by topography and channel features in hydrophysical processes, explicitly embedding the physical constraints of the watershed into the feature transmission path of the neural network. Specifically, this includes:
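[0093] S2.1: This embodiment employs a multilayer perceptron architecture for the FGN. The static edge features $\mathbf{a}_{ji}$ of the directed edge $e_{ji}$ connecting node $v_j$ and node $v_i$, together with the learnable parameters $\mathbf{w}$, are input to the FGN $f_{\mathrm{FGN}}$, which outputs the feature transformation matrix of the corresponding directed edge:
[0094] $$\boldsymbol{\Theta}_{ji} = f_{\mathrm{FGN}}\left(\mathbf{a}_{ji};\ \mathbf{w}\right) \tag{8}$$
[0095] In formula (8), $\boldsymbol{\Theta}_{ji}$ denotes the feature transformation matrix of the directed edge $e_{ji}$; $\mathbf{a}_{ji}$ denotes the static edge features of the directed edge $e_{ji}$; and $\mathbf{w}$ denotes the learnable parameters of $f_{\mathrm{FGN}}$.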
[0096] This feature transformation matrix can characterize the hydraulic transmission capacity between two nodes connected by a corresponding directed edge. In this embodiment, it is defined as the transmission weight.
[0097] S2.2: ECC is used to perform feature aggregation, which enhances the observation features of each node in the spatial dimension. Specifically, for any target node $v_i$, the feature transformation matrices of its incoming directed edges are used to aggregate the observation features $\mathbf{x}_j^{t}$ of all upstream source nodes $v_j$. This enhances the observation features $\mathbf{x}_i^{t}$ of the target node in the spatial dimension and yields the static physical transmission features $\mathbf{s}_i^{t}$ of the target node $v_i$. Because aggregation follows the directed edges only, information is transmitted only along the physical channel, which prevents non-causal leakage of downstream information.
[0098] The above feature aggregation and enhancement process can be represented as:
[0099] $\mathbf{s}_i^{t} = \mathrm{ReLU}\left(\sum_{j \in \mathcal{N}(i)} F_{\theta}(\mathbf{a}_{ji})\,\mathbf{x}_j^{t} + \mathbf{b}\right)$ (9)
[0100] In formula (9): $\mathbf{s}_i^{t}$ represents the static physical transmission features of the target node $v_i$ at time $t$; ReLU represents the nonlinear activation function; $\mathcal{N}(i)$ represents the set of source nodes $v_j$ pointing to the target node $v_i$ via a directed edge, including the self-loop of the target node $v_i$; $F_{\theta}(\mathbf{a}_{ji})$ represents the feature transformation matrix of the directed edge $e_{ji}$; $\mathbf{a}_{ji}$ represents the static edge features of the directed edge $e_{ji}$; $\theta$ represents the learnable parameters of $F_{\theta}$; $\mathbf{x}_j^{t}$ represents the observation features of the source node $v_j$ at time $t$; $\mathbf{b}$ represents the bias vector.
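The aggregation described by formulas (8) and (9) can be illustrated with a minimal NumPy sketch. It is not the embodiment's implementation: the two-layer filter generation network, the layer sizes, the random parameters, and the names `filter_generating_network` and `ecc_aggregate` are assumptions chosen for illustration, and no neighbourhood normalization is applied.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def filter_generating_network(edge_feat, W1, b1, W2, b2, d_out, d_in):
    """Map static edge features to a (d_out x d_in) transformation matrix."""
    hidden = relu(W1 @ edge_feat + b1)
    return (W2 @ hidden + b2).reshape(d_out, d_in)

def ecc_aggregate(x, edges, edge_feats, params, bias, d_out):
    """x: (num_nodes, d_in) observation features at one time step.
    edges: list of (source j, target i) pairs, self-loops included."""
    num_nodes, d_in = x.shape
    out = np.zeros((num_nodes, d_out))
    for (j, i), a_ji in zip(edges, edge_feats):
        theta_ji = filter_generating_network(a_ji, *params, d_out, d_in)
        out[i] += theta_ji @ x[j]          # information flows j -> i only
    return relu(out + bias)

# Hypothetical 3-node chain 0 -> 1 -> 2 with self-loops.
rng = np.random.default_rng(0)
d_in, d_out, d_edge, d_hid = 3, 4, 2, 8
params = (rng.normal(size=(d_hid, d_edge)), np.zeros(d_hid),
          0.1 * rng.normal(size=(d_out * d_in, d_hid)), np.zeros(d_out * d_in))
edges = [(0, 1), (1, 2), (0, 0), (1, 1), (2, 2)]
edge_feats = rng.normal(size=(len(edges), d_edge))
x = rng.normal(size=(3, d_in))
h = ecc_aggregate(x, edges, edge_feats, params, np.zeros(d_out), d_out)
```

Because each edge only writes into its target node, perturbing the features of the most downstream node (node 2 here) leaves the outputs of nodes 0 and 1 unchanged, which is the causal property the text describes.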
[0101] S3. Based on the static edge characteristics of each directed edge and the observation characteristics of each node at each time step, determine the gating coefficient of each directed edge at each time step using a multilayer perceptron and gating mechanism.
[0102] This embodiment constructs a state-aware dynamic gating network $\mathrm{MLP}_g$ based on a multilayer perceptron and a gating mechanism. The observation features $\mathbf{x}_i^{t}$ of the target node $v_i$, the observation features $\mathbf{x}_j^{t}$ of the source node $v_j$, and the static edge features $\mathbf{a}_{ji}$ of the corresponding directed edge are concatenated into one vector. The gating network $\mathrm{MLP}_g$ captures the nonlinear interaction among the three, and a rescaled activation function $\sigma_r$ outputs the gating coefficient $g_{ji}^{t}$. The calculation of the gating coefficient can be expressed as:
[0103] $z_{ji}^{t} = \mathrm{MLP}_g\left(\mathbf{x}_i^{t} \,\Vert\, \mathbf{x}_j^{t} \,\Vert\, \mathbf{a}_{ji}\right)$ (10)
[0104] $g_{ji}^{t} = \sigma_r\left(z_{ji}^{t}\right)$ (11)
[0105] In formulas (10) and (11): $g_{ji}^{t}$ represents the gating coefficient of the directed edge from source node $v_j$ to target node $v_i$ at time $t$; $\mathbf{x}_i^{t}$ represents the observation features of the target node $v_i$ at time $t$; $\mathbf{x}_j^{t}$ represents the observation features of the source node $v_j$ at time $t$; $\mathbf{a}_{ji}$ represents the static edge features of the directed edge $e_{ji}$; $\Vert$ represents the vector concatenation operation; $\mathrm{MLP}_g$ represents the gated multilayer perceptron used to calculate state-aware weights; $z_{ji}^{t}$ represents the raw activation value output by the gating network; $\sigma_r$ represents the rescaled activation function, whose output range is constrained to (0.1, 1.0] to ensure a minimum conduction strength for physical topology information.
[0106] Gating coefficients are used to gate or adjust the intensity of information flow in the physical topology, identify and suppress spurious connections that fail at specific hydrological times (such as the dry season), thereby significantly enhancing the model's ability to express complex dynamic hydrological processes while maintaining physical structural constraints.
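A gating coefficient of the kind described for formulas (10) and (11) might be computed as in the sketch below. The source states only that a rescaled activation keeps the output in (0.1, 1.0]; the specific rescaling `g_min + (1 - g_min) * sigmoid(z)`, the single hidden layer with `tanh`, and all parameter names are assumptions made for this illustration.

```python
import numpy as np

def gate_coefficient(x_target, x_source, edge_feat, W1, b1, w2, b2, g_min=0.1):
    """State-aware gate for one directed edge at one time step."""
    concat = np.concatenate([x_target, x_source, edge_feat])  # vector concatenation
    hidden = np.tanh(W1 @ concat + b1)                        # hidden layer (assumed)
    z = float(w2 @ hidden + b2)                               # raw activation value
    sig = 1.0 / (1.0 + np.exp(-z))
    return g_min + (1.0 - g_min) * sig                        # assumed rescaling

# Hypothetical sizes: 3 observation features per node, 2 static edge features.
rng = np.random.default_rng(1)
x_i, x_j, a_ji = rng.normal(size=3), rng.normal(size=3), rng.normal(size=2)
W1, b1 = rng.normal(size=(5, 8)), np.zeros(5)
w2, b2 = rng.normal(size=5), 0.0
g = gate_coefficient(x_i, x_j, a_ji, W1, b1, w2, b2)

# With all-zero weights the raw activation is 0 and the gate sits mid-range.
g_neutral = gate_coefficient(x_i, x_j, a_ji, np.zeros((5, 8)), b1, np.zeros(5), 0.0)
```

The lower bound `g_min` keeps every physical edge at least weakly conducting, so a gate can attenuate a connection during, for example, the dry season without removing it from the topology.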
[0107] S4. Based on the transmission weights, gating coefficients, and the observation features of each node at each time step, the dynamic physical prior features of each node at each time step are generated using the spatial aggregation mechanism of the edge-conditional convolutional network.
[0108] This embodiment integrates the physical prior features and dynamic gating to achieve the final update of the node state. Specifically, the gating coefficients $g_{ji}^{t}$ calculated in step S3 are coupled with the static physical transmission features generated by ECC in step S2, so that the transmission intensity of upstream information is adjusted and the update reflects the instantaneous evolution of the watershed. Ultimately, dynamic physical prior features containing real-time hydrological conditions are obtained:
[0109] $\mathbf{p}_i^{t} = \mathrm{ReLU}\left(\sum_{j \in \mathcal{N}(i)} g_{ji}^{t}\, F_{\theta}(\mathbf{a}_{ji})\,\mathbf{x}_j^{t} + \mathbf{b}\right)$ (12)
[0110] In formula (12): $\mathbf{p}_i^{t}$ represents the dynamic physical prior features of the target node $v_i$ at time $t$; ReLU represents the nonlinear activation function; $\mathcal{N}(i)$ represents the set of source nodes $v_j$ pointing to the target node $v_i$ via a directed edge, including the self-loop of the target node $v_i$; $g_{ji}^{t}$ represents the gating coefficient of the directed edge from source node $v_j$ to target node $v_i$ at time $t$; $F_{\theta}(\mathbf{a}_{ji})$ represents the feature transformation matrix of the directed edge $e_{ji}$; $\mathbf{a}_{ji}$ represents the static edge features of the directed edge $e_{ji}$; $\theta$ represents the learnable parameters of $F_{\theta}$; $\mathbf{x}_j^{t}$ represents the observation features of the source node $v_j$ at time $t$; $\mathbf{b}$ represents the bias vector.
[0111] Through the coupling operation in step S4, key hydrological channels can be dynamically strengthened and noise interference suppressed based on real-time operating conditions, thereby constructing a dynamic gated edge conditional convolution module that integrates physical priors, and realizing the dynamic adjustment of the convergence intensity of hydrological response information flow.
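The coupling of step S4 can be sketched as follows, assuming the per-edge transformation matrices and gating coefficients have already been computed. The dictionaries keyed by `(source, target)` and the function name `gated_ecc_update` are illustrative choices, not part of the source.

```python
import numpy as np

def gated_ecc_update(x, theta, gate, bias):
    """x: (N, d_in) observation features at one time step.
    theta[(j, i)]: (d_out, d_in) transformation matrix of edge j -> i.
    gate[(j, i)]: scalar gating coefficient of edge j -> i."""
    out = np.zeros((x.shape[0], bias.shape[0]))
    for (j, i), theta_ji in theta.items():
        out[i] += gate[(j, i)] * (theta_ji @ x[j])   # gate scales the message
    return np.maximum(out + bias, 0.0)               # ReLU

# Two nodes, edge 0 -> 1 plus self-loops; identity matrices keep the arithmetic visible.
x = np.array([[1.0, 2.0], [0.5, 0.0]])
theta = {(0, 1): np.eye(2), (0, 0): np.eye(2), (1, 1): np.eye(2)}
bias = np.zeros(2)
open_gate = {(0, 1): 1.0, (0, 0): 1.0, (1, 1): 1.0}
weak_gate = {(0, 1): 0.1, (0, 0): 1.0, (1, 1): 1.0}
p_open = gated_ecc_update(x, theta, open_gate, bias)
p_weak = gated_ecc_update(x, theta, weak_gate, bias)
```

Lowering the gate on edge 0 -> 1 from 1.0 to 0.1 shrinks only the upstream contribution to node 1; node 0 is unaffected.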
[0112] S5. Determine the predicted runoff for each hydrological node based on the observation mask and dynamic physical prior characteristics. Specifically, this includes:
[0113] S5.1: Construct the loss function based on the observation mask.
[0114] In this embodiment, the node weight of each hydrological node is determined based on its importance in the spatial dimension, the sample base weight is determined based on the flood flow threshold, and the deviation correction weight is determined based on the prediction deviation. Then, a triple nested weight is constructed based on the node weight, sample base weight, and deviation correction weight. Finally, a loss function is constructed based on the triple nested weight and the observation mask.
[0115] Because the flow rates at different hydrological nodes across the basin vary significantly, a single uniform value is not an appropriate criterion for classifying the flow regime at every hydrological node. Therefore, this embodiment sets an adaptive threshold and hydrological condition ranges based on the historical runoff sequence of each hydrological node in the directed graph.
[0116] Specifically, for each hydrological node $v_i$ in the directed graph, this embodiment calculates an adaptive threshold separately from its historical runoff time series. The adaptive threshold includes a flood flow threshold and a low-water flow threshold, used to identify flood and low-water conditions, respectively. The historical runoff time series of node $v_i$ is used to determine its historical runoff distribution. The 95th percentile of this distribution ($Q_i^{95}$) is defined as the flood flow threshold, and the runoff range above it ($Q > Q_i^{95}$) is determined as the flood flow range $\Omega_i^{F}$. At the same time, the 20th percentile ($Q_i^{20}$) is defined as the low-water flow threshold, and the runoff range below it ($Q < Q_i^{20}$) is determined as the low-water flow range $\Omega_i^{L}$.
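The per-node thresholds can be computed directly from the historical series, as in this minimal sketch. The function names, the regime codes 0/1/2, and the use of linear-interpolation quantiles are assumptions for illustration; the source specifies only the 95th and 20th percentiles.

```python
import numpy as np

def adaptive_thresholds(history, flood_q=0.95, low_q=0.20):
    """history: (T, N) historical runoff, NaN where an observation is missing.
    Returns one flood threshold and one low-water threshold per node."""
    flood = np.nanquantile(history, flood_q, axis=0)
    low = np.nanquantile(history, low_q, axis=0)
    return flood, low

def classify_regime(runoff, flood, low):
    """0 = low-water range, 1 = normal, 2 = flood range (NaN stays 1)."""
    regime = np.ones_like(runoff, dtype=int)
    regime[runoff < low] = 0
    regime[runoff > flood] = 2
    return regime

# Two synthetic nodes whose magnitudes differ by a factor of ten.
history = np.column_stack([np.arange(1.0, 101.0), 10.0 * np.arange(1.0, 101.0)])
flood, low = adaptive_thresholds(history)
regime = classify_regime(history, flood, low)
```

Both nodes end up with the same share of flood and low-water samples even though their absolute flows differ by an order of magnitude, which is the point of thresholding per node.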
[0117] To enable the subsequent runoff prediction model to allocate attention more reasonably and to constrain prediction errors dynamically, this embodiment constructs a triple nested weighting system in conjunction with the adaptive threshold:
[0118] $w_{i,t} = w^{\mathrm{node}}_{i} \cdot w^{\mathrm{base}}_{i,t} \cdot w^{\mathrm{corr}}_{i,t}$ (13)
[0119] In formula (13): $w_{i,t}$ represents the triple nested weight of hydrological node $v_i$ at time $t$; $w^{\mathrm{node}}_{i}$, $w^{\mathrm{base}}_{i,t}$, and $w^{\mathrm{corr}}_{i,t}$ represent the node weight, the sample base weight, and the deviation correction weight of node $v_i$ at time $t$, respectively.
[0120] Among them, the node weight is designed to allocate more attention to important hydrological nodes. In this embodiment, the node weights are determined by the topological complexity of each hydrological node in the directed graph and by the role of its hydrological station: if the hydrological station corresponding to a hydrological node is a key control station on the main stream or an important tributary station, the node weight of that hydrological node is increased.
[0121] The sample base weight $w^{\mathrm{base}}_{i,t}$ addresses the learning difficulty that the scarcity of extreme flood samples causes for the subsequent runoff prediction model. This embodiment uniformly applies a flood reinforcement weight $\kappa$ to every measured runoff $Q_{i,t}$ that lies within the flood flow range $\Omega_i^{F}$, which gives the sample base weight of each hydrological node:
[0122] $w^{\mathrm{base}}_{i,t} = \begin{cases} \kappa, & Q_{i,t} \in \Omega_i^{F} \\ 1, & \text{otherwise} \end{cases}$ (14)
[0123] In formula (14): $w^{\mathrm{base}}_{i,t}$ represents the sample base weight of hydrological node $v_i$ at time $t$; $\kappa$ represents the flood reinforcement weight; $Q_{i,t}$ represents the measured runoff of hydrological node $v_i$ at time $t$; $\Omega_i^{F}$ represents the flood flow range of hydrological node $v_i$.
[0124] The deviation correction weight $w^{\mathrm{corr}}_{i,t}$ is set to narrow the deviation of the predicted runoff $\hat{Q}_{i,t}$ from the measured runoff $Q_{i,t}$ and thus improve the accuracy of runoff prediction. It is generated dynamically from the physical state range in which the measured runoff $Q_{i,t}$ lies and from the direction in which $\hat{Q}_{i,t}$ deviates from $Q_{i,t}$, so that deviation correction weights consistent with the spatiotemporal dimension are determined. Specifically, when $Q_{i,t} \in \Omega_i^{F}$ and $\hat{Q}_{i,t} < Q_{i,t}$, the event is identified as a high-risk missed event and the flood penalty weight $\rho_i^{F}$ is assigned to that hydrological node, in which case $w^{\mathrm{corr}}_{i,t} = \rho_i^{F}$; when $Q_{i,t} \in \Omega_i^{L}$ and $\hat{Q}_{i,t} > Q_{i,t}$, the event is identified as a false noise event and the low-water suppression weight $\rho_i^{L}$ is assigned, in which case $w^{\mathrm{corr}}_{i,t} = \rho_i^{L}$; in all other cases the event is classified as a routine event and the baseline weight $\rho^{0}$ is assigned, in which case $w^{\mathrm{corr}}_{i,t} = \rho^{0}$.
[0125] Specifically, the deviation correction weight $w^{\mathrm{corr}}_{i,t}$ is calculated as follows:
[0126] $w^{\mathrm{corr}}_{i,t} = \begin{cases} \rho_i^{F}, & Q_{i,t} \in \Omega_i^{F} \text{ and } \hat{Q}_{i,t} < Q_{i,t} \\ \rho_i^{L}, & Q_{i,t} \in \Omega_i^{L} \text{ and } \hat{Q}_{i,t} > Q_{i,t} \\ \rho^{0}, & \text{otherwise} \end{cases}$ (15)
[0127] In formula (15): $w^{\mathrm{corr}}_{i,t}$ represents the deviation correction weight of hydrological node $v_i$ at time $t$; $\Omega_i^{F}$ represents the flood flow range of hydrological node $v_i$; $\Omega_i^{L}$ represents the low-water flow range of hydrological node $v_i$; $Q_{i,t}$ represents the measured runoff of hydrological node $v_i$ at time $t$; $\hat{Q}_{i,t}$ represents the predicted runoff of hydrological node $v_i$ at time $t$; $\rho_i^{F}$ represents the flood penalty weight of hydrological node $v_i$; $\rho_i^{L}$ represents the low-water suppression weight of hydrological node $v_i$; $\rho^{0}$ represents the baseline weight.
[0128] Then, this embodiment introduces the observation mask of the measured runoff into the threshold-adaptive asymmetric loss calculation, so that invalid observations do not interfere with gradient backpropagation, and finally constructs the loss function by combining the triple nested weights:
[0129] $\mathcal{L} = \dfrac{\sum_{i=1}^{N}\sum_{t=1}^{T} w_{i,t}\, m_{i,t}\, \ell_{\delta}\left(Q_{i,t}, \hat{Q}_{i,t}\right)}{\sum_{i=1}^{N}\sum_{t=1}^{T} m_{i,t}}$ (16)
[0130] In formula (16): $\mathcal{L}$ represents the loss function; $w_{i,t}$ represents the triple nested weight corresponding to the measured runoff of hydrological node $v_i$ at time $t$; $m_{i,t}$ represents the observation mask of the measured runoff of hydrological node $v_i$ at time $t$; $Q_{i,t}$ represents the measured runoff of hydrological node $v_i$ at time $t$; $\hat{Q}_{i,t}$ represents the predicted runoff of hydrological node $v_i$ at time $t$; $N$ represents the total number of hydrological nodes; $T$ represents the total number of moments; $\ell_{\delta}$ represents the basic loss operator.
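[0131] The basic loss operator $\ell_{\delta}$ is the Smooth L1 loss, whose mathematical expression is as follows:
[0132] $\ell_{\delta}\left(Q_{i,t}, \hat{Q}_{i,t}\right) = \begin{cases} \dfrac{0.5\left(Q_{i,t} - \hat{Q}_{i,t}\right)^{2}}{\delta}, & \left|Q_{i,t} - \hat{Q}_{i,t}\right| < \delta \\ \left|Q_{i,t} - \hat{Q}_{i,t}\right| - 0.5\,\delta, & \text{otherwise} \end{cases}$ (17)
[0133] In formula (17): $\ell_{\delta}$ represents the basic loss operator; $Q_{i,t}$ represents the measured runoff of hydrological node $v_i$ at time $t$; $\hat{Q}_{i,t}$ represents the predicted runoff of hydrological node $v_i$ at time $t$; $\delta$ represents the smoothing threshold parameter, which is 1.0 in this embodiment.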
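The loss of step S5.1 can be sketched in NumPy as below. Several details are assumptions rather than statements of the source: the three weights are multiplied together, the non-flood base weight and the routine-event correction weight are taken as 1.0, a missed flood is read as under-prediction inside the flood range and a false noise event as over-prediction inside the low-water range, the sum is normalized by the number of valid observations, and the numeric weights in the example are arbitrary.

```python
import numpy as np

def smooth_l1(y_true, y_pred, delta=1.0):
    diff = np.abs(y_true - y_pred)
    return np.where(diff < delta, 0.5 * diff**2 / delta, diff - 0.5 * delta)

def triple_weights(y_true, y_pred, flood, low, node_w,
                   flood_boost, flood_penalty, low_suppress):
    in_flood = y_true > flood                    # NaN compares False
    in_low = y_true < low
    base = np.where(in_flood, flood_boost, 1.0)  # sample base weight
    corr = np.ones_like(y_pred)                  # routine events (assumed 1.0)
    corr = np.where(in_flood & (y_pred < y_true), flood_penalty, corr)
    corr = np.where(in_low & (y_pred > y_true), low_suppress, corr)
    return node_w * base * corr

def masked_weighted_loss(y_true, y_pred, mask, weights, delta=1.0):
    # Replace masked targets by the prediction so NaNs never reach the sum.
    y_filled = np.where(mask > 0, y_true, y_pred)
    per_elem = weights * mask * smooth_l1(y_filled, y_pred, delta)
    return per_elem.sum() / max(mask.sum(), 1.0)

# One node, four time steps; the last observation is missing.
y_true = np.array([10.0, 1.0, 5.0, np.nan])
y_pred = np.array([7.0, 3.0, 5.5, 4.0])
mask = np.array([1.0, 1.0, 1.0, 0.0])
w = triple_weights(y_true, y_pred, flood=8.0, low=2.0, node_w=1.0,
                   flood_boost=2.0, flood_penalty=3.0, low_suppress=0.5)
loss = masked_weighted_loss(y_true, y_pred, mask, w)
```

In this example the under-predicted flood sample carries a weight of 6, so it dominates the loss, while the masked time step contributes nothing.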
[0134] S5.2: Integrate dynamic physical prior features and construct an initial model based on a loss function and a time-dimensional neural network to predict runoff at each hydrological node.
[0135] Temporal neural networks are neural network architectures with long-range dependency modeling capabilities or recursive feature extraction mechanisms, capable of capturing the temporal evolution patterns in time-series data. This embodiment may employ temporal neural networks including, but not limited to, Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRUs), Temporal Convolutional Networks (TCNs), or Transformer architecture neural networks based on self-attention mechanisms. This embodiment uses LSTM as an example for illustration.
[0136] For example, in this embodiment, the loss function established in S5.1 is introduced into the LSTM to obtain the initial model. The prediction process of the initial model is as follows:
[0137] The measured data of the $L$ historical time steps are used to generate dynamic physical prior features through the aforementioned steps, and the LSTM then captures deep temporal dependencies to achieve multi-site collaborative spatiotemporal runoff prediction. At each time step $t$, the LSTM receives the dynamic physical prior features of the current moment and recursively updates its cell state and hidden state through the combined action of its forget gate, input gate, and output gate.
[0138] After the recursion over the $L$ historical steps is complete, the terminal hidden state $\mathbf{h}_{L}$ is extracted and mapped to the standardized predicted runoff value $\hat{y}$ through a multi-layer nonlinear fully connected mapping layer (MLP decoder):
[0139] $\hat{y} = \mathbf{W}_2\,\varphi\left(\mathbf{W}_1 \mathbf{h}_{L} + \mathbf{b}_1\right) + \mathbf{b}_2$ (18)
[0140] In formula (18): $\hat{y}$ represents the standardized predicted runoff value; $\varphi$ represents a nonlinear activation function; $\mathbf{W}_1$ and $\mathbf{b}_1$ represent the weight matrix and bias vector of the first fully connected layer in the MLP decoder, respectively; $\mathbf{W}_2$ and $\mathbf{b}_2$ represent the weight matrix and bias vector of the final fully connected layer in the MLP decoder, respectively; $L$ represents the number of historical input time steps; $\mathbf{h}_{L}$ represents the terminal hidden state vector extracted by the LSTM at the last time step.
[0141] The standardized predicted runoff values differ in magnitude from actual hydrological values. The initial model therefore performs destandardization and physical consistency mapping on them, restoring the dimensionless predicted values to the actual hydrological magnitude (on the order of m³/s).
[0142] Specifically, the initial model applies the inverse of the standardization process of step S1.3 to the standardized runoff predictions. Using the standard deviation $\sigma_i$ and the mean $\mu_i$ in logarithmic space, the original magnitude of the standardized runoff prediction is restored through an exponential transformation, and a nonnegativity constraint operator $\mathcal{P}_{+}$ is explicitly introduced to obtain the predicted runoff at the actual hydrological magnitude:
[0143] $\hat{Q}_{i,t} = \mathcal{P}_{+}\left(\exp\left(\hat{y}_{i,t}\,\sigma_i + \mu_i\right)\right)$ (19)
[0144] In formula (19): $\hat{Q}_{i,t}$ represents the predicted runoff of hydrological node $v_i$ at time $t$ at the actual hydrological magnitude; $\hat{y}_{i,t}$ represents the standardized runoff prediction of hydrological node $v_i$ at time $t$; $\mu_i$ represents the mean of the compressed runoff of hydrological node $v_i$ over all times; $\sigma_i$ represents the standard deviation of the compressed runoff of hydrological node $v_i$ over all times; $\mathcal{P}_{+}$ represents the nonnegativity constraint operator.
[0145] This reduction process not only unifies the magnitude of the predicted runoff with the actual hydrological magnitude, but also ensures, through non-negative constraint operators, that the predicted runoff strictly conforms to the hydrophysical criterion that the runoff is not negative.
[0146] During the training phase of the initial model, once the predicted runoff is obtained, the prediction residual is determined by comparing the predicted runoff with the measured runoff. Based on these residuals, the loss function built from the triple nested weights (see formulas (13) to (17)) is minimized by optimizing the model parameters, and the prediction performance of the initial model is continuously improved through iterative optimization.
[0147] S5.3: Optimize the hyperparameter combination of the initial model and use the optimized initial model as the runoff prediction model.
[0148] This embodiment utilizes a hyperparameter optimization algorithm to iteratively generate multiple hyperparameter combinations. Based on the Nash efficiency coefficient, the performance indicators of the initial model under different hyperparameter combinations are determined. The determined performance indicators are divided into positive indicators (better indicators) that meet the preset performance and negative indicators (poor indicators) that do not meet the preset performance. A first probability density model is constructed based on all positive indicators, and a second probability density model is constructed based on all negative indicators. With the goal of maximizing the ratio of the first probability density model and the second probability density model, the hyperparameter combinations of the initial model are optimized to determine the optimal hyperparameter combination of the initial model under a specific watershed environment. Finally, the initial model based on the optimal hyperparameter combination is used as the final runoff prediction model.
[0149] The hyperparameter optimization algorithm used in this embodiment is the Tree-structured Parzen Estimator algorithm (TPE algorithm for short), and the optimization process is as follows:
[0150] First, a hyperparameter search model based on the TPE algorithm is constructed. Under a consistent computational framework (in this embodiment, the Optuna framework is used to construct a unified automated optimization process), the hyperparameter search space, including the optimizer, network architecture, and loss function, is defined as shown in Table 1.
[0151] Table 1 Hyperparameter Search Space
[0152] Then, automated iterative optimization and objective function evaluation are performed. Within the preset computational budget (60 optimization experiments are performed in this embodiment, with each optimization training for 60 epochs), the TPE algorithm achieves intelligent exploration of the initial model architecture and hyperparameter space through dynamic feedback of evaluation metrics.
[0153] First, the Nash-Sutcliffe efficiency (NSE) coefficient on the validation set is used as the objective function and serves as the core metric for measuring the prediction accuracy of the initial model. It is calculated as follows:
[0154] $\mathrm{NSE}_i = 1 - \dfrac{\sum_{t=1}^{T}\left(Q_{i,t} - \hat{Q}_{i,t}\right)^{2}}{\sum_{t=1}^{T}\left(Q_{i,t} - \bar{Q}_i\right)^{2}}$ (20)
[0155] In formula (20): $\mathrm{NSE}_i$ represents the NSE of hydrological node $v_i$; $T$ represents the total number of moments; $Q_{i,t}$ represents the measured runoff of hydrological node $v_i$ at time $t$; $\hat{Q}_{i,t}$ represents the predicted runoff of hydrological node $v_i$ at time $t$; $\bar{Q}_i$ represents the average measured runoff of hydrological node $v_i$ over all times.
[0156] To balance the average performance of the initial model across the entire watershed against the robustness of local nodes, this embodiment sets a performance objective function that combines the basin-wide mean NSE with the minimum NSE over all nodes. The minimum term guides the hyperparameter search away from solutions with extremely poor predictions at individual nodes. The performance objective function can be expressed as:
[0157] $J(\lambda) = \omega_1 \cdot \dfrac{1}{N}\sum_{i=1}^{N} \mathrm{NSE}_i + \omega_2 \cdot \min_{1 \le i \le N} \mathrm{NSE}_i$ (21)
[0158] In formula (21): $\lambda$ represents a hyperparameter combination; $J(\lambda)$ represents the performance indicator of the initial model based on the hyperparameter combination $\lambda$; $N$ represents the total number of nodes; $\mathrm{NSE}_i$ represents the Nash efficiency coefficient of node $v_i$ on the validation set; $\omega_1$ and $\omega_2$ are the weighting coefficients.
[0159] The weighting coefficients $\omega_1$ and $\omega_2$ ensure the overall optimization of the initial model while preventing poor predictions at individual nodes. In this embodiment, $\omega_1$ and $\omega_2$ take preset values.
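The per-node NSE and the combined objective described above can be computed as in the following sketch. The function names and the coefficient values 0.7 and 0.3 are placeholders for illustration only; the values used in the embodiment are not reproduced here.

```python
import numpy as np

def nse(obs, pred):
    """Nash-Sutcliffe efficiency per node; obs and pred have shape (T, N)."""
    num = np.sum((obs - pred) ** 2, axis=0)
    den = np.sum((obs - obs.mean(axis=0)) ** 2, axis=0)
    return 1.0 - num / den

def performance_objective(obs, pred, w_mean, w_min):
    """Weighted sum of the basin-wide mean NSE and the worst node's NSE."""
    scores = nse(obs, pred)
    return w_mean * scores.mean() + w_min * scores.min()

# Node 0 is predicted perfectly; node 1 is predicted by its own mean.
obs = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
pred = np.array([[1.0, 20.0], [2.0, 20.0], [3.0, 20.0]])
scores = nse(obs, pred)
objective = performance_objective(obs, pred, w_mean=0.7, w_min=0.3)
```

A model that always predicts a node's mean scores NSE = 0 at that node, so the minimum term pulls the objective down even though the other node is perfect.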
[0160] Building upon this, this embodiment utilizes the Tree-structured Parzen Estimator (TPE) algorithm within the Optuna framework to perform automated optimization. This automated optimization process is implemented through the following three stages:
[0161] 1) Divide the hyperparameter combinations according to the performance indicators of the initial model;
[0162] In each iteration, the TPE algorithm first sorts the hyperparameter combinations by the performance indicators obtained from the historical trials. It then splits them at a preset quantile (25% in this embodiment). The performance indicator located at the preset quantile of the sorted list is taken as the preset performance that the initial model is intended to achieve.
[0163] Specifically, performance metrics that rank in the top 25% of the ranking will be designated as positive metrics (better metrics), indicating that they meet the intended preset performance. All the hyperparameter combinations corresponding to the positive indicators constitute the better solution set of the hyperparameters; then the remaining 75% of the performance indicators are identified as negative indicators (poor indicators), indicating that they do not meet the intended preset performance. The combination of hyperparameters corresponding to all negative indices constitutes the poor solution set of the hyperparameters.
[0164] 2) Construct two probability density models based on the two solution sets respectively;
[0165] Using kernel density estimation (KDE), the first probability density model $l(\lambda)$ of the better solution set and the second probability density model $g(\lambda)$ of the poor solution set are constructed in the hyperparameter space. Specifically, they are expressed as:
[0166] $p(\lambda \mid J) = \begin{cases} l(\lambda), & J(\lambda) \ge J^{*} \\ g(\lambda), & J(\lambda) < J^{*} \end{cases}$ (22)
[0167] In formula (22): $p(\lambda \mid J)$ represents the construction function of the probability density model; $l(\lambda)$ represents the first probability density model, corresponding to the better solution set; $g(\lambda)$ represents the second probability density model, corresponding to the poor solution set; $J(\lambda)$ represents the performance indicator of the initial model based on the hyperparameter combination $\lambda$; $J^{*}$ represents the preset performance that the initial model is intended to achieve.
[0168] 3) Determine the optimal hyperparameter combination based on the expected improvement amount.
[0169] This embodiment determines the next candidate hyperparameter combination by maximizing the expected improvement (EI). Under the TPE algorithm, EI increases with the ratio of the first probability density model to the second probability density model, so maximizing EI amounts to finding the hyperparameter combination that maximizes this ratio. In this embodiment, the search for the optimal hyperparameter region is thereby transformed into a comparison between two probability density models. In subsequent iterations the search prioritizes hyperparameter regions that have a high probability of appearing in the better solution set and a low probability of appearing in the poor solution set, which significantly improves the efficiency of hyperparameter optimization.
[0170] By dividing hyperparameter combinations and constructing probability density models, the TPE algorithm transforms the task of searching for the optimal hyperparameter region into comparing probability density models.
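The three stages can be illustrated on a one-dimensional toy problem. This is a simplified sketch of the density-ratio idea only, not Optuna's TPE sampler: the fixed-bandwidth Gaussian kernel, the candidate grid, and the names `gaussian_kde_pdf` and `tpe_suggest` are assumptions, and the toy score function stands in for the performance indicator.

```python
import numpy as np

def gaussian_kde_pdf(points, samples, bandwidth):
    """Fixed-bandwidth Gaussian kernel density estimate evaluated at `points`."""
    diffs = (points[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernel.mean(axis=1)

def tpe_suggest(trials_x, trials_score, candidates, top_frac=0.25, bandwidth=0.1):
    """Return the candidate maximising l(x) / g(x); a higher score is better."""
    order = np.argsort(trials_score)[::-1]               # stage 1: sort, best first
    n_good = max(1, int(np.ceil(top_frac * len(order))))
    good = trials_x[order[:n_good]]                      # better solution set
    bad = trials_x[order[n_good:]]                       # poor solution set
    l = gaussian_kde_pdf(candidates, good, bandwidth)    # stage 2: two densities
    g = gaussian_kde_pdf(candidates, bad, bandwidth)
    ratio = l / (g + 1e-12)                              # stage 3: density ratio
    return candidates[np.argmax(ratio)]

# Toy objective with its optimum at x = 0.3, evaluated at 20 past trials.
trials_x = np.linspace(0.0, 1.0, 20)
trials_score = -(trials_x - 0.3) ** 2
candidates = np.linspace(0.0, 1.0, 101)
suggestion = tpe_suggest(trials_x, trials_score, candidates)
```

The suggested point falls near the cluster of well-performing trials rather than at an unexplored boundary, which is the behaviour the density-ratio criterion is meant to produce.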
[0171] After the hyperparameters of the initial model are optimized through the above steps, an optimized model is obtained. This embodiment further verifies the stability of the optimized model and analyzes it statistically. To quantify the reliability of the runoff prediction model under random initialization, this embodiment trains the optimized model independently and repeatedly with 10 different random seeds, and uses the mean and standard deviation of the resulting performance indicators as the core benchmarks for evaluating its predictive performance, which verifies its robustness across different initializations. Once training brings the optimized model to the preset predictive performance, the optimized model is taken as the runoff prediction model.
[0172] S5.4: Based on the dynamic physical prior characteristics, the predicted runoff volume for each hydrological node is determined using the runoff prediction model.
[0173] Specifically, in this embodiment, dynamic physical prior features are input into the runoff prediction model. The runoff prediction model will predict the runoff volume according to the prediction process described in S5.2, obtain a standardized runoff volume prediction value, and then restore the magnitude of the standardized runoff volume prediction value to finally output the predicted runoff volume at the actual hydrological magnitude.
[0174] S6. Based on the predicted runoff at each hydrological node, determine the runoff probability forecast interval corresponding to each predicted runoff using the R-Vine Copula model.
[0175] In this embodiment, a residual simulation model is constructed with the R-Vine Copula model from the residuals between the predicted runoff and the measured runoff at each hydrological node. The residual simulation model generates simulated residual samples corresponding to each predicted runoff. The simulated residual samples are superimposed on their corresponding predicted runoff to obtain the runoff probability forecast interval for each predicted runoff.
[0176] This embodiment models the hidden spatial dependence of the residuals with the R-Vine Copula: by constructing R-Vine structures, it captures the dependencies between hydrological nodes that are not explicitly defined by the physical topology. This transforms the deterministic predicted runoff output by S5 into a runoff probability forecast interval that carries risk assessment information.
[0177] S6 specifically includes:
[0178] S6.1: Perform residual marginal cumulative distribution fitting and probability space transformation. From step S5, take the predicted runoff $\hat{Q}_{i,t}$ of hydrological node $v_i$ at each time point and, together with the measured runoff $Q_{i,t}$, calculate the set of prediction residuals of hydrological node $v_i$ over all times, $\mathcal{E}_i = \{\varepsilon_{i,t}\}_{t=1}^{T}$ with $\varepsilon_{i,t} = Q_{i,t} - \hat{Q}_{i,t}$, where $T$ represents the total number of time points. For the prediction residual set of each hydrological node, a non-parametric empirical distribution function is used to fit its residual marginal cumulative distribution function $F_i$. The residuals are then transformed into standard uniformly distributed variables $u_{i,t}$ by the probability integral transformation:
[0179] $u_{i,t} = F_i\left(\varepsilon_{i,t}\right)$ (23)
[0180] In formula (23): $u_{i,t}$ represents the standard uniformly distributed variable of hydrological node $v_i$ at time $t$; $\varepsilon_{i,t}$ represents the prediction residual of hydrological node $v_i$ at time $t$; $F_i$ represents the residual marginal cumulative distribution function of hydrological node $v_i$.
[0181] The standard uniformly distributed variables of all hydrological nodes constitute the standard uniform distribution matrix $\mathbf{U} \in [0,1]^{N \times T}$, where $N$ represents the number of hydrological nodes and $T$ represents the total number of moments.
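The probability integral transformation of S6.1 with an empirical distribution function reduces to a rank transform, as in the sketch below. Dividing the ranks by T + 1 rather than T is a common convention adopted here as an assumption so that the values stay strictly inside (0, 1); the name `empirical_pit` is illustrative, and tied residuals are ranked in arbitrary order.

```python
import numpy as np

def empirical_pit(residuals):
    """residuals: (T, N) array. Returns pseudo-observations in (0, 1), per node."""
    T = residuals.shape[0]
    # argsort of argsort gives each value's 0-based rank within its column.
    ranks = residuals.argsort(axis=0).argsort(axis=0) + 1
    return ranks / (T + 1.0)

residuals = np.array([[0.3, -2.0],
                      [-1.0, 4.0],
                      [2.0, 0.5]])
u = empirical_pit(residuals)
```

Each column of `u` depends only on the ordering of that node's residuals, so the transform removes the marginal scale and leaves the dependence between nodes for the copula to model.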
[0182] S6.2: Construct the R-Vine topology and perform global parameter calibration. The dependence strength between nodes is determined from the standard uniformly distributed variables of each hydrological node. A maximum spanning tree algorithm is employed, using the absolute values of Kendall's rank correlation coefficient between pairs of hydrological nodes as weights, and searches layer by layer for the tree sequence that captures the maximum dependence strength. Maximum likelihood estimation (MLE) is then used to solve for the dependence parameter $\theta_c$ of each bivariate pair copula in the tree structure, which achieves global parameter calibration. The objective function is:
[0183] $\hat{\theta}_c = \arg\max_{\theta_c} \sum_{t=1}^{T} \ln c\left(u_{i,t}, u_{j,t}; \theta_c\right)$ (24)
[0184] In formula (24): $\hat{\theta}_c$ represents the maximum likelihood estimate of the dependence parameter of the bivariate pair copula; $\arg\max$ represents the objective function of maximum likelihood estimation, which solves for the parameter that maximizes the log-likelihood; $\theta_c$ represents the dependence parameter to be determined for the bivariate pair copula; $c$ represents the probability density function of the bivariate pair copula; $T$ represents the total number of moments; $u_{i,t}$ and $u_{j,t}$ represent the standard uniformly distributed variables of hydrological nodes $v_i$ and $v_j$ at time $t$.
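The structure-selection part of S6.2 can be sketched for the first tree only: compute pairwise Kendall's tau and build a maximum spanning tree on its absolute values. The sketch uses tau-a and Prim's algorithm as illustrative choices and does not fit any pair copula or build the higher trees; the function names are hypothetical.

```python
import numpy as np

def kendall_tau(a, b):
    """Kendall's tau-a of two 1-D samples (tied pairs contribute zero)."""
    n = len(a)
    s = 0.0
    for p in range(n - 1):
        s += np.sum(np.sign(a[p + 1:] - a[p]) * np.sign(b[p + 1:] - b[p]))
    return 2.0 * s / (n * (n - 1))

def first_vine_tree(u):
    """u: (T, N) pseudo-observations. Returns the N - 1 edges of the maximum
    spanning tree under |tau| weights, plus the weight matrix."""
    n_nodes = u.shape[1]
    weight = np.zeros((n_nodes, n_nodes))
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            weight[i, j] = weight[j, i] = abs(kendall_tau(u[:, i], u[:, j]))
    in_tree, edges = {0}, []
    while len(in_tree) < n_nodes:                     # Prim's algorithm
        best = max(((i, j) for i in sorted(in_tree)
                    for j in range(n_nodes) if j not in in_tree),
                   key=lambda e: weight[e])
        edges.append(best)
        in_tree.add(best[1])
    return edges, weight

# Columns 0 and 1 are perfectly concordant; column 2 is only weakly related.
u = np.array([[1, 1, 3], [2, 2, 1], [3, 3, 6],
              [4, 4, 2], [5, 5, 5], [6, 6, 4]]) / 7.0
edges, weight = first_vine_tree(u)
```

The strongly dependent pair is connected first, so the first tree spends its edges on the dependencies that matter most, leaving weaker conditional dependencies to later trees.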
[0185] Based on the principle of vine structure decomposition, the joint probability density function of the $N$-dimensional residuals is expressed as a product over $N-1$ nested trees. The parameters of all edges in the $N-1$ trees are calibrated layer by layer, and an R-Vine model that describes the high-dimensional joint probability distribution of multiple hydrological nodes is finally constructed:
[0186] $f\left(u_{1,t}, \ldots, u_{N,t}\right) = \prod_{i=1}^{N} f_i\left(u_{i,t}\right) \cdot \prod_{m=1}^{N-1} \prod_{e \in E_m} c_{j(e),k(e) \mid D(e)}\left(F_{j(e) \mid D(e)},\, F_{k(e) \mid D(e)}\right)$ (25)
[0187] In formula (25): $u_{i,t}$ represents the standard uniformly distributed variable of hydrological node $v_i$ at time $t$; $f_i$ represents the marginal probability density function of hydrological node $v_i$; $N$ represents the total number of hydrological nodes; $T_m$ represents the $m$-th tree in the R-Vine structure; $E_m$ represents the set of edges of the $m$-th tree; $e$ represents the edge in $E_m$ connecting hydrological nodes $v_{j(e)}$ and $v_{k(e)}$; $D(e)$ represents the conditioning set, used to describe the deeper probabilistic dependencies between multiple sites; $c_{j(e),k(e) \mid D(e)}$ represents the conditional bivariate copula density function between hydrological nodes $v_{j(e)}$ and $v_{k(e)}$ given the conditioning set $D(e)$; $F_{j(e) \mid D(e)}$ and $F_{k(e) \mid D(e)}$ represent the conditional marginal cumulative distribution functions of hydrological nodes $v_{j(e)}$ and $v_{k(e)}$ given the conditioning set $D(e)$.
[0188] S6.3: Perform spatiotemporal consistency sampling and probability prediction interval generation. Execute Monte Carlo stochastic simulations using the calibrated R-Vine model:
[0189] (1) Conditional simulation sampling: Using the R-Vine model fitted in S6.2, Monte Carlo stochastic simulation is performed to generate, for each hydrological node $v_i$ and every prediction time $t$, $B$ groups (e.g., $B = 1000$) of uniformly distributed random samples with spatial dependence characteristics, $u_{i,t}^{(b)}$, where $u_{i,t}^{(b)}$ denotes the $b$-th uniformly distributed random sample of hydrological node $v_i$ at time $t$. The uniformly distributed random samples of all hydrological nodes constitute the simulation sample tensor $\mathcal{U} \in [0,1]^{T \times N \times B}$, where $T$ represents the total number of moments, $N$ represents the number of hydrological nodes, and $B$ represents the total number of groups.
[0190] (2) Residual back-transformation: For each element $u_{i,t}^{(b)}$ of the simulation sample tensor $\mathcal{U}$, the inverse function $F_i^{-1}$ of the residual marginal cumulative distribution of the corresponding hydrological node $v_i$ is used to map $u_{i,t}^{(b)}$ back to the original physical scale of the residuals, which yields the simulated residual samples $\varepsilon_{i,t}^{(b)}$. Because the sampling process follows the hierarchical dependency structure of the R-Vine, the back-transformed simulated residual samples of the hydrological nodes retain the coordinated spatial variation across all sites. The calculation formula is as follows:
[0191] $\varepsilon_{i,t}^{(b)} = F_i^{-1}\left(u_{i,t}^{(b)}\right)$ (26)
[0192] In formula (26): $\varepsilon_{i,t}^{(b)}$ represents the $b$-th simulated residual sample of hydrological node $v_i$ at time $t$; $u_{i,t}^{(b)}$ represents the $b$-th uniformly distributed random sample of hydrological node $v_i$ at time $t$; $F_i^{-1}$ represents the inverse function of the residual marginal cumulative distribution of hydrological node $v_i$.
[0193] (3) Calculation of the runoff probability forecast interval: The simulated residual samples $\varepsilon_{i,t}^{(b)}$ are superimposed on the deterministic point prediction (predicted runoff) $\hat{Q}_{i,t}$ output in step S5, which gives the runoff forecast ensemble $\mathcal{Q}_{i,t}$ of hydrological node $v_i$ at time $t$, containing $B$ samples. The calculation formula is as follows:
[0194] $\hat{Q}_{i,t}^{(b)} = \hat{Q}_{i,t} + \varepsilon_{i,t}^{(b)}, \quad b = 1, \ldots, B$ (27)
[0195] In formula (27): $\hat{Q}_{i,t}^{(b)}$ represents the $b$-th runoff forecast sample of hydrological node $v_i$ at time $t$; $\hat{Q}_{i,t}$ represents the predicted runoff of hydrological node $v_i$ at time $t$ at the actual hydrological magnitude; $\varepsilon_{i,t}^{(b)}$ represents the $b$-th simulated residual sample of hydrological node $v_i$ at time $t$.
[0196] The $\alpha/2$ quantile and the $1-\alpha/2$ quantile of the runoff forecast ensemble $\mathcal{Q}_{i,t}$ are extracted, where $\alpha$ is the significance level. These two quantiles form the runoff probability forecast interval at the $1-\alpha$ confidence level.
[0197] The significance level $\alpha$ can be set flexibly according to forecasting needs. For example, $\alpha$ can be 1.0%, 5.0%, 10.0%, etc.; the corresponding $\alpha/2$ quantiles are then 0.5%, 2.5%, 5.0%, etc., and the corresponding $1-\alpha/2$ quantiles are 99.5%, 97.5%, 95.0%, etc.
[0198] Taking a significance level $\alpha$ of 5.0% as an example, the $\alpha/2$ quantile is 2.5% and the $1-\alpha/2$ quantile is 97.5%, so the runoff probability forecast interval at the 95% confidence level is output, as shown in formula (28):
[0199] $\mathrm{PI}_{i,t}^{95\%} = \left[L_{i,t},\, U_{i,t}\right] = \left[q_{2.5\%}\left(\mathcal{Q}_{i,t}\right),\, q_{97.5\%}\left(\mathcal{Q}_{i,t}\right)\right]$ (28)
[0200] In formula (28): $\mathrm{PI}_{i,t}^{95\%}$ represents the runoff probability forecast interval of hydrological node $v_i$ at time $t$ at the 95% confidence level; $L_{i,t}$ represents the lower limit of the forecast interval; $U_{i,t}$ represents the upper limit of the forecast interval; $\mathcal{Q}_{i,t}$ represents the runoff forecast ensemble of hydrological node $v_i$ at time $t$; $q_{2.5\%}(\mathcal{Q}_{i,t})$ and $q_{97.5\%}(\mathcal{Q}_{i,t})$ represent the 2.5% and 97.5% quantiles extracted from that ensemble.
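Steps (2) and (3) of S6.3 can be sketched for a single node and time step as follows. The uniform samples in the example are an evenly spaced placeholder standing in for draws from the calibrated R-Vine model, which is not implemented here; the empirical quantile function as the inverse marginal distribution and the names `inverse_ecdf` and `forecast_interval` are assumptions for illustration.

```python
import numpy as np

def inverse_ecdf(u, residual_history):
    """Map uniform samples back to the residual scale through the
    empirical quantile function of the historical residuals."""
    return np.quantile(residual_history, u)

def forecast_interval(point_forecast, u_samples, residual_history, alpha=0.05):
    sim_residuals = inverse_ecdf(u_samples, residual_history)   # back-transform
    ensemble = point_forecast + sim_residuals                   # superimpose
    lower, upper = np.quantile(ensemble, [alpha / 2.0, 1.0 - alpha / 2.0])
    return lower, upper

# Placeholder inputs: residuals spread evenly over [-10, 10] m^3/s and
# B = 1000 evenly spaced uniforms in place of R-Vine draws.
residual_history = np.linspace(-10.0, 10.0, 201)
u_samples = np.linspace(0.001, 0.999, 1000)
lower, upper = forecast_interval(100.0, u_samples, residual_history, alpha=0.05)
```

With real R-Vine draws the same two lines would be applied to every node and time step of the sample tensor, and the spatial dependence would enter only through the uniform samples.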
[0201] This process maps the spatial dependence features of the residuals of the deterministic point forecasts onto a set of random simulations, thus transforming the forecast output from "single numerical values" into "probabilistic forecast intervals." This provides decision support for flood risk early warning that includes not only magnitude forecasts but also spatially coordinated risk information.
[0202] The following example, as Embodiment 2, illustrates the implementation process and effects of this method:
[0203] This embodiment uses a section of the Wei River basin as the research object and selects 16 observation stations distributed within the basin: four hydrological stations (Weijiabao, Xianyang, Lintong, and Huaxian) and 12 meteorological stations (including Qianyang, Baoji County, and Meixian County). Observational data covering the period from 1975 to 2024 were obtained from these stations. The observational characteristics of each station are shown in Table 2.
[0204] Table 2 Observation characteristics of observation stations
[0205] The specific implementation steps of Example 2 are as follows:
[0206] S1: Construct a directed graph of the Wei River basin:
[0207] S1.1: The 16 observation stations are abstracted as graph nodes. Based on the hydrogeographic distribution and natural hydraulic connections within the watershed, the topological connections between nodes are predefined (as shown in Figure 2), a set of directed edges is constructed, and each directed edge is assigned static edge features. These static edge features include node distance, elevation difference, slope, reciprocal distance, channel length, and self-loop marker features. A self-loop is set for each node, which ensures that the historical information of each node is preserved during spatial feature aggregation. Taking the directed edge from the Weijiabao hydrological node to the Xianyang hydrological node as an example, the node distance is approximately 85 km and the measured river length is approximately 110 km.
[0208] S1.2: Determine the observation mask for each observation feature at each node at each time point. The observation mask is used to characterize whether the measured data for the corresponding observation feature is valid. This embodiment addresses the issue of missing measured data for some nodes between 1975 and 2024 by generating an observation mask. For example, the Xianyang hydrological node has a missing measured runoff on January 1, 1985. The observation mask corresponding to this measured runoff is 0, ensuring that this invalid observation value is not included in subsequent calculations of the mean, standard deviation, and other statistical calculations.
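By way of illustration only, the observation mask and its use in masked statistics may be sketched as follows. Encoding missing measurements as `None` or NaN, and the function names, are assumptions of this sketch.

```python
import math


def observation_mask(values):
    """Return 1 where a measurement is present and finite, 0 where missing."""
    return [0 if v is None or math.isnan(v) else 1 for v in values]


def masked_mean_std(values, mask):
    """Mean and population standard deviation over valid entries only."""
    valid = [v for v, m in zip(values, mask) if m == 1]
    mean = sum(valid) / len(valid)
    var = sum((v - mean) ** 2 for v in valid) / len(valid)
    return mean, math.sqrt(var)


# Hypothetical runoff series with two missing measurements.
runoff = [12.0, None, 18.0, float("nan"), 15.0]
mask = observation_mask(runoff)
mean, std = masked_mean_std(runoff, mask)
```

Because the invalid entries never enter `valid`, they cannot distort the mean or the standard deviation.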
[0209] S1.3: The measured runoff at each hydrological node at each time point is standardized using the observation mask to obtain the standardized runoff. The runoff data in the training set for the Wei River basin span a large range (0.012 m³/s to 5740.001 m³/s) and exhibit a distinctly skewed distribution (mean 135.372 m³/s, standard deviation 248.956 m³/s). To address this, the measured runoff is standardized, which brings the distribution of the runoff data closer to a Gaussian distribution.
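A minimal sketch of such a standardization is given below, assuming the logarithmic compression takes the form log(1 + Q) and that masked entries are temporarily set to 0.0 before gap filling. Both choices, and the function names, are assumptions of this sketch rather than details stated in the embodiment.

```python
import math


def standardize_runoff(runoff, mask):
    """Log-compress the runoff, then compute z-scores using the mean and
    standard deviation of valid (mask == 1) entries only."""
    compressed = [math.log1p(q) if m else 0.0 for q, m in zip(runoff, mask)]
    valid = [c for c, m in zip(compressed, mask) if m]
    mean = sum(valid) / len(valid)
    std = math.sqrt(sum((c - mean) ** 2 for c in valid) / len(valid))
    z = [(c - mean) / std if m else 0.0 for c, m in zip(compressed, mask)]
    return z, mean, std


def restore_runoff(z_value, mean, std):
    """Invert the transform to return to the physical scale (m^3/s)."""
    return math.expm1(z_value * std + mean)


# Hypothetical series spanning the range reported for the training set.
runoff = [0.012, 135.372, None, 5740.001]
mask = [1, 1, 0, 1]
z, mean, std = standardize_runoff(runoff, mask)
```

The inverse transform is what allows predictions made in standardized space to be reported at the actual hydrological magnitude.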
[0210] S1.4: Fill in missing data based on observation mask.
[0211] S2: Based on the static edge characteristics of each directed edge in the directed graph, the transmission weight of the corresponding directed edge is generated by the filtering generation network. The transmission weight is used to characterize the hydraulic transmission capacity between the two nodes connected by the corresponding directed edge, which maps the physical transport characteristics of the Weihe River channel, so that the runoff prediction model constructed later can perceive the difference in the confluence velocity between different river sections.
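For illustration, a filter-generating network of this kind can be sketched as a small multilayer perceptron that maps the six static edge features to one scalar weight. The layer sizes, the random initialization, the softplus output (used here to keep the weight positive), and the feature scaling are all assumptions of this sketch; trained parameters would replace the random ones in practice.

```python
import math
import random


def make_filter_network(n_in, n_hidden, seed=0):
    """Randomly initialized one-hidden-layer network (untrained stand-in)."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def transmission_weight(params, edge_features):
    """Map static edge features to a positive scalar transmission weight."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, edge_features)) + b)  # ReLU
        for row, b in zip(params["w1"], params["b1"])
    ]
    raw = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return math.log1p(math.exp(raw))  # softplus


# Edge Weijiabao -> Xianyang: distance 85 km and channel length 110 km are
# from the embodiment (scaled by 100 km); the other values are hypothetical.
# Order: distance, elevation difference, slope, reciprocal distance,
# channel length, self-loop marker.
edge = [0.85, 0.20, 0.10, 1 / 0.85, 1.10, 0.0]
weight = transmission_weight(make_filter_network(6, 8), edge)
```

Since the weight depends only on static edge features, it is computed once per edge and reused at every time step.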
[0212] S3: Based on the static edge characteristics of each directed edge and the observation characteristics of each node at each time step, determine the gating coefficient of each directed edge at each time step using a multilayer perceptron and gating mechanism.
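As a hedged sketch, the gating coefficient can be produced by feeding the static edge features together with the current observations of the two connected nodes through a multilayer perceptron followed by a sigmoid. The concatenation order, the layer sizes, and the untrained random weights are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_gate(n_in, n_hidden, seed=0):
    """Randomly initialized gate network (untrained stand-in)."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.gauss(0.0, 0.3) for _ in range(n_in)] for _ in range(n_hidden)],
        "w2": [rng.gauss(0.0, 0.3) for _ in range(n_hidden)],
    }


def gating_coefficient(params, edge_features, src_obs, dst_obs):
    """Gate in (0, 1) for one directed edge at one time step."""
    x = list(edge_features) + list(src_obs) + list(dst_obs)
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in params["w1"]]
    return sigmoid(sum(w * h for w, h in zip(params["w2"], hidden)))


# Hypothetical inputs: 6 static edge features plus 2 observation features
# (for example standardized runoff and rainfall) for each end node.
params = init_gate(n_in=10, n_hidden=8)
edge = [0.85, 0.20, 0.10, 1 / 0.85, 1.10, 0.0]
gate_dry = gating_coefficient(params, edge, [-0.8, 0.0], [-0.7, 0.0])
gate_wet = gating_coefficient(params, edge, [2.1, 1.5], [1.4, 1.2])
```

Unlike the transmission weight, the gate is recomputed at every time step because it depends on the current node observations.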
[0213] S4: Based on the transmission weights, gating coefficients, and the observation features of each node at each time step, the dynamic physical prior features of each node at each time step are generated using the spatial aggregation mechanism of the edge-conditional convolutional network.
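The spatial aggregation at a single time step may be sketched as follows, under the simplifying assumption that each edge contributes its source-node features scaled by the product of a scalar transmission weight and a scalar gating coefficient. An edge-conditional convolution can also generate a full weight matrix per edge; the scalar form and the function name are assumptions of this sketch.

```python
def aggregate_prior(edges, weights, gates, node_obs):
    """Sum gated, weighted source-node features over incoming edges.

    edges    : list of (source, target) node names, self-loops included
    weights  : static transmission weight per edge
    gates    : gating coefficient per edge at the current time step
    node_obs : dict mapping node name to its feature list at this time step
    """
    dim = len(next(iter(node_obs.values())))
    prior = {node: [0.0] * dim for node in node_obs}
    for index, (src, dst) in enumerate(edges):
        coef = weights[index] * gates[index]
        for k in range(dim):
            prior[dst][k] += coef * node_obs[src][k]
    return prior


# Hypothetical two-node example with one river edge and one self-loop.
edges = [("Weijiabao", "Xianyang"), ("Xianyang", "Xianyang")]
prior = aggregate_prior(
    edges,
    weights=[0.5, 1.0],
    gates=[0.5, 1.0],
    node_obs={"Weijiabao": [2.0], "Xianyang": [4.0]},
)
```

The self-loop edge is what carries each node's own information into its prior feature, as described in S1.1.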
[0214] S5: Determine the predicted runoff for each hydrological node based on the observation mask and dynamic physical prior characteristics.
[0215] S5.1: Historical runoff time series for the four hydrological nodes in the Weihe River basin (Weijiabao, Xianyang, Lintong, and Huaxian) were retrieved, and adaptive thresholds were calculated for each node individually. Taking Weijiabao as an example: the 95th percentile of the runoff at this node was used as the flood flow threshold, which was calculated to be 288.60 m³/s, and the runoff range above this threshold is the flood flow range; the 20th percentile of the runoff at this node was used as the low-water flow threshold, which was calculated to be 7.61 m³/s, and the runoff range below this threshold is the low-water flow range.
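The per-node threshold calculation may be illustrated as follows. The synthetic series and the function names are hypothetical, and the inclusive percentile rule of the Python standard library is an assumption, since the embodiment does not state which percentile definition is used.

```python
import statistics


def adaptive_thresholds(series, flood_pct=95, low_pct=20):
    """Return (flood threshold, low-water threshold) for one node."""
    cuts = statistics.quantiles(series, n=100, method="inclusive")
    return cuts[flood_pct - 1], cuts[low_pct - 1]


def classify_flow(q, flood_threshold, low_threshold):
    """Assign a runoff value to the flood, low-water, or normal range."""
    if q > flood_threshold:
        return "flood"
    if q < low_threshold:
        return "low"
    return "normal"


# Hypothetical runoff history for one node: 1, 2, ..., 101 m^3/s.
series = [float(v) for v in range(1, 102)]
flood_thr, low_thr = adaptive_thresholds(series)
```

Computing the thresholds per node keeps the flood and low-water definitions relative to each station's own flow regime.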
[0216] The node weights are determined based on the complexity of the topology of each hydrological node in the directed graph. Lintong Station, located at the confluence of the Jinghe River and the Weihe River, is subject to the superposition of multiple runoff sources, resulting in high hydrological complexity. Huaxian Station, as a key control station before the Weihe River flows into the Yellow River, carries water collection information for the entire basin. Therefore, in this embodiment, the node weights for Lintong and Huaxian hydrological nodes are preset to 2, while the node weights for the remaining hydrological nodes are preset to 1.
[0217] To enhance the ability to capture torrential rain and floods in the Weihe River, a flood enhancement weight (preset to 3 in this embodiment) is applied to all samples falling within the flood flow range to obtain the basic weight of each sample.
[0218] At the same time, the deviation correction weights for each hydrological node are determined. For example, when a high-risk underreporting event occurs, the corresponding hydrological node is assigned a flood peak penalty weight (preset to 2 in this embodiment); when a false noise event occurs, the corresponding hydrological node is assigned a low-water suppression weight (preset to 1.5 in this embodiment); when a regular event occurs, the corresponding hydrological node is assigned a baseline weight (preset to 1 in this embodiment).
[0219] Then, a triple-nested weighting system is constructed based on the node weights, the sample base weights, and the deviation correction weights. A loss function is then built based on this triple-nested weighting system and the observation mask. By minimizing this loss function, the runoff prediction model automatically tilts towards high-weight samples during training, which improves prediction accuracy in years of major floods in the Wei River while keeping the model parameters stable during periods of missing observations.
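One possible form of such a loss for a single hydrological node is sketched below. Several points are assumptions of this sketch and are not stated in this passage: the three weights are combined by multiplication, the error term is a squared error averaged over valid samples, a high-risk underreporting event is taken to be an under-prediction of an observed flood-range value, and a false noise event is taken to be an over-prediction of an observed low-water value.

```python
def weighted_masked_loss(pred, obs, mask, node_weight, flood_thr, low_thr,
                         flood_boost=3.0, peak_penalty=2.0, low_suppress=1.5):
    """Masked, triple-weighted squared error for one hydrological node."""
    total, count = 0.0, 0
    for p, o, m in zip(pred, obs, mask):
        if not m:
            continue  # invalid observations contribute nothing
        base = flood_boost if o > flood_thr else 1.0
        if o > flood_thr and p < o:
            correction = peak_penalty      # assumed underreporting event
        elif o < low_thr and p > o:
            correction = low_suppress      # assumed false noise event
        else:
            correction = 1.0               # regular event
        total += node_weight * base * correction * (p - o) ** 2
        count += 1
    return total / count if count else 0.0


# Hypothetical samples for a node with weight 2; the thresholds reuse the
# Weijiabao values from S5.1 purely as example numbers.
loss = weighted_masked_loss(
    pred=[250.0, 10.0, 5.0],
    obs=[300.0, 8.0, None],
    mask=[1, 1, 0],
    node_weight=2.0,
    flood_thr=288.60,
    low_thr=7.61,
)
```

In this example the under-predicted flood sample receives a combined weight of 2 × 3 × 2 = 12, so it dominates the loss, while the masked third sample is ignored.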
[0220] S5.2: Integrate the dynamic physical prior features and construct, based on the loss function and a time-dimensional neural network, an initial model for predicting the runoff of each hydrological node. The initial model outputs the predicted runoff of each hydrological node from the dynamic physical prior features.
[0221] S5.3: The Tree-structured Parzen Estimator (TPE) algorithm is used to optimize the hyperparameter combination of the initial model. Based on the multi-node joint prediction requirements of the Weihe River Basin, the hyperparameter search space is shown in Table 3.
[0222] Table 3 Hyperparameter Search Space
[0223] By optimizing the hyperparameters, the optimal parameter combination and the runoff prediction model based on the optimal hyperparameter combination were obtained. The hyperparameter optimization results are shown in Table 4.
[0224] Table 4. Results of Hyperparameter Optimization
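The density-ratio idea behind TPE can be illustrated with a one-dimensional sketch: past trials are split into a better and a worse group, a kernel density is fitted to each, and the candidate with the largest density ratio is proposed next. Splitting by a top fraction `gamma`, the Gaussian kernel, the fixed bandwidth, and all names are assumptions of this sketch; it requires at least two past trials and is not the optimizer actually used in the embodiment.

```python
import math
import random


def kde(points, bandwidth):
    """Gaussian kernel density estimate over a list of 1-D points."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm

    return density


def tpe_suggest(trials, low, high, rng, gamma=0.25, n_candidates=64, bandwidth=0.1):
    """Propose the next value of one hyperparameter.

    trials: list of (value, score) pairs, where a higher score is better.
    """
    ranked = sorted(trials, key=lambda t: t[1], reverse=True)
    n_good = max(1, int(gamma * len(ranked)))
    good = [x for x, _ in ranked[:n_good]]
    bad = [x for x, _ in ranked[n_good:]]
    good_density, bad_density = kde(good, bandwidth), kde(bad, bandwidth)
    # Draw candidates around the good trials, clipped to the search range.
    candidates = [
        min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
        for _ in range(n_candidates)
    ]
    return max(candidates, key=lambda x: good_density(x) / (bad_density(x) + 1e-12))


# Hypothetical history: one well-performing value and three poor ones.
history = [(0.7, 0.85), (0.1, 0.40), (0.2, 0.45), (0.3, 0.50)]
suggestion = tpe_suggest(history, 0.0, 1.0, random.Random(0))
```

The proposal lands near the well-performing trial and away from the poor ones, which is the behaviour the density ratio is designed to produce.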
[0225] To assess the robustness of the runoff prediction model under random initialization, 10 distinct random seeds were used to independently and repeatedly train the runoff prediction model based on the optimal hyperparameter combination. The results are shown in Table 5. The statistics show that the model performs well globally across the 10 random seeds: the mean of the global performance index was 0.863, and its standard deviation was only 0.014. With a random seed of 42, the index reached its highest value of 0.883, indicating that the runoff prediction model has high predictive skill and stable performance under the optimal hyperparameter combination. All subsequent prediction results in this embodiment are based on the model instance trained with this seed.
[0226] Table 5 Results of the runoff prediction model under different random seeds
[0227] S5.4: Based on the dynamic physical prior characteristics, the predicted runoff volume for each hydrological node is determined using the runoff prediction model.
[0228] S6: Based on the predicted runoff at each hydrological node, determine the runoff probability forecast interval corresponding to each predicted runoff using the R-Vine Copula model.
[0229] S6.1: Perform residual marginal cumulative distribution fitting and probability space transformation. For the set of prediction residuals at each node, a non-parametric empirical distribution function is used to fit the marginal cumulative distribution function of that node's residuals. The original residuals are then transformed through formula (23) into variables that follow a standard uniform distribution, which eliminates the interference of differences in runoff magnitude between nodes with the subsequent spatial correlation modeling.
[0230] S6.2: Construct the R-Vine structure for the Weihe River Basin and perform global parameter calibration. Based on the correlation strength between the standard uniformly distributed variables of the nodes, the R-Vine structure is determined with the maximum spanning tree algorithm: using the absolute value of Kendall's correlation coefficient between each node pair as the edge weight, the algorithm is applied layer by layer to determine the sequence of vine trees that captures the maximum dependence strength. For each edge in the sequence, the bivariate pair-copula type is selected from a preset pool of candidate functions according to the Akaike information criterion (AIC). The results are shown in Table 6.
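A minimal sketch of the empirical probability space transformation follows. The rank/(n + 1) rule, which keeps the transformed values strictly inside (0, 1), is a common choice assumed here; it is not confirmed to be the exact form of formula (23).

```python
def pit_transform(residuals):
    """Map residuals to pseudo-uniform values with the empirical CDF.

    Each residual is replaced by rank / (n + 1). Tied values receive
    consecutive ranks in their original order.
    """
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    uniform = [0.0] * n
    for rank, index in enumerate(order, start=1):
        uniform[index] = rank / (n + 1)
    return uniform


# Hypothetical residuals (m^3/s) for one node.
u = pit_transform([0.3, -1.2, 2.0])
```

After this step every node's residuals live on the same (0, 1) scale regardless of the node's runoff magnitude.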
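The selection of the first vine tree can be illustrated as below: Kendall's tau is computed for every node pair and a maximum spanning tree is built on its absolute value with Kruskal's algorithm. Only the first tree is shown; the higher trees and the AIC-based pair-copula selection are omitted, and the uniform variables are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    score = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)
    n = len(x)
    return score / (n * (n - 1) / 2)


def first_vine_tree(uniforms):
    """Maximum spanning tree on |tau| (Kruskal with union-find)."""
    nodes = list(uniforms)
    edges = sorted(
        ((abs(kendall_tau(uniforms[a], uniforms[b])), a, b)
         for a, b in combinations(nodes, 2)),
        reverse=True,
    )
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for weight, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # adding the edge does not create a cycle
            parent[root_a] = root_b
            tree.append((a, b, weight))
    return tree


# Hypothetical uniform variables in which neighbouring stations are most alike.
uniforms = {
    "Weijiabao": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "Xianyang": [0.1, 0.2, 0.3, 0.4, 0.6, 0.5],
    "Lintong": [0.2, 0.1, 0.3, 0.4, 0.6, 0.5],
    "Huaxian": [0.2, 0.1, 0.4, 0.3, 0.6, 0.5],
}
tree = first_vine_tree(uniforms)
```

With these example values the selected tree is the upstream-to-downstream chain Weijiabao, Xianyang, Lintong, Huaxian.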
[0231] Table 6. Simulation results of R-Vine Copula
[0232] Note: The three parameter symbols in the table denote the degrees of freedom, the correlation coefficient, and the dependence strength (Kendall's tau), respectively.
[0233] As shown in Table 6, the Kendall's tau values at each level are relatively small, indicating that the runoff prediction model has largely learned the physical relationships between nodes and effectively decoupled the local errors of adjacent nodes. Tree 1, the layer with the strongest correlation in the R-Vine Copula, exhibits a chain-like structure (as shown in Figure 3). This indicates that the error is transitive, that is, the uncertainty mainly originates from the propagation of uncertainty in the upstream input. The Copula thus captures the hydrodynamic connection between upstream and downstream nodes.
[0234] S6.3: Perform spatiotemporal consistency sampling and generate probabilistic forecast intervals. Using the calibrated R-Vine Copula model, perform Monte Carlo stochastic simulations to generate 1000 sets of uniformly distributed random samples with spatially dependent characteristics. Then, using the inverse function of the cumulative marginal distribution of residuals at each node, map the samples back to the original residual physical scale. Finally, superimpose the restored simulated residual samples onto the deterministic point prediction values output in step S5 to derive the runoff forecast set.
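The mapping of uniform samples back to the residual scale may be sketched as follows. The uniform samples are taken as given inputs here, because drawing them from a calibrated R-Vine Copula is outside the scope of this sketch; the linear-interpolation inverse of the empirical distribution is likewise an assumption.

```python
def empirical_quantile(sorted_residuals, u):
    """Inverse of the empirical marginal CDF by linear interpolation."""
    pos = u * (len(sorted_residuals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_residuals) - 1)
    return sorted_residuals[lo] + (sorted_residuals[hi] - sorted_residuals[lo]) * (pos - lo)


def simulate_residuals(historical_residuals, uniform_samples):
    """Map uniform samples of one node back to its residual scale."""
    ordered = sorted(historical_residuals)
    return [empirical_quantile(ordered, u) for u in uniform_samples]


# Hypothetical historical residuals (m^3/s) and three uniform samples.
samples = simulate_residuals([-20.0, 0.0, 10.0, 40.0, -5.0], [0.0, 0.5, 1.0])
```

Because each node uses its own marginal distribution, the spatial dependence carried by the uniform samples is preserved while each node's residual magnitude is restored.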
[0235] By performing quantile statistics on the runoff forecast set, the 2.5% and 97.5% quantile values are extracted, and the final runoff probability forecast interval at the 95% confidence level is output (as shown in Figure 4). Taking the runoff at the Huaxian hydrological node on August 1, 2020 as an example, the measured runoff was 350.00 m³/s and the predicted runoff was 300.58 m³/s; after this step, the output probability forecast interval at the 95% confidence level is [220.15 m³/s, 391.22 m³/s].
[0236] To verify the effectiveness of the probabilistic forecast interval under extreme conditions in the Weihe River Basin, this embodiment introduces two indicators: Prediction Interval Coverage Probability (PICP) and Mean Prediction Interval Width (MPIW). The statistical results are shown in Table 7. The average PICP of the four hydrological nodes is 0.947, which is highly consistent with the preset theoretical confidence level of 0.95. Specifically, the PICP of the Xianyang hydrological node reaches 0.985, demonstrating high envelope reliability; the PICPs of the Weijiabao and Huaxian hydrological nodes are 0.944 and 0.952, respectively. This shows that the R-Vine Copula can effectively capture the joint distribution characteristics of the basin residuals, and that the forecast interval remains statistically reliable even in complex nonlinear river sections. Meanwhile, the MPIW values of each node remain at a low level, indicating that the model achieves high coverage without simply widening the intervals, which reflects the precision of the spatial correlation extracted by the R-Vine Copula.
[0237] Table 7. Statistics of Probability Forecasting Indicators at 95% Confidence Level
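For reference, the two indicators can be computed as in the following sketch. The first data point reuses the Huaxian example from this embodiment; the second is hypothetical, and the function names are illustrative only.

```python
def picp(observed, lower, upper):
    """Fraction of observations that fall inside their forecast interval."""
    hits = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return hits / len(observed)


def mpiw(lower, upper):
    """Mean width of the forecast intervals."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


observed = [350.00, 100.00]
lower = [220.15, 110.00]
upper = [391.22, 150.00]
coverage = picp(observed, lower, upper)
mean_width = mpiw(lower, upper)
```

A PICP close to the nominal confidence level combined with a small MPIW indicates intervals that are both reliable and sharp.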
[0238] By exploiting hidden dependencies between nodes using R-Vine Copula, the spatial collaborative variation characteristics between nodes not explained by the physical model are accurately captured. The resulting probabilistic forecast intervals not only cover the measured runoff but also have compact forecast boundaries, providing stronger practical scheduling reference value. Combining the spatiotemporal joint inference of S1 to S5, step S6 successfully transforms the deterministic forecast bias in the Weihe River Basin from 1975 to 2024 into statistically significant risk intervals. The probabilistic forecast intervals output by this method completely encompass the measured flood peak, providing decision support for flood control scheduling from a point-to-area perspective.
[0239] Embodiment 3 of the present invention provides a runoff prediction system that integrates dynamic physical priors and graph networks. The system includes:
[0240] The directed graph generation unit is used to construct a directed graph with multiple hydrological stations and multiple meteorological stations in the basin as nodes. The directed graph contains the observation features of each node and the static edge features of each directed edge. The nodes corresponding to the hydrological stations are recorded as hydrological nodes and the nodes corresponding to the meteorological stations are recorded as meteorological nodes. The observation mask of each node for each observation feature at each time is determined. The observation mask is used to characterize whether the measured data of the corresponding observation feature is valid.
[0241] The transmission weight generation unit is used to generate the transmission weight of the corresponding directed edge based on the static edge characteristics of each directed edge in the directed graph and the filtering generation network. The transmission weight is used to characterize the hydraulic transmission capacity between the two nodes connected by the corresponding directed edge.
[0242] The gating coefficient generation unit is used to determine the gating coefficient of each directed edge at each time step based on the static edge characteristics of each directed edge and the observation characteristics of each node at each time step, using a multilayer perceptron and a gating mechanism.
[0243] The prior feature generation unit is used to generate dynamic physical prior features of each node at each time step based on the transmission weights, gating coefficients and the observation features of each node at each time step, using the edge-conditional convolutional network.
[0244] The runoff prediction unit is used to determine the predicted runoff for each hydrological node based on the observation mask and dynamic physical prior characteristics.
[0245] This invention achieves a real-time balance between physical topological constraints and dynamic hydrological evolution through edge-conditional convolution and dynamic gating mechanisms. By constructing a spatiotemporal graph neural network architecture that integrates physical priors and state awareness, it jointly represents the static river channel attributes and the dynamic hydrological evolution process of the watershed, and accurately reconstructs the continuous physical migration process in which spatial displacement accumulates over time. By using an observation mask and an asymmetric loss function with triple-nested dynamic weight constraints, it significantly enhances the model's ability to capture the features of extreme flood peak samples and its robustness to missing data. Finally, by using the R-Vine Copula model to capture the implicit spatial correlations between nodes, it transforms traditional deterministic point prediction into probabilistic forecasting that contains the spatial correlation characteristics and physical evolution logic of the watershed, providing a more scientific and comprehensive systemic risk assessment reference for the optimal allocation of watershed water resources and for flood control and disaster reduction decisions.
[0246] The above description covers merely a few embodiments of this application and is not intended to limit this application in any way. Although preferred embodiments are disclosed above, they are not intended to limit this application. Any changes or modifications made by those skilled in the art using the disclosed technical content, without departing from the scope of the technical solution of this application, constitute equivalent embodiments and fall within the scope of the technical solution.
Claims
1. A method for runoff prediction fusing dynamic physical priors and graph networks, characterized in that the method includes:
S1. Construct a directed graph using multiple hydrological stations and multiple meteorological stations within the basin as nodes. The directed graph includes the observation features of each node and the static edge features of each directed edge. The nodes corresponding to the hydrological stations are denoted as hydrological nodes, and the nodes corresponding to the meteorological stations are denoted as meteorological nodes. Determine the observation mask for each observation feature of each node at each time. The observation mask is used to characterize whether the measured data of the corresponding observation feature is valid.
S2. Based on the static edge features of each directed edge in the directed graph, generate a transmission weight for the corresponding directed edge using a filtering generation network. The transmission weight is used to characterize the hydraulic transmission capacity between the two nodes connected by the corresponding directed edge.
S3. Based on the static edge features of each directed edge and the observation features of each node at each time step, determine the gating coefficient of each directed edge at each time step using a multilayer perceptron and a gating mechanism.
S4. Based on the transmission weights, the gating coefficients, and the observation features of each node at each time step, generate the dynamic physical prior features of each node at each time step using an edge-conditional convolutional network.
S5. Determine the predicted runoff for each hydrological node based on the observation mask and the dynamic physical prior features.
2. The method of claim 1, wherein, following S5, the method further includes: S6. Based on the predicted runoff at each hydrological node, determine the runoff probability forecast interval corresponding to each predicted runoff using the R-Vine Copula model.
3. The method of claim 1, wherein S5 specifically includes: constructing a runoff prediction model based on the observation mask; and, based on the dynamic physical prior features, determining the predicted runoff for each hydrological node using the runoff prediction model.
4. The method of claim 3, wherein constructing a runoff prediction model based on the observation mask specifically includes: constructing a loss function based on the observation mask; integrating the dynamic physical prior features and constructing, based on the loss function and a time-dimensional neural network, an initial model for predicting the runoff at each hydrological node; and optimizing the hyperparameter combination of the initial model using a hyperparameter optimization algorithm, the optimized initial model being used as the runoff prediction model.
5. The method of claim 4, wherein constructing the loss function based on the observation mask specifically includes: determining the node weight of each hydrological node, determining the sample base weight based on the flood flow threshold, and determining the deviation correction weight based on the prediction deviation; constructing a triple-nested weight structure based on the node weight, the sample base weight, and the deviation correction weight; and constructing the loss function based on the triple-nested weight structure and the observation mask.
6. The method of claim 4, wherein, The hyperparameter optimization algorithm is used to optimize the combination of hyperparameters of the initial model, specifically including: Multiple hyperparameter combinations are iteratively generated using a hyperparameter optimization algorithm, and the performance index of the initial model under different hyperparameter combinations is determined based on the Nash efficiency coefficient. The determined performance indicators are divided into positive indicators that meet the preset performance and negative indicators that do not meet the preset performance. A first probability density model is constructed based on all positive indicators, and a second probability density model is constructed based on all negative indicators. The hyperparameter combination of the initial model is optimized with the goal of maximizing the ratio of the first probability density model to the second probability density model.
7. The method of claim 1, wherein, The observational characteristics of the hydrological nodes include measured runoff; Prior to S2, the method further includes: The measured runoff at each hydrological node at each time moment is standardized based on the observation mask to obtain the standardized runoff. Correspondingly, the observation features in S3 and S4 include the standardized runoff at each hydrological node at each time point.
8. The method of claim 7, wherein standardizing the measured runoff at each hydrological node at each time based on the observation mask to obtain the standardized runoff specifically includes: logarithmically compressing the measured runoff at each hydrological node at each time to obtain the compressed runoff; determining, based on the compressed runoff at all times and the observation mask of the measured runoff at all times for each hydrological node, the mean and standard deviation of the compressed runoff over all times for each node; and determining the standard score of the compressed runoff at each node at each time based on the mean and the standard deviation, the standard score being taken as the standardized runoff of the corresponding measured runoff.
9. The method of claim 2, wherein, S6 specifically includes: Based on the residual between the predicted runoff and the measured runoff at each hydrological node, a residual simulation model is constructed using the R-Vine Copula model. The residual simulation model is used to generate simulated residual samples corresponding to each predicted runoff volume; The simulated residual samples are superimposed with their corresponding predicted runoff volumes to obtain the runoff probability forecast interval for each predicted runoff volume.
10. A runoff prediction system that fuses dynamic physical priors and graph networks, characterized in that the system includes:
A directed graph generation unit, used to construct a directed graph with multiple hydrological stations and multiple meteorological stations in the basin as nodes. The directed graph contains the observation features of each node and the static edge features of each directed edge. The nodes corresponding to the hydrological stations are recorded as hydrological nodes, and the nodes corresponding to the meteorological stations are recorded as meteorological nodes. The unit also determines the observation mask of each observation feature of each node at each time. The observation mask is used to characterize whether the measured data of the corresponding observation feature is valid.
A transmission weight generation unit, used to generate the transmission weight of the corresponding directed edge based on the static edge features of each directed edge in the directed graph, using a filtering generation network. The transmission weight is used to characterize the hydraulic transmission capacity between the two nodes connected by the corresponding directed edge.
A gating coefficient generation unit, used to determine the gating coefficient of each directed edge at each time step based on the static edge features of each directed edge and the observation features of each node at each time step, using a multilayer perceptron and a gating mechanism.
A prior feature generation unit, used to generate the dynamic physical prior features of each node at each time step based on the transmission weights, the gating coefficients, and the observation features of each node at each time step, using an edge-conditional convolutional network.
A runoff prediction unit, used to determine the predicted runoff for each hydrological node based on the observation mask and the dynamic physical prior features.