A three-source characteristic similarity transfer hydrological simulation method for a data-deficient river basin
By constructing a three-source feature space and a feature mapping, and combining cluster analysis with supervised learning, the method addresses insufficient similarity measurement in hydrological simulation for data-scarce watersheds. It enables refined transfer and continuous optimization, thereby improving simulation accuracy and adaptability.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: DALIAN UNIV OF TECH
- Filing Date: 2026-05-19
- Publication Date: 2026-06-16
AI Technical Summary
Existing technologies cannot fully characterize the hydrological similarity of watersheds with insufficient data, the migration strategy is crude, and the knowledge base cannot be updated, resulting in inaccurate hydrological simulations.
We construct a three-source feature space of geography, rainfall, and runoff. Through cluster analysis and feature mapping, we achieve adaptive migration of hydrological model parameters and dynamic updating of the knowledge base. We adopt multi-dimensional similarity measurement and hierarchical migration strategy, combined with supervised learning model to estimate runoff response characteristics.
It enables refined migration of hydrological model parameters in watersheds with scarce data, improving the reliability and generalization ability of the simulation, and establishing an adaptive knowledge base update system.
[Abstract drawing: Figure CN122221702A_ABST]
Abstract
Description
Technical Field
[0001] This invention belongs to the interdisciplinary field of hydrology and water resources and artificial intelligence. It relates to a hydrological simulation method for data-scarce watersheds based on three-source feature similarity transfer.
Background Technology
[0002] Data-scarce watersheds have sparse hydrological station coverage and few runoff observations. They therefore cannot supply the long-term observational records that data-driven methods fundamentally require, which makes them one of the long-standing classic challenges in hydrology. These watersheds typically have basic geographic information such as topography, soil, and land use, or short-term rainfall records. However, the lack of runoff data fundamentally obstructs the parameter calibration of traditional hydrological models, and deep learning methods are also hard to apply for lack of effective training samples. Achieving reliable hydrological simulation without runoff observations has therefore become a key bottleneck in this field.
[0003] Currently, hydrological simulation methods for data-scarce watersheds mainly include three kinds:
- parameter transplantation methods based on the similarity of watershed geographic attributes;
- regionalization methods based on watershed physical attributes;
- prior parameter estimation methods based on global or regional scales.

Parameter transplantation methods transfer calibrated parameters from a reference watershed to a target watershed. Regionalization methods establish regression relationships between watershed attributes and model parameters. Prior parameter estimation methods derive model parameters directly from global or regional datasets. However, these methods share a common limitation: they rely mainly on the static geographic attributes of the watershed and pay insufficient attention to essential differences in dynamic hydrological behavior. This static perspective overlooks two facts. Watersheds with similar geographic attributes can exhibit drastically different hydrological responses, and watersheds with markedly different geographic attributes may have similar hydrological response characteristics.
[0004] In recent years, deep learning models, represented by Long Short-Term Memory (LSTM) neural networks, have demonstrated superior performance compared to traditional conceptual models in hydrological time series simulations. However, the data-driven nature of LSTM makes it impossible to directly train models in data-scarce watersheds. Transfer learning offers a new approach to address this challenge: by using a similarity metric, LSTM models pre-trained in data-rich regions can be transferred to data-scarce regions.
[0005] To apply deep learning models to data-scarce regions, researchers have explored model transfer methods based on watershed similarity.

For example, Chinese invention patent CN116432828A computes the geographic-attribute similarity between the target watershed and the source watersheds to select the most similar source watershed. It pre-trains an LSTM on data simulated by that watershed's physical model and then transfers the model to the target watershed. However, its similarity measure is limited to the single dimension of geographic attributes, so it cannot fully reflect essential differences in the watersheds' dynamic hydrological behavior. Moreover, its transfer decision is a binary yes-or-no judgment that lacks a fine-grained characterization of the degree of similarity.

Another example is Chinese invention patent CN119204355A, which uses a deep learning model to correct the forecast residuals of a physical model, using the physical model's knowledge to compensate for insufficient data. This improves forecast accuracy to some extent, but it remains essentially a post-processing correction. The structure and parameters of the physical model itself are unchanged, and the method is not designed for similarity transfer.
[0006] In summary, existing transfer learning techniques suffer from a single-dimension similarity measure, coarse transfer strategies, and a knowledge base that cannot be updated. An urgent problem is therefore to provide a hydrological simulation method for data-scarce watersheds that:
- characterizes watershed hydrological similarity comprehensively from multiple dimensions;
- makes refined, hierarchical transfer decisions;
- supports closed-loop iterative optimization.
Summary of the Invention
[0007] The purpose of this invention is to overcome the shortcomings of the prior art and provide a hydrological simulation method based on the similarity migration of three-source features for watersheds with insufficient data. By constructing a three-source feature space of geography, rainfall and runoff and integrating cluster analysis, feature mapping and similarity migration, the method achieves adaptive migration of hydrological model parameters and dynamic updating of the knowledge base from the source watershed to the target watershed with insufficient data.
[0008] To achieve the above objectives, the present invention provides the following technical solution: a computer-based hydrological simulation method for data-scarce watersheds based on three-source feature similarity transfer, comprising the following steps.

Step S1: Obtain multi-source data from the source and target watersheds, preprocess and normalize the data, and construct a three-source feature vector comprising geographic attribute features, rainfall-driven features, and runoff response features. Specifically:

Step S1.1: Obtain multi-source data from the source and target watersheds, comprising geographic attribute data, rainfall time-series data, and runoff time-series data.
- The geographic attribute data come from a global hydrological and environmental database. They include, but are not limited to, topographic, climate, soil, geological, and land use features.
- The rainfall time-series data come from a multi-source merged precipitation dataset with a temporal resolution of daily or finer.
- The runoff time-series data come from publicly available watershed hydrological datasets.
Because the target watershed is data-scarce, its runoff time-series data are treated as missing or are used only for later verification.
[0009] Step S1.2: Preprocess and normalize the multi-source data obtained in step S1.1.
- Missing values in the geographic attribute data are filled by multiple imputation.
- Rainfall and runoff time series are uniformly resampled to a daily scale by linear interpolation.
- Watershed samples with more than 20% missing runoff observations are removed.

Each preprocessed feature dimension is then made dimensionless. This removes the scale bias caused by differences in units and numerical ranges, so that every feature carries equal representational weight in the subsequent joint feature space.
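The series-level part of step S1.2 can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation:
- each runoff series is already on a daily grid, with gaps stored as NaN;
- z-score standardization is the dimensionless transform;
- the function names are hypothetical;
- multiple imputation of the geographic attributes is not shown.

Only the 20% missing-data limit comes from the text.

```python
import numpy as np

# Limit stated in step S1.2: drop watersheds with more than 20% missing runoff.
MAX_MISSING_FRACTION = 0.20


def missing_fraction(series):
    """Fraction of NaN entries in a daily series."""
    return float(np.isnan(series).mean())


def fill_linear(series):
    """Fill NaN gaps by linear interpolation over the day index."""
    idx = np.arange(series.size)
    valid = ~np.isnan(series)
    # np.interp holds the first/last valid value constant beyond the record ends
    return np.interp(idx, idx[valid], series[valid])


def zscore_columns(features):
    """Make each feature column dimensionless: zero mean, unit variance."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (features - mean) / std


def preprocess_runoff(runoff_by_basin):
    """Drop basins over the missing-data limit and gap-fill the rest."""
    kept = {}
    for basin_id, q in runoff_by_basin.items():
        if missing_fraction(q) > MAX_MISSING_FRACTION:
            continue
        kept[basin_id] = fill_linear(q)
    return kept
```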
[0010] Step S1.3: Construct the geographic attribute feature vector $\mathbf{x}_g$. From the dimensionless geographic attribute data, extract $d_g$ attribute variables closely related to the watershed hydrological response to form $\mathbf{x}_g \in \mathbb{R}^{d_g}$, where $\mathbb{R}$ denotes the real number space. The geographic attribute feature vector includes, but is not limited to, five categories of sub-features:
- topographic features, such as average elevation and watershed area;
- climate features, such as mean annual precipitation and the precipitation seasonality index;
- soil features, such as soil thickness and saturated hydraulic conductivity;
- geological features, such as bedrock type;
- land use features, such as forest cover and impervious surface ratio.
[0011] Step S1.4: Construct the rainfall-driven feature vector $\mathbf{x}_p$. From the dimensionless rainfall time-series data, extract $d_p$ feature variables characterizing the dynamics of the rainfall process to form $\mathbf{x}_p \in \mathbb{R}^{d_p}$. The rainfall-driven feature vector includes, but is not limited to, four sub-features:
- total rainfall features, such as mean annual rainfall;
- seasonal features, such as the rainfall concentration index;
- extreme rainfall features, such as maximum daily rainfall and total extreme rainfall;
- rainfall event features, such as the mean intensity of rainfall events.
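A sketch of how some rainfall-driven sub-features in step S1.4 might be computed follows. The definitions are assumptions, not the patent's:
- the "rainfall concentration index" is taken to be the precipitation concentration index (PCI) on mean monthly totals;
- "total extreme rainfall" is the annual total on days above the 95th percentile of wet days;
- a wet day has at least 1 mm;
- a rainfall event is a run of consecutive wet days.

The sketch works on raw daily depths in mm, and the function names are hypothetical.

```python
import numpy as np


def mean_event_intensity(daily_p, wet_threshold=1.0):
    """Mean of (event depth / event duration) over runs of consecutive wet days."""
    intensities, depth, length = [], 0.0, 0
    for p in daily_p:
        if p >= wet_threshold:
            depth += p
            length += 1
        elif length:
            intensities.append(depth / length)
            depth, length = 0.0, 0
    if length:  # close an event that runs to the end of the record
        intensities.append(depth / length)
    return float(np.mean(intensities)) if intensities else 0.0


def rainfall_features(daily_p, month, n_years, wet_threshold=1.0):
    """Illustrative rainfall-driven features from a gap-free daily series (mm).

    month: month number (1-12) of each day; n_years: record length in years.
    """
    annual_mean = np.sum(daily_p) / n_years
    # PCI on mean monthly totals: 100 * sum(P_m^2) / (sum P_m)^2,
    # about 8.3 for perfectly uniform rainfall, 100 if all rain falls in one month
    monthly = np.array([daily_p[month == m].sum() for m in range(1, 13)]) / n_years
    pci = 100.0 * np.sum(monthly ** 2) / np.sum(monthly) ** 2
    wet = daily_p[daily_p >= wet_threshold]
    if wet.size:
        p95 = np.percentile(wet, 95)
        extreme_total = daily_p[daily_p > p95].sum() / n_years
    else:
        extreme_total = 0.0
    return {
        "annual_mean": float(annual_mean),
        "pci": float(pci),
        "max_daily": float(np.max(daily_p)),
        "extreme_total": float(extreme_total),
        "mean_event_intensity": mean_event_intensity(daily_p, wet_threshold),
    }
```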
[0012] Step S1.5: Construct the runoff response feature vector $\mathbf{x}_q$. From the dimensionless runoff time-series data, extract $d_q$ feature variables characterizing the essential hydrological response behavior of the watershed to form $\mathbf{x}_q \in \mathbb{R}^{d_q}$. The runoff response feature vector includes, but is not limited to, five categories of sub-features:
- water balance indicators, such as the runoff coefficient;
- flow duration curve indicators, such as the Q95 flow;
- recession indicators, such as the recession constant;
- flood indicators, such as flood frequency and peak timing;
- seasonal indicators, such as the runoff seasonality index.
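Three of the runoff response signatures named in step S1.5 can be sketched as follows:
- The runoff coefficient is total runoff over total rainfall, with both in the same depth units.
- Q95 is the flow exceeded 95% of the time, that is, the 5th percentile of daily flow.
- The recession constant is estimated here as the median day-to-day flow ratio on sustained falling limbs. This estimator, its `min_steps` parameter, and all function names are hypothetical simplifications, not the patent's definition.

```python
import numpy as np


def runoff_coefficient(q, p):
    """Long-term runoff ratio: total runoff depth / total rainfall depth."""
    return float(np.sum(q) / np.sum(p))


def q95(q):
    """Flow exceeded 95% of the time, i.e. the 5th percentile of daily flow."""
    return float(np.percentile(q, 5))


def recession_constant(q, min_steps=3):
    """Median day-to-day ratio Q[t+1]/Q[t] on recession limbs.

    Only runs containing at least `min_steps` consecutive decreases are kept,
    so short dips are ignored (a simplified, hypothetical estimator).
    """
    ratios, run = [], []
    for t in range(len(q) - 1):
        if q[t + 1] < q[t] and q[t] > 0:
            run.append(q[t + 1] / q[t])
        else:
            if len(run) >= min_steps:
                ratios.extend(run)
            run = []
    if len(run) >= min_steps:  # keep a falling run that reaches the end of the record
        ratios.extend(run)
    return float(np.median(ratios)) if ratios else float("nan")
```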
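[0013] Step S1.6: Concatenate three vectors to obtain the three-source feature vector $\mathbf{x}$ of each source watershed:
- the geographic attribute feature vector $\mathbf{x}_g$;
- the rainfall-driven feature vector $\mathbf{x}_p$;
- the runoff response feature vector $\mathbf{x}_q$.

$$\mathbf{x} = \left[\mathbf{x}_g^{\top}, \mathbf{x}_p^{\top}, \mathbf{x}_q^{\top}\right]^{\top} \in \mathbb{R}^{d_g + d_p + d_q} \tag{1}$$

The target watershed lacks runoff observations, so its runoff response feature vector $\mathbf{x}_q^{t}$ is treated as unknown for now. Only its geographic attribute feature vector $\mathbf{x}_g^{t}$ and rainfall-driven feature vector $\mathbf{x}_p^{t}$ are constructed.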
[0014] Step S2: Fuse the three-source feature vectors of the source watersheds and reduce their dimensionality. Then perform cluster analysis in the reduced joint feature space to obtain multiple watershed categories, together with each category's cluster center and within-cluster distance distribution. This builds a prior knowledge base of model parameters. Specifically:

Step S2.1: Make the source-watershed three-source feature vectors $\mathbf{x}$ from step S1.6 dimensionless, using the same strategy as in step S1.2. This removes any scale differences that remain between dimensions after concatenation and yields the feature matrix $\mathbf{X}$.
[0015] Step S2.2: Reduce the dimensionality of the feature matrix $\mathbf{X}$ by principal component analysis:
1. Compute the eigenvalues and eigenvectors of the covariance matrix.
2. Choose the number of principal components $m$ from the cumulative variance contribution rate, so that the first $m$ principal components together reach at least a preset threshold.

The reduced joint feature space is denoted $\mathbf{Z} \in \mathbb{R}^{N \times m}$, where $N$ is the total number of source basins.
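The component selection in step S2.2 can be illustrated with the NumPy sketch below. It assumes the feature matrix is already dimensionless and uses 0.9 as an example threshold; the text only specifies "a preset threshold". The same loadings also give an approximate inverse map from the reduced space back to the original feature space, which step S5.2 needs for the cluster centers. Function names are hypothetical.

```python
import numpy as np


def pca_by_cumulative_variance(X, threshold=0.9):
    """PCA via eigen-decomposition of the covariance matrix of X (N x d).

    Keeps the smallest m whose cumulative variance contribution >= threshold.
    Returns scores Z (N x m), loadings V (d x m), column mean, and m.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    m = min(int(np.searchsorted(cumulative, threshold)) + 1, X.shape[1])
    V = eigvecs[:, :m]
    return Xc @ V, V, mean, m


def to_original_space(mu, V, mean):
    """Approximate inverse map of a reduced point (e.g. a cluster center): V mu + mean."""
    return V @ mu + mean
```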
[0016] Step S2.3: Perform cluster analysis in the reduced joint feature space $\mathbf{Z}$, using a clustering algorithm with Euclidean distance as the similarity measure. To choose the optimal number of clusters $K$, jointly examine several clustering validity indices and select the value that best balances intra-cluster compactness and inter-cluster separation. Clustering yields $K$ watershed categories, denoted $C_k$ ($k = 1, 2, \dots, K$).
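Step S2.3 leaves the clustering algorithm open. The sketch below makes these assumptions:
- the algorithm is plain k-means (Lloyd's algorithm) with Euclidean distance;
- the mean silhouette coefficient alone compares candidate values of K, as a simplified stand-in for the combination of validity indices described above;
- all function names are hypothetical.

```python
import numpy as np


def kmeans(Z, k, n_iter=100, seed=0):
    """Lloyd's k-means with Euclidean distance; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # squared Euclidean distance of each sample to each center, shape (N, k)
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # final assignment against the final centers
    labels = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return labels, centers


def silhouette(Z, labels):
    """Mean silhouette coefficient: compactness vs. separation in one number."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return 0.0
    D = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2))
    s = np.zeros(len(Z))
    for i in range(len(Z)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton clusters get s = 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster, excluding self
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return float(s.mean())


def choose_k(Z, k_values, seed=0):
    """Pick the candidate K with the highest mean silhouette (single-index simplification)."""
    scores = {k: silhouette(Z, kmeans(Z, k, seed=seed)[0]) for k in k_values}
    return max(scores, key=scores.get)
```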
[0017] Step S2.4: For each category $C_k$, compute its cluster center $\boldsymbol{\mu}_k$ and the distribution of Euclidean distances from the within-class samples to that center. From these within-class distances, compute three percentiles, where $p_1 < p_2 < p_3$:
- the $p_1$-th percentile $\tau_k^{s}$, used as the strict threshold;
- the $p_2$-th percentile $\tau_k^{b}$, used as the baseline threshold;
- the $p_3$-th percentile $\tau_k^{l}$, used as the lenient threshold.

These thresholds are used in the subsequent similarity determination.
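The per-category thresholds of step S2.4 follow directly from the within-cluster distances. In this sketch the percentile levels 50, 75, and 90 are hypothetical placeholders, since the text does not give the values. The function name is also hypothetical. The sketch only keeps the ordering strict < baseline < lenient.

```python
import numpy as np

# Hypothetical percentile levels; the method only requires strict < baseline < lenient.
P_STRICT, P_BASE, P_LENIENT = 50, 75, 90


def cluster_thresholds(Z, labels, centers):
    """Per-cluster strict/baseline/lenient thresholds from within-cluster distances."""
    thresholds = {}
    for k, center in enumerate(centers):
        members = Z[labels == k]
        dist = np.linalg.norm(members - center, axis=1)  # Euclidean distance to the center
        s, b, l = np.percentile(dist, [P_STRICT, P_BASE, P_LENIENT])
        thresholds[k] = {"strict": float(s), "baseline": float(b), "lenient": float(l)}
    return thresholds
```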
[0018] Step S2.5: For each source basin, collect the parameters of its trained hydrological model. The hydrological model is preferably a long short-term memory network model, employing a unified network architecture and independently trained using daily-scale rainfall, potential evapotranspiration, and runoff data for each watershed. After training, the model parameters are saved, and the simulation accuracy evaluation index for the validation period is calculated. Watersheds with validation accuracy below a preset threshold are marked as unusable source watersheds and are not included in the prior knowledge base.
[0019] Step S2.6: The prior knowledge base of model parameters consists of:
- the cluster centers $\boldsymbol{\mu}_k$ obtained in step S2.4;
- the strict, baseline, and lenient percentile thresholds $\tau_k^{s}$, $\tau_k^{b}$, $\tau_k^{l}$ obtained in step S2.4;
- the usable source watersheds and their model parameters collected in step S2.5.
[0020] Step S3: Use a feature space mapping method to map the feature vectors of the target and source watersheds into a unified feature space. This aligns the source and target feature spaces and yields projected, unified feature representations. Specifically:

Step S3.1: Determine the feature space mapping method. Principal component analysis is used, and the projection matrix is computed from the geographic attribute features and rainfall-driven features of the source watersheds. Concatenate each source watershed's geographic attribute feature vector $\mathbf{x}_g$ and rainfall-driven feature vector $\mathbf{x}_p$ to obtain its joint input feature vector $\mathbf{u} = \left[\mathbf{x}_g^{\top}, \mathbf{x}_p^{\top}\right]^{\top}$.
[0021] Step S3.2: Compute the projection matrix $\mathbf{W}$:
1. Compute the covariance matrix of the source watersheds' joint input feature matrix $\mathbf{U}$.
2. Perform eigenvalue decomposition to obtain its eigenvalues and eigenvectors.
3. Form $\mathbf{W} \in \mathbb{R}^{(d_g + d_p) \times r}$ from the eigenvectors of the $r$ largest eigenvalues. Choose $r$ so that the cumulative variance contribution rate of the projected features is not lower than a preset threshold.
[0022] Step S3.3: Project the joint input features of the source watersheds onto the unified feature space. The projected feature of source watershed $i$ is

$$\tilde{\mathbf{u}}_i = \mathbf{W}^{\top}\left(\mathbf{u}_i - \bar{\mathbf{u}}\right) \tag{2}$$

where $\tilde{\mathbf{u}}_i$ is the aligned representation of source watershed $i$ in the unified feature space, and $\bar{\mathbf{u}}$ is the mean vector of the source watersheds' joint input features.
[0023] Step S3.4: Project the joint input features of the target watershed onto the unified feature space. The target watershed's joint input feature $\mathbf{u}^{t}$ is projected with the same mean $\bar{\mathbf{u}}$ and projection matrix $\mathbf{W}$:

$$\tilde{\mathbf{u}}^{t} = \mathbf{W}^{\top}\left(\mathbf{u}^{t} - \bar{\mathbf{u}}\right) \tag{3}$$

where $\tilde{\mathbf{u}}^{t}$ is the aligned representation of the target watershed in the unified feature space. These projections give aligned representations $\tilde{\mathbf{u}}_i$ and $\tilde{\mathbf{u}}^{t}$ of the source and target watersheds in one space. This effectively eliminates the domain offset caused by differences in data acquisition methods and resolutions across watersheds.
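Equations (2) and (3) use a single mean and a single projection matrix, both fitted on the source watersheds only. The target watershed is never used to fit them. A minimal NumPy sketch under that reading follows. The number of retained components `r` is passed in directly instead of being chosen from the cumulative variance, and the names are hypothetical.

```python
import numpy as np


def fit_projection(U_src, r):
    """Return (mean, W): source mean and the top-r principal directions of U_src."""
    mean = U_src.mean(axis=0)
    cov = np.cov(U_src - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs[:, np.argsort(eigvals)[::-1][:r]]  # eigenvectors of the r largest eigenvalues
    return mean, W


def project(U, mean, W):
    """Eq. (2)/(3) applied row-wise: u_tilde = W^T (u - mean)."""
    return (U - mean) @ W
```

The target is projected with `project(u_t, mean, W)` using the source-fitted `mean` and `W`. This is what places both domains in the same coordinates.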
[0024] Step S4: Because the target watershed has no runoff observations, build a model on the source watersheds that maps geographic attribute features to runoff response features. Apply it to the target watershed's geographic attribute features to estimate the target's runoff response features. The result is a predicted runoff response feature vector and an uncertainty index. Specifically:

Step S4.1: Construct the supervised learning models. The source watersheds' geographic attribute features $\mathbf{x}_g$ are the input features, and each component of the runoff response features $\mathbf{x}_q$ is a separate prediction target. In total, $d_q$ supervised learning models are trained separately. The preferred model is an ensemble-learning-based regression model.
[0025] Step S4.2: Train the mapping models. Take the geographic attribute features $\mathbf{x}_g$ from step S1.3 as input and each component of the runoff response features $\mathbf{x}_q$ from step S1.5 as output. Train $d_q$ regression models separately to obtain a set of trained mapping models.

Evaluate model performance by cross-validation or on an independent validation set, computing the root mean square error and coefficient of determination for each component's predictions. Components whose prediction performance falls below a preset threshold receive lower weights in the subsequent similarity calculation.
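[0026] Step S4.3: Estimate the runoff response features of the target watershed. Feed the target watershed's geographic attribute features $\mathbf{x}_g^{t}$ into the mapping models trained in step S4.2 to obtain the predicted runoff response feature vector $\hat{\mathbf{x}}_q^{t}$:

$$\hat{\mathbf{x}}_q^{t} = \mathcal{F}\left(\mathbf{x}_g^{t}\right) = \left[f_1\left(\mathbf{x}_g^{t}\right), f_2\left(\mathbf{x}_g^{t}\right), \dots, f_{d_q}\left(\mathbf{x}_g^{t}\right)\right]^{\top} \tag{4}$$

where $\mathcal{F} = \{f_1, \dots, f_{d_q}\}$ is the set of $d_q$ regression mapping models. Each model corresponds to one component of the predicted runoff response features.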
[0027] Step S4.4: Estimate the prediction uncertainty. From the distribution of prediction residuals produced during regression training, compute the variance of each component's predicted value. This gives the confidence interval width $\Delta_j$ of the $j$-th predicted component. The confidence interval width ratio is defined as

$$\rho_j = \frac{\Delta_j}{\sigma_j} \tag{5}$$

where $\sigma_j$ is the population standard deviation of the $j$-th runoff response feature component across the source basins.
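Steps S4.2 to S4.4 can be sketched as follows. Three substitutions and assumptions apply:
- To keep the example dependency-free, ordinary least squares with an intercept (NumPy `lstsq`) stands in for the ensemble regressor the text prefers.
- The residual standard deviation serves as the predictive standard deviation, which ignores parameter uncertainty.
- The confidence interval is a 95% normal interval with width 2 * 1.96 * residual std.

Function names are hypothetical.

```python
import numpy as np

Z_95 = 1.96  # two-sided 95% normal quantile (assumed confidence level)


def fit_component_models(Xg, Xq):
    """One linear model per column of Xq, sharing the design matrix [1, Xg].

    Returns coefficients (1 + d_g, d_q) and the residual std of each component.
    """
    A = np.hstack([np.ones((Xg.shape[0], 1)), Xg])
    coefs, *_ = np.linalg.lstsq(A, Xq, rcond=None)
    resid = Xq - A @ coefs
    resid_std = resid.std(axis=0, ddof=A.shape[1])  # ddof = number of fitted coefficients
    return coefs, resid_std


def predict_runoff_features(coefs, xg_target):
    """Eq. (4) sketch: predicted runoff-response vector for one target watershed."""
    return np.concatenate([[1.0], xg_target]) @ coefs


def ci_width_ratio(resid_std, Xq):
    """Eq. (5) sketch: interval width 2 * z * residual std over the population std."""
    width = 2.0 * Z_95 * resid_std
    sigma = Xq.std(axis=0)  # ddof=0: population std over the source watersheds
    return width / sigma
```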
[0028] Step S5: Construct the joint feature vector of the target watershed. Using a weighted distance metric, compute its feature similarity to each cluster center, which gives the similarity level between the target watershed and each source watershed category. Specifically:

Step S5.1: Construct the joint feature vector of the target watershed by concatenating its geographic attribute features $\mathbf{x}_g^{t}$, its rainfall-driven features $\mathbf{x}_p^{t}$, and the runoff response features $\hat{\mathbf{x}}_q^{t}$ estimated in step S4.3:

$$\mathbf{x}^{t} = \left[\left(\mathbf{x}_g^{t}\right)^{\top}, \left(\mathbf{x}_p^{t}\right)^{\top}, \left(\hat{\mathbf{x}}_q^{t}\right)^{\top}\right]^{\top} \tag{6}$$

Step S5.2: Align the features of the source watershed cluster centers. Map each category's cluster center $\boldsymbol{\mu}_k$ from step S2.4 back from the reduced joint feature space to the original feature space. This gives the cluster center $\mathbf{c}_k$ in the original feature space.
[0029] Step S5.3: Define the weighted distance metric. The weighted Euclidean distance between the target watershed and the $k$-th cluster center is

$$D_k = \sqrt{\sum_{j=1}^{d} w_j \left(x_j^{t} - c_{k,j}\right)^2} \tag{7}$$

where:
- $x_j^{t}$ is the $j$-th component of the target joint feature vector;
- $c_{k,j}$ is the $j$-th component of the cluster center $\mathbf{c}_k$ in the original feature space;
- $w_j$ is the weight coefficient of the $j$-th feature dimension;
- $d = d_g + d_p + d_q$.

The weights are assigned by feature source in three groups: the geographic attribute group has weight $\alpha$, the rainfall-driven group has weight $\beta$, and the runoff response group has weight $\gamma$, with $\alpha + \beta + \gamma = 1$. Within each group, the weight is distributed evenly across all dimensions.
[0030] Step S5.4: Dynamically adjust the weights according to the data available for the target watershed:
- when runoff observations are available, increase the weight $\gamma$ of the runoff response feature group;
- when runoff observations are unavailable, increase the weight $\alpha$ of the geographic attribute feature group and the weight $\beta$ of the rainfall-driven feature group.
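The weighted Euclidean distance of Eq. (7), with each group weight split evenly over its dimensions, is sketched below together with one possible form of the step S5.4 adjustment. The adjustment rule is an assumption. It shifts a fixed amount of weight (`shift`) between groups so that the three group weights keep summing to 1. The text only states which groups gain weight. Function names are hypothetical.

```python
import numpy as np


def group_weights(d_g, d_p, d_q, alpha, beta, gamma):
    """Per-dimension weights: each group's weight split evenly over its dimensions."""
    if not np.isclose(alpha + beta + gamma, 1.0):
        raise ValueError("group weights must sum to 1")
    return np.concatenate([
        np.full(d_g, alpha / d_g),
        np.full(d_p, beta / d_p),
        np.full(d_q, gamma / d_q),
    ])


def weighted_distance(x_t, c_k, w):
    """Eq. (7): D_k = sqrt(sum_j w_j (x_j - c_kj)^2)."""
    return float(np.sqrt(np.sum(w * (x_t - c_k) ** 2)))


def adjust_weights(alpha, beta, gamma, runoff_observed, shift=0.1):
    """Hypothetical step S5.4 rule: move `shift` of weight toward the favored groups."""
    if runoff_observed:
        take = min(shift, alpha + beta)
        if take > 0:
            share = alpha / (alpha + beta)  # take from alpha and beta in proportion
            alpha, beta = alpha - take * share, beta - take * (1 - share)
        gamma += take
    else:
        take = min(shift, gamma)
        gamma -= take
        alpha, beta = alpha + take / 2, beta + take / 2
    return alpha, beta, gamma
```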
[0031] Step S5.5: Determine the similarity level. Feature similarity is divided into multiple levels using the weighted distance $D_k$ from step S5.3 and each category's percentile thresholds from step S2.4 (strict $\tau_k^{s}$, baseline $\tau_k^{b}$, lenient $\tau_k^{l}$). The rules are as follows:
- First level (high similarity): $D_k \le \tau_k^{s}$, and the target watershed's weighted distances in both the geographic attribute subspace and the rainfall-driven subspace do not exceed the corresponding percentile distance of that subspace.
- Second level (medium-high similarity): $D_k \le \tau_k^{b}$, and the subspace percentile distance constraint is satisfied in at least two subspaces.
- Third level (medium similarity): $D_k \le \tau_k^{l}$, and the subspace percentile distance constraint is satisfied in at least one subspace.
- Fourth level (low similarity): $D_k > \tau_k^{l}$, or the distance condition is satisfied but the subspace matching condition is not.
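A sketch of the four-level rule in step S5.5 follows. The extraction dropped which percentile the subspace checks use, so the sketch assumes:
- each subspace is compared against one precomputed percentile distance (`sub_thr`);
- the full-distance comparisons are "not greater than" the strict, baseline, and lenient thresholds;
- the levels are tested in order from high to low.

The subspace keys `geo`, `rain`, and `runoff` are hypothetical.

```python
def similarity_level(D_k, thr, sub_dist, sub_thr):
    """Return 1 (high), 2 (medium-high), 3 (medium) or 4 (low).

    D_k      : full weighted distance to the category center
    thr      : {"strict", "baseline", "lenient"} thresholds of the category
    sub_dist : {"geo", "rain", "runoff"} -> weighted distance in that subspace
    sub_thr  : same keys -> percentile distance used for the subspace check
    """
    ok = {name: sub_dist[name] <= sub_thr[name] for name in sub_dist}
    n_ok = sum(ok.values())  # number of subspaces that pass their check
    if D_k <= thr["strict"] and ok.get("geo", False) and ok.get("rain", False):
        return 1
    if D_k <= thr["baseline"] and n_ok >= 2:
        return 2
    if D_k <= thr["lenient"] and n_ok >= 1:
        return 3
    return 4  # too far, or close enough but no subspace matches
```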
[0032] Step S6: Based on the feature similarity obtained in step S5, select the hydrological model parameters of the corresponding category from the prior knowledge base built in step S2. Apply them to the hydrological simulation of the target watershed to obtain its simulated flow hydrograph. Specifically:

Step S6.1: Determine the most similar category. Compute the weighted distances between the target watershed and all $K$ cluster centers, and select the category with the smallest weighted distance as the most similar category $C_{k^{*}}$:

$$k^{*} = \arg\min_{k \in \{1, \dots, K\}} D_k \tag{8}$$

Step S6.2: Obtain the similarity level. Using the rules of step S5.5, determine the similarity level $L^{*}$ between the target watershed and the most similar category $C_{k^{*}}$.
[0033] Step S6.3: Apply a differentiated parameter transfer strategy according to the similarity level. If the similarity level is high, use either a direct transfer strategy or an ensemble transfer strategy:
- Direct transfer: within the most similar category, take the source watershed with the smallest weighted distance to the target and apply its model parameters to the target watershed.
- Ensemble transfer: within the most similar category, fuse the model parameters of the several most similar source watersheds by weighted averaging, with weights inversely proportional to the weighted distance.
[0034] If the similarity level is medium-high, use a parameter fine-tuning transfer strategy. Within the most similar category, the model parameters of the source watershed with the smallest weighted distance to the target serve as initial parameters. Fine-tuning training is then performed with local data from the target watershed.
[0035] If the similarity level is medium, use a constraint-corrected transfer strategy. The model parameters of multiple source watersheds in the most similar category are fused by weighted averaging, and a parameter deviation constraint term is introduced to control transfer risk.
[0036] If the similarity level is low, no model transfer is performed. The target watershed is marked as an unmatched category, and a prompt indicates that local observation data must be supplemented.
[0037] The above strategies are used to obtain hydrological model parameters applicable to the target watershed.
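Category selection (Eq. (8)), the level-to-strategy mapping of step S6.3, and the inverse-distance fusion used by the ensemble and constraint-corrected strategies might look as follows. Two assumptions apply. First, model parameters are flattened into equal-length vectors. Second, the parameter deviation constraint is written as a quadratic (L2) penalty, since the text does not give its form. All names are hypothetical.

```python
import numpy as np

STRATEGY_BY_LEVEL = {
    1: "direct_or_ensemble",   # high similarity
    2: "fine_tune",            # medium-high similarity
    3: "constrained_fusion",   # medium similarity
}


def most_similar_category(distances):
    """Eq. (8): index k* of the smallest weighted distance."""
    return int(np.argmin(distances))


def choose_strategy(level):
    """Low similarity (or any unknown level) means no transfer."""
    return STRATEGY_BY_LEVEL.get(level, "no_transfer")


def inverse_distance_fusion(params, dists, eps=1e-12):
    """Fuse parameter vectors (n_sources, n_params) with weights proportional to 1/distance."""
    w = 1.0 / (np.asarray(dists, dtype=float) + eps)
    w /= w.sum()
    return w @ np.asarray(params, dtype=float)


def deviation_penalty(theta, theta_ref, lam=1e-3):
    """Hypothetical L2 term penalizing departure from the reference parameters."""
    diff = np.asarray(theta, dtype=float) - np.asarray(theta_ref, dtype=float)
    return float(lam * np.sum(diff ** 2))
```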
[0038] Step S6.4: Perform the hydrological simulation. Apply the hydrological model parameters obtained in step S6.3 to the target watershed. Use the target watershed's rainfall and potential evapotranspiration series as input, run the forward computation, and output the simulated flow hydrograph of the target watershed.
[0039] Step S7: Update the prior knowledge base of model parameters using actual observations or validation samples from the target watershed. This yields an optimized weight configuration and an updated knowledge base. Specifically:

Step S7.1: Collect validation samples. After the transfer simulation of the target watershed is complete, any new runoff observations are used as validation samples to evaluate the simulation performance of the transferred model. The performance indicators are the Nash-Sutcliffe efficiency coefficient, the root mean square error, and the relative error of peak flow.
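The three indicators in step S7.1 are sketched below. The Nash-Sutcliffe efficiency (NSE) and the RMSE follow their standard definitions. For the peak error, this sketch assumes a signed relative error that compares the simulated and observed maxima regardless of timing; the text does not fix the convention.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of obs from its mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def rmse(obs, sim):
    """Root mean square error between observed and simulated flow."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))


def peak_relative_error(obs, sim):
    """Signed relative error of the simulated maximum vs. the observed maximum."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float((sim.max() - obs.max()) / obs.max())
```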
[0040] Step S7.2: Construct a similarity-performance relationship model.
1. Treat the current transfer as a transfer case and pool it with the historical transfer cases.
2. For each case, record its similarity distance and the corresponding simulation performance indicators.
3. Across all cases, establish the mapping between similarity distance and performance indicators.

The resulting similarity-performance relationship model is used for subsequent weight adjustment and threshold optimization.
[0041] Step S7.3: Update the similarity thresholds. Add the target watershed's validation sample to the sample set of the most similar category. Recompute that category's within-class distance distribution to obtain updated strict, baseline, and lenient percentile thresholds. If the relative change between an updated percentile and the original percentile exceeds a preset tolerance, a threshold update of the knowledge base is triggered.
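The trigger check in step S7.3 can be sketched as follows. The percentile levels (50, 75, 90) and the 10% tolerance are hypothetical placeholders for the preset values. For simplicity, the sketch keeps the cluster center fixed when the new sample's distance is added. The function name is also hypothetical.

```python
import numpy as np

PERCENTILES = (50, 75, 90)  # hypothetical strict / baseline / lenient levels
TOLERANCE = 0.10            # hypothetical relative-change tolerance


def refresh_thresholds(within_dists, new_dist, old_thresholds):
    """Recompute category percentiles with one new member and flag large changes.

    Returns (new_thresholds, update_triggered).
    """
    dists = np.append(np.asarray(within_dists, dtype=float), new_dist)
    new = np.percentile(dists, PERCENTILES)
    old = np.asarray(old_thresholds, dtype=float)
    rel_change = np.abs(new - old) / np.abs(old)
    return new, bool(np.any(rel_change > TOLERANCE))
```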
[0042] Step S7.4: Update the prior knowledge base. Based on the evaluation results of step S7.1, determine the transfer effect: when the simulation performance indicators meet the preset standards, the transfer effect is considered good, and the watershed characteristics, cluster affiliation, and transferred model parameters of the target watershed are included in the prior knowledge base as a new source watershed. When the simulation performance indicators do not meet the preset standards, the transfer effect is considered poor, and the watershed is not included in the knowledge base, but it is recorded as an anomaly case for analyzing the shortcomings of the existing similarity judgment system. When the anomaly cases accumulate to a preset scale, the cluster structure or feature space is re-optimized.
[0043] Step S7.5: Optimize the weight configuration. Based on the similarity-performance relationship model from step S7.2, use an adaptive optimization method to adjust the group weights of step S5.4. These are the geographic attribute weight $\alpha$, the rainfall-driven weight $\beta$, and the runoff response weight $\gamma$. The goal is to maximize the correlation between weighted distance and transfer performance.
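The text does not name the adaptive optimization method, so the sketch below substitutes a plain grid search over the weight simplex ($\alpha + \beta + \gamma = 1$, step 0.1). A smaller distance should go with better performance. Under that assumption, the sketch maximizes the negative Pearson correlation between case distances and a performance score such as NSE. The function names and data layout are hypothetical.

```python
import itertools

import numpy as np


def case_distances(X_t, C, dims, weights):
    """Weighted distance per past transfer case.

    X_t, C : (n_cases, d) target features and matched cluster centers
    dims   : (d_g, d_p, d_q); weights: (alpha, beta, gamma)
    """
    w = np.concatenate([np.full(n, g / n) for n, g in zip(dims, weights)])
    return np.sqrt(((X_t - C) ** 2 * w).sum(axis=1))


def optimize_weights(X_t, C, dims, perf, step=0.1):
    """Grid search on alpha + beta + gamma = 1 maximizing -corr(distance, performance)."""
    best, best_score = None, -np.inf
    grid = np.round(np.arange(0.0, 1.0 + 1e-9, step), 10)
    for a, b in itertools.product(grid, grid):
        g = 1.0 - a - b
        if g < -1e-9:
            continue  # outside the simplex
        g = max(g, 0.0)  # clip tiny negative rounding error
        d = case_distances(X_t, C, dims, (a, b, g))
        if np.std(d) == 0:
            continue  # correlation undefined for constant distances
        score = -np.corrcoef(d, perf)[0, 1]
        if score > best_score:
            best, best_score = (float(a), float(b), float(g)), float(score)
    return best, best_score
```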
[0044] Compared with the prior art, the present invention has the following beneficial effects: (1) The present invention constructs a three-source feature space that integrates geographical attribute features, rainfall driving features and runoff response features through step S1. It comprehensively describes the hydrological similarity of the watershed from three dimensions: watershed physical attributes, meteorological input process and hydrological response behavior. It overcomes the limitation of existing methods that rely only on static geographical attributes for similarity measurement and provides a more comprehensive and essential matching basis for parameter migration in watersheds with insufficient data.
[0045] (2) The present invention projects the features of the source and target watersheds onto a unified standardized space through feature space mapping and collaborative alignment processing in step S3, effectively eliminating the domain offset caused by the difference in cross-watershed data acquisition methods and spatial resolution, and significantly improving the accuracy and reliability of feature similarity calculation.
[0046] (3) This invention moves from the coarse binary transfer of conventional methods to refined hierarchical transfer through three mechanisms:
- the weighted distance metric and hierarchical similarity determination in step S5;
- the differentiated transfer strategies in step S6;
- the dynamic, adaptive iterative update mechanism in step S7.

It also establishes a closed-loop, self-evolving knowledge base update system. This significantly enhances the model's ability to adapt continuously to new watershed types and complex underlying surface conditions.
[0047] In summary, this invention achieves refined transfer and continuous optimization of hydrological model parameters from source basins to data-deficient target basins without requiring runoff observations from the target basin. It does so by deeply integrating:
- three-source feature space construction;
- feature space mapping and collaborative alignment;
- hierarchical similarity determination with differentiated transfer strategies;
- a closed-loop adaptive knowledge base update mechanism.

This significantly improves the reliability and generalization ability of hydrological simulation in data-deficient basins.
Attached Figure Description
[0048] Figure 1 is a flowchart of the three-source feature similarity transfer hydrological simulation method for data-scarce watersheds according to an embodiment of the invention.
[0049] Figure 2 shows the spatial distribution of the target watersheds in an embodiment of the invention. Figure 2(a) shows target watershed A, Figure 2(b) target watershed B, Figure 2(c) target watershed C, and Figure 2(d) target watershed D.
[0050] Figure 3 shows, for an embodiment of the invention, the dimensionality-reduced clustering distribution of each target watershed and its source watershed category in the joint feature space. Figures 3(a), 3(b), 3(c), and 3(d) correspond to target watersheds A, B, C, and D, respectively.
[0051] Figures 4, 5, 6 and 7 show the hydrological simulation results based on similarity transfer learning for target watersheds A, B, C and D, respectively.
Detailed Implementation
[0052] To make the objectives, technical solutions, and beneficial effects of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. The specific embodiments described herein are for illustrative purposes only and are not intended to limit the scope of protection of this invention.
[0053] This embodiment provides a computer-implemented three-source feature similarity migration hydrological simulation method for data-deficient watersheds. As shown in Figure 1, the method includes steps S1 to S7. The method can be executed by a computer system; by processing multi-source data from the source and target watersheds, it achieves cross-watershed migration and adaptive updating of hydrological model parameters.
[0054] Step S1: Obtain multi-source data from the source and target watersheds, preprocess and normalize the data, and construct a three-source feature vector comprising geographic attribute features, rainfall-driven features, and runoff response features. Specifically:
Step S1.1: Obtain multi-source data for the source and target watersheds. In this embodiment, the source watershed dataset integrates multiple publicly available hydrological datasets, preferably the CAMELS series: CAMELS-AUS (Australia, 561 watersheds), CAMELS-BR (Brazil, 897 watersheds), CAMELS-CH (Switzerland, 331 watersheds), CAMELS-CL (Chile, 516 watersheds), CAMELS-DE (Germany, 1582 watersheds), CAMELS-DK (Denmark, 304 watersheds), CAMELS-FR (France, 654 watersheds), CAMELS-GB (United Kingdom, 671 watersheds), CAMELS-IND (India, 242 watersheds), CAMELS-NZ (New Zealand, 369 watersheds), CAMELS-SE (Sweden, 50 watersheds), and CAMELS-US (United States, 671 watersheds), totaling 6848 source watersheds. These watersheds span climate types from cold temperate to tropical and from humid to arid, and encompass multiple runoff generation mechanisms, including rainfall, snowmelt, and glacier melt, which makes them highly representative.
[0055] Furthermore, the rainfall data preferably use the Multi-Source Weighted-Ensemble Precipitation version 2.8 (MSWEP v2.8) dataset, which has a spatial resolution of 0.1°, a temporal resolution of 3 hours, and coverage from 1979 to the present. MSWEP fuses rain gauge observations, satellite retrievals, and reanalysis data to improve the spatiotemporal continuity and completeness of the rainfall information.
[0056] Furthermore, the HydroATLAS database is preferred for geographic attribute data. This database provides global-scale hydrological and environmental attribute information, including multiple categories of attribute variables such as hydrology, geomorphology, climate, land cover and use, soil, geology, and human activity impacts.
[0057] Furthermore, four data-deficient watersheds within the Songliao River Basin in China were selected as the target watersheds, denoted Watershed A, Watershed B, Watershed C, and Watershed D, as shown in Figure 2. All of them lack runoff observations or have only short observation records, which is typical of data-deficient watersheds. The geographic attribute features of the target watersheds were extracted from the HydroATLAS database and their rainfall-driven features from the MSWEP dataset, while their runoff response features were treated as missing or used only for subsequent model validation.
[0058] Step S1.2: Preprocess the multi-source data obtained in step S1.1 and normalize the features. Missing values in the geographic attribute data are filled by multiple imputation. Rainfall and runoff time series are uniformly resampled to the daily scale by linear interpolation, and watershed samples with more than 20% missing runoff observations are removed. Each feature dimension is then made dimensionless. In this embodiment, Z-score normalization is used so that each feature has a mean of 0 and a variance of 1; this eliminates the scale bias introduced by differing units and numerical ranges and gives each feature equal representational weight in the subsequent joint feature space.
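As an illustration of the runoff-series screening and normalization in step S1.2, the following Python sketch drops basins whose missing ratio exceeds 20%, fills the remaining gaps by linear interpolation, and applies column-wise Z-score normalization. It assumes the series are already on a daily grid with gaps marked as NaN; the function names are hypothetical, and the multiple imputation of geographic attributes is not shown.

```python
import numpy as np

def preprocess_runoff(series_by_basin, max_missing=0.20):
    """Drop basins with more than max_missing NaNs; linearly interpolate the rest."""
    kept = {}
    for basin_id, q in series_by_basin.items():
        q = np.asarray(q, dtype=float)
        missing = np.isnan(q)
        if missing.mean() > max_missing:
            continue  # more than 20% missing: remove this basin sample
        t = np.arange(q.size)
        filled = q.copy()
        # linear interpolation across gaps; np.interp holds edge values constant
        filled[missing] = np.interp(t[missing], t[~missing], q[~missing])
        kept[basin_id] = filled
    return kept

def zscore(features):
    """Column-wise Z-score: every feature ends up with mean 0 and variance 1."""
    features = np.asarray(features, dtype=float)
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant columns
    return (features - mu) / sigma
```

Using the population standard deviation matches the "variance of 1" wording; whether the original implementation uses population or sample variance is not stated.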
[0059] Step S1.3: Construct the geographic attribute feature vector. Fifty-six original geographic attribute variables closely related to the watershed hydrological response were extracted from the HydroATLAS database and, after the dimensionless processing of step S1.2, form the geographic attribute feature vector. They comprise 8 topographic features (mean elevation, elevation standard deviation, mean slope, watershed area, shape coefficient, etc.), 12 climate features (mean annual precipitation, precipitation seasonality index, P/PET ratio, etc.), 15 soil features (soil thickness, sand content, clay content, saturated hydraulic conductivity, etc.), 9 geological features (bedrock type, depth to impermeable layer, etc.), and 12 land use features (forest cover, grassland cover, farmland cover, proportion of impervious surface, etc.).
[0060] Step S1.4: Construct the rainfall-driven feature vector. Daily precipitation time series for 1981 to 2020 (40 years) were extracted for each watershed from the MSWEP dataset, and after dimensionless processing the rainfall-driven feature vector was calculated. It comprises 6 total-rainfall features (mean annual rainfall, standard deviation of annual rainfall, cumulative rainfall on wet days, etc.), 5 seasonal features (rainfall concentration index, rainy season onset, rainy season end, etc.), 7 extreme-rainfall features (maximum daily rainfall, rainfall intensity index, R95p extreme rainfall total, maximum length of consecutive wet days, etc.), and 6 rainfall-event features (mean intensity of rainfall events, distribution parameters of inter-event intervals, etc.).
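The features listed in step S1.4 are not given as formulas, so the sketch below computes a few common definitions under stated assumptions: a wet day is taken as at least 1 mm (a hypothetical threshold), the intensity index is the mean wet-day rainfall, and R95p is simplified to the mean annual total from days above the 95th percentile of wet-day rainfall in the same record. It is illustrative only and does not reproduce the 24 features of the embodiment.

```python
import numpy as np

def rainfall_features(daily_p, years, wet_threshold=1.0):
    """A few illustrative rainfall-driven features from a daily series (mm/day)."""
    daily_p = np.asarray(daily_p, dtype=float)
    years = np.asarray(years)
    # annual totals, one per calendar year present in the record
    annual = np.array([daily_p[years == y].sum() for y in np.unique(years)])
    wet = daily_p >= wet_threshold  # hypothetical wet-day threshold
    wet_p = daily_p[wet]
    p95 = np.percentile(wet_p, 95) if wet_p.size else np.inf
    # longest run of consecutive wet days
    longest = run = 0
    for is_wet in wet:
        run = run + 1 if is_wet else 0
        longest = max(longest, run)
    return {
        "annual_mean": annual.mean(),
        "annual_std": annual.std(),
        "max_daily": daily_p.max(),
        "intensity_index": wet_p.mean() if wet_p.size else 0.0,  # mean wet-day rainfall
        "r95p_annual": daily_p[daily_p > p95].sum() / annual.size,  # simplified R95p
        "max_wet_spell": longest,
    }
```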
[0061] Step S1.5: Construct the runoff response feature vector. Daily runoff time series for each source watershed were extracted from the CAMELS datasets, and the runoff response feature vector was calculated after dimensionless processing. It comprises 5 water balance indicators (baseline runoff coefficient, runoff coefficient variability, etc.), 7 flow duration curve indicators (Q5, Q50 and Q95 flows, slope characteristics, etc.), 6 drainage (recession) indicators (recession constant, coefficient of variation of recession rate, etc.), 8 flood indicators (flood frequency, rate of rise, time to peak, etc.), and 5 seasonality indicators (runoff centroid timing, runoff seasonality index, etc.). Together, steps S1.3 to S1.5 give 56 + 24 + 31 = 111 feature dimensions.
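Step S1.5 names flow duration curve indicators such as Q5, Q50 and Q95. Under the usual hydrological convention (an assumption here) that Qx is the flow exceeded x% of the time, these correspond to the (100 - x)th percentiles of the daily flow series. The sketch also computes a long-term runoff coefficient and a mid-segment (33% to 66% exceedance) log-space FDC slope, a common but here hypothetical choice of slope definition; the function names are illustrative.

```python
import numpy as np

def exceedance_flow(q, exceed_pct):
    """Flow exceeded exceed_pct percent of the time (Q5 is a high flow, Q95 a low flow)."""
    return float(np.percentile(q, 100 - exceed_pct))

def runoff_features(q_mm, p_mm):
    """Illustrative water balance and FDC indicators from daily runoff and rainfall (mm)."""
    q_mm = np.asarray(q_mm, dtype=float)
    p_mm = np.asarray(p_mm, dtype=float)
    q33, q66 = exceedance_flow(q_mm, 33), exceedance_flow(q_mm, 66)
    return {
        "runoff_coefficient": q_mm.sum() / p_mm.sum(),  # long-term runoff / rainfall
        "Q5": exceedance_flow(q_mm, 5),
        "Q50": exceedance_flow(q_mm, 50),
        "Q95": exceedance_flow(q_mm, 95),
        # mid-segment FDC slope in log space; assumes strictly positive flows
        "fdc_slope": (np.log(q33) - np.log(q66)) / (0.66 - 0.33),
    }
```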
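[0062] Step S1.6: Concatenate the geographic attribute feature vector $\mathbf{x}_G \in \mathbb{R}^{d_G}$, the rainfall-driven feature vector $\mathbf{x}_R \in \mathbb{R}^{d_R}$, and the runoff response feature vector $\mathbf{x}_Q \in \mathbb{R}^{d_Q}$ to obtain the three-source feature vector of each source watershed:
$$\mathbf{x} = \left[\mathbf{x}_G^{\top},\ \mathbf{x}_R^{\top},\ \mathbf{x}_Q^{\top}\right]^{\top} \in \mathbb{R}^{d_G + d_R + d_Q} \quad (1)$$
In this embodiment, $d_G = 56$, $d_R = 24$ and $d_Q = 31$. For target watersheds A, B, C and D, only the geographic attribute feature vector and the rainfall-driven feature vector are extracted; because runoff observations are unavailable, the runoff response feature vector is treated as missing.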
[0063] Step S2: Fuse the three-source feature vectors of the source watersheds and reduce their dimensionality, then perform cluster analysis in the dimensionality-reduced joint feature space to obtain multiple watershed categories, the cluster center of each category, and the intra-cluster distance distributions, thereby constructing a prior knowledge base of model parameters. Specifically:
Step S2.1: Apply dimensionless processing to the source-watershed three-source feature vectors constructed in step S1.6. This embodiment again uses Z-score normalization, with the same strategy as in step S1.2, to remove any residual scale differences between dimensions after concatenation, yielding the feature matrix $\mathbf{X} \in \mathbb{R}^{N \times 111}$, where $N$ is the number of source watersheds.
[0064] Step S2.2: Reduce the dimensionality of the feature matrix $\mathbf{X}$. Principal component analysis (PCA) is used: the eigenvalues and eigenvectors of the covariance matrix are computed, and the number of principal components is determined from the cumulative variance contribution rate. The first 10, 15 and 20 principal components account for approximately 82%, 87.5% and 91% of the cumulative variance, respectively. In this embodiment, the first 15 principal components are used as the clustering input features, and the dimensionality-reduced joint feature space is denoted $\mathbf{Z} \in \mathbb{R}^{N \times 15}$.
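The selection rule in step S2.2 (choose the smallest number of principal components whose cumulative variance contribution reaches a target) can be sketched with NumPy's symmetric eigendecomposition. The function name and return layout are hypothetical, and the threshold is a parameter; the embodiment's choice of 15 components corresponds to roughly 87.5% cumulative variance on its own data, which this toy sketch does not reproduce.

```python
import numpy as np

def pca_by_cumulative_variance(X, threshold):
    """Return scores, components, mean, chosen q and cumulative variance ratios."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric matrix: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum_ratio = np.cumsum(eigvals) / eigvals.sum()
    # smallest q whose cumulative contribution reaches the threshold
    q = int(np.searchsorted(cum_ratio, threshold) + 1)
    q = min(q, len(eigvals))
    components = eigvecs[:, :q]
    return Xc @ components, components, mean, q, cum_ratio
```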
[0065] Step S2.3: Perform cluster analysis in the dimensionality-reduced joint feature space $\mathbf{Z}$ using the K-means clustering algorithm with Euclidean distance as the similarity measure. To determine the optimal number of clusters $K$, the silhouette coefficient, the Calinski-Harabasz index and the Davies-Bouldin index were considered together. At $K = 12$ the silhouette coefficient reached its maximum of 0.42 and the other indicators also performed well, so the optimal number of clusters was set to 12. Clustering yields 12 watershed categories, denoted $C_k$ ($k = 1, 2, \dots, 12$).
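A minimal NumPy sketch of K-means with Euclidean distance and of the silhouette coefficient used in step S2.3 follows. The farthest-first initialization is an assumption (the patent does not specify initialization), and all function names are hypothetical. In practice the silhouette, Calinski-Harabasz and Davies-Bouldin indices would be compared across candidate values of K, as described above.

```python
import numpy as np

def farthest_first_init(Z, k, seed=0):
    """Start from a random point, then repeatedly add the point farthest from all chosen centers."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(Z)))]
    for _ in range(1, k):
        d = np.linalg.norm(Z[:, None, :] - Z[idx][None, :, :], axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    return Z[idx].copy()

def kmeans(Z, k, n_iter=100, seed=0):
    """Lloyd's algorithm with Euclidean distance."""
    Z = np.asarray(Z, dtype=float)
    centers = farthest_first_init(Z, k, seed)
    for _ in range(n_iter):
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def silhouette(Z, labels):
    """Mean silhouette coefficient (requires at least two clusters)."""
    Z = np.asarray(Z, dtype=float)
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    scores = []
    for i in range(len(Z)):
        same = labels == labels[i]
        if same.sum() < 2:
            scores.append(0.0)
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster, excluding self
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```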
[0066] Step S2.4: For each category $C_k$, compute the cluster center $\boldsymbol{\mu}_k$ and the distribution of Euclidean distances from the samples in the category to that center. From the intra-cluster distances of each category, compute the $p_1$-th, $p_2$-th and $p_3$-th percentiles $d_k^{(p_1)}$, $d_k^{(p_2)}$ and $d_k^{(p_3)}$ ($p_1 < p_2 < p_3$), which serve respectively as the strict, baseline and lenient thresholds in the subsequent similarity determination. In addition, each cluster center $\boldsymbol{\mu}_k$ is mapped back to the original 111-dimensional feature space by the inverse PCA transform, giving the cluster center $\mathbf{c}_k$ in the original feature space for use in subsequent similarity calculations.
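The per-category thresholds and the inverse PCA mapping of step S2.4 can be sketched as follows. The patent does not state which percentiles are used, so the default (50, 75, 90) is purely hypothetical, as are the function names. The inverse mapping assumes the PCA scores were computed as Z = (X - mean) V with orthonormal component columns V, so a center maps back as mu V^T + mean; this is approximate because the discarded components are lost.

```python
import numpy as np

def intra_cluster_thresholds(Z, labels, centers, percentiles=(50, 75, 90)):
    """Strict / baseline / lenient thresholds per category (default percentiles are placeholders)."""
    Z = np.asarray(Z, dtype=float)
    labels = np.asarray(labels)
    centers = np.asarray(centers, dtype=float)
    thresholds = {}
    for k in np.unique(labels):
        d = np.linalg.norm(Z[labels == k] - centers[k], axis=1)  # distances to own center
        strict, baseline, lenient = np.percentile(d, percentiles)
        thresholds[int(k)] = (float(strict), float(baseline), float(lenient))
    return thresholds

def centers_to_original_space(centers, components, mean):
    """Inverse PCA: c_k = mu_k V^T + mean (approximate when fewer components than features)."""
    return np.asarray(centers, dtype=float) @ np.asarray(components, dtype=float).T + mean
```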
[0067] Step S2.5: For each source watershed, collect the parameters of its trained hydrological model. This embodiment uses a Long Short-Term Memory (LSTM) network as the hydrological model, with a unified architecture: input dimension 2 (daily rainfall and daily potential evapotranspiration), hidden dimension 256, 2 layers, and output dimension 1. Each watershed's model was trained independently on daily rainfall, potential evapotranspiration and runoff data for 1981 to 2010 (30 years) and validated on data for 2011 to 2020 (10 years). Training used the Adam optimizer with a learning rate of 0.001, a batch size of 256, and early stopping. After training, the model parameters of each watershed were saved. Watersheds with a validation-period Nash-Sutcliffe efficiency (NSE) below 0.5 are marked as unusable source watersheds and are not included in the prior knowledge base.
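The usability screen at the end of step S2.5 relies on the Nash-Sutcliffe efficiency, NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2). A small sketch follows; the function names and the dictionary of (observed, simulated) validation series per basin are hypothetical.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 equals predicting the observed mean."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def usable_sources(validation, threshold=0.5):
    """Keep basins whose validation-period NSE is not below the threshold."""
    return [basin for basin, (obs, sim) in validation.items() if nse(obs, sim) >= threshold]
```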
[0068] Step S2.6: The cluster centers $\mathbf{c}_k$ and the intra-category distance percentiles $d_k^{(p_1)}$, $d_k^{(p_2)}$ and $d_k^{(p_3)}$ obtained in step S2.4, together with the usable source watersheds and their model parameters collected in step S2.5, constitute the prior knowledge base of model parameters.
[0069] Step S3: Use a feature space mapping method to map the feature vectors of the target and source watersheds into a unified feature space, aligning the source and target feature spaces to obtain a projected unified feature representation. Specifically:
Step S3.1: Determine the feature space mapping method. This embodiment uses PCA to compute a projection matrix from the geographic attribute features and rainfall-driven features of the source watersheds. For each source watershed, the geographic attribute feature vector $\mathbf{x}_G$ and the rainfall-driven feature vector $\mathbf{x}_R$ are concatenated into the joint input feature vector $\mathbf{u} = [\mathbf{x}_G^{\top}, \mathbf{x}_R^{\top}]^{\top}$.
[0070] Step S3.2: Compute the projection matrix $\mathbf{W}$. Compute the covariance matrix of the source-watershed joint input feature matrix $\mathbf{U}$, perform eigenvalue decomposition to obtain its eigenvalues and eigenvectors, and take the eigenvectors corresponding to the $m$ largest eigenvalues as the columns of $\mathbf{W}$. The first two principal components contribute 43.6% and 25.1% of the variance, respectively (68.7% cumulatively), and the first 20 principal components reach a cumulative contribution of approximately 91.6%. This embodiment selects $m = 20$, i.e., the first 20 principal components form the unified feature representation space.
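[0071] Step S3.3: Project the joint input features of the source watersheds onto the unified feature space. The projected feature of source watershed $i$ is
$$\mathbf{h}_i^{s} = \mathbf{W}^{\top}\left(\mathbf{u}_i^{s} - \bar{\mathbf{u}}^{s}\right) \quad (2)$$
where $\mathbf{u}_i^{s}$ is its joint input feature vector and $\bar{\mathbf{u}}^{s}$ is the mean vector of the source-watershed joint input features.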
[0072] Step S3.4: Project the joint input features of the target watersheds onto the unified feature space. The joint input feature vector $\mathbf{u}^{t}$ of each of target watersheds A, B, C and D is projected with the same mean $\bar{\mathbf{u}}^{s}$ and projection matrix $\mathbf{W}$:
$$\mathbf{h}^{t} = \mathbf{W}^{\top}\left(\mathbf{u}^{t} - \bar{\mathbf{u}}^{s}\right) \quad (3)$$
This yields the aligned representations $\mathbf{h}^{s}$ and $\mathbf{h}^{t}$ of the source and target watersheds in the unified feature space, effectively reducing the domain shift caused by cross-watershed differences in data acquisition methods and resolutions.
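Equations (2) and (3) amount to fitting PCA on the source watersheds only and reusing the source mean and projection matrix for the targets. A NumPy sketch, with hypothetical function names:

```python
import numpy as np

def fit_projection(U_src, m):
    """Fit PCA on source joint inputs only; return the source mean and W (top-m eigenvectors as columns)."""
    U_src = np.asarray(U_src, dtype=float)
    mean = U_src.mean(axis=0)
    cov = np.cov(U_src - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:m]
    return mean, eigvecs[:, top]

def project(U, mean, W):
    """Eq. (2)/(3): h = W^T (u - mean), applied to each row of U."""
    return (np.atleast_2d(np.asarray(U, dtype=float)) - mean) @ W
```

Reusing the source mean and W is what places source and target watersheds in the same coordinate system; refitting PCA on target data would produce a different basis. Note that eigenvector signs are arbitrary, so only the geometry of the projected points, not their signs, is meaningful.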
[0073] Step S4: Because no runoff observations are available for the target watershed, establish a mapping model from the geographic attribute features to the runoff response features of the source watersheds, and use the geographic attribute features of the target watershed to estimate its runoff response features, obtaining a predicted runoff response feature vector and an uncertainty index. Specifically:
Step S4.1: Construct the supervised learning models. The geographic attribute features $\mathbf{x}_G$ of the source watersheds are the input features, and each of the $d_Q$ components of the runoff response feature vector $\mathbf{x}_Q$ is a separate prediction target, so $d_Q$ supervised learning models are trained separately. This embodiment uses random forest regression, which combines multiple decision trees to improve prediction accuracy and robustness; each model contains 500 decision trees, with a maximum depth of 20 and a minimum of 5 samples per leaf.
[0074] Step S4.2: With the geographic attribute features constructed in step S1.3 as input and each component of the runoff response features constructed in step S1.5 as output, $d_Q$ regression models are trained separately to obtain a set of trained mapping models. The models are validated with out-of-bag samples, and the root mean square error and coefficient of determination ($R^2$) of each component's predictions are calculated. Components with $R^2$ below 0.3 have their weights reduced by 50% in the subsequent similarity calculation.
[0075] Step S4.3: Estimate the runoff response features of the target watersheds. The geographic attribute features $\mathbf{x}_G^{t}$ of target watersheds A, B, C and D are fed into the random forest model set trained in step S4.2 to obtain the predicted runoff response feature vectors:
$$\hat{\mathbf{x}}_Q^{t} = \mathcal{F}\left(\mathbf{x}_G^{t}\right) = \left[f_1\left(\mathbf{x}_G^{t}\right), \dots, f_{d_Q}\left(\mathbf{x}_G^{t}\right)\right]^{\top} \quad (4)$$
where $\mathcal{F} = \{f_1, \dots, f_{d_Q}\}$ is the set of $d_Q$ regression models, each predicting one component of the runoff response features.
[0076] Step S4.4: Estimate the prediction uncertainty. From the distribution of prediction residuals produced while training the random forest models, calculate the variance of each component's prediction and hence the width $L_j$ of its confidence interval. The confidence interval width ratio of component $j$ is defined as
$$\rho_j = \frac{L_j}{\sigma_j} \quad (5)$$
where $\sigma_j$ is the population standard deviation of the $j$-th runoff response feature component over the source watersheds. When $\rho_j$ exceeds a preset threshold, the prediction uncertainty of that component is considered high, and its weight is reduced by 50% in the subsequent similarity calculation.
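Steps S4.2 and S4.4 both halve the weight of unreliable runoff response components. The sketch below combines the two rules; the function name is hypothetical. Two points are assumptions: the ratio threshold is left as a required argument because its value is not given in the text, and a component that fails both tests is assumed to be halved twice (factor 0.25).

```python
import numpy as np

def component_weight_factors(r2, ci_width, src_std, ratio_threshold, r2_min=0.3):
    """Per-component multipliers for the runoff response weights (1.0 means unchanged)."""
    r2 = np.asarray(r2, dtype=float)
    ratio = np.asarray(ci_width, dtype=float) / np.asarray(src_std, dtype=float)  # Eq. (5)
    factors = np.ones_like(r2)
    factors[r2 < r2_min] *= 0.5              # step S4.2: weak out-of-bag fit
    factors[ratio > ratio_threshold] *= 0.5  # step S4.4: wide confidence interval
    return factors, ratio
```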
[0077] Step S5: Construct the joint feature vector of the target watershed and use a weighted distance metric to calculate its feature similarity to each cluster center, obtaining the similarity level between the target watershed and each source watershed category. Specifically:
Step S5.1: Construct the joint feature vector of the target watershed by concatenating its geographic attribute features $\mathbf{x}_G^{t}$, its rainfall-driven features $\mathbf{x}_R^{t}$, and the runoff response features $\hat{\mathbf{x}}_Q^{t}$ estimated in step S4.3:
$$\mathbf{x}^{t} = \left[(\mathbf{x}_G^{t})^{\top},\ (\mathbf{x}_R^{t})^{\top},\ (\hat{\mathbf{x}}_Q^{t})^{\top}\right]^{\top} \quad (6)$$
In this embodiment, all features of $\mathbf{x}^{t}$ have been Z-score standardized to ensure comparability between the different feature components.
[0078] Step S5.2: Align the source watershed cluster centers with the target features. Because step S2.4 has already mapped the cluster centers back to the original feature space, the centers $\mathbf{c}_k$ can be used directly for distance calculation.
[0079] Step S5.3: Define the weighted distance metric. The weighted Euclidean distance between the target watershed and the $k$-th cluster center is
$$D_k = \sqrt{\sum_{j=1}^{d} w_j \left(x_j^{t} - c_{k,j}\right)^2} \quad (7)$$
where $d = d_G + d_R + d_Q$, $x_j^{t}$ and $c_{k,j}$ are the $j$-th components of $\mathbf{x}^{t}$ and $\mathbf{c}_k$, and $w_j$ is the weighting coefficient of dimension $j$, assigned according to feature source. In this embodiment, since the target watershed has no runoff observations, the geographic attribute feature group, the rainfall-driven feature group and the runoff response feature group are assigned the group weights $\alpha_G$, $\alpha_R$ and $\alpha_Q$, respectively, and each group weight is distributed evenly across the dimensions of that group (for example, $w_j = \alpha_G / d_G$ for each geographic dimension).
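Equation (7), with each group weight spread evenly over its group's dimensions, can be sketched as follows. The function names are hypothetical, and because the group weight values are not given in the text, any values supplied are placeholders.

```python
import numpy as np

def group_weights(dims, group_alphas):
    """Spread each group weight evenly over that group's dimensions (w_j = alpha / d_group)."""
    return np.concatenate([np.full(d, a / d) for d, a in zip(dims, group_alphas)])

def weighted_distances(x_t, centers, w):
    """Eq. (7): one weighted Euclidean distance D_k per cluster center (rows of centers)."""
    diff = np.asarray(centers, dtype=float) - np.asarray(x_t, dtype=float)
    return np.sqrt((w * diff ** 2).sum(axis=1))
```

For the embodiment's dimensions one would call `group_weights((56, 24, 31), (alpha_G, alpha_R, alpha_Q))` with whatever group weights are configured in steps S5.3 and S5.4.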
[0080] Step S5.4: Dynamically adjust the weights. If runoff observations later become available for the target watershed, the weight configuration is adjusted accordingly; while runoff observations remain unavailable, the weights of the geographic attribute feature group ($\alpha_G$) and the rainfall-driven feature group ($\alpha_R$) are increased.
[0081] Step S5.5: Determine the similarity level. Using the weighted distances $D_k$ calculated in step S5.3 together with the intra-category distance percentiles $d_k^{(p_1)}$, $d_k^{(p_2)}$ and $d_k^{(p_3)}$ from step S2.4, feature similarity is divided into four levels. Taking target watershed A as an example, its weighted Euclidean distances to the 12 cluster centers are calculated, and the smallest is the distance to the 5th cluster center, at 1.87. Compared against the intra-cluster distance thresholds $d_5^{(p_1)}$, $d_5^{(p_2)}$ and $d_5^{(p_3)}$ of the 5th cluster, this distance satisfies the distance condition of the second level, and the matching conditions are also met in both the geographic attribute subspace and the rainfall-driven subspace; target watershed A is therefore assigned to the second similarity level (medium-high similarity). Figure 3 shows the dimensionality-reduced clustering distributions of target watersheds A, B, C and D and their respective source watershed categories in the joint feature space.
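The four-level rule is only partly recoverable from the text. The sketch below assumes a purely distance-based mapping (level 1 if D does not exceed the strict threshold, level 2 if it does not exceed the baseline threshold, level 3 if it does not exceed the lenient threshold, otherwise level 4) and omits the additional subspace-matching conditions mentioned above. It also picks the most similar category as the one with the smallest weighted distance, as in step S6.1. Function names are hypothetical.

```python
import numpy as np

def similarity_level(D, strict, baseline, lenient):
    """Hypothetical distance-only mapping to levels 1 (highest similarity) to 4."""
    if D <= strict:
        return 1
    if D <= baseline:
        return 2
    if D <= lenient:
        return 3
    return 4

def classify(distances, thresholds):
    """Most similar category (smallest D_k) and its level; thresholds maps k -> (strict, baseline, lenient)."""
    k_star = int(np.argmin(distances))
    return k_star, similarity_level(distances[k_star], *thresholds[k_star])
```

For example, with the updated category-5 thresholds reported in step S7.3 (1.48, 2.29, 3.61), used here only for illustration, `similarity_level(1.87, 1.48, 2.29, 3.61)` returns 2, consistent with the second level reported for target watershed A.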
[0082] Step S6: Based on the feature similarity obtained in step S5, select the hydrological model parameters of the corresponding category from the prior knowledge base of model parameters constructed in step S2, and apply them to the hydrological simulation of the target watershed to obtain its simulated streamflow hydrograph. Specifically:
Step S6.1: Determine the most similar category. For each target watershed, the category with the smallest weighted distance is selected as the most similar category:
$$k^{*} = \arg\min_{k \in \{1, \dots, K\}} D_k \quad (8)$$
The most similar category is category 5 for target watersheds A, B and C, and category 8 for target watershed D.
[0083] Step S6.2: Obtain the similarity level. Target watersheds A and B are at the second similarity level (medium-high similarity), target watershed C is at the first similarity level (high similarity), and target watershed D is at the third similarity level (medium similarity).
[0084] Step S6.3: Apply a differentiated parameter migration strategy according to the similarity level. For target watershed C (first level, high similarity), a direct transfer strategy is adopted: the source watershed in category 5 with the smallest weighted distance to target watershed C is selected, and its pre-trained LSTM model parameters are applied directly to target watershed C.
[0085] For target watersheds A and B (second level, medium-high similarity), a fine-tuning transfer strategy is adopted: the source watershed in category 5 with the smallest weighted distance to each target watershed is retrieved, and its pre-trained LSTM model parameters serve as the initial parameters, which are then fine-tuned with local data from the target watershed. During fine-tuning, the parameters of the first two layers of the LSTM network are frozen and only the parameters of the last layer are fine-tuned; a small learning rate of 0.0001 (1/10 of the initial training learning rate) is used; and Dropout regularization with a dropout rate of 0.3 is applied.
[0086] For target watershed D (third level, medium similarity), a constrained-correction migration strategy is adopted: the model parameters of the 5 most similar source watersheds in category 8 are combined by a weighted average, with weights inversely proportional to their weighted distances, and an L2 regularization term is introduced to constrain how far the parameters deviate.
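The constrained-correction strategy can be sketched as an inverse-distance weighted average of parameter vectors plus an L2 penalty. Treating each LSTM's parameters as one flat vector, and measuring the deviation relative to the averaged vector, are assumptions; the patent does not give the penalty's exact form or coefficient, and the function names are hypothetical.

```python
import numpy as np

def inverse_distance_average(params, distances, top_n=5):
    """Average the top_n closest parameter vectors with weights proportional to 1 / distance."""
    params = np.asarray(params, dtype=float)
    distances = np.asarray(distances, dtype=float)
    idx = np.argsort(distances)[:top_n]          # the top_n most similar source basins
    w = 1.0 / np.maximum(distances[idx], 1e-12)  # guard against a zero distance
    w /= w.sum()
    return w @ params[idx], idx

def l2_deviation_penalty(theta, theta_ref, lam):
    """lam * ||theta - theta_ref||^2, added to a training loss to limit parameter drift."""
    diff = np.asarray(theta, dtype=float) - np.asarray(theta_ref, dtype=float)
    return float(lam * np.sum(diff ** 2))
```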
[0087] The above strategies are used to obtain hydrological model parameters applicable to the target watershed.
[0088] Step S6.4: Perform the hydrological simulation. The hydrological model parameters obtained in step S6.3 are applied to each target watershed, with its rainfall and potential evapotranspiration series as input, and forward computation outputs the simulated streamflow hydrograph of each target watershed. Figures 4, 5, 6 and 7 show the hydrological simulation results based on similarity transfer learning for target watersheds A, B, C and D, respectively; each figure includes a comparison with the direct training method and the unified training-transfer method. The results show that the method of this invention outperforms the comparison methods in both peak flow capture and overall fitting accuracy.
[0089] Step S7: Update the prior knowledge base of model parameters using actual observation data or validation samples from the target watershed, obtaining an optimized weight configuration and an updated knowledge base. Specifically:
Step S7.1: Collect validation samples. After the migration simulation of the target watershed is completed, approximately 6 months of observation data are added for each target watershed and used as validation samples to evaluate the simulation performance of the migrated model.
[0090] Step S7.2: Construct a similarity-performance relationship model. The four current migration cases are pooled with historical migration cases, and in this embodiment Gaussian process regression is used to fit the mapping between the similarity distance and the Nash-Sutcliffe efficiency coefficient; this mapping is used for subsequent weight adjustment and threshold optimization.
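Step S7.2 names Gaussian process regression without specifying the kernel. The following NumPy sketch assumes a squared-exponential (RBF) kernel with fixed, hypothetical hyperparameters and a constant prior mean equal to the training average; a production fit would normally tune the hyperparameters by maximizing the marginal likelihood. Function names are illustrative.

```python
import numpy as np

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel matrix between 1-D inputs a and b."""
    a = np.asarray(a, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    return var * np.exp(-0.5 * ((a - b) / length) ** 2)

def gp_fit_predict(d_train, nse_train, d_query, length=1.0, var=1.0, noise=1e-2):
    """Posterior mean and variance of NSE as a function of similarity distance."""
    d_train = np.asarray(d_train, dtype=float)
    y = np.asarray(nse_train, dtype=float)
    y_mean = y.mean()  # constant prior mean
    K = rbf(d_train, d_train, length, var) + noise * np.eye(len(d_train))
    L = np.linalg.cholesky(K)  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))  # K^{-1} (y - mean)
    Ks = rbf(d_query, d_train, length, var)
    mean = Ks @ alpha + y_mean
    v = np.linalg.solve(L, Ks.T)
    var_post = var - np.sum(v ** 2, axis=0)  # k(x*, x*) - k*^T K^{-1} k*
    return mean, var_post
```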
[0091] Step S7.3: Update the similarity thresholds. The validation samples of each target watershed are added to the sample set of its most similar category, and the intra-class distance distributions of categories 5 and 8 are recalculated. For category 5, the updated $d_5^{(p_1)}$, $d_5^{(p_2)}$ and $d_5^{(p_3)}$ are 1.48, 2.29 and 3.61, respectively. Their relative changes from the original thresholds are all below 5%, within the preset tolerance of 10%, which indicates that the original thresholds are stable; no threshold update is triggered for the time being.
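The threshold stability check of step S7.3 reduces to comparing relative changes against the 10% tolerance. A minimal sketch with a hypothetical function name; because the original category-5 thresholds are not stated in the text, the "old" values passed in must come from the knowledge base and none are assumed here.

```python
import numpy as np

def needs_threshold_update(old, new, tolerance=0.10):
    """Return (update_flag, relative_changes); the flag is True if any threshold moved more than tolerance."""
    old = np.asarray(old, dtype=float)
    new = np.asarray(new, dtype=float)
    rel_change = np.abs(new - old) / np.abs(old)
    return bool(np.any(rel_change > tolerance)), rel_change
```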
[0092] Step S7.4: Update the prior knowledge base. Target watersheds A, B and C, which show good transfer performance, are added to the prior knowledge base as new source watersheds, together with their watershed features, cluster membership (category 5) and fine-tuned model parameters. Target watershed D, whose transfer performance is acceptable, is not added to the knowledge base for the time being but is recorded as a marginal case.
[0093] Step S7.5: Optimize the weight configuration. Based on the similarity-performance relationship model established in step S7.2, the group weight coefficients $\alpha_G$, $\alpha_R$ and $\alpha_Q$ in step S5.4 are adjusted by Bayesian optimization.
[0094] The above embodiments are merely illustrative of the implementation methods of the present invention, but should not be construed as limiting the scope of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these modifications and improvements all fall within the protection scope of the present invention.
Claims
1. A three-source feature similarity migration hydrological simulation method for data-deficient watersheds, characterized in that the method comprises the following steps: Step S1: obtaining multi-source data from the source and target watersheds, preprocessing and normalizing the multi-source data, and constructing a three-source feature vector comprising geographic attribute features, rainfall-driven features and runoff response features; Step S2: performing feature fusion and dimensionality reduction on the three-source feature vectors of the source watersheds, performing cluster analysis in the dimensionality-reduced joint feature space to obtain multiple watershed categories, the cluster center of each category and the intra-cluster distance distributions, and constructing a prior knowledge base of model parameters; Step S3: mapping the feature vectors of the target and source watersheds into a unified feature space by a feature space mapping method, and aligning the feature spaces of the source and target watersheds to obtain a projected unified feature representation; Step S4: in the absence of runoff observation data for the target watershed, establishing a mapping model from the geographic attribute features to the runoff response features of the source watersheds, and estimating the runoff response features of the target watershed from its geographic attribute features to obtain a predicted runoff response feature vector and an uncertainty index;
Step S5: constructing the joint feature vector of the target watershed, and calculating, with a weighted distance metric, the feature similarity between the target watershed and each cluster center to obtain the similarity level between the target watershed and each source watershed category; Step S6: based on the feature similarity obtained in step S5, selecting the hydrological model parameters of the corresponding category from the prior knowledge base of model parameters constructed in step S2, and applying them to the hydrological simulation of the target watershed to obtain the simulated streamflow hydrograph of the target watershed; Step S7: updating the prior knowledge base of model parameters based on actual observation data or validation samples of the target watershed to obtain an optimized weight configuration and an updated knowledge base.
2. The three-source feature similarity migration hydrological simulation method for data-deficient watersheds according to claim 1, characterized in that step S1 specifically comprises: Step S1.1: obtaining multi-source data from the source and target watersheds, the multi-source data comprising geographic attribute data, rainfall time-series data and runoff time-series data, wherein the geographic attribute data are sourced from a global hydro-environmental database, the rainfall time-series data are sourced from a multi-source fused precipitation dataset with a temporal resolution of at least daily, and the runoff time-series data are sourced from publicly available watershed hydrological datasets; Step S1.2: preprocessing the multi-source data obtained in step S1.1 and normalizing its features, wherein missing values in the geographic attribute data are filled by multiple imputation, the rainfall and runoff time series are uniformly resampled to the daily scale by linear interpolation, watershed samples whose proportion of missing runoff observations exceeds 20% are removed, and each feature dimension is made dimensionless after preprocessing; Step S1.3: constructing the geographic attribute feature vector by extracting, from the dimensionless geographic attribute data, $d_G$ attribute variables closely related to the watershed hydrological response to form $\mathbf{x}_G \in \mathbb{R}^{d_G}$, where $\mathbb{R}$ denotes the set of real numbers, the geographic attribute feature vector comprising five sub-feature types: topographic, climate, soil, geological and land use features; Step S1.4: constructing the rainfall-driven feature vector by extracting, from the dimensionless rainfall time series, $d_R$ feature variables characterizing the dynamics of the rainfall process to form $\mathbf{x}_R \in \mathbb{R}^{d_R}$, the rainfall-driven feature vector comprising four sub-feature types: total rainfall, seasonal, extreme rainfall and rainfall event features; Step S1.5: constructing the runoff response feature vector by extracting, from the dimensionless runoff time series, $d_Q$ characteristic variables characterizing the hydrological response behavior of the watershed to form $\mathbf{x}_Q \in \mathbb{R}^{d_Q}$, the runoff response feature vector comprising five sub-feature types: water balance, flow duration curve, drainage (recession), flood and seasonality indices; Step S1.6: concatenating $\mathbf{x}_G$, $\mathbf{x}_R$ and $\mathbf{x}_Q$ to obtain the three-source feature vector of the source watershed:
$$\mathbf{x} = \left[\mathbf{x}_G^{\top},\ \mathbf{x}_R^{\top},\ \mathbf{x}_Q^{\top}\right]^{\top} \in \mathbb{R}^{d_G + d_R + d_Q} \quad (1)$$
and, for the target watershed, treating the runoff response feature vector $\mathbf{x}_Q$ as unknown and constructing only $\mathbf{x}_G$ and $\mathbf{x}_R$.
3. The hydrological simulation method based on the similarity of three-source features for watersheds with limited data, as described in claim 2, is characterized in that... Specifically, step S2 includes: Step S2.1: Process the three source feature vectors of the source basin. Dimensionless processing is performed to obtain the characteristic matrix. ; Step S2.2, for the feature matrix Perform dimensionality reduction processing; Principal component analysis is used to calculate the eigenvalues and eigenvectors of the covariance matrix, and the number of principal components is determined according to the cumulative variance contribution rate. This makes the former The cumulative variance contribution rate of each principal component is not less than a preset threshold; the joint feature space after dimensionality reduction is denoted as... ,in The total number of source basins; Step S2.3, in the dimensionality-reduced joint feature space Cluster analysis is performed; a clustering algorithm is used, with Euclidean distance as the similarity measure; to determine the optimal number of clusters... After comprehensively examining various clustering effectiveness evaluation indicators, the one that achieves the overall optimal balance between intra-cluster compactness and inter-cluster segregation was selected. The value is used as the cluster number; after clustering is completed, we get... There are 1 watershed category, each denoted as _____. ( ); Step S2.4, calculate each category Cluster center And the Euclidean distance distribution from within-class samples to cluster centers; statistically analyze the within-class distances for each category, and calculate the first... percentile , No. percentile Passing the exam percentile ,in , , These are respectively used as the strict threshold, baseline threshold, and lenient threshold in subsequent similarity determination; Step S2.5: For each source basin, collect the parameters of its trained hydrological model. 
; calculate its simulation accuracy evaluation indices for the validation period; mark any watershed whose validation-period accuracy falls below a preset threshold as an unusable source watershed and exclude it from the prior knowledge base; Step S2.6: the cluster centers and percentile thresholds obtained in step S2.4, together with the usable source watersheds and their model parameters collected in step S2.5, constitute the prior knowledge base of model parameters.
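Steps S2.1 to S2.4 can be sketched with NumPy as follows. Several choices are assumptions rather than details given in the claim: z-score standardization stands in for the dimensionless processing, plain Lloyd's k-means stands in for the unnamed clustering algorithm (the number of clusters `k` is passed in rather than selected by validity indices), and the 50th, 75th, and 90th percentiles are placeholder values for the strict, baseline, and lenient thresholds. All function names are hypothetical.

```python
import numpy as np

def standardize(X):
    """Z-score each column (assumed form of the dimensionless processing in Step S2.1)."""
    X = np.asarray(X, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0  # constant features: avoid division by zero
    return (X - mu) / sd, mu, sd

def pca_reduce(Z, threshold=0.9):
    """Keep the fewest leading components whose cumulative variance ratio reaches threshold."""
    mean = Z.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Z - mean, rowvar=False))
    order = np.argsort(vals)[::-1]  # eigh returns eigenvalues in ascending order
    vals, vecs = vals[order], vecs[:, order]
    m = int(np.searchsorted(np.cumsum(vals) / vals.sum(), threshold)) + 1
    W = vecs[:, :m]
    return (Z - mean) @ W, W, mean

def kmeans(Y, k, iters=100, seed=0):
    """Lloyd's k-means with Euclidean distance (stand-in for the unnamed clustering algorithm)."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=k, replace=False)]

    def assign(c):
        # Index of the nearest center for every sample.
        return np.argmin(((Y[:, None, :] - c[None, :, :]) ** 2).sum(axis=2), axis=1)

    for _ in range(iters):
        labels = assign(centers)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(centers), centers

def cluster_thresholds(Y, labels, centers, percentiles=(50, 75, 90)):
    """Per-category (strict, baseline, lenient) thresholds from within-class distances."""
    thresholds = {}
    for j, c in enumerate(centers):
        d = np.linalg.norm(Y[labels == j] - c, axis=1)
        thresholds[j] = tuple(float(v) for v in np.percentile(d, percentiles))
    return thresholds
```

The percentile tuple for each category, together with its center, is what Step S2.6 stores in the prior knowledge base in this sketch; an empty cluster would need handling before `np.percentile` is called.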
4. The three-source feature similarity migration hydrological simulation method for data-scarce watersheds according to claim 3, characterized in that, in step S2.5, the hydrological model is a long short-term memory (LSTM) network model with a unified network architecture, trained independently for each watershed on its daily rainfall, potential evapotranspiration, and runoff data; after training, the model parameters are saved.
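The claim does not say how the daily series are arranged into training samples. A common layout for LSTM rainfall-runoff models, assumed here, is sequence-to-one windows in which the preceding `lookback` days of rainfall and potential evapotranspiration are paired with the runoff on the last day of the window. The function name `make_windows` and the window length are hypothetical.

```python
import numpy as np

def make_windows(rain, pet, runoff, lookback):
    """Build (samples, lookback, 2) inputs and (samples,) targets for a sequence-to-one LSTM."""
    forcing = np.column_stack([rain, pet]).astype(float)  # daily inputs, shape (T, 2)
    runoff = np.asarray(runoff, dtype=float)
    n = len(runoff) - lookback + 1                        # number of complete windows
    if n <= 0:
        raise ValueError("series shorter than lookback")
    X = np.stack([forcing[i:i + lookback] for i in range(n)])
    y = runoff[lookback - 1:]                             # runoff on the last day of each window
    return X, y
```

The resulting arrays have the `(samples, time steps, features)` layout that most LSTM implementations expect; the LSTM itself is omitted from this sketch.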
5. The three-source feature similarity migration hydrological simulation method for data-scarce watersheds according to claim 4, characterized in that step S3 specifically includes: Step S3.1: determine the feature space mapping method: principal component analysis is applied to the geographic attribute features and rainfall-driven features of the source watersheds to compute a projection matrix; for each source watershed $i$, the geographic attribute feature vector $\mathbf{G}_i$ and the rainfall-driven feature vector $\mathbf{R}_i$ are concatenated to obtain the joint input feature vector $\mathbf{U}_i = [\mathbf{G}_i, \mathbf{R}_i]$; Step S3.2: compute the projection matrix $\mathbf{W}$: compute the covariance matrix of the joint input feature matrix of the source watersheds, perform eigenvalue decomposition to obtain its eigenvalues and eigenvectors, and form $\mathbf{W}$ from the eigenvectors corresponding to the $p$ largest eigenvalues, where $p$ is chosen so that the cumulative variance contribution rate of the projected features is not lower than a preset threshold; Step S3.3: project the joint input features of each source watershed onto the unified feature space as $\mathbf{Z}_i = (\mathbf{U}_i - \bar{\mathbf{U}})\,\mathbf{W}$ (2), where $\mathbf{Z}_i$ is the aligned representation of source watershed $i$ in the unified feature space and $\bar{\mathbf{U}}$ is the mean vector of the joint input features of the source watersheds; Step S3.4: project the joint input features $\mathbf{U}_T$ of the target watershed onto the unified feature space using the same mean $\bar{\mathbf{U}}$ and projection matrix $\mathbf{W}$: $\mathbf{Z}_T = (\mathbf{U}_T - \bar{\mathbf{U}})\,\mathbf{W}$ (3), where $\mathbf{Z}_T$ is the aligned representation of the target watershed in the unified feature space.
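Equations (2) and (3) amount to fitting PCA on the source watersheds only and reusing the same mean and projection matrix for the target. A minimal NumPy sketch, assuming joint input features stored as rows and a hypothetical 90% cumulative variance threshold; `fit_projection` and `project` are illustrative names.

```python
import numpy as np

def fit_projection(U_src, threshold=0.9):
    """Mean vector and projection matrix fitted on source joint input features only."""
    U_src = np.asarray(U_src, dtype=float)
    mu = U_src.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(U_src - mu, rowvar=False))
    order = np.argsort(vals)[::-1]  # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    p = int(np.searchsorted(np.cumsum(vals) / vals.sum(), threshold)) + 1
    return mu, vecs[:, :p]

def project(U, mu, W):
    """Map joint input features into the unified space with the SOURCE mean and matrix."""
    return (np.asarray(U, dtype=float) - mu) @ W
```

The key design point is that the target watershed never contributes to the mean or the projection matrix, so source and target coordinates are directly comparable.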
6. The three-source feature similarity migration hydrological simulation method for data-scarce watersheds according to claim 5, characterized in that step S4 specifically includes: Step S4.1: construct supervised learning models: with the geographic attribute features of the source watersheds as input features and each component of the runoff response features as a prediction target, train one supervised learning model per component; the supervised learning model is preferably an ensemble-learning-based regression model; Step S4.2: train the mapping models: with the geographic attribute features constructed in step S1.3 as input and each component of the runoff response features constructed in step S1.5 as output, train the regression models separately to obtain the trained mapping models, and evaluate model performance by cross-validation or an independent validation scheme; Step S4.3: estimate the runoff response features of the target watershed: input the geographic attribute features $\mathbf{G}_T$ of the target watershed into the mapping models trained in step S4.2 to obtain the predicted runoff response feature vector $\hat{\mathbf{Q}}_T = \mathcal{F}(\mathbf{G}_T)$ (4), where $\mathcal{F}$ is the set of mapping models composed of the regression models, each corresponding to one component of the predicted runoff response features; Step S4.4: estimate the prediction uncertainty: from the distribution of prediction residuals produced during regression model training, compute the variance of each component's predicted value and hence the width $\Delta_j$ of its confidence interval, and define the confidence interval width ratio $\rho_j = \Delta_j / \sigma_j$ (5), where $\sigma_j$ is the overall standard deviation of the $j$-th runoff response feature component across the source watersheds.
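Steps S4.1 to S4.4 are sketched below for a single runoff response component. The claim prefers an ensemble-learning regressor without naming one; bootstrap-aggregated (bagged) least-squares regression is swapped in here only to keep the example dependent on NumPy alone. The interval width assumes approximately normal training residuals and a 95% level (z = 1.96), and dividing that width by the standard deviation of the component across source watersheds is an assumed reading of Equation (5). Function names are hypothetical.

```python
import numpy as np

def fit_bagged_linear(G, y, n_models=50, seed=0):
    """Bagged least-squares regression for ONE runoff response component (stand-in ensemble)."""
    G, y = np.asarray(G, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    A = np.column_stack([np.ones(len(G)), G])        # intercept column + geographic features
    coefs = []
    for _ in range(n_models):
        idx = rng.integers(0, len(G), size=len(G))   # bootstrap resample of source watersheds
        coef, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        coefs.append(coef)
    return np.mean(coefs, axis=0)                    # averaging linear models gives one linear model

def predict(coef, G):
    """Predicted component value for each row of geographic features."""
    return np.column_stack([np.ones(len(G)), np.asarray(G, dtype=float)]) @ coef

def ci_width_ratio(coef, G, y, z=1.96):
    """Confidence-interval width from training residuals divided by the source std of y."""
    y = np.asarray(y, dtype=float)
    resid = y - predict(coef, G)
    width = 2.0 * z * resid.std(ddof=1)              # approximate 95% interval width
    return width / y.std(ddof=1)
```

One such model is fitted per component, so a runoff response vector with five components needs five calls to `fit_bagged_linear`; a ratio near zero indicates a prediction that is tight relative to the natural spread of that component among source watersheds.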
7. The three-source feature similarity migration hydrological simulation method for data-scarce watersheds according to claim 6, characterized in that step S5 specifically includes: Step S5.1: construct the joint feature vector of the target watershed by concatenating its geographic attribute features $\mathbf{G}_T$, its rainfall-driven features $\mathbf{R}_T$, and the runoff response features $\hat{\mathbf{Q}}_T$ estimated in step S4.3: $\mathbf{X}_T = [\mathbf{G}_T, \mathbf{R}_T, \hat{\mathbf{Q}}_T]$ (6); Step S5.2: align the source watershed cluster centers: map each cluster center $\mathbf{c}_k$ obtained in step S2.4 from the dimensionality-reduced joint feature space back to the original feature space, yielding the cluster center $\tilde{\mathbf{c}}_k$ in the original feature space; Step S5.3: define the weighted distance metric: the weighted Euclidean distance between the target watershed and the $k$-th cluster center is $d_k = \sqrt{\sum_{j=1}^{D} w_j \left(x_{T,j} - \tilde{c}_{k,j}\right)^2}$ (7), where $x_{T,j}$ and $\tilde{c}_{k,j}$ are the values of the target watershed's joint feature vector and of the cluster center in the original feature space on the $j$-th feature dimension, $D$ is the number of feature dimensions, and $w_j$ is the weight coefficient of the $j$-th dimension; the weight coefficients are assigned by feature source: the geographic attribute feature group has weight $w_G$, the rainfall-driven feature group has weight $w_R$, and the runoff response feature group has weight $w_Q$, and within each group the weight is distributed evenly across all of its dimensions; Step S5.4: dynamically adjust the weights according to the data availability of the target watershed: when runoff observations are available for the target watershed, increase the runoff response group weight $w_Q$; when they are unavailable, increase the geographic attribute group weight $w_G$ and the rainfall-driven group weight $w_R$; Step S5.5: determine the similarity level: based on the weighted distance calculated in step S5.3 and the percentile thresholds of each category from step S2.4, feature similarity is divided into multiple levels according to the following rules: first similarity level (high similarity): the weighted distance satisfies the distance condition of this level, and the target watershed's weighted distances in both the geographic attribute subspace and the rainfall-driven subspace do not exceed the corresponding percentile distance of each subspace; second similarity level (medium-high similarity): the weighted distance satisfies the distance condition of this level, and the corresponding percentile distance constraint is met in at least two subspaces; third similarity level (medium similarity): the weighted distance satisfies the distance condition of this level, and the corresponding percentile distance constraint is met in at least one subspace; fourth similarity level (low similarity): the weighted distance fails the distance conditions above, or it satisfies a distance condition but not the corresponding subspace matching condition.
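The weighted distance of Step S5.3 and the level rules of Step S5.5 can be sketched in plain Python. Two assumptions are flagged: which percentile bounds each level is not recoverable from the claim text, so the sketch pairs level 1 with the strict threshold, level 2 with the baseline threshold, and level 3 with the lenient threshold; and a case that fails one level's subspace condition falls through to the next level. The dictionary keys `geo`, `rain`, and `runoff`, which flag whether each subspace distance is within its percentile, are hypothetical.

```python
import math

def weighted_distance(x, c, group_sizes, group_weights):
    """Weighted Euclidean distance in the spirit of Eq. (7).

    group_sizes: dimensions per group (geo, rain, runoff); group_weights: (w_G, w_R, w_Q).
    Each group's weight is split evenly over its own dimensions.
    """
    w = [gw / n for n, gw in zip(group_sizes, group_weights) for _ in range(n)]
    if not (len(w) == len(x) == len(c)):
        raise ValueError("group sizes must add up to the feature dimension")
    return math.sqrt(sum(wj * (xj - cj) ** 2 for wj, xj, cj in zip(w, x, c)))

def similarity_level(d, subspace_ok, thresholds):
    """Return level 1-4 under the ASSUMED pairing of levels with strict/baseline/lenient thresholds.

    subspace_ok: hypothetical flags {"geo", "rain", "runoff"} -> within that subspace's percentile.
    """
    strict, baseline, lenient = thresholds
    n_ok = sum(bool(v) for v in subspace_ok.values())
    if d <= strict and subspace_ok["geo"] and subspace_ok["rain"]:
        return 1  # high similarity
    if d <= baseline and n_ok >= 2:
        return 2  # medium-high similarity
    if d <= lenient and n_ok >= 1:
        return 3  # medium similarity
    return 4      # low similarity
```

Because the within-group weights are spread evenly, a group with more dimensions does not gain influence simply by being larger; its total contribution is still governed by its group weight.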
8. The three-source feature similarity migration hydrological simulation method for data-scarce watersheds according to claim 7, characterized in that step S6 specifically includes: Step S6.1: determine the most similar category: compute the weighted distances $d_k$ between the target watershed and all $K$ cluster centers, and select the category with the smallest weighted distance as the most similar category, $k^{*} = \arg\min_{k} d_k$ (8); Step S6.2: obtain the similarity level: following step S5.5, determine the similarity level between the target watershed and the most similar category $C_{k^{*}}$; Step S6.3: execute a differentiated parameter migration strategy according to the similarity level: at the first level (high similarity), adopt a direct migration strategy or an ensemble migration strategy; at the second level (medium-high similarity), adopt a parameter fine-tuning migration strategy; at the third level (medium similarity), adopt a constraint-modified migration strategy; at the fourth level (low similarity), perform no model migration; the parameter migration strategy yields hydrological model parameters suited to the target watershed; Step S6.4: perform hydrological simulation: apply the hydrological model parameters obtained in step S6.3 to the target watershed, take the target watershed's rainfall and potential evapotranspiration series as input, perform the forward calculation, and output the simulated discharge hydrograph of the target watershed.
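Steps S6.1 to S6.3 reduce to an arg-min over the cluster distances followed by a lookup from similarity level to migration strategy. In this sketch the strategies are only labels, because the migration procedures themselves are not specified in enough detail to implement; `STRATEGY`, `plan_transfer`, and the `level_of` callback are hypothetical names.

```python
# Hypothetical labels for the strategies named in Step S6.3; None means no model migration.
STRATEGY = {
    1: "direct or ensemble migration",
    2: "parameter fine-tuning migration",
    3: "constraint-modified migration",
    4: None,
}

def choose_category(distances):
    """Eq. (8)-style choice: index of the cluster center with the smallest weighted distance."""
    return min(range(len(distances)), key=distances.__getitem__)

def plan_transfer(distances, level_of):
    """Return (most similar category, strategy label); level_of(k) gives the Step S5.5 level."""
    k = choose_category(distances)
    return k, STRATEGY[level_of(k)]
```

A `None` strategy signals that the forward simulation of Step S6.4 should not be run with transferred parameters.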
9. The three-source feature similarity migration hydrological simulation method for data-scarce watersheds according to claim 8, characterized in that step S7 specifically includes: Step S7.1: collect validation samples: after the migration simulation of the target watershed is completed and new runoff observations become available, use them as validation samples to evaluate the simulation performance of the migrated model; the performance evaluation indicators include the Nash-Sutcliffe efficiency coefficient, the root mean square error, and the relative error of peak flow; Step S7.2: construct the similarity-performance relationship model: treat the current migration as a migration case, pool it with the historical migration cases, record the similarity distance of each case and its corresponding simulation performance indicators, and establish the mapping between similarity distance and performance indicators over all cases to obtain the similarity-performance relationship model; Step S7.3: update the similarity thresholds: add the validation samples of the target watershed to the sample set of the most similar category, recompute the within-class distance distribution of that category, and obtain the updated strict, baseline, and lenient percentile thresholds; if the relative change between an updated percentile and the original percentile exceeds a preset tolerance, an update of the knowledge base thresholds is triggered.
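The evaluation indicators of Step S7.1 and the tolerance check of Step S7.3 are sketched below in plain Python. The peak-flow error compares the maximum simulated and observed flows regardless of timing, and the 10% relative tolerance is a placeholder; both are assumptions, and the function names are hypothetical.

```python
import math

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def peak_relative_error(obs, sim):
    """Relative error of the simulated peak flow against the observed peak (timing ignored)."""
    return (max(sim) - max(obs)) / max(obs)

def needs_threshold_update(old, new, tol=0.1):
    """True if any percentile threshold changes by more than tol in relative terms."""
    return any(abs(n - o) / abs(o) > tol for o, n in zip(old, new))
```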
Step S7.4: update the prior knowledge base: judge the migration effect from the evaluation results of step S7.1: when the simulation performance indicators meet the preset standard, the migration effect is judged good, and the target watershed's features, cluster membership, and migrated model parameters are added to the prior knowledge base as a new source watershed; when the indicators do not meet the preset standard, the migration effect is judged poor, and the watershed is not added to the knowledge base but is recorded as an abnormal case for analyzing the shortcomings of the existing similarity determination system; when abnormal cases accumulate to a preset number, re-optimization of the cluster structure or the feature space is triggered; Step S7.5: optimize the weight configuration: based on the similarity-performance relationship model established in step S7.2, adjust the geographic attribute, rainfall-driven, and runoff response group weights of step S5.4 by an adaptive optimization method so as to maximize the correlation between weighted distance and migration performance.
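Step S7.5 leaves the adaptive optimization method open. As a simple stand-in, the sketch below grid-searches the three group weights over the simplex (weights summing to 1, an assumption) and keeps the combination whose weighted distances correlate most negatively with a performance indicator such as NSE, on the reading that smaller distance should go with better performance. With weights spread evenly inside each group, the squared distance equals the sum over groups of group weight times that group's mean squared difference, so each historical case only needs its three group-wise mean squared differences. Function names are hypothetical.

```python
import itertools
import math

def pearson(a, b):
    """Pearson correlation; returns 0.0 when either series has zero spread."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0

def optimize_group_weights(group_sq_diffs, performance, step=0.1):
    """Grid search over (w_G, w_R, w_Q) on the simplex, a stand-in for the unnamed method.

    group_sq_diffs: per case, mean squared differences in the (geo, rain, runoff) groups.
    performance: per case, e.g. NSE. The objective is the NEGATIVE Pearson correlation.
    """
    n = round(1 / step)
    best, best_score = None, -math.inf
    for i, j in itertools.product(range(n + 1), repeat=2):
        if i + j > n:
            continue  # stay on the simplex
        w = (i / n, j / n, (n - i - j) / n)
        dist = [math.sqrt(sum(wk * sk for wk, sk in zip(w, s))) for s in group_sq_diffs]
        score = -pearson(dist, performance)
        if score > best_score:
            best, best_score = w, score
    return best, best_score
```

A coarse grid keeps the search transparent with only three weights; a finer step or a continuous optimizer would be natural replacements once many migration cases have accumulated.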