Water quality prediction method based on deep learning and hec-ras model
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- 浙江省环境科技股份有限公司
- Filing Date
- 2026-02-02
- Publication Date
- 2026-06-12
AI Technical Summary
Existing water quality prediction methods suffer from insufficient accuracy, stability, and interpretability when dealing with complex hydrological processes, especially when the uncertainty of input parameters and data is high, making it difficult to perform well simultaneously.
By combining deep learning and the Hec-Ras model, the accuracy of hydrological input is improved through the CNN-LSTM model, and real-time correction is performed using the ensemble Kalman filter data assimilation method, forming a complementary water quality simulation method that ensures the stability and interpretability of the simulation process.
It significantly improves the accuracy of water quality forecasts, enhances the interpretability and stability of the model, can quickly provide high-quality water quality forecast results, and optimizes calculation results through real-time monitoring data.
Figure CN122200933A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of water environment simulation technology, and more specifically to a water quality simulation method based on deep learning and mechanistic models.
Background Technology
[0002] Water quality simulation is one of the core technologies for water ecological environment protection and water resource management. Currently, the technical approaches in this field are mainly divided into two categories: one is data-driven modeling methods (such as CN119089718B), which calculate the changes in water quality indicators in water bodies by screening and building mathematical relationships or mappings between driving factors and water quality indicators. This is mainly represented by statistical, machine learning, and deep learning methods, including various multiple regression methods, random forests, and various neural networks. The other is modeling methods based on physical, chemical, and biological processes (such as CN118485543A and CN109992909A), which describe the changes in pollutants by generalizing the main processes of pollutant migration and transformation in water bodies, and finally calculate the results of water quality changes in natural water bodies. This is mainly represented by hydrodynamic water quality models, including MIKE, HEC-RAS, WASP, and SWAT.
[0003] The problem with data-driven modeling methods is that they skip the description of intermediate processes of water quality changes, ignore the effects of physical, chemical and biological factors, and may lead to difficulties in interpreting the results, affecting subsequent water quality analysis. They may also be prone to overfitting.
[0004] The problems with modeling methods based on physical, chemical, and biological processes are that it is difficult to obtain accurate future runoff, and the general mechanistic models have high data requirements and low simulation efficiency. The model efficiency depends on the experience of parameter selection, and the results can be biased when the provided data quality is poor. Furthermore, they have poor adaptability in water bodies with complex pollution changes.
[0005] In general, current water quality prediction methods struggle to simultaneously achieve good performance in terms of accuracy, stability, and interpretability when dealing with complex hydrological processes and high uncertainties in input parameters and data.
Summary of the Invention
[0006] To address the aforementioned technical problems and shortcomings in this field, this invention provides a water quality forecasting method based on deep learning and the Hec-Ras model.
[0007] This invention improves the accuracy of hydrological inputs through deep learning, ensures the stability and interpretability of simulated water quality changes using models based on physical, chemical, and biological processes, and achieves dynamic timeliness correction through data assimilation. It creatively integrates the advantages of both approaches, forming a novel water quality simulation method that is complementary, dynamically adaptable, efficient, and reliable. This fusion method not only significantly improves forecast accuracy under conventional scenarios and enhances model interpretability and stability, but also overcomes the key technical bottlenecks of traditional methods.
[0008] The specific technical solution is as follows:
In a first aspect, the present invention provides a water quality forecasting method based on deep learning and the Hec-Ras model, comprising:
The maximum information coefficient (MIC) between the cumulative rainfall at each of several time scales and the current flow rate is calculated. The cumulative rainfall series are sorted from largest to smallest MIC, and the top-ranked series are selected and used, together with the previous flow rate and water level, or the previous flow rate, water level and temperature, as input to the CNN-LSTM (convolutional neural network and long short-term memory network) model. The CNN-LSTM model outputs the flow rate prediction and is trained and optimized with a combination of quantile loss functions.
The Hec-Ras model takes the flow prediction output by the CNN-LSTM model and historical water quality data as inputs and outputs water quality predictions; the Hec-Ras model uses the ensemble Kalman filter (EnKF) data assimilation method for real-time dynamic data correction.
When the deviation between the water quality prediction output by the Hec-Ras model and the measured value exceeds the set limit, the deviation information is fed back to the CNN-LSTM model. The cumulative rainfall series at several time scales are re-selected as the input of the CNN-LSTM model according to the aforementioned process, the CNN-LSTM model is adjusted, and the parameters of the Hec-Ras model are adjusted at the same time, until the deviation between the water quality prediction output by the Hec-Ras model and the measured value is within the set range.
[0009] In some embodiments, the water quality forecasting method based on deep learning and the Hec-Ras model performs the following preprocessing on the data collected for the CNN-LSTM model and the Hec-Ras model: for missing data, spatial interpolation is used to supplement it in combination with upstream and downstream hydrological relationships; for outliers and data with obvious deviations, outliers are removed by setting a pre-set threshold.
[0010] In some embodiments, the water quality forecasting method based on deep learning and the Hec-Ras model includes a CNN-LSTM model comprising an encoder and a decoder.
[0011] Furthermore, the encoder includes: an input layer that receives a multidimensional time series [T,F], where T is the time step and F is the number of features; convolutional layers that use multiple convolutional kernels of different widths to extract local temporal patterns in parallel; max pooling layers that reduce the time dimension and enhance feature invariance; and an LSTM layer that extracts long-term bidirectional temporal dependencies, remembering the correlation between rainfall and flow.
[0012] Furthermore, the decoder includes: an attention mechanism layer that allows the model to dynamically focus on key periods in the historical input when predicting each future time step; a forecast layer that progressively generates flow sequences for future time steps; and a flow range output layer that performs multiple forward propagations using the Monte Carlo Dropout method to output the interval distribution of the flow prediction.
[0013] In some embodiments, the water quality forecasting method based on deep learning and the Hec-Ras model uses the CNN-LSTM model to derive the range of water flow based on the 10th, 50th, and 90th percentiles.
[0014] In a second aspect, the present invention provides a computer-readable storage medium storing at least one instruction, which is loaded and executed by a processor to implement the water quality forecasting method based on deep learning and the Hec-Ras model as described in the first aspect.
[0015] In a third aspect, the present invention provides an electronic device comprising a processor and a memory, wherein the memory stores at least one instruction, which is loaded and executed by the processor to implement the water quality forecasting method based on deep learning and the Hec-Ras model as described in the first aspect.
[0016] Compared with the prior art, the beneficial effects of this invention are as follows: (1) This invention can improve the simulation efficiency of mechanistic models. Compared with calculating hydrological forecast results through mechanistic models, this invention requires less data and has higher computational efficiency, which can help the Hec-Ras model quickly provide high-quality water quality forecast results.
[0017] (2) The present invention can optimize the calculation results by monitoring data in real time. When the simulation results of the model do not pass the set relative deviation threshold, the model can be retrained by collecting new real-time monitoring data and modifying the relevant parameters of the Hec-Ras model to improve the model calculation results.
[0018] (3) Compared with machine learning and deep learning methods that skip intermediate mechanism processes, this invention can provide richer calculation results, which facilitates the analysis and application of subsequent water quality forecast results, pollution source tracing and other work.
Attached Figure Description
[0019] Figure 1 is a schematic diagram of the process of a water quality forecasting method based on deep learning and the Hec-Ras model according to the present invention.
Detailed Implementation
[0020] The present invention will be further described below with reference to the accompanying drawings and specific embodiments. It should be understood that these embodiments are for illustrative purposes only and are not intended to limit the scope of the invention.
[0021] As shown in Figure 1, a water quality forecasting method based on deep learning and the Hec-Ras model includes data collection, data preprocessing, driving factor screening, CNN-LSTM hybrid model construction, CNN-LSTM hybrid model training, Hec-Ras model training, and feedback correction.
[0022] Data collection involves the collection and fusion of diverse, heterogeneous data for the forecast area, including: meteorological and hydrological data (mainly obtained from companies providing meteorological forecasting services and from water quality monitoring stations, including rainfall, temperature, flow rate and water level); and water quality data (mainly collected from water quality monitoring stations, including monitoring values of indicators such as water temperature, pH, turbidity, total phosphorus, total nitrogen, ammonia nitrogen, permanganate index, biochemical oxygen demand, and heavy metals).
[0023] Data preprocessing: For missing data, spatial interpolation is used to supplement it based on upstream and downstream hydrological relationships; outliers and data with significant deviations are removed by setting pre-defined thresholds.
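The two preprocessing rules can be sketched as follows. This is a minimal illustration only: the function names, the plausibility range, and the fixed weighting between the upstream and downstream stations are assumptions, since the description does not specify the interpolation formula or the threshold values.

```python
from typing import List, Optional

Series = List[Optional[float]]


def remove_outliers(series: Series, low: float, high: float) -> Series:
    """Replace values outside the preset [low, high] range with None (missing)."""
    return [v if v is not None and low <= v <= high else None for v in series]


def fill_from_neighbors(target: Series, upstream: Series, downstream: Series,
                        w_up: float = 0.5) -> Series:
    """Fill gaps in `target` with a weighted mean of the upstream and downstream
    stations at the same time step. The weight w_up is a hypothetical stand-in
    for the upstream/downstream hydrological relationship. A gap is left
    unfilled when either neighbor is also missing."""
    filled: Series = []
    for t, up, down in zip(target, upstream, downstream):
        if t is None and up is not None and down is not None:
            filled.append(w_up * up + (1.0 - w_up) * down)
        else:
            filled.append(t)
    return filled


# Illustrative data: one spurious spike and one gap at the middle station.
flow = [10.0, 9999.0, None, 12.0]
up = [9.0, 10.0, 11.0, 11.0]
down = [11.0, 12.0, 13.0, 13.0]

cleaned = remove_outliers(flow, low=0.0, high=500.0)
filled = fill_from_neighbors(cleaned, up, down)
```

Removing outliers first and interpolating second means that discarded values are refilled from the neighboring stations rather than left as gaps.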
[0024] Driving factor screening: Calculate the maximum information coefficient between the cumulative rainfall at each time scale and the current flow. Sort the cumulative rainfall series from largest to smallest maximum information coefficient and select the top 2-3 series. Use these series, along with the previous 1-3 days' flow and water level, or the previous flow, water level, and temperature, as input to the CNN-LSTM hybrid model, which outputs a flow prediction. For example, construct 15-day, 7-day, 5-day, and 3-day cumulative rainfall time series from the original rainfall data, analyze the correlation of each series with the current day's flow, and select the two series with the highest correlation.
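The screening step can be illustrated with the sketch below. It builds rolling cumulative rainfall at several window lengths and ranks them by a dependence score against flow. The score used here, `approx_mic`, is a simplified stand-in for the maximum information coefficient: it takes the largest normalized mutual information over equal-frequency grids rather than optimizing the grid partition as the full MIC algorithm does. All function names and the synthetic data are assumptions for illustration.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def cumulative_rainfall(rain: Sequence[float], window: int) -> List[float]:
    """Rolling sum over the current day and the preceding window-1 days
    (partial sums are used for the first window-1 days)."""
    out, running = [], 0.0
    for i, r in enumerate(rain):
        running += r
        if i >= window:
            running -= rain[i - window]
        out.append(running)
    return out


def _rank_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Equal-frequency bin index for each value, based on its rank.
    Tied values are split by position, which is acceptable for a sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def approx_mic(x: Sequence[float], y: Sequence[float]) -> float:
    """Largest normalized mutual information over small equal-frequency grids."""
    n = len(x)
    max_cells = max(4, int(n ** 0.6))  # grid-size budget commonly used for MIC
    best = 0.0
    for nx in range(2, max_cells // 2 + 1):
        for ny in range(2, max_cells // nx + 1):
            bx, by = _rank_bins(x, nx), _rank_bins(y, ny)
            joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
            mi = sum(c / n * math.log(c * n / (px[a] * py[b]))
                     for (a, b), c in joint.items())
            best = max(best, mi / math.log(min(nx, ny)))
    return best


def rank_rainfall_scales(rain, flow, windows=(3, 5, 7, 15),
                         top_k=2) -> Tuple[List[int], Dict[int, float]]:
    """Score each cumulative-rainfall window against flow; return the top_k."""
    scores = {w: approx_mic(cumulative_rainfall(rain, w), flow) for w in windows}
    return sorted(scores, key=scores.get, reverse=True)[:top_k], scores


# Synthetic check: flow is driven exactly by 3-day cumulative rainfall.
_rng = random.Random(0)
rain = [_rng.random() * 20.0 for _ in range(120)]
flow = [2.0 * c for c in cumulative_rainfall(rain, 3)]
top, scores = rank_rainfall_scales(rain, flow)
```

On this synthetic series the 3-day window ranks first, because flow was constructed from it. With real data, a tested MIC implementation would be preferable to this simplified score.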
[0025] A CNN-LSTM hybrid model is constructed, employing an encoder-decoder architecture, comprising an encoder and a decoder. The encoder includes: an input layer that receives a multi-dimensional time series [T, F], where T is the time step and F is the number of features (the number of filtered factors + historical flow); convolutional layers that use multiple convolutional kernels of different widths (7-day, 5-day, 3-day) to extract local temporal patterns in parallel, extracting features such as the cumulative response time of different rainfall periods; a max-pooling layer to reduce the temporal dimension and enhance feature invariance; and an LSTM layer to extract long-term bidirectional temporal dependencies, memorizing the correlation between rainfall and water flow. The decoder includes: an attention mechanism layer that dynamically focuses on key periods in the historical input (such as periods of heavy rainfall) when predicting each future time step; a forecast layer that progressively generates flow sequences for future time steps; and a flow range output layer that uses the Monte Carlo Dropout method to perform multiple forward propagations and output the interval distribution of the predicted flow.
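The parallel multi-width convolution and max-pooling stages of the encoder can be illustrated without a deep learning framework. In the sketch below, fixed moving-average kernels stand in for learned convolution weights, and the LSTM and attention layers are omitted; the function names and the pooling size are assumptions.

```python
from typing import Dict, List, Sequence


def conv1d_valid(series: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1-D convolution without padding: one output per full kernel placement."""
    k = len(kernel)
    return [sum(series[i + j] * kernel[j] for j in range(k))
            for i in range(len(series) - k + 1)]


def max_pool(series: Sequence[float], size: int) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(series[i:i + size])
            for i in range(0, len(series) - size + 1, size)]


def multi_scale_features(rain: Sequence[float], widths=(3, 5, 7),
                         pool: int = 2) -> Dict[int, List[float]]:
    """Apply one kernel per width in parallel, then pool each branch."""
    features = {}
    for w in widths:
        kernel = [1.0 / w] * w  # fixed averaging kernel, not a trained filter
        features[w] = max_pool(conv1d_valid(rain, kernel), pool)
    return features


# A single rainfall pulse on day 3 of a 10-day window.
rain = [0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
feats = multi_scale_features(rain)
```

Wider kernels spread the same rainfall pulse over more output steps, which is one way the branches capture different cumulative response times.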
[0026] Training the CNN-LSTM hybrid model includes constructing training samples using a sliding window method, with 75% of the data used as the training set and 25% as the validation set. The loss function employs a combination of quantile loss functions, simultaneously optimizing mean prediction and the 90% confidence interval. The model with the best performance on the validation set is saved.
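A minimal sketch of the sample construction and loss described above, under stated assumptions: the split is taken chronologically (the description gives only the 75/25 ratio), the quantile losses are summed with equal weight, and all function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], Sequence[float]]


def sliding_windows(series: Sequence[float], n_in: int, n_out: int) -> List[Sample]:
    """Pair each n_in-step input window with the n_out steps that follow it."""
    samples = []
    for start in range(len(series) - n_in - n_out + 1):
        x = series[start:start + n_in]
        y = series[start + n_in:start + n_in + n_out]
        samples.append((x, y))
    return samples


def chronological_split(samples: List[Sample], train_fraction: float = 0.75):
    """Earlier samples for training, later samples for validation."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], q: float) -> float:
    """Quantile (pinball) loss: under-prediction costs q per unit,
    over-prediction costs 1 - q per unit."""
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        diff = yt - yp
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(y_true)


def combined_quantile_loss(y_true: Sequence[float],
                           preds_by_quantile: Dict[float, Sequence[float]]) -> float:
    """Equal-weight sum of pinball losses, one term per predicted quantile."""
    return sum(pinball_loss(y_true, pred, q)
               for q, pred in preds_by_quantile.items())


samples = sliding_windows(list(range(10)), n_in=3, n_out=2)
train, valid = chronological_split(samples)
```

For the 0.9 quantile, predicting 8 when the observation is 10 costs 1.8, while predicting 12 costs only 0.2. This asymmetry pushes the 0.9-quantile output toward the upper part of the flow distribution.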
[0027] Using a pre-trained CNN-LSTM hybrid model, inputting future weather forecast data, the CNN-LSTM hybrid model outputs a flow sequence for the next 3 days. The range of water flow is then determined based on the 10th, 50th, and 90th quantiles.
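Deriving the flow range from the quantiles can be sketched as below. It assumes that, for each forecast day, a list of sampled flows is already available (for example from repeated stochastic forward passes of the trained model); the function name and the sample values are illustrative.

```python
import statistics
from typing import Dict, List, Sequence


def flow_range(samples_per_step: Sequence[Sequence[float]]) -> List[Dict[str, float]]:
    """For each forecast step, summarize sampled flows by their
    10th, 50th and 90th percentiles."""
    ranges = []
    for samples in samples_per_step:
        # n=10 returns the nine decile cut points: index 0 is the 10th
        # percentile, index 4 the 50th, index 8 the 90th.
        deciles = statistics.quantiles(samples, n=10, method="inclusive")
        ranges.append({"q10": deciles[0], "q50": deciles[4], "q90": deciles[8]})
    return ranges


# Hypothetical sampled flows (m^3/s) for a 3-day forecast horizon.
sampled = [
    [float(v) for v in range(0, 11)],
    [float(v) for v in range(10, 21)],
    [float(v) for v in range(20, 31)],
]
bands = flow_range(sampled)
```

Each entry of `bands` gives the lower, central and upper flow for one forecast day, which are the three scenarios later passed to the hydrodynamic model.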
[0028] Dynamic threshold verification and adaptive updates are implemented, using relative bias to analyze forecast accuracy, while different tolerance thresholds are set based on the flood season / non-flood season. When seven consecutive forecasts exceed the threshold, the CNN-LSTM hybrid model restarts training based on the latest input data.
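The retraining trigger can be sketched as a run-length check on the relative deviation. The tolerance values below are hypothetical, since the description states only that the thresholds differ between flood and non-flood seasons; the function names are also assumptions.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical seasonal tolerances; the source does not give numeric values.
THRESHOLDS: Dict[str, float] = {"flood": 0.20, "non_flood": 0.10}


def relative_deviation(predicted: float, observed: float) -> float:
    """Absolute error as a fraction of the observation (observed must be nonzero)."""
    return abs(predicted - observed) / abs(observed)


def needs_retraining(records: Iterable[Tuple[float, float, str]],
                     thresholds: Dict[str, float] = THRESHOLDS,
                     run_length: int = 7) -> bool:
    """True once `run_length` consecutive forecasts exceed their season's tolerance.
    Each record is (predicted flow, observed flow, season label)."""
    streak = 0
    for predicted, observed, season in records:
        if relative_deviation(predicted, observed) > thresholds[season]:
            streak += 1
            if streak >= run_length:
                return True
        else:
            streak = 0  # an acceptable forecast resets the run
    return False


bad_week = [(130.0, 100.0, "flood")] * 7
interrupted = [(130.0, 100.0, "flood")] * 6 + [(101.0, 100.0, "flood")] \
    + [(130.0, 100.0, "flood")] * 6
```

Here `needs_retraining(bad_week)` is true, while the interrupted sequence never reaches seven exceedances in a row and does not trigger retraining.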
[0029] The Hec-Ras model was constructed based on accurate river topographic data to establish a two-dimensional hydrodynamic model, coupled with a water quality module, including convection-diffusion equations and reaction kinetic equations for various water quality indicators. The main parameters calibrated included diffusion coefficient and degradation rate.
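For orientation, a generic depth-averaged form of the convection-diffusion equation with first-order reaction kinetics is given below. It is a textbook formulation offered as an assumption about the kind of equation meant, not a statement of the exact equations solved by HEC-RAS. Here C is the concentration of a water quality indicator, u and v are the depth-averaged velocities, D_x and D_y are the diffusion coefficients, k is the degradation rate, and S is a source term.

```latex
\frac{\partial C}{\partial t}
  + u \frac{\partial C}{\partial x}
  + v \frac{\partial C}{\partial y}
  = \frac{\partial}{\partial x}\!\left( D_x \frac{\partial C}{\partial x} \right)
  + \frac{\partial}{\partial y}\!\left( D_y \frac{\partial C}{\partial y} \right)
  - k\,C + S
```

In this form the two calibrated parameters named above correspond to D_x, D_y (diffusion coefficient) and k (degradation rate).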
[0030] The flow sequences output by the CNN-LSTM hybrid model are used as the input boundary conditions of the HEC-RAS model, which is run separately with the 10th, 50th, and 90th quantile flows. The water quality simulations under these three conditions yield the forecasts of the water quality indicators.
[0031] Hec-Ras model training: Real-time dynamic data correction is performed using the Ensemble Kalman Filter (EnKF) data assimilation method. Assimilation is performed whenever new real-time water quality monitoring data is received. This assimilation applies to all state variables in the HEC-RAS model, i.e., the pollutant concentrations in each grid cell. Ensemble forecasting is run, generating the covariance matrix of multiple model states for the forecast ensemble. Combining observational data, the state of each ensemble member is updated using the Kalman filter formula, and the updated ensemble mean is used as the optimal estimate for further forecasting. This reduces forecast bias caused by initial condition errors and parameter uncertainties. The Hec-Ras model output results represent water quality numerical ranges based on different water level scenarios, and the Ensemble Kalman Filter (EnKF) data assimilation method is used to correct and reduce computational bias in real time.
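The analysis step can be sketched for the simplest case, in which one grid cell is observed at a time. This is a stochastic (perturbed-observation) EnKF written in plain Python; the function name, the two-cell state, and the numeric values are assumptions for illustration and are not tied to any HEC-RAS interface.

```python
import random
from statistics import mean
from typing import List


def enkf_update(ensemble: List[List[float]], obs_index: int, obs_value: float,
                obs_var: float, rng: random.Random) -> List[List[float]]:
    """Update every ensemble member with one scalar observation of the
    concentration in cell `obs_index`."""
    n, n_state = len(ensemble), len(ensemble[0])
    hx = [member[obs_index] for member in ensemble]  # predicted observations
    hx_mean = mean(hx)
    state_mean = [mean(m[i] for m in ensemble) for i in range(n_state)]
    var_hx = sum((h - hx_mean) ** 2 for h in hx) / (n - 1)
    # Sample covariance between each state variable and the observed one.
    cov = [sum((m[i] - state_mean[i]) * (h - hx_mean)
               for m, h in zip(ensemble, hx)) / (n - 1)
           for i in range(n_state)]
    gain = [c / (var_hx + obs_var) for c in cov]  # Kalman gain per cell
    updated = []
    for member, h in zip(ensemble, hx):
        # Each member assimilates its own noisy copy of the observation.
        perturbed = obs_value + rng.gauss(0.0, obs_var ** 0.5)
        updated.append([member[i] + gain[i] * (perturbed - h)
                        for i in range(n_state)])
    return updated


# Two correlated cells; the prior ensemble is centered near 2.0 mg/L in cell 0.
rng = random.Random(42)
prior = []
for _ in range(50):
    base = rng.gauss(2.0, 0.5)
    prior.append([base, 0.8 * base + rng.gauss(0.0, 0.05)])

posterior = enkf_update(prior, obs_index=0, obs_value=3.0, obs_var=0.01, rng=rng)
prior_mean = [mean(m[i] for m in prior) for i in range(2)]
post_mean = [mean(m[i] for m in posterior) for i in range(2)]
```

The observation in cell 0 pulls the ensemble mean toward 3.0, and the unobserved cell 1 is also corrected through its sample covariance with cell 0. The updated ensemble mean then serves as the estimate for the next forecast.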
[0032] Feedback Correction: Establish information feedback between the deep learning model and the mechanistic model. When the water quality results simulated by the HEC-RAS model under real-time data dynamic correction still show a large deviation from real-time monitoring (relative deviation greater than 10% for 7 consecutive days), adjust the hydrological input, feed the deviation information back to the CNN-LSTM hybrid model, and re-evaluate the importance of the input features. Simultaneously, adjust the HEC-RAS model parameters until the adjusted relative deviation is less than 5%.
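The trigger and stopping rule of the feedback step can be sketched as follows. The trigger follows the stated criteria (relative deviation above 10% for 7 consecutive days, adjustment until below 5%). The parameter adjustment itself is a hypothetical illustration: a single degradation rate is tuned by bisection against a toy first-order decay model, which stands in for re-running the mechanistic model. The re-evaluation of input feature importance is not covered.

```python
import math
from typing import Callable, Sequence


def triggered(deviations: Sequence[float], limit: float = 0.10, days: int = 7) -> bool:
    """True if the most recent `days` daily relative deviations all exceed `limit`."""
    return len(deviations) >= days and all(d > limit for d in deviations[-days:])


def calibrate_decay_rate(simulate: Callable[[float], float], observed: float,
                         k_low: float, k_high: float,
                         target: float = 0.05, max_iter: int = 50) -> float:
    """Bisect on the degradation rate k until the relative deviation is below
    `target`. Assumes simulate(k) decreases monotonically as k increases."""
    for _ in range(max_iter):
        k = 0.5 * (k_low + k_high)
        predicted = simulate(k)
        if abs(predicted - observed) / abs(observed) < target:
            return k
        if predicted > observed:
            k_low = k   # too little decay, so raise the rate
        else:
            k_high = k  # too much decay, so lower the rate
    raise RuntimeError("no degradation rate met the target deviation")


# Toy surrogate: 2.0 mg/L decaying for 3 days, C = C0 * exp(-k * t).
def toy_model(k: float) -> float:
    return 2.0 * math.exp(-k * 3.0)


observed = toy_model(0.2)  # pretend the true rate is 0.2 per day
recent = [0.12, 0.15, 0.11, 0.13, 0.14, 0.12, 0.16]
k_fit = calibrate_decay_rate(toy_model, observed, 0.0, 1.0) if triggered(recent) else None
```

In practice each call to `simulate` would be a full model run, so the number of iterations matters far more than it does in this toy setting.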
[0033] The forecast results are visualized and output, generating a water quality concentration distribution map. The spatial distribution of water quality and graded early warnings for different future periods are displayed on the geographic information system platform. Based on the water quality category, six levels of water quality early warnings are issued: blue (Class I), cyan (Class II), green (Class III), yellow (Class IV), orange (Class V), and red (worse than Class V).
[0034] This case study collects daily rainfall, temperature, flow, and water level data for a watershed over the past 10 years. The data are preprocessed and the driving factors are screened. Training samples are then constructed and the CNN-LSTM (convolutional neural network and long short-term memory network) model is trained; the relative deviation on the validation set is less than 5%. Once the set deviation threshold is passed, hydrological forecast data are output. Based on the collected data and the CNN-LSTM simulation results, training samples for the Hec-Ras model are constructed. Once the set relative deviation threshold is passed, the trained Hec-Ras model is integrated with real-time rainfall data. The system runs every 12 hours and outputs water quality prediction range data for the next 3 days. At the same time, a water quality concentration distribution map is generated, and the spatial distribution of water quality and graded early warnings for different future periods are displayed on a geographic information system platform. Six levels of water quality warnings are issued based on water quality category: blue (Class I), cyan (Class II), green (Class III), yellow (Class IV), orange (Class V), and red (worse than Class V). When the water quality results simulated by HEC-RAS under real-time dynamic data correction still show a significant deviation from real-time monitoring (relative deviation greater than 10% for 7 consecutive days), the hydrological input is adjusted, and the deviation information is fed back to the CNN-LSTM model to re-evaluate the importance of the input features. Simultaneously, the HEC-RAS model parameters are adjusted until the adjusted relative deviation is less than 5%.
[0035] A computer-readable storage medium storing at least one instruction, which is loaded and executed by a processor to implement the water quality forecasting method based on deep learning and the Hec-Ras model as described above.
[0036] An electronic device includes a processor and a memory, the memory storing at least one instruction, which is loaded and executed by the processor to implement the water quality forecasting method based on deep learning and the Hec-Ras model as described above.
[0037] Furthermore, it should be understood that after reading the above description of the present invention, those skilled in the art can make various alterations or modifications to the present invention, and these equivalent forms also fall within the scope defined by the appended claims.
Claims
1. A water quality forecasting method based on deep learning and the Hec-Ras model, characterized in that it comprises: the maximum information coefficients between the cumulative rainfall at each of several time scales and the current flow rate are calculated respectively; the cumulative rainfall series are arranged from largest to smallest according to the maximum information coefficients; the top-ranked cumulative rainfall series are selected and used, together with the previous flow rate and water level, or the previous flow rate, water level and temperature, as input to the CNN-LSTM model; the CNN-LSTM model outputs the flow rate prediction; the CNN-LSTM model is trained and optimized with a combination of quantile loss functions; the Hec-Ras model is adopted, and the flow prediction output by the CNN-LSTM model and historical water quality data are used as inputs to output water quality predictions; the Hec-Ras model uses the ensemble Kalman filter data assimilation method for real-time dynamic data correction; when the deviation between the water quality prediction output by the Hec-Ras model and the measured value exceeds the set limit, the deviation information is fed back to the CNN-LSTM model, the cumulative rainfall series at several time scales are re-selected as the input of the CNN-LSTM model according to the aforementioned process, the CNN-LSTM model is adjusted, and the parameters of the Hec-Ras model are adjusted at the same time, until the deviation between the water quality prediction output by the Hec-Ras model and the measured value is within the set range.
2. The water quality forecasting method based on deep learning and the Hec-Ras model according to claim 1, characterized in that the data collected for use in the CNN-LSTM and Hec-Ras models are preprocessed as follows: for missing data, spatial interpolation is used to supplement the missing data in combination with the hydrological relationship between upstream and downstream areas; outliers and data with obvious deviations are removed by setting a pre-set threshold.
3. The water quality forecasting method based on deep learning and the Hec-Ras model according to claim 1, characterized in that the CNN-LSTM model consists of an encoder and a decoder.
4. The water quality forecasting method based on deep learning and the Hec-Ras model according to claim 3, characterized in that the encoder includes: an input layer that receives a multidimensional time series [T,F], where T is the time step and F is the number of features; convolutional layers that use multiple convolutional kernels of different widths to extract local temporal patterns in parallel; max pooling layers that reduce the time dimension and enhance feature invariance; and an LSTM layer that extracts long-term bidirectional temporal dependencies, remembering the correlation between rainfall and flow.
5. The water quality forecasting method based on deep learning and the Hec-Ras model according to claim 3, characterized in that the decoder includes: an attention mechanism layer that allows the model to dynamically focus on key periods in the historical input when predicting each future time step; a forecast layer that progressively generates flow sequences for future time steps; and a flow range output layer that performs multiple forward propagations using the Monte Carlo Dropout method to output the interval distribution of the flow prediction.
6. The water quality forecasting method based on deep learning and the Hec-Ras model according to claim 1, characterized in that the CNN-LSTM model uses the 10th, 50th, and 90th percentiles to determine the range of water flow.
7. A computer-readable storage medium, characterized in that the storage medium stores at least one instruction, which is loaded and executed by a processor to implement the water quality forecasting method based on deep learning and the Hec-Ras model as described in any one of claims 1 to 6.
8. An electronic device, characterized in that the electronic device includes a processor and a memory, the memory storing at least one instruction, which is loaded and executed by the processor to implement the water quality forecasting method based on deep learning and the Hec-Ras model as described in any one of claims 1 to 6.