A surface water anomaly identification and cause reasoning method based on a large model
The large-model-based anomaly identification and causal reasoning method addresses the separation between anomaly identification and cause analysis in surface water environmental monitoring systems, automates the full process and enables efficient intelligent analysis, and improves both the accuracy of anomaly identification and the interpretability of causal reasoning.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA NAT ENVIRONMENTAL MONITORING CENT
- Filing Date
- 2026-03-19
- Publication Date
- 2026-06-16
AI Technical Summary
Existing surface water environmental monitoring systems suffer from problems such as insufficient adaptability, fragmented analysis, inadequate utilization of multi-source data, and lack of interpretability in the reasoning process in anomaly identification and cause analysis, making it difficult to achieve a holistic understanding and efficient analysis of complex environmental scenarios.
A large-scale model-based method for surface water anomaly identification and causal reasoning is adopted. Monitoring data and auxiliary data are acquired and preprocessed. A spatiotemporal correlation graph is constructed using an anomaly identification large-scale model and a dynamic graph neural network. Combined with a causal reasoning large-scale model, multi-dimensional reasoning and evidence organization are carried out to generate anomaly causal analysis results and evidence chains.
It has achieved full automation of the surface water anomaly identification and causal reasoning process, improved the intelligence level of analysis and decision support capabilities, reduced reliance on experts, and enhanced the accuracy of anomaly identification and the interpretability of causal reasoning.
Figure CN122222016A_ABST
Description
Technical Field
[0001] This application relates to the field of environmental monitoring technology, and in particular to a method for identifying and reasoning about the causes of surface water anomalies based on a large model.
Background Technology
[0002] Existing surface water environment monitoring systems typically collect water quality-related environmental indicator data continuously through automatic monitoring stations and analyze and issue early warnings based on fixed thresholds, statistical rules, or simple models. For example, in the field of surface water environment monitoring, the determination of whether the water quality at a monitoring section is abnormal is usually based on water quality category criteria, single-factor evaluation methods, or fixed threshold rules.
[0003] In some more advanced systems, techniques based on time series analysis, regression analysis, or machine learning models (such as cluster analysis and outlier detection algorithms) have been introduced to improve the automation of anomaly identification. In addition, some research and applications have attempted to combine meteorological data, pollution source inventories, or upstream-downstream cross-sectional relationships to assist in the analysis of anomaly results; however, overall, manual analysis and expert experience remain the primary methods.
[0004] In practical applications, although the above technologies can achieve automatic discovery of abnormal data to a certain extent, their analysis process is usually decentralized and rule-driven, lacking the ability to understand the overall business scenarios in complex environments, and making it difficult to achieve systematic reasoning about the causes of anomalies.
[0005] Existing methods for anomaly identification and analysis in surface water environmental monitoring have the following main shortcomings: 1. The anomaly identification methods are limited and lack adaptability. Existing methods mostly rely on fixed thresholds, statistical rules, or single models, which are poorly adaptable to different regions, seasons, and site characteristics, and are prone to false alarms or missed alarms.
[0006] 2. Disconnect between anomaly identification and causal analysis. Existing systems can typically only identify whether the monitoring data is abnormal, while causal analysis after an anomaly occurs still relies on human experts to make a comprehensive judgment based on information such as meteorological conditions, pollution source distribution, and upstream and downstream relationships. This results in low analysis efficiency and poor consistency of results.
[0007] 3. Insufficient utilization of multi-source data. Although the monitoring system already has multi-source information such as meteorological data, pollution source data, and geospatial data, existing technologies mostly rely on simple correlation or manual querying, making it difficult to achieve a unified understanding and collaborative analysis of multi-source information.
[0008] 4. Lack of interpretable reasoning process. While some methods based on machine learning or statistical models can output anomaly detection results, they cannot clearly provide the logical path and supporting evidence for the formation of anomalies, making it difficult to meet the requirements of environmental regulation for traceable and interpretable analysis processes.
[0009] The root cause of the above-mentioned defects is that existing technologies are mainly based on rules or single models for local analysis, and lack the overall modeling capability for the semantics and complex causal relationships of surface water environmental monitoring.
[0010] Therefore, the urgent technical problems to be solved at present are the fragmentation between anomaly identification and cause analysis in existing surface water environmental monitoring technologies, low analysis efficiency, insufficient utilization of multi-source data, and lack of interpretability in the reasoning process.
Summary of the Invention
[0011] The purpose of this application is to provide a method for identifying and reasoning about the causes of surface water anomalies based on a large model, so as to realize the intelligent identification of anomalies in surface water environmental monitoring data, and automatically complete multi-dimensional reasoning and evidence organization of the causes of anomalies based on anomaly identification, thereby improving the level of intelligence of ecological and environmental supervision and decision support capabilities.
[0012] To achieve the above objectives, this application provides a method for surface water anomaly identification and causal reasoning based on a large model. The method includes: acquiring surface water environmental monitoring data and auxiliary data related to monitoring operations; preprocessing the surface water environmental monitoring data and auxiliary data related to monitoring operations; identifying anomalies and anomaly features based on a pre-constructed anomaly identification large model; in response to the identification of anomalies, constructing a spatiotemporal correlation graph between watershed monitoring stations based on a dynamic graph neural network; and using the spatiotemporal correlation graph between watershed monitoring stations as constraints, performing causal reasoning on the anomalies through the causal reasoning large model to generate anomaly causal analysis results and corresponding evidence chains.
[0013] The surface water anomaly identification and causal reasoning method based on large models, as described above, further includes: constructing an interpretable causal analysis report that includes evidence chains, causal reasoning processes, and candidate causal confidence levels.
[0014] The surface water anomaly identification and causal reasoning method based on large models described above includes the following preprocessing steps for surface water environmental monitoring data and auxiliary data related to monitoring operations: time alignment, missing value handling, and initial screening of outliers for surface water environmental monitoring data and auxiliary data related to monitoring operations; and multimodal feature fusion for surface water environmental monitoring data and auxiliary data related to monitoring operations to construct a unified high-dimensional feature representation.
[0015] In the surface water anomaly identification and causal reasoning method based on a large model described above, performing anomaly identification on the preprocessed surface water environmental monitoring data and auxiliary data related to monitoring operations based on the pre-constructed anomaly identification large model, so as to identify anomalous events and anomaly features, includes: the anomaly identification large model comprehensively analyzing the temporal variation characteristics, spatial distribution characteristics, and historical patterns of the preprocessed surface water environmental monitoring data and auxiliary data related to monitoring operations to automatically identify anomalous events in the surface water environmental monitoring data; and outputting structured anomaly feature vectors for the identified anomalous events.
[0016] The surface water anomaly identification and causal reasoning method based on a large model, as described above, involves the anomaly identification large model comprehensively analyzing the temporal variation characteristics, spatial distribution characteristics, and historical patterns of preprocessed surface water environmental monitoring data and auxiliary data related to monitoring operations. This automatically identifies anomalous events in the surface water environmental monitoring data by: calculating temporal variation characteristic data based on the preprocessed surface water environmental monitoring data; extracting spatial distribution characteristic data based on the preprocessed surface water environmental monitoring data and auxiliary data related to monitoring operations; performing historical pattern analysis on the preprocessed surface water environmental monitoring data to obtain historical pattern characteristic data; calculating a comprehensive anomaly value based on the temporal variation characteristic data, spatial distribution characteristic data, and historical pattern characteristic data; and comparing the comprehensive anomaly value with a preset anomaly threshold. If the comprehensive anomaly value is less than the preset anomaly threshold, the surface water environmental monitoring is considered to be in a normal state; otherwise, the temporal variation characteristic data, spatial distribution characteristic data, and historical pattern characteristic data are input into an anomaly event classification decision tree to obtain the type of anomaly event.
[0017] The surface water anomaly identification and causal reasoning method based on a large model, as described above, includes the following steps in response to the identification of anomaly events: constructing a spatiotemporal correlation graph among watershed monitoring stations based on a dynamic graph neural network. These steps involve: identifying the monitoring stations with anomalies based on the anomaly events; constructing a spatiotemporal correlation graph among watershed monitoring stations using all monitoring stations within the watershed containing the anomaly monitoring station as nodes; designating two adjacent monitoring stations connected by river channels as associated monitoring stations; establishing edge connections between the nodes corresponding to the associated monitoring stations in the spatiotemporal correlation graph; calculating the hydraulic connection strength between the associated monitoring stations; using this hydraulic connection strength as the weight of the edges in the spatiotemporal correlation graph; and updating the hydraulic connection strength of the associated monitoring stations in the spatiotemporal correlation graph in real time.
[0018] In the surface water anomaly identification and causal reasoning method based on a large model described above, using the spatiotemporal correlation graph between watershed monitoring stations as a constraint and performing causal reasoning on anomaly events through the causal reasoning large model, so as to generate anomaly cause analysis results and corresponding evidence chains, includes: selecting, from the associated monitoring stations in the spatiotemporal correlation graph, the monitoring stations whose hydraulic connection strength is greater than a preset threshold, to form a set of inference analysis stations; inputting the basic information of the anomaly event, the anomaly features, the comprehensive anomaly value, the spatiotemporal correlation graph, and the auxiliary data related to monitoring operations into the causal reasoning large model; and the causal reasoning large model tracing the source of the anomaly event and inferring its cause under the constraints of the spatiotemporal correlation graph.
[0019] The surface water anomaly identification and causal reasoning method based on the large model described above, wherein the causal reasoning large model, under the constraint of the spatiotemporal correlation graph, performs source tracing and causal inference of anomalous events, including: Centered on the anomaly monitoring stations, only monitoring stations with hydraulic connections are retained in the spatiotemporal correlation diagram; the causal reasoning model traces the source upstream and verifies the diffusion downstream based on the spatiotemporal correlation diagram; the causal reasoning model uses the hydraulic conduction time in the spatiotemporal correlation diagram as a basis, calls the pre-constructed causal reasoning process to determine the cause of the anomaly; obtains multi-dimensional objective evidence, generates an evidence chain, and verifies the anomaly cause analysis results through the evidence chain.
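The upstream source tracing described above can be illustrated with a small graph traversal. This is a minimal sketch under stated assumptions, not the application's dynamic graph neural network: the graph is a plain dictionary of directed edges (upstream station, downstream station) weighted by a hydraulic connection strength whose calculation is not given here, and the function name `trace_upstream`, the station labels, and the strength values are all hypothetical.

```python
from collections import deque


def trace_upstream(edges, anomaly_station, min_strength):
    """Return upstream stations reachable from anomaly_station via strong edges.

    edges maps (upstream, downstream) pairs to a hydraulic connection strength.
    Only edges whose strength exceeds min_strength are followed.
    """
    # Reverse adjacency list: downstream station -> list of upstream stations.
    upstream_of = {}
    for (up, down), strength in edges.items():
        if strength > min_strength:
            upstream_of.setdefault(down, []).append(up)

    # Breadth-first search against the flow direction.
    visited = []
    seen = {anomaly_station}
    queue = deque([anomaly_station])
    while queue:
        station = queue.popleft()
        for up in upstream_of.get(station, []):
            if up not in seen:
                seen.add(up)
                visited.append(up)
                queue.append(up)
    return visited


# Hypothetical watershed: A -> B -> C -> E, with a weak tributary D -> C.
edges = {("A", "B"): 0.9, ("B", "C"): 0.7, ("D", "C"): 0.2, ("C", "E"): 0.8}
print(trace_upstream(edges, "C", min_strength=0.5))  # ['B', 'A']
```

Station D is dropped because its connection strength falls below the threshold, which mirrors the retention of only hydraulically connected stations. Downstream diffusion verification would follow the same pattern with the edge direction reversed.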
[0020] The surface water anomaly identification and causal reasoning method based on large models described above includes the following pre-constructed causal reasoning process: constructing a set of candidate causes for anomalous events; matching the evidence conditions of each candidate cause according to the occurrence time of the anomalous event, and filtering out matching candidate causes; calculating the confidence level of the filtered candidate causes, sorting the filtered candidate causes according to their confidence levels, forming a causal reasoning result, and outputting the candidate cause with the highest confidence level.
[0021] The surface water anomaly identification and causal reasoning method based on large models described above includes upstream input type, local generation type and regional diffusion type in the candidate causal set.
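The pre-constructed causal reasoning process (candidate set, evidence matching, confidence ranking) can be sketched as follows. The three candidate types come from the text above; everything else is an assumption made for illustration, in particular the confidence measure (the fraction of a candidate's evidence conditions that are satisfied) and the names `Candidate` and `rank_candidates`.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str              # candidate cause type
    evidence_matched: int  # evidence conditions satisfied at the event time
    evidence_total: int    # evidence conditions defined for this cause (> 0)


def rank_candidates(candidates, min_matched=1):
    """Filter out unmatched candidates and sort the rest by confidence."""
    # Keep only candidates with at least min_matched satisfied conditions.
    matched = [c for c in candidates if c.evidence_matched >= min_matched]
    # Assumed confidence: share of satisfied evidence conditions.
    scored = [(c.name, c.evidence_matched / c.evidence_total) for c in matched]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


candidates = [
    Candidate("local generation", 1, 4),
    Candidate("upstream input", 3, 4),
    Candidate("regional diffusion", 0, 3),
]
ranking = rank_candidates(candidates)
print(ranking[0])  # ('upstream input', 0.75)
```

The first element of the ranking is the candidate cause with the highest confidence; the full list is kept so that a report can show the alternatives and their confidence levels.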
[0022] The beneficial effects achieved by this application are as follows: (1) This application realizes intelligent analysis of surface water anomaly identification and cause reasoning, automating the entire process from data input to cause report output and eliminating manual breaks between stages.
[0023] (2) The anomaly identification big model of this application comprehensively analyzes the temporal change characteristics, spatial distribution characteristics and historical patterns of the preprocessed surface water environment monitoring data and the auxiliary data related to the monitoring business, automatically identifies abnormal events in the surface water environment monitoring data, and automatically completes multi-dimensional reasoning through the big model, reducing the reliance on experts.
[0024] (3) This application improves the accuracy of calculating the hydraulic connection strength of associated monitoring stations by updating the spatiotemporal correlation diagram in real time. Calculating the hydraulic connection strength between associated monitoring stations can quantify the hydraulic transmission capacity and influence between stations, realize the reasonable association and dynamic linkage of multiple monitoring stations, thereby improving the accuracy and reliability of watershed anomaly identification, pollution source tracing, and spatial anomaly analysis, and providing key support for real-time water environment monitoring and intelligent early warning.
[0025] (4) Under the constraints and guidance of the spatiotemporal correlation diagram, the big model of causal reasoning in this application intelligently reasons about the source, propagation path and scope of influence of abnormal events, and constructs a chain of evidence. The chain of evidence is used to verify the accuracy of causal reasoning and improve the interpretability and credibility of causal reasoning.
Attached Figure Description
[0026] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this application. For those skilled in the art, other drawings can be obtained based on these drawings.
[0027] Figure 1 is a flowchart illustrating a method for identifying and reasoning about the causes of surface water anomalies based on a large model, according to an embodiment of this application.
[0028] Figure 2 is a flowchart illustrating a pre-constructed causal reasoning process in an embodiment of this application.
Detailed Implementation
[0029] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without creative effort are within the scope of protection of this application.
[0030] As shown in Figure 1, this application provides a method for surface water anomaly identification and causal reasoning based on a large model. The method includes the following steps: Step S1: Obtain surface water environmental monitoring data and auxiliary data related to monitoring operations.
[0031] Among them, surface water environment monitoring data are hourly or daily monitoring index data of monitoring stations (or sections), including: conventional five parameters (pH, dissolved oxygen, conductivity, turbidity, water temperature), nutrient indicators (ammonia nitrogen, total phosphorus, total nitrogen), organic pollution indicators (permanganate index, chemical oxygen demand), etc.
[0032] Among them, the auxiliary data related to monitoring operations include one or more of the following: meteorological data (including wind direction, wind speed, rainfall, temperature, etc.), pollution source data (including pollution source location, type and emission characteristics), and geospatial data (including station location, upstream and downstream relationships within the watershed, or administrative division information).
[0033] Step S2 involves preprocessing the surface water environment monitoring data and auxiliary data related to the monitoring operations.
[0034] Specifically, surface water environment monitoring data and auxiliary data related to monitoring operations are preprocessed to form structured input data for large-scale model analysis.
[0035] Large models refer to high-capacity deep learning models, usually constructed from deep neural networks, whose parameter counts range from hundreds of millions to hundreds of billions or more.
[0036] Step S2 includes: Step S210 involves performing time alignment, missing value processing, and initial screening of outliers on surface water environmental monitoring data and auxiliary data related to monitoring operations.
[0037] Step S220: Multimodal feature fusion is performed on surface water environment monitoring data and auxiliary data related to monitoring operations to construct a unified high-dimensional feature representation.
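Step S210 could be sketched as follows. This is a minimal illustration rather than the application's implementation: it assumes readings are keyed by an integer hour index, fills gaps by linear interpolation (nearest value at the edges), and screens outliers with a simple physical-range check. The function names `align_and_fill` and `screen_outliers` and the sample values are hypothetical.

```python
def align_and_fill(readings, start, end):
    """Align readings to every hour in [start, end] and fill missing hours.

    readings maps an integer hour index to a measured value.
    Interior gaps are linearly interpolated; edge gaps take the nearest value.
    """
    known = sorted(h for h in readings if start <= h <= end)
    if not known:
        raise ValueError("no readings in the requested window")
    series = []
    for h in range(start, end + 1):
        if h in readings:
            series.append(readings[h])
            continue
        before = [k for k in known if k < h]
        after = [k for k in known if k > h]
        if before and after:
            lo, hi = before[-1], after[0]
            frac = (h - lo) / (hi - lo)
            series.append(readings[lo] + frac * (readings[hi] - readings[lo]))
        else:
            nearest = before[-1] if before else after[0]
            series.append(readings[nearest])
    return series


def screen_outliers(series, low, high):
    """Return the positions of values outside the plausible range [low, high]."""
    return [i for i, v in enumerate(series) if not (low <= v <= high)]


# Hourly pH readings with hour 2 missing and an implausible value at hour 4.
ph = {0: 7.0, 1: 7.2, 3: 7.6, 4: 15.0}
series = align_and_fill(ph, 0, 4)
print([round(v, 2) for v in series])   # [7.0, 7.2, 7.4, 7.6, 15.0]
print(screen_outliers(series, 0, 14))  # [4]
```

Flagged positions are only an initial screen; whether a flagged reading is a sensor fault or a genuine event is left to the later anomaly identification step.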
[0038] Step S3: Based on the pre-built anomaly identification model, perform anomaly identification on the pre-processed surface water environmental monitoring data and auxiliary data related to monitoring operations to identify abnormal events and abnormal features.
[0039] Among them, the pre-built anomaly identification model learns the dynamic evolution of surface water environment monitoring data at multiple time scales (hours, days, seasons) through a hierarchical Transformer architecture or a continuous-time neural network.
[0040] As a specific embodiment of the present invention, the pre-constructed large-scale anomaly recognition model includes a data input module, a feature extraction module, a feature fusion module, an inference prediction module, and a parameter optimization module. Each module is functionally independent and works collaboratively. Specifically, the data input module is used to receive raw input data and perform preprocessing; the feature extraction module is used to perform feature mining and filtering on the preprocessed data; the feature fusion module is used to fuse the extracted multi-dimensional features; the inference prediction module is used to output the target result based on the fused features; and the parameter optimization module is used to dynamically adjust and optimize the parameters during the model training process.
[0041] The system comprises the following modules: a data input module with two layers (data cleaning and normalization); a feature extraction module with three layers (convolutional, pooling, and fully connected), where convolutional layers extract local features, pooling layers reduce feature dimensionality while retaining key information, and fully connected layers map local features to global features; a feature fusion module with two layers (feature alignment and weight allocation), where the feature alignment layer aligns features of different dimensions and the weight allocation layer assigns weights to different features to highlight key features; an inference and prediction module with two layers (activation and output), where the activation layer introduces non-linear mapping and the output layer outputs the final prediction result; and a parameter optimization module with two layers (loss calculation and gradient descent), where the loss calculation layer calculates the deviation between the prediction and the actual result, and the gradient descent layer adjusts the model parameters based on the deviation.
[0042] The data input module's output is directly connected to the feature extraction module's input, and all preprocessed data from the data input module is transmitted to the feature extraction module for feature extraction. The feature extraction module's output is connected to the feature fusion module's input, and the multi-dimensional features output by the feature extraction module are transmitted to the feature fusion module for fusion processing. The feature fusion module's output is connected to the inference prediction module's input, and the fused features are transmitted to the inference prediction module for inference prediction. The inference prediction module's output is connected to the parameter optimization module's input, and the prediction results and bias data output by the inference prediction module are transmitted to the parameter optimization module. The parameter optimization module's output is connected to the inputs of the feature extraction module, feature fusion module, and inference prediction module, respectively, and the adjusted parameters are fed back to the corresponding modules to achieve dynamic optimization of model parameters. All connections between modules use wired data transmission to ensure the stability and real-time performance of data transmission.
[0043] The training steps of the large-scale anomaly detection model are as follows:
Step T1, data preparation: collect raw data, remove abnormal and missing data through the cleaning layer of the data input module, and normalize the data to the [0,1] interval through the normalization layer to obtain preprocessed data; then divide the preprocessed data into a training set and a validation set in an 8:2 ratio.
Step T2, model initialization: assign initial parameters to the feature extraction module, feature fusion module, and inference prediction module. The initial parameters are randomly initialized within the range [-0.01, 0.01].
Step T3, model training: input the training set data into the initialized model, extract data features through the feature extraction module, fuse them through the feature fusion module, and output prediction results through the inference prediction module.
Step T4, parameter adjustment: the loss calculation layer of the parameter optimization module calculates the loss value between the predicted results and the true results on the training set; the gradient descent layer calculates the gradient of each module's parameters from the loss value; and the parameters of each module are adjusted according to the preset step size to minimize the loss value.
Step T5, iterative training: repeat steps T3 to T4 until the loss value converges to the preset threshold or the number of iterations reaches the preset maximum, then stop training.
Step T6, model validation: input the validation set data into the trained model to verify its prediction accuracy. If the accuracy does not meet the preset standard, return to step T4 to adjust the parameters and iterate again until the accuracy requirement is met, yielding the final trained model.
The key parameters during model training are as follows:
- Data preparation: the normalization range is [0,1], and the ratio of training set to validation set is 8:2.
- Feature extraction module: the convolutional kernel size is 3×3, the convolution stride is 1, and the padding method is "same"; the pooling kernel size is 2×2, and the pooling stride is 2.
- Feature fusion module: the initial weight coefficients of the weight allocation layer are [0.3, 0.4, 0.3], which can be dynamically adjusted during training.
- Inference and prediction module: the activation layer uses the ReLU activation function, and the output layer uses the Sigmoid activation function.
- Parameter optimization module: the loss function is the cross-entropy loss function, the gradient descent step size is 0.001, the loss value convergence threshold is 0.0001, and the maximum number of iterations is 1000.
- Training process: the batch size is 32, the learning rate decay coefficient is 0.95, and the decay period is 50 iterations.
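The numeric settings above can be made concrete with a few helper functions. This is a sketch under assumptions, not the application's training code: it assumes min-max scaling for the [0,1] normalization, a sequential 8:2 split, a stepwise decay in which the step size is multiplied by 0.95 every 50 iterations, and a stopping rule that treats "converges to the threshold" as "loss at or below 0.0001". All function names are hypothetical.

```python
def min_max_normalize(values):
    """Scale values linearly into the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def split_train_val(samples, train_fraction=0.8):
    """Split samples into training and validation sets (8:2 by default)."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


def step_size(iteration, base=0.001, decay=0.95, period=50):
    """Gradient descent step size with stepwise decay every `period` iterations."""
    return base * decay ** (iteration // period)


def should_stop(loss, iteration, loss_threshold=0.0001, max_iterations=1000):
    """Stop when the loss reaches the threshold or the iteration cap is hit."""
    return loss <= loss_threshold or iteration >= max_iterations


train, val = split_train_val(list(range(10)))
print(len(train), len(val))            # 8 2
print(min_max_normalize([2, 4, 6]))    # [0.0, 0.5, 1.0]
print(step_size(0), step_size(50))
```

Under these assumptions the step size after the full 1000 iterations would be 0.001 × 0.95^20, roughly 36% of its initial value.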
[0044] Step S3 includes: Step S310: The anomaly identification large model comprehensively analyzes the temporal variation characteristics, spatial distribution characteristics, and historical patterns of the preprocessed surface water environment monitoring data and auxiliary data related to monitoring operations, and automatically identifies abnormal events in the surface water environment monitoring data.
[0045] Step S310 includes: Step S311: Calculate time variation characteristic data based on the pre-processed surface water environmental monitoring data.
[0046] Specifically, the time-varying characteristic data includes: the mean, standard deviation, variance, difference from the standard value, rate of change, maximum value, and minimum value of the monitoring indicators.
[0047] Define the time feature vector as: T = [t1, t2, ..., tm]; where t1, t2, ..., tm represent the 1st to m-th time-varying feature data, respectively.
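The time-varying feature data listed above can be computed from a window of readings with the standard library. This is a minimal sketch with several assumptions labeled in the comments: population (not sample) statistics are used, "difference from the standard value" is taken as the latest reading minus a regulatory limit, and "rate of change" is taken as the difference between the last two readings. The function name `temporal_features` and the sample numbers are hypothetical.

```python
import statistics


def temporal_features(window, standard_value):
    """Compute time-varying features for one monitoring indicator.

    window is a list of at least two consecutive readings, oldest first.
    """
    if len(window) < 2:
        raise ValueError("need at least two readings")
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),          # assumption: population std
        "variance": statistics.pvariance(window),  # assumption: population variance
        # Assumption: deviation of the latest reading from the standard limit.
        "diff_from_standard": window[-1] - standard_value,
        # Assumption: change per sampling step between the last two readings.
        "rate_of_change": window[-1] - window[-2],
        "max": max(window),
        "min": min(window),
    }


# Hypothetical ammonia nitrogen readings (mg/L) against a limit of 3.0 mg/L.
features = temporal_features([1.0, 2.0, 3.0, 4.0], standard_value=3.0)
print(features["mean"], features["variance"])  # 2.5 1.25
```

Ordering the dictionary values consistently across time steps gives one possible realization of the vector T = [t1, t2, ..., tm].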
[0048] Step S312: Extract spatial distribution characteristic data based on the preprocessed surface water environment monitoring data and auxiliary data related to monitoring operations.
[0049] Specifically, the extracted spatial distribution features include: spatial autocorrelation, hotspots, pollution propagation paths, and spatial gradient features.
[0050] Define the spatial distribution feature vector: K = [k1, k2, ..., kn]; where k1, k2, ..., kn represent the 1st to nth spatial distribution feature data respectively.
[0051] Step S313: Perform historical pattern analysis on the preprocessed surface water environmental monitoring data to obtain historical pattern characteristic data.
[0052] Specifically, through reconstruction error, prediction deviation, and contrastive learning, the deviation between predicted and actual values and the deviation between the surface water environmental monitoring data and historical patterns are obtained, which serve as historical pattern feature data.
[0053] Define the historical pattern feature vector: H = [h1, h2, ..., hp]; where h1, h2, ..., hp represent the 1st to p-th historical pattern feature data, respectively.
[0054] Step S314: Calculate the comprehensive outlier value based on the time variation characteristic data, spatial distribution characteristic data, and historical pattern characteristic data.
[0055] The comprehensive anomaly value is calculated from three component scores (the combining formula is not reproduced in this text): $S_T$ denotes the anomaly score based on time-varying characteristics; $S_K$ denotes the anomaly score based on spatial distribution characteristics; and $S_H$ denotes the anomaly score based on historical pattern features.
[0056] $S_T = \sum_{i=1}^{m} w_i^{T} z_i^{T}$; where $z_i^{T} = \frac{\left| t_i(t) - \mu_i^{T} \right|}{\sigma_i^{T}}$; where $m$ represents the total number of categories of time-varying feature data; $w_i^{T}$ denotes the influence weight of the $i$-th time-varying feature data on the anomaly score; $t_i(t)$ denotes the measured value of the $i$-th time-varying feature data at time $t$; $\mu_i^{T}$ denotes the mean of the $i$-th time-varying feature data over a period of time (e.g., 30 days, 60 days, etc.) before time $t$; and $\sigma_i^{T}$ denotes the standard deviation of the $i$-th time-varying feature data over that period before time $t$.
[0057] $S_K = \sum_{j=1}^{n} w_j^{K} z_j^{K}$; where $z_j^{K} = \frac{\left| k_j(t) - \mu_j^{K} \right|}{\sigma_j^{K}}$; where $n$ represents the total number of categories of spatial distribution feature data; $w_j^{K}$ denotes the influence weight of the $j$-th spatial distribution feature data on the anomaly score; $k_j(t)$ denotes the measured value of the $j$-th spatial distribution feature data at time $t$; $\mu_j^{K}$ denotes the mean of the $j$-th spatial distribution feature data over a period of time (e.g., 30 days, 60 days, etc.) before time $t$; and $\sigma_j^{K}$ denotes the standard deviation of the $j$-th spatial distribution feature data over that period before time $t$.
[0058] $S_H = \sum_{l=1}^{p} w_l^{H} z_l^{H}$; where $z_l^{H} = \frac{\left| h_l(t) - \mu_l^{H} \right|}{\sigma_l^{H}}$; where $p$ represents the total number of categories of historical pattern feature data; $w_l^{H}$ denotes the influence weight of the $l$-th historical pattern feature data on the anomaly score; $h_l(t)$ denotes the measured value of the $l$-th historical pattern feature data at time $t$; $\mu_l^{H}$ denotes the mean of the $l$-th historical pattern feature data over a period of time (e.g., 30 days, 60 days, etc.) before time $t$; and $\sigma_l^{H}$ denotes the standard deviation of the $l$-th historical pattern feature data over that period before time $t$.
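The symbol definitions above (per-feature weight, measured value at time t, and the mean and standard deviation over a preceding period) suggest a weighted standardized deviation. The sketch below assumes exactly that form, a weighted sum of absolute z-scores, which is an interpretation rather than a formula confirmed by the text; the function name `component_score` and the sample numbers are hypothetical. The same function would serve for the time-varying, spatial distribution, and historical pattern scores.

```python
import statistics


def component_score(current, history, weights):
    """Weighted sum of absolute z-scores for one group of features.

    current: feature values at time t, one per feature category.
    history: for each feature, its values over the preceding period.
    weights: influence weight of each feature on the anomaly score.
    """
    score = 0.0
    for x, past, w in zip(current, history, weights):
        mu = statistics.fmean(past)
        sigma = statistics.pstdev(past)  # assumption: population std
        if sigma == 0:
            # A constant history gives no scale to standardize against; skip it.
            continue
        score += w * abs(x - mu) / sigma
    return score


# Two hypothetical features: the first jumps to 10 against a history around 5.
current = [10.0, 5.0]
history = [[4.0, 6.0, 4.0, 6.0], [5.0, 5.0, 5.0, 5.0]]
weights = [0.6, 0.4]
print(component_score(current, history, weights))
```

In this example the first feature sits five standard deviations from its historical mean and contributes 0.6 × 5 to the score, while the constant second feature is skipped.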
[0059] Step S315: Compare the comprehensive anomaly value with the preset anomaly threshold. If the comprehensive anomaly value is less than the preset anomaly threshold, the surface water environment monitoring is in a normal state. Otherwise, input the time change characteristic data, spatial distribution characteristic data, and historical pattern characteristic data into the anomaly event classification decision tree to obtain the type of anomaly event.
[0060] Specifically, the steps for identifying the type of anomaly using an anomaly classification decision tree include: Based on the anomaly scores for time-varying features, spatial distribution features, and historical pattern features, it is determined whether the anomaly conditions corresponding to various types of anomalies are met. If the anomaly conditions corresponding to a certain type of anomaly are met, then that type of anomaly is output as the identification result.
[0061] The anomaly condition for a sudden increase or decrease in indicators is that the time-varying anomaly score exceeds the time-varying dynamic threshold while the spatial distribution anomaly score does not exceed the spatial distribution dynamic threshold. When this condition holds, the monitoring indicators show strong dynamic anomalies in the time series (sudden changes, trend deviations, periodic disruptions, etc.), while the spatial dimension is basically normal: the anomaly shows no obvious spatial propagation or clustering and is relatively isolated in space. Examples include sudden pollution events at a single site, such as illegal discharge from a local sewage outlet or a localized accidental leak.
[0062] The anomaly condition for spatial coordination anomalies or local anomalies is that the spatial distribution anomaly score exceeds the spatial distribution dynamic threshold. This indicates that the spatial dimension is significantly abnormal and that the anomaly shows strong spatial clustering, propagation, or synergy beyond the scope of a single point.
[0063] The anomaly condition for persistent exceedance is that the historical pattern anomaly score exceeds the historical pattern dynamic threshold. This indicates a significant deviation in the historical dimension: the current state differs significantly from historical periods, trends, or distributions and exceeds the normal fluctuation range.
[0064] As a specific embodiment of the present invention, the time-varying dynamic threshold, the spatial distribution dynamic threshold, and the historical pattern dynamic threshold are updated periodically and are not fixed values, thereby adapting to changes in data, avoiding the failure of various thresholds under various abnormal conditions, reducing the false alarm rate, and improving the accuracy of abnormal event detection.
[0065] As a specific embodiment of the present invention, time change feature data, spatial distribution feature data, and historical pattern feature data are acquired within a sampling period during which conditions are known to be normal, and the time-varying dynamic threshold, the spatial distribution dynamic threshold, and the historical pattern dynamic threshold are calculated from the acquired data.
[0066] In the formula for the time-varying dynamic threshold, the terms denote: the mean of each category of time-varying feature data within the sampling period; the influence weight of that category on the dynamic threshold; the adjustment coefficient (which can be 2, 2.5, or 3, depending on the strictness of the threshold); the standard deviation of that category within the sampling period; and the total number of categories of time-varying feature data.
[0067] In the formula for the spatial distribution dynamic threshold, the terms denote: the mean of each category of spatial distribution feature data within the sampling period; the standard deviation of that category within the sampling period; the influence weight of that category on the dynamic threshold; and the total number of categories of spatial distribution feature data.
[0068] In the formula for the historical pattern dynamic threshold, the terms denote: the mean of each category of historical pattern feature data within the sampling period; the standard deviation of that category within the sampling period; the influence weight of that category on the dynamic threshold; and the total number of categories of historical pattern feature data.
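The threshold definitions above name a mean, a standard deviation, a per-category weight, and an adjustment coefficient of 2, 2.5, or 3. One plausible reading is a weighted "mean plus k standard deviations" rule. The sketch below implements that reading only. The exact formula is not reproduced in the text, and the function name `dynamic_threshold`, the feature names, and the sample values are assumptions.

```python
from statistics import mean, pstdev

def dynamic_threshold(samples, weights, k=2.5):
    """Assumed form: sum over categories of weight * (mean + k * std)."""
    return sum(
        weights[name] * (mean(values) + k * pstdev(values))
        for name, values in samples.items()
    )

# Hypothetical normal-condition samples collected within one sampling period.
samples = {"rate_of_change": [0.0, 2.0], "trend_deviation": [1.0, 1.0]}
weights = {"rate_of_change": 0.5, "trend_deviation": 0.5}
threshold = dynamic_threshold(samples, weights, k=2.0)
```

Recomputing the threshold for each new sampling period is what makes it dynamic; a larger `k` gives a looser threshold and fewer alarms.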
[0069] As a specific embodiment of the present invention, abnormal events that meet none of the above anomaly conditions are treated as complex anomalies and require manual review.
[0070] These abnormal events include, but are not limited to: (1) Abnormalities of sudden increase or decrease in indicators: One or more indicators deviate significantly from the normal baseline in a short period of time; (2) Persistent exceedance type of abnormality: The indicator deviates from the standard limit or the level of the same period in history for a long time, showing a trend of deterioration or periodic recurrence; (3) Spatial coordination anomalies or local anomalies: chain anomalies that propagate across sites, regional contiguous anomalies, or isolated anomalies at a single site.
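The classification step can be sketched as a small rule cascade. This assumes each anomaly condition is a comparison between one anomaly score and its dynamic threshold, and that the conditions are checked in the order shown. Neither the order nor the strictness of the inequalities is stated in the text, and the function name and labels are hypothetical.

```python
def classify_anomaly(s_time, s_space, s_hist, t_time, t_space, t_hist):
    """Map three anomaly scores and their dynamic thresholds to an anomaly type."""
    # Strong temporal anomaly with a basically normal spatial dimension.
    if s_time > t_time and s_space <= t_space:
        return "sudden_increase_or_decrease"
    # Significant spatial clustering, propagation, or synergy.
    if s_space > t_space:
        return "spatial_coordination_or_local"
    # Significant deviation from historical patterns.
    if s_hist > t_hist:
        return "persistent_exceedance"
    # No condition met: hand over to manual review.
    return "complex_anomaly"

label = classify_anomaly(3.0, 1.0, 0.5, t_time=2.0, t_space=2.0, t_hist=2.0)
```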
[0071] Step S320: Output a structured anomaly feature vector for the identified abnormal events.
[0072] As a specific embodiment of the present invention, for the identified abnormal events, a structured abnormal feature vector containing time features, spatial features, index features, historical pattern deviations, and multi-source consistency verification is output to provide input for subsequent causal reasoning.
[0073] As a specific embodiment of the present invention, the structured anomaly feature vector includes: time features (occurrence time, duration, rate of change), spatial features (range of influence, direction of propagation), index features (multiple of exceedance, synergistic pattern), and contextual features (related meteorological conditions, pollution source activities), forming an anomaly feature vector for causal reasoning.
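A structured anomaly feature vector of this kind could be represented as a simple record type. The sketch below is illustrative only: the class name, field names, units, and sample values are assumptions chosen to mirror the four feature groups listed above.

```python
from dataclasses import dataclass, asdict

@dataclass
class AnomalyFeatureVector:
    # Time features
    occurrence_time: str
    duration_hours: float
    rate_of_change: float
    # Spatial features
    affected_stations: list
    propagation_direction: str
    # Index features
    exceedance_multiple: float
    synergistic_indicators: list
    # Contextual features
    weather: str
    pollution_source_activity: str

# Made-up example values, not taken from the source.
vector = AnomalyFeatureVector(
    occurrence_time="2026-01-01T08:00",
    duration_hours=6.0,
    rate_of_change=0.8,
    affected_stations=["S1", "S2"],
    propagation_direction="downstream",
    exceedance_multiple=2.4,
    synergistic_indicators=["ammonia_nitrogen", "total_phosphorus"],
    weather="heavy rainfall",
    pollution_source_activity="unknown",
)
record = asdict(vector)  # plain dict, ready to serialize as model input
```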
[0074] Step S4: In response to the identification of abnormal events, a spatiotemporal correlation graph between watershed monitoring stations is constructed based on a dynamic graph neural network.
[0075] Step S4 includes: Step S410: In response to the identification of an abnormal event, the monitoring station with the abnormality is determined based on the abnormal event. Using all monitoring stations in the watershed where the abnormal monitoring station is located as nodes, a spatiotemporal correlation graph among the monitoring stations in the watershed is constructed based on a dynamic graph neural network.
[0076] Based on anomalous events, monitoring stations with anomalies are identified. Using all monitoring stations within the watershed where the anomalous monitoring station is located as nodes, and the hydraulic connections, geographical connections, and temporal propagation relationships between the stations as edges, a spatiotemporal relationship graph is constructed to represent the potential propagation paths and spatiotemporal relationships of anomalies within the watershed. A dynamic graph neural network is then used to learn the hydraulic connections and pollution propagation paths of the monitoring stations, improving the accuracy of surface water anomaly identification and causal reasoning.
[0077] Step S420: Two adjacent monitoring stations connected by the river channel are designated as associated monitoring stations, and edge connections are established between the nodes corresponding to the associated monitoring stations in the spatiotemporal association graph.
[0078] The spatiotemporal correlation graph consists of multiple nodes, each corresponding to a monitoring station. Two adjacent monitoring stations connected by a river are defined as associated monitoring stations. The two nodes corresponding to two adjacent monitoring stations connected by a river are linked in the spatiotemporal correlation graph by a straight line, forming an edge. The two nodes corresponding to two monitoring stations not connected by a river are not linked in the spatiotemporal correlation graph. The spatiotemporal correlation graph is used to describe the possible propagation paths of anomalies.
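As a minimal sketch of this graph structure, the stations can be stored as keys of an adjacency map, with an edge added only for station pairs that are adjacent along a river. The function name and station identifiers are hypothetical, and the dynamic graph neural network that learns over this structure is not shown.

```python
def build_station_graph(stations, river_links):
    """Return an adjacency map with one node per station and one edge per river link."""
    graph = {station: set() for station in stations}
    for a, b in river_links:
        graph[a].add(b)
        graph[b].add(a)
    return graph

# Stations A-B-C lie along one river; D has no river link to the others.
graph = build_station_graph(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
```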
[0079] Step S430: Calculate the hydraulic connection strength of the associated monitoring stations, use the hydraulic connection strength as the weight of the edge in the spatiotemporal association graph, and update the hydraulic connection strength of the associated monitoring stations in the spatiotemporal association graph in real time.
[0080] Specifically, based on hydraulic connection characteristic data (such as runoff distance and flow velocity), geographical straight-line distance, river width, and water level at the monitoring stations, the hydraulic connection strength of the associated monitoring stations is calculated and dynamically updated. By updating the hydraulic connection strength of associated monitoring stations in the spatiotemporal correlation graph in real time, this application improves the accuracy of the calculated hydraulic connection strength. Calculating the hydraulic connection strength between associated monitoring stations quantifies the hydraulic transmission capacity and influence between stations, enabling reasonable association and dynamic linkage of multiple monitoring stations. This improves the accuracy and reliability of watershed anomaly identification, pollution source tracing, and spatial anomaly analysis, providing crucial support for real-time water environment monitoring and intelligent early warning.
[0081] The hydraulic connection strength is calculated for a pair of associated monitoring stations in which one station is located downstream of the other.
[0082] The quantities entering the calculation are: the hydraulic connection strength between the two monitoring stations; the river connectivity coefficient between them, whose value depends on whether the river channels between the two stations are connected (for example, gate closure, dam blockage, or flow interruption leaves the channels unconnected); the weight of the influence of the runoff distance of the connecting river channel on the hydraulic connection strength; the weight of the influence of geographical straight-line distance; the weight of the influence of river width; the weight of the influence of the water level difference between the associated monitoring stations; the runoff distance of the connecting river channel between the two stations; the flow velocity of the water between them; the geographical straight-line distance between them; the width of the connecting river channel; the water level at each of the two stations; and the attenuation coefficient of the water system to which the two stations belong. The attenuation coefficient differs between types of water system.
[0083] As a specific embodiment of the present invention, a separate attenuation coefficient is assigned according to whether the water system to which the two monitoring stations belong is: a mountain river / mountain stream; a plain river / small or medium-sized river; a plain river network area; an urban inland river / scenic waterway; an urban drainage network / culvert; or a lake and reservoir area (still water environment).
[0084] By calculating the hydraulic connection strength between monitoring stations, this application determines which stations are truly correlated and distinguishes stations with strong hydraulic connections from those that are geographically close but have no water flow relationship. This keeps unrelated stations out of the anomaly analysis and reduces false associations, false warnings, and false tracing. Based on the upstream and downstream relationship, conduction time, and connection strength, it is possible to infer which station or river section the anomaly most likely comes from, upgrading single-point alarms to basin-wide linked localization. When a gate is opened or closed, the flow velocity changes, the water level rises or falls, silt accumulates, or the flow is interrupted, the hydraulic connection strength between monitoring stations is updated in real time, so that an accurate association relationship is always maintained.
[0085] It can be understood that the greater the strength of the hydraulic connection, the closer the hydrological response between monitoring stations, and the higher the possibility of anomaly propagation and the degree of influence. The smaller the strength of the hydraulic connection, the weaker the mutual influence between monitoring stations. Based on the spatio-temporal association graph, anomaly tracing, anomaly propagation deduction, multi-station linkage verification, and spatial anomaly region identification can be achieved.
[0086] Step S440: Regularly update the hydraulic conduction time between two monitoring stations in the spatio-temporal association graph.
[0087] Specifically, the hydraulic conduction time is equal to the result of dividing the runoff distance of the connected river channel between two monitoring stations by the water flow velocity.
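The conduction time rule is a single division. The sketch below assumes metres and metres per second as units, which the text does not specify; the function name is hypothetical.

```python
def hydraulic_conduction_time(runoff_distance_m, flow_velocity_m_s):
    """Travel time in seconds along the connecting river channel."""
    if flow_velocity_m_s <= 0:
        # Assumption: no forward flow means no finite conduction time.
        raise ValueError("flow velocity must be positive")
    return runoff_distance_m / flow_velocity_m_s

# 12 km of channel at 0.5 m/s gives 24 000 s, i.e. about 6.7 hours.
travel_time_s = hydraulic_conduction_time(12_000.0, 0.5)
```

Because flow velocity changes with discharge and gate operation, recomputing this value on a schedule keeps the edge attribute in the graph current.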
[0088] Step S5: Using the spatio-temporal association graph between basin monitoring stations as a constraint, conduct cause inference on abnormal events through a cause inference large model to generate an abnormal cause analysis result and the corresponding evidence chain.
[0089] By way of explanation, under the constraint and guidance of the spatio-temporal association graph, the cause inference large model reasons about the source, propagation path, and influence range of abnormal events and constructs an evidence chain. For the identified abnormal events, the large model further combines meteorological conditions (rainfall, temperature, wind speed), pollution source distribution, upstream and downstream relationships, and historical case information to infer the possible causes of the anomaly, yielding an abnormal cause analysis result and the corresponding evidence chain. The abnormal cause analysis result and the corresponding evidence chain are used for environmental supervision warning, analysis, and decision support.
[0090] Among them, the module composition, hierarchical structure, inter-module connection relationship, and training method of the cause inference large model and the anomaly recognition large model are the same.
[0091] Step S5 includes: Step S510: Compare the hydraulic connection strength of the associated monitoring stations in the spatio-temporal association graph with a preset threshold, and select the monitoring stations whose hydraulic connection strength with the abnormal station is greater than the preset threshold to form a set of inference analysis stations.
[0092] Understandably, monitoring stations with a hydraulic connection strength greater than a preset threshold with the abnormal stations are selected, namely upstream, downstream, and surrounding stations with a strong hydraulic connection with the abnormal stations, forming a set of stations for inference analysis, and excluding interference from unrelated stations.
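The selection in step S510 amounts to a threshold filter over connection strengths. In the sketch below, the function name, station identifiers, strengths, and threshold value are all made up for illustration.

```python
def select_inference_stations(strength_to_abnormal, threshold):
    """Keep stations whose hydraulic connection strength exceeds the threshold."""
    return {
        station
        for station, strength in strength_to_abnormal.items()
        if strength > threshold
    }

# Hypothetical strengths between the abnormal station and its neighbors.
strengths = {"upstream_1": 0.82, "downstream_1": 0.64, "nearby_lake": 0.10}
inference_set = select_inference_stations(strengths, threshold=0.5)
```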
[0093] Step S520: Input the basic information of the abnormal event, abnormal characteristics, comprehensive abnormal value, spatiotemporal correlation diagram, and auxiliary data related to the monitoring business into the causal reasoning model.
[0094] In step S530, the causal reasoning model, under the constraints of the spatiotemporal correlation diagram, performs source tracing and causal inference on abnormal events.
[0095] Specifically, the causal reasoning model determines whether the cause of an abnormal event is local or upstream transmission, and determines the path and scope of the abnormal propagation.
[0096] Step S530 includes: Step S531: Centered on the anomaly monitoring stations, only the monitoring stations with hydraulic connections are retained in the spatiotemporal correlation diagram.
[0097] Specifically, centered on the abnormal monitoring stations, only monitoring stations with hydraulic connections are retained in the spatiotemporal correlation graph; that is, monitoring stations that are directly or indirectly connected to the abnormal stations are retained, thereby excluding unconnected monitoring stations. Under the constraints of the spatiotemporal correlation graph, the causal reasoning model automatically excludes areas without hydraulic connections, closed gates, flow interruptions, and disconnected river sections.
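Retaining only the stations that are directly or indirectly connected to the abnormal station corresponds to a reachability search on the graph. The sketch below uses breadth-first search and assumes that edges for closed gates or interrupted flow have already been removed from the adjacency map. The function name and station identifiers are hypothetical.

```python
from collections import deque

def reachable_stations(graph, abnormal_station):
    """Return all stations connected to the abnormal station, including itself."""
    seen = {abnormal_station}
    queue = deque([abnormal_station])
    while queue:
        current = queue.popleft()
        for neighbor in graph.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# A-B-C form one connected reach; D-E belong to a disconnected river section.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": {"E"}, "E": {"D"}}
retained = reachable_stations(graph, "B")
```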
[0098] Step S532: The causal reasoning model traces the source upstream based on the spatiotemporal correlation diagram and verifies the diffusion downstream.
[0099] Among them, the causal reasoning big model prohibits cross-regional reasoning without hydraulic connections.
[0100] Step S533: The causal reasoning model uses the hydraulic conduction time in the spatiotemporal correlation diagram as a basis to call the pre-constructed causal reasoning process to determine the cause of the abnormal event.
[0101] The hydraulic conduction time is equal to the runoff distance of the connecting river between the two monitoring stations divided by the water flow velocity.
[0102] The causes of abnormal events include: upstream input, local generation, and regional diffusion.
[0103] As shown in Figure 2, the pre-constructed causal reasoning process includes: Step T1: Construct a set of candidate causes for the abnormal event.
[0104] The candidate causal set includes upstream input type, local generation type, and regional diffusion type.
[0105] Step T2: Based on the occurrence time of the abnormal event, perform evidence condition matching on each candidate cause and filter out the matching candidate causes.
[0106] Specifically, causes that meet any of the following evidentiary conditions are selected as candidate causes for matching and then used for confidence calculation.
[0107] Evidence requirements include: First, upstream input-type matching evidence conditions: upstream strongly correlated stations (monitoring stations whose hydraulic connection strength with abnormal stations is greater than a preset threshold are selected in step S510) are abnormal first.
[0108] Second, the conditions for locally generated matching evidence are: the anomalies at the monitoring stations occur without any temporal sequence or strong correlation with other stations.
[0109] Third, the condition for regional diffusion-type matching evidence: the monitoring station first shows anomalies, and downstream strongly correlated monitoring stations subsequently show anomalies.
[0110] This invention performs evidence matching on each candidate cause, checks the necessary evidentiary conditions, and determines which candidate causes are likely to be valid. Abnormal events whose evidence meets the conditions of no candidate cause are marked as "complex anomalies" and transferred to manual review.
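The three evidence conditions of step T2 can be sketched as checks on anomaly onset times. This is an illustrative reading, not the patented procedure. It assumes onset times are comparable numbers, that "strongly correlated" stations are passed in already split into upstream and downstream lists, and that "no temporal sequence or strong correlation" means no strongly correlated station is abnormal at all. All names are hypothetical.

```python
def match_candidate_causes(station, onset, upstream, downstream):
    """Return the candidate causes whose evidence condition is met."""
    t0 = onset[station]  # onset time at the abnormal station
    matched = []
    # Upstream input: a strongly correlated upstream station became abnormal first.
    if any(s in onset and onset[s] < t0 for s in upstream):
        matched.append("upstream_input")
    # Regional diffusion: a strongly correlated downstream station became abnormal later.
    if any(s in onset and onset[s] > t0 for s in downstream):
        matched.append("regional_diffusion")
    # Local generation: no strongly correlated station shows an anomaly.
    if not any(s in onset for s in list(upstream) + list(downstream)):
        matched.append("local_generation")
    # Nothing matched: mark for manual review.
    return matched or ["complex_anomaly"]

# Station A (upstream of B) became abnormal at hour 7, B at hour 10.
causes = match_candidate_causes("B", {"A": 7, "B": 10}, ["A"], ["C"])
```

Only the matched causes would then proceed to the confidence calculation in step T3.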
[0111] Step T3: Calculate the confidence level of the selected candidate causes, sort the selected candidate causes according to their confidence levels, form the cause inference results, and output the candidate cause with the highest confidence level.
[0112] In the formula for the confidence level of a candidate cause, the terms denote: the confidence level of the candidate cause; the total number of types of surface water environmental monitoring data; the measured value of each type of surface water environmental monitoring data; the typical value of that type of data corresponding to the candidate cause; the influence weight of the measured values of surface water environmental monitoring data; the influence weight of the hydraulic connection strength; the influence weight of the hydraulic conduction time; the total number of monitoring stations strongly correlated with the current abnormal station; the hydraulic connection strength between each strongly correlated monitoring station and the current monitoring station; the measured difference in anomaly onset time between each strongly correlated monitoring station and the current monitoring station; and the hydraulic conduction time between each strongly correlated monitoring station and the current monitoring station.
[0113] Here, the current monitoring station is the station under analysis at which the anomaly occurs. It serves as the analysis center, and the strongly correlated monitoring stations are considered as the possible source or propagation range of the anomaly. For upstream input anomalies, the anomaly is transmitted to the current monitoring station, and the hydraulic connection strength between the upstream station and the current station measures the upstream station's capacity to transmit the anomaly to it; the larger the value, the greater the confidence level of the candidate cause. For regional diffusion anomalies originating at the current monitoring station, the hydraulic connection strength measures the capacity to transmit the anomaly from the current station to downstream stations; the larger the value, the greater the confidence level of the candidate cause. For locally generated causes, which have no strongly correlated monitoring stations, the terms of the confidence formula that depend on strongly correlated stations are all 0.
[0114] For example, "upstream input type" is usually accompanied by a simultaneous increase in ammonia nitrogen and total phosphorus. The confidence of the candidate causes can be evaluated by calculating the data matching degree between surface water environmental monitoring data and candidate causes, and then combining the contribution of spatial correlation to the confidence degree and the rationality of calculating hydraulic conduction time.
[0115] This invention assesses the data matching degree between the surface water environmental monitoring data and a candidate cause by comparing the measured values with the typical values corresponding to that cause. It quantifies the contribution of spatial correlation to the confidence level through the hydraulic connection strength: a stronger hydraulic connection contributes more to the confidence level and indicates a higher probability that the anomaly originates from or is transmitted by that monitoring station. It verifies the rationality of the hydraulic conduction time by comparing the measured anomaly time difference with the hydraulic conduction time.
[0116] As a specific embodiment of the present invention, the causal reasoning model compares the times when anomalies occur at each monitoring station. If an upstream strongly correlated monitoring station (monitoring stations whose hydraulic connection strength with the abnormal station is greater than a preset threshold selected in step S510) becomes abnormal first, the cause of the anomaly is determined to be upstream input. If the current monitoring station becomes abnormal first, and the downstream strongly correlated monitoring station becomes abnormal later, the cause of the anomaly is determined to be local source diffusion. If the times when anomalies occur at the monitoring stations are not sequential and there is no strong correlation with the monitoring stations, the cause of the anomaly is determined to be an independent anomaly (local generation type).
[0117] Step S534: Obtain multi-dimensional objective evidence, generate an evidence chain, and verify the anomaly cause analysis results through the evidence chain.
[0118] Among them, multi-dimensional objective evidence includes: temporal evidence (abnormal moments, duration, etc.), spatial evidence (hydraulic connection strength between monitoring stations, hydraulic conduction time and propagation path, etc.), and data evidence (measured values of surface water environment monitoring data from monitoring stations and typical values corresponding to candidate causes).
[0119] The chain of evidence includes: the conclusion layer (containing the result of the anomaly cause determination), the reasoning layer (containing the cause reasoning process), the evidence layer (containing multi-dimensional objective evidence), and the data source layer (containing the original data source tracing).
[0120] The original data sources include monitoring data sources: monitoring station, equipment number; meteorological data sources: meteorological bureau, timestamp, etc.
[0121] Specifically, the accuracy of the anomaly cause analysis results is verified by judging whether all the multi-dimensional objective evidence in the evidence layer of the evidence chain supports the anomaly cause determination result in the conclusion layer.
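The four-layer evidence chain and its verification rule can be sketched with two record types. The class names, field names, and sample evidence are assumptions. The data source layer is represented here as a `source` field on each evidence item, which is one possible layout among several.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    dimension: str             # "temporal", "spatial", or "data"
    description: str
    supports_conclusion: bool
    source: str                # data source layer: station, device, agency, timestamp

@dataclass
class EvidenceChain:
    conclusion: str            # conclusion layer: determined cause
    reasoning: list            # reasoning layer: ordered reasoning steps
    evidence: list             # evidence layer: Evidence items

    def is_verified(self):
        """True only if there is evidence and every item supports the conclusion."""
        return bool(self.evidence) and all(e.supports_conclusion for e in self.evidence)

# Made-up example chain for an upstream input conclusion.
chain = EvidenceChain(
    conclusion="upstream_input",
    reasoning=["upstream station abnormal first", "time lag matches conduction time"],
    evidence=[
        Evidence("temporal", "upstream onset 3 h earlier", True, "station A, device 01"),
        Evidence("spatial", "strong hydraulic connection A to B", True, "station graph"),
    ],
)
verified = chain.is_verified()
```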
[0122] Step S6: Construct an interpretable causal analysis report that includes the chain of evidence, the causal reasoning process, and the confidence level of candidate causes.
[0123] This application also provides a computer storage medium storing computer instructions which, when invoked, execute the method for identifying and reasoning about the causes of surface water anomalies based on a large model. The computer storage medium includes one or more program instructions, which are executed by a processor to perform that method.
[0124] The embodiments disclosed in this invention provide a computer-readable storage medium storing computer program instructions. When the computer program instructions are executed on a computer, the computer performs the above-described method for identifying and reasoning about the causes of surface water anomalies based on a large model.
[0125] This invention provides a processor for processing the above-described method for identifying and reasoning about the causes of surface water anomalies based on a large model.
[0126] In this embodiment of the invention, the processor can be an integrated circuit chip with signal processing capabilities. The processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
[0127] The various methods, steps, and logic diagrams disclosed in the embodiments of this invention can be implemented or executed. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this invention can be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software modules can reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. The processor reads information from the storage medium and, in conjunction with its hardware, completes the steps of the above methods.
[0128] The storage medium can be memory, such as volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
[0129] The non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory can be random access memory (RAM), which is used as an external cache. By way of example, but not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced Synchronous DRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
[0130] The beneficial effects achieved by this application are as follows: (1) This application realizes intelligent analysis of surface water anomaly identification and cause reasoning, automating the entire process from data input to cause report output and eliminating manual hand-off points in the process.
[0131] (2) The anomaly identification big model of this application comprehensively analyzes the temporal change characteristics, spatial distribution characteristics and historical patterns of the preprocessed surface water environment monitoring data and the auxiliary data related to the monitoring business, automatically identifies abnormal events in the surface water environment monitoring data, and automatically completes multi-dimensional reasoning through the big model, reducing the reliance on experts.
[0132] (3) This application improves the accuracy of calculating the hydraulic connection strength of associated monitoring stations by updating the spatiotemporal correlation diagram in real time. Calculating the hydraulic connection strength between associated monitoring stations can quantify the hydraulic transmission capacity and influence between stations, realize the reasonable association and dynamic linkage of multiple monitoring stations, thereby improving the accuracy and reliability of watershed anomaly identification, pollution source tracing, and spatial anomaly analysis, and providing key support for real-time water environment monitoring and intelligent early warning.
[0133] (4) Under the constraints and guidance of the spatiotemporal correlation diagram, the big model of causal reasoning in this application intelligently reasons about the source, propagation path and scope of influence of abnormal events, and constructs a chain of evidence. The chain of evidence is used to verify the accuracy of causal reasoning and improve the interpretability and credibility of causal reasoning.
[0134] In the description of this application, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of the stated features. In the description of this application, "multiple" means two or more, unless otherwise explicitly specified.
[0135] In the description of this application, the word "for example" is used to mean "used as an example, illustration, or description." Any embodiment described as "for example" in this application is not necessarily to be construed as being more preferred or advantageous than other embodiments. The following description is provided to enable any person skilled in the art to make and use the invention. Details are set forth in the following description for purposes of explanation. It should be understood that those skilled in the art will recognize that the invention can be made without using these specific details. In other instances, well-known structures and processes will not be described in detail to avoid obscuring the description of the invention with unnecessary detail. Therefore, the invention is not intended to be limited to the embodiments shown, but is consistent with the broadest scope of the principles and features disclosed in this application.
[0136] The above description is merely an embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principle of the present invention should be included within the scope of the claims of the present invention.
Claims
1. A method for identifying and reasoning about the causes of surface water anomalies based on a large model, characterized in that the method includes: acquiring surface water environment monitoring data and auxiliary data related to monitoring operations; preprocessing the surface water environment monitoring data and the auxiliary data related to monitoring operations; performing anomaly identification on the preprocessed surface water environment monitoring data and auxiliary data related to monitoring operations based on a pre-built large-scale anomaly identification model, to identify abnormal events and abnormal characteristics; in response to the identification of an abnormal event, constructing a spatiotemporal correlation graph among watershed monitoring stations based on a dynamic graph neural network; and, using the spatiotemporal correlation graph among watershed monitoring stations as a constraint, performing causal reasoning on the abnormal event with a large-scale causal reasoning model, to generate anomaly cause analysis results and a corresponding evidence chain.
2. The method for identifying and reasoning about the causes of surface water anomalies based on a large model according to claim 1, characterized in that the method further includes: constructing an interpretable cause analysis report that includes the evidence chain, the causal reasoning process, and the confidence levels of the candidate causes.
3. The method for identifying and reasoning about the causes of surface water anomalies based on a large model according to claim 1, characterized in that preprocessing the surface water environment monitoring data and the auxiliary data related to monitoring operations includes: performing time alignment, missing value processing, and initial outlier screening on the surface water environment monitoring data and the auxiliary data related to monitoring operations; and performing multimodal feature fusion on the surface water environment monitoring data and the auxiliary data related to monitoring operations to construct a unified high-dimensional feature representation.
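By way of illustration only, the first three preprocessing operations named in claim 3 might be sketched as follows. The claim does not fix any particular technique, so the regular time grid, the forward/backward fill, and the median-absolute-deviation screen are all assumptions, as are the function names. Multimodal feature fusion is not shown.

```python
from statistics import median

def align_to_grid(readings, start, step, count):
    """Place (timestamp, value) readings on a regular grid; empty slots stay None."""
    grid = [None] * count
    for ts, value in readings:
        idx = round((ts - start) / step)
        if 0 <= idx < count:
            grid[idx] = value
    return grid

def fill_missing(series):
    """Carry the last observed value forward, then back-fill any leading gap."""
    filled = list(series)
    last = None
    for i, v in enumerate(filled):
        if v is None:
            filled[i] = last
        else:
            last = v
    nxt = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled

def flag_outliers(series, k=3.0):
    """Initial screen: flag points more than k scaled MADs away from the median."""
    med = median(series)
    mad = median(abs(v - med) for v in series)
    if mad == 0:
        return [False] * len(series)
    return [abs(v - med) / (1.4826 * mad) > k for v in series]
```

A screen of this kind only marks suspicious points for later analysis; it does not decide whether an abnormal event has occurred.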
4. The method for identifying and reasoning about the causes of surface water anomalies based on a large model according to claim 1, characterized in that performing anomaly identification on the preprocessed surface water environment monitoring data and auxiliary data related to monitoring operations based on a pre-built large-scale anomaly identification model, to identify abnormal events and abnormal characteristics, includes: the large-scale anomaly identification model comprehensively analyzing the temporal variation characteristics, spatial distribution characteristics, and historical patterns of the preprocessed surface water environment monitoring data and auxiliary data related to monitoring operations, and automatically identifying abnormal events in the surface water environment monitoring data; and outputting a structured abnormal feature vector for each identified abnormal event.
5. The method for identifying and reasoning about the causes of surface water anomalies based on a large model according to claim 4, characterized in that the large-scale anomaly identification model comprehensively analyzing the temporal variation characteristics, spatial distribution characteristics, and historical patterns of the preprocessed surface water environment monitoring data and auxiliary data related to monitoring operations, and automatically identifying abnormal events in the surface water environment monitoring data, includes: calculating temporal variation characteristic data based on the preprocessed surface water environment monitoring data; extracting spatial distribution characteristic data based on the preprocessed surface water environment monitoring data and auxiliary data related to monitoring operations; performing historical pattern analysis on the preprocessed surface water environment monitoring data to obtain historical pattern characteristic data; calculating a comprehensive abnormal value based on the temporal variation characteristic data, the spatial distribution characteristic data, and the historical pattern characteristic data; and comparing the comprehensive abnormal value with a preset abnormal threshold, wherein, if the comprehensive abnormal value is less than the preset abnormal threshold, the surface water environment monitoring is determined to be in a normal state, and otherwise the temporal variation characteristic data, the spatial distribution characteristic data, and the historical pattern characteristic data are input into an abnormal event classification decision tree to obtain the type of the abnormal event.
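The threshold logic of claim 5 can be illustrated with a minimal sketch. The claim states neither how the three characteristics are combined nor the structure of the decision tree, so the weighted sum, the default weights, the threshold of 0.5, and the event type labels below are all hypothetical. Each input is assumed to be a score already scaled to [0, 1].

```python
def comprehensive_abnormal_value(temporal, spatial, historical, weights=(0.4, 0.3, 0.3)):
    """Assumed combination rule: weighted sum of the three characteristic scores."""
    w_t, w_s, w_h = weights
    return w_t * temporal + w_s * spatial + w_h * historical

def classify_event(temporal, spatial, historical):
    """Toy stand-in for the abnormal event classification decision tree.
    The labels are hypothetical and only name the dominant characteristic."""
    if temporal >= spatial and temporal >= historical:
        return "temporal_change_dominant"
    if spatial >= historical:
        return "spatial_deviation_dominant"
    return "historical_pattern_deviation"

def identify(temporal, spatial, historical, threshold=0.5):
    """Below the threshold the state is normal; otherwise the event is classified."""
    value = comprehensive_abnormal_value(temporal, spatial, historical)
    if value < threshold:
        return {"state": "normal", "value": value, "event_type": None}
    event_type = classify_event(temporal, spatial, historical)
    return {"state": "abnormal", "value": value, "event_type": event_type}
```

Note that the classifier is only reached on the "otherwise" branch, which mirrors the order of operations in the claim.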
6. The method for identifying and reasoning about the causes of surface water anomalies based on a large model according to claim 1, characterized in that, in response to the identification of an abnormal event, constructing a spatiotemporal correlation graph among watershed monitoring stations based on a dynamic graph neural network includes: in response to the identification of an abnormal event, determining the abnormal monitoring station based on the abnormal event; taking all monitoring stations in the watershed where the abnormal monitoring station is located as nodes, and constructing the spatiotemporal correlation graph among the monitoring stations in the watershed based on the dynamic graph neural network; designating two adjacent monitoring stations connected by a river channel as associated monitoring stations, and establishing an edge between the nodes corresponding to the associated monitoring stations in the spatiotemporal correlation graph; and calculating the hydraulic connection strength of the associated monitoring stations, using the hydraulic connection strength as the weight of the corresponding edge in the spatiotemporal correlation graph, and updating the hydraulic connection strength of the associated monitoring stations in the spatiotemporal correlation graph in real time.
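The graph structure described in claim 6 (stations as nodes, river-connected neighbors joined by edges, hydraulic connection strength as edge weight, weights refreshed as conditions change) might be represented as in the sketch below. It covers only the data structure, not the dynamic graph neural network. The class name, the `conditions` layout, and in particular the decay formula in `placeholder_strength` are assumptions; the claim does not define how hydraulic connection strength is calculated.

```python
import math

class StationGraph:
    """Directed graph of monitoring stations; edges point from upstream to downstream."""

    def __init__(self, stations):
        # upstream station -> {downstream station: edge weight}
        self.edges = {s: {} for s in stations}

    def connect(self, upstream, downstream, weight=0.0):
        """Add an edge between two adjacent stations joined by a river channel."""
        self.edges[upstream][downstream] = weight

    def update_weights(self, strength_fn, conditions):
        """Recompute every edge weight from the latest conditions (one call per data refresh)."""
        for up, targets in self.edges.items():
            for down in targets:
                targets[down] = strength_fn(conditions[(up, down)])

def placeholder_strength(cond):
    """Hypothetical strength in (0, 1] that decays with travel time between the stations."""
    travel_time_h = cond["distance_km"] / cond["velocity_km_h"]
    return math.exp(-travel_time_h / 24.0)
```

Passing the strength function as a parameter keeps the graph independent of whichever hydraulic formula is actually used.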
7. The method for identifying and reasoning about the causes of surface water anomalies based on a large model according to claim 1, characterized in that, using the spatiotemporal correlation graph among watershed monitoring stations as a constraint, performing causal reasoning on the abnormal event with a large-scale causal reasoning model, to generate anomaly cause analysis results and a corresponding evidence chain, includes: comparing the hydraulic connection strength between associated monitoring stations in the spatiotemporal correlation graph with a preset threshold, and selecting the monitoring stations whose hydraulic connection strength with the abnormal monitoring station is greater than the preset threshold to form a set of inference analysis stations; inputting the basic information of the abnormal event, the abnormal characteristics, the comprehensive abnormal value, the spatiotemporal correlation graph, and the auxiliary data related to monitoring operations into the large-scale causal reasoning model; and the large-scale causal reasoning model, constrained by the spatiotemporal correlation graph, performing source tracing and cause inference for the abnormal event.
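The station selection step of claim 7 reduces to a threshold filter over edge weights. The sketch below assumes the graph is stored as a nested dictionary of the form `{upstream: {downstream: strength}}` and considers only stations directly connected to the abnormal station; both choices are assumptions, and the function name is hypothetical.

```python
def inference_station_set(edges, abnormal_station, threshold):
    """Return the stations whose edge to the abnormal station has strength > threshold."""
    selected = set()
    for up, targets in edges.items():
        for down, weight in targets.items():
            if weight <= threshold:
                continue
            if up == abnormal_station:
                selected.add(down)      # strongly connected downstream neighbor
            elif down == abnormal_station:
                selected.add(up)        # strongly connected upstream neighbor
    return selected
```

For example, with edges `A -> B` (0.8), `D -> B` (0.6), and `B -> C` (0.3), an abnormal station `B`, and a threshold of 0.5, the selected set is `{"A", "D"}`.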
8. The method for identifying and reasoning about the causes of surface water anomalies based on a large model according to claim 7, characterized in that the large-scale causal reasoning model, constrained by the spatiotemporal correlation graph, performing source tracing and cause inference for the abnormal event includes: centered on the abnormal monitoring station, retaining in the spatiotemporal correlation graph only the monitoring stations that have a hydraulic connection; the large-scale causal reasoning model tracing the source upstream and verifying the diffusion downstream based on the spatiotemporal correlation graph; the large-scale causal reasoning model, using the hydraulic conduction time in the spatiotemporal correlation graph as a basis, calling a pre-constructed causal reasoning process to determine the cause of the abnormal event; and obtaining multi-dimensional objective evidence, generating an evidence chain, and verifying the anomaly cause analysis results through the evidence chain.
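One way to picture the upstream tracing in claim 8 is a breadth-first walk against the flow direction that keeps a station only when its anomaly onset agrees with the hydraulic conduction time to the abnormal station. The sketch below is an assumption-laden illustration: the input dictionaries, the tolerance of 2 hours, and the rule of stopping at the first upstream station without an anomaly are not taken from the claim. Downstream diffusion verification and the reasoning model itself are not shown.

```python
from collections import deque

def trace_upstream(travel_time_h, onset_h, abnormal_station, tolerance_h=2.0):
    """Walk upstream from the abnormal station and return (station, cumulative hours)
    for stations whose anomaly onset matches the cumulative conduction time."""
    upstream_of = {}
    for (up, down), hours in travel_time_h.items():
        upstream_of.setdefault(down, []).append((up, hours))

    consistent = []
    queue = deque([(abnormal_station, 0.0)])
    seen = {abnormal_station}
    while queue:
        station, elapsed = queue.popleft()
        for up, hours in upstream_of.get(station, []):
            if up in seen:
                continue
            seen.add(up)
            total = elapsed + hours
            onset = onset_h.get(up)
            if onset is None:
                continue  # no anomaly recorded upstream: stop tracing this branch
            expected = onset_h[abnormal_station] - total
            if abs(onset - expected) <= tolerance_h:
                consistent.append((up, total))
                queue.append((up, total))
    return consistent
```

The last station in the returned list is the farthest upstream point that is still consistent with the observed timing, which makes it a natural starting point for locating the source.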
9. The method for identifying and reasoning about the causes of surface water anomalies based on a large model according to claim 8, characterized in that the pre-constructed causal reasoning process includes: constructing a set of candidate causes for the abnormal event; matching the evidence conditions of each candidate cause based on the time of occurrence of the abnormal event, and selecting the matching candidate causes; and calculating the confidence level of each selected candidate cause, sorting the selected candidate causes by confidence level to form the causal reasoning results, and outputting the candidate cause with the highest confidence level.
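The match, score, and sort sequence of claim 9 can be sketched as follows. The claim does not say how confidence is calculated, so the fraction of satisfied evidence conditions is used here purely as an assumed stand-in, and the evidence condition names in the usage example are hypothetical.

```python
def rank_causes(candidates, evidence, min_matched=1):
    """Keep candidates with at least min_matched satisfied conditions and
    sort them by confidence (fraction of conditions satisfied), highest first."""
    ranked = []
    for cause, conditions in candidates.items():
        matched = [c for c in conditions if evidence.get(c, False)]
        if len(matched) < min_matched:
            continue  # candidate does not match the available evidence
        ranked.append((cause, len(matched) / len(conditions), matched))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked

# Usage with hypothetical evidence conditions for the three cause types.
candidates = {
    "upstream_input": ["upstream_anomaly_earlier", "onset_matches_travel_time"],
    "local_generation": ["no_upstream_anomaly", "local_discharge_record"],
    "regional_diffusion": ["multiple_stations_abnormal", "rainfall_event"],
}
evidence = {
    "upstream_anomaly_earlier": True,
    "onset_matches_travel_time": True,
    "rainfall_event": True,
}
ranked = rank_causes(candidates, evidence)
top_cause = ranked[0][0] if ranked else None
```

Returning the matched conditions alongside each score keeps the material needed to assemble an evidence chain for the top-ranked cause.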
10. The method for identifying and reasoning about the causes of surface water anomalies based on a large model according to claim 9, characterized in that the set of candidate causes includes an upstream input type, a local generation type, and a regional diffusion type.