A water pollution treatment risk early warning method based on big data processing

By using big data stream processing technology and feature fusion analysis methods, a water pollution propagation relationship diagram was constructed, which solved the problem of inconsistent structure of multi-source water environment monitoring data, realized the comprehensive expression of water quality, pollution source, hydrological and meteorological data, and improved the accuracy of pollution risk analysis and the timeliness of early warning.

CN122200940A · Pending · Publication Date: 2026-06-12 · SHANDONG HAITAI TECHNOLOGY DEVELOPMENT CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: SHANDONG HAITAI TECHNOLOGY DEVELOPMENT CO LTD
Filing Date: 2026-03-12
Publication Date: 2026-06-12

Application Information

Patent Timeline

12 Mar 2026: Application
12 Jun 2026: Publication (CN122200940A)

IPC: G08B31/00; G08B21/18; G06F16/2455; G06F16/2458; G06F16/25; G06F18/20; G06F18/22; G06F18/2415; G06F18/25; G06F18/213; G06F18/2433; G06Q10/04; G06Q50/26; G06F123/02

AI Tagging

Application Domain

Database management systems; Forecasting


AI Technical Summary

Technical Problem

Existing water pollution control and risk early warning technologies suffer from inconsistent data structures and difficulty in merging when faced with multi-source, diverse, and massive water environment monitoring data. This results in low efficiency and poor accuracy in pollution risk analysis, and a lack of modeling of the spatial propagation characteristics of pollution, leading to a lack of foresight in early warning.

Method used

Big data stream processing converts data formats and reconstructs timestamps in real time, and feature fusion analysis combines water quality, pollution source, hydrological and meteorological data. A water pollution propagation relationship graph is constructed, a trend prediction model outputs pollution risk predictions, the Pettitt mutation point (change-point) detection method identifies potential water pollution anomalies, and dynamic risk warnings are generated.

Benefits of technology

It has achieved efficient integration and standardized expression of water environment data, improved the accuracy and scientific nature of pollution risk analysis, enabled early identification of pollution risk trends, and improved the timeliness and accuracy of early warning.



Abstract

The application discloses a water pollution treatment risk early warning method based on big data processing, comprising the following steps: collecting water environment monitoring data and performing data cleaning processing; performing data format conversion and timestamp reconstruction to form a structured water environment monitoring data set; performing feature fusion processing to construct a water environment risk feature vector; constructing a water pollution propagation relationship graph; outputting a pollution risk prediction result by using a water pollution trend prediction model; identifying a mutation time in the pollution risk prediction result by using a Pettitt mutation point detection method, and identifying a monitoring section state corresponding to the mutation time as a potential water pollution abnormal event; performing probability inference processing to generate a pollution risk assessment result; constructing a dynamic risk early warning mechanism and generating corresponding level early warning information, which realizes dynamic identification and early warning of water pollution risk, and improves the accuracy of water pollution risk identification and the timeliness of early warning.


Description

Technical Field

[0001] This invention relates to the field of water environment monitoring and environmental risk early warning, and in particular to a method for early warning of water pollution control risks based on big data processing.

Background Technology

[0002] With the continuous advancement of industrialization and the sustained expansion of urbanization, large amounts of industrial wastewater, agricultural non-point source pollutants, and domestic sewage are constantly being discharged into rivers, lakes, and reservoirs, leading to increasingly prominent water pollution problems. Water pollution not only damages the structure of ecosystems and affects the self-purification capacity of water bodies, but also seriously impacts the safety of drinking water for residents, agricultural irrigation, and regional ecological security. Therefore, how to effectively monitor and provide early warning of water pollution risks has become an important research direction in the field of water environment governance.

[0003] Currently, in the field of water pollution control and risk early warning, relevant technologies typically rely on online water quality monitoring systems. These systems deploy water quality monitoring equipment at river sections, lake areas, and sewage outlets to monitor water quality indicators such as dissolved oxygen, chemical oxygen demand, ammonia nitrogen, total phosphorus, and heavy metals in real time. Statistical analysis of the monitoring data can provide some understanding of changes in water pollution. However, with the continuous expansion of water environment monitoring networks and the significant increase in the number of water quality monitoring devices, monitoring data exhibits characteristics such as diverse data sources, large data volumes, and high update frequencies. Traditional data processing methods are gradually revealing significant shortcomings when dealing with massive amounts of water environment monitoring data.

Summary of the Invention

[0004] One objective of this invention is to propose a water pollution control risk early warning method based on big data processing. This invention fully utilizes big data stream processing technology, feature fusion analysis method, pollution propagation relationship modeling method, and Pettitt mutation point detection method to achieve efficient processing and comprehensive analysis of water environment monitoring data. It has the advantages of high data processing efficiency, high accuracy in pollution risk identification, and strong timeliness in early warning response.

[0005] A water pollution control risk early warning method based on big data processing according to an embodiment of the present invention includes the following steps:
Collecting water environment monitoring data and performing data cleaning;
Based on the Apache Flink stream processing framework, performing data format conversion and timestamp reconstruction on the cleaned water environment monitoring data to form a structured water environment monitoring dataset;
Performing feature fusion processing on the structured water environment monitoring dataset to construct a water environment risk feature vector;
Constructing a water pollution propagation relationship graph by analyzing the water flow relationships between monitoring sections and the pollutant diffusion paths;
Based on the water environment risk feature vector and the water pollution propagation relationship graph, outputting pollution risk prediction results using the water pollution trend prediction model;
Performing mutation detection on the pollution risk prediction results: the Pettitt mutation point detection method is used to identify the mutation time in the pollution risk prediction results, and the monitoring section state corresponding to the mutation time is identified as a potential water pollution anomaly event;
Performing probabilistic inference on the potential water pollution anomaly events, and statistically analyzing them in conjunction with a historical pollution event database to calculate the corresponding pollution risk level and generate pollution risk assessment results;
Constructing a dynamic risk early warning mechanism based on the pollution risk assessment results: by setting graded risk early warning thresholds, corresponding early warning information is triggered for different pollution risk levels, and the early warning information is released and recorded in real time through the environmental monitoring and management platform.

[0006] Optionally, the target water area is monitored using online water quality monitoring equipment, pollution source emission monitoring equipment, hydrological monitoring equipment, and meteorological monitoring equipment to collect water environment monitoring data. The water environment monitoring data includes water quality index data, pollution source emission data, hydrological dynamic data, and meteorological environmental data. The data cleaning process includes outlier identification, missing value imputation, and time series alignment processing.

[0007] Optionally, the formation of the structured water environment monitoring dataset specifically includes: Based on the Apache Flink stream processing framework, the water environment monitoring data after data cleaning is converted into a data format. Water quality index data, pollution source discharge data, hydrological dynamic data and meteorological environmental data from different sources are mapped into a unified data field structure to obtain a standardized time series dataset. The standardized time series dataset is reconstructed using a preset time base to obtain the reconstructed timestamps. Based on the reconstructed timestamps, the standardized time series dataset is divided into time windows, and continuous time windows are constructed according to time intervals to form a time window sequence arranged in chronological order. Numerical aggregation processing is performed on water environment monitoring data that fall within the same time window, belong to the same monitoring section, and have the same data type to obtain the corresponding window aggregation value; Structured records are generated based on the reconstructed timestamps, data types, and window aggregation values to form a structured water environment monitoring dataset.

[0008] Optionally, the construction of the water environment risk feature vector specifically includes: The data records in the structured water environment monitoring dataset are divided according to data type to form water quality index data sequence, pollution source emission data sequence, hydrological dynamic data sequence, and meteorological environment data sequence. Using the reconstructed timestamp as the time index, the water quality index data sequence, pollution source emission data sequence, hydrological dynamic data sequence and meteorological environment data sequence under the same timestamp are synchronized and aligned. The window aggregation values of various types of data are combined according to a unified field order to form a multidimensional environmental state vector corresponding to the time window. The multidimensional environmental state vectors are arranged in chronological order to form a continuous environmental state time series. The environmental state time series is then extended based on a time delay embedding method to construct a delayed embedding vector of historical state information. The delayed embedding vectors are processed by kernel space mapping, and the kernel mapping function is used to transform the delayed embedding vectors into high-dimensional environmental feature representations; The high-dimensional environmental feature representation is subjected to manifold-preserving dimensionality reduction processing to extract key feature components that can reflect the trend of water environment change and pollution risk characteristics, and generate water environment risk feature vector.
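The delay embedding and kernel mapping steps in [0008] can be illustrated with a minimal sketch. The patent text does not name the kernel, the embedding dimension, or the lag, so the Gaussian (RBF) kernel, `dim`, `lag`, and `gamma` below are assumptions chosen only for illustration, and the manifold-preserving dimensionality reduction step is left out.

```python
from math import exp

def delay_embed(series, dim, lag):
    """Time delay embedding of a sequence of state vectors.

    For each time step t with enough history, concatenate
    x_t, x_{t-lag}, ..., x_{t-(dim-1)*lag} into one vector.
    """
    first = (dim - 1) * lag
    embedded = []
    for t in range(first, len(series)):
        vec = []
        for k in range(dim):
            vec.extend(series[t - k * lag])
        embedded.append(vec)
    return embedded

def rbf_kernel_matrix(vectors, gamma):
    """Pairwise Gaussian kernel values (ASSUMPTION: the source names no kernel)."""
    n = len(vectors)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
            kernel[i][j] = exp(-gamma * d2)
    return kernel

# Hypothetical environmental state vectors, one per time window:
# [ammonia nitrogen, flow velocity]
states = [[0.5, 1.2], [0.6, 1.1], [0.9, 1.0], [1.4, 0.9]]
embedded = delay_embed(states, dim=2, lag=1)
kernel = rbf_kernel_matrix(embedded, gamma=0.5)
```

Each embedded vector carries the current window and one lagged window, so the kernel matrix compares short histories rather than single snapshots.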

[0009] Optionally, the construction of the water pollution propagation relationship graph specifically includes:
Reading the structured water environment monitoring dataset and obtaining the spatial coordinate information, water flow velocity information, and water flow direction information of each monitoring section;
Using each monitoring section as a graph node to construct a node set;
Establishing graph edges based on the water flow relationships between monitoring sections and the pollutant diffusion paths to form a graph edge set;
Calculating the spatial distance between monitoring sections;
Calculating the pollution propagation weights between graph nodes based on the water flow velocity and the spatial distance between monitoring sections;
Constructing the water pollution propagation relationship graph from the node set, the graph edge set, and the corresponding pollution propagation weights.
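As a rough illustration of [0009], the sketch below builds a node set, a directed edge set, and edge weights. The patent states only that the weight depends on water flow velocity and spatial distance. The formula used here (upstream velocity divided by distance, i.e. inverse travel time) and all field names are assumptions, not the claimed calculation.

```python
from math import hypot

def build_propagation_graph(sections, flows):
    """Build nodes and weighted directed edges.

    sections: {section_id: {"xy": (x, y) in metres, "velocity": m/s}}
    flows:    list of (upstream_id, downstream_id) pairs
    """
    nodes = sorted(sections)
    weights = {}
    for up, down in flows:
        x1, y1 = sections[up]["xy"]
        x2, y2 = sections[down]["xy"]
        distance = hypot(x2 - x1, y2 - y1)
        # ASSUMPTION: weight = velocity / distance (inverse travel time).
        weights[(up, down)] = sections[up]["velocity"] / distance
    return nodes, weights

# Hypothetical monitoring sections along one river reach.
sections = {
    "S1": {"xy": (0.0, 0.0), "velocity": 0.5},
    "S2": {"xy": (3000.0, 4000.0), "velocity": 1.0},
    "S3": {"xy": (3000.0, 6000.0), "velocity": 0.8},
}
flows = [("S1", "S2"), ("S2", "S3")]
nodes, weights = build_propagation_graph(sections, flows)
```

Under this assumed weighting, a faster flow over a shorter reach yields a larger weight, so nearby downstream sections are coupled more strongly.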

[0010] Optionally, the output of the pollution risk prediction results specifically includes:
Arranging the water environment risk feature vectors in time window order and, within each time window, combining the water environment risk feature vectors of the monitoring sections in monitoring section identifier order to form the environmental state input matrix for that time window;
Standardizing the adjacency matrix corresponding to the water pollution propagation relationship graph to obtain the standardized propagation relationship matrix;
Inputting the environmental state input matrix and the standardized propagation relationship matrix into the spatial propagation calculation layer of the water pollution trend prediction model, where the standardized propagation relationship matrix is applied to the environmental state input matrix to obtain the spatial propagation feature matrix of each monitoring section for the current time window;
Inputting the spatial propagation feature matrices of consecutive time windows, in chronological order, into the time evolution calculation layer of the water pollution trend prediction model, where the time evolution feature matrix of each monitoring section for the current time window is computed recursively from the spatial propagation feature matrix of the current time window and the time evolution feature matrix of the previous time window;
Performing trend prediction calculations on the time evolution feature matrix to obtain the trend prediction matrix of each monitoring section for future time windows;
Reading the trend prediction values of each monitoring section for future time windows from the trend prediction matrix, and converting them into pollution risk prediction values through the risk mapping function to obtain the pollution risk prediction results of each monitoring section for the different time windows.
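The two calculation layers in [0010] can be sketched in a few lines. The patent does not disclose the standardization rule, the recursion, or the risk mapping function, so everything concrete here is an assumption: row normalization with self-loops, exponential smoothing with factor `alpha`, and a logistic risk mapping. In this sketch `adj[i][j]` is the weight with which section `i` receives pollution from section `j`, and no trained parameters are involved.

```python
from math import exp

def normalize_rows(adj):
    """Add self-loops and scale each row to sum to 1 (assumed standardization)."""
    n = len(adj)
    out = []
    for i in range(n):
        row = [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        total = sum(row)
        out.append([v / total for v in row])
    return out

def propagate(prop, x):
    """Spatial propagation: product of propagation matrix and input matrix."""
    return [[sum(prop[i][k] * x[k][j] for k in range(len(x)))
             for j in range(len(x[0]))] for i in range(len(prop))]

def evolve(spatial_seq, alpha):
    """Recursive time evolution, here plain exponential smoothing (ASSUMPTION)."""
    state = spatial_seq[0]
    for s in spatial_seq[1:]:
        state = [[alpha * a + (1.0 - alpha) * h for a, h in zip(s_row, h_row)]
                 for s_row, h_row in zip(s, state)]
    return state

def risk_map(value):
    """Hypothetical risk mapping function: logistic squashing into (0, 1)."""
    return 1.0 / (1.0 + exp(-value))

# Two sections; the second one receives from the first.
adj = [[0.0, 0.0],
       [1.0, 0.0]]
prop = normalize_rows(adj)
inputs = [[[2.0], [0.0]], [[4.0], [0.0]]]  # one input matrix per time window
spatial = [propagate(prop, x) for x in inputs]
final_state = evolve(spatial, alpha=0.5)
risks = [risk_map(row[0]) for row in final_state]
```

Even though the downstream section reports a zero feature value in both windows, the propagation step gives it a non-zero state, which is the spatial effect the paragraph describes.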

[0011] Optionally, the identification of potential water pollution anomalies specifically includes:
Organizing the pollution risk prediction results by monitoring section identifier and time window order to form the pollution risk time series of each monitoring section;
Traversing the pollution risk time series of each monitoring section and recording the pollution risk prediction values in time window order to form the pollution risk sequence used for mutation detection;
Applying the Pettitt mutation point detection method to the pollution risk sequence to calculate the statistic corresponding to each time window;
Calculating the absolute value of the statistic for every time window, comparing these absolute values, and determining the time window with the largest absolute value as the mutation time window of the pollution risk sequence;
Using the mutation time window to determine the moment at which the pollution risk prediction result changes, and reading the pollution risk prediction value corresponding to the mutation time window;
Comparing the pollution risk prediction value of the mutation time window with the pollution risk prediction value of the preceding time window, and determining that the pollution risk has increased at the mutation time window when the former is greater than the latter;
After determining that the pollution risk has increased, marking the state of the monitoring section corresponding to the mutation time window as a potential water pollution anomaly event.
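The detection step in [0011] follows the standard Pettitt test, sketched below for one monitoring section. For each split position `t` the statistic is the sum of `sign(x_i - x_j)` over all pairs with `i` before the split and `j` after it, and the split with the largest absolute statistic is taken as the mutation point. The approximate significance value `p` is part of the standard test but is not mentioned in the patent text. The sample series is invented for illustration.

```python
from math import exp

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def pettitt(series):
    """Return (t_star, k, p) for a sequence of risk values.

    t_star: number of points before the detected change, so
            series[t_star] is the first value after the change.
    k:      largest absolute Pettitt statistic.
    p:      approximate significance, 2 * exp(-6 k^2 / (n^3 + n^2)).
    """
    n = len(series)
    stats = []
    for t in range(1, n):
        u = sum(sign(series[i] - series[j])
                for i in range(t) for j in range(t, n))
        stats.append(u)
    t_star = max(range(1, n), key=lambda t: abs(stats[t - 1]))
    k = abs(stats[t_star - 1])
    p = min(1.0, 2.0 * exp(-6.0 * k * k / (n ** 3 + n ** 2)))
    return t_star, k, p

# Hypothetical pollution risk sequence for one monitoring section.
risk_series = [0.10, 0.12, 0.11, 0.13, 0.60, 0.62, 0.65, 0.61]
t_star, k, p = pettitt(risk_series)

# Rule from the text: flag an anomaly only if risk rose at the mutation window.
risk_increased = risk_series[t_star] > risk_series[t_star - 1]
```

Because the statistic is rank based, the test locates the shift without assuming any particular distribution for the risk values.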

[0012] Optionally, the generation of the pollution risk assessment results specifically includes: Potential water pollution anomalies are organized according to the monitoring section markings and time window sequence to form an anomaly event sequence; Historical pollution event samples are extracted from the historical pollution event database and organized according to the pollution occurrence time, pollution monitoring section, changes in pollution indicators, and pollution event handling results to form a historical pollution event sample set. For each potential water pollution anomaly, extract the corresponding event feature information, and extract the historical event feature information corresponding to each historical pollution event from the historical pollution event sample set; The difference between the event feature information of each potential water pollution anomaly and the historical event feature information of each historical pollution event in the historical pollution event sample set is calculated to obtain the event similarity value; Calculate the probability of occurrence of pollution events corresponding to potential water pollution anomalies based on event similarity values; Determine the pollution risk level corresponding to potential water pollution anomalies based on the probability of pollution incidents occurring. The pollution risk assessment results are generated by combining the monitoring section identifiers, time windows, probability of occurrence of pollution events, and pollution risk levels corresponding to each potential water pollution anomaly.
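A minimal sketch of the similarity, probability, and level steps in [0012] follows. The patent does not give the similarity measure, the probability formula, or the level boundaries. The inverse-distance similarity, the similarity-weighted share of confirmed historical events, and the cut points `(0.3, 0.6, 0.8)` are all hypothetical choices made for illustration.

```python
from math import sqrt

def similarity(a, b):
    """ASSUMPTION: similarity = 1 / (1 + Euclidean distance between features)."""
    return 1.0 / (1.0 + sqrt(sum((x - y) ** 2 for x, y in zip(a, b))))

def pollution_probability(event, history):
    """Similarity-weighted share of confirmed historical pollution events.

    history: list of (feature_vector, confirmed) with confirmed in {0, 1}.
    """
    sims = [similarity(event, features) for features, _ in history]
    confirmed = sum(s * flag for s, (_, flag) in zip(sims, history))
    return confirmed / sum(sims)

def risk_level(probability, cuts=(0.3, 0.6, 0.8)):
    """Map a probability to a level 0..3 using hypothetical cut points."""
    return sum(probability >= c for c in cuts)

# Hypothetical event features and a two-record historical sample set.
event = [1.0, 0.0]
history = [([1.0, 0.0], 1), ([4.0, 4.0], 0)]
prob = pollution_probability(event, history)
level = risk_level(prob)
```

With this weighting, historical events that closely resemble the current anomaly dominate the estimate, which matches the intent of comparing against the historical pollution event sample set.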

[0013] Optionally, the monitoring section identifier, time window, and corresponding pollution risk level in the pollution risk assessment results are obtained. A dynamic risk early warning mechanism is established based on the pollution risk level. Corresponding risk early warning thresholds are set for different pollution risk levels. The pollution risk assessment results are judged and processed according to the risk early warning thresholds. When the pollution risk level of the monitoring section reaches the corresponding risk early warning threshold, the corresponding level of early warning information is generated. The early warning information is associated with the monitoring section identifier, time window, and pollution risk level. The associated early warning information is sent to the environmental monitoring management platform. After receiving the early warning information, the environmental monitoring management platform displays, records, and stores the early warning information in real time.
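The graded warning step in [0013] can be sketched as a simple lookup. The level numbering, the warning labels, and the record fields below are hypothetical. The patent states only that each warning is associated with the monitoring section identifier, time window, and pollution risk level before being sent to the platform.

```python
# Hypothetical mapping from pollution risk level to warning grade.
WARNING_BY_LEVEL = {1: "blue", 2: "yellow", 3: "red"}

def make_warnings(assessments, min_level=1):
    """Create one warning record per assessment at or above min_level."""
    warnings = []
    for item in assessments:
        if item["level"] >= min_level:
            warnings.append({
                "section": item["section"],
                "window": item["window"],
                "level": item["level"],
                "warning": WARNING_BY_LEVEL[item["level"]],
            })
    return warnings

# Hypothetical pollution risk assessment results.
assessments = [
    {"section": "S1", "window": 1300, "level": 0},
    {"section": "S2", "window": 1300, "level": 3},
]
warnings = make_warnings(assessments)
```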

[0014] The beneficial effects of this invention are: Compared with existing technologies, this invention provides a water pollution control risk early warning method based on big data processing. By performing unified data processing and structured management of multi-source water environment monitoring data, it achieves efficient integration and standardized representation of water environment data. Specifically, this invention utilizes streaming data processing technology to perform real-time data format conversion and timestamp reconstruction of water environment monitoring data. This enables data from different monitoring devices, monitoring types, and sampling frequencies to be organized and managed under a unified data structure, effectively solving the problems of inconsistent data structures and difficulty in data fusion in existing technologies. This improves the efficiency and consistency of water environment monitoring data processing, providing a reliable data foundation for subsequent pollution risk analysis.

[0015] This invention constructs a water environment risk feature vector by performing feature fusion processing on structured water environment monitoring datasets. This allows multi-dimensional information such as water quality index data, pollution source emission data, hydrological dynamic data, and meteorological environmental data to be comprehensively expressed in the same feature space, thus providing a more comprehensive reflection of the changing characteristics of the water environment. Simultaneously, this invention combines the water flow relationships between monitoring sections and the pollutant diffusion paths to construct a water pollution propagation relationship diagram. This enables the effective modeling of spatial correlations between different monitoring sections, overcoming the shortcomings of existing technologies that only analyze single monitoring sections and ignore the spatial propagation characteristics of pollution, thereby improving the accuracy and scientific rigor of pollution risk analysis.

[0016] This invention inputs a water environment risk feature vector and a water pollution propagation relationship diagram into a water pollution trend prediction model, enabling the prediction of future water quality change trends at each monitoring section. This allows for the identification of potential pollution trends before pollution risks actually occur. Compared to traditional methods that trigger early warnings based on fixed thresholds, this invention uses a trend prediction model to dynamically analyze pollution risk change trends, enabling the early identification of these trends and thus improving the foresight of water pollution risk early warnings.

[0017] In terms of pollution anomaly identification, this invention introduces the Pettitt mutation point detection method to detect mutations in pollution risk prediction results. By calculating statistical values from the pollution risk time series, it identifies the mutation moments, thereby accurately identifying key time nodes where pollution risk mutations occur and recognizing the corresponding monitoring section status as potential water pollution anomalies. Compared to traditional simple change detection methods, this invention can more accurately identify the mutation behavior of pollution risks and improve the accuracy of anomaly event identification.

[0018] This invention combines a historical pollution event database to perform probabilistic inference and statistical analysis on potential water pollution anomalies. By calculating the similarity between potential water pollution anomalies and historical pollution events, it obtains the pollution risk level corresponding to the potential water pollution anomaly and generates a pollution risk assessment result. In this way, this invention can not only identify anomalies but also quantitatively assess the degree of pollution risk that anomalies may cause, thus providing a more reliable basis for subsequent risk warnings.

[0019] This invention constructs a dynamic risk early warning mechanism based on pollution risk assessment results. By setting early warning thresholds corresponding to different pollution risk levels, it generates corresponding early warning information and publishes and records it in real time through an environmental monitoring and management platform, achieving dynamic monitoring and timely early warning of water pollution risks. Through the above technical solution, this invention can effectively improve the accuracy of water pollution risk identification and the timeliness of early warning, thus providing important technical support for water environment governance and pollution prevention and control.

Attached Figure Description

[0020] The accompanying drawings are provided to further illustrate the invention and form part of the specification. They are used in conjunction with embodiments of the invention to explain the invention and do not constitute a limitation thereof. In the drawings:
Figure 1 is an overall flowchart of the water pollution control risk early warning method based on big data processing proposed in this invention;
Figure 2 is a schematic diagram illustrating the construction of the structured water environment monitoring dataset in the method;
Figure 3 is a schematic diagram illustrating the identification of potential water pollution anomalies in the method.

Detailed Implementation

[0021] The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic diagrams, illustrating only the basic structure of the invention, and therefore only show the components relevant to the invention.

[0022] Referring to Figures 1-3, a water pollution control risk early warning method based on big data processing includes the following steps:
Collecting water environment monitoring data and performing data cleaning;
Based on the Apache Flink stream processing framework, performing data format conversion and timestamp reconstruction on the cleaned water environment monitoring data to form a structured water environment monitoring dataset;
Performing feature fusion processing on the structured water environment monitoring dataset to construct a water environment risk feature vector;
Constructing a water pollution propagation relationship graph by analyzing the water flow relationships between monitoring sections and the pollutant diffusion paths;
Based on the water environment risk feature vector and the water pollution propagation relationship graph, outputting pollution risk prediction results using the water pollution trend prediction model;
Performing mutation detection on the pollution risk prediction results: the Pettitt mutation point detection method is used to identify the mutation time in the pollution risk prediction results, and the monitoring section state corresponding to the mutation time is identified as a potential water pollution anomaly event;
Performing probabilistic inference on the potential water pollution anomaly events, and statistically analyzing them in conjunction with a historical pollution event database to calculate the corresponding pollution risk level and generate pollution risk assessment results;
Constructing a dynamic risk early warning mechanism based on the pollution risk assessment results: by setting graded risk early warning thresholds, corresponding early warning information is triggered for different pollution risk levels, and the early warning information is released and recorded in real time through the environmental monitoring and management platform.

[0023] In this embodiment, the target water area is monitored by online water quality monitoring equipment, pollution source emission monitoring equipment, hydrological monitoring equipment and meteorological monitoring equipment to collect water environment monitoring data. The water environment monitoring data includes water quality index data, pollution source emission data, hydrological dynamic data and meteorological environmental data. The data cleaning process includes outlier identification, missing value imputation and time series alignment processing.

[0024] In this embodiment, the formation of the structured water environment monitoring dataset specifically includes: Based on the Apache Flink stream processing framework, the water environment monitoring data after data cleaning is converted into a data format. Water quality index data, pollution source discharge data, hydrological dynamic data and meteorological environmental data from different sources are mapped into a unified data field structure to obtain a standardized time series dataset. The data transformation and processing process is as follows: Deploy the Apache Flink stream processing framework, input the water environment monitoring data after data cleaning as a real-time data stream into the Flink data stream processing pipeline, continuously receive and parse the data stream through the FlinkDataStream interface, and use Flink operators to perform field mapping and structure transformation processing on the water quality index data, pollution source emission data, hydrological dynamic data and meteorological environment data in the data stream. During the data stream processing, based on Flink's streaming operators, perform unified field definition, field order reorganization and data type standardization processing on data fields from different sources, and output the transformed data in real-time through Flink's state management mechanism and event time processing mechanism to form a standardized time-series dataset with consistent structure and unified fields. The standardized time series dataset is reconstructed according to a preset time benchmark to obtain the reconstructed timestamps. 
The process of reconstructing the event stamps involves setting a uniform time interval, calculating the difference between the original monitoring time and the start time of each water environment monitoring data, dividing the difference by the time interval and rounding it down, multiplying the integer result by the time interval, and adding the start time to it. The start time is the starting point of the uniform time benchmark. Based on the reconstructed timestamps, the standardized time series dataset is divided into time windows, and continuous time windows are constructed according to time intervals to form a time window sequence arranged in chronological order. The time window division is specifically as follows: a uniform time interval is set as the time window length, and a preset start time is used as the base time for the time window division; multiple consecutive time intervals are generated sequentially in units of the time interval, and each time interval corresponds to a time window, thereby forming a time window sequence arranged in chronological order. Numerical aggregation processing is performed on water environment monitoring data that fall within the same time window, belong to the same monitoring section, and have the same data type to obtain the corresponding window aggregation value; The aggregation process is as follows: based on the time window boundaries, all water environment monitoring data are divided into corresponding time windows. Using the monitoring section identifier and data type as the grouping key, the data records in the same time window are grouped. All corresponding monitoring values are extracted in each group, and the monitoring values are summed to obtain the total monitoring values. The number of data records in the group is counted as the number of data entries. 
The total monitoring values are divided by the number of data entries to obtain the window aggregation value of the monitoring section in the corresponding time window under the data type. Structured records are generated based on the reconstructed timestamps, data types, and window aggregation values to form a structured water environment monitoring dataset. The generation process is as follows: a unified data record structure is established according to the reconstructed timestamp, data type, and window aggregation value; the reconstructed timestamp of the corresponding time window is read, and the data type corresponding to the time window is determined. The window aggregation value calculated within the time window is written as a numerical field into the same data record and combined according to a fixed field order; an independent data record is generated for each time window and the window aggregation value corresponding to each data type, forming a structured record.
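The timestamp reconstruction and window aggregation described in this paragraph can be illustrated with a minimal sketch. It is written in plain Python rather than as a Flink job, and the function names, the tuple layout of a record, and the sample values are assumptions made only for illustration.

```python
from collections import defaultdict

def reconstruct_timestamp(t, start, interval):
    """Floor t onto the uniform grid: start + floor((t - start) / interval) * interval."""
    return start + ((t - start) // interval) * interval

def aggregate_windows(records, start, interval):
    """records: iterable of (monitoring_time, section_id, data_type, value).

    Returns {(window_timestamp, section_id, data_type): window aggregation value},
    where the aggregation value is the sum of the values divided by their count.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, section, dtype, value in records:
        key = (reconstruct_timestamp(t, start, interval), section, dtype)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical readings: times in seconds, 900-second (15-minute) windows.
records = [
    (1000, "S01", "COD", 20.0),
    (1200, "S01", "COD", 22.0),
    (1900, "S01", "COD", 30.0),
    (1100, "S02", "COD", 15.0),
]
structured = aggregate_windows(records, start=0, interval=900)
```

Here the two S01 readings at 1000 and 1200 fall into the window starting at 900 and are averaged, while the reading at 1900 falls into the window starting at 1800.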

[0025] In this embodiment, the construction of the water environment risk feature vector specifically includes: The data records in the structured water environment monitoring dataset are divided according to data type to form water quality index data sequence, pollution source emission data sequence, hydrological dynamic data sequence, and meteorological environment data sequence. The specific implementation method of the division is as follows: read all data records in the structured water environment monitoring dataset, and parse the data type field contained in each data record one by one; classify the data records according to the data type field. When the data type field corresponds to a water quality monitoring project, the data record is classified into the water quality index data sequence; when the data type field corresponds to a pollution source emission monitoring project, the data record is classified into the pollution source emission data sequence; when the data type field corresponds to a hydrological monitoring project, the data record is classified into the hydrological dynamic data sequence; when the data type field corresponds to a meteorological monitoring project, the data record is classified into the meteorological environment data sequence. Using the reconstructed timestamp as the time index, the water quality index data sequence, pollution source emission data sequence, hydrological dynamic data sequence and meteorological environment data sequence under the same timestamp are synchronized and aligned. The window aggregation values of various types of data are combined according to a unified field order to form a multidimensional environmental state vector corresponding to the time window. 
The formation of the multidimensional environmental state vector specifically involves: reading all data records from the water quality index data sequence, pollution source emission data sequence, hydrological dynamic data sequence, and meteorological environmental data sequence, and performing time index matching on the four types of data sequences based on the reconstructed timestamp in each data record; within the time window corresponding to the same timestamp, extracting the window aggregation values corresponding to each water quality index from the water quality index data sequence, extracting the window aggregation values corresponding to each pollution source emission index from the pollution source emission data sequence, extracting the window aggregation values corresponding to each hydrological monitoring index from the hydrological dynamic data sequence, and extracting the window aggregation values corresponding to each meteorological monitoring index from the meteorological environmental data sequence; after completing the extraction of various index values, arranging the window aggregation values corresponding to each type of index in a unified field order, with the window aggregation values corresponding to water quality indexes arranged first, followed by the window aggregation values corresponding to pollution source emission indexes, hydrological dynamic indexes, and meteorological environmental indexes; and generating a multidimensional environmental state vector corresponding to the time window by combining the multiple window aggregation values in a fixed field order. The multidimensional environmental state vectors are arranged in chronological order to form a continuous environmental state time series. The environmental state time series is then extended based on a time delay embedding method to construct a delayed embedding vector of historical state information. 
The specific feature expansion process is as follows: In the environmental state time series, the multidimensional environmental state vector of the current time window is used as the reference vector. The multidimensional environmental state vectors corresponding to the historical time windows are selected sequentially at fixed time intervals. The multidimensional environmental state vector of the current time window is then sequentially concatenated with the multidimensional environmental state vectors corresponding to the selected historical time windows to construct an expanded feature representation containing both current and historical environmental state information. After the vector concatenation is completed, the concatenated expanded feature representation is used as a delayed embedding vector. The delayed embedding vectors are processed by kernel space mapping, and the kernel mapping function is used to transform the delayed embedding vectors into high-dimensional environmental feature representations; The transformation process specifically involves constructing a kernel mapping function to map the delayed embedding vectors from the original feature space to a high-dimensional feature space. The kernel mapping function calculates the similarity between two delayed embedding vectors and generates a new high-dimensional feature representation. For any two delayed embedding vectors, the kernel mapping function calculates their correspondence in the high-dimensional space. The kernel mapping function uses a radial basis function, taking the Euclidean distance between the two delayed embedding vectors as the input variable. An exponential transformation of the Euclidean distance yields the corresponding kernel value, which represents the degree of similarity between the two delayed embedding vectors in the high-dimensional space. 
After completing the kernel mapping calculation, the kernel values calculated between each delayed embedding vector and all delayed embedding vectors are arranged in order to form a high-dimensional environmental feature representation of the corresponding delayed embedding vector. Manifold-preserving dimensionality reduction is performed on the high-dimensional environmental feature representation to extract key feature components that can reflect the trend of water environment change and pollution risk characteristics, and to generate a water environment risk feature vector. The specific manifold-preserving dimensionality reduction process involves: calculating the proximity relationships between each high-dimensional environmental feature representation; determining closely related feature samples by comparing the distances between high-dimensional environmental feature representations corresponding to different time windows; selecting nearby high-dimensional environmental feature representations as neighborhood samples within the vicinity of each high-dimensional environmental feature representation as the center, and constructing a local adjacency relationship structure based on the neighborhood samples; performing low-dimensional mapping on the high-dimensional environmental feature representations after establishing the adjacency relationship structure, while preserving the relative distance relationships between each neighborhood sample during the mapping process; and combining the feature components that characterize the changing trends of the water environment state and the changing characteristics of pollution risk to form a water environment risk feature vector by organizing the dimensionality-reduced feature components.
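The delay embedding and kernel mapping steps above can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the names `delay_embed`, `rbf_kernel`, and `kernel_features`, the parameters `lag`, `depth`, and `gamma`, and the use of the common Gaussian form exp(-gamma * squared distance) are all assumptions, since the text only states that an exponential transformation of the Euclidean distance is used. The manifold-preserving dimensionality reduction is not shown.

```python
import math

def delay_embed(series, lag, depth):
    """Concatenate each current state vector with `depth` historical vectors
    taken `lag` windows apart (current vector first, then older ones)."""
    first = lag * depth  # earliest window that has a full history
    embedded = []
    for t in range(first, len(series)):
        vec = []
        for k in range(depth + 1):
            vec.extend(series[t - k * lag])
        embedded.append(vec)
    return embedded

def rbf_kernel(u, v, gamma):
    """Radial basis function of the Euclidean distance (assumed Gaussian form)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

def kernel_features(embedded, gamma):
    """Row i holds the kernel values between vector i and all embedded vectors."""
    return [[rbf_kernel(u, v, gamma) for v in embedded] for u in embedded]

# Hypothetical two-dimensional state vectors for four consecutive windows.
series = [[1.0, 0.0], [2.0, 0.0], [3.0, 1.0], [4.0, 1.0]]
embedded = delay_embed(series, lag=1, depth=2)
features = kernel_features(embedded, gamma=0.5)
```

Each row of `features` plays the role of the high-dimensional environmental feature representation of one delayed embedding vector.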

[0026] In this embodiment, the construction of the water pollution propagation relationship diagram specifically includes: Read the structured water environment monitoring dataset and obtain the spatial coordinate information, water flow velocity information, and water flow direction information of each monitoring section; The acquisition process specifically involves: reading data records corresponding to each monitoring section from the structured water environment monitoring dataset, and indexing and matching data from different monitoring sections based on the monitoring section identifiers in the data records; extracting the spatial coordinate information of the corresponding monitoring section from the monitoring section basic information table in the environmental monitoring management system using the monitoring section identifiers, the spatial coordinate information including the horizontal and vertical coordinate values representing the spatial location of the monitoring section; after acquiring the spatial coordinate information, extracting the water flow velocity monitoring value under the corresponding time window from the hydrological monitoring data records using the monitoring section identifiers, and using the water flow velocity monitoring value as the water flow velocity information of the corresponding monitoring section; reading the water flow direction monitoring data from the hydrological monitoring data records, and obtaining the corresponding water flow direction information by performing direction angle analysis on the water flow direction monitoring data; Each monitoring section is used as a node to construct a node set; The construction process is as follows: read all monitoring section identifiers involved in the structured water environment monitoring dataset, and perform deduplication on all monitoring section identifiers to obtain a set of monitoring section identifiers; assign a unique node number to each monitoring section 
according to the order in the set of monitoring section identifiers, and map each monitoring section to its corresponding node number; after completing the node number assignment, combine all node numbers in order to form a node set. Based on the water flow relationships between monitoring sections and the pollutant diffusion paths, graph edge relationships are established to form a graph edge set. The formation of the graph edge set is specifically as follows: The spatial coordinates, water flow velocity, and water flow direction of each monitoring section are read, and the monitoring sections are paired according to their spatial locations to form monitoring section pairs. Based on the water flow direction information, it is determined whether a water flow relationship exists between the two monitoring sections of each pair. When the water flow direction of the upstream monitoring section points to the area where the downstream monitoring section is located, it is determined that a water flow channel exists between the two monitoring sections. Once a water flow channel is determined to exist, the possible diffusion paths of pollutants are identified by combining the spatial distance between the monitoring sections and the river channel connectivity structure. When two monitoring sections satisfy both conditions, namely that a water flow channel exists and that pollutants can propagate along the water flow path, a graph edge connection relationship is established between the corresponding two graph nodes. All monitoring section pairs are judged one by one, and all graph node connection relationships that meet the conditions are collected to form a complete graph edge set. 
The spatial distance between each monitoring section is calculated by reading the spatial coordinate information of two monitoring sections, calculating the square of the difference between the two monitoring sections in the horizontal coordinate direction and the square of the difference in the vertical coordinate direction, summing the two square results and then performing a square root operation to obtain the spatial distance between the two monitoring sections. The pollution propagation weight between nodes is calculated based on the water flow velocity and the spatial distance between monitoring sections. The calculation process is to use the water flow velocity between two monitoring sections as the numerator and the spatial distance between two monitoring sections as the denominator, and obtain the corresponding pollution propagation weight by dividing the water flow velocity by the spatial distance. The pollution propagation weight characterizes the ability of pollutants to propagate along the water flow direction under water flow conditions. 
Construct a water pollution propagation relationship graph based on the node set, graph edge set, and corresponding pollution propagation weights. The construction process is as follows: Read the node identifiers corresponding to each monitoring section in the node set, and establish, as the adjacency matrix, a square matrix whose number of rows and number of columns both equal the number of node identifiers; traverse the graph edge set sequentially according to the node identifiers, match the two graph nodes corresponding to each graph edge, and read the pollution propagation weight corresponding to the graph edge; after determining that there is a graph edge connection relationship between two graph nodes, write the pollution propagation weight into the corresponding row and column position in the adjacency matrix, with the row position corresponding to the node identifier of the monitoring section where the pollution propagation starts and the column position corresponding to the node identifier of the monitoring section that the pollution propagation reaches; after completing the traversal and matrix filling for all graph edges in the graph edge set, fill the positions of the adjacency matrix that correspond to node pairs without a graph edge connection relationship with zero, forming the water pollution propagation relationship graph.
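The distance, weight, and adjacency matrix calculations in this paragraph can be sketched as below. The function names, the dictionary layouts, the choice to number nodes in sorted identifier order, and the sample coordinates and velocities are assumptions for illustration; the edge list is taken as given, because the direction and connectivity checks depend on river channel data that the text does not specify.

```python
import math

def propagation_weight(upstream_xy, downstream_xy, velocity):
    """Pollution propagation weight = water flow velocity / spatial distance."""
    dx = upstream_xy[0] - downstream_xy[0]
    dy = upstream_xy[1] - downstream_xy[1]
    distance = math.sqrt(dx * dx + dy * dy)  # assumes the two sections do not coincide
    return velocity / distance

def build_adjacency(coords, edges, velocities):
    """coords: {section_id: (x, y)}; edges: [(upstream_id, downstream_id)];
    velocities: {(upstream_id, downstream_id): water flow velocity}.

    Returns the ordered node identifiers and the weighted adjacency matrix,
    with rows for the initiating section and columns for the target section."""
    ids = sorted(coords)  # assumed ordering of the deduplicated identifiers
    index = {sid: i for i, sid in enumerate(ids)}
    n = len(ids)
    adj = [[0.0] * n for _ in range(n)]  # node pairs without an edge stay zero
    for up, down in edges:
        weight = propagation_weight(coords[up], coords[down], velocities[(up, down)])
        adj[index[up]][index[down]] = weight
    return ids, adj

# Hypothetical sections: coordinates in km, velocities in m/s.
coords = {"S01": (0.0, 0.0), "S02": (3.0, 4.0), "S03": (3.0, 10.0)}
edges = [("S01", "S02"), ("S02", "S03")]
velocities = {("S01", "S02"): 0.5, ("S02", "S03"): 0.3}
ids, adj = build_adjacency(coords, edges, velocities)
```

Because edges are written only from upstream to downstream, the resulting matrix is generally not symmetric.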

[0027] In this embodiment, the output of the pollution risk prediction result specifically includes: The water environment risk feature vectors are arranged according to the time window order. Within the same time window, the water environment risk feature vectors of each monitoring section are combined according to the monitoring section identification order to form the environmental status input matrix corresponding to the time window. The adjacency matrix corresponding to the water pollution propagation relationship diagram is standardized to obtain the standardized propagation relationship matrix; The standardization process is as follows: First, read the adjacency matrix corresponding to the water pollution propagation relationship diagram. Each element in the adjacency matrix represents the pollution propagation weight between two monitoring sections. Then, construct an identity matrix based on the adjacency matrix and add the identity matrix to the adjacency matrix element-wise, adding self-connection relationships between each monitoring section to the adjacency matrix. After completing the self-connection relationship superposition, calculate the sum of the values of each row element in the new matrix according to the row direction, and write the sum of each row element to the corresponding diagonal position of the diagonal matrix. Perform square root operations on each diagonal element in the diagonal matrix and calculate its reciprocal value. Then, use the matrix formed by the square root reciprocal of the diagonal matrix to perform matrix multiplication operations on the left and right sides of the original matrix to complete the scale adjustment of each pollution propagation weight in the adjacency matrix, obtaining the standardized propagation relationship matrix. 
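The standardization steps above can be sketched as follows. One point is an assumption: the text says the square-root-reciprocal matrix multiplies "the original matrix" on both sides, and this sketch reads that as the matrix after the self-connections have been added, which is the usual symmetric normalization. The function name and sample matrix are illustrative only.

```python
import math

def normalize_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2), where D holds the row sums of A + I."""
    n = len(adj)
    # Add self-connections (identity matrix) element-wise.
    with_self = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                 for i in range(n)]
    # Reciprocal square root of each row sum; sums are at least 1 for
    # non-negative weights, so the square root is well defined.
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in with_self]
    # Left and right multiplication by the diagonal matrix, written element-wise.
    return [[inv_sqrt[i] * with_self[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]

# Hypothetical two-section graph with one upstream-to-downstream edge.
norm = normalize_adjacency([[0.0, 0.1], [0.0, 0.0]])
```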
The environmental state input matrix and the standardized propagation relationship matrix are input into the spatial propagation calculation layer of the water pollution trend prediction model. The spatial propagation operation is performed on the environmental state input matrix through the standardized propagation relationship matrix to obtain the spatial propagation characteristic matrix of each monitoring section under the current time window. The spatial propagation calculation process is as follows: the environmental state input matrix is weighted and transferred using a standardized propagation relationship matrix; the water environment risk feature vector of each monitoring section is aggregated with the water environment risk feature vector of its adjacent monitoring sections according to the corresponding pollution propagation weights; a spatial propagation result containing the section's own characteristics and the surrounding propagation influence characteristics is generated; and the spatial propagation results of each monitoring section are arranged in the order of the monitoring sections to form the spatial propagation feature matrix of each monitoring section under the current time window. The spatial propagation feature matrix corresponding to the continuous time window is input into the time evolution calculation layer of the water pollution trend prediction model in chronological order. By recursively calculating the spatial propagation feature matrix of the current time window and the time evolution feature matrix of the previous time window, the time evolution feature matrix of each monitoring section under the current time window is obtained. 
The recursive calculation process is as follows: the spatial propagation feature matrix corresponding to each time window is read sequentially according to the time window order, and the spatial propagation feature matrix corresponding to the first time window is used as the initial time evolution feature matrix; starting from the second time window, the spatial propagation feature matrix corresponding to the current time window and the time evolution feature matrix of the previous time window are recursively calculated; the spatial propagation feature matrix of the current time window is multiplied by the time evolution parameter matrix; the time evolution feature matrix of the previous time window is multiplied by the time state parameter matrix; the results of the two matrix operations are summed; and feature mapping is performed through a nonlinear activation function to obtain the time evolution feature matrix corresponding to the current time window. Trend prediction calculations are performed based on the time evolution feature matrix to obtain the trend prediction matrix for each monitoring section under the future time window; The trend prediction calculation process is as follows: linear mapping is performed on the time evolution feature matrix; an intermediate prediction feature matrix is obtained by performing matrix multiplication between the time evolution feature matrix and the trend prediction parameter matrix; and a bias vector is superimposed on the intermediate prediction feature matrix to form the prediction feature result; after completing the linear mapping calculation, the prediction feature result is processed by a nonlinear activation function to obtain the trend prediction matrix of each monitoring section under the future time window. 
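The spatial propagation, recursive time evolution, and trend prediction calculations can be sketched in plain Python as below. The hyperbolic tangent is an assumed choice of nonlinear activation, the parameter matrices `w_in`, `w_state`, and `w_out` are hypothetical fixed values rather than trained parameters, and `w_in` is assumed square so that the first spatial propagation feature matrix can serve directly as the initial state.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def propagate(norm, x):
    """Spatial propagation: weight each section's features by the normalized graph."""
    return matmul(norm, x)

def evolve(spatial_seq, w_in, w_state):
    """H_1 = S_1; H_t = tanh(S_t * w_in + H_(t-1) * w_state) for later windows."""
    h = spatial_seq[0]
    for s in spatial_seq[1:]:
        current = matmul(s, w_in)
        previous = matmul(h, w_state)
        h = [[math.tanh(c + p) for c, p in zip(rc, rp)]
             for rc, rp in zip(current, previous)]
    return h

def predict(h, w_out, bias):
    """Linear mapping plus bias, followed by the activation."""
    z = matmul(h, w_out)
    return [[math.tanh(v + b) for v, b in zip(row, bias)] for row in z]

# Hypothetical example: two sections, one feature, two time windows.
norm = [[0.5, 0.5], [0.0, 1.0]]
s1 = propagate(norm, [[2.0], [4.0]])
s2 = propagate(norm, [[0.0], [1.0]])
h = evolve([s1, s2], w_in=[[1.0]], w_state=[[0.1]])
trend = predict(h, w_out=[[1.0]], bias=[0.0])
```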
Each row in the trend prediction matrix corresponds to the prediction feature result of a monitoring section under the future time window, and each column corresponds to a feature component in the prediction feature vector. The trend prediction values of each monitoring section under the future time window are read according to the trend prediction matrix, and the trend prediction values are converted into pollution risk prediction values through the risk mapping function to obtain the pollution risk prediction results of each monitoring section under different time windows. The conversion process is as follows: First, the trend prediction values for each monitoring section under future time windows are extracted sequentially according to the monitoring section identifier and time window order. Second, the trend prediction values are standardized by calculating the difference between the trend prediction value and the historical baseline value, and then dividing the difference by the standard deviation of the historical baseline value to obtain the corresponding standardized trend value. Third, after standardization, the standardized trend value is input into a risk mapping function for risk value mapping calculation. This risk mapping function uses a Sigmoid mapping form, and by performing exponential transformation and normalization on the standardized trend value, it is converted into a pollution risk prediction value ranging from zero to one. Finally, all pollution risk prediction values are combined and organized according to the monitoring section identifier and time window order to obtain the pollution risk prediction results for each monitoring section under different time windows.
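The conversion from a trend prediction value to a pollution risk prediction value can be sketched as below. The function name and the sample baseline values are assumptions; the two-branch form of the Sigmoid is used only to avoid floating-point overflow for strongly negative inputs and is mathematically the same function.

```python
import math

def risk_value(trend, baseline_mean, baseline_std):
    """Standardize the trend value against the historical baseline, then
    map it to the interval (0, 1) with a Sigmoid function."""
    z = (trend - baseline_mean) / baseline_std  # assumes a non-zero standard deviation
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical baseline: mean 5.0, standard deviation 2.0.
at_baseline = risk_value(5.0, 5.0, 2.0)
above_baseline = risk_value(7.0, 5.0, 2.0)
below_baseline = risk_value(3.0, 5.0, 2.0)
```

A trend value equal to the baseline maps to 0.5, and values one standard deviation above and below map to symmetric risks around 0.5.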

[0028] In this embodiment, the identification of potential water pollution anomalies specifically includes: The pollution risk prediction results are organized according to the monitoring section identifier and time window order to form the pollution risk time series of the corresponding monitoring section. The specific organizing process is as follows: obtaining the monitoring section identifier, time window number, and corresponding pollution risk prediction value of each pollution risk prediction result; grouping all pollution risk prediction results according to the monitoring section identifier, so that all pollution risk prediction values corresponding to the same monitoring section are included in the same data set; after grouping, sorting each group of data in ascending order according to the time window number; and connecting the sorted pollution risk prediction values in the order of the time windows to form a continuous data sequence, thereby obtaining a pollution risk time series set. The pollution risk time series of each monitoring section is traversed, and the pollution risk prediction values in the pollution risk time series are recorded sequentially according to the time window order to form a pollution risk sequence used for the mutation detection calculation. The Pettitt mutation point detection method is applied to the pollution risk sequence to calculate the statistic corresponding to each time window. The specific method for detecting Pettitt mutation points is as follows: For each pollution risk sequence corresponding to a monitoring section, each time window in the sequence is selected sequentially as a candidate detection location according to the time window order. The pollution risk sequence is divided into a preceding subsequence and a following subsequence, using the candidate detection location as a boundary point. 
For each pollution risk prediction value in the preceding subsequence, a pairwise comparison is performed with each pollution risk prediction value in the following subsequence. If the pollution risk prediction value in the preceding subsequence is greater than that in the following subsequence, a positive accumulation is recorded; if the pollution risk prediction value in the preceding subsequence is less than that in the following subsequence, a negative accumulation is recorded; if they are equal, zero is recorded. After completing all pairwise comparisons corresponding to the current candidate detection location, all comparison results are summed to obtain the statistic corresponding to the candidate detection location. The candidate detection location is moved according to the time window order, and the above division, comparison, and accumulation process is repeated until the calculation of the statistics corresponding to all time windows in the pollution risk sequence is completed, resulting in a statistic corresponding to each time window. The absolute values of the statistics corresponding to all time windows are calculated, and the absolute values of the statistics for each time window are compared. The time window with the largest absolute value of the statistics is determined as the mutation time window in the pollution risk sequence. The mutation time window is used to determine the moment when the pollution risk prediction results change, and the pollution risk prediction value corresponding to the mutation time window is read. 
The determination process specifically involves: obtaining the time window number of the mutation time window in the pollution risk time series; locating the corresponding time window position in the pollution risk time series according to the time window number, and reading the timestamp information corresponding to the time window; associating and matching the timestamp information with the monitoring section identifier to determine the specific time position of the corresponding monitoring section under the time window; and using the timestamp as the time point when the pollution risk prediction result changes significantly to determine the mutation time of the pollution risk prediction result. The pollution risk prediction value corresponding to the time window of the sudden change is compared with the pollution risk prediction value before the time window of the sudden change. When the pollution risk prediction value corresponding to the time window of the sudden change is greater than the pollution risk prediction value corresponding to the previous time window, it is determined that the pollution risk has increased at the time window. After determining that the pollution risk has increased, the status of the monitoring section corresponding to the time window is marked as a potential water pollution anomaly event.
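The Pettitt-style statistic and the subsequent anomaly check can be sketched as follows. One detail is an assumption: the text does not say whether the candidate window belongs to the preceding or the following subsequence, and this sketch treats it as the first window of the following subsequence, so that comparing it with the window before it is meaningful. Function names and sample values are illustrative.

```python
def sign(x):
    """+1 if x is positive, -1 if negative, 0 if zero."""
    return (x > 0) - (x < 0)

def pettitt_statistics(series):
    """Statistic for each split point t = 1 .. n-1, where series[:t] is the
    preceding subsequence and series[t:] is the following subsequence."""
    stats = []
    for t in range(1, len(series)):
        u = sum(sign(a - b) for a in series[:t] for b in series[t:])
        stats.append(u)
    return stats

def mutation_window(series):
    """Index of the window whose statistic has the largest absolute value."""
    stats = pettitt_statistics(series)
    best = max(range(len(stats)), key=lambda i: abs(stats[i]))
    return best + 1  # stats[i] belongs to split point t = i + 1

def is_potential_anomaly(series):
    """Flag the section when risk at the mutation window exceeds the previous window."""
    k = mutation_window(series)
    return series[k] > series[k - 1]

# Hypothetical pollution risk sequence for one monitoring section.
risk = [0.10, 0.12, 0.11, 0.60, 0.65, 0.70]
stats = pettitt_statistics(risk)
window = mutation_window(risk)
flagged = is_potential_anomaly(risk)
```

For this sequence the statistic is largest in absolute value at the split before the fourth window, where the risk jumps from 0.11 to 0.60.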

[0029] In this embodiment, the generation of the pollution risk assessment results specifically includes: Potential water pollution anomalies are organized according to the monitoring section identifier and time window order to form an anomaly event sequence. Historical pollution event samples are extracted from the historical pollution event database and organized according to the pollution occurrence time, pollution monitoring section, changes in pollution indicators, and pollution event handling results to form a historical pollution event sample set. For each potential water pollution anomaly, the corresponding event feature information is extracted. The event feature information includes the pollution risk prediction value corresponding to the anomaly occurrence time window, the pollution risk prediction value before the anomaly occurrence time window, and the corresponding monitoring section identification information. The historical event feature information corresponding to each historical pollution event is extracted from the historical pollution event sample set. For each potential water pollution anomaly, the event feature information is compared with the historical event feature information of each historical pollution event in the historical pollution event sample set. Specifically, the feature difference between the event feature information and the historical event feature information is calculated, and the distance corresponding to the feature difference is calculated. The event similarity value is obtained by dividing a constant by the sum of the constant and the distance. 
Calculate the probability of occurrence of pollution events corresponding to potential water pollution anomalies based on event similarity values; The calculation process is as follows: the similarity value between the potential water pollution anomaly event and all historical pollution events is multiplied by the event tag value of the corresponding historical pollution event, and all product results are summed. The summation result is then divided by the sum of all similarity values to obtain the probability of pollution event occurrence corresponding to the potential water pollution anomaly event. The event tag value of the historical pollution event is used to indicate whether the historical pollution event is a real pollution event. Determine the pollution risk level corresponding to potential water pollution anomalies based on the probability of pollution incidents occurring. The determination process is as follows: when the probability of a pollution event occurring is less than the first risk threshold, the pollution risk level is determined to be low risk; when the probability of a pollution event occurring is greater than or equal to the first risk threshold and less than the second risk threshold, the pollution risk level is determined to be medium risk; and when the probability of a pollution event occurring is greater than or equal to the second risk threshold, the pollution risk level is determined to be high risk. The pollution risk assessment results are generated by combining the monitoring section identifiers, time windows, probability of occurrence of pollution events, and pollution risk levels corresponding to each potential water pollution anomaly.
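The similarity, probability, and risk level calculations can be sketched as below. Several choices are assumptions for illustration: the distance is taken as the Euclidean distance over the two numeric features only (the monitoring section identifier is left out), the constant defaults to 1.0, and the thresholds 0.3 and 0.7 are hypothetical values, since the text does not fix any of these.

```python
import math

def similarity(event, historical, c=1.0):
    """Event similarity = c / (c + distance between the feature vectors)."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(event, historical)))
    return c / (c + distance)

def occurrence_probability(event, history, c=1.0):
    """history: list of (feature_vector, tag), tag 1 for a real pollution
    event and 0 otherwise. Returns the similarity-weighted mean of the tags."""
    sims = [similarity(event, features, c) for features, _ in history]
    weighted = sum(s * tag for s, (_, tag) in zip(sims, history))
    return weighted / sum(sims)  # assumes a non-empty history

def risk_level(probability, first_threshold, second_threshold):
    """Map the occurrence probability to low, medium, or high risk."""
    if probability < first_threshold:
        return "low"
    if probability < second_threshold:
        return "medium"
    return "high"

# Hypothetical features: (risk at anomaly window, risk at previous window).
event = (0.5, 0.25)
history = [((0.5, 0.25), 1), ((0.5, 1.25), 0)]
probability = occurrence_probability(event, history)
level = risk_level(probability, first_threshold=0.3, second_threshold=0.7)
```

In this example the identical historical event has similarity 1.0 and the other has similarity 0.5, giving a probability of about 0.667.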

[0030] In this embodiment, the monitoring section identifier, time window, and corresponding pollution risk level are obtained from the pollution risk assessment results. A dynamic risk early warning mechanism is established based on the pollution risk level. Corresponding risk early warning thresholds are set for different pollution risk levels, and the pollution risk assessment results are judged and processed according to the risk early warning thresholds. When the pollution risk level of the monitoring section reaches the corresponding risk early warning threshold, the corresponding level of early warning information is generated. The early warning information is associated with the monitoring section identifier, time window, and pollution risk level, and the associated early warning information is sent to the environmental monitoring management platform. After receiving the early warning information, the environmental monitoring management platform displays, records, and stores the early warning information in real time.

[0031] Example 1: During the water environment management of a river basin in an industrial city along the Yangtze River, due to the concentrated distribution of industrial enterprises and the continuous increase in urban sewage discharge, water quality indicators frequently fluctuated significantly at multiple monitoring sections of the river. The basin is approximately 68 kilometers long and has 12 water environment monitoring sections. Each section is equipped with online water quality monitoring equipment, hydrological monitoring equipment, and meteorological monitoring equipment. The water quality monitoring equipment can monitor dissolved oxygen, chemical oxygen demand (COD), ammonia nitrogen, and total phosphorus in real time. The hydrological monitoring equipment records water flow velocity and direction information, and the meteorological monitoring equipment records meteorological data such as rainfall, temperature, and wind speed. In traditional water environment monitoring systems, data collected by various monitoring devices are stored separately in different data systems, resulting in inconsistent data structures and different timestamp formats, making unified analysis of the monitoring data difficult. Furthermore, most existing technologies use fixed thresholds for water pollution early warning; for example, an alarm is triggered when the COD level at a monitoring section exceeds a set threshold. This approach can only respond after pollution has already occurred, and cannot identify changes in pollution risk trends in advance, which often leads to a lag in the response of water environment management departments when dealing with sudden pollution incidents.

[0032] To address the aforementioned issues, this invention proposes a water pollution control risk early warning method based on big data processing, applied to the watershed water environment monitoring system. First, water environment monitoring data from 12 monitoring sections is collected in real-time through a water environment monitoring network, including water quality index data, hydrological dynamic data, and meteorological environmental data. The collected data undergoes data cleaning to remove outliers and fill in missing data. Subsequently, a data processing module based on the Apache Flink stream processing framework performs data format conversion and timestamp reconstruction on the cleaned water environment monitoring data, unifying the data from different monitoring devices into a structured water environment monitoring dataset. In this way, data from different sources are integrated into a unified data structure, solving the problem of difficulty in merging multi-source monitoring data.
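The format conversion, timestamp reconstruction, and window aggregation described above can be illustrated with a plain-Python stand-in. It does not use Apache Flink itself; the field names (`time`, `section`, `type`, `value`), the 300-second window length, and the use of the mean as the aggregation are assumptions made for this sketch.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean

WINDOW_SECONDS = 300  # hypothetical window length; not given in the source


def parse_timestamp(raw):
    """Accept epoch seconds or ISO 8601 text and return epoch seconds (UTC)."""
    if isinstance(raw, (int, float)):
        return int(raw)
    dt = datetime.fromisoformat(raw)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive times are UTC
    return int(dt.timestamp())


def window_start(epoch, base=0):
    """Snap a timestamp to the start of its window, counted from a time base."""
    return base + ((epoch - base) // WINDOW_SECONDS) * WINDOW_SECONDS


def structure(records):
    """Map heterogeneous records to one schema and aggregate per window.

    Values sharing the same window, monitoring section, and data type are
    averaged into a single structured record.
    """
    buckets = defaultdict(list)
    for r in records:
        ts = window_start(parse_timestamp(r["time"]))
        buckets[(ts, r["section"], r["type"])].append(float(r["value"]))
    return [
        {"window_start": ts, "section": section, "type": dtype, "value": mean(vals)}
        for (ts, section, dtype), vals in sorted(buckets.items())
    ]
```

In a streaming deployment the same grouping key (window, section, data type) would typically drive a keyed window aggregation; the batch loop here only shows the data shape.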

[0033] After structuring the data, the structured water environment monitoring dataset undergoes feature fusion processing. Water quality index data, pollution source emission data, hydrological dynamics data, and meteorological environmental data are represented using unified features to construct a water environment risk feature vector. Subsequently, a water pollution propagation relationship diagram is established based on the water flow relationships between monitoring sections and the pollutant diffusion paths, enabling the system to reflect the spatial propagation patterns of pollution in water bodies. By inputting the water environment risk feature vector and the water pollution propagation relationship diagram into the water pollution trend prediction model, the pollution risk change trends of each monitoring section within multiple future time windows can be predicted. In this way, the system can identify pollution trend changes before pollution risks actually occur.
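To make the propagation relationship diagram concrete, the sketch below builds a weighted adjacency matrix over monitoring sections and applies one spatial propagation step. The weight formula (inverse travel time derived from distance and flow velocity), the unit self-loop, and row normalization are assumptions chosen for illustration; the text only states that weights depend on flow velocity and spatial distance.

```python
def build_adjacency(n, edges):
    """Build an n x n matrix from (upstream, downstream, distance_km, velocity_m_s).

    Row i holds the influence that other sections exert on section i.
    """
    adj = [[0.0] * n for _ in range(n)]
    for up, down, dist_km, velocity in edges:
        # Assumed weight: inverse of travel time in hours (faster flow and
        # shorter distance give a stronger link).
        travel_hours = (dist_km * 1000.0) / velocity / 3600.0
        adj[down][up] = 1.0 / travel_hours
    for i in range(n):
        adj[i][i] = 1.0  # assumed self-loop so each section keeps its own state
    return adj


def row_normalize(adj):
    """Scale each row to sum to 1 (one possible standardization)."""
    return [[v / sum(row) for v in row] for row in adj]


def propagate(adj_norm, features):
    """Multiply the normalized matrix by the per-section feature rows."""
    n = len(features)
    width = len(features[0])
    return [
        [sum(adj_norm[i][k] * features[k][c] for k in range(n)) for c in range(width)]
        for i in range(n)
    ]


# Three sections in a chain: 0 -> 1 -> 2 (distances and velocities are made up).
edges = [(0, 1, 3.6, 1.0), (1, 2, 7.2, 1.0)]
adj_norm = row_normalize(build_adjacency(3, edges))
mixed = propagate(adj_norm, [[9.0], [3.0], [0.0]])
```

Here a high value at the upstream section raises the propagated values downstream, which is the spatial effect the prediction model is meant to capture.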

[0034] After pollution trend prediction is completed, the system performs abrupt change detection on the pollution risk prediction results, using the Pettitt change-point (mutation point) detection method to identify abrupt changes in the pollution risk sequence. When the system detects a significant rise in pollution risk at a monitoring section within a short period, the status of that monitoring section is identified as a potential water pollution anomaly. Subsequently, the system uses a historical pollution event database to perform probabilistic inference on the potential anomaly and calculates the corresponding pollution risk level. Finally, based on the pollution risk level, risk warning information of the corresponding level is triggered, and the warning results are released in real time through the environmental monitoring and management platform, enabling water environment management personnel to take timely remedial measures.
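The detection step can be sketched with the standard Pettitt statistic. The sign convention, the choice of the first window after the split as the change window, and the function names are assumptions of this sketch; the approximate p-value is the usual Pettitt approximation and is reported for reference only, since the text does not state a significance test.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def pettitt(series):
    """Return (change_index, K, approximate p-value) for a risk sequence.

    For each split t, U_t sums sign(x_j - x_i) over i < t <= j. The split
    with the largest |U_t| marks the change; change_index is the index of
    the first value after that split.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two values")
    stats = [
        sum(sign(series[j] - series[i]) for i in range(t) for j in range(t, n))
        for t in range(1, n)
    ]
    best = max(range(len(stats)), key=lambda idx: abs(stats[idx]))
    k = abs(stats[best])
    p_value = min(1.0, 2.0 * math.exp(-6.0 * k * k / (n ** 3 + n ** 2)))
    return best + 1, k, p_value


def is_potential_anomaly(series):
    """Flag the section when risk at the change window exceeds the prior window."""
    change_index, _, _ = pettitt(series)
    return series[change_index] > series[change_index - 1]


# Made-up predicted risk values for one monitoring section.
risk = [0.10, 0.12, 0.11, 0.13, 0.10, 0.60, 0.62, 0.65, 0.61, 0.63]
change_index, k, p_value = pettitt(risk)
```

The double loop is quadratic in the sequence length, which is acceptable for short per-section prediction horizons; a rank-based formulation would be preferable for long sequences.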

[0035] To verify the practical effectiveness of the method of this invention, the system of this invention and the traditional threshold early warning system were run continuously for 90 days each in the watershed, and the pollution risk identification capabilities of the two systems were compared. Statistical results show that the method of this invention is significantly superior to the traditional method in terms of pollution risk identification accuracy, early warning time, and the number of abnormal events identified.

[0036] Table 1. Performance Comparison between Big Data Processing-Based Water Pollution Risk Early Warning System and Traditional Threshold Early Warning System

[0037] As shown in Table 1, at the same monitoring data scale, the data processing latency of the method of this invention is 3 seconds, compared with 12 seconds for the traditional system. This indicates that the data processing method based on the Apache Flink stream processing framework can significantly improve the processing efficiency of water environment monitoring data. In terms of pollution risk identification accuracy, the method of this invention reaches 92.8%, compared with 71.4% for the traditional threshold early warning system, demonstrating that fusing water environment risk features and modeling pollution propagation relationships reflects the trend of water environment changes more accurately.

[0038] Regarding the lead time for early warning, the method of this invention can identify potential pollution risks on average about 4.6 hours in advance, while traditional methods can only trigger early warnings about 0.8 hours after pollution indicators exceed the standard. This result demonstrates that this invention, through a pollution trend prediction model and abrupt change detection method, can achieve early identification of pollution risks, enabling water environment management departments to take emergency measures before pollution spreads, thereby reducing the scope of pollution impact.

[0039] Furthermore, regarding the number of anomaly events identified, the method of this invention identified 26 potential pollution anomalies within 90 days, of which 20 matched actual pollution events, while the traditional system only identified 17 anomalies. This demonstrates that the method of this invention has higher sensitivity and identification capability in pollution risk identification. In summary, this embodiment proves that the water pollution control risk early warning method based on big data processing proposed in this invention can effectively improve the accuracy of water pollution risk identification and the timeliness of early warning, thereby providing more reliable technical support for water environment governance.
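For reference, a few quantities follow directly from the figures quoted in this example. They are derived here by simple arithmetic and are not reported in the source; in particular, the first ratio is not the same metric as the 92.8% identification accuracy, whose definition is not given in this section.

```latex
\begin{aligned}
\text{share of flagged anomalies matching actual events} &= \frac{20}{26} \approx 76.9\% \\
\text{relative reduction in processing latency} &= \frac{12\,\text{s} - 3\,\text{s}}{12\,\text{s}} = 75\% \\
\text{gap between the two systems' warning times} &= 4.6\,\text{h} + 0.8\,\text{h} = 5.4\,\text{h}
\end{aligned}
```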

[0040] The above are merely preferred embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.

Claims

1. A water pollution control risk early warning method based on big data processing, characterized in that it includes the following steps: collecting water environment monitoring data and performing data cleaning; based on the Apache Flink stream processing framework, performing data format conversion and timestamp reconstruction on the cleaned water environment monitoring data to form a structured water environment monitoring dataset; performing feature fusion processing on the structured water environment monitoring dataset to construct a water environment risk feature vector; constructing a water pollution propagation relationship diagram by analyzing the water flow relationships between monitoring sections and the pollutant diffusion paths; based on the water environment risk feature vector and the water pollution propagation relationship diagram, outputting pollution risk prediction results using a water pollution trend prediction model; performing mutation detection on the pollution risk prediction results, in which the Pettitt mutation point detection method is used to identify the mutation time in the pollution risk prediction results, and the monitoring section status corresponding to the mutation time is identified as a potential water pollution anomaly; performing probabilistic inference on the potential water pollution anomalies, in which statistical analysis is conducted in conjunction with a historical pollution event database to calculate the pollution risk level corresponding to each potential water pollution anomaly and generate pollution risk assessment results; and constructing a dynamic risk early warning mechanism based on the pollution risk assessment results, in which graded risk early warning thresholds are set, corresponding early warning information is triggered for different pollution risk levels, and the early warning information is released and recorded in real time through the environmental monitoring and management platform.

2. The water pollution control risk early warning method based on big data processing according to claim 1, characterized in that the target water area is monitored by online water quality monitoring equipment, pollution source emission monitoring equipment, hydrological monitoring equipment, and meteorological monitoring equipment to collect the water environment monitoring data; the water environment monitoring data includes water quality index data, pollution source emission data, hydrological dynamic data, and meteorological environmental data; and the data cleaning includes outlier identification, missing value imputation, and time series alignment.

3. The water pollution control risk early warning method based on big data processing according to claim 1, characterized in that the formation of the structured water environment monitoring dataset specifically includes: based on the Apache Flink stream processing framework, performing data format conversion on the cleaned water environment monitoring data, in which water quality index data, pollution source emission data, hydrological dynamic data, and meteorological environmental data from different sources are mapped into a unified data field structure to obtain a standardized time series dataset; performing timestamp reconstruction on the standardized time series dataset using a preset time base to obtain reconstructed timestamps; based on the reconstructed timestamps, dividing the standardized time series dataset into time windows, with continuous time windows constructed according to time intervals to form a time window sequence arranged in chronological order; performing numerical aggregation on the water environment monitoring data that fall within the same time window, belong to the same monitoring section, and have the same data type to obtain the corresponding window aggregation value; and generating structured records based on the reconstructed timestamps, data types, and window aggregation values to form the structured water environment monitoring dataset.

4. The water pollution control risk early warning method based on big data processing according to claim 1, characterized in that the construction of the water environment risk feature vector specifically includes: dividing the data records in the structured water environment monitoring dataset according to data type to form a water quality index data sequence, a pollution source emission data sequence, a hydrological dynamic data sequence, and a meteorological environment data sequence; using the reconstructed timestamp as the time index, synchronizing and aligning the water quality index data sequence, pollution source emission data sequence, hydrological dynamic data sequence, and meteorological environment data sequence under the same timestamp; combining the window aggregation values of the various data types according to a unified field order to form a multidimensional environmental state vector corresponding to each time window; arranging the multidimensional environmental state vectors in chronological order to form a continuous environmental state time series; extending the environmental state time series using a time delay embedding method to construct delayed embedding vectors containing historical state information; applying kernel space mapping to the delayed embedding vectors, in which a kernel mapping function transforms the delayed embedding vectors into high-dimensional environmental feature representations; and applying manifold-preserving dimensionality reduction to the high-dimensional environmental feature representations to extract key feature components that reflect the trend of water environment change and pollution risk characteristics, thereby generating the water environment risk feature vector.

5. The water pollution control risk early warning method based on big data processing according to claim 1, characterized in that the construction of the water pollution propagation relationship diagram specifically includes: reading the structured water environment monitoring dataset and obtaining the spatial coordinate information, water flow velocity information, and water flow direction information of each monitoring section; using each monitoring section as a node to construct a node set; establishing graph edge relationships based on the water flow relationships between monitoring sections and the pollutant diffusion paths to form a graph edge set; calculating the spatial distance between monitoring sections; calculating the pollution propagation weights between nodes in the graph based on the water flow velocity and the spatial distance between monitoring sections; and constructing the water pollution propagation relationship diagram based on the node set, the graph edge set, and the corresponding pollution propagation weights.

6. The water pollution control risk early warning method based on big data processing according to claim 1, characterized in that the output of the pollution risk prediction results specifically includes: arranging the water environment risk feature vectors in time window order and, within the same time window, combining the water environment risk feature vectors of the monitoring sections in monitoring section identifier order to form the environmental state input matrix corresponding to that time window; standardizing the adjacency matrix corresponding to the water pollution propagation relationship diagram to obtain a standardized propagation relationship matrix; inputting the environmental state input matrix and the standardized propagation relationship matrix into the spatial propagation calculation layer of the water pollution trend prediction model, in which a spatial propagation operation is performed on the environmental state input matrix through the standardized propagation relationship matrix to obtain the spatial propagation feature matrix of each monitoring section under the current time window; inputting the spatial propagation feature matrices corresponding to consecutive time windows into the time evolution calculation layer of the water pollution trend prediction model in chronological order, in which the time evolution feature matrix of each monitoring section under the current time window is obtained by recursive calculation from the spatial propagation feature matrix of the current time window and the time evolution feature matrix of the previous time window; performing trend prediction calculations based on the time evolution feature matrix to obtain the trend prediction matrix of each monitoring section under future time windows; and reading the trend prediction values of each monitoring section under the future time windows from the trend prediction matrix and converting them into pollution risk prediction values through a risk mapping function to obtain the pollution risk prediction results of each monitoring section under different time windows.

7. The water pollution control risk early warning method based on big data processing according to claim 1, characterized in that the identification of potential water pollution anomalies specifically includes: organizing the pollution risk prediction results by monitoring section identifier and time window order to form the pollution risk time series of each monitoring section; traversing the pollution risk time series of each monitoring section and recording the pollution risk prediction values sequentially in time window order to form the pollution risk sequence used for the mutation detection calculation; using the Pettitt mutation point detection method to calculate a statistic for each time window of the pollution risk sequence; calculating the absolute values of the statistics of all time windows, comparing them, and determining the time window with the largest absolute value as the mutation time window in the pollution risk sequence; using the mutation time window to determine the moment at which the pollution risk prediction results change, and reading the pollution risk prediction value corresponding to the mutation time window; comparing the pollution risk prediction value corresponding to the mutation time window with the pollution risk prediction value of the preceding time window; when the pollution risk prediction value corresponding to the mutation time window is greater than that of the preceding time window, determining that the pollution risk has increased at the mutation time window; and, after determining that the pollution risk has increased, marking the status of the monitoring section corresponding to the mutation time window as a potential water pollution anomaly.

8. The water pollution control risk early warning method based on big data processing according to claim 1, characterized in that the generation of the pollution risk assessment results specifically includes: organizing the potential water pollution anomalies by monitoring section identifier and time window order to form an anomaly event sequence; extracting historical pollution event samples from the historical pollution event database and organizing them by pollution occurrence time, pollution monitoring section, changes in pollution indicators, and pollution event handling results to form a historical pollution event sample set; for each potential water pollution anomaly, extracting the corresponding event feature information, and extracting the historical event feature information of each historical pollution event from the historical pollution event sample set; calculating the difference between the event feature information of each potential water pollution anomaly and the historical event feature information of each historical pollution event in the sample set to obtain event similarity values; calculating the probability of occurrence of the pollution event corresponding to each potential water pollution anomaly based on the event similarity values; determining the pollution risk level corresponding to each potential water pollution anomaly based on the probability of occurrence; and generating the pollution risk assessment results by combining the monitoring section identifier, time window, probability of occurrence, and pollution risk level corresponding to each potential water pollution anomaly.

9. The water pollution control risk early warning method based on big data processing according to claim 1, characterized in that the monitoring section identifier, time window, and corresponding pollution risk level are obtained from the pollution risk assessment results; a dynamic risk early warning mechanism is established based on the pollution risk level; corresponding risk early warning thresholds are set for different pollution risk levels; the pollution risk assessment results are judged and processed according to the risk early warning thresholds; when the pollution risk level of a monitoring section reaches the corresponding risk early warning threshold, early warning information of the corresponding level is generated; the early warning information is associated with the monitoring section identifier, time window, and pollution risk level, and the associated early warning information is sent to the environmental monitoring management platform; and, after receiving the early warning information, the environmental monitoring management platform displays, records, and stores it in real time.