A pollution source tracing method coupling water system topology constraints and multi-source tracing technology
By constructing a digital water system topology network and multi-dimensional feature spatial analysis of the watershed, combined with multi-source tracing technology, the problems of low source identification specificity and inaccurate positioning in watershed pollution source tracing have been solved, achieving accurate source tracing and spatial positioning of pollution sources, and improving the robustness and reliability of the source tracing method.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ANHUI PROVINCIAL ACAD OF ECOLOGICAL & ENVIRONMENTAL SCI (ANHUI PROVINCIAL ECOLOGICAL ENVIRONMENT PLANNING INST ANHUI PROVINCIAL ECOLOGICAL ENVIRONMENTAL ENG CONSULTING & DESIGN INST)
- Filing Date
- 2026-03-30
- Publication Date
- 2026-06-26
AI Technical Summary
Existing watershed pollution source tracing technologies suffer from several drawbacks: insufficient dimensionality of single chemical indicators leading to low source identification specificity; independence between chemical identification and spatial analysis resulting in qualitative source tracing results; and a lack of physical constraints in source apportionment models, making it difficult to accurately locate pollution sources.
By combining water system topological constraints with multi-source tracing technology, a digital water system topological network is constructed for the watershed, and multi-dimensional feature spatial analysis is performed. The complementary analysis of stable isotopes, biomarkers, three-dimensional fluorescence spectroscopy and conventional water quality physicochemical parameters is utilized, and a spatial distribution map of pollution contribution rate is generated by combining geographic information system, thus achieving the organic integration of chemistry and space.
It improves the specificity and accuracy of pollution source identification, ensures that the source tracing results are accurately located to specific spatial emission units, enhances the robustness and reliability of the analysis results, and overcomes interference and distortion in the pollutant migration process.
Figure CN122287101A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of environmental monitoring and water pollution control technology, specifically to a pollution source tracing method that couples water system topological constraints with multi-source tracing technology.
Background Technology
[0002] Watershed pollution source tracing is a key technology for ensuring water environment safety and supporting precise governance decisions. As industrial, agricultural, and domestic pollution sources intertwine and overlap within a watershed, source tracing becomes increasingly complex. Current watershed pollution source tracing technology follows two main paths. The first is chemical tracer analysis, which identifies pollution sources by analyzing differences in characteristic chemical components between water bodies and pollution sources. It mainly includes: stable isotope analysis, which uses nitrogen and oxygen isotopes, carbon isotopes, and phosphorus and oxygen isotopes to trace the source of specific elements, since different pollution sources exhibit distinguishable isotopic compositions owing to differences in their biogeochemical processes; biomarker analysis, which uses the composition spectrum and characteristic ratios of organic compounds such as fecal sterols, cholesterol, and bile acids to distinguish between human fecal pollution and that of different animals; and three-dimensional fluorescence spectroscopy, which rapidly identifies the source category of organic pollutants by analyzing the fluorescence response of dissolved organic matter at different excitation-emission wavelengths. The second is hydrological spatial analysis, which uses spatial data such as geographic information systems and digital elevation models to analyze the potential migration paths of pollutants from a physical perspective. This includes using digital elevation models to extract watershed boundaries, river network structures, sub-watershed divisions, and water flow directions, and using source-sink relationships to estimate pollution loads and spatially characterize potential emission paths.
However, existing technologies have the following shortcomings. First, a single chemical indicator has too few dimensions, which leads to low source identification specificity. Existing isotope analysis techniques usually target the isotopic composition of a single element, yet different pollution sources may have overlapping isotopic composition ranges for a given indicator. In complex watersheds with multiple mixed sources, relying solely on a single chemical indicator can easily lead to misidentification of source types, resulting in insufficient specificity and low confidence in the source tracing conclusions. Second, chemical identification and spatial analysis are carried out independently, so source tracing stays at a qualitative level. Chemical tracer analysis can identify the source type of pollutants but cannot determine their specific spatial emission location; hydrological spatial analysis can characterize the potential physical pathways of pollutant migration but cannot identify the chemical identity and type of the migrating substances. Because the two lack an effective coupling mechanism, source tracing results often provide only qualitative inferences and cannot accurately locate specific spatial emission units, which restricts the targeting and precision of governance measures. Third, source apportionment models lack physical constraints and are prone to introducing unreasonable candidate sources. Existing chemical mass balance models or receptor models usually include all known end-members in the candidate source set when performing source apportionment, without considering the natural constraints that the watershed topology network places on the direction and reach of pollutant migration. Emission sources located downstream of the receptor section, or emission sources that have no hydraulic connection with the receptor section, may therefore be incorrectly included in the calculation, causing the apportionment results to deviate from physical reality. To address this, a pollution source tracing method that couples watershed topology constraints with multi-source tracing technology is proposed.
Summary of the Invention
[0003] The purpose of this invention is to provide a pollution source tracing method that couples water system topological constraints with multi-source tracing technology.
[0004] To achieve the above objective, the present invention provides the following technical solution: a pollution source tracing method coupling water system topological constraints and multi-source tracing technology, comprising the following steps:
Constructing a digital water system topology network for the watershed: based on the digital elevation model data of the target watershed, through hydrological preprocessing, water flow direction calculation, flow accumulation analysis, and river network extraction, the natural river network is abstracted into a topology network graph composed of nodes and directed edges, and a reachability matrix is generated from the directed edge connections;
Constructing a multi-source pollution tracer fingerprint database: a watershed pollution source survey is conducted, all potential pollution source types are identified and classified, end-member samples of the potential pollution sources within the watershed are collected, and four complementary analyses are performed on each end-member sample: stable isotope analysis, biomarker fingerprint analysis, three-dimensional fluorescence spectroscopy analysis, and conventional water quality physicochemical parameter analysis. After standardization, a multi-dimensional feature vector is formed, and the vectors are aggregated statistically by source type into an end-member fingerprint database. Stable isotopes are good at distinguishing the broad difference between inorganic and organic sources; biomarkers further distinguish human sources from different livestock and poultry sources within the organic category; three-dimensional fluorescence spectroscopy quickly captures the compositional characteristics of organic matter; and conventional water quality parameters provide auxiliary distinguishing information. The four analytical methods are complementary;
Conducting multi-source synchronous monitoring and data acquisition at receptor points in the watershed: based on the water system topology network, monitoring points are set up at key nodes, water samples are collected synchronously, and the same four fingerprint analyses as for the end-member samples are performed to obtain the multi-dimensional feature vectors of the receptor points. The collected receptor point fingerprint data undergo quality control and preprocessing, including removing outliers caused by abnormal sampling or analysis and filling missing data with a reasonable interpolation method;
Performing multi-source data fusion and pollution source apportionment based on topological constraints: the data are bound spatially by associating end-member fingerprints and receptor point fingerprints with locations in the topological network, so that each topological network node carries its corresponding multi-dimensional fingerprint information or end-member attribution information; the reachability matrix is used to filter the candidate source set; a chemical mass balance model with topological constraints is constructed; and the contribution ratio of each source to each receptor point is solved;
Generating a spatial map of pollution contribution and a source tracing report: the contribution rates are assigned back to the spatial source locations, a geographic information system is used to generate a spatial distribution map of pollution contribution rate and a critical migration path map, and a comprehensive source tracing report is formed.
[0005] As a further aspect of the present invention: constructing the watershed digital water system topology network includes the following. The digital elevation model data is preprocessed by depression filling to eliminate depressions with no outflow. Depression filling uses a grid-by-grid scan to identify local depressions that are lower than all adjacent grid cells and raises their elevation to that of the lowest overflow outlet. The D8 single-direction algorithm is used to calculate the water flow direction of each grid cell: among the eight adjacent grid cells, the flow direction is assigned to the neighbor with the largest elevation difference. The D8 algorithm encodes the water flow direction as one of eight directions, so each grid cell obtains a unique flow direction code. The flow accumulation is calculated from the water flow direction data; the larger the flow accumulation value, the larger the upstream catchment area of the grid cell and the more likely it is to form a natural river channel. A flow accumulation threshold is set to extract the river network structure, and the watershed boundary and sub-watersheds are delineated from the water flow direction and the watershed outlet. The river network and sub-watershed division results are abstracted into a topological network graph composed of nodes and directed edges. The node types include monitoring section nodes at the watershed outlet, tributary confluence nodes, and generalized discharge outlet nodes of potential pollution sources. The generalized discharge outlet nodes include industrial enterprise sewage outlets, livestock and poultry farm outlets, urban domestic sewage outlets, and generalized discharge outlets of contiguous farmland.
The directed edges point from upstream nodes to downstream nodes, and the edge attributes include river segment length, confluence area, and average slope. Based on the directed edge connections between nodes in the topological network graph, a reachability matrix is generated using a directed graph traversal algorithm, implemented as either breadth-first search or depth-first search. The reachability matrix $R$ is an $N \times N$ square matrix, where $N$ is the total number of nodes. Its element $r_{ij}$ is defined as follows: if there exists a directed path from node $i$ to node $j$, then $r_{ij} = 1$; otherwise $r_{ij} = 0$. Here $i$ denotes the source node index and $j$ denotes the receptor node index.
[0006] As a further aspect of the present invention: the four complementary analyses specifically include the following.
The stable isotope analysis measures the nitrogen and oxygen isotope values and the phosphorus and oxygen isotope values in the sample. Nitrogen and oxygen isotopes distinguish well between chemical fertilizer sources, soil organic nitrogen mineralization sources, livestock and poultry manure sources, and domestic sewage sources, because the nitrogen isotope composition of these source types differs significantly. Phosphorus and oxygen isotopes are specific for distinguishing between mineral phosphate fertilizer sources, organic phosphorus sources, and microbially cycled phosphorus.
The biomarker fingerprint analysis determines the content of fecal sterol, cholesterol, stigmasterol, sitosterol, and campesterol in the sample and calculates characteristic ratios. The characteristic ratios include the ratio of fecal sterol to cholesterol and the proportion of fecal sterol in total sterols. Feces from different sources have different sterol composition patterns.
The three-dimensional fluorescence spectroscopy analysis scans the three-dimensional fluorescence spectrum of the dissolved organic matter in the sample. The excitation wavelength range is set to 200 nm to 450 nm, the emission wavelength range is set to 250 nm to 550 nm, and the scanning interval is 5 nm. The three-dimensional fluorescence spectral matrix is decomposed into several independent fluorescence components by parallel factor analysis, each corresponding to a specific type of organic matter. The relative proportions of the fluorescence components of dissolved organic matter from a given pollution source constitute its characteristic fluorescence fingerprint.
The conventional water quality physicochemical parameter analysis measures total nitrogen, total phosphorus, ammonia nitrogen, chemical oxygen demand, dissolved oxygen, pH, conductivity, and suspended solids in the sample.
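As a small illustration of the two sterol characteristic ratios named above, the following sketch computes them from one sample. The concentrations and the dictionary keys are made-up placeholders, not values from the patent, and "total sterols" is assumed here to mean the sum of the five sterols listed in the text.

```python
def sterol_ratios(conc):
    """Compute the two characteristic ratios from sterol concentrations.

    conc maps sterol name -> concentration (all in the same unit).
    Assumes a non-zero cholesterol concentration.
    """
    total = sum(conc.values())  # assumption: total = sum of the five listed sterols
    fecal = conc["fecal_sterol"]
    return {
        "fecal_sterol_to_cholesterol": fecal / conc["cholesterol"],
        "fecal_sterol_fraction": fecal / total,
    }

# Hypothetical sample, illustrative numbers only.
sample = {
    "fecal_sterol": 4.0,
    "cholesterol": 2.0,
    "stigmasterol": 1.0,
    "sitosterol": 2.0,
    "campesterol": 1.0,
}
ratios = sterol_ratios(sample)
print(ratios)
```

Both ratios would then enter the multi-dimensional feature vector alongside the isotope, fluorescence, and water quality indicators.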
[0007] As a further aspect of the present invention: the standardization employs the Z-score method. For each indicator dimension, the mean of all samples in that dimension is subtracted from the original measurement value, and the result is divided by the standard deviation, eliminating differences in units and magnitudes between indicators. The standardized indicator values are then concatenated in sequence to form the $M$-dimensional feature vector of the end-member sample, where $M$ is the total dimension of all indicators. For multiple end-member samples of the same pollution source type, the mean and standard deviation of their feature vectors are calculated in each dimension and taken as the representative fingerprint and uncertainty range of that source. No fewer than 5 samples are collected for each pollution source type to ensure the reliability of the statistics, and the sampling points cover the main distribution area of that source type in the watershed.
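The standardization and fingerprint statistics described above can be sketched as follows. This is a minimal illustration under stated assumptions: the two source types, the two indicator dimensions, and all numbers are invented for the example, and the sample standard deviation is used because the text does not say which estimator applies.

```python
from statistics import mean, stdev

MIN_SAMPLES = 5  # the text requires at least 5 samples per source type

def fit_zscore(samples):
    """Per-dimension mean and standard deviation over all end-member samples."""
    dims = list(zip(*samples))
    return [mean(d) for d in dims], [stdev(d) for d in dims]

def standardize(vec, mu, sigma):
    """Z-score: (value - mean) / standard deviation, per dimension."""
    return [(v - m) / s for v, m, s in zip(vec, mu, sigma)]

def fingerprint(std_samples):
    """Representative fingerprint (mean) and uncertainty (std) of one source type."""
    if len(std_samples) < MIN_SAMPLES:
        raise ValueError("need at least 5 samples per source type")
    dims = list(zip(*std_samples))
    return [mean(d) for d in dims], [stdev(d) for d in dims]

# Hypothetical raw measurements: 2 indicators, 5 samples per source type.
raw = {
    "livestock": [[12.0, 0.8], [11.0, 0.9], [13.0, 0.7], [12.5, 0.85], [11.5, 0.75]],
    "fertilizer": [[2.0, 0.1], [1.5, 0.2], [2.5, 0.15], [1.0, 0.05], [3.0, 0.1]],
}
all_samples = [s for group in raw.values() for s in group]
mu, sigma = fit_zscore(all_samples)
standardized = {k: [standardize(s, mu, sigma) for s in v] for k, v in raw.items()}
fingerprints = {k: fingerprint(v) for k, v in standardized.items()}
```

Keeping `mu` and `sigma` alongside the database matters because receptor samples are later standardized with the same parameters.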
[0008] As a further aspect of the present invention: monitoring points are set up at key nodes according to the following principles: a main receptor monitoring section is set at the outlet of the watershed, a control section is set on each major tributary before it joins the main channel, a background reference section is set in the upstream reach of the watershed, and additional, more closely spaced monitoring sections are set downstream of the discharge outlets of key pollution sources. The spatial layout of the monitoring points covers the main confluence nodes and key discharge path nodes in the topological network. Synchronous collection of water samples requires that the difference in sampling time between sampling points be kept within a set time window. The four fingerprint analyses performed on each receptor point water sample are exactly the same as those performed on the end-member samples, and the analysis methods, instruments, and operating procedures are strictly consistent with those of the end-member analysis, so that end-member fingerprints and receptor point fingerprints are comparable under the same analytical framework. The receptor point data undergo the same Z-score standardization as the end-member samples, using the mean and standard deviation parameters established in the end-member analysis, so that the receptor point feature vectors and the end-member feature vectors lie in the same standardized space.
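Two of the requirements above lend themselves to a short check: the sampling time window and the reuse of end-member standardization parameters. The sketch below assumes a 2-hour window and invented parameter values; the text only says "a set time window" and gives no numbers.

```python
from datetime import datetime, timedelta

def within_window(times, window):
    """True if the spread between the earliest and latest sampling time fits the window."""
    return max(times) - min(times) <= window

def standardize_receptor(raw, endmember_mean, endmember_std):
    """Standardize a receptor vector with parameters fitted on end-member samples."""
    return [(v - m) / s for v, m, s in zip(raw, endmember_mean, endmember_std)]

# Hypothetical sampling times at three monitoring sections.
times = [
    datetime(2026, 1, 10, 9, 0),
    datetime(2026, 1, 10, 9, 40),
    datetime(2026, 1, 10, 10, 15),
]
synchronous = within_window(times, timedelta(hours=2))  # assumed window

# Hypothetical end-member mean/std for two indicators, reused unchanged here.
z = standardize_receptor([8.0, 0.5], [6.0, 0.4], [4.0, 0.2])
```

Refitting the mean and standard deviation on the receptor samples instead would place the two sets of vectors in different standardized spaces, which is what the text rules out.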
[0009] As a further aspect of the present invention: filtering the candidate source set using the reachability matrix includes: for each receptor node $j$, selecting all source nodes $i$ that satisfy $r_{ij} = 1$, where $r_{ij}$ is the element of the reachability matrix for source node $i$ and receptor node $j$, to form the candidate source set $S_j$ of receptor point $j$, that is, $S_j = \{\, i \mid r_{ij} = 1 \,\}$. The candidate source set contains only source nodes that are physically upstream of receptor point $j$ and reachable via a hydraulic path; downstream nodes and unconnected nodes are excluded.
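The filtering rule can be illustrated with a tiny hand-written reachability matrix. The five-node network below is hypothetical: nodes 0 and 1 are sources upstream of the confluence at node 2, node 3 is the watershed outlet, and node 4 is a source that discharges directly into node 3.

```python
def candidate_sources(R, receptor, source_nodes):
    """Return the source nodes i with R[i][receptor] == 1 (upstream and connected)."""
    return [i for i in source_nodes if R[i][receptor] == 1]

# R[i][j] = 1 if a directed path exists from node i to node j (hypothetical network).
R = [
    [0, 0, 1, 1, 0],  # source 0 reaches confluence 2 and outlet 3
    [0, 0, 1, 1, 0],  # source 1 reaches confluence 2 and outlet 3
    [0, 0, 0, 1, 0],  # confluence 2 reaches outlet 3
    [0, 0, 0, 0, 0],  # outlet 3 reaches nothing
    [0, 0, 0, 1, 0],  # source 4 reaches only outlet 3
]
sources = [0, 1, 4]

at_confluence = candidate_sources(R, 2, sources)
at_outlet = candidate_sources(R, 3, sources)
print(at_confluence, at_outlet)
```

Source 4 enters the candidate set only for the outlet, since it joins the river below the confluence section.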
[0010] As a further aspect of the present invention: constructing the chemical mass balance model with topological constraints includes: taking the observed multi-dimensional feature vector of receptor point $j$ as the dependent variable and the end-member multi-dimensional feature vectors of the source nodes in the candidate source set as the independent variables, a system of mass balance equations is constructed for receptor point $j$. The mass balance equation for the $k$-th feature indicator is:
$$y_{jk} = \sum_{i \in S_j} f_{ij}\, x_{ik} + \varepsilon_{jk}$$
where $y_{jk}$ is the observation of receptor point $j$ on the $k$-th feature indicator; $x_{ik}$ is the mean end-member feature value of candidate source node $i$ on the $k$-th feature indicator; $f_{ij}$ is the contribution ratio of candidate source node $i$ to receptor point $j$; $S_j$ is the candidate source set of receptor point $j$; $\varepsilon_{jk}$ is the residual term on the $k$-th feature indicator; $k$ is the index of the feature indicator dimension, $k = 1, 2, \ldots, M$; and $M$ is the total dimension of the feature indicators. The contribution ratios satisfy the constraints $f_{ij} \geq 0$ and $\sum_{i \in S_j} f_{ij} = 1$, meaning that the contribution ratio of each candidate source is non-negative and the contribution ratios of all candidate sources sum to 1. With $M$ feature indicators in total and $n$ source nodes in the candidate source set, the equations form an overdetermined system of $M$ equations in $n$ unknowns, which is solved by the non-negative least squares method to obtain the contribution ratio of each source to each receptor point.
[0011] As a further aspect of the present invention: solving by the non-negative least squares method includes: constructing the source fingerprint matrix $X$ of dimension $M \times n$, whose element in row $k$ and column $i$ is $x_{ik}$, the mean end-member feature value of candidate source node $i$ on the $k$-th feature indicator, where $n$ is the number of source nodes in the candidate source set, $M$ is the total dimension of the feature indicators, and $M > n$; constructing the receptor point observation vector $y_j$ of dimension $M \times 1$, whose $k$-th element is $y_{jk}$, the observation of receptor point $j$ on the $k$-th feature indicator; defining the contribution rate vector to be solved, $f_j$, of dimension $n \times 1$, whose $i$-th element is $f_{ij}$, the contribution ratio of candidate source node $i$ to receptor point $j$; and transforming the problem into the constrained optimization problem
$$\min_{f_j} \lVert X f_j - y_j \rVert_2^2$$
where $\lVert \cdot \rVert_2$ denotes the Euclidean norm, subject to the constraints $f_{ij} \geq 0$ and $\sum_{i \in S_j} f_{ij} = 1$, with $S_j$ the candidate source set of receptor point $j$. The problem is solved by an iterative active-set method. The active set is first initialized. In each iteration, the variables whose current values are greater than or equal to 0 are included in the active set, and an unconstrained least squares solve is performed on the variables in the active set. If a negative component appears in the solution, the corresponding variable is removed from the active set and set to zero, and the solve is repeated. Iteration continues until all components are non-negative and the Karush-Kuhn-Tucker (KKT) conditions are satisfied. The resulting contribution rate vector is then normalized so that its components sum to 1.
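The solve-then-normalize procedure can be sketched with NumPy as below. This is a simplified reading of the loop described above, not a full active-set implementation: it starts with every candidate source in the set, drops sources whose least squares coefficient is negative, and never re-admits a dropped source, so it does not verify the KKT conditions. A library routine such as `scipy.optimize.nnls` would be a more robust choice in practice. The fingerprint values are invented, and the receptor vector is built as an exact 70/30 mix so the expected answer is known.

```python
import numpy as np

def simple_active_set_nnls(X, y):
    """Least squares with negative components removed iteratively (simplified sketch)."""
    n = X.shape[1]
    active = np.ones(n, dtype=bool)  # sources currently allowed a non-zero share
    f = np.zeros(n)
    for _ in range(n + 1):           # each pass removes at least one source or stops
        f[:] = 0.0
        if not active.any():
            break
        f[active] = np.linalg.lstsq(X[:, active], y, rcond=None)[0]
        negative = f < 0
        if not negative.any():
            break
        active &= ~negative          # drop sources with a negative coefficient
    return np.clip(f, 0.0, None)

def contribution_ratios(X, y):
    """Normalize the non-negative solution so that the ratios sum to 1."""
    f = simple_active_set_nnls(X, y)
    total = f.sum()
    return f / total if total > 0 else f

# Hypothetical standardized fingerprints: M = 4 indicators (rows), n = 3 sources (columns).
X = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 2.0],
    [2.0, 1.0, 0.0],
    [1.0, 3.0, 1.0],
])
y = 0.7 * X[:, 0] + 0.3 * X[:, 1]    # receptor built as a 70/30 mix of sources 0 and 1

f = contribution_ratios(X, y)
print(np.round(f, 3))
```

Note that normalizing after the solve is not the same as imposing the sum-to-one constraint inside the optimization; the sketch follows the order given in the text.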
[0012] As a further aspect of the present invention: after the contribution ratio of each source to each receptor point is determined, the method further includes calculating the apportionment residuals. The residual is calculated as:
$$\varepsilon_{jk} = y_{jk} - \sum_{i \in S_j} f_{ij}\, x_{ik}$$
where $\varepsilon_{jk}$ denotes the residual on the $k$-th feature indicator, $y_{jk}$ is the observation of receptor point $j$ on that indicator, $x_{ik}$ is the mean end-member feature value of candidate source node $i$, $f_{ij}$ is the solved contribution ratio, and $S_j$ is the candidate source set of receptor point $j$; the symbols have the same meanings as in the mass balance equation. The root mean square of the normalized residuals over all dimensions is then calculated. If it is less than the set threshold, the apportionment result is considered reliable. The uncertainty range of the contribution rates is evaluated by Monte Carlo simulation: for each source end-member, random perturbation samples of the feature values are generated in each dimension from their mean and standard deviation, the solution process is repeated no fewer than 1000 times, and the distribution interval of the contribution rate of each source is taken as the uncertainty assessment result.
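The residual check and the Monte Carlo loop might look like the sketch below. Several choices are assumptions rather than statements from the text: perturbations are drawn from a normal distribution, the interval is reported as the 2.5th to 97.5th percentile, the end-member standard deviation of 0.05 is invented, and the solver is the same simplified drop-negatives scheme rather than a full non-negative least squares routine.

```python
import numpy as np

def solve_contributions(X, y):
    """Simplified non-negative solve followed by normalization (sketch only)."""
    n = X.shape[1]
    active = np.ones(n, dtype=bool)
    f = np.zeros(n)
    for _ in range(n + 1):
        f[:] = 0.0
        if not active.any():
            break
        f[active] = np.linalg.lstsq(X[:, active], y, rcond=None)[0]
        if (f >= 0).all():
            break
        active &= f >= 0
    f = np.clip(f, 0.0, None)
    total = f.sum()
    return f / total if total > 0 else f

def residual_rms(X, y, f):
    """Root mean square of the residuals y - X f in standardized units."""
    eps = y - X @ f
    return float(np.sqrt(np.mean(eps ** 2)))

def monte_carlo_interval(X_mean, X_std, y, n_draws=1000, seed=0):
    """Perturb end-member fingerprints and collect the spread of the contribution ratios."""
    rng = np.random.default_rng(seed)
    draws = np.empty((n_draws, X_mean.shape[1]))
    for t in range(n_draws):
        X_t = rng.normal(X_mean, X_std)  # assumption: normal perturbation
        draws[t] = solve_contributions(X_t, y)
    return np.percentile(draws, [2.5, 97.5], axis=0)  # assumption: 95% interval

# Hypothetical fingerprints (4 indicators x 3 sources) and a 70/30 mixed receptor.
X_mean = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 2.0], [2.0, 1.0, 0.0], [1.0, 3.0, 1.0]])
X_std = np.full_like(X_mean, 0.05)
y = 0.7 * X_mean[:, 0] + 0.3 * X_mean[:, 1]

f_hat = solve_contributions(X_mean, y)
rms = residual_rms(X_mean, y, f_hat)
interval = monte_carlo_interval(X_mean, X_std, y)
```

Row 0 of `interval` holds the lower bounds and row 1 the upper bounds, one column per candidate source.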
[0013] As a further aspect of the present invention: the generation of the pollution contribution spatial mapping and source tracing report includes: The contribution rate is assigned to the corresponding spatial source location, and a spatial distribution heat map of pollution contribution rate at the watershed scale is generated using a geographic information system. The color intensity represents the contribution rate of each sub-watershed to the downstream receptor section. Based on the directed edge connections in the water system topology network, a migration path diagram from key pollution sources to the receptor section is drawn, and the contribution rate and uncertainty range of each migration path are marked. The integrated analysis results generate a comprehensive source tracing report, which includes a watershed topology network map, an end-member fingerprint database summary, a receptor point monitoring data summary, quantitative results of each source contribution rate and uncertainty assessment, a heat map of the spatial distribution of pollution contribution, a critical migration path map, and a list of key pollution sources ranked by contribution rate and recommendations for governance priorities.
[0014] Compared with the prior art, the beneficial effects of the present invention by adopting the above technical solution are as follows: 1. This invention constructs a high-dimensional feature space by integrating four complementary chemical fingerprints: stable isotopes, biomarkers, three-dimensional fluorescence spectroscopy, and conventional water quality physicochemical parameters. It utilizes the distinguishing advantages of different fingerprints in different dimensions for cross-validation, effectively overcoming the source type confusion problem caused by the insufficient dimension of a single indicator, and significantly improving the specificity and accuracy of source identification.
[0015] 2. By introducing a digital water system topology network based on a digital elevation model as a spatial constraint framework, and using the reachability matrix to map the chemical analysis results from abstract source types to specific spatial emission units, the organic integration of chemical identification and spatial positioning is achieved, enabling the source tracing results to accurately point to the spatial location with the greatest contribution.
[0016] 3. By introducing the reachability matrix constraint of the water system topology into the chemical mass balance model, the source apportionment calculation considers only the upstream pollution sources that can physically reach the receptor section, eliminating the interference of unreasonable candidate sources. The calculated contribution rates therefore conform to hydrophysical laws, which enhances the robustness and reliability of the apportionment results.
[0017] 4. The combined effect of multi-dimensional fingerprint fusion analysis and water system topological constraints can effectively correct the interference and distortion that environmental behaviors of pollutants during migration, such as degradation, fractionation, and mixing, impose on a single fingerprint signal, thereby improving the robustness of the overall method.
Attached Figure Description
[0018] Figure 1 is a schematic diagram of the overall process of the method in an embodiment of the present invention;
Figure 2 is a schematic diagram of the process of constructing the digital water system topology network of the watershed in an embodiment of the present invention;
Figure 3 is a schematic diagram of the process of constructing the multi-source pollution tracing fingerprint database in an embodiment of the present invention;
Figure 4 is a schematic diagram of multi-source synchronous monitoring and data acquisition at watershed receptor points in an embodiment of the present invention;
Figure 5 is a schematic diagram of multi-source data fusion and pollution source apportionment based on topological constraints in an embodiment of the present invention.
Detailed Implementation
[0019] The specific embodiments of the present invention will be further described below with reference to the accompanying drawings. It should be noted that the description of these embodiments is for the purpose of helping to understand the present invention, but does not constitute a limitation of the present invention.
[0020] Furthermore, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not conflict with each other.
[0021] Referring to Figures 1 to 5, the present invention provides a pollution source tracing method that couples water system topological constraints with multi-source tracing technology. The method includes five core steps: constructing a digital water system topology network for the watershed, constructing a multi-source pollution tracer fingerprint database, multi-source synchronous monitoring and data acquisition at watershed receptor points, multi-source data fusion and pollution source apportionment based on topological constraints, and spatial mapping of pollution contribution with generation of a source tracing report.
[0022] The water system topology network construction step provides a spatial basis for the placement of receptor point monitoring points and provides topological constraints for the source apportionment step. The end-member fingerprint database and receptor point fingerprint data are used as inputs to the source apportionment model. The contribution rate results of each source output by the source apportionment are input into the spatial mapping step for visualization and report generation. The entire process forms a complete closed loop from spatial framework construction, fingerprint data collection, coupled model parsing to result visualization output.
[0023] I. Construction of the Digital Water System Topology Network in the Watershed
The purpose of this step is to build a digital water system topology network from the geospatial data of the target watershed, providing the basic framework for the subsequent design of monitoring point deployment and for the spatial constraints of the source apportionment model. The detailed process of this step is shown in Figure 2.
[0024] First, high-precision digital elevation model (DEM) data of the target watershed is obtained. The spatial resolution of the DEM data is chosen according to the watershed area and the required analysis accuracy. For small and medium-sized watersheds, data with a spatial resolution of 30 meters or finer is recommended; for large watersheds, the resolution may be relaxed to 90 meters.
[0025] Then, hydrological preprocessing is performed on the original digital elevation model data. First, depression filling is performed to eliminate areas without flow direction caused by data errors or terrain depressions, ensuring that each grid cell has a clear water outlet. Depression filling adopts a grid-by-grid scanning method to identify local depression areas that are lower than all adjacent grids and raise their elevation values to the lowest overflow outlet elevation.
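The grid-by-grid scan described above can be sketched for the simplest case, a single cell that is lower than all eight neighbors. This is a deliberately limited illustration: depressions spanning several cells are not handled and would need a more complete filling algorithm. The elevation grid is invented, and border cells are left untouched as an assumption.

```python
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def fill_single_cell_pits(dem):
    """Raise every interior cell that is lower than all 8 neighbors to its lowest neighbor."""
    rows, cols = len(dem), len(dem[0])
    filled = [row[:] for row in dem]  # work on a copy, keep the input intact
    changed = True
    while changed:                    # rescan until a full pass makes no change
        changed = False
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                lowest = min(filled[r + dr][c + dc] for dr, dc in NEIGHBORS)
                if filled[r][c] < lowest:
                    filled[r][c] = lowest  # lowest neighbor acts as the overflow outlet
                    changed = True
    return filled

# Hypothetical 4x4 elevation grid with a pit at row 1, column 1.
dem = [
    [5, 5, 5, 5],
    [5, 1, 4, 5],
    [5, 4, 3, 5],
    [5, 5, 2, 5],
]
filled = fill_single_cell_pits(dem)
for row in filled:
    print(row)
```

After filling, the former pit sits level with its lowest neighbor, so a flow direction rule still needs a convention for flat cells.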
[0026] The direction of water flow is calculated based on the digital elevation model data after filling depressions. For each grid cell, the D8 single-direction algorithm is used to compare the elevation difference between the grid cell and its eight adjacent grid cells. The direction of water flow is assigned to the direction of the adjacent grid cell with the largest elevation difference. The D8 algorithm encodes the direction of water flow as one of the eight directions, and each grid cell finally obtains a unique water flow direction encoding value.
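A minimal sketch of the D8 rule as stated above, assigning each cell to the neighbor with the largest elevation drop. The power-of-two direction codes (1 = east, 2 = southeast, 4 = south, and so on clockwise) are one common encoding and are an assumption here, since the text does not specify the code values. Many implementations also divide the drop by the cell-center distance so diagonal neighbors are compared by slope; that refinement is omitted to stay with the wording of the text.

```python
# (row offset, column offset) -> direction code; the encoding is an assumed convention.
D8_CODES = {
    (0, 1): 1, (1, 1): 2, (1, 0): 4, (1, -1): 8,
    (0, -1): 16, (-1, -1): 32, (-1, 0): 64, (-1, 1): 128,
}

def d8_flow_direction(dem):
    """Return a grid of direction codes; 0 marks a cell with no lower neighbor."""
    rows, cols = len(dem), len(dem[0])
    direction = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_drop, best_code = 0.0, 0
            for (dr, dc), code in D8_CODES.items():
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    drop = dem[r][c] - dem[rr][cc]
                    if drop > best_drop:  # keep the neighbor with the largest drop
                        best_drop, best_code = drop, code
            direction[r][c] = best_code
    return direction

# Hypothetical 3x3 elevation grid sloping toward the bottom-right corner.
dem = [
    [9, 8, 7],
    [8, 6, 4],
    [7, 4, 1],
]
direction = d8_flow_direction(dem)
for row in direction:
    print(row)
```

The bottom-right cell receives code 0 because it has no lower neighbor, which makes it the outlet of this toy grid.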
[0027] The flow accumulation is calculated from the water flow direction data. For each grid cell, all upstream grid cells that drain into it along the flow directions are counted, giving the flow accumulation value of that cell. The larger the flow accumulation value, the larger the upstream catchment area of the grid cell and the more likely it is to form a natural river channel.
[0028] A flow accumulation threshold is set to extract the river network structure of the watershed: grid cells whose flow accumulation exceeds the threshold are marked as river cells. The threshold is set according to the watershed area and the required river network density. At the same time, based on the water flow direction and the specified watershed outlet, the watershed boundary is delineated and the watershed is subdivided into several sub-watersheds.
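Flow accumulation and threshold-based river extraction can be sketched as follows. The direction grid uses the same assumed power-of-two codes as a typical D8 output (0 meaning no outflow), the grid itself is a toy example, and the threshold of 1 is chosen only to make the result visible. Cells are processed in upstream-to-downstream order so each cell's count is complete before it is passed on.

```python
from collections import deque

# Direction code -> (row offset, column offset); the encoding is an assumed convention.
OFFSETS = {1: (0, 1), 2: (1, 1), 4: (1, 0), 8: (1, -1),
           16: (0, -1), 32: (-1, -1), 64: (-1, 0), 128: (-1, 1)}

def flow_accumulation(direction):
    """Count, for every cell, the number of upstream cells draining into it."""
    rows, cols = len(direction), len(direction[0])

    def downstream(r, c):
        code = direction[r][c]
        if code == 0:
            return None
        rr, cc = r + OFFSETS[code][0], c + OFFSETS[code][1]
        return (rr, cc) if 0 <= rr < rows and 0 <= cc < cols else None

    indegree = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            d = downstream(r, c)
            if d:
                indegree[d[0]][d[1]] += 1

    acc = [[0] * cols for _ in range(rows)]
    queue = deque((r, c) for r in range(rows) for c in range(cols) if indegree[r][c] == 0)
    while queue:
        r, c = queue.popleft()
        d = downstream(r, c)
        if d is None:
            continue
        acc[d[0]][d[1]] += acc[r][c] + 1  # pass on own upstream count plus this cell
        indegree[d[0]][d[1]] -= 1
        if indegree[d[0]][d[1]] == 0:
            queue.append(d)
    return acc

direction = [[2, 2, 4], [2, 2, 4], [1, 1, 0]]  # toy D8 grid draining to the bottom-right
acc = flow_accumulation(direction)
threshold = 1                                   # hypothetical threshold
river = [[value > threshold for value in row] for row in acc]
```

Lowering the threshold produces a denser extracted network and raising it keeps only the main channel, which is why the text ties the choice to watershed area and the desired river network density.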
[0029] The extracted natural river network and sub-basin division results are abstracted into a topological network graph composed of nodes and directed edges. The types of nodes include: watershed outlet monitoring section nodes, confluence point nodes of tributaries at all levels, and generalized discharge outlet nodes of potential pollution sources. Among them, generalized discharge outlet nodes include industrial enterprise sewage outlets, livestock and poultry farm outlets, urban domestic sewage outlets, and generalized discharge outlets of contiguous farmland. Directed edges represent river segments or hydraulic paths connecting adjacent nodes, and the direction of the edges is from upstream nodes to downstream nodes. The attributes of the edges include weight information such as river segment length, confluence area, and average slope.
[0030] Based on the directed edge connections between nodes in the topological network graph, a reachability matrix is generated by traversal. The reachability matrix $R$ is an $N \times N$ square matrix, where $N$ is the total number of nodes, with matrix elements $r_{ij}$ defined as follows: if there exists a directed path from node $i$ to node $j$, i.e., node $i$ is an upstream node of node $j$ that can reach it via directed edges, then $r_{ij} = 1$; otherwise $r_{ij} = 0$. Here $i$ is the source node index and $j$ is the receptor node index in the topology network. The reachability matrix is computed by traversing the directed graph with a breadth-first search or depth-first search algorithm. The reachability matrix defines, from a physical perspective, whether each potential pollution source can reach a given receptor monitoring section via a hydraulic path. It is the key data structure for introducing spatial constraints into the subsequent source apportionment model.
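A minimal Python sketch of the breadth-first construction of the reachability matrix is given below for illustration. The five-node example network and the convention that a node does not count as reaching itself are assumptions of this sketch.

```python
from collections import deque


def reachability_matrix(n_nodes, edges):
    """reach[i][j] == 1 if a directed path leads from node i to node j."""
    adjacency = [[] for _ in range(n_nodes)]
    for upstream, downstream in edges:
        adjacency[upstream].append(downstream)

    reach = [[0] * n_nodes for _ in range(n_nodes)]
    for start in range(n_nodes):
        seen = {start}
        queue = deque([start])
        while queue:  # breadth-first search from the start node
            node = queue.popleft()
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    reach[start][nxt] = 1
                    queue.append(nxt)
    return reach


# Hypothetical network: outlets 0 and 1 join at confluence 2, which drains to
# outlet section 3; outlet 4 joins the main channel directly at node 3.
edges = [(0, 2), (1, 2), (2, 3), (4, 3)]
R = reachability_matrix(5, edges)
```

In this hypothetical network, node 4 reaches the outlet section (node 3) but not the confluence (node 2), so it would not be treated as a possible source for a monitoring section located at node 2.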
[0031] II. Construction of the Multi-Source Pollution Tracer Fingerprint Database
The purpose of constructing the multi-source pollution tracer fingerprint database is to systematically collect representative samples of all potential pollution source end-members within the watershed and establish a localized end-member fingerprint database through multi-dimensional chemical analysis. This database is one of the core inputs for subsequent source apportionment models.
[0032] First, a watershed pollution source survey is conducted to identify and classify all potential pollution source types. Depending on the characteristics of the watershed, potential pollution source types typically include, but are not limited to: industrial enterprise discharges, urban domestic sewage discharge, livestock and poultry breeding wastewater discharge, agricultural fertilizer runoff, agricultural pesticide runoff, rural decentralized domestic sewage discharge, sediment entering rivers through soil erosion, and atmospheric dry and wet deposition.
[0033] For each type of potential pollution source, representative endmember samples should be collected within the watershed. The number of samples collected for each type of pollution source should be no less than 5 to ensure the reliability of statistical analysis. The spatial distribution of sampling points should cover the main distribution areas of that type of pollution source within the watershed.
[0034] The following four complementary analyses are performed simultaneously on each endmember sample. The first category is stable isotope analysis, which measures the nitrogen and oxygen isotope values and the phosphorus and oxygen isotope values in the sample. Nitrogen and oxygen isotopes discriminate well among chemical fertilizer sources, soil organic nitrogen mineralization sources, livestock and poultry manure sources, and domestic sewage sources. The nitrogen isotope composition values of different types of pollution sources differ significantly. Phosphorus and oxygen isotopes are specific for distinguishing mineral phosphate fertilizer sources, organic phosphorus sources, and microbially cycled phosphorus.
[0035] The second category is biomarker fingerprint analysis, which measures the content of characteristic sterols such as fecal sterol, cholesterol, stigmasterol, sitosterol, and campesterol in the sample and calculates characteristic ratios, including the ratio of fecal sterol to cholesterol and the proportion of fecal sterol to total sterols. This fingerprint plays a key role in distinguishing human feces from feces from different livestock and poultry, as feces from different sources have different sterol composition patterns.
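The two characteristic ratios named above can be computed as in the following Python sketch. The concentration values are hypothetical, and taking "total sterols" as the sum of the five sterols listed in the paragraph is an assumption of this sketch.

```python
def sterol_ratios(concentrations):
    """Compute the two characteristic sterol ratios for one sample."""
    total = sum(concentrations.values())  # assumed: total of the listed sterols
    fecal = concentrations["fecal_sterol"]
    return {
        "fecal_to_cholesterol": fecal / concentrations["cholesterol"],
        "fecal_fraction_of_total": fecal / total,
    }


# Hypothetical sterol concentrations in arbitrary but consistent units.
sample = {
    "fecal_sterol": 2.0,
    "cholesterol": 4.0,
    "stigmasterol": 1.0,
    "sitosterol": 2.0,
    "campesterol": 1.0,
}
ratios = sterol_ratios(sample)
```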
[0036] The third type is three-dimensional fluorescence spectroscopy analysis, which involves scanning the three-dimensional fluorescence spectrum of dissolved organic matter in the sample. The excitation wavelength range is set to 200 nm to 450 nm, the emission wavelength range is set to 250 nm to 550 nm, and the scanning interval is 5 nm. Through parallel factor analysis, the three-dimensional fluorescence spectrum matrix is decomposed into several independent fluorescent components. Each component corresponds to a specific type of organic matter. The relative proportion of the fluorescent components of dissolved organic matter from different pollution sources constitutes its characteristic fluorescence fingerprint.
[0037] The fourth category is the analysis of routine water quality physicochemical parameters, which measures parameters such as total nitrogen, total phosphorus, ammonia nitrogen, chemical oxygen demand, dissolved oxygen, pH value, conductivity, and suspended solids in the sample. Although these parameters have limited discriminating power when used alone, they improve the discriminating power of the overall fingerprint when incorporated into the multidimensional feature vector as auxiliary dimensions.
[0038] The four analytical methods are complementary: stable isotopes are good at distinguishing major differences between inorganic and organic sources; biomarkers are good at further distinguishing human sources from different livestock and poultry sources within the major categories of organic sources; three-dimensional fluorescence spectroscopy can quickly capture the compositional characteristics of organic matter; and conventional water quality parameters provide auxiliary distinguishing information.
[0039] The results of the four types of analysis for each endmember sample are standardized using the Z-score standardization method. For each indicator dimension, the mean of all samples in that dimension is subtracted from the original measurement value, and the result is divided by the standard deviation of that dimension, so as to eliminate the influence of differences in the units and magnitudes of different indicators on subsequent analysis.
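The Z-score step can be sketched in Python as follows. The function names and the sample values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which form is used. Keeping the fitted means and standard deviations separate from their application allows the same parameters to be reused later for other samples.

```python
from statistics import mean, stdev


def fit_zscore(samples):
    """Return per-dimension means and standard deviations of the samples."""
    columns = list(zip(*samples))
    means = [mean(col) for col in columns]
    stds = [stdev(col) for col in columns]  # sample standard deviation (assumed)
    return means, stds


def apply_zscore(samples, means, stds):
    """Standardize each value as (value - mean) / standard deviation.

    A dimension with zero standard deviation would need special handling,
    which this sketch does not attempt.
    """
    return [[(v - mu) / sd for v, mu, sd in zip(row, means, stds)]
            for row in samples]


# Hypothetical raw measurements: three samples, two indicator dimensions.
raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
means, stds = fit_zscore(raw)
standardized = apply_zscore(raw, means, stds)
```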
[0040] The standardized index values are combined sequentially to form a multidimensional feature vector for each endmember sample. Let there be $m$ indicator dimensions in total; the feature vector of each endmember sample is then an $m$-dimensional vector, where $m$ is the total dimension of all indicators. For multiple end-member samples of the same type of pollution source, the mean and standard deviation of their feature vectors are calculated in each dimension and serve as the representative fingerprint and uncertainty range of that type of source. The fingerprint information of all source types is integrated and stored to form the end-member fingerprint database.
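As an illustrative sketch under stated assumptions, the per-source-type fingerprint (mean) and uncertainty range (standard deviation) can be assembled as shown below. The dictionary layout, the source type names, and the feature values are hypothetical, and the sample standard deviation is assumed.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_fingerprint_db(samples):
    """samples: iterable of (source_type, standardized_feature_vector)."""
    grouped = defaultdict(list)
    for source_type, vector in samples:
        grouped[source_type].append(vector)

    database = {}
    for source_type, vectors in grouped.items():
        columns = list(zip(*vectors))
        database[source_type] = {
            "mean": [mean(col) for col in columns],   # representative fingerprint
            "std": [stdev(col) for col in columns],   # uncertainty range
            "n_samples": len(vectors),
        }
    return database


# Hypothetical standardized vectors (two dimensions, five samples per type).
samples = [
    ("livestock", [1.0, 0.0]), ("livestock", [2.0, 0.5]),
    ("livestock", [3.0, 1.0]), ("livestock", [4.0, 1.5]),
    ("livestock", [5.0, 2.0]),
    ("fertilizer", [-1.0, 2.0]), ("fertilizer", [-2.0, 2.0]),
    ("fertilizer", [-3.0, 3.0]), ("fertilizer", [-2.0, 4.0]),
    ("fertilizer", [-2.0, 4.0]),
]
fingerprints = build_fingerprint_db(samples)
```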
[0041] III. Multi-Source Synchronous Monitoring and Data Acquisition at Receptor Points in the Watershed
Based on the topological network obtained in the water system topology network construction step, monitoring sampling points are set up at key nodes. The placement principles are as follows: the main receptor monitoring section is set up at the watershed outlet; control sections are set up before the major tributaries of each level flow into the main channel; background reference sections are set up in the upper reaches of the watershed; and densified monitoring sections are set up downstream of suspected key pollution source discharge outlets. The spatial layout of the monitoring points should cover the main confluence nodes and key discharge path nodes in the topological network.
[0042] During the set monitoring period, water samples are collected synchronously from all monitoring sampling points. Synchronous sampling requires that the sampling time difference between each sampling point be controlled within the shortest possible time window to reduce data inconsistency caused by sampling time differences. For scenarios that require attention to time-varying characteristics, multiple rounds of synchronous sampling can be set.
[0043] For each receptor point water sample, the same four fingerprint analyses as for the end-member samples are performed: stable isotope analysis, biomarker fingerprint analysis, three-dimensional fluorescence spectroscopy analysis, and routine water quality physicochemical parameter analysis. This yields an $m$-dimensional feature vector for each receptor point. The analytical methods, instruments, and operating procedures are kept strictly consistent with those used for the end-member samples, so that the end-member fingerprints and receptor point fingerprints are comparable within the same analytical framework.
[0044] The collected receptor fingerprint data undergoes quality control and preprocessing, including: removing outliers caused by abnormal sampling or analysis processes; filling missing data with appropriate interpolation methods; and performing the same Z-score standardization process as the endmember samples, using the mean and standard deviation parameters established in the endmember analysis to ensure that the receptor feature vector and the endmember feature vector are in the same standardized space.
[0045] IV. Multi-Source Data Fusion and Pollution Source Analysis Based on Topological Constraints
This step is the core step of the present invention. By deeply coupling the water system topology network constraints with multidimensional chemical fingerprint information, quantitative analysis of pollution sources can be achieved.
[0046] First, spatial association binding of the data is performed: the fingerprint data of each end-member in the end-member fingerprint database is associated with its corresponding spatial node position in the topology network, and the fingerprint data of each receptor point is associated with its corresponding monitoring section node position in the topology network. After association binding, each topology network node carries its corresponding multidimensional fingerprint information or end-member affiliation information.
[0047] Then, candidate source set filtering is performed for each receptor node $j$. Using the aforementioned reachability matrix $R$, all source nodes $i$ satisfying $r_{ij} = 1$ are selected to form the candidate source set $S_j = \{\, i \mid r_{ij} = 1 \,\}$ of receptor point $j$. The candidate source set contains only source nodes that are physically located upstream of receptor point $j$ and can reach it via a hydraulic path, which eliminates interference from downstream nodes and disconnected nodes. Here $i$ is the source node index and $j$ is the receptor node index.
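The filtering step can be illustrated with the following Python sketch. The reachability matrix, the list of source node indices, and the function name are hypothetical; the matrix entry `reach[i][j]` is assumed to equal 1 when source node i can reach receptor node j.

```python
def candidate_sources(reach, source_nodes, receptor):
    """Keep only the source nodes with a directed path to the receptor."""
    return [i for i in source_nodes if reach[i][receptor] == 1]


# Hypothetical reachability matrix for five nodes: outlets 0 and 1 drain via
# confluence 2 to outlet section 3, while outlet 4 joins directly at node 3.
reach = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
source_nodes = [0, 1, 4]

sources_for_confluence = candidate_sources(reach, source_nodes, receptor=2)
sources_for_outlet = candidate_sources(reach, source_nodes, receptor=3)
```

In this hypothetical case, source node 4 is excluded for the receptor at node 2 because no hydraulic path connects them, while all three sources remain candidates for the outlet section at node 3.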
[0048] Next, a topologically constrained chemical mass balance model is constructed. For each receptor point, the observed multidimensional feature vector is taken as the dependent variable and the end-member multidimensional feature vectors of the source nodes in its candidate source set are taken as the independent variables, giving a topologically constrained system of chemical mass balance equations.
[0049] The mathematical expression of the model is as follows. For the $k$-th feature index of receptor point $j$, the mass balance equation is: $C_{jk} = \sum_{i \in S_j} f_{ij} F_{ik} + \varepsilon_{jk}$; where $C_{jk}$ is the observed value of receptor point $j$ on the $k$-th feature index, $F_{ik}$ is the end-member feature mean of candidate source node $i$ on the $k$-th feature index, $f_{ij}$ is the contribution proportion of candidate source node $i$ to receptor point $j$, $S_j$ is the candidate source set of receptor point $j$, $\varepsilon_{jk}$ is the residual term on the $k$-th feature index, $k$ is the index of the feature dimension, $k = 1, 2, \dots, m$, and $m$ is the total dimension of the feature indices.
[0050] The contribution proportions $f_{ij}$ satisfy the constraints: $f_{ij} \ge 0$, and $\sum_{i \in S_j} f_{ij} = 1$.
[0051] The above constraints mean that the contribution ratio of each candidate source is non-negative, and the sum of the contribution ratios of all candidate sources is 1.
[0052] The above system of equations is solved using the non-negative least squares method. With $m$ feature indices in total and $n$ source nodes in the candidate source set, the system consists of $m$ equations in $n$ unknowns and is overdetermined, where $n$ is the number of source nodes in the candidate source set. The requirement $m \ge n$ ensures that the system of equations has a determinate solution.
[0053] The detailed implementation of the solution process is as follows. Construct the source fingerprint matrix $A$ of dimension $m \times n$, whose element $A_{ki}$ is the end-member feature mean $F_{ik}$ of candidate source node $i$ on the $k$-th feature index. Construct the receptor point observation vector $b$ of dimension $m \times 1$, whose $k$-th element is the observed value $C_{jk}$ of receptor point $j$ on the $k$-th feature index. Define the contribution rate vector to be solved, $x$, of dimension $n \times 1$, whose $i$-th element is the contribution proportion $f_{ij}$ of candidate source node $i$ to receptor point $j$. The problem is then transformed into the constrained optimization problem $\min_{x} \lVert A x - b \rVert_2$, where $\lVert \cdot \rVert_2$ denotes the Euclidean norm, subject to the constraints $x_i \ge 0$ and $\sum_{i=1}^{n} x_i = 1$. This non-negativity-constrained least squares problem is solved using the iterative active-set method. The variables $x_i$ are first initialized. In each iteration, the variables that are currently non-zero are included in the active set, and an unconstrained least squares solve is performed on the variables in the active set. If a negative component appears in the solution, the corresponding variable is removed from the active set and set to zero. The solve is repeated until all components are non-negative and the KKT conditions are satisfied, at which point the process terminates.
[0054] The contribution rate vector $x$ obtained from the solution is normalized to ensure that the sum of all components is 1.
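A minimal Python sketch of the solve and normalization steps is given below, assuming NumPy and SciPy are available. It relies on `scipy.optimize.nnls`, an active-set non-negative least squares routine, rather than a hand-written iteration. Encouraging the sum-to-one constraint through one heavily weighted extra equation is a common device and an assumption of this sketch, as are the function name, the weight value, and the example numbers.

```python
import numpy as np
from scipy.optimize import nnls


def solve_contributions(A, b, sum_weight=1e3):
    """Estimate non-negative source contributions that sum to 1.

    A: (m, n) source fingerprint matrix, one column per candidate source.
    b: (m,) observed feature vector of the receptor point.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    m, n = A.shape
    if m < n:
        raise ValueError("need at least as many feature indices as sources")
    # Extra weighted row pushes sum(x) towards 1 (assumed device, see lead-in).
    A_aug = np.vstack([A, sum_weight * np.ones((1, n))])
    b_aug = np.append(b, sum_weight)
    x, _ = nnls(A_aug, b_aug)
    total = x.sum()
    if total <= 0:
        raise ValueError("all contributions are zero; the fit is degenerate")
    return x / total  # final normalization so the components sum to exactly 1


# Hypothetical fingerprints of three candidate sources over four feature indices.
A = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 2.0],
    [2.0, 1.0, 0.0],
    [1.0, 3.0, 0.0],
])
true_x = np.array([0.6, 0.4, 0.0])
b = A @ true_x  # noise-free synthetic receptor observation
x = solve_contributions(A, b)
```

With this noise-free synthetic observation the recovered vector matches the contributions used to build it, and the third source is assigned a zero contribution by the non-negativity constraint.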
[0055] The source apportionment residuals are calculated to evaluate the quality of the model fit. The residuals are calculated as: $\varepsilon_{jk} = C_{jk} - \sum_{i \in S_j} f_{ij} F_{ik}$; where $\varepsilon_{jk}$ is the residual value on the $k$-th feature index, and the other symbols have the same meanings as above. The root mean square of the normalized residuals over all dimensions is then calculated; if it is less than the set threshold, the apportionment result is considered reliable. The uncertainty range of the contribution rate results is evaluated using the Monte Carlo simulation method: random perturbation samples are generated for each dimension of the feature values of each source end-member based on their mean and standard deviation, the solution process is repeated no fewer than 1000 times, and the distribution interval of the contribution rate of each source is compiled as the uncertainty assessment result.
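The residual check and the Monte Carlo evaluation can be sketched in Python as follows, assuming NumPy and SciPy are available. Several choices here are assumptions of the sketch and are not stated in the text: perturbations are drawn from a normal distribution, the interval is reported as the 2.5th to 97.5th percentiles, the residuals are treated as already normalized because the features are Z-scored, and the sum-to-one constraint is handled by a weighted extra equation. All names and numbers are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def solve_contributions(A, b, sum_weight=1e3):
    """Non-negative least squares with a weighted sum-to-one row (assumed)."""
    n = A.shape[1]
    A_aug = np.vstack([A, sum_weight * np.ones((1, n))])
    x, _ = nnls(A_aug, np.append(b, sum_weight))
    return x / x.sum()


def residual_rms(A, b, x):
    """Root mean square of the residuals b - A x over all feature indices."""
    residuals = b - A @ x
    return float(np.sqrt(np.mean(residuals ** 2)))


def monte_carlo_intervals(A_mean, A_std, b, n_runs=1000, seed=0):
    """Perturb source fingerprints and collect the solved contributions."""
    rng = np.random.default_rng(seed)
    draws = np.empty((n_runs, A_mean.shape[1]))
    for run in range(n_runs):
        A_perturbed = rng.normal(A_mean, A_std)  # normal perturbation (assumed)
        draws[run] = solve_contributions(A_perturbed, b)
    lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)
    return lower, upper


# Hypothetical end-member means and standard deviations (4 indices, 3 sources).
A_mean = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 2.0],
    [2.0, 1.0, 0.0],
    [1.0, 3.0, 0.0],
])
A_std = np.full(A_mean.shape, 0.05)
b = A_mean @ np.array([0.6, 0.4, 0.0])  # synthetic receptor observation

x = solve_contributions(A_mean, b)
rms = residual_rms(A_mean, b, x)
lower, upper = monte_carlo_intervals(A_mean, A_std, b)
```

The threshold against which `rms` is compared is left to the user, as the text only refers to a set threshold.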
[0056] While the present invention has been disclosed above with reference to preferred embodiments, it is not intended to limit the invention. Any variations and modifications can be made by those skilled in the art without departing from the spirit and scope of the invention. Therefore, any modifications, equivalent changes, and alterations made to the above embodiments based on the technical essence of the present invention, without departing from the scope of the invention, fall within the protection scope defined by the claims of the present invention.
Claims
1. A pollution source tracing method that couples water system topological constraints with multi-source tracing technology, characterized in that it includes the following steps:
Constructing a digital water system topology network for the watershed: based on the digital elevation model data of the target watershed, through hydrological preprocessing, water flow direction calculation, flow accumulation analysis and river network extraction, the natural river network is abstracted into a topology network graph composed of nodes and directed edges, and a reachability matrix is generated based on the directed edge connection relationships;
Constructing a multi-source pollution tracer fingerprint database: end-member samples of potential pollution sources in the watershed are collected, and four complementary analyses are performed simultaneously on each end-member sample: stable isotope analysis, biomarker fingerprint analysis, three-dimensional fluorescence spectroscopy analysis, and conventional water quality physicochemical parameter analysis; after standardization, a multi-dimensional feature vector is formed, and the end-member fingerprint database is constructed according to source type;
Conducting multi-source synchronous monitoring and data acquisition at receptor points in the watershed: based on the water system topology network, monitoring points are set up at key nodes, water samples are collected synchronously, and the same four fingerprint analyses as those for the end-member samples are performed to obtain the multi-dimensional feature vectors of the receptor points;
Performing multi-source data fusion and pollution source analysis based on topological constraints: the end-member fingerprints and receptor point fingerprints are associated with spatial locations in the topological network, the reachability matrix is used to screen the candidate source set, a chemical mass balance model with topological constraints is constructed, and the contribution proportion of each source to each receptor point is solved;
Generating a spatial map and source tracing report of pollution contribution: the contribution rates are assigned back to the spatial source locations, a geographic information system is used to generate a spatial distribution map of pollution contribution rate and a critical migration path map, and a comprehensive source tracing report is formed.
2. The pollution source tracing method according to claim 1, which couples water system topological constraints with multi-source tracing technology, is characterized in that: the construction of the digital water system topology network of the watershed includes: performing depression filling preprocessing on the digital elevation model data to eliminate non-flowing depression areas; calculating the water flow direction of each grid cell based on the D8 single-direction algorithm, the water flow direction being assigned to the direction of the adjacent grid cell with the largest elevation difference among the eight adjacent grid cells; calculating the cumulative flow based on the flow direction data, setting a cumulative flow threshold to extract the river network structure, and delineating the basin boundary and sub-basins based on the flow direction and the basin outlet; abstracting the river network and sub-basin division results into a topological network graph consisting of nodes and directed edges, wherein the node types include monitoring section nodes at the basin outlet, tributary confluence nodes, and generalized discharge outlet nodes of potential pollution sources, the directed edges point from upstream nodes to downstream nodes, and the attributes of the edges include river segment length, confluence area, and average slope; and generating a reachability matrix using a directed graph traversal algorithm based on the directed edge connections between nodes in the topological network graph, wherein the reachability matrix $R$ is an $N \times N$ square matrix, $N$ is the total number of nodes, and the matrix elements $r_{ij}$ are defined as follows: if there exists a directed path from node $i$ to node $j$, then $r_{ij} = 1$; otherwise $r_{ij} = 0$; where $i$ is the source node index and $j$ is the receptor node index.
3. The pollution source tracing method according to claim 1, which couples water system topological constraints with multi-source tracing technology, is characterized in that: the four complementary analyses specifically include: the stable isotope analysis involves measuring the nitrogen and oxygen isotope values and the phosphorus and oxygen isotope values in the sample; the biomarker fingerprint analysis involves determining the content of fecal sterol, cholesterol, stigmasterol, sitosterol, and campesterol in the sample and calculating characteristic ratios, including the ratio of fecal sterol to cholesterol and the proportion of fecal sterol to total sterols; the three-dimensional fluorescence spectroscopy analysis involves scanning the three-dimensional fluorescence spectrum of the dissolved organic matter in the sample and decomposing the three-dimensional fluorescence spectral matrix into several independent fluorescent components using a parallel factor analysis method; and the routine water quality physicochemical parameter analysis includes measuring total nitrogen, total phosphorus, ammonia nitrogen, chemical oxygen demand, dissolved oxygen, pH value, conductivity, and suspended solids in the sample.
4. The pollution source tracing method according to claim 1, which couples water system topological constraints with multi-source tracing technology, is characterized in that: the standardization process employs the Z-score standardization method; for each indicator dimension, the mean of all samples in that dimension is subtracted from the original measurement value, and the result is divided by the standard deviation, to eliminate differences in the units and magnitudes of different indicators; the standardized indicator values are then combined sequentially to form the $m$-dimensional feature vector of the end-member sample, where $m$ is the total dimension of all indicators; for multiple end-member samples of the same type of pollution source, the mean and standard deviation of their feature vectors in each dimension are calculated as the representative fingerprint and uncertainty range of that type of source.
5. The pollution source tracing method according to claim 1, which couples water system topological constraints with multi-source tracing technology, is characterized in that: the principle for setting up monitoring points at key nodes is as follows: a main receptor monitoring section is set up at the outlet of the basin, a control section is set up before the major tributaries of each level flow into the main channel, a background reference section is set up in the upper reaches of the basin, and a densified monitoring section is set up downstream of the discharge outlets of key pollution sources; the synchronous collection of water samples requires that the sampling time difference between sampling points be controlled within the set time window.
6. The pollution source tracing method according to claim 2, which couples water system topological constraints with multi-source tracing technology, is characterized in that: the method of filtering the candidate source set using the reachability matrix includes: for each receptor node $j$, selecting all source nodes $i$ satisfying $r_{ij} = 1$ to form the candidate source set $S_j$ of receptor point $j$, i.e., $S_j = \{\, i \mid r_{ij} = 1 \,\}$; the candidate source set contains only source nodes that are physically located upstream of receptor point $j$ and can reach it via hydraulic paths, and downstream nodes and unconnected nodes are excluded.
7. The pollution source tracing method according to claim 6, which couples water system topological constraints with multi-source tracing technology, is characterized in that: the chemical mass balance model with topological constraints includes: taking the observed multidimensional feature vector of receptor point $j$ as the dependent variable and the end-member multidimensional feature vectors of the source nodes in the candidate source set as the independent variables, a system of mass balance equations is constructed; for the $k$-th feature index of receptor point $j$, the mass balance equation is: $C_{jk} = \sum_{i \in S_j} f_{ij} F_{ik} + \varepsilon_{jk}$; where $C_{jk}$ is the observed value of receptor point $j$ on the $k$-th feature index, $F_{ik}$ is the end-member feature mean of candidate source node $i$ on the $k$-th feature index, $f_{ij}$ is the contribution proportion of candidate source node $i$ to receptor point $j$, $S_j$ is the candidate source set of receptor point $j$, $\varepsilon_{jk}$ is the residual term on the $k$-th feature index, $k$ is the index of the feature dimension, $k = 1, 2, \dots, m$, and $m$ is the total dimension of the feature indices; the contribution proportions $f_{ij}$ satisfy the constraints: $f_{ij} \ge 0$, and $\sum_{i \in S_j} f_{ij} = 1$; the system of equations is solved using the non-negative least squares method to obtain the contribution proportion of each source to each receptor point.
8. The pollution source tracing method according to claim 7, which couples water system topological constraints with multi-source tracing technology, is characterized in that: the method of solving using non-negative least squares includes: constructing the source fingerprint matrix $A$ of dimension $m \times n$, whose element $A_{ki}$ is the end-member feature mean $F_{ik}$ of candidate source node $i$ on the $k$-th feature index, where $n$ is the number of source nodes in the candidate source set, $m$ is the total dimension of the feature indices, and $m \ge n$; constructing the receptor point observation vector $b$ of dimension $m \times 1$, whose $k$-th element is the observed value $C_{jk}$ of receptor point $j$ on the $k$-th feature index; transforming the problem into the constrained optimization problem $\min_{x} \lVert A x - b \rVert_2$, where $x$ is the $n \times 1$ contribution rate vector and $\lVert \cdot \rVert_2$ denotes the Euclidean norm, subject to the constraints $x_i \ge 0$ and $\sum_{i=1}^{n} x_i = 1$; the iterative active-set method is used to solve the problem, and the resulting contribution rate vector is normalized to ensure that the sum of all components is 1.
9. A pollution source tracing method according to claim 8, characterized in that: after determining the contribution proportion of each source to each receptor point, the following steps are also included: calculating the source apportionment residuals, the residuals being calculated as: $\varepsilon_{jk} = C_{jk} - \sum_{i \in S_j} f_{ij} F_{ik}$; where $\varepsilon_{jk}$ is the residual value on the $k$-th feature index, and the other symbols have the same meanings as above; calculating the root mean square of the normalized residuals over all dimensions, and if the root mean square of the normalized residuals is less than the set threshold, the apportionment result is considered reliable; the uncertainty range of the contribution rate results is evaluated using the Monte Carlo simulation method: random perturbation samples are generated for each dimension of the feature values of each source end-member based on their mean and standard deviation, the solution process is repeated no fewer than 1000 times, and the distribution interval of the contribution rate of each source is compiled as the uncertainty assessment result.
10. The pollution source tracing method according to claim 1, which couples water system topological constraints with multi-source tracing technology, is characterized in that: The generated pollution contribution spatial mapping and source tracing report includes: The contribution rate is assigned to the corresponding spatial source location, and a spatial distribution heat map of pollution contribution rate at the watershed scale is generated using a geographic information system. The color intensity represents the contribution rate of each sub-watershed to the downstream receptor section. Based on the directed edge connections in the water system topology network, a migration path diagram from key pollution sources to the receptor section is drawn, and the contribution rate and uncertainty range of each migration path are marked. The integrated analysis results generate a comprehensive source tracing report, which includes a watershed topology network map, an end-member fingerprint database summary, a receptor point monitoring data summary, quantitative results of each source contribution rate and uncertainty assessment, a heat map of the spatial distribution of pollution contribution, a critical migration path map, and a list of key pollution sources ranked by contribution rate and recommendations for governance priorities.