Surface water extraction model training method, surface water extraction method and related system

By constructing a training sample set, combining spectral bands and water body indices for multi-dimensional clustering, and training a random forest model on the resulting labels, the method addresses the insufficient accuracy of surface water extraction, achieving high-precision automatic surface water extraction and large-scale surface water mapping in support of regional water resource management.

CN117911863B · Active · Publication Date: 2026-06-26 · BEIJING NORMAL UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
BEIJING NORMAL UNIVERSITY
Filing Date
2024-01-02
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing technologies suffer from insufficient accuracy in surface water extraction and are difficult to apply to large-scale, high-resolution surface water mapping. In particular, supervised classification is hampered by sparse training data and the difficulty of sample collection, while the results of unsupervised classification are made uncertain by the choice of feature combination and clustering algorithm.

Method used

A surface water extraction model training method is adopted: a training sample set is constructed, a spectral band set is selected, and water body indices are calculated. Multi-dimensional clustering of the spectral bands and water body indices produces labeled clusters, which are used to train a random forest model that automatically extracts surface water and generates a surface water mask.

Benefits of technology

It improves the accuracy of surface water extraction, can generate high-precision surface water masks, is suitable for large-scale, high-resolution surface water mapping, and provides technical support for regional water resource management.

✦ Generated by Eureka AI based on patent content.

Figure CN117911863B_ABST

Abstract

The present application provides a surface water extraction model training method, a surface water extraction method, and a related system, and relates to the field of remote sensing monitoring of surface water. The training method mainly includes: constructing a training sample set; selecting a spectral band set according to each surface water remote sensing image in the training sample set; calculating water body indices from the spectral band set to obtain a water body index set; obtaining labeled clustering results according to the spectral band set and the water body index set; and training a random forest model, with the spectral band set and water body index set corresponding to each surface water remote sensing image as input and the corresponding labeled clustering results as output, to obtain a surface water extraction model. The surface water extraction model is used to obtain the labeled clustering results corresponding to a surface water remote sensing image and to generate a surface water mask from those results. This not only improves the accuracy with which the model extracts surface water, but also allows the method to be applied to large-scale, high-resolution surface water mapping.

Description

Technical Field

[0001] This invention relates to the field of remote sensing monitoring of surface water, and in particular to a surface water extraction model training method, a surface water extraction method, and a related system.

Background Technology

[0002] Surface water, composed of lakes, ponds, rivers, and other water systems, interacts significantly with environmental systems. Accurate spatiotemporal monitoring of inland surface water is crucial for many applications, such as natural disaster risk management, sustainable development and water resource management, maintaining ecosystem stability, and mitigating the impacts of climate change.

[0003] Currently, commonly used techniques in surface water extraction and monitoring include supervised classification and unsupervised classification. While supervised classification can achieve good classification results, its application to large-scale surface water mapping is limited by sparse training samples and the difficulty of collecting them. Unsupervised classification, on the other hand, can consider multi-dimensional features simultaneously and achieve automatic classification without human intervention. However, the choice of feature combination, number of clusters, and clustering algorithm introduces significant uncertainty into the classification results, making the final surface water classification results inaccurate and unreliable.

[0004] Therefore, how to provide a surface water extraction method with higher accuracy that can also be applied to large-scale, high-resolution surface water mapping has become a technical problem that urgently needs to be solved in this field.

Summary of the Invention

[0005] The purpose of this invention is to provide a surface water extraction model training method, a surface water extraction method, and a related system, which can not only improve the accuracy of surface water extraction, but also be applied to large-scale high-resolution surface water mapping, and provide technical support for regional water resource management.

[0006] To achieve the above objectives, the present invention provides the following solution:

[0007] In one aspect, this invention proposes a training method for a surface water extraction model, including:

[0008] Construct a training sample set, which includes several remote sensing images of surface water.

[0009] Based on the surface water remote sensing images in the training sample set, a spectral band set is selected, which includes several spectral bands.

[0010] Water body indices are calculated based on the set of spectral bands to obtain a set of water body indices, which includes several water body indices.

[0011] Based on the set of spectral bands and the set of water body indices, labeled clustering results are obtained, in which each cluster is labeled with a surface water label or a non-surface water label.

[0012] Using the spectral band set and water index set corresponding to the surface water remote sensing image as input, and the labeled clustering results corresponding to the surface water remote sensing image as output, a random forest model is trained to obtain a surface water extraction model. The surface water extraction model is used to obtain the corresponding labeled clustering results based on the surface water remote sensing image, and generate a surface water mask based on the labeled clustering results. The surface water mask is a labeled binary raster image, and the labels include labels indicating that a certain area is surface water or non-surface water.

[0013] Optionally, based on the spectral band set and the water index set, labeled clustering results are obtained, specifically including:

[0014] Spectral bands are selected from the set of spectral bands, and water indices are selected from the set of water indices as clustering features to construct a clustering feature set;

[0015] Multidimensional hierarchical clustering is performed on the clustering feature set to obtain unlabeled clustering results;

[0016] Based on the unlabeled clustering results, the labeled clustering results are obtained.

[0017] Optionally, the clustering feature set is subjected to multi-dimensional hierarchical clustering to obtain unlabeled clustering results, specifically including:

[0018] The SNIC algorithm is used to perform superpixel segmentation on each clustering feature in the clustering feature set to obtain several superpixel rasters;

[0019] Use the raster-to-vector tool in ArcGIS Pro to convert the superpixel raster into vector polygons;

[0020] Randomly sample the vector polygons to obtain several vector polygon samples;

[0021] The vector polygon samples are subjected to hierarchical clustering using a hierarchical clustering algorithm. The optimal number of clusters is determined based on the Calinski-Harabasz index, resulting in unlabeled clustering results.
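Steps [0018]-[0021] turn a superpixel raster into one record per segment before clustering. The sketch below emulates, in NumPy, the attribute table that raster-to-vector conversion would produce (each clustering feature averaged within each superpixel) and then draws a random sample of segments. It is a minimal illustration under stated assumptions: the segment label raster is assumed to come from SNIC (which is not reimplemented here), and the function names `segment_feature_table` and `sample_segments` are hypothetical.

```python
import numpy as np


def segment_feature_table(labels, features):
    """Average each clustering feature within each superpixel segment.

    labels:   (H, W) integer array of segment IDs (assumed SNIC output).
    features: (H, W, F) array of clustering features, e.g. B8, MBWI, NDWI.
    Returns (segment_ids, means) with means of shape (n_segments, F),
    i.e. one row per segment, like the fields of the vector polygons.
    """
    flat_labels = np.asarray(labels).ravel()
    flat_feats = np.asarray(features, dtype=float).reshape(-1, features.shape[-1])
    # Map arbitrary segment IDs to 0..n-1 so np.bincount can be used.
    segment_ids, inverse = np.unique(flat_labels, return_inverse=True)
    counts = np.bincount(inverse)
    means = np.stack(
        [np.bincount(inverse, weights=flat_feats[:, f]) / counts
         for f in range(flat_feats.shape[1])],
        axis=1,
    )
    return segment_ids, means


def sample_segments(segment_ids, means, n_samples, seed=0):
    """Randomly draw up to n_samples segments without replacement."""
    rng = np.random.default_rng(seed)
    n = min(n_samples, len(segment_ids))
    idx = rng.choice(len(segment_ids), size=n, replace=False)
    return segment_ids[idx], means[idx]
```

In the embodiment, `n_samples` would correspond to the 20,000 sampled polygons; sampling keeps the memory-hungry hierarchical clustering tractable.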

[0022] Optionally, the spectral band set includes B2, B3, B4, B8, B11, and B12.

[0023] Optionally, the water body index set includes AWEI, MBWI, MNDWI, and NDWI.

[0024] Optionally, the clustering feature set includes B8, MBWI, and NDWI.

[0025] Optionally, based on the unlabeled clustering results, labeled clustering results are obtained, specifically including:

[0026] Calculate the mean MBWI of all clusters in the unlabeled clustering results;

[0027] The mean MBWI of all clusters is compared, and the cluster with the largest mean MBWI is selected as the first cluster to be processed.

[0028] Determine whether the mean B2 value of the first cluster to be processed is less than a first threshold, and obtain a first judgment result;

[0029] When the first judgment result is yes, then it is determined whether the mean NDWI value of the first cluster to be processed is greater than the second threshold, and the second judgment result is obtained.

[0030] When the second judgment result is yes, the first cluster to be processed is marked as surface water, and all other clusters are marked as non-surface water;

[0031] If the second judgment result is negative, then all clusters are marked as non-surface water;

[0032] If the first judgment result is negative, then the cluster with the second largest MBWI mean is selected as the second cluster to be processed.

[0033] Determine whether the mean NDWI value of the second cluster to be processed is greater than the second threshold to obtain the third judgment result;

[0034] When the third judgment result is yes, the second cluster to be processed is marked as surface water, and all other clusters are marked as non-surface water;

[0035] If the third judgment result is negative, then all clusters are marked as non-surface water.

[0036] In another aspect, this invention proposes a training system for a surface water extraction model, comprising:

[0037] The training sample set construction module is used to construct a training sample set, which includes several remote sensing images of surface water.

[0038] The first selection module is used to select a set of spectral bands based on the training sample set, wherein the set of spectral bands includes several spectral bands.

[0039] The first calculation module is used to calculate water body indices based on the spectral band set to obtain a water body index set, which includes several water body indices.

[0040] The first clustering module is used to obtain labeled clustering results based on the spectral band set and the water index set, wherein each cluster in the labeled clustering results is labeled with a surface water label or a non-surface water label.

[0041] The model training module is used to train a random forest model by taking the spectral band set and water index set corresponding to the surface water remote sensing image as input and the labeled clustering results corresponding to the surface water remote sensing image as output, to obtain a surface water extraction model. The surface water extraction model is used to obtain the corresponding labeled clustering results based on the surface water remote sensing image and generate a surface water mask based on the labeled clustering results. The surface water mask is a labeled binary raster image, and the labels include labels indicating that a certain area is surface water or non-surface water.

[0042] Furthermore, this invention also proposes a method for extracting surface water, comprising:

[0043] Acquire remote sensing images of surface water in the target area;

[0044] Based on the remote sensing image of surface water in the target area, a set of spectral bands is selected, which includes several spectral bands;

[0045] Water body indices are calculated based on the set of spectral bands to obtain a set of water body indices, which includes several water body indices.

[0046] Based on the set of spectral bands and the set of water body indices, labeled clustering results are obtained, in which each cluster is labeled with a surface water label or a non-surface water label.

[0047] The spectral band set and water index set corresponding to the surface water remote sensing image of the target area are input into the surface water extraction model to obtain the labeled clustering results corresponding to the surface water remote sensing image of the target area, and a surface water mask is generated based on the labeled clustering results; wherein the surface water extraction model is a model trained by the training method of the surface water extraction model described above, and the surface water mask is a labeled binary raster image whose labels indicate whether a given area is surface water or non-surface water.

[0048] Furthermore, this invention also proposes a surface water extraction system, comprising:

[0049] The surface water remote sensing image acquisition module is used to acquire surface water remote sensing images of the target area;

[0050] The second selection module is used to select a set of spectral bands based on the remote sensing image of surface water in the target area, wherein the set of spectral bands includes several spectral bands.

[0051] The second calculation module is used to calculate water body indices based on the spectral band set to obtain a water body index set, which includes several water body indices.

[0052] The second clustering module is used to obtain labeled clustering results based on the spectral band set and the water body index set, wherein each cluster in the labeled clustering results is labeled with a surface water label or a non-surface water label.

[0053] The surface water extraction module is used to input the spectral band set and the water index set corresponding to the surface water remote sensing image of the target area into the surface water extraction model, obtain the labeled clustering results corresponding to the surface water remote sensing image of the target area, and generate a surface water mask based on the labeled clustering results; wherein, the surface water extraction model is a model trained by the training method of the surface water extraction model described above, and the surface water mask is a labeled binary raster image, the labels including labels indicating that a certain area is surface water or non-surface water.

[0054] According to specific embodiments provided by the present invention, the present invention discloses the following technical effects:

[0055] This invention provides a surface water extraction model training method, surface water extraction method, and related system. Based on surface water remote sensing images, a set of spectral bands is first established by selecting a portion of spectral bands. Then, water indices are calculated from the spectral band set to obtain a set of water indices. Next, labeled clustering results are obtained based on the spectral band set and the water indices. The labeled clustering results, the spectral band set, and the water indices are combined to train a random forest model, thereby obtaining a trained surface water extraction model. This surface water extraction model is used to generate corresponding surface water masks for surface water remote sensing images. The surface water mask can accurately and intuitively reflect whether each area in the surface water remote sensing image is surface water or non-surface water. As can be seen, this invention uses a robust random forest model as the surface water extraction model, while considering multiple surface water-related features. It combines spectral bands, water body indices, and clustering results for analysis and processing, and completes automatic surface water extraction based on high-resolution remote sensing images in a multi-dimensional and high-precision manner. This not only effectively improves the accuracy of surface water extraction by the model and obtains more accurate and reliable surface water extraction results, but also generates surface water masks that can be applied to large-scale high-resolution surface water mapping, providing technical support for large-scale dynamic monitoring of surface water and regional water resource management.

Attached Figure Description

[0056] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0057] Figure 1 is a flowchart of a training method for a surface water extraction model provided in Embodiment 1 of the present invention;

[0058] Figure 2 is a schematic diagram of SNIC superpixel segmentation in the method provided in Embodiment 1 of the present invention;

[0059] Figure 3 is a basic flowchart of the surface water extraction method provided in Embodiment 1 of the present invention;

[0060] Figure 4 is a schematic diagram illustrating the principle of surface water extraction using the method provided in Embodiment 1 of the present invention;

[0061] Figure 5 is a schematic diagram of the surface water mask in the method provided in Embodiment 1 of the present invention;

[0062] Figure 6 is a structural block diagram of a training system for a surface water extraction model provided in Embodiment 2 of the present invention;

[0063] Figure 7 is an overall flowchart of a surface water extraction method provided in Embodiment 3 of the present invention;

[0064] Figure 8 is a structural block diagram of a surface water extraction system provided in Embodiment 4 of the present invention.

Detailed Implementation

[0065] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0066] Compared with traditional field surveys, remote sensing can monitor surface water dynamics efficiently and at lower cost. However, a major challenge facing this technology is the trade-off between temporal and spatial resolution. The Moderate Resolution Imaging Spectroradiometer (MODIS) offers high temporal resolution (daily) but low spatial resolution (typically 250-500 m); although it has been applied to large-scale surface water monitoring and flood mapping, mixed pixels still introduce significant errors. Landsat sensors, with a 16-day revisit period and a spatial resolution of 30 m, are widely used for annual or monthly surface water dynamics monitoring. However, monitoring features such as rivers and aquaculture ponds remains challenging, and the long revisit period makes it difficult to obtain monthly cloud-free composite images. Recently, satellite observation capabilities have improved significantly. The Sentinel-2 satellite, designed by the European Space Agency, carries optical sensors and provides global observation data with a spatial resolution of up to 10 m and a revisit period of only 5-6 days. Sentinel-2 data have been used for surface water extent monitoring at regional and national scales, and their high quality makes them a powerful tool for surface water mapping.

[0067] Accurate identification of surface water pixels in remote sensing images is a prerequisite for all subsequent applications. Commonly used techniques in surface water monitoring include water body index thresholding, supervised classification, unsupervised classification, hue-saturation-value (HSV) transformation, superpixel segmentation, and spectral mixture analysis. Among these, water body indices are widely used because they are easy to implement and efficient. However, no fixed threshold is suitable for all scenarios. Automatic thresholding methods, such as Otsu thresholding, are often combined with water body indices to achieve automatic water mapping, but they fail when the water body index histogram is not bimodal. Supervised classification, especially machine learning, often achieves better classification results; however, because training samples are sparse and difficult to collect, it is hard to apply to large-scale surface water mapping. Furthermore, applying an image segmentation algorithm at the preprocessing stage to achieve object-oriented classification is very effective at suppressing salt-and-pepper noise in the classification results.
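The bimodality caveat can be illustrated with a small sketch of Otsu's method, assumed here to be the automatic thresholding method the passage refers to. Otsu chooses the threshold that maximizes the between-class variance of a histogram: when a water index histogram has distinct water and land modes, the threshold falls between them; with a single mode, the "optimal" split has no physical meaning. The function name and bin count are illustrative choices, not part of the patent.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Return the histogram threshold that maximizes between-class variance."""
    values = np.asarray(values, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    prob = hist / hist.sum()
    # Weight and cumulative first moment of the lower class for each cut.
    w0 = np.cumsum(prob)
    mu0_cum = np.cumsum(prob * centers)
    mu_total = mu0_cum[-1]
    w1 = 1.0 - w0
    # Between-class variance; empty classes give 0/0 or x/0, mapped to 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * w0 - mu0_cum) ** 2 / (w0 * w1)
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return centers[np.argmax(between)]
```

Applied to an index such as NDWI, pixels above the returned threshold would be classed as water; the method presupposes the two-mode histogram that real scenes do not always provide.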

[0068] Unsupervised classification can simultaneously consider multidimensional features, such as multiple water body indices, and can achieve automatic classification without human intervention. However, the combination of features, the number of clusters, and the choice of clustering algorithm introduce significant uncertainty into the classification results. Furthermore, accurately distinguishing the categories of surface water is also a challenge. Unsupervised classification is not currently widely used in surface water monitoring. Considering the large volume of high spatiotemporal resolution imagery data in large-scale surface water monitoring and the difficulty of collecting samples for supervised classification, developing a highly accurate automatic surface water extraction method combining unsupervised and supervised classification has significant application value and practical significance.

[0069] The purpose of this invention is to provide a surface water extraction model training method, a surface water extraction method, and a related system, aiming to improve the accuracy of automatic surface water extraction from remote sensing and to provide a reference for the application of unsupervised classification in remote sensing image classification. This algorithm can achieve automatic mapping of large-scale surface water from remote sensing without human intervention, providing technical support for regional water resource management.

[0070] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0071] Embodiment 1

[0072] This embodiment provides a training method for a surface water extraction model. As shown in Figure 1, the method specifically includes the following steps:

[0073] Step A1: Construct a training sample set, which includes several remote sensing images of surface water.

[0074] Step A2: Select a set of spectral bands based on the surface water remote sensing images in the training sample set. The set of spectral bands includes several spectral bands.

[0075] Step A3: Calculate the water index based on the spectral band set to obtain a water index set, which includes several water indices.

[0076] Step A4: Based on the spectral band set and the water index set, obtain labeled clustering results, wherein each cluster in the labeled clustering results is labeled with either surface water or non-surface water.

[0077] Step A5: Using the spectral band set and water index set corresponding to the surface water remote sensing image as input, and the labeled clustering results corresponding to the surface water remote sensing image as output, train the random forest model to obtain the surface water extraction model; the surface water extraction model is used to obtain the corresponding labeled clustering results based on the surface water remote sensing image, and generate a surface water mask based on the labeled clustering results. The surface water mask is a labeled binary raster image, and the labels include labels indicating that a certain area is surface water or non-surface water.
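Step A5 can be sketched with scikit-learn's `RandomForestClassifier`, assuming the ten features (B2, B3, B4, B8, B11, B12, AWEI, MBWI, MNDWI, NDWI) have already been stacked per training sample and that each sample carries the 1/0 label of its labeled cluster. The hyperparameters (`n_estimators=100`, `random_state`) are placeholders rather than values from the patent, and the helper names are hypothetical. Prediction runs per pixel and is reshaped into the binary surface water mask.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Feature order assumed for both training samples and image stacks.
FEATURES = ["B2", "B3", "B4", "B8", "B11", "B12", "AWEI", "MBWI", "MNDWI", "NDWI"]


def train_water_model(X, y, n_estimators=100, seed=0):
    """Fit a random forest on (n_samples, 10) features and 0/1 cluster labels."""
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    model.fit(X, y)
    return model


def predict_water_mask(model, image):
    """Classify an (H, W, 10) feature stack into an (H, W) 0/1 water mask."""
    h, w, f = image.shape
    pred = model.predict(image.reshape(-1, f))
    return pred.reshape(h, w).astype(np.uint8)
```

Separating training from mask prediction mirrors the split between the training method and the extraction method: the same fitted model can be applied to images of other target areas.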

[0078] In this embodiment, step A4 specifically includes the following steps:

[0079] Step A41: Select spectral bands from the spectral band set and water indexes from the water index set as clustering features to construct a clustering feature set.

[0080] Step A42: Perform multidimensional hierarchical clustering on the clustering feature set to obtain unlabeled clustering results.

[0081] Step A43: Obtain labeled clustering results based on the unlabeled clustering results.

[0082] In this embodiment, step A42 specifically includes the following steps:

[0083] Step A421: Use the SNIC algorithm to perform superpixel segmentation on each clustering feature in the clustering feature set to obtain several superpixel rasters, as shown in Figure 2.

[0084] Step A422: Use the raster-to-vector tool in ArcGIS Pro software to convert the superpixel raster into a vector polygon.

[0085] Step A423: Randomly sample the vector polygons to obtain several vector polygon samples.

[0086] Step A424: Perform hierarchical clustering on the vector polygon samples using a hierarchical clustering algorithm, and determine the optimal number of clusters based on the Calinski-Harabasz index to obtain the unlabeled clustering results.

[0087] In this embodiment, the spectral band set includes B2, B3, B4, B8, B11, and B12; the water body index set includes AWEI, MBWI, MNDWI, and NDWI; and the clustering feature set includes B8, MBWI, and NDWI. B8, NDWI, and MBWI give good results in surface water clustering experiments across most scenario types: among the surface reflectance features, the B8 band has the highest resolution and shows water body characteristics most clearly; NDWI is the most classic water body index; and MBWI is strongly resistant to noise in various scenarios. The spectral band information and water body index calculation methods used in this embodiment are shown in Table 1.

[0088] Table 1. Spectral band information and water index calculation method

[0089]

[0090]
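The body of Table 1 is not reproduced in this text, so the sketch below computes the four water body indices using commonly cited literature definitions, which are assumptions here: NDWI = (Green - NIR)/(Green + NIR), MNDWI = (Green - SWIR1)/(Green + SWIR1), MBWI = 2·Green - Red - NIR - SWIR1 - SWIR2, and the no-shadow AWEI variant, 4·(Green - SWIR1) - (0.25·NIR + 2.75·SWIR2). The Sentinel-2 mapping Green = B3, Red = B4, NIR = B8, SWIR1 = B11, SWIR2 = B12 is standard; which AWEI variant the patent uses is not stated. The 20 m bands are assumed to be already resampled to the 10 m grid.

```python
import numpy as np


def normalized_difference(a, b):
    """(a - b) / (a + b), returning NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(np.broadcast(a, b).shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def water_indices(b3, b4, b8, b11, b12):
    """Compute NDWI, MNDWI, MBWI and AWEI (no-shadow variant, assumed)."""
    b3, b4, b8, b11, b12 = (np.asarray(x, dtype=float) for x in (b3, b4, b8, b11, b12))
    return {
        "NDWI": normalized_difference(b3, b8),    # green vs NIR
        "MNDWI": normalized_difference(b3, b11),  # green vs SWIR1
        "MBWI": 2 * b3 - b4 - b8 - b11 - b12,     # multi-band index
        "AWEI": 4 * (b3 - b11) - (0.25 * b8 + 2.75 * b12),
    }
```

For a water-like pixel (high green, low NIR and SWIR reflectance) all four indices come out positive, which is why they serve as water-related features.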

[0091] In this embodiment, step A43 specifically includes the following steps:

[0092] Step A431: Calculate the mean MBWI of all clusters in the unlabeled clustering results;

[0093] Step A432: Compare the mean MBWI values of all clusters and select the cluster with the largest mean MBWI value as the first cluster to be processed.

[0094] Step A433: Determine whether the mean B2 value of the first cluster to be processed is less than a first threshold to obtain a first judgment result, which specifically includes:

[0095] When the first judgment result is yes, then it is determined whether the mean NDWI value of the first cluster to be processed is greater than the second threshold, and a second judgment result is obtained.

[0096] When the second judgment result is yes, the first cluster to be processed is marked as surface water, and the other clusters are marked as non-surface water.

[0097] If the second judgment result is negative, then all clusters are marked as non-surface water.

[0098] If the first judgment result is negative, then the cluster with the second largest MBWI mean is selected as the second cluster to be processed.

[0099] Step A434: Determine whether the mean NDWI value of the second cluster to be processed is greater than the second threshold to obtain the third judgment result, which specifically includes:

[0100] When the third judgment result is yes, the second cluster to be processed is marked as surface water, and all other clusters are marked as non-surface water.

[0101] If the third judgment result is negative, then all clusters are marked as non-surface water.
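The decision rules of Steps A431-A434 can be written as a short function. This is a sketch under stated assumptions: the first and second thresholds are not given in this text, so the caller must supply them; cluster means are taken over whatever samples were clustered (for example, the sampled polygons); and the function name `label_clusters` is hypothetical.

```python
import numpy as np


def label_clusters(cluster_ids, b2, ndwi, mbwi, first_threshold, second_threshold):
    """Apply the MBWI/B2/NDWI rules; return {cluster_id: 1 water, 0 non-water}.

    cluster_ids, b2, ndwi, mbwi: 1-D arrays with one entry per clustered sample.
    Both thresholds must be supplied by the caller.
    """
    cluster_ids = np.asarray(cluster_ids)
    b2, ndwi, mbwi = (np.asarray(v, dtype=float) for v in (b2, ndwi, mbwi))

    def cluster_mean(values, c):
        return float(values[cluster_ids == c].mean())

    clusters = [int(c) for c in np.unique(cluster_ids)]
    labels = {c: 0 for c in clusters}  # default: every cluster non-water
    # Steps A431-A432: rank clusters by mean MBWI, largest first.
    ranked = sorted(clusters, key=lambda c: cluster_mean(mbwi, c), reverse=True)
    first = ranked[0]
    if cluster_mean(b2, first) < first_threshold:
        # Step A433: B2 test passed, so the NDWI test alone decides.
        if cluster_mean(ndwi, first) > second_threshold:
            labels[first] = 1
    elif len(ranked) > 1:
        # Step A434: B2 test failed, so examine the second-ranked cluster.
        second = ranked[1]
        if cluster_mean(ndwi, second) > second_threshold:
            labels[second] = 1
    return labels
```

At most one cluster is ever labeled water, which matches the rules above: every branch either marks a single cluster as surface water or marks all clusters as non-surface water.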

[0102] This embodiment takes the Poyang Lake area as the example area. As shown in Figure 3 and Figure 4, seasonal median-composite Sentinel-2 surface reflectance data of the Poyang Lake area were first obtained after cloud removal. Six high-resolution spectral reflectance bands (B2, B3, B4, B8, B11, and B12) were selected, and four water body indices, AWEI (Automated Water Extraction Index), MBWI (Multiband Water Index), MNDWI (Modified Normalized Difference Water Index), and NDWI (Normalized Difference Water Index), were calculated from these bands. A false-color image was then synthesized from the spectral band data, and water and non-water validation points were collected from it to evaluate the accuracy of the algorithm. Next, three of these features were selected for clustering experiments. The SNIC (Simple Non-Iterative Clustering) algorithm was used to segment the selected clustering features into superpixels, and the resulting raster data were converted into vector data to enable object-oriented classification. The vector polygons were randomly sampled for multi-dimensional hierarchical clustering experiments, and the optimal number of clusters was determined with the Calinski-Harabasz index, yielding unlabeled clustering results. Then, based on the relatively stable surface water-related features B8, MBWI, and NDWI, the clusters representing water and non-water in the unlabeled clustering results were identified, yielding labeled clustering results. Finally, a random forest model was trained on the labeled clustering results together with all available feature data in this invention (i.e., B2, B3, B4, B8, B11, and B12 in the spectral band set and AWEI, MBWI, MNDWI, and NDWI in the water body index set) to obtain a surface water extraction model, which was then used to generate a surface water mask.

[0103] It should be noted that the Poyang Lake area covers more than 10,000 square kilometers and contains mainly urban buildings, roads, numerous aquaculture ponds, and building and terrain shadows. The remote sensing images are winter composites, which makes accurate surface water mapping in this area challenging. The remote sensing data came from the Google Earth Engine (GEE) platform. Bands B2, B3, B4, B8, B11, and B12, with resolutions of 10 m to 20 m, were selected to form the spectral band set (Sentinel-2 imagery has 13 bands with resolutions of 10 m to 60 m; of these, B2, B3, B4, and B8 have a spatial resolution of 10 m, and B11 and B12 have a spatial resolution of 20 m). The water body indices AWEI, MBWI, MNDWI, and NDWI were calculated to form the water body index set. Validation samples were then collected from false-color composite images. First, using the Sentinel-2 surface reflectance product and the Sentinel-2 cloud probability dataset provided by GEE, cloud-free winter images of Poyang Lake from December 1, 2022 to March 1, 2023 were composited using a 50% cloud probability threshold. The water body indices were calculated directly on GEE and downloaded locally. False-color compositing was then completed in ArcGIS Pro 3.0, and new polygon vector data were digitized to collect water and non-water validation samples. Finally, points were randomly generated within the polygons as validation points, with a water to non-water sample ratio of approximately 1:1.
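The compositing logic (discard observations whose cloud probability exceeds 50%, then take a per-pixel seasonal median) can be emulated in NumPy to make it explicit. This is not the GEE API: it assumes a co-registered stack of reflectance images and matching cloud probability layers (0-100) are already in memory, it assumes observations strictly above the threshold count as cloudy, and the function name is hypothetical.

```python
import warnings

import numpy as np


def cloud_masked_median(reflectance, cloud_prob, max_cloud_prob=50):
    """Per-pixel median over time, ignoring observations flagged as cloudy.

    reflectance: (T, H, W, B) stack of surface reflectance images.
    cloud_prob:  (T, H, W) cloud probability in percent.
    Observations with cloud_prob > max_cloud_prob are masked in all bands;
    a pixel that is cloudy in every image becomes NaN.
    """
    data = np.asarray(reflectance, dtype=float).copy()
    cloudy = np.asarray(cloud_prob) > max_cloud_prob
    data[cloudy] = np.nan  # boolean (T, H, W) mask covers every band
    with warnings.catch_warnings():
        # nanmedian warns on all-NaN pixels; those simply stay NaN.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmedian(data, axis=0)
```

A median rather than a mean keeps residual cloud edges and shadows that escape the mask from pulling the composite value far from the clear-sky reflectance.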

[0104] False-color compositing here means assigning bands B8, B4, and B3 to the red, green, and blue channels, respectively, in ArcGIS Pro 3.0 to highlight surface water in the image. During sample collection, water and non-water areas are identified from the bluish-black appearance of surface water in the false-color composite. Specifically, two shapefiles containing polygon features, named "water" and "non-water", are created in ArcGIS Pro, and water and non-water polygons are digitized throughout the study area by visual interpretation. The "Create Random Points" tool in the Geoprocessing toolbox is then used to generate approximately equal numbers of water and non-water validation samples. Finally, water is coded as 1 and non-water as 0, and the samples are merged into a single shapefile for accuracy evaluation.
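The B8/B4/B3 band assignment can be reproduced outside ArcGIS Pro with NumPy. Because water absorbs strongly in the near infrared, its red channel (B8) is near the bottom of the stretch, which is why surface water looks dark bluish-black. The per-band 2-98 percentile stretch used for display is an assumption, not the ArcGIS Pro default rendering, and the function name is hypothetical.

```python
import numpy as np


def false_color(b8, b4, b3, low_pct=2, high_pct=98):
    """Stack B8/B4/B3 as R/G/B into an (H, W, 3) array scaled to 0-1."""
    rgb = np.stack([b8, b4, b3], axis=-1).astype(float)
    out = np.empty_like(rgb)
    for i in range(3):
        # Assumed display stretch: clip each band to its percentile range.
        lo, hi = np.nanpercentile(rgb[..., i], [low_pct, high_pct])
        scale = hi - lo if hi > lo else 1.0
        out[..., i] = np.clip((rgb[..., i] - lo) / scale, 0.0, 1.0)
    return out
```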

[0105] Step A42 in this embodiment combines an object-oriented approach, raster-to-vector conversion, and the Calinski-Harabasz index to perform automatic hierarchical clustering. Superpixel segmentation uses the SNIC algorithm, which is available in the GEE platform and has lower memory requirements, faster superpixel generation, and higher accuracy. SNIC is characterized by strong boundary adherence and low complexity; the higher the compactness value, the more regular the shape of the clusters. Because the Poyang Lake area contains many small water bodies, the key parameter compactness is set to 1, and size and connectivity are set to 32 and 8, respectively. Within each superpixel, the value of each pixel is replaced by the mean value of the pixels in that cluster. Raster-to-vector conversion is performed in ArcGIS Pro 3.0, and the three original raster features are stored as fields of the new polygon vector data. The Calinski-Harabasz index and the hierarchical clustering algorithm are implemented with the Python scikit-learn library. In this embodiment, the hierarchical clustering is bottom-up (agglomerative), with Euclidean distance as the distance metric and "ward" as the linkage method. This algorithm can effectively reduce the influence of varying land-surface features in remote sensing images, but it has high memory consumption and computational complexity. Therefore, hierarchical clustering is run on 20,000 polygons randomly sampled from the polygon vector data converted from the superpixels. A limitation of clustering algorithms is the difficulty of determining the number of clusters. This invention uses the Calinski-Harabasz index to determine the locally optimal number of clusters between 2 and 10: the number of clusters with the highest Calinski-Harabasz score within the range 2 to 10 is taken as the final number of clusters.
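The clustering step could look roughly like the following scikit-learn sketch, assuming one row per superpixel polygon holding its mean B8, NDWI and MBWI values. The 20,000-sample cap, the 2 to 10 search range, Ward linkage and Euclidean distance follow the text; the function name, the random seed and the absence of feature standardization are assumptions (the source does not say whether features were scaled).

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import calinski_harabasz_score


def cluster_with_ch(features, k_min=2, k_max=10, n_samples=20000, seed=0):
    """Ward hierarchical clustering with the cluster count chosen by the CH index.

    `features` is an (n_polygons, n_features) array, e.g. per-superpixel means of
    B8, NDWI and MBWI. Returns (best_k, labels_of_sample, sample_index).
    """
    features = np.asarray(features, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Random subsample to keep the memory cost of hierarchical clustering bounded.
    n = min(n_samples, len(features))
    sample_index = rng.choice(len(features), size=n, replace=False)
    sample = features[sample_index]

    best_k, best_score, best_labels = None, -np.inf, None
    for k in range(k_min, k_max + 1):
        # Ward linkage merges clusters bottom-up using Euclidean distance.
        labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(sample)
        score = calinski_harabasz_score(sample, labels)
        if score > best_score:  # keep the k with the highest CH score
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels, sample_index
```

Refitting for every k is simple but redundant; scikit-learn's `memory` and `compute_full_tree` options on `AgglomerativeClustering` could be used to cache the tree if runtime matters.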

[0106] In this embodiment, the Calinski-Harabasz index measures cluster density and separation by calculating the ratio of inter-cluster dispersion to intra-cluster dispersion. The index is calculated as follows:

[0107] $$ s = \frac{SS_B}{SS_W} \times \frac{N - k}{k - 1} $$

[0108] where s is the Calinski-Harabasz index, k is the number of clusters, N is the total number of samples, SS_B is the inter-cluster variance, and SS_W is the intra-cluster variance. They are calculated as follows:

[0109] $$ SS_B = \operatorname{tr}(B_k) $$

[0110] $$ B_k = \sum_{q=1}^{k} n_q \, (c_q - c_E)(c_q - c_E)^{T} $$

[0111] $$ SS_W = \operatorname{tr}(W_k) $$

[0112] $$ W_k = \sum_{q=1}^{k} \sum_{x \in C_q} (x - c_q)(x - c_q)^{T} $$

[0113] where tr(B_k) and tr(W_k) denote matrix traces, C_q is the sample set of cluster q, c_q is the center of cluster q, c_E is the center of all samples, n_q is the number of samples in cluster q, B_k and W_k are the between-cluster and within-cluster dispersion matrices from which the inter-cluster and intra-cluster variances are obtained, and x denotes a sample in C_q.
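The Calinski-Harabasz index s = [tr(B_k) / tr(W_k)] x [(N - k) / (k - 1)] can be computed directly from its definition. Since the trace of an outer product v v^T equals the squared norm of v, both traces reduce to sums of squared distances and no matrices need to be formed. This is a minimal NumPy sketch with a hypothetical function name; scikit-learn's `calinski_harabasz_score` computes the same quantity.

```python
import numpy as np


def calinski_harabasz(X, labels):
    """CH index s = [tr(B_k) / tr(W_k)] * [(N - k) / (k - 1)], written out from the formula."""
    X = np.asarray(X, dtype=np.float64)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    n, k = len(X), len(clusters)
    c_E = X.mean(axis=0)  # center of all samples
    ss_b = 0.0  # tr(B_k): size-weighted squared distances from cluster centers to c_E
    ss_w = 0.0  # tr(W_k): squared distances from samples to their own cluster center
    for q in clusters:
        C_q = X[labels == q]
        c_q = C_q.mean(axis=0)
        ss_b += len(C_q) * np.sum((c_q - c_E) ** 2)
        ss_w += np.sum((C_q - c_q) ** 2)
    return (ss_b / ss_w) * ((n - k) / (k - 1))
```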

[0114] This application uses an object-oriented automatic clustering method: based on suitable features and object-oriented principles, it automatically divides samples randomly collected from the image into multiple categories. The suitable features are B8, NDWI, and MBWI. "Object-oriented" refers to using SNIC to perform superpixel segmentation on the original raster image. "Automatically dividing randomly collected samples into multiple categories" refers to partitioning randomly selected superpixels into the most suitable number of clusters (2 to 10) using the Calinski-Harabasz index and the hierarchical clustering algorithm. To assign categories to the clusters, this invention identifies the cluster belonging to the water category among the resulting clusters, thereby obtaining labeled clustering results. The water and non-water categories are determined automatically from the characteristics of surface water in the B2 band, NDWI, and MBWI.

[0115] The suitable features were determined through multidimensional clustering experiments. These experiments used the kappa coefficient as the evaluation criterion to identify the feature combination that gave the multidimensional clustering algorithm the most stable water mapping performance across multiple scenarios. The candidate feature combinations were drawn from B8, B12, AWEI, MNDWI, MBWI, and NDWI. For brevity, only the conclusions of these clustering experiments are used here.

[0116] Step A43 in this embodiment essentially involves determining the water and non-water clusters in the unlabeled clustering results based on the relatively stable water index, thereby obtaining labeled clustering results. Specifically, this includes:

[0117] The following rules determine which single cluster, if any, is labeled as water. Calculate the mean MBWI of every cluster and select the cluster with the largest mean. If the mean B2 reflectance of this cluster is less than 0.5, check whether its mean NDWI is greater than -0.1; if so, label it as water and label all other clusters as non-water; otherwise, label all clusters as non-water. If the mean B2 reflectance is greater than or equal to 0.5, select the cluster with the second-largest mean MBWI instead and decide whether it is water using the same NDWI test. The B2 check (mean reflectance less than 0.5) mainly handles interference from snow: when snow is present in the scene, the MBWI of the snow cluster will be significantly higher than that of surface water. The method therefore exploits the fact that snow has a reflectance peak in the blue band, whereas surface water has relatively low reflectance in all bands.
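The decision rule can be written as a small function. This is a sketch: the dictionary layout of `cluster_stats` and the function name are hypothetical, while the thresholds (B2 < 0.5, NDWI > -0.1) come from the text and assume reflectance on a 0-1 scale. At least two clusters are assumed, which the 2 to 10 cluster range guarantees.

```python
def label_water_cluster(cluster_stats, b2_threshold=0.5, ndwi_threshold=-0.1):
    """Return {cluster_id: 1 for water, 0 for non-water} using the MBWI/B2/NDWI rule.

    `cluster_stats` maps cluster_id -> {"MBWI": mean, "B2": mean, "NDWI": mean};
    B2 is assumed to be surface reflectance on a 0-1 scale.
    """
    labels = {cid: 0 for cid in cluster_stats}  # default: every cluster non-water
    ranked = sorted(cluster_stats, key=lambda cid: cluster_stats[cid]["MBWI"], reverse=True)
    candidate = ranked[0]  # cluster with the largest mean MBWI
    if cluster_stats[candidate]["B2"] >= b2_threshold:
        # Bright in the blue band (likely snow): fall back to the second-largest MBWI cluster.
        candidate = ranked[1]
    if cluster_stats[candidate]["NDWI"] > ndwi_threshold:
        labels[candidate] = 1  # at most one cluster is labeled water
    return labels
```

For example, a scene whose highest-MBWI cluster is bright snow (mean B2 of 0.8) falls through to the second-ranked cluster, which is labeled water only if its mean NDWI exceeds -0.1.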

[0118] After the labeled clustering results are obtained, all 20,000 sampled polygons in this embodiment are labeled as either surface water or non-surface water. Using all feature values of these 20,000 samples, namely six spectral bands and four water indices, a relationship between the feature values and the water or non-water label is established with a random forest model. Random forest is an ensemble machine learning algorithm whose base learner is a decision tree. A traditional decision tree considers all features when choosing each split, whereas random forest adds random feature selection on top of random sampling of the training data: at each node of a base learner, a subset of k features is first selected at random from the node's feature set, and the optimal feature for splitting is then chosen from this subset. The parameter k controls the degree of randomness introduced into the model (this k is unrelated to the number of clusters k used above).

[0119] In this embodiment, step A5 trains the random forest model to obtain the surface water extraction model; the surface water extraction model is therefore the trained random forest model. This model generalizes the labeled clustering results into a regional surface water mask. Generalization here means training the random forest model on all spectral bands and water indices used in this invention together with the labeled clustering results, and then using it to generate a surface water mask for the entire study area. The surface water mask is a binary raster image in which surface water is labeled 1 and non-surface water is labeled 0, as shown in Figure 5.

[0120] During model training, the key random forest parameter, the number of base learners (decision trees), was set to 100. The accuracy of the surface water mask generated by the surface water extraction model was then evaluated against the collected validation samples, using the kappa coefficient as the evaluation metric. This evaluation helps tune the surface water extraction model and confirm its accuracy in extracting surface water.
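A minimal scikit-learn sketch of the training step, assuming the labeled samples have already been assembled into a feature matrix with one column per band and index. The 100 trees follow the text; the column order in `FEATURES`, `max_features="sqrt"` (the size k of the random feature subset at each split, which is scikit-learn's default for classifiers) and the fixed `random_state` are assumptions, since the source does not report them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical column order: six spectral bands followed by four water indices.
FEATURES = ["B2", "B3", "B4", "B8", "B11", "B12", "AWEI", "MBWI", "MNDWI", "NDWI"]


def train_water_model(X, y, n_trees=100, seed=0):
    """Fit a random forest on labeled samples (1 = water, 0 = non-water).

    `max_features="sqrt"` sets the size of the random feature subset drawn at
    each split (the k in the text); its value here is an assumption.
    """
    X = np.asarray(X, dtype=np.float64)
    if X.ndim != 2 or X.shape[1] != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} feature columns, got shape {X.shape}")
    model = RandomForestClassifier(n_estimators=n_trees, max_features="sqrt", random_state=seed)
    return model.fit(X, np.asarray(y))
```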

[0121] The kappa coefficient is used to evaluate the classification accuracy of the results. The closer the kappa coefficient is to 1, the higher the accuracy, that is, the more consistently real surface water is classified as surface water and real land as land. Accuracy evaluation is not part of the operational mapping process; it is used only to verify the feasibility of the algorithm. The kappa coefficient is calculated as follows:

[0122] $$ \kappa = \frac{p_0 - p_e}{1 - p_e} $$

[0123] $$ p_0 = \frac{TP + TN}{TP + TN + FP + FN} $$

[0124] $$ p_e = \frac{(TP + FN)(TP + FP) + (TN + FP)(TN + FN)}{(TP + TN + FP + FN)^2} $$

[0125] where κ is the kappa coefficient, p_0 is the overall accuracy, and p_e is the expected chance agreement, i.e., the sum over categories of the product of the actual and predicted counts, divided by the square of the total number of samples. TP, TN, FP, and FN are the true positives, true negatives, false positives, and false negatives in the binary confusion matrix, respectively.
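The kappa computation can be checked with a short function that follows κ = (p_0 - p_e) / (1 - p_e), with p_0 the overall accuracy and p_e the chance agreement from the confusion-matrix marginals. The function name is hypothetical; scikit-learn's `cohen_kappa_score` gives the same value from label arrays.

```python
def kappa_from_counts(tp, tn, fp, fn):
    """Cohen's kappa for a binary water/non-water confusion matrix.

    Undefined (division by zero) when p_e == 1, e.g. if every sample is in one class.
    """
    n = tp + tn + fp + fn
    p0 = (tp + tn) / n  # overall accuracy
    # Chance agreement: sum over classes of (actual count * predicted count) / N^2.
    pe = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    return (p0 - pe) / (1 - pe)
```

For example, `kappa_from_counts(40, 45, 5, 10)` gives p_0 = 0.85 and p_e = 0.5, hence a kappa of about 0.7.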

[0126] In this embodiment, the kappa coefficient of the accuracy validation in the Poyang Lake area is 0.969, indicating that the method of the present invention performs very well. The mapping results in Figure 5 show that the proposed surface water extraction method achieves very good surface water mapping even in complex scenes such as Poyang Lake and can map surface water accurately.

[0127] This invention uses a robust random forest model as the surface water extraction model and considers multiple surface-water-related features. By combining spectral bands, water body indices, and clustering results, it achieves automatic, multi-dimensional, high-precision surface water extraction from high-resolution remote sensing images. This improves the accuracy and reliability of surface water extraction, and the resulting surface water masks can be applied to large-scale, high-resolution surface water mapping, providing technical support for large-scale dynamic monitoring of surface water and for regional water resource management.

[0128] Example 2

[0129] Figure 6 shows a structural block diagram of a training system for a surface water extraction model. Embodiment 2 of this invention proposes a training system for a surface water extraction model, corresponding to the training method in Embodiment 1. The system specifically includes:

[0130] Training sample set construction module 1 is used to construct a training sample set, which includes several surface water remote sensing images.

[0131] The first selection module 2 is used to select a set of spectral bands based on the training sample set, wherein the set of spectral bands includes several spectral bands.

[0132] The first calculation module 3 is used to calculate water body indices based on the spectral band set to obtain a water body index set, which includes several water body indices.

[0133] The first clustering module 4 is used to obtain labeled clustering results based on the spectral band set and the water index set, wherein each cluster in the labeled clustering results is labeled with a surface water label or a non-surface water label.

[0134] Model training module 5 is used to train a random forest model with the spectral band set and water index set corresponding to the surface water remote sensing image as input and the labeled clustering results corresponding to the surface water remote sensing image as output, to obtain a surface water extraction model. The surface water extraction model is used to obtain the corresponding labeled clustering results based on the surface water remote sensing image and generate a surface water mask based on the labeled clustering results. The surface water mask is a labeled binary raster image, and the labels include labels indicating that a certain area is surface water or non-surface water.

[0135] Example 3

[0136] Embodiment 3 of the present invention proposes a surface water extraction method, as shown in Figure 7. This method is the application method corresponding to the training method of the surface water extraction model in Example 1, and its main content is the same as that of Example 1. The only difference is that Example 3 acquires a surface water remote sensing image of the target area and uses the surface water extraction model trained in Example 1 to generate a surface water mask corresponding to that image.

[0137] The surface water extraction method specifically includes the following steps:

[0138] Step B1: Obtain remote sensing images of surface water in the target area.

[0139] Step B2: Select a set of spectral bands based on the remote sensing image of surface water in the target area. The set of spectral bands includes several spectral bands.

[0140] Step B3: Calculate the water index based on the spectral band set to obtain a water index set, which includes several water indices.

[0141] Step B4: Input the spectral band set and water index set corresponding to the surface water remote sensing image of the target area into the surface water extraction model to obtain the labeled clustering results corresponding to the surface water remote sensing image of the target area, and generalize according to the labeled clustering results to generate a corresponding surface water mask; wherein, the surface water extraction model is a model trained by the training method of the surface water extraction model proposed in Embodiment 1 of the present invention, and the surface water mask is a labeled binary raster image, the labels including labels indicating that a certain area is surface water or non-surface water.
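Step B4 can be sketched as a per-pixel prediction over a feature stack of the target area. This assumes the bands and indices have been resampled to a common grid and stacked in the same column order used for training; the NaN handling (treating pixels with missing features as non-water) and the function name are assumptions, not part of the described method.

```python
import numpy as np


def predict_water_mask(model, feature_stack):
    """Turn a (rows, cols, n_features) stack into a binary mask (1 = water, 0 = non-water).

    `model` is any fitted classifier with a scikit-learn style `predict` method.
    Pixels with any NaN feature (e.g. masked clouds) are set to 0 (assumed handling).
    """
    rows, cols, n_feat = feature_stack.shape
    flat = feature_stack.reshape(-1, n_feat)  # one row per pixel
    valid = ~np.isnan(flat).any(axis=1)
    mask = np.zeros(rows * cols, dtype=np.uint8)
    if valid.any():
        mask[valid] = model.predict(flat[valid]).astype(np.uint8)
    return mask.reshape(rows, cols)
```

Predicting pixel by pixel is what lets a model trained on 20,000 labeled polygons produce a mask for the entire study area.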

[0142] Example 4

[0143] Embodiment 4 of the present invention proposes a surface water extraction system, as shown in Figure 8. This system corresponds to the surface water extraction method in Example 3 and specifically includes:

[0144] The surface water remote sensing image acquisition module 6 is used to acquire surface water remote sensing images of the target area.

[0145] The second selection module 7 is used to select a set of spectral bands based on the remote sensing image of surface water in the target area, wherein the set of spectral bands includes several spectral bands.

[0146] The second calculation module 8 is used to calculate water body indices based on the spectral band set to obtain a water body index set, which includes several water body indices.

[0147] The second clustering module 9 is used to obtain labeled clustering results based on the spectral band set and the water index set, wherein each cluster in the labeled clustering results is labeled with a surface water label or a non-surface water label.

[0148] The surface water extraction module 10 is used to input the spectral band set and the water index set corresponding to the surface water remote sensing image of the target area into the surface water extraction model to obtain the labeled clustering results corresponding to the surface water remote sensing image of the target area, and generate a surface water mask based on the labeled clustering results; wherein, the surface water extraction model is a model trained by the training method of the surface water extraction model proposed in Embodiment 1 of the present invention, and the surface water mask is a labeled binary raster image, wherein the label includes a label indicating that a certain area is surface water or a label indicating that a certain area is not surface water.

[0149] Specific examples are used herein to explain the principles and implementation of the present invention; the description of the above embodiments is intended only to help understand the method and core ideas of the present invention. Those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. Optionally, they can be implemented as program code executable by a computing device and stored in a storage device for execution, fabricated as separate integrated circuit modules, or combined so that multiple modules or steps are fabricated as a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.

[0150] Furthermore, those skilled in the art will recognize that, based on the principles of this invention, there will be variations in the specific implementation methods and application scope. Therefore, the content of this specification should not be construed as limiting the invention.

Claims

1. A training method for a surface water extraction model, characterized by comprising the following steps. A training sample set is constructed, which includes several surface water remote sensing images. Based on the surface water remote sensing images in the training sample set, a spectral band set is selected, which includes several spectral bands. Water body indices are calculated based on the spectral band set to obtain a water body index set, which includes several water body indices. Based on the spectral band set and the water index set, labeled clustering results are obtained, specifically including: selecting spectral bands from the spectral band set and water indices from the water index set as clustering features to construct a clustering feature set; performing multidimensional hierarchical clustering on the clustering feature set to obtain unlabeled clustering results; and obtaining labeled clustering results based on the unlabeled clustering results, each cluster in the labeled clustering results being labeled with a surface water label or a non-surface water label. Obtaining labeled clustering results based on the unlabeled clustering results specifically includes: calculating the MBWI mean of all clusters in the unlabeled clustering results; comparing the MBWI means of all clusters and selecting the cluster with the largest MBWI mean as the first cluster to be processed; and determining whether the B2 mean of the first cluster to be processed is less than a first threshold to obtain a first judgment result. If the first judgment result is yes, whether the NDWI mean of the first cluster to be processed is greater than a second threshold is determined to obtain a second judgment result. If the second judgment result is yes, the first cluster to be processed is marked as surface water, and all other clusters are marked as non-surface water. If the second judgment result is no, all clusters are marked as non-surface water.
If the first judgment result is no, the cluster with the second largest MBWI mean is selected as the second cluster to be processed, and whether the NDWI mean of the second cluster to be processed is greater than the second threshold is determined to obtain a third judgment result. If the third judgment result is yes, the second cluster to be processed is marked as surface water, and all other clusters are marked as non-surface water. If the third judgment result is no, all clusters are marked as non-surface water. Using the spectral band set and water index set corresponding to the surface water remote sensing image as input, and the labeled clustering results corresponding to the surface water remote sensing image as output, a random forest model is trained to obtain a surface water extraction model. The surface water extraction model is used to obtain the corresponding labeled clustering results based on the surface water remote sensing image and to generate a surface water mask based on the labeled clustering results. The surface water mask is a labeled binary raster image, and the labels include labels indicating that a certain area is surface water or non-surface water.

2. The training method for a surface water extraction model according to claim 1, characterized in that performing multidimensional hierarchical clustering on the clustering feature set to obtain unlabeled clustering results specifically includes: performing superpixel segmentation on each clustering feature in the clustering feature set using the SNIC algorithm to obtain several superpixel rasters; converting the superpixel rasters into vector polygons using the raster-to-vector tool in ArcGIS Pro; randomly sampling the vector polygons to obtain several vector polygon samples; and performing hierarchical clustering on the vector polygon samples using a hierarchical clustering algorithm, with the optimal number of clusters determined by the Calinski-Harabasz index, to obtain the unlabeled clustering results.

3. The training method for a surface water extraction model according to claim 1, characterized in that the spectral band set includes B2, B3, B4, B8, B11, and B12.

4. The training method for a surface water extraction model according to claim 1, characterized in that the water body index set includes AWEI, MBWI, MNDWI, and NDWI.

5. The training method for a surface water extraction model according to claim 1, characterized in that the clustering feature set includes B8, MBWI, and NDWI.

6. A training system for a surface water extraction model, characterized by comprising: a training sample set construction module, a first selection module, a first calculation module, a first clustering module, and a model training module. The training sample set construction module is used to construct a training sample set, which includes several surface water remote sensing images. The first selection module is used to select a spectral band set based on the training sample set, wherein the spectral band set includes several spectral bands. The first calculation module is used to calculate water body indices based on the spectral band set to obtain a water body index set, which includes several water body indices. The first clustering module is used to obtain labeled clustering results based on the spectral band set and the water index set, specifically by: selecting spectral bands from the spectral band set and water indices from the water index set as clustering features to construct a clustering feature set; performing multidimensional hierarchical clustering on the clustering feature set to obtain unlabeled clustering results; and obtaining labeled clustering results based on the unlabeled clustering results, each cluster in the labeled clustering results being labeled with either a surface water label or a non-surface water label. Obtaining labeled clustering results based on the unlabeled clustering results specifically includes: calculating the MBWI mean of all clusters in the unlabeled clustering results; comparing the MBWI means of all clusters and selecting the cluster with the largest MBWI mean as the first cluster to be processed; and determining whether the B2 mean of the first cluster to be processed is less than a first threshold to obtain a first judgment result. If the first judgment result is yes, whether the NDWI mean of the first cluster to be processed is greater than a second threshold is determined to obtain a second judgment result.
If the second judgment result is yes, the first cluster to be processed is marked as surface water, and all other clusters are marked as non-surface water. If the second judgment result is no, all clusters are marked as non-surface water. If the first judgment result is no, the cluster with the second largest mean MBWI value is selected as the second cluster to be processed. The third judgment result is obtained by determining whether the mean NDWI value of the second cluster to be processed is greater than a second threshold. If the third judgment result is yes, the second cluster to be processed is marked as surface water, and all other clusters are marked as non-surface water. If the third judgment result is no, all clusters are marked as non-surface water. The model training module is used to train a random forest model by taking the spectral band set and water index set corresponding to the surface water remote sensing image as input and the labeled clustering results corresponding to the surface water remote sensing image as output, to obtain a surface water extraction model. The surface water extraction model is used to obtain the corresponding labeled clustering results based on the surface water remote sensing image and generate a surface water mask based on the labeled clustering results. The surface water mask is a labeled binary raster image, and the labels include labels indicating that a certain area is surface water or non-surface water.

7. A surface water extraction method, characterized by comprising the following steps. Surface water remote sensing images of the target area are acquired. Based on the surface water remote sensing image of the target area, a spectral band set is selected, which includes several spectral bands. Water body indices are calculated based on the spectral band set to obtain a water body index set, which includes several water body indices. The spectral band set and water index set corresponding to the surface water remote sensing image of the target area are input into the surface water extraction model to obtain the labeled clustering results corresponding to that image, and a surface water mask is generated based on the labeled clustering results; wherein the surface water extraction model is a model trained by the training method of a surface water extraction model according to any one of claims 1-5, and the surface water mask is a labeled binary raster image, wherein the labels include a label indicating that a certain area is surface water or a label indicating that a certain area is not surface water.

8. A surface water extraction system, characterized by comprising: a surface water remote sensing image acquisition module, a second selection module, a second calculation module, a second clustering module, and a surface water extraction module. The surface water remote sensing image acquisition module is used to acquire surface water remote sensing images of the target area. The second selection module is used to select a spectral band set based on the surface water remote sensing image of the target area, wherein the spectral band set includes several spectral bands. The second calculation module is used to calculate water body indices based on the spectral band set to obtain a water body index set, which includes several water body indices. The second clustering module is used to obtain labeled clustering results based on the spectral band set and the water index set, wherein each cluster in the labeled clustering results is labeled with a surface water label or a non-surface water label. The surface water extraction module is used to input the spectral band set and the water index set corresponding to the surface water remote sensing image of the target area into the surface water extraction model to obtain the labeled clustering results corresponding to that image, and to generate a surface water mask based on the labeled clustering results; wherein the surface water extraction model is a model trained by the training method of the surface water extraction model according to any one of claims 1-5, and the surface water mask is a labeled binary raster image, wherein the labels include a label indicating that a certain area is surface water or a label indicating that a certain area is not surface water.