A method for image segmentation based on a clustering algorithm that discovers micro-cluster structures using a weight-constrained minimum spanning tree
By using a clustering algorithm based on a weighted minimum spanning tree, together with boundary fuzzy sampling and a micro-cluster merging index, the method solves the problem that traditional clustering algorithms cannot identify image structures during image segmentation, and achieves higher segmentation accuracy.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- KUNMING UNIV OF SCI & TECH
- Filing Date
- 2023-07-25
- Publication Date
- 2026-06-23
Abstract
Description
Technical Field
[0001] This invention relates to a method for image segmentation based on a clustering algorithm that discovers micro-cluster structures using a weighted minimum spanning tree, belonging to the technical field of cluster analysis applications in data mining and machine learning.
Background Technology
[0002] Digital image processing is an interdisciplinary field. Image segmentation is a crucial preprocessing step for image recognition and computer vision. Correct recognition is impossible without accurate segmentation. However, segmentation is based solely on the brightness and color of pixels in an image, and automated computer segmentation encounters various difficulties. For example, uneven lighting, noise, the presence of blurry parts in the image, and shadows often lead to segmentation errors. Therefore, image segmentation is a technology requiring further research. Introducing human-guided knowledge and artificial intelligence methods to correct certain segmentation errors is a promising approach, but this further complicates the problem.
[0003] Image segmentation using feature space clustering involves representing pixels in the image space with corresponding feature space points, segmenting the feature space based on their clustering, and then mapping them back to the original image space to obtain the segmentation result. K-means and fuzzy C-means (FCM) clustering algorithms are the most commonly used clustering algorithms. However, traditional clustering algorithms cannot accurately find the distribution structure of the data, resulting in unsatisfactory performance in image segmentation. Therefore, this invention proposes a clustering algorithm that can accurately find cluster structures to improve the accuracy of image segmentation.
Summary of the Invention
[0004] This invention provides a method for image segmentation based on a clustering algorithm that discovers micro-cluster structures using a weighted minimum spanning tree. This method addresses the issue that traditional clustering algorithms cannot accurately identify image structures when segmenting images, thereby increasing the accuracy of image segmentation.
[0005] The technical solution of this invention is: a method for image segmentation based on a clustering algorithm for discovering micro-cluster structures using a weighted minimum spanning tree. This method employs a boundary fuzzy sampling method to preserve the original data distribution structure of the image to be segmented. Then, it finds initial micro-clusters by restricting the growth of the minimum spanning tree. Finally, it further defines a merging index between micro-clusters to merge them, accurately finding the cluster structure for clustering. The specific steps are as follows:
[0006] Step 1: Preprocess the image to be segmented to obtain a standardized scaled dataset;
[0007] For the image to be segmented, taking a color image of size d×p as an example, RGB values are used to represent the different color channels of each pixel. The image is then represented as a matrix with rows and columns corresponding to the pixels in the image, and each element in the matrix is represented by a vector of the RGB values of the corresponding pixel. The matrix is then expanded into a pixel-indexed matrix, which serves as the input to the algorithm.
[0008]
[0009] After the above transformation, the images to be segmented produce a multidimensional dataset with d×p samples. This dataset is then standardized and scaled, with each dimension normalized to the range [0,1], and then scaled to the range [0,50]. The specific scaling formula is as follows:
[0010] $$x' = \frac{x - \min(X)}{\max(X) - \min(X)} \times 50$$
[0011] In the formula, X is the set of values of one dimension of the dataset, x is any value in X, and x' is the value of x after scaling.
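As a non-limiting illustration, the preprocessing of Step 1 can be sketched in Python as follows. The sketch assumes min-max normalization per dimension followed by multiplication by 50, which is one way to satisfy the stated ranges [0,1] and [0,50]. The function names `image_to_samples` and `scale_dimensions`, the nested-list image representation, and the handling of a constant dimension are assumptions made for this example only.

```python
def image_to_samples(image):
    """Flatten a d x p image (nested lists of (R, G, B)) into d*p samples."""
    return [list(pixel) for row in image for pixel in row]


def scale_dimensions(samples, upper=50.0):
    """Min-max normalize each dimension to [0, 1], then stretch it to [0, upper]."""
    columns = list(zip(*samples))  # one tuple of values per dimension
    scaled_columns = []
    for column in columns:
        lo, hi = min(column), max(column)
        span = hi - lo
        # Assumption: a constant dimension is mapped to 0 to avoid dividing by zero.
        scaled_columns.append(
            [0.0 if span == 0 else (value - lo) / span * upper for value in column]
        )
    return [list(sample) for sample in zip(*scaled_columns)]


# A hypothetical 2 x 2 RGB image, used only to exercise the functions.
image = [
    [(0, 10, 255), (255, 10, 0)],
    [(51, 10, 102), (102, 10, 51)],
]
samples = image_to_samples(image)   # 4 samples with 3 dimensions each
scaled = scale_dimensions(samples)  # every value now lies in [0, 50]
```

Pure Python is used here to keep the sketch self-contained; an array library would normally be preferred for real images.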
[0012] Step 2: Perform boundary fuzzy sampling on each dimension of the standardized dataset;
[0013] For the standardized dataset, using 1 as the standard unit, fuzzy boundary sampling is performed sequentially in each dimension. The sampled dataset is described as X′:
[0014]
[0015] Where $X = \{x_1, \ldots, x_i, \ldots, x_n\}$, $n$ is the number of samples, $m$ is the data dimension, $a \in \{0, 1, 2, \ldots, 50\}$ is the value taken for sample $x_i$ in dimension $d$, and $\xi$ is the fuzzy threshold.
[0016] Step 3: For the dataset after boundary fuzzy sampling, find the initial micro-clusters by restricting the growth of the minimum spanning tree;
[0017] From the dataset X' after fuzzy boundary sampling, arbitrarily select an unlabeled sample, add it to tree T, and label it. At this point, the tree contains only one sample. Then, select the unlabeled sample that is closest to the current sample set in T. If this distance is less than the specified weight Limi_weight, label the sample, add it to the tree, and add the corresponding weighted edge to the weight set $W_T$. After each such operation, the number of samples in T and the number of weighted edges in $W_T$ each increase by 1. Otherwise, the growth of the spanning tree is restricted. This process continues until the set of all micro-clusters, Micro_C, is found.
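The growth procedure of Step 3 can be illustrated with the following Python sketch, offered as one possible reading rather than the definitive implementation. It grows a tree in the manner of Prim's algorithm and closes the tree as soon as the nearest unlabeled sample is not closer than the threshold. The sketch assumes Euclidean distance, and assumes that a closed tree forms one micro-cluster and that a new tree is then started from another unlabeled sample. The names `find_micro_clusters` and `limit_weight` (standing in for Limi_weight) are chosen for this example.

```python
import math


def find_micro_clusters(samples, limit_weight):
    """Grow trees Prim-style; close a tree when the next edge is not below limit_weight.

    Returns (micro_clusters, weight_sets): lists of sample indices, and the
    edge weights accepted while growing each tree.
    """
    unlabeled = set(range(len(samples)))
    micro_clusters, weight_sets = [], []
    while unlabeled:
        seed = min(unlabeled)  # any unlabeled sample works; min() keeps runs repeatable
        unlabeled.remove(seed)
        tree, weights = [seed], []
        # best[i] = distance from unlabeled sample i to the nearest sample in the tree
        best = {i: math.dist(samples[seed], samples[i]) for i in unlabeled}
        while best:
            nearest = min(best, key=best.get)
            if best[nearest] >= limit_weight:
                break  # growth is restricted: this tree is finished
            weights.append(best.pop(nearest))
            unlabeled.remove(nearest)
            tree.append(nearest)
            for i in best:  # refresh distances against the new tree member
                best[i] = min(best[i], math.dist(samples[nearest], samples[i]))
        micro_clusters.append(tree)
        weight_sets.append(weights)
    return micro_clusters, weight_sets


# Hypothetical data: two well-separated groups on a line.
points = [[0, 0], [1, 0], [2, 0], [10, 0], [11, 0]]
micro_clusters, weight_sets = find_micro_clusters(points, limit_weight=1.5)
```

Each tree holds one more sample than it has accepted edges, which matches the statement that the sample count and the edge count grow together.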
[0018] Step 4: Define the inter-cluster merging index to merge the initial micro-clusters and obtain the cluster structure;
[0019] Using an agglomerative hierarchical clustering approach, the initial micro-clusters are iteratively merged until the correct cluster structure is found:
[0020] (1) Calculate the local density of all micro-clusters and the distances between micro-clusters. Starting from any micro-cluster $C_i$, calculate the merging index MI between this micro-cluster and each of the other micro-clusters, and determine the micro-cluster $C_j$ corresponding to the largest merging index. Check whether the minimum distance $\mathrm{dist}_{\min}(C_i, C_j)$ between $C_j$ and the current micro-cluster $C_i$ is smaller than the sum of the average intra-cluster weights of $C_i$ and $C_j$; if so, merge $C_i$ and $C_j$; if not, abandon the merge;
[0021] The local density of the microclusters is described as follows:
[0022]
[0023] Where $n_i$ is the number of samples in micro-cluster $i$ of Micro_C, and $v_j$ is a connection weight in the weight set $W_{T_i}$ corresponding to micro-cluster $i$ of Micro_C;
[0024] The inter-cluster distance is determined by both the cluster centroid distance and the shortest distance between clusters, where the shortest distance between micro-clusters, $\mathrm{dist}_{\min}(C_i, C_j)$, is represented as:
[0025]
[0026] Where $|p - p'|$ is the distance between two samples $p$ and $p'$, and $\mathit{inf} = \mathrm{avg}(W_{T_i}) + \mathrm{avg}(W_{T_j})$ is the sum of the average weights of cluster $C_i$ and cluster $C_j$. If the minimum distance between two clusters is greater than this value, the distance is increased to highlight the separation between the two clusters; if the minimum distance between two clusters is less than this value, the distance is decreased to strengthen the connection between the two clusters.
[0027] The distance between microclusters is described as:
[0028] $$\mathrm{dist}_{ij} = \mathrm{dist}_{\min}(C_i, C_j) \times \mathrm{dist}_{\mathrm{centroid}}(C_i, C_j)$$
[0029] $$\mathrm{dist}_{\mathrm{centroid}}(C_i, C_j) = |c_i - c_j|$$ where $c_i$ is the centroid of cluster $C_i$;
[0030] The cluster merging index is described as follows:
[0031]
[0032] Where $|\rho_i - \rho_j|$ represents the density difference between cluster $i$ and cluster $j$.
[0033] (2) Repeat step (1) above in the remaining microclusters until all microclusters have been merged once;
[0034] (3) Take all the cluster results that have been merged once as a new set, repeat steps (1)-(2), and repeat the iteration until all micro-clusters are no longer merged and the set stops updating.
[0035] Step 5: Perform image segmentation based on the cluster structure found in Step 4.
[0036] When performing image segmentation, a one-time allocation strategy is adopted. Based on the final cluster structure output in Step 4, unassigned samples belonging to the same cluster will be distributed around the cluster structure. When allocating these unassigned samples, the nearest assigned sample is found and the category of that sample is assigned to the unassigned sample.
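The one-time allocation of Step 5 can be sketched as below, as an illustration under stated assumptions. The sketch assumes Euclidean distance and reads "one-time" as meaning that each unassigned sample is compared only against the samples already in the cluster structure, without updating that structure while allocating. The name `assign_remaining` and the sample data are hypothetical.

```python
import math


def assign_remaining(assigned, labels, unassigned):
    """Give each unassigned sample the label of its nearest assigned sample."""
    result = []
    for sample in unassigned:
        nearest = min(
            range(len(assigned)),
            key=lambda i: math.dist(sample, assigned[i]),
        )
        result.append(labels[nearest])
    return result


# Hypothetical cluster structure: two samples in cluster 0, one in cluster 1.
assigned = [[0, 0], [1, 0], [10, 10]]
labels = [0, 0, 1]
unassigned = [[0.4, 0.2], [9, 9.5], [6, 6]]
new_labels = assign_remaining(assigned, labels, unassigned)
```

Once every pixel sample carries a label, the label sequence can be reshaped back to the d×p pixel grid to obtain the segmented image.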
[0037] The beneficial effects of this invention are:
[0038] This invention provides a clustering algorithm for discovering micro-cluster structures based on weighted minimum spanning tree. It can accurately identify cluster structures during the clustering process, thereby solving the problem that traditional clustering algorithms cannot accurately identify image structures when segmenting images, and increasing the accuracy of clustering algorithms in the field of image segmentation.
Attached Figure Description
[0039] Figure 1 is a flowchart of the present invention;
[0040] Figure 2 is a schematic diagram of boundary fuzzy sampling;
[0041] Figure 3 is a diagram illustrating the experimental process of Example 1 on the Aggregation dataset;
[0042] Figure 4 is a graph showing the sensitivity test results of the parameter ξ proposed in this invention on different datasets;
[0043] Figure 5 is a graph showing the sensitivity test results of the parameter Limit_weight proposed in this invention on different datasets;
[0044] Figures 6-8 show the experimental results of the present invention and existing methods on three datasets of different sizes.
Detailed Implementation
[0045] The present invention will be further described below with reference to the accompanying drawings and specific embodiments.
[0046] Example 1: As shown in Figure 1, this example presents a method for image segmentation based on a clustering algorithm for discovering micro-cluster structures using a weight-constrained minimum spanning tree. The specific steps of the method are as follows:
[0047] Step 1: Preprocess the image to be segmented to obtain a standardized scaled dataset;
[0048] For the image to be segmented, taking a color image of size d×p as an example, RGB values are used to represent the different color channels of each pixel. The image is then represented as a matrix with rows and columns corresponding to the pixels in the image, and each element in the matrix is represented by a vector of the RGB values of the corresponding pixel. The matrix is then expanded into a pixel-indexed matrix, which serves as the input to the algorithm.
[0049]
[0050] After the above transformation, the images to be segmented produce a multidimensional dataset with d×p samples. This dataset is then standardized and scaled, with each dimension of the original dataset normalized to the range [0,1], and then scaled to the range [0,50]. The specific scaling formula is as follows:
[0051] $$x' = \frac{x - \min(X)}{\max(X) - \min(X)} \times 50$$
[0052] Taking the dimension R in the output multidimensional dataset as an example, $X = [R_1, R_2, R_3, \ldots, R_{d \times p}]$, $x$ is any value $R_i$ in $X$, and $x'$ is the value of $x$ after scaling.
[0053] Step 2: Perform boundary fuzzy sampling on each dimension of the standardized dataset.
[0054] Before searching for cluster structures, it is desirable to have a clearer distribution structure of the data. Furthermore, the search for cluster structures does not require the participation of all samples. Based on this, a fuzzy boundary-based sampling method is proposed. This method performs fuzzy boundary sampling on each dimension of the standardized dataset. While preserving the distribution characteristics of the original data, it also makes the originally fuzzy data distribution clearer through reasonable sparsification.
[0055] For the standardized dataset, using 1 as the standard unit, fuzzy boundary sampling is performed sequentially in each dimension. The sampled dataset is described as follows:
[0056]
[0057] Where $X = \{x_1, \ldots, x_i, \ldots, x_n\}$, $n$ is the number of samples, $m$ is the data dimension, $a \in \{0, 1, 2, \ldots, 50\}$ is the value taken for sample $x_i$ in dimension $d$, and $\xi$ is the fuzzy threshold.
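Because the sampling formula itself is not reproduced in this text, the following Python sketch rests on an explicit assumption: a sample is retained only if, in every dimension, its scaled value lies within ξ of one of the unit values a = 0, 1, ..., 50. This reading is consistent with the stated range (0,0.5] for ξ, since ξ = 0.5 then retains every sample, but it should be treated as a hypothetical interpretation and not as the formula of the invention. The name `boundary_fuzzy_sample` is illustrative.

```python
def boundary_fuzzy_sample(samples, xi):
    """Keep samples lying within xi of a unit grid value in every dimension.

    ASSUMPTION: this keep-rule is an interpretation; the source formula is missing.
    Values are expected to be already scaled to [0, 50].
    """
    kept = []
    for sample in samples:
        # round(value) is the nearest unit value a; each dimension is checked on its own.
        if all(abs(value - round(value)) <= xi for value in sample):
            kept.append(sample)
    return kept


# Hypothetical scaled two-dimensional samples.
data = [[3.05, 10.0], [3.4, 10.0], [7.95, 20.1], [7.95, 20.3]]
sparse = boundary_fuzzy_sample(data, xi=0.2)  # drops samples far from the unit values
dense = boundary_fuzzy_sample(data, xi=0.5)   # keeps every sample
```

Under this reading, a smaller ξ thins the data more aggressively, which agrees with the recommendation to use smaller ξ for larger datasets.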
[0058] Step 3: After processing in Step 2, first randomly select an unlabeled sample from dataset X', add it to tree T, and label it. At this point, the tree contains only one sample. Next, select the unlabeled sample that is closest to the current set of samples in T. If this distance is less than the specified weight Limi_weight, label the sample, add it to the tree, and add the corresponding weighted edge to the weight set $W_T$. After each such operation, the number of samples in T and the number of weighted edges in $W_T$ each increase by 1. Otherwise, the growth of the spanning tree is restricted. This process continues until the set of all micro-clusters, Micro_C, is found.
[0059] Step 4: After obtaining the initial microclusters in Step 3, calculate the local density of the microclusters and the distance between microclusters. The local density of the microclusters is described as follows:
[0060]
[0061] Where $n_i$ is the number of samples in micro-cluster $i$ of Micro_C, and $v_j$ is a connection weight in the weight set $W_{T_i}$ corresponding to micro-cluster $i$ of Micro_C.
[0062] The inter-cluster distance is determined by both the distance between the cluster centroids and the shortest distance between clusters. The shortest distance between micro-clusters, $\mathrm{dist}_{\min}(C_i, C_j)$, is represented as:
[0063]
[0064] Where $|p - p'|$ is the distance between two samples $p$ and $p'$, and $\mathit{inf} = \mathrm{avg}(W_{T_i}) + \mathrm{avg}(W_{T_j})$ is the sum of the average weights of cluster $C_i$ and cluster $C_j$. If the minimum distance between two clusters is greater than this value, the distance is increased to highlight the separation between the two clusters; if the minimum distance between two clusters is less than this value, the distance is decreased to strengthen the connection between the two clusters.
[0065] The distance between microclusters is described as:
[0066] $$\mathrm{dist}_{ij} = \mathrm{dist}_{\min}(C_i, C_j) \times \mathrm{dist}_{\mathrm{centroid}}(C_i, C_j)$$
[0067] $$\mathrm{dist}_{\mathrm{centroid}}(C_i, C_j) = |c_i - c_j|$$ where $c_i$ is the centroid of cluster $C_i$.
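The quantities entering the inter-cluster distance can be sketched in Python as follows. The sketch computes the centroid distance, the unadjusted smallest pairwise distance between two micro-clusters, and the threshold inf = avg(W_Ti) + avg(W_Tj). It deliberately does not reproduce the adjustment that enlarges or shrinks the minimum distance relative to inf, because that formula is not available in this text. Euclidean distance, the treatment of an empty weight set, and all function names are assumptions of this example.

```python
import math


def centroid(points):
    """Component-wise mean of a list of points."""
    return [sum(axis) / len(points) for axis in zip(*points)]


def centroid_distance(cluster_a, cluster_b):
    """dist_centroid(C_i, C_j) = |c_i - c_j|."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))


def raw_min_distance(cluster_a, cluster_b):
    """Smallest |p - p'| over p in C_i and p' in C_j, before any adjustment."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


def average_weight_sum(weights_a, weights_b):
    """inf = avg(W_Ti) + avg(W_Tj).

    ASSUMPTION: a single-sample micro-cluster has no edges; its average is taken as 0.
    """
    def avg(values):
        return sum(values) / len(values) if values else 0.0
    return avg(weights_a) + avg(weights_b)


# Hypothetical micro-clusters with their accepted edge weights.
c_i, w_i = [[0, 0], [2, 0]], [2.0]
c_j, w_j = [[5, 0], [7, 0]], [2.0]
d_centroid = centroid_distance(c_i, c_j)
d_min = raw_min_distance(c_i, c_j)
threshold = average_weight_sum(w_i, w_j)
d_ij_unadjusted = d_min * d_centroid  # product form of dist_ij, using the raw minimum
```

In this example the raw minimum distance is below the threshold, so under the merge condition described in the text the pair would be eligible for merging.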
[0068] The cluster merging index is described as follows:
[0069]
[0070] Where $|\rho_i - \rho_j|$ represents the density difference between cluster $i$ and cluster $j$; MI(i,j) is proportional to this difference and inversely proportional to $\mathrm{dist}_{ij}$. The larger MI(i,j) is, the more likely cluster i and cluster j are to be merged.
[0071] In the process of finding the cluster structure, an agglomerative hierarchical clustering approach is adopted, iteratively merging micro-clusters until the correct cluster structure is found. The specific process is as follows:
[0072] (1) Calculate the local density of all micro-clusters and the distances between micro-clusters. Starting from any micro-cluster $C_i$, calculate the merging index MI between this micro-cluster and each of the other micro-clusters, and determine the micro-cluster $C_j$ corresponding to the largest merging index. Check whether the minimum distance $\mathrm{dist}_{\min}(C_i, C_j)$ between $C_j$ and the current micro-cluster $C_i$ is smaller than the sum of the average intra-cluster weights of $C_i$ and $C_j$; if so, merge $C_i$ and $C_j$; if not, abandon the merge.
[0073] (2) Repeat step (1) above in the remaining microclusters until all microclusters have been merged once;
[0074] (3) Update all cluster results that have been merged once to a new set, repeat steps (1)-(2), and repeat the iteration until all micro-clusters are no longer merged and the set stops updating.
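The iteration in steps (1)-(3) can be sketched as the following Python skeleton, given as one possible reading. Since the merging index formula is not reproduced in this text, the index is passed in as a caller-supplied function `merge_index`; the `placeholder_index` used in the example is a stand-in and is not the merging index of the invention. The sketch further assumes that each micro-cluster is visited once per pass, that partners are sought among the micro-clusters not yet visited in that pass, that the unadjusted minimum pairwise distance is used in the merge condition, and that a merged cluster inherits both weight sets plus the joining edge.

```python
import math


def _avg(values):
    return sum(values) / len(values) if values else 0.0


def _centroid(points):
    return [sum(axis) / len(points) for axis in zip(*points)]


def _min_distance(a, b):
    return min(math.dist(p, q) for p in a for q in b)


def merge_pass(clusters, merge_index):
    """One pass of steps (1)-(2). clusters is a list of (points, weights) pairs."""
    pool = list(clusters)
    result, merged_any = [], False
    while pool:
        current = pool.pop(0)
        if pool:
            # Candidate partner: the remaining cluster with the largest merging index.
            best = max(range(len(pool)), key=lambda k: merge_index(current, pool[k]))
            partner = pool[best]
            gap = _min_distance(current[0], partner[0])
            if gap < _avg(current[1]) + _avg(partner[1]):
                pool.pop(best)
                # ASSUMPTION: merged weights = both weight sets plus the joining edge.
                current = (current[0] + partner[0], current[1] + partner[1] + [gap])
                merged_any = True
        result.append(current)
    return result, merged_any


def merge_micro_clusters(clusters, merge_index):
    """Step (3): repeat passes until a pass performs no merge."""
    merged_any = True
    while merged_any:
        clusters, merged_any = merge_pass(clusters, merge_index)
    return clusters


def placeholder_index(a, b):
    """Stand-in only: prefers close centroids. NOT the invention's merging index."""
    return -math.dist(_centroid(a[0]), _centroid(b[0]))


# Hypothetical micro-clusters: two neighbours and one distant cluster.
start = [
    ([[0, 0], [1, 0]], [1.0]),
    ([[1.5, 0], [2.5, 0]], [1.0]),
    ([[20, 0], [21, 0]], [1.0]),
]
final = merge_micro_clusters(start, placeholder_index)
```

Every pass that merges reduces the number of clusters, so the loop terminates once a pass leaves the set unchanged.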
[0075] Step 5: The final cluster structure output from Step 4 accurately contains the sample distribution information of the original clusters. Unassigned samples belonging to the same cluster will be distributed around the cluster structure. When assigning these unassigned samples, it is only necessary to find the nearest assigned sample and assign its category to the unassigned sample. Although a one-time assignment strategy is adopted, thanks to the accurate identification of the cluster structure, this assignment method can still achieve relatively accurate clustering results.
[0076] Example 2: To demonstrate the fuzzy boundary sampling process intuitively, Figure 2 takes as an example the fuzzy boundary sampling of a two-dimensional dataset containing two clusters. In Figure 2(a), the original dataset is divided into a grid. In Figure 2(b), the shaded area represents the fuzzy boundary. Figure 2(c) is the sampling result obtained by fuzzy boundary sampling. Although the sampling is drawn as a grid for two-dimensional data, this method samples the dataset independently in each dimension, and the dimensions do not interfere with each other. Therefore, it does not suffer from the curse of dimensionality that affects traditional grid-based sampling, and it remains scalable to higher-dimensional datasets.
[0077] Example 3: As shown in Figure 3, unlike the traditional minimum spanning tree clustering algorithm, which separates clusters by removing the longest edge from the minimum spanning tree, this invention defines a maximum weight for the minimum spanning tree to limit its growth, thereby dividing the dataset into several micro-clusters, each of which is a minimum spanning tree with restricted weights.
[0078] Figures 3(a)-(f) show, respectively, the distribution of the original data in the Aggregation dataset, the distribution of the data after boundary fuzzy sampling, the weight-constrained minimum spanning tree used to find micro-clusters, the micro-cluster merging, the allocation of the remaining points, and the final clustering result. The initial micro-clusters in Figure 3(c) strictly adhere to the principle of making samples within a cluster as similar as possible. Although samples originally belonging to the same cluster may be divided into different initial micro-clusters, this method guarantees the purity within each micro-cluster; that is, samples within the same initial micro-cluster must belong to the same class.
[0079] Example 4: As shown in Figures 4 and 5, the sensitivity of the two parameters used in this invention, ξ and Limi_weight, was tested, taking the Aggregation dataset as an example.
[0080] The three datasets of different sizes used in this embodiment are shown in Table 1:
[0081] Table 1 Comparison of three different datasets
[0082]
[0083] Regarding the fuzzy threshold parameter ξ: since the original dataset is scaled to the interval [0,50] and a unit interval of 1 is specified, the range of this parameter is limited to (0,0.5]. Furthermore, since not all data are required in the process of finding cluster structures, the main purpose of this parameter is to preserve the distribution characteristics of the original data after boundary fuzzy sampling. For datasets with very small amounts of data, we recommend setting ξ to 0.4-0.5, because further compressing a dataset that already has too little data cannot effectively preserve its distribution characteristics; for datasets with moderate amounts of data, we recommend setting ξ to 0.2±0.05; for large datasets, the recommended range is 0.1±0.05. In general, ξ should vary inversely with the amount of data: the larger the dataset, the smaller ξ should be, and the smaller the dataset, the larger ξ should be, in order to preserve the original characteristics of the data. Figure 4 shows the performance of parameter ξ on different datasets. When testing this parameter, the other parameter was fixed. Figure 4(a) shows that stable and optimal clustering results can be obtained on the Spiral dataset when ξ is greater than 0.44. In Figure 4(b), the performance of ξ on the Aggregation dataset fluctuates slightly, but it remains at a high level and the optimal result is found within this range. Encouragingly, parameter ξ exhibits remarkable stability on the larger dataset S1. In summary, the selection of parameter ξ is relatively simple, and within the recommended range, fluctuations in ξ have little impact on the clustering results, demonstrating relatively stable performance.
[0084] Regarding the weight parameter Limi_weight, this parameter limits the growth of the minimum spanning tree to ensure that tightly connected small clusters are obtained. Clearly, if this value is too large, effective micro-cluster structures cannot be separated. The value of this parameter therefore depends on the sample distance matrix. Experiments show that traversing values of this parameter above the minimum value of the distance matrix often yields satisfactory results. As above, the other parameter was fixed during these tests. The experimental results in Figure 5 show that, when traversing the parameter Limi_weight on different datasets, it is always easy to find the value that optimizes the clustering effect.
[0085] Example 5: As shown in Figures 6-8, to further verify the effectiveness of this invention, it is compared with other clustering methods. Accuracy (Acc), adjusted Rand index (ARI), and normalized mutual information (NMI) are chosen to measure clustering performance.
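For reference, the three measures can be computed as in the following self-contained Python sketch. The text does not state how they were computed, so the choices here are assumptions: Acc is taken as the best accuracy over one-to-one relabelings of the predicted clusters (found by brute force, which is practical only for a small number of clusters), and NMI is normalized by the arithmetic mean of the two entropies. The function names and the label sequences are illustrative.

```python
import math
from collections import Counter
from itertools import permutations


def clustering_accuracy(truth, pred):
    """Best accuracy over one-to-one relabelings of predicted clusters (brute force)."""
    true_ids, pred_ids = sorted(set(truth)), sorted(set(pred))
    # Pad so every predicted cluster can be mapped even if there are more of them.
    targets = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))
    best = 0
    for perm in permutations(targets, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        best = max(best, sum(mapping[p] == t for t, p in zip(truth, pred)))
    return best / len(truth)


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index computed from the contingency counts."""
    sum_cells = sum(math.comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(math.comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(math.comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / math.comb(len(truth), 2)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (maximum - expected)


def normalized_mutual_information(truth, pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(truth)
    joint, rows, cols = Counter(zip(truth, pred)), Counter(truth), Counter(pred)
    mi = sum(
        c / n * math.log(c * n / (rows[t] * cols[p])) for (t, p), c in joint.items()
    )
    h_true = -sum(c / n * math.log(c / n) for c in rows.values())
    h_pred = -sum(c / n * math.log(c / n) for c in cols.values())
    mean_entropy = (h_true + h_pred) / 2  # ASSUMPTION: arithmetic-mean normalization
    return 1.0 if mean_entropy == 0 else mi / mean_entropy


# Hypothetical labels: a perfect clustering with swapped ids, and an imperfect one.
truth = [0, 0, 0, 1, 1, 1]
perfect = [1, 1, 1, 0, 0, 0]
imperfect = [1, 1, 0, 0, 0, 0]
acc = clustering_accuracy(truth, imperfect)
ari = adjusted_rand_index(truth, imperfect)
```

All three measures are invariant to how cluster ids are numbered, which is why the perfect clustering with swapped ids still scores 1.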
[0086] Table 2 shows the clustering results of the present invention and existing methods on three different datasets. It can be seen that the present invention achieves the best clustering effect in datasets of different sizes and distributions.
[0087] Table 2 Clustering results of different clustering methods on the three datasets
[0088]
[0089] The specific embodiments of the present invention have been described in detail above with reference to the accompanying drawings. However, the present invention is not limited to the above embodiments. Within the scope of knowledge possessed by those skilled in the art, various changes can be made without departing from the spirit of the present invention.
Claims
1. A method for image segmentation based on a clustering algorithm for discovering micro-cluster structures using a weighted minimum spanning tree, characterized in that: a boundary fuzzy sampling method is employed to preserve the original data distribution structure of the image to be segmented; then, initial micro-clusters are found by limiting the growth of the minimum spanning tree; finally, a merging index between micro-clusters is defined to merge the micro-clusters, so as to accurately find the cluster structure for clustering. The specific steps are as follows: Step 1: Preprocess the image to be segmented to obtain a standardized scaled dataset; Step 2: Perform boundary fuzzy sampling on each dimension of the standardized dataset; Step 3: For the dataset after boundary fuzzy sampling, find the initial micro-clusters by restricting the growth of the minimum spanning tree; Step 4: Define the inter-cluster merging index to merge the initial micro-clusters and obtain the cluster structure; Step 5: Perform image segmentation based on the cluster structure found in Step 4. The specific process of Step 2 is as follows: for the standardized dataset, using 1 as the standard unit, fuzzy boundary sampling is performed sequentially in each dimension, and the sampled dataset is described as X′: [formula missing]; where $X = \{x_1, \ldots, x_i, \ldots, x_n\}$, $n$ is the number of samples, $m$ is the data dimension, $a \in \{0, 1, 2, \ldots, 50\}$ is the value taken for sample $x_i$ in dimension $d$, and $\xi$ is the fuzzy threshold. The specific process of Step 3 is as follows: from the dataset X′ after fuzzy boundary sampling, randomly select an unlabeled sample, add it to tree T, and label it; at this point the tree contains only one sample. Then, select the unlabeled sample that is closest to the current sample set in T. If this distance is less than the specified weight Limi_weight, label the sample, add it to the tree, and add the corresponding weighted edge to the weight set $W_T$. After each such operation, the number of samples in T and the number of weighted edges in $W_T$ each increase by 1; otherwise, the growth of the spanning tree is restricted. This process continues until the set of all micro-clusters, Micro_C, is found.
2. The method for image segmentation based on the clustering algorithm for discovering micro-cluster structures using weight-constrained minimum spanning tree as described in claim 1, characterized in that the specific process of Step 1 is as follows: for an image to be segmented of size d×p, RGB values are used to represent the different color channels of each pixel, forming a pixel matrix. This matrix is then expanded into a matrix indexed by pixels, yielding a multidimensional dataset with d×p samples. The dataset is standardized and scaled by normalizing each dimension to the range [0,1] and then scaling it to the range [0,50]. The specific scaling formula is as follows: $$x' = \frac{x - \min(X)}{\max(X) - \min(X)} \times 50$$; in the formula, X is the set of values of one dimension of the dataset, x is any value in X, and x' is the value of x after scaling.
3. The method for image segmentation based on the clustering algorithm for discovering micro-cluster structures using weight-constrained minimum spanning tree as described in claim 1, characterized in that the specific process of Step 4 is as follows: using an agglomerative hierarchical clustering approach, the initial micro-clusters are iteratively merged until the correct cluster structure is found: (1) Calculate the local density of all micro-clusters and the distances between micro-clusters. Starting from any micro-cluster $C_i$, calculate the merging index MI between this micro-cluster and each of the other micro-clusters, and determine the micro-cluster $C_j$ corresponding to the largest merging index. Check whether the minimum distance $\mathrm{dist}_{\min}(C_i, C_j)$ between $C_j$ and the current micro-cluster $C_i$ is smaller than the sum of the average intra-cluster weights of $C_i$ and $C_j$; if so, merge $C_i$ and $C_j$; if not, abandon the merge; (2) Repeat step (1) above in the remaining micro-clusters until all micro-clusters have been merged once; (3) Take all the cluster results that have been merged once as a new set, repeat steps (1)-(2), and iterate until no micro-clusters are merged any longer and the set stops updating.
4. The method for image segmentation based on the clustering algorithm for discovering micro-cluster structures using weight-constrained minimum spanning tree as described in claim 3, characterized in that: the local density of the micro-clusters is described as follows: [formula missing]; where $n_i$ is the number of samples in micro-cluster $i$ of Micro_C, and $v_j$ is a connection weight in the weight set $W_{T_i}$ corresponding to micro-cluster $i$ of Micro_C. The inter-cluster distance is determined by both the cluster centroid distance and the shortest distance between clusters, where the shortest distance between micro-clusters, $\mathrm{dist}_{\min}(C_i, C_j)$, is represented as: [formula missing]; where $|p - p'|$ is the distance between two samples $p$ and $p'$, and $\mathit{inf} = \mathrm{avg}(W_{T_i}) + \mathrm{avg}(W_{T_j})$ is the sum of the average weights of cluster $C_i$ and cluster $C_j$. If the minimum distance between two clusters is greater than this value, the distance is increased to highlight the separation between the two clusters; if the minimum distance between two clusters is less than this value, the distance is decreased to strengthen the connection between the two clusters. The distance between micro-clusters is described as: $\mathrm{dist}_{ij} = \mathrm{dist}_{\min}(C_i, C_j) \times \mathrm{dist}_{\mathrm{centroid}}(C_i, C_j)$; where $\mathrm{dist}_{\mathrm{centroid}}(C_i, C_j) = |c_i - c_j|$ and $c_i$ is the centroid of cluster $C_i$. The cluster merging index is described as follows: [formula missing]; where $|\rho_i - \rho_j|$ is the density difference between cluster $i$ and cluster $j$.
5. The method for image segmentation based on the clustering algorithm for discovering micro-cluster structures using weight-constrained minimum spanning tree as described in claim 1, characterized in that: The specific process of Step 5 is as follows: When performing image segmentation, a one-time allocation strategy is adopted. Based on the final cluster structure output in Step 4, unassigned samples belonging to the same cluster will be distributed around the cluster structure. When allocating these unassigned samples, the nearest assigned sample is found and the category of that sample is assigned to the unassigned sample.