Hyperspectral image partial label learning method based on heterogeneous network cross disambiguation
By employing a heterogeneous network cross-label disambiguation method, and utilizing the spatial and spectral information of hyperspectral images, combined with a prototype learning-based cross-label disambiguation strategy, the problems of high annotation pressure and label ambiguity in hyperspectral image classification are solved, achieving pixel-level accurate classification.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHONGQING UNIV
- Filing Date
- 2024-03-04
- Publication Date
- 2026-06-16
AI Technical Summary
Hyperspectral image classification suffers from problems such as high annotation pressure, high annotation cost, and difficulty in distinguishing ambiguous labels, which makes model training difficult and makes it hard to achieve accurate pixel-level classification.
A cross-label disambiguation method based on heterogeneous networks is adopted. Contrastive learning is used to exploit the spatial and spectral information of the images; combined with a prototype-learning-based cross-label disambiguation strategy, consistency constraints are established to alleviate the mutual entanglement between representation learning and label disambiguation and to provide reliable supervision information.
By effectively utilizing the spatial and spectral information of hyperspectral images, the annotation burden and cost are reduced, enabling pixel-level accurate classification of hyperspectral images.
Figure CN118552843B_ABST
Description
Technical Field
[0001] This invention belongs to the field of image recognition technology and specifically relates to a hyperspectral image partial label learning method based on heterogeneous network cross-disambiguation.
Background Technology
[0002] Hyperspectral images are integrated data cubes containing rich spectral and spatial information about surface objects, thus enhancing the ability to distinguish different land cover types. Fully utilizing the rich spectral information of hyperspectral images to achieve accurate classification results plays a crucial role in many fields, such as military reconnaissance, environmental monitoring, geological exploration, and deep space exploration. Hyperspectral image classification is a pixel-level classification task, assigning a specific category to each pixel. Thanks to the development of deep learning, its classification accuracy has been continuously improved.
[0003] Deep learning-based hyperspectral image classification methods require a large amount of labeled data during model training, posing a significant challenge to data collection. Specifically, in real-world hyperspectral image annotation scenarios, the complexity of ground features and the limitations of annotators' knowledge directly lead to label ambiguity, making it difficult to distinguish similar categories and thus hindering the acquisition of perfectly labeled data. In recent years, researchers have proposed partial label learning to alleviate annotation pressure and reduce costs. This approach uses ambiguous labels, along with true labels, as a candidate label set, which is then allocated to training samples and participates in the model's training process. Ultimately, the true label is identified from the candidate labels. Therefore, due to its relatively lower annotation requirements, partial label learning is considered more common and practical in various contexts.
[0004] Hyperspectral image partial label learning is a type of weakly supervised learning. Compared with traditional supervised learning, it must not only extract effective discriminative features but also identify the true label from the candidate label set, and thus faces the dual challenges of representation learning and label disambiguation. Furthermore, the interdependence between the two creates a difficult dilemma in model training: inherent label uncertainty inevitably manifests itself during representation learning, and the resulting representation quality may in turn hinder effective label disambiguation. To address these issues, this invention proposes a hyperspectral image partial label learning method based on heterogeneous network cross-disambiguation.
Summary of the Invention
[0005] This invention addresses the main research challenges of hyperspectral image partial label learning by providing a method based on heterogeneous network cross-disambiguation. The aim is to learn pixel-level accurate classification of hyperspectral images in a weakly supervised manner, thereby reducing annotation pressure and cost. By leveraging the complementarity of heterogeneous networks based on contrastive learning, the spatial and spectral information in the image is fully utilized. Consistency constraints are established between the networks to ensure consistency in both semantic prediction probabilities and feature representations, providing reliable and highly discriminative semantic representations for subsequent label disambiguation. Furthermore, a cross-label disambiguation strategy based on prototype learning alleviates the entanglement between representation learning and label disambiguation and provides reliable supervision information for the representation learning process.
[0006] To achieve the above objectives, the present invention provides the following technical solution:
[0007] A hyperspectral image partial label learning method based on heterogeneous network cross-disambiguation includes the following steps:
S1: Data input. Hyperspectral image (HSI) data are input as a data cube characterized by its spatial size and its number of spectral bands.
S2: Data preprocessing. Principal component analysis (PCA) is performed on the input hyperspectral image to obtain dimensionality-reduced hyperspectral data, where l is the number of spectral bands after PCA.
S3: Data blocking. The raw data and the dimensionality-reduced data are each divided into image blocks using a window of the specified size.
S4: Data partitioning. A certain proportion of the labeled original sample image blocks and PCA-reduced sample image blocks is extracted as the training set, and the remaining labeled sample image blocks are used as the test set. The proportion can be set according to actual conditions.
S5: Constructing the label confidence matrix. Candidate label sets are assigned to the training samples according to a candidate label threshold R and normalized to obtain the candidate label confidence matrix. A larger R yields more candidate labels.
S6: Representation learning through heterogeneous networks. The processed original image blocks and PCA data blocks are input into heterogeneous network module A and network module B, respectively, to obtain the corresponding sample feature representations and category prediction probabilities.
S7: Updating the label confidence matrix using the cross-label disambiguation strategy. The similarity between the category prototype vectors and the sample feature representations from S6 is calculated, and the label confidence matrix is iteratively updated based on this similarity to increase the confidence of the true label.
S8: Updating the category prototypes. Pseudo-labels are obtained from the category prediction probabilities in S6, and the category prototypes are updated using the semantic representations of the corresponding samples.
S9: Loss calculation and model update. The image patches corresponding to the training samples are processed through steps S6 to S8. Over repeated iterations, the consistency loss and the cross-entropy loss provide supervision, and stochastic gradient descent is used to update the parameters.
[0008] Furthermore, the feature extraction of network module A and network module B in step S6 includes the following steps. S61: The image patches are passed through the feature embedding layers of network module A and network module B, respectively, to obtain the embedded outputs. Specifically, the feature embedding layer of network module A extracts features with convolutional layers whose kernels are 1×1, 3×3, and 3×3, and the feature embedding layer of network module B extracts features with convolutional layers whose kernels are 9×1×1, 7×3×3, and 5×3×3, as shown in the following formula:
[0009]
[0010]
[0011] S62: The embedded outputs are processed by the multi-directional perceptrons of network module A and network module B, respectively, which output feature representations containing long-range dependency information. Specifically, in network module A the embedded features are expanded along three directions (horizontal, vertical, and feature dimension, where d is the feature dimension) to give three data formats, which are fed into a fully connected (FC) layer; the results are reshaped back to the original form and fused by an adaptive weighted sum to output the feature representation. In network module B the embedded features are expanded along four directions (horizontal, vertical, channel, and feature dimension, where l is the number of channels) to give four data formats, which are fed into an FC layer; the results are restored to the original form by a reshape operation and fused by an adaptive weighting method to output the feature representation. The formula is as follows:
[0012]
[0013]
[0014] S63: The outputs of the multi-directional perceptrons are processed by the multi-scale attention mechanisms of network module A and network module B, respectively, so that the output features include global spatial information. Specifically, in the feature embedding layer of network module A, 3×3 convolutions with dilation rates of 1, 3, and 5 and a 1×1 convolution are used to obtain spatial attention maps at different scales. These maps are summed to obtain the global spatial attention map, which is then combined with the corresponding features by dot product. In the feature embedding layer of network module B, 1×5×5 convolutions with dilation rates of (1,1,1) and (1,2,2), a 3×5×5 convolution with dilation rates of (1,5,5), and a 1×1 convolution are used to obtain spatial attention maps at different scales. These maps are summed to obtain the global spatial attention map, which is likewise combined with the corresponding features by dot product.
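[0015] S64: Averaging and fully connected operations are applied to the outputs of the multi-scale attention mechanisms to produce the feature representations and the category prediction probabilities of network module A and network module B.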
[0016] Furthermore, in step S7, network module A and network module B each maintain a prototype vector for every category, where C represents the number of labels. Specifically, the inner product between the feature representation output by network module A in step S6 and the corresponding prototype vector is calculated, as is the inner product between the feature representation output by network module B and its corresponding prototype vector. The calculation result of network module A is used as the basis for label disambiguation in network module B to update the label confidence matrix, and the calculation result of network module B is used as the basis for label disambiguation in network module A to update the label confidence matrix. The formula is as follows:
[0017]
[0018]
[0019]
[0020] Here, the coefficient in the update formula is the update weight, and T denotes the matrix transpose operation.
[0021] Furthermore, in step S8, the category prediction probability output in step S6 is used to make a preliminary determination of each sample's category, and the prototype vector of that category is updated using the sample's semantic representation, as shown in the following formula:
[0022]
[0023] Here, the coefficient in the formula is the update weight, and the two sets in the formula are the samples preliminarily identified as category c in network module A and network module B, respectively.
[0024] Furthermore, in step S9, consistency constraints are first established between network module A and network module B on both the feature representations and the predicted probabilities; together these constraints constitute the consistency loss, as shown in the following formula:
[0025]
[0026] Then the classification cross-entropy losses of network module A and network module B are calculated separately, as follows:
[0027]
[0028]
[0029] Finally, for each of network module A and network module B, the classification loss and the consistency loss are added together to obtain the final loss used to optimize that module. The model parameters are optimized by backpropagating this loss. After training, a trained hyperspectral image classifier is obtained, which is used to classify input samples and obtain their categories.
[0030] The beneficial effects of this invention are as follows:
[0031] The heterogeneous network feature representation module proposed in this invention can fully utilize the complementarity between networks to capture spatial and spectral information in hyperspectral images, with a particular focus on discriminative information of categories. Through a cross-label disambiguation strategy, it can alleviate the entanglement between feature representation and label disambiguation, thereby enabling both to mutually promote each other during training and obtain a classifier that can accurately identify real labels. This invention proposes learning discriminative classification features directly from the candidate label set and continuously updating supervision information during the learning process, thus reducing the labeling pressure and cost of hyperspectral images.
[0032] Other advantages, objectives, and features of the invention will be set forth in part in the description which follows, and in part will be apparent to those skilled in the art from the following examination, or may be learned from practice of the invention. The objectives and other advantages of the invention can be realized and obtained through the following description.
Description of the Drawings
[0033] To make the objectives, technical solutions, and advantages of the present invention clearer, the preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings, wherein:
[0034] Figure 1 is a flowchart of the method of the present invention;
[0035] Figure 2 is an overall framework diagram of the present invention;
[0036] Figure 3 is a structural diagram of network module A in the heterogeneous network of the present invention;
[0037] Figure 4 is a structural diagram of network module B in the heterogeneous network of the present invention;
[0038] Figure 5 shows the Indian dataset, where (a) is a true-color image and (b) is the ground truth map;
[0039] Figure 6 shows visualizations of different methods on the Indian dataset, including (a) PiCO, (b) PaPi, (c) ParSE, (d) SLAP, (e) the results of this invention, and (f) the ground truth map.
Detailed Implementation
[0040] To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and examples.
[0041] As shown in Figure 1, a hyperspectral image partial label learning method based on heterogeneous network cross-disambiguation includes the following steps:
[0042] 1. Data import: Hyperspectral image (HSI) data are imported as a data cube characterized by its spatial size (height and width) and its number of spectral bands.
[0043] 2. Data preprocessing: Principal component analysis (PCA) is performed on the imported hyperspectral image to obtain dimensionality-reduced hyperspectral data, where l is the number of spectral bands after PCA.
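The PCA step can be illustrated with a minimal NumPy sketch. It assumes the cube is stored as an (H, W, B) array and uses an eigendecomposition of the band covariance; the function name `pca_reduce` and the toy sizes are illustrative and do not come from the patent.

```python
import numpy as np

def pca_reduce(cube, n_components):
    """Project an (H, W, B) hyperspectral cube onto its leading principal components."""
    h, w, b = cube.shape
    flat = cube.reshape(-1, b).astype(np.float64)   # one row per pixel (copy, so the input is untouched)
    flat -= flat.mean(axis=0)                       # centre each spectral band
    cov = flat.T @ flat / (flat.shape[0] - 1)       # (B, B) band covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues come back in ascending order
    top = eigvecs[:, ::-1][:, :n_components]        # keep the eigenvectors with the largest eigenvalues
    return (flat @ top).reshape(h, w, n_components)

rng = np.random.default_rng(0)
hsi = rng.random((8, 8, 20))                        # toy cube: 8x8 pixels, 20 bands
reduced = pca_reduce(hsi, n_components=5)           # 5 plays the role of l
```

The spatial layout is preserved, so the reduced cube can be blocked with the same window as the raw data.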
[0044] 3. Data blocking: The raw data and the dimensionality-reduced data are each divided into image blocks using a window of the specified size.
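A possible way to carry out the blocking step is sketched below. The patent text does not say how border pixels are handled; zero padding and an odd window size are assumptions made here, and `extract_patches` is a hypothetical helper name.

```python
import numpy as np

def extract_patches(cube, window):
    """Return one (window, window, bands) patch centred on every pixel of an (H, W, bands) cube."""
    assert window % 2 == 1, "an odd window keeps the labeled pixel at the centre"
    pad = window // 2
    # Assumption: zero-pad the two spatial axes so border pixels also get full patches.
    padded = np.pad(cube, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    h, w, bands = cube.shape
    patches = np.empty((h * w, window, window, bands), dtype=cube.dtype)
    for i in range(h):
        for j in range(w):
            patches[i * w + j] = padded[i:i + window, j:j + window, :]
    return patches

cube = np.arange(4 * 5 * 3, dtype=np.float32).reshape(4, 5, 3)
patches = extract_patches(cube, window=3)
```

The same function would be applied to both the raw cube and the PCA-reduced cube, so that patch k of each refers to the same pixel.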
[0045] 4. Data partitioning: Select a certain proportion of labeled original sample image blocks and sample image blocks after PCA dimensionality reduction as the training set, and use the remaining labeled sample image blocks as the test set. The proportion can be set according to the actual situation.
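As a rough illustration of the partitioning step, the sketch below randomly splits the labeled samples by a given ratio. The use of -1 to mark unlabeled pixels, the purely random (non-stratified) split, and the name `split_labeled` are assumptions for this example only.

```python
import numpy as np

def split_labeled(labels, train_ratio, seed=0):
    """Randomly split the indices of labeled samples into a training and a test set."""
    labeled = np.flatnonzero(labels >= 0)           # assumption: -1 marks unlabeled pixels
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(labeled)
    n_train = int(round(train_ratio * labeled.size))
    return shuffled[:n_train], shuffled[n_train:]

labels = np.array([0, 1, -1, 2, 1, 0, -1, 2, 2, 1])
train_idx, test_idx = split_labeled(labels, train_ratio=0.25)
```

The returned indices select matching rows from both the raw and the PCA-reduced patch arrays.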
[0046] 5. Construct the label confidence matrix: Assign candidate label sets to the training samples according to the candidate label threshold R and normalize them to obtain the candidate label confidence matrix. The larger R is, the more candidate labels there are.
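One way to build such a confidence matrix is sketched below. The patent does not spell out how R is applied; this example assumes each incorrect label joins the candidate set when a uniform random draw falls below R, which matches the statement that a larger R gives more candidates. The name `build_confidence` is illustrative.

```python
import numpy as np

def build_confidence(true_labels, num_classes, r, seed=0):
    """Return an (N, C) matrix with uniform confidence over each sample's candidate labels."""
    rng = np.random.default_rng(seed)
    n = true_labels.shape[0]
    # Assumption: a wrong label becomes a candidate when its uniform draw is below R.
    candidates = rng.random((n, num_classes)) < r
    candidates[np.arange(n), true_labels] = True    # the true label is always a candidate
    candidates = candidates.astype(np.float64)
    return candidates / candidates.sum(axis=1, keepdims=True)   # row-normalize

y = np.array([0, 2, 1, 3])
conf = build_confidence(y, num_classes=4, r=0.3)
```

Every row sums to 1, and non-candidate labels start with zero confidence, so later updates only redistribute mass among the candidates.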
[0047] 6. Representation learning through heterogeneous networks: The processed original image blocks and PCA data blocks are input into heterogeneous network module A and network module B, respectively, to obtain the corresponding sample feature representations and category prediction probabilities, as shown in Figure 2, Figure 3, and Figure 4. First, the image patches are passed through the feature embedding layers of network module A and network module B, respectively. Specifically, network module A uses convolutional layers with kernel sizes of 1×1, 3×3, and 3×3 to extract features, and network module B uses convolutional layers with kernel sizes of 9×1×1, 7×3×3, and 5×3×3 to extract features. The formula is as follows:
[0048]
[0049]
[0050] Then the embedded outputs are passed through the multi-directional perceptrons of network module A and network module B, respectively. In network module A, the embedded features are expanded along three directions (horizontal, vertical, and feature dimension, where d is the feature dimension) to give three data formats, which are fed into a fully connected (FC) layer; the results are reshaped back to the original form and fused by an adaptive weighted sum to output the feature representation. In network module B, the embedded features are expanded along four directions (horizontal, vertical, channel, and feature dimension, where l is the number of channels) to give four data formats, which are fed into an FC layer; the results are restored to the original form by a reshape operation and fused by an adaptive weighting method to output the feature representation. The formula is as follows:
[0051]
[0052]
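The three-direction variant described for network module A can be approximated by the NumPy sketch below. It assumes the features form an (H, W, d) tensor, that each direction has its own fully connected weight matrix, and that the adaptive weighted sum is a softmax over three learnable scalars; none of these details is confirmed by the text, and `multi_directional_mix` is a hypothetical name. The `einsum` calls are equivalent to reshaping along one axis, applying a linear layer, and reshaping back.

```python
import numpy as np

def multi_directional_mix(x, w_rows, w_cols, w_feat, fusion_logits):
    """Mix an (H, W, d) tensor along each axis with its own FC weight, then fuse the results."""
    mixed_rows = np.einsum("hwd,hk->kwd", x, w_rows)   # FC applied along the row (vertical) axis
    mixed_cols = np.einsum("hwd,wk->hkd", x, w_cols)   # FC applied along the column (horizontal) axis
    mixed_feat = np.einsum("hwd,dk->hwk", x, w_feat)   # FC applied along the feature axis
    # Assumption: the adaptive weights are a softmax over three learnable scalars.
    weights = np.exp(fusion_logits - fusion_logits.max())
    weights = weights / weights.sum()
    return weights[0] * mixed_rows + weights[1] * mixed_cols + weights[2] * mixed_feat

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 6))
out = multi_directional_mix(
    x,
    rng.standard_normal((4, 4)),
    rng.standard_normal((5, 5)),
    rng.standard_normal((6, 6)),
    np.zeros(3),
)
```

Because each mixing matrix spans a whole axis, every output position can depend on every position along that axis, which is one reading of the "long-range dependency information" mentioned in the text. The four-direction variant in network module B would add one more term for the channel axis.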
[0053] Further, the outputs of the multi-directional perceptrons are processed by the multi-scale attention mechanisms of network module A and network module B, respectively. Specifically, in the feature embedding layer of network module A, 3×3 convolutions with dilation rates of 1, 3, and 5 and a 1×1 convolution are used to obtain spatial attention maps at different scales. These maps are summed to obtain the global spatial attention map, which is then combined with the corresponding features by dot product. In the feature embedding layer of network module B, 1×5×5 convolutions with dilation rates of (1,1,1) and (1,2,2), a 3×5×5 convolution with dilation rates of (1,5,5), and a 1×1 convolution are used to obtain spatial attention maps at different scales. These maps are summed to obtain the global spatial attention map, which is likewise combined with the corresponding features by dot product.
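[0054] Finally, averaging and fully connected operations are applied to the outputs of the multi-scale attention mechanisms to produce the feature representations and the category prediction probabilities of network module A and network module B.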
[0055] 7. Update the label confidence matrix using the cross-label disambiguation strategy: The similarity between the category prototype vectors and the sample feature representations is calculated, and the label confidence matrix is iteratively updated on this basis to increase the confidence of the true labels, as shown in Figure 2. First, network module A and network module B each maintain a prototype vector for every category, where C represents the number of labels. Then, the inner product between the feature representation output by network module A and the corresponding prototype vector is calculated, as is the inner product between the feature representation output by network module B and its corresponding prototype vector. Finally, the calculation result of network module A is used as the basis for label disambiguation in network module B to update the label confidence matrix, and the calculation result of network module B is used as the basis for label disambiguation in network module A to update the label confidence matrix. The formula is as follows:
[0056]
[0057]
[0058]
[0059] Here, the coefficient in the update formula is the update weight, and T denotes the matrix transpose operation.
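Since the update formulas themselves are not preserved in this text, the sketch below shows only one plausible form of the cross update. It assumes a moving-average rule in which each network's confidence matrix moves toward a one-hot target chosen by the other network's feature-prototype inner products, restricted to the candidate labels. The symbol `phi`, the one-hot target, and all function names are assumptions.

```python
import numpy as np

def one_hot_best_candidate(features, prototypes, candidate_mask):
    """Pick, per sample, the candidate label whose prototype has the largest inner product."""
    sim = features @ prototypes.T                       # (N, C) inner products q mu^T
    sim = np.where(candidate_mask, sim, -np.inf)        # only candidate labels compete
    best = sim.argmax(axis=1)
    return np.eye(prototypes.shape[0])[best]

def cross_update(conf_a, conf_b, feat_a, feat_b, proto_a, proto_b, mask, phi):
    """Update A's confidences from B's similarities and B's confidences from A's."""
    target_from_a = one_hot_best_candidate(feat_a, proto_a, mask)
    target_from_b = one_hot_best_candidate(feat_b, proto_b, mask)
    new_a = phi * conf_a + (1 - phi) * target_from_b    # A is disambiguated by B's result
    new_b = phi * conf_b + (1 - phi) * target_from_a    # B is disambiguated by A's result
    return new_a, new_b

mask = np.array([[True, True, False], [False, True, True]])
conf0 = mask / mask.sum(axis=1, keepdims=True)
protos = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
feat_a = np.array([[1.0, 0.0], [0.0, 1.0]])
feat_b = np.array([[0.0, 1.0], [1.0, 0.0]])
new_a, new_b = cross_update(conf0, conf0, feat_a, feat_b, protos, protos, mask, phi=0.9)
```

Swapping the targets between the two networks means neither network reinforces its own early mistakes directly, which is the intuition behind "cross" disambiguation as described above.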
[0060] 8. Update the category prototypes: Pseudo-labels are obtained from the category prediction probabilities, and the category prototypes are updated using the semantic representations of the corresponding samples. First, the category prediction probabilities output above are used to make a preliminary determination of each sample's category; then the prototype vector of that category is updated using the sample's semantic representation. The formula is as follows:
[0061]
[0062] Here, the coefficient in the formula is the update weight, and the two sets in the formula are the samples preliminarily identified as category c in network module A and network module B, respectively.
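The prototype update can be pictured with the following sketch for a single network. The exact formula is not preserved here, so the moving-average form, the use of the mean feature of each pseudo-labeled set, and the names `gamma` and `update_prototypes` are assumptions.

```python
import numpy as np

def update_prototypes(prototypes, features, probs, gamma):
    """Move each class prototype toward the mean feature of samples pseudo-labeled as that class."""
    pseudo = probs.argmax(axis=1)                       # preliminary category from predicted probabilities
    updated = prototypes.copy()
    for c in range(prototypes.shape[0]):
        members = features[pseudo == c]                 # samples preliminarily identified as category c
        if len(members):                                # classes without members keep their old prototype
            updated[c] = gamma * prototypes[c] + (1 - gamma) * members.mean(axis=0)
    return updated

prototypes = np.zeros((3, 2))
features = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
probs = np.array([[0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1]])
new_prototypes = update_prototypes(prototypes, features, probs, gamma=0.5)
```

In the two-network setting this function would be called once per module, each with its own features, probabilities, and prototypes.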
[0063] 9. Loss calculation and parameter update: The consistency loss and the classification cross-entropy loss of the heterogeneous networks are calculated. First, consistency constraints are established between network module A and network module B on both the feature representations and the predicted probabilities; together these constraints constitute the consistency loss. The specific formula is as follows:
[0064]
[0065] Then the classification cross-entropy losses of network module A and network module B are calculated separately, as follows:
[0066]
[0067]
[0068] Finally, for each of network module A and network module B, the classification loss and the consistency loss are added together to obtain the final loss used to optimize that module. The image patches corresponding to the training samples are processed through steps 6 to 8; over repeated iterations, the consistency loss and the cross-entropy loss provide supervision, and stochastic gradient descent is used to update the parameters. After training, a trained hyperspectral image classifier is obtained, which is then used to classify input samples and obtain their categories.
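The way the per-module losses combine can be sketched as follows. The patent's loss formulas are not preserved in this text, so the mean-squared-error form of the consistency terms and the confidence-weighted cross-entropy are assumptions, as are all function names; only the "classification loss plus consistency loss per module" structure comes from the description.

```python
import numpy as np

def soft_cross_entropy(probs, confidence, eps=1e-12):
    """Cross-entropy between predicted probabilities and the label confidence matrix."""
    return float(-(confidence * np.log(probs + eps)).sum(axis=1).mean())

def consistency_loss(feat_a, feat_b, prob_a, prob_b):
    """Assumed form: mean squared difference of features plus that of predicted probabilities."""
    return float(((feat_a - feat_b) ** 2).mean() + ((prob_a - prob_b) ** 2).mean())

def total_losses(feat_a, feat_b, prob_a, prob_b, conf_a, conf_b):
    """Each module is optimized with its own classification loss plus the shared consistency loss."""
    cons = consistency_loss(feat_a, feat_b, prob_a, prob_b)
    loss_a = soft_cross_entropy(prob_a, conf_a) + cons
    loss_b = soft_cross_entropy(prob_b, conf_b) + cons
    return loss_a, loss_b

feat = np.array([[0.2, 0.4]])
prob = np.array([[0.5, 0.5]])
conf = np.array([[1.0, 0.0]])
loss_a, loss_b = total_losses(feat, feat, prob, prob, conf, conf)
```

In a real training loop these quantities would be computed in an automatic-differentiation framework so that stochastic gradient descent can update both modules; NumPy is used here only to show the arithmetic.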
[0069] Figure 6 shows the experimental results of the method of this invention on the open-source hyperspectral classification dataset Indian; the types of surface objects are well identified. The classification performance is further illustrated through comparative experiments: the method of this invention is compared with the existing methods PiCO, ParSE, PaPi, and SLAP on the Indian dataset in terms of overall accuracy, where a higher overall accuracy indicates better classification performance. Table 1 compares the overall accuracy of the different methods under different partial label thresholds:
[0070] Table 1. Comparison of various methods on the Indian dataset.
[0071]
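For reference, overall accuracy is the fraction of test samples whose predicted category matches the ground truth. A minimal sketch follows; the function name and the toy labels are illustrative and are not results from the patent.

```python
import numpy as np

def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the ground-truth label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float((y_true == y_pred).mean())

oa = overall_accuracy([0, 1, 2, 2], [0, 1, 1, 2])   # three of four predictions are correct
```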
[0072] It can be seen that the method of the present invention achieves the best accuracy on this dataset. Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications can be made to the technical solutions of the present invention without departing from the spirit and scope of the present invention, and all such modifications should be covered within the scope of the claims of the present invention.
Claims
1. A hyperspectral image partial label learning method based on heterogeneous network cross-disambiguation, characterized in that the method includes the following steps:
S1: Data import. Hyperspectral image data are imported as a data cube characterized by its spatial size and its number of spectral bands.
S2: Data preprocessing. Principal component analysis (PCA) is performed on the imported hyperspectral image to obtain dimensionality-reduced hyperspectral data, where l is the number of spectral bands after PCA.
S3: Data blocking. The raw data and the dimensionality-reduced data are each divided into image blocks using a window of the specified size.
S4: Data partitioning.
S5: Construct the label confidence matrix. Candidate label sets are assigned to the training samples according to the candidate label threshold R and normalized to obtain the candidate label confidence matrix. The larger R is, the more candidate labels there are.
S6: First, the image blocks are passed through the feature embedding layers of network module A and network module B, respectively. Specifically, network module A uses convolutional layers with kernel sizes of 1×1, 3×3, and 3×3 to extract features, and network module B uses convolutional layers with kernel sizes of 9×1×1, 7×3×3, and 5×3×3 to extract features (formula not reproduced in this text). Then, the embedded outputs are passed through the multi-directional perceptrons of network module A and network module B, respectively. In network module A, the embedded features are expanded along three directions (horizontal, vertical, and feature dimension, where d is the feature dimension) to give three data formats, which are input into a fully connected layer, restored to the original form by a reshape operation, and output as a feature representation by an adaptive weighted sum. In network module B, the embedded features are expanded along four directions (horizontal, vertical, channel, and feature dimension, where l is the number of channels) to give four data formats, which are input into a fully connected layer, restored to the original form by a reshape operation, and output as a feature representation by an adaptive weighting method (formula not reproduced in this text). Further, the outputs of the multi-directional perceptrons are processed by the multi-scale attention mechanisms of network module A and network module B, respectively. Specifically, in the feature embedding layer of network module A, 3×3 convolutions with dilation rates of 1, 3, and 5 and a 1×1 convolution are used to obtain spatial attention maps at different scales; these maps are summed to obtain the global spatial attention map, which is combined with the corresponding features by dot product. In the feature embedding layer of network module B, 1×5×5 convolutions with dilation rates of (1, 1, 1) and (1, 2, 2), a 3×5×5 convolution with dilation rates of (1, 5, 5), and a 1×1 convolution are used to obtain spatial attention maps at different scales; these maps are summed to obtain the global spatial attention map, which is combined with the corresponding features by dot product. Finally, averaging and fully connected operations are applied to output the feature representations and the predicted probabilities of the two network modules.
S7: Update the label confidence matrix through the cross-label disambiguation strategy, that is, calculate the similarity between the category prototype vectors and the sample feature representations and iteratively update the label confidence matrix on this basis to increase the confidence of the true label.
S8: Update the category prototypes.
S9: Loss calculation, model training, and parameter updates.
2. The hyperspectral image partial label learning method based on heterogeneous network cross-disambiguation according to claim 1, characterized in that: first, network module A and network module B each maintain a prototype vector for every category, where C represents the number of labels; then, the inner product between the feature representation output by network module A and the corresponding prototype vector is calculated, as is the inner product between the feature representation output by network module B and its corresponding prototype vector; finally, the calculation result of network module A is used as the basis for label disambiguation in network module B to update the label confidence matrix, and the calculation result of network module B is used as the basis for label disambiguation in network module A to update the label confidence matrix (formula not reproduced in this text), where the coefficient in the update formula is the update weight and T denotes the matrix transpose operation.
3. The hyperspectral image partial label learning method based on heterogeneous network cross-disambiguation according to claim 2, characterized in that: pseudo-labels are obtained from the category prediction probabilities, and the category prototypes are updated using the semantic representations of the corresponding samples; first, the category prediction probabilities are used to make a preliminary determination of each sample's category, and then the prototype vector of that category is updated using the sample's semantic representation (formula not reproduced in this text), where the coefficient in the formula is the update weight and the two sets in the formula are the samples preliminarily identified as category c in network module A and network module B, respectively.
4. The hyperspectral image partial label learning method based on heterogeneous network cross-disambiguation according to claim 3, characterized in that: the consistency loss and the classification cross-entropy loss of the heterogeneous networks are calculated; first, consistency constraints are established between network module A and network module B on both the feature representations and the predicted probabilities, and together these constraints constitute the consistency loss; then, the classification cross-entropy losses of network module A and network module B are calculated separately (formulas not reproduced in this text); finally, for each of network module A and network module B, the classification loss and the consistency loss are added together to obtain the final loss used to optimize that module. The image patches corresponding to the training samples are processed through S6 to S8; during the iterative process, the consistency loss and the cross-entropy loss provide supervision, and stochastic gradient descent is used to update the parameters. After training, a trained hyperspectral image classifier is obtained, which is then used to classify input samples and obtain their categories.