Weakly supervised semantic segmentation method, device and medium based on dual student progressive learning

By employing a weakly supervised semantic segmentation network with dual-student progressive learning, utilizing differential constraints and progressive learning strategies, dynamically adjusting thresholds and adaptive noise filtering, and combining consistency regularization, the confirmation bias problem in single-stage weakly supervised semantic segmentation is solved, thereby improving the accuracy and robustness of the segmentation results.

CN118262113B · Active · Publication Date: 2026-06-19 · SHANGHAI UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHANGHAI UNIV
Filing Date
2024-03-27
Publication Date
2026-06-19

Abstract

This invention relates to a weakly supervised semantic segmentation method, device, and medium based on dual-student progressive learning, comprising the following steps: acquiring an image to be segmented; inputting the image into a weakly supervised semantic segmentation network based on dual-student progressive learning to obtain a semantic segmentation result for the image to be segmented; the weakly supervised semantic segmentation network based on dual-student progressive learning includes two independent sub-networks, each sub-network including a segmentation head and a backbone network; introducing a difference constraint between the backbone networks of the two sub-networks, which are used to generate dissimilar class activation maps based on the image to be segmented; the segmentation head is used to obtain the segmentation result based on the image to be segmented, and the segmentation head of each sub-network performs weakly supervised learning based on pseudo-labels generated by the other sub-network, the pseudo-labels being generated based on the class activation maps using a progressive learning strategy; the output of the segmentation head with the better segmentation performance between the two sub-networks is used as the final output. Compared with the prior art, this invention can improve the accuracy of semantic segmentation results.

Description

Technical Field

[0001] This invention belongs to the field of computer vision technology, and in particular relates to a weakly supervised semantic segmentation method, device and medium based on dual-student progressive learning.

Background Technology

[0002] Semantic segmentation is an important task in image processing. Weakly supervised semantic segmentation trains semantic segmentation models under weak annotation, typically image-level labels. Class activation maps are a technique for extracting, from a deep neural network, the regions related to specific categories, and pseudo-labels can be generated from them. The paper "Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation" (Ahn J, Kwak S., Proceedings of the IEEE conference on computer vision and pattern recognition. 2018:4981-4990) uses a post-processing network combined with dense conditional random fields to refine pseudo-labels, which then serve as supervision signals that guide the model in the semantic segmentation task. This multi-stage paradigm often suffers from severe efficiency limitations because multiple models must be trained. Some researchers have therefore proposed efficient single-stage solutions for weakly supervised semantic segmentation, in which pseudo-label generation and segmentation head training are performed simultaneously. However, due to the inherent ambiguity of class activation maps, the single-stage approach is prone to errors in pseudo-label generation. Since the same backbone features are used for both the segmentation head and class activation map generation, these inaccurate pseudo-labels not only hinder segmentation learning but may also reinforce incorrect class activation map judgments. The model then accumulates more and more errors during training, which leads to confirmation bias and ultimately degrades segmentation performance. Some methods use high thresholds to filter out a large number of unreliable pixel supervisions; however, discarding unreliable pseudo-labels with a fixed threshold cannot effectively improve the performance of class activation maps in low-confidence regions, resulting in inaccurate segmentation results and insufficient supervision signals. Therefore, a semantic segmentation method is needed to address the confirmation bias problem in single-stage weakly supervised semantic segmentation.

Summary of the Invention

[0003] The purpose of this invention is to overcome the shortcomings of the existing technology by providing a weakly supervised semantic segmentation method, device and medium based on dual-student progressive learning, solving the confirmation bias problem in single-stage weakly supervised semantic segmentation, and further improving the accuracy of the segmentation results.

[0004] The objective of this invention can be achieved through the following technical solutions:

[0005] A weakly supervised semantic segmentation method based on dual-student progressive learning includes the following steps:

[0006] Obtain the image to be segmented, input it into a weakly supervised semantic segmentation network based on dual-student progressive learning, and obtain the semantic segmentation result of the image to be segmented;

[0007] The weakly supervised semantic segmentation network based on dual-student progressive learning comprises two independent sub-networks, each including a segmentation head and a backbone network. A difference constraint is introduced between the backbone networks of the two sub-networks, which are used to generate distinct class activation maps based on the image to be segmented. The segmentation head is used to obtain the segmentation result from the image to be segmented. The segmentation head of each sub-network performs weakly supervised learning based on pseudo-labels generated by the other sub-network. These pseudo-labels are generated based on the class activation maps using a progressive learning strategy. The output of the segmentation head with the better segmentation performance between the two sub-networks is taken as the final output of the weakly supervised semantic segmentation network based on dual-student progressive learning.

[0008] Furthermore, the difference constraint is achieved by minimizing the cosine similarity of features extracted from the image to be segmented by the two backbone networks.
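The difference constraint in [0008] can be illustrated with a minimal sketch. It assumes each backbone's features have been flattened into a plain list of floats (standing in for a tensor); the function names `cosine_similarity` and `discrepancy_loss` are hypothetical and do not come from the patent.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two flattened feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    eps = 1e-8  # guards against division by zero for all-zero features
    return dot / max(norm_u * norm_v, eps)

def discrepancy_loss(feat_a, feat_b):
    """Hypothetical difference-constraint term.

    Minimizing this value pushes the two backbones' features apart,
    which is the stated goal of the difference constraint.
    """
    return cosine_similarity(feat_a, feat_b)
```

In a training loop this term would presumably be added to the total loss with some weight; the patent text does not state that weight.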

[0009] Furthermore, the progressive learning strategy includes dynamic threshold adjustment and adaptive noise filtering.

[0010] Furthermore, a cosine descent strategy is employed to achieve the dynamic threshold adjustment.

[0011] Furthermore, the expression for the dynamic threshold adjustment is as follows:

[0012]

[0013] where τ_h(t) is the current threshold, τ_h(0) is the initial threshold, τ_h(T) is the final threshold, t is the current training step, and T is the total number of training steps.
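The expression for paragraph [0012] is not reproduced in this text. One form that is consistent with the symbols defined in [0013] and with a cosine descent from τ_h(0) to τ_h(T) is given here as an assumption, not as the patent's exact formula:

```latex
\tau_h(t) \;=\; \tau_h(T) \;+\; \frac{1}{2}\,\bigl(\tau_h(0) - \tau_h(T)\bigr)\left(1 + \cos\frac{\pi t}{T}\right),
\qquad 0 \le t \le T
```

Under this form, t = 0 gives τ_h(0), t = T gives τ_h(T), and the threshold decreases monotonically in between whenever τ_h(0) > τ_h(T).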

[0014] Furthermore, the loss distribution of noise and pseudo-labels is fitted based on a Gaussian mixture model. The noise probability of each pixel is inferred through the expectation-maximization algorithm. Noise pixels are excluded according to the set threshold and distance metric, thus realizing the adaptive noise filtering.

[0015] Furthermore, a consistency regularization method is introduced to provide additional supervision for the segmentation head.

[0016] Furthermore, the specific process of the consistency regularization method is as follows:

[0017] A perturbation is added to the image to be segmented, and the perturbed image is input together with the original image to be segmented into the weakly supervised semantic segmentation network based on dual-student progressive learning. The consistency loss is calculated based on the two segmentation results obtained, and the consistency loss is added to the loss of the weakly supervised semantic segmentation network based on dual-student progressive learning. The hyperparameters of the network are adjusted through iterative training.

[0018] The present invention also provides an electronic device, including a memory, a processor, and a program stored in the memory, wherein the processor executes the program to implement the above-described method.

[0019] The present invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the above-described method.

[0020] Compared with the prior art, the present invention has the following beneficial effects:

[0021] 1. This invention proposes a weakly supervised semantic segmentation network structure based on dual-student progressive learning, comprising two independent sub-networks. Each sub-network includes a segmentation head and a backbone network. A difference constraint is introduced between the backbone networks of the two sub-networks, which can generate dissimilar class activation maps based on the image to be segmented. The segmentation head of each sub-network performs weakly supervised learning based on pseudo-labels generated by the other sub-network. The output of the segmentation head with better segmentation performance is used as the final output of the weakly supervised semantic segmentation network based on dual-student progressive learning. The two sub-networks generate supervision signals for each other, which can effectively reduce the confirmation bias caused by learning incorrect pseudo-labels and improve the accuracy of the segmentation results.

[0022] 2. This invention introduces a progressive learning strategy, which gradually introduces more reliable pseudo-labels into the supervised learning process through dynamic threshold adjustment and adaptive noise filtering. This helps the weakly supervised semantic segmentation network make better use of reliable pseudo-labels, thereby reducing the confirmation bias caused by erroneous pseudo-labels from the class activation maps during training and improving the robustness of the weakly supervised semantic segmentation network during training.

[0023] 3. This invention introduces consistency regularization to provide additional supervision for the segmentation head, which can ensure that the weakly supervised semantic segmentation network pays attention to all pixels of the image to be segmented, thereby further improving the semantic segmentation performance.

Description of the Drawings

[0024] Figure 1 is a schematic diagram of the structure of a weakly supervised semantic segmentation network based on dual-student progressive learning.

Detailed Implementation

[0025] The present invention will now be described in detail with reference to the accompanying drawings and specific embodiments. These embodiments are based on the technical solution of the present invention and provide detailed implementation methods and specific operating procedures. However, the scope of protection of the present invention is not limited to the following embodiments.

[0026] Example:

[0027] This embodiment provides a weakly supervised semantic segmentation method based on dual-student progressive learning, including the following steps:

[0028] Obtain the image to be segmented, input it into a weakly supervised semantic segmentation network based on dual-student progressive learning, and obtain the semantic segmentation result of the image to be segmented;

[0029] As shown in Figure 1, the weakly supervised semantic segmentation network based on dual-student progressive learning comprises two independently updated sub-networks that do not share parameters, which avoids the class activation map confirmation bias that arises when a network learns from its own erroneous information. Each sub-network includes a backbone network and a segmentation head. For a batch of images, the images are first fed into the backbone network for feature extraction, and the extracted features are then fed into the segmentation head to obtain classification results. The class activation map corresponding to each backbone network can be obtained through the weights of the segmentation head. To achieve diversity between the two sub-networks, this invention introduces a difference constraint between the two backbone networks, achieved by minimizing the cosine similarity of the features extracted from the images to be segmented by the two backbone networks. Pseudo-labels are then generated based on the class activation maps. As shown in Figure 1, the pseudo-labels of one sub-network serve as supervision for the segmentation head of the other sub-network. The output of the segmentation head with the better segmentation performance is taken as the final semantic segmentation result of the image to be segmented. This network structure can effectively improve the quality of pseudo-labels while reducing class activation map confirmation bias, providing a good supervision signal for the segmentation head, and enabling the two sub-networks to learn from each other's knowledge more effectively.
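The data flow in [0029] can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: it uses the classic class-activation-map construction (a weighted sum of feature channels per class, followed by ReLU and max-normalization), represents feature maps as plain Python lists, and uses the hypothetical names `class_activation_maps` and `cross_supervision_targets`.

```python
def class_activation_maps(features, head_weights):
    """Compute one activation map per class.

    features:     K channel maps, each a flat list of P pixel activations.
    head_weights: C rows of K weights, one row per class.
    Returns C maps, each passed through ReLU and scaled to [0, 1].
    """
    num_pixels = len(features[0])
    cams = []
    for class_weights in head_weights:
        cam = [
            max(0.0, sum(w * channel[p] for w, channel in zip(class_weights, features)))
            for p in range(num_pixels)
        ]
        peak = max(cam)
        cams.append([v / peak if peak > 0 else 0.0 for v in cam])
    return cams

def cross_supervision_targets(pseudo_a, pseudo_b):
    """Each student's segmentation head is supervised by the other student's pseudo-labels."""
    return {"student_a": pseudo_b, "student_b": pseudo_a}
```

The swap in `cross_supervision_targets` is the point of the dual-student design: neither sub-network is trained on pseudo-labels derived from its own class activation maps.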

[0030] In the pseudo-label generation process, a progressive learning strategy is employed, including dynamic threshold adjustment and adaptive noise filtering. Traditional methods generate pseudo-labels by setting a fixed background threshold, ensuring that only reliable foreground pseudo-labels are used for segmentation-supervised training. However, the method proposed in this invention uses a cosine descent strategy to dynamically adjust the threshold in each iteration. This allows for full utilization of more foreground pseudo-labels during training, improving the use of reliable pseudo-labels and mitigating confirmation bias caused by erroneous pseudo-labels in the class activation map. This enhances the robustness of the semantic segmentation network during training. The specific expression for dynamic threshold adjustment is as follows:

[0031]

[0032] where τ_h(t) is the current threshold, τ_h(0) is the initial threshold, τ_h(T) is the final threshold, t is the current training step, and T is the total number of training steps.
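A minimal sketch of a cosine descent schedule and of threshold-based pseudo-label generation follows. The schedule is an assumed standard cosine interpolation between the initial and final thresholds; the patent's own expression is not reproduced in this text. The names `cosine_threshold`, `pseudo_labels`, and `IGNORE`, and the example values 0.6 and 0.2, are hypothetical.

```python
import math

IGNORE = 255  # hypothetical label for pixels considered unreliable

def cosine_threshold(step, total_steps, tau_start, tau_end):
    """Assumed cosine descent from tau_start (step 0) to tau_end (last step)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return tau_end + 0.5 * (tau_start - tau_end) * (1.0 + math.cos(math.pi * progress))

def pseudo_labels(cams, tau):
    """Assign each pixel its highest-scoring class if that score reaches tau.

    cams: C maps of P scores in [0, 1]. Pixels below tau are marked IGNORE.
    """
    labels = []
    for p in range(len(cams[0])):
        scores = [cam[p] for cam in cams]
        best = max(range(len(scores)), key=scores.__getitem__)
        labels.append(best if scores[best] >= tau else IGNORE)
    return labels

# Example: the threshold relaxes over training, so more pixels receive labels.
early = pseudo_labels([[0.9, 0.3], [0.1, 0.4]], cosine_threshold(0, 100, 0.6, 0.2))
late = pseudo_labels([[0.9, 0.3], [0.1, 0.4]], cosine_threshold(100, 100, 0.6, 0.2))
```

Because the threshold starts high and descends, early training uses only the most confident pixels, and lower-confidence pixels are admitted gradually.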

[0033] To reduce the noise in pseudo-labels, which harms segmentation generalization and reinforces confirmation bias, this invention introduces an adaptive noise filtering strategy. This strategy analyzes the loss distribution, fits the loss distributions of noisy and clean pseudo-labels with a Gaussian mixture model, infers the noise probability of each pixel using the expectation-maximization algorithm, and excludes noisy pixels based on a set threshold and distance metric. This method enables the model to perform progressive learning more reliably, thereby further improving the quality of the supervision signal and contributing to better segmentation learning.
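The filtering idea in [0033] can be sketched with a two-component, one-dimensional Gaussian mixture fitted by expectation-maximization over per-pixel loss values. Several details are assumptions made for illustration: the higher-mean component is taken to model noisy pixels, the "distance metric" is interpreted as a minimum gap between the two component means, and the names and default values (`noise_threshold=0.9`, `min_gap=0.5`) are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_two_component_gmm(losses, iterations=50):
    """EM for a 1-D two-component Gaussian mixture.

    Returns the component means and the responsibilities from the final E-step.
    """
    lo, hi = min(losses), max(losses)
    means = [lo, hi]  # start the components at the extremes
    spread = max((hi - lo) ** 2 / 4.0, 1e-6)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: posterior probability of each component for each loss value.
        resp = []
        for x in losses:
            joint = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(joint) or 1e-300
            resp.append([j / total for j in joint])
        # M-step: re-estimate mean, variance, and mixing weight per component.
        for k in range(2):
            mass = sum(r[k] for r in resp) or 1e-300
            means[k] = sum(r[k] * x for r, x in zip(resp, losses)) / mass
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, losses)) / mass
            variances[k] = max(var, 1e-6)  # floor avoids a collapsed component
            weights[k] = mass / len(losses)
    return means, resp

def clean_pixel_mask(losses, noise_threshold=0.9, min_gap=0.5):
    """True for pixels kept as supervision, False for pixels treated as noise."""
    means, resp = fit_two_component_gmm(losses)
    if abs(means[0] - means[1]) < min_gap:
        return [True] * len(losses)  # components overlap: do not filter
    noisy = 0 if means[0] > means[1] else 1  # assumed: high-loss component is noise
    return [r[noisy] <= noise_threshold for r in resp]
```

Skipping the filter when the two means are close reflects the assumption that, early in training, clean and noisy losses are not yet separable and a forced split would discard useful pixels.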

[0034] Furthermore, in existing methods, unreliable pseudo-labels that may contain noise are often discarded to ensure segmentation quality. However, due to the semantic ambiguity of class activation maps, many unreliable regions, such as non-discriminative regions, boundaries, and background, still exist during the training phase. To address this issue, this invention treats regions with unreliable pseudo-labels as unlabeled samples: the input image is perturbed, and the segmentation head is required to produce consistent outputs for the original image and its perturbed version. This consistency regularization method provides additional supervision to the model, particularly in difficult regions such as non-discriminative regions, boundaries, and background, thus contributing to improved segmentation quality.
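A minimal sketch of a consistency term follows, assuming the segmentation head outputs a per-pixel class probability list for both the original and the perturbed image. The choice of mean squared difference, the restriction to pixels flagged as unreliable, and the name `consistency_loss` are assumptions; the patent text does not specify the distance used.

```python
def consistency_loss(probs_original, probs_perturbed, unreliable):
    """Mean squared difference between two per-pixel class distributions.

    probs_original, probs_perturbed: P pixels, each a list of C class probabilities.
    unreliable: P booleans; only flagged pixels contribute to the loss.
    """
    total, count = 0.0, 0
    for p_orig, p_pert, flag in zip(probs_original, probs_perturbed, unreliable):
        if flag:
            total += sum((a - b) ** 2 for a, b in zip(p_orig, p_pert)) / len(p_orig)
            count += 1
    return total / count if count else 0.0
```

This term would be added to the segmentation loss, so that pixels without a trusted pseudo-label still produce a training signal.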

[0035] To verify the effectiveness of the method of this invention, this embodiment compares the method of this invention with existing methods on three datasets, using mIoU (mean intersection over union) as the metric, i.e., the intersection-over-union ratio between the segmentation result and the ground truth, averaged over classes. The experimental results are shown in Table 1, which shows that the method of this invention achieves better segmentation results than existing methods.
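For reference, mIoU can be computed as in the sketch below. It operates on flat lists of predicted and ground-truth labels for a single image, whereas benchmark evaluation usually accumulates intersections and unions over the whole dataset before averaging. The `ignore_index` value of 255 is a common convention assumed here, not something stated in the patent.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Average per-class intersection over union, skipping ignored pixels
    and classes that appear in neither the prediction nor the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for p, t in zip(pred, target):
            if t == ignore_index:
                continue
            in_pred, in_target = p == c, t == c
            inter += in_pred and in_target
            union += in_pred or in_target
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```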

[0036] Table 1 Comparative Experiment Results of Examples

[0037]

[0038] In summary, this invention first introduces a dual-student network structure that uses a difference constraint so that the two sub-networks generate different class activation maps and supervise each other, effectively mitigating the confirmation bias caused by a network learning from its own erroneous pseudo-labels. Second, the proposed progressive learning strategy, through dynamic threshold adjustment and adaptive noise filtering, gradually introduces more reliable pseudo-labels for supervised learning, improving the model's robustness during training. Finally, the consistency regularization introduced in this invention provides supervision for every pixel in the discarded regions, so that pixels excluded as unreliable still contribute to model performance. These innovations collectively constitute the core technology of this invention, providing a series of unique and effective solutions to the confirmation bias problem in weakly supervised semantic segmentation, possessing significant technical and commercial value.

[0039] If the above methods are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0040] The above description of the embodiments is provided to enable those skilled in the art to understand and use the invention. It will be apparent to those skilled in the art that various modifications can be made to these embodiments, and the general principles described herein can be applied to other embodiments without inventive effort. Therefore, the present invention is not limited to the above embodiments, and any improvements and modifications made by those skilled in the art based on the disclosure of the present invention without departing from the scope of the invention should be within the protection scope of the present invention.

Claims

1. A weakly supervised semantic segmentation method based on dual-student progressive learning, characterized in that it includes the following steps: Obtain the image to be segmented, input it into a weakly supervised semantic segmentation network based on dual-student progressive learning, and obtain the semantic segmentation result of the image to be segmented; The weakly supervised semantic segmentation network based on dual-student progressive learning includes two independent sub-networks, each of which includes a segmentation head and a backbone network. A difference constraint is introduced between the backbone networks of the two sub-networks to generate different class activation maps based on the image to be segmented. The segmentation head is used to obtain the segmentation result based on the image to be segmented. The segmentation head of each sub-network performs weakly supervised learning based on the pseudo-labels generated by the other sub-network. The pseudo-labels are generated based on the class activation map using a progressive learning strategy. The output of the segmentation head with the better segmentation effect among the two sub-networks is used as the final output of the weakly supervised semantic segmentation network based on dual-student progressive learning.

2. The weakly supervised semantic segmentation method based on dual-student progressive learning according to claim 1, characterized in that, The difference constraint is achieved by minimizing the cosine similarity of features extracted from the image to be segmented by the two backbone networks.

3. The weakly supervised semantic segmentation method based on dual-student progressive learning according to claim 1, characterized in that, The progressive learning strategy includes dynamic threshold adjustment and adaptive noise filtering.

4. The weakly supervised semantic segmentation method based on dual-student progressive learning according to claim 3, characterized in that, The dynamic threshold adjustment is achieved using a cosine descent strategy.

5. The weakly supervised semantic segmentation method based on dual-student progressive learning according to claim 4, characterized in that, The specific expression for the dynamic threshold adjustment is as follows: where τ_h(t) is the current threshold, τ_h(0) is the initial threshold, τ_h(T) is the final threshold, t is the current time step, and T is the total number of training steps.

6. The weakly supervised semantic segmentation method based on dual-student progressive learning according to claim 3, characterized in that, The loss distribution of noise and pseudo-labels is fitted based on a Gaussian mixture model. The noise probability of each pixel is inferred through the expectation-maximization algorithm. Noise pixels are excluded according to the set threshold and distance metric, thus realizing the adaptive noise filtering.

7. The weakly supervised semantic segmentation method based on dual-student progressive learning according to claim 1, characterized in that, A consistency regularization method is introduced to provide additional supervision for the segmentation head.

8. The weakly supervised semantic segmentation method based on dual-student progressive learning according to claim 7, characterized in that, The specific process of the consistency regularization method is as follows: A perturbation is added to the image to be segmented, and the perturbed image is input together with the original image to be segmented into the weakly supervised semantic segmentation network based on dual-student progressive learning. The consistency loss is calculated based on the two segmentation results obtained, and the consistency loss is added to the loss of the weakly supervised semantic segmentation network based on dual-student progressive learning. The hyperparameters of the network are adjusted through iterative training.

9. An electronic device comprising a memory, a processor, and a program stored in the memory, characterized in that, When the processor executes the program, it implements the method as described in any one of claims 1-8.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the program is executed by the processor, it implements the method as described in any one of claims 1-8.