Active domain adaptation semantic segmentation method, system, device and storage medium
By employing superpixel-level annotation and a two-stage selection strategy, combined with cross-domain mixing and pseudo-label consistency techniques, the invention addresses the insufficient annotation efficiency and quality of existing technologies and achieves efficient domain-adaptive semantic segmentation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- UNIV OF SCI & TECH OF CHINA
- Filing Date
- 2024-09-09
- Publication Date
- 2026-06-23
AI Technical Summary
In existing active domain-adaptive semantic segmentation methods, annotation efficiency and annotation quality need to be improved, and the domain adaptation problem has not been fully resolved.
A superpixel-level annotation method is adopted. The entropy map of the target domain image is extracted by the domain adaptation model and, together with the superpixels produced by the superpixel extraction network, is used to divide the superpixels into high-uncertainty and low-uncertainty superpixels. Annotation and training are carried out in two stages: the fused low-uncertainty superpixels are annotated first, and then the high-uncertainty superpixels with the largest feature differences are selected for annotation. The model is trained with cross-domain mixing and pseudo-label consistency techniques.
The method improves annotation efficiency and quality, ensures the performance of domain-adaptive semantic segmentation, reduces annotation costs, and enhances the model's adaptability in the target domain.
Description
Technical Field
[0001] This invention relates to the field of image semantic segmentation technology, and in particular to an active domain adaptive semantic segmentation method, system, device and storage medium.
Background Technology
[0002] In recent years, deep learning has made significant progress in the field of computer vision. However, deep learning models typically require large amounts of high-quality labeled data for training. For pixel-level prediction tasks such as semantic segmentation, obtaining large-scale labeled datasets requires enormous time and human resources, making manually labeled datasets impractical.
[0003] To address this issue, researchers have proposed unsupervised domain adaptation learning methods. These methods aim to leverage knowledge learned on a source domain with a large number of labeled samples to help the model learn on a target domain that is related to the source domain but lacks labels. By reducing the disparity between domains, the model's performance on the target domain is improved, thereby reducing the labeling cost in the target domain. Although current unsupervised domain adaptation learning methods have achieved good results by reducing the disparity between the source and target domains, the domain adaptation problem remains unresolved due to the lack of guidance from target domain labeling information.
[0004] To further improve model performance in the target domain, an active domain adaptation learning method has been proposed. This method actively selects a small number of the most valuable target domain samples for annotation, thereby guiding the model's domain adaptation learning. By selectively choosing labeled samples, the model can better learn the characteristics of the target domain. This method significantly improves domain adaptation performance by introducing only a small amount of annotation cost.
[0005] A common active domain adaptation method selects the most valuable target domain samples for image-level or pixel-level annotation based on the uncertainty of model predictions. In Chinese invention patent CN111767674B, entitled "A Well Logging Lithology Identification Method Based on Active Domain Adaptation," two differentiated neural networks are trained using source domain samples, and the prediction results of the two neural networks for each sample in the target sample set are obtained. Based on the prediction results, this method selects samples with high uncertainty in the target domain samples for image-level annotation and uses these annotated samples for the model's target domain learning. In Chinese invention patent application CN114220086A, entitled "A Cost-Effective Scene Text Detection Method and System," a scene text detection network with an entropy-aware global alignment module and a text region alignment module is pre-trained to reduce inter-domain differences. The pre-trained model is used for active learning based on uncertainty metrics, selecting several target domain samples with the highest uncertainty for image-level annotation, and these annotated samples are used to fine-tune the pre-trained model to adapt to target domain learning. In Chinese invention patent application CN115620160A, entitled "Remote Sensing Image Classification Method Based on Multi-Classifier Adversarial Active Transfer Learning," an adversarial transfer learning method is used to train a feature extractor and two classifiers. An active learning query strategy is used to select target domain data that are inconsistently classified by the multi-classifier for image-level annotation, which is then used for target domain learning in the model. 
In Chinese invention patent application CN116863186A, entitled "Image Classification Method and System Based on Source Domain Independent Adaptation and Active Learning," a source domain model is used to initialize the feature extractor and classifier of the target domain model, and features of all samples are extracted. Clustering is performed on the features of all samples to obtain pseudo-labels for all samples, and the neighbor uncertainty value of each sample is calculated. Target domain samples are selected for image-level annotation based on the neighbor uncertainty value. In Chinese invention patent application CN116758541A, entitled "A Domain-Adaptive Interactive Semantic Segmentation Method, Apparatus, and Device Based on Active Learning," a weakly supervised approach is used to accelerate the integration of human-computer interaction and semantic segmentation tasks. During the warm-up phase, source domain data and a labeling sampling strategy are used to train the model in multiple rounds. The labeling sampling strategy employs a "look first, ask later" approach to perform pixel-level labeling on target domain samples (only a small number of the most valuable pixels are labeled per image) based on a preset value evaluation function and a preset score threshold. In Chinese invention patent application CN117058373A, entitled "A Semi-Supervised Domain-Adaptive Semantic Segmentation Method Based on Dual-Level Alignment Active Learning," source domain data is used for model pre-training. The pre-trained model extracts features from target domain samples and performs feature clustering. An equal number of the most entropy-rich, difficult target domain samples are selected from each cluster for image-level labeling. These labeled data are then used for adversarial and contrastive learning, improving the model's domain adaptation performance through dual-level alignment active learning.
[0006] However, the above methods for image-level or pixel-level annotation still need to be improved in terms of both annotation efficiency and annotation quality.
[0007] In view of this, the present invention is hereby proposed.
Summary of the Invention
[0008] The purpose of this invention is to provide an active domain-adaptive semantic segmentation method, system, device, and storage medium, which improves the annotation efficiency of active domain adaptation, enhances the annotation quality, and ensures the performance of domain-adaptive semantic segmentation.
[0009] The objective of this invention is achieved through the following technical solution:
[0010] An active domain adaptive semantic segmentation method includes:
[0011] The entropy map of the target domain image is extracted by a domain adaptation model, and the superpixels of the target domain image are extracted by a superpixel extraction network. The average entropy of each superpixel is determined by combining the entropy map, and the superpixels are divided into two categories according to the magnitude of the average entropy: one category is called high uncertainty superpixels, and the other category is called low uncertainty superpixels. For low uncertainty superpixels, fusion is performed using set conditions.
[0012] The first-stage annotation is obtained by using the superpixel annotations involved in the fusion, and the first-stage annotation is used to train the target domain model to obtain the first-stage target domain model.
[0013] Features of high-uncertainty superpixels are extracted by the domain adaptation model and the first-stage target domain model, and several high-uncertainty superpixels with the largest feature differences are selected for annotation to obtain the second-stage annotation. The second-stage annotation is then used to train the first-stage target domain model to obtain the second-stage target domain model.
[0014] Semantic segmentation of the target domain image is performed using the target domain model in the second stage.
[0015] An active domain adaptive semantic segmentation system, comprising:
[0016] The low-uncertainty superpixel fusion module is used to extract the entropy map of the target domain image through a domain adaptation model, extract the superpixels of the target domain image through a superpixel extraction network, determine the average entropy of each superpixel by combining the entropy map, and divide them into two categories according to the magnitude of the average entropy: one category is called high-uncertainty superpixels and the other category is called low-uncertainty superpixels; for low-uncertainty superpixels, fusion is performed using set conditions.
[0017] The first-stage annotation and training module is used to obtain the first-stage annotation using the superpixel annotations involved in the fusion, and to train the target domain model using the first-stage annotation to obtain the first-stage target domain model.
[0018] The second-stage annotation and training module is used to extract features of high-uncertainty superpixels through the domain adaptation model and the first-stage target domain model, respectively, and select several high-uncertainty superpixels with the largest feature differences for annotation to obtain the second-stage annotation. The second-stage annotation is then used to train the first-stage target domain model to obtain the second-stage target domain model.
[0019] The semantic segmentation module is used to perform semantic segmentation on the target domain image through the second-stage target domain model.
[0020] A processing device includes: one or more processors; and a memory for storing one or more programs;
[0021] When the one or more programs are executed by the one or more processors, the one or more processors implement the aforementioned method.
[0022] A readable storage medium storing a computer program that, when executed by a processor, implements the aforementioned method.
[0023] As can be seen from the technical solution provided by the present invention, superpixel-level annotation, unlike image-level and pixel-level annotation methods, significantly improves annotation efficiency by assigning only one semantic category to each superpixel. Furthermore, unlike existing uncertainty-based annotation strategies, this invention comprehensively considers both the number of annotations and the difficulty of the annotated examples, and proposes a domain-information-based selection strategy. This strategy selects the superpixels most valuable for domain adaptation learning for annotation. The "most valuable" aspect is mainly reflected in two ways. First, the large superpixels obtained after fusion contain more annotated pixels, so their annotation value is higher, and the fused superpixels are annotated first. Second, among the small superpixels that did not participate in fusion, the samples with the greatest domain difference are selected for annotation, because samples with large domain differences are often difficult examples and benefit most from annotation. In summary, the domain-information-based selection strategy considers both the number and the quality of annotated pixels. By adopting a superpixel-level annotation method and a domain-information-based selection strategy, this invention significantly reduces annotation costs while improving annotation quality, ensuring the performance of domain-adaptive semantic segmentation.
Attached Figure Description
[0024] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the following description of the embodiments will be briefly introduced. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0025] Figure 1 is a flowchart of an active domain adaptive semantic segmentation method provided in an embodiment of the present invention;
[0026] Figure 2 is a comparative diagram of various annotation schemes for active domain adaptation provided in an embodiment of the present invention;
[0027] Figure 3 is a schematic diagram of superpixel fusion provided in an embodiment of the present invention;
[0028] Figure 4 is a schematic diagram of the overall active domain adaptation scheme based on superpixel-level annotation provided in an embodiment of the present invention;
[0029] Figure 5 is a schematic diagram of the model training process provided in an embodiment of the present invention;
[0030] Figure 6 is a schematic diagram of an active domain adaptive semantic segmentation system provided in an embodiment of the present invention;
[0031] Figure 7 is a schematic diagram of a processing device provided in an embodiment of the present invention.
Detailed Implementation
[0032] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the protection scope of the present invention.
[0033] First, the following explanations are provided for the terms that may be used in this article:
[0034] The terms “including,” “comprising,” “containing,” “having,” or other similar semantic descriptions should be interpreted as non-exclusive inclusion. For example, “including a technical feature element (such as raw material, component, ingredient, carrier, dosage form, material, size, part, component, mechanism, device, step, process, method, reaction conditions, processing conditions, parameter, algorithm, signal, data, product or article of manufacture, etc.)” should be interpreted as including not only the expressly listed technical feature element, but also other technical feature elements that are not expressly listed and are well-known in the art.
[0035] The term "composed of" excludes any technical features not expressly listed. When used in a claim, it closes the claim to exclude all technical features other than those expressly listed, except for associated conventional impurities. If the term appears only in a clause of a claim, it limits the claim to the elements expressly listed in that clause; elements recited in other clauses are not excluded from the overall claim.
[0036] The following is a detailed description of an active domain adaptive semantic segmentation method provided by this invention. Contents not described in detail in the embodiments of this invention are prior art known to those skilled in the art. Where specific conditions are not specified in the embodiments of this invention, they are performed according to conventional conditions in the art or conditions recommended by the manufacturer. Reagents or instruments used in the embodiments of this invention, unless otherwise specified by the manufacturer, are all commercially available conventional products.
[0037] Example 1
[0038] An embodiment of this invention provides an active domain adaptive semantic segmentation method which, as shown in Figure 1, mainly includes the following steps:
[0039] Step 1: Low-uncertainty superpixel fusion.
[0040] In this embodiment of the invention, the entropy map of the target domain image is extracted by a domain adaptation model, and the superpixels of the target domain image are extracted by a superpixel extraction network. The average entropy of each superpixel is determined by combining the entropy map, and the superpixels are divided into two categories according to the magnitude of the average entropy: one category is called high uncertainty superpixels, and the other category is called low uncertainty superpixels. For low uncertainty superpixels, fusion is performed using set conditions.
[0041] In this embodiment of the invention, the domain adaptation model is pre-trained using source domain labeled data and target domain unlabeled data; the superpixel extraction network is pre-trained using source domain labeled data.
[0042] In this embodiment of the invention, superpixel division is represented as:
[0043] $$s \in \begin{cases} \mathrm{LUSP}, & ent_s < \tau \\ \mathrm{HUSP}, & \text{otherwise} \end{cases}$$
[0044] Here, $ent_s$ denotes the average entropy of superpixel $s$, and $\tau$ is the threshold. LUSP denotes a low-uncertainty superpixel, and HUSP denotes a high-uncertainty superpixel.
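As a non-limiting illustration, the division by average entropy can be sketched in Python with NumPy. The function name `split_superpixels`, the input layout (a class-probability array of shape C×H×W and an integer superpixel map of shape H×W), and the normalization of the entropy by log C are assumptions made for this sketch; they are not specified in the embodiment.

```python
import numpy as np

def split_superpixels(probs, sp_map, tau=0.05):
    """Split superpixels into low- and high-uncertainty groups.

    probs  : (C, H, W) per-pixel class probabilities from the domain adaptation model.
    sp_map : (H, W) integer superpixel id of every pixel.
    tau    : threshold on the average entropy of a superpixel.
    Returns (lusp_ids, husp_ids, mean_entropy_by_id).
    """
    eps = 1e-12
    num_classes = probs.shape[0]
    # Pixel-wise Shannon entropy; dividing by log(C) maps it to [0, 1] (assumed normalization).
    entropy = -(probs * np.log(probs + eps)).sum(axis=0) / np.log(num_classes)

    # Average the entropy map inside every superpixel.
    ids, inverse = np.unique(sp_map, return_inverse=True)
    inverse = inverse.ravel()
    sums = np.bincount(inverse, weights=entropy.ravel(), minlength=len(ids))
    counts = np.bincount(inverse, minlength=len(ids))
    mean_ent = sums / counts

    low = mean_ent < tau  # LUSP if the average entropy is below tau, HUSP otherwise
    return ids[low], ids[~low], dict(zip(ids.tolist(), mean_ent.tolist()))
```

With a confident region and a maximally uncertain region, the first superpixel falls into the LUSP group and the second into the HUSP group.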
[0045] In this embodiment of the invention, the fusion setting conditions are as follows:
[0046] $$d_{JS}\big(f(k)\,\|\,f(n)\big) < \epsilon$$
[0047] Here, $d_{JS}(\cdot\,\|\,\cdot)$ denotes the Jensen-Shannon divergence; the symbol $\|$ is the conventional separator between the two arguments of a divergence. $f(k)$ and $f(n)$ are the average features of low-uncertainty superpixels $k$ and $n$, respectively, and $\epsilon$ is the threshold.
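As a non-limiting illustration, the fusion condition can be sketched as follows. The sketch assumes that each average feature is a probability vector (for example, an averaged softmax output), that every pair of low-uncertainty superpixels is compared regardless of spatial adjacency, and that merging is transitive (implemented with a union-find structure). None of these choices is stated in the embodiment; the names `js_divergence` and `fuse_lusps` are hypothetical.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two non-negative vectors (renormalized to sum to 1)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        return float(np.sum(a * np.log((a + eps) / (b + eps))))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def fuse_lusps(features, epsilon=0.10):
    """Group low-uncertainty superpixels whose average features satisfy d_JS < epsilon.

    features : {superpixel id: average feature vector}.
    Returns  : {superpixel id: id of the group representative}.
    """
    ids = list(features)
    parent = {i: i for i in ids}

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a_pos, a in enumerate(ids):
        for b in ids[a_pos + 1:]:
            if js_divergence(features[a], features[b]) < epsilon:
                parent[find(b)] = find(a)  # merge the two groups
    return {i: find(i) for i in ids}
```

Superpixels that map to the same representative form one fused superpixel and can be annotated with a single category.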
[0048] Step 2: First-stage annotation and training.
[0049] In an embodiment of the present invention, all superpixels participating in the fusion are first annotated to obtain the first-stage annotation. Because the fused superpixels are few in number but large in size, a large number of annotated pixels is obtained at only a small annotation cost. The target domain model is then trained with the first-stage annotation to obtain the first-stage target domain model.
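As a non-limiting illustration, a superpixel-level annotation can be expanded into a sparse pixel-level label map before training. The function name `labels_from_superpixels` and the use of 255 as the value for unannotated pixels are assumptions made for this sketch.

```python
import numpy as np

IGNORE = 255  # assumed value marking pixels that carry no annotation

def labels_from_superpixels(sp_map, sp_labels, ignore_index=IGNORE):
    """Expand per-superpixel categories into a per-pixel label map.

    sp_map    : (H, W) integer superpixel (or fused-group) id of every pixel.
    sp_labels : {superpixel id: class id} for the annotated superpixels only.
    """
    label_map = np.full(sp_map.shape, ignore_index, dtype=np.int64)
    for sp_id, cls in sp_labels.items():
        # One category per superpixel labels all of its pixels at once.
        label_map[sp_map == sp_id] = cls
    return label_map
```

Pixels of superpixels that were not annotated keep the ignore value, so a training loss can skip them.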
[0050] Step 3: Second-stage annotation and training.
[0051] In this embodiment of the invention, features of high-uncertainty superpixels that did not participate in the fusion are extracted by the domain adaptation model and the first-stage target domain model, respectively. Several high-uncertainty superpixels with the largest feature differences are selected for annotation according to the annotation budget to obtain the second-stage annotation. The second-stage annotation is then used to train the first-stage target domain model to obtain the second-stage target domain model.
[0052] In this embodiment of the invention, the feature differences are calculated as follows:
[0053]
[0054] Here, DomainGap denotes the feature difference of a high-uncertainty superpixel m, computed from the average features of m extracted by the first-stage target domain model and by the domain adaptation model, respectively.
[0055] In this embodiment of the invention, the number of high uncertainty superpixels to be selected can be set according to actual needs. For example, when the second-stage annotation budget is set to 30, the 30 high uncertainty superpixels with the largest feature differences can be selected for annotation.
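As a non-limiting illustration, the budgeted selection can be sketched as a top-k ranking. The distance used here, the Euclidean norm of the difference between the two average features, is an assumption chosen only to make the sketch concrete; the embodiment's own feature-difference formula should be substituted for it. The function name `select_by_domain_gap` is hypothetical.

```python
import numpy as np

def select_by_domain_gap(feat_da, feat_t1, budget=30):
    """Pick the `budget` high-uncertainty superpixels with the largest feature difference.

    feat_da : {HUSP id: average feature from the domain adaptation model}.
    feat_t1 : {HUSP id: average feature from the first-stage target domain model}.
    Returns (selected ids ordered by decreasing gap, {HUSP id: gap}).
    """
    ids = sorted(feat_da)
    gaps = np.array([
        # ASSUMPTION: Euclidean distance stands in for the feature-difference measure.
        np.linalg.norm(np.asarray(feat_t1[i], dtype=float) - np.asarray(feat_da[i], dtype=float))
        for i in ids
    ])
    order = np.argsort(-gaps, kind="stable")[:budget]
    selected = [ids[j] for j in order]
    return selected, {ids[j]: float(gaps[j]) for j in range(len(ids))}
```

With a budget of 30, as in the example above, the call returns the 30 superpixel ids to send for annotation.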
[0056] Step 4: Semantic segmentation.
[0057] In this embodiment of the invention, semantic segmentation of the target domain image is performed through a second-stage target domain model.
[0058] The above-mentioned solution provided by the embodiments of the present invention has the following main advantages:
[0059] 1) This invention uses a superpixel-level annotation method to reduce annotation costs and further improves the annotation efficiency of active learning through a low-uncertainty superpixel fusion module.
[0060] 2) This invention proposes a selection and labeling strategy based on domain information content, which selects the most valuable samples for domain adaptation learning for labeling.
[0061] In summary, this invention utilizes a superpixel-level annotation method to effectively improve the annotation efficiency and quality of active domain adaptation, thereby ensuring the performance of domain-adaptive semantic segmentation.
[0062] To more clearly demonstrate the technical solution and its effects provided by the present invention, the method provided by the embodiments of the present invention will be described in detail below with reference to specific examples.
[0063] I. Overview of the Scheme
[0064] This invention proposes an active domain-adaptive semantic segmentation method based on superpixel-level annotation. Unlike image-level and pixel-level annotation methods, superpixel-level annotation significantly improves annotation efficiency by assigning only one semantic category to each superpixel. Furthermore, unlike the uncertainty-based selection strategies of previous works, this invention focuses on difficult examples in domain adaptation scenarios and proposes a domain-information-based selection strategy to select the superpixels most valuable for domain adaptation learning. By employing a superpixel-level annotation method and a domain-information-based selection strategy, this invention significantly reduces annotation costs while improving annotation quality.
[0065] The core innovations of this invention can be summarized in three aspects: 1) An uncertainty-based superpixel fusion scheme is proposed, which further reduces annotation costs through superpixel fusion and identifies easily confused superpixels by uncertainty, excluding them from the fusion process to ensure fusion quality; 2) A two-stage selection annotation strategy based on domain information content is designed. In the first stage, superpixels with higher information content are selected for fusion and annotation; in the second stage, several superpixels with the greatest domain differences that did not participate in fusion are selected for annotation. Efficient active learning is achieved by selecting samples with the highest domain information content for annotation, thereby maximizing annotation value; 3) A domain adaptation learning method suitable for superpixel-level annotation is proposed. Cross-domain mixing technology is used to alleviate domain differences in source domain data, and pseudo-label consistency technology is used to alleviate label noise in superpixel-level annotation.
[0066] II. Detailed Description of the Scheme
[0067] The core idea of this invention is to use a superpixel-level annotation method, selecting annotations on a superpixel-by-superpixel basis to reduce the waste of annotation resources and improve annotation efficiency. To this end, this invention first compares various annotation methods. Figure 2 compares image-level, pixel-level, and superpixel-level annotation. As shown in Figure 2, image-level annotation labels every pixel in the entire image, so annotation budget is wasted on redundant regions inside the labeled objects, which reduces annotation efficiency. In contrast, pixel-level annotation works on a per-pixel basis and selects only a small number of the most valuable pixels for annotation. However, a single annotated pixel typically carries little information, so 10,000 to 20,000 pixels per image must be annotated to provide sufficient training information, resulting in high annotation costs. Superpixel-level annotation works on a per-superpixel basis: assigning one semantic category to a superpixel annotates all pixels within it, which significantly improves annotation efficiency. To further reduce the waste of annotation resources, superpixel fusion can be used to merge superpixels that are likely to share the same semantic category, thereby further improving annotation efficiency.
[0068] Therefore, this invention uses a superpixel-level annotation method and further improves annotation efficiency through superpixel fusion. However, simple fusion based on feature similarity is not suitable for domain adaptation scenarios. As shown in part (b) of Figure 3, superpixel fusion in domain adaptation scenarios may suffer from fusion errors. Because of the differences between the source and target domains, features in the target domain may be subject to inter-class confusion; for example, roads and sidewalks may have similar features, as may buses and trains. To address this issue, this invention proposes an uncertainty-based superpixel fusion scheme. As shown in part (c) of Figure 3, superpixels with easily confused features are first identified and are excluded from the fusion process. Only superpixels that are not easily confused participate in fusion, which effectively mitigates superpixel fusion errors in domain adaptation scenarios.
[0069] Specifically, entropy maps are used to identify superpixels with easily confused features. Figure 4 illustrates the principle of the present invention. During low-uncertainty superpixel fusion, the superpixels are divided into high-uncertainty superpixels (HUSP) and low-uncertainty superpixels (LUSP) based on their average entropy:
[0070] $$s \in \begin{cases} \mathrm{LUSP}, & ent_s < \tau \\ \mathrm{HUSP}, & \text{otherwise} \end{cases}$$
[0071] Here, $s$ is a basic superpixel, $ent_s$ is the average entropy of superpixel $s$, and $\tau$ is the threshold. HUSPs remain unchanged and do not participate in fusion; only LUSPs are fused based on feature similarity. The condition for fusing two low-uncertainty superpixels $k$ and $n$ is:
[0072] $$d_{JS}\big(f(k)\,\|\,f(n)\big) < \epsilon$$
[0073] Here, $d_{JS}$ denotes the Jensen-Shannon divergence, $f(k)$ and $f(n)$ are the average features of low-uncertainty superpixels $k$ and $n$, respectively, and $\epsilon$ is the threshold. After superpixel fusion, a two-stage selection and annotation process is performed.
[0074] For example, the entropy threshold τ can be set within [0.01, 0.10], e.g., τ = 0.05, and the fusion threshold ε can be set within [0.01, 0.20], e.g., ε = 0.10. Of course, this is just an example, and the present invention does not limit the specific values. In practical applications, users can set the values according to the actual situation or experience.
[0075] Because the merged low-uncertainty superpixels (MLUSPs) that participated in fusion are much larger than the unmerged HUSPs, the first stage prioritizes the annotation of MLUSPs, which have high information content, thereby obtaining a large amount of annotation information at minimal annotation cost. After the first-stage selection and annotation is completed, the resulting annotation information is used to initially train target domain model 1 with the designed training scheme. This model uses the annotated target domain information to learn the features and semantics of the target domain, providing a foundation for the subsequent selection of superpixels for annotation. In the second stage, the remaining annotation budget is used to select the HUSPs with the greatest domain difference for annotation, providing the most valuable annotation information for domain adaptation learning. Here, the prediction difference between the source-domain-biased domain adaptation model and target domain model 1 is used to evaluate the magnitude of the domain difference of a HUSP:
[0076]
[0077] Here, DomainGap denotes the feature difference of a high-uncertainty superpixel m, computed from the average features of m extracted by the first-stage target domain model and by the domain adaptation model, respectively. Clearly, the greater the difference between the predictions of the two models, the greater the domain gap contained in superpixel m.
[0078] The two-stage annotation selection method described above effectively utilizes annotation resources and prioritizes superpixels with greater information content and larger domain differences for annotation.
[0079] For example, DeepLabv2 can be used as the image semantic segmentation model (i.e., the target domain model), DACS can be used as the domain adaptation model to extract the entropy map, and SSN can be used as the superpixel network to extract superpixels for each target domain image.
[0080] In Figure 4, A denotes the acquisition function, i.e., the strategy for selecting annotations; the other two symbols in the figure denote, respectively, the superpixels participating in fusion that are selected in the first stage and the superpixels with the largest domain differences that are selected in the second stage.
[0081] For model training, since superpixels are not perfectly accurate, superpixel-level annotations inevitably contain noise. Furthermore, source domain annotation information can introduce domain discrepancies. Therefore, this invention introduces label denoising and domain adaptation techniques into model training. As shown in Figure 5, cross-domain mixing and pseudo-label consistency techniques are employed both when training the target domain model with the first-stage annotation and when training the first-stage target domain model with the second-stage annotation. Cross-domain mixing means that, using the source domain label $Y_s$, the regions of a set number of randomly selected categories (e.g., half of the categories) are cut from the source domain image and pasted onto the target domain image $I_t$, yielding a cross-domain mixed image $I_{st}$. The pseudo-label consistency technique extracts pseudo-labels for the target domain; following the same cross-domain mixing, the selected category regions of the source domain label $Y_s$ are pasted onto the target domain pseudo-labels to obtain cross-domain mixed pseudo-labels. A data augmentation A is then applied, and the segmentation prediction for the augmented image $A(I_{st})$ is forced to be consistent with the prediction before augmentation (the pseudo-labels). In both training stages, $I_t$ and $A(I_{st})$ are used as input data.
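As a non-limiting illustration, cross-domain mixing of an image pair and of the corresponding labels can be sketched with NumPy. The function name `cross_domain_mix`, the array layouts, and the rule of always selecting at least one category are assumptions made for this sketch; label values that mark unannotated pixels, if present in the source label, would need to be excluded from the category list first.

```python
import numpy as np

def cross_domain_mix(img_s, lbl_s, img_t, pseudo_t, rng=None):
    """Paste randomly chosen source-domain category regions onto a target-domain sample.

    img_s, img_t : (H, W, 3) source and target images of equal size.
    lbl_s        : (H, W) source domain label map Y_s.
    pseudo_t     : (H, W) target domain pseudo-label map.
    Returns (mixed image I_st, mixed pseudo-label, boolean paste mask).
    """
    rng = np.random.default_rng() if rng is None else rng
    classes = np.unique(lbl_s)
    n_pick = max(1, len(classes) // 2)          # half of the categories, at least one (assumed)
    picked = rng.choice(classes, size=n_pick, replace=False)
    mask = np.isin(lbl_s, picked)               # True where source pixels are pasted

    img_st = np.where(mask[..., None], img_s, img_t)
    lbl_st = np.where(mask, lbl_s, pseudo_t)    # same mask keeps image and label aligned
    return img_st, lbl_st, mask
```

Using one mask for both the image and the label keeps every pasted pixel paired with its source domain label, while the remaining pixels keep the target domain pseudo-label.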
[0082] In Figure 5, "Network" refers to the segmentation network, "P" stands for prediction, "I" stands for image, "s" denotes the source domain, "t" denotes the target domain, and "st" denotes the source-target mix. The two-stage training of the target domain network in Figure 4 follows the training procedure shown in Figure 5; the two stages differ mainly in the annotated data used.
[0083] Ultimately, the loss of the active domain adaptive learning method is:
[0084]
[0085] Here, $L_{total}$ is the loss of the active domain adaptive learning method, $L_{CE}$ is the cross-entropy loss, $I_t$ is the target domain image, and $I_{st}$ is the mixed data of the source and target domains (i.e., the cross-domain mixed image). The two label terms are, respectively, the superpixel-level active annotation of the target domain (i.e., the target domain pseudo-labels) and the mixed label combining source domain labels and target domain pseudo-labels (i.e., the cross-domain mixed pseudo-label). A denotes the data augmentation applied to $I_{st}$.
[0086] Since the scheme of updating model parameters using loss can be implemented in a conventional way, it will not be elaborated here.
[0087] III. Example of the Scheme
[0088] This example illustrates the process of the active domain adaptive semantic segmentation method using a specific model, mainly including:
[0089] Step S1: Prepare the source domain labeled training dataset and the target domain training and test sets. For the training set images in both the source and target domains, apply ordinary random cropping online: the images are scaled to 512×1024, then cropped to 512×512, and then numerically normalized.
[0090] Step S2: Use the PyTorch deep learning framework, specifically DeepLabv2 based on ResNet101 as the segmentation network, the classic domain adaptation method DACS model as the domain adaptation model, and the classic superpixel network SSN as the superpixel extraction network.
[0091] Step S3: Train the DACS model using source domain labeled data and target domain unlabeled data.
[0092] Step S4: Train the SSN network using source domain labeled data.
[0093] Step S5: Extract the entropy map and superpixels of the target domain data using the trained DACS and SSN respectively, and perform superpixel fusion using the low-uncertainty superpixel fusion module.
[0094] Step S6: After fusion, perform the first-stage selection and annotation, selecting the MLUSPs participating in the fusion for annotation, thus obtaining the first-stage annotations.
[0095] Step S7: Train the target domain model with the first-stage annotation to obtain target domain model 1 (i.e., the first-stage target domain model); specifically, the loss provided above is used for training.
[0096] Step S8: Use the remaining annotation budget for the second-stage annotation selection. Based on the prediction differences between the trained domain adaptation model DACS and the target domain model 1, select several HUSPs with the largest domain differences for annotation, thus obtaining the second-stage annotations.
[0097] Step S9: Train target domain model 1 with the second-stage annotation to obtain target domain model 2 (i.e., the second-stage target domain model); specifically, the loss provided above is used for training.
[0098] Step S10: Input the test dataset and calculate the segmentation accuracy of the target domain model 2.
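The entropy-based division and the fusion test of Step S5 can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the implementation of this invention: the helper names are hypothetical, superpixels whose average entropy is below the threshold are assumed to be the low-uncertainty ones, and the average features are assumed to be non-negative vectors that can be normalized into distributions before the Jensen-Shannon divergence is computed.

```python
import numpy as np

def pixel_entropy(probs):
    """Per-pixel Shannon entropy of class probabilities with shape (C, H, W)."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=0)

def split_superpixels(entropy_map, sp_ids, tau):
    """Split superpixel ids into (low, high) uncertainty by average entropy."""
    n = int(sp_ids.max()) + 1
    sums = np.bincount(sp_ids.ravel(), weights=entropy_map.ravel(), minlength=n)
    counts = np.bincount(sp_ids.ravel(), minlength=n)
    mean_ent = sums / np.maximum(counts, 1)
    present = counts > 0
    # Assumption: average entropy below tau means low uncertainty.
    low = np.flatnonzero(present & (mean_ent < tau))
    high = np.flatnonzero(present & (mean_ent >= tau))
    return low, high

def kl(a, b, eps=1e-12):
    """Kullback-Leibler divergence between two discrete distributions."""
    return float(np.sum(a * np.log((a + eps) / (b + eps))))

def js_divergence(p, q):
    """Jensen-Shannon divergence after normalizing both vectors to sum to 1."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def can_fuse(f_k, f_n, eps_fuse):
    """Fusion condition: divergence of the average features below a threshold."""
    return bool(js_divergence(f_k, f_n) < eps_fuse)

# Toy example: the left half is confident, the right half is uncertain.
probs = np.empty((2, 2, 4))
probs[0, :, :2], probs[1, :, :2] = 0.99, 0.01
probs[:, :, 2:] = 0.5
sp_ids = np.array([[0, 0, 1, 1], [0, 0, 1, 1]])
low, high = split_superpixels(pixel_entropy(probs), sp_ids, tau=0.3)

f_k = np.array([0.70, 0.20, 0.10])
f_n = np.array([0.68, 0.22, 0.10])
fusable = can_fuse(f_k, f_n, eps_fuse=0.01)
```

In this toy example superpixel 0 falls in the low-uncertainty group and superpixel 1 in the high-uncertainty group, and the two nearly identical feature vectors satisfy the fusion condition.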
[0099] Through the above description of the embodiments, those skilled in the art can clearly understand that the above embodiments can be implemented by software, or by using software plus necessary general-purpose hardware platforms. Based on this understanding, the technical solutions of the above embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, external hard drive, etc.), including several instructions to cause a computer device (such as a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments of the present invention.
[0100] Example 2
[0101] This invention also provides an active domain-adaptive semantic segmentation system, which is mainly used to implement the methods provided in the foregoing embodiments. As shown in Figure 6, the system mainly includes:
[0102] The low-uncertainty superpixel fusion module is used to extract the entropy map of the target domain image through a domain adaptation model, extract the superpixels of the target domain image through a superpixel extraction network, determine the average entropy of each superpixel by combining the entropy map, and divide them into two categories according to the magnitude of the average entropy: one category is called high-uncertainty superpixels and the other category is called low-uncertainty superpixels; for low-uncertainty superpixels, fusion is performed using set conditions.
[0103] The first-stage annotation and training module is used to obtain the first-stage annotation using the superpixel annotations involved in the fusion, and to train the target domain model using the first-stage annotation to obtain the first-stage target domain model.
[0104] The second-stage annotation and training module is used to extract features of high-uncertainty superpixels through the domain adaptation model and the first-stage target domain model, respectively, and select several high-uncertainty superpixels with the largest feature differences for annotation to obtain the second-stage annotation. The second-stage annotation is then used to train the first-stage target domain model to obtain the second-stage target domain model.
[0105] The semantic segmentation module is used to perform semantic segmentation on the target domain image through the second-stage target domain model.
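The selection performed by the second-stage annotation and training module can be sketched as below. The formula for the feature difference is not reproduced in this text, so the Euclidean distance between the two average features is used purely as a stand-in; that distance, the helper names, and the `budget` parameter are assumptions for illustration.

```python
import numpy as np

def mean_feature(feat, sp_ids, sp):
    """Average a (C, H, W) feature map over the pixels of superpixel sp."""
    return feat[:, sp_ids == sp].mean(axis=1)

def select_by_domain_gap(feat_da, feat_tgt, sp_ids, husp_ids, budget):
    """Return the `budget` high-uncertainty superpixels with the largest gap.

    feat_da:  features from the domain adaptation model.
    feat_tgt: features from the first-stage target domain model.
    """
    gaps = []
    for sp in husp_ids:
        diff = mean_feature(feat_tgt, sp_ids, sp) - mean_feature(feat_da, sp_ids, sp)
        # Assumption: Euclidean distance stands in for the feature difference.
        gaps.append(float(np.linalg.norm(diff)))
    order = np.argsort(gaps)[::-1][:budget]  # largest gaps first
    return [int(husp_ids[i]) for i in order]

# Toy example: three superpixels, each two columns wide.
sp_ids = np.tile(np.repeat(np.arange(3), 2), (2, 1))  # shape (2, 6)
feat_da = np.zeros((2, 2, 6))
feat_tgt = np.zeros((2, 2, 6))
feat_tgt[:, sp_ids == 1] = 1.0
feat_tgt[:, sp_ids == 2] = 3.0
selected = select_by_domain_gap(feat_da, feat_tgt, sp_ids, [0, 1, 2], budget=2)
```

Here superpixel 2 has the largest gap between the two models and superpixel 1 the second largest, so both are selected within a budget of two.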
[0106] Since the main technical details of the above system have been described in detail in the previous embodiments, they will not be repeated here.
[0107] Those skilled in the art will understand that, for the sake of convenience and brevity, the above-described division of functional modules is used as an example. In practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the system can be divided into different functional modules to complete all or part of the functions described above.
[0108] Example 3
[0109] The present invention also provides a processing device. As shown in Figure 7, it mainly includes: one or more processors; and a memory for storing one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method provided in the foregoing embodiments.
[0110] Furthermore, the processing device also includes at least one input device and at least one output device; in the processing device, the processor, memory, input device, and output device are connected via a bus.
[0111] In this embodiment of the invention, the specific types of the memory, input device, and output device are not limited; for example:
[0112] Input devices can be touchscreens, image acquisition devices, physical buttons, or mice, etc.
[0113] The output device can be a display terminal;
[0114] The memory can be random access memory (RAM) or non-volatile memory, such as disk storage.
[0115] Example 4
[0116] The present invention also provides a readable storage medium storing a computer program that, when executed by a processor, implements the method provided in the foregoing embodiments.
[0117] In this embodiment of the invention, the readable storage medium is a computer-readable storage medium and can be disposed in the aforementioned processing device, for example, as a memory in the processing device. Furthermore, the readable storage medium can also be any medium capable of storing program code, such as a USB flash drive, portable hard drive, read-only memory (ROM), magnetic disk, or optical disk.
[0118] The above description is merely a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. An active domain adaptive semantic segmentation method, characterized in that it comprises: extracting the entropy map of the target domain image through a domain adaptation model, extracting the superpixels of the target domain image through a superpixel extraction network, determining the average entropy of each superpixel by combining the entropy map, and dividing the superpixels into two categories according to the magnitude of the average entropy, one category being called high-uncertainty superpixels and the other being called low-uncertainty superpixels; fusing the low-uncertainty superpixels according to set conditions; obtaining the first-stage annotation from the annotations of the superpixels involved in the fusion, and training the target domain model with the first-stage annotation to obtain the first-stage target domain model; extracting features of the high-uncertainty superpixels through the domain adaptation model and the first-stage target domain model, respectively, selecting several high-uncertainty superpixels with the largest feature differences for annotation to obtain the second-stage annotation, and training the first-stage target domain model with the second-stage annotation to obtain the second-stage target domain model; and performing semantic segmentation on the target domain image through the second-stage target domain model.
2. The active domain adaptive semantic segmentation method according to claim 1, characterized in that the domain adaptation model is pre-trained using source domain labeled data and target domain unlabeled data, and the superpixel extraction network is pre-trained using source domain labeled data.
3. The active domain adaptive semantic segmentation method according to claim 1, characterized in that the average entropy of each superpixel is determined by combining the entropy map, and the superpixels are then divided into two categories based on the magnitude of the average entropy: where $ent_s$ denotes the average entropy of superpixel s, τ is the threshold, LUSP denotes a low-uncertainty superpixel, and HUSP denotes a high-uncertainty superpixel.
4. The active domain adaptive semantic segmentation method according to claim 1, characterized in that the condition for the fusion is: $d_{JS}(f(k)\,\|\,f(n)) < \epsilon$, where $d_{JS}$ denotes the Jensen-Shannon divergence, f(k) and f(n) denote the average features of low-uncertainty superpixels k and n, respectively, and $\epsilon$ is the set threshold.
5. The active domain adaptive semantic segmentation method according to claim 1, characterized in that the feature difference is calculated as follows: where DomainGap denotes the feature difference, and the two feature terms denote the average features of high-uncertainty superpixel m extracted by the first-stage target domain model and by the domain adaptation model, respectively.
6. The active domain adaptive semantic segmentation method according to claim 1, characterized in that cross-domain mixing and pseudo-label consistency techniques are used both when training the target domain model with the first-stage annotation and when training the first-stage target domain model with the second-stage annotation; wherein the cross-domain mixing technique uses the label $Y_s$ of the source domain data to randomly select a set number of categories, takes the original image regions corresponding to those categories, and pastes them onto the target domain image $I_t$ to obtain the cross-domain mixed image $I_{st}$; the pseudo-label consistency technique extracts pseudo-labels for the target domain and, following the same cross-domain mixing, pastes the set number of category regions of the source domain label $Y_s$ onto the target domain pseudo-labels to obtain cross-domain mixed pseudo-labels, and uses data augmentation method A so that the segmentation prediction for the augmented $A(I_{st})$ remains consistent with the prediction before augmentation; and, in both training stages, that is, when the first-stage annotation is used to train the target domain model and when the second-stage annotation is used to train the first-stage target domain model, $I_t$ and $A(I_{st})$ are both used as input data.
7. The active domain adaptive semantic segmentation method according to claim 6, characterized in that the active domain adaptation loss during training is: where $L_{total}$ is the active domain adaptation loss and $L_{CE}$ is the cross-entropy loss.
8. An active domain adaptive semantic segmentation system, characterized in that it includes: a low-uncertainty superpixel fusion module, used to extract the entropy map of the target domain image through a domain adaptation model, extract the superpixels of the target domain image through a superpixel extraction network, determine the average entropy of each superpixel by combining the entropy map, and divide the superpixels into two categories according to the magnitude of the average entropy, one category being called high-uncertainty superpixels and the other being called low-uncertainty superpixels, the low-uncertainty superpixels being fused according to set conditions; a first-stage annotation and training module, used to obtain the first-stage annotation from the annotations of the superpixels involved in the fusion, and to train the target domain model with the first-stage annotation to obtain the first-stage target domain model; a second-stage annotation and training module, used to extract features of the high-uncertainty superpixels through the domain adaptation model and the first-stage target domain model, respectively, select several high-uncertainty superpixels with the largest feature differences for annotation to obtain the second-stage annotation, and train the first-stage target domain model with the second-stage annotation to obtain the second-stage target domain model; and a semantic segmentation module, used to perform semantic segmentation on the target domain image through the second-stage target domain model.
9. A processing device, characterized in that it includes: one or more processors; and a memory used to store one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any one of claims 1 to 7.
10. A readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the method as described in any one of claims 1 to 7.