A dynamic visual segmentation method based on subdomain adaptation
By using subdomain adaptation technology to align features in subdomains of the same category in the source and target domains, the problems of inaccurate segmentation and high computational cost caused by differences in training scenarios are solved, and the model achieves efficient generalization and real-time segmentation in new scenarios.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING SPACEFLIGHT TUOPUGAO SCI & TECH CO LTD
- Filing Date
- 2025-06-16
- Publication Date
- 2026-06-23
AI Technical Summary
Existing technologies suffer from inaccurate segmentation and high computational resource consumption when faced with significant differences in training scenarios, and the global domain adaptation strategy leads to severe class confusion.
We employ a subdomain adaptation method, which aligns features in subdomains of the same category in the source and target domains. We then utilize subdomain adaptation training of the encoder and classifier, and combine a domain discriminator and symmetric cross-entropy loss to optimize the model structure and training process.
It enhances the model's generalization ability in new scenarios, improves the real-time performance and robustness of semantic segmentation, reduces the impact of training noise, and alleviates classification bias caused by cross-class feature confusion.
Figure CN120726323B_ABST
Description
Technical Field
[0001] This invention belongs to the field of machine vision and specifically relates to a dynamic visual segmentation method based on subdomain adaptation.
Background Technology
[0002] Semantic segmentation is a highly influential technology in the field of computer vision. Compared to image classification and object detection algorithms, it can accurately separate various objects in an image by pixel. However, when this technology is applied to environments that differ significantly from the training scenario, it also faces challenges in generalization, resulting in difficulties in feature extraction and inaccurate segmentation.
[0003] The Segment Anything Model (SAM), a large-scale visual model, exhibits excellent generalization ability in zero-shot tasks. However, it requires significant computational resources, particularly for real-time dynamic segmentation tasks. Domain adaptation, a branch of transfer learning, has proven effective in addressing the loss of model accuracy caused by significant differences in image scenes, and can compensate for the poor real-time performance of large models. However, global domain adaptation strategies can lead to class confusion when the model encounters objects with similar features, misclassifying two closely spaced, feature-similar objects as a single object.
Summary of the Invention
[0004] In view of the above-mentioned defects or deficiencies in the prior art, the present invention provides a dynamic visual segmentation method based on subdomain adaptation. On the basis of the model pre-trained in the source domain, domain adaptation training is performed in subdomains of the same category between the source domain and the target domain to achieve feature alignment of subdomains of the same category, thereby alleviating the classification bias problem caused by cross-class confusion of features during the global domain adaptation process.
[0005] To achieve the above objectives, this invention performs subdomain adaptation training on the encoder of the feature extraction part of the model and the classifier of the part that determines the specific object category in the image.
[0006] A dynamic visual segmentation method based on subdomain adaptation includes the following steps:
[0007] Step S1, Data Acquisition:
[0008] From the source domain dataset D_s, obtain all image samples X_s and their corresponding labels Y_s; from the target domain dataset D_t, obtain the target domain image samples X_t;
[0009] Step S2, construct the source domain validation set V_s and the target domain validation set V_t:
[0010] Select one-tenth of the samples of each category in the source domain dataset D_s, together with their annotation files, as the source domain validation set V_s. From the target domain dataset D_t, collect 5 images for each category and annotate them manually; the annotation files and image samples form the target domain validation set V_t;
[0011] Step S3, model pre-training:
[0012] Use the source domain dataset D_s and the cross-entropy loss L_ce to pre-train the encoder E and classifier C of the model. In each round of iterative training, use the source domain validation set V_s to test the mean pixel accuracy MPA_s. When MPA_s on the source domain validation set V_s does not improve for 5 consecutive rounds, stop the pre-training.
Step S4, construct the domain discriminator D:
[0013] Copy the classifier C trained on the source domain, reconstruct the copy as a binary classification structure, and use the reconstructed classifier as the domain discriminator D;
[0014] Step S5, start the iterative process of domain adaptation, and set the initial value of the iteration number e to 1;
[0015] Step S6, set the number of executions to i; if e < i, execute Step S7; if e >= i, go to Step S12;
[0016] Step S7, use the encoder E and classifier C trained on the source domain to perform feature extraction and classification on the image samples in the target domain dataset D_t;
[0017] Step S8, assign domain labels to the image samples of the same category in the source domain dataset D_s and the target domain dataset D_t. Set the source domain label Y_sd to 0 and the target domain label Y_td to 1, and re-input these two parts into the model to train the encoder E and the domain discriminator D using the binary cross-entropy loss L_bce;
[0018] Step S9, reverse the domain labels of Step S8: set the source domain label Y_sd to 1 and the target domain label Y_td to 0, and then re-input these two parts into the model to train the encoder E and the domain discriminator D;
[0019] Step S10, use the encoder E and classifier C again to perform feature extraction and classification on the image samples in the target domain dataset D_t. The image samples whose classification results match those of Step S7 are combined with the source domain dataset D_s to form a new dataset D_ST, which is fed into the model to continue training the encoder E and classifier C using the symmetric cross-entropy loss L_sce;
[0020] Step S11, use the encoder E and classifier C to run a validation test on the target domain validation set V_t and record the mean pixel accuracy MPA_t; if MPA_t does not increase for 5 consecutive rounds of iteration, terminate the iteration early;
Step S12, obtain the model after domain adaptation.
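The agreement check between Steps S7 and S10 can be sketched in a few lines. This is a minimal illustration, assuming each target sample carries a single predicted category and a hypothetical string identifier; the function names `select_consistent_targets` and `build_d_st` are illustrative and do not come from the patent.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Sample = Tuple[Hashable, int]  # (hypothetical sample id, category id)


def select_consistent_targets(
    preds_s7: Dict[Hashable, int],
    preds_s10: Dict[Hashable, int],
) -> List[Sample]:
    """Keep target samples whose predicted category is the same in S7 and S10."""
    kept = []
    for sample_id, category in preds_s7.items():
        # A sample missing from the second pass is treated as inconsistent.
        if preds_s10.get(sample_id) == category:
            kept.append((sample_id, category))
    return kept


def build_d_st(source: Iterable[Sample], kept_targets: Iterable[Sample]) -> List[Sample]:
    """Form D_ST: labelled source samples plus pseudo-labelled target samples."""
    return list(source) + list(kept_targets)


# Toy data: t1 changes category between the two passes and is dropped.
source = [("s0", 0), ("s1", 1)]
before = {"t0": 0, "t1": 1, "t2": 1}
after = {"t0": 0, "t1": 0, "t2": 1}
d_st = build_d_st(source, select_consistent_targets(before, after))
print(d_st)
```

Only target samples with stable predictions enter D_ST, so their predicted categories act as pseudo-labels during the joint training of Step S10.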
[0021] Furthermore, in Step S3, the cross-entropy loss L_ce is calculated as follows:
[0022] $$L_{ce} = -\frac{1}{n}\sum_{i=1}^{n} y_i \log C(E(x_i))$$
[0023] where n is the total number of samples in the source domain dataset D_s, y_i is the true label of the i-th sample in the source domain dataset D_s, x_i is the i-th sample, E(x_i) is the feature value extracted by the encoder from sample x_i, and C(E(x_i)) is the classification result of the classifier for the feature value extracted by the encoder.
[0024] Furthermore, in Step S3, the mean pixel accuracy MPA_s is calculated as follows:
[0025] $$\mathrm{MPA}_s = \frac{1}{K}\sum_{c=1}^{K}\frac{TP_c}{TP_c + FN_c}$$
[0026] where K is the total number of categories, TP_c is the number of pixels correctly predicted as category c, and FN_c is the number of pixels that actually belong to category c but are predicted as other categories.
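As a worked example of the mean pixel accuracy, the sketch below computes TP_c and FN_c from flattened per-pixel labels. It is a minimal illustration: the function name `mean_pixel_accuracy` is hypothetical, and the handling of categories that have no ground-truth pixels (they are skipped) is an assumption, since the text does not cover that case.

```python
from typing import Sequence


def mean_pixel_accuracy(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Average over categories of TP_c / (TP_c + FN_c) on flattened pixel labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = [0] * num_classes  # pixels of category c predicted as c
    fn = [0] * num_classes  # pixels of category c predicted as another category
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fn[t] += 1
    # Assumption: categories absent from the ground truth are skipped
    # to avoid a 0/0 term; with all categories present this equals (1/K) * sum.
    ratios = [tp[c] / (tp[c] + fn[c]) for c in range(num_classes) if tp[c] + fn[c] > 0]
    return sum(ratios) / len(ratios) if ratios else 0.0


# Category 0: 2 of 3 pixels correct; categories 1 and 2: all correct.
truth = [0, 0, 0, 1, 1, 2]
pred = [0, 0, 1, 1, 1, 2]
mpa = mean_pixel_accuracy(truth, pred, num_classes=3)
print(round(mpa, 4))
```

Because every category contributes equally, a rare category that is segmented poorly lowers MPA as much as a frequent one, which is why it is a stricter signal than overall pixel accuracy.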
[0027] Furthermore, in Step S8, the binary cross-entropy loss L_bce is calculated as follows:
[0028] $$L_{bce} = -\frac{1}{n}\sum_{i=1}^{n}\left[ y_i \log p(x_i) + (1 - y_i)\log\bigl(1 - p(x_i)\bigr) \right]$$
[0029] where n is the total number of samples in the source domain dataset D_s, y_i is the true label of the i-th sample in the source domain dataset D_s, p(x_i) is the probability predicted by the model for the i-th sample, and x_i is the i-th sample.
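The binary cross-entropy used for the domain discriminator can be checked numerically. The sketch below is illustrative only: the probabilities stand in for hypothetical discriminator outputs, the clipping constant `eps` is an assumption added for numerical safety, and the second call shows the effect of reversing the domain labels as in Step S9.

```python
import math
from typing import Sequence


def binary_cross_entropy(labels: Sequence[int], probs: Sequence[float], eps: float = 1e-12) -> float:
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


# Hypothetical discriminator outputs: probability that a sample is from the target domain.
probs = [0.1, 0.2, 0.9, 0.7]
labels_s8 = [0, 0, 1, 1]                   # S8: source = 0, target = 1
labels_s9 = [1 - y for y in labels_s8]     # S9: labels reversed
loss_s8 = binary_cross_entropy(labels_s8, probs)
loss_s9 = binary_cross_entropy(labels_s9, probs)
print(loss_s8, loss_s9)
```

With the same predictions, the reversed labels give a much larger loss, so training on them pushes the features of the two domains toward each other rather than apart.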
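[0030] Furthermore, in Step S10, the symmetric cross-entropy loss L_sce is calculated as follows:
[0031] $$L_{sce} = L_{ce} + L_{rce}$$
[0032] where L_ce is the cross-entropy loss and L_rce is the inverse cross-entropy loss.
[0033] The inverse cross-entropy loss swaps the roles of the true label y_i and the classifier output C(E(x_i)) in the cross-entropy loss, and is calculated as follows:
[0034] $$L_{rce} = -\frac{1}{n}\sum_{i=1}^{n} C(E(x_i)) \log y_i$$
[0035] Therefore, the symmetric cross-entropy loss in Step S10 can be written as:
[0036] $$L_{sce} = -\frac{1}{n}\sum_{i=1}^{n}\left[ y_i \log C(E(x_i)) + C(E(x_i)) \log y_i \right]$$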
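A small numeric sketch of the symmetric cross-entropy loss from Step S10 follows, assuming it is the unweighted sum of the cross-entropy and inverse cross-entropy terms and that labels are one-hot. Because log 0 is undefined for one-hot labels, the sketch clamps label values to a small floor `label_floor`; that floor, like the function names, is an assumption and is not specified in the text.

```python
import math
from typing import Sequence

Rows = Sequence[Sequence[float]]


def cross_entropy(labels: Rows, probs: Rows, eps: float = 1e-7) -> float:
    """-(1/n) * sum_i sum_k y_ik * log(p_ik), with probabilities clipped at eps."""
    total = 0.0
    for y_row, p_row in zip(labels, probs):
        total += sum(y * math.log(max(p, eps)) for y, p in zip(y_row, p_row))
    return -total / len(labels)


def reverse_cross_entropy(labels: Rows, probs: Rows, label_floor: float = 1e-4) -> float:
    """-(1/n) * sum_i sum_k p_ik * log(y_ik), with labels clamped at label_floor."""
    total = 0.0
    for y_row, p_row in zip(labels, probs):
        total += sum(p * math.log(max(y, label_floor)) for y, p in zip(y_row, p_row))
    return -total / len(labels)


def symmetric_cross_entropy(labels: Rows, probs: Rows) -> float:
    """Unweighted sum of the two terms (assumption: no extra weighting factors)."""
    return cross_entropy(labels, probs) + reverse_cross_entropy(labels, probs)


labels = [[1, 0, 0], [0, 1, 0]]              # one-hot (possibly pseudo) labels
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]   # hypothetical classifier outputs
print(symmetric_cross_entropy(labels, probs))
```

With the clamp in place, the inverse term for one sample can never exceed `-log(label_floor)`, so a single mislabeled sample contributes a bounded penalty, unlike the plain cross-entropy term.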
[0037] The beneficial effects of this invention are as follows: This invention provides a dynamic visual segmentation method based on subdomain adaptation, which enhances the generalization ability of the model in new scenarios through subdomain alignment-based domain adaptation. It optimizes the model structure and inference speed by reconstructing the classifier as a domain discriminator, thereby enhancing the real-time performance of semantic segmentation. Simultaneously, to address the noise impact during domain adaptation training, symmetric cross-entropy loss is used to reduce the influence of training noise, enhancing the model's robustness and achieving subdomain feature alignment within the same category. This alleviates the classification bias caused by cross-class feature confusion during global domain adaptation.
[0038] The present invention is further explained in detail below with reference to the accompanying drawings and specific embodiments.
Description of the Drawings
[0039] Figure 1 is a flowchart of the dynamic visual segmentation method based on subdomain adaptation of the present invention;
[0040] Figure 2 is a schematic diagram of the model training process in the dynamic visual segmentation method based on subdomain adaptation of the present invention;
[0041] Figure 3 is a schematic diagram of the method for reconstructing a classifier into a domain discriminator in the dynamic visual segmentation method based on subdomain adaptation of the present invention.
Detailed Implementation
[0042] As shown in Figures 1 and 2, the dynamic visual segmentation method based on subdomain adaptation uses subdomain adaptation techniques to improve the generalization and real-time performance of the model in semantic segmentation tasks, addressing the problems of model generalization, real-time inference speed, and category confusion in computer vision semantic segmentation. The decoder part in Figure 2 consists of conventional inverse encoding operations and is not described in detail. The method includes the following steps:
[0043] Step S1, Data Acquisition:
[0044] From the source domain dataset D_s, obtain all image samples X_s and their corresponding labels Y_s; from the target domain dataset D_t, obtain the target domain image samples X_t;
[0045] The label Y_s is an annotation file that assigns a category to each pixel of the image sample X_s.
[0046] Step S2, construct the source domain validation set V_s and the target domain validation set V_t:
[0047] Take one-tenth of the samples of each category in the source domain dataset D_s, together with their annotation files, as the source domain validation set V_s. From the target domain dataset D_t, collect five images for each category and annotate them manually; the annotation files and image samples form the target domain validation set V_t;
[0048] Step S3, Model pre-training:
[0049] Use the source domain dataset D_s and the cross-entropy loss L_ce to pre-train the encoder E and classifier C of the model. In each round of training iteration, use the source domain validation set V_s to test the mean pixel accuracy MPA_s. When MPA_s on the source domain validation set V_s does not improve for 5 consecutive rounds, stop pre-training.
[0050] The cross-entropy loss is calculated as follows:
[0051] $$L_{ce} = -\frac{1}{n}\sum_{i=1}^{n} y_i \log C(E(x_i))$$
[0052] where n is the total number of samples in the source domain dataset D_s; y_i is the true label of the i-th sample in the source domain dataset D_s; x_i is the i-th sample; E(x_i) is the feature value extracted by the encoder from sample x_i; C(E(x_i)) is the classification result of the classifier for the feature value extracted by the encoder; and L_ce is the cross-entropy loss.
[0053] The mean pixel accuracy is calculated as follows:
[0054] $$\mathrm{MPA}_s = \frac{1}{K}\sum_{c=1}^{K}\frac{TP_c}{TP_c + FN_c}$$
[0055] where K is the total number of categories, TP_c is the number of pixels correctly predicted as category c, FN_c is the number of pixels that actually belong to category c but are predicted as other categories, and MPA_s is the mean pixel accuracy.
[0056] Step S4, construct the domain discriminator D:
[0057] Copy the classifier C trained on the source domain, reconstruct the copy as a binary classification structure, and use the reconstructed classifier as the domain discriminator D. As shown in Figure 3, the reconstruction adds a fully connected layer on top of the original classifier C.
[0058] Step S5, start the iterative process of domain adaptation, and set the initial value of the iteration number e to 1;
[0059] Step S6, set the number of executions to i; if e < i, execute Step S7; if e >= i, go to Step S12;
[0060] Step S7, use the encoder E and classifier C trained on the source domain to perform feature extraction and classification on the image samples in the target domain dataset D_t;
[0061] Step S8, assign domain labels to the image samples of the same category in the source domain and target domain datasets. The source domain label Y_sd of the source domain part is set to 0 and the target domain label Y_td of the target domain part is set to 1; these two parts are then re-input into the model, and the binary cross-entropy loss L_bce is used to train the encoder E and the domain discriminator D. The binary cross-entropy loss is calculated as follows:
[0062] $$L_{bce} = -\frac{1}{n}\sum_{i=1}^{n}\left[ y_i \log p(x_i) + (1 - y_i)\log\bigl(1 - p(x_i)\bigr) \right]$$
[0063] where n is the total number of samples in the source domain dataset D_s, y_i is the true label of the i-th sample in the source domain dataset D_s, p(x_i) is the probability predicted by the model for the i-th sample, x_i is the i-th sample, and L_bce is the binary cross-entropy loss.
[0064] Step S9, reverse the domain labels of Step S8: set the source domain label Y_sd of the source domain part to 1 and the target domain label Y_td of the target domain part to 0, and then re-input these two parts into the model to train the encoder E and the domain discriminator D;
[0065] Step S10, use the encoder E and classifier C again to perform feature extraction and classification on the image samples in the target domain dataset D_t. The image samples whose classification results match those of Step S7 are combined with the source domain dataset D_s to form a new dataset D_ST, which is fed into the model to continue training the encoder E and classifier C using the symmetric cross-entropy loss L_sce. The symmetric cross-entropy loss is calculated as follows:
[0066] $$L_{sce} = L_{ce} + L_{rce}$$
[0067] where L_ce is the cross-entropy loss and L_rce is the inverse cross-entropy loss, which swaps the roles of the true label y_i and the classifier output C(E(x_i)):
[0068] $$L_{rce} = -\frac{1}{n}\sum_{i=1}^{n} C(E(x_i)) \log y_i$$
[0069] Therefore:
[0070] $$L_{sce} = -\frac{1}{n}\sum_{i=1}^{n}\left[ y_i \log C(E(x_i)) + C(E(x_i)) \log y_i \right]$$
[0071] Symmetric cross-entropy is used here because, when target domain images are combined with the source domain data for joint training, their labels come from the classifier's own predictions, so some of them are inevitably wrong and act as noisy data. The robustness of the symmetric cross-entropy loss is therefore used to reduce the impact of this noisy data.
[0072] Step S11, use the encoder E and classifier C to run a validation test on the target domain validation set V_t and record the mean pixel accuracy MPA_t. If MPA_t does not increase for 5 consecutive rounds of iteration, terminate the iteration early. MPA_t is calculated in the same way as MPA_s in Step S3;
[0073] Step S12: Obtain the model after domain adaptation.
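The control flow of Steps S5 to S12 can be sketched as a loop with early stopping. This is a minimal illustration under stated assumptions: `train_round` and `evaluate_mpa_t` are hypothetical callbacks standing in for Steps S7 to S10 and Step S11, and "does not increase" is interpreted as "does not exceed the best MPA_t seen so far".

```python
from typing import Callable, List, Tuple


def run_adaptation(
    max_iters: int,
    train_round: Callable[[int], None],
    evaluate_mpa_t: Callable[[int], float],
    patience: int = 5,
) -> Tuple[int, float, List[float]]:
    """Return (value of e at exit, best MPA_t, MPA_t history)."""
    history: List[float] = []
    best = float("-inf")
    stale = 0                      # consecutive rounds without improvement
    e = 1                          # S5: initial iteration number
    while e < max_iters:           # S6: continue only while e < i
        train_round(e)             # S7-S10: one round of adaptation training
        mpa = evaluate_mpa_t(e)    # S11: validate on the target validation set
        history.append(mpa)
        if mpa > best:
            best, stale = mpa, 0
        else:
            stale += 1
            if stale >= patience:  # S11: terminate the iteration early
                break
        e += 1
    return e, best, history        # S12: adaptation finished


# Toy run with a made-up validation curve that plateaus at 0.60.
scores = [0.50, 0.55, 0.60, 0.60, 0.59, 0.58, 0.60, 0.57, 0.70]
stopped_at, best_mpa, seen = run_adaptation(
    max_iters=20,
    train_round=lambda e: None,
    evaluate_mpa_t=lambda e: scores[e - 1],
)
print(stopped_at, best_mpa)
```

In the toy run the loop stops before the last score is ever evaluated, which is the trade-off of patience-based stopping: it saves training rounds at the risk of missing a late improvement.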
[0074] This method, based on the model pre-trained in the source domain, adopts a discriminative adversarial domain adaptation approach to narrow the difference in feature distribution between the source and target domains. Then, pseudo-labeling is used on the classification results made by the classifier to achieve sub-domain alignment between the same category, thereby alleviating the class confusion phenomenon in the global domain adaptation process. This not only enhances the model's generalization ability in the target domain, but also retains the real-time performance of fast segmentation.
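The per-category grouping behind the subdomain alignment can be illustrated as follows. This is a sketch under assumptions: samples are represented as hypothetical (identifier, category) pairs, target categories are the classifier's predictions from Step S7, and categories that appear in only one domain are skipped, a choice the text does not specify.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Sample = Tuple[Hashable, int]          # (hypothetical sample id, category id)
Labelled = Tuple[Hashable, int]        # (sample id, domain label)


def subdomain_batches(
    source: Iterable[Sample],
    target_pseudo: Iterable[Sample],
    flip: bool = False,
) -> Dict[int, List[Labelled]]:
    """Group samples by category and attach domain labels.

    flip=False gives source = 0, target = 1 (Step S8);
    flip=True gives source = 1, target = 0 (Step S9).
    """
    src_label, tgt_label = (1, 0) if flip else (0, 1)
    by_category = defaultdict(lambda: ([], []))
    for sample_id, category in source:
        by_category[category][0].append(sample_id)
    for sample_id, category in target_pseudo:
        by_category[category][1].append(sample_id)
    batches: Dict[int, List[Labelled]] = {}
    for category in sorted(by_category):
        src_ids, tgt_ids = by_category[category]
        # Assumption: a subdomain needs samples from both domains to be aligned.
        if src_ids and tgt_ids:
            batches[category] = [(s, src_label) for s in src_ids] + [(t, tgt_label) for t in tgt_ids]
    return batches


src = [("s0", 0), ("s1", 1), ("s2", 1)]
tgt = [("t0", 1), ("t1", 2)]
print(subdomain_batches(src, tgt))
print(subdomain_batches(src, tgt, flip=True))
```

Each batch contains only one category, so the discriminator compares source and target features within that category instead of across the whole dataset, which is the difference from global domain adaptation.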
[0075] Finally, it should be noted that the above is only used to illustrate the technical solutions of the present invention and not to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention (such as the application of various formulas, the order of steps, etc.) without departing from the spirit and scope of the technical solutions of the present invention.
Claims
1. A dynamic visual segmentation method based on subdomain adaptation, characterized in that it includes the following steps:
Step S1, data acquisition: from the source domain dataset D_s, obtain all image samples X_s and their corresponding labels Y_s; from the target domain dataset D_t, obtain the target domain image samples X_t;
Step S2, construct the source domain validation set V_s and the target domain validation set V_t: use one-tenth of the samples of each category in the source domain dataset D_s, together with their annotation files, as the source domain validation set V_s; from the target domain dataset D_t, collect five images for each category and annotate them manually, and use the annotation files and image samples as the target domain validation set V_t;
Step S3, model pre-training: use the source domain dataset D_s and the cross-entropy loss L_ce to pre-train the encoder E and classifier C of the model; in each round of training iteration, use the source domain validation set V_s to test the mean pixel accuracy MPA_s; when MPA_s on the source domain validation set V_s does not improve for 5 consecutive rounds, stop the pre-training;
Step S4, construct the domain discriminator D: copy the classifier C trained on the source domain, reconstruct the copy as a binary classification structure, and use the reconstructed classifier as the domain discriminator D;
Step S5, start the iterative process of domain adaptation, and set the initial value of the iteration number e to 1;
Step S6, set the number of executions to i; if e < i, execute Step S7; if e >= i, go to Step S12;
Step S7, use the encoder E and classifier C trained on the source domain to perform feature extraction and classification on the image samples in the target domain dataset D_t;
Step S8, assign domain labels to the image samples of the same category in the source domain dataset D_s and the target domain dataset D_t: set the source domain label Y_sd to 0 and the target domain label Y_td to 1, and re-input both parts into the model to train the encoder E and the domain discriminator D using the binary cross-entropy loss L_bce;
Step S9, reverse the domain labels of Step S8: set the source domain label Y_sd to 1 and the target domain label Y_td to 0, and then re-input both parts into the model to train the encoder E and the domain discriminator D;
Step S10, use the encoder E and classifier C again to perform feature extraction and classification on the image samples in the target domain dataset D_t; combine the image samples whose classification results are consistent with those of Step S7 with the source domain dataset D_s to form a new dataset D_ST, and feed it into the model to continue training the encoder E and classifier C using the symmetric cross-entropy loss L_sce;
Step S11, use the encoder E and classifier C to run a validation test on the target domain validation set V_t and record the mean pixel accuracy MPA_t; if MPA_t does not increase for 5 consecutive iterations, terminate the iteration early;
Step S12, obtain the model after domain adaptation is completed.
2. The dynamic visual segmentation method based on subdomain adaptation according to claim 1, characterized in that the cross-entropy loss in Step S3 is calculated as:
$$L_{ce} = -\frac{1}{n}\sum_{i=1}^{n} y_i \log C(E(x_i))$$ (1)
where n is the total number of samples in the source domain dataset D_s, y_i is the true label of the i-th sample in the source domain dataset D_s, x_i is the i-th sample, E(x_i) is the feature value extracted by the encoder from sample x_i, C(E(x_i)) is the classification result of the classifier for the feature value extracted by the encoder, and L_ce is the cross-entropy loss.
3. The dynamic visual segmentation method based on subdomain adaptation according to claim 1, characterized in that the mean pixel accuracy in Step S3 is calculated as:
$$\mathrm{MPA}_s = \frac{1}{K}\sum_{c=1}^{K}\frac{TP_c}{TP_c + FN_c}$$ (2)
where K is the total number of categories, TP_c is the number of pixels correctly predicted as category c, FN_c is the number of pixels that actually belong to category c but are predicted as other categories, and MPA_s is the mean pixel accuracy.
4. The dynamic visual segmentation method based on subdomain adaptation according to claim 1, characterized in that the binary cross-entropy loss in Step S8 is calculated as:
$$L_{bce} = -\frac{1}{n}\sum_{i=1}^{n}\left[ y_i \log p(x_i) + (1 - y_i)\log\bigl(1 - p(x_i)\bigr) \right]$$ (3)
where n is the total number of samples in the source domain dataset D_s, y_i is the true label of the i-th sample in the source domain dataset D_s, p(x_i) is the probability predicted by the model for the i-th sample, x_i is the i-th sample, and L_bce is the binary cross-entropy loss.
5. The dynamic visual segmentation method based on subdomain adaptation according to claim 2, characterized in that the symmetric cross-entropy loss in Step S10 is calculated as:
$$L_{sce} = L_{ce} + L_{rce}$$ (4)
where L_ce is the cross-entropy loss and L_rce is the inverse cross-entropy loss, and:
$$L_{rce} = -\frac{1}{n}\sum_{i=1}^{n} C(E(x_i)) \log y_i$$ (5)
Therefore:
$$L_{sce} = -\frac{1}{n}\sum_{i=1}^{n}\left[ y_i \log C(E(x_i)) + C(E(x_i)) \log y_i \right]$$ (6).