An unsupervised domain adaptation method without source domain data
By using the statistics stored in the BN layers of the pre-trained model and generating soft labels through fuzzy clustering, the method explicitly aligns the feature distributions of the source and target domains. This solves the distribution alignment problem of unsupervised domain adaptation when source domain data is unavailable, and improves the model's recognition accuracy and generalization ability in the target domain.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG UNIV OF TECH
- Filing Date
- 2022-12-13
- Publication Date
- 2026-06-12
AI Technical Summary
Under conditions of no source domain data, existing unsupervised domain adaptation methods have difficulty effectively aligning the distributions of the source and target domains, resulting in limited model generalization ability. In particular, when source domain data is unavailable, traditional methods cannot explicitly align the distributions, affecting the application of the model in unknown data domains.
By utilizing the statistical information stored in the BN layer of the pre-trained model to approximate the feature distribution of the source domain, it is explicitly aligned with the target domain samples. Soft labels are generated through fuzzy clustering, and combined with information maximization loss, the target domain model is optimized to achieve unsupervised domain adaptation.
Explicitly aligning the feature distributions of the source and target domains corrects some classification errors, improves the model's recognition accuracy in the target domain, and does not depend on source domain data, resulting in higher classification accuracy and wider applicability.
Figure CN116227578B_ABST
Description
Technical Field
[0001] This invention relates to the technical field of computation, calculation, or counting, and in particular to an unsupervised domain adaptation method without source domain data, based on BN layer information and soft clustering, in the field of machine learning.
Background Technology
[0002] In recent years, deep neural networks have achieved remarkable results in visual classification and have been widely applied across industries. Their strong performance presupposes that the training and test data are independent and identically distributed (i.i.d.), a condition that is difficult to meet in the real world. Ideally, the knowledge a model gains from richly labeled datasets should transfer to other, unlabeled data. But even when the differences between datasets are small, deep networks struggle to generalize to unseen data domains. A significant factor limiting generalization is the distribution shift between data from different domains, and domain adaptation is the research area that addresses this problem.
[0003] Significant progress has been made in this technical problem in recent years, especially in unsupervised domain adaptation. When we have direct access to the source domain dataset, we can directly align the distribution shifts between the source and target domains. Many existing domain adaptation methods are very effective even for unlabeled target domain data. However, traditional domain adaptation relies on the availability of source domain data and its labels. In some practical situations, including but not limited to large datasets that are difficult to store, challenges in sharing data, data privacy, and other dataset processing issues, source data is not readily available, and only pre-trained models can be obtained. This limits traditional unsupervised domain adaptation models, thus leading to the development of source-free domain adaptation.
[0004] Source-free domain adaptation differs from conventional unsupervised domain adaptation in that the labeled source domain data cannot be accessed; training can only use a model pre-trained on the source domain data together with unlabeled target domain data. Currently, there are two common approaches to source-free domain adaptation. The first mines information about the source domain features from the pre-trained model and fine-tunes that model on target domain samples. The second uses generative models, which take the target domain data and the pre-trained model and generate samples containing source domain information, and then performs domain adaptation with the generated samples and the target domain samples. However, because there is no source domain data, neither approach explicitly aligns the source and target domains; they merely fine-tune with unsupervised objectives or generate pseudo-source samples that resemble the target domain samples.
Summary of the Invention
[0005] This invention addresses the problems existing in the prior art and provides an unsupervised domain adaptation method without source domain data.
[0006] The technical solution adopted in this invention is an unsupervised domain adaptation method without source domain data, the method comprising the following steps:
[0007] Step 1: Train the model with labeled source domain samples to obtain a pre-trained source domain model;
[0008] Step 2: Initialize the target domain model with the source domain model, including the feature extractor and classifier;
[0009] Step 3: Approximate the feature distribution of the source domain with the statistics stored in the BN layers of the source domain model, explicitly align it with the feature distribution of the target domain samples, and calculate the distribution alignment loss $L_{BN}$;
[0010] Step 4: Based on the predictions of the target domain model's classifier, perform fuzzy clustering on the features of the target domain samples, use the cluster membership degrees as soft labels for the target domain samples, and calculate the cross-entropy loss $L_{clu}$ between the soft labels and the classifier's predictions for the target domain samples;
[0011] Step 5: Calculate the information maximization loss $L_{IM}$ for the target domain samples. It comprises an entropy-minimization loss and an average-entropy-maximization loss, which raise the confidence of the sample predictions and prevent model collapse;
[0012] Step 6: Jointly train the target domain model with the distribution alignment loss $L_{BN}$, the cross-entropy loss $L_{clu}$, and the information maximization loss $L_{IM}$, achieving unsupervised domain adaptation without source domain data and improving the recognition accuracy on target domain samples.
[0013] Preferably, in step 1, to prevent the pre-trained model from overfitting to the source domain data, the cross-entropy loss is computed with label smoothing, which improves the model's generalization to the target domain. The objective function is:
[0014] $$\min_{f_s} \; -\mathbb{E}_{x_s \in X_s} \sum_{k=1}^{K} \tilde{q}_k \log \sigma_k(f_s(x_s)) \quad (1)$$
[0015] where $f_s$ denotes the pre-trained source domain model, consisting of the feature extractor $g_s$ and the classifier $h_s$, so that for an input $x$, $f_s(x) = h_s(g_s(x))$; $K$ is the number of classes and $k$ indexes a class; $X_s$ is the source domain sample set; $q_k$ is the label of the source domain sample $x_s$, and $\tilde{q}_k = (1 - \alpha) q_k + \alpha / K$ is the smoothed label; $\alpha$ is the smoothing coefficient, $0 < \alpha < 1$, generally taken as $0.05 \le \alpha \le 0.15$;
[0016] $\sigma(\cdot)$ denotes the softmax normalization of a vector. Given a vector $a$ and a temperature parameter $T$, $\sigma_k(a)$ is the value of the k-th dimension of $\sigma(a)$:
[0017] $$\sigma_k(a) = \frac{\exp(a_k / T)}{\sum_{j} \exp(a_j / T)}$$
[0018] where $a_k$ is the value of the k-th dimension of the vector $a$, $T$ is the temperature parameter, and $j$ indexes the dimensions of $a$. In equation (1), $T = 1$.
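The label-smoothed cross-entropy and the temperature softmax described above can be illustrated with a small pure-Python sketch. It is not the patent's implementation: the function names are hypothetical, and the smoothed label is assumed to take the common form (1 - α)·q_k + α/K.

```python
import math

def softmax(a, T=1.0):
    """Temperature softmax: sigma_k(a) = exp(a_k / T) / sum_j exp(a_j / T)."""
    scaled = [v / T for v in a]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def smooth_label(label, num_classes, alpha=0.1):
    """Assumed smoothing form: (1 - alpha) * one_hot + alpha / K."""
    return [(1.0 - alpha) * (1.0 if k == label else 0.0) + alpha / num_classes
            for k in range(num_classes)]

def smoothed_cross_entropy(logits, label, alpha=0.1):
    """Cross-entropy between the smoothed label and softmax(logits), with T = 1."""
    probs = softmax(logits, T=1.0)
    q = smooth_label(label, len(logits), alpha)
    return -sum(qk * math.log(pk) for qk, pk in zip(q, probs))

# Hypothetical logits for a 3-class problem whose true class is 0.
logits = [2.0, 0.5, -1.0]
loss_plain = smoothed_cross_entropy(logits, label=0, alpha=0.0)
loss_smooth = smoothed_cross_entropy(logits, label=0, alpha=0.1)
```

With α > 0 part of the label mass moves to the other classes, so a confident correct prediction is penalized slightly more than under the unsmoothed loss, which discourages over-confident source models.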
[0019] Preferably, the classifier of the target domain model is kept fixed. In step 2, the target domain model $f_t$ consists of the feature extractor $g_t$ and the classifier $h_t$, which are initialized from the feature extractor and classifier of the source domain model, respectively, so that for any input $x$, $f_t(x) = h_t(g_t(x))$. The feature extractor of the target domain model is optimized by the loss functions, while the classifier is frozen after initialization and not updated.
[0020] Preferably, in step 3, the statistics of a BN layer comprise the mean and variance of each channel in that layer, which can be used to approximate the global feature distribution of the training samples. Specifically, the data distribution of each channel in each BN layer is represented by a Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are the mean and variance of that channel. The KL divergence is computed between the Gaussian given by the mean and variance stored for each channel of each BN layer in the source domain model and the Gaussian given by the mean and variance of the same channel for the current batch of target domain samples in the corresponding BN layer. The average of these KL divergences measures the distance between the feature distributions of the source and target domain samples.
[0021] Preferably, the distribution alignment loss $L_{BN}$ is
[0022] $$L_{BN} = \frac{1}{M} \sum_{m=1}^{M} \frac{1}{C_m} \sum_{c_m=1}^{C_m} D_{KL}\left( \mathcal{N}\big(\mu^{s}_{m,c_m}, (\sigma^{s}_{m,c_m})^2\big) \,\big\|\, \mathcal{N}\big(\mu^{t}_{m,c_m}, (\sigma^{t}_{m,c_m})^2\big) \right) \quad (2)$$
[0023] where $M$ is the total number of BN layers in the model and $C_m$ is the total number of channels in the m-th BN layer; $\mu^{s}_{m,c_m}$ and $(\sigma^{s}_{m,c_m})^2$ are the mean and variance stored in the $c_m$-th channel of the m-th BN layer of the source domain model; $\mu^{t}_{m,c_m}$ and $(\sigma^{t}_{m,c_m})^2$ are the mean and variance of the $c_m$-th channel when the current batch passes through the m-th BN layer of the target domain model; and $D_{KL}$ denotes the KL divergence.
[0024] Minimizing $L_{BN}$ uses the Gaussian distributions given by the means and variances in the BN layers to approximate the source domain feature distribution, which cannot be obtained directly because the source domain samples are unavailable, and thereby aligns it with the feature distribution of the target domain.
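As an illustration of the BN-statistics alignment described above, the sketch below averages the closed-form KL divergence between per-channel univariate Gaussians. It is a sketch under stated assumptions, not the patent's code: the KL direction (source to target), the averaging per layer and then over layers, the small `eps` added to the variances, and all names are assumptions.

```python
import math

def gaussian_kl(mu_p, var_p, mu_q, var_q, eps=1e-5):
    """KL( N(mu_p, var_p) || N(mu_q, var_q) ) for univariate Gaussians."""
    var_p, var_q = var_p + eps, var_q + eps  # eps guards against zero variance
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)

def bn_alignment_loss(source_stats, target_stats):
    """Average per-channel KL over all BN layers.

    Each argument holds one list per BN layer; each layer list holds one
    (mean, variance) tuple per channel. Source statistics stand for the
    running statistics stored in the source model, target statistics for
    the statistics of the current target batch.
    """
    layer_losses = []
    for src_layer, tgt_layer in zip(source_stats, target_stats):
        kls = [gaussian_kl(ms, vs, mt, vt)
               for (ms, vs), (mt, vt) in zip(src_layer, tgt_layer)]
        layer_losses.append(sum(kls) / len(kls))
    return sum(layer_losses) / len(layer_losses)

# Hypothetical statistics: two BN layers with two channels and one channel.
source_stats = [[(0.0, 1.0), (1.0, 2.0)], [(0.5, 0.5)]]
target_stats = [[(0.3, 1.2), (0.8, 1.5)], [(0.9, 0.7)]]
loss_same = bn_alignment_loss(source_stats, source_stats)
loss_shifted = bn_alignment_loss(source_stats, target_stats)
```

The loss is zero when the target batch statistics match the stored source statistics and grows as the means or variances drift apart. In a deep learning framework the target statistics would need to be computed differentiably from the batch so that the gradient reaches the feature extractor.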
[0025] Preferably, step 4 includes the following steps:
[0026] Step 4.1: Because of the data differences between the target and source domains, the fixed source domain classifier's predictions for target domain samples are noisy, so the classifier cannot by itself correct target domain samples that are hard to distinguish. To alleviate this problem, a cluster-based soft label alignment loss is introduced. The soft label generation process is similar to fuzzy clustering. First, the cluster centers are initialized as the weighted average of the extracted features, using the probabilities output by the target domain model's classifier as weights:
[0027] $$\delta_k = \frac{\sum_{x_t \in B_t} \sigma_k(f_t(x_t)) \, g_t(x_t)}{\sum_{x_t \in B_t} \sigma_k(f_t(x_t))} \quad (3)$$
[0028] In equation (3), $\delta_k$ is the cluster center of the k-th class; $f_t$ is the target domain model, consisting of the feature extractor $g_t$ and the classifier $h_t$, so that for an input $x$, $f_t(x) = h_t(g_t(x))$; $x_t$ is a target domain sample; $B_t$ is the batch of target domain samples currently read in; $\sigma(\cdot)$ denotes the softmax normalization of a vector; and the superscript T denotes the transpose of a vector.
[0029] Step 4.2: Although directly computing the distance from sample features to the cluster centers can yield one-hot pseudo-labels, some misclassified samples are too far from the decision boundary to be corrected by clustering alone. To reduce the impact of erroneous pseudo-labels, the one-hot pseudo-labels are replaced with smooth soft labels. This reduces overconfidence in erroneous labels and ultimately improves the model's generalization ability. Using the cluster centers $\delta_k$, compute the cosine distance from the sample to each cluster center, take the reciprocal, and normalize with softmax to obtain the predicted distribution of the sample. A temperature parameter $T$ is added to adjust the smoothness of the soft labels:
[0030] $$\hat{y}_k = \frac{\exp\big(1 / (T \cdot D(g_t(x_t), \delta_k))\big)}{\sum_{j=1}^{K} \exp\big(1 / (T \cdot D(g_t(x_t), \delta_j))\big)} \quad (4)$$
[0031] where $D$ denotes the cosine distance and $\hat{y}_k$ is the probability, or membership degree, of the k-th class in the soft label obtained from clustering; the temperature parameter satisfies $0.6 \le T \le 1.2$;
[0032] Step 4.3: Calculate the cross-entropy loss between the soft label and the probability distribution output by the model's classifier for the target domain samples:
[0033] $$L_{clu} = -\mathbb{E}_{x_t} \sum_{k=1}^{K} \hat{y}_k \log \sigma_k(f_t(x_t)) \quad (5)$$
[0034] This loss corrects, to some extent, the predictions for target domain samples that were initially misclassified by the source domain classifier, thereby optimizing the model.
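Steps 4.1 to 4.3 can be sketched in pure Python as follows. This is an illustrative sketch rather than the patent's implementation: the helper names, the toy features and probabilities, and the small `eps` that guards the reciprocal of the cosine distance are all assumptions.

```python
import math

def softmax(a, T=1.0):
    scaled = [v / T for v in a]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cosine_distance(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)

def init_cluster_centers(features, probs):
    """Step 4.1: probability-weighted average of the features per class."""
    num_classes, dim = len(probs[0]), len(features[0])
    centers = []
    for k in range(num_classes):
        weight = sum(p[k] for p in probs)
        centers.append([
            sum(p[k] * f[d] for p, f in zip(probs, features)) / weight
            for d in range(dim)
        ])
    return centers

def soft_label(feature, centers, T=0.8, eps=1e-6):
    """Step 4.2: softmax over reciprocal cosine distances with temperature T."""
    inverse = [1.0 / (cosine_distance(feature, c) + eps) for c in centers]
    return softmax(inverse, T)

def soft_label_cross_entropy(label, pred, eps=1e-12):
    """Step 4.3: cross-entropy between a soft label and a prediction."""
    return -sum(y * math.log(p + eps) for y, p in zip(label, pred))

# Hypothetical batch: four 2-D features and classifier probabilities (2 classes).
features = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
centers = init_cluster_centers(features, probs)
labels = [soft_label(f, centers) for f in features]
loss_clu = sum(soft_label_cross_entropy(y, p)
               for y, p in zip(labels, probs)) / len(probs)
```

Because the reciprocal of a small cosine distance is large, samples close to one center receive labels that are nearly one-hot, while samples between centers keep softer labels; the temperature T controls how quickly this sharpening happens.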
[0035] Preferably, in step 5, the information maximization loss raises the prediction confidence for target domain samples while avoiding the collapsed solution in which all samples are assigned to only a few classes. Specifically,
[0036] $L_{IM} = L_{ent} + L_{div}$ (6)
[0037] where $L_{ent} = -\mathbb{E}_{x_t} \sum_{k=1}^{K} \sigma_k(f_t(x_t)) \log \sigma_k(f_t(x_t))$ is the entropy-minimization loss, $L_{div} = \sum_{k=1}^{K} \hat{p}_k \log \hat{p}_k$ is the average-entropy-maximization loss, and $\hat{p}_k = \mathbb{E}_{x_t}[\sigma_k(f_t(x_t))]$ is the average membership degree of the k-th class.
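The two terms of the information maximization loss can be illustrated with a small sketch. It assumes the usual formulation, in which the entropy term is the mean per-sample prediction entropy and the diversity term is the negative entropy of the batch-average prediction; the function names and toy batches are hypothetical.

```python
import math

def entropy_loss(probs, eps=1e-12):
    """L_ent: mean Shannon entropy of the per-sample predictions."""
    per_sample = [-sum(p * math.log(p + eps) for p in row) for row in probs]
    return sum(per_sample) / len(per_sample)

def diversity_loss(probs, eps=1e-12):
    """L_div: negative entropy of the batch-average prediction."""
    num_classes = len(probs[0])
    mean = [sum(row[k] for row in probs) / len(probs) for k in range(num_classes)]
    return sum(p * math.log(p + eps) for p in mean)

def information_maximization_loss(probs):
    return entropy_loss(probs) + diversity_loss(probs)

# Hypothetical 2-class batches.
balanced = [[1.0, 0.0], [0.0, 1.0]]    # confident and spread over the classes
collapsed = [[1.0, 0.0], [1.0, 0.0]]   # confident but all in one class
uncertain = [[0.5, 0.5], [0.5, 0.5]]   # spread on average but not confident
loss_balanced = information_maximization_loss(balanced)
loss_collapsed = information_maximization_loss(collapsed)
loss_uncertain = information_maximization_loss(uncertain)
```

The confident, balanced batch attains the lowest value (about -log 2 for two classes), while both the collapsed and the uncertain batch score near zero, which shows why the two terms are used together.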
[0038] Preferably, in step 6, the complete objective function $L_{gt}$ is
[0039] $L_{gt} = L_{IM} + \beta L_{BN} + \gamma L_{clu}$ (7)
[0040] where $\beta$ and $\gamma$ are the corresponding hyperparameters, $\beta, \gamma \in [0.6, 1.0]$.
[0041] This invention relates to an unsupervised domain adaptation method without source domain data. The method trains a model with labeled source domain samples to obtain a pre-trained source domain model and initializes a target domain model from it. It approximates the feature distribution of the source domain with the statistics stored in the Batch Normalization (BN) layers of the source domain model, explicitly aligns it with the feature distribution of the target domain samples, and minimizes the distribution alignment loss so that the feature distributions of the source and target domains are brought as close as possible. It performs fuzzy clustering of the target domain sample features based on the predictions of the classifier taken from the source domain model, uses the cluster membership degrees as soft labels for the target domain samples, and calculates the cross-entropy loss between the soft labels and the classifier's predictions for the target domain samples, together with the information maximization loss for the target domain samples. The target domain model is trained with all of these losses, achieving unsupervised domain adaptation without source domain data, correcting some target domain samples that the classifier initially misclassified, and improving classification accuracy.
[0042] The beneficial effects of this invention are as follows:
[0043] (1) The method makes full use of the statistics stored in the parameters of the source domain network model, namely the mean and variance of the training samples, to approximate the feature distribution of the source domain samples. This allows explicit distribution alignment with the target domain samples and avoids the problem that alignment is impossible when the source domain samples cannot be obtained.
[0044] (2) Although the pseudo-labels obtained by direct clustering are more accurate than the direct predictions of the model's classifier, some misclassified samples are still too far from the decision boundary to be corrected by clustering. The smooth soft labels, obtained from the cosine distance between a sample and the cluster centers together with the temperature parameter, carry more information about the target domain samples.
[0045] (3) Compared with traditional unsupervised domain adaptation methods, this invention achieves higher classification accuracy and does not require source domain data, using only the pre-trained model and target domain samples, and is therefore more widely applicable;
[0046] (4) The effectiveness was validated on the SVHN, MNIST, USPS, Office-31 and Office-Home datasets. The target domain model achieved an average accuracy of 89.5% over the six transfer scenarios formed by pairing the three subsets of the Office-31 dataset, and an average accuracy of 72.4% over the twelve transfer scenarios formed by pairing the four subsets of the Office-Home dataset.
Description of the Drawings
[0047] Figure 1 is a flowchart of the present invention;
[0048] Figure 2 is a schematic diagram of the unsupervised domain adaptation method without source domain data according to the present invention.
Detailed Implementation
[0049] The present invention will be further described in detail below with reference to embodiments, but the scope of protection of the present invention is not limited thereto.
[0050] This invention relates to an unsupervised domain adaptation method without source domain data. It reuses the classifier of the source domain model and approximates the feature distribution of the source domain with the statistics stored in the Batch Normalization (BN) layers of the source domain model, namely the global mean and variance of the model's training samples, thereby explicitly minimizing the distribution difference between the source and target domains. Since the source domain classifier's predictions for target domain samples contain noise, this invention performs soft clustering of the target domain sample features based on the classifier output to obtain smooth labels. Compared with one-hot pseudo-labels, the membership degrees obtained by soft clustering carry more information about the target domain samples and can, to some extent, correct target domain samples that the source domain classifier struggles to distinguish. Furthermore, the method employs an information maximization loss to raise the confidence of sample predictions and prevent collapsed solutions, further improving the model's classification performance and robustness in the target domain.
[0051] In this invention, it should be noted that the subscript s represents the source domain and the subscript t represents the target domain.
[0052] The method includes the following steps:
[0053] Step 1: Select a dataset, determine the source domain and target domain, and train the model with labeled source domain samples to obtain a pre-trained source domain model.
[0054] In step 1, a public dataset is selected, and the domains within the dataset are paired to form multiple transfer scenarios. In this invention, two public datasets, Office-31 and Office-Home, are used as experimental datasets. Office-31 is a small dataset containing 31 classes of photos taken in an office environment, with three subsets; Office-Home is a medium-sized dataset containing 65 classes of photos, with four subsets. When training the source domain model on the labeled source domain dataset, label smoothing is added to the standard cross-entropy loss to increase robustness.
[0055] The cross-entropy loss is computed with label smoothing, and the objective function is:
[0056] $$\min_{f_s} \; -\mathbb{E}_{x_s \in X_s} \sum_{k=1}^{K} \tilde{q}_k \log \sigma_k(f_s(x_s)) \quad (1)$$
[0057] where $f_s$ denotes the pre-trained source domain model, consisting of the feature extractor $g_s$ and the classifier $h_s$, so that for an input $x$, $f_s(x) = h_s(g_s(x))$; $K$ is the number of classes and $k$ indexes a class; $X_s$ is the source domain sample set; $q_k$ is the label of the source domain sample $x_s$, and $\tilde{q}_k = (1 - \alpha) q_k + \alpha / K$ is the smoothed label; $\alpha$ is the smoothing coefficient, $0 < \alpha < 1$;
[0058] $\sigma(\cdot)$ denotes the softmax normalization of a vector. Given a vector $a$ and a temperature parameter $T$, $\sigma_k(a)$ is the value of the k-th dimension of $\sigma(a)$:
[0059] $$\sigma_k(a) = \frac{\exp(a_k / T)}{\sum_{j} \exp(a_j / T)}$$
[0060] where $a_k$ is the value of the k-th dimension of the vector $a$, $T$ is the temperature parameter, and $j$ indexes the dimensions of $a$. In equation (1), $T = 1$.
[0061] Step 2: Initialize the target domain model using the source domain model, including the feature extractor and classifier; the feature extractor of the target domain model will be trained and optimized subsequently, while the classifier will remain unchanged.
[0062] In this invention, after initialization, $g_t = g_s$ and $h_t = h_s$, and $h_t$ is not updated afterwards.
[0063] Step 3: Approximate the global feature distribution of the source domain samples with the statistics stored in the BN layers of the source domain model, use it to explicitly align with the feature distribution of the target domain samples, and calculate the distribution alignment loss $L_{BN}$;
[0064] In step 3, the KL divergence (relative entropy) is computed between the Gaussian distribution given by the mean and variance of each channel in each BN layer of the source domain model and the Gaussian distribution given by the mean and variance of the same channel for the current batch of target domain samples in the corresponding BN layer. The average of these KL divergences measures the distance between the feature distributions of the source and target domain samples. The loss function is as follows:
[0065] $$L_{BN} = \frac{1}{M} \sum_{m=1}^{M} \frac{1}{C_m} \sum_{c_m=1}^{C_m} D_{KL}\left( \mathcal{N}\big(\mu^{s}_{m,c_m}, (\sigma^{s}_{m,c_m})^2\big) \,\big\|\, \mathcal{N}\big(\mu^{t}_{m,c_m}, (\sigma^{t}_{m,c_m})^2\big) \right) \quad (2)$$
[0066] where $M$ is the total number of BN layers in the model and $C_m$ is the total number of channels in the m-th BN layer; $\mu^{s}_{m,c_m}$ and $(\sigma^{s}_{m,c_m})^2$ are the mean and variance stored in the $c_m$-th channel of the m-th BN layer of the source domain model; $\mu^{t}_{m,c_m}$ and $(\sigma^{t}_{m,c_m})^2$ are the mean and variance of the $c_m$-th channel when the current batch passes through the m-th BN layer of the target domain model. Minimizing this loss function uses the Gaussian distributions given by the means and variances in the BN layers to approximate the source domain feature distribution, which cannot be obtained directly because the source domain samples are unavailable, and aligns it with the feature distribution of the target domain.
[0067] Step 4: Based on the predictions of the target domain model's classifier, perform fuzzy clustering on the features of the target domain samples, use the cluster membership degrees as soft labels for the target domain samples, and calculate the cross-entropy loss $L_{clu}$ between the soft labels and the classifier's predictions for the target domain samples;
[0068] Step 4 includes the following steps:
[0069] Step 4.1: To alleviate the noise in the source domain classifier's predictions for target domain samples, this invention introduces a soft label loss; the soft label generation process is similar to fuzzy clustering. First, the cluster centers are initialized as the weighted average of the extracted features, using the probabilities output by the target domain model's classifier as weights:
[0070] $$\delta_k = \frac{\sum_{x_t \in B_t} \sigma_k(f_t(x_t)) \, g_t(x_t)}{\sum_{x_t \in B_t} \sigma_k(f_t(x_t))} \quad (3)$$
[0071] In equation (3), $\delta_k$ is the cluster center of the k-th class; $f_t$ is the target domain model, consisting of the feature extractor $g_t$ and the classifier $h_t$, so that for an input $x$, $f_t(x) = h_t(g_t(x))$; $x_t$ is a target domain sample; $B_t$ is the batch of target domain samples currently read in; $\sigma(\cdot)$ denotes the softmax normalization of a vector; and the superscript T denotes the transpose of a vector.
[0072] Step 4.2: Although directly computing the distance from sample features to the cluster centers can yield one-hot pseudo-labels, some misclassified samples are too far from the decision boundary to be corrected by clustering alone. To reduce the impact of erroneous pseudo-labels, the one-hot pseudo-labels are replaced with smooth soft labels. This reduces overconfidence in erroneous labels and ultimately improves the model's generalization ability. Using the cluster centers $\delta_k$, compute the cosine distance from the sample to each cluster center, take the reciprocal, and normalize with softmax to obtain the predicted distribution of the sample. A temperature parameter $T$ ($0.6 \le T \le 1.2$) is added to adjust the smoothness of the soft labels:
[0073] $$\hat{y}_k = \frac{\exp\big(1 / (T \cdot D(g_t(x_t), \delta_k))\big)}{\sum_{j=1}^{K} \exp\big(1 / (T \cdot D(g_t(x_t), \delta_j))\big)} \quad (4)$$
[0074] where $D$ denotes the cosine distance and $\hat{y}_k$ is the probability, or membership degree, of the k-th class in the soft label obtained from clustering; the temperature parameter satisfies $0.6 \le T \le 1.2$;
[0075] Step 4.3: Calculate the cross-entropy loss between the soft label and the probability distribution output by the model's classifier for the target domain samples. The loss function is:
[0076] $$L_{clu} = -\mathbb{E}_{x_t} \sum_{k=1}^{K} \hat{y}_k \log \sigma_k(f_t(x_t)) \quad (5)$$
[0077] This loss corrects, to a certain extent, the predictions for target domain samples that were initially misclassified, thereby optimizing the model.
[0078] Step 5: Employ the information maximization loss $L_{IM}$, which comprises an entropy-minimization loss and an average-entropy-maximization loss; it raises the confidence of the sample predictions and avoids collapsed solutions.
[0079] In step 5, the information maximization loss comprises the entropy-minimization loss $L_{ent}$ and the average-entropy-maximization loss $L_{div}$. It raises the prediction confidence for target domain samples while avoiding the collapsed solution in which all samples are assigned to only a few classes. Specifically,
[0080] $L_{IM} = L_{ent} + L_{div}$ (6)
[0081] where $L_{ent}$ is the entropy-minimization loss,
[0082] $$L_{ent} = -\mathbb{E}_{x_t} \sum_{k=1}^{K} \sigma_k(f_t(x_t)) \log \sigma_k(f_t(x_t))$$
[0083] and $L_{div} = \sum_{k=1}^{K} \hat{p}_k \log \hat{p}_k$ is the average-entropy-maximization loss, where $\hat{p}_k = \mathbb{E}_{x_t}[\sigma_k(f_t(x_t))]$ is the average membership degree of the k-th class.
[0084] Step 6: Jointly train the target domain model with the three losses above, namely the distribution alignment loss $L_{BN}$, the cross-entropy loss $L_{clu}$, and the information maximization loss $L_{IM}$, to improve the recognition accuracy on target domain samples.
[0085] In step 6, combining the three losses above, the complete objective function to be optimized is:
[0086] $L_{gt} = L_{IM} + \beta L_{BN} + \gamma L_{clu}$ (7)
[0087] where $\beta$ and $\gamma$ are the hyperparameters weighting the two corresponding losses, $\beta, \gamma \in [0.6, 1.0]$.
[0088] In this invention, a specific embodiment is given:
[0089] Step 1: Select the Amazon subset of the Office-31 dataset as the source domain training set and the Webcam subset as the target domain. Amazon contains 2817 images from online e-commerce listings with a single background, and Webcam contains 795 noisy, low-resolution images; both cover the same 31 classes.
[0090] Step 2: Train the source domain model on the Amazon subset with ResNet50 as the backbone. Replace the last fully connected layer of the ResNet with a 256-dimensional adaptation layer, add a batch normalization (BN) layer after the adaptation layer, and finish with a 31-class classifier. When training the source domain model on the labeled source domain dataset, label smoothing is added to the standard cross-entropy loss to increase robustness. The smoothing parameter α is 0.1, and the batch size is 64.
[0091] The above operations can be performed on the dataset using deep learning frameworks such as PyTorch. The images are input into the DataLoader, the data in the DataLoader are traversed and input into the encoder to obtain their model outputs, the loss is calculated, and the model is optimized using the SGD optimizer.
[0092] Step 3: Initialize the target domain model using the source domain model.
[0093] Step 4: Align the Gaussian distributions $\mathcal{N}(\mu, \sigma^2)$ given by the mean and variance of each channel in each BN layer of the source domain model with the Gaussian distributions given by the per-channel mean and variance of the current batch of target domain samples in the corresponding BN layers. The model has 54 BN layers in total; the average of their relative entropies (KL divergences), $L_{BN}$, serves as the measure of the distance between the feature distributions of the source and target domain samples.
[0094] Step 5: Use the classifier output as fuzzy membership degrees to perform soft clustering on the target domain sample features, apply a temperature parameter T = 0.8 to obtain smooth soft labels, and calculate the loss $L_{clu}$ between the soft labels and the classifier predictions.
[0095] Step 6: Employ the information maximization loss $L_{IM}$, which comprises the entropy-minimization loss $L_{ent}$ and the average-entropy-maximization loss $L_{div}$. It raises the prediction confidence for target domain samples while avoiding the collapsed solution in which all samples are assigned to only a few classes.
[0096] Step 7: Combine the three losses above, applying weights to the alignment and clustering losses. The feature extractor of the target domain model is optimized, while the classifier is frozen and not updated.
[0097] The target domain model is trained with the SGD optimizer with a momentum of 0.9, a weight decay of $10^{-3}$, and a batch size of 64. The learning rate follows the schedule $lr = lr_0 (1 + 10p)^{-0.75}$, where $lr_0$ is the initial learning rate, set to 0.01 for the adaptation layer and classifier and to 0.001 for all other layers, and $p$ increases from 0 to 1 as the number of iterations grows. During training, the soft labels obtained from clustering are updated once per epoch, with hyperparameters β = 0.3 and γ = 1.0, and training runs for 20 epochs.
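The learning-rate schedule above can be written as a short helper. This is only a sketch: the function name and the choice to compute p as the fraction of completed iterations are assumptions, while the constants 10 and 0.75 and the initial rates 0.01 and 0.001 are the values stated in the embodiment.

```python
def scheduled_lr(lr0, iteration, max_iterations, scale=10.0, power=0.75):
    """lr = lr0 * (1 + scale * p) ** (-power), with p going from 0 to 1."""
    p = iteration / max_iterations  # assumed: fraction of training completed
    return lr0 * (1.0 + scale * p) ** (-power)

# Initial rates from the embodiment: 0.01 for the adaptation layer and
# classifier, 0.001 for all other layers. The iteration count is hypothetical.
max_iterations = 1000
head_lrs = [scheduled_lr(0.01, i, max_iterations) for i in (0, 500, 1000)]
backbone_lrs = [scheduled_lr(0.001, i, max_iterations) for i in (0, 500, 1000)]
```

The rate starts at lr0 and decays smoothly, ending at lr0 · 11^(-0.75) when p reaches 1, so both parameter groups keep their 10:1 ratio throughout training.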
[0098] Training the target domain model according to the present invention transfers the knowledge of the source domain model to the learning of unlabeled target domain data and reduces the impact of the distribution shift between data from different domains. Computer-readable media, programs, and devices implementing this method can be developed on this basis.
[0099] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0100] This invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0101] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0102] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0103] Although preferred embodiments of the invention have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including both the preferred embodiments and all changes and modifications falling within the scope of the invention.
[0104] Obviously, those skilled in the art can make various modifications and variations to this invention without departing from its spirit and scope. Therefore, if these modifications and variations fall within the scope of the claims of this invention and their equivalents, this invention also intends to include these modifications and variations.
Claims
1. An unsupervised domain adaptation method for passive domain data, characterized in that the method includes the following steps:
Step 1: Select a source domain training set, where the source domain contains images with a single background and the target domain contains low-resolution images with noise; train the model with labeled source domain samples to obtain a pre-trained source domain model.
Step 2: Initialize the target domain model, including the feature extractor and the classifier, with the source domain model.
Step 3: Approximate the feature distribution of the source domain with the statistical information stored in the Batch Normalization (BN) layers of the source domain model, explicitly align it with the feature distribution of the target domain samples, and calculate the distribution alignment loss $L_{BN}$. The statistical information of a BN layer comprises the mean and variance. For each channel of each BN layer, the KL divergence is calculated between the Gaussian distribution given by the mean and variance stored in the source domain model and the Gaussian distribution given by the mean and variance of the current batch of target domain samples at the same BN layer. The average of these KL divergences measures the distance between the feature distributions of the source domain and the target domain samples. The distribution alignment loss $L_{BN}$ is:
$$L_{BN} = \frac{1}{M}\sum_{m=1}^{M}\frac{1}{C_m}\sum_{c=1}^{C_m} D_{KL}\left(\mathcal{N}\left(\mu^{s}_{m,c}, v^{s}_{m,c}\right) \,\|\, \mathcal{N}\left(\mu^{t}_{m,c}, v^{t}_{m,c}\right)\right) \tag{2}$$
where $M$ is the total number of BN layers in the model; $C_m$ is the total number of channels in the m-th BN layer; $\mu^{s}_{m,c}$ and $v^{s}_{m,c}$ are the mean and variance stored in the c-th channel of the m-th BN layer of the source domain model; $\mu^{t}_{m,c}$ and $v^{t}_{m,c}$ are the mean and variance of the c-th channel when the current batch passes through the m-th BN layer of the target domain model; and $D_{KL}$ is the KL divergence. The loss function $L_{BN}$ is minimized.
Step 4: Based on the predictions of the classifier of the target domain model, perform fuzzy clustering on the features of the target domain samples, use the cluster membership degree as the soft label of each target domain sample, and calculate the cross-entropy loss $L_{ce}$ between the soft label and the model classifier's prediction for the target domain sample. The steps include:
Step 4.1: Using the probabilities output by the target domain model classifier as weights, perform a weighted average over the extracted features to initialize the cluster centers:
$$c_k = \frac{\sum_{x_t \in B} \sigma_k\left(f_t(x_t)\right) g_t(x_t)}{\sum_{x_t \in B} \sigma_k\left(f_t(x_t)\right)} \tag{3}$$
where $c_k$ is the cluster center of the k-th class; $f_t$ is the target model, comprising the feature extractor $g_t$ and the classifier $h_t$, so that $f_t(x) = h_t(g_t(x))$ for a given input $x$; $x_t$ is a target domain sample; $B$ is the batch of currently read target domain samples; $\sigma$ is the softmax normalization operation on a given vector, with $\sigma_k$ its k-th component; and the superscript T indicates the transpose of a vector.
Step 4.2: Based on the cluster centers $c_k$, calculate the cosine distance from the sample to each cluster center, take the reciprocal, and normalize with softmax to obtain the predicted distribution of the sample, with a temperature parameter $\tau$ added to adjust the smoothness of the soft label:
$$q_k(x_t) = \frac{\exp\left(1 / \left(\tau\, D_{cos}\left(g_t(x_t), c_k\right)\right)\right)}{\sum_{j}\exp\left(1 / \left(\tau\, D_{cos}\left(g_t(x_t), c_j\right)\right)\right)} \tag{4}$$
where $D_{cos}$ is the cosine distance; $q_k(x_t)$ is the probability, or membership degree, of the soft label obtained from clustering in the k-th class; and the temperature parameter $\tau$ satisfies...;
Step 4.3: Calculate the cross-entropy loss using the soft label and the probability distribution output by the model classifier for the target domain samples:
$$L_{ce} = -\mathbb{E}_{x_t \in B}\sum_{k} q_k(x_t)\log \sigma_k\left(f_t(x_t)\right) \tag{5}$$
which corrects target domain sample predictions that are misclassified by the source domain classifier.
Step 5: Calculate the information maximization loss $L_{IM}$ for the target domain samples. The information maximization loss includes an entropy minimization loss and an average entropy maximization loss.
Step 6: Jointly train the target domain model with the distribution alignment loss $L_{BN}$, the cross-entropy loss $L_{ce}$, and the information maximization loss $L_{IM}$ to achieve unsupervised domain adaptation of passive domain data.
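For illustration only, steps 3 and 4 of claim 1 can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the KL direction (source relative to target), the averaging order (channels first, then layers), the small `eps` stabilizers, and the toy features and probabilities are all hypothetical choices made for the example.

```python
import math


def gaussian_kl(mu_s, var_s, mu_t, var_t, eps=1e-5):
    """KL(N(mu_s, var_s) || N(mu_t, var_t)) for one channel (direction is an assumption)."""
    var_s, var_t = var_s + eps, var_t + eps
    return 0.5 * (math.log(var_t / var_s) + (var_s + (mu_s - mu_t) ** 2) / var_t - 1.0)


def bn_alignment_loss(source_stats, target_stats):
    """Average the per-channel KL over channels, then over BN layers.

    Each argument holds one list per BN layer; each inner list holds one
    (mean, variance) pair per channel.
    """
    per_layer = []
    for src, tgt in zip(source_stats, target_stats):
        kls = [gaussian_kl(ms, vs, mt, vt) for (ms, vs), (mt, vt) in zip(src, tgt)]
        per_layer.append(sum(kls) / len(kls))
    return sum(per_layer) / len(per_layer)


def softmax(values, temperature=1.0):
    """Numerically stable softmax with a temperature divisor."""
    scaled = [v / temperature for v in values]
    top = max(scaled)
    exps = [math.exp(v - top) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def init_centers(features, probs):
    """Step 4.1: probability-weighted mean of the features for each class."""
    num_classes, dim = len(probs[0]), len(features[0])
    centers = []
    for k in range(num_classes):
        weight = sum(p[k] for p in probs)
        centers.append([sum(p[k] * f[d] for f, p in zip(features, probs)) / weight
                        for d in range(dim)])
    return centers


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return max(0.0, 1.0 - dot / norm)


def soft_labels(feature, centers, tau=1.0, eps=1e-8):
    """Step 4.2: softmax over reciprocal cosine distances, smoothed by tau."""
    inverse = [1.0 / (cosine_distance(feature, c) + eps) for c in centers]
    return softmax(inverse, temperature=tau)


def soft_cross_entropy(soft_label, pred, eps=1e-8):
    """Step 4.3: cross-entropy between a soft label and a classifier prediction."""
    return -sum(q * math.log(p + eps) for q, p in zip(soft_label, pred))


# Toy batch: the last sample lies near class 1 but the classifier favors class 0.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.55, 0.45]]
centers = init_centers(features, probs)
q_last = soft_labels(features[3], centers)
```

In this toy batch the soft label of the last sample favors class 1 even though the classifier's own prediction favors class 0, which is the kind of correction step 4.3 relies on.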
2. The unsupervised domain adaptation method for passive domain data according to claim 1, characterized in that: in step 1, the cross-entropy loss is calculated after label smoothing, and the objective function is:
$$L_{src} = -\mathbb{E}_{(x_s, y_s) \in D_s}\sum_{k=1}^{K} q^{ls}_k \log \sigma_k\left(f_s(x_s)\right) \tag{1}$$
where $f_s$ is the pre-trained source domain model, comprising the feature extractor $g_s$ and the classifier $h_s$, so that $f_s(x) = h_s(g_s(x))$ for a given input $x$; $K$ is the number of categories and $k$ is any category; $D_s$ is the given source domain sample set; $y_s$ is the label of the source domain sample $x_s$, with $q_k = 1$ if $k = y_s$ and $q_k = 0$ otherwise; $q^{ls}$ is the smoothed label, satisfying $q^{ls}_k = (1 - \alpha)\, q_k + \alpha / K$, where $\alpha$ is the smoothing coefficient; $\sigma$ is the softmax normalization operation on a given vector with temperature parameter $T$, and $\sigma_k(a)$ denotes the k-th dimension of the result of applying $\sigma$ to a vector $a$:
$$\sigma_k(a) = \frac{\exp(a_k / T)}{\sum_{i}\exp(a_i / T)}$$
where $a_k$ is the k-th dimension of the vector $a$, $i$ indexes the dimensions, and $T$ is 1.
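As a hedged illustration of the label-smoothed cross-entropy described in claim 2, the sketch below computes the loss for a single sample in plain Python. The function names and the default `alpha=0.1` are assumptions made for the example; the claim text does not preserve the value of the smoothing coefficient.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log of the temperature-scaled softmax, computed stably."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    log_norm = top + math.log(sum(math.exp(z - top) for z in scaled))
    return [z - log_norm for z in scaled]


def smooth_label(label, num_classes, alpha):
    """Mix the one-hot label with a uniform distribution over all classes."""
    return [(1.0 - alpha) * (1.0 if k == label else 0.0) + alpha / num_classes
            for k in range(num_classes)]


def smoothed_cross_entropy(logits, label, alpha=0.1, temperature=1.0):
    """Cross-entropy between the smoothed label and the softmax prediction.

    alpha=0.1 is a hypothetical default, not a value taken from the claims.
    """
    log_probs = log_softmax(logits, temperature)
    target = smooth_label(label, len(logits), alpha)
    return -sum(q * lp for q, lp in zip(target, log_probs))
```

With `alpha=0` the function reduces to the ordinary cross-entropy, and with uniform logits the loss equals the logarithm of the number of classes regardless of `alpha`.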
3. The unsupervised domain adaptation method for passive domain data according to claim 1, characterized in that: the classifier of the target domain model is kept fixed during training.
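4. The unsupervised domain adaptation method for passive domain data according to claim 2, characterized in that: in step 5, the information maximization loss $L_{IM}$ satisfies:
$$L_{IM} = L_{ent} + L_{div} \tag{6}$$
where $L_{ent}$ is the entropy minimization loss, $L_{ent} = -\mathbb{E}_{x_t}\sum_{k=1}^{K}\sigma_k\left(f_t(x_t)\right)\log \sigma_k\left(f_t(x_t)\right)$; $L_{div}$ is the average entropy maximization loss, $L_{div} = \sum_{k=1}^{K}\bar{p}_k \log \bar{p}_k$; $\sigma_k\left(f_t(x_t)\right)$ is the probability that the target domain model assigns to the k-th class for the target domain sample $x_t$; and $\bar{p}_k = \mathbb{E}_{x_t}\left[\sigma_k\left(f_t(x_t)\right)\right]$ is the average membership degree of the k-th class.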
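The two terms of the information maximization loss in claim 4 can be illustrated with a short plain-Python sketch. It assumes the common convention in which the average-entropy term is written as a negative entropy, so that minimizing the sum both sharpens individual predictions and spreads the batch over all classes. The function name and the `eps` stabilizer are hypothetical.

```python
import math


def information_maximization_loss(probs, eps=1e-8):
    """Return (total, l_ent, l_div) for a batch of predicted class distributions.

    l_ent: mean per-sample entropy (small when each prediction is confident).
    l_div: negative entropy of the batch-mean prediction (small when the
           classes are used evenly across the batch).
    """
    n, num_classes = len(probs), len(probs[0])
    l_ent = -sum(p * math.log(p + eps) for row in probs for p in row) / n
    mean_pred = [sum(row[k] for row in probs) / n for k in range(num_classes)]
    l_div = sum(m * math.log(m + eps) for m in mean_pred)
    return l_ent + l_div, l_ent, l_div
```

Under this convention, a batch that is both confident and balanced, such as `[[1, 0], [0, 1]]`, attains the lowest total, while a uniform batch or a batch collapsed onto one class scores higher.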
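5. The unsupervised domain adaptation method for passive domain data according to claim 1, characterized in that: in step 6, the complete objective function is given by formula (7), which combines the distribution alignment loss, the cross-entropy loss, and the information maximization loss using the corresponding weighting hyperparameters.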