Single-source domain generalization fault diagnosis method based on a masking feature adversarial disentangling network

By masking features within an adversarial disentangling network and combining multi-scale style enhancement with feature decoupling, the method addresses the poor generalization of deep learning fault diagnosis models trained on a single source domain under varying operating conditions. It achieves strong single-source domain generalization in fault diagnosis and improves the robustness and accuracy of the model.

CN122241415A · Pending · Publication Date: 2026-06-19 · UNIV OF SCI & TECH BEIJING


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: UNIV OF SCI & TECH BEIJING
Filing Date: 2026-02-03
Publication Date: 2026-06-19


IPC: G06F18/2415; G06F18/213; G06F18/15; G06N3/045; G06N3/0475; G06N3/094; G06N3/096; G06N3/0895; G06N5/04

AI Tagging

Application Domain

Biological models; Inference methods


Abstract

This invention discloses a single-source domain generalized fault diagnosis method based on a masking feature adversarial disentangling network, relating to the field of predictive maintenance technology in industrial artificial intelligence. The method includes: S1, performing multi-scale style enhancement on the original training signal based on a style enhancement module; S2, decoupling the enhanced signal using a masking feature adversarial disentangling module to obtain dominant features and inferior features; S3, obtaining a classification loss for the dominant features based on a first classifier and a classification loss for the inferior features based on a second classifier; S4, calculating a first loss function and a second loss function and performing two-stage adversarial training, jointly optimizing and then fixing the parameters of the feature extractor, masker, first classifier, and second classifier; S5, inputting the sample to be diagnosed into the feature extractor obtained in step S4 to obtain the dominant features to be diagnosed, and inputting these dominant features into the first classifier obtained in step S4 to obtain the predicted probability distribution over fault categories.


Description

Technical Field

[0001] This invention relates to the field of predictive maintenance technology in industrial artificial intelligence, and in particular to a single-source domain generalized fault diagnosis method based on a masking feature adversarial disentangling network.

Background Technology

[0002] In intelligent manufacturing systems, monitoring the health status of rotating machinery (such as bearings and gears) is crucial. Existing deep learning-based fault diagnosis models typically work well under the assumption that the training and testing data distributions are consistent. However, real-world industrial equipment often operates under varying conditions (such as changes in speed and load), leading to data distribution shifts and causing a sharp decline in model performance under unknown operating conditions.

[0003] Significant progress has been made in deep learning-based intelligent fault diagnosis technology, but its performance heavily relies on the assumption that training and test data are independently and identically distributed. In real-world industrial scenarios, changes in equipment operating conditions (such as speed, load, and ambient temperature) can cause shifts in the distribution of collected vibration, sound, and other signals, known as covariate shift. This results in a significant drop in diagnostic accuracy when a model trained under one operating condition is directly applied to another.

[0004] To address the distribution shift problem, Domain Adaptation (DA) in transfer learning has been extensively studied. DA methods utilize labeled raw training signals and unlabeled (or poorly labeled) target domain data, reducing inter-domain discrepancies during training through feature alignment, adversarial learning, and other methods. However, DA methods rely on the availability of target domain data during training, limiting their application in novel or unknown scenarios and often proving impractical in engineering practice. Domain generalization methods, aiming to train models with strong generalization capabilities using only the raw training signals, represent a more suitable solution for real-world needs.

[0005] Domain generalization (DG) assumes that target domain data are completely unavailable during the training phase and learns a model that can generalize to any unknown target domain using only the original training signals from one or more source domains. Existing DG methods can be divided into three categories: (1) Data manipulation methods, which artificially create domain diversity by augmenting, mixing, or generating from the original training signals, for example by mixing the style statistics of different samples to generate new samples. Their drawback is that the distribution of the generated samples may not cover real, complex domain variations. (2) Learning strategy methods, which encourage generalization by improving the training process, such as meta-learning, gradient operations, or specific regularization terms. These methods can be unstable during training and are sensitive to the quantity and quality of the source domains. (3) Representation learning methods, whose core is to learn domain-invariant feature representations. Common techniques include adversarial learning (training a domain classifier and deceiving it), feature decoupling (separating features into domain-shared and domain-private parts), and causal inference. The current bottleneck is that it is difficult to separate task-related features from domain-related features completely and cleanly in a complex, high-dimensional feature space; moreover, most methods require multiple source domains to define the concept of a "domain" for alignment or decoupling. This requirement is particularly problematic in industrial fault diagnosis, where fault samples are scarce and sufficient data are typically available from only a single stable operating condition. Single-source domain DG has therefore become a crucial but highly challenging research direction. Existing single-source methods often achieve only limited generalization gains because they lack diversity and contrastive information.

[0006] However, existing domain generalization methods face the following challenges:

[0007] Under single-source domain conditions, data diversity is insufficient, making it difficult to simulate complex domain changes; and it is difficult to effectively decouple, from high-dimensional features, the essential features that are related to the fault category but not to the operating conditions.

Summary of the Invention

[0008] To address the technical problems existing in the prior art, embodiments of the present invention provide a single-source domain generalized fault diagnosis method based on a masking feature adversarial disentangling network. The technical solution is as follows:

[0009] In one aspect, a single-source domain generalized fault diagnosis method based on a masking feature adversarial disentangling network is provided, including the following steps:

[0010] S1. Obtain the original training signal, and perform multi-scale style enhancement on the original training signal based on the style enhancement module to obtain the enhanced signal;

[0011] S2. Based on the feature extractor provided in the masking feature adversarial disentangling module, embedded features are obtained from the original training signal and the enhanced signal. Based on the feature extractor and the masker provided in the masking feature adversarial disentangling module, the enhanced signal is decoupled to obtain dominant features and inferior features.

[0012] S3. Obtain the embedding feature classification score based on the first classifier and the embedding feature; obtain the dominant feature classification score based on the first classifier and the dominant feature and calculate the dominant feature classification loss; obtain the inferior feature classification score based on the second classifier and the inferior feature and calculate the inferior feature classification loss; calculate the divergence loss based on the dominant feature classification score and the embedding feature classification score.

[0013] S4. Based on the divergence loss, the dominant feature classification loss, and the inferior feature classification loss, a first loss function and a second loss function are obtained, and a two-stage adversarial iterative training is performed based on the first loss function and the second loss function. In the first stage of iterative training, the feature extractor, the first classifier, and the second classifier are jointly optimized, and in the second stage of iterative training, the parameters of the feature extractor, the masker, the first classifier, and the second classifier are fixed.

[0014] S5. Input the sample to be diagnosed into the feature extractor obtained in step S4 to obtain the dominant features to be diagnosed, and input the dominant features to be diagnosed into the first classifier obtained in step S4 to obtain the fault category prediction probability distribution.

[0015] Optionally, the style enhancement module in step S1 includes multiple style feature generation units arranged in parallel;

[0016] Each of the multiple style feature generation units includes a converter, an inverse converter, and an instance normalization layer.

[0017] Optionally, step S1 includes:

[0018] S11. An adaptive instance normalization layer is introduced into each of the style feature generation units, and multiple stylization signals are obtained based on the original training signal, the converter, and the inverse converter.

[0019] S12. The original training signal and the multiple stylized signals are linearly mixed and normalized to obtain the enhanced signal.

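[0020] Optionally, the i-th stylized signal $\tilde{x}_i$ among the plurality of stylized signals is:

[0021] $$\tilde{x}_i = \mathrm{ConvT}_i\big(f_\gamma(\epsilon) \odot \mathrm{IN}(\mathrm{Conv}_i(x)) + f_\beta(\epsilon)\big)$$

[0022] where $\mathrm{Conv}_i$ is the converter, $\mathrm{ConvT}_i$ is the inverse converter, $x$ is the original training signal, $\mathrm{IN}(\cdot)$ is the instance normalization layer, and $f_\gamma$ and $f_\beta$ are the two linear layers in the adaptive instance normalization layer, applied to noise $\epsilon$ sampled from a standard normal distribution.

[0023] Optionally, the enhanced signal $\hat{x}$ is:

[0024] $$\hat{x} = \mathrm{Norm}\Big(w_0\,x + \sum_{i=1}^{N} w_i\,\tilde{x}_i\Big)$$

[0025] where $w_0, \dots, w_N$ are random weights sampled from a standard normal distribution, $N$ is the number of style feature generation units, and $\mathrm{Norm}(\cdot)$ is the normalization.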

[0026] Optionally, step S2 includes:

[0027] S21. Based on the original training signal and the enhanced signal, the feature extractor is used to obtain the embedded features;

[0028] S22. Based on the masker in the masking feature adversarial disentangling module, the embedded features are processed to obtain a first vector;

[0029] S23. The first vector is sampled multiple times to obtain a mask vector, and the embedded features are decoupled based on the mask vector to obtain dominant features and inferior features.

[0030] Optionally, in step S23, the mask vector is $M$, and its $j$-th element $M_j$, which serves as the masking strength of the $j$-th feature dimension in the diagnostic task, is given by:

[0031] $$M_j = \max_{k = 1, \dots, K} p_j^{(k)}$$

[0032] where $p^{(k)}$ is the probability vector obtained in the $k$-th sampling, and its expression is:

[0033] $$p_j^{(k)} = \frac{\exp\big((\ell_j + g_j^{(k)}) / \tau\big)}{\sum_{j'} \exp\big((\ell_{j'} + g_{j'}^{(k)}) / \tau\big)}$$

[0034] where $j$ is the feature dimension index, $k$ is the sampling index, $j'$ runs over all dimensions of the logits in the denominator for softmax normalization, $g_j^{(k)}$ is the Gumbel noise of the $j$-th dimension in the $k$-th sampling, and $\tau$ is the temperature parameter;

[0035] The first vector $\ell$ is:

[0036] $$\ell = W(z)$$

[0037] where $W$ is the aforementioned masker and $z$ is the embedded feature.

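[0038] Optionally, the dominant feature $z_{\mathrm{dom}}$ is:

[0039] $$z_{\mathrm{dom}} = M \odot z$$

[0040] The inferior feature $z_{\mathrm{inf}}$ is:

[0041] $$z_{\mathrm{inf}} = (1 - M) \odot z$$

[0042] where $M$ is the mask vector, $z$ is the embedded feature, and $\odot$ denotes element-wise multiplication.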

[0043] Optionally, in step S3:

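[0044] The divergence loss $\mathcal{L}_{KL}$ is:

[0045] $$\mathcal{L}_{KL} = \mathrm{KL}\big(p_z \,\|\, p_{\mathrm{dom}}\big)$$

[0046] where $p_z$ is the embedded feature classification score:

[0047] $$p_z = \mathrm{softmax}\big(C_1(z)\big)$$

[0048] where $\mathrm{softmax}(\cdot)$ is the normalized exponential function, $C_1$ is the first classifier, and $z$ is the embedded feature;

[0049] and $p_{\mathrm{dom}}$ is the dominant feature classification score:

[0050] $$p_{\mathrm{dom}} = \mathrm{softmax}\big(C_1(z_{\mathrm{dom}})\big)$$

[0051] where $z_{\mathrm{dom}}$ is the aforementioned dominant feature.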

[0052] Optionally, in step S4:

[0053] The first loss function is:

[0054]

[0055] where the terms are the dominant feature classification loss, the inferior feature classification loss, the divergence loss, and the supervised contrastive learning loss;

[0056] The second loss function is:

[0057]

[0058] where the additional term is the Manhattan distance loss, and the three coefficients are balancing hyperparameters.

[0059] In another aspect, a single-source domain generalized fault diagnosis device based on a masking feature adversarial disentangling network is provided, the fault diagnosis device comprising:

[0060] a processor; and

[0061] a memory storing computer-readable instructions that, when executed by the processor, implement any of the above-described single-source domain generalized fault diagnosis methods based on a masking feature adversarial disentangling network.

[0062] In a further aspect, a computer-readable storage medium is provided, in which at least one instruction is stored, the at least one instruction being loaded and executed by a processor to implement any of the above-described single-source domain generalized fault diagnosis methods based on a masking feature adversarial disentangling network.

[0063] The beneficial effects of the technical solutions provided in the embodiments of the present invention include at least the following:

[0064] This invention provides a masking feature decoupling mechanism based on an adversarial game. Through a learnable mask driven by adversarial loss, it automatically and explicitly decouples neural network features into dominant and inferior parts. This mechanism does not rely on any domain label, and it achieves self-discovery and enhancement of domain-invariant features within a single source domain. Furthermore, the invention combines multi-scale random style enhancement with a masking feature adversarial disentangling module. Style enhancement is responsible for "creating differences," providing a contrastive basis for decoupling; feature decoupling is responsible for "learning invariance from differences." Working together, they constitute a complete domain generalization solution for single-source domain conditions. In addition, the divergence loss ensures that the decoupled dominant features do not lose the discriminative semantics of the original features, while the supervised contrastive loss and the Manhattan distance loss force feature similarity between enhanced views of different styles, thereby stabilizing the adversarial training process and improving the robustness of the learned features. Finally, this invention employs a two-stage training strategy that performs feature learning and feature decoupling in stages, avoiding the instability of multi-objective joint optimization and ensuring that the model first learns good basic discriminative features before performing refined feature decoupling.

Attached Figure Description

[0065] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0066] Figure 1 is a flowchart of a single-source domain generalized fault diagnosis method based on a masking feature adversarial disentangling network provided by an embodiment of the present invention;

[0067] Figure 2 is a framework diagram of a single-source domain generalized fault diagnosis method based on a masking feature adversarial disentangling network provided by an embodiment of the present invention;

[0068] Figure 3 is a framework diagram of the style enhancement module in a single-source domain generalized fault diagnosis method based on a masking feature adversarial disentangling network provided in an embodiment of the present invention;

[0069] Figure 4 is a schematic diagram of a subway train bogie axle box fault diagnosis platform provided in an embodiment of the present invention;

[0070] Figure 5 is a diagram showing the diagnostic results of eight different methods, provided in an embodiment of the present invention, on 20 single-source domain generalization tasks on the subway train bogie bearing dataset;

[0071] Figure 6 is a feature distribution map, provided in an embodiment of the present invention, in which the features obtained by the feature extractor are mapped to a two-dimensional space;

[0072] Figure 7 shows the kernel density estimation diagrams of the source and target domain feature distributions for four methods provided in an embodiment of the present invention;

[0073] Figure 8 shows the kernel density estimation diagrams of the source and target domain feature distributions for four other methods provided in an embodiment of the present invention.

Detailed Implementation

[0074] The technical solution of the present invention will now be described with reference to the accompanying drawings.

[0075] In embodiments of the present invention, words such as "exemplarily," "for example," etc., are used to indicate that something is an example, illustration, or description. Any embodiment or design described as "exemplary" in the present invention should not be construed as being more preferred or advantageous than other embodiments or designs. Specifically, the use of the word "exemplary" is intended to present the concept in a concrete manner. Furthermore, in embodiments of the present invention, the meaning expressed by "and / or" can be both, or either one.

[0076] In the embodiments of this invention, the terms "image" and "picture" may sometimes be used interchangeably. It should be noted that, without emphasizing the distinction between them, they convey the same meaning. Similarly, the terms "of," "corresponding (relevant)," and "corresponding" may sometimes be used interchangeably. It should be noted that, without emphasizing the distinction between them, they convey the same meaning.

[0077] In this embodiment of the invention, a subscripted symbol such as $W_1$ may sometimes be written in a non-subscript form such as W1. When the difference is not emphasized, the meaning they express is the same.

[0078] To make the technical problems, technical solutions and advantages of the present invention clearer, a detailed description will be given below in conjunction with the accompanying drawings and specific embodiments.

[0079] This invention provides a single-source domain generalized fault diagnosis method based on a masking feature adversarial disentangling network. As shown in the flowchart of Figure 1, the method includes the following steps:

[0080] S1. Obtain the original training signal, and perform multi-scale style enhancement on the original training signal based on the style enhancement module to obtain the enhanced signal;

[0081] S2. The enhanced signal is decoupled based on the masking feature adversarial disentangling module to obtain dominant features and inferior features. The masking feature adversarial disentangling module includes a feature extractor and a masker.

[0082] S3. Obtain the embedding feature classification score based on the first classifier and the embedding feature; obtain the dominant feature classification score based on the first classifier and the dominant feature and calculate the dominant feature classification loss; obtain the inferior feature classification score based on the second classifier and the inferior feature and calculate the inferior feature classification loss; calculate the divergence loss based on the dominant feature classification score and the embedding feature classification score.

[0083] S4. Based on the divergence loss, the dominant feature classification loss, and the inferior feature classification loss, a first loss function and a second loss function are obtained. Two-stage adversarial training is performed based on the first loss function and the second loss function to jointly optimize the feature extractor, the first classifier, and the second classifier, and the parameters of the feature extractor, the masker, the first classifier, and the second classifier are fixed.

[0084] S5. Input the sample to be diagnosed into the feature extractor obtained in step S4 to obtain the dominant features to be diagnosed, and input the dominant features to be diagnosed into the first classifier obtained in step S4 to obtain the fault category prediction probability distribution.

[0085] This method does not rely on explicit multiple domains or domain labels to define invariant features. Instead, it designs a self-supervised, adversarial game-based feature decoupling mechanism. This mechanism allows the model to learn to construct "pseudo-domain variations" within a single source domain through style enhancement, identifying and strengthening "dominant features" that are crucial for classification and unaffected by style changes, while suppressing "disadvantageous features" that change with style. This approach more closely approximates the practical constraints of a single source domain, providing a new technical path for achieving strong generalization fault diagnosis.

[0086] Furthermore, this method performs feature learning and feature decoupling in stages, avoiding the instability of multi-objective joint optimization, ensuring that the model can first learn good basic discriminative features, and then perform refined feature decoupling operations.

[0087] This invention trains the model mainly through a style enhancement module and a masking feature adversarial disentangling module, using a two-stage training strategy. As shown in Figure 2, starting from the input original training signal, the enhanced signal is generated by the style enhancement module. The feature extractor then extracts the embedded features of the original training signal and of the enhanced signal, which are passed to the masking feature adversarial disentangling module for feature decoupling and adversarial training, and the diagnostic result is finally output. In Figure 2 and Figure 3, the original training signal is also referred to as the original signal. The source domain here refers to the domain of monitoring data, labeled with fault categories, that is used in the model training phase; it typically corresponds to a known operating condition.

[0088] Optionally, in step S1, the style enhancement module includes multiple style feature generation units arranged in parallel, and each of these units includes a converter and an inverse converter. With only a single original training signal available, the style enhancement module simulates potential changes in operating conditions (domains) and generates enhanced samples with diverse styles, providing the domain-variation contrast needed for subsequent feature decoupling. Each style feature generation unit also includes an adaptive instance normalization layer, which samples noise from a standard normal distribution to simulate random domain shifts;

[0089] Optionally, step S1 includes:

[0090] S11. Based on the original training signal, the converter, and the inverse converter, multiple stylized signals are obtained;

[0091] As shown in Figure 3, Conv1 to ConvN are the converters, and ConvT1 to ConvTN are the inverse converters.

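[0092] Optionally, in step S11, the i-th stylized signal $\tilde{x}_i$ among the plurality of stylized signals is:

[0093] $$\tilde{x}_i = \mathrm{ConvT}_i\big(f_\gamma(\epsilon) \odot \mathrm{IN}(\mathrm{Conv}_i(x)) + f_\beta(\epsilon)\big)$$

[0094] where $\mathrm{Conv}_i$ is the converter, $\mathrm{ConvT}_i$ is the inverse converter, $x$ is the original training signal, $\mathrm{IN}(\cdot)$ is the instance normalization layer, and $f_\gamma$ and $f_\beta$ are the two linear layers in the adaptive instance normalization layer, applied to noise $\epsilon$ sampled from a standard normal distribution. The two linear layers $f_\gamma$ and $f_\beta$ generate a set of scaling and offset parameters, which are injected as multiplicative and additive perturbations into the instance-normalized features to obtain the stylized signal.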
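The style injection described in this paragraph can be sketched in a few lines. This is only an illustration under simplifying assumptions: plain Python lists stand in for feature tensors, a single scalar scale and offset are used per signal rather than per channel, the converter and inverse converter are omitted, and the names `instance_norm`, `linear`, and `adain_style` are hypothetical.

```python
import random
import statistics

def instance_norm(signal, eps=1e-5):
    """Normalize one 1-D signal to zero mean and (approximately) unit variance."""
    mean = statistics.fmean(signal)
    var = statistics.fmean([(v - mean) ** 2 for v in signal])
    return [(v - mean) / (var + eps) ** 0.5 for v in signal]

def linear(noise, weight, bias):
    """A one-output linear layer: dot(weight, noise) + bias."""
    return sum(w * n for w, n in zip(weight, noise)) + bias

def adain_style(features, gamma_layer, beta_layer, rng):
    """Inject a random style into instance-normalized features.

    gamma_layer and beta_layer are (weight, bias) pairs standing in for the
    two linear layers; both read the same standard-normal noise vector.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in gamma_layer[0]]
    gamma = linear(noise, *gamma_layer)  # multiplicative perturbation
    beta = linear(noise, *beta_layer)    # additive perturbation
    styled = [gamma * v + beta for v in instance_norm(features)]
    return styled, gamma, beta
```

After the injection, the mean of the styled features equals the sampled offset and their standard deviation is close to the magnitude of the sampled scale, which is how a new random "style" is imposed on the same content.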

[0095] As shown in Figure 3, AdaIN is the step that injects the random styles.

[0096] Optionally, in step S12, the original training signal and the plurality of stylized signals are linearly mixed and normalized to obtain the enhanced signal.

[0097] Optionally, the enhanced signal $\hat{x}$ is:

[0098] $$\hat{x} = \mathrm{Norm}\Big(w_0\,x + \sum_{i=1}^{N} w_i\,\tilde{x}_i\Big)$$

[0099] where $w_0, \dots, w_N$ are random weights sampled from a standard normal distribution and $\mathrm{Norm}(\cdot)$ is min-max normalization. The original training signal $x$ and the stylized signals $\tilde{x}_1, \dots, \tilde{x}_N$ generated by all $N$ units are linearly mixed according to these random weights, and min-max normalization is then applied to obtain the final enhanced signal. This step ensures the diversity and continuity of the generated sample styles.
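As a rough illustration of the mixing step, the sketch below mixes an original signal with its stylized versions using standard-normal weights and then applies min-max normalization. Plain Python lists stand in for signals, and the function names and the small `eps` guard against a constant signal are assumptions made for this example.

```python
import random

def min_max_normalize(signal, eps=1e-12):
    """Rescale a signal to the range [0, 1]."""
    lo, hi = min(signal), max(signal)
    return [(v - lo) / (hi - lo + eps) for v in signal]

def mix_styles(original, stylized, rng):
    """Linearly mix the original and N stylized signals, then normalize.

    One weight per signal is drawn from a standard normal distribution.
    """
    signals = [original] + list(stylized)
    weights = [rng.gauss(0.0, 1.0) for _ in signals]
    mixed = [sum(w * s[t] for w, s in zip(weights, signals))
             for t in range(len(original))]
    return min_max_normalize(mixed)
```

Because the weights are resampled on every call, repeated calls on the same inputs yield differently styled enhanced signals on a common amplitude scale.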

[0100] Step S2 automatically learns to decouple the dominant features and the inferior features from the embedded features obtained by the feature extractor.

[0101] Optionally, S2 includes:

[0102] S21. Based on the original training signal and the enhanced signal, the embedded features are obtained using the feature extractor; that is, a shared feature extractor processes the original training signal and the enhanced signal separately to obtain their respective embedded features.

[0103] S22. The masker in the masking feature adversarial disentangling module processes the embedded features to obtain the first vector. The masker is a multilayer perceptron that maps the embedded feature z to a logits vector with one entry per feature dimension.

[0104] S23. The first vector is sampled multiple times to obtain a mask vector, and the embedded features are decoupled based on the mask vector to obtain the dominant features and the inferior features. For example, K Gumbel-Softmax samplings are performed on the first vector, each yielding a probability vector over the feature dimensions. An element-wise maximum is then taken over the K sampling results to obtain a mask vector M that tends toward binary values. Each element of M is close to 0 or 1, indicating the degree to which the corresponding feature dimension is judged to be "inferior" or "dominant."
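The sampling procedure in S23 can be sketched as follows. This is an illustrative example, not the trained masker: the logits are passed in directly, plain Python lists stand in for tensors, and the names `gumbel_softmax` and `sample_mask` are hypothetical.

```python
import math
import random

def gumbel_softmax(logits, tau, rng):
    """One Gumbel-Softmax sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

def sample_mask(logits, num_samples, tau, rng):
    """Element-wise maximum over several Gumbel-Softmax samples."""
    samples = [gumbel_softmax(logits, tau, rng) for _ in range(num_samples)]
    return [max(column) for column in zip(*samples)]
```

Each individual sample sums to one, so a single sample can mark little more than one dimension; taking the maximum over several samples lets more than one feature dimension receive a mask value near 1.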

[0105] Optionally, in step S23, the mask vector is $M$, and its $j$-th element $M_j$, which serves as the masking strength of the $j$-th feature dimension in the diagnostic task, is given by:

[0106] $$M_j = \max_{k = 1, \dots, K} p_j^{(k)}$$

[0107] where $p^{(k)}$ is the probability vector obtained in the $k$-th sampling, and its expression is:

[0108] $$p_j^{(k)} = \frac{\exp\big((\ell_j + g_j^{(k)}) / \tau\big)}{\sum_{j'} \exp\big((\ell_{j'} + g_{j'}^{(k)}) / \tau\big)}$$

[0109] where $j$ is the feature dimension index, $k$ is the sampling index, $j'$ runs over all dimensions of the logits in the denominator for softmax normalization, $g_j^{(k)}$ is the Gumbel noise of the $j$-th dimension in the $k$-th sampling, and $\tau$ is the temperature parameter;

[0110] The first vector $\ell$ is:

[0111] $$\ell = W(z)$$

[0112] where $W$ is the aforementioned masker and $z$ is the embedded feature.

[0113] Then, using the mask $M$ generated by the masker and its inverse mask $(1 - M)$, the original embedded features are decoupled. The dominant feature $z_{\mathrm{dom}}$ contains the main discriminative information, namely:

[0114] $$z_{\mathrm{dom}} = M \odot z$$

[0115] The inferior feature $z_{\mathrm{inf}}$ contains secondary discriminative information, namely:

[0116] $$z_{\mathrm{inf}} = (1 - M) \odot z$$

[0117] where $M$ is the mask vector, $z$ is the embedded feature, and $\odot$ denotes element-wise multiplication.
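A minimal sketch of the decoupling step, assuming the mask and the embedded feature are plain Python lists of equal length (the function name `decouple` is hypothetical):

```python
def decouple(z, mask):
    """Split an embedded feature into dominant and inferior parts.

    The dominant part keeps each dimension in proportion to its mask value;
    the inferior part keeps the remainder, so the two parts sum back to z.
    """
    dominant = [m * v for m, v in zip(mask, z)]
    inferior = [(1.0 - m) * v for m, v in zip(mask, z)]
    return dominant, inferior
```

For example, with `z = [2.0, -1.0, 0.5]` and `mask = [1.0, 0.0, 0.25]`, the first dimension goes entirely to the dominant feature, the second entirely to the inferior feature, and the third is shared between them.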

[0118] In step S4, for the feature extractor, the adversarial objective is to simultaneously minimize the classification losses of both the dominant features and the inferior features, which forces all feature dimensions to contain as much discriminative information as possible.

[0119] For the masker, the adversarial objective is to minimize the classification loss of the dominant features while maximizing the classification loss of the inferior features. Driven by this adversarial game, the masker learns to assign the information most easily used for classification to the dominant features, and the information that is difficult to use for classification or irrelevant to it to the inferior features.
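The opposing objectives of the two players can be written down compactly. The sketch below assumes unweighted sums and cross-entropy classification losses; any weighting between the terms is not specified here and is left out, and the function names are hypothetical.

```python
import math

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])

def extractor_objective(loss_dominant, loss_inferior):
    """Feature extractor: minimize both classification losses."""
    return loss_dominant + loss_inferior

def masker_objective(loss_dominant, loss_inferior):
    """Masker: minimize the dominant loss, maximize the inferior loss."""
    return loss_dominant - loss_inferior
```

The two objectives differ only in the sign of the inferior-feature term: the extractor tries to make even the inferior features classifiable, while the masker is rewarded when the inferior features become harder to classify.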

[0120] Optionally, to prevent the decoupling process from compromising the discriminative power of the original features, a KL divergence loss $\mathcal{L}_{KL}$ is introduced. It constrains the output distribution of the dominant features after the classifier to remain consistent with the classification output distribution of the original features, ensuring that the decoupled dominant features do not lose the discriminative semantics of the original features. The divergence loss is:

[0121] $$\mathcal{L}_{KL} = \mathrm{KL}\big(p_z \,\|\, p_{\mathrm{dom}}\big)$$

[0122] where $p_z$ is the embedded feature classification score:

[0123] $$p_z = \mathrm{softmax}\big(C_1(z)\big)$$

[0124] where $\mathrm{softmax}(\cdot)$ is the normalized exponential function, $C_1$ is the first classifier, and $z$ is the embedded feature;

[0125] and $p_{\mathrm{dom}}$ is the dominant feature classification score:

[0126] $$p_{\mathrm{dom}} = \mathrm{softmax}\big(C_1(z_{\mathrm{dom}})\big)$$

[0127] where $z_{\mathrm{dom}}$ is the aforementioned dominant feature.

[0128] By minimizing $\mathcal{L}_{KL}$, the dominant features extracted by the masker are effectively kept from losing the label-related discriminative information in the original features, which enhances the stability and reliability of training.
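To make the divergence constraint concrete, the following sketch computes a KL divergence between two softmax outputs. The direction of the divergence (original-feature distribution as the reference) and the small `eps` guard are assumptions of this example, and the raw classifier scores are supplied directly rather than produced by a trained classifier.

```python
import math

def softmax(scores):
    """Normalized exponential function over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
```

The loss is zero when the classifier assigns the same distribution to the dominant features as to the full embedded features, and it grows as masking changes the prediction.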

[0129] Optionally, in step S4, a two-stage adversarial training strategy is adopted to stabilize training and optimize gradually. In the first stage, feature discriminative learning is performed to jointly optimize the feature extractor, the first classifier, and the second classifier. The first loss function is:

[0130]

[0131] where the terms are the dominant feature classification loss, the inferior feature classification loss, the divergence loss, and the supervised contrastive learning loss. The supervised contrastive learning loss pulls the features of same-class samples (original samples and their enhanced samples, as well as same-class samples within the same batch) closer together and pushes the features of different-class samples further apart, further enhancing intra-class compactness and inter-class separability.
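The behavior of a supervised contrastive loss can be illustrated with a small batch. The sketch below follows the commonly used formulation with L2-normalized features and a temperature; whether this exact variant is the one used here is an assumption, as are the function names and the default temperature.

```python
import math

def normalize(vector):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Average, over anchors, of -log softmax similarity to same-class samples."""
    feats = [normalize(f) for f in features]
    n = len(feats)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without same-class partners contributes nothing
        sims = {j: sum(a * b for a, b in zip(feats[i], feats[j])) / temperature
                for j in range(n) if j != i}
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

When same-class features are already close and different-class features are far apart, the loss is near zero; assigning the same features to mismatched labels makes it large.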

[0132] In the second stage, feature adversarial disentangling is performed, and the parameters of the feature extractor, the masker, the first classifier, and the second classifier are fixed. The second loss function is:

[0133]

[0134] where the additional term is the Manhattan distance loss, which constrains the distance between the features of the original samples and the features of the enhanced samples, ensuring that style enhancement does not change the essential semantics of the samples and thus improving the stability of the mask; the three coefficients are balancing hyperparameters.
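A Manhattan (L1) distance between the features of an original sample and those of its enhanced version can be computed as below. Averaging over the feature dimensions, rather than summing, is an assumption of this sketch, and the function name is hypothetical.

```python
def manhattan_loss(z_original, z_enhanced):
    """Mean absolute difference between two feature vectors of equal length."""
    diffs = [abs(a - b) for a, b in zip(z_original, z_enhanced)]
    return sum(diffs) / len(diffs)
```

The loss is zero when style enhancement leaves the extracted features unchanged and grows linearly with any drift between the two views.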

[0135] This step forces feature similarity between augmented views of different styles, thereby stabilizing the adversarial training process and improving the robustness of the learned features.

[0136] Optionally, step S5 diagnoses samples from unknown operating conditions. In the actual diagnostic process, only the trained feature extractor and the dominant feature classifier are used. The sample to be diagnosed, from an unknown operating condition (target domain), is input into the feature extractor to obtain its features. At this point the masking mechanism is no longer used, because the trained feature extractor has been optimized to directly output a representation rich in "dominant features." These features are input directly into the dominant feature classifier to obtain the predicted probability distribution over fault categories, and the category with the highest probability is taken as the final diagnostic result.
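The inference path can be sketched as a short pipeline. The feature extractor and classifier are passed in as ordinary callables here; in practice they would be the trained networks, and the toy stand-ins in the usage note are purely hypothetical.

```python
import math

def softmax(scores):
    """Normalized exponential function over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def diagnose(sample, feature_extractor, classifier):
    """Extract features, classify them, and return (predicted class, probabilities).

    No mask is applied at this stage: the features go straight to the classifier.
    """
    features = feature_extractor(sample)
    probs = softmax(classifier(features))
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return predicted, probs
```

For instance, with a toy extractor `lambda s: [sum(s) / len(s), max(s) - min(s)]` and a toy classifier `lambda f: [f[0], f[1], 0.0]`, calling `diagnose([0.0, 4.0, 0.0, 4.0], ...)` returns the index of the class with the largest score together with the full probability distribution.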

[0137] The core of this invention is a masking feature decoupling mechanism based on adversarial game theory. It proposes a method that uses a learnable masking mechanism and an adversarial loss (…). Driven by [the specific domain], this method automatically and explicitly decouples neural network features into two parts: dominant features and inferior features. This mechanism does not rely on any domain labels and achieves self-discovery and enhancement of domain-invariant features within a single-source domain.

[0138] By way of example, the method of the present invention can be used for fault diagnosis of the mechanical structure shown in Figure 4.

[0139] This invention also provides a joint "style enhancement-feature decoupling" framework for single-source domain generalization: combining multi-scale stochastic style enhancement with the aforementioned masking feature adversarial untangling module. Style enhancement is responsible for "creating differences," providing a comparative basis for decoupling; feature decoupling is responsible for "learning invariance from differences." Working together, they constitute a complete domain generalization solution for single-source domain conditions.

[0140] Alternatively, a module based on differentiable sparse coding can be used to replace the Gumbel-Softmax mask to achieve soft selection or hard masking of feature dimensions.
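For context, the Gumbel-Softmax mask that this alternative would replace can be sketched as follows. The sketch draws several Gumbel-Softmax samples from the masker logits and, as an assumption, keeps the per-dimension maximum across samples; the aggregation rule, the function name, and the default values are illustrative, since the exact mask formula does not survive in this text.

```python
import math
import random

def gumbel_softmax_mask(logits, temperature=1.0, num_samples=4, rng=random):
    """Relaxed mask: per-dimension maximum over several Gumbel-Softmax draws."""
    mask = [0.0] * len(logits)
    for _ in range(num_samples):
        # Gumbel(0, 1) noise by inverse transform sampling; clamp u away from 0.
        noisy = []
        for a in logits:
            u = max(rng.random(), 1e-12)
            noisy.append((a - math.log(-math.log(u))) / temperature)
        # Stable softmax over all feature dimensions for this draw.
        peak = max(noisy)
        exps = [math.exp(v - peak) for v in noisy]
        total = sum(exps)
        # Assumed aggregation: keep the strongest probability seen per dimension.
        mask = [max(m, e / total) for m, e in zip(mask, exps)]
    return mask
```

A lower temperature pushes each draw toward a one-hot selection, while a higher temperature gives smoother masks.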

[0141] Alternatively, mask generation can bypass sampling and directly use the Sigmoid function to output the soft mask value for each dimension, while promoting its sparsity through L1 regularization.
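A minimal sketch of this sampling-free variant is given below. It assumes that the dominant features are the element-wise product of the mask and the embedded feature and that the inferior features use the complementary mask; this split, the function names, and the penalty weight are assumptions for illustration.

```python
import math

def _sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def sigmoid_soft_mask(mask_logits, features, l1_weight=0.01):
    """Soft-mask a feature vector and return the split plus an L1 penalty."""
    mask = [_sigmoid(a) for a in mask_logits]
    dominant = [m * z for m, z in zip(mask, features)]          # assumed: M * z
    inferior = [(1.0 - m) * z for m, z in zip(mask, features)]  # assumed: (1 - M) * z
    # Sigmoid outputs are positive, so the L1 norm is the plain sum of the mask.
    sparsity_penalty = l1_weight * sum(mask)
    return dominant, inferior, sparsity_penalty
```

Adding the penalty to the training loss encourages most mask values toward zero, so that only a few feature dimensions are kept as dominant.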

[0142] Alternatively, frequency domain enhancement methods such as Fourier transform mixing and amplitude / phase perturbation can be used to replace spatial style mixing based on AdaIN in order to simulate the frequency response characteristics under different operating conditions.
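One way to picture the Fourier-based mixing mentioned above is to blend the amplitude spectrum of a signal toward that of a reference signal while keeping the original phase. The sketch below uses a naive discrete Fourier transform so that it has no dependencies; a practical implementation would use an FFT library. The function names and the linear blending rule are assumptions.

```python
import cmath

def dft(signal):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
            for k in range(n)]

def idft(spectrum):
    """Naive O(n^2) inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(c * cmath.exp(2j * cmath.pi * k * t / n) for k, c in enumerate(spectrum)) / n
            for t in range(n)]

def amplitude_mix(signal, reference, mix=0.5):
    """Blend the amplitude spectrum of `signal` toward `reference`, keeping phase."""
    if len(signal) != len(reference):
        raise ValueError("signals must have the same length")
    mixed = []
    for s, r in zip(dft(signal), dft(reference)):
        amplitude = (1.0 - mix) * abs(s) + mix * abs(r)
        mixed.append(cmath.rect(amplitude, cmath.phase(s)))
    # Keep the real part; for real inputs the imaginary residue is normally negligible.
    return [value.real for value in idft(mixed)]
```

Setting `mix=0.0` returns the original signal, and larger values move its frequency content toward that of the reference.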

[0143] Alternatively, a pre-trained generative adversarial network or diffusion model can be used to learn the distribution of the original training signal and sample from it to generate new domain samples, in place of the forward style transformation network.

[0144] Alternatively, cosine similarity loss, Euclidean distance loss, or mutual information maximization can be used to replace Manhattan distance loss to constrain the feature consistency between the original sample and the enhanced sample.
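As an illustration of the first of these alternatives, a cosine similarity consistency loss can be sketched as one minus the cosine similarity of each original and enhanced feature pair. The function name and the batch averaging are assumptions.

```python
import math

def cosine_consistency_loss(original_features, enhanced_features, eps=1e-12):
    """Mean of (1 - cosine similarity) over paired feature vectors."""
    losses = []
    for orig, enh in zip(original_features, enhanced_features):
        dot = sum(a * b for a, b in zip(orig, enh))
        norm_o = math.sqrt(sum(a * a for a in orig))
        norm_e = math.sqrt(sum(b * b for b in enh))
        # eps guards against division by zero for all-zero feature vectors.
        losses.append(1.0 - dot / max(norm_o * norm_e, eps))
    return sum(losses) / len(losses)
```

Unlike the Manhattan distance, this loss ignores feature magnitude and penalizes only a change in direction.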

[0145] Alternatively, triplet loss or center loss can be used to replace the supervised contrastive loss in order to achieve intra-class aggregation and inter-class separation.
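The triplet loss alternative can be sketched for a single anchor, positive, and negative feature vector as follows. The Euclidean distance and the margin value are assumptions chosen for illustration.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on (distance to same-class sample) - (distance to other-class sample)."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

The loss is zero once the different-class sample is farther from the anchor than the same-class sample by at least the margin.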

[0146] Optionally, an additional domain classifier can be introduced to classify the inferior features by domain, and the feature extractor and the masker can be trained to maximize the error of this domain classifier (via a gradient reversal layer), thereby explicitly pushing domain-related information into the inferior features.
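The gradient reversal mechanism referred to above can be sketched without a deep learning framework: the layer passes features through unchanged in the forward pass and flips the sign of the gradient in the backward pass. The class name, the method names, and the `strength` parameter are hypothetical.

```python
class GradientReversal:
    """Identity in the forward pass; scales gradients by -strength in backward."""

    def __init__(self, strength=1.0):
        self.strength = strength

    def forward(self, features):
        # Features reach the domain classifier unchanged.
        return list(features)

    def backward(self, upstream_gradient):
        # The gradient flowing back to the extractor and masker is negated.
        return [-self.strength * g for g in upstream_gradient]
```

Placed between the inferior features and the domain classifier, such a layer lets the classifier minimize its loss while the upstream modules receive the opposite update direction.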

[0147] Optionally, this invention also discloses an intervention-based generalization framework based on causal inference. This approach does not directly decouple features; instead, it constructs a structural causal model for fault diagnosis, treating observed data as the result of the combined effects of causal factors (fault type, operating condition) and non-causal factors (noise). Then, through do-calculus or counterfactual data generation techniques, the "operating condition" variable is intervened to generate various intervened samples, and the model is trained to predict the invariant "fault type" on these intervened samples. This method pursues invariance at the causal level and represents another theoretical path for achieving domain generalization.

[0148] Combining the above embodiments, this invention creatively introduces a "masking and untangling" mechanism: a learnable mask is designed to explicitly decouple the features extracted by the deep network into "dominant features" and "inferior features." Dominant features aim to capture domain-invariant discriminative information that is strongly correlated with fault categories and insensitive to changes in operating conditions; inferior features contain domain-specific information that is related to operating conditions and interferes with diagnosis. A "generation-decoupling" joint training framework is constructed: first, a multi-scale style enhancement module diversifies the single original training signal, simulating potential changes in operating conditions and providing rich contrast samples for decoupling. Then, an adversarial training strategy drives the mask to learn how to separate these two types of features, while dual consistency constraints (supervised contrastive learning and Manhattan distance) ensure that the enhancement process does not destroy the original semantics. Furthermore, this invention achieves strong generalization in a single-source domain without domain labels: the entire method does not rely on multiple original training signals or any domain label information; it can be trained using only vibration signals with category labels under a single operating condition. The final model can be directly applied to the diagnosis of unknown operating conditions, significantly improving its applicability and robustness in real industrial scenarios. Extensive experiments on two publicly available bogie bearing datasets demonstrate that the method of this invention significantly outperforms state-of-the-art domain generalization methods in average accuracy across multiple cross-condition diagnostic tasks and exhibits lower performance variance, validating its superior generalization ability and stability.

[0149] Figure 5 shows the diagnostic results of eight different methods on 20 single-source domain generalization tasks on a bearing dataset. The eight methods and their characteristics are as follows: CNN (convolutional neural network), which does not include any additional generalization strategy; MixStyle, which mixes instance-level feature statistics (mean and variance) in a shallow layer of the model to implicitly synthesize new domains or styles, thereby expanding domain diversity during training and improving generalization ability; L2D (Learning to Diversify), which improves single-domain generalization performance by generating diverse style samples while maintaining semantic consistency in the single-source domain case; MSG-ACN (multi-scale style generative and adversarial contrastive networks), which expands the domain distribution through a multi-scale style generation module, generates multi-scale style samples rich in operating condition information, and combines adversarial contrastive learning to extract domain-invariant features, thereby enhancing the generalization ability of fault diagnosis; DG-Softmax (domain generalization Softmax), which introduces an improvement mechanism based on the decision boundary / margin in the Softmax classification head to enhance discriminability under different operating conditions and speeds, thereby improving cross-domain fault diagnosis performance in scenarios such as planetary gearboxes; ADNet (adversarial decoupling domain generalization network), which proposes a two-level adversarial learning decoupling framework that separates domain-related and domain-invariant factors through adversarial training and expands the distribution by combining sample generation and rearrangement strategies, thereby improving generalization across scenarios; DDDG (dual disentanglement domain generalization), which employs a dual decoupling mechanism that splits features into domain-invariant and domain-related parts and applies constraints and enhancements to each, thereby achieving stronger generalization performance for unknown operating conditions; and MFAD, the method proposed in this invention.

[0150] As shown in Figure 5, all domain generalization methods significantly outperform the CNN baseline overall, indicating that data augmentation or feature diversification strategies can effectively improve the model's cross-domain representation capability. CNN's average diagnostic accuracy is only 83.49%, the lowest among all methods, highlighting its shortcomings in cross-domain tasks. MixStyle, L2D, MSG-ACN, and DG-Softmax show improvement in some tasks, but their overall accuracy and stability remain unsatisfactory. ADNet and DDDG perform well overall, with average accuracies of 91.23% and 93.60%, respectively, achieving near-optimal results in some tasks. MFAD, the method proposed in this invention, performs best among all methods, achieving an average accuracy of 95.65%, which is 12.16 percentage points higher than the baseline method without any domain generalization strategy and 2.05 percentage points higher than the second-best method, DDDG. MFAD's standard deviation is 4.49%, the lowest among all methods, indicating that it not only leads in accuracy across different scenarios but also possesses better stability and robustness. Furthermore, MFAD achieves the highest accuracy in 14 of the 20 transfer tasks and maintains near-optimal levels in the remaining tasks. Compared to MFAD, the other methods show a significant gap in accuracy, with an average accuracy of [missing data]. In Figure 7, most of the data lies below the MFAD broken line, especially in tasks 0-2, 2-3, and 4-2, where the advantage of MFAD is most significant. In summary, the experimental results quantitatively demonstrate that the proposed method exhibits excellent generalization performance and robustness across different cross-domain diagnostic tasks, validating its application potential in complex engineering scenarios.
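The two margins quoted above follow directly from the reported average accuracies, as this short check illustrates. Only the three averages stated in the text are used; the dictionary and variable names are illustrative.

```python
# Average accuracies (%) reported in the text for three of the eight methods.
reported = {"CNN": 83.49, "DDDG": 93.60, "MFAD": 95.65}

# Differences in percentage points, rounded to the two decimals used in the text.
gain_over_cnn = round(reported["MFAD"] - reported["CNN"], 2)
gain_over_dddg = round(reported["MFAD"] - reported["DDDG"], 2)
print(gain_over_cnn, gain_over_dddg)
```

Both differences are absolute gaps between percentages, that is, percentage points rather than relative improvements.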

[0151] As shown in Figure 6, to analyze the diagnostic results in more detail, taking tasks 0-2, 1-3, and 4-3 as examples, t-distributed stochastic neighbor embedding (t-SNE) is used to map the features obtained by the feature extractor to a two-dimensional space, visually demonstrating the feature distribution learned by different methods. The symbols represent Normal (NC), Inner Ring Fault (IF), Outer Ring Fault (OF), Rolling Element Fault (RF), and Cage Fault (CF), respectively. As Figure 6 shows, almost all domain generalization methods exhibit a trend similar to the base model in the visualization of category features; that is, the feature distributions of the source domain subclasses and the unseen target domain subclasses cannot be aligned as precisely as in standard domain adaptation. The main difference between the methods lies in their ability to enhance the separability between categories. In particular, for task 1-3, L2D and the proposed MFAD method demonstrate more significant inter-class separation. The results in Figure 6 show that although domain generalization methods generally cannot achieve the alignment effect of standard domain adaptation methods, MFAD performs better than the other methods in terms of inter-class separation and intra-class aggregation, verifying the effectiveness of its style generation strategy and masking feature adversarial untangling strategy.

[0152] In addition, as shown in Figures 7-8, to demonstrate more intuitively the ability of each method to handle domain differences, kernel density estimation maps of the feature distributions of the source and target domains are plotted using task 1-3 as an example. Here, `source` represents the source domain and `target` represents the target domain. The kernel density estimation maps clearly reflect the overlap and alignment between the two domains under different models, and the degree of overlap directly relates to how effectively the model transfers knowledge from the source domain to the target domain. Generally speaking, the higher the overlap, the better the model's generalization performance tends to be. As shown in subfigure (a) of Figures 7-8, the alignment of significant peaks between the source and target domains is insufficient, with limited overlap, indicating a significant difference between the two domains that may weaken the model's performance in the target domain. In contrast, subfigures (b)-(g) show more pronounced overlap, with a significant improvement in the consistency of significant peaks between the source and target domains. In subfigure (h) in particular, the feature distributions of the source and target domains highly overlap, demonstrating that the proposed method can better achieve cross-domain feature alignment and highlighting its effectiveness in knowledge transfer and generalization.
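The notion of overlap between source and target density maps can be made concrete with a small sketch that estimates two one-dimensional Gaussian kernel densities and integrates their pointwise minimum. The fixed bandwidth, the grid resolution, the use of one-dimensional features, and the overlap coefficient itself are assumptions for illustration; the text does not state how the maps were computed.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate as a callable."""
    scale = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return scale * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

def overlap_coefficient(source, target, bandwidth=0.3, grid_points=400):
    """Integrate min(p_source, p_target); near 1 = identical, near 0 = disjoint."""
    low = min(source + target) - 4.0 * bandwidth
    high = max(source + target) + 4.0 * bandwidth
    step = (high - low) / grid_points
    p, q = gaussian_kde(source, bandwidth), gaussian_kde(target, bandwidth)
    # Midpoint rule over a range padded by four bandwidths on each side.
    return sum(min(p(low + (i + 0.5) * step), q(low + (i + 0.5) * step))
               for i in range(grid_points)) * step
```

A value close to 1 corresponds to the highly overlapping case described for subfigure (h), and a value close to 0 to clearly separated source and target distributions.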

[0153] This invention also provides a single-source domain generalized fault diagnosis device based on masking feature-based adversarial unwrapped networks. Optionally, the single-source domain generalized fault diagnosis device based on masking feature-based adversarial unwrapped networks may include a first processor.

[0154] Optionally, the single-source domain generalization fault diagnosis device based on masking feature-based adversarial unwrapping network may further include a memory and a transceiver. The first processor, memory, and transceiver may be connected via a communication bus.

[0155] The first processor, as the control center of the single-source domain generalized fault diagnosis device based on masking feature adversarial unwrapping network, can be a single processor or a collective term for multiple processing elements. For example, the first processor 2001 can be one or more central processing units (CPUs), application-specific integrated circuits (ASICs), or one or more integrated circuits configured to implement embodiments of the present invention, such as one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs).

[0156] Optionally, the first processor can perform various functions of the single-source domain generalized fault diagnosis device based on masking feature adversarial untangling network by running or executing software programs stored in memory and calling data stored in memory.

[0157] In a specific implementation, as one example, the first processor may include one or more CPUs.

[0158] In a specific implementation, as one example, the single-source domain generalized fault diagnosis device based on masking feature adversarial untangling network may also include multiple processors, each of which may be a single-core processor (single-CPU) or a multi-core processor (multi-CPU). Here, a processor may refer to one or more devices, circuits, and / or processing cores used to process data (e.g., computer program instructions).

[0159] The memory is used to store the software program that executes the solution of the present invention, and is controlled by the first processor. The specific implementation method can be referred to the above method embodiment, and will not be repeated here.

[0160] Optionally, the memory can be read-only memory (ROM) or other types of static storage devices capable of storing static information and instructions, random access memory (RAM) or other types of dynamic storage devices capable of storing information and instructions, or electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium capable of carrying or storing desired program code in the form of instructions or data structures and accessible by a computer, but not limited thereto. The memory can be integrated with the first processor or exist independently, and is coupled to the first processor through the interface circuit of the single-source domain generalized fault diagnosis device based on masking feature adversarial untangling network. The embodiments of the present invention do not specifically limit this.

[0161] A transceiver is used to communicate with network devices or with terminal devices.

[0162] It should be understood that the first processor in the embodiments of the present invention may be a central processing unit (CPU), or it may be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor, or it may be any conventional processor, etc.

[0163] It should also be understood that the memory in the embodiments of the present invention can be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory can be random access memory (RAM), which is used as an external cache. By way of example, but not limitation, many forms of random access memory (RAM) are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate synchronous DRAM (DDR SDRAM), enhanced synchronous DRAM (ESDRAM), synchronous linked DRAM (SLDRAM), and direct rambus RAM (DR RAM).

[0164] The above embodiments can be implemented, in whole or in part, by software, hardware (such as circuits), firmware, or any combination thereof. When implemented using software, the above embodiments can be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, all or part of the processes or functions described in the embodiments of the present invention are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired or wireless (e.g., infrared, radio, microwave, etc.) means. The computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that includes one or more sets of available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. A semiconductor medium can be a solid-state drive.

[0165] In this invention, "at least one" means one or more, and "more than one" means two or more. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or a plurality of items. For example, at least one of a, b, or c can represent: a, b, c, ab, ac, bc, or abc, where a, b, and c can be a single item or multiple items.

[0166] It should be understood that, in various embodiments of the present invention, the order of the above-mentioned process numbers does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

[0167] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0168] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the devices, apparatuses, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.

[0169] In the several embodiments provided by this invention, it should be understood that the disclosed devices, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or other forms.

[0170] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0171] In addition, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0172] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0173] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A single-source domain generalized fault diagnosis method based on masking feature-based adversarial unwrapped networks, characterized in that the method includes the following steps: S1. Obtain the original training signal, and perform multi-scale style enhancement on the original training signal based on the style enhancement module to obtain the enhanced signal; S2. Based on the feature extractor set in the masking feature adversarial unwrapping module, obtain embedded features from the original training signal and the enhanced signal; based on the feature extractor and the masker set in the masking feature adversarial unwrapping module, decouple the enhanced signal to obtain dominant features and inferior features; S3. Obtain the embedded feature classification score based on the first classifier and the embedded features; obtain the dominant feature classification score based on the first classifier and the dominant features and calculate the dominant feature classification loss; obtain the inferior feature classification score based on the second classifier and the inferior features and calculate the inferior feature classification loss; calculate the divergence loss based on the dominant feature classification score and the embedded feature classification score; S4. Based on the divergence loss, the dominant feature classification loss, and the inferior feature classification loss, obtain a first loss function and a second loss function, and perform two-stage adversarial iterative training based on the first loss function and the second loss function, wherein the first stage is iterative training to jointly optimize the feature extractor, the first classifier, and the second classifier, and the second stage is iterative training with the parameters of the feature extractor, the masker, the first classifier, and the second classifier fixed; S5. Input the sample to be diagnosed into the feature extractor obtained in step S4 to obtain the dominant features to be diagnosed, and input the dominant features to be diagnosed into the first classifier obtained in step S4 to obtain the fault category prediction probability distribution.

2. The single-source domain generalized fault diagnosis method based on masking feature adversarial unwrapped network according to claim 1, characterized in that, The style enhancement module in step S1 includes multiple style feature generation units arranged in parallel. Each of the multiple style feature generation units includes a converter, an inverse converter, and an instance normalization layer.

3. The single-source domain generalization fault diagnosis method based on masking feature adversarial unwrapped network according to claim 2, characterized in that, Step S1 includes: S11. Based on the original training signal, the converter, and the inverse converter, multiple stylized signals are obtained; S12. The original training signal and the multiple stylized signals are linearly mixed and normalized to obtain the enhanced signal.

4. The single-source domain generalized fault diagnosis method based on masking feature adversarial unwrapped network according to claim 3, characterized in that the i-th stylized signal among the plurality of stylized signals is defined in terms of the converter, the inverse converter, the original training signal, the instance normalization layer, and two linear layers in the adaptive instance normalization layer; and the enhanced signal is defined in terms of the original training signal, the stylized signals, and random weights sampled from a standard normal distribution.

5. The single-source domain generalization fault diagnosis method based on masking feature adversarial unwrapped network according to claim 1, characterized in that step S2 includes: S21. Based on the original training signal and the enhanced signal, the embedded features are obtained using the feature extractor; S22. Based on the masker in the masking feature adversarial unwrapping module, the embedded features are processed to obtain a first vector; S23. The first vector is sampled multiple times to obtain a mask vector, and the embedded features are decoupled based on the mask vector to obtain dominant features and inferior features.

6. The single-source domain generalized fault diagnosis method based on masking feature adversarial unwrapped network according to claim 5, characterized in that, in step S23, the mask vector is M, each element of which gives the masking strength of the corresponding feature dimension in the diagnostic task; the mask vector is computed from a probability vector whose expression involves a feature dimension index, a sampling index, the logits of all dimensions in the denominator (used for softmax normalization), the Gumbel noise drawn for each dimension in each sampling, and a temperature parameter; the first vector is obtained by applying the masker to the embedded feature z; and the dominant features and the inferior features are each obtained from the embedded feature and M, wherein M is the mask vector.

7. The single-source domain generalized fault diagnosis method based on masking feature adversarial unwrapped network according to claim 1, characterized in that, in step S3, the divergence loss is computed from the embedded feature classification score and the dominant feature classification score, wherein the embedded feature classification score is obtained by applying the normalized exponential (softmax) function to the output of the first classifier on the embedded feature, and the dominant feature classification score is obtained from the first classifier and the dominant features.

8. The single-source domain generalized fault diagnosis method based on masking feature adversarial unwrapped network according to claim 1, characterized in that, in step S4, the first loss function combines the dominant feature classification loss, the inferior feature classification loss, the divergence loss, and the supervised contrastive learning loss; and the second loss function additionally includes the Manhattan distance loss, with three balancing hyperparameters.

9. A single-source domain generalized fault diagnosis device based on masking feature-based adversarial unwrapped networks, characterized in that the fault diagnosis device includes: a processor; and a memory storing computer-readable instructions that, when executed by the processor, implement the method as described in any one of claims 1 to 8.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium contains program code that can be called by a processor to execute the single-source domain generalized fault diagnosis method based on masking feature adversarial untangling network as described in any one of claims 1 to 8.