A bearing fault diagnosis method, device, equipment and medium

By introducing physical attention and contrastive learning modules into a complex domain convolutional recurrent network, the problem of fault feature extraction of acoustic signals under strong noise and speed fluctuations is solved, achieving high-precision fault diagnosis under extreme conditions and balancing the contradiction between physical priors and data-driven approaches.

CN122241515A · Pending · Publication Date: 2026-06-19 · BEIJING UNIV OF POSTS & TELECOMM

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
BEIJING UNIV OF POSTS & TELECOMM
Filing Date
2026-03-18
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In existing technologies for non-contact industrial health monitoring, acoustic signals are easily affected by strong background noise and speed fluctuations, making it difficult to extract early and weak fault features. Deep denoising models are prone to misjudgment or over-smoothing under extreme conditions, failing to effectively balance the contradiction between physical priors and data-driven approaches, resulting in low fault diagnosis accuracy.

Method used

A complex domain convolutional recurrent network is constructed, which combines a physical attention module and a contrastive learning module to generate a multi-scale frequency domain attention mask. Fault features are anchored by physical priors and denoising is achieved by using contrastive learning constraints. The training strategy is dynamically adjusted to adapt to different signal-to-noise ratio environments, thus achieving stable fault feature extraction.

Benefits of technology

It improves the diagnostic accuracy of early and minor faults under extreme operating conditions, retains clearly identifiable fault characteristics while suppressing noise interference, and achieves stable diagnosis across the entire signal-to-noise ratio range.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention provides a bearing fault diagnosis method, apparatus, device, and medium, relating to the field of fault detection technology. The method includes: constructing a denoising model comprising a complex-domain convolutional recurrent network with an embedded physical attention module and a contrastive learning module, where the physical attention module generates a multi-scale frequency-domain mask from operating-condition and fault priors and uses it to weight the encoded complex features, and the contrastive learning module injects different noises into the clean output signal and maps the results to obtain multiple latent representations of the same input; constructing a reconstruction loss by minimizing the difference between the input and output signals, constructing a contrastive loss that treats representations from the same source as positive sample pairs and representations from different sources as negative sample pairs, and updating the parameters by backpropagating both losses; inputting the bearing data to be tested into the trained denoising model to obtain denoised data, feeding the denoised data into a pre-trained fault diagnosis model, and outputting the final fault diagnosis result.

Description

Technical Field

[0001] This invention relates to the field of fault detection technology, and in particular to a bearing fault diagnosis method, apparatus, equipment and medium.

Background Technology

[0002] In the field of non-contact industrial health monitoring, acoustic signals are widely used in fault diagnosis of critical equipment such as bearings. The core requirement is to extract early, weak fault features from the acquired acoustic signals to achieve timely early warning and accurate diagnosis of equipment faults. However, in actual industrial scenarios, acoustic signals are easily affected by the coupling interference of strong background noise and speed fluctuations. This severely obscures the weak transient impact signals corresponding to early faults, significantly increasing the difficulty of fault feature extraction and making it difficult to meet the actual needs of refined health monitoring of industrial equipment.

[0003] To address the aforementioned signal interference issues, various deep denoising models have been proposed in the prior art. Among them, complex-domain models such as DCCRN have become mainstream denoising schemes because they handle both the amplitude and the phase information of signals effectively. However, these deep denoising models are essentially physics-agnostic statistical models that do not take the physical mechanisms of equipment failure into account. Under extreme conditions such as low speed or extremely low signal-to-noise ratio (SNR < -5 dB), they are prone to misjudging the periodic transient impacts generated by bearing faults as noise and smoothing them excessively, thereby losing the key transient impact components required for fault diagnosis. The result is a tradeoff between denoising and fault feature preservation. Another class of improvement schemes attempts to enhance the physical interpretability of the model by introducing physical priors. However, such schemes often ignore signal phase information and adapt poorly to the frequency smearing caused by speed fluctuations. Furthermore, approaches based on physics-informed neural networks are prone to misleading gradients in strong-noise environments and likewise fail to achieve stable and effective fault feature extraction.

[0004] Existing non-contact industrial health monitoring technologies therefore face a contradiction between denoising and fault feature extraction: when physical fault features are clearly identifiable in the acoustic signal, anchoring by physical priors is needed to keep the fault features from being misjudged as noise; when physical fault features are overwhelmed by strong noise and speed fluctuations, a data-driven fallback is needed for denoising so that weak fault features are preserved as far as possible. Existing technologies cannot effectively balance the two, making it difficult to improve the accuracy of early weak fault diagnosis under extreme operating conditions.

Summary of the Invention

[0005] Therefore, it is necessary to provide a bearing fault diagnosis method, device, equipment, and medium to address the aforementioned technical problems.

[0006] The following technical solution is adopted in this specification: This specification provides a bearing fault diagnosis method, including:
Constructing a denoising model comprising a complex-domain convolutional recurrent network with a physical attention module added at the encoder output, and a contrastive learning module connected to the output of the complex-domain convolutional recurrent network; wherein the physical attention module is configured to: generate frequency-domain attention masks at different feature scales based on prior operating-condition information of the input signal of the complex-domain convolutional recurrent network and prior knowledge of fault characteristics; and use the frequency-domain attention masks at different feature scales to weight the complex features output by the encoder in the complex-domain convolutional recurrent network. The contrastive learning module is configured to inject different noises into the clean signal output by the complex-domain convolutional recurrent network, and to map the clean signals carrying different noises into the same feature space to obtain multiple latent representation vectors corresponding to the same input signal.
Constructing a reconstruction loss by minimizing the difference between the input signal of the complex-domain convolutional recurrent network and the clean signal it outputs. Multiple latent representation vectors originating from the same input signal are defined as positive sample pairs, and latent representation vectors originating from other input signals within the current training batch are defined as negative sample pairs. A contrastive loss is constructed by maximizing the similarity of positive sample pairs and minimizing the similarity of negative sample pairs. The denoising model is trained based on the reconstruction loss and the contrastive loss.
Inputting the bearing data to be diagnosed into the complex-domain convolutional recurrent network in the trained denoising model to obtain denoised bearing data; the denoised bearing data is then input into the pre-trained fault diagnosis model to obtain the bearing fault diagnosis result.

[0007] Furthermore, the workflow of the physical attention module specifically includes:
Constructing a set of prior fault characteristic frequencies comprising the main fault frequency, the modulation sideband frequencies, and the higher-order harmonic frequencies, $\mathcal{F} = \{f_k\}_{k=1}^{K}$, where $f_k$ is the $k$-th prior fault characteristic frequency, corresponding to the main fault frequency, a modulation sideband frequency, or a higher-order harmonic frequency, and $K$ is the total number of prior fault characteristic frequencies;
Determining, based on the bearing rotational speed corresponding to the input signal of the complex-domain convolutional recurrent network, the Gaussian soft mask bandwidth $\sigma_k$ corresponding to the $k$-th prior fault characteristic frequency, $\sigma_k(t) = \sigma_0 + \alpha \left| \mathrm{d}\omega(t) / \mathrm{d}t \right|$, where $\omega(t)$ is the bearing speed, $\sigma_0$ is the preset baseline bandwidth, $\alpha$ is the proportionality coefficient, and $t$ is time;
Constructing, from the Gaussian soft mask bandwidths $\sigma_k$ and the prior fault characteristic frequencies $f_k$, a single-scale Gaussian soft mask $M_s$ built from Gaussian terms $\exp\left(-\frac{(f - f_k)^2}{2\sigma_k^2}\right)$, where $M_s$ is the Gaussian soft mask at the $s$-th resolution scale and $f$ is the frequency of the input signal of the complex-domain convolutional recurrent network;
Fusing the single-scale Gaussian soft masks $M_s$ of the different resolution scales by weighting into a multi-resolution Gaussian soft mask $M$, $M = \sum_{s=1}^{S} w_s M_s$, where $w_s$ is the weighting coefficient of the $s$-th resolution scale and $S$ is the total number of resolution scales;
Mapping, through a mapping operator $\Phi$, the multi-resolution Gaussian soft mask $M$ to attention weights $A$ that match the channel dimension of the complex features output by the encoder in the complex-domain convolutional recurrent network, $A = \Phi(M)$;
Weighting the complex features output by the encoder in the complex-domain convolutional recurrent network with the attention weights: $\hat{Z} = Z \odot A$, where $Z$ denotes the complex features output by the encoder in the complex-domain convolutional recurrent network, $\hat{Z}$ denotes the weighted complex features, and $\odot$ denotes element-wise multiplication.

[0008] Furthermore, the contrastive learning module includes a noise-adding unit, a shared encoder, and a projection head; the workflow of the contrastive learning module specifically includes:
The noise-adding unit injects two different noises $\varepsilon_1$ and $\varepsilon_2$ into the clean signal $\hat{s}$ output by the complex-domain convolutional recurrent network to construct a first noisy view $\tilde{x}_1$ and a second noisy view $\tilde{x}_2$: $\tilde{x}_1 = \hat{s} + \varepsilon_1$; $\tilde{x}_2 = \hat{s} + \varepsilon_2$; where $\varepsilon_1$ and $\varepsilon_2$ are each selected from at least one of pink noise, random impulse noise, burst noise, and Gaussian white noise;
The first noisy view $\tilde{x}_1$ and the second noisy view $\tilde{x}_2$ are input into the shared encoder to extract the corresponding first latent representation $h_1$ and second latent representation $h_2$: $h_1 = E(\tilde{x}_1)$; $h_2 = E(\tilde{x}_2)$; where $E(\cdot)$ is the shared encoder;
The first latent representation $h_1$ and the second latent representation $h_2$ are input into the projection head to obtain the first contrast vector $v_1$ and the second contrast vector $v_2$ in the contrast space: $v_1 = g(h_1)$; $v_2 = g(h_2)$; where $g(\cdot)$ is the projection head.

[0009] Furthermore, training the denoising model based on the reconstruction loss and the contrastive loss specifically includes: constructing a dataset containing bearing vibration signal samples with different signal-to-noise ratios; dividing the samples in the dataset, according to the signal-to-noise ratio of each sample, into three subsets: high, medium, and low signal-to-noise ratio; and dynamically adjusting the weight coefficients of the reconstruction loss term and the contrastive learning loss term in the total loss function according to the subset to which each sample belongs. When a sample belongs to the high signal-to-noise ratio subset, the weight coefficient of the reconstruction loss term is configured to be greater than that of the contrastive learning loss term, and a hinge loss is introduced into the total loss function to suppress reverse noise addition during denoising. When a sample belongs to the medium signal-to-noise ratio subset, the weight coefficient of the reconstruction loss term is configured to equal that of the contrastive learning loss term, or to differ from it by less than a preset threshold, so as to achieve collaborative optimization of physical constraints and data-driven learning. When a sample belongs to the low signal-to-noise ratio subset, the weight coefficient of the contrastive learning loss term is configured to be greater than that of the reconstruction loss term, in order to enhance the separability of fault features when the physical constraints fail. Based on the total loss function, the network parameters of the denoising model are iteratively updated using the backpropagation algorithm.
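
The subset-dependent weighting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the bin edges of -6 dB and +2 dB are taken from the embodiment described later in this specification, while the numeric weights in `WEIGHTS`, the form of the hinge term, and all function names are hypothetical.

```python
# Hypothetical (w_rec, w_con) pairs; the text only fixes their ordering per subset.
WEIGHTS = {"high": (0.8, 0.2), "medium": (0.5, 0.5), "low": (0.2, 0.8)}


def snr_subset(snr_db, low_thr=-6.0, high_thr=2.0):
    """Assign a sample to the low / medium / high SNR subset."""
    if snr_db < low_thr:
        return "low"
    if snr_db > high_thr:
        return "high"
    return "medium"


def hinge_penalty(err_out, err_in, margin=0.0):
    """Assumed hinge form: positive only when the output error exceeds the input error."""
    return max(0.0, err_out - err_in + margin)


def total_loss(snr_db, rec_loss, con_loss, err_out=0.0, err_in=0.0, hinge_weight=1.0):
    """Weighted sum of reconstruction and contrastive losses, plus hinge at high SNR."""
    subset = snr_subset(snr_db)
    w_rec, w_con = WEIGHTS[subset]
    loss = w_rec * rec_loss + w_con * con_loss
    if subset == "high":
        loss += hinge_weight * hinge_penalty(err_out, err_in)
    return loss


# Usage: a high-SNR sample whose output is worse than its input is penalized.
high_loss = total_loss(10.0, rec_loss=1.0, con_loss=1.0, err_out=0.3, err_in=0.1)
low_loss = total_loss(-10.0, rec_loss=1.0, con_loss=2.0)
```

The hinge term stays at zero whenever the denoised output is at least as accurate as the noisy input, so it only acts on samples where denoising made things worse.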

[0010] Furthermore, iteratively updating the network parameters of the denoising model through backpropagation based on the total loss function employs a phased curriculum training strategy, which specifically includes: Phase 1: only samples from the high signal-to-noise ratio subset and the medium signal-to-noise ratio subset are selected as input data to construct training batches, while samples from the low signal-to-noise ratio subset are frozen or ignored. In this phase, the parameters are iteratively updated based on the correspondingly weighted total loss function until the model loss converges, so that the denoising model first masters fault feature extraction and signal reconstruction under moderate-noise and low-noise conditions. Phase 2: samples from the low signal-to-noise ratio subset are gradually introduced into training; as the number of training epochs increases, the sampling ratio or weight of low signal-to-noise ratio samples in the training batch is dynamically increased, and the high contrastive learning loss weight configured for the low signal-to-noise ratio subset is used to enhance the separability of fault features under strong noise interference and when the physical constraints fail.
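
A sketch of the two-phase sampling schedule may make the curriculum concrete. It rests on several assumptions not stated in the text: Phase 1 is approximated by a fixed number of warm-up epochs (the specification instead runs until the loss converges), the low-SNR share grows linearly, and the names `low_snr_fraction`, `build_batch`, and the cap `max_fraction` are hypothetical.

```python
import random


def low_snr_fraction(epoch, warmup_epochs, ramp_epochs, max_fraction=0.5):
    """Share of low-SNR samples per batch: zero in Phase 1, linear ramp in Phase 2."""
    if epoch < warmup_epochs:
        return 0.0
    progress = min(1.0, (epoch - warmup_epochs + 1) / ramp_epochs)
    return max_fraction * progress


def build_batch(epoch, easy_pool, low_pool, batch_size, rng,
                warmup_epochs=10, ramp_epochs=20):
    """Draw a batch mixing high/medium-SNR samples with a growing low-SNR share."""
    fraction = low_snr_fraction(epoch, warmup_epochs, ramp_epochs)
    n_low = min(round(batch_size * fraction), len(low_pool))
    batch = rng.sample(low_pool, n_low) + rng.sample(easy_pool, batch_size - n_low)
    rng.shuffle(batch)
    return batch


# Usage with placeholder samples tagged by pool.
easy_pool = [("easy", i) for i in range(100)]
low_pool = [("low", i) for i in range(100)]
rng = random.Random(0)
phase1_batch = build_batch(0, easy_pool, low_pool, 16, rng)
late_batch = build_batch(40, easy_pool, low_pool, 16, rng)
```

A convergence-based switch between phases could replace `warmup_epochs` by monitoring the validation loss; the fixed count is used here only to keep the sketch deterministic.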

[0011] Furthermore, the pre-trained fault diagnosis model is obtained based on a knowledge distillation strategy, specifically including: constructing a teacher-student network architecture in which a pre-trained, converged convolutional recurrent neural network serves as the teacher model and the temporal convolutional network to be trained serves as the student model; using the teacher model to run inference on the training samples to generate soft labels containing class probability distributions; constructing a distillation loss function comprising a soft-label loss term that constrains the output distribution of the student model to approximate the soft labels of the teacher model, and a hard supervision loss term that constrains the output of the student model to match the true fault labels; and iteratively updating the network parameters of the student model through the backpropagation algorithm based on the distillation loss function until the student model converges, thus obtaining the pre-trained fault diagnosis model.
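
The distillation loss described above can be illustrated with a short sketch. The specification names the two terms but not their exact form, so the choices here are assumptions: a temperature-softened KL divergence for the soft-label term, cross-entropy for the hard term, and the hypothetical parameters `temperature` and `alpha` to balance them.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.7):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(pt * math.log(pt / ps)
               for pt, ps in zip(p_teacher, p_student) if pt > 0.0)
    hard = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * soft + (1.0 - alpha) * hard


# Usage: four fault classes, true label 0.
teacher = [2.0, 0.0, 0.0, 0.0]
loss_matching = distillation_loss([2.0, 0.0, 0.0, 0.0], teacher, label=0)
loss_mismatch = distillation_loss([0.0, 2.0, 0.0, 0.0], teacher, label=0)
```

When the student reproduces the teacher's logits the soft term vanishes and only the weighted cross-entropy remains, which is why `loss_matching` is the smaller of the two.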

[0012] This specification provides a bearing fault diagnosis device, including:
A denoising model building module, used to construct a denoising model comprising a complex-domain convolutional recurrent network with a physical attention module added at the encoder output, and a contrastive learning module connected to the output of the complex-domain convolutional recurrent network; wherein the physical attention module is configured to: generate frequency-domain attention masks at different feature scales based on prior operating-condition information of the input signal of the complex-domain convolutional recurrent network and prior knowledge of fault characteristics; and use the frequency-domain attention masks at different feature scales to weight the complex features output by the encoder in the complex-domain convolutional recurrent network. The contrastive learning module is configured to inject different noises into the clean signal output by the complex-domain convolutional recurrent network, and to map the clean signals carrying different noises into the same feature space to obtain multiple latent representation vectors corresponding to the same input signal.
A denoising model training module, used to construct a reconstruction loss by minimizing the difference between the input signal of the complex-domain convolutional recurrent network and the clean signal it outputs; to define multiple latent representation vectors from the same input signal as positive sample pairs and latent representation vectors from other input signals in the current training batch as negative sample pairs; to construct a contrastive loss by maximizing the similarity of positive sample pairs and minimizing the similarity of negative sample pairs; and to train the denoising model based on the reconstruction loss and the contrastive loss.
A diagnostic module, used to input the bearing data to be diagnosed into the complex-domain convolutional recurrent network in the trained denoising model to obtain denoised bearing data, and to input the denoised bearing data into the pre-trained fault diagnosis model to obtain the bearing fault diagnosis result.

[0013] This specification provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described bearing fault diagnosis method.

[0014] This specification provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the above-described bearing fault diagnosis method.

[0015] The above technical solutions adopted in this specification can achieve the following beneficial effects: By adding, at the encoder output, a physical attention module that integrates prior operating conditions and fault characteristics, this invention generates a multi-scale frequency-domain attention mask. The mask weights the complex features, anchoring and strengthening the real fault features in the acoustic signal from the perspective of physical priors and thus preventing fault features from being misjudged as noise. At the same time, the complex-domain convolutional recurrent network performs the basic signal denoising and reconstruction. The contrastive loss defines the multiple augmented signals obtained by injecting different noises into the same clean signal as positive sample pairs, and the augmented signals corresponding to different original signals as negative sample pairs. During training, it forces the latent representations of signals from the same source to be as close as possible in the feature space and those of signals from different sources to be as far apart as possible. This allows the system to learn autonomously, at the data-driven level, a stable feature structure that is highly correlated with the semantics of the original signal and insensitive to noise perturbations. The method relies on physical prior constraints to retain clearly identifiable fault features and prevent them from being filtered out by mistake, while using data-driven contrastive learning to constrain the denoising and lock onto weak fault features submerged by the coupled interference of strong noise and speed fluctuations. This achieves a synergistic constraint between physical prior anchoring and data-driven denoising, balances the contradiction between fault feature protection and noise suppression, and can improve the diagnostic accuracy of early weak faults under extreme operating conditions.

Description of the Drawings

[0016] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:

[0017] Figure 1 is a flowchart of a bearing fault diagnosis method provided in this specification;
Figure 2 is a schematic diagram of the overall dual-stream constraint framework provided in this specification;
Figure 3 is a comparison diagram of noise reduction and fidelity for a cage fault at -10 dB SNR / 40 Hz provided in this specification;
Figure 4 is a comparison diagram of noise reduction and fidelity for a rolling element fault at -10 dB SNR / 40 Hz provided in this specification;
Figure 5 is a comparison diagram of noise reduction and fidelity for an inner ring fault at -10 dB SNR / 40 Hz provided in this specification;
Figure 6 is a comparison diagram of noise reduction and fidelity for an outer ring fault at -10 dB SNR / 40 Hz provided in this specification;
Figure 7 is a schematic diagram of the structure of a bearing fault diagnosis device provided in this specification;
Figure 8 is a schematic diagram of a computer device provided in this specification.

Detailed Description

[0018] To make the objectives, technical solutions, and advantages of this specification clearer, the technical solutions of this application will be clearly and completely described below in conjunction with specific embodiments and corresponding drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. All other embodiments obtained by those skilled in the art based on the embodiments in this specification without creative effort are within the scope of protection of this application.

[0019] In non-contact industrial health monitoring, acoustic signals are susceptible to interference from strong background noise and speed fluctuations, making it difficult to extract early, weak fault features. Existing deep denoising models (such as the complex-domain DCCRN) are essentially physics-agnostic statistical models: under low-speed or extremely low signal-to-noise ratio (SNR < -5 dB) conditions, they easily misjudge periodic fault impacts as noise and over-smooth them, thus losing the transient impact components that are crucial for diagnosis. On the other hand, simply introducing physical priors (such as binary / ratio masks) often ignores phase and struggles to adapt to the frequency smearing caused by speed fluctuations, and physics-informed neural networks are prone to misleading gradients under strong noise. This creates a core contradiction: when physical features are clear, physical anchoring is needed; when physical features are obscured, a data-driven fallback is required.

[0020] The bearing fault diagnosis method of the present invention is described below with reference to the accompanying drawings.

[0021] Figure 1 is a flowchart of a bearing fault diagnosis method provided in this specification. As shown in Figure 1, the method includes the following steps: S1. Construct a denoising model comprising a complex-domain convolutional recurrent network with a physical attention module added at the encoder output, and a contrastive learning module connected to the output of the complex-domain convolutional recurrent network; wherein the physical attention module is configured to: generate frequency-domain attention masks at different feature scales based on prior operating-condition information of the input signal of the complex-domain convolutional recurrent network and prior knowledge of fault characteristics; and use the frequency-domain attention masks at different feature scales to weight the complex features output by the encoder in the complex-domain convolutional recurrent network. The contrastive learning module is configured to inject different noises into the clean signal output by the complex-domain convolutional recurrent network, and to map the clean signals carrying different noises into the same feature space to obtain multiple latent representation vectors corresponding to the same input signal.

[0022] S2. Construct a reconstruction loss by minimizing the difference between the input signal of the complex-domain convolutional recurrent network and the clean signal output by the complex-domain convolutional recurrent network; define multiple latent representation vectors originating from the same input signal as positive sample pairs, and define latent representation vectors originating from other input signals within the current training batch as negative sample pairs; construct a contrastive loss by maximizing the similarity of positive sample pairs and minimizing the similarity of negative sample pairs; train the denoising model based on the reconstruction loss and the contrastive loss.
S3. Input the bearing data to be diagnosed into the complex-domain convolutional recurrent network in the trained denoising model to obtain denoised bearing data; input the denoised bearing data into the pre-trained fault diagnosis model to obtain the bearing fault diagnosis result.

[0023] This invention provides a rolling bearing fault diagnosis method based on acoustic / vibration signals under the nonlinear coupled interference of strong background noise and rotational speed fluctuations. Specifically, it relates to a robust diagnostic method covering all signal-to-noise ratio (SNR) conditions that integrates physical-prior attention, latent-space contrastive learning, and SNR-aware adaptive curriculum learning. In other words, the invention provides a physics-guided adaptive contrastive curriculum learning method for bearing fault diagnosis in extreme time-varying noise environments, built on a PACL framework of "physical-data dual-stream constraint + SNR-adaptive curriculum control." This allows the model to achieve both "noise reduction when noise is present and fidelity when it is not" across the entire SNR range of -10 dB to +15 dB, while suppressing over-smoothing and maintaining the integrity of fault features in low-speed, weak-signal scenarios.

[0024] Steps S1 to S3 above summarize the overall architecture and core logic of the method of this invention, namely, extracting and diagnosing fault features under strong noise through a collaborative mechanism of physical-prior-guided denoising and data-driven contrastive learning. This embodiment further refines these general steps into a physics-guided adaptive curriculum learning framework for extreme time-varying noise environments. The framework decomposes the model construction in S1 into signal representation construction, multi-resolution soft mask generation, and complex-domain network design; extends the loss-based training in S2 into a dynamic optimization process comprising dual-view contrastive learning, SNR-aware adaptive weight gating, and a hinge loss constraint; and makes the diagnostic output in S3 concrete as an efficient classification strategy based on knowledge distillation. Figure 2 is a schematic diagram of the overall dual-stream constraint framework provided in this specification; the six specific implementation stages of the method are described in detail below with reference to Figure 2.

[0025] Step 1: Signal acquisition and representation construction.

[0026] Acquire the acoustic / vibration time-domain signal of the rolling bearing in operation; perform framing and time-frequency transformation on the signal to form the time-frequency representation used as network input. Complex spectral features are used by the enhancement backbone, while a 64-dimensional Log-Mel spectrum is used as the classification input on the diagnosis side. Specifically, acquire the discrete time-domain bearing acoustic / vibration signal $x(n)$. The complex time-frequency spectrum is obtained by windowed framing and a short-time Fourier transform:

[0027] $X(m, q) = \sum_{n=0}^{N-1} x(n + mH)\, w(n)\, e^{-j 2\pi q n / N}$; where $m$ is the frame index, $q$ is the frequency index, $w(\cdot)$ is the window function, $j$ is the imaginary unit, and $N$ and $H$ denote the frame length and the hop size. This complex spectrum is used as the input to the subsequent complex-domain enhancement network.
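
The windowed framing and short-time Fourier transform of Step 1 can be sketched in plain Python. This is only an illustration: the frame length of 64, the hop of 32, and the Hann window are assumed values, and a practical pipeline would use an FFT library rather than the direct sum shown here.

```python
import cmath
import math


def hann(n_fft):
    """Periodic Hann window of length n_fft."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft) for n in range(n_fft)]


def stft(x, n_fft=64, hop=32):
    """Complex spectrum: one list of n_fft // 2 + 1 bins per frame."""
    window = hann(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        segment = [x[start + n] * window[n] for n in range(n_fft)]
        bins = [
            sum(segment[n] * cmath.exp(-2j * math.pi * q * n / n_fft)
                for n in range(n_fft))
            for q in range(n_fft // 2 + 1)
        ]
        frames.append(bins)
    return frames


# Usage: a 1 kHz tone sampled at 8 kHz falls in bin 1000 / (8000 / 64) = 8.
fs, f0 = 8000, 1000
x = [math.sin(2.0 * math.pi * f0 * n / fs) for n in range(256)]
X = stft(x)
peak_bin = max(range(len(X[0])), key=lambda q: abs(X[0][q]))
```

Each entry of `X` keeps both magnitude and phase, which is what the complex-domain backbone consumes.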

[0028] Step 2: Physical feature frequency band modeling and multi-resolution soft mask generation.

[0029] Based on the kinematic laws of bearings, a characteristic frequency set is determined, comprising the main fault frequency, modulation sideband frequency, and higher-order harmonic frequencies. Considering frequency smearing caused by speed fluctuations, a multi-resolution Gaussian soft mask is constructed instead of a binary hard mask. The bandwidth parameter of the soft mask is an adaptive bandwidth, positively correlated with the speed change rate, used to cover the frequency smearing interval. The multiple masks are then fused to obtain the final physical soft mask. Specifically, considering frequency smearing caused by speed fluctuations, this invention abandons binary masks and constructs a multi-resolution physical soft mask based on a Gaussian distribution, which includes the main fault frequency, modulation sideband frequency, and higher-order harmonic frequencies.

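[0030] 1) Construct the target frequency set: $\mathcal{F} = \{f_k\}_{k=1}^{K}$; where $f_k$ is the $k$-th prior fault characteristic frequency, corresponding to the main fault frequency, a modulation sideband frequency, or a higher-order harmonic frequency.

[0031] 2) For any $f_k \in \mathcal{F}$, define a single-scale soft mask as a Gaussian centered on $f_k$: $\exp\left(-\frac{(f - f_k)^2}{2\sigma_k^2}\right)$; where $\sigma_k$ is the adaptive bandwidth.

[0032] 3) The adaptive bandwidth $\sigma_k$ is positively correlated with the rate of change of rotational speed, so as to cover the frequency smearing interval: $\sigma_k(t) = \sigma_0 + \alpha \left| \mathrm{d}\omega(t) / \mathrm{d}t \right|$; where $\omega(t)$ is the bearing rotational speed or rotational frequency, $\sigma_0$ is the preset baseline bandwidth, $\alpha$ is the proportionality coefficient, and $t$ is time.

[0033] 4) Set multiple resolution scales $s = 1, \dots, S$. The final soft mask is obtained by fusion: $M = \sum_{s=1}^{S} w_s M_s$; $M \leftarrow \mathcal{N}(M)$; where $M_s$ is the single-scale soft mask at the $s$-th resolution scale, $w_s$ is the weighting coefficient of the $s$-th resolution scale, and $\mathcal{N}(\cdot)$ is a normalization operator, such as linear normalization to [0,1].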
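
Steps 1) to 4) can be sketched numerically as below. Several details are assumptions, since the text does not spell them out: the bandwidth is taken as an affine function of the speed change rate, each resolution scale is modeled as a multiplier on that bandwidth, the Gaussians of one scale are combined by taking their maximum, and normalization is min-max. The function names, scale factors, and frequencies are illustrative only.

```python
import math


def adaptive_bandwidth(sigma0, alpha, speed_rate):
    """Assumed affine form: baseline bandwidth plus a term growing with |d(speed)/dt|."""
    return sigma0 + alpha * abs(speed_rate)


def single_scale_mask(freqs, fault_freqs, sigma):
    """At each frequency, take the strongest Gaussian over all prior frequencies."""
    return [
        max(math.exp(-((f - fk) ** 2) / (2.0 * sigma ** 2)) for fk in fault_freqs)
        for f in freqs
    ]


def multi_resolution_mask(freqs, fault_freqs, sigma, scale_factors, weights):
    """Weighted fusion of single-scale masks, then min-max normalization to [0, 1]."""
    fused = [0.0] * len(freqs)
    for factor, weight in zip(scale_factors, weights):
        mask = single_scale_mask(freqs, fault_freqs, factor * sigma)
        fused = [acc + weight * value for acc, value in zip(fused, mask)]
    low, high = min(fused), max(fused)
    if high == low:
        return [0.0] * len(fused)
    return [(value - low) / (high - low) for value in fused]


# Usage on a 1 Hz grid from 0 to 500 Hz (index equals frequency in Hz).
freqs = [float(f) for f in range(501)]
fault_freqs = [100.0, 90.0, 110.0, 200.0, 300.0]  # main, sidebands, harmonics
sigma = adaptive_bandwidth(sigma0=2.0, alpha=0.5, speed_rate=4.0)
mask = multi_resolution_mask(freqs, fault_freqs, sigma,
                             scale_factors=[0.5, 1.0, 2.0],
                             weights=[0.2, 0.5, 0.3])
```

Unlike a binary mask, the resulting weights fall off smoothly around each prior frequency, so a fault line that drifts with speed is attenuated gradually rather than cut off.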

[0034] Step 3: Construction of a physics-guided complex-domain enhancement network.

[0035] This invention employs a Complex DCCRN (Complex Domain Convolutional Recurrent Network) as the backbone to simultaneously enhance amplitude and phase information. A physical attention module is designed within the network, injecting multi-resolution physical soft masks as attention weights. By multiplying these masks element-wise with encoder features, the network is guided to focus on physically appropriate frequency bands, achieving spectral fidelity enhancement and suppressing the accidental deletion of weak, low-speed features. Specifically, to fully utilize phase information, this invention uses a Complex DCCRN as the backbone and designs a physical attention module.

[0036] Suppose the encoder outputs complex features at a certain layer. Through the mapping operator Transform multi-resolution Gaussian soft mask The attention weights are mapped to match the dimension of the complex feature channels of the encoder output in a complex-domain convolutional recurrent network. : ; in, This is a mapping operator, which can be either an identity or a linear scaling operator. Physical attention injection is performed through element-wise multiplication.

[0037] The weighted complex features are the element-wise product of the encoder's complex features and the attention weights. This injection guides the network to focus on physically reasonable frequency bands, improving spectral fidelity and preventing weak signals at low rotational speed from being mistakenly removed.
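The injection can be sketched as follows, assuming encoder features of shape (channels, frequency, time) and a real-valued mask of shape (frequency, time). The function names and shapes are hypothetical. The mapping operator is shown as a linear scaling followed by broadcasting over channels, which is only one way to match the channel dimension.

```python
import numpy as np

def map_mask(mask, n_channels, scale=1.0):
    """Mapping operator: linear scaling (scale=1.0 is the identity), broadcast over channels."""
    return scale * np.broadcast_to(mask, (n_channels,) + mask.shape)

def inject_attention(features, attn):
    """Element-wise product of complex features with real attention weights."""
    return features * attn

rng = np.random.default_rng(0)
C, F, T = 4, 8, 5                                            # hypothetical feature shape
feats = rng.standard_normal((C, F, T)) + 1j * rng.standard_normal((C, F, T))
soft_mask = rng.uniform(0.1, 1.0, size=(F, T))               # stand-in for the physical soft mask
attn = map_mask(soft_mask, C)
weighted = inject_attention(feats, attn)
```

Because the weights are real and non-negative, the product rescales only the magnitude of each complex feature and leaves its phase unchanged.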

[0038] Step 4: Data-driven latent space contrastive learning constraints.

[0039] Under extremely low signal-to-noise ratio conditions (SNR < -6 dB), physical features may be completely submerged, and forcing physical constraints can introduce misleading bias; dual-view contrastive learning is therefore introduced. Two "heterogeneous noise views" are constructed for the same clean signal segment, latent representations are obtained through a shared encoder, and the InfoNCE contrastive loss is used to maximize the mutual information between views from the same source and minimize the similarity between dissimilar samples. This forces the encoder to ignore random noise perturbations and focus on structured semantics, maintaining separability in the "physical failure zone".

[0040] 1) For the same clean signal, construct two noisy views by adding two noise terms generated by different noise injection strategies, each selected from at least one of pink noise, random impulse noise, burst noise, and Gaussian white noise.

[0041] 2) A shared encoder extracts a latent representation from each of the two noisy views.

[0042] 3) A projection head maps each latent representation to a vector in the contrastive space.

[0043] 4) InfoNCE is employed to maximize the mutual information between views from the same source and minimize the similarity between dissimilar samples. The loss is parameterized by a temperature coefficient and computed over the in-batch sample set, in which the other samples act as negatives. This mechanism maintains feature separability in the physical failure region.
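A minimal sketch of the dual-view construction and the InfoNCE loss is given below. It uses the common cosine-similarity form of InfoNCE, which is an assumption about the exact loss. The shared encoder and projection head are omitted, with random vectors standing in for the projected embeddings. Only two of the four noise primitives are shown, and all names and values (`add_noise_at_snr`, `impulse_noise`, `tau`) are hypothetical.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so that the mixture has the requested SNR in dB."""
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

def impulse_noise(rng, n, rate=0.01):
    """Sparse random impulses: one possible 'random impulse noise' primitive."""
    return rng.standard_normal(n) * (rng.random(n) < rate)

def info_nce(z1, z2, tau=0.1):
    """Row i of z1 and row i of z2 form the positive pair; other rows are negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                                # (B, B) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 50.0 * np.arange(4096) / 8000.0)  # toy clean segment
view1 = add_noise_at_snr(clean, rng.standard_normal(4096), -10.0)   # Gaussian white noise view
view2 = add_noise_at_snr(clean, impulse_noise(rng, 4096), -10.0)    # impulse noise view

z = rng.standard_normal((8, 16))                 # stand-in for projected embeddings
loss_aligned = info_nce(z, z)                    # positives agree perfectly
loss_rolled = info_nce(z, np.roll(z, 1, axis=0)) # positives are mismatched
```

In a real pipeline the two views would pass through the shared encoder and projection head before the loss is computed; the aligned case yields the lower loss because each positive pair then has the highest similarity in its row.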

[0044] Step 5: Realistic industrial heterogeneous noise field injection and SNR-aware adaptive curriculum learning.

[0045] During the training phase, a high-fidelity simulated heterogeneous industrial noise field is constructed by randomly combining and injecting at least four types of noise primitives: pink noise, random impulse noise, burst noise, and Gaussian white noise. At the same time, the SNR of each input sample is estimated and binned, and an adaptive curriculum controller dynamically adjusts the weights of the "physical guidance" and "contrastive learning" losses. The low-SNR subset (SNR < -6 dB) relies primarily on contrastive learning; the medium-SNR subset (-6 dB ≤ SNR ≤ +2 dB) combines physical guidance and contrastive learning; and the high-SNR subset (SNR > +2 dB) relies primarily on physical guidance, with a Hinge Loss introduced to prevent reverse noise addition, ensuring high fidelity and lossless compatibility at high SNR.

[0046] 1) Estimate the signal-to-noise ratio of each sample and assign the sample to a bucket. The buckets reflect the following: at low SNR, physical features are unreliable; at medium SNR, physical guidance and contrastive learning work together; at high SNR, physical features are clear and fidelity constraints are required.

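[0047] 2) Total loss function and weight gating: the total loss combines the reconstruction loss and the contrastive loss, with weights adjusted dynamically according to the SNR bucket. Low bucket: a larger contrastive-loss weight and a smaller reconstruction-loss weight; mid bucket: the two weights are balanced; high bucket: the reconstruction-loss weight is increased and the Hinge term is enabled.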

[0048] 3) Introduce a Hinge Loss in the high bucket to prevent reverse noise addition and ensure high fidelity. The Hinge term can take the form of a threshold penalty on the difference between the enhanced output and the input: it is computed from the enhanced output complex spectrum, the input complex spectrum, a difference measure (e.g., the energy difference or amplitude difference within the mask band), and a tolerance threshold. This constraint ensures that no additional artifacts are introduced when the signal itself is already relatively clean.
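The bucketing, weight gating, and Hinge term can be sketched as below. The bucket thresholds (-6 dB and +2 dB) come from the text; the weight values, the relative in-band energy difference used as the difference measure, and the tolerance `eps` are hypothetical choices made only for illustration.

```python
import numpy as np

def snr_bucket(snr_db):
    """Thresholds from the text: low < -6 dB, mid in [-6, +2] dB, high > +2 dB."""
    if snr_db < -6.0:
        return "low"
    if snr_db <= 2.0:
        return "mid"
    return "high"

# Hypothetical weights: (reconstruction, contrastive, hinge).
WEIGHTS = {
    "low": (0.3, 1.0, 0.0),    # contrastive learning dominates
    "mid": (0.5, 0.5, 0.0),    # balanced
    "high": (1.0, 0.1, 0.5),   # reconstruction dominates, Hinge term enabled
}

def hinge_term(enhanced, noisy_input, mask, eps=0.05):
    """Penalize the relative in-band energy difference only beyond the tolerance eps."""
    e_out = np.sum(mask * np.abs(enhanced) ** 2)
    e_in = np.sum(mask * np.abs(noisy_input) ** 2)
    diff = np.abs(e_out - e_in) / (e_in + 1e-12)
    return max(0.0, float(diff) - eps)

def total_loss(l_rec, l_con, l_hinge, snr_db):
    """Weighted sum of the three terms, gated by the SNR bucket."""
    w_rec, w_con, w_hinge = WEIGHTS[snr_bucket(snr_db)]
    return w_rec * l_rec + w_con * l_con + w_hinge * l_hinge

rng = np.random.default_rng(0)
spec = rng.standard_normal((8, 5)) + 1j * rng.standard_normal((8, 5))  # toy complex spectrum
band = np.ones((8, 5))                                                 # toy mask band
unchanged = hinge_term(spec, spec, band)          # output identical to input
attenuated = hinge_term(0.5 * spec, spec, band)   # output loses 75% of in-band energy
```

With this form, an output that matches the input inside the mask band incurs no penalty, while a strongly attenuated output is penalized in proportion to the excess over the tolerance.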

[0049] Step 6: Diagnostic classification output.

[0050] The enhanced or represented features are input into the fault classifier, which outputs the fault category. To accommodate edge deployment efficiency, a teacher-student knowledge distillation diagnostic framework can be adopted: the teacher model is a CRNN (convolutional layers + bidirectional GRU), and the student model is a lightweight TCN (causal convolution + dilated convolution), both taking a 64-dimensional Log-Mel spectrum as input to perform five-class classification (normal, inner race, outer race, rolling element, cage). Experiments use the BJTU acoustic bearing dataset with this five-class task, and a full-SNR test set from -10 dB to +15 dB is constructed to verify robustness.

[0051] 1) Under extreme noise at -10 dB, the adaptive mechanism lets the method degenerate into a strong data-driven model to ensure "survivability".

[0052] 2) In the critical range of 0 dB to +2 dB, physical topology constraints stabilize the decision boundary and suppress prediction fluctuations.

[0053] 3) In the high-SNR region of +8 dB to +15 dB, attenuation regularization and the Hinge constraint avoid artificial artifacts, achieving lossless preservation.
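For the teacher-student framework mentioned in Step 6, one common way to write a distillation loss is sketched below: a temperature-softened KL term toward the teacher plus a cross-entropy term on the true labels. This standard formulation is an assumption, not necessarily the exact loss used here, and the temperature `T`, the mixing weight `alpha`, and the toy logits are hypothetical.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the class axis."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """alpha * softened KL(teacher || student) + (1 - alpha) * hard cross-entropy."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    soft = np.mean(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=1)) * T * T
    p = softmax(student_logits)
    hard = -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))
    return alpha * soft + (1.0 - alpha) * hard

rng = np.random.default_rng(0)
labels = rng.integers(0, 5, size=16)              # five fault classes
teacher = 5.0 * np.eye(5)[labels]                 # toy teacher that is confident and correct
student_random = rng.standard_normal((16, 5))     # untrained student
loss_match = distillation_loss(teacher, teacher, labels)
loss_random = distillation_loss(student_random, teacher, labels)
```

A student that reproduces the teacher's logits drives the soft term to zero, so its loss is lower than that of an untrained student.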

[0054] This specification also provides experimental verification results for the above method. Figure 3 compares denoising and fidelity for a cage fault at -10 dB SNR / 40 Hz. Figures 4 to 6 respectively show the denoising and fidelity comparison for three typical faults (rolling element, inner race, and outer race) under extreme noise conditions (-10 dB SNR / 40 Hz). All three figures use a unified layout of four columns and three rows for multi-dimensional evaluation. The four columns present, in order, the original clean signal, the noisy signal, the denoising result of the baseline method (DCCRN), and the denoising result of the proposed method, giving a visual comparison of signal characteristics at each stage. The three rows examine performance from different physical and feature perspectives: the first row uses a time-frequency spectrogram to show the energy distribution of the signal in the time-frequency plane, focusing on how well each method suppresses background noise and recovers the main frequency band structure; the second row uses a Log-Mel feature map to reflect differences in how the denoised signal is represented in the diagnostic feature space, verifying the method's ability to preserve fault-related spectral structure; the third row uses normalized order amplitude curves to quantify the retention of fault-related order responses, thereby evaluating whether each method protects key fault characteristics while denoising strongly. In addition, the ablation analysis of the performance contribution of each module at 20 Hz under different signal-to-noise ratios is shown in Table 1:

Table 1. Ablation analysis of the performance contribution of each module at 20 Hz under different signal-to-noise ratios.

[0055] The above method addresses the problem that, under low signal-to-noise ratio (SNR) conditions, bearing fault features are easily obscured and diagnosis is unstable because of strong background noise, time-varying noise, and speed fluctuations. It introduces a physically guided attention mechanism into the complex-domain enhancement network: a multi-resolution Gaussian soft mask covering the main frequency, modulation sidebands, and higher-order harmonics is constructed from the bearing fault characteristic frequencies and injected into the network as attention weights, strengthening fault-related frequency bands and suppressing over-smoothing of weak features at low speed. At the same time, heterogeneous-noise dual views are constructed, and noise-invariant semantic representations are learned in the latent space under InfoNCE contrastive learning constraints, improving robustness under extremely low SNR. Furthermore, samples are binned by estimated SNR, and an adaptive curriculum strategy dynamically adjusts the training weights of physical guidance and contrastive learning. Hinge constraints are introduced in the high-SNR region to avoid reverse noise addition and ensure the spectral fidelity of the enhanced output, thereby achieving stable bearing fault diagnosis across all SNR conditions.

[0056] The bearing fault diagnosis device provided by the present invention is described below. The device described below and the bearing fault diagnosis method described above may be read in correspondence with each other.

[0057] Figure 7 is a schematic diagram of a bearing fault diagnosis device provided in this specification. As shown in Figure 7, the bearing fault diagnosis device may include: a denoising model building module, used to construct a denoising model comprising a complex-domain convolutional recurrent network with a physical attention module added at the encoder output, and a contrastive learning module connected to the output of the complex-domain convolutional recurrent network. The physical attention module is configured to: generate frequency-domain attention masks with different feature scales based on prior information on the working conditions and prior knowledge of the fault characteristics of the input signal of the complex-domain convolutional recurrent network; and use the frequency-domain attention masks with different feature scales to weight the complex features output by the encoder in the complex-domain convolutional recurrent network. The contrastive learning module is configured to: inject different noises into the clean signal output by the complex-domain convolutional recurrent network; and map the clean signals with different noises to the same feature space to obtain multiple latent representation vectors corresponding to the same input signal.

[0058] The denoising model training module is used to construct a reconstruction loss by minimizing the difference between the input signal of the complex-domain convolutional recurrent network and the clean signal output by the complex-domain convolutional recurrent network; multiple latent representation vectors originating from the same input signal are defined as positive sample pairs, and latent representation vectors originating from other different input signals in the current training batch are defined as negative sample pairs; a contrastive loss is constructed by maximizing the similarity of positive sample pairs and minimizing the similarity of negative sample pairs; and the denoising model is trained based on the reconstruction loss and the contrastive loss.

[0059] The diagnostic module is used to input the bearing data to be diagnosed into the complex domain convolutional recurrent network in the trained denoising model to obtain denoised bearing data; and to input the denoised bearing data into the pre-trained fault diagnosis model to obtain bearing fault diagnosis results.

[0060] For specific limitations on the bearing fault diagnosis device, refer to the limitations on the bearing fault diagnosis method above, which are not repeated here. Each module in the aforementioned bearing fault diagnosis device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in, or independent of, the processor of a computer device in hardware form, or stored in the memory of a computer device in software form, so that the processor can call and execute the operations corresponding to each module.

[0061] This specification also provides a computer-readable storage medium storing a computer program that can be used to execute the bearing fault diagnosis method provided in Figure 1 above.

[0062] This specification also provides a schematic diagram of a computer device, shown in Figure 8. As shown in Figure 8, at the hardware level, this computer device includes a processor, an internal bus, a network interface, memory, and non-volatile memory, and may also include other hardware required for business operations. The processor reads the corresponding computer program from the non-volatile memory into memory and then executes it to implement the bearing fault diagnosis method provided in Figure 1 above.

[0063] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any references to memory, storage, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, or optical storage, etc. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can be in various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM), etc.

[0064] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

Claims

1. A bearing fault diagnosis method, characterized in that it comprises: constructing a denoising model comprising a complex-domain convolutional recurrent network with a physical attention module added at the encoder output, and a contrastive learning module connected to the output of the complex-domain convolutional recurrent network; wherein the physical attention module is configured to: generate frequency-domain attention masks with different feature scales based on prior information on the working conditions and prior knowledge of the fault characteristics of the input signal of the complex-domain convolutional recurrent network; and use the frequency-domain attention masks with different feature scales to weight the complex features output by the encoder in the complex-domain convolutional recurrent network; and the contrastive learning module is configured to: inject different noises into the clean signal output by the complex-domain convolutional recurrent network; and map the clean signals with different noises to the same feature space to obtain multiple latent representation vectors corresponding to the same input signal; constructing a reconstruction loss by minimizing the difference between the input signal of the complex-domain convolutional recurrent network and the clean signal output by the complex-domain convolutional recurrent network; defining multiple latent representation vectors originating from the same input signal as positive sample pairs, and latent representation vectors originating from other, different input signals within the current training batch as negative sample pairs; constructing a contrastive loss by maximizing the similarity of positive sample pairs and minimizing the similarity of negative sample pairs; training the denoising model based on the reconstruction loss and the contrastive loss; and inputting the bearing data to be diagnosed into the complex-domain convolutional recurrent network in the trained denoising model to obtain denoised bearing data, and inputting the denoised bearing data into the pre-trained fault diagnosis model to obtain bearing fault diagnosis results.

2. The bearing fault diagnosis method according to claim 1, characterized in that the workflow of the physical attention module specifically includes: constructing a set of fault-characteristic prior frequencies comprising the main fault frequency, the modulation sideband frequencies, and the higher-order harmonic frequencies, the set containing a given total number of fault-characteristic prior frequencies; determining, based on the bearing rotational speed corresponding to the input signal of the complex-domain convolutional recurrent network, the Gaussian soft mask bandwidth corresponding to each fault-characteristic prior frequency, the bandwidth being determined from the bearing rotational speed as a function of time, a preset baseline bandwidth, and a proportionality coefficient; constructing, based on the Gaussian soft mask bandwidth and the fault-characteristic prior frequencies, a single-scale Gaussian soft mask at each resolution scale for the input signal of the complex-domain convolutional recurrent network; fusing the single-scale Gaussian soft masks of the different resolution scales, by weighting, into a multi-resolution Gaussian soft mask, using a weighting coefficient for each resolution scale over the total number of resolution scales; mapping, through a mapping operator, the multi-resolution Gaussian soft mask to attention weights that match the dimension of the complex feature channels output by the encoder in the complex-domain convolutional recurrent network; and weighting the complex features output by the encoder in the complex-domain convolutional recurrent network by element-wise multiplication with the attention weights to obtain the weighted complex features.

3. The bearing fault diagnosis method according to claim 1, characterized in that the contrastive learning module includes a noise-adding unit, a shared encoder, and a projection head, and the workflow of the contrastive learning module specifically includes: the noise-adding unit injecting two different noises respectively into the clean signal output by the complex-domain convolutional recurrent network to construct a first noisy view and a second noisy view, each noise being selected from at least one of pink noise, random impulse noise, burst noise, and Gaussian white noise; inputting the first noisy view and the second noisy view into the shared encoder to extract a corresponding first latent representation and second latent representation; and inputting the first latent representation and the second latent representation into the projection head to obtain a first contrast vector and a second contrast vector in the contrastive space.

4. The bearing fault diagnosis method according to claim 1, characterized in that training the denoising model based on the reconstruction loss and the contrastive loss specifically includes: constructing a dataset containing bearing vibration signal samples with different signal-to-noise ratios; dividing, based on the signal-to-noise ratio of each bearing vibration signal sample, the bearing vibration signal samples in the dataset into three subsets: high signal-to-noise ratio, medium signal-to-noise ratio, and low signal-to-noise ratio; dynamically adjusting, based on the subset to which each bearing vibration signal sample belongs, the weight coefficients of the reconstruction loss term and the contrastive learning loss term in the total loss function, wherein: when the bearing vibration signal sample belongs to the high signal-to-noise ratio subset, the weight coefficient of the reconstruction loss term is configured to be greater than the weight coefficient of the contrastive learning loss term, and a Hinge Loss is introduced into the total loss function to suppress reverse noise addition in the denoising process; when the bearing vibration signal sample belongs to the medium signal-to-noise ratio subset, the weight coefficient of the reconstruction loss term is configured to be equal to the weight coefficient of the contrastive learning loss term, or to differ from it by less than a preset threshold, so as to achieve collaborative optimization of physical constraints and data-driven learning; and when the bearing vibration signal sample belongs to the low signal-to-noise ratio subset, the weight coefficient of the contrastive learning loss term is configured to be greater than the weight coefficient of the reconstruction loss term, so as to enhance the model's fault feature separability in scenarios where physical constraints fail; and iteratively updating, based on the total loss function, the network parameters of the denoising model using the backpropagation algorithm.

5. The bearing fault diagnosis method according to claim 4, characterized in that iteratively updating the network parameters of the denoising model based on the total loss function using the backpropagation algorithm adopts a phased curriculum training strategy, which specifically includes: Phase 1: only samples from the high signal-to-noise ratio subset and the medium signal-to-noise ratio subset are selected as input data to construct the training batches, while samples from the low signal-to-noise ratio subset are frozen or ignored; in this phase, the parameters are iteratively updated based on the corresponding weighted total loss function until the model loss converges, so that the denoising model first acquires the ability to extract fault features and reconstruct signals under medium-noise and quiet operating conditions; Phase 2: samples from the low signal-to-noise ratio subset are gradually introduced into training; as the number of training rounds increases, the sampling ratio or weight of low signal-to-noise ratio samples in the training batch is dynamically increased, and the high contrastive learning loss weight configuration corresponding to the low signal-to-noise ratio subset is used, so as to enhance the model's fault feature separability under strong noise interference and in scenarios where physical constraints fail.

6. The bearing fault diagnosis method according to claim 1, characterized in that the pre-trained fault diagnosis model is obtained based on a knowledge distillation strategy, specifically including: constructing a teacher-student network architecture by setting a pre-trained and converged convolutional recurrent neural network as the teacher model and a temporal convolutional network to be trained as the student model; using the teacher model to perform inference on the training samples to generate soft labels containing category probability distributions; constructing a distillation loss function that includes a softening loss term, which constrains the student model's output distribution to approximate the teacher model's soft labels, and a hard supervision loss term, which constrains the student model's output to match the true fault labels; and iteratively updating, based on the distillation loss function, the network parameters of the student model through the backpropagation algorithm until the student model converges, thereby obtaining the pre-trained fault diagnosis model.

7. A bearing fault diagnosis device, characterized in that it comprises: a denoising model building module, used to construct a denoising model comprising a complex-domain convolutional recurrent network with a physical attention module added at the encoder output, and a contrastive learning module connected to the output of the complex-domain convolutional recurrent network; wherein the physical attention module is configured to: generate frequency-domain attention masks with different feature scales based on prior information on the working conditions and prior knowledge of the fault characteristics of the input signal of the complex-domain convolutional recurrent network; and use the frequency-domain attention masks with different feature scales to weight the complex features output by the encoder in the complex-domain convolutional recurrent network; and the contrastive learning module is configured to: inject different noises into the clean signal output by the complex-domain convolutional recurrent network; and map the clean signals with different noises to the same feature space to obtain multiple latent representation vectors corresponding to the same input signal; a denoising model training module, used to construct a reconstruction loss by minimizing the difference between the input signal of the complex-domain convolutional recurrent network and the clean signal output by the complex-domain convolutional recurrent network; to define multiple latent representation vectors originating from the same input signal as positive sample pairs, and latent representation vectors originating from other, different input signals in the current training batch as negative sample pairs; to construct a contrastive loss by maximizing the similarity of positive sample pairs and minimizing the similarity of negative sample pairs; and to train the denoising model based on the reconstruction loss and the contrastive loss; and a diagnostic module, used to input the bearing data to be diagnosed into the complex-domain convolutional recurrent network in the trained denoising model to obtain denoised bearing data, and to input the denoised bearing data into the pre-trained fault diagnosis model to obtain bearing fault diagnosis results.

8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the bearing fault diagnosis method according to any one of claims 1 to 6.

9. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the bearing fault diagnosis method according to any one of claims 1 to 6.