Slewing bearing fault diagnosis method and device

By collecting and processing various types of monitoring signals, a multi-channel feature extraction architecture was constructed and cross-modal gating fusion was performed, which solved the problem of insufficient heterogeneous signal fusion feature discrimination capability in slewing bearing fault diagnosis and achieved high-precision fault diagnosis.

CN122241350A · Pending · Publication Date: 2026-06-19 · HEBEI UNIV OF SCI & TECH


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: HEBEI UNIV OF SCI & TECH
Filing Date: 2026-03-11
Publication Date: 2026-06-19

Application Information

IPC: G06F18/241; G01M13/00; G01M13/045; G06F18/25; G06F18/213; G06F18/10; G06N3/0455; G06N3/0464; G06N3/08

AI Tagging

Application Domain

Machine part testing; Neural learning methods


AI Technical Summary

Technical Problem

Existing technologies use simple multimodal signal fusion methods, which struggle to handle the differences between heterogeneous signals and produce fused features with weak discriminative power, resulting in poor fault diagnosis of slewing bearings.

Method used

Multiple types of monitoring signals are collected during the operation of the slewing bearing. Independent feature tensors are extracted through a multi-channel feature extraction architecture and cross-modal gating fusion processing is performed. Combined with bidirectional temporal dependency modeling and global key feature focusing, fault diagnosis results are output.

Benefits of technology

It achieves full-dimensional capture of slewing bearing fault characteristics, improves the accuracy and stability of fault diagnosis, solves the problem of insufficient discrimination power of heterogeneous signal fusion, and significantly improves fault differentiation capability.


Smart Images

Figure CN122241350A_ABST

Patent Text Reader

Abstract

This invention relates to the field of fault diagnosis technology, and more particularly to a method and apparatus for diagnosing slewing bearing faults. The method includes: acquiring multiple types of monitoring signals during the operation of the slewing bearing, preprocessing the obtained multi-source raw monitoring data to obtain a standardized sample sequence; constructing a multi-channel feature extraction architecture to extract features from different types of monitoring signals in the sample sequence, obtaining independent feature tensors corresponding to each type of signal; performing cross-modal gating fusion processing on the independent feature tensors to obtain fused features; inputting the fused features into a classification module for fault identification, and outputting the fault diagnosis result of the slewing bearing. This invention can solve the problem of insufficient discrimination capability of fused features caused by the simple multimodal signal fusion method in the prior art.


Description

Technical Field

[0001] This invention relates to the field of fault diagnosis technology, and in particular to a method and apparatus for diagnosing slewing bearing faults.

Background Technology

[0002] Slewing bearings are critical rotating components capable of withstanding axial loads, radial loads, and overturning moments, and are widely used in large mechanical equipment such as wind power generation equipment, construction machinery, lifting equipment, and radar systems. Operating under long-term low-speed, heavy-load, and complex conditions, slewing bearings are prone to rolling element damage, inner or outer raceway wear, and spalling. Failure to detect these faults in a timely manner will severely impact the operational safety and reliability of the entire equipment. Therefore, conducting efficient fault diagnosis research is of significant practical importance.

[0003] Currently, most slewing bearing fault diagnosis technologies rely on a single type of signal, such as vibration or acoustic emission, collected by sensors. Hand-crafted features are extracted through time-domain, frequency-domain, or time-frequency-domain processing and then combined with traditional machine learning to complete fault identification. Some technologies also employ deep learning models such as convolutional neural networks and recurrent neural networks to extract and classify signal features automatically. However, these methods mostly take a single sensor signal as input, so their perception of fault information is incomplete. To characterize the equipment's operating status more comprehensively and improve diagnostic accuracy, existing technologies have also proposed multimodal signal fusion fault diagnosis methods. These methods simply concatenate the multimodal signals or fuse them by linear weighting before conducting diagnostic analysis.

[0004] However, existing multimodal signal fusion methods are relatively simple and struggle to handle the differences between heterogeneous signals. The fused features they produce are not sufficiently discriminative, and the overall diagnostic performance falls short of actual industrial needs.

Summary of the Invention

[0005] This invention provides a method and apparatus for diagnosing slewing bearing faults, in order to solve the problem of insufficient fusion feature discrimination capability caused by the simple multimodal signal fusion method in the prior art.

[0006] In a first aspect, embodiments of the present invention provide a method for diagnosing slewing bearing faults, including:
Collecting multiple types of monitoring signals during the operation of the slewing bearing, and preprocessing the obtained multi-source raw monitoring data to obtain a standardized sample sequence;
Constructing a multi-channel feature extraction architecture to extract features from the different types of monitoring signals in the sample sequence, thereby obtaining independent feature tensors corresponding to each type of signal;
Performing cross-modal gating fusion processing on the independent feature tensors to obtain fused features;
Inputting the fused features into a classification module for fault identification, and outputting the fault diagnosis result of the slewing bearing.

[0007] In one possible implementation, after performing cross-modal gating fusion processing on the independent feature tensors to obtain fused features, the method further includes: performing bidirectional temporal dependency modeling and global key feature focusing on the fused features to obtain a global fault feature vector. In this case, inputting the fused features into the classification module for fault identification and outputting the fault diagnosis result of the slewing bearing includes: inputting the global fault feature vector into the classification module for fault identification, and outputting the fault diagnosis result of the slewing bearing.

[0008] In one possible implementation, the multi-source raw monitoring data includes vibration signals and acoustic emission signals. Constructing the multi-channel feature extraction architecture includes: constructing a dual-stream multi-scale residual encoder that includes two parallel neural network branches with non-shared parameters, a vibration stream and an acoustic emission stream. The first layer of each branch has N parallel convolutional channels with different kernel sizes, where N is a positive integer greater than or equal to 3.

[0009] In one possible implementation, feature extraction is performed on different types of monitoring signals in the sample sequence to obtain independent feature tensors corresponding to each type of signal, including: The output features of the N parallel convolutional channels in each branch are concatenated along the channel dimension. The concatenated features are processed sequentially through a batch normalization layer and a ReLU activation layer. The activated features are then input into a Conv1x1 projection convolutional layer for channel fusion and dimensionality reduction. The dimensionality-reduced features are then fed into the residual blocks for deep feature mapping. The features output by the residual block are downsampled by a max pooling layer to obtain the independent feature tensors corresponding to each type of signal.

[0010] In one possible implementation, the dimensionality-reduced features are fed into the residual block for deep feature mapping, including: The dimensionality-reduced features are sequentially input into two Conv1d (3x3) convolutional layers. The first Conv1d (3x3) convolutional layer is followed by a ReLU activation layer. The input features of the residual block are added to the output of the second Conv1d (3x3) convolutional layer through a skip connection, and then output after ReLU activation.

[0011] In one possible implementation, the independent feature tensors are subjected to cross-modal gating fusion processing to obtain fused features, including: The independent feature tensor corresponding to the vibration signal is compressed through a Conv1d(1x1) layer and processed by a Sigmoid activation function to generate a vibration gating mask; the independent feature tensor corresponding to the acoustic emission signal is compressed through a Conv1d(1x1) layer and processed by a Sigmoid activation function to generate an acoustic emission gating mask. Interactive filtering is performed by multiplying the vibration gating mask element-wise with the independent feature tensor corresponding to the acoustic emission signal, and by multiplying the acoustic emission gating mask element-wise with the independent feature tensor corresponding to the vibration signal, to obtain two sets of corrected features. The two sets of corrected features are concatenated along the channel dimension, and then compressed through a convolutional layer before being input into a fusion convolutional block for deep fusion to obtain fused features.

[0012] In one possible implementation, the fused features are subjected to bidirectional temporal dependency modeling and global key feature focusing to obtain a global fault feature vector, including: The fused features are input into the Bi-LSTM layer, and the forward hidden state sequence and the backward hidden state sequence are calculated in the order of time step from 1 to T and T to 1, respectively. The forward hidden state and the backward hidden state corresponding to each time step are concatenated one by one to obtain the temporal feature vector. The temporal feature vector is input into a multi-head self-attention mechanism. Attention weights are calculated on the temporal feature vector and then weighted. The weighted temporal feature sequence is then subjected to global average pooling along the time dimension to obtain a global fault feature vector.

[0013] In one possible implementation, calculating attention weights on the temporal feature vector and applying them includes: mapping the temporal feature vector to a query matrix, a key matrix, and a value matrix through multiple sets of independent linear transformation matrices, with each set of linear transformation matrices corresponding to one attention head of the multi-head self-attention mechanism; calculating the attention score at each time step using the scaled dot product formula; and concatenating the weighted feature results obtained from all attention heads along the channel dimension and integrating them through a linear transformation matrix to obtain the weighted temporal feature sequence.

[0014] In one possible implementation, the scaled dot product formula is: $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$; in the formula, $\mathrm{Attention}(Q, K, V)$ denotes the attention score, $Q$ the query matrix, $K$ the key matrix, $V$ the value matrix, $K^{T}$ the transpose of the key matrix, $d_k$ the dimension of the key matrix, and $\sqrt{d_k}$ the scaling factor; $\mathrm{softmax}(\cdot)$ converts the similarity score matrix into an attention weight matrix.

[0015] In a second aspect, embodiments of the present invention provide a slewing bearing fault diagnosis device, comprising:
An acquisition module, used to acquire multiple types of monitoring signals during the operation of the slewing bearing;
A preprocessing module, used to preprocess the obtained multi-source raw monitoring data to obtain a standardized sample sequence;
A feature extraction module, used to construct a multi-channel feature extraction architecture, extract features from the different types of monitoring signals in the sample sequence, and obtain independent feature tensors corresponding to each type of signal;
A feature fusion module, used to perform cross-modal gating fusion processing on the independent feature tensors to obtain fused features;
A fault identification module, used to input the fused features into a classification module for fault identification and output the fault diagnosis result of the slewing bearing.

[0016] This invention provides a method and apparatus for diagnosing slewing bearing faults. It collects multiple types of monitoring signals during the operation of the slewing bearing and preprocesses the obtained multi-source raw monitoring data to obtain a standardized sample sequence. A multi-channel feature extraction architecture is constructed to extract features from the different types of monitoring signals in the sample sequence, obtaining independent feature tensors corresponding to each type of signal. Cross-modal gating fusion processing is performed on the independent feature tensors to obtain fused features. The fused features are then input into a classification module for fault identification, outputting the fault diagnosis result of the slewing bearing. By collecting both vibration and acoustic emission monitoring signals, this invention overcomes the limitation that a single signal cannot fully represent equipment status, achieving full-dimensional capture of fault features. Compared to traditional simple splicing, cross-modal gating fusion processing achieves complementary coupling of vibration and acoustic emission features, solving the problem of insufficient discriminative power in heterogeneous signal fusion and significantly improving the fault differentiation capability of the fused features.

Attached Figure Description

[0017] To more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0018] Figure 1 is a flowchart illustrating the implementation of the slewing bearing fault diagnosis method provided in an embodiment of the present invention.
Figure 2 is a flowchart illustrating the implementation of a slewing bearing fault diagnosis method according to another embodiment of the present invention.
Figure 3 is a schematic diagram of the slewing bearing fault diagnosis device provided in an embodiment of the present invention.
Figure 4 is a schematic diagram of the slewing bearing fault diagnosis device provided in another embodiment of the present invention.
Figure 5 is a schematic diagram of an electronic device provided in an embodiment of the present invention.

Detailed Implementation

[0019] The embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

[0020] Figure 1 shows a flowchart of a slewing bearing fault diagnosis method provided by an embodiment of the present invention, which is described in detail below:

Step 101: Collect multiple types of monitoring signals during the operation of the slewing bearing, and preprocess the obtained multi-source raw monitoring data to obtain a standardized sample sequence.

[0021] This step involves jointly monitoring the operating status of the slewing bearing using multiple types of sensors, collecting raw monitoring data from multiple sources, and then preprocessing the data to obtain a standardized sample sequence that meets the model input requirements, thus laying the data foundation for subsequent feature extraction.

[0022] Multi-source raw monitoring data includes vibration signals and acoustic emission signals. During the actual operation of the slewing bearing, vibration sensors and acoustic emission sensors are deployed to monitor its operating status in real time, simultaneously collecting the vibration signals and acoustic emission signals generated during the slewing bearing's operation. These two types of signals are then integrated to form multi-source raw monitoring data. By jointly acquiring multiple types of sensor signals, fault-related information during the slewing bearing's operation is comprehensively captured, compensating for the deficiency of a single sensor signal in fully characterizing the equipment's operating status.

[0023] As shown in Figure 2, the acquired multi-source raw monitoring data are sequentially subjected to data annotation, Z-score standardization, and sliding window time series segmentation. The specific processing steps are as follows: First, the multi-source raw monitoring data are annotated with the corresponding slewing bearing operating status. Then, the annotated raw signals are Z-score standardized to remove the effect of differing units and scales between sensor types, ensuring data consistency and comparability. Finally, considering the low-speed operation of the slewing bearing and its long fault cycle, a sliding window method is used to segment the standardized continuous signal. For example, the sliding window length can be set to 4096, and the continuous vibration signal and acoustic emission signal are segmented into corresponding sample sequences. The result is the overall standardized sample sequence, which provides standardized, serialized input data for the subsequent decoupled feature extraction by the multi-channel feature extraction architecture.
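The preprocessing chain described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the function names, the stride of 2048, the single label per recording, and the premise that both signals share one sampling rate and length are not given in the source, which specifies only the window length of 4096 as an example.

```python
import numpy as np

def zscore(signal):
    """Standardize a 1-D signal to zero mean and unit variance."""
    signal = np.asarray(signal, dtype=np.float64)
    std = signal.std()
    # Guard against a constant signal (std == 0) to avoid division by zero.
    return (signal - signal.mean()) / (std if std > 0 else 1.0)

def sliding_windows(signal, window=4096, stride=2048):
    """Cut a 1-D signal into windows of length `window`, stepping by `stride`.
    The stride is an assumption; the source only gives the window length."""
    n = (len(signal) - window) // stride + 1
    if n <= 0:
        return np.empty((0, window))
    return np.stack([signal[i * stride : i * stride + window] for i in range(n)])

def preprocess(vibration, acoustic, label, window=4096, stride=2048):
    """Return samples of shape (n, 2, window) and one label per sample.
    Assumes one operating-status label for the whole recording."""
    vib = sliding_windows(zscore(vibration), window, stride)
    ae = sliding_windows(zscore(acoustic), window, stride)
    n = min(len(vib), len(ae))
    samples = np.stack([vib[:n], ae[:n]], axis=1)  # axis 1: vibration, acoustic emission
    labels = np.full(n, label)
    return samples, labels

# Synthetic stand-ins for the two sensor recordings (different scales on purpose).
rng = np.random.default_rng(0)
vibration = rng.normal(5.0, 2.0, size=20000)
acoustic = rng.normal(-1.0, 0.1, size=20000)
samples, labels = preprocess(vibration, acoustic, label=1)
```

An overlapping stride is a common choice for increasing the number of samples, but whether the method uses overlap at all is not stated in the source.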

[0024] Step 102: Construct a multi-channel feature extraction architecture, extract features from different types of monitoring signals in the sample sequence, and obtain independent feature tensors corresponding to each type of signal.

[0025] This step constructs a multi-channel feature extraction architecture based on a dual-stream multi-scale residual encoder. For the different characteristics of the two types of monitoring signals, vibration and acoustic emission, multi-scale feature decoupling extraction is performed separately to finally obtain the independent feature tensors corresponding to each modal signal, providing the basic input for subsequent cross-modal fusion.

[0026] In one embodiment, constructing a multi-channel feature extraction architecture may include: constructing a dual-stream multi-scale residual encoder that includes two parallel neural network branches with non-shared parameters, a vibration stream and an acoustic emission stream. The first layer of each branch has N parallel convolutional channels with different kernel sizes, where N is a positive integer greater than or equal to 3.

[0027] A dual-stream multi-scale residual encoder is adopted as the core architecture for multi-channel feature extraction. This architecture includes two parallel neural network branches with non-shared parameters, a vibration stream and an acoustic emission stream, which process the vibration signal sample sequences and the acoustic emission signal sample sequences, respectively.

[0028] For example, the first layer of each branch has three parallel convolution channels with different kernel sizes: 7, 5, and 3, used to capture fault features at different scales. Large convolutional kernels (e.g., 7) are used to capture long-period trends and global fault modes in vibration or acoustic emission signals; Medium-sized convolutional kernels (e.g., 5) are used to extract medium-scale fault feature components; Small convolution kernels (e.g., 3) are used to focus on weak fault features in signals, such as transient shocks and local microcracks.
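As a rough illustration of the first layer of one branch, the following NumPy sketch runs three parallel 1-D convolutions with kernel sizes 7, 5, and 3 over the same input and concatenates their outputs along the channel dimension. The helper names, the number of output channels per scale (8), the random weights, and the "same" zero padding are assumptions made for the example; the source specifies only the kernel sizes.

```python
import numpy as np

def conv1d_same(x, weight):
    """1-D cross-correlation with zero padding so output length equals input length.
    x: (C_in, T); weight: (C_out, C_in, K) with odd K; returns (C_out, T)."""
    k = weight.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    # windows has shape (C_in, T, K): one length-K window per time step.
    windows = np.lib.stride_tricks.sliding_window_view(xp, k, axis=1)
    return np.einsum("ctk,ock->ot", windows, weight)

def multi_scale_layer(x, kernels=(7, 5, 3), channels_per_scale=8, seed=0):
    """Apply one convolution per kernel size and concatenate along channels."""
    rng = np.random.default_rng(seed)
    outputs = []
    for k in kernels:
        # Random weights stand in for learned parameters (illustration only).
        w = rng.normal(0.0, 0.1, size=(channels_per_scale, x.shape[0], k))
        outputs.append(conv1d_same(x, w))
    return np.concatenate(outputs, axis=0)

# One standardized sample of a single modality: 1 channel, 4096 time steps.
x = np.random.default_rng(1).normal(size=(1, 4096))
features = multi_scale_layer(x)  # shape (24, 4096): 3 scales x 8 channels
```

Padding each scale to the same output length is what makes channel-wise concatenation possible; without it the three outputs would differ in length.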

[0029] In one embodiment, as shown in Figure 2, performing feature extraction on the different types of monitoring signals in the sample sequence to obtain independent feature tensors corresponding to each type of signal may include: The output features of the N parallel convolutional channels in each branch are concatenated along the channel dimension. The concatenated features are processed sequentially through a batch normalization layer and a ReLU activation layer. The activated features are then input into a Conv1x1 projection convolutional layer for channel fusion and dimensionality reduction. The dimensionality-reduced features are then fed into the residual block for deep feature mapping. The features output by the residual block are downsampled by a max pooling layer to obtain the independent feature tensors corresponding to each type of signal.

[0030] Optionally, taking a single branch (such as the vibration stream) as an example, the complete process of feature extraction and tensor generation is as follows:
Multi-scale feature concatenation: The features output from the three parallel convolutional channels are concatenated along the channel dimension to integrate fault information at different scales within the same modal signal, forming a multi-scale fusion feature.
Normalization and nonlinear activation: The concatenated features are sequentially input into the batch normalization (BN) layer and the ReLU activation layer. The batch normalization layer reduces the distribution differences between batches of data and accelerates model convergence; the ReLU activation layer introduces a nonlinear transformation to uncover complex relationships between features.
Channel fusion and dimensionality reduction: The activated features are input into the Conv1x1 projection convolutional layer, where a 1×1 convolutional kernel fuses and reduces the channel dimension, lowering the subsequent computation while preserving effective multi-scale features.
Deep mapping of residual blocks: In one embodiment, feeding the dimensionality-reduced features into the residual block for deep feature mapping includes: inputting the dimensionality-reduced features into two Conv1d (3x3) convolutional layers in sequence, with the first Conv1d (3x3) convolutional layer followed by a ReLU activation layer; and adding the input features of the residual block to the output of the second Conv1d (3x3) convolutional layer through a skip connection, then outputting the sum after ReLU activation. This mitigates the vanishing gradient problem in deep networks while retaining weak fault features.
Pooling downsampling: The features output by the residual block are downsampled by a max pooling layer with a kernel size of 2, which further compresses the feature dimension, enhances the translation invariance of the features, and improves the robustness of the model to changes in the location of fault features.
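The stages that follow the multi-scale concatenation in one branch can be sketched as below. This is an illustrative NumPy version under several assumptions: "Conv1d (3x3)" is read as a 1-D convolution with kernel size 3, batch normalization is shown in its training-mode form without learned scale and shift, biases are omitted, and all channel counts and weights are made up for the example.

```python
import numpy as np

def conv1d_same(x, weight):
    """Batched 1-D cross-correlation with zero 'same' padding.
    x: (B, C_in, T); weight: (C_out, C_in, K) with odd K; returns (B, C_out, T)."""
    k = weight.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad)))
    windows = np.lib.stride_tricks.sliding_window_view(xp, k, axis=2)  # (B, C_in, T, K)
    return np.einsum("bctk,ock->bot", windows, weight)

def batch_norm(x, eps=1e-5):
    """Normalize each channel over the batch and time axes (no learned affine)."""
    mean = x.mean(axis=(0, 2), keepdims=True)
    var = x.var(axis=(0, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """conv(k=3) -> ReLU -> conv(k=3), add the block input, then ReLU."""
    y = relu(conv1d_same(x, w1))
    y = conv1d_same(y, w2)
    return relu(x + y)  # skip connection

def max_pool(x, size=2):
    """Non-overlapping max pooling along time; a trailing remainder is dropped."""
    b, c, t = x.shape
    t -= t % size
    return x[:, :, :t].reshape(b, c, t // size, size).max(axis=-1)

def branch_tail(concat, w_proj, w1, w2):
    x = relu(batch_norm(concat))   # BN + ReLU on the concatenated multi-scale features
    x = conv1d_same(x, w_proj)     # 1x1 projection: channel fusion and reduction
    x = residual_block(x, w1, w2)  # deep feature mapping
    return max_pool(x, 2)          # downsampling with kernel size 2

rng = np.random.default_rng(0)
concat = rng.normal(size=(4, 24, 4096))           # (batch, channels, time), hypothetical sizes
w_proj = rng.normal(0.0, 0.1, size=(16, 24, 1))   # 24 -> 16 channels
w1 = rng.normal(0.0, 0.1, size=(16, 16, 3))
w2 = rng.normal(0.0, 0.1, size=(16, 16, 3))
out = branch_tail(concat, w_proj, w1, w2)         # shape (4, 16, 2048)
```

The skip connection requires the residual block to keep the channel count unchanged, which is why the 1×1 projection comes before it rather than after.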

[0031] The vibration stream and acoustic emission stream branches each execute the above feature extraction process, yielding the vibration feature tensor and the acoustic emission feature tensor. The two tensors are independent feature representations of their respective modal signals and do not interfere with each other; they serve as the input for the subsequent cross-modal gating fusion.

[0032] Step 103: Perform cross-modal gating fusion processing on the independent feature tensors to obtain fused features.

[0033] This step addresses the heterogeneity of vibration feature tensors and acoustic emission feature tensors by employing a cross-modal gating fusion strategy instead of the traditional simple splicing or linear weighting method. Through interactive filtering between modes, effective feature selection, noise suppression, and adaptive weight adjustment are achieved, ultimately completing the deep fusion of the two types of features and obtaining fused features that are both complementary and highly discriminative.

[0034] In one embodiment, as shown in Figure 2, performing cross-modal gating fusion processing on the independent feature tensors to obtain fused features may include: The independent feature tensor corresponding to the vibration signal is compressed through a Conv1d(1x1) layer and processed by a Sigmoid activation function to generate a vibration gating mask; the independent feature tensor corresponding to the acoustic emission signal is compressed through a Conv1d(1x1) layer and processed by a Sigmoid activation function to generate an acoustic emission gating mask. Interactive filtering is performed by multiplying the vibration gating mask element-wise with the independent feature tensor corresponding to the acoustic emission signal, and by multiplying the acoustic emission gating mask element-wise with the independent feature tensor corresponding to the vibration signal, to obtain two sets of corrected features. The two sets of corrected features are concatenated along the channel dimension, and then compressed through a convolutional layer before being input into a fusion convolutional block for deep fusion to obtain fused features.

[0035] Optionally, the vibration feature tensor is input into a Conv1d(1×1) convolutional layer to complete channel dimension compression and feature mapping, and the output values are then mapped to the interval from 0 to 1 by the Sigmoid activation function to generate the vibration gating mask; likewise, the acoustic emission feature tensor is processed by a Conv1d(1×1) convolutional layer and a Sigmoid activation function to generate the acoustic emission gating mask.

[0036] Each element of the above gate mask corresponds to the channel weight of the original feature tensor. The closer the value is to 1, the more effective the fault features of the corresponding channel are. The closer the value is to 0, the higher the proportion of noise interference or invalid information in the channel.

[0037] Based on the two generated gating masks, cross-modal interactive filtering is performed so that the features of the two modalities constrain and filter each other. The vibration gating mask is multiplied element-wise with the acoustic emission feature tensor, i.e. $F'_{ae} = F_{ae} \odot M_{vib}$, which selects the effective components of the acoustic emission signal that are strongly correlated with the vibration fault characteristics and suppresses irrelevant noise. Simultaneously, the acoustic emission gating mask is multiplied element-wise with the vibration feature tensor, i.e. $F'_{vib} = F_{vib} \odot M_{ae}$, which selects the key information in the vibration signal that is complementary to the acoustic emission fault characteristics and removes redundant components. Here, $F'_{ae}$ denotes the corrected acoustic emission features, $F_{ae}$ the acoustic emission features, $M_{vib}$ the vibration gating mask, $F'_{vib}$ the corrected vibration features, $F_{vib}$ the vibration features, $M_{ae}$ the acoustic emission gating mask, and $\odot$ element-wise multiplication.

[0038] Through this bidirectional filtering process, two sets of corrected features are obtained, which not only preserve the core fault features of each mode, but also realize feature calibration between modes, thus solving the problem of differential adaptation during heterogeneous signal fusion.

[0039] The two sets of corrected features are concatenated along the channel dimension to integrate the filtered bimodal feature information and form a concatenated feature. The concatenated feature is then input into a convolutional layer for channel compression to reduce the feature dimension and computational redundancy. The compressed feature is then input into a fusion convolutional block to complete deep fusion. The fusion convolutional block consists of a Conv1d (1×1) convolutional layer, a batch normalization layer (BN1d), and a ReLU activation layer connected in series. The convolutional layer achieves deep coupling of features, the batch normalization layer ensures the stability of the feature distribution, and the ReLU activation layer introduces non-linear relationships. Finally, a fused feature with unified dimension and close feature correlation is output.
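The cross-modal gating fusion can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: the variable names, channel counts, random weights, and the choice to keep each mask at the same channel count as the feature tensor it multiplies (so that element-wise multiplication is well defined) are assumptions. A kernel-size-1 convolution is written as a per-time-step linear map over channels, and batch normalization is shown without learned scale and shift.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pointwise_conv(x, w):
    """Conv1d with kernel size 1: x (B, C_in, T), w (C_out, C_in) -> (B, C_out, T)."""
    return np.einsum("oc,bct->bot", w, x)

def fusion_block(x, w, eps=1e-5):
    """Conv1d(1x1) -> batch normalization (no learned affine) -> ReLU."""
    y = pointwise_conv(x, w)
    mean = y.mean(axis=(0, 2), keepdims=True)
    var = y.var(axis=(0, 2), keepdims=True)
    return np.maximum((y - mean) / np.sqrt(var + eps), 0.0)

def cross_modal_gated_fusion(f_vib, f_ae, w_gate_vib, w_gate_ae, w_compress, w_fuse):
    # Each modality produces a mask in (0, 1) from its own features.
    m_vib = sigmoid(pointwise_conv(f_vib, w_gate_vib))
    m_ae = sigmoid(pointwise_conv(f_ae, w_gate_ae))
    # Interactive filtering: each mask gates the OTHER modality's features.
    f_ae_corr = f_ae * m_vib
    f_vib_corr = f_vib * m_ae
    # Concatenate along channels, compress, then apply the fusion block.
    concat = np.concatenate([f_vib_corr, f_ae_corr], axis=1)
    compressed = pointwise_conv(concat, w_compress)
    return fusion_block(compressed, w_fuse), m_vib, m_ae

rng = np.random.default_rng(0)
f_vib = rng.normal(size=(4, 16, 2048))   # hypothetical branch outputs (B, C, T)
f_ae = rng.normal(size=(4, 16, 2048))
w_gate_vib = rng.normal(0.0, 0.1, size=(16, 16))
w_gate_ae = rng.normal(0.0, 0.1, size=(16, 16))
w_compress = rng.normal(0.0, 0.1, size=(16, 32))  # 32 concatenated channels -> 16
w_fuse = rng.normal(0.0, 0.1, size=(16, 16))
fused, m_vib, m_ae = cross_modal_gated_fusion(
    f_vib, f_ae, w_gate_vib, w_gate_ae, w_compress, w_fuse)
```

Compared with plain concatenation, the only extra cost here is two pointwise convolutions and two element-wise products, which is what lets each modality down-weight channels of the other that it does not corroborate.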

[0040] Slewing bearing faults exhibit significant temporal evolution characteristics. Existing technologies have limited ability to model key fault information over long time series, which affects the accuracy and stability of fault identification. Therefore, in this embodiment, as shown in Figure 2, after performing cross-modal gating fusion processing on the independent feature tensors to obtain fused features, the method may also include: performing bidirectional temporal dependency modeling and global key feature focusing on the fused features to obtain a global fault feature vector.

[0041] This step targets the time-series attributes of the fused features, sequentially capturing the temporal evolution of the fault through bidirectional temporal dependency modeling, and then achieving accurate extraction of the core fault information through global key feature focusing. Finally, a global fault feature vector containing complete temporal correlation and key fault features is obtained, providing a highly discriminative feature input for subsequent fault classification and discrimination.

[0042] In one embodiment, performing bidirectional temporal dependency modeling and global key feature focusing on the fused features to obtain a global fault feature vector may include: The fused features are input into the Bi-LSTM layer, and the forward hidden state sequence and the backward hidden state sequence are calculated in the order of time step from 1 to T and T to 1 respectively. The forward hidden state and the backward hidden state corresponding to each time step are concatenated one by one to obtain the temporal feature vector. The temporal feature vector is input into the multi-head self-attention mechanism. Attention weights are calculated on the temporal feature vector and then weighted. The weighted temporal feature sequence is then subjected to global average pooling along the time dimension to obtain the global fault feature vector.

[0043] In the bidirectional temporal dependency modeling stage, the fused features obtained after cross-modal gating fusion are input into the temporal modeling module. The bidirectional temporal dependency of the fused features is modeled through a Bi-LSTM layer, and the number of hidden layer nodes of the Bi-LSTM layer can be set to 64.

[0044] The Bi-LSTM layer contains two independent temporal computation layers: a forward layer and a backward layer. The forward layer takes the fused features in forward order, from time step 1 to T, and calculates the hidden state step by step to capture the cumulative effect and forward evolution of the slewing bearing fault over time. The backward layer takes the fused features in reverse order, from time step T to 1, and calculates the hidden state step by step to fully exploit the reverse contextual information of the fault features.

[0045] After completing the forward and backward hidden state sequence calculations, the forward and backward hidden states corresponding to each time step are concatenated to obtain a temporal feature vector containing complete bidirectional temporal semantic information. This effectively solves the problem that traditional models do not adequately model long-sequence temporal information and are prone to losing early fault information.
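The forward pass, backward pass, and per-step concatenation described above can be sketched in plain Python. This is only an illustrative sketch: the standard LSTM gate equations, the random weights, and the helper names (`init_lstm`, `lstm_step`, `run_direction`, `bilstm`) are assumptions that do not come from the specification, and the hidden size is set to 4 instead of 64 to keep the example small.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_lstm(input_dim, hidden_dim, rng):
    """Random parameters for one direction; rows are stacked as gates i, f, g, o."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "W": mat(4 * hidden_dim, input_dim),   # input-to-gate weights
        "U": mat(4 * hidden_dim, hidden_dim),  # hidden-to-gate weights
        "b": [0.0] * (4 * hidden_dim),
        "hidden_dim": hidden_dim,
    }

def lstm_step(p, x, h, c):
    """One standard LSTM step (assumed form); returns the new hidden and cell state."""
    n = p["hidden_dim"]
    z = [
        sum(w * xi for w, xi in zip(p["W"][r], x))
        + sum(u * hi for u, hi in zip(p["U"][r], h))
        + p["b"][r]
        for r in range(4 * n)
    ]
    i = [sigmoid(v) for v in z[0:n]]            # input gate
    f = [sigmoid(v) for v in z[n:2 * n]]        # forget gate
    g = [math.tanh(v) for v in z[2 * n:3 * n]]  # candidate cell state
    o = [sigmoid(v) for v in z[3 * n:4 * n]]    # output gate
    c_new = [f[k] * c[k] + i[k] * g[k] for k in range(n)]
    h_new = [o[k] * math.tanh(c_new[k]) for k in range(n)]
    return h_new, c_new

def run_direction(p, seq):
    """Run one direction over seq and return the hidden state at every step."""
    n = p["hidden_dim"]
    h, c = [0.0] * n, [0.0] * n
    states = []
    for x in seq:
        h, c = lstm_step(p, x, h, c)
        states.append(h)
    return states

def bilstm(fwd, bwd, seq):
    forward = run_direction(fwd, seq)               # time steps 1 .. T
    backward = run_direction(bwd, seq[::-1])[::-1]  # time steps T .. 1, re-aligned to 1 .. T
    # Concatenate the two hidden states belonging to the same time step.
    return [hf + hb for hf, hb in zip(forward, backward)]

rng = random.Random(0)
T, input_dim, hidden_dim = 5, 3, 4
fused = [[rng.uniform(-1, 1) for _ in range(input_dim)] for _ in range(T)]
fwd = init_lstm(input_dim, hidden_dim, rng)
bwd = init_lstm(input_dim, hidden_dim, rng)
temporal = bilstm(fwd, bwd, fused)
print(len(temporal), len(temporal[0]))
```

Each time step of `temporal` holds the forward state followed by the backward state, so its width is twice the hidden size.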

[0046] In the global key feature focusing stage, the spliced temporal feature vector is input into the feature focusing module based on the multi-head self-attention mechanism to achieve adaptive weighting and global aggregation of key fault features. The number of heads in this multi-head self-attention mechanism can be set to 4.

[0047] In one embodiment, calculating attention weights on the temporal feature vector and performing weighting may include: The temporal feature vector is mapped to a query matrix, a key matrix, and a value matrix through multiple sets of independent linear transformation matrices, with each set corresponding to one attention head of the multi-head self-attention mechanism. Attention scores at each time step are calculated using the scaled dot-product formula. The weighted feature results obtained from all attention heads are concatenated along the channel dimension and then integrated through a linear transformation matrix to obtain the weighted temporal feature sequence.

[0048] First, the temporal feature vector is mapped to a query matrix $Q$, a key matrix $K$, and a value matrix $V$ through four independent sets of linear transformation matrices. Each set corresponds to one attention head, enabling multi-dimensional feature association calculation. Subsequently, the attention score at each time step is calculated using the scaled dot-product formula, as follows:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

In the formula, $\mathrm{Attention}(Q,K,V)$ indicates the attention score, $Q$ represents the query matrix, $K$ represents the key matrix, $V$ represents the value matrix, $K^{T}$ is the transpose of the key matrix, $d_k$ represents the dimension of the key matrix, $\sqrt{d_k}$ indicates the scaling factor, and $\mathrm{softmax}(\cdot)$ indicates that the similarity score matrix is converted into an attention weight matrix.

This calculation process automatically assigns higher attention weights to the features at the moment of slewing bearing fault impact, while suppressing the weights of time steps without fault information, such as silent periods and stable operation periods. Raceway spalling or rolling element damage in the slewing bearing generates periodic transient impact pulses during operation. These impact components have significant high-energy gradients and distinctive spectral distributions in the time-domain signal, and they differ clearly in feature space from background noise and from the ordinary features of the stable operation period. Because the feature vector at the moment of fault impact has extremely high significance in the embedding space, its correlation with the global context produces a significant numerical bias in the scaled dot-product score. After processing by the Softmax function, the time steps containing deterministic fault evidence are assigned weights close to 1, while the weights of time steps in silent periods or containing only random background noise are suppressed to near 0. This weighting mechanism addresses the technical problem of weak fault features being diluted by noise in long-sequence data, enabling the model to automatically focus on the transient impact region with the highest discriminative gain, thereby ensuring the model's high sensitivity in identifying early weak faults under complex operating conditions.
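The scaled dot-product computation for a single attention head can be sketched as follows. The helper names and the toy query, key, and value matrices are assumptions made for illustration; in the described method these matrices would come from learned linear transformations of the temporal feature vector.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply an [n][m] matrix by an [m][p] matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def transpose(a):
    return [list(col) for col in zip(*a)]

def scaled_dot_product_attention(q, k, v):
    d_k = len(k[0])                                # key dimension
    scores = matmul(q, transpose(k))               # similarity of every step with every step
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]     # each row sums to 1
    return matmul(weights, v), weights

# Toy input: 3 time steps, d_k = 2 (hypothetical values).
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out, w = scaled_dot_product_attention(q, k, v)
```

In this toy case the first query matches the first and third keys equally well, so those two time steps receive equal weights that are larger than the weight of the second step.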

[0049] Next, the weighted feature results calculated by the four attention heads are concatenated and integrated through a linear transformation matrix to obtain a weighted temporal feature sequence. Finally, a global average pooling operation is performed on the weighted temporal feature sequence along the time dimension to compress the variable-length temporal feature sequence into a fixed-dimensional feature vector, ultimately obtaining a global fault feature vector. This vector not only preserves the bidirectional temporal evolution law of fault features but also focuses on the core fault information with the greatest discriminative gain.
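A minimal sketch of head concatenation, output projection, and global average pooling along the time dimension is shown here. The function names, the identity projection matrix, and the toy head outputs are hypothetical; a trained model would use a learned projection.

```python
def concat_heads(head_outputs):
    """Concatenate per-head outputs of shape [T][d_head] along the channel axis."""
    steps = len(head_outputs[0])
    return [[v for head in head_outputs for v in head[t]] for t in range(steps)]

def project(sequence, weight):
    """Apply y = x @ weight at every time step; weight has shape [d_in][d_out]."""
    return [
        [sum(x[i] * weight[i][j] for i in range(len(x))) for j in range(len(weight[0]))]
        for x in sequence
    ]

def global_average_pool(sequence):
    """Average over the time axis, giving one value per channel."""
    steps = len(sequence)
    return [sum(col) / steps for col in zip(*sequence)]

# Four heads, each with d_head = 1 and T = 3 (toy values).
heads = [
    [[1.0], [2.0], [3.0]],
    [[0.0], [0.0], [0.0]],
    [[2.0], [2.0], [2.0]],
    [[-1.0], [0.0], [1.0]],
]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
merged = project(concat_heads(heads), identity)
vector = global_average_pool(merged)

# A longer sequence still pools to the same fixed length.
longer = merged + merged
vector_longer = global_average_pool(longer)
```

Because pooling averages over time, the output length depends only on the channel count, which is how a variable-length sequence becomes a fixed-dimensional vector.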

[0050] Step 104: Input the fused features into the classification module for fault identification and output the fault diagnosis results of the slewing bearing.

[0051] This step inputs the global fault feature vector obtained in the previous step into the preset classification module. Through end-to-end model calculation and probability discrimination, the fault identification of the slewing bearing's operating status is completed, and the accurate fault diagnosis result is finally output. At the same time, the classification module is optimized through a loss function with regularization to ensure the model's generalization ability and diagnostic stability.

[0052] Optionally, the fused features are input into the classification module for fault identification, and the fault diagnosis results of the slewing bearing are output, including: The global fault feature vector is input into the classification module for fault identification, and the fault diagnosis results of the slewing bearing are output.

[0053] Optionally, the global fault feature vector obtained after bidirectional temporal dependency modeling and global key feature focusing is input into the fully connected layer of the classification module. Through the linear transformation of the fully connected layer, the high-dimensional global fault feature vector is mapped to the feature dimension that matches the fault category of the slewing bearing, laying the foundation for subsequent fault category probability calculation.

[0054] The output of the fully connected layer is then regularized by passing it through the ReLU activation function and the Dropout layer. The ReLU activation function introduces a non-linear transformation into the classification model, improving the model's ability to fit and distinguish complex fault features. The Dropout layer effectively avoids overfitting during training by randomly deactivating some neurons, ensuring the generalization performance of the classification module in real industrial scenarios.
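The fully connected layer, ReLU activation, and Dropout regularization can be sketched as below. The inverted-dropout rescaling, the dropout rate of 0.5, and the toy weights are assumptions; the specification does not state these details.

```python
import random

def linear(x, weight, bias):
    """Fully connected layer; weight has shape [d_out][d_in]."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def dropout(x, p, training, rng):
    """Inverted dropout (assumed variant): drop each unit with probability p
    during training and rescale the survivors; pass through unchanged otherwise."""
    if not training or p == 0.0:
        return list(x)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]

rng = random.Random(0)
global_vector = [0.5, -1.0, 2.0]            # toy global fault feature vector
weight = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]                                           # maps 3 features to 4 category scores
bias = [0.0, 0.0, 0.0, 0.0]

activated = relu(linear(global_vector, weight, bias))
train_out = dropout(activated, p=0.5, training=True, rng=rng)
eval_out = dropout(activated, p=0.5, training=False, rng=rng)
```

Dropout is active only in training mode; at inference time the activations pass through unchanged.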

[0055] The classification module is trained and optimized using a cross-entropy loss function with L2 regularization. The loss function is:

$$L=-\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C}y_{i,c}\log\left(\hat{y}_{i,c}\right)+\lambda\lVert\theta\rVert_2^2$$

In the formula, $L$ represents the value of the loss function, which measures the difference between the model's prediction and the true label; the training objective is to minimize this value. $N$ indicates the number of samples in the batch. $C$ indicates the total number of health condition categories for slewing bearings. $y_{i,c}$ indicates the true label of the $i$-th input sample: if the true class of the $i$-th input sample is $c$, then $y_{i,c}=1$; otherwise $y_{i,c}=0$. $\hat{y}_{i,c}$ represents the probability, predicted by the classification model, that the $i$-th input sample belongs to class $c$. $\lambda$ represents the regularization coefficient, which balances the model's fitting ability and generalization ability and avoids overfitting caused by excessive parameter complexity. $\theta$ represents all learnable parameters of the classification model. $\lVert\theta\rVert_2^2$ represents the square of the L2 norm, which is the sum of the squares of all learnable parameters. It is used to constrain the size of the parameters and prevent the model from overfitting on the training set due to excessively large parameter values.
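As a worked illustration, the regularized loss can be computed for a tiny batch as follows. The function name, the averaging over the batch, and the numeric values are assumptions chosen for this sketch; labels are one-hot, matching the description of the true-label term.

```python
import math

def cross_entropy_l2(probs, one_hot, params, lam):
    """Batch-averaged cross-entropy plus lam times the squared L2 norm of params.

    probs   : [N][C] predicted class probabilities
    one_hot : [N][C] true labels, 1 for the true class and 0 elsewhere
    params  : flat list of learnable parameters
    lam     : regularization coefficient
    """
    n = len(probs)
    ce = 0.0
    for p_row, y_row in zip(probs, one_hot):
        for p, y in zip(p_row, y_row):
            if y:                      # only the true class contributes
                ce -= y * math.log(p)
    ce /= n
    l2 = sum(w * w for w in params)    # sum of squared parameters
    return ce + lam * l2

probs = [[0.5, 0.5], [0.25, 0.75]]     # toy predictions for two samples
one_hot = [[1, 0], [0, 1]]             # true classes: 0 and 1
params = [1.0, 2.0]                    # toy parameters, squared norm = 5
loss = cross_entropy_l2(probs, one_hot, params, lam=0.1)
penalty_only = cross_entropy_l2([[1.0, 0.0]], [[1, 0]], params, lam=0.1)
```

Here the data term is -(ln 0.5 + ln 0.75) / 2 and the penalty is 0.1 × 5 = 0.5; a perfect prediction leaves only the penalty.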

[0056] This loss function can accurately measure the difference between the model's predictions and the true labels, and can also constrain the model parameters through regularization terms, thereby further improving the model's generalization ability.

[0057] The regularized feature vector is input into the Softmax function, which normalizes it into a probability distribution over the operating state categories of the slewing bearing, including healthy state, inner race fault, outer race fault, and rolling element fault. The category with the highest probability is selected as the final fault diagnosis result for this slewing bearing inspection. This achieves accurate identification from global fault characteristics to specific fault categories. The output diagnostic result reflects the actual operating state of the slewing bearing and provides a clear basis for subsequent maintenance and repair of the equipment.
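The final decision step can be sketched as a Softmax followed by selecting the most probable category. The category order, the function names, and the input scores are hypothetical.

```python
import math

# Category names follow the four operating states listed above; the order is assumed.
CATEGORIES = ["healthy", "inner race fault", "outer race fault", "rolling element fault"]

def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]

def diagnose(scores):
    """Return the most probable category and the full probability distribution."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CATEGORIES[best], probs

label, probs = diagnose([0.2, 2.5, 0.1, -1.0])
print(label)
```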

[0058] This invention provides a method for diagnosing slewing bearing faults. It involves collecting various types of monitoring signals during the slewing bearing's operation and preprocessing the multi-source raw monitoring data to obtain a standardized sample sequence. A multi-channel feature extraction architecture is constructed to extract features from different types of monitoring signals in the sample sequence, resulting in independent feature tensors for each type of signal. Cross-modal gating fusion processing is then performed on these independent feature tensors to obtain fused features. These fused features are input into a classification module for fault identification, outputting the fault diagnosis results for the slewing bearing. This invention uses both vibration and acoustic emission monitoring signals to jointly perceive the slewing bearing's operating status, overcoming the shortcomings of single-signal representation of equipment status and achieving full-dimensional capture of fault features. This improves the accuracy of fault identification under low-speed, heavy-load, and high-noise environments. Compared to traditional simple splicing, cross-modal gating fusion processing achieves complementary coupling of vibration and acoustic emission features, solving the problem of insufficient discriminative power in heterogeneous signal fusion and significantly improving the fault differentiation capability of the fused features.

[0059] In this embodiment of the invention, parallel branches with non-shared parameters are used to process different types of signals respectively, avoiding mutual interference between heterogeneous features; combined with multi-scale convolution and residual structure, weak fault features under low-speed heavy-load conditions are accurately mined, while enhancing the adaptability of features to operating condition fluctuations and improving the robustness of feature extraction.

[0060] In this embodiment of the invention, for the time series attributes of the fused features, the temporal evolution law of the fault is captured by bidirectional temporal dependency modeling, and then the core information of the fault is accurately extracted by global key feature focusing. This effectively solves the problem that traditional models do not adequately model long-sequence time information and are prone to losing early fault information, and realizes complete modeling of the fault evolution law throughout the entire life cycle.

[0061] The embodiments of the present invention do not rely on manual feature design, can adapt to the fault diagnosis needs of different working conditions and different types of slewing bearings, have strong versatility and engineering application value, and are suitable for promotion and application in actual industrial scenarios.

[0062] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

[0063] The following are device embodiments of the present invention. For details not described in detail, please refer to the corresponding method embodiments described above.

[0064] Figure 3 shows a schematic diagram of a slewing bearing fault diagnosis device according to an embodiment of the present invention. For ease of explanation, only the parts related to the embodiment of the present invention are shown, and they are described in detail below. As shown in Figure 3, the slewing bearing fault diagnosis device includes: acquisition module 31, preprocessing module 32, feature extraction module 33, feature fusion module 34, and fault identification module 35.

[0065] The acquisition module 31 is used to acquire various types of monitoring signals during the operation of the slewing bearing; The preprocessing module 32 is used to preprocess the obtained multi-source raw monitoring data to obtain standardized sample sequences; Feature extraction module 33 is used to construct a multi-channel feature extraction architecture, extract features from different types of monitoring signals in the sample sequence, and obtain independent feature tensors corresponding to each type of signal; Feature fusion module 34 is used to perform cross-modal gated fusion processing on independent feature tensors to obtain fused features; The fault identification module 35 is used to input the fused features into the classification module for fault identification and output the fault diagnosis results of the slewing bearing.

[0066] In one possible implementation, as shown in Figure 4, the slewing bearing fault diagnosis device also includes a feature processing module 36. After the feature fusion module 34 performs cross-modal gating fusion processing on the independent feature tensors to obtain the fused features, the feature processing module 36 is used for: performing bidirectional temporal dependency modeling and global key feature focusing on the fused features to obtain a global fault feature vector. When the fault identification module 35 inputs the fused features into the classification module for fault identification and outputs the fault diagnosis result of the slewing bearing, it is used for: inputting the global fault feature vector into the classification module for fault identification and outputting the fault diagnosis result of the slewing bearing.

[0067] In one possible implementation, the multi-source raw monitoring data includes vibration signals and acoustic emission signals. When constructing a multi-channel feature extraction architecture, feature extraction module 33 is used for: constructing a two-stream multi-scale residual encoder, which includes parallel neural network branches with non-shared parameters for the vibration stream and the acoustic emission stream. The first layer of each branch is set with N parallel convolutional channels with different kernel sizes, where N is a positive integer greater than or equal to 3.

[0068] In one possible implementation, when the feature extraction module 33 extracts features from different types of monitoring signals in the sample sequence and obtains the independent feature tensors corresponding to each type of signal, it is used for: The output features of N parallel convolutional channels in each branch are concatenated along the channel dimension. The concatenated features are processed sequentially through a batch normalization layer and a ReLU activation layer. The activated features are then input into a Conv1x1 projection convolutional layer for channel fusion and dimensionality reduction. The dimensionality-reduced features are then fed into the residual blocks for deep feature mapping. The features output by the residual block are downsampled by a max pooling layer to obtain the independent feature tensors corresponding to each type of signal.

[0069] In one possible implementation, when the feature extraction module 33 inputs the dimensionality-reduced features into the residual block for deep feature mapping, it is used for: The dimensionality-reduced features are sequentially input into two Conv1d (3x3) convolutional layers. The first Conv1d (3x3) convolutional layer is followed by a ReLU activation layer. The input features of the residual block are added to the output of the second Conv1d (3x3) convolutional layer through a skip connection, and then output after ReLU activation.
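The residual block can be sketched in plain Python as below. Several points are assumptions: "Conv1d (3x3)" is read as a one-dimensional convolution with kernel size 3, zero padding of 1 is used so that the sequence length is preserved, and the input and output channel counts are equal so that the skip connection can be added directly.

```python
def conv1d_k3(x, weight, bias):
    """1D convolution with kernel size 3 and zero padding 1 (assumed).

    x      : [C_in][T]
    weight : [C_out][C_in][3]
    bias   : [C_out]
    """
    steps = len(x[0])
    padded = [[0.0] + list(ch) + [0.0] for ch in x]
    out = []
    for w_out, b in zip(weight, bias):
        row = []
        for t in range(steps):
            acc = b
            for c, taps in enumerate(w_out):
                for k in range(3):
                    acc += taps[k] * padded[c][t + k]
            row.append(acc)
        out.append(row)
    return out

def relu(x):
    return [[max(0.0, v) for v in ch] for ch in x]

def residual_block(x, w1, b1, w2, b2):
    y = relu(conv1d_k3(x, w1, b1))   # first convolution followed by ReLU
    y = conv1d_k3(y, w2, b2)         # second convolution, no activation yet
    y = [[a + b for a, b in zip(ry, rx)] for ry, rx in zip(y, x)]  # skip connection
    return relu(y)                   # ReLU after the addition

x = [[1.0, -2.0, 3.0]]               # one channel, three time steps (toy input)
zero_w, identity_w, bias = [[[0.0, 0.0, 0.0]]], [[[0.0, 1.0, 0.0]]], [0.0]
out_zero = residual_block(x, zero_w, bias, zero_w, bias)
out_identity = residual_block(x, identity_w, bias, identity_w, bias)
```

With all-zero kernels the block reduces to ReLU applied to the input, which shows how the skip connection carries the input through even when the convolutional path contributes nothing.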

[0070] In one possible implementation, when the feature fusion module 34 performs cross-modal gating fusion processing on the independent feature tensors to obtain fused features, it is used for: The independent feature tensor corresponding to the vibration signal is compressed through a Conv1d(1x1) layer and processed by a Sigmoid activation function to generate a vibration gating mask; the independent feature tensor corresponding to the acoustic emission signal is compressed through a Conv1d(1x1) layer and processed by a Sigmoid activation function to generate an acoustic emission gating mask. Interactive filtering is performed by multiplying the vibration gating mask element-wise with the independent feature tensor corresponding to the acoustic emission signal, and by multiplying the acoustic emission gating mask element-wise with the independent feature tensor corresponding to the vibration signal, to obtain two sets of corrected features. The two sets of corrected features are concatenated along the channel dimension, and then compressed through a convolutional layer before being input into a fusion convolutional block for deep fusion to obtain fused features.
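The mask generation, cross-wise filtering, and channel concatenation can be sketched as follows. This is a simplified illustration under stated assumptions: the kernel-size-1 convolution keeps the channel count unchanged so that the mask and the other modality's tensor have the same shape, the weights are toy values, and the later compression layer and fusion convolutional block are omitted.

```python
import math

def conv1x1(x, weight, bias):
    """Kernel-size-1 Conv1d: x is [C_in][T], weight is [C_out][C_in]."""
    steps = len(x[0])
    return [
        [sum(w * x[c][t] for c, w in enumerate(row)) + b for t in range(steps)]
        for row, b in zip(weight, bias)
    ]

def gating_mask(x, weight, bias):
    """Sigmoid of the 1x1 convolution output; every value lies in (0, 1)."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in ch] for ch in conv1x1(x, weight, bias)]

def hadamard(a, b):
    """Element-wise product of two [C][T] tensors."""
    return [[p * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def cross_modal_gate(f_vib, f_ae, w_vib, b_vib, w_ae, b_ae):
    mask_vib = gating_mask(f_vib, w_vib, b_vib)
    mask_ae = gating_mask(f_ae, w_ae, b_ae)
    ae_corrected = hadamard(mask_vib, f_ae)    # vibration mask filters acoustic emission features
    vib_corrected = hadamard(mask_ae, f_vib)   # acoustic emission mask filters vibration features
    return vib_corrected + ae_corrected        # concatenate along the channel axis

f_vib = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]     # toy vibration features, [C=2][T=3]
f_ae = [[2.0, 0.0, -2.0], [1.0, 1.0, 1.0]]     # toy acoustic emission features
zeros_w, zeros_b = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
gated = cross_modal_gate(f_vib, f_ae, zeros_w, zeros_b, zeros_w, zeros_b)
```

With zero weights every mask value is 0.5, so each modality is simply halved; learned weights would instead let each modality pass or suppress specific channels and time steps of the other.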

[0071] In one possible implementation, when the feature processing module 36 performs bidirectional temporal dependency modeling and global key feature focusing on the fused features to obtain the global fault feature vector, it is used for: The fused features are input into the Bi-LSTM layer, and the forward hidden state sequence and the backward hidden state sequence are calculated in the order of time step from 1 to T and T to 1 respectively. The forward hidden state and the backward hidden state corresponding to each time step are concatenated one by one to obtain the temporal feature vector. The temporal feature vector is input into the multi-head self-attention mechanism. Attention weights are calculated on the temporal feature vector and then weighted. The weighted temporal feature sequence is then subjected to global average pooling along the time dimension to obtain the global fault feature vector.

[0072] In one possible implementation, when the feature processing module 36 calculates attention weights and performs weighted processing on the temporal feature vectors, it is used for: The temporal feature vector is mapped to a query matrix, a key matrix, and a value matrix through multiple sets of independent linear transformation matrices, with each set corresponding to one attention head of the multi-head self-attention mechanism. Attention scores at each time step are calculated using the scaled dot-product formula. The weighted feature results obtained from all attention heads are concatenated along the channel dimension and then integrated through a linear transformation matrix to obtain the weighted temporal feature sequence.

[0073] In one possible implementation, the scaled dot-product formula is: $\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$. In the formula, $\mathrm{Attention}(Q,K,V)$ indicates the attention score, $Q$ represents the query matrix, $K$ represents the key matrix, $V$ represents the value matrix, $K^{T}$ is the transpose of the key matrix, $d_k$ represents the dimension of the key matrix, $\sqrt{d_k}$ indicates the scaling factor, and $\mathrm{softmax}(\cdot)$ indicates that the similarity score matrix is converted into an attention weight matrix.

[0074] The above embodiments provide a slewing bearing fault diagnosis device. A data acquisition module collects various types of monitoring signals during the slewing bearing's operation. A preprocessing module preprocesses the obtained multi-source raw monitoring data to obtain a standardized sample sequence. A feature extraction module constructs a multi-channel feature extraction architecture to extract features from different types of monitoring signals in the sample sequence, obtaining independent feature tensors corresponding to each type of signal. A feature fusion module performs cross-modal gated fusion processing on the independent feature tensors to obtain fused features. A fault identification module inputs the fused features into a classification module for fault identification and outputs the fault diagnosis results of the slewing bearing. This invention, by acquiring both vibration and acoustic emission monitoring signals, overcomes the deficiency of a single signal in representing the equipment state comprehensively, achieving full-dimensional capture of fault features. Compared to traditional simple splicing, cross-modal gated fusion processing achieves complementary coupling of vibration and acoustic emission features, solving the problem of insufficient discriminative power in heterogeneous signal fusion and significantly improving the fault differentiation capability of the fused features.

[0075] In this embodiment of the invention, parallel branches with non-shared parameters are used to process different types of signals respectively, avoiding mutual interference between heterogeneous features; combined with multi-scale convolution and residual structure, weak fault features under low-speed heavy-load conditions are accurately mined, while enhancing the adaptability of features to operating condition fluctuations and improving the robustness of feature extraction.

[0076] In this embodiment of the invention, for the time-series attributes of the fused features, bidirectional temporal dependency modeling is used to capture the temporal evolution pattern of the fault, effectively solving the problems of insufficient modeling of long-sequence time-series information and easy loss of early fault information in traditional models. Then, global key feature focusing is used to accurately extract the core fault information.

[0077] Figure 5 is a schematic diagram of an electronic device provided in an embodiment of the present invention. As shown in Figure 5, the electronic device 5 of this embodiment includes a processor 50 and a memory 51. The memory 51 stores a computer program 52. When the processor 50 executes the computer program 52, it implements the steps in the various method embodiments described above. Alternatively, when the processor 50 executes the computer program 52, it implements the functions of each module / unit in the various device embodiments described above.

[0078] For example, computer program 52 may be divided into one or more modules / units, which are stored in memory 51 and executed by processor 50 to complete the present invention. The one or more modules / units may be a series of computer program instruction segments capable of performing a specific function, which describe the execution process of computer program 52 in electronic device 5.

[0079] Electronic device 5 may include, but is not limited to, processor 50 and memory 51. Those skilled in the art will understand that Figure 5 is merely an example of electronic device 5 and does not constitute a limitation on electronic device 5. The device may include more or fewer components than shown, may combine certain components, or may use different components. For example, electronic device 5 may also include input / output devices, network access devices, buses, etc.

[0080] The processor 50 can be a central processing unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor can be a microprocessor or any conventional processor.

[0081] The memory 51 can be an internal storage unit of the electronic device 5, such as a hard disk or RAM. The memory 51 can also be an external storage device of the electronic device 5, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, or Flash Card. Furthermore, the memory 51 can include both internal and external storage units of the electronic device 5. The memory 51 is used to store the computer program 52 and other programs and data required by the electronic device 5. The memory 51 can also be used to temporarily store data that has been output or will be output.

[0082] For the sake of simplicity and clarity, only the above-described functional modules / units are used as examples. In practical applications, the functions described above can be assigned to different functional modules / units as needed. These modules / units can be implemented in hardware, software, or a combination of both.

[0083] This invention also provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, it implements the methods described in the above-described method embodiments.

[0084] This invention also provides a computer program product, including a computer program. When the computer program is executed by a processor, it implements the methods described in the above-described method embodiments.

[0085] Computer programs include computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms. Computer-readable media can include: any entity or device capable of carrying computer program code, recording media, USB flash drives, portable hard drives, magnetic disks, optical disks, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media, etc.

[0086] In the above embodiments, the descriptions of each embodiment have their own emphasis. Parts not detailed or described in a particular embodiment can be referred to in the relevant descriptions of other embodiments. Unless otherwise specified or in conflict with logic, the terminology and / or descriptions between different embodiments are consistent and can be referenced interchangeably. Technical features in different embodiments can be combined to form new embodiments based on their inherent logical relationships.

[0087] The above-described embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.

Claims

1. A method for diagnosing slewing bearing faults, characterized in that it comprises: Collecting multiple types of monitoring signals during the operation of the slewing bearing, and preprocessing the obtained multi-source raw monitoring data to obtain a standardized sample sequence; Constructing a multi-channel feature extraction architecture to extract features from different types of monitoring signals in the sample sequence, thereby obtaining independent feature tensors corresponding to each type of signal; Performing cross-modal gating fusion processing on the independent feature tensors to obtain fused features; Inputting the fused features into the classification module for fault identification, and outputting the fault diagnosis results of the slewing bearing.

2. The slewing bearing fault diagnosis method according to claim 1, characterized in that, After performing cross-modal gating fusion processing on the independent feature tensors to obtain fused features, the process further includes: By performing bidirectional temporal dependency modeling and global key feature focusing on the fused features, a global fault feature vector is obtained; The fused features are input into the classification module for fault identification, and the fault diagnosis results of the slewing bearing are output, including: The global fault feature vector is input into the classification module for fault identification, and the fault diagnosis result of the slewing bearing is output.

3. The slewing bearing fault diagnosis method according to claim 2, characterized in that, The multi-source raw monitoring data includes vibration signals and acoustic emission signals; Constructing a multi-channel feature extraction architecture includes: A two-stream multi-scale residual encoder is constructed, which includes parallel neural network branches with non-shared parameters for the vibration stream and the acoustic emission stream. The first layer of each branch is set with N parallel convolutional channels with different kernel sizes, where N is a positive integer greater than or equal to 3.

4. The slewing bearing fault diagnosis method according to claim 3, characterized in that, Feature extraction is performed on different types of monitoring signals in the sample sequence to obtain independent feature tensors corresponding to each type of signal, including: The output features of the N parallel convolutional channels in each branch are concatenated along the channel dimension. The concatenated features are processed sequentially through a batch normalization layer and a ReLU activation layer. The activated features are then input into a Conv1x1 projection convolutional layer for channel fusion and dimensionality reduction. The dimensionality-reduced features are then fed into the residual blocks for deep feature mapping. The features output by the residual block are downsampled by a max pooling layer to obtain the independent feature tensors corresponding to each type of signal.

5. The slewing bearing fault diagnosis method according to claim 4, characterized in that, The dimensionality-reduced features are then fed into the residual blocks for deep feature mapping, including: The dimensionality-reduced features are sequentially input into two Conv1d (3x3) convolutional layers. The first Conv1d (3x3) convolutional layer is followed by a ReLU activation layer. The input features of the residual block are added to the output of the second Conv1d (3x3) convolutional layer through a skip connection, and then output after ReLU activation.

6. The slewing bearing fault diagnosis method according to any one of claims 2-5, characterized in that, The independent feature tensors are subjected to cross-modal gating fusion processing to obtain fused features, including: The independent feature tensor corresponding to the vibration signal is compressed through a Conv1d(1x1) layer and processed by a Sigmoid activation function to generate a vibration gating mask; the independent feature tensor corresponding to the acoustic emission signal is compressed through a Conv1d(1x1) layer and processed by a Sigmoid activation function to generate an acoustic emission gating mask. Interactive filtering is performed by multiplying the vibration gating mask element-wise with the independent feature tensor corresponding to the acoustic emission signal, and by multiplying the acoustic emission gating mask element-wise with the independent feature tensor corresponding to the vibration signal, to obtain two sets of corrected features. The two sets of corrected features are concatenated along the channel dimension, and then compressed through a convolutional layer before being input into a fusion convolutional block for deep fusion to obtain fused features.

7. The slewing bearing fault diagnosis method according to claim 6, characterized in that, The fused features are subjected to bidirectional temporal dependency modeling and global key feature focusing to obtain a global fault feature vector, including: The fused features are input into the Bi-LSTM layer, and the forward hidden state sequence and the backward hidden state sequence are calculated in the order of time step from 1 to T and T to 1, respectively. The forward hidden state and the backward hidden state corresponding to each time step are concatenated one by one to obtain the temporal feature vector. The temporal feature vector is input into a multi-head self-attention mechanism. Attention weights are calculated on the temporal feature vector and then weighted. The weighted temporal feature sequence is then subjected to global average pooling along the time dimension to obtain a global fault feature vector.

8. The slewing bearing fault diagnosis method according to claim 7, characterized in that calculating the attention weights on the temporal feature vector and weighting it comprises: mapping the temporal feature vector to a query matrix, a key matrix, and a value matrix through multiple sets of independent linear transformation matrices, each set of linear transformation matrices corresponding to one attention head of the multi-head self-attention mechanism; calculating the attention scores at each time step using the scaled dot product formula; and concatenating the weighted feature results obtained from all attention heads along the channel dimension and integrating them through a linear transformation matrix to obtain the weighted temporal feature sequence.

9. The slewing bearing fault diagnosis method according to claim 8, characterized in that the scaled dot product formula is: $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$; in the formula, $\mathrm{Attention}(Q, K, V)$ represents the attention score, $Q$ represents the query matrix, $K$ represents the key matrix, $V$ represents the value matrix, $K^{T}$ represents the transpose of the key matrix, $d_k$ represents the dimension of the key matrix, $\sqrt{d_k}$ represents the scaling factor, and $\mathrm{softmax}(\cdot)$ indicates that the similarity score matrix is converted into an attention weight matrix.
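The attention computation of claims 8 and 9 can be sketched in plain Python as follows. It assumes the standard scaled dot-product form, softmax(QK^T / sqrt(d_k))V. The matrix sizes, the random projection weights, and all function names are illustrative assumptions, not values taken from the application.

```python
import math
import random

def matmul(a, b):
    """Multiply matrices stored as nested lists: a is [n][m], b is [m][p]."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Row-wise softmax with the row maximum subtracted for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def scaled_dot_product_attention(q, k, v):
    """Return (softmax(q k^T / sqrt(d_k)) v, attention weight matrix)."""
    d_k = len(k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    weights = softmax_rows(scores)
    return matmul(weights, v), weights

def multi_head_self_attention(x, heads, w_out):
    """x is [T][D]; heads is a list of (w_q, w_k, w_v) projection triples."""
    head_outputs = []
    for w_q, w_k, w_v in heads:
        out, _ = scaled_dot_product_attention(
            matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
        )
        head_outputs.append(out)
    # Concatenate the head outputs along the channel dimension at each time step.
    concatenated = [sum((h[t] for h in head_outputs), []) for t in range(len(x))]
    return matmul(concatenated, w_out)

# Toy temporal features: T = 3 time steps, D = 4 channels, 2 heads with d_k = 2.
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

x = rand_matrix(3, 4)
heads = [(rand_matrix(4, 2), rand_matrix(4, 2), rand_matrix(4, 2)) for _ in range(2)]
w_out = rand_matrix(4, 4)
weighted_sequence = multi_head_self_attention(x, heads, w_out)  # shape [3][4]
```

Dividing by sqrt(d_k) keeps the dot products from growing with the key dimension, so the softmax does not saturate. Each row of the weight matrix sums to 1 and describes how strongly one time step attends to every other time step.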

10. A slewing bearing fault diagnosis device, characterized by comprising: an acquisition module, used to acquire multiple types of monitoring signals during the operation of the slewing bearing; a preprocessing module, used to preprocess the acquired multi-source raw monitoring data to obtain standardized sample sequences; a feature extraction module, used to construct a multi-channel feature extraction architecture, extract features from the different types of monitoring signals in the sample sequences, and obtain an independent feature tensor corresponding to each type of signal; a feature fusion module, used to perform cross-modal gating fusion processing on the independent feature tensors to obtain fused features; and a fault identification module, used to input the fused features into a classification module for fault identification and output the fault diagnosis results of the slewing bearing.