A bearing fault diagnosis method based on adaptive lifting wavelet and convolution KAN network
By combining an adaptive lifting wavelet with a convolutional KAN network, the method addresses weak noise resistance, large feature-extraction deviation, and insufficient nonlinear expression in bearing fault diagnosis. It achieves high-precision, highly robust fault identification and maintains high recognition performance under complex working conditions.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NORTHEASTERN UNIV CHINA
- Filing Date
- 2026-05-22
- Publication Date
- 2026-06-19
AI Technical Summary
Existing deep learning methods for bearing fault diagnosis suffer from problems such as weak noise resistance, large feature extraction bias, insufficient nonlinear expression ability, loss of high-frequency detail information, and non-optimal feature space distribution, which limit the recognition accuracy and generalization ability under complex working conditions.
We employ a combination of adaptive lifting wavelet and convolutional KAN network. Initial features are extracted using large convolutional kernels, and local convolutional and spline nonlinear KAN residual blocks are fused. A learnable soft threshold denoising wavelet module is used for feature decomposition and dynamic denoising. Furthermore, a prototype-aware contrastive learning mechanism is introduced to optimize the loss function, thereby achieving multi-scale feature fusion and classification.
It significantly improves the robustness and accuracy of the model under complex working conditions, effectively identifies weak fault features, and enhances classification accuracy and generalization ability, especially maintaining high recognition performance in strong noise environments.
[Figure: abstract drawing, CN122241387A_ABST]
Description
Technical Field
[0001] This invention belongs to the field of intelligent fault diagnosis technology, and relates to a bearing fault diagnosis method based on adaptive lifting wavelet and convolutional KAN network.
Background Technology
[0002] In the wave of industrial intelligent transformation, intelligent manufacturing places extremely high demands on the reliability and safety of mechanical equipment. As a core component in modern rotating machinery, the health of rolling bearings directly affects the stable operation of the entire equipment system. Statistics show that the vast majority of unexpected shutdowns and major accidents in rotating machinery are caused by early bearing failures. Therefore, in complex and ever-changing industrial environments, achieving high-precision and robust fault diagnosis of rolling bearings has significant engineering value and economic importance. With the explosion of IoT big data and the rapid development of artificial intelligence technology, data-driven intelligent diagnosis is gradually replacing traditional signal analysis relying on human experience, becoming the mainstream direction in the field of fault diagnosis.
[0003] Against the backdrop of the rise of deep learning, diagnostic methods based on models such as convolutional neural networks (CNNs) and multilayer perceptrons have been widely applied. Traditional CNNs extract local features of one-dimensional fault signals through convolutional and pooling layers, significantly improving the automation level of classification. However, existing deep learning diagnostic methods and signal processing techniques still face a series of technical bottlenecks that urgently need to be addressed in real-world, complex, and noisy industrial conditions:
[0004] First, in the early feature extraction stage, existing frequency domain transformation methods (such as short-time Fourier transform and traditional wavelet transform) often rely on preset, fixed basis functions (such as db4 or Morlet wavelets). In real-world environments with strong background noise, the impact waveform of bearing failure often undergoes severe distortion. Fixed basis functions cannot adapt to the dynamic changes in the current data, resulting in significant deviations in the extracted features and weak anti-interference capabilities.
[0005] Secondly, in feature mapping of deep networks, traditional CNNs and MLPs heavily rely on linear weight multiplication and fixed nonlinear activation functions (such as ReLU). Bearing fault vibration signals have extremely strong high-order nonlinear characteristics. Traditional networks are limited by the linear combination expression paradigm, and fitting highly complex fault mapping relationships often requires a large number of parameters and is prone to overfitting. At the same time, existing multi-scale deep models inevitably lose high-frequency detail information when downsampling through pooling layers or stride convolutions to expand the receptive field. This high-frequency information often contains extremely critical early weak fault features.
[0006] Finally, at the decision-making level of fault classification, most existing methods only use the traditional cross-entropy loss function in conjunction with Softmax for optimization. This traditional classification strategy focuses solely on finding the decision boundaries between different categories, but it is difficult to effectively constrain the compactness of similar samples and the spacing between dissimilar samples within the feature space. This feature space distribution of "excessively large intra-class spacing and excessively small inter-class spacing" makes the model highly susceptible to confusion when faced with early weak faults or complex faults with extremely similar features, severely limiting the overall recognition accuracy and cross-condition generalization ability.
[0007] Based on the above analysis, existing technical solutions have obvious limitations in adaptive noise reduction, high-order nonlinear feature representation, lossless multi-scale feature transfer, and high-dimensional feature space metric optimization.
Summary of the Invention
[0008] To address the aforementioned issues, this invention provides a bearing fault diagnosis method based on adaptive lifting wavelet and convolutional KAN network, which can overcome the structural defects of traditional deep networks, possess strong adaptive noise resistance, and accurately characterize and classify subtle bearing fault features.
[0009] This invention provides a bearing fault diagnosis method based on adaptive lifting wavelet and convolutional KAN network, comprising:
[0010] Step 1: Acquire the original one-dimensional vibration signal of the bearing, use the sliding window technique to truncate the original signal, and standardize the truncated signal sample to construct the input tensor;
[0011] Step 2: Construct a large convolutional kernel initial feature extraction module to capture the initial low-frequency envelope features of the input tensor using a large-size one-dimensional convolutional kernel;
[0012] Step 3: Construct a KAN residual block that integrates local convolution and spline nonlinearity, extract the local impact envelope through depthwise separable convolution, and use the high-order spline function of the Kolmogorov-Arnold network to realize nonlinear feature mapping;
[0013] Step 4: Construct an adaptive lifting wavelet module based on learnable soft threshold denoising. The features are decomposed by a neural network parameterized predictor and updater, and the high-frequency detail features are dynamically denoised using a learnable threshold.
[0014] Step 5: Construct a multi-stage pyramid network by alternately cascading the KAN residual blocks and the adaptive lifting wavelet module, and perform global pooling and splicing fusion of the multi-scale features extracted from each stage to obtain a comprehensive feature vector;
[0015] Step 6: Introduce a prototype-aware contrastive learning mechanism, construct a joint loss function that includes cross-entropy loss and prototype contrastive loss, optimize and train the network model, and use the trained model to output bearing fault diagnosis results.
[0016] The bearing fault diagnosis method based on adaptive lifting wavelet and convolutional KAN network of the present invention has the following beneficial effects:
[0017] (1) An adaptive lifting wavelet module based on learnable soft threshold denoising is proposed. Compared with the traditional frequency domain transformation method using fixed basis functions, this module can adaptively learn and filter out strong background noise in complex industrial environments, significantly enhancing the robustness of the model under harsh working conditions, while preserving the physical interpretability of feature extraction.
[0018] (2) A KAN residual block integrating local convolution and spline nonlinearity was constructed. By replacing the fixed activation function in the traditional neural network with the B-spline function, the shortcomings of the linear expression ability of traditional CNN and MLP are overcome, and efficient and accurate mapping of bearing high-order nonlinear fault features is achieved with very few parameters.
[0019] (3) A lossless multi-scale pyramid downsampling architecture based on lifting wavelet was designed. The traditional large-stride convolution or pooling operation was abandoned. Downsampling was achieved through wavelet decomposition, which completely avoided the loss of high-frequency weak fault details and realized lossless transfer and global fusion of multi-scale features, which greatly improved the model’s sensitivity to early weak faults.
[0020] (4) A prototype-based multi-scale contrastive learning joint optimization strategy was introduced. This strategy overcomes the limitations of a single cross-entropy loss. By pulling samples of the same class toward their prototype center and pushing them away from the prototypes of other classes, it increases the boundary margin between different fault features and improves the model's classification accuracy and generalization ability under unknown conditions when facing highly similar composite faults.
Attached Figure Description
[0021] Figure 1 is a flowchart of the bearing fault diagnosis method based on adaptive lifting wavelet and convolutional KAN network according to the present invention;
[0022] Figure 2 is a schematic diagram of the overall diagnostic framework of the present invention.
Detailed Implementation
[0023] As shown in Figure 1, the bearing fault diagnosis method based on adaptive lifting wavelet and convolutional KAN network of the present invention includes:
[0024] Step 1: Acquire the original one-dimensional vibration signal of the bearing, truncate the original signal using the sliding window technique, and standardize the truncated signal samples to construct the input tensor, specifically:
[0025] Step 1.1: Use an accelerometer to collect the raw one-dimensional vibration signal $x = [x_1, x_2, \dots, x_N]$ of industrial equipment bearings under different operating conditions (for example, different speeds and loads), where $N$ is the total number of sampling points in the signal sequence.
[0026] Step 1.2: Because the acquired raw signal is a continuous long sequence containing tens of thousands of sampling points, a sliding window is used to truncate it with overlap, which meets the batch-training requirements of the deep learning model and increases sample diversity. Let the sliding window length be $L$ and the sliding step size be $S$. The total number $M$ of local sample sequences generated by the truncation is:
[0027] $$M = \left\lfloor \frac{N - L}{S} \right\rfloor + 1$$
[0028] where $\lfloor \cdot \rfloor$ denotes the floor operation. The $i$-th local sample sequence extracted by this operation is:
[0029] $$x^{(i)} = [x_{(i-1)S+1},\ x_{(i-1)S+2},\ \dots,\ x_{(i-1)S+L}], \quad i = 1, 2, \dots, M$$
[0030] Step 1.3: For each extracted local sample sequence $x^{(i)}$, calculate the sample mean $\mu_i$ and the standard deviation $\sigma_i$.
[0031] Step 1.4: Apply Z-score (mean-variance) standardization to each $x^{(i)}$ independently. To prevent division by zero when the signal fluctuation is extremely small (that is, when the standard deviation approaches 0), a very small positive constant $\epsilon$ is added to the denominator when each data point in the sequence is normalized. The standardization formula is:
[0032] $$\hat{x}^{(i)}_a = \frac{x^{(i)}_a - \mu_i}{\sigma_i + \epsilon}$$
[0033] where $x^{(i)}_a$ denotes the $a$-th data point in the $i$-th local sample sequence and $\hat{x}^{(i)}_a$ is the corresponding standardized data point. Normalizing every data point yields the standardized input sequence $\hat{x}^{(i)}$.
[0034] Step 1.5: After the sliding-window truncation and standardization described above, pair all standardized input sequences with their corresponding fault category labels, divide them proportionally into training and test sets, and convert them into the tensor format required by the deep learning model, denoted as the input tensor $X$.
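The preprocessing in Step 1 can be illustrated with a minimal sketch in plain Python. The window length 256, step 128, the synthetic sine signal, and the value `eps=1e-8` are illustrative assumptions and are not taken from the patent; the use of the population standard deviation (division by the window length) is likewise an assumption.

```python
import math

def sliding_windows(signal, window_len, step):
    """Split a 1-D sequence into overlapping windows of length window_len."""
    n = len(signal)
    if n < window_len:
        return []
    count = (n - window_len) // step + 1  # floor((N - L) / S) + 1
    return [signal[i * step : i * step + window_len] for i in range(count)]

def zscore(window, eps=1e-8):
    """Standardize one window; eps (assumed value) guards against a zero standard deviation."""
    mean = sum(window) / len(window)
    var = sum((v - mean) ** 2 for v in window) / len(window)  # population variance (assumption)
    std = math.sqrt(var)
    return [(v - mean) / (std + eps) for v in window]

# Synthetic stand-in for a vibration signal with N = 1000 sampling points.
signal = [math.sin(0.1 * t) for t in range(1000)]
windows = sliding_windows(signal, window_len=256, step=128)
samples = [zscore(w) for w in windows]
print(len(samples), len(samples[0]))  # 6 256
```

With $N = 1000$, $L = 256$, and $S = 128$, the count formula gives $\lfloor 744 / 128 \rfloor + 1 = 6$ windows, which matches the printed result.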
[0035] Figure 2 shows a schematic diagram of the overall diagnostic framework of the present invention; the diagnostic process is as follows.
[0036] Step 2: Construct a large convolutional kernel initial feature extraction module, which uses a large-size one-dimensional convolutional kernel to capture the initial low-frequency envelope features of the input tensor. Specifically:
[0037] Step 2.1: Apply a large-kernel one-dimensional convolution to the input tensor $X$. Set the convolution kernel size to $K_1$ (set to 64), the stride to $S_1$ (set to 16), the zero-padding length to $P_1$ (set to 24), and the number of output channels to $C_1$. For the $c$-th channel, $c = 1, 2, \dots, C_1$, the value of the convolution output feature map $Y_c$ at spatial location $n$ is:
[0038] $$Y_c(n) = \sum_{k=0}^{K_1 - 1} w_c(k)\, \tilde{X}(n S_1 + k) + b_c$$
[0039] where $k$ is the local sliding index within the convolution window, $\tilde{X}$ is the input after zero padding, $w_c(k)$ is the $k$-th element of the weight vector of the $c$-th convolution kernel, and $b_c$ is the corresponding bias term. Because a large kernel size $K_1$ and a large stride $S_1$ are used, the network can filter out high-frequency noise in the first layer and substantially shorten the time dimension of the sequence.
[0040] Step 2.2: To accelerate network convergence and alleviate internal covariate shift, batch normalization is applied to the convolution output feature map $Y_c$, and nonlinearity is introduced through the ReLU activation function. Let the mean within the batch be $\mu_c$, the variance be $\sigma_c^2$, the learnable per-channel scaling parameter be $\gamma_c$, and the shift parameter be $\beta_c$. The value of the intermediate feature $Z_c$ obtained after the nonlinear mapping at spatial location $n$ is:
[0041] $$Z_c(n) = \mathrm{ReLU}\left(\gamma_c\, \frac{Y_c(n) - \mu_c}{\sqrt{\sigma_c^2 + \epsilon}} + \beta_c\right)$$
[0042] where $\epsilon$ is a very small positive constant introduced in batch normalization to ensure numerical stability;
[0043] Step 2.3: Apply one-dimensional max pooling to the intermediate feature $Z_c$, with pooling kernel size $K_p$ (set to 3), pooling stride $S_p$ (set to 2), and padding $P_p$ (set to 1). The value of the final extracted initial low-frequency feature map $F_c$ at spatial location $n$ is:
[0044] $$F_c(n) = \max_{0 \le m < K_p} \tilde{Z}_c(n S_p + m)$$
[0045] where $m$ is the local sliding index within the pooling window and $\tilde{Z}_c$ is the padded intermediate feature. Stacking the initial low-frequency feature maps $F_c$ extracted from all $C_1$ channels along the channel dimension yields the multi-channel initial low-frequency envelope feature $F_0 \in \mathbb{R}^{C_1 \times L_0}$, where $L_0$ is the sequence length after reduction by the large-stride convolution and pooling and $C_1$ is the number of channels. $F_0$ is used as the input to the first-stage KAN residual block in the multi-stage pyramid network.
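To make the length reduction in Step 2 concrete, the following sketch applies the usual output-length formula for one-dimensional convolution and pooling (dilation 1) to the stated settings: kernel 64, stride 16, padding 24, followed by pooling with kernel 3, stride 2, padding 1. The input length of 2048 is a hypothetical value chosen only for illustration.

```python
def out_len(length, kernel, stride, padding):
    """Output length of a 1-D convolution or pooling layer with dilation 1."""
    return (length + 2 * padding - kernel) // stride + 1

input_len = 2048  # hypothetical window length, not specified in the source
after_conv = out_len(input_len, kernel=64, stride=16, padding=24)
after_pool = out_len(after_conv, kernel=3, stride=2, padding=1)
print(after_conv, after_pool)  # 128 64
```

Under this assumption the first module shortens the sequence by a factor of 32 before the first KAN residual block sees it.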
[0046] Step 3: Construct a KAN residual block that integrates local convolution and spline nonlinearity. Extract the local impulsive envelope through depthwise separable convolution, and use the high-order spline function of the Kolmogorov-Arnold network to achieve nonlinear feature mapping. Specifically:
[0047] Step 3.1: Let the feature tensor input to the KAN residual block be denoted as $X_{in}$. A one-dimensional depthwise separable convolution with a grouping mechanism is applied to $X_{in}$ to capture local fault impact envelopes. After the convolution, the features pass through batch normalization (BatchNorm) and the SiLU activation function in sequence to obtain the intermediate feature $X_{dw}$:
[0048] $$X_{dw} = \mathrm{SiLU}(\mathrm{BatchNorm}(\mathrm{DWConv}(X_{in})))$$
[0049] where $\mathrm{DWConv}(\cdot)$ denotes the one-dimensional depthwise separable convolution, which convolves a separate kernel with each input channel so that the local temporal feature information of each channel is extracted in a decoupled way. This substantially reduces the number of model parameters and the computational cost.
[0050] Step 3.2: To suppress invalid noise channels and highlight fault-characteristic channels, a channel attention mechanism is introduced. Global average pooling compresses $X_{dw}$ into channel descriptors, which are then passed through two fully connected layers and a sigmoid function to compute the adaptive weight vector $s$ of the channels. The weight vector is then multiplied element-wise with the feature $X_{dw}$ to obtain the recalibrated feature $X_{se}$:
[0051] $$X_{se} = s \odot X_{dw}$$
[0052] Subsequently, a higher-order nonlinear mapping based on spline curves is performed. To overcome the expressive bottleneck of fixed activation functions, the feature $X_{se}$ is reshaped into a one-dimensional sequence to fit the KAN layer structure. In the KAN architecture, the activation function moves from the neuron nodes to the connecting edges: instead of fixed linear weights, the network constructs a learnable univariate nonlinear function on each feature propagation edge, which allows highly flexible mapping of the input features. Let $x$ be a scalar component of the feature $X_{se}$ that, after the flattening and reshaping above, is input to a specific neuron node. The activation output $\phi(x)$ of the KAN layer is a combination of a basic activation function and a spline function:
[0053] $$\phi(x) = w_b\, b(x) + w_s\, \mathrm{spline}(x)$$
[0054] where $b(x)$ is the basic activation function, $\mathrm{spline}(x)$ is a spline function defined on a learnable grid, and $w_b$ and $w_s$ are the learnable scaling parameters of the basic function and the spline function, respectively. The spline function $\mathrm{spline}(x)$ is expanded as a linear combination of a set of B-spline basis functions $B_i(x)$:
[0055] $$\mathrm{spline}(x) = \sum_{i=0}^{G+k-1} c_i\, B_i(x)$$
[0056] where $G$ is the number of grid intervals, $k$ is the spline order of the piecewise polynomial, and $c_i$ are the control point coefficients that the network adaptively optimizes during training. Through this design, each neuron connection becomes a learnable smooth curve, enabling the network to approximate complex nonlinear fault mappings with very few parameters.
[0057] Step 3.3: The activation output $\phi(x)$, that is, the features after the higher-order nonlinear mapping, is reshaped back to the original dimensions to obtain the feature map $X_{kan}$.
[0058] Step 3.4: To prevent overfitting in the deep network, stochastic depth (drop path) regularization is applied to $X_{kan}$, and the result is added element-wise to the initial input $X_{in}$ to obtain the final output $X_{out}$ of this module:
[0059] $$X_{out} = X_{in} + \mathrm{DropPath}(X_{kan})$$
[0060] where $\mathrm{DropPath}(\cdot)$ is the stochastic depth regularization operation.
[0061] This step ensures the translation invariance of the signal over time through local convolution, and then uses the KAN network to endow the model with strong high-order nonlinear expression capabilities, effectively solving the problem that complex composite fault features are difficult to be represented by a single structure.
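The learnable edge function of a KAN layer described in Step 3.2 can be sketched in plain Python as a base activation plus a B-spline expansion. This is an illustrative sketch, not the patented implementation: the base function is assumed to be SiLU (the source text elides the specific choice), the grid is assumed uniform on [-1, 1] and extended by k knots on each side, k is treated as the polynomial degree, and the names `bspline_basis` and `kan_edge` are hypothetical.

```python
import math

def bspline_basis(x, knots, i, k):
    """Cox-de Boor recursion: value at x of the i-th B-spline basis function of degree k."""
    if k == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left_den = knots[i + k] - knots[i]
    right_den = knots[i + k + 1] - knots[i + 1]
    left = 0.0
    if left_den > 0:
        left = (x - knots[i]) / left_den * bspline_basis(x, knots, i, k - 1)
    right = 0.0
    if right_den > 0:
        right = (knots[i + k + 1] - x) / right_den * bspline_basis(x, knots, i + 1, k - 1)
    return left + right

def silu(x):
    """Assumed base activation b(x) = x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def kan_edge(x, coeffs, knots, k, w_b=1.0, w_s=1.0):
    """phi(x) = w_b * b(x) + w_s * sum_i c_i * B_i(x) for one edge."""
    spline = sum(c * bspline_basis(x, knots, i, k) for i, c in enumerate(coeffs))
    return w_b * silu(x) + w_s * spline

G, k = 5, 3                      # grid intervals and spline degree (illustrative values)
lo, hi = -1.0, 1.0
h = (hi - lo) / G
knots = [lo + (j - k) * h for j in range(G + 2 * k + 1)]   # extended uniform grid
coeffs = [0.1 * i for i in range(G + k)]                   # G + k control point coefficients
print(kan_edge(0.3, coeffs, knots, k))
```

In a trained layer, `coeffs`, `w_b`, and `w_s` would be learned per edge; here they are fixed only to show how the value of one edge is evaluated.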
[0062] Step 4: Construct an adaptive lifting wavelet module based on learnable soft threshold denoising. This module decomposes features using a neural network parameterized predictor and updater, and dynamically denoises high-frequency detail features using a learnable threshold. Specifically:
[0063] Step 4.1: The final output $X_{out}$ of the KAN residual block is fed into the adaptive lifting wavelet module for feature decomposition and dynamic noise reduction. The physical processes of splitting, prediction, and updating are reconstructed using learnable neural network operators.
[0064] First, perform the even/odd split: along the time dimension, split the input feature $X_{out}$ by index into an even sequence $x_e$ and an odd sequence $x_o$, each half the length of the original sequence. For spatial location $n$, the split is:
[0065] $$x_e(n) = X_{out}(2n)$$
[0066] $$x_o(n) = X_{out}(2n + 1)$$
[0067] Step 4.2: In the prediction stage, to overcome the poor fit of fixed wavelet basis functions to complex working conditions, this application designs a nonlinear predictor $P(\cdot)$ composed of a grouped one-dimensional convolution and a hyperbolic tangent activation function. The even sequence $x_e$ is used to predict the odd sequence $x_o$, and the prediction residual between the two is the extracted high-frequency detail feature $d$:
[0068] $$P(x_e) = \tanh(\mathrm{GConv}(x_e))$$
[0069] $$d = x_o - P(x_e)$$
[0070] Because the high-frequency components in real-world industrial datasets (such as the PU dataset) often contain substantial background noise, feeding the residual features directly into the deep network would cause severe feature interference. This application therefore introduces a learnable soft-threshold denoising layer. For the high-frequency detail feature $d$, each channel is initialized with a threshold parameter $\tau$ that is adjusted dynamically during backpropagation. To ensure that the threshold is non-negative, its absolute value $|\tau|$ is used. Applying soft-threshold shrinkage to $d$ yields the cleaned high-frequency detail feature $\tilde{d}$:
[0071] $$\tilde{d} = \mathrm{sign}(d) \cdot \max(|d| - |\tau|,\ 0)$$
[0072] where $\mathrm{sign}(\cdot)$ is the sign function. The physical meaning of this operation is that the network learns a dynamic threshold $|\tau|$: small high-frequency fluctuations with amplitudes below the threshold are treated as noise and set to zero, while fault impact components with amplitudes above the threshold are retained, reduced in magnitude by the threshold. This combines physical interpretability with data-driven denoising.
[0073] Step 4.3: In the update stage, to maintain the global smoothness of the low-frequency features and avoid aliasing, the cleaned high-frequency detail feature $\tilde{d}$ is used to update the even sequence $x_e$. A nonlinear updater consisting of a grouped one-dimensional convolution and a sigmoid activation function is designed to calculate the low-frequency approximation feature that carries the global trend of the signal:
[0074]
[0075]
[0076] where $\mathrm{GConv}(\cdot)$ denotes the grouped one-dimensional convolution used to extract local spatial features of the high-frequency details, $\sigma(\cdot)$ is the sigmoid activation function used to generate weighting coefficients for feature importance, and $\odot$ denotes element-wise multiplication.
[0077] Step 4.4: After the adaptive lifting wavelet transform and soft-threshold denoising described above, the low-frequency approximation feature and the cleaned high-frequency detail feature are concatenated along the channel dimension. The resulting fused feature map combines macro trends with clean impact characteristics and serves as the input to the next-stage KAN residual block in the multi-stage pyramid network.
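The split, predict, threshold, and update sequence of Step 4 can be sketched for a single channel in plain Python. This is a sketch under stated assumptions rather than the patented operator: the grouped convolutions are replaced by a fixed three-tap kernel, the updater is assumed to gate the cleaned detail with a sigmoid and add it to the even sequence, and the function names and numeric values are hypothetical. In the actual module the kernels and the threshold would be learned.

```python
import math

def conv1d_same(x, kernel):
    """1-D correlation with zero padding; output has the same length as x (odd kernel length)."""
    half = len(kernel) // 2
    return [
        sum(kernel[j] * x[n + j - half]
            for j in range(len(kernel)) if 0 <= n + j - half < len(x))
        for n in range(len(x))
    ]

def soft_threshold(d, tau):
    """sign(d) * max(|d| - |tau|, 0), applied element-wise."""
    t = abs(tau)
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in d]

def lifting_step(x, pred_kernel, upd_kernel, tau):
    """One lifting step on an even-length sequence; returns (approximation, cleaned detail)."""
    even, odd = x[0::2], x[1::2]                                   # split
    pred = [math.tanh(v) for v in conv1d_same(even, pred_kernel)]  # predictor: conv + tanh
    detail = soft_threshold([o - p for o, p in zip(odd, pred)], tau)
    # Updater (assumed form): sigmoid gate from a convolution of the cleaned detail.
    gate = [1.0 / (1.0 + math.exp(-v)) for v in conv1d_same(detail, upd_kernel)]
    approx = [e + g * d for e, g, d in zip(even, gate, detail)]
    return approx, detail

x = [0.0, 0.1, 0.0, -0.1, 0.0, 2.0, 0.0, 0.1]   # small ripples plus one impact at index 5
approx, detail = lifting_step(x, [0.25, 0.5, 0.25], [0.25, 0.5, 0.25], tau=0.5)
print(detail)   # only the impact survives the threshold
```

Both outputs have half the input length, so concatenating them along the channel dimension halves the time resolution while keeping the detail information that passed the threshold.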
[0078] Step 5: Construct a multi-stage pyramid network by alternately cascading the KAN residual blocks and the adaptive lifting wavelet module, and then perform global pooling and concatenation to fuse the multi-scale features extracted from each stage to obtain a comprehensive feature vector.
[0079] Step 5.1: Construct a multi-stage pyramid network, where each level is called a stage, and each stage contains a KAN residual block and a global average pooling layer. Adjacent stages are connected and dimensionality reduced using an adaptive lifting wavelet module. The KAN residual block serves as the backbone component for feature extraction within each stage of the network, and the adaptive lifting wavelet module serves as the lossless downsampling component between adjacent stages. Based on the multi-stage pyramid network, perform hierarchical extraction and multi-scale fusion of deep features. The specific process is as follows:
[0080] Let the feature map output by the KAN residual block of the $s$-th stage be $F_s \in \mathbb{R}^{C_s \times L_s}$, where $C_s$ is the number of channels and $L_s$ is the current sequence length. $F_s$ is fed into the adaptive lifting wavelet module of that stage, and the output of the adaptive lifting wavelet module is then linearly projected by a $1 \times 1$ convolution and batch normalized to adjust the number of channels to the dimension required by the next stage, giving the input to the KAN residual block of the next stage.
[0081] Step 5.2: By alternately stacking KAN residual blocks and adaptive lifting wavelet modules, the network propagates to deeper layers. As the depth increases, after $T$ pyramid stages the model has captured fault features at different time scales and receptive fields at the various depths, denoted as $\{F_1, F_2, \dots, F_T\}$.
[0082] Step 5.3: Perform multi-scale feature fusion: apply global average pooling to the output of the KAN residual block at each stage, compressing the time dimension $L_s$ to 1 and extracting a globally representative channel feature vector $v_s$:
[0083] $$v_s(c) = \frac{1}{L_s} \sum_{n=1}^{L_s} F_s(c, n), \quad c = 1, 2, \dots, C_s$$
[0084] where $F_s(c, \cdot)$ is the $c$-th row of $F_s$.
[0085] Step 5.4: Concatenate the multi-scale feature vectors extracted from all stages to obtain a one-dimensional comprehensive feature vector $v$ of dimension $\sum_{s=1}^{T} C_s$:
[0086] $$v = \mathrm{Concat}(v_1, v_2, \dots, v_T)$$
[0087] This comprehensive feature vector $v$ preserves multi-scale fault information from the shallow low-frequency envelope to the deep high-order nonlinear features, providing the feature basis for the final fault classification.
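The fusion in Steps 5.3 and 5.4 amounts to averaging each stage output over time and concatenating the per-stage vectors. A minimal sketch, with two hypothetical stage outputs represented as nested lists (channels by time):

```python
def global_average_pool(feature_map):
    """feature_map: list of channels, each a list over time. Returns one mean per channel."""
    return [sum(channel) / len(channel) for channel in feature_map]

def fuse(stage_outputs):
    """Concatenate the pooled vectors of all stages into one feature vector."""
    fused = []
    for feature_map in stage_outputs:
        fused.extend(global_average_pool(feature_map))
    return fused

# Hypothetical stage outputs: 2 channels x 4 steps, then 3 channels x 2 steps.
stage1 = [[1.0, 3.0, 5.0, 7.0], [0.0, 0.0, 2.0, 2.0]]
stage2 = [[2.0, 4.0], [1.0, 1.0], [0.0, 6.0]]
v = fuse([stage1, stage2])
print(v)  # [4.0, 1.0, 3.0, 1.0, 3.0]
```

The fused vector has one entry per channel of every stage (here 2 + 3 = 5), independent of each stage's sequence length.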
[0088] Step 6: Introduce a prototype-aware contrastive learning mechanism, construct a joint loss function including cross-entropy loss and prototype contrastive loss, optimize and train the network model, and use the trained model to output bearing fault diagnosis results, specifically:
[0089] Step 6.1: Take the one-dimensional comprehensive feature vector $v$ obtained by multi-scale fusion as input. To overcome the shortcoming of a single cross-entropy loss when handling weak or complex faults, namely the "excessively large intra-class spacing and excessively small inter-class spacing" it produces in the feature space, this application introduces prototype-aware contrastive learning.
[0090] First, a learnable category prototype matrix $P$ is initialized at the end of the multi-stage pyramid network, with one row per category, where $K$ is the total number of bearing fault categories. Each row vector $p_k$, $k = 1, 2, \dots, K$, of the category prototype matrix represents the feature center of the corresponding fault category, that is, the category prototype.
[0091] To compute cosine similarity in a unified metric space, the input feature vector $v$ and each category prototype $p_k$ of the category prototype matrix are $L_2$-norm normalized separately, yielding the normalized feature $z$ and the normalized prototype $\hat{p}_k$:
[0092] $$z = \frac{v}{\lVert v \rVert_2}$$
[0093] $$\hat{p}_k = \frac{p_k}{\lVert p_k \rVert_2}$$
[0094] Step 6.2: Calculate the scaled inner product of the normalized feature $z$ with all category prototypes to obtain the logits used for classification prediction, and combine them with the true labels to calculate the smoothed cross-entropy loss $\mathcal{L}_{CE}$.
[0095] Step 6.3: To force samples to converge toward the prototype center of their own category in the feature space, a prototype contrastive loss $\mathcal{L}_{proto}$ is further constructed. The calculation formula is as follows:
[0096]
[0097] where $B$ is the size of the current training batch; $z_i$ is the normalized feature of the $i$-th sample, $y_i$ is its true class label, and $\hat{p}_{y_i}$ is the prototype vector of that true class; $z_i^{\top} \hat{p}_{y_i}$ is the cosine similarity between the sample feature and its true prototype. The final joint optimization total loss $\mathcal{L}_{total}$ of the network model is a weighted sum of the cross-entropy loss and the prototype contrastive loss:
[0098] $$\mathcal{L}_{total} = \mathcal{L}_{CE} + \lambda\, \mathcal{L}_{proto}$$
[0099] where $\mathcal{L}_{CE}$ is the cross-entropy loss, $\mathcal{L}_{proto}$ is the prototype contrastive loss, and $\lambda$ is the hyperparameter used to balance the two loss terms.
[0100] Step 6.4: During training, continuously minimize the total loss with the gradient descent algorithm, updating the network weights until the model converges, and save the optimal parameters.
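The joint objective of Step 6 can be sketched as follows. Because the exact prototype contrastive formula is not recoverable from the text, the sketch assumes the simplest form consistent with the description, the batch mean of one minus the cosine similarity to the true-class prototype. The logit scale of 10, the weight `lam=0.5`, and the use of plain (unsmoothed) cross-entropy are also assumptions made for brevity.

```python
import math

def l2_normalize(vec, eps=1e-12):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]

def cosine_logits(feature, prototypes, scale=10.0):
    """Scaled inner products between the normalized feature and each normalized prototype."""
    z = l2_normalize(feature)
    return [scale * sum(a * b for a, b in zip(z, l2_normalize(p))) for p in prototypes]

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one sample (no label smoothing)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[label]

def prototype_loss(features, labels, prototypes):
    """Assumed form: mean of (1 - cosine similarity to the true-class prototype)."""
    total = 0.0
    for f, y in zip(features, labels):
        z, p = l2_normalize(f), l2_normalize(prototypes[y])
        total += 1.0 - sum(a * b for a, b in zip(z, p))
    return total / len(features)

def joint_loss(features, labels, prototypes, lam=0.5, scale=10.0):
    """Total loss = cross-entropy + lam * prototype loss, averaged over the batch."""
    ce = sum(cross_entropy(cosine_logits(f, prototypes, scale), y)
             for f, y in zip(features, labels)) / len(features)
    return ce + lam * prototype_loss(features, labels, prototypes)

prototypes = [[1.0, 0.0], [0.0, 1.0]]    # two hypothetical class prototypes
features = [[2.0, 0.1], [0.2, 3.0]]      # two fused feature vectors
print(joint_loss(features, [0, 1], prototypes))
```

Features that point toward their own prototype give a small total loss; swapping the labels in the example raises both terms, which is the behavior the joint objective is meant to penalize.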
[0101] After training is complete and the network is used for actual fault diagnosis inference, a test-time augmentation mechanism is introduced to further improve the robustness of the model in complex industrial environments. The original test signal, the signal flipped along the time dimension, and the signal with small Gaussian noise injected are each fed into the trained network. The average of the three predicted probability vectors is taken as the final decision, completing the classification and evaluation of faults.
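The test-time averaging over the three views can be sketched as below. The function `toy_model` is a hypothetical stand-in for the trained network that returns two logits, and the noise standard deviation of 0.01 is an assumed value; only the three-view averaging itself follows the description.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_tta(model, signal, noise_std=0.01, seed=0):
    """Average class probabilities over the original, time-flipped, and noise-injected signal."""
    rng = random.Random(seed)
    views = [
        signal,
        signal[::-1],
        [v + rng.gauss(0.0, noise_std) for v in signal],
    ]
    probs = [softmax(model(view)) for view in views]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]

def toy_model(signal):
    """Hypothetical stand-in for the trained network: returns two logits."""
    energy = sum(v * v for v in signal) / len(signal)
    return [energy, 1.0 - energy]

probabilities = predict_with_tta(toy_model, [0.1, -0.2, 0.3, 0.0])
predicted_class = probabilities.index(max(probabilities))
print(predicted_class)
```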
[0102] The invention will be further explained below with reference to specific experimental data and implementation details.
[0103] To comprehensively verify the effectiveness and robustness of this invention under different operating conditions, five authoritative publicly available bearing fault datasets—CWRU, IMS, JNU, MFPT, and PU—were selected for testing. The parameters of each dataset are shown in Table 1.
[0104] Table 1. Statistical table of parameters and quantities of the dataset.
[0105]
[0106] As shown in Table 1, each dataset has a sufficient number of samples, and the different datasets cover a variety of fault categories and speed conditions. This diversity in data distribution provides a reliable basis for verifying the model's generalization ability in complex scenarios. To quantitatively evaluate the diagnostic performance of the model of this invention for bearing faults, the following metrics were selected as evaluation indices; the experimental results are shown in Table 2:
[0107] Acc: Accuracy. The proportion of correctly classified samples out of all samples.
[0108] $$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$ where $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
[0109] Pre: Precision. The proportion of samples predicted as positive by the model that are actually positive.
[0110] $$\mathrm{Pre} = \frac{TP}{TP + FP}$$
[0111] Rec: Recall. The proportion of actually positive samples that the model predicts as positive.
[0112] $$\mathrm{Rec} = \frac{TP}{TP + FN}$$
[0113] F1-Score: The harmonic mean of precision and recall, used to measure the overall classification performance of a model, especially in class imbalance problems.
[0114] $$\mathrm{F1} = \frac{2 \cdot \mathrm{Pre} \cdot \mathrm{Rec}}{\mathrm{Pre} + \mathrm{Rec}}$$
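For a multi-class fault classifier, these metrics are usually computed per class in a one-vs-rest fashion and then averaged. The sketch below assumes macro averaging, which the source does not specify, and uses a small hypothetical set of labels.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    classes = sorted(set(y_true) | set(y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    pres, recs, f1s = [], [], []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        pre = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
        pres.append(pre)
        recs.append(rec)
        f1s.append(f1)
    n = len(classes)
    return acc, sum(pres) / n, sum(recs) / n, sum(f1s) / n

# Hypothetical labels for three fault classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc, pre, rec, f1 = evaluate(y_true, y_pred)
print(round(acc, 4), round(pre, 4), round(rec, 4), round(f1, 4))
```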
[0115] Table 2 shows the results of this invention across five datasets.
[0116]
[0117] As shown in Table 2, the model proposed in this invention exhibits outstanding classification performance on five typical bearing datasets covering different sources and operating conditions: CWRU, IMS, JNU, MFPT, and PU.
[0118] Extremely high classification accuracy: In the CWRU, JNU and MFPT datasets, the model of this invention achieved a perfect recognition effect of 100.0% in all four core metrics: accuracy (Acc), precision (Pre), recall (Rec) and F1 score.
[0119] High generalization and reliability: In the IMS and PU datasets containing complex noise and varying operating conditions, this model still maintains extremely high diagnostic accuracy (97.50% and 98.26%, respectively). This fully demonstrates that the algorithm has strong adaptability to multi-source heterogeneous data and can stably extract discriminative fault features.
[0120] Table 3 Results of different baseline algorithms on five datasets
[0121]
[0122] Based on the comparative experimental results in Table 3, the proposed model was subjected to multi-dimensional benchmarking tests against five mainstream baseline models, including CNN, WDCNN, LSTM, ResNet18, and ViT.
[0123] Comprehensive Superiority: On all five test datasets, the model of this invention generally outperforms the existing mainstream algorithms. Under typical operating conditions (such as CWRU and JNU), it outperforms the baseline models across the board.
[0124] Robustness under extreme conditions: The advantages of this model are particularly significant on the PU dataset, which has the most severe environment and the strongest background noise. The accuracy of traditional models such as CNN and LSTM dropped significantly to 88.17% and 91.26% respectively, while this model, thanks to its adaptive noise reduction and KAN high-order nonlinear fitting ability, still maintains a high accuracy level of 98.26%.
[0125] Conclusion: The above comparative data strongly demonstrates that the method proposed in this invention has made breakthrough progress in solving the pain points of traditional deep learning models, such as poor noise resistance and insufficient generalization under complex conditions, and has significant technical advantages.
[0126] The above description is only a preferred embodiment of the present invention and is not intended to limit the ideas of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A bearing fault diagnosis method based on adaptive lifting wavelet and convolutional KAN network, characterized in that it comprises:
Step 1: Acquire the original one-dimensional vibration signal of the bearing, truncate the original signal using the sliding window technique, and standardize the truncated signal samples to construct the input tensor;
Step 2: Construct a large-kernel initial feature extraction module that captures the initial low-frequency envelope features of the input tensor using a large-size one-dimensional convolutional kernel;
Step 3: Construct a KAN residual block that integrates local convolution and spline nonlinearity, extract the local impact envelope through depthwise separable convolution, and use the high-order spline function of the Kolmogorov-Arnold network to realize nonlinear feature mapping;
Step 4: Construct an adaptive lifting wavelet module based on learnable soft threshold denoising, in which the features are decomposed by a neural-network-parameterized predictor and updater, and the high-frequency detail features are dynamically denoised using a learnable threshold;
Step 5: Construct a multi-stage pyramid network by alternately cascading the KAN residual blocks and the adaptive lifting wavelet modules, and apply global pooling and concatenation to the multi-scale features extracted at each stage to obtain a comprehensive feature vector;
Step 6: Introduce a prototype-aware contrastive learning mechanism, construct a joint loss function that includes a cross-entropy loss and a prototype contrastive loss, train and optimize the network model, and use the trained model to output bearing fault diagnosis results.
2. The bearing fault diagnosis method based on adaptive lifting wavelet and convolutional KAN network according to claim 1, characterized in that Step 1 specifically comprises:
Step 1.1: Acquire the raw one-dimensional vibration signal $x = [x_1, x_2, \dots, x_L]$ of bearings in industrial equipment under different operating conditions using an accelerometer, where $L$ is the total number of sampling points in the signal sequence;
Step 1.2: Truncate the raw one-dimensional vibration signal into overlapping segments using the sliding window technique. The sliding window length is $W$ and the sliding step size is $S$. The total number $N$ of local sample sequences generated by the truncation is calculated as:
$$N = \left\lfloor \frac{L - W}{S} \right\rfloor + 1$$
where $\lfloor \cdot \rfloor$ denotes the floor operation. The $i$-th extracted local sample sequence is represented as:
$$x^{(i)} = [x_{(i-1)S+1}, x_{(i-1)S+2}, \dots, x_{(i-1)S+W}], \quad i = 1, \dots, N$$
Step 1.3: For each extracted local sample sequence $x^{(i)}$, calculate the sample mean $\mu_i$ and the standard deviation $\sigma_i$;
Step 1.4: Perform independent Z-score (mean-variance) standardization on each $x^{(i)}$. To prevent division by zero when signal fluctuations are extremely small (i.e., the standard deviation approaches 0), a very small constant $\epsilon$ is added to the denominator when standardizing each data point in the sequence:
$$\hat{x}^{(i)}_a = \frac{x^{(i)}_a - \mu_i}{\sigma_i + \epsilon}$$
where $x^{(i)}_a$ is the $a$-th data point in the $i$-th local sample sequence and $\hat{x}^{(i)}_a$ is the standardized $a$-th data point; standardizing every data point yields the standardized input sequence $\hat{x}^{(i)}$;
Step 1.5: After the sliding window truncation and standardization described above, collect all standardized input sequences together with their corresponding fault category labels, divide them proportionally into a training set and a test set, and convert them into the tensor format required by the deep learning model, denoted as $X$.
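The sliding-window truncation and Z-score standardization of Step 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names, the default `eps`, and the use of the population standard deviation are all choices made here for the example.

```python
import math


def sliding_windows(signal, window, step):
    """Cut a 1-D signal into overlapping segments.

    The segment count is floor((len(signal) - window) / step) + 1,
    clamped at zero when the signal is shorter than one window.
    """
    count = max(0, (len(signal) - window) // step + 1)
    return [signal[i * step: i * step + window] for i in range(count)]


def zscore(segment, eps=1e-8):
    """Standardize one segment; eps guards against a near-zero deviation.

    Assumption: population standard deviation (divide by n) and eps
    added to the standard deviation itself.
    """
    mean = sum(segment) / len(segment)
    variance = sum((v - mean) ** 2 for v in segment) / len(segment)
    std = math.sqrt(variance)
    return [(v - mean) / (std + eps) for v in segment]


# Toy signal of 10 points, window length 4, step 2 -> 4 overlapping segments.
segments = sliding_windows(list(range(10)), window=4, step=2)
standardized = [zscore(seg) for seg in segments]
```

A constant segment (zero standard deviation) maps to all zeros instead of raising a division error, which is the purpose of the small constant in the denominator.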
3. The bearing fault diagnosis method based on adaptive lifting wavelet and convolutional KAN network according to claim 2, characterized in that Step 2 specifically comprises:
Step 2.1: Perform a one-dimensional large-kernel convolution on the input tensor $X$. The convolution kernel size is $K$, the stride is $S_c$, the zero-padding length is $P$, and the number of output channels is $C_0$. For the $c$-th channel ($c = 1, \dots, C_0$), the value of its convolution output feature map $Y_c$ at spatial position $t$ is calculated as:
$$Y_c(t) = \sum_{k=0}^{K-1} w_{c,k} \, X(t \cdot S_c + k - P) + b_c$$
where $k$ is the local sliding index within the convolution window, positions outside the input sequence are treated as zero padding, $w_{c,k}$ is the $k$-th element of the weight vector of the $c$-th convolutional kernel, and $b_c$ is the corresponding bias term;
Step 2.2: To accelerate network convergence and alleviate internal covariate shift, apply batch normalization to the convolution output feature map $Y_c$ and introduce nonlinear expressive power through the ReLU activation function. Let the mean within the batch be $\mu_c$, the variance be $\sigma_c^2$, the learnable channel scaling parameter be $\gamma_c$, and the shift parameter be $\beta_c$. The intermediate feature $Z_c$ is obtained after the nonlinear mapping, and its value at spatial position $t$ is calculated as:
$$Z_c(t) = \mathrm{ReLU}\left( \gamma_c \, \frac{Y_c(t) - \mu_c}{\sqrt{\sigma_c^2 + \epsilon}} + \beta_c \right)$$
where $\epsilon$ is a very small positive constant introduced in batch normalization to ensure numerical stability;
Step 2.3: Apply one-dimensional max pooling to the intermediate feature $Z_c$, with pooling kernel size $K_p$, pooling stride $S_p$, and padding $P_p$. The value of the finally extracted initial low-frequency feature map $F_c$ at spatial position $t$ is calculated as:
$$F_c(t) = \max_{0 \le m < K_p} Z_c(t \cdot S_p + m - P_p)$$
where $m$ is the local sliding index within the pooling window. The initial low-frequency feature maps $F_c$ extracted from all $C_0$ channels are stacked along the channel dimension to obtain the multi-channel initial low-frequency envelope features $F_0 \in \mathbb{R}^{C_0 \times L_0}$, where $L_0$ is the reduced sequence length after large-stride convolution and pooling and $C_0$ is the number of channels; $F_0$ is used as the input to the first-stage KAN residual block in the multi-stage pyramid network.
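The convolution, batch normalization with ReLU, and max pooling chain of Step 2 can be sketched for a single channel as below. This is a hedged illustration only: the function names are hypothetical, the batch statistics are passed in as plain numbers instead of being computed over a batch, and the kernel weights are fixed toy values where the method uses learned ones.

```python
import math


def conv1d(x, weights, bias=0.0, stride=1, padding=0):
    """Single-channel strided 1-D convolution with zero padding."""
    padded = [0.0] * padding + list(x) + [0.0] * padding
    k = len(weights)
    out_len = (len(padded) - k) // stride + 1
    return [
        sum(weights[j] * padded[t * stride + j] for j in range(k)) + bias
        for t in range(out_len)
    ]


def batchnorm_relu(y, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize with the given statistics, scale/shift, then apply ReLU."""
    return [
        max(0.0, gamma * (v - mean) / math.sqrt(var + eps) + beta) for v in y
    ]


def maxpool1d(z, kernel, stride, padding=0):
    """1-D max pooling; padded positions never win the maximum."""
    padded = [float("-inf")] * padding + list(z) + [float("-inf")] * padding
    out_len = (len(padded) - kernel) // stride + 1
    return [max(padded[t * stride: t * stride + kernel]) for t in range(out_len)]


# Toy run: an averaging-like kernel of width 4 with stride 2 and padding 1.
y = conv1d([1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 1, 1], stride=2, padding=1)
z = batchnorm_relu(y, mean=14.0, var=64.0)
f = maxpool1d(z, kernel=2, stride=2)
```

A wide kernel with a stride greater than 1 both smooths the input and shortens the sequence, which is why this stage yields a low-frequency, reduced-length representation.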
4. The bearing fault diagnosis method based on adaptive lifting wavelet and convolutional KAN network according to claim 3, characterized in that Step 3 specifically comprises:
Step 3.1: Denote the feature tensor input to the KAN residual block as $X_{in}$. A one-dimensional depthwise separable convolution with a grouping mechanism is applied to $X_{in}$ to capture local fault impact envelopes. The convolved features are passed sequentially through batch normalization (BatchNorm) and the SiLU activation function to obtain the intermediate features $F_1$:
$$F_1 = \mathrm{SiLU}\left(\mathrm{BN}\left(\mathrm{DWConv}(X_{in})\right)\right)$$
where $\mathrm{DWConv}(\cdot)$ denotes the one-dimensional depthwise separable convolution, which convolves a separate kernel with each channel of the input so that local temporal feature information is extracted from each channel independently; this significantly reduces the number of model parameters and the computational complexity;
Step 3.2: To suppress invalid noise channels and highlight fault characteristic channels, a channel attention mechanism is introduced: global average pooling compresses $F_1$ into a channel descriptor, which is passed through two fully connected layers and a Sigmoid function to compute the adaptive weight vector $s$ of the channels; the weight vector is then multiplied element-wise with the features $F_1$ to obtain the recalibrated features $F_2$:
$$F_2 = s \odot F_1$$
Subsequently, a higher-order nonlinear mapping based on spline curves is performed. To overcome the expression bottleneck of fixed activation functions, the features $F_2$ are reshaped into a one-dimensional sequence to fit the KAN layer structure. In the KAN architecture, the activation function is moved from the neuron node to the connection edge: the network does not use fixed linear weights, but instead constructs a learnable univariate nonlinear function on each feature propagation edge, which enables a highly flexible mapping of the input features. Let $x$ be the scalar component of the features $F_2$ that is input to a specific neuron node after the above flattening and reshaping; the activation output $\phi(x)$ of the KAN layer is a combination of a base activation function and a spline function:
$$\phi(x) = w_b \, b(x) + w_s \, \mathrm{spline}(x)$$
where $b(x)$ is the base activation function, $\mathrm{spline}(x)$ is a spline function defined on a learnable grid, and $w_b$ and $w_s$ are the learnable scaling parameters of the base function and the spline function, respectively. The spline function is expanded as a linear combination of a set of B-spline basis functions $B_i(x)$:
$$\mathrm{spline}(x) = \sum_{i=1}^{G+k} c_i \, B_i(x)$$
where $G$ is the number of grid intervals, $k$ is the spline order of the piecewise polynomial, and $c_i$ are the control point coefficients that are adaptively optimized during network training;
Step 3.3: The activation output $\phi(x)$, i.e., the features after the higher-order nonlinear mapping, is reshaped back to the original dimensions to obtain the feature map $F_3$;
Step 3.4: To prevent overfitting in deep networks, stochastic depth (DropPath) regularization is applied to $F_3$, and the result is added element-wise to the initial input $X_{in}$ to obtain the final output $X_{out}$ of this module:
$$X_{out} = X_{in} + \mathrm{DropPath}(F_3)$$
where $\mathrm{DropPath}(\cdot)$ denotes the stochastic depth regularization operation.
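The edge activation of the KAN layer in Step 3.2 (a base activation plus a B-spline expansion, each with its own scaling parameter) can be sketched for a single scalar input as follows. Several choices here are assumptions for illustration: SiLU as the base activation, a uniform grid extended by `order` knots on each side, fixed coefficients in place of trained ones, and all function names.

```python
import math


def silu(x):
    """SiLU activation, used here as an assumed base activation."""
    return x / (1.0 + math.exp(-x))


def make_knots(lo, hi, grid, order):
    """Uniform knot vector covering [lo, hi] with `grid` intervals,
    extended by `order` extra knots on each side."""
    h = (hi - lo) / grid
    return [lo + (i - order) * h for i in range(grid + 2 * order + 1)]


def bspline_basis(x, knots, order):
    """Evaluate all B-spline basis functions of degree `order` at x
    using the Cox-de Boor recursion. Returns grid + order values."""
    basis = [
        1.0 if knots[i] <= x < knots[i + 1] else 0.0
        for i in range(len(knots) - 1)
    ]
    for d in range(1, order + 1):
        nxt = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * basis[i] if left_den else 0.0
            right = (
                (knots[i + d + 1] - x) / right_den * basis[i + 1]
                if right_den
                else 0.0
            )
            nxt.append(left + right)
        basis = nxt
    return basis


def kan_edge(x, coeffs, knots, order, w_base=1.0, w_spline=1.0):
    """One KAN edge: scaled base activation plus scaled spline expansion."""
    basis = bspline_basis(x, knots, order)
    spline = sum(c * b for c, b in zip(coeffs, basis))
    return w_base * silu(x) + w_spline * spline


# Toy edge: 5 grid intervals, cubic splines -> 5 + 3 = 8 coefficients.
knots = make_knots(-1.0, 1.0, grid=5, order=3)
value = kan_edge(0.3, [0.1] * 8, knots, order=3)
```

Because the basis functions sum to 1 inside the grid range, setting every coefficient to the same constant makes the spline term equal that constant there; training moves the coefficients apart to shape the nonlinearity. Inputs outside the grid range receive only the base activation in this sketch.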
5. The bearing fault diagnosis method based on adaptive lifting wavelet and convolutional KAN network according to claim 4, characterized in that Step 4 specifically comprises:
Step 4.1: Feed the final output $X_{out}$ of the KAN residual block into the adaptive lifting wavelet module for feature decomposition and dynamic noise reduction, in which the physical processes of splitting, prediction and updating are reconstructed using learnable neural network operators. First, perform the even/odd split: along the time dimension, the input features $X_{out}$ are split by index into an even sequence $x_e$ and an odd sequence $x_o$, each half the length of the original sequence. At spatial position $t$, the split is:
$$x_e(t) = X_{out}(2t), \quad x_o(t) = X_{out}(2t + 1)$$
Step 4.2: In the prediction stage, a nonlinear predictor $P(\cdot)$ composed of a grouped one-dimensional convolution and a hyperbolic tangent activation function is designed. The even sequence $x_e$ is used to predict the odd sequence $x_o$, and the prediction residual between the two is the extracted high-frequency detail feature $d$:
$$d = x_o - P(x_e)$$
For the high-frequency detail features $d$, each channel is initialized with a threshold parameter $\tau$ that is dynamically adjusted during backpropagation. To ensure the threshold is non-negative, its absolute value $|\tau|$ is taken. Applying soft threshold shrinkage to the high-frequency detail features $d$ yields the cleaned high-frequency detail features $\hat{d}$:
$$\hat{d} = \mathrm{sign}(d) \odot \max\left(|d| - |\tau|, 0\right)$$
where $\mathrm{sign}(\cdot)$ is the sign function;
Step 4.3: In the update stage, to maintain the global smoothness of the low-frequency features and avoid aliasing, the cleaned high-frequency detail features $\hat{d}$ are used to update the even sequence $x_e$. A nonlinear updater consisting of a grouped one-dimensional convolution and a Sigmoid activation function is designed to calculate the low-frequency approximation features $a$, which contain the global trend of the signal:
$$a = x_e + \sigma\left(\mathrm{GConv}(\hat{d})\right) \odot \hat{d}$$
where $\mathrm{GConv}(\cdot)$ denotes the grouped one-dimensional convolution used to extract local spatial features of the high-frequency details, $\sigma(\cdot)$ is the Sigmoid activation function used to generate weighting coefficients for feature importance, and $\odot$ denotes element-wise multiplication;
Step 4.4: After the above adaptive lifting wavelet transform and soft threshold denoising, the low-frequency approximation features $a$ and the cleaned high-frequency detail features $\hat{d}$ are concatenated along the channel dimension to obtain a fused feature map that combines the macro trend with clean impact characteristics, which serves as a reliable input for the next-stage KAN residual block in the multi-stage pyramid network.
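The split, predict, soft-threshold, and update sequence of Step 4 can be sketched for a single channel as below. This is a deliberately simplified illustration: the predictor and updater are passed in as plain functions, and the example uses fixed stand-ins (an identity predictor and a halving updater) in place of the learnable grouped convolutions with tanh and Sigmoid described in the claim. The threshold value and all names are likewise hypothetical.

```python
import math


def soft_threshold(values, tau):
    """Shrink each value toward zero by |tau|; values within |tau| become 0."""
    t = abs(tau)  # absolute value keeps the threshold non-negative
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in values]


def lifting_step(x, predict, update, tau):
    """One lifting step on a 1-D sequence of even length.

    predict: maps the even samples to a prediction of the odd samples.
    update:  maps the cleaned details to a correction for the even samples.
    Returns (approximation, cleaned_detail), each half the input length.
    """
    even, odd = x[0::2], x[1::2]
    detail = [o - p for o, p in zip(odd, predict(even))]
    cleaned = soft_threshold(detail, tau)
    approx = [e + u for e, u in zip(even, update(cleaned))]
    return approx, cleaned


# Fixed stand-ins for the learnable operators (assumptions for illustration).
def identity_predictor(even):
    return list(even)


def half_updater(detail):
    return [0.5 * v for v in detail]


# Small details (noise-like) are zeroed; the large jump (impact-like) survives.
approx, cleaned = lifting_step(
    [1.0, 1.1, 4.0, 4.0, 2.0, 5.0], identity_predictor, half_updater, tau=0.5
)
```

In this toy run the residual of about 0.1 falls below the threshold and is removed, while the residual of 3.0 is kept (shrunk to 2.5) and also adjusts the corresponding low-frequency sample.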
6. The bearing fault diagnosis method based on adaptive lifting wavelet and convolutional KAN network according to claim 3, characterized in that Step 5 specifically comprises:
Step 5.1: Construct a multi-stage pyramid network, in which each level is called a stage and each stage contains a KAN residual block and a global average pooling layer. Adjacent stages are connected, and the dimension is reduced, by an adaptive lifting wavelet module. The KAN residual block serves as the backbone component for feature extraction within each stage, and the adaptive lifting wavelet module serves as the lossless downsampling component between adjacent stages. Based on the multi-stage pyramid network, hierarchical extraction and multi-scale fusion of deep features are performed as follows: let the feature map output by the KAN residual block of the $s$-th stage be $F^{(s)} \in \mathbb{R}^{C_s \times L_s}$, where $C_s$ is the number of channels and $L_s$ is the current sequence length; $F^{(s)}$ is fed into the adaptive lifting wavelet module of that stage, and the output of the adaptive lifting wavelet module is then passed through a convolutional linear projection and batch normalization to adjust the number of channels to the dimension required by the next stage, giving the input $X^{(s+1)}$ of the KAN residual block of the next stage;
Step 5.2: By alternately stacking KAN residual blocks and adaptive lifting wavelet modules, the network propagates to progressively deeper layers. After $M$ stages of pyramid feature extraction, the model captures fault features at different time scales and receptive fields at the various depths, denoted as $\{F^{(1)}, F^{(2)}, \dots, F^{(M)}\}$;
Step 5.3: Perform multi-scale feature fusion: apply global average pooling to the output of the KAN residual block at each stage, compressing the time dimension $L_s$ to 1, and extract the globally representative channel feature vector $v^{(s)}$, calculated as:
$$v^{(s)}_c = \frac{1}{L_s} \sum_{t=1}^{L_s} F^{(s)}_c(t)$$
where $c = 1, \dots, C_s$ is the channel index;
Step 5.4: Concatenate the multi-scale feature vectors extracted from all stages and fuse them into a one-dimensional comprehensive feature vector $v$ of dimension $D = \sum_{s=1}^{M} C_s$:
$$v = \mathrm{Concat}\left(v^{(1)}, v^{(2)}, \dots, v^{(M)}\right)$$
This comprehensive feature vector $v$ preserves multi-scale fault information ranging from the shallow low-frequency envelope to the deep high-order nonlinearity, providing the feature basis for the final fault classification.
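The pooling and concatenation of Steps 5.3 and 5.4 can be sketched as follows. The stage outputs are represented here as nested lists of shape channels by time, and the function names and toy values are assumptions for illustration only.

```python
def global_average_pool(feature_map):
    """Average each channel over time: (channels x length) -> (channels,)."""
    return [sum(channel) / len(channel) for channel in feature_map]


def fuse_stages(stage_outputs):
    """Pool every stage output and concatenate the channel vectors.

    The fused length equals the sum of the channel counts of all stages,
    regardless of each stage's sequence length.
    """
    fused = []
    for feature_map in stage_outputs:
        fused.extend(global_average_pool(feature_map))
    return fused


# Toy pyramid: stage 1 has 2 channels x 4 steps, stage 2 has 3 channels x 2 steps.
stage1 = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 4.0, 4.0]]
stage2 = [[2.0, 4.0], [1.0, 3.0], [5.0, 5.0]]
fused = fuse_stages([stage1, stage2])
```

Pooling over time before concatenating lets stages with different sequence lengths contribute to one fixed-length vector.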
7. The bearing fault diagnosis method based on adaptive lifting wavelet and convolutional KAN network according to claim 6, characterized in that Step 6 specifically comprises:
Step 6.1: Map the one-dimensional comprehensive feature vector $v$ obtained by multi-scale fusion into a metric space for classification and loss calculation, and introduce prototype-aware contrastive learning. First, a learnable category prototype matrix $P \in \mathbb{R}^{N_c \times D}$ is initialized at the end of the multi-stage pyramid network, where $N_c$ is the total number of bearing fault categories; each row vector $p_k$ of the category prototype matrix represents the feature center of the corresponding fault category, i.e., the category prototype. To compute cosine similarity in a unified metric space, the input feature vector $v$ and each category prototype $p_k$ of the category prototype matrix are separately $L_2$-norm normalized, yielding the normalized feature $\hat{v}$ and the normalized prototype $\hat{p}_k$:
$$\hat{v} = \frac{v}{\lVert v \rVert_2}, \quad \hat{p}_k = \frac{p_k}{\lVert p_k \rVert_2}$$
Step 6.2: Calculate the scaled inner product of the normalized feature $\hat{v}$ with all category prototypes to obtain the logits used for classification prediction, and combine them with the true labels to calculate the smoothed cross-entropy loss $\mathcal{L}_{CE}$;
Step 6.3: To force samples to converge toward the prototype center of their own category in the feature space, a prototype contrastive loss $\mathcal{L}_{proto}$ is further constructed, calculated as:
$$\mathcal{L}_{proto} = \frac{1}{B} \sum_{i=1}^{B} \left( 1 - \hat{v}_i^{\top} \hat{p}_{y_i} \right)$$
where $B$ is the size of the current training batch; $\hat{v}_i$ is the normalized feature of the $i$-th sample and $y_i$ is its true class label, so that $\hat{p}_{y_i}$ is the prototype vector of its true category; and $\hat{v}_i^{\top} \hat{p}_{y_i}$ is the cosine similarity between the sample feature and its true prototype. The final joint optimization total loss $\mathcal{L}$ of the network model is a weighted sum of the cross-entropy loss and the prototype contrastive loss:
$$\mathcal{L} = \mathcal{L}_{CE} + \lambda \, \mathcal{L}_{proto}$$
where $\lambda$ is the hyperparameter used to balance the two losses;
Step 6.4: During training, continuously minimize the total loss $\mathcal{L}$ using the gradient descent algorithm, updating the network weights until the model converges, and save the optimal parameters.
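The loss construction of Step 6 can be sketched as below. Every numeric choice is an assumption made for illustration: the logit scale of 10, the label-smoothing factor of 0.1, the balance weight of 0.1, and the specific "one minus cosine similarity" form of the prototype term. The function names are hypothetical, and the prototypes are fixed lists where the method learns them.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm (eps avoids division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def cosine(u, v):
    """Cosine similarity via the inner product of normalized vectors."""
    return sum(a * b for a, b in zip(l2_normalize(u), l2_normalize(v)))


def scaled_logits(feature, prototypes, scale=10.0):
    """Scaled cosine similarity of one feature to every class prototype."""
    return [scale * cosine(feature, p) for p in prototypes]


def smoothed_cross_entropy(logits, label, smoothing=0.1):
    """Cross-entropy against a label-smoothed target distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    k = len(logits)
    target = [smoothing / k + (1.0 - smoothing) * (i == label) for i in range(k)]
    return -sum(t * (v - log_z) for t, v in zip(target, logits))


def prototype_loss(features, labels, prototypes):
    """Batch mean of (1 - cosine similarity to the true-class prototype)."""
    total = sum(1.0 - cosine(f, prototypes[y]) for f, y in zip(features, labels))
    return total / len(features)


def joint_loss(features, labels, prototypes, lam=0.1):
    """Mean smoothed cross-entropy plus lam times the prototype term."""
    ce = sum(
        smoothed_cross_entropy(scaled_logits(f, prototypes), y)
        for f, y in zip(features, labels)
    ) / len(features)
    return ce + lam * prototype_loss(features, labels, prototypes)


# Toy batch: two 2-D features, two class prototypes.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
loss = joint_loss([[0.9, 0.1], [0.2, 0.8]], [0, 1], prototypes)
```

The prototype term is zero only when every feature points in exactly the same direction as its class prototype, so minimizing it pulls samples of one class toward a common center.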