A coordinate attention-guided class-incremental fault diagnosis method for engine bearings

By employing a coordinate attention mechanism and an information retention principle in example selection, along with a two-level knowledge distillation technique, the problems of catastrophic forgetting and new category prediction bias in bearing fault diagnosis are solved. This enables accurate identification and continuous learning of both old and new fault categories, and is applicable to the entire lifecycle fault diagnosis of vehicle transmission systems and industrial machinery.

CN122241449A · Pending · Publication Date: 2026-06-19 · NANJING UNIV OF AERONAUTICS & ASTRONAUTICS

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NANJING UNIV OF AERONAUTICS & ASTRONAUTICS
Filing Date
2026-01-27
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing incremental learning methods for bearing fault diagnosis suffer from problems such as catastrophic forgetting, unreasonable example selection, incomplete feature distillation, and prediction bias of new categories, resulting in an imbalance in the diagnostic accuracy of new and old fault categories and failing to achieve continuous learning of fault categories.

Method used

A coordinate attention mechanism is adopted to enhance the capture of fault-sensitive features. Combined with example selection based on the information preservation principle and two-level knowledge distillation technology, an initial fault diagnosis model is constructed through a deep residual network. In multi-stage incremental training, example sets are selected and distillation training of features and prediction results is carried out to ensure the recognition and learning of new and old categories.

Benefits of technology

It effectively mitigates catastrophic forgetting, improves the discrimination between fault categories, and enables accurate identification and continuous learning of new and old fault categories, adapting to the dynamic changes in bearing fault diagnosis scenarios.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a coordinate attention-guided incremental fault diagnosis method for engine bearings, comprising the following steps: First, vibration signals of bearings under different health conditions are collected, preprocessed, and a balanced labeled dataset is constructed. An initial fault diagnosis model embedded with a coordinate attention mechanism is then trained. When new category fault data appears, an example set is selected from the old category samples based on the information retention principle. Subsequently, the example set and the new category data are merged to construct an incremental training set. The model is then trained and updated through coordinate attention feature distillation and prediction result similarity distillation. After training, the model can be used to accurately diagnose bearing data containing both new and old fault categories. This invention effectively mitigates catastrophic forgetting, balances the ability to identify both new and old fault categories, and has significant advantages in scenarios where fault categories dynamically increase.

Description

Technical Field

[0001] This invention belongs to the field of intelligent fault diagnosis technology and relates to intelligent fault diagnosis of engine bearings in class-incremental scenarios, specifically to a coordinate attention-guided class-incremental fault diagnosis method for engine bearings.

Background Technology

[0002] Bearings are core components of mechanical equipment such as transmission systems, playing crucial roles in supporting rotating parts, transmitting loads, and reducing friction. Their operating condition directly determines the reliability and operational safety of the equipment. In actual working conditions, bearings are subjected to complex conditions such as alternating loads and high-speed operation for extended periods, resulting in diverse and random fault types. New fault modes constantly emerge as the equipment continues to operate. Failure to identify new faults in a timely manner and continuously update the diagnostic model may lead to the escalation of equipment failures, downtime, and even safety accidents, causing serious economic losses and safety hazards. Therefore, achieving accurate diagnosis in scenarios with dynamically increasing fault categories is of great significance for ensuring the continuous and stable operation of equipment, reducing maintenance costs, and improving safety assurance capabilities.

[0003] In recent years, deep learning technology has been widely used in bearing fault diagnosis due to its powerful capabilities in automatic feature extraction and complex pattern recognition. By constructing deep neural network models, fault features can be learned from massive vibration signals, enabling efficient identification of known fault categories. However, traditional deep learning models are mostly trained on static datasets, assuming that the training and test sets contain fixed fault categories. When faced with new fault categories, the model suffers from catastrophic forgetting, which means that in the process of learning new category knowledge, the ability to identify the original fault categories is rapidly lost.

[0004] To address this issue, the concept of incremental learning was proposed. Its core principle is to enable models to continuously learn new category knowledge from dynamic data streams while retaining the ability to identify old categories, thus making lifelong learning of fault diagnosis knowledge possible. However, existing incremental learning methods still have many shortcomings in bearing fault diagnosis scenarios. First, example selection methods fail to balance the completeness of the old category sample distribution with the selection of easily forgotten samples, resulting in insufficient review of old knowledge and an inability to effectively mitigate catastrophic forgetting. Second, knowledge distillation is limited to constraints at the prediction result level, neglecting consistency at the feature representation level and failing to consider the need to capture fault-sensitive features in fault diagnosis. Third, traditional methods are prone to prediction bias towards new categories during incremental training, leading to an imbalance in diagnostic accuracy between new and old categories and degrading overall diagnostic performance.

[0005] Therefore, given the practical need to handle dynamically increasing fault categories in bearing fault diagnosis, and given that existing incremental learning methods suffer from catastrophic forgetting, unreasonable example selection, incomplete feature distillation, and prediction bias towards new categories, there is an urgent need for an incremental intelligent diagnosis method suited to the fault diagnosis scenario, so as to achieve accurate identification and continuous learning of new and old fault categories.

Summary of the Invention

[0006] Purpose of the invention: To address the shortcomings of the aforementioned background technologies, this invention provides a coordinate attention-guided incremental fault diagnosis method for engine bearings. By embedding a coordinate attention mechanism to enhance the capture of fault-sensitive features, and combining example selection based on the information retention principle with a two-level knowledge distillation technique, the method achieves efficient learning of new fault categories and knowledge retention of old fault categories. This effectively solves problems such as catastrophic forgetting, new category prediction bias, and insufficient fault feature capture in incremental scenarios, achieving accurate identification and continuous learning of both new and old fault categories.

[0007] Technical solution: The coordinate attention-guided incremental fault diagnosis method for engine bearings described in this invention includes the following steps:

[0008] (1) Preprocess the vibration signals of the bearing under different states obtained in advance to construct a balanced label dataset;

[0009] (2) Using the deep residual network ResNet50 as the backbone architecture, construct and train an initial fault diagnosis model with an embedded coordinate attention mechanism;

[0010] (3) Design a multi-stage incremental training task to simulate the actual scenario of dynamic increase in fault categories and construct a new category dataset;

[0011] (4) When fault data of a new category appears, select an example set from the old category samples based on the principle of information retention;

[0012] (5) Merge the example set and the new category data to construct an incremental training set, and train and update the model through coordinate attention feature distillation and prediction result similarity distillation;

[0013] (6) Use the trained model to diagnose the bearing data containing both new and old fault categories.

[0014] Furthermore, the vibration signals of the bearing in different states described in step (1) include vibration signals of normal state, inner ring minor fault, inner ring medium fault, inner ring deep fault, outer ring minor fault, outer ring medium fault, outer ring deep fault, rolling element minor fault, rolling element medium fault, and rolling element deep fault.

[0015] Furthermore, the implementation process of step (1) is as follows:

[0016] A bandpass filter is used to remove environmental noise and low-frequency interference. Then, the synchrosqueezing transform (SST) is used to reconstruct the time-frequency features of the denoised vibration signal, transforming the one-dimensional time-domain vibration signal into a two-dimensional time-frequency image. Finally, the time-frequency image is normalized to map the pixel values to the [0,1] interval. The preprocessed time-frequency images are divided proportionally into training and test sets to construct a balanced labeled dataset.

[0017] Further, the initial fault diagnosis model in step (2) includes an input layer, a convolutional feature extraction module, a coordinate attention module, a fully connected layer, and a Softmax classifier; wherein:

[0018] Input layer: Receives normalized time-frequency images;

[0019] Convolutional feature extraction module: includes 5 convolutional stages, each stage including a convolutional layer, a batch normalization layer, a ReLU activation function and a max pooling layer; the first convolutional stage uses a 7×7 convolutional kernel for initial feature extraction, and subsequent convolutional stages use a 3×3 small convolutional kernel for deep feature mining, and residual connections are used to alleviate the gradient vanishing problem.

[0020] Coordinate attention module: Embedded after each convolutional stage, it performs global average pooling on the feature map along the vertical and horizontal spatial dimensions to generate position-sensitive one-dimensional feature vectors. The module comprises horizontal feature aggregation, vertical feature aggregation, and attention weight generation (the corresponding formulas are not reproduced in this text). In those formulas, two quantities are the results of encoding the c-th channel along the two directions, two tensors result from splitting the intermediate feature map, two distinct convolution operations restore the channel dimension, and the sigmoid activation function yields the two directional attention weights that form the output;

[0021] Subsequently, the feature vector is subjected to dimensionality reduction and dimensionality increase transformation through two fully connected layers, and channel attention weights are generated by the Sigmoid activation function. Finally, the attention weights are multiplied element-wise with the original feature map to enhance fault-sensitive features and suppress irrelevant background information.
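The roles described in [0020] match the widely used coordinate attention formulation. As a hedged sketch only, assuming the module follows that formulation, the missing expressions may take the form below. Every symbol here is an assumption chosen to fit the roles named in the text: x_c is channel c of an H×W feature map, z^h and z^w are the directional encodings, f^h and f^w are the two parts of the split intermediate map f, F_1, F_h, and F_w are 1×1 convolutions, δ is a non-linear activation, and σ is the sigmoid.

```latex
z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i), \qquad
z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w)

f = \delta\left(F_1\left([z^h, z^w]\right)\right), \qquad
g^h = \sigma\left(F_h(f^h)\right), \qquad
g^w = \sigma\left(F_w(f^w)\right)

y_c(i, j) = x_c(i, j)\, g_c^h(i)\, g_c^w(j)
```

Under this reading, g^h and g^w are the two attention vectors that the later feature distillation step compares between the teacher and student models.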

[0022] Fully connected layer and classifier: The high-dimensional feature map output by the convolutional feature extraction module is flattened into a one-dimensional feature vector, and the features are fused through two fully connected layers. Finally, the probability distribution of different bearing states is output by the Softmax classifier.

[0023] Furthermore, the training process for the initial fault diagnosis model described in step (2) is as follows:

[0024] Set model training parameters: Use stochastic gradient descent as the optimizer and set model parameters such as learning rate, batch size, weight decay coefficient and number of training iterations.

[0025] The cross-entropy loss function is used as the optimization objective for model training to measure the difference between the predicted label and the true label. An early stopping strategy is adopted during training. When the accuracy of the test set does not improve for 10 consecutive iterations, training is stopped and the current best model, i.e., the initial fault diagnosis model, is saved.
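The early stopping rule above can be sketched as follows. This is a minimal illustration, not the patent's training code: `evaluate_epoch` is a hypothetical callable that runs one training iteration and returns the held-out accuracy, and the place where a checkpoint would be saved is only marked by a comment.

```python
def train_with_early_stopping(evaluate_epoch, max_epochs, patience=10):
    """Stop once accuracy has not improved for `patience` consecutive epochs.

    evaluate_epoch(epoch) is a hypothetical callable: it trains for one
    epoch and returns the accuracy used for the stopping decision.
    Returns (best_epoch, best_accuracy).
    """
    best_acc = float("-inf")
    best_epoch = -1
    stale = 0  # epochs since the last improvement
    for epoch in range(max_epochs):
        acc = evaluate_epoch(epoch)
        if acc > best_acc:
            # A real implementation would save the model checkpoint here.
            best_acc, best_epoch, stale = acc, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, best_acc
```

The model kept at the end is the one from `best_epoch`, which corresponds to the "current best model" saved as the initial fault diagnosis model.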

[0026] Furthermore, the implementation process of step (3) is as follows:

[0027] Design a 6-stage incremental training task to simulate the real-world scenario of dynamically increasing fault categories: Stage 1 training samples are the normal state and the three types of minor faults; Stage 2 adds the medium inner ring fault; Stage 3 adds the medium outer ring fault; Stage 4 adds the medium rolling element fault; Stage 5 adds the deep inner ring fault; Stage 6 adds the deep outer ring fault and the deep rolling element fault, covering a total of 10 states. Construct the new category datasets, each of which includes a training set and a test set.

[0028] Furthermore, the implementation process of step (4) is as follows:

[0029] Calculate the similarity between the feature vectors of old category samples and the average feature vectors of the categories to select core samples that cover the sample distribution; compare the difference in prediction loss for old samples before and after the model update to select samples that are easily forgotten.

[0030] Calculate the adaptive weighting coefficient η from the total number of samples after the increment, the number of samples in the new category, and the sample size of the example set (the formula itself is not reproduced in this text);

[0031] Select the K old-category samples with the highest information content to form the example set. The information content index (formula not reproduced in this text) is built from the average feature vector of category y, the sample feature vector, and the prediction loss difference. Sort the samples by this index from largest to smallest, select samples from each old category to form the example set, and ensure that at least one sample is retained for each old category.

[0032] Furthermore, the implementation process of step (5) is as follows:

[0033] Input all samples from the old categories into the student model and the teacher model, and extract the attention vectors output by the coordinate attention modules of the two models respectively. Calculate the L1 norm loss between the attention vectors of the teacher and student models (formula not reproduced in this text), which consists of a horizontal attention vector loss and a vertical attention vector loss. This loss is used as a constraint to train the student model to learn the attention allocation pattern of the teacher model, ensuring that the new and old models capture fault-sensitive features consistently.

[0034] Cosine similarity is used to measure the difference in prediction results between the teacher and student models, and a prediction distillation loss function is constructed (formula not reproduced in this text). Its terms include the parameters of the teacher model and of the student model, and a label matching indicator function that returns 1 if the labels match and 0 otherwise.

[0035] Construct a composite loss function (formula not reproduced in this text) that combines the cross-entropy loss, the feature distillation loss, and the prediction distillation loss. The model parameters are optimized using the backpropagation algorithm. When the number of iterations reaches a set value, training is stopped and the optimal model for the current incremental stage is saved.

[0036] Beneficial effects: Compared with the prior art, the beneficial effects of the present invention are as follows:

[0037] 1. This invention combines coordinate attention mechanism with deep residual network to aggregate features along the horizontal and vertical directions, accurately capture location-sensitive information and fault-sensitive features, significantly improve the fault category discrimination, and lay the foundation for the identification of new and old categories;

[0038] 2. The example selection method of this invention, based on the principle of information retention, takes into account both the integrity of sample distribution and the screening of easily forgotten samples. Combined with two-level knowledge distillation, it effectively alleviates catastrophic forgetting.

[0039] 3. This invention reduces the prediction bias for new categories during incremental training by adjusting adaptive weight coefficients and optimizing composite loss functions, thereby achieving balanced identification of both new and old fault categories.

[0040] 4. This invention is adapted to bearing fault diagnosis scenarios and can achieve continuous learning of fault categories without rebuilding the model. It has strong generalization ability and can be widely used in whole-life-cycle fault diagnosis of bearings in vehicle transmission systems, industrial machinery and equipment, etc.

Description of the Drawings

[0041] Figure 1 is a flowchart of the present invention;

[0042] Figure 2 is a photograph of the bearing fault simulation device used in this invention;

[0043] Figure 3 shows the bearing vibration signals collected in this invention;

[0044] Figure 4 is a two-dimensional time-frequency image reconstructed from the time-frequency features of the denoised vibration signal using the synchrosqueezing transform in this invention;

[0045] Figure 5 is a line graph showing the bearing fault diagnosis accuracy of the method of this invention.

Detailed Implementation

[0046] The present invention will now be described in further detail with reference to the accompanying drawings.

[0047] As shown in Figure 1, this invention proposes a coordinate attention-guided class-incremental fault diagnosis method for engine bearings, which specifically includes the following process:

[0048] Step 1: Test platform setup and vibration signal acquisition.

[0049] A bearing fault simulation test platform was constructed. The platform mainly consists of a bearing fault simulation test bench, a Leuven Measurement Systems (LMS) data acquisition system, and a control cabinet. The test bench primarily comprises a base plate, a drive motor, a bearing housing, a cylindrical outer shell, and an outer shell support, as shown in Figure 2. The sampling frequency of the LMS data acquisition instrument was set to 25.6 kHz, and each acquisition lasted 20 seconds to ensure that the signal contained complete fault characteristic information. Time-domain plots of some of the acquired bearing vibration signals are shown in Figure 3. As listed in Table 1, vibration signals were collected for 10 bearing states: normal state (C0), minor inner ring fault (C1, 0.2 mm damage), medium inner ring fault (C2, 0.6 mm damage), deep inner ring fault (C3, 1.2 mm damage), minor outer ring fault (C4, 0.2 mm damage), medium outer ring fault (C5, 0.6 mm damage), deep outer ring fault (C6, 1.2 mm damage), minor rolling element fault (C7, 0.2 mm damage), medium rolling element fault (C8, 0.6 mm damage), and deep rolling element fault (C9, 1.2 mm damage). 200 samples were collected for each state.

[0050] Table 1 Fault Types and Label Descriptions

[0051]

[0052] Step 2: Data preprocessing and balanced label dataset construction.

[0053] The acquired raw vibration signal was preprocessed as follows. First, a bandpass filter was used to remove environmental noise and low-frequency interference. Then, the synchrosqueezing transform (SST) was used to reconstruct the time-frequency features of the denoised vibration signal, transforming the one-dimensional time-domain vibration signal into a two-dimensional time-frequency image, as shown in Figure 4. This transformation preserves the local time-frequency features of the signal and makes fault features easier to identify. Finally, the time-frequency image is normalized to map pixel values to the [0,1] interval, reducing the impact of data scale differences on model training. The preprocessed time-frequency images are divided proportionally into training and test sets for initial model training.
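The last two preprocessing steps (normalization to [0,1] and a proportional, class-balanced split) can be sketched as below. This is an illustration under stated assumptions: min-max scaling is one common way to map pixel values to [0,1], and the 0.8 training fraction and the function names are hypothetical, since the text only says "proportionally". The filtering and time-frequency transform steps are not shown.

```python
import numpy as np

def min_max_normalize(image):
    """Map pixel values linearly onto [0, 1]; a constant image maps to zeros."""
    image = np.asarray(image, dtype=np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)

def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_idx, test_idx) with the same train fraction per class.

    train_fraction is an assumed value; the source does not state the ratio.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        cut = int(round(train_fraction * len(idx)))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return np.sort(np.array(train_idx)), np.sort(np.array(test_idx))
```

Splitting within each class keeps the dataset balanced: with 200 samples per state and a 0.8 fraction, every state contributes 160 training and 40 test images.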

[0054] Step 3: Initial fault diagnosis model construction and training.

[0055] A bearing fault diagnosis initial model is constructed based on a deep residual network (ResNet50) as the backbone architecture and embedding a coordinate attention mechanism. The model mainly consists of an input layer, a convolutional feature extraction module, a coordinate attention module, a fully connected layer, and a Softmax classifier.

[0056] Input layer: Receives standardized time-frequency images, adapted to the ResNet50 architecture.

[0057] Convolutional feature extraction module: It contains 5 convolutional stages. Each stage consists of a convolutional layer, a batch normalization layer, a ReLU activation function and a max pooling layer. The first convolutional stage uses a 7×7 convolutional kernel for initial feature extraction. Subsequent convolutional stages use 3×3 small convolutional kernels for deep feature mining. Residual connections are used to alleviate the gradient vanishing problem and improve the stability of model training.

[0058] Coordinate attention module: Embedded after each convolutional stage, it performs global average pooling on the feature map along both the vertical and horizontal spatial dimensions to generate a one-dimensional feature vector with position sensitivity.

[0059] The vector is then subjected to dimensionality reduction and expansion transformations via two fully connected layers, and channel attention weights are generated using a sigmoid activation function. Finally, the attention weights are multiplied element-wise with the original feature map to enhance fault-sensitive features and suppress irrelevant background information.
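A forward pass of a coordinate attention block can be sketched in NumPy as follows. This is a simplified illustration assuming the standard coordinate attention layout, not the patent's exact module: the 1×1 convolutions are written as plain matrix products, batch normalization is omitted, ReLU stands in for the intermediate activation, and the weight names `w_reduce`, `w_h`, and `w_w` are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def coordinate_attention(x, w_reduce, w_h, w_w):
    """Apply coordinate attention to one feature map.

    x:        (C, H, W) feature map
    w_reduce: (C_r, C) shared 1x1 conv that reduces the channel dimension
    w_h, w_w: (C, C_r) 1x1 convs that restore the channel dimension
    Returns the reweighted map and the two directional attention vectors.
    """
    c, h, w = x.shape
    z_h = x.mean(axis=2)                      # (C, H): pool along the width
    z_w = x.mean(axis=1)                      # (C, W): pool along the height
    z = np.concatenate([z_h, z_w], axis=1)    # (C, H + W)
    f = np.maximum(w_reduce @ z, 0.0)         # (C_r, H + W), ReLU activation
    f_h, f_w = f[:, :h], f[:, h:]             # split back into two directions
    g_h = sigmoid(w_h @ f_h)                  # (C, H) attention along height
    g_w = sigmoid(w_w @ f_w)                  # (C, W) attention along width
    y = x * g_h[:, :, None] * g_w[:, None, :] # element-wise reweighting
    return y, g_h, g_w
```

Because each attention vector keeps one spatial axis, the block can emphasise specific rows and columns of the time-frequency image rather than only whole channels.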

[0060] Fully connected layer and classifier: The high-dimensional feature map output by the convolutional feature extraction module is flattened into a one-dimensional feature vector, and the feature is fused through two fully connected layers. Finally, the probability distribution of 10 states is output by the Softmax classifier.

[0061] Set model training parameters: Use stochastic gradient descent as the optimizer and set model parameters such as learning rate, batch size, weight decay coefficient and number of training iterations.

[0062] The cross-entropy loss function is used as the optimization objective for model training, measuring the difference between the predicted label and the true label. An early stopping strategy is employed during training: when the test set accuracy shows no improvement for 10 consecutive iterations, training is stopped and the current optimal model, i.e., the initial fault diagnosis model, is saved.

[0063] Step 4: Incremental task design and new category data preparation.

[0064] A six-stage incremental training task is designed to simulate a real-world scenario where fault categories dynamically increase: Stage 1 (initial stage) uses C0, C1, C4, and C7 (the normal state and the three types of minor faults); Stage 2 adds C2 (medium inner ring fault); Stage 3 adds C5 (medium outer ring fault); Stage 4 adds C8 (medium rolling element fault); Stage 5 adds C3 (deep inner ring fault); Stage 6 adds C6 (deep outer ring fault) and C9 (deep rolling element fault), covering a total of 10 states. For each fault category newly added in an incremental stage, vibration signals are acquired according to the acquisition standards in step 1 and converted into standardized time-frequency images through the preprocessing process in step 2 to construct a new category dataset. Each new category dataset includes a training set and a test set.
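The stage schedule can be written down as a small lookup table. The labels C0 to C9 and the stage contents come from the text; the names `STAGE_NEW_CLASSES` and `known_classes` are illustrative.

```python
# New classes introduced at each incremental stage, as listed in the text.
STAGE_NEW_CLASSES = {
    1: ["C0", "C1", "C4", "C7"],  # normal state and the three minor faults
    2: ["C2"],                    # medium inner ring fault
    3: ["C5"],                    # medium outer ring fault
    4: ["C8"],                    # medium rolling element fault
    5: ["C3"],                    # deep inner ring fault
    6: ["C6", "C9"],              # deep outer ring and deep rolling element faults
}

def known_classes(stage):
    """All classes the model should recognise after finishing `stage`."""
    seen = []
    for s in range(1, stage + 1):
        seen.extend(STAGE_NEW_CLASSES[s])
    return seen
```

At stage k, the classes in `known_classes(k - 1)` are the old categories from which the example set is drawn, and `STAGE_NEW_CLASSES[k]` are the new categories.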

[0065] Step 5: Selection of an example set based on the principle of information retention.

[0066] When entering a certain incremental phase, the example set selection process is initiated to select the samples with the highest information content from the old class samples that have been trained to form the example set. The specific process is as follows:

[0067] Step 5.1: Feature vector extraction and class center calculation.

[0068] Input all samples from the old categories into the initial fault diagnosis model (or the incremental model trained in the previous stage), and extract the one-dimensional feature vector output by the fully connected layer. Calculate the average feature vector (category center) of each old category using the formula $\mu_y = \frac{1}{n}\sum_{i=1}^{n}\varphi(x_i)$, where $\mu_y$ is the average feature vector of category y, $\varphi(x_i)$ is the feature vector output by the model feature extractor for the i-th sample of category y, and n is the number of samples of category y.

[0069] Step 5.2: Core sample screening.

[0070] Calculate the Euclidean distance between the feature vector of each old category sample and the category center, sort them in ascending order of distance, and select the top 30% of samples as core samples to ensure that the example set covers the core distribution of old category samples.
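Steps 5.1 and 5.2 can be sketched together: compute each class center as the mean feature vector, then keep the 30% of samples closest to it. The function name and the rounding rule for the 30% count are assumptions; the features are taken as already extracted.

```python
import numpy as np

def select_core_samples(features, labels, core_fraction=0.3):
    """Per class, return indices of the samples closest to the class center.

    features: (N, D) feature vectors from the model's fully connected layer
    labels:   (N,) class labels of the old-category samples
    Returns {class: indices sorted by ascending distance to the center}.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    core = {}
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        center = features[idx].mean(axis=0)                   # class center
        dist = np.linalg.norm(features[idx] - center, axis=1) # Euclidean distance
        keep = max(1, int(round(core_fraction * len(idx))))   # top 30%, at least 1
        core[cls.item()] = idx[np.argsort(dist, kind="stable")[:keep]]
    return core
```

Samples near the center are the most typical members of a class, which is why they are used to cover the core of the old-category distribution.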

[0071] Step 5.3: Screening for Easily Forgotten Samples.

[0072] Input the old class samples into the model trained in the previous stage and into the pseudo-updated model fine-tuned only with the new class samples, and calculate, for each sample, the difference between the prediction losses of the two models (the original-model prediction loss and the pseudo-updated-model prediction loss). Sort the samples by this difference from largest to smallest and select the top 20% as easily forgotten samples, to strengthen the retention of knowledge that is easily lost.
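The screening in step 5.3 can be sketched as below. The sign convention is an assumption: the difference is taken as the pseudo-updated loss minus the original loss, so that samples whose loss grows the most after the pseudo-update rank first. The per-sample losses are taken as given inputs.

```python
def select_forgettable(loss_before, loss_after, fraction=0.2):
    """Rank samples by loss increase and keep the top `fraction`.

    loss_before: per-sample loss under the previous-stage model
    loss_after:  per-sample loss under the pseudo-updated model
    Returns (selected_indices, loss_differences).
    """
    delta = [after - before for before, after in zip(loss_before, loss_after)]
    keep = max(1, round(fraction * len(delta)))
    order = sorted(range(len(delta)), key=lambda i: delta[i], reverse=True)
    return order[:keep], delta
```

A large increase means the sample was handled well before the update and poorly after it, which is the signature of knowledge that fine-tuning on new classes tends to erase.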

[0073] Step 5.4: Adaptive weight calculation and example set determination.

[0074] Calculate the adaptive weight coefficient from the total number of samples after the increment, the number of samples in the new category, and the number of samples in the old category (the formula is not reproduced in this text); the coefficient balances the selection weights of core samples and easily forgotten samples. Construct a comprehensive information content evaluation index (formula likewise not reproduced) from the distance between the sample features and the class center and from the prediction loss difference. Sort the samples by this index from largest to smallest, select samples from each old category to form the example set, and ensure that at least one sample is retained for each old category.
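The final selection step can be sketched independently of the exact index formula, which is not available here. The sketch below takes a precomputed information score per sample as input and assumes an equal per-class quota derived from the budget K; both the quota rule and the function name are assumptions, and only the "at least one sample per old category" constraint comes from the text.

```python
from collections import defaultdict

def build_example_set(scores, labels, budget):
    """Pick the highest-scoring samples per class under a total budget.

    scores: per-sample information content (higher is more informative);
            how this score is computed is outside this sketch
    labels: per-sample old-category labels
    budget: target example set size K, split equally across classes here
    """
    by_class = defaultdict(list)
    for i, (s, y) in enumerate(zip(scores, labels)):
        by_class[y].append((s, i))
    quota = max(1, budget // len(by_class))  # never drop a class entirely
    chosen = []
    for y in sorted(by_class):
        ranked = sorted(by_class[y], key=lambda t: t[0], reverse=True)
        chosen.extend(i for _, i in ranked[:quota])
    return sorted(chosen)
```

Flooring the quota at one keeps every old category represented even when the budget is smaller than the number of classes.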

[0075] Step 6: Incremental training using two-level knowledge distillation.

[0076] Merge the example set and the new category training set to construct the class-increment training set for the current incremental stage; use a teacher-student network architecture for two-level knowledge distillation training, where the teacher model is the incremental model trained in the previous stage (the teacher model in the initial stage is the initial model trained in step 3), and the student model is a newly constructed network with an architecture completely identical to the teacher model. The specific training process is as follows:

[0077] Step 6.1: Coordinate attention feature distillation training.

[0078] Input all samples from the old categories into the student model and the teacher model, and extract the attention vectors (horizontal and vertical directions) output by the coordinate attention modules of the two models respectively. Calculate the L1 norm loss between the attention vectors of the teacher and student models (formula not reproduced in this text), which consists of a horizontal attention vector loss and a vertical attention vector loss. This loss is used as a constraint to train the student model to learn the attention allocation pattern of the teacher model, ensuring that the old and new models capture fault-sensitive features consistently.
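The feature distillation term can be sketched as an L1 distance between teacher and student attention vectors. Averaging over elements and adding the two directional terms with equal weight are assumptions; the text only states that an L1 norm loss with a horizontal and a vertical part is used.

```python
import numpy as np

def attention_distillation_loss(teacher_h, teacher_w, student_h, student_w):
    """L1 distance between teacher and student coordinate attention vectors.

    Each argument is an array of attention weights for one direction.
    The two directional terms are summed with equal weight (an assumption).
    """
    loss_h = np.abs(np.asarray(teacher_h) - np.asarray(student_h)).mean()
    loss_w = np.abs(np.asarray(teacher_w) - np.asarray(student_w)).mean()
    return float(loss_h + loss_w)
```

The loss is zero only when the student attends to the same positions as the teacher in both directions.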

[0079] Step 6.2: Training based on similarity distillation of prediction results.

[0080] Obtain the prediction output probability distributions of the teacher model and the student model on the incremental training set. Use cosine similarity to measure the difference in prediction results between the teacher and student models, and construct a prediction distillation loss function (formula not reproduced in this text) whose terms include the parameters of the teacher model and of the student model and a label matching indicator function that equals 1 if the labels match and 0 otherwise. This loss constraint reduces the prediction bias of the student model towards new categories and improves the balance between the diagnosis of new and old categories.
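One plausible shape for a cosine-similarity prediction distillation loss is sketched below. Since the formula is not available, several choices are assumptions: the loss is 1 minus the cosine similarity, the label matching indicator is supplied by the caller as a 0/1 mask, and the student's outputs are assumed to be restricted to the classes the teacher already knows so that both vectors have the same length.

```python
import numpy as np

def prediction_distillation_loss(teacher_probs, student_probs, match_mask):
    """Mean (1 - cosine similarity) over the samples selected by the mask.

    teacher_probs, student_probs: (N, K) outputs over the same K classes
    match_mask: (N,) 0/1 indicator; only samples with 1 contribute
    """
    t = np.asarray(teacher_probs, dtype=np.float64)
    s = np.asarray(student_probs, dtype=np.float64)
    m = np.asarray(match_mask, dtype=np.float64)
    norms = np.linalg.norm(t, axis=1) * np.linalg.norm(s, axis=1) + 1e-12
    cos = (t * s).sum(axis=1) / norms
    # Guard against an all-zero mask by dividing by at least 1.
    return float(((1.0 - cos) * m).sum() / max(m.sum(), 1.0))
```

Cosine similarity compares the direction of the two output vectors rather than their magnitude, so the student is pushed to preserve the teacher's relative ranking of the old classes.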

[0081] Step 6.3: Optimize training using the composite loss function.

[0082] Construct a composite loss function (formula not reproduced in this text) that combines the cross-entropy loss, the feature distillation loss, and the prediction distillation loss. The student model is trained using the same training parameters as in step 3, and the model parameters are optimized using the backpropagation algorithm. When the number of iterations reaches a set value, training is stopped and the optimal model for the current incremental stage is saved.
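A weighted sum is the simplest way to combine the three terms, and the text elsewhere mentions adjustable weight coefficients for the composite loss. The sketch below assumes that form; the weight names `lambda_feat` and `lambda_pred` and their default values are hypothetical.

```python
def composite_loss(ce_loss, feature_kd_loss, prediction_kd_loss,
                   lambda_feat=1.0, lambda_pred=1.0):
    """Cross-entropy plus weighted feature and prediction distillation terms.

    lambda_feat and lambda_pred are assumed trade-off weights; the source
    does not give their values.
    """
    return ce_loss + lambda_feat * feature_kd_loss + lambda_pred * prediction_kd_loss
```

Raising the distillation weights favours retention of old categories, while lowering them gives the cross-entropy term more room to fit the new categories.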

[0083] Step 7: Model Performance Validation and Iterative Updates

[0084] Input the test set of the current incremental stage into the trained incremental model, and calculate the model's average incremental accuracy and average forgetting rate. The average incremental accuracy is the average diagnostic accuracy of the model for all known categories, and the average forgetting rate is the average decrease in the diagnostic accuracy of the model for each old category compared to the previous stage.
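The two metrics can be sketched directly from their definitions above, taking per-class accuracies as input. The dictionary-based interface is an illustrative choice.

```python
def average_accuracy(per_class_acc):
    """Mean diagnostic accuracy over all currently known categories."""
    return sum(per_class_acc.values()) / len(per_class_acc)

def average_forgetting(prev_acc, curr_acc):
    """Mean accuracy drop on old categories relative to the previous stage.

    prev_acc: {class: accuracy} after the previous stage (old categories)
    curr_acc: {class: accuracy} after the current stage (old + new)
    """
    drops = [prev_acc[c] - curr_acc[c] for c in prev_acc]
    return sum(drops) / len(drops)
```

For example, if C0 falls from 1.0 to 0.75 and C1 stays at 0.5 while a new class C2 reaches 1.0, the average accuracy is 0.75 and the average forgetting rate is 0.125.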

[0085] If the average incremental accuracy no longer increases, the current stage of incremental training is complete; if the above indicators are not met, adjust the example set selection ratio or the weight coefficient of the composite loss function, and repeat the training process of steps 5-6 until the model performance meets the standards.

[0086] Step 8: Application of bearing fault diagnosis

[0087] The vibration signal of the bearing to be diagnosed is converted into a standardized time-frequency image according to the preprocessing process in step 2. The time-frequency image is input into the finally trained incremental diagnostic model. The model completes the adaptive extraction and enhancement of fault features through the convolutional feature extraction module and the coordinate attention module. After feature fusion by the fully connected layer, the Softmax classifier outputs the probability distribution of fault categories, realizing accurate diagnosis of bearing faults including both new and old fault categories.
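The final classification step can be sketched as a softmax over the model's output scores followed by picking the most probable class. The network itself is not shown; `logits` stands for the scores produced by the last fully connected layer, and the function names are illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def diagnose(logits, class_names):
    """Return the most probable fault category and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]
```

After the sixth stage, `class_names` would hold all ten labels C0 to C9, so a single forward pass covers both old and new fault categories.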

[0088] Incremental fault diagnosis simulation experiments were conducted using both the trained model and a model trained with only the deep residual network structure. The results are shown in Figure 5. As the number of fault categories increases, the accuracy of the method proposed in this invention (solid black line) remains above 85% and declines only gently, demonstrating strong stability. In contrast, the accuracy of the model trained with only the deep residual network (dashed gray line) drops rapidly from approximately 95% initially; when the number of categories reaches 10, its accuracy is only about 10%. The simulation experiments show that the proposed method retains the ability to identify existing categories while adding new fault categories, exhibiting good incremental learning ability and resistance to forgetting. Compared with the plain residual network model, it is more stable and better suited to the dynamically increasing fault categories found in real-world industrial scenarios.

[0089] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, such as adjusting the embedding position of the coordinate attention module, optimizing the evaluation index of the example set selection, and modifying the weight of the loss function of knowledge distillation. These improvements and modifications should also be considered within the scope of protection of the present invention.

Claims

1. A coordinate attention-guided class-incremental fault diagnosis method for engine bearings, characterized in that it includes the following steps: (1) preprocess the vibration signals of the bearing under different states obtained in advance to construct a balanced labeled dataset; (2) with the deep residual network ResNet50 as the backbone architecture, construct and train an initial fault diagnosis model with an embedded coordinate attention mechanism; (3) design a multi-stage incremental training task to simulate the actual scenario of dynamically increasing fault categories and construct new category datasets; (4) when new types of fault data appear, select an example set from the old-category samples based on the principle of information retention; (5) merge the example set and the new category data to construct an incremental training set, and train and update the model through coordinate attention feature distillation and prediction result similarity distillation; (6) use the trained model to diagnose bearing data containing both new and old fault categories.

2. The coordinate attention-guided incremental fault diagnosis method for engine bearings according to claim 1, characterized in that the vibration signals of the bearing in different states in step (1) include vibration signals of the normal state, inner ring light fault, inner ring medium fault, inner ring deep fault, outer ring light fault, outer ring medium fault, outer ring deep fault, rolling element light fault, rolling element medium fault, and rolling element deep fault.

3. The coordinate attention-guided incremental fault diagnosis method for engine bearings according to claim 1, characterized in that the implementation process of step (1) is as follows: a bandpass filter is used to remove environmental noise and low-frequency interference; a synchrosqueezing transform is then used to reconstruct the time-frequency features of the denoised vibration signal, transforming the one-dimensional time-domain vibration signal into a two-dimensional time-frequency image; finally, the time-frequency image is normalized to map the pixel values to the [0,1] interval; the preprocessed time-frequency images are divided into training and test sets according to a preset ratio to construct a balanced labeled dataset.

4. The coordinate attention-guided incremental fault diagnosis method for engine bearings according to claim 1, characterized in that the initial fault diagnosis model in step (2) includes an input layer, a convolutional feature extraction module, a coordinate attention module, a fully connected layer, and a Softmax classifier, wherein: Input layer: receives normalized time-frequency images; Convolutional feature extraction module: includes 5 convolutional stages, each including a convolutional layer, a batch normalization layer, a ReLU activation function, and a max pooling layer; the first convolutional stage uses a 7×7 convolutional kernel for initial feature extraction, subsequent convolutional stages use small 3×3 convolutional kernels for deep feature mining, and residual connections are used to alleviate the vanishing gradient problem; Coordinate attention module: embedded after each convolutional stage; global average pooling is performed on the feature map along the vertical and horizontal spatial dimensions to generate position-sensitive one-dimensional feature vectors; the module is defined by formulas for horizontal feature aggregation, vertical feature aggregation, and attention weight generation [formulas omitted], in which the encoded results of the c-th channel along the two directions are computed, the intermediate feature is decomposed into two tensors, two different dimension-increasing convolution operations and a sigmoid activation function are applied to these tensors, and the two directional attention weights are finally obtained as outputs; subsequently, the feature vector undergoes dimensionality reduction and dimensionality increase through two fully connected layers, and channel attention weights are generated by the Sigmoid activation function; finally, the attention weights are multiplied element-wise with the original feature map to enhance fault-sensitive features and suppress irrelevant background information;
Fully connected layer and classifier: the high-dimensional feature map output by the convolutional feature extraction module is flattened into a one-dimensional feature vector, the features are fused through two fully connected layers, and the probability distribution over the different bearing states is output by the Softmax classifier.

5. The coordinate attention-guided incremental fault diagnosis method for engine bearings according to claim 1, characterized in that the training process of the initial fault diagnosis model in step (2) is as follows: stochastic gradient descent is used as the optimizer, and training hyperparameters such as the learning rate, batch size, weight decay coefficient, and number of training iterations are set; the cross-entropy loss function is used as the optimization objective to measure the difference between the predicted labels and the true labels; an early stopping strategy is adopted during training: when the test set accuracy does not improve for 10 consecutive iterations, training is stopped and the current best model is saved as the initial fault diagnosis model.

6. The coordinate attention-guided incremental fault diagnosis method for engine bearings according to claim 1, characterized in that the implementation process of step (3) is as follows: design a 6-stage incremental training task to simulate the real-world scenario of dynamically increasing fault categories: the Stage 1 training samples are the normal state and the light faults of each type; Stage 2 adds the inner ring medium fault; Stage 3 adds the outer ring medium fault; Stage 4 adds the rolling element medium fault; Stage 5 adds the inner ring deep fault; Stage 6 adds the outer ring deep fault and the rolling element deep fault, covering a total of 10 states; construct the new category datasets, each including a training set and a test set.

7. The coordinate attention-guided incremental fault diagnosis method for engine bearings according to claim 1, characterized in that the implementation process of step (4) is as follows: calculate the similarity between the feature vectors of the old-category samples and the average feature vector of each category to screen the core samples that cover the sample distribution; compare the prediction loss of the old samples before and after the model update to screen the samples that are easily forgotten; calculate the adaptive weighting coefficient η by a formula [formula omitted] whose variables are the total number of samples after the increment, the number of samples in the new category, and the number of samples in the example set; compute an information score for each sample by a formula [formula omitted] from the average feature vector of category y, the sample feature vector, and the prediction loss difference, and select the K old-category samples with the highest information content to form the example set; specifically, sort the samples by score from largest to smallest and select samples from each old category to form the example set, ensuring that at least one sample is retained for each old category.

8. The coordinate attention-guided incremental fault diagnosis method for engine bearings according to claim 1, characterized in that the implementation process of step (5) is as follows: input all samples of the old categories into the student model and the teacher model, and extract the attention vectors output by the coordinate attention modules of the two models respectively; calculate the L1 norm loss between the attention vectors of the teacher and student models by a formula [formula omitted] whose terms are the horizontal attention vector loss and the vertical attention vector loss; this loss is used as a constraint so that the student model learns the attention allocation pattern of the teacher model, ensuring that the new and old models capture fault-sensitive features consistently; cosine similarity is used to measure the difference between the prediction results of the teacher and student models, and a prediction distillation loss function is constructed [formula omitted], in which the parameters of the teacher model and of the student model appear together with a label matching indicator function that returns 1 if the labels match and 0 otherwise; a composite loss function is constructed [formula omitted] from the cross-entropy loss, the feature distillation loss, and the prediction distillation loss, and the model parameters are optimized using the backpropagation algorithm; when the number of iterations reaches a set value, training is stopped and the optimal model for the current incremental stage is saved.
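For illustration only, the composite loss of step (5) can be sketched in plain Python for a single sample. The formulas of the claim are not reproduced in this text, so the sketch rests on assumptions: the feature distillation loss is taken as the sum of the L1 distances between the teacher and student attention vectors in the horizontal and vertical directions, the prediction distillation loss is taken as one minus the cosine similarity of the two prediction vectors (the label matching indicator is left out), and the weights `w_feat` and `w_pred` are hypothetical coefficients. All function names are introduced for this example.

```python
import math


def l1_loss(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true label."""
    return -math.log(max(probs[label], 1e-12))


def composite_loss(student_probs, label, teacher_probs,
                   student_att_h, teacher_att_h,
                   student_att_w, teacher_att_w,
                   w_feat=1.0, w_pred=1.0):
    """Cross-entropy + weighted feature and prediction distillation terms.

    The additive form and the weights are assumptions of this sketch.
    """
    loss_ce = cross_entropy(student_probs, label)
    # Feature distillation: horizontal plus vertical attention vector loss.
    loss_feat = (l1_loss(student_att_h, teacher_att_h)
                 + l1_loss(student_att_w, teacher_att_w))
    # Prediction distillation: penalize dissimilar teacher/student outputs.
    loss_pred = 1.0 - cosine_similarity(teacher_probs, student_probs)
    return loss_ce + w_feat * loss_feat + w_pred * loss_pred
```

When the student reproduces the attention vectors and predictions of the teacher exactly, both distillation terms vanish and only the cross-entropy term remains; any drift in attention or prediction increases the loss, which is the constraint against forgetting that the claim describes.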