Cross-condition fault diagnosis method combining spatio-temporal feature fusion and parameter hierarchical transfer
By constructing a parallel CNN-BiGRU network and combining spatiotemporal feature fusion and parameter hierarchical transfer methods, the problem of insufficient generalization ability of traditional fault diagnosis methods under varying working conditions is solved, and high-accuracy cross-working-condition fault diagnosis is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- UNIV OF SCI & TECH BEIJING
- Filing Date
- 2025-10-29
- Publication Date
- 2026-06-26
Figure CN121350486B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of cross-condition fault diagnosis technology, and in particular to a cross-condition fault diagnosis method and apparatus that combines spatiotemporal feature fusion and parameter hierarchical transfer.
Background Technology
[0002] Industrial equipment operates under constantly changing conditions, inevitably leading to various faults due to mechanical wear and electrical aging. Traditional fault diagnosis methods rely on manual feature extraction, which struggles to adapt to the dynamic changes in signal distribution under varying operating conditions, resulting in insufficient generalization ability of the diagnostic models. Deep learning methods can automatically extract deep features from signals; however, a single feature structure has limited ability to capture fault features, and the number of samples for industrial equipment under specific operating conditions is often severely insufficient, easily causing the model to overfit and hindering high-accuracy fault diagnosis.
[0003] Parameter transfer learning improves the model's accuracy under target operating conditions by transferring model parameters trained under the original operating conditions to the target operating conditions and fine-tuning some parameters accordingly, thus addressing the issues of insufficient data and distribution differences. Most existing transfer methods employ global parameter fine-tuning strategies, failing to distinguish the differences in the contribution of parameters from different layers of the model to changes in operating conditions. Furthermore, they use simple feature fusion methods, lacking a dynamic weighting of spatiotemporal features, which limits the robustness of cross-operating condition diagnosis.
[0004] Existing cross-condition fault diagnosis methods do not consider situations where equipment operates under multiple conditions and the number of training samples is insufficient. Traditional methods are unable to extract fault features effectively, which reduces fault diagnosis accuracy.
Summary of the Invention
[0005] To address the technical problems of existing technologies, such as the inability to effectively extract fault features and the resulting decrease in fault accuracy due to insufficient training samples and the lack of consideration for various operating conditions, this invention provides a cross-operating-condition fault diagnosis method and apparatus that combines spatiotemporal feature fusion and parameter hierarchical transfer. The technical solution is as follows:
[0006] In one aspect, a cross-condition fault diagnosis method combining spatiotemporal feature fusion and parameter hierarchical transfer is provided. The method is implemented by a cross-condition fault diagnosis device combining spatiotemporal feature fusion and parameter hierarchical transfer, and includes:
[0007] S1. Obtain the operating data of the equipment under the original operating condition and the operating data under the target operating condition; normalize the operating data under the original operating condition and the operating data under the target operating condition to construct the original operating condition dataset and the target operating condition dataset.
[0008] S2. Construct a fault diagnosis model based on a spatiotemporal dual-channel network with parallel CNN-BiGRU as the main framework; pre-train the fault diagnosis model using the original working condition dataset to obtain the optimal parameter set;
[0009] S3. Construct a transfer model with the same structure as the fault diagnosis model; transfer the optimal parameter set to the transfer model, and freeze all parameters of the CNN layer and BiGRU layer; train the transfer model using the target working condition dataset, and output the trained fault diagnosis model.
[0010] S4. Input the operating data of the equipment to be diagnosed into the trained fault diagnosis model and output the equipment fault diagnosis results.
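The parameter transfer in S3 can be illustrated with a minimal Python sketch. It assumes the model parameters are held in a dictionary keyed by layer name. The names `cnn.`, `bigru.`, `attention.` and `fc.` are hypothetical and do not come from the patent. The optimal parameter set is copied into a model of the same structure, entries belonging to the CNN and BiGRU layers are marked as frozen, and a plain gradient step then changes only the remaining parameters.

```python
import copy

# Hypothetical name prefixes standing in for the CNN and BiGRU layers.
FROZEN_PREFIXES = ("cnn.", "bigru.")

def transfer_and_freeze(pretrained, frozen_prefixes=FROZEN_PREFIXES):
    """Copy the pre-trained parameter set and mark which entries stay trainable."""
    transferred = copy.deepcopy(pretrained)
    trainable = {name: not name.startswith(frozen_prefixes) for name in transferred}
    return transferred, trainable

def gradient_step(params, grads, trainable, lr=0.1):
    """Apply one plain gradient-descent step, skipping every frozen parameter."""
    for name, grad in grads.items():
        if trainable[name]:
            params[name] = [w - lr * g for w, g in zip(params[name], grad)]
    return params

# Toy "optimal parameter set" from pre-training (values are illustrative only).
pretrained = {
    "cnn.conv1.weight": [0.5, -0.2],
    "bigru.layer1.weight": [0.1, 0.3],
    "attention.w_q": [0.7, 0.7],
    "fc.weight": [1.0, -1.0],
}
params, trainable = transfer_and_freeze(pretrained)
grads = {name: [1.0, 1.0] for name in params}  # dummy gradients for the demo
params = gradient_step(params, grads, trainable)
print(params)
```

In a deep learning framework the same effect is usually obtained by disabling gradient tracking on the frozen layers; the dictionary form is used here only to keep the sketch self-contained.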
[0011] In another aspect, a cross-condition fault diagnosis device combining spatiotemporal feature fusion and parameter hierarchical transfer is provided. The device is used to carry out the cross-condition fault diagnosis method combining spatiotemporal feature fusion and parameter hierarchical transfer. The device includes:
[0012] The acquisition unit is used to acquire the operating data of the equipment under the original operating condition and the operating data under the target operating condition; and to perform normalization processing on the operating data under the original operating condition and the operating data under the target operating condition to construct the original operating condition dataset and the target operating condition dataset.
[0013] The building unit is used to construct a fault diagnosis model based on a spatiotemporal dual-channel network with parallel CNN-BiGRU as the main framework; the fault diagnosis model is pre-trained using the original working condition dataset to obtain the optimal parameter set;
[0014] The training unit is used to construct a transfer model with the same structure as the fault diagnosis model; transfer the optimal parameter set to the transfer model, freeze all parameters of the CNN layer and BiGRU layer; train the transfer model using the target working condition dataset, and output the trained fault diagnosis model.
[0015] The diagnostic unit is used to input the operating data of the equipment to be diagnosed into the trained fault diagnosis model and output the equipment fault diagnosis results.
[0016] In another aspect, a cross-condition fault diagnosis device combining spatiotemporal feature fusion and parameter hierarchical transfer is provided. The device includes a processor and a memory; the memory stores computer-readable instructions which, when executed by the processor, implement any of the cross-condition fault diagnosis methods combining spatiotemporal feature fusion and parameter hierarchical transfer described above.
[0017] In another aspect, a computer-readable storage medium is provided, in which at least one instruction is stored, the at least one instruction being loaded and executed by a processor to implement any of the cross-condition fault diagnosis methods combining spatiotemporal feature fusion and parameter hierarchical transfer described above.
[0018] The beneficial effects of the technical solutions provided in the embodiments of the present invention include at least the following:
[0019] This invention collects raw time-series signals from equipment under various operating conditions and performs normalization and resampling to construct a dataset containing normal operating conditions and various fault states. It then constructs a spatiotemporal dual-channel network with a parallel CNN-BiGRU framework, dynamically fusing spatiotemporal modal features using a self-attention mechanism. The model is pre-trained using a large amount of original operating condition data to obtain the optimal parameter set. Furthermore, this invention constructs a transfer model with the same structure as the pre-trained model, transferring the pre-trained parameters to the transfer model and freezing the parameters of the spatiotemporal feature extraction module. Only the parameters of the remaining modules are fine-tuned, and model training is completed using a small amount of target operating condition data. This invention can improve the robustness of cross-operating-condition fault diagnosis and the diagnostic accuracy under small-sample conditions.
Attached Figure Description
[0020] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0021] Figure 1 is a flowchart of a cross-condition fault diagnosis method that combines spatiotemporal feature fusion and parameter hierarchical transfer, provided by an embodiment of the present invention;
[0022] Figure 2 is a schematic diagram of a pre-trained fault diagnosis model provided in an embodiment of the present invention;
[0023] Figure 3 is a flowchart of a pre-trained model parameter update provided by an embodiment of the present invention;
[0024] Figure 4 is a schematic diagram of a transfer fault diagnosis model provided in an embodiment of the present invention;
[0025] Figure 5 is a schematic diagram of a parameter transfer process provided in an embodiment of the present invention;
[0026] Figure 6 is a flowchart of a transfer model parameter update provided in an embodiment of the present invention;
[0027] Figure 7 is a block diagram of a cross-condition fault diagnosis device that combines spatiotemporal feature fusion and parameter hierarchical transfer, provided by an embodiment of the present invention;
[0028] Figure 8 is a schematic diagram of the structure of a cross-condition fault diagnosis device that combines spatiotemporal feature fusion and parameter hierarchical transfer, provided by an embodiment of the present invention.
Detailed Implementation
[0029] The technical solution of the present invention will now be described with reference to the accompanying drawings.
[0030] In embodiments of the present invention, words such as "exemplarily," "for example," etc., are used to indicate that something is an example, illustration, or description. Any embodiment or design described as "exemplary" in the present invention should not be construed as being more preferred or advantageous than other embodiments or designs. Specifically, the use of the word "exemplary" is intended to present the concept in a concrete manner. Furthermore, in embodiments of the present invention, the meaning expressed by "and / or" can be both, or either one.
[0031] In the embodiments of this invention, the terms "image" and "picture" may sometimes be used interchangeably. It should be noted that, without emphasizing the distinction between them, they convey the same meaning. Similarly, the terms "of," "corresponding (relevant)," and "corresponding" may sometimes be used interchangeably. It should be noted that, without emphasizing the distinction between them, they convey the same meaning.
[0032] In embodiments of the invention, a subscripted symbol such as $W_1$ may sometimes be written in a non-subscript form such as W1. When the difference is not emphasized, the two forms express the same meaning.
[0033] To make the technical problems, technical solutions and advantages of the present invention clearer, a detailed description will be given below in conjunction with the accompanying drawings and specific embodiments.
[0034] This invention provides a cross-condition fault diagnosis method combining spatiotemporal feature fusion and parameter hierarchical transfer. The method can be implemented by a cross-condition fault diagnosis device combining spatiotemporal feature fusion and parameter hierarchical transfer, which can be a terminal or a server. Figure 1 shows the flowchart of the method. The processing flow may include the following steps:
[0035] S1. Obtain the operating data of the equipment under the original operating condition and the operating data under the target operating condition; normalize the operating data under the original operating condition and the operating data under the target operating condition to construct the original operating condition dataset and the target operating condition dataset.
[0036] Optionally, the specific implementation process of S1 includes S11-S14:
[0037] S11. Set the equipment to the original operating condition, and operate the equipment in sequence under normal operation and fault 1 to fault 9. Use the vibration acceleration sensor on the equipment to continuously collect the vibration acceleration data of the equipment at a constant sampling frequency, and save it in the form of time series data to obtain the operating data of the original operating condition; adjust the equipment to the target operating condition, and repeat the above collection steps to obtain the operating data of the target operating condition.
[0038] The sensors on the equipment include, but are not limited to, vibration acceleration sensors, and the data collected includes, but is not limited to, vibration acceleration data.
[0039] In one feasible implementation, the equipment's operating states are divided into 10 types: one normal operating state and nine fault operating states (fault 1 to fault 9). The equipment is operated under four conditions (condition 0 to condition 3), and raw time-series signals are collected sequentially for both the normal operating state and the nine fault operating states. Specifically, based on the actual operating scenario of the equipment and considering the frequency of use and data acquisition, equipment conditions are defined: conditions frequently used during equipment operation and for which a large amount of data has accumulated are defined as the original conditions, abbreviated as condition S; conditions less frequently used and for which data is relatively scarce are defined as the target conditions, abbreviated as condition T. The raw time-series signal data is normalized and resampled, and the data is labeled with the corresponding equipment conditions and operating states, thereby constructing datasets for each equipment condition.
[0040] In this embodiment of the invention, a transfer scenario is defined in which the original working condition is working condition 0, the target working condition is working condition 1, and the cross-working-condition transfer is from working condition 0 to working condition 1. Embodiments of the invention include, but are not limited to, this transfer scenario.
[0041] The equipment was set to operating condition 0, and then operated sequentially under the normal state and fault states 1 through 9. Vibration acceleration data was continuously collected using the equipment's vibration acceleration sensor at a sampling frequency of 12 kHz, and saved as time-series data to obtain the operating data for operating condition 0. After completion, the equipment was switched to operating condition 1, and the above operating and data collection steps were repeated to obtain the operating data for operating condition 1. By controlling the equipment to operate in 10 different states under 4 different operating conditions, a total of 40 sets of time-series data were collected and saved as separate .csv files.
[0042] S12. The maximum-minimum method is used to normalize the original operating condition data and the target operating condition data separately, giving the normalized original operating condition data and the normalized target operating condition data. Based on the actual operating status of the equipment, each normalized dataset is labeled with its corresponding operating condition and operating state.
[0043] In one feasible implementation, the raw data of different operating conditions and different operating states collected by the sensor, namely 40 sets of raw time-series data, are normalized using the maximum-minimum method, and the normalized data are saved as .csv files. Based on the actual operating conditions and operating states of the equipment, the normalized data are labeled with the corresponding operating conditions and operating states (e.g., fault type 3 for operating condition 0); wherein the normalized data is represented by the following formula (1):
[0044] $x' = \dfrac{x - \mu}{\sigma}$ (1)
[0045] where $x$ represents the original data; $\mu$ represents the mean of the original data; $\sigma$ represents the standard deviation of the original data; and $x'$ represents the normalized data.
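As a minimal illustration of the normalization step, the Python sketch below implements both readings that the text supports: the symbol definitions of formula (1) refer to the mean and standard deviation (a z-score), while the step itself is named the maximum-minimum method. The function names and the use of the population standard deviation are assumptions made for this example.

```python
import statistics

def zscore_normalize(signal):
    """Normalize with mean and standard deviation, as the symbols of formula (1) suggest."""
    mu = statistics.fmean(signal)
    sigma = statistics.pstdev(signal)  # population standard deviation (assumed)
    return [(x - mu) / sigma for x in signal]

def minmax_normalize(signal):
    """Scale to [0, 1] using the maximum and minimum, as the step is named in the text."""
    lo, hi = min(signal), max(signal)
    return [(x - lo) / (hi - lo) for x in signal]

raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # toy vibration samples
z = zscore_normalize(raw)
m = minmax_normalize(raw)
print(z)
print(m)
```

Both functions divide by zero on a constant signal, so real data would need a guard for that case.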
[0046] S13. A sliding window is used to resample the normalized original operating condition data and the normalized target operating condition data to obtain the processed original operating condition data and the processed target operating condition data.
[0047] In one feasible implementation, a sliding window is used to resample 40 groups of normalized data. The length of the sliding window is 1024 and the step size is 512. Starting from the first data in each group, 320 resampling operations are performed, thereby dividing a long original time series data into 320 short time series data, which are the resampled data.
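The sliding-window resampling can be sketched as follows. The window length 1024, step 512 and count 320 come from the text; the function name and the synthetic signal are illustrative assumptions. With these values each recording must contain at least 1024 + 319 × 512 = 164352 samples.

```python
def sliding_window_segments(signal, window=1024, step=512, count=320):
    """Cut `count` overlapping segments of length `window`, advancing by `step`."""
    needed = window + (count - 1) * step
    if len(signal) < needed:
        raise ValueError(f"signal has {len(signal)} samples, {needed} required")
    return [signal[i * step : i * step + window] for i in range(count)]

# Synthetic stand-in for one normalized recording.
signal = [float(i) for i in range(170_000)]
segments = sliding_window_segments(signal)
print(len(segments), len(segments[0]))
```

Because the step is half the window length, consecutive segments overlap by 50 percent.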
[0048] S14. Based on the equipment's operating conditions and status, and according to preset quantities and proportions, divide the processed original operating data into a training set, a validation set, and a test set for the original operating conditions; divide the processed target operating data into a training set, a validation set, and a test set for the target operating conditions.
[0049] In one feasible implementation, the obtained resampled data is used to construct an original operating condition dataset. The order of the resampled data is shuffled, and the 320 resampled data points of each normalized data set are randomly divided into a training set, a validation set, and a test set in a ratio of 7:2:1. It is ensured that the amount of data for various fault types under the same operating condition is equal. For example, the amount of data for operating condition 0 in the training set, validation set, and test set are 2240, 640, and 320, respectively, and the amount of data for each operating state is 224, 64, and 32, respectively.
[0050] In one feasible implementation, the obtained resampled data is used to construct a target operating condition dataset. The order of the resampled data is shuffled. For each group of 320 resampled data points from the normalized data, the dataset is randomly divided into training, validation, and test sets, with the number of samples in the training and validation sets of the target operating condition being 1 / 32 of the original operating condition, and the number of samples in the test set being the same. It is ensured that the amount of data for various fault types under the same operating condition is equal. For example, the amount of data for operating condition 1 in the training, validation, and test sets is 70, 20, and 320, respectively, and the amount of data for each operating state is 7, 2, and 32, respectively.
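The sample counts in the two examples above can be checked with a short Python sketch. It assumes a simple seeded shuffle per operating state and uses placeholder tuples in place of real segments; the helper name and the seed are hypothetical. The per-state counts (224, 64, 32 for the original condition and 7, 2, 32 for the target condition) come from the text.

```python
import random

def split_per_state(segments, n_train, n_val, n_test, seed=0):
    """Shuffle one state's segments and cut disjoint train/validation/test subsets."""
    idx = list(range(len(segments)))
    random.Random(seed).shuffle(idx)
    train = [segments[i] for i in idx[:n_train]]
    val = [segments[i] for i in idx[n_train:n_train + n_val]]
    test = [segments[i] for i in idx[n_train + n_val:n_train + n_val + n_test]]
    return train, val, test

SOURCE_COUNTS = (224, 64, 32)  # 7:2:1 split of 320 segments per state
TARGET_COUNTS = (7, 2, 32)     # 1/32 of train and validation, same test size

# Placeholder segments: 10 operating states, 320 segments each.
segments_by_state = {s: [(s, k) for k in range(320)] for s in range(10)}

source = {s: split_per_state(v, *SOURCE_COUNTS) for s, v in segments_by_state.items()}
target = {s: split_per_state(v, *TARGET_COUNTS) for s, v in segments_by_state.items()}

source_totals = tuple(sum(len(source[s][p]) for s in source) for p in range(3))
target_totals = tuple(sum(len(target[s][p]) for s in target) for p in range(3))
print(source_totals, target_totals)
```

In practice the target split would be drawn from the target-condition recordings; the same placeholder data is reused here only to keep the example short.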
[0051] S2. Construct a fault diagnosis model based on a spatiotemporal dual-channel network with parallel CNN-BiGRU as the main framework; pre-train the fault diagnosis model using the original working condition dataset to obtain the optimal parameter set.
[0052] Figure 2 is a schematic diagram of the pre-trained fault diagnosis model provided in an embodiment of the present invention.
[0053] Optionally, the fault diagnosis model of the spatiotemporal dual-channel network with parallel CNN-BiGRU as the main framework includes: an input layer, a spatiotemporal feature extraction module, a feature fusion layer, a self-attention module, a fully connected layer, and an output layer; wherein, the spatiotemporal feature extraction module includes two branches: a spatial feature extraction branch and a temporal feature extraction branch;
[0054] The spatial feature extraction branch consists of three cascaded one-dimensional convolutional structures, with a kernel size of [size missing]. Each convolutional structure contains a convolutional layer, a batch normalization layer, a ReLU activation layer, and a max pooling layer;
[0055] The temporal feature extraction branch consists of two cascaded bidirectional gated recurrent unit (BiGRU) layers, in which the hidden layer sizes of the unidirectional GRUs are 128 and 64, respectively.
[0056] Figure 3 is a flowchart of the pre-trained model parameter update according to an embodiment of the present invention.
[0057] In one feasible implementation, the model parameters are randomly initialized and the original working condition data is input. The data passes in turn through the input layer, spatiotemporal feature extraction module, feature fusion layer, self-attention module, fully connected layer, and output layer of the pre-trained model; the model parameters are updated and the loss function is calculated. After each update, the validation set loss is checked. If it has decreased, the optimal parameter set is updated to the current parameters. Whether or not the loss has decreased, the procedure then checks whether the early stopping condition is met or the maximum number of iterations has been reached. If so, the model parameters are set to the optimal parameter set and the diagnostic result is output; if not, the parameter update continues.
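The update flow described above can be sketched in Python as a generic loop with early stopping. The callbacks `train_one_epoch` and `validate`, the `patience` value, and the simulated losses are all hypothetical; the sketch only shows how the optimal parameter set is tracked and when training stops.

```python
def train_with_early_stopping(train_one_epoch, validate, max_epochs=100, patience=5):
    """Keep the parameters with the lowest validation loss; stop after `patience` stale epochs."""
    best_loss, best_params = float("inf"), None
    stale_epochs, epochs_run = 0, 0
    for epoch in range(max_epochs):
        params = train_one_epoch(epoch)   # update the model parameters
        val_loss = validate(params)       # loss on the validation set
        epochs_run = epoch + 1
        if val_loss < best_loss:          # loss decreased: update the optimal set
            best_loss, best_params, stale_epochs = val_loss, params, 0
        else:
            stale_epochs += 1
        if stale_epochs >= patience:      # early stopping condition
            break
    return best_params, best_loss, epochs_run

# Simulated validation losses stand in for a real model.
simulated = [1.0, 0.8, 0.6, 0.65, 0.7, 0.66, 0.9, 0.5]
best_params, best_loss, epochs_run = train_with_early_stopping(
    train_one_epoch=lambda epoch: {"epoch": epoch},
    validate=lambda params: simulated[params["epoch"]],
    max_epochs=len(simulated),
    patience=3,
)
print(best_params, best_loss, epochs_run)
```

In the simulated run the loop stops after three epochs without improvement, so the later, lower loss is never reached; this is the usual trade-off when choosing the patience value.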
[0058] Optionally, S2 uses the original operating condition dataset to pre-train the fault diagnosis model to obtain the optimal parameter set, including:
[0059] S21. Input the training set and validation set of the original working conditions into the spatial feature extraction branch and the temporal feature extraction branch at the same time. Perform feature extraction layer by layer through convolutional layer, batch normalization layer, ReLU activation layer and max pooling layer to output spatial local features; perform feature extraction through two-layer bidirectional gating unit BiGRU to output temporal global features.
[0060] In one feasible implementation, the original working condition training set data is input into the model, and the input size of the input layer is 1024. The input layer transforms the one-dimensional raw input into the two-dimensional input expected by the feature extraction module. For the spatial feature extraction branch, the data size is converted to... For the temporal feature extraction branch, the data size is converted to... The transformed data is then input into the two branches of the feature extraction module.
[0061] In one feasible implementation, the converted two-dimensional data is input to the spatial feature extraction branch of the spatiotemporal feature extraction module and processed sequentially through three cascaded one-dimensional convolutional structures (1D-CNN) to obtain the output features. Each convolutional structure contains, in order, a convolutional layer, a batch normalization layer, a ReLU activation layer, and a max pooling layer, with the sizes of the three convolutional kernels being [missing information]. The process of obtaining the output features is represented by the following formula (2):
[0062] $x_j^{l} = f\left(\sum_{i=1}^{C} x_i^{l-1} * w_{ij}^{l} + b_j^{l}\right)$ (2)
[0063] where $l$ indicates the index of the convolutional structure; $C$ indicates the total number of input channels; $i$ indicates the input channel index; $j$ indicates the output channel index; $x_i^{l-1}$ represents the input features, which are the output features of the previous layer; $x_j^{l}$ represents the output features; $w_{ij}^{l}$ is the convolution kernel weight matrix; $b_j^{l}$ is the bias term; $f$ is the activation function, here the ReLU activation function; and $*$ is the one-dimensional convolution operation.
[0064] The output features, after being nonlinearized by the ReLU activation layer, enter the max pooling layer. The max pooling method is used to downsample the output features, reducing the feature dimension to half of its original value, as expressed by the following formula (3):
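[0065] $y_j^{l}(n, k) = \max\left\{x_j^{l}(n, 2k-1),\; x_j^{l}(n, 2k)\right\}$ (3)
[0066] where $y_j^{l}$ is the output feature after downsampling; $x_j^{l}$ is the output feature before pooling; $n$ is the index of the current sample's position within the batch; and $k$ is the location index of the output feature.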
[0067] The downsampled output features are input into the global average pooling layer, and global averaging is performed on each feature map channel to convert the two-dimensional feature vector back into a one-dimensional feature vector, thereby obtaining the spatial local features, which are expressed by the following formula (4):
[0068] $F_s = \dfrac{1}{L}\sum_{k=1}^{L} y(k)$ (4)
[0069] where $F_s$ represents the output of global average pooling, i.e. the spatial local features; $L$ is the length of the input feature; $k$ is the location index of the input feature; and $y$ is the output feature of the final max pooling layer.
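The spatial branch operations (convolution, ReLU, max pooling that halves the length, and global average pooling) can be illustrated with a small pure-Python sketch. It assumes unpadded convolution and a pooling window of 2, omits batch normalization, and uses toy kernels, since the actual kernel sizes are not given in the text. All function names are illustrative.

```python
def conv1d(inputs, weights, biases):
    """Sum over input channels of a 'valid' 1-D convolution, plus a bias per output channel.

    inputs: list of C_in channels; weights[j][i]: kernel from input channel i to output j.
    """
    k = len(weights[0][0])
    out_len = len(inputs[0]) - k + 1
    outputs = []
    for j, bias in enumerate(biases):
        channel = []
        for pos in range(out_len):
            total = bias
            for i, x in enumerate(inputs):
                total += sum(x[pos + m] * weights[j][i][m] for m in range(k))
            channel.append(total)
        outputs.append(channel)
    return outputs

def relu(features):
    return [[max(0.0, v) for v in ch] for ch in features]

def max_pool(features, size=2):
    """Non-overlapping max pooling; size=2 halves the feature length."""
    return [[max(ch[p:p + size]) for p in range(0, len(ch) - size + 1, size)]
            for ch in features]

def global_average_pool(features):
    """Average each channel over its length, giving one value per channel."""
    return [sum(ch) / len(ch) for ch in features]

inputs = [[1.0, -2.0, 3.0, 0.0, 2.0]]       # one input channel, length 5
weights = [[[1.0, 1.0]], [[-1.0, 0.0]]]     # two output channels, kernel size 2
biases = [0.0, 0.5]

pooled = max_pool(relu(conv1d(inputs, weights, biases)))
spatial_feature = global_average_pool(pooled)
print(pooled, spatial_feature)
```

Like most deep learning libraries, `conv1d` here slides the kernel without flipping it (cross-correlation).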
[0070] Table 1 shows the changes in parameters and feature size at each layer of the spatial feature extraction branch.
[0071] Table 1
[0072]
[0073] In one feasible implementation, the converted two-dimensional data of size 1024×1 is input into the temporal feature extraction branch of the spatiotemporal feature extraction module, and passes through two cascaded bi-directional gated unit (BiGRU) structures. The hidden layer sizes of each unidirectional GRU are 128 and 64, respectively. The gated unit structure (GRU) dynamically adjusts the flow of information through two gating mechanisms. Under the control of the reset gate, a new candidate hidden state is calculated by combining the current input and some historical information. Under the control of the update gate, the hidden state at the current moment is updated by combining the candidate hidden state and the hidden state at the previous moment, as expressed by the following formulas (5)-(8):
[0074] $z_t = \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right)$ (5)
[0075] $r_t = \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right)$ (6)
[0076] $\tilde{h}_t = \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right)$ (7)
[0077] $h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t$ (8)
[0078] where $\sigma$ is the Sigmoid activation function; $\tanh$ is the hyperbolic tangent function; $\odot$ indicates element-wise multiplication; $x_t$ is the current input; $h_t$ is the current hidden state; $h_{t-1}$ is the hidden state at the previous moment; $z_t$ is the update gate output at the current moment; $W_z$ is the input-to-update-gate weight; $U_z$ is the hidden-state-to-update-gate weight; $b_z$ is the update gate bias; $r_t$ is the reset gate output at the current moment; $W_r$ is the input-to-reset-gate weight; $U_r$ is the hidden-state-to-reset-gate weight; $b_r$ is the reset gate bias; $\tilde{h}_t$ represents the candidate hidden state at the current moment; $W_h$ is the input-to-candidate-hidden-state weight; $U_h$ is the hidden-state-to-candidate-hidden-state weight; and $b_h$ is the candidate hidden state bias.
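A single GRU update can be written out in pure Python as below. This is a sketch that assumes the common convention in which the update gate weights the candidate state, $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$; some libraries swap the roles of $z_t$ and $1 - z_t$. The parameter dictionary keys and helper names are illustrative.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def affine(w, u, b, x, h):
    """Compute W x + U h + b element by element."""
    return [wx + uh + bi for wx, uh, bi in zip(matvec(w, x), matvec(u, h), b)]

def gru_step(x_t, h_prev, p):
    """One GRU update: update gate, reset gate, candidate state, new hidden state."""
    z = [sigmoid(v) for v in affine(p["W_z"], p["U_z"], p["b_z"], x_t, h_prev)]
    r = [sigmoid(v) for v in affine(p["W_r"], p["U_r"], p["b_r"], x_t, h_prev)]
    gated = [ri * hi for ri, hi in zip(r, h_prev)]   # reset gate filters the history
    h_cand = [math.tanh(v) for v in affine(p["W_h"], p["U_h"], p["b_h"], x_t, gated)]
    return [(1.0 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

hidden, n_in = 2, 1
params = {}
for gate in ("z", "r", "h"):
    params[f"W_{gate}"] = zeros(hidden, n_in)
    params[f"U_{gate}"] = zeros(hidden, hidden)
    params[f"b_{gate}"] = [0.0] * hidden

# With all-zero parameters both gates equal 0.5 and the candidate state is 0,
# so the new hidden state is half of the previous one.
h_new = gru_step([1.0], [1.0, -2.0], params)
print(h_new)
```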
[0079] In one feasible implementation, the bidirectional gated recurrent unit (BiGRU) structure introduces a bidirectional information processing mechanism, consisting of a forward GRU and a backward GRU. The forward GRU captures information dependencies from the past to the present, and the backward GRU captures information dependencies from the future to the present. The hidden states of the forward and backward GRUs are concatenated to obtain the hidden state at the current moment, thereby comprehensively capturing the long-term and short-term dependencies of the time-series signal, as expressed by the following formulas (9)-(11):
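[0080] $\overrightarrow{h_t} = \mathrm{GRU}\left(x_t, \overrightarrow{h}_{t-1}\right)$ (9)
[0081] $\overleftarrow{h_t} = \mathrm{GRU}\left(x_t, \overleftarrow{h}_{t+1}\right)$ (10)
[0082] $h_t = \left[\overrightarrow{h_t};\, \overleftarrow{h_t}\right]$ (11)
[0083] where $\overrightarrow{h_t}$ represents the hidden state of the forward channel at the current moment; $\overleftarrow{h_t}$ represents the hidden state of the reverse channel at the current moment; $h_t$ is the current hidden state; $[\,\cdot\,;\,\cdot\,]$ indicates the concatenation operation; and $x_t$ represents the input at the current moment.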
[0084] In one feasible implementation, the features output from the second BiGRU layer are input into a global average pooling layer. The feature values of each channel at all time steps are averaged to convert the two-dimensional feature vector into a one-dimensional feature vector, thus obtaining the final output of the time feature extraction branch, which is expressed by the following formula (12):
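[0085] $F_t = \dfrac{1}{T}\sum_{t=1}^{T} h_t$ (12)
[0086] where $F_t$ is the average pooling output of the feature values over all time steps, i.e. the temporal global features; $h_t$ is the hidden state at time step $t$; and $T$ represents the total number of time steps.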
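The bidirectional pass and the averaging over time steps can be sketched independently of the recurrent cell. In the Python example below, `toy_cell` is a deliberately simple stand-in (not a GRU) so that the numbers can be followed by hand; any step function with the signature `cell(x_t, h_prev)` could be substituted. All names are illustrative.

```python
def run_direction(cell, inputs, h0):
    """Run a recurrent cell over a sequence and collect the hidden state at every step."""
    states, h = [], h0
    for x_t in inputs:
        h = cell(x_t, h)
        states.append(h)
    return states

def bidirectional(cell_fw, cell_bw, inputs, h0):
    """Concatenate forward and backward hidden states at each time step."""
    forward = run_direction(cell_fw, inputs, h0)
    backward = run_direction(cell_bw, inputs[::-1], h0)[::-1]  # realign to forward time
    return [f + b for f, b in zip(forward, backward)]          # list concatenation

def time_average_pool(states):
    """Average every feature over all time steps."""
    steps = len(states)
    return [sum(column) / steps for column in zip(*states)]

def toy_cell(x_t, h_prev):
    # Stand-in recurrence for illustration only: h_t = 0.5 * h_{t-1} + x_t.
    return [0.5 * h + x for h, x in zip(h_prev, x_t)]

inputs = [[1.0], [2.0], [3.0]]
states = bidirectional(toy_cell, toy_cell, inputs, h0=[0.0])
temporal_feature = time_average_pool(states)
print(states, temporal_feature)
```

With unidirectional hidden size 64 in the second layer, the concatenated state and therefore the pooled feature would have 128 elements.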
[0087] Table 2 shows the changes in parameters and feature sizes of each layer in the time feature extraction branch.
[0088] Table 2
[0089]
[0090] S22. Input the spatial local features and temporal global features into the feature fusion module for adaptive fusion to obtain spatiotemporal fused features;
[0091] In one feasible implementation, spatial local features and temporal global features are input into the feature fusion module and concatenated along the feature dimension to obtain a feature vector of size 256, which contains both temporal and spatial features. The concatenation process is represented by the following formula (13):
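[0092] $F = \left[F_s;\, F_t\right]$ (13)
[0093] where $F$ represents the spatiotemporal fusion features after concatenation; $F_s$ represents the spatial local features; and $F_t$ represents the temporal global features.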
[0094] S23. Input the spatiotemporal fusion features into the attention module to create three trainable parameter matrices, and perform dot product with the spatiotemporal fusion features respectively to map the spatiotemporal fusion features to the query, key, and value space; based on the query, key, and value, calculate the attention-weighted features through the softmax function;
[0095] In one feasible implementation, the process of mapping spatiotemporal fusion features to query, key, and value spaces is represented by the following formulas (14)-(16):
[0096] $Q = F W_Q$ (14)
[0097] $K = F W_K$ (15)
[0098] $V = F W_V$ (16)
[0099] where $W_Q$ represents the trainable first parameter matrix; $W_K$ represents the trainable second parameter matrix; $W_V$ represents the trainable third parameter matrix; $F$ indicates the spatiotemporal fusion features; $Q$ represents the query; $K$ represents the key; and $V$ represents the value.
[0100] In one feasible implementation, the similarity between the query and the key is calculated using a dot product method and divided by a scaling factor to prevent the dot product result from being too large and causing gradient vanishing. The similarity is converted into a probability distribution, i.e., attention weights, through the softmax function, representing the contribution of each position to the current position. The values are weighted and summed according to the attention weights to obtain a sequence with the same length and dimension as the input sequence. The element at each position is the result of a weighted sum of all position elements in the input sequence. The calculation process of the attention weighted feature is represented by the following formula (17):
[0101] $A = \mathrm{softmax}\left(\dfrac{Q K^{\mathsf{T}}}{\sqrt{d_k}}\right) V$ (17)
[0102] where $d_k$ represents the scaling factor, with a value of 256, and $A$ indicates the attention-weighted features.
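The query, key and value mapping and the weighted sum can be illustrated with a small pure-Python sketch of scaled dot-product self-attention. It assumes the standard form in which the scores are divided by the square root of the key dimension, and it uses a toy 3 × 2 input with identity projection matrices; how the 256-dimensional fused vector is arranged as a sequence is not specified in the text, so the shapes here are illustrative only.

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    peak = max(row)                       # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(features, w_q, w_k, w_v):
    """Return attention-weighted features and the attention weights."""
    q, k, v = matmul(features, w_q), matmul(features, w_k), matmul(features, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]   # each row is a probability distribution
    return matmul(weights, v), weights

identity = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 positions, 2 features each
attended, weights = self_attention(features, identity, identity, identity)
print(attended)
```

The output has the same number of positions and the same feature dimension as the input, and each output row is a weighted sum of all value rows.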
[0103] S24. Input the attention-weighted features into the fully connected layer for linear transformation, and output the fault probability distribution matrix.
[0104] In one feasible implementation, high-dimensional attention-weighted features are input into a fully connected layer and mapped to a low-dimensional fault probability distribution space through a linear transformation; wherein the dimension of the probability space is 1×10, and the value at each position represents the probability that the input data belongs to the corresponding fault category; wherein the calculation process of the fault probability distribution matrix is expressed by the following formula (18):
[0105] $P = W F_{att} + b$ (18)
[0106] where $P$ denotes the fault probability distribution matrix; $W$ denotes the weight matrix; $b$ denotes the bias matrix; and $F_{att}$ denotes the attention-weighted features.
[0107] In one feasible implementation, the loss of the training set is calculated using the cross-entropy loss function based on the fault probability distribution matrix, and the sum of the losses of all samples is obtained, which is expressed by the following formula (19):
[0108] $L = -\sum_{i=1}^{N} \log p_{i,n}$ (19)
[0109] where $L$ denotes the loss function; $N$ denotes the number of samples; and $p_{i,n}$ denotes the model's predicted probability for the true fault category $n$ of sample $i$.
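As a rough illustration of the fully connected mapping and the summed cross-entropy loss, the following sketch uses plain Python. It assumes that a softmax is applied to the linear output to obtain probabilities, which the text does not state explicitly; the toy weights, inputs, labels, and function names are hypothetical.

```python
import math

def linear(x, weight, bias):
    """Fully connected layer: one output value per row of `weight`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def softmax(logits):
    """Numerically stable softmax (assumed normalization step)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_sum(prob_rows, true_labels):
    """Sum over samples of -log(probability assigned to the true category)."""
    return -sum(math.log(p[n]) for p, n in zip(prob_rows, true_labels))

# Hypothetical toy layer: 3 input features mapped to 10 fault categories.
weight = [[0.1 * (c + 1) if j == c % 3 else 0.0 for j in range(3)] for c in range(10)]
bias = [0.0] * 10
samples = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
labels = [0, 1]  # index of the true fault category for each sample

probs = [softmax(linear(x, weight, bias)) for x in samples]
loss = cross_entropy_sum(probs, labels)
```

Each row of `probs` is a 1×10 distribution, and `loss` is the sum of the per-sample losses rather than their mean, following the wording of the paragraph above.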
[0110] S25. Select the dimension with the largest probability value in the fault probability distribution matrix, and use the corresponding category label as the prediction result of the model.
[0111] In one feasible implementation, the dimension with the largest probability value in the fault probability distribution matrix is selected, and the corresponding category label is the most likely fault type. This fault type is used as the prediction result of the model and transmitted to the output layer, expressed by the following formula (20):
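[0112] $\hat{y} = \arg\max_{c} P_c$ (20)
[0113] where $\hat{y}$ denotes the predicted fault type, which is one of normal operation and fault 1 to fault 9; and $P_c$ denotes the value of the fault probability distribution matrix at category $c$.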
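The selection of the most probable category can be sketched as follows. The ordering of the labels (index 0 for normal operation, indices 1 to 9 for fault 1 to fault 9) and the names `FAULT_LABELS` and `predict_label` are assumptions made for illustration.

```python
# Assumed label order: index 0 is normal operation, indices 1..9 are fault 1..9.
FAULT_LABELS = ["normal"] + [f"fault {i}" for i in range(1, 10)]

def predict_label(prob_row):
    """Return (index, label) of the category with the largest probability."""
    best = max(range(len(prob_row)), key=prob_row.__getitem__)
    return best, FAULT_LABELS[best]

example = [0.05] * 9 + [0.55]      # hypothetical 1x10 probability row
prediction = predict_label(example)
```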
[0114] S26. Based on the prediction results, calculate the loss function for each round, backpropagate to update all training parameters, and stop training when the maximum number of learning rounds is reached. Find and return the training round that minimizes the loss function, save the values of each parameter in the model structure under that round, and obtain the optimal parameter set of the model under the original working conditions.
[0115] Based on the prediction results, the loss function for each round is calculated, and all training parameters are updated through backpropagation until the maximum number of learning rounds is reached or the early stopping condition of "if the validation set loss does not decrease in multiple consecutive training rounds, then training is stopped" is met.
[0116] S27. Set the parameters of the model to the optimal parameters; input the test set of the original working conditions into the fault diagnosis model to obtain the optimal accuracy and confusion matrix of the pre-trained model.
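The accuracy and confusion matrix mentioned in S27 could be computed along the following lines. This is a sketch with hypothetical labels; the convention that rows are true categories and columns are predicted categories is an assumption.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count samples per (true category, predicted category) pair."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

# Hypothetical test-set labels with three categories for brevity.
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
acc = accuracy(cm)
```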
[0117] S3. Construct a transfer model with the same structure as the fault diagnosis model; transfer the optimal parameter set to the transfer model, and freeze all parameters of the CNN layer and BiGRU layer; train the transfer model using the target working condition dataset, and output the trained fault diagnosis model.
[0118] Figure 4 is a schematic diagram of the transfer fault diagnosis model provided in an embodiment of the present invention.
[0119] Optionally, in S3, constructing a transfer model with the same structure as the fault diagnosis model, transferring the optimal parameter set to the transfer model, and freezing all parameters of the CNN and BiGRU layers includes:
[0120] Construct an initial transfer model; wherein, the initial transfer model has the same structure as the fault diagnosis model of the spatiotemporal dual-channel network with parallel CNN-BiGRU as the main framework, including: input layer, spatiotemporal feature extraction module, feature fusion layer, self-attention module, fully connected layer and output layer;
[0121] All parameter states in the transfer model are randomly initialized; the optimal parameter set of the original working condition is completely copied into the transfer model as the initialization parameters of the transfer model, forming the initial parameter set for the target working condition.
[0122] Figure 5 is a schematic diagram of the parameter transfer process provided by an embodiment of the present invention.
[0123] In one feasible implementation, the optimal parameter set of the original working condition is completely copied into the transfer model as the initialization parameters of the transfer model. The process of forming the initial parameter set for the target working condition is expressed by the following formula (21):
[0124] $\theta_T^{0} = \theta_S^{*}$ (21)
[0125] where $\theta_S^{*}$ denotes the optimal parameter set for the original operating condition; and $\theta_T^{0}$ denotes the initial parameter set for the target operating condition.
[0126] Based on the initial parameter set of the target working condition, all parameters in the spatiotemporal feature extraction module of the transfer model are frozen so that they remain at fixed values during model training and do not participate in parameter updates during backpropagation.
[0127] Optionally, the process of freezing all parameters in the spatiotemporal feature extraction module of the transfer model can be represented by the following formula (22):
[0128] $\frac{\partial L_T}{\partial \theta_s} = 0, \quad \frac{\partial L_T}{\partial \theta_t} = 0$ (22)
[0129] where $L_T$ denotes the loss function of the transfer model; $\theta_s$ denotes the parameters of the spatial feature extraction branch of the transfer model; and $\theta_t$ denotes the parameters of the temporal feature extraction branch.
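The copy-then-freeze procedure can be illustrated with a small framework-free sketch. Parameters are held in a dictionary keyed by name; the name prefixes `cnn.` and `bigru.`, the toy values, and the plain gradient-descent update are all hypothetical stand-ins, not the embodiment's actual implementation.

```python
import copy

# Hypothetical name prefixes for the frozen spatiotemporal feature extraction module.
FROZEN_PREFIXES = ("cnn.", "bigru.")

def transfer_parameters(source_params):
    """Copy the pre-trained optimal parameter set into the transfer model."""
    return copy.deepcopy(source_params)

def is_frozen(name):
    """A parameter is frozen if it belongs to the CNN or BiGRU branch."""
    return name.startswith(FROZEN_PREFIXES)

def sgd_step(params, grads, lr):
    """Update only unfrozen parameters; frozen ones keep their copied values."""
    for name, values in params.items():
        if is_frozen(name):
            continue  # equivalent to treating the gradient as zero
        params[name] = [v - lr * g for v, g in zip(values, grads[name])]
    return params

source_optimal = {
    "cnn.conv1": [0.5, -0.2],
    "bigru.layer1": [0.1, 0.3],
    "attention.w_q": [0.7, 0.7],
    "fc.weight": [1.0, -1.0],
}
target = transfer_parameters(source_optimal)
grads = {name: [1.0, 1.0] for name in target}  # dummy gradients
sgd_step(target, grads, lr=0.01)
```

After the step, the `cnn.` and `bigru.` entries of `target` are unchanged while the attention and fully connected entries have moved, and `source_optimal` is untouched because the copy is deep.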
[0130] Figure 6 is a flowchart of the transfer model parameter update provided by an embodiment of the present invention.
[0131] In one feasible implementation, the transfer model parameters are initialized to the optimal parameter set of the pre-trained model. Target working condition data is input and processed in sequence through the input layer, spatiotemporal feature extraction module, feature fusion layer, self-attention module, fully connected layer, and output layer of the transfer model. In each round, the transfer model parameters are updated and the loss function of the transfer model is calculated. If the validation set loss decreases, the optimal parameter set is updated; if it increases, the optimal parameter set is left unchanged. In either case, it is then checked whether the early stopping condition is met or the maximum number of iterations has been reached. If so, the transfer model parameters are set to the optimal parameter set and the diagnostic result is output; if not, the parameter updates continue.
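The control flow of this update loop can be sketched as below. To keep the example self-contained, it takes a precomputed list of per-round validation losses instead of training a real model; the `patience` parameter (the number of consecutive rounds without improvement that triggers early stopping) and the function name are assumptions, since the text only says "multiple consecutive training rounds".

```python
def early_stopping_trace(val_losses, max_epochs, patience):
    """Return (best_epoch, best_loss, stop_epoch) for per-round validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    rounds_without_improvement = 0
    stop_epoch = -1
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        stop_epoch = epoch
        if loss < best_loss:
            # A real loop would also snapshot the model parameters here.
            best_loss, best_epoch = loss, epoch
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
        if rounds_without_improvement >= patience:
            break  # early stopping condition met
    return best_epoch, best_loss, stop_epoch

# Hypothetical validation losses; the improvement at index 6 is never reached.
losses = [0.90, 0.70, 0.65, 0.66, 0.68, 0.67, 0.60]
result = early_stopping_trace(losses, max_epochs=50, patience=3)
```

With these values the loop stops at round 5 and reports round 2 as the best one, which is the round whose parameters would be restored as the optimal parameter set.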
[0132] Optionally, S3 uses the target operating condition dataset to train the transfer model and outputs a trained fault diagnosis model, including:
[0133] S31. Input the training set and validation set of the target working condition into the spatial feature extraction branch and the temporal feature extraction branch at the same time. Perform feature extraction layer by layer through the convolutional layer, batch normalization layer, ReLU activation layer and max pooling layer to output spatial local features; perform feature extraction through the two-layer bidirectional gated recurrent unit (BiGRU) to output temporal global features.
[0134] S32. Input the spatial local features and temporal global features into the feature fusion module for adaptive fusion to obtain spatiotemporal fused features;
[0135] S33. Input the spatiotemporal fusion features into the attention module to create three trainable parameter matrices, and perform dot product with the spatiotemporal fusion features respectively to map the spatiotemporal fusion features to the query, key, and value space; calculate the attention-weighted features based on the query, key, and value using the softmax function;
[0136] S34. Input the attention-weighted features into the fully connected layer for linear transformation, and output the fault probability distribution matrix.
[0137] S35. Select the dimension with the largest probability value in the fault probability distribution matrix, and use the corresponding category label as the prediction result of the transfer model.
[0138] S36. Based on the prediction results, calculate the transfer model loss function for each round, backpropagate to update only the unfrozen parameters, that is, the parameters outside the spatiotemporal feature extraction module, and stop training when the maximum number of learning rounds is reached. Find and return the training round that minimizes the transfer model loss function, save the values of each parameter in the model structure under that round as the optimal parameter set of the transfer model under the target working condition, and output the trained fault diagnosis model.
[0139] Training is stopped when the maximum number of training rounds is reached or when the early stopping condition of "stop training if the validation set loss does not decrease in multiple consecutive training rounds" is met.
[0140] In one feasible implementation, all parameters of the transfer model are set to optimal parameters, the target working condition test set data is input, and all processes of steps S31-S35 are completed in sequence to calculate the loss function of the transfer model and output the optimal accuracy and confusion matrix of the model.
[0141] During training, the transfer model updates its parameters with a small learning rate. Starting from the optimal parameter set of the pre-trained model, which encodes a large amount of original working condition information, the model incorporates the small amount of available target working condition information. While retaining the feature representation ability learned under the original working condition, it gradually absorbs the domain-specific features of the target working condition, achieving high-accuracy transfer of the model from the original working condition to the target working condition.
[0142] S4. Input the operating data of the equipment to be diagnosed into the trained fault diagnosis model and output the equipment fault diagnosis results.
[0143] This invention collects raw time-series signals from equipment under various operating conditions and performs normalization and resampling to construct a dataset containing normal operation and various fault states. It then constructs a spatiotemporal dual-channel network with a parallel CNN-BiGRU framework, dynamically fusing spatiotemporal modal features using a self-attention mechanism. The model is pre-trained using a large amount of original operating condition data to obtain the optimal parameter set. Furthermore, this invention constructs a transfer model with the same structure as the pre-trained model, transfers the pre-trained parameters to the transfer model, and freezes the parameters of the spatiotemporal feature extraction module, so that only the parameters of the remaining modules are fine-tuned. Model training is completed using a small amount of target operating condition data. This invention improves the robustness of cross-operating condition fault diagnosis and the diagnostic accuracy under small sample conditions.
[0144] Figure 7 is a block diagram of a cross-condition fault diagnosis device combining spatiotemporal feature fusion and parameter hierarchical migration, provided by an embodiment of the present invention. This device is used to carry out the cross-condition fault diagnosis method combining spatiotemporal feature fusion and parameter hierarchical migration. Referring to Figure 7, the device includes an acquisition unit 710, a construction unit 720, a training unit 730, and a diagnostic unit 740, wherein:
[0145] The acquisition unit 710 is used to acquire the operating data of the device under the original operating condition and the operating data under the target operating condition; and to perform normalization processing on the operating data under the original operating condition and the operating data under the target operating condition to construct the original operating condition dataset and the target operating condition dataset.
[0146] The construction unit 720 is used to build a fault diagnosis model based on a spatiotemporal dual-channel network with parallel CNN-BiGRU as the main framework; the fault diagnosis model is pre-trained using the original working condition dataset to obtain the optimal parameter set;
[0147] Training unit 730 is used to construct a transfer model with the same structure as the fault diagnosis model; transfer the optimal parameter set to the transfer model, freeze all parameters of the CNN layer and BiGRU layer; train the transfer model using the target working condition dataset, and output the trained fault diagnosis model;
[0148] The diagnostic unit 740 is used to input the operating data of the equipment to be diagnosed into the trained fault diagnosis model and output the equipment fault diagnosis results.
[0149] Optionally, the fault diagnosis model of the spatiotemporal dual-channel network with parallel CNN-BiGRU as the main framework includes: an input layer, a spatiotemporal feature extraction module, a feature fusion layer, a self-attention module, a fully connected layer, and an output layer; wherein, the spatiotemporal feature extraction module includes two branches: a spatial feature extraction branch and a temporal feature extraction branch;
[0150] The spatial feature extraction branch consists of three concatenated one-dimensional CNN layers, with a kernel size of [size missing]. Each convolutional structure contains a convolutional layer, a batch normalization layer, a ReLU activation layer, and a max pooling layer;
[0151] The temporal feature extraction branch consists of two bidirectional gated recurrent unit (BiGRU) layers connected in series, with the hidden layer sizes of the unidirectional GRUs in the two layers being 128 and 64, respectively.
[0152] Optionally, the acquisition unit 710 is configured to:
[0153] Set the equipment to its original operating condition, and then operate the equipment in sequence under normal operation and fault 1 to fault 9 conditions. Use the vibration acceleration sensor on the equipment to continuously collect the vibration acceleration data of the equipment at a constant sampling frequency, and save it in the form of time-series data to obtain the operating data of the original operating condition. Adjust the equipment to the target operating condition, and repeat the above collection steps to obtain the operating data of the target operating condition.
[0154] Min-max normalization is applied separately to the original operating data and the target operating data to obtain normalized original operating data and normalized target operating data; based on the actual operating status of the equipment, the normalized original operating data are labeled with the corresponding operating conditions and operating status, and the normalized target operating data are likewise labeled with the corresponding operating conditions and operating status.
[0155] A sliding window is used to resample the normalized original operating condition data and the normalized target operating condition data to obtain the processed original operating condition data and the processed target operating condition data.
[0156] Based on the equipment's operating conditions and status, and according to preset quantities and proportions, the processed original operating data is divided into a training set, a validation set, and a test set for the original operating conditions; the processed target operating data is also divided into a training set, a validation set, and a test set for the target operating conditions.
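The preprocessing performed by the acquisition unit (min-max normalization, sliding-window resampling, and division into training, validation, and test sets) can be sketched as follows. The window length, stride, split ratios, random seed, and the synthetic signal are hypothetical; the embodiment only states that preset quantities and proportions are used.

```python
import random

def min_max_normalize(signal):
    """Scale a signal to the range [0, 1] using its minimum and maximum."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0.0 for _ in signal]  # constant signal: avoid division by zero
    return [(v - lo) / (hi - lo) for v in signal]

def sliding_windows(signal, window, stride):
    """Cut a long time series into fixed-length, possibly overlapping samples."""
    return [signal[i:i + window] for i in range(0, len(signal) - window + 1, stride)]

def split_dataset(samples, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle and divide samples into training, validation, and test sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Synthetic stand-in for one vibration acceleration recording.
raw = [float(i % 7) for i in range(100)]
normalized = min_max_normalize(raw)
windows = sliding_windows(normalized, window=20, stride=10)
train, val, test = split_dataset(windows)
```

In practice the same three functions would be applied per operating state and per working condition, with each window carrying the label of the recording it was cut from.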
[0157] Optionally, the step of pre-training the fault diagnosis model using the original operating condition dataset to obtain the optimal parameter set includes:
[0158] The training and validation sets of the original working conditions are simultaneously input into the spatial feature extraction branch and the temporal feature extraction branch. Layer-by-layer feature extraction is performed through convolutional layers, batch normalization layers, ReLU activation layers and max pooling layers to output spatial local features; feature extraction is performed through the two-layer bidirectional gated recurrent unit (BiGRU) to output temporal global features.
[0159] Spatial local features and temporal global features are input into the feature fusion module for adaptive fusion to obtain spatiotemporal fused features;
[0160] The spatiotemporal fusion features are input into the attention module to create three trainable parameter matrices, which are then multiplied by the spatiotemporal fusion features to map the spatiotemporal fusion features to the query, key, and value spaces. Based on the query, key, and value, the attention-weighted features are calculated using the softmax function.
[0161] The attention-weighted features are input into the fully connected layer for linear transformation, and the fault probability distribution matrix is output.
[0162] Select the dimension with the highest probability value in the fault probability distribution matrix, and use the corresponding category label as the model's prediction result;
[0163] Based on the prediction results, calculate the loss function for each round, backpropagate to update all training parameters, and stop training when the maximum number of learning rounds is reached. Find and return the training round that minimizes the loss function, save the values of each parameter in the model structure under that round, and obtain the optimal parameter set of the model under the original working conditions.
[0164] The model's parameters are set to their optimal values; the test set of the original working conditions is input into the fault diagnosis model to obtain the optimal accuracy and confusion matrix of the pre-trained model.
[0165] Optionally, the construction of a transfer model with the same structure as the fault diagnosis model; transferring the optimal parameter set to the transfer model, and freezing all parameters of the CNN layer and BiGRU layer, including:
[0166] Construct an initial transfer model; wherein, the initial transfer model has the same structure as the fault diagnosis model of the spatiotemporal dual-channel network with parallel CNN-BiGRU as the main framework, including: input layer, spatiotemporal feature extraction module, feature fusion layer, self-attention module, fully connected layer and output layer;
[0167] All parameter states in the transfer model are randomly initialized; the optimal parameter set of the original working condition is completely copied into the transfer model as the initialization parameters of the transfer model, forming the initial parameter set for the target working condition.
[0168] Based on the initial parameter set of the target working condition, all parameters in the spatiotemporal feature extraction module of the transfer model are frozen so that they remain at fixed values during model training and do not participate in parameter updates during backpropagation.
[0169] Optionally, the process of freezing all parameters in the spatiotemporal feature extraction module of the transfer model is represented by the following formula (1):
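[0170] $\frac{\partial L_T}{\partial \theta_s} = 0, \quad \frac{\partial L_T}{\partial \theta_t} = 0$ (1)
[0171] where $L_T$ denotes the loss function of the transfer model; $\theta_s$ denotes the parameters of the spatial feature extraction branch of the transfer model; and $\theta_t$ denotes the parameters of the temporal feature extraction branch.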
[0172] Optionally, the training unit 730 is used for:
[0173] The training and validation sets of the target working condition are simultaneously input into the spatial feature extraction branch and the temporal feature extraction branch. Feature extraction is performed layer by layer through convolutional layers, batch normalization layers, ReLU activation layers and max pooling layers to output spatial local features; feature extraction is performed through the two-layer bidirectional gated recurrent unit (BiGRU) to output temporal global features.
[0174] Spatial local features and temporal global features are input into the feature fusion module for adaptive fusion to obtain spatiotemporal fused features;
[0175] The spatiotemporal fusion features are input into the attention module to create three trainable parameter matrices, which are then multiplied by the spatiotemporal fusion features to map the spatiotemporal fusion features to the query, key, and value space. Based on the query, key, and value, the attention-weighted features are calculated using the softmax function.
[0176] The attention-weighted features are input into the fully connected layer for linear transformation, and the fault probability distribution matrix is output.
[0177] Select the dimension with the highest probability value in the fault probability distribution matrix, and use the corresponding category label as the prediction result of the transfer model;
[0178] Based on the prediction results, the transfer model loss function for each round is calculated. Backpropagation updates only the unfrozen parameters, that is, the parameters outside the spatiotemporal feature extraction module. Training stops when the maximum number of learning rounds is reached. The training round that minimizes the transfer model loss function is found and returned. The values of each parameter in the model structure under that round are saved as the optimal parameter set of the transfer model under the target working condition. The trained fault diagnosis model is then output.
[0179] This invention collects raw time-series signals from equipment under various operating conditions and performs normalization and resampling to construct a dataset containing normal operation and various fault states. It then constructs a spatiotemporal dual-channel network with a parallel CNN-BiGRU framework, dynamically fusing spatiotemporal modal features using a self-attention mechanism. The model is pre-trained using a large amount of original operating condition data to obtain the optimal parameter set. Furthermore, this invention constructs a transfer model with the same structure as the pre-trained model, transferring the pre-trained parameters to the transfer model and freezing the parameters of the spatiotemporal feature extraction module. Only the parameters of the remaining modules are fine-tuned, and model training is completed using a small amount of target operating condition data. Using this invention can improve the robustness of cross-operating condition fault diagnosis and the diagnostic accuracy under small sample conditions.
[0180] Figure 8 is a schematic diagram of the structure of a cross-condition fault diagnosis device combining spatiotemporal feature fusion and parameter hierarchical migration provided in an embodiment of the present invention. As shown in Figure 8, this device can include the cross-condition fault diagnosis device shown in Figure 7 above. Optionally, the cross-condition fault diagnosis device 810 combining spatiotemporal feature fusion and parameter hierarchical migration may include a first processor 2001.
[0181] Optionally, the cross-condition fault diagnosis device 810, which combines spatiotemporal feature fusion and parameter hierarchical migration, may also include a memory 2002 and a transceiver 2003.
[0182] The first processor 2001, memory 2002, and transceiver 2003 can be connected via a communication bus.
[0183] The components of the cross-condition fault diagnosis device 810, which combines spatiotemporal feature fusion and parameter hierarchical migration, are described in detail below with reference to Figure 8:
[0184] The first processor 2001 is the control center of the cross-condition fault diagnosis device 810 that combines spatiotemporal feature fusion and parameter hierarchical migration. It can be a single processor or a collective term for multiple processing elements. For example, the first processor 2001 can be one or more central processing units (CPUs), application-specific integrated circuits (ASICs), or one or more integrated circuits configured to implement embodiments of the present invention, such as one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs).
[0185] Optionally, the first processor 2001 can execute various functions of the cross-condition fault diagnosis device 810, which combines spatiotemporal feature fusion and parameter hierarchical migration, by running or executing software programs stored in the memory 2002 and calling data stored in the memory 2002.
[0186] In a specific implementation, as one example, the first processor 2001 may include one or more CPUs, for example CPU0 and CPU1 shown in Figure 8.
[0187] In a specific implementation, as one example, the cross-condition fault diagnosis device 810 combining spatiotemporal feature fusion and parameter hierarchical migration may also include multiple processors, for example the first processor 2001 and the second processor 2004 shown in Figure 8. Each of these processors can be a single-core processor or a multi-core processor. Here, a processor can refer to one or more devices, circuits, and / or processing cores used to process data (such as computer program instructions).
[0188] The memory 2002 is used to store the software program that executes the present invention, and is controlled by the first processor 2001 to execute it. The specific implementation method can be referred to the above method embodiment, and will not be repeated here.
[0189] Optionally, the memory 2002 may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium capable of carrying or storing desired program code in the form of instructions or data structures and accessible by a computer, but is not limited thereto. The memory 2002 may be integrated with the first processor 2001, or may exist independently and be coupled to the first processor 2001 through an interface circuit (not shown in Figure 8) of the cross-condition fault diagnosis device 810, which combines spatiotemporal feature fusion and parameter hierarchical migration; this embodiment of the invention does not specifically limit this.
[0190] The transceiver 2003 is used to communicate with network devices or with terminal devices.
[0191] Optionally, the transceiver 2003 may include a receiver and a transmitter (not shown separately in Figure 8). The receiver is used to implement the receiving function, and the transmitter is used to implement the transmitting function.
[0192] Optionally, the transceiver 2003 can be integrated with the first processor 2001, or can exist independently and be coupled to the first processor 2001 through an interface circuit (not shown in Figure 8) of the cross-condition fault diagnosis device 810 that combines spatiotemporal feature fusion and parameter hierarchical migration; this embodiment of the invention does not specifically limit this.
[0193] It should be noted that the structure of the cross-condition fault diagnosis device 810 combining spatiotemporal feature fusion and parameter hierarchical migration shown in Figure 8 does not constitute a limitation on the device. The actual cross-condition fault diagnosis device combining spatiotemporal feature fusion and parameter hierarchical migration may include more or fewer components than shown, or combine certain components, or have different component arrangements.
[0194] Furthermore, the technical effects of the cross-condition fault diagnosis device 810 combining spatiotemporal feature fusion and parameter hierarchical migration can be referred to the technical effects of the cross-condition fault diagnosis method combining spatiotemporal feature fusion and parameter hierarchical migration described in the above method embodiments, and will not be repeated here.
[0195] It should be understood that the first processor 2001 in the embodiments of the present invention may be a central processing unit (CPU), or it may be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor, or it may be any conventional processor, etc.
[0196] It should also be understood that the memory in the embodiments of the present invention can be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory can be random access memory (RAM), which is used as an external cache. By way of example, but not limitation, many forms of random access memory (RAM) are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate synchronous DRAM (DDR SDRAM), enhanced synchronous DRAM (ESDRAM), synchronous linked DRAM (SLDRAM), and direct rambus RAM (DR RAM).
[0197] The above embodiments can be implemented, in whole or in part, by software, hardware (such as circuits), firmware, or any combination thereof. When implemented using software, the above embodiments can be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, all or part of the processes or functions described in the embodiments of the present invention are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired or wireless (e.g., infrared, radio, microwave, etc.) means. The computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. A semiconductor medium can be a solid-state drive.
[0198] It should be understood that the term "and / or" in this article is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. A and B can be singular or plural. Additionally, the character " / " in this article generally indicates an "or" relationship between the preceding and following related objects, but it can also represent an "and / or" relationship. Please refer to the context for a more accurate understanding.
[0199] In this invention, "at least one" means one or more, and "a plurality" means two or more. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or a plurality of items. For example, at least one of a, b, or c can represent: a, b, c, ab, ac, bc, or abc, where a, b, and c can each be a single item or multiple items.
[0200] It should be understood that, in various embodiments of the present invention, the order of the above-mentioned process numbers does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
[0201] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.
[0202] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the devices, apparatuses, and units described above can be found in the corresponding processes of the foregoing method embodiments, and will not be repeated here.
[0203] In the embodiments provided by this invention, it should be understood that the disclosed devices, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of units is only a logical functional division, and in actual implementation there may be other division methods; multiple units or components may be combined or integrated into another device, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
[0204] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0205] In addition, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.
[0206] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0207] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A cross-condition fault diagnosis method combining spatiotemporal feature fusion and parameter hierarchical migration, characterized in that the method includes:
S1. Obtaining the operating data of the equipment under the original operating condition and the operating data under the target operating condition; normalizing the operating data under the original operating condition and the operating data under the target operating condition to construct the original operating condition dataset and the target operating condition dataset.
S2. Constructing a fault diagnosis model based on a spatiotemporal dual-channel network with parallel CNN-BiGRU as the main framework; pre-training the fault diagnosis model using the original operating condition dataset to obtain the optimal parameter set.
The fault diagnosis model of the spatiotemporal dual-channel network with parallel CNN-BiGRU as the main framework includes: an input layer, a spatiotemporal feature extraction module, a feature fusion layer, a self-attention module, a fully connected layer, and an output layer; wherein the spatiotemporal feature extraction module includes two branches: a spatial feature extraction branch and a temporal feature extraction branch. The spatial feature extraction branch consists of three cascaded one-dimensional CNN layers with a kernel size of 3×1, each of which includes a convolutional layer, a batch normalization layer, a ReLU activation layer, and a max pooling layer. The temporal feature extraction branch consists of two bidirectional gated recurrent unit (BiGRU) layers connected in series, with the hidden layer sizes of each unidirectional GRU being 128 and 64, respectively.
S3. Constructing a transfer model with the same structure as the fault diagnosis model; transferring the optimal parameter set to the transfer model, and freezing all parameters of the CNN layers and BiGRU layers; training the transfer model using the target operating condition dataset, and outputting the trained fault diagnosis model.
Specifically, in S3, constructing the transfer model with the same structure as the fault diagnosis model, transferring the optimal parameter set to the transfer model, and freezing all parameters of the CNN layers and BiGRU layers includes:
constructing an initial transfer model, wherein the initial transfer model has the same structure as the fault diagnosis model of the spatiotemporal dual-channel network with parallel CNN-BiGRU as the main framework, including: an input layer, a spatiotemporal feature extraction module, a feature fusion layer, a self-attention module, a fully connected layer, and an output layer;
randomly initializing all parameters in the transfer model, and then copying the complete optimal parameter set of the original operating condition into the transfer model as its initialization parameters, forming the initial parameter set for the target operating condition;
based on the initial parameter set for the target operating condition, freezing all parameters in the spatiotemporal feature extraction module of the transfer model so that they remain at fixed values during model training and do not participate in parameter updates during backpropagation.
S4. Inputting the operating data of the equipment to be diagnosed into the trained fault diagnosis model and outputting the equipment fault diagnosis results.
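The parameter transfer and freezing described in S3 can be illustrated with a small, framework-free sketch. It is not the patented implementation: the parameter names (`cnn.conv1`, `bigru.layer1`, `fusion.alpha`, `attention.w_q`, `fc.weight`), the prefix-based freezing rule, and the toy parameter sizes are all assumptions introduced for illustration.

```python
import copy
import random

# Assumed naming convention: parameters of the spatiotemporal feature
# extraction module start with "cnn." or "bigru." (hypothetical names).
FROZEN_PREFIXES = ("cnn.", "bigru.")

def init_random(shapes, seed=0):
    """Randomly initialize every parameter vector of a model."""
    rng = random.Random(seed)
    return {name: [rng.uniform(-1, 1) for _ in range(n)] for name, n in shapes.items()}

def transfer_and_freeze(transfer_params, optimal_params):
    """Copy the full optimal parameter set, then flag which parameters stay trainable."""
    for name in transfer_params:
        transfer_params[name] = copy.deepcopy(optimal_params[name])
    # False means frozen: the parameter is excluded from later updates.
    return {name: not name.startswith(FROZEN_PREFIXES) for name in transfer_params}

shapes = {"cnn.conv1": 3, "bigru.layer1": 2, "fusion.alpha": 1,
          "attention.w_q": 2, "fc.weight": 2}
optimal = init_random(shapes, seed=1)    # stands in for the pre-trained optimal set
transfer = init_random(shapes, seed=2)   # randomly initialized transfer model
trainable = transfer_and_freeze(transfer, optimal)
trainable_names = sorted(name for name, flag in trainable.items() if flag)
print(trainable_names)
```

In this sketch only the fusion, attention, and fully connected parameters remain trainable, which mirrors the claim's requirement that the CNN and BiGRU parameters do not participate in updates.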
2. The cross-condition fault diagnosis method combining spatiotemporal feature fusion and parameter hierarchical migration as described in claim 1, characterized in that, in S1, obtaining the operating data of the equipment under the original operating condition and the operating data under the target operating condition, and normalizing the operating data under the original operating condition and the operating data under the target operating condition to construct the original operating condition dataset and the target operating condition dataset, includes:
S11. Setting the equipment to the original operating condition, and operating the equipment in sequence under normal operation and under fault 1 to fault 9; using the vibration acceleration sensor on the equipment to continuously collect the vibration acceleration data of the equipment at a constant sampling frequency, and saving it as time series data to obtain the operating data of the original operating condition; adjusting the equipment to the target operating condition, and repeating the above collection steps to obtain the operating data of the target operating condition.
S12. Normalizing the operating data of the original operating condition and the operating data of the target operating condition separately using min-max normalization to obtain the normalized original operating condition data and the normalized target operating condition data; based on the actual operating status of the equipment, labeling the normalized original operating condition data with the corresponding operating condition and running status, and labeling the normalized target operating condition data with the corresponding operating condition and running status.
S13. Resampling the normalized original operating condition data and the normalized target operating condition data with a sliding window to obtain the processed original operating condition data and the processed target operating condition data.
S14. Based on the operating condition and running status of the equipment, and according to preset quantities and proportions, dividing the processed original operating condition data into a training set, a validation set, and a test set for the original operating condition; and dividing the processed target operating condition data into a training set, a validation set, and a test set for the target operating condition.
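Steps S12 to S14 can be sketched in plain Python as follows. The window length, step size, and split ratios are illustrative assumptions (the claim only speaks of "preset quantities and proportions"), labeling is omitted, and the simple sequential split is a simplification.

```python
def min_max_normalize(signal):
    """Scale a 1-D signal to [0, 1] using its own minimum and maximum (S12)."""
    lo, hi = min(signal), max(signal)
    if hi == lo:                      # constant signal: avoid division by zero
        return [0.0 for _ in signal]
    return [(x - lo) / (hi - lo) for x in signal]

def sliding_windows(signal, window, step):
    """Cut a signal into (possibly overlapping) fixed-length samples (S13)."""
    return [signal[i:i + window] for i in range(0, len(signal) - window + 1, step)]

def split_dataset(samples, ratios=(0.6, 0.2, 0.2)):
    """Split samples into training, validation, and test sets (S14).

    The ratios are assumed values; the test set takes whatever remains.
    """
    n_train = int(len(samples) * ratios[0])
    n_val = int(len(samples) * ratios[1])
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])

# Toy stand-in for vibration acceleration data from one operating state.
signal = [float(i) for i in range(20)]
normalized = min_max_normalize(signal)
windows = sliding_windows(normalized, window=4, step=2)
train, val, test = split_dataset(windows)
print(len(windows), len(train), len(val), len(test))
```

With overlapping windows, a sequential split like this one can place nearly identical segments in different subsets, so a real pipeline would likely split per recording or per operating state before windowing.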
3. The cross-condition fault diagnosis method combining spatiotemporal feature fusion and parameter hierarchical migration as described in claim 1, characterized in that, in S2, pre-training the fault diagnosis model using the original operating condition dataset to obtain the optimal parameter set includes:
S21. Inputting the training set and validation set of the original operating condition into the spatial feature extraction branch and the temporal feature extraction branch at the same time; performing feature extraction layer by layer through the convolutional layer, batch normalization layer, ReLU activation layer, and max pooling layer to output spatial local features; and performing feature extraction through the two-layer bidirectional gated recurrent unit (BiGRU) to output temporal global features.
S22. Inputting the spatial local features and temporal global features into the feature fusion module for adaptive fusion to obtain spatiotemporal fused features.
S23. Inputting the spatiotemporal fused features into the attention module, in which three trainable parameter matrices are created and each is multiplied with the spatiotemporal fused features to map them to the query, key, and value spaces; based on the query, key, and value, calculating the attention-weighted features through the softmax function.
S24. Inputting the attention-weighted features into the fully connected layer for linear transformation, and outputting the fault probability distribution matrix.
S25. Selecting the dimension with the largest probability value in the fault probability distribution matrix, and using the corresponding category label as the prediction result of the model.
S26. Based on the prediction results, calculating the loss function for each training round, backpropagating to update all trainable parameters, and stopping training when the maximum number of training rounds is reached; finding the training round that minimizes the loss function, and saving the values of each parameter in the model structure at that round to obtain the optimal parameter set of the model under the original operating condition.
S27. Setting the parameters of the model to the optimal parameter set; inputting the test set of the original operating condition into the fault diagnosis model to obtain the optimal accuracy and confusion matrix of the pre-trained model.
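The attention and prediction steps (S23 to S25) can be sketched as below. The claim only states that attention-weighted features are computed from the query, key, and value through softmax; the scaling by the square root of the key dimension, the identity projection matrices, and the softmax over fully connected outputs are assumptions borrowed from standard scaled dot-product attention, not details confirmed by the text.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(features, w_q, w_k, w_v):
    """Map fused features to query/key/value and return attention-weighted features."""
    q, k, v = matmul(features, w_q), matmul(features, w_k), matmul(features, w_v)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]           # transpose of k
    scores = matmul(q, k_t)                        # pairwise query-key dot products
    # Assumption: scale by sqrt(d) before softmax, as in standard attention.
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights

def predict(logits):
    """Turn fully connected outputs into probabilities and pick the top class."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs

fused = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]       # toy spatiotemporal fused features
identity = [[1.0, 0.0], [0.0, 1.0]]                # placeholder projection matrices
attended, weights = self_attention(fused, identity, identity, identity)
label, probs = predict([0.1, 2.0, -1.0])
print(label)
```

In a trained model the three projection matrices would be learned parameters rather than identities; they are fixed here only so the example stays self-contained.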
4. The cross-condition fault diagnosis method combining spatiotemporal feature fusion and parameter hierarchical migration according to claim 1, characterized in that the process of freezing all parameters in the spatiotemporal feature extraction module of the transfer model is represented by the following formula (1):
[Formula (1) and its symbols are not reproduced in this text.]
where the symbols of formula (1) denote, in order: the loss function of the transfer model; the parameters of the spatial feature extraction branch of the transfer model; and the parameters of the temporal feature extraction branch of the transfer model.
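Because formula (1) itself does not survive in this text, the following is only one plausible reading, consistent with the three quantities the claim defines and with the requirement in claim 1 that frozen parameters "do not participate in parameter updates during backpropagation". The symbols $L_T$, $\theta_{\mathrm{CNN}}$, and $\theta_{\mathrm{BiGRU}}$ are assumed names, not the patent's own notation.

```latex
% Assumed notation: L_T is the transfer model loss, \theta_{CNN} and
% \theta_{BiGRU} are the parameters of the spatial and temporal branches.
\begin{equation}
  \frac{\partial L_T}{\partial \theta_{\mathrm{CNN}}} = 0,
  \qquad
  \frac{\partial L_T}{\partial \theta_{\mathrm{BiGRU}}} = 0
  \tag{1}
\end{equation}
```

Read this way, the formula states that no gradient of the transfer model loss is applied to either branch, so both parameter sets keep the values copied from the original operating condition.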
5. The cross-condition fault diagnosis method combining spatiotemporal feature fusion and parameter hierarchical migration according to claim 1, characterized in that, in S3, training the transfer model using the target operating condition dataset and outputting the trained fault diagnosis model includes:
S31. Inputting the training set and validation set of the target operating condition into the spatial feature extraction branch and the temporal feature extraction branch at the same time; performing feature extraction layer by layer through the convolutional layer, batch normalization layer, ReLU activation layer, and max pooling layer to output spatial local features; and performing feature extraction through the two-layer bidirectional gated recurrent unit (BiGRU) to output temporal global features.
S32. Inputting the spatial local features and temporal global features into the feature fusion module for adaptive fusion to obtain spatiotemporal fused features.
S33. Inputting the spatiotemporal fused features into the attention module, in which three trainable parameter matrices are created and each is multiplied with the spatiotemporal fused features to map them to the query, key, and value spaces; based on the query, key, and value, calculating the attention-weighted features through the softmax function.
S34. Inputting the attention-weighted features into the fully connected layer for linear transformation, and outputting the fault probability distribution matrix.
S35. Selecting the dimension with the largest probability value in the fault probability distribution matrix, and using the corresponding category label as the prediction result of the transfer model.
S36. Based on the prediction results, calculating the transfer model loss function for each training round, backpropagating to update only the unfrozen parameters, that is, the parameters outside the spatiotemporal feature extraction module, and stopping training when the maximum number of training rounds is reached; finding the training round that minimizes the transfer model loss function, saving the values of each parameter in the model structure at that round as the optimal parameter set of the transfer model under the target operating condition, and outputting the trained fault diagnosis model.
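The training logic of S36 (update only unfrozen parameters, run to the maximum number of rounds, keep the parameters from the round with the lowest loss) can be sketched with a toy model. The scalar parameters, the quadratic loss, the plain gradient-descent update, and the names `bigru.w` and `fc.w` are all hypothetical; the claim does not specify the optimizer or the loss.

```python
import copy

def train_with_best_checkpoint(params, loss_fn, grad_fn, trainable, max_epochs, lr=0.1):
    """Run all epochs, skipping frozen parameters, and keep the lowest-loss snapshot."""
    best_loss, best_params, best_epoch = float("inf"), None, -1
    for epoch in range(max_epochs):
        grads = grad_fn(params)
        for name in params:
            if trainable[name]:                    # frozen parameters are never updated
                params[name] -= lr * grads[name]
        loss = loss_fn(params)
        if loss < best_loss:                       # remember the best training round
            best_loss, best_epoch = loss, epoch
            best_params = copy.deepcopy(params)
    return best_params, best_epoch, best_loss

# Toy loss: both parameters have non-zero gradients, but only fc.w may move.
def loss_fn(p):
    return (p["fc.w"] - 1.0) ** 2 + p["bigru.w"] ** 2

def grad_fn(p):
    return {"fc.w": 2.0 * (p["fc.w"] - 1.0), "bigru.w": 2.0 * p["bigru.w"]}

params = {"bigru.w": 5.0, "fc.w": 4.0}
trainable = {"bigru.w": False, "fc.w": True}       # bigru.w stands in for a frozen layer
best_params, best_epoch, best_loss = train_with_best_checkpoint(
    params, loss_fn, grad_fn, trainable, max_epochs=20)
print(best_epoch, best_params)
```

Tracking the best round separately from the final round matters when the loss fluctuates late in training; in this monotone toy example the two coincide.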
6. A cross-condition fault diagnosis device combining spatiotemporal feature fusion and parameter hierarchical migration, used to implement the cross-condition fault diagnosis method combining spatiotemporal feature fusion and parameter hierarchical migration as described in any one of claims 1-5, characterized in that the device includes:
an acquisition unit, used to obtain the operating data of the equipment under the original operating condition and the operating data under the target operating condition, and to normalize the operating data under the original operating condition and the operating data under the target operating condition to construct the original operating condition dataset and the target operating condition dataset;
a building unit, used to construct a fault diagnosis model based on a spatiotemporal dual-channel network with parallel CNN-BiGRU as the main framework, and to pre-train the fault diagnosis model using the original operating condition dataset to obtain the optimal parameter set;
a training unit, used to construct a transfer model with the same structure as the fault diagnosis model, transfer the optimal parameter set to the transfer model, freeze all parameters of the CNN layers and BiGRU layers, train the transfer model using the target operating condition dataset, and output the trained fault diagnosis model;
a diagnostic unit, used to input the operating data of the equipment to be diagnosed into the trained fault diagnosis model and output the equipment fault diagnosis results.
7. A cross-condition fault diagnosis device combining spatiotemporal feature fusion and parameter hierarchical migration, characterized in that the device includes: a processor; and a memory storing computer-readable instructions that, when executed by the processor, implement the method as described in any one of claims 1 to 5.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program code that can be invoked by a processor to execute the method as described in any one of claims 1 to 5.