A multi-modal robust intelligent fault diagnosis method and device for resource-constrained edge industrial equipment

By combining SimSiam cross-modal self-supervised pre-training with quantization-aware adversarial fine-tuning, a lightweight fault diagnosis model is generated that addresses the difficulty of achieving both high accuracy and robustness on edge devices in resource-constrained, harsh environments, enabling efficient fault diagnosis.

CN121901871B · Active · Publication Date: 2026-06-19 · NANJING UNIV OF POSTS & TELECOMM

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: NANJING UNIV OF POSTS & TELECOMM
Filing Date: 2026-03-26
Publication Date: 2026-06-19

IPC: G06F18/2413; G06F18/214; G06F18/25; G06N3/0455; G06N3/0895; G06N3/084; G06F18/2415

AI Tagging

Application Domain

Biological models

Technology Topics

Industrial equipment; Characteristic space

AI Technical Summary

Technical Problem

Existing deep learning models struggle to achieve high accuracy and robustness on resource-constrained edge devices, especially in harsh industrial environments where they are susceptible to electromagnetic interference and noise, and the scarcity of labeled data leads to a decline in model performance.

Method used

We employ the SimSiam cross-modal self-supervised pre-trained model, combined with a feature fusion layer, quantization module, and adversarial attack module. Through pre-training and fine-tuning on a multimodal dataset, we generate a lightweight fault diagnosis model that adapts to resource constraints at the edge and enhances robustness.

Benefits of technology

Achieving high-precision fault diagnosis in resource-constrained and interference-prone environments reduces reliance on labeled data, improves model stability under strong noise and electromagnetic interference, and significantly enhances diagnostic accuracy.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention discloses a multimodal robust intelligent fault diagnosis method and device for resource-constrained edge industrial equipment, belonging to the fields of industrial Internet of Things and artificial intelligence. The method first constructs a SimSiam cross-modal self-supervised pre-trained model that extracts modality-invariant features from one-dimensional vibration signals and two-dimensional time-frequency images in massive unlabeled data, thus addressing the problem of scarce labeled data. Subsequently, a quantize-then-attack strategy is introduced in the fine-tuning stage: first, the fused features are discretized into low-bit quantized features; then, adversarial perturbation samples are generated in the low-bit quantized feature space; these samples are used to update the classifier, achieving model fine-tuning. While keeping the model lightweight enough for edge computing resource constraints, this invention addresses the poor anti-interference capability of quantized models in discrete feature spaces, improving the robustness and stability of the diagnostic system in noisy industrial environments.

Description

Technical Field

[0001] This invention belongs to the field of industrial Internet of Things and artificial intelligence technology, specifically relating to a multimodal robust intelligent fault diagnosis method and device for resource-constrained edge industrial equipment.

Background Technology

[0002] With the development of Industry 4.0, deep learning-based fault diagnosis technology has been widely applied in industrial equipment maintenance. However, in real-world edge computing scenarios, existing diagnostic methods face severe challenges. First, traditional deep learning models rely on massive amounts of labeled data; while industrial field data is abundant, high-quality data with fault labels is extremely scarce and labeling costs are high. Second, to adapt to the limited computing resources and storage space at the edge (such as embedded sensors and gateways), models must undergo quantization compression (e.g., converting from 32-bit floating-point to 8-bit integer). However, traditional "train first, quantize later" approaches or simple quantization-aware training often lead to a significant decrease in the model's ability to resist interference. Furthermore, industrial environments are harsh, with strong electromagnetic interference and sensor noise. Existing defense methods are mostly designed for the full-precision space; when the model is quantized to a discrete space, the original defense mechanisms often fail due to the "gradient mismatch" problem, causing the diagnostic system to misjudge easily under harsh conditions. Therefore, there is an urgent need for a diagnostic method that can solve the data labeling problem while simultaneously meeting the requirements of limited edge resources and robustness against interference.

Summary of the Invention

[0003] To overcome the shortcomings of existing technologies in achieving both high precision and high robustness in resource-constrained edge environments, this invention provides a multimodal robust intelligent fault diagnosis method and device for resource-constrained edge industrial equipment.

[0004] To achieve the above objectives, the technical solution adopted by the present invention is as follows:

[0005] In a first aspect, the present invention provides a multimodal robust intelligent fault diagnosis method for resource-constrained edge industrial equipment, comprising:

[0006] Operational data of industrial equipment is collected to construct a multimodal dataset containing one-dimensional vibration signals and two-dimensional time-frequency images, which is divided into an unlabeled pre-training set and a labeled fine-tuning set; the labeled fine-tuning set is labeled with the normal state and different fault categories of the industrial equipment.

[0007] A SimSiam cross-modal self-supervised pre-trained model is constructed, which includes a two-stream encoder with shared weights, a projection head, and a prediction head; the two-stream encoder includes a temporal encoder and an image encoder.

[0008] The unlabeled pre-training set is used to pre-train the dual-stream encoder in the SimSiam cross-modal self-supervised pre-training model.

[0009] A feature fusion layer, a quantization module, and an adversarial attack module are introduced after a pre-trained two-stream encoder, and a fault classifier is cascaded after the quantization module to obtain a quantization-aware adversarial fine-tuning model. The feature fusion layer is used to fuse the one-dimensional vibration signal features output by the pre-trained two-stream encoder with the two-dimensional time-frequency image features to obtain fused features. The quantization module is used to discretize the fused features into low-bit quantization features. The adversarial attack module is used to generate adversarial perturbations in the discrete quantization feature space. The fault classifier is used to output the predicted probabilities of different fault categories.

[0010] The labeled fine-tuning set is used to perform collaborative fine-tuning training of the quantization-aware adversarial fine-tuning model by first quantizing and then attacking, and only the parameters of the fault classifier are updated to finally obtain the trained lightweight model.

[0011] The trained lightweight model is deployed to resource-constrained edge computing nodes to collect real-time operating data of industrial equipment and output fault category diagnosis results based on the lightweight model.

[0012] Preferably, collecting operational data of industrial equipment to construct a multimodal dataset comprising one-dimensional vibration signals and two-dimensional time-frequency images includes:

[0013] Vibration signal data of industrial equipment under different health conditions are collected and used as the first modality input; the vibration signal data are converted into two-dimensional time-frequency images using continuous wavelet transform and used as the second modality input; finally, a multimodal dataset containing one-dimensional vibration signals and two-dimensional time-frequency images is constructed.
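The conversion of a one-dimensional vibration signal into a two-dimensional time-frequency image can be sketched as follows. This is a minimal illustration only: the patent text names continuous wavelet transform but not the wavelet, so the complex Morlet wavelet, its centre frequency `w0 = 6.0`, the toy 50 Hz tone, the 1 kHz sampling rate, and the function names `morlet` and `cwt_scalogram` are all assumptions.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet (normalisation constant omitted)."""
    return cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt_scalogram(signal, scales, fs=1.0, w0=6.0):
    """Return |CWT| as a list of rows, one row per scale (a 2-D time-frequency map)."""
    n = len(signal)
    dt = 1.0 / fs
    image = []
    for s in scales:
        row = []
        for b in range(n):                       # translation (time index)
            acc = 0j
            for k in range(n):                   # direct correlation with the scaled wavelet
                acc += signal[k] * morlet((k - b) * dt / s, w0).conjugate()
            row.append(abs(acc) * dt / math.sqrt(s))
        image.append(row)
    return image


# Toy 1-D "vibration" segment: a 50 Hz tone sampled at 1 kHz (hypothetical values).
fs = 1000.0
x_1d = [math.sin(2 * math.pi * 50.0 * k / fs) for k in range(128)]

# A Morlet wavelet responds most strongly at scale s ~ w0 / (2 * pi * f).
freqs = [25.0, 50.0, 100.0]
scales = [6.0 / (2 * math.pi * f) for f in freqs]
x_2d = cwt_scalogram(x_1d, scales, fs=fs)

# The row tuned to 50 Hz carries the most energy for this toy signal.
row_energy = [sum(row) for row in x_2d]
strongest = freqs[row_energy.index(max(row_energy))]
print(strongest)
```

A practical pipeline would use many more scales and an FFT-based implementation; the direct double loop here is chosen only to keep the definition visible.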

[0014] Preferably, pre-training the dual-stream encoder in the SimSiam cross-modal self-supervised pre-training model using the unlabeled pre-training set includes:

[0015] For the same input sample in the unlabeled pre-training set, vibration features $z_{1D}$ are extracted through the temporal encoder, and time-frequency features $z_{2D}$ are extracted through the image encoder;

[0016] Using the prediction head $h(\cdot)$, the one-dimensional vibration features are mapped to the two-dimensional time-frequency modality, expressed as $p_{1D \to 2D} = h(z_{1D})$, and the two-dimensional time-frequency features are mapped to the one-dimensional vibration modality, expressed as $p_{2D \to 1D} = h(z_{2D})$;

[0017] Constructing an asymmetric loss function $\mathcal{L}$:

[0018] $\mathcal{L} = \tfrac{1}{2}\,\mathcal{D}\big(p_{1D \to 2D}, \operatorname{stopgrad}(z_{2D})\big) + \tfrac{1}{2}\,\mathcal{D}\big(p_{2D \to 1D}, \operatorname{stopgrad}(z_{1D})\big)$,

[0019] where $\mathcal{D}(\cdot,\cdot)$ represents the negative cosine similarity, $\operatorname{stopgrad}(\cdot)$ indicates the stop-gradient operation, $p_{1D \to 2D}$ denotes the one-dimensional vibration features mapped to the two-dimensional time-frequency modality, and $p_{2D \to 1D}$ denotes the two-dimensional time-frequency features mapped to the one-dimensional vibration modality;

[0020] By continuously minimizing the asymmetric loss function $\mathcal{L}$, iterative training is performed on the unlabeled pre-training set until convergence, resulting in a pre-trained two-stream encoder.
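The cross-modal loss described above can be illustrated numerically. The sketch below only evaluates the loss value; the stop-gradient operation affects gradients, not values, so it appears here only as a comment. The equal 1/2 weighting of the two terms follows the original SimSiam formulation and is an assumption about this patent, and the function names and three-dimensional toy features are hypothetical.

```python
import math


def neg_cosine(p, z):
    """D(p, z) = -cos(p, z); z is treated as a constant target (stop-gradient)."""
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    return -dot / (norm_p * norm_z)


def cross_modal_loss(p_1d_to_2d, z_2d, p_2d_to_1d, z_1d):
    """Symmetric SimSiam-style loss between the two modalities."""
    return 0.5 * neg_cosine(p_1d_to_2d, z_2d) + 0.5 * neg_cosine(p_2d_to_1d, z_1d)


# Hypothetical 3-D features for one sample.
z_1d = [1.0, 0.0, 0.0]        # temporal-encoder feature
z_2d = [0.0, 1.0, 0.0]        # image-encoder feature

# Predictions pointing in the same direction as the other modality give the minimum, -1.
aligned = cross_modal_loss([0.0, 2.0, 0.0], z_2d, [3.0, 0.0, 0.0], z_1d)
# Predictions orthogonal to the targets give a loss of 0.
orthogonal = cross_modal_loss([1.0, 0.0, 0.0], z_2d, [0.0, 1.0, 0.0], z_1d)
print(aligned, orthogonal)
```

Because the loss depends only on direction, not magnitude, minimizing it pulls the predicted feature of one modality toward the direction of the other modality's feature.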

[0021] Preferably, the temporal encoder employs a one-dimensional convolutional neural network; the image encoder employs a Vision Transformer.

[0022] Preferably, performing collaborative fine-tuning training of the quantization-aware adversarial fine-tuning model in a quantize-then-attack manner using the labeled fine-tuning set includes:

[0023] During fine-tuning training, the parameters of the pre-trained dual-stream encoder are kept frozen, and samples from the labeled fine-tuning set are input into the quantization-aware adversarial fine-tuning model to obtain the fused features $F$ generated by the feature fusion layer;

[0024] Using the quantization function $Q(\cdot)$, the fused features $F$ are discretized into low-bit quantized features $F_q$:

[0025] $F_q = Q(F) = s \cdot \operatorname{round}\!\big(\operatorname{clip}(F, -\alpha, \alpha) / s\big)$,

[0026] where $s$ is the quantization scaling factor, $\operatorname{round}(\cdot)$ is the rounding function, $\operatorname{clip}(\cdot)$ is the truncation function, and $\alpha$ is the truncation threshold;

[0027] Based on the quantized features $F_q$, the projected gradient descent method is used to generate an adversarial perturbation $\delta$, obtaining the adversarial example $F_{adv}$:

[0028] $F_{adv} = F_q + \delta$,

[0029] The adversarial examples are used to update only the parameters of the fault classifier, resulting in the trained quantized model.

[0030] Preferably, the cross-entropy loss function is used during the fine-tuning training process.

[0031] Preferably, during the fine-tuning training process, the fault classifier parameters are updated by backpropagating gradients through the straight-through estimator.
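The quantization function used in the fine-tuning steps above can be illustrated with a uniform symmetric quantizer. This is a minimal sketch under stated assumptions: the patent text does not say how the scaling factor is chosen, so deriving it from the truncation threshold and bit width as `alpha / (2**(num_bits - 1) - 1)` is an assumption, and the names `quantize`, `num_bits`, and `alpha` are hypothetical.

```python
def quantize(features, num_bits=8, alpha=1.0):
    """Uniform symmetric quantizer: clip to [-alpha, alpha], round to the grid, rescale."""
    levels = 2 ** (num_bits - 1) - 1             # e.g. 127 positive levels for 8 bits
    scale = alpha / levels                       # quantization scaling factor (assumed rule)
    out = []
    for f in features:
        clipped = max(-alpha, min(alpha, f))     # truncation function
        out.append(round(clipped / scale) * scale)   # rounding function, then rescale
    return out


# 3-bit example with alpha = 3.0, so the grid is {-3, -2, -1, 0, 1, 2, 3}.
f_q = quantize([0.2, 1.4, 7.5, -2.6], num_bits=3, alpha=3.0)
print(f_q)   # -> [0.0, 1.0, 3.0, -3.0]
```

The value 7.5 is first truncated to the threshold 3.0, while the in-range values snap to the nearest grid point, which is the discretization the later attack step operates on.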

[0032] In a second aspect, the present invention provides a multimodal robust intelligent fault diagnosis device for resource-constrained edge industrial equipment, used to implement the above-mentioned multimodal robust intelligent fault diagnosis method for resource-constrained edge industrial equipment, the device comprising:

[0033] The data acquisition module is used to collect operating data of industrial equipment, construct a multimodal dataset containing one-dimensional vibration signals and two-dimensional time-frequency images, and divide it into an unlabeled pre-training set and a labeled fine-tuning set; the labeled fine-tuning set is labeled with the normal state and different fault categories of the industrial equipment.

[0034] The first model building module is used to build a SimSiam cross-modal self-supervised pre-trained model, which includes a two-stream encoder with shared weights, a projection head, and a prediction head; the two-stream encoder includes a temporal encoder and an image encoder.

[0035] The pre-training module is used to pre-train the two-stream encoder in the SimSiam cross-modal self-supervised pre-training model using the unlabeled pre-training set.

[0036] The second model construction module is used to introduce a feature fusion layer, a quantization module, and an adversarial attack module after the pre-trained dual-stream encoder, and to cascade a fault classifier after the quantization module to obtain a quantization-aware adversarial fine-tuning model. The feature fusion layer is used to fuse the one-dimensional vibration signal features output by the pre-trained dual-stream encoder with the two-dimensional time-frequency image features to obtain fused features. The quantization module is used to discretize the fused features into low-bit quantization features. The adversarial attack module is used to generate adversarial perturbations in the discrete quantization feature space. The fault classifier is used to output the predicted probabilities of different fault categories.

[0037] The fine-tuning module is used to perform collaborative fine-tuning training of the quantization-aware adversarial fine-tuning model by first quantizing and then attacking using the labeled fine-tuning set, and only updates the parameters of the fault classifier to finally obtain the trained lightweight model.

[0038] The prediction output module is used to deploy the trained lightweight model to resource-constrained edge computing nodes, collect real-time operating data of industrial equipment, and output fault category diagnosis results based on the lightweight model.

[0039] In a third aspect, the present invention provides a computer-readable storage medium for storing one or more programs, the one or more programs including instructions that, when executed by a computing device, cause the computing device to perform any of the above-described multimodal robust intelligent fault diagnosis methods for resource-constrained edge industrial equipment.

[0040] In a fourth aspect, the present invention provides a computing device comprising one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the above-described multimodal robust intelligent fault diagnosis methods for resource-constrained edge industrial equipment.

[0041] The beneficial effects of this invention are as follows:

[0042] This invention utilizes SimSiam cross-modal self-supervised technology to learn high-quality features from massive datasets without requiring a large number of labels, reducing reliance on manual annotation. Model quantization significantly reduces storage and computational overhead, enabling it to run on resource-constrained, low-power industrial chips. It innovatively combines quantization with adversarial attacks, solving the problem of robustness loss after quantization in traditional methods. This method transforms quantization from "passive compression" to "active defensive constraints," significantly improving the stability of the diagnostic system in environments with strong noise and electromagnetic interference.

Attached Figure Description

[0043] Figure 1 is a schematic diagram of the overall process of the multimodal robust intelligent fault diagnosis method for resource-constrained edge industrial equipment provided by the present invention;

[0044] Figure 2 is a schematic diagram of the SimSiam cross-modal self-supervised pre-training model structure provided by the present invention;

[0045] Figure 3 is a schematic diagram of multimodal data provided in an embodiment of the present invention;

[0046] Figure 4 is a schematic diagram comparing the balanced accuracy of different self-supervised methods in a few-sample scenario in an embodiment of the present invention;

[0047] Figure 5 is a schematic diagram showing the performance comparison results of the defense modes in the ablation experiment in an embodiment of the present invention.

Detailed Implementation

[0048] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the embodiments and accompanying drawings. Here, the illustrative embodiments and descriptions of this invention are used to explain the invention, but are not intended to limit the invention.

[0049] It should also be noted that, in order to avoid obscuring the invention with unnecessary details, only the structures and / or processing steps closely related to the solution according to the invention are shown in the accompanying drawings, while other details that are not closely related to the invention are omitted.

[0050] It should be emphasized that the term "including / comprises" as used herein refers to the presence of a feature, element, step, or component, but does not exclude the presence or addition of one or more other features, elements, steps, or components.

[0051] It should also be noted that, unless otherwise specified, the term "connection" in this article can refer not only to a direct connection, but also to an indirect connection involving an intermediary.

[0052] In the following description, embodiments of the invention will be illustrated with reference to the accompanying drawings. In the drawings, the same reference numerals represent the same or similar parts, or the same or similar steps.

[0053] It should be emphasized here that the step markers mentioned below are not a limitation on the order of the steps, but should be understood as meaning that the steps can be executed in the order mentioned in the embodiments, or in a different order than in the embodiments, or several steps can be executed simultaneously.

[0054] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of this application.

[0055] This invention provides a multimodal robust intelligent fault diagnosis method for resource-constrained edge industrial equipment. As shown in Figure 1, it includes the following steps:

[0056] S1. Collect multimodal data from industrial equipment.

[0057] Specifically, operational data from industrial equipment is collected to construct a multimodal fault dataset containing one-dimensional vibration signals and two-dimensional time-frequency images, which is then divided into an unlabeled pre-training set and a labeled fine-tuning set. The labeled fine-tuning set is labeled with the normal state of the equipment and the different fault categories, for subsequent supervised fine-tuning and fault classification.

[0058] S2. Construct and pre-train a SimSiam cross-modal self-supervised pre-trained model.

[0059] SimSiam (Simple Siamese) is a simple Siamese self-supervised learning architecture that requires no negative samples and prevents representation collapse through an asymmetric predictor and a stop-gradient operation. The model comprises a two-stream encoder with shared weights, a projection head, and a prediction head; the two-stream encoder includes a temporal encoder and an image encoder.

[0060] The dual-stream encoder in the constructed SimSiam cross-modal self-supervised pre-training model is pre-trained using the unlabeled pre-training set. By maximizing the cosine similarity between one-dimensional vibration signal features and two-dimensional time-frequency image features, modality-invariant features reflecting the health status of the equipment are extracted.

[0061] S3. Construct a quantization-aware adversarial fine-tuning architecture for resource-constrained edge devices.

[0062] Specifically, a feature fusion layer, a quantization module, and an adversarial attack module are introduced after the pre-trained dual-stream encoder, and a fault classifier is cascaded after the quantization module to form a quantization-aware adversarial fine-tuning model. The feature fusion layer performs cross-modal interaction between the one-dimensional vibration signal features output by the dual-stream encoder and the two-dimensional time-frequency image features, outputting full-precision fused features through channel concatenation. The quantization module discretizes the fused features into low-bit quantized features. The adversarial attack module takes the low-bit discrete quantized features output by the quantization module as input and generates adversarial perturbations within the discrete quantized feature space. The fault classifier receives the low-bit quantized features and adversarial perturbation samples and maps them to predicted probabilities for various fault categories.

[0063] S4. Execute the Quantize-then-Attack strategy.

[0064] Specifically, the labeled fine-tuning set constructed by S1 is used to perform supervised fine-tuning training on the fault classifier in the aforementioned quantization-aware adversarial fine-tuning model. During the fine-tuning training process, the parameters of the dual-stream encoder pre-trained by S2 are kept frozen. Samples from the labeled fine-tuning set are input into the aforementioned quantization-aware adversarial fine-tuning model to obtain full-precision fused features generated by the feature fusion layer. These features are then discretized into low-bit quantized features by the quantization module, and adversarial perturbation samples are generated in this quantization feature space by the adversarial attack module. The predicted probabilities of different fault categories are output by the fault classifier. The cross-entropy loss is calculated by combining the corresponding real fault category labels in the labeled fine-tuning set. Backpropagation is used to update only the parameters of the fault classifier, ultimately obtaining the trained lightweight model.

[0065] S5. Deploy the trained lightweight model to a resource-constrained edge computing environment for diagnostics.

[0066] Specifically, the system collects real-time operating data from industrial equipment and outputs fault category diagnostic results based on this lightweight model.

[0067] In step S2 of this invention, the pre-training process of the SimSiam cross-modal self-supervised pre-training model is as follows:

[0068] For the same input sample, vibration features $z_{1D}$ are extracted through the temporal encoder, and time-frequency features $z_{2D}$ are extracted through the image encoder;

[0069] Using the prediction head $h(\cdot)$, the projection features of the one-dimensional vibration modality are mapped to the two-dimensional time-frequency modality, expressed as $p_{1D \to 2D} = h(z_{1D})$, and the projection features of the two-dimensional time-frequency modality are mapped to the one-dimensional vibration modality, expressed as $p_{2D \to 1D} = h(z_{2D})$. An asymmetric loss function $\mathcal{L}$ is constructed:

[0070] $\mathcal{L} = \tfrac{1}{2}\,\mathcal{D}\big(p_{1D \to 2D}, \operatorname{stopgrad}(z_{2D})\big) + \tfrac{1}{2}\,\mathcal{D}\big(p_{2D \to 1D}, \operatorname{stopgrad}(z_{1D})\big)$,

[0071] In the formula, $p_{1D \to 2D}$ denotes the one-dimensional vibration features mapped to the two-dimensional time-frequency modality, $p_{2D \to 1D}$ denotes the two-dimensional time-frequency features mapped to the one-dimensional vibration modality, $\mathcal{D}(\cdot,\cdot)$ represents the negative cosine similarity, and $\operatorname{stopgrad}(\cdot)$ indicates the stop-gradient operation, which prevents model collapse.

[0072] The constructed SimSiam cross-modal self-supervised pre-training model is pre-trained using the unlabeled pre-training set. During iterative training, continuously minimizing the loss function $\mathcal{L}$ (i.e., maximizing the cosine similarity of cross-modal features) forces the dual-stream encoder to filter out the noise specific to each modality, ultimately causing the features output by the dual-stream encoder to converge to modality-invariant features that reflect the health status of the equipment.

[0073] In step S4 of this invention, a quantize-then-attack strategy is executed, and the specific process is as follows:

[0074] S41. Feature quantization: Using the quantization function $Q(\cdot)$, the fused features $F$ are discretized into low-bit quantized features $F_q$:

[0075] $F_q = Q(F) = s \cdot \operatorname{round}\!\big(\operatorname{clip}(F, -\alpha, \alpha) / s\big)$,

[0076] In the formula, $s$ is the quantization scaling factor, $\operatorname{round}(\cdot)$ is the rounding function, $\operatorname{clip}(\cdot)$ is the truncation function, and $\alpha$ is the truncation threshold;

[0077] S42. Quantization-space adversarial attack: Based on the quantized features $F_q$, the Projected Gradient Descent (PGD) algorithm is used to generate an adversarial perturbation $\delta$, obtaining the adversarial example $F_{adv}$:

[0078] $F_{adv} = F_q + \delta$,

[0079] S43. Use the adversarial examples to train the fault classifier. During the training process, gradients are backpropagated through the Straight-Through Estimator (STE) to update the fault classifier parameters, so that the model is robust to disturbances in the quantization space.
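Steps S41 to S43 can be sketched end to end on a toy problem. The example below is an illustration under several assumptions, not the patented implementation: the fault classifier is reduced to a two-class linear softmax layer, the PGD variant uses sign-gradient steps projected onto an L-infinity ball, the perturbed feature is not re-projected onto the quantization grid, and the values of `eps`, `step`, `iters`, and `lr` as well as all function names are hypothetical.

```python
import math


def sign(g):
    return (g > 0) - (g < 0)


def forward(W, b, f):
    """Class probabilities of a linear softmax classifier."""
    logits = [sum(w * x for w, x in zip(row, f)) + bi for row, bi in zip(W, b)]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(W, b, f, label):
    return -math.log(forward(W, b, f)[label])


def pgd_perturbation(W, b, f_q, label, eps=0.1, step=0.025, iters=10):
    """Find delta with |delta_i| <= eps that increases the loss at f_q + delta."""
    delta = [0.0] * len(f_q)
    for _ in range(iters):
        f_adv = [x + d for x, d in zip(f_q, delta)]
        err = forward(W, b, f_adv)
        err[label] -= 1.0                            # dCE/dlogits = p - onehot
        grad = [sum(err[c] * W[c][j] for c in range(len(W))) for j in range(len(f_q))]
        # Ascent step on the loss, then projection back onto the eps-ball.
        delta = [max(-eps, min(eps, d + step * sign(g))) for d, g in zip(delta, grad)]
    return delta


def classifier_step(W, b, f_adv, label, lr=0.5):
    """One SGD step that updates only the classifier parameters (in place)."""
    err = forward(W, b, f_adv)
    err[label] -= 1.0
    for c in range(len(W)):
        for j in range(len(f_adv)):
            W[c][j] -= lr * err[c] * f_adv[j]
        b[c] -= lr * err[c]


W = [[0.5, -0.2], [-0.3, 0.4]]       # toy classifier weights
b = [0.0, 0.0]
f_q = [0.5, 0.25]                    # an already-quantized fused feature (hypothetical)
label = 0

delta = pgd_perturbation(W, b, f_q, label)               # S42: attack in quantized space
f_adv = [x + d for x, d in zip(f_q, delta)]
loss_clean = cross_entropy(W, b, f_q, label)
loss_adv_before = cross_entropy(W, b, f_adv, label)
classifier_step(W, b, f_adv, label)                      # S43: train on the adversarial example
loss_adv_after = cross_entropy(W, b, f_adv, label)
print(loss_clean, loss_adv_before, loss_adv_after)
```

In this toy run the perturbation raises the loss relative to the clean quantized feature, and one classifier update on the adversarial example lowers it again, which is the mechanism the strategy relies on.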

[0080] The following is a detailed description of the multimodal robust intelligent fault diagnosis method for resource-constrained edge industrial equipment provided by the present invention, using a specific implementation example:

[0081] S1. Collect operational data from industrial equipment and construct a multimodal dataset.

[0082] In this embodiment, rolling bearings are used as the research object (i.e., the resource-constrained edge industrial equipment), and acceleration vibration signals under different health states are collected. The one-dimensional time series $x_{1D}$ of the acceleration vibration signal is used as the first modality input; continuous wavelet transform is used to convert $x_{1D}$ into a two-dimensional time-frequency image $x_{2D}$, which is used as the second modality input; finally, a multimodal dataset containing one-dimensional vibration signals and two-dimensional time-frequency images is constructed as the training set.

[0083] S2. Construct a SimSiam cross-modal self-supervised pre-trained model and train it using the training set described above.

[0084] In this embodiment, to address the issue of limited labels in industrial scenarios, the SimSiam cross-modal self-supervised pre-training model employs a dual-stream encoder network, which includes a temporal encoder and an image encoder.

[0085] As shown in Figure 2, the temporal encoder uses a one-dimensional convolutional neural network (1D-CNN) to extract features, while the image encoder uses a Vision Transformer (ViT) to extract features.

[0086] During the pre-training phase, only the two-stream encoder is trained. The one-dimensional vibration signal $x_{1D}$ and the two-dimensional time-frequency image $x_{2D}$ of the same sample are used as input; vibration features $z_{1D}$ are extracted through the temporal encoder, and time-frequency features $z_{2D}$ are extracted through the image encoder;

[0087] Using the prediction head $h(\cdot)$ and a stop-gradient mechanism, the negative cosine similarity loss between the vibration features and the time-frequency features is minimized:

[0088] $\mathcal{L} = \tfrac{1}{2}\,\mathcal{D}\big(p_{1D \to 2D}, \operatorname{stopgrad}(z_{2D})\big) + \tfrac{1}{2}\,\mathcal{D}\big(p_{2D \to 1D}, \operatorname{stopgrad}(z_{1D})\big)$,

[0089] where $p_{1D \to 2D} = h(z_{1D})$ represents the one-dimensional vibration features projected onto the two-dimensional time-frequency modality, $p_{2D \to 1D} = h(z_{2D})$ represents the two-dimensional time-frequency features projected onto the one-dimensional vibration modality, $\operatorname{stopgrad}(\cdot)$ indicates the stop-gradient operation, and $\mathcal{D}(\cdot,\cdot)$ represents the negative cosine similarity.

[0090] By continuously minimizing the loss function $\mathcal{L}$, iterative training is performed on the training set to obtain a pre-trained two-stream encoder. Through this step, the SimSiam cross-modal self-supervised pre-trained model learns common modal features that are insensitive to noise.

[0091] S3. Construct a quantization-aware adversarial fine-tuning model for resource-constrained edge devices. Introduce a feature fusion layer, a quantization module, and an adversarial attack module after the pre-trained dual-stream encoder, and cascade a fault classifier after the quantization module.

[0092] S4. Execute a coordinated fine-tuning strategy of quantize-then-attack.

[0093] Quantization-aware adversarial fine-tuning is the core innovation of this invention. To adapt to 8-bit quantized inference at the edge and maintain robustness, this embodiment abandons the traditional path of "full-precision adversarial training followed by quantization," and adopts a "quantization first, attack later" strategy. Specifically, a quantization operation is added after the feature fusion layer. In each training iteration:

[0094] Forward propagation and quantization: The feature fusion layer outputs the fused features $F$ through a channel concatenation operation. Using the quantization function $Q(\cdot)$, the fused features $F$ are discretized into low-bit quantized features $F_q$.

[0095] Generating quantization-space adversarial examples: Based on the quantized features $F_q$, adversarial perturbations are generated using the PGD algorithm to obtain the adversarial examples $F_{adv}$. Note that the target of this attack is the quantized model.

[0096] Parameter update: Based on the adversarial examples $F_{adv}$, the cross-entropy loss is calculated, and the Straight-Through Estimator (STE) is used to backpropagate and update the fault classifier parameters, yielding the trained lightweight model.
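For reference, the straight-through estimator mentioned in the parameter update step is commonly written as below. This is the usual textbook form and an assumption about the exact variant used in the patent; the symbols $F$, $F_q$, $s$, and $\alpha$ are illustrative names for the fused features, quantized features, scaling factor, and truncation threshold.

```latex
% Forward pass: rounding is piecewise constant, so its true derivative is zero almost everywhere
F_q = s \cdot \operatorname{round}\!\left(\frac{\operatorname{clip}(F, -\alpha, \alpha)}{s}\right)

% Backward pass (STE): treat rounding as the identity inside the clipping range
\frac{\partial F_q}{\partial F} \approx
\begin{cases}
1, & |F| \le \alpha \\
0, & \text{otherwise}
\end{cases}
```

Without this approximation, any gradient that has to pass through the rounding step would vanish, which is why quantization-aware training schemes typically rely on it.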

[0097] S5. Experimental validation: The lightweight model trained in steps S1-S4 was validated on the CWRU bearing dataset. Experimental results show that, with very few labeled samples (5-shot, i.e., only 5 labeled samples per class), the accuracy of this model exceeds that of mainstream methods such as Bootstrap Your Own Latent (BYOL), Momentum Contrast (MoCo), Simple Framework for Contrastive Learning of Representations (SimCLR), and Swapping Assignments between Views (SwAV). Furthermore, under strong adversarial attacks generated with the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD), the diagnostic accuracy of the method of this invention is more than 20% higher than that of traditional methods, demonstrating that the quantized model of this invention is highly reliable in resource-constrained and interference-prone edge environments.
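The 5-shot setting (5 labeled samples per class) can be reproduced with a simple per-class sampler. This is a sketch with hypothetical names and a synthetic label list; the patent text does not describe how the few-shot subsets were drawn, so uniform random sampling with a fixed seed is an assumption.

```python
import random
from collections import defaultdict


def sample_k_shot(labels, k, seed=0):
    """Return sorted indices of k randomly chosen labeled samples per class."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    chosen = []
    for y in sorted(by_class):
        chosen.extend(rng.sample(by_class[y], k))
    return sorted(chosen)


# Hypothetical label list: 4 health states, 20 samples each.
labels = [c for c in ("normal", "ball", "inner", "outer") for _ in range(20)]
support = sample_k_shot(labels, k=5)
print(len(support))   # 4 classes x 5 shots -> 20
```

Only the selected indices would be used to fine-tune the fault classifier; the remaining samples stay unlabeled or are held out for evaluation.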

[0098] The following is a specific implementation case to verify the multimodal robust intelligent fault diagnosis method for resource-constrained edge industrial equipment proposed in this invention. The specific process is as follows:

[0099] 1) Parameter settings: This experiment is based on the PyTorch 2.4.1 deep learning framework, and the running environment is Ubuntu 20.04 system, equipped with four NVIDIA RTX 4090 GPUs.

[0100] Data layer: The Case Western Reserve University (CWRU) rolling bearing dataset is used as the industrial equipment testing benchmark. The dataset includes four health states: Normal, Ball fault, Inner race fault, and Outer race fault, with fault diameters ranging from 0.007 to 0.021 inches. As shown in Figure 3, each sample contains two inputs: the left-middle panel of Figure 3 shows a multi-channel one-dimensional vibration signal with a length of 512 points, and the middle-right panel shows a three-channel two-dimensional time-frequency image generated using the continuous wavelet transform (CWT). The data was divided into training, validation, and test sets in a ratio of 70%:15%:15%.
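
The windowing and the 70%:15%:15% split described above might be prepared as in the following sketch. It is only an illustration: the non-overlapping stride, the random (non-stratified) shuffle, the fixed seed, and the single-channel synthetic signal are assumptions, and the helper names `segment` and `split_indices` are hypothetical. The CWT image generation is not shown.

```python
import random

def segment(signal, length=512, stride=512):
    """Cut a 1-D signal into fixed-length windows (the stride is an assumption)."""
    return [signal[i:i + length] for i in range(0, len(signal) - length + 1, stride)]

def split_indices(n, ratios=(0.70, 0.15, 0.15), seed=0):
    """Shuffle sample indices and split them into train/validation/test subsets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Synthetic stand-in for one vibration channel: 200 windows of 512 points.
rng = random.Random(1)
signal = [rng.gauss(0.0, 1.0) for _ in range(512 * 200)]
windows = segment(signal)
train, val, test = split_indices(len(windows))
```

With overlapping windows (a stride smaller than 512), neighbouring segments share samples, so in that case the split would need to be made before segmentation to keep the test set independent.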

[0101] In the pre-training phase, the batch size was set to 64 and training ran for 100 epochs. The AdamW optimizer was used with a learning rate of 1e-4. In the fine-tuning phase, a resource-constrained edge environment was simulated by fixing the quantization bit width and the quantization truncation range. The adversarial attack employs the PGD algorithm with a fixed perturbation amplitude and number of iterations.

[0102] 2) Experimental Results

[0103] Small-sample performance analysis: The model was trained with the method of this invention and with existing mainstream self-supervised methods in few-shot scenarios. Figure 4 compares the resulting balanced accuracy.

[0104] As the experimental results in Figure 4 show, in the scenario with only 5 samples per class and extremely limited labels, the SimSiam-based Quantization Adversarial network (SiQA) method proposed in this invention achieves a balanced accuracy of 87.22%, outperforming existing mainstream self-supervised learning methods such as SimCLR (85.91%) and MoCo (84.72%). In the 10-shot scenario, the accuracy of this method reaches 94.47%, while the other compared methods range only from 91.15% to 92.19%. The results demonstrate that SimSiam cross-modal pre-training can effectively extract modality-invariant features, overcoming the problem of scarce data labeling in industrial scenarios.
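
For reference, the few-shot protocol and the balanced accuracy metric mentioned above can be sketched as follows. The sketch assumes the standard definition of balanced accuracy (the mean of per-class recall) and a uniform random choice of k labelled samples per class; the helper names and the toy labels are hypothetical and do not reproduce the reported experiments.

```python
import random
from collections import defaultdict

def k_shot_indices(labels, k=5, seed=0):
    """Pick k labelled sample indices per class (the sampling scheme is an assumption)."""
    by_class = defaultdict(list)
    for i, c in enumerate(labels):
        by_class[c].append(i)
    rng = random.Random(seed)
    return sorted(i for idxs in by_class.values() for i in rng.sample(idxs, k))

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Toy data: 4 health states with 20 samples each, 5-shot support set.
labels = [c for c in range(4) for _ in range(20)]
support = k_shot_indices(labels, k=5)

# Imbalanced toy predictions: recall is 3/4 for class 0 and 1/2 for class 1.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
score = balanced_accuracy(y_true, y_pred)   # (0.75 + 0.5) / 2 = 0.625
```

In the toy example plain accuracy would be 4/6, while balanced accuracy is 0.625, which shows why the metric is preferred when health states are unevenly represented.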

[0105] Robustness and defense effectiveness analysis: To verify the necessity of the "quantize first, attack later" defense strategy in the SiQA method, the performance of different defense modes was compared through ablation experiments. As the ablation results in Figure 5 show, taking the 5-shot scenario as an example, although the SiQA model without defense measures achieves an accuracy of 80.99% on clean samples, its accuracy drops sharply to 56.02% once it is subjected to PGD adversarial attacks (simulating strong industrial noise interference). This indicates that existing models without defense modules are highly susceptible to failure under harsh conditions. If the traditional "attack first, quantize later" strategy is adopted, experiments show that its accuracy under PGD attacks is only 51.86%, even lower than that of the undefended model. This verifies the theoretical analysis of this invention: adversarial samples optimized in full-precision space fail after quantization, exhibiting a "gradient mismatch" problem. In contrast, with the "quantize first, attack later" collaborative fine-tuning strategy proposed in this invention, the model still maintains an accuracy of 80.23% under 8-bit quantization and strong PGD attacks, an improvement of about 24 percentage points over the undefended model and about 28 percentage points over the traditional strategy.
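
One way such a mismatch between full-precision and quantized space can arise is shown in the toy example below: a perturbation smaller than half a quantization step is rounded away entirely, whereas a perturbation of one full step survives. The uniform symmetric quantizer and the truncation threshold `alpha=1.0` are assumptions made for the illustration, not values taken from the experiments.

```python
def quantize(z, bits=8, alpha=1.0):
    """Uniform symmetric quantizer (assumed form): clip to [-alpha, alpha], snap to grid."""
    scale = alpha / (2 ** (bits - 1) - 1)
    return [scale * round(max(-alpha, min(alpha, v)) / scale) for v in z]

scale = 1.0 / 127                          # grid step for 8 bits and alpha = 1.0
zq = quantize([0.5, -0.25, 0.8])           # features already on the 8-bit grid

small = [v + 0.4 * scale for v in zq]      # full-precision perturbation below half a step
large = [v + 1.0 * scale for v in zq]      # perturbation of exactly one grid step

erased = quantize(small) == zq             # the small perturbation is rounded away
survives = quantize(large) != zq           # the grid-sized perturbation is preserved
```

An adversarial example crafted in full precision is therefore not guaranteed to be the example the quantized model sees, which is consistent with the motivation given above for searching perturbations directly in the quantized feature space.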

[0106] In summary, the multimodal robust intelligent fault diagnosis method for resource-constrained edge industrial equipment proposed in this invention can effectively resist complex adversarial interference signals in edge computing environments characterized by resource constraints (8-bit quantization) and data scarcity (small sample size). By incorporating quantization constraints into adversarial training, storage and computing resources are saved, and the robustness and security of the industrial equipment fault diagnosis system are also significantly improved, enabling accurate diagnosis under harsh operating conditions.

[0107] Based on the above-described inventive concept, the present invention also provides a multimodal robust intelligent fault diagnosis device for resource-constrained edge industrial equipment, used to implement the above-described multimodal robust intelligent fault diagnosis method for resource-constrained edge industrial equipment. The device includes:

[0108] The data acquisition module is used to collect operating data of industrial equipment, construct a multimodal dataset containing one-dimensional vibration signals and two-dimensional time-frequency images, and divide it into an unlabeled pre-training set and a labeled fine-tuning set; the labeled fine-tuning set is labeled with the normal state and different fault categories of the industrial equipment.

[0109] The first model building module is used to build a SimSiam cross-modal self-supervised pre-trained model, which includes a two-stream encoder with shared weights, a projection head, and a prediction head; the two-stream encoder includes a temporal encoder and an image encoder.

[0110] The pre-training module is used to pre-train the two-stream encoder in the SimSiam cross-modal self-supervised pre-training model using the unlabeled pre-training set.

[0111] The second model construction module is used to introduce a feature fusion layer, a quantization module, and an adversarial attack module after the pre-trained dual-stream encoder, and to cascade a fault classifier after the quantization module to obtain a quantization-aware adversarial fine-tuning model. The feature fusion layer is used to fuse the one-dimensional vibration signal features output by the pre-trained dual-stream encoder with the two-dimensional time-frequency image features to obtain fused features. The quantization module is used to discretize the fused features into low-bit quantization features. The adversarial attack module is used to generate adversarial perturbations in the discrete quantization feature space. The fault classifier is used to output the predicted probabilities of different fault categories.

[0112] The fine-tuning module is used to perform collaborative fine-tuning training of the quantization-aware adversarial fine-tuning model by first quantizing and then attacking using the labeled fine-tuning set, and only updates the parameters of the fault classifier to finally obtain the trained lightweight model.

[0113] The prediction output module is used to deploy the trained lightweight model to resource-constrained edge computing nodes, collect real-time operating data of industrial equipment, and output fault category diagnosis results based on the lightweight model.

[0114] It is worth noting that this device embodiment corresponds to the above method embodiment. The implementation methods of the above method embodiments are all applicable to this device embodiment and can achieve the same or similar technical effects, so they will not be described in detail here.

[0115] Based on the above-described inventive concept, the present invention also provides a computer-readable storage medium for storing one or more programs, the one or more programs including instructions that, when executed by a computing device, cause the computing device to perform any of the above-described multimodal robust intelligent fault diagnosis methods for resource-constrained edge industrial equipment.

[0116] Based on the above-described inventive concept, the present invention also provides a computing device, including one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing any of the above-described multimodal robust intelligent fault diagnosis methods for resource-constrained edge industrial equipment.

[0117] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0118] This application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create a device for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0119] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0120] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0121] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the specific implementation of the present invention. Any modifications or equivalent substitutions that do not depart from the spirit and scope of the present invention should be covered within the scope of protection of the claims of the present invention.

Claims

1. A multimodal robust intelligent fault diagnosis method for resource-constrained edge industrial equipment, characterized in that the method comprises:
collecting operating data of industrial equipment to construct a multimodal dataset containing one-dimensional vibration signals and two-dimensional time-frequency images, and dividing it into an unlabeled pre-training set and a labeled fine-tuning set, wherein the labeled fine-tuning set is labeled with the normal state and different fault categories of the industrial equipment;
constructing a SimSiam cross-modal self-supervised pre-training model, which includes a two-stream encoder with shared weights, a projection head, and a prediction head, wherein the two-stream encoder includes a temporal encoder and an image encoder;
pre-training the two-stream encoder in the SimSiam cross-modal self-supervised pre-training model using the unlabeled pre-training set;
introducing a feature fusion layer, a quantization module, and an adversarial attack module after the pre-trained two-stream encoder, and cascading a fault classifier after the quantization module to obtain a quantization-aware adversarial fine-tuning model, wherein the feature fusion layer is used to fuse the one-dimensional vibration signal features output by the pre-trained two-stream encoder with the two-dimensional time-frequency image features to obtain fused features; the quantization module is used to discretize the fused features into low-bit quantized features; the adversarial attack module is used to generate adversarial perturbations in the discrete quantized feature space; and the fault classifier is used to output the predicted probabilities of different fault categories;
using the labeled fine-tuning set to perform collaborative fine-tuning training of the quantization-aware adversarial fine-tuning model by first quantizing and then attacking, wherein only the parameters of the fault classifier are updated, to finally obtain the trained lightweight model;
deploying the trained lightweight model to resource-constrained edge computing nodes, collecting real-time operating data of industrial equipment, and outputting fault category diagnosis results based on the lightweight model.

2. The multimodal robust intelligent fault diagnosis method for resource-constrained edge industrial equipment according to claim 1, characterized in that collecting operating data of industrial equipment to construct a multimodal dataset containing one-dimensional vibration signals and two-dimensional time-frequency images includes: collecting vibration signal data of industrial equipment under different health conditions as the first modality input; converting the vibration signal data into two-dimensional time-frequency images using the continuous wavelet transform as the second modality input; and finally constructing a multimodal dataset containing the one-dimensional vibration signals and the two-dimensional time-frequency images.

3. The multimodal robust intelligent fault diagnosis method for resource-constrained edge industrial equipment according to claim 1, characterized in that pre-training the two-stream encoder in the SimSiam cross-modal self-supervised pre-training model using the unlabeled pre-training set includes:
for the same input sample in the unlabeled pre-training set, extracting vibration features through the temporal encoder and extracting time-frequency features through the image encoder;
using the prediction head to map the one-dimensional vibration features to the two-dimensional time-frequency modality, and to map the two-dimensional time-frequency features to the one-dimensional vibration modality;
constructing an asymmetric loss function from the negative cosine similarity and a stop-gradient operation, applied to the one-dimensional vibration features mapped to the two-dimensional time-frequency modality and to the two-dimensional time-frequency features mapped to the one-dimensional vibration modality;
iteratively training on the unlabeled pre-training set by continuously minimizing the asymmetric loss function until convergence, to obtain the pre-trained two-stream encoder.
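
As a reading aid, an asymmetric loss of the kind described in this claim is commonly written in the form below. The symbols are assumptions introduced for illustration: $z_v$ and $z_t$ denote the vibration and time-frequency features, $h$ the prediction head, $\operatorname{sg}(\cdot)$ the stop-gradient operation, and $D$ the negative cosine similarity. The equal weighting of the two terms and the placement of the stop-gradient on the target feature follow the usual SimSiam formulation and may differ from the formula of the claim.

```latex
p_{v \to t} = h(z_v), \qquad p_{t \to v} = h(z_t)

D(p, z) = -\,\frac{p \cdot z}{\lVert p \rVert_2 \, \lVert z \rVert_2}

\mathcal{L} = \tfrac{1}{2}\, D\bigl(p_{v \to t},\, \operatorname{sg}(z_t)\bigr)
            + \tfrac{1}{2}\, D\bigl(p_{t \to v},\, \operatorname{sg}(z_v)\bigr)
```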

4. The multimodal robust intelligent fault diagnosis method for resource-constrained edge industrial equipment according to claim 3, characterized in that the temporal encoder uses a one-dimensional convolutional neural network, and the image encoder uses a Vision Transformer.

5. The multimodal robust intelligent fault diagnosis method for resource-constrained edge industrial equipment according to claim 3, characterized in that using the labeled fine-tuning set to perform collaborative fine-tuning training of the quantization-aware adversarial fine-tuning model by first quantizing and then attacking includes:
during fine-tuning training, keeping the parameters of the pre-trained two-stream encoder frozen, and inputting samples from the labeled fine-tuning set into the quantization-aware adversarial fine-tuning model to obtain the fused features generated by the feature fusion layer;
using a quantization function to discretize the fused features into low-bit quantized features, the quantization function being defined by a quantization scaling factor, a rounding function, and a truncation function with a truncation threshold;
on the basis of the quantized features, generating an adversarial perturbation using the projected gradient descent method to obtain adversarial examples;
using the adversarial examples to update only the parameters of the fault classifier, to obtain the trained quantized model.

6. The multimodal robust intelligent fault diagnosis method for resource-constrained edge industrial equipment according to claim 5, characterized in that the cross-entropy loss function is used in the fine-tuning training process.

7. The multimodal robust intelligent fault diagnosis method for resource-constrained edge industrial equipment according to claim 5, characterized in that, during the fine-tuning training process, the fault classifier parameters are updated by backpropagating gradients through the straight-through estimator.

8. A multimodal robust intelligent fault diagnosis device for resource-constrained edge industrial equipment, characterized in that the device is used to implement the multimodal robust intelligent fault diagnosis method for resource-constrained edge industrial equipment according to claim 1 and includes:
a data acquisition module, used to collect operating data of industrial equipment, construct a multimodal dataset containing one-dimensional vibration signals and two-dimensional time-frequency images, and divide it into an unlabeled pre-training set and a labeled fine-tuning set, wherein the labeled fine-tuning set is labeled with the normal state and different fault categories of the industrial equipment;
a first model construction module, used to build a SimSiam cross-modal self-supervised pre-training model, which includes a two-stream encoder with shared weights, a projection head, and a prediction head, wherein the two-stream encoder includes a temporal encoder and an image encoder;
a pre-training module, used to pre-train the two-stream encoder in the SimSiam cross-modal self-supervised pre-training model using the unlabeled pre-training set;
a second model construction module, used to introduce a feature fusion layer, a quantization module, and an adversarial attack module after the pre-trained two-stream encoder, and to cascade a fault classifier after the quantization module to obtain a quantization-aware adversarial fine-tuning model, wherein the feature fusion layer is used to fuse the one-dimensional vibration signal features output by the pre-trained two-stream encoder with the two-dimensional time-frequency image features to obtain fused features; the quantization module is used to discretize the fused features into low-bit quantized features; the adversarial attack module is used to generate adversarial perturbations in the discrete quantized feature space; and the fault classifier is used to output the predicted probabilities of different fault categories;
a fine-tuning module, used to perform collaborative fine-tuning training of the quantization-aware adversarial fine-tuning model by first quantizing and then attacking using the labeled fine-tuning set, updating only the parameters of the fault classifier, to finally obtain the trained lightweight model;
a prediction output module, used to deploy the trained lightweight model to resource-constrained edge computing nodes, collect real-time operating data of industrial equipment, and output fault category diagnosis results based on the lightweight model.

9. A computer-readable storage medium for storing one or more programs, characterized in that the one or more programs include instructions that, when executed by a computing device, cause the computing device to perform the multimodal robust intelligent fault diagnosis method for resource-constrained edge industrial equipment according to any one of claims 1 to 7.

10. A computing device, characterized in that it includes one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing the multimodal robust intelligent fault diagnosis method for resource-constrained edge industrial equipment according to any one of claims 1 to 7.