An industrial equipment fault prediction and maintenance method fusing AI vision

By integrating multimodal feature extraction over visual and physical data with lightweight edge models, cloud-based incremental learning, and multi-objective optimization, the method achieves high-precision, low-latency fault prediction and maintenance for industrial equipment. This addresses the single-source data, poor real-time performance, and rigid models of existing technologies, and enhances the self-iterative capability of equipment health management.

CN122219366A · Pending · Publication Date: 2026-06-16 · CHONGQING BITMAP INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: CHONGQING BITMAP INFORMATION TECH CO LTD
Filing Date: 2026-04-17
Publication Date: 2026-06-16

Application Information

IPC: G05B19/418

AI Tagging

Application Domain

Programme total factory control

AI Technical Summary

⚠Technical Problem

Existing technologies for industrial equipment fault prediction and maintenance suffer from limitations such as a single data dimension, inability to integrate visual and physical parameters to comprehensively depict equipment status, poor real-time performance of edge inference, high computational resource consumption, insufficient model adaptability, and inaccurate maintenance strategies, leading to over-maintenance or missed faults.

⚗Method used

By synchronously acquiring physical state parameters and visual image data of industrial equipment, multimodal feature extraction and weighted fusion are performed to generate multidimensional feature vectors. Lightweight models are used for dynamic path selection and real-time inference. Combined with cloud-based incremental learning and multi-objective optimization, the optimal maintenance plan is generated to achieve closed-loop optimization.

🎯Benefits of technology

It achieves comprehensive perception of equipment status and high-precision, low-latency real-time fault prediction, generates an optimal maintenance strategy that balances cost and risk, addresses the problems of single-source data, poor real-time performance, and rigid models, and enhances the lifecycle value of the system.

✦ Generated by Eureka AI based on patent content.

Figure CN122219366A_ABST

Abstract

The application provides an industrial equipment fault prediction and maintenance method fused with AI vision. Multi-source heterogeneous data is collected synchronously by deploying high-resolution industrial cameras and physical sensors, and a multi-dimensional feature vector of the equipment state is generated through weighted multimodal feature fusion. A lightweight model optimized through structural pruning and knowledge distillation is deployed at the edge, where dynamic path selection is combined with a safety fallback mechanism to perform real-time fault inference. The model is continuously optimized through cloud incremental learning, a multi-objective optimization algorithm generates an optimal maintenance strategy balancing health degree, cost, and risk, and execution feedback closes the optimization loop. The method solves the problems of a single data dimension, poor real-time performance of edge inference, insufficient model adaptability, and inaccurate maintenance decisions in the prior art, and comprehensively improves fault prediction accuracy, real-time performance, long-term system adaptability, and maintenance economy.

Description

Technical Field

[0001] This invention relates to the field of industrial equipment health management and intelligent maintenance technology, and in particular to a method for predicting and maintaining industrial equipment faults by integrating AI vision.

Background Technology

[0002] Failures in industrial equipment can lead to production line downtime, reduced production efficiency, and even safety accidents, making the need for early prediction and precise maintenance of equipment increasingly urgent. Existing industrial equipment failure prediction and maintenance technologies mainly achieve this through the following methods: First, monitoring and maintenance based on physical sensors, collecting physical parameters of key equipment components, identifying faults through threshold judgment or simple time-series analysis, and making maintenance decisions based on fixed cycles or manual intervention; Second, predictive maintenance based on cloud-based centralized AI inference, collecting multi-source data from sensors and uploading it to the cloud, using cloud computing power to train complex AI models for fault prediction, and finally distributing the prediction results and maintenance recommendations. However, these methods still have the following shortcomings:

[0003] 1) Limited data dimensions and insufficient integration: Existing technologies rely solely on physical parameters or solely on visual data, resulting in an incomplete characterization of equipment status and potential faults being overlooked.

2) Poor real-time performance and resource adaptability of inference: Centralized cloud-based inference requires uploading large amounts of raw data to the cloud, which is affected by network bandwidth and cloud computing power scheduling, leading to inference latency and making it difficult to meet the real-time monitoring needs of equipment.

3) Insufficient accuracy and dynamism in maintenance decisions: Existing technologies rely on threshold judgments or human experience, which cannot be iteratively optimized as equipment operating conditions change. Long-term use can easily lead to a decline in prediction accuracy, resulting in over-maintenance or missed faults.

Summary of the Invention

[0004] To address the shortcomings of existing technologies, this invention provides an industrial equipment fault prediction and maintenance method that integrates AI vision. This method solves the problems of existing technologies, such as: single data dimension, inability to integrate visual and physical parameters to comprehensively characterize equipment status; poor real-time performance of edge inference, high computational resource consumption, and inability to meet industrial-grade latency requirements; insufficient model adaptability, with fixed architecture unable to iteratively optimize with changing operating conditions; and inaccurate maintenance strategies, lacking a dynamic optimization decision-making mechanism, leading to over-maintenance or missed fault diagnosis.

[0005] According to an embodiment of the present invention, a method for industrial equipment fault prediction and maintenance integrating AI vision includes:

[0006] Simultaneously acquire physical state parameters of industrial equipment and visual image data of key components to obtain multi-source heterogeneous data;

[0007] Multimodal feature extraction and weighted fusion are performed on the multi-source heterogeneous data to generate a multidimensional feature vector;

[0008] Based on the lightweight model, the multidimensional feature vector is received, and the model calculation path selection and real-time inference are performed through a dynamic path selection strategy to obtain the fault prediction result.

[0009] The fault prediction results and the multi-source heterogeneous data are uploaded to the cloud, and the multimodal fusion model is iteratively optimized and an updated model is generated through incremental learning algorithm.

[0010] Based on the updated model and the fault prediction results, a fault warning is issued. At the same time, combined with the equipment's historical maintenance records and current operating conditions, the optimal maintenance plan is generated through a multi-objective optimization model.

[0011] Collect feedback results and optimize model parameters and maintenance strategies based on the feedback results to achieve a closed loop.

[0012] Furthermore, the synchronization is achieved through timestamp alignment and data transmission is performed via CAN bus or industrial Ethernet.

[0013] Furthermore, the multimodal feature extraction and weighted fusion includes:

[0014] Key features are obtained by extracting component deformation, surface cracks, and oil stain area percentage from the visual image data;

[0015] The vibration spectrum energy distribution, temperature change gradient, and current peak frequency are extracted from the physical state parameters to obtain the time-series characteristics.

[0016] The key features and the temporal features are weighted and fused using a Transformer-based multimodal fusion model to generate the multidimensional feature vector.

[0017] Furthermore, the lightweight model is obtained by applying a joint optimization strategy of structural pruning and knowledge distillation to the multimodal fusion model:

[0018] The structural pruning simplifies the model structure by removing redundant neurons, network layers, or connection weights in the multimodal fusion model.

[0019] The knowledge distillation process uses the multimodal fusion model as the teacher model and the pruned model as the student model for training. During training, the output of the teacher model is used as a supervision signal to generate a lightweight model.

[0020] Furthermore, the distillation temperature used in the knowledge distillation is adaptively set based on the sensor noise variance, as shown in the formula:

[0021] [Formula not preserved in the source text.]

[0022] where $T$ is the distillation temperature, $k$ is the adaptive adjustment coefficient for the distillation temperature, and $\sigma_n^2$ is the noise variance of the sensor. The temperature is used to smooth the output distribution of the teacher model: the higher the noise, the higher the distillation temperature.

[0023] Furthermore, the dynamic path selection strategy is implemented through a gating mechanism, which determines whether to skip a specific computation layer based on a gating variable. This gating variable is dynamically calculated based on the current fused feature vector, specifically:

[0024] $g_l = \sigma\left(W_l^{\top} F + b_l\right)$;

[0025] where $F$ is the fused feature vector, $g_l$ is the gating variable representing the activation probability of the $l$-th layer path, $W_l^{\top}$ is the transpose of the weight vector, $b_l$ is the bias term, and $\sigma(\cdot)$ is the activation function. The gating mechanism also introduces a resource pressure factor and a feature complexity factor, skipping redundant computation layers when resource utilization is high or features are simple.
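
The gating rule above can be illustrated with a minimal Python sketch. It assumes a sigmoid gate computed once per layer from the fused feature vector and a fixed skip threshold of 0.5; the names `gate`, `run_with_gates`, and `skip_below` are hypothetical, and the resource pressure and feature complexity factors are left out because the source does not give their form.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gate(features, weights, bias):
    """Gating variable g_l = sigmoid(w_l . F + b_l) for one layer path."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)

def run_with_gates(features, layers, skip_below=0.5):
    """Run layers in order, skipping any layer whose gate is below skip_below.

    layers: list of (name, weights, bias, fn), where fn maps a list to a list.
    The gate is always computed from the original fused feature vector.
    skip_below is an assumed threshold, not a value from the source.
    """
    x = list(features)
    executed = []
    for name, weights, bias, fn in layers:
        if gate(features, weights, bias) >= skip_below:
            x = fn(x)
            executed.append(name)
    return x, executed

# Toy layers: the first gate opens (sigmoid(1) ~ 0.73), the second stays closed (sigmoid(-1) ~ 0.27).
layers = [
    ("layer_1", [1.0, 1.0], 0.0, lambda v: [2 * a for a in v]),
    ("layer_2", [-1.0, -1.0], 0.0, lambda v: [a + 1 for a in v]),
]
output, executed = run_with_gates([0.5, 0.5], layers)
```

In this toy run only `layer_1` executes, so the output is the input doubled.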

[0026] Furthermore, the dynamic path selection strategy also includes a safety fallback mechanism: when the prediction confidence fluctuation exceeds a threshold in multiple consecutive inferences, it automatically switches to the full path execution mode;

[0027] When the device's operating environment parameters exceed the safety threshold, the model calculation path structure is forcibly locked.
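
A minimal sketch of the fallback and locking logic described in the two paragraphs above is given below. It assumes that "fluctuation" means the max-min range of the confidence over the last `window` inferences, which the source does not specify; `FallbackMonitor`, `select_mode`, and the mode names are hypothetical.

```python
from collections import deque

class FallbackMonitor:
    """Tracks recent prediction confidences and flags unstable behaviour."""

    def __init__(self, window, max_fluctuation):
        self.window = window
        self.max_fluctuation = max_fluctuation
        self.history = deque(maxlen=window)

    def update(self, confidence):
        """Record a confidence; return True when full-path mode should be used."""
        self.history.append(confidence)
        if len(self.history) < self.window:
            return False
        # Assumed fluctuation measure: range of the last `window` confidences.
        return max(self.history) - min(self.history) > self.max_fluctuation

def select_mode(monitor, confidence, env, env_limits):
    """Return 'locked', 'full_path', or 'dynamic' for the next inference."""
    # Lock the path structure when any environment reading exceeds its limit.
    if any(env[key] > env_limits[key] for key in env_limits):
        return "locked"
    return "full_path" if monitor.update(confidence) else "dynamic"

monitor = FallbackMonitor(window=3, max_fluctuation=0.2)
limits = {"temperature": 80.0}
modes = [select_mode(monitor, c, {"temperature": 45.0}, limits) for c in (0.90, 0.88, 0.60)]
locked = select_mode(monitor, 0.90, {"temperature": 95.0}, limits)
```

The third inference drops the confidence by 0.30 within the window, so the sketch switches to full-path mode; the over-temperature reading then locks the path regardless of confidence.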

[0028] Furthermore, the real-time inference satisfies the real-time performance index, namely, the total inference latency does not exceed the maximum tolerable inference latency defined by the system, wherein the total inference latency consists of the sum of the data input / output latency and the computation latency of each activation layer of the model.

[0029] Furthermore, the real-time performance metric also includes stability constraints:

[0030] $\rho = \dfrac{T_{\max} - T_{\text{total}}}{T_{\max}} \geq \rho_{\text{th}}$;

[0031] where $\rho$ is the real-time redundancy coefficient, $\rho_{\text{th}}$ is the real-time redundancy threshold, $T_{\max}$ is the maximum tolerable inference latency defined by the system, and $T_{\text{total}}$ is the total inference delay for a single fault prediction;

[0032] when $\rho < \rho_{\text{th}}$, the system triggers a performance-delay trade-off strategy, reducing the resolution of the multimodal fusion input or shortening the timing window to maintain the prediction delay within the range required by industrial control.

[0033] Furthermore, the incremental learning algorithm employs online random forests or online deep neural networks, performs quality checks on new data before each model update, and verifies the performance of the new model through A / B testing; the better-performing model is deployed to the edge.

[0034] Furthermore, the incremental learning algorithm also includes an online support vector machine or an incremental graph neural network, with the model being updated at least once a week.
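
The update gate described above (quality check first, then a comparison between the current and the candidate model) might be sketched as follows. This is an offline stand-in that compares accuracy on a held-out set rather than a live A / B traffic split; `passes_quality_check`, `choose_model`, and the 5% missing-value limit are assumptions for illustration.

```python
def passes_quality_check(batch, required_keys, max_missing_ratio=0.05):
    """Reject a batch when too many records lack a required field."""
    if not batch:
        return False
    missing = sum(
        1 for record in batch
        if any(record.get(key) is None for key in required_keys)
    )
    return missing / len(batch) <= max_missing_ratio

def accuracy(model, samples):
    """Fraction of (input, label) pairs the model classifies correctly."""
    return sum(1 for x, y in samples if model(x) == y) / len(samples)

def choose_model(current, candidate, holdout, min_gain=0.0):
    """Keep the current model unless the candidate is better by more than min_gain."""
    acc_current = accuracy(current, holdout)
    acc_candidate = accuracy(candidate, holdout)
    if acc_candidate > acc_current + min_gain:
        return candidate, acc_candidate
    return current, acc_current

# Toy models: threshold classifiers on a single health score.
current_model = lambda x: x > 0.5
candidate_model = lambda x: x > 0.3
holdout = [(0.2, False), (0.4, True), (0.6, True), (0.9, True)]
chosen, chosen_accuracy = choose_model(current_model, candidate_model, holdout)
batch_ok = passes_quality_check(
    [{"vibration": 1.0, "temperature": 40.0}, {"vibration": None, "temperature": 41.0}],
    required_keys=("vibration", "temperature"),
)
```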

[0035] Furthermore, the fault warning is triggered based on the fault probability prediction value, which is obtained by weighted fusion of the confidence level output by the edge model and the confidence level output by the cloud model.
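
As a small illustration of the weighted fusion of edge and cloud confidence, the sketch below uses a convex combination and a fixed warning threshold. The weight 0.6 and threshold 0.8 are placeholder values, not taken from the source, and the function names are hypothetical.

```python
def fault_probability(edge_conf, cloud_conf, edge_weight=0.6):
    """Convex combination of the edge and cloud confidence values."""
    if not 0.0 <= edge_weight <= 1.0:
        raise ValueError("edge_weight must lie in [0, 1]")
    return edge_weight * edge_conf + (1.0 - edge_weight) * cloud_conf

def should_warn(edge_conf, cloud_conf, threshold=0.8, edge_weight=0.6):
    """Trigger a fault warning when the fused probability reaches the threshold."""
    return fault_probability(edge_conf, cloud_conf, edge_weight) >= threshold

p = fault_probability(0.9, 0.7)   # 0.6 * 0.9 + 0.4 * 0.7
warn = should_warn(0.9, 0.7)
```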

[0036] Furthermore, the multi-objective optimization model aims to solve for the Pareto optimal maintenance strategy set, and the objective functions include: maximizing the overall equipment health improvement, minimizing the total maintenance cost, and minimizing the maintenance operation risk;

[0037] The constraints include: fault correlation constraints, resource availability constraints, and operational feasibility constraints.

[0038] Furthermore, the fault correlation constraint is generated based on the association rules between fault type and equipment operating conditions.
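
To make the notion of a Pareto optimal maintenance strategy set concrete, the following sketch filters candidate plans down to the non-dominated ones under the three stated objectives. It is only the dominance filter, not a full multi-objective optimizer, and it ignores the constraints; the `Plan` fields and the sample values are invented for illustration.

```python
from typing import NamedTuple

class Plan(NamedTuple):
    name: str
    health_gain: float  # to be maximized
    cost: float         # to be minimized
    risk: float         # to be minimized

def dominates(a, b):
    """True when plan a is no worse than b on every objective and better on at least one."""
    no_worse = a.health_gain >= b.health_gain and a.cost <= b.cost and a.risk <= b.risk
    better = a.health_gain > b.health_gain or a.cost < b.cost or a.risk < b.risk
    return no_worse and better

def pareto_front(plans):
    """Keep the plans that no other plan dominates."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]

plans = [
    Plan("A", 0.30, 1000.0, 0.10),
    Plan("B", 0.20, 1200.0, 0.15),  # dominated by A on all three objectives
    Plan("C", 0.50, 3000.0, 0.20),
    Plan("D", 0.10, 200.0, 0.05),
]
front = pareto_front(plans)
```

Plans A, C, and D survive because each is best on at least one objective; choosing among them remains a policy decision.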

[0039] The technical principle of this invention is as follows: This invention comprehensively perceives the status of equipment by fusing visual and physical multi-source data, improves the accuracy of status representation based on the multimodal feature fusion of Transformer, obtains a lightweight edge model through pruning and knowledge distillation, and achieves high-precision, low-latency real-time fault prediction at the edge by adjusting the dynamic path; continuously optimizes the model with cloud incremental learning, and generates the optimal maintenance strategy that balances cost, risk and benefit through multi-objective optimization algorithms; finally, it forms a self-iterative optimization closed loop through execution feedback, systematically solving the core problems such as single data, poor real-time performance, model rigidity and suboptimal decision-making.

[0040] Compared with the prior art, the present invention has the following beneficial effects:

[0041] 1. This invention employs synchronous acquisition of multi-source heterogeneous data and Transformer-based multimodal feature fusion technology. By integrating the visual status and physical operating parameters of key equipment components, it solves the problem of insufficient dimensions from a single data source, providing comprehensive and reliable data support for subsequent analysis. With the help of multimodal feature fusion technology, it effectively integrates the complementary value of visual and temporal features, improves the accuracy of equipment status representation, and avoids the shortcomings of incomplete characterization by single-modal data.

[0042] 2. This invention reduces redundant computational overhead of the model through structural pruning, and combines knowledge distillation to allow the lightweight student model to learn key feature representations from the teacher model, which greatly reduces the consumption of computing resources at the edge. At the same time, it dynamically adjusts the computation path through a gating mechanism and introduces a safe rollback mechanism, so that edge inference can meet the core performance indicators of industrial grade. This solves the contradiction between high consumption of computing resources at the edge and insufficient real-time performance in the existing technology, and reduces the hardware deployment cost of edge devices.

[0043] 3. This invention adopts cloud-based incremental learning and a closed-loop optimization mechanism based on execution feedback, which solves the problems of performance degradation and inability to adapt to changes in working conditions when using fixed models for a long time. The generated models and strategies have long-term adaptability through continuous self-iteration with data accumulation, thereby improving the life cycle value of the system.

[0044] 4. This invention uses the NSGA-II multi-objective optimization model to generate a Pareto optimal maintenance strategy set, which solves the problems of inaccurate maintenance strategies and high costs caused by relying on threshold judgment or human experience. It generates a decision scheme that achieves the optimal balance between equipment health, maintenance costs and operational risks, realizing the upgrade from "preventive maintenance" to "predictive optimization maintenance".

Attached Figure Description

[0045] Figure 1 is a flowchart illustrating the steps of an embodiment of the present invention.

[0046] Figure 2 is a logic diagram for multi-source heterogeneous data acquisition in an embodiment of the present invention.

[0047] Figure 3 is a flowchart of the multimodal feature extraction and fusion process according to an embodiment of the present invention.

[0048] Figure 4 is a flowchart of the lightweight model inference and dynamic adjustment mechanism in an embodiment of the present invention.

Detailed Implementation

[0049] The technical solutions of the present invention will be further described below with reference to the accompanying drawings and embodiments.

[0050] As shown in Figure 1, this invention proposes a method for industrial equipment fault prediction and maintenance that integrates AI vision, including:

[0051] S1: Simultaneously acquire physical state parameters of industrial equipment and visual image data of key components to obtain multi-source heterogeneous data;

[0052] S2: Perform multimodal feature extraction and weighted fusion on multi-source heterogeneous data to generate multidimensional feature vectors;

[0053] S3: Based on a lightweight model, it receives multi-dimensional feature vectors, performs model calculation path selection and real-time inference through a dynamic path selection strategy, and obtains fault prediction results.

[0054] S4: Upload the fault prediction results and multi-source heterogeneous data to the cloud, and use the incremental learning algorithm to iteratively optimize the multimodal fusion model and generate an updated model;

[0055] S5: Based on the updated model and fault prediction results, fault warnings are issued. At the same time, combined with the equipment's historical maintenance records and current operating conditions, the optimal maintenance plan is generated through a multi-objective optimization model.

[0056] S6: Collect feedback results and optimize model parameters and maintenance strategies based on the feedback results to achieve a closed loop.

[0057] The detailed working process of this embodiment includes:

[0058] In step S1, multi-source heterogeneous data is collected. As shown in Figure 2, at least two types of sensors are deployed in industrial equipment operation scenarios. The first type is a high-resolution industrial camera, which is used to collect visual image data of key components of the equipment in real time. The second type is physical sensors (such as vibration sensors, temperature sensors, and current sensors), which are used to synchronously acquire the physical state parameters of the equipment. All data acquisition is aligned by timestamp to ensure the time synchronization of the multi-source data.

[0059] The first type of sensor is a high-resolution industrial camera, which collects visual image data including the surface condition (such as cracks, rust, foreign objects) and motion state (such as component displacement and vibration amplitude) of key components of the equipment, and performs real-time detection of target components in the image through YOLOv8 or similar algorithms; the second type of sensor includes triaxial vibration sensors, infrared thermal imaging sensors and current transformers, whose physical parameter data are transmitted to edge computing devices in real time via CAN bus or industrial Ethernet.

[0060] In some examples, high-resolution industrial cameras can be replaced with industrial-grade HD cameras or machine vision sensors with real-time image acquisition capabilities (such as CMOS industrial cameras), still enabling visual data acquisition of the surface and motion states of key equipment components; triaxial vibration sensors in physical sensors can be replaced with single-axis / dual-axis vibration sensors (if the vibration characteristics of the equipment failure are concentrated in a single direction), and infrared thermal imaging sensors can be replaced with contact temperature sensors (if the equipment allows close-range installation and the temperature acquisition accuracy requirements are consistent); in terms of data transmission methods, CAN bus / industrial Ethernet can be replaced with 4G / 5G wireless transmission or LoRa IoT protocol, which is suitable for scenarios where wiring is difficult in industrial fields, as long as the synchronization of multi-source data is achieved through timestamp alignment.
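
Timestamp alignment of camera frames and sensor samples, as required in step S1, can be sketched as a nearest-neighbour match with a maximum allowed skew. The source does not describe the alignment rule, so nearest-sample matching, the `max_skew` parameter, and the function name `align_nearest` are all assumptions.

```python
from bisect import bisect_left

def align_nearest(frame_ts, sensor_ts, sensor_values, max_skew):
    """Pair each camera frame with the sensor sample closest in time.

    sensor_ts must be sorted ascending. Frames with no sample within
    max_skew seconds are paired with None.
    """
    aligned = []
    for t in frame_ts:
        i = bisect_left(sensor_ts, t)
        # Candidates: the sample just before and the sample at or after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(sensor_ts)]
        if not candidates:
            aligned.append((t, None))
            continue
        best = min(candidates, key=lambda j: abs(sensor_ts[j] - t))
        if abs(sensor_ts[best] - t) <= max_skew:
            aligned.append((t, sensor_values[best]))
        else:
            aligned.append((t, None))
    return aligned

frames = [0.00, 0.10, 0.20, 0.50]          # camera frame timestamps (s)
vibration_ts = [0.01, 0.09, 0.21]          # vibration sample timestamps (s)
vibration = [1.0, 1.2, 0.9]                # vibration readings
pairs = align_nearest(frames, vibration_ts, vibration, max_skew=0.02)
```

The last frame has no vibration sample within 20 ms, so it is paired with `None` and could be dropped or interpolated downstream.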

[0061] In step S2, multimodal feature extraction and fusion are performed to obtain a multidimensional feature vector of the device state. As shown in Figure 3, the collected visual image data is preprocessed to extract key features from the image (such as component deformation, surface cracks, and the proportion of oil stains). At the same time, temporal features (such as vibration spectrum energy distribution, temperature change gradient, and current peak frequency) are extracted from the physical parameters. The visual features and temporal features are then weighted and fused using a Transformer-based multimodal fusion model to generate a multidimensional feature vector of the equipment status.
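
The weighted fusion step can be illustrated with a deliberately simple stand-in: each modality is scaled by a softmax-normalized weight and the results are concatenated. This is not the Transformer-based fusion model named in the text, only a sketch of what "weighted fusion into one feature vector" means; the scores and feature values are invented.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(visual, physical, score_v, score_p):
    """Scale each modality by its normalized weight and concatenate."""
    w_v, w_p = softmax([score_v, score_p])
    fused = [w_v * x for x in visual] + [w_p * x for x in physical]
    return fused, (w_v, w_p)

# Illustrative, already-normalized features.
visual = [0.12, 0.80, 0.05]     # deformation, crack score, oil-stain area fraction
physical = [0.40, 0.30, 0.65]   # spectral energy, temperature gradient, current peak frequency
fused, (w_v, w_p) = fuse(visual, physical, score_v=1.0, score_p=1.0)
```

With equal scores both modalities receive weight 0.5; in a learned model the scores would come from attention rather than being fixed by hand.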

[0062] In step S3, fault prediction is performed through lightweight model inference and dynamic adjustment. As shown in Figure 4, a lightweight AI model that has undergone model pruning and knowledge distillation is deployed on an edge computing device. This model is fine-tuned from a pre-trained multimodal fusion model; during inference it receives the multidimensional feature vector of the device state in real time and dynamically adjusts its computation path (such as skipping redundant feature layers or activating specific sub-models) to reduce the computational load at the edge.

[0063] In edge computing devices, to achieve real-time fault prediction while balancing model accuracy and computational load, the system applies lightweighting and dynamic inference path optimization to the pre-trained multimodal fusion model. The lightweight model reduces redundant computational overhead through structural pruning and uses knowledge distillation to learn key feature representations from the teacher model, forming a deployable student model. During inference, the system decides in real time whether to skip redundant feature layers or activate specific sub-models based on the current operating status of the device (such as load rate and ambient temperature) and the utilization rate of edge computing resources, thereby achieving a dynamic balance between accuracy and efficiency.

[0064] After completing the multi-source heterogeneous data acquisition in step S1 and the multimodal feature extraction and fusion in step S2, the system has obtained a time-synchronized multi-dimensional device state vector $F \in \mathbb{R}^{D}$, where $D$ represents the feature dimension after fusion. The vector $F$ includes the key features from the high-resolution vision sensors $F_v$, the key features from the physical sensors $F_p$, and the corresponding fusion weights $\alpha_v$ and $\alpha_p$.

[0065] To achieve real-time fault prediction in edge computing devices while balancing model accuracy and computational load, it is necessary to perform lightweight processing and dynamic inference path optimization on the pre-trained multimodal fusion model.

[0066] Based on this, the core objective of step S3 is to implement an adaptive model inference mechanism at the edge, specifically including:

[0067] 1) Robustness and safety fallback mechanism of the dynamic path selection strategy - to ensure that the model maintains stable output under changing operating conditions and feature perturbations.

[0068] 2) Quantification and decision-making mechanism of edge resources and real-time performance indicators (SLO) - to achieve a dynamic balance between inference latency and accuracy.

[0069] 3) Hyperparameter setting and verification process for pruning and knowledge distillation – ensuring that the performance of the lightweight model after compression remains consistent with that of the teacher model.

[0070] Specifically, regarding the robustness and safety fallback mechanism of the dynamic path selection strategy: during lightweight inference, the dynamic path selection strategy uses a gating mechanism $g_l = \sigma\left(W_l^{\top} F + b_l\right)$ to control the activation and skipping of each sub-model or feature layer.

[0071] where $F$ is the fused feature vector, $g_l$ is the gating variable, $W_l^{\top}$ is the transpose of the weight vector, $b_l$ is the bias term, and $\sigma(\cdot)$ is the Sigmoid activation function. The input vector $F$ is the multimodal fusion output from step S2 and includes the visual features $F_v$, the physical features $F_p$, and their fusion weights $\alpha_v$, $\alpha_p$. The gating variable $g_l$ indicates the activation probability of the $l$-th layer path and is dynamically adjusted based on the feature complexity and the current resource utilization rate to determine in real time whether to skip a specific layer.

[0072] To improve robustness and avoid performance degradation caused by incorrect path decisions, the system introduces a safety fallback mechanism: when, over $K$ consecutive inferences, the fluctuation of the prediction confidence exceeds the threshold $\varepsilon$, the system automatically switches to full-path execution mode (i.e., all $g_l = 1$), which ensures the stability of inference.

[0073] The confidence $c$ is the Sigmoid probability output by edge inference and is affected by the feature fusion stability in step S2 and the data sampling noise in step S1. To ensure decision-making safety, the device operating status parameters from step S1, $E$ = {load rate, temperature, humidity}, are monitored in real time during path adjustment; when the environmental parameters are detected to exceed the safety threshold, the path structure is forcibly locked to avoid frequent model reconstruction under extreme conditions.

[0074] Therefore, this strategy comprehensively utilizes the environmental state and noise features of step S1 and the fusion feature structure of step S2 in terms of robustness, and realizes adaptive inference path control driven by features and state awareness.

[0075] The core of dynamic path adjustment lies in building a lightweight, adaptive inference engine. This engine executes the following closed-loop process within each inference cycle:

[0076] Status monitoring and feature analysis: The engine acquires in real time the equipment environmental parameters provided in step S1 and the current fused feature vector $F$ output from step S2, and calculates the sparsity and saliency of the feature vector as quantitative indicators of feature complexity.

[0077] Gating decision generation: The gating value $g_l$ for each path is calculated dynamically from the monitoring data. The gating function considers not only the linear weighting $W_l^{\top} F + b_l$ but also introduces a resource-awareness factor $R$ and a feature complexity factor $C$. Specifically:

[0078] [Formula not preserved in the source text.]

[0079] where $R$ is a comprehensive score of the current CPU / GPU utilization and memory pressure of the edge device, and $C$ is derived from a combination of feature sparsity and saliency. When resource pressure is high or features are relatively simple, the gating mechanism tends to output smaller $g_l$ values so that more computational layers are skipped.

[0080] Security Assessment and Coverage: A security assessment must be conducted before applying gating decisions. The assessment criteria include:

[0081] Whether the variance of the historical confidence series exceeds the dynamic threshold, which is adjusted based on recent noise levels.

[0082] Whether the current fused feature vector is located in the edge region of the model training data distribution (determined by metrics such as Mahalanobis distance).

[0083] If any security assessment triggers an alarm, the current gating decision will be immediately overridden, and the full-path execution mode or the high-redundancy security sub-model will be forcibly enabled.
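
The two assessment criteria and the override rule might be sketched as below. The sketch assumes a diagonal covariance for the Mahalanobis distance and a variance threshold that scales linearly with the noise level; both choices, and all names and numbers, are illustrative assumptions rather than the source's definitions.

```python
import math
from statistics import pvariance

def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance under an assumed diagonal covariance."""
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))

def needs_override(conf_history, features, train_mean, train_var,
                   base_var_threshold, noise_level, max_distance):
    """Return True when either safety criterion raises an alarm."""
    # Assumed rule: the variance threshold grows linearly with recent noise.
    var_threshold = base_var_threshold * (1.0 + noise_level)
    unstable = pvariance(conf_history) > var_threshold
    out_of_distribution = mahalanobis_diag(features, train_mean, train_var) > max_distance
    return unstable or out_of_distribution

history = [0.90, 0.91, 0.89]
mean, var = [0.5, 0.4], [0.01, 0.01]
normal = needs_override(history, [0.5, 0.5], mean, var, 0.01, 0.1, max_distance=3.0)
outlier = needs_override(history, [1.0, 0.4], mean, var, 0.01, 0.1, max_distance=3.0)
```

When `needs_override` returns True, the caller would discard the gating decision and run the full path or a redundant safety sub-model.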

[0084] Resource-precision collaborative scheduling: The system maintains a lightweight "performance-latency" lookup table or regression model that learns, from historical data, the approximate inference delay and accuracy loss of different path configurations. Subject to the inference delay not exceeding the maximum tolerable latency, it selects the path configuration with the minimum accuracy loss as the final execution plan. This process completes within milliseconds, achieving dynamic optimal scheduling.
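
A lookup-table version of this scheduling step could look like the sketch below: filter the configurations that fit the latency budget, then take the one with the smallest accuracy loss. The table entries are invented, and falling back to the fastest configuration when nothing fits is an assumption not stated in the source.

```python
def select_path(configs, max_latency_ms):
    """Pick the feasible configuration with the least accuracy loss.

    configs: list of dicts with 'name', 'latency_ms', and 'accuracy_loss'.
    If no configuration meets the budget, fall back to the fastest one
    (an assumed policy).
    """
    feasible = [c for c in configs if c["latency_ms"] <= max_latency_ms]
    if not feasible:
        return min(configs, key=lambda c: c["latency_ms"])
    return min(feasible, key=lambda c: c["accuracy_loss"])

# Hypothetical "performance-latency" table learned from historical runs.
table = [
    {"name": "full", "latency_ms": 48.0, "accuracy_loss": 0.00},
    {"name": "skip_2", "latency_ms": 31.0, "accuracy_loss": 0.01},
    {"name": "skip_4", "latency_ms": 19.0, "accuracy_loss": 0.03},
]
chosen = select_path(table, max_latency_ms=35.0)
```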

[0085] Quantification and dynamic decision-making of edge resource / real-time SLO indicators: Real-time inference at the edge is limited by device computing power and data update frequency. To meet industrial-grade Service Level Objectives (SLO), the system establishes the following real-time evaluation model:

[0086] $T_{\text{total}} = T_{\text{io}} + \sum_{l=1}^{L} \dfrac{C_l}{R_{\text{avail}}} \leq T_{\max}$;

[0087] where $T_{\text{total}}$ is the total inference delay for a single fault prediction; $T_{\text{io}}$ is the input / output delay, determined by the data sampling rate $f_s$ in step S1 and the transmission bandwidth $B$; $C_l$ is the computational complexity of the $l$-th layer of the model, determined by the multimodal fusion model structure in step S2 (number of layers $L$, channel width $d$, and feature dimension $D$); $R_{\text{avail}}$ is the currently available computing resources at the edge (such as real-time GPU utilization), obtained dynamically through the system monitoring module; and $T_{\max}$ is the maximum tolerable inference latency defined by the system.

[0088] When $T_{\text{total}} > T_{\max}$, the system triggers a resource scheduling mechanism that prioritizes resource allocation based on the feature importance weights $\alpha_v$, $\alpha_p$ (from step S2) and turns off low-weight path layers to ensure that the critical modality is executed first.

[0089] Meanwhile, the real-time performance metrics include stability constraints:

[0090] $\rho = \dfrac{T_{\max} - T_{\text{total}}}{T_{\max}} \geq \rho_{\text{th}}$;

[0091] where $\rho$ is the real-time redundancy coefficient and $\rho_{\text{th}}$ is the real-time redundancy threshold, typically set to 0.1–0.2. When $\rho < \rho_{\text{th}}$, the system triggers a "performance-latency tradeoff strategy": reducing the fused input resolution or shortening the timing window (step S2 output) to maintain the predicted delay within the range of industrial control requirements.

[0092] Using this formula, the model can achieve adaptive inference based on the sampling rate of step S1 and the input dimension of step S2, dynamically controlling the inference latency while ensuring accuracy.
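
The latency budget and stability check can be sketched numerically as follows. The sketch assumes that per-layer compute latency is layer cost divided by available compute rate, and that the redundancy coefficient is the fractional headroom `(max_latency - total) / max_latency`; these forms, the action names, and the sample numbers are assumptions made for illustration.

```python
def total_latency(io_latency, layer_costs, active, compute_rate):
    """I/O latency plus compute latency of the active layers.

    io_latency in seconds, layer_costs in operations,
    compute_rate in operations per second.
    """
    compute = sum(cost for cost, on in zip(layer_costs, active) if on) / compute_rate
    return io_latency + compute

def redundancy(total, max_latency):
    """Assumed redundancy coefficient: fraction of the budget left unused."""
    return (max_latency - total) / max_latency

def plan_action(total, max_latency, rho_threshold=0.15):
    """Map the latency state to one of three hypothetical actions."""
    if total > max_latency:
        return "drop_low_weight_layers"
    if redundancy(total, max_latency) < rho_threshold:
        return "reduce_resolution_or_window"
    return "ok"

# 4 ms I/O plus 5e6 operations at 1e9 operations/s gives about 9 ms in total.
latency = total_latency(0.004, [2e6, 3e6, 1e6], [True, True, False], 1e9)
tight = plan_action(latency, max_latency=0.010)    # headroom of about 10%
relaxed = plan_action(latency, max_latency=0.020)  # headroom of about 55%
over = plan_action(latency, max_latency=0.008)     # budget exceeded
```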

[0093] In the hyperparameters, loss function, and validation process of pruning and knowledge distillation, the lightweight model training stage adopts a joint optimization strategy of structural pruning and knowledge distillation to ensure the efficient deployment of the model at the edge.

[0094] The pruning part is based on the model structure parameters (number of layers L, channel width d) and modal feature importance weights output in step S2. Calculate the sparsity of the pruning target:

[0095] ;

[0096] where the terms are the number of fusion layers and the modal weight of the i-th layer; the computational complexity of the pruned model is reused in the latency formula of step S3 (see the real-time evaluation formula).

[0097] The following loss function is used in the distillation stage:

[0098] ;

[0099] where: the weighting term is the distillation weighting coefficient; the KL divergence measures the distance between two probability distributions, namely the distribution output by the multimodal fusion teacher model from step S2 and the distribution output by the pruned lightweight student model; the cross-entropy loss measures the deviation between the model's predictions and the true labels, which are the true equipment fault states; and the distillation temperature is set adaptively according to the sensor noise variance from step S1, specifically:

[0100] ;

[0101] where the coefficient is the adaptive adjustment coefficient for the distillation temperature; the higher the noise, the higher the distillation temperature, and the larger the temperature, the smoother the teacher output distribution and the better the distillation effect.
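The distillation stage can be illustrated with the conventional knowledge distillation loss: a KL term between temperature-softened teacher and student distributions, weighted against a cross-entropy term on the true label. The sketch below makes several assumptions: the squared-temperature scaling of the KL term, the placement of the weighting coefficient `alpha`, and the linear form of `adaptive_temperature` (with its `base_temperature` and `k` parameters) are illustrative. The only property taken from the text is that the temperature rises with sensor noise variance.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a smoother distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_temperature(noise_variance, base_temperature=2.0, k=1.0):
    """Assumed form: temperature grows linearly with the sensor noise variance."""
    return base_temperature * (1.0 + k * noise_variance)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distillation_loss(teacher_logits, student_logits, label, alpha, temperature):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(student, label)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft_term = kl_divergence(p_teacher, p_student)
    hard_term = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * soft_term + (1.0 - alpha) * hard_term


temperature = adaptive_temperature(noise_variance=0.5)
loss = distillation_loss([2.0, 1.0, 0.0], [0.0, 1.0, 2.0], label=0,
                         alpha=0.5, temperature=temperature)
```

When the student reproduces the teacher's logits exactly, the KL term vanishes and only the cross-entropy term remains.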

[0102] The validation process uses the multi-source data streams from step S1 (synchronized by timestamp alignment) for batch testing, with a validation set of ≥5000 records. Performance metrics include:

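[0103] $$\text{Accuracy} = \frac{N_{\text{correct}}}{N_{\text{total}}}, \qquad F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}};$$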

[0104] where Accuracy is the proportion of correctly predicted samples among all validation samples in the model's fault prediction results, reflecting the overall correctness of the model's predictions, and the F1 score is the harmonic mean of precision and recall, which jointly measures the model's ability to identify faulty samples and its false positive rate.

[0105] When the performance meets the requirements and the pruning rate improves the model efficiency by ≥30%, the model version enters the edge deployment process.
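The validation gate can be sketched as below, for binary labels (1 = fault, 0 = healthy). The 95% accuracy floor, the 30% efficiency gain, and the 5000-record minimum come from the description; treating 0.92 as an F1 floor is an assumption, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = fault, 0 = healthy)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Correctly predicted samples divided by all validation samples."""
    tp, _, _, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def ready_for_edge(acc, f1, efficiency_gain, n_samples,
                   min_acc=0.95, min_f1=0.92, min_gain=0.30, min_samples=5000):
    """Deployment gate; min_f1 is an assumed threshold, the others follow the text."""
    return (n_samples >= min_samples and acc >= min_acc
            and f1 >= min_f1 and efficiency_gain >= min_gain)


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
```

The toy labels above give 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, which is far below the deployment thresholds.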

[0106]

[0107] Through technical optimization, the multidimensional device state feature vectors output by the preceding steps are transformed into reliable real-time inference results. On the one hand, structural pruning and knowledge distillation are used to make the pre-trained multimodal fusion model lightweight, significantly reducing the computational load at the edge. At the same time, based on the current operating state of the device and the edge computing resource utilization, a gating mechanism dynamically adjusts the model's computational path (for example, skipping redundant feature layers or activating specific sub-models) to avoid wasting resources. On the other hand, a safety fallback mechanism and a real-time SLO (service level objective) metric are introduced to ensure that the model still stably outputs results that meet the accuracy requirement (accuracy ≥ 95%) under varying operating conditions, feature perturbations, or extreme environments. The inference results (≥0.92) provide key data support for subsequent fault warning and maintenance strategy generation, and directly resolve the core conflict in existing technologies between high computing resource consumption and insufficient real-time performance.

[0108] In step S4, cloud-based continuous learning and model updates: the inference results and original multi-source data from the edge are uploaded to the cloud server. The cloud uses an incremental learning algorithm to iteratively optimize the multimodal fusion model and generate an updated model version, which is then distributed to the edge.

[0109] The incremental learning algorithm uses online random forests or online deep neural networks. The cloud server performs quality checks on the new data before updating the model and verifies the predictive performance of the new model through A / B testing. If the new model performs better than the old model, it is deployed to the edge.

[0110] In some examples, the incremental learning algorithm is an online support vector machine (SVM) or an incremental graph neural network (GNN). Any algorithm is suitable provided that it can iteratively optimize the multimodal fusion model based on newly added multi-source data and that data quality verification and A / B testing are completed before each model update; under these conditions the model can be updated at least once a week and its long-term predictive performance maintained.
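The pre-update checks can be sketched as a quality filter followed by an A / B comparison. This is a minimal illustration: the required field names, the use of per-sample correctness flags, and the `min_margin` parameter are hypothetical, and a production A / B test would normally add a statistical significance check.

```python
def clean_batch(records, required=("timestamp", "vibration", "image_id")):
    """Quality check: keep only records where every required field is present."""
    return [r for r in records if all(r.get(k) is not None for k in required)]


def ab_test(old_correct, new_correct, min_margin=0.0):
    """Compare two models by accuracy on their traffic shares.

    Each argument is a list of 0/1 flags (1 = prediction was correct).
    Returns "new" only if the new model beats the old one by more than min_margin.
    """
    old_acc = sum(old_correct) / len(old_correct)
    new_acc = sum(new_correct) / len(new_correct)
    return "new" if new_acc - old_acc > min_margin else "old"


batch = [
    {"timestamp": 1, "vibration": 0.2, "image_id": "a"},
    {"timestamp": 2, "vibration": None, "image_id": "b"},   # fails the check
    {"timestamp": 3, "vibration": 0.4},                     # missing image_id
]
usable = clean_batch(batch)
winner = ab_test(old_correct=[1, 0, 1, 1], new_correct=[1, 1, 1, 1])
```

Only the winning model version would then be distributed to the edge.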

[0111] In step S5, fault warning and maintenance strategy generation: Based on the real-time inference results at the edge, if the predicted fault probability exceeds a preset threshold, a fault warning signal is triggered; at the same time, based on the health status score output by the model updated in the cloud (the score range is 0-100, and maintenance is required when the score is ≤60), combined with the equipment's historical maintenance records and current operating conditions, a targeted maintenance strategy is generated.

[0112] Here, the predicted fault probability is obtained by weighted fusion of the confidence scores output by the edge model and the cloud model, which improves the robustness of the prediction. The maintenance strategy generation algorithm uses a multi-objective optimization model (such as the NSGA-II algorithm) to generate the optimal maintenance plan from the correlation between fault type and equipment operating conditions (for example, crack faults are positively correlated with load rate, and oil stain faults with ambient humidity) and from maintenance cost constraints (such as minimizing downtime and maintenance costs).
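The warning logic of step S5 can be sketched as follows. The health-score rule (maintenance required at a score of 60 or below) comes from the description; the fusion weight of 0.6 for the edge model and the probability threshold of 0.8 are hypothetical values, since the text only calls them a weighting and a preset threshold.

```python
def fused_fault_probability(edge_conf, cloud_conf, edge_weight=0.6):
    """Weighted fusion of edge and cloud confidence scores (weights sum to 1)."""
    return edge_weight * edge_conf + (1.0 - edge_weight) * cloud_conf


def assess(edge_conf, cloud_conf, health_score,
           prob_threshold=0.8, health_threshold=60):
    """Return the fused probability plus the warning and maintenance flags."""
    probability = fused_fault_probability(edge_conf, cloud_conf)
    return {
        "fault_probability": probability,
        "warning": probability > prob_threshold,              # exceeds preset threshold
        "maintenance_required": health_score <= health_threshold,
    }


degraded = assess(edge_conf=0.9, cloud_conf=0.8, health_score=55)
healthy = assess(edge_conf=0.2, cloud_conf=0.3, health_score=85)
```

The two flags are independent: a warning depends on the fused probability, while the maintenance flag depends on the cloud model's health score.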

[0113] This multi-objective optimization model aims to solve for the Pareto optimal set of maintenance strategies. Its mathematical model is defined as follows:

[0114] Decision variable: a sequence of maintenance actions, in which each element indicates the maintenance action (such as no operation, cleaning, tightening, or replacement) taken for a specific component within a specific time window.

[0115] Objective function (to be optimized simultaneously):

[0116] Maximize overall equipment health improvement, where the improvement term is the increase in the equipment health score at a given future time, as predicted by the cloud model after the strategy is adopted, and the weights are assigned to the different future time points (with higher weights for nearer time points).

[0117] Minimize total maintenance cost, which includes labor cost, spare parts cost, and the predicted production loss due to downtime.

[0118] Minimize maintenance operation risk, where the terms are the inherent risk factor of the type of maintenance operation, the current health assessment of the component, and an attenuation coefficient indicating that the healthier the component, the lower the perceived risk of the same maintenance action.

[0119] Constraints:

[0120] Fault correlation constraints: based on the correlation rules between fault type and operating conditions determined in step S5 (such as "crack faults are positively correlated with high load rates"), generate hard or soft constraints of the form "if the load rate remains above the threshold X%, a crack inspection or preventive maintenance action must be scheduled within a specified number of hours in the future".

[0121] Resource availability constraints: available maintenance personnel, spare parts inventory, maintenance time windows, etc.

[0122] Operational feasibility constraints: the logical sequence of maintenance actions (e.g., cleaning must precede inspection).

[0123] Optimization and Decision-Making Process: The NSGA-II algorithm is used to solve the above model, outputting a set of Pareto optimal solutions (that is, a set of non-dominated maintenance strategies). Subsequently, a decision-making layer based on on-site preferences is introduced. This layer selects a final execution strategy from the Pareto frontier based on the urgency of the current production plan, the upper limit of the cost budget, and risk tolerance. For example, during a production rush, a strategy with slightly higher cost and risk but maximal health improvement might be chosen; during normal periods, a strategy that balances cost and risk while taking all factors into account might be chosen.
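The decision step can be illustrated with a non-dominated filter and a preference-weighted selection. This sketch is not NSGA-II itself; it shows only the dominance test that NSGA-II's ranking relies on and one possible decision layer. The candidate strategies, their objective values, and the preference weights are hypothetical.

```python
from typing import NamedTuple


class Strategy(NamedTuple):
    name: str
    health_gain: float   # objective to maximize
    cost: float          # objective to minimize
    risk: float          # objective to minimize


def dominates(a, b):
    """True if a is no worse than b on all objectives and strictly better on one."""
    no_worse = a.health_gain >= b.health_gain and a.cost <= b.cost and a.risk <= b.risk
    better = a.health_gain > b.health_gain or a.cost < b.cost or a.risk < b.risk
    return no_worse and better


def pareto_front(strategies):
    """Keep every strategy that no other strategy dominates."""
    return [s for s in strategies if not any(dominates(o, s) for o in strategies)]


def select(front, w_health, w_cost, w_risk):
    """Pick the strategy with the best weighted score on min-max normalized objectives."""
    def normalize(values):
        low, high = min(values), max(values)
        return [0.0 if high == low else (v - low) / (high - low) for v in values]

    health = normalize([s.health_gain for s in front])
    cost = normalize([s.cost for s in front])
    risk = normalize([s.risk for s in front])
    scores = [w_health * h - w_cost * c - w_risk * r
              for h, c, r in zip(health, cost, risk)]
    return front[scores.index(max(scores))]


candidates = [
    Strategy("replace", 30.0, 900.0, 0.30),
    Strategy("tighten", 18.0, 200.0, 0.10),
    Strategy("clean", 6.0, 50.0, 0.02),
    Strategy("tighten_late", 15.0, 250.0, 0.12),   # dominated by "tighten"
    Strategy("no_op", 0.0, 0.0, 0.0),
]
front = pareto_front(candidates)
rush_choice = select(front, w_health=0.8, w_cost=0.1, w_risk=0.1)
normal_choice = select(front, w_health=0.4, w_cost=0.3, w_risk=0.3)
```

With health improvement weighted heavily (a production rush), the costly replacement wins; with balanced weights, the cheaper tightening action is selected.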

[0124] In step S6, the maintenance execution and feedback closed loop is established: the maintenance strategy is sent to the equipment execution system through the industrial control interface, and the execution results of the maintenance operation are recorded; the execution results are fed back to the cloud to optimize subsequent model training and the maintenance strategy generation algorithm. In the above steps, the edge-cloud collaborative architecture makes the model lightweight and dynamically adjustable, which resolves the conflict between high computing resource consumption and insufficient real-time performance in the existing technology while ensuring prediction accuracy.

[0125] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all such modifications or substitutions should be covered within the scope of the claims of the present invention.

Claims

1. A method for industrial equipment fault prediction and maintenance integrating AI vision, characterized in that it comprises: simultaneously acquiring physical state parameters of industrial equipment and visual image data of key components to obtain multi-source heterogeneous data; performing multimodal feature extraction and weighted fusion on the multi-source heterogeneous data to generate a multidimensional feature vector; receiving the multidimensional feature vector with a lightweight model, and performing model computation path selection and real-time inference through a dynamic path selection strategy to obtain a fault prediction result; uploading the fault prediction result and the multi-source heterogeneous data to the cloud, and iteratively optimizing the multimodal fusion model through an incremental learning algorithm to generate an updated model; issuing a fault warning based on the updated model and the fault prediction result, and at the same time generating an optimal maintenance plan through a multi-objective optimization model in combination with the equipment's historical maintenance records and current operating conditions; and collecting feedback results and optimizing the model parameters and maintenance strategies based on the feedback results to form a closed loop.

2. The method for industrial equipment fault prediction and maintenance integrating AI vision as described in claim 1, characterized in that: The synchronization is achieved through timestamp alignment and data transmission is performed via CAN bus or industrial Ethernet.

3. The method for industrial equipment fault prediction and maintenance integrating AI vision as described in claim 1, characterized in that: The multimodal feature extraction and weighted fusion include: Key features are obtained by extracting component deformation, surface cracks, and oil stain area percentage from the visual image data; The vibration spectrum energy distribution, temperature change gradient, and current peak frequency are extracted from the physical state parameters to obtain the time-series characteristics. The key features and the temporal features are weighted and fused using a Transformer-based multimodal fusion model to generate the multidimensional feature vector.

4. The method for industrial equipment fault prediction and maintenance integrating AI vision as described in claim 1, characterized in that: The lightweight model is obtained by applying a joint optimization strategy of structural pruning and knowledge distillation to the multimodal fusion model: The structural pruning simplifies the model structure by removing redundant neurons, network layers, or connection weights in the multimodal fusion model. The knowledge distillation process uses the multimodal fusion model as the teacher model and the pruned model as the student model for training. During training, the output of the teacher model is used as a supervision signal to generate a lightweight model.

5. The industrial equipment fault prediction and maintenance method integrating AI vision as described in claim 4, characterized in that: the pruning target sparsity of the structural pruning is calculated from the model structure parameters of the multimodal fusion model and the modal feature importance weights of each fusion layer, using the following formula: ; where the terms are the pruning target sparsity, the number of fusion layers, and the modal weight of the i-th layer.

6. The industrial equipment fault prediction and maintenance method integrating AI vision as described in claim 4, characterized in that: the distillation temperature used in the knowledge distillation is set adaptively based on the sensor noise variance, using the following formula: ; where the terms are the distillation temperature, the adaptive adjustment coefficient for the distillation temperature, and the sensor noise variance; the distillation temperature is used to smooth the output distribution of the teacher model, and the higher the noise, the higher the distillation temperature.

7. The method for industrial equipment fault prediction and maintenance integrating AI vision as described in claim 1, characterized in that: the dynamic path selection strategy is implemented through a gating mechanism, which determines whether to skip a specific computation layer based on a gating variable; the gating variable is dynamically calculated from the current fused feature vector: $g_l = \sigma(w_l^{\top} x + b_l)$; where $x$ is the fused feature vector, $g_l$ is the gating variable, representing the activation probability of the $l$-th layer path, $w_l^{\top}$ is the transpose of the weight vector, $b_l$ is the bias term, and $\sigma$ is the activation function; the gating mechanism also introduces a resource pressure factor and a feature complexity factor, skipping redundant computation layers when resource utilization is high or features are simple.

8. The industrial equipment fault prediction and maintenance method integrating AI vision as described in claim 7, characterized in that: The dynamic path selection strategy also includes a safety rollback mechanism: when the prediction confidence fluctuation exceeds the threshold in multiple consecutive inferences, it automatically switches to the full path execution mode; When the device's operating environment parameters exceed the safety threshold, the model calculation path structure is forcibly locked.

9. The method for industrial equipment fault prediction and maintenance integrating AI vision as described in claim 1, characterized in that: The real-time inference satisfies the real-time performance metric, namely, the total inference latency does not exceed the maximum tolerable inference latency defined by the system, wherein the total inference latency consists of the sum of the data input / output latency and the computation latency of each activation layer of the model.

10. The method for industrial equipment fault prediction and maintenance integrating AI vision as described in claim 9, characterized in that: the real-time performance metrics also include stability constraints: ; where the terms are the real-time redundancy coefficient, the real-time redundancy threshold, the maximum tolerable inference latency defined by the system, and the total inference latency for a single fault prediction; when the stability constraint is not satisfied, the system triggers a performance-latency trade-off strategy, reducing the resolution of the multimodal fusion input or shortening the timing window to keep the prediction latency within the range required by industrial control.

11. The method for industrial equipment fault prediction and maintenance integrating AI vision as described in claim 1, characterized in that: The incremental learning algorithm uses online random forests or online deep neural networks to perform quality checks on new data before model updates and verifies the performance of the new model through A / B testing. The model with better performance is deployed to the edge.

12. The method for industrial equipment fault prediction and maintenance integrating AI vision as described in claim 11, characterized in that: The incremental learning algorithm also includes online support vector machines or incremental graph neural networks, with the model being updated at least once a week.

13. The method for industrial equipment fault prediction and maintenance integrating AI vision as described in claim 1, characterized in that: The fault warning is triggered based on the fault probability prediction value, which is obtained by weighted fusion of the confidence level output by the edge model and the confidence level output by the cloud model.

14. The method for industrial equipment fault prediction and maintenance integrating AI vision as described in claim 1, characterized in that: The multi-objective optimization model aims to solve for the Pareto optimal maintenance strategy set. The objective functions include: maximizing the overall equipment health improvement, minimizing the total maintenance cost, and minimizing the maintenance operation risk. The constraints include: fault correlation constraints, resource availability constraints, and operational feasibility constraints.

15. The method for industrial equipment fault prediction and maintenance integrating AI vision as described in claim 14, characterized in that: The fault correlation constraints are generated based on the association rules between fault type and equipment operating conditions.