A multi-modal industrial autonomous learning method and system
By combining dynamic fusion weight calculation and an incremental lifelong learning mechanism with a performance feedback loop based on causal inference, the method addresses the rigidity of multimodal learning methods in industrial environments and their disconnect from optimization goals, giving the autonomous learning system continuous optimization and adaptive capabilities.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NINGBO UNKNOWN DIGITAL INFORMATION TECH CO LTD
- Filing Date
- 2026-03-25
- Publication Date
- 2026-06-19
AI Technical Summary
Existing industrial multimodal learning methods have limitations in long-term, autonomous online learning and optimization. They are unable to adapt to dynamic changes in sensor data quality, resulting in insufficient system robustness, inability of the model to continuously evolve, and a disconnect between optimization objectives and industrial value.
By employing dynamic fusion weight calculation, an incremental lifelong learning mechanism, and a causal inference performance feedback loop, the system achieves self-awareness, continuous optimization, and autonomous decision-making. The fusion strategy is optimized through multimodal data fusion and causal relationship analysis.
This enhances the system's stability and adaptability in complex industrial environments, enables continuous learning and optimization of the model, and directly contributes to improving actual production efficiency.
Figure CN122242864A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of industrial automation and intelligent manufacturing technology, and in particular to a multimodal industrial autonomous learning method and system.
Background Technology
[0002] With the continuous improvement of industrial intelligence, multimodal data analysis is playing an increasingly important role in industrial scenarios such as equipment condition monitoring and product quality control. By integrating heterogeneous information from different sensors, the accuracy of model perception and decision-making can be effectively improved.
[0003] However, existing industrial multimodal learning methods still have significant limitations in achieving long-term, autonomous online learning and optimization. First, most methods employ fixed fusion strategies, making it difficult to adapt to the dynamic changes in sensor data quality in industrial settings, resulting in insufficient system robustness. Second, when faced with new tasks or data patterns, models typically require full retraining, which is not only inefficient but also prone to losing previously learned knowledge, hindering continuous model evolution. Furthermore, traditional optimization objectives are often disconnected from the core performance indicators that reflect actual production value, causing the direction of model improvement to deviate from the ultimate goals of industrial applications.
[0004] Therefore, existing technologies struggle to build an autonomous learning system that is self-aware, continuously evolving, and driven by industrial value.
Summary of the Invention
[0005] The purpose of this invention is to provide a multimodal industrial autonomous learning method and system to address the problems of rigid multimodal fusion strategies, the inability of models to continuously evolve, and the disconnect between optimization objectives and industrial value in existing technologies. By introducing dynamic fusion weight calculation, an incremental lifelong learning mechanism, and a performance feedback loop based on causal inference, the system achieves self-awareness, continuous optimization, and autonomous decision-making capabilities in complex industrial environments, thereby significantly improving the adaptability, robustness, and overall effectiveness of industrial intelligent systems.
[0006] To achieve the above objectives, this invention provides a multimodal industrial autonomous learning method, comprising the following steps: Step S1: Collect multimodal sensing data from industrial equipment, calculate fusion weights based on the information utility of the data of each modality with respect to the task objective, and fuse the multimodal features; Step S2: Input the fused multimodal features into the incremental lifelong learning model to obtain the inference result of the industrial task and the corresponding uncertainty value; Step S3: Collect industrial performance indicator data related to the inference result; Step S4: Construct a causal graph model and calculate the average treatment effect of different fusion strategy features on the industrial performance indicators; Step S5: Based on the calculated average treatment effect, generate optimization instructions for the fusion weight calculation process and feed these optimization instructions back to step S1 to start a new round of the autonomous optimization cycle.
[0007] Preferably, step S1 specifically includes: Step S11: Synchronously acquire sensor data streams of at least two different modalities from the industrial equipment; Step S12: For the data of each modality, calculate its modal quality score $Q_i$ and modal information score $I_i$; Step S13: For the $i$-th modality, calculate its information utility $U_i$ as a linear weighted sum of its modal quality score $Q_i$ and modal information score $I_i$: $U_i = \alpha Q_i + \beta I_i$, where $\alpha$ and $\beta$ are learnable parameters that balance the quality and information contributions; Step S14: Normalize the information utilities of all $M$ modalities with Softmax to obtain the fusion weight $w_i$ of each modality: $w_i = \exp(U_i) / \sum_{j=1}^{M} \exp(U_j)$, where $j$ denotes the $j$-th modality and $U_j$ denotes the information utility of the $j$-th modality; Step S15: Use the fusion weights $w_i$ to compute a weighted sum of the feature vectors of the modalities, completing the multimodal feature fusion.
[0008] Preferably, the modal quality score $Q_i$ is calculated as follows: for the $i$-th modality, calculate the Mahalanobis distance $D_i$ between its feature vector $x_i$ at the current time and the preset modal reference feature vector $\mu_i$: $D_i = \sqrt{(x_i - \mu_i)^{\top} \Sigma_i^{-1} (x_i - \mu_i)}$, where $\Sigma_i$ denotes the covariance matrix of the reference features of this modality and $\top$ denotes transpose; the Mahalanobis distance is then mapped to the modal quality score with the Sigmoid function: $Q_i = 1 / (1 + \exp(\gamma (D_i - \tau)))$, where $\gamma$ denotes the scaling factor and $\tau$ denotes the distance threshold.
[0009] Preferably, the modal information score $I_i$ is calculated as follows: for the $i$-th modality, estimate the normalized mutual information between its data $X_i$ within a sliding time window and the task label $Y$, and use it as the modal information score: $I_i = \mathrm{NMI}(X_i; Y)$, where the normalization is computed from $I(X_i; Y)$, the mutual information between $X_i$ and $Y$, and $H(\cdot)$, the information entropy.
[0010] Preferably, in step S2, the incremental lifelong learning model consists of a shared feature extraction backbone network and multiple task-specific output head networks; the backbone network is a multilayer perceptron, whose parameters are shared across all tasks; each output head network is a fully connected layer responsible for the output of a specific task; when a new task is encountered, a new output head network is created and connected to the frozen backbone network for training.
[0011] Preferably, in step S2, the uncertainty value $u$ is calculated as follows: perform $N$ forward passes of the incremental lifelong learning model with Dropout enabled in every pass, obtaining $N$ predicted probability distributions $p_1, \dots, p_N$; calculate the mean $\bar{p}$ of the $N$ probability distributions: $\bar{p} = \frac{1}{N} \sum_{n=1}^{N} p_n$; calculate the information entropy of $\bar{p}$ as the uncertainty value $u$: $u = -\sum_{c=1}^{C} \bar{p}_c \log \bar{p}_c$, where $n$ denotes the $n$-th forward pass, $p_n$ denotes the predicted probability distribution obtained from the $n$-th forward pass, $C$ denotes the total number of categories, and $c$ denotes the $c$-th category.
[0012] Preferably, step S4 specifically includes: Step S41: Define a causal graph whose nodes include the fusion strategy variable $S$, the environmental operating condition variables $E$, and the industrial performance indicator variable $P$, where $E$ denotes equipment operating parameters; Step S42: Calculate the average treatment effect $\theta$ using the double machine learning method: using a first machine learning model $g_1$ with the environmental operating condition variables $E$ as input, predict the industrial performance indicator $P$ and obtain the residual $\tilde{P}$: $\tilde{P} = P - g_1(E)$, where $g_1$ is a first gradient boosting decision tree; using a second machine learning model $g_2$ with $E$ as input, predict the fusion strategy variable $S$ and obtain the residual $\tilde{S}$: $\tilde{S} = S - g_2(E)$, where $g_2$ is a second gradient boosting decision tree; then fit the linear regression model $\tilde{P} = \theta \tilde{S} + \varepsilon$, where $\varepsilon$ denotes the error term and the regression coefficient $\theta$ is the average treatment effect.
[0013] Preferably, the fusion strategy variable $S$ consists of at least two statistical features extracted from the fusion weight sequence $\{w_i(k)\}$ within the time window $[t-L+1, t]$; the fusion strategy variable is calculated as $S = \sum_{m=1}^{K} \lambda_m \phi_m$, where $k$ denotes the index of a discrete time point within the time window, $t$ denotes the current time, $L$ denotes the length of the time window, $\phi_1, \phi_2, \dots, \phi_K$ denote $K$ different statistical features calculated from the fusion weight sequence, $\lambda_1, \lambda_2, \dots, \lambda_K$ denote the weighting coefficients of the respective statistical features, and $w_i(k)$ denotes the fusion weight of the $i$-th modality at time point $k$.
[0014] Preferably, step S5 specifically includes: Step S51: From the results of step S4, select the fusion strategy feature $S^{*}$ with the largest positive average treatment effect; Step S52: Take the values of the internal parameters of step S1 at the time the fusion strategy feature $S^{*}$ was generated as the optimization target $\psi^{*}$, where the internal parameters include the reference feature vector $\mu_i$, the distance threshold $\tau$, or the balance parameters $\alpha$, $\beta$; Step S53: Feed the optimization target back to step S1 as a smooth update: $\psi_{\text{new}} = (1-\eta)\,\psi_{\text{cur}} + \eta\,\psi^{*}$, where $\psi_{\text{cur}}$ denotes the current value of the parameter, $\psi_{\text{new}}$ denotes the updated value of the parameter, and $\eta$ denotes the meta-learning rate.
[0015] This invention also provides a multimodal industrial autonomous learning system, comprising: a data acquisition module, used to synchronously acquire sensor data streams of at least two different modalities from industrial equipment; a feature fusion module, used to calculate the fusion weights based on the information utility of the data of each modality with respect to the task objective, and to fuse the multimodal features; an inference module, used to input the fused multimodal features into the incremental lifelong learning model to obtain the inference result of the industrial task and the corresponding uncertainty value; an industrial performance indicator collection module, used to collect industrial performance indicator data associated with the inference result; a causal analysis module, used to construct a causal graph model and calculate the average treatment effect of different fusion strategy features on the industrial performance indicators; and an optimization feedback module, used to generate optimization instructions for the fusion weight calculation process based on the calculated average treatment effect, and to feed these optimization instructions back to the data acquisition module to start a new round of the autonomous optimization cycle.
[0016] Therefore, the present invention employs the above-mentioned multimodal industrial autonomous learning method and system, and the beneficial technical effects are as follows: (1) This invention calculates modal quality score and information score in real time and dynamically generates fusion weight, enabling the system to automatically sense and adapt to abnormal situations such as fluctuations in sensor data quality and modal loss, overcoming the rigidity of traditional fixed fusion strategies and enhancing the stability and reliability of the system in complex industrial environments.
[0017] (2) By adopting an incremental lifelong learning model architecture, the system does not need to retrain the entire system when introducing new tasks or new data. Instead, it only needs to dynamically create and train a specific output head network. At the same time, it uses knowledge distillation technology to protect existing knowledge, thereby enabling continuous learning of new knowledge without forgetting old knowledge and possessing the ability to learn autonomously throughout life.
[0018] (3) This invention introduces a causal inference method that guides the optimization direction by analyzing the causal relationship between the fusion strategy and key performance indicators (such as equipment overall efficiency and product yield), so that the adjustment of the front-end perception fusion strategy directly serves the goal of improving the actual production efficiency of the back-end, and solves the problem of the disconnect between model optimization and actual value.
Attached Figure Description
[0019] Figure 1 is a flowchart of a multimodal industrial autonomous learning method according to the present invention; Figure 2 is a schematic diagram illustrating the calculation of weights for multimodal feature fusion; Figure 3 is an architecture diagram of a multimodal industrial autonomous learning system according to the present invention.
Detailed Implementation
[0020] The technical solution of the present invention will be further described below with reference to the accompanying drawings and embodiments.
[0021] Unless otherwise defined, the technical or scientific terms used in this invention shall have the ordinary meaning as understood by one of ordinary skill in the art to which this invention pertains.
[0022] Example 1: This embodiment uses tool wear monitoring on CNC machine tools as an application scenario to illustrate the implementation process of the multimodal industrial autonomous learning method of the present invention. In this scenario, the system achieves real-time identification and prediction of tool status by fusing sensor data from three modalities: vibration, acoustics, and infrared thermal imaging. It also autonomously optimizes the fusion strategy based on production performance indicators (such as tool replacement cost, downtime, and product defect rate).
[0023] As shown in Figure 1, the method includes the following steps: Step S1: Collect multimodal sensing data from industrial equipment, calculate fusion weights based on the information utility of the data of each modality with respect to the task objective, and fuse the multimodal features.
[0024] Step S11: Synchronously acquire sensor data streams from at least two different modes from the industrial equipment; in this embodiment, vibration sensor data (mode 1), acoustic sensor data (mode 2), and infrared thermal imaging data (mode 3) are acquired. All data are acquired synchronously at a sampling rate of 100Hz and time-aligned.
[0025] Step S12: As shown in Figure 2, for the data of each modality, calculate its modal quality score $Q_i$ and modal information score $I_i$. The modal quality score reflects the reliability of the data itself, while the modal information score reflects its ability to discriminate the current task.
[0026] (1) The modal quality score $Q_i$ is calculated as follows: for the $i$-th modality, calculate the Mahalanobis distance $D_i$ between its feature vector $x_i$ at the current time and the preset modal reference feature vector $\mu_i$: $D_i = \sqrt{(x_i - \mu_i)^{\top} \Sigma_i^{-1} (x_i - \mu_i)}$, where $\Sigma_i$ denotes the covariance matrix of the reference features of this modality, estimated from historical normal data, and $\top$ denotes transpose; the Mahalanobis distance is then mapped to the modal quality score with the Sigmoid function: $Q_i = 1 / (1 + \exp(\gamma (D_i - \tau)))$, where $\gamma$ denotes the scaling factor (set to 1.0) and $\tau$ denotes the distance threshold (set to 3.0), which controls the sensitivity to quality degradation.
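The quality score calculation described above can be illustrated with a minimal Python sketch. It is not part of the patent text: the function names `solve`, `mahalanobis`, and `quality_score` are hypothetical, and the sign convention of the Sigmoid (quality decreasing as the distance grows past the threshold) is an assumption consistent with the stated role of the threshold.

```python
import math

def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def mahalanobis(x, mu, cov):
    """Mahalanobis distance between feature vector x and reference vector mu."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(cov, d)  # z = cov^{-1} d without forming the inverse
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))

def quality_score(distance, gamma=1.0, tau=3.0):
    """Assumed Sigmoid mapping: 0.5 at the threshold, lower for larger distances."""
    return 1.0 / (1.0 + math.exp(gamma * (distance - tau)))

# Example with hypothetical vibration features and an identity covariance.
d = mahalanobis([3.0, 4.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
q = quality_score(d)
```

With the embodiment's settings (scaling factor 1.0, threshold 3.0), a feature vector exactly at the threshold distance receives a score of 0.5 under this assumed mapping.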
[0027] (2) The modal information score $I_i$ is calculated as follows: for the $i$-th modality, estimate the normalized mutual information between its data $X_i$ within a sliding time window and the task label $Y$, and use it as the modal information score: $I_i = \mathrm{NMI}(X_i; Y)$, where the normalization is computed from $I(X_i; Y)$, the mutual information between $X_i$ and $Y$, and $H(\cdot)$, the information entropy; $X_i$ denotes the modal data and $Y$ denotes the wear status label (such as "normal", "slight wear", "severe wear").
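A minimal sketch of the information score follows. Two points are assumptions not stated in the text: the modal data are taken to be already discretized into bins, and the mutual information is normalized by the geometric mean of the two entropies, which is only one of several common normalizations. All function names are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (natural log) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    """Mutual information between two equally long discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def normalized_mutual_information(xs, ys):
    """Assumed normalization: I(X;Y) / sqrt(H(X) H(Y)); 0 if either entropy is 0."""
    hx, hy = entropy(xs), entropy(ys)
    if hx == 0.0 or hy == 0.0:
        return 0.0
    return mutual_information(xs, ys) / math.sqrt(hx * hy)

# Hypothetical window: binned vibration level against wear labels.
bins = [0, 0, 1, 1, 2, 2]
labels = ["normal", "normal", "slight wear", "slight wear", "severe wear", "severe wear"]
score = normalized_mutual_information(bins, labels)
```

A modality whose binned values determine the label receives a score near 1, while a modality that is statistically independent of the label receives a score near 0.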
[0028] Step S13: For the $i$-th modality, calculate its information utility $U_i$ as a linear weighted sum of its modal quality score $Q_i$ and modal information score $I_i$: $U_i = \alpha Q_i + \beta I_i$, where $\alpha$ and $\beta$ are learnable parameters that balance the quality and information contributions. In this embodiment, $\alpha$ and $\beta$ are set to 0.6 and 0.4 respectively.
[0029] Step S14: Normalize the information utilities of all $M$ modalities ($M = 3$ in this embodiment) with Softmax to obtain the fusion weight $w_i$ of each modality: $w_i = \exp(U_i) / \sum_{j=1}^{M} \exp(U_j)$, where $j$ denotes the $j$-th modality and $U_j$ denotes the information utility of the $j$-th modality; Step S15: Use the fusion weights $w_i$ to compute a weighted sum of the feature vectors of the modalities, completing the multimodal feature fusion.
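Steps S13 to S15 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented implementation: the function names `fusion_weights` and `fuse` are hypothetical, the example scores are invented, and all modality feature vectors are assumed to have the same dimension so that they can be summed.

```python
import math

def fusion_weights(quality, info, alpha=0.6, beta=0.4):
    """Utility U_i = alpha*Q_i + beta*I_i followed by a Softmax over modalities."""
    utilities = [alpha * q + beta * i for q, i in zip(quality, info)]
    peak = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - peak) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(features, weights):
    """Weighted sum of equally sized per-modality feature vectors."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]

# Hypothetical scores for vibration, acoustic, and infrared modalities.
weights = fusion_weights(quality=[0.9, 0.4, 0.7], info=[0.8, 0.5, 0.3])
fused = fuse([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], weights)
```

Because Softmax is applied to utilities that lie in a narrow range, the resulting weights differ only moderately between modalities; a degraded sensor is down-weighted rather than excluded.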
[0030] Step S2: Input the fused multimodal features into the incremental lifelong learning model to obtain the inference results of the industrial task and the corresponding uncertainty value.
[0031] The incremental lifelong learning model consists of a shared feature extraction backbone network and multiple task-specific output head networks.
[0032] The backbone network is a 5-layer MLP that outputs 128-dimensional shared features; each output head network is a fully connected layer, and the two existing heads correspond to "tool state classification" and "remaining life regression" respectively; when a new "tool type recognition" task is added, a new output head is created and trained while the backbone network is frozen.
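The frozen-backbone, per-task-head arrangement can be illustrated with a deliberately small sketch. Everything here is an assumption made for illustration: the class `IncrementalModel` is hypothetical, a single ReLU layer stands in for the 5-layer MLP backbone, and each head is a linear layer trained by plain stochastic gradient descent on squared error.

```python
class IncrementalModel:
    def __init__(self, backbone_weights):
        # Shared backbone parameters; never modified after construction (frozen).
        self.backbone = [row[:] for row in backbone_weights]
        self.heads = {}  # task name -> weight matrix of its output head

    def features(self, x):
        # One ReLU layer as a stand-in for the shared feature extractor.
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.backbone]

    def add_head(self, task, out_dim=1):
        # A new task only adds a new fully connected head.
        self.heads[task] = [[0.0] * len(self.backbone) for _ in range(out_dim)]

    def predict(self, task, x):
        h = self.features(x)
        return [sum(w * hi for w, hi in zip(row, h)) for row in self.heads[task]]

    def train_head(self, task, data, lr=0.1, epochs=200):
        # Gradient steps touch only the head of the given task.
        head = self.heads[task]
        for _ in range(epochs):
            for x, y in data:
                h = self.features(x)
                for row, target in zip(head, y):
                    err = sum(w * hi for w, hi in zip(row, h)) - target
                    for j, hj in enumerate(h):
                        row[j] -= lr * err * hj

model = IncrementalModel([[1.0, 0.0], [0.0, 1.0]])
model.add_head("tool state classification")
model.add_head("tool type recognition")  # added later without retraining the backbone
model.train_head("tool type recognition", [([1.0, 0.0], [1.0]), ([0.0, 1.0], [0.0])])
```

Since training a new head never writes to the backbone or to other heads, the outputs of previously learned tasks are unchanged by construction in this sketch.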
[0033] The uncertainty value $u$ is calculated as follows: perform $N$ forward passes of the incremental lifelong learning model with Dropout enabled in every pass, obtaining $N$ predicted probability distributions $p_1, \dots, p_N$; calculate the mean $\bar{p}$ of the $N$ probability distributions: $\bar{p} = \frac{1}{N} \sum_{n=1}^{N} p_n$; calculate the information entropy of $\bar{p}$ as the uncertainty value $u$: $u = -\sum_{c=1}^{C} \bar{p}_c \log \bar{p}_c$, where $n$ denotes the $n$-th forward pass, $p_n$ denotes the predicted probability distribution obtained from the $n$-th forward pass, $C$ denotes the total number of categories, and $c$ denotes the $c$-th category.
[0034] Step S3: Collect industrial performance indicator data related to the inference results, including: tool replacement cost (yuan per replacement), downtime caused by tool replacement (minutes), and product defect rate caused by tool problems (%).
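The entropy-based uncertainty can be sketched as follows. The sketch assumes that the $N$ probability distributions have already been produced by $N$ stochastic forward passes with Dropout enabled; the function name `predictive_entropy` and the example distributions are hypothetical.

```python
import math

def predictive_entropy(distributions):
    """Mean of N class-probability vectors and the entropy of that mean."""
    n = len(distributions)
    classes = len(distributions[0])
    mean = [sum(p[c] for p in distributions) / n for c in range(classes)]
    # Zero-probability classes contribute nothing to the entropy.
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    return mean, entropy

# Hypothetical outputs of three Dropout passes over the classes
# "normal", "slight wear", "severe wear".
passes = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]
mean_probs, uncertainty = predictive_entropy(passes)
```

The value ranges from 0, when all passes agree on one class with certainty, to $\log C$, when the averaged prediction is uniform over the $C$ categories.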
[0035] Step S4: Construct a causal graph model and calculate the average treatment effect of different fusion strategy features on industrial performance indicators.
[0036] Step S41: Define a causal graph whose nodes include the fusion strategy variable $S$, the environmental operating condition variables $E$, and the industrial performance indicator variable $P$, where $E$ denotes equipment operating parameters. In this embodiment, the fusion strategy variable consists of the mean, variance, and dominant modality proportion of each modality weight over the past 5 minutes; the environmental operating condition variables include spindle speed, feed rate, depth of cut, and uncertainty values; the industrial performance indicator is, for example, the "product defect rate".
[0037] Step S42: Calculate the average treatment effect $\theta$ using the double machine learning method: using a first machine learning model $g_1$ (GBDT) with the environmental operating condition variables $E$ as input, predict the industrial performance indicator $P$ and obtain the residual $\tilde{P}$: $\tilde{P} = P - g_1(E)$, where $g_1$ is a first gradient boosting decision tree; using a second machine learning model $g_2$ (GBDT) with $E$ as input, predict the fusion strategy variable $S$ and obtain the residual $\tilde{S}$: $\tilde{S} = S - g_2(E)$, where $g_2$ is a second gradient boosting decision tree; then fit the linear regression model $\tilde{P} = \theta \tilde{S} + \varepsilon$, where $\varepsilon$ denotes the error term and the resulting regression coefficient $\theta$ is the average treatment effect.
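The residual-on-residual estimation in step S42 can be illustrated with a simplified sketch. Note the substitution: instead of the gradient boosting decision trees named in the text, a closed-form univariate least-squares line is used for both nuisance models, and no cross-fitting is performed, so that the example stays self-contained. The function names and the synthetic data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit y = a + b*x; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

def residuals(model, xs, ys):
    return [y - model(x) for x, y in zip(xs, ys)]

def average_treatment_effect(env, strategy, performance):
    g1 = fit_line(env, performance)  # stand-in for the first GBDT
    g2 = fit_line(env, strategy)     # stand-in for the second GBDT
    p_res = residuals(g1, env, performance)
    s_res = residuals(g2, env, strategy)
    # Regression of the performance residual on the strategy residual.
    return sum(s * p for s, p in zip(s_res, p_res)) / sum(s * s for s in s_res)

# Synthetic data: the condition variable drives both strategy and defect rate,
# and the strategy lowers the defect rate by 2 units per unit of strategy.
env = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.1, -0.2, 0.3, -0.1, 0.2, -0.3]
strategy = [0.5 * e + d for e, d in zip(env, noise)]
performance = [-2.0 * s + 3.0 * e for s, e in zip(strategy, env)]
theta = average_treatment_effect(env, strategy, performance)
```

On this synthetic data the estimate recovers the effect of the strategy after the influence of the operating condition has been removed from both variables; with nonlinear confounding, flexible learners such as the GBDT models described in the text would be needed.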
[0038] The fusion strategy variable $S$ consists of at least two statistical features extracted from the fusion weight sequence $\{w_i(k)\}$ within the time window $[t-L+1, t]$; the fusion strategy variable is calculated as $S = \sum_{m=1}^{K} \lambda_m \phi_m$, where $k$ denotes the index of a discrete time point within the time window, $t$ denotes the current time, $L$ denotes the length of the time window, $\phi_1, \phi_2, \dots, \phi_K$ denote $K$ different statistical features calculated from the fusion weight sequence, $\lambda_1, \lambda_2, \dots, \lambda_K$ denote the weighting coefficients of the respective statistical features, and $w_i(k)$ denotes the fusion weight of the $i$-th modality at time point $k$. The statistical features include any one or more of the following: the weight mean feature $\phi_{\text{mean}}$, the average of the modality weight within the time window, i.e. $\phi_{\text{mean}} = \frac{1}{L} \sum_{k=t-L+1}^{t} w_i(k)$; the weight variance feature $\phi_{\text{var}}$, the variance of the modality weight within the time window, i.e. $\phi_{\text{var}} = \frac{1}{L} \sum_{k=t-L+1}^{t} (w_i(k) - \phi_{\text{mean}})^2$; and the dominant modality proportion feature $\phi_{\text{dom}}$, the proportion of time steps within the time window in which a given modality is the dominant modality (i.e. $w_i(k) = \max_j w_j(k)$) to the total number of time steps.
[0039] Step S5: Based on the calculated average treatment effect, generate optimization instructions for the fusion weight calculation process and feed these optimization instructions back to step S1 to start a new round of the autonomous optimization cycle.
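The three statistical features can be computed from a window of weight vectors as in the sketch below. The function names are hypothetical, the population variance (division by the window length) is an assumption, and ties for the dominant modality are resolved in favor of the lowest index.

```python
def dominant_index(weights):
    """Index of the modality with the largest fusion weight at one time step."""
    return max(range(len(weights)), key=lambda j: weights[j])

def strategy_features(weight_seq, modality):
    """Mean, variance, and dominant proportion of one modality over a window.

    weight_seq holds one fusion weight vector per time step of the window.
    """
    series = [w[modality] for w in weight_seq]
    length = len(series)
    mean = sum(series) / length
    variance = sum((v - mean) ** 2 for v in series) / length
    dominant = sum(1 for w in weight_seq if dominant_index(w) == modality) / length
    return mean, variance, dominant

def strategy_variable(features, coefficients):
    """Weighted combination of the statistical features."""
    return sum(c * f for c, f in zip(coefficients, features))

# Hypothetical 4-step window over vibration, acoustic, and infrared weights.
window = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.6, 0.2, 0.2], [0.7, 0.2, 0.1]]
feats = strategy_features(window, modality=0)
s_value = strategy_variable(feats, [1.0, 1.0, 1.0])
```

In the embodiment the window covers the past 5 minutes; at the stated 100 Hz sampling rate that would correspond to 30,000 time steps if weights were recomputed for every sample, which the text does not specify.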
[0040] Step S51: From the results of step S4, select the fusion strategy feature $S^{*}$ with the largest positive average treatment effect; Step S52: Take the values of the internal parameters of step S1 at the time the fusion strategy feature $S^{*}$ was generated as the optimization target $\psi^{*}$, where the internal parameters include the reference feature vector $\mu_i$, the distance threshold $\tau$, or the balance parameters $\alpha$, $\beta$; Step S53: Feed the optimization target back to step S1 as a smooth update: $\psi_{\text{new}} = (1-\eta)\,\psi_{\text{cur}} + \eta\,\psi^{*}$, where $\psi_{\text{cur}}$ denotes the current value of the parameter, $\psi_{\text{new}}$ denotes the updated value of the parameter, and $\eta$ denotes the meta-learning rate, which is set to 0.01 in this embodiment.
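The smooth feedback update of step S53 can be sketched as an exponential-moving-average step. The interpolation form used here is an assumption consistent with the current value, updated value, and meta-learning rate named in the text; the function name and the example target are hypothetical.

```python
def smooth_update(current, target, eta=0.01):
    """Move a parameter a fraction eta of the way toward its optimization target."""
    return (1.0 - eta) * current + eta * target

# Three optimization rounds nudging a distance threshold from 3.0 toward a
# hypothetical target of 4.0 with the embodiment's meta-learning rate.
tau = 3.0
for _ in range(3):
    tau = smooth_update(tau, 4.0)
```

With a meta-learning rate of 0.01 each round closes only 1% of the remaining gap to the target, which keeps the fusion parameters from changing abruptly between optimization cycles.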
[0041] In this embodiment, after three rounds of autonomous optimization, the tool misjudgment rate decreased by 18%, the defect rate caused by tool problems decreased by 12%, and the system can still maintain stable recognition performance when facing new types of tools, demonstrating good adaptive and continuous learning capabilities.
[0042] Example 2: As shown in Figure 3, a multimodal industrial autonomous learning system includes: a data acquisition module, used to synchronously acquire sensor data streams of at least two different modalities from industrial equipment; a feature fusion module, used to calculate the fusion weights based on the information utility of the data of each modality with respect to the task objective, and to fuse the multimodal features; an inference module, used to input the fused multimodal features into the incremental lifelong learning model to obtain the inference result of the industrial task and the corresponding uncertainty value; an industrial performance indicator collection module, used to collect industrial performance indicator data associated with the inference result; a causal analysis module, used to construct a causal graph model and calculate the average treatment effect of different fusion strategy features on the industrial performance indicators; and an optimization feedback module, used to generate optimization instructions for the fusion weight calculation process based on the calculated average treatment effect, and to feed these optimization instructions back to the data acquisition module to start a new round of the autonomous optimization cycle.
[0043] It is worth noting that all contents not described in detail in this invention are existing technologies and are well known to those skilled in the art.
[0044] Therefore, this invention adopts the above-mentioned multimodal industrial autonomous learning method and system, which solves the problems of rigid multimodal fusion strategies, inability of models to continuously evolve, and disconnection between optimization objectives and industrial value, and realizes a complete autonomous optimization closed loop from perception to decision-making in complex industrial environments.
[0045] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the technical solutions of the present invention, and these modifications or equivalent substitutions cannot cause the modified technical solutions to deviate from the spirit and scope of the technical solutions of the present invention.
Claims
1. A multimodal industrial autonomous learning method, characterized in that it includes the following steps: Step S1: Collect multimodal sensing data from industrial equipment, calculate fusion weights based on the information utility of the data of each modality with respect to the task objective, and fuse the multimodal features; Step S2: Input the fused multimodal features into the incremental lifelong learning model to obtain the inference result of the industrial task and the corresponding uncertainty value; Step S3: Collect industrial performance indicator data related to the inference result; Step S4: Construct a causal graph model and calculate the average treatment effect of different fusion strategy features on the industrial performance indicators; Step S5: Based on the calculated average treatment effect, generate optimization instructions for the fusion weight calculation process and feed these optimization instructions back to step S1 to start a new round of the autonomous optimization cycle.
2. The multimodal industrial autonomous learning method according to claim 1, characterized in that step S1 specifically includes: Step S11: Synchronously acquire sensor data streams of at least two different modalities from the industrial equipment; Step S12: For the data of each modality, calculate its modal quality score $Q_i$ and modal information score $I_i$; Step S13: For the $i$-th modality, calculate its information utility $U_i$ as a linear weighted sum of its modal quality score $Q_i$ and modal information score $I_i$: $U_i = \alpha Q_i + \beta I_i$, where $\alpha$ and $\beta$ are learnable parameters that balance the quality and information contributions; Step S14: Normalize the information utilities of all $M$ modalities with Softmax to obtain the fusion weight $w_i$ of each modality: $w_i = \exp(U_i) / \sum_{j=1}^{M} \exp(U_j)$, where $j$ denotes the $j$-th modality and $U_j$ denotes the information utility of the $j$-th modality; Step S15: Use the fusion weights $w_i$ to compute a weighted sum of the feature vectors of the modalities, completing the multimodal feature fusion.
3. The multimodal industrial autonomous learning method according to claim 2, characterized in that the modal quality score $Q_i$ is calculated as follows: for the $i$-th modality, calculate the Mahalanobis distance $D_i$ between its feature vector $x_i$ at the current time and the preset modal reference feature vector $\mu_i$: $D_i = \sqrt{(x_i - \mu_i)^{\top} \Sigma_i^{-1} (x_i - \mu_i)}$, where $\Sigma_i$ denotes the covariance matrix of the reference features of this modality and $\top$ denotes transpose; the Mahalanobis distance is then mapped to the modal quality score with the Sigmoid function: $Q_i = 1 / (1 + \exp(\gamma (D_i - \tau)))$, where $\gamma$ denotes the scaling factor and $\tau$ denotes the distance threshold.
4. The multimodal industrial autonomous learning method according to claim 3, characterized in that the modal information score $I_i$ is calculated as follows: for the $i$-th modality, estimate the normalized mutual information between its data $X_i$ within a sliding time window and the task label $Y$, and use it as the modal information score: $I_i = \mathrm{NMI}(X_i; Y)$, where the normalization is computed from $I(X_i; Y)$, the mutual information between $X_i$ and $Y$, and $H(\cdot)$, the information entropy.
5. The multimodal industrial autonomous learning method according to claim 1, characterized in that in step S2, the incremental lifelong learning model consists of a shared feature extraction backbone network and multiple task-specific output head networks; the backbone network is a multilayer perceptron whose parameters are shared across all tasks; each output head network is a fully connected layer responsible for the output of a specific task; when a new task is encountered, a new output head network is created and connected to the frozen backbone network for training.
6. The multimodal industrial autonomous learning method according to claim 4, characterized in that in step S2, the uncertainty value $u$ is calculated as follows: perform $N$ forward passes of the incremental lifelong learning model with Dropout enabled in every pass, obtaining $N$ predicted probability distributions $p_1, \dots, p_N$; calculate the mean $\bar{p}$ of the $N$ probability distributions: $\bar{p} = \frac{1}{N} \sum_{n=1}^{N} p_n$; calculate the information entropy of $\bar{p}$ as the uncertainty value $u$: $u = -\sum_{c=1}^{C} \bar{p}_c \log \bar{p}_c$, where $n$ denotes the $n$-th forward pass, $p_n$ denotes the predicted probability distribution obtained from the $n$-th forward pass, $C$ denotes the total number of categories, and $c$ denotes the $c$-th category.
7. The multimodal industrial autonomous learning method according to claim 6, characterized in that step S4 specifically includes: Step S41: Define a causal graph whose nodes include the fusion strategy variable $S$, the environmental operating condition variables $E$, and the industrial performance indicator variable $P$, where $E$ denotes equipment operating parameters; Step S42: Calculate the average treatment effect $\theta$ using the double machine learning method: using a first machine learning model $g_1$ with the environmental operating condition variables $E$ as input, predict the industrial performance indicator $P$ and obtain the residual $\tilde{P}$: $\tilde{P} = P - g_1(E)$, where $g_1$ is a first gradient boosting decision tree; using a second machine learning model $g_2$ with $E$ as input, predict the fusion strategy variable $S$ and obtain the residual $\tilde{S}$: $\tilde{S} = S - g_2(E)$, where $g_2$ is a second gradient boosting decision tree; then fit the linear regression model $\tilde{P} = \theta \tilde{S} + \varepsilon$, where $\varepsilon$ denotes the error term and the regression coefficient $\theta$ is the average treatment effect.
8. The multimodal industrial autonomous learning method according to claim 7, characterized in that the fusion strategy variable $S$ consists of at least two statistical features extracted from the fusion weight sequence $\{w_i(k)\}$ within the time window $[t-L+1, t]$; the fusion strategy variable is calculated as $S = \sum_{m=1}^{K} \lambda_m \phi_m$, where $k$ denotes the index of a discrete time point within the time window, $t$ denotes the current time, $L$ denotes the length of the time window, $\phi_1, \phi_2, \dots, \phi_K$ denote $K$ different statistical features calculated from the fusion weight sequence, $\lambda_1, \lambda_2, \dots, \lambda_K$ denote the weighting coefficients of the respective statistical features, and $w_i(k)$ denotes the fusion weight of the $i$-th modality at time point $k$.
9. The multimodal industrial autonomous learning method according to claim 8, characterized in that step S5 specifically includes: Step S51: From the results of step S4, select the fusion strategy feature $S^{*}$ with the largest positive average treatment effect; Step S52: Take the values of the internal parameters of step S1 at the time the fusion strategy feature $S^{*}$ was generated as the optimization target $\psi^{*}$, where the internal parameters include the reference feature vector $\mu_i$, the distance threshold $\tau$, or the balance parameters $\alpha$, $\beta$; Step S53: Feed the optimization target back to step S1 as a smooth update: $\psi_{\text{new}} = (1-\eta)\,\psi_{\text{cur}} + \eta\,\psi^{*}$, where $\psi_{\text{cur}}$ denotes the current value of the parameter, $\psi_{\text{new}}$ denotes the updated value of the parameter, and $\eta$ denotes the meta-learning rate.
10. A multimodal industrial autonomous learning system, characterized in that it includes: a data acquisition module, used to synchronously acquire sensor data streams of at least two different modalities from industrial equipment; a feature fusion module, used to calculate the fusion weights based on the information utility of the data of each modality with respect to the task objective, and to fuse the multimodal features; an inference module, used to input the fused multimodal features into the incremental lifelong learning model to obtain the inference result of the industrial task and the corresponding uncertainty value; an industrial performance indicator collection module, used to collect industrial performance indicator data associated with the inference result; a causal analysis module, used to construct a causal graph model and calculate the average treatment effect of different fusion strategy features on the industrial performance indicators; and an optimization feedback module, used to generate optimization instructions for the fusion weight calculation process based on the calculated average treatment effect, and to feed these optimization instructions back to the data acquisition module to start a new round of the autonomous optimization cycle.