Industrial production line intelligent monitoring and self-adaptive regulation method based on multi-source visual perception

By combining multi-source visual perception with digital twin technology, a unified scene understanding map is generated and reinforcement learning is performed, which solves the problems of insufficient perception and reliance on human experience in industrial production line monitoring, and realizes efficient and autonomous real-time monitoring and control of the production line.

CN122194631A · Pending · Publication Date: 2026-06-12 · NANTONG BLUE DRAGONFLY INTELLIGENT TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NANTONG BLUE DRAGONFLY INTELLIGENT TECHNOLOGY CO LTD
Filing Date
2026-01-30
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing technologies for monitoring industrial production lines suffer from limitations such as insufficient perception capabilities of single visual sensors, inability to acquire multi-dimensional information, and lack of in-depth information interaction and closed-loop linkage. This results in incomplete and inaccurate monitoring, reliance on human experience for control, and slow response, making it unsuitable for flexible production needs.

Method used

By employing multi-source visual perception and digital twin technology, a unified scene understanding map is generated through a dynamic attention fusion network. Combined with a reinforcement learning agent, the digital twin performs state inference and regulation, realizing an autonomous closed loop of perception and decision-making, and performing autonomous optimization.

Benefits of technology

It achieves comprehensiveness and accuracy in multi-dimensional perception, foresight and security in decision-making, autonomy and optimization in regulation, and the system can self-evolve to adapt to production changes, thereby improving the efficiency of real-time monitoring and regulation of the production line.

✦ Generated by Eureka AI based on patent content.

Abstract

The present application relates to the technical field of industrial automation and intelligent manufacturing, and provides a method for intelligent monitoring and adaptive control of industrial production lines based on multi-source visual perception, comprising the following steps: synchronously collecting visual sensor data, based on at least two different principles, for the same target in a production line; processing the multi-source visual data through a dynamic attention fusion network to generate a unified scene understanding map containing multi-dimensional attributes of the target; synchronizing the scene understanding map to a digital twin of the production line, updating its state, and performing state deduction and anomaly analysis of the production process in the digital twin; and, based on the scene understanding map and the deduction results of the digital twin, generating control actions with a reinforcement learning agent trained in the digital twin environment and collecting data from the production line after regulation. The present application has the following advantages: more comprehensive and intelligent perception; more forward-looking and safer decision-making; more autonomous and optimized regulation; and self-evolution of the system.

Description

Technical Field

[0001] This invention relates to the field of industrial automation and intelligent manufacturing technology, and in particular to a method for intelligent monitoring and adaptive control of industrial production lines based on multi-source visual perception.

Background Technology

[0002] In modern intelligent manufacturing, real-time monitoring and precise control of production lines are crucial for ensuring product quality and improving production efficiency. Traditional monitoring methods mainly rely on single-type sensors and preset threshold judgments, which have significant limitations. First, single vision sensors are vulnerable in complex industrial environments and cannot acquire multi-dimensional information such as temperature, material, and three-dimensional shape, resulting in incomplete and inaccurate monitoring. Second, most existing monitoring systems stop at "detecting problems," while "how to adjust" heavily relies on human experience. Operators modify equipment parameters based on alarm information and experience, resulting in slow response, poor consistency, and an inability to cope with the frequent production changeovers and process adjustments required in flexible production.

[0003] In recent years, digital twin technology has made it possible to build virtual production line models, and reinforcement learning technology has shown great potential in complex decision-making problems. However, existing technologies typically treat perception, modeling, and decision-making as isolated processes: visual inspection systems output defect results, digital twin systems provide visualization or offline simulation, and control systems operate according to fixed logic. These processes lack deep, automated information interaction and closed-loop linkage, creating isolated "data silos" and "decision breakpoints."

[0004] Therefore, this application proposes an end-to-end intelligent control scheme that can deeply integrate and understand multi-source sensing information and, on that basis, proactively extrapolate and automatically generate the optimal control strategy in virtual space, then continuously optimize itself according to the actual effect.

Summary of the Invention

[0005] (I) Technical problems to be solved: This invention aims to overcome the shortcomings of existing technologies and provide a method and system for intelligent control of production lines based on multi-source vision and digital twins. The core of this method lies in constructing an autonomous closed loop of "perception-deduction-decision-learning," realizing integrated intelligent control from multi-dimensional perception of production status, to predicting the future and making autonomous decisions in the digital space, and then to continuously evolving based on real-world feedback.

[0006] (II) Technical Solution: To solve the above technical problems, this invention provides a method for intelligent monitoring and adaptive control of industrial production lines based on multi-source visual perception, comprising the following steps:
synchronously collecting visual sensor data, based on at least two different principles, for the same target in the production line;
processing the multi-source visual data with a dynamic attention fusion network to generate a unified scene understanding map containing multi-dimensional attributes of the target;
synchronizing the scene understanding map to the digital twin of the production line, updating its state, and performing state deduction and anomaly analysis of the production process in the digital twin;
based on the scene understanding map and the deduction results of the digital twin, generating control action instructions with a reinforcement learning agent trained in the digital twin environment and sending them to the physical production line; and
collecting data from the production line after regulation, and using the difference between the actual results and the predictions of the digital twin to optimize and update the decision-making strategy of the reinforcement learning agent and the simulation model of the digital twin.

[0007] As an improvement, the dynamic attention fusion network dynamically assigns spatial attention weights to visual data features from different sources according to different monitoring tasks, and then fuses them.

[0008] As an improvement, state extrapolation in the digital twin refers to simulating the subsequent development of a production deviation in a virtual environment and assessing its impact when a deviation is detected.

[0009] As an improvement, the reinforcement learning agent is pre-trained offline in a simulation environment composed of a digital twin, and the optimal control action is selected by comparing the inference results of the digital twin during online operation.

[0010] The intelligent control system is characterized by comprising:
a multi-source visual sensing unit for acquiring at least two types of visual data;
a data fusion and understanding unit for running the dynamic attention fusion network and generating the scene understanding map;
a digital twin and simulation unit for maintaining and operating a digital twin synchronized with the physical production line and performing state simulations;
an adaptive decision-making unit for running the reinforcement learning agent and generating control instructions; and
a closed-loop learning unit for optimizing the reinforcement learning agent and the digital twin based on actual feedback.

[0011] As an improvement, the multi-source visual sensing unit includes at least two of the following visual sensor types: visible light camera, infrared thermal imager, three-dimensional structured light sensor, and hyperspectral imager.

[0012] The present invention has the following advantages: 1. More comprehensive and intelligent perception: Through a task-driven dynamic attention fusion mechanism, multi-source visual information is organically integrated, overcoming the limitations of a single sensor and generating a semantically rich multi-dimensional scene understanding map, providing a solid and reliable data foundation for subsequent decision-making.

[0013] 2. More forward-looking and safer decision-making: Real-time state simulation using digital twins enables a shift from reactive response to proactive prevention. Decision-making based on simulation results significantly reduces the risks and costs of trial and error directly on the physical production line.

[0014] 3. More autonomous and optimized regulation: Through reinforcement learning agents, the system can automatically learn the optimal regulation strategy under complex production conditions and use digital twins for online simulation and optimization, thus getting rid of the heavy dependence on fixed rules and human experience, and is especially suitable for flexible and customized production.

[0015] 4. The system is capable of self-evolution: A unique dual-model closed-loop learning mechanism enables the system to continuously self-correct and improve by utilizing the gap between reality and prediction. Both the control strategy and the simulation model become increasingly accurate as the production process progresses, achieving continuous growth and adaptation of the system.

Attached Figure Description

[0016] Figure 1 is a schematic diagram of the overall process of the intelligent control method provided in the embodiments of the present invention.

Detailed Implementation

[0017] The technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort are within the scope of protection of the present invention.

[0018] Figure 1 illustrates the overall process of the intelligent control method of this invention. The intelligent monitoring and adaptive control method for industrial production lines based on multi-source visual perception includes the following steps:

Step S1: Synchronous acquisition of multi-source visual data.

[0019] Deploy a set of time- and space-calibrated multi-source vision sensors at key workstations on the production line (such as welding, assembly, and quality inspection stations). For example, a typical configuration may include a high-resolution visible light color camera, an infrared thermal imager, and a 3D structured light scanner. These sensors ensure data acquisition of the same target object at the same production moment through hardware triggering or network synchronization protocols.
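As an illustration of the synchronization requirement in step S1, the following is a minimal sketch of grouping frames from several sensors into synchronized sets. It assumes each frame carries a timestamp from a shared clock; the names Frame and group_synchronized, the sensor labels, and the 5 ms tolerance are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Frame:
    sensor: str        # hypothetical labels such as "rgb", "thermal", "depth"
    timestamp: float   # seconds on a shared clock (assumption)
    payload: object = None


def group_synchronized(frames: List[Frame], sensors: List[str],
                       tolerance_s: float = 0.005) -> List[Dict[str, Frame]]:
    """Return sets holding one frame per sensor, where every frame lies
    within tolerance_s of the frame from the first (reference) sensor."""
    by_sensor = {s: [f for f in frames if f.sensor == s] for s in sensors}
    reference, others = sensors[0], sensors[1:]
    groups = []
    for ref in sorted(by_sensor[reference], key=lambda f: f.timestamp):
        group = {reference: ref}
        for s in others:
            if not by_sensor[s]:
                break  # this sensor delivered nothing
            nearest = min(by_sensor[s],
                          key=lambda f: abs(f.timestamp - ref.timestamp))
            if abs(nearest.timestamp - ref.timestamp) > tolerance_s:
                break  # no frame close enough: drop this reference frame
            group[s] = nearest
        else:
            groups.append(group)  # every sensor contributed a frame
    return groups
```

With hardware triggering the tolerance could be very small; with network synchronization it would have to cover the residual clock offset between devices.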

[0020] Step S2: Dynamically fuse and generate a scene understanding map.

[0021] The core of this step is the dynamic attention fusion network. The network's input is registered multi-source image data (such as RGB images, thermal images, and depth maps). First, deep feature maps are extracted from each source through its own convolutional neural network (CNN) branch. These feature maps then pass to the task-guided dynamic attention fusion module, which is the key innovation. The system supplies this module with a task code (e.g., "detect surface defects," "measure weld penetration," "monitor component temperature") that reflects the primary task at hand; the task can be triggered by an upstream process or scheduled by the system.

[0022] This module generates a spatial attention weight map for each source's feature map based on the task code. Each pixel value in a weight map represents the importance of the corresponding sensor feature at that spatial location for the current task. For example, for the task of "detecting surface scratches," the weight of visible light (RGB) features is significantly increased in the textured areas of the object's surface, while for the task of "monitoring bearing overheating," the weight of infrared thermal imaging features dominates at the bearing installation location. Each feature map is then multiplied by its attention weight map, and the weighted maps are fused (e.g., by channel concatenation followed by convolution) to output a unified "scene understanding map." This map can be a tensor in which each spatial location is associated with an attribute vector, such as [category, confidence, 2D position, 3D coordinates, temperature, material inference, etc.].
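The weighting and fusion described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the network of the application: each source is reduced to one single-channel feature map, the task code is replaced by a hand-written affinity table (TASK_AFFINITY, hypothetical values standing in for a learned task embedding), the weights are normalized with a per-pixel softmax across sources (an assumption; the text does not specify a normalization), and the final convolution after concatenation is omitted.

```python
import math
from typing import Dict, List

Grid = List[List[float]]  # one single-channel H x W feature map

# Hypothetical task codes mapped to per-source affinities.
TASK_AFFINITY: Dict[str, Dict[str, float]] = {
    "detect_surface_scratches": {"rgb": 2.0, "thermal": 0.2, "depth": 0.5},
    "monitor_bearing_overheating": {"rgb": 0.2, "thermal": 2.0, "depth": 0.2},
}


def attention_weights(features: Dict[str, Grid], task: str) -> Dict[str, Grid]:
    """Per-pixel softmax over sources of (task affinity * feature magnitude)."""
    sources = list(features)
    h, w = len(features[sources[0]]), len(features[sources[0]][0])
    weights = {s: [[0.0] * w for _ in range(h)] for s in sources}
    for i in range(h):
        for j in range(w):
            logits = [TASK_AFFINITY[task][s] * abs(features[s][i][j])
                      for s in sources]
            peak = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(x - peak) for x in logits]
            total = sum(exps)
            for s, e in zip(sources, exps):
                weights[s][i][j] = e / total
    return weights


def fuse(features: Dict[str, Grid], task: str) -> List[List[List[float]]]:
    """Multiply each map by its weight map and concatenate per pixel.

    Returns an H x W grid whose cells hold one weighted value per source,
    in the insertion order of `features`.
    """
    weights = attention_weights(features, task)
    sources = list(features)
    h, w = len(features[sources[0]]), len(features[sources[0]][0])
    return [[[weights[s][i][j] * features[s][i][j] for s in sources]
             for j in range(w)] for i in range(h)]
```

Switching the task string changes which source dominates at each pixel without changing the feature extractors, which is the property the task-guided module is meant to provide.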

[0023] Step S3: Synchronize and update the digital twin and extrapolate its state.

[0024] The scene understanding map generated in the previous step is injected into the digital twin of the production line in real time. The digital twin is a high-fidelity virtual model containing geometric, physical, and behavioral rules. The map information is used to update the state of the corresponding virtual objects in the twin (such as position, temperature, and surface condition).

[0025] When an anomaly is detected in the scene understanding map, the state simulation engine is activated. Based on embedded physical laws and process knowledge, this engine starts from the current state and "fast-forwards" the twin to simulate the production process over a future period. The simulation allows the potential impact of the anomaly to be assessed quantitatively: for example, will a tiny dimensional deviation cause subsequent assembly to stall? Will localized overheating degrade material properties? This gives the decision step crucial information about the severity and urgency of the anomaly.
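A minimal sketch of such a "fast-forward" for the overheating example follows. It assumes a first-order thermal model integrated with explicit Euler steps; the function name, parameters, and the model itself are illustrative assumptions, since the application does not specify the simulation engine.

```python
from typing import Optional


def extrapolate_temperature(temp_c: float, heat_rate_c_per_s: float,
                            cooling_coeff_per_s: float, ambient_c: float,
                            limit_c: float, horizon_s: float,
                            dt_s: float = 0.1) -> Optional[float]:
    """Fast-forward a first-order thermal model with explicit Euler steps.

    Returns the simulated time in seconds at which limit_c is first
    reached, or None if it is not reached within horizon_s.
    """
    steps = int(round(horizon_s / dt_s))
    for k in range(1, steps + 1):
        # Heating minus loss to the surroundings (assumed model).
        d_temp = heat_rate_c_per_s - cooling_coeff_per_s * (temp_c - ambient_c)
        temp_c += d_temp * dt_s
        if temp_c >= limit_c:
            return k * dt_s
    return None
```

The returned time to limit is one way to express urgency: a short time calls for immediate regulation, while None suggests the deviation stays within bounds over the simulated horizon.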

[0026] Step S4: Reinforcement learning agent decision-making and control execution.

[0027] Decision-making is accomplished by a reinforcement learning agent. During the offline phase, this agent is pre-trained on a large scale in a simulation environment composed of the digital twin, where trials are unlimited and cause no physical loss. Its state (S) is an abstract representation of the scene understanding map, its actions (A) are a set of adjustable process parameters (such as robot speed, laser power, and valve opening), and its reward (R) is a composite function combining quality, efficiency, and energy consumption.
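One possible form of the composite reward is a weighted sum, sketched below. The Outcome fields, their 0 to 1 scaling, and the default weights are assumptions chosen for illustration; the application only states that the reward combines quality, efficiency, and energy consumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    quality: float     # 0..1, higher is better (assumed scale)
    throughput: float  # 0..1, normalized efficiency (assumed scale)
    energy: float      # 0..1, normalized energy use (assumed scale)


def composite_reward(o: Outcome, w_quality: float = 0.6,
                     w_efficiency: float = 0.3,
                     w_energy: float = 0.1) -> float:
    """Weighted sum that rewards quality and efficiency and penalizes energy."""
    return (w_quality * o.quality
            + w_efficiency * o.throughput
            - w_energy * o.energy)
```

The weights encode the trade-off the plant wants; a line where scrap is expensive would raise w_quality relative to the others.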

[0028] During online operation, when adjustments are needed, the agent receives the current state (S). Instead of directly outputting an action, it first "consults" the digital twin: the agent proposes N candidate actions (A1, A2, ..., AN), and the twin quickly extrapolates the future result of executing each action and estimates its reward (R1, R2, ..., RN). The agent then selects the action with the highest predicted reward (e.g., Ai) as the final instruction and issues it to the physical production line (PLC, robot controller, etc.). This "simulation-based optimization" mechanism ensures that every adjustment is verified virtually before execution, greatly improving safety and reliability.
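The candidate evaluation in step S4 amounts to an argmax over twin-predicted rewards. The sketch below assumes the twin is exposed as a callable, twin_predict_reward(state, action), which is a hypothetical interface; how candidates are proposed is left to the caller.

```python
from typing import Callable, Sequence, Tuple, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")


def select_action(state: State, candidates: Sequence[Action],
                  twin_predict_reward: Callable[[State, Action], float]
                  ) -> Tuple[Action, float]:
    """Score every candidate with the twin and return (best action, reward)."""
    if not candidates:
        raise ValueError("at least one candidate action is required")
    scored = [(twin_predict_reward(state, a), a) for a in candidates]
    # Compare on the predicted reward only, so actions need not be orderable.
    best_reward, best_action = max(scored, key=lambda pair: pair[0])
    return best_action, best_reward
```

For example, with a toy twin whose predicted reward peaks at a laser power of 1.2 (arbitrary units), calling select_action(None, [0.8, 1.0, 1.2, 1.4], lambda s, a: -(a - 1.2) ** 2) picks 1.2. Only the selected action reaches the physical line; the rejected candidates are evaluated purely in simulation.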

[0029] Step S5: Closed-loop feedback and dual-model self-evolution.

[0030] After the control action is executed, the system returns to step S1 to collect new multi-source visual data and generate a new scene understanding map. The closed-loop learning unit then begins to work: it compares the new map (representing the real-world result) with the predicted map made by the digital twin before executing the action Ai, and calculates the difference (i.e., the prediction error).

[0031] This error signal is used for joint backpropagation. On one hand, the error is used to update the policy network parameters of the reinforcement learning agent. Its learning objective is: "In state S, you chose action Ai, but the actual effect is different from the expectation. Next time in a similar state, you need to adjust your selection policy." On the other hand, the error is also used to update the adjustable parameters of the simulation model in the digital twin (such as the coefficient of friction and the coefficient of thermal conductivity). Its learning objective is: "If your prediction of the world after the action is inaccurate, your virtual world model deviates from the real world and you need to correct your physical rule parameters." Through this continuous "decision-verification-correction" process, the agent's strategy steadily improves, the digital twin's model becomes increasingly realistic, and the entire system achieves co-evolution.
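The dual update can be illustrated with scalar stand-ins. The sketch below is not the joint backpropagation of the application: it assumes a toy one-parameter thermal model for the twin and calibrates that parameter by gradient descent on the squared prediction error, and it replaces the policy-network update with a simple incremental value estimate for the chosen action. All function names, the gain constant, and the learning rates are hypothetical.

```python
def predict_next_temp(temp_c: float, power_w: float, conductance: float,
                      ambient_c: float, dt_s: float = 1.0,
                      gain: float = 0.01) -> float:
    """Toy twin model: heating from power minus loss scaled by conductance."""
    return temp_c + (gain * power_w - conductance * (temp_c - ambient_c)) * dt_s


def update_twin_conductance(conductance: float, temp_c: float, power_w: float,
                            ambient_c: float, observed_next_c: float,
                            lr: float = 1e-4, dt_s: float = 1.0) -> float:
    """One gradient-descent step on the squared prediction error."""
    predicted = predict_next_temp(temp_c, power_w, conductance, ambient_c, dt_s)
    error = predicted - observed_next_c
    # d(predicted)/d(conductance) = -(temp_c - ambient_c) * dt_s
    grad = 2.0 * error * (-(temp_c - ambient_c) * dt_s)
    return conductance - lr * grad


def update_action_value(value: float, observed_reward: float,
                        lr: float = 0.1) -> float:
    """Move the agent's reward estimate for the chosen action toward reality."""
    return value + lr * (observed_reward - value)
```

Repeating update_twin_conductance on fresh observations drives the twin's parameter toward the value that explains the measured temperatures, while update_action_value lowers the agent's preference for actions whose real outcome fell short of the prediction.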

[0032] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.

[0033] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

[0034] The present invention and its embodiments have been described above. This description is not restrictive, and the accompanying drawings are only one embodiment of the present invention; the actual structure is not limited thereto. In conclusion, if those skilled in the art are inspired by this description and design similar structures and embodiments without departing from the spirit of the invention, such designs should fall within the protection scope of the present invention.

Claims

1. A method for intelligent monitoring and adaptive control of industrial production lines based on multi-source visual perception, characterized in that it comprises the following steps: synchronously collecting visual sensor data, based on at least two different principles, for the same target in the production line; processing the multi-source visual data with a dynamic attention fusion network to generate a unified scene understanding map containing multi-dimensional attributes of the target; synchronizing the scene understanding map to the digital twin of the production line, updating its state, and performing state deduction and anomaly analysis of the production process in the digital twin; based on the scene understanding map and the deduction results of the digital twin, generating control action instructions with a reinforcement learning agent trained in the digital twin environment and sending them to the physical production line; and collecting data from the production line after regulation, and using the difference between the actual results and the predictions of the digital twin to optimize and update the decision-making strategy of the reinforcement learning agent and the simulation model of the digital twin.

2. The intelligent monitoring and adaptive control method for industrial production lines based on multi-source visual perception according to claim 1, characterized in that: The dynamic attention fusion network dynamically assigns spatial attention weights to visual data features from different sources according to different monitoring tasks, and then fuses them.

3. The intelligent monitoring and adaptive control method for industrial production lines based on multi-source visual perception according to claim 1, characterized in that: State extrapolation within the digital twin refers to simulating the subsequent development of a production deviation in a virtual environment and assessing its impact when such deviation is detected.

4. The intelligent monitoring and adaptive control method for industrial production lines based on multi-source visual perception according to claim 1, characterized in that: the reinforcement learning agent is pre-trained offline in a simulation environment composed of a digital twin, and during online operation selects the optimal control action by comparing the inference results of the digital twin.

5. An intelligent control system for implementing the method of any one of claims 1-4, characterized in that it comprises: a multi-source visual sensing unit for acquiring at least two types of visual data; a data fusion and understanding unit for running the dynamic attention fusion network and generating the scene understanding map; a digital twin and simulation unit for maintaining and operating a digital twin synchronized with the physical production line and performing state simulations; an adaptive decision-making unit for running the reinforcement learning agent and generating control instructions; and a closed-loop learning unit for optimizing the reinforcement learning agent and the digital twin based on actual feedback.

6. The intelligent control system according to claim 5, characterized in that: the multi-source visual sensing unit includes at least two of the following visual sensor types: visible light camera, infrared thermal imager, three-dimensional structured light sensor, and hyperspectral imager.