Intelligent laser cutting control system and method based on a large visual language model
The intelligent laser cutting control system based on a large visual language model integrates multi-source data acquisition and multi-modal fusion, solving the problems of insufficient perception and decision lag in existing laser cutting systems. It achieves holographic monitoring and adaptive control, improving cutting quality and production efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- JINAN BODOR LASER CO LTD
- Filing Date
- 2026-02-09
- Publication Date
- 2026-06-19
AI Technical Summary
Existing laser cutting systems lack comprehensive perception of internal state, cutting quality, and external environment. Data processing is mostly based on single-modal analysis, making it difficult to achieve deep integration of cross-dimensional information. Decision-making relies on preset parameters or manual intervention, resulting in delayed parameter adjustments during the cutting process. Quality inspection and equipment maintenance are handled after the fact, which can easily lead to workpiece scrap and reduced production efficiency.
The intelligent laser cutting control system based on a large visual language model integrates a multi-source data acquisition module that includes temperature, vibration, and acoustic emission sensors and a 3D depth sensor. The visual language large model processing center fuses and interprets the multimodal data to generate control commands and early warning information. The execution module then adjusts the cutting parameters, forming closed-loop control.
It enables holographic monitoring of the cutting process, improves the stability and adaptability of cutting quality, and allows for adaptive control under complex processing conditions, reducing workpiece scrap and equipment failure, and improving production efficiency.
Figure CN122239596A_ABST (abstract drawing)
Abstract
Description
Technical Field
[0001] This application relates to the field of laser cutting control technology, specifically to an intelligent laser cutting control system and method based on a large visual language model.
Background Technology
[0002] High-end manufacturing demands ever higher precision and efficiency from laser cutting, and traditional control systems face growing challenges as a result. Existing cutting systems suffer from the following technical deficiencies:
- Current cutting heads typically rely on only a few sensors that collect localized data. They therefore lack comprehensive perception of the internal state (such as optical component temperature and mechanical wear), the cutting quality (such as kerf defects and slag buildup), and the external environment (such as workpiece position deviation and obstacles).
- Data processing is mostly based on single-modal analysis and fails to achieve deep fusion of cross-dimensional information.
- Decision-making relies on preset parameters or manual intervention, which makes it difficult to meet dynamic adjustment needs under complex processing conditions.
- Parameter adjustments during cutting lag behind, and quality inspection and equipment maintenance are mostly performed after the fact. This easily leads to workpiece scrap, sudden equipment failures, and reduced production efficiency.
[0003] In recent years, some studies have attempted to introduce deep learning for quality inspection, but most of them perform offline, single-modal analysis. There is therefore an urgent need for a method that integrates multimodal information such as vision, sensing, and language. Such a method would address key problems of existing technologies, such as data fragmentation and insufficient intelligent decision-making.
Summary of the Invention
[0004] To address the aforementioned problems, this invention provides an intelligent laser cutting control system and method based on a large visual language model.
[0005] In a first aspect, the present invention provides an intelligent laser cutting control system based on a large visual language model, comprising:
- an intelligent laser cutting head integrating a multi-source data acquisition module, which collects data on the internal state, cutting quality, and external environment during the cutting process;
- a visual language large model processing center configured to perform multimodal fusion and understanding of the heterogeneous data collected by the multi-source data acquisition module, and to reason and make decisions based on a process knowledge base to generate control commands and early warning information;
- an execution module configured to respond to the control commands by adjusting the cutting parameters of the intelligent laser cutting head; and
- a feedback module configured to collect the actual operating data of the intelligent laser cutting head and feed them back to the visual language large model processing center, forming closed-loop control and optimizing the process knowledge base.
[0006] As a preferred embodiment of the technical solution of the present invention, the multi-source data acquisition module includes:
- an internal state acquisition unit, including:
  - a temperature sensor installed on the inner wall of the optical barrel of the cutting head;
  - a vibration sensor installed on the side wall of the barrel; and
  - an acoustic emission sensor installed on the outer wall of the cutting cavity;
- an external environment acquisition unit, including a 3D depth sensor for acquiring workpiece point cloud data; and
- a cutting quality acquisition unit, including:
  - a guide rail arranged along the axial direction of the intelligent laser cutting head;
  - a sliding seat slidably mounted on the guide rail;
  - a miniature stepper motor connected to the sliding seat for driving it along the guide rail;
  - an industrial camera fixed to the sliding seat by a bracket; and
  - an angle adjustment bracket connecting the bracket and the sliding seat, used to adjust the tilt angle of the industrial camera lens.

The angle between the lens axis of the industrial camera and the workpiece surface is configured within a preset first angle range. The working distance between the camera lens and the workpiece surface is configured within a preset distance range.
[0007] Sensing the internal state, external environment, and cutting quality across multiple dimensions achieves holographic monitoring of the cutting process. In the cutting quality acquisition unit, the guide rail, sliding seat, and angle adjustment bracket let the industrial camera adjust its shooting angle and working distance dynamically as needed. This maintains optimal imaging quality at different cutting positions and overcomes the limited viewing angle and insufficient image resolution of fixed cameras.
[0008] As a preferred embodiment of the technical solution of the present invention, the cutting quality acquisition unit further includes:
- an optical coherence tomography (OCT) probe mounted side by side with the industrial camera on the same sliding seat, wherein the detection optical path of the OCT probe and the visual optical path of the industrial camera form an angle within a second angle range; and
- a common-path coupling module for coupling the detection optical path of the OCT probe to the main cutting optical path.

The common-path coupling module includes:
- a conical reflector installed at the beam-splitting point of the main cutting optical path, between the collimating lens and the focusing lens; and
- an arc-shaped reflecting surface, an attenuator, and a filter arranged in sequence along the reflected light path of the conical reflector.
[0009] Introducing optical coherence tomography (OCT) alongside the industrial camera enables high-precision non-destructive testing of the material's internal structure, subsurface defects, and penetration depth. The common-path coupling module uses optical elements such as the conical reflector to partially couple the OCT probe optical path with the main cutting optical path. This keeps the detection area precisely aligned with the cutting area.
[0010] As a preferred embodiment of the technical solution of the present invention, the visual language large model processing center adopts a multimodal fusion model based on the Transformer architecture, and the visual language large model processing center retrieves historical cases and parameters from the process knowledge base to assist in real-time decision-making.
[0011] As a preferred embodiment of the technical solution of the present invention, the visual language large model processing center includes:
- a multimodal data alignment and fusion module, configured to receive heterogeneous data from the multi-source data acquisition module and map them into a unified semantic space of processing states through spatiotemporal alignment; the data include sensor numerical sequences, cutting area images, and workpiece point cloud data;
- a retrieval-augmented generation module, configured to:
  - call the process knowledge base;
  - query historical material parameters and defect cases that match the current processing conditions; and
  - perform correlation analysis and deep reasoning on the fused multimodal information in combination with the retrieved knowledge; and
- a multi-objective decision-making module, configured to perform equipment health analysis, cutting quality analysis, and processing condition adaptation analysis simultaneously based on the reasoning results, and to generate adaptive control parameter instructions and early warning information.
[0012] Using a multimodal fusion model based on the Transformer architecture, the system aligns and fuses heterogeneous data such as sensor values, images, and point clouds, and builds a unified semantic space for processing states. The retrieval-augmented generation (RAG) module lets the system proactively retrieve similar historical cases and process knowledge. Decisions are therefore knowledge-driven rather than purely data-driven, which improves their interpretability and reliability. The multi-objective decision-making module optimizes equipment health, cutting quality, and processing condition adaptation simultaneously, achieving globally optimal control rather than local optimization.
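The retrieval step of the RAG module can be illustrated with a minimal sketch. The sketch makes the following assumptions:
- Each historical case in the process knowledge base has already been encoded as a numeric condition vector. The patent does not specify the encoder.
- Cases are filtered by material and ranked by cosine similarity.
- The retrieved cases are assembled into a text prompt for the model.

The schema (`ProcessCase`), the function names, and the example values are hypothetical and serve only as illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class ProcessCase:
    """One historical case in the process knowledge base (hypothetical schema)."""
    material: str
    thickness_mm: float
    features: list[float]      # condition embedding; the encoder is an assumption
    params: dict[str, float]   # e.g. {"power_w": ..., "speed_mm_s": ...}
    outcome: str               # e.g. "good", "dross"


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_cases(kb: list[ProcessCase], material: str,
                   query: list[float], top_k: int = 3) -> list[ProcessCase]:
    """Return up to top_k cases of the same material, most similar first."""
    candidates = [c for c in kb if c.material == material]
    candidates.sort(key=lambda c: cosine(c.features, query), reverse=True)
    return candidates[:top_k]


def build_prompt(state_summary: str, cases: list[ProcessCase]) -> str:
    """Combine the fused state description with the retrieved cases."""
    lines = [f"Current processing state: {state_summary}", "Similar historical cases:"]
    for c in cases:
        lines.append(f"- {c.material}, {c.thickness_mm} mm, params={c.params}, outcome={c.outcome}")
    lines.append("Recommend laser power, cutting speed, focal position and any warning.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative knowledge base; all values are invented for the example.
    kb = [
        ProcessCase("Q235", 10.0, [1.0, 0.0, 0.2], {"power_w": 6000.0, "speed_mm_s": 25.0}, "good"),
        ProcessCase("Q235", 12.0, [0.0, 1.0, 0.0], {"power_w": 6000.0, "speed_mm_s": 18.0}, "dross"),
        ProcessCase("SS304", 10.0, [1.0, 0.0, 0.2], {"power_w": 8000.0, "speed_mm_s": 20.0}, "good"),
    ]
    hits = retrieve_cases(kb, "Q235", [0.9, 0.1, 0.2], top_k=2)
    print(build_prompt("kerf widening near edge, lens temperature 41 C", hits))
```

Filtering by material before ranking keeps a similar-looking case from a different alloy out of the prompt. A production system would more likely use a vector index, but the ranking logic is the same.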
[0013] As a preferred embodiment of the technical solution of the present invention, the execution module includes:
- an adaptive control unit configured to dynamically adjust the laser power, cutting speed, and focal position according to the control commands; and
- a quality assessment and early warning unit configured to:
  - output, in real time and based on the early warning information, cutting defect judgments, remaining-life assessments of components, and collision risk warnings; and
  - trigger the corresponding processing actions.
[0014] As a preferred embodiment of the technical solution of the present invention, the adaptive control unit is specifically configured to:
- parse the structured control instructions from the visual language large model processing center and map the target parameters in them to the corresponding laser power controller, motion controller, and focus adjustment driver;
- synchronously adjust the laser output power, the speed and trajectory of the cutting head in the XY plane, and the Z-axis focal position to reach the target parameters; and
- according to the processing condition adaptation requirements in the control instructions, drive the miniature stepper motor of the cutting quality acquisition unit to adjust the working distance and angle between the industrial camera lens and the workpiece surface, maintaining the best imaging conditions.
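The parse-and-map step of the adaptive control unit might look like the sketch below. It assumes the model emits a JSON instruction with a `target` object and an optional `camera` object. That schema, the numeric limits, and the device and channel names are all hypothetical. Every requested value is clamped to a safe range before dispatch; XY trajectory commands are omitted for brevity.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical safe limits; real values depend on the laser source and machine.
LIMITS = {
    "laser_power_w": (0.0, 6000.0),
    "speed_mm_s": (0.0, 500.0),
    "focus_mm": (-10.0, 10.0),
    "camera_angle_deg": (60.0, 75.0),     # first preset angle range
    "camera_distance_mm": (80.0, 120.0),  # first preset distance range
}


def clamp(name: str, value: float) -> float:
    """Clip a requested value into its allowed range."""
    lo, hi = LIMITS[name]
    return min(max(float(value), lo), hi)


@dataclass
class Setpoints:
    laser_power_w: float
    speed_mm_s: float
    focus_mm: float
    camera_angle_deg: Optional[float] = None
    camera_distance_mm: Optional[float] = None


def parse_instruction(text: str) -> Setpoints:
    """Parse a JSON instruction (hypothetical schema) and clamp every target."""
    msg = json.loads(text)
    target = msg["target"]
    sp = Setpoints(
        laser_power_w=clamp("laser_power_w", target["laser_power_w"]),
        speed_mm_s=clamp("speed_mm_s", target["speed_mm_s"]),
        focus_mm=clamp("focus_mm", target["focus_mm"]),
    )
    camera = msg.get("camera")
    if camera is not None:
        sp.camera_angle_deg = clamp("camera_angle_deg", camera["angle_deg"])
        sp.camera_distance_mm = clamp("camera_distance_mm", camera["distance_mm"])
    return sp


def dispatch(sp: Setpoints) -> list[tuple[str, str, float]]:
    """Map setpoints to (device, channel, value) commands issued together."""
    cmds = [
        ("laser_power_controller", "power_w", sp.laser_power_w),
        ("motion_controller", "xy_speed_mm_s", sp.speed_mm_s),
        ("focus_driver", "z_focus_mm", sp.focus_mm),
    ]
    if sp.camera_distance_mm is not None:
        cmds.append(("camera_stepper", "working_distance_mm", sp.camera_distance_mm))
    if sp.camera_angle_deg is not None:
        cmds.append(("camera_angle_bracket", "angle_deg", sp.camera_angle_deg))
    return cmds
```

Clamping on the controller side keeps an out-of-range model output from reaching the hardware, regardless of what the model proposes.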
[0015] As a preferred embodiment of the technical solution of the present invention, the quality assessment and early warning unit responds to the early warning information through a graded triggering mechanism:
- in response to a mild warning, the warning information is displayed on the human-machine interface;
- in response to a moderate warning, preset process parameter adjustments or equipment intervention actions are executed automatically; and
- in response to a severe warning, an emergency stop command is sent to the motion control system to terminate the cutting process immediately.
[0016] The adaptive control unit can adjust core parameters such as laser power, speed, and focus in real time, and drive camera pose adjustment to ensure that process parameters are always within the optimal range. A tiered early warning mechanism adopts differentiated response strategies based on risk level: mild warnings alert operators, moderate warnings automatically adjust parameters, and severe warnings result in immediate emergency shutdowns. This avoids production efficiency losses caused by frequent downtime and ensures equipment and personnel safety in extreme situations.
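The graded mechanism can be sketched as a mapping from a risk score to a warning level, and from each level to a response. The patent gives no numeric thresholds. The score scale and the threshold values below are assumptions, and the action names are hypothetical.

```python
from __future__ import annotations

from enum import Enum
from typing import Optional


class WarningLevel(Enum):
    MILD = 1
    MODERATE = 2
    SEVERE = 3


# Hypothetical risk-score thresholds; the patent does not give numeric values.
MILD_AT, MODERATE_AT, SEVERE_AT = 0.3, 0.6, 0.85


def classify(risk: float) -> Optional[WarningLevel]:
    """Map a risk score in [0, 1] to a warning level (None means no warning)."""
    if risk >= SEVERE_AT:
        return WarningLevel.SEVERE
    if risk >= MODERATE_AT:
        return WarningLevel.MODERATE
    if risk >= MILD_AT:
        return WarningLevel.MILD
    return None


def respond(level: Optional[WarningLevel], message: str) -> str:
    """Return the action taken for each level, following the graded mechanism."""
    if level is WarningLevel.SEVERE:
        return "emergency_stop"          # stop command to the motion control system
    if level is WarningLevel.MODERATE:
        return "auto_adjust_parameters"  # e.g. secondary cut or parameter fine-tuning
    if level is WarningLevel.MILD:
        print(f"[HMI] {message}")        # operator notice only, cutting continues
        return "hmi_display"
    return "none"
```

Only the severe level halts the machine, which matches the stated goal of avoiding unnecessary downtime.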
[0017] As a preferred embodiment of the technical solution of the present invention, the feedback module works as follows:
- The actual operating data of the intelligent laser cutting head are collected and packaged into a data packet. The packet includes at least the actual cutting accuracy data, the real-time health status data of the equipment, and the final process parameters executed in this cut.
- The data packet is transmitted to the visual language large model processing center, in real time or periodically, via an industrial bus or Ethernet protocol.
- The processing center evaluates the effectiveness of previous decisions based on the data packet and updates the process knowledge base accordingly.
[0018] Collecting real-time operating data and feeding them back to the visual language large model processing center forms a data-driven continuous learning loop. The system evaluates the effectiveness of previous decisions against actual cutting results and automatically updates the process knowledge base. It can therefore keep optimizing itself and accumulating knowledge over time, which avoids the knowledge stagnation of traditional control systems.
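A minimal sketch of the feedback packet and the knowledge base update follows. The packet carries the three minimum contents named in the text. Success is judged by a hypothetical kerf-error tolerance, and successful cases are appended to an in-memory list that stands in for the process knowledge base. All field names and the tolerance are assumptions, and the bus or Ethernet transport is not shown.

```python
from __future__ import annotations

import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class FeedbackPacket:
    """Minimum packet content named in the text; field names are hypothetical."""
    job_id: str
    kerf_error_mm: float               # actual cutting accuracy (deviation from target)
    health: dict[str, float]           # real-time equipment health, e.g. lens temperature
    executed_params: dict[str, float]  # final process parameters of this cut
    timestamp: float = field(default_factory=time.time)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(text: str) -> "FeedbackPacket":
        return FeedbackPacket(**json.loads(text))


def evaluate_and_update(kb: list[dict], packet: FeedbackPacket,
                        tolerance_mm: float = 0.05) -> bool:
    """Judge the previous decision by the actual result and keep successful cases."""
    success = abs(packet.kerf_error_mm) <= tolerance_mm  # hypothetical criterion
    if success:
        kb.append({
            "params": packet.executed_params,
            "health": packet.health,
            "kerf_error_mm": packet.kerf_error_mm,
        })
    return success
```

JSON is used here only because it round-trips losslessly in the standard library; a fieldbus deployment would likely use a binary encoding.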
[0019] In a second aspect, the technical solution of the present invention also provides an intelligent laser cutting control method based on a large visual language model, comprising the following steps:
- collecting internal state data, cutting quality data, and external environment data of the cutting process simultaneously, through a multi-source data acquisition module integrated into the intelligent laser cutting head;
- performing multimodal fusion and understanding of the collected heterogeneous data through the visual language large model processing center, and reasoning and making decisions based on the process knowledge base to generate control commands and early warning information;
- adjusting the cutting parameters of the intelligent laser cutting head in response to the control commands, and executing corresponding processing actions in response to the early warning information; and
- collecting the actual operating data of the intelligent laser cutting head and feeding them back to the visual language large model processing center to form closed-loop control and optimize the process knowledge base.
[0020] As a preferred embodiment of the technical solution of the present invention, acquiring the cutting quality data specifically includes driving the miniature stepper motor to adjust the position of the industrial camera via the sliding seat, so that:
- the angle between the lens axis and the workpiece surface is kept within a first preset angle range of 60° to 75°; and
- the working distance between the lens and the workpiece surface is kept within a first preset distance range of 80 mm to 120 mm.
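The relation between slider travel and working distance is simple trigonometry. It rests on two assumptions that the text does not state explicitly:
- The guide rail is perpendicular to the workpiece surface.
- The lens axis passes through the observed cut point.

Under those assumptions, a working distance d at a lens angle θ puts the lens at height h = d·sin θ above the sheet. The sketch below converts a target working distance into slider travel and rejects targets outside the preset ranges. The function names are hypothetical.

```python
import math

ANGLE_RANGE_DEG = (60.0, 75.0)     # first preset angle range
DISTANCE_RANGE_MM = (80.0, 120.0)  # first preset distance range


def lens_height_mm(distance_mm: float, angle_deg: float) -> float:
    """Height of the lens above the sheet for a working distance and lens angle."""
    return distance_mm * math.sin(math.radians(angle_deg))


def working_distance_mm(height_mm: float, angle_deg: float) -> float:
    """Working distance along the lens axis for a lens height and lens angle."""
    return height_mm / math.sin(math.radians(angle_deg))


def slider_move_mm(current_height_mm: float, angle_deg: float,
                   target_distance_mm: float) -> float:
    """Slider travel (positive = away from the sheet) needed to reach the target distance."""
    if not ANGLE_RANGE_DEG[0] <= angle_deg <= ANGLE_RANGE_DEG[1]:
        raise ValueError("lens angle outside the first preset angle range")
    if not DISTANCE_RANGE_MM[0] <= target_distance_mm <= DISTANCE_RANGE_MM[1]:
        raise ValueError("target outside the first preset distance range")
    return lens_height_mm(target_distance_mm, angle_deg) - current_height_mm
```

For example, at 60° a 100 mm working distance corresponds to a lens height of about 86.6 mm. At 75° the same distance needs about 96.6 mm.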
[0021] Images of the cut area are captured using the industrial camera.
[0022] High-precision cross-sectional data of the cutting area is acquired using an optical coherence tomography (OCT) probe, wherein the probe's optical path forms an angle with the industrial camera's visual optical path within a second preset angle range. The second preset angle range is 10° to 15°.
[0023] As a preferred embodiment of the technical solution of the present invention, generating control commands and early warning information through the visual language large model processing center specifically includes:
- mapping sensor values, images, and point cloud data into a unified semantic space through spatiotemporal alignment;
- calling the process knowledge base to retrieve historical cases and parameters that match the current processing conditions, and performing deep reasoning on the fused information in combination with the retrieved knowledge; and
- based on the inference results, performing equipment health analysis, cutting quality analysis, and processing condition adaptation analysis simultaneously, and generating adaptive control parameter commands and early warning information.
[0024] Equipment health analysis includes: using an adaptive k-value local outlier factor algorithm to identify temperature anomalies, and using a convolutional neural network to analyze vibration and acoustic emission signals to diagnose mechanical faults.
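The patent names an adaptive k-value local outlier factor (LOF) algorithm for temperature anomalies but does not say how k is adapted. The sketch below implements the standard LOF computation: k-distance, reachability distance, local reachability density, and LOF ratio. It uses exactly k nearest neighbours. The adaptation rule (k close to √n, clamped) and the anomaly threshold of 2.0 are hypothetical placeholders, not the patent's method.

```python
from __future__ import annotations

import math


def adaptive_k(n: int, k_min: int = 3, k_max: int = 20) -> int:
    """Hypothetical rule: k ~ sqrt(window size), clamped to [k_min, k_max] and below n."""
    return min(max(k_min, min(k_max, round(math.sqrt(n)))), n - 1)


def lof_scores(points: list[tuple[float, ...]], k: int | None = None) -> list[float]:
    """Local outlier factor of each point (exactly k nearest neighbours, Euclidean)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    k = adaptive_k(n) if k is None else min(k, n - 1)
    # k nearest neighbours of every point as (distance, index), excluding itself
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append(dists[:k])
    k_distance = [nb[-1][0] for nb in neighbours]
    # local reachability density = 1 / mean reachability distance
    lrd = []
    for nb in neighbours:
        reach = [max(k_distance[j], d) for d, j in nb]
        lrd.append(1.0 / (sum(reach) / k + 1e-12))  # epsilon guards duplicate points
    # LOF = mean neighbour density divided by the point's own density
    return [sum(lrd[j] for _, j in nb) / (k * lrd[i]) for i, nb in enumerate(neighbours)]


def temperature_anomalies(readings_c: list[float], threshold: float = 2.0) -> list[int]:
    """Indices of temperature samples whose LOF exceeds a (hypothetical) threshold."""
    scores = lof_scores([(t,) for t in readings_c])
    return [i for i, s in enumerate(scores) if s > threshold]


if __name__ == "__main__":
    window = [40.0, 40.2, 40.1, 39.9, 40.3, 40.0, 40.1, 55.0, 40.2, 39.8]
    print(temperature_anomalies(window))
```

LOF compares each reading with the density of its neighbours rather than with a fixed limit. It therefore tolerates slow drifts in the baseline temperature while still flagging isolated jumps. Multi-feature points, for example temperature together with its rate of change, work unchanged.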
[0025] As a preferred embodiment of the technical solution of the present invention, executing the corresponding processing actions involves a graded triggering mechanism:
- in response to a mild warning, the information is displayed on the human-machine interface;
- in response to a moderate warning, the system automatically performs a secondary cut or fine-tunes the parameters; and
- in response to a severe warning, an emergency stop is triggered.
[0026] As a preferred embodiment of the technical solution of the present invention, optimizing the process knowledge base includes one or both of the following:
- storing successful combinations of operating conditions, parameters, and results in the knowledge base as high-quality cases;
- using the actual-effect data from the feedback to incrementally train the large visual language model.
[0027] As the above technical solutions show, this application has the following advantages:
- A closed-loop control system integrates multi-source data acquisition, large visual language model processing, execution, and feedback, automating the entire process from data perception to intelligent decision-making.
- The large visual language model performs multimodal fusion and semantic understanding of heterogeneous data. This overcomes the limitations of traditional single-modal perception and gives a comprehensive picture of the cutting process state.
- Combined with retrieval-augmented reasoning over the process knowledge base, the system gains expert-like decision-making capability. This improves adaptability and the stability of cutting quality under complex processing conditions.

Attached Figure Description
[0028] To more clearly illustrate the technical solution of this application, the accompanying drawings used in the description will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0029] Figure 1 is a block diagram of the system provided in an embodiment of the present invention.
[0030] Figure 2 is a flowchart of the method provided in an embodiment of the present invention.
Detailed Implementation
[0031] To make the purpose, features, and advantages of this application more apparent and understandable, specific embodiments and accompanying drawings will be used to clearly and completely describe the technical solution protected by this application. Obviously, the embodiments described below are only some embodiments of this application, and not all embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0032] Unless otherwise defined, all technical and scientific terms used in this application have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. The terminology used in this application and in the specification of this invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
[0033] As shown in Figure 1, this embodiment of the invention provides an intelligent laser cutting control system based on a large visual language model, comprising:
- an intelligent laser cutting head integrating a multi-source data acquisition module, which collects data on the internal state, cutting quality, and external environment during the cutting process;
- a visual language large model processing center configured to perform multimodal fusion and understanding of the heterogeneous data collected by the multi-source data acquisition module, and to reason and make decisions based on the process knowledge base to generate control commands and early warning information;
- an execution module configured to respond to the control commands by adjusting the cutting parameters of the intelligent laser cutting head; and
- a feedback module configured to collect the actual operating data of the intelligent laser cutting head and feed them back to the visual language large model processing center, forming closed-loop control and optimizing the process knowledge base.
[0034] In this embodiment of the invention, the multi-source data acquisition module acquires internal state data, cutting quality data, and external environment data during the cutting process. Its internal state acquisition unit is configured as follows.

The temperature sensor is embedded in an annular groove on the inner wall of the lens barrel near the focusing/collimating lens. It is preferably installed at three evenly spaced points at 0°, 120°, and 240°, 10–15 mm from the lens edge. If the lens barrel has no pre-made groove, the sensor is installed in an embedded opening whose depth does not exceed half the wall thickness of the lens barrel.

Sensors of this type are also installed at poorly cooled areas of the power drive module housing and the optical path adjustment module housing. The preferred locations are the four corners of the side wall where the housing contacts the internal heat-generating elements, 20 mm from the edge.

During installation, thermally conductive silicone attaches the sensor probe tightly to the mounting surface, and the outside is sealed with high-temperature-resistant insulating tape. The sensor leads are routed along wiring grooves reserved in the lens barrel or housing.
[0035] The primary mounting point of the vibration sensor is the rigid connection section on the side wall of the lens barrel, 50–80 mm from the lower end face of the cutting head. The auxiliary mounting point is the flange connection between the module housing and the main body of the cutting head. The sensor is rigidly fixed with bolts, and the sensor base and the mounting surface must be flat and fit together without gaps. The sensor is installed with its direction parallel to the axis of the cutting head.
[0036] The acoustic emission sensor is attached directly to the outer wall of the cutting cavity of the cutting head, 30–40 mm from the nozzle outlet. If the outer wall of the cutting cavity is non-metallic, a metal adapter plate is added. During installation, a dedicated acoustic couplant is applied, and the sensor is pressed on gently and fixed with a clamp.

The S2836-44K photoelectric sensor is integrated at the end of the reflection branch of the dedicated optical path. Its photosensitive surface faces the reflected-light outlet of the conical reflector, at a height level with the focal point of the conical reflector. The sensor must be placed inside the optical path detection cavity of the cutting head, physically isolated from the main cutting optical path, with its axis at an angle of ≤5° to the reflected light path. A transparent dustproof glass cover is installed in front of the photosensitive surface, parallel to it and 2–3 mm away.

The conical reflector of the dedicated optical path is installed at the beam-splitting point between the collimating lens and the focusing lens in the main cutting optical path. Its cone tip points toward the incoming main laser beam. It is fixed with an axis adjustment bracket so that the axis of the conical surface coincides with the axis of the main optical path.

The arc-shaped reflective surface is installed on the reflected light path of the conical reflector, at a distance from it equal to the radius of curvature of the arc-shaped surface, and is fixed with an angle fine-tuning knob. The attenuator is installed between the arc-shaped reflective surface and the filter, close to the front of the filter. It is held in a slot so that lenses with different attenuation rates can be swapped quickly.
The filter is installed between the attenuator and the S2836-44K photoelectric sensor, 10–15 mm from the photosensitive surface of the sensor. It is perpendicular to the optical axis and sealed at its edge with a sealing ring.

In addition, all sensors and optical components must be installed to meet coaxiality and parallelism requirements, and after the optical path is adjusted it is calibrated with a laser collimator. Sensor leads must use metal-shielded cable. Maintenance space must be reserved inside the cutting head so that sensors and optical components can be removed, calibrated, and replaced.
[0037] In this embodiment of the invention, the sensors are as follows:
- Temperature sensor: a PT1000 high-precision platinum resistance thermometer, embedded in the side wall of the lens barrel and in the module housings.
- Vibration sensor: a PCB Piezotronics 352C65.
- Acoustic emission sensor: a Physical Acoustics (PAC) R15α, installed on the lower mechanical structure of the cutting head.
- Photoelectric sensor: an S2836-44K, whose light intensity signal is transmitted through a PoE communication connector.
[0038] The cutting quality acquisition unit includes:
- a guide rail arranged along the axial direction of the intelligent laser cutting head;
- a sliding seat slidably mounted on the guide rail;
- a miniature stepper motor connected to the sliding seat for driving it along the guide rail;
- an industrial camera fixed to the sliding seat by a bracket; and
- an angle adjustment bracket connecting the bracket and the sliding seat, used to adjust the tilt angle of the industrial camera lens.

The angle between the lens axis of the industrial camera and the workpiece surface is configured within a preset first angle range. The working distance between the camera lens and the workpiece surface is configured within a preset distance range.
[0039] In this embodiment of the invention, the guide rail is preferably a precision linear guide rail, and the sliding seat is preferably a precision linear slider that matches the guide rail. The combination of the precision linear guide rail and the precision linear slider provides high guiding accuracy, low friction coefficient, and good rigidity, ensuring that the industrial camera mounted on it is accurately positioned and wobble-free during movement, thereby meeting the requirements for stable, high-definition imaging of the cutting area. In one specific embodiment, the repeatability of the precision linear guide rail and the slider can reach within 0.005 mm, and the parallelism error is less than 0.02 mm / m.
[0040] The overall mounting reference for the camera and its supporting components is a dedicated linear guide rail on the side of the cutting head. The guide rail is arranged along the axial direction of the cutting head, and the vertical distance between its fixed base and the nozzle outlet of the cutting head is 150–200mm. This ensures that the components will not interfere with the cutting head's moving mechanism or the workpiece surface when they move. The industrial camera body is fixed on a precision linear slider via a quick-release bracket. The lens faces the workpiece cutting area, and the angle between the lens axis and the surface of the sheet material needs to be precisely controlled within the range of 60°–75°. At the same time, the working distance between the camera lens and the workpiece surface is kept stable at 80–120mm by adjusting the movement of the linear slider. This allows the camera's field of view to completely cover a cutting area of ≥20×20mm, clearly capturing the real-time status of the cut and the molten pool.
[0041] If an optical coherence tomography (OCT) device is selected, its probe head needs to be mounted side-by-side with the industrial camera on the same slider. The probe optical path and the camera's visual optical path should be at an angle of 10°–15°. A common optical path coupling module is used to achieve coordination between the laser cutting optical path and the OCT probe optical path, avoiding mutual interference between the two types of optical paths. The ring light source is coaxially nested in front of the camera lens, with the light-emitting surface of the light source flush with the lens end face. It is fixed to the lens bracket through a threaded interface. The illumination direction of the light source is consistent with the lens axis to ensure that the light uniformly covers the camera's field of view, improving the contrast between the cut edge and the molten pool area. A quartz transparent protective plate also needs to be installed in front of the ring light source. This protective plate is sealed to the light source bracket through a sealing ring. The distance between the protective plate and the lens end face is 5–8 mm, and it must completely cover the effective area of the lens and the light source to isolate the metal dust and slag generated during the cutting process and prevent contamination of the lens.
[0042] This unit combines a miniature stepper motor, a precision linear slider, and photoelectric limit switches with an angle adjustment bracket to control the camera's position and angle precisely.

A miniature stepper motor with a torque of ≥0.5 N·m serves as the drive core and is connected to the ball screw of the linear guide rail via a coupling. The motor receives pulse signals from the control system and rotates the ball screw, which moves the linear slider back and forth along the guide rail. The motor's step angle is at most 1.8°, giving a slider positioning accuracy of ≤0.01 mm and keeping the camera's working distance stable.

The precision linear slider uses a high-rigidity ball linear guide pair, with a guide rail parallelism error of ≤0.02 mm/m and a slider-to-rail clearance of ≤0.005 mm. Multiple sets of mounting holes are pre-drilled in the slider so that it accepts different models of industrial cameras and OCT probes, integrating these components.

For rapid positioning, the photoelectric limit structure has one photoelectric limit sensor at each end of the guide rail's travel. The sensors are linked to the stepper motor's drive controller. When the slider reaches an end position, the sensor triggers a limit signal and the controller immediately cuts motor power, which prevents the slider from overtraveling and derailing. An origin photoelectric sensor is also installed at the middle of the guide rail. After each start-up, the slider automatically returns to the origin so that the collected data are consistent.

A hinged angle adjustment bracket with a built-in worm gear adjustment mechanism is added between the camera bracket and the linear slider.
By rotating the adjustment knob, the camera lens axis can be infinitely adjusted from 60° to 75°. After adjustment, tighten the support locking bolt to lock the angle position and prevent the angle from shifting due to cutting vibration. The angle scale accuracy of the support is 1°, which is convenient for operators to quickly calibrate.
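The ≤0.01 mm slider accuracy follows from the 1.8° step angle only once a ball-screw lead is fixed, and the text does not give one. The sketch below assumes a hypothetical lead of 2 mm/rev with full-step drive, and models the end limit sensors and the mid-stroke origin sensor as bounds on an integer step count. `Slider`, `move_steps`, and `home` are illustrative names, not a real motor-driver API.

```python
from dataclasses import dataclass

STEP_ANGLE_DEG = 1.8  # full-step angle from the text
SCREW_LEAD_MM = 2.0   # hypothetical ball-screw lead (not specified in the text)


def mm_per_step(step_angle_deg: float = STEP_ANGLE_DEG,
                lead_mm: float = SCREW_LEAD_MM,
                microsteps: int = 1) -> float:
    """Linear slider travel produced by one (micro)step."""
    steps_per_rev = 360.0 / step_angle_deg * microsteps
    return lead_mm / steps_per_rev


@dataclass
class Slider:
    """Hypothetical slider model: limit sensors at both ends, origin sensor at mid-stroke."""
    stroke_mm: float = 50.0
    steps_from_lower_limit: int = 0
    stopped_by_limit: bool = False

    @property
    def position_mm(self) -> float:
        return self.steps_from_lower_limit * mm_per_step()

    def move_steps(self, steps: int) -> None:
        """Issue one pulse at a time and stop as soon as an end limit would be exceeded."""
        max_steps = round(self.stroke_mm / mm_per_step())
        direction = 1 if steps > 0 else -1
        for _ in range(abs(steps)):
            nxt = self.steps_from_lower_limit + direction
            if not 0 <= nxt <= max_steps:
                self.stopped_by_limit = True  # limit sensor fires; motor power is cut
                return
            self.steps_from_lower_limit = nxt

    def home(self) -> None:
        """Return to the mid-stroke origin sensor after start-up."""
        self.stopped_by_limit = False
        origin = round(self.stroke_mm / 2 / mm_per_step())
        self.move_steps(origin - self.steps_from_lower_limit)
```

With a 2 mm lead, 200 full steps per revolution give 0.01 mm per step; microstepping would refine this further but does not by itself improve mechanical accuracy.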
[0043] In addition, a beam splitter is installed at the main optical path splitting point inside the cutting head, and a common optical path coupling design is adopted to reflect 0.1% of the laser energy to the OCT detection module, while ensuring that the energy loss of the main cutting optical path is negligible. The camera visual optical path is guided by a set of reflectors and merges with the OCT detection optical path to point to the cutting area, realizing the synchronous acquisition of visual data and tomographic scanning data.
[0044] The industrial camera used is a Basler acA2500-14gm, with a frame rate of 14fps and a resolution of 2592×1944, paired with an OCT device (axial resolution 10μm); the micro motor is a 28BYJ-48 stepper motor, driving a precision linear slider (stroke ≥50mm), and a photoelectric limit switch is used to achieve precise stroke limitation; the ring light source is a CCS VLG-C100, and the transparent protective plate is made of quartz material; the slider guide rail is fixed by an angle adjustment support to ensure that the angle between the lens axis and the plate is precisely controlled at 60°~75°, the working distance is stable at 80~120 mm, and the field of view coverage is ≥20×20mm.
[0045] The external environment acquisition unit includes a 3D depth sensor for acquiring workpiece point cloud data; the 3D depth sensor is an Intel RealSense D455 with an acquisition range of 0.1–3.9 m and a depth resolution of 1280×720.
[0046] The visual language large model processing center includes the following modules. A multimodal data alignment and fusion module receives heterogeneous data from the multi-source data acquisition module, including sensor numerical sequences, cutting area images, and workpiece point cloud data, and maps these data into a unified processing-state semantic space through spatiotemporal alignment. Specifically, the module: assigns a unified time reference to the data from every sensor using a precision clock protocol; maps the image data and point cloud data into a common world coordinate system using the pre-calibrated transformation between each sensor coordinate system and the world coordinate system; and, for each target time and spatial location to be analyzed, performs spatiotemporal interpolation on the multi-source data so that the data from different sensors describe the state of the same spatiotemporal point. The spatiotemporal interpolation and matching use an extended Kalman filter to fuse the visually identified defect locations with the three-dimensional geometric information in the point cloud, suppressing measurement noise and producing more accurate alignment results.
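The two alignment steps in [0046], resampling every sensor onto a common time base and mapping sensor-frame points into the world frame, can be sketched as follows. This is a minimal illustration under stated assumptions: linear interpolation is assumed as the interpolation rule, timestamps are assumed to already share the synchronized clock, and the calibration is assumed to be a 4×4 homogeneous matrix (built here with a rotation about Z only, for brevity). The function names are hypothetical.

```python
import bisect
import math


def interpolate_at(timestamps, values, t):
    """Linearly interpolate one sensor's samples at time t (sorted, shared time base)."""
    if not timestamps[0] <= t <= timestamps[-1]:
        raise ValueError("t lies outside the sampled interval")
    i = bisect.bisect_right(timestamps, t)
    if i == len(timestamps):  # t coincides with the last sample
        return values[-1]
    t0, t1 = timestamps[i - 1], timestamps[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def make_transform(yaw_deg, tx, ty, tz):
    """4x4 homogeneous sensor-to-world transform (rotation about Z only, for brevity)."""
    c, s = math.cos(math.radians(yaw_deg)), math.sin(math.radians(yaw_deg))
    return [[c, -s, 0.0, tx], [s, c, 0.0, ty], [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def to_world(t_world_sensor, point):
    """Apply a homogeneous transform to a 3-D point given in the sensor frame."""
    x, y, z = point
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in t_world_sensor[:3])
```

For example, a 10 Hz temperature stream can be evaluated at the exact timestamp of a 14 fps camera frame, and a defect pixel back-projected into the camera frame can then be expressed in the same world coordinates as the D455 point cloud.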
[0047] Multidimensional features related to processing quality and equipment health are extracted from the aligned data to form a unified state feature vector.
[0048] The retrieval-augmented generation (RAG) module calls the process knowledge base, queries historical material parameters and defect cases that match the current processing conditions, and uses the retrieved knowledge to perform correlation analysis and deep reasoning on the fused multimodal information. Specifically, it: compares the multimodal feature vector representing the current processing state with the historical case vectors stored in the process knowledge base, calculates their similarity, and recalls the K most relevant historical cases; combines the context of the recalled cases with the current state data and inputs them into the visual language large model; and the large model then performs multi-hop causal reasoning, associating the current data with historical patterns and drawing on knowledge graphs, to generate a decision output containing control parameters, early warning information, and the reasoning basis.
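The recall step in [0048] can be illustrated with cosine similarity and a top-K selection. The text does not name the similarity measure, so cosine similarity is an assumption here, as are the prompt wording and the helper names `recall_top_k` and `build_prompt`; the call to the visual language model itself is omitted.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recall_top_k(query, case_vectors, k=3):
    """Return the k (case_id, similarity) pairs most similar to the query vector."""
    scored = ((cosine(query, vec), case_id) for case_id, vec in case_vectors.items())
    return [(case_id, sim) for sim, case_id in heapq.nlargest(k, scored)]


def build_prompt(state_summary, recalled, case_notes):
    """Assemble a hypothetical text prompt combining the current state with recalled cases."""
    lines = ["Current processing state:", state_summary, "Most similar historical cases:"]
    for case_id, sim in recalled:
        lines.append(f"- [{case_id}] similarity={sim:.2f}: {case_notes[case_id]}")
    lines.append("Propose laser power, cutting speed and focus position, "
                 "list any warnings, and state the reasoning.")
    return "\n".join(lines)
```

A linear scan is enough for a knowledge base of a few hundred cases; a larger base would normally use an approximate nearest-neighbour index instead.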
[0049] The multi-objective decision-making module is used to simultaneously perform equipment health analysis, cutting quality analysis, and processing condition adaptation analysis based on the inference results, and generate adaptive control parameter commands and early warning information. Specifically, it is used for: Based on the results of deep reasoning, the system simultaneously performs equipment health status assessment and life prediction, cutting defect identification and process parameter back-calculation, as well as path correction and obstacle avoidance analysis based on real-time point clouds. A comprehensive optimization function is constructed with laser power, cutting speed, and focal position as decision variables. The function integrates sub-objectives of quality, efficiency, and equipment health, and uses parallel analysis results as constraints. The optimal parameter set is obtained by solving the problem through an optimization algorithm. The optimal parameter set is encapsulated into adaptive control parameter instructions, and corresponding prompts, interventions, or emergency stop warnings are generated based on the severity level of the analysis results.
[0050] The execution module is used to respond to the decision instructions of the visual language model. It includes an adaptive control unit, which adjusts the power output, movement speed and focus position of the cutting head according to the decision parameters. At the same time, it can drive a micro motor to adjust the camera pose to adapt to the visual acquisition needs of different thickness plates. The quality assessment and early warning unit outputs the cutting defect judgment results, the remaining life assessment of optical components / mechanical structures and the collision risk warning in real time, and triggers corresponding processing actions (such as pausing cutting and cleaning prompts).
[0051] ① Model Training and Optimization Stage: First, a multi-dimensional dataset is constructed and labeled, collecting data covering sheet metal properties, process parameters, visual and sensor data, defect samples, component lifespan and collision risk data, which are divided into training, validation and test sets in a 7:2:1 ratio; then, the vision-language multimodal fusion architecture of the Transformer framework is adopted, and the model is trained through pre-training and fine-tuning under process constraints; finally, the performance is evaluated based on three indicators: defect recognition, parameter prediction and lifespan risk assessment, and incremental training is performed on error samples to enhance the adaptability to complex processing conditions.
[0052] ② Real-time command response and execution stage: The adaptive control unit receives and parses the model's commands and adjusts the cutting power, speed, and focus, together with the camera pose (working distance 80–120 mm, angle 60°–75°), performing closed-loop calibration through visual feedback. The quality assessment and early warning unit feeds multi-source data to the model, which infers cutting defects, component lifespan, and collision risks; the unit then triggers prompts, interventions, or emergency stops according to three warning levels: mild, moderate, and severe. Archived data are used for incremental optimization of the model.
[0053] ③ System assurance: An edge computing architecture keeps the instruction response time at ≤50 ms, a manual fault-tolerance mechanism is provided, and the camera, the sensors, and the model metrics are calibrated weekly to ensure stable system operation.
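The 7:2:1 train/validation/test split in stage ① can be made reproducible with a seeded shuffle. A minimal sketch, assuming samples are independent records; `split_dataset` is a hypothetical helper name.

```python
import random


def split_dataset(samples, ratios=(7, 2, 1), seed=0):
    """Shuffle reproducibly, then cut into train/validation/test by the given ratios."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    total, n = sum(ratios), len(shuffled)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

Because defect samples are usually much rarer than good cuts, applying the same split separately to each defect class (stratification) keeps every class represented in all three subsets.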
[0054] The feedback module feeds back the actual operating data of the execution module (cutting accuracy, equipment operating status, camera imaging quality parameters) to the visual language big model processing center in real time to achieve closed-loop optimization.
[0055] The working logic of this system is as follows: (1) System initialization self-test: After the system is powered on, the multi-source data acquisition module (temperature, vibration, vision, and depth sensors, etc.), the visual language large model processing center, and the execution module complete hardware self-tests and parameter calibration in sequence; the micro motor drives the camera assembly to the preset initial position, and the limit switch verifies the position signal. At the same time, the built-in process knowledge base (containing cutting parameters for more than 10 materials and 500+ defect cases) is loaded to support subsequent decision-making.
[0056] (2) Multi-dimensional data perception and collection: After the cutting operation starts, the multi-source data acquisition module simultaneously collects three types of data: ① Internal status data: a PT1000 platinum resistance sensor (10 Hz) measures the temperature of the lens barrel and module housing, a PCB vibration sensor (1 kHz) and a PAC acoustic emission sensor capture mechanical vibration and wear signals, and a photoelectric sensor measures laser light intensity.
[0057] ② Cutting quality data: Basler industrial camera (2592×1944 resolution) with ring light source and quartz protective plate to capture images of the cut and molten pool. Optional OCT equipment (axial resolution 10μm) can be used to obtain high-precision cross-sectional information.
[0058] ③ External environment data: The Intel RealSense D455 depth sensor collects point cloud data of the workpiece and identifies workpiece position deviations and dynamic obstacles.
[0059] All data is transmitted to the processing center in real time via industrial bus / PoE protocol.
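Item ③ above uses the D455 point cloud to identify workpiece position deviations. A minimal sketch, assuming the workpiece points have already been segmented from the cloud and projected onto the XY plane: the in-plane shift is taken from the centroid and the rotation from the principal axis of the second moments, and a nominal cutting path is then mapped onto the measured pose. The helper names are hypothetical, and the angle estimate is only unambiguous for non-square parts with small rotations (well under 90°).

```python
import math


def pose_from_points(points):
    """Centroid and principal-axis angle (degrees) of a 2-D point set (XY projection)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n
    angle = 0.5 * math.degrees(math.atan2(2.0 * mu11, mu20 - mu02))
    return (cx, cy), angle


def correct_path(path, nominal_points, measured_points):
    """Map a nominal cutting path onto the measured pose (rotate about centroid, then shift)."""
    (ncx, ncy), n_ang = pose_from_points(nominal_points)
    (mcx, mcy), m_ang = pose_from_points(measured_points)
    theta = math.radians(m_ang - n_ang)
    c, s = math.cos(theta), math.sin(theta)
    corrected = []
    for x, y in path:
        dx, dy = x - ncx, y - ncy
        corrected.append((mcx + c * dx - s * dy, mcy + s * dx + c * dy))
    return corrected
```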
[0060] (3) Multimodal data cognitive decision-making: The visual language large model processing center, based on the Transformer architecture, maps heterogeneous data such as sensor values, images, and point clouds into a unified semantic space using a ViT-L/14 visual encoder and a BERT-base language encoder. Using retrieval-augmented generation (RAG), it retrieves matching cases and parameters from the process knowledge base and completes three core analyses: ① Equipment health analysis: a LOF algorithm with adaptive k (k = 5–15) identifies temperature anomalies, and a CNN with 3×3 convolution kernels extracts vibration and acoustic emission waveform features to detect early faults such as mechanical loosening and bearing wear.
[0061] ② Cutting quality analysis: a pixel-level semantic segmentation model quantifies the kerf width and molten pool shape and identifies defects such as slag residue, with a processing time of ≤100 ms per frame.
[0062] ③ Processing condition adaptation analysis: the workpiece position is corrected based on point cloud data, and the optimal cutting path is planned.
[0063] The final output includes dynamic cutting parameter commands (power, speed, focus position), equipment warning information, and path correction commands.
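The kerf-width and slag quantities in analysis ② can be read off the segmentation output once a pixel scale is fixed. A minimal sketch under loud assumptions: the mask is a 2-D list of hypothetical class labels, the kerf runs vertically through the image, and the pixel scale is taken as the 20 mm field of view spread over the camera's 1944-pixel axis (about 0.0103 mm per pixel), ignoring perspective from the 60°–75° viewing angle. The 0.1 mm² slag threshold is the one used for process back-inference in [0073].

```python
from collections import Counter

PIXEL_MM = 20.0 / 1944  # assumed: 20 mm field of view spans the 1944-pixel sensor axis
KERF, SLAG = 1, 2       # hypothetical class labels produced by the segmentation model
SLAG_AREA_THRESHOLD_MM2 = 0.1


def kerf_width_mm(mask):
    """Mean kerf width, assuming the kerf runs vertically so each row crosses it once."""
    widths = [row.count(KERF) for row in mask if KERF in row]
    return PIXEL_MM * sum(widths) / len(widths) if widths else 0.0


def slag_area_mm2(mask):
    """Total area of the pixels labelled as slag."""
    counts = Counter(label for row in mask for label in row)
    return counts[SLAG] * PIXEL_MM ** 2


def slag_detected(mask):
    return slag_area_mm2(mask) > SLAG_AREA_THRESHOLD_MM2
```

At this assumed scale, 0.1 mm² corresponds to roughly 950 slag pixels.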
[0064] (4) Precise execution and real-time feedback: After receiving the instructions, the execution module adjusts the laser power, cutting speed, and focus position, and the quality assessment and early warning unit monitors the effect in real time. At the same time, the feedback module transmits cutting accuracy data, equipment health status data, and camera imaging quality data back to the visual language large model processing center in real time to update the process knowledge base and decision logic, forming a closed loop for continuous learning.
[0065] (5) Task completion and data archiving: After the cutting task is completed, the system automatically saves the processing parameters, quality data, and equipment status data of the job to the knowledge base, providing a basis for optimizing later cuts of similar workpieces, and generates equipment maintenance suggestions (such as sensor calibration intervals, component replacement warnings, and camera lens cleaning reminders).
[0066] The work process is as follows: (1) Start-up phase: After the system is powered on, the hardware power-on self-test gives priority to the multi-source data acquisition module, which is powered up and diagnosed in stages. For the sensor channels, the temperature sensor first undergoes zero-point drift correction (referenced to the ambient temperature baseline); the vibration sensor's gain accuracy is then verified with its built-in standard signal source, and the light intensity sensor's response linearity is confirmed with its built-in calibration light source. Synchronous communication links are established with each sensor over Modbus/TCP or EtherCAT, confirming the data transmission cycle (e.g., 10 Hz for the temperature sensor, 1 kHz for the vibration sensor) and the packet loss rate threshold (<0.01%).
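The <0.01% packet loss threshold in [0066] can be checked from the sequence numbers received on each sensor link. A minimal sketch, assuming each packet carries a consecutive sequence number that does not wrap around within the check window; the helper names are hypothetical and not part of Modbus/TCP or EtherCAT.

```python
def packet_loss_rate(seq_numbers):
    """Fraction of packets missing from a run of consecutively numbered packets.

    Assumes sequence numbers do not wrap around within the window.
    """
    received = set(seq_numbers)
    expected = max(received) - min(received) + 1
    return 1.0 - len(received) / expected


def link_ok(seq_numbers, max_loss=1e-4):  # 0.01 % threshold from the text
    return packet_loss_rate(seq_numbers) < max_loss
```

At the 1 kHz vibration rate, a 100-second window of 100,000 packets tolerates fewer than 10 lost packets.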
[0067] Pre-trained process knowledge bases (such as material-parameter mapping tables and defect feature libraries) are loaded from a local SSD or the cloud and support fast retrieval based on attention mechanisms. Multi-sensor timing alignment: PTP (Precision Time Protocol, IEEE 1588) synchronizes the clocks of all devices, keeping timestamp errors at the microsecond level and providing a spatiotemporally consistent basis for subsequent data fusion.
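PTP's microsecond-level alignment rests on the delay request-response exchange of IEEE 1588, in which four timestamps yield both the clock offset and the path delay. The sketch below shows that arithmetic under the protocol's usual assumption of a symmetric network path; `within_sync_budget` and its 1 µs budget are hypothetical acceptance checks, not part of the standard.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Clock offset and path delay from one IEEE 1588 delay request-response exchange.

    t1: master sends Sync (master clock)     t2: slave receives Sync (slave clock)
    t3: slave sends Delay_Req (slave clock)  t4: master receives Delay_Req (master clock)
    Assumes the network path delay is the same in both directions.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2  # slave clock minus master clock
    delay = ((t2 - t1) + (t4 - t3)) / 2   # mean one-way path delay
    return offset, delay


def to_master_time(slave_timestamp_ns, offset_ns):
    """Re-express a slave-clock timestamp on the master (reference) time base."""
    return slave_timestamp_ns - offset_ns


def within_sync_budget(offset_ns, budget_ns=1_000):
    """Hypothetical acceptance check: residual offset within 1 microsecond."""
    return abs(offset_ns) <= budget_ns
```

For example, with timestamps (in ns) t1 = 0, t2 = 2300, t3 = 5000, and t4 = 4300, the slave clock is 1500 ns ahead of the master and the one-way delay is 800 ns.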
[0068] (2) Multi-source data acquisition and fusion processing: The heterogeneous sensor collaborative acquisition strategy is shown in Table 1. Each sensor acquires data synchronously at a preset frequency, and a global view of the processing status is generated through feature-level fusion.
[0069] Table 1
[0070] (3) Data fusion and feature extraction: Multi-sensor data are fused with an Extended Kalman Filter (EKF) to eliminate spatial errors caused by transmission delays. For example, the slag adhesion location identified by the vision system is mapped into the same coordinate system as the kerf depth measured in the 3D point cloud. Multimodal feature generation: the frequency-domain energy entropy of the vibration signal, the rate of change of the temperature gradient, and the texture features of the visual image are extracted and input into the defect diagnosis model.
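Two of the scalar features named in [0070] can be sketched directly. The frequency-domain energy entropy below is the Shannon entropy of the share of spectral energy falling into equal-width frequency bands; the band count of 8 is an assumption. The text does not define "temperature gradient", so this sketch assumes it means the temporal gradient dT/dt and reports its rate of change as a second finite difference. A plain DFT keeps the block dependency-free; a real implementation would use an FFT.

```python
import cmath
import math


def band_energy_entropy(signal, n_bands=8):
    """Shannon entropy (nats) of spectral energy shares over equal-width frequency bands."""
    n = len(signal)
    half = n // 2
    # Direct DFT over the non-negative frequency bins (O(n^2)); use an FFT for real data.
    power = [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
             for k in range(half)]
    width = half // n_bands
    energies = [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]
    total = sum(energies)
    return -sum(e / total * math.log(e / total) for e in energies if e > 0)


def temperature_gradient_rate(times, temps):
    """Second finite difference of temperature: rate of change of dT/dt, in degC/s^2."""
    mids, slopes = [], []
    for i in range(len(temps) - 1):
        dt = times[i + 1] - times[i]
        mids.append((times[i] + times[i + 1]) / 2)
        slopes.append((temps[i + 1] - temps[i]) / dt)
    return [(slopes[i + 1] - slopes[i]) / (mids[i + 1] - mids[i]) for i in range(len(slopes) - 1)]
```

A single clean tone concentrates all energy in one band (entropy near 0), while energy spread evenly across the 8 bands gives the maximum, ln 8 ≈ 2.08. A rise in this entropy is therefore one possible indicator of broadband mechanical noise such as looseness or wear.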
[0071] The intelligent decision-making and real-time correction process is as follows: (1) Path planning and parameter optimization: The A* algorithm dynamically plans the cutting path, using a cost function that integrates geometric distance, temperature constraints (avoiding high-temperature areas >500 ℃), and surface roughness weights. For example, when a local high-temperature region is detected, a cooling section is automatically inserted into the path. Laser power $P$ and cutting speed $v$ are adjusted in real time using the Particle Swarm Optimization (PSO) algorithm, with the objective function:
[0072] $$\min_{P,\,v}\; J(P, v) = \omega_1 W(P, v) + \omega_2 R_a(P, v) + \omega_3 H(P, v)$$
where $W$ is the kerf width, $R_a$ is the surface roughness, $H$ is the thickness of the heat-affected zone, and the weighting factors $\omega_1$, $\omega_2$, $\omega_3$ are adjusted adaptively according to the material type.
[0073] (2) Process parameter back-inference model: When the vision system detects slag (area > 0.1 mm² in the image), a regression relationship trained on historical data is used:
[0074] $$W = \beta_0 + \beta_1 v + \beta_2 P$$
where $W$ is the kerf width, $v$ is the cutting speed, $P$ is the cutting power, $\beta_0$ is the intercept term, and $\beta_1$ and $\beta_2$ are the linear regression coefficients.
[0075] The parameter adjustments are then back-calculated: the power is fine-tuned by +5% (e.g., increased from 3.5 kW to 3.675 kW), and the cutting speed is simultaneously reduced by 0.2 m/min to ensure that the slag oxide layer is fully vaporized.
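The back-inference in [0073]–[0075] rests on a linear model of kerf width W in terms of cutting speed v and power P (W = b0 + b1·v + b2·P). The sketch below fits that model by ordinary least squares and inverts it for the power that gives a target kerf width at a fixed speed. Reading "back-calculation" as this inversion is an assumption. The fixed +5% / −0.2 m/min slag rule is included as stated in the text, and all function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_kerf_model(samples):
    """Ordinary least squares for W = b0 + b1*v + b2*P from (v, P, W) samples."""
    xtx = [[0.0] * 3 for _ in range(3)]
    xty = [0.0] * 3
    for v, p, w in samples:
        row = (1.0, v, p)
        for i in range(3):
            xty[i] += row[i] * w
            for j in range(3):
                xtx[i][j] += row[i] * row[j]
    return solve_linear(xtx, xty)  # normal equations (X^T X) b = X^T y


def power_for_target_width(coef, target_w, v):
    """Invert the fitted model for P at a fixed speed v (requires b2 != 0)."""
    b0, b1, b2 = coef
    return (target_w - b0 - b1 * v) / b2


def slag_correction(power_kw, speed_m_min):
    """Adjustment stated in the text: +5 % power and -0.2 m/min speed."""
    return power_kw * 1.05, speed_m_min - 0.2
```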
[0076] The execution control and quality closed loop are as follows: (1) Multi-axis motion compensation: Vibration compensation: the cutting head offset is calculated from IMU (Inertial Measurement Unit) data, and the servo motors correct the pose (accuracy ±0.01 mm). Adaptive focus position: the Z-axis focus position is adjusted dynamically based on workpiece flatness feedback from the 3D sensor (e.g., the focus is raised by 0.2 mm where the sheet metal warps).
[0077] (2) Quality assessment and incremental learning: Real-time defect handling: for slag adhesion defects, a secondary cut is triggered (power increased by 10%, speed decreased by 15%) or the auxiliary gas pressure is increased by 10%. Crack warning: when an abnormal cooling rate is detected in the heat-affected zone, a slow-cooling process is applied automatically (e.g., speed reduced by 20% and auxiliary air cooling activated). Online knowledge base update: every 5 ms, data such as the actual kerf width and slag adhesion status are fed back to the large model, and the process knowledge base is updated through incremental learning, for example by recording successful parameter combinations for new materials (such as carbon fiber composites) to optimize subsequent decisions.
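The adaptive focus adjustment in [0076] raises the focus where the sheet warps. The sketch below assumes a particular way of turning point-cloud flatness into a Z correction: fit a least-squares plane to the cloud points within a radius of the current cutting point and use the fitted surface height there (relative to a nominal surface at z = 0) as the focus offset, clamped to a hypothetical ±1 mm range. The radius, clamp, and function names are all illustrative.

```python
def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares plane z = a*x + b*y + c, solved from the normal equations by Cramer's rule."""
    s = [[0.0] * 3 for _ in range(3)]
    r = [0.0] * 3
    for x, y, z in points:
        row = (x, y, 1.0)
        for i in range(3):
            r[i] += row[i] * z
            for j in range(3):
                s[i][j] += row[i] * row[j]
    d = _det3(s)
    if abs(d) < 1e-12:
        raise ValueError("points are degenerate (collinear or too few)")
    return tuple(_det3([[r[i] if j == k else s[i][j] for j in range(3)] for i in range(3)]) / d
                 for k in range(3))


def focus_offset_mm(cloud, x0, y0, radius_mm=5.0, max_offset_mm=1.0):
    """Z focus correction at (x0, y0): local surface height above the nominal plane z = 0."""
    local = [p for p in cloud if (p[0] - x0) ** 2 + (p[1] - y0) ** 2 <= radius_mm ** 2]
    a, b, c = fit_plane(local)
    height = a * x0 + b * y0 + c
    return max(-max_offset_mm, min(max_offset_mm, height))
```

Fitting a local plane, rather than reading a single depth pixel, averages out depth noise from the sensor; the clamp prevents a spurious point cloud from driving the focus far from its nominal position.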
[0078] As shown in Figure 2, an embodiment of the invention further provides an intelligent laser cutting control method based on a large visual language model, comprising the following steps: S1. Internal status data, cutting quality data, and external environment data of the cutting process are collected simultaneously through the multi-source data acquisition module integrated into the intelligent laser cutting head. S2. The visual language large model processing center performs multimodal fusion and understanding of the collected heterogeneous data and carries out reasoning and decision-making based on the process knowledge base to generate control commands and early warning information. S3. In response to the control commands, the cutting parameters of the intelligent laser cutting head are adjusted; in response to the warning information, the corresponding processing action is executed. S4. The actual operating data of the intelligent laser cutting head are collected and fed back to the visual language large model processing center to form closed-loop control and optimize the process knowledge base.
[0079] In this embodiment of the invention, the acquisition of cutting quality data specifically includes: A micro stepper motor is driven to adjust the position of an industrial camera via a precision linear slider, so that the angle between the lens axis and the workpiece surface is kept within a first preset angle range, and the working distance between the lens and the workpiece surface is kept within a first preset distance range; the first preset angle range is 60° to 75°, and the first preset distance range is 80mm to 120mm.
[0080] Images of the cut area are captured using the industrial camera.
[0081] High-precision cross-sectional data of the cutting area is acquired using an optical coherence tomography (OCT) probe, wherein the probe's optical path forms an angle with the industrial camera's visual optical path within a second preset angle range. The second preset angle range is 10° to 15°.
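The preset ranges in [0079] (lens angle 60°–75°, working distance 80–120 mm) interact with plate thickness. The sketch below assumes that the slider rail is perpendicular to the plate, i.e. along the cutting head axis. Under that assumption, a thicker plate raises the surface by Δt and shortens the distance along the lens axis by Δt / sin θ, while raising the camera by Δt restores the original pose. The 0.01 mm per step resolution and the function names are assumptions.

```python
import math

ANGLE_RANGE_DEG = (60.0, 75.0)     # first preset angle range
DISTANCE_RANGE_MM = (80.0, 120.0)  # first preset distance range
MM_PER_STEP = 0.01                 # assumed slider resolution (matches the <=0.01 mm figure)


def pose_ok(distance_mm, angle_deg):
    """True if the lens pose lies inside both preset ranges."""
    return (ANGLE_RANGE_DEG[0] <= angle_deg <= ANGLE_RANGE_DEG[1]
            and DISTANCE_RANGE_MM[0] <= distance_mm <= DISTANCE_RANGE_MM[1])


def distance_after_thickness_change(distance_mm, angle_deg, delta_t_mm):
    """Distance along the lens axis if the plate top rises by delta_t and the slider stays put."""
    s = math.sin(math.radians(angle_deg))
    return (distance_mm * s - delta_t_mm) / s


def slider_steps_to_restore(delta_t_mm):
    """Steps that raise the camera by delta_t along the rail, restoring the original pose."""
    return round(delta_t_mm / MM_PER_STEP)
```

For example, at 60° and 100 mm, a 10 mm thicker plate brings the working distance down to about 88.5 mm (still in range), while a 25 mm change would leave it at about 71.1 mm, outside the range, unless the slider moves.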
[0082] In some embodiments, generating control commands and warning information through the visual language large model processing center specifically includes: The system receives heterogeneous data from the multi-source data acquisition module, including sensor numerical sequences, cutting area images, and workpiece point cloud data; it maps the heterogeneous data to a unified processing state semantic space through spatiotemporal alignment; the spatiotemporal alignment specifically involves using the extended Kalman filter (EKF) algorithm to fuse the defect location identified by the vision system with the cut depth information in the 3D point cloud into the same coordinate system.
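The EKF fusion in [0082] combines a vision-derived defect location with the corresponding point-cloud location. A full EKF linearizes a nonlinear measurement function around the current estimate. In the simplified case sketched here, both sources are assumed to already report positions in the world frame, so the measurement Jacobian is the identity and the update reduces to the standard Kalman measurement update. The example also assumes independent x/y errors with known variances and treats the vision estimate as the prior. The helper names are hypothetical.

```python
def kalman_update(x_prior, var_prior, z, var_meas):
    """Scalar Kalman measurement update (the EKF update when the measurement Jacobian is 1)."""
    gain = var_prior / (var_prior + var_meas)
    x_post = x_prior + gain * (z - x_prior)
    var_post = (1.0 - gain) * var_prior
    return x_post, var_post


def fuse_defect_location(vision_xy, vision_var, cloud_xy, cloud_var):
    """Fuse a vision-based and a point-cloud-based world position, axis by axis.

    The vision estimate acts as the prior and the point-cloud estimate as the
    measurement; x and y errors are assumed independent (variances in mm^2).
    """
    fused = [kalman_update(v, vision_var, c, cloud_var) for v, c in zip(vision_xy, cloud_xy)]
    return tuple(x for x, _ in fused), tuple(var for _, var in fused)
```

With a vision variance of 0.04 mm² and a point-cloud variance of 0.01 mm², the fused position lies 80% of the way toward the point-cloud reading, and the fused variance (0.008 mm²) is smaller than either input.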
[0083] The process knowledge base is invoked, and retrieval-augmented generation (RAG) is used to retrieve historical material parameters and defect cases that match the current processing conditions. The retrieved knowledge is combined with the fused multimodal information for correlation analysis and deep reasoning. The deep reasoning specifically includes: identifying abnormal patterns in temperature sensor data with a Local Outlier Factor (LOF) algorithm using an adaptive k value; and diagnosing early faults such as mechanical loosening or wear with a convolutional neural network (CNN) that extracts frequency-domain and waveform features from the vibration and acoustic emission sensor signals.
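The text specifies LOF with an adaptive k in the range 5–15 but does not say how k is adapted. One established option, suggested in the original LOF work by Breunig et al., is to compute LOF for every k in the range and score each point by its maximum; the sketch below assumes that rule. It is a plain-Python O(n²) implementation intended for short windows of temperature samples (or small feature vectors), and the helper names are hypothetical.

```python
import math


def _sorted_neighbours(points):
    """For every point, all other points as (distance, index), nearest first."""
    return [sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]


def lof_scores(neighbours, k):
    """Local Outlier Factor of every point for a single neighbourhood size k."""
    knn = [nbrs[:k] for nbrs in neighbours]
    k_distance = [nbrs[-1][0] for nbrs in knn]  # distance to the k-th neighbour
    lrd = []
    for nbrs in knn:
        reach = [max(k_distance[j], d) for d, j in nbrs]  # reachability distances
        lrd.append(1.0 / max(sum(reach) / k, 1e-12))      # local reachability density
    return [sum(lrd[j] for _, j in knn[i]) / (k * lrd[i]) for i in range(len(knn))]


def adaptive_lof(points, k_min=5, k_max=15):
    """Score each point by its largest LOF over k_min..k_max (requires len(points) > k_max)."""
    neighbours = _sorted_neighbours(points)
    per_k = [lof_scores(neighbours, k) for k in range(k_min, k_max + 1)]
    return [max(scores) for scores in zip(*per_k)]
```

Scores near 1 indicate readings as dense as their neighbours; a temperature spike far from its recent history scores far above 1 and can be passed on as an anomaly.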
[0084] Based on the results of the deep inference, the following analyses are performed synchronously: Equipment health analysis: the condition of the mechanical structures is determined from temperature, vibration, and acoustic emission data, and lifespan warnings are generated. Cutting quality analysis: cutting defects are identified from image and point cloud data, and quality warnings are generated; specifically, a pixel-level semantic segmentation model processes the image of the cutting area to quantify the kerf width and molten pool shape and to identify slag residue defects, with a processing time of no more than 100 ms per frame.
[0085] Machining condition adaptation analysis: the workpiece position is corrected based on point cloud data, and the optimal cutting path is planned. This analysis also includes adaptive path planning using the A* algorithm, whose cost function integrates geometric distance, temperature constraints, and surface roughness weights.
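The A* path planning described for the machining condition adaptation analysis can be sketched on a grid. The text names the cost ingredients but not how they combine, so the following choices are assumptions: cells above 500 ℃ are excluded outright, each step costs 1 (geometric distance) plus a roughness weight times a per-cell roughness value, and the heuristic is the Manhattan distance. That heuristic stays admissible because every step costs at least 1. A 4-connected grid and the function name `a_star` are illustrative choices.

```python
import heapq

HOT_LIMIT_C = 500.0  # cells hotter than this are excluded from the path (from the text)


def a_star(temps, roughness, start, goal, roughness_weight=2.0):
    """4-connected A* on a grid; entering a cell costs 1 + roughness_weight * roughness."""
    rows, cols = len(temps), len(temps[0])

    def h(cell):
        # Manhattan distance is admissible because every step costs at least 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0.0, start)]
    came_from = {start: None}
    g_cost = {start: 0.0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1], g
        if g > g_cost[cell]:
            continue  # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or temps[nr][nc] > HOT_LIMIT_C:
                continue
            new_g = g + 1.0 + roughness_weight * roughness[nr][nc]
            if new_g < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = new_g
                came_from[nxt] = cell
                heapq.heappush(open_heap, (new_g + h(nxt), new_g, nxt))
    return None, float("inf")
```

Treating hot cells as impassable is the simplest reading of "avoiding high-temperature areas"; a softer variant would instead add a temperature-dependent penalty to the step cost.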
[0086] Based on the above analysis results, adaptive control parameter commands for dynamically adjusting laser power, cutting speed, and focal position, along with corresponding warning information, are generated. When generating the adaptive control parameter commands, a dynamic parameter optimization model based on the particle swarm optimization (PSO) algorithm is used, whose objective function integrates kerf width, surface roughness, and heat-affected zone thickness.
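Paragraph [0086] names a PSO-based optimizer whose objective combines kerf width W, surface roughness Ra, and heat-affected-zone thickness H. A minimal sketch, assuming a weighted-sum objective J = w1·W + w2·Ra + w3·H to be minimized over laser power P (kW) and cutting speed v (m/min). The three surrogate models below are toy placeholders, not physical models; a real system would use models learned from process data. The line-energy floor on P/v, handled as a penalty, is a hypothetical example of a constraint. The `pso` function is a standard global-best particle swarm with positions clamped to the bounds.

```python
import random

# Hypothetical search bounds: laser power P in kW, cutting speed v in m/min.
BOUNDS = ((2.0, 4.0), (1.0, 5.0))
WEIGHTS = (1.0, 1.0, 1.0)  # w1..w3; material-dependent in practice, fixed here
MIN_LINE_ENERGY = 0.8      # hypothetical constraint: require P / v >= 0.8


def kerf_width(p, v):
    return 0.15 + 0.02 * p - 0.03 * v      # toy surrogate, mm


def roughness(p, v):
    return 1.0 + 0.5 * (v - 0.6 * p) ** 2  # toy surrogate, um


def haz_thickness(p, v):
    return 0.05 * p / v                     # toy surrogate, mm


def objective(x):
    """Weighted-sum objective J with a penalty for violating the line-energy constraint."""
    p, v = x
    w1, w2, w3 = WEIGHTS
    j = w1 * kerf_width(p, v) + w2 * roughness(p, v) + w3 * haz_thickness(p, v)
    if p / v < MIN_LINE_ENERGY:
        j += 1e3 * (MIN_LINE_ENERGY - p / v)
    return j


def pso(f, bounds, n_particles=30, iters=150, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimisation with positions clamped to the bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    best = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[best][:], pbest_val[best]
    for _ in range(iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

PSO needs only objective evaluations, not gradients, which suits objectives built from learned or piecewise models. The seeded generator makes each run reproducible for testing.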
[0087] Correspondingly, executing the corresponding processing action follows a tiered triggering mechanism: in response to a mild warning, the information is displayed on the human-machine interface; in response to a moderate warning, the system automatically performs a secondary cut or fine-tunes the parameters; and in response to a severe warning, an emergency stop is triggered.
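The tiered mechanism in [0087] can be sketched as a small dispatcher. This is an illustration under stated assumptions. The risk thresholds that map a score to a level are hypothetical, since the text gives none. The HMI and machine controller are stood in for by a list and a dict. Displaying moderate and severe warnings on the HMI as well is an assumption; the text specifies display only for the mild level.

```python
from enum import Enum


class Severity(Enum):
    MILD = 1
    MODERATE = 2
    SEVERE = 3


def classify(risk, moderate_at=0.4, severe_at=0.8):
    """Map a risk score in [0, 1] to a warning level (thresholds are hypothetical)."""
    if risk >= severe_at:
        return Severity.SEVERE
    if risk >= moderate_at:
        return Severity.MODERATE
    return Severity.MILD


def handle_warning(severity, message, hmi_log, machine):
    """Apply the tiered response; hmi_log and machine are stand-ins for the HMI and controller."""
    hmi_log.append(f"[{severity.name}] {message}")  # every level is shown to the operator
    if severity is Severity.MODERATE:
        machine["pending_action"] = "secondary_cut_or_parameter_tune"
    elif severity is Severity.SEVERE:
        machine["laser_enabled"] = False
        machine["pending_action"] = "emergency_stop"
    return machine.get("pending_action")
```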
[0088] The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. An intelligent laser cutting control system based on a large visual language model, characterized in that it comprises: an intelligent laser cutting head integrating a multi-source data acquisition module for collecting data on the internal state, cutting quality, and external environment during the cutting process; a visual language large model processing center for performing multimodal fusion and understanding of the heterogeneous data collected by the multi-source data acquisition module, and for performing reasoning and decision-making based on a process knowledge base to generate control commands and early warning information; an execution module for responding to the control commands and adjusting the cutting parameters of the intelligent laser cutting head; and a feedback module for collecting the actual operating data of the intelligent laser cutting head and feeding it back to the visual language large model processing center to form closed-loop control and optimize the process knowledge base.
2. The intelligent laser cutting control system based on a large visual language model according to claim 1, characterized in that the multi-source data acquisition module comprises: an internal status acquisition unit, including a temperature sensor installed on the inner wall of the optical barrel of the cutting head, a vibration sensor installed on the side wall of the barrel, and an acoustic emission sensor installed on the outer wall of the cutting cavity; an external environment acquisition unit, including a 3D depth sensor for acquiring workpiece point cloud data; and a cutting quality acquisition unit, comprising: a guide rail arranged along the axial direction of the intelligent laser cutting head; a sliding seat slidably mounted on the guide rail; a miniature stepper motor connected to the sliding seat for driving the sliding seat along the guide rail; an industrial camera fixed to the sliding seat by a bracket; and an angle adjustment bracket, connecting the bracket and the sliding seat, for adjusting the lens tilt angle of the industrial camera; wherein the angle between the lens axis of the industrial camera and the workpiece surface is configured within a preset first angle range, and the working distance between the industrial camera lens and the workpiece surface is configured within a preset distance range.
3. The intelligent laser cutting control system based on a large visual language model according to claim 2, characterized in that the cutting quality acquisition unit further comprises: an optical coherence tomography (OCT) probe mounted side by side with the industrial camera on the same sliding seat, wherein the detection optical path of the optical coherence tomography probe and the visual optical path of the industrial camera form an angle within a second angle range; and a common-path coupling module for coupling the detection optical path of the optical coherence tomography probe to the main cutting optical path, the common-path coupling module comprising: a conical reflector installed at the beam-splitting point of the main cutting optical path, between the collimating lens and the focusing lens; and an arc-shaped reflecting surface, an attenuator, and a filter arranged in sequence along the reflected light path of the conical reflector.
4. The intelligent laser cutting control system based on a large visual language model according to claim 1, characterized in that the visual language large model processing center adopts a multimodal fusion model based on the Transformer architecture, and retrieves historical cases and parameters from the process knowledge base to assist real-time decision-making.
5. The intelligent laser cutting control system based on a large visual language model according to claim 1, characterized in that the visual language large model processing center comprises: a multimodal data alignment and fusion module for receiving heterogeneous data from the multi-source data acquisition module, including sensor numerical sequences, cutting area images, and workpiece point cloud data, and mapping the heterogeneous data to a unified processing-state semantic space through spatiotemporal alignment; a retrieval-augmented generation module for calling the process knowledge base, querying historical material parameters and defect cases that match the current processing conditions, and, in combination with the retrieved knowledge, performing correlation analysis and deep reasoning on the fused multimodal information; and a multi-objective decision-making module for simultaneously performing equipment health analysis, cutting quality analysis, and processing condition adaptation analysis based on the reasoning results, and generating adaptive control parameter instructions and early warning information.
6. The intelligent laser cutting control system based on a large visual language model according to claim 4, characterized in that the execution module comprises: an adaptive control unit for dynamically adjusting the laser power, cutting speed, and focal position according to the control commands; and a quality assessment and early warning unit for outputting in real time, based on the early warning information, the cutting defect judgment result, the component remaining-life assessment, and collision risk warnings, and for triggering the corresponding processing action.
7. The intelligent laser cutting control system based on a large visual language model according to claim 6, characterized in that the adaptive control unit is specifically configured to: parse the structured control instructions from the visual language large model processing center and map the target parameters therein to the corresponding laser power controller, motion controller, and focus adjustment driver; synchronously adjust the laser output power, the movement speed and trajectory of the cutting head in the XY plane, and the Z-axis focus position to reach the target parameters; and, according to the processing-condition adaptation requirements of the control instructions, drive the micro motor in the cutting quality acquisition unit to adjust the working distance and angle between the industrial camera lens and the workpiece surface so as to maintain optimal imaging conditions.
8. The intelligent laser cutting control system based on a large visual language model according to claim 6, characterized in that the quality assessment and early warning unit responds to early warning information through a tiered triggering mechanism: in response to a mild warning, the warning information is displayed on the human-machine interface; in response to a moderate warning, preset process parameter adjustments or equipment intervention actions are executed automatically; and in response to a severe warning, an emergency stop command is sent to the motion control system to terminate the cutting process immediately.
9. The intelligent laser cutting control system based on a large visual language model according to claim 6, characterized in that the feedback module operates as follows: the actual operating data of the intelligent laser cutting head are collected and packaged into a data packet that includes at least the actual cutting accuracy data, the real-time health status data of the equipment, and the final process parameters executed in the current cut; the data packets are transmitted to the visual language large model processing center in real time or periodically via an industrial bus or Ethernet protocol; and the visual language large model processing center evaluates the effectiveness of previous decisions based on the data packets and updates the process knowledge base accordingly.
10. An intelligent laser cutting control method based on a large visual language model, characterized in that it comprises the following steps: collecting simultaneously, through a multi-source data acquisition module integrated into an intelligent laser cutting head, internal status data, cutting quality data, and external environment data of the cutting process; performing, through a visual language large model processing center, multimodal fusion and understanding of the collected heterogeneous data, and reasoning and decision-making based on a process knowledge base, to generate control commands and early warning information; adjusting the cutting parameters of the intelligent laser cutting head in response to the control commands; executing the corresponding processing action in response to the warning information; and collecting the actual operating data of the intelligent laser cutting head and feeding it back to the visual language large model processing center to form closed-loop control and optimize the process knowledge base.