A control method and system for fire-fighting duty robots based on AI vision
By employing AI vision technology for preemptive command arbitration and a dual-layer verification mechanism, the problems of control instability and low precision of fire-fighting duty robots in complex scenarios have been solved, achieving high-precision adaptive operation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HANGZHOU TAIXIAO TECHNOLOGY CO LTD
- Filing Date
- 2026-05-12
- Publication Date
- 2026-06-30
AI Technical Summary
Existing fire-fighting duty robots suffer from control instability and low accuracy in complex fire-fighting scenarios due to accumulated errors, target position drift, and conflicts of multiple sources of commands.
By adopting an AI vision-based control method, precise positioning and operation of the robot's end effector are achieved through preemptive command arbitration, topology adaptive spatial positioning, and dual-layer vision closed-loop verification.
It improves the adaptability and precision of robot control, ensures the reliability and traceability of operation, and reduces the dependence on high-precision hardware.
Figure CN122299656A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of fire-fighting duty robot control technology, and more specifically, to a fire-fighting duty robot control method and system based on AI vision.
Background Technology
[0002] As an automated system designed to replace manual operation of critical equipment in fire control centers, a fire-fighting duty robot has the core task of ensuring absolute operational reliability. Existing robots mostly employ a teach-and-playback model, in which absolute coordinates are pre-recorded and then repeatedly executed. However, this control method is inherently unstable in the face of the physical and logical uncertainties of the real world. Factors such as cumulative errors from long-distance movements, position drift of the target equipment, and conflicts between on-site and remote multi-source commands all contribute to system instability, making it difficult for existing robots to provide reliable unmanned operation in complex fire-fighting scenarios.
Summary of the Invention
[0003] This invention provides a fire-fighting duty robot control method and system based on AI vision, which at least solves the problem of low robot control accuracy in related technologies.
[0004] According to an embodiment of the present invention, a fire-fighting duty robot control method based on AI vision is provided, comprising:
[0005] Obtain a first instruction and execute a preemptive instruction arbitration based on the first instruction, wherein the first instruction carries a target component identifier;
[0006] Based on the deployment topology of the fire-fighting duty robot, perform coarse spatial positioning;
[0007] After coarse spatial positioning is completed, a dimension-reduced visual servoing is performed based on the visual template of the target component to align the end effector of the fire-fighting robot with the target component.
[0008] After the end effector is aligned with the target component, a first state verification is performed;
[0009] After the first state verification is passed, the end effector is controlled to perform physical operations on the target component;
[0010] After the physical operation is performed, a second state verification is performed.
[0011] In an exemplary embodiment, performing preemptive instruction arbitration according to the first instruction includes:
[0012] Concurrently monitor both the on-site command channel and the remote command channel;
[0013] When the on-site command channel receives an on-site command, the on-site command is assigned the highest priority, and a hardware braking signal is broadcast to the motion control bus to interrupt the physical motion being executed.
[0014] In one exemplary embodiment, the method further includes, prior to performing coarse spatial localization:
[0015] Obtain the local three-dimensional coordinates corresponding to the target component identifier;
[0016] Based on the preset maximum effective working radius of the fire-fighting duty robot, a workspace safety check is performed on the local three-dimensional coordinates;
[0017] When the workspace safety check passes, coarse spatial positioning is performed.
[0018] In one exemplary embodiment, performing dimensionality reduction visual servoing includes:
[0019] Acquire real-time image frames;
[0020] In the real-time image frame, the planar visual error vector between the visual template and the real-time image frame is obtained through template matching;
[0021] Based on the inverse of the preset first matrix, the planar visual error vector is converted into a fine-tuning speed command, and the fine-tuning speed command is executed. The fine-tuning speed command is constrained within the two-dimensional plane of the local base coordinate system.
[0022] In one exemplary embodiment, performing the first state verification includes:
[0023] Obtain the pre-operation baseline state and the pre-operation state determination threshold of the target component;
[0024] Acquire images before the operation and extract pre-operation state feature values from the images before the operation;
[0025] The pre-operation state feature value is compared with the pre-operation state determination threshold to determine whether the current state of the target component is consistent with the pre-operation baseline state.
[0026] According to another embodiment of the present invention, a fire-fighting duty robot control system based on AI vision is provided, comprising:
[0027] The instruction arbitration module is used to obtain a first instruction and perform preemptive instruction arbitration according to the first instruction, wherein the first instruction carries a target component identifier;
[0028] The motion control module is used to perform coarse spatial positioning based on the deployment topology of the fire-fighting duty robot;
[0029] The perception and analysis module is used to perform dimensionality-reduced visual servoing based on the visual template of the target component after the coarse spatial positioning is completed, so as to align the end effector of the fire duty robot with the target component.
[0030] After the end effector is aligned with the target component, a first state verification is performed; after the first state verification is passed, the end effector is controlled to perform a physical operation on the target component; after the physical operation is performed, a second state verification is performed.
[0031] In an exemplary embodiment, performing preemptive instruction arbitration according to the first instruction includes:
[0032] Concurrently monitor both the on-site command channel and the remote command channel;
[0033] When the on-site command channel receives an on-site command, the on-site command is assigned the highest priority, and a hardware braking signal is broadcast to the motion control bus to interrupt the physical motion being executed.
[0034] In one exemplary embodiment, the system further includes:
[0035] Before performing coarse spatial localization, obtain the local three-dimensional coordinates corresponding to the target component identifier;
[0036] Based on the preset maximum effective working radius of the fire-fighting duty robot, a workspace safety check is performed on the local three-dimensional coordinates;
[0037] When the workspace safety check passes, coarse spatial positioning is performed.
[0038] According to yet another embodiment of the present invention, a computer-readable storage medium is also provided, wherein a computer program is stored therein, wherein the computer program is configured to perform the steps in any of the above method embodiments when executed.
[0039] According to yet another embodiment of the present invention, an electronic device is also provided, including a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.
[0040] This invention addresses the safety risks associated with concurrent multi-source commands through preemptive command arbitration. Through topology-adaptive spatial positioning, it transforms absolute positioning errors, such as the long-stroke cumulative error of the slide rail and the position drift of the target panel, into relative errors that can be eliminated dynamically, significantly reducing reliance on high-precision hardware. Furthermore, by constructing a dual-layer verification system, it ensures the verifiability and traceability of each operation. Therefore, it can solve the problem of low robot control precision and achieve the effect of improving robot control accuracy.
Attached Figure Description
[0041] Figure 1 is a flowchart of a fire-fighting duty robot control method based on AI vision according to an embodiment of the present invention;
[0042] Figure 2 is a diagram showing the convergence of the visual servoing error according to an embodiment of the present invention;
[0043] Figure 3 is a structural block diagram of a fire-fighting duty robot control system based on AI vision according to an embodiment of the present invention.
Detailed Implementation
[0044] The technical solutions of the embodiments of this application will be described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments.
[0045] In the following description, the terms "first," "second," etc., are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Therefore, a feature defined with "first," "second," etc., may explicitly or implicitly include one or more of that feature. In the description of this application, unless otherwise stated, "a plurality of" means two or more.
[0046] Furthermore, in this application, directional terms such as "upper," "lower," "left," and "right" may be defined relative to the orientation of the components shown in the accompanying drawings. It should be understood that these directional terms can be relative concepts, used for relative description and clarification, and may change accordingly depending on the orientation of the components in the accompanying drawings.
[0047] In this application, unless otherwise expressly specified and limited, the term "connection" should be interpreted broadly. For example, "connection" can be a fixed connection, a detachable connection, or an integral part; it can be a direct connection or an indirect connection through an intermediate medium. Furthermore, the term "coupled" can refer to an electrical connection that enables signal transmission.
[0048] As used herein, “about,” “roughly,” or “approximately” includes the stated value and the average value within an acceptable range of deviation from the given value, wherein the acceptable range of deviation is determined by a person skilled in the art taking into account the measurement under discussion and the error associated with the measurement of the given quantity (i.e., the limitations of the measurement system).
[0049] Example 1
[0050] This embodiment provides a fire-fighting duty robot control method based on AI vision. It employs a control architecture that includes hardware-level instruction preemption arbitration, topology-adaptive spatial decoupling positioning, and dual-layer visual closed-loop verification. This architecture enables the fire-fighting duty robot to autonomously operate the control panel regardless of whether it is deployed in a fixed or sliding-rail configuration. This method solves the operational failure problems caused by differences in physical deployment form, long-stroke error accumulation, environmental drift, and multi-source instruction conflicts in existing technologies, thus improving the adaptability of robot control.
[0051] Figure 1 is a flowchart illustrating a fire-fighting duty robot control method based on AI vision, provided in an embodiment of this application. The method is applied to robot control systems, particularly for fire-fighting duty robots used in fire control centers. The method includes the following steps:
[0052] S100: Obtain the first instruction and execute the preemptive instruction arbitration according to the first instruction. The first instruction carries the target component identifier.
[0053] The preemptive command arbitration mechanism addresses the risk of equipment damage or personal injury in high-safety scenarios when a physical intervention command from on-site maintenance personnel conflicts with a scheduling command from the remote automated dispatch center. The mechanism broadcasts a hardware braking signal to the underlying motion control bus, because conventional software queue management cannot interrupt a pulse stream that has already been generated. The command arbitration module M100 within the robot control cabinet uses its internal I/O multiplexing mechanism to concurrently monitor the RS232 serial port (the on-site command channel) connected to the on-site handheld terminal and the Ethernet TCP/IP socket (the remote command channel) connected to the remote central control server. The file descriptor of the on-site command channel is assigned a real-time scheduling policy and the highest interrupt weight. When a valid data byte stream is detected on the on-site command channel, the highest-level hardware interrupt service routine immediately executes a predefined preemption algorithm. The algorithm first acquires the system-level global mutex spinlock Current_Task_Lock and sets its state to ACQUIRE_HIGH. This blocks the influx of remote commands at the hardware level by disabling the new packet copy operations triggered by network interrupts from low-priority ports. With the lock in this state, the arbitration algorithm broadcasts the predefined hardware emergency stop signal CMD_EMERGENCY_BRAKE to all motion control nodes via the industrial fieldbus. This signal, as the highest-priority process data object, is parsed by the underlying servo driver firmware, which, within the response cycle, cuts off the power supply to the motor and engages the mechanical brake, stopping the fast-moving physical axis in the shortest possible time. After physical braking is complete at the hardware level, the algorithm forcibly terminates, at the software level, the active task thread triggered by the remote command and clears all historical data accumulated in the remote command queue Remote_Queue. After this series of blocking and clearing operations, the system broadcasts the servo enable signal CMD_SERVO_ENABLE to restore the servo state of each axis and safely executes the command from the on-site channel in a new thread.
[0054] For example, in a sliding rail deployment scenario, a remotely controlled robot moves at high speed along a sliding rail from X=0.1 meters to a target cabinet at X=0.8 meters. During the movement, on-site personnel discover an obstacle on the rail and press the "HALT" key on their local terminal. At this moment, based on a preemptive command arbitration mechanism, the RS232 serial port connected to the local terminal captures the message {"source": "LOCAL", "action": "HALT"}; simultaneously, the arbitration kernel immediately sets Current_Task_Lock and broadcasts the CMD_EMERGENCY_BRAKE signaling to the EtherCAT bus. The sliding rail servo driver responds within a short time to cut off the motor power and activate the electromagnetic brake, causing the slider to physically stop at X=3.5 meters. The command arbitration module M100 kills the remote task thread and clears the command queue at the software level, thereby avoiding mechanical collisions, and sends back the ERR_LOCAL_OVERRIDE_BRAKED exception code to the remote terminal to clearly inform that control has been forcibly revoked on-site.
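The ordering of the preemption steps described above can be sketched in software as follows. This is only an illustrative model under stated assumptions: the class `Arbiter`, its method names, and the return strings `ACCEPTED` and `REJECTED` are hypothetical, the bus is replaced by a plain list, and the point at which the lock is released is an assumption, since the description does not state it. A real implementation would rely on hardware interrupts and a fieldbus rather than Python objects.

```python
from collections import deque


class Arbiter:
    """Toy model of preemptive arbitration: on-site commands always win."""

    def __init__(self, bus_log):
        self.remote_queue = deque()  # stands in for Remote_Queue
        self.lock_state = "FREE"     # stands in for Current_Task_Lock
        self.active_remote = None    # remote task currently running, if any
        self.bus_log = bus_log       # records what would be broadcast on the bus

    def on_remote(self, cmd):
        # Remote commands are refused while the on-site lock is held.
        if self.lock_state == "ACQUIRE_HIGH":
            return "REJECTED"
        self.remote_queue.append(cmd)
        if self.active_remote is None:
            self.active_remote = self.remote_queue.popleft()
        return "ACCEPTED"

    def on_local(self, cmd):
        # 1. Take the lock so no new remote command can enter.
        self.lock_state = "ACQUIRE_HIGH"
        # 2. Brake first, at the bus level.
        self.bus_log.append("CMD_EMERGENCY_BRAKE")
        # 3. Kill the active remote task and flush the remote queue.
        preempted = self.active_remote is not None
        self.active_remote = None
        self.remote_queue.clear()
        # 4. Re-enable the servos, then run the on-site command.
        self.bus_log.append("CMD_SERVO_ENABLE")
        result = {
            "executed": cmd,
            "remote_notice": "ERR_LOCAL_OVERRIDE_BRAKED" if preempted else None,
        }
        # Assumption: the lock is released once the on-site command is handled.
        self.lock_state = "FREE"
        return result
```

Calling `on_local({"source": "LOCAL", "action": "HALT"})` while a remote move is active leaves the queue empty, records the brake signal before the enable signal, and returns the override notice for the remote terminal.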
[0055] S200: Perform topology-adaptive coarse spatial positioning based on the robot's deployment topology.
[0056] In this embodiment, it is necessary to unify the robot motion control logic under different physical deployment configurations and implement strict safety boundary checks. This includes a series of sub-steps to transform abstract instructions into safe physical motion planning; specifically:
[0057] Before performing any physical movement, the system performs a workspace safety check. Specifically, after command arbitration in S100, the command arbitration module M100 extracts a parameter package associated with the target component identifier from the feature database module M200. This parameter package contains local three-dimensional coarse coordinates (X_arm, Y_arm, Z_arm) relative to the robot mounting base. Simultaneously, the command arbitration module M100 has cached the maximum effective working radius R_max of the current robot model during the system initialization phase (S000). In the verification sub-step, the command arbitration module M100 calculates the Euclidean distance D = (X_arm^2 + Y_arm^2 + Z_arm^2)^(1 / 2) from the target point to the origin of the base and determines whether D < R_max holds true. If the target point exceeds the workspace, the system rejects the command and throws an OUT_OF_REACH error to prevent the kinematics solver from crashing or the robot arm from hitting physical limits.
[0058] For example, the maximum working radius R_max of the fixed robot is set to 850 mm. When a command is received for a target component whose local coordinates recorded in the feature database module M200 are (800, 400, 200) mm, the system calculates D = (800^2 + 400^2 + 200^2)^(1 / 2) ≈ 916.5 mm. Since 916.5 mm > 850 mm, the safety check fails, and the task is aborted.
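The workspace safety check amounts to a single distance comparison, which can be sketched as below. The function name `check_workspace` and the exception class `OutOfReachError` are hypothetical names chosen for this illustration; only the formula, the condition D < R_max, and the OUT_OF_REACH error code come from the description.

```python
import math


class OutOfReachError(Exception):
    """Hypothetical exception standing in for the OUT_OF_REACH error code."""


def check_workspace(x_arm, y_arm, z_arm, r_max):
    """Return the distance D from the base origin if D < r_max, else raise."""
    d = math.sqrt(x_arm ** 2 + y_arm ** 2 + z_arm ** 2)
    if not d < r_max:
        raise OutOfReachError(f"OUT_OF_REACH: D={d:.1f} mm, R_max={r_max:.1f} mm")
    return d
```

With the values from the example, `check_workspace(800, 400, 200, 850)` raises, because D is about 916.5 mm.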
[0059] After the safety check passes, the system enters the core topology-adaptive coarse spatial positioning stage, where the topology manager in the instruction arbitration module M100 performs conditional branch routing based on the system environment variable PLATFORM_TYPE.
[0060] When PLATFORM_TYPE is configured as SLIDE_RAIL, the instruction arbitration module M100 reads the macroscopic one-dimensional coordinate Macro_Offset_X and the local three-dimensional coordinates (X_arm, Y_arm, Z_arm) from the parameter package extracted by the feature database module M200.
[0061] The positioning process is broken down into macro-motion and micro-motion, which are executed sequentially:
[0062] First, the slide rail drive subunit of the motion control module M400 receives Macro_Offset_X, generates an S-shaped velocity planning curve and converts it into pulse commands to control the linear servo motor to move the slider to the target position. This process allows for a positioning error of ±5 mm. After the slide rail is locked in place, the robotic arm servo subunit of the motion control module M400 receives local three-dimensional coordinates (X_arm, Y_arm, Z_arm). These local coordinates are limited to the local base coordinate system with the center of the bottom surface of the robot's physical mounting base as the origin, and inverse kinematics calculation is performed based on this relative coordinate system.
[0063] When PLATFORM_TYPE is configured as FIXED_BASE, the system only reads the local 3D coordinates (X_arm, Y_arm, Z_arm) and directly executes the deployment of the robotic arm with its fixed physical mounting base as the origin. Through this topology-adaptive transmission mechanism, the system's absolute positioning is uniformly transformed into relative positioning in the local base coordinate system. The absolute stopping error of the slide rail is equivalent to a one-time drift of the camera in local space, providing a unified input condition for subsequent visual servo correction.
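The branch routing on PLATFORM_TYPE can be sketched as a function that turns a parameter package into an ordered list of motion steps. The step labels (`rail_move`, `rail_lock`, `arm_move_local`) and the function name are hypothetical and exist only for this illustration; the parameter names follow the description.

```python
def plan_coarse_positioning(platform_type, params):
    """Return the ordered motion steps for the given deployment topology."""
    local = (params["X_arm"], params["Y_arm"], params["Z_arm"])
    if platform_type == "SLIDE_RAIL":
        # Macro-motion along the rail first, then micro-motion of the arm.
        return [
            ("rail_move", params["Macro_Offset_X"]),
            ("rail_lock", None),
            ("arm_move_local", local),
        ]
    if platform_type == "FIXED_BASE":
        # Only the local coordinates are used; the base itself is the origin.
        return [("arm_move_local", local)]
    raise ValueError(f"unknown PLATFORM_TYPE: {platform_type!r}")
```

In both branches the final step is the same arm move in the local base coordinate system, which is what lets the later visual servo stage treat the two topologies identically.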
[0064] S300: After coarse positioning is completed, a dimension-reduced visual servo is performed based on the visual template of the target component to align the robot's end effector with the target component.
[0065] In this embodiment, all uncertainties and errors remaining from the S200 stage are dynamically eliminated through "hand-eye" coordination. These errors include mechanical repetitive positioning errors of the slide rail, physical drift of the cabinet, and installation deviations of the robot base. Here, a forced dimensionality reduction visual servo control strategy is used to avoid uncontrollable drifts in the depth direction (Z-axis) and attitude angles (pitch / yaw) that may occur in conventional six-degree-of-freedom (6-DOF) visual servoing due to the pseudo-inverse of the Jacobian matrix (i.e., the first matrix). This strategy is based on the assumption that, during the robot arm deployment step in S200, the Z-axis of the robot's end-effector coordinate system (i.e., the camera optical axis) is planned to be approximately perpendicular to the target control panel. Based on this constraint, the three-dimensional alignment problem is reduced to a two-dimensional planar tracking problem.
[0066] During dimension-reduced visual servoing, the perception and analysis module M300 acquires real-time grayscale image frames $I$ with a resolution of 640x480 pixels from the end-effector camera. It also retrieves the visual template image $T$ associated with the target component from the feature database module M200. Using the normalized cross-correlation (NCC) algorithm, a sliding-window search is performed in $I$ to find the region with the strongest response to $T$. The difference between the coordinates of the peak point and the coordinates of the image center gives the two-dimensional pixel translation error $(e_u, e_v)$. Principal direction analysis of the gradient direction histogram of the matching region gives the rotation angle error $e_\theta$. Together, these three scalars constitute the three-degree-of-freedom planar visual error vector $e = [e_u, e_v, e_\theta]^T$.
[0067] For example, as shown in Figure 2, if the center of the camera's field of view is (320, 240), the center of the NCC-matched template is (355, 228), and feature direction analysis indicates that the template has rotated clockwise by 0.02 radians, then the translation components of the planar visual error vector are $(355 - 320,\ 228 - 240) = (35, -12)$ pixels and the rotation component has a magnitude of 0.02 radians.
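The sliding-window search and the translation part of the error vector can be sketched in pure Python as follows. This is a minimal illustration, not the patented implementation: it assumes the zero-mean variant of NCC, represents images as nested lists, and omits the rotation estimate from the gradient direction histogram. All function names are hypothetical.

```python
import math


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-size 2-D lists."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mt) ** 2 for b in t))
    return num / den if den > 0 else 0.0  # flat patches carry no information


def match_template(frame, template):
    """Slide the template over the frame; return (score, center_u, center_v)."""
    fh, fw = len(frame), len(frame[0])
    th, tw = len(template), len(template[0])
    best = (-2.0, 0, 0)
    for v in range(fh - th + 1):
        for u in range(fw - tw + 1):
            patch = [row[u:u + tw] for row in frame[v:v + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, u + tw // 2, v + th // 2)
    return best


def translation_error(match_center, image_center):
    """Pixel offset of the matched template center from the image center."""
    return (match_center[0] - image_center[0], match_center[1] - image_center[1])
```

For the numbers in the example, `translation_error((355, 228), (320, 240))` gives `(35, -12)`. A production system would more likely use an optimized library routine for the search; the nested loops here only make the computation explicit.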
[0068] The robotic arm servo subunit of the motion control module M400 responds to the planar visual error vector $e$, which the perception and analysis module M300 pushes at high frequency (30 Hz) via IPC shared memory, and executes the dimension-reduced visual servo control law:
[0069] $$v = -\lambda J^{-1} e$$
[0070] where $J^{-1}$ is the inverse of the reduced-dimension image Jacobian matrix, calibrated offline in advance. It describes the inverse mapping between unit linear velocity or unit angular velocity of the robot end effector in the XY plane of the local base coordinate system and the rate of change of pixels in the image plane at the current working distance. $\lambda$ is the proportional convergence gain coefficient.
[0071] Through this matrix multiplication, the error in pixel space is resolved into a fine-tuning speed command in physical space:
[0072] $$v = [v_x, v_y, \omega_z]^T$$
[0073] When generating the final motion command, the motion control module M400 forces the depth-direction velocity $v_z$ and the pitch / yaw rates $\omega_x$, $\omega_y$ to zero, producing a constrained dimension-reduced speed command. The command is sent to the robot's underlying controller via the industrial bus, driving the robotic arm to perform compensatory movements only within a spatial plane parallel to the panel. This "sensing-computing-control" cycle repeats at a frequency of tens of hertz until the norm of the error vector $\lVert e \rVert$ converges to within the preset dead zone of 2.0 pixels, at which point the servo process ends and the robotic arm brakes on all axes.
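The servo loop can be illustrated with a small simulation. The sketch assumes the common proportional form of the law, in which the planar velocity equals the negative gain times the inverse Jacobian times the error; the sign convention, the gain of 2.0, the diagonal Jacobian `JAC` (4 pixels per millimetre), and the simple plant model are all hypothetical values chosen for illustration. As in the description, the norm is taken over the whole error vector, so pixels and radians are mixed.

```python
import math

# Hypothetical calibration: 4 px of image motion per mm of planar motion.
JAC = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 1.0]]
J_INV = [[0.25, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 1.0]]


def mat_vec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def servo_command(j_inv, error, gain):
    """Planar velocity = -gain * J^-1 * error, embedded in a 6-DOF twist."""
    vx, vy, wz = (-gain * c for c in mat_vec(j_inv, error))
    # Twist layout [vx, vy, vz, wx, wy, wz]; depth and pitch/yaw forced to zero.
    return [vx, vy, 0.0, 0.0, 0.0, wz]


def run_servo(error, jac, j_inv, gain=2.0, rate_hz=30.0, dead_zone=2.0,
              max_iters=1000):
    """Iterate until the error norm falls inside the dead zone."""
    dt = 1.0 / rate_hz
    e = list(error)
    for i in range(max_iters):
        if math.sqrt(sum(c * c for c in e)) <= dead_zone:
            return e, i
        twist = servo_command(j_inv, e, gain)
        planar = [twist[0], twist[1], twist[5]]
        # Simulated plant: the image error changes at the rate J * v.
        rate = mat_vec(jac, planar)
        e = [a + b * dt for a, b in zip(e, rate)]
    raise RuntimeError("visual servo did not converge")
```

Starting from the example error `[35, -12, 0.02]`, each cycle shrinks the error by the factor 1 - gain / rate_hz, so the loop ends once the norm drops to 2.0 or below.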
[0074] S400: After the end effector is aligned with the target component, perform the first state verification.
[0075] In this embodiment, it is confirmed that the initial state of the target component to be operated on before the physical operation is consistent with the baseline state expected by the system, so as to prevent erroneous logical operations.
[0076] After the dimension-reduced visual servoing is completed, the instruction arbitration module M100 sends an execution state verification instruction to the perception and analysis module M300. The perception and analysis module M300 extracts a baseline state descriptor associated with the target component from the feature database module M200. This descriptor contains the expected pre-operation baseline state (e.g., "OFF") and the corresponding pre-operation state determination threshold Th. The threshold Th is generated dynamically through brightness statistical analysis of multiple frames of background images of the target component in its inactive state, and is calculated as $Th = \mu_{bg} + 3\sigma_{bg}$, where $\mu_{bg}$ and $\sigma_{bg}$ are the average brightness and the standard deviation, respectively, of the background image within the target ROI.
[0077] The perception and analysis module M300 acquires a pre-operation image in the current alignment state and extracts the ROI of the target component. For this ROI, the perception and analysis module M300 performs the same feature extraction algorithm as when generating the threshold, such as calculating the average pixel brightness value. The system then compares the extracted pre-operation state feature value $F_{pre}$ with the stored Th. If the expected baseline state is "OFF", the comparison logic is to determine whether $F_{pre} < Th$ holds. If it holds, the current state is determined to be "OFF" and the verification passes. If not, the task is aborted and a PRE_STATE_VALIDATION_FAILED error is reported.
[0078] For example, the indicator light of the emergency stop button should be off in standby mode. For its ROI in the off state, the system measures an average background brightness $\mu_{bg}$ = 20.0 (grayscale value, 0-255) and a standard deviation $\sigma_{bg}$ = 1.5. The system calculates and stores the determination threshold as Th = 20.0 + 3 * 1.5 = 24.5. In one task, when the first state verification is performed, the perception and analysis module M300 acquires the pre-operation image and calculates the average brightness of the current ROI as $F_{pre}$ = 18.7. The system performs the comparison: 18.7 < 24.5, so the condition is met. The system therefore determines the current state to be "OFF", the verification passes, and M100 allows the subsequent physical operation.
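The threshold calculation and the pre-operation check can be sketched as follows. The function names and the exception class are hypothetical; the multiplier of 3 is taken from the worked example (20.0 + 3 * 1.5 = 24.5), and the sketch covers only the "OFF" baseline case that the description spells out.

```python
class StateValidationError(Exception):
    """Hypothetical exception carrying a validation error code."""


def roi_mean(roi):
    """Average pixel brightness of an ROI given as a 2-D list of gray values."""
    pixels = [p for row in roi for p in row]
    return sum(pixels) / len(pixels)


def determination_threshold(mu_bg, sigma_bg, k=3.0):
    """Th = background mean + k * background standard deviation."""
    return mu_bg + k * sigma_bg


def first_state_verification(feature_pre, th, expected="OFF"):
    """Pass only if the measured brightness is consistent with the baseline."""
    if expected != "OFF":
        raise NotImplementedError("only the OFF baseline is sketched here")
    if feature_pre < th:
        return "OFF"
    raise StateValidationError("PRE_STATE_VALIDATION_FAILED")
```

With the example values, `determination_threshold(20.0, 1.5)` gives 24.5, and `first_state_verification(18.7, 24.5)` returns `"OFF"`.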
[0079] S500: After the first state verification is passed, control the end effector to perform physical operations on the target component.
[0080] In this embodiment, after the initial conditions are confirmed to meet expectations in the first state verification of S400, the state machine of the instruction arbitration module M100 transitions to the physical execution stage and sends the encapsulated underlying execution code to the end effector control module M500 located on the robot's end flange according to the operation type defined in the current task instruction (such as "PRESS" or "TURN").
[0081] The M500 end effector control module is an independent embedded system driven by a microcontroller (MCU). Its firmware contains a fixed hardware timing generator for different operation types to decouple complex electromechanical coordination from upper-level logic scheduling.
[0082] When the end effector control module M500 receives a "press" command code (0x01) from the command arbitration module M100 via its CAN bus interface, its internal firmware timing generator is triggered, automatically executing a series of preset hardware drive sequences. These sequences include: outputting a PWM signal to the H-bridge circuit of the electric cylinder 42 connected to the driving positioning contact rod 43 to control its extension at a preset speed; simultaneously, continuously monitoring the I / O pin status of the microswitch or pressure sensor linked to the contact rod; after detecting a sensor status change (confirming physical contact), continuing to drive a short overtravel to ensure the button is fully pressed, and delaying for 500 milliseconds (ms); after the delay, outputting a reverse PWM signal to control the electric cylinder to return to its original position; and after monitoring the reset limit switch status, sending back a confirmation code indicating "[physical action completed]" to the command arbitration module M100.
[0083] For example, after successful S400 verification, the command arbitration module M100 sends a CAN message containing action code 0x01 to the end effector control module M500. The STM32 microcontroller of the end effector control module M500 sends commands to the motor driver chip via the SPI bus, controlling the electric cylinder to extend at a speed of 10 mm / s. During extension, the MCU continuously reads the voltage value of the force-sensitive resistor connected to the end of the contact rod via the ADC channel. When this voltage value exceeds the threshold equivalent to a 5 Newton pressing force, the MCU determines that contact has been successfully made and the button has been pressed. The MCU maintains this pressure for 0.5 seconds, then controls the electric cylinder to move in the reverse direction until it touches the reset limit switch. Finally, the end effector control module M500 sends a status code indicating successful execution of the physical action back to the command arbitration module M100.
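The fixed press sequence (extend until contact, hold, retract to the limit switch, confirm) can be sketched as a small simulation. Everything here is a stand-in: `SimulatedCylinder`, its linear force model of 10 N per millimetre, the contact position, the timeout, and the returned status strings are hypothetical, while the 10 mm/s speed, 5 N threshold, and 0.5 s hold come from the example. Real firmware would run on the MCU and read the actual ADC and limit switch.

```python
class SimulatedCylinder:
    """Hypothetical stand-in for the electric cylinder and its sensors."""

    def __init__(self, contact_at_mm=12.0):
        self.position_mm = 0.0
        self.contact_at_mm = contact_at_mm

    def step(self, speed_mm_s, dt_s):
        # Position is clamped at the fully retracted limit (0 mm).
        self.position_mm = max(0.0, self.position_mm + speed_mm_s * dt_s)

    def force_n(self):
        # Assumed model: force rises by 10 N per mm once the rod touches.
        return max(0.0, (self.position_mm - self.contact_at_mm) * 10.0)

    def at_reset_limit(self):
        return self.position_mm <= 0.0


def press_sequence(cyl, speed=10.0, force_threshold=5.0, hold_s=0.5,
                   dt=0.01, timeout_s=5.0, sleep=lambda seconds: None):
    """Run the press sequence; return (status, trace of completed phases)."""
    trace = []
    elapsed = 0.0
    # Phase 1: extend until the force sensor confirms contact.
    while cyl.force_n() < force_threshold:
        if elapsed >= timeout_s:
            return "TIMEOUT", trace
        cyl.step(speed, dt)
        elapsed += dt
    trace.append("CONTACT")
    # Phase 2: hold the button down (sleep is injectable for testing).
    sleep(hold_s)
    trace.append("HOLD")
    # Phase 3: retract until the reset limit switch is reached.
    while not cyl.at_reset_limit():
        cyl.step(-speed, dt)
    trace.append("RETRACTED")
    return "PHYSICAL_ACTION_COMPLETED", trace
```

Passing `time.sleep` as the `sleep` argument would make the hold phase take real time; the default no-op keeps the sketch fast to run.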
[0084] S600: Perform second-state verification after the physical operation is executed.
[0085] In this embodiment, the purpose is to confirm whether the physical operation of S500 has achieved the expected logical effect in order to detect the problem of [action has been performed but the effect has not been achieved] caused by the target device itself.
[0086] This verification process is similar to the first state verification of S400, but the comparison logic is reversed. After the end effector control module M500 sends back the [physical action completed] confirmation code, the instruction arbitration module M100 immediately sends the execution state verification instruction to the perception and analysis module M300 again. The perception and analysis module M300 acquires a post-operation image, extracts the ROI of the target component, and executes the same feature extraction algorithm as in the first state verification to obtain the post-operation state feature value $F_{post}$.
[0087] The system compares $F_{post}$ with the same determination threshold Th. Here the comparison logic is reversed according to the expected post-operation state. If the expected effect of operating a start button is a change from "OFF" to "ON", the success condition for the second state verification is that $F_{post} > Th$ holds. If it holds, the indicator light is considered lit and the task loop is successfully closed. If not, the task is considered to have failed and a POST_STATE_VALIDATION_FAILED error is reported.
[0088] For example, the threshold for the emergency stop button is Th = 24.5. After the robot completes the pressing action, the second state verification is performed. If the button works normally, its red LED indicator is lit. The perception and analysis module M300 acquires the post-operation image and calculates the average brightness of the ROI as $F_{post}$ = 188.2. The system then performs the comparison: 188.2 > 24.5, so the condition is met. The system determines that the state has successfully flipped to "ON", and the verification passes. Conversely, if the LED of the button is burnt out, then even if the internal circuit is connected after pressing, the ROI brightness remains low, for example $F_{post}$ = 19.0. In this case, 19.0 > 24.5 does not hold, so the system determines that the task has failed and issues an alarm to the firefighters indicating that the status indication function of the emergency stop button has failed.
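The reversed comparison of the second state verification can be sketched as a single function. The function name and the `"OK"` return value are hypothetical; the error code and the direction of each comparison follow the description.

```python
def second_state_verification(feature_post, th, expected="ON"):
    """Compare post-operation brightness with Th in the expected direction."""
    if expected == "ON":
        ok = feature_post > th   # indicator should now be lit
    elif expected == "OFF":
        ok = feature_post < th   # indicator should now be dark
    else:
        raise ValueError(f"unknown expected state: {expected!r}")
    return "OK" if ok else "POST_STATE_VALIDATION_FAILED"
```

For the two cases in the example, a brightness of 188.2 against Th = 24.5 passes, while 19.0 (the burnt-out LED) yields `POST_STATE_VALIDATION_FAILED`, which is the signal to raise an alarm.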
[0089] In addition, to address the problem that the status of some target components on the fire control panel (such as old-fashioned mechanical instrument pointers, low-resolution LCD screen digits, or indicator lights partially obscured by dust) is difficult to accurately identify using traditional OCR or thresholding methods, this embodiment can also improve the recognition accuracy of weakly blurred or partially obscured states and the reliability of operation verification by using the state twin and residual analysis network STRANet.
[0090] STRANet employs a dual-channel input and residual output architecture to perform refined state alignment and verification, rather than simple classification. Specifically, STRANet's macro-architecture consists of a shared feature extraction backbone and two parallel processing branches: the Expected State Generation Branch (ESGB) and the Real State Comparison Branch (RSCB), which ultimately converge into a residual analysis head.
[0091] The feature extraction backbone network adopts a lightweight convolutional neural network structure (such as MobileNetV3). Its function is to receive the input image ROI and convert it into a high-dimensional feature map, which contains rich spatial and texture information.
[0092] The Expected State Generation Branch (ESGB) is a conditional generative network, similar to a miniature conditional generative adversarial network (cGAN) generator. It takes two inputs: a feature map from the feature extraction backbone and a conditional vector representing the desired target state. For example, if the target operation is "turning the knob to position 2," the conditional vector is the one-hot encoding of "2." ESGB's role is to "render" or "predict" in the feature space the ideal feature map that the target region should present if the operation is successful, based on the current real image features.
[0093] The Real State Comparison Branch (RSCB) directly transmits the real feature maps extracted from the backbone network.
[0094] The residual analysis head receives the expected feature map from the ESGB and the real feature map from the RSCB, and performs an element-by-element difference operation on the two to generate a residual feature map. A fully connected network is then used to analyze the energy (e.g., L2 norm) or statistical distribution of the residual feature map.
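The difference-and-energy step of the residual analysis head can be illustrated with plain Python lists. This sketch assumes single-channel feature maps of equal shape and computes the L2 norm directly; the learned fully connected analysis described above is not modeled, and the function name is hypothetical.

```python
import math

def residual_energy(expected, real):
    """L2 norm of the element-wise difference between two feature maps.

    Both maps are 2-D lists of floats with identical shape.
    """
    if len(expected) != len(real):
        raise ValueError("feature maps must have the same number of rows")
    total = 0.0
    for e_row, r_row in zip(expected, real):
        if len(e_row) != len(r_row):
            raise ValueError("feature maps must have the same number of columns")
        for e, r in zip(e_row, r_row):
            total += (e - r) ** 2  # squared residual at this position
    return math.sqrt(total)

# Identical maps give zero energy; a (3, 4) offset gives the 3-4-5 norm.
print(residual_energy([[1.0, 2.0]], [[1.0, 2.0]]))  # 0.0
print(residual_energy([[0.0, 0.0]], [[3.0, 4.0]]))  # 5.0
```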
[0095] Accordingly, the execution logic of S400 (first state verification) and S600 (second state verification) is replaced and enhanced as follows:
[0096] S401:
[0097] After visual servo alignment, the system acquires the ROI image before operation. The instruction arbitration module M100 retrieves the expected pre-operation state from the feature database module M200; for example, for a knob, the expected initial state is "1". This state "1" is encoded as a conditional vector. Subsequently, the image ROI and the conditional vector "1" are simultaneously fed into STRANet, and the residual energy value calculated by the network is extracted. If this energy value is lower than a preset low threshold T_{match}, the state of the real image closely matches the expected "1" state at the feature level, and the first state verification is successful.
[0098] S601:
[0099] After the S500 physical operation (e.g., performing a 30-degree rotation) is completed, the system acquires a post-operation image ROI. The instruction arbitration module M100 obtains the expected post-operation target state from the task instruction, such as "2", and encodes it into a new conditional vector. The post-operation image ROI and the conditional vector "2" are simultaneously fed into STRANet, and the network recalculates the residual energy value. If this energy value is also below the threshold, the physical operation has successfully transitioned the target component to the desired "2" state, and the second state verification is passed. Conversely, if the residual energy value is higher than the threshold, it indicates a deviation (e.g., incomplete rotation or overshoot), and the task fails.
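The S401 / S601 control flow can be sketched as follows. Everything here is illustrative: `fake_stranet` is a stand-in that is NOT the real network (it treats the "ROI" as the true state label and returns a small or large energy), the state list and the PRE_STATE_VALIDATION_FAILED code are hypothetical, and the numeric values are borrowed from the ammeter example in this description.

```python
STATES = ["1", "2", "3"]  # assumed knob positions

def one_hot(state, states=STATES):
    """Encode an expected state as a one-hot conditional vector."""
    return [1.0 if s == state else 0.0 for s in states]

def fake_stranet(roi_state, cond):
    """Stand-in for STRANet: low energy when the condition matches the ROI."""
    return 0.02 if cond[STATES.index(roi_state)] == 1.0 else 0.85

def verify(roi, expected_state, t_match, energy_fn=fake_stranet):
    """Pass when the residual energy is below the matching threshold."""
    return energy_fn(roi, one_hot(expected_state)) < t_match

def run_task(roi_before, roi_after, pre_state, post_state, t_match=0.1):
    # S401: the pre-operation image must match the expected initial state.
    if not verify(roi_before, pre_state, t_match):
        return "PRE_STATE_VALIDATION_FAILED"  # hypothetical error code
    # S500: the physical operation would be executed here.
    # S601: the post-operation image must match the target state.
    if not verify(roi_after, post_state, t_match):
        return "POST_STATE_VALIDATION_FAILED"
    return "OK"

print(run_task("1", "2", pre_state="1", post_state="2"))  # knob reached "2"
print(run_task("1", "1", pre_state="1", post_state="2"))  # knob did not move
```

The same conditional-vector mechanism serves both checks; only the expected state changes between S401 and S601.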
[0100] For example, consider verifying the pointer reading of an old-fashioned mechanical ammeter. The dial glass has scratches, and there is glare from ambient light. The task is to confirm that after an operation, the pointer moves from 0A to 25A. Traditional methods that use OCR to identify the numbers near the pointer, or a Hough transform to detect the pointer as a straight line, are easily affected by scratches and glare, leading to failure. Here, the first state verification (S401) is performed: the image before the operation and the conditional vector "0A" are fed into STRANet. ESGB generates an ideal feature map of "pointer at 0A" based on the current image (including scratches and glare). RSCB passes on the real feature map. The two are subtracted; the residual energy is very small (e.g., 0.02), below the threshold T_{match} (e.g., 0.1), and the verification passes. The second state verification (S601) is then performed: after the operation, the new image and the conditional vector "25A" are fed into the network. ESGB generates an ideal feature map of "pointer at 25A" based on the new image (the positions of scratches and glare may have changed slightly). At this point, if the physical operation is successful, the pointer actually points to 25A, and the real feature map from RSCB is highly similar to the expected feature map generated by ESGB, so the residual energy is still very small (e.g., 0.03), indicating successful verification. If the operation fails and the pointer remains at 0A, the real feature map differs greatly from the expected feature map for 25A, and the residual energy spikes (e.g., 0.85), far exceeding the threshold, indicating task failure.
[0101] Example 2
[0102] This embodiment provides a fire-fighting duty robot control system based on AI vision, which, in terms of physical structure and software architecture, is designed to fully support the method described in Example 1. Referring to Figure 3, the system includes an instruction arbitration module M100, a motion control module M400, a perception and analysis module M300, a feature database module M200, and an end effector control module M500.
[0103] The M100 instruction arbitration module can be an industrial embedded motherboard deployed within a robot control cabinet, equipped with dual gigabit Ethernet ports and a physical RS232 serial port; for example, it could be a computing node based on the ARM Cortex-A architecture. Internally, this module runs an event-driven finite state machine responsible for managing the entire lifecycle from instruction reception to task completion. Its key sub-components include a topology parser loader, which reads and loads the environment variable PLATFORM_TYPE (SLIDE_RAIL or FIXED_BASE) describing the robot's physical deployment and the maximum effective working radius R_max of the robotic arm from the firmware configuration area during system startup. The M100 instruction arbitration module also includes an interrupt arbitrator, which, based on the Linux epoll I / O multiplexing model, assigns a real-time scheduling policy and the highest interrupt weight to the serial port file descriptor connected to the field operation terminal, thereby enabling physical preemption of remote instructions. This module distributes parsed and arbitrated structured task instructions to other modules via an internal ZeroMQ message queue.
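The priority ordering behind preemption can be illustrated with a small in-memory queue. This is only a sketch of the arbitration policy: it does not use epoll, serial ports, or ZeroMQ, and the class name, the `brake_requested` flag, and the example instructions are hypothetical.

```python
import heapq
import itertools

FIELD, REMOTE = 0, 1  # lower value = higher priority

class Arbitrator:
    """Orders instructions so that field commands always run before remote ones."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order per source
        self.brake_requested = False

    def submit(self, source, instruction, motion_active=False):
        # A field command arriving during motion requests an immediate brake.
        if source == FIELD and motion_active:
            self.brake_requested = True
        heapq.heappush(self._heap, (source, next(self._seq), instruction))

    def next_instruction(self):
        """Pop the highest-priority pending instruction, or None if idle."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

arb = Arbitrator()
arb.submit(REMOTE, {"action": "PRESS", "target_id": "A"})
arb.submit(FIELD, {"action": "TURN", "target_id": "B"}, motion_active=True)
print(arb.brake_requested)                   # True
print(arb.next_instruction()["target_id"])   # B
```

The field command is served first even though it was submitted later, which is the behavior the interrupt arbitrator is described as providing.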
[0104] To ensure hard real-time performance in motion control, the M400 motion control module runs on a Linux kernel patched with PREEMPT_RT (RT-Preempt), or it can be deployed directly on a standalone real-time coprocessor. Based on the topology flags passed from the instruction arbitration module M100, this module either activates its internal slide rail drive subunit or puts it into sleep mode. This subunit receives one-dimensional macroscopic displacement coordinates and performs trapezoidal or S-shaped velocity curve planning to generate pulse commands for controlling the linear servo motor. The M400 also includes a robotic arm servo subunit. This subunit receives coarse positioning coordinates in a strict local base coordinate system and performs inverse kinematics. During the visual servoing phase, it converts the visual error vector into fine-tuning speed compensation for the six-axis joints using a dimension-reduced Jacobian control law. The M400 sends process data objects (PDOs) to the physical servo drives via an industrial fieldbus (such as EtherCAT) at short cycle times.
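As an illustration of the trapezoidal velocity planning mentioned for the slide rail subunit, the sketch below computes a rest-to-rest profile for a one-dimensional move. The function names, units (metres and seconds), and the symmetric acceleration limit are assumptions; pulse generation and the S-shaped variant are not covered.

```python
import math

def trapezoid_profile(distance, v_max, a_max):
    """Return (t_acc, t_cruise, v_peak) for a rest-to-rest move.

    Falls back to a triangular profile when the move is too short
    to reach v_max.
    """
    if distance <= 0 or v_max <= 0 or a_max <= 0:
        raise ValueError("distance, v_max and a_max must be positive")
    d_acc = v_max ** 2 / (2 * a_max)  # distance needed to reach v_max
    if 2 * d_acc >= distance:
        v_peak = math.sqrt(distance * a_max)  # triangular case
        return v_peak / a_max, 0.0, v_peak
    return v_max / a_max, (distance - 2 * d_acc) / v_max, v_max

def velocity_at(t, t_acc, t_cruise, v_peak):
    """Commanded velocity at time t along the planned profile."""
    total = 2 * t_acc + t_cruise
    if t <= 0 or t >= total:
        return 0.0
    if t < t_acc:
        return v_peak * t / t_acc          # acceleration ramp
    if t <= t_acc + t_cruise:
        return v_peak                      # cruise
    return v_peak * (total - t) / t_acc    # deceleration ramp

# 1 m move, 0.5 m/s limit, 1 m/s^2 limit.
print(trapezoid_profile(1.0, 0.5, 1.0))  # (0.5, 1.5, 0.5)
```

Sampling `velocity_at` at the fieldbus cycle period would give the per-cycle speed setpoints in such a scheme.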
[0105] The M300 perception and analysis module relies on the GPU or NPU acceleration unit built into the edge computing gateway. Through a zero-copy image capture pipeline (e.g., utilizing V4L2 combined with DMA-BUF features), the M300 transfers camera sensor data directly to GPU memory without intermediate CPU copies. Internally, it includes a parallel NCC kernel function deployed on the GPU for high-speed calculation of cross-correlation responses during the template matching stage. Furthermore, the M300 includes a CUDA-based luminance statistics calculator for parallel pixel reduction and mean calculation of specified ROIs during the state verification stage. As a data producer, the M300 writes a timestamped planar visual error vector stream to the M400 at a high frequency (e.g., 30Hz) in a non-blocking manner through a shared memory region used for inter-process communication (IPC).
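The template matching step that produces the planar visual error vector can be sketched on the CPU with plain Python lists. This is a slow reference illustration, not the GPU kernel described above: it uses the zero-mean variant of normalized cross-correlation, and the function names and the choice of the image centre as the alignment reference are assumptions.

```python
import math

def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equally sized 2-D lists."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mt) ** 2 for b in t))
    return num / den if den > 0 else 0.0  # flat patches carry no information

def match_template(image, template):
    """Return (row, col, score) of the best-matching top-left corner."""
    th, tw = len(template), len(template[0])
    best = (0, 0, -2.0)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best[2]:
                best = (r, c, score)
    return best

def planar_error(image, template):
    """Offset in pixels (x, y) from the image centre to the match centre."""
    r, c, _ = match_template(image, template)
    th, tw = len(template), len(template[0])
    return (c + tw / 2) - len(image[0]) / 2, (r + th / 2) - len(image) / 2
```

For a 5x5 image with a 2x2 template located at row 3, column 1, `planar_error` returns (-0.5, 1.5), meaning the target sits slightly left of and below the image centre.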
[0106] The feature database module M200 is physically mapped to the solid-state storage chip of the edge gateway. It employs a transactional embedded database engine (such as SQLite) to ensure atomicity, consistency, isolation, and durability (ACID properties) under conditions such as abnormal power outages. Its core data structure is a composite location topology table. This table is designed for broad compatibility, and its fields include Target_ID (primary key), Macro_Offset_X (a nullable floating-point number used for track coordinates), Arm_X_Local, Arm_Y_Local, Arm_Z_Local (floating-point numbers used for local coordinates), Template_Blob (a binary large object used to store template images), and Th_Calib (a floating-point number used to store dynamic calibration thresholds). This table structure allows for unified storage and retrieval of scene data for both fixed and track-based robots.
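A possible shape for the composite location topology table, using Python's built-in sqlite3 module, is sketched below. The column names come from the description; the table name `location_topology`, the NOT NULL constraints, the helper function names, and the sample row are assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS location_topology (
    Target_ID      TEXT PRIMARY KEY,
    Macro_Offset_X REAL,            -- NULL for fixed-base deployments
    Arm_X_Local    REAL NOT NULL,
    Arm_Y_Local    REAL NOT NULL,
    Arm_Z_Local    REAL NOT NULL,
    Template_Blob  BLOB,
    Th_Calib       REAL
)
"""

def open_db(path=":memory:"):
    """Open (or create) the feature database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def upsert_target(conn, row):
    """Insert or replace one target row inside a transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT OR REPLACE INTO location_topology VALUES (?, ?, ?, ?, ?, ?, ?)",
            row,
        )

def lookup(conn, target_id):
    """Return (macro_x, x, y, z, th) for a target, or None if unknown."""
    cur = conn.execute(
        "SELECT Macro_Offset_X, Arm_X_Local, Arm_Y_Local, Arm_Z_Local, Th_Calib "
        "FROM location_topology WHERE Target_ID = ?",
        (target_id,),
    )
    return cur.fetchone()

conn = open_db()
# Hypothetical fixed-base target: no track coordinate, so Macro_Offset_X is NULL.
upsert_target(conn, ("ESTOP_01", None, 0.30, 0.10, 0.25, b"\x00", 24.5))
print(lookup(conn, "ESTOP_01"))
```

A NULL Macro_Offset_X lets the same row format serve a FIXED_BASE deployment, while a SLIDE_RAIL deployment would store the track coordinate there.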
[0107] The M500 end effector control module is typically a highly integrated, custom-designed MCU driver board mounted on the robot's end effector flange. Its firmware includes hardware timing generators for different operations, automatically translating simple abstract instructions (such as PRESS and TURN) from the upper-level M100 into a series of complex, microsecond-precise motor drive and sensor readout timings. This module communicates with the upper-level system via a low-speed, highly reliable CAN bus routed through the robot arm's internal cavity.
[0108] Specifically, this system is deployed to operate a rotary selector switch with multiple graduations (e.g., 0, 1, 2, 3). In this scenario, the system's workflow is basically the same as in Example 1, but different feature extraction and judgment logic is used in the state verification stage.
[0109] For example, suppose the task instruction is {"action": "TURN", "target_id": "SELECTOR_SW_01", "param": "2"}, which requires rotating the selector switch from its current position to the "2" position.
[0110] During the initialization phase, the system not only stores the visual template of the switch, but also trains and stores a small optical character recognition (OCR) model or template for each gear's scale number ("0", "1", "2", "3").
[0111] During the first state verification phase of S400, the perception and analysis module M300 captures the pre-operation image after aligning with the switch. At this time, it no longer calculates the average brightness value, but instead calls the OCR engine to perform character recognition on the scale area currently pointed to by the switch pointer. Assuming the recognition result is "1", which is consistent with the current state of the switch recorded by the system (or matches the expected initial state), the first state verification is successful.
[0112] During the S500 physical operation phase, the instruction arbitration module M100 sends a "TURN" command to the end effector control module M500, along with calculated rotation angle parameters (e.g., a 30-degree clockwise rotation is required to go from "1" to "2"). The firmware of the end effector control module M500 drives the end effector's gripper to clamp the switch knob, then controls the rotary motor to precisely execute a 30-degree rotation, and finally releases the gripper.
[0113] In the second state verification phase of S600, the perception and analysis module M300 recaptures the post-operation image and invokes the OCR engine. Ideally, the recognition result should be "2". The system compares this result with the target parameter "2" in the instruction. If they match, the second state verification passes, and the task is successful. If the recognition result is still "1" (possibly due to mechanical slippage causing rotation failure) or "3" (possibly due to overshoot causing excessive rotation), the task is deemed a failure and an alarm is triggered.
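The selector switch workflow reduces to two small calculations: the rotation angle for the TURN command and the interpretation of the OCR result afterwards. The sketch below assumes evenly spaced positions 30 degrees apart (generalizing the single "1" to "2" example), treats positive angles as clockwise, and uses hypothetical outcome labels; the OCR engine itself is not modeled.

```python
POSITIONS = ["0", "1", "2", "3"]
STEP_DEG = 30.0  # assumed uniform spacing between adjacent positions

def rotation_angle(current, target):
    """Signed rotation in degrees (positive = clockwise, by assumption)."""
    return (POSITIONS.index(target) - POSITIONS.index(current)) * STEP_DEG

def classify_result(recognized, start, target):
    """Interpret the post-operation OCR reading against the commanded move."""
    if recognized == target:
        return "OK"
    if recognized not in POSITIONS:
        return "UNREADABLE"
    moved = POSITIONS.index(recognized) - POSITIONS.index(start)
    wanted = POSITIONS.index(target) - POSITIONS.index(start)
    if moved == 0:
        return "NO_ROTATION"      # e.g. mechanical slippage
    if moved * wanted > 0 and abs(moved) > abs(wanted):
        return "OVERSHOOT"        # rotated past the target
    return "WRONG_POSITION"

instruction = {"action": "TURN", "target_id": "SELECTOR_SW_01", "param": "2"}
print(rotation_angle("1", instruction["param"]))        # 30.0
print(classify_result("3", "1", instruction["param"]))  # OVERSHOOT
```

Distinguishing the failure modes is optional for pass/fail purposes, but it gives the alarm a more specific cause than a bare mismatch.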
[0114] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods according to the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk) and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, or network device, etc.) to execute the methods described in the various embodiments of the present invention.
[0115] It should be noted that the above modules can be implemented by software or hardware. For the latter, they can be implemented in the following ways, but are not limited to: all the above modules are located in the same processor; or, the above modules are located in different processors in any combination.
[0116] Embodiments of the present invention also provide a computer-readable storage medium storing a computer program, wherein the computer program is configured to perform the steps in any of the above method embodiments when executed.
[0117] In one exemplary embodiment, the aforementioned computer-readable storage medium may include, but is not limited to, various media capable of storing computer programs, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), portable hard disk, magnetic disk, or optical disk.
[0118] Embodiments of the present invention also provide an electronic device including a memory and a processor, the memory storing a computer program and the processor being configured to run the computer program to perform the steps in any of the above method embodiments.
[0119] In one exemplary embodiment, the electronic device may further include a transmission device and an input / output device, wherein the transmission device is connected to the processor and the input / output device is connected to the processor.
[0120] Through the above description of the embodiments, those skilled in the art can clearly understand that, for the sake of convenience and brevity, only the division of the above functional modules is used as an example. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
[0121] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of modules or units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another apparatus, or some features may be ignored or not executed. Furthermore, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or other forms.
[0122] The units described as separate components may or may not be physically separate. A component shown as a unit can be one or more physical units; that is, it can be located in one place or distributed in multiple different locations. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0123] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0124] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium. Based on this understanding, the technical solutions of the embodiments of this application, in essence, or the parts that contribute to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. This software product is stored in a storage medium and includes several instructions to cause a device (which may be a microcontroller, chip, etc.) or processor to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0125] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any changes or substitutions within the technical scope disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A fire-fighting duty robot control method based on AI vision, characterized in that it comprises: obtaining a first instruction and executing preemptive instruction arbitration based on the first instruction, wherein the first instruction carries a target component identifier; performing coarse spatial positioning based on the deployment topology of the fire-fighting duty robot; after the coarse spatial positioning is completed, performing dimension-reduced visual servoing based on the visual template of the target component to align the end effector of the fire-fighting duty robot with the target component; after the end effector is aligned with the target component, performing a first state verification; after the first state verification is passed, controlling the end effector to perform a physical operation on the target component; and after the physical operation is performed, performing a second state verification.
2. The method according to claim 1, characterized in that executing preemptive instruction arbitration according to the first instruction comprises: concurrently monitoring an on-site command channel and a remote command channel; and when the on-site command channel receives an on-site command, assigning the on-site command the highest priority and broadcasting a hardware braking signal to the motion control bus to interrupt the physical motion being executed.
3. The method according to claim 1, characterized in that, before performing coarse spatial positioning, the method further comprises: obtaining the local three-dimensional coordinates corresponding to the target component identifier; performing a workspace safety check on the local three-dimensional coordinates based on the preset maximum effective working radius of the fire-fighting duty robot; and performing coarse spatial positioning when the workspace safety check passes.
4. The method according to claim 1, characterized in that performing dimension-reduced visual servoing comprises: acquiring real-time image frames; obtaining, through template matching in the real-time image frame, the planar visual error vector between the visual template and the real-time image frame; and converting the planar visual error vector into a fine-tuning speed command based on the inverse of a preset first matrix and executing the fine-tuning speed command, wherein the fine-tuning speed command is constrained within the two-dimensional plane of the local base coordinate system.
5. The method according to claim 1, characterized in that performing the first state verification comprises: obtaining the pre-operation baseline state and the pre-operation state determination threshold of the target component; acquiring a pre-operation image and extracting a pre-operation state feature value from the pre-operation image; and comparing the pre-operation state feature value with the pre-operation state determination threshold to determine whether the current state of the target component is consistent with the pre-operation baseline state.
6. A fire-fighting duty robot control system based on AI vision, characterized in that it comprises: an instruction arbitration module, used to obtain a first instruction and perform preemptive instruction arbitration according to the first instruction, wherein the first instruction carries a target component identifier; a motion control module, used to perform coarse spatial positioning based on the deployment topology of the fire-fighting duty robot; and a perception and analysis module, used to perform dimension-reduced visual servoing based on the visual template of the target component after the coarse spatial positioning is completed, so as to align the end effector of the fire-fighting duty robot with the target component; wherein, after the end effector is aligned with the target component, a first state verification is performed; after the first state verification is passed, the end effector is controlled to perform a physical operation on the target component; and after the physical operation is performed, a second state verification is performed.
7. The system according to claim 6, characterized in that executing preemptive instruction arbitration according to the first instruction comprises: concurrently monitoring an on-site command channel and a remote command channel; and when the on-site command channel receives an on-site command, assigning the on-site command the highest priority and broadcasting a hardware braking signal to the motion control bus to interrupt the physical motion being executed.
8. The system according to claim 6, characterized in that the system is further configured to: before performing coarse spatial positioning, obtain the local three-dimensional coordinates corresponding to the target component identifier; perform a workspace safety check on the local three-dimensional coordinates based on the preset maximum effective working radius of the fire-fighting duty robot; and perform coarse spatial positioning when the workspace safety check passes.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, wherein the computer program is configured to perform the method described in any one of claims 1 to 5 when executed.
10. An electronic device comprising a memory and a processor, characterized in that the memory stores a computer program, and the processor is configured to run the computer program to perform the method described in any one of claims 1 to 5.