Robots and their autonomous motion execution methods, devices and computer program products
By using a collaborative architecture of pre-set scheduling services and behavior trees, efficient connection of robot action execution is achieved, solving the problem of separation between scheduling services and behavior tree control in traditional architectures. This improves the success rate, stability, and scalability of robot autonomous action execution and supports autonomous applications in multiple scenarios.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- UBTECH ROBOTICS CORP LTD
- Filing Date
- 2026-05-12
- Publication Date
- 2026-06-30
AI Technical Summary
In traditional robot autonomous action execution architecture, scheduling services and behavior tree control are separated, resulting in poor coordination, low efficiency in integrating functional modules, lack of state feedback, and insufficient scalability, making it difficult to meet the action requirements in complex scenarios.
A collaborative architecture of preset scheduling service and behavior tree is adopted. The preset scheduling service standardizes the information of each functional module of the robot, generates an action behavior tree, and introduces a node detection and reporting module to monitor the execution status in real time, so as to achieve efficient connection between action logic generation and execution.
It improves the real-time performance, accuracy, and stability of robot motion execution, reduces the cost of module interface adaptation, supports rapid adaptation to different robot models and rapid expansion of new functions, and meets the needs of batch applications and complex scenarios.
Figure CN122299657A_ABST
Abstract
Description
Technical Field
[0001] This application belongs to the field of robot control technology, and in particular relates to a robot and its autonomous motion execution method, device and computer program product.
Background Technology
[0002] Current autonomous robot action execution solutions mostly rely on a single scheduling module or a simple logic control process, aiming to enable the robot to complete specific actions, such as grasping, moving, and assembling, according to preset instructions. In traditional architectures, although behavior trees are introduced for control, there is a problem of separation between scheduling services and behavior tree control.
Summary of the Invention
[0003] This application provides a robot and its autonomous action execution method, device, and computer program product, which can solve the problem of separation between scheduling service and behavior tree control in traditional architectures.
[0004] In a first aspect, embodiments of this application provide a method for autonomous robot action execution, including: The large model is triggered to call the robot's preset database, and combined with the scene requirement information, the action logic description text is generated and sent to the preset scheduling service. The preset database includes standardized description information of each first functional module that has been connected to the robot. The preset scheduling service is triggered to generate an action behavior tree based on the action logic description text and the preset database, and the action behavior tree is sent to the behavior tree. The behavior tree is a model used to schedule the behavior of the robot, and the action behavior tree is a behavior tree instance used to complete the task corresponding to the scene requirement information. The behavior tree is triggered to call the corresponding nodes to control the robot to perform actions based on the action behavior tree.
[0005] In this embodiment, the large model is triggered to call the robot's preset database and, combined with the scene requirement information, generate the action logic description text. The preset scheduling service is then triggered to generate an action behavior tree based on the action logic description text and the preset database. The behavior tree, based on the action behavior tree, calls the corresponding nodes to control the robot to execute actions, thus enabling the robot to perform autonomous actions. Through the collaborative architecture of the preset scheduling service and the nodes of the behavior tree, this solution breaks the limitation of the separation of scheduling and control in traditional robot action execution, achieves an efficient connection from action logic generation to execution, and solves the problem of separation between scheduling service and behavior tree control in traditional architectures.
[0006] In some embodiments of the first aspect, triggering the preset scheduling service to generate an action behavior tree based on the action logic description text and the preset database includes: The preset scheduling service is triggered to match the corresponding second functional modules from the preset database based on the action logic description text, and to convert the second functional modules into functional units that the behavior tree can recognize, and to combine them in the order of action logic to form the action behavior tree.
[0007] In some embodiments of the first aspect, triggering the preset scheduling service to match the corresponding second functional modules from the preset database based on the action logic description text includes: The preset scheduling service is triggered to parse the action steps and functional requirements in the action logic description text, and to match the corresponding second functional modules from the preset database.
[0008] In some embodiments of the first aspect, triggering the large model to call the robot's preset database and, combined with the scene requirement information, generate the action logic description text includes: The large model is triggered to call the preset database and, combined with the scene requirement information, analyze the action target, the required third functional modules and the action logic to generate the action logic description text.
[0009] In some embodiments of the first aspect, before the triggering of the large model to call the robot's preset database, combine it with scene requirement information, and generate action logic description text, the method further includes: The preset scheduling service is triggered to scan each of the first functional modules that the robot has connected to, and collects the description information of each first functional module. The description information of each first functional module is standardized to obtain the standardized description information of each first functional module. The preset database is constructed based on the standardized information of each first functional module. The relevant information of a first functional module includes the functional description, interface parameters and execution capabilities of the first functional module.
[0010] In some embodiments of the first aspect, the robot autonomous action execution method further includes: During the execution of each node, the node detection and reporting module is triggered to monitor the execution status of each node; If there is a target node whose execution status is unsuccessful, the reason for the failure is collected and the preset scheduling service is triggered to send the reason for the failure to the large model. The unsuccessful status includes failure status or abnormal status. The large model is triggered to update the action logic description text based on the unsuccessful reason, combined with the preset database and the scenario requirement information, and then return to execute the step of sending the action logic description text to the preset scheduling service and subsequent steps until the execution status of the target node is successful, or the large model determines that the action cannot be completed by updating the action logic description text.
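The feedback loop in the paragraph above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: `execute_with_replanning`, `stub_planner`, `stub_executor`, and the `max_rounds` safety cap are hypothetical names introduced here, and the planner stub stands in for the large model.

```python
def execute_with_replanning(plan_fn, execute_fn, requirement, max_rounds=3):
    """Plan, execute, and replan on an unsuccessful status.

    plan_fn(requirement, failure_reason) returns the action logic description
    text, or None when the planner judges that updating the text cannot
    complete the action. execute_fn(plan) returns (ok, failure_reason).
    max_rounds is an added safety cap (assumption, not from the source).
    """
    reason = None
    for _ in range(max_rounds):
        plan = plan_fn(requirement, reason)
        if plan is None:              # planner gives up
            return False, reason
        ok, reason = execute_fn(plan)
        if ok:                        # target node finally succeeded
            return True, None
    return False, reason


def stub_planner(requirement, failure_reason):
    """Hypothetical stand-in for the large model."""
    if failure_reason is None:
        return "grasp at last known pose"
    if failure_reason == "object moved":
        return "re-detect then grasp"
    return None                       # unknown failure: cannot be fixed by replanning


def stub_executor(plan):
    """Hypothetical stand-in for behavior tree execution plus status reporting."""
    if plan.startswith("re-detect"):
        return True, None
    return False, "object moved"


result = execute_with_replanning(stub_planner, stub_executor, "grab the red square")
```

In this sketch the first plan fails with the reason "object moved", the planner updates the text, and the second round succeeds.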
[0011] In some embodiments of the first aspect, after triggering the node detection and reporting module to monitor the execution status of each node of the behavior tree, the method further includes: The text-to-speech module is triggered to convert the execution status into a voice broadcast.
[0012] In some embodiments of the first aspect, controlling the robot to perform actions through corresponding nodes includes: The object detection node calls the visual recognition unit to obtain the pose data of the target object and sends the pose data to the motion control service. The motion control service is triggered to calculate the grasping path of the robot's robotic arm based on the pose data, and the motion control grasping unit is invoked to control the robotic arm to move to the target position and perform the grasping action based on the grasping path.
[0013] In a second aspect, embodiments of this application provide a robot autonomous action execution device, comprising: The model triggering module is used to trigger the large model to call the robot's preset database, combine the scene requirement information, generate action logic description text, and send the action logic description text to the preset scheduling service. The preset database includes standardized description information of each first functional module that has been connected to the robot. The scheduling triggering module is used to trigger the preset scheduling service to generate an action behavior tree based on the action logic description text and the preset database, and send the action behavior tree to the behavior tree. The behavior tree is a model used to schedule the behavior of the robot, and the action behavior tree is a behavior tree instance used to complete the task corresponding to the scene requirement information. The behavior tree triggering module is used to trigger the behavior tree to call the corresponding nodes to control the robot to perform actions based on the action behavior tree.
[0014] In a third aspect, embodiments of this application provide a robot, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the robot performs the autonomous action execution method as described in any one of the first aspects above.
[0015] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing a computer program, which, when executed by a computer, implements the robot autonomous action execution method as described in any one of the first aspects above.
[0016] In a fifth aspect, embodiments of this application provide a computer program product, including a computer program, which, when run, causes the robot autonomous action execution method as described in any one of the first aspects above to be executed.
[0017] It is understood that the beneficial effects of the second to fifth aspects mentioned above can be found in the relevant descriptions in the first aspect mentioned above, and will not be repeated here.
Attached Figure Description
[0018] To more clearly illustrate the technical solutions in the embodiments of this application, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0019] Figure 1 is a schematic diagram of a traditional human-computer interaction large model process provided in the embodiments of this application;
Figure 2 is a schematic diagram of the improved human-computer interaction large model process provided in the embodiments of this application;
Figure 3 is a flowchart illustrating a robot autonomous action execution method provided in an embodiment of this application;
Figure 4 is another flowchart illustrating the robot autonomous action execution method provided in the embodiments of this application;
Figure 5 is another flowchart illustrating the robot autonomous action execution method provided in the embodiments of this application;
Figure 6 is a schematic diagram of the structure of the robot autonomous motion execution device provided in the embodiments of this application;
Figure 7 is a schematic diagram of the robot provided in the embodiments of this application.
Detailed Implementation
[0020] In the following description, specific details such as particular system architectures and techniques are set forth for illustrative purposes and not for limitation, in order to provide a thorough understanding of the embodiments of this application. However, those skilled in the art will understand that this application may also be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods have been omitted so as not to obscure the description of this application with unnecessary detail.
[0021] It should be understood that, when used in this application specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
[0022] It should also be understood that the term “and / or” as used in this application specification and the appended claims means any combination of one or more of the associated listed items and all possible combinations, and includes such combinations.
[0023] Furthermore, in the description of this application and the appended claims, the terms "first," "second," "third," etc., are used only to distinguish descriptions and should not be construed as indicating or implying relative importance.
[0024] References to "one embodiment" or "some embodiments" in this specification mean that one or more embodiments of this application include a specific feature, structure, or characteristic described in connection with that embodiment. Therefore, the phrases "in one embodiment," "in some embodiments," "in other embodiments," "in still other embodiments," etc., appearing in different parts of this specification do not necessarily refer to the same embodiment, but rather mean "one or more, but not all, embodiments," unless otherwise specifically emphasized.
[0025] Current autonomous robot motion execution solutions mostly rely on a single scheduling module or simple logic control flow, aiming to enable robots to complete specific actions, such as grasping, moving, and assembling, according to preset instructions. In traditional architectures, robot function invocation, motion planning, and execution control are often scattered across different modules, lacking a unified collaborative scheduling mechanism. For example, some solutions directly receive instructions and control robot actions through independent motion control services, with the large model only responsible for generating simple motion instructions and unable to interact with state information during execution in real time; other solutions, while introducing behavior trees for logic control, fail to integrate them with dedicated scheduling services, resulting in low efficiency in functional module integration and limited flexibility and scalability in motion execution.
[0026] In the traditional autonomous action execution architecture of robots (i.e., the traditional architecture), the robot includes a large model software development kit (SDK), system services, a vision SDK, and motion control services. System services include system scheduling services. The large model SDK includes an automatic speech recognition (ASR) module, a text-to-speech (TTS) module, and a large model. The large model can be a large language model (LLM). The vision SDK can be a vision SDK that provides six-dimensional pose estimation capabilities, enabling it to identify target objects and obtain their precise position and orientation in three-dimensional space. The core process of the traditional architecture typically includes: initializing and loading the large model SDK; analyzing scene requirements and generating action commands using the large model; the system scheduling service transmitting the action commands to the motion control service; and the motion control service controlling the robot to execute the actions. During this process, the interaction between functional modules largely relies on hard-coded interface calls, lacking standardized information integration and status feedback mechanisms.
[0027] Figure 1 is a schematic diagram of the traditional human-computer interaction large model process provided in this application embodiment, which implements object grasping based on the aforementioned traditional architecture. As shown in Figure 1, the object grasping process may include the following steps:
Step 101: The system scheduling service initializes the large model SDK and vision SDK.
[0028] Step 102: During the large model dialogue, the system scheduling service directly calls the large model SDK to trigger the large model SDK to generate text information.
[0029] Step 103: The system scheduling service obtains the pose data of the target object based on the text information.
[0030] Step 104: The system scheduling service sends the pose data to the motion control service to trigger the motion control service to perform a grasping action.
[0031] By analyzing the aforementioned traditional architecture and the traditional solution shown in Figure 1, the inventors found the following problems with the existing technology:
Poor coordination between scheduling and execution: The scheduling service is separated from the behavior tree control. The action execution instructions generated by the large model SDK need to go through multiple transformations before being transmitted to the execution layer. Moreover, the state information during the execution process cannot be fed back to the system scheduling service and the large model SDK in real time, resulting in delayed action adjustments and a tendency toward action execution deviations or failures. For example, when the robot's grasping action fails due to object position deviation, the system cannot quickly feed back the failure status to the large model SDK, requiring manual intervention to replan the action, which reduces the efficiency of autonomous execution.
[0032] Low efficiency in integrating functional modules: The calls to various functional modules of the robot lack standardized information carriers, and interface adaptation between different functional modules requires targeted development. When adding new functional modules or changing robot models, the interface logic needs to be readjusted, resulting in long system deployment cycles, high maintenance costs, and difficulty in meeting the needs of mass robot application scenarios.
[0033] Lack of status feedback mechanism: Traditional architectures lack a robust mechanism for detecting and reporting node execution status, making it impossible to monitor the execution status of each node in the behavior tree in real time. When a node malfunctions (such as hardware failure or instruction error), the system cannot quickly pinpoint the root cause of the problem and can only resolve it through a complete restart or manual troubleshooting, affecting the continuity and stability of the robot's actions.
[0034] Insufficient scalability and flexibility: Due to the lack of a standardized functional dictionary and modular architecture, traditional architectures struggle to adapt to changing action requirements in complex scenarios. For example, when a robot needs to perform two related actions, grasping and assembling, simultaneously, the action logic code must be rewritten. It cannot quickly combine existing modules to achieve new functions, thus limiting the robot's autonomous execution capabilities in multiple scenarios.
[0035] To address the aforementioned problems in the existing technology, this application proposes a robot autonomous action execution architecture based on preset scheduling services and behavior tree nodes. Based on this architecture, this application proposes the improved human-computer interaction large model process shown in Figure 2. The robot includes a large model SDK, a system scheduling service, a node scheduling service, and a motion control service. The system scheduling service includes a third-party scheduling service. The node scheduling service includes a preset scheduling service, a behavior tree, and a node detection and reporting module. The object grasping process may include the following steps:
Step 201: When a third-party scheduling service calls the large model SDK, the ASR module in the large model SDK converts the voice commands into text information and sends the text information to the large model.
[0036] During the object grasping process, third-party scheduling services can ensure normal communication between various functional modules in the robot based on the Robot Operating System (ROS) or a custom communication protocol.
[0037] Step 202: After receiving the text information, the large model analyzes whether to perform an action or engage in dialogue. If it is engaging in dialogue, it generates a response based on the text information and broadcasts the response through the TTS module. If it is performing an action, the large model calls the robot's preset database, combines it with the scene requirements information, generates an action logic description text, and returns the action logic description text to the third-party scheduling service.
[0038] Step 203: The third-party scheduling service calls the preset scheduling service through the scheduling service interface.
[0039] Here, the preset scheduling service refers to a scheduling service configured in advance to generate action behavior trees and to coordinate scheduling with the behavior tree.
[0040] Optionally, the aforementioned preset scheduling service can be a Model Context Protocol (MCP) scheduling service. Based on this, the scheduling service interface in step 203 can be an MCP interface.
[0041] Step 204: The preset scheduling service parses the action steps and functional requirements in the action logic description text, matches the corresponding functional modules from the preset database, and converts them into functional units that can be recognized by the behavior tree. The action behavior tree is then formed by combining them according to the action logic order and sent to the behavior tree scheduler.
[0042] After receiving the action behavior tree sent by the preset scheduling service, the behavior tree calls the corresponding nodes to perform a series of operations based on the action behavior tree, connecting the actions in the object grasping process.
[0043] In this embodiment, a preset database combination mechanism is used: the preset scheduling service integrates the information of each functional module of the robot into the preset database, and the matched modules are then converted into functional units that the behavior tree can recognize and combined into a structured action execution logic, which improves the standardization and reusability of action execution.
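Step 204 can be sketched as follows. The source does not specify the format of the action logic description text or of the preset database, so this sketch assumes one "capability: argument" step per line and a database keyed by capability; `ActionNode`, `SequenceNode`, `parse_steps`, and `build_action_behavior_tree` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ActionNode:
    """Functional unit the behavior tree can recognize: one module call."""
    module: str
    capability: str
    argument: str


@dataclass
class SequenceNode:
    """Holds children in execution order; the order encodes the action logic."""
    children: list = field(default_factory=list)


# Hypothetical preset database: capability -> connected module providing it.
PRESET_DATABASE = {
    "detect": "visual_recognition",
    "grasp": "motion_grasp",
    "speak": "voice_interaction",
}


def parse_steps(description_text: str) -> list:
    """Parse 'capability: argument' lines into (capability, argument) pairs."""
    steps = []
    for line in description_text.strip().splitlines():
        capability, _, argument = line.partition(":")
        steps.append((capability.strip(), argument.strip()))
    return steps


def build_action_behavior_tree(description_text: str, database: dict) -> SequenceNode:
    """Match each step to a module and combine the units in step order."""
    root = SequenceNode()
    for capability, argument in parse_steps(description_text):
        if capability not in database:
            raise LookupError(f"no connected module provides {capability!r}")
        root.children.append(ActionNode(database[capability], capability, argument))
    return root


# Assumed example text, as a large model might produce for a grasping task.
TEXT = """
detect: red square
grasp: red square
speak: done
"""
tree = build_action_behavior_tree(TEXT, PRESET_DATABASE)
```

Raising an error for an unmatched step is one possible design choice; it surfaces a plan that refers to a module the robot has not connected before anything is executed.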
[0044] Step 205: During the execution of each node in the behavior tree, the node detection and reporting module monitors the execution status of each node in real time.
[0045] This embodiment adds a node detection and reporting module, which can collect success, failure, and abnormal status information of each node in the behavior tree during execution and feed it back to the preset scheduling service and the large model, providing accurate data support for subsequent action adjustment and replanning.
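A node detection and reporting module of this kind might look like the sketch below. It is an illustration under assumptions: `Status`, `NodeReporter`, and the callback `on_unsuccessful` are hypothetical names, and treating a raised exception as the abnormal status is a choice made here, not something the source states.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    ABNORMAL = "abnormal"


class NodeReporter:
    """Records the execution status of each node and forwards unsuccessful ones."""

    def __init__(self, on_unsuccessful):
        self.records = []                        # every (name, status, reason)
        self._on_unsuccessful = on_unsuccessful  # e.g. relay to the scheduling service

    def run(self, name, action):
        """Execute one node's action and record how it ended."""
        try:
            if action():
                status, reason = Status.SUCCESS, ""
            else:
                status, reason = Status.FAILURE, "node returned False"
        except Exception as exc:                 # assumption: exception = abnormal status
            status, reason = Status.ABNORMAL, repr(exc)
        self.records.append((name, status, reason))
        if status is not Status.SUCCESS:
            self._on_unsuccessful(name, status, reason)
        return status


reported = []
reporter = NodeReporter(lambda *record: reported.append(record))
reporter.run("detect", lambda: True)
reporter.run("grasp", lambda: False)
reporter.run("place", lambda: 1 / 0)
```

Here `reported` ends up holding only the two unsuccessful nodes with their reasons, which is the information the preset scheduling service would pass on to the large model for replanning.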
[0046] Step 206: Based on the action behavior tree, the behavior tree triggers the motion control service to execute the grasping action.
[0047] It should be understood that when implementing object grasping based on the action behavior tree, the behavior tree not only needs to call the motion control grasping unit, but may also need to call other units (such as the visual recognition unit) to complete the entire object grasping process.
[0048] The improved human-computer interaction large model process shown in Figure 2, provided in the embodiments of this application, has the following advantages:
Collaborative scheduling optimization: This addresses the issue of separation between scheduling services and behavior tree execution control in traditional architectures. It constructs a node collaboration mechanism between the preset scheduling service and the behavior tree to achieve closed-loop management of large model instruction generation, scheduling distribution, action execution, and status feedback, thereby improving the real-time performance and accuracy of action execution.
[0049] Functional integration and standardization: A preset database is built, and the information of each functional module is standardized through the preset scheduling service. This enables rapid matching of different functional modules with the functional units of the behavior tree, reduces the cost of module interface adaptation, and improves the maintainability and scalability of the system.
[0050] Status feedback mechanism construction: Design a node detection and reporting module to collect the execution status of behavior tree nodes in real time and feed it back to the preset scheduling service and large model, so as to realize the rapid identification and location of abnormal and failure states, provide data support for action replanning, and ensure the continuity of action execution.
[0051] Enhanced architectural flexibility: Based on a modular architecture with preset scheduling services and behavior trees, the robot's functional modules can be standardized and flexibly combined, supporting rapid adaptation to different types of robots and rapid expansion of new functions, meeting the needs of batch applications and complex scenarios.
[0052] Based on this, the embodiments of this application provide an efficient, intelligent, and flexible robot autonomous action execution architecture based on preset scheduling services and behavior trees. This architecture solves problems such as poor scheduling coordination, low efficiency of function integration, and lack of state feedback in traditional architectures, and improves the success rate, stability, and scalability of robot autonomous action execution. It lays the foundation for the autonomous application of robots in various scenarios such as industrial manufacturing, service industry, and family companionship.
[0053] The robot autonomous action execution architecture based on preset scheduling services and behavior trees provided in this application embodiment achieves a complete autonomous control chain of "perception → decision → execution → feedback" through the deep integration of large models, preset scheduling services, and behavior trees. This breaks through the limitations of traditional robot "command-driven" and realizes a step towards "intelligent decision-driven," laying the foundation for higher-order autonomous control. For example, in the service robot scenario, the robot can autonomously analyze the required actions ("pick up debris → sort and place → clean the table") based on user voice commands (such as "tidy up the living room"), and complete a complex series of actions by combining functional units through preset scheduling services. This provides core architectural support for robots to move towards Artificial General Intelligence (AGI) autonomous control.
[0054] To illustrate the technical solution of this application, the robot autonomous action execution method of the embodiments of this application is described in detail below with reference to Figure 2. Referring to Figure 3, Figure 3 is a flowchart of a robot autonomous action execution method according to an embodiment of this application. By way of example and not limitation, the method can be applied to a robot, specifically to the robot's processor. The robot is configured with a large model, a preset scheduling service, and a behavior tree. The method includes the following steps:
Step 301: Trigger the large model to call the robot's preset database, combine the scene requirement information, generate action logic description text, and send the action logic description text to the preset scheduling service.
[0055] The preset database includes standardized descriptions of each first functional module of the robot that has been connected.
[0056] In this embodiment, the large model actively calls the preset database during planning, which ensures that each planned action in the generated action logic description text has a corresponding functional module as support, thus ensuring the executability and practical feasibility of the action planning.
[0057] Scene requirement information can refer to a formatted description of the specific tasks or goals that the robot needs to complete. It is a standardized task input that the large model directly receives and analyzes when planning actions.
[0058] In some embodiments, the robot can receive user voice commands (such as "grab the red square on the table") via an ASR module. The ASR module converts the voice commands into text information and identifies the text information as the scene requirement information. Alternatively, the robot can obtain scene requirement information through system commands (such as "assemble part A to workstation B" in an industrial scenario).
[0059] Action logic description text can refer to a sequence of action steps with a clear logical order, generated by a large model planning process to meet scenario requirements.
[0060] In some embodiments, before performing step 301, the robot system can be started first to complete the initial loading of the large model SDK, preset scheduling service, motion control service, and node detection and reporting module, ensuring normal communication between the above modules (which can be implemented based on ROS or a custom communication protocol).
[0061] In some embodiments, a preset database may be pre-built and maintained, which stores standardized description information of each first functional module of the robot for invocation and matching by the large model and the preset scheduling service in subsequent steps.
[0062] In some embodiments, large models can send action logic description text to a preset scheduling service through a third-party scheduling service.
[0063] Step 302: Trigger the preset scheduling service to generate an action behavior tree based on the action logic description text and the preset database, and send the action behavior tree to the behavior tree.
[0064] Here, the behavior tree refers to a model used to schedule the behavior of a robot.
[0065] An action behavior tree can refer to a behavior tree instance used to complete the task corresponding to the scenario requirement information.
[0066] It should be understood that a behavior tree is a general scheduling model that defines node types, execution rules, and communication interfaces, but it does not contain specific task logic itself. An action behavior tree, on the other hand, is a specific tree structure dynamically generated based on a preset database and the action logic description text of the task to complete the task corresponding to the scenario requirement information. It is a one-time generated task program for a specific task.
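The distinction between the general behavior tree model and a task-specific action behavior tree can be illustrated with a minimal sketch. The class names `Node`, `Action`, and `Sequence` and the helper `step` are assumptions for illustration; real behavior tree frameworks define richer node types and statuses.

```python
from typing import Callable, List


class Node:
    """General model: every node exposes tick(), returning True on success."""

    def tick(self) -> bool:
        raise NotImplementedError


class Action(Node):
    """Leaf node type: wraps one callable that performs a unit of work."""

    def __init__(self, name: str, fn: Callable[[], bool]):
        self.name = name
        self.fn = fn

    def tick(self) -> bool:
        return self.fn()


class Sequence(Node):
    """Execution rule: tick children in order and stop at the first failure."""

    def __init__(self, children: List[Node]):
        self.children = children

    def tick(self) -> bool:
        # all() short-circuits, so later children are not ticked after a failure
        return all(child.tick() for child in self.children)


# Action behavior tree: one concrete instance generated for one task.
log = []


def step(name: str, result: bool = True) -> Action:
    """Build a hypothetical leaf that records its name and returns a fixed result."""
    def fn() -> bool:
        log.append(name)
        return result
    return Action(name, fn)


grasp_task = Sequence([step("detect"), step("grasp", result=False), step("place")])
succeeded = grasp_task.tick()
```

The classes carry no task logic; `grasp_task` is the one-off instance. Because the simulated grasp fails, the sequence stops before the "place" step and `succeeded` is False.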
[0067] In this embodiment, the large model generates action logic description text by analyzing the preset database and scene requirement information. It is then transformed into an executable action behavior tree by the preset scheduling service, which ensures the consistency between the large model's decisions and the robot's actual action execution, improves the intelligence of the robot's autonomous action execution, and optimizes the interaction process between the large model and the execution layer.
[0068] Step 303: Trigger the behavior tree to call the corresponding nodes to control the robot to perform actions based on the action behavior tree.
[0069] In this embodiment, the behavior tree triggers the robot to perform actions based on the action behavior tree, which enables the behavior tree to drive the robot strictly and reliably according to the predetermined logic. This avoids action conflicts or timing errors that may occur due to the dispersion of logic in traditional solutions, thereby ensuring the continuity, orderliness and high success rate of complex task execution.
[0070] In this embodiment, by using a collaborative architecture of a preset scheduling service and a behavior tree, an executable action behavior tree can be generated based on the action logic description text generated by the large model, combined with the preset database. The corresponding nodes are then called based on the action behavior tree to control the robot to perform actions. This breaks the limitation of separated scheduling and control in traditional robot action execution, realizes an efficient connection of action logic from generation to execution, and solves the problem of the separation of the scheduling service and behavior tree control in the traditional architecture.
[0071] In some embodiments of this application, before triggering the large model to call the robot's preset database, combine it with scene requirement information, and generate action logic description text, the method further includes the following steps: The preset scheduling service is triggered to scan the first functional modules that have been connected to the robot (such as the motion control grasping module, visual recognition module, and voice interaction module) and to collect the description information of each first functional module. The description information of each first functional module is standardized to obtain the standardized description information of each first functional module. The preset database is constructed based on the standardized description information of each first functional module. The description information of a first functional module includes the functional description, interface parameters and execution capabilities of the first functional module.
[0072] In this embodiment, the preset scheduling service collects descriptive information such as functional descriptions, interface parameters, and execution capabilities of each first functional module and standardizes it. This establishes a unified descriptive language and data format for functional modules with different functions and implementation methods. Regardless of how a functional module is implemented at the underlying level, its functional description, interface parameters, and execution capabilities in the system are defined in a standardized form. This solves the problem of heterogeneous module information in traditional architectures, which cannot be uniformly understood and processed by the system, thus laying the foundation for automated scheduling.
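As an illustrative and non-limiting sketch of the standardization described above, the following Python fragment maps heterogeneous self-reported module information onto one schema and builds a database from it. The class name `ModuleDescription`, the raw field names (`desc`, `function`, `params`) and the sample modules are assumptions made for illustration; the application does not specify a data format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ModuleDescription:
    """Hypothetical standardized description of one first functional module."""
    name: str
    function: str                 # functional description
    interface: Dict[str, str]     # interface parameters
    capabilities: Tuple[str, ...] # execution capabilities


def standardize(raw: dict) -> ModuleDescription:
    """Map a module's self-reported info onto the unified schema."""
    return ModuleDescription(
        name=raw["name"].strip().lower(),
        # Modules may report their description under different keys.
        function=raw.get("desc") or raw.get("function", ""),
        interface=dict(raw.get("params", {})),
        capabilities=tuple(sorted(raw.get("capabilities", []))),
    )


def build_preset_database(scanned: List[dict]) -> Dict[str, ModuleDescription]:
    """Build the preset database keyed by standardized module name."""
    return {m.name: m for m in map(standardize, scanned)}


# Hypothetical scan result with inconsistent naming and keys.
scanned_modules = [
    {"name": "Visual_Recognition", "desc": "detect objects and estimate pose",
     "params": {"target": "str"}, "capabilities": ["pose", "detect"]},
    {"name": "motion_control_grasp ", "function": "plan path and grasp",
     "params": {"pose": "list[float]"}, "capabilities": ["grasp", "move"]},
]
preset_db = build_preset_database(scanned_modules)
```

In this sketch, adding a new functional module only requires appending one more entry to the scan result; the lookup code that consumes `preset_db` stays unchanged.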
[0073] In this embodiment, the preset database and modular architecture ensure that the access, updating, and replacement of robot functional modules do not require readjustment of the core logic. When adding a new functional module, only the standardized description information of the functional module needs to be added to the preset database, and it can be automatically matched through the preset scheduling service and behavior tree, shortening the deployment cycle by 60% compared to traditional solutions. At the same time, this architecture supports rapid adaptation to different robot models. Only the interface parameters in the preset database need to be updated to enable the application of the same set of control logic on multiple types of robots, reducing the cost of batch deployment and greatly optimizing the system's maintainability and scalability.
[0074] In some embodiments of this application, the above-mentioned triggering of the large model to call the robot's preset database, combined with scene requirement information, to generate action logic description text may include: The large model is triggered to call the preset database and, combined with the scene requirement information, to analyze the action target, the required third functional modules and the action logic, and to generate the action logic description text.
[0075] As an example rather than a limitation, the voice command received by the ASR module is "grab the red square on the desktop". After converting it into scene requirement information, the large model calls the preset database. The action target analyzed by combining the scene requirement information can be "grab the red square". The required third functional modules are "visual recognition module and motion control grasping module". The action logic is "first identify the position of the object → calculate the grasping path → control the robotic arm to grasp".
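As a non-limiting sketch, the analysis result in the example above could be carried as structured text. The JSON layout and the key names `action_target`, `required_modules` and `action_logic` are assumptions for illustration only; the application does not prescribe a format for the action logic description text.

```python
import json

# Hypothetical structured form of the large model's analysis result.
plan = {
    "action_target": "grab the red square",
    "required_modules": ["visual_recognition", "motion_control_grasp"],
    "action_logic": [
        "identify the position of the object",
        "calculate the grasping path",
        "control the robotic arm to grasp",
    ],
}

# The action logic description text as it might be sent to the scheduling service.
action_logic_text = json.dumps(plan, indent=2)


def parse_action_logic(text: str) -> dict:
    """Parse the text and check that the three expected parts are present."""
    data = json.loads(text)
    missing = {"action_target", "required_modules", "action_logic"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data


parsed = parse_action_logic(action_logic_text)
```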
[0076] In this embodiment, by intelligently deconstructing and reconstructing the relatively abstract "scenario requirement information" into a structured scheme containing specific action goals, functional modules and action logic, a key transformation from user intent to action instructions that the robot can understand and operate is achieved. This is the core manifestation of the robot's advanced intelligence of "understanding instructions and being able to plan how to complete them".
[0077] It should be understood that the second functional module is obtained by matching the action logic description text against the preset database, while the third functional module is obtained by matching the scene requirement information against the preset database. Compared to the third functional module, the actions implemented by the second functional module carry more action logic, such as an execution order. For example, the third functional module is used to implement the action of raising a hand, while the second functional module can implement the action of first raising the head and then raising the hand.
[0078] In some embodiments of this application, triggering the preset scheduling service to generate an action behavior tree based on the action logic description text and the preset database may include: The preset scheduling service is triggered to match the corresponding second functional modules from the preset database based on the action logic description text, and to convert the second functional modules into functional units that the behavior tree can recognize, and to combine them in the order of action logic to form the action behavior tree.
[0079] In this embodiment, the preset scheduling service matches the corresponding second functional modules from the preset database based on the action logic description text. The preset scheduling service does not need to understand the interface of each functional module. It only needs to perform standardized queries to achieve fast and accurate positioning of the second functional modules.
[0080] The action behavior tree is used to describe the execution order, dependencies, and triggering conditions of each functional unit.
[0081] As an example and not a limitation, the functional units include a visual recognition unit, a path planning unit, and a grasping and execution unit. The preset scheduling service combines the above three functional units according to the action logic order to form an action behavior tree, which may be as follows: The execution order of the above three functional units is: visual recognition unit → path planning unit → grasping and execution unit.
[0082] The dependencies between the three functional units are as follows: the execution of the path planning unit depends on the successful execution of the visual recognition unit; the execution of the grasping and execution unit depends on the successful execution of the path planning unit.
[0083] The triggering conditions for the above three functional units are as follows: the visual recognition unit can be triggered by receiving a target detection instruction; the path planning unit can be triggered by successfully executing the visual recognition unit; and the grasping execution unit can be triggered by successfully executing the path planning unit.
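The execution order, dependencies and triggering conditions listed above correspond to what is commonly called a sequence node in behavior tree terminology. The following is a minimal sketch under that assumption; the classes `Action` and `Sequence` are illustrative and are not the implementation of this application.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


class Action:
    """Leaf node wrapping one functional unit (hypothetical)."""

    def __init__(self, name: str, fn: Callable[[], bool]) -> None:
        self.name, self.fn = name, fn

    def tick(self, log: List[str]) -> Status:
        log.append(self.name)  # record that this unit was triggered
        return Status.SUCCESS if self.fn() else Status.FAILURE


class Sequence:
    """Runs children in order; each child runs only if the previous one succeeded."""

    def __init__(self, children: List[Action]) -> None:
        self.children = children

    def tick(self, log: List[str]) -> Status:
        for child in self.children:
            if child.tick(log) is Status.FAILURE:
                return Status.FAILURE  # later units are never triggered
        return Status.SUCCESS


# All three units succeed (stub functions stand in for real units).
tree = Sequence([
    Action("visual_recognition", lambda: True),
    Action("path_planning", lambda: True),
    Action("grasp_execution", lambda: True),
])
log: List[str] = []
result = tree.tick(log)

# Visual recognition fails, so path planning is not triggered.
blocked = Sequence([
    Action("visual_recognition", lambda: False),
    Action("path_planning", lambda: True),
])
blocked_log: List[str] = []
blocked_result = blocked.tick(blocked_log)
```

The sequence encodes the dependency rule directly: a unit is triggered only by the success of the unit before it.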
[0084] In this embodiment, the action behavior tree clearly defines the execution order, dependencies, and triggering conditions of each functional unit. The behavior tree triggers the robot to perform actions based on the action behavior tree, which enables the behavior tree to drive the robot strictly and reliably according to the predetermined logic. This avoids action conflicts or timing errors that may occur due to the dispersion of logic in traditional solutions, thereby ensuring the continuity, orderliness, and high success rate of complex task execution.
[0085] In some embodiments of this application, the aforementioned triggering of the preset scheduling service, based on action logic description text, matches corresponding second functional modules from a preset database, and may include: The preset scheduling service is triggered to parse the action steps and functional requirements in the action logic description text, and matches the corresponding second functional modules from the preset database.
[0086] The preset scheduling service can parse the action steps and functional requirements in the action logic description text through the scheduling service interface call mechanism, and query the functional modules (i.e., the second functional modules) that match the action steps and functional requirements from the preset database.
[0087] In this embodiment, by parsing the action steps and functional requirements in the action logic description text, a complete task description can be decomposed into a series of specific, operable sub-tasks and their required functional types, enabling the preset scheduling service to accurately understand the planning intent of the large model.
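As a non-limiting sketch of the parsing and matching described above, the fragment below matches each action step to a module by simple keyword overlap with declared capabilities. Keyword overlap, the database contents and the function `match_modules` are assumptions chosen for brevity; the application does not disclose the matching algorithm.

```python
from typing import Dict, List

# Hypothetical excerpt of the preset database: module name -> capability keywords.
PRESET_DB: Dict[str, dict] = {
    "visual_recognition": {"capabilities": {"identify", "detect"}},
    "path_planning": {"capabilities": {"calculate", "plan"}},
    "motion_control_grasp": {"capabilities": {"grasp", "move"}},
}


def match_modules(steps: List[str], db: Dict[str, dict]) -> List[str]:
    """Return, for each action step, the first module whose capabilities it mentions."""
    matched = []
    for step in steps:
        words = set(step.lower().split())
        hit = next(
            (name for name, info in db.items() if info["capabilities"] & words),
            None,
        )
        if hit is None:
            raise LookupError(f"no module matches step: {step!r}")
        matched.append(hit)
    return matched


steps = [
    "identify the position of the object",
    "calculate the grasping path",
    "control the robotic arm to grasp",
]
second_modules = match_modules(steps, PRESET_DB)
```

Because the query only touches the standardized capability field, the matcher does not need to know how any module is implemented.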
[0088] Behavior trees control robot actions through nodes. In some embodiments of this application, a node detection and reporting module can be introduced to monitor the execution status of nodes. As shown in Figure 4, its implementation process may include steps 401 to 403.
[0089] Step 401: During the execution of each node, the node detection and reporting module is triggered to monitor the execution status of each node.
[0090] During the execution of each node, the node detection and reporting module monitors the execution status of each node in real time. The execution status of a node typically includes any of the following: success, failure, exception (abnormal), or executing (in progress).
[0091] In some embodiments, if a node executes successfully, the execution status of that node is a success status, and a success log can be recorded to trigger the execution of the next dependent node.
[0092] Step 402: If there is a target node with an execution status of non-success, collect the reason for non-success and trigger the preset scheduling service to send the reason for non-success to the large model. Non-success status includes failure status or abnormal status.
[0093] If a node fails to execute (e.g., the visual recognition unit fails to find the target object, or the robotic arm grasps the object at an offset) or experiences an execution anomaly (e.g., hardware failure, or communication interruption), the cause of failure (e.g., "object occlusion causes recognition failure" or "insufficient robotic arm force causes the object to fall off") or anomaly can be immediately collected and fed back to the large model through a preset scheduling service.
[0094] When a node malfunctions, an emergency plan can be triggered (such as pausing the action or issuing an alarm), and the abnormal information can be reported.
[0095] In this embodiment, the node detection and reporting module monitors the execution status of nodes in real time, enabling rapid identification and location of faulty nodes. Combined with emergency plans, this can prevent system crashes caused by the spread of faults. For example, when a robotic arm hardware malfunctions, the system can immediately stop its operation, report the fault information, and prompt for repair, reducing the risk of equipment damage.
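The monitoring behavior described above can be sketched as a wrapper around node execution. This is an illustrative assumption, not the disclosed implementation: the class `NodeMonitor`, the convention that a node returns `(ok, reason)`, and the use of a Python exception to model an abnormal state are all hypothetical.

```python
from enum import Enum
from typing import Callable, List, Tuple


class NodeStatus(Enum):
    RUNNING = "running"
    SUCCESS = "success"
    FAILURE = "failure"
    ABNORMAL = "abnormal"


class NodeMonitor:
    """Hypothetical node detection and reporting module."""

    def __init__(self) -> None:
        self.reports: List[Tuple[str, NodeStatus, str]] = []
        self.emergency_triggered = False

    def run(self, name: str, node_fn: Callable[[], Tuple[bool, str]]) -> NodeStatus:
        self.reports.append((name, NodeStatus.RUNNING, ""))
        try:
            ok, reason = node_fn()
            status = NodeStatus.SUCCESS if ok else NodeStatus.FAILURE
        except Exception as exc:
            # Abnormal state: record the cause and trigger the emergency plan
            # (modelled here as a flag; a real system might pause or alarm).
            status, reason = NodeStatus.ABNORMAL, str(exc)
            self.emergency_triggered = True
        self.reports.append((name, status, reason))
        return status


def occluded() -> Tuple[bool, str]:
    return False, "object occlusion causes recognition failure"


def broken_arm() -> Tuple[bool, str]:
    raise RuntimeError("robotic arm hardware failure")


monitor = NodeMonitor()
s1 = monitor.run("visual_recognition", occluded)
s2 = monitor.run("grasp_execution", broken_arm)
```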
[0096] Step 403: Trigger the large model to update the action logic description text based on the non-successful reason, combined with the preset database and scenario requirement information, and return to the step of sending the action logic description text to the preset scheduling service and subsequent steps until the execution status of the target node is successful, or the large model determines that the action cannot be completed by updating the action logic description text.
[0097] In this embodiment, after receiving the failure or exception reason sent by the preset scheduling service, the large model can re-analyze the cause of the problem (such as "object occlusion → adjust visual recognition angle", "insufficient force → increase robotic arm grasping force") by combining the preset database and scene requirement information, and generate an adjusted action logic description text (i.e., a new action logic description text). The preset scheduling service calls the interface again, updates the functional unit according to the new action logic description text, generates an adjusted action behavior tree, sends it to the behavior tree, triggers the robot to re-execute the action, and repeats the steps of behavior tree execution, node status detection and reporting, action adjustment and re-execution until the action is successfully executed, or the large model determines that the action cannot be completed through adjustment (such as the target object is missing), and then feeds back the execution result through the TTS module (such as "cannot find the red square, grasping failed"). The above-mentioned dynamic adjustment mechanism based on execution status feedback further improves the system's adaptability in complex scenes (such as object occlusion, changes in ambient light), and ensures the continuity and stability of action execution.
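The feedback loop of steps 401 to 403 can be sketched as follows. The planner and executor are stubs standing in for the large model and the behavior tree; the function names, the `None` return value used to signal that the model gives up, and the `max_attempts` safety bound are assumptions for illustration and are not stated in this application.

```python
from typing import Callable, List, Optional, Tuple


def run_with_feedback(
    plan_fn: Callable[[Optional[str]], Optional[str]],
    execute_fn: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 5,  # assumed safety bound
) -> Tuple[bool, List[Tuple[str, str]]]:
    """Re-plan from the reported reason until success or until the planner gives up."""
    reason: Optional[str] = None
    history: List[Tuple[str, str]] = []
    for _ in range(max_attempts):
        plan = plan_fn(reason)
        if plan is None:  # planner decides the action cannot be completed
            return False, history
        ok, reason = execute_fn(plan)
        history.append((plan, reason))
        if ok:
            return True, history
    return False, history


def stub_planner(reason: Optional[str]) -> Optional[str]:
    """Stand-in for the large model updating the action logic description text."""
    if reason is None:
        return "grasp from the front"
    if reason == "object occluded":
        return "adjust camera angle, then grasp"
    return None


def stub_executor(plan: str) -> Tuple[bool, str]:
    """Stand-in for behavior tree execution plus node status reporting."""
    if plan.startswith("adjust"):
        return True, ""
    return False, "object occluded"


succeeded, history = run_with_feedback(stub_planner, stub_executor)
```

In this run the first attempt fails with an occlusion reason, the planner adjusts the text, and the second attempt succeeds.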
[0098] In this embodiment, the node detection and reporting module provides real-time feedback on the execution status of nodes, enabling the large model to quickly adjust the action logic description text. This avoids execution deviations caused by "no feedback after instruction is issued" in traditional architectures, significantly improving the efficiency and accuracy of action execution. For example, in a grasping scenario, if the first grasp fails, this embodiment can complete the execution status reporting and the adjustment of the action logic description text within 1 second, allowing the grasping action to be re-executed. The success rate of action execution is more than 30% higher than that of traditional solutions.
[0099] In some embodiments of this application, if voice feedback on the execution status of each node is required, after triggering the node detection and reporting module to monitor the execution status of each node in the behavior tree, the method further includes: The TTS module is triggered to convert the execution status into voice broadcast.
[0100] When the execution status is text information, the TTS module can convert the text information into voice information for broadcast.
[0101] In some embodiments of this application, when it is necessary to control the robot to perform a grasping action, as shown in Figure 5, the above-mentioned control of the robot to perform actions through the corresponding nodes may include steps 501 to 502.
[0102] Step 501: The object detection node calls the visual recognition unit to obtain the pose data of the target object and sends the pose data to the motion control service.
[0103] In this embodiment, the pose data of the target object can be accurately obtained by calling the visual recognition unit through the object detection node.
[0104] Step 502: Trigger the motion control service to calculate the robot's grasping path based on the pose data, call the motion control grasping unit to control the robot to move to the target position based on the grasping path, and execute the grasping action.
[0105] In this embodiment, the motion control service calculates the grasping path based on the pose data and controls the robotic arm to move to the target position through the motion control grasping unit, which can realize the precision and automation of motion execution, thereby ensuring the accuracy of grasping the target object.
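Steps 501 and 502 can be sketched as a two-stage pipeline. Everything concrete here is an assumption for illustration: the pose is reduced to an (x, y, z) position, the detector returns a fixed stub value, and straight-line interpolation stands in for whatever path calculation the motion control service actually performs.

```python
from typing import List, Tuple

Position = Tuple[float, float, float]


def detect_pose(target: str) -> Position:
    """Stub for the object detection node calling the visual recognition unit."""
    return (0.4, 0.1, 0.2)  # hypothetical position of the target, in metres


def plan_grasp_path(start: Position, goal: Position, steps: int = 4) -> List[Position]:
    """Stub for the motion control service: straight-line waypoints from start to goal."""
    return [
        tuple(s + (g - s) * i / steps for s, g in zip(start, goal))
        for i in range(1, steps + 1)
    ]


def execute_grasp(path: List[Position]) -> Position:
    """Stub for the motion control grasping unit: follow the path, grasp at the end."""
    position = path[0]
    for waypoint in path:
        position = waypoint  # a real unit would command the arm here
    return position


goal = detect_pose("red square")                    # step 501
path = plan_grasp_path((0.0, 0.0, 0.0), goal)       # step 502: compute path
final_position = execute_grasp(path)                # step 502: move and grasp
```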
[0106] In some embodiments, after an action is successfully executed, a preset scheduling service can be triggered to record the execution log, including information such as the action target, execution steps, execution status of each node, and execution time, for subsequent system optimization and troubleshooting.
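As a non-limiting sketch, an execution log entry containing the fields listed above could be serialized as one JSON line. The class `ExecutionLog`, its field names and the sample values are hypothetical; the application does not define a log format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class ExecutionLog:
    """Hypothetical log record written by the preset scheduling service."""
    action_target: str
    execution_steps: List[str]
    node_status: Dict[str, str]
    execution_time_s: float


def to_log_line(entry: ExecutionLog) -> str:
    """Serialize one record as a single JSON line with stable key order."""
    return json.dumps(asdict(entry), sort_keys=True)


entry = ExecutionLog(
    action_target="grab the red square",
    execution_steps=["visual_recognition", "path_planning", "grasp_execution"],
    node_status={
        "visual_recognition": "success",
        "path_planning": "success",
        "grasp_execution": "success",
    },
    execution_time_s=2.5,  # illustrative value only
)
log_line = to_log_line(entry)
```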
[0107] In some embodiments, after an action is successfully executed, the robot can also perform a system reset to release the resources (such as functional modules and memory space) occupied by the current action execution, and wait for the next action command (such as a voice command or system command) to be triggered.
[0108] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
[0109] Corresponding to the robot autonomous action execution method described in the above embodiments, Figure 6 shows a schematic diagram of the structure of the robot autonomous action execution device provided in an embodiment of this application. For ease of explanation, only the parts related to the embodiment of this application are shown.
[0110] Referring to Figure 6, the device includes: The model triggering module 601 is used to trigger the large model to call the robot's preset database, combine the scene requirement information, generate action logic description text, and send the action logic description text to the preset scheduling service. The preset database includes standardized description information of each first functional module that has been connected to the robot. The scheduling trigger module 602 is used to trigger the preset scheduling service to generate an action behavior tree based on the action logic description text and the preset database, and send the action behavior tree to the behavior tree. The behavior tree is a model used to schedule the behavior of the robot, and the action behavior tree is a behavior tree instance used to complete the task corresponding to the scene requirement information. The behavior tree triggering module 603 is used to trigger the behavior tree to control, based on the action behavior tree, the robot to perform actions through the corresponding nodes.
[0111] It should be understood that the structural block diagram of the robot autonomous action execution device shown in Figure 6 includes modules for executing the steps in the embodiments corresponding to Figures 2 to 5. These steps have been explained in detail in the above embodiments. For details, please refer to Figures 2 to 5 and the relevant descriptions in the corresponding embodiments, which will not be repeated here.
[0112] Figure 7 is a schematic diagram of the robot provided in an embodiment of this application. As shown in Figure 7, the robot 7 in this embodiment includes: at least one processor 70 (only one is shown in Figure 7), a memory 71, and a computer program 72 stored in the memory 71 and executable on the at least one processor 70, which, when executed by the processor 70, implements the steps in any of the above method embodiments.
[0113] The robot may include, but is not limited to, a processor 70 and a memory 71. Those skilled in the art will understand that Figure 7 is merely an example of the robot 7 and does not constitute a limitation on the robot 7. The robot may include more or fewer parts than shown in the figure, or combine certain parts, or use different parts, and may further include, for example, input/output devices and network access devices.
[0114] The processor 70 may be a Central Processing Unit (CPU), or it may be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor may be a microprocessor or any conventional processor.
[0115] In some embodiments, the memory 71 may be an internal storage unit of the robot 7, such as a hard disk or memory of the robot 7. In other embodiments, the memory 71 may be an external storage device of the robot 7, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, etc., equipped on the robot 7. Furthermore, the memory 71 may include both internal storage units and external storage devices of the robot 7. The memory 71 is used to store operating systems, applications, bootloaders, data, and other programs, such as the program code of computer programs. The memory 71 can also be used to temporarily store data that has been output or will be output.
[0116] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is merely an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit. Furthermore, the specific names of the functional units and modules are only for easy differentiation and are not intended to limit the scope of protection of this application. The specific working process of the units and modules in the above system can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.
[0117] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program includes computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms. The computer-readable medium can include at least: any entity or device capable of carrying computer program code to a device / electronic device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium. Examples include USB flash drives, portable hard drives, magnetic disks, or optical disks.
[0118] In the above embodiments, the descriptions of each embodiment have different focuses. For parts that are not described in detail or recorded in a certain embodiment, please refer to the relevant descriptions of other embodiments.
[0119] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
[0120] In the embodiments provided in this application, it should be understood that the disclosed devices / electronic devices and methods can be implemented in other ways. For example, the device / electronic device embodiments described above are merely illustrative. For instance, the division of modules or units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the displayed or discussed mutual couplings or direct couplings or communication connections may be through some interfaces; indirect couplings or communication connections between devices or units may be electrical, mechanical, or other forms.
[0121] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0122] The above-described embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application, and should all be included within the protection scope of this application.
Claims
1. A robot autonomous action execution method, characterized in that it comprises: The large model is triggered to call the robot's preset database, combine the scene requirement information, generate action logic description text, and send the action logic description text to the preset scheduling service. The preset database includes standardized description information of each first functional module that has been connected to the robot. The preset scheduling service is triggered to generate an action behavior tree based on the action logic description text and the preset database, and the action behavior tree is sent to the behavior tree. The behavior tree is a model used to schedule the behavior of the robot, and the action behavior tree is a behavior tree instance used to complete the task corresponding to the scene requirement information. The behavior tree is triggered to control, based on the action behavior tree, the robot to perform actions through the corresponding nodes.
2. The robot autonomous action execution method according to claim 1, characterized in that, The triggering of the preset scheduling service to generate an action behavior tree based on the action logic description text and the preset database includes: The preset scheduling service is triggered to match the corresponding second functional modules from the preset database based on the action logic description text, to convert the second functional modules into functional units that can be recognized by the behavior tree, and to combine them in the order of action logic to form the action behavior tree.
3. The robot autonomous action execution method according to claim 2, characterized in that, The triggering of the preset scheduling service to match the corresponding second functional modules from the preset database based on the action logic description text includes: The preset scheduling service is triggered to parse the action steps and functional requirements in the action logic description text, and to match the corresponding second functional modules from the preset database.
4. The robot autonomous action execution method according to claim 1, characterized in that, The triggering of the large model to call the robot's preset database, combine the scene requirement information, and generate action logic description text includes: The large model is triggered to call the preset database and, combined with the scene requirement information, to analyze the action target, the required third functional modules and the action logic, and to generate the action logic description text.
5. The robot autonomous action execution method according to claim 1, characterized in that, Before the triggering of the large model to call the robot's preset database, and combining it with scene requirement information to generate action logic description text, the following is also included: The preset scheduling service is triggered to scan each of the first functional modules that the robot has connected to, and collects the description information of each first functional module. The description information of each first functional module is standardized to obtain the standardized description information of each first functional module. The preset database is constructed based on the standardized information of each first functional module. The relevant information of a first functional module includes the functional description, interface parameters and execution capabilities of the first functional module.
6. The method for executing autonomous robot actions according to any one of claims 1 to 5, characterized in that, The method for executing autonomous robot actions also includes: During the execution of each node, the node detection and reporting module is triggered to monitor the execution status of each node; If there is a target node whose execution status is unsuccessful, the reason for the failure is collected and the preset scheduling service is triggered to send the reason for the failure to the large model. The unsuccessful status includes failure status or abnormal status. The large model is triggered to update the action logic description text based on the unsuccessful reason, combined with the preset database and the scenario requirement information, and then return to execute the step of sending the action logic description text to the preset scheduling service and subsequent steps until the execution status of the target node is successful, or the large model determines that the action cannot be completed by updating the action logic description text.
7. The robot autonomous action execution method according to claim 6, characterized in that, After the triggering of the node detection and reporting module to monitor the execution status of each node in the behavior tree, the method further includes: The text-to-speech module is triggered to convert the execution status into a voice broadcast.
8. The method for executing autonomous robot actions according to any one of claims 1 to 5, characterized in that, The process of controlling the robot to perform actions through corresponding nodes includes: The object detection node calls the visual recognition unit to obtain the pose data of the target object and sends the pose data to the motion control service. The motion control service is triggered to calculate the grasping path of the robot's robotic arm based on the pose data, and the motion control grasping unit is invoked to control the robotic arm to move to the target position and perform the grasping action based on the grasping path.
9. A robot autonomous action execution device, characterized in that it comprises: The model triggering module is used to trigger the large model to call the robot's preset database, combine the scene requirement information, generate action logic description text, and send the action logic description text to the preset scheduling service. The preset database includes standardized description information of each first functional module that has been connected to the robot. The scheduling trigger module is used to trigger the preset scheduling service to generate an action behavior tree based on the action logic description text and the preset database, and send the action behavior tree to the behavior tree. The behavior tree is a model used to schedule the behavior of the robot, and the action behavior tree is a behavior tree instance used to complete the task corresponding to the scene requirement information. The behavior tree triggering module is used to trigger the behavior tree to control, based on the action behavior tree, the robot to perform actions through the corresponding nodes.
10. A robot comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, When the processor executes the computer program, it causes the robot to implement the robot autonomous action execution method as described in any one of claims 1 to 8.
11. A computer program product, characterized in that, It includes a computer program, which, when run, causes the robot autonomous action execution method as described in any one of claims 1 to 8 to be executed.