A robot interaction method, device, apparatus and storage medium

By using parallel processing of the dialogue master thread and the task master thread, and the collaboration of the hierarchical model, the problems of interactive silence and insufficient status feedback in pure voice screenless robot interaction are solved. This achieves the continuity of dialogue and the perceptibility of status during task execution, and improves the flexibility and responsiveness of robot interaction.

CN122201291A · Pending · Publication Date: 2026-06-12 · DIGITAL HUAXIA (SHENZHEN) TECHNOLOGY CO LTD +1


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: DIGITAL HUAXIA (SHENZHEN) TECHNOLOGY CO LTD
Filing Date: 2026-03-13
Publication Date: 2026-06-12

Application Information

Patent Timeline

13 Mar 2026: Application
12 Jun 2026: Publication

CN122201291A

IPC: G10L15/22; G10L15/06; G10L15/26

AI Tagging

Application Domain

Speech recognition




AI Technical Summary

⚠Technical Problem

Existing pure voice screenless robots struggle to balance interaction response speed and complex task planning, resulting in interactive silence issues, a disconnect between casual conversation and task-oriented interaction, difficulty in natural parallel execution within the same conversation, insufficient status feedback during task execution, and the inability of a single model to simultaneously meet the requirements of low-latency interaction and high-reliability planning.

⚗Method used

The system employs a parallel processing mechanism of dialogue master thread and task master thread, combined with a hierarchical collaboration mechanism of fast interaction model and complex planning model, to achieve continuous availability of dialogue, state awareness, and interruptible interaction. Through real-time fusion of dialogue and task state, it ensures that the robot responds to user input in a timely manner during task execution.

🎯Benefits of technology

It enhances the human-computer interaction experience of robots in real-world scenarios, ensures the continuity of dialogue and the perceptibility of status during task execution, avoids prolonged silence, and supports multi-task parallelism and dynamic adjustment.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a robot interaction method, device, equipment, and storage medium, applied in the field of robot control. The method comprises: performing semantic recognition on user voice; when the voice is identified as a dialogue instruction, calling the dialogue master thread to input the dialogue instruction into the dialogue model, and controlling the robot to make a voice reply based on the dialogue model; when the voice is identified as a task instruction, calling the task master thread to input the task instruction into the task model for task planning, and controlling the robot to execute the task based on the planning instruction; and calling the context fusion engine in the dialogue master thread to fuse the task status event with the current dialogue context to obtain a fusion result, and inputting the fusion result into the dialogue model to make a voice reply. Dialogue interaction is decoupled from task planning and execution, and a hierarchical collaborative mechanism between the dialogue model and the task planning model is introduced, so that during task planning and execution the dialogue remains continuously available, the status is perceptible, and the interaction is interruptible, significantly improving the human-computer interaction experience of the robot in real-world scenarios.


Description

Technical Field

[0001] This invention relates to the field of robot control, and in particular to a robot interaction method, robot interaction device, electronic device, and computer-readable storage medium.

Background Technology

[0002] With the development of artificial intelligence and robotics technologies, robots are gradually entering public services and daily life scenarios. Robots typically exist in mobile form or as fixed voice terminals. Due to limitations in cost, power consumption, and usage scenarios, they often lack or do not rely on displays, with human-computer interaction primarily achieved through pure voice without a screen. In this interaction mode, users cannot obtain task status, execution progress, or system prompts through a graphical interface, thus placing higher demands on the continuity, real-time feedback, and naturalness of voice interaction. However, existing pure voice screenless robots struggle to simultaneously address the issues of interaction response speed and the accuracy of complex task planning in practical applications. If the focus is on rapid responses, task planning is prone to inadequacy or errors; if the focus is on deep reasoning, significant delays are easily introduced, exacerbating the problem of silent interaction.

Summary of the Invention

[0003] The purpose of this invention is to provide a robot interaction method, robot interaction device, electronic device, and computer-readable storage medium, which are applied in the field of robot control. This method decouples dialogue interaction from task planning and execution, and introduces a hierarchical collaborative mechanism between the dialogue model and the task planning model, so as to realize that the dialogue is continuously available, the state is perceptible, and the interaction can be interrupted during the task execution process, thereby significantly improving the human-computer interaction experience of the robot in real-world scenarios.

[0004] To address the aforementioned technical problems, this invention provides a robot interaction method, comprising:

[0005] Acquire user voice, convert the user voice into text, and perform semantic recognition on the text to determine the instruction type of the text;

[0006] When the instruction type is a dialogue instruction, the dialogue master thread is invoked to input the dialogue instruction into the dialogue model, and the robot is controlled to make a voice response based on the dialogue model;

[0007] When the instruction type is a task instruction, the task master thread is invoked to input the task instruction into the task model for task planning, and the robot is controlled to execute the task based on the planned instruction.

[0008] The task master thread is invoked to output the task status event. The context fusion engine in the dialogue master thread is invoked to fuse the task status event with the current dialogue context to obtain a fusion result. The dialogue master thread is invoked to input the fusion result into the dialogue model and control the robot to make a voice response based on the dialogue model.

[0009] Optionally, when the instruction type is a task instruction, the task master thread is invoked to input the task instruction into the task model for task planning, and the robot is controlled to execute the task based on the planned instructions, including:

[0010] When the instruction type is a task generation instruction, the task master thread is invoked to input the task generation instruction into the task model for task planning, and the robot is controlled to perform task execution based on the planning instruction;

[0011] When the instruction type is a task modification instruction, the task master thread is invoked to input the task modification instruction into the task model for task replanning, and the robot is controlled to execute the task based on the replanning instruction;

[0012] When the instruction type is a task switching instruction, the task master thread is invoked to input the task switching instruction into the task model for task replanning, and the robot is controlled to perform task execution based on the replanning instruction;

[0013] When the instruction type is a task pause instruction, the task master thread is invoked to control the pause of task execution based on the task pause instruction;

[0014] When the instruction type is a task termination instruction, the task master thread is invoked to control the termination of task execution based on the task termination instruction.

[0015] Optionally, when the instruction type is a task instruction, the task master thread is invoked to input the task instruction into the task model for task planning, and the robot is controlled to execute the task based on the planned instructions, including:

[0016] When the instruction type is the task instruction, the task master thread is invoked to generate a task instance and lifecycle state based on the task instruction; the task planning sub-thread in the task master thread is invoked to input the task instruction into the task model for task planning, and the planning instruction is generated based on the task planning result; the task execution sub-thread in the task master thread is invoked to control the robot hardware or call an external API to execute the task based on the planning instruction.

[0017] Optionally, when the instruction type is a task instruction, the task master thread is invoked to input the task instruction into the task model for task planning, and the robot is controlled to execute the task based on the planned instructions, including:

[0018] When the instruction type is a multi-task instruction, the task master thread is invoked to input the multi-task instruction into the task model for multi-task planning. Based on the multi-task planning result, a multi-task planning instruction is generated, and the robot is controlled to perform task execution based on the multi-task planning instruction.

[0019] Optionally, controlling the robot to provide voice responses based on the dialogue model includes:

[0020] The output text of the dialogue model is converted into speech data, and the robot is controlled to respond with speech based on the speech data.

[0021] After the voice response is completed, the robot will re-enter voice listening mode.

[0022] Optionally, the dialogue model is a fast interaction model, and the task model is a complex planning model.

[0023] Optionally, the task status events include planning, planning completed, task execution, execution exception, and task completion.

[0024] To address the aforementioned technical problems, the present invention provides a robot interaction device, comprising:

[0025] The first module is used to acquire user voice, convert the user voice into text, and perform semantic recognition on the text to determine the instruction type of the text;

[0026] The second module is used to, when the instruction type is a dialogue instruction, call the dialogue master thread to input the dialogue instruction into the dialogue model, and control the robot to make a voice response based on the dialogue model;

[0027] The third module is used to call the task master thread to input the task instruction into the task model for task planning when the instruction type is a task instruction, and to control the robot to perform the task based on the planned instruction.

[0028] The fourth module is used to call the task master thread to output task status events, call the context fusion engine in the dialogue master thread to fuse the task status events with the current dialogue context to obtain a fusion result, call the dialogue master thread to input the fusion result into the dialogue model, and control the robot to make voice responses based on the dialogue model.

[0029] To solve the above-mentioned technical problems, the present invention provides an electronic device, comprising:

[0030] Memory, used to store computer programs;

[0031] A processor is used to implement the robot interaction method described above when executing the computer program.

[0032] To address the aforementioned technical problems, the present invention provides a computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, implement the robot interaction method described above.

[0033] As can be seen, this invention acquires user speech, converts it into text, and performs semantic recognition on the text to determine the instruction type. When the instruction type is a dialogue instruction, the dialogue master thread is invoked to input the dialogue instruction into the dialogue model, and the robot is controlled to respond with speech based on the dialogue model. When the instruction type is a task instruction, the task master thread is invoked to input the task instruction into the task model for task planning, and the robot is controlled to execute the task based on the planned instructions. The task master thread outputs task status events, and the context fusion engine in the dialogue master thread is invoked to fuse the task status events with the current dialogue context to obtain a fusion result. The dialogue master thread then inputs the fusion result into the dialogue model, and the robot is controlled to respond with speech based on the dialogue model. This invention decouples dialogue interaction from task planning and execution, and introduces a hierarchical collaborative mechanism between the dialogue model and the task planning model, achieving continuous availability of dialogue, perceptible status, and interruptible interaction during task execution, thereby significantly improving the human-computer interaction experience of the robot in real-world scenarios.

Description of the Drawings

[0034] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.

[0035] Figure 1 is a flowchart of a robot interaction method provided in an embodiment of the present invention;

[0036] Figure 2 is an example of a robot interaction process provided in an embodiment of the present invention;

[0037] Figure 3 is a structural block diagram of a robot interaction device provided in an embodiment of the present invention.

Detailed Implementation

[0038] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0039] With the development of artificial intelligence and robotics technologies, service robots, companion robots, guide robots, and home assistant robots are gradually entering public services and daily life scenarios. In applications such as elderly care, hotel services, exhibition guidance, medical assistance, and home services, robots typically exist in a mobile form or as fixed voice terminals. Due to limitations in cost, power consumption, and usage scenarios, they often lack or do not rely on displays, and human-computer interaction is mainly completed through pure voice without a screen. In this type of interaction mode, users cannot obtain task status, execution progress, or system prompts through a graphical interface, thus placing higher demands on the continuity, real-time feedback capability, and naturalness of voice interaction.

[0040] In practical applications, robots not only need to perform specific tasks such as retrieving objects, navigating, controlling equipment, and retrieving information, but also need to engage in casual conversation, provide emotional responses, and conduct continuous dialogue with users. Users' actual expressions often don't strictly distinguish between "task instructions" and "chat content," but rather constantly switch and intertwine these two needs within the same dialogue. For example, after issuing the instruction "Get me a bottle of water," they might continue to inquire about the weather, add additional conditions, or temporarily pause and trigger a new task. This requires the robot's interaction system to maintain uninterrupted dialogue capabilities during task execution and continuously convey the system's status to the user in a purely voice-based environment.

[0041] However, existing robot voice interaction systems mostly employ a serial or semi-serial processing architecture, typically including steps such as speech recognition, intent understanding, dialogue response or task triggering, task planning and execution, and result feedback. When the system recognizes that user input involves a specific task, it often enters the task processing flow, during which the dialogue module is partially or completely suspended until the task is completed or a predetermined feedback node is reached before the voice broadcast resumes. This architecture, when there is a long planning time or task execution time, is prone to creating a noticeable "silent period" in pure voice-only, screenless scenarios, making it difficult for users to determine whether the system is still working, resulting in a poor interactive experience.

[0042] Furthermore, existing technologies typically employ a separate design for casual conversation and task-oriented dialogue, with each handled by different rules, models, or state machines, and connected through limited switching conditions. When the system is in task execution mode, it may fail to respond promptly to new casual input or misinterpret it as task parameters; conversely, in casual conversation mode, it may delay or ignore responses to tasks. This design hinders users from engaging in continuous dialogue, inserting new instructions, or adjusting and terminating existing tasks during task execution, thus reducing the robot's interactive flexibility in real-world, complex contexts.

[0043] From a model perspective, with the application of large language models in robot interaction, some existing solutions attempt to use a single large model to handle both dialogue generation and task planning. However, task planning typically involves complex logic such as multi-step reasoning, constraint satisfaction, path search, and resource coordination, often requiring strong reasoning capabilities or "thinking modes," resulting in significant computational overhead. Voice interaction, on the other hand, is highly sensitive to response latency, especially in screenless scenarios where users expect verbal feedback within a short timeframe. When a single model simultaneously addresses both needs, a conflict often arises between response speed and planning accuracy: prioritizing rapid responses can lead to insufficient or erroneous task planning; prioritizing deep reasoning can introduce significant delays, exacerbating the problem of silent interaction.

[0044] Meanwhile, in existing technologies, the intermediate states of task planning and execution processes typically exist as internal data, lacking a unified mechanism for event-based expression and dialogue integration, making it difficult to naturally translate into understandable voice feedback. This means that even when the system continuously advances the task in the background, it cannot effectively communicate the current stage to the user, making it difficult for the user to intervene, correct errors, or cancel the task in a timely manner. This deficiency is particularly prominent in robotic scenarios involving physical actions or safety-related operations.

[0045] In summary, existing technologies in purely voice-based, screenless robot interaction systems generally suffer from the following shortcomings:

[0046] 1) The task execution and dialogue response are tightly coupled, resulting in prolonged silence;

[0047] 2) Casual conversation and task-oriented interactions are disconnected and difficult to run naturally in the same conversation;

[0048] 3) It is difficult to continuously provide users with status feedback during task execution, resulting in insufficient interpretability of the interaction;

[0049] 4) A single model cannot simultaneously meet the requirements of low-latency interaction and high-reliability task planning.

[0050] Therefore, there is an urgent need for a new robot interaction system architecture that enables dialogue interaction, task planning, and task execution to proceed in parallel. This architecture should ensure rapid voice feedback, the accuracy of complex task planning, and the dynamic integration of the task execution process into the dialogue content, thereby improving the human-computer interaction experience in screenless, voice-only scenarios.

[0051] With reference to Figure 1, which is a flowchart of a robot interaction method provided in an embodiment of the present invention, the method may include:

[0052] S101: Acquire user speech, convert user speech into text, and perform semantic recognition on the text to determine the instruction type of the text.

[0053] This embodiment can acquire user voice, convert user voice into text using ASR (Automatic Speech Recognition) technology, and perform semantic recognition on the converted text to determine the type of instruction in the text.

[0054] As shown in Figure 2, when a user issues a voice command (such as "Get me a bottle of water" or "How's the weather today?"), ASR converts it into text. The system then performs semantic understanding on the text, including user intent recognition, sentiment analysis, and context parsing. If the command is recognized as simple chat or a query, i.e., a dialogue instruction, it can be sent directly to the dialogue master thread. If it is recognized as a task instruction (such as navigation or retrieving an item), both the dialogue master thread and the task master thread can be triggered simultaneously, and the system enters a dual-thread parallel processing mode.
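The routing decision in this paragraph can be sketched as follows. This is only an illustration: the keyword list, the `classify` and `route` functions, and the thread names are hypothetical, and simple keyword matching stands in for the semantic recognition step, which the text does not specify.

```python
from enum import Enum


class InstructionType(Enum):
    DIALOGUE = "dialogue"
    TASK = "task"


# Hypothetical keyword list standing in for a real semantic recognizer.
TASK_KEYWORDS = ("get me", "bring", "go to", "navigate", "turn on", "turn off")


def classify(text: str) -> InstructionType:
    """Toy stand-in for semantic recognition: keyword match on lowercased text."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in TASK_KEYWORDS):
        return InstructionType.TASK
    return InstructionType.DIALOGUE


def route(text: str) -> list[str]:
    """Return the master threads that should receive this utterance."""
    if classify(text) is InstructionType.TASK:
        # A task instruction triggers both master threads in parallel.
        return ["dialogue_master", "task_master"]
    return ["dialogue_master"]
```

With the two example utterances from the text, `route("Get me a bottle of water")` selects both master threads, while `route("How's the weather today?")` selects only the dialogue master thread.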

[0055] This embodiment features a dialogue master thread and a task master thread, both operating in parallel as relatively independent control units. The dialogue master thread continuously maintains the human-computer dialogue process, handling interactive behaviors such as voice input response, casual conversation generation, task confirmation, and status announcements. The task master thread is responsible for task creation, planning, scheduling, and execution management. The two master threads communicate via events and instructions, ensuring that neither thread is blocked by the other's running status, thus preventing interaction interruptions due to task processing time from a system architecture perspective.
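One way to picture two master threads that communicate only through events and instructions is a pair of queues, as in the minimal sketch below. The function names, the `None` shutdown sentinel, and the short `sleep` that stands in for planning time are all assumptions made for illustration, not details from the embodiment.

```python
import queue
import threading
import time


def task_master(commands: queue.Queue, events: queue.Queue) -> None:
    """Consume task commands and publish status events; it never speaks itself."""
    while True:
        command = commands.get()
        if command is None:  # sentinel: shut down and tell the dialogue side
            events.put(None)
            return
        events.put((command, "planning"))
        time.sleep(0.05)  # stands in for slow planning
        events.put((command, "planning completed"))


def dialogue_master(events: queue.Queue, spoken: list) -> None:
    """Turn each status event into an utterance as soon as it arrives."""
    while True:
        event = events.get()
        if event is None:
            return
        task, status = event
        spoken.append(f"{task}: {status}")


def run_demo() -> list:
    commands, events, spoken = queue.Queue(), queue.Queue(), []
    workers = [
        threading.Thread(target=task_master, args=(commands, events)),
        threading.Thread(target=dialogue_master, args=(events, spoken)),
    ]
    for worker in workers:
        worker.start()
    commands.put("fetch water")
    commands.put(None)
    for worker in workers:
        worker.join()
    return spoken
```

Because the only shared objects are the two queues, the dialogue side keeps consuming events while the task side is busy, which is the decoupling property the paragraph describes.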

[0056] S102: When the instruction type is a dialogue instruction, the dialogue master thread is invoked to input the dialogue instruction into the dialogue model, and the robot is controlled to respond with voice based on the dialogue model.

[0057] As shown in Figure 2, when the recognized instruction type is a dialogue instruction, the dialogue manager in the dialogue master thread can be called to input the dialogue instruction into the dialogue model. The output text of the dialogue model is converted into speech data through TTS (Text to Speech) technology, and the robot is controlled to make a speech response based on the speech data. After the speech response is completed, the robot is controlled to re-enter the speech listening state.

[0058] S103: When the instruction type is a task instruction, the task master thread is called to input the task instruction into the task model for task planning, and the robot is controlled to execute the task based on the planned instructions.

[0059] S104: Call the task master thread to output the task status event, call the context fusion engine in the dialogue master thread to fuse the task status event with the current dialogue context to obtain the fusion result, call the dialogue master thread to input the fusion result into the dialogue model, and control the robot to make a voice response based on the dialogue model.

[0060] Specifically, when the instruction type is a task instruction, the task master thread is invoked to generate a task instance and lifecycle state based on the task instruction; the task planning sub-thread in the task master thread is invoked to input the task instruction into the task model for task planning, and generate planning instructions based on the task planning results; the task execution sub-thread in the task master thread is invoked to control the robot hardware or call an external API (Application Programming Interface) to execute the task based on the planning instructions.

[0061] As shown in Figure 2, the identified task instruction first enters the Task Instance Manager in the task master thread. The Task Instance Manager determines whether the task instruction creates a new task or modifies an existing task (such as an interruption or parameter correction), and creates or updates the corresponding task instance object (task number and lifecycle status).
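The create-or-update behaviour of the Task Instance Manager might look like the sketch below. The class layout, the `handle` method, and the lifecycle labels `"created"` and `"updated"` are hypothetical; the text only states that an instance carries a number and a lifecycle status.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class TaskInstance:
    task_id: int                  # the instance "number"
    goal: str
    params: dict = field(default_factory=dict)
    lifecycle: str = "created"    # hypothetical lifecycle label


class TaskInstanceManager:
    """Creates a new task instance or updates an existing one."""

    def __init__(self) -> None:
        self._next_id = count(1)
        self.tasks: dict[int, TaskInstance] = {}

    def handle(self, goal: str, params: Optional[dict] = None,
               task_id: Optional[int] = None) -> TaskInstance:
        params = params or {}
        existing = self.tasks.get(task_id) if task_id is not None else None
        if existing is not None:
            # Modification of an existing task: merge the corrected parameters.
            existing.params.update(params)
            existing.lifecycle = "updated"
            return existing
        # Otherwise the instruction creates a new task instance.
        instance = TaskInstance(next(self._next_id), goal, dict(params))
        self.tasks[instance.task_id] = instance
        return instance
```

A follow-up such as "The water should be at room temperature" would then map to a call like `handle("fetch water", {"temperature": "room temperature"}, task_id=1)`, which updates the running instance instead of creating a second one.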

[0062] The task instance triggers a task planning sub-thread (slow planning), which calls a complex planning model with deep reasoning capabilities to generate a specific execution plan (such as map path planning or robotic arm motion sequence) based on environmental constraints. This process may take a long time (several seconds to tens of seconds).

[0063] Once the planning is complete, instructions are sent to the task execution sub-thread to drive the robot hardware or call external APIs to perform actual physical actions.
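The hand-off from the planning sub-thread to the execution sub-thread can be sketched with a small thread pool. Both `plan` and `execute` are placeholders: the real planner would call the complex planning model and the real executor would drive hardware or an external API, neither of which is specified here.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(goal: str) -> list[str]:
    """Placeholder for the slow planning model; returns a fixed step list."""
    return [f"locate target for '{goal}'", "move to target", "grasp", "return to user"]


def execute(steps: list[str]) -> list[str]:
    """Placeholder for hardware or external API calls; records what ran."""
    return [f"done: {step}" for step in steps]


def run_task(goal: str) -> list[str]:
    with ThreadPoolExecutor(max_workers=2) as pool:
        planned = pool.submit(plan, goal)  # planning sub-thread
        # The execution sub-thread blocks until the plan is available.
        executed = pool.submit(lambda: execute(planned.result()))
        return executed.result()
```

The caller of `run_task` is the only party that waits; in the architecture described above that caller would be the task master thread, so the dialogue master thread stays free during the several seconds that planning may take.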

[0064] As shown in Figure 2, at each key node of task planning and execution (such as "planning", "planning completed", "task execution", "execution exception", "task end"), the task planning sub-thread and the task execution sub-thread do not generate speech directly; instead, they send structured task status events to the bus.

[0065] During task planning and execution, the task master thread outputs the task's lifecycle state information as task status events. The dialogue master thread dynamically injects these task status events into the current dialogue context through the Context Fusion Engine, enabling the system to generate corresponding voice feedback based on the actual task progress. This continuously communicates the system status to the user during task execution, achieving process-aware interaction in a screenless voice environment. In this embodiment, task status events may include states such as planning, planning completed, task execution, execution exception, and task completion.
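A structured task status event and the bus that carries it could be modelled as below. The five status values come from the text; the field names, the `EventBus` class, and its synchronous delivery are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class TaskStatus(Enum):
    PLANNING = "planning"
    PLANNING_COMPLETED = "planning completed"
    TASK_EXECUTION = "task execution"
    EXECUTION_EXCEPTION = "execution exception"
    TASK_COMPLETION = "task completion"


@dataclass(frozen=True)
class TaskStatusEvent:
    task_id: int          # hypothetical field: which task instance the event is about
    status: TaskStatus
    detail: str = ""      # hypothetical free-text detail for the dialogue side


class EventBus:
    """Minimal synchronous publish/subscribe bus."""

    def __init__(self) -> None:
        self._handlers: list[Callable[[TaskStatusEvent], None]] = []

    def subscribe(self, handler: Callable[[TaskStatusEvent], None]) -> None:
        self._handlers.append(handler)

    def publish(self, event: TaskStatusEvent) -> None:
        for handler in self._handlers:
            handler(event)
```

In this sketch the context fusion engine would register itself with `subscribe`, and the planning and execution sub-threads would only ever call `publish`, which keeps speech generation on the dialogue side.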

[0066] Because the task execution sub-threads are independent and parallel, while the robot is performing physical actions (such as moving), the user can initiate voice input again at any time (such as "Could you bring me an apple?" or "The water should be at room temperature"). The new voice input will then undergo text conversion before entering the semantic recognition step. The parameters of the running task will be updated through the task instance manager, which not only achieves multi-task parallelism but also enables dynamic adjustments during task execution.

[0067] This embodiment assigns independent task instances and lifecycle states to different tasks through a task master thread. Within the same dialogue session, the system supports multiple task instances existing in parallel and allows users to add, switch, pause, or terminate any task via voice commands. The dialogue master thread can reference different task instances within the same context, thereby achieving multi-task management at the natural language level and improving the system's adaptability to real-world, complex interactive scenarios.

[0068] Specifically, when the instruction type is a multi-task instruction, the task master thread is called to input the multi-task instruction into the task model for multi-task planning. Based on the multi-task planning results, multi-task planning instructions are generated, and the robot is controlled to perform tasks based on the multi-task planning instructions.

[0069] This embodiment sets up a task arbitration strategy with dialogue interaction as the high priority. When a new user voice input or a clear control intention is detected, the dialogue master thread has the right to preempt the task master thread, and can trigger operations such as task addition, switching, pausing, or termination. Through this mechanism, it is ensured that the robot maintains its ability to respond promptly to user input at any stage of task execution, avoiding the weakening of the controllability and safety of human-computer interaction due to task execution consuming system resources.
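The preemption right described here can be illustrated with a cooperative stop flag that the task checks between steps. The `PreemptibleTask` class and its method names are hypothetical, and only termination is shown; pausing or switching would need additional flags.

```python
import threading


class PreemptibleTask:
    """A task that checks for a termination request before every step."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.completed = []
        self._stop = threading.Event()

    def terminate(self) -> None:
        """Would be called from the dialogue master thread on a control intent."""
        self._stop.set()

    def run(self, on_step=None) -> str:
        for step in self.steps:
            if self._stop.is_set():
                return "terminated"
            self.completed.append(step)
            if on_step is not None:
                on_step(step)
        return "completed"
```

For a deterministic demonstration, the termination request can be issued from the `on_step` callback, for example `task.run(on_step=lambda s: task.terminate() if s == "move" else None)`; in a running system the same `terminate` call would arrive from another thread, which is why the flag is a `threading.Event`.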

[0070] Specifically, when the instruction type is a task generation instruction, the task master thread is invoked to input the task generation instruction into the task model for task planning, and the robot is controlled to execute the task based on the planned instructions.

[0071] When the instruction type is a task modification instruction, the task master thread is invoked to input the task modification instruction into the task model for task replanning, and the robot is controlled to execute the task based on the replanned instruction.

[0072] When the instruction type is a task switching instruction, the task master thread is invoked to input the task switching instruction into the task model for task replanning, and the robot is controlled to execute the task based on the replanning instruction.

[0073] When the instruction type is a task pause instruction, the task master thread is invoked to control the pause of task execution based on the task pause instruction.

[0074] When the instruction type is a task termination instruction, the task master thread is invoked to control the termination of task execution based on the task termination instruction.
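The five task instruction subtypes above reduce to a small dispatch table: generation leads to planning, modification and switching lead to replanning, and pause and termination are handled by the task master thread without consulting the task model. The enum, the `ACTIONS` mapping, and the action labels are illustrative names only.

```python
from enum import Enum


class TaskInstruction(Enum):
    GENERATION = "generation"
    MODIFICATION = "modification"
    SWITCHING = "switching"
    PAUSE = "pause"
    TERMINATION = "termination"


# Hypothetical action labels for what the task master thread does next.
ACTIONS = {
    TaskInstruction.GENERATION: "plan",
    TaskInstruction.MODIFICATION: "replan",
    TaskInstruction.SWITCHING: "replan",
    TaskInstruction.PAUSE: "pause",
    TaskInstruction.TERMINATION: "terminate",
}


def needs_task_model(kind: TaskInstruction) -> bool:
    """Only planning and replanning involve the task model."""
    return ACTIONS[kind] in ("plan", "replan")
```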

[0075] As shown in Figure 2, the context fusion engine acts as a bridge connecting the two threads, receiving "task status events" from the task side in real time and merging them with the current "dialogue context" and "user intent".

[0076] For example, if the user has just given the instruction but the task has not yet been planned, the context generated by the fusion engine will be "User intent = fetch water, Task status = planning".

[0077] The merged information is input into the dialogue manager, which determines the current interaction behavior based on a priority strategy, for example:

[0078] If the task is being planned, the robot is controlled to maintain the conversation by “filling in the gaps”.

[0079] If an anomaly occurs during the task, the robot will proactively initiate an inquiry.

[0080] The dialogue manager calls the dialogue model to respond, and the dialogue model quickly generates natural spoken responses based on the fused context.

[0081] For example, in scenario A (planning): the dialogue model generates: "Okay, I'm confirming the location of the kitchen, please wait." (instant response, eliminating silence).

[0082] For example, in scenario B (planning complete): the dialogue model generates: "Found it, I'll go to the kitchen to get you some water now." (state synchronization).
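The priority strategy and the two scenarios above can be sketched as follows in Python. This is an illustration only: the behavior labels, the status strings, and the wording of the inquiry are hypothetical, and checking for exceptions first is an assumption about the priority order. The dialogue model is replaced here by fixed templates taken from scenarios A and B.

```python
def choose_behavior(task_status):
    """Map a task status to an interaction behavior (hypothetical labels)."""
    if task_status == "execution_exception":
        return "inquire"      # anomaly: proactively ask the user (assumed top priority)
    if task_status == "planning":
        return "fill_gap"     # keep the conversation going while planning runs
    if task_status == "planning_complete":
        return "sync_state"   # tell the user what happens next
    return "listen"


def respond(user_intent, task_status):
    """Return the spoken reply for the fused context; templates stand in for the dialogue model."""
    templates = {
        "fill_gap": "Okay, I'm confirming the location of the kitchen, please wait.",
        "sync_state": "Found it, I'll go to the kitchen to get you some water now.",
        # Hypothetical wording; the source does not give an inquiry example.
        "inquire": f"I ran into a problem while trying to {user_intent}. What should I do?",
    }
    return templates.get(choose_behavior(task_status), "")
```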

[0083] After the voice broadcast is completed, the robot can re-enter the listening state to achieve closed-loop feedback and continuous interaction.

[0084] The dialogue model in this embodiment can be a fast interaction model, also known as a Fast LLM (Fast Large Language Model). The dialogue master thread calls the fast interaction model, which targets low-latency inference and is used for natural language understanding, real-time voice response, and maintaining casual conversation. The task model in this embodiment can be a complex planning model, also known as a slow planning model, which the task master thread calls during the task planning phase. This model enables stronger reasoning capabilities or a thinking mode in order to generate multi-step task plans and handle environmental constraints and abnormal situations. Through this layered design, fast response and highly reliable planning run in parallel, overcoming the limitation of existing approaches in which a single model cannot simultaneously deliver low latency and high accuracy.
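The parallel arrangement of a fast dialogue path and a slow planning path can be sketched in Python with a worker thread and a queue. Both models are replaced by stubs, and the names slow_planner, fast_reply, and interact are hypothetical; the sketch only shows how status events emitted by the planning side might drive immediate spoken replies on the dialogue side.

```python
import queue
import threading


def slow_planner(instruction, events):
    """Stub for the slow planning model; reports its status through a queue."""
    events.put("planning")
    # A real planner would spend most of its time here producing a multi-step plan.
    events.put("planning_complete")
    events.put(None)  # sentinel: no more events


def fast_reply(status):
    """Stub for the fast interaction model; returns a reply for each status."""
    replies = {
        "planning": "Okay, I'm confirming the location of the kitchen, please wait.",
        "planning_complete": "Found it, I'll go to the kitchen to get you some water now.",
    }
    return replies[status]


def interact(instruction):
    """Run planning in a worker thread while the caller keeps responding."""
    events = queue.Queue()
    worker = threading.Thread(target=slow_planner, args=(instruction, events))
    worker.start()
    spoken = []
    while (status := events.get()) is not None:
        spoken.append(fast_reply(status))
    worker.join()
    return spoken
```

Because the planner only communicates through the queue, the dialogue side never waits on the plan itself, only on the next status event.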

[0085] Based on the above embodiments, the present invention decouples dialogue interaction from task planning and execution, and introduces a hierarchical collaborative mechanism between the dialogue model and the task planning model, thereby enabling dialogue to remain available, state to be perceived, and interaction to be interrupted during task execution, thus significantly improving the human-computer interaction experience of robots in real-world scenarios.

[0086] Referring to Figure 3, which is a structural block diagram of a robot interaction device provided in an embodiment of the present invention, the device may include:

[0087] The first module 100 is used to acquire user voice, convert user voice into text, and perform semantic recognition on the text to determine the instruction type of the text.

[0088] The second module 200 is used to call the dialogue master thread to input the dialogue instruction into the dialogue model when the instruction type is a dialogue instruction, and to control the robot to make a voice response based on the dialogue model.

[0089] The third module 300 is used to call the task master thread to input the task instruction into the task model for task planning when the instruction type is a task instruction, and to control the robot to perform the task based on the planned instruction.

[0090] The fourth module 400 is used to call the task master thread to output task status events, call the context fusion engine in the dialogue master thread to fuse the task status events with the current dialogue context to obtain the fusion result, call the dialogue master thread to input the fusion result into the dialogue model, and control the robot to make voice responses based on the dialogue model.

[0091] Based on the above embodiments, the present invention decouples dialogue interaction from task planning and execution, and introduces a hierarchical collaborative mechanism between the dialogue model and the task planning model, thereby enabling dialogue to remain available, state to be perceived, and interaction to be interrupted during task execution, thus significantly improving the human-computer interaction experience of robots in real-world scenarios.

[0092] Based on the above embodiments, the third module 300 may include:

[0093] The first unit is used to call the task master thread to input the task generation instruction into the task model for task planning when the instruction type is task generation instruction, and to control the robot to perform the task based on the planned instruction.

[0094] The second unit is used to call the task master thread to input the task modification instruction into the task model for task replanning when the instruction type is task modification instruction, and to control the robot to perform task execution based on the replanning instruction.

[0095] The third unit is used to call the task master thread to input the task switching instruction into the task model for task replanning when the instruction type is task switching instruction, and to control the robot to perform task execution based on the replanning instruction.

[0096] The fourth unit is used to call the task master thread to pause task execution based on the task pause instruction when the instruction type is a task pause instruction.

[0097] The fifth unit is used to call the task master thread to terminate task execution based on the task termination instruction when the instruction type is a task termination instruction.

[0098] Based on the above embodiments, the third module 300 may include:

[0099] The sixth unit is used to, when the instruction type is a task instruction, call the task master thread to generate a task instance and lifecycle state based on the task instruction; call the task planning sub-thread in the task master thread to input the task instruction into the task model for task planning, and generate planning instructions based on the task planning results; and call the task execution sub-thread in the task master thread to control the robot hardware or call external APIs to execute tasks based on the planning instructions.
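The sequence of task instance creation, planning sub-thread, and execution sub-thread can be sketched as follows in Python. The TaskInstance fields, the lifecycle state names, and the generated plan steps are hypothetical; the task model and the robot hardware or external API calls are replaced by stubs.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class TaskInstance:
    """Hypothetical task instance carrying its own lifecycle state."""
    instruction: str
    state: str = "created"
    plan: list = field(default_factory=list)
    log: list = field(default_factory=list)


def planning_worker(task):
    """Stub for the planning sub-thread: fills in the plan."""
    task.state = "planning"
    task.plan = [f"move_to:{task.instruction}", f"do:{task.instruction}"]
    task.state = "planned"


def execution_worker(task):
    """Stub for the execution sub-thread: 'executes' each planned step."""
    task.state = "executing"
    for step in task.plan:
        task.log.append(step)  # stand-in for hardware control or an external API call
    task.state = "completed"


def run_task(instruction):
    """Create a task instance, then run planning and execution in sub-threads."""
    task = TaskInstance(instruction)
    for worker in (planning_worker, execution_worker):
        t = threading.Thread(target=worker, args=(task,))
        t.start()
        t.join()  # execution starts only after planning has finished
    return task
```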

[0100] Based on the above embodiments, the third module 300 may include:

[0101] The seventh unit is used to call the task master thread to input the multi-task instruction into the task model for multi-task planning when the instruction type is a multi-task instruction. Based on the multi-task planning result, it generates multi-task planning instructions and controls the robot to perform tasks based on the multi-task planning instructions.

[0102] Based on the above embodiments, the second module 200 may include:

[0103] The eighth unit is used to convert the output text of the dialogue model into speech data, and to control the robot to make speech responses based on the speech data.

[0104] The ninth unit is used to control the robot to re-enter the voice listening state after the voice reply is completed.

[0105] Based on the above embodiments, the dialogue model is a fast interaction model, and the task model is a complex planning model.

[0106] Based on the above embodiments, task status events include planning, planning completed, task execution, execution exception, and task completion.
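The five task status events listed above map naturally onto an enumeration. A minimal Python sketch follows, in which the class name TaskStatusEvent and the member names are hypothetical and the values are the event names given in this paragraph.

```python
from enum import Enum


class TaskStatusEvent(Enum):
    PLANNING = "planning"
    PLANNING_COMPLETED = "planning completed"
    TASK_EXECUTION = "task execution"
    EXECUTION_EXCEPTION = "execution exception"
    TASK_COMPLETION = "task completion"


# Look up an event by the name reported from the task side.
event = TaskStatusEvent("execution exception")
```

Using an enumeration rather than free-form strings lets the dialogue side reject unknown event names early, since TaskStatusEvent("unknown") raises ValueError.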

[0107] Based on the above embodiments, the present invention also provides an electronic device, which may include a memory and a processor. The memory stores a computer program, and when the processor calls the computer program in the memory, it can implement the steps provided in the above embodiments. Of course, the device may also include various necessary network interfaces, a power supply, and other components.

[0108] The present invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by an execution terminal or processor, can implement the method provided in the embodiments of the present invention; the storage medium may include various media capable of storing program code, such as a USB flash drive, a portable hard drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

[0109] In this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, without necessarily requiring or implying any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

Claims

1. A robot interaction method, characterized by comprising: acquiring user voice, converting the user voice into text, and performing semantic recognition on the text to determine the instruction type of the text; when the instruction type is a dialogue instruction, invoking the dialogue master thread to input the dialogue instruction into the dialogue model, and controlling the robot to make a voice response based on the dialogue model; when the instruction type is a task instruction, invoking the task master thread to input the task instruction into the task model for task planning, and controlling the robot to execute the task based on the planned instruction; and invoking the task master thread to output a task status event, invoking the context fusion engine in the dialogue master thread to fuse the task status event with the current dialogue context to obtain a fusion result, and invoking the dialogue master thread to input the fusion result into the dialogue model and control the robot to make a voice response based on the dialogue model.

2. The robot interaction method according to claim 1, characterized in that, when the instruction type is a task instruction, invoking the task master thread to input the task instruction into the task model for task planning and controlling the robot to execute the task based on the planned instruction comprises: when the instruction type is a task generation instruction, invoking the task master thread to input the task generation instruction into the task model for task planning, and controlling the robot to execute the task based on the planning instruction; when the instruction type is a task modification instruction, invoking the task master thread to input the task modification instruction into the task model for task replanning, and controlling the robot to execute the task based on the replanning instruction; when the instruction type is a task switching instruction, invoking the task master thread to input the task switching instruction into the task model for task replanning, and controlling the robot to execute the task based on the replanning instruction; when the instruction type is a task pause instruction, invoking the task master thread to pause task execution based on the task pause instruction; and when the instruction type is a task termination instruction, invoking the task master thread to terminate task execution based on the task termination instruction.

3. The robot interaction method according to claim 1, characterized in that, when the instruction type is a task instruction, invoking the task master thread to input the task instruction into the task model for task planning and controlling the robot to execute the task based on the planned instruction comprises: when the instruction type is the task instruction, invoking the task master thread to generate a task instance and lifecycle state based on the task instruction; invoking the task planning sub-thread in the task master thread to input the task instruction into the task model for task planning, and generating the planning instruction based on the task planning result; and invoking the task execution sub-thread in the task master thread to control the robot hardware or call an external API to execute the task based on the planning instruction.

4. The robot interaction method according to claim 1, characterized in that, when the instruction type is a task instruction, invoking the task master thread to input the task instruction into the task model for task planning and controlling the robot to execute the task based on the planned instruction comprises: when the instruction type is a multi-task instruction, invoking the task master thread to input the multi-task instruction into the task model for multi-task planning, generating a multi-task planning instruction based on the multi-task planning result, and controlling the robot to execute the task based on the multi-task planning instruction.

5. The robot interaction method according to claim 1, characterized in that controlling the robot to make a voice response based on the dialogue model comprises: converting the output text of the dialogue model into speech data, and controlling the robot to make a voice response based on the speech data; and after the voice response is completed, controlling the robot to re-enter the voice listening state.

6. The robot interaction method according to claim 1, characterized in that the dialogue model is a fast interaction model, and the task model is a complex planning model.

7. The robot interaction method according to claim 1, characterized in that the task status events include planning, planning completed, task execution, execution exception, and task completion.

8. A robot interaction device, characterized by comprising: a first module, used to acquire user voice, convert the user voice into text, and perform semantic recognition on the text to determine the instruction type of the text; a second module, used to call the dialogue master thread to input the dialogue instruction into the dialogue model when the instruction type is a dialogue instruction, and to control the robot to make a voice response based on the dialogue model; a third module, used to call the task master thread to input the task instruction into the task model for task planning when the instruction type is a task instruction, and to control the robot to perform the task based on the planned instruction; and a fourth module, used to call the task master thread to output task status events, call the context fusion engine in the dialogue master thread to fuse the task status events with the current dialogue context to obtain a fusion result, call the dialogue master thread to input the fusion result into the dialogue model, and control the robot to make voice responses based on the dialogue model.

9. An electronic device, characterized by comprising: a memory, used to store a computer program; and a processor, used to implement the robot interaction method according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the robot interaction method according to any one of claims 1 to 7.