A robot and an intelligent interaction method thereof
By combining a large model with a knowledge graph architecture, the exhibition hall guide robot has achieved intelligent human-computer interaction and autonomous planning, solving the problem that existing robots cannot provide personalized services and improving the user experience of the exhibition hall.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: BAIZHI EMBODIMENT (BEIJING) TECHNOLOGY CO LTD
- Filing Date: 2025-08-07
- Publication Date: 2026-06-26
Figure: abstract drawing (CN120949939B_ABST)
Abstract
Description
Technical Field
[0001] This invention relates to the field of robotics, and in particular to a robot for guiding or explaining in exhibition halls, and its intelligent interaction method.
Background Technology
[0002] The exhibition industry is developing rapidly, with social demand growing quickly. However, the training costs for exhibition staff are high and the training period is long. The quality of human guides varies, and it is difficult to meet personalized visitor needs. Existing technology has gradually introduced guided tour robots into the exhibition service field. However, while existing guided tour robots can ensure standardized and uniform explanations, they struggle to intelligently complete tasks beyond just explanations. The explanations are not engaging enough, lack human-computer interaction, and are unable to respond quickly and accurately to visitor questions. They lack specificity and personalization, showing a significant gap compared to human services and thus lowering the overall exhibition service experience.
Summary of the Invention
[0003] In view of the above-mentioned defects or deficiencies in the prior art, the present invention provides a robot and its intelligent interaction method, which adopts an architecture combining large model and knowledge graph to drive various deep learning algorithms, and has functions such as image recognition, speech recognition, large language model analysis and processing, speech synthesis and multimodal fusion perception, so as to realize human-computer interaction and autonomous planning and intelligent control of the robot body.
[0004] One aspect of the present invention provides an intelligent interaction method for a robot, comprising: acquiring external command information of the robot and converting the external command information into text information; performing vectorization processing and similarity comparison on the text information and keywords in a preset task list through a large model, and identifying the tasks corresponding to keywords whose similarity reaches or exceeds a preset threshold as regular target tasks; if the similarity between the text information and the keywords in the preset task list is all below the preset threshold, then identifying the task corresponding to the text information as an unconventional target task; decomposing the regular target task or the unconventional target task into one or more sub-tasks, identifying the type of each sub-task, and determining a solution matching each sub-task according to the type of each sub-task, wherein the solution includes the execution tool, execution order, and priority of each sub-task; supervising and controlling the execution of each sub-task according to the execution tool, execution order, and priority of each sub-task, and collecting the running status information and hardware running status information of each sub-task; caching key data during the execution of each sub-task, and providing a knowledge graph database and a preset task list to the large model, as well as backing up the key data; and making management decisions on the acquisition of external commands, the invocation of the large model, the decomposition of regular and unconventional target tasks, and the execution of each sub-task.
[0005] In another aspect, the present invention also provides a robot, comprising: an instruction acquisition module configured to acquire external instruction information of the robot and convert the external instruction information into text information; an artificial intelligence module configured to perform vectorization processing and similarity comparison on the text information and keywords in a preset task list using a large model, to identify tasks corresponding to keywords whose similarity reaches or exceeds a preset threshold as regular target tasks, and, if the similarity between the text information and all keywords in the preset task list is below the preset threshold, to identify the task corresponding to the text information as an unconventional target task; a task decomposition module configured to decompose the regular target task or the unconventional target task into one or more sub-tasks, identify the type of each sub-task, and determine a solution matching each sub-task according to its type, the solution including the execution tool, execution order, and priority of each sub-task; a task execution module configured to supervise and control the execution of each sub-task according to its execution tool, execution order, and priority, and to collect the running status information and hardware running status information of each sub-task; a memory module configured to cache key data during the execution of each sub-task, provide the large model with a knowledge graph database and a preset task list, and back up the key data; and a planning and decision module configured to make management decisions on the acquisition of external instructions, the invocation of the large model, the decomposition of regular and unconventional target tasks, and the execution of each sub-task.
[0006] This invention provides a robot and its intelligent interaction method, which adopts an architecture combining large models and knowledge graphs to drive various deep learning algorithms. It has functions such as image recognition, speech recognition, large language model analysis and processing, speech synthesis, and multimodal fusion perception. It can realize human-computer interaction as well as autonomous planning and intelligent control of the robot body. It can not only complete exhibition hall explanation tasks vividly, but also accurately answer visitors' questions, whether they concern professional or non-professional knowledge. It can also autonomously perform tasks other than explanation, such as greeting guests, which greatly improves the user's exhibition hall service experience.
Attached Figure Description
[0007] Other features, objects, and advantages of this application will become more apparent from the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
[0008] Figure 1 is a logical structure diagram of a robot provided in one embodiment of this application;
[0009] Figure 2 is a business logic diagram of a robot provided in one embodiment of this application;
[0010] Figure 3 is a first flowchart of a robot intelligent interaction method provided in one embodiment of this application;
[0011] Figure 4 is a second flowchart of a robot intelligent interaction method provided in one embodiment of this application;
[0012] Figure 5 is a flowchart of the subtask decomposition of a robot intelligent interaction method provided in one embodiment of this application;
[0013] Figure 6 is a first flowchart of the proactive greeting subtask of the robot intelligent interaction method provided in one embodiment of this application;
[0014] Figure 7 is a second flowchart of the proactive greeting subtask of the robot intelligent interaction method provided in one embodiment of this application;
[0015] Figure 8 is a flowchart of the exhibition hall explanation subtask of the robot intelligent interaction method provided in one embodiment of this application.
Detailed Implementation
[0016] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0017] The terminology used in the embodiments of this invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. The singular forms “a,” “an,” and “the” as used in the embodiments of this invention and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise.
[0018] It should be understood that although the terms first, second, third, etc., may be used to describe the acquisition modules in the embodiments of the present invention, these acquisition modules should not be limited to these terms. These terms are only used to distinguish the acquisition modules from each other.
[0019] Depending on the context, the word "if" as used here can be interpreted as "when," "upon," "in response to determining," or "in response to detecting." Similarly, depending on the context, the phrase "if it is determined" or "if (the stated condition or event) is detected" can be interpreted as "when it is determined," "in response to determining," "when (the stated condition or event) is detected," or "in response to detecting (the stated condition or event)."
[0020] It should be noted that the directional terms such as "upper," "lower," "left," and "right" used in the embodiments of the present invention describe the orientations shown in the accompanying drawings and should not be construed as limiting the embodiments of the present invention. Furthermore, it should be understood that when an element is described as being formed "on" or "under" another element, it can be formed directly "on" or "under" the other element, or indirectly "on" or "under" the other element through an intermediate element.
[0021] This application describes the present invention in detail using embodied robots and intelligent interaction methods of embodied robots as examples. However, the implementation of the present invention is not limited to embodied robots, and other robots with the technical solutions of the present invention are also within the protection scope of the present invention.
[0022] The robots mentioned in this application include embodied robots, non-embodied robots, devices with intelligent human-computer interaction capabilities, etc. An embodied robot refers to an intelligent agent with a body that supports physical interaction, capable of interacting with the environment, perceiving, autonomously planning, making decisions, acting, and performing tasks like a human. The embodied robot of this invention adopts an architecture combining a large language model and a knowledge graph, driving various deep learning algorithms, and possesses functions such as image recognition, speech recognition, large language model analysis and processing, speech synthesis, and multimodal fusion perception, realizing human-computer interaction and autonomous planning and intelligent control of the robot itself. With the development of embodied robot technology, it is increasingly being used in the fields of exhibition hall explanation and exhibition hall service technology.
[0023] Referring to Figure 1, one embodiment of this application provides an embodied robot 100, including: an instruction acquisition module 101, an artificial intelligence module 102, a task decomposition module 103, a task execution module 104, a memory module 105, and a planning and decision-making module 106. The modules in the robot 100 can be computer program modules or hardware modules.
[0024] Specifically:
[0025] The instruction acquisition module 101 is used to acquire external instruction information of the robot and convert it into text information. The instruction acquisition module 101 includes robot perception modules such as a microphone, which detect external audio signals in real time. When the microphone receives audio, the module first determines the audio type, analyzes the volume, intensity, and duration of the audio, and eliminates environmental noise, overlapping crowd voices, and voices that are too quiet or unrecognizable. Then, the processed audio data is converted into text information using a speech recognition algorithm. This audio-to-text conversion process can be implemented in the instruction acquisition module, the robot's core controller, or through the artificial intelligence module 102.
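The audio screening step in paragraph [0025] can be illustrated with a minimal sketch. The source does not specify how volume and duration are evaluated, so the root-mean-square measure, the threshold values, and the function names `rms` and `accept_audio` are all assumptions made for illustration only.

```python
import math

def rms(samples):
    """Root-mean-square level of the samples (assumed volume measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def accept_audio(samples, sample_rate, min_rms=0.02, min_duration_s=0.3):
    """Keep audio only if it is long enough and loud enough.

    The thresholds are hypothetical defaults, not values from the source.
    """
    duration_s = len(samples) / sample_rate
    return duration_s >= min_duration_s and rms(samples) >= min_rms
```

Audio that passes such a gate would then be handed to the speech recognition algorithm; distinguishing speech from other audio types would need a separate classifier that this sketch does not attempt.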
[0026] The artificial intelligence module 102 can be a processor or server based on a large model. In this embodiment, the large model is a multimodal model or a large language model. The artificial intelligence module 102 uses the large model to perform vectorization processing and similarity comparison on the text information and the keywords in a preset task list. Tasks corresponding to keywords whose similarity reaches or exceeds a preset threshold are designated as regular target tasks; if the similarity between the text information and every keyword in the preset task list is below the preset threshold, the task corresponding to the text information is determined to be an unconventional target task. Running the large model requires significant computing resources to ensure the accuracy and stability of the analysis results, and the processing time should be as short as possible for a smooth response. Therefore, besides deployment on the robot itself, the preferred solution is to deploy the large model on a remote server. In that case, after the speech detection function recognizes a human voice and converts it into text information, the text information is sent through a web service over the HTTP protocol to the remotely deployed large model server for analysis and processing.
[0027] Furthermore, the preset task list is pre-generated and includes target task keywords and sub-task keywords. Each keyword corresponds to an execution tool for the target task or sub-task. The execution tool includes program functional units that implement the task or APIs that implement the task.
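The threshold rule in paragraphs [0026] and [0027] can be sketched as follows. In the source, vectorization is performed by the large model; here a toy word-count vector stands in for it so the example stays self-contained. The task list contents, the tool names, the threshold value, and the functions `embed`, `cosine`, and `classify` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical preset task list: task keyword -> name of its execution tool.
PRESET_TASKS = {
    "proactively greet guests": "run_greeting",
    "exhibition hall explanation": "run_explanation",
}

def embed(text):
    """Toy stand-in for the large model's vectorization: word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(text, threshold=0.5):
    """Return ("regular", keyword) or ("unconventional", None)."""
    query = embed(text)
    best_keyword, best_sim = None, 0.0
    for keyword in PRESET_TASKS:
        sim = cosine(query, embed(keyword))
        if sim > best_sim:
            best_keyword, best_sim = keyword, sim
    if best_keyword is not None and best_sim >= threshold:
        return "regular", best_keyword
    return "unconventional", None
```

With a real embedding model, synonyms such as "welcome" and "greet" would also score highly, which word counts cannot capture.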
[0028] The task decomposition module 103 is used to decompose a regular target task or an unconventional target task into one or more sub-tasks and determine a solution that matches each sub-task. The solution defines the execution tools, execution order and priority of each sub-task.
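One way to picture the solution described in paragraph [0028] is as a small record per sub-task plus a sort. This is only a sketch: the `SubTask` fields, the convention that a larger priority value is more important, and the `plan` function are assumptions, not structures given in the source.

```python
from dataclasses import dataclass

@dataclass
class SubTask:
    name: str
    tool: str      # name of the execution tool (program unit or API)
    order: int     # position in the execution sequence
    priority: int  # assumption: a larger value means more important

def plan(subtasks):
    """Sort by execution order, breaking ties by descending priority."""
    return sorted(subtasks, key=lambda s: (s.order, -s.priority))
```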
[0029] Specifically, the analysis results of external voice commands are defined as target tasks. Target tasks that exist in the preset task list are regular target tasks, while those that do not are unconventional target tasks. The operations of the related functions needed to complete a target task are defined as subtasks. For each subtask, a corresponding monitoring task is generated to collect information on the hardware operation involved in the execution of the subtask, such as the position of the robotic arm and resource usage, as well as the subtask's running status, such as task progress and heartbeat messages. The collected data is fed back to the planning and decision-making module 106 and the memory module 105, so that when an anomaly occurs in the subtask, it can be handled promptly.
[0030] Furthermore, for a regular target task, the task decomposition module 103 directly matches the subtasks corresponding to that task in the preset task list and determines the type of each subtask. If the subtask is a professional knowledge question-and-answer task, the professional knowledge explanation text in the knowledge graph database and the question text information are used as prompt information for the large model; the large model semantically searches the knowledge graph database for professional knowledge related to the question text information, generates contextually coherent sentences, and stores them in the memory module for broadcast. If the subtask is a non-professional knowledge question-and-answer task and a matching task keyword exists in the preset task list, it is further determined whether the subtask is an interactive dialogue task or a query task: for an interactive dialogue task, the large model generates multi-turn dialogue content as the solution and stores the result in the memory module for broadcast; for a query task, the relevant execution tool is matched in the preset task list according to the task keyword and its execution result is returned, after which the execution result is input into the large model to generate contextually coherent sentences, and the result is stored in the memory module for broadcast. If the subtask is a non-professional knowledge question-and-answer task and no matching task keyword exists in the preset task list, a prompt message is returned to the visitor indicating that the instruction should be entered again or that the instruction cannot be executed. If the subtask is a control task, a solution matching the subtask is searched for directly in the preset task list. If a match is found, the solution is executed; otherwise, a prompt message is returned indicating that the instruction should be entered again or that the instruction cannot be executed.
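The branching in paragraph [0030] can be summarized as a small routing function. This is a sketch under stated assumptions: the `Kind` enumeration, the returned path names, and the `route` function are invented for illustration, and the actual handling (large model calls, tool execution, broadcast) is left out.

```python
from enum import Enum

class Kind(Enum):
    PROFESSIONAL_QA = "professional_qa"  # answered from the knowledge graph
    DIALOGUE = "dialogue"                # non-professional interactive dialogue
    QUERY = "query"                      # non-professional query task
    CONTROL = "control"                  # robot control task

def route(kind, keyword, preset_tasks):
    """Return a label for the handling path of one subtask.

    preset_tasks maps task keywords to execution tools (hypothetical shape).
    """
    if kind is Kind.PROFESSIONAL_QA:
        return "answer_from_knowledge_graph"
    if keyword not in preset_tasks:
        # No matching keyword: ask the visitor to re-enter the instruction.
        return "prompt_retry"
    if kind is Kind.DIALOGUE:
        return "generate_dialogue"
    if kind is Kind.QUERY:
        return "run_tool_then_summarize"
    return "run_tool"
```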
[0031] The process of comparing input instructions and task keywords is as follows: audio signals are detected in real time. When an audio signal is received, the audio type is determined first. After recognizing human voice and converting it into text information, it is transmitted to the large model via Web service using protocols such as HTTP. The large model first performs word segmentation on the text, and then compares the similarity between the segmented words (tokens) and the keywords in the preset task list after vectorization. The one with the highest similarity is then determined as the task keyword. The program function module and API that implement the sub-task corresponding to the task keyword are used as the execution tools in the solution of the selected sub-task.
[0032] For unconventional target tasks, the preset task list information is used as a prompt for the large model. The large model, drawing on prior knowledge, generates a name and a temporary solution for each subtask. The task decomposition operation is then performed according to the task decomposition process for regular target tasks. If any subtask generated from the unconventional target task does not have a matching task keyword in the preset task list, a prompt message indicating that the user should re-enter the command or that the command cannot be executed is returned to the visitor.
[0033] Furthermore, the task decomposition module 103 is also used to match sensitive words in the dialogue with filter words in a pre-set filter word library for interactive dialogue tasks in non-professional knowledge question-and-answer tasks, thereby filtering or replacing the sensitive words in the answer content, or prohibiting the answer to questions containing the sensitive words.
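The filtering described in paragraph [0033] might look like the sketch below. The source does not say how matching is done, so case-insensitive substring matching, the `***` replacement string, and the function names `filter_answer` and `should_refuse` are assumptions.

```python
import re

def filter_answer(answer, filter_words, replacement="***"):
    """Replace every filter word found in the answer (case-insensitive)."""
    if not filter_words:
        return answer
    # Longer words first so that a short word cannot mask a longer one.
    ordered = sorted(filter_words, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in ordered), re.IGNORECASE)
    return pattern.sub(replacement, answer)

def should_refuse(question, filter_words):
    """True if the question itself contains any filter word."""
    lowered = question.lower()
    return any(w.lower() in lowered for w in filter_words)
```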
[0034] The task execution module 104 is used to supervise and control the execution of each subtask according to the execution tool, execution order and priority of each subtask, and to collect the running status information of each subtask and the hardware running status information.
[0035] Furthermore, the task execution module 104 is also used to monitor and control the start, pause, or termination of each subtask according to its execution order and priority. Based on the collected running status information and hardware running status information of each subtask, it determines whether the execution time of the subtask exceeds the predetermined time, or whether the number of heartbeat signals received from the subtask per unit time is lower than the threshold. If so, it stops and restarts the subtask, or updates the input parameters of the subtask and executes it again. For example, in the exhibition hall explanation task, if the explanation position is occupied, the robot tries to replan the navigation path using another explanation position (that is, new input parameters). If that position is also occupied, it terminates all subtasks included in the solution based on priority and execution order, and returns a prompt to inform the user.
[0036] Furthermore, the task execution module 104 is also used to obtain the central words of regular or unconventional target tasks and match them with pre-entered visitor information or professional knowledge related to the central words; determine the input parameters in the solution of the sub-task based on the matched visitor information or professional knowledge; and execute the solution of the sub-task based on those input parameters while performing cross-task association query analysis. For example, the central word "visitor name" is parsed from the input audio command, and the visitor name is used to match the visitor's identity and facial information against the information pre-entered into the robot. Then, input parameters of the sub-tasks, such as the explanation content, the exhibition hall, and the location of the exhibition hall, are matched to the corresponding visitor through network or tree-like data relationships. Sub-tasks such as the exhibition hall explanation can be executed with these input parameters, and cross-task association query analysis can also be performed between the exhibition hall explanation sub-task and the exhibition hall path planning sub-task.
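The supervision rule in paragraph [0035] can be sketched as a decision function. The source leaves open when to restart and when to change the input parameters, so the ordering used here (restart first, then try an alternative parameter, then terminate) is an assumption, as are the `Status` fields and the `next_action` name.

```python
from dataclasses import dataclass

@dataclass
class Status:
    elapsed_s: float  # how long the subtask has been running
    heartbeats: int   # heartbeats received in the last unit of time

def next_action(status, max_elapsed_s, min_heartbeats, restarts_left, alternatives):
    """Decide what to do with one supervised subtask.

    alternatives: remaining alternative input parameters, for example
    other explanation positions (hypothetical representation).
    """
    healthy = (status.elapsed_s <= max_elapsed_s
               and status.heartbeats >= min_heartbeats)
    if healthy:
        return ("continue", None)
    if restarts_left > 0:
        return ("restart", None)
    if alternatives:
        return ("rerun_with", alternatives[0])
    # Nothing left to try: stop every subtask in the solution.
    return ("terminate_all", None)
```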
[0037] The memory module 105 is used to cache key data during the execution of each subtask, provide the large model with a knowledge graph database and a preset task list, and perform backups of key data. The knowledge graph database serves as a data source of professional knowledge; when external instructions are professional questions, the robot uses the large model combined with the knowledge graph to obtain professional knowledge to provide an answer.
[0038] The planning and decision-making module 106 controls the operation of the instruction acquisition module 101, the artificial intelligence module 102, the task decomposition module 103, the task execution module 104, and the memory module 105. It manages and makes decisions regarding the acquisition of external instructions, the invocation of the large model, the decomposition of regular and unconventional target tasks, and the execution of each sub-task. For example, it controls the instruction acquisition module to acquire the user's natural language input, invokes the artificial intelligence module for analysis and processing, and feeds back the results of question-and-answer tasks to the visitor. Furthermore, in specific applications such as visitor behavior analysis, identity authentication, cruise path planning, and automatic obstacle avoidance, the planning and decision-making module 106 integrates information from lidar and from environmental perception modules such as visible light, infrared, and depth sensors. It drives various intelligent algorithms to perform multi-dimensional and multi-modal fusion analysis, planning and adjusting the tasks being executed in real time based on task execution status, user feedback, and environmental perception, providing more stable system performance and more accurate analysis results. Further, the planning and decision-making module 106 can specifically be a robot controller and a control program running on the controller.
[0039] Referring to Figure 2, the operation and execution process of the embodied robot 100 is divided into three stages: planning, decision-making, and execution. First, in the planning stage, the instruction acquisition module 101 uses multimodal perception technology either to perform speech recognition, converting human speech into text information, or to perform visual understanding, converting visual information into text information; the artificial intelligence module 102 is responsible for text processing and semantic understanding, converting the externally acquired instruction information into target tasks. Next, in the decision-making stage, the task decomposition module 103 breaks down the target task into several sub-tasks and matches executable solutions, while the planning and decision-making module 106 performs task classification, management, and information aggregation. Finally, in the execution stage, the task execution module 104, under the control of the planning and decision-making module 106, controls the relevant functional modules to complete task operations according to execution logic and priority, and feeds back the monitoring results of task execution to the planning and decision-making module 106 and the memory module 105. The memory module 105 caches key data and provides a knowledge graph database for use by the large model as needed.
[0040] The embodied robot in this application embodiment is used for explanation and guidance in exhibition hall services. The following is a detailed introduction to the various business modules of the embodied robot 100 through the most important tasks in exhibition hall services: proactive greeting and exhibition hall explanation.
[0041] Implementation Scenario 1
[0042] The task of proactively welcoming guests.
[0043] Visitor images and identity information are pre-entered into the graph database of the robot's memory module 105. The relevant visitor information is stored in a structured form within the graph database. The access and query interface for the graph database is integrated into the memory module 105 according to unified rules for external use. The structured storage includes two parts: "description content" and "relationships." "Description content" is the actual stored data, which in this embodiment includes multi-dimensional data such as visitor facial images (close-up photos) and full-body photos, as well as one-dimensional data (strings, variables, etc.) such as visitor name, age, visit requirements and interests, and visit plan. "Relationships" can be "edges" in a knowledge graph triple or an attribute value of data, such as a member variable in a structure. This invention constructs a connection between the visitor's name and the "description content," describing the relationships between the data through these "relationships."
[0044] For example, if a visitor is named Li Ming and their visit request is "XXX topic", by querying the information "Li Ming", relevant information can be retrieved and used as input parameters for subtasks.
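The lookup in paragraphs [0043] and [0044] can be illustrated with a handful of triples. This is a minimal sketch: a real graph database would be queried through its own interface, and the relation names, the image file name, and the `lookup` function are hypothetical.

```python
# Hypothetical (subject, relation, object) triples for one visitor.
TRIPLES = [
    ("Li Ming", "visit_request", "XXX topic"),
    ("Li Ming", "face_image", "li_ming_face.jpg"),  # invented file name
]

def lookup(name, triples):
    """Collect every relation of a visitor into a parameter dictionary."""
    return {relation: obj for subject, relation, obj in triples if subject == name}
```

Calling `lookup("Li Ming", TRIPLES)` yields a dictionary whose entries could be passed as input parameters to subtasks such as the exhibition hall explanation.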
[0045] After the information has been entered, a voice command can be given to the robot to proactively greet guests. Based on the functional modules described above, the specific process is as follows:
[0046] (1) Define the target task
[0047] The robot receives the voice command for proactively greeting guests through a microphone and converts it into text information.
[0048] The robot's voice detection function detects voice signals in real time. When the microphone receives audio, it first determines the audio type, analyzes the volume, intensity, and duration of the audio, and eliminates environmental noise, overlapping crowd voices, and voices that are too quiet or unrecognizable before transmitting it to the large model. Preferably, a multimodal large model can be used instead of a regular large language model; the multimodal large model can take audio as input, so the audio does not need to be converted to text. Running the large model requires a large amount of computing resources to ensure the accuracy and stability of the analysis results, and the processing time should be as short as possible for a smooth response. Therefore, besides deployment on the robot itself, the preferred solution is to deploy the large model on a remote server. In that case, after the voice detection function recognizes the human voice and converts it into text information, the text information is sent from the local voice device to the remotely deployed large model via a web service using protocols such as HTTP. The request is, for example, in JSON format and includes at least fields such as the request ID, the text information to be analyzed, and the ID of the large model to be called. The remote end responds to the request to indicate that it has received the request from the robot. If the robot's web service does not receive a response from the remote end within a specified time, it resends the request. If it still does not receive a response or feedback within the specified time, the message is determined to have failed and the visitor is notified by voice.
[0049] The large model is used to analyze the text information and generate a proactive greeting task. Optionally, the voice command can be a conversational phrase or a formatted phrase. For conversational phrases, the large model's text processing involves word segmentation, which converts the phrase into tokens, followed by semantic search. A task list containing task keywords is created in advance, with each keyword (task name) corresponding to a task solution. The segmented tokens are then vectorized using the large model, and their similarity to the keywords is compared to determine the target task.
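The request and resend behavior in paragraph [0048] can be sketched without any real network code. The JSON field names, the `build_request` and `send_with_retry` functions, and the convention that the injected `send` callable raises `TimeoutError` when no response arrives in time are all assumptions; the source only names the three kinds of fields and the single resend.

```python
import json
import uuid

def build_request(text, model_id):
    """Serialize a request with the three fields named in the description."""
    return json.dumps({
        "request_id": str(uuid.uuid4()),
        "text": text,
        "model_id": model_id,
    })

def send_with_retry(payload, send, attempts=2):
    """Send the payload, resending once by default.

    send(payload) is a caller-supplied transport (hypothetical) that
    returns the response or raises TimeoutError. Returns the response,
    or None if every attempt timed out.
    """
    for _ in range(attempts):
        try:
            return send(payload)
        except TimeoutError:
            continue
    return None
```

A `None` result corresponds to the failure case, after which the robot would notify the visitor by voice.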
[0050] For example, the colloquial instruction is "Go to the door to greet XXX". The word "greet" has the highest similarity to the keyword "proactively greet guests" in the task list and exceeds the similarity threshold. Therefore, the target task is determined to be "proactively greet guests".
[0051] It should be noted that this embodiment calculates similarity based on the vectorized feature distance. The advantage of doing so is that it can accommodate the processing of synonyms and similar words. This invention defines a target task consisting of subtasks from a preset task list as a regular target task. Preferably, a formatted instruction statement is used, such as "Please perform the active greeting task to greet Mr. / Ms. XXX!" This description is more precise, can optimize the length of the input token, reduce computing resources, and improve the reliability and stability of feedback from large models.
[0052] (2) Decompose the target task into several sub-tasks
[0053] The proactive greeting target task has been pre-stored in the preset task list in the memory module 105. Therefore, the proactive greeting target task is a regular target task. The preset task list also contains several sub-tasks of proactive greeting, including but not limited to ① cruising to the greeting location; ② visitor identity authentication; ③ greeting; ④ recognizing the visitor's voice commands, etc.
[0054] The system detects audio signals in real time. Upon receiving an audio signal, it first determines the audio type. After recognizing the human voice and converting it into text information, it transmits the text to the large model via a web service using protocols such as HTTP. The large model first segments the text into words, then compares the similarity between the segmented words (tokens) and the keywords in the preset task list after vectorization. The task with the highest similarity is then identified as the target task in the list. The four sub-tasks mentioned above are then found through the matched target task.
[0055] Next, if all four subtasks are identified as control tasks, the program function unit that matches the control subtask is directly searched in the preset task list. The start, pause, or termination of each subtask is monitored and controlled according to the execution order and priority of each subtask.
[0056] (3) Execute sub-tasks
[0057] The task execution module 104 determines, based on the collected running status information and hardware running status information of each subtask, whether the execution time of each subtask exceeds the predetermined time, or whether the heartbeat signal received by the subtask per unit time is lower than the threshold. If so, it stops and restarts the subtask, or updates the input parameters of the subtask and executes it again. Specifically, this includes:
[0058] ① Perform the sub-task of cruising to the greeting location.
[0059] Beforehand, the robot is driven through the entire exhibition hall, continuously collecting data with its inertial measurement unit (IMU), LiDAR, cameras, and other devices to generate an environmental map of the hall. One or more greeting locations, such as suitable spots on either side of the entrance, are pre-defined and assigned location IDs. The robot is then driven to each of these locations, and its coordinates on the environmental map are recorded using the IMU and other means.
[0060] When performing an active greeting, the robot obtains its current position via IMU sensors, selects the nearest greeting location, and plans the shortest path on the environmental map based on these two locations. The robot then moves to the greeting location according to this shortest path. Simultaneously, during navigation, automatic obstacle avoidance is activated. Multimodal perception modules, including LiDAR and depth cameras, detect the size, distance, and direction of moving objects. If the distance to a moving object is below a safe distance threshold, or if the robot is judged to be moving relative to an object at a high speed exceeding a safe speed threshold, the robot will either wait or reduce its speed. Upon reaching the vicinity of the designated greeting location, the multimodal sensors, including LiDAR and depth cameras, determine if the location is occupied. If occupied, the robot waits for 1-2 minutes (the waiting time can be adjusted according to the actual situation). If still occupied, an adjacent greeting location is selected, the path is recalculated, and the robot reaches the designated location and waits in a safe area.
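The selection of a greeting location described above might look like the following sketch. The location IDs, coordinates, and the `is_occupied` callback are hypothetical, and the waiting period before falling back is omitted for brevity.

```python
import math

# Hypothetical greeting locations: location ID -> (x, y) map coordinates in metres.
GREETING_LOCATIONS = {"G1": (0.0, 2.0), "G2": (0.0, -2.0), "G3": (5.0, 0.0)}

def nearest_location(position, locations, exclude=()):
    """Return the ID of the location closest to `position`, skipping excluded IDs."""
    candidates = {k: v for k, v in locations.items() if k not in exclude}
    if not candidates:
        return None
    return min(candidates, key=lambda k: math.dist(position, candidates[k]))

def choose_greeting_location(position, locations, is_occupied):
    """Pick the nearest location; if occupied, fall back to the location adjacent to it."""
    target = nearest_location(position, locations)
    if target is not None and is_occupied(target):
        # In the described flow the robot first waits 1-2 minutes and re-checks.
        return nearest_location(locations[target], locations, exclude=(target,))
    return target
```

For example, a robot at `(1.0, 1.5)` would head for `G1`, and for `G2` if `G1` remained occupied.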
[0061] ② Execute the visitor identity authentication subtask
[0062] During the aforementioned movement, the robot's head-mounted camera detects and identifies visitors. Specifically, the camera captures images of visitors, facial recognition technology extracts the facial region from the camera view, and the extracted face is compared with the facial images pre-recorded in the system to confirm the visitor's identity. It should be noted that in this example the "visitor identity authentication" subtask runs in parallel with the "cruising to the greeting location" subtask and has the higher priority. Once the robot encounters and identifies the visitor while moving, it can skip the "cruising to the greeting location" subtask and proceed to the next step.
[0063] ③ Perform the greeting sub-task
[0064] By combining multimodal data from visible light cameras, infrared depth cameras, and LiDAR, the robot intelligently senses and determines the distance and location of visitors. At a suitable safe distance, the robot adjusts its posture, detects the position of the visitor's face in the frame using its eye cameras, turns its head to face the visitor, and proactively greets them: broadcasting a welcome message that identifies the visitor, controlling the robotic arm to raise its left arm, and making a uniform left-right swinging motion 2-3 times to perform a welcoming gesture.
[0065] ④ Perform the subtask of recognizing visitor voice commands.
[0066] The system acquires visitor voice data through a sound card and other sensing modules, converts the acquired voice into text information, transmits it to a remote large model server via HTTP protocol for semantic analysis, and then sends the relevant response text back to the robot. After speech synthesis, the response is broadcast to the visitor.
[0067] Implementation Scenario 2
[0068] Exhibition hall tour guide duties.
[0069] The robot's perception module receives a voice command requesting an explanation of the exhibition hall's content and converts it into text information. The large model analyzes the text information and, by comparing it with the preset task list, generates the target task of explaining the exhibition hall's content. Since the target task in this embodiment is decomposed into sub-tasks in the same way as the proactive greeting task, the decomposition is not described again.
[0070] The exhibition hall explanation target task is decomposed into subtasks and corresponding solutions. The solutions define the execution tools, execution order, and priority of each subtask. The subtasks are: ① opening remarks; ② planning the visitor route; ③ explaining the exhibition area; ④ Q&A. If multiple exhibition areas are involved, subtasks ②-④ are executed repeatedly.
[0071] This embodiment designs subtasks as parameterized APIs, meaning they perform differentiated operations based on input information. For example, the exhibition area explanation subtask generates movement routes between two exhibition areas based on the input exhibition area ID; the Q&A subtask generates explanations with different orders and content based on the input explanation unit ID. Similarly, the execution order and priority of subtasks are also specified. For example, the execution order of "exhibition area explanation" and "Q&A" can be specified as parallel, with "Q&A" having higher priority than "exhibition area explanation," meaning that answering visitor questions takes precedence.
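As an illustration of parameterized subtasks with a priority, consider the sketch below. The `Subtask` class, the two example functions, and the numeric priority values are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Subtask:
    name: str
    run: Callable[..., str]          # parameterized API of the subtask
    priority: int = 0                # larger value = served first
    params: dict = field(default_factory=dict)

def explain_area(area_id):
    """Hypothetical explanation API: behaviour depends on the input ID."""
    return f"explaining area {area_id}"

def answer_question(question):
    """Hypothetical Q&A API."""
    return f"answering: {question}"

def next_subtask(ready):
    """Among subtasks that may run in parallel, pick the one with the highest priority."""
    return max(ready, key=lambda t: t.priority)

ready = [
    Subtask("exhibition_area_explanation", explain_area, 1, {"area_id": "A1"}),
    Subtask("qa", answer_question, 2, {"question": "When was this built?"}),
]
chosen = next_subtask(ready)
result = chosen.run(**chosen.params)
```

Here the Q&A subtask is chosen first because it carries the higher priority, matching the rule that answering visitor questions takes precedence.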
[0072] Executing each subtask involves the following steps:
[0073] ① Perform the opening remarks task
[0074] Based on visitor information and the content of the explanation, the large model generates an opening speech and broadcasts it automatically. The opening speech includes, but is not limited to, an introduction to the exhibition hall and precautions for visiting. This part of the speech can also be pre-recorded, or an impromptu welcome speech can be generated based on the visitor's identity information. The robot then uses voice prompts to guide the visitor to follow it on the tour.
[0075] ② Execute the visitor route planning sub-task
[0076] The tour route and explanation locations are planned based on the content being explained. Specifically, a full-content explanation covers all exhibition areas, while a thematic explanation covers only some of them. Beforehand, the robot is driven through the entire exhibition hall, continuously collecting data with its inertial measurement unit (IMU), LiDAR, cameras, and other devices to generate an environmental map of the hall. One or more explanation locations within each exhibition area are pre-defined, and the robot is driven to each of them so that its coordinates on the environmental map can be recorded using the IMU. When a specific explanation task is performed, the candidate explanation locations are determined from the input exhibition area ID. The robot's current position is obtained through the IMU, the nearest explanation location is selected, the shortest path between the two positions is planned, and the robot is controlled to move to the explanation location.
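The description does not name a particular path-planning algorithm. As one possible illustration, the sketch below uses breadth-first search on a small occupancy grid, which yields a shortest path when every move has equal cost. The grid layout and the 4-connected movement model are assumptions.

```python
from collections import deque

def shortest_path(grid, start, goal):
    """BFS on a 4-connected grid; grid[r][c] == 1 marks a blocked cell.

    Returns the list of cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:      # walk back through the parents
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None
```

A production navigation stack would more likely plan on a metric map with weighted costs, for example with A* or a similar planner.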
[0077] Furthermore, for a specific narration location, in addition to recording the point's position on the environmental map as mentioned above, the robot's pose is also recorded. For example, when the robot reaches the designated location, it is controlled to turn around and assume a narration posture. Relevant operating parameters are recorded through inertial devices and other means as the robot's initial posture for preparing to narrate. Subsequent adjustments are made based on the camera on the robot's head and other sensors during the actual narration.
[0078] Furthermore, automatic obstacle avoidance is active during navigation. Multimodal perception modules, such as LiDAR and depth cameras, detect the size, distance, and direction of movement of moving objects. When the distance to a moving object falls below a distance threshold, or when the relative speed is detected to be high, the robot actively waits, reduces its speed, or steers around pedestrians. Upon reaching the vicinity of the designated explanation location, the same perception modules determine whether the location is occupied. If so, the robot waits for 1-2 minutes (the time is adjustable). If the location is still occupied, the robot selects an adjacent explanation location, recalculates the path, moves to the new location, and waits in a safe area. In addition, when an obstruction forces a sudden stop (LiDAR detects that the obstacle is too close relative to the braking distance), the robot slows down or stops and simultaneously broadcasts a voice message reminding visitors to be careful and to maintain a reasonable following distance.
[0079] ③ Perform the sub-task of explaining the exhibition area
[0080] The robot moves to its designated explanation position in the exhibition area, acquires a pre-set initial posture based on the current position information, and then uses its eye camera to detect the visitor's face in the frame. It further fine-tunes its posture to face the visitor and, following the predetermined explanation process, reads the explanation script for the exhibition area. At the same time, it wirelessly connects to the on-site multimedia equipment and, according to the explanation order and content, remotely controls the indoor multimedia playback of audio-visual files via the robot's wireless communication module.
[0081] In this embodiment, besides the "opening remarks" and "planning the visitor route" sub-tasks mentioned above, the "exhibition hall explanation" target task has a crucial sub-task: "explanation of exhibition area content." This sub-task is in fact a collection of sub-tasks with a consistent structure, each consisting of exhibition area IDs and the corresponding explanation unit IDs. They differ only in which exhibition area IDs and explanation unit IDs they include and how many. Examples are the full-content explanation, the historical-evolution-themed explanation, and the innovative-achievements-themed explanation. Considering the actual layout of the exhibition hall and multimedia interaction, different explanation target tasks may correspond to different exhibition areas and different explanation content. The explanation content is therefore organized according to the specific content of each exhibition area. For example, an exhibition hall might be arranged chronologically to introduce a city's development history and be divided into exhibition areas such as ancient civilization, modern times, the early years of the People's Republic of China, the reform and opening-up period, and contemporary development. Each exhibition area includes explanation units such as humanities and history, social landscape, and development and achievements. For the full-content explanation, the entire content is explained chronologically; for the historical-evolution-themed explanation, only the humanities and history unit of each exhibition area is explained. Each exhibition area and each explanation unit is individually numbered, each explanation unit number corresponds to its own explanation script, and each exhibition area is associated with its explanation unit numbers. The solution for the "exhibition area explanation" subtask is a set of exhibition area numbers and explanation unit numbers. The relevant functional modules are encapsulated into an API, which outputs the corresponding explanation content based on the input exhibition area number and explanation unit number.
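The relationship between exhibition area numbers, explanation unit numbers, and explanation content can be illustrated with the sketch below. All IDs, script texts, and theme names are hypothetical placeholders.

```python
# Hypothetical script store: (area_id, unit_id) -> explanation script.
SCRIPTS = {
    ("A1", "U1"): "Ancient civilization: humanities and history.",
    ("A1", "U2"): "Ancient civilization: social landscape.",
    ("A2", "U1"): "Modern times: humanities and history.",
    ("A2", "U2"): "Modern times: social landscape.",
}

# A solution is an ordered set of (area_id, unit_id) pairs.
SOLUTIONS = {
    "full_content": [("A1", "U1"), ("A1", "U2"), ("A2", "U1"), ("A2", "U2")],
    "historical_evolution": [("A1", "U1"), ("A2", "U1")],
}

def get_explanation(area_id, unit_id):
    """API sketch: return the script for one exhibition area and explanation unit."""
    return SCRIPTS[(area_id, unit_id)]

def build_explanation(theme):
    """Return the scripts of a themed explanation in their configured order."""
    return [get_explanation(area, unit) for area, unit in SOLUTIONS[theme]]
```

Editing a themed explanation then amounts to changing the list of ID pairs, without touching the scripts themselves.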
[0082] The specific explanation text and audio are stored in the database of memory module 105. When appropriate (e.g., after the robot arrives at its designated explanation location in an exhibition hall), the audio is played directly by the robot's speaker, or it is converted from text to speech through speech synthesis. The explanation order and even the specific explanation text involved in the sub-task solution can be customized and edited. The number or specific explanation units included in a particular exhibition area can also be edited, as can the multimedia playback content. It can explain content from multiple exhibition areas, or only a portion of a single exhibition area; it can also be a comprehensive explanation or a thematic explanation.
[0083] ④ Perform the question-and-answer subtask
[0084] The system acquires visitor feedback such as voice, facial expressions, and body movements through sensing devices such as cameras and sound cards. To answer a visitor's question, it converts the question into text, inputs the text into a network model composed of a large model and a knowledge graph to obtain the answer, and stores the question-and-answer text and voice information in the memory module 105 for broadcast at an appropriate time.
[0085] Specifically, upon receiving a visitor's voice, the system first determines the audio type and analyzes the volume, intensity, and duration, discarding environmental noise, overlapping crowd speech, speech that is too quiet, and unrecognizable speech before transmitting the rest to the large model. If the question concerns professional knowledge about the exhibition content, the system supplies explanation text and other professional knowledge from the knowledge graph database as prompts to the large model, which combines this knowledge with related information to generate a coherent answer. If the question concerns non-professional knowledge of an interactive or casual nature, such as "What is today's weather?", the recognized speech is converted into text and transmitted to the large model server via a web service using protocols such as HTTP. The large model segments the text into words (tokens), vectorizes the tokens and the task keywords in the preset task list, such as "weather / meteorology", and compares their similarity to locate the pre-defined weather API in the preset task list. If the weather query is a regular command, i.e., it is in the preset task list, the relevant weather API is called to query the weather from the network or a database, and the retrieved information is played through speech synthesis. If the weather query is not in the preset task list, it is a non-regular command: the system determines that no relevant API exists and that the weather result cannot be returned, and instead returns a prompt message asking the visitor to repeat the voice command or informing them that the command cannot be completed.
[0086] When there are multiple exhibition areas, repeat subtasks ②-④ until the explanation ends. Then play the closing remarks (similar to the opening remarks, which can be pre-recorded or generated by the large model). The robot automatically navigates back to the origin, automatically avoids obstacles during the movement, and provides voice prompts to visitors to pay attention to safety.
[0087] The embodied robot provided in this embodiment adopts an architecture that combines a large language model and a knowledge graph, driving various deep learning algorithms. It has functions such as image recognition, speech recognition, large language model analysis and processing, speech synthesis, and multimodal fusion perception. It can realize human-computer interaction and autonomous planning and intelligent control of the robot body. It can not only vividly complete the exhibition hall explanation task, but also accurately answer visitors' questions, whether professional or non-professional knowledge. It can also autonomously perform tasks other than explanation, such as greeting guests, which greatly improves the user's exhibition hall service experience.
[0088] Referring to Figures 3 and 4, another embodiment of the present invention also provides an intelligent interaction method for an embodied robot, which relies on the embodied robot in the above-described product embodiments. The method includes the following steps:
[0089] Step S101: Obtain external command information from the robot and convert the external command information into text information;
[0090] Step S102: The text information and the keywords in the preset task list are vectorized and their similarity is compared using a large model. The tasks corresponding to keywords whose similarity reaches or exceeds a preset threshold are taken as regular target tasks. If the similarities between the text information and the keywords in the preset task list are all below the preset threshold, the task corresponding to the text information is determined to be a non-regular target task.
[0091] Step S103: Decompose the regular or non-regular target task into one or more subtasks, identify the type of each subtask, and determine the solution matching each subtask according to its type. The solution includes the execution tool, execution order and priority of each subtask.
[0092] Step S104: Based on the execution tools, execution order and priority of each subtask, supervise and control the execution of each subtask, and collect the running status information and hardware running status information of each subtask.
[0093] Step S105: Cache key data during the execution of each subtask, provide the large model with a knowledge graph database and a preset task list, and back up the key data;
[0094] Step S106: Make management decisions on the acquisition of external instructions, the invocation of large models, the decomposition of regular and non-regular target tasks, and the execution of each sub-task.
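The threshold rule of step S102 can be sketched as below. The function takes similarity scores that have already been computed; the default threshold of 0.8 and the keyword names are assumptions, since the description only speaks of a preset threshold.

```python
def classify_target_task(similarities, threshold=0.8):
    """Classify a command from its keyword similarities.

    similarities: mapping of task keyword -> similarity score.
    Returns ("regular", matched_keywords) when at least one score reaches
    the threshold, otherwise ("non_regular", []).
    """
    matched = [kw for kw, score in similarities.items() if score >= threshold]
    if matched:
        return "regular", matched
    return "non_regular", []
```

A score exactly equal to the threshold counts as a match, in line with the wording "reaching or exceeding".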
[0095] Further, referring to Figure 5, step S103 also includes the following logical flow for subtask decomposition:
[0096] For the aforementioned regular target task, each subtask corresponding to the regular target task is directly matched in the preset task list, and the type of each subtask is determined. If the subtask is a professional knowledge question-and-answer task, the professional knowledge explanation scripts in the knowledge graph database and the question text information are used as prompts for the large model. The large model uses semantic search in the knowledge graph database to find professional knowledge related to the question text information, generates contextually coherent sentences, and stores them in the memory module for later broadcast. If the subtask is a non-professional knowledge question-and-answer task and there are matching task keywords in the preset task list, it is further determined whether it is an interactive dialogue task or a query task. If the subtask is an interactive dialogue task, the large model generates multi-turn dialogue content as the solution, and the results are stored in the memory module for later broadcast. If the subtask is a query task, the relevant execution tool is matched in the preset task list according to the task keywords and the execution result is returned; the execution result is then input into the large model to generate a contextually coherent statement, and the result is stored in the memory module for later broadcast. If the subtask is a non-professional knowledge question-and-answer task and there is no matching task keyword in the preset task list, a prompt message asking the visitor to enter the instruction again, or stating that the instruction cannot be executed, is returned. If the subtask is a control task, a solution matching the subtask is searched for directly in the preset task list. If a match is found, the solution is executed; otherwise, a prompt message asking the visitor to enter the instruction again, or stating that the instruction cannot be executed, is returned.
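The branching logic above can be condensed into a small dispatch function. This is a simplified sketch: the subtask type labels, the `PRESET` entries, and the returned branch names are hypothetical, and the actual handlers (large model calls, tool execution) are not shown.

```python
# Hypothetical preset task list: keyword -> kind of entry.
PRESET = {
    "weather": {"kind": "query"},
    "chat": {"kind": "dialogue"},
    "greet": {"kind": "control"},
}

RETRY = "prompt_retry_or_cannot_execute"

def dispatch_subtask(subtask_type, keyword, preset=PRESET):
    """Return the name of the branch that handles a subtask."""
    if subtask_type == "professional_qa":
        return "answer_with_knowledge_graph_prompt"
    entry = preset.get(keyword)
    if subtask_type == "non_professional_qa":
        if entry is None:
            return RETRY                       # no matching task keyword
        if entry["kind"] == "dialogue":
            return "generate_multi_turn_dialogue"
        if entry["kind"] == "query":
            return "run_tool_then_generate_statement"
        return RETRY
    if subtask_type == "control":
        if entry is not None and entry["kind"] == "control":
            return "execute_solution"
        return RETRY
    raise ValueError(f"unknown subtask type: {subtask_type}")
```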
[0097] For the non-regular target task, the preset task list information is used as the prompt information of the large model. The large model and prior knowledge are used to generate the name of each subtask and a temporary solution for each subtask. The task decomposition operation is then performed according to the task decomposition process of the regular target task. When any subtask generated from the non-regular target task does not have a matching task keyword in the preset task list, a prompt message is returned to the visitor asking them to enter the instruction again or stating that the instruction cannot be executed.
[0098] Furthermore, the method includes the following steps: for interactive dialogue tasks in non-professional knowledge question-and-answer tasks, sensitive words in the dialogue are matched with filter words in a pre-set filter word library, thereby filtering or replacing the sensitive words in the answer content, or prohibiting the answer to questions containing the sensitive words.
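A minimal sketch of the filter-word step follows. The word list and the mask string are placeholders, and whole-word matching is an assumption that suits space-delimited languages; text without word boundaries, such as Chinese, would need substring matching instead.

```python
import re

FILTER_WORDS = {"badword", "secret"}  # hypothetical filter word library

def filter_answer(text, filter_words=FILTER_WORDS, mask="***"):
    """Replace each filter word (whole word, case-insensitive) with a mask."""
    if not filter_words:
        return text
    alternatives = "|".join(re.escape(w) for w in sorted(filter_words))
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(mask, text)

def should_refuse(question, filter_words=FILTER_WORDS):
    """Return True if the question itself contains a filter word."""
    words = set(re.findall(r"\w+", question.lower()))
    return bool(words & {w.lower() for w in filter_words})
```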
[0099] Furthermore, step S104 also includes the following steps: according to the execution order and priority of each subtask, monitor and control the start, pause, or termination of each subtask; based on the collected running status information and hardware running status information of each subtask, determine whether the execution time of the subtask exceeds the predetermined time, or whether the heartbeat signal of the subtask received per unit time is lower than the threshold. If so, stop and restart the subtask, or update the input parameters of the subtask and execute it again.
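The two supervision conditions, timeout and low heartbeat rate, can be expressed as in the sketch below. The window length, the runtime limit, and the minimum heartbeat count are illustrative values, not values taken from the description.

```python
def heartbeats_in_window(timestamps, now, window_s=1.0):
    """Count heartbeats received in the interval (now - window_s, now]."""
    return sum(1 for t in timestamps if now - window_s < t <= now)

def needs_restart(elapsed_s, heartbeat_count, max_runtime_s=60.0, min_heartbeats=3):
    """True if the subtask overran its time budget or its heartbeat rate is too low."""
    return elapsed_s > max_runtime_s or heartbeat_count < min_heartbeats
```

When `needs_restart` returns true, the supervisor would stop the subtask and either restart it or re-run it with updated input parameters.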
[0100] Furthermore, the method includes the following steps: pre-generating a preset task list containing target task keywords and sub-task keywords; wherein each task keyword corresponds to an execution tool for a target task or sub-task, and the execution tool includes a program function unit for implementing the target task or sub-task and an API for implementing the target task or sub-task.
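One way to picture such a preset task list is a mapping from keywords to execution tools, as in this sketch. The keywords, the two example functions, and the dictionary layout are assumptions for illustration; an API-type tool would be registered in the same way as a local function.

```python
def cruise(location_id):
    """Hypothetical program function unit for the cruising subtask."""
    return f"cruising to {location_id}"

def greet(visitor):
    """Hypothetical program function unit for the greeting subtask."""
    return f"Welcome, {visitor}!"

# Target task keywords list their subtasks; subtask keywords map to tools.
PRESET_TASK_LIST = {
    "proactive greeting": {"subtasks": ["cruise", "greet"]},
    "cruise": {"tool": cruise, "kind": "function"},
    "greet": {"tool": greet, "kind": "function"},
}

def resolve_tools(target_keyword, task_list=PRESET_TASK_LIST):
    """Return the execution tools of a target task's subtasks, in listed order."""
    return [task_list[name]["tool"] for name in task_list[target_keyword]["subtasks"]]
```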
[0101] Furthermore, the method includes the following steps: obtaining the central keywords of regular or non-regular target tasks, matching them with pre-entered visitor information or expertise related to the central keywords; determining the input parameters in the solution of the sub-task based on the matched visitor information or expertise; executing the solution of the sub-task based on the input parameters of the sub-task, and performing cross-task correlation query analysis.
[0102] Further, referring to Figures 6 and 7, the method also includes the execution steps for the proactive welcoming target task:
[0103] Step S201: When the target task is proactive welcoming, the task decomposition module decomposes the welcoming task into four sub-tasks: cruising to the welcoming location, visitor identity authentication, greeting, and recognizing the visitor's voice commands, and generates solutions corresponding to the four sub-tasks.
[0104] Step S202: The task execution module executes the sub-task of cruising to the welcoming position, obtains the robot's current position, selects the target welcoming position closest to the robot's current position from a number of preset welcoming positions, plans the shortest path in the system environment map based on the robot's current position and the target welcoming position, and controls the robot to reach the target welcoming position according to the shortest path; the sensor identifies whether the target welcoming position is occupied. If it is occupied, wait for a predetermined time. If the target welcoming position is still occupied after the predetermined time, select another adjacent welcoming position as the new target welcoming position, replan the path, and run to the new target welcoming position.
[0105] Step S203: The visitor identity authentication subtask is executed through the task execution module. During the cruise to the welcoming location and after reaching the target welcoming location, facial images of people passing by are acquired through the camera. The acquired facial images are compared with the target visitor images recorded in the system to confirm the visitor's identity. The visitor identity authentication subtask has a higher priority than the cruise to the welcoming location subtask. When the target visitor is identified, the cruise to the welcoming location subtask is skipped and subsequent tasks are executed.
[0106] Step S204: The task execution module executes the greeting sub-task, obtains the distance and location information of the target visitor through the sensor, identifies the face orientation of the target visitor, adjusts the robot's head to face the visitor's face at a safe distance, and controls the robot to perform the greeting action and broadcast a welcome message.
[0107] Step S205: The task execution module executes the sub-task of recognizing the visitor's voice commands, obtains the voice information of the target visitor, converts the voice information into text information, sends the text information to the artificial intelligence module for recognition, and caches the recognition result in the memory module.
[0108] Further, referring to Figure 8, the method also includes the execution steps for the exhibition hall explanation target task:
[0109] Step S301: When the target task is the exhibition hall explanation task, the exhibition hall explanation task is decomposed into four sub-tasks through the task decomposition module: broadcasting the opening remarks, planning the visit route, explaining the exhibition area, and answering questions, and solutions corresponding to the four sub-tasks are generated.
[0110] Step S302: The task execution module executes the opening remarks sub-task, generates the opening remarks through the artificial intelligence module, and broadcasts them by voice. The opening remarks include at least one of the following: introduction to the exhibition area, precautions for visiting, and welcome speech.
[0111] Step S303: The task execution module executes the visitor route planning sub-task, driving the robot to the target exhibition area according to the preset explanation task and the system environment map; the exhibition area code is obtained based on the robot's current position information, and, based on the exhibition area code, the explanation location nearest to the robot's current position is selected from several preset explanation locations as the target explanation location; the shortest path is planned based on the robot's current position and the target explanation location, and the robot is controlled to reach the target explanation location along the shortest path; sensors identify whether the target explanation location is occupied, and if it is, the robot waits for a predetermined time; if the target explanation location is still occupied after the predetermined time, another adjacent explanation location is selected as the new target explanation location, the path is replanned, and the robot moves to the new target explanation location; the position information of the target explanation location in the system environment map and the robot's initial posture data at the target explanation location are recorded.
[0112] Step S304: The task execution module executes the exhibition area explanation sub-task. Based on the robot's initial posture data and the location of the visitor's face, the robot is adjusted to face the visitor. The corresponding explanation content is retrieved from the memory module according to the exhibition area code, and the explanation is broadcast in voice according to the preset explanation process. The on-site multimedia equipment is wirelessly connected, and the indoor multimedia playback audio-visual files are remotely controlled through the wireless communication module according to the explanation order and content.
[0113] Step S305: The question-and-answer subtask is executed through the task execution module. The visitor's question is converted into text information through the instruction acquisition module, and the text information is input into the artificial intelligence module for semantic recognition. If the visitor asks about professional knowledge, the professional knowledge explanation scripts in the knowledge graph database of the memory module and the question text information are used as prompt information for the large model. The large model uses semantic search in the knowledge graph database to find professional knowledge related to the question text information, generates a contextually coherent sentence, and broadcasts it. If the visitor asks about non-professional knowledge, the answer to the question is searched for on the Internet or in an internal database and broadcast.
[0114] Furthermore, it also includes: during the cruise to the welcoming location and the planning of the visitor route, the task execution module detects the size, distance and direction of movement of the moving object through sensors. When the distance to the moving object is lower than a preset threshold or the relative speed to the moving object is greater than a preset threshold, the robot's movement speed is reduced or the robot operation is stopped.
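The speed rule in this paragraph can be sketched as a single function. All numeric thresholds, the halving factor, and the separate `stop_distance_m` parameter are assumptions introduced for this example; the description only states that the robot slows down or stops when a preset threshold is crossed.

```python
def speed_command(distance_m, relative_speed_mps, current_speed_mps,
                  safe_distance_m=1.0, safe_speed_mps=1.5, stop_distance_m=0.4):
    """Return the target speed given the nearest moving object's distance and relative speed."""
    if distance_m < stop_distance_m:
        return 0.0                          # too close: stop
    if distance_m < safe_distance_m or relative_speed_mps > safe_speed_mps:
        return current_speed_mps * 0.5      # inside the safety margin: slow down
    return current_speed_mps                # clear: keep the current speed
```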
[0115] It should be noted that each step of the intelligent interaction method provided in this embodiment corresponds exactly to the various functional modules of the robot in the product embodiment. Its implementation principle and technical effect are exactly the same as the corresponding functional units of the robot, and will not be repeated here.
[0116] The above description is merely a preferred embodiment of the present invention. Those skilled in the art should understand that the scope of disclosure in this invention is not limited to the specific combination of the above-described technical features, but should also cover other technical solutions formed by any combination of the above-described technical features or their equivalents without departing from the above-described concept. For example, the scope covers technical solutions formed by replacing the above features with (but not limited to) technical features of similar function disclosed in this invention.
Claims
1. A method for intelligent interaction with a robot, characterized in that, include: Acquire external command information from the robot and convert it into text information; The text information and keywords in the preset task list are vectorized and similarity compared by a large model. The tasks corresponding to keywords whose similarity reaches or exceeds a preset threshold are taken as regular target tasks. If the similarity between the text information and the keywords in the preset task list is all below the preset threshold, then the task corresponding to the text information will be identified as an unconventional target task. The routine or unroutine target task is decomposed into one or more subtasks, the type of each subtask is identified, and a solution matching each subtask is determined based on its type. The solution includes the execution tools, execution order, and priority of each subtask. Based on the execution tools, execution order, and priority of each subtask, the execution of each subtask is monitored and controlled, and the running status information of each subtask and the hardware running status information are collected. The system caches key data during the execution of each subtask, provides a knowledge graph database and a pre-defined task list to the large model, and performs backups of key data. Make management decisions regarding the acquisition of external instructions, the invocation of large models, the decomposition of routine and non-routine target tasks, and the execution of each subtask; For the aforementioned regular target task, each subtask corresponding to the regular target task is directly matched in the preset task list, and the type of each subtask is determined. If the subtask is a professional knowledge question-and-answer task, the professional knowledge explanation words and question text information in the knowledge graph database are used as prompt information for the large model. 
The large model uses semantic search in the knowledge graph database to find professional knowledge related to the question text information, generates contextually coherent sentences, and stores them in the memory module for broadcast. If the subtask is a non-professional knowledge question-and-answer task and there is a matching task keyword in the preset task list, it is further determined whether it is an interactive dialogue task or a query task: if it is an interactive dialogue task, multi-turn dialogue content is generated through the large model and the result is stored in the memory module for broadcast; if it is a query task, the relevant execution tool is matched in the preset task list according to the task keyword, its execution result is returned and input into the large model to generate contextually coherent sentences, and the result is stored in the memory module for broadcast. If the subtask is a non-professional knowledge question-and-answer task and there is no matching task keyword in the preset task list, a prompt message is returned indicating that the instruction should be re-entered or that the instruction cannot be executed. If the subtask is a control task, a solution matching the subtask is searched for directly in the preset task list; if a match is found, the solution is executed; otherwise, a prompt message is returned indicating that the instruction should be re-entered or that the instruction cannot be executed. For the unconventional target task, the preset task list information is used as prompt information for the large model, the name and a temporary solution of each subtask are generated through the large model and prior knowledge, and the task decomposition operation is performed according to the task decomposition process of the regular target task.
When any subtask generated from the unconventional target task has no matching task keyword in the preset task list, a prompt message is returned to the visitor indicating that the instruction should be re-entered or that the instruction cannot be executed. The central keywords of the regular or unconventional target task are obtained and matched against pre-entered visitor information or professional knowledge related to the central keywords; the input parameters in the solution of each subtask are determined based on the matched visitor information or professional knowledge. The solution of the subtask is executed based on the input parameters of the subtask, and cross-task correlation query analysis is performed.
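The threshold rule in claim 1 that separates regular from unconventional target tasks can be illustrated with a minimal sketch. It assumes a toy bag-of-words vector in place of the large model's vectorization, and the names `embed`, `cosine`, `classify`, the sample keywords, and the threshold value 0.5 are all hypothetical; the claim does not specify the embedding, the similarity measure, or the threshold.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for the large model's vectorization (assumption):
    a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text: str, task_keywords: dict[str, str], threshold: float = 0.5):
    """Return ("regular", matched_tasks) when at least one keyword reaches
    the threshold, otherwise ("unconventional", [])."""
    query = embed(text)
    matched = [
        task
        for keyword, task in task_keywords.items()
        if cosine(query, embed(keyword)) >= threshold
    ]
    return ("regular", matched) if matched else ("unconventional", [])


# Hypothetical preset task list: keyword -> task name.
TASK_KEYWORDS = {
    "explain the exhibition hall": "hall_explanation",
    "greet guests": "greeting",
}
```

With these sample keywords, an instruction that shares most of its words with a keyword is classified as regular, while an instruction with no overlap falls through to the unconventional branch, which in the claim is then decomposed by prompting the large model with the preset task list.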
2. The intelligent interaction method for a robot according to claim 1, characterized in that it further comprises: for interactive dialogue tasks within non-professional knowledge question-and-answer tasks, matching sensitive words in the dialogue against the filter words in a preset filter word library, and thereby filtering or replacing the sensitive words in the answer content, or prohibiting answers to questions containing the sensitive words.
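The filtering described in claim 2 might be sketched as follows. This is only an illustration under stated assumptions: the filter word library is modeled as a plain set of strings, matching is case-insensitive substring matching, and the function names `filter_answer` and `should_refuse` and the replacement string are hypothetical.

```python
import re


def filter_answer(answer: str, filter_words: set[str], replacement: str = "***") -> str:
    """Replace every occurrence of a filter word in the answer content."""
    if not filter_words:
        return answer
    # Longest words first so that longer entries win over their substrings.
    ordered = sorted(filter_words, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in ordered), re.IGNORECASE)
    return pattern.sub(replacement, answer)


def should_refuse(question: str, filter_words: set[str]) -> bool:
    """Return True when the question itself contains a filter word,
    in which case answering is prohibited."""
    lowered = question.lower()
    return any(word.lower() in lowered for word in filter_words)
```

The two functions correspond to the two options in the claim: replacing sensitive words in generated answers, and refusing questions that contain them.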
3. The intelligent interaction method for a robot according to claim 1, characterized in that the step of supervising and controlling the execution of each subtask based on its execution tools, execution order, and priority, and collecting the running status information of each subtask and the hardware running status information, comprises: supervising and controlling the start, pause, or termination of each subtask according to its execution order and priority; and determining, based on the collected running status information of each subtask and the hardware running status information, whether the execution time of the subtask exceeds a predetermined time or whether the number of heartbeat signals received from the subtask per unit time is lower than a threshold; if so, stopping and restarting the subtask, or updating the input parameters of the subtask and executing it again.
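The supervision check in claim 3 can be expressed as a small predicate. The sketch below assumes that the running status of a subtask is summarized by its elapsed time and a heartbeat count for the last unit of time; the `SubtaskStatus` fields and the limit parameters are hypothetical names, not part of the claim.

```python
from dataclasses import dataclass


@dataclass
class SubtaskStatus:
    name: str
    elapsed_s: float            # time since the subtask started, in seconds
    heartbeats_in_window: int   # heartbeat signals received in the last unit of time


def needs_restart(status: SubtaskStatus, max_elapsed_s: float, min_heartbeats: int) -> bool:
    """True when the subtask has run longer than the predetermined time
    or its heartbeat rate has dropped below the threshold."""
    timed_out = status.elapsed_s > max_elapsed_s
    stalled = status.heartbeats_in_window < min_heartbeats
    return timed_out or stalled
```

A supervisor loop would call this predicate on each collected status and then either stop and restart the subtask or rerun it with updated input parameters, as the claim describes.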
4. The intelligent interaction method for a robot according to claim 1, characterized in that it further comprises: generating in advance a preset task list containing target task keywords and subtask keywords, wherein each task keyword corresponds to an execution tool for a target task or subtask, and the execution tool includes a program function unit for implementing the target task or subtask and an API for implementing the target task or subtask.
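One possible shape for the preset task list of claim 4 is a mapping from task keyword to execution tool. The sketch below is an assumption about the data structure only; the `ExecutionTool` class, the stub `query_weather` function, the `/api/weather` path, and the prompt wording are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ExecutionTool:
    function_unit: Callable[..., str]   # program function unit implementing the task
    api: Optional[str] = None           # hypothetical API endpoint for the same task


def query_weather(city: str) -> str:
    """Stub function unit; a real tool would call an external service."""
    return f"weather for {city}: (stub)"


# Hypothetical preset task list: task keyword -> execution tool.
PRESET_TASK_LIST = {
    "weather query": ExecutionTool(function_unit=query_weather, api="/api/weather"),
}


def run_task(keyword: str, *args: str) -> str:
    """Look up the execution tool for a keyword and run its function unit;
    return a prompt message when no keyword matches."""
    tool = PRESET_TASK_LIST.get(keyword)
    if tool is None:
        return "Please re-enter the instruction; it cannot be executed."
    return tool.function_unit(*args)
```

Keeping the function unit and the API reference in one record mirrors the claim's statement that each keyword corresponds to an execution tool comprising both.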
5. The intelligent interaction method for a robot according to claim 1, characterized in that: when the regular target task is to proactively greet visitors, the greeting task is decomposed into four subtasks, namely cruising to the welcoming position, visitor identity authentication, greeting, and recognizing the visitor's voice commands, solutions corresponding to the four subtasks are generated, and the following steps are performed according to the solutions. The subtask of cruising to the welcoming position is executed: the robot obtains its current position, selects the target welcoming position closest to its current position from a number of preset welcoming positions, plans the shortest path in the system environment map based on its current position and the target welcoming position, and is controlled to reach the target welcoming position along the shortest path. The robot uses sensors to identify whether the target welcoming position is occupied. If it is occupied, the robot waits for a predetermined time. If the target welcoming position is still occupied after the predetermined time, the robot selects another adjacent welcoming position as the new target welcoming position, replans the path, and moves to the new target welcoming position. While moving, the robot uses sensors to detect the size, distance, and direction of movement of moving objects. When the distance to a moving object is lower than a preset threshold or the relative speed with respect to the moving object is greater than a preset threshold, the robot reduces its movement speed or stops moving. The visitor identity authentication subtask is executed: during the cruise to the welcoming position and after the target welcoming position is reached, facial images of people passing by are acquired through the camera, and the acquired facial images are compared with the target visitor images recorded in the system to confirm the visitor's identity.
The visitor identity authentication subtask has a higher priority than the subtask of cruising to the welcoming position. When the target visitor is identified, the subtask of cruising to the welcoming position is skipped and the subsequent subtasks are executed. The greeting subtask is executed: the distance and location information of the target visitor is acquired through sensors, the facial orientation of the target visitor is identified, the robot's head is adjusted to face the visitor's face at a safe distance, and the robot is controlled to perform the greeting action and broadcast a welcome message. The subtask of recognizing the visitor's voice commands is executed: the target visitor's voice information is acquired and converted into text information, the text information is sent to the artificial intelligence module for recognition, and the recognition result is cached in the memory module.
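The selection of the target welcoming position in claim 5, including the fallback when the position stays occupied, can be sketched as below. Two simplifications are assumed: straight-line distance stands in for path length on the system environment map, and "another adjacent welcoming position" is read as the next-nearest free position to the robot. The function name `nearest_position` and the coordinates are hypothetical.

```python
import math
from typing import Iterable, Optional, Sequence, Tuple

Point = Tuple[float, float]


def nearest_position(
    current: Point,
    candidates: Sequence[Point],
    exclude: Iterable[Point] = (),
) -> Optional[Point]:
    """Pick the preset position closest to the robot, skipping positions
    already found to be occupied. Returns None when nothing is left."""
    excluded = set(exclude)
    free = [p for p in candidates if p not in excluded]
    if not free:
        return None
    return min(free, key=lambda p: math.dist(current, p))


# Hypothetical preset welcoming positions on the map, in meters.
WELCOME_POSITIONS = [(5.0, 5.0), (1.0, 1.0), (2.0, 0.0)]
```

If the first choice is still occupied after the predetermined wait, calling the function again with that position in `exclude` yields the replacement target, after which the path would be replanned.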
6. The intelligent interaction method for a robot according to claim 1, characterized in that: when the regular target task is the exhibition hall explanation task, the exhibition hall explanation task is decomposed into four subtasks, namely broadcasting the opening remarks, planning the visit route, explaining the exhibition area, and question-and-answer, solutions corresponding to the four subtasks are generated, and the following steps are performed according to the solutions. The subtask of broadcasting the opening remarks is executed: an opening speech is generated through the artificial intelligence module and broadcast by voice, the opening speech including at least one of the following: an introduction of the exhibition area, precautions for visiting, and a welcome speech. The subtask of planning the visit route is executed: the robot is driven to the target exhibition area according to the system environment map based on the preset explanation task; the exhibition area code is obtained based on the robot's current location information, and the explanation location closest to the robot's current location is selected, based on the exhibition area code, from several preset explanation locations as the target explanation location; the shortest path is planned based on the robot's current location and the target explanation location, and the robot is controlled to reach the target explanation location along the shortest path. The robot uses sensors to identify whether the target explanation location is occupied. If it is occupied, the robot waits for a predetermined time. If the target explanation location is still occupied after the predetermined time, the robot selects another adjacent explanation location as the new target explanation location, replans the path, and moves to the new target explanation location. The location information of the target explanation location in the system environment map and the robot's initial posture data at the target explanation location are recorded.
While the robot is moving, sensors detect the size, distance, and direction of movement of moving objects. When the distance to a moving object is lower than a preset threshold or the relative speed with respect to the moving object is greater than a preset threshold, the robot reduces its movement speed or stops moving. The exhibition area explanation subtask is executed: the robot's orientation is adjusted to face the visitor based on the robot's initial posture data and the location of the visitor's face; the corresponding explanation content is retrieved from the memory module according to the exhibition area code and broadcast by voice according to the preset explanation process; the robot connects wirelessly to the on-site multimedia equipment and, through the wireless communication module, remotely controls the equipment to play audio-visual files according to the explanation order and content. The question-and-answer subtask is executed: the instruction acquisition module converts the visitor's question into text information and inputs the text information into the artificial intelligence module for semantic recognition. If the visitor asks about professional knowledge, the professional knowledge explanation words in the knowledge graph database of the memory module and the question text information are used as prompt information for the large model. The large model uses semantic search in the knowledge graph database to find professional knowledge related to the question text information, generates contextually coherent sentences, and broadcasts them. If the visitor asks a question outside the professional knowledge, the answer is searched for on the internet or in an internal database and then broadcast.
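The obstacle rule shared by claims 5 and 6 (slow down or stop when a moving object is too close or approaching too fast) can be illustrated with a minimal sketch. The claims give no numeric values and do not say when to stop rather than slow down, so the thresholds, the halving factor, and the separate stop distance below are assumptions, as is the function name `adjust_speed`.

```python
def adjust_speed(
    current_speed: float,
    distance_m: float,
    relative_speed_mps: float,
    min_distance_m: float = 1.0,           # assumed distance threshold
    max_relative_speed_mps: float = 1.5,   # assumed relative-speed threshold
    stop_distance_m: float = 0.3,          # assumed distance at which the robot stops
) -> float:
    """Return the speed the robot should use given one detected moving object."""
    if distance_m < stop_distance_m:
        return 0.0                          # too close: stop moving
    if distance_m < min_distance_m or relative_speed_mps > max_relative_speed_mps:
        return current_speed * 0.5          # reduce the movement speed
    return current_speed                    # no constraint triggered
```

In practice the rule would be evaluated for every detected object on each sensor update, and the lowest resulting speed would be applied.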
7. A robot for executing the intelligent interaction method for a robot according to any one of claims 1-6, characterized in that it comprises: an instruction acquisition module configured to acquire external instruction information of the robot and convert the external instruction information into text information. An artificial intelligence module configured to perform vectorization and similarity comparison on the text information and the keywords in the preset task list through a large model, and to take the tasks corresponding to keywords whose similarity reaches or exceeds a preset threshold as regular target tasks; if the similarities between the text information and all keywords in the preset task list are below the preset threshold, the task corresponding to the text information is identified as an unconventional target task. A task decomposition module configured to decompose a regular or unconventional target task into one or more subtasks, identify the type of each subtask, and determine a solution matching each subtask based on its type, the solution including the execution tools, execution order, and priority of each subtask. A task execution module configured to supervise and control the execution of each subtask according to its execution tools, execution order, and priority, and to collect the running status information of each subtask and the hardware running status information. A memory module configured to cache key data during the execution of each subtask, provide the large model with a knowledge graph database and the preset task list, and back up the key data. A planning and decision-making module configured to make management decisions regarding the acquisition of external instructions, the invocation of the large model, the decomposition of regular and unconventional target tasks, and the execution of each subtask.
The task decomposition module is further configured to: for the regular target task, directly match each subtask corresponding to the regular target task in the preset task list and determine the type of each subtask; if the subtask is a professional knowledge question-and-answer task, use the professional knowledge explanation words in the knowledge graph database and the question text information as prompt information for the large model, use the large model to semantically search the knowledge graph database for professional knowledge related to the question text information, generate contextually coherent sentences, and store them in the memory module for broadcast. If the subtask is a non-professional knowledge question-and-answer task and there is a matching task keyword in the preset task list, it is further determined whether it is an interactive dialogue task or a query task: if it is an interactive dialogue task, multi-turn dialogue content is generated through the large model and the result is stored in the memory module for broadcast; if it is a query task, the relevant execution tool is matched in the preset task list according to the task keyword, its execution result is returned and input into the large model to generate contextually coherent sentences, and the result is stored in the memory module for broadcast. If the subtask is a non-professional knowledge question-and-answer task and there is no matching task keyword in the preset task list, a prompt message is returned to the visitor indicating that the instruction should be re-entered or that the instruction cannot be executed. If the subtask is a control task, a solution matching the subtask is searched for directly in the preset task list; if a match is found, the solution is executed; otherwise, a prompt message is returned indicating that the instruction should be re-entered or that the instruction cannot be executed.
For the unconventional target task, the preset task list information is used as prompt information for the large model, the name and a temporary solution of each subtask are generated through the large model and prior knowledge, and the task decomposition operation is performed according to the task decomposition process of the regular target task. When any subtask generated from the unconventional target task has no matching task keyword in the preset task list, a prompt message is returned to the visitor indicating that the instruction should be re-entered or that the instruction cannot be executed. The task execution module is further configured to: obtain the central keywords of the regular or unconventional target task and match them against pre-entered visitor information or professional knowledge related to the central keywords; determine the input parameters in the solution of each subtask based on the matched visitor information or professional knowledge; and execute the solution of the subtask based on the input parameters of the subtask and perform cross-task correlation query analysis.