Information interaction method, robot and storage medium

By acquiring sensor data to determine the user's dynamic profile, the robot generates prompts that match the user's state, solving the problem of interruptions caused by fixed prompts in existing technologies and realizing personalized information interaction.

CN122240219A · Pending · Publication Date: 2026-06-19 · UBTECH ROBOTICS CORP LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
UBTECH ROBOTICS CORP LTD
Filing Date
2026-02-02
Publication Date
2026-06-19


Abstract

This application provides an information interaction method, a robot, and a storage medium, belonging to the field of robotics technology. The method includes: acquiring raw sensor data, which includes state information characterizing a user's state and environmental information characterizing the user's environment; determining a dynamic profile of the user based on the raw sensor data; generating prompt information matching the dynamic profile in response to information interaction needs; and outputting the prompt information based on the user's state information. Because the prompt information is generated from the user's current dynamic profile, the output prompt information matches the user's current state and avoids disturbing the user. Furthermore, the prompt information can be personalized for different dynamic profiles, so that it meets the user's needs in different states.

Description

Technical Field

[0001] This application belongs to the field of robotics technology, and in particular relates to an information interaction method, a robot, and a storage medium.

Background Technology

[0002] With the development of intelligent robots, the widespread adoption of devices such as home service robots, smart speakers, and gateways with screens, and the application of Large Language Models (LLM), intelligent robot assistants have evolved from simple question-and-answer systems into cognitive hubs capable of multi-turn dialogues, task planning, and even robot control. Among these, the "proactive reminder" function has become a common information interaction scenario. For example, in information interaction scenarios such as weather change reminders, schedule conflict warnings, and recommendations of content of interest, the "proactive reminder" function is a necessity.

[0003] In related technologies, the "proactive reminder" function generally calls relevant data through third-party interfaces and, when fixed time conditions are met, plays the corresponding text using a preset template. Another solution applies a large model directly to generate the reminder content: the data called through the third-party interface is used as the input of a large language model, the model generates the matching reminder text, and the reminder is then given according to the preset schedule.

[0004] In the aforementioned technologies, reminders are only given based on the reminder text generated by the model, which fails to meet the personalized needs of different users.

Summary of the Invention

[0005] The purpose of this application is to provide an information interaction method, robot, and storage medium, which aims to solve the problem that current robot assistants cannot meet the personalized needs of different users.

[0006] A first aspect of this application provides an information interaction method, the method comprising:

[0007] Acquiring raw sensor data, which includes state information characterizing the user's state and environmental information characterizing the user's environment; determining a dynamic profile of the user based on the raw sensor data; in response to information interaction needs, generating prompt information matching the dynamic profile based on the dynamic profile; and outputting the prompt information based on the user's state information.

[0008] In some embodiments, outputting the prompt information based on the user's state information includes: determining the user's focus level based on the user's state information; if the focus level is less than a first preset value, outputting the prompt information; or, if the focus level is greater than or equal to the first preset value and less than a second preset value, caching the prompt information and outputting it after a first preset duration; or, if the focus level is greater than or equal to the second preset value, deleting the prompt information, wherein the first preset value is less than the second preset value.
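The three-way branch on the focus level can be sketched as follows. This is a minimal illustration only: the names `Delivery` and `decide_delivery` and the default thresholds 0.4 and 0.8 are assumptions made for the example, since the application does not specify concrete preset values or a focus scale.

```python
from enum import Enum


class Delivery(Enum):
    OUTPUT_NOW = "output_now"            # focus < first preset value
    CACHE_AND_DELAY = "cache_and_delay"  # first <= focus < second preset value
    DISCARD = "discard"                  # focus >= second preset value


def decide_delivery(focus: float,
                    first_preset: float = 0.4,
                    second_preset: float = 0.8) -> Delivery:
    """Map a focus level to one of the three branches described above.

    The 0.4 / 0.8 defaults are illustrative assumptions, not values
    taken from the application.
    """
    if not first_preset < second_preset:
        raise ValueError("first_preset must be less than second_preset")
    if focus < first_preset:
        return Delivery.OUTPUT_NOW
    if focus < second_preset:
        return Delivery.CACHE_AND_DELAY
    return Delivery.DISCARD
```

A caller would output the prompt immediately, schedule it after the first preset duration, or drop it, depending on the returned value.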

[0009] In some embodiments, determining the user's focus level based on the user's state information includes: parsing the user's state information to obtain the user's focus factors; and inputting the focus factors into a focus determination model to obtain the user's focus level, wherein the focus determination model is used to output the user's focus level based on the user's focus factors.

[0010] In some embodiments, determining the user's dynamic profile based on the raw sensor data includes: determining the user's behavior tags based on the user's state information in the raw sensor data, the behavior tags including motion tags and interaction tags; determining, based on the environmental information in the raw sensor data, the weather event that matches the environmental information; establishing a correspondence between weather events and user behavior tags based on the temporal correspondence between the environmental information and the user's state information; and creating a profile of the current user based on the correspondence between the weather events and the user's behavior tags to obtain the user's dynamic profile.

[0011] In some embodiments, the method further includes: obtaining the user's historical operation records on the robot; and determining the user's interest tags based on the historical operation records. Accordingly, creating a profile of the current user based on the correspondence between the weather events and the user's behavior tags includes: creating a profile of the current user based on the interest tags and the correspondence between the weather events and the user's behavior tags to obtain the user's dynamic profile.

[0012] In some embodiments, generating prompt information matching the dynamic profile based on the dynamic profile in response to information interaction needs includes: in response to information interaction needs, determining the content of the prompt information based on the information interaction; determining the expression parameters of the prompt information based on the dynamic profile; and determining prompt information matching the dynamic profile based on the content of the prompt information and the expression parameters.

[0013] In some embodiments, determining the expression parameters of the prompt information based on the dynamic profile includes: when the prompt information is audio information, determining the speech rate and fundamental frequency of the prompt information based on the dynamic profile; and/or, when the prompt information is visual information, determining the display color tone and/or robot control parameters of the prompt information based on the dynamic profile.

[0014] In some embodiments, the method further includes: after the prompt information is output, collecting the user's emotional feedback information; and adjusting the focus model and the profiling rules based on the emotional feedback information.

[0015] A second aspect of this application provides an information interaction device, the device comprising: a first acquisition unit, used to acquire raw sensor data, which includes state information characterizing the user's state and environmental information characterizing the user's environment; a first determining unit, used to determine the dynamic profile of the user based on the raw sensor data; a generation unit, used to respond to information interaction needs and generate prompt information that matches the dynamic profile based on the dynamic profile; and an output unit, used to output the prompt information based on the user's state information.

[0016] In some embodiments, the output unit is configured to determine the user's focus level based on the user's state information; if the focus level is less than a first preset value, output the prompt information; or, if the focus level is greater than or equal to the first preset value and less than a second preset value, cache the prompt information and output the prompt information after a first preset duration; or, if the focus level is greater than or equal to the second preset value, delete the prompt information, wherein the first preset value is less than the second preset value.

[0017] In some embodiments, the output unit is used to parse the user's state information to obtain the user's focus factors; input the focus factors into a focus determination model to obtain the user's focus, and the focus determination model is used to output the user's focus based on the user's focus factors.

[0018] In some embodiments, the first determining unit is configured to: determine user behavior tags based on user state information in the original sensor data, the behavior tags including motion tags and interaction tags; determine weather events matching the environmental information in the original sensor data; establish a correspondence between weather events and user behavior tags based on the time correspondence between the environmental information and user state information; and create a profile of the current user based on the correspondence between the weather events and user behavior tags to obtain a dynamic profile of the user.

[0019] In some embodiments, the apparatus further includes: a second acquisition unit, used to acquire the user's historical operation records on the robot; and a second determining unit, used to determine the user's interest tags based on the historical operation records. Accordingly, the first determining unit is used to create a profile of the current user based on the interest tags and the correspondence between the weather events and the user behavior tags, so as to obtain the dynamic profile of the user.

[0020] In some embodiments, the generation unit is configured to respond to information interaction needs by determining the content of the prompt information based on the information interaction; determining the expression parameters of the prompt information based on the dynamic profile; and determining prompt information matching the dynamic profile based on the content of the prompt information and the expression parameters.

[0021] In some embodiments, the generation unit is configured to, when the prompt information is audio information, determine the speech rate and fundamental frequency of the prompt information based on the dynamic profile; and/or, the generation unit is configured to, when the prompt information is visual information, determine the display color tone and/or robot control parameters of the prompt information based on the dynamic profile.

[0022] In some embodiments, the apparatus further includes: a data acquisition unit, used to collect the user's emotional feedback information after the prompt information is output; and a training unit, used to adjust the focus model and the profiling rules based on the emotional feedback information.

[0023] A third aspect of this application provides a robot including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the information interaction method described above.

[0024] A fourth aspect of this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the information interaction method described above.

[0025] A fifth aspect of this application provides a computer program product that, when run on an electronic device, causes the electronic device to perform the information interaction method described above.

[0026] Compared with the prior art, the embodiments of this application have the following beneficial effects: the robot acquires raw sensor data during operation and determines the user's dynamic profile based on the state information representing the user's state and the environmental information representing the user's environment in the raw sensor data. Upon receiving an information interaction request, it generates prompt information matching the dynamic profile and outputs the prompt information based on the user's state information. Because the prompt information is generated from the user's current dynamic profile, the output prompt information matches the user's current state and avoids disturbing the user. Furthermore, the prompt information can be personalized for different dynamic profiles, so that it meets the user's needs in different states.

Attached Figure Description

[0027] Figure 1 shows a schematic diagram of the overall system technical architecture provided by an exemplary embodiment of this application;
Figure 2 shows a flowchart of an information interaction method provided by an exemplary embodiment;
Figure 3 shows a flowchart of an information interaction method provided by an exemplary embodiment;
Figure 4 shows a flowchart of an information interaction method provided by an exemplary embodiment;
Figure 5 shows a flowchart of an information interaction method provided by an exemplary embodiment;
Figure 6 shows a schematic diagram of the information interaction logic of a robot provided in an exemplary embodiment;
Figure 7 shows a schematic diagram of the structure of an information interaction device provided in this application;
Figure 8 shows a schematic diagram of the structure of a robot provided in this application.

Detailed Implementation

[0028] To make the technical problems, technical solutions, and beneficial effects to be solved by this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and are not intended to limit the scope of this application.

[0029] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this application, "multiple" means two or more, unless otherwise explicitly specified.

[0030] With the development of intelligent robots, the widespread adoption of devices such as home service robots, smart speakers, and gateways with screens, and the application of Large Language Models (LLM), intelligent robot assistants have evolved from simple question-and-answer systems into cognitive hubs capable of multi-turn dialogues, task planning, and even robot control. Among these, the "proactive reminder" function has become a common information interaction scenario. For example, in information interaction scenarios such as weather change reminders, schedule conflict warnings, and recommendations of content of interest, the "proactive reminder" function is a necessity.

[0031] In related technologies, the "proactive reminder" function typically calls relevant data through a third-party interface and, when fixed time conditions are met, plays the corresponding text using a preset template. Alternatively, a large model is applied directly to generate the reminder content: the data from the third-party interface is used as input to a large language model, which generates matching reminder text, and the reminder is then given according to a preset schedule.

[0032] For example, a third-party weather API is called and, when fixed time conditions are met, text is played from a preset template. While this process does not require complex calculations, the format parameters of the reminder content are fixed and unrelated to user preferences. The template text is monotonous and lacks emotional and personalized expression, resulting in low user acceptance. Furthermore, because the reminder timing is fixed, it cannot be adjusted dynamically based on information such as the user's current focus or actual daily routine, so the reminder can still easily disturb the user.

[0033] To prevent disturbance to users when they are highly focused, this application provides an information interaction method, a robot, and a storage medium. During operation, the robot acquires raw sensor data and determines a dynamic profile of the user based on state information representing the user's state and environmental information representing the user's environment. Upon receiving an information interaction request, the robot generates a matching prompt based on this dynamic profile and outputs the prompt based on the user's state information. This ensures that the generated prompt matches the user's current dynamic profile, preventing the output prompt from disturbing the user's current state. Furthermore, the prompt can be personalized to meet the user's needs and different states based on their different dynamic profiles.

[0034] The present application will now be described with reference to specific embodiments. Figure 1 shows a schematic diagram of the overall system technical architecture provided by an exemplary embodiment of this application. As shown in Figure 1, when implementing the information interaction method provided in this application, the system's integrated core components include a large language model module, a user behavior data module, and a user profile module. The large language model module is used to implement real-time interaction process processing, and this application provides a model selection and fine-tuning strategy for this large language model module. The user behavior data module is used for collecting and cleaning user behavior data and performing behavior model analysis. The user profile module is used to construct and update user profiles and provide personalized feature extraction techniques.

[0035] The system's interaction flow includes a proactive reminder triggering process and a user feedback processing process. The proactive reminder triggering process is achieved through multi-source data fusion decision-making and reminder timing optimization; the user feedback processing process is achieved through feedback collection, model iteration, and recording of effectiveness evaluation metrics.

[0036] Figure 2 shows a flowchart of an information interaction method provided by an exemplary embodiment. By way of example and not limitation, this method is applied to a robot.

[0037] S201, the robot acquires raw sensor data, which includes state information characterizing the user's state and environmental information characterizing the user's environment.

[0038] The raw sensor data includes data from multiple modalities. In some embodiments, the raw sensor data includes state information characterizing the user's state and environmental information characterizing the user's environment.

[0039] It should be noted that the raw sensor data can be collected by the robot's sensors, or by other electronic devices, with the robot receiving the raw sensor data from these other devices. For example, the status information characterizing the user's state in the raw sensor data may include audio information. In some embodiments, the robot integrates a microphone array for audio acquisition; accordingly, the robot acquires the speech stream through the microphone array to obtain the audio information. The status information characterizing the user's state in the raw sensor data may also include motion status information such as the user's heart rate, wrist temperature, and step count. In some embodiments, the robot establishes a communication connection with the user's wearable device; accordingly, the wearable device can collect the user's motion status information such as heart rate, wrist temperature, and step count, and send this motion status information to the robot, which then receives the motion status information sent by the wearable device.

[0040] The environmental information used to characterize the user's environment in the raw sensor data includes information such as temperature, humidity, air pressure, and particulate matter content. In some embodiments, the robot also integrates an environmental information sensor, which collects environmental information such as temperature, humidity, air pressure, and particulate matter content of the user's environment.

[0041] In some embodiments, the robot can also collect information such as its body pose and velocity. Accordingly, the robot also integrates a pose sensor, which detects information such as its body pose and velocity.

[0042] S202, the robot determines the user's dynamic profile based on the original sensor data.

[0043] Based on the raw sensor data, the robot extracts the user's state information and environmental information, and creates a dynamic profile of the user based on the correspondence between the state and environmental information. In some embodiments, the robot performs atomic extraction of the state information from the raw sensor data, samples the data to obtain the user's behavioral atomic sequence, and determines the user's behavioral label based on the behavioral atomic sequence; it also processes the environmental information to determine the corresponding weather event; and aligns the user's behavioral label with the weather event based on a timestamp, i.e., it calculates the probability of the user performing each behavior under different weather events. Then, a dynamic profile of the user is created based on the correspondence between the weather event and the user's behavioral label. Figure 3 shows a flowchart of an information interaction method provided by an exemplary embodiment. As shown in Figure 3, this step can be implemented through the following steps S2021-S2024:

S2021, the robot determines the user's behavior labels based on the user's state information in the raw sensor data. These behavior labels include movement labels and interaction labels.

[0044] To improve the efficiency with which the robot processes raw sensor data, atomic extraction is performed on the raw sensor data to extract atomic sequences of user behavior. This atomic extraction refers to periodically extracting the raw sensor data. The period can be set as needed, and in this embodiment, no specific limitation is made. For example, the period can be 30 seconds, 35 seconds, or 40 seconds.
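One possible reading of this periodic extraction is to group timestamped readings into consecutive fixed-length windows, one window per behavioral atom. The sketch below assumes readings arrive as `(timestamp_s, reading)` pairs; the function name `extract_atoms` and that input layout are hypothetical and not taken from the application.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def extract_atoms(samples: Iterable[Tuple[float, Any]],
                  period_s: float = 30.0) -> List[List[Any]]:
    """Group (timestamp_s, reading) pairs into consecutive windows.

    Each returned list holds the readings of one period; windows that
    contain no readings are simply absent from the result.
    """
    windows: Dict[int, List[Any]] = defaultdict(list)
    for timestamp_s, reading in samples:
        windows[int(timestamp_s // period_s)].append(reading)
    return [windows[index] for index in sorted(windows)]
```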

[0045] In this step, the robot determines the behavior label based on the changes in the user's state information in the raw sensor data. In some embodiments, the robot determines the user's movement label based on the changes in adjacent user behavior data in the behavior atomic sequence. For example, the robot takes a segment of raw sensor data every 30 seconds. In the adjacent sensor data collected, if the user's movement distance is less than a first preset distance and the user's heart rate change is less than a preset rate of change, the movement label is determined to be "stationary"; if the user's movement distance is greater than a second preset distance and the user's average speed is greater than a preset speed, the movement label is determined to be "walking"; otherwise, the movement label is determined to be "transitional". The first preset distance, second preset distance, preset rate of change, and preset speed can be set as needed. In this embodiment, the first preset distance, second preset distance, preset rate of change, and preset speed are not specifically limited. For example, the first preset distance can be 0.5 meters, 0.6 meters, or 0.8 meters; the second preset distance can be 1 meter, 1.2 meters, or 1.5 meters; the preset rate of change can be 5%, 6%, or 8%; and the preset speed can be 0.3 meters per second, 0.5 meters per second, or 0.6 meters per second.
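The movement-label rule above can be written as a small function. This is a sketch under stated assumptions: the defaults use the first example value given for each preset (0.5 m, 1 m, 5%, 0.3 m/s), the heart rate change is passed as a fraction, and the function and parameter names are illustrative.

```python
def motion_label(distance_m: float,
                 heart_rate_change: float,
                 avg_speed_mps: float,
                 first_distance_m: float = 0.5,
                 second_distance_m: float = 1.0,
                 max_rate_of_change: float = 0.05,
                 min_speed_mps: float = 0.3) -> str:
    """Classify the movement between two adjacent behavior samples."""
    # Small displacement and a nearly unchanged heart rate: stationary.
    if distance_m < first_distance_m and abs(heart_rate_change) < max_rate_of_change:
        return "stationary"
    # Large displacement at a sufficient average speed: walking.
    if distance_m > second_distance_m and avg_speed_mps > min_speed_mps:
        return "walking"
    # Everything in between is treated as a transition.
    return "transitional"
```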

[0046] For audio information, the robot can use Voice Activity Detection to detect whether there is user audio within a preset continuous duration. Accordingly, if audio is detected throughout the preset continuous duration, the interaction is labeled as "conversation". This preset continuous duration can be set as needed; in this embodiment, it is not specifically limited. For example, the preset continuous duration can be 30 seconds, 35 seconds, or 40 seconds.
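As an illustration, assume a separate voice activity detector has already produced one boolean per audio frame (speech present or not). The check for continuous audio can then be sketched as below; the frame length, the function name `interaction_label`, and the choice to return `None` when no conversation is found are assumptions for the example.

```python
from typing import Iterable, Optional


def interaction_label(voiced_frames: Iterable[bool],
                      frame_s: float = 1.0,
                      min_duration_s: float = 30.0) -> Optional[str]:
    """Return "conversation" if speech is present in every frame of a
    run lasting at least min_duration_s, otherwise None."""
    frames_needed = min_duration_s / frame_s
    run_length = 0
    for voiced in voiced_frames:
        # Any silent frame resets the run of continuous speech.
        run_length = run_length + 1 if voiced else 0
        if run_length >= frames_needed:
            return "conversation"
    return None
```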

[0047] This sampling of raw sensor data compresses a segment of raw sensor data into a few behavioral atoms, reducing storage and computational burden.

[0048] S2022, the robot determines the weather event that matches the environmental information in the raw sensor data.

[0049] The robot records weather events matching the changes in environmental information from raw sensor data. For example, if the environmental information indicates that the temperature rises or falls by more than a preset temperature within a second preset time period, a "temperature abrupt change" event is determined to have occurred. The second preset time period can be set as needed, and in this embodiment, it is not specifically limited. For example, the second preset time period can be 1 hour, 1.5 hours, or 2 hours. As another example, if the concentration of fine particulate matter (PM2.5) exceeds a preset concentration for a period of time equal to a third preset time period, a "smog" event is determined to have occurred. The preset concentration and the third preset time period can be set as needed, and in this embodiment, they are not specifically limited. For example, the preset concentration can be 75 μg / m³, 80 μg / m³, or 85 μg / m³, and the third preset time period can be 30 minutes, 40 minutes, or 45 minutes. For example, if the humidity is greater than a preset humidity and the air pressure drops by a preset value within a fourth preset time period, then a "precursor to heavy rain" event is determined to have occurred. The preset humidity, the fourth preset time period, and the preset air pressure value can be set as needed. In this embodiment, these preset humidity, fourth preset time period, and preset air pressure value are not specifically limited. For example, the preset humidity can be 80%, 85%, or 88%, the fourth preset time period can be 3 hours, 3.5 hours, or 4 hours, and the preset air pressure value can be 2 hPa, 2.5 hPa, or 3 hPa.
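The three example events can be sketched as threshold checks over time-ordered `(timestamp_s, value)` samples. The defaults below reuse the first example value for each preset where the text gives one; the 5 °C temperature threshold is an assumption, because no preset temperature is stated, and all function names are illustrative.

```python
from typing import List, Tuple

Samples = List[Tuple[float, float]]  # (timestamp_s, value), sorted by time


def temperature_abrupt_change(temps: Samples, window_s: float = 3600.0,
                              delta_c: float = 5.0) -> bool:
    """True if temperature rises or falls by more than delta_c within window_s."""
    for i, (t0, v0) in enumerate(temps):
        for t1, v1 in temps[i + 1:]:
            if t1 - t0 > window_s:
                break
            if abs(v1 - v0) > delta_c:
                return True
    return False


def smog(pm25: Samples, threshold: float = 75.0,
         duration_s: float = 1800.0) -> bool:
    """True if PM2.5 stays above threshold continuously for duration_s."""
    start = None
    for ts, value in pm25:
        if value > threshold:
            if start is None:
                start = ts
            if ts - start >= duration_s:
                return True
        else:
            start = None  # the run above the threshold is broken
    return False


def heavy_rain_precursor(humidity_pct: float, pressures: Samples,
                         humidity_min: float = 80.0,
                         window_s: float = 3 * 3600.0,
                         drop_hpa: float = 2.0) -> bool:
    """True if humidity is high and pressure drops by drop_hpa within window_s."""
    if humidity_pct <= humidity_min:
        return False
    for i, (t0, p0) in enumerate(pressures):
        for t1, p1 in pressures[i + 1:]:
            if t1 - t0 > window_s:
                break
            if p0 - p1 >= drop_hpa:
                return True
    return False
```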

[0050] S2023, the robot establishes a correspondence between weather events and user behavior tags based on the time correspondence between the environmental information and the user's state information.

[0051] The robot maps the weather event to the user's behavior tag based on the timestamp, thus calculating the probability of the user performing each behavior tag under each weather event. For example, it calculates the probability that the user will remain still when encountering smog.
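A minimal sketch of this timestamp-based mapping, assuming weather events are stored as `(start_s, end_s, name)` intervals and behavior tags as `(timestamp_s, tag)` pairs (both layouts are assumptions for illustration): each tag is counted under every event whose interval contains it, and the counts are normalized into conditional probabilities.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def behavior_probabilities(
    events: List[Tuple[float, float, str]],
    behaviors: List[Tuple[float, str]],
) -> Dict[str, Dict[str, float]]:
    """Estimate P(behavior tag | weather event) from aligned timestamps."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for timestamp_s, tag in behaviors:
        for start_s, end_s, name in events:
            if start_s <= timestamp_s < end_s:
                counts[name][tag] += 1
    return {
        name: {tag: n / sum(tag_counts.values()) for tag, n in tag_counts.items()}
        for name, tag_counts in counts.items()
    }
```

For instance, with one smog interval containing two "stationary" tags and one "walking" tag, the estimated probability of remaining still during smog is 2/3.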

[0052] S2024, the robot creates a profile of the current user based on the correspondence between the weather event and the user's behavioral tags, thus obtaining the user's dynamic profile.

[0053] The user's dynamic profile is used to associate reminder content with user needs when determining reminder content. This allows for a more accurate determination of whether to send a reminder, the timing of the reminder, and the content of the reminder, thus better meeting the user's actual needs and enabling personalized reminder content based on user requirements.

[0054] In some embodiments, the robot can also combine the user's historical operation records on the robot to determine the user's interest tags. This dynamic profiling based on the user's interest tags makes the profile more relevant to the user's needs. Accordingly, prior to this step, the robot obtains the user's historical operation records on the robot; and determines the user's interest tags based on these historical operation records.

[0055] This historical operation record includes raw records of user actions on the robot, such as voice-activated playback records, screen click records, and search keyword records. It may also include operation records from user-linked applications, such as music apps, video apps, and reading apps.

[0056] When the robot retrieves historical operation records, it can retain data such as style, genre, and duration, but does not retain data such as account and content identifiers, thereby reducing the robot's resource consumption.

[0057] The robot determines the user's interest tags based on their historical activity records. For example, it can determine the user's preferred music style based on voice-activated playback records, thus identifying their music interest tags. For instance, if the voice-activated playback record shows "play jazz," the user's music interest tag is determined to be "likes jazz." Alternatively, the robot can determine the user's video interest tags based on screen click records, identifying the video content selected during those clicks. For example, if the screen click record includes "play sci-fi movie trailers," the user's video interest tag is determined to be "likes sci-fi movies." Or, the robot can determine other user interest tags based on search keyword records. For instance, if the robot records a search keyword as "search for low-sugar desserts," the user's dietary interest tag is determined to be "low-sugar foods," and so on.

[0058] It should be noted that the robot can determine multiple interest tags based on different search keywords. To prevent an excessive number of interest tags from leading to insufficient targeting in the profiling process, these interest tags can be filtered. For example, the robot can evaluate the confidence level of multiple interest tags and select the top N interest tags with the highest confidence levels. Here, N is an integer greater than 0. The value of N can be set as needed, and in this embodiment, the value of N is not specifically limited. For example, N can be 5, 6, or 8, etc.
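The top-N filtering can be sketched as follows, assuming each interest tag already has a numeric confidence score. How the confidence is evaluated is not described in the text, so the scores in the usage line are made up for illustration, and the function name is hypothetical.

```python
import heapq
from typing import Dict, List


def top_interest_tags(confidences: Dict[str, float], n: int = 5) -> List[str]:
    """Return the n interest tags with the highest confidence, best first."""
    if n <= 0:
        raise ValueError("n must be an integer greater than 0")
    best = heapq.nlargest(n, confidences.items(), key=lambda item: item[1])
    return [tag for tag, _ in best]
```

For example, `top_interest_tags({"likes jazz": 0.9, "likes sci-fi movies": 0.7, "low-sugar foods": 0.8}, n=2)` keeps the jazz and low-sugar tags.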

[0059] Accordingly, this step includes: the robot creates a profile of the current user based on the correspondence between the weather event and the user's behavior tags and the interest tags, thus obtaining the user's dynamic profile.

[0060] In some embodiments, a dynamic profiling engine can be deployed in the robot to create a dynamic profile of the user. This dynamic profile includes a dynamic profile vector U in a three-dimensional latent space representing weather sensitivity, interests, and planning patterns. The user's dynamic profile vector is U = [u_wea, u_int, u_sch], where u_wea is the weather sensitivity scalar, u_int is the interest embedding, and u_sch is the planning pattern vector. Each component describes a different aspect of the user's dynamic profile.
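One way to hold the vector U = [u_wea, u_int, u_sch] in code is a small data class. The class name, the use of plain float lists, and the sizes of the embedding and pattern vector are assumptions for illustration; the application does not specify them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DynamicProfile:
    u_wea: float        # weather sensitivity scalar
    u_int: List[float]  # interest embedding (length is an assumption)
    u_sch: List[float]  # planning pattern vector (length is an assumption)

    def as_vector(self) -> List[float]:
        """Concatenate the three components into one flat vector U."""
        return [self.u_wea, *self.u_int, *self.u_sch]
```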

[0061] It should be noted that, to ensure the alignment of user behavior tags with weather events on the timeline, thereby guaranteeing the accuracy of the user's dynamic profile, the robot can also align the collected raw sensor data before this step. Accordingly, the robot aligns the collected raw sensor data based on the timestamps carried in the raw sensor data. Furthermore, to prevent data misalignment caused by data latency, the robot also performs packet loss detection, discarding incomplete packets with delays exceeding a third preset duration to ensure the alignment of the raw sensor data. This third preset duration can be set as needed; in this embodiment, it is not specifically limited. For example, the third preset duration can be 50 milliseconds, 55 milliseconds, or 60 milliseconds, etc.
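The alignment and discard rule can be sketched as below. The packet layout (dictionaries with `sent_ms`, `received_ms`, and `complete` keys) is a hypothetical format for the example, and the sketch takes the literal reading that a packet is discarded only when it is both incomplete and delayed by more than the threshold.

```python
from typing import Dict, List


def align_packets(packets: List[Dict], max_delay_ms: float = 50.0) -> List[Dict]:
    """Drop incomplete packets whose delay exceeds max_delay_ms, then
    order the remaining packets by their source timestamp."""
    kept = [
        packet for packet in packets
        if packet["complete"]
        or packet["received_ms"] - packet["sent_ms"] <= max_delay_ms
    ]
    return sorted(kept, key=lambda packet: packet["sent_ms"])
```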

[0062] Accordingly, in some embodiments, the robot aligns the raw sensor data using timestamps and encapsulates it into Robot Operating System 2 (ROS 2) topic messages, providing millisecond-level synchronized input for the subsequent behavioral element extraction, thereby solving the problem of misalignment between user behavior and weather events caused by sensor latency.

[0063] S203, In response to the information interaction needs, the robot generates prompt information that matches the dynamic profile based on the dynamic profile.

[0064] Based on the information interaction request, the robot determines the content of the prompt information and the expression parameters of the prompt information, and then obtains the corresponding prompt information from the content and the expression parameters. Figure 4 illustrates a flowchart of an information interaction method provided by an exemplary embodiment. As shown in Figure 4, this process is implemented through the following steps S2031-S2033: S2031, in response to the information interaction request, the robot determines the content of the prompt information based on the information interaction request.

[0065] In some embodiments, the information interaction request can be triggered by a user-defined schedule, such as an alarm or reminder. The robot can then determine whether the scheduled reminder time has arrived and whether to trigger the information interaction request. In other embodiments, the information interaction request can also be triggered by a sudden event detected by the robot. For example, the robot detects environmental information, and when it detects a sudden change in environmental information, it determines that the information interaction request should be triggered.

[0066] The content of the prompt message varies depending on how the interaction request is triggered. When the interaction request is triggered by a user-defined schedule, the prompt message can contain information about that schedule. For example, it could be a reminder to drink water or take a trip. When the interaction request is triggered by a sudden event detected by the robot, the prompt message can contain information about that event. For example, it could be information about environmental changes.
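The dependence of the prompt content on the trigger type can be sketched as below. The names `Trigger`, `InteractionRequest`, and `prompt_content`, as well as the wording of the generated text, are illustrative assumptions rather than part of the described method.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    SCHEDULE = "schedule"          # user-defined schedule, e.g. an alarm
    SUDDEN_EVENT = "sudden_event"  # sudden change detected by the robot


@dataclass(frozen=True)
class InteractionRequest:
    trigger: Trigger
    detail: str  # schedule item or event description


def prompt_content(request: InteractionRequest) -> str:
    """Pick the prompt content according to how the request was triggered."""
    if request.trigger is Trigger.SCHEDULE:
        return f"Reminder: {request.detail}"
    return f"Notice: {request.detail}"


scheduled = prompt_content(InteractionRequest(Trigger.SCHEDULE, "drink water"))
sudden = prompt_content(InteractionRequest(Trigger.SUDDEN_EVENT, "temperature is dropping"))
```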

[0067] S2032, the robot determines the expression parameters of the prompt information based on the dynamic profile.

[0068] This expression parameter controls the emotional characteristics of the output prompt information. In this step, the robot uses a Large Language Model (LLM) combined with the user's dynamic profile to determine the emotional expression mode that matches the user's dynamic profile, and then determines this expression parameter.

[0069] In this embodiment, the robot can output the prompt information through multiple modalities, and the expression parameters can accordingly be of different types for different output modalities. In some embodiments, when the prompt information is audio information, the expression parameters may include speech rate and fundamental frequency. Accordingly, when the prompt information is audio information, the robot determines the speech rate and fundamental frequency of the prompt information based on the dynamic profile. For example, when the dynamic profile indicates that the user prefers "concerned" voice prompt information, the robot determines the expression parameters as a speech rate adjustment of 10% and fundamental frequency +5%; when the dynamic profile indicates that the user prefers "humorous" voice prompts, the robot determines the expression parameters as speech rate +8% and fundamental frequency +12%.
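As an illustration of how such relative expression parameters might be applied to a voice baseline, consider the sketch below. The baseline values (4.0 syllables per second, 200 Hz) and the names `VoiceParams` and `apply_expression` are assumptions; only the "humorous" offsets of +8% and +12% come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceParams:
    speech_rate: float  # syllables per second (assumed unit)
    f0_hz: float        # fundamental frequency in hertz


def apply_expression(base: VoiceParams, rate_pct: float, f0_pct: float) -> VoiceParams:
    """Scale the baseline by the percentage offsets of the expression parameters."""
    return VoiceParams(
        speech_rate=base.speech_rate * (1.0 + rate_pct / 100.0),
        f0_hz=base.f0_hz * (1.0 + f0_pct / 100.0),
    )


baseline = VoiceParams(speech_rate=4.0, f0_hz=200.0)  # hypothetical baseline
humorous = apply_expression(baseline, rate_pct=8.0, f0_pct=12.0)
```

Expressing the parameters as percentage offsets keeps them independent of the particular text-to-speech voice in use.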

[0070] In some embodiments, the prompt information may also be visual information. This visual information may be multimedia information, or it may be robot gesture information, etc. Accordingly, the expression parameters include display hue and/or robot control parameters, and when the prompt information is visual information, the robot determines the display hue and/or robot control parameters of the prompt information based on the dynamic profile.

[0071] For example, when the prompt is multimedia information, the expression parameter can specify that a multi-segment LED display on the circular screen of the robot's head shows a breathing animation in cool or warm colors matching the dynamic profile. When the prompt is robot gesture information, the expression parameter can be the robot's shoulder degree of freedom (DOF) = 2 and elbow DOF = 1. In some embodiments, the robot can pre-store preset gesture templates corresponding to different emotions. Accordingly, in this step, the robot can determine the user's current emotion based on the dynamic profile, and then obtain the expression parameters from the preset gesture template corresponding to that emotion.

[0072] S2033, the robot determines the prompt information that matches the dynamic profile based on the content of the prompt information and the expression parameters.

[0073] The robot combines the content and expression parameters of the prompt information to create a prompt for matching the dynamic profile.

[0074] In this embodiment, the content of the prompt message can be inferred using a Large Language Model (LLM). Taking the current sudden event, the user's dynamic profile vector, and the schedule as joint inputs, the robot generates, through constrained decoding, candidate reminder statements containing a reminder timing token, a reminder content token, and a reminder mood token. Accordingly, the above steps S2031-S2033 can be implemented by an LLM.

[0075] S204, the robot outputs the prompt message based on the user's status information.

[0076] To prevent the robot's output of prompts from disturbing the user, in this embodiment, the robot also acquires the user's state information when outputting the prompt, and then determines whether to output the prompt based on the user's state information. Figure 5 illustrates a flowchart of an information interaction method provided by an exemplary embodiment. As shown in Figure 5, this process can be implemented through the following steps S2041-S2044: S2041, the robot determines the user's level of focus based on the user's state information.

[0077] The state information can be obtained in the same way as the state information obtained by the robot in step S201. In order to ensure that the focus determination process is more accurate, the state information can also include other state information that can characterize the user's focus. For example, the state information can also include information such as heart rate variability HRV_SDNN, eye gaze deviation angle, keyboard and mouse interval time TK, and ambient sound pressure level Leq.

[0078] This focus level represents the user's current level of focus. In some embodiments, the robot parses the user's state information to obtain the user's focus factors; these focus factors are then input into a focus determination model to obtain the user's focus level. The focus determination model is used to output the user's focus level based on the user's focus factors.

[0079] The focus determination model can be an ultra-lightweight deep network used to determine whether a user can be disturbed. In some embodiments, the focus determination model may include three fully connected layers containing 32, 16, and 1 neurons, respectively. That is, the input layer includes 32 neurons, the intermediate layer includes 16 neurons, and the output layer includes 1 neuron. The input and intermediate layers use the ReLU activation function, and the output layer uses the Sigmoid function to compress the output into the range from 0 to 1. The network therefore has few parameters and intermediate structures, occupies little memory, and has a low computational cost, making it suitable for resource-constrained microchips and thus for the structural layout of robots.
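A network of this shape can be sketched in plain Python as follows. This is an untrained illustration under stated assumptions: the four input factors (for instance normalized HRV_SDNN, gaze deviation angle, TK, and Leq), the random weight initialization, and the class name `FocusModel` are not taken from the description, which specifies only the layer widths and activation functions.

```python
import math
import random
from typing import List, Sequence


def dense(x: Sequence[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """Fully connected layer: one weight row and one bias per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(values: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def sigmoid(value: float) -> float:
    return 1.0 / (1.0 + math.exp(-value))


class FocusModel:
    """Three fully connected layers with 32, 16, and 1 neurons."""

    def __init__(self, n_inputs: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)  # random weights stand in for trained ones
        sizes = [n_inputs, 32, 16, 1]
        self.n_inputs = n_inputs
        self.layers = []
        for fan_in, fan_out in zip(sizes, sizes[1:]):
            weights = [[rng.uniform(-0.5, 0.5) for _ in range(fan_in)] for _ in range(fan_out)]
            self.layers.append((weights, [0.0] * fan_out))

    def predict(self, factors: Sequence[float]) -> float:
        """Map normalized focus factors to a focus level d in [0, 1]."""
        if len(factors) != self.n_inputs:
            raise ValueError("unexpected number of focus factors")
        hidden = relu(dense(factors, *self.layers[0]))
        hidden = relu(dense(hidden, *self.layers[1]))
        return sigmoid(dense(hidden, *self.layers[2])[0])


model = FocusModel(n_inputs=4, seed=0)
focus_level = model.predict([0.5, 0.2, 0.8, 0.3])
```

With four inputs the sketch has 4×32 + 32 + 32×16 + 16 + 16×1 + 1 = 705 parameters, which gives a sense of how small such a model is.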

[0080] S2042, If the focus level is less than the first preset value, the robot outputs the prompt message.

[0081] When the level of focus is less than the first preset value, the robot determines that the user is not currently focused on something and can be disturbed, so it immediately outputs the prompt message.

[0082] S2043, if the focus level is greater than or equal to the first preset value and the focus level is less than the second preset value, the robot caches the prompt information and outputs the prompt information after the first preset time.

[0083] When the level of focus is greater than or equal to the first preset value and less than the second preset value, the robot determines that the user's current attention is relatively focused. The robot caches the prompt information and outputs the prompt information after the first preset duration.

[0084] It should be noted that after the robot caches the prompt message, it can monitor the user's focus level in real time within the first preset duration. When the user's focus level is detected to be less than the first preset value, the prompt message is output. If the user's focus level is not detected to be less than the first preset value within the first preset duration, the prompt message can be deleted. Alternatively, after caching the prompt message, the robot can determine the focus level again once the first preset duration has elapsed. If the detected focus level is less than the first preset value, the prompt message is output; otherwise, the prompt message is deleted.
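The real-time monitoring variant for a cached prompt can be sketched as below. It assumes that the release threshold is the first preset value and that focus samples arrive as (elapsed seconds, focus level) pairs; the function name `resolve_cached_prompt` and the default values are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def resolve_cached_prompt(
    samples: Iterable[Tuple[float, float]],
    window_s: float = 600.0,
    first_value: float = 0.25,
) -> Optional[float]:
    """Return the elapsed time at which the cached prompt is output.

    Returns None when the focus level never drops below first_value
    inside the window, in which case the prompt is deleted.
    """
    for elapsed_s, focus in samples:
        if elapsed_s > window_s:
            break  # the first preset duration has expired
        if focus < first_value:
            return elapsed_s
    return None


# Hypothetical focus samples taken once per minute after caching.
released_at = resolve_cached_prompt([(60.0, 0.40), (120.0, 0.30), (180.0, 0.20)])
expired = resolve_cached_prompt([(60.0, 0.40), (700.0, 0.10)])
```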

[0085] The first preset duration can be set as needed, and in this embodiment, the first preset duration is not specifically limited. For example, the first preset duration can be 10 minutes, 15 minutes, or 18 minutes, etc.

[0086] S2044, If the level of focus is greater than or equal to the second preset value, the robot deletes the prompt message.

[0087] When the level of focus is greater than or equal to the second preset value, the robot determines that the user is currently in a highly focused attention stage and cannot be disturbed. Therefore, it deletes the prompt message and stops outputting it.

[0088] Here, the first preset value is less than the second preset value. Furthermore, the magnitudes of the first and second preset values can be determined based on the range of the focus level. In this embodiment, the first and second preset values are not specifically limited. For example, in this embodiment, the focus level is d∈[0,1], and correspondingly, the first preset value can be 0.25, and the second preset value can be 0.45, etc. Accordingly, when the focus level d<0.25, the prompt message is output immediately; when 0.25≤d<0.45, the prompt message is cached and a waiting window is opened; when d≥0.45, the prompt message is directly discarded.
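The three-way decision of steps S2042-S2044 can be written compactly, using the example values 0.25 and 0.45 as defaults. The names `Action` and `decide` are illustrative assumptions.

```python
from enum import Enum


class Action(Enum):
    OUTPUT = "output"  # S2042: output the prompt immediately
    CACHE = "cache"    # S2043: cache and output after the first preset duration
    DELETE = "delete"  # S2044: discard the prompt


def decide(d: float, first_value: float = 0.25, second_value: float = 0.45) -> Action:
    """Choose an action for a focus level d in [0, 1]."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("focus level must lie in [0, 1]")
    if first_value >= second_value:
        raise ValueError("the first preset value must be less than the second")
    if d < first_value:
        return Action.OUTPUT
    if d < second_value:
        return Action.CACHE
    return Action.DELETE


decisions = [decide(d) for d in (0.10, 0.25, 0.44, 0.45, 0.90)]
```

Note that the boundaries are half-open: a focus level exactly equal to the first preset value is cached, and one exactly equal to the second preset value is deleted.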

[0089] In this implementation, the user's focus level is determined before the prompt message is output. The prompt message is output immediately only when the user's focus level is low, is cached and output later when the focus level is moderate, and is not output when the focus level is high. This ensures the effective output of the prompt message and prevents the robot from disturbing the user when outputting the prompt message, thus optimizing the user experience.

[0090] Furthermore, when the robot outputs the prompt, it can output different modalities of prompt information through different expression parameters. Moreover, the robot can output at least one modality of prompt information simultaneously. For example, the robot can simultaneously output prompt information in voice, facial expression, and gesture modalities; that is, it can simultaneously output voice information, multimedia information, and robot gesture information.

[0091] It should be noted that since the prompts obtained from the large language model inference can be multiple prompt statements, the robot can also evaluate the prompt strategies of these multiple prompt statements and select the prompt statement with the best evaluation effect. Furthermore, in order to improve the accuracy of the timing of the prompt output, this application also provides a feedback learning method. Accordingly, the robot collects the user's emotional feedback information after outputting the prompt information; based on the emotional feedback information, the attention model and profile rules are adjusted.

[0092] The user's emotional feedback information includes explicit and implicit information. Explicit information refers to information actively provided by the user. For example, the robot can periodically collect user feedback and determine explicit information based on that feedback. Implicit information refers to changes in the user's state after a prompt is output. For example, after outputting a prompt, the robot can collect the user's gaze duration and probability of occurrence; a longer gaze duration and a higher probability indicate greater user satisfaction with the current prompt.

[0093] After the robot collects the user's emotional feedback information, it can fine-tune the LLM model, attention determination model and profiling rules in this application through offline reinforcement learning algorithms (such as Batch-Constrained Q-learning, BCQ), so that the robot becomes more and more in line with the user's usage habits and further improves the user experience.

[0094] In this embodiment, the robot acquires raw sensor data during operation. Based on the state information representing the user's state and the environmental information representing the user's environment in the raw sensor data, it determines the user's dynamic profile. When receiving an information interaction request, it generates a prompt message matching the dynamic profile based on the dynamic profile and outputs the prompt message based on the user's state information. In this way, the prompt message generated based on the user's dynamic profile matches the user's current dynamic profile, so the output prompt message can match the user's current state, preventing the output prompt message from disturbing the user's current state. Furthermore, for different dynamic profiles of the user, the prompt message can be personalized to match the user's needs and meet the user's different states.

[0095] For ease of understanding, this application is explained below with reference to the logic block diagram of the robot. Figure 6 illustrates a schematic diagram of the information interaction logic of a robot provided in an exemplary embodiment. As shown in Figure 6, the robot acquires raw sensor data through a multimodal perception layer. Specifically, it collects voice streams through a microphone array, acquires the robot's pose and velocity vectors through an IMU / wheel speed sensor, collects environmental information such as temperature, humidity, air pressure, and fine particulate matter content through environmental sensors, and obtains the user's physiological data through wearable devices to obtain state information.

[0096] After acquiring raw sensor data, behavioral elements are extracted to obtain behavioral atomic sequences and weather events. A dynamic profiling engine then creates a dynamic profile of the user based on these behavioral sequences and weather events. This dynamic profile, along with weather events and schedule information, is input into a large language model, which determines the appropriate prompts. The timing of these prompts is optimized using a focus model, ultimately achieving multimodal output.

[0097] After completing the multimodal output, the attention model, large language model, and dynamic profiling engine are optimized through online feedback learning to form a closed loop.

[0098] In this way, the robot acquires raw sensor data during operation, determines the user's dynamic profile based on the state information representing the user's state and the environmental information representing the user's environment. When receiving an information interaction request, it generates a prompt message matching the dynamic profile based on the dynamic profile, and outputs the prompt message based on the user's state information. In this way, the prompt message generated based on the user's dynamic profile matches the user's current dynamic profile, so the output prompt message can match the user's current state, preventing the output prompt message from disturbing the user's current state. Furthermore, the prompt message can be personalized to match the user's needs and meet the user's different states based on different dynamic profiles.

[0099] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.

[0100] Figure 7 shows a schematic diagram of the structure of an information interaction device provided in this application, which includes units for performing the steps in the above embodiments. As shown in Figure 7, the information interaction device includes: a first acquisition unit 701, used to acquire raw sensor data, which includes state information for characterizing the user's state and environmental information for characterizing the user's environment; a first determining unit 702, used to determine the dynamic profile of the user based on the raw sensor data; a generation unit 703, used to generate prompt information that matches the dynamic profile in response to information interaction needs; and an output unit 704, used to output the prompt information based on the user's state information.

[0101] In some embodiments, the output unit 704 is configured to determine the user's level of focus based on the user's state information; if the level of focus is less than a first preset value, output the prompt message; or, if the level of focus is greater than or equal to the first preset value and less than a second preset value, cache the prompt message and output the prompt message after a first preset duration; or, if the level of focus is greater than or equal to the second preset value, delete the prompt message, wherein the first preset value is less than the second preset value.

[0102] In some embodiments, the output unit 704 is used to parse the user's state information to obtain the user's focus factors; input the focus factors into a focus determination model to obtain the user's focus, and the focus determination model is used to output the user's focus based on the user's focus factors.

[0103] In some embodiments, the first determining unit 702 is configured to: determine user behavior tags based on user state information in the original sensor data, the behavior tags including motion tags and interaction tags; determine weather events matching the environmental information based on the environmental information in the original sensor data; establish a correspondence between the weather events and user behavior tags based on the time correspondence between the environmental information and the user state information; and create a profile of the current user based on the correspondence between the weather events and the user behavior tags to obtain a dynamic profile of the user.

[0104] In some embodiments, the device further includes: The second acquisition unit is used to acquire the user's historical operation records on the robot; The second determining unit is used to determine the user's interest tags based on the historical operation record; Accordingly, the first determining unit 702 is used to create a profile of the current user based on the correspondence between the weather event and the user behavior tag and the interest tag, so as to obtain the dynamic profile of the user.

[0105] In some embodiments, the generation unit 703 is configured to, in response to an information interaction request, determine the content of a prompt message based on the information interaction; determine the expression parameters of the prompt message based on the dynamic profile; and determine a prompt message matching the dynamic profile based on the content of the prompt message and the expression parameters.

[0106] In some embodiments, the generation unit 703 is configured to determine the speech rate and fundamental frequency of the prompt information based on the dynamic profile when the prompt information is audio information; and / or, The generation unit 703 is used to determine the display color tone and / or robot control parameters of the prompt information based on the dynamic image when the prompt information is visual information.

[0107] In some embodiments, the device further includes: The data collection unit is used to collect the user's emotional feedback information after the prompt message is output. The training unit is used to adjust the attention model and profiling rules based on the emotional feedback information.

[0108] In this embodiment, the robot acquires raw sensor data during operation. Based on the state information representing the user's state and the environmental information representing the user's environment in the raw sensor data, it determines the user's dynamic profile. When receiving an information interaction request, it generates a prompt message matching the dynamic profile based on the dynamic profile and outputs the prompt message based on the user's state information. In this way, the prompt message generated based on the user's dynamic profile matches the user's current dynamic profile, so the output prompt message can match the user's current state, preventing the output prompt message from disturbing the user's current state. Furthermore, for different dynamic profiles of the user, the prompt message can be personalized to match the user's needs and meet the user's different states.

[0109] Figure 8 is a schematic diagram of a robot provided in an exemplary embodiment of this application. As shown in Figure 8, the robot 8 in this embodiment includes a processor 80, a memory 81, and a computer program 82 stored in the memory 81 and executable on the processor 80, such as an information interaction program. When the processor 80 executes the computer program 82, it implements the steps in the various information interaction method embodiments described above, for example, steps S201 to S204 shown in Figure 2. Alternatively, when the processor 80 executes the computer program 82, it implements the functions of each unit in the above-described device embodiments, for example, the functions of units 701 to 704 shown in Figure 7.

[0110] For example, the computer program 82 can be divided into one or more units, which are stored in the memory 81 and executed by the processor 80 to complete this application. The one or more units can be a series of computer program instruction segments capable of performing specific functions, which describe the execution process of the computer program 82 in the robot 8. For example, the computer program 82 can be divided into a first acquisition unit 701, a first determination unit 702, a generation unit 703, and an output unit 704, with the specific functions of each module as follows: The first acquisition unit 701 is used to acquire raw sensor data, which includes state information for characterizing the user's state and environmental information for characterizing the user's environment. The first determining unit 702 is used to determine the dynamic profile of the user based on the original sensor data; The generation unit 703 is used to generate prompt information that matches the dynamic profile in response to information interaction needs. The output unit 704 is used to output the prompt information based on the user's status information.

[0111] The robot 8 can be any robot with information interaction capabilities. The robot 8 may include, but is not limited to, a processor 80 and a memory 81. Those skilled in the art will understand that Figure 8 is merely an example of the robot 8 and does not constitute a limitation on the robot 8. The robot 8 may include more or fewer parts than shown, or combine certain parts, or use different parts. For example, the robot 8 may also include input / output devices, network access devices, buses, etc.

[0112] The processor 80 may be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor may be a microprocessor or any conventional processor.

[0113] The memory 81 can be an internal storage unit of the robot 8, such as a hard drive or memory. The memory 81 can also be an external storage device of the robot 8, such as a plug-in hard drive, smart media card (SMC), secure digital (SD) card, flash card, etc., equipped on the robot 8. Furthermore, the memory 81 can include both internal and external storage units of the robot 8. The memory 81 is used to store the computer program and other programs and data required by the robot 8. The memory 81 can also be used to temporarily store data that has been output or will be output.

[0114] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is merely an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit. Furthermore, the specific names of the functional units and modules are only for easy differentiation and are not intended to limit the scope of protection of this application. The specific working process of the units and modules in the above system can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.

[0115] In the above embodiments, the descriptions of each embodiment have different focuses. For parts that are not described in detail or recorded in a certain embodiment, please refer to the relevant descriptions of other embodiments.

[0116] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0117] In the embodiments provided in this application, it should be understood that the disclosed devices / terminal equipment and methods can be implemented in other ways. For example, the device / terminal equipment embodiments described above are merely illustrative. For instance, the division of modules or units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the displayed or discussed mutual coupling or direct coupling or communication connection may be through some interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or other forms.

[0118] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0119] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0120] If the integrated module / unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments can also be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program includes computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms. The computer-readable medium can include: any entity or device capable of carrying the computer program code, recording media, USB flash drives, portable hard drives, magnetic disks, optical disks, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media, etc. It should be noted that the content included in the computer-readable medium can be appropriately added or removed according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.

[0121] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps described in the above method embodiments.

[0122] This application also provides a computer program product that, when run on a mobile terminal, enables the mobile terminal to implement the steps described in the various method embodiments above.

[0123] The above-described embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application, and should all be included within the protection scope of this application.

Claims

1. An information interaction method, characterized in that the method includes: acquiring raw sensor data, which includes state information characterizing a user's state and environmental information characterizing the user's environment; determining a dynamic profile of the user based on the raw sensor data; in response to information interaction needs, generating, based on the dynamic profile, a prompt message matching the dynamic profile; and outputting the prompt message based on the user's state information.

2. The method as described in claim 1, characterized in that the step of outputting the prompt information based on the user's state information includes: determining the user's focus level based on the user's state information; and if the focus level is less than a first preset value, outputting the prompt information; or, if the focus level is greater than or equal to the first preset value and less than a second preset value, caching the prompt information and outputting the prompt information after a first preset duration; or, if the focus level is greater than or equal to the second preset value, deleting the prompt information; wherein the first preset value is less than the second preset value.
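
As an illustrative, non-limiting sketch of the three-way decision recited in claim 2, the following Python fragment maps a focus level to one of the three actions. The names `Action`, `decide_action`, `first_preset`, and `second_preset` are hypothetical and do not appear in this application, and the numeric range of the focus level is an assumption.

```python
from enum import Enum


class Action(Enum):
    OUTPUT = "output"  # deliver the prompt information now
    CACHE = "cache"    # hold the prompt information and deliver it later
    DELETE = "delete"  # discard the prompt information


def decide_action(focus: float, first_preset: float, second_preset: float) -> Action:
    """Map a focus level to an action using the two preset values of claim 2."""
    # The claim requires the first preset value to be less than the second.
    if not first_preset < second_preset:
        raise ValueError("first_preset must be less than second_preset")
    if focus < first_preset:
        return Action.OUTPUT
    if focus < second_preset:
        # first_preset <= focus < second_preset
        return Action.CACHE
    # focus >= second_preset
    return Action.DELETE
```

In this sketch, a caller that receives `Action.CACHE` would be responsible for holding the prompt information and outputting it after the first preset duration; the timing mechanism is left out because the application does not specify it.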

3. The method as described in claim 2, characterized in that the step of determining the user's focus level based on the user's state information includes: analyzing the user's state information to obtain focus factors of the user; and inputting the focus factors into a focus determination model to obtain the user's focus level, wherein the focus determination model is used to output the user's focus level based on the user's focus factors.
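
The application does not fix the form of the focus determination model. Purely as an assumed example, the sketch below combines named focus factors with a weighted sum and a logistic function so that the focus level falls between 0 and 1. The function name `focus_level`, the factor names, and the weights are all hypothetical.

```python
import math


def focus_level(factors: dict, weights: dict, bias: float = 0.0) -> float:
    """Combine focus factors into a focus level in (0, 1).

    factors: {factor_name: value}, obtained by analyzing the state information
    weights: {factor_name: weight}; factors without a weight contribute nothing
    """
    z = bias + sum(weights.get(name, 0.0) * value for name, value in factors.items())
    # The logistic function squashes the weighted sum into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))
```

A trained model of any other kind could take the place of this function, provided it outputs a focus level from the focus factors.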

4. The method as described in claim 1, characterized in that the step of determining the dynamic profile of the user based on the raw sensor data includes: determining behavior tags of the user based on the user's state information in the raw sensor data, the behavior tags including motion tags and interaction tags; determining, based on the environmental information in the raw sensor data, a weather event matching the environmental information; establishing a correspondence between weather events and the user's behavior tags based on the temporal correspondence between the environmental information and the user's state information; and creating a profile of the current user based on the correspondence between the weather events and the user's behavior tags, to obtain the dynamic profile of the user.
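
One possible reading of the temporal correspondence in claim 4 is that a behavior tag is associated with a weather event when the two are observed close together in time. The sketch below counts such co-occurrences. The function name `correlate`, the tuple layout of the inputs, and the 600-second window are assumptions made for illustration only.

```python
from collections import defaultdict


def correlate(weather_events, behavior_tags, window_s=600.0):
    """Pair behavior tags with weather events observed within window_s seconds.

    weather_events: list of (timestamp_seconds, event_name)
    behavior_tags:  list of (timestamp_seconds, tag_name)
    Returns {event_name: {tag_name: co-occurrence count}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for event_time, event in weather_events:
        for tag_time, tag in behavior_tags:
            if abs(tag_time - event_time) <= window_s:
                counts[event][tag] += 1
    # Convert to plain dicts so the result is easy to compare and store.
    return {event: dict(tags) for event, tags in counts.items()}
```

For example, with a "rain" event at time 0 and tags "stays_indoors" at 120 s and "voice_query" at 300 s, both tags are counted once under "rain". The resulting mapping is one candidate input for creating the user's profile.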

5. The method as described in claim 4, characterized in that the method further includes: obtaining the user's historical operation records on the robot; and determining interest tags of the user based on the historical operation records; accordingly, the step of creating a profile of the current user based on the correspondence between the weather events and the user's behavior tags, to obtain the dynamic profile of the user, includes: creating a profile of the current user based on the interest tags and the correspondence between the weather events and the user's behavior tags, to obtain the dynamic profile of the user.

6. The method according to any one of claims 1-5, characterized in that the step of generating, in response to information interaction needs and based on the dynamic profile, prompt information matching the dynamic profile includes: in response to the information interaction needs, determining the content of the prompt information based on the information interaction needs; determining expression parameters of the prompt information based on the dynamic profile; and determining, based on the content of the prompt information and the expression parameters, the prompt information matching the dynamic profile.

7. The method as described in claim 6, characterized in that the step of determining the expression parameters of the prompt information based on the dynamic profile includes: when the prompt information is audio information, determining the speech rate and fundamental frequency of the prompt information based on the dynamic profile; and/or, when the prompt information is visual information, determining the display color tone and/or robot control parameters of the prompt information based on the dynamic profile.
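
As a non-limiting illustration of how expression parameters might be selected from the dynamic profile, the sketch below looks up a speech rate, a fundamental frequency, and a display color tone from a motion tag. The class `ExpressionParams`, the key `motion_tag`, the tag names, and every numeric value are hypothetical and are not taken from this application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionParams:
    speech_rate: float  # multiplier relative to a default speaking rate
    pitch_hz: float     # fundamental frequency of the synthesized voice
    color_tone: str     # display color tone for visual prompt information


# Hypothetical presets keyed by a motion tag from the dynamic profile.
_PRESETS = {
    "resting": ExpressionParams(0.85, 180.0, "warm"),
    "exercising": ExpressionParams(1.15, 220.0, "vivid"),
}
_DEFAULT = ExpressionParams(1.0, 200.0, "neutral")


def expression_for(profile: dict) -> ExpressionParams:
    """Return expression parameters for a profile, falling back to defaults."""
    return _PRESETS.get(profile.get("motion_tag"), _DEFAULT)
```

A lookup table is used here only for brevity; the application leaves open how the expression parameters are derived from the dynamic profile.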

8. The method according to any one of claims 1-5, characterized in that the method further includes: after outputting the prompt information, collecting emotional feedback information of the user; and adjusting the focus determination model and profile rules based on the emotional feedback information.

9. A robot comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the information interaction method as described in any one of claims 1 to 8.

10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the information interaction method as described in any one of claims 1 to 8.