A multi-source wake-up management method for a robot and an artificial intelligence chip
By analyzing users' historical wake-up data, identifying users with response deviations and optimizing wake-up words, and adopting a multi-source wake-up management method, the problem of fixed robot wake-up words was solved, thereby increasing the number of wake-ups and improving user experience.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- 智芯科(合肥)芯片设计有限公司
- Filing Date
- 2026-02-06
- Publication Date
- 2026-06-19
AI Technical Summary
Robot wake words are often fixed. When a user's pronunciation is non-standard or the user is unfamiliar with the wake word, the number of successful wake-ups is low. Existing technology has difficulty identifying such users and updating their wake words to improve the user experience.
By analyzing users' historical wake-up data, user groups with response deviations are identified and different levels of wake-up optimization recognition methods are adopted, including a global leniency strategy and a targeted leniency strategy, to optimize the wake word update and recognition strategy. Combining voice and visual interaction improves the reliability of the wake-up counts.
The method improves recognition reliability when the number of robot wake-ups is low, reduces the interference risk caused by inconsistent wake words, and ensures reliable robot operation and a good user experience.
Figure CN122245307A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of wake-up management technology, and particularly relates to a multi-source wake-up management method and artificial intelligence chip for robots.
Background Technology
[0002] Modern service robots and companion robots are typically equipped with multiple interaction modalities, such as voice, vision, and touch, to achieve natural human-computer interaction. For example, the invention patent application CN202511770604.9, "A Voice Navigation Control System and Control Method for a Vehicle-Type Robot," uses voice as the human-computer interaction entry point. It performs automatic speech recognition and natural language processing on the input voice, parses out the user's intent and key entities, and classifies the parsing results into mapping instructions, motion control instructions, or navigation instructions. This effectively overcomes the limitations of existing technologies, but the following technical defect remains: robot wake words are often fixed, so if a user's pronunciation is non-standard or the user is unfamiliar with the wake word, the number of successful wake-ups will inevitably be low. It is therefore necessary to determine, from the wake-up data of different users of the robot, whether a user is unfamiliar with the wake word or pronounces it in a non-standard way, so that the user's wake word can be updated in a targeted manner and the user experience improved.
[0003] Therefore, there is an urgent need for a multi-source wake-up management method and an artificial intelligence chip for robots.
Summary of the Invention
[0004] To achieve the objectives of this invention, the following technical solution is adopted: Specifically, this application provides a multi-source wake-up management method for robots, which includes:
S1: Analyze the robot's voice wake-up data to obtain historical wake-up data for different users. Based on the historical wake-up data, determine the robot's wake-up optimization recognition method. Based on the wake-up optimization recognition method, manage and process the robot's camera device. Based on the matching degree between the voice recognition result and the voice wake-up command, combined with the user's historical wake-up process change data and the wake-up optimization recognition method, determine the optimization target users among the users.
S2: Using the wake-up optimization recognition method, determine the users with interaction improvement needs from the wake-up demand identification data of the optimization target users; then determine the wake word update recognition strategy for the optimization target users based on the wake-up demand identification data of the users with interaction improvement needs and that of the different optimization target users.
S3: Use the update recognition strategy to perform wake word recognition processing for the different optimization target users, and determine the wake-up recognition users among the optimization target users based on the updated recognition results of their wake words and the deviation from the wake words of other optimization target users.
[0005] The beneficial effects of this application are as follows: By analyzing users' historical wake-up data, user groups with response deviations to the robot's standard wake-up commands are identified. Based on the size of this group, different levels of wake-up optimization and recognition methods are generated and enabled to improve the overall reliability of identifying the reasons for users' low wake-up frequency. Specifically, when an individual user has a low historical wake-up frequency, a relaxed strategy is adopted to selectively activate the camera device to identify the user's wake-up intention, laying the foundation for further wake-up word updates.
[0006] The wake-up recognition users among the optimization target users are determined from the updated recognition results of each target user's wake word and its deviation from the wake words of other optimization target users. This takes into account that keeping the robot in active monitoring mode for long periods increases power consumption and shortens the robot's service life. Accordingly, users with a large number of wake word updates and a high interference risk are identified from the number of wake words updated and the severity of the interference risk caused by deviation from other users' wake words. For these users the camera is no longer actively turned on; instead, wake-up reliability under the current wake word is verified. This ensures reliable robot operation and mitigates the excessive interference risk caused by inconsistent wake words among different users.
[0007] Furthermore, the user's historical wake-up data is partitioned according to the recognition results of the user's voice features; that is, each historical wake-up record is attributed to a specific user based on voice features.
[0008] Furthermore, the user's historical wake-up data includes the number of times the user has historically woken up.
[0009] Furthermore, the method for determining the robot's wake-up optimization recognition method is as follows: Based on the historical wake-up data, the number of times the robot has been woken up by different users is determined. Based on the historical number of wake-ups, identify users among the users who exhibit wake-up response deviations; Based on the wake-up response deviation user data, an optimized wake-up recognition method for the robot is determined.
[0010] Furthermore, the method for determining the wake-up recognition users among the optimization target users is as follows: The number of wake words updated for each optimization target user is determined based on the updated recognition results of that user's wake words. Based on the deviation data between the wake words of the optimization target user and those of other optimization target users, any wake word of the optimization target user that is inconsistent with the wake words of the other optimization target users is identified and treated as an independent wake word. Based on the number of wake words updated for the different optimization target users and their independent wake words, the wake-up recognition users among the optimization target users are determined.
[0011] Secondly, the present invention provides an artificial intelligence chip for multi-source wake-up management of robots, comprising: a processor and a memory, wherein the memory stores a computer program, and the processor executes the computer program to implement the above-mentioned multi-source wake-up management method for robots.
[0012] Other features and advantages will be set forth in the following description, and the objects and other advantages of the invention are realized and obtained through the structures particularly pointed out in the description and the drawings.
[0013] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, preferred embodiments are described below in detail with reference to the accompanying drawings.
Attached Figure Description
[0014] The above and other features and advantages of the present invention will become more apparent from a detailed description of exemplary embodiments thereof with reference to the accompanying drawings.
[0015] Figure 1 is a flowchart of a multi-source wake-up management method for robots;
Figure 2 is a flowchart of the method for determining the robot's wake-up optimization recognition method;
Figure 3 is a flowchart of the method for determining the optimization target users among the users;
Figure 4 is a flowchart of the method for determining the wake word update recognition strategy for the target users.
Detailed Implementation
[0016] To enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this specification, and not all embodiments. Based on the embodiments of this specification, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of this specification.
[0017] Example 1: As shown in Figure 1, this application provides a multi-source wake-up management method for robots, specifically including: S1: Analyze the robot's voice wake-up data to obtain historical wake-up data for different users. Based on the historical wake-up data, determine the robot's wake-up optimization recognition method. Based on the wake-up optimization recognition method, manage and process the robot's camera device. Based on the matching degree between the voice recognition result and the voice wake-up command, combined with the user's historical wake-up process change data and the wake-up optimization recognition method, determine the optimization target users among the users. Furthermore, the user's historical wake-up data is partitioned according to the recognition results of the user's voice features; that is, each historical wake-up record is attributed to a specific user based on voice features.
[0018] The core decision-making objective of this embodiment is to identify user groups with response deviations to standard robot wake-up commands by analyzing users' historical wake-up data. Based on the size of this group, different levels of wake-up optimization identification methods are generated and activated to improve the overall reliability of identifying the reasons why users wake up infrequently. The core logic is as follows: using the historical wake-up count of an individual user as a quantitative indicator, "wake-up response deviation users" are screened by comparing it with a preset threshold. Then, depending on whether the number of this user group exceeds another preset threshold, a decision is made to adopt either a "global leniency strategy" or a "targeted leniency strategy." This achieves adaptive strategy configuration for identification environments with different risk levels, determining under what circumstances the camera device should be activated to identify whether a user has a wake-up request.
[0019] Furthermore, the user's historical wake-up data includes the number of times the user has historically woken up.
[0020] Specifically, as shown in Figure 2, the method for determining the robot's wake-up optimization recognition method is as follows: S11: Determine the number of times the robot has been woken up by different users based on the historical wake-up data. The system accesses the historical database, where each successful wake-up record has been associated with a specific registered user through voice feature recognition. The system then calculates the total number of successful wake-up events recorded under each registered user's name within a set statistical period.
[0021] Specific example: The system queries the database and accumulates the number of log entries marked "successfully woke up" for users A, B, and C within the statistical period.
[0022] Historical wake-up counts are a core indicator characterizing the frequency and stability of user interaction with the robot's wake-up function. The number of times a user successfully wakes up directly reflects the strength of their habitual use of the function and their mastery of the standard wake-up process. This data is an objective, measurable, and continuous variable, forming the basis for subsequent classification and judgment. This step digitizes user wake-up behavior, providing actionable input for data-driven automated decision-making and realizing the transformation from qualitative description to quantitative analysis.
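The counting in S11 can be sketched as follows. This is a minimal illustration, not the implementation claimed by the application: the `WakeEvent` record, its field names, the in-memory list standing in for the historical database, and the 30-day default period are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class WakeEvent:
    """Hypothetical log record; the field names are assumptions."""
    user_id: str         # identity assigned by voice-feature recognition
    timestamp: datetime
    success: bool        # True if the robot was actually woken


def count_wakeups(events, users, now, period_days=30):
    """Return the number of successful wake-ups per registered user in the period."""
    start = now - timedelta(days=period_days)
    counts = Counter(
        e.user_id for e in events
        if e.success and start <= e.timestamp <= now
    )
    # Registered users with no successful wake-up still appear, with a count of 0.
    return {u: counts.get(u, 0) for u in users}


now = datetime(2026, 2, 6)
log = [
    WakeEvent("A", now - timedelta(days=1), True),
    WakeEvent("A", now - timedelta(days=2), True),
    WakeEvent("B", now - timedelta(days=3), False),  # failed attempt, not counted
    WakeEvent("C", now - timedelta(days=45), True),  # outside the statistical period
]
wake_counts = count_wakeups(log, ["A", "B", "C"], now)
print(wake_counts)
```

Reporting a zero for users with no successful wake-ups keeps them visible to the threshold comparison in the next step.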
[0023] S12: Determine the wake-up response deviation users among the users based on the historical wake-up counts. The system compares the historical wake-up count of each user obtained in S11 with a preset wake-up count threshold. Any user whose historical wake-up count is less than this threshold is identified as a wake-up response deviation user.
[0024] Specific example: The system compares user C's historical wake-up count with the preset wake-up count threshold. Since the former is less than the latter, user C is added to the list of wake-up response deviation users.
[0025] Users with poor wake-up response are the key target set identified by this method, referring to users whose historical interaction frequency is significantly lower than the normal level. Setting a clear frequency threshold is a necessary condition for binary classification. This threshold is set based on the analysis of typical user behavior, aiming to separate users with abnormally sparse interaction behavior from the entire user base as potential candidates with usage difficulties or habit differences. This step completes the initial user segmentation, accurately identifying specific user subsets that may require special attention or adjustment strategies from the system, providing a basis for subsequent strategy branching decisions.
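The screening in S12 reduces to a strict less-than comparison against the preset count threshold. A minimal sketch, in which the function name and the sample counts for users A, B and C are illustrative assumptions:

```python
def find_deviation_users(wake_counts, count_threshold):
    """Return users whose historical wake-up count is below the threshold."""
    return sorted(u for u, n in wake_counts.items() if n < count_threshold)


# Hypothetical counts; only user C falls below the threshold of 10.
deviation_users = find_deviation_users({"A": 50, "B": 45, "C": 3}, count_threshold=10)
print(deviation_users)
```

A user whose count equals the threshold is not flagged, which follows the "less than" wording of the text.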
[0026] S13 determines the robot's wake-up optimization recognition method based on the wake-up response deviation user data.
[0027] It is understood that the wake-up response deviation user refers to a user whose historical wake-up count is less than a preset wake-up count threshold.
[0028] This step is the strategy decision layer. The system first checks whether the set of users with wake-up response deviation is empty. If it is not empty, it further determines the relationship between the number of users in the set and another preset threshold for the number of users with deviation, and selects to execute the corresponding wake-up optimization recognition process based on the comparison result.
[0029] It is understood that the wake-up optimization recognition method for the robot is determined based on the wake-up response deviation user data and the historical wake-up counts of different users, specifically as follows: Case 1: If there are no wake-up response deviation users, the system determines that there are no abnormal users and performs no wake-up optimization recognition processing on the robot. The robot maintains the default multi-source wake-up mode: the voice wake-up module and the video wake-up module work in parallel according to the originally set, relatively strict collaborative logic, and trigger wake-up or perform mutual verification only when they reach a high confidence level.
[0030] Specific example: In this case, the system will only execute a wake-up response when the voice recognition confidence is extremely high and the camera device detects that the user has a wake-up request.
[0031] Not performing wake-up optimization processing for the robot means maintaining the baseline system configuration. When all users' historical data are normal, it indicates that the existing wake-up strategy is well-matched with the user group. Maintaining the original strategy at this time ensures the stability and efficiency of system operation, avoiding the introduction of unnecessary complexity or potential false triggering risks. This guarantees that in most normal scenarios, the system operates with optimal resource consumption and the highest accuracy, and the optimization mechanism is only activated when data indicates it is necessary.
[0032] Case 2: If there are wake-up response deviation users, determine whether the number of such users exceeds a preset threshold. If so, the robot's wake-up optimization recognition method is to activate the camera whenever a voice command from any user is recognized whose similarity to the robot's voice wake-up command is greater than a preset threshold, in order to determine whether the user needs to wake up the robot. If not, the camera is activated only when a voice command from a wake-up response deviation user is recognized whose similarity to the voice wake-up command is greater than the preset threshold, in order to determine whether that user needs to wake up the robot.
[0033] Case 2 applies when there are wake-up response deviation users.
[0034] Sub-step: The system calculates the total number of users with current wake-up response deviation and determines whether the value is greater than the preset threshold for the number of users with deviation.
[0035] The preset threshold for the number of users with deviations is a critical parameter that distinguishes between two different levels of optimization strategies. The scale of the number of users with deviations reflects the prevalence of the potential problem. A few individual deviations may be related to the characteristics of the users themselves, while a large-scale deviation may indicate a general mismatch between the system's default settings and the current main user group. Differentiating between these two situations helps to take more targeted and cost-effective optimization measures. This judgment enables differentiated output of the strategy, allowing the system to choose whether to implement a global policy relaxation or adjust only for specific individuals, depending on the scope of the problem.
[0036] Case 2-1 (Number of users with deviation > Preset threshold for number of users with deviation).
[0037] The system judges that deviations are widespread, so a global leniency strategy is generated as the wake-up optimization recognition method. Under this method, when the system recognizes that the similarity between any user's voice command and the standard wake-up command is greater than a preset similarity threshold, the camera device is immediately activated, regardless of the user's identity, to perform a visually assisted judgment of whether the user has a wake-up intention. Under this strategy, any user who utters a phrase that sounds similar to the standard wake word may trigger the robot's visual confirmation process.
[0038] The global leniency strategy is an aggressive strategy aimed at improving overall recall. When the proportion of users with poor performance is high, it indicates that the overall number of wake-up calls is currently low. Therefore, there is a greater need to identify and process whether the low number of wake-up calls is actually due to unfamiliarity with the wake word. The strategy involves globally lowering the voice trigger threshold and supplementing it with visual confirmation to determine if there is indeed an issue of unfamiliarity with the wake word.
[0039] Case 2-2 (Number of users with deviation ≤ Preset threshold for number of users with deviation).
[0040] Operation: The system determines that the deviation is an isolated phenomenon and generates a targeted leniency strategy as the wake-up optimization recognition method. Under this method, the camera device is activated for visual assistance, to determine whether a wake-up request exists, only if the system identifies the current voice as coming from a user marked as a wake-up response deviation user and the similarity between that user's voice command and the standard wake-up command is greater than the preset similarity threshold. The robot is not woken up directly at this step. For non-deviation users, the default multi-source wake-up mode is still used.
[0041] Specific example: Under this strategy, visual confirmation is triggered only when a specific user (such as user C) utters a similar statement; the interactions of other users are unaffected.
[0042] The targeted leniency strategy is a resource-efficient and controllable precision optimization strategy. By focusing optimization on the difficulties faced by a small number of specific users, it identifies user experience bottlenecks while minimizing disruption to the majority of users and additional system overhead. While ensuring that overall system efficiency is not significantly affected, this strategy improves the efficiency of identifying and handling wake-up deviations for individual users who have difficulty waking the robot, achieving optimized allocation of system resources and personalized adaptation.
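The branching among Case 1, Case 2-1 and Case 2-2 can be summarized in a few lines. This is a sketch under stated assumptions: the `WakeStrategy` enum and the `select_strategy` function are hypothetical names, and the thresholds are passed in rather than fixed.

```python
from enum import Enum


class WakeStrategy(Enum):
    DEFAULT = "default multi-source wake-up"  # Case 1: no optimization
    GLOBAL_LENIENCY = "global leniency"       # Case 2-1: deviations are widespread
    TARGETED_LENIENCY = "targeted leniency"   # Case 2-2: deviations are isolated


def select_strategy(deviation_users, user_count_threshold):
    """Pick the wake-up optimization recognition method from the deviation set."""
    if not deviation_users:
        return WakeStrategy.DEFAULT
    if len(deviation_users) > user_count_threshold:
        return WakeStrategy.GLOBAL_LENIENCY
    return WakeStrategy.TARGETED_LENIENCY


print(select_strategy([], 2))
print(select_strategy(["C"], 2))
print(select_strategy(["C", "D", "E"], 2))
```

The boundary follows the text: a deviation count equal to the threshold still selects the targeted strategy, because the global strategy requires a strictly greater count.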
[0043] This embodiment provides a data-driven, hierarchical dynamic generation mechanism for robot wake-up strategies. Its value lies in enabling the robot to transcend fixed interaction logic, automatically diagnose potential wake-up efficiency issues based on user behavior patterns reflected in historical interaction data, and generate matching optimization strategies according to the scope of the problem's impact. This method can identify whether the low number of wake-up attempts is due to unfamiliarity with the wake word or a lack of genuine user wake-up needs. By proactively identifying users unfamiliar with the wake word, it also identifies a clear target user group for potential subsequent system optimizations (such as personalized wake word updates or training). Simultaneously, its hierarchical strategy mechanism ensures that resolving local problems does not affect overall performance, improving system inclusiveness while maintaining overall operational efficiency.
[0044] In a specific scenario implementation: There is a home service robot. The system has a preset statistical period of 30 days, a preset wake-up threshold of 10 times, a preset deviation user threshold of 2 people, and a preset similarity threshold of 80%.
[0045] S11 execution: The system counts data from the last 30 days and obtains the following user information: Father (historical wake-up count = 50), Mother (historical wake-up count = 45), Child (historical wake-up count = 8), Grandparent (historical wake-up count = 3).
[0046] S12 execution: Compare each user's historical wake-up count with the preset wake-up count threshold (10). The child (8 < 10) and the grandparent (3 < 10) are identified as wake-up response deviation users.
[0047] S13 execution: There are wake-up response deviation users, so proceed to Case 2. The number of deviation users is calculated to be 2. Compared with the preset threshold for the number of deviation users (2 people), it meets the "≤" condition, so proceed to Case 2-2.
[0048] Strategy generation and implementation: The system adopts the targeted leniency strategy as the current wake-up optimization recognition method. This method applies only to the wake-up response deviation users (the child and the grandparent). When the robot recognizes voice from the child or the visiting grandparent, and its similarity to the standard wake-up command is greater than 80%, the camera is immediately activated. Visual information is used to determine whether the user is facing the robot and intends to interact, and thus whether a wake-up intention exists. For the father and mother, the robot still executes the default multi-source wake-up mode, which requires higher voice confidence.
[0049] Through this embodiment, the system proactively enhances its ability to capture the potential wake-up intentions of users unfamiliar with the wake word (the child and the grandparent) without causing any additional interference to experienced users (the parents), thereby improving the overall wake-up success rate in home scenarios. At the same time, the clearly identified group of wake-up response deviation users lays a data foundation for later deciding whether to provide them with personalized wake word settings.
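The per-utterance camera decision in this household scenario can be sketched as below. The function name, the string labels for the strategies, and the sample similarity scores are assumptions for illustration; the sketch models only the relaxed camera-activation path and does not model the stricter default multi-source logic.

```python
SIMILARITY_THRESHOLD = 0.80                   # preset similarity threshold from the scenario
DEVIATION_USERS = {"child", "grandparent"}    # result of S12 in the scenario


def should_activate_camera(strategy, speaker, similarity,
                           deviation_users=DEVIATION_USERS,
                           threshold=SIMILARITY_THRESHOLD):
    """Return True if the relaxed path should turn the camera on for this utterance."""
    if similarity <= threshold:
        return False                          # not similar enough to the wake command
    if strategy == "global":
        return True                           # any speaker triggers visual confirmation
    if strategy == "targeted":
        return speaker in deviation_users     # only marked deviation users
    return False                              # default mode: relaxed path disabled


# Targeted leniency, as selected in S13 of the scenario.
print(should_activate_camera("targeted", "child", 0.85))        # True
print(should_activate_camera("targeted", "father", 0.85))       # False
print(should_activate_camera("targeted", "grandparent", 0.75))  # False
```

Activating the camera is only the first half of the decision; whether the user is facing the robot and intends to interact is then judged from the visual information, which this sketch leaves out.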
[0050] Specifically, as shown in Figure 3, the method for determining the optimization target users among the users is as follows: In this embodiment, a refined, in-depth diagnosis is performed on the initially identified user group exhibiting behavioral deviations in wake-up interactions. This process distinguishes which users can be effectively served by the current optimization strategy and which require further fundamental system optimization. The core logic is to construct a two-stage decision funnel of "initial screening and refined judgment." The first stage (initial screening) rapidly segments users based on the core quantitative indicator of "wake-up matching ratio," directly identifying users with severely non-standard behaviors. The second stage (refined judgment), applied to users not selected by the initial screening, executes three progressive sub-steps: first, analyzing the behavioral improvement after the optimization strategy is implemented; second, investigating whether the user has inherent non-standard interaction habits; and finally, assessing whether the current optimization strategy covers these non-standard habits sufficiently, that is, whether using the camera device captures enough of the user's wake-up needs. This logic aims to ensure that the final determination of "optimization target users" is necessary and accurate, providing a reliable basis for subsequent resource investment.
[0051] S21 determines the historical wake-up process in which the user's voice recognition result is completely consistent with the voice wake-up command based on the degree of matching between the voice recognition result and the voice wake-up command, and uses it as the matching wake-up process. The matching wake-up process refers to a single interactive event in which a user successfully triggers the robot's wake-up by speaking a voice that is exactly the same as the standard voice wake-up command text string.
[0052] This step is designed to extract "pure" successful samples from all successful wake-up events, ensuring they fully meet design expectations and are not corrected by mechanisms such as voice similarity tolerance. The aim is to accurately assess the user's ability to "accurately reproduce" standard commands.
[0053] This operation forms the foundation for building the entire quantitative diagnostic system. It provides clear and unambiguous numerator data for the subsequent calculation of the "wake-up matching ratio," making it possible to measure the degree of standardization of user behavior.
[0054] Taking a child user in a family as an example, the system retrieves the user's historical successful wake-up logs and compares the voice recognition text results recorded in the logs with the standard command "Xiaodu Xiaodu". Records with completely identical text (e.g., the recognition result is also "Xiaodu Xiaodu") are filtered and marked. These marked records are the matching wake-up process.
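The filtering in S21 is an exact text comparison between the logged recognition result and the standard command. A minimal sketch, assuming each log record is a dictionary with a hypothetical `recognized_text` key and that the sample near-miss transcriptions are invented for illustration:

```python
STANDARD_COMMAND = "Xiaodu Xiaodu"


def matching_wake_processes(records, standard=STANDARD_COMMAND):
    """Keep only successful wake-ups whose recognized text equals the standard command."""
    return [r for r in records if r["recognized_text"] == standard]


# Hypothetical successful wake-up log for one user.
child_log = [
    {"recognized_text": "Xiaodu Xiaodu"},
    {"recognized_text": "Xiaodou Xiaodou"},  # woke the robot via similarity tolerance
    {"recognized_text": "Xiaodu Xiaodu"},
    {"recognized_text": "Xiaotu Xiaotu"},    # near miss, also tolerated
]
matched = matching_wake_processes(child_log)
print(len(matched), "of", len(child_log))
```

Strict string equality is used because the text defines a matching wake-up process as one whose recognition result is exactly the same as the standard command text; any normalization of case or whitespace would be an additional design decision.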
[0055] S22 determines the user's wake-up matching ratio based on the proportion of the matching wake-up process in the historical wake-up process; The wake-up matching ratio refers to the percentage of a user's matched wake-up processes out of the total number of all historical wake-up processes (i.e., all successful wake-up events) within a specific statistical period.
[0056] This ratio condenses users' complex and multidimensional interaction patterns into a single, intuitive, and comparable numerical indicator. It directly reflects the probability that a user will successfully achieve an interaction through a strictly standardized path. As a primary core filter, the wake-up matching ratio can efficiently distinguish between users who highly rely on standard processes and users who highly rely on the system's fault-tolerance mechanisms, providing crucial quantitative input for subsequent rapid judgment and decision-making.
[0057] For example, the system statistics show that this child user had 8 matching wake-up processes in the past month, while the total number of successful wake-ups in that month was 40. Calculating 8 / 40 × 100%, the wake-up matching ratio of this user is 20%.
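The ratio in S22 is a simple quotient of matching wake-ups over all successful wake-ups in the period. In the sketch below, the function name is hypothetical, and returning `None` for a user with no successful wake-ups is an assumption, since the text does not say how that case is handled.

```python
def wake_match_ratio(matching_count, total_count):
    """Share of successful wake-ups that exactly matched the standard command."""
    if total_count == 0:
        return None  # assumption: the ratio is undefined without any wake-ups
    return matching_count / total_count


ratio = wake_match_ratio(8, 40)
print(f"{ratio:.0%}")  # 20%
```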
[0058] S23 determines whether the user is an optimization target user based on the user's historical wake-up process change data, wake-up matching ratio, and wake-up optimization identification method.
[0059] The target user for optimization refers to the individual user who, after going through the multi-dimensional diagnostic process described in this embodiment, is ultimately determined to need deeper systemic interventions such as personalized wake word training and targeted optimization of acoustic models.
[0060] This step serves as the central decision-making point in the entire diagnostic process. It's designed to intelligently select the most efficient judgment path based on the initial assessment of the wake-up matching ratio. For users with significant issues, a rapid response is provided; for users with complex situations, in-depth analysis is initiated. This structure establishes a hierarchical decision-making framework that ensures processing efficiency while balancing diagnostic depth and accuracy, avoiding potential misjudgments or omissions caused by a "one-size-fits-all" approach.
[0061] Specifically: Once the system has determined that the child user's wake-up matching ratio is 20%, it compares this value with an internally preset baseline and determines the subsequent execution process based on the comparison result.
[0062] It should be noted that if the wake-up matching ratio of the user is less than the preset wake-up matching ratio threshold, then the user is determined to be an optimization target user.
[0063] The preset wake-up matching ratio threshold is a predefined numerical limit used to perform fast channel determination.
[0064] When a user's wake-up match rate is extremely low, this is the clearest and strongest signal that they are unable to effectively use the standard wake-up word. Establishing a fast-determination channel for such high-priority cases is based on considerations of problem severity and timeliness of response. This greatly optimizes the efficiency of the overall diagnostic process, ensuring that system resources can be prioritized and allocated promptly to the individuals with the most urgent optimization needs and the most significant problems.
[0065] Additionally, it is understood that if the user's wake-up matching ratio is not less than the preset wake-up matching ratio threshold, the following steps are also performed: S231 Based on the change data of the user's historical wake-up process, determine the change in the daily average number of historical wake-ups of the user after the wake-up optimization recognition method is adopted, and determine whether this daily average change is greater than a preset number threshold. If yes, determine that the user is an optimization target user; otherwise, proceed to step S232. The change data here specifically refers to the quantitative change in the number or frequency of successful wake-ups per unit time (e.g., per day) before and after a specific wake-up optimization recognition method is enabled for the user.
[0066] Even if a user's wake-up matching ratio "meets the standard," if the introduction of existing optimization strategies can significantly improve their interaction success rate, this strongly proves from a practical perspective that the user previously had a real need that was suppressed by the standard process, and is the key beneficiary of the current optimization strategy. This step diagnoses from the reverse perspective of "dynamic effects of intervention measures," effectively preventing the omission of users who have qualified surface data but actually rely heavily on optimization "crutches" to interact normally, thus improving the comprehensiveness of the diagnosis.
[0067] For example, for an elderly user whose wake-up matching rate was just above average, the system analysis showed that after enabling the "visual assistance confirmation" strategy, the average number of successful wake-ups per day significantly increased from 0.5 before the strategy to 2.0 after the strategy. This significant increase was recorded as change data and used for subsequent judgment.
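The change data used in S231 can be illustrated with a minimal sketch. The function names and the per-day count lists are hypothetical, and the 1.0 threshold in the usage lines is only an illustrative value; the embodiment does not prescribe a data layout.

```python
from statistics import mean

def daily_average_change(before_counts, after_counts):
    """Change in the mean number of successful wake-ups per day.

    before_counts / after_counts: successful wake-ups per day before and
    after the wake-up optimization recognition method was enabled
    (hypothetical input format).
    """
    return mean(after_counts) - mean(before_counts)

def exceeds_change_threshold(before_counts, after_counts, threshold):
    """S231 test: is the daily average change greater than the threshold?"""
    return daily_average_change(before_counts, after_counts) > threshold

# Counts chosen to reproduce the averages quoted above (0.5 and 2.0 per day).
before = [0, 1, 0, 1]
after = [2, 2, 2, 2]
change = daily_average_change(before, after)
is_target = exceeds_change_threshold(before, after, threshold=1.0)
```

Because the comparison is strictly "greater than", a change exactly equal to the threshold does not qualify the user at this step.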
[0068] S232 determines whether the user has a history of wake-up process where the voice recognition result and voice wake-up command are not completely consistent. If yes, proceed to step S233. If no, determine that the user does not belong to the optimization target user. Inconsistent historical wake-up processes refer to interactive events where a user successfully wakes up the robot, but the voice recognition text that triggered this wake-up is not exactly the same as the standard voice wake-up command text.
[0069] This step aims to investigate whether users have developed any stable, non-standard interaction habits or expressions. If a user has no such history, it indicates that all their interactions strictly follow standard paths and their behavioral patterns are pure. As an effective secondary filter, it safely excludes users who do not exhibit any non-standard interaction habits and therefore do not require in-depth optimization from subsequent, more complex analysis processes.
[0070] Specifically: The system queries the elderly user's entire historical records to check for records of successful wake-up calls using expressions that are not entirely consistent with the standard command "Xiaodu Xiaodu," such as "Hello Xiaodu" or "Turn on the TV."
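A minimal sketch of the S232 check follows. It assumes that each successful wake-up is stored together with its recognized text, and that differences in letter case and spacing are not treated as deviations; both points are assumptions, and the function names are hypothetical.

```python
def normalize(text):
    """Lower-case the text and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())

def has_inconsistent_wakeups(recognized_texts, wake_command):
    """S232: True if any successful wake-up was triggered by text that is
    not exactly the standard voice wake-up command."""
    target = normalize(wake_command)
    return any(normalize(text) != target for text in recognized_texts)

history = ["Xiaodu Xiaodu", "Hello Xiaodu", "Xiaodu Xiaodu"]
needs_s233 = has_inconsistent_wakeups(history, "Xiaodu Xiaodu")
```

A user for whom this returns False is excluded at S232; otherwise the process continues to S233.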
[0071] S233 Based on the wake-up optimization recognition method, determine the user who is woken up by a statement that has a similarity greater than a preset threshold to the voice wake-up command, and take the user as the wake-up target user. Determine whether the proportion of the wake-up target user among the users is greater than a preset proportion threshold. If so, determine that the user does not belong to the optimization target user. If not, determine that the user belongs to the optimization target user.
[0072] Wake-up target users are a specific category here, referring to users whose successful wake-up events are attributed primarily to the lenient conditions set in the currently enabled wake-up optimization recognition method (e.g., triggering multimodal confirmation after the voice similarity exceeds a certain threshold).
[0073] This step provides a final evaluation of the coverage of the current optimization strategy by determining the proportion of wake-up target users among all users. If the majority of users are already handled by this strategy, and the wake-up voice deviations of most users have thus been reliably verified, then a user who has only a small share of historical wake-ups in which the voice recognition result is not completely consistent with the voice wake-up command is temporarily excluded from the wake word update process, provided that recognition remains reliable.
[0074] This embodiment is implemented in an enterprise meeting robot system. The system's preset thresholds are as follows: preset wake-up matching ratio threshold of 55%, preset quantity threshold of 1.5 times / day, and preset ratio threshold of 80%. The standard voice wake-up command is "Start meeting recording". Based on preliminary analysis, users "A" and "B" have been identified as users with wake-up response deviations, and a directional visual assistance strategy has been enabled for them (as a wake-up optimization recognition method, the rule is: for a specific user, when the voice similarity is >80%, the camera is forcibly turned on for intent confirmation).
[0075] S21 and S22 execution (statistical period: the most recent 20 working days): User "A": Total successful wake-ups: 15. In 3 of them, the speech recognition text exactly matched "Start meeting recording". The wake-up matching ratio is 3 / 15 = 20%.
[0076] User "B": Total successful wake-ups: 10. In 6 of them, the speech recognition text exactly matched "Start meeting recording". The wake-up matching ratio is 6 / 10 = 60%.
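The two ratios above can be reproduced with a short sketch. The log format (one recognized text per successful wake-up) and the function name are assumptions; "start record" stands in for every non-standard utterance.

```python
def wake_match_ratio(recognized_texts, wake_command):
    """Share of successful wake-ups whose recognized text equals the
    standard wake-up command exactly."""
    if not recognized_texts:
        return None  # no successful wake-ups in the period: ratio undefined
    exact = sum(1 for text in recognized_texts if text == wake_command)
    return exact / len(recognized_texts)

COMMAND = "Start meeting recording"

# Hypothetical logs that reproduce the counts of the example.
history_a = [COMMAND] * 3 + ["start record"] * 12   # 15 wake-ups, 3 exact
history_b = [COMMAND] * 6 + ["start record"] * 4    # 10 wake-ups, 6 exact

ratio_a = wake_match_ratio(history_a, COMMAND)  # 3 / 15
ratio_b = wake_match_ratio(history_b, COMMAND)  # 6 / 10
```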
[0077] S23 decision-making process execution: For user "A" (a new employee): the wake-up matching ratio (20%) is less than the preset wake-up matching ratio threshold (55%). According to the rules, the system directly determines user "A" to be an optimization target user.
[0078] For user "B" (a visitor): the wake-up matching ratio (60%) is not less than the preset wake-up matching ratio threshold (55%). Therefore, the deep analysis process is entered.
[0079] S231: The system analyzes the change data. After the directional visual assistance strategy was enabled, the daily average number of successful wake-ups for user "B" increased from 0.2 times per day to 1.0 times per day, a change of 0.8 times / day. This value does not exceed the preset quantity threshold (1.5 times / day). Therefore, proceed to S232.
[0080] S232: The system checks the user "B's" history and finds four non-standard wake-up records (such as "start record"), indicating that there are inconsistent historical wake-up processes. Therefore, proceed to S233.
[0081] S233: The system analysis shows that only 2 of the company's 100 users are handled by the directional visual assistance strategy for wake-up recognition, a proportion of 2%, which is not greater than the preset ratio threshold (80%). Most users are therefore not covered by this strategy, so to ensure overall reliability, user "B" is also determined to be an optimization target user.
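The full S23 decision chain of this example can be summarized in a minimal sketch. The data classes and field names are hypothetical; the threshold defaults are the values stated for the enterprise meeting robot. User "A" is decided on the fast channel, so the remaining fields for "A" are unused placeholders.

```python
from dataclasses import dataclass

@dataclass
class UserStats:
    match_ratio: float               # wake-up matching ratio (S21/S22)
    daily_change: float              # change in daily average wake-ups (S231)
    has_inconsistent_history: bool   # any non-exact wake-up on record (S232)

@dataclass
class Thresholds:
    match_ratio: float = 0.55
    daily_change: float = 1.5
    coverage_ratio: float = 0.80

def is_optimization_target(user, coverage_ratio, th):
    """coverage_ratio: share of all users handled by the wake-up
    optimization recognition method (S233)."""
    if user.match_ratio < th.match_ratio:
        return True                       # fast channel
    if user.daily_change > th.daily_change:
        return True                       # S231
    if not user.has_inconsistent_history:
        return False                      # S232
    return not coverage_ratio > th.coverage_ratio   # S233

th = Thresholds()
coverage = 2 / 100
user_a = UserStats(match_ratio=0.20, daily_change=0.0, has_inconsistent_history=False)
user_b = UserStats(match_ratio=0.60, daily_change=0.8, has_inconsistent_history=True)
a_is_target = is_optimization_target(user_a, coverage, th)
b_is_target = is_optimization_target(user_b, coverage, th)
```

Both users come out as optimization target users, "A" through the fast channel and "B" only at S233.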
[0082] The method proposed in this embodiment achieves high-precision and high-efficiency identification of "optimization target users" by constructing a multi-level diagnostic system with "wake-up matching ratio" as the initial screening core and "change effect verification - non-standard habit exploration - strategy coverage evaluation" as progressive fine screening methods. Its core value lies in upgrading resource allocation decisions from rough judgments based on experience to precise diagnosis based on multi-dimensional, quantifiable evidence chains. This not only significantly improves the efficiency of resource utilization and avoids the blindness of optimization measures, but more importantly, by deeply analyzing the interaction between individual user behavior patterns and existing system strategies, it can clearly reveal users who have wake-up word update needs. This method provides a key technical path and decision-making framework for intelligent systems to evolve from "general-purpose services" to "personalized adaptation," and has substantial implications for improving the robustness, inclusiveness, and user satisfaction of human-computer interaction in complex real-world scenarios.
[0083] S2 determines the wake word update and recognition strategy for the optimization target users based on the optimization target user data and on the data recording how the optimization target users' wake-up needs were recognized through the wake-up recognition optimization method. The core decision-making objective of this embodiment is to dynamically determine a tiered and differentiated "wake word update and recognition strategy" for the defined group of optimization target users, based on the group's size and on its members' historical dependence on visually assisted wake-up. The core logic is to optimize the probability that the periods in which the camera is activated cover the users' wake-up attempts. When the number of target users is large, their overall daily interaction frequency is high, so the periods during which the camera is passively activated in response to their voices naturally increase, resulting in high overall monitoring coverage. Therefore, the strategy can focus on "precise response," i.e., activating the camera only when voice is detected. When the number of target users is small, the system's natural monitoring coverage is low. In this case, the strategy needs to be dynamically adjusted according to the group's internal composition: if a large proportion of users in the group have a high dependence on visual assistance (a high active recognition count), the frequency of active activation needs to be increased to compensate for the lower coverage; conversely, a lower frequency of active activation should be maintained.
Through this logic, the system aims to: reliably monitor users during periods of high coverage generated by their natural interactions when the user group is generally active; and intelligently fill the blind spots in monitoring coverage by strategically adjusting proactive activation behavior when the user group is inactive or constitutes a single entity, thereby ensuring a stable and reliable global monitoring probability for the wake-up intentions of all optimized target users.
[0084] Furthermore, as shown in Figure 4, the method for determining the wake word update and recognition strategy for the optimization target users is as follows: S31 determines the number of optimization target users based on the optimization target user data. The system obtains the list of optimization target users that has been filtered through the aforementioned diagnostic process (such as S21-S23), and counts the total number of users in the list.
[0085] This step aims to quantify the basic size of the target user group that requires special attention when applying new strategies. Group size is the basis for assessing the frequency and duration of passive camera activation that may result from natural system interactions, elevating the input for strategy formulation from the individual level to the group statistical level, and providing the primary basis for judging the current potential level of "natural monitoring coverage" of the system.
[0086] It should be noted that if the number of target users to be optimized is greater than the preset threshold for the number of target users to be optimized, then the update and recognition strategy for the wake words of all target users to be optimized is to turn on the camera when the voice data of the target user to be optimized is recognized. At this time, since there are many users to be recognized, the recognition reliability of each target user is high.
[0087] The system has a preset threshold for the number of target users. When the number of target users exceeds this threshold, the system adopts a uniform and relatively strict strategy for all target users: the camera is only activated if the current voice signal is confirmed to originate from any user on the list via voiceprint recognition. In this case, the system does not perform any pre-set periodic active activation processing.
[0088] When the target user group is large, these users interact with the robot via voice in their daily environment at a high frequency. This results in a large number of periods where the camera is "passively turned on" in response to these users' voice commands, and the system has already achieved high coverage of natural time periods. At this point, the strategy should shift its focus to "precision," reducing proactive probing when there are no clear voice cues, and instead making full use of existing high-frequency voice interaction periods for reliable monitoring. The more users there are, the more densely the naturally triggered monitoring periods will be, and the more the strategy should rely strictly on voice triggers.
[0089] This is an "efficiency-first" strategy in a high-coverage environment. It leverages the inherently high interaction frequency in a multi-user environment to ensure monitoring coverage, avoiding unnecessary repeated active probing during periods of sufficient coverage, and making monitoring behavior more closely aligned with the actual timing of interaction intentions.
[0090] It should also be noted that if the number of target users for optimization is not greater than the preset target number threshold, then proceed to step S32. This judgment implies that the target user group is relatively small, and the passive monitoring coverage generated by the system's natural interactions may be insufficient, requiring intervention and strategy adjustments to optimize the overall monitoring probability.
[0091] When the number of target users is limited, relying solely on passive activation triggered by voice may result in sparse monitoring periods and blind spots. Therefore, more refined analysis is needed to determine whether and how to supplement coverage through proactive activation. This directs the decision-making process towards in-depth analysis of the group's internal composition, providing an entry point for intelligently adjusting monitoring strategies in scenarios with low natural coverage.
[0092] S32 determines, for each optimization target user, the number of times the user's wake-up needs were recognized through the wake-up recognition optimization method, and takes this as the user's active recognition count. The active recognition counts are then used to identify, among the optimization target users, the users with interaction improvement needs. For each target user, the system counts the number of times they successfully woke the robot through the currently effective wake-up recognition optimization method within the past evaluation period. This number is defined as the user's active recognition count. Subsequent steps use this data to define users with interaction improvement needs.
[0093] The number of active recognitions directly quantifies the user's wake-up success rate for non-standard paths (relaxed voice conditions relying on visual assistance), i.e., the strength of their dependence on the current optimization mechanism. A higher number indicates that the user is more inclined or more dependent on being woken up through this "voice + vision" fusion method. This step creates a "needs profile" within the target user group, identifying individual users who are more likely to benefit from or need continuous, relaxed visual monitoring, providing a key dimension for refined strategy formulation.
[0094] The above steps include the following: S321 Based on the number of active recognitions of different optimization target users, determine whether there are optimization target users whose number of active recognitions is greater than the preset recognition threshold. If yes, proceed to step S33. If no, determine that the update recognition strategy for the wake-up word of the optimization target user is to turn on the camera when the voice data of the optimization target user is recognized. The system sets a preset threshold for the number of recognition attempts. It iterates through the number of active recognition attempts by all target users. If the number of attempts for all users does not exceed this threshold, it indicates that the overall reliance of this small group on visual assistance is not high, and the intensity of demand does not show significant differentiation.
[0095] When the needs within a small group are homogeneous and of moderate intensity, there is no need for complex differentiation strategies. Adopting a similar voice-triggered strategy as for large groups is reasonable and straightforward. Although a small user base may result in low natural coverage, maintaining a relatively strict (voice-triggered only) strategy is a cost-benefit balance choice given the less intense demand.
[0096] S33 determines the update and recognition strategy for the wake words of the optimization target users based on the number of optimization target users, the active recognition counts of the different optimization target users, and the composition data of the users with interaction improvement needs.
[0097] When S321 determines that there are users with high demand, the system enters this comprehensive decision-making step. This step will comprehensively consider the total number of target users, the specific demand intensity data of each user, and the composition of the high-demand users (i.e., users with interaction improvement needs) identified as a result.
[0098] In complex situations where the number of target users is small but the intensity of their internal needs varies significantly, a strategy is needed that can dynamically adjust the frequency of proactive processing based on the distribution of needs in order to optimize the overall monitoring coverage probability. This is the core layer of strategy generation, which aims to create a dynamic strategy that can respond to changes in the internal composition of the group and intelligently determine the "intensity" and "target" of proactive monitoring.
[0099] The above steps include the following: S331 determines the update recognition requirement weight value for different optimization target users based on the proportion of active recognition times of different optimization target users in the historical wake-up process, and judges whether the sum of the update recognition requirement weight values of different optimization target users is greater than the preset weight threshold. If so, the update recognition strategy for all optimization target users' wake words is to keep the robot's camera in the always-on state to realize the recognition processing of the wake words of optimization target users. If not, proceed to step S332.
[0100] The system calculates a demand weight value for each user, for example as (the user's active recognition count / the user's total number of successful wake-ups during the same period). This weight value reflects the "dependency ratio" of the user's successful wake-ups on the current optimization mechanism. The system then calculates the sum of the weight values of all optimization target users. If this sum exceeds the preset weight threshold, it indicates that this small group has an extremely high "overall dependency density" on visual assistance.
[0101] When the overall dependency density of a small group is extremely high, it means that these users rely almost entirely on the "voice + vision" fusion channel. To provide them with continuous and uninterrupted wake-up possibilities, adopting the most extreme "always-on camera" strategy becomes a necessary means to ensure their basic interactive usability, and an upper limit is set for the strength of this strategy. When the target group exhibits extreme, collective high dependency characteristics, the system will activate a continuous monitoring mode to maximize the reliability of wake-up capture.
[0102] S332 identifies the optimization target users whose active recognition count exceeds the preset recognition count threshold as users with interaction improvement needs, and determines whether a given optimization target user is such a user. If so, then when the user's wake-up voice has not been recognized within the most recent preset time period, active activation is performed in the next preset time period; in addition, the camera is activated whenever the user's voice data is recognized. Otherwise, proceed to step S333. For individuals identified as users with interaction improvement needs (i.e., high-demand users), the system applies a flexible combination strategy of "proactive processing + voice triggering." Specifically, if no voice is detected from the user within a preset time period, the system proactively activates the camera to perform an environmental scan in the next preset time period; at the same time, once the user's voice is detected, the camera is activated immediately.
[0103] For high-demand individuals, during quiet periods where natural voice triggering may be insufficient, periodic active scanning is used to supplement monitoring coverage. This aims to reduce the probability that their wake-up intentions will be missed due to environmental silence, achieving "key protection" for high-demand users. By increasing the frequency of active monitoring, the coverage rate for their time periods can be improved in a targeted manner.
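The combined "proactive processing + voice triggering" rule of S332 can be sketched as a single predicate. The function name, the use of seconds as the time unit, and the 30-minute period in the usage lines are assumptions; the embodiment only speaks of a "preset time period".

```python
def should_activate_camera(voice_recognized, seconds_since_last_voice, quiet_period_s):
    """Hybrid trigger for a high-demand user.

    voice_recognized: the user's voice has just been recognized.
    seconds_since_last_voice: time since the user's voice was last recognized.
    quiet_period_s: the preset time period, expressed in seconds (assumed unit).
    """
    if voice_recognized:
        return True  # voice trigger: activate immediately
    # Active activation: the whole preset period passed without the user's voice.
    return seconds_since_last_voice >= quiet_period_s

PERIOD = 30 * 60  # hypothetical preset period of 30 minutes

on_voice = should_activate_camera(True, 0, PERIOD)
during_short_silence = should_activate_camera(False, 10 * 60, PERIOD)
after_long_silence = should_activate_camera(False, 30 * 60, PERIOD)
```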
[0104] S333 determines whether the proportion of users with interaction improvement needs among the target users is greater than a preset target user proportion threshold. If so, the camera is turned on when the voice data of the target user is identified. If not, the camera is turned on in the next preset time period if it has not been turned on in the most recent preset time period, or the camera is turned on when the voice data of the target user is identified.
[0105] This step involves developing strategies for the remaining optimization target users (i.e., users with moderate demand) who were not identified as having interaction improvement needs. The key to this decision lies in the proportion of users with high demand. If the proportion of users with interaction improvement needs exceeds a preset threshold for the proportion of optimization target users, it indicates that users employing a "lenient strategy" dominate this small group.
[0106] When users with a "lenient strategy" dominate, the system's periodic proactive activation for these users is already quite frequent, objectively resulting in high passive monitoring coverage for the entire environment (including the remaining general users). Therefore, for the remaining general users, a relatively strict strategy can be adopted, activating the camera only when their voice is heard, thus fully utilizing the "overflow" monitoring coverage generated by the high-demand user strategy. Conversely, if the proportion of high-demand users is low, it indicates that the overall proactive activation frequency is not high, and monitoring coverage may be insufficient. Therefore, a hybrid strategy of "periodic proactive activation + voice triggering" is needed for general users to jointly improve coverage. Furthermore, the fewer the total number of users, the higher the proactive activation frequency assigned to each user may need to maintain a certain coverage rate (i.e., a more "lenient" strategy).
[0107] The overall strategy was designed collaboratively. It dynamically adjusts the strategy strength for general users based on the density of high-demand users within the group, achieving global optimization of the group's monitoring coverage probability, rather than simply stacking isolated user strategies.
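The branching of S31 through S333 can be condensed into a minimal sketch. The strategy names, the function signature and the sample users below are hypothetical; the sketch only mirrors the order of the checks described in this embodiment.

```python
from enum import Enum

class Strategy(Enum):
    VOICE_TRIGGERED = "camera on only when the user's voice is recognized"
    HYBRID = "periodic active activation plus voice trigger"
    ALWAYS_ON = "camera kept on"

def select_strategies(users, group_threshold, recognition_threshold,
                      weight_threshold, proportion_threshold):
    """users maps a user id to (active_recognitions, total_wakeups)."""
    def everyone(strategy):
        return {user: strategy for user in users}

    # S31: a large group relies on naturally occurring voice triggers.
    if len(users) > group_threshold:
        return everyone(Strategy.VOICE_TRIGGERED)
    # S32 / S321: find users with interaction improvement needs.
    high = {u for u, (active, _) in users.items() if active > recognition_threshold}
    if not high:
        return everyone(Strategy.VOICE_TRIGGERED)
    # S331: sum of per-user demand weights (active / total wake-ups).
    weight_sum = sum(active / total for active, total in users.values() if total > 0)
    if weight_sum > weight_threshold:
        return everyone(Strategy.ALWAYS_ON)
    # S332: high-demand users get the hybrid strategy.
    # S333: the strategy for the rest depends on the share of high-demand users.
    rest = (Strategy.VOICE_TRIGGERED
            if len(high) / len(users) > proportion_threshold
            else Strategy.HYBRID)
    return {u: (Strategy.HYBRID if u in high else rest) for u in users}

# Hypothetical small group: two of three users exceed 10 active recognitions.
group = {"u1": (12, 30), "u2": (11, 30), "u3": (2, 10)}
plan = select_strategies(group, group_threshold=4, recognition_threshold=10,
                         weight_threshold=2.0, proportion_threshold=0.6)
```

In this hypothetical group the weight sum stays below the threshold, so "u1" and "u2" receive the hybrid strategy while "u3" is served by voice triggering alone, relying on the coverage produced by the other two.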
[0108] Assume a home intelligent robot scenario where, through the preliminary processes, the optimization target users are identified as: two children (child A, 8 years old; child B, 5 years old) and a long-term resident (grandfather C, 70 years old). The system's preset key thresholds are as follows: preset threshold for the number of optimization target users = 4 people; preset recognition count threshold = 10 times / week; preset weight threshold = 2.0; preset optimization target user proportion threshold = 60%. The evaluation period is one week (7 days).
[0109] S31 execution: The number of optimization target users is 3, which is not greater than the threshold of 4, so proceed to S32.
[0110] S32 execution: Count the number of times each user was successfully woken up through active recognition under the currently effective "targeted easing strategy" (serving as the wake-up recognition optimization method), i.e., the number of times visual confirmation was triggered by a voice similarity > 75%: Child A: 12 times / week; Child B: 18 times / week; Grandfather C: 4 times / week. S321 execution: Check whether each user's active recognition count exceeds the preset recognition count threshold (10 times / week). Child A (12 times) and Child B (18 times) exceed the threshold, while Grandfather C (4 times) does not. Therefore, high-demand users exist, and the process proceeds to step S33.
[0111] S33 (including S331) execution: First, calculate the demand weight value for each user. Assume the total number of successful wake-ups for each user this week are: Child A 15 times, Child B 20 times, Grandfather C 10 times. Then the weight values are: Child A: 12 / 15 = 0.8; Child B: 18 / 20 = 0.9; Grandfather C: 4 / 10 = 0.4. The sum of the weights is 0.8 + 0.9 + 0.4 = 2.1.
[0112] The weight sum of 2.1 is greater than the preset weight threshold of 2.0, so the process takes the "yes" branch of S331.
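The weight arithmetic of this example can be checked with a few lines. The dictionary layout and function name are assumptions; the counts are those given for the three users.

```python
def demand_weight(active_recognitions, total_wakeups):
    """Share of a user's successful wake-ups that went through the
    wake-up recognition optimization method."""
    if total_wakeups == 0:
        return 0.0  # assumed convention for a user with no wake-ups
    return active_recognitions / total_wakeups

# (active recognition count, total successful wake-ups) for the week
weekly = {
    "child A": (12, 15),
    "child B": (18, 20),
    "grandfather C": (4, 10),
}

weights = {name: demand_weight(a, t) for name, (a, t) in weekly.items()}
weight_sum = sum(weights.values())
always_on = weight_sum > 2.0  # preset weight threshold of the example
```

Floating-point rounding makes the sum differ from 2.1 in the last digits, which does not affect the comparison against 2.0.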
[0113] Strategy determined (according to S331): Since this small-scale target user group (child A, child B, grandfather C) has an extremely high overall dependence density on visual-assisted wake-up (weight sum > 2.0), the system determines the final wake-up word update recognition strategy as follows: During the active period at home (e.g., from 7 am to 10 pm), keep the robot's camera on to achieve continuous and complete recognition of the wake-up intentions of the target users.
[0114] In this scenario, the two child users' strong reliance on visual-assisted wake-up increased the overall reliance density of the small group. Therefore, the system ultimately adopted the most reliable "always-on camera" strategy. This ensures that during peak family times, the robot can continuously monitor the potential wake-up needs of all family members (especially children and the elderly), regardless of whether they issue voice commands. Although this strategy keeps the camera continuously on, it provides the most comprehensive coverage based on the high-frequency interaction in a family environment with children, avoiding wake-up failures due to monitoring blind spots.
[0115] The core value of the method proposed in this embodiment lies in constructing a dynamic monitoring strategy generation system driven by both "group size" and "individual demand intensity distribution." This method shifts the focus of strategy formulation from simply "whether to enable it" to the more fundamental question of "how to optimize the coverage probability during monitoring periods." By distinguishing between high natural coverage in multi-user scenarios and insufficient coverage risks in small-user scenarios, and introducing internal differential adjustments based on demand intensity, the system can intelligently determine when, to whom, and at what frequency to activate proactive monitoring. This not only fully utilizes the monitoring benefits of natural interaction when users are active, but also ensures a baseline of monitoring reliability through strategic proactive supplementation when users are silent or their demands are concentrated. Ultimately, this method enables the robot's wake-up recognition system to maintain a stable and highly reliable global monitoring capability in an adaptive and cost-effective manner across various user compositions and activity levels, thereby significantly improving the robustness of human-computer interaction and user satisfaction in complex home environments.
[0116] S3 uses the update processing strategy to update the wake words of different target users, and determines the wake-up recognition scheme among the target users based on the similarity of the voice features of different target users and the update deviation data of the wake words.
[0117] Specifically, the method for determining the wake-up recognition user of the optimized target user is as follows: In this embodiment, keeping the robot in active monitoring mode for extended periods increases power consumption and affects its lifespan. Therefore, testing is conducted on certain users by disabling the camera's active activation. This ensures that the robot can be reliably woken up with the updated wake word, thus verifying the wake-up reliability of the user and guaranteeing the robot's overall reliability.
[0118] Specifically, while personalized wake words are being updated in parallel for multiple optimization target users, the system assesses the risk of cross-wake-ups caused by excessive differences between the users' wake words and coordinates the camera's active monitoring resources, in order to determine which users can be safely removed from the active data collection phase and marked as "wake-up recognition users." The core logic is that the primary risk of cross-wake-ups stems from excessive differentiation among the user-specific wake word sets. When user A has a highly specific wake word X and user B unintentionally utters this word, if the voiceprint recognition module deviates momentarily due to environmental noise, user status, etc., and incorrectly attributes user B's voice to user A, the robot will be unintentionally woken up by user B. Therefore, the more independent wake words a user has, the more likely that user is to give rise to this risk, and the overall risk increases sharply when most users in a group have a large number of independent wake words. The system assesses the individual and group risk of difference by quantifying the distribution of independent wake words. Meanwhile, to ensure update efficiency, verification is delayed for users with interaction improvement needs, who already consume a significant amount of monitoring resources. Furthermore, when the data of most users is insufficient, the designation of wake-up recognition users is paused entirely to maintain sufficient active monitoring time for the camera. Ultimately, this achieves orderly and efficient multi-user updates while keeping the differentiation risk under control.
[0119] S41 determines the number of times the wake word of the target user is updated based on the updated recognition result of the wake word of the target user; The system counts the number of valid candidate wake words captured and confirmed by the camera for each target user within the update cycle, and records this as the number of wake words updated. In other words, when there is a deviation in the wake word, the system updates the candidate wake words when there is a wake-up demand, thereby improving the wake-up success rate.
[0120] The number of updates is a basic indicator for measuring the progress of data collection for a single user, reflecting the initial size of its personalized wake word sample library, providing a basis for subsequent judgment of data sufficiency, and is a prerequisite for initiating any risk assessment.
[0121] S42: Based on the deviation data between the wake words of the optimization target user and those of other optimization target users, determine the wake words of the optimization target user that are inconsistent with the wake words of other optimization target users, and treat them as independent wake words. The system performs a strict text similarity comparison between every candidate wake word of a given user and every candidate wake word of each other target user. Candidates that are dissimilar to all candidate words of all other users (text similarity below a strict threshold) are defined as that user's independent wake words.
[0122] Independent wake words are a key indicator for quantifying user exclusivity and the potential differentiation risk. The more independent wake words a user has, the fewer words in that user's wake word library are shared with or similar to those of other users, and the more exclusive the wake word set is. While this makes the user easier to identify uniquely, it also means a higher risk of false wake-ups when other users accidentally utter these words and voiceprint recognition fails.
[0123] This step identifies the high-risk exclusive part of each user's wake word set, providing the core basis for assessing both an individual's contribution to overall risk and the group's risk level.
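The S42 comparison can be illustrated with a minimal Python sketch. The source does not name a similarity metric or a threshold value, so the use of difflib.SequenceMatcher, the cut-off of 0.6, the function names, and the sample phrases are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Similarity score in [0, 1]; a stand-in metric, not taken from the source."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def independent_wake_words(candidates, threshold=0.6):
    """Map each user to the candidate wake words that resemble no other user's words.

    candidates: dict of user -> list of candidate wake words.
    threshold:  hypothetical similarity cut-off; a word counts as independent
                only if its similarity to every other user's word is below it.
    """
    result = {}
    for user, words in candidates.items():
        # All candidate words belonging to every other user.
        others = [w for u, ws in candidates.items() if u != user for w in ws]
        result[user] = [
            w for w in words
            if all(text_similarity(w, o) < threshold for o in others)
        ]
    return result


# Hypothetical sample data: both users share "hello robot".
candidates = {
    "father": ["hello robot", "hey buddy"],
    "mother": ["hello robot", "wake up please"],
}
independent = independent_wake_words(candidates)
```

In this sample, the shared phrase is excluded for both users, and only the phrases that no other user has anything similar to remain as independent wake words.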
[0124] S43 determines the wake-up recognition user among the optimization target users based on the number of updated wake words for different optimization target users and the independent wake words of the optimization target users.
[0125] This step is the decision-making hub. The system comprehensively considers the data scale (number of updates) of each user and the risk characteristics represented by the data (number and proportion of independent wake words). Through subsequent rule chains, it carefully determines whether the user can end the active data collection and become a wake-up recognition user. This ensures that the state transition from data collection to verification is strictly based on the data and the risks it reflects, so as to systematically control the potential for false wake-ups caused by excessive differentiation of wake words.
[0126] It should be noted that if the optimization target user is a user with interaction improvement needs, that user is not treated as a wake-up recognition user as long as any other user exists who is neither a user with interaction improvement needs nor a wake-up recognition user, thereby ensuring that the user's wake words can be reliably identified.
[0127] Users requiring improved interaction are those allocated more camera monitoring time. Their wake-word update and recognition strategies correspond to a greater amount of monitoring time. To optimize the overall efficiency of limited resources and prioritize ensuring other ordinary users complete basic data collection, their graduation is strategically delayed. This ensures that the total monitoring time does not decrease due to premature exit, guaranteeing sufficient camera activation time to maintain the overall update speed.
[0128] Implement a global monitoring and resource scheduling strategy to ensure the overall data collection progress of the group by "controlling the outflow of high-resource users".
[0129] If the optimization target user is not a user with interaction improvement needs, the determination includes the following. S431: Obtain the wake word update counts of the different optimization target users, and determine whether the number of optimization target users whose update count is less than the preset update count threshold is greater than the optimization target user count threshold. If yes, determine that the optimization target user is not a wake-up recognition user; otherwise, proceed to step S432. In other words, the system counts the target users whose wake word updates have not yet reached the preset update count threshold. If this number exceeds the optimization target user count threshold, the group as a whole is judged to be in an early, data-scarce stage.
[0130] When most users lack sufficient data, allowing any user to be identified as a wake-up user will directly reduce the total effective time of actively activated camera devices available for data collection. This will decrease the probability of remaining users getting a collection opportunity, thus slowing down the overall progress and creating a cycle of decreased efficiency. Therefore, at this stage, all users should be forced to remain in the collection queue, establishing a group progress guarantee mechanism to prevent decisions that harm overall efficiency during periods of weak basic data.
[0131] S432: Determine whether the update count of the optimization target user is less than the preset update count threshold. If yes, proceed to step S433; otherwise, determine that the optimization target user is a wake-up recognition user. That is, if the group's data base is adequate (S431 returns "no"), the system checks the current user's own data volume. If the user's update count has reached the threshold, the user is allowed to become a wake-up recognition user. With overall progress assured, users whose data collection is sufficient are thus the first to move on to wake-up recognition processing. This gradually releases and refocuses monitoring resources, establishes a fast track, and improves process efficiency.
[0132] S433: Determine whether the proportion of independent wake words among the optimization target user's wake words is greater than the preset wake word proportion threshold. If yes, determine that the optimization target user is a wake-up recognition user; if no, proceed to step S434. For users whose data volume has not reached the threshold, the system calculates the independent wake word proportion. A proportion above the preset wake word proportion threshold indicates that, although the user's total sample size is small, the sample library is highly specialized and exclusive.
[0133] A high proportion of independent wake words is a clear risk signal: the vocabulary collected for the user is highly user-specific and potentially high-risk. Letting such a user continue collecting might yield more diverse samples (possibly including some more general vocabulary) that dilute the overall risk, but it equally allows further exclusive words to accumulate. Therefore, when the proportion is high (S433 returns "yes"), the user proceeds to wake-up recognition processing, which solidifies the high-risk sample library and removes the user from the dynamic collection environment, preventing further risk growth in subsequent collection.
[0134] Individuals with high-risk characteristics should be dealt with quickly. High-risk individuals (those with a high proportion of independent words) should be removed from the dynamic data collection environment first to solidify the risk.
[0135] S434 determines the wake word interference risk coefficient based on the average proportion of the independent wake words of different optimization target users in the wake words of the user, and determines whether the wake word interference risk coefficient is greater than the preset interference risk coefficient threshold. If so, all optimization target users with more than the preset number of wake words are regarded as wake-up recognition users. If not, it is determined that the optimization target user does not belong to the wake-up recognition user.
[0136] It should be noted that when the optimization target users have many independent wake words, the system must distinguish the target users by their voice characteristics and determine each user's wake words from the wake words that user has updated historically. When the wake words of different target users deviate strongly from one another, the risk of misidentification increases. Therefore, the number of independent wake words is used in determining the wake-up recognition users.
[0137] For users with insufficient data and a low proportion of independent wake words (i.e., unclear risk characteristics), the system performs a final group risk assessment. The average proportion of independent wake words for all users is calculated. The higher this average, the greater the "differentiation" or "specificity" of the group's overall wake word library, meaning a higher overall risk of cross-wake-ups. Therefore, the wake word interference risk coefficient can be directly defined as this average value. If this coefficient exceeds a preset interference risk coefficient threshold, it indicates that the group is in a high-risk state.
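Written out, the coefficient and the high-risk test take the following form. The symbols are introduced here for illustration and do not appear in the source: N is the number of optimization target users, n_i is the number of wake word updates for user i, u_i is that user's number of independent wake words, and R_th is the preset interference risk coefficient threshold. Taking the proportion as u_i / n_i is an assumption that agrees with the figures in the smart home example (for instance, 2 independent wake words out of 8 updates gives 25%).

```latex
\[
  p_i = \frac{u_i}{n_i}, \qquad
  R = \frac{1}{N} \sum_{i=1}^{N} p_i, \qquad
  \text{group is in a high-risk state} \iff R > R_{\mathrm{th}}
\]
```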
[0138] When the overall differentiation risk of the group is high, a risk suppression strategy must be adopted. In this case, users who already possess a large number of independent wake words (exceeding the preset number of wake words) are marked as wake-up recognition users. These users are the main source of risk; removing them from active collection prevents them from injecting new exclusive words into the high-risk environment and helps curb further increases in risk. The freed-up monitoring resources can then be focused on users with fewer independent wake words, encouraging them to collect more shared, common words, thereby diluting the overall differentiation of the wake word library and reducing future risk.
[0139] If the overall risk coefficient is not high (the average is low), the current environmental risk is controllable, and users with insufficient data and unclear risk characteristics remain in the data collection queue without special handling. In this way, the handling adapts to the overall risk situation: in high-risk situations, the main risk sources (users with many independent wake words) are decisively identified and isolated and resources are reallocated to reduce overall risk; in low-risk situations, the standard procedure is maintained.
[0140] Furthermore, if the target user being optimized is a wake-up recognition user, the camera device will no longer be actively activated to recognize the wake-up word of the target user. The number of wake-up processes for the target user under the current wake-up word will be updated if the requirement is met.
[0141] Complete implementation examples in specific scenarios: In a smart home scenario, the target users for optimization are: a father, a mother, and a son (8 years old, users who need interaction improvement). The system has collected data for one week. Preset thresholds: Preset update quantity threshold = 10 updates; Target user quantity threshold = 2 people; Preset wake word percentage threshold = 70%; Preset wake word quantity = 8; Wake word interference risk coefficient = average percentage of independent wake words in the group; Preset interference risk coefficient threshold = 65%.
[0142] S41 execution (number of updates): father: 8; mother: 9; son (user with interaction improvement needs): 15.
S42 execution (independent wake words and their proportion): father: 2 independent wake words (25%); mother: 3 independent wake words (33%); son: 12 independent wake words (80%).
S43 decision process, for the son (user with interaction improvement needs): under the lowest-priority rule, since neither the father nor the mother has yet become a wake-up recognition user, the son is not treated as a wake-up recognition user.
[0143] For the father (a user without interaction improvement needs): S431: the users with fewer than 10 updates are the father (8) and the mother (9), a total of 2. Since this number is not greater than the threshold of 2, proceed to S432.
[0144] S432: Father's update count 8 < 10, proceed to S433.
[0145] S433: The father's independent wake word proportion is 25% < 70%, indicating no significant risk characteristics. Proceed to S434.
[0146] S434: Calculate the average percentage of independent wake words in the group: (25% + 33% + 80%) / 3 = 46%. The wake word interference risk coefficient is 46%. Since 46% is not greater than 65%, the group risk has not reached the high-risk threshold. Therefore, it is determined that the father is not a wake-up recognition user, and data collection continues.
[0147] In this round of evaluation, although the son (the user with interaction improvement needs) possessed a large number of independent wake words, his transition to wake-up recognition user was deferred under the priority rule. The father and mother had lower proportions of independent wake words, which kept the group's average differentiation risk at 46%, below the high-risk threshold of 65%. The system therefore determined that the overall risk is currently manageable and that data collection should continue for the father, mother, and son to enrich their respective wake word libraries. The system maintains its current monitoring resource allocation, focusing on helping the father and mother collect more samples and add more wake words.
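The S43 rule chain and the smart home figures above can be sketched in Python as follows. This is an illustrative reading under stated assumptions, not the patented implementation: the class, function, and parameter names are hypothetical, the proportion is taken as independent wake words divided by update count (which matches the 25%, 33%, and 80% figures), and the behavior of a user with interaction improvement needs once every ordinary user has graduated is assumed, since the source does not specify it.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TargetUser:
    name: str
    updates: int                      # S41: number of wake word updates
    independent: int                  # S42: number of independent wake words
    needs_improvement: bool = False   # user with interaction improvement needs
    recognized: bool = False          # already a wake-up recognition user

    @property
    def ratio(self) -> float:
        """Proportion of independent wake words among the user's wake words."""
        return self.independent / self.updates if self.updates else 0.0


def interference_risk(users) -> float:
    """S434 coefficient: group average of the independent wake word proportion."""
    return mean(u.ratio for u in users)


def becomes_recognition_user(user, users, update_th=10, user_count_th=2,
                             ratio_th=0.70, word_count=8, risk_th=0.65) -> bool:
    if user.needs_improvement:
        # Priority rule [0126]: wait while any ordinary user is still collecting.
        # Assumption: once none is left, the user graduates (the source is silent).
        return not any(not u.needs_improvement and not u.recognized for u in users)
    # S431: too many users are short of data, so everyone keeps collecting.
    if sum(u.updates < update_th for u in users) > user_count_th:
        return False
    # S432: this user already has enough updates.
    if user.updates >= update_th:
        return True
    # S433: small but highly exclusive sample library.
    if user.ratio > ratio_th:
        return True
    # S434: under high group risk, users holding more than the preset number of
    # independent wake words are marked; otherwise collection continues.
    return interference_risk(users) > risk_th and user.independent > word_count


family = [
    TargetUser("father", updates=8, independent=2),
    TargetUser("mother", updates=9, independent=3),
    TargetUser("son", updates=15, independent=12, needs_improvement=True),
]
decisions = {u.name: becomes_recognition_user(u, family) for u in family}
risk = interference_risk(family)
```

With these inputs, every entry in decisions is False and risk is about 0.46, in line with the walk-through: no one graduates in this round and data collection continues for all three users.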
[0148] The core value of the method proposed in this embodiment lies in clearly defining and effectively managing the inherent risks caused by "excessive differentiation of wake words" in personalized wake word updates. By explicitly defining "independent wake words" as risk characterization factors and constructing a risk coefficient centered on the "average proportion of independent wake words in the group," the system can quantitatively assess the probability of cross-false wake-ups caused by excessive differences in wake word libraries among users. This method innovatively integrates resource efficiency logic (ensuring overall monitoring duration) with dynamic risk logic (identifying and isolating high-risk sources and diluting overall differences), forming a forward-looking decision-making framework. It not only prevents reckless advancement during periods of weak overall data or risk incubation but also proactively intervenes when risk accumulation is detected to reduce the overall false wake-up risk of the system in the future. Ultimately, this method provides crucial technical assurance for achieving safe, robust, and efficient large-scale personalized wake word updates, fundamentally avoiding the unsafe technical problem of robots' cameras being actively turned on for extended periods, while also ensuring the reliability of wake word update processing.
[0149] Example 2: The present invention further provides an artificial intelligence chip for multi-source wake-up management of robots, comprising a processor and a memory, wherein the memory stores a computer program, and the processor executes the computer program to implement the above-mentioned multi-source wake-up management method for robots.
[0150] The various embodiments in this specification are described in a progressive manner. Similar or identical parts between embodiments can be referred to mutually. Each embodiment focuses on describing the differences from other embodiments. In particular, the embodiments of apparatus, devices, and non-volatile computer storage media are basically similar to the method embodiments, so the descriptions are relatively simple; relevant parts can be referred to the descriptions of the method embodiments.
[0151] The foregoing has described specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired result. In some embodiments, multitasking and parallel processing are possible or may be advantageous.
[0152] The above description is merely one or more embodiments of this specification and is not intended to limit this specification. Various modifications and variations can be made to the one or more embodiments of this specification by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principle of one or more embodiments of this specification should be included within the scope of the claims of this specification.
Claims
1. A multi-source wake-up management method for robots, characterized in that, Specifically, it includes: The robot's voice wake-up data is analyzed to obtain historical wake-up data for different users. Based on the historical wake-up data, a wake-up optimization recognition method for the robot is determined. Based on the wake-up optimization recognition method, the robot's camera device is controlled and processed. According to the matching degree between the voice recognition result and the voice wake-up command, and combined with the change data of the user's historical wake-up process and the wake-up optimization recognition method, the optimization target user among the users is determined. Based on the wake-up demand identification data of the target users based on the wake-up recognition optimization method, users with interaction improvement needs are identified. Based on the wake-up demand identification data of the users with interaction improvement needs and different target users, the wake-up word update identification strategy of the target users is determined. The update processing strategy is used to identify different wake words for the target users to optimize the updated recognition results of the wake words of the target users and the deviation between the wake words of the target users and other target users, so as to determine the wake-up recognition users of the target users.
2. The multi-source wake-up management method for robots as described in claim 1, characterized in that, The user's historical wake-up data is divided according to the recognition results of the user's voice features, specifically, the historical wake-up data is divided into different users based on the voice features.
3. The multi-source wake-up management method for robots as described in claim 1, characterized in that, The user's historical wake-up data includes the number of times the user has woken up in the past.
4. The multi-source wake-up management method for robots as described in claim 1, characterized in that, The method for determining the robot's wake-up optimization recognition method is as follows: Based on the historical wake-up data, the number of times the robot has been woken up by different users is determined. Based on the historical number of wake-ups, identify users among the users who exhibit wake-up response deviations; Based on the wake-up response deviation user data, an optimized wake-up recognition method for the robot is determined.
5. The multi-source wake-up management method for robots as described in claim 4, characterized in that, The wake-up response deviation user is a user whose historical wake-up count is less than a preset wake-up count threshold.
6. The multi-source wake-up management method for robots as described in claim 1, characterized in that, The method for determining the optimization target users among the users is as follows: Based on the degree of matching between the speech recognition result and the voice wake-up command, determine the historical wake-up process in which the speech recognition result and the voice wake-up command are completely consistent, and use it as the matching wake-up process; The wake-up matching ratio of the user is determined based on the proportion of the matching wake-up process in the historical wake-up process; Based on the user's historical wake-up process change data, wake-up matching ratio, and wake-up optimization identification method, it is determined whether the user is an optimization target user.
7. The multi-source wake-up management method for robots as described in claim 6, characterized in that, If the wake-up matching ratio of the user is less than the preset wake-up matching ratio threshold, then the user is determined to be the target user for optimization.
8. The multi-source wake-up management method for robots as described in claim 1, characterized in that, The method for determining the wake-up recognition user of the optimization target user is as follows: The number of wake word updates for the optimization target user is determined based on the updated recognition results of the wake words of the optimization target user; Based on the deviation data between the wake words of the optimization target user and those of other optimization target users, the wake words of the optimization target user that are inconsistent with the wake words of other optimization target users are determined and treated as independent wake words; Based on the number of wake word updates for different optimization target users and the independent wake words of the optimization target users, the wake-up recognition users among the optimization target users are determined.
9. The multi-source wake-up management method for robots as described in claim 8, characterized in that, If the target user for optimization is a user with interaction improvement needs, then as long as there are users who are neither users with interaction improvement needs nor wake-up recognition users, they will not be considered wake-up recognition users, thereby ensuring that their wake-up words can be reliably identified.
10. An artificial intelligence chip for multi-source wake-up management of robots, comprising: A processor and a memory, the memory storing a computer program, characterized in that, when the processor executes the computer program, it implements a multi-source wake-up management method for a robot as described in any one of claims 1-9.