Voice-based interaction method, interaction device and intelligent device

By parsing and storing the target control mode requested in a voice command, the application addresses the cumbersome operation of existing technologies, aiming for efficient and accurate voice interaction, a better user experience, and more flexible device control.

CN122201271A · Pending · Publication Date: 2026-06-12 · MIDEA GRP (SHANGHAI) CO LTD +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: MIDEA GRP (SHANGHAI) CO LTD
Filing Date: 2026-03-10
Publication Date: 2026-06-12

IPC: G10L15/18; G10L15/19; G10L15/22; G10L15/26; H04L12/28

AI Tagging

Application Domain

Data switching by path configuration; Speech recognition

AI Technical Summary

Technical Problem

In existing voice interaction technologies, users must strictly follow pre-set standard words to issue commands, which makes operation cumbersome, lowers interaction efficiency, and degrades the user experience.

Method used

The method detects a user's voice command, parses out the mode identifier and control parameters of the target control mode, and stores them, so that the user can later achieve the desired function without issuing multiple commands. Combining speech-to-text, semantic understanding, and a keyword library improves parsing accuracy and robustness.

Benefits of technology

It simplifies user operations, improves interaction efficiency and accuracy, supports multiple interaction methods and cross-device synchronization, and provides personalized configuration and context awareness.

✦ Generated by Eureka AI based on patent content.

Abstract

The application provides a voice-based interaction method, an interaction device, and a smart device, and relates to the field of voice interaction. The interaction method comprises the following steps: detecting a first voice instruction of a user, wherein the first voice instruction is used for requesting configuration of a target control mode of a smart device; obtaining a mode identifier of the target control mode and control parameters corresponding to the mode identifier based on the first voice instruction; and storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier. The method can simplify user operation and improve interaction efficiency in the process of voice interaction.

Description

Technical Field

[0001] This application relates to the field of voice interaction, and more specifically, to a voice-based interaction method, interaction device, and smart device in the field of voice interaction.

Background Technology

[0002] With the development of the intelligent control field, voice interaction has gradually become the mainstream control method for smart devices. Users can control smart devices such as lights, air conditioners, and curtains directly through voice commands without manual operation. However, in existing technologies, users need to strictly follow a limited number of pre-set standard words to issue commands; if users want to set up smart devices according to their own preferences, they need to issue commands one by one to complete the settings of multiple functions; this results in cumbersome operation processes, low interaction efficiency, and affects the user experience.

[0003] Therefore, simplifying user operations and improving interaction efficiency are urgent problems to be solved in the process of voice interaction.

Summary of the Invention

[0004] This application provides a voice-based interaction method, interaction device, and smart device. The method can simplify user operations and improve interaction efficiency during voice interaction.

[0005] In a first aspect, a voice-based interaction method is provided, which includes: detecting a first voice command of a user, where the first voice command is used to request configuration of a target control mode of a smart device; obtaining, based on the first voice command, a mode identifier of the target control mode and the control parameters corresponding to the mode identifier; and storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier.

[0006] In the above technical solution, the user's first voice command is detected. Since the first voice command is used to request the configuration of the target control mode of the smart device, the mode identifier corresponding to the target control mode and the control parameters corresponding to the mode identifier can be obtained based on the first voice command, and the mode identifier of the target control mode and its corresponding control parameters are stored. In this way, the user can complete the configuration of the target control mode based on the first voice command. Since the mode identifier corresponding to the target control mode and the control parameters corresponding to the mode identifier are stored, the user can achieve the desired function without issuing multiple commands when controlling the smart device in the future. This simplifies user operation, improves interaction efficiency, and thus enhances the user experience.
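As an illustration of the storing step, the following is a minimal sketch rather than the application's implementation. It assumes the control parameters can be modeled as a mapping from device identifier to settings; the class name `ModeStore`, the method names, and the example mode and device names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ModeStore:
    """Hypothetical in-memory store: mode identifier -> control parameters."""

    # Assumption: control parameters are {device identifier: {setting: value}}.
    modes: dict = field(default_factory=dict)

    def save(self, mode_id: str, control_params: dict) -> None:
        """Store (or overwrite) the parameters configured for a mode."""
        self.modes[mode_id] = control_params

    def load(self, mode_id: str):
        """Return the stored parameters, or None if the mode is unknown."""
        return self.modes.get(mode_id)


# One voice command configures the whole mode; the values are illustrative.
store = ModeStore()
store.save(
    "movie mode",
    {
        "living_room_light": {"brightness": 20},
        "curtain": {"position": "closed"},
    },
)
```

Keying the store by mode identifier is what would let a later request name the mode once instead of repeating each device command.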

[0007] In combination with the first aspect, in some possible implementations, obtaining the mode identifier of the target control mode and the control parameters corresponding to the mode identifier based on the first voice command includes: parsing the first voice command to obtain a first text; and obtaining, based on the first text, the mode identifier of the target control mode and the control parameters corresponding to the mode identifier.

[0008] In the above technical solution, a preset format is obtained, and the first voice command is parsed to obtain the first text. Based on the first text and the preset format, the mode identifier of the target control mode and the corresponding control parameters are obtained. By converting the parsed first text using the preset format, the user's spoken expression can be mapped into structured information that conforms to grammatical rules. This ensures that the mode identifier of the target control mode and its corresponding control parameters can be accurately extracted when parsing the first voice command, avoiding parsing errors or information loss during voice interaction due to the user's expression, and improving the control accuracy of the smart device.

[0009] In combination with the first aspect and the above implementations, in some possible implementations, parsing the first voice command to obtain the first text includes: performing speech-to-text processing on the first voice command to obtain a second text; and inputting the second text into a semantic understanding model for semantic parsing to obtain the first text.

[0010] In the above technical solution, the third text output by semantic parsing is format-converted based on a preset format to obtain a first text that includes a mode identifier, device identifier, control operation, control parameters, or a preset flag. Since the first text is obtained by format conversion of the third text, the user's spoken expression can be mapped into structured information that conforms to grammatical rules. This ensures that the mode identifier of the target control mode and its corresponding control parameters can be accurately extracted when parsing the first voice command, avoiding parsing errors or information loss during voice interaction caused by the user's expression. Because the preset flag indicates the number of control operations, when the first text includes the preset flag, the number of control operations contained in the user-configured target control mode can be identified, ensuring that the target control mode can implement multiple control operations. This also avoids missing control operations when configuring the target control mode, thereby improving the control accuracy of the smart device.

[0011] In combination with the first aspect and the above implementations, in some possible implementations, inputting the second text into the semantic understanding model for semantic parsing to obtain the first text includes: inputting the second text into the semantic understanding model for semantic parsing to obtain a third text; and performing format conversion on the third text to obtain the first text, where the first text includes at least one of the following: a mode identifier, a device identifier of the smart device, a control operation, control parameters, and a preset flag bit, the preset flag bit being used to indicate the number of control operations.
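The format conversion and the role of the preset flag bit can be sketched as follows. The source does not disclose the preset format, so the `MODE=...;COUNT=...;device|operation|parameter` layout, the dictionary shape of the semantic parsing output, and both function names are assumptions made purely for illustration.

```python
def to_first_text(parsed: dict) -> str:
    """Convert (assumed) semantic-parsing output into a structured text.

    The COUNT field plays the role of the preset flag bit: it records how
    many control operations the configured mode contains.
    """
    ops = parsed["operations"]
    fields = [f"MODE={parsed['mode']}", f"COUNT={len(ops)}"]
    fields += [f"{op['device']}|{op['operation']}|{op['parameter']}" for op in ops]
    return ";".join(fields)


def from_first_text(text: str):
    """Extract the mode identifier and operations, checking the flag."""
    mode_field, count_field, *op_fields = text.split(";")
    mode_id = mode_field.removeprefix("MODE=")
    expected = int(count_field.removeprefix("COUNT="))
    if len(op_fields) != expected:
        # A mismatch means at least one control operation was lost.
        raise ValueError(f"expected {expected} operations, got {len(op_fields)}")
    return mode_id, [tuple(f.split("|")) for f in op_fields]


# Hypothetical output of the semantic understanding model.
third_text = {
    "mode": "movie mode",
    "operations": [
        {"device": "light", "operation": "set_brightness", "parameter": 20},
        {"device": "curtain", "operation": "set_position", "parameter": "closed"},
    ],
}
first_text = to_first_text(third_text)
mode_id, operations = from_first_text(first_text)
```

In this sketch the count is what allows a truncated configuration to be rejected instead of silently stored with a missing operation.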

[0012] In the above technical solution, the user's first voice command is converted into the second text through speech-to-text processing, which turns the voice signal into text that can be processed later. The first text is then obtained by semantic understanding of the second text, which can accurately identify the user's intention and extract the mode identifier of the target control mode and its corresponding control parameters from the user's spoken expression. This improves the accuracy and robustness of speech parsing and thus enhances the user experience.

[0013] In combination with the first aspect and the above implementations, in some possible implementations, performing speech-to-text processing on the first voice command to obtain the second text includes: obtaining a keyword library; and performing, based on the keyword library, speech-to-text processing on the first voice command to obtain the second text.

[0014] In the above technical solution, a keyword library is obtained, and speech-to-text processing is performed on the first voice command based on the keyword library to obtain the second text. Introducing the keyword library during speech-to-text processing can improve recognition accuracy, so that the second text correctly reproduces the mode identifiers or device identifiers involved in the user's intent. This enhances the robustness and accuracy of the voice interaction process, thereby improving the user experience.
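The source does not say how the keyword library is applied during recognition. One simple way to picture the idea, swapped in here purely as an illustration, is to post-correct a raw transcript by fuzzy-matching each word against the library with Python's standard `difflib` module. The function name, the example keywords, and the 0.8 similarity cutoff are assumptions.

```python
import difflib


def correct_with_keywords(transcript: str, keywords: list, cutoff: float = 0.8) -> str:
    """Replace words that closely resemble a library keyword with that keyword."""
    corrected = []
    for word in transcript.split():
        # get_close_matches returns the best matches scoring at least `cutoff`.
        match = difflib.get_close_matches(word.lower(), keywords, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)


# Hypothetical keyword library of device names.
keyword_library = ["curtain", "humidifier"]
raw = "turn on the humidifer and close the curtin"
second_text = correct_with_keywords(raw, keyword_library)
```

A production recognizer would more likely bias decoding toward the keywords instead of patching the transcript afterwards; the sketch only shows why a library of known identifiers helps.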

[0015] In combination with the first aspect and the above implementations, in some possible implementations, the interaction method further includes: obtaining a target confidence level corresponding to the first voice command; and storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier includes: when the target confidence level is greater than or equal to a first threshold, storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier.

[0016] In the above technical solution, the target confidence level corresponding to the first text is obtained, and different interaction strategies are executed based on the comparison result between the target confidence level and a preset threshold. When the target confidence level is high, the subsequent storage steps are executed directly, ensuring the reliability and robustness of the voice interaction process and guaranteeing processing efficiency in high-confidence scenarios.

[0017] In combination with the first aspect and the above implementations, in some possible implementations, the interaction method further includes: when the target confidence level is greater than or equal to a second threshold and less than the first threshold, outputting a first prompt message, where the first prompt message is used to prompt the user to confirm configuration information, and the configuration information includes the mode identifier of the target control mode and the control parameters corresponding to the mode identifier; and when the target confidence level is less than the second threshold, outputting a second prompt message, which is used to inquire about the user's intent.

[0018] In the above technical solution, the target confidence level corresponding to the first text is obtained, and different interaction strategies are executed based on the comparison result between the target confidence level and a preset threshold. Through this method, when the target confidence level is between the first and second thresholds, a prompt message is output requesting user confirmation, avoiding misconfiguration due to insufficient confidence; when the target confidence level is low, an inquiry message is output to guide the user to supplement control intent, obtaining more accurate instruction information through proactive interaction. This solution employs different processing strategies for different confidence level ranges, enabling confirmation or supplementation through user interaction when confidence is insufficient, avoiding mode configuration failures due to parsing errors, improving the robustness and configuration success rate of voice interaction, and thus enhancing the user experience.
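The three confidence ranges described above map naturally onto a small decision function. The sketch below is illustrative only: the threshold values 0.85 and 0.5, the enum, and the function name are assumptions, since the source gives no concrete numbers.

```python
from enum import Enum


class Action(Enum):
    STORE = "store configuration"
    CONFIRM = "output first prompt (ask user to confirm)"
    ASK_INTENT = "output second prompt (ask for user intent)"


def choose_action(confidence: float,
                  first_threshold: float = 0.85,
                  second_threshold: float = 0.5) -> Action:
    """Pick an interaction strategy from the target confidence level."""
    if second_threshold > first_threshold:
        raise ValueError("second threshold must not exceed first threshold")
    if confidence >= first_threshold:
        return Action.STORE          # high confidence: store directly
    if confidence >= second_threshold:
        return Action.CONFIRM        # medium confidence: confirm first
    return Action.ASK_INTENT         # low confidence: ask what was meant
```

For example, with these assumed thresholds, `choose_action(0.9)` stores directly, `choose_action(0.6)` asks for confirmation, and `choose_action(0.2)` asks the user to restate the intent.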

[0019] In combination with the first aspect and the above implementations, in some possible implementations, the smart device includes a first smart device, which represents any one of the smart devices, and the interaction method further includes: detecting a second instruction of the user, where the second instruction is used to instruct configuration of a target identifier of the first smart device, and the target identifier indicates a device alias of the first smart device; determining the target identifier of the first smart device based on the second instruction; and storing the device identifier of the first smart device and the target identifier of the first smart device.

[0020] In the above technical solution, the user's second instruction is detected. Since the second instruction is used to instruct configuration of the target identifier of the first smart device, the target identifier of the first smart device is determined based on the second instruction, and the device identifier and target identifier of the first smart device are stored. In this way, the user can complete the identifier configuration of the first smart device based on the second instruction. Because the device identifier and target identifier of the first smart device are stored, when the user subsequently controls the first smart device, a voice command containing the target identifier can be accurately mapped to the corresponding first smart device, improving the flexibility of the voice interaction process and enhancing the user experience.

[0021] In combination with the first aspect and the above implementations, in some possible implementations, after storing the device identifier and the target identifier of the first smart device, the interaction method further includes: detecting a third instruction of the user, where the third instruction includes the target identifier and is used to request control of the first smart device; obtaining, based on the third instruction, the control parameters corresponding to the first smart device; and controlling the first smart device based on the control parameters corresponding to the first smart device.

[0022] In the above technical solution, a third user instruction is detected, determining that the user's intention is to request control of the first smart device. Since the third instruction includes a target identifier, the corresponding first smart device can be obtained based on the target identifier in the third instruction. Further control parameters corresponding to the first smart device are obtained based on the third instruction. Then, the first smart device is controlled according to the control parameters. This solution, after storing the device identifier and target identifier of the first smart device, allows direct identification of the first smart device through a user-defined target identifier, improving the flexibility of voice control and thus enhancing the user experience.
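The alias mechanism amounts to a lookup from a user-defined target identifier to a device identifier. The following is a minimal sketch under stated assumptions: the class `AliasRegistry`, substring matching on lowercased text, and the sample device identifier `ac-001` are hypothetical and not taken from the source.

```python
class AliasRegistry:
    """Hypothetical mapping from device alias (target identifier) to device identifier."""

    def __init__(self) -> None:
        self._alias_to_device = {}

    def register(self, device_id: str, alias: str) -> None:
        """Store the device identifier together with its user-defined alias."""
        self._alias_to_device[alias.strip().lower()] = device_id

    def resolve(self, instruction: str):
        """Return the device whose alias appears in the instruction, if any."""
        text = instruction.lower()
        # Check longer aliases first so "big cooler" wins over "cooler".
        for alias in sorted(self._alias_to_device, key=len, reverse=True):
            if alias in text:
                return self._alias_to_device[alias]
        return None


registry = AliasRegistry()
registry.register("ac-001", "Big Cooler")          # second instruction
device = registry.resolve("set big cooler to 24")  # third instruction
```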

[0023] In combination with the first aspect and the above implementations, in some possible implementations, after storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier, the interaction method further includes: detecting an update command for the target control mode; and updating the control parameters corresponding to the target control mode based on the update command.

[0024] In the above technical solution, when an update command for the target control mode is detected, the control parameters corresponding to the target control mode are updated based on the update command. This allows for the adjustment and updating of the control parameters of the target control mode according to user needs, based on the stored target control mode, thereby improving the flexibility and convenience of mode maintenance and enhancing the user experience.

[0025] In combination with the first aspect and the above implementations, in some possible implementations, after storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier, the interaction method further includes: detecting a request command of the user, where the request command is used to request that the target control mode be turned on; and controlling the smart device based on the control parameters corresponding to the target control mode.

[0026] In the above technical solution, after storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier, when a user's request command is detected, the smart device is controlled based on the control parameters corresponding to the target control mode. This allows the user to achieve the desired function without issuing multiple commands, which can simplify user operation, improve interaction efficiency, and thus enhance the user experience.
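Invoking a stored mode can be pictured as replaying its control parameters to each device. This sketch assumes the parameters are stored as a mapping from device identifier to settings and that some transport callback delivers them; `apply_mode`, `send_command`, and the sample data are hypothetical.

```python
def apply_mode(mode_id, stored_modes, send_command) -> bool:
    """Control every device in a stored mode; return False if the mode is unknown."""
    control_params = stored_modes.get(mode_id)
    if control_params is None:
        return False
    for device_id, settings in control_params.items():
        # send_command stands in for whatever transport reaches the device.
        send_command(device_id, settings)
    return True


# Assumed stored configuration and a fake transport that records commands.
stored_modes = {
    "movie mode": {
        "living_room_light": {"brightness": 20},
        "curtain": {"position": "closed"},
    }
}
sent = []
ok = apply_mode("movie mode", stored_modes,
                lambda device_id, settings: sent.append((device_id, settings)))
```

A single request command therefore fans out to every device in the mode, which is the efficiency gain the paragraph above describes.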

[0027] In combination with the first aspect and the above implementations, in some possible implementations, the interaction method further includes: when a fourth voice command of the user is detected, determining that the request command is detected; or, when a target operation of the user on the smart device is detected, determining that the request command is detected, where the target operation represents a touch operation that invokes the target control mode.

[0028] The above technical solution provides users with two triggering methods for calling the target control mode: voice and touch. Users can flexibly choose according to their own preferences or current usage scenarios. This avoids the usage limitations that may be caused by relying on a single interaction method, improves the diversity and flexibility of interaction methods, and thus enhances the user experience.

[0029] In combination with the first aspect and the above implementations, in some possible implementations, the smart device includes a display screen, and after storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier, the interaction method further includes: generating, on the display screen, a visual interface corresponding to the target control mode, where the visual interface includes at least one of the following: the mode identifier, the control parameters corresponding to the mode identifier, an execution control, and a playback control; the execution control is used to trigger execution of the target control mode, and the playback control is used to play the control parameters corresponding to the mode identifier through text-to-speech; and in response to a click operation on the execution control, controlling the smart device based on the control parameters corresponding to the target control mode.

[0030] In the above technical solution, after storing the target control mode, a corresponding visual interface is generated on the display screen. This interface shows the mode identifier and its control parameters, and provides execution and playback controls. Through this method, the target control mode configured by the user via voice is presented in a visual form, allowing the user to intuitively view and confirm the configured mode content. Furthermore, the user can trigger the mode by clicking the execution control or preview the mode's execution effect via voice using the playback control, thus achieving a closed loop between voice configuration and visual interaction, enriching the user's interaction methods and improving the user experience.

[0031] In combination with the first aspect and the above implementations, in some possible implementations, the interaction method further includes: obtaining the user's historical interaction information, which includes interaction time, interaction location, and interaction content; establishing, based on the historical interaction information, a correspondence between historical interaction scenarios and historical operations; and when a match between the current scenario and a historical interaction scenario is detected, executing the historical operation based on the correspondence between the historical interaction scenario and the historical operation.

[0032] The above technical solution acquires users' historical interaction information, analyzes user interactions at specific times and locations, establishes a correlation between interaction scenarios and operations, and automatically executes corresponding historical operations when the current scenario matches historical scenarios. Through this method, the solution can learn users' historical behavior patterns, proactively providing operation suggestions or directly executing commands based on context awareness even when the user has not explicitly given an instruction. This reduces repetitive user operations, improves the convenience of interaction, and ultimately enhances the user experience.
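One simple way to picture the correspondence between historical scenarios and operations is a frequency table keyed by time and location. The sketch below is an assumption-laden illustration: it treats a scenario as an (hour, location) pair and keeps an operation only after it has been seen at least `min_count` times; neither choice comes from the source.

```python
from collections import Counter, defaultdict


def build_scene_map(history, min_count: int = 2) -> dict:
    """Map each (hour, location) scenario to its most frequent past operation."""
    counts = defaultdict(Counter)
    for hour, location, operation in history:
        counts[(hour, location)][operation] += 1
    scene_map = {}
    for scene, operations in counts.items():
        operation, seen = operations.most_common(1)[0]
        if seen >= min_count:  # ignore one-off interactions
            scene_map[scene] = operation
    return scene_map


# Hypothetical history: (interaction hour, location, interaction content).
history = [
    (22, "bedroom", "enable sleep mode"),
    (22, "bedroom", "enable sleep mode"),
    (22, "bedroom", "turn on light"),
    (8, "kitchen", "start kettle"),
]
scene_map = build_scene_map(history)
suggested = scene_map.get((22, "bedroom"))  # current scenario matches history
```

Requiring a minimum count is a design choice that keeps a single unusual interaction from being replayed automatically.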

[0033] In combination with the first aspect and the above implementations, in some possible implementations, after storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier, the interaction method further includes: synchronizing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier to multiple devices associated with the smart device, so that the multiple devices can operate based on the control parameters corresponding to the target control mode.

[0034] In the above technical solution, after storing the target control mode, the mode identifier and its control parameters are synchronized to other devices associated with the smart device, enabling multiple devices to share the same personalized user configuration. Through this method, users only need to complete the mode configuration on a single device to use the mode on all associated devices, avoiding the tedious operation of repeatedly configuring on multiple devices. Furthermore, cross-device synchronization ensures that users receive a consistent personalized interactive experience when using different devices at different times and locations, improving configuration efficiency and thus enhancing the user experience.
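Cross-device synchronization can be sketched as copying the stored configuration into each associated device's own store. The source does not describe the transport or storage model, so the per-device dictionaries, the function name `sync_mode`, and the device names are assumptions.

```python
import copy


def sync_mode(mode_id: str, control_params: dict, associated_stores: dict) -> None:
    """Copy a mode configuration into the store of every associated device."""
    for device_store in associated_stores.values():
        # Deep copy so a later edit on one device does not alter the others.
        device_store[mode_id] = copy.deepcopy(control_params)


# Hypothetical associated devices, each with its own local mode store.
associated_stores = {"smart_speaker": {}, "wall_panel": {}}
sync_mode("movie mode", {"curtain": {"position": "closed"}}, associated_stores)
```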

[0035] In a second aspect, a voice-based interaction device is provided, the interaction device comprising: a detection module, used to detect a first voice command of a user, where the first voice command is used to request configuration of a target control mode of a smart device; and a processing module, used to obtain a mode identifier of the target control mode and the control parameters corresponding to the mode identifier based on the first voice command, and to store the mode identifier of the target control mode and the control parameters corresponding to the mode identifier.

[0036] In conjunction with the second aspect, in some possible implementations, the processing module is further configured to obtain the mode identifier of the target control mode and the control parameters corresponding to the mode identifier based on the first voice command, including: parsing the first voice command to obtain the first text; and obtaining the mode identifier of the target control mode and the control parameters corresponding to the mode identifier based on the first text.

[0037] In combination with the second aspect and the above implementation methods, in some possible implementation methods, the processing module is further configured to perform speech-to-text processing on the first voice command to obtain the second text; and input the second text into the semantic understanding model for semantic parsing processing to obtain the first text.

[0038] In combination with the second aspect and the above implementation methods, in some possible implementation methods, the processing module is further configured to input the second text into the semantic understanding model for semantic parsing to obtain the third text; and to perform format conversion on the third text to obtain the first text, wherein the first text includes at least one of the following: mode identifier, device identifier of the smart device, control operation, control parameters, and preset flag bit, wherein the preset flag bit is used to indicate the number of control operations.

[0039] In combination with the second aspect and the above implementation methods, in some possible implementation methods, the processing module is also used to obtain a keyword library and, based on the keyword library, perform speech-to-text processing on the first voice command to obtain the second text.

[0040] In conjunction with the second aspect and the above implementation methods, in some possible implementation methods, the processing module is further configured to obtain the target confidence level corresponding to the first voice command; the storage of the mode identifier of the target control mode and the control parameters corresponding to the mode identifier includes: when the target confidence level is greater than or equal to a first threshold, storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier.

[0041] In conjunction with the second aspect and the above implementation methods, in some possible implementation methods, the processing module is further configured to output a first prompt message when the target confidence level is greater than or equal to the second threshold and less than the first threshold. The first prompt message is used to prompt the user to confirm the configuration information, the configuration information including the mode identifier of the target control mode and the control parameters corresponding to the mode identifier; and output a second prompt message when the target confidence level is less than the second threshold. The second prompt message is used to inquire about the user's intention.

[0042] In conjunction with the second aspect and the above implementation methods, in some possible implementation methods, the acquisition module is further configured to detect a second instruction from the user, the second instruction being used to indicate the target identifier of the first smart device, the target identifier being used to indicate the device alias of the first smart device; determine the target identifier of the first smart device based on the second instruction; and store the device identifier of the first smart device and the target identifier of the first smart device.

[0043] In conjunction with the second aspect and the above implementation methods, in some possible implementation methods, the processing module is further configured to detect a third instruction from the user, the third instruction including a target identifier, the third instruction being used to request control of the first smart device; based on the third instruction, to obtain the first smart device and the control parameters corresponding to the first smart device; and based on the control parameters corresponding to the first smart device, to control the first smart device.

[0044] In combination with the second aspect and the above implementation methods, in some possible implementation methods, the acquisition module is also used to detect the update instruction of the target control mode; and update the control parameters corresponding to the target control mode based on the update instruction.

[0045] In conjunction with the second aspect and the above implementation methods, in some possible implementation methods, the processing module is also used to detect the user's request instruction, which is used to request the opening of the target control mode; and to control the smart device based on the control parameters corresponding to the target control mode.

[0046] In conjunction with the second aspect and the above implementation methods, in some possible implementation methods, the processing module is further configured to determine that a request instruction is detected when the user's fourth voice instruction is detected; or, when the user's target operation on the smart device is detected, determine that a request instruction is detected, wherein the target operation is used to represent a touch operation that invokes a target control mode.

[0047] In conjunction with the second aspect and the above implementation methods, in some possible implementation methods, the processing module is also used to generate a visual interface corresponding to the target control mode on the display screen. The visual interface includes at least one of the following: the mode identifier, the control parameters corresponding to the mode identifier, an execution control, and a playback control. The execution control is used to trigger the execution of the target control mode, and the playback control is used to play the control parameters corresponding to the mode identifier through text-to-speech. In response to a click operation on the execution control, the smart device is controlled based on the control parameters corresponding to the target control mode.

[0048] In conjunction with the second aspect and the above implementation methods, in some possible implementation methods, the processing module is also used to obtain the user's historical interaction information, which includes interaction time, interaction location and interaction content; based on the historical interaction information, establish a correspondence between historical interaction scenarios and historical operations; when a match between the current scenario and a historical interaction scenario is detected, the historical operation is executed based on the correspondence between the historical interaction scenario and the historical operation.

[0049] In combination with the second aspect and the above implementation methods, in some possible implementation methods, the processing module is also used to synchronize the mode identifier of the target control mode and the control parameters corresponding to the mode identifier to multiple devices associated with the smart device, so that the multiple devices can operate based on the control parameters corresponding to the target control mode.

[0050] Thirdly, a smart device is provided, including a memory and a processor. The memory is used to store executable program code, and the processor is used to call and run the executable program code from the memory, causing the smart device to perform the methods of the first aspect or any possible implementation thereof.

[0051] Fourthly, a computer program product is provided, comprising: computer program code, which, when run on a computer, causes the computer to perform the methods described in the first aspect or any possible implementation thereof.

[0052] Fifthly, a computer-readable storage medium is provided that stores computer program code, which, when executed on a computer, causes the computer to perform the methods described in the first aspect or any possible implementation thereof.

Attached Figure Description

[0053] Figure 1 is a schematic diagram illustrating a scenario of a voice-based interaction method provided in an embodiment of this application;
Figure 2 is a schematic diagram of the system architecture of a smart home system provided in an embodiment of this application;
Figure 3 is a schematic flowchart illustrating a voice-based interaction method provided in an embodiment of this application;
Figure 4 is a schematic flowchart illustrating another voice-based interaction method provided in an embodiment of this application;
Figure 5 is a schematic diagram of the structure of a voice-based interactive device provided in an embodiment of this application;
Figure 6 is a schematic diagram of the structure of a smart device provided in an embodiment of this application.

Detailed Implementation

[0054] The technical solutions in this application will be clearly and thoroughly described below with reference to the accompanying drawings. In the description of the embodiments of this application, unless otherwise stated, " / " means "or," for example, A / B can mean A or B. "And / or" in the text is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, and B existing alone. Furthermore, in the description of the embodiments of this application, "multiple" refers to two or more than two.

[0055] Hereinafter, the terms "first" and "second" are used for descriptive purposes only and should not be construed as implying or suggesting relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.

[0056] With the development of the smart control field, the control methods for home appliances have evolved from physical buttons to mobile device control, and then to voice interaction. For example, as shown in Figure 1(a), early home control primarily relied on remote control buttons. Users operated individual appliances by pressing physical buttons on the devices or using dedicated remote controls. This method was limited by distance and the number of buttons, making remote control and complex function adjustments impossible. Subsequently, remote control via mobile applications (APPs) became mainstream; as shown in Figure 1(b), users control smart devices via mobile terminals. In recent years, voice control technology has developed rapidly, allowing users to directly issue commands to devices using natural language; as shown in Figure 1(c), users can control smart home functions without manual operation, which improves the naturalness and convenience of interaction.

[0057] However, in existing voice-controlled smart home technologies, users need to strictly follow a limited number of pre-set standard words to issue commands. If users want to set up smart devices according to their own preferences, they need to issue commands one by one to complete the settings of multiple functions. This results in a cumbersome operation process, low interaction efficiency, and affects the user experience.

[0058] In view of the technical problems existing in the prior art, this application provides a voice-based interaction method, interaction device, and smart device. The interaction method detects a user's first voice command. Since the first voice command is used to request configuration of a target control mode of the smart device, the method obtains the mode identifier corresponding to the target control mode and the control parameters corresponding to the mode identifier, and stores the mode identifier and its corresponding control parameters. Through this method, the user can complete the configuration of the target control mode based on the first voice command. Because the mode identifier and its corresponding control parameters are stored, the user can achieve the desired function without issuing multiple commands when controlling the smart device, simplifying user operation, improving interaction efficiency, and thus enhancing the user experience.

[0059] In this application embodiment, the smart device can be a smart home device, specifically including but not limited to air conditioners, fresh air systems, air purifiers, humidifiers, dehumidifiers, lighting equipment, curtains, window units, robot vacuum cleaners, washing machines, refrigerators, televisions, stereos, water heaters, bathroom heaters, smart door locks, security cameras, sensors, smart sockets, and other home devices with network connectivity and automatic control functions. This application embodiment will use smart devices in a home scenario as examples for illustration. It should be noted that the method provided in this application is also applicable to smart devices with voice interaction functions in other scenarios, such as mobile terminals, service robots, etc.

[0060] Figure 2 is a schematic diagram of the system architecture of a smart home system provided in an embodiment of this application. As shown in Figure 2, the smart home system 200 includes a voice input module 201, a speech recognition module 202, a natural language processing module 203, a memory agent module 204, and a device control module 205.

[0061] The voice input module 201 is used to receive voice commands input by the user and preprocess the voice commands. Specifically, the voice input module 201 may include one or more microphone arrays that acquire voice signals from the environment. The analog voice signals are converted into digital voice signals through analog-to-digital conversion and then sent to the speech recognition module 202.

[0062] The speech recognition module 202 is used to perform speech-to-text processing on the received speech signal, converting it into corresponding text information. The speech recognition module 202 supports dual-channel decoding for both online streaming recognition and offline non-streaming recognition. The speech recognition module 202 then sends the recognized text information to the natural language processing module 203.

[0063] Optionally, the speech recognition module 202 can obtain a keyword library from the memory agent module 204 during the speech signal recognition process. The keyword library contains user-defined device aliases and mode identifiers, and the recognition accuracy of user-defined words is enhanced through dynamic hot word injection.

[0064] The natural language processing (NLP) module 203 performs semantic understanding and analysis on the received text information to determine user intent and extract key information. The NLP module 203 integrates a large model that supports parsing complex expressions such as multiple intents, fuzzy quantifiers, and omitted subjects. It outputs text content in a preset format (structured control tokens). The text content in the preset format includes at least one of the following: mode identifier, device identifier, control operation, control parameters, and preset flag bit. The control information is then sent to the device control module 205, or the custom information to be stored is sent to the memory agent module 204.

[0065] Optionally, when the natural language processing module 203 detects that the user's intent is ambiguous or has low confidence, it generates an inquiry voice, clarifies the user's intent through interaction, and sends the control information to the device control module 205, or sends the custom information to be stored to the memory agent module 204.

[0066] The memory agent module 204 is used to store the association between user-defined device identifiers and target identifiers, the mode identifiers of user-defined target control modes and their corresponding control parameters, etc.; the memory agent module 204 can analyze the user's historical interaction behavior, identify the user's preferences and habits, and support context awareness, actively recommending or executing corresponding operations when similar scenarios are detected; the memory agent module 204 supports cross-device synchronization, synchronizing the stored content to other smart devices to ensure the consistency of multi-device experience.

[0067] Optionally, the memory agent module 204 may provide a keyword library in response to a request from the speech recognition module 202 to improve the accuracy of speech recognition.

[0068] The device control module 205 generates specific device control commands based on the received control information and sends them to the corresponding smart devices for execution. Upon receiving control parameters, it performs corresponding operations to meet the user's control requirements for the smart devices.

[0069] Figure 3 is a schematic flowchart illustrating a voice-based interaction method provided in an embodiment of this application. It should be understood that this interaction method can be applied to, for example, the smart home system 200 shown in Figure 2.

[0070] For example, as shown in Figure 3, the interaction method 300 includes: S301, detect the user's first voice command.

[0071] The first voice command is used to request the configuration of the target control mode of the smart device.

[0072] In the embodiments of this application, after the user sends out voice content, the voice signal is captured and recognized. By parsing and recognizing that the user's intention is to create or define a new target control mode, rather than to execute an existing mode, the subsequent configuration process is triggered.

[0073] For example, the voice signal emitted by the user is parsed to obtain "Create a goodnight mode, turn off the lights, set the air conditioner to 26 degrees, and close the curtains", thus determining that the user's first voice command has been detected, and a goodnight mode is created according to the command.

[0074] In one implementation, after detecting a preset wake word, the detected audio signal is acquired, and the audio signal is parsed to detect whether the user's first voice command exists.

[0075] S302, based on the first voice command, obtain the mode identifier of the target control mode and the control parameters corresponding to the mode identifier.

[0076] The mode identifier includes the name or code of the target control mode.

[0077] For example, the first voice command is subjected to speech recognition and semantic parsing to extract the mode name of the target control mode created by the user's intention, as well as the control parameters associated with the mode name.

[0078] In one implementation, the process of obtaining the mode identifier of the target control mode and the control parameters corresponding to the mode identifier based on the first voice command may specifically include: The first voice command is parsed to obtain the first text; Based on the first text and the preset format, the mode identifier of the target control mode and the control parameters corresponding to the mode identifier are obtained.

[0079] For example, the first voice command is parsed to obtain the first text, and the text is structured and mapped according to a preset format to extract the mode identifier (e.g., "tornado mode") and the control parameters corresponding to the mode identifier (e.g., "temperature set to 16 degrees" and "wind speed set to maximum").

[0080] In this embodiment of the application, the first text obtained by parsing is converted by a preset format, which can map the user's spoken expression into structured information that conforms to grammatical rules. This ensures that the mode identifier of the target control mode and its corresponding control parameters can be accurately extracted when parsing the first voice command, avoiding parsing errors or information loss during voice interaction due to the user's expression, and improving the control accuracy of the smart device.

[0081] In one implementation, the process of parsing the first voice command to obtain the first text may specifically include: The first voice command is processed into text to obtain the second text. The second text is input into the semantic understanding model for semantic parsing to obtain the first text.

[0082] The second text represents the text recognition result obtained after the first voice command is processed into text, that is, the text content output by the voice recognition module, which directly reflects the text expression corresponding to the first voice command.

[0083] For example, speech-to-text processing can support multiple speech and dialect recognition. Smart devices can be configured with multiple acoustic models, and select the corresponding model for recognition based on user settings or automatically detected speech features to ensure that the second text can accurately reflect the user's intent.

[0084] In one implementation, the method further includes: obtaining the target confidence level corresponding to the first voice command. Storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier then includes: when the target confidence level is greater than or equal to the first threshold, storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier.

[0085] In one implementation, the method further includes: when the target confidence level is greater than or equal to the second threshold and less than the first threshold, outputting a first prompt message. The first prompt message is used to prompt the user to confirm the configuration information. The configuration information includes the mode identifier of the target control mode and the control parameters corresponding to the mode identifier. When the target confidence level is less than the second threshold, a second prompt message is output, which is used to inquire about the user's intent.

[0086] It should be understood that low-confidence segments appearing in speech-to-text processing are recorded with confidence information; when performing speech recognition processing, the confidence information is combined with the context for correction.

[0087] Optionally, in one embodiment, confidence information of the first voice command is obtained; if the confidence level is detected to be greater than a first preset value, the mode identifier of the target control mode and the control parameters corresponding to the mode identifier are stored; if the confidence level is detected to be less than or equal to the first preset value and greater than a second preset value, the mode identifier of the target control mode and the control parameters corresponding to the mode identifier are stored after the user's confirmation operation is detected; if the confidence level is detected to be less than or equal to the second preset value, an inquiry message is output to confirm the operation to be performed with the user.

[0088] In this embodiment, the target confidence level corresponding to the first text is obtained, and different interaction strategies are executed based on the comparison result between the target confidence level and a preset threshold. When the target confidence level is high, subsequent storage steps are executed directly, ensuring the reliability and robustness of the voice interaction process and guaranteeing processing efficiency in high-confidence scenarios. When the target confidence level is between the first and second thresholds, a prompt message is output to request user confirmation, avoiding incorrect configuration due to insufficient confidence. When the target confidence level is low, an inquiry message is output to guide the user to supplement control intent, obtaining more accurate instruction information through proactive interaction. This solution adopts different processing strategies for different confidence level ranges, enabling confirmation or supplementation through user interaction when confidence is insufficient, avoiding mode configuration failure due to parsing errors, improving the robustness and configuration success rate of voice interaction, and thus enhancing the user experience.
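The three-way strategy described above can be illustrated with a minimal sketch. It follows the comparison rules of paragraphs [0084] and [0085] (greater than or equal to the first threshold: store; between the second and first thresholds: confirm; below the second threshold: inquire). The names `Action` and `decide` and the default threshold values 0.85 and 0.5 are assumptions made for illustration only; this application does not specify them.

```python
from enum import Enum


class Action(Enum):
    STORE = "store"            # store the mode identifier and parameters directly
    CONFIRM = "confirm"        # first prompt message: ask the user to confirm
    ASK_INTENT = "ask_intent"  # second prompt message: inquire about the intent


def decide(confidence: float,
           first_threshold: float = 0.85,   # assumed value
           second_threshold: float = 0.5    # assumed value
           ) -> Action:
    """Map a target confidence level to one of the three interaction strategies."""
    if second_threshold > first_threshold:
        raise ValueError("second_threshold must not exceed first_threshold")
    if confidence >= first_threshold:
        return Action.STORE
    if confidence >= second_threshold:
        return Action.CONFIRM
    return Action.ASK_INTENT
```

A caller would store the configuration when `decide` returns `Action.STORE`, and otherwise output the corresponding prompt before storing anything.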

[0089] In this embodiment, the third text is format-converted based on a preset format to obtain a first text that includes a mode identifier, device identifier, control operation, control parameters, or a preset flag bit. Since the first text is obtained by format conversion, the user's spoken expression can be mapped into structured information that conforms to grammatical rules. This ensures that the mode identifier of the target control mode and its corresponding control parameters can be accurately extracted when parsing the first voice command, avoiding parsing errors or information loss during voice interaction caused by the user's expression. The preset flag bit is used to indicate the number of control operations, so when the first text includes the preset flag bit, the number of control operations contained in the user-configured target control mode can be identified. This ensures that the target control mode can implement multiple control operations and avoids omitting control operations when configuring the target control mode, thus improving the control accuracy of the smart device.

[0090] In one implementation, the process of inputting the second text into a semantic understanding model for semantic parsing to obtain the first text may specifically include: the second text is input into the semantic understanding model for semantic parsing to obtain the third text; the third text is format-converted to obtain the first text. The first text includes at least one of the following: mode identifier, device identifier of the smart device, control operation, control parameters, and preset flag bit. The preset flag bit is used to indicate the number of control operations.

[0091] The format conversion function is used to convert text into a preset format, which may include field definitions, data types, and rules for combining fields.

[0092] The first text includes at least one of the following: mode identifier, device identifier of the smart device, control operation, control parameters, and preset flag bit, wherein the preset flag bit is used to indicate the number of control operations.

[0093] For example, the second text could be: Set the air conditioner to tornado mode, set the temperature to 16 degrees, and set the fan speed to maximum. The second text is processed according to a preset format, and the first text is: {"mode_name": "tornado mode", "action": ["set_temperature", "set_fan_speed" ], "device_id": "air_conditioner_livingroom", "param": ["16", "max"], "combo_flag": 1}; where mode_name is used to represent the mode identifier, action is used to represent the control operation, device_id is used to represent the device identifier, param is used to represent the control parameters, and combo_flag is used to represent the preset flag.

[0094] It should be understood that the data type of combo_flag can be boolean, used to indicate whether there are multiple control operations. When the value of combo_flag is 1, there are multiple control operations (at least two); when the value of combo_flag is 0, there is one control operation.

[0095] Optionally, the data type of combo_flag can be int or another data type that represents the number of control operations. In that case, a value of 1 in combo_flag indicates that there is one control operation, and a value of 3 in combo_flag indicates that there are three control operations.
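As an illustrative sketch of the preset format, the following builds the first text from already-extracted fields and derives combo_flag under both conventions described above. The field names follow the example in paragraph [0093]; the function name `build_first_text` and the `count_as_int` switch are hypothetical and are not defined in this application.

```python
import json


def build_first_text(mode_name, device_id, operations, count_as_int=False):
    """Serialize extracted fields into the preset format.

    operations: list of (action, param) pairs in the order the user spoke them.
    count_as_int: False -> combo_flag is 0/1 (boolean convention);
                  True  -> combo_flag is the number of control operations.
    """
    if not operations:
        raise ValueError("a control mode needs at least one control operation")
    actions = [action for action, _ in operations]
    params = [str(param) for _, param in operations]
    if count_as_int:
        combo_flag = len(operations)
    else:
        combo_flag = 1 if len(operations) >= 2 else 0
    return json.dumps({
        "mode_name": mode_name,
        "action": actions,
        "device_id": device_id,
        "param": params,
        "combo_flag": combo_flag,
    }, ensure_ascii=False)
```

Calling it with "tornado mode", "air_conditioner_livingroom" and the two operations `("set_temperature", 16)` and `("set_fan_speed", "max")` yields a JSON object with the same fields and values as the example first text.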

[0096] In this embodiment of the application, the user's first voice command is converted into second text through speech-to-text processing, which can convert the voice signal into semantic information that can be processed later. Then, the second text is semantically understood to obtain the third text, which can accurately identify the user's intention and extract the mode identifier of the target control mode and its corresponding control parameters from the user's spoken expression, thereby improving the accuracy and robustness of speech parsing and thus enhancing the user's experience.

[0097] In one implementation, the process of converting the first voice command to text to obtain the second text may specifically include: Obtain a keyword library; based on the keyword library, perform speech-to-text processing on the first voice command to obtain the second text.

[0098] The keyword library, also known as the hot word library, is a pre-built collection of commonly used words.

[0099] For example, before performing speech recognition, a user-defined alias list is retrieved from the local memory module, and keywords such as "Xiaobai" and "Tornado Mode" from the keyword library are injected into the offline decoding network. During the speech command recognition process, dual-channel decoding is performed using both online and offline decoding channels.

[0100] It should be understood that a keyword library can be used to increase the weight of keywords during the decoding process, thereby increasing the probability of them being selected during recognition and thus improving the accuracy of recognition.
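The effect of keyword weighting can be illustrated with a simplified stand-in. The sketch below does not inject hot words into a decoding network; instead it rescores a list of candidate transcriptions after decoding, adding a fixed bonus for each keyword a candidate contains. The function name `rescore`, the bonus value, and the candidate scores are assumptions for illustration only.

```python
def rescore(hypotheses, keywords, bonus=2.0):
    """Pick the best transcription after boosting candidates that contain keywords.

    hypotheses: list of (text, log_score) pairs; a higher score is better.
    keywords:   entries of the keyword library (matched case-insensitively).
    bonus:      score added per matched keyword (assumed value).
    """
    def boosted(item):
        text, score = item
        hits = sum(1 for kw in keywords if kw.lower() in text.lower())
        return score + bonus * hits

    return max(hypotheses, key=boosted)[0]
```

With candidates `("turn on tomato mode", -4.0)` and `("turn on tornado mode", -5.0)`, the second candidate wins once "Tornado Mode" is in the keyword library, whereas the first wins with an empty library.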

[0101] In this embodiment, a keyword library is obtained, and speech-to-text processing is performed on the first voice command based on the keyword library to obtain the second text. Introducing the keyword library during speech-to-text processing improves the accuracy of speech recognition, ensuring that the second text can reproduce the mode identifiers or device identifiers involved in the user's intent. This enhances the robustness and accuracy of the voice interaction process, thereby improving the user experience.

[0102] S303, store the mode identifier of the target control mode and the control parameters corresponding to the mode identifier.

[0103] For example, the mode identifier of the target control mode obtained in S302 and the control parameters corresponding to the mode identifier are stored for use in subsequent interactions.

[0104] For example, based on the parsing of the first voice command, the user-created "Tornado Mode" is obtained, and the corresponding control parameters are "Air conditioning set to 16 degrees" and "Fan speed adjusted to maximum"; where the mode identifier of the target control mode is "Tornado Mode"; and the corresponding control parameters are "set_temperature=16, set_fan_speed=max".

[0105] In one implementation, after storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier, the method further includes: detecting an update command for the target control mode; and updating the control parameters corresponding to the target control mode based on the update command.

[0106] The update command is a request issued by the user to modify the stored target control mode.

[0107] It should be understood that the update command can be in the form of a voice command (e.g., "set the air conditioner temperature of the goodnight mode to 24 degrees") or a touch command (e.g., the user edits and / or saves the stored goodnight mode on the visual interface).

[0108] For example, after storing the user-created "Goodnight Mode", such as "Goodnight Mode" containing three operations: "Turn off the lights, set the air conditioner to 26 degrees, and close the curtains", if the user issues an update command: "Adjust the air conditioner temperature of Goodnight Mode to 24 degrees", the system will find the control parameters of "Goodnight Mode" in the storage, change the temperature parameter from 26 degrees to 24 degrees, and save it again.

[0109] It should be noted that when an update command is detected, the update operation corresponding to the update command may include adjusting the order of multiple control operations corresponding to the target control mode, adjusting the control parameters corresponding to the target control mode, adding control operations, or deleting control operations.
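The storage step and the four update operations listed above (adjusting parameters, adding operations, deleting operations, reordering operations) can be sketched with a small in-memory store. The class `ModeStore`, its method names, and the action names used in the test data are hypothetical; the sketch also assumes that each action name appears at most once within a mode.

```python
class ModeStore:
    """In-memory store: mode identifier -> ordered list of [action, param]."""

    def __init__(self):
        self._modes = {}

    def save(self, mode_name, operations):
        # operations: iterable of (action, param) pairs in execution order
        self._modes[mode_name] = [list(op) for op in operations]

    def get(self, mode_name):
        return [tuple(op) for op in self._modes[mode_name]]

    def set_param(self, mode_name, action, param):
        # Adjust the control parameter of an existing control operation.
        for op in self._modes[mode_name]:
            if op[0] == action:
                op[1] = param
                return
        raise KeyError(action)

    def add(self, mode_name, action, param):
        # Append a new control operation to the mode.
        self._modes[mode_name].append([action, param])

    def delete(self, mode_name, action):
        # Remove a control operation from the mode.
        ops = self._modes[mode_name]
        remaining = [op for op in ops if op[0] != action]
        if len(remaining) == len(ops):
            raise KeyError(action)
        self._modes[mode_name] = remaining

    def reorder(self, mode_name, actions_in_order):
        # Change the execution order of the stored control operations.
        by_action = {op[0]: op for op in self._modes[mode_name]}
        if sorted(actions_in_order) != sorted(by_action):
            raise ValueError("new order must list each stored action exactly once")
        self._modes[mode_name] = [by_action[a] for a in actions_in_order]
```

For the "Goodnight Mode" example, `set_param("Goodnight Mode", "set_temperature", 24)` would change the stored temperature from 26 to 24 while leaving the other two operations untouched.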

[0110] Optionally, upon detecting an update command, the smart device can push the increment to the cloud via WebSocket.

[0111] In one embodiment, the smart device can detect the user's update command and parse the user's intent; at the same time, the smart device can display the control operation and / or control parameters corresponding to the updated target control mode on a visual interface so that the user can confirm whether the parsed command is correct.

[0112] In this embodiment, when an update command for the target control mode is detected, the control parameters corresponding to the target control mode are updated based on the update command. This allows for the adjustment and updating of the control parameters of the target control mode according to user needs, based on the stored target control mode, thereby improving the flexibility and convenience of mode maintenance and enhancing the user experience.

[0113] In one implementation, after storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier, the method further includes: detecting the user's request instruction; and controlling the smart device based on the control parameters corresponding to the target control mode.

[0114] The request instruction is used to request activation of the target control mode.

[0115] For example, if a user has configured "Goodnight Mode" to include three operations: "turn off the lights, set the air conditioner to 26 degrees Celsius, and close the curtains", and it is subsequently detected that the user has issued the request instruction "Turn on Goodnight Mode", the above three operations can be performed automatically without the user issuing commands one by one.
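A minimal sketch of this invocation step follows, assuming the stored content maps each mode identifier to a list of (device identifier, control operation, control parameter) entries. The function `run_mode`, the `send` callback, and the device and action names in the test data are hypothetical placeholders for the device control module.

```python
def run_mode(stored_modes, mode_name, send):
    """Execute every stored control operation of a mode.

    stored_modes: dict of mode identifier -> list of (device_id, action, param).
    send:         callback that delivers one control command to a device.
    Returns False when the mode is unknown, so the caller can ask the user.
    """
    operations = stored_modes.get(mode_name)
    if operations is None:
        return False
    for device_id, action, param in operations:
        send(device_id, action, param)
    return True
```

A single request such as "Turn on Goodnight Mode" then results in one `send` call per stored operation, in the stored order.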

[0116] In one embodiment, when it is detected that a user performs an operation or combination of operations multiple times under similar times, scenarios, or preconditions, an association between the scenario and the corresponding operation is established and stored. When the scenario is detected to reappear, the corresponding operation is recommended to the user based on the stored association, or the corresponding operation is executed directly after the user authorizes it.

[0117] For example, if a user says "it's a bit stuffy" at a fixed time and then turns on the fan via voice command, the association is recorded. When the fixed time is reached again, if the user says "it's a bit stuffy" again, the system will automatically ask the user whether to turn on the fan.
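One possible way to record such associations is to count how often the same operation follows the same utterance at the same time, and to recommend the operation once the count reaches a minimum. The class `HabitLearner`, the use of the hour of day as the scenario key, and the default of three repetitions are assumptions for illustration; this application does not fix how scenarios are compared.

```python
from collections import Counter


class HabitLearner:
    """Counts (hour, utterance, operation) triples and recommends repeated ones."""

    def __init__(self, min_repeats=3):  # assumed repetition threshold
        self.min_repeats = min_repeats
        self._counts = Counter()

    def observe(self, hour, utterance, operation):
        # Record that the user performed `operation` after `utterance` at `hour`.
        self._counts[(hour, utterance, operation)] += 1

    def recommend(self, hour, utterance):
        # Return the most frequent qualifying operation, or None if there is none.
        candidates = [
            (count, op)
            for (h, u, op), count in self._counts.items()
            if h == hour and u == utterance and count >= self.min_repeats
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda c: c[0])[1]
```

The returned operation is only a recommendation; in line with the description above, it would be executed directly only after the user authorizes it.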

[0118] In this embodiment, after storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier, when a user's request instruction is detected, the smart device is controlled based on the control parameters corresponding to the target control mode. This allows the user to achieve the desired function without issuing multiple instructions, which simplifies user operation, improves interaction efficiency, and enhances the user experience.

[0119] In one implementation, the method further includes: upon detecting the user's fourth voice command, determining that a request instruction has been detected; or, upon detecting the user's target operation on the smart device, determining that a request instruction has been detected.

[0120] The fourth voice command is used to invoke the target control mode, such as "turn on Goodnight Mode" or "turn on Tornado Mode". The target operation is used to represent the touch operation that invokes the target control mode, including the touch operation performed by the user on the visual interactive interface to invoke the target control mode, such as clicking the on button corresponding to the mode card.

[0121] In one implementation, the smart device includes a display screen, and after storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier, the method further includes: generating a visual interface corresponding to the target control mode on the display screen. The visual interface includes at least one of the following: mode identifier, control parameters corresponding to the mode identifier, execution control, and playback control. The execution control is used to trigger the execution of the target control mode, and the playback control is used to play the control parameters corresponding to the mode identifier through text-to-speech. In response to a click operation on the execution control, the smart device is controlled based on the control parameters corresponding to the target control mode.

[0122] For example, a user configures a "Goodnight Mode" via voice on a mobile app, which generates a "Goodnight Mode" card. When the app detects that the user issues the voice command "Turn on Goodnight Mode" or clicks the on button corresponding to the "Goodnight Mode" card, it obtains the mode identifier of "Goodnight Mode", retrieves the device, control operation, and control parameters associated with "Goodnight Mode" from the stored content, and executes them.

[0123] In one embodiment, the usage of the target control mode is displayed via a visual interface.

[0124] In this embodiment, after storing the target control mode, a corresponding visual interface is generated on the display screen. This interface displays the mode identifier and its control parameters, and provides execution and playback controls. Through this method, the target control mode configured by the user via voice is presented in a visual form, allowing the user to intuitively view and confirm the configured mode content. Furthermore, the user can trigger the mode by clicking the execution control or preview the mode's execution effect via voice using the playback control, thus achieving a closed loop between voice configuration and visual interaction. This enriches the user's interaction methods and enhances the user experience.

[0125] For example, a user configures a "Goodnight Mode" via voice on a mobile app, which generates a "Goodnight Mode" card that displays "Used 0 times today".

[0126] Optionally, the number of times each mode is used can be recorded and displayed.

[0127] In one implementation, the method further includes: obtaining the user's historical interaction information, which includes the interaction time, interaction location, and interaction content; establishing, based on the historical interaction information, the correspondence between historical interaction scenarios and historical operations; and, when a match is detected between the current scenario and a historical interaction scenario, executing the historical operation based on the correspondence between the historical interaction scenario and the historical operation.

[0128] Optionally, the system can intelligently recommend activation methods based on the user's usage scenario. When the distance between the user and the visual interface (or screen) is less than a preset value, the interface highlights frequently used mode cards for easy access; when the distance between the user and the visual interface (or screen) is greater than or equal to the preset value, voice wake-up is triggered first.

[0129] In this embodiment, the system acquires the user's historical interaction information, analyzes the user's interaction content at specific times and locations, establishes a correlation between interaction scenarios and operations, and automatically executes the corresponding historical operations when the current scenario matches a historical scenario. Through this method, the solution can learn the user's historical behavior patterns, enabling it to proactively provide operation suggestions that conform to the user's habits or directly execute commands based on context awareness even when the user has not explicitly given an instruction. This reduces repetitive operations for the user, improves the convenience of interaction, and ultimately enhances the user experience.
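
One possible way to build and query such a correspondence is sketched below. Reducing a scenario to an (hour, location) pair and choosing the most frequent operation are simplifying assumptions; the application does not prescribe a particular matching rule, and all names here are hypothetical.

```python
from collections import Counter, defaultdict
from typing import NamedTuple, Optional


class Interaction(NamedTuple):
    hour: int        # interaction time, reduced to hour of day (assumption)
    location: str    # interaction location
    content: str     # interaction content, i.e. the operation performed


def build_correspondence(history: list[Interaction]) -> dict[tuple[int, str], str]:
    """Map each (hour, location) scenario to its most frequent operation."""
    by_scenario: dict[tuple[int, str], Counter] = defaultdict(Counter)
    for item in history:
        by_scenario[(item.hour, item.location)][item.content] += 1
    return {scenario: ops.most_common(1)[0][0] for scenario, ops in by_scenario.items()}


def match_operation(correspondence: dict, hour: int, location: str) -> Optional[str]:
    """Return the historical operation for the current scenario, or None if no match."""
    return correspondence.get((hour, location))


history = [
    Interaction(22, "bedroom", "turn on Goodnight Mode"),
    Interaction(22, "bedroom", "turn on Goodnight Mode"),
    Interaction(22, "bedroom", "turn off the lights"),
]
correspondence = build_correspondence(history)
print(match_operation(correspondence, 22, "bedroom"))
```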

[0130] In this embodiment, users are provided with two triggering methods for calling the target control mode: voice and touch. Users can flexibly choose according to their own preferences or the current usage scenario. This avoids the usage limitations that may be caused by relying on only a single interaction method, improves the diversity and flexibility of interaction methods, and thus enhances the user experience.

[0131] In one implementation, after storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier, the method further includes: synchronizing the mode identifier of the target control mode and the corresponding control parameters to multiple devices associated with the smart device, so that the multiple devices can operate based on the control parameters corresponding to the target control mode.

[0132] For example, a user configures a "Goodnight Mode" in the living room via a smart speaker. This mode includes three control operations: "turn off the living room lights, set the air conditioner to 26 degrees Celsius, and close the curtains." The mode identifier "Goodnight Mode" and its corresponding control parameters are stored in the cloud memory agent, and the mode is synchronized to the user's associated smart devices in the bedroom and to the mobile app. When the user goes to the bedroom to rest that night, there is no need to repeat the configuration; the user simply says "Turn on Goodnight Mode" to the smart device in the bedroom. The device retrieves the synchronized control parameters from the cloud, sets the bedroom air conditioner to 26 degrees Celsius, turns off the living room lights, and closes the curtains.

[0133] In this embodiment, after storing the target control mode, the mode identifier and its control parameters are synchronized to other devices associated with the smart device, enabling multiple devices to share the same personalized user configuration. Through this method, users only need to complete the mode configuration on a single device to use the mode on all associated devices, avoiding the tedious operation of repeated configuration on multiple devices. Furthermore, cross-device synchronization ensures that users receive a consistent personalized interactive experience when using different devices at different times and locations, improving configuration efficiency and thus enhancing the user experience.
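
The synchronization step can be pictured as copying one stored mode to every associated device. The sketch below uses in-memory objects only; the `Device` class and `sync_mode` function are hypothetical, and a real system would go through a cloud service rather than direct assignment.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical associated device holding its own copy of the stored modes."""
    name: str
    modes: dict[str, list[str]] = field(default_factory=dict)


def sync_mode(mode_id: str, params: list[str], associated: list[Device]) -> None:
    """Copy the mode identifier and its control parameters to every associated device."""
    for device in associated:
        device.modes[mode_id] = list(params)  # independent copy per device


params = [
    "turn off the living room lights",
    "set the air conditioner to 26 degrees Celsius",
    "close the curtains",
]
speaker = Device("living room speaker", {"Goodnight Mode": params})
bedroom = Device("bedroom speaker")
app = Device("mobile app")
sync_mode("Goodnight Mode", params, [bedroom, app])
print(bedroom.modes["Goodnight Mode"])
```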

[0134] One implementation further includes: detecting a second instruction from the user, the second instruction being used to configure a target identifier for the first smart device; determining the target identifier of the first smart device based on the second instruction; and storing the device identifier of the first smart device and the target identifier of the first smart device.

[0135] Here, the first smart device denotes any one of the smart devices. The target identifier indicates the device alias of the first smart device, that is, an alternative name that the user defines for the device and uses to refer to it during voice interaction, such as naming the bedroom air conditioner "Xiaobai" and the living room light "Da Deng".

[0136] For example, when a second command is detected, the device and the target identifier assigned by the user are extracted through voice recognition and semantic parsing. For instance, if the user's second command, "Rename the bedroom air conditioner to Xiaobai," is detected, the first smart device is identified as "bedroom air conditioner," and the target identifier is "Xiaobai." The device identifier of "bedroom air conditioner" is associated with the identifier "Xiaobai" and stored. Subsequently, when the user issues a command containing "Xiaobai," the bedroom air conditioner can be controlled.

[0137] It should be understood that before speech recognition, a user-defined alias list is retrieved from the local memory module, and keywords such as "Xiaobai" and "Tornado Mode" from the keyword library are injected as hot words into the offline decoding network. During the recognition of voice commands, dual-channel decoding is performed using both online and offline decoding channels.

[0138] In this embodiment, a second user instruction is detected. Since the second instruction is used to instruct the configuration of a target identifier for the first smart device, the target identifier of the first smart device is determined based on the second instruction, and the device identifier of the first smart device and the target identifier are stored. Through this method, the user's identifier configuration for the first smart device can be completed based on the second instruction. Because the device identifier and target identifier of the first smart device are stored, when the user subsequently controls the first smart device, a voice command containing the target identifier can be accurately mapped to the corresponding first smart device, improving the flexibility of the voice interaction process and enhancing the user experience.
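
As an illustration of the naming step, the sketch below extracts the device and the alias from a rename utterance and stores the mapping. A regular expression stands in for the speech recognition and semantic parsing described in this application, and the device identifiers in `DEVICE_IDS` are invented for the example.

```python
import re
from typing import Optional

# Hypothetical mapping from spoken device names to internal device identifiers.
DEVICE_IDS = {
    "bedroom air conditioner": "ac-bedroom-01",
    "living room light": "light-living-01",
}

# Simplified stand-in for semantic parsing of the second instruction.
RENAME = re.compile(r"^rename the (?P<device>.+?) to (?P<alias>.+?)\.?$", re.IGNORECASE)


def parse_second_instruction(text: str) -> Optional[tuple[str, str]]:
    """Return (device identifier, target identifier), or None if the text is not a rename."""
    m = RENAME.match(text.strip())
    if m is None:
        return None
    device_id = DEVICE_IDS.get(m.group("device").lower())
    if device_id is None:
        return None
    return device_id, m.group("alias")


alias_store: dict[str, str] = {}  # target identifier (alias) -> device identifier
parsed = parse_second_instruction("Rename the bedroom air conditioner to Xiaobai")
if parsed is not None:
    device_id, alias = parsed
    alias_store[alias] = device_id
print(alias_store)
```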

[0139] In one implementation, after storing the device identifier and the target identifier of the first smart device, the method further includes: detecting a third instruction from the user; obtaining the control parameters corresponding to the first smart device based on the third instruction; and controlling the first smart device based on the control parameters corresponding to the first smart device.

[0140] The third instruction includes a target identifier and is used to request control of the first smart device; for example, "turn on Xiaobai" or "set Xiaobai to 26 degrees".

[0141] For example, after storing the device identifier and target identifier of the first smart device, a third instruction from the user is detected. The third instruction is parsed to obtain the control command issued by the user for the first smart device. The corresponding device identifier is retrieved through the target identifier, and the control parameters of the device are extracted from the third instruction. The first smart device is then controlled based on the device identifier and control parameters. For instance, if the user has named the bedroom air conditioner "Xiaobai" and issues the voice command "Turn on Xiaobai," the device identifier of the bedroom air conditioner is retrieved based on "Xiaobai," the control action is identified as "turn on," and the bedroom air conditioner is turned on.

[0142] In this embodiment, a third user instruction is detected, and the user's intention is determined to be a request to control a first smart device. Since the third instruction includes a target identifier, the corresponding first smart device can be obtained based on the target identifier in the third instruction. The control parameters corresponding to the first smart device are then obtained based on the third instruction, and the first smart device is controlled according to those control parameters. After the device identifier and target identifier of the first smart device are stored, this solution allows the first smart device to be identified directly through a user-defined target identifier, improving the flexibility of voice control and thus enhancing the user experience.
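
A simplified sketch of handling the third instruction is given below: the alias is looked up to recover the device identifier, and a control action is extracted from the utterance. The keyword checks replace real semantic parsing, and the alias table, operation names, and result layout are assumptions for illustration.

```python
from typing import Optional

# Assumed to have been stored by the earlier naming step (alias -> device identifier).
ALIAS_TO_DEVICE = {"Xiaobai": "ac-bedroom-01"}


def handle_third_instruction(text: str) -> Optional[dict]:
    """Resolve the alias in the instruction and extract a simple control parameter."""
    lowered = text.lower()
    for alias, device_id in ALIAS_TO_DEVICE.items():
        if alias.lower() not in lowered:
            continue
        if lowered.startswith("turn on"):
            return {"device_id": device_id, "operation": "power_on"}
        if lowered.startswith("set") and "degrees" in lowered:
            # Take the first integer token in the utterance as the temperature.
            digits = [tok for tok in lowered.split() if tok.isdigit()]
            if digits:
                return {
                    "device_id": device_id,
                    "operation": "set_temperature",
                    "value": int(digits[0]),
                }
    return None  # no known alias or no recognizable control action


print(handle_third_instruction("Turn on Xiaobai"))
print(handle_third_instruction("Set Xiaobai to 26 degrees"))
```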

[0143] In summary, in this embodiment, a user's first voice command is detected. Since the first voice command is used to request configuration of a target control mode for the smart device, the mode identifier corresponding to the target control mode and the control parameters corresponding to the mode identifier can be obtained based on the first voice command, and the mode identifier of the target control mode and its corresponding control parameters are stored. Through this method, the user can complete the configuration of the target control mode based on the first voice command. Because the mode identifier corresponding to the target control mode and the control parameters corresponding to the mode identifier are stored, the user can achieve the desired function without issuing multiple commands when controlling the smart device, which simplifies user operation, improves interaction efficiency, and thus enhances the user experience.

[0144] Figure 4 is a schematic flowchart of another voice-based interaction method provided in an embodiment of this application. It should be understood that this control method can be applied to, for example, the smart home system 200 shown in Figure 2.

[0145] For example, as shown in Figure 4, the control method 400 includes: S401, receive the first voice command.

[0146] The first voice command is used to request the configuration of the target control mode of the smart device.

[0147] For example, the smart home system receives the first voice command issued by the user through a voice acquisition device. For instance, the user says, "Create a scene called 'Tornado Mode' for me, set the air conditioner to 26 degrees Celsius, and turn the fan speed to maximum."

[0148] S402, parse the first voice command to obtain the first text.

[0149] For example, after receiving the first voice command, the smart home system parses the command to obtain the corresponding first text. The parsing process includes speech-to-text processing and semantic understanding processing.

[0150] Specifically, the smart home system first converts the first voice command into the second text (i.e., the original recognized text) through the voice recognition module, and then performs semantic parsing on the second text through the semantic understanding model to obtain the structured first text.

[0151] Optionally, during the speech-to-text processing, the smart home system can obtain a keyword library containing at least one mode identifier or device alias defined by the user. Speech-to-text processing is then performed on the first voice command based on this keyword library, which improves the recognition accuracy of user-defined words.
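
The effect of a keyword library can be approximated after recognition by snapping near-miss phrases in the transcript to known keywords. The sketch below does this with Python's standard `difflib.SequenceMatcher`; it is a substitute post-correction technique chosen for illustration, not the decoding-time use of the keyword library described in this application, and the 0.8 cutoff is an arbitrary assumption.

```python
import difflib


def apply_keyword_library(transcript: str, keywords: list[str], cutoff: float = 0.8) -> str:
    """Replace word windows that closely match a user-defined keyword."""
    words = transcript.split()
    for keyword in keywords:
        n = len(keyword.split())
        i = 0
        while i + n <= len(words):
            window = " ".join(words[i:i + n])
            ratio = difflib.SequenceMatcher(None, window.lower(), keyword.lower()).ratio()
            if ratio >= cutoff:
                words[i:i + n] = keyword.split()  # snap to the library spelling
                i += n
            else:
                i += 1
    return " ".join(words)


library = ["Tornado Mode", "Xiaobai"]
print(apply_keyword_library("turn on tornato mode", library))
```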

[0152] S403, based on the first text and the preset format, obtain the mode identifier of the target control mode and the control parameters corresponding to the mode identifier.

[0153] For example, the smart home system obtains a preset format template, which represents the syntax rules for converting voice commands into a structured format. The preset format may include field definitions, data types, value constraints, and combination rules between fields. For instance, it may specify that configuration commands include a mode identifier field and may optionally include fields such as device identifier, control operation, and control parameters. The preset format can be stored in the smart home system's local storage or in a cloud configuration database. Using the preset format, the mode identifier of the target control mode and the corresponding control parameters are obtained from the first text.
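
To make the idea of a preset format concrete, the sketch below checks a structured first text against a small template and returns the mode identifier with its control parameters. The field names, the types, and the dictionary layout are assumptions; the application does not define a concrete schema.

```python
# Hypothetical preset format: required top-level fields and optional per-control fields.
TOP_LEVEL = {"mode_id": str, "controls": list}
CONTROL_FIELDS = {"device": str, "operation": str, "value": (int, float, str)}


def extract_mode(first_text: dict) -> tuple[str, list[dict]]:
    """Validate the first text and return (mode identifier, control parameters)."""
    for name, expected in TOP_LEVEL.items():
        if not isinstance(first_text.get(name), expected):
            raise ValueError(f"field {name!r} is missing or has the wrong type")
    controls = []
    for entry in first_text["controls"]:
        for name, value in entry.items():
            if name not in CONTROL_FIELDS or not isinstance(value, CONTROL_FIELDS[name]):
                raise ValueError(f"invalid control field {name!r}")
        controls.append(dict(entry))
    return first_text["mode_id"], controls


first_text = {
    "mode_id": "Tornado Mode",
    "controls": [
        {"device": "air conditioner", "operation": "set_temperature", "value": 26},
        {"device": "air conditioner", "operation": "set_fan_speed", "value": "max"},
    ],
}
mode_id, controls = extract_mode(first_text)
print(mode_id, len(controls))
```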

[0154] S404, store the mode identifier of the target control mode and the control parameters corresponding to the mode identifier.

[0155] For example, the smart home system associates the mode identifier of the target control mode extracted in S403 with its corresponding control parameters and saves the association to the local memory module or the cloud memory agent. The stored target control mode can be called later.

[0156] S405, detect an update instruction for the target control mode.

[0157] For example, after the target control mode is stored, the smart home system can receive an update instruction from the user for that mode. The update instruction may be a request to modify, add, or delete control parameters of an existing mode.

[0158] S406, update the control parameters corresponding to the target control mode based on the update instruction.
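
A local store for the association could be as simple as a JSON file keyed by mode identifier. The sketch below is one such arrangement; the `ModeStore` class, the file name, and the JSON layout are assumptions, and a temporary directory is used so the example leaves no files behind.

```python
import json
import tempfile
from pathlib import Path


class ModeStore:
    """Hypothetical local store keeping mode_id -> control parameters in a JSON file."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _load(self) -> dict:
        if self.path.exists():
            return json.loads(self.path.read_text(encoding="utf-8"))
        return {}

    def save(self, mode_id: str, params: list) -> None:
        """Store or overwrite the control parameters for one mode identifier."""
        modes = self._load()
        modes[mode_id] = params
        self.path.write_text(json.dumps(modes, ensure_ascii=False), encoding="utf-8")

    def get(self, mode_id: str):
        """Return the stored control parameters, or None if the mode is unknown."""
        return self._load().get(mode_id)


with tempfile.TemporaryDirectory() as tmp:
    store = ModeStore(Path(tmp) / "modes.json")
    store.save("Tornado Mode", [{"operation": "set_temperature", "value": 26}])
    loaded = store.get("Tornado Mode")
    missing = store.get("Goodnight Mode")
print(loaded)
```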

[0159] For example, the smart home system adjusts the control parameters of the stored target control mode according to the content of the update instruction, and re-stores the updated association to ensure the real-time performance and accuracy of the mode configuration.
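
The three kinds of update (modify, add, delete) can be sketched as a small function over the stored parameters. The update record layout with `action`, `key`, and `value` is a hypothetical representation of a parsed update instruction.

```python
def apply_update(params: dict, update: dict) -> dict:
    """Return a new parameter dict after a modify, add, or delete request."""
    action, key = update["action"], update["key"]
    updated = dict(params)  # leave the stored copy untouched until re-stored
    if action == "delete":
        updated.pop(key, None)
    elif action in ("modify", "add"):
        if action == "modify" and key not in updated:
            raise KeyError(key)  # cannot modify a parameter that was never configured
        updated[key] = update["value"]
    else:
        raise ValueError(f"unknown action {action!r}")
    return updated


stored = {"temperature": 26, "fan_speed": "max"}
stored_after = apply_update(stored, {"action": "modify", "key": "temperature", "value": 24})
print(stored_after)
```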

[0160] S407, detect the user's second instruction.

[0161] For example, in a device naming scenario, the smart home system receives a second instruction from the user, which is used to set a custom name (i.e., target identifier) for the first smart device. For instance, if the user says, "Rename the bedroom air conditioner to Xiaobai," the smart home system recognizes that the instruction is intended to name the device.

[0162] S408, determine the target identifier of the first smart device based on the second instruction.

[0163] For example, the system parses the second instruction, extracts the device to be named (such as "bedroom air conditioner") and the target identifier assigned by the user (such as "Xiaobai"), and establishes a mapping relationship between the device identifier and the target identifier.

[0164] S409, store the device identifier of the first smart device and the target identifier of the first smart device.

[0165] For example, the system stores the association between the device identifier and the target identifier determined in S408 so that it can be accurately mapped when the user uses the target identifier to refer to the device later.

[0166] S410, detect the user's request instruction.

[0167] The request instruction may include a mode identifier of the target control mode and / or a target identifier of the first smart device.

[0168] For example, if a user says, "Xiaobai, turn on Tornado Mode," the smart home system identifies "Xiaobai" as the target identifier. Based on the target identifier in the request instruction, the smart home system finds the corresponding first smart device, the bedroom air conditioner, in its stored associations and extracts the control action for that device (such as "turn on") from the instruction. If the smart home system identifies "Tornado Mode" as the mode identifier of the target control mode, it finds the corresponding control parameters in storage based on the mode identifier in the request instruction and controls the relevant smart devices to perform the corresponding operations sequentially or simultaneously: setting the air conditioner to 26 degrees Celsius and adjusting the fan speed to maximum.
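
A request instruction that carries both a target identifier and a mode identifier can be resolved with two lookups, as in the sketch below. The stored tables, the substring matching, and the tuple layout of the resulting commands are assumptions for illustration only.

```python
from typing import Optional

# Assumed contents of storage after S404 and S409.
MODES = {"Tornado Mode": [("set_temperature", 26), ("set_fan_speed", "max")]}
ALIASES = {"Xiaobai": "ac-bedroom-01"}


def handle_request(text: str) -> list[tuple[Optional[str], str, object]]:
    """Expand a request instruction into (device identifier, operation, value) commands."""
    lowered = text.lower()
    device_id = next((d for a, d in ALIASES.items() if a.lower() in lowered), None)
    mode_id = next((m for m in MODES if m.lower() in lowered), None)
    if mode_id is None:
        return []  # no stored mode identifier found in the request
    return [(device_id, operation, value) for operation, value in MODES[mode_id]]


for command in handle_request("Xiaobai, turn on Tornado Mode"):
    print(command)
```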

[0169] In summary, in this embodiment, a user's first voice command is detected. Since the first voice command is used to request configuration of a target control mode for the smart device, the mode identifier corresponding to the target control mode and the control parameters corresponding to the mode identifier can be obtained based on the first voice command, and the mode identifier of the target control mode and its corresponding control parameters are stored. Through this method, the user can complete the configuration of the target control mode based on the first voice command. Because the mode identifier corresponding to the target control mode and the control parameters corresponding to the mode identifier are stored, the user can achieve the desired function without issuing multiple commands when controlling the smart device, which simplifies user operation, improves interaction efficiency, and thus enhances the user experience.

[0170] The voice-based interaction method provided in the embodiments of this application has been described in detail above with reference to Figures 1 to 4; the apparatus embodiments of this application are described in detail below with reference to Figure 5 and Figure 6. It should be understood that the apparatus in the embodiments of this application can perform the various methods described in the foregoing embodiments, that is, for the specific working processes of the products described below, reference can be made to the corresponding processes in the foregoing method embodiments.

[0171] Figure 5 is a schematic diagram of the structure of a voice-based interactive device provided in an embodiment of this application.

[0172] For example, as shown in Figure 5, the interactive device 500 includes: a detection module 501, configured to detect the user's first voice command, the first voice command being used to request configuration of the target control mode of the smart device; and a processing module 502, configured to obtain the mode identifier of the target control mode and the control parameters corresponding to the mode identifier based on the first voice command, and to store the mode identifier of the target control mode and the control parameters corresponding to the mode identifier.

[0173] In one possible implementation, the processing module 502 is further configured to obtain the mode identifier of the target control mode and the control parameters corresponding to the mode identifier based on the first voice command, including: parsing the first voice command to obtain the first text; and obtaining the mode identifier of the target control mode and the control parameters corresponding to the mode identifier based on the first text.

[0174] In one possible implementation, the processing module 502 is further configured to parse the first voice command to obtain the first text, including: performing speech-to-text processing on the first voice command to obtain the second text; and inputting the second text into a semantic understanding model for semantic parsing processing to obtain the first text.

[0175] In one possible implementation, the processing module 502 is further configured to input the second text into the semantic understanding model for semantic parsing to obtain the first text, including: inputting the second text into the semantic understanding model for semantic parsing to obtain the third text; and performing format conversion on the third text to obtain the first text, wherein the first text includes at least one of a mode identifier, a device identifier of the smart device, a control operation, control parameters, and a preset flag bit, and the preset flag bit is used to indicate the number of control operations.

[0176] In one possible implementation, the processing module 502 is further configured to acquire a keyword library and, based on the keyword library, perform speech-to-text processing on the first voice command to obtain the second text.
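
The format conversion from the third text to the first text, including the preset flag bit, might look like the sketch below. The key names in both dictionaries are hypothetical; only the rule that the flag records the number of control operations comes from the description above.

```python
def to_first_text(third_text: dict) -> dict:
    """Convert a semantic parse (third text) into the structured first text."""
    controls = [
        {
            "device_id": op["device"],
            "operation": op["action"],
            "params": op.get("params", {}),
        }
        for op in third_text.get("operations", [])
    ]
    return {
        "mode_id": third_text["mode"],
        "controls": controls,
        "flag": len(controls),  # preset flag bit: number of control operations
    }


third_text = {
    "mode": "Tornado Mode",
    "operations": [
        {"device": "air conditioner", "action": "set_temperature", "params": {"celsius": 26}},
        {"device": "air conditioner", "action": "set_fan_speed", "params": {"level": "max"}},
    ],
}
first_text = to_first_text(third_text)
print(first_text["flag"])
```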

[0177] In one possible implementation, the processing module 502 is further configured to obtain the target confidence level corresponding to the first voice command; and store the mode identifier of the target control mode and the control parameters corresponding to the mode identifier, including: when the target confidence level is greater than or equal to a first threshold, storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier.

[0178] In one possible implementation, the processing module 502 is further configured to output a first prompt message when the target confidence level is greater than or equal to a second threshold and less than the first threshold, the first prompt message being used to prompt the user to confirm the configuration information, which includes the mode identifier of the target control mode and the control parameters corresponding to the mode identifier; and to output a second prompt message when the target confidence level is less than the second threshold, the second prompt message being used to inquire about the user's intent.
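
The two thresholds split the confidence range into three bands. The sketch below encodes that decision; the function name, the return labels, and the example threshold values 0.9 and 0.6 are assumptions, since the application does not give numeric values.

```python
def decide(confidence: float, first_threshold: float, second_threshold: float) -> str:
    """Choose the action for a parsed configuration based on its confidence level."""
    if second_threshold > first_threshold:
        raise ValueError("second threshold must not exceed first threshold")
    if confidence >= first_threshold:
        return "store"        # store the mode identifier and control parameters
    if confidence >= second_threshold:
        return "confirm"      # first prompt message: ask the user to confirm
    return "ask_intent"       # second prompt message: ask what the user intended


for level in (0.95, 0.75, 0.30):
    print(level, decide(level, first_threshold=0.9, second_threshold=0.6))
```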

[0179] In one possible implementation, the processing module 502 is further configured to detect a second instruction from the user, the second instruction being used to indicate the target identifier of the first smart device, the target identifier being used to indicate the device alias of the first smart device; determine the target identifier of the first smart device based on the second instruction; and store the device identifier of the first smart device and the target identifier of the first smart device.

[0180] In one possible implementation, the processing module 502 is further configured to detect a third instruction from the user, the third instruction including a target identifier, the third instruction being used to request control of the first smart device; based on the third instruction, to obtain the first smart device and the control parameters corresponding to the first smart device; and based on the control parameters corresponding to the first smart device, to control the first smart device.

[0181] In one possible implementation, the processing module 502 is further configured to detect the update instruction of the target control mode and update the control parameters corresponding to the target control mode based on the update instruction.

[0182] In one possible implementation, the processing module 502 is further configured to detect a user's request instruction, which requests the opening of a target control mode; and control the smart device based on the control parameters corresponding to the target control mode.

[0183] In one possible implementation, the processing module 502 is further configured to determine that a request instruction is detected when a fourth voice instruction from the user is detected; or, when a target operation by the user on the smart device is detected, determine that a request instruction is detected, wherein the target operation is used to represent a touch operation that invokes a target control mode.

[0184] In one possible implementation, the processing module 502 is further configured to generate a visual interface corresponding to the target control mode on the display screen. The visual interface includes at least one of a mode identifier, control parameters corresponding to the mode identifier, an execution control, and a playback control. The execution control is used to trigger the execution of the target control mode, and the playback control is used to play the control parameters corresponding to the mode identifier through text-to-speech. In response to a click operation on the execution control, the smart device is controlled based on the control parameters corresponding to the target control mode.

[0185] In one possible implementation, the processing module 502 is further configured to acquire the user's historical interaction information, which includes interaction time, interaction location and interaction content; establish a correspondence between historical interaction scenarios and historical operations based on the historical interaction information; and, when a match is detected between the current scenario and a historical interaction scenario, execute the historical operation based on the correspondence between the historical interaction scenario and the historical operation.

[0186] In one possible implementation, the processing module 502 is further configured to synchronize the mode identifier of the target control mode and the control parameters corresponding to the mode identifier to multiple devices associated with the smart device, so that the multiple devices can operate based on the control parameters corresponding to the target control mode.

[0187] It should be noted that the aforementioned voice-based interactive devices are embodied in the form of functional units. The term "module" here can be implemented in software and / or hardware, without specific limitations.

[0188] For example, a "module" can be a software program, a hardware circuit, or a combination of both that implements the above functions. The hardware circuit may include an application-specific integrated circuit (ASIC), electronic circuits, a processor (e.g., a shared processor, a proprietary processor, or a group processor) and memory for executing one or more software or firmware programs, integrated logic circuits, and / or other suitable components that support the described functions.

[0189] Therefore, the units of the various examples described in the embodiments of this application can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0190] Figure 6 is a schematic diagram of the structure of a smart device provided in an embodiment of this application.

[0191] For example, as shown in Figure 6, the smart device 600 includes a memory 601 and a processor 602. The memory 601 stores executable program code 603, and the processor 602 is used to call and execute the executable program code 603 to perform a voice-based interaction method.

[0192] Furthermore, embodiments of this application also protect an apparatus that may include a memory and a processor, wherein the memory stores executable program code, and the processor is used to call and execute the executable program code to execute a voice-based interaction method provided in embodiments of this application.

[0193] This embodiment can divide the device into functional modules based on the above method example. For example, each module can correspond to a separate function, or two or more functions can be integrated into one processing module. The integrated module can be implemented in hardware. It should be noted that the module division in this embodiment is illustrative and only represents one logical functional division. In actual implementation, there may be other division methods.

[0194] When the functional modules are divided according to their respective functions, the device may also include an acquisition module, a processing module, etc. It should be noted that all relevant content of each step involved in the above method embodiments can be referenced to the functional description of the corresponding functional module, and will not be repeated here.

[0195] It should be understood that the device provided in this embodiment is used to execute the above-described voice-based interaction method, and therefore can achieve the same effect as the above-described implementation method.

[0196] When using integrated units, the device may include a processing module and a storage module. When applied to a smart device, the processing module can be used to control and manage the actions of the smart device. The storage module can be used to support the execution of relevant program code by the smart device.

[0197] The processing module may be a processor or a controller, which can implement or execute the various exemplary logic blocks, modules, and circuits described in conjunction with the disclosure of this application. The processor may also be a combination that implements computing functions, such as a combination of one or more microprocessors, or a combination of a digital signal processor (DSP) and a microprocessor. The storage module may be a memory.

[0198] In addition, the device provided in the embodiments of this application may specifically be a chip, component or module. The chip may include a connected processor and a memory. The memory is used to store instructions. When the processor calls and executes the instructions, the chip can execute a voice-based interaction method provided in the above embodiments.

[0199] This embodiment also provides a computer-readable storage medium storing computer program code. When the computer program code is run on a computer, the computer executes the aforementioned method steps to implement the voice-based interaction method provided in the above embodiment.

[0200] The computer-readable storage medium may include, but is not limited to, any type of disk, including floppy disks, optical disks, Digital Video Discs (DVDs), Compact Disc Read-Only Memory (CD-ROMs), microdrives, and magneto-optical disks, read-only memory (ROMs), random access memory (RAMs), erasable programmable read-only memory (EPROMs), electrically erasable programmable read-only memory (EEPROMs), dynamic random access memory (DRAMs), video random access memory (VRAMs), flash memory devices, magnetic cards or optical cards, nanosystems (including molecular memory ICs), or any type of medium or device suitable for storing instructions and / or data.

[0201] This embodiment also provides a computer program product that, when run on a computer, causes the computer to perform the aforementioned steps to implement a voice-based interaction method provided in the above embodiment.

[0202] In this embodiment, the device, computer-readable storage medium, computer program product, or chip are all used to execute the corresponding methods provided above. Therefore, the beneficial effects they can achieve can be referred to the beneficial effects in the corresponding methods provided above, and will not be repeated here.

[0203] Through the above description of the embodiments, those skilled in the art will understand that, for the sake of convenience and brevity, only the division of the above functional modules is used as an example. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.

[0204] In the embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of modules or units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or other forms.

[0205] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. A voice-based interaction method, characterized in that, The interaction method includes: The user's first voice command is detected, which is used to request the configuration of the target control mode of the smart device. Based on the first voice command, the mode identifier of the target control mode and the control parameters corresponding to the mode identifier are obtained; Store the mode identifier of the target control mode and the control parameters corresponding to the mode identifier.

2. The voice-based interaction method according to claim 1, characterized in that the step of obtaining the mode identifier of the target control mode and the control parameters corresponding to the mode identifier based on the first voice command includes: parsing the first voice command to obtain a first text; and obtaining, based on the first text, the mode identifier of the target control mode and the control parameters corresponding to the mode identifier.

3. The voice-based interaction method according to claim 2, characterized in that the step of parsing the first voice command to obtain the first text includes: performing speech-to-text processing on the first voice command to obtain a second text; and inputting the second text into a semantic understanding model for semantic parsing to obtain the first text.

4. The voice-based interaction method according to claim 3, characterized in that the step of inputting the second text into the semantic understanding model for semantic parsing to obtain the first text includes: inputting the second text into the semantic understanding model for semantic parsing to obtain a third text; and formatting the third text to obtain the first text, wherein the first text includes at least one of the following: the mode identifier, a device identifier of the smart device, a control operation, a control parameter, and a preset flag bit, the preset flag bit being used to indicate the number of control operations.
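
The formatting step of claim 4 can be pictured with a short Python sketch. The claim does not state the format of the first text; JSON, the field names, and the device identifiers below are assumptions made for illustration only. The flag bit is set to the number of control operations, as the claim describes.

```python
import json


def format_first_text(mode_id: str, operations: list) -> str:
    """Format semantic-parsing output into a structured text (JSON is an assumption)."""
    return json.dumps(
        {
            "mode_id": mode_id,
            # Flag bit: number of control operations carried by this command.
            "flag": len(operations),
            "operations": operations,
        },
        ensure_ascii=False,
    )


first_text = format_first_text(
    "sleep mode",
    [
        {"device_id": "ac-01", "operation": "set_temperature", "parameter": 26},
        {"device_id": "lamp-02", "operation": "power", "parameter": "off"},
    ],
)
print(first_text)
```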

5. The voice-based interaction method according to claim 4, characterized in that the step of performing speech-to-text processing on the first voice command to obtain the second text includes: obtaining a keyword library; and performing, based on the keyword library, speech-to-text processing on the first voice command to obtain the second text.
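
Claim 5 does not say how the keyword library enters the speech-to-text step. One possible reading, sketched below in Python, is a post-correction pass that snaps near-miss words in a raw transcript to entries in the library. This is an assumed technique, not the applicant's stated method; `correct_transcript`, the cutoff value, and the sample keywords are hypothetical. The matching uses the standard-library function `difflib.get_close_matches`.

```python
import difflib


def correct_transcript(raw: str, keywords: list, cutoff: float = 0.75) -> str:
    """Replace each word with its closest keyword when the similarity reaches cutoff."""
    corrected = []
    for word in raw.split():
        match = difflib.get_close_matches(word, keywords, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)


library = ["conditioner", "humidifier", "sleep"]
second_text = correct_transcript("set the conditoner to sleap", library)
```

A production recognizer would more likely bias decoding toward the keywords rather than patch the transcript afterwards; the sketch only shows the effect of having such a library.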

6. The voice-based interaction method according to claim 1, characterized in that the interaction method further includes: obtaining a target confidence level corresponding to the first voice command; and the storing of the mode identifier of the target control mode and the control parameters corresponding to the mode identifier includes: when the target confidence level is greater than or equal to a first threshold, storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier.

7. The voice-based interaction method according to claim 6, characterized in that the interaction method further includes: when the target confidence level is greater than or equal to a second threshold and less than the first threshold, outputting a first prompt message, the first prompt message being used to prompt the user to confirm configuration information, the configuration information including the mode identifier of the target control mode and the control parameters corresponding to the mode identifier; and when the target confidence level is less than the second threshold, outputting a second prompt message, the second prompt message being used to inquire about the user's intent.
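
The three confidence bands of claims 6 and 7 can be summarized in a small Python sketch. The threshold values 0.85 and 0.60 and the names `Action` and `decide` are hypothetical; the claims only require that the second threshold lie below the first.

```python
from enum import Enum


class Action(Enum):
    STORE = "store"            # confidence >= first threshold
    CONFIRM = "confirm"        # second threshold <= confidence < first threshold
    ASK_INTENT = "ask_intent"  # confidence < second threshold


def decide(confidence: float,
           first_threshold: float = 0.85,
           second_threshold: float = 0.60) -> Action:
    """Choose what to do with a parsed configuration given its confidence level."""
    if second_threshold > first_threshold:
        raise ValueError("second threshold must not exceed first threshold")
    if confidence >= first_threshold:
        return Action.STORE
    if confidence >= second_threshold:
        return Action.CONFIRM
    return Action.ASK_INTENT
```

Both comparisons are inclusive at the lower bound, matching the "greater than or equal to" wording of the claims.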

8. The voice-based interaction method according to claim 1, characterized in that the smart device includes a first smart device, the first smart device representing any one of the smart devices, and the interaction method further includes: detecting a second instruction of the user, the second instruction being used to indicate that a target identifier is to be configured for the first smart device, the target identifier being used to indicate a device alias of the first smart device; determining, based on the second instruction, the target identifier of the first smart device; and storing the device identifier of the first smart device and the target identifier of the first smart device.

9. The voice-based interaction method according to claim 8, characterized in that after storing the device identifier and the target identifier of the first smart device, the interaction method further includes: detecting a third instruction of the user, the third instruction including the target identifier and being used to request control of the first smart device; obtaining, based on the third instruction, the control parameters corresponding to the first smart device; and controlling the first smart device based on the control parameters corresponding to the first smart device.
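
The alias handling of claims 8 and 9 can be sketched in Python as a lookup table from device alias (the target identifier) to device identifier. `AliasRegistry`, the substring matching, and the sample identifiers are assumptions for illustration; the claims do not specify how the alias is located in the third instruction.

```python
class AliasRegistry:
    """Hypothetical registry mapping a user-chosen alias to a device identifier."""

    def __init__(self) -> None:
        self._alias_to_device = {}

    def register(self, device_id: str, alias: str) -> None:
        # Normalize so lookups are case-insensitive.
        self._alias_to_device[alias.strip().lower()] = device_id

    def resolve(self, instruction_text: str):
        """Return the device identifier whose alias appears in the instruction, or None."""
        text = instruction_text.lower()
        # Try longer aliases first so "bedroom lamp" wins over "lamp".
        for alias in sorted(self._alias_to_device, key=len, reverse=True):
            if alias in text:
                return self._alias_to_device[alias]
        return None


registry = AliasRegistry()
registry.register("lamp-02", "Bedroom Lamp")
registry.register("lamp-07", "lamp")
device_id = registry.resolve("Turn off the bedroom lamp")
```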

10. The voice-based interaction method according to claim 1, characterized in that after storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier, the interaction method further includes: detecting an update instruction for the target control mode; and updating, based on the update instruction, the control parameters corresponding to the target control mode.

11. The voice-based interaction method according to any one of claims 1 to 10, characterized in that after storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier, the interaction method further includes: detecting a request instruction of the user, the request instruction being used to request activation of the target control mode; and controlling the smart device based on the control parameters corresponding to the target control mode.
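
As a rough Python sketch of claim 11, activating a stored mode amounts to looking up its control parameters and applying them to each device. The per-device parameter layout, `run_mode`, and the `send` callback are hypothetical; the claim does not describe the transport used to reach a device.

```python
from typing import Callable


def run_mode(mode_id: str,
             stored_modes: dict,
             send: Callable[[str, dict], None]) -> bool:
    """Apply the stored control parameters of a mode; return False if it is unknown."""
    params = stored_modes.get(mode_id)
    if params is None:
        return False
    for device_id, settings in params.items():
        # 'send' stands in for whatever channel actually controls the device.
        send(device_id, settings)
    return True


sent = []
stored = {"sleep mode": {"ac-01": {"temperature": 26}, "lamp-02": {"power": "off"}}}
run_mode("sleep mode", stored, lambda dev, cfg: sent.append((dev, cfg)))
```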

12. The voice-based interaction method according to claim 11, characterized in that the interaction method further includes: when a fourth voice command of the user is detected, determining that the request instruction is detected; or, when a target operation of the user on the smart device is detected, determining that the request instruction is detected, the target operation representing a touch operation that invokes the target control mode.

13. The voice-based interaction method according to any one of claims 1 to 10, characterized in that the smart device includes a display screen, and after storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier, the interaction method further includes: generating, on the display screen, a visual interface corresponding to the target control mode, the visual interface including at least one of the mode identifier, the control parameters corresponding to the mode identifier, an execution control, and a playback control, wherein the execution control is used to trigger execution of the target control mode, and the playback control is used to play the control parameters corresponding to the mode identifier through text-to-speech conversion; and in response to a click operation on the execution control, controlling the smart device based on the control parameters corresponding to the target control mode.

14. The voice-based interaction method according to claim 13, characterized in that the interaction method further includes: obtaining historical interaction information of the user, the historical interaction information including interaction time, interaction location, and interaction content; establishing, based on the historical interaction information, a correspondence between historical interaction scenarios and historical operations; and when a match between a current scenario and a historical interaction scenario is detected, executing the corresponding historical operation based on the correspondence between historical interaction scenarios and historical operations.
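
One simple way to picture the correspondence of claim 14 is a Python sketch that treats a scenario as an (hour, location) pair and maps it to the operation most often performed there. This definition of a scenario, the exact-match rule, and the function names are assumptions; the claim leaves the matching criterion open.

```python
from collections import Counter, defaultdict


def build_correspondence(history: list) -> dict:
    """Map each (hour, location) scenario to its most frequent historical operation."""
    counts = defaultdict(Counter)
    for hour, location, content in history:
        counts[(hour, location)][content] += 1
    return {scene: ops.most_common(1)[0][0] for scene, ops in counts.items()}


def match_operation(current: tuple, correspondence: dict):
    """Return the historical operation for the current scenario, or None if no match."""
    return correspondence.get(current)


history = [
    (22, "bedroom", "sleep mode"),
    (22, "bedroom", "sleep mode"),
    (22, "bedroom", "reading mode"),
    (8, "kitchen", "breakfast mode"),
]
operation = match_operation((22, "bedroom"), build_correspondence(history))
```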

15. The voice-based interaction method according to claim 13, characterized in that after storing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier, the interaction method further includes: synchronizing the mode identifier of the target control mode and the control parameters corresponding to the mode identifier to multiple devices associated with the smart device, so that the multiple devices operate based on the control parameters corresponding to the target control mode.

16. A voice-based interaction device, characterized in that the voice-based interaction device includes: a detection module configured to detect a first voice command of a user, the first voice command being used to request configuration of a target control mode of a smart device; and a processing module configured to obtain, based on the first voice command, a mode identifier of the target control mode and control parameters corresponding to the mode identifier, and to store the mode identifier of the target control mode and the control parameters corresponding to the mode identifier.

17. A smart device, characterized in that the smart device includes: a memory configured to store executable program code; and a processor configured to call and run the executable program code from the memory, causing the smart device to perform the interaction method according to any one of claims 1 to 15.

18. A computer program product, characterized in that the computer program product includes computer program code that, when run on a computer, implements the interaction method according to any one of claims 1 to 15.

19. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed, implements the interaction method according to any one of claims 1 to 15.