Home device control method and apparatus, electronic device, and storage medium

By using a multi-level rewriting model to process voice signals in multiple dimensions, the problem of intent recognition in complex scenarios for smart home voice assistants is solved, achieving efficient and accurate device control and improving the user experience.

CN122245309A (Pending). Publication Date: 2026-06-19. MIDEA GRP (SHANGHAI) CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
MIDEA GRP (SHANGHAI) CO LTD
Filing Date
2026-03-10
Publication Date
2026-06-19

Smart Images

  • Figure CN122245309A_ABST

Abstract

This application provides a method, device, electronic device, and storage medium for controlling home appliances. The method, applied in the field of smart home control technology, includes: receiving an input voice signal, converting the voice signal into text information, rewriting the text information according to N rewriting dimensions to obtain an initial text command, standardizing the initial text command to obtain a target text command, and finally controlling the target home appliance based on the target text command. This method can accurately parse user input commands and achieve precise control of home appliances.

Description

Technical Field

[0001] This application relates to the field of smart home control technology, and more specifically, to control methods, apparatuses, electronic devices, and storage media for home devices in the field of smart home control technology.

Background Technology

[0002] With the increasing popularity and application of smart home devices, the voice interaction scenarios between users and these devices are becoming increasingly diverse. Users are placing higher demands on the naturalness, fluency, and intelligence of voice interaction. Existing smart home voice assistant interaction architectures typically employ rule-based or lightweight neural network-based natural language understanding models (i.e., "fast models") to process voice commands. These solutions primarily rely on the user's current voice input, or only combine very short historical context information, to complete command standardization, domain classification (such as control domain, query domain), and non-command rejection processing.

[0003] However, in real-world scenarios involving multi-turn interactions, complex intent understanding, and continuous dialogue, relying solely on the current command or a very short historical context makes it difficult to fully understand the user's true intent. This can easily lead to problems such as inaccurate intent recognition, poor interaction coherence, and low adaptability to ambiguous or elliptical expressions, resulting in poor voice interaction performance and consequently a poor user experience.

Summary of the Invention

[0004] This application provides a method, device, electronic device, and storage medium for controlling home appliances. The method can accurately parse user input commands and achieve precise control of home appliances.

[0005] In a first aspect, a method for controlling home appliances is provided, the method comprising: receiving an input voice signal and converting the voice signal into text information; rewriting the text information according to N rewriting dimensions to obtain an initial text instruction, where N is an integer greater than or equal to 1; normalizing the initial text instruction to obtain a target text instruction, where the target text instruction is an executable instruction in a preset format; and controlling a target home appliance based on the target text instruction.

[0006] In this implementation, after receiving the input voice signal, it can first be converted into text information, facilitating the generation of commands for controlling home appliances based on the converted text information. After converting the voice signal into text information, the text information can be rewritten according to N rewriting dimensions, supplementing and improving the text information from multiple aspects, effectively filling in missing information in the commands, fully understanding the user's true intentions, and avoiding parsing errors caused by incomplete information or ambiguous expressions, thereby significantly improving the accuracy of subsequent command parsing. Then, the initial text commands are uniformly standardized to generate standard-format, directly executable target text commands, effectively eliminating format differences and providing a stable and reliable foundation for subsequent command issuance and device execution. Finally, the target home appliances can be controlled based on the target text commands, enabling precise control of the home appliances.

[0007] With reference to the first aspect, in some implementations of the first aspect, N is an integer greater than 1, and rewriting the text information according to N rewriting dimensions to obtain the initial text instruction includes: rewriting, according to the i-th rewriting dimension, the content to be rewritten under the i-th rewriting dimension to obtain the rewritten text corresponding to the i-th rewriting dimension, where, when i = 1, the content to be rewritten under the i-th rewriting dimension is the text information; when i > 1, the content to be rewritten under the i-th rewriting dimension is the rewritten text corresponding to the (i-1)-th rewriting dimension; and i is an integer greater than or equal to 1 and less than or equal to N. When i = N, the rewritten text corresponding to the N-th rewriting dimension is used as the initial text instruction.
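The chaining described in [0007] amounts to a left fold over the ordered rewriting dimensions: dimension 1 receives the text information, and each later dimension receives the output of the one before it. The sketch below assumes each dimension can be modeled as a plain text-to-text function; the name `chain_rewrite` and the toy lambdas are hypothetical illustrations, not taken from the application.

```python
from functools import reduce
from typing import Callable, Sequence

# Assumption: each rewriting dimension is modeled as a text -> text function.
RewriteFn = Callable[[str], str]


def chain_rewrite(text: str, dimensions: Sequence[RewriteFn]) -> str:
    """Apply dimensions 1..N in order; dimension i consumes the output of dimension i-1.

    Dimension 1 receives the ASR text itself, and the output of dimension N is
    returned as the initial text instruction.
    """
    if not dimensions:
        raise ValueError("N must be at least 1")
    return reduce(lambda content, rewrite: rewrite(content), dimensions, text)


# Hypothetical toy dimensions: resolve a pronoun, then add a missing room.
dims = [
    lambda t: t.replace(" it ", " the air conditioner "),
    lambda t: t + " in the bedroom",
]
print(chain_rewrite("turn it on", dims))  # turn the air conditioner on in the bedroom
```

Because the fold is order-sensitive, the sequence in which the dimensions are listed is part of the design, which matches the hierarchical ordering described in [0008].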

[0008] In this implementation, the rewriting result of the previous dimension is used as the content to be rewritten in the next dimension, forming a multi-dimensional and hierarchical rewriting process. This process can gradually improve, complete, and standardize the text information from different dimensions, so that the final initial text instruction has contextual integrity, standardized expression, and accurate intent, ensuring the comprehensiveness of the rewriting process.

[0009] With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, rewriting the content to be rewritten under the i-th rewriting dimension according to the i-th rewriting dimension to obtain the corresponding rewritten text includes: detecting the content to be rewritten under the i-th rewriting dimension to obtain a detection result, where the detection result indicates whether rewriting is to be performed on the content to be rewritten under the i-th rewriting dimension; and rewriting, based on the detection result, the content to be rewritten under the i-th rewriting dimension to obtain the rewritten text corresponding to the i-th rewriting dimension.

[0010] With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, detecting the content to be rewritten under the i-th rewriting dimension to obtain the detection result includes: determining whether the rewrite execution condition corresponding to the i-th rewriting dimension is met, and determining whether the content to be rewritten under the i-th rewriting dimension already meets the rewrite completion condition; if the rewrite execution condition is not met or the rewrite completion condition is met, outputting a first detection result, where the first detection result indicates that no rewriting needs to be performed on the content to be rewritten; and if the rewrite execution condition is met but the rewrite completion condition is not met, outputting a second detection result, where the second detection result indicates that the content to be rewritten requires rewriting.
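The two-condition check in [0010] reduces to a small truth table: only the combination "execution condition met, completion condition not met" yields the second detection result. The sketch below encodes that rule; the enum, the function name `detect`, and the example conditions for a context dimension (history exists, no bare "it" remains) are hypothetical choices for illustration.

```python
from enum import Enum


class DetectionResult(Enum):
    FIRST = "no rewrite needed"   # first detection result
    SECOND = "rewrite needed"     # second detection result


def detect(execution_condition_met: bool, completion_condition_met: bool) -> DetectionResult:
    """Return SECOND only when the dimension applies and its work is not yet done."""
    if not execution_condition_met or completion_condition_met:
        return DetectionResult.FIRST
    return DetectionResult.SECOND


# Hypothetical conditions for a context dimension: it can only run when history
# exists, and the content counts as "complete" when no bare pronoun "it" remains.
history = ["turn on the bedroom air conditioner"]
text = "make it cooler"
result = detect(bool(history), "it" not in text.split())
print(result.name)  # SECOND
```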

[0011] In this implementation, by judging whether the rewrite execution conditions are met and whether the content to be rewritten meets the rewrite completion conditions, it is possible to accurately determine whether the text to be rewritten in the current rewrite dimension needs to be rewritten. If the rewrite execution conditions are not met or the rewrite completion conditions are met, the detection result that no rewriting is required is directly output, avoiding repeated execution and invalid execution during subsequent rewrite processing, effectively improving the accuracy and reliability of instruction processing, and reducing system resource consumption and latency.

[0012] With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, rewriting, based on the detection result, the content to be rewritten under the i-th rewriting dimension to obtain the corresponding rewritten text includes: if the detection result indicates that no rewriting needs to be performed on the content to be rewritten under the i-th rewriting dimension, determining that content as the rewritten text corresponding to the i-th rewriting dimension; and if the detection result indicates that the content to be rewritten needs to be rewritten, rewriting it according to the i-th rewriting dimension to obtain the rewritten text corresponding to the i-th rewriting dimension.
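The pass-through behavior in [0012] can be sketched as a stage wrapper that only invokes the (possibly expensive) rewrite when the checks call for it. In this sketch the `RewriteDimension` bundle, `run_dimension`, and the room-completion rule are hypothetical; the call counter simply makes visible that skipped stages do no rewriting work.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RewriteDimension:
    """Hypothetical bundle of the checks and the rewrite for one dimension."""
    name: str
    execution_condition: Callable[[str], bool]   # does this dimension apply?
    completion_condition: Callable[[str], bool]  # is the content already complete?
    rewrite: Callable[[str], str]                # the (possibly expensive) rewrite


def run_dimension(content: str, dim: RewriteDimension) -> str:
    """Pass content through unchanged unless it needs rewriting (sketch)."""
    needs_rewrite = dim.execution_condition(content) and not dim.completion_condition(content)
    if not needs_rewrite:
        return content  # first detection result: reuse the input as this dimension's output
    return dim.rewrite(content)  # second detection result: rewrite under this dimension


calls: List[str] = []  # records every input the rewrite actually processed


def expensive_rewrite(text: str) -> str:
    calls.append(text)
    return text.replace("the light", "the living room light")


room_dim = RewriteDimension(
    name="room completion",
    execution_condition=lambda t: "light" in t,
    completion_condition=lambda t: "room" in t,
    rewrite=expensive_rewrite,
)

print(run_dimension("turn on the light", room_dim))          # turn on the living room light
print(run_dimension("turn on the bedroom light", room_dim))  # unchanged: already names a room
print(len(calls))                                            # 1
```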

[0013] In this implementation, under each rewriting dimension the content to be rewritten is first checked to decide whether to rewrite it or to use it directly. When no rewriting is required, the content is used as-is as the rewritten text for the current dimension, avoiding unnecessary rewriting computation and resource overhead. When rewriting is required, the rewriting for that dimension is performed in a targeted manner. This ensures the accuracy of instruction processing under each rewriting dimension while reducing wasted computation and system latency.

[0014] With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, rewriting the content to be rewritten under the i-th rewriting dimension according to the i-th rewriting dimension to obtain the corresponding rewritten text includes: when the i-th rewriting dimension is the first dimension, obtaining a historical interaction record and completing the intent-related information corresponding to the text information based on the historical interaction record, to obtain a first rewritten text corresponding to the first dimension, where the historical interaction record contains the multi-turn interaction text generated between the user and the target home device before the current interaction; when the i-th rewriting dimension is the second dimension, performing personalized rewriting on the first rewritten text based on a preset database to obtain a second rewritten text corresponding to the second dimension, where the preset database stores the user's personalized interaction information; and when the i-th rewriting dimension is the third dimension, obtaining a home device list and performing intent parsing and reasoning on the second rewritten text based on the home device list to obtain a third rewritten text corresponding to the third dimension, where the home device list includes multiple home devices and the functional definition of each home device.
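To make the three data sources in [0014] concrete, the sketch below gives each dimension a toy rule-based stand-in: the first resolves the pronoun "it" from the historical interaction record, the second maps a user's personal device name through a preset database, and the third infers an explicit command from the home device list. The application describes model-based rewriting and does not specify any of these rules; `HISTORY`, `PREFERENCES`, `DEVICE_LIST`, and all function names are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical stand-ins for the three data sources named in the text.
HISTORY: List[str] = ["turn on the bedroom air conditioner"]            # earlier turns
PREFERENCES: Dict[str, str] = {"my cooler": "bedroom air conditioner"}  # user's own naming
DEVICE_LIST: Dict[str, List[str]] = {                                   # device -> functions
    "bedroom air conditioner": ["power", "set_temperature"],
    "bathroom bath heater": ["power"],
}


def context_dimension(text: str, history: List[str]) -> str:
    """Dimension 1: replace the pronoun 'it' with the most recently named device."""
    for turn in reversed(history):
        for device in DEVICE_LIST:
            if device in turn:
                return re.sub(r"\bit\b", f"the {device}", text)
    return text  # no usable history: leave the text unchanged


def personalization_dimension(text: str, prefs: Dict[str, str]) -> str:
    """Dimension 2: map the user's personal names onto canonical device names."""
    for alias, canonical in prefs.items():
        text = text.replace(alias, f"the {canonical}")
    return text


def intent_dimension(text: str, devices: Dict[str, List[str]]) -> str:
    """Dimension 3: turn an implicit statement into an explicit device command."""
    if "too hot" in text:
        for device, functions in devices.items():
            if "set_temperature" in functions:
                return f"lower the temperature of the {device}"
    return text


print(context_dimension("set it to 24 degrees", HISTORY))
# set the bedroom air conditioner to 24 degrees
print(personalization_dimension("turn off my cooler", PREFERENCES))
# turn off the bedroom air conditioner
print(intent_dimension("I feel too hot", DEVICE_LIST))
# lower the temperature of the bedroom air conditioner
```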

[0015] In this implementation, the contextual and referential information in the text is first supplemented based on multi-turn historical interaction records to obtain the first rewritten text, resolving ambiguous instructions and missing information. The first rewritten text is then personalized based on the user's personalized data, so that the resulting second rewritten text reflects the user's usage habits and preferences. Finally, intent parsing and functional reasoning are performed on the second rewritten text using the home device list. This process progressively refines the text from ambiguous, fragmentary input into semantically complete, personalized, and clearly defined instructions, improving the understanding accuracy and execution reliability of smart home interaction. At the same time, the hierarchical processing ensures the efficiency and stability of the rewriting process.

[0016] With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, rewriting the text information according to at least one rewriting dimension to obtain the initial text instruction includes: analyzing the text information and determining a target rewriting dimension from the N rewriting dimensions; and rewriting the text information based on the target rewriting dimension to obtain the initial text instruction.
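The application does not say how the text is analyzed to pick target dimensions; a classifier or a language model would serve equally well. As a minimal sketch under that caveat, the crude keyword heuristics below select dimensions in pipeline order: an unresolved pronoun selects the context dimension, a known personal alias selects personalization, and text naming no target at all selects intent reasoning. `PRONOUNS`, `KNOWN_DEVICES`, and `select_dimensions` are hypothetical.

```python
import re
from typing import List, Set

# Hypothetical trigger vocabularies for illustration only.
PRONOUNS = {"it", "that", "them", "this"}
KNOWN_DEVICES = {"air conditioner", "light", "bath heater"}


def select_dimensions(text: str, aliases: Set[str]) -> List[str]:
    """Return the subset of the N dimensions, in pipeline order, worth running."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    has_pronoun = bool(words & PRONOUNS)
    has_alias = any(alias in lowered for alias in aliases)
    has_device = any(device in lowered for device in KNOWN_DEVICES)

    selected: List[str] = []
    if has_pronoun:
        selected.append("context")            # an unresolved reference needs history
    if has_alias:
        selected.append("personalization")    # a user-specific name needs the preset database
    if not (has_pronoun or has_alias or has_device):
        selected.append("intent_reasoning")   # no target at all: infer one from the device list
    return selected


print(select_dimensions("set it to 24 degrees", set()))              # ['context']
print(select_dimensions("turn off my cooler", {"my cooler"}))        # ['personalization']
print(select_dimensions("I feel a bit hot", set()))                  # ['intent_reasoning']
print(select_dimensions("turn on the air conditioner", set()))       # []
```

An empty result means the text already looks standard and can go straight to normalization, which is the latency saving [0017] describes. Heuristics this simple will misfire (for example on an impersonal "it"), which is one reason a learned analyzer may be preferable.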

[0017] In this implementation, the text information is first analyzed to accurately determine the target rewriting dimension to be processed from N rewriting dimensions. Then, the text information is rewritten in a targeted manner based on the target rewriting dimension. This avoids redundant calculations caused by rewriting the text information in all rewriting dimensions, thereby reducing unnecessary resource consumption. It can also significantly reduce instruction processing latency while ensuring the text information is complete, standardized, and adapted.

[0018] With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, normalizing the initial text instruction to obtain the target text instruction includes: performing format standardization, domain classification, and invalid-instruction filtering on the initial text instruction to obtain the target text instruction.
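The three normalization steps in [0018] can be illustrated with a toy normalizer that lower-cases and strips punctuation (format standardization), tags the text as control or query (domain classification), and returns nothing for chit-chat or commands without a recognizable device (invalid-instruction filtering). The "preset format" is not specified in the application; the `TargetInstruction` record, the cue-word vocabularies, and `normalize` are all assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabularies; a real system would use its own device ontology.
CONTROL_VERBS = ("turn on", "turn off", "set", "lower", "raise")
QUERY_CUES = ("what", "how", "is the", "check")
DEVICES = ("air conditioner", "light", "bath heater")


@dataclass(frozen=True)
class TargetInstruction:
    """An assumed 'preset format' for executable instructions."""
    domain: str               # "control" or "query"
    device: str
    action: str
    value: Optional[int] = None


def normalize(initial: str) -> Optional[TargetInstruction]:
    # Format standardization: lower-case, drop punctuation, collapse whitespace.
    text = " ".join(re.sub(r"[^\w\s]", " ", initial.lower()).split())

    # Domain classification by leading cue words (crude, for illustration).
    if text.startswith(QUERY_CUES):
        domain = "query"
    elif text.startswith(CONTROL_VERBS):
        domain = "control"
    else:
        domain = None

    device = next((d for d in DEVICES if d in text), None)

    # Invalid-instruction filtering: reject chit-chat or commands with no target device.
    if domain is None or device is None:
        return None

    number = re.search(r"\d+", text)
    action = next((v for v in CONTROL_VERBS if text.startswith(v)), "query")
    return TargetInstruction(domain, device, action, int(number.group()) if number else None)


print(normalize("Set the air conditioner to 24 degrees!"))
print(normalize("Thanks, that's all"))  # None
```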

[0019] In this implementation, by standardizing the format of the initial text commands, classifying them by domain, and filtering out invalid commands, natural language commands with diverse expressions and inconsistent formats can be standardized into uniform and executable standard commands. At the same time, the control domain to which the command belongs can be accurately identified, and invalid, meaningless, or unrecognizable commands can be filtered out, thereby improving the standardization and accuracy of the commands, reducing the parsing difficulty of subsequent execution modules, facilitating precise control of home devices, and improving the execution efficiency of the smart home control system.

[0020] In a second aspect, a control apparatus for a home appliance is provided, the apparatus comprising: a conversion module, configured to receive an input voice signal and convert the voice signal into text information; a rewriting module, configured to rewrite the text information according to N rewriting dimensions to obtain an initial text instruction, where N is an integer greater than or equal to 1; a processing module, configured to normalize the initial text instruction to obtain a target text instruction, where the target text instruction is an executable instruction in a preset format; and a control module, configured to control a target home appliance based on the target text instruction.

[0021] With reference to the second aspect, in some implementations of the second aspect, N is an integer greater than 1, and the rewriting module is specifically configured to: rewrite, according to the i-th rewriting dimension, the content to be rewritten under the i-th rewriting dimension to obtain the rewritten text corresponding to the i-th rewriting dimension, where, when i = 1, the content to be rewritten under the i-th rewriting dimension is the text information; when i > 1, the content to be rewritten under the i-th rewriting dimension is the rewritten text corresponding to the (i-1)-th rewriting dimension; and i is an integer greater than or equal to 1 and less than or equal to N. When i = N, the rewritten text corresponding to the N-th rewriting dimension is used as the initial text instruction.

[0022] With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, the rewriting module includes a rewriting unit, which is specifically configured to: detect the content to be rewritten under the i-th rewriting dimension to obtain a detection result, where the detection result indicates whether rewriting is to be performed on the content to be rewritten under the i-th rewriting dimension; and rewrite, based on the detection result, the content to be rewritten under the i-th rewriting dimension to obtain the rewritten text corresponding to the i-th rewriting dimension.

[0023] With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, the rewriting unit includes a detection subunit, which is specifically configured to: determine whether the rewrite execution condition corresponding to the i-th rewriting dimension is met, and determine whether the content to be rewritten under the i-th rewriting dimension already meets the rewrite completion condition; if the rewrite execution condition is not met or the rewrite completion condition is met, output a first detection result, where the first detection result indicates that no rewriting needs to be performed on the content to be rewritten; and if the rewrite execution condition is met but the rewrite completion condition is not met, output a second detection result, where the second detection result indicates that the content to be rewritten requires rewriting.

[0024] With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, the rewriting unit includes a rewriting subunit, which is specifically configured to rewrite the content to be rewritten under the i-th rewriting dimension according to the detection result to obtain the rewritten text corresponding to the i-th rewriting dimension, which includes: if the detection result indicates that no rewriting needs to be performed on the content to be rewritten under the i-th rewriting dimension, determining that content as the rewritten text corresponding to the i-th rewriting dimension; and if the detection result indicates that the content to be rewritten needs to be rewritten, rewriting it according to the i-th rewriting dimension to obtain the rewritten text corresponding to the i-th rewriting dimension.

[0025] With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, the rewriting unit is specifically configured to: when the i-th rewriting dimension is the first dimension, obtain a historical interaction record and complete the intent-related information corresponding to the text information based on the historical interaction record, to obtain a first rewritten text corresponding to the first dimension, where the historical interaction record contains the multi-turn interaction text generated between the user and the target home device before the current interaction; when the i-th rewriting dimension is the second dimension, perform personalized rewriting on the first rewritten text based on a preset database to obtain a second rewritten text corresponding to the second dimension, where the preset database stores the user's personalized interaction information; and when the i-th rewriting dimension is the third dimension, obtain a home device list and perform intent parsing and reasoning on the second rewritten text based on the home device list to obtain a third rewritten text corresponding to the third dimension, where the home device list includes multiple home devices and the functional definition of each home device.

[0026] With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, the rewriting module is further specifically configured to: analyze the text information and determine, from the N rewriting dimensions, the target rewriting dimension under which the text information needs to be rewritten; and rewrite the text information based on the target rewriting dimension to obtain the initial text instruction.

[0027] With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, the processing module is specifically configured to perform format standardization, domain classification, and invalid-instruction filtering on the initial text instruction to obtain the target text instruction.

[0028] In a third aspect, an electronic device is provided, including a memory and a processor, where the memory is configured to store executable program code, and the processor is configured to call and run the executable program code from the memory, so that the electronic device performs the home device control method in the first aspect or any possible implementation of the first aspect.

[0029] In a fourth aspect, a computer program product is provided, including computer program code that, when run on a computer or processor, causes the computer or processor to perform the home appliance control method in the first aspect or any possible implementation thereof.

[0030] In a fifth aspect, a computer-readable storage medium is provided, storing computer program code that, when run on a computer, causes the computer to perform the home appliance control method in the first aspect or any possible implementation thereof.

Attached Figure Description

[0031] Figure 1 is a schematic flowchart of a method for controlling home appliances according to an embodiment of this application; Figure 2 is a schematic diagram of the control process of a home appliance control system according to an embodiment of this application; Figure 3 is a schematic structural diagram of a control apparatus for a home appliance according to an embodiment of this application; Figure 4 is a schematic structural diagram of an electronic device according to an embodiment of this application.

Detailed Implementation

[0032] The technical solutions in this application will be clearly and thoroughly described below with reference to the accompanying drawings. In the description of the embodiments of this application, unless otherwise stated, " / " means "or," for example, A / B can mean A or B. "And / or" in the text is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, and B existing alone. Furthermore, in the description of the embodiments of this application, "multiple" refers to two or more than two.

[0033] Hereinafter, the terms "first" and "second" are used for descriptive purposes only and should not be construed as implying or suggesting relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.

[0034] With the increasing popularity and application of smart home devices, the voice interaction scenarios between users and these devices are becoming increasingly diverse. Users are placing higher demands on the naturalness, fluency, and intelligence of voice interaction. Existing smart home voice assistant interaction architectures typically employ rule-based or lightweight neural network-based natural language understanding models (i.e., "fast models") to process voice commands. These solutions primarily rely on the user's current voice input, or only combine very short historical context information, to complete command standardization, domain classification (such as control domain, query domain), and non-command rejection processing.

[0035] However, in real-world scenarios involving multi-turn interactions, complex intent understanding, and continuous dialogue, relying solely on the current command or a very short historical context makes it difficult to fully understand the user's true intent. This can easily lead to problems such as inaccurate intent recognition, poor interaction coherence, and low adaptability to ambiguous or elliptical expressions, resulting in poor voice interaction performance and consequently a poor user experience.

[0036] For example, existing smart home voice assistant interaction architectures usually have the following shortcomings:
(1) Lack of long-context understanding: the fast model is limited by input length and computational complexity, so it struggles with long multi-turn dialogues and loses core slots such as the device name and room name across turns.
(2) Lack of personalized memory: it cannot effectively remember the user's device-naming or control habits, so the user must speak mechanical, standardized commands, which makes the experience rigid.
(3) Weak reasoning about fuzzy commands: the "fast model" usually does not load or use the device list as a computational input (the list is large, so processing it takes a long time), which makes it difficult to handle generalized commands (for example, "It's too hot" implying "Turn down the air conditioner") or fuzzy references (for example, "I'm taking a shower" implying "Turn on the bathroom light / bath heater").
(4) Conflict between instant response and high intelligence: although introducing a large language model (LLM) with hundreds of billions of parameters can solve the above problems, its inference latency is usually several seconds, which cannot meet the strict real-time requirement of smart home control (usually under 1 second).
(5) High cost of system reconstruction: the existing "fast model" pipeline already embeds a large number of historical business rules, so replacing it directly is extremely risky; an incremental upgrade that remains compatible with the existing pipeline is needed.

[0037] To address the aforementioned technical issues, this application provides a home appliance control method, which is mainly applied to devices such as whole-house intelligent voice control systems, smart speakers, and intelligent central control screens with voice interaction functions.

[0038] Figure 1 is a schematic flowchart of a home appliance control method provided in an embodiment of this application.

[0039] For example, as shown in Figure 1, the method 100 includes: Step 101: Receive the input voice signal and convert the voice signal into text information.

[0040] Step 102: Rewrite the text information according to N rewriting dimensions to obtain the initial text instructions.

[0041] Where N is an integer greater than or equal to 1.

[0042] Step 103: Normalize the initial text instructions to obtain the target text instructions.

[0043] The target text instruction is an executable instruction in a preset format.

[0044] Step 104: Control the target home appliances based on the target text instructions.

[0045] In this embodiment, after receiving the input voice signal, it can be converted into text information to facilitate the generation of commands for controlling home appliances based on the converted text information. After converting the voice signal into text information, the text information can be rewritten according to N rewriting dimensions, which can supplement and improve the text information from multiple aspects, effectively fill in the missing information in the command, fully understand the user's true intention, and avoid parsing errors caused by incomplete information or ambiguous expression, thereby significantly improving the accuracy of subsequent command parsing. Then, the initial text command is uniformly standardized to generate a standard format and directly executable target text command, which can effectively eliminate format differences and provide a stable and reliable foundation for subsequent command issuance and device execution. Finally, the target home appliance is controlled based on the target text command, which can achieve precise control of the home appliance.

[0046] The specific implementation of each step in the embodiment shown in Figure 1 is explained below. In step 101, the voice signal is the user's spoken request to a home appliance, collected by the home appliance command recognition system through a sound sensor such as a microphone. For example, the voice signal could be a request such as "Turn on the air conditioner," "Turn on the air conditioner's dehumidification mode," "Adjust the air conditioner temperature to 24 degrees Celsius," "It's so cold now," or "I feel a bit hot."

[0047] Furthermore, since voice signals are unstructured audio information and cannot be directly recognized, parsed, and executed by the control systems of home appliances, they can first undergo structured processing. Specifically, the voice recognition system of the home appliances can convert the user's input voice signal into text information, thereby transforming audio-based interactive commands into parsable and processable text data. This provides a foundation for the generation of subsequent control commands, thus enabling precise control of the home appliances.

[0048] For example, the data processing module in the voice recognition system of the aforementioned home appliances can convert the user's input voice signal into text information through integrated Automatic Speech Recognition (ASR) technology. ASR technology is a technique that converts voice audio signals into corresponding text information.

[0049] As mentioned earlier, the existing interactive architecture of smart home voice assistants typically uses a rule-based or lightweight neural network-based natural language understanding model (i.e., a "fast model") to process voice commands. That is, after converting the voice signal into text information, the text information is directly input into the "fast model" to output the corresponding control command, and the home devices are controlled based on the control command.

[0050] The aforementioned "fast model" refers to a lightweight inference model designed for scenarios such as real-time voice interaction and smart home control. Its core features are small model size, low computational overhead, and short inference latency. It can complete instruction parsing within milliseconds, thereby fully meeting the requirements of device control for low latency and high response speed.

[0051] However, because "fast models" generally adopt lightweight structures and simplified reasoning logic, they typically do not load or rely on external data such as home device lists, scene contexts, user habits, and historical interaction information. They can only parse and execute literal, standardized, and structurally complete instructions, but lack the ability to deeply understand implicit intentions, omitted expressions, vague references, and users' personalized habits. This makes it difficult to fully understand the user's true intentions, and easily leads to problems such as inaccurate intention recognition, poor interaction coherence, and low adaptability to vague or omitted expressions. Consequently, the voice interaction effect is poor, resulting in a bad user experience.

[0052] To accurately understand the user's true control intent while meeting the demands for low latency and high real-time performance in smart home scenarios, the embodiments of this application can add a multi-level rewriting model to the overall interaction architecture of the smart home voice assistant while retaining the existing "fast model." The multi-level rewriting model performs intent completion, personalized matching, and scenario adaptation on the original text information, converting the user's generalized, implicit, or non-standard commands into clear, standard text commands that the "fast model" can recognize directly. The processed text is then input into the "fast model" for parsing, which quickly outputs control commands that match the user's true intent. This significantly improves the understanding ability and control accuracy of voice interaction without replacing the original processing pipeline or increasing the inference latency of the "fast model."

[0053] The aforementioned multi-level rewriting model refers to a functional model set between the data processing module and the "fast model" for multi-dimensional rewriting and optimization of text information. This multi-level rewriting model is used to rewrite text information according to multiple rewriting dimensions, converting text information into instructions adapted to the "fast model."

[0054] In step 102, the smart home device can specifically rewrite the text information according to N rewriting dimensions through the multi-level rewriting model in the interaction architecture to obtain the initial text instruction input to the "fast model".

[0055] For example, the N rewriting dimensions can be set according to actual needs and may include, but are not limited to, a contextual understanding ability dimension, a personalization dimension, and an intent reasoning dimension.

[0056] The contextual understanding ability dimension is the rewriting dimension in which the text information is completed using contextual information; the personalization dimension is the rewriting dimension in which the text information is rewritten at the personalized level; and the intent reasoning dimension is the rewriting dimension in which the text information is rewritten at the level of user intent.

[0057] In one possible implementation, the above-mentioned rewriting of text information according to N rewriting dimensions can specifically be a staged rewriting process of text information according to N rewriting dimensions.

[0058] In some embodiments, N is an integer greater than 1, and rewriting the text information according to the N rewriting dimensions to obtain the initial text instruction includes: rewriting the content to be rewritten under the i-th rewriting dimension according to the i-th rewriting dimension to obtain the rewritten text corresponding to the i-th rewriting dimension, where i is an integer with 1 ≤ i ≤ N. When i = 1, the content to be rewritten under the i-th rewriting dimension is the text information; when i > 1, it is the rewritten text corresponding to the (i-1)-th rewriting dimension; and when i = N, the rewritten text corresponding to the N-th rewriting dimension is used as the initial text instruction.

[0059] Specifically, the text information can be rewritten in multiple levels using N rewriting dimensions in a sequential manner, with each rewriting dimension based on the result of the previous dimension. For example, the first rewriting dimension uses the text information directly as the content to be rewritten. Starting from the second rewriting dimension, each dimension uses the rewritten text output by the previous dimension as the content to be rewritten. After completing the rewriting process for all dimensions, the rewritten text obtained from the last rewriting dimension is used as the final initial text instruction.
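The sequential scheme in [0058] and [0059] amounts to folding the text through an ordered list of rewriters. The Python sketch below assumes that each rewriting dimension can be modeled as a plain function from string to string; the names `Rewriter` and `multi_level_rewrite` are hypothetical and do not come from the application.

```python
from typing import Callable, Sequence

# Hypothetical type: one rewriting dimension maps the content to be
# rewritten to the rewritten text for that dimension.
Rewriter = Callable[[str], str]


def multi_level_rewrite(text: str, rewriters: Sequence[Rewriter]) -> str:
    """Apply the N rewriting dimensions in order.

    Dimension 1 receives the recognized text; dimension i > 1 receives the
    rewritten text of dimension i - 1. The output of dimension N is returned
    as the initial text instruction.
    """
    content = text  # content to be rewritten under dimension 1
    for rewrite in rewriters:
        content = rewrite(content)  # becomes the input of the next dimension
    return content


if __name__ == "__main__":
    # Placeholder stages that only tag the text, to make the ordering visible.
    stages = [
        lambda s: s + " [context]",
        lambda s: s + " [personalized]",
        lambda s: s + " [intent]",
    ]
    print(multi_level_rewrite("warm it up a bit", stages))
    # prints: warm it up a bit [context] [personalized] [intent]
```

Because each stage sees only the previous stage's output, reordering the list changes the result, which matches the note in [0065] that the rewriting order is a design choice.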

[0060] The above method uses the rewriting result of the previous dimension as the content to be rewritten in the next dimension, forming a multi-dimensional and hierarchical rewriting process. It can gradually improve, supplement and standardize the text information from different dimensions, so that the final initial text instruction has the integrity of context, standardization of expression and accuracy of intent, thus ensuring the comprehensiveness of the rewriting process.

[0061] For example, where the N rewriting dimensions are the three dimensions of contextual understanding ability, personalization, and intent reasoning, the text information can first be rewritten according to the contextual understanding ability dimension to obtain the first rewritten text; the first rewritten text is then rewritten according to the personalization dimension to obtain the second rewritten text; the second rewritten text is then rewritten according to the intent reasoning dimension to obtain the third rewritten text; and finally, the third rewritten text is used as the initial text instruction input to the "fast model".

[0062] In some embodiments, rewriting the content to be rewritten under the i-th rewriting dimension to obtain the rewritten text corresponding to the i-th rewriting dimension includes: when the i-th rewriting dimension is a first dimension, obtaining historical interaction records and supplementing the intent-related information corresponding to the text information based on the historical interaction records to obtain a first rewritten text corresponding to the first dimension, where the historical interaction records contain multi-turn interaction text generated between the user and the target home appliance before the current interaction; when the i-th rewriting dimension is a second dimension, personalizing the first rewritten text based on a preset database to obtain a second rewritten text corresponding to the second dimension, where the preset database stores the user's personalized interaction information; and when the i-th rewriting dimension is a third dimension, obtaining a list of home appliances and performing intent parsing and reasoning on the second rewritten text based on that list to obtain a third rewritten text corresponding to the third dimension, where the list of home appliances includes multiple home appliances and the functional definition corresponding to each of them.

[0063] For example, the first dimension mentioned above can specifically be the context understanding ability dimension described above; the second dimension mentioned above can specifically be the personalization dimension described above; and the third dimension mentioned above can specifically be the intent reasoning dimension described above.

[0064] Specifically, the process of rewriting the content to be rewritten under the i-th rewriting dimension can be as follows: First, based on the first dimension (i.e., the contextual understanding dimension), by obtaining historical interaction records before this interaction, incomplete, omitted, or unclear intent-related information in the text information currently entered by the user is supplemented to obtain the first rewritten text; then, based on the second dimension (i.e., the personalization dimension), using a preset database storing user habits, preferences, and other information, the first rewritten text is rewritten in a personalized way to adapt to user habits to obtain the second rewritten text; finally, based on the third dimension (i.e., the intent reasoning dimension), based on a list of home appliances containing various home appliances and their functional definitions, the second rewritten text is precisely analyzed and reasoned to obtain the third rewritten text.

[0065] Optionally, the rewriting order of each of the above N rewriting dimensions can be set according to actual needs. This application embodiment does not limit the rewriting order of the rewriting dimensions during the rewriting process.

[0066] In some embodiments, the text information is rewritten by N sequentially connected rewriting sub-modules in a multi-level rewriting model according to N rewriting dimensions.

[0067] Each rewrite submodule is used to perform rewrite processing under a corresponding rewrite dimension. For example, the first rewrite submodule is used to perform rewrite processing under the first dimension, the second rewrite submodule is used to perform rewrite processing under the second dimension, and the third rewrite submodule is used to perform rewrite processing under the third dimension.

[0068] In other embodiments, where computing resources permit, the N rewriting sub-modules in the above-described multi-level rewriting model can also be freely combined in series and parallel according to actual needs. This application does not limit this aspect.

[0069] For example, where the N rewriting dimensions are the three dimensions of contextual understanding ability, personalization, and intent reasoning, the multi-level rewriting model can include a first rewriting sub-module (also called the "multi-turn ability rewriting sub-module") that performs rewriting under the contextual understanding ability dimension, a second rewriting sub-module (also called the "memory ability rewriting sub-module") that performs rewriting under the personalization dimension, and a third rewriting sub-module (also called the "reasoning enhancement rewriting sub-module") that performs rewriting under the intent reasoning dimension.

[0070] The input of the first rewriting sub-module can be the text information and the historical interaction records; the input of the second rewriting sub-module can be the output of the first (multi-turn ability) rewriting sub-module and the preset database; and the input of the third rewriting sub-module can be the output of the second (memory ability) rewriting sub-module and the list of home devices.

[0071] For example, each of the above rewriting submodules can integrate a machine learning model for performing rewriting processing in the rewriting dimension corresponding to that submodule.

[0072] Furthermore, the machine learning models used to implement rewriting processing in various dimensions can be trained by constructing training data containing a large number of labeled samples, based on different rewriting tasks such as contextual understanding, personalized adaptation, and intent reasoning. After training, the machine learning models can be integrated into the corresponding rewriting sub-modules to perform the rewriting processing of the sub-modules in the corresponding rewriting dimensions.

[0073] For example, the trained first model is integrated into the first rewriting submodule, so that the first model can complete the intent-related information in the text information based on the input text information and historical interaction records, and output the first rewritten text.

[0074] For example, assume the text information obtained from the user's voice input is "warm it up a bit", and the following historical interaction records are available: the first round is "turn on the living room light", the second round is "brighten it up a bit", and the third round is "brightest". The first model reasons over these rounds of historical interaction, completes the text information, and outputs the first rewritten text "warm up the color temperature of the living room light".
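In the application, the first dimension is handled by a trained model. Purely to illustrate what kind of information this step recovers, the sketch below substitutes a simple rule: if the current utterance names no known device, it attaches the most recently mentioned device from the history. The device vocabulary, the function names, and the bracketed annotation format are hypothetical, and the sketch does not attempt the fuller rewrite that the first model produces.

```python
from typing import Optional, Sequence

# Hypothetical device vocabulary; a real system would use its own device data.
KNOWN_DEVICES = ("living room light", "air conditioner")


def last_mentioned_device(history: Sequence[str],
                          devices: Sequence[str]) -> Optional[str]:
    """Scan the history from newest to oldest for a device name."""
    for utterance in reversed(history):
        for device in devices:
            if device in utterance:
                return device
    return None


def complete_with_context(text: str, history: Sequence[str],
                          devices: Sequence[str] = KNOWN_DEVICES) -> str:
    """Rule-based stand-in for the first rewriting sub-module."""
    if any(device in text for device in devices):
        return text  # the utterance already names its target device
    device = last_mentioned_device(history, devices)
    if device is None:
        return text  # the history gives no basis for completion
    return f"{text} [device: {device}]"


history = ["turn on the living room light", "brighten it up a bit", "brightest"]
print(complete_with_context("warm it up a bit", history))
# prints: warm it up a bit [device: living room light]
```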

[0075] The trained second model is integrated into the second rewriting submodule, which enables the second model to rewrite the first rewritten text in a personalized way based on the input first rewritten text and the preset database, and output the second rewritten text.

[0076] For example, the first rewritten text output by the first rewriting submodule is "warm up the color temperature of the ceiling light in the living room". The preset database stores the personalized information that the user has customized the name of the "main light in the living room" to "ceiling light in the living room". The second model rewrites the first rewritten text according to the customized device name. Specifically, it replaces the user-defined name in the first rewritten text with the standard device name, so that the second rewritten text is "warm up the color temperature of the main light in the living room".
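The name substitution in [0076] can be pictured as a lookup from user-defined names to standard device names. The sketch below assumes the preset database can be reduced to a plain dictionary of such aliases; the application's second model is a trained model, not this rule, and `apply_custom_names` is a hypothetical name. All replacements are made in a single regular-expression pass with longer aliases tried first, so that text produced by one replacement is never rewritten again by another alias.

```python
import re
from typing import Dict


def apply_custom_names(text: str, aliases: Dict[str, str]) -> str:
    """Replace user-defined device names with standard device names."""
    if not aliases:
        return text
    # Longer aliases first, so a short alias cannot match inside a longer one.
    ordered = sorted(aliases, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(alias) for alias in ordered))
    # Single pass: replacement text is not scanned again.
    return pattern.sub(lambda m: aliases[m.group(0)], text)


# Hypothetical preset-database entry taken from the example above.
preset_db = {"ceiling light in the living room": "main light in the living room"}
print(apply_custom_names(
    "warm up the color temperature of the ceiling light in the living room",
    preset_db,
))
# prints: warm up the color temperature of the main light in the living room
```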

[0077] The trained third model is integrated into the third rewriting submodule, which enables the third model to perform intent parsing and inference on the second rewritten text based on the input second rewritten text and the list of home appliances, and output the third rewritten text.

[0078] For example, the second rewritten text output by the second rewriting submodule is "It's too hot". Combining the air conditioner in the home appliance list, the third model performs intent parsing and reasoning on the second rewritten text to obtain the third rewritten text as "Turn down the air conditioner temperature".
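A rule-table stand-in for the third sub-module shows how the list of home appliances and their functional definitions constrains the inference: an implicit request is mapped to an explicit command only if some listed appliance supports the required function. The `Appliance` structure, function identifiers such as `set_temperature`, and the rule table are all hypothetical; the application describes a trained model rather than fixed rules.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class Appliance:
    """One entry of a (hypothetical) home appliance list."""
    name: str
    functions: Tuple[str, ...]


# Hypothetical rules: implicit utterance -> (required function, command template)
IMPLICIT_RULES: Dict[str, Tuple[str, str]] = {
    "it's too hot": ("set_temperature", "Turn down the {name} temperature"),
    "it's too dark": ("set_brightness", "Turn up the {name} brightness"),
}


def infer_intent(text: str, appliances: Sequence[Appliance]) -> str:
    """Map an implicit request to an explicit command if an appliance fits."""
    rule = IMPLICIT_RULES.get(text.strip().lower())
    if rule is None:
        return text  # nothing to infer; pass the text through
    required_function, template = rule
    for appliance in appliances:
        if required_function in appliance.functions:
            return template.format(name=appliance.name)
    return text  # no listed appliance can act on this intent


home = [Appliance("air conditioner", ("power", "set_temperature"))]
print(infer_intent("It's too hot", home))
# prints: Turn down the air conditioner temperature
```

Returning the input unchanged when no appliance qualifies keeps the stage safe to chain: a later consumer still receives the user's own words rather than a command for a device that does not exist.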

[0079] Assuming that the N rewriting dimensions include only the first, second, and third dimensions, the last rewriting dimension is the third dimension, so the third rewritten text corresponding to the third dimension can be used as the initial text instruction input to the "fast model".

[0080] For example, taking the user's current input text "make it warmer" as an example, the first rewriting submodule first completes the context by combining multiple rounds of historical interaction records, resulting in the first rewritten text "adjust the color temperature of the living room ceiling light to warmer"; then, the second rewriting submodule personalizes the text based on the user-defined device name in the preset database, replacing the user-defined name "living room ceiling light" with "living room main light", resulting in the second rewritten text "adjust the color temperature of the living room main light to warmer"; finally, the third rewriting submodule performs intent parsing and reasoning on the above text by combining the home equipment list, clarifying the device functions and control command formats, resulting in the initial text command that can be directly input into the fast model for execution: "control the living room main light to adjust the color temperature to warm light mode".

[0081] The above method first supplements the contextual and referential information in the text information based on multiple rounds of historical interaction records to obtain the first rewritten text, thereby solving the problems of ambiguous instructions and missing information. Then, the first rewritten text is personalized based on the user's personalized data, so that the resulting second rewritten text combines the user's usage habits and preferences. Finally, the second rewritten text is accurately interpreted and inferred based on the list of home devices. This process gradually improves the text information from ambiguous and fragmented to semantically complete, personalized, and clearly defined instructions, greatly improving the understanding accuracy and execution reliability of smart home interaction. At the same time, the hierarchical processing method also ensures the efficiency and stability of the rewriting process.

[0082] Furthermore, passing text information through the N sequentially connected rewriting sub-modules of the multi-level rewriting model, as described above, adds processing time. Statistical analysis shows that approximately 80% of user input commands in smart home scenarios are simple commands that do not require rewriting, so sending such commands through every rewriting sub-module may introduce unnecessary delay.

[0083] Based on this, in order to ensure the accuracy of intent understanding while fully meeting the requirements of smart home device control for low latency and high response speed, the embodiments of this application can detect whether the text to be rewritten needs to be rewritten before performing the corresponding rewriting processing according to each rewriting dimension, and selectively perform the corresponding rewriting processing according to the detection results, thereby effectively reducing redundant processing in scenarios where rewriting is not required, and further improving the overall instruction processing efficiency.

[0084] In some embodiments, rewriting the content to be rewritten under the i-th rewriting dimension according to the i-th rewriting dimension to obtain the rewritten text corresponding to the i-th rewriting dimension includes: detecting the content to be rewritten under the i-th rewriting dimension to obtain a detection result; wherein, the detection result is used to characterize whether to perform rewriting processing on the content to be rewritten under the i-th rewriting dimension; and rewriting the content to be rewritten under the i-th rewriting dimension according to the detection result to obtain the rewritten text corresponding to the i-th rewriting dimension.

[0085] For example, as mentioned above, when i=1, the content to be rewritten under the i-th rewriting dimension refers to text information; when i>1, the content to be rewritten under the i-th rewriting dimension refers to the rewritten text corresponding to the (i-1)-th rewriting dimension.

[0086] Assuming that the N rewriting dimensions include the first dimension, the second dimension, and the third dimension, detection can be performed separately on the content to be rewritten under the first dimension (i.e., the text information), under the second dimension (i.e., the first rewritten text), and under the third dimension (i.e., the second rewritten text).

[0087] The detection result characterizes whether rewriting needs to be performed on the content to be rewritten under the i-th rewriting dimension, and typically takes one of two values: rewriting is required, or rewriting is not required.

[0088] Understandably, content that does not require rewriting typically falls into two categories: one is where rewriting cannot be performed, meaning there is a lack of corresponding rewriting basis, such as the absence of multi-round historical interaction records, the absence of user habit information or custom device names in the preset database, or the absence of a valid list of home devices, making it impossible to perform information completion, personalized adaptation, or intent reasoning based on the corresponding rewriting dimension; the other is where rewriting is unnecessary, meaning the content itself is semantically clear and the instructions are standardized, already meeting the rewriting requirements under the current rewriting dimension, and no further rewriting optimization is needed.

[0089] Based on this, the embodiments of this application can set rewriting execution conditions and rewriting completion conditions for each rewriting dimension for the above two types of situations, and for each rewriting dimension, the home appliance control system can determine whether the content to be rewritten under that rewriting dimension meets the rewriting execution conditions and rewriting completion conditions.

[0090] In some embodiments, the content to be rewritten under the i-th rewriting dimension is detected to obtain a detection result, including: determining whether the rewriting execution condition corresponding to the i-th rewriting dimension is currently met, and determining whether the content to be rewritten under the i-th rewriting dimension has met the rewriting completion condition; if it is determined that the rewriting execution condition is not met or the rewriting completion condition is met, a first detection result is output; wherein, the first detection result is used to indicate that no rewriting processing is required for the content to be rewritten; if it is determined that the rewriting execution condition is met but the rewriting completion condition is not met, a second detection result is output; wherein, the second detection result is used to indicate that rewriting processing is required for the content to be rewritten.
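The decision rule in [0090] depends only on two booleans, so it can be written as a small function. The enum and function names below are hypothetical labels for the first and second detection results.

```python
from enum import Enum


class Detection(Enum):
    NO_REWRITE = "first detection result"  # no rewriting required
    REWRITE = "second detection result"    # rewriting required


def detect(execution_condition_met: bool,
           completion_condition_met: bool) -> Detection:
    """Decide whether the content under one rewriting dimension is rewritten."""
    # No basis for rewriting, or the content already meets the requirement.
    if not execution_condition_met or completion_condition_met:
        return Detection.NO_REWRITE
    # A basis exists and the content still falls short of the requirement.
    return Detection.REWRITE


for execution in (False, True):
    for completion in (False, True):
        print(execution, completion, detect(execution, completion).name)
```

Only the combination of a met execution condition and an unmet completion condition leads to rewriting; the other three combinations let the content pass through unchanged.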

[0091] The aforementioned rewrite execution condition refers to the information that must be available before rewriting can be performed under a given rewriting dimension, that is, the basis for rewriting under that dimension. For example, the rewrite execution condition under the first dimension can be that multiple rounds of historical interaction records have been obtained; under the second dimension, that user personalized information exists in the preset database; and under the third dimension, that a list of home devices has been obtained.

[0092] The above-mentioned rewriting completion conditions refer to the fact that the content to be rewritten has met the rewriting requirements under the current rewriting dimension, and no further rewriting processing is required for the current rewriting dimension. For example, the rewriting completion conditions under the first dimension may be that the content to be rewritten is semantically clear and has no missing or ambiguous information; the rewriting completion conditions under the second dimension may be that the content to be rewritten conforms to the standard instruction format; and the rewriting completion conditions under the third dimension may be that the intent of the content to be rewritten is clear and no intent parsing and reasoning are required.

[0093] It is understandable that if the rewrite execution condition corresponding to the i-th rewrite dimension is met, it means that there is a basis for rewriting under that rewrite dimension, and it is determined that the content to be rewritten under that rewrite dimension can be rewritten. If the rewrite execution condition corresponding to the i-th rewrite dimension is not met, it means that there is no basis for rewriting under that rewrite dimension, and it is not possible to rewrite the content to be rewritten under that rewrite dimension. In this case, the first detection result can be output to represent that no rewrite processing is required for the content to be rewritten.

[0094] If the content to be rewritten in the i-th rewriting dimension has met the rewriting completion condition, it means that the content to be rewritten has met the rewriting requirements in this rewriting dimension, and no additional rewriting processing is required for the content to be rewritten in this rewriting dimension. In this case, a first detection result can be output to represent that no rewriting processing is required for the content to be rewritten. If the content to be rewritten in the i-th rewriting dimension has not met the rewriting completion condition, it means that the content to be rewritten has not met the rewriting requirements in this rewriting dimension. In this case, rewriting processing can be performed on the content to be rewritten in this rewriting dimension so that the rewritten text obtained after rewriting meets the rewriting requirements.

[0095] As can be seen from the above, if there is no basis for rewriting under this rewriting dimension or no rewriting processing is required for the content to be rewritten, that is, if it is determined that the rewriting execution conditions are not met or the rewriting completion conditions are met, a first detection result can be output to represent that no rewriting processing is required for the content to be rewritten.

[0096] If there is a basis for rewriting under this rewriting dimension and it is determined that the content to be rewritten needs to be rewritten, that is, if it is determined that the rewriting execution condition is met but the rewriting completion condition is not met, a second detection result can be output to characterize that the content to be rewritten needs to be rewritten.

[0097] For example, suppose the rewrite execution condition under the first dimension is that multiple rounds of historical interaction records have been obtained, and the rewrite completion condition under the first dimension is that the content to be rewritten is semantically clear, with no missing or ambiguous information. If the content to be rewritten under the first dimension is "warm up a bit", it clearly lacks information, so it does not meet the rewrite completion condition under the first dimension. If multiple rounds of historical interaction records can currently be obtained, the rewrite execution condition under the first dimension is met. Since the execution condition is met but the completion condition is not, a second detection result is output, indicating that the content to be rewritten needs to be rewritten.

[0098] If the content to be rewritten in the first dimension is "turn the main light in the living room a little warmer", it can be seen that there is no missing content, that is, the content to be rewritten has met the rewriting completion condition in the first dimension, and then the first detection result can be output to represent that no rewriting processing needs to be performed on the content to be rewritten.

[0099] If the content to be rewritten in the first dimension is "warm up a bit", it clearly lacks information and therefore does not meet the rewriting completion condition in the first dimension; however, if multiple rounds of historical interaction records cannot currently be obtained, the rewriting execution condition in the first dimension is not met, and the first detection result is output, indicating that no rewriting needs to be performed on the content to be rewritten.
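The three cases in [0097] to [0099] can be reproduced with concrete, deliberately simplified conditions for the first dimension. The sketch assumes that the execution condition is "at least one earlier round of interaction is available" and uses "the text names a known device" as a crude proxy for semantic completeness; both proxies, the device vocabulary, and the function name are hypothetical.

```python
from typing import Sequence

# Hypothetical device vocabulary used as a completeness proxy.
KNOWN_DEVICES = ("main light in the living room", "air conditioner")


def first_dimension_needs_rewrite(text: str, history: Sequence[str]) -> bool:
    """Return True when the second detection result (rewrite) applies."""
    execution_met = len(history) > 0  # assumed: any history counts as a basis
    completion_met = any(device in text for device in KNOWN_DEVICES)
    return execution_met and not completion_met


history = ["turn on the main light in the living room"]
print(first_dimension_needs_rewrite("warm up a bit", history))  # [0097]: True
print(first_dimension_needs_rewrite(
    "turn the main light in the living room a little warmer", history))  # [0098]: False
print(first_dimension_needs_rewrite("warm up a bit", []))  # [0099]: False
```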

[0100] The above method can accurately determine whether the text to be rewritten in the current rewriting dimension needs to be rewritten by judging whether the rewriting execution conditions and the rewriting completion conditions are met respectively. When the rewriting execution conditions are not met or the rewriting completion conditions are met, the detection result that no rewriting is required is directly output, avoiding repeated execution and invalid execution during subsequent rewriting processing, effectively improving the accuracy and reliability of instruction processing, and reducing system resource consumption and latency.

[0101] Furthermore, after obtaining the detection results of the content to be rewritten under each of the above rewriting dimensions, the content to be rewritten under each rewriting dimension can be rewritten based on the corresponding detection results to obtain the rewritten text under that rewriting dimension.

[0102] In some embodiments, based on the detection results, the content to be rewritten under the i-th rewriting dimension is rewritten to obtain the rewritten text corresponding to the i-th rewriting dimension, including: for the content to be rewritten under the i-th rewriting dimension, if the detection results indicate that no rewriting processing is required, the content to be rewritten is determined as the rewritten text corresponding to the i-th rewriting dimension; for the content to be rewritten under the i-th rewriting dimension, if the detection results indicate that rewriting processing is required, the content to be rewritten is rewritten according to the rewriting dimension to obtain the rewritten text corresponding to the i-th rewriting dimension.

[0103] It is understandable that when processing the content to be rewritten under the i-th rewriting dimension, if the detection result is that no rewriting is required, the original content to be rewritten is directly used as the rewritten text corresponding to the current rewriting dimension; if the detection result is that rewriting is required, the content to be rewritten is rewritten according to the processing logic corresponding to the current rewriting dimension, and the rewritten content is used as the rewritten text corresponding to the current rewriting dimension.
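Combining the detection step with the sequential scheme gives a pipeline in which each dimension either rewrites its input or passes it through unchanged. In the sketch below each stage is a hypothetical (detector, rewriter) pair of plain functions; in the application both roles are filled by models inside the rewriting sub-module, and `fill_in_device` is a hard-coded stand-in that only reproduces the [0104] example.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stage: (needs_rewrite, rewrite) for one rewriting dimension.
Stage = Tuple[Callable[[str], bool], Callable[[str], str]]


def run_gated_pipeline(text: str, stages: Sequence[Stage]) -> str:
    content = text
    for needs_rewrite, rewrite in stages:
        if needs_rewrite(content):
            content = rewrite(content)  # rewrite under this dimension
        # Otherwise the content itself is this dimension's rewritten text.
    return content


calls: List[str] = []


def fill_in_device(content: str) -> str:
    calls.append(content)  # record which inputs actually reached the rewriter
    return "turn the main light in the living room a little warmer"


context_stage: Stage = (lambda s: "main light" not in s, fill_in_device)

print(run_gated_pipeline("warm up a bit", [context_stage]))
print(run_gated_pipeline(
    "turn the main light in the living room a little warmer", [context_stage]))
print(calls)  # only "warm up a bit" reached the rewriter
```

The `calls` list makes the saving visible: the already complete command never invokes the rewriter, which is the redundant work that [0083] aims to avoid.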

[0104] For example, if the content to be rewritten in the first dimension is "turn the main light in the living room a little warmer", the corresponding detection result indicates that no rewriting is required, and this content is used directly as the first rewritten text corresponding to the first dimension. If the content to be rewritten in the first dimension is "warm up a bit", the corresponding detection result indicates that rewriting is required; the historical interaction records are then obtained and the intent-related information is supplemented based on them, so that the first rewritten text corresponding to the first dimension is "turn the main light in the living room a little warmer".

[0105] In some embodiments, a rewrite detection model can be pre-built and trained, and integrated into the rewrite submodule. This model is used to determine whether the content to be rewritten needs to be rewritten in the corresponding rewrite dimension. During detection, the content to be rewritten, the relevant information required for the current rewrite dimension, and the rewrite execution and completion conditions under the current rewrite dimension can be input into the rewrite detection model, which will then make predictions and output the corresponding detection results.

[0106] For example, the detection result indicates whether rewriting needs to be performed on the content to be rewritten, and its output takes one of two forms: "do not rewrite", indicating that no rewriting is needed for the current content; or "rewrite", indicating that rewriting is needed, in which case the detection result may also include the content to be rewritten itself.

[0107] For example, if the detection result indicates that a rewriting process needs to be performed on the content to be rewritten "warm up a bit", then the detection result can specifically be "Rewrite: warm up a bit"; if the detection result indicates that there is no need to perform a rewriting process on the content to be rewritten, then the detection result can specifically be "do not rewrite".

[0108] In some embodiments, whether rewriting needs to be performed on the content to be rewritten can be determined quickly by monitoring only the first token output by the rewriting detection model.

[0109] Here, the "first token" refers to the first word segment of the detection result. Because the two possible results, "rewrite" and "do not rewrite", begin differently, they can be distinguished from the first word segment output by the rewriting detection model.

[0110] For example, if the first word segment of the detection result output by the rewrite detection model is "not" (the negative, "do not rewrite" result), it is determined that no rewriting process is needed, and the content to be rewritten is used directly as the rewritten text for the current rewriting dimension; if the first word segment is "rewrite", it is determined that a rewriting process is needed, and the content to be rewritten is rewritten according to the current rewriting dimension to obtain the rewritten text corresponding to that dimension.
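
The first-token shortcut of [0108] to [0110] can be sketched as follows. This is a minimal illustration, not the described implementation: the streaming detector is simulated by a hypothetical generator `fake_detector_stream`, the rewriter `resolve` is a hypothetical placeholder, and the token strings "not" and "rewrite" follow the example above (actual tokenization depends on the model). The point is that only the first element of the stream is consumed before the branch is taken.

```python
from typing import Callable, Iterator


def fake_detector_stream(content: str) -> Iterator[str]:
    """Hypothetical stand-in for a streaming rewrite detection model.

    Emits "not ..." when the content already names a device, otherwise
    "rewrite: <content>". A real detector would be a trained model.
    """
    if "light" in content or "air conditioner" in content:
        yield from ["not", " rewrite"]
    else:
        yield from ["rewrite", ":", " ", content]


def gate_on_first_token(
    content: str,
    stream: Callable[[str], Iterator[str]],
    rewrite: Callable[[str], str],
) -> str:
    """Decide from the first token only; the rest of the stream is never read."""
    tokens = stream(content)
    first = next(tokens, "not").strip().lower()  # empty stream: treat as "not"
    if first == "not":
        return content            # use the content as this dimension's rewritten text
    if first == "rewrite":
        return rewrite(content)   # run this dimension's rewriting logic
    raise ValueError(f"unexpected first token: {first!r}")


def resolve(text: str) -> str:
    """Hypothetical context-dimension rewriter used only for this example."""
    return "turn the living room main light warmer"


print(gate_on_first_token("warm up a bit", fake_detector_stream, resolve))
# → turn the living room main light warmer
print(gate_on_first_token("turn off the bedroom light", fake_detector_stream, resolve))
# → turn off the bedroom light
```

Because the branch is taken as soon as the first token arrives, the latency of the pass-through case is bounded by the time to the first token rather than by the length of the full detection result.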

[0111] The above method first detects, in each rewriting dimension, whether the content to be rewritten actually needs rewriting, and thereby decides between performing the rewriting process and passing the original content through. When no rewriting is needed, the content to be rewritten is used directly as the rewritten text for the current dimension, which avoids unnecessary rewriting computation and resource overhead; when rewriting is needed, the rewriting process for that dimension is performed. This preserves the accuracy of instruction processing in each rewriting dimension while reducing invalid operations and system latency. In addition, because the decision can be made from the first token alone, without waiting for the model to output its complete result, instruction processing latency is shortened and the response speed of smart home control is improved.

[0112] Furthermore, as described above, the text information undergoes multi-level rewriting through the N rewriting dimensions in sequence. Each rewriting dimension processes the result produced by the previous dimension, and after all dimensions have been processed, the rewritten text obtained from the last rewriting dimension is used as the initial text instruction.
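
The sequential cascade of [0112] (see also [0151]) behaves like a fold over the N dimensions: the output of dimension i-1 is the input of dimension i, a dimension that does not need to rewrite passes its input through, and the output of dimension N is the initial text instruction. The sketch below assumes a hypothetical `Dimension` record pairing a detection predicate with a rewriting function; the two example dimensions use toy string rules in place of the models described in the text.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Dimension:
    """One rewriting dimension (hypothetical structure for illustration)."""
    name: str
    needs_rewrite: Callable[[str], bool]  # detection step
    rewrite: Callable[[str], str]         # rewriting step


def cascade(text: str, dimensions: List[Dimension]) -> str:
    """Apply dimensions 1..N in order; dimension i reads dimension i-1's output."""
    if not dimensions:
        raise ValueError("N must be at least 1")
    content = text                          # i = 1: the content is the text information
    for dim in dimensions:
        if dim.needs_rewrite(content):
            content = dim.rewrite(content)
        # otherwise the content passes through unchanged as this dimension's output
    return content                          # output of dimension N: initial text instruction


# Two toy dimensions with hypothetical rules
DIMS = [
    Dimension("context",
              lambda t: re.search(r"\bit\b", t) is not None,
              lambda t: re.sub(r"\bit\b", "the main light", t, count=1)),
    Dimension("memory",
              lambda t: "main light" in t and "living room" not in t,
              lambda t: t.replace("the main light", "the living room main light", 1)),
]
print(cascade("turn it warmer", DIMS))  # → turn the living room main light warmer
```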

[0113] Another possible implementation is to predict in advance which dimension of the text information needs to be rewritten, so that the text information can be rewritten only through the corresponding dimension rewriting submodule, effectively reducing unnecessary resource consumption.

[0114] In some embodiments, rewriting text information according to at least one rewriting dimension to obtain initial text instructions includes: analyzing the text information, determining the target rewriting dimension of the text information to be rewritten from N rewriting dimensions, and rewriting the text information according to the target rewriting dimension to obtain initial text instructions.

[0115] Specifically, a pre-built lightweight routing classifier can analyze the text information for its semantic features, contextual dependencies, ambiguity, and missing personalized information, and determine from the N rewriting dimensions the target rewriting dimension in which the current text information actually needs enhancement.

[0116] The routing classifier can selectively invoke the rewriting sub-module corresponding to the target rewriting dimension based solely on the actual needs of the text information. Then, through its corresponding rewriting sub-module, the text information is rewritten in a targeted manner under the target rewriting dimension, ultimately resulting in an initial text instruction that is semantically complete and formatted correctly.

[0117] For example, assume the text information is "turn it warmer". Analyzing it with the pre-built lightweight routing classifier shows that the context is unclear and the control object is not specified, so the context understanding dimension is selected from the N rewriting dimensions as the target rewriting dimension. Only the rewriting submodule corresponding to the context understanding dimension is then invoked, and it rewrites the text information in that dimension, producing the initial text instruction "turn the living room main light warmer".
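
The routing alternative of [0113] to [0117] can be sketched with a simple rule-based router. The cue-word sets, the dimension names, and the placeholder rewriters are hypothetical; they stand in for the trained "lightweight routing classifier", whose features the text does not specify. What the sketch does show is the control flow: only the rewriting submodules of the selected dimensions are called, still in cascade order.

```python
import re
from typing import Callable, Dict, List

# Hypothetical cue words; a real router would be a trained lightweight classifier.
PRONOUNS = {"it", "this", "that", "them"}
DEVICE_WORDS = {"light", "lamp", "conditioner", "curtain", "fan"}
DIMENSION_ORDER = ["context", "memory", "reasoning"]  # the N dimensions, in cascade order


def route(text: str) -> List[str]:
    """Select the target rewriting dimension(s) for the text information."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    targets = []
    if words & PRONOUNS:
        targets.append("context")      # unresolved reference: needs dialogue history
    elif not (words & DEVICE_WORDS):
        targets.append("reasoning")    # no device named: infer it from the device list
    return targets


def rewrite_with_routing(text: str, rewriters: Dict[str, Callable[[str], str]]) -> str:
    """Invoke only the rewriting submodules of the selected dimensions."""
    targets = route(text)
    for name in DIMENSION_ORDER:
        if name in targets:
            text = rewriters[name](text)
    return text


# Hypothetical placeholder rewriters for the example
REWRITERS = {
    "context": lambda t: re.sub(r"\bit\b", "the living room main light", t, count=1),
    "memory": lambda t: t,
    "reasoning": lambda t: t,
}
print(route("turn it warmer"))                         # → ['context']
print(rewrite_with_routing("turn it warmer", REWRITERS))
# → turn the living room main light warmer
```

A text that already names a device and contains no pronoun, such as "turn off the bedroom light", is routed to no dimension at all and passes through untouched, which is the resource saving [0118] describes.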

[0118] The above method first analyzes the text information and accurately determines the target rewriting dimension to be processed from N rewriting dimensions. Then, it performs targeted rewriting processing on the text information based on the target rewriting dimension. This avoids redundant calculations caused by rewriting the text information in all rewriting dimensions, thereby reducing unnecessary resource consumption. It can also significantly reduce instruction processing latency while ensuring the text information is complete, standardized, and adapted.

[0119] In step 103, the initial text instruction obtained from the above multi-level rewriting process is still a natural-language text instruction, so it cannot be directly recognized and executed by the control system of the smart home device. Therefore, the initial text instruction can be input into the retained "fast model", which further standardizes it into an executable target text instruction in a preset format.

[0120] For example, the above-mentioned "fast model" can typically include a basic rewriting submodule, a domain classification submodule, and a rejection submodule. These three submodules can be used to perform normalization processing on the initial input text instructions in different dimensions to obtain the target text instructions.

[0121] In some embodiments, the initial text instruction is normalized to obtain the target text instruction, including: performing format standardization processing, domain classification processing, and invalid instruction filtering processing on the initial text instruction to obtain the target text instruction.

[0122] Specifically, the above-mentioned format standardization process can be performed by the basic rewriting submodule in the "fast model". It regularizes the sentence structure, unifies the expression, and standardizes the instruction format of the initial text instruction so that it conforms to the standard instruction format that the control system of the smart home device can recognize.

[0123] The aforementioned domain classification process can be implemented through the domain classification submodule in the "fast model". Specifically, it determines the smart home control domain to which the initial text instruction belongs based on its semantic content, providing a basis for subsequent device matching and instruction execution.

[0124] The above-mentioned invalid instruction filtering process can be implemented through the rejection submodule in the "fast model". Specifically, it performs a validity judgment on the initial text instruction and filters out meaningless, unrecognizable or invalid instructions that do not meet the execution conditions, thereby avoiding the erroneous execution of invalid instructions.

[0125] Specifically, the initial text instruction can first undergo invalid instruction filtering: the rejection submodule in the "fast model" examines the initial text instruction and determines whether it is an unrecognizable invalid instruction. If it is, a "reject" result is output, no device operation is performed, and the instruction is not processed further. If it is a recognizable valid instruction, a "pass" result is output, and the instruction then undergoes format standardization and domain classification.

[0126] For example, assume the initial text instruction is "Turn the living room main light warmer". The rejection submodule in the "fast model" first performs invalid instruction filtering on it. If the result is "pass", the basic rewriting submodule and the domain classification submodule perform format standardization and domain classification on the initial text instruction, yielding the target text instruction "Control the color temperature of the living room main light to warm light mode" with the domain "lighting domain".

[0127] If instead the initial text instruction is "The weather is nice today", the rejection submodule in the "fast model" performs invalid instruction filtering on it and outputs a "reject" result; the processing flow then ends directly and no further processing is performed.
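
The processing order of [0125] to [0127] (reject first, then standardize and classify only on "pass") can be sketched as below. The keyword table, the single standardization template, and the `TargetInstruction` type are hypothetical simplifications; in the described system each step is a submodule of the "fast model" rather than a lookup rule. The two calls reproduce the two examples above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword table standing in for the domain classification submodule.
DOMAIN_KEYWORDS = {
    "lighting": ("light", "lamp"),
    "air conditioning": ("air conditioner", "temperature"),
}


@dataclass
class TargetInstruction:
    text: str
    domain: str


def classify_domain(initial: str) -> Optional[str]:
    """Domain classification: first domain whose keyword appears in the text."""
    lowered = initial.lower()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            return domain
    return None


def is_rejected(initial: str) -> bool:
    """Invalid instruction filtering (toy rule): reject if no control domain matches."""
    return classify_domain(initial) is None


def standardize(initial: str) -> str:
    """Format standardization for the single 'turn the X warmer' pattern (toy rule)."""
    lowered = initial.lower().strip()
    prefix, suffix = "turn the ", " warmer"
    if lowered.startswith(prefix) and lowered.endswith(suffix):
        device = lowered[len(prefix):-len(suffix)]
        return f"Control the color temperature of the {device} to warm light mode"
    return initial.strip()


def fast_model(initial: str) -> Optional[TargetInstruction]:
    """Reject first; only a 'pass' result goes on to standardization and classification."""
    if is_rejected(initial):
        return None  # "reject": the flow ends and no device operation is performed
    return TargetInstruction(standardize(initial), classify_domain(initial))


print(fast_model("Turn the living room main light warmer"))
# → TargetInstruction(text='Control the color temperature of the living room main light to warm light mode', domain='lighting')
print(fast_model("The weather is nice today"))  # → None
```

Running the rejection check first means that chit-chat never reaches the (more expensive) standardization step, which matches the early exit described in [0127].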

[0128] The above method, through format standardization, domain classification, and invalid instruction filtering of initial text instructions, can standardize natural language instructions with diverse expressions and inconsistent formats into uniform and executable standard instructions. At the same time, it can accurately identify the control domain to which the instruction belongs, filter out invalid, meaningless, or unrecognizable instructions, improve the standardization and accuracy of the instructions, reduce the parsing difficulty of subsequent execution modules, facilitate precise control of home devices, and improve the execution efficiency of the smart home control system.

[0129] In step 104, the aforementioned target home appliance refers to the home appliance that is indicated by the target control command and requires the execution of control operations.

[0130] Specifically, after receiving the aforementioned target control instructions, the target home appliance can be identified first, and then the target home appliance can be controlled based on the target control instructions.

[0131] For example, assuming the target text instruction is "Control the living room main light to set the color temperature to warm light mode", it can be determined that the target home appliance to be controlled by the target control instruction is "living room main light", and then the color temperature of the target home appliance, i.e., the living room main light, can be set to warm light mode.

[0132] Furthermore, since the target control command is a structured command at the semantic level, it cannot directly match the underlying communication protocols and driver interfaces of various smart home devices. Therefore, after obtaining the target control command, further processing such as command verification, parameter parsing and protocol encapsulation can be performed on the target control command to convert it into a control signal that can be directly recognized and driven by the command execution module in the control system of the smart home device, and control of the target home device can be achieved through the control signal.
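
[0132] names three post-processing steps (command verification, parameter parsing, protocol encapsulation) without fixing a wire format. The sketch below illustrates them under loudly stated assumptions: the device registry, the device id `light-001`, the attribute names, and the compact JSON control frame are all hypothetical, and real smart home protocols and driver interfaces differ from device to device. The parsing pattern matches the target text instruction format used in the example of [0126].

```python
import json
import re
from typing import Dict, Tuple

# Hypothetical registry: device name -> device id and supported attribute values.
DEVICE_REGISTRY: Dict[str, dict] = {
    "living room main light": {
        "id": "light-001",
        "attributes": {
            "color_temperature": {"warm", "neutral", "cool"},
            "power": {"on", "off"},
        },
    },
}

# Matches the target text instruction format used in the example of [0126].
PATTERN = re.compile(
    r"^Control the color temperature of the (?P<device>.+) to (?P<mode>\w+) light mode$"
)


def parse(instruction: str) -> Tuple[str, str, str]:
    """Parameter parsing: extract (device, attribute, value) from the instruction."""
    match = PATTERN.match(instruction)
    if match is None:
        raise ValueError(f"unrecognized instruction: {instruction!r}")
    return match["device"], "color_temperature", match["mode"]


def encapsulate(device: str, attribute: str, value: str) -> bytes:
    """Command verification, then protocol encapsulation as a hypothetical JSON frame."""
    entry = DEVICE_REGISTRY.get(device)
    if entry is None:
        raise ValueError(f"unknown device: {device}")
    allowed = entry["attributes"].get(attribute)
    if allowed is None or value not in allowed:
        raise ValueError(f"unsupported {attribute}={value} for {device}")
    frame = {"device_id": entry["id"], "set": {attribute: value}}
    return json.dumps(frame, separators=(",", ":")).encode("utf-8")


frame = encapsulate(*parse(
    "Control the color temperature of the living room main light to warm light mode"))
print(frame)  # → b'{"device_id":"light-001","set":{"color_temperature":"warm"}}'
```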

[0133] Figure 2 is a schematic diagram of the control process of a home appliance control system provided in an embodiment of this application.

[0134] For example, as shown in Figure 2, the home appliance control system (also known as the "smart home appliance control system") may include an input processing module 10, an intelligent enhancement cascade unit 20, a fast model unit 30, a decision and streaming controller 40, and an instruction execution module 50.

[0135] The input processing module 10 is used to receive the voice signal input by the user, convert the voice signal into text information Q0 through ASR technology, and then input the converted text information into the intelligent enhancement cascade unit 20.

[0136] The intelligent enhancement cascade unit 20, also known as a "multi-level rewriting model," may include: a multi-round capability rewriting submodule 2001, a memory capability rewriting submodule 2002, and a reasoning enhancement rewriting submodule 2003. These submodules can be connected in series.

[0137] Specifically, the inputs of the multi-round capability rewriting submodule 2001 can be the text information and the historical N-turn dialogue (i.e., the "historical interaction record" mentioned above). Based on the historical N-turn dialogue, the submodule completes the intent-related information corresponding to the text information and outputs the corresponding first rewritten text Q1, which is passed to the memory capability rewriting submodule 2002.

[0138] The inputs of the memory capability rewriting submodule 2002 can be the first rewritten text Q1 output by the multi-round capability rewriting submodule 2001 and the user profile / habit memory bank (i.e., the "preset database" mentioned above). Based on the user profile / habit memory bank, the submodule personalizes the first rewritten text Q1 and outputs the corresponding second rewritten text Q2, which is passed to the reasoning enhancement rewriting submodule 2003.

[0139] The inputs of the reasoning enhancement rewriting submodule 2003 can be the second rewritten text Q2 and the full list of home devices (i.e., the "home device list" mentioned above). Based on the home device list, the submodule performs intent parsing and reasoning on the second rewritten text Q2 and outputs the corresponding third rewritten text Q3, which can be passed directly to the fast model unit 30 as the initial text instruction.
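
The data flow Q0 → Q1 → Q2 → Q3 through submodules 2001, 2002, and 2003 can be illustrated with toy stand-ins for each submodule's extra input: the dialogue history, the user profile / habit memory bank, and the home device list with per-device functions. All data and matching rules below are hypothetical; in the described system each submodule is a model rather than a string rule. The sketch only shows which input each stage consumes and how its output feeds the next stage.

```python
import re
from typing import Dict, List

# Hypothetical inputs for each submodule.
HISTORY = ["turn on the main light"]                  # historical interaction record
PROFILE = {"main light": "living room main light"}    # user profile / habit memory bank
DEVICES = {                                           # home device list with functions
    "living room main light": ["power", "color_temperature"],
    "bedroom air conditioner": ["power", "temperature"],
}
FUNCTION_HINTS = {"warmer": "color_temperature"}      # cue word -> required function


def multi_round_rewrite(q0: str, history: List[str]) -> str:
    """Submodule 2001 stand-in: resolve 'it' from the latest 'turn on/off the X' turn."""
    if not re.search(r"\bit\b", q0):
        return q0
    for turn in reversed(history):
        m = re.search(r"turn (?:on|off) the (.+)$", turn)
        if m:
            return re.sub(r"\bit\b", f"the {m.group(1)}", q0, count=1)
    return q0


def memory_rewrite(q1: str, profile: Dict[str, str]) -> str:
    """Submodule 2002 stand-in: expand the user's habitual short names."""
    for alias, full_name in profile.items():
        if alias in q1 and full_name not in q1:
            return q1.replace(alias, full_name, 1)
    return q1


def reasoning_rewrite(q2: str, devices: Dict[str, List[str]]) -> str:
    """Submodule 2003 stand-in: infer the device from the function it must support."""
    if any(name in q2 for name in devices):
        return q2                                     # target device already explicit
    for cue, function in FUNCTION_HINTS.items():
        if cue in q2:
            candidates = [n for n, fns in devices.items() if function in fns]
            if len(candidates) == 1:                  # only infer when unambiguous
                return f"turn the {candidates[0]} {cue}"
    return q2


def enhancement_cascade(q0: str, history: List[str], profile: Dict[str, str],
                        devices: Dict[str, List[str]]) -> str:
    q1 = multi_round_rewrite(q0, history)
    q2 = memory_rewrite(q1, profile)
    return reasoning_rewrite(q2, devices)             # Q3: initial text instruction


print(enhancement_cascade("turn it warmer", HISTORY, PROFILE, DEVICES))
# → turn the living room main light warmer
print(enhancement_cascade("make it warmer", [], PROFILE, DEVICES))
# → turn the living room main light warmer
```

In the second call there is no history to resolve "it", so stages 2001 and 2002 pass the text through and stage 2003 infers the device from the only entry in the device list that supports color temperature.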

[0140] The input of the fast model unit 30 can be the final output of the intelligent enhancement cascade unit 20, i.e., the initial text instruction. Its output can be the target text instruction after normalizing the initial text instruction. The fast model unit 30 can include: a basic rewriting submodule 3001, a domain classification submodule 3002, and a rejection submodule 3003.

[0141] Specifically, the basic rewriting submodule 3001 is used to standardize the instructions finally output by the intelligent enhancement cascade unit 20: it regularizes the sentence structure, unifies the expression, and standardizes the instruction format of the initial text instruction so that it conforms to the standard instruction form that the control system of the smart home device can recognize.

[0142] The domain classification submodule 3002 is used to identify and classify the smart home control domain of the instructions finally output by the intelligent enhancement cascade unit 20. Specifically, it determines, based on the semantic content of the initial text instruction, the smart home control domain to which the instruction belongs, providing a basis for subsequent device matching and instruction execution.

[0143] The rejection submodule 3003 is used to filter the non-human-computer interaction content contained in the instructions finally output by the intelligent enhancement cascade unit 20. Specifically, it performs validity judgment on the initial text instructions, filters out meaningless, unrecognizable or invalid instructions that do not meet the conditions for execution, and avoids the erroneous execution of invalid instructions.

[0144] The decision and streaming controller 40 can be connected to the multi-round capability rewriting submodule 2001, the memory capability rewriting submodule 2002, and the reasoning enhancement rewriting submodule 2003, respectively. Specifically, it is used to determine whether the content to be rewritten input to each rewriting submodule needs to be rewritten and output the corresponding detection results. At the same time, it can determine the output content of each rewriting submodule by monitoring the first token of the detection results.

[0145] Taking the multi-round capability rewriting submodule 2001 as an example: if the first token of the detection result for the input text information is "not", the first rewritten text Q1=Q0 is output; if the first token is "rewrite", the multi-round capability rewriting submodule 2001 executes its rewriting logic and outputs the first rewritten text Q1 obtained by rewriting the text information.

[0146] The instruction execution module 50 is used to perform the corresponding device control based on the target text instruction finally output by the fast model unit 30. Its specific control method is as described above and is not repeated here.

[0147] Furthermore, the aforementioned system can adopt a cloud-device collaborative deployment architecture. The intelligent enhancement cascade unit 20 and the decision and streaming controller 40 can be deployed on a cloud server, using the cloud's computing power and data resources to perform complex intelligent rewriting, cascade processing, and decision control; the input processing module 10, the fast model unit 30, and the instruction execution module 50 can be deployed on a home gateway or on the device side to handle voice signal processing, fast instruction verification, and local instruction execution. This improves system response speed and execution reliability while retaining the capability for complex semantic understanding.

[0148] In the above embodiments, a multi-level rewriting model is added to the existing smart home voice assistant interaction architecture (i.e., the architecture that includes the "fast model"). The added module only outputs rewritten text, which the downstream execution system handles exactly like an ordinary command, so the original business execution logic does not need to be rebuilt and compatibility is preserved. The system first completes the contextual and referential information in the text information based on multi-round historical interaction records to obtain the first rewritten text, resolving ambiguous commands and missing information. It then personalizes the first rewritten text based on the user's personalized data, so that the resulting second rewritten text reflects the user's usage habits and preferences. Finally, it parses and reasons about the second rewritten text based on the home device list. In this way, the text information is refined step by step from an ambiguous, fragmented utterance into a semantically complete, personalized, and clearly defined command, improving the naturalness and understanding accuracy of smart home interaction, while the hierarchical processing keeps the rewriting process efficient and stable.

[0149] Figure 3 is a schematic structural diagram of a control device for a home appliance provided in an embodiment of this application.

[0150] As shown in Figure 3, the device 300 includes: a conversion module 301, used to receive the input voice signal and convert the voice signal into text information; a rewriting module 302, used to rewrite the text information according to N rewriting dimensions to obtain an initial text instruction, where N is an integer greater than or equal to 1; a processing module 303, used to standardize the initial text instruction to obtain a target text instruction, where the target text instruction is an executable instruction in a preset format; and a control module 304, used to control the target home appliance based on the target text instruction.

[0151] In one possible implementation, N is an integer greater than 1, and the rewriting module is specifically used to: rewrite, according to the i-th rewriting dimension, the content to be rewritten under the i-th rewriting dimension to obtain the rewritten text corresponding to the i-th rewriting dimension; wherein, when i=1, the content to be rewritten under the i-th rewriting dimension is the text information; when i>1, the content to be rewritten under the i-th rewriting dimension is the rewritten text corresponding to the (i-1)-th rewriting dimension; i is an integer greater than or equal to 1 and less than or equal to N; and when i=N, the rewritten text corresponding to the N-th rewriting dimension is used as the initial text instruction.

[0152] In one possible implementation, the rewriting module includes a rewriting unit, which is specifically used to: detect the content to be rewritten under the i-th rewriting dimension and obtain a detection result; wherein the detection result is used to characterize whether to perform rewriting processing on the content to be rewritten under the i-th rewriting dimension; and rewrite the content to be rewritten under the i-th rewriting dimension according to the detection result to obtain the rewritten text corresponding to the i-th rewriting dimension.

[0153] In one possible implementation, the rewriting unit includes a detection subunit, which is specifically used to: determine whether the rewriting execution condition corresponding to the i-th rewriting dimension is currently met, and determine whether the content to be rewritten under the i-th rewriting dimension has met the rewriting completion condition; if it is determined that the rewriting execution condition is not met or the rewriting completion condition is met, output a first detection result; wherein the first detection result is used to indicate that no rewriting processing is required for the content to be rewritten; if it is determined that the rewriting execution condition is met but the rewriting completion condition is not met, output a second detection result; wherein the second detection result is used to indicate that rewriting processing is required for the content to be rewritten.
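
The decision rule of [0153] is a two-condition check: output the second detection result ("rewrite") only when the rewrite execution condition holds and the rewrite completion condition does not; otherwise output the first detection result ("do not rewrite"). The text does not define the conditions concretely, so the predicates below for the multi-round context dimension (history available; no unresolved "it" left) are hypothetical examples.

```python
import re
from enum import Enum
from typing import Callable, List, Tuple


class Detection(Enum):
    NO_REWRITE = "do not rewrite"  # first detection result
    REWRITE = "rewrite"            # second detection result


def detect(content: str,
           execution_condition: Callable[[str], bool],
           completion_condition: Callable[[str], bool]) -> Detection:
    """Rewrite only if the execution condition holds and completion is not yet reached."""
    if not execution_condition(content) or completion_condition(content):
        return Detection.NO_REWRITE
    return Detection.REWRITE


def context_conditions(history: List[str]) -> Tuple[Callable[[str], bool], Callable[[str], bool]]:
    """Hypothetical conditions for the first (multi-round context) dimension."""
    def can_execute(_content: str) -> bool:
        return len(history) > 0                        # dialogue history is available

    def is_complete(content: str) -> bool:
        return re.search(r"\bit\b", content) is None   # no unresolved "it" remains

    return can_execute, is_complete


exe, done = context_conditions(["turn on the main light"])
print(detect("turn it warmer", exe, done))              # → Detection.REWRITE
print(detect("turn the main light warmer", exe, done))  # → Detection.NO_REWRITE
```

With an empty history the execution condition fails, so even "turn it warmer" yields the first detection result: the dimension cannot help, and the content is passed on unchanged.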

[0154] In one possible implementation, the rewriting unit includes a rewriting subunit, which is specifically used to rewrite the content to be rewritten under the i-th rewriting dimension according to the detection result to obtain the rewritten text corresponding to the i-th rewriting dimension, as follows: if the detection result indicates that no rewriting processing is required for the content to be rewritten, the content to be rewritten is determined as the rewritten text corresponding to the i-th rewriting dimension; if the detection result indicates that rewriting processing is required, the content to be rewritten is rewritten according to the i-th rewriting dimension to obtain the rewritten text corresponding to the i-th rewriting dimension.

[0155] In one possible implementation, the rewriting unit is specifically used for: when the i-th rewriting dimension is the first dimension, acquiring historical interaction records, and supplementing the intent-related information corresponding to the text information based on the historical interaction records to obtain the first rewritten text corresponding to the first dimension; wherein, the historical interaction records contain multi-round interaction text information generated between the user and the target home appliance before the current interaction process; when the i-th rewriting dimension is the second dimension, performing personalized rewriting of the first rewritten text based on a preset database to obtain the second rewritten text corresponding to the second dimension; wherein, the preset database is used to store the user's personalized interaction information; when the i-th rewriting dimension is the third dimension, acquiring a list of home appliances, and performing intent parsing and reasoning on the second rewritten text based on the list of home appliances to obtain the third rewritten text corresponding to the third dimension; wherein, the list of home appliances includes multiple home appliances and the functional definition corresponding to each home appliance.

[0156] In one possible implementation, the rewriting module is further specifically used for: analyzing text information, determining the target rewriting dimension of the text information to be rewritten from N rewriting dimensions; and rewriting the text information according to the target rewriting dimension to obtain the initial text instruction.

[0157] In one possible implementation, the processing module is specifically used to: perform format standardization processing, domain classification processing, and invalid instruction filtering processing on the initial text instruction to obtain the target text instruction.

[0158] Figure 4 is a schematic structural diagram of an electronic device provided in an embodiment of this application.

[0159] As shown in Figure 4, the electronic device 400 includes a memory 401 and a processor 402, wherein the memory 401 stores executable program code 4011, and the processor 402 is used to call and execute the executable program code 4011 to perform a control method for a home device.

[0160] Furthermore, embodiments of this application also protect an apparatus that may include a memory and a processor, wherein the memory stores executable program code, and the processor is used to call and execute the executable program code to perform a home appliance control method provided in embodiments of this application.

[0161] This embodiment can divide the device into functional modules based on the above method example. For example, each module can correspond to a separate function, or two or more functions can be integrated into one processing module. The integrated module can be implemented in hardware. It should be noted that the module division in this embodiment is illustrative and only represents one logical functional division. In actual implementation, there may be other division methods.

[0162] When each functional module is divided according to its corresponding function, the device may also include a conversion module, a rewriting module, a processing module, and a control module. It should be noted that all relevant content regarding the steps involved in the above method embodiments can be referenced from the functional descriptions of the corresponding functional modules, and will not be repeated here.

[0163] It should be understood that the device provided in this embodiment is used to execute the above-described method for controlling a home appliance, and therefore can achieve the same effect as the above-described implementation method.

[0164] When using integrated units, the device may include a processing module and a storage module. When applied to an electronic device, the processing module can be used to control and manage the operation of the electronic device. The storage module can be used to support the execution of relevant program code and data by the electronic device.

[0165] The processing module may be a processor or a controller, which can implement or execute the various exemplary logic blocks, modules, and circuits described in connection with the disclosure of this application. The processor may also be a combination that implements computing functions, such as a combination of one or more microprocessors, or a combination of a digital signal processor (DSP) and a microprocessor; the storage module may be a memory.

[0166] In addition, the device provided in the embodiments of this application may specifically be a chip, component or module. The chip may include a connected processor and a memory. The memory is used to store instructions. When the processor calls and executes the instructions, the chip can execute a home device control method provided in the above embodiments.

[0167] This embodiment also provides a computer-readable storage medium storing computer program code. When the computer program code is run on a computer, the computer executes the above-described related method steps to implement a home appliance control method provided in the above embodiment.

[0168] This embodiment also provides a computer program product that, when run on a computer or processor, causes the computer or processor to perform the aforementioned related steps to achieve a home appliance control method provided in the above embodiment.

[0169] In this embodiment, the device, computer-readable storage medium, computer program product, or chip are all used to execute the corresponding methods provided above. Therefore, the beneficial effects they can achieve can be referred to the beneficial effects in the corresponding methods provided above, and will not be repeated here.

[0170] Through the above description of the embodiments, those skilled in the art will understand that, for the sake of convenience and brevity, only the division of the above functional modules is used as an example. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.

[0171] In the embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of modules or units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or other forms.

[0172] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. A method for controlling home appliances, characterized in that the method includes: receiving an input voice signal and converting the voice signal into text information; rewriting the text information according to N rewriting dimensions to obtain an initial text instruction, where N is an integer greater than or equal to 1; normalizing the initial text instruction to obtain a target text instruction, wherein the target text instruction is an executable instruction in a preset format; and controlling a target home appliance based on the target text instruction.

2. The method according to claim 1, characterized in that N is an integer greater than 1, and the step of rewriting the text information according to N rewriting dimensions to obtain the initial text instruction includes: rewriting, according to the i-th rewriting dimension, the content to be rewritten under the i-th rewriting dimension to obtain the rewritten text corresponding to the i-th rewriting dimension; wherein, when i=1, the content to be rewritten under the i-th rewriting dimension is the text information; when i>1, the content to be rewritten under the i-th rewriting dimension is the rewritten text corresponding to the (i-1)-th rewriting dimension; i is an integer greater than or equal to 1 and less than or equal to N; and when i=N, the rewritten text corresponding to the N-th rewriting dimension is used as the initial text instruction.

3. The method according to claim 2, characterized in that the step of rewriting, according to the i-th rewriting dimension, the content to be rewritten under the i-th rewriting dimension to obtain the rewritten text corresponding to the i-th rewriting dimension comprises: performing detection on the content to be rewritten under the i-th rewriting dimension to obtain a detection result, wherein the detection result indicates whether rewriting is to be performed on the content to be rewritten under the i-th rewriting dimension; and rewriting, based on the detection result, the content to be rewritten under the i-th rewriting dimension to obtain the rewritten text corresponding to the i-th rewriting dimension.

4. The method according to claim 3, characterized in that the step of performing detection on the content to be rewritten under the i-th rewriting dimension to obtain the detection result comprises: determining whether a rewrite execution condition corresponding to the i-th rewriting dimension is met, and determining whether the content to be rewritten under the i-th rewriting dimension already meets a rewrite completion condition; if the rewrite execution condition is not met or the rewrite completion condition is already met, outputting a first detection result, wherein the first detection result indicates that no rewriting needs to be performed on the content to be rewritten; and if the rewrite execution condition is met but the rewrite completion condition is not met, outputting a second detection result, wherein the second detection result indicates that the content to be rewritten needs to be rewritten.
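The decision rule of claim 4 reduces to one boolean test: no rewriting when the execution condition fails or the completion condition already holds, and rewriting otherwise. The sketch assumes a hypothetical context-completion dimension whose execution condition is the presence of the pronoun "it" and whose completion condition is that a known device is already named; `DetectionResult`, `detect`, `has_pronoun`, `names_device`, and `KNOWN_DEVICES` are illustrative names, not taken from the application.

```python
from enum import Enum
from typing import Callable


class DetectionResult(Enum):
    NO_REWRITE = "first detection result"      # no rewriting needed
    NEEDS_REWRITE = "second detection result"  # content must be rewritten


def detect(
    content: str,
    execution_condition: Callable[[str], bool],
    completion_condition: Callable[[str], bool],
) -> DetectionResult:
    """Sketch of claim 4's decision rule for one rewriting dimension."""
    if not execution_condition(content) or completion_condition(content):
        return DetectionResult.NO_REWRITE
    return DetectionResult.NEEDS_REWRITE


# Hypothetical conditions for a context-completion dimension.
KNOWN_DEVICES = {"light", "fan", "heater"}


def has_pronoun(text: str) -> bool:
    # Execution condition: the command refers to a device only as "it".
    return "it" in text.lower().split()


def names_device(text: str) -> bool:
    # Completion condition: a concrete device is already named.
    return any(word in KNOWN_DEVICES for word in text.lower().split())
```

For example, "turn it off" yields the second detection result, while "set the fan to low it is loud" yields the first, because the device is already named even though the pronoun is present.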

5. The method according to claim 3 or 4, characterized in that the step of rewriting, based on the detection result, the content to be rewritten under the i-th rewriting dimension to obtain the rewritten text corresponding to the i-th rewriting dimension comprises: if the detection result indicates that no rewriting is required for the content to be rewritten under the i-th rewriting dimension, using that content as the rewritten text corresponding to the i-th rewriting dimension; and if the detection result indicates that the content to be rewritten under the i-th rewriting dimension needs to be rewritten, rewriting that content according to the i-th rewriting dimension to obtain the rewritten text corresponding to the i-th rewriting dimension.
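Claim 5 amounts to a pass-through guard around each dimension's rewrite. In this minimal sketch the detection step is abstracted as a hypothetical `needs_rewrite` predicate, and the example dimension expands a made-up personal alias table (`ALIASES`); neither comes from the application.

```python
from typing import Callable, Dict


def rewrite_for_dimension(
    content: str,
    needs_rewrite: Callable[[str], bool],
    rewrite: Callable[[str], str],
) -> str:
    """Sketch of claim 5: keep the content unless detection asks for a rewrite."""
    if not needs_rewrite(content):
        # First detection result: the content itself becomes the rewritten text.
        return content
    # Second detection result: rewrite according to this dimension.
    return rewrite(content)


# Hypothetical personal aliases for a personalization dimension.
ALIASES: Dict[str, str] = {"my room": "master bedroom"}


def has_alias(text: str) -> bool:
    return any(alias in text for alias in ALIASES)


def expand_aliases(text: str) -> str:
    for alias, canonical in ALIASES.items():
        text = text.replace(alias, canonical)
    return text
```

Because the rewrite callable is only invoked on the second detection result, a costly rewriting step is skipped entirely for content that does not need it.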

6. The method according to claim 2 or 3, characterized in that the step of rewriting, according to the i-th rewriting dimension, the content to be rewritten under the i-th rewriting dimension to obtain the rewritten text corresponding to the i-th rewriting dimension comprises: when the i-th rewriting dimension is a first dimension, obtaining historical interaction records and completing the intent-related information corresponding to the text information based on the historical interaction records to obtain a first rewritten text corresponding to the first dimension, wherein the historical interaction records contain multi-round interaction text generated between the user and the target home appliance before the current interaction; when the i-th rewriting dimension is a second dimension, performing personalized rewriting on the first rewritten text based on a preset database to obtain a second rewritten text corresponding to the second dimension, wherein the preset database stores the user's personalized interaction information; and when the i-th rewriting dimension is a third dimension, obtaining a home appliance list and performing intent parsing and reasoning on the second rewritten text based on the home appliance list to obtain a third rewritten text corresponding to the third dimension, wherein the home appliance list includes multiple home appliances and the functional definition corresponding to each home appliance.
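The three dimensions of claim 6 can be illustrated with deliberately simple, rule-based stand-ins: pronoun resolution from dialogue history, substitution from a personal phrase table, and inference of a target appliance from keywords describing each appliance's functions. The `RewriteContext` fields, the keyword lists, and the `[device: ...]` annotation are assumptions for illustration; the application does not specify how each dimension is computed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RewriteContext:
    """Hypothetical inputs for the three rewriting dimensions of claim 6."""
    history: List[str]                # earlier user turns, oldest first
    personal_db: Dict[str, str]       # user-specific phrase -> canonical phrase
    appliances: Dict[str, List[str]]  # appliance -> keywords for its functions


def complete_from_history(text: str, ctx: RewriteContext) -> str:
    # First dimension: replace a bare "it" with the most recently mentioned appliance.
    tokens = text.split()
    if "it" not in tokens:
        return text
    for turn in reversed(ctx.history):
        for device in ctx.appliances:
            if device in turn.split():
                return " ".join(device if t == "it" else t for t in tokens)
    return text


def personalize(text: str, ctx: RewriteContext) -> str:
    # Second dimension: map the user's own phrasing onto canonical settings.
    for phrase, canonical in ctx.personal_db.items():
        text = text.replace(phrase, canonical)
    return text


def reason_over_appliances(text: str, ctx: RewriteContext) -> str:
    # Third dimension: if no appliance is named, infer the unique appliance
    # whose function keywords appear in the text.
    words = text.lower().split()
    if any(device in words for device in ctx.appliances):
        return text
    candidates = [d for d, keys in ctx.appliances.items() if any(k in words for k in keys)]
    if len(candidates) == 1:
        return f"{text} [device: {candidates[0]}]"
    return text


def rewrite_three_dimensions(text: str, ctx: RewriteContext) -> str:
    first = complete_from_history(text, ctx)
    second = personalize(first, ctx)
    return reason_over_appliances(second, ctx)


ctx = RewriteContext(
    history=["turn on the fan"],
    personal_db={"my sleep speed": "speed 1"},
    appliances={"fan": ["speed", "oscillate"], "heater": ["cold", "warm"], "light": ["dark", "bright"]},
)
```

With this context, "set it to my sleep speed" becomes "set fan to speed 1" after the first two dimensions, and "I feel cold" is resolved by the third dimension to "I feel cold [device: heater]".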

7. The method according to claim 1, characterized in that the step of rewriting the text information according to the N rewriting dimensions to obtain the initial text instruction comprises: analyzing the text information and determining, from the N rewriting dimensions, a target rewriting dimension for rewriting the text information; and rewriting the text information according to the target rewriting dimension to obtain the initial text instruction.
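Claim 7 selects a target rewriting dimension before rewriting, instead of applying all N dimensions in sequence. The claim does not say how the text is analyzed; the sketch below assumes a first-match rule over a priority-ordered list, and `rewrite_with_target_dimension`, `DIMENSIONS`, and the two triggers are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Each hypothetical dimension: (name, applicability test, rewrite function).
Dimension = Tuple[str, Callable[[str], bool], Callable[[str], str]]


def rewrite_with_target_dimension(
    text: str, dimensions: List[Dimension]
) -> Tuple[str, Optional[str]]:
    """Sketch of claim 7: pick one target dimension by analyzing the text, then rewrite.

    Selection here is by first match in priority order (an assumption).
    """
    for name, applies, rewrite in dimensions:
        if applies(text):
            return rewrite(text), name
    # No dimension applies: the text is used as the initial instruction unchanged.
    return text, None


DIMENSIONS: List[Dimension] = [
    (
        "context",
        lambda t: "it" in t.split(),
        lambda t: " ".join("the fan" if w == "it" else w for w in t.split()),
    ),
    (
        "personal",
        lambda t: "my room" in t,
        lambda t: t.replace("my room", "master bedroom"),
    ),
]
```

A scoring model or classifier could replace the first-match rule without changing the interface.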

8. The method according to claim 1, 2, or 7, characterized in that the step of normalizing the initial text instruction to obtain the target text instruction comprises: performing format standardization, domain classification, and invalid instruction filtering on the initial text instruction to obtain the target text instruction.
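The three normalization operations of claim 8 can be sketched as a small function that returns either an instruction in a preset format or `None` for a filtered command. The format (`domain`, `device`, `action`), the device list, and the phrase tables are hypothetical placeholders, not the application's actual preset format.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TargetInstruction:
    """Hypothetical preset format: domain, device, action."""
    domain: str
    device: str
    action: str


# Hypothetical vocabularies; a real system would load these from its device catalogue.
DEVICES = ("air conditioner", "light", "fan")
CONTROL_PHRASES = {"turn on": "power_on", "turn off": "power_off"}
QUERY_PREFIXES = ("what", "is the", "how")


def standardize(text: str) -> str:
    # Format standardization: lowercase, drop punctuation, collapse whitespace.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def classify_domain(text: str) -> Optional[str]:
    # Domain classification: control domain, query domain, or neither.
    if any(phrase in text for phrase in CONTROL_PHRASES):
        return "control"
    if text.startswith(QUERY_PREFIXES):
        return "query"
    return None


def normalize(initial_instruction: str) -> Optional[TargetInstruction]:
    text = standardize(initial_instruction)
    domain = classify_domain(text)
    device = next((d for d in DEVICES if d in text), None)
    # Invalid-instruction filtering: reject text with no domain or no known device.
    if domain is None or device is None:
        return None
    if domain == "query":
        return TargetInstruction("query", device, "get_status")
    action = next(a for p, a in CONTROL_PHRASES.items() if p in text)
    return TargetInstruction("control", device, action)
```

For instance, "Turn ON the Light!!" normalizes to a control instruction for the light, while "tell me a joke" is filtered out as a non-command.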

9. A control apparatus for home appliances, characterized in that the apparatus comprises: a conversion module, configured to receive an input voice signal and convert the voice signal into text information; a rewriting module, configured to rewrite the text information according to N rewriting dimensions to obtain an initial text instruction, where N is an integer greater than or equal to 1; a processing module, configured to normalize the initial text instruction to obtain a target text instruction, wherein the target text instruction is an executable instruction in a preset format; and a control module, configured to control the target home appliance based on the target text instruction.

10. An electronic device, characterized in that it comprises: a memory, configured to store executable program code; and a processor, configured to call and run the executable program code from the memory, causing the electronic device to perform the method according to any one of claims 1 to 8.

11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed, implements the method according to any one of claims 1 to 8.

12. A computer program product, characterized in that, when the computer program product runs on a computer or processor, it causes the computer or processor to perform the method according to any one of claims 1 to 8.