Network device control method, apparatus and system based on voiceprint recognition

By using voiceprint recognition technology, the system receives voiceprint information from user devices, identifies target users and operation content, and generates single-person or multi-person control scenarios. This solves the problem of the simple voice control function of existing electrical appliances and enables more efficient control of electrical appliances.

CN117596088B · Active · Publication Date: 2026-06-19 · SHENZHEN SDMC TECH CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SHENZHEN SDMC TECH CO LTD
Filing Date: 2023-10-09
Publication Date: 2026-06-19

Patent Timeline

09 Oct 2023: Application

19 Jun 2026: Publication (CN117596088B)

IPC: H04L12/28; G10L17/14; G10L17/22

AI Tagging

Application Domain

Speech analysis; Data switching by path configuration

Technology Topics

Intelligent Network; Target control


AI Technical Summary

Technical Problem

The voice control function of existing electrical appliances can only recognize simple control commands, so it does not deliver the full convenience of voice control and cannot adapt to multi-person control scenarios.

Method used

Voiceprint recognition technology is used to receive voiceprint information from user devices, identify target users and operation content, generate single-person or multi-person control scenarios, and send operation instructions to the corresponding devices. Smart gateways and cloud servers are used for scenario recognition and instruction transmission.

Benefits of technology

It enables user identity and operation content to be recognized based on voiceprint, adapting to the needs of different customer groups and improving the convenience of voice control of electrical appliances and the flexibility of multi-person control.

✦ Generated by Eureka AI based on patent content.


Abstract

This application discloses a network device control method, apparatus, and system based on voiceprint recognition, relating to the field of smart home control technology. A cloud server receives voice commands sent by a smart gateway, which are obtained by the smart gateway from a user device. Voiceprint information is obtained from the voice commands. The operation content corresponding to the voice commands is determined. A target user is determined based on the voiceprint information. A target control scenario is determined based on the target user and the operation content. The target control scenario can be a single-person control scenario or a multi-person control scenario. The target control scenario includes at least one operation information, which includes operation object information and corresponding operation commands. The smart gateway sends corresponding operation commands to each operation object in the operation information, solving the problem of insufficient convenience of voice control for electrical appliances in the prior art.


Description

Technical Field

[0001] This invention relates to the field of network device control technology based on voiceprint recognition, and specifically to a network device control method, apparatus and system based on voiceprint recognition.

Background Technology

[0002] In recent years, wireless services such as mobile communications have developed rapidly, and mobile internet and smart home devices have seen increasing growth. Currently, many electrical appliances have added voice control functions, allowing users to directly control these appliances with their voices for convenience. For example, users can use voice commands to turn air conditioners on and off, lower the temperature, and increase the fan speed. However, current voice control functions for electrical appliances can only recognize simple control commands and can only perform limited operations on a single device, failing to fully leverage the convenience of voice control.

Summary of the Invention

[0003] The technical problem to be solved by the present invention is to overcome the lack of convenience of voice control of electrical equipment in the prior art, thereby providing a network device control method, apparatus and system based on voiceprint recognition.

[0004] In a first aspect, embodiments of the present invention disclose a network device control method based on voiceprint recognition, the method comprising:

[0005] Receive a voice command sent by the smart gateway, wherein the voice command is obtained by the smart gateway from the user equipment;

[0006] Obtain voiceprint information from the voice command;

[0007] Determine the operation content corresponding to the voice command;

[0008] The target user is determined based on the voiceprint information;

[0009] The target control scenario is determined based on the target user and the operation content. The target control scenario is either a single-person control scenario or a multi-person control scenario. The target control scenario includes at least one operation information, which includes operation object information and corresponding operation instructions.

[0010] The smart gateway sends corresponding operation instructions to each operation object in the operation information.
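The structure of a target control scenario described in [0009] and [0010] can be sketched as a small data model. This is only an illustrative sketch: the class names (`ScenarioKind`, `OperationInfo`, `ControlScenario`), the `dispatch` helper, and the device and instruction strings are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass, field
from enum import Enum


class ScenarioKind(Enum):
    SINGLE_PERSON = "single-person"
    MULTI_PERSON = "multi-person"


@dataclass(frozen=True)
class OperationInfo:
    """One piece of operation information: an operation object and its instruction."""
    operation_object: str  # hypothetical device identifier, e.g. "living_room_ac"
    instruction: str       # hypothetical instruction string, e.g. "power_on"


@dataclass
class ControlScenario:
    kind: ScenarioKind
    users: tuple[str, ...]
    operations: list[OperationInfo] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The text requires at least one piece of operation information.
        if not self.operations:
            raise ValueError("a control scenario needs at least one operation")
        # Assumption: a single-person scenario is bound to exactly one user.
        if self.kind is ScenarioKind.SINGLE_PERSON and len(self.users) != 1:
            raise ValueError("single-person scenario must have exactly one user")


def dispatch(scenario: ControlScenario) -> list[tuple[str, str]]:
    """Return the (operation object, instruction) pairs the gateway would send."""
    return [(op.operation_object, op.instruction) for op in scenario.operations]
```

For example, a single-person scenario with one `OperationInfo("living_room_ac", "power_on")` would yield one pair for the gateway to forward to that device.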

[0011] Optionally, determining the target user based on the voiceprint information includes: counting the pieces of voiceprint information obtained from the voice command; if only one piece of voiceprint information is obtained, determining the target user directly from that voiceprint information; if two or more pieces are obtained, checking whether the historical response record contains an execution record of the voice command; if it does, responding to the voice command according to that execution record; if it does not, sending an inquiry message to the user device through the smart gateway, the inquiry message asking the user whether to switch to a multi-person scenario; waiting a preset time period for a switching instruction from the user device through the smart gateway, the switching instruction indicating either multi-person mode or single-person mode; if the switching instruction indicates multi-person mode, obtaining all target users related to the operation content and the voice command; and if no switching instruction is received within the preset time period, or the switching instruction indicates single-person mode, directly obtaining the target user corresponding to the voice command.

[0012] Optionally, the method further includes: using a voice assistant to obtain background information of the target user; obtaining the target user's device usage habits based on the background information; and generating a control scenario for the target user based on the device usage habits.

[0013] Optionally, before receiving the voice command sent by the smart gateway, the method further includes: obtaining a first single-person control scenario corresponding to the target user; obtaining the associated personnel information of the target user based on the background information; obtaining a second single-person control scenario corresponding to each associated personnel; and merging the first single-person control scenario and all the second single-person control scenarios to generate at least one multi-person control scenario.
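The merging step in [0013] can be illustrated with a minimal sketch. The patent text does not say how conflicting instructions for the same device are resolved, so the rule used here (the target user's instruction wins and the device is reported as a conflict) is purely an assumption, as are the function name `merge_scenarios` and the dictionary representation of a scenario.

```python
def merge_scenarios(
    first: dict[str, str],
    others: list[dict[str, str]],
) -> tuple[dict[str, str], list[str]]:
    """Merge single-person scenarios (device -> instruction) into one multi-person scenario.

    Assumed conflict rule: the first (target user's) scenario takes priority;
    devices with differing instructions are returned so a caller can review them.
    """
    merged = dict(first)
    conflicts: list[str] = []
    for scenario in others:
        for device, instruction in scenario.items():
            if device not in merged:
                # Device only appears in an associated person's scenario: add it.
                merged[device] = instruction
            elif merged[device] != instruction and device not in conflicts:
                # Same device, different instruction: keep the existing one, flag it.
                conflicts.append(device)
    return merged, conflicts
```

A real system would probably need a richer policy (priorities per user, averaging numeric settings, or asking the users), which the source leaves open.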

[0014] Optionally, determining the target user based on the voiceprint information further includes: identifying the source of the voiceprint information; if the voiceprint information comes from a legitimate user, determining the corresponding legitimate user as the target user; if the voiceprint information comes from an illegitimate user, sending a tracking and recording instruction to the target camera through the smart gateway, so that the camera, on receiving the instruction, tracks the illegitimate user and sends an alarm message to the designated user terminal.

[0015] In a second aspect, embodiments of the present invention disclose a network device control method based on voiceprint recognition, comprising:

[0016] Receive a voice command sent by user equipment;

[0017] When a normal connection with the cloud server cannot be established, obtain voiceprint information directly from the voice command, determine the operation content corresponding to the voice command, determine the target user based on the voiceprint information, and determine the target control scenario based on the target user and the operation content. The target control scenario is a single-person control scenario or a multi-person control scenario. The target control scenario includes at least one piece of operation information, which includes operation object information and the corresponding operation command.

[0018] When the cloud server connection is normal, the voice command is sent to the cloud server so that the cloud server can obtain voiceprint information from the voice command, determine the operation content corresponding to the voice command, determine the target user based on the voiceprint information, and determine the target control scenario based on the target user and the operation content.

[0019] Send corresponding operation instructions to each operation object in the operation information.

[0020] In a third aspect, embodiments of the present invention disclose a cloud server, including:

[0021] The first voice command receiving module is used to receive voice commands sent by the smart gateway, wherein the voice commands are obtained by the smart gateway from the user equipment;

[0022] A voiceprint information acquisition module is used to acquire voiceprint information from the voice command;

[0023] The operation content acquisition module is used to determine the operation content corresponding to the voice command;

[0024] A target user determination module is used to determine a target user based on the voiceprint information;

[0025] The control scenario determination module is used to determine a target control scenario based on the target user and the operation content. The target control scenario is a single-person control scenario or a multi-person control scenario. The target control scenario includes at least one piece of operation information, which includes operation object information and corresponding operation instructions.

[0026] The first operation instruction sending module is used to send corresponding operation instructions to each operation object in the operation information through the smart gateway.

[0027] In a fourth aspect, embodiments of the present invention disclose a smart gateway, comprising:

[0028] The second voice command receiving module is used to receive voice commands sent by the user equipment.

[0029] The voiceprint processing module is used to directly obtain voiceprint information from the voice command when the connection with the cloud server is not normal, determine the operation content corresponding to the voice command, determine the target user based on the voiceprint information, and determine the target control scenario based on the target user and the operation content. The target control scenario is a single-person control scenario or a multi-person control scenario. The target control scenario includes at least one operation information, which includes operation object information and corresponding operation command.

[0030] The voice command sending module is used to send the voice command to the cloud server when the cloud server connection is normal, so that the cloud server can obtain voiceprint information from the voice command, determine the operation content corresponding to the voice command, determine the target user based on the voiceprint information, and determine the target control scenario based on the target user and the operation content.

[0031] The second operation instruction sending module is used to send corresponding operation instructions to each operation object in the operation information.

[0032] In a fifth aspect, embodiments of the present invention disclose a network device control system based on voiceprint recognition, comprising:

[0033] The user equipment collects sound information, generates voice commands, and sends the voice commands to the smart gateway;

[0034] The smart gateway receives voice commands sent by the user equipment. When a normal connection with the cloud server cannot be established, it obtains voiceprint information directly from the voice command, determines the corresponding operation content, identifies the target user based on the voiceprint information, and determines the target control scenario based on the target user and the operation content. The target control scenario is a single-person control scenario or a multi-person control scenario. The target control scenario includes at least one piece of operation information, which includes operation object information and corresponding operation commands. When the connection to the cloud server is normal, it sends the voice commands to the cloud server.

[0035] A cloud server receives voice commands sent by a smart gateway and obtains voiceprint information from the voice commands; determines the operation content corresponding to the voice commands; identifies a target user based on the voiceprint information; determines a target control scenario based on the target user and the operation content, wherein the target control scenario is a single-person control scenario or a multi-person control scenario, and the target control scenario includes at least one operation information, wherein the operation information includes operation object information and corresponding operation commands; and sends corresponding operation commands to each operation object in the operation information through the smart gateway.

[0036] In a sixth aspect, the present invention also discloses a computer device, including: a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor, and when the computer device is running, the processor communicates with the memory via the bus, and when the machine-readable instructions are executed by the processor, the steps of the first aspect above, or any possible implementation of the first aspect, are performed.

[0037] The technical solutions provided by the embodiments of the present invention can have the following beneficial effects:

[0038] The cloud server obtains voiceprint information from the voice command; determines the operation content corresponding to the voice command; identifies the target user based on the voiceprint information; and then determines the target control scenario, which includes at least one operation information, including operation object information and corresponding operation command; and sends the corresponding operation command to each operation object in the operation information through the smart gateway. This solution uses the uniqueness of voiceprint to bind the user corresponding to the relevant scenario, not only distinguishing people but also analyzing voice content, and can adapt to the needs of different customer groups.

Attached Figure Description

[0039] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0040] Figure 1 shows a flowchart of a network device control method based on voiceprint recognition provided in an embodiment of the present invention;

[0041] Figure 2 shows a flowchart of another network device control method based on voiceprint recognition provided in an embodiment of the present invention;

[0042] Figure 3 shows a flowchart of another network device control method based on voiceprint recognition provided in an embodiment of the present invention;

[0043] Figure 4 shows a functional structure diagram of a cloud server according to an embodiment of the present invention;

[0044] Figure 5 shows a functional structure diagram of a smart gateway according to an embodiment of the present invention;

[0045] Figure 6 shows a functional structure diagram of a network device control system based on voiceprint recognition according to an embodiment of the present invention;

[0046] Figure 7 shows a schematic diagram of the structure of a computer device provided by an embodiment of the present invention.

Detailed Implementation

[0047] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the invention as detailed in the appended summary.

[0048] Example 1

[0049] As shown in Figure 1, an embodiment of the present invention provides a network device control method based on voiceprint recognition. The method includes:

[0050] S11: Receive voice commands sent by the smart gateway. The voice commands are obtained by the smart gateway from the user equipment.

[0051] S12: Obtain voiceprint information from voice commands.

[0052] S13: Determine the operation content corresponding to the voice command.

[0053] S14: Determine the target user based on voiceprint information.

[0054] S15: Determine the target control scenario based on the target user and operation content. The target control scenario is either a single-person control scenario or a multi-person control scenario. The target control scenario includes at least one operation information, which includes operation object information and corresponding operation instructions.

[0055] S16: Send corresponding operation instructions to each operation object in the operation information through the smart gateway.

[0056] The operation instructions above are primarily active instructions initiated by the user. They also include adaptive instructions generated automatically from the user attributes identified through voiceprint recognition and from the events conveyed in the speech content. These adaptive instructions can be fine-tuned, and they can also be passive instructions triggered by specific security events.

[0057] It is understood that the technical solution provided in this embodiment involves the cloud server obtaining voiceprint information from the voice command; determining the operation content corresponding to the voice command; determining the target user based on the voiceprint information; and then determining the target control scenario, which includes at least one operation information, including operation object information and corresponding operation instructions; and sending corresponding operation instructions to each operation object in the operation information through a smart gateway. This solution utilizes the uniqueness of voiceprints to bind users corresponding to relevant scenarios, not only distinguishing people but also analyzing voice content, and can adapt to the needs of different customer groups.
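Steps S11 to S16 of this embodiment can be sketched as a single handler on the cloud server side. The sketch below is an assumption-laden outline, not the patented implementation: the voiceprint extractor, speech recognizer, user lookup and scenario selection are passed in as callables because the text does not specify them, and all names (`VoiceCommand`, `Operation`, `handle_voice_command`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class VoiceCommand:
    audio: bytes  # raw audio forwarded by the smart gateway (S11)


@dataclass(frozen=True)
class Operation:
    operation_object: str  # hypothetical device identifier
    instruction: str       # hypothetical instruction string


def handle_voice_command(
    command: VoiceCommand,
    extract_voiceprint: Callable[[VoiceCommand], str],
    recognize_content: Callable[[VoiceCommand], str],
    identify_user: Callable[[str], Optional[str]],
    select_scenario: Callable[[Optional[str], str], list[Operation]],
    send_via_gateway: Callable[[Operation], None],
) -> int:
    """Run S12 to S16 for one received voice command; return how many instructions were sent."""
    voiceprint = extract_voiceprint(command)     # S12: voiceprint information
    content = recognize_content(command)         # S13: operation content
    user = identify_user(voiceprint)             # S14: target user
    operations = select_scenario(user, content)  # S15: target control scenario
    for operation in operations:                 # S16: one instruction per operation object
        send_via_gateway(operation)
    return len(operations)
```

Passing the stages as callables keeps the sketch independent of any particular speaker-recognition or speech-recognition library, none of which is named in the source.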

[0058] Example 2

[0059] As an improvement on Example 1, Figure 2 shows a flowchart of another network device control method based on voiceprint recognition provided in an embodiment of the present invention. The method includes:

[0060] S201: Obtain the first single-person control scenario corresponding to the target user.

[0061] S202: Obtain the target user's associated personnel information based on background information.

[0062] S203: Obtain the second single-person control scenario corresponding to each associated person.

[0063] S204: Merge the first single-person control scenario and all second single-person control scenarios to generate at least one multi-person control scenario.

[0064] S205: Receive voice commands sent by the smart gateway, which are obtained by the smart gateway from the user equipment.

[0065] S206: Obtain voiceprint information from voice commands.

[0066] S207: Determine the operation content corresponding to the voice command.

[0067] S208: Determine the target user based on voiceprint information.

[0068] S209: Determine the target control scenario based on the target user and operation content. The target control scenario is a single-person control scenario or a multi-person control scenario. The target control scenario includes at least one operation information, which includes operation object information and corresponding operation instructions.

[0069] S210: Send corresponding operation instructions to each operation object in the operation information through the smart gateway.

[0070] S211: Use a voice assistant to obtain background information about the target user.

[0071] S212: Obtain the target user's device usage habits based on background information.

[0072] S213: Generate control scenarios for the target user based on device usage habits.
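Steps S211 to S213 derive a control scenario from a user's device usage habits. The text does not define how habits are represented or turned into a scenario, so the following is only one plausible sketch: it assumes the habits are available as a log of (device, instruction) events and keeps, for each device, the most frequent instruction if it occurs at least `min_count` times. The function name and the threshold are hypothetical.

```python
from collections import Counter, defaultdict


def scenario_from_habits(
    usage_log: list[tuple[str, str]],
    min_count: int = 2,
) -> dict[str, str]:
    """Build a device -> instruction scenario from a usage log (assumed input format)."""
    per_device: dict[str, Counter] = defaultdict(Counter)
    for device, instruction in usage_log:
        per_device[device][instruction] += 1

    scenario: dict[str, str] = {}
    for device, counts in per_device.items():
        instruction, count = counts.most_common(1)[0]
        # Only treat an instruction as a habit if it recurs often enough.
        if count >= min_count:
            scenario[device] = instruction
    return scenario
```

With a log in which the air conditioner is set to the same temperature twice and the television is switched on once, only the air conditioner setting would enter the generated scenario under the default threshold.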

[0073] In some alternative embodiments, S208 includes (not shown in the figure):

[0074] S208-1: Count the pieces of voiceprint information obtained from the voice command.

[0075] S208-2: If only one piece of voiceprint information is obtained from the voice command, determine the target user directly based on that voiceprint information.

[0076] S208-3: If two or more pieces of voiceprint information are obtained from the voice command, check whether the historical response record contains an execution record of the voice command.

[0077] S208-4: If the historical response record contains an execution record of the voice command, respond to the voice command according to the execution record.

[0078] S208-5: If the historical response record contains no execution record of the voice command, send an inquiry message to the user device through the smart gateway. The inquiry message asks the user whether to switch to a multi-person scenario.

[0079] S208-6: Wait a preset time period for a switching instruction from the user device through the smart gateway. The switching instruction indicates either multi-person mode or single-person mode.

[0080] S208-7: If the switching instruction indicates multi-person mode, obtain all target users related to the operation content and the voice command.

[0081] S208-8: If no switching instruction is received within the preset time period, or if the switching instruction is for single-person mode, the target user corresponding to the voice command will be directly obtained.
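The branching in S208-1 to S208-8 can be sketched as one function. Several details are assumptions made only for illustration: the historical response record is modeled as a mapping from command text to the users it was previously executed for, the inquiry is modeled as a callable that returns `"multi"`, `"single"`, or `None` on timeout, and "the target user corresponding to the voice command" is taken to be the first detected speaker.

```python
from typing import Callable, Optional, Sequence


def determine_target_users(
    speakers: Sequence[str],
    command_text: str,
    history: dict[str, list[str]],
    ask_switch: Callable[[], Optional[str]],
) -> list[str]:
    """Resolve the target user(s) for a command containing one or more voiceprints."""
    if not speakers:
        raise ValueError("no voiceprint information in the voice command")

    # S208-1 / S208-2: exactly one voiceprint, use it directly.
    if len(speakers) == 1:
        return [speakers[0]]

    # S208-3 / S208-4: reuse an earlier execution record if one exists.
    if command_text in history:
        return list(history[command_text])

    # S208-5 / S208-6: ask the user; None stands for "no reply within the preset time".
    reply = ask_switch()

    # S208-7: multi-person mode, all detected speakers become target users.
    if reply == "multi":
        return list(speakers)

    # S208-8: timeout or single-person mode (assumption: first speaker is the target).
    return [speakers[0]]
```

Keeping the inquiry behind a callable lets the timeout policy live in the gateway or the user device, which matches the text's indirection through the smart gateway.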

[0082] In some alternative embodiments, S208 further includes (not shown in the figure):

[0083] S208-9: Identify the source of voiceprint information.

[0084] S208-10: If the voiceprint information comes from a legitimate user, then the corresponding legitimate user shall be identified as the target user.

[0085] S208-11: If the voiceprint information comes from an illegitimate user, a tracking and recording instruction is sent to the target camera through the smart gateway. On receiving the instruction, the camera tracks the illegitimate user and sends an alarm message to the designated user terminal. This allows an alarm to be raised automatically and simplifies the process of calling security personnel for help; for example, children can act promptly on warning messages concerning their elderly relatives.
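The source check in S208-9 to S208-11 can be sketched as follows. The sketch assumes that a legitimate user is simply one whose identifier appears in a set of enrolled users, and it models the camera as an object that starts tracking and notifies the designated terminal when instructed; the `Camera` class, its method names and the alarm text are all hypothetical.

```python
from typing import Callable, Optional


class Camera:
    """Stand-in for the target camera reached through the smart gateway."""

    def __init__(self, notify_terminal: Callable[[str], None]) -> None:
        self.tracking = False
        self._notify_terminal = notify_terminal

    def on_tracking_instruction(self) -> None:
        # On receiving the instruction, track the speaker and raise an alarm.
        self.tracking = True
        self._notify_terminal("alarm: unrecognized voiceprint")


def resolve_source(
    speaker_id: Optional[str],
    enrolled_users: set[str],
    camera: Camera,
) -> Optional[str]:
    """Return the target user for a legitimate voiceprint, else trigger the camera."""
    if speaker_id in enrolled_users:
        return speaker_id            # S208-10: legitimate user becomes the target user
    camera.on_tracking_instruction()  # S208-11: illegitimate source
    return None
```

Returning `None` for an illegitimate source leaves it to the caller to skip scenario selection, which the text implies but does not spell out.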

[0086] It is understood that the technical solution provided in this embodiment involves the cloud server obtaining voiceprint information from the voice command; determining the operation content corresponding to the voice command; determining the target user based on the voiceprint information; and then determining the target control scenario, which includes at least one operation information, including operation object information and corresponding operation instructions; and sending corresponding operation instructions to each operation object in the operation information through a smart gateway. This solution utilizes the uniqueness of voiceprints to bind users corresponding to relevant scenarios, not only distinguishing people but also analyzing voice content, and can adapt to the needs of different customer groups.

[0087] Example 3

[0088] As shown in Figure 3, an embodiment of the present invention provides another network device control method based on voiceprint recognition. The method includes:

[0089] S31: Receive a voice command sent by the user equipment.

[0090] S32: When a normal connection with the cloud server cannot be established, obtain the voiceprint information directly from the voice command, determine the operation content corresponding to the voice command, determine the target user based on the voiceprint information, and determine the target control scenario based on the target user and the operation content. The target control scenario is a single-person control scenario or a multi-person control scenario. The target control scenario includes at least one piece of operation information, which includes operation object information and the corresponding operation command.

[0091] S33: When the cloud server connection is normal, send a voice command to the cloud server so that the cloud server can obtain voiceprint information from the voice command, determine the operation content corresponding to the voice command, determine the target user based on the voiceprint information, and determine the target control scenario based on the target user and operation content.

[0092] S34: Send the corresponding operation instruction to each operation object in the operation information.
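The gateway-side choice between S32 and S33 can be sketched as a cloud-first handler with a local fallback. How the gateway detects that the cloud connection is not normal is not stated in the text; this sketch assumes the cloud call raises `ConnectionError` in that case. The function and parameter names are hypothetical, and both resolvers are assumed to return (device, instruction) pairs.

```python
from typing import Callable, Iterable

Operation = tuple[str, str]  # (operation object, operation instruction)
Resolver = Callable[[bytes], Iterable[Operation]]


def gateway_handle(
    audio: bytes,
    cloud_resolve: Resolver,
    local_resolve: Resolver,
    send: Callable[[str, str], None],
) -> str:
    """Resolve a voice command via the cloud if reachable, else locally, then dispatch (S34)."""
    try:
        # S33: cloud connection normal, let the cloud server determine the scenario.
        operations = list(cloud_resolve(audio))
        path = "cloud"
    except ConnectionError:
        # S32: no normal connection, run voiceprint and scenario logic on the gateway.
        operations = list(local_resolve(audio))
        path = "local"

    # S34: send the corresponding instruction to each operation object.
    for device, instruction in operations:
        send(device, instruction)
    return path
```

The returned path label is only there to make the chosen branch observable; a real gateway might instead log it or expose it as a metric.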

[0093] It is understood that the technical solution provided in this embodiment involves the cloud server obtaining voiceprint information from the voice command; determining the operation content corresponding to the voice command; determining the target user based on the voiceprint information; and then determining the target control scenario, which includes at least one operation information, including operation object information and corresponding operation instructions; and sending corresponding operation instructions to each operation object in the operation information through a smart gateway. This solution utilizes the uniqueness of voiceprints to bind users corresponding to relevant scenarios, not only distinguishing people but also analyzing voice content, and can adapt to the needs of different customer groups.

[0094] Example 4

[0095] As shown in Figure 4, an embodiment of the present invention discloses a functional structure diagram of a cloud server. The cloud server includes:

[0096] The first voice command receiving module 41 is used to receive voice commands sent by the smart gateway, which are obtained by the smart gateway from the user equipment.

[0097] The voiceprint information acquisition module 42 is used to acquire voiceprint information from voice commands.

[0098] The operation content acquisition module 43 is used to determine the operation content corresponding to the voice command.

[0099] The target user determination module 44 is used to determine the target user based on voiceprint information.

[0100] The control scenario determination module 45 is used to determine the target control scenario based on the target user and operation content. The target control scenario is a single-person control scenario or a multi-person control scenario. The target control scenario includes at least one operation information, which includes operation object information and corresponding operation instructions.

[0101] The first operation instruction sending module 46 is used to send corresponding operation instructions to each operation object in the operation information through the smart gateway.

[0102] In some alternative embodiments, the cloud server may further include:

[0103] The control scenario generation module 47 is used to obtain the target user's background information using the voice assistant; obtain the target user's device usage habits based on the background information; and generate the target user's control scenario based on the device usage habits.

[0104] In some alternative embodiments, the cloud server may further include:

[0105] The scenario fusion module 48 is used to obtain the first single-person control scenario corresponding to the target user; obtain the associated personnel information of the target user based on the background information; obtain the second single-person control scenario corresponding to each associated person; and merge the first single-person control scenario and all second single-person control scenarios to generate at least one multi-person control scenario.

[0106] In some alternative embodiments, the target user determination module 44 includes:

[0107] The target user determination submodule 441 is used to count the pieces of voiceprint information obtained from the voice command. If only one piece of voiceprint information is obtained, the target user is determined directly from that voiceprint information. If two or more pieces are obtained, the submodule checks whether the historical response record contains an execution record of the voice command. If it does, the voice command is answered according to that execution record. If it does not, an inquiry message is sent to the user device through the smart gateway, asking the user whether to switch to a multi-person scenario. The submodule then waits a preset time period for a switching instruction from the user device through the smart gateway; the switching instruction indicates either multi-person mode or single-person mode. If the switching instruction indicates multi-person mode, all target users related to the operation content and the voice command are obtained. If no switching instruction is received within the preset time period, or the switching instruction indicates single-person mode, the target user corresponding to the voice command is obtained directly.

[0108] The abnormal sound recognition submodule 442 is used to identify the source of the voiceprint information. If the voiceprint information comes from a legitimate user, the corresponding legitimate user is identified as the target user. If the voiceprint information comes from an illegitimate user, a tracking and recording instruction is sent to the target camera through the smart gateway so that the camera, on receiving the instruction, tracks the illegitimate user and sends an alarm message to the designated user terminal.

[0109] Generating commands from the aforementioned abnormal sound recognition involves identifying user attributes and speech content from voiceprints and from the time-domain and frequency-domain characteristics of the sound, and determining whether a security event, such as a sharp scream or an explosion, is present. Passive operation commands are triggered based on the recorded sound energy, such as controlling a camera to attempt to capture and lock onto the sound source. Corresponding alarm commands are generated to activate the smart speaker alarm and are sent to the associated management user. The content of these commands can be fine-tuned to improve emergency assistance for specific security events while retaining necessary technical support. This links specific users to events, since users with different identities may encounter scenarios such as burglaries. Because thieves can easily conceal their facial features and voices, it is necessary to capture the voices of family members to assist in generating commands for specific scenarios.

[0110] It is understood that the technical solution provided in this embodiment involves the cloud server obtaining voiceprint information from the voice command; determining the operation content corresponding to the voice command; determining the target user based on the voiceprint information; and then determining the target control scenario, which includes at least one operation information, including operation object information and corresponding operation instructions; and sending corresponding operation instructions to each operation object in the operation information through a smart gateway. This solution utilizes the uniqueness of voiceprints to bind users corresponding to relevant scenarios, not only distinguishing people but also analyzing voice content, and can adapt to the needs of different customer groups.

[0111] Example 5

[0112] As shown in Figure 5, this invention discloses a functional structure diagram of another smart gateway, which includes:

[0113] The second audio command receiving module 51 is used to receive audio commands sent by the user equipment.

[0114] The voiceprint processing module 52 is used to directly obtain voiceprint information from the voice command when the connection with the cloud server is not normal, determine the operation content corresponding to the voice command, determine the target user based on the voiceprint information, and determine the target control scenario based on the target user and the operation content. The target control scenario is a single-person control scenario or a multi-person control scenario. The target control scenario includes at least one operation information, which includes operation object information and corresponding operation command.

[0115] The voice command sending module 53 is used to send voice commands to the cloud server when the cloud server connection is normal, so that the cloud server can obtain voiceprint information from the voice command, determine the operation content corresponding to the voice command, determine the target user based on the voiceprint information, and determine the target control scenario based on the target user and the operation content.

[0116] The second operation instruction sending module 54 is used to send corresponding operation instructions to each operation object in the operation information.
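The division of work among modules 51 to 54 can be sketched as a single routing function. This is an illustrative sketch only: all callables are hypothetical stand-ins for the gateway's connectivity check, cloud client, local voiceprint processing, and device dispatcher, and the scenario dictionary layout is an assumption.

```python
from typing import Callable, Dict


def handle_voice_command(
    audio: bytes,
    cloud_ok: Callable[[], bool],
    send_to_cloud: Callable[[bytes], Dict],
    process_locally: Callable[[bytes], Dict],
    dispatch: Callable[[str, str], None],
) -> str:
    """Route one voice command and dispatch the resulting operations.

    Returns "cloud" or "local" to show which path produced the scenario.
    """
    if cloud_ok():
        # Module 53: the cloud server determines the target control scenario.
        scenario = send_to_cloud(audio)
        path = "cloud"
    else:
        # Module 52: the gateway processes the voiceprint itself.
        scenario = process_locally(audio)
        path = "local"
    # Module 54: send each operation instruction to its operation object.
    for op in scenario["operations"]:
        dispatch(op["object"], op["instruction"])
    return path
```

Either path yields the same scenario structure, so the dispatch step does not need to know where the scenario was computed.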

[0117] It is understood that the technical solution provided in this embodiment involves the cloud server obtaining voiceprint information from the voice command; determining the operation content corresponding to the voice command; determining the target user based on the voiceprint information; and then determining the target control scenario, which includes at least one operation information, including operation object information and corresponding operation instructions; and sending corresponding operation instructions to each operation object in the operation information through a smart gateway. This solution utilizes the uniqueness of voiceprints to bind users corresponding to relevant scenarios, not only distinguishing people but also analyzing voice content, and can adapt to the needs of different customer groups.

[0118] Example 6

[0119] As shown in Figure 5, an embodiment of the present invention provides another functional structure diagram of a network device control system based on voiceprint recognition. The system includes:

[0120] User equipment 61 collects sound information, generates sound commands, and sends the sound commands to the smart gateway.

[0121] The intelligent gateway 62 receives voice commands sent by user devices. When it cannot connect to the cloud server normally, it directly obtains voiceprint information from the voice commands to determine the operation content corresponding to the voice commands. Based on the voiceprint information, it identifies the target user and determines the target control scenario based on the target user and the operation content. The target control scenario can be a single-person control scenario or a multi-person control scenario. The target control scenario includes at least one operation information, which includes operation object information and corresponding operation commands. When the connection to the cloud server is normal, it sends voice commands to the cloud server.

[0122] Cloud server 63 receives voice commands sent by the smart gateway and obtains voiceprint information from the voice commands; determines the operation content corresponding to the voice commands; determines the target user based on the voiceprint information; determines the target control scenario based on the target user and the operation content, the target control scenario being a single-person control scenario or a multi-person control scenario, the target control scenario including at least one operation information, the operation information including operation object information and corresponding operation commands; and sends the corresponding operation commands to each operation object in the operation information through the smart gateway.

[0123] It is understood that the technical solution provided in this embodiment involves the cloud server obtaining voiceprint information from the voice command; determining the operation content corresponding to the voice command; determining the target user based on the voiceprint information; and then determining the target control scenario, which includes at least one operation information, including operation object information and corresponding operation instructions; and sending corresponding operation instructions to each operation object in the operation information through a smart gateway. This solution utilizes the uniqueness of voiceprints to bind users corresponding to relevant scenarios, not only distinguishing people but also analyzing voice content, and can adapt to the needs of different customer groups.

[0124] Example 7

[0125] Based on the same technical concept, embodiments of this application also provide a computer device including a memory 1 and a processor 2, as shown in Figure 7. Memory 1 stores a computer program, and processor 2 executes the computer program to implement any of the above-mentioned network device control methods based on voiceprint recognition.

[0126] The memory 1 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 1 can be an internal storage unit of the voiceprint recognition-based network device control system, such as a hard disk. In other embodiments, the memory 1 can also be an external storage device of the voiceprint recognition-based network device control system, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, etc. Furthermore, the memory 1 can include both internal storage units and external storage devices of the voiceprint recognition-based network device control system. The memory 1 can be used not only to store application software and various data installed in the voiceprint recognition-based network device control system, such as the code of the voiceprint recognition-based network device control program, but also to temporarily store data that has been output or will be output.

[0127] In some embodiments, processor 2 may be a central processing unit (CPU), controller, microcontroller, microprocessor or other data processing chip, used to run program code stored in memory 1 or process data, such as executing a network device control program based on voiceprint recognition.

[0128] It is understood that the technical solution provided in this embodiment involves the cloud server obtaining voiceprint information from the voice command; determining the operation content corresponding to the voice command; determining the target user based on the voiceprint information; and then determining the target control scenario, which includes at least one operation information, including operation object information and corresponding operation instructions; and sending corresponding operation instructions to each operation object in the operation information through a smart gateway. This solution utilizes the uniqueness of voiceprints to bind users corresponding to relevant scenarios, not only distinguishing people but also analyzing voice content, and can adapt to the needs of different customer groups.

[0129] To facilitate readers' understanding of the technical solutions of the invention embodiments, the above solutions are described in detail below through specific examples.

[0130] Modern home environments often contain numerous production and living devices, but the management and monitoring of these devices is insufficiently intelligent and not age-friendly. Consequently, many infrequently used smart home devices and household items are poorly managed. Each device involves many details, such as command execution and detailed procedures, so users must memorize large amounts of information, which places an increasingly heavy mental burden on them and makes management difficult. Currently, voice control relies on relatively brief descriptions, which professionals then expand into more comprehensive settings. Improving personalized configuration and supporting multiple devices requires a more accurate statement of user needs, or at least a concise explanation that facilitates template configuration.

[0131] Currently, voiceprint recognition is mainly designed for judging human voices and cannot be used in scenarios involving multiple devices and multiple people of different types. Simply applying a parameter template that corresponds to a voiceprint to a fixed device is not precise or flexible enough. If we use the intelligent parsing capabilities of voice assistants to obtain sufficient user background information to achieve pre-configuration and remember the minor changes, we can determine whether to temporarily modify or fix the device parameters in the device scenario.

[0132] This application uses a human voiceprint to identify family members and their fixed operating habits and corresponding device scenarios. These scenarios require parameter control of one or more devices. However, this is only the primary scenario setup and doesn't account for potential temporary adjustments to one or more devices. For convenient adjustments, voice control is needed, using more traditional, precise commands as input. To support more specific commands, once adapted, commands to remember the operating scenario can be issued. This allows for practical testing and adjustment of preferred aspects from an initial scenario.

[0133] The core of the technical solution includes relying on mobile terminals to bind smart gateways and cloud servers. Information is entered through mobile terminals, stored and backed up in the database, and provided to users of mobile terminal devices for querying and reminders. Both the cloud server and the smart gateway have corresponding query and reminder functions.

[0134] By leveraging the uniqueness of voiceprints to identify which users correspond to specific scenarios, and using an abnormal sound analysis model to analyze the scenarios of abnormal events, it is possible to not only distinguish between people but also analyze the content of their voices, as well as preset the voiceprints of sounds contained in some emergency events. This allows the system to adapt to the needs of different customer groups, while manually set content can be handled by advanced users. This maximizes the ability to quickly set up according to user needs and reduces the cost for users to learn how to use voice control in a distributed collaborative smart home and network environment.

[0135] The cloud server includes: a cloud-based voiceprint recognition module, used to analyze the attribution and identity of voiceprints, and also to analyze the events corresponding to different combinations of sounds; a cloud-based storage module, which stores the voiceprint recognition model and analyzes event models based on sound types; a scene calculation module, used to match device scene control based on different voiceprints and event occurrences; and a user permission module, used to match users corresponding to different voiceprints with corresponding operation permissions management.

[0136] The smart gateway stores voiceprint data and small models for voiceprint recognition, storing only user-identified voiceprints and a small number of commonly used voiceprints for offline and edge computing, saving computing resources spent on round trips to the cloud. Voiceprint data can be stored directly in the smart gateway or implemented using a combination of edge storage devices. The voiceprint calculation module is used for recognition in single-person and / or multi-person scenarios; its advantage is that it can be completed locally without occupying cloud resources. The second voice command receiving module is located in the smart speaker and / or mobile terminal, recording sound through a microphone and uploading it to the cloud for voiceprint analysis. The smart speaker connects to the smart gateway and the cloud wirelessly or via wired connection for voice interaction and control with the user. The location of the smart speaker can collect user and environmental sounds to analyze events and control conditions, enhancing device management of voice interaction between each device and user in the system. Specifically, the smart speaker identifies the voices of users and objects, calculates locally whether a user has been identified and whether it is a single or multi-person scenario, and then performs further complex calculations in the cloud.

[0137] The network device control process based on voiceprint recognition includes:

[0138] Step 1: Obtain the voiceprint of the target user and record the device parameters and operation permissions of any enabled device scenario initialized by the user.

[0139] For example, User 1 is the female homeowner, setting up a welcome-home scenario for the first time. The original template sets the smart speaker's welcome message to "Madam, welcome home," and also sets the air conditioner to a comfortable setting and the TV to a specific screen. User 1 can modify and save this template according to their needs, further specifying that it is primarily for individuals living alone. User 2 is the woman's relative's child, who needs to stay for a short period. Their voiceprint is recorded for identification. When setting up a welcome-home scenario for a child entering alone, a corresponding prompt is created, cloning the original welcome message exclusively for the child. Users 3 and 4 are the woman's parents, and corresponding titles can be set to expand the welcome message, allowing for multiple names. This way, upon entering the house, the system can proactively interact with the voice assistant based on voice, greeting the user and enabling the home appliance system to operate in multi-user mode seamlessly, without requiring manual input.

[0140] Taking internet access control as an example, configure users' internet access permissions under mesh devices and optical modems, allowing different users to access different networks. Adults can access full internet access, while children's dedicated wireless devices connect to a dedicated network SSID and are automatically configured for green internet access, preventing access to some age-restricted websites. This is valid for family members indefinitely, while for children from relatives' homes, a temporary guest SSID and password are generated and sent to the administrator or the guest's mobile terminal for restriction purposes.
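The access policy in this example can be illustrated with a small lookup plus a temporary credential generator. The role names, SSIDs, and password length below are hypothetical and chosen only for illustration. The sketch uses Python's standard `secrets` module to generate the guest credentials.

```python
import secrets
import string

# Hypothetical permanent profiles for family members.
PROFILES = {
    "adult": {"ssid": "Home", "filtered": False},
    "child": {"ssid": "Home-Kids", "filtered": True},
}


def access_for(role: str) -> dict:
    """Return the network access settings for a recognized user role."""
    if role in PROFILES:
        # Family members keep their settings indefinitely.
        return dict(PROFILES[role], temporary=False)
    if role == "guest_child":
        # Visiting children get a temporary, filtered guest network.
        alphabet = string.ascii_letters + string.digits
        password = "".join(secrets.choice(alphabet) for _ in range(12))
        return {
            "ssid": "Guest-" + secrets.token_hex(2),
            "password": password,
            "filtered": True,
            "temporary": True,
        }
    raise ValueError(f"unknown role: {role}")
```

The generated SSID and password would then be sent to the administrator or to the guest's mobile terminal, as the paragraph describes.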

[0141] Preferably, the prerequisite for step one is that administrators with primary privileges can modify any configuration; ordinary users need authentication to modify some device configurations via voice. Strangers are not allowed to modify configurations via voice and can only respond. Family members and guests are whitelisted users, while strangers are those who are not on the whitelist after voiceprint comparison. Preferably, pets also have corresponding voiceprint data established, and general voiceprint data is stored in the cloud to determine whether various items have encountered corresponding events. The administrator confirms whether to enable voiceprint recognition and event analysis for ordinary items; unlike existing technologies that only determine human usage, this is used for more refined scene analysis to understand complex user needs and generate control actions to realize device scenarios with richer background support.
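The three permission tiers described here (administrator, whitelisted ordinary user, stranger) can be sketched as follows. The role names and the `allowed_devices` parameter are hypothetical. The source says only that ordinary users need authentication to modify "some" device configurations.

```python
from enum import Enum
from typing import Optional, Set


class Role(Enum):
    ADMIN = "admin"
    ORDINARY = "ordinary"   # whitelisted family members and guests
    STRANGER = "stranger"   # not on the whitelist after voiceprint comparison


def resolve_role(speaker_id: Optional[str], admins: Set[str], whitelist: Set[str]) -> Role:
    """Map a voiceprint match result (or None for no match) to a role."""
    if speaker_id in admins:
        return Role.ADMIN
    if speaker_id in whitelist:
        return Role.ORDINARY
    return Role.STRANGER


def may_modify(role: Role, authenticated: bool, device: str, allowed_devices: Set[str]) -> bool:
    """Decide whether a voice request may change a device configuration."""
    if role is Role.ADMIN:
        return True  # primary privileges: any configuration
    if role is Role.ORDINARY:
        return authenticated and device in allowed_devices
    return False  # strangers receive a response but cannot modify anything
```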

[0142] Fine-grained access control is mainly used in hotels and workplaces, where voiceprint binding is performed for different occasions. Unlike homes and workplaces, voiceprint binding needs to be re-bound in these settings. Furthermore, these settings do not have full administrator privileges; users can only manage as ordinary users and can only adjust electrical appliances and network devices in the location that has the greatest impact on them. This is a typical application for individual adjustments in multi-user scenarios. After users interact with the system by speaking, they can then use the system for voice control.

[0143] This system utilizes existing end-to-end voiceprint decomposition models built using machine learning to separate human voices and object sounds using a voice encoder, voiceprint recognition and separation, and voiceprint decoder. A second voiceprint decomposition model is then used to further analyze object sounds. The voiceprints of each member of a household user are pre-encoded and compared with voiceprints separated from sounds recorded in the current environment. If three people are present within a 3-minute timeframe, the system can determine if the current scene is multi-person. Combined with other location data, this can be further subdivided to determine if a room or living room is a multi-person scene. Device management is then associated with specific locations and the number of people in the scene. Specifically, after inputting environmental sound spectrum data into the voiceprint separation and discrimination network, in addition to the noisy human voice spectrogram, there is an embedding (d-vector) representing the target speaker, enhancing anti-aliasing capabilities and facilitating the comparison and separation of known voiceprints. This embedding is encoded by the voiceprint recognition encoder from a noise-free reference audio segment from the target speaker. The system ultimately uses this embedded code to selectively separate the target speaker's voice from the noisy spectrum. Since the voices of speakers and objects appearing at the same time are more prone to confusion than human voices, they require a separate background voiceprint separation model for processing. This model then extracts the temporal features of the sound, analyzes whether it has multiple time-axis locations, and identifies abnormal sounds associated with different objects and people, thus determining if there are abnormal sounds associated with an unusual event. 
The correlation between abnormal sounds and abnormal events is initially coarsely selected using a correlation database, and then further classified using an event causal inference classification model. This allows for the identification of the scene and event corresponding to a sound, improving decision-making accuracy. For example, sudden impacts are given lower weight than continuous impacts.
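The comparison of separated voiceprints against pre-encoded household embeddings, and the time-window test for a multi-person scene, can be sketched as below. The source does not name a similarity metric, so cosine similarity, the 0.8 threshold, and the `min_people` default are assumptions. The 180-second window mirrors the 3-minute example in the text.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_speaker(
    embedding: Sequence[float],
    enrolled: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # assumed value, to be tuned on real data
) -> Optional[str]:
    """Return the enrolled user whose d-vector is most similar, or None."""
    best_id, best_score = None, threshold
    for user_id, reference in enrolled.items():
        score = cosine(embedding, reference)
        if score >= best_score:
            best_id, best_score = user_id, score
    return best_id


def is_multi_person(
    detections: List[Tuple[float, Optional[str]]],
    now: float,
    window_s: float = 180.0,
    min_people: int = 2,  # assumption; the text's example uses three people
) -> bool:
    """True if enough distinct users were heard within the time window."""
    recent = {uid for t, uid in detections if uid is not None and now - t <= window_s}
    return len(recent) >= min_people
```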

[0144] Voiceprint analysis is used to differentiate between single-person and multi-person scenarios. If only one speaker is matched in a voice, a single-person scenario is set, and initialization is performed only once. If multiple voiceprints are detected in the voice, the user is asked whether to switch to a multi-person scenario. Once the user switches, the scenario is not automatically changed, and corresponding silence conditions are set to reduce disturbance. For example, during the day, confirmation is required, and disturbances should be limited. When switching from a person-occupied state to an unoccupied state, if the user does not respond on the mobile terminal or via voice, such as after three unresponsive attempts, the scenario is switched to an away-from-home scenario or a sleep scenario based on the user's location. When switching from an unoccupied state to a person-occupied state, it is generally desirable to avoid too many confirmation operations, so automatic operation is preferred. Permission levels are set to allow for personalized adjustments and to minimize conflicts.
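The transition from an occupied to an unoccupied state can be sketched as a small decision function. The scenario names and the rule mapping the user's location to "sleep" or "away" are assumptions. The text states only that the choice depends on the user's location and gives three unanswered attempts as an example.

```python
def next_scenario(
    current: str,
    unanswered_prompts: int,
    user_location: str,
    max_attempts: int = 3,
) -> str:
    """Decide the scenario after confirmation prompts went unanswered.

    Assumption: a user located at home is treated as asleep,
    any other location as away from home.
    """
    if unanswered_prompts < max_attempts:
        # Keep the current scenario until enough prompts go unanswered.
        return current
    return "sleep" if user_location == "home" else "away"
```

For instance, `next_scenario("occupied", 3, "office")` yields the away-from-home scenario, while two unanswered prompts leave the scenario unchanged.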

[0145] Step two: Based on the single-person scenario setup, inquire about the changes in needs, and the reasons for them, in multi-person scenarios, and use the results as the question-and-answer dataset for training the question-and-answer model. First, collect the question-and-answer dataset by gathering question-and-answer data related to smart home control commands, with user questions about device operation as questions and the corresponding device operation commands as answers. For example:

[0146] Question: How do I turn on the light?

[0147] Answer: Please say "turn on the light" or click the "turn on" button on the app.

[0148] Users perform operations based on the instructions provided by the system. For complex operations, it is necessary to continuously associate the specific device scenario with the corresponding instructions for the operation content.

[0149] Question: How do I adjust the temperature?

[0150] Answer: You can say "increase the temperature" or "raise the temperature" to the intelligent temperature control device.

[0151] Next, data preprocessing is performed on the collected question-and-answer data to ensure that the data format and quality meet the requirements of the training model. This can include word segmentation, stop word removal, and other processing.
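A minimal sketch of such preprocessing for English text is shown below. The stop-word list is illustrative only, and the regular-expression tokenizer is an assumption that would not segment Chinese text, which needs a dedicated word segmenter. Words such as "on" and "off" are deliberately kept because they carry the control intent.

```python
import re
from typing import List

# Illustrative stop-word list; a real system would use a curated one.
STOP_WORDS = {"a", "an", "the", "do", "i", "how", "please", "to"}


def preprocess(text: str) -> List[str]:
    """Lowercase, split into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]
```

Applied to the sample question "How do I turn on the light?", this returns `["turn", "on", "light"]`.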

[0152] Build a question-answering model: Select a suitable question-answering model, such as a Transformer-based model like BERT or GPT-3, to train the question-answering model for device scene control commands.

[0153] When a user needs to change the content of the device scene initialized to a preset state, appropriate questions and feedback should be provided.

[0154] Preferably, in step two, the user's voiceprint and matching sound parameters are detected using an end-to-end voiceprint decomposition model, and the language content contained in the speech is identified to provide a response. For example, after separating the human voice, user one is identified, and a complete feature vector is established based on time-domain and frequency-domain features. It is determined that the user's volume is too low, and they are speaking softly on the mobile terminal, requiring adjustment by the system. A text reply is sent to the mobile terminal, suitable for private adjustment scenarios that do not affect conversations or other activities, so no further voice reply is given. Furthermore, due to the operation permissions bound to the voiceprint recognition provided by this invention, guests can perform some permitted operations and provide feedback via voice. Mel-Frequency Cepstrum Coefficients (MFCCs) are extracted from audio segments after mean filtering. The extracted MFCCs of the audio segments are used as the observation sequence and input into a trained improved Hidden Markov Model (HMM). Based on the frequency characteristics of human voices, the model analyzes whether the abnormal segment is an abnormal human voice. An abnormality does not necessarily mean that an accident has occurred. This can be used as a trigger for certain silent events. The HMM model is improved by introducing temporal correlation.

[0155] If the administrator user accepts the setup request, they only need to modify the settings according to the user's instructions to meet their needs. For example, if a user experience plan is included, the modification process for each user's scenario will be compared after removing privacy restrictions. The changes in requirement descriptions will be summarized, and data before and after modification, along with the requirement descriptions, will be extracted for training a requirement question-and-answer set. This will create a model that represents the parameter changes in the device scenario for each user's requirement change instructions in terms of both the required and desired forms. Referring to the question-and-answer set training mentioned earlier, the model itself falls within the scope of existing technology. However, this invention is used in the field of smart home control for translating human language into control instructions. The numerical instruction form of parameter changes is fitted with natural language. For example, "make the air conditioner drier" means "dehumidify for a period of time, temperature around 25 degrees Celsius." Even better, the use of key phrases and related operation instructions can be fixed and memorized by the system. Through retraining and optimization, new dataset instructions and verbal instructions can be included, helping voice control to use descriptive requirements that are understood and set by the voice assistant. During model training, the text content of the question-and-answer set is fed to a conversational robot using a large natural language model (such as GPT-3 and later versions) for learning and filtering. This refines the basic questions for a scenario by providing reasonable multi-turn question-and-answer coverage, and compares the scores with human answers to improve the comprehensiveness and simplicity of the question-and-answer process. 
This allows for the assessment of the completeness of the questions and answers, considering the content and steps involved in asking questions across multiple dimensions of a device scenario. The input and output stages are supplemented with additional questions to identify changes made by users that deviate from the standard process. This process allows for real-time learning of the reasons for these changes from the user's perspective, improving the fluency and creativity of the question-and-answer process. This step aims to obtain the precise mapping between demand descriptions and device parameters, enabling the learning of the transformation of data dimensions represented by relatively vague non-data-related descriptions in human language. Specifically, machine learning is used to annotate and classify control statements, understand the text, and distinguish the operational instructions corresponding to "decrease" and "increase." For example, a user saying, "I think the air conditioner is a bit noisy, and the TV is too loud, please turn it down," uses relative adjustment. However, when there's a lot of noise, turning it down shouldn't be by just a few decibels, since sound volume doesn't change linearly. Therefore, besides adjusting the volume of a major device appropriately based on the environment, controllable noise can be adjusted to ensure clarity. Current technology requires issuing voice commands to adjust each device individually, and it can't understand the need to adjust multiple devices simultaneously. A single device scenario often involves multiple devices. By inputting keywords for the control scenario and control commands for adjusting the parameters of multiple devices, the system can achieve joint control of multiple devices or even multiple device scenarios using just a few words.
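The mapping from descriptive requests to device commands can be illustrated with a keyword-based stand-in for the trained model. Everything here is a simplification: the phrase table, keyword lists, and command fields are hypothetical, and a learned classifier would replace the substring matching. The fixed phrase reproduces the "make the air conditioner drier" example from the text.

```python
from typing import Dict, List

# Key phrases "fixed and memorized by the system" (example from the text).
FIXED_PHRASES: Dict[str, List[dict]] = {
    "make the air conditioner drier": [
        {"device": "air_conditioner", "mode": "dehumidify", "temperature_c": 25},
    ],
}

# Hypothetical cue words for relative adjustments.
DECREASE = ("turn it down", "too loud", "decrease", "lower")
INCREASE = ("turn it up", "too quiet", "increase", "raise")

# Hypothetical spoken names mapped to device identifiers.
DEVICES = {"air conditioner": "air_conditioner", "tv": "tv"}


def direction(utterance: str) -> int:
    """Return -1 for decrease, 1 for increase, 0 if no cue is found."""
    text = utterance.lower()
    if any(k in text for k in DECREASE):
        return -1
    if any(k in text for k in INCREASE):
        return 1
    return 0


def interpret(utterance: str) -> List[dict]:
    """Turn one utterance into commands for every device it mentions."""
    text = utterance.lower().strip()
    if text in FIXED_PHRASES:
        return FIXED_PHRASES[text]
    step = direction(text)
    if step == 0:
        return []
    return [{"device": dev, "direction": step} for name, dev in DEVICES.items() if name in text]
```

With the sentence from the text, "I think the air conditioner is a bit noisy, and the TV is too loud, please turn it down", the sketch emits a decrease command for both devices from one utterance. How large each step should be is left open, since the text notes that volume does not change linearly.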

[0156] To improve command recognition, data preprocessing is performed: The collected text data undergoes necessary preprocessing to ensure it meets the input requirements of the GPT-3 model. GPT-3 accepts plain text input and does not require specific formatting. When calling the GPT-3 model, text classification is performed via API or other methods. The text to be classified is passed to the GPT-3 model, which outputs the predicted command. The command output by the GPT-3 model is then applied to the corresponding operation. After the user inputs text, the text is passed to the GPT-3 model to obtain the predicted command, and then the corresponding operation is executed.
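The text-to-command step can be sketched without committing to a particular model API. In the sketch below, `model` is a hypothetical callable that takes a prompt string and returns the model's text output. The prompt wording and the validation against an allowed command set are assumptions added for illustration.

```python
from typing import Callable, Optional, Set


def predict_command(
    text: str,
    model: Callable[[str], str],
    allowed: Set[str],
) -> Optional[str]:
    """Ask a text model to classify a request into one known command.

    Returns None when the model's answer is not in the allowed set,
    so that an unexpected output is never executed as a command.
    """
    prompt = (
        "Classify the request into one of: "
        + ", ".join(sorted(allowed))
        + "\nRequest: "
        + text.strip()
        + "\nCommand:"
    )
    answer = model(prompt).strip().lower()
    return answer if answer in allowed else None
```

Injecting the model as a callable keeps the sketch testable with a stub and leaves the choice of API client to the deployment.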

[0157] Iterative optimization: Collect user feedback and optimize the model's input and output based on user usage and needs to improve the accuracy and performance of command recognition. For example, when fusing the single-person scenes of multiple users, a predefined merging method can be used. If the user setting the scene says, "I'm with a child," method one can be used, in which the main scene and sub-scenes are combined; examples include modifying the air conditioning temperature and fan speed, the lighting settings, and the greeting for each person. Alternatively, method two can be used, which leaves the main scene defined by the user, i.e., the scene of the primary user, unchanged. In this learning mode, adjustments made when creating scenes for multiple users are remembered by the model, without requiring specific settings. Currently, GPT-3 and later models can understand this level of integration operation with high accuracy.
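The two merging methods can be sketched as follows. The scene representation (a flat dictionary with a `greetings` list) and the rule that sub-scene values override overlapping main-scene parameters in method one are assumptions. The text says only that the main scene and sub-scenes are combined.

```python
from typing import Dict, List


def fuse_scenes(main: Dict, subs: List[Dict], method: int) -> Dict:
    """Fuse single-person scenes into one multi-person scene.

    method 1: combine the main scene with the sub-scenes.
    method 2: keep the primary user's main scene unchanged.
    """
    if method == 2:
        return dict(main)
    fused = dict(main)
    for sub in subs:
        for key, value in sub.items():
            if key == "greetings":
                # Keep a greeting for each person instead of overwriting.
                fused["greetings"] = fused.get("greetings", []) + list(value)
            else:
                # Assumption: the sub-scene overrides overlapping parameters.
                fused[key] = value
    return fused
```

Both methods return a new dictionary, so the stored single-person scenes stay intact for later reuse.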

[0158] In general, existing technologies require manual creation of scenarios for multiple users. This invention, however, can combine multiple parameters from individual scenarios involving multiple family members into a multi-user scenario using a preset overlay method. The fusion process can be trained using a question-and-answer set to develop a descriptive understanding of the multi-user scenario, allowing for integration. Matching can be performed using methods such as Euclidean distance to determine question-and-answer similarity, enabling rapid return of relevant answers. For example, in reverse adjustment, if an air conditioner temperature below 25 degrees Celsius is designated for use by a healthy user, and the user coughs, requiring a temporary adjustment of the entire air conditioning system, the environment where that user is located will receive a temperature adjustment reminder. In existing technologies, if adaptation requires creating a separate scene, each device is individually adjusted to a specified scene. Only by specifying that scene will the device follow the scene template to a certain parameter state. Sometimes, we actually want to specify a more suitable version of the scene snapshot, where the main device parameters are the same as the configuration of the user's device scene, only modifying the most frequently changed device. For example, in the living room or bedroom, the air conditioner needs frequent adjustments, while the lighting doesn't need much change. The air conditioner can usually be set cooler, but when we use it, the sound produced can be less loud depending on our health condition, such as a cough, which the system can recognize. The trigger condition "User Two is coughing" can be linked to air conditioning settings. The user can be prompted via voice or message to manually adjust the air conditioner temperature. 
If the user does adjust the temperature, this setting is remembered, and the system will automatically adjust to that temperature the next time User Two enters their room, such as the second bedroom. Existing cough sound recognition technologies compare feature vectors in the time and frequency domains to identify subtle changes in timbre and thereby analyze possible changes in the voice. Specifically, a deep belief network (DBN) cough sound recognition model can be trained. This model uses an unsupervised greedy algorithm for layer-by-layer pre-training of the restricted Boltzmann machines (RBMs) and a backpropagation (BP) algorithm for overall fine-tuning of the DBN. The pre-training process is essentially the layer-by-layer training of the RBMs. The feature parameters of each individual voice sample are treated as a state vector. The purpose of RBM training is to minimize the energy of the Boltzmann machine while maximizing the probability of the state vector s, thereby obtaining the corresponding RBM weights and threshold values. A single feature parameter threshold is used to define differences from normal sounds, such as coughing or nasal congestion. This process is implemented using the contrastive divergence algorithm, and the human voice is compared with health thresholds. The system then asks the administrator whether appropriate adjustments should be made, and the administrator replies, "Remember this snapshot of a nighttime sleep scene where the temperature and noise are relatively comfortable after a cold."
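
As a non-limiting illustration, the Euclidean-distance matching of a question against a question-and-answer set mentioned in this paragraph can be sketched as follows. The three-dimensional vectors stand in for whatever text features an implementation would actually use, and the names `euclidean`, `nearest_answer`, and `QA_SET` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_answer(
    query: Sequence[float], qa_set: List[Tuple[Sequence[float], str]]
) -> str:
    """Return the answer whose stored question vector is closest to the query."""
    return min(qa_set, key=lambda item: euclidean(query, item[0]))[1]


# Illustrative question vectors and answers (assumed, not from the embodiments).
QA_SET = [
    ([1.0, 0.0, 0.0], "combine main scene and sub-scenes"),
    ([0.0, 1.0, 0.0], "keep the primary user's scene"),
    ([0.0, 0.0, 1.0], "ask the administrator"),
]

answer = nearest_answer([0.9, 0.2, 0.1], QA_SET)
```

A smaller distance means a more similar question, so the closest stored question determines which answer is returned.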

[0159] When the snapshot description of the scenario is stored in the cloud and reused, such as in hotels, some adjustments can be made. Users need to synchronize some of their voiceprint data with the hotel's services. This can be applied to any hotel in the same chain. Adjusting the airflow of the central air conditioning requires some experience data support. Unlike home scenarios, users need to agree to voiceprint encryption and migration from the cloud to the hotel's local edge computing server. When users request "comfortable and quiet air conditioning", the temperature, airflow, and airflow pattern need to be adjusted appropriately to suit a more spacious room.

[0160] Step 3: Compare and learn the patterns of the device scenarios of the multiple users to trigger modification reminders when the device scenario is switched.

[0161] This system compares stored voiceprints with currently recorded voiceprints to determine if a multi-person scene needs to be switched. This avoids requiring extensive manual background input, matching cloud-based user directories and common device habits. For the initial dataset, it compares fused multi-person device scenes, recommending the common process of overlaying single-person scenes to generate multi-person scenes. As the system further develops, its cloud-based recognition of multi-person scene sounds can quickly match authorized network devices for response, while simultaneously requesting user feedback. For example, it might remind the homeowner to adjust inconvenient network configurations in the living room, such as moving devices or placing snacks. The voiceprint network device control system records this user feedback. Because it provides timely reminders and feedback during the pre-configuration and scene activation stages, when setting a scene, it requests the user to make further adjustments based on the detected presence of elderly people and children, facilitating their needs. Families with elderly people and children living together can benefit from deep intelligent adjustments, simplifying user experience. Younger users can set the scene remotely.
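
As a non-limiting illustration, the comparison of stored voiceprints with currently recorded voiceprints can be sketched as follows. Cosine similarity, the threshold of 0.8, and the two-dimensional vectors are assumptions made for this example; the embodiments do not specify the similarity measure.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def identify(
    voiceprint: Sequence[float],
    enrolled: Dict[str, Sequence[float]],
    threshold: float = 0.8,
) -> Optional[str]:
    """Return the best-matching enrolled user, or None if below the threshold."""
    best_user, best_score = None, threshold
    for user, reference in enrolled.items():
        score = cosine(voiceprint, reference)
        if score >= best_score:
            best_user, best_score = user, score
    return best_user


def needs_multi_person_scene(
    current: List[Sequence[float]], enrolled: Dict[str, Sequence[float]]
) -> bool:
    """True when at least two distinct enrolled users are heard."""
    users = {identify(v, enrolled) for v in current}
    users.discard(None)
    return len(users) >= 2


# Hypothetical stored voiceprints.
enrolled = {"user_one": [1.0, 0.0], "user_two": [0.0, 1.0]}
```

For instance, `needs_multi_person_scene([[0.9, 0.1], [0.1, 0.95]], enrolled)` recognizes both enrolled users and therefore indicates that a switch to a multi-person scene should be considered.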

[0162] For example, if elderly people or children cannot adapt to a colder air conditioner temperature setting, when they enter the house, the welcome scene switches to the living room scene. The main scene can then be set to the elderly or children. The user can then make a voice request, such as saying to the system, "Turn up the temperature of the air conditioner in the living room," to verbally adjust the settings of individual devices to meet their new needs. This allows the living room, bedroom, and kitchen to be handled separately. After the settings are complete, the user can verbally request, "Current settings are good. Update the multi-person scene in the living room and process the device scenes in the rooms," so that the system remembers this new configuration. In the next situation, when the user sets device scenes, the system can restore the settings based on this scene, similar to how snapshot data can be recorded by date or tag.

[0163] Step 3 supports retrieving historical snapshots of device scenarios stored under the same user name with the same voiceprint. Each historical record is marked with a description of the required changes for subsequent retrieval. By fully utilizing existing natural language understanding capabilities, historical device scenario settings can be restored based on voiceprint data.
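
As a non-limiting illustration, storing and retrieving described snapshots per user can be sketched as follows. The class names `Snapshot` and `SnapshotStore`, the field names, and the simple keyword match (standing in for natural language understanding) are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Snapshot:
    user: str                              # user name bound to the voiceprint
    taken_on: date                         # date tag of the snapshot
    description: str                       # free-text description of the change
    settings: Dict[str, Dict[str, object]] # device name -> parameters


class SnapshotStore:
    def __init__(self) -> None:
        self._items: List[Snapshot] = []

    def save(self, snapshot: Snapshot) -> None:
        self._items.append(snapshot)

    def find(self, user: str, keyword: str) -> Optional[Snapshot]:
        """Most recent snapshot of this user whose description contains keyword."""
        matches = [
            s for s in self._items
            if s.user == user and keyword.lower() in s.description.lower()
        ]
        return max(matches, key=lambda s: s.taken_on) if matches else None


store = SnapshotStore()
store.save(Snapshot("user_two", date(2024, 1, 5), "Sleep scene after a cold",
                    {"air_conditioner": {"temperature_c": 26}}))
store.save(Snapshot("user_two", date(2024, 1, 12), "Quiet sleep scene after a cold",
                    {"air_conditioner": {"temperature_c": 27, "fan": "low"}}))
latest = store.find("user_two", "after a cold")
```

Retrieval is limited to snapshots saved under the same user, which mirrors the requirement that the user name and voiceprint match.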

[0164] Step 4: For voiceprints that are not within the scope of family members and visitors, track the voice permissions and control of home devices of unfamiliar users.

[0165] The system records basic voiceprints for different users for identification, triggering main scenes for home devices and adjusting device parameters. Voiceprint data is stored on a local edge storage terminal or the encrypted storage module of the smart gateway, and is also backed up to the cloud for encrypted storage.

[0166] The newly entered voiceprint belongs to a guest or family member. Combined with other triggering conditions, the relevant scenes in multi-person scenarios are configured. The device parameters of the main scene and sub-scenes are fine-tuned. When entering a main scene, the relevant device parameters are preset, confirmed through voice interaction, and the final setting result is announced.

[0167] If a detected voiceprint, whether of a person or an animal, does not belong to a guest, family member, or pet, the webcam can be activated to track the suspicious individual in the living room and continuously record close-up footage. This recording can then be used to notify the automatic alarm system to issue an alert, dial 110 or other domestic security emergency numbers, or, if video alarms are supported, be sent as evidence.

[0168] This part pertains to setting up a voiceprint whitelist mode, which identifies, records, and tracks individuals not on the whitelist.
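
As a non-limiting illustration, the whitelist mode can be sketched as a simple dispatch on the identified speaker. The action names and the function `handle_voiceprint` are hypothetical placeholders for the device instructions an implementation would actually issue.

```python
from typing import List, Optional, Set


def handle_voiceprint(speaker_id: Optional[str], whitelist: Set[str]) -> List[str]:
    """Return the actions to take for an identified (or unidentified) speaker.

    speaker_id is None when the voiceprint matches no stored record.
    """
    if speaker_id is not None and speaker_id in whitelist:
        # Known family member, guest, or pet: proceed with scene handling.
        return ["trigger_scene"]
    # Not on the whitelist: identify, record, and track.
    return ["activate_camera_tracking", "record_video", "notify_alarm_system"]


# Hypothetical whitelist contents.
WHITELIST = {"user_one", "user_two", "guest_a", "family_dog"}
```

Here `handle_voiceprint(None, WHITELIST)` returns the tracking and alarm actions, while a whitelisted identifier only triggers the normal scene handling.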

[0169] Step 5: Analyze the voiceprints of different items during the current time period, combine them to determine the safety events present in the current home environment, and control network devices to trigger alarms and safety feedback. The determination is based on a decision tree model built from correlation data between safety events and abnormal sounds; for example, the power to the smart switch is disconnected to prompt a fire alarm. Several scenarios that are most likely to cause accidents are set as safety events, and voice prompts are provided during the setting process to check whether they meet the alarm requirements, so as to avoid false alarms. Combined judgment means that, instead of reasoning from a single source or a single sudden abnormal sound, the system tries to determine whether other accompanying abnormal sound events occur in the nearby time period, such as within two minutes. If they occur together, an alarm is triggered, and different event sounds are assigned corresponding weight coefficients.
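
As a non-limiting illustration, the combined judgment over a two-minute window can be sketched as follows. A weighted sum is used here in place of the decision tree model for brevity, and the event labels, weight coefficients, and alarm threshold are assumed values, not values taken from the embodiments.

```python
from typing import Dict, List, Tuple

WINDOW_SECONDS = 120  # "within two minutes"

# Hypothetical weight coefficients per abnormal sound.
WEIGHTS: Dict[str, float] = {
    "glass_breaking": 0.5,
    "scream": 0.4,
    "object_thrown": 0.3,
    "smoke_alarm_beep": 0.6,
}
ALARM_THRESHOLD = 0.8  # assumed value


def should_alarm(events: List[Tuple[float, str]], now: float) -> bool:
    """Decide whether to alarm from (timestamp_seconds, label) sound events.

    A single abnormal sound never triggers the alarm; at least two different
    abnormal sounds must fall inside the window, and their summed weights
    must reach the threshold.
    """
    recent = {label for ts, label in events if 0 <= now - ts <= WINDOW_SECONDS}
    if len(recent) < 2:
        return False
    score = sum(WEIGHTS.get(label, 0.0) for label in recent)
    return score >= ALARM_THRESHOLD
```

For instance, glass breaking followed by a scream one minute later falls inside the window and exceeds the assumed threshold, whereas either sound alone does not trigger an alarm.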

[0170] The system uses the loud screams of women and the elderly, the sounds of shattered glass, and the continuous destructive noises of objects being thrown to identify violence and burglaries, instantly dialing security numbers, recording sounds of struggle, and further mobilizing collaborative devices such as location recognition devices and cameras to record potential crime scenes. For example, it can trigger alarms based on the decibel level of continuous breaking sounds, analyze the probability of security risks from multiple sound recordings over a continuous period, classify and compare different security incidents, and trigger home appliances to respond based on the severity level of the incident. Even better, users can choose automatic alarms or third-party assisted alarms, with audio and video materials being sent to security departments in real time.

[0171] Step 6: Determine the current event type based on the voiceprint combination, trigger the corresponding device scenario, and provide an immediate voice reminder.

[0172] By analyzing different voiceprint combinations as preset conditions, the system can match device scene initialization to complete the overall setup of multiple devices, and more personally address the needs of different users in different home scenarios.
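
As a non-limiting illustration, matching a voiceprint combination to a device scene and an immediate voice reminder can be sketched as a lookup table. The user labels, scene names, and the function `select_scene` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical preset conditions: set of recognized users -> device scene.
SCENE_RULES: Dict[FrozenSet[str], str] = {
    frozenset({"parent"}): "living_room_single",
    frozenset({"parent", "child"}): "family_evening",
    frozenset({"parent", "grandparent", "child"}): "three_generation",
}


def select_scene(present_users: Iterable[str]) -> Tuple[str, str]:
    """Return (scene name, voice reminder text) for the recognized users."""
    scene = SCENE_RULES.get(frozenset(present_users), "default")
    return scene, f"Switching to scene: {scene}"


scene, reminder = select_scene(["child", "parent"])
```

Using a set as the key makes the match independent of the order in which the voiceprints are recognized.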

[0173] Alarms and prompts are triggered by the cloud, and text and voice responses generated by the cloud are received locally and broadcast on the local voice terminal. For example, a smart speaker can vividly present the response content according to the tone required by the text.

[0174] As one of the core network devices and entry points leading the move toward an intelligent Internet, the smart gateway can work with existing smart terminals and the cloud to complete the transition of existing devices to intelligent operation. The smart gateway connects with the device, automatically configures the corresponding short-range synchronization, provides caching and acceleration, reduces the cost of subsequent equipment and network maintenance, and improves disaster recovery.

[0175] In this interactive process of querying and modifying information, the smart gateway also serves as a network connection and an information acquisition tool. If other smart devices communicate with the database through relevant data protocols, they can share data, achieve decentralized intelligence, reduce the difficulty of collaboration between individual devices in smart applications, and thus solve the problem that existing technologies rely solely on manual text input, which is cumbersome and fails to provide a good information experience.

[0176] The above solution uses artificial intelligence to recognize voiceprints and uses big data to collect additional demand description information and scene setting templates, reducing the hassle of inputting specific parameters for single-person and multi-person scenarios. At the same time, it reduces the workload of developers and allows unified data operations and service Q&A updates to be completed using simple language commands. It supports natural language interaction and further expands information applications, making it suitable for people with different levels of education, covering primary and secondary school students and the elderly, lowering the entry point for intelligent applications and increasing the convenience of voice control.

[0177] It is understood that the same or similar parts in the above embodiments can be referred to each other, and the contents not described in detail in some embodiments can be referred to the same or similar contents in other embodiments.

[0178] It should be noted that in the description of this invention, the terms "first," "second," etc., are used for descriptive purposes only and should not be construed as indicating or implying relative importance. Furthermore, in the description of this invention, unless otherwise stated, "a plurality of" means at least two.

[0179] Any process or method description in the flowchart or otherwise herein can be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing a particular logical function or process, and the scope of the preferred embodiments of the invention includes additional implementations in which functions may be performed not in the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as will be understood by those skilled in the art to which embodiments of the invention pertain.

[0180] It should be understood that various parts of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.

[0181] Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be implemented by a program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, the program includes one or a combination of the steps of the method embodiments.

[0182] Furthermore, the functional units in the various embodiments of the present invention can be integrated into a processing module, or each unit can exist physically separately, or two or more units can be integrated into a module. The integrated module can be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.

[0183] The storage media mentioned above can be read-only memory, disk, or optical disk, etc.

[0184] In the description of this specification, references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.

[0185] Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention. Those skilled in the art can make changes, modifications, substitutions and variations to the above embodiments within the scope of the present invention.

Claims

1. A voiceprint recognition based network device control method, characterized in that, include: Receives voice commands sent by the smart gateway, wherein the voice commands are obtained by the smart gateway from the user equipment; Obtain voiceprint information from the voice command; Determine the operation content corresponding to the sound command; The target user is determined based on the voiceprint information; The target control scenario is determined based on the target user and the operation content. The target control scenario is either a single-person control scenario or a multi-person control scenario. The target control scenario includes at least one operation information, which includes operation object information and corresponding operation instructions. The smart gateway sends corresponding operation instructions to each operation object in the operation information. Target users identified based on voiceprint information include: Calculate the amount of voiceprint information obtained from the voice command; If only one voiceprint information is obtained from the voice command, the target user is determined directly based on the voiceprint information. If two or more voiceprint information are obtained from the voice command, then check whether there is an execution record of the voice command in the historical response record. If the execution record of the sound command exists in the historical response record, then the sound command is responded to according to the execution record; If there is no execution record of the voice command in the historical response record, an inquiry message is sent to the user device through the smart gateway. The inquiry message is used to ask the user whether to switch to a multi-person scene. 
If a switching instruction is received from the user equipment through the smart gateway within a preset time period, the switching instruction includes multi-user mode or single-user mode; If the switching instruction is for multi-user mode, then obtain all target users related to the operation content and the voice command; If the switching instruction is not received within the preset time period, or if the switching instruction is for single-person mode, then the target user corresponding to the voice command is directly obtained.
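
As a non-limiting illustration, the target user determination recited in claim 1 can be sketched as follows. The function `resolve_target_users`, the callback `ask_user`, the reply values "multi" and "single", and the convention that the first voiceprint belongs to the user who issued the command are assumptions made for this example.

```python
from typing import Callable, Dict, List, Optional


def resolve_target_users(
    voiceprint_users: List[str],
    command: str,
    history: Dict[str, List[str]],
    ask_user: Callable[[float], Optional[str]],
    timeout_s: float = 10.0,
) -> List[str]:
    """Determine the target users for a voice command.

    voiceprint_users: users recognized in the command; by assumption the
                      first entry is the user who issued it.
    history:          command text -> target users of a previous execution.
    ask_user:         sends the inquiry via the gateway and returns "multi",
                      "single", or None if no reply arrives within timeout_s.
    """
    if len(voiceprint_users) <= 1:
        return list(voiceprint_users)       # only one voiceprint: use it directly
    if command in history:
        return list(history[command])       # respond according to the record
    reply = ask_user(timeout_s)
    if reply == "multi":
        return list(voiceprint_users)       # multi-person mode: all related users
    return voiceprint_users[:1]             # timeout or single-person mode
```

A stub such as `lambda timeout: None` can stand in for `ask_user` to exercise the timeout branch.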

2. The network device control method based on voiceprint recognition according to claim 1, characterized in that the method further includes: using a voice assistant to obtain the target user's background information; obtaining the target user's device usage habits based on the background information; and generating the control scenario for the target user based on the device usage habits.

3. The network device control method based on voiceprint recognition according to claim 2, characterized in that, before receiving voice commands sent by the smart gateway, the method further includes: obtaining the first single-person control scenario corresponding to the target user; obtaining the target user's associated personnel information based on the background information; obtaining the second single-person control scenario corresponding to each of the associated personnel; and merging the first single-person control scenario and all the second single-person control scenarios to generate at least one multi-person control scenario.

4. The network device control method based on voiceprint recognition according to claim 3, characterized in that, Determining the target user based on the voiceprint information also includes: Identify the source of the voiceprint information; If the voiceprint information comes from a legitimate user, then the corresponding legitimate user is identified as the target user; If the voiceprint information comes from an unauthorized user, a tracking camera instruction is sent to the target camera through the smart gateway, so that the camera can track the unauthorized user after receiving the tracking camera instruction and send an alarm message to the designated user terminal.

5. A network device control method based on voiceprint recognition, characterized in that, include: Receive audio commands sent by user equipment; When a normal connection with the cloud server cannot be established, voiceprint information is directly obtained from the voice command to determine the operation content corresponding to the voice command. The target user is determined based on the voiceprint information. The target control scenario is determined based on the target user and the operation content. The target control scenario is a single-person control scenario or a multi-person control scenario. The target control scenario includes at least one operation information, which includes operation object information and corresponding operation command. When the cloud server connection is normal, the voice command is sent to the cloud server so that the cloud server can obtain voiceprint information from the voice command, determine the operation content corresponding to the voice command, determine the target user based on the voiceprint information, and determine the target control scenario based on the target user and the operation content. Send corresponding operation instructions to each operation object in the operation information; Target users identified based on voiceprint information include: Calculate the amount of voiceprint information obtained from the voice command; If only one voiceprint information is obtained from the voice command, the target user is determined directly based on the voiceprint information. If two or more voiceprint information are obtained from the voice command, then check whether there is an execution record of the voice command in the historical response record. 
If the execution record of the sound command exists in the historical response record, then the sound command is responded to according to the execution record; If there is no execution record of the voice command in the historical response record, an inquiry message is sent to the user device through the smart gateway. The inquiry message is used to ask the user whether to switch to a multi-person scene. If a switching instruction is received from the user equipment through the smart gateway within a preset time period, the switching instruction includes multi-user mode or single-user mode; If the switching instruction is for multi-user mode, then obtain all target users related to the operation content and the voice command; If the switching instruction is not received within the preset time period, or if the switching instruction is for single-person mode, then the target user corresponding to the voice command is directly obtained.

6. A cloud server, characterized in that, include: The first voice command receiving module is used to receive voice commands sent by the smart gateway, wherein the voice commands are obtained by the smart gateway from the user equipment; A voiceprint information acquisition module is used to acquire voiceprint information from the voice command; The operation content acquisition module is used to determine the operation content corresponding to the sound command; A target user determination module is used to determine a target user based on the voiceprint information; Target users identified based on voiceprint information include: Calculate the amount of voiceprint information obtained from the voice command; If only one voiceprint information is obtained from the voice command, the target user is determined directly based on the voiceprint information. If two or more voiceprint information are obtained from the voice command, then check whether there is an execution record of the voice command in the historical response record. If the execution record of the sound command exists in the historical response record, then the sound command is responded to according to the execution record; If there is no execution record of the voice command in the historical response record, an inquiry message is sent to the user device through the smart gateway. The inquiry message is used to ask the user whether to switch to a multi-person scene. If a switching instruction is received from the user equipment through the smart gateway within a preset time period, the switching instruction includes multi-user mode or single-user mode; If the switching instruction is for multi-user mode, then obtain all target users related to the operation content and the voice command; If the switching instruction is not received within the preset time period, or if the switching instruction is a single-person mode, then the target user corresponding to the voice command is directly obtained. 
The control scenario determination module is used to determine a target control scenario based on the target user and the operation content. The target control scenario is a single-person control scenario or a multi-person control scenario. The target control scenario includes at least one piece of operation information, which includes operation object information and corresponding operation instructions. The first operation instruction sending module is used to send corresponding operation instructions to each operation object in the operation information through the smart gateway.

7. A smart gateway, characterized in that, include: The second audio command receiving module is used to receive audio commands sent by the user equipment. The voiceprint processing module is used to directly obtain voiceprint information from the voice command when the connection with the cloud server is not normal, determine the operation content corresponding to the voice command, determine the target user based on the voiceprint information, and determine the target control scenario based on the target user and the operation content. The target control scenario is a single-person control scenario or a multi-person control scenario. The target control scenario includes at least one operation information, which includes operation object information and corresponding operation command. The voice command sending module is used to send the voice command to the cloud server when the cloud server connection is normal, so that the cloud server can obtain voiceprint information from the voice command, determine the operation content corresponding to the voice command, determine the target user based on the voiceprint information, and determine the target control scene based on the target user and the operation content. The second operation instruction sending module is used to send corresponding operation instructions to each operation object in the operation information; Target users identified based on voiceprint information include: Calculate the amount of voiceprint information obtained from the voice command; If only one voiceprint information is obtained from the voice command, the target user is determined directly based on the voiceprint information. If two or more voiceprint information are obtained from the voice command, then check whether there is an execution record of the voice command in the historical response record. 
If the execution record of the sound command exists in the historical response record, then the sound command is responded to according to the execution record; If there is no execution record of the voice command in the historical response record, an inquiry message is sent to the user device through the smart gateway. The inquiry message is used to ask the user whether to switch to a multi-person scene. If a switching instruction is received from the user equipment through the smart gateway within a preset time period, the switching instruction includes multi-user mode or single-user mode; If the switching instruction is for multi-user mode, then obtain all target users related to the operation content and the voice command; If the switching instruction is not received within the preset time period, or if the switching instruction is for single-person mode, then the target user corresponding to the voice command is directly obtained.

8. A network device control system based on voiceprint recognition, characterized in that, include: The user equipment collects sound information, generates sound commands, and sends the sound commands to the smart gateway. The smart gateway receives voice commands sent by user devices; When a normal connection to the cloud server cannot be established, voiceprint information is directly obtained from the voice command to determine the operation content corresponding to the voice command. The target user is determined based on the voiceprint information, and the target control scenario is determined based on the target user and the operation content. The target control scenario is either a single-person control scenario or a multi-person control scenario. The target control scenario includes at least one operation information, which includes operation object information and corresponding operation commands. When the connection to the cloud server is normal, the voice command is sent to the cloud server. The cloud server receives voice commands sent by the smart gateway and obtains voiceprint information from the voice commands. Determine the operation content corresponding to the sound command; The target user is determined based on the voiceprint information; the target control scenario is determined based on the target user and the operation content, the target control scenario is a single-person control scenario or a multi-person control scenario, the target control scenario includes at least one operation information, the operation information includes operation object information and corresponding operation instructions; The smart gateway sends corresponding operation instructions to each operation object in the operation information. 
Target users identified based on voiceprint information include: Calculate the amount of voiceprint information obtained from the voice command; If only one voiceprint information is obtained from the voice command, the target user is determined directly based on the voiceprint information. If two or more voiceprint information are obtained from the voice command, then check whether there is an execution record of the voice command in the historical response record. If the execution record of the sound command exists in the historical response record, then the sound command is responded to according to the execution record; If there is no execution record of the voice command in the historical response record, an inquiry message is sent to the user device through the smart gateway. The inquiry message is used to ask the user whether to switch to a multi-person scene. If a switching instruction is received from the user equipment through the smart gateway within a preset time period, the switching instruction includes multi-user mode or single-user mode; If the switching instruction is for multi-user mode, then obtain all target users related to the operation content and the voice command; If the switching instruction is not received within the preset time period, or if the switching instruction is for single-person mode, then the target user corresponding to the voice command is directly obtained.

9. A computer device, comprising: a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the computer device is running, the processor communicates with the memory via the bus, and the machine-readable instructions are executed by the processor to implement the network device control method based on voiceprint recognition as described in any one of claims 1 to 6.

Citation Information

Patent Citations

Intelligent interaction processing method and system based on AI voice and storage medium
CN110113646A
Bottleneck and channel segmentation-based lightweight speaker identification method and system
CN114220438A
