DISTINGUISHING VOCAL COMMANDS
Patent Information
- Authority / Receiving Office
- DE · DE
- Patent Type
- Patents
- Current Assignee / Owner
- INTERNATIONAL BUSINESS MACHINES CORPORATION
- Filing Date
- 2020-08-13
- Publication Date
- 2026-07-02
AI Technical Summary
Voice-controlled units often mistakenly activate due to background noise from non-human sources like TVs and radios, leading to unintended commands, and the introduction of mobile units further complicates this issue by introducing new background noise sources.
A method and system for filtering voice commands by establishing blocked directions, determining if commands originate from these directions, and ignoring them if they do, using triangulation and voice recognition to differentiate between registered users and background noise.
Effectively reduces unintended activations by accurately identifying and ignoring commands from non-human sources and new mobile units, enhancing the reliability of voice-controlled systems.
Abstract
Description
Technical field
[0001] The present disclosure relates to voice-controlled units and, in particular, to the filtering of voice commands.
BACKGROUND
[0002] Voice-controlled units (SGEs) are controlled by human voice commands. This eliminates the need to operate a unit using hand controls such as buttons, dials, switches, user interfaces, etc. It also allows users to operate units while their hands are occupied with other tasks or when they are not close enough to touch the unit.
[0003] SGEs can take various forms, such as units with a specific purpose like household appliances, controllers for other devices, or personal assistants. SGEs in the form of virtual personal assistants can be integrated into computing devices like mobile phones. Virtual personal assistants can provide voice-activated instructions for performing tasks or services in response to voice commands and input.
[0004] SGEs can be activated by a voice command in the form of one or more activation words. SGEs can use speech recognition to be programmed to respond only to the voice of a registered person or group of registered people. This prevents unregistered users from issuing commands. Other types of SGEs are not tailored to registered users and allow any user to issue a command in the form of specific command words and instructions.
[0005] Voice-controlled units (SGEs) are controlled by human voice commands, eliminating the need to operate a unit using hand controls such as buttons, dials, switches, user interfaces, etc. This allows users to operate units while their hands are occupied with other tasks or when they are not close enough to the unit to touch it.
[0006] Complications arise when an SGE is triggered by a voice command from a television, radio, computer, or other non-human unit that is emitting speech in the immediate vicinity of the SGE.
[0007] For example, a smart speaker with a voice-activated intelligent personal assistant might be deployed in a living room. The smart speaker might mistakenly respond to audio signals from a television. Sometimes this might be a harmless command that the smart speaker doesn't understand; however, occasionally the sound is a valid command or wake word that can trigger an action from the intelligent personal assistant.
[0008] Therefore, there is a need in the art to address the aforementioned problem.
SUMMARY
[0009] In a first aspect, the present invention provides a computer-implemented method for filtering voice commands, the method comprising: establishing a data exchange with a voice-controlled unit located at a certain location; receiving data indicating blocked directions from the voice-controlled unit; receiving a voice command; determining that the voice command is being received from a blocked direction specified in the data; and ignoring the received voice command.
[0010] In another aspect, the present invention provides a system for filtering voice commands, wherein the system comprises: a memory that stores program instructions; and a processor configured to execute the program instructions in order to perform a method comprising: establishing a data exchange with a voice-controlled unit located at a specific location; receiving data from the voice-controlled unit indicating blocked directions; receiving a voice command; determining that the voice command is being received from a blocked direction specified in the data; and ignoring the received voice command.
[0011] In another aspect, the present invention provides a computer program product for filtering speech commands, wherein the computer program product has a computer-readable storage medium that can be read by a processing circuit and stores instructions for execution by the processing circuit in order to execute a method for carrying out the steps of the invention.
[0012] In another aspect, the present invention provides a computer program that is stored on a computer-readable medium and can be loaded into the internal memory of a digital computer, which includes software code sections to perform the steps of the invention when the program is executed in a computer.
[0013] In another aspect, the present invention provides a computer program product comprising a computer-readable storage medium containing program instructions, wherein the computer-readable storage medium is not a transitory signal in itself, and wherein the program instructions are executable by a processor to cause the processor to execute a method comprising: establishing a data exchange with a voice-controlled unit located at a specific location; receiving data from the voice-controlled unit indicating blocked directions; receiving a voice command; determining that the voice command is being received from a blocked direction specified in the data; and ignoring the received voice command.
[0014] Embodiments of the present disclosure comprise a method, a computer program product, and a system for filtering voice commands. A data exchange can be established with a voice-controlled unit located at a specific location. Data indicating blocked directions can be received from the voice-controlled unit. A voice command can be received. It can be determined that the voice command is being received from a blocked direction specified in the data. The received voice command can then be ignored.
[0015] The above summary is not intended to describe every illustrated embodiment or every implementation of the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] The drawings contained in this disclosure are integrated into the description and form part of it. They illustrate embodiments of the disclosure and, together with the description, serve to explain the fundamental concepts of the disclosure. The drawings merely illustrate typical embodiments and do not limit the disclosure.
Fig. 1 is a schematic representation illustrating an environment in which embodiments of the present disclosure may be implemented.
Fig. 2A is a flowchart illustrating an exemplary method for filtering voice commands based on a blocked direction according to embodiments of the present disclosure.
Fig. 2B is a flowchart illustrating an exemplary procedure for determining whether voice commands are to be executed, according to embodiments of the present disclosure.
Fig. 2C is a flowchart illustrating an exemplary procedure for querying an audio output unit to determine whether a voice command should be ignored, according to embodiments of the present disclosure.
Fig. 3 is a flowchart illustrating an exemplary procedure for transferring an audio file to a voice-controlled unit according to embodiments of the present disclosure.
Fig. 4 is a block diagram of a voice-controlled unit according to embodiments of the present disclosure.
Fig. 5 is a block diagram of an audio output unit according to embodiments of the present disclosure.
Fig. 6 is a flowchart illustrating an exemplary procedure for filtering voice commands in a mobile unit according to embodiments of the present disclosure.
Fig. 7 is a flowchart illustrating an exemplary procedure for updating a voice-controlled unit with a blocked direction corresponding to the location of a mobile unit, according to embodiments of the present disclosure.
Fig. 8 is an overview block diagram illustrating an exemplary computer system that can be used in implementing one or more of the methods, utilities and modules described herein and all related functions, according to embodiments of the present disclosure.
Fig. 9 is a representation illustrating a cloud computing environment according to embodiments of the present disclosure.
Fig. 10 is a block diagram illustrating abstraction model layers according to embodiments of the present disclosure.
[0017] While the embodiments described herein are amenable to various modifications and alternative forms, specifics thereof are shown by way of example in the drawings and are described in detail. It should be understood, however, that the specific embodiments described are not to be taken in a limiting sense. Rather, the intention is to cover all modifications, equivalents, and alternatives that fall within the scope of the disclosure.
DETAILED DESCRIPTION
[0018] Aspects of the present disclosure generally concern the field of voice-controlled units and, in particular, the filtering of voice commands. While the present disclosure is not necessarily limited to such applications, various aspects of the disclosure can be understood by explaining different examples in this context.
[0019] Aspects of this disclosure are aimed at differentiating voice commands in a voice-controlled unit. If an audio output unit (e.g., a television) generates background noise over an extended period, the direction of the audio output unit can be added as a blocked direction, so that the voice-controlled unit is not activated by audio output from that direction. The voice-controlled unit can be configured to recognize registered users from the blocked direction, so that commands from registered users from the blocked direction are executed. However, if an unregistered user attempts to issue commands from the blocked direction (e.g., near the television), the unregistered user may be erroneously ignored. Accordingly, aspects of the present disclosure overcome the aforementioned complications by querying the audio output unit to determine whether it is producing sound. If the audio output unit is not producing sound at the time the command is issued, the unregistered user's command can be processed. If the audio output unit is producing sound at the time the command is issued, an audio file can be captured from the audio output unit and compared with the received voice command. If the voice command and the audio file are substantially similar, the command can be ignored (since, for example, it was likely received from the television). If the voice command and the audio file are not substantially similar, the voice command can be processed because it did not originate from the audio output unit.
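The decision described in this paragraph can be sketched roughly as follows. This is only an illustrative sketch, not the disclosed implementation: the names `AudioUnitStatus`, `normalized_similarity`, and `should_process`, the cosine-similarity metric, and the 0.9 threshold are all assumptions introduced here, since the disclosure does not specify how "substantially similar" is measured.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class AudioUnitStatus:
    """Hypothetical reply from an audio output unit (e.g., a television)."""
    is_emitting_sound: bool
    sample: Optional[Sequence[float]] = None  # audio captured from the unit, if any


def normalized_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two sample sequences (illustrative metric only)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    dot = sum(x * y for x, y in zip(a[:n], b[:n]))
    norm_a = sum(x * x for x in a[:n]) ** 0.5
    norm_b = sum(y * y for y in b[:n]) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_process(
    command_audio: Sequence[float],
    query_unit: Callable[[], AudioUnitStatus],
    threshold: float = 0.9,  # assumed cutoff for "substantially similar"
) -> bool:
    """Decide whether a command from a blocked direction should be processed."""
    status = query_unit()
    if not status.is_emitting_sound:
        # The unit is silent, so the command cannot have come from it.
        return True
    if status.sample is None:
        # Assumption: without a sample to compare, ignore the command.
        return False
    # Process only if the command does not match what the unit is playing.
    return normalized_similarity(command_audio, status.sample) < threshold
```

A real system would compare time-aligned audio or extracted features rather than raw samples; the sketch only shows the order of the checks.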
[0020] Furthermore, aspects recognize that statically programming blocked directions in SGEs can lead to problems when new mobile units enter the vicinity. Mobile units themselves can become a source of background noise, which can also lead to the erroneous activation of voice commands in the SGE. Accordingly, aspects of the present disclosure are directed towards updating an SGE when new background sources (e.g., a mobile unit) enter the vicinity of the SGE.
[0021] Furthermore, the mobile units themselves may also include voice command functionalities. However, the mobile units may not have been updated with the current blocked directions, authenticated user voices, and other important aspects. Accordingly, aspects enable the synchronization of SGE data (e.g., blocked directions, authenticated user voices, and other aspects such as the volume status of the audio output unit) between SGEs (e.g., a dedicated SGE and a mobile unit with SGE functionalities).
[0022] Fig. 1 shows a schematic representation of an environment 110 (e.g., a room) in which a voice-controlled unit (SGE) 120 can be placed. For example, the SGE 120 could be in the form of a smart speaker with a voice-controlled intelligent personal assistant, located on a table next to a sofa in the environment 110.
[0023] The environment 110 can include a television 114, from which sound can be output via two speakers 115, 116 belonging to the television 114. The environment 110 can also include a radio 112 with one speaker.
[0024] The SGE 120 can receive audio input from the two television speakers 115 and 116 and the radio 112 at various times. This audio input can include voices containing command words that unintentionally activate the SGE 120 or provide input to the SGE 120.
[0025] Aspects of the present disclosure provide the SGE 120 with additional functionality to learn directions (e.g., relative angles) of audio inputs that should be ignored for a given location of the SGE 120.
[0026] Over time, the SGE 120 can learn to detect sources of background noise in an environment 110 based on the direction of their audio input to the SGE 120. In this example, the radio 112 is located at an angle of approximately 0 degrees to the SGE 120, and a hatched triangle 131 illustrates how the audio output of the radio 112 can be received by the SGE 120. The two speakers 115, 116 of the television 114 can be detected at directions of approximately 15 to 20 degrees and 40 to 45 degrees, respectively, and hatched triangles 132, 133 illustrate how the audio output of the speakers 115, 116 can be received by the SGE 120. Over time, these directions can be learned by the SGE 120 as blocked directions from which audio commands must be ignored.
[0027] In another example, an SGE 120 could be a stationary device (e.g., a washing machine), and a blocked direction could be learned for an audio input source such as a radio in the same room as the washing machine.
[0028] If the SGE 120 receives a command from these blocked directions, it may ignore the command unless it is configured to accept commands from these directions from the voice of a known registered user.
[0029] In some embodiments, the SGE 120 can be configured to identify commands from an unknown person 140 speaking from a blocked direction (e.g., from the direction of one of the audio speakers 115, 116 of the television 114 or from the direction of the radio 112). As described below, the SGE 120 can be configured to query the audio output unit (e.g., the television 114 or the radio 112) to determine its status (e.g., whether the audio unit is outputting sound). If the audio unit's status indicates that it is not outputting sound, the command from the unknown person 140 can be executed, even if the unknown person is speaking from a blocked direction.
[0030] Embodiments allow an SGE 120 located at a specific location to ignore unwanted command sources for that location (e.g., in a specific direction relative to the SGE 120) without overlooking commands from unknown users.
[0031] Furthermore, aspects recognize that a mobile unit 150 may have entered the environment 110. The mobile unit 150 could be a smartphone with local (e.g., Bluetooth and Wi-Fi) and wide-area wireless capabilities (e.g., 4G), as well as processing and storage functions. The mobile unit 150 may also include technologies for audio output (one or more speakers) and audio input (one or more microphones). The mobile unit 150 may also be equipped with software functionality for voice control / commands, enabling the mobile unit 150 to be controlled by a user's voice.
[0032] The dynamic introduction of the mobile unit 150 into the environment 110 has the potential to disrupt the operation of the existing static SGE 120, as the mobile unit 150 can be considered an additional source of (background) audio output, which could lead to commands from the mobile unit 150 being inadvertently received and executed by the SGE 120. Similarly, audio output from the existing background audio sources (the radio 112 and the television 114) could interfere with the operation of the voice control function of the mobile unit 150 (assuming the mobile unit 150 has such a function). The introduction of new units like the mobile unit 150 can be managed by the SGE 120 in such a way as to minimize the probability of an interruption of the voice-controlled operation of the SGE 120 and the operation of the mobile unit 150.
[0033] As explained in more detail below, the SGE 120 can store the details of one or more blocked sources of background speech noise. The details stored by the SGE 120 could be structured in one or more different ways, for example, with directional information and / or other information about the identity of the unit(s) generating the background speech noise, and / or the characteristics of the background speech noise (e.g., sound and pitch) likely to be emitted by the specific unit(s) present that might inadvertently issue voice commands. This information is retained by the SGE 120 for its own purposes, to filter out unwanted voice commands that might originate from a background source of speech noise rather than a user.
[0034] The mobile unit 150, entering environment 110 where the static SGE 120 is located, can cause the SGE 120 to take a series of actions to improve the operation of both the SGE 120 and the mobile unit 150. The SGE 120 can be considered static (e.g., at a fixed location within environment 110), and the mobile unit 150 can be considered dynamic (e.g., moving around within environment 110). Ultimately, both units are capable of functioning as SGEs, but one is static and the other dynamic. The units 120 and 150 are operated such that the mobile unit 150 exchanges data with the static SGE 120 every time the mobile unit 150 is activated by an audio command, in order to utilize the background noise source information that the SGE 120 has received. This allows the mobile unit 150 to prevent unwanted commands from being executed by background noise sources when the mobile unit 150 moves to a new environment. Furthermore, the units 120 and 150 are configured so that the SGE 120 recognizes the mobile unit 150 as a temporary background noise source, enabling the stationary SGE 120 to block unwanted commands emanating from the mobile unit's speakers, for example, when the mobile unit 150 is playing a video, streaming a TV program, a movie, or an audio track. This cooperation between the SGE 120 and the mobile unit 150 improves the operation of both units.
[0035] This improved operation of the SGE 120 and the mobile unit 150 can be implemented as a software update for existing SGEs and personal assistant devices such as mobile phones. The process can be activated when a mobile unit 150 enters a new environment where an SGE 120 is present. The mobile unit 150 can exchange data with the SGE 120 located in the environment 110 via Bluetooth, Wi-Fi, or another localized communication technology to establish a pairing. As part of this initial handshake procedure, the mobile unit 150 can transmit appropriate activation words and / or phrases used to activate the mobile unit's personal assistant. The SGE 120 would then store these activation words. The SGE 120 would also store details about previously paired mobile units 150, so that the activation words do not have to be transmitted every time the same mobile unit 150 enters the environment 110.
[0036] The SGE 120 then listens for all occurrences of the mobile unit's stored activation words / phrases. When the SGE 120 detects activation words / phrases, it stores an instance in its buffer, detailing the phrase used, the time / date it was received, and whether or not it originated from a known background noise source. Similarly, upon detecting an activation word / phrase, the mobile unit 150 sends a query to the SGE 120 before processing the command to determine if the command originated from one of the SGE 120's known background noise sources. If the activation word / phrase originates from a known background noise source in the environment (according to the SGE 120), the mobile unit 150 can ignore the command. Otherwise, the mobile unit 150 can execute the command as usual.
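The buffer and query described in this paragraph could look roughly like the following sketch. The class and method names (`ActivationEvent`, `ActivationBuffer`, `record`, `originated_from_background`) and the five-second matching window are hypothetical; the disclosure only states that the phrase, the time / date, and the background-source flag are stored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class ActivationEvent:
    phrase: str                   # activation word / phrase that was heard
    received_at: datetime         # time / date of reception
    from_background_source: bool  # True if it came from a known noise source


class ActivationBuffer:
    """Hypothetical buffer kept by the SGE for a paired mobile unit."""

    def __init__(self, max_age: timedelta = timedelta(seconds=5)):
        self._events: List[ActivationEvent] = []
        self._max_age = max_age  # assumed window for matching a query to an event

    def record(self, phrase: str, received_at: datetime,
               from_background_source: bool) -> None:
        self._events.append(
            ActivationEvent(phrase.strip().lower(), received_at,
                            from_background_source))

    def originated_from_background(self, phrase: str, heard_at: datetime) -> bool:
        """Answer a mobile unit's query about a phrase it has just heard."""
        wanted = phrase.strip().lower()
        for event in reversed(self._events):  # newest entries first
            close_in_time = abs(heard_at - event.received_at) <= self._max_age
            if event.phrase == wanted and close_in_time:
                return event.from_background_source
        return False  # no matching entry: the mobile unit proceeds as usual

    def clear(self) -> None:
        """Empty the buffer, e.g., once the mobile unit has left."""
        self._events.clear()
```

In this sketch the mobile unit would call `originated_from_background` before executing a command and ignore the command when the answer is `True`.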
[0037] At regular intervals, the SGE 120 can determine whether the mobile unit 150 is still within the environment 110 by means of a signal (e.g., a high-frequency audio signal emitted by the mobile unit 150). If the mobile unit 150 leaves the environment 110, the SGE 120 can clear its buffer containing observed activation words / phrases and stop storing future commands. In some embodiments, the mobile unit 150 updates the SGE 120 with its position using the signal, allowing the SGE 120 to update its blocked directions with the location of the mobile unit 150.
[0038] Fig. 2A is a flowchart illustrating an exemplary method 200 for filtering voice commands based on a blocked direction according to embodiments of the present disclosure.
[0039] Method 200 begins by determining one or more directions of background speech noise. This is shown in step 201. The SGE 120 can learn the directions for the location by receiving background audio input from these directions and analyzing the relative direction from which the audio input is received (which may include both speech and non-speech background noise). In some embodiments, the one or more directions of background noise are determined by triangulation, i.e., by measuring the audio input at two or more known locations (e.g., at microphones mounted in the SGE 120) and determining the direction and / or position of the audio input by measuring the angles from the known points. Triangulation can be performed by installing two or more microphones in the SGE 120 and comparing the audio data received by these microphones. In some embodiments, the one or more blocked directions can be determined by the time difference of arrival (TDOA). This method can also be performed using two or more microphones. The data received by these microphones can be analyzed to determine the position of the received audio input based on the time difference of the audio input. In some embodiments, the one or more blocked directions can be determined by assigning sensors (e.g., optical sensors, GPS sensors, RFID tags, etc.) to loudspeakers in the room (e.g., loudspeakers 112, 115, and 116 of Fig. 1) and using the sensors to determine the blocked direction. For example, optical sensors can be assigned to the SGE 120 and one or more loudspeakers in the room. The blocked directions can then be determined by the optical sensor associated with the SGE 120. Alternatively, directions of background speech noise can also be configured by a user. In some embodiments, the blocked directions can be determined based on the location of one or more mobile units (e.g., the mobile unit 150 of Fig. 1) in an environment.
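As a rough illustration of the TDOA option, the sketch below converts the arrival-time difference between two microphones into an angle of incidence. It assumes a far-field source, a two-microphone array, and a speed of sound of about 343 m/s; the function name `tdoa_to_angle_deg` is hypothetical, and the disclosure does not prescribe this particular formula.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at about 20 °C


def tdoa_to_angle_deg(delay_s: float, mic_spacing_m: float) -> float:
    """Estimate the angle of incidence from a two-microphone time difference.

    Far-field model: sin(theta) = c * delay / spacing, where theta is measured
    from the direction perpendicular to the line joining the microphones.
    """
    if mic_spacing_m <= 0:
        raise ValueError("microphone spacing must be positive")
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    # Clamp to [-1, 1] so measurement noise cannot push asin out of its domain.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A zero delay corresponds to a source directly in front of the microphone pair. Two microphones alone cannot tell front from back, which is one reason more than two microphones may be used.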
[0040] The one or more blocked directions are then stored. This is shown in step 202. The one or more blocked directions can be stored in any suitable memory (e.g., flash memory, RAM, hard disk storage, etc.). In some embodiments, the one or more blocked directions are stored in the local memory of the SGE 120. In some embodiments, the one or more blocked directions can be stored in another device and transferred over a network.
[0041] Furthermore, the SGE 120 determines recognized voice biometrics. This is illustrated in step 203. Determining recognized voice biometrics can be achieved, for example, by applying speech recognition to one or more registered voices. The speech recognition can utilize voice characteristics such as pitch and tone. The recognized voice biometrics can be stored in the SGE 120 to determine whether incoming voices are registered in the SGE 120. This can be done by comparing the incoming voices with the recognized voice biometrics to determine if the incoming voice is a recognized voice.
[0042] A voice input is then received. This is shown in step 204. The voice input can be received from a human or a non-human entity. Accordingly, the term "voice input" does not necessarily refer to a voice, but can also include background noise (e.g., a running washing machine, music from a speaker, etc.).
[0043] The system determines whether the speech input originates from a blocked direction. This is illustrated in step 205. Determining whether the speech input originates from a blocked direction can be done by comparing the stored blocked directions with the direction from which the speech input was received to see if the received speech input belongs to a stored blocked direction. If the speech input originates from a direction different from the stored blocked directions, it can be determined that the speech input does not originate from a blocked direction.
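A minimal sketch of the comparison in step 205 follows, assuming that each blocked direction is stored as a range of angles in degrees. The names `BlockedDirection` and `is_blocked` are hypothetical, and the example ranges loosely follow the angles given for Fig. 1, with an assumed tolerance of 5 degrees around the radio at 0 degrees.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BlockedDirection:
    start_deg: float
    end_deg: float

    def contains(self, angle_deg: float) -> bool:
        a = angle_deg % 360.0
        s = self.start_deg % 360.0
        e = self.end_deg % 360.0
        if s <= e:
            return s <= a <= e
        # The range wraps through 0 degrees (e.g., 355 to 5).
        return a >= s or a <= e


def is_blocked(angle_deg: float, blocked: Iterable[BlockedDirection]) -> bool:
    """Return True if the input direction falls inside any stored blocked range."""
    return any(b.contains(angle_deg) for b in blocked)


# Example ranges based on Fig. 1; the tolerance around 0 degrees is an assumption.
EXAMPLE_BLOCKED = [
    BlockedDirection(-5.0, 5.0),   # radio 112 at approximately 0 degrees
    BlockedDirection(15.0, 20.0),  # television speaker 115
    BlockedDirection(40.0, 45.0),  # television speaker 116
]
```

With these ranges, an input arriving at 17.5 degrees is treated as coming from a blocked direction, while one arriving at 30 degrees is not.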
[0044] If it is determined that the speech input is not coming from a blocked direction, the speech input is processed. This is illustrated in step 206. In some embodiments, the processing includes recognizing a command and executing the received command. In some embodiments, the processing may include comparing the received speech input with stored command data (e.g., data specifying command words and command initiation protocols) to determine whether the received speech input corresponds to (e.g., matches) a command in the stored command data. For example, if the speech input includes the phrase "turn off the power" and "turn off the power" is specified as a command initiation phrase in the stored command data, it can be determined that the speech input is a command, and the command can be executed (e.g., the power can be turned off).
[0045] If it is determined that the speech input is being received from a blocked direction, an audio unit associated with that blocked direction can be queried to verify the speech input. This is shown in step 207. For example, with reference to Fig. 1, if an audio input is received from the direction of speaker 116, the television 114 can be queried to determine whether it is switched off or muted. If it is determined that the television is switched off or muted (no sound is being output), the voice input can be processed (e.g., a voice command can be executed) because the voice input was received from the unknown person 140.
[0046] The aforementioned processes can be carried out in any order and are not limited to those described. Furthermore, some, all, or none of the aforementioned processes can be carried out, while this still remains within the scope of protection of this disclosure.
[0047] Figs. 2B and 2C are flowcharts which together illustrate a method 250 for filtering audio speech data received by a voice-controlled unit (e.g., the SGE 120) according to embodiments of the present disclosure. Fig. 2B is a flowchart illustrating an exemplary procedure according to embodiments of the present disclosure to determine whether voice commands should be executed based on a direction and recognized speech biometric data. Fig. 2C is a flowchart illustrating an exemplary procedure for querying an audio output unit according to embodiments of the present disclosure in order to determine whether a voice command should be ignored.
[0048] Method 250 begins with the registration of one or more recognized voices in the SGE. This is illustrated in step 252. For example, the SGE may be configured to register a primary user to ensure that the voice's speech biometric data is easily recognized. Voice registration can be performed by analyzing various speech inputs from a primary user. The analyzed speech inputs can then be used to determine a tone and pitch for the primary user. Alternatively, regularly received voices can be automatically learned and registered by recording pitch and tone data received during use of the unit. The presence of speech recognition functionality may not preclude the SGE from accepting commands or inputs from other unregistered voices. In some embodiments, voices may not be registered, and the method may block directions for all voice inputs.
[0049] A speech input is then received. This is shown in step 253. In some embodiments, non-speech inputs are automatically filtered, which may be an associated functionality of an SGE. In some embodiments, speech inputs may include background speech noise or sound sequences (which may contain, for example, non-human speech inputs).
[0050] The next step is to determine whether the speech input belongs to a recognized voice. This is shown in step 254. Determining whether the speech input belongs to a recognized voice can be done by analyzing the speech biometric data of the received speech input (e.g., by performing a pitch analysis of the received speech input) and by comparing the analyzed speech input data with the registered speech biometric data.
[0051] If the voice input is associated with a recognized voice and is a command, the command is executed. This is illustrated in step 255. This allows the SGE to respond to voice input from registered users, regardless of the direction from which the voice input is received, even if it is received from a blocked direction. The procedure can store a data point of a valid command for learning purposes, which may include optionally storing the time and direction from which the voice command was received. This can be used, for example, to learn common directions of valid commands, such as from a preferred position relative to the SGE, which can be amplified for more sensitive command recognition.
[0052] If the speech input does not correspond to a recognized voice, it can be determined whether the speech input originates from a blocked direction. This is illustrated in step 256. Determining the direction of a speech input may involve measuring the angle of the incoming speech input. SGEs may have known functionality for evaluating the direction of incoming sound. For example, multiple microphones may be attached to the unit, and capturing the sound via the multiple microphones may allow for position determination (e.g., using triangulation or time difference of arrival). A blocked direction may be stored as a range of angles of incidence of incoming sound relative to the receiver. In the case of multiple microphones, the blocked direction may correspond to speech inputs that are predominantly or strongly received by one or more of the multiple microphones. The direction of the incoming sound can be determined in a three-dimensional arrangement, with input directions from above or below as well as in a lateral direction around the SGE being determined.
[0053] If the voice input does not originate from a blocked direction, it is determined whether the voice input is a command. This is illustrated in step 257. If the voice input is determined to be a command, the command can be executed in step 255. This allows a command to be executed by an unregistered voice, i.e., by a new or guest user, and does not restrict the use of the SGE to registered users. In some embodiments, the command can be stored as a command data point, which includes the direction from which the command was received. This can be parsed for further voice registration or to determine whether the command is overwritten by subsequent user input. The procedure can then terminate and wait for further voice input (e.g., in step 253).
[0054] If it is determined that the speech input is not coming from a blocked direction and is not a command, it is identified as background noise. The background noise data is then stored along with the time, date, and direction (e.g., angle of incidence) from which the background noise was received. This is shown in step 259. The direction can then be added as a blocked direction, so that speech inputs repeatedly received from that direction can be blocked. A threshold can be implemented to determine when speech input received from an unblocked direction that does not specify a command can be classified as background noise.
[0055] For example, a plurality of speech inputs can be received from a specific direction. Each of these speech inputs can be received at a specific time. The plurality of received speech inputs can be compared to stored command data to determine whether each of the received speech inputs corresponds to the stored command data. The number of speech inputs from the plurality that do not correspond to the stored command data can be determined. The number of speech inputs that do not correspond to the stored command data can be compared to a threshold for non-command speech inputs (e.g., a threshold that sets a number of non-command speech inputs that can be received for a specific direction before that specific direction is stored with one or more blocked directions). In response to the number of non-command voice inputs exceeding the threshold for non-command voice inputs, the direction of each input can be stored along with one or more blocked directions. In some embodiments, the background noise threshold includes information about the sound characteristics, such as frequency and amplitude.
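The counting scheme described above can be sketched as follows. The class name `BlockedDirectionLearner`, the grouping of directions into 10-degree buckets, and the default threshold of three are hypothetical choices for illustration only.

```python
from collections import defaultdict


class BlockedDirectionLearner:
    """Counts non-command inputs per direction and blocks noisy directions."""

    def __init__(self, non_command_threshold: int = 3, bucket_deg: int = 10):
        self.non_command_threshold = non_command_threshold
        self.bucket_deg = bucket_deg  # directions are grouped into angular buckets
        self.non_command_counts = defaultdict(int)
        self.blocked_buckets = set()

    def _bucket(self, angle_deg: float) -> int:
        return int(angle_deg % 360 // self.bucket_deg)

    def record_input(self, angle_deg: float, matches_command: bool) -> None:
        """Record one speech input; block its direction once the count exceeds the threshold."""
        if matches_command:
            return  # inputs that correspond to stored command data are not counted
        bucket = self._bucket(angle_deg)
        self.non_command_counts[bucket] += 1
        if self.non_command_counts[bucket] > self.non_command_threshold:
            self.blocked_buckets.add(bucket)

    def is_blocked(self, angle_deg: float) -> bool:
        return self._bucket(angle_deg) in self.blocked_buckets
```

In this sketch a direction becomes blocked only when the count exceeds the threshold, mirroring the wording of the paragraph above.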
[0056] If it is determined that the speech input is coming from a blocked direction, it is assessed whether the speech input is a command. This is shown in step 258. If the speech input is coming from a blocked direction and is not a command, it is stored as a source of background noise. This is shown in step 259.
[0057] If it is determined that the voice input coming from a blocked direction is a command, procedure 250 continues in Fig. 2C, where one or more audio output units are detected in the blocked direction. This is illustrated in step 271. The SGE can manage a data store with blocked directions and the audio output units in each blocked direction, along with a transmission channel or address for querying each audio output unit. In some embodiments, the method can detect all audio output units in the immediate vicinity of the SGE and use a unified communication method to exchange data with all audio output units.
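The branches described in the preceding paragraphs for a speech input that does not correspond to a recognized voice can be summarized in a small decision function. This is only a sketch; the `Action` enum and the function name are hypothetical, and the step numbers in the comments are taken from the description above.

```python
from enum import Enum


class Action(Enum):
    EXECUTE = "execute command"                # step 255
    STORE_NOISE = "store as background noise"  # step 259
    CHECK_AUDIO_UNITS = "check audio units"    # continue in Fig. 2C (step 271)


def classify_unrecognized_input(from_blocked_direction: bool, is_command: bool) -> Action:
    """Decide what to do with a speech input from an unrecognized voice."""
    if not is_command:
        # Non-commands are stored as background noise whether or not the
        # direction is already blocked (steps 257/258 leading to step 259).
        return Action.STORE_NOISE
    if from_blocked_direction:
        # A command from a blocked direction is checked against nearby audio output units.
        return Action.CHECK_AUDIO_UNITS
    return Action.EXECUTE
```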
[0058] The one or more detected audio output units are queried. This is shown in step 272. In embodiments where there are multiple audio output units in the blocked direction, multiple queries can be transmitted to two or more audio output units simultaneously. Querying audio output units can involve transmitting a request signal to the units to capture audio data associated with them. The audio output unit(s) can be queried for volume status (e.g., whether the unit(s) is / are muted or for the current volume level) and power status (e.g., whether the unit(s) is / are powered on). The status request can be transmitted using any suitable connection (e.g., intranet, internet, Bluetooth, etc.).
[0059] The process determines whether the queried audio unit is outputting sound. This is illustrated in step 273. Determining whether the audio output unit is outputting sound is performed based on the audio output unit's response to the query request. For example, if the response indicates that the audio output unit is muted, step 273 determines that the audio output unit is not outputting sound. Similarly, if the response (or lack thereof) indicates that the audio output unit is turned off, the process determines that the audio output unit is not outputting sound. In some embodiments, the determination may be based on a volume threshold. For example, if a volume threshold is set to 50% volume and the response to the query request indicates that the audio output unit volume is 40%, it can be determined that the audio output unit is not producing a sound loud enough to activate the SGE.
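The determination in step 273 might look like the following sketch. The shape of the status response (a dictionary with `powered_on`, `muted`, and `volume` keys) is an assumption for illustration, since the disclosure does not define a message format; treating a volume exactly at the threshold as loud enough is likewise an assumption.

```python
from typing import Optional


def is_outputting_sound(status: Optional[dict], volume_threshold: float = 0.5) -> bool:
    """Decide whether a queried unit is producing sound loud enough to activate the SGE.

    `status` is a hypothetical response such as
    {"powered_on": True, "muted": False, "volume": 0.4}; None means no response.
    """
    if status is None:  # a missing response is treated like a powered-off unit
        return False
    if not status.get("powered_on", False):
        return False
    if status.get("muted", False):
        return False
    return status.get("volume", 0.0) >= volume_threshold
```

With the 50% threshold from the example above, a unit reporting 40% volume is treated as not producing sound.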
[0060] If the audio output unit is found to be not producing sound (or the sound is too quiet to activate the SGE based on a volume threshold), the command is executed. This is shown in step 274. If the audio output unit is found to be producing sound (or the sound is loud enough to activate the SGE based on a volume threshold), a sound sample is requested from the audio output unit. This is shown in step 275. In some embodiments, the sound sample (e.g., an audio file, an audio clip, etc.) corresponds to the time at which the SGE was activated. For example, a 2- to 10-second sound sample may be requested that covers the time at which the SGE was activated. However, any other length of sound sample may be requested (e.g., the last hour, the last day, or the last SGE power-up session).
[0061] The requested audio file is then received by the SGE. This is shown in step 276. The audio file detected by the microphone is also received by the SGE. This is shown in step 277. The audio files are then processed for comparison in step 278. In some embodiments, the processing may include cleaning the audio files (e.g., removing static noise and background noise from the audio files). In some embodiments, the processing may include trimming the audio files (e.g., the audio file from the audio output unit and the audio file received from the SGE microphone) to a uniform length. In some embodiments, the processing may include amplifying the audio files. In some embodiments, the processing may include dynamically adjusting the timing of the audio files so that the words contained in the audio files are correctly aligned for comparison. However, in step 278, the audio files can be processed in any other suitable way. For example, in some embodiments, the audio files are converted into text (e.g., using a conventional speech-to-text conversion) so that the text (e.g., transcripts) can be compared between the audio files.
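Two of the processing options named for step 278, trimming to a uniform length and amplifying, can be sketched on raw sample lists as follows. The function names are hypothetical, and peak normalization is only one possible reading of "amplifying"; noise removal and time alignment are not shown.

```python
def trim_to_common_length(a: list[float], b: list[float]) -> tuple[list[float], list[float]]:
    """Cut both clips to the length of the shorter one."""
    n = min(len(a), len(b))
    return a[:n], b[:n]


def normalize_peak(samples: list[float]) -> list[float]:
    """Scale a clip so that its largest absolute sample is 1.0 (silence is returned unchanged)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s / peak for s in samples]
```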
[0062] The audio file received from the audio output unit and the audio file received at the SGE microphone are then compared. This is shown in step 279. In some embodiments, the audio files are compared using a Fast Fourier Transform (FFT). In embodiments where the audio files are converted to text, the transcripts for each audio file can be compared to determine whether the strings in each transcript match. In some embodiments, the text transcripts can be converted into a phonetic representation to avoid false positives, where words sound like a command but actually mean something else. Minor differences in recognition can be captured using known string similarity and text comparison techniques.
[0063] The next step is to determine whether there is a match between the audio file received from the audio output unit and the audio file received at the SGE microphone. This is illustrated in step 280. In embodiments, determining whether a match exists can be performed based on one or more thresholds. For example, if audio files are compared using FFT, there can be a threshold for the probability of match to determine whether the audio files are substantially similar. For instance, if a threshold for the probability of match is set to 70% and the FFT comparison yields a similarity of 60%, it can be determined that the audio files are substantially different. In another example, it can be determined that the audio files are substantially similar if the comparison yields a similarity of 75%.
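One possible way to obtain a similarity score for the threshold comparison is to compare the magnitude spectra of the two clips. The sketch below uses a naive discrete Fourier transform and cosine similarity purely for illustration; the disclosure only states that an FFT can be used, so the similarity measure, the function names, and the pure-Python transform (a stand-in for an FFT library) are all assumptions.

```python
import cmath
import math


def magnitude_spectrum(samples: list[float]) -> list[float]:
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short clips only)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(total))
    return spectrum


def spectral_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity (0..1) between the magnitude spectra of two equal-length clips."""
    if len(a) != len(b) or not a:
        raise ValueError("clips must be non-empty and of equal length")
    spec_a, spec_b = magnitude_spectrum(a), magnitude_spectrum(b)
    dot = sum(x * y for x, y in zip(spec_a, spec_b))
    norm = math.sqrt(sum(x * x for x in spec_a)) * math.sqrt(sum(y * y for y in spec_b))
    return dot / norm if norm else 0.0


def is_substantial_match(a: list[float], b: list[float], threshold: float = 0.70) -> bool:
    """Apply the match threshold (70% in the example above) to the similarity score."""
    return spectral_similarity(a, b) >= threshold
```

A whole-clip magnitude spectrum ignores timing, so a practical comparison would more likely work on short aligned frames; the sketch only shows how a similarity value is compared with the threshold.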
[0064] In embodiments where audio files are converted to text and compared, the match probability threshold can be based on the number of matching characters / words in the audio file transcripts. For example, a match probability threshold can specify an initial number of characters within each transcript that must match for the audio files to be considered a substantial match (e.g., 20 characters must match for the audio files of the voice-controlled unit and the audio output unit to be considered a substantial match). In another example, a match probability threshold can specify a number of words in each transcript that must match for the audio files to be considered a substantial match (e.g., five words must match for the audio files of the voice-controlled unit and the audio output unit to be considered substantially similar).
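A word-count threshold of this kind could be checked as in the following sketch, which uses Python's standard `difflib.SequenceMatcher` to find the longest run of consecutive words shared by both transcripts. Requiring the matching words to be consecutive, and the function names, are assumptions for illustration.

```python
from difflib import SequenceMatcher


def matching_word_count(transcript_a: str, transcript_b: str) -> int:
    """Length of the longest run of consecutive words shared by both transcripts."""
    words_a = transcript_a.lower().split()
    words_b = transcript_b.lower().split()
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    match = matcher.find_longest_match(0, len(words_a), 0, len(words_b))
    return match.size


def transcripts_match(transcript_a: str, transcript_b: str, min_words: int = 5) -> bool:
    """Apply the word-count threshold (five words in the example above)."""
    return matching_word_count(transcript_a, transcript_b) >= min_words
```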
[0065] If a match is found, the command originates from the audio output unit and can be ignored. This is shown in step 281. If no match is found, the command can be processed and executed because the sound originates from a source other than the audio output unit. This is shown in step 274.
[0066] If the command is ignored in step 281, it can be stored as an ignored command data point with a timestamp and date, along with the received direction of the voice input. This data can be used for blocked direction analysis. Furthermore, this data can be used to analyze whether a unit is still in a blocked direction by referencing the stored time and date of the background noise data points. A threshold number of unrecognized voice commands from a specific direction can be stored before that direction is added to the blocked directions.
[0067] Analyzing stored data points of valid voice commands, background noise input, and ignored or invalid commands recorded with incoming directions can be performed to learn blocked directions and, optionally, frequently used directions for valid commands. Regular data cleaning (e.g., formatting and filtering) can be executed as a background process on the stored data points.
[0068] Data points can be stored, enabling the process and the system to more accurately determine directions in which sounds should be ignored. Sounds not originating from a recognized voice coming from a known blocked direction can be ignored, regardless of whether they are a command or not.
[0069] Storing data points of different types of noise allows for further analysis of background noise and thus a more precise differentiation of which directions should be blocked. For example, the beeping of an oven can be received by an SGE from the direction of the oven. Over time, the background noise data points can be analyzed to determine whether a particular background noise data point in that direction has never contained a command, or whether the background noise data points from a specific direction are very similar in terms of audio content. In this case, commands from that direction can be allowed.
[0070] The SGE can include a user input mechanism to override blocked direction inputs or override the execution of a command. The process can also learn from such user override inputs to improve performance.
[0071] This method assumes that the SGE remains in the same location and space, which is often the case. If the SGE is moved to a new location, it can relearn its environment to detect blocked directions for non-human sources in relation to the SGE at the new location. The method can store blocked directions in relation to a specific SGE location, allowing the SGE to be moved back to a previous location and reconfigure the blocked directions without having to relearn its environment.
[0072] In some embodiments, the method can allow for a configuration of known directions to be blocked. This can eliminate the need to learn blocked directions over time, or it can be an additional step, allowing a user to preset blocked directions based on their knowledge of the directions from which interfering sound can be received.
[0073] The user can position the SGE at a location, and the SGE can accept input to configure blocked locations. This can be done via a graphical user interface, a remote programming service, etc. In one embodiment, blocked directions can be configured using voice commands, with the user standing at the angle to be blocked (e.g., in front of a television) and commanding that direction be blocked. In another embodiment, a room preconfiguration can be loaded and saved for use when the SGE is moved.
[0074] Using the example shown in Fig. 1, the SGE 120 can begin receiving commands from the television set 114 and storing the direction of these incoming commands. The system can then filter out commands and background noise that always originate from the same direction. Commands from this direction that do not match the voices that normally give commands from other directions are ignored.
[0075] One advantage of the described method is that, unlike units that block all commands not originating from a known user, the SGE can include an additional layer of verification in the form of a direction. If a new user approaches the SGE and issues a command, the SGE can still execute the command because it comes from a direction that is not blocked by an assignment to static sound-emitting objects.
[0076] The technical problem that is solved is enabling the unit to recognize that sounds originate from other units and not from human users. Furthermore, the present disclosure offers a technical advantage in determining whether commands coming from a blocked direction originate from an audio output unit or a human, by querying the unit for current sound samples.
[0077] The aforementioned processes can be carried out in any order and are not limited to those described. Furthermore, some, all, or none of the aforementioned processes can be carried out, and this will still remain within the scope of protection of this disclosure.
[0078] Fig. 3 is a flowchart illustrating an exemplary method for transferring an audio file to a voice-controlled unit according to embodiments of the present disclosure. Audio output units that produce sound, such as a television, radio, or mobile devices, may include software components that enable them to exchange data with a voice-controlled unit (SGE) on demand (e.g., a smart speaker or a smart television with network interface control units (NICs)). An audio output unit may require a software update to incorporate this functionality, and it may be necessary for the audio output unit to be connected to the same network as the SGE, e.g., the user's home Wi-Fi network.
[0079] Procedure 300 begins with an audio unit monitoring an audio output. This is shown in step 301. The audio output can be buffered for a predefined or configured duration. This is shown in step 302. For example, the audio output unit can buffer the last 10 seconds, 1 minute, 5 minutes, etc., of the audio output.
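The buffering in step 302 can be pictured as a fixed-size ring buffer that always holds the most recent audio output. The sketch below uses `collections.deque` with `maxlen`; the class name `AudioOutputBuffer`, its methods, and the representation of audio as a list of float samples are hypothetical.

```python
from collections import deque


class AudioOutputBuffer:
    """Keeps only the most recent `duration_s` seconds of output samples."""

    def __init__(self, duration_s: float, sample_rate_hz: int):
        self.sample_rate_hz = sample_rate_hz
        self._samples = deque(maxlen=int(duration_s * sample_rate_hz))

    def write(self, chunk: list[float]) -> None:
        """Append newly played samples; the oldest samples drop off automatically."""
        self._samples.extend(chunk)

    def latest(self, seconds: float) -> list[float]:
        """Return up to the last `seconds` of buffered audio."""
        count = min(len(self._samples), int(seconds * self.sample_rate_hz))
        return list(self._samples)[len(self._samples) - count:]
```

A unit configured for 10 seconds of buffering at 16 kHz would be created as `AudioOutputBuffer(10, 16000)`, for example.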
[0080] The audio output unit can receive a status request from the SGE. This is shown in step 303. The request can be the same as, or substantially similar to, the request described with reference to step 275 of Fig. 2C. In embodiments, the request can query the volume status and / or the power status. The audio output unit can acknowledge the request with a status response. This is shown in step 304. The status response can include volume status and / or power status data. If the SGE determines, based on the status response, that the audio output unit is producing sound, a request for an audio output sample can be received from the SGE. This is shown in step 305. If the audio output unit is producing sound, in some embodiments, the audio output unit can automatically transfer the last cached audio output to the SGE in the form of an audio file. This is shown in step 306. In some embodiments, the audio output unit can wait until it receives the request for a sound sample from the SGE before transferring the cached audio file to the SGE.
[0081] The aforementioned processes can be carried out in any order and are not limited to those described. Furthermore, some, all, or none of the aforementioned processes can be carried out, and this will still remain within the scope of protection of this disclosure.
[0082] With reference to Fig. 4, a block diagram of an SGE 420 according to embodiments of the present disclosure is shown. The SGE 420 can be the same as, or substantially similar to, the SGE 120 described in Fig. 1. In embodiments, the components shown in SGE 420 may be processor-executable instructions configured to be executed by a processor.
[0083] The SGE 420 can be a special unit or part of a general-purpose data processing unit comprising at least one processor 401, which includes a hardware module or circuit for performing the functions of the described components. These components can be software units running on the at least one processor 401. Multiple processors with parallel processing program segments can be provided, enabling parallel processing of some or all of the components' functions. The memory 402 can be configured to provide computer instructions 403 to the at least one processor 401 for executing the functionality of the components.
[0084] The SGE 120 can include components for the known functionality of an SGE, depending on the type of unit and the known speech processing. In embodiments, the SGE 420 includes a speech input receiver 404, which has multiple (e.g., two or more) microphones configured in an arrangement to receive speech input from different directions relative to the SGE 420. These audio signals received by the multiple microphones of the speech input receiver 404 can be used to determine the positions (e.g., directions) of incoming sounds. This can be done via triangulation or arrival time difference.
[0085] The SGE 420 can include a command processing system 406 in the form of existing SGE software for receiving and processing voice commands. In addition, a voice command identifying system 410 can be provided to determine directions to be blocked and to recognize commands from blocked directions. The SGE 420 can further include a voice command distinguishing system 440, configured to distinguish voice input from a known audio output unit in a blocked direction from genuine voice input commands that may originate from an unknown or unregistered user.
[0086] The SGE software, including voice command recognition processing, can be deployed locally to the SGE 420 or a data processing facility, or it can be delivered remotely over a network, for example, as a cloud-based service. The voice command identifying system 410 and the voice command distinguishing system 440 can be delivered as downloadable updates for the SGE software or as individual add-on services over a network, for example, as a cloud-based service. A remote service can also deliver an application or application update for an audio output unit to provide the described functionality to the audio output unit.
[0087] The voice command distinguishing system 440 can include a blocked direction component 421 to access one or more stored blocked directions of background speech noise for a location of the SGE 420, as provided by the voice command identifying system 410. A blocked direction data store 430 with associated audio output units, including stored transmission channels for the audio output units, can be managed by the voice command identifying system 410.
[0088] The voice input receiver 404 can be configured to receive a voice input in the SGE 420 at the site and detect that the voice input is being received from a blocked direction.
[0089] The system 440 for distinguishing voice commands can include an identifying component 423 for recognizing one of the audio output units belonging to the blocked direction by reference to the data memory 430. The identifying component 423 can determine units located in blocked directions via triangulation or arrival time difference.
[0090] The system 440 for distinguishing voice commands can include a query component 424, configured to query the status of a detected audio output unit to determine if it is currently producing audio, and, if so, to receive an audio file of the last audio output from the audio output unit. The query component 424 includes a status request component 425, configured to request one or more statuses (e.g., volume or power status) from one or more units. The query component 424 also includes an audio determining component 426 to determine if the queried audio unit is producing sound (e.g., based on the volume / power status). In some embodiments, the query component 424 includes a threshold component 427 that implements one or more thresholds to determine whether the audio output unit is producing sound. For example, the threshold component 427 may be configured to set a volume threshold to determine whether a unit is producing sound. The query component 424 may also include a status receiving component 431 for receiving a status from an audio output unit.
[0091] The query component 424 can query the status of all audio output units within a section of the voice-controlled unit to determine which one is currently producing audio. The query component 424 can receive an audio file that is cached in the audio output unit.
[0092] The system 440 for distinguishing voice commands can include an audio file obtaining component 432 to receive the audio file from the queried audio output unit. The received audio file can be compared with the voice input received in the voice input receiver 404.
[0093] The system 440 for distinguishing voice commands can include a comparison component 428 for comparing the received audio file with the received voice input. The comparison component 428 can include a processing component 429 for processing the received audio file and the received voice input in order to convert the audio file and the voice input into text and present the text as phonetic strings for comparison. However, the processing component 429 can process the audio files in any other suitable way (e.g., by changing the amplitude, length, etc., of the audio files).
[0094] The system 440 for distinguishing voice commands can include a component 433 for ignoring voice input (voice input ignoring component) to ignore the received voice input if there is a substantial match with the received audio file.
[0095] The voice command distinguishing system 440 can also include a mobile device interaction component 460. The mobile device interaction component 460 can be configured to synchronize stored blocked directions in the data store 430 with nearby mobile devices. Furthermore, the mobile device interaction component 460 can also be configured to receive signals from mobile devices, enabling the SGE 420 to store updated positions of the respective mobile devices over an extended period. In embodiments, the mobile device interaction component 460 can transmit other SGE data stored in the data store 430 to nearby mobile devices, such as speech recognition data (e.g., voiceprints of registered users), background noise data (e.g., the properties of background noise), and audio status data (received, for example, by the query component 424).
[0096] With reference to Fig. 5, a block diagram of an exemplary audio output unit 550 according to embodiments of the present disclosure is shown. In embodiments, the audio output unit 550 can be any suitable audio output unit. For example, the audio output unit 550 can be the television set 114, the radio 112, or the mobile unit 150 shown in Fig. 1. However, the audio output unit 550 can also be a smartwatch, a mobile unit, a speaker, a voice-controlled unit, a computer system (e.g., a laptop, a desktop computer, etc.), or any other suitable audio output unit.
[0097] The audio output unit 550 can be any type of unit with an audio output and at least one processor 551, which can be a hardware module or a circuit for performing the functions of the described components, which can be software units running on the at least one processor 551. Multiple processors with parallel processing program segments can be provided, enabling parallel processing of some or all of the components' functions. A memory 552 can be configured to provide computer instructions 553 to the at least one processor 551 for executing the components' functionality.
[0098] The audio output unit 550 includes a software component, the audio output providing system 560, which allows it to exchange data with an SGE (e.g., the SGE 120 or SGE 420) upon request. An audio output unit 550 may require a software update to include this functionality, and it may also be necessary for the audio output unit 550 to be connected to the same network as the SGE, such as the user's home Wi-Fi network.
[0099] The system 560 for providing audio output can include a monitoring component 561 for monitoring an audio output of the audio output unit 550 and a buffer component 562 for temporarily storing a predefined duration of a last audio output in a buffer 563.
[0100] The system 560 for providing audio output can include a status component 564 for transmitting a status response to a querying SGE to determine whether the audio output unit 550 is currently making an audio output.
[0101] The audio output system 560 can include an audio file component 565 for transferring an audio file to the SGE when the audio output unit is currently outputting audio. The transferred audio file can be a recently cached audio output from the buffer 563 of the audio output unit 550.
[0102] With reference to Fig. 6, a flowchart is now shown illustrating an exemplary procedure 600 for filtering voice commands in a mobile unit (e.g., the mobile unit 150 of Fig. 1) in an environment of an SGE according to embodiments of the present disclosure.
[0103] Procedure 600 begins with establishing a data exchange with an SGE (e.g., the SGE 120 of Fig. 1 or the SGE 420 of Fig. 4). This is illustrated in step 605. Data exchange with an SGE can be set up in any suitable way, including via wired and wireless network connections.
[0104] SGE data is then received from the SGE. This is illustrated in step 610. SGE data can include any data stored in the SGE memory, including stored blocked directions, speech recognition data, wake word data, background noise audio properties (e.g., amplitude, pitch, etc.), background noise contextual information (e.g., background noise metadata), and audio output status data (e.g., volume / power data from nearby output units).
[0105] A command is then received by the mobile unit. This is shown in step 615. The command may be received in response to the utterance of an activation word. Next, it is determined whether the mobile unit has directional analysis capabilities. This is shown in step 620. If the mobile unit has directional analysis capabilities (e.g., the mobile unit is configured to perform triangulation), it is determined whether the command was received from a blocked direction. This is shown in step 625. This can be performed in essentially the same way as step 256 of Fig. 2B. If the command is not received from a blocked direction, the command is executed in step 650. If the command is received from a blocked direction, it is determined whether the command is in a recognized voice. This is shown in step 645. Determining whether the command is in a recognized voice can be performed in essentially the same way as step 254 of Fig. 2B. If the command is in a recognized voice, the command is executed in step 650. If the command is not in a recognized voice, the command is ignored. This is shown in step 640.
[0106] If it is determined that the mobile unit lacks directional analysis capabilities, the sound is analyzed and compared with details of background noise sources. This is illustrated in step 630. The SGE data received in step 610 may include audio characteristics of background noise that the SGE regularly receives. The audio characteristics of the background noise can be compared with the currently received audio data to determine if there is a match. Based on the comparison performed in step 630, it is determined whether background noise is likely. This is illustrated in step 635. For example, if the frequency, amplitude, pitch, etc., of the audio characteristics match the frequency, amplitude, pitch, etc., of the currently received sound, it can be determined that the sound is likely background noise.
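The comparison in steps 630 and 635 could, for instance, test whether the characteristics of the current sound lie within a tolerance of any stored background noise profile. In the sketch below, the `NoiseProfile` fields, the relative tolerance of 10%, and the function name are assumptions; the disclosure leaves the matching criterion open.

```python
from dataclasses import dataclass


@dataclass
class NoiseProfile:
    frequency_hz: float  # dominant frequency of the sound
    amplitude: float     # e.g. a normalized level between 0 and 1


def is_likely_background(sound: NoiseProfile, known_noise: list[NoiseProfile],
                         rel_tolerance: float = 0.10) -> bool:
    """Return True if the sound is close to any stored background noise profile."""

    def close(x: float, y: float) -> bool:
        return abs(x - y) <= rel_tolerance * max(abs(x), abs(y))

    return any(
        close(sound.frequency_hz, p.frequency_hz) and close(sound.amplitude, p.amplitude)
        for p in known_noise
    )
```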
[0107] In some embodiments, contextual information (e.g., the time of day) can be considered when determining whether the noise is likely to be background. For example, metadata associated with the background noise received in step 610 can be compared with metadata of audio currently received on the mobile unit to determine whether it is likely to be background noise. In some embodiments, both contextual information and audio data can be considered together when determining whether background noise is likely.
[0108] If it is determined that the signal is likely background noise (e.g., there is a substantial match that may be based on a threshold), the command is ignored in step 640. If it is determined that the signal is likely not background noise, step 645 checks whether the command was received in a recognized voice. If the command is in a recognized voice, it is executed in step 650. If the command is not in a recognized voice, it is ignored in step 640.
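Taken together, the branches of procedure 600 described above reduce to the following decision function. The function name, parameter names, and string return values are hypothetical; the step numbers in the comments follow the description.

```python
from typing import Optional


def handle_mobile_command(has_direction_analysis: bool,
                          from_blocked_direction: Optional[bool],
                          likely_background_noise: Optional[bool],
                          recognized_voice: bool) -> str:
    """Return "execute" or "ignore" for a command received by the mobile unit."""
    if has_direction_analysis:                 # step 620
        if not from_blocked_direction:         # step 625
            return "execute"                   # step 650
    elif likely_background_noise:              # steps 630 and 635
        return "ignore"                        # step 640
    # Step 645: reached from a blocked direction, or when the sound is
    # probably not background noise.
    return "execute" if recognized_voice else "ignore"
```

Only the input that applies to the unit's capabilities is consulted: `from_blocked_direction` when direction analysis is available, `likely_background_noise` otherwise.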
[0109] The aforementioned processes can be carried out in any order and are not limited to those described. Furthermore, some, all, or none of the aforementioned processes can be carried out, while this still remains within the scope of protection of this disclosure.
[0110] Fig. 7 is a flowchart illustrating an exemplary method 700 for updating an SGE with a position of a mobile unit according to embodiments of the present disclosure, so that it can be added to a blocked direction.
[0111] Procedure 700 begins at step 705, in which a data exchange with an SGE is established. Subsequently, the mobile unit transmits a signal tone to indicate a source of background noise. This is shown in step 710. The tone can have any amplitude or frequency. The SGE then performs a direction analysis so that the location of the mobile unit can be stored as a blocked direction. This is shown in step 715. The direction analysis can be performed as described above with reference to Figs. 1 to 6. The mobile unit then changes its location. This is shown in step 720. The signal tone is then emitted again to indicate the new location as a source of potential background noise. This is shown in step 725. It is then determined whether a command is received from the location of the mobile unit (e.g., the location reached in step 720). This is shown in step 730.
[0112] If a command is received from the location, the command can be ignored. This is shown in step 735. If the command is not received from the location, it is determined whether the mobile unit is leaving the immediate vicinity. This is shown in step 740. Determining whether the mobile unit is leaving the immediate vicinity can be based on an interrupted communication link between the SGE and the mobile unit. In some embodiments, determining whether the mobile unit is leaving the immediate vicinity can be based on location data (e.g., Global Positioning System (GPS) data) from the mobile unit. If it is determined that the unit is leaving the immediate vicinity, the blocked direction is removed from the SGE's memory. This is shown in step 745. If it is determined that the unit has not left the immediate vicinity of the SGE, procedure 700 returns to step 730, in which the commands at the SGE are continuously monitored.
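On the SGE side, the bookkeeping of procedure 700 amounts to keeping one blocked direction per mobile unit, updating it on each signal tone, and dropping it when the unit leaves. The following is a minimal sketch under those assumptions; the class name, the unit identifiers, and the 15-degree tolerance are hypothetical.

```python
class MobileBlockedDirections:
    """Tracks the current blocked direction of each nearby mobile unit."""

    def __init__(self):
        self._direction_by_unit: dict[str, float] = {}

    def on_signal_tone(self, unit_id: str, angle_deg: float) -> None:
        """Steps 710 to 725: store or update the unit's direction as blocked."""
        self._direction_by_unit[unit_id] = angle_deg % 360.0

    def on_unit_left(self, unit_id: str) -> None:
        """Step 745: remove the blocked direction once the unit leaves the vicinity."""
        self._direction_by_unit.pop(unit_id, None)

    def should_ignore(self, angle_deg: float, tolerance_deg: float = 15.0) -> bool:
        """Step 735: ignore commands arriving from a tracked unit's direction."""
        angle = angle_deg % 360.0
        for blocked in self._direction_by_unit.values():
            diff = abs(angle - blocked)
            if min(diff, 360.0 - diff) <= tolerance_deg:  # shortest angular distance
                return True
        return False
```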
[0113] The aforementioned processes can be carried out in any order and are not limited to those described. Furthermore, some, all, or none of the aforementioned processes can be carried out, while this still remains within the scope of protection of this disclosure.
[0114] With reference to Fig. 8, a high-level block diagram of an exemplary computer system 801 (e.g., the SGE 120 of Fig. 1, the SGE 420 of Fig. 4, or the audio output unit 550 of Fig. 5) according to embodiments of the present disclosure is now shown. The computer system 801 can be used in implementing one or more of the methods, utilities, and modules described herein and all related functions (e.g., using one or more processor circuits or computer processors of the computer). In some embodiments, the main components of the computer system 801 may include one or more CPUs 802, a memory subsystem 804, a connection interface 812, a memory interface 814, an I / O (input / output) unit interface 816, and a network interface 818, all of which may be connected directly or indirectly via a memory bus 803, an I / O bus 808, and an I / O bus interface unit 810 for data exchange between the components.
[0115] The computer system 801 can contain one or more programmable general-purpose central processing units (CPUs) 802A, 802B, 802C, and 802D, collectively referred to here as CPU 802. In some embodiments, the computer system 801 can contain multiple processors, as is common for a relatively large system; however, in other embodiments, the computer system 801 can be a system with a single CPU. Each CPU 802 can execute instructions stored in the memory subsystem 804 and can include one or more levels of integrated cache.
[0116] The system memory 804 can include computer system-readable media in the form of volatile memory, such as random access memory (RAM) 822 or cache 824. The computer system 801 can also include other removable / non-removable, volatile / non-volatile computer system memory media. By way of example only, a memory system 826 can be provided for reading and writing a non-removable, non-volatile magnetic medium, such as a "hard disk." Although not shown, a magnetic disk drive can be provided for reading and writing a removable, non-volatile magnetic disk (such as a "USB flash drive" or a "floppy disk"), or an optical disk drive can be provided for reading or writing a removable, non-volatile optical disk, such as a CD-ROM, DVD-ROM, or other optical media. In addition, the memory 804 can include flash memory, e.g., a flash memory stick or a flash drive. Memory units can be connected to the memory bus 803 via one or more data carrier interfaces. The memory 804 can comprise at least one program product with a set (e.g., at least one) of program modules configured to perform the functions of different embodiments.
[0117] One or more programs/utilities 828, each with at least one set of program modules 830, can be stored in memory 804. The programs/utilities 828 can include a hypervisor (also called a virtual machine monitor), one or more operating systems, one or more application programs, other program modules, and program data. Any of the operating systems, one or more application programs, other program modules, and program data, or a combination thereof, can include an implementation of a networked environment. Programs 828 and/or program modules 830 generally perform the functions or methodologies of different implementations.
[0118] In some embodiments, the program modules 830 of the computer system 801 include a speech command discrimination module. The speech command discrimination module can be configured to access one or more blocked directions of background speech noise from one or more audio output units. The speech command discrimination module can further be configured to receive a voice command and determine whether it is received from a blocked direction. The speech command discrimination module can be configured to query the status of an audio output unit to determine whether it is outputting sound. In response to a determination that the audio output unit is outputting sound, the speech command discrimination module can be configured to capture a sound sample from the audio output unit. The speech command discrimination module can then compare the sound sample with the voice command and ignore the voice command if the sound sample and the voice command are essentially the same.
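The query, capture, and compare sequence of the speech command discrimination module might look roughly like the following Python sketch. Everything here is an assumption made for illustration: the `AudioOutputUnit` interface with `is_emitting` and `capture_sample`, the use of a normalized correlation as the similarity measure, and the threshold of 0.9 do not come from the disclosure. The sketch also assumes the two signals are already time-aligned and sampled at the same rate, which a practical system would have to establish first.

```python
import math
from typing import Iterable, Protocol, Sequence


class AudioOutputUnit(Protocol):
    """Hypothetical interface to a networked TV, radio, or speaker."""

    def is_emitting(self) -> bool: ...

    def capture_sample(self) -> Sequence[float]: ...


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized correlation of two time-aligned signals, in [-1, 1]."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    dot = sum(x * y for x, y in zip(a[:n], b[:n]))
    norm = math.sqrt(sum(x * x for x in a[:n]) * sum(y * y for y in b[:n]))
    return dot / norm if norm else 0.0


def should_ignore(command: Sequence[float],
                  units: Iterable[AudioOutputUnit],
                  threshold: float = 0.9) -> bool:
    """Return True if the command essentially matches sound from any active unit."""
    for unit in units:
        # Only units that report they are outputting sound are sampled.
        if unit.is_emitting():
            if similarity(command, unit.capture_sample()) >= threshold:
                return True
    return False
```

Comparing raw waveforms is the simplest possible choice; comparing spectral features or recognized text would serve the same purpose of deciding whether the two are "essentially the same."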
[0119] In embodiments, the program modules 830 of the computer system 801 include a voice command filtering module. The voice command filtering module can be configured to receive SGE data from an SGE. The voice command filtering module can be configured to determine whether a voice command is being received from a blocked direction specified in the SGE data. If the voice command is received from a blocked direction specified in the SGE data, the voice command can be ignored.
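As a rough illustration of the voice command filtering module, the Python sketch below checks an incoming command's direction against blocked directions carried in the SGE data. The data layout is an assumption: the key `blocked_directions`, the representation of each blocked direction as an angular sector `(start_deg, end_deg)` with values in the range 0 to 360, and the function names are hypothetical. The optional `recognized_voice` flag reflects the variant in the claims in which a command from a blocked direction is still executed when it is given in a recognized voice.

```python
from typing import Tuple

Sector = Tuple[float, float]  # (start_deg, end_deg), clockwise, values in [0, 360)


def in_sector(bearing_deg: float, sector: Sector) -> bool:
    """Check whether a bearing lies inside an angular sector."""
    start, end = sector
    bearing = bearing_deg % 360.0
    if start <= end:
        return start <= bearing <= end
    # The sector wraps around 0 degrees, e.g. (350, 20).
    return bearing >= start or bearing <= end


def filter_command(bearing_deg: float, sge_data: dict,
                   recognized_voice: bool = False) -> bool:
    """Return True if the command should be executed, False if it is ignored."""
    sectors = sge_data.get("blocked_directions", [])
    blocked = any(in_sector(bearing_deg, s) for s in sectors)
    return (not blocked) or recognized_voice


sge_data = {"blocked_directions": [(350.0, 20.0), (90.0, 120.0)]}
print(filter_command(5.0, sge_data))                            # False
print(filter_command(45.0, sge_data))                           # True
print(filter_command(100.0, sge_data, recognized_voice=True))   # True
```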
[0120] Although the memory bus 803 is shown in Fig. 8 as a single bus structure providing a direct communication path between the CPUs 802, the memory subsystem 804, and the I/O bus interface 810, in some embodiments the memory bus 803 may comprise several different buses or communication paths arranged in various forms, such as point-to-point connections in hierarchical, star, or network configurations, multiple hierarchical buses, parallel and redundant paths, or any other suitable configuration. Although the I/O bus interface 810 and the I/O bus 808 are each shown as a single unit, the computer system 801 may also, in some embodiments, include multiple I/O bus interface units 810, multiple I/O buses 808, or both. Although several I/O interface units are shown that separate the I/O bus 808 from various communication paths leading to the different I/O units, in other embodiments some or all I/O units may be directly connected to one or more system I/O buses.
[0121] In some embodiments, the computer system 801 may be a mainframe system used by multiple users, a single-user system, a server computer, or a similar unit that has little or no direct user interface but receives requests from other computer systems (clients). Furthermore, in some embodiments, the computer system 801 may be implemented as a desktop computer, portable computer, laptop or notebook, tablet computer, handheld computer, telephone, smartphone, network switch or router, or any other suitable type of electronic unit.
[0122] It is noted that Fig. 8 is intended to represent the principal components of an exemplary computer system 801. In some embodiments, however, individual components may have greater or lesser complexity than shown in Fig. 8, components other than or in addition to those shown in Fig. 8 may be present, and the number, type, and configuration of such components may vary.
[0123] Although this disclosure includes a detailed description of cloud computing, it should be clear from the outset that implementations of the teachings presented herein are not limited to a cloud computing environment. Rather, embodiments of the present disclosure can be implemented in conjunction with any other type of data processing environment currently known or developed at a later date.
[0124] Cloud computing is a service delivery model that provides convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal administrative overhead or interaction with a service provider. This cloud model can include at least five characteristics, at least three service models, and at least four deployment models.
[0125] The characteristics are as follows:
On-demand self-service: A cloud customer can unilaterally and automatically access data processing resources such as server time and network storage as needed, without requiring human interaction with the service provider.
Broad network access: Resources are available over a network and accessible via standard mechanisms that support use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs).
Resource pooling: The provider's data processing resources are pooled to serve multiple customers, using a multi-tenant model where various physical and virtual resources are dynamically allocated and reallocated according to demand. There is a degree of location independence insofar as the customer generally has no control over, or knowledge of, the exact location of the provided resources, but can define the location at a higher level of abstraction (e.g., region, state, or data center).
Rapid elasticity: Resources can be provisioned quickly and flexibly, in some cases automatically, to enable rapid scaling out, and released just as quickly to enable rapid scaling in. From the customer's perspective, the resources available for provisioning often appear unlimited and can be acquired in any quantity at any time.
Measured service: Cloud systems automatically control and optimize resource usage by employing a measurement function at an abstraction level appropriate for the type of service (e.g., storage space, processing power, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and recorded, providing transparency for both the service provider and the user.
[0126] The service models are as follows:
Software as a Service (SaaS): The functionality provided to the customer consists of using the provider's applications running in a cloud infrastructure. These applications can be accessed from various client devices via a thin-client interface such as a web browser (e.g., web-based email). The customer does not manage or control the underlying cloud infrastructure, including the network, servers, operating systems, storage space, or even individual application features, with the possible exception of limited user-specific application configuration settings.
Platform as a Service (PaaS): The functionality provided to the customer consists of deploying customer-created or acquired applications, built using provider-supported programming languages and utilities, onto the cloud infrastructure. The customer does not manage or control the underlying cloud infrastructure, including networks, servers, operating systems, or storage space, but has control over the deployed applications and potentially over configurations of the application's hosting environment.
Infrastructure as a Service (IaaS): The functionality provided to the customer consists of the provision of data processing, storage, networking, and other basic data processing resources, allowing the customer to deploy and run any software, including operating systems and applications. The customer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and potentially limited control over the selection of network components (e.g., host firewalls).
[0127] The deployment models are as follows:
Private cloud: The cloud infrastructure is operated solely for one organization. It can be managed by the organization itself or by a third party and can be located on the organization's own premises or on external premises.
Community cloud: The cloud infrastructure is shared by multiple organizations and supports a specific user community with shared interests (e.g., objectives, security requirements, strategy, and compliance considerations). It can be managed by the organizations themselves or by a third party and can be located on-premises or externally.
Public cloud: The cloud infrastructure is made available to the general public or a large industry group and is owned by an organization that sells cloud services.
Hybrid cloud: The cloud infrastructure consists of two or more clouds (private, community, or public) that remain independent units but are connected by a standardized or proprietary technology that enables the portability of data and applications (e.g., cloud bursting for load balancing between clouds).
[0128] A cloud computing environment is service-oriented, focusing on state independence, loose coupling, modularity, and semantic interoperability. At its core, cloud computing consists of an infrastructure comprising a network of interconnected nodes.
[0129] With reference to Fig. 9, an illustrative cloud computing environment 50 is now shown. As shown, the cloud computing environment 50 has one or more cloud computing nodes 10 with which local data processing units used by cloud users, such as the personal digital assistant (PDA) (e.g., the SGE 120 or the SGE 420) or the mobile phone 54A (e.g., the mobile unit 150), the desktop computer 54B, the laptop computer 54C, and/or the automotive computer system 54N, can exchange data. The nodes 10 can exchange data with each other. They can be grouped physically or virtually (not shown) into one or more networks, such as private, community, public, or hybrid clouds, as described above, or into a combination thereof. This allows the cloud computing environment 50 to offer infrastructure, platforms, and/or software as a service, for which a cloud user does not need to maintain resources on a local data processing unit. It should be noted that the types of data processing units 54A to 54N shown in Fig. 9 are for illustrative purposes only, and that the data processing nodes 10 and the cloud computing environment 50 can exchange data with any type of computerized unit via any type of network and/or any type of network-addressable connection (e.g., using a web browser).
[0130] With reference to Fig. 10, a set of functional abstraction layers provided by the cloud computing environment 50 (Fig. 9) is now shown. It should be clear from the outset that the components, layers, and functions shown in Fig. 10 are intended for illustrative purposes only, and embodiments of the invention are not limited to them. As shown, the following layers and corresponding functions are provided:
[0131] A hardware and software layer 60 contains hardware and software components. Examples of hardware components include: mainframe computers 61; servers based on the RISC (Reduced Instruction Set Computer) architecture 62; servers 63; blade servers 64; storage units 65; and networks and network components 66. In some embodiments, software components include network application server software 67 and database software 68.
[0132] A virtualization layer 70 provides an abstraction layer from which the following examples of virtual entities can be provided: virtual servers 71; virtual storage 72; virtual networks 73, including virtual private networks; virtual applications and operating systems 74; and virtual clients 75.
[0133] In one example, an administration layer 80 can provide the functions described below. Resource provisioning 81 provides dynamic procurement of data processing resources and other resources used to perform tasks within the cloud computing environment. Metering and pricing 82 provides cost tracking for the use of resources within the cloud computing environment and billing for the use of these resources. In one example, these resources might include application software licenses. Security provides identity verification for cloud customers and tasks, as well as protection for data and other resources. The user portal 83 provides access to the cloud computing environment for users and system administrators. Service level management 84 provides the allocation and management of cloud computing resources to ensure that service level requirements are met. Service level agreement (SLA) planning and fulfillment 85 provides the preparation for and procurement of cloud computing resources for which a future need is anticipated according to an SLA.
[0134] A workload layer 90 provides examples of functionality for which the cloud computing environment can be used. Examples of workloads and functions that can be provided by this layer include: mapping and navigation 91; software development and lifecycle management 92; virtual classroom 93 as an educational offering; data analytics processing 94; transaction processing 95; and voice command processing 96.
[0135] As explained in more detail herein, it is conceivable that some or all operations of some of the embodiments of processes described herein may be carried out in an alternative order or not at all; furthermore, several operations may be carried out simultaneously or as an internal part of a larger process.
[0136] The present disclosure may relate to a system, a method, and/or a computer program product. The computer program product may comprise a computer-readable storage medium (or media) containing computer-readable program instructions to cause a processor to execute aspects of the present disclosure.
[0137] A computer-readable storage medium can be a physical unit capable of retaining and storing instructions for use by an instruction execution unit. For example, a computer-readable storage medium can be, without limitation, an electronic storage unit, a magnetic storage unit, an optical storage unit, an electromagnetic storage unit, a semiconductor storage unit, or any suitable combination thereof. A non-exhaustive list of more specific examples of computer-readable storage media includes the following: a portable computer disk, a hard disk, random-access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random-access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a DVD (digital versatile disc), a memory stick, a floppy disk, a mechanically coded unit such as punched cards or raised structures in a groove on which instructions are stored, and any suitable combination thereof. A computer-readable storage medium, as used herein, shall not be understood as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., light pulses traveling through an optical fiber cable), or electrical signals transmitted through a wire.
[0138] The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to individual data processing units or, via a network such as the internet, a local area network, a wide area network, and/or a wireless network, to an external computer or external storage device. The network may include copper transmission cables, fiber optic transmission lines, wireless transmission, routing computers, firewalls, switching units, gateway computers, and/or edge servers. A network adapter card or network interface in each data processing unit receives computer-readable program instructions from the network and forwards them for storage on a computer-readable storage medium within the respective data processing unit.
[0139] Computer-readable program instructions for executing the operations of this disclosure may be assembler instructions, instruction-set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, C++, and the like, as well as traditional procedural programming languages such as C or similar languages. The computer-readable program instructions may be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on the remote computer or server. In the latter case, the remote computer can be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection can be established with an external computer (for example, via the internet using an internet service provider). In some embodiments, electronic circuits, including, for example, programmable logic circuits, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), can execute the computer-readable program instructions by using state information from the computer-readable program instructions to personalize the electronic circuits to perform aspects of the present disclosure.
[0140] Aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams or charts of processes, units (systems), and computer program products according to embodiments of the disclosure. It is noted that each block of the flowcharts and/or block diagrams or charts, as well as combinations of blocks in the flowcharts and/or block diagrams or charts, can be executed by means of computer-readable program instructions.
[0141] These computer-readable program instructions can be provided to a processor of a computer or other programmable data processing unit to create a machine such that the instructions executed by the processor of the computer or other programmable data processing unit produce a means of implementing the functions/steps specified in the block(s) of the flowcharts and/or block diagrams or charts. These computer-readable program instructions can also be stored on a computer-readable storage medium capable of controlling a computer, a programmable data processing unit, and/or other units to function in a certain manner, such that the computer-readable storage medium on which the instructions are stored comprises an article of manufacture, including instructions that implement aspects of the function/step specified in the block(s) of the flowcharts and/or block diagrams or charts.
[0142] The computer-readable program instructions can also be loaded onto a computer, another programmable data processing unit, or another unit to cause the execution of a series of process steps on the computer or other programmable unit or other unit in order to create a process executed on a computer, such that the instructions executed on the computer, another programmable unit, or other unit implement the functions/steps specified in the block(s) of the flowcharts and/or block diagrams or charts.
[0143] The flowcharts and block diagrams or charts in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, processes, and computer program products according to various embodiments of the present disclosure. In this context, each block in the flowcharts or block diagrams or charts can represent a module, segment, or part of instructions that includes one or more executable instructions for performing the specific logical function(s). In some alternative implementations, the functions specified in the blocks may occur in a different order than shown in the figures. For example, two blocks shown in succession may in fact be executed as a single step, simultaneously, substantially simultaneously, or partially or completely overlapping in time, or the blocks may sometimes be executed in reverse order, depending on the corresponding functionality. It should also be noted that each block in the block diagrams or flowcharts, as well as combinations of blocks in the block diagrams or flowcharts, can be implemented by special hardware-based systems that perform the specified functions or steps, or by combinations of special hardware and computer instructions.
[0144] The terminology used herein serves only to describe certain embodiments and is not intended to limit the various embodiments. As used herein, the singular forms "a" and "the" are intended to include the plural forms unless the context clearly indicates otherwise. Furthermore, it is understood that the terms "includes" and "comprises," when used in this description, indicate the presence of specified features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The preceding detailed description of exemplary embodiments of the various embodiments refers to the accompanying drawings (in which identical numbers represent identical elements), which form part thereof and in which specific exemplary embodiments are shown for illustration purposes, illustrating how the various embodiments can be implemented. These embodiments have been described in sufficient detail to enable those skilled in the art to implement them; however, other embodiments may also be used, and logical, mechanical, electrical, and other modifications may be made without altering the scope of protection of the various embodiments. Numerous specific details have been set forth in the preceding description to facilitate a thorough understanding of the various embodiments. However, the various embodiments can also be implemented without these specific details. In other cases, known circuits, structures, and techniques were not shown in detail so as not to obscure the embodiments.
[0145] Different occurrences of the word "embodiment" as used in this description do not necessarily refer to the same embodiment. All data and data structures shown or described herein are merely examples, and other embodiments may use different sets of data, data types, fields, numbers and types of fields, field names, numbers and types of rows, records, entries, or data organization. Furthermore, any data may be combined with logic, so that a separate data structure is not strictly necessary. The foregoing detailed description should therefore not be considered limiting.
[0146] The descriptions of the various embodiments of this disclosure serve for illustrative purposes but are not intended to be exhaustive or limited to the disclosed embodiments. Many modifications and changes will be obvious to those skilled in the art without altering the scope of protection of the described embodiments. The terminology used herein has been chosen to best explain the basic concepts of the embodiments, their practical application, or the technical improvement over technologies already on the market, or to enable other skilled persons to understand the embodiments disclosed herein.
[0147] Although the present disclosure has been described by means of specific embodiments, it is assumed that changes and modifications thereof are obvious to those skilled in the art. Therefore, the following claims shall be interpreted as including all such changes and modifications that fall within the scope of protection of the disclosure.
Claims
[1] A computer-implemented method for filtering voice commands, the method comprising: establishing data exchange with a voice-controlled unit located at a specific site; receiving, from the voice-controlled unit, data indicating blocked directions; receiving a voice command; determining that the voice command is received from a blocked direction specified in the data; and ignoring the received voice command.
[2] The method according to claim 1, further comprising: receiving a second voice command; determining that the second voice command is received from the blocked direction specified in the data; determining that the second voice command is given in a recognized voice; and executing the second voice command.
[3] The method according to any of the preceding claims, wherein the determination that the voice command is received from the blocked direction is made on the basis of a time difference of arrival.
[4] The method according to any of the preceding claims, further comprising: emitting a tone to signal a second blocked direction to the voice-controlled unit, wherein the voice-controlled unit determines the location of the tone and stores it in the data indicating blocked directions.
[5] The method according to any of the preceding claims, further comprising: querying the status of a plurality of audio output units to determine which of the plurality of audio output units are currently emitting audio output; receiving an audio file from each of the audio output units determined to be emitting audio output; comparing each of the received audio files with the received voice command; and ignoring the received voice command if it substantially matches at least one of the received audio files.
[6] The method according to any of the preceding claims, further comprising: receiving, from the voice-controlled unit, a set of data specifying the characteristics of background noise; receiving a second voice command; comparing the second voice command with the set of data specifying the characteristics of background noise; determining, based on the comparison, that the second voice command matches the characteristics of background noise; and ignoring the second voice command in response to determining that the second voice command matches the characteristics of background noise.
[7] The method according to any of the preceding claims, further comprising: receiving a set of data indicating contextual data of background noise received by the voice-controlled unit; receiving a second voice command; comparing the context of the second voice command with the contextual background noise data received by the voice-controlled unit; determining, based on the comparison, that the context of the second voice command matches the contextual background noise data received by the voice-controlled unit; and ignoring the second voice command in response to determining that the context of the second voice command matches the contextual background noise data received by the voice-controlled unit.
[8] A system for filtering voice commands, the system comprising: a memory that stores program instructions; and a processor configured to execute the program instructions to carry out a method, the method comprising: establishing data exchange with a voice-controlled unit located at a specific site; receiving, from the voice-controlled unit, data indicating blocked directions; receiving a voice command; determining that the voice command is received from a blocked direction specified in the data; and ignoring the received voice command.
[9] The system according to claim 8, wherein the method performed by the processor further comprises: receiving a second voice command; determining that the second voice command is received from the blocked direction specified in the data; determining that the second voice command is given in a recognized voice; and executing the second voice command.
[10] The system according to claim 8 or 9, wherein the determination that the voice command is received from the blocked direction is made on the basis of a time difference of arrival.
[11] The system according to any one of claims 8 to 10, wherein the method performed by the processor further comprises: emitting a tone to signal a second blocked direction to the voice-controlled unit, wherein the voice-controlled unit determines the location of the tone and stores it in the data indicating blocked directions.
[12] The system according to any one of claims 8 to 11, wherein the method performed by the processor further comprises: querying the status of a plurality of audio output units to determine which of the plurality of audio output units are currently emitting audio output; receiving an audio file from each of the audio output units determined to be emitting audio output; comparing each of the received audio files with the received voice command; and ignoring the received voice command if it substantially matches at least one of the received audio files.
[13] The system according to any one of claims 8 to 12, wherein the method performed by the processor further comprises: receiving, from the voice-controlled unit, a set of data specifying the characteristics of background noise; receiving a second voice command; comparing the second voice command with the set of data specifying the characteristics of background noise; determining, based on the comparison, that the second voice command matches the characteristics of background noise; and ignoring the second voice command in response to determining that the second voice command matches the characteristics of background noise.
[14] The system according to any one of claims 8 to 13, wherein the method performed by the processor further comprises: receiving a set of data specifying contextual background noise data received by the voice-controlled unit; receiving a second voice command; comparing the context of the second voice command with the contextual background noise data received by the voice-controlled unit; determining, based on the comparison, that the context of the second voice command matches the contextual background noise data received by the voice-controlled unit; and ignoring the second voice command in response to determining that the context of the second voice command matches the contextual background noise data received by the voice-controlled unit.
[15] A computer program product for filtering voice commands, the computer program product comprising: a computer-readable storage medium that is readable by a processing circuit and stores instructions for execution by the processing circuit to execute the method according to any one of claims 1 to 7.
[16] A computer program stored on a computer-readable medium and loadable into the internal memory of a digital computer, comprising software code portions for executing the method according to any one of claims 1 to 7 when the program is executed on a computer.