Proposing an alternative interface when environmental interference is expected to impede a specific interaction with an automated assistant.
The automated assistant addresses input interference in noisy environments by suggesting alternative interfaces, reducing resource waste and improving interaction reliability.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- GOOGLE LLC
- Filing Date
- 2024-08-13
- Publication Date
- 2026-06-22
AI Technical Summary
Users interacting with automated assistants in noisy environments often find that the input modality they are using is subject to interference, leading to repeated inputs and incorrect actions that waste computing resources.
The automated assistant determines potential environmental interference and provides an indicator suggesting alternative input modalities, such as a keyboard interface, to avoid interference and conserve resources.
Reduces the need for repeated inputs and incorrect actions by suggesting alternative interfaces, thereby conserving computational resources and improving interaction reliability.
Description
Technical Field
[0001] This disclosure relates to proposing an alternative interface when environmental interference is predicted to impede a particular interaction with an automated assistant.
Background Art
[0002] Humans can participate in human-computer dialogues using an interactive software application, herein referred to as an "automated assistant" (also called a "digital agent", "chatbot", "interactive personal assistant", "intelligent personal assistant", "conversational agent", etc.). For example, a human (sometimes called a "user" when interacting with an automated assistant) can provide commands and / or requests using spoken natural language input (i.e., speech), which may in some cases be converted to text and then processed, and / or using text-based (e.g., typed) natural language input.
[0003] Users may need to invoke their automated assistants in situations where the modality receiving user input is subject to interference. Such situations may include crowded public places and / or other areas with noticeable background noise. An input modality affected by interference may therefore not be a reliable interface for the automated assistant at that time, and users providing input to such a temporarily unreliable interface may have to repeat their input. In some cases, when a user is trying to control an automated assistant, the automated assistant may ask the user to repeat the input and / or may initiate an incorrect action. Repeated input and / or incorrect actions can waste computing resources when the repeated input and / or unintended action is processed on the client computing device and / or communicated to a server over a network.
Summary of the Invention
Means for Solving the Problem
[0004] The implementations described herein relate to an automated assistant that can determine whether an ongoing or anticipated interaction between a user and the automated assistant via a particular interface is expected to be affected by interference. When the automated assistant determines that an interaction at a particular interface is likely to be affected by interference, it can provide an indicator suggesting that the user interact with the automated assistant via a different interface. In this way, the automated assistant can conserve computational resources by reducing the number of times the user must repeat inputs affected by environmental interference. Furthermore, providing the indicator can reduce the number of instances in which the automated assistant initiates an incorrect action because the interaction was affected by interference.
[0005] In some cases, a user may be located in an environment with some amount of background noise generated by other people interacting in that environment. In this situation, the user may be holding a portable computing device, intending to invoke their automated assistant to answer a specific query (e.g., "Assistant, what time am I supposed to meet Joe tomorrow?"). In some implementations, before or while the user provides input to their automated assistant, the automated assistant can determine whether the input will be, or is being, obstructed by background noise. For example, when user input is embodied in a spoken utterance, background noise generated by other people may be captured in the same audio that captures the spoken utterance. When the automated assistant determines that the input is expected to be affected by background noise and / or other interference, it may render an indicator to the user. The indicator can inform the user that their input may not be fully received by the automated assistant and / or that other input modalities are available for providing input. For example, in some implementations, the indicator can be a rendering of a keyboard interface, which the user can use to type input to the automated assistant instead of speaking to it.
[0006] In some implementations, one or more features of the user's environment can be characterized by data that is used to determine whether, and to what extent, those features will influence user input. This data can be processed to generate one or more scores, which can be compared to a threshold that, when met, indicates that user input is expected to be influenced by one or more features of the environment. For example, the scores may be based on the number of people in the environment, the number of people speaking in the environment, one or more previous instances in which the user has interacted with an automated assistant in a similar environment, the type of background noise, the volume level of the background noise, one or more images of the environment, and / or any other data that can be used to characterize the environment.
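The score-and-threshold comparison in [0006] can be illustrated with a minimal sketch. It assumes a hand-weighted linear combination of a few of the listed features; the class `EnvironmentFeatures`, the weights, and the threshold value are all hypothetical, and a deployed system could instead learn this mapping with trained models as described in [0008].

```python
from dataclasses import dataclass


@dataclass
class EnvironmentFeatures:
    # Hypothetical subset of the features listed in [0006].
    num_people: int
    num_speaking: int
    noise_level_db: float
    prior_failure_rate: float  # share of past interactions in similar environments that failed


# Hypothetical hand-picked weights and threshold, for illustration only.
WEIGHTS = {
    "num_people": 0.02,
    "num_speaking": 0.1,
    "noise_level_db": 0.01,
    "prior_failure_rate": 0.5,
}
THRESHOLD = 0.8


def interference_score(f: EnvironmentFeatures) -> float:
    """Weighted sum of environment features; higher means more expected interference."""
    return (
        WEIGHTS["num_people"] * f.num_people
        + WEIGHTS["num_speaking"] * f.num_speaking
        + WEIGHTS["noise_level_db"] * f.noise_level_db
        + WEIGHTS["prior_failure_rate"] * f.prior_failure_rate
    )


def input_expected_to_be_affected(f: EnvironmentFeatures, threshold: float = THRESHOLD) -> bool:
    """True when the score meets the threshold, i.e. user input is expected to be influenced."""
    return interference_score(f) >= threshold


quiet_room = EnvironmentFeatures(num_people=1, num_speaking=0, noise_level_db=30.0, prior_failure_rate=0.0)
busy_sidewalk = EnvironmentFeatures(num_people=20, num_speaking=5, noise_level_db=70.0, prior_failure_rate=0.4)
print(input_expected_to_be_affected(quiet_room))     # False (score is about 0.32)
print(input_expected_to_be_affected(busy_sidewalk))  # True (score is about 1.8)
```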
[0007] In some implementations, the score and / or additional scores may be based on the availability of separate modalities for interacting with the automated assistant. For example, when a user is in an environment where the automated assistant is accessible via a standalone speaker device that does not include a graphical user interface, the additional score may be generated to reflect the limited availability of any other interface (e.g., a keyboard interface rendered in a graphical user interface). Alternatively, when a user is in a different environment where the automated assistant is accessible via a standalone speaker device and a mobile computing device in the user's pocket, the additional score may be generated to reflect the richness of interfaces with which the user can interact with the automated assistant. In some implementations, the additional score may be generated to reflect the availability of automated assistant interfaces that are not expected to be interfered with in the environment. Based on the interference score for the environment and optionally the interface availability score, the automated assistant can determine whether to provide an indicator that another interface is available and / or that a particular type of input is expected to be interfered with. In some implementations, one or more scores may be updated over time based on environmental changes and / or changes in the availability of a particular automated assistant interface.
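One way to read [0007] is that an indicator is only worthwhile when interference is expected and some other interface is actually available and unaffected. The sketch below assumes a simple availability score (the fraction of available interfaces not expected to be interfered with); the function names, interface labels, and threshold are hypothetical.

```python
def availability_score(available: set[str], interfered: set[str]) -> float:
    """Fraction of available interfaces not expected to be interfered with (0.0 if none exist)."""
    if not available:
        return 0.0
    return len(available - interfered) / len(available)


def should_suggest_alternative(
    interference_score: float,
    available: set[str],
    interfered: set[str],
    interference_threshold: float = 0.8,  # hypothetical value
) -> bool:
    """Suggest another interface only if interference is expected AND an unaffected interface exists."""
    return (
        interference_score >= interference_threshold
        and availability_score(available, interfered) > 0.0
    )


# Standalone speaker without a display: nothing better to suggest.
print(should_suggest_alternative(0.9, {"microphone"}, {"microphone"}))  # False
# Speaker plus a phone that can render a keyboard: suggest the keyboard.
print(should_suggest_alternative(0.9, {"microphone", "touch_keyboard"}, {"microphone"}))  # True
```

Keeping the two scores separate mirrors the text: the interference score can be updated as the environment changes, and the availability score as devices come and go.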
[0008] In some implementations, one or more scores may be based on processing data from one or more different sources using one or more trained machine learning models. These models can be trained using training data based on prior instances in which one or more different types of interference did, and / or did not, influence the interaction between the user and the automated assistant. For example, one or more data sources could characterize the environment in which a user is located, and the data could be processed using one or more trained machine learning models to generate embeddings. These embeddings can be compared to other embeddings in the latent space to determine whether the environment exhibits interference that would influence the interaction between the user and the automated assistant. Alternatively, or additionally, one or more trained machine learning models could be used to classify, and / or otherwise determine, the probability that the user will be understood. In some implementations, the distance from such an embedding to another embedding in the latent space can be compared to a threshold or to another distance to determine whether the automated assistant should suggest that the user interact with it via a particular interface modality. For example, comparing embeddings in the latent space and / or applying heuristic methods may indicate that one or more interfaces are affected by interference and / or that one or more other interfaces are less affected by interference.
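The latent-space comparison in [0008] can be sketched as a nearest-neighbour test. This is a minimal illustration under stated assumptions: the embeddings are given directly as toy 2-D vectors (standing in for the output of a trained model), distance is Euclidean, and the function names and optional `max_distance` threshold are hypothetical.

```python
import math
from typing import Optional, Sequence

Vector = Sequence[float]


def nearest_distance(query: Vector, examples: Sequence[Vector]) -> float:
    """Smallest Euclidean distance from the query embedding to any example embedding."""
    return min(math.dist(query, e) for e in examples)


def interference_expected(
    env_embedding: Vector,
    interfered_examples: Sequence[Vector],
    clear_examples: Sequence[Vector],
    max_distance: Optional[float] = None,
) -> bool:
    """Compare distances to prior 'interfered' and 'clear' interactions in the latent space."""
    d_interfered = nearest_distance(env_embedding, interfered_examples)
    d_clear = nearest_distance(env_embedding, clear_examples)
    # Optional absolute threshold: too far from every interfered example means no suggestion.
    if max_distance is not None and d_interfered > max_distance:
        return False
    return d_interfered < d_clear


# Toy 2-D embeddings standing in for the output of a trained model.
interfered = [(0.9, 0.8), (1.0, 1.0)]
clear = [(0.1, 0.0), (0.0, 0.2)]
print(interference_expected((0.85, 0.9), interfered, clear))  # True
print(interference_expected((0.05, 0.1), interfered, clear))  # False
```

The sketch shows both comparisons the text mentions: distance against another distance (nearest "interfered" versus nearest "clear" example) and distance against a fixed threshold.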
[0009] The above description is provided as an overview of some implementations of this disclosure. Further descriptions of those implementations and other implementations are provided below in more detail.
[0010] Other implementations may include a non-transitory computer-readable storage medium storing instructions executable by one or more processors (e.g., a central processing unit (CPU), a graphics processing unit (GPU), and / or a tensor processing unit (TPU)) to perform methods such as one or more of the methods described above and / or elsewhere in this specification. Further implementations may include a system of one or more computers including one or more processors operable to execute stored instructions to perform methods such as one or more of the methods described above and / or elsewhere in this specification.
[0011] It should be understood that any combination of the aforementioned concepts and any further concepts described in more detail herein is intended to be part of the subject matter disclosed herein. For example, any combination of the claimed subject matter appearing at the end of this disclosure is intended to be part of the subject matter disclosed herein.
Brief Description of the Drawings
[0012] [Figure 1A] This figure shows a view in which a user receives a suggestion to type their input to an automated assistant instead of speaking it aloud, in an environment where spoken input may be subject to interference.
[Figure 1B] This figure shows a view in which a user receives a suggestion to type their input to an automated assistant instead of speaking it aloud, in an environment where spoken input may be subject to interference.
[Figure 2] This figure shows a system for suggesting alternative interface modalities when it is anticipated that the automated assistant and / or the user will not understand a particular interaction between the user and the automated assistant.
[Figure 3] This figure shows a method for providing an indicator of whether a particular input to an automated assistant will be affected by interference in the environment, and for providing a separate interface for providing input to the automated assistant.
[Figure 4] This figure is a block diagram of an exemplary computer system.
Modes for Carrying Out the Invention
[0013] Figures 1A and 1B show views 100 and 120, respectively, of user 102 receiving a suggestion to type input to the automated assistant instead of speaking input within an environment 110 that may interfere with spoken input. For example, user 102 may be located within an environment 110, such as outside their apartment building, and may wish to ask their automated assistant a question. To access the automated assistant, user 102 may have a computing device 106 that allows user 102 to interact with the automated assistant through one or more interfaces. For example, computing device 106 may include a display interface 104, which may be a touch-enabled display panel, and an audio interface 108, which may include a speaker and / or microphone. In some implementations, computing device 106 may also include a camera to provide another interface for interacting with the automated assistant.
[0014] In some implementations, the automated assistant can, with prior permission from user 102, determine whether one or more features of the environment 110 will affect the interaction between user 102 and the automated assistant. For example, the automated assistant can determine that user 102 is on a crowded sidewalk based on audio and / or image data available to the computing device 106. In some implementations, data from one or more sources can be processed to determine whether input to a particular interface of the computing device 106 will be subject to interference from the environment 110 and / or other sources. Based on this processing, the automated assistant can choose to provide user 102 with an indicator that input to a particular interface will be subject to interference while user 102 is in the environment 110. For example, initially, the display interface 104 need not render any indicator that a particular interface will be affected by interference such as background noise (e.g., multiple people talking, as shown in Figures 1A and 1B) and / or other interference. However, as shown in view 120 of Figure 1B, the automated assistant and / or another application may, upon determining that a particular interface would be affected by a particular interference, cause the keyboard 124 to be rendered on the display interface 104.
[0015] In some implementations, one or more characteristics of the keyboard 124 can be based at least in part on the extent to which interference is expected to affect input to a particular interface (e.g., a microphone) of the computing device 106. For example, the size of the keyboard 124 can be adjusted according to the extent to which interference is expected to affect input to that interface. In some implementations, when interference is not expected to affect spoken input to the automated assistant, the automated assistant can optionally cause the input field 112 to be rendered on the display interface 104. However, when interference is expected to affect input to the automated assistant, the automated assistant can either retain the input field 112 or remove it from the display interface 104 while the keyboard 124 is rendered on the display interface 104. This may allow the user 102 to type input to the automated assistant using their hands 122 instead of providing spoken input that would be subject to audible interference.
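Adjusting the size of keyboard 124 to the expected extent of interference can be sketched as a clamped linear mapping. The functions, the [0, 1] interference scale, the screen fractions, and the 0.5 threshold below are hypothetical choices for illustration, not values from the disclosure.

```python
def keyboard_height_fraction(interference: float, min_fraction: float = 0.25, max_fraction: float = 0.5) -> float:
    """Map an expected-interference level in [0, 1] to the share of the screen the keyboard occupies."""
    level = max(0.0, min(1.0, interference))  # clamp out-of-range scores
    return min_fraction + (max_fraction - min_fraction) * level


def display_layout(interference: float, threshold: float = 0.5, keep_input_field: bool = True) -> dict:
    """Decide what the display interface renders for a given (hypothetical) interference level."""
    if interference < threshold:
        # Spoken input is expected to work: show only the input field, no keyboard.
        return {"input_field": True, "keyboard_fraction": 0.0}
    # Interference expected: render a keyboard sized to the degree of interference,
    # optionally retaining or removing the input field.
    return {
        "input_field": keep_input_field,
        "keyboard_fraction": keyboard_height_fraction(interference),
    }


print(display_layout(0.2))                          # {'input_field': True, 'keyboard_fraction': 0.0}
print(display_layout(1.0, keep_input_field=False))  # {'input_field': False, 'keyboard_fraction': 0.5}
```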
[0016] Figure 2 shows a system 200 for suggesting alternative interface modalities when it is anticipated that the automated assistant and / or the user will not understand a particular interaction between the user and the automated assistant. The automated assistant 204 can operate as part of an assistant application provided in one or more computing devices, such as computing device 202 and / or a server device. The user can interact with the automated assistant 204 via an assistant interface 220, which can be a microphone, camera, touchscreen display, user interface, and / or any other device capable of providing an interface between the user and the application. For example, the user can initialize the automated assistant 204 by providing verbal, text, and / or graphical inputs to the assistant interface 220 to cause the automated assistant 204 to initialize one or more actions (e.g., providing data, controlling a peripheral device, accessing an agent, generating input and / or output). Alternatively, the automated assistant 204 may be initialized based on processing context data 236 using one or more trained machine learning models. Context data 236 can characterize one or more characteristics of the environment accessible to the automated assistant 204, and / or one or more characteristics of users who are expected to be attempting to interact with the automated assistant 204. The computing device 202 may include a display device, which may be a display panel including a touch interface for receiving touch input and / or gestures that allow the user to control an application 234 of the computing device 202 via the touch interface. In some implementations, the computing device 202 may lack a display device and therefore provide audible user interface output without providing graphical user interface output. Furthermore, the computing device 202 may provide a user interface, such as a microphone, for receiving spoken natural language input from the user. In some implementations, the computing device 202 may include a touch interface and may lack a camera, but may optionally include one or more other sensors.
[0017] Computing device 202 and / or other third-party client devices can communicate with a server device over a network such as the Internet. In addition, computing device 202 and any other computing devices can communicate with each other over a local area network (LAN), such as a Wi-Fi network. Computing device 202 can offload computing tasks to the server device to conserve computing resources on computing device 202. For example, the server device can host an automated assistant 204, and / or computing device 202 can send inputs received at one or more assistant interfaces 220 to the server device. However, in some implementations, the automated assistant 204 can be hosted on computing device 202, and various processes that can be associated with the operation of the automated assistant can be performed on computing device 202.
[0018] In various implementations, all or some aspects of the automated assistant 204 can be implemented on the computing device 202. In some of these implementations, aspects of the automated assistant 204 can be implemented by the computing device 202 and interface with a server device that can implement other aspects of the automated assistant 204. The server device can optionally serve multiple users and their associated assistant applications via multiple threads. In implementations where all or some aspects of the automated assistant 204 are implemented by the computing device 202, the automated assistant 204 can be a separate application from the operating system of the computing device 202 (e.g., installed "on top of" the operating system), or alternatively, it can be implemented directly by the operating system of the computing device 202 (e.g., integrated with the operating system but considered an application of the operating system).
[0019] In some implementations, the automated assistant 204 may include an input processing engine 206, which can process inputs and / or outputs of the computing device 202 and / or server device using multiple different modules. For example, the input processing engine 206 may include a speech processing engine 208 that can process audio data received at the assistant interface 220 to identify the text embodied in that audio data. To conserve computing resources on the computing device 202, the audio data can be sent, for example, from the computing device 202 to the server device. Alternatively, the audio data can be processed entirely on the computing device 202.
[0020] The process for converting audio data to text may include a speech recognition algorithm, which can use a neural network and / or statistical model to identify groups of audio data corresponding to words or phrases. The text converted from the audio data can be parsed by the data parsing engine 210 and made available to the automated assistant 204 as text data that can be used to generate and / or identify command phrases, intents, actions, slot values, and / or any other content specified by the user. In some implementations, the output data provided by the data parsing engine 210 can be provided to the parameter engine 212 to determine whether the user has provided input corresponding to a particular intent, action, and / or routine that can be performed by the automated assistant 204 and / or an application or agent accessible through the automated assistant 204. For example, assistant data 238 can be stored in the server device and / or computing device 202, and the assistant data 238 may include data defining one or more actions that can be performed by the automated assistant 204, as well as the parameters necessary to perform those actions. The parameter engine 212 can generate one or more parameters relating to intent, action, and / or slot values, and provide these one or more parameters to the output generation engine 214. The output generation engine 214 can use these one or more parameters to communicate with the assistant interface 220 to provide output to the user, and / or with one or more applications 234 to provide output to one or more applications 234.
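The flow from speech processing engine 208 through data parsing engine 210 and parameter engine 212 to output generation engine 214 can be illustrated as a chain of functions. This is a toy sketch only: the "audio" is assumed to be an already-transcribed byte string, the parser is a single hand-written rule, and `ASSISTANT_DATA` is a hypothetical stand-in for assistant data 238 mapping actions to their required parameters.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParsedInput:
    intent: str
    slots: dict = field(default_factory=dict)


# Hypothetical stand-in for assistant data 238: action name -> required parameters.
ASSISTANT_DATA = {"set_timer": ["minutes"]}


def speech_processing_engine(audio: bytes) -> str:
    """Stand-in for speech recognition: here the 'audio' is already a UTF-8 transcript."""
    return audio.decode("utf-8")


def data_parsing_engine(text: str) -> ParsedInput:
    """Toy rule-based parser recognizing '... timer ... N minutes'."""
    words = text.lower().split()
    if "timer" in words and "minutes" in words:
        i = words.index("minutes")
        if i > 0 and words[i - 1].isdigit():
            return ParsedInput("set_timer", {"minutes": int(words[i - 1])})
    return ParsedInput("unknown")


def parameter_engine(parsed: ParsedInput) -> Optional[dict]:
    """Return action parameters only if the intent is known and all required slots are filled."""
    required = ASSISTANT_DATA.get(parsed.intent)
    if required is None or any(p not in parsed.slots for p in required):
        return None
    return {"action": parsed.intent, **{p: parsed.slots[p] for p in required}}


def output_generation_engine(params: Optional[dict]) -> str:
    """Turn parameters into user-facing output (or a fallback message)."""
    if params is None:
        return "Sorry, I didn't catch that."
    args = ", ".join(f"{k}={v}" for k, v in params.items() if k != "action")
    return f"Running {params['action']}({args})"


def handle(audio: bytes) -> str:
    return output_generation_engine(parameter_engine(data_parsing_engine(speech_processing_engine(audio))))


print(handle(b"Set a timer for 5 minutes"))  # Running set_timer(minutes=5)
print(handle(b"What is the weather"))        # Sorry, I didn't catch that.
```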
[0021] In some implementations, the automated assistant 204 can be an application that is installed "on top of" the operating system of the computing device 202, and / or it can itself form part (or all) of the operating system of the computing device 202. The automated assistant application includes and / or has access to on-device speech recognition, on-device natural language understanding, and on-device fulfillment. For example, on-device speech recognition can be performed using an on-device speech recognition module that processes audio data (detected by the microphone) using an end-to-end speech recognition machine learning model stored locally on the computing device 202. On-device speech recognition generates recognized text for any oral utterances present in the audio data. Further, for example, on-device natural language understanding (NLU) can be performed using an on-device NLU module that processes the recognized text generated by on-device speech recognition, and optionally contextual data, to generate NLU data.
[0022] NLU data can include an intent corresponding to an oral utterance and, optionally, parameters related to the intent (e.g., slot values). On-device fulfillment can be performed using an on-device fulfillment module that determines the actions to take to resolve the intent of an oral utterance (and optionally parameters related to the intent) using the NLU data (from the on-device NLU) and optionally other local data. This can include determining local and / or remote responses (e.g., answers) to the oral utterance, interactions with locally installed applications to be performed based on the oral utterance, commands to be sent to Internet of Things (IoT) devices based on the oral utterance (either directly or via a corresponding remote system), and / or other resolution actions to be performed based on the oral utterance. Then, on-device fulfillment can initiate the local and / or remote implementation / execution of the determined actions to resolve the oral utterance.
[0023] In various implementations, remote speech processing, remote NLU, and / or remote fulfillment can be at least selectively utilized. For example, the recognized text can be at least selectively sent to a remote assistant component for remote NLU and / or remote fulfillment. For instance, the recognized text can optionally be sent for remote fulfillment in parallel with on-device fulfillment, or in response to a failure of on-device NLU and / or on-device fulfillment. However, on-device speech processing, on-device NLU, on-device fulfillment, and / or on-device execution can be prioritized, at least because of the latency reduction they provide when resolving an oral utterance (since no client-server round trips are required to resolve the oral utterance). Additionally, on-device capabilities may be the only capabilities available when there is no or limited network connectivity.
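The "on-device first, remote on failure" variant described in [0023] can be sketched as follows. This covers only the fallback path, not the parallel variant; the NLU callables and their return shapes are hypothetical toy stand-ins.

```python
from typing import Callable, Optional, Tuple

NluFn = Callable[[str], Optional[dict]]


def resolve_utterance(
    text: str,
    on_device_nlu: NluFn,
    remote_nlu: NluFn,
    network_available: bool,
) -> Tuple[str, Optional[dict]]:
    """Prefer on-device NLU (no round trip); use remote NLU only on failure and with connectivity."""
    result = on_device_nlu(text)
    if result is not None:
        return ("on_device", result)
    if network_available:
        return ("remote", remote_nlu(text))
    # Without connectivity, on-device capabilities are the only ones available.
    return ("unresolved", None)


# Hypothetical toy NLU functions: the on-device model only knows one command.
def local(text: str) -> Optional[dict]:
    return {"intent": "stop"} if text == "stop" else None


def remote(text: str) -> Optional[dict]:
    return {"intent": "search", "query": text}


print(resolve_utterance("stop", local, remote, network_available=False))    # ('on_device', {'intent': 'stop'})
print(resolve_utterance("who won", local, remote, network_available=True))  # ('remote', {'intent': 'search', 'query': 'who won'})
print(resolve_utterance("who won", local, remote, network_available=False)) # ('unresolved', None)
```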
[0024] In some implementations, the computing device 202 may include one or more applications 234 that may be provided by a third-party entity different from the entity that provided the computing device 202 and / or the automated assistant 204. The application state engine of the automated assistant 204 and / or the computing device 202 may access application data 230 to determine one or more actions that may be performed by one or more applications 234, and the state of each application 234 and / or the state of each device associated with the computing device 202. The device state engine of the automated assistant 204 and / or the computing device 202 may access device data 232 to determine one or more actions that may be performed by the computing device 202 and / or one or more devices associated with the computing device 202. Furthermore, application data 230 and / or other arbitrary data (e.g., device data 232) can be accessed by the automated assistant 204 to generate context data 236, which can characterize the context in which a particular application 234 and / or device is running, as well as the context in which a particular user is accessing application 234 and / or any other device or module, when accessing computing device 202.
[0025] While one or more applications 234 are being executed on computing device 202, device data 232 can characterize the current operating state of each application 234 being executed on computing device 202. Additionally, application data 230 can characterize one or more features of the executing applications 234, such as the content of one or more graphical user interfaces being rendered at the direction of one or more applications 234. Alternatively or in addition, application data 230 can characterize an action schema that can be updated by each application and / or by the automated assistant 204 based on the current operating state of each application. Alternatively or in addition, one or more action schemas for one or more applications 234 can remain fixed but be accessible by an application state engine to determine appropriate actions to be initiated via the automated assistant 204.
[0026] The computing device 202 may further include an assistant invocation engine 222, which can use one or more trained machine learning models to process application data 230, device data 232, context data 236, and / or any other data accessible to the computing device 202. The assistant invocation engine 222 can process this data to determine whether to treat the data as indicating the user's intent to invoke the automated assistant 204, rather than waiting for, or requiring, the user to explicitly speak an invocation phrase. For example, one or more trained machine learning models can be trained using training data instances based on scenarios in which the user is in an environment where multiple devices and / or applications exhibit various operating states. The training data instances can be generated to capture training data characterizing contexts in which the user invokes the automated assistant and other contexts in which the user does not.
[0027] Once one or more trained machine learning models have been trained according to these training data instances, the assistant invocation engine 222 can cause the automated assistant 204 to detect, or limit detection of, spoken invocation phrases from the user based on features of the context and / or environment. Additionally or alternatively, the assistant invocation engine 222 can cause the automated assistant 204 to detect, or limit detection of, one or more assistant commands from the user based on features of the context and / or environment. In some implementations, the assistant invocation engine 222 can be disabled or limited based on the computing device 202 detecting an assistant suppression output from another computing device. In this way, when the computing device 202 has detected an assistant suppression output, the automated assistant 204 will not be invoked based on context data 236 that would otherwise cause the automated assistant 204 to be invoked if no assistant suppression output had been detected.
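The gating described in [0026] and [0027] can be reduced to a small decision function: a context score from a model decides invocation, unless an assistant suppression output from another device has been detected. The model here is a hypothetical averaging stand-in for a trained model, and the context keys and 0.7 threshold are illustrative assumptions.

```python
from typing import Callable, Mapping

InvocationModel = Callable[[Mapping[str, float]], float]


def should_invoke(
    context: Mapping[str, float],
    model: InvocationModel,
    suppression_output_detected: bool,
    threshold: float = 0.7,  # hypothetical value
) -> bool:
    """Invoke from context alone unless another device has emitted an assistant suppression output."""
    if suppression_output_detected:
        # Suppressed: context that would otherwise trigger invocation is ignored.
        return False
    return model(context) >= threshold


def toy_model(ctx: Mapping[str, float]) -> float:
    """Hypothetical stand-in for a trained model: average of a few context signals."""
    keys = ("user_facing_device", "media_paused")
    return sum(ctx.get(k, 0.0) for k in keys) / len(keys)


ctx = {"user_facing_device": 1.0, "media_paused": 1.0}
print(should_invoke(ctx, toy_model, suppression_output_detected=False))  # True
print(should_invoke(ctx, toy_model, suppression_output_detected=True))   # False
```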
[0028] In some implementations, the automated assistant 204 may include an environment feature engine 218 that can detect one or more features of the environment in which the computing device 202 and / or other computing devices are operating. The environment feature engine 218 may process data characterizing these one or more features to determine whether the interaction between the user and the automated assistant 204 will be affected by the environment. This determination may be based on one or more heuristic processes and / or on one or more trained machine learning models that can be trained on previous instances in which one or more users interacted with the automated assistant in similar environments. For example, data characterizing previous instances in which a user interacted with the automated assistant in a crowded environment can be used to identify a threshold for interference scores. Interference scores can be generated for a particular environment using one or more trained machine learning models. When a particular interference score meets the interference score threshold, the environment feature engine 218 can communicate with the interference marking engine 226 of the automated assistant 204.
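One plausible way to "identify a threshold for interference scores" from previous instances is to pick the candidate threshold that best separates past interactions that were and were not affected. The sketch below assumes labelled (score, was_interfered) pairs and an accuracy criterion; both the data format and the selection rule are hypothetical, not the method stated in the disclosure.

```python
from typing import List, Tuple


def calibrate_threshold(history: List[Tuple[float, bool]]) -> float:
    """Pick the candidate threshold that best separates past interfered / clear interactions.

    history: hypothetical (interference_score, was_interfered) pairs from previous
    interactions in similar environments. A score >= threshold predicts interference.
    """
    if not history:
        raise ValueError("history must not be empty")
    best_t, best_acc = 0.0, -1.0
    for t in sorted({score for score, _ in history}):
        correct = sum((score >= t) == interfered for score, interfered in history)
        acc = correct / len(history)
        if acc > best_acc:  # strict '>' keeps the lowest threshold among ties
            best_t, best_acc = t, acc
    return best_t


def meets_threshold(score: float, threshold: float) -> bool:
    return score >= threshold


past = [(0.2, False), (0.3, False), (0.4, False), (0.6, True), (0.9, True)]
t = calibrate_threshold(past)
print(t)                         # 0.6
print(meets_threshold(0.75, t))  # True
```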
[0029] The interference marking engine 226 can be used by the automated assistant 204 to provide the user with a mark indicating that the interaction between the user and the automated assistant 204 may be affected by one or more features of the environment in which they are located. In some implementations, the type of mark provided to the user can be based on one or more scores generated by the environment feature engine 218. For example, when a score is generated indicating that audio interference will affect the audio interface, the interference marking engine 226 can be configured to render a visual mark for the user. In some implementations, the characteristics of the mark can also be selected by the interference marking engine 226. For example, the size, shape, brightness, content, and / or other characteristics of the mark can be adjusted according to the degree of interference that is expected to affect the interaction between the user and the automated assistant 204.
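The mark selection in [0029] can be sketched as choosing a modality other than the affected one and scaling an intensity with the degree of interference. Rendering a visual mark for audio interference follows the text; using an audio mark for any other affected interface, the `Mark` class, and the 0.5 to 1.0 intensity range are hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class Mark:
    modality: str     # "visual" or "audio"
    intensity: float  # relative size (visual) or volume (audio), 0.5 to 1.0


def choose_mark(affected_interface: str, degree: float) -> Mark:
    """Render the mark through a modality other than the one expected to be interfered with,
    scaling its intensity with the expected degree of interference (clamped to [0, 1])."""
    modality = "visual" if affected_interface == "audio" else "audio"
    level = max(0.0, min(1.0, degree))
    return Mark(modality=modality, intensity=0.5 + 0.5 * level)


print(choose_mark("audio", 1.0))  # Mark(modality='visual', intensity=1.0)
print(choose_mark("touch", 0.0))  # Mark(modality='audio', intensity=0.5)
```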
[0030] In some implementations, the automated assistant 204 may include an interface selection engine 228 that can select an interface that the user should be encouraged to use when interference is expected to affect another interface. For example, if the computing device 202 is determined to be subject to an amount of glare that makes typing on a touch interface difficult, the interface selection engine 228 may score the touch interface lower than other available interfaces. In some cases, the interface selection engine 228 may designate a touch interface as less suitable than, for example, an audio interface in a particular environment. In some implementations, the interference marking engine 226 may be notified of the ranking from the interface selection engine 228 and generate marks that identify the optimal interface for a particular environment. As an example, marks provided by the interference marking engine 226 may include a rendering of a keyboard to which the user can provide touch input in order to type input directly to the automated assistant.
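The ranking handed from interface selection engine 228 to interference marking engine 226 can be sketched as a sort over per-interface suitability scores, with the top-ranked interface selecting the mark. The scores (higher meaning less expected interference), interface names, and mark texts below are hypothetical.

```python
from typing import Dict, List

# Hypothetical mapping from interface to the mark that would promote it.
MARKS = {
    "touch_keyboard": "Render a keyboard for typed input",
    "microphone": "Show a 'speak now' prompt",
    "camera": "Show a gesture hint",
}


def rank_interfaces(suitability: Dict[str, float]) -> List[str]:
    """Order interfaces from most to least suitable (higher score = less expected interference)."""
    return sorted(suitability, key=lambda name: suitability[name], reverse=True)


def mark_for_best_interface(suitability: Dict[str, float]) -> str:
    """Mark identifying the top-ranked interface for this environment."""
    best = rank_interfaces(suitability)[0]
    return MARKS.get(best, f"Use {best}")


# Glare makes the touch interface hard to use, so it scores lowest.
glare = {"touch_keyboard": 0.3, "microphone": 0.8, "camera": 0.5}
print(rank_interfaces(glare))          # ['microphone', 'camera', 'touch_keyboard']
print(mark_for_best_interface(glare))  # Show a 'speak now' prompt
```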
[0031] Figure 3 illustrates a method 300 for providing an indicator that a particular input to an automated assistant would be affected by interference in the environment, and for providing a separate interface for providing input to the automated assistant. The method 300 can be implemented by one or more computing devices, applications, and / or any other device or module that can be associated with the automated assistant. The method 300 may include an action 302 of determining whether a user intends to provide input to the automated assistant. In some implementations, this determination may be based on data from one or more sensors and / or other interfaces of one or more computing devices. For example, the automated assistant may determine, based on audio data and / or image data, that one or more users are within a threshold distance of an automated assistant interface. If the automated assistant determines that a user intends to provide input to the automated assistant, the method 300 can proceed to action 304. Otherwise, the automated assistant can continue to determine whether a user intends to provide input to the automated assistant.
[0032] Action 304 can include determining whether one or more features of the environment are expected to affect user input. In some implementations, the one or more features of the environment can include audio features of the environment. Audio features can include any of the following:

- the number of people speaking
- the source of a particular sound
- the volume and/or frequency of a particular sound
- ambient noise
- the overall volume level
- the distance from a sound source to a particular interface
- any other audio feature that may interfere with input to a computing device

In some implementations, the one or more features of the environment can include objects, people, locations, available power, weather, movement, lighting, distances between specific objects, the layout of an area, temperature, and/or any other features of the environment that may affect the interaction between the user and the automated assistant. In some implementations, one or more features of the environment can be determined and used to generate a score. The automated assistant can then determine whether the score meets one or more thresholds. Alternatively, or additionally, a separate score can be generated for each interface of a computing device present in the environment. When the score for a particular interface falls below the threshold for that interface, the automated assistant can provide an indicator that the interface may be subject to interference during the interaction between the user and the automated assistant. For example, a score for an audio interface might fail to meet the threshold for audio interfaces. As a result, the automated assistant might cause a keyboard interface to be rendered at the computing device.
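One way to picture action 304 is to compute a heuristic score for the audio interface and compare it against a per-interface threshold. The feature set, weights, and threshold values below are illustrative assumptions. The source does not specify how the score is computed.

```python
from dataclasses import dataclass


@dataclass
class AudioFeatures:
    ambient_noise_db: float   # overall background level
    speaker_count: int        # people speaking nearby, including the user
    source_distance_m: float  # distance from the user to the microphone


# Hypothetical per-interface thresholds; the text only says that each
# interface may have its own threshold.
THRESHOLDS = {"audio": 0.5, "keyboard": 0.3}


def audio_interface_score(f: AudioFeatures) -> float:
    """Heuristic 0..1 estimate of how reliably speech will be captured."""
    noise_penalty = min(max(f.ambient_noise_db - 40.0, 0.0) / 40.0, 1.0)
    crowd_penalty = min(0.15 * max(f.speaker_count - 1, 0), 0.6)
    distance_penalty = min(f.source_distance_m / 10.0, 0.5)
    return max(0.0, 1.0 - noise_penalty - crowd_penalty - distance_penalty)


def interfaces_at_risk(scores: dict) -> list:
    """Return the interfaces whose score falls below their own threshold."""
    return [name for name, s in scores.items() if s < THRESHOLDS[name]]


def choose_action(features: AudioFeatures) -> str:
    # The keyboard is assumed to be unaffected by audio features.
    scores = {"audio": audio_interface_score(features), "keyboard": 1.0}
    # Mirror the example in the text: a failing audio score leads to a keyboard.
    return "render_keyboard" if "audio" in interfaces_at_risk(scores) else "listen"
```

In a quiet room with one nearby speaker, the audio score stays high and the assistant keeps listening. In a loud, crowded room, the score drops below 0.5 and a keyboard is rendered instead.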
[0033] If the automated assistant determines that one or more features of the environment are expected to affect the interaction between the user and the automated assistant, the method 300 can proceed from action 304 to action 306. Otherwise, the automated assistant can continue to determine whether the user has provided input to the particular interface, or to another interface, to facilitate the interaction with the automated assistant. Action 306 can include causing the automated assistant to provide an indicator that input being provided, or about to be provided, via a particular interface may be affected by the environment. In some implementations, the indicator can be provided via an audio interface, a graphical interface, a haptic interface, a wireless interface, and/or any other interface that can present the indicator to the user. In some implementations, when the computing device includes a touch display panel, the indicator can be rendered at the touch display panel as a keyboard interface. In this way, a user who is about to provide a verbal utterance that is expected to be affected by interference can notice the keyboard interface. The user can then choose to provide touch input to the touch display panel instead of the verbal utterance.
[0034] The method 300 can proceed from action 306 to an optional action 308. Action 308 can include the automated assistant receiving a separate input from the user via a different interface, that is, an interface distinct from any interface that has been determined to be affected by interference in the environment. For example, the user might otherwise speak an utterance such as "Good morning, Assistant" to initialize execution of a "Good morning" routine. Instead, the user can type a shorter input, such as "Good morning," into the keyboard interface as input to the automated assistant. The method 300 can then proceed from action 308 to an optional action 310, which can include causing the automated assistant to initialize execution of one or more actions based on the separate input. For example, the separate input can include natural language content typed by the user at the keyboard interface rendered in action 306. If that content is "Good morning," the automated assistant can perform one or more actions to complete the "Good morning" routine. Examples include reading the day's calendar entries, turning on the lights in the house, and playing relaxing music. By indicating whether verbal or other types of input may be subject to interference, the automated assistant can reduce the number of misinterpreted user inputs. This preserves computing resources that would otherwise be consumed by performing incorrect actions and/or processing duplicate inputs.
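Actions 302 through 310 can be strung together as in the following sketch. The boolean inputs stand in for the sensor-based determinations of actions 302 and 304. The routine table, `normalize`, and `run_method_300` are hypothetical names; only the three "Good morning" actions come from the example above.

```python
# Hypothetical routine table; the three actions mirror the "Good morning" example.
ROUTINES = {
    "good morning": ["read_todays_calendar", "turn_on_lights", "play_relaxing_music"],
}


def normalize(text):
    """Lowercase and trim surrounding whitespace and punctuation."""
    return text.strip().strip(",.!?").strip().lower()


def run_method_300(user_about_to_speak, interference_expected, read_keyboard_input):
    """Return (interface rendered, actions initialized)."""
    # Action 302: nothing to do until a user is about to provide input.
    if not user_about_to_speak:
        return None, []
    # Action 304: no expected interference, so keep using the audio interface.
    if not interference_expected:
        return "audio", []
    # Action 306: here the indicator is a rendered keyboard interface.
    rendered = "keyboard"
    # Action 308 (optional): separate input typed at the keyboard.
    typed = normalize(read_keyboard_input())
    # Action 310 (optional): initialize the actions of the matching routine.
    return rendered, list(ROUTINES.get(typed, []))
```

Passing the typed input as a callable keeps the sketch testable. The keyboard is only read when action 306 has actually rendered it.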
[0035] Figure 4 is a block diagram 400 of an exemplary computer system 410. The computer system 410 typically includes at least one processor 414 that communicates with several peripheral devices via a bus subsystem 412. These peripheral devices may include, for example, a storage subsystem 424 including memory 425 and a file storage subsystem 426, a user interface output device 420, a user interface input device 422, and a network interface subsystem 416. The input and output devices enable user interaction with the computer system 410. The network interface subsystem 416 provides an interface to an external network and is coupled to a corresponding interface device in another computer system.
[0036] The user interface input device 422 can include:

- a keyboard
- pointing devices such as a mouse, trackball, touchpad, or graphics tablet
- a scanner
- a touchscreen incorporated into a display
- audio input devices such as voice recognition systems and microphones
- other types of input devices

In general, the term "input device" is intended to include all possible types of devices and ways to input information into the computer system 410 or onto a communication network.
[0037] The user interface output device 420 can include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices. The display subsystem can include a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem can also provide non-visual display, such as through audio output devices. In general, the term "output device" is intended to include all possible types of devices and ways to output information from the computer system 410 to the user or to another machine or computer system.
[0038] The storage subsystem 424 stores programming and data constructs that provide the functionality of some or all of the modules described herein. For example, the storage subsystem 424 can include logic to perform selected aspects of the method 300. It can also include logic to implement one or more of the system 200, the computing device 106, and/or any other application, device, apparatus, and/or module discussed herein.
[0039] These software modules are generally executed by the processor 414, alone or in combination with other processors. The memory 425 used in the storage subsystem 424 can include a number of memories. These include a main random access memory (RAM) 430 for storing instructions and data during program execution and a read-only memory (ROM) 432 in which fixed instructions are stored. The file storage subsystem 426 can provide persistent storage for program and data files. It can include a hard disk drive, a floppy disk drive with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges. Modules implementing the functionality of certain implementations can be stored by the file storage subsystem 426 in the storage subsystem 424, or on other machines accessible by the processor 414.
[0040] The bus subsystem 412 provides a mechanism for various components and subsystems of the computer system 410 to communicate with each other as intended. Although the bus subsystem 412 is schematically shown as a single bus, multiple buses can be used in alternative implementations of the bus subsystem.
[0041] The computer system 410 can be of various types, including a workstation, server, computing cluster, blade server, server farm, or any other data processing system or computing device. Because computers and networks are constantly changing, the description of the computer system 410 depicted in Figure 4 is intended only as a specific example for purposes of illustrating some implementations. Many other configurations of the computer system 410 are possible, with more or fewer components than the computer system depicted in Figure 4.
[0042] In situations where the systems described herein collect or may use personal information about users (or, as often referred to herein, "participants"), users can be given the opportunity to control whether programs or features collect user information. Such information can include a user's social network, social actions or activities, profession, preferences, or current geographic location. Users can also be given control over whether and/or how they receive content from a content server that may be more relevant to them. In addition, certain data can be processed in one or more ways before it is stored or used, so that personally identifiable information is removed. For example, a user's identity can be processed so that no personally identifiable information can be determined for that user. Where geographic location information is obtained, the user's location can be generalized (for example, to the city, ZIP code, or state level) so that the user's specific location cannot be determined. Thus, users can control how information about them is collected and/or used.
[0043] While several implementations have been described and illustrated herein, a variety of other means and/or structures can be used to perform the function and/or obtain the results and/or one or more of the advantages described herein. Each such variation and/or modification is deemed to be within the scope of the implementations described herein. More generally, all parameters, dimensions, materials, and configurations described herein are meant to be illustrative. The actual parameters, dimensions, materials, and/or configurations will depend on the specific application or applications in which the teachings are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific implementations described herein. It is therefore to be understood that the foregoing implementations are presented by way of example only. Within the scope of the appended claims and their equivalents, implementations can be practiced otherwise than as specifically described and claimed. Implementations of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods is included within the scope of the present disclosure, provided that those features, systems, articles, materials, kits, and/or methods are not mutually inconsistent.
[0044] In some implementations, a method implemented by one or more processors is set forth. The method includes processing first data that characterizes audio characteristics of an environment in which a computing device is present, where the computing device provides access to an automated assistant via one or more interfaces of the computing device. The method can further include determining, based on the first data, whether the audio characteristics correspond to interference that is expected to affect an interaction between a user and the automated assistant, where the interaction occurs via an audio interface of the one or more interfaces. The method can further include processing second data indicating that the user is providing, or is expected to provide, a spoken utterance to the automated assistant to facilitate the interaction. When the interference is determined to affect the interaction, the method can further include the following operations:

- causing, based on the first data and the second data, the computing device or another computing device to render a keyboard interface;
- receiving user input at the keyboard interface to facilitate the interaction between the user and the automated assistant; and
- causing the automated assistant to initialize performance of one or more actions in response to receiving the user input at the keyboard interface.
[0045] In some implementations, when the interference is determined not to affect the interaction, the method can further include receiving separate user input via the audio interface to facilitate the interaction between the user and the automated assistant. In some implementations, determining whether the audio characteristics correspond to interference that is expected to affect the interaction includes identifying the number of people located within a threshold distance of the one or more interfaces. In some implementations, this determination includes determining a score that indicates whether the automated assistant is expected to correctly interpret the spoken utterance that the user is providing, or is expected to provide, to facilitate the interaction. The interference is then determined to be expected to affect the interaction when the score meets a threshold.
[0046] In some implementations, determining whether the audio characteristics correspond to interference that is expected to affect the interaction between the user and the automated assistant includes determining an additional score. This additional score characterizes how easily the keyboard interface can be accessed within the environment in which the computing device is present, and the rendering of the keyboard interface is further based on it. In some implementations, the computing device renders the keyboard interface concurrently with the user providing a spoken utterance to the automated assistant to facilitate the interaction. In other implementations, the computing device renders the keyboard interface before the user provides the spoken utterance.
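The following sketch shows one way to combine an interference score with a keyboard accessibility score and then choose between rendering the keyboard before or concurrently with an utterance. The accessibility heuristic (whether the user's hands are occupied, and the distance to the screen), both thresholds, and all names are assumptions made for illustration. The score is also modeled here as a misinterpretation risk, so "meets a threshold" is read as "greater than or equal to".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Context:
    misinterpretation_risk: float  # 0..1, derived from the audio characteristics
    device_distance_m: float       # how far the user is from the screen
    hands_occupied: bool           # e.g. the user is cooking or carrying something
    user_speaking: bool            # whether an utterance is already under way


def keyboard_accessibility(ctx: Context) -> float:
    """Hypothetical 0..1 score for how easily the keyboard can be used."""
    if ctx.hands_occupied:
        return 0.0
    return max(0.0, 1.0 - ctx.device_distance_m / 5.0)


def keyboard_decision(ctx: Context, risk_threshold=0.6, access_threshold=0.4) -> Optional[str]:
    """Return None (no keyboard), 'before', or 'concurrent'."""
    if ctx.misinterpretation_risk < risk_threshold:
        return None  # speech is expected to be understood
    if keyboard_accessibility(ctx) < access_threshold:
        return None  # a keyboard would not actually help here
    # Render alongside an utterance in progress, or ahead of one that is expected.
    return "concurrent" if ctx.user_speaking else "before"
```

Requiring both scores avoids offering a keyboard that the user cannot practically reach. An example would be a user across the room or with full hands.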
[0047] In other implementations, a method implemented by one or more processors is set forth. The method includes processing first data that characterizes one or more features of an environment in which a computing device is present, where the computing device provides access to an automated assistant via one or more interfaces of the computing device. The method can further include determining, based on the first data, whether the one or more features correspond to interference that is expected to affect an interaction between a user and the automated assistant, where the interaction occurs via the one or more interfaces of the computing device. The method can further include processing second data indicating that the user is providing, or is expected to provide, input to the automated assistant to facilitate the interaction. When the interference is determined to affect the interaction, the method can further include the following operations:

- causing, based on the first data and the second data, the computing device or another computing device to render an indicator that the interaction between the user and the automated assistant will be affected by the interference;
- receiving user input at one or more other interfaces of the computing device or the other computing device to facilitate the interaction; and
- causing the automated assistant to initialize performance of one or more actions in response to receiving the user input at the one or more other interfaces.
[0048] In some implementations, when the interference is determined not to affect the interaction, the method can further include receiving separate user input at the one or more interfaces of the computing device to facilitate the interaction between the user and the automated assistant. In some implementations, the one or more interfaces include an audio interface, and the indicator is rendered at a graphical user interface of the computing device or the other computing device. In some implementations, determining whether the one or more features correspond to interference that is expected to affect the interaction includes identifying the number of people located within a threshold distance of the one or more interfaces. In some implementations, this determination includes identifying the number of people who are speaking within a threshold distance of the one or more interfaces.
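Once positions are known, counting people, or people who are speaking, within a threshold distance of an interface reduces to a simple geometric filter. The sketch assumes that 2-D positions and a speaking flag have already been estimated from audio and/or image data. The `Person` class, the 3 m default, and the rule that more than one nearby speaker counts as interference are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Person:
    x: float         # position in metres, in the same frame as the interface
    y: float
    speaking: bool   # estimated from audio and/or image data


def count_nearby(people, interface_xy, threshold_m, speaking_only=False):
    """Count people (optionally only speakers) within threshold_m of the interface."""
    ix, iy = interface_xy
    return sum(
        1
        for p in people
        if math.hypot(p.x - ix, p.y - iy) <= threshold_m
        and (p.speaking or not speaking_only)
    )


def interference_expected(people, interface_xy, threshold_m=3.0, max_speakers=1):
    # The user is usually one of the speakers, so more than one nearby
    # speaker is treated (hypothetically) as competing speech.
    return count_nearby(people, interface_xy, threshold_m, speaking_only=True) > max_speakers
```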
[0049] In some implementations, determining whether the one or more features correspond to interference that is expected to affect the interaction between the user and the automated assistant includes determining a score. The score indicates whether the automated assistant is expected to correctly interpret the input that the user is providing, or is expected to provide, to facilitate the interaction. The interference is then determined to be expected to affect the interaction when the score meets a threshold. In some implementations, this determination also includes determining an additional score that characterizes how easily the one or more other interfaces can be accessed in the environment in which the computing device is present, and the indicator is further based on the additional score. In some implementations, the computing device renders the indicator concurrently with the user providing the input to the automated assistant to facilitate the interaction. In other implementations, the computing device renders the indicator before the user provides the input. In some implementations, the one or more other interfaces include a keyboard interface, and the indicator includes the keyboard interface rendered at a graphical user interface of the computing device. In some implementations, the indicator further includes a text field containing proposed content. The proposed content is based on content that is specifically represented in the input and was affected by the interference.
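As one example, the text field with proposed content could be pre-filled from a speech recognition hypothesis. Words recognized with high confidence are kept, and words that interference likely corrupted are blanked out. The (word, confidence) input format, the 0.7 cutoff, the `___` placeholder, and the function name are assumptions, not details from the source.

```python
def propose_text_field_content(hypothesis, min_confidence=0.7, placeholder="___"):
    """Build proposed text-field content from a recognition hypothesis.

    `hypothesis` is a list of (word, confidence) pairs. Confidently recognized
    words are kept; words likely corrupted by interference become a
    placeholder that the user can fill in at the keyboard.
    """
    words = [w if c >= min_confidence else placeholder for w, c in hypothesis]
    # Collapse runs of placeholders so the proposed content stays short.
    collapsed = []
    for w in words:
        if w == placeholder and collapsed and collapsed[-1] == placeholder:
            continue
        collapsed.append(w)
    return " ".join(collapsed)
```

For example, a hypothesis of "turn on the kitchen lights" where "the kitchen" was drowned out becomes `turn on ___ lights`. The user then types only the missing part.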
[0050] Further implementations set forth a method implemented by one or more processors. The method includes processing contextual data that characterizes audio characteristics of an environment in which a computing device is present, where the computing device provides access to an automated assistant via an audio interface of the computing device. The method can further include determining, based on processing the contextual data, whether the audio characteristics correspond to interference that is expected to affect an interaction between a user and the automated assistant, where the interaction occurs via the audio interface. When the interference is determined to affect the interaction, the method can further include causing, based on the contextual data, the computing device or another computing device to render a keyboard interface for receiving user input to facilitate the interaction. In some implementations, processing the contextual data includes processing historical interaction data. This data characterizes one or more prior instances in which the user provided verbal input to the automated assistant and the automated assistant failed to fulfill one or more requests embodied in that verbal input.
Explanation of Symbols
[0051]
100 View
102 User
104 Display interface
106 Computing device
108 Audio interface
110 Environment
112 Input field
120 View
122 Hand
124 Keyboard
200 System
202 Computing device
204 Automated assistant
206 Input processing engine
208 Speech processing engine
210 Data parsing engine
212 Parameter engine
214 Output generation engine
218 Environmental features engine
220 Assistant interface
222 Assistant invocation engine
226 Interference marking engine
228 Interface selection engine
230 Application data
232 Device data
234 Applications
236 Context data
238 Assistant data
300 Method
400 Block diagram
410 Computer system
412 Bus subsystem
414 Processor
416 Network interface subsystem
420 User interface output device
422 User interface input device
424 Storage subsystem
425 Memory
426 File storage subsystem
430 Main random access memory (RAM)
432 Read-only memory (ROM)
Claims
1. A method implemented by one or more processors, the method comprising: generating a current embedding that represents current features of context data, based on processing the context data using a machine learning model, wherein the context data characterizes environmental characteristics of an environment in which a computing device is present, and wherein the computing device provides access to an automated assistant via multiple interfaces; determining a first distance based on comparing the current embedding with a first embedding that represents features of an audio interface modality; determining a second distance based on comparing the current embedding with a second embedding that represents features of a keyboard interface modality; selecting the audio interface modality or the keyboard interface modality based on the first distance and the second distance; and causing, based on the selection, the computing device to render one of an audio interface or a keyboard interface for receiving user input to facilitate an interaction between a user and the automated assistant.
2. The method according to claim 1, wherein the context data characterizes a context in which a particular application is running.
3. The method according to claim 1, wherein the context data characterizes a context in which a particular user is accessing the computing device.
4. The method according to claim 1, wherein the context data characterizes a context in which a particular user is accessing an application on the computing device.
5. The method according to claim 1, wherein the context data characterizes a context in which a particular user is accessing other devices.
6. The method according to claim 1, wherein the machine learning model is trained using training data based on scenarios in which the user is in an environment where multiple devices and/or applications exhibit various operating states.
7. A method implemented by one or more processors, the method comprising: processing first data to determine a plurality of features of an environment in which a computing device is present, wherein the plurality of features include at least a first feature of a first type and a second feature of a second type that is different from the first type, and wherein the computing device provides access to an automated assistant via one or more interfaces of the computing device; determining, based on at least the first feature and the second feature, whether the plurality of features correspond to interference that is expected to affect an interaction between a user and the automated assistant, wherein the interaction occurs via the one or more interfaces of the computing device; processing second data indicating that the user is providing, or is expected to provide, input to the automated assistant to facilitate the interaction; and, when the interference is determined to affect the interaction: causing, based on the first data and the second data, the computing device or another computing device to render a keyboard interface and an indication that the interaction between the user and the automated assistant will be affected by the interference; receiving, at the rendered keyboard interface of the computing device or the other computing device, user input to facilitate the interaction between the user and the automated assistant; and, in response to receiving the user input: generating natural language understanding (NLU) data based on the user input, including performing NLU on text of the user input; and causing, based on the NLU data, the automated assistant to initialize performance of one or more actions.
8. The method according to claim 7, further comprising: when the interference is determined not to affect the interaction, receiving separate user input at the one or more interfaces of the computing device to facilitate the interaction between the user and the automated assistant.
9. The method according to claim 7, wherein the one or more interfaces include an audio interface, and wherein the indication is rendered at a graphical user interface of the computing device or the other computing device.
10. The method according to claim 7, wherein determining whether the plurality of features correspond to the interference that is expected to affect the interaction between the user and the automated assistant includes: determining a number of people located within a threshold distance of the one or more interfaces.
11. The method according to claim 7, wherein determining whether the plurality of features correspond to the interference that is expected to affect the interaction between the user and the automated assistant includes: determining a number of people speaking within a threshold distance of the one or more interfaces.
12. The method according to claim 7, wherein determining whether the plurality of features correspond to the interference that is expected to affect the interaction between the user and the automated assistant includes: determining a score indicating whether the automated assistant is expected to correctly interpret the input that the user is providing, or is expected to provide, to facilitate the interaction, wherein the interference is determined to be expected to affect the interaction when the score meets a threshold.
13. The method according to claim 7, wherein determining whether the plurality of features correspond to the interference that is expected to affect the interaction between the user and the automated assistant includes: determining an additional score that characterizes the ease of accessing one or more other interfaces in the environment in which the computing device is present, wherein the indication is further based on the additional score.
14. The method according to claim 7, wherein causing the computing device to render the indication is performed concurrently with the user providing the input to the automated assistant to facilitate the interaction.
15. The method according to claim 7, wherein causing the computing device to render the indication is performed before the user provides the input to the automated assistant to facilitate the interaction.
16. The method according to claim 7, wherein the indication further includes a text field, and wherein the text field includes proposed content that is based on content specifically represented in the input and affected by the interference.
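Claim 1 selects a modality by comparing the distances from a current context embedding to per-modality reference embeddings. The sketch below makes several assumptions, none of which come from the claims:

- a fixed linear projection stands in for the trained machine learning model;
- the context features and reference embeddings are hand-picked;
- Euclidean distance is the comparison;
- ties go to the audio modality.

```python
import math

# Stand-in for the machine learning model: a fixed linear projection from
# three hand-made context features to a 2-D embedding. Purely illustrative.
WEIGHTS = [
    [0.9, 0.8, -0.2],  # dimension 0: roughly "noisy / crowded"
    [-0.1, 0.0, 1.0],  # dimension 1: roughly "far from the screen"
]

# Hypothetical reference embeddings for each modality.
AUDIO_EMBEDDING = [0.0, 1.0]     # quiet room, user away from the screen
KEYBOARD_EMBEDDING = [1.5, 0.0]  # loud, crowded room, user near the screen


def embed(features):
    """features: [noise_level, nearby_speakers, distance_from_screen], each 0..1."""
    return [sum(w * f for w, f in zip(row, features)) for row in WEIGHTS]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_modality(features):
    current = embed(features)
    first_distance = euclidean(current, AUDIO_EMBEDDING)
    second_distance = euclidean(current, KEYBOARD_EMBEDDING)
    # Pick the modality whose reference embedding is closer to the context.
    return "audio" if first_distance <= second_distance else "keyboard"
```

A loud, crowded context maps near the keyboard reference. A quiet context with the user away from the screen maps near the audio reference.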