Modification of an audio-based computer program output

DE102017131378B4 · Active · Publication Date: 2026-06-11 · GOOGLE LLC


Patent Information

Authority / Receiving Office: DE · DE
Patent Type: Patents
Current Assignee / Owner: GOOGLE LLC
Filing Date: 2017-12-28
Publication Date: 2026-06-11

Application Information

IPC: G06F3/16; G06F40/211; G10L13/00; G10L13/08

CPC: G06F3/167; G06F16/951; G06F3/16; G06F40/211; G06F40/253; G06F21/6218; G10L13/08; H04L51/02

AI Tagging

Application Domain

Web data indexing; Computer security arrangements


AI Technical Summary

Technical Problem

Dissimilar computing resources face challenges in efficiently processing and delivering audio-based content elements due to inconsistent speech or image models, leading to redundant processing and inefficient bandwidth utilization.

Method used

A data processing system that modifies computer program output by selecting and inserting parameterized content elements into dialog data structures using chatbots, employing parametrically controlled text-to-speech techniques to enhance efficiency and reduce redundant processing.

Benefits of technology

This approach reduces resource consumption, processor utilization, and bandwidth usage while ensuring accurate and consistent delivery of audio-based content, allowing for seamless session continuation and validation of chatbot platforms.

✦ Generated by Eureka AI based on patent content.


Abstract

System (100) for modifying a computer program output, comprising a data processing system (102) with one or more processors and memory for: receiving, from a computer device (104), a digital file corresponding to a first acoustic signal carrying speech content detected by a microphone of the computer device (104), wherein the first acoustic signal is converted into the digital file by an analog-to-digital converter of the computer device (104); selecting, in response to the speech content of the digital file, a computer program that includes a chatbot, from several computer programs that include chatbots, for execution; identifying, via the chatbot and based on the speech content of the digital file, a dialog data structure that includes a placeholder field; generating, in response to an identification of the placeholder field in the dialog data structure, a content request in a parameterized format configured for a parametrically controlled text-to-speech technique; sending the content request to a content selection component of the data processing system (102); selecting, via a content selection process in response to the request, a content element to be inserted into the placeholder field of the dialog data structure, wherein the content element is configured in the parameterized format for the parametrically controlled text-to-speech technique; and providing to the chatbot the content element in the parameterized format selected via the content selection process to cause the computer device (104) to execute the parametrically controlled text-to-speech technique to generate a second acoustic signal corresponding to the dialog data structure modified with the content element.


Description

BACKGROUND

[0001] Packet-based or otherwise excessive network transmissions of network traffic data between computing devices can prevent a computing device from properly processing the network traffic data, completing an operation associated with the network traffic data, or responding to the network traffic data in a timely manner. Excessive network transmissions of network traffic data can also complicate data routing or degrade the quality of the response if the responding computing device is at or above its processing capacity, potentially resulting in inefficient bandwidth utilization. Controlling network transmissions corresponding to content element objects can be complicated by the large number of content element objects that can initiate network transmissions of network traffic data between computing devices.

[0002] DE 10 2012 218 938 A1 refers to voice interfaces to computer-based services received wirelessly from a cellular telephone or other mobile device, and such interfaces implemented in a vehicle, such as a passenger car.

[0003] US 2017/0132199 A1 refers to methods for using a virtual assistant, comprising: receiving an unstructured, natural-language user request for a service from a virtual assistant; determining, based on the content of the user request and the content of the plan templates, whether the user request matches at least one of several plan templates that the virtual assistant can access; selecting one of the plan templates according to a determination that the user request matches at least one of them, and refraining from selecting a plan template according to a determination that the user request does not match any of them; and, in response to the selection of one of the plan templates, beginning to respond to the user request according to the selected plan template.

SUMMARY

[0004] The present disclosure is generally aimed at improving the efficiency and effectiveness of information transfer and processing across dissimilar computing resources. With dissimilar computing resources, it is challenging to efficiently process and consistently and accurately deliver audio-based content elements in a speech-based (or other non-text-based, such as image or video) computing environment. For example, the dissimilar computing resources may not have access to the same speech or image models, or they may have access to outdated or unsynchronized speech or image models, which can make it challenging to deliver the audio-based content element accurately and consistently. Furthermore, the computing resources may perform redundant processing to select content elements that could instead be reused, thereby wasting processor capacity.

[0005] The systems and methods of this disclosure are generally directed to a data processing system that modifies the output of a computer program over a network. The data processing system can receive a content request, which is to be provided via a computer program that includes a chatbot. A chatbot, or artificial conversational unit, can refer to a computer program that conducts a conversation using auditory, visual, or textual techniques.

[0006] At least one aspect relates to a system for modifying a computer program output according to claims 1 to 11.

[0007] A method for modifying a computer program output is further disclosed. The method can be executed by a data processing system comprising one or more processors and memory. The method may include the data processing system receiving, from a computer device, a digital file corresponding to a first acoustic signal carrying speech content detected by a microphone of the computer device. The first acoustic signal can be converted into the digital file by an analog-to-digital converter of the computer device. The method may include the data processing system selecting, in response to the speech content of the digital file, a computer program comprising a chatbot from among several computer programs that include chatbots for execution. The method may include the data processing system using the chatbot to identify, based on the speech content of the digital file, a dialog data structure containing a placeholder field. The method may include the data processing system generating, in response to the identification of the placeholder field in the dialog data structure, a content request in a parameterized format configured for a parametrically controlled text-to-speech technique. The method may include the data processing system sending the content request to a content selection component of the data processing system. The method may include the data processing system using a content selection process, in response to the request, to select a content element to insert into the placeholder field of the dialog data structure. The content element can be in the parameterized format configured for the parametrically controlled text-to-speech technique. The method may include the data processing system providing the chatbot with the content element in the parameterized format selected via the content selection process, causing the computer device to execute the parametrically controlled text-to-speech technique to generate a second acoustic signal corresponding to the dialog data structure modified with the content element.
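The steps of this method can be illustrated with a minimal Python sketch. It is not the implementation described in the patent: the curly-brace placeholder syntax, the `ContentRequest` and `ContentElement` types, the `tts_params` dictionary, and the score-based selection are all hypothetical stand-ins chosen only to show how a placeholder field might trigger a parameterized content request and how the selected content element might be inserted into the dialog data structure.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed placeholder syntax: a field name in curly braces, e.g. {butter_brand}.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


@dataclass
class ContentRequest:
    """Hypothetical parameterized content request for one placeholder field."""
    placeholder: str
    tts_params: dict  # assumed text-to-speech parameters, e.g. voice or rate


@dataclass
class ContentElement:
    """Hypothetical content element carrying the same text-to-speech parameters."""
    text: str
    tts_params: dict


def select_content(request: ContentRequest, candidates: dict) -> ContentElement | None:
    """Stand-in content selection process: pick the highest-scoring candidate."""
    options = candidates.get(request.placeholder, [])
    if not options:
        return None
    _score, text = max(options)
    return ContentElement(text=text, tts_params=request.tts_params)


def fill_dialog(dialog: str, tts_params: dict, candidates: dict) -> str:
    """Replace each placeholder field with the selected content element's text."""
    def substitute(match: re.Match) -> str:
        request = ContentRequest(match.group(1), dict(tts_params))
        element = select_content(request, candidates)
        # Fall back to a generic wording when no content element is available.
        return element.text if element else match.group(1).replace("_", " ")

    return PLACEHOLDER.sub(substitute, dialog)
```

For example, calling `fill_dialog("You will need flour, eggs, and {butter_brand} butter.", {"voice": "recipe_voice"}, {"butter_brand": [(0.4, "Brand A"), (0.9, "Brand B")]})` fills the placeholder with the higher-scoring candidate. In the described system the filled dialog would then be rendered by the text-to-speech technique on the computer device, which this sketch does not model.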

[0008] At least one aspect relates to a method for modifying a computer program output according to claims 12 to 16. A system for balancing data requests for modifying a computer program output is further disclosed. The system may include a data processing system comprising one or more processors and memory. The data processing system may receive, from a computer device, a first digital file corresponding to a first acoustic signal carrying first speech content detected by a microphone of the computer device. The first acoustic signal is converted into the first digital file by an analog-to-digital converter of the computer device. In response to the first speech content of the first digital file, the data processing system may select a computer program comprising a chatbot from among several computer programs comprising chatbots for execution. The data processing system can use the chatbot to identify, based on the first speech content of the first digital file, a first dialog data structure comprising a first placeholder field. In response to identifying the first placeholder field, the data processing system can use a content selection process to select a content element to insert into the first placeholder field. The content element can be in a parameterized format configured for a parametrically controlled text-to-speech technique. The data processing system can provide the chatbot with the content element in the parameterized format selected via the content selection process. This causes the computer device to execute the parametrically controlled text-to-speech technique to generate a second acoustic signal corresponding to the first dialog data structure modified with the content element. Based on a first identifier for the chatbot, a second identifier for the first dialog data structure, and a third identifier for the computer device, the data processing system can generate an index value. The data processing system can then associate the content element with this index value in memory. The data processing system can receive a second digital file corresponding to a third acoustic signal carrying second speech content detected by the microphone of the computer device. The data processing system can select the computer program comprising the chatbot in response to the second speech content of the second digital file. Based on the second speech content of the second digital file, the data processing system can identify, via the chatbot, a second dialog data structure comprising a second placeholder field. In response to the identification of the second placeholder field, and based on the first identifier of the chatbot, the third identifier of the computer device, and a fourth identifier of the second dialog data structure, the data processing system can select the content element associated with the index value. The data processing system can provide the chatbot with the content element associated with the index value to cause the computer device to execute the parametrically controlled text-to-speech technique to generate a fourth acoustic signal corresponding to the second dialog data structure modified with the content element.

[0009] A method for balancing data requests to modify a computer program output is further disclosed. The method can be executed by a data processing system comprising one or more processors and memory. The method may include the data processing system receiving, from a computer device, a first digital file corresponding to a first acoustic signal carrying first speech content detected by a microphone of the computer device. The first acoustic signal can be converted into the first digital file by an analog-to-digital converter of the computer device. The method may include the data processing system selecting, in response to the first speech content of the first digital file, a computer program comprising a chatbot from among several computer programs that include chatbots for execution. The method may include the data processing system using the chatbot to identify, based on the first speech content of the first digital file, a first dialog data structure containing a first placeholder field. The method may include the data processing system, in response to identifying the first placeholder field, using a content selection process to select a content element to insert into the first placeholder field of the first dialog data structure. The content element can be in a parameterized format configured for a parametrically controlled text-to-speech technique. The method may include the data processing system providing the chatbot with the content element in the parameterized format selected via the content selection process, causing the computer device to execute the parametrically controlled text-to-speech technique to generate a second acoustic signal corresponding to the first dialog data structure modified with the content element. The method may include the data processing system generating an index value based on a first identifier for the chatbot, a second identifier for the first dialog data structure, and a third identifier for the computer device. The method may also include the data processing system associating the content element with the index value in memory. The method may include the data processing system receiving a second digital file corresponding to a third acoustic signal carrying second speech content detected by the microphone of the computer device. The method may include the data processing system selecting the computer program comprising the chatbot in response to the second speech content of the second digital file. The method may also include the data processing system identifying, via the chatbot and based on the second speech content of the second digital file, a second dialog data structure containing a second placeholder field. The method may include the data processing system, in response to the identification of the second placeholder field and based on the first identifier of the chatbot, the third identifier of the computer device, and a fourth identifier of the second dialog data structure, selecting the content element associated with the index value. The method may further include the data processing system providing the chatbot with the content element associated with the index value, causing the computer device to execute the parametrically controlled text-to-speech technique to generate a fourth acoustic signal corresponding to the second dialog data structure modified with the content element.

[0010] Furthermore, a system for balancing data requests for modifying a computer program output is disclosed. The system can include a data processing system comprising one or more processors and memory. The data processing system can receive, from a computer device, a first digital file corresponding to a first acoustic signal carrying first speech content detected by a microphone of the computer device. The first acoustic signal can be converted into the first digital file by an analog-to-digital converter of the computer device. In response to the first speech content of the first digital file, the data processing system can select a computer program comprising a first chatbot from among several computer programs comprising chatbots for execution. The data processing system can identify, via the first chatbot and based on the first speech content of the first digital file, a first dialog data structure comprising a first placeholder field. In response to identifying the first placeholder field, the data processing system can select, through a content selection process, a content element to insert into the first placeholder field. The content element can be in a parameterized format configured for a parametrically controlled text-to-speech technique. The data processing system can provide the computer device with the first dialog data structure, modified with the content element in the parameterized format selected via the content selection process. This causes the computer device to execute the parametrically controlled text-to-speech technique to generate a second acoustic signal corresponding to the first dialog data structure modified with the content element. Based on a first identifier for the first placeholder field and a second identifier for the computer device, the data processing system can generate an index value. The data processing system can associate the content element with the index value in memory. The data processing system can receive a second digital file corresponding to a third acoustic signal carrying second speech content detected by the microphone of the computer device. In response to the second speech content of the second digital file, the data processing system can select a second computer program, comprising a second chatbot, from among several computer programs. The second chatbot differs from the first chatbot. Based on the second speech content of the second digital file, the data processing system can identify, via the second chatbot, a second dialog data structure comprising a second placeholder field. In response to the identification of the second placeholder field, and based on the first identifier of the first placeholder field and the second identifier of the computer device, the data processing system can select the content element associated with the index value. The data processing system can provide the computer device with the second dialog data structure, modified with the content element associated with the index value, to cause the computer device to execute the parametrically controlled text-to-speech technique to generate a fourth acoustic signal corresponding to the second dialog data structure modified with the content element.

[0011] A method for balancing data requests to modify a computer program output is further disclosed. The method can be executed by a data processing system comprising one or more processors and memory. The method may include the data processing system receiving, from a computer device, a first digital file corresponding to a first acoustic signal carrying first speech content detected by a microphone of the computer device. The first acoustic signal can be converted into the first digital file by an analog-to-digital converter of the computer device. The method may include the data processing system selecting, in response to the first speech content of the first digital file, a computer program comprising a first chatbot from among several computer programs comprising chatbots for execution. The method may include the data processing system identifying, via the first chatbot and based on the first speech content of the first digital file, a first dialog data structure containing a first placeholder field. The method may include the data processing system using a content selection process to select a content element for insertion into the first placeholder field of the first dialog data structure. The content element can be in a parameterized format configured for a parametrically controlled text-to-speech technique. The method may include the data processing system providing the computer device with the first dialog data structure, modified with the content element in the parameterized format selected via the content selection process, to cause the computer device to execute the parametrically controlled text-to-speech technique to generate a second acoustic signal corresponding to the first dialog data structure modified with the content element. The method may include the data processing system generating an index value based on a first identifier for the first placeholder field and a second identifier for the computer device. The method may also include the data processing system associating the content element with the index value in memory. The method may include the data processing system receiving a second digital file corresponding to a third acoustic signal carrying second speech content detected by the microphone of the computer device. The method may include the data processing system, in response to the second speech content of the second digital file, selecting a second computer program, comprising a second chatbot, from among several computer programs, the second chatbot being distinct from the first chatbot. The method may include the data processing system identifying, via the second chatbot and based on the second speech content of the second digital file, a second dialog data structure comprising a second placeholder field. The method may include the data processing system selecting the content element associated with the index value in response to the identification of the second placeholder field, based on the first identifier of the first placeholder field and the second identifier of the computer device. The method may further include the data processing system providing the computer device with the second dialog data structure, modified with the content element associated with the index value, to cause the computer device to execute the parametrically controlled text-to-speech technique to generate a fourth acoustic signal corresponding to the second dialog data structure modified with the content element.
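The index-based reuse across two different chatbots can be sketched as follows. This is an illustrative assumption, not the patented mechanism: the disclosure does not say how the index value is computed, so the SHA-256 hash, the `ContentStore` class, and the idea that both placeholder fields share one identifier (such as a category name) are hypothetical.

```python
import hashlib
from typing import Callable, Dict


def index_value(placeholder_id: str, device_id: str) -> str:
    """Derive an index value from a placeholder identifier and a device identifier.

    Hashing is an assumption; any stable function of the two identifiers would do.
    """
    raw = f"{placeholder_id}\x1f{device_id}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class ContentStore:
    """Hypothetical in-memory association of content elements with index values."""

    def __init__(self) -> None:
        self._by_index: Dict[str, str] = {}
        self.selection_runs = 0  # counts how often content selection actually ran

    def get_or_select(self, placeholder_id: str, device_id: str,
                      select: Callable[[], str]) -> str:
        key = index_value(placeholder_id, device_id)
        if key not in self._by_index:
            # First request for this index: run the content selection process.
            self._by_index[key] = select()
            self.selection_runs += 1
        # Later requests, even from a different chatbot, reuse the stored element.
        return self._by_index[key]
```

Because the chatbot identifier is not part of the key in this variant, a second chatbot on the same computer device that encounters a placeholder with the same identifier receives the stored content element without a second content selection run.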

[0012] Furthermore, a system for validating a modification of a computer program output is disclosed. The system can include a data processing system comprising one or more processors and memory. The data processing system can establish a communication channel with a third-party server that provides a computer program comprising a chatbot. The computer program comprising the chatbot can be selected based on an acoustic signal detected by a microphone of a computer device. The data processing system can receive a content request from the third-party server in a parameterized format configured for a parametrically controlled text-to-speech technique. The request can be triggered by the identification of a placeholder field in a dialog data structure identified by the chatbot. The data processing system can select, via a content selection process in response to the request, a content element to insert into the placeholder field of the dialog data structure. The content element can be in the parameterized format configured for the parametrically controlled text-to-speech technique. The data processing system can send the content element, in the parameterized format selected via the content selection process, to the third-party server for delivery to the chatbot. This causes the computer device to execute the parametrically controlled text-to-speech technique to generate a second acoustic signal corresponding to the dialog data structure modified with the content element. The data processing system can receive a reference to the content element from the chatbot. The data processing system can set a validation parameter for the third-party server based on a comparison of the reference to the content element with the content element.

[0013] A method for validating a modification of a computer program output is further disclosed. The method can be executed by a data processing system comprising one or more processors and memory. The method can include the data processing system establishing a communication channel with a third-party server that provides a computer program comprising a chatbot. The computer program comprising the chatbot can be selected based on an acoustic signal detected by a microphone of a computer device. The method can include the data processing system receiving a content request from the third-party server in a parameterized format configured for a parametrically controlled text-to-speech technique. The request can be triggered by the identification of a placeholder field in a dialog data structure identified by the chatbot. The method can include the data processing system selecting, via a content selection process in response to the request, a content element for insertion into the placeholder field of the dialog data structure. The content element can be in the parameterized format configured for the parametrically controlled text-to-speech technique. The method can include the data processing system sending the content element in the parameterized format selected via the content selection process to the third-party server for delivery to the chatbot. This causes the computer device to execute the parametrically controlled text-to-speech technique to generate a second acoustic signal corresponding to the dialog data structure modified with the content element. The method can also include the data processing system receiving a reference to the content element from the chatbot. The method can include the data processing system setting a validation parameter for the third-party server based on a comparison of the reference to the content element with the content element.
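The comparison step of this validation method might look like the following sketch. The disclosure does not define the form of the reference returned by the chatbot or the values of the validation parameter; here the reference is assumed to be a SHA-256 fingerprint of the content element text, and the labels `"valid"` and `"modified"` are hypothetical.

```python
import hashlib
from typing import Dict


def fingerprint(content_text: str) -> str:
    """Assumed form of the reference: a SHA-256 hex digest of the element text."""
    return hashlib.sha256(content_text.encode("utf-8")).hexdigest()


def set_validation_parameter(params: Dict[str, str], server_id: str,
                             sent_element: str, reported_reference: str) -> str:
    """Compare the chatbot's reference with the element that was sent.

    The result is stored per third-party server; the label values are
    illustrative only.
    """
    matches = fingerprint(sent_element) == reported_reference
    params[server_id] = "valid" if matches else "modified"
    return params[server_id]
```

A platform that forwards the content element unchanged yields a matching fingerprint, while one that alters the element before it reaches the chatbot does not, which is the distinction the validation parameter records.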

[0014] These and other aspects and implementations are described in detail below. The preceding information and the following detailed description include illustrative examples of various aspects and implementations and provide an overview or framework for understanding the nature and character of the claimed aspects and implementations. The drawings provide an illustration and further understanding of the various aspects and implementations and are incorporated into and form part of this description.

BRIEF DESCRIPTION OF THE DRAWINGS

[0015] The accompanying drawings are not to scale. Identical reference numbers and labels in the various drawings refer to similar elements. For the sake of clarity, not every component may be labeled in every drawing. The drawings include:
Fig. 1 is an illustration of a system for modifying computer program output over a network.
Fig. 2 illustrates an operation of a system to modify a computer program output over a network.
Fig. 3 illustrates an operation of a system for balancing data requests to modify a computer program output based on a session.
Fig. 4 illustrates an operation of a system for balancing data requests to modify a computer program output based on a session.
Fig. 5 illustrates an operation of a system to validate a modification of a computer program output over a network.
Fig. 6 illustrates a procedure for modifying computer program output over a computer network.
Fig. 7 illustrates a procedure for balancing data requests to modify a computer program output over a computer network.
Fig. 8 illustrates a procedure for balancing data requests to modify a computer program output over a computer network.
Fig. 9 illustrates a procedure for validating computer program output over a computer network.
Fig. 10 is a block diagram illustrating a general architecture for a computer system that can be used to implement elements of the systems and procedures described and illustrated herein.

DETAILED DESCRIPTION

[0016] Detailed descriptions of various concepts, devices, and systems for routing packetized actions across a computer network, and of their implementations, follow. The various concepts introduced above and discussed in more detail below can be implemented in any number of ways.

[0017] The present disclosure is generally aimed at improving the efficiency and effectiveness of information transfer and processing across dissimilar computing resources. With dissimilar computing resources, it is challenging to efficiently process and consistently and accurately deliver audio-based content elements in a speech-based computing environment. For example, the dissimilar computing resources may not have access to the same speech or image models, or they may have access to outdated or unsynchronized speech models, which can make it challenging to deliver the audio-based content element accurately and consistently. Furthermore, the computing resources may perform redundant processing to select content elements that could instead be reused, thereby wasting processor utilization, processing resources, and electrical energy.

[0018] The systems and methods of this disclosure are generally directed to a data processing system that modifies the output of a computer program over a network. The data processing system can receive a content request, which is to be provided via a computer program that includes a chatbot. A chatbot, or artificial conversational unit, can refer to a computer program that conducts a conversation using auditory, visual, or textual techniques.

[0019] The present solution can reduce resource consumption, processor utilization, battery consumption, bandwidth usage, the size of an audio file, or the time consumed by a speaker by parsing speech-based commands from an end user, selecting or reusing a parameterized content element, and routing the parameterized content element with a dialog data structure.

[0020] This solution can, for example, provide automated native content elements for chatbots through dynamic digital product placement. The solution can provide an application programming interface (API) configuration that allows the chatbot to initiate or request a content selection process and insert the selected content element into a dialog data structure. For example, the chatbot could be a recipe chatbot. This recipe chatbot could provide a list of ingredients in its native language. The solution can identify placeholders in the ingredient list, select a content element for the placeholder via a content selection process, and provide the content element for insertion into the placeholder. The content selection process can occur in real time, for example, after the chatbot has been started or run and before the section of the dialog data structure containing the placeholder is played. The system can execute the content selection process close to the time at which the placeholder would be rendered. Furthermore, the application programming interface can employ a parametrically driven text-to-speech technique to deliver content elements using a native language.

[0021] The present solution can merge, resume, or restore sessions to reduce data processing. For example, the technology can determine that a session is resuming and can use a content element selected in a previous session for delivery in a second dialog data structure after a session interruption. The recipe chatbot, for instance, can deliver the ingredient list and use a brand name for an ingredient selected through a content selection process. The system can then detect a session interruption, for example, based on the user going to the store to buy the ingredients. When the user returns home, the recipe chatbot can restore the session and use the previously selected brand name, as opposed to a different brand name for the same ingredient. Using the same brand name makes the system more efficient, as it avoids the significant processing associated with performing a content selection process.
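One way to picture the session resumption described here is a sketch in which a session counts as resumable when the same chatbot and computer device return within a fixed time window. The disclosure does not state how an interruption or a resumption is detected, so the two-hour `RESUME_WINDOW_S`, the `Session` record, and the `resolve_content` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Assumed threshold: a session can be resumed within two hours of last activity.
RESUME_WINDOW_S = 2 * 60 * 60


@dataclass
class Session:
    """Hypothetical record of the content element chosen in a session."""
    content_element: str
    last_active: float  # seconds, on any monotonically increasing clock


def resolve_content(sessions: Dict[Tuple[str, str], Session],
                    chatbot_id: str, device_id: str, now: float,
                    select: Callable[[], str]) -> Tuple[str, bool]:
    """Return (content element, resumed) for this chatbot and device."""
    key = (chatbot_id, device_id)
    previous = sessions.get(key)
    if previous is not None and now - previous.last_active <= RESUME_WINDOW_S:
        # Session resumes: reuse the earlier element and skip content selection.
        previous.last_active = now
        return previous.content_element, True
    # No resumable session: run content selection and start a new session.
    element = select()
    sessions[key] = Session(content_element=element, last_active=now)
    return element, False
```

In the recipe example, the first call selects a brand name, a call one hour later reuses it, and a call after the window has expired triggers a fresh content selection.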

[0022] The system can continue sessions with the same chatbot or extend a session across multiple chatbots. The system can aggregate data across multiple chatbots or connect between them to reduce processor load by avoiding or minimizing redundant processing. For example, the recipe chatbot can identify the brand name of a soft drink. A computer can then call a movie chatbot. The movie chatbot can query the recipe chatbot for the soft drink brand name, and the movie chatbot can use the same or a different native language compared to the recipe chatbot to find the same brand name.

[0023] The system can validate a chatbot platform using a validation technique. For example, the data processing system can provide a content element to a third-party chatbot server with commands to forward the content element to the chatbot for insertion into a dialog data structure. The data processing system can then ping the chatbot regarding the content element and compare the chatbot's response with the content element initially provided to the third-party chatbot platform to determine if they match or if the third-party chatbot platform has modified the content element.
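By way of illustration only, the comparison step of this validation technique can be sketched as follows. Whitespace normalization before comparison is an assumption of this example, and the function names are hypothetical; the description only requires determining whether the two content elements match.

```python
def normalize(text: str) -> str:
    """Collapse whitespace so that formatting differences do not count as changes."""
    return " ".join(text.split())


def is_unmodified(sent_element: str, reported_element: str) -> bool:
    """Return True if the element reported by the chatbot matches the one sent."""
    return normalize(sent_element) == normalize(reported_element)


sent = "Brand A cola"
print(is_unmodified(sent, "Brand A  cola\n"))  # formatting difference only
print(is_unmodified(sent, "Brand B cola"))     # platform changed the element
```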

[0024] Figure 1 illustrates an example system 100 for modifying computer program output over a computer network. The system 100 may include a content selection infrastructure. The system 100 may include a data processing system 102. The data processing system 102 may communicate with one or more content provider computer devices 106, chatbot provider devices 108, third-party chatbot platform servers 146, or client computer devices 104 over a network 105. The network 105 may include computer networks such as the Internet, local, wide, metro, or other area networks, intranets, satellite networks, or other communication networks such as mobile voice or data cellular networks. The network 105 can be used to access information resources such as web pages, websites, domain names, or Uniform Resource Locators (URLs) that can be presented, output, rendered, or displayed on at least one computing device 104, such as a laptop, desktop, tablet, personal digital assistant, smartphone, portable computer, or speaker. For example, through the network 105, a user of the computing device 104 can access information or data provided by a chatbot provider 108 or a content provider computing device 106. The computing device 104 may or may not include a display; for example, the computing device may include limited types of user interfaces, such as a microphone and a speaker. In some cases, the primary user interface of the computing device 104 may be a microphone and a speaker.

[0025] The network 105 can include or form an ad network, such as a subset of information sources available on the internet, that is linked to a content ordering or search engine results system, or that is selectable so that it includes third-party content elements as part of a content ordering campaign. The network 105 can be used by the data processing system 102 to access information resources such as web pages, websites, domain names, or URLs that can be presented, output, rendered, or displayed by the client computer device 104. For example, through the network 105, a user of the client computer device 104 can access information or data provided by the content provider computer device 106 or the chatbot provider computer device 108.

[0026] The network 105 can be any type or form of network and may include any of the following: a point-to-point network, a broadcast network, a wide area network, a local area network, a telecommunications network, a data communications network, a computer network, an ATM (Asynchronous Transfer Mode) network, a SONET (Synchronous Optical Network), an SDH (Synchronous Digital Hierarchy) network, a wireless network, and a wired network. The network 105 may include a wireless connection such as an infrared channel or a satellite band. The topology of the network 105 may include a bus, star, or ring network topology. The network can include mobile telephone networks using any protocol or protocols suitable for communication with mobile devices, including Advanced Mobile Phone Protocol (AMPS), Time Division Multiple Access (TDMA), Code Division Multiple Access (CDMA), Global System for Mobile Communication (GSM), General Packet Radio Services (GPRS), and Universal Mobile Telecommunications System (UMTS). Different types of data can be transmitted over different protocols, or the same types of data can be transmitted over different protocols.

[0027] The system 100 can comprise at least one data processing system 102. The data processing system 102 can comprise at least one logical device, such as a computer device with a processor for communication over the network 105, for example, with the computer device 104, the content provider device 106 (or content provider computer device 106), or the chatbot provider device 108 (or chatbot provider 108). The data processing system 102 can comprise at least one computing resource, server, processor, or memory. The data processing system 102 can, for example, comprise a multitude of computing resources or servers located in at least one data center. The data processing system 102 can comprise multiple logically grouped servers and facilitate distributed computing techniques. The logical group of servers can be referred to as a data center, server farm, or computer farm. The servers can also be distributed across different locations. A data center or computer farm can be managed as a single entity, or it can comprise multiple computer farms. The servers in a computer farm can be heterogeneous: one or more of the servers or computers can run on one or more types of operating system platforms.

[0028] Servers in a computer farm can be housed in high-density rack systems along with associated storage systems and located in an enterprise data center. Consolidating servers in this way can improve system management, data security, system physical security, and system performance by placing servers and high-performance storage systems on localized, high-performance networks. Centralizing all or some of the data processing system components, including servers and storage systems, and coupling them with improved system management tools enables more efficient use of server resources, saving power and processing requirements and reducing bandwidth usage.

[0029] System 100 can include, access, or otherwise interact with at least one chatbot provider device 108. The chatbot provider device 108 can include at least one logical device, such as a computer device with a processor for communicating over the network 105, for example, with the computer device 104, the data processing system 102, or the content provider computer device 106. The chatbot provider device 108 can include at least one compute resource, server, processor, or memory. For example, chatbot provider device 108 can include multiple compute resources or servers located in at least one data center. The chatbot provider device 108 can include one or more components or functionalities of the data processing system 102.

[0030] The chatbot provider device 108 can include or refer to a chatbot developer, such as an entity that designs, develops, manages, or maintains computer programs that constitute or provide one or more chatbots. A chatbot can comprise a computer program that conducts a conversation using auditory, visual, or textual methods. The chatbot can be designed to simulate how a human would behave as a conversational partner. Chatbots can be used in dialogue systems for customer service or information retrieval. Chatbots can include or utilize natural language processing systems (e.g., the natural language processing component 112). The chatbot can scan for keywords within an input and then retrieve a response from a database containing the most matching keywords or the most similarly worded phrase. The chatbot can be programmed with procedures that use pattern matching to look up predefined dialog data structures. The chatbot can be programmed with natural language processing techniques to identify the grammar and syntax of an input, tokenize an input, or otherwise process the input to determine a response.

[0031] The content provider computer device 106 can provide audio-based content elements for display by the client computer device 104 as an audio output content element. The content element can be or include a digital component. The content element can be or include a digital object. The content element can include a brand name or company name of a product or service. The content element can be configured for parametrically controlled text-to-speech technology. The content element can be configured for text-to-speech (TTS) implementations that convert normal text into speech. The content element can be input into an application programming interface that uses a speech synthesis capability to synthesize text into natural-sounding speech in a variety of languages, accents, and voices. The content element can be encoded as plain text or a speech synthesis markup language (SSML). SSML can include parameters that can be set to control aspects of speech, such as pronunciation, volume, pitch, or speed, which can form an acoustic fingerprint or a natural voice.
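By way of illustration only, a content element can be wrapped in SSML with parameters controlling pitch, speed, and volume. The sketch below uses the standard SSML `speak` and `prosody` elements; the function name `to_ssml` and the default parameter values are assumptions of this example, not part of the described system.

```python
import xml.etree.ElementTree as ET


def to_ssml(text: str, pitch: str = "medium", rate: str = "medium",
            volume: str = "medium") -> str:
    """Encode a content element as SSML with prosody parameters."""
    speak = ET.Element("speak")
    prosody = ET.SubElement(
        speak, "prosody", {"pitch": pitch, "rate": rate, "volume": volume}
    )
    # ElementTree escapes characters such as "&" when serializing.
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


print(to_ssml("Brand A cola", pitch="high", rate="slow"))
```

Keeping the voice parameters separate from the text allows the same content element to be rendered with whichever parameters the chatbot's native voice uses.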

[0032] For example, a chatbot can identify a dialog data structure such as "The ingredients for chicken wings are: 1 cup of brown sugar, 1 can of cola, 2 medium onions, and 2 cloves of garlic." The content provider computer device 106 can provide a content element to be inserted into the dialog data structure, such as a cola brand name. The content provider computer device 106 can provide content selection criteria for the content element, such as a value, keyword, concept, or other metadata or information, to facilitate a content selection process. The content provider computer device 106 can also provide audio-based content elements (or other content elements) to the data processing system 102, where they can be stored in the data container 124. The data processing system 102 can select the audio content elements (or content elements configured for parametrically controlled text, image, or video-to-speech technology) and provide the audio content elements to the client computer device 104 (or instruct the content provider computer device 106 to provide them). The audio-based content elements can be audio only, or they can be combined with text, image, or video data.

[0033] The content provider computer device 106 can provide the content element to the data processing system 102 for storage in the data container 124 in the content data structure 130. The data processing system 102 can query the content element in response to a content request or otherwise determine to provide the content element.

[0034] The chatbot provider computer device 108 can include, connect to, or otherwise communicate with at least one natural language processing (NLP) component 142 of a chatbot provider and at least one chatbot provider interface 144. The chatbot provider NLP component 142 (or other components such as the chatbot provider computer device 108 or the chatbot platform 146) can interact with the client computer device 104 (via the data processing system 102 or by bypassing the data processing system 102) to establish a two-way, real-time speech or audio-based conversation (e.g., a session) between the client computer device 104 and the chatbot provider computer device 108. The chatbot provider NLP component 142 can include one or more functions or features of the NLP component 112 of the data processing system 102. For example, the chatbot provider interface 144 can receive data messages from or send data messages to the interface 110 of the data processing system 102. The chatbot provider computer device 108 and the content provider computer device 106 can be associated with the same entity. For example, the content provider computer device 106 can generate, store, or make available content elements for a chatbot, and the chatbot provider computer device 108 can establish a session with the client computer device 104 to communicate via a chatbot through the client computer device 104. The data processing system 102 can also establish the session with the client computer device 104 via the interface 110, the chatbot component 114, the session handler component 120, or other components, and include or bypass the chatbot provider computer device 108 or the third-party chatbot platform 146.

[0035] The third-party chatbot platform 146 can refer to one or more servers of an entity that is different from the entity that runs or provides the data processing system 102. The third-party chatbot platform 146 can receive computer programs for a chatbot from the chatbot provider device 108. The third-party chatbot platform 146 can provide natural language processing and other functions. The third-party chatbot platform 146 can connect to or communicate with the computer device 104 to provide the chatbot functionality. For example, the third-party chatbot platform 146 can execute or run the chatbot provided by the chatbot provider device 108 to conduct a conversation with a user of the computer device 104. The third-party chatbot platform 146 can run on a server remote from the data processing system 102 and the computer device 104. In some cases, the third-party chatbot platform 146 can be run at least partially on the computer device 104 (e.g., as part of the preprocessor 140).

[0036] The computer device 104 may include, connect to, or otherwise communicate with at least one sensor 134, transducer 136, audio driver 138, or preprocessor 140. The sensor 134 may, for example, include a camera, an ambient light sensor, a proximity sensor, a temperature sensor, an accelerometer, a gyroscope, a motion detector, a GPS sensor, a location sensor, a microphone, a video or image detection device, or a touch sensor. The transducer 136 may include or be part of a loudspeaker or microphone. The audio driver 138 may provide a software interface to the hardware transducer 136. The audio driver may execute the audio file or other commands provided by the data processing system 102 to control the transducer 136 and generate a corresponding acoustic wave or sound wave. The preprocessor 140 can detect a keyword and execute an action based on that keyword. The preprocessor 140 can filter out one or more terms or modify the terms before sending them to the data processing system 102 for further processing. The preprocessor 140 can convert the analog audio signals detected by the microphone into a digital audio signal and send one or more data packets carrying the digital audio signal to the data processing system 102 via the network 105. In some cases, the preprocessor 140 can transmit data packets containing some or all of the input audio signals in response to the detection of a command to execute such a transmission. The command may, for example, include a trigger keyword or another keyword or authorization to send data packets containing the input audio signal to the data processing system 102.
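By way of illustration only, the trigger-keyword gating performed by the preprocessor 140 can be sketched on already-recognized text. Operating on text rather than on audio, the trigger phrase "okay assistant", and the function name are all assumptions made to keep this example short.

```python
from typing import Optional

TRIGGER = "okay assistant"  # hypothetical trigger keyword


def payload_to_send(utterance: str, trigger: str = TRIGGER) -> Optional[str]:
    """Return the part of the utterance to transmit, or None if nothing is sent.

    Input is only forwarded when the trigger keyword is detected, and the
    trigger itself is filtered out before transmission.
    """
    index = utterance.lower().find(trigger)
    if index == -1:
        return None  # no trigger detected: do not transmit anything
    return utterance[index + len(trigger):].strip(" ,.")


print(payload_to_send("Okay assistant, open the recipe chatbot"))
print(payload_to_send("What should we cook tonight?"))
```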

[0037] The client computer device 104 can be connected to an end user who inputs voice queries as audio input into the client computer device 104 (via the sensor 134) and receives audio output in the form of a computer-generated voice. This voice is provided to the client computer device 104 by the data processing system 102 (or the content provider computer device 106 or the chatbot provider computer device 108) and can be output by the transducer 136 (e.g., a loudspeaker). The computer-generated speech can include recordings of a real person or computer-generated speech.

[0038] The client computer device 104 can be associated with an end user who provides an image or video that can indicate queries as input to the client computer device 104 (via the sensor 134) and receives audio output in the form of computer-generated speech, which can be provided to the client computer device 104 by the data processing system 102 (or the content provider computer device 106 or the chatbot provider computer device 108) and output by the transducer 136 (e.g., a speaker). The input detected by the one or more sensors 134 can include one or more audio inputs (e.g., an acoustic signal), visual inputs (e.g., image or video data), motion inputs, or other inputs. The input (e.g., one or more audio, image, visual, or motion inputs) to the computer device 104 can be converted into a digital file and provided to the data processing system 102 for further processing or to generate actions. For example, the input to the computer device 104 can trigger the selection of a computer program that includes a chatbot and the generation of a query to be entered into the chatbot. The chatbot can then provide an output that responds to the generated query or corresponds to the input to the computer device 104.

[0039] The data container 124 can comprise one or more local or distributed databases and can include a database management system. The data container 124 can comprise computer data storage or memory and can store, among other data, one or more profiles 126, one or more indexes 128, content data 130, or chatbot data 132. The profile 126 can comprise information about the computer device 104 or an account associated with the computer device 104. The profile 126 can include historical network activity associated with the computer device 104, identifiers of chatbots used by the computer device 104, a configuration of the computer device 104, device functionality, preferences, or other information associated with the computer device 104 that can facilitate content selection. The index 128 can map previously selected content elements to a session identifier, computer device identifier, and dialog data structure identifier to facilitate content element reuse. The content data 130 can include content elements for audio output or associated metadata, as well as entered audio messages that may be part of one or more communication sessions with the client computer device 104. The chatbot data 132 can include identifiers for chatbots and information about chatbot types (e.g., category, limitations, or topics).

[0040] The data processing system 102 may include a content placement system that has at least one computing resource or server. The data processing system 102 may include, connect to, or otherwise communicate with at least one interface 110, at least one natural language processor component 112, at least one chatbot component 114, at least one placeholder generation component 116, at least one content selection component 118, at least one session handler component 120, at least one validation component 122, and at least one data container 124. The at least one data container 124 may include or store one or more data structures or databases, profiles 126, indexes 128, content data 130, or chatbot data 132. The content data 130 may, for example, include content campaign information, content groups, content selection criteria, content element objects, or other information provided by a content provider computer device 106 or received or determined by the data processing system to facilitate content selection. The content data 130 can, for example, include the historical performance of a content campaign.

[0041] The interface 110, the natural language processor component 112, the chatbot component 114, the placeholder generation component 116, the content selection component 118, the session handling component 120, or the validation component 122 can each comprise at least one processing unit or other logical device, such as a programmable logic array engine or module, configured to communicate with the database container or database 124. The interface 110, the natural language processor component 112, the chatbot component 114, the placeholder generation component 116, the content selection component 118, the session handling component 120, the validation component 122, and the data container 124 can be separate components, a single component, or part of the data processing system 102. The system 100 and its components, such as the data processing system 102, may include hardware elements such as one or more processors, logic devices, or circuits.

[0042] The data processing system 102 can obtain anonymous information about computer network activities associated with multiple computer devices 104. A user of a computer device 104 can specifically authorize the data processing system 102 to obtain information about network activities corresponding to the user's computer device 104. For example, the data processing system 102 can prompt the user of the computer device 104 for consent to obtain one or more types of information about network activity. The identity of the user of the computer device 104 can remain anonymous, and the computer device 104 can be associated with a unique identifier (e.g., a unique identifier for the user or the computer device provided by the data processing system or a user of the computer device). The data processing system can link each observation with a corresponding unique identifier.

[0043] A content provider computer device 106 can set up an electronic content campaign. The electronic content campaign can be stored as content data 130 in the data container 124. An electronic content campaign can refer to one or more content groups that correspond to a common theme. A content campaign can include a hierarchical data structure that comprises content groups, content element data objects (e.g., digital components or digital objects), and content selection criteria. To create a content campaign, the content provider computer device 106 can specify values for campaign-level parameters of the content campaign. Campaign-level parameters can include, for example: a campaign name, a preferred content network for placing content item objects, a resource value for use in the content campaign, start and end dates for the content campaign, a content campaign duration, a schedule for placing content item objects, language, geographic locations, and the type of computing devices on which content item objects are to be delivered. In some cases, an impression may indicate when a content item object is retrieved from its source (e.g., data processing system 102 or content provider computing device 106) and is countable. In some cases, given the possibility of fraudulent clicks, robotic activity may be filtered and excluded as an impression. Therefore, in some cases, an impression may refer to a measurement of responses from a web server regarding a page request by a browser, filtered out of automatic activity and error codes, and recorded at a point as close as possible to an opportunity to render the content element object for display on the computer device 104. In some cases, an impression may refer to a visible or audible impression; for example, the content element object or digital component is at least partially (e.g., 20%, 30%, 40%, 50%, 60%, 70%, or more) visible on a display device of the client computer device 104 or audible through a speaker 136 of the computer device 104. A click or selection can refer to a user interaction with the content element object, such as a voice response to an audible impression, a mouse click, a touch interaction, a gesture, a shake, an audio interaction, or a keyboard click. A conversion can refer to a user performing a desired action with respect to the content element object (e.g., purchasing a product or service, participating in a survey, visiting a physical store corresponding to the content element, or completing an electronic transaction).

[0044] The content provider computer device 106 can also set up one or more content groups for a content campaign. A content group comprises one or more content item objects and corresponding content selection criteria, such as keywords, words, terms, phrases, geographic locations, computer device type, time of day, interest, topic, or vertical. Content groups under the same content campaign can share the same campaign-level parameters but can have customized specifications for content group-level parameters, such as keywords, negative keywords (e.g., keywords that block placement of the content item when the negative keyword is present in the main content), bids for keywords, or parameters associated with the bid or content campaign.

[0045] To create a new content group, the content provider device 106 can provide values for the content group-level parameters of the content group. These parameters include, for example, a content group name or topic, and options for different content placement options (such as automatic arrangement or managed arrangement) or outcomes (such as clicks, impressions, or conversions). A content group name or topic can be one or more terms that the content provider device 106 can use to capture a theme or subject for which content group content item objects are to be selected for display. A food and beverage company, for example, can create a different content theme for each food or beverage brand it carries and can further create a different content theme for each vehicle model it carries. Examples of content themes the food and beverage company might use include "Brand A Coke," "Brand B Ginger Ale," "Brand C Orange Juice," "Brand D Sports Drink," or "Brand E Purified Water." An example content campaign theme could be "Lemonade" and might include content themes for both "Brand A Coke" and "Brand B Ginger Ale." The content element (or content element object or digital component) can include "Brand A," "Brand B," "Brand C," "Brand D," or "Brand E." The content element object or digital component can refer to the content element configured for a parametrically controlled text-to-speech technique.

[0046] The content provider computer device 106 can provide one or more keywords and content element objects for each content group. Keywords can include terms relevant to the product or services associated with or identified by the content element object. A keyword can comprise one or more terms or phrases. For example, the food and beverage company might include "lemonade," "cola," and "soft drink" as keywords for a content group or content campaign, which can describe the goods or services the brand provides. In some cases, negative keywords can be specified by the content provider to avoid, prevent, block, or disable content placement for certain terms or keywords. The content provider can specify a type of match (e.g., exact match, phrase match, or general match), which is used to select content element objects.
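By way of illustration only, keyword matching with negative keywords can be sketched as follows. The description names the match types but does not define them; the definitions used here (exact: identical terms; phrase: the keyword's terms appear contiguously; general: all of the keyword's terms appear in any order) are assumptions of this example, as are the function names.

```python
def tokens(text: str) -> list[str]:
    """Split text into lowercase terms."""
    return text.lower().split()


def matches(keyword: str, query: str, match_type: str) -> bool:
    """Check one keyword against a query under an assumed match-type definition."""
    k, q = tokens(keyword), tokens(query)
    if match_type == "exact":
        return k == q
    if match_type == "phrase":
        # The keyword's terms must appear as a contiguous run in the query.
        return any(q[i:i + len(k)] == k for i in range(len(q) - len(k) + 1))
    if match_type == "general":
        return set(k) <= set(q)
    raise ValueError(f"unknown match type: {match_type}")


def eligible(query: str, keywords: list[str], negative_keywords: list[str],
             match_type: str) -> bool:
    """A content group is eligible if a keyword matches and no negative keyword blocks it."""
    query_terms = set(tokens(query))
    if any(set(tokens(n)) <= query_terms for n in negative_keywords):
        return False
    return any(matches(k, query, match_type) for k in keywords)


print(eligible("recipe with soft drink", ["soft drink"], [], "phrase"))
print(eligible("diet soft drink", ["soft drink"], ["diet"], "phrase"))
```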

[0047] The content provider computer device 106 can provide one or more keywords for the data processing system 102 to use in selecting a content element object provided by the content provider computer device 106. The content provider computer device 106 can identify one or more keywords for bidding and further provide bid amounts for different keywords. The content provider computer device 106 can provide additional content selection criteria for the data processing system 102 to use in selecting content element objects. Multiple content providers 106 can make bids using the same or different keywords, and the data processing system 102 can execute a content selection process or an advertising auction in response to receiving a keyword notification in an electronic message.

[0048] The content provider computer device 106 can provide one or more content element objects for selection by the data processing system 102. The data processing system 102 can (e.g., via the content selection component 118) select the content element objects when a content placement opportunity becomes available that matches the resource allocation, content schedule, maximum bids, keywords, and other selection criteria specified for the content group. Different types of content element objects can be included in a content group, such as a speech content element, audio content element, text content element, image content element, video content element, multimedia content element, or a content element link. After selecting a content element, the data processing system 102 can send the content element object for rendering to a computer device 104 or a display device of the computer device 104. Rendering can include displaying the content element on a display device or playing the content element through a speaker of the computer device 104. The data processing system 102 can provide commands to a computer device 104, a chatbot component 114, or a third-party chatbot platform 146 to present the content element object. The data processing system 102 can instruct the computer device 104 or an audio driver 138 of the computer device 104 to generate audio signals or sound waves.

[0049] The data processing system 102 can include an interface component 110 that is designed, configured, built, or operational to receive and send information using, for example, data packets. The interface 110 can send and receive information using one or more protocols, such as a network protocol. The interface 110 can be a hardware interface, a software interface, a wired interface, or a wireless interface. The interface 110 can facilitate the translation or formatting of data from one format to another. For example, the interface 110 can include an application programming interface (API) that contains definitions for communication between different components, such as software components.

[0050] The data processing system 102 can include an application, script, or program installed on the client computer device 104, such as an application to communicate input audio signals to the interface 110 of the data processing system 102 and to control components of the client computer device to render output audio signals. The data processing system 102 can receive data packets, a digital file, or other signals that include or identify an audio input signal. The computer device 104 can detect the audio signal via the transducer 136 and convert the analog audio signal into a digital file via an analog-to-digital converter. The audio driver 138, for example, can include an analog-to-digital converter component.

[0051] The data processing system 102 can execute or run the NLP component 112 to receive or obtain the digital file containing the audio signal and can parse the audio signal. The NLP component 112 can, for example, provide human-computer interaction. The NLP component 112 can be configured with natural language understanding techniques and can enable the data processing system 102 to derive meaning from human or natural language input. The NLP component 112 can include or be configured with machine learning techniques such as statistical machine learning. The NLP component 112 can use decision trees, statistical models, or probability models to parse the input audio signal. The NLP component 112 can, for example, perform functions such as proper name recognition (e.g., determining, from a given text stream, which elements in the text represent proper names such as persons or places and what type each such name is, such as person, place, or organization), natural language generation (e.g., converting information from computer databases or semantic intentions into intelligible human language), natural language understanding (e.g., converting text into more formal representations such as predicate logic structures that a computer module can manipulate), machine translation (e.g., automatically translating text from one human language to another), morphological segmentation (e.g., separating words into individual morphemes and identifying the class of morphemes, which can be challenging based on the complexity of the morphology or word structure of the language under consideration), question answering (e.g., determining an answer to a human language question, which may be specific or open-ended), and semantic processing (e.g., the processing that may occur after identifying a word and encoding its meaning, in order to relate the identified word to other words with similar meanings).

[0052] The NLP component 112 converts the audio input signal into recognized text by comparing it to a stored representative set of audio waveforms (e.g., in data container 124) and selecting the closest matches. The audio waveform set can be stored in data container 124 or another database accessible to the data processing system 102. The representative waveforms are generated from a large number of users and can then be augmented with user-generated speech samples. After the audio signal has been converted into recognized text, the NLP component 112 matches the text with words that are linked to actions provided by the data processing system 102, for example, through user training or manual description. The NLP component 112 can also convert image or video input into text or digital files. The NLP component 112 can process, analyze, or interpret image or video input to perform actions, generate queries, or select or identify data structures.

[0053] The audio input signal can be detected by the client computer device 104 via the sensor 134 or the transducer 136 (e.g., a microphone). The client computer device 104 can provide the audio input signal to the data processing system 102 (e.g., via the network 105) via the transducer 136, the audio driver 138, or other components. There, it can be received as a digital file or in a digital format (e.g., via the interface 110) and provided to the NLP component 112 or stored in the data container 124. In some cases, the data processing system 102 can receive image or video input signals in addition to or instead of acoustic signals. The data processing system 102 can process image or video input signals using, for example, image interpretation techniques, computer vision, a machine learning engine, or other techniques to recognize or interpret the image or video and convert it into a digital file. One or more of these image interpretation techniques, computer vision techniques, and machine learning techniques can be collectively referred to as imaging techniques. The data processing system 102 (e.g., the NLP component 112) can be configured with imaging techniques in addition to, or instead of, sound processing techniques.

[0054] The NLP component 112 can acquire the input audio signal. From the input audio signal, the NLP component 112 can identify at least one request or at least one trigger keyword that corresponds to the request. The request can indicate the intention or subject of the input audio signal. The trigger keyword can indicate a type of action that is likely to be performed. For example, the NLP component 112 can parse the input audio signal to identify at least one request to go out in the evening for dinner and a movie. The trigger keyword can include at least one word, phrase, word root, partial word, or derivative that indicates an action to be performed. For example, the trigger keyword "go" or "go to" from the input audio signal can indicate a need for transportation. In this example, the input audio signal (or the identified request) does not directly express an intention to arrange transportation, but the trigger keyword indicates that transportation is an additional action to at least one other action indicated by the request.

[0055] The NLP component 112 can parse the input audio signal to identify, determine, retrieve, or otherwise obtain the request and the trigger keyword. For example, the NLP component 112 can apply a semantic processing technique to the input audio signal to identify the trigger keyword or the request. The NLP component 112 can apply the semantic processing technique to the input audio signal to identify a trigger expression that includes one or more trigger keywords, such as a first trigger keyword and a second trigger keyword. For example, the input audio signal might include the sentence "I need a recipe for chicken wings." The NLP component 112 can apply a semantic processing technique or another natural language processing technique to the data packets comprising the sentence to identify the trigger expressions "need," "recipe," and "chicken wings." The NLP component 112 can further identify multiple trigger keywords such as "need," "recipe," and "chicken wings." For example, the NLP component 112 can determine that the trigger expression includes the trigger keyword "recipe" and a second trigger keyword.

[0056] NLP component 112 can filter the input audio signal to identify the trigger keyword. The data packets transmitting the input audio signal might include, for example, "It would be great if I could get some help with a chicken wing recipe." In this case, NLP component 112 can filter out one or more terms such as "it," "would be," "great," "if," "I," "could," "get," or "help." By filtering out these terms, NLP component 112 can more accurately and reliably identify the trigger keywords, such as "chicken wing recipe," and determine that this is a request to launch a recipe chatbot.
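As a non-limiting illustration, the filtering described in paragraph [0056] can be sketched as a simple stop-term filter. The function name `filter_trigger_keywords` and the `FILLER_TERMS` set are hypothetical; the set contains the terms named above plus a few further filler words ("some", "with", "a") that are assumed here so that only "chicken wing recipe" remains. An actual NLP component 112 may use statistical or semantic techniques instead.

```python
import re

# Hypothetical filler-term list: the terms named in the example above,
# plus "some", "with", and "a" (assumed for this sketch).
FILLER_TERMS = {
    "it", "would", "be", "great", "if", "i", "could", "get", "help",
    "some", "with", "a",
}


def filter_trigger_keywords(utterance: str) -> list[str]:
    """Return the tokens that survive filler-term filtering, in order."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    return [token for token in tokens if token not in FILLER_TERMS]


keywords = filter_trigger_keywords(
    "It would be great if I could get some help with a chicken wing recipe."
)
print(" ".join(keywords))
```

The surviving tokens can then be matched against the keywords that describe each available chatbot.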

[0057] In some cases, the NLP component 112 can determine that the data packets transmitting the input audio signal contain one or more requests. For example, the input audio signal might contain the sentence, "I need some help preparing chicken wings and a movie listing." The NLP component 112 can determine that this is one request for a chicken wing recipe and a movie listing. The NLP component 112 can determine that this is a single request for a chatbot that can provide both recipes and a movie listing. The NLP component 112 can determine that this is two requests: a first request for a chatbot that provides recipes and a second request for a chatbot that provides a movie listing. In some cases, the NLP component 112 can combine the multiple specific requests into a single request and send the single request to a chatbot component 114 or a third-party chatbot platform 146. In some cases, the NLP component 112 can send the individual requests to the corresponding chatbot provider devices 108, or send both requests separately to the same chatbot provider device 108.

[0058] The data processing system 102 can therefore receive a digital file corresponding to an initial acoustic signal transmitting speech content detected by a transducer 136 of the computer device 104. The initial acoustic signal can be converted into the digital file by an analog-to-digital converter (e.g., the audio driver 138) of the computer device 104. The data processing system 102 can parse the digital file to select a computer program that includes a chatbot. For example, the data processing system 102 can include a chatbot component 114, which is designed and built to select, in response to the digital file, a computer program that includes a chatbot for execution by the data processing system 102, the computer device 104, or the third-party chatbot platform 146.

[0059] Chatbot component 114 can identify keywords, tokens, terms, concepts, or other information in the digital file. Chatbot component 114 can use natural language processor component 112 to identify keywords, tokens, terms, concepts, or other information in the digital file. Natural language processor component 112 can provide the parsed keyword, token, term, or concept to chatbot component 114. Chatbot component 114 can then select a chatbot that responds to a keyword or concept in the digital file.

[0060] For example, the data processing system 102 can determine that the first digital file contains a request for a recipe chatbot. The chatbot component 114 can perform a lookup operation in a chatbot data structure 132 to identify a chatbot that can provide recipes. The chatbot data structure 132 can, for example, contain keywords or other information that describes the goods, service, or function that each chatbot can provide. The chatbot component 114 can use the identifier determined via the chatbot data structure 132 to launch, initiate, execute, or otherwise activate the appropriate chatbot. In some cases, the identifier can include or be associated with a filename or file path, a pointer, a web address, an Internet Protocol address, a URL, or other identifying information for the chatbot. For example, the data processing system 102 can determine that the recipe chatbot is provided via the third-party chatbot platform 146, and can instruct the third-party chatbot platform 146 to start the recipe chatbot and to connect to the computer device 104 either directly or via the data processing system 102 (e.g., via the chatbot component 114).
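As a non-limiting illustration, the lookup in the chatbot data structure 132 can be sketched as a keyword-overlap search. The `ChatbotEntry` type, the entries, and the scoring rule (largest number of shared keywords) are assumptions made for this sketch; the description does not prescribe a particular matching rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatbotEntry:
    identifier: str       # e.g. a file path, URL, or other locator
    keywords: frozenset   # terms describing what the chatbot provides


# Hypothetical contents of the chatbot data structure.
CHATBOT_DATA_STRUCTURE = [
    ChatbotEntry("chatbot_recipes", frozenset({"recipe", "ingredient", "cooking"})),
    ChatbotEntry("chatbot_movies", frozenset({"movie", "listing", "tickets"})),
]


def select_chatbot(request_keywords):
    """Return the identifier of the entry sharing the most keywords, or None."""
    wanted = set(request_keywords)
    best = max(CHATBOT_DATA_STRUCTURE, key=lambda entry: len(entry.keywords & wanted))
    return best.identifier if best.keywords & wanted else None


print(select_chatbot(["chicken", "wing", "recipe"]))
```

The returned identifier stands in for whatever locator is used to launch or invoke the selected chatbot.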

[0061] Before starting or initiating the starting or execution of the chatbot, the data processing system 102 can determine whether the computer device 104 is authorized to access the chatbot. The data processing system 102 can (e.g., via the chatbot component 114) perform a lookup in the data container 124 (e.g., the profile data structure 126) using the identifier of the computer device 104 to determine whether the computer device 104 is authorized to access the computer program that includes the chatbot. Authorization can be based on a subscription, plan, restriction, resource requirement, versioning, or device functionality. For example, the data processing system 102 can grant the computer device 104 access to the chatbot if the computer device 104 is configured with a predefined version of an operating system. In another example, the data processing system 102 can grant the computer device 104 access to the chatbot if the computer device 104 is linked to a valid account or profile. If the data processing system 102 determines that the computer device 104 is not authorized to access the chatbot, the data processing system 102 may, in some cases, terminate the thread, notify the user, or identify another chatbot that the computer device 104 is authorized to access. Therefore, the data processing system 102 can select the chatbot in response to the determination that the computer device 104 is authorized to access it.
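As a non-limiting illustration, the authorization lookup can be sketched as follows. The profile fields, the minimum operating system version, and the rule that both criteria must hold are assumptions made for this sketch; the description presents the version check and the account check as separate examples and leaves the combination open.

```python
# Hypothetical contents of the profile data structure, keyed by device identifier.
PROFILE_DATA_STRUCTURE = {
    "computing_device_123": {"os_version": (8, 1), "account_valid": True},
    "computing_device_456": {"os_version": (6, 0), "account_valid": True},
}

# Assumed minimum operating system version required by the chatbot.
MIN_OS_VERSION = (7, 0)


def is_authorized(device_id: str) -> bool:
    """Look up the device profile and apply both example criteria."""
    profile = PROFILE_DATA_STRUCTURE.get(device_id)
    if profile is None:
        return False  # unknown devices are treated as unauthorized
    return profile["os_version"] >= MIN_OS_VERSION and profile["account_valid"]


print(is_authorized("computing_device_123"))
```

When the check fails, the caller can terminate the thread, notify the user, or look for another chatbot the device may access.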

[0062] Interface 110 can start the chatbot itself or send a command to a third-party chatbot platform 146 to cause the third-party chatbot platform 146 to call a conversational application programming interface (API) connected to the chatbot (e.g., the NLP component 142) and establish a communication session between the data processing system 102 or the third-party chatbot platform 146 and the client computer device 104. In response to the establishment of the communication session between the data processing system 102 or the third-party chatbot platform 146 and the client computer device 104, the data processing system 102 or the third-party chatbot platform 146 can send data packets directly to the client computer device 104 via the network 105. In some cases, the third-party chatbot platform 146 can send data packets to the client computer device 104 via the data processing system 102 and the network 105.

[0063] In some cases, the chatbot provider device 108, the chatbot, or the third-party chatbot platform 146 can execute at least one section of the Conversation API 142. For example, the third-party chatbot platform 146 can handle certain aspects of the communication session or types of queries. The third-party chatbot platform 146 can utilize the NLP component 112 executed by the data processing system 102 to facilitate the processing of audio signals associated with the communication session and the generation of responses to queries. In some cases, the data processing system 102 can include the Conversation API 142 configured for the third-party chatbot platform 146. In some cases, the data processing system routes data packets between the client computer device and the third-party provider device to establish the communication session. The data processing system 102 can receive a notification from the third-party chatbot platform 146 that the third-party device has established a communication session with the client device 104. The notification may include an identifier of the client computer device 104, a corresponding timestamp indicating when the communication session was established, or other information associated with the communication session, such as the data structure associated with the communication session.

[0064] The conversational API can be a second NLP that includes one or more components or functions of the first NLP 112. The second NLP 142 can interact with or utilize the first NLP 112. In some cases, the system 100 can include a single NLP 112 executed by the data processing system 102. The single NLP 112 can support both the data processing system 102 and the chatbot. In some cases, the interface 110 creates or constructs a data structure to facilitate the execution of a service, and the conversational API generates responses or queries to support a communication session with an end user or to obtain additional information to enhance or promote the end-user experience or the service's performance.

[0065] The computer program comprising the chatbot can be run on the data processing system 102, the chatbot provider device 108, or the third-party chatbot platform 146. The chatbot can receive and process one or more digital files or sections of one or more digital files to determine a response. For example, the chatbot can be run as the chatbot component 114 on the data processing system 102.

[0066] After execution, the chatbot can identify a dialog data structure that responds to the digital file. The digital file might, for example, correspond to a voice input such as "I need a recipe for chicken wings." The chatbot, such as a recipe chatbot, can identify a dialog data structure in response to the query using a natural language processing technique, a search engine technique, a pattern matching technique, or a semantic analysis technique. The dialog data structure might, for example, contain ingredients for chicken wings. The dialog data structure might include a placeholder field. The placeholder field can be populated with a content element. The placeholder field can serve as a tag or a hint that triggers a content request.

[0067] The chatbot developer can include the placeholder field as part of the chatbot's computer program or as part of the query response. The developer can program the placeholder field using an application programming interface (API), a script, a tag, a markup language, or another mechanism that allows the chatbot to identify the placeholder field, request content, and populate it with a selected content element. The placeholder field can be associated with metadata that provides content selection criteria, which can be used to select a content element relevant to the dialog data structure and suitable for insertion into the placeholder field. For example, if the dialog data structure is a list of ingredients and the placeholder field precedes and modifies the term "Cola", then the metadata or content selection criteria can indicate how to select a content element that includes a brand name or a company that sells cola.

[0068] In some cases, the dialog data structure may not include a placeholder field. The data processing system 102 can receive or intercept the identified dialog data structure. The data processing system 102 may include a placeholder generation component 116, which is designed and built to identify a section of the dialog data structure into which the placeholder field is to be inserted. The placeholder generation component 116 can use or connect to the natural language processing component 112 to process the dialog data structure and identify a section into which the placeholder field is to be inserted. The placeholder generation component 116 can identify the section based on keywords or terms in the dialog data structure. The placeholder generation component 116 can identify the section based on available content elements. The placeholder generation component 116 can, for example, identify the keyword "Cola" in the dialog data structure and further determine that a placeholder field is to precede the term "Cola". The placeholder generation component 116 can determine that the content data structure 130 contains content data associated with the keyword "Cola". The placeholder generation component 116 can determine to initiate a content selection process for the keyword "Cola" and insert the selected content element next to the term "Cola" into the dialog data structure.

[0069] The placeholder generation component 116 can determine whether and where the placeholder field or its content element should be inserted using natural language processing techniques (e.g., via the natural language processor component 112). For example, the placeholder generation component 116 can use the NLP component 112 to identify the grammar and syntax of the dialog data structure, as well as keywords of the data structure. Based on the keyword, grammar, and syntax, the placeholder generation component 116 can determine where the placeholder field should be inserted. Grammar can denote a set of rules in a given language. Syntax can denote the structure of the sentence. Based on the grammar and syntax of the dialog data structure, the placeholder generation component 116 can determine a suitable position for the placeholder field. The placeholder generation component 116 can, for example, determine whether the placeholder field should be positioned next to and before a noun in the dialog data structure. It can also determine whether the placeholder field should be positioned next to and before a noun located at the beginning, middle, or end of the dialog data structure. Furthermore, it can determine whether the placeholder field should not be positioned next to a verb, pronoun, adjective, or adverb. In some cases, the placeholder generation component 116 can determine whether to insert a placeholder field into the dialog data structure at all. For example, the only noun in the dialog data structure might be the first expression, and the placeholder generation component 116 might be configured not to insert a content element as the first expression.
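As a non-limiting illustration, the placement decision can be sketched with plain keyword matching standing in for grammar and syntax analysis. The `{placeholder}` token, the `CONTENT_KEYWORDS` set, and the function name are hypothetical. The sketch places the field immediately before the first keyword for which content is available and declines to insert when that keyword is the first expression.

```python
PLACEHOLDER = "{placeholder}"   # hypothetical placeholder token

# Hypothetical keywords for which content elements are available.
CONTENT_KEYWORDS = {"cola"}


def insert_placeholder(dialog: str) -> str:
    """Insert a placeholder before the first matching keyword, if allowed."""
    words = dialog.split()
    for position, word in enumerate(words):
        if word.strip(".,:;").lower() not in CONTENT_KEYWORDS:
            continue
        if position == 0:
            # Configured not to insert a content element as the first
            # expression; leave the dialog data structure unchanged.
            return dialog
        return " ".join(words[:position] + [PLACEHOLDER] + words[position:])
    return dialog


print(insert_placeholder("Ingredient: Cola"))
```

A fuller implementation would consult part-of-speech information so that the field is not placed next to a verb, pronoun, adjective, or adverb.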

[0070] Therefore, the data processing system 102 can automatically insert a placeholder field into a dialog data structure and populate the placeholder field with a content element. By automatically generating the placeholder field, the chatbot computer program can use less memory or have less complex and error-prone code, since the developer need not include placeholder fields in every dialog data structure.

[0071] After identifying the placeholder field, the chatbot can send a content request. In some cases, after determining to insert the placeholder field, the placeholder generation component 116 can trigger the content selection process via the content selection component 118 without returning the dialog data structure to the chatbot.

[0072] The data processing system 102 can include, execute, or otherwise communicate with a content selection component 118 to receive the trigger keyword identified by the natural language processor and, based on the trigger keyword, select a content item via a real-time content selection process. The content selection process can reference or include the selection of sponsored content item objects provided by third-party content providers 106. The real-time content selection process can include a service that parses, processes, weights, or matches content items provided by multiple content providers to select one or more content items to be delivered to the computer device 104. The content selection component 118 can execute the content selection process in real time. Executing the content selection process in real time can refer to executing the content selection process in response to the content request received via client computer device 104. The real-time content selection process can be executed (e.g., initiated or completed) within a time interval of receiving the request (e.g., 5 seconds, 10 seconds, 20 seconds, 30 seconds, 1 minute, 2 minutes, 3 minutes, 5 minutes, 10 minutes, or 20 minutes). The real-time content selection process can be executed during a communication session with client computer device 104 or within a time interval after the communication session has ended.

[0073] The data processing system 102 may, for example, include a content selection component 118 that is designed, built, configured, or operational to select content element objects. The content selection component 118 may identify, analyze, or recognize the speech, audio, terms, characters, text, symbols, or images of the candidate content elements using image processing, optical character recognition (OCR), natural language processing, or database lookup techniques. The candidate content elements may include metadata indicating the subject of the candidate content elements, in which case the content selection component 118 may process the metadata to determine whether the subject of the candidate content element corresponds to the input audio signal.

[0074] The content provider 106 can provide additional indicators when setting up a content campaign that includes content elements. The content provider computer device 106 can provide information at the content campaign or content group level that the content selection component 118 can identify by performing a lookup using information about the candidate content element. The candidate content element can, for example, include a unique identifier that can be associated with a content group, a content campaign, or a content provider. The content selection component 118 can determine information about the content provider computer device 106 based on information stored in the content campaign data structure in data container 124.

[0075] The data processing system 102 can receive a content request for delivery via a computer device 104. The request can include selection criteria, such as the device type, location, and a keyword associated with the request. The request can include the dialog data structure.

[0076] In response to the request, the data processing system 102 can select a content element object from the data container 124 or a database linked to the content provider computer device 106 and make the content element available for presentation via the computer device 104 over the network 105. The content element object can be provided by a content provider device 106, which is distinct from the chatbot provider device 108. The computer device 104 can interact with the content element object. The computer device 104 can receive an audio response regarding the content element. The computer device 104 can receive a prompt to select a hyperlink or other button associated with the content element object, which causes or enables the computer device 104 to identify the content provider computer device 106, request a service from the content provider computer device 106, instruct the content provider computer device 106 to perform a service, send information to the content provider computer device 106, or otherwise identify a good or service associated with the content provider computer device 106.

[0077] The content request can include content selection criteria such as content format, keywords, concepts, profile information, or other information that can facilitate content selection. Content selection component 118 can execute a real-time content selection process. Real-time content selection can refer to executing the content selection in response to the content request. The content request can be generated, sent, or otherwise provided after the chatbot identifies the dialog data structure that responds to the voice input.

[0078] Content selection component 118 can select a content element that includes text, a string, or characters that can be processed by a text-to-speech system. Content selection component 118 can select a content element that has a parameterized format configured for a parametrically controlled text-to-speech technique. In some cases, the dialog data structure may be in SSML format or configured with language parameters. Data processing system 102 can configure the language parameters of the content element to match the language parameters of the dialog data structure identified by the chatbot, so that the content element can be presented to the user of the computer device 104 with a native language, image, or acoustic fingerprint (e.g., the content element has the same or similar acoustic properties compared to the dialog data structure without the content element).
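As a non-limiting illustration, a text content element can be merged into a dialog data structure and wrapped in a single SSML `prosody` element so that both are rendered with the same parameters. The function name `render_ssml`, the `{placeholder}` token, and the parameter values are assumptions made for this sketch; `speak` and `prosody` (with `rate` and `pitch` attributes) are standard SSML.

```python
from xml.sax.saxutils import escape, quoteattr


def render_ssml(dialog_template: str, content_text: str, voice_params: dict) -> str:
    """Fill the placeholder and wrap the result in one shared prosody element."""
    filled = escape(dialog_template).replace("{placeholder}", escape(content_text))
    attributes = " ".join(
        f"{name}={quoteattr(value)}" for name, value in sorted(voice_params.items())
    )
    return f"<speak><prosody {attributes}>{filled}</prosody></speak>"


# Hypothetical parameters taken from the dialog data structure.
dialog_params = {"rate": "medium", "pitch": "+0st"}
print(render_ssml("Ingredient: {placeholder} Cola", "Brand A", dialog_params))
```

Because the content element arrives as text rather than as a prerecorded audio file, the text-to-speech engine synthesizes it with the same settings as the surrounding dialog.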

[0079] The content selection component 118 can select a content element in a parameterized format configured for text-to-speech, rather than a content element in an audio file format. For example, the content selection component 118 can refrain from selecting a content element in an audio file format or audio encoding format such as .WAV, .AIFF, or .AU, because a content element already in an audio file format may not be configurable for seamless insertion into the placeholder field of the dialog data structure identified by the chatbot computer program. Furthermore, a content element in an audio file format may have a different acoustic fingerprint compared to the computer device's native language or the chatbot's acoustic fingerprint set. If the content element audio file has a different acoustic fingerprint compared to the native language or the chatbot's acoustic fingerprint or the dialog data structure (e.g., words are spoken at different speeds, frequencies, pitches, tones, volumes, or intonations), then inserting or integrating the content element audio file into the placeholder field in the dialog data structure may not be seamless, smooth, or continuous. For example, the content element audio file with the different acoustic fingerprint may cause awkward transitions or indicate inconsistencies. By providing the content element configured for a text-to-speech technique in which the chatbot or computer device can play the content element in a manner that corresponds to the acoustic fingerprint or native language of the chatbot or computer device, the data processing system 102 can therefore facilitate the provision of seamless modification of the chatbot computer program output.

[0080] Content selection component 118 can provide the selected content element to the chatbot, causing the computer device to execute the text-to-speech technique to generate an acoustic signal corresponding to the dialog data structure modified by the selected content element. In some cases, data processing system 102 can send data packets corresponding to the content element.

[0081] The data processing system 102 can include, execute, access, or otherwise communicate with a session handler component 120 to establish a session. The session handler component 120 can establish the session in response to the first digital file. For example, the session handler component 120 can establish a communication session between the client device 104 and the data processing system 102. The communication session can refer to one or more data transmissions between the client device 104 and the data processing system 102, which includes the digital file corresponding to the input audio signal detected by a sensor 134 of the client device 104, and the output signal being sent by the data processing system 102 to the client device 104. The data processing system 102 (for example, via the session handler component 120) can establish the communication session in response to receiving the input audio signal. The data processing system 102 can set a duration for the communication session. The data processing system 102 can set a timer or a counter for the duration set for the communication session. In response to the expiration of the timer, the data processing system 102 can end the communication session.

[0082] The session handler component 120 can determine an interruption, a pause, or the end of a session based on one or more of a time threshold, a location threshold, or natural language processing. The time threshold, for example, can be a time interval such as 5 minutes, 10 minutes, 15 minutes, 20 minutes, 30 minutes, 1 hour, 2 hours, 6 hours, or more. The location threshold can be a distance between the location of the computer device 104 at the time the session is established and the current location of the computer device 104. The distance threshold can be 0.5 miles, 1 mile, 2 miles, 3 miles, 5 miles, 10 miles, 20 miles, 50 miles, or more. The time and location thresholds can be dynamic, varying based on the time of day, geographic location, population density, historical profile information, chatbot type, or other session-related information. For example, the time threshold can be shorter for a chatbot used to find a coffee shop compared to one used to facilitate booking a vacation, as the vacation booking process can take several days. Natural language processing can indicate an interruption based on a change in the topic or category of the conversation.
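As a non-limiting illustration, the time and location thresholds can be checked as follows. The function names, the default thresholds (30 minutes, 5 miles), and the use of the haversine great-circle formula for distance are assumptions made for this sketch; the topic-change criterion is omitted.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(start, current):
    """Great-circle distance in miles between two (latitude, longitude) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*start, *current))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def is_session_break(elapsed_minutes, start_location, current_location,
                     time_threshold_minutes=30, distance_threshold_miles=5):
    """Report a break when either the time or the distance threshold is exceeded."""
    moved = haversine_miles(start_location, current_location)
    return elapsed_minutes > time_threshold_minutes or moved > distance_threshold_miles


# Same location, 10 minutes elapsed: neither threshold is exceeded.
print(is_session_break(10, (37.0, -122.0), (37.0, -122.0)))
```

Dynamic thresholds can be modeled by passing different threshold arguments per chatbot type, for example a longer time threshold for a vacation-booking chatbot.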

[0083] The communication session can refer to a network-based communication session in which the client device 104 provides authenticating information or login credentials to establish the session. In some cases, the communication session refers to a chatbot, a topic, or a context of audio signals transmitted by data packets during the session. For example, a first communication session might refer to audio signals transmitted between the client device 104 and the data processing system 102 relating to a recipe (e.g., including keywords, dialog data structures, chatbot, or content element objects); and a second communication session might refer to audio signals transmitted between the client device 104 and the data processing system 102 relating to movie tickets. In this example, the data processing system 102 can determine that the context of the audio signals differs (e.g., via the NLP component 112) and separate the two sets of audio signals into different communication sessions. The session handler component 120 can terminate the first session, which is associated with the recipe, in response to identifying one or more audio signals associated with the movie tickets. Therefore, the data processing system 102 can initiate or establish the second session for the audio signals associated with the movie tickets in response to detecting the context of the audio signals.

[0084] The session handler component 120 is designed and built to allow the reuse of a content element that was previously selected for provisioning with a dialog data structure. The session handler component 120 can prevent, block, disable, or cancel a second content selection process by the content selection component 118 in response to the session handler component 120's determination to reuse a previously selected content element. By avoiding or otherwise eliminating redundant content selection processes, the session handler component 120 can reduce the use of processor and other computer resources by the data processing system 102.

[0085] The data processing system 102 can reuse or renew the content element during the same session or after determining a session break and determining to merge the subsequent session with the previous session in order to continue the previous session. The data processing system 102 can resume a session after determining a session break. The data processing system 102 can resume the session if a subsequent digital file, speech input, semantic input, or other information associated with the subsequent digital file or speech input indicates that the current speech input corresponds to or relates to a previous session. For example, the session handler component 120 can use the natural language processing component 112 to compare a second digital file received after a session interruption with a first digital file received before the session interruption to determine whether they relate to one or more of the same topic, category, task flow, chatbot, computer device, or dialog data structure. The computer device 104 can then, for example, invoke the same chatbot after the session interruption, and the data processing system 102 can determine whether to resume the previous session.

[0086] To identify and reuse the content element, the data processing system 102 can associate the previously selected content element with a value in the index data structure 128. This value can be generated based on a first chatbot identifier associated with the selection of the content element, a second identifier for an initial dialog data structure with which the content element was provided, and a third identifier for the computer device 104 associated with providing the content element. If the session handler component 120 determines that a subsequent request for a content element is associated with the same value, it can decide to reuse the previously selected content element.

[0087] The selected content element "Brand A" may, for example, have initially been provided via computer device 104 with a dialog data structure that includes "Ingredient: Cola" as "Ingredient: Cola Brand A", as identified by the chatbot. The first, second, and third identifiers can be alphanumeric, such as: first identifier: chatbot_123; second identifier: dialog_data_structure_123; third identifier: computing_device_123. The dialog data structure identifier can correspond to the topic, concept, category, or exact expression of the dialog data structure. For example, the second identifier, dialog_data_structure_123, could correspond to all dialog data structures that provide a chicken wing recipe. In another example, the second identifier, dialog_data_structure_123, could correspond to all dialog data structures that provide a chicken wing recipe with cola. In yet another example, the second identifier, dialog_data_structure_123, can correspond to all dialog data structures that provide a chicken wing recipe with cola, using a placeholder field immediately preceding the term "cola". The index value can be formed from this three-tuple or from a tuple of any other size. The index value can be formed using a hash function that hashes the tuple containing the first, second, and third identifiers. The index value can be a hash value and stored in a hash table (e.g., index data structure 128). The index value can be numeric or alphanumeric, or it can contain symbols or other identifiers. The index value can correspond to a row and column in a table or to entries in a multidimensional table. The index value can correspond to a field in a data structure, such as `index_value{first_identifier, second_identifier, third_identifier}`.
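The index value described above can be sketched as a hash over the three identifiers. This is a minimal illustration under stated assumptions, not the patented implementation: the choice of SHA-256, the separator character, the function name `make_index_value`, and the plain dictionary standing in for index data structure 128 are all hypothetical.

```python
import hashlib


def make_index_value(chatbot_id: str, dialog_id: str, device_id: str) -> str:
    """Hash the (chatbot, dialog data structure, device) tuple into an index value."""
    # Assumption: the unit separator never occurs inside an identifier,
    # so distinct tuples cannot collapse into the same joined string.
    joined = "\x1f".join((chatbot_id, dialog_id, device_id))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical stand-in for index data structure 128: index value -> content element.
index_data_structure = {}

first = make_index_value("chatbot_123", "dialog_data_structure_123", "computing_device_123")
index_data_structure[first] = "Brand A"

# A later request with the same three identifiers maps to the same index value,
# so the previously selected content element can be looked up directly.
second = make_index_value("chatbot_123", "dialog_data_structure_123", "computing_device_123")
reused = index_data_structure.get(second)

# A different dialog data structure identifier yields a different index value.
other = make_index_value("chatbot_123", "dialog_data_structure_456", "computing_device_123")
```

Because the lookup is a single hash-table access, a hit lets the system skip a second content selection process entirely.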

[0088] The data processing system 102 can link the content element with the generated index value in the index data structure 128. The data processing system 102 can link, assign, or set additional parameters or conditions for the link. The data processing system 102 can set a duration for the link between the content element and the generated index value, a geographical condition for the link, or a semantic condition. For example, if the session duration exceeds a time interval (e.g., 5 minutes, 10 minutes, 20 minutes, 30 minutes, 1 hour, 2 hours, 6 hours, 12 hours, or more), the data processing system 102 can break, suspend, or terminate the link between the content element and the index value. If the computer device 104 has traveled or moved more than a certain distance (e.g., 0.5 miles, 1 mile, 2 miles, 3 miles, 5 miles, 10 miles, 20 miles, 30 miles, or more) from where the computer device 104 was located when the link between the content element and the index value was created, the data processing system 102 can determine to break, suspend, or terminate the link.
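The duration and distance conditions on the link might be checked as in the following sketch. The `Link` class, its field names, the 30-minute and 5-mile defaults, and the use of the haversine formula for distance are assumptions chosen for illustration; the text only states that such thresholds can exist.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def distance_miles(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


@dataclass
class Link:
    """Hypothetical link between a content element and an index value."""
    content_element: str
    created_at: float            # seconds since some epoch
    created_where: tuple         # (lat, lon) when the link was created
    max_age_s: float = 30 * 60   # assumed duration condition: 30 minutes
    max_miles: float = 5.0       # assumed geographical condition: 5 miles

    def is_valid(self, now, where):
        """The link survives only while both conditions still hold."""
        if now - self.created_at > self.max_age_s:
            return False
        return distance_miles(self.created_where, where) <= self.max_miles


link = Link("Brand A", created_at=0.0, created_where=(37.0, -122.0))
still_linked = link.is_valid(now=600.0, where=(37.0, -122.0))   # 10 minutes, same place
expired = not link.is_valid(now=3600.0, where=(37.0, -122.0))   # 1 hour later
moved_away = not link.is_valid(now=600.0, where=(38.0, -122.0)) # about 69 miles north
```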

[0089] The data processing system 102 can use the link between the content element and the index value to reuse the content element in a second dialog data structure. For example, the data processing system 102 can receive a second digital file corresponding to a different acoustic signal transmitting speech content detected by the microphone on the computer device 104. In response to the second speech content of the second digital file, the data processing system 102 can select the computer program containing the chatbot corresponding to the first identifier: chatbot_123. The chatbot can identify a second dialog data structure containing a second placeholder field. The chatbot can determine that the second dialog data structure has a fourth identifier corresponding to the second identifier of the first dialog data structure. The second dialog data structure and the first dialog data structure might both correspond to a recipe for chicken wings, for example, and therefore refer to the same identifier. Since the second dialog data structure is associated with the same chatbot, the same computer device, and the same dialog data structure identifier, the data processing system 102 can determine to reuse the previously selected content element for the placeholder field in the second dialog data structure. The data processing system 102 can generate the index value using the identifiers of the second dialog data structure, select the content element associated with the same index value in index data structure 128, and provide the same content element to the second dialog data structure to cause the chatbot or computer device to generate an acoustic signal corresponding to the second dialog data structure modified with the same content element.

[0090] The first dialog data structure with the content element could be, for example: "Ingredients for chicken wings include Cola Brand A". The second dialog data structure with the reused content element could be: "In a saucepan, combine 1 can of Cola Brand A with the onions, garlic, and brown sugar." The content element "Brand A" is therefore reused in the second dialog data structure, thus eliminating a second content selection process and reducing the use of computer resources by the content selector component 118.
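The reuse in this example can be sketched as a cache in front of the content selection process. The `<placeholder>` token, the `fill` helper, the tuple key, and the stub `select_content_element` (which simply counts how often it is called) are hypothetical names introduced for illustration.

```python
PLACEHOLDER = "<placeholder>"  # assumed marker for a placeholder field

selection_calls = 0  # counts how often the (stubbed) selection process runs


def select_content_element(keyword):
    """Hypothetical stand-in for the content selection process."""
    global selection_calls
    selection_calls += 1
    return "Brand A"


cache = {}  # key built from the three identifiers -> selected content element


def fill(dialog, key, keyword):
    """Fill the placeholder, running content selection only on a cache miss."""
    if key not in cache:
        cache[key] = select_content_element(keyword)
    return dialog.replace(PLACEHOLDER, cache[key])


key = ("chatbot_123", "dialog_data_structure_123", "computing_device_123")
first = fill("Ingredients for chicken wings include Cola <placeholder>.", key, "cola")
second = fill(
    "In a saucepan, combine 1 can of Cola <placeholder> with the onions, garlic, and brown sugar.",
    key,
    "cola",
)
```

After both calls the selection stub has run once: the second dialog data structure is filled from the cache, which is the saving the paragraph describes.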

[0091] In some cases, the data processing system 102 can reuse the content element across different chatbots. For example, the content element can be selected for insertion into a dialog data structure, which is identified by a first chatbot, such as a recipe chatbot. After the content element is provided with the first dialog data structure of the first chatbot, a second chatbot can be called via the computer device 104. The second chatbot can be called based on a second digital file received via the computer device 104. In response to the second digital file, the second chatbot can identify a second dialog data structure. The second digital file might, for example, correspond to a request for movie times; the second chatbot might be a movie chatbot; and the second dialog data structure might be a list of one or more movie times, such as "Movie A is showing at the local cinema today at 6:00 PM, 7:30 PM, and 9:00 PM." Since the identifiers of the second chatbot and the second dialog data structure might differ from those of the first chatbot and the first dialog data structure, the data processing system 102 may not generate the same index value for the second chatbot as it did for the first. However, the data processing system 102 can use different identifiers or techniques to determine whether to reuse the content element. For example, the data processing system 102 can determine that the movie theater identified by the second chatbot in the second dialog data structure provides the product or service corresponding to the content element selected for provision with the first dialog data structure. The data processing system 102 can then determine that the identifier of a product associated with the content element, or the keyword linked to the content element, matches or corresponds to metadata associated with the local movie theater identified by the second chatbot. Therefore, the data processing system 102 can determine to reuse the content element with the second dialog data structure as follows: "Movie A is showing today at the local cinema at 6:00 PM, 7:30 PM, and 9:00 PM, which sells Cola Brand A."
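When the index values differ across chatbots, the fallback check on keywords and metadata might look like the sketch below. The function name `shares_keyword`, the case-insensitive exact match, and the sample keyword and metadata lists are assumptions; the text does not specify how the match is computed.

```python
def shares_keyword(content_keywords, entity_metadata):
    """Return True when any content element keyword also appears in the entity metadata."""
    wanted = {keyword.lower() for keyword in content_keywords}
    return any(entry.lower() in wanted for entry in entity_metadata)


# Keywords linked to the content element selected in the recipe session (hypothetical).
content_keywords = ["cola", "soft drink"]

# Metadata associated with two cinemas identified by a movie chatbot (hypothetical).
cinema_with_cola = ["Popcorn", "Cola", "Nachos"]
cinema_without_cola = ["Popcorn", "Coffee"]

reuse_at_first_cinema = shares_keyword(content_keywords, cinema_with_cola)
reuse_at_second_cinema = shares_keyword(content_keywords, cinema_without_cola)
```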

[0092] The data processing system 102 can automatically insert the placeholder into the second dialog data structure, or the second dialog data structure can be configured with the placeholder. The second dialog data structure can, for example, include a second placeholder like this: "Movie A is showing today at the local cinema at 6:00 PM, 7:30 PM, and 9:00 PM. Be on time to buy cola, 2 medium onions, and 2 cloves of garlic." In response to identifying the field, the data processing system 102 can receive a content request at action 212. The chatbot, the third-party chatbot platform, the computer device, or a component of the data processing system can generate the request. The data processing system 102 can receive the request and select a content element. The data processing system 102 can provide the selected content element at action 214. The data processing system 102 can provide the content element to the chatbot, the client computer device 104, the third-party chatbot platform, or another entity. In some cases, the data processing system 102 can integrate or embed the content element into the dialog data structure. For example, if the data processing system 102 receives or has access to dialog data structure 210, the data processing system 102 can embed the content element in dialog data structure 210 and provide the modified dialog data structure containing the content element to the client computer device 104. The client computer device 104 can output the modified dialog data structure at action 216 as follows: "Ingredients: 1 cup brown sugar, 1 can Brand_Name Cola, 2 medium onions, 2 cloves of garlic" 218.

[0093] The data processing system 102 can determine that actions 202 to 218 correspond to Session 1. The data processing system can then determine a session break. For example, the data processing system 102 can receive sensor input 220 indicating a break, such as location information, timer information, physical activity information, speech input, or the absence of sensor input, indicating a state of inactivity. At action 222, the data processing system 102 can detect the session break based on sensor input 220 (or its absence), generate an index value for the selected content element in order to potentially reuse the content element in a later session or after the session is resumed, and store the content element in memory 224 along with the index value.

[0094] Fig. 3 illustrates an operation of a system 300 for balancing data requests to modify a computer program output based on a session. In action 302, the computer device 104 receives speech input (or other non-text input such as an image or video). In action 304, the client computer device can send a digital file corresponding to the received speech input to the data processing system 102. The data processing system 102 can process the digital file to determine that the speech input corresponds to the previous session 1 illustrated in Fig. 2. The speech input can, for example, ask how to prepare the ingredients shown in 210 in order to prepare the dish.

[0095] In some cases, the input 302 may include image or video input in addition to or instead of speech input. The computer device 104 may use one or more imaging techniques (e.g., computer vision, machine learning, image analysis, or image interpretation) to process or analyze the image and convert it into the digital file. In some cases, the computer device 104 may convert the image or video input into a digital file without performing an imaging technique to further process, analyze, or interpret the image. Instead, the computer device 104 may forward or send the digital file corresponding to the input image or video to the data processing system for further image processing. The data processing system 102 can process the digital file corresponding to the image or video input to determine that the input 302 corresponds to the previous session 1. The image and video input can be processed or parsed to obtain the same type of information as is obtained when the speech input 302 is processed or parsed.

[0096] At action 306, the data processing system 102 can resume the session with the same recipe chatbot, which can identify the same recipe 308 and the instructions to prepare the recipe as follows: "Add to the pan: 1 cup brown sugar, 1 can of cola, 2 medium onions, 2 cloves of garlic" 310. The preparation instructions can take the form of a second dialog data structure. The second dialog data structure can include a second placeholder field. The chatbot or other related device or system can send a content request to the data processing system 102 at action 312. In response to the request, the data processing system 102 can select the same content element that was previously provided at action 214 during session 1. The data processing system 102 can select the same content element based on identifiers and a generated index value associated with the content element in memory. In action 314, the data processing system 102 can provide the same content element from session 1 to the client computer device 104. In action 316, the client computer device can output the dialog data structure modified with the content element as follows: "Add to the pan: 1 cup brown sugar, 1 can of Brand_Name Cola, 2 medium onions, 2 cloves of garlic" 318.

[0097] In action 320, the data processing system can receive sensor input (e.g., from computer device 104) and detect a session interruption in action 322. The data processing system 102 can store the index value associated with the second presentation of the content element in memory 324. The data processing system 102 can store the second index value in addition to the previously generated index value. The index values can be the same, or additional index values associated with the content element can be generated. For example, the second dialog data structure can be linked with additional identifiers or keywords compared to the first dialog data structure.

[0098] Fig. 4 illustrates an operation of a system 400 for balancing data requests to modify computer program output based on a session. In action 402, the client computer device 104 can receive speech input (or other non-text input such as an image or video). In action 404, the client computer device sends a digital file corresponding to the speech input to the data processing system 102. The data processing system 102 can process the digital file and select a chatbot. The data processing system 102 can further determine whether to resume a previous session 1. Although the data processing system 102 (or the third-party chatbot provider platform 146) invokes a new chatbot, the data processing system 102 can, for example, determine whether to resume the previous session 1 based on other attributes associated with the digital file (e.g., same computer device 104, temporal information, spatial information, semantic analysis, opportunities for inserting the same content element, or historical network activity indicative of interaction with the content element presented in session 1).

[0099] In some cases, the input 402 may include image or video input in addition to or instead of speech input. The computer device 104 may use one or more imaging techniques (e.g., computer vision, machine learning, image analysis, or image interpretation) to process or analyze the image and convert it into a digital file. In some cases, the computer device 104 may convert the image or video input into a digital file without performing an imaging technique to further process, analyze, or interpret the image. Instead, the computer device 104 may forward or send the digital file corresponding to the input image or video to the data processing system for further image processing. The data processing system 102 can process the digital file corresponding to the image or video input to determine that the input 402 corresponds to the previous session 1. The image and video input can be processed or parsed to obtain the same type of information as is obtained when the speech input 402 is processed or parsed.

[0100] In action 406, the data processing system 102 continues session 1 with a second chatbot. The second chatbot could, for example, be a movie chatbot. Based on the request or query in the digital file 404, the movie chatbot can identify a dialog data structure for the cinema and showtime information 408. However, in this example, the cinema and showtime dialog data structure may not include a placeholder: "Action movie is playing at the local cinema at 8:00 PM" 410.

[0101] At action 412, the chatbot (or another entity) can provide the identified dialog data structure 410 to the data processing system 102. In some cases, the data processing system 102 can intercept the dialog data structure 410. The data processing system 102 can, for example, act as an intermediary between the computer device and the third-party chatbot platform 146. The data processing system 102 can parse or otherwise process the second dialog data structure and can determine (e.g., via a placeholder generation component) whether to insert a placeholder field or directly insert a content element into the dialog data structure. The data processing system 102 can provide the modified dialog data structure, containing the same content element selected in session 1, to the client computer device at action 414. In action 416, the client computer device 104 can output the dialog data structure with the content element as follows: "Action movie is playing at the local cinema at 8:00 PM, which also offers Brand_Name Cola" 418. Since the data processing system 102 uses the content element in this example where there is no placeholder field, the data processing system can add an expression to integrate the content element "Brand_Name" into the dialog data structure. The data processing system 102 can modify the grammar or syntax of the dialog data structure to integrate the content element. The data processing system 102 can be preconfigured with structures that facilitate the identification of a grammar or syntax of the dialog data structure and the identification of a template to use to modify the grammar or syntax of the dialog data structure.
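A template-based integration of the content element, with and without a placeholder field, can be sketched as follows. The `<placeholder>` token, the `integrate` function, and the single fixed template ", which also offers ..." are assumptions; the text only says that preconfigured templates can be used to adjust the grammar or syntax.

```python
PLACEHOLDER = "<placeholder>"  # assumed marker for a placeholder field


def integrate(dialog, content_element):
    """Insert the content element at the placeholder, or append a linking expression."""
    if PLACEHOLDER in dialog:
        return dialog.replace(PLACEHOLDER, content_element)
    # No placeholder field: attach an expression so the sentence stays grammatical,
    # keeping a trailing period if the original sentence had one.
    stripped = dialog.rstrip()
    ends_with_period = stripped.endswith(".")
    body = stripped[:-1] if ends_with_period else stripped
    result = f"{body}, which also offers {content_element}"
    return result + "." if ends_with_period else result


with_field = integrate("Add to the pan: 1 can of <placeholder> cola", "Brand_Name")
without_field = integrate(
    "Action movie is playing at the local cinema at 8:00 PM", "Brand_Name Cola"
)
```

A real system would presumably choose among several templates based on the parsed structure of the sentence; one fixed template keeps the sketch short.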

[0102] Fig. 5 illustrates an operation of a system 500 to validate a modification of a computer program output over a network. In action 502, the data processing system 102 can receive a content request from a third-party chatbot platform 146. In action 504, the data processing system 102 can perform a content selection process and provide the content element to the third-party chatbot platform 146. In action 506, the third-party chatbot platform 146 can provide the content element to a chatbot 508. The chatbot 508 can be a computer program running on the third-party chatbot platform 146 or on a client computer device 104. In action 510, the chatbot 508 can provide a dialog data structure modified with the content element to the client computer device 104. To determine whether the content element provided to computer device 104 at action 510 is the same content element provided by the data processing system 102 to the third-party chatbot platform 146 at action 504, the data processing system 102 can send a validation ping to chatbot 508 at action 512. The validation ping 512 can request information about the content element provided to computer device 104 at action 510. At action 514, chatbot 508 can respond to the validation ping with information about the content element, such as a content element identifier, content element keywords, a content element timestamp, or other information indicating the presentation of the content element.

[0103] The data processing system 102 can compare the content element reference received at action 514 with the content element provided at action 504 to set the validation parameter 516. If the content elements match, the data processing system 102 can, for example, validate the third-party chatbot platform 146. If the content elements do not match, the data processing system 102 can invalidate the third-party chatbot platform or generate a warning or notification. The data processing system 102 can further validate or invalidate platform 146 based on a delay between when the content element was provided to platform 146 at action 504 and when the content element was provided to the chatbot at action 506 or to computer device 104 at action 510.
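The comparison that sets the validation parameter might be sketched as below. The `PingResponse` type, its fields, the `validate_platform` function, and the 5-second delay limit are hypothetical; the text names the kinds of information a response can carry but not a concrete format or threshold.

```python
from dataclasses import dataclass


@dataclass
class PingResponse:
    """Hypothetical reply of the chatbot to a validation ping."""
    content_element_id: str
    presented_at: float  # timestamp in seconds


def validate_platform(provided_id, provided_at, response, max_delay_s=5.0):
    """Return True when the presented element matches and was delivered without undue delay."""
    if response.content_element_id != provided_id:
        return False  # a different content element reached the device
    return (response.presented_at - provided_at) <= max_delay_s


ok = validate_platform("ce_1", 100.0, PingResponse("ce_1", 101.5))
mismatch = validate_platform("ce_1", 100.0, PingResponse("ce_2", 101.5))
too_slow = validate_platform("ce_1", 100.0, PingResponse("ce_1", 130.0))
```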

[0104] The systems or operational flows 200, 300, 400, and 500 depicted in Fig. 2, Fig. 3, Fig. 4, and Fig. 5 can represent one or more components or functionalities of the system described above. The systems or operational flows 200, 300, 400, and 500 may, for example, include or be executed by a data processing system 102, a client computer device 104, a third-party chatbot provider device 146, or a content provider device 106.

[0105] Fig. 6 illustrates a method for modifying computer program output over a computer network. Method 600 can be performed by one or more of the components or systems shown in Fig. 1, Fig. 2, Fig. 3, Fig. 4, or Fig. 5, including, for example, system 100, data processing system 102, computer device 104, third-party chatbot platform 146, chatbot provider device 108, or content provider computer device 106. In action 602, the data processing system can receive a digital file. The data processing system can receive the digital file from a computer device or a third-party chatbot platform. The digital file can correspond to speech input detected by a microphone in the computer device. The digital file can comprise a digitized representation of the analog speech input.

[0106] In action 604, the data processing system or third-party chatbot platform can select a computer program that includes a chatbot and invoke the chatbot. In some cases, the data processing system 102 can select and invoke the chatbot; a third-party chatbot platform can select and invoke the chatbot; or the computer device can select and invoke the chatbot. The chatbot can be selected and invoked before the digital file is sent to the data processing system.

[0107] In action 606, a placeholder field can be identified in the dialog data structure. In action 608, and in response to the identification of the placeholder field, a content request can be generated and sent to the data processing system. In some cases, the data processing system may not receive the digital file corresponding to the acoustic signal. For example, instead of the digital file, the data processing system might receive a content request and information to facilitate content selection. The data processing system can receive the content request from the third-party chatbot platform, the chatbot itself, or the computer device.

[0108] In action 610, the data processing system can select the content element that responds to the request and provide the content element. The data processing system can provide the selected content element to the third-party chatbot platform, the computer device, or any other entity that requested the content element. The data processing system can insert the content element into the dialog data structure and provide the modified dialog data structure for presentation via the computer device.

[0109] The data processing system can automatically generate placeholder fields for insertion into the dialog data structure. The system can use a template, structure, semantic analysis, guideline, or rule to determine whether and where a placeholder should be inserted into the dialog data structure. In some cases, the chatbot developer may request the data processing system to determine the precise location within the dialog data structure for the placeholder field. A policy or rule based on semantic analysis might involve the data processing system identifying a noun in the dialog data structure and generating a keyword based on the noun. The system then uses this keyword to perform a content selection process to determine if there are any content items provided by content providers that match or are otherwise relevant to the noun in the dialog data structure. For example, the noun might be "lemonade." The data processing system could parse the noun "lemonade" to generate one or more keywords such as "lemonade," "drink," "soft drink," "cola," "limo," or "soda water." The data processing system could then use these keywords to identify content items. The data processing system can determine whether to insert the placeholder field. The placeholder field can be linked to keywords, metadata, position information, or other information associated with the dialog data structure. The placeholder can be linked to an identifier of the placeholder field. In some cases, the data processing system may determine to use the placeholder field in response to identifying at least one piece of content that has a relevance score greater than a threshold with respect to the noun in the dialog data structure.
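The noun-driven rule for deciding whether and where to insert a placeholder field can be sketched as follows. The keyword table, the content item table, the overlap-based relevance score, and the 0.4 threshold are all assumptions made for illustration; the text does not define how relevance is scored or how nouns are detected.

```python
# Hypothetical keyword expansion for nouns the system recognizes.
KEYWORDS_BY_NOUN = {"lemonade": ["lemonade", "drink", "soft drink", "cola"]}

# Hypothetical content items from content providers, each with its keywords.
CONTENT_ITEMS = {"Brand A": {"cola", "soft drink"}, "Brand Z": {"luxury car"}}


def relevance(keywords, item_keywords):
    """Share of the noun's keywords that the content item also carries."""
    return len(set(keywords) & item_keywords) / len(keywords)


def placeholder_position(dialog_words, threshold=0.4):
    """Return the word index before which a placeholder would go, or None."""
    for i, word in enumerate(dialog_words):
        keywords = KEYWORDS_BY_NOUN.get(word.lower().strip(".,"))
        if not keywords:
            continue  # not a noun known to the keyword table
        if any(relevance(keywords, kws) > threshold for kws in CONTENT_ITEMS.values()):
            return i  # at least one content item is relevant enough
    return None


words = "Add 1 can of lemonade.".split()
position = placeholder_position(words)
```

Here the placeholder lands immediately before "lemonade" because one content item shares two of the four generated keywords, which exceeds the assumed threshold.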

[0110] The data processing system can therefore generate the second placeholder field for the second dialog data structure and compare this second placeholder field with the first placeholder field of the first dialog data structure to determine, based on this comparison, whether a second request for the second content should be generated in the parameterized format. The data processing system can, for example, compare the identifiers of the placeholder fields, the keywords of the placeholder field, metadata, or positional information (e.g., within the first three words of the dialog data structure or at least three words). Based on this comparison of the placeholder fields, the data processing system can determine that there is a similarity between the placeholder fields (e.g., similar or identical keywords). Both placeholder fields could, for example, represent a brand of lemonade. In this case, the data processing system can reuse the content element from the first placeholder field. Based on the comparison, the data processing system can determine not to request a second content element for the second placeholder field of the second dialog data structure, since the data processing system can reuse the content element selected for the first placeholder field to insert it into the second placeholder field of the second dialog data structure.

[0111] However, if the data processing system determines, based on the comparison, that the placeholder fields are not similar or are different (e.g., different keywords or keywords do not match), the data processing system may decide to select a new, second content element. For example, the first placeholder field (or the first dialog data structure) might be associated with keywords for lemonade, while the second placeholder field (or the second dialog data structure) might be associated with keywords for luxury cars.
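The comparison of two placeholder fields by their keywords might be sketched with a set-overlap measure. Using Jaccard similarity and a 0.5 threshold is an assumption; the text only requires that similar or identical keywords lead to reuse and that dissimilar keywords lead to a new content request.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def needs_new_content_request(first_keywords, second_keywords, threshold=0.5):
    """True when the placeholder fields are too dissimilar to reuse the content element."""
    return jaccard(first_keywords, second_keywords) < threshold


lemonade_first = {"lemonade", "soft drink"}
lemonade_second = {"lemonade", "soft drink", "cola"}
luxury_cars = {"luxury car", "sedan"}

reuse_case = needs_new_content_request(lemonade_first, lemonade_second)  # overlap 2/3
new_case = needs_new_content_request(lemonade_first, luxury_cars)        # overlap 0
```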

[0112] Fig. 7 illustrates a method for balancing data requests to modify computer program output over a computer network. Method 700 can be performed by one or more of the components or systems shown in Fig. 1, Fig. 2, Fig. 3, Fig. 4, or Fig. 5, including, for example, system 100, data processing system 102, computer device 104, third-party chatbot platform 146, chatbot provider device 108, or content provider computer device 106. In action 702, the data processing system can receive a digital file. The data processing system can receive the digital file from a computer device or a third-party chatbot platform. The digital file can correspond to speech input detected by a microphone in the computer device. The digital file can include a digitized representation of the analog speech input. The digital file can be preprocessed to include keywords or tokens associated with the speech input. In some cases, the data processing system may not receive the digital file and instead receives the content request with information about the digital file, which can facilitate content selection.

[0113] In action 704, the data processing system or third-party chatbot platform can select a computer program that includes a chatbot and invoke the chatbot. In some cases, the data processing system 102 can select and invoke the chatbot; a third-party chatbot platform can select and invoke the chatbot; or the computer device can select and invoke the chatbot. The chatbot can be selected and invoked before the digital file is sent to the data processing system.

[0114] In action 706, a placeholder field can be identified in the dialog data structure. In action 708, the data processing system can select the content element and provide it. The data processing system can provide the selected content element to the third-party chatbot platform, the computer device, or any other entity that requested the content element. The data processing system can insert the content element into the dialog data structure and provide the modified dialog data structure for presentation via the computer device.

[0115] In action 710, the data processing system can facilitate the reduction of computer resource usage by storing the content element in memory. The data processing system can associate the content element with an index value based on information linked to the content element's presentation. For example, the index value can be generated based on one or more identifiers associated with the content element, the computer device, the chatbot, the dialog data structure, the keyword, the topic, or the location. The index value can be generated based on identifiers relevant to the session and can be used to determine whether to continue the session, to identify which session to continue, or to start a new session.

[0116] In action 712, the data processing system can receive a second digital file or a second content request with information to facilitate content selection. In action 714, the data processing system can select or invoke the same chatbot that was invoked in action 704. In cases where a different device or unit selects and invokes the chatbot, the data processing system can receive an indication of the chatbot being active in action 714.

[0117] In action 716, the data processing system can determine whether to provide the same content element as the one provided in action 708. For example, the data processing system can determine that the same content element is relevant for the second dialog data structure.

[0118] The data processing system can determine whether to reuse content elements based on speech input, or it can determine not to reuse content elements based on a comparison of the speech input with the previous speech input. For example, if the speech input received later differs from the speech input received previously in terms of keywords, content, context, or other parameters, the data processing system can determine not to reuse the content element. In some cases, the data processing system can determine not to reuse content elements from a different session. Two sessions can be different if, for example, they are separated by a time greater than a threshold, correspond to different end users, have different speech inputs with different acoustic fingerprints, or are located in different geographic locations.

[0119] The data processing system can, for example, receive a third or subsequent digital file corresponding to a fifth or subsequent acoustic signal transmitting third or subsequent speech content detected by the microphone on the computer device. In response to the third speech content of the third digital file, the data processing system can select the computer program containing the chatbot. Based on the third speech content of the third digital file, the data processing system can identify a third or subsequent dialog data structure containing a third placeholder field via the chatbot. The data processing system can generate a second or new index value based on a combination of the first identifier, the third identifier, and a fifth identifier of the third dialog data structure. The data processing system can determine, based on a comparison of the index value with the second index value, not to reuse the content element. In response to the identification of the third placeholder, and based on a combination of the chatbot's first identifier, the computer device's third identifier, and the third dialog data structure's fifth identifier, the data processing system can select a second content element to provide to the computer device. This causes the computer device to execute the parametrically controlled text-to-speech technique to generate a sixth or subsequent acoustic signal corresponding to the third dialog data structure modified by the second content element.

[0120] Fig. 8 illustrates a method for balancing data requests to modify computer program output over a computer network. Method 800 can be performed by one or more of the components or systems shown in Fig. 1, Fig. 2, Fig. 3, Fig. 4, or Fig. 5, including, for example, system 100, data processing system 102, computer device 104, third-party chatbot platform 146, chatbot provider device 108, or content provider computer device 106. In action 802, the data processing system can receive a digital file. The data processing system can receive the digital file from a computer device or a third-party chatbot platform. The digital file can correspond to speech input detected by a microphone in the computer device. The digital file can include a digitized representation of the analog speech input. The digital file can be preprocessed to include keywords or tokens associated with the speech input. In some cases, the data processing system may not receive the digital file and may instead receive a content request with information about the digital file, which can facilitate content selection.

[0121] In action 804, the data processing system or third-party chatbot platform can select a computer program that includes a chatbot and invoke the chatbot. In some cases, the data processing system 102 can select and invoke the chatbot; a third-party chatbot platform can select and invoke the chatbot; or the computer device can select and invoke the chatbot. The chatbot can be selected and invoked before the digital file is sent to the data processing system.

[0122] In action 806, a placeholder field can be identified in the dialog data structure. In action 808, the data processing system can select the content element and provide it. The data processing system can provide the selected content element to the third-party chatbot platform, the computer device, or any other entity that requested the content element. The data processing system can insert the content element into the dialog data structure and provide the modified dialog data structure for presentation via the computer device.
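
Actions 806 and 808 could, for illustration, be sketched as below. The representation of a dialog data structure as a list of text parts and the literal `<placeholder>` tag are assumptions made for this example; the description does not prescribe a concrete syntax.

```python
from typing import List, Optional

# Hypothetical tag marking a placeholder field; not defined by the description.
PLACEHOLDER_TAG = "<placeholder>"


def find_placeholder(dialog_parts: List[str]) -> Optional[int]:
    """Return the position of the first placeholder field, or None if there is none."""
    for position, part in enumerate(dialog_parts):
        if part == PLACEHOLDER_TAG:
            return position
    return None


def insert_content(dialog_parts: List[str], content_element: str) -> List[str]:
    """Return a modified copy of the dialog with the placeholder replaced.

    The input list is left unchanged so the unmodified dialog data structure
    remains available, for example for a later content request.
    """
    position = find_placeholder(dialog_parts)
    modified = list(dialog_parts)
    if position is not None:
        modified[position] = content_element
    return modified
```

Returning a copy rather than editing in place is a design choice of this sketch; it keeps the original dialog data structure intact if the same structure is presented again with a different content element.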

[0123] In action 810, the data processing system can facilitate the reduction of computer resource usage by storing the content element in memory. The data processing system can associate the content element with an index value based on information linked to the content element's presentation. For example, the index value can be generated based on one or more identifiers associated with the content element, the computer device, the chatbot, the dialog data structure, the keyword, the topic, or the location. The index value can be generated based on identifiers relevant to the session and can be used to determine whether to continue the session, to identify which session to continue, or to start a new session.
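
As a minimal sketch of action 810, the index value could be derived by hashing the relevant identifiers, and the content element stored under that value. The use of SHA-256, the separator character, the in-memory dictionary, and the names `make_index_value` and `ContentCache` are assumptions for this example only.

```python
import hashlib
from typing import Dict, Optional


def make_index_value(*identifiers: str) -> str:
    """Combine identifiers (chatbot, device, dialog data structure, ...) into one index value."""
    # A separator avoids collisions such as ("ab", "c") versus ("a", "bc").
    joined = "\x1f".join(identifiers)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


class ContentCache:
    """In-memory store mapping index values to previously selected content elements."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def put(self, index_value: str, content_element: str) -> None:
        self._store[index_value] = content_element

    def get(self, index_value: str) -> Optional[str]:
        """Return the stored content element, or None if the index values do not match."""
        return self._store.get(index_value)
```

With this sketch, a later request whose identifiers produce the same index value retrieves the stored content element, while a request with a different dialog data structure identifier produces a different index value and therefore a miss, which corresponds to selecting a new content element.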

[0124] In action 812, the data processing system can receive a second digital file or a second content request containing information to facilitate content selection. In action 814, the data processing system can select a second computer program that includes a second chatbot, different from the chatbot previously invoked in action 804. The data processing system can receive a hint about the second chatbot, which may have been selected by a third party. In action 816, the data processing system can determine that the content element previously selected in action 808 is relevant to a second dialog data structure provided by the second chatbot and can decide to reuse the same content element to reduce resource consumption.
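
The relevance determination in action 816 is not tied to a particular measure. One possible sketch, assuming that both the stored content element and the second dialog data structure are described by keyword sets, uses the Jaccard overlap of those sets against a threshold. The measure, the threshold of 0.5, and the function names are assumptions for this example.

```python
from typing import Set


def keyword_overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two keyword sets; 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def should_reuse(content_keywords: Set[str],
                 dialog_keywords: Set[str],
                 threshold: float = 0.5) -> bool:
    """Decide whether a stored content element is relevant enough to reuse."""
    return keyword_overlap(content_keywords, dialog_keywords) >= threshold
```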

[0125] Fig. 9 illustrates a procedure for validating computer program output over a computer network. Procedure 900 can be performed by one or more of the components or systems shown in Fig. 1, Fig. 2, Fig. 3, Fig. 4, or Fig. 5, including, for example, system 100, data processing system 102, computer device 104, third-party chatbot platform 146, chatbot provider device 108, or content provider computer device 106. In action 902, the data processing system can establish a communication channel with a server of a third-party chatbot platform. The communication channel can be secure, for example, by using encryption technology. The data processing system can use a handshaking protocol to establish the communication channel.

[0126] In action 904, the data processing system can receive a content request from the third-party server. The content request can be initiated by the chatbot. The content request can occur in response to the chatbot generating a query. In action 906, the data processing system can select a content element and provide the content element to the third-party server of the third-party chatbot platform. The third-party server can be instructed to forward the content element to the chatbot for presentation to a user of a computer device.

[0127] In action 908, the data processing system can receive a notification about the content element from the chatbot. For example, the data processing system can ping the chatbot for the notification. In action 910, the data processing system can set a validation parameter based on a comparison of the notification about the content element with the selected content element that the data processing system provided to the third-party server of the third-party chatbot platform. The validation parameter can indicate whether the content elements match or whether the content element was provided to the chatbot in a timely manner.
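
The comparison in action 910 could be sketched as follows. The representation of content elements by identifiers, the timestamps, the two-second delay limit, and the names `ValidationResult` and `validate` are assumptions for this example; the description only states that the validation parameter can reflect a match and timeliness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationResult:
    matches: bool   # reported content element equals the one that was provided
    timely: bool    # content element reached the chatbot within the allowed delay

    @property
    def valid(self) -> bool:
        return self.matches and self.timely


def validate(provided_id: str, provided_at: float,
             reported_id: str, reported_at: float,
             max_delay: float = 2.0) -> ValidationResult:
    """Compare the chatbot's notification with what the data processing system provided.

    Times are in seconds; max_delay is an assumed limit for timely delivery.
    """
    delay = reported_at - provided_at
    return ValidationResult(
        matches=(provided_id == reported_id),
        timely=(0 <= delay <= max_delay),
    )
```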

[0128] Fig. 10 is a block diagram of an exemplary computer system 1000. The computer system or computer device 1000 may comprise the system 100 or its components, such as the data processing system 102, or it may be used to implement them. The data processing system 102 may include an intelligent personal assistant or a voice-based digital assistant. The computer system 1000 comprises a bus 1005 or other communication component for communicating information and a processor 1010 or processing circuit coupled to the bus 1005 for processing information. The computer system 1000 may also include one or more processors 1010 or processing circuits coupled to the bus for processing information. The computer system 1000 also includes main memory 1015, such as a random-access memory (RAM) or other dynamic storage device, coupled to bus 1005 to store information and instructions to be executed by processor 1010. Main memory 1015 can be or comprise the data container 145. Main memory 1015 can also be used to store positional information, temporary variables, or other intermediate information during instruction execution by processor 1010. The computer system 1000 can further include read-only memory (ROM) 1020 or other static storage device coupled to bus 1005 to store static information and instructions for processor 1010. A storage device 1025, such as a semiconductor device, magnetic disk, or optical disk, can be coupled to bus 1005 to persistently store information and instructions. The storage device 1025 can include the data container 145 or be part of it.

[0129] The computer system 1000 can be connected via bus 1005 to a display 1035, such as a liquid crystal display (LCD) or an active matrix display, to show information to a user. An input device 1030, such as a keyboard with alphanumeric and other keys, can be connected to bus 1005 to communicate information and command selections to the processor 1010. The input device 1030 can include a touchscreen display 1035. The input device 1030 can also include cursor control, such as a mouse, trackball, or arrow keys on the keyboard, to communicate directional information and command selections to the processor 1010 and to control cursor movements on the display 1035. The display 1035 can, for example, be part of the data processing system 102, the client computer device 150, or another component of Fig. 1.

[0130] The processes, systems, and procedures described herein can be implemented by the computer system 1000 in response to the processor 1010 executing an instruction set contained in main memory 1015. These instructions can be read into main memory 1015 from another computer-readable medium, such as the storage device 1025. The execution of the instruction set contained in main memory 1015 causes the computer system 1000 to execute the illustrated processes described herein. Furthermore, in a multiprocessor arrangement, one or more processors can be used to execute the instructions contained in main memory 1015. Hardwired circuits can be used instead of, or in combination with, software instructions in conjunction with the systems and procedures described herein. The systems and procedures described herein are not limited to a specific combination of hardware circuits and software.

[0131] Although an exemplary computer system has been described in Fig. 10, the subject matter, including the operations described in this specification, may be implemented in other types of digital electronic circuits or in computer software, firmware or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.

[0132] For situations where the systems discussed here collect or use personal information about users, users may be given a way to control whether programs or functions that can collect personal information (e.g., information about a user's social network, social actions or activities, a user's preferences, or a user's location) are allowed, or to control whether and/or how content that may be more relevant to the user is received from a content server or other data processing system. Furthermore, certain data may be anonymized in one or more ways before being stored or used, so that personal information is removed when parameters are generated. For example, a user's identity can be anonymized so that no personally identifiable information can be determined for the user, or a user's location can be generalized where location information is obtained (such as to a city, postal code, or state level) so that a specific location of a user cannot be determined. Therefore, the user can have control over how information about them is collected and used by the content server.
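
One way to picture the anonymization and location generalization described above is the sketch below. The record layout, the set of identity fields, and the set of coarse location fields are assumptions for this example; which fields count as personally identifiable depends on the actual data.

```python
from typing import Any, Dict

# Assumed field names, for illustration only.
IDENTITY_FIELDS = {"user_id", "name", "email"}
COARSE_LOCATION_FIELDS = ("city", "postal_code", "state")


def anonymize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record without identity fields and with a generalized location."""
    cleaned = {key: value for key, value in record.items() if key not in IDENTITY_FIELDS}
    location = cleaned.pop("location", None)
    if isinstance(location, dict):
        # Keep only city, postal code, or state; drop street address and coordinates.
        cleaned["location"] = {
            key: location[key] for key in COARSE_LOCATION_FIELDS if key in location
        }
    return cleaned
```

A location that is not a dictionary is dropped entirely in this sketch, since it cannot be generalized safely without knowing its structure.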

[0133] The subject matter and operations described in this specification may be implemented in digital electronic circuits or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more thereof. The subject matter described in this specification may be implemented as one or more computer programs, such as one or more circuits of computer program instructions encoded on one or more computer storage media for execution by data processing devices or for controlling their operation. Alternatively or additionally, program instructions can be encoded on an artificially generated propagating signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is produced to encode information for transmission to a suitable receiving device for execution by a data processing device. A computer storage medium can be a computer-readable storage device, a computer-readable storage substrate, a random-access or serial-access memory array or device, or a combination thereof. Although a computer storage medium is not a propagating signal, it can be a source or destination of computer program instructions encoded in an artificially generated propagating signal. A computer storage medium can also be one or more separate components or media (e.g., multiple CDs, data carriers, or other storage devices) or included therein. The operations described in this specification can be implemented as operations performed by a data processing device on data stored on one or more computer-readable storage devices or received from other sources.

[0134] The terms "data processing system," "computer device," "component," or "data processing device" encompass various devices, apparatus, and machines for processing data, including, for example, a programmable processor, a computer, one or more systems on a chip, or combinations thereof. The device may include specialized logic circuitry, such as an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit). In addition to hardware, the device may also include code that creates an execution environment for the computer program in question, such as code that forms processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of these. The device and the execution environment can implement various computing model infrastructures, such as web services, as well as distributed computing and geographically distributed computing infrastructures. For example, the interface 110, the content selection component 118, or the NLP component 112, and other data processing system 102 components can include or share one or more data processing devices, systems, computer devices, or processors.

[0135] A computer program (also known as a program, software, software application, app, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a standalone program or as a module, component, subroutine, object, or other unit suitable for use in a computer environment. A computer program can correspond to a file in a file system. A computer program can be stored in a portion of a file containing other programs or data (e.g., one or more scripts stored in a document written in markup language), in a single file specifically for that program, or in several coordinated files (e.g., files containing one or more modules, subroutines, or portions of code). A computer program can be deployed in such a way that it is executed on one computer or on several computers located at one site or distributed across several sites and connected to each other through a communication network.

[0136] The processes and logic sequences described in this description can be executed by one or more programmable processors, which execute one or more computer programs (e.g., components of data processing system 102) to perform actions by processing input data and generating outputs. The processes and logic sequences can also be implemented as specialized logic circuits, such as an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and devices can also be implemented as such. Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and storage devices, including, for example, semiconductor memory devices such as EPROM, EEPROM, and flash memory devices; magnetic disks, such as internal hard disks or removable disks; magneto-optical disks; and CD-ROMs and DVD-ROMs. The processor and memory may be supplemented with or integrated into special-purpose logic circuits.

[0137] The subject matter described herein may be implemented in a computer system comprising a backend component, such as a data server; a middleware component, such as an application server; a frontend component, such as a client computer with a graphical user interface or web browser through which a user can interact with an implementation of the subject matter described herein; or any combination of one or more such backend, middleware, or frontend components. The system components may be interconnected by any form or medium of digital data communication, such as a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).

[0138] The computer system, such as system 100 or system 1000, can include clients and servers. A client and a server are generally located remotely from each other and typically interact through a communication network (e.g., network 165). The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship to each other. In some implementations, a server sends data (e.g., data packets representing a content element) to a client device (e.g., for the purpose of displaying data and receiving user input from a user interacting with the client device). Data generated in the client device (e.g., a result of user interaction) can be received from the client device at the server (e.g., received by the data processing system 102 from the computer device 150 or the content provider computer device 155 or the chatbot provider computer device 160).

[0139] Although the processes in the drawings are shown in a specific order, it is not necessary that these processes be carried out in the specific order shown or in sequential order, and it is not necessary that all illustrated processes be carried out. The actions described herein can be performed in a different order.

[0140] The separation of different system components does not require separation in all implementations, and the described program components can be included in a single hardware or software product. For example, the NLP component 112 or the content selection component 118 can be a single component, an app or program, or a logic device with one or more processing circuits, or part of one or more servers of the data processing system 102.

[0141] Having described several illustrative implementations, it is evident that the foregoing is illustrative and not limiting, and has been presented by way of example. In particular, although many of the examples presented herein involve specific combinations of process operations or system elements, these operations and elements can be combined in other ways to achieve the same goals. Operations, elements, and features described in connection with one implementation are not intended to be excluded from a similar role in other implementations.

[0142] The language and terminology used here serve a descriptive purpose and should not be considered restrictive. The use of the words "including," "comprising," "having," "containing," "involving," "characterized by," "characterized in that," and variations thereof, here means that the elements listed thereafter, their equivalents, and additional elements, as well as alternative implementations consisting exclusively of the elements listed thereafter, are included. In an implementation, the systems and procedures described herein consist of one, any combination of more than one, or all of the elements, modes of action, or components described herein.

[0143] Any references to implementations, elements, or modes of operation of the systems and procedures mentioned herein in the singular may also include implementations comprising a multitude of such elements, while any reference to an implementation, element, or mode of operation of any kind mentioned herein in the plural may also include implementations comprising only a single element. References to the singular or plural form are not intended to limit the systems and procedures disclosed herein, their components, modes of operation, or elements to single or multiple configurations. References to a mode of operation or an element of any kind, based on information, modes of operation or elements of any kind, may include implementations whose mode of operation or element is based at least partially on information, modes of operation or elements of any kind.

[0144] Any of the implementations disclosed herein may be combined with any other implementations or embodiments, and references to "an implementation," "some implementations," "the implementation," or the like are not necessarily mutually exclusive and are intended to indicate that a particular feature, structure, or characteristic described in connection with the implementation may be included in at least one implementation or embodiment. Such terms, as used herein, do not necessarily refer to the same implementation. Any implementation may be combined with any other implementation, inclusively or exclusively, and in any manner consistent with the aspects and implementations disclosed herein.

[0145] References to "or" can be interpreted inclusively, meaning that any terms described using "or" can refer to one, more than one, or all of the described terms. For example, a reference to "at least one of 'A' and 'B'" can include only 'A', only 'B', or both 'A' and 'B'. These references, when used in conjunction with "comprising" or other open terminology, can include additional elements.

[0146] If technical features in the drawings, detailed description, or any claim are followed by reference numerals, these have been included to enhance the clarity of the drawings, detailed description, or claims. Accordingly, neither such reference numerals nor their absence have a limiting effect on the scope of the claim elements.

[0147] The systems and methods described herein can also be embodied by other embodiments without altering their properties. For example, the data processing system 102 can select a content element for a subsequent action (e.g., for the third action 215) based partly on data from a preceding action in the sequence of actions of the thread 200, such as data from the second action 210 indicating that the second action 210 is complete or about to begin. The preceding implementations are illustrative rather than limiting for the systems and methods described herein. The scope of the systems and methods described herein is therefore specified by the appended claims rather than by the preceding description, and modifications that fall within the meanings and scope of equivalence of the claims are therefore included therein.

Claims

[1] System (100) for modifying a computer program output, comprising: a data processing system (102) with one or more processors and memory for: receiving, from a computer device (104), a digital file corresponding to a first acoustic signal transmitting speech content detected by a microphone of the computer device (104), wherein the first acoustic signal is converted into the digital file by an analog-to-digital converter of the computer device (104); selecting, in response to the speech content of the digital file, a computer program that includes a chatbot, from several computer programs that include chatbots for execution; identifying, via the chatbot based on the speech content of the digital file, a dialog data structure that includes a placeholder field; generating, in response to an identification of the placeholder field in the dialog data structure, a content request in a parameterized format configured for a parametrically controlled text-to-speech technique; sending the content request to a content selection component of the data processing system (102); selecting, via a content selection process in response to the request, a content element to be inserted into the placeholder field of the dialog data structure, wherein the content element is configured in the parameterized format for the parametrically controlled text-to-speech technique; and providing to the chatbot the content element in the parameterized format selected via the content selection process to cause the computer device (104) to execute the parametrically controlled text-to-speech technique to generate a second acoustic signal corresponding to the dialog data structure modified with the content element.
[2] System (100) according to claim 1, comprising the data processing system (102) for: determining, based on an identifier of the computer device (104) and via a lookup operation in a data store, that the computer device (104) is authorized to access the computer program comprising the chatbot; and selecting the computer program that includes the chatbot in response to the determination that the computer device (104) is authorized to access the chatbot.
[3] System (100) according to claim 1, wherein the chatbot is configured to use a natural language processing technique to identify the dialog data structure in response to the digital file.
[4] System (100) according to claim 1, wherein the dialog data structure comprises a tag which identifies the placeholder field and metadata of the placeholder field.
[5] System (100) according to claim 1, comprising the data processing system (102) for: creating a second dialog data structure that includes a second placeholder field; comparing the second placeholder field with the placeholder field; determining, based on the comparison, to generate a second request for second content in the parameterized format; and selecting a second content element to insert into the second placeholder field of the second dialog data structure.
[6] System (100) according to claim 1, comprising the data processing system (102) for: creating a second dialog data structure that includes a second placeholder field; comparing the second placeholder field with the placeholder field; and determining, based on the comparison, which content element selected for insertion into the placeholder field should be inserted into the second placeholder field of the second dialog data structure.
[7] System (100) according to claim 1, comprising the data processing system (102) for: creating a second dialog data structure that includes a second placeholder field; comparing the second placeholder field with the placeholder field; determining, based on the comparison, not to request a second content element for the second placeholder field of the second dialog data structure; and reusing the content element selected for the placeholder field to insert it into the second placeholder field of the second dialog data structure.
[8] System (100) according to claim 1, comprising the data processing system (102) for: selecting, via the content selection process, the content element to insert based on profile information linked to the computer device (104).
[9] System (100) according to claim 1, comprising the data processing system (102) for: selecting, via the content selection process, the content element to insert based on multiple digital files that correspond to the computer device (104).
[10] System (100) according to claim 1, comprising the computer device (104) for: playing the content element with an acoustic fingerprint that corresponds to the chatbot.
[11] System (100) according to claim 1, comprising the data processing system (102) for: providing, via a secure network communication channel, the content element for insertion into the dialog data structure by the chatbot.
[12] Method for modifying a computer program output, comprising: detecting, by means of a sensor (134) of a computer device (104), a first image comprising visual content; converting, by the computer device (104), the first image into a digital file that corresponds to the visual content; selecting, in response to the visual content of the digital file, a computer program that includes a chatbot, from several computer programs that include chatbots for execution; identifying, by the chatbot based on the visual content of the digital file, a dialog data structure that includes a placeholder field; generating, in response to the identification of the placeholder field in the dialog data structure, a content request in a parameterized format configured for a parametrically controlled text-to-speech technique; sending, by the chatbot, the content request to a content selection server; selecting, by the content selection server in response to the request, a content element to insert into the placeholder field of the dialog data structure, wherein the content element is configured in the parameterized format for the parametrically controlled text-to-speech technique; and providing to the chatbot the content element in the parameterized format selected via the content selection process to cause the computer device (104) to execute the parametrically controlled text-to-speech technique to generate an acoustic signal corresponding to the dialog data structure modified with the content element.
[13] Method according to claim 12, comprising: determining, based on a lookup operation in a data store with an identifier of the computer device (104), that the computer device (104) is authorized to access the computer program comprising the chatbot; and selecting the computer program that includes the chatbot in response to the determination that the computer device (104) is authorized to access the chatbot.
[14] Method according to claim 12, comprising: using a machine learning image processing technique to identify the dialog data structure in response to the digital file.
[15] Method according to claim 12, wherein the dialog data structure includes a tag which identifies the placeholder field and metadata of the placeholder field.
[16] Method according to claim 12, comprising: creating a second dialog data structure that includes a second placeholder field; comparing the second placeholder field with the placeholder field; determining, based on the comparison, to generate a second request for second content in the parameterized format; and selecting a second content element to insert into the second placeholder field of the second dialog data structure.