Dialogue agent with two-sided modeling

By using the FedAssistant framework with two-sided modeling and sparsity techniques, the problems of privacy protection and high communication costs in dialogue system training are solved, achieving efficient and secure dialogue system training and response generation.

CN115914148B · Active · Publication Date: 2026-06-26 · THE HONG KONG UNIV OF SCI & TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: THE HONG KONG UNIV OF SCI & TECH
Filing Date: 2022-08-04
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing dialogue systems face privacy risks and high communication costs during training, especially when they use large pre-trained language models. Transmitting users' raw utterances may leak private information, and traditional federated machine learning is expensive in communication and makes it difficult to generate responses on resource-constrained user devices.

Method used

The FedAssistant framework, which adopts two-sided modeling, uses sparsification techniques such as Top-k gradient sparsification to transmit only the hidden state (contextual information) instead of the original data. It is trained using the GPT model, with the user-side and assistant-side models built locally, and the central model updated on the parameter server.

Benefits of technology

This enables training of the dialogue system without leaking the original data, reducing communication costs, improving the efficiency and security of response generation, and reducing the demand on user device computing resources.

✦ Generated by Eureka AI based on patent content.


Abstract

A central learning model is deployed as a user model and an assistant model. Utterances containing sensitive information, taken from a previously stored conversational language corpus and corresponding to a user query and its chat agent response, are used to train the user model into an updated user model and the assistant model into an updated assistant model, respectively. The user model provides the assistant model with a user context corresponding to the user query, and the assistant model provides the user model with an assistant context corresponding to the chat agent response. During the training, the user model does not provide the assistant model with a plain text query, and the assistant model does not provide the user model with a plain text response. The updated assistant model can be used in a federated training process to generate an updated central model. The updated central model can be used to provide a real-time chat agent that responds to real-time user queries.

Description

[0001] Cross-references to related applications

[0002] This application is a non-provisional application claiming priority to U.S. Provisional Patent Application No. 63/229,490, filed August 4, 2021, entitled “FedAssistant: Dialog agents with two-sided modeling,” pursuant to 35 USC §119, the entire contents of which are incorporated herein by reference.

Background Technology

[0003] Dialogue systems play a vital role in daily life and are widely used for recommendation, question answering, online customer service, and chatbots. Two typical machine learning-based dialogue systems include 1) utterance retrieval models that select responses from a given database, and 2) neural generative language models that generate responses spontaneously. High-capacity generative language models trained on large datasets can perform well on dialogue generation-related tasks. Using large pre-trained language models, such as generative pre-trained transformer (“GPT”) models (e.g., GPT, GPT-2, or GPT-3), can yield good performance in task-oriented dialogue tasks and open-domain chatbots. Various sources of publicly available datasets can be used to train large pre-trained models.

[0004] However, for private datasets held by individual companies, institutions, or organizations, privacy concerns may prevent current language models from maximizing their usefulness. For example, a country's laws may prohibit the sharing of personal or private data (which may be termed sensitive data) of individuals or organizations without consent. Furthermore, even if sensitive data is "deidentified" or anonymized before sharing, the identity of the source of the sensitive data (e.g., the individual or organization to which the sensitive data belongs) can be determined through data re-identification using auxiliary data. Therefore, owners of sensitive data (e.g., medical records) are often reluctant to directly share their original data.

[0005] On the other hand, data holders can be service providers who have an incentive to share sensitive data about their users, subscribers, customers, etc., to improve the services offered to them. For example, after being trained on data from multiple knowledge domains, question-answering models can answer more comprehensive and complex questions. Therefore, there is a need for a method to train language models using data from multiple data owners without exposing the raw data associated with the owners.

[0006] Federated machine learning can be used to train machine learning models while protecting privacy. Federated machine learning frameworks can leverage a central learning model process (e.g., the FedAvg algorithm) located on a central server, which updates the global / master / central model using a weighted or unweighted average of the client model parameters or gradients. The central server then sends the updated global model back to the corresponding clients. However, this traditional federated machine learning approach can be problematic for several reasons. First, large models often contain over 100 million model parameters, and iteratively updating the client and global models is impractical due to the significant communication costs between the central model and the user / client models, especially when communication between the central model server and the client / user device is at least partially facilitated by wireless communication networks. Second, the generative nature of conversational neural models can become a constraint on serving users: to generate a response for a given context, the generative model produces output word by word, which is time-consuming and processor-intensive for mobile users with limited computing resources. Third, deploying conversational neural models on service provider servers rather than on client devices avoids consuming limited computing resources on user devices, but typically requires transmitting the user's raw utterances to the central model server. Because malicious actors who obtain raw utterances without authorization can extract private information from them, transmitting users' raw utterances may compromise their privacy.

Summary of the Invention

[0007] The following presents a simplified overview of the disclosed subject matter to provide a basic understanding of some embodiments across the various examples. This overview is not a comprehensive summary of the various embodiments. It is not intended to identify key or essential elements of the various embodiments, nor is it intended to depict the scope of the various embodiments. The sole purpose of this overview is to present some concepts of this disclosure in a streamlined manner as a prelude to the more detailed description that follows.

[0008] A framework including an assistant learning model (which may be referred to as the FedAssistant model) is used to train a neural dialogue system in a federated learning setting. The dialogue system can then be used to facilitate chatbots that can serve multiple purposes. The assistant learning model can be trained based on data from multiple data owners or data holders without leaking the raw data during the training and evaluation of the assistant model or user learning model. Assistant model training can include two-sided modeling that can be easily deployed to users supported by the data owner / provider. To reduce the communication costs between the data holder and the parameter server (which may be referred to as the central server and may include access to the central model), the assistant model updates can be sparsified, for example, using Top-k gradient sparsification. It should be understood that the central model can be distributed across different computing systems, which may be co-located or geographically separated.
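By way of non-limiting illustration, Top-k gradient sparsification as mentioned above can be sketched as follows. The function names `top_k_sparsify` and `densify` and the example gradient are hypothetical and are not taken from this application; the sketch assumes a flat gradient vector and omits refinements such as accumulating the dropped entries locally for later rounds.

```python
import heapq


def top_k_sparsify(gradient, k):
    """Keep the k largest-magnitude entries and return them as (index, value) pairs."""
    if k <= 0:
        return []
    largest = heapq.nlargest(k, range(len(gradient)), key=lambda i: abs(gradient[i]))
    return sorted((i, gradient[i]) for i in largest)


def densify(pairs, size):
    """Rebuild a dense vector on the receiving side; entries that were dropped become zero."""
    dense = [0.0] * size
    for index, value in pairs:
        dense[index] = value
    return dense


# Hypothetical gradient held by one data holder (illustrative values only).
grad = [0.02, -1.5, 0.3, 0.0, 2.1, -0.01, 0.7, -0.4]

sparse = top_k_sparsify(grad, k=3)      # only these pairs would be sent to the parameter server
restored = densify(sparse, len(grad))   # what the parameter server would reconstruct
```

In this sketch only three index/value pairs cross the network instead of eight values, which is the kind of communication saving the sparsification is intended to provide.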

[0009] Two-sided dialogue modeling frameworks can be based on the use of GPT models (currently, GPT, GPT-2, and GPT-3 exist and are available). Models other than GPT can be used instead. Only hidden states (which may include contextual information) are transferred between the user model and the assistant model. After the assistant model has been trained, FedAvg can be performed based solely on the parameters corresponding to the assistant model. The assistant model can use the transformer architecture of the GPT model to model subsequent utterances, and can consider the user model's previous hidden states as context.

[0010] By using the FedAssistant assistant model framework described herein, the transmission of raw data during training and inference is avoided. Only past keys and values of all transformer blocks of the user model or assistant model are transmitted as context to generate responses during training and inference. The provided plain text raw data is only visible to the data owner / holder. During training, each data holder can initialize a first user-side model and a second assistant-side model. Only parameters, values, factors, coefficients, etc., from the assistant-side model are provided to the parameter server via the FedAvg algorithm to update the model at the parameter server. The model in the parameter server can be referred to as the central model and can be operated or otherwise facilitated by a third-party service provider and is accessible to all data holders and all actual users. After training, the model at the parameter server replaces the assistant-side model of each data holder and becomes the assistant model used to answer queries. The user-side model is distributed to actual users according to the data holder's preferences, specifications, permissions, requirements, or other operational needs.
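By way of non-limiting illustration, the aggregation step described above, in which only assistant-side parameters are averaged at the parameter server, can be sketched as follows. The function `fed_avg`, the dictionary layout, the parameter values, and the weighting by number of dialogues are assumptions made for illustration; the application itself only states that a weighted or unweighted average may be used.

```python
def fed_avg(client_params, client_weights=None):
    """Weighted average of per-client parameter dicts (name -> list of floats)."""
    if client_weights is None:
        client_weights = [1.0] * len(client_params)  # unweighted average
    total = sum(client_weights)
    averaged = {}
    for name in client_params[0]:
        size = len(client_params[0][name])
        averaged[name] = [
            sum(w * params[name][j] for w, params in zip(client_weights, client_params)) / total
            for j in range(size)
        ]
    return averaged


# Hypothetical data holders, each with a user-side and an assistant-side model.
holders = [
    {"user": {"w": [9.0, 9.0]}, "assistant": {"w": [1.0, 2.0]}, "n_dialogues": 100},
    {"user": {"w": [7.0, 7.0]}, "assistant": {"w": [3.0, 6.0]}, "n_dialogues": 300},
]

# Only the assistant-side parameters leave each data holder; user-side parameters stay local.
assistant_only = [holder["assistant"] for holder in holders]
weights = [holder["n_dialogues"] for holder in holders]

central = fed_avg(assistant_only, weights)
```

Here the second holder contributes three times as many dialogues, so its assistant-side parameters carry three times the weight in the central model.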

[0011] In one embodiment, the data holder can be considered or referred to as a user service provider (e.g., a bank, healthcare provider, financial service provider, social media platform, retailer, etc.). The FedAssistant assistant model framework facilitates simple model deployment to the data holder's users; users do not need to generate responses locally using their associated user devices.

[0012] Using the FedAssistant assistant model framework can improve services for data holders, encouraging their participation in federated learning while protecting their private, raw data. For example, healthcare providers might be more willing to participate in training the assistant model and eventually the central model because their raw data, such as that corresponding to patients, is not disclosed to either the assistant or central model.

[0013] An example embodiment of the method includes: receiving first context information representing a first context corresponding to sensitive information from a first initial learning model executed on a computing device including a processor; inputting the first context information into a second initial learning model executed on the computing device; and using the second initial learning model to determine response information in response to the sensitive information based on the first context information.

[0014] In one embodiment, the training method includes: determining updated context information based on first context information and second context information representing a second context corresponding to the response information by using a second initial learning model; and transferring the updated context information from the second initial learning model to the first initial learning model.

[0015] The first and second initial learning models may comprise, for example, a user-side model and an assistant-side model, respectively, of a two-sided modeling framework. The user-side and assistant-side models can operate on different computing systems in a distributed server-client computing environment and may include the same parameters as the central model (i.e., the first and second initial learning models may be the same model with the same parameters as the initial central model). In one embodiment, the first and second initial learning models may operate on the same computing system. The computing device may be part of a computing system owned, operated, leased, or controlled by an entity possessing a dialogue corpus containing sensitive information that the data owner wishes to keep confidential. One or more sessions in the corpus may include dialogue messages between a user and an agent, which may be, or may have been, a real person or a chatbot agent. Dialogue sessions may include query messages initiated or sent by the user to the agent, and may also include response messages generated by the agent in response to one or more queries and sent to the user's device. The corpus of dialogue session messages may include messages that occurred and were recorded / stored in the corpus prior to the implementation of the two-sided modeling method. Context can be generated or determined by both the user-side model and the assistant-side model based on dialogue messages from a corpus. Context can include keys or values generated by the model for a given query or response message. For example, the user-side model can generate context for a query message and send it to the assistant-side model without sending the actual language of the query message retrieved from the corpus. The assistant-side model can then generate context based on response messages from a corpus sent by an agent (human or automated) in response to the query message. The assistant-side model can combine the context received from the user-side model with the context it generated based on the response message corresponding to the query message in the corpus to obtain a second context, which it then sends to the user-side model without sending the actual language of the response message.
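By way of non-limiting illustration, the exchange of keys and values as context can be sketched as follows. The functions `toy_kv` and `attend`, the token ids, and the two-dimensional vectors are hypothetical stand-ins for a transformer block and are not part of this application; the sketch shows only the data flow (which values cross between the models), not the security properties of the transmitted context.

```python
import math


def toy_kv(token_ids, seed):
    """Toy stand-in for a transformer block: map each token id to a (key, value) pair."""
    keys = [[math.sin(seed + t), math.cos(seed + t)] for t in token_ids]
    values = [[math.cos(seed * t + 1.0), math.sin(seed * t + 1.0)] for t in token_ids]
    return keys, values


def attend(query, keys, values):
    """Scaled dot-product attention of one query vector over cached keys and values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the maximum for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    norm = sum(weights)
    return [sum(w / norm * value[d] for w, value in zip(weights, values))
            for d in range(len(values[0]))]


# User side: the query token ids never leave this side.
user_tokens = [12, 7, 42]
user_keys, user_values = toy_kv(user_tokens, seed=1)  # only these are sent as context

# Assistant side: append its own keys/values for the response and attend over everything.
response_tokens = [5, 9]
asst_keys, asst_values = toy_kv(response_tokens, seed=2)
keys = user_keys + asst_keys
values = user_values + asst_values
hidden = attend([0.5, -0.25], keys, values)

# The combined keys/values form the updated context returned to the user side.
updated_context = (keys, values)
```

The assistant side sees five key/value pairs (three received, two of its own) but never the token ids of the query, and the user side likewise receives the updated context rather than the response text.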

[0016] In one embodiment, the method may further include training a second initial learning model by a computing device based on response information to obtain an updated second learning model. In one embodiment, the second initial learning model for training the updated second learning model does not use sensitive information input from a corpus, provided to, or otherwise obtained by the first initial learning model. In one embodiment, the second initial learning model does not receive sensitive information from the first initial learning model or from any other source.

[0017] In one embodiment, the first initial learning model and the second initial learning model may include pre-trained language models, which may also include a central model. In one embodiment, the pre-trained language model may include a generative pre-trained transformer model, such as GPT, GPT-2, GPT-3, or a similar model.

[0018] In another embodiment, the method may further include training a central learning model by a second computing device based on a federated learning model to obtain an updated central model based on different response information generated in response to different contexts received from different assistant learning models, the different contexts corresponding to different conversational dialogues between different user learning models and different assistant learning models. In other words, different owners of different corresponding datasets / corpora of conversational messages can have their own respective updated second learning models, each updated second learning model having corresponding parameters or gradients distinct from other updated second learning models. The parameters of the different updated second learning models can be provided to the second computing device, which may include a processor and can be configured to operate as a parameter server, and which can execute a federated learning model based on the parameters from the different updated second learning models to obtain an updated central learning model.

[0019] The method of this embodiment may further include: using a central computing device of a central computing system including a processor to determine a response to a query received from a user device configured to present a dialogue agent application interface of a user device; and using the central computing system to send the response to the dialogue agent application interface of the user device; wherein the central computing device uses an updated central model to determine the response to the query. The central computing device of the central computing system may include an updated central model, which is updated during training of multiple assistant-side models using two-sided modeling, wherein the two-sided modeling does not transfer sensitive information from one model to another during training. The central computing device may be owned, operated, leased, controlled, or otherwise deployed by a data holder who owns a corpus used to train at least one assistant-side model.

[0020] The method may further include: determining a response to a query input to a dialogue agent of a user device, the user device including a processor and configured to present a dialogue agent application interface of the user device, wherein the user device uses an updated central model to determine the response to the query. In this case, instead of a central computing device, an assistant agent including a trained central model to respond to real-time queries from the user is provided; the central model may be deployed on the user's device, such as a smartphone, tablet, or laptop computing device.

[0021] In one example embodiment, the computing system includes a computing device including a processor configured to: receive first context information representing a first context corresponding to sensitive information from a first initial learning model executed on the computing device; input the first context information into a second initial learning model executed on the computing device; and use the second initial learning model to determine response information in response to the sensitive information based on the first context information.

[0022] In one embodiment, the processor of the computing device may also be configured to: determine updated context information based on first context information and second context information representing a second context corresponding to the response information using a second initial learning model; and transfer the updated context information from the second initial learning model to the first initial learning model.

[0023] In one embodiment, the processor can also be configured to train a second initial learning model based on the response information to obtain an updated second learning model. The learning model that yields the updated second learning model can be trained without using sensitive information input to the initial learning model.

[0024] In one embodiment, the first initial training model and the second initial training model include a pre-trained central language model.

[0025] In one embodiment, the parameters corresponding to the updated second learning model can be combined with parameters from other models to obtain an updated central learning model.

[0026] In one example embodiment, the non-transitory machine-readable medium may include executable instructions that, when executed by a processor of a computing device including access to a first initial learning model and a second initial learning model, cause the following operations to be performed: inputting first sensitive information into the first initial learning model; determining first context information corresponding to the first sensitive information using the first initial learning model; transferring the first context information to the second initial learning model; using the second initial learning model, determining response language information in response to the first context information based on the first context information, and determining first updated context information based on second context information corresponding to the response language information; transferring the first updated context information to the first initial learning model; inputting second sensitive information in response to the first updated context information into the first initial learning model; determining third context information corresponding to the second sensitive information and the first updated context information using the first initial learning model; determining second updated context information based on the first updated context information and the third context information; and transferring the second updated context information to the second initial learning model.
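By way of non-limiting illustration, the alternating sequence of operations described above can be sketched as follows. The classes `Context` and `SideModel`, the function `run_dialogue`, the example utterances, and the character-sum digest are all hypothetical; the digest is only a toy placeholder for a model's hidden state and is not a privacy mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Opaque state passed between the two models; stands in for cached keys/values."""
    states: list = field(default_factory=list)


class SideModel:
    """Toy user-side or assistant-side model that sees only its own side's utterances."""

    def __init__(self, name):
        self.name = name

    def encode(self, utterance, context):
        """Fold a locally visible utterance into a new context without copying its text."""
        digest = sum(ord(ch) for ch in utterance) % 997  # toy placeholder for a hidden state
        return Context(states=context.states + [(self.name, digest)])


def run_dialogue(turns):
    """Alternate between the two models, recording what is transferred after each turn."""
    user, assistant = SideModel("user"), SideModel("assistant")
    context = Context()
    wire_log = []
    for speaker, text in turns:
        model = user if speaker == "user" else assistant
        context = model.encode(text, context)  # plain text stays inside this model
        wire_log.append(context)               # only the updated context is transferred
    return context, wire_log


turns = [
    ("user", "My card ending 1234 was declined"),   # first sensitive information
    ("assistant", "I can help with that"),          # response language information
    ("user", "Please reissue it"),                  # second sensitive information
]
final_context, wire_log = run_dialogue(turns)
```

Each entry in `wire_log` corresponds to one transfer in the sequence above: the first context, the first updated context, and the second updated context.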

[0027] In one embodiment, the operation that the executable instructions cause to be performed further includes: training a second initial learning model based on the first context information, the second context information, and the third context information to obtain an updated second learning model.

[0028] In one embodiment, the executable instructions may be configured to train the second initial learning model without using the first sensitive information or the second sensitive information. In another embodiment, the executable instructions may be configured to train the second initial learning model without using the first sensitive information, but based on the second sensitive information.

[0029] In one embodiment, the executable instructions are configured to provide an updated second learning model to a parameter server for training a central learning model to obtain an updated central model.

Attached Figure Description

[0030] Figure 1 shows a chatbot system in a network environment.

[0031] Figure 2 shows an example system for training a chat agent model using two-sided modeling.

[0032] Figure 3 shows an example method for training a chat agent model using two-sided modeling.

[0033] Figure 4 shows an example system and method for training a chat agent model using two-sided modeling with parameters from multiple data holders.

[0034] Figure 5A shows an example system that uses, over a network, a chat agent model trained by two-sided modeling.

[0035] Figure 5B shows an example system that uses a chat agent model, trained by two-sided modeling, running on a user device.

[0036] Figure 6 shows an example method that uses a chat agent trained by two-sided modeling.

[0037] Figure 7 shows a computer environment.

[0038] Figure 8 shows a block diagram of a method embodiment.

[0039] Figure 9 shows a block diagram of a system embodiment.

[0040] Figure 10 shows a block diagram of a method that can be implemented in a machine-readable medium.

Detailed Implementation

[0041] As a preliminary matter, those skilled in the art will readily understand that the present embodiments are susceptible of broad utility and application. Many methods, embodiments, and adaptations of this application other than those described herein, as well as many variations, modifications, and equivalent arrangements, will be apparent from or reasonably suggested by the various embodiments of this application without departing from their substance or scope.

[0042] Therefore, although this application has been described in detail with respect to various embodiments herein, it should be understood that this disclosure is merely illustrative and exemplary with respect to one or more concepts expressed by the various embodiments, and is intended only for the purpose of providing a complete and illustrative overview. The following disclosure is neither intended nor should be construed as limiting this application, or otherwise excluding any such other embodiments, adaptations, variations, modifications, and equivalent arrangements, and the embodiments described herein are limited only by the appended claims and their equivalents.

[0043] As used in this disclosure, in some embodiments, the terms "component," "system," etc., are intended to refer to or include computer-related entities or entities associated with operating means having one or more specific functions, wherein an entity may be hardware, a combination of hardware and software, software, or software being executed. As examples, a component may be, but is not limited to: a process running on a processor, a processor, an object, an executable file, a thread of execution, computer-executable instructions, a program, and / or a computer. By way of illustration and not limitation, both an application running on a server and the server itself can be components.

[0044] One or more components may reside within a process and / or execution thread, and components may be located on a single computer and / or distributed across two or more computers. Furthermore, these components may execute from various computer-readable media storing various data structures. Components may communicate via local and / or remote processes, for example, based on signals having one or more data packets (e.g., data from one component interacts with another component in a local system, a distributed system, and / or across a network (e.g., the Internet) via this signal). As another example, a component may be a device having specific functions provided by mechanical parts operated by electrical or electronic circuitry, which is operated by a software application or firmware application executed by a processor, wherein the processor may be internal or external to the device and execute at least a portion of the software or firmware application. As yet another example, a component may be a device providing specific functions through electronic components without mechanical parts, the electronic components including a processor to execute software or firmware that at least partially endows the electronic components with functionality. Although various components have been shown as separate components, it should be understood that multiple components may be implemented as a single component, or a single component may be implemented as multiple components, without departing from the exemplary embodiments.

[0045] As used herein, the term "facilitate" is used in the context of a system, device, or component "facilitating" one or more actions or operations, in view of the nature of complex computing environments in which an action or operation may involve multiple components and / or multiple devices. Non-limiting examples of actions that may or may not involve multiple components and / or multiple devices include: sending or receiving data, establishing connections between devices, determining intermediate results to obtain results, etc. In this respect, a computing device or component can facilitate an operation by playing any role in completing the operation. Therefore, it should be understood that when an operation is described herein as being facilitated by a component, the operation may optionally be accomplished through the cooperation of one or more other computing devices or components, such as, but not limited to, sensors, antennas, audio and / or video output devices, other devices, etc.

[0046] Furthermore, various embodiments can be implemented as a method, apparatus, or article of manufacture using standard programming and / or engineering techniques to generate software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed subject matter. As used herein, the term "article of manufacture" is intended to cover a computer program accessible from any computer-readable (or machine-readable) device or computer-readable (or machine-readable) storage / communication medium. For example, computer-readable storage media may include, but are not limited to: magnetic storage devices (e.g., hard disks, floppy disks, magnetic strips), optical discs (e.g., compact discs (CDs), digital versatile discs (DVDs)), smart cards, and flash memory storage devices (e.g., cards, sticks, key drives). Of course, those skilled in the art will recognize that many modifications can be made to this configuration without departing from the scope or spirit of the various embodiments.

[0047] The two-sided modeling framework can use a Generative Pre-trained Transformer (GPT) model (e.g., GPT, GPT-2, GPT-3, or similar models) to train a central language model based on a dataset that can be distributed across multiple computing devices, storage devices, etc., without exposing utterances from any of the distributed computing devices, storage devices, etc. In one embodiment, the dataset can be distributed across different computing systems, and the central model can be distributed across different computing systems, which may be located in different data centers or other locations. The computing systems across which the dataset is distributed can be the same as, or different from, the computing systems across which the central model is distributed. The data owner / holder (“D”) can initialize two GPT models, which can be classified as a user-side GPT model and an assistant-side GPT model. For a conversation (“C”) corresponding to data holder D, the user-side GPT model models the user's utterances only locally, while the assistant-side GPT model models the utterances generated by the assistant side and generates responses based on previous context received from either the user-side GPT model or the assistant-side GPT model. The central model can be trained based on a federated learning model using the assistant-side models corresponding to the data holders. After a central model has been trained, it can be used to generate responses to queries received from users. Before training, the central model can be the same model as the user-side GPT model and the assistant-side GPT model. For ease of discussion, the user-side GPT model can be referred to as the user model, and the assistant-side GPT model can be referred to as the assistant model. The central model and assistant model can be executed on the same computing device or different computing devices.
It should be understood that since the user model and assistant model are refined during training, the user model can be referred to as the updated user model, and the assistant model can be referred to as the updated assistant model. Data holder D can include a user GPT model that has been or will be trained using session data corresponding to one of several broad topics (e.g., repair, coffee, movies, cars, telephone, etc.). Data holder D can also include an assistant GPT model that has been or will be trained using session data corresponding to one of several broad topics (e.g., repair, coffee, movies, cars, telephone, etc.).

[0048] Turning now to the attached figures, Figure 1 illustrates system 2, which facilitates user 4 using user device 6 (which may be a mobile device, tablet, laptop, etc.) to provide an interface of chat application 8 to communicate, via communication network 14, with chatbot application 10, which can run on network computing system 12, which may include a server. When a service is deployed to support or respond to messages from the user device, chatbot application 10 may provide automated reply messages in response to user inquiries or other messages; the chatbot application may include a trained central model that has been trained based on information or data corresponding to sensitive information of a data holder. The trained central model may include a parameter server (or the parameter server may include the trained central model), and the trained central model may be trained according to a federated process (e.g., FedAvg). (FedAvg may include known processes for training the central model, which are based in part on messages generated at multiple models remote from the central model being trained.) In one embodiment, the central model of chatbot application 10 may be trained using contextual data transmitted from user device 6. In another embodiment, the central model may be based on archived session data, which includes previous (e.g., prior to training the central model) queries and responses that may have been generated during conversations between the user device and other chat applications (e.g., based on a trained language model or a human-operated enterprise help desk chat application). In one embodiment, user device 6 may send a query context 16 corresponding to a query message, which includes utterances given by user 4 (e.g., language information entered by the user in text or speech), and the assistant model may transmit a response context 18 to the user device. In another embodiment, the query context may be generated based on sensitive information in a message corpus of previous conversations stored or maintained by a data holder. In one embodiment, during training, query context 16 may be generated by the user-side model using a two-sided modeling technique and thus provided to the assistant-side model (instead of being transmitted by user device 6); likewise, response context 18 may be generated by the assistant-side model using the two-sided modeling technique and thus provided to the user-side model (instead of being transmitted by computing system 12).

[0049] Communication network 14 may include a public network, such as the Internet. Communication network 14 may also include a semi-private network, such as an intranet that provides communication services to an authorized group of users (e.g., employees of a company). Communication network 14 may include a network that facilitates communication between machine devices, such as a controller area network (“CAN”) connecting computing modules and devices in a vehicle (e.g., an automobile). In one embodiment, machine data generated from the vehicle may be sensitive information used to train a central model that will be used in an autonomous driving application.

[0050] Chat application 8 can provide an interface for user 4 to input messages or inquiries (e.g., text messages), which may include one or more questions that the user wants system 12 to answer. The system 12 may include or have access to the chat agent application. Chat application 8 can also provide an interface for user 4 to view messages, or to reply to questions posed by chat agent application 10. User 4 can input messages into chat application 8 through one or more types of interfaces, including a keypad, keyboard (hardware or provided virtually as part of the display interface), microphone capturing user 4's speech, dropdown menus, text boxes, confirmed message lists, etc.

[0051] Turning now to Figure 2, the figure illustrates a system 20 that trains chatbot agents using a two-sided modeling technique by training user models 22A-22n corresponding to data holders 26A-26n and corresponding assistant models 24A-24n. Data holder 26A is shown as being associated with the broad topic "movies"; data holder 26B is shown as being associated with the broad topic "coffee"; and data holder 26n is shown as being associated with the broad topic "repair". During the training process, contexts 16 and 18 can be passed back and forth between user models 22A-22n and corresponding assistant models 24A-24n. The trained assistant models 24 can be used to update the central model of parameter server 12 to generate an updated central model, such that the updated central model includes updated parameters / gradients determined based on contexts 16 and 18 generated or determined during the training process. When the chatbot application is deployed to provide real-time responses to real-time user inquiries, the updated central model can be used to generate reply messages from chatbot application 10 in response to user inquiries and messages. The group of conversational data stored and owned by a data holder 26 can be referred to as a dataset or corpus of conversational data, and may include query and response messages, which can be used to train user models and assistant models.

[0052] In the scenario of Figure 2, three data holders (movie, coffee, and repair) each "own" the conversations between a user and the corresponding assistant. Each data holder can use two models (e.g., two GPT-2 models) to model the user's and the assistant's utterances separately. In one embodiment, a FedAvg process is performed only for the assistant model's parameters / gradients. As a result, data holders can better leverage their own automated response assistants to further improve their services, and users can obtain responses through a single round of user-side GPT-2 inference without sending the original utterances.

[0053] The FedAvg process typically combines local stochastic gradient descent (“SGD”) on the client / user-side model with parameter averaging on the server. In traditional federated learning implementations, each of multiple clients downloads the entire model and then uploads an updated model after training. This process is often costly in terms of communication, but the cost can be mitigated using one or more techniques. For example, gradient compression can be used in a distributed SGD environment.

[0054] Quantized SGD aims to balance communication costs and convergence guarantees. Unbiased sparsification methods preserve the unbiasedness of sparse stochastic gradients by discarding some coordinates of the gradient and amplifying the remaining coordinates. Another technique is biased gradient compression, including SignSGD and Deep Gradient Compression (“DGC”) sparsification. SignSGD utilizes the sign of the stochastic gradient to perform 1-bit compressed communication between the server and client. A common use of gradient sparsification is to reduce the size of the gradient transmitted from the client model to the server model. Gradient sparsification can include sending only gradients whose absolute value is larger than a threshold, or sending a fixed portion of the gradient. Unlike previous sparsification methods, DGC sparsification sends gradients whose magnitude is larger than a threshold while continuing to accumulate the remaining gradients locally for a local / client model that may run on the client device.
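The compression families named above can be illustrated with a minimal Python sketch. The function names, the toy gradient, and the keep probability are assumptions made for illustration only; they are not taken from the patent, and DGC's local accumulation is not shown.

```python
import random

def sign_compress(grad):
    """SignSGD-style compression: keep only the sign of each coordinate."""
    return [(g > 0) - (g < 0) for g in grad]

def threshold_sparsify(grad, threshold):
    """Send only coordinates whose absolute value exceeds the threshold."""
    return [g if abs(g) > threshold else 0.0 for g in grad]

def random_sparsify_unbiased(grad, keep_prob, rng):
    """Drop each coordinate with probability 1 - keep_prob and amplify the
    survivors by 1 / keep_prob, so the expected value equals the input."""
    return [g / keep_prob if rng.random() < keep_prob else 0.0 for g in grad]

rng = random.Random(0)  # fixed seed so the demo is repeatable
grad = [0.5, -0.02, 0.0, -1.5, 0.1]  # hypothetical gradient

signs = sign_compress(grad)
thresholded = threshold_sparsify(grad, 0.3)

# Averaging many independently sparsified copies approaches the original
# gradient, which is what "unbiased" means here.
trials = 20000
totals = [0.0] * len(grad)
for _ in range(trials):
    for i, v in enumerate(random_sparsify_unbiased(grad, 0.5, rng)):
        totals[i] += v
mean_estimate = [t / trials for t in totals]
```

Sign compression and threshold sparsification are biased (the compressed vector does not average back to the input), whereas the random variant trades higher variance for unbiasedness.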

[0055] To simplify the communication process of training task-oriented dialogue agents as described herein, sparsification techniques that send a fixed portion of the gradient can be used; for example, the assistant model and the parameter server each transmit a fixed proportion of the gradients, selected according to magnitude.

[0056] Task-oriented conversational agents / assistants are becoming increasingly common in providing real-time assistance to users. Examples of conversational assistants include Google Home, Apple Siri, and Microsoft Cortana. Typically, a task-oriented conversational agent consists of four parts: a Natural Language Understanding (“NLU”) module / function, a Dialogue State Tracking (“DST”) module / function, a Dialogue Policy module / function, and a Natural Language Generation (“NLG”) module / function. Conversational agents often need to perform specific tasks and may rely on a domain. (The term “domain” can refer to the topic of conversation during a chat session, such as how to diagnose and correct computer malfunctions.) The NLU module or function performs preprocessing for the subsequent parts or functions. The NLU module takes human utterances as input and typically performs three tasks: domain detection, intent determination, and semantic slot labeling. The DST module performs slot filling, where the conversation state is represented by a combination of several attribute slots and values. Deep neural networks and recurrent networks can be applied to model this process. The Dialogue Policy module takes the conversation state as input and determines a policy for selecting conversational actions. Reinforcement learning algorithms are commonly used for policy optimization. The dialogue state can be encoded as a feature vector that is input to a Deep Q-Network (“DQN”). The DQN then outputs a real-valued vector, where each entry represents a possible choice of dialogue action. NLG can then translate these dialogue actions into natural language, which can be combined with template-based methods.

[0057] As described herein, end-to-end neural systems can incorporate large-scale pre-trained language models to build chatbot dialogue systems using features and properties of GPT models (e.g., GPT-2 models). These GPT models train chatbots or chat agents according to a federated process, but do not transmit raw utterances from client / user device models, or utterances contained in the corpus of training data, to models that facilitate chatbot responses to user inquiries.

[0058] A set of data holders can be defined as $D = \{D_1, D_2, \ldots, D_N\}$, where N is the total number of data holders. Each data holder $D_i$ "owns", or is associated with, a session dataset, where $n_i$ refers to the size of the dataset associated with the given $D_i$. Each session C consists of a series of utterances between its user and assistant, which are fed into the corresponding user model and assistant model, respectively. The utterances are used to train the local user model and assistant model, respectively, and the assistant model is then used to train a central model, which facilitates communication with the data holder $D_i$ (for example, the operation of online chatbots or chat agents with a defined level of accuracy for healthcare service providers, financial service providers, repair advisory providers, retailers, etc.).

[0059] A two-sided modeling framework based on the GPT-2 model can facilitate the training of chatbots without revealing utterances provided to the user-side model to the assistant-side model, or vice versa. Data holder $D_i$ can initialize two GPT-2 models, classified as user-side GPT-2 and assistant-side GPT-2. For any j-th session of $D_i$, the user-side GPT-2 only models user utterances, while the assistant-side GPT-2 is responsible for modeling assistant utterances or generating responses based on previous context. It should be understood that utterances may not be real-time utterances during training, but may be part of previously conducted and stored conversations, and may represent conversational messages between a live agent and a human / user.

[0060] Traditionally, to train models such as the GPT-2 model on multi-turn utterances, all previous utterances are concatenated, with <eos> tokens used as delimiters, to serve as the context for training on the current utterance using language modeling ("LM"). Given the current utterance $U = \{w_0, w_1, \ldots, w_{|U|-1}\}$ and the previous context c, the goal of the LM is to maximize the likelihood of each token in U:

[0061]
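A form consistent with the definitions in [0060], given here as a reconstruction under the standard autoregressive language-modeling assumption rather than as the original equation, is:

```latex
L(U) = \prod_{i=0}^{|U|-1} P\left(w_i \mid w_0, w_1, \ldots, w_{i-1}, c\right)
```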

[0062] The likelihood L(U) can be maximized by minimizing the cross-entropy loss between the generated probability distribution and the one-hot encoding of the ground-truth utterance $U_{1:|U|-1} + \{\texttt{<eos>}\}$. Besides the LM head, the GPT-2 model can consist of layers of transformer decoder blocks, in which the query (Q) at the current position can attend only to the keys (K) and values (V) at previous positions, because future keys and values are masked. This can be represented as:

[0063]
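A standard masked scaled dot-product attention expression matching this description is sketched below; the key dimension $d_k$ and the mask matrix $M$ are symbols assumed here for illustration and do not appear in the text above.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}} + M\right) V,
\qquad
M_{ij} =
\begin{cases}
0, & j \le i \\
-\infty, & j > i
\end{cases}
```

With this mask, position i places zero attention weight on any later position j > i, so only past and current keys and values contribute.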

[0064] As described herein, the FedAssistant model, or assistant-side model, can be trained by applying conventional training methods to both the user-side and assistant-side GPT models. However, using methods such as those illustrated in and described with reference to Figure 3, the GPT models use past keys and values in their transformer blocks to avoid transferring the original utterance between the user-side and assistant-side GPT models. The advantage of the two-sided modeling described herein is that previous keys and values are sufficient as context for modeling and predicting the current utterance; future keys and values are typically masked by the transformer decoder block. Therefore, instead of forwarding the current utterance (the current message being processed, but which may have previously been stored in a conversational utterance corpus) from the user-side model to the assistant-side model, or vice versa, only the keys and values based on and corresponding to that utterance (which can be referred to herein as the context associated with the utterance) are forwarded from one side's model to the other. For example, to model the third utterance, the keys and values of all tokens from the first and second utterances are used. This facilitates the assistant-side model, or FedAssistant model, training itself using past keys and values without requiring the original data of the third utterance. Furthermore, the speed of GPT model training and generative inference is improved because the model does not need to recompute the keys and values of the previous context.
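The claim that past keys and values suffice as context can be checked on a toy single-layer attention computation. The sketch below is illustrative only: the vectors are deterministic stand-ins rather than real GPT-2 projections, and all function names are assumptions.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention of one query over the visible keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    return [sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))]

def causal_attention(queries, keys, values, past_keys=(), past_values=()):
    """Position t sees the cached past plus positions 0..t of the new tokens."""
    all_keys = list(past_keys) + list(keys)
    all_values = list(past_values) + list(values)
    offset = len(past_keys)
    return [attend(q, all_keys[:offset + t + 1], all_values[:offset + t + 1])
            for t, q in enumerate(queries)]

def toy_vectors(count, phase, dim=4):
    """Deterministic stand-ins for projected hidden states (illustrative only)."""
    return [[math.sin(phase + 0.7 * t + 1.3 * d) for d in range(dim)]
            for t in range(count)]

Q, K, V = toy_vectors(5, 0.1), toy_vectors(5, 0.5), toy_vectors(5, 0.9)

# Process all five tokens at once.
full = causal_attention(Q, K, V)
# Process only the last two tokens, receiving the first three as cached context.
cached = causal_attention(Q[3:], K[3:], V[3:],
                          past_keys=K[:3], past_values=V[:3])
```

The outputs for the last two positions agree in both runs, so the side that models those tokens needs only the earlier keys and values, not the earlier tokens themselves.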

[0065] To train the two GPT-2 models (user and assistant), the user-side GPT-2 initially models its first utterance U0 (from corpus data) and sends the corresponding computed keys and values as context c to the assistant side. Then, the assistant-side GPT-2 models the response U1 based on context c and transmits the keys and values of both U0 and U1 as the updated context c (which can be referred to as the second context) to the user side. This process is repeated until all utterances have been input into the models, and each model updates its parameters through language modeling. If the assistant side starts a session, the assistant side models U0, and the training process is otherwise similar. Unlike previous work that updated GPT-2 using all utterances, FedAssistant requires the user side to train its model based only on user utterances from the training corpus, while the assistant side updates its parameters based only on assistant utterances from the training corpus. After training, the user-side GPT-2 can be distributed to other users, while its data holder holds the assistant-side model.

[0066] The two-sided modeling technique described herein offers several advantages. The technique eliminates the need to transfer raw data between the user-side and assistant-side models, thus protecting information and other sensitive data contained in sensitive sessions (i.e., data holders possessing sensitive data used to train the assistant-side model do not expose sensitive data to third parties). Furthermore, unlike other chatbot or chat agent training implementations, resource-constrained user devices may not need to generate responses locally word by word; response generation can be performed by the assistant-side model. Additionally, service providers (such as data holders) can control the generated responses.

[0067] In one aspect, in addition to the advantages mentioned above, the generative performance of the assistant-side model or FedAssistant model can be enhanced by combining a parameter server with the FedAvg algorithm executed for the assistant-side model. Before training, data holders can initialize their corresponding user-side and assistant-side models with the same parameters (e.g., by pre-training the user and assistant models using a large corpus of conversation data owned by the data holder). After training the user and assistant models using the data holder's corpus (potentially including a batch of conversation data), the data holder can send an updated version of the assistant-side model (i.e., the trained version) to the parameter server, which can average updates (e.g., parameters / gradients) from multiple assistant-side models from one or more data holders to generate updated server parameters. These updated server parameters can then be sent back to the client (e.g., the user model) for a new round or the next round of training for the user and assistant models of one or more data holders. This training can continue iteratively until a certain epoch is reached, or until certain other criteria are met.
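A minimal Python sketch of the averaging step follows, assuming each model is a dictionary mapping a parameter name to a list of floats. The holder names, the parameter values, and the optional per-holder weighting are hypothetical; the text above specifies only that updates from assistant-side models are averaged.

```python
def fed_avg(updates, weights=None):
    """Average parameter dicts (name -> list of floats) across data holders.

    `weights` is an optional per-holder weighting (an assumption; uniform
    averaging is used when it is omitted).
    """
    if weights is None:
        weights = [1.0] * len(updates)
    total = sum(weights)
    return {
        name: [
            sum(w * u[name][i] for w, u in zip(weights, updates)) / total
            for i in range(len(updates[0][name]))
        ]
        for name in updates[0]
    }

# Hypothetical local models after one round of local training.
holders = {
    "movies": {"user": {"w": [9.0, 9.0]}, "assistant": {"w": [1.0, 2.0]}},
    "coffee": {"user": {"w": [7.0, 7.0]}, "assistant": {"w": [3.0, 4.0]}},
    "repair": {"user": {"w": [8.0, 8.0]}, "assistant": {"w": [5.0, 6.0]}},
}

# Only assistant-side parameters are sent to the parameter server.
central = fed_avg([h["assistant"] for h in holders.values()])

# The averaged parameters are sent back for the next round of training.
for h in holders.values():
    h["assistant"] = {name: list(vals) for name, vals in central.items()}
```

In this sketch the user-side parameters never leave their data holder, mirroring the embodiment in which FedAvg is executed for the assistant-side model only.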

[0068] Performing FedAvg on a GPT model can result in high communication costs due to the large number of trainable parameters (e.g., hundreds of millions for a GPT-2 model) that must be transferred from the assistant model to the parameter server. If FedAvg uses all parameters, then for each round of FedAvg the parameter server may need to receive the previous weights from, and send the new averaged weights to, all data holders. Therefore, parameter server bandwidth can become a bottleneck for pure federated learning using parameters from the FedAssistant assistant model.

[0069] In one embodiment, to reduce the communication cost of transmitting parameters, the FedAssistant assistant model can use compression, such as sparsification (e.g., Top-k gradient sparsification). In one embodiment, for each data holder and for the parameter server, Top-k gradient sparsification can select the k gradients of largest magnitude, so that only k gradients are transmitted from each of the multiple assistant-side models for use in FedAvg. Top-k gradients can be selected on the client side (in this case, the FedAssistant assistant-side model) and uploaded to the parameter server, and Top-k gradients can be selected on the server side, after averaging the local gradients, for download to the client.
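Top-k selection can be sketched in a few lines of Python. The (indices, values) wire format and the helper names below are assumptions made for illustration, not details taken from the patent.

```python
def top_k_sparsify(grad, k):
    """Return (indices, values) of the k largest-magnitude coordinates."""
    k = max(0, min(k, len(grad)))
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    kept = sorted(order[:k])
    return kept, [grad[i] for i in kept]

def densify(indices, values, size):
    """Rebuild a dense gradient with zeros at the coordinates not sent."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense

grad = [0.1, -2.0, 0.05, 1.5, -0.3]  # hypothetical local gradient
idx, vals = top_k_sparsify(grad, 2)
restored = densify(idx, vals, len(grad))
```

Only k index/value pairs cross the network instead of the full gradient, which is where the communication saving comes from.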

[0070] Turning now to Figure 3, the figure illustrates a method 300 for training an assistant model and a user model. In step 301, the user model 22 receives an inquiry message 305 from a dataset corpus corresponding to messages from a user device. The user model 22 determines the keys and values corresponding to inquiry message 305 and combines them into a first context 16A. In step 302, the assistant model 24 generates a response message 310 based on the first context 16A and generates keys and values corresponding to message 310, which are combined into a second context 18B as a response context. The second context 18B may include the first context 16A and the keys and values corresponding to message 310. In step 303, the user model 22 models the next inquiry message 315 based on the second context 18B and generates keys and values corresponding to message 315, which are combined into a third context 16C as an inquiry context. The third context 16C may include the first context 16A, the second context 18B, and the keys and values corresponding to message 315. In step 304, the assistant model 24 can model the response message 320 based on the third context 16C and generate keys and values corresponding to message 320, which are combined into the fourth context 18D as a response context. The fourth context 18D may include the first context 16A, the second context 18B, the third context 16C, and the keys and values corresponding to message 320. Notably, query messages 305 and 315 (which may be in the form of questions or responses) are not transmitted from the user model to the assistant model, and response messages 310 and 320 (which may ask questions or provide responses to questions asked in query messages) are not transmitted from the assistant model to the user model. Query messages or response messages from the training corpus can be referred to as utterances.

[0071] For a given data holder, the user-side model and the assistant-side model model the user's and the assistant's utterances, respectively. The keys and values of each model's transformer decoder blocks are transmitted as context and can be used to determine the next utterance. Therefore, the raw text is not passed between the user-side model and the assistant-side model.
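The alternating exchange can be outlined with a toy Python sketch. `SideModel`, its salt values, and the arithmetic that produces (key, value) pairs are hypothetical placeholders for a GPT model's attention keys and values; the toy arithmetic offers no privacy guarantee and only demonstrates the data flow.

```python
class SideModel:
    """Toy stand-in for one side's GPT model (hypothetical, not a real GPT)."""

    def __init__(self, name, salt):
        self.name = name
        self.salt = salt
        self.local_text = []  # raw utterances stay on this side only

    def model_utterance(self, utterance, context):
        """Model a local utterance and return the context extended with one
        toy (key, value) pair per token. A real model would also condition
        on the received context; this placeholder only extends it."""
        self.local_text.append(utterance)
        new_pairs = []
        for token in utterance.split():
            code = sum(ord(ch) for ch in token) + self.salt
            new_pairs.append(((code % 97) / 97.0, (code % 89) / 89.0))
        return context + new_pairs

def run_session(session, user_model, assistant_model):
    """Alternate between the two sides, exchanging only the growing context."""
    context = []
    for speaker, utterance in session:
        model = user_model if speaker == "user" else assistant_model
        context = model.model_utterance(utterance, context)
    return context

# Hypothetical stored session from a data holder's corpus.
session = [
    ("user", "my laptop will not boot"),
    ("assistant", "is the power light on"),
    ("user", "no it is off"),
    ("assistant", "try another charger"),
]

user_model = SideModel("user", salt=11)
assistant_model = SideModel("assistant", salt=23)
final_context = run_session(session, user_model, assistant_model)
```

Each side sees only its own raw utterances, while the object passed between the sides contains numeric pairs and no text.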

[0072] Algorithm 1 shows an example of an algorithm that can be used to perform joint learning for FedAssistant via Top-k sparsification.

[0073] Input: Clients $C_1, \ldots, C_N$

[0074] Input: b, the number of epochs used for local training

[0075] Input: Number of clients N

[0076] Input: Optimization function SGD

[0077] Input: training deployment m

[0078] Input: Initial parameters w = {w[0], w[1], ..., w[N-1]}

[0079]

[0080] Algorithm 1
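Because only the inputs of Algorithm 1 are listed above, the following Python sketch shows just one plausible shape of a single communication round, assuming sparse upload, server-side averaging, sparse download, and one SGD step. It is not the patent's algorithm: the quadratic toy clients, the single local step in place of b local epochs, and every name are assumptions.

```python
def top_k(vec, k):
    """Zero all but the k largest-magnitude entries of vec."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]

def federated_round(weights, client_grad_fns, k, lr):
    """One round: sparse upload, server averaging, sparse download, SGD step."""
    # Each client computes a local gradient and uploads only its top-k entries.
    uploads = [top_k(grad_fn(weights), k) for grad_fn in client_grad_fns]
    # The server averages the sparse uploads coordinate by coordinate.
    averaged = [sum(column) / len(uploads) for column in zip(*uploads)]
    # The server sparsifies again before sending the update back down.
    download = top_k(averaged, k)
    return [w - lr * g for w, g in zip(weights, download)]

# Toy clients: each minimizes 0.5 * ||w - target||^2, whose gradient is
# w - target. The targets are hypothetical.
targets = [[1.0, 4.0, 2.0], [3.0, 4.0, 2.0]]
client_grad_fns = [
    (lambda w, t=t: [wi - ti for wi, ti in zip(w, t)]) for t in targets
]

new_weights = federated_round([0.0, 0.0, 0.0], client_grad_fns, k=2, lr=0.1)
```

In this toy round the two clients upload different coordinate subsets, and the third coordinate is dropped at the server-side selection, so it is left unchanged for this round.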

[0081] Turning now to Figure 4, the figure illustrates an example method 400 that can be used to train a central model at parameter server 12 or another computing device using two-sided modeling techniques. The data used to train the user-side models 22A-22n and the assistant-side models 24A-24n may be part of a dataset corpus 30, which may include sensitive data that the data owner wishes to keep confidential from other data owners and other entities or individuals. Multiple data holders 26A-26n are shown in the figure, but only one dataset 30 or corpus of sensitive data is shown for clarity; it is understood that a data holder may not allow another data holder access to its sensitive data corpus, and each data holder 26A-26n has its own dataset corpus with which to train the user and assistant models based on one or more of its respective datasets.

[0082] Corpus 30 is shown to have multiple sessions 32A-32n. For each session 32, corpus 30 may include multiple query messages 34 and multiple response messages 36, wherein the response messages may correspond to the corresponding query messages.

[0083] In step 405, data from one or more corpora 30 are sent to the user model and assistant model corresponding to the respective data owners 26A-26n. As mentioned above, although only one dataset 30 is shown, if the data owners 26A-26n are different (e.g., individual entities such as banks, healthcare providers, online retailers, etc.), they will typically have their own different datasets. As shown, user model 22A and assistant model 24A have been deployed to data owner 26A for training step 410. Therefore, there is no privacy issue in sharing sensitive data with both user model 22A and assistant model 24A, because in step 410 both models are executed by the owner of the sensitive data (preferably within a computing system inaccessible to anyone other than users authorized by the given data holder). In step 410, user model 22A and assistant model 24A can be trained as described elsewhere herein, such that user model 22A determines the query context corresponding to user query message 34 stored in corpus 30 and sends the query context to assistant model 24A, and assistant model 24A determines the response context corresponding to response message 36 corresponding to the query message and sends the response context to user model 22A.

[0084] Once user model 22A and assistant model 24A have processed messages 34 and 36 and their corresponding contexts, updating the initial parameters of both models according to the training in step 410, the updated parameters can be forwarded to the central model at parameter server 12 in step 415. Parameter server 12 can receive multiple updated parameters from data owners 26A-26n. In one embodiment, only assistant models 24A-24n can forward updated parameters to the central model at server 12. In another embodiment, both user models 22A-22n and the assistant model can forward their corresponding updated parameters to parameter server 12.

[0085] In step 420, parameter server 12 may execute the FedAvg algorithm on the updated parameters received from user models 22A-22n or assistant models 24A-24n. The result of executing the FedAvg algorithm or process on the updated parameters received from user models 22A-22n or assistant models 24A-24n is the creation of an updated central model. In step 425, the updated central model 38 may be forwarded to data holders 26A-26n, and models 22A-22n and 24A-24n may be retrained with data from corpus 30, but now starting with the updated parameters received in the updated central model 38 (instead of the initial parameters that were originally part of user models 22A-22n and assistant models 24A-24n). Steps 405-425 can be repeated a determined number of times to achieve the desired level of retraining / refinement of user models 22A-22n or assistant models 24A-24n. In one embodiment, the FedAvg algorithm / procedure is performed only on the parameters of one or more assistant models, and not on the parameters of user models 22A-22n. Therefore, the central model can be updated based solely on the assistant-side model parameters.

[0086] Turning now to Figure 5A, the figure illustrates one embodiment in which a user 4A using a user chat assistant application 8 running on user device 6 accesses a chatbot / chat assistant / chat agent 34 via a communication network 14. The communication network 14 can be a public network (e.g., the Internet) or a private network (e.g., a private network operated by an enterprise). (The enterprise can be a data holder possessing data used to train the user-side model and the assistant-side model as described elsewhere in this document.) The chat agent 34 can use an updated central model 38 that can be run or hosted on a server 37, which can be a parameter server as discussed elsewhere in this document, or it can be part of a computing system that does not include a parameter server. In the illustrated embodiment, an inquiry message input by user 4A into the interface of chat application 8 can be transmitted via network 14 to server 37, where chat agent application 34 determines the response message to send in response to the inquiry message. The chat agent application can use the updated central model 38 to determine the response message, so the response message can be based on updated parameters of the updated central model, which in turn can be based on trained assistant-side models trained according to the two-sided modeling techniques described herein with reference to other figures. Thus, although sensitive data may not be used to train the assistant-side models, which are then used to obtain the updated central model 38, the sensitive data input by user 4A can be provided to chat agent 34 when the user seeks a real-time answer to his or her inquiry message. In another embodiment, instead of including the sensitive information input by user 4A into chat application 8 in the inquiry message sent from user device 6 to server 37, the context based on the sensitive information input by the user can be transmitted to the server.
Therefore, since the updated central model is trained based on context corresponding to sensitive data that is part of the training corpus / dataset sessions, rather than on the actual sensitive information corresponding to the context, the central model can also generate response messages to query messages that include context or contextual information corresponding to sensitive information input by a user seeking a real-time response to his or her query, and the sensitive information is not actually forwarded from user device 6 to server 37.

[0087] Turning now to Figure 5B, the figure illustrates an embodiment in which a user chat application 8 running on user device 6 can provide responses to queries from the user, including queries containing sensitive information, through the chat application without transmitting the sensitive information from the user device. User device 6 may have received an updated central model, such as the updated central model 38 described above. User chat application 8 can access the updated central model on user device 6 and generate a query context based on the sensitive information input to the chat application, and the updated central model running on the user device can generate a response to the query based on the context rather than on the sensitive information.

[0088] Turning now to Figure 6, the figure illustrates an example method 600, which can be used to train a central model using a two-sided modeling technique or framework without sharing sensitive information input to one model of the two-sided modeling framework with the other model of the two-sided modeling framework. Method 600 begins at step 605. In step 610, the central learning model is deployed to the models of the two-sided modeling framework. The models of the two-sided modeling framework may include a user model and an assistant model. The central model and the two models of the two-sided modeling framework may include the same learning model (which may be based on, for example, the GPT-2 model), wherein the same parameters / gradients are used to initialize the two models and the central model. To train the user model and the assistant model, a corpus of conversation data can be used. The conversation data may include sensitive information and may be owned by a data owner or data holder who wishes to protect the privacy of sensitive information of users or entities that may utter sensitive information during a conversation. It should be understood that the corpus of conversation information may include previously stored conversation information or may include conversation utterances generated in real time during model training.

[0089] In step 615, user messages from the corpus are provided to the user model, and response messages in response to the user messages are provided to the assistant model. It should be understood that during training, the user model and assistant model can execute on the same computing device, on different computing devices that may be part of the same computing system, or on different computing devices that are part of different computing systems coupled through a communication network. User messages can be referred to as queries, and response messages can be referred to as response messages in response to queries. Queries and their responses may have already been generated during a session between the user and the data owner's agent. The data owner's agent interacting with the user during the session (the messages in the session may be part of the conversation corpus) may be a live person typing in a chat box, or an automated chat agent generating a response in response to a user's query. As previously stated, user messages or queries and chat agent responses may be part of a session stored in the corpus session. In step 617, the first message of the corpus session is ready for evaluation and is evaluated by the user model.

[0090] In step 620, the user model can process the first message of a given session from the message corpus (which may be referred to as the first query of the session used to train the user and assistant models) to determine a first user context or a first query context corresponding to the first user message or the first query in the corpus. The first user context may include contextual information, such as keys and values generated by the user model based on the language of the first query of the session. (Similarly, the first assistant context may include key and value information generated by an assistant model corresponding to the language of the first response message in response to the first query.) In step 622, the parameters of the user model may be updated based on the determined context. In step 625, the first user context is transferred to the assistant model. The actual language of the corpus session analyzed in step 620 is not transferred to the assistant model in step 625.

[0091] In step 630, the assistant model or chatbot model can determine a first response to the first query from a corpus of conversational messages. In one embodiment, as part of determining the first response message, the assistant model can analyze the first user context transmitted from the user model in step 625. The first response determined in step 630 is not determined by analyzing the actual language (potentially sensitive information) of the conversation used in step 620 to generate the first user context. In step 635, the assistant-side model determines a first assistant context corresponding to the first response determined in step 630 in response to the first user context. The first assistant context may include the first user context. In step 637, the parameters of the assistant model can be updated based on the determined first assistant context. In step 640, the context corresponding to the response generated in step 630 is transmitted to the user model. In step 645, it is determined whether there are any more conversational messages. The determination in step 645 can be performed by the user model, or the message sent from the assistant model in step 640 can contain an indication to the user model that no other conversational messages need to be processed for training the user model and the assistant model based on the conversation. If the determination in step 645 is yes, then in step 650 a second set of query / response messages from the session in the corpus is selected for evaluation by the user model and the assistant model, and steps 620 to 640 are repeated for the other query / response messages in the session. It should be understood that the ordinal labels advance with each iteration of steps 620-640: what the first iteration calls the first user query, the first user context, the first assistant response, and the first assistant context become, for the second set of query / response messages, the second user query, the second user context, the second assistant response, and the second assistant context; for the third set they become the third user query, the third user context, the third assistant response, and the third assistant context; and so on.
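The exchange in steps 620 to 650 can be illustrated with a minimal Python sketch. It is not the patented implementation: `SideModel`, `encode`, and `train_on_session` are hypothetical names, the "context" is a toy list of integer (key, value) pairs standing in for transformer key / value states, and the parameter update is only a counter. The toy encoding is not a privacy guarantee; the sketch only shows that context objects, not plain text, cross between the two sides.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Toy stand-in for a context: a list of (key, value) integer pairs.
# In a real transformer these would be attention key/value tensors.
Context = List[Tuple[int, int]]


def encode(text: str) -> Context:
    """Hypothetical encoder: one (key, value) pair per whitespace token."""
    return [(sum(map(ord, tok)) % 97, len(tok)) for tok in text.split()]


@dataclass
class SideModel:
    """Toy stand-in for the user-side or the assistant-side model."""
    name: str
    updates: int = 0                                  # parameter updates performed
    received: List[Context] = field(default_factory=list)

    def process(self, own_text: str, incoming: Context) -> Context:
        own_ctx = encode(own_text)   # context for this side's own message
        self.updates += 1            # placeholder for steps 622 / 637
        return incoming + own_ctx    # accumulated context, as a new list

    def receive(self, ctx: Context) -> None:
        self.received.append(ctx)    # only contexts are ever received


def train_on_session(session, user: SideModel, assistant: SideModel) -> Context:
    """Run one pass over a session of (query, response) pairs."""
    ctx: Context = []
    for query, response in session:              # steps 645 / 650: next pair
        ctx = user.process(query, ctx)           # steps 620-622
        assistant.receive(ctx)                   # step 625: context only
        ctx = assistant.process(response, ctx)   # steps 630-637
        user.receive(ctx)                        # step 640
    return ctx
```

Each side reads only its own text: the user model sees the query, the assistant model sees the stored response, and everything handed across is a `Context`. Returning `incoming + own_ctx` mirrors the statement that the first assistant context may include the first user context.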

[0092] If it is determined to be no in step 645 (e.g., there are no more messages to process in the session), then method 600 proceeds to step 655. In step 655, it is determined whether to perform further iterations of steps 620 to 645. For example, the training operation can be configured to perform a determined number of iterations of steps 620 to 645 to improve the accuracy of the parameters determined by the user model and the assistant model during training. If it is determined to be yes in step 655 (to perform more iterations), then in step 660, the training operation sets the message of the session as the first message to be evaluated, and method 600 returns to step 620 and repeats steps 620 to 645 / 650. It should be understood that the session to be evaluated during the re-execution of steps 620 to 645 / 650 can be the same session evaluated in the previous iterations, or it can be a different session than the session evaluated in the previous iterations.

[0093] If it is determined to be no at step 655 (e.g., no further iterations of steps 620 to 645 / 650 are performed), then method 600 proceeds to step 665. At step 665, the central model that has not been updated during the execution of steps 620-655 can be updated using parameters from one or more models that may have already executed some or all of steps 617 to 655. In one embodiment, the central model is updated only based on parameters from one or more assistant-side models. In one embodiment, updating the central model at step 665 may include: performing an algorithm based on one or more values, keys, or other information to generate an updated central model for one or more parameters / gradients of one or more assistant models that have already executed one or more previous steps of method 600. In one embodiment, updating the central model at step 665 may include: performing an algorithm based on one or more values, keys, or other information to generate an updated central model for one or more parameters / gradients of one or more user models that have already executed one or more previous steps of method 600. The algorithm executed at step 665 may include the FedAvg algorithm. In step 670, the updated central model can be deployed to the user's device or a computer system device accessible via a communication network to facilitate the user's access to assistance from a chatbot or chat agent (this can be facilitated by the updated central model running on a server physically or logically remote from the user), without requiring the user to transmit sensitive information to the remotely located updated central model. Method 600 ends in step 675.
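As an illustration of the aggregation in step 665, the following is a minimal sketch of FedAvg-style weighted parameter averaging. The parameter layout (a dictionary of flat float lists), the function name `fed_avg`, and the weighting by local example count are assumptions made for this sketch; the text only states that the algorithm may include FedAvg.

```python
from typing import Dict, List, Sequence

# Assumed layout: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def fed_avg(client_params: Sequence[Params], num_examples: Sequence[int]) -> Params:
    """Average client parameters, weighting each client by its example count."""
    if not client_params or len(client_params) != len(num_examples):
        raise ValueError("need one example count per client model")
    total = sum(num_examples)
    if total <= 0:
        raise ValueError("total example count must be positive")

    averaged: Params = {}
    for name in client_params[0]:
        size = len(client_params[0][name])
        averaged[name] = [
            # Weighted mean of the i-th entry across all clients.
            sum(p[name][i] * n for p, n in zip(client_params, num_examples)) / total
            for i in range(size)
        ]
    return averaged
```

For example, averaging `{"w": [1.0, 2.0]}` with weight 1 and `{"w": [3.0, 6.0]}` with weight 3 gives `{"w": [2.5, 5.0]}`. In the embodiment where only assistant-side models contribute, `client_params` would hold only the assistant-side parameters.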

[0094] To provide additional context for the various embodiments described herein, Figure 7 and the following discussion are intended to provide a brief overview of a suitable computing environment 700 in which the various embodiments described herein may be implemented. Although the embodiments have been described above in the general context of computer-executable instructions that can run on one or more computers, those skilled in the art will recognize that the embodiments may also be implemented in combination with other program modules, and / or as a combination of hardware and software.

[0095] Typically, program modules include routines, programs, components, data structures, etc., that perform specific tasks or implement specific abstract data types. Furthermore, those skilled in the art will understand that these methods can be practiced with other computer system configurations, including single-processor or multi-processor computer systems, minicomputers, mainframes, IoT devices, distributed computing systems, and personal computers, handheld computing devices, microprocessor-based or programmable consumer electronics, each operatively coupled to one or more associated devices.

[0096] The embodiments shown herein can also be implemented in a distributed computing environment, where some tasks are performed by remote processing devices linked via a communication network. In a distributed computing environment, program modules can reside in both local and remote memory storage devices.

[0097] Computing devices typically include a variety of media, which may include computer-readable storage media, machine-readable storage media, and / or communication media. These two terms are used interchangeably herein as follows. A computer-readable storage medium or a machine-readable storage medium can be any available storage medium that is accessible to a computer, and includes volatile and non-volatile media, removable and non-removable media. By way of example and not limitation, a computer-readable storage medium or a machine-readable storage medium can be implemented in conjunction with any method or technology used for storing information, such as computer-readable or machine-readable instructions, program modules, structured or unstructured data, etc.

[0098] Computer-readable storage media may include, but are not limited to: random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other storage technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD), Blu-ray disc (BD), or other optical disc storage devices, magnetic tape, magnetic stripe, magnetic disk storage devices, or other magnetic storage devices, solid-state drives or other solid-state storage devices, or other tangible and / or non-transitory media that can be used to store desired information. In this regard, the terms "tangible" or "non-transitory" used herein to refer to storage devices, memories, or computer-readable media shall be understood to exclude only propagating transitory signals per se, and do not relinquish rights to any standard storage devices, memories, or computer-readable media that are not merely propagating transitory signals per se.

[0099] One or more local or remote computing devices may (e.g., via access requests, queries or other data retrieval protocols) access a computer-readable storage medium to perform various operations related to the information stored on the medium.

[0100] Communication media typically embody computer-readable instructions, data structures, program modules, or other structured or unstructured data in data signals (such as modulated data signals, carrier waves, or other transmission mechanisms), and include any information transmission or delivery medium. The term "modulated data signal" or signal refers to a signal having one or more characteristics set or altered in a manner that encodes information in one or more signals. By way of example and not limitation, communication media include wired media, such as wired networks or direct wired connections, and wireless media, such as acoustic, RF, infrared, and other wireless media.

[0101] Referring again to Figure 7, an example environment 700 for implementing various embodiments of the aspects described herein includes a computer 702, which includes a processing unit 704, system memory 706, and a system bus 708. The system bus 708 couples system components, including but not limited to system memory 706, to the processing unit 704. The processing unit 704 can be any type of commercial processor and may include cache memory. Dual-microprocessor and other multiprocessor architectures can also be used as the processing unit 704.

[0102] System bus 708 can be any type of bus architecture and can interconnect to memory buses (with or without a memory controller), peripheral buses, and any type of local bus using a variety of commercially available bus architectures. System memory 706 includes ROM 710 and RAM 712. The Basic Input / Output System (BIOS) can be stored in non-volatile memory (e.g., ROM, Erasable Programmable Read-Only Memory (EPROM), EEPROM), where the BIOS contains basic routines that help transfer information between components within computer 702, for example during startup. RAM 712 can also include high-speed RAM, such as static RAM for caching data.

[0103] Computer 702 also includes an internal hard disk drive (HDD) 714 (e.g., EIDE, SATA), one or more external storage devices 716 (e.g., floppy disk drive (FDD) 716, memory stick or flash drive reader, memory card reader, etc.), and an optical disc drive 720 (e.g., capable of reading or writing CD-ROM discs, DVDs, BDs, etc.). While the internal HDD 714 is shown as residing within computer 702, it may also be configured for external use within a suitable chassis (not shown). Furthermore, although not shown in environment 700, a solid-state drive (SSD) may be used to supplement or replace the HDD 714. The HDD 714, external storage devices 716, and optical disc drive 720 may be connected to system bus 708 via HDD interface 724, external storage interface 726, and optical disc drive interface 728, respectively. Interface 724 for the external drive implementation may include at least one or both of Universal Serial Bus (USB) and Institute of Electrical and Electronics Engineers (IEEE) 1394 interface technologies. Other external drive connection technologies are within the scope of the embodiments described herein.

[0104] Drives and their associated computer-readable storage media provide non-volatile storage of data, data structures, computer-executable instructions, etc. For computer 702, drives and storage media accommodate any data stored in a suitable digital format. Although the above description of computer-readable storage media refers to various types of storage devices, those skilled in the art should understand that other types of computer-readable storage media, whether currently existing or developed in the future, can be used in the example operating environment, and further, any such storage medium can contain computer-executable instructions for performing the methods described herein.

[0105] Some program modules may be stored in the drive and RAM 712, including the operating system 730, one or more application programs 732, other program modules 734, and program data 736. All or part of the operating system, application programs, modules, and / or data may also be cached in RAM 712. The systems and methods described herein can be implemented using a variety of commercially available operating systems or combinations of operating systems.

[0106] Computer 702 may optionally include emulation technology. For example, a hypervisor (not shown) or other intermediary may emulate the hardware environment of operating system 730, and the emulated hardware may optionally be different from the hardware illustrated in Figure 7. In such an embodiment, the operating system 730 may include one of a plurality of virtual machines (VMs) hosted on the computer 702. Furthermore, the operating system 730 may provide a runtime environment for the application 732, such as the Java Runtime Environment or the .NET Framework. A runtime environment is a persistent execution environment that allows the application 732 to run on any operating system that includes a runtime environment. Similarly, the operating system 730 may support containers, and the application 732 may be in the form of a container, which is a lightweight, standalone, executable software package including, for example, code, runtime, system tools, system libraries, and application settings.

[0107] Furthermore, computer 702 may include a security module, such as a Trusted Processing Module (TPM). For example, using a TPM, each boot component hashes the next boot component in the sequence and waits for the result to match a secured value before loading that next boot component. This process can occur at any layer of the code execution stack of computer 702 (e.g., applied at the application execution level or the operating system (OS) kernel level), thereby achieving security at any code execution level.
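The hash-then-load chain described above can be sketched as follows. This is only an illustration under stated assumptions: SHA-256 is chosen arbitrarily as the hash, `boot_chain` and the dictionary of trusted digests are hypothetical names, and a real TPM-backed boot uses hardware-protected measurements rather than an in-memory dictionary.

```python
import hashlib
from typing import Dict, List, Tuple


def digest(image: bytes) -> str:
    """Hex SHA-256 digest of a component image (hash choice is an assumption)."""
    return hashlib.sha256(image).hexdigest()


def boot_chain(components: List[Tuple[str, bytes]], trusted: Dict[str, str]) -> List[str]:
    """Load components in order, stopping at the first digest mismatch.

    `components` is an ordered list of (name, image) pairs; `trusted` maps
    each name to its expected digest (the "secured value").
    """
    loaded: List[str] = []
    for name, image in components:
        if digest(image) != trusted.get(name):
            break                # mismatch: this and all later components stay unloaded
        loaded.append(name)      # match: the component may be loaded
    return loaded
```

With a tampered second stage, only the stages before it are returned, which reflects the point that each component is verified before control passes to it.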

[0108] Users can input commands and information into computer 702 using one or more wired / wireless input devices, such as keyboard 738, touchscreen 740, and pointing devices (e.g., mouse 742). Other input devices (not shown) may include microphones, infrared (IR) remote controls, radio frequency (RF) remote controls or other remote controls, joysticks, virtual reality controllers and / or virtual reality headsets, game controllers, styluses, image input devices (e.g., cameras), gesture sensor input devices, visual motion sensor input devices, emotion or face detection devices, biometric input devices (e.g., fingerprint or iris scanners), etc. These and other input devices are typically connected to processing unit 704 via input device interface 744, which may be coupled to system bus 708, but may also be connected via other interfaces, such as parallel interfaces, IEEE 1394 serial interfaces, gaming interfaces, USB interfaces, IR interfaces, etc.

[0109] The monitor 746 or other types of display devices can also be connected to the system bus 708 via an interface (such as a video adapter 748). In addition to the monitor 746, the computer typically includes other peripheral output devices (not shown), such as speakers, printers, etc.

[0110] Computer 702 can operate in a networked environment using logical connections to one or more remote computers (e.g., remote computer 750) via wired and / or wireless communications. Remote computer 750 can be a workstation, server computer, router, personal computer, laptop computer, microprocessor-based entertainment device, peer-to-peer device, or other public network node, and although only memory / storage device 752 is shown for simplicity, remote computers typically include many or all of the elements described relative to computer 702. The depicted logical connections include wired / wireless connections to a local area network (LAN) 754 and / or a larger network (e.g., a wide area network (WAN) 756). Such LAN and WAN networking environments are common in offices and companies and facilitate enterprise-wide computer networks (e.g., intranets), all of which can connect to global communications networks (e.g., the Internet).

[0111] When used in a LAN network environment, computer 702 can connect to local network 754 via a wired and / or wireless communication network interface or adapter 758. Adapter 758 can facilitate wired or wireless communication to LAN 754, and LAN 754 may also include a wireless access point (AP) set up on the LAN for communicating with adapter 758 in wireless mode.

[0112] When used in a WAN network environment, computer 702 may include modem 760 or be connected to a communication server on WAN 756 via other means (e.g., via the Internet) for establishing communication on WAN 756. Modem 760, which may be an internal or external device, and a wired or wireless device, may be connected to system bus 708 via input device interface 744. In a network environment, program modules depicted relative to computer 702 or parts thereof may be stored in remote memory / storage device 752. It should be understood that the network connection shown is an example, and other means of establishing communication links between computers may be used.

[0113] When used in a LAN or WAN networking environment, computer 702 can access cloud storage systems or other network-based storage systems to supplement or replace external storage device 716 as described above. Typically, the connection between computer 702 and the cloud storage system can be established, for example, via adapter 758 or modem 760 through LAN 754 or WAN 756. When computer 702 is connected to an associated cloud storage system, external storage interface 726 can manage the storage devices provided by the cloud storage system with the help of adapter 758 and / or modem 760, just as it would manage other types of external storage devices. For example, external storage interface 726 can be configured to provide access to cloud storage sources as if these sources were physically connected to computer 702.

[0114] Computer 702 can be operated to communicate with any wireless device or entity operably configured for wireless communication (e.g., printers, scanners, desktop and / or portable computers, portable data assistants, communication satellites, any device or location associated with wirelessly detectable tags (e.g., kiosks, newsstands, shop shelves, etc.) and telephones). This can include Wi-Fi and... Wireless technology. Therefore, communication can be a predefined structure like traditional networks, or simply self-organizing communication between at least two devices.

[0115] Turning now to Figure 8, the figure illustrates an exemplary embodiment of method 800, including: at block 805, receiving first context information representing a first context corresponding to sensitive information from a first initial learning model executed on a computing device including a processor; at block 810, inputting the first context information into a second initial learning model executed on the computing device; and at block 815, using the second initial learning model to determine response information in response to the sensitive information based on the first context information. Method 800 may further include: at block 820, determining updated context information based on the first context information and second context information representing a second context corresponding to the response information using the second initial learning model; and at block 825, transferring the updated context information from the second initial learning model to the first initial learning model. Method 800 may further include: at block 830, training a central learning model by a second computing device according to a joint learning model to generate an updated central model based on different response information generated in response to different contexts received from different assistant learning models, these different contexts corresponding to different conversational dialogues between different user learning models and different assistant learning models. Method 800 may further include: at block 835, determining, using a central computing device comprising a central computing system including a processor, a response to an inquiry received from a user device configured to present a dialogue agent application interface; and at block 840, sending, using the central computing system, the response to the dialogue agent application interface of the user device; wherein the central computing device uses the updated central model to determine the response to the inquiry.

[0116] Turning now to Figure 9, the figure illustrates a computing system 900 including a computing device. The processor of this computing device is configured to: in block 905, receive first context information representing a first context corresponding to sensitive information from a first initial learning model executed on the computing device; in block 910, input the first context information into a second initial learning model executed on the computing device; and in block 915, determine response information in response to the sensitive information based on the first context information using the second initial learning model. The processor of the computing device of system 900 can also be configured to: in block 920, determine updated context information based on the first context information and second context information representing a second context corresponding to the response information using the second initial learning model; and in block 925, transfer the updated context information from the second initial learning model to the first initial learning model. The processor of the computing device of system 900 can also be configured to: in block 930, train the second initial learning model based on the response information to generate an updated second learning model; and in block 935, combine the parameters corresponding to the updated second learning model with parameters from other models to generate an updated central learning model.

[0117] Turning now to Figure 10, the figure illustrates, at box 1005, a non-transitory machine-readable medium including executable instructions. When executed by a processor of a computing device that has access to a first initial learning model and a second initial learning model, the executable instructions facilitate operations including: inputting first sensitive information into the first initial learning model; at box 1010, determining first context information corresponding to the first sensitive information using the first initial learning model; at box 1015, transferring the first context information to the second initial learning model; at box 1020, determining, using the second initial learning model and based on the first context information, response language information in response to the first context information, and determining first updated context information based on second context information corresponding to the response language information; at box 1025, transmitting the first updated context information to the first initial learning model; at box 1030, inputting second sensitive information, which is in response to the first updated context information, into the first initial learning model; at box 1035, determining, using the first initial learning model, third context information corresponding to the second sensitive information and the first updated context information; at box 1040, determining second updated context information based on the first updated context information and the third context information; and at box 1045, transmitting the second updated context information to the second initial learning model. At box 1050, the executable instructions are configured to provide the updated second learning model to a parameter server for training a central learning model to generate an updated central model.

[0118] The above description includes non-limiting examples of various embodiments. It is certainly impossible to describe every possible combination of components or methods in order to describe the disclosed subject matter, and those skilled in the art will recognize that further combinations and arrangements of various embodiments are possible. The disclosed subject matter is intended to cover all such changes, modifications, and variations that fall within the spirit and scope of the appended claims.

[0119] Regarding the various functions performed by the aforementioned components, devices, circuits, systems, etc., unless otherwise stated, the terminology used to describe such components (including references to "apparatus") is intended to also include any structure (e.g., functional equivalent) that performs the specified function of the described component, even if it is not structurally equivalent to the disclosed structure. Furthermore, while specific features of the disclosed subject matter may have been disclosed only with respect to one of several embodiments, such features may be combined with one or more other features of any given or particular application that may require and be advantageous in other implementations.

[0120] The terms "exemplary" and / or "illustrative," or variations thereof, as may be used herein, are intended to mean as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited to such examples. Furthermore, any aspect or design described herein as "exemplary" and / or "illustrative" is not necessarily to be construed as preferred or advantageous to other aspects or designs, nor does it exclude equivalent structures and techniques known to those skilled in the art. Moreover, within the scope of the terms "comprising," "having," "including," and other similar words used in the detailed description or claims, such terms are intended to be inclusive—in a manner similar to the term "comprising" as an open transition word—and do not exclude any additional or other elements.

[0121] The term "or" as used herein is intended to mean inclusive "or" rather than exclusive "or". For example, the phrase "A or B" is intended to include instances of A, B, and both A and B. Furthermore, unless otherwise stated or clearly indicated from the context to be in the singular form, the articles "a" and "an" as used in this application and the appended claims should generally be interpreted as "one or more".

[0122] The term "set" as used herein does not include an empty set, i.e., a set containing no elements. Therefore, "set" in this subject disclosure includes one or more elements or entities. Similarly, the term "group" as used herein refers to a collection of one or more entities.

[0123] Unless the context clearly indicates otherwise, the terms "first," "second," "third," etc., used in the claims are for clarity only and do not otherwise indicate or imply any order of time. For example, "first determination," "second determination," and "third determination" do not mean or imply that the first determination will be made before the second determination, and vice versa.

[0124] The description of the embodiments shown in the subject matter disclosed herein, including those described in the abstract, is not intended to be exhaustive or to limit the disclosed embodiments to the precise form disclosed. While specific embodiments and examples have been described herein for illustrative purposes, various modifications are possible within the scope of these embodiments and examples, as will be appreciated by those skilled in the art. In this regard, although the subject matter has been described herein in conjunction with various embodiments and corresponding drawings, it should be understood, where applicable, that other similar embodiments may be used, or modifications and additions may be made to the described embodiments, to perform the same, similar, alternative, or substitute functions of the disclosed subject matter without deviating therefrom. Therefore, the disclosed subject matter should not be limited to any single embodiment described herein, but should be interpreted within the breadth and scope of the following appended claims.

Claims

1. A method comprising: receiving first context information from a first initial learning model executed on a first computing device including a processor, the first context information representing a first context corresponding to sensitive information; inputting the first context information into a second initial learning model executed on the first computing device; and using the second initial learning model to determine response information in response to the sensitive information based on the first context information, wherein the second initial learning model does not receive the sensitive information; the method further comprising: training, by a second computing device, a central learning model based on a joint learning model to obtain an updated central model based on different response information generated in response to different contexts received from different assistant learning models, the different contexts corresponding to different conversational dialogues between different user learning models and the different assistant learning models.

2. The method according to claim 1, further comprising: determining updated context information based on the first context information and second context information by using the second initial learning model, wherein the second context information represents a second context corresponding to the response information; and transferring the updated context information from the second initial learning model to the first initial learning model.

3. The method according to claim 1, further comprising: training, by the first computing device, the second initial learning model based on the response information to obtain an updated second learning model.

4. The method of claim 3, wherein the sensitive information input to the first initial learning model is not used to train the second initial learning model to obtain the updated second learning model.

5. The method of claim 1, wherein the first initial learning model and the second initial learning model comprise a pre-trained language model.

6. The method of claim 5, wherein the pre-trained language model includes a generative pre-trained transformer model.

7. The method according to claim 1, further comprising: determining, by a central computing device utilizing a central computing system including a processor, a response to a query received from a user device configured to present a conversational agent application interface of the user device; and sending, using the central computing system, the response to the conversational agent application interface of the user device, wherein the central computing device uses the updated central model to determine the response to the query.

8. The method according to claim 1, further comprising: determining a response to a query input to a conversational agent of a user device, the user device including a processor and being configured to present a conversational agent application interface of the user device, wherein the user device uses the updated central model to determine the response to the query.

9. A computing system, comprising a processor, the processor being configured to: receive first context information from a first initial learning model executed on a first computing device, the first context information representing a first context corresponding to sensitive information; input the first context information into a second initial learning model executed on the first computing device; and use the second initial learning model to determine response information in response to the sensitive information based on the first context information, wherein the second initial learning model does not receive the sensitive information; wherein the processor is further configured to: train a central learning model by a second computing device based on a joint learning model to obtain an updated central model based on different response information generated in response to different contexts received from different assistant learning models, the different contexts corresponding to different conversational dialogues between different user learning models and the different assistant learning models.

10. The computing system of claim 9, wherein the processor is further configured to: determine updated context information based on the first context information and second context information using the second initial learning model, wherein the second context information represents a second context corresponding to the response information; and transfer the updated context information from the second initial learning model to the first initial learning model.

11. The computing system of claim 9, wherein the processor is further configured to: train the second initial learning model based on the response information to obtain an updated second learning model.

12. The computing system of claim 11, wherein the sensitive information input to the first initial learning model is not used to train the second initial learning model to obtain the updated second learning model.

13. The computing system of claim 9, wherein the first initial learning model and the second initial learning model comprise a pre-trained central language model.

14. The computing system of claim 11, wherein the parameters corresponding to the updated second learning model are combined with parameters from other models to obtain an updated central learning model.
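As an illustration of how the parameters of an updated second learning model might be combined with parameters from other models, the following is a minimal sketch. The claim does not specify the combination rule; weighted averaging in the style of federated averaging is an assumption here, and the names `Params` and `combine_parameters` are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def combine_parameters(models: Sequence[Params], weights: Sequence[float]) -> Params:
    """Weighted average of per-model parameters (assumed combination rule)."""
    if not models:
        raise ValueError("at least one model is required")
    if len(models) != len(weights):
        raise ValueError("one weight per model is required")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    combined: Params = {}
    for name, reference in models[0].items():
        averaged = [0.0] * len(reference)
        for params, weight in zip(models, weights):
            values = params[name]
            if len(values) != len(reference):
                raise ValueError(f"shape mismatch for parameter {name!r}")
            for i, value in enumerate(values):
                # Each model contributes in proportion to its weight.
                averaged[i] += (weight / total) * value
        combined[name] = averaged
    return combined
```

In this sketch the weights could, for example, reflect the number of dialogues each assistant-side model was trained on; that choice is also an assumption rather than something stated in the claims.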

15. A non-transitory machine-readable medium comprising executable instructions which, when executed by a processor of a computing device having access to a first initial learning model and a second initial learning model, cause the processor to perform operations including: inputting first sensitive information into the first initial learning model; determining, using the first initial learning model, first context information corresponding to the first sensitive information; transmitting the first context information to the second initial learning model; determining, using the second initial learning model, response language information in response to the first context information based on the first context information, and determining first updated context information based on second context information corresponding to the response language information, wherein the second initial learning model does not receive the first sensitive information; transmitting the first updated context information to the first initial learning model; inputting second sensitive information, which is in response to the first updated context information, into the first initial learning model; determining, using the first initial learning model, third context information corresponding to the second sensitive information and the first updated context information; determining second updated context information based on the first updated context information and the third context information; and transmitting the second updated context information to the second initial learning model, wherein the second initial learning model does not receive the second sensitive information.

16. The non-transitory machine-readable medium of claim 15, wherein the operations further include: training the second initial learning model based on the first context information, the second context information, and the third context information to obtain an updated second learning model.

17. The non-transitory machine-readable medium of claim 15, wherein the executable instructions are configured to: train the second initial learning model without using the first sensitive information or the second sensitive information.

18. The non-transitory machine-readable medium of claim 15, wherein the executable instructions are configured to: provide the updated second learning model to a parameter server for training a central learning model to obtain an updated central model.