Interaction method and apparatus, device, and storage medium
Topic information is generated from the dialogue and a task configuration is determined from it, with machine learning models used to improve the recommendation accuracy of digital assistants in multi-turn dialogues. This addresses the inaccurate understanding and recommendations of traditional digital assistants.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- BEIJING ZITIAO NETWORK TECH CO LTD
- Filing Date
- 2024-12-16
- Publication Date
- 2026-06-25
AI Technical Summary
Traditional digital assistants still need to improve their understanding of multi-turn conversations and the accuracy of their content recommendations.
Topic information is generated to describe the semantics of the interaction content, a task configuration is determined from it, and target recommended content is presented based on that configuration. Machine learning models such as neural networks are used for understanding and recommendation.
It improves the accuracy and quality of recommended content in multi-turn dialogue scenarios.
Description
Interaction methods, devices, equipment and storage media
Technical Field
[0001] The exemplary embodiments disclosed herein generally relate to the field of computers, and in particular to interaction methods, apparatuses, devices, computer-readable storage media, and computer program products.
Background
[0002] With the development of information technology, various terminal devices can provide people with a variety of services in work and life. Applications providing these services can be deployed on these terminal devices. The terminal devices present relevant content and interact with users through the application's user interface to meet various user needs. In some cases, terminal devices can utilize tools such as digital assistants to converse with users and recommend content.
Summary of the Invention
[0003] In a first aspect of this disclosure, an interaction method is provided. The method includes: in response to receiving a user query, generating topic information based on interaction content from at least one interaction round regarding the user query, the topic information describing the semantics of the interaction content from at least one interaction round; determining a task configuration corresponding to the query requirement indicated by the user query based on the topic information, the task configuration including at least one task parameter; and presenting targeted recommended content for the user query based on the task configuration.
[0004] In a second aspect of this disclosure, an apparatus for interaction is provided. The apparatus includes: a generation module configured to, in response to receiving a user query, generate topic information based on interaction content from at least one interaction round regarding the user query, the topic information describing the semantics of the interaction content from at least one interaction round; a determination module configured to, based on the topic information, determine a task configuration corresponding to a query requirement indicated by the user query, the task configuration including at least one task parameter; and a presentation module configured to, based on the task configuration, present targeted recommended content for the user query.
[0005] In a third aspect of this disclosure, an electronic device is provided. The device includes at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit. When executed by the at least one processing unit, the instructions cause the device to perform the method of the first aspect.
[0006] In a fourth aspect of this disclosure, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program that can be executed by a processor to implement the method of the first aspect.
[0007] In a fifth aspect of this disclosure, a computer program product is provided, comprising a computer program that, when executed by a processor, implements the method according to a first aspect of this disclosure.
[0008] It should be understood that the content described in this Summary section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.
Brief Description of the Drawings
[0009] The above and other features, advantages, and aspects of the embodiments of this disclosure will become more apparent from the accompanying drawings and the following detailed description. In the drawings, the same or similar reference numerals denote the same or similar elements, wherein:
[0010] Figure 1 shows a schematic diagram of an example environment in which embodiments of the present disclosure may be implemented;
[0011] Figure 2 shows a flowchart of a process for interaction according to some embodiments of the present disclosure;
[0012] Figure 3 illustrates a schematic diagram of an example architecture for interaction according to some embodiments of the present disclosure;
[0013] Figure 4 illustrates a schematic diagram of an example architecture for interaction according to some embodiments of the present disclosure;
[0014] Figure 5 shows a flowchart of an example process for interaction according to some embodiments of the present disclosure;
[0015] Figure 6 shows a flowchart of an example process for interaction according to some embodiments of the present disclosure;
[0016] Figure 7 illustrates a schematic diagram of an example architecture for interaction according to some embodiments of the present disclosure;
[0017] Figure 8 shows a schematic structural block diagram of an example device for interaction according to some embodiments of the present disclosure; and
[0018] Figure 9 shows a block diagram of an electronic device capable of implementing several embodiments of the present disclosure.
Detailed Description
[0019] Embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings. While some embodiments of this disclosure are shown in the drawings, it should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of this disclosure. It should be understood that the accompanying drawings and embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of protection of this disclosure.
[0020] In the description of embodiments of this disclosure, the term "comprising" and similar terms should be understood as open-ended inclusion, i.e., "including but not limited to". The term "based on" should be understood as "at least partially based on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The term "some embodiments" should be understood as "at least some embodiments". Other explicit and implicit definitions may also be included below.
[0021] In this document, unless explicitly stated otherwise, performing a step in response to A does not mean that the step is performed immediately after A, but may include one or more intermediate steps.
[0022] It is understood that the data involved in this technical solution (including but not limited to the data itself, the acquisition or use of the data) shall comply with the requirements of relevant laws, regulations and related provisions.
[0023] It is understood that before using the technical solutions disclosed in the various embodiments of this disclosure, users should be informed of the types, scope of use, and usage scenarios of the personal information involved in this disclosure through appropriate means in accordance with relevant laws and regulations, and user authorization should be obtained.
[0024] For example, in response to receiving a user's active request, a prompt message is sent to the user to clearly inform the user that the requested operation will require the acquisition and use of the user's personal information, thereby enabling the user to choose whether to provide personal information to the software or hardware such as electronic devices, applications, servers or storage media that perform the operation of the technical solution disclosed herein, based on the prompt message.
[0025] As an optional but non-restrictive implementation, in response to a user's active request, a prompt message can be sent to the user, such as a pop-up window, where the prompt message can be presented in text format. Furthermore, the pop-up window can also include a selection control allowing the user to choose "agree" or "disagree" to provide personal information to the electronic device.
[0026] It is understood that the above notification and user authorization process are merely illustrative and do not constitute a limitation on the implementation of this disclosure. Other methods that comply with relevant laws and regulations may also be applied to the implementation of this disclosure.
[0027] As used herein, the term "model" refers to a model that learns the relationship between inputs and outputs from training data, enabling it to generate corresponding outputs for a given input after training. Model generation can be based on machine learning techniques. Deep learning is a machine learning algorithm that processes inputs and provides corresponding outputs using multiple layers of processing units. A neural network model is an example of a deep learning-based model. Herein, "model" may also be referred to as a "machine learning model," "learning model," "machine learning network," or "learning network," and these terms are used interchangeably.
[0028] A neural network is a machine learning network based on deep learning. A neural network processes input and provides a corresponding output, typically consisting of an input layer, an output layer, and one or more hidden layers between the input and output layers. Neural networks used in deep learning applications often include many hidden layers, thus increasing the network's depth. The layers of a neural network are connected sequentially, so that the output of the previous layer is provided as the input to the next layer. The input layer receives the input to the neural network, while the output layer's output serves as the final output. Each layer of a neural network includes one or more nodes (also called processing nodes or neurons), each node processing the input from the layer above.
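The layer-by-layer computation described above can be illustrated with a minimal sketch in plain Python. Representing a layer as a `(weights, biases)` pair, using ReLU on hidden layers, and the names `forward` and `relu` are illustrative assumptions, not part of the disclosure.

```python
from typing import List, Tuple

# Assumed representation: weights[j][i] connects input i of the layer to node j.
Layer = Tuple[List[List[float]], List[float]]


def relu(x: float) -> float:
    """Rectified linear activation used here for hidden nodes."""
    return max(0.0, x)


def forward(layers: List[Layer], inputs: List[float]) -> List[float]:
    """Feed the input through the layers in order; each output feeds the next layer."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        is_output_layer = index == len(layers) - 1
        next_activations = []
        for node_weights, bias in zip(weights, biases):
            # Each node combines the outputs of the previous layer.
            z = sum(w * a for w, a in zip(node_weights, activations)) + bias
            # Hidden nodes apply a nonlinearity; the output layer stays linear.
            next_activations.append(z if is_output_layer else relu(z))
        activations = next_activations
    return activations


# One hidden layer with two nodes, followed by a single output node.
network = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
print(forward(network, [1.0, -2.0]))  # [1.0]
```

The input layer is implicit here: the input list is simply the first set of activations, and the last layer's activations are the network's final output.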
[0029] Machine learning typically comprises three phases: training, testing, and application (also known as inference). In the training phase, a given model is trained using a large amount of training data, iteratively updating its parameter values until the model can consistently generate inferences that meet the expected goals from the training data. Through training, the model can be considered to have learned the relationship between inputs and outputs (also known as the input-output mapping) from the training data. The parameter values of the trained model are determined. In the testing phase, test inputs are applied to the trained model to test whether it can provide the correct output, thus determining the model's performance. In the application phase, the model can be used to process actual inputs based on the trained parameter values to determine the corresponding output.
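The three phases can be sketched with a deliberately tiny model. As a simplifying assumption, a one-variable linear model fitted by gradient descent on mean squared error stands in for a neural network; the function names, learning rate, and data below are illustrative only.

```python
from typing import List, Tuple

Params = Tuple[float, float]
Dataset = List[Tuple[float, float]]


def train(data: Dataset, lr: float = 0.05, epochs: int = 2000) -> Params:
    """Training phase: iteratively update parameters to reduce mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def evaluate(params: Params, data: Dataset, tol: float = 1e-3) -> bool:
    """Testing phase: check whether the trained model reproduces held-out outputs."""
    w, b = params
    return all(abs(w * x + b - y) <= tol for x, y in data)


def infer(params: Params, x: float) -> float:
    """Application (inference) phase: apply the fixed parameters to a new input."""
    w, b = params
    return w * x + b


train_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # samples of y = 2x + 1
test_data = [(4.0, 9.0), (5.0, 11.0)]
params = train(train_data)
print(evaluate(params, test_data), round(infer(params, 10.0), 3))  # True 21.0
```

After `train` returns, the parameter values are fixed; both the testing and application phases only read them.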
[0030] As mentioned above, with the development of information technology, various terminal devices can provide people with a variety of services in work and life. Applications providing these services can be deployed on these terminal devices. The terminal devices present relevant content and interact with users through the application's user interface, meeting various user needs. In some cases, terminal devices can utilize tools such as digital assistants to converse with users and recommend content. However, the understanding of multi-turn dialogue and the accuracy of content recommendations by traditional digital assistants still need improvement.
[0031] In view of this, embodiments of the present disclosure propose an improved scheme for interaction. In this scheme, if a user query is received, topic information is generated based on the interaction content of at least one interaction round related to the user query. This topic information is used to describe the semantics of the interaction content of at least one interaction round. Based on the topic information, a task configuration corresponding to the query requirement indicated by the user query is determined, the task configuration including at least one task parameter. Subsequently, based on the task configuration, targeted recommended content for the user query is presented.
[0032] In the embodiments of this disclosure, the interaction content is summarized into more concise topic information, which highlights the interaction needs. This allows the semantics of the user's multi-turn dialogue to be understood accurately, the task configuration indicating the query requirement to be determined, and recommended content to be provided to the user based on the task configuration. Therefore, in multi-turn dialogue scenarios, the accuracy and quality of the recommended content provided to the user can be improved.
[0033] The following section provides a detailed description of various example implementations of this scheme, with reference to the accompanying drawings.
[0034] Example Environment
[0035] Figure 1 illustrates a schematic diagram of an example environment 100 in which embodiments of the present disclosure can be implemented. Environment 100 relates to a dialogue system 110 that can support interaction with a user 145. In some examples, in a local life services scenario, the dialogue system 110 can allow the user 145 to submit user queries in natural language and can provide the user 145 with recommended content based on the user query, such as shops and dishes. The user 145 can be referred to as the end user of the dialogue system 110.
[0036] In some embodiments, the dialogue system 110 may include or be implemented as a digital assistant 122. The digital assistant 122 may be configured to have intelligent dialogue capabilities. As an example, the digital assistant 122 may be configured as a standalone application, such as a web application or other type of application. As another example, the digital assistant 122 may be configured within an application, as part of the application. In such examples, the digital assistant 122 and the application can be considered as the same application. The digital assistant 122 is provided to assist users with various task processing needs in different applications and scenarios. During interaction with the digital assistant 122, the user inputs interactive messages, and the digital assistant 122 responds to the user's input by providing response messages. Typically, the digital assistant 122 is able to support users inputting questions in natural language and perform tasks and provide responses based on its understanding of natural language input and logical reasoning capabilities.
[0037] For each user 145, the client of the dialogue system 110 can present the interaction window 142 of the digital assistant 122 in the client interface, such as a dialogue window with the digital assistant 122. The user 145 can enter messages in the dialogue window, and the dialogue system 110 can determine the response message from the digital assistant 122 and present it to the user 145 in the interaction window 142. In some embodiments, the interaction messages of the dialogue system 110 may include multimodal messages, such as text messages (e.g., natural language text), voice messages, image messages, and video messages.
[0038] The dialogue system 110 can be deployed locally on the terminal device of each user 145 and / or supported by a server device. For example, the terminal device of user 145 can run a client of the dialogue system 110, which can support interaction between the user and the portion provided by the server. When the dialogue system 110 runs locally on the user's terminal device, user 145 can directly interact with the local dialogue system 110 using the terminal device. When the dialogue system 110 runs on a server device, the server device can provide services to the client running on the terminal device based on the communication connection with the terminal device. The dialogue system 110 can present a corresponding interface to user 145 based on user 145's operations, to output and / or receive relevant information from user 145.
[0039] In some embodiments, the implementation of at least some functions of the dialogue system 110, and / or the implementation of at least some functions of the digital assistant 122 in the dialogue system 110, may be based on target model 155. During the operation of the dialogue system 110, one or more target models may be invoked, such as target model 155-1, target model 155-2, ..., target model 155-N, etc., where N is a positive integer. For ease of description, one or more target models are collectively referred to as target model 155 herein. In the dialogue system 110, the digital assistant 122 may utilize target model 155 to understand user input and provide responses to the user based on the output of target model 155. It should be noted that although target model 155 is shown as independent of the dialogue system 110 in Figure 1, one or more target models 155 may run on the dialogue system 110 or on other remote servers.
[0040] Target model 155 may include any suitable machine learning model. In some embodiments, one or more target models 155 may be constructed based on a language model (LM), such as a large language model (LLM). The machine learning model used is a content-generative model capable of generating corresponding outputs based on model inputs. In some embodiments, the language model-based machine learning model is capable of receiving text-modal model inputs (e.g., natural language and / or machine language) and / or non-text-modal model inputs (e.g., images, speech, video, etc.), and is capable of generating the desired output based on the model inputs and prompt words. Here, prompt words are used to guide the machine learning model to generate outputs that address the user needs indicated by the model inputs. In application scenarios supporting user dialogue, user 145's input may be provided to target model 155 as at least a portion of the model inputs (other portions may include prompt words). This user input is considered a question. Based on the model outputs, corresponding responses may be generated and provided to user 145.
[0041] The dialogue system 110 can operate on suitable electronic devices. These electronic devices can be any type of computing-capable device, including terminal devices or server devices. Terminal devices can be any type of mobile terminal, fixed terminal, or portable terminal, including mobile phones, desktop computers, laptop computers, notebook computers, netbook computers, tablet computers, media computers, multimedia tablets, personal communication system (PCS) devices, personal navigation devices, personal digital assistants (PDAs), audio / video players, digital cameras / camcorders, positioning devices, television receivers, radio receivers, e-book devices, gaming devices, or any combination of the foregoing, including accessories and peripherals of these devices or any combination thereof. Server devices can include, for example, computing systems / servers, such as mainframes, edge computing nodes, computing devices in cloud environments, and so on. In some embodiments, the dialogue system 110 can be implemented based on cloud services.
[0042] It should be understood that the structure and function of environment 100 are described for illustrative purposes only and do not imply any limitation on the scope of this disclosure.
[0043] Example process
[0044] The following description continues with reference to the accompanying drawings and outlines some exemplary embodiments of this disclosure. Figure 2 illustrates a flowchart of a process 200 for interaction according to some embodiments of this disclosure. Part or all of process 200 may be implemented by the dialogue system 110 or by other devices independent of the dialogue system 110, such as other remote devices with computing capabilities (terminal devices or server devices). For ease of discussion, the execution of process 200 is described below from the perspective of the dialogue system 110, but this is merely exemplary.
[0045] In box 210, if the dialogue system 110 receives a user query, it generates topic information based on the interaction content of at least one interaction round related to the user query. Generally, during the interaction, the dialogue system 110 can receive a query message from user 145 and provide a response message to that query message. The period from receiving a query message to providing the corresponding response message can be considered one interaction round. A user query can include the query message of the current interaction round, or query messages from multiple interaction rounds associated with the current interaction round. A user query can include query messages in various modalities input by user 145, such as text messages, voice messages, image messages, and video messages. The interaction content of the at least one interaction round can include the query messages input by user 145 in the at least one interaction round and the response messages provided by the dialogue system 110 in response to those query messages.
[0046] The topic information is used to describe the semantics of the interaction content in the at least one interaction round. In some examples, the topic information may include natural language text that describes the meaning of the interaction content in the at least one interaction round. As an example, the dialogue system 110 may perform a rewriting operation on the interaction content of the at least one interaction round to obtain the natural language text. The topic information may also include information in other modalities that can describe the semantics of the interaction content, and the embodiments of this disclosure are not limited in this respect.
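One way to represent interaction rounds and assemble the interaction content that feeds topic generation is sketched below. The `InteractionRound` class, the `collect_interaction_content` helper, the `User:`/`Assistant:` transcript format, and the `max_rounds` window are hypothetical choices for illustration; the disclosure does not specify how many rounds are included or how they are serialized.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InteractionRound:
    query: str     # query message input by the user in this round
    response: str  # response message provided by the dialogue system


def collect_interaction_content(
    history: List[InteractionRound], current_query: str, max_rounds: int = 3
) -> str:
    """Join up to `max_rounds` previous rounds and the current query into one transcript."""
    # Guard against max_rounds == 0, since history[-0:] would return the whole list.
    recent = history[-max_rounds:] if max_rounds > 0 else []
    lines: List[str] = []
    for past in recent:
        lines.append(f"User: {past.query}")
        lines.append(f"Assistant: {past.response}")
    lines.append(f"User: {current_query}")
    return "\n".join(lines)


history = [
    InteractionRound("nearby coffee shops", "Here are five coffee shops near you."),
    InteractionRound("any with outdoor seating?", "Three of them have outdoor seating."),
]
print(collect_interaction_content(history, "but not XXX shops", max_rounds=1))
```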
[0047] In some embodiments, the dialogue system 110 can determine, from a plurality of predetermined query scenarios, a target query scenario corresponding to the user query. The dialogue system 110 can then generate the topic information based on the target query scenario and the interaction content of the at least one interaction round. A query scenario can indicate at least one of the following: the way in which the user query targets a query object, or the domain of the query object targeted by the user query. Regarding the way in which the user query targets a query object, in some examples the user query may simply and explicitly indicate the query object. For example, the user query may be "nearby steakhouses," with the query object being "steak." In other examples, the user query may include the name of an institution associated with the query object, such as "nearby XXX restaurants." In still other examples, the user query may specify query objects to be excluded, such as "nearby coffee shops, but not XXX shops and YYY shops." Regarding the domain of the query object, in some examples the query scenario can indicate the domain to which the business, product, service, media content, or the like that the user wants to query belongs; for example, the query scenario may indicate that the user needs to query content in domains such as food, accommodation, entertainment, medicine, or supermarkets.
[0048] Regarding the determination of the target query scenario, Figure 3 illustrates a schematic diagram of an example architecture 300 for interaction according to some embodiments of this disclosure. As shown in Figure 3, the example architecture 300 includes a target model 155-1 and a target model 155-2, where target model 155-1 may have fewer parameters than target model 155-2 and weaker reasoning ability. The dialogue system 110 may provide the interaction content 310 of at least one interaction round to target model 155-1. In block 320, the dialogue system 110 determines whether target model 155-1 can recognize the query scenario. If it is determined in block 320 that target model 155-1 can recognize the query scenario, the flow proceeds to block 330, where the dialogue system 110 determines the target query scenario corresponding to the user query based on the model output of target model 155-1. If it is determined in block 320 that target model 155-1 cannot determine the query scenario corresponding to the user query, the interaction content of the at least one interaction round can be provided to target model 155-2, and in block 330 the target query scenario corresponding to the user query is determined based on the model output of target model 155-2.
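A predetermined set of query scenarios combining the two dimensions above might be represented as follows. The enum members mirror the examples in this paragraph, but the class names and the idea of enumerating every (way, domain) pair are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import List, Optional


class QueryWay(Enum):
    EXPLICIT_OBJECT = "explicit_object"    # e.g. "nearby steakhouses"
    INSTITUTION_NAME = "institution_name"  # e.g. "nearby XXX restaurants"
    WITH_EXCLUSIONS = "with_exclusions"    # e.g. "coffee shops, but not XXX shops"


class QueryDomain(Enum):
    FOOD = "food"
    ACCOMMODATION = "accommodation"
    ENTERTAINMENT = "entertainment"
    MEDICINE = "medicine"
    SUPERMARKET = "supermarket"


@dataclass(frozen=True)
class QueryScenario:
    # A scenario indicates at least one of the two dimensions, so both are optional.
    way: Optional[QueryWay] = None
    domain: Optional[QueryDomain] = None


PREDETERMINED_SCENARIOS: List[QueryScenario] = [
    QueryScenario(way, domain) for way, domain in product(QueryWay, QueryDomain)
]
print(len(PREDETERMINED_SCENARIOS))  # 15
```

Making `QueryScenario` frozen keeps scenarios hashable, so they could also serve as dictionary keys, for example when looking up a prompt word template per scenario.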
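The small-model-first, large-model-fallback flow of architecture 300 can be sketched as a simple cascade. Here each model is assumed to be a callable that returns a scenario label, or `None` when it cannot recognize the scenario; that convention, the function name, and the stub models are hypothetical stand-ins for target models 155-1 and 155-2.

```python
from typing import Callable, List, Optional, Sequence

# Assumed convention: a model maps interaction content to a scenario label or None.
ScenarioModel = Callable[[str], Optional[str]]


def recognize_scenario(
    content: str,
    small_model: ScenarioModel,
    large_model: ScenarioModel,
    scenarios: Sequence[str],
) -> Optional[str]:
    """Try the cheaper model first and fall back to the stronger model only if needed."""
    label = small_model(content)
    if label in scenarios:  # block 320: the small model recognized a known scenario
        return label
    # Fallback: hand the same interaction content to the larger model.
    label = large_model(content)
    return label if label in scenarios else None


calls: List[str] = []


def small_stub(content: str) -> Optional[str]:
    calls.append("small")
    return "food" if "steak" in content else None


def large_stub(content: str) -> Optional[str]:
    calls.append("large")
    return "medicine" if "pharmacy" in content else "food"


scenarios = ["food", "accommodation", "medicine"]
print(recognize_scenario("nearby steakhouses", small_stub, large_stub, scenarios))  # food
print(recognize_scenario("open pharmacy", small_stub, large_stub, scenarios))  # medicine
print(calls)  # ['small', 'small', 'large']
```

Checking the label against the predetermined scenarios, rather than accepting any non-empty output, is one possible way to decide whether the small model "recognized" the scenario; a confidence score would be another.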
[0049] In some embodiments, the dialogue system 110 can obtain a prompt word template corresponding to the target query scenario. The prompt word template may include a requirement to rewrite the interaction content of at least one interaction round into topic information. For example, the prompt word template may instruct the interaction content to be rewritten into topic information based on the user's query method for the query object. Alternatively, the prompt word template may instruct the interaction content to be rewritten into topic information based on the domain to which the user's query object belongs. The dialogue system 110 can generate prompt information (also referred to as first prompt information) for the target model 155-3 (sometimes referred to herein as the first target model) based on the prompt word template and the interaction content of at least one interaction round. The dialogue system 110 can provide this prompt information to the target model 155-3 to obtain the model output of the target model 155-3 (also referred to as the first model output). Then, the dialogue system 110 can determine the topic information based on the model output.
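A scenario-specific prompt word template for generating topic information could be filled in as sketched below. The template wording, the `PROMPT_TEMPLATES` table, the scenario keys, and the `first_model` callable (standing in for target model 155-3) are all assumptions for illustration; the disclosure does not give the actual template text.

```python
from typing import Callable, Dict

# Hypothetical templates keyed by target query scenario.
PROMPT_TEMPLATES: Dict[str, str] = {
    "explicit_object": (
        "Rewrite the conversation below into one concise sentence stating "
        "the object the user is looking for and any constraints.\n\n"
        "Conversation:\n{content}\n\nTopic:"
    ),
    "with_exclusions": (
        "Rewrite the conversation below into one concise sentence stating "
        "what the user wants and what the user wants to exclude.\n\n"
        "Conversation:\n{content}\n\nTopic:"
    ),
}


def build_first_prompt(scenario: str, content: str) -> str:
    """Combine the scenario's template with the interaction content (first prompt information)."""
    # str.format inserts `content` verbatim; braces inside the content are not re-parsed.
    return PROMPT_TEMPLATES[scenario].format(content=content)


def generate_topic(scenario: str, content: str, first_model: Callable[[str], str]) -> str:
    """Send the first prompt information to the model and use its output as the topic."""
    return first_model(build_first_prompt(scenario, content)).strip()


def stub_model(prompt: str) -> str:
    return "  nearby coffee shops, excluding XXX shops \n"


conversation = "User: nearby coffee shops\nAssistant: Here are five.\nUser: but not XXX shops"
print(generate_topic("with_exclusions", conversation, stub_model))
```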
[0050] In box 220, the dialogue system 110 determines, based on the topic information, the task configuration corresponding to the query requirement indicated by the user query. The task configuration may include at least one task field and at least one task parameter corresponding to the at least one task field, which together can, to a certain extent, indicate the user's query requirement. Each task parameter may indicate at least one named entity, which may include, but is not limited to, place names, organization names, times, currencies, products, events, and proper nouns.
[0051] In some embodiments, the dialogue system 110 may determine at least one task field defined for the target query scenario based on the target query scenario corresponding to the user query. The dialogue system 110 may perform parameter extraction operations on the topic information to determine at least one task parameter corresponding to the at least one task field.
[0052] In some examples, the dialogue system 110 can generate prompt information (also referred to as second prompt information) for the target model 155-4 (sometimes referred to herein as the fourth target model) based on the topic information and the at least one task field. The dialogue system 110 provides this prompt information to the target model 155-4, which then extracts at least one task parameter corresponding to the at least one task field from the topic information. Subsequently, the dialogue system 110 can determine the at least one task parameter based on the model output (also referred to as the second model output) of the target model 155-4.
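Extraction with a single second prompt can be sketched as follows. Asking the model for a JSON object keyed by task field is an assumed output format, and `build_second_prompt`, `extract_task_parameters`, and the stub model (standing in for target model 155-4) are hypothetical. Fields the model does not fill are kept as `None`, so the task configuration always has one entry per task field.

```python
import json
from typing import Callable, Dict, List, Optional


def build_second_prompt(topic: str, task_fields: List[str]) -> str:
    """Combine the topic information and the task fields into one extraction prompt."""
    return (
        f"Extract values for the fields {json.dumps(task_fields)} from the text below. "
        "Answer with a JSON object whose keys are the field names; "
        "use null for fields that are not mentioned.\n\n"
        f"Text: {topic}"
    )


def extract_task_parameters(
    topic: str, task_fields: List[str], model: Callable[[str], str]
) -> Dict[str, Optional[str]]:
    raw = model(build_second_prompt(topic, task_fields))
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}
    # Keep only the fields defined for the scenario, in their defined order.
    return {field: parsed.get(field) for field in task_fields}


def stub_model(prompt: str) -> str:
    return '{"location": "nearby", "category": "steak", "mood": "festive"}'


print(extract_task_parameters("nearby steakhouses", ["location", "category", "price"], stub_model))
```

Restricting the result to the predefined task fields discards anything extra the model returns (the `mood` key above).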
[0053] In other examples, the dialogue system 110 can generate at least one prompt message corresponding to the at least one task field based on the topic information and the at least one task field. The dialogue system 110 can provide the at least one prompt message in parallel to the target model 155-4 to obtain at least one model output from the target model 155-4. Then, the dialogue system 110 can determine the at least one task parameter based on the at least one model output.
[0054] As an example, Figure 4 illustrates a schematic diagram of an example architecture 400 for interaction according to some embodiments of the present disclosure. Example architecture 400 shows a target model 155-4 and a database 406, where the database 406 stores candidate named entities predetermined for different task fields. The dialogue system 110 can define multiple task fields 402-1, 402-2, ..., 402-M for a target query scenario, where M is a positive integer. The dialogue system 110 can select at least one set of candidate named entities from database 406 based on the task fields 402-1, 402-2, ..., 402-M, for example, one or more sets of restaurant category names. In block 408, the dialogue system 110 can generate prompt messages 410-1, 410-2, ..., 410-M based on topic information 404, the at least one set of candidate named entities, and the task fields 402-1, 402-2, ..., 402-M. The dialogue system 110 can provide the prompt messages 410-1, 410-2, ..., 410-M in parallel to target model 155-4 to obtain multiple model outputs, and determine multiple task parameters 412-1, 412-2, ..., 412-M based on those outputs. The dialogue system 110 can then merge the task parameters 412-1, 412-2, ..., 412-M into a task configuration 414, for example a structured task configuration. In some cases, the prompt messages 410-1, 410-2, ..., 410-M can instead be provided to multiple target models respectively, so that the parameter extraction operations are performed in parallel by those target models.
[0055] In some embodiments, the dialogue system 110 can detect, from a set of reference queries, a reference query that semantically matches the topic information. If no reference query that semantically matches the topic information is detected in the set, the dialogue system 110 can determine, based on the topic information, the task configuration corresponding to the query requirement indicated by the user query. If a target reference query that semantically matches the topic information is detected in the set, the dialogue system 110 can determine the target recommended content for the user query based on a predetermined recommendation strategy corresponding to the target reference query. In this way, for relatively simple user queries or queries with relatively clear query requirements, the target recommended content can be determined from predefined experience, which helps improve recommendation efficiency.
[0056] As an example, Figure 5 illustrates a flowchart of an example process 500 for interaction according to some embodiments of this disclosure. In block 506, the dialogue system 110 determines the semantic similarity between topic information 502 and the reference queries in the reference query set 504. In block 508, the dialogue system 110 determines whether the similarity exceeds a threshold. If it is determined in block 508 that the similarity does not exceed the threshold, process 500 can return to block 506; if it is determined that the similarity exceeds the threshold, process 500 proceeds to block 510. In block 510, the dialogue system 110 determines, based on a predetermined mapping between reference queries and query requirements, the target query requirement mapped to the reference query whose similarity exceeds the threshold. For example, topic information 502 may be "find a place to celebrate a birthday," and a reference query may be "restaurants suitable for celebrating birthdays." If the dialogue system 110 determines that the semantic similarity between the two is 0.97 and the threshold is 0.95, the similarity exceeds the threshold, and the dialogue system 110 determines that the target query requirement mapped to that reference query is "birthday dinner."
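The fan-out and merge of architecture 400 can be sketched with one prompt per task field. A thread pool is used here as one way to issue the prompts concurrently; the `candidate_db` dictionary stands in for database 406, and the `NONE` sentinel, the prompt wording, and all function names are assumed conventions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def build_field_prompt(topic: str, field: str, candidates: List[str]) -> str:
    """One prompt per task field, listing that field's candidate named entities."""
    options = ", ".join(candidates) if candidates else "any"
    return (
        f"Field: {field}\n"
        f"Candidate values: {options}\n"
        f"Text: {topic}\n"
        "Return the matching value, or NONE."
    )


def extract_in_parallel(
    topic: str,
    task_fields: List[str],
    candidate_db: Dict[str, List[str]],
    model: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, Optional[str]]:
    prompts = [build_field_prompt(topic, f, candidate_db.get(f, [])) for f in task_fields]
    # pool.map returns outputs in the same order as the prompts.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(model, prompts))
    # Merge the per-field task parameters into one structured task configuration.
    config: Dict[str, Optional[str]] = {}
    for field, output in zip(task_fields, outputs):
        value = output.strip()
        config[field] = None if value.upper() == "NONE" else value
    return config


def stub_model(prompt: str) -> str:
    field = prompt.splitlines()[0].split(": ", 1)[1]
    return {"category": "hot pot", "location": "near Xidan"}.get(field, "NONE")


candidate_db = {"category": ["hot pot", "steak", "sushi"]}
print(extract_in_parallel("hot pot near Xidan", ["category", "location", "price"], candidate_db, stub_model))
```

Because `pool.map` preserves input order, each output can be paired back with its task field by position, which keeps the merge step trivial.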
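The threshold check in blocks 506 to 510 can be sketched with cosine similarity over embedding vectors. The vectors below are hand-written placeholders; in practice they would come from some text-embedding model, which the disclosure does not name. Picking the best-scoring reference query above the threshold is one reasonable reading of the flowchart, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_requirement(
    topic_vec: Sequence[float],
    reference_set: List[Tuple[str, Sequence[float]]],
    requirement_of: Dict[str, str],
    threshold: float = 0.95,
) -> Optional[str]:
    """Return the query requirement mapped to the best reference query whose
    similarity exceeds the threshold, or None to fall back to task configuration."""
    best_query: Optional[str] = None
    best_score = threshold
    for query, vec in reference_set:
        score = cosine(topic_vec, vec)
        if score > best_score:  # strictly exceeds the threshold
            best_query, best_score = query, score
    return requirement_of[best_query] if best_query is not None else None


reference_set = [
    ("restaurants suitable for celebrating birthdays", [0.99, 0.10]),
    ("late-night snacks", [0.0, 1.0]),
]
requirement_of = {
    "restaurants suitable for celebrating birthdays": "birthday dinner",
    "late-night snacks": "late-night snacks",
}
print(match_requirement([1.0, 0.0], reference_set, requirement_of))  # birthday dinner
print(match_requirement([0.6, 0.8], reference_set, requirement_of))  # None
```

A `None` result corresponds to the case in which no reference query matches, so the dialogue system would go on to determine the task configuration from the topic information.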
[0057] In block 512, the dialogue system 110 determines whether the target query requirement is of a first type. Specifically, the predetermined query requirements may include requirements of a first type and requirements of a second type. A first-type requirement has at least one associated query field, and at least one query parameter corresponding to that at least one query field needs to be determined. A second-type requirement has no associated query field. If it is determined in block 512 that the target query requirement is of the first type, process 500 proceeds to block 514. In block 514, the dialogue system 110 may perform at least one round of clarification to determine the at least one query parameter associated with the target query requirement. Specifically, the dialogue system 110 may present at least one clarification question based on the at least one query field associated with the target query requirement, receive the user's clarification response, and extract from that response the at least one query parameter corresponding to the at least one query field.
[0058] In block 516, the dialogue system 110 determines the target recommended content for the user query based on predefined recommendation logic corresponding to the target query requirement. Specifically, if the target query requirement is of the first type, the dialogue system 110 determines the target recommended content using the predefined recommendation logic together with the at least one extracted query parameter. If the target query requirement is of the second type, the dialogue system 110 can determine the target recommended content using the predefined recommendation logic directly, without clarification.
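The branch between blocks 512, 514, and 516 can be sketched as follows. This is a minimal sketch under stated assumptions: a requirement is modelled as a name plus a list of query fields (empty for the second type), one clarification question is asked per query field, and the `ask`, `extract`, and `recommend` callables are hypothetical stand-ins for presenting a question, parsing the user's reply, and the predefined recommendation logic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class QueryRequirement:
    name: str
    # Non-empty for a first-type requirement (parameters must be clarified);
    # empty for a second-type requirement (no associated query field).
    query_fields: List[str] = field(default_factory=list)

    @property
    def is_first_type(self) -> bool:
        return bool(self.query_fields)


def resolve_recommendation(
    requirement: QueryRequirement,
    ask: Callable[[str], str],             # hypothetical: present a question, return the reply
    extract: Callable[[str, str], str],    # hypothetical: (query field, reply) -> parameter
    recommend: Callable[[Dict[str, str]], List[str]],  # hypothetical predefined logic
) -> List[str]:
    params: Dict[str, str] = {}
    if requirement.is_first_type:  # block 512
        for query_field in requirement.query_fields:
            # Block 514: one clarification round per query field.
            reply = ask(f"Please specify the {query_field} for your {requirement.name}.")
            params[query_field] = extract(query_field, reply)
    # Block 516: second-type requirements arrive here with no parameters.
    return recommend(params)
```

Asking one question per field is only one design choice; a system could equally ask about several fields in a single clarification round.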
[0059] In some embodiments, the reference query set is determined as follows: Based on a first reference query, the dialogue system 110 generates multiple second reference queries that semantically match the first reference query. The dialogue system 110 associates the first reference query and the multiple second reference queries with a first query requirement corresponding to the first reference query. Then, the first reference query associated with the first query requirement and the multiple second reference queries are added to the reference query set.
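The construction of the reference query set in [0059] amounts to expanding one query into paraphrases and labelling all of them with the same query requirement. The sketch below assumes the set is stored as a dictionary from reference query to query requirement, and that `paraphrase` is a hypothetical callable standing in for the model that generates second reference queries. In this sketch a later addition silently overwrites an existing mapping; a real system might flag such conflicts instead.

```python
from typing import Callable, Dict, List


def add_to_reference_set(
    reference_set: Dict[str, str],
    first_query: str,
    first_requirement: str,
    paraphrase: Callable[[str], List[str]],  # hypothetical second-query generator
) -> Dict[str, str]:
    """Associate a first reference query and its generated second reference
    queries with the first query requirement, then add them to the set."""
    second_queries = paraphrase(first_query)
    for query in [first_query, *second_queries]:
        reference_set[query] = first_requirement
    return reference_set
```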
[0060] As an example, the dialogue system 110 can generate prompt information for target model 155-5 based on a first reference query, system prompt words, at least one third reference query from the reference query set, and an indication of the first type of requirement. The dialogue system 110 can provide this prompt information to target model 155-5 to obtain one or more second reference queries from target model 155-5. For example, the first reference query might include "I want to eat Western food," and the one or more second reference queries might include "nearby Western restaurants," "good Western restaurants around," "find good Western restaurants," and so on.
[0061] As another example, the dialogue system 110 can generate prompt information for target model 155-5 based on a first reference query, system prompt words, at least one third reference query from the reference query set, and an indication of the second type of requirement. The dialogue system 110 can provide this prompt information to target model 155-5 to obtain one or more second reference queries from target model 155-5. For example, the first reference query might include "I want to eat Western food," and the one or more second reference queries might include "Where can I find Western food with an average cost of less than 150 yuan per person?", "Are there any Western restaurants nearby suitable for three people?", and so on.
[0062] As another example, the dialogue system 110 can also generate prompt information for target model 155-5 based on the first reference query, system prompt words, at least one third reference query from the reference query set, and additional conditions, to obtain one or more second reference queries generated by target model 155-5. The additional conditions may include one or more constraints on the query requirement, such as conditions on ratings, business hours, or usage scenarios. For instance, the first reference query may include "I want to eat Western food," and the one or more second reference queries may include "Recommend some highly-rated Western restaurants," "Which Western restaurants are open until midnight?", "Western restaurants nearby suitable for a couple's date," and so on.
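The three variants in [0060] to [0062] differ only in which extra element goes into the prompt information for target model 155-5. The sketch below assembles such a prompt as plain text under stated assumptions: the wording of every line, the `n` parameter for the number of requested queries, and the function name `build_paraphrase_prompt` are hypothetical; the disclosure does not give the actual prompt format.

```python
from typing import List, Optional


def build_paraphrase_prompt(
    first_query: str,
    system_prompt: str,
    examples: List[str],                      # third reference queries used as few-shot examples
    requirement_type: Optional[str] = None,   # e.g. "first type" or "second type"
    extra_conditions: Optional[List[str]] = None,  # e.g. ratings, business hours, scenarios
    n: int = 3,
) -> str:
    """Assemble hypothetical prompt text asking a model for n second reference queries."""
    lines = [system_prompt, "Example reference queries:"]
    lines += [f"- {ex}" for ex in examples]
    if requirement_type is not None:
        lines.append(f"Requirement type: {requirement_type}")
    if extra_conditions:
        lines.append("Each query should add one of these conditions: " + "; ".join(extra_conditions))
    lines.append(f"Write {n} queries with the same intent as: {first_query}")
    return "\n".join(lines)
```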
[0063] It should be noted that the above methods for generating the second reference queries are merely exemplary, and any other appropriate method can be used. For example, the dialogue system 110 can generate multiple sets of second reference queries in different ways and obtain the final second reference queries by merging these sets and removing duplicates.
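Merging and deduplicating several sets of generated queries can be done in a single pass. This sketch assumes that two queries count as duplicates when they differ only in letter case or whitespace, and that the first occurrence is kept; detecting semantic duplicates would require a similarity measure and is not attempted here.

```python
from typing import Iterable, List


def merge_and_deduplicate(query_sets: Iterable[Iterable[str]]) -> List[str]:
    """Merge several sets of generated queries, dropping duplicates that differ
    only in case or whitespace; the first occurrence wins and order is kept."""
    seen = set()
    merged: List[str] = []
    for query_set in query_sets:
        for query in query_set:
            key = " ".join(query.lower().split())  # normalize case and whitespace
            if key and key not in seen:
                seen.add(key)
                merged.append(query.strip())
    return merged
```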
[0064] Returning to process 200, in block 230 the dialogue system 110 presents, based on the task configuration, target recommended content for the user query. The target recommended content can include various recommendations related to the user query, such as merchants, products, services, and media content. In a catering service scenario, for example, the target recommended content could include restaurants, dishes, and the like.
[0065] In some embodiments, the dialogue system 110 may select multiple candidate recommended content items from a set of candidate recommended content items based on the task configuration and rank them by their matching degree with the query requirement. The dialogue system 110 then selects a predetermined number of the highest-ranked candidates as the target recommended content and presents them.
[0066] As an example, Figure 6 illustrates a flowchart of an example process 600 for interaction according to some embodiments of the present disclosure. The dialogue system 110 can select multiple sets of candidate recommended content 602-1, 602-2, ..., 602-M from a candidate recommended content set based on multiple task parameters 412-1, 412-2, ..., 412-M in the task configuration 414. In block 604, the dialogue system 110 can merge and deduplicate the sets 602-1, 602-2, ..., 602-M to obtain multiple candidate recommended content items. In block 606, the dialogue system 110 can determine the matching degree between each candidate recommended content item and the query requirement, and sort the candidates by matching degree. Here, the query requirement refers to the query requirement indicated by the task configuration 414, so the dialogue system 110 determines the matching degree based on the task configuration. For example, the dialogue system 110 can use a target model to determine the matching degree based on the task configuration; alternatively, it can use a predetermined matching evaluation strategy. In block 608, the dialogue system 110 can select a predetermined number of candidates with the highest matching degree as the target recommended content. The predetermined number may be one or more. The dialogue system 110 can then present the selected target recommended content.
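Process 600 can be sketched as retrieve, merge, rank, and truncate. This is a minimal sketch assuming the task configuration is a dictionary from task field to task parameter, that one candidate set is retrieved per task parameter, and that `retrieve` and `score` are hypothetical callables standing in for the candidate lookup and for the matching-degree model or evaluation strategy.

```python
from typing import Callable, Dict, List, Sequence


def select_target_recommendations(
    task_config: Dict[str, str],
    retrieve: Callable[[str, str], Sequence[str]],     # hypothetical: (field, parameter) -> candidates
    score: Callable[[str, Dict[str, str]], float],     # hypothetical matching degree
    top_k: int = 3,                                     # the predetermined number
) -> List[str]:
    # One candidate set per task parameter (602-1 ... 602-M).
    candidate_sets = [retrieve(f, v) for f, v in task_config.items()]
    # Block 604: merge and deduplicate, keeping first-seen order.
    candidates = list(dict.fromkeys(c for s in candidate_sets for c in s))
    # Block 606: sort by matching degree with the query requirement.
    ranked = sorted(candidates, key=lambda c: score(c, task_config), reverse=True)
    # Block 608: keep the predetermined number of best matches.
    return ranked[:top_k]
```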
[0067] The training process for target models 155-3 and 155-4 will be illustrated below. It should be understood that such a training process can be performed by appropriate training equipment, which may include, but is not limited to, the dialogue system 110.
[0068] Regarding target model 155-3, in embodiments of this disclosure, the training device can generate sample queries using target model 155-6 (sometimes referred to herein as the second target model), and generate sample responses using target model 155-7 (the third target model) to form sample interaction content including sample queries and sample responses. The training device can generate sample topic information corresponding to the sample interaction content to form a first sample including the sample interaction content and the corresponding sample topic information. Then, the target model 155-3 is trained using the first sample. It is understood that the training device typically generates multiple first samples and trains the target model 155-3 using these multiple first samples. In this way, training samples for multi-turn dialogues can be generated for training the target model 155-3, ensuring the training quality of the target model 155-3 and avoiding the impact of insufficient training samples on the training quality of the target model 155-3.
[0069] In some embodiments, the training device can determine a query scenario based on multiple predetermined transition probabilities for multiple query scenarios, where the predetermined transition probability indicates the probability of transitioning from one query scenario to another. The training device can obtain one or more reference query contents corresponding to the determined query scenario. Then, the training device can use the target model 155-6 to generate a sample query corresponding to the determined query scenario based on the one or more reference query contents.
[0070] As an example, Figure 7 illustrates a schematic diagram of an example architecture 700 for interaction according to some embodiments of the present disclosure. Example architecture 700 shows a transition matrix 710 and target models 155-6 and 155-7. The transition matrix 710 includes multiple transition probabilities $P_{11}$ to $P_{ii}$, which indicate the probability of transitioning from one query scenario to another. For example, $P_{12}$ indicates the probability of transitioning from "Query Scenario 1" to "Query Scenario 2," and $P_{1i}$ indicates the probability of transitioning from "Query Scenario 1" to "Query Scenario i." The training device can determine the query scenario 720 corresponding to the next sample query based on the previous sample query and the transition matrix 710. The training device can then determine reference query content based on the query scenario; the reference query content indicates what the user expects to query in that scenario. For example, the training device can select one or more named entities, such as place names, organization names, times, currencies, products, services, and events, as reference query content. Alternatively, the training device can select one or more sample fields corresponding to the query scenario, together with one or more sample parameters for those fields, as reference query content.
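Choosing the next query scenario from the transition matrix is a Markov-chain step: row $i$ of the matrix is the distribution over next scenarios given current scenario $i$. This sketch assumes each row sums to 1 and uses 0-based indices (scenario 1 in the figure is index 0 here); the function names are hypothetical. It uses the standard library's `random.Random.choices` with explicit weights.

```python
import random
from typing import List, Sequence


def next_scenario(
    current: int,
    transition_matrix: Sequence[Sequence[float]],
    rng: random.Random,
) -> int:
    """Sample the next scenario index; transition_matrix[i][j] is the
    probability of moving from scenario i to scenario j."""
    row = transition_matrix[current]
    return rng.choices(range(len(row)), weights=row, k=1)[0]


def sample_scenario_path(
    start: int,
    transition_matrix: Sequence[Sequence[float]],
    steps: int,
    seed: int = 0,
) -> List[int]:
    """Sample a sequence of scenarios, one per simulated interaction round."""
    rng = random.Random(seed)  # seeded for reproducible sample generation
    path = [start]
    for _ in range(steps):
        path.append(next_scenario(path[-1], transition_matrix, rng))
    return path
```

With a deterministic cycle such as `[[0, 1, 0], [0, 0, 1], [1, 0, 0]]`, `sample_scenario_path(0, m, 4)` yields `[0, 1, 2, 0, 1]`; with genuinely stochastic rows, the seed makes the generated dialogues reproducible.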
[0071] The training device can generate prompts for the target model 155-6 based on the query scenario, reference query content, and historical interaction content, thereby obtaining sample queries generated by the target model 155-6. The training device can also generate prompts for the target model 155-7 based on the obtained sample queries and historical interaction content, thereby obtaining sample responses generated by the target model 155-7. By repeating this process multiple times, the training device can obtain sample interaction content from multiple interaction rounds.
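The round-by-round generation in [0071], combined with the sample topic information of [0068], produces one first sample. The sketch below is under stated assumptions: the scenario sequence is given (for example, from the Markov sampling of [0070]), and `reference_content`, `query_model`, `response_model`, and `label_topic` are hypothetical callables standing in for the reference-content lookup, target model 155-6, target model 155-7, and whatever step produces the sample topic information (the disclosure does not specify it).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (sample query, sample response)


@dataclass
class FirstSample:
    interaction: List[Turn]  # sample interaction content over several rounds
    topic: str               # corresponding sample topic information


def generate_first_sample(
    scenarios: List[str],
    reference_content: Callable[[str], List[str]],
    query_model: Callable[[str, List[str], List[Turn]], str],  # stands in for model 155-6
    response_model: Callable[[str, List[Turn]], str],          # stands in for model 155-7
    label_topic: Callable[[List[Turn]], str],
) -> FirstSample:
    history: List[Turn] = []
    for scenario in scenarios:
        refs = reference_content(scenario)
        # Prompt uses the scenario, reference query content, and history so far.
        query = query_model(scenario, refs, history)
        response = response_model(query, history)
        history.append((query, response))
    return FirstSample(history, label_topic(history))
```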
[0072] Regarding target model 155-4, in embodiments of this disclosure, the training device acquires a sample configuration, which includes at least one sample field and at least one sample parameter corresponding to the at least one sample field. The training device can utilize target model 155-8 (sometimes referred to herein as the fifth target model) to generate sample topic information based on the sample configuration, thereby obtaining a second sample including the sample topic information and the sample configuration. Subsequently, the training device can use the second sample to train target model 155-4.
[0073] As an example, the training device can obtain a list of sample fields and sample parameters and select multiple sets of sample fields and sample parameters from the list. For each set, the training device generates prompt information for target model 155-8 based on that set of sample fields and sample parameters, a user query template, and system prompts, and provides the prompt information to target model 155-8 to obtain sample topic information generated by target model 155-8. Each piece of sample topic information, together with its corresponding set of sample fields and sample parameters, forms a second sample, which the training device then uses to train target model 155-4. After training, the training device can also evaluate whether the model output of target model 155-4 meets predetermined evaluation metrics. If it does, the training of target model 155-4 can be considered complete.
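Second-sample construction can be sketched as enumerating field/parameter combinations and asking a model to describe each one. This sketch assumes the list is a dictionary from sample field to candidate sample parameters, that each sample uses a fixed number of fields, and that the default query template, system prompt, and the `topic_model` callable (standing in for target model 155-8) are hypothetical.

```python
import itertools
from typing import Callable, Dict, List, Tuple


def build_second_samples(
    field_values: Dict[str, List[str]],
    fields_per_sample: int,
    topic_model: Callable[[str], str],  # stands in for target model 155-8
    query_template: str = "A user is looking for something with {conditions}.",
    system_prompt: str = "Describe the topic of this request in one sentence.",
) -> List[Tuple[str, Dict[str, str]]]:
    """Return (sample topic information, sample configuration) pairs."""
    samples = []
    for chosen_fields in itertools.combinations(sorted(field_values), fields_per_sample):
        for values in itertools.product(*(field_values[f] for f in chosen_fields)):
            config = dict(zip(chosen_fields, values))  # one sample configuration
            conditions = ", ".join(f"{f}={v}" for f, v in config.items())
            prompt = system_prompt + "\n" + query_template.format(conditions=conditions)
            samples.append((topic_model(prompt), config))
    return samples
```

The full Cartesian product grows quickly with the number of fields and values, so in practice a subset of combinations would likely be sampled, which matches the "select multiple sets" wording above.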
[0074] It is understood that incremental training can also be performed on target model 155-4 while it is in operation. For example, some time after target model 155-4 has been trained, the training device can determine new task parameters and use target model 155-8 to construct training samples for incremental training based on these new parameters. These training samples are then used to incrementally train target model 155-4 to maintain the accuracy of its output.
[0075] It should also be noted that the aforementioned target models 155-1 to 155-8 can be implemented as different target models or as the same target model. The embodiments of this disclosure do not limit this.
[0076] In this way, the embodiments of this disclosure can accurately understand the semantics of the user's multi-turn dialogue content, determine the task configuration indicating the user's query needs, and provide recommended content to the user according to the task configuration. Therefore, in multi-turn dialogue scenarios, the accuracy and quality of the recommended content provided to the user can be improved.
[0077] Example devices and equipment
[0078] Embodiments of this disclosure also provide corresponding apparatus for implementing the methods or processes described above. Figure 8 shows a schematic structural block diagram of an example apparatus 800 for interaction according to certain embodiments of this disclosure. Apparatus 800 may be implemented as or included in the dialogue system 110. The various modules / components in apparatus 800 may be implemented by hardware, software, firmware, or any combination thereof.
[0079] As shown in Figure 8, apparatus 800 includes a generation module 810, a determination module 820, and a presentation module 830. The generation module 810 is configured to, in response to receiving a user query, generate topic information based on interaction content of at least one interaction round related to the user query, the topic information describing the semantics of that interaction content. The determination module 820 is configured to determine, based on the topic information, a task configuration corresponding to the query requirement indicated by the user query, the task configuration including at least one task parameter. The presentation module 830 is configured to present, based on the task configuration, target recommended content for the user query.
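The three-module structure of apparatus 800 can be sketched as a small class that wires the modules in sequence for each incoming query. This is a minimal sketch: each module is a hypothetical callable, the class name `InteractionApparatus` is invented for illustration, and for simplicity the interaction history stores only user queries, whereas the disclosure's interaction content also includes the assistant's responses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class InteractionApparatus:
    generate_topic: Callable[[List[str]], str]          # generation module 810
    determine_config: Callable[[str], Dict[str, str]]   # determination module 820
    present: Callable[[Dict[str, str]], List[str]]      # presentation module 830
    history: List[str] = field(default_factory=list)    # simplified interaction content

    def on_user_query(self, query: str) -> List[str]:
        self.history.append(query)
        topic = self.generate_topic(self.history)   # topic over all rounds so far
        config = self.determine_config(topic)       # task configuration
        return self.present(config)                 # target recommended content
```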
[0080] In some embodiments, the generation module 810 is further configured to: determine the target query scenario corresponding to the user query from multiple query scenarios; and determine topic information based on the target query scenario and the interaction content of at least one interaction round.
[0081] In some embodiments, the generation module 810 is further configured to: obtain a prompt word template corresponding to the target query scenario; generate first prompt information for the first target model based on the prompt word template and the interaction content of the at least one interaction round, so as to obtain a first model output of the first target model; and determine the topic information based on the first model output.
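Scenario-specific topic generation can be sketched as template selection, prompt filling, and light parsing of the model output. The template texts, the scenario keys, and the post-processing rule (take the first non-empty line of the output) are all hypothetical, and `first_model` is a stand-in callable for the first target model.

```python
from typing import Callable, Dict, List

# Hypothetical prompt word templates, one per query scenario.
PROMPT_TEMPLATES: Dict[str, str] = {
    "dining": "Summarize what the user wants to eat and any constraints.\nDialogue:\n{dialogue}\nTopic:",
    "travel": "Summarize the user's travel request.\nDialogue:\n{dialogue}\nTopic:",
}


def generate_topic(
    scenario: str,
    turns: List[str],
    first_model: Callable[[str], str],  # stands in for the first target model
    templates: Dict[str, str] = PROMPT_TEMPLATES,
) -> str:
    template = templates[scenario]                              # template for the target scenario
    first_prompt = template.format(dialogue="\n".join(turns))   # first prompt information
    first_output = first_model(first_prompt)                    # first model output
    # Assumed post-processing: the first non-empty line is the topic information.
    lines = [ln.strip() for ln in first_output.splitlines() if ln.strip()]
    return lines[0] if lines else ""
```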
[0082] In some embodiments, the apparatus 800 further includes: a first training module configured to train a first target model by: generating a sample query using a second target model; generating a sample response for the sample query using a third target model to form sample interaction content including the sample query and the sample response; generating sample topic information corresponding to the sample interaction content to form a first sample including the sample interaction content and the corresponding sample topic information; and training the first target model using the first sample.
[0083] In some embodiments, the first training module is further configured to: determine a query scenario based on a plurality of predetermined transition probabilities for a plurality of query scenarios, wherein the predetermined transition probability in the plurality of predetermined transition probabilities indicates the probability of transitioning from one query scenario to another query scenario; obtain one or more reference query contents corresponding to the determined query scenario; and generate a sample query corresponding to the determined query scenario using a second target model based on the one or more reference query contents.
[0084] In some embodiments, the determining module 820 is further configured to: detect reference queries that semantically match the topic information from the reference query set; and in response to the absence of a reference query that semantically matches the topic information, determine a task configuration corresponding to the query requirement indicated by the user query based on the topic information.
[0085] In some embodiments, the determining module 820 is further configured to determine the reference query set by: generating a plurality of second reference queries that semantically match the first reference query based on the first reference query; associating the first reference query and the plurality of second reference queries with a first query requirement corresponding to the first reference query; and adding the first reference query and the plurality of second reference queries associated with the first query requirement to the reference query set.
[0086] In some embodiments, the determining module 820 is further configured to: determine a target query scenario corresponding to a user query; determine at least one task field defined for the target query scenario; generate at least one second prompt message for a fourth target model based on topic information and at least one task field to obtain at least one second model output of the fourth target model; and obtain at least one task parameter corresponding to at least one task field based on at least one second model output to form a task configuration.
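Filling a task configuration field by field can be sketched as one second prompt per task field, with fields the model cannot fill left out of the configuration. The per-scenario field table, the prompt wording, the "NONE" convention, and the `fourth_model` callable (standing in for the fourth target model) are hypothetical; a single prompt covering all fields would be an equally valid reading of "at least one second prompt message."

```python
from typing import Callable, Dict, List

# Hypothetical task fields defined for each query scenario.
TASK_FIELDS: Dict[str, List[str]] = {
    "dining": ["cuisine", "party_size", "budget"],
}


def determine_task_config(
    scenario: str,
    topic: str,
    fourth_model: Callable[[str], str],  # stands in for the fourth target model
    task_fields: Dict[str, List[str]] = TASK_FIELDS,
) -> Dict[str, str]:
    config: Dict[str, str] = {}
    for task_field in task_fields[scenario]:
        # One second prompt per task field, built from the topic information.
        second_prompt = (
            f"Topic: {topic}\n"
            f"Extract the value of '{task_field}'. Answer NONE if it is not mentioned."
        )
        value = fourth_model(second_prompt).strip()  # second model output
        if value and value.upper() != "NONE":
            config[task_field] = value  # task parameter for this field
    return config
```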
[0087] In some embodiments, the apparatus 800 further includes: a second training module configured to train a fourth target model by: obtaining a sample configuration, the sample configuration including at least one sample field and at least one sample parameter corresponding to the at least one sample field; generating sample topic information based on the sample configuration using a fifth target model to obtain a second sample including the sample topic information and the sample configuration; and training the fourth target model using the second sample.
[0088] In some embodiments, the presentation module 830 is further configured to: select multiple candidate recommended contents from a set of candidate recommended contents based on the task configuration; sort the multiple candidate recommended contents based on their matching degree with the query requirement; select a predetermined number of candidate recommended contents with the highest matching degree from the sorted contents as target recommended contents; and present the selected predetermined number of target recommended contents.
[0089] The units and / or modules included in apparatus 800 can be implemented in various ways, including software, hardware, firmware, or any combination thereof. In some embodiments, one or more units and / or modules can be implemented using software and / or firmware, such as machine-executable instructions stored on a storage medium. In addition to or as an alternative to machine-executable instructions, some or all of the units and / or modules in apparatus 800 can be implemented at least partially by one or more hardware logic components. By way of example and not limitation, exemplary types of hardware logic components that can be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-chips (SoCs), complex programmable logic devices (CPLDs), and so on.
[0090] Figure 9 shows a block diagram of an electronic device 900 in which one or more embodiments of the present disclosure may be implemented. It should be understood that the electronic device 900 shown in Figure 9 is merely exemplary and should not constitute any limitation on the functionality and scope of the embodiments described herein. The electronic device 900 shown in Figure 9 may include or be implemented as the dialogue system 110 of Figure 1 or the apparatus 800 of Figure 8.
[0091] As shown in Figure 9, the electronic device 900 is in the form of a general-purpose electronic device. Components of the electronic device 900 may include, but are not limited to, one or more processors or processing units 910, a memory 920, a storage device 930, one or more communication units 940, one or more input devices 950, and one or more output devices 960. The processing unit 910 may be a physical or virtual processor and is capable of performing various processes according to the program stored in the memory 920. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to improve the parallel processing capability of the electronic device 900.
[0092] Electronic device 900 typically includes multiple computer storage media. Such media can be any available media accessible to electronic device 900, including but not limited to volatile and non-volatile media and removable and non-removable media. Memory 920 can be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. Storage device 930 can be removable or non-removable media and can include machine-readable media, such as flash drives, disks, or any other media that can be used to store information and / or data and can be accessed within electronic device 900.
[0093] Electronic device 900 may further include additional removable / non-removable, volatile / non-volatile storage media. Although not shown in FIG. 9, disk drives for reading from or writing to removable, non-volatile disks (e.g., "floppy disks") and optical disk drives for reading from or writing to removable, non-volatile optical disks may be provided. In these cases, each drive may be connected to a bus (not shown) via one or more data media interfaces. Memory 920 may include computer program product 925 having one or more program modules configured to perform various methods or actions of various embodiments of the present disclosure.
[0094] The communication unit 940 enables communication with other electronic devices via a communication medium. Additionally, the functionality of the components of the electronic device 900 can be implemented using a single computing cluster or multiple computing machines capable of communicating via communication connections. Therefore, the electronic device 900 can operate in a networked environment using logical connections to one or more other servers, network personal computers (PCs), or another network node.
[0095] Input device 950 can be one or more input devices, such as a mouse, keyboard, trackball, etc. Output device 960 can be one or more output devices, such as a monitor, speaker, printer, etc. As needed, electronic device 900 can also use communication unit 940 to communicate with one or more external devices (not shown), such as storage devices and display devices; with one or more devices that enable a user to interact with electronic device 900; or with any device (e.g., a network card or modem) that enables electronic device 900 to communicate with one or more other electronic devices. Such communication can be performed via an input / output (I / O) interface (not shown).
[0096] According to an exemplary implementation of this disclosure, a computer-readable storage medium is provided that stores computer-executable instructions which, when executed by a processor, implement the methods described above. According to an exemplary implementation of this disclosure, a computer program product is also provided, which is tangibly stored on a non-transitory computer-readable medium and includes computer-executable instructions which, when executed by a processor, implement the methods described above.
[0097] Various aspects of this disclosure are described herein with reference to flowchart illustrations and / or block diagrams of methods, apparatuses, devices, and computer program products implemented according to this disclosure. It should be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer-readable program instructions.
[0098] These computer-readable program instructions can be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that, when executed by the processing unit of the computer or other programmable data processing apparatus, they create means for implementing the functions / actions specified in one or more blocks of the flowchart and / or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium that causes a computer, programmable data processing apparatus, and / or other device to operate in a particular manner. Thus, the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing aspects of the functions / actions specified in one or more blocks of the flowchart and / or block diagram.
[0099] Computer-readable program instructions can be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executing on the computer, other programmable data processing apparatus, or other device perform the functions / actions specified in one or more blocks of the flowchart and / or block diagram.
[0100] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of an instruction, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, may be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.
[0101] Various implementations of this disclosure have been described above. These descriptions are exemplary and not exhaustive, nor are they limited to the disclosed implementations. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described implementations. The terminology used herein is chosen to best explain the principles, practical applications, or improvements to technology in the market, or to enable others skilled in the art to understand the various implementations disclosed herein.
Claims
1. An interaction method, comprising: in response to receiving a user query, generating topic information based on interaction content of at least one interaction round related to the user query, the topic information describing semantics of the interaction content of the at least one interaction round; determining, based on the topic information, a task configuration corresponding to a query requirement indicated by the user query, the task configuration comprising at least one task parameter; and presenting, based on the task configuration, target recommended content for the user query.
2. The method according to claim 1, wherein generating the topic information comprises: determining, from multiple query scenarios, a target query scenario corresponding to the user query; and determining the topic information based on the target query scenario and the interaction content of the at least one interaction round.
3. The method according to claim 2, wherein determining the topic information comprises: obtaining a prompt word template corresponding to the target query scenario; generating, based on the prompt word template and the interaction content of the at least one interaction round, first prompt information for a first target model to obtain a first model output of the first target model; and determining the topic information based on the first model output.
4. The method according to claim 3, wherein the first target model is trained by: generating a sample query using a second target model; generating a sample response to the sample query using a third target model, to form sample interaction content comprising the sample query and the sample response; generating sample topic information corresponding to the sample interaction content, to form a first sample comprising the sample interaction content and the corresponding sample topic information; and training the first target model using the first sample.
5. The method according to claim 4, wherein generating the sample query using the second target model comprises: determining a query scenario based on multiple predetermined transition probabilities for multiple query scenarios, each of the predetermined transition probabilities indicating a probability of transitioning from one query scenario to another query scenario; obtaining one or more reference query contents corresponding to the determined query scenario; and generating, using the second target model and based on the one or more reference query contents, a sample query corresponding to the determined query scenario.
6. The method according to claim 1, wherein determining the task configuration corresponding to the query requirement indicated by the user query comprises: detecting, in a reference query set, a reference query that semantically matches the topic information; and in response to an absence of a reference query semantically matching the topic information, determining, based on the topic information, the task configuration corresponding to the query requirement indicated by the user query.
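A minimal sketch of the match-then-fallback logic in claim 6. Two assumptions are made: bag-of-words cosine similarity with a 0.8 threshold stands in for semantic matching (a real system would more likely compare embeddings), and when a reference query does match, the configuration stored for its associated query requirement is reused. The claim itself only specifies the no-match branch. All names here are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Optional

def bow_cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity: a crude stand-in for semantic matching."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def find_reference_match(topic: str, reference_set: dict[str, str], threshold: float = 0.8) -> Optional[str]:
    """Return the best-scoring reference query at or above the threshold, else None."""
    best, best_score = None, threshold
    for ref in reference_set:
        score = bow_cosine(topic, ref)
        if score >= best_score:
            best, best_score = ref, score
    return best

def determine_task_config(
    topic: str,
    reference_set: dict[str, str],             # reference query -> query requirement
    requirement_configs: dict[str, dict],      # query requirement -> stored configuration (assumed)
    derive_from_topic: Callable[[str], dict],  # the claim's fallback path
) -> dict:
    match = find_reference_match(topic, reference_set)
    if match is None:
        # No semantically matching reference query: derive the configuration from the topic.
        return derive_from_topic(topic)
    # Matched: reuse the configuration stored for the associated requirement (assumption).
    return requirement_configs[reference_set[match]]
```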
7. The method according to claim 6, wherein the reference query set is determined by: generating, based on a first reference query, a plurality of second reference queries that semantically match the first reference query; associating the first reference query and the plurality of second reference queries with a first query requirement corresponding to the first reference query; and adding the first reference query and the plurality of second reference queries associated with the first query requirement to the reference query set.
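Claim 7 amounts to paraphrase-based expansion of a seed query. The sketch below assumes the reference query set is a dictionary from query text to query requirement, and that `paraphrase` is some generator (for example, a language model) returning `n` semantically equivalent rewrites. The function name, the requirement label, and the parameters are hypothetical.

```python
from typing import Callable

def add_to_reference_set(
    reference_set: dict[str, str],                # reference query -> query requirement
    first_query: str,                             # the first reference query
    requirement: str,                             # the first query requirement
    paraphrase: Callable[[str, int], list[str]],  # hypothetical paraphrase generator
    n: int = 3,
) -> dict[str, str]:
    second_queries = paraphrase(first_query, n)  # second reference queries
    for query in [first_query, *second_queries]:
        reference_set[query] = requirement  # associate each with the same requirement
    return reference_set
```

Storing every paraphrase under one requirement means a later user topic only has to resemble one of the variants to hit the matched branch of claim 6.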
8. The method according to claim 1, wherein determining the task configuration corresponding to the query requirement indicated by the user query comprises: determining a target query scenario corresponding to the user query; determining at least one task field defined for the target query scenario; generating, based on the topic information and the at least one task field, at least one piece of second prompt information for a fourth target model to obtain at least one second model output of the fourth target model; and obtaining, based on the at least one second model output, at least one task parameter corresponding to the at least one task field to form the task configuration.
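A minimal sketch of claim 8, assuming one prompt per task field (the claim also allows a single prompt covering several fields) and a model that answers with the bare value or `NONE`. The field lists, prompt wording, and helper names are hypothetical.

```python
from typing import Callable

# Hypothetical task fields defined per query scenario.
TASK_FIELDS: dict[str, list[str]] = {
    "travel": ["destination", "check_in_date", "budget"],
    "dining": ["cuisine", "location"],
}

def build_field_prompt(topic: str, field: str) -> str:
    """Second prompt information for one task field."""
    return (f"Topic: {topic}\n"
            f"Extract the value of the field '{field}'. Answer with the value only, or NONE.")

def build_task_config(
    topic: str,
    scenario: str,
    fourth_model: Callable[[str], str],  # stand-in for the fourth target model
) -> dict[str, str]:
    config: dict[str, str] = {}
    for field in TASK_FIELDS[scenario]:
        output = fourth_model(build_field_prompt(topic, field)).strip()
        # Keep only fields the model could fill; NONE means the topic gives no value.
        if output and output.upper() != "NONE":
            config[field] = output
    return config
```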
9. The method according to claim 8, wherein the fourth target model is trained by: obtaining a sample configuration comprising at least one sample field and at least one sample parameter corresponding to the at least one sample field; generating, using a fifth target model and based on the sample configuration, sample topic information to obtain a second sample comprising the sample topic information and the sample configuration; and training the fourth target model using the second sample.
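Claim 9 generates training data in the reverse direction, from a known configuration to a plausible topic, so that the resulting (topic, configuration) pairs can supervise the fourth target model. The sketch below assumes the fifth target model is a text-to-text callable. The prompt wording, `SecondSample`, and `build_second_samples` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SecondSample:
    topic: str              # sample topic information (model input during training)
    config: dict[str, str]  # sample configuration (training target)

def build_second_samples(
    configs: list[dict[str, str]],
    fifth_model: Callable[[str], str],  # stand-in for the fifth target model
) -> list[SecondSample]:
    samples = []
    for cfg in configs:
        fields = "; ".join(f"{k}={v}" for k, v in cfg.items())
        prompt = f"Write a one-sentence dialogue topic consistent with these task parameters: {fields}"
        samples.append(SecondSample(topic=fifth_model(prompt).strip(), config=dict(cfg)))
    return samples
```

Because the configuration is fixed before the topic is written, the label is correct by construction, which avoids having to annotate free-form topics by hand.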
10. The method according to claim 1, wherein presenting the target recommended content comprises: selecting, based on the task configuration, a plurality of candidate recommended contents from a set of candidate recommended contents; ranking the plurality of candidate recommended contents based on degrees of matching between the plurality of candidate recommended contents and the query requirement; selecting, from the ranking, a predetermined number of candidate recommended contents with the highest matching degrees as the target recommended content; and presenting the selected predetermined number of target recommended contents.
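The filter, rank, and truncate steps of claim 10 can be sketched as follows. Two assumptions are labelled here: candidates are dictionaries that pass the filter only if every task parameter matches exactly, and `score` is an arbitrary matching-degree function (the source does not define how matching degree is computed). `select_targets` is a hypothetical name.

```python
from typing import Callable

def select_targets(
    candidates: list[dict],
    config: dict,                    # task configuration: field -> parameter
    score: Callable[[dict], float],  # hypothetical matching-degree function
    k: int,                          # the predetermined number
) -> list[dict]:
    # Step 1: keep candidates whose attributes agree with every task parameter.
    filtered = [c for c in candidates if all(c.get(f) == v for f, v in config.items())]
    # Step 2: rank by matching degree, highest first (stable for ties).
    ranked = sorted(filtered, key=score, reverse=True)
    # Step 3: keep the top k as the target recommended content.
    return ranked[:k]
```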
11. An apparatus for interaction, comprising: a generation module configured to, in response to receiving a user query, generate topic information based on interaction content of at least one interaction round regarding the user query, the topic information describing semantics of the interaction content of the at least one interaction round; a determination module configured to determine, based on the topic information, a task configuration corresponding to a query requirement indicated by the user query, the task configuration comprising at least one task parameter; and a presentation module configured to present target recommended content for the user query based on the task configuration.
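One way to picture the three modules of claim 11 is as injectable callables wired in sequence. This is a structural sketch only: the class name, the field names, and the convention that the current query enters as a round with an empty response are assumptions, not the claimed apparatus.

```python
from dataclasses import dataclass
from typing import Callable

Round = tuple[str, str]  # (query, response)

@dataclass
class InteractionApparatus:
    generate_topic: Callable[[list[Round]], str]      # generation module
    determine_config: Callable[[str], dict[str, str]]  # determination module
    present: Callable[[dict[str, str]], list[str]]    # presentation module

    def handle_query(self, history: list[Round], query: str) -> list[str]:
        rounds = history + [(query, "")]  # current query has no response yet (assumed)
        topic = self.generate_topic(rounds)
        config = self.determine_config(topic)
        return self.present(config)
```

Injecting each module as a callable keeps the pipeline order fixed while allowing, for example, the model-based and reference-set-based configuration paths to be swapped independently.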
12. An electronic device, comprising: at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, wherein the instructions, when executed by the at least one processing unit, cause the electronic device to perform the method according to any one of claims 1 to 10.
13. A computer-readable storage medium having a computer program stored thereon, the computer program being executable by a processor to implement the method according to any one of claims 1 to 10.
14. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 10.