Dialogue processing method and system, electronic device and computer readable storage medium
By using an encoder based on a hybrid expert (mixture-of-experts) model together with a multi-domain knowledge database, the method reduces the high computational cost of multi-domain understanding in retrieval-augmented generation models, thereby improving retrieval efficiency and the user experience.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- LANGCHAO ELECTRONIC INFORMATION IND CO LTD
- Filing Date
- 2024-04-21
- Publication Date
- 2026-06-12
AI Technical Summary
Existing retrieval-augmented generation (RAG) models incur high computational costs when understanding multi-domain knowledge, which lowers retrieval efficiency and degrades the user experience.
An encoder based on a hybrid expert model is adopted. Using a gating network, it converts the word sequence into a query vector, and the target document vector with the highest similarity is retrieved from the vector database. The model is optimized with a multi-domain knowledge database and a fine-tuning dataset to improve encoding efficiency.
While retaining multi-domain knowledge processing capabilities, the method improves the encoding efficiency and retrieval efficiency of the retrieval engine, enhancing the user experience.
Figure CN118296126B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of content generation, and in particular to a dialogue processing method, system, electronic device, and computer-readable storage medium.
Background Technology
[0002] Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval and text generation. It aims to enhance the generative capabilities of large language models by leveraging external knowledge bases. In a RAG model, the retrieval engine is a key component; its primary function is to retrieve the most relevant information to the user's input query from the knowledge base, thereby improving the quality of the content generated by the large language model. The retrieval engine's results directly impact whether the model can generate accurate and relevant answers. An efficient and accurate retrieval engine can significantly improve the overall performance of the RAG model; therefore, in practical applications, the design and optimization of the retrieval engine is a crucial task in RAG model development.
[0003] When RAG is applied in specific domains, the retrieval engine needs to understand knowledge across multiple domains, especially when the knowledge base is diverse and complex. This typically requires a language model with more parameters and a deeper network structure to provide more accurate and in-depth understanding when processing multi-domain knowledge queries. However, as the number of model parameters used to build the encoder increases, the computational load increases significantly, which sharply reduces the encoding efficiency of the retrieval engine. This in turn reduces retrieval efficiency, prevents timely responses to user queries, and degrades the user experience.
[0004] Therefore, how to solve the above-mentioned technical problems is an issue that those skilled in the art need to address.
Summary of the Invention
[0005] The purpose of this application is to provide a dialogue processing method, system, electronic device, and computer-readable storage medium that can improve the encoding efficiency of the retrieval engine while ensuring that the retrieval engine has the knowledge processing capabilities of multiple application domains, thereby improving the user experience.
[0006] To address the aforementioned technical problems, this application provides a dialogue processing method applied to a processing component of an interactive device, wherein the interactive device further includes an input component and a prompting component, and the dialogue processing method includes:
[0007] When a query dialogue text is received through the input component, the query dialogue text is converted to obtain a word sequence including multiple word elements;
[0008] The word sequence is input into a preset encoder, which converts the word sequence into a query vector. The preset encoder is an encoder built based on a hybrid expert model.
[0009] The target document vector with the highest similarity to the query vector is retrieved from the vector database, and the content text corresponding to the target document vector is obtained. The vector database stores multiple document vectors, and the document vector is obtained by encoding and indexing the content text.
[0010] The content text corresponding to the target document vector is combined with the query dialogue text to obtain the retrieval text, so as to generate the response text corresponding to the query dialogue text based on the retrieval text, and the response text is displayed through the prompt component.
[0011] The preset encoder includes a gating network and multiple expert sub-models;
[0012] The process of converting the word sequence into a query vector using the preset encoder includes:
[0013] The expert sub-model corresponding to each word element in the word sequence is determined by the gating network;
[0014] Each word element is assigned to its corresponding expert sub-model to calculate the tensor corresponding to that word element;
[0015] The query vector is obtained from the tensors corresponding to all the word elements.
[0016] The process of determining the expert sub-model corresponding to each word element in the word sequence through the gating network includes:
[0017] The reference tensor of each word element in the word sequence is determined by the gating network; the number of elements in the reference tensor is the same as the number of expert sub-models;
[0018] For each word element, the elements in the reference tensor of the word element are divided into first elements and second elements. The position of each first element in the reference tensor is determined, and the expert sub-model corresponding to the word element is determined based on that position. The value of each first element is greater than the value of each second element.
[0019] When there are multiple first elements:
[0020] The weight of each of the expert sub-models is determined;
[0021] The process of assigning each word element to the corresponding expert sub-model to calculate the tensor corresponding to the word element includes:
[0022] Each word element is assigned to its multiple corresponding expert sub-models to obtain multiple intermediate tensors;
[0023] The tensor of the word element is calculated based on the multiple intermediate tensors and the weights of the expert sub-models corresponding to the word element.
[0024] The process of determining the reference tensor of each word element in the word sequence through the gating network includes:
[0025] Obtain the sequence length of the word sequence and the number of expert sub-models;
[0026] The reference tensor of each word element is determined by the gating network according to the sequence length and the number of expert sub-models.
[0027] Before inputting the word sequence into the preset encoder, the dialogue processing method further includes:
[0028] Build a multi-domain knowledge database;
[0029] Determine the fine-tuning dataset corresponding to the current application domain scenario from the multi-domain knowledge database;
[0030] The hybrid expert model is fine-tuned based on the fine-tuning dataset to obtain the preset encoder.
[0031] The process of building a multi-domain knowledge database includes:
[0032] Obtain the query types corresponding to various application domain scenarios;
[0033] Based on the query type corresponding to each application domain scenario, obtain multiple related data for that application domain scenario;
[0034] A multi-domain knowledge database is constructed based on all the aforementioned related data.
[0035] The process of constructing a multi-domain knowledge database based on all the aforementioned related data includes:
[0036] Multiple content texts are obtained based on the multiple sets of associated data;
[0037] Each piece of content text is vectorized to obtain a document vector;
[0038] A multi-domain knowledge database is constructed based on all the document vectors, and the multi-domain knowledge database is used as the vector database.
[0039] The process of vectorizing each of the content texts includes:
[0040] The content text is vectorized using the preset encoder.
[0041] The process of fine-tuning the hybrid expert model based on the fine-tuning dataset to obtain the preset encoder includes:
[0042] Positive and negative samples are constructed based on the fine-tuning dataset;
[0043] The hybrid expert model is fine-tuned based on the positive samples, the negative samples, and the preset loss function.
[0044] The process of constructing positive and negative samples based on the fine-tuning dataset includes:
[0045] Extract the first and second text paragraphs from the fine-tuning dataset;
[0046] The first text paragraph is divided into sentences to obtain a first sentence set, which includes multiple first sentence texts;
[0047] The second text paragraph is divided into sentences to obtain a second sentence set, which includes multiple second sentence texts;
[0048] Extract one first sentence text from the first sentence set as the first query text, determine all first sentence texts in the first sentence set except the first query text as the first search content, and determine the first query text and the first search content as positive samples;
[0049] Extract one second sentence text from the second sentence set as the second query text, determine all second sentence texts in the second sentence set except for the second query text as the second search content, and determine the second query text and the second search content as positive samples;
[0050] Several first sentence texts are extracted from the first sentence set to form a first sentence text set and a second sentence text set, wherein the first sentence text set and the second sentence text set are disjoint;
[0051] Several second sentence texts are extracted from the second sentence set to form a third sentence text set and a fourth sentence text set, wherein the third sentence text set and the fourth sentence text set are disjoint;
[0052] Each first sentence text in the first sentence text set is determined as the third query text, and multiple second sentence texts are extracted from the second sentence set as negative samples.
[0053] A predetermined number of sentences in the first and second sentence sets are swapped and randomly concatenated to obtain a third and a fourth sentence set.
[0054] Several sentence texts are extracted as negative samples from the third sentence set and the fourth sentence set, respectively.
[0055] The process of fine-tuning the hybrid expert model based on the fine-tuning dataset to obtain the preset encoder includes:
[0056] The hybrid expert model is fine-tuned based on the fine-tuning dataset to obtain the target expert model;
[0057] The student model is obtained by performing knowledge aggregation and knowledge distillation on the target expert model.
[0058] A preset encoder is constructed based on the student model.
[0059] To address the aforementioned technical problems, this application also provides a dialogue processing system applied to a processing component of an interactive device, wherein the interactive device further includes an input component and a prompting component, and the dialogue processing system includes:
[0060] The processing module is used to convert the query dialogue text input through the input component into a word sequence including multiple word elements when it receives the query dialogue text.
[0061] An encoding module is used to input the word sequence into a preset encoder and convert the word sequence into a query vector through the preset encoder. The preset encoder is an encoder built based on a hybrid expert model.
[0062] The calculation module is used to retrieve the target document vector with the highest similarity to the query vector from the vector database, and obtain the content text corresponding to the target document vector. The vector database stores multiple document vectors, and the document vector is obtained by encoding and indexing the content text.
[0063] The response module is used to combine the content text corresponding to the target document vector with the query dialogue text to obtain the retrieval text, so as to generate the response text corresponding to the query dialogue text based on the retrieval text, and to display the response text through the prompt component.
[0064] To address the aforementioned technical problems, this application also provides an electronic device, comprising:
[0065] Memory, used to store computer programs;
[0066] A processor for executing the computer program to implement the steps of the dialogue processing method as described in any of the above.
[0067] To address the aforementioned technical problems, this application also provides a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the steps of the dialogue processing method described in any of the preceding claims.
[0068] This application provides a dialogue processing method. After the processing component of the interactive device receives the query dialogue text input by the user through the input component, it first segments the query dialogue text to obtain a word sequence including multiple word elements. Then, the word sequence is input into a preset encoder for encoding. Since the preset encoder is an encoder built based on a hybrid expert model, the multiple expert sub-models in the hybrid expert model can process each word element separately, thereby converting the word sequence into a query vector. While increasing the number of parameters to have the ability to process knowledge across multiple application domains, it improves the encoding efficiency, thereby improving the retrieval efficiency, so as to respond to the query dialogue text input by the user in a timely manner and improve the user's experience of using the interactive device.
[0069] This application also provides a dialogue processing system, an electronic device, and a computer-readable storage medium, which have the same beneficial effects as the dialogue processing method described above.
Attached Figure Description
[0070] To more clearly illustrate the embodiments of this application, the accompanying drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0071] Figure 1 is a flowchart illustrating the steps of a dialogue processing method provided in this application;
[0072] Figure 2 is a schematic diagram of the structure of a dialogue processing system provided in this application;
[0073] Figure 3 is a schematic diagram of the structure of an electronic device provided in this application;
[0074] Figure 4 is a schematic diagram of the structure of a computer-readable storage medium provided in this application.
Detailed Implementation
[0075] The core of this application is to provide a dialogue processing method, system, electronic device, and computer-readable storage medium that can improve the encoding efficiency of the retrieval engine while ensuring that the retrieval engine has knowledge processing capabilities in multiple application domains, thereby improving the user experience.
[0076] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0077] First, referring to Figure 1, this application provides a dialogue processing method applied to a processing component of an interactive device, the interactive device further including an input component and a prompting component, the dialogue processing method comprising:
[0078] S101: When a query dialogue text is received through the input component, the query dialogue text is converted to obtain a word sequence including multiple word elements;
[0079] In this embodiment, the query dialogue text is input by the user through the input component of the interactive device. After the query dialogue text is received, it is converted into a token sequence, that is, the text is mapped to a sequence of integer tokens (word elements). For example, assuming the current query dialogue text is "Who is the founder of A?", the token sequence obtained after conversion may be [29871, 72864, 31359, 30210, 38917, 33422, 30882], where each integer (29871, 72864, and so on through 30882) is one token.
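As a rough illustration of the conversion in S101, the following Python sketch maps text to a sequence of integer tokens. The toy vocabulary, the whitespace-based splitting, and the function name text_to_tokens are assumptions made for illustration only; the application does not specify which tokenizer is used, and the ids below are unrelated to the ids in the example above.

```python
# Hypothetical toy vocabulary; a real tokenizer would use a learned subword vocabulary.
VOCAB = {"who": 11, "is": 12, "the": 13, "founder": 14, "of": 15, "a": 16, "?": 17}
UNK_ID = 0  # id used for any piece that is not in the vocabulary


def text_to_tokens(text: str) -> list[int]:
    """Convert query dialogue text into a sequence of integer tokens."""
    # Separate the question mark so it becomes its own piece, then split on spaces.
    pieces = text.lower().replace("?", " ?").split()
    return [VOCAB.get(piece, UNK_ID) for piece in pieces]


tokens = text_to_tokens("Who is the founder of A?")
print(tokens)  # one integer per piece of the query
```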
[0080] S102: Input the word sequence into the preset encoder, and convert the word sequence into a query vector through the preset encoder. The preset encoder is an encoder built based on a hybrid expert model.
[0081] In this embodiment, the encoder is constructed based on a Mixture of Experts (MoE) model, referred to herein as a hybrid expert model. An MoE is a deep learning architecture consisting of multiple expert sub-models and a gating network, and it processes data and tasks by combining the expert sub-models. In an MoE model, each expert sub-model typically focuses on a specific type of data or task, while the gating network determines which expert sub-models process a given input. During inference, the MoE model activates only the expert sub-models most relevant to the current input. This sparse activation mechanism allows the number of learnable parameters to grow without significantly increasing inference cost, thereby improving the model's capacity. In this embodiment, the preset encoder constructed from the hybrid expert model combines different expert sub-models to process different parts of the word sequence during vector conversion, which improves the performance and efficiency of the preset encoder. It can be understood that the query vector carries the semantic information of the query dialogue text.
[0082] S103: Retrieve the target document vector with the highest similarity to the query vector in the vector database, and obtain the content text corresponding to the target document vector. The vector database stores multiple document vectors, and the document vector is obtained by encoding and indexing the content text.
[0083] After the query vector is obtained, the similarity between the query vector and each document vector in the vector database is calculated. The document vector with the highest similarity is determined as the target document vector, and the content text corresponding to the target document vector is then obtained.
[0084] In this embodiment, similarity can be calculated using methods such as inner product or cosine similarity. The appropriate method can be selected based on the actual engineering needs, and this embodiment does not impose any limitations on this method.
[0085] In another embodiment, the similarity between each document vector in the vector database and the query vector can be obtained. The document vectors are sorted in descending order of similarity, and the first k document vectors are selected as the target document vectors, where k is a positive integer. The content text corresponding to the target document vectors is then obtained.
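The retrieval in S103 and its top-k variant can be sketched in Python as follows. This is a minimal sketch that assumes cosine similarity and a small in-memory list of document vectors; the function names cosine_similarity and retrieve_top_k and the sample data are hypothetical, and a real vector database would normally use an index rather than a full scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, doc_vecs, contents, k=1):
    """Return the content texts of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    # Sort by similarity, highest first; ties keep the lower index first.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [contents[i] for _, i in scored[:k]]


# Toy data: three 2-dimensional document vectors and their content texts.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
contents = ["text A", "text B", "text C"]
print(retrieve_top_k([0.9, 0.1], doc_vecs, contents, k=2))
```

With k = 1 this reduces to selecting the single target document vector with the highest similarity.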
[0086] S104: Combine the content text corresponding to the target document vector with the query dialogue text to obtain the retrieval text, so as to generate the response text corresponding to the query dialogue text based on the retrieval text, and display the response text through the prompt component.
[0087] In this embodiment, the content text corresponding to the target document vector is concatenated with the query dialogue text to obtain the retrieval text, which is the output of the retrieval engine and the input of the generator. The generator generates the response text based on the retrieval text, and the response text is displayed through the prompt component so that the user can obtain the response in a timely manner.
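The concatenation in S104 might look like the Python sketch below. The function name build_retrieval_text and the "Context:" / "Question:" layout are assumptions for illustration; the application only states that the content text and the query dialogue text are concatenated.

```python
def build_retrieval_text(content_texts: list[str], query_text: str) -> str:
    """Concatenate retrieved content texts with the query dialogue text."""
    context = "\n".join(content_texts)
    # The "Context:" / "Question:" labels are an illustrative layout only.
    return f"Context:\n{context}\n\nQuestion: {query_text}"


retrieval_text = build_retrieval_text(
    ["A was founded by B."], "Who is the founder of A?"
)
print(retrieval_text)
```

The resulting string is what would be passed to the generator as its input.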
[0088] As can be seen, in this embodiment, after the processing component of the interactive device receives the query dialogue text input by the user through the input component, it first segments the query dialogue text to obtain a word sequence including multiple word elements. Then, the word sequence is input into a preset encoder for encoding. Since the preset encoder is an encoder built based on a hybrid expert model, the multiple expert sub-models in the hybrid expert model can process each word element separately, thereby converting the word sequence into a query vector. While increasing the number of parameters to have the ability to process knowledge across multiple application domains, it also improves the encoding efficiency, thereby improving the retrieval efficiency, so as to respond to the query dialogue text input by the user in a timely manner and improve the user's experience of using the interactive device.
[0089] Based on the above embodiments:
[0090] In one exemplary embodiment, the preset encoder includes a gating network and multiple expert sub-models;
[0091] The process of converting the word sequence into a query vector using the preset encoder includes:
[0092] The expert sub-model corresponding to each word element in the word sequence is determined by the gating network;
[0093] Each word element is assigned to its corresponding expert sub-model to calculate the tensor corresponding to that word element;
[0094] The query vector is obtained from the tensors corresponding to all the word elements.
[0095] In this embodiment, the gating network is used to assign expert sub-models to each token in the token sequence. When processing a token, not all expert sub-models process the token, but only some expert sub-models process the token. That is, for a token, not all expert sub-models participate in tensor calculation, so that although the hybrid expert model has a large number of parameters, the amount of computation does not increase significantly, thereby improving the coding efficiency.
[0096] In an exemplary embodiment, the process of determining the expert sub-model corresponding to each word element in the word sequence using the gating network includes:
[0097] A reference tensor for each word element in the word sequence is determined using the gating network; the number of elements in the reference tensor is the same as the number of expert sub-models.
[0098] For each word element, the elements in the reference tensor of the word element are divided into first elements and second elements. The position of each first element in the reference tensor is determined, and the expert sub-model corresponding to the word element is determined based on that position. The value of each first element is greater than the value of each second element.
[0099] The structure of the MoE model is explained below. A single-layer MoE model structure includes a hidden state layer, a GQA (Grouped-Query Attention) layer, a gating layer, and multiple expert sub-models. Each expert sub-model is a feed-forward network (FFN) with independent parameters.
[0100] Assuming the current token sequence is [2981, 7864, 3135, 3021, 3897, 3342, 3082], in this embodiment the sequence length seq_len is 7 (7 tokens in total), the batch_size is 1, and the hidden_size is 768. The hidden state layer is a 1*7*768 tensor, and the output of the GQA layer is also 1*7*768. The data then passes through the gating layer, which assigns the tokens and outputs a tensor of size seq_len*expert_num, where expert_num is the number of expert sub-models. This output determines which expert sub-models should compute each token (each token corresponding to a 1*768 tensor).
[0101] Suppose that after the first token passes through the gating layer, the output reference tensor is [0.4, 0.2, 0.1, 0.3]. These four elements correspond one-to-one with four expert sub-models: the first element in the reference tensor corresponds to the first expert sub-model, the second element to the second expert sub-model, and so on. The two largest elements are 0.4 and 0.3, so the first and fourth elements of this reference tensor are the first elements, and the second and third elements are the second elements. Therefore, the first token should be calculated by the first and fourth expert sub-models. Inputting the token's 1*768 tensor into the first expert sub-model yields a tensor that is still 1*768, denoted t1. Similarly, the fourth expert sub-model yields t2.
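The selection of expert sub-models from a reference tensor can be sketched in Python as follows. This is a minimal sketch assuming that the two largest elements are taken as the first elements, as in the example above; the function name select_experts and the top_k parameter are hypothetical, and positions are zero-based in the code.

```python
def select_experts(reference: list[float], top_k: int = 2) -> list[int]:
    """Return the zero-based positions of the top_k largest elements (the first elements)."""
    # Rank positions by their value in the reference tensor, largest first.
    order = sorted(range(len(reference)), key=lambda i: reference[i], reverse=True)
    # Report the selected positions in ascending order for readability.
    return sorted(order[:top_k])


# Reference tensor of the first token from the example above.
print(select_experts([0.4, 0.2, 0.1, 0.3]))  # [0, 3]: the first and fourth expert sub-models
```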
[0102] In one exemplary embodiment, when there are multiple first elements:
[0103] The weight of each expert sub-model is determined;
[0104] The process of assigning each word element to the corresponding expert sub-model to calculate the tensor corresponding to the word element includes:
[0105] Each word element is assigned to its multiple corresponding expert sub-models, resulting in multiple intermediate tensors;
[0106] The tensor of the word element is calculated based on the multiple intermediate tensors and the weights of the expert sub-models corresponding to the word element.
[0107] Continuing with the example from the previous embodiments, the tensor of this token needs to be calculated jointly from the outputs and weights of the first and fourth expert sub-models. Specifically, the tensor of the token can be calculated using the first relation: (ua×ta + ub×tb) / (ua + ub), where ua and ub are the values of the two first elements, and ta and tb are the intermediate tensors calculated by the expert sub-models determined by the positions of those two first elements in the reference tensor. The final tensor of the token is therefore (0.4×t1 + 0.3×t2) / (0.4 + 0.3). In this embodiment, the corresponding element in the reference tensor is used as the weight of the expert sub-model. Because these weights change dynamically with each token, the reliability and accuracy of the encoding can be further improved.
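The first relation can be checked numerically with a short Python sketch. The function name combine_expert_outputs is hypothetical, and 3-element lists stand in for the 1*768 intermediate tensors t1 and t2 purely to keep the example small.

```python
def combine_expert_outputs(weights: list[float], tensors: list[list[float]]) -> list[float]:
    """Weighted average of intermediate tensors: sum(u_i * t_i) / sum(u_i)."""
    total = sum(weights)
    size = len(tensors[0])
    return [
        sum(w * t[j] for w, t in zip(weights, tensors)) / total
        for j in range(size)
    ]


# Toy 3-element stand-ins for the 1*768 intermediate tensors t1 and t2.
t1 = [1.0, 2.0, 3.0]
t2 = [8.0, 9.0, 10.0]
# Weights 0.4 and 0.3 are the values of the two first elements in the reference tensor.
token_tensor = combine_expert_outputs([0.4, 0.3], [t1, t2])
print(token_tensor)  # approximately [4.0, 5.0, 6.0]
```

Dividing by the sum of the selected weights renormalizes them so that they add up to 1 for each token.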
[0108] Each token is processed in this way, resulting in 7 tensors of size 1*768. After reshape operations, they are combined back into a 1*7*768 tensor, which serves as the next hidden state.
[0109] In one exemplary embodiment, before inputting the word sequence into the preset encoder, the dialogue processing method further includes:
[0110] Build a multi-domain knowledge database;
[0111] Determine the fine-tuning dataset corresponding to the current application domain scenario from a multi-domain knowledge database;
[0112] The hybrid expert model is fine-tuned based on the fine-tuning dataset to obtain the preset encoder.
[0113] First, relevant data is obtained based on the preset application domain, and a multi-domain knowledge database is created. A portion of the data is extracted from the multi-domain knowledge database to establish a fine-tuning dataset. The hybrid expert model is then fine-tuned using the fine-tuning dataset to obtain the preset encoder.
[0114] Specifically, a set of documents can be selected from a multi-domain knowledge database and segmented into text paragraphs (e.g., each paragraph contains 1000-2000 tokens) to serve as a fine-tuning dataset.
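A simple way to segment a tokenized document into paragraphs is sketched below in Python. The function name split_into_paragraphs and the fixed max_tokens cut are assumptions; this sketch only enforces the upper bound, so the final paragraph may be shorter than 1000 tokens, and handling that case (for example by merging it with the previous paragraph) is left out.

```python
def split_into_paragraphs(tokens: list[int], max_tokens: int = 2000) -> list[list[int]]:
    """Split a token sequence into consecutive paragraphs of at most max_tokens tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


# A stand-in document of 4500 tokens.
paragraphs = split_into_paragraphs(list(range(4500)), max_tokens=2000)
print([len(p) for p in paragraphs])  # [2000, 2000, 500]
```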
[0115] In one exemplary embodiment, the process of constructing a multi-domain knowledge database includes:
[0116] Obtain the query types corresponding to various application domain scenarios;
[0117] Retrieve multiple related data for each application domain scenario based on the query type corresponding to each application domain scenario;
[0118] A multi-domain knowledge database is built based on all related data.
[0119] The query type is used to determine the fine-tuning task and dataset for the model. After the query type is obtained, related data can be obtained from textbooks, papers, and knowledge blogs. This related data is then processed into text content. For example, PDFs of papers and textbooks can be collected online, downloaded in batches, and processed into text content, and knowledge blogs can be collected using web crawlers. After sufficient data has been collected, preprocessing and data cleaning are performed, and then a vector database is constructed. Generally, this text needs to be segmented and then vectorized.
In an exemplary embodiment, the process of constructing a multi-domain knowledge database based on all related data includes:
[0120] Multiple content texts are obtained based on multiple related data;
[0121] Each content text is vectorized using the preset encoder to obtain a document vector;
[0122] A multi-domain knowledge database is constructed based on all document vectors, and this multi-domain knowledge database is used as a vector database.
[0123] In this embodiment, a preset encoder is used to vectorize the content text, which can improve the efficiency of vectorization processing.
[0124] In one exemplary embodiment, the process of fine-tuning a hybrid expert model based on a fine-tuning dataset to obtain a preset encoder includes:
[0125] Construct positive and negative samples based on the fine-tuning dataset;
[0126] The hybrid expert model is fine-tuned based on positive samples, negative samples, and a preset loss function.
[0127] As will be understood, fine-tuning is a training process performed on a small dataset for a specific downstream task, involving the calculation of the loss function and the updating of model parameters. The progress of fine-tuning can be judged from the decrease in loss. The loss function can be a contrastive learning loss function.
[0128] In one exemplary embodiment, the process of constructing positive and negative samples based on the fine-tuning dataset includes:
[0129] Extract the first and second text paragraphs from the fine-tuning dataset;
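One common contrastive learning loss is the InfoNCE loss, sketched below in Python for a single query. The application does not name a specific loss, so the choice of InfoNCE, the temperature value, and the function name info_nce_loss are all assumptions for illustration. The inputs are similarity scores between the query and its positive sample and between the query and each negative sample.

```python
import math


def info_nce_loss(sim_pos: float, sim_negs: list[float], temperature: float = 0.05) -> float:
    """InfoNCE-style loss for one query: -log(exp(s+/T) / (exp(s+/T) + sum exp(s-/T)))."""
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    max_logit = max(logits)  # subtract the maximum for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return log_sum - logits[0]


# The loss is small when the positive sample scores higher than the negatives ...
print(info_nce_loss(0.9, [0.1, 0.2]))
# ... and large when a negative sample scores higher than the positive.
print(info_nce_loss(0.1, [0.9, 0.2]))
```

Minimizing such a loss pushes the query vector toward its positive sample and away from the negative samples.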
[0130] The first text paragraph is divided into sentences to obtain the first sentence set, which includes multiple first sentence texts;
[0131] The second text paragraph is divided into sentences to obtain a second sentence set, which includes multiple second sentence texts;
[0132] Extract a first sentence text from the first sentence set as the first query text, determine all first sentence texts in the first sentence set except the first query text as the first search content, and determine the first query text and the first search content as positive samples;
[0133] Extract one second sentence text from the second sentence set as the second query text, determine all second sentence texts in the second sentence set except for the second query text as the second search content, and determine the second query text and the second search content as positive samples;
[0134] Extract several first sentence texts from the first sentence set to form a first sentence text set and a second sentence text set. The first sentence text set and the second sentence text set are disjoint.
[0135] Extract several second sentence texts from the second sentence set to form a third sentence text set and a fourth sentence text set. The third sentence text set and the fourth sentence text set are disjoint.
[0136] Each first sentence text in the first sentence text set is identified as the third query text, and multiple second sentence texts are extracted from the second sentence set as negative samples.
[0137] The texts of a predetermined number of sentences in the first and second sentence sets are swapped and randomly concatenated to obtain the third and fourth sentence sets;
[0138] Several sentence texts are extracted from the third and fourth sentence sets respectively as negative samples.
[0139] In this embodiment, two text segments P1 and P2 are extracted from this fine-tuning dataset each time, and positive and negative samples are constructed respectively in the following manner:
[0140] Positive samples: Divide the two text paragraphs into sentences respectively, obtaining the first sentence set P1 and the second sentence set P2, where P1 = {s_1, ..., s_M} and P2 = {t_1, ..., t_N}, s_i is the i-th first sentence text, t_j is the j-th second sentence text, i = 1, 2, ..., M, and j = 1, 2, ..., N;
[0141] Randomly select one first sentence text s_i from P1, take s_i as the query sample and the remaining first sentence texts in P1 as the search content, and form a positive sample pair. Similarly, construct a positive sample pair from P2. Each positive sample pair contains one query sample and one positive sample. Then randomly select two disjoint sentence text sets from P1, namely the first sentence text set {s_k, ..., s_l} and the second sentence text set {s_m, ..., s_n}. Treat the first sentence texts in the first and second sentence text sets as the query sample and the retrieved content, respectively, and combine them into a positive sample pair. Construct another positive sample pair from P2 using the same method.
[0142] Negative samples: Each first sentence text in the first sentence text set {s_k, ..., s_l} extracted from P1 above is treated as a query sample, and multiple sentence texts {t_m, ..., t_n} are extracted from P2 as negative samples. In addition, swap 50% of the sentences in P1 and P2 and randomly concatenate them to obtain P1' and P2'. Randomly select several sentences from P1' to form a set {s_k', ..., s_l'}, treat each sentence text in the set as a negative sample, and select several negative samples from P2' using the same method.
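The sampling scheme described above can be sketched as follows. This is an illustrative sketch only, not the patent's implementation: the function names, the period-based sentence splitter, the choice of half-sized subsets, and the `swap_ratio` parameter are all assumptions introduced here. It expects each paragraph to contain at least two sentences.

```python
import random

def split_sentences(paragraph):
    # Naive splitter on '.', an assumption; any sentence segmenter could be used.
    return [s.strip() for s in paragraph.split(".") if s.strip()]

def build_samples(p1_text, p2_text, rng, swap_ratio=0.5):
    p1, p2 = split_sentences(p1_text), split_sentences(p2_text)
    positives, negatives = [], []
    for sents in (p1, p2):
        # One sentence as the query, the rest of the same paragraph as search content.
        i = rng.randrange(len(sents))
        positives.append((sents[i], sents[:i] + sents[i + 1:]))
        # Two disjoint subsets of the same paragraph form another positive pair.
        shuffled = rng.sample(sents, len(sents))
        half = len(shuffled) // 2
        positives.append((shuffled[:half], shuffled[half:]))
    # Cross-paragraph negatives: queries from P1, negatives drawn from P2.
    for q in rng.sample(p1, max(1, len(p1) // 2)):
        negatives.append((q, rng.sample(p2, max(1, len(p2) // 2))))
    # Swap a share of sentences between P1 and P2, then shuffle to get P1' and P2'.
    n_swap = int(min(len(p1), len(p2)) * swap_ratio)
    idx1 = rng.sample(range(len(p1)), n_swap)
    idx2 = rng.sample(range(len(p2)), n_swap)
    p1_mixed, p2_mixed = list(p1), list(p2)
    for a, b in zip(idx1, idx2):
        p1_mixed[a], p2_mixed[b] = p2[b], p1[a]
    rng.shuffle(p1_mixed)
    rng.shuffle(p2_mixed)
    mixed_negatives = (rng.sample(p1_mixed, max(1, len(p1_mixed) // 2))
                       + rng.sample(p2_mixed, max(1, len(p2_mixed) // 2)))
    return positives, negatives, mixed_negatives
```

Passing a seeded `random.Random` instance keeps the sampling reproducible. No manual labels are needed, since the pairing relies only on which paragraph a sentence came from.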
[0143] For a batch of data (containing a query text mentioned above, a positive sample, and several negative samples), the fine-tuning loss is calculated based on the preset target loss function, and the fine-tuning progress is adjusted based on the fine-tuning loss.
[0144] The preset objective function is:
[0145]
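The formula itself does not appear in this text. One form consistent with the symbols defined in the following paragraph is the widely used InfoNCE-style contrastive loss. It is given here as an assumption, not as the patent's exact expression:

```latex
L(q, y^{+}) = -\log \frac{\exp\left(s(q, y^{+})/\tau\right)}
{\exp\left(s(q, y^{+})/\tau\right) + \sum_{f=1}^{K} \exp\left(s(q, y_{f})/\tau\right)}
```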
[0146] Where q is the query sample, y^+ is the positive sample, y_f is the f-th negative sample, K is the total number of negative samples, f = 1, 2, ..., K, s(q, y^+) represents the similarity between the query sample and the positive sample, τ is the temperature coefficient, L(q, y^+) represents the loss function value, exp represents the exponential function, and s(q, y_f) represents the similarity between the query sample and the f-th negative sample.
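As a minimal sketch of how such a loss could be evaluated for one query, one positive sample, and K negative samples, the code below assumes the common InfoNCE form, -log(exp(s+/τ) / (exp(s+/τ) + Σ exp(s_f/τ))), with cosine similarity as s. Both choices, the function names, and the default temperature are assumptions made here for illustration.

```python
import math

def cosine(u, v):
    # Cosine similarity between two non-zero vectors of equal length.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def contrastive_loss(q, y_pos, y_negs, tau=0.05):
    # Assumed InfoNCE form; the positive logit comes first in the list.
    logits = [cosine(q, y_pos) / tau] + [cosine(q, y) / tau for y in y_negs]
    m = max(logits)  # subtract the maximum for numerical stability
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[0]
```

The loss approaches zero when the query is much closer to the positive sample than to any negative sample, and grows when a negative sample is closer.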
[0147] In this embodiment, when fine-tuning the hybrid expert model, there is no need for manual data labeling. At the same time, considering that the text content from the same text paragraph has a high similarity, positive and negative samples are selected to fine-tune the hybrid expert model, thereby improving the fine-tuning efficiency and making the coding results of the hybrid expert model more accurate.
[0148] In one exemplary embodiment, the process of fine-tuning a hybrid expert model based on a fine-tuning dataset to obtain a preset encoder includes:
[0149] The hybrid expert model is fine-tuned based on the fine-tuning dataset to obtain the target expert model;
[0150] The student model is obtained by performing knowledge aggregation and knowledge distillation on the target expert model.
[0151] The preset encoder is built based on the student model.
[0152] In this embodiment, knowledge aggregation and knowledge distillation are also performed on the pre-trained and fine-tuned MoE model to obtain a dense student model. The student model integrates the knowledge of all experts in the MoE model, and has fewer parameters, higher inference efficiency, and is easier to deploy and apply in practice.
[0153] The goal of knowledge aggregation is to merge the knowledge of the different experts in the MoE model into the single student model. In knowledge aggregation, some layers of the student model (such as embedding layers, attention layers, and normalization layers) directly copy the parameters of the teacher model. However, for the MoE layers of the teacher model, which contain multiple experts, the student model needs to merge the experts' knowledge using a specific aggregation method. Since each expert is an FFN (feed-forward network), aggregation methods such as summation or averaging can be used to combine the weight matrices of the multiple experts into a single weight matrix for the student model. Knowledge distillation aims to fine-tune the student model to minimize the difference between the teacher model's output and the student model's output. A KL (Kullback-Leibler) divergence loss function guides the student model to learn the output distribution of the teacher model. The final loss function combines the loss from the main task and the distillation loss.
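The averaging variant of knowledge aggregation can be illustrated with a short sketch. It assumes that every expert FFN weight matrix has the same shape and is represented as a list of rows. The function name is hypothetical.

```python
def average_expert_weights(expert_weights):
    # expert_weights: one weight matrix (list of rows) per expert, all the same shape.
    n = len(expert_weights)
    rows, cols = len(expert_weights[0]), len(expert_weights[0][0])
    # Element-wise mean across experts yields a single dense FFN weight matrix.
    return [[sum(w[r][c] for w in expert_weights) / n for c in range(cols)]
            for r in range(rows)]
```

Summation would drop the division by `n`. Averaging keeps the merged weights on the same scale as each individual expert.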
[0154] The solution of this invention can construct a general-purpose retrieval engine with knowledge understanding capabilities across multiple domains. First, relevant domains are determined based on task requirements, and training datasets for those domains are constructed. A MoE model is then trained as the teacher model. Next, knowledge is integrated from the MoE model to obtain a dense student model, which ultimately serves as the encoding model for the retrieval engine. This approach enables the retrieval engine to understand knowledge across multiple domains, providing versatility and a wider range of applications without requiring fine-tuning of a corresponding encoder model for each domain. Simultaneously, it ensures the retrieval engine's operational efficiency and makes it more suitable for deployment.
[0155] Secondly, referring to Figure 2, this application also provides a dialogue processing system applied to a processing component of an interactive device, the interactive device further including an input component and a prompting component, the dialogue processing system comprising:
[0156] Processing module 11 is used to convert the query dialogue text input through the input component into a word sequence including multiple word elements when it receives the query dialogue text.
[0157] Encoding module 12 is used to input the word sequence into the preset encoder and convert the word sequence into a query vector through the preset encoder. The preset encoder is an encoder built based on a hybrid expert model.
[0158] The calculation module 13 is used to retrieve the target document vector with the highest similarity to the query vector in the vector database, and obtain the content text corresponding to the target document vector. The vector database stores multiple document vectors, and the document vector is obtained by encoding and indexing the content text.
[0159] The response module 14 is used to combine the content text corresponding to the target document vector with the query dialogue text to obtain the retrieval text, so as to generate the response text corresponding to the query dialogue text based on the retrieval text, and to prompt the response text through the prompt component.
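The calculation and response modules can be sketched roughly as follows. The sketch assumes cosine similarity as the similarity measure, an in-memory list of (document vector, content text) pairs in place of a real vector database, and a hypothetical prompt layout. None of these details are specified in the text.

```python
import math

def cosine(u, v):
    # Cosine similarity between two non-zero vectors of equal length.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def retrieve(query_vec, index):
    # index: list of (document_vector, content_text) pairs.
    # Returns the content text whose vector is most similar to the query vector.
    return max(index, key=lambda item: cosine(query_vec, item[0]))[1]

def build_retrieval_text(query_text, content_text):
    # Hypothetical layout for combining retrieved content with the query.
    return f"Context: {content_text}\nQuestion: {query_text}"
```

The resulting retrieval text would then be passed to the generative model to produce the response text.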
[0160] In one exemplary embodiment, the preset encoder includes a gating network and multiple expert sub-models;
[0161] The process of converting a sequence of terms into a query vector using a preset encoder includes:
[0162] The expert sub-model corresponding to each word in the word sequence is determined by a gating network;
[0163] Each word is assigned to the corresponding expert sub-model to calculate the tensor corresponding to the word;
[0164] The query vector is obtained from the tensors corresponding to all tokens.
[0165] In an exemplary embodiment, the process of determining the expert sub-model corresponding to each word in a word sequence using a gating network includes:
[0166] A reference tensor for each word in the word sequence is determined using a gating network; the number of elements in the reference tensor is the same as the number of expert sub-models.
[0167] For each word, the elements in the reference tensor of the word are divided into a first element and a second element. The position of the first element in the reference tensor is determined, and the expert sub-model corresponding to the word is determined based on the position. The value of the first element is greater than the value of the second element.
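One way to read this step is as top-k selection over the gate scores: the "first elements" are the k largest entries of the reference tensor, and their positions identify the chosen experts. The sketch below follows that reading. The function name and the parameter `k` are assumptions.

```python
def select_experts(reference, k=1):
    # reference: one gate score per expert for a single word.
    # Positions of the k largest scores identify the selected expert sub-models.
    order = sorted(range(len(reference)), key=lambda i: reference[i], reverse=True)
    return order[:k]
```

For example, with `reference = [0.1, 0.7, 0.2]` and `k = 1`, the expert at position 1 is selected.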
[0168] In one exemplary embodiment, when there are multiple first elements:
[0169] Determine the weights of each expert sub-model;
[0170] The process of assigning each word to the corresponding expert sub-model to calculate the tensor corresponding to the word includes:
[0171] Each word element is assigned to multiple corresponding expert sub-models, resulting in multiple intermediate tensors;
[0172] The tensor of a word is calculated based on the multiple intermediate tensors and the weights of the expert sub-models corresponding to the word.
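A minimal sketch of the weighted combination follows. It assumes that the expert weights are obtained by a softmax over the selected gate scores, which the text does not specify. Experts are modelled as plain Python callables, and all names are hypothetical.

```python
import math

def moe_token_output(token_vec, reference, experts, k=2):
    # Pick the k experts with the largest gate scores for this word.
    top = sorted(range(len(reference)), key=lambda i: reference[i], reverse=True)[:k]
    # Softmax over the selected scores gives the expert weights (an assumption).
    m = max(reference[i] for i in top)
    exps = [math.exp(reference[i] - m) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Each selected expert produces one intermediate tensor for the word.
    intermediates = [experts[i](token_vec) for i in top]
    # The word's tensor is the weighted sum of the intermediate tensors.
    return [sum(w * t[d] for w, t in zip(weights, intermediates))
            for d in range(len(intermediates[0]))]
```

With two selected experts that have equal gate scores, the output is the plain average of their intermediate tensors.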
[0173] In an exemplary embodiment, the process of determining the reference tensor for each word in a word sequence using a gating network includes:
[0174] Obtain the sequence length of the word sequence and the number of expert sub-models;
[0175] The reference tensor for each word is determined by a gating network based on the sequence length and the number of expert sub-models.
[0176] In one exemplary embodiment, the dialogue processing system further includes:
[0177] The pre-built module is used to construct a multi-domain knowledge database, determine the fine-tuning dataset corresponding to the current application domain scenario from the multi-domain knowledge database, and fine-tune the hybrid expert model based on the fine-tuning dataset to obtain the preset encoder.
[0178] In one exemplary embodiment, the process of constructing a multi-domain knowledge database includes:
[0179] Obtain the query types corresponding to various application domain scenarios;
[0180] Retrieve multiple related data for each application domain scenario based on the query type corresponding to each application domain scenario;
[0181] A multi-domain knowledge database is built based on all related data.
[0182] In one exemplary embodiment, the process of constructing a multi-domain knowledge database based on all associated data includes:
[0183] Multiple content texts are obtained based on multiple related data;
[0184] Each piece of text is vectorized to obtain a document vector;
[0185] A multi-domain knowledge database is constructed based on all document vectors, and this multi-domain knowledge database is used as a vector database.
[0186] In one exemplary embodiment, the process of vectorizing each piece of content text includes:
[0187] The content text is vectorized using a preset encoder.
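The indexing step can be sketched as below. `toy_encode` is a hypothetical stand-in for the preset encoder (a fixed-size bag-of-words vector), used only so the example runs on its own. A real system would call the trained encoder and store the vectors in a vector database.

```python
def toy_encode(text, dim=8):
    # Hypothetical stand-in for the preset encoder: bucketed bag of words.
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % dim] += 1.0
    return vec

def build_vector_database(content_texts, encode=toy_encode):
    # Store each document vector together with the content text it encodes.
    return [(encode(text), text) for text in content_texts]
```

Using the same encoder for content texts and queries keeps both in one vector space, which is what makes the similarity comparison at query time meaningful.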
[0188] In one exemplary embodiment, the process of fine-tuning a hybrid expert model based on a fine-tuning dataset to obtain a preset encoder includes:
[0189] Construct positive and negative samples based on the fine-tuning dataset;
[0190] The hybrid expert model is fine-tuned based on positive samples, negative samples, and a preset loss function.
[0191] In one exemplary embodiment, the process of constructing positive and negative samples based on the fine-tuning dataset includes:
[0192] Extract a first text paragraph and a second text paragraph from the fine-tuning dataset;
[0193] The first text paragraph is divided into sentences to obtain the first sentence set, which includes multiple first sentence texts;
[0194] The second text paragraph is divided into sentences to obtain a second sentence set, which includes multiple second sentence texts;
[0195] Extract a first sentence text from the first sentence set as the first query text, determine all first sentence texts in the first sentence set except the first query text as the first search content, and determine the first query text and the first search content as positive samples;
[0196] Extract one second sentence text from the second sentence set as the second query text, determine all second sentence texts in the second sentence set except for the second query text as the second search content, and determine the second query text and the second search content as positive samples;
[0197] Extract several first sentence samples from the first sentence set to form a first sentence text set and a second sentence text set. The first sentence text set and the second sentence text set do not intersect.
[0198] Extract several second sentence samples from the second sentence set to form the third sentence text set and the fourth sentence text set. The third sentence text set and the fourth sentence text set are disjoint.
[0199] Each first sentence text in the first sentence text set is identified as the third query text, and multiple second sentence texts are extracted from the second sentence set as negative samples.
[0200] The texts of a predetermined number of sentences in the first and second sentence sets are swapped and randomly concatenated to obtain the third and fourth sentence sets;
[0201] Several sentence texts are extracted from the third and fourth sentence sets respectively as negative samples.
[0202] In one exemplary embodiment, the process of fine-tuning a hybrid expert model based on a fine-tuning dataset to obtain a preset encoder includes:
[0203] The hybrid expert model is fine-tuned based on the fine-tuning dataset to obtain the target expert model;
[0204] The student model is obtained by performing knowledge aggregation and knowledge distillation on the target expert model.
[0205] The preset encoder is built based on the student model.
[0206] While MoE can learn richer knowledge, it also suffers from drawbacks such as overfitting, deployment difficulties, and high costs associated with expert parallelism, making it relatively difficult to use in practice. Knowledge aggregation and knowledge distillation aim to distill dense student models using sparse teacher models, where the student model is a dense neural network with an architecture similar to the MoE teacher model.
[0207] The goal of the knowledge aggregation phase is to merge the knowledge of the different experts in the MoE model into the single student model. During this phase, some layers of the student model (such as the embedding layer, attention layer, and normalization layer) directly copy the parameters of the teacher model. However, for the teacher model's MoE layers, which contain multiple experts, the student model needs to merge the experts' knowledge using a specific aggregation method. Since each expert is an FFN (feed-forward network), aggregation methods such as summation or averaging can be used to combine the weight matrices of the multiple experts into a single weight matrix for the student model.
[0208] The goal of the knowledge distillation phase is to fine-tune the student model to minimize the difference between the teacher model's output and the student model's output. The student model learns the teacher model's output distribution using the KL divergence loss function. The final loss function combines the loss from the main task and the distillation loss.
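The combined objective can be sketched as a weighted sum of the main-task loss and a KL divergence between teacher and student output distributions. The sketch treats the model outputs as score vectors that are normalized with a softmax. That treatment, the mixing weight `alpha`, and the `temperature` are assumptions not stated in the text.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax, shifted by the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q), with p the teacher distribution and q the student distribution.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def total_loss(task_loss, teacher_logits, student_logits, alpha=0.5, temperature=2.0):
    # alpha and temperature are hypothetical hyperparameters.
    distill = kl_divergence(softmax(teacher_logits, temperature),
                            softmax(student_logits, temperature))
    return alpha * task_loss + (1 - alpha) * distill
```

When the student reproduces the teacher's distribution exactly, the distillation term vanishes and only the weighted main-task loss remains.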
[0209] Thirdly, referring to Figure 3, this application also provides an electronic device, including:
[0210] Memory 21 is used to store computer programs;
[0211] The processor 22 is configured to execute a computer program to implement the steps of the dialogue processing method as described in any of the embodiments above.
[0212] The electronic device also includes:
[0213] Input interface 23, connected to processor 22 via communication bus 26, is used to acquire externally imported computer programs, parameters, and instructions, and save them to memory 21 under the control of processor 22. This input interface can be connected to an input device to receive parameters or instructions manually entered by the user. This input device can be a touch layer covering the display screen, or buttons, a trackball, or a touchpad mounted on the terminal casing.
[0214] Display unit 24 is connected to processor 22 via communication bus 26 and is used to display data sent by processor 22. This display unit can be a liquid crystal display screen or an electronic ink display screen, etc.
[0215] Network port 25 is connected to processor 22 via communication bus 26 and is used for communication with external terminal devices. The connection can use wired or wireless communication technology, such as Mobile High-Definition Link (MHL), Universal Serial Bus (USB), High-Definition Multimedia Interface (HDMI), Wi-Fi (wireless fidelity), Bluetooth, Bluetooth Low Energy, or communication technology based on IEEE 802.11s.
[0216] Fourthly, referring to Figure 4, this application also provides a computer-readable storage medium 30 on which a computer program 31 is stored. When the computer program 31 is executed by a processor, it implements the steps of the dialogue processing method as described in any of the embodiments above.
[0217] The computer-readable storage medium 30 may include various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0218] Fifthly, this application also provides a computer program product, including a computer program / instructions that, when executed by a processor, implement the steps of the dialogue processing method described in any of the above embodiments.
[0219] It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0220] The above description of the disclosed embodiments enables those skilled in the art to make or use this application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A dialogue processing method, characterized in that, A processing component applied to an interactive device, the interactive device further including an input component and a prompting component, the dialogue processing method comprising: When a query dialogue text is received through the input component, the query dialogue text is converted to obtain a word sequence including multiple word elements; The word sequence is input into a preset encoder, which converts the word sequence into a query vector. The preset encoder is an encoder built based on a hybrid expert model. The target document vector with the highest similarity to the query vector is retrieved from the vector database, and the content text corresponding to the target document vector is obtained. The vector database stores multiple document vectors, and the document vector is obtained by encoding and indexing the content text. The content text corresponding to the target document vector is combined with the query dialogue text to obtain the retrieval text, so as to generate the response text corresponding to the query dialogue text based on the retrieval text, and the response text is displayed through the prompt component; The preset encoder includes a gating network and multiple expert sub-models; The process of converting the word sequence into a query vector using the preset encoder includes: The expert sub-model corresponding to each word in the word sequence is determined by the gating network. Each lexical unit is assigned to the corresponding expert sub-model to calculate the tensor corresponding to the lexical unit; The query vector is obtained from the tensors corresponding to all the given terms.
2. The dialogue processing method according to claim 1, characterized in that, The process of determining the expert sub-model corresponding to each word in the word sequence through the gating network includes: The reference tensor for each term in the term sequence is determined by the gating network; the number of elements in the reference tensor is the same as the number of expert sub-models. For each lexical unit, each element in the reference tensor of the lexical unit is divided into a first element and a second element. The position of the first element in the reference tensor is determined, and the expert sub-model corresponding to the lexical unit is determined based on the position. The value of the first element is greater than the value of the second element.
3. The dialogue processing method according to claim 2, characterized in that, When there are multiple first elements: Determine the weights of each of the expert sub-models; The process of assigning each lexical unit to the corresponding expert sub-model to calculate the tensor corresponding to the lexical unit includes: Each lexical unit is assigned to the multiple corresponding expert sub-models to obtain multiple intermediate tensors; The tensor of the lexical unit is calculated based on the multiple intermediate tensors and the weights of the expert sub-models corresponding to the lexical unit.
4. The dialogue processing method according to claim 2, characterized in that, The process of determining the reference tensor of each term in the term sequence through the gating network includes: Obtain the sequence length of the lexical sequence and the number of expert sub-models; The reference tensor for each term is determined by the gating network according to the sequence length and the number of expert sub-models.
5. The dialogue processing method according to any one of claims 1-4, characterized in that, Before inputting the word sequence into the preset encoder, the dialogue processing method further includes: Build a multi-domain knowledge database; Determine the fine-tuning dataset corresponding to the current application domain scenario from the multi-domain knowledge database; The hybrid expert model is fine-tuned based on the fine-tuning dataset to obtain the preset encoder.
6. The dialogue processing method according to claim 5, characterized in that, The process of building a multi-domain knowledge database includes: Obtain the query types corresponding to various application domain scenarios; Based on the query type corresponding to each application domain scenario, obtain multiple related data for that application domain scenario; A multi-domain knowledge database is constructed based on all the aforementioned related data.
7. The dialogue processing method according to claim 6, characterized in that, The process of constructing a multi-domain knowledge database based on all the aforementioned related data includes: Multiple content texts are obtained based on the multiple sets of associated data; Each piece of content text is vectorized to obtain a document vector; A multi-domain knowledge database is constructed based on all the document vectors, and the multi-domain knowledge database is used as the vector database.
8. The dialogue processing method according to claim 7, characterized in that, The process of vectorizing each of the aforementioned content texts includes: The content text is vectorized using the preset encoder.
9. The dialogue processing method according to claim 5, characterized in that, The process of fine-tuning the hybrid expert model based on the fine-tuning dataset to obtain the preset encoder includes: Positive and negative samples are constructed based on the fine-tuned dataset; The hybrid expert model is fine-tuned based on the positive samples, the negative samples, and the preset loss function.
10. The dialogue processing method according to claim 9, characterized in that, The process of constructing positive and negative samples based on the fine-tuned dataset includes: Extract the first and second text paragraphs from the fine-tuned dataset; The first text paragraph is divided into sentences to obtain a first sentence set, which includes multiple first sentence texts; The second text paragraph is divided into sentences to obtain a second sentence set, which includes multiple second sentence texts; Extract one first sentence text from the first sentence set as the first query text, determine all first sentence texts in the first sentence set except the first query text as the first search content, and determine the first query text and the first search content as positive samples; Extract one second sentence text from the second sentence set as the second query text, determine all second sentence texts in the second sentence set except for the second query text as the second search content, and determine the second query text and the second search content as positive samples; Several first sentence samples are extracted from the first sentence set to form a first sentence text set and a second sentence text set, and the first sentence text set and the second sentence text set do not intersect; Several second sentence samples are extracted from the second sentence set to form a third sentence text set and a fourth sentence text set, wherein the third sentence text set and the fourth sentence text set are disjoint; Each first sentence text in the first sentence text set is determined as the third query text, and multiple second sentence texts are extracted from the second sentence set as negative samples. A predetermined number of sentences in the first and second sentence sets are swapped and randomly concatenated to obtain a third and a fourth sentence set. 
Several sentence texts are extracted as negative samples from the third sentence set and the fourth sentence set, respectively.
11. The dialogue processing method according to claim 5, characterized in that, The process of fine-tuning the hybrid expert model based on the fine-tuning dataset to obtain the preset encoder includes: The hybrid expert model is fine-tuned based on the fine-tuning dataset to obtain the target expert model; The student model is obtained by performing knowledge aggregation and knowledge distillation on the target expert model. A preset encoder is constructed based on the student model.
12. A dialogue processing system, characterized in that, A processing component applied to an interactive device, the interactive device further including an input component and a prompting component, the dialogue processing system comprising: The processing module is used to convert the query dialogue text input through the input component into a word sequence including multiple word elements when it receives the query dialogue text. An encoding module is used to input the word sequence into a preset encoder and convert the word sequence into a query vector through the preset encoder. The preset encoder is an encoder built based on a hybrid expert model. The calculation module is used to retrieve the target document vector with the highest similarity to the query vector from the vector database, and obtain the content text corresponding to the target document vector. The vector database stores multiple document vectors, and the document vector is obtained by encoding and indexing the content text. The response module is used to combine the content text corresponding to the target document vector with the query dialogue text to obtain the retrieval text, so as to generate the response text corresponding to the query dialogue text based on the retrieval text, and to prompt the response text through the prompt component. The preset encoder includes a gating network and multiple expert sub-models; The process of converting the word sequence into a query vector using the preset encoder includes: The expert sub-model corresponding to each word in the word sequence is determined by the gating network. Each lexical unit is assigned to the corresponding expert sub-model to calculate the tensor corresponding to the lexical unit; The query vector is obtained from the tensors corresponding to all the given terms.
13. An electronic device, characterized in that it includes: A memory, used to store computer programs; A processor, configured to implement the steps of the dialogue processing method as described in any one of claims 1-11 when executing the computer program.
14. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program that, when executed by a processor, implements the steps of the dialogue processing method as described in any one of claims 1-11.