A dialogue target-oriented conditional variational autoencoder dialogue recommendation method and system

By combining the dialogue goal and external knowledge with a conditional variational autoencoder model, the method addresses the difficulty traditional recommendation systems have in capturing the dynamic preferences of new users, as well as the tendency of generative systems to produce meaningless answers, thereby achieving higher-quality dialogue generation.

CN115687586B | Active | Publication Date: 2026-06-12 | BEIHANG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIHANG UNIV
Filing Date
2022-10-28
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Traditional recommendation systems struggle to dynamically capture user preferences when faced with new users, and generative dialogue systems are prone to generating meaningless answers, suffer from severe diversity issues, and fail to fully utilize external information.

Method used

A conditional variational autoencoder model is adopted, which combines the dialogue objective, historical dialogue text and external knowledge. The encoder extracts contextual features, constructs latent variables and minimizes KL loss, combines a knowledge selection module to filter duplicate knowledge, and uses a decoder to generate answers.

🎯Benefits of technology

It improves the diversity and quality of responses, reduces noise, generates responses that better meet user expectations, and enhances fluency and relevance.

✦ Generated by Eureka AI based on patent content.

Figure CN115687586B_ABST

Abstract

The application relates to a dialogue target-oriented conditional variational autoencoder dialogue recommendation method and system, comprising the following steps: a target-oriented conditional variational autoencoder model is constructed, and latent variables carrying a global semantic representation are obtained by fully utilizing the dialogue target information; a knowledge selection module is then constructed in which the latent variables assist external knowledge selection, so that relevant external knowledge is better fused into the decoder, effectively improving the quality of dialogue recommendation. The method can better model the global semantic representation, achieves diversity of answers, and uses the global semantic representation to assist external knowledge selection, filtering out a large amount of irrelevant external knowledge and reducing noise.

Description

Technical Field

[0001] This invention relates to the field of dialogue recommendation, and more specifically to a conditional variational autoencoder dialogue recommendation method and system oriented towards dialogue goals.

Background Technology

[0002] In many practical applications, traditional recommender systems estimate user preferences statically by observing past user behavior (e.g., click history, access logs). In certain situations, however, this static model cannot provide reliable estimates because of its inherent limitations (e.g., it is unfriendly to new users and has difficulty capturing changes in user preferences from past data). Researchers have therefore proposed a more advanced recommendation mechanism: the dialogue recommender system. It leverages multi-turn dialogue to support richer interactions, allowing the chatbot to ask questions and receive feedback from users, dynamically capturing user preferences and thus mitigating the cold-start problem. These advantages give dialogue recommender systems significant practical potential and have attracted increasing attention from academia and industry.

[0003] The key to a dialogue recommendation system lies in eliciting user preferences through multi-turn dialogue, so it must be built on a dialogue system. Current dialogue systems mainly comprise dialogue retrieval systems and dialogue generation systems. A dialogue retrieval system retrieves the best-matching answer from an existing candidate database given the historical dialogue. If a high-quality candidate database is built in advance, it can produce fluent and informative answers; if the candidate database is of low quality, however, it may return irrelevant answers. A dialogue generation system, by contrast, generates answers flexibly from the historical dialogue, so its answers are more relevant to that dialogue, which makes it more common in dialogue recommendation systems. However, generative dialogue systems have long faced a diversity problem. Because generic answers occur frequently in the training samples, because probabilistic sampling makes common words more likely at each position, and because relevant external information to guide the model is lacking, generative dialogue systems tend to produce universally acceptable but meaningless answers instead of moving the dialogue toward a more engaging stage. This easily bores users and leads them to end the dialogue. To address this issue, some studies have introduced more valuable external information to enrich and guide the dialogue, and many dialogue recommendation datasets with external guiding information (such as dialogue goals and external knowledge) have been collected and created, such as DuRecDial, TG-ReDial, and GoRecDial. How to further exploit this external information to improve model performance, however, still requires in-depth exploration from various aspects.

Summary of the Invention

[0004] To address the aforementioned technical problems, this invention provides a conditional variational autoencoder-based dialogue recommendation method and system oriented towards dialogue goals.

[0005] The technical solution of this invention is: a dialogue recommendation method based on a conditional variational autoencoder oriented towards dialogue goals, comprising:

[0006] Step S1: Perform word segmentation and concatenation on the dialogue target G, historical dialogue text C, external knowledge K, and standard answer R to obtain the input text sequences X_C, X_K, and X_R;

[0007] Step S2: Construct a goal-oriented conditional variational autoencoder model and use its encoder to extract the contextual features of each word in the input text sequences, namely the contextual features of the combined dialogue target and historical dialogue text, the contextual features of the external knowledge, and the contextual features of the standard answer;

[0008] Step S3: Feed … into the prior network MLP_prior to obtain the prior distribution, and feed … into the posterior network MLP_recog to obtain an approximate posterior distribution. The latent variable z is obtained from the mean and variance vectors of the approximate posterior distribution. At the same time, the difference between the prior distribution and the approximate posterior distribution is minimized to construct the KL loss, whose weight is adjusted with a cyclic annealing strategy. z is then concatenated with … to obtain the fused latent variable z_C, and a bag-of-words loss is constructed so that z_C learns the information in the standard answer;

[0009] Step S4: Construct a knowledge selection module: compute the relevance between the external knowledge context features and z_C as a scaling weight, and build filter labels to avoid repeatedly selecting knowledge that has already appeared in previous dialogue turns; finally, multiply the scaling weights and filter labels with the external knowledge context features to obtain external knowledge context features with scaling weights;

[0010] Step S5: Apply a different linear layer to z_C for each decoder layer, and concatenate the resulting feature vectors layer by layer with the decoder's key and value feature vectors. Then concatenate all external knowledge features and incorporate them into the last layer of the decoder. Finally, use the decoder output to calculate the probability distribution p of the generated response and construct a negative log-likelihood loss.

[0011] Compared with the prior art, the present invention has the following advantages:

[0012] 1. In existing technologies, because generic responses occur frequently in training samples and relevant external information to guide the model is lacking, most dialogue recommendation systems face a diversity problem: the model tends to generate universally acceptable but meaningless responses. This invention discloses a dialogue recommendation method based on a conditional variational autoencoder oriented towards dialogue goals. Drawing on the idea of conditional variational autoencoders, it makes full use of the dialogue goal information by concatenating it with the historical dialogue and encoding the result into a probability distribution in the latent space, establishing a higher-level global semantic representation. Finally, it samples from this distribution to better model the diversity of responses.

[0013] 2. In most cases, appropriate answers involve some relevant external knowledge. However, too much external knowledge easily introduces noise and compromises the quality of the generated answer. To address this issue, this invention constructs a knowledge selection module that, during generation, gives higher weight to external knowledge highly relevant to the dialogue and lower weight to less relevant knowledge. In addition, because the model tends to give higher scores to knowledge that has already appeared in previous dialogue turns, which can lead to duplicate answers, this invention designs filter labels to filter out knowledge already used in the previous dialogue. Finally, word-level embeddings are used to integrate relevant external knowledge into the decoding process, providing finer-grained feature information for answer generation.

Attached Figure Description

[0014] Figure 1 is a flowchart of the dialogue recommendation method based on a conditional variational autoencoder oriented towards dialogue goals according to an embodiment of the present invention;

[0015] Figure 2 is a schematic structural diagram of the conditional variational autoencoder model oriented towards dialogue goals in an embodiment of the present invention;

[0016] Figure 3 is a structural block diagram of the conditional variational autoencoder dialogue recommendation system oriented towards dialogue goals according to an embodiment of the present invention.

Detailed Implementation

[0017] This invention provides a conditional variational autoencoder dialogue recommendation method oriented towards dialogue goals, which can better model global semantic representation and achieve diversity of answers; and uses global semantic representation to assist in the selection of external knowledge, filtering out a large amount of irrelevant external knowledge and reducing noise.

[0018] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below through specific implementations and in conjunction with the accompanying drawings.

[0019] Example 1

[0020] As shown in Figure 1, an embodiment of the present invention provides a dialogue recommendation method based on a conditional variational autoencoder oriented towards dialogue goals, comprising the following steps:

[0021] Step S1: Perform word segmentation and concatenation on the dialogue target G, historical dialogue text C, external knowledge K, and standard answer R to obtain the input text sequences X_C, X_K, and X_R;

[0022] Step S2: Construct a goal-oriented conditional variational autoencoder model and use its encoder to extract the contextual features of each word in the input text sequences, namely the contextual features of the combined dialogue target and historical dialogue text, the contextual features of the external knowledge, and the contextual features of the standard answer;

[0023] Step S3: Feed … into the prior network MLP_prior to obtain the prior distribution, and feed … into the posterior network MLP_recog to obtain an approximate posterior distribution. The latent variable z is obtained from the mean and variance vectors of the approximate posterior distribution. At the same time, the difference between the prior distribution and the approximate posterior distribution is minimized to construct the KL loss, whose weight is adjusted with a cyclic annealing strategy. z is then concatenated with … to obtain the fused latent variable z_C, and a bag-of-words loss is constructed so that z_C learns the information in the standard answer;

[0024] Step S4: Construct a knowledge selection module: compute the relevance between the external knowledge context features and z_C as a scaling weight, and build filter labels to avoid repeatedly selecting knowledge that has already appeared in previous dialogue turns; finally, multiply the scaling weights and filter labels with the external knowledge context features to obtain external knowledge context features with scaling weights;

[0025] Step S5: Apply a different linear layer to z_C for each decoder layer, and concatenate the resulting feature vectors layer by layer with the decoder's key and value feature vectors. Then concatenate all external knowledge features and incorporate them into the last layer of the decoder. Finally, use the decoder output to calculate the probability distribution p of the generated answer and construct a negative log-likelihood loss.

[0026] In one embodiment, step S1 above (segmenting and concatenating the dialogue target G, historical dialogue text C, external knowledge K, and standard answer R to obtain the input text sequences X_C, X_K, and X_R) specifically includes:

[0027] Step S11: Obtain the dialogue target G, the historical dialogue text C = {c_1, …, c_{N_C}}, the external knowledge K = {k_1, …, k_{N_K}}, and the standard answer R, where N_C and N_K denote the number of rounds of historical dialogue and the number of external knowledge items, respectively.

[0028] Step S12: Convert G, C, K, and R to Unicode encoding and perform word segmentation; insert the role tags [USER] and [BOT] into the processed historical dialogue text, and concatenate the segments using the special delimiter [SEP] to obtain X_concat:

[0029]

[0030] Step S13: Add the special tokens [CLS] and [SEP] to the beginning and end, respectively, of X_concat, R, and the external knowledge sequences to obtain the input text sequences X_C, X_K, and X_R:

[0031] X_C = [CLS] X_concat [SEP]

[0032] X_R = [CLS] R [SEP]
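The sequence layout of steps S12 and S13 can be sketched in Python as below. This is a minimal illustration, not the patented implementation: the equation for X_concat is not reproduced above, so the placement of G before the turns, the assumption that turns alternate starting with [USER], and the function name `build_inputs` are all hypothetical. Inputs are assumed to be already word-segmented token lists.

```python
def build_inputs(goal, history, knowledge, answer):
    """Sketch of steps S12-S13 on pre-segmented token lists.

    Hypothetical layout: G first, then each turn preceded by [SEP] and a
    role tag, with turns assumed to alternate starting from [USER].
    """
    concat = list(goal)
    for turn_idx, turn in enumerate(history):
        role = "[USER]" if turn_idx % 2 == 0 else "[BOT]"
        concat += ["[SEP]", role] + list(turn)
    x_c = ["[CLS]"] + concat + ["[SEP]"]                        # X_C
    x_k = [["[CLS]"] + list(k) + ["[SEP]"] for k in knowledge]  # one sequence per k_i
    x_r = ["[CLS]"] + list(answer) + ["[SEP]"]                  # X_R
    return x_c, x_k, x_r


# Usage: a goal, two turns, one knowledge triple, and the standard answer.
x_c, x_k, x_r = build_inputs(
    ["recommend", "movie"], [["hi"], ["hello"]], [["A", "stars", "B"]], ["try", "A"]
)
```

Wrapping each knowledge item separately keeps the per-item length |k_i| available for the later knowledge selection step.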

[0033]

[0034] In one embodiment, step S2 above (constructing a goal-oriented conditional variational autoencoder model and using the encoder to extract the contextual features of each word in the input text sequences, namely the contextual features of the combined dialogue target and historical dialogue text, of the external knowledge, and of the standard answer) specifically includes:

[0035] This invention uses, but is not limited to, the BERT text encoder to extract contextual features of each word in the input text sequence. The specific steps are as follows:

[0036]

[0037]

[0038]

[0039] where the feature matrices are the contextual features obtained by BERT encoding of the dialogue target G together with the historical dialogue text C, the standard answer R, and the external knowledge K, respectively; |G| is the length of the dialogue target, |C| is the length of the historical dialogue text, |k_i| is the length of the i-th external knowledge item, and d_h is the dimension of the contextual features.

[0040] In one embodiment, step S3 above (obtaining the prior distribution via MLP_prior and the approximate posterior distribution via MLP_recog, obtaining the latent variable z from the mean and variance vectors of the approximate posterior, constructing the KL loss weighted by a cyclic annealing strategy, concatenating z with … to obtain the fused latent variable z_C, and constructing a bag-of-words loss so that z_C learns the information in the standard answer) specifically includes:

[0041] Step S31: Input … into the prior network MLP_prior, which consists of multiple fully connected layers, to obtain the prior distribution:

[0042]

[0043] where z is the latent variable; the outputs of MLP_prior are the mean and variance vectors of the prior distribution; d_z is the dimension of the latent variable z; and CLS(·) denotes taking the feature vector corresponding to the [CLS] token;

[0044] Step S32: Input … and … into the posterior network MLP_recog, which consists of multiple fully connected layers, to obtain the approximate posterior distribution:

[0045]

[0046] where the outputs of MLP_recog are the mean and variance vectors of the approximate posterior distribution, and the concatenation operator joins two vectors end to end;

[0047] Minimize the difference between the prior distribution and the approximate posterior distribution to construct the KL loss, and adjust the weight of this loss with a cyclic annealing strategy:

[0048]
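A minimal sketch of the KL term and a cyclic annealing schedule follows, assuming both networks output a mean and a log-variance per latent dimension and that the KL loss is KL(posterior || prior) between diagonal Gaussians. The schedule shown (β rising linearly from 0 to 1 over the first part of each cycle, then held at 1) is the common cyclical annealing form; the patent does not state the cycle count or ramp ratio, so `n_cycles` and `ratio` are hypothetical defaults.

```python
import math


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between diagonal Gaussians, summed over latent dimensions."""
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        kl += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return kl


def cyclic_beta(step, total_steps, n_cycles=4, ratio=0.5):
    """Cyclical annealing weight for the KL loss.

    Within each cycle, beta grows linearly from 0 to 1 over the first `ratio`
    fraction of the cycle and stays at 1 afterwards (hypothetical defaults).
    """
    period = total_steps / n_cycles
    position = (step % period) / period  # fraction of the current cycle, in [0, 1)
    return min(position / ratio, 1.0)
```

Resetting β to 0 at the start of each cycle repeatedly lets the decoder learn to use z before the KL pressure returns, which is the usual motivation for this schedule.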

[0049] Step S33: Obtain the latent variable z from the mean vector μ and variance vector σ of the approximate posterior distribution, and then concatenate z with … to obtain the fused latent variable z_C:

[0050] z=μ+σ⊙ε

[0051]

[0052] Because sampling the latent variable z directly from the Gaussian distribution is not differentiable, the reparameterization trick is used: ε is first sampled from the standard normal distribution and is then scaled by σ and shifted by μ to obtain the latent variable z; ⊙ denotes element-wise (Hadamard) multiplication;
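The reparameterization step and the fusion into z_C might look like the following sketch. It assumes the posterior network outputs a log-variance, so the standard deviation used for σ is exp(0.5 · logvar). The vector that z is concatenated with is elided in the text above, so the `context_vec` argument of the hypothetical `fuse` helper is only a placeholder.

```python
import math
import random


def reparameterize(mu, logvar, rng=random):
    """z = mu + sigma * eps with eps ~ N(0, I), applied element-wise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def fuse(z, context_vec):
    """Hypothetical fusion: concatenate z with a context vector to form z_C."""
    return list(z) + list(context_vec)


# Usage with a seeded generator so the sample is reproducible.
z = reparameterize([0.0, 1.0], [0.0, -2.0], rng=random.Random(0))
z_c = fuse(z, [0.3, 0.7, 0.1])
```

Because ε is drawn independently of μ and σ, gradients flow through μ and σ in a real autodiff framework; this plain-Python version only illustrates the arithmetic.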

[0053] Construct a bag-of-words loss in which the fused latent variable z_C predicts the words contained in the standard answer, prompting z_C to learn the information in the standard answer:

[0054]

[0055] where R_bow is the bag-of-words form of the standard answer, φ(·) is a fully connected layer with a softmax activation function, r_i is the i-th word in the standard answer R, and |R| is the length of the standard answer.
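As a sketch of the bag-of-words loss: φ(z_C) yields a single distribution over the vocabulary, and every word of the standard answer is scored against that same distribution, independent of word order. Here `vocab_logits` stands in for the pre-softmax output of the fully connected layer φ applied to z_C (the layer itself is omitted), and averaging over |R| is an assumption based on the definition of |R| above.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def bow_loss(vocab_logits, answer_ids):
    """Bag-of-words loss: every answer word is scored against the same
    distribution softmax(vocab_logits), so word order is ignored."""
    probs = softmax(vocab_logits)
    return -sum(math.log(probs[i]) for i in answer_ids) / len(answer_ids)
```

Since there is no position index, z_C alone must carry enough information to predict the answer's vocabulary, which is what pushes it to encode the standard answer.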

[0056] In one embodiment, step S4 above (constructing the knowledge selection module, using the relevance between the external knowledge context features and z_C as a scaling weight, building filter labels to avoid reselecting knowledge that has appeared in previous dialogue turns, and multiplying these to obtain external knowledge context features with scaling weights) specifically includes:

[0057] Step S41: Use a scaled dot-product attention mechanism to score each piece of knowledge. First, a learnable parameter matrix reduces z_C to dimension d_h; a scaled dot product is then computed, and finally a sigmoid activation function scales the score weights to the fixed range (0, 1):

[0058]
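Step S41 can be sketched as follows, assuming each knowledge item is summarized by a single d_h-dimensional vector (for example its [CLS] feature; this choice is an assumption) and that the learnable matrix `W` has shape d_h × dim(z_C). The name `knowledge_scores` is hypothetical.

```python
import math


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def knowledge_scores(z_c, W, knowledge_vecs):
    """Relevance of each knowledge item: sigmoid((W z_C) . h_k / sqrt(d_h))."""
    query = matvec(W, z_c)          # reduce z_C to d_h dimensions
    d_h = len(query)
    scores = []
    for h_k in knowledge_vecs:      # one d_h-dim summary vector per knowledge item
        dot = sum(q * h for q, h in zip(query, h_k)) / math.sqrt(d_h)
        scores.append(1.0 / (1.0 + math.exp(-dot)))  # sigmoid -> (0, 1)
    return scores
```

A sigmoid rather than a softmax scores each item independently, so several relevant knowledge items can all receive weights close to 1.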

[0059] Step S42: To prevent the model from repeatedly selecting knowledge that has already appeared in the historical dialogue, construct filter labels: compute the word-level overlap rate between each piece of knowledge and the historical dialogue, and set the label value of a piece of knowledge to 0 when its overlap rate exceeds a preset threshold. Multiply the score by the filter label value to obtain the scaling weight, then multiply the scaling weight by the knowledge context features to obtain external knowledge context features with scaling weights.

[0060]

[0061] The knowledge selection module built in this step scales the weight of each piece of knowledge according to its relevance to the current dialogue, so that more relevant knowledge can participate in the subsequent decoding stage.
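A minimal sketch of the filter labels and the weighting in step S42 follows. The patent does not define the overlap rate precisely or give the threshold, so the token-overlap fraction, the default `threshold=0.8`, and the label value 1 for knowledge that is not filtered are assumptions; the helper names are hypothetical.

```python
def overlap_rate(knowledge_tokens, history_tokens):
    """Fraction of a knowledge item's tokens that already occur in the history."""
    if not knowledge_tokens:
        return 0.0
    seen = set(history_tokens)
    return sum(tok in seen for tok in knowledge_tokens) / len(knowledge_tokens)


def filter_labels(knowledge, history_tokens, threshold=0.8):
    """Label 0 when the overlap rate exceeds the threshold, otherwise 1 (assumed)."""
    return [0 if overlap_rate(k, history_tokens) > threshold else 1 for k in knowledge]


def weight_knowledge(scores, labels, knowledge_feats):
    """Multiply each item's word-level feature vectors by score * label."""
    weighted = []
    for score, label, feats in zip(scores, labels, knowledge_feats):
        w = score * label  # scaling weight; 0 removes already-used knowledge
        weighted.append([[w * x for x in vec] for vec in feats])
    return weighted
```

For example, with the history tokens "A stars B hi", the triple "A stars B" is fully covered and receives label 0, so its features are zeroed regardless of its relevance score.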

[0062] In one embodiment, step S5 above (linearly mapping z_C with a different linear layer for each decoder layer, concatenating the results layer by layer with the decoder's key and value feature vectors, concatenating all external knowledge features and incorporating them into the last decoder layer, computing the probability distribution p of the generated response from the decoder output, and constructing a negative log-likelihood loss) specifically includes:

[0063] Step S51: Construct the decoder, and apply a different linear layer to z_C for each decoder layer to obtain the feature vector provided to that layer:

[0064]

[0065] where L is the number of decoder layers, the i-th mapped vector is the feature vector provided to the i-th layer of the decoder, and each linear layer has its own parameter matrix for the linear mapping;

[0066] The decoder constructed in this step also adopts the BERT structure, with the attention mask changed to left-to-right (so that the word at the current position can only attend to words generated before it), and it shares parameters with the encoder. To alleviate the exposure bias caused by the mismatch between training and testing (during training each input word comes from the standard answer, whereas during inference the current input is the word generated at the previous step), this embodiment uses a padding generation mechanism, in which the special token [ATTN] serves as the input for the current word in both the training and testing phases.
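The left-to-right attention mask can be sketched as a boolean matrix. This follows the usual convention that a position may also attend to itself, which is harmless here because, under the padding generation mechanism, the current position holds the [ATTN] placeholder rather than the word being predicted; that reading of the mechanism is an assumption.

```python
def left_to_right_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


# Usage: a 4-token decoder window; row i lists what position i can see.
mask = left_to_right_mask(4)
```

In a BERT-style model, disallowed entries are typically turned into a large negative additive bias before the attention softmax.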

[0067] This invention takes into account that each layer of the BERT model captures different information: the bottom layers capture phrase-level information, the middle layers extract syntactic features, and the top layers focus more on semantic features. Therefore, different linear layers are used to map the fused latent variable z_C to the different layers of the decoder.

[0068] Step S52: Concatenate each layer's feature vector layer by layer with the key and value feature vectors of the decoder, following the self-attention computation in BERT:

[0069]

[0070]

[0071]

[0072] where the concatenated key and value vectors are the results of splicing each layer's mapped feature vector onto that layer's key and value feature vectors, respectively;
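Steps S51 and S52 can be illustrated with a single-head attention sketch for one decoder layer. It assumes the per-layer vector W_l z_C (bias omitted) is prepended as one extra key and one extra value, since the text describes concatenation with both; multi-head splitting and the query/key/value projections are left out, and the helper names are hypothetical.

```python
import math


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def latent_prefixes(z_c, layer_weights):
    """Step S51 sketch: one linear map per decoder layer, h_l = W_l z_C (no bias)."""
    return [matvec(W_l, z_c) for W_l in layer_weights]


def attend_with_prefix(query, keys, values, h_l):
    """Step S52 sketch: single-head attention where layer l's latent vector h_l
    is concatenated to both the keys and the values before attending."""
    keys = [h_l] + keys
    values = [h_l] + values
    d = len(query)
    weights = softmax([sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
```

Injecting z_C as an extra attention slot, rather than adding it to the token embeddings, lets every decoding position read the global latent at every layer.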

[0073] Step S53: In the last layer of the decoder, all knowledge word-level feature representations are concatenated and fused:

[0074]

[0075]

[0076]

[0077] where the knowledge feature matrix is obtained by flattening and concatenating the word-level feature vectors of all knowledge items, and |K| is the total length of all knowledge.
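A sketch of step S53 follows, assuming the weighted word-level knowledge features from step S4 are appended to the last decoder layer's keys and values in the same way as the latent vector. The text says only that they are concatenated and fused in that layer, so this placement and the helper names are assumptions.

```python
def flatten_knowledge(weighted_feats):
    """Concatenate the word-level vectors of all knowledge items into |K| rows."""
    return [vec for item in weighted_feats for vec in item]


def last_layer_memory(keys, values, weighted_feats):
    """Assumed placement: append the flattened knowledge to the last layer's
    keys and values so that every decoding position can attend to it."""
    flat = flatten_knowledge(weighted_feats)
    return keys + flat, values + flat
```

Because filtered items were scaled to zero in step S4, their rows still occupy slots in this memory but contribute zero values.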

[0078] Step S54: Integrate the knowledge features and the fused latent variable z_C into the decoder to obtain the output O_dec; apply a linear transformation followed by the GELU activation function to obtain the logits; finally, reuse the parameter matrix W_emb of BERT's word-embedding layer to obtain the probability distribution p of the character generated at each position:

[0079] logits = gelu(W_out(O_dec))

[0080] p = W_emb(logits)

[0081] where W_out is the parameter matrix of the linear mapping and |vocab| is the size of the vocabulary.
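The output computation of step S54 can be sketched as below. It assumes W_emb is the |vocab| × d_h embedding matrix applied to the d_h-dimensional GELU output, and it normalizes the resulting scores with a softmax so that p is a proper probability distribution; the softmax is an assumption about the normalization step. GELU is written in its exact erf form.

```python
import math


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gelu(x):
    """Exact GELU: x * Phi(x), written with the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def output_distribution(o_dec, W_out, W_emb):
    """logits = gelu(W_out o_dec); scores over the vocabulary reuse the
    |vocab| x d_h word-embedding matrix; softmax normalization is assumed."""
    logits = [gelu(v) for v in matvec(W_out, o_dec)]
    return softmax(matvec(W_emb, logits))
```

Reusing the embedding matrix ties input and output word representations, which saves a |vocab| × d_h parameter matrix.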

[0082] Step S55: Construct the negative log-likelihood loss by computing the negative log-likelihood of each generated character and taking the mean:

[0083]

[0084] where the probability term is the probability of generating the word r_i at the i-th position;

[0085] Thus, the total loss function is obtained.

[0086]

[0087] Where β represents the weight corresponding to the KL loss.
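The per-token negative log-likelihood of step S55 and the total objective can be sketched as follows. The total loss equation is not reproduced above; only the weight β on the KL loss is stated, so summing the NLL and bag-of-words losses with unit weight is an assumption, as are the helper names.

```python
import math


def nll_loss(step_probs, target_ids):
    """Mean negative log-likelihood of the standard answer: step_probs[i] is the
    distribution p at position i and target_ids[i] is the index of r_i."""
    total = -sum(math.log(p[t]) for p, t in zip(step_probs, target_ids))
    return total / len(target_ids)


def total_loss(nll, kl, bow, beta):
    """Assumed combination: unit weights on NLL and BOW, beta on the KL term."""
    return nll + beta * kl + bow
```

In training, `beta` would come from the cyclic annealing schedule of step S3 at the current step.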

[0088] Figure 2 shows a schematic diagram of the structure of the goal-oriented conditional variational autoencoder model.

[0089] Experiments with the Goal-oriented Conditional Variational Autoencoder (GoCVAE) model constructed in this invention were conducted on the DuRecDial and TG-ReDial datasets, and the model was compared with the top-performing models on each dataset.

[0090] Tables 1 and 2 present the automatic evaluation results of GoCVAE on the DuRecDial and TG-ReDial datasets, respectively. The results show that GoCVAE achieves the best performance on multiple automatic evaluation metrics, including F1, BLEU-1, BLEU-2, DIST-1, DIST-2, and PPL. This indicates that the proposed GoCVAE model can fully utilize dialogue information to model a higher-level global semantic representation, thereby better capturing the representation space of potential answers, further assisting the knowledge selection module, and improving the overall quality of the generated answers. To further demonstrate the effectiveness of each module, ablation experiments were conducted on the DuRecDial dataset, where -GOAL denotes removing the dialogue target information, -KG removing the external knowledge, -CVAE removing the conditional variational autoencoder, -KD removing the knowledge selection module, and -FL removing the filter labels.

[0091] Table 1 Comparison of automatic evaluation results for each model on DuRecDial

[0092]

[0093]

[0094] Table 2 Comparison of automatic evaluation results for each model on TG-ReDial

[0095]

[0096] Table 3 presents the experimental results of human evaluation on the DuRecDial dataset. Thanks to the global semantic features guided by the dialogue goal, GoCVAE achieved significant improvements in metrics such as fluency, appropriateness, informativeness, and proactivity. Furthermore, the knowledge selection module helped filter out a large amount of irrelevant external knowledge, reducing noise and improving the quality of the generated responses. Finally, the Kappa coefficient reflects the consistency among the evaluators.

[0097] Table 3 shows the human evaluation results of the model on DuRecDial.

[0098]

[0099] This invention discloses a dialogue recommendation method based on a conditional variational autoencoder (CVAE). Drawing on the idea of the CVAE, it makes full use of the dialogue target information by concatenating it with the historical dialogue and encoding the result into a probability distribution in the latent space, establishing a higher-level global semantic representation. Finally, it samples from this distribution to better model the diversity of responses. Furthermore, the invention incorporates a knowledge selection module that, during generation, gives higher weight to external knowledge highly relevant to the dialogue and lower weight to less relevant knowledge. In addition, because the model tends to assign higher scores to knowledge that has appeared in the historical dialogue, which can lead to duplicate responses, the invention designs filter labels to filter out knowledge already used in the historical dialogue. Finally, word-level embeddings are used to integrate relevant external knowledge into the decoding process, providing finer-grained feature information for response generation.

[0100] Example 2

[0101] As shown in Figure 3, this embodiment of the invention provides a conditional variational autoencoder dialogue recommendation system oriented towards dialogue goals, including the following modules:

[0102] The input text sequence acquisition module 61 is used to segment and concatenate the dialogue target G, historical dialogue text C, external knowledge K, and standard answer R to obtain the input text sequences X_C, X_K, and X_R;

[0103] The context feature extraction module 62 is used to construct a target-oriented conditional variational autoencoder model and to use its encoder to extract the context features of each word in the input text sequences, namely the context features of the combined dialogue target and historical dialogue text, the context features of the external knowledge, and the context features of the standard answer;

[0104] The hidden variable acquisition module 63 is used to feed … into the prior network MLP_prior to obtain the prior distribution, and to feed … into the posterior network MLP_recog to obtain an approximate posterior distribution; to obtain the latent variable z from the mean and variance vectors of the approximate posterior distribution; to minimize the difference between the prior distribution and the approximate posterior distribution to construct the KL loss, whose weight is adjusted with a cyclic annealing strategy; to concatenate z with … to obtain the fused latent variable z_C; and to construct a bag-of-words loss so that z_C learns the information in the standard answer;

[0105] The knowledge selection module 64 is used to compute the relevance between the external knowledge context features and z_C as a scaling weight, to build filter labels that avoid repeatedly selecting knowledge that has appeared in previous dialogue turns, and finally to multiply the scaling weights and filter labels with the external knowledge context features to obtain external knowledge context features with scaling weights;

[0106] The response generation module 65 is used to apply a different linear layer to z_C for each decoder layer and concatenate the resulting feature vectors layer by layer with the decoder's key and value feature vectors; then to concatenate all external knowledge features and incorporate them into the last layer of the decoder; and finally to use the decoder output to calculate the probability distribution p of the generated answer and construct a negative log-likelihood loss.

[0107] The above embodiments are provided merely for the purpose of describing the present invention and are not intended to limit the scope of the invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications made without departing from the spirit and principles of the invention should be covered within the scope of the invention.

Claims

1. A dialogue recommendation method based on a dialogue-goal-oriented conditional variational autoencoder, characterized by comprising:
Step S1: performing word segmentation and concatenation on the dialogue target, the historical dialogue text, the external knowledge and the standard answer to obtain the input text sequences;
Step S2: constructing a goal-oriented conditional variational autoencoder model, and using its encoder to extract the contextual features of each word in the input text sequences, namely the contextual features combining the dialogue target and the historical dialogue text, the external knowledge context features, and the contextual features of the standard answer;
Step S3: inputting the contextual features combining the dialogue target and the historical dialogue text into a prior network to obtain a prior distribution, and inputting these features together with the contextual features of the standard answer into a posterior network to obtain an approximate posterior distribution; obtaining a latent variable z from the mean and variance vectors of the approximate posterior distribution; minimizing the difference between the prior distribution and the approximate posterior distribution to construct a KL loss, the weight of which is adjusted with a cyclic annealing strategy; concatenating z with the corresponding feature vector to obtain a fused latent variable; and constructing a bag-of-words loss so that the fused latent variable learns the information in the standard answer;
Step S4: constructing a knowledge selection module, which computes the relevance between the external knowledge context features and the latent variable as a scaling weight and constructs filter tags to avoid repeatedly selecting knowledge that has already appeared in the historical dialogue; finally, the external knowledge context features are multiplied by the scaling weight and the filter tags to obtain scaled external knowledge context features;
Step S5: mapping the scaled external knowledge context features with a different linear layer for each decoder layer, and concatenating the resulting feature vectors layer by layer with the key and value feature vectors of the decoder; splicing all external knowledge features together and incorporating them into the final layer of the decoder; and finally using the decoder output to calculate the probability distribution for generating the response and constructing a negative log-likelihood loss.
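Claim 1 names a cyclic annealing strategy for the KL-loss weight but does not fix its form. The following is a minimal sketch under one assumption: it uses the common cyclical schedule, in which the weight ramps linearly from 0 to 1 during the first part of each cycle and is then held at 1. The function name, the number of cycles and the ramp ratio are hypothetical choices, not values taken from the patent.

```python
def cyclic_kl_weight(step: int, total_steps: int,
                     n_cycles: int = 4, ramp_ratio: float = 0.5) -> float:
    """KL-loss weight under an assumed cyclical annealing schedule.

    Training is split into n_cycles equal cycles. Within each cycle the weight
    rises linearly from 0 to 1 over the first ramp_ratio fraction of the cycle,
    then stays at 1 until the next cycle restarts it at 0.
    """
    period = total_steps / n_cycles
    # Fraction of the current cycle already completed, in [0, 1).
    position = (step % period) / period
    return min(1.0, position / ramp_ratio)


# Hypothetical usage: scale the KL term at each training step.
# weighted_kl = cyclic_kl_weight(step, total_steps) * kl_loss
```

Schedules of this kind periodically reset the weight to 0. This is commonly done so that the decoder keeps making use of the latent variable instead of letting the KL term collapse.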

2. The dialogue recommendation method based on a dialogue-goal-oriented conditional variational autoencoder according to claim 1, characterized in that step S1 specifically comprises:
Step S11: obtaining the dialogue target, the historical dialogue text, the external knowledge and the standard answer, where the historical dialogue text consists of a given number of dialogue rounds and the external knowledge consists of a given number of knowledge items;
Step S12: converting the dialogue target, the historical dialogue text, the external knowledge and the standard answer to Unicode encoding and segmenting them into words; inserting the role tags [USER] and [BOT] into the processed historical dialogue text, and then concatenating the dialogue target and the historical dialogue text with the special delimiter [SEP] to obtain the combined context sequence;
Step S13: adding special start and end tokens to the beginning and end of the combined context sequence, of each piece of external knowledge, and of the standard answer, respectively, to obtain the input text sequences.

3. The dialogue recommendation method based on a dialogue-goal-oriented conditional variational autoencoder according to claim 2, characterized in that, in step S2, using the encoder to extract the contextual features of each word in the input text sequences (the contextual features combining the dialogue target and the historical dialogue text, the external knowledge context features, and the contextual features of the standard answer) specifically comprises: using a BERT text encoder to extract the contextual features of each word in the input text sequences, yielding the BERT-encoded contextual features of the concatenated dialogue target and historical dialogue text, of the standard answer, and of each piece of external knowledge. The lengths of these feature sequences are determined by the length of the dialogue target, the length of the historical dialogue text, the length of the standard answer and the length of the j-th piece of external knowledge, respectively, and all contextual features share the same feature dimension.
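Steps S12 and S13 can be illustrated with a small sketch that builds the token sequences before BERT encoding. The sketch makes the following assumptions, none of which are fixed by the claims:
- whitespace splitting stands in for the word segmentation step;
- a [SEP] precedes every role-tagged history turn;
- BERT-style [CLS] and [SEP] tokens serve as the start and end markers of step S13.

The helper names are hypothetical.

```python
from typing import List, Tuple

CLS, SEP, USER, BOT = "[CLS]", "[SEP]", "[USER]", "[BOT]"


def segment(text: str) -> List[str]:
    # Hypothetical stand-in for word segmentation: split on whitespace.
    return text.split()


def build_context(target: str, history: List[Tuple[str, str]]) -> List[str]:
    """Concatenate the dialogue target with role-tagged history turns using [SEP].

    `history` is a list of (speaker, utterance) pairs; speaker "user" gets the
    [USER] tag and any other speaker gets the [BOT] tag (an assumption).
    """
    tokens = segment(target)
    for speaker, utterance in history:
        tag = USER if speaker == "user" else BOT
        tokens += [SEP, tag] + segment(utterance)
    # Step S13: wrap the combined sequence with start and end tokens.
    return [CLS] + tokens + [SEP]


def wrap(text: str) -> List[str]:
    """Step S13 for a single knowledge item or the standard answer."""
    return [CLS] + segment(text) + [SEP]


context = build_context("recommend a movie", [("user", "hi"), ("bot", "hello there")])
print(context)
```

Each knowledge item and the standard answer are wrapped separately. The encoder therefore sees one context sequence, one sequence per knowledge item and one answer sequence.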

4. The dialogue recommendation method based on a dialogue-goal-oriented conditional variational autoencoder according to claim 1, characterized in that step S3 specifically comprises:
Step S31: taking the feature vector corresponding to the [CLS] token from the contextual features combining the dialogue target and the historical dialogue text, and inputting it into a prior network consisting of multiple fully connected layers to obtain the prior distribution, where the network outputs the mean and variance vectors of the prior distribution, each with the same dimension as the latent variable;
Step S32: concatenating the [CLS] feature vector of the combined context with the [CLS] feature vector of the standard answer, and inputting the result into a posterior network consisting of multiple fully connected layers to obtain the approximate posterior distribution, where the network outputs the mean and variance vectors of the approximate posterior distribution; and minimizing the difference between the prior distribution and the approximate posterior distribution to construct the KL loss, the weight of which is adjusted with a cyclic annealing strategy;
Step S33: obtaining the latent variable z from the mean and variance vectors of the approximate posterior distribution by sampling a random perturbation from the standard normal distribution, scaling it element-wise according to the variance vector and translating it by the mean vector; concatenating z with the corresponding feature vector to obtain the fused latent variable; and constructing a bag-of-words loss in which the fused latent variable is used to predict the words contained in the standard answer, encouraging it to learn the information in the standard answer. The bag-of-words loss is computed from the probabilities that a fully connected layer with a softmax activation function assigns, given the fused latent variable, to each word of the standard answer R in bag-of-words form, where the i-th word of R is indexed by i, running up to the length of the standard answer.
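The pieces of step S3 can be sketched with plain Python lists:
- the KL term between the approximate posterior q and the prior p;
- the reparameterized sampling of z;
- the bag-of-words loss.

The sketch rests on three assumptions. First, both networks output log-variances rather than raw variances. Second, the bag-of-words loss is the summed negative log-probability of the answer words under a single softmax. Third, the fully connected layer is represented only by its output logits. The function names are hypothetical; this is not the patent's implementation.

```python
import math
import random
from typing import List, Sequence


def gaussian_kl(mu_q: Sequence[float], logvar_q: Sequence[float],
                mu_p: Sequence[float], logvar_p: Sequence[float]) -> float:
    """KL(q || p) between diagonal Gaussians given means and log-variances."""
    return 0.5 * sum(
        lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0
        for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p)
    )


def reparameterize(mu: Sequence[float], logvar: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample eps ~ N(0, 1), scale by the standard deviation, shift by the mean."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over the vocabulary."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(x - top) for x in logits))
    return [x - log_sum for x in logits]


def bow_loss(logits: Sequence[float], answer_ids: Sequence[int]) -> float:
    """One softmax over the vocabulary (computed from the fused latent variable),
    scored at every word of the standard answer regardless of its position."""
    log_probs = log_softmax(logits)
    return -sum(log_probs[i] for i in answer_ids)
```

The KL term is taken as KL(q || p) with the posterior first, the usual direction for conditional VAEs. At inference time the standard answer is unavailable, so z would typically be sampled from the prior network instead.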

5. The dialogue recommendation method based on a dialogue-goal-oriented conditional variational autoencoder according to claim 4, characterized in that step S4 specifically comprises:
Step S41: scoring each piece of knowledge with a scaled dot-product attention mechanism: first reducing the dimension of the latent variable to the dimension of the knowledge features with a learnable parameter matrix, then performing a scaled dot-product operation with the external knowledge context features, and finally scaling the score to a fixed range with a sigmoid activation function;
Step S42: to prevent the model from repeatedly selecting knowledge that has already appeared in the historical dialogue, constructing filter tags: calculating the word-level overlap rate between each piece of knowledge and the historical dialogue, and setting the tag value of that piece of knowledge to 0 when its overlap rate exceeds a preset threshold (and to 1 otherwise); multiplying the score by the filter tag value to obtain the scaling weight, and multiplying the external knowledge context features by this weight to obtain the scaled external knowledge context features.

6. The dialogue recommendation method based on a dialogue-goal-oriented conditional variational autoencoder according to claim 5, characterized in that step S5 specifically comprises:
Step S51: constructing the decoder, and applying a different linear mapping to the scaled external knowledge context features for each decoder layer to obtain the feature vectors provided to the corresponding layers of the decoder, each mapping having its own parameter matrix and the number of mappings equaling the number of decoder layers;
Step S52: concatenating these feature vectors layer by layer with the key and value feature vectors of the decoder, following the self-attention computation in BERT, to obtain the concatenated keys and values;
Step S53: in the last layer of the decoder, flattening and concatenating the word-level feature representations of all knowledge, whose length equals the total length of all knowledge, and fusing them into that layer;
Step S54: integrating the fused latent variable into the decoder to obtain the decoder output, applying a linear transformation and an activation function to it, and finally reusing the parameter matrix of the BERT word embedding layer to obtain, over a vocabulary of the given size, the probability distribution of the generated character at each position;
Step S55: constructing the negative log-likelihood loss by computing the negative log-likelihood of the probability of generating the target word at each position i and taking the mean; thereby obtaining the total loss function, in which the KL loss is multiplied by its corresponding weight.
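Steps S41, S42 and S55 can be sketched with plain Python lists. The sketch makes these assumptions:
- each knowledge item is scored through its [CLS] vector;
- the overlap rate is the fraction of a knowledge item's words that also occur in the history;
- the default threshold of 0.5 is hypothetical;
- the total loss is the sum of the NLL loss, the weighted KL loss and the bag-of-words loss. The claims only state that the KL loss carries a weight.

All function names are illustrative.

```python
import math
from typing import List, Sequence


def knowledge_scores(latent: Sequence[float],
                     knowledge_cls: Sequence[Sequence[float]],
                     proj: Sequence[Sequence[float]]) -> List[float]:
    """Sigmoid of the scaled dot product between the projected latent vector
    and each knowledge vector (step S41). `proj` has d rows, one per output dim."""
    query = [sum(w * x for w, x in zip(row, latent)) for row in proj]
    d = len(query)
    return [1.0 / (1.0 + math.exp(-sum(q * k for q, k in zip(query, vec)) / math.sqrt(d)))
            for vec in knowledge_cls]


def overlap_rate(knowledge_words: Sequence[str], history_words: Sequence[str]) -> float:
    """Fraction of a knowledge item's words that already occur in the history."""
    if not knowledge_words:
        return 0.0
    seen = set(history_words)
    return sum(w in seen for w in knowledge_words) / len(knowledge_words)


def filter_tags(knowledge: Sequence[Sequence[str]], history_words: Sequence[str],
                threshold: float = 0.5) -> List[int]:
    """Tag 0 for knowledge whose overlap exceeds the threshold, else 1 (step S42)."""
    return [0 if overlap_rate(k, history_words) > threshold else 1 for k in knowledge]


def scale_knowledge(features: Sequence[Sequence[Sequence[float]]],
                    scores: Sequence[float], tags: Sequence[int]) -> List[List[List[float]]]:
    """Multiply every token feature of each knowledge item by score * tag."""
    return [[[x * s * t for x in token] for token in item]
            for item, s, t in zip(features, scores, tags)]


def nll_loss(target_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood of the target word at each position (step S55)."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)


def total_loss(nll: float, kl: float, bow: float, kl_weight: float) -> float:
    """Assumed combination of the three losses; only the KL weight is stated in the claims."""
    return nll + kl_weight * kl + bow
```

The filter tag is a hard 0/1 mask. Knowledge already covered in the history is therefore zeroed out even when its relevance score is high.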

7. A dialogue recommendation system oriented towards dialogue goals, characterized by comprising the following modules:
an input text sequence acquisition module, configured to perform word segmentation and concatenation on the dialogue target, the historical dialogue text, the external knowledge and the standard answer to obtain the input text sequences;
a context feature extraction module, configured to construct a goal-oriented conditional variational autoencoder model whose encoder extracts the contextual features of each word in the input text sequences, namely the contextual features combining the dialogue target and the historical dialogue text, the external knowledge context features, and the contextual features of the standard answer;
a latent variable acquisition module, configured to input the contextual features combining the dialogue target and the historical dialogue text into a prior network to obtain a prior distribution, and to input these features together with the contextual features of the standard answer into a posterior network to obtain an approximate posterior distribution; to obtain the latent variable z from the mean and variance vectors of the approximate posterior distribution; to minimize the difference between the prior distribution and the approximate posterior distribution to construct the KL loss, the weight of which is adjusted with a cyclic annealing strategy; to concatenate z with the corresponding feature vector to obtain the fused latent variable; and to construct a bag-of-words loss so that the fused latent variable learns the information in the standard answer;
a knowledge selection module, configured to compute the relevance between the external knowledge context features and the latent variable as a scaling weight, to construct filter tags that avoid repeatedly selecting knowledge that has already appeared in the historical dialogue, and finally to multiply the external knowledge context features by the scaling weight and the filter tags to obtain scaled external knowledge context features;
an answer generation module, configured to map the scaled external knowledge context features with a different linear layer for each decoder layer, to concatenate the resulting feature vectors layer by layer with the key and value feature vectors of the decoder, to splice all external knowledge features together and incorporate them into the final layer of the decoder, and finally to use the decoder output to calculate the probability distribution for generating the response and to construct the negative log-likelihood loss.