Dialogue generation method and device based on multi-layer attention, equipment and medium

The dialogue generation method using a multi-layer attention mechanism solves the problem of sentences deviating from the topic in multi-turn dialogue generation, and improves the accuracy and relevance of the generated text.

CN116644164B · Active · Publication Date: 2026-06-26 · PING AN TECH (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: PING AN TECH (SHENZHEN) CO LTD
Filing Date: 2023-05-26
Publication Date: 2026-06-26

IPC: G06F16/3329; G06F40/289; G06F18/22; G06F18/25; G06F18/214; G06N3/045; G06N3/0455; G06N3/044; G06N3/08

CPC: G06F16/3329; G06F40/289; G06F18/22; G06F18/25; G06F18/214; G06N3/04; G06N3/08; Y02D10/00

AI Tagging

Technology Topics: Engineering; Topic sentence

Technical Efficacy Phrases: improve accuracy; improve relevance

AI Technical Summary

Technical Problem

Existing technologies neglect attention between sentences in multi-turn dialogue generation, causing generated sentences to deviate from the topic and resulting in low accuracy.

Method used

We employ a multi-layer attention-based dialogue generation method, which combines word attention, sentence attention, and self-attention mechanisms with a pre-trained encoder and decoder to generate more accurate dialogue text.

Benefits of technology

It improves the accuracy of dialogue generation, ensures that the generated text is relevant to the topic, and fully considers contextual information.

Abstract

The application relates to an artificial intelligence technology and discloses a multi-layer attention-based dialogue generation method which can be used in the medical or financial fields, comprising the following steps: performing word segmentation on dialogue text to obtain a word segmentation sequence set; using a first encoder to encode the word segmentation sequence set to obtain a topic sentence vector set; using an attention mechanism to weight the topic sentence vector set to obtain a final sentence vector set; using a second encoder to perform fusion encoding on the final sentence vector set, calculating the similarity between a fusion vector and the topic sentence vector set, weighting and summing the topic sentence vector set according to the similarity to obtain an overall topic vector; using a self-attention mechanism to weight the final sentence vector set to obtain an attention vector set; and using a decoder to decode and splice the overall topic vector and the attention vector set to obtain an answer text. The application further discloses a multi-layer attention-based dialogue generation device, an electronic device and a storage medium. The application can improve the dialogue generation accuracy.

Description

Technical Field

[0001] This invention relates to the field of artificial intelligence technology, and in particular to a dialogue generation method, apparatus, electronic device, and computer-readable storage medium based on multi-layer attention.

Background Technology

[0002] With the development of deep learning and neural networks, dialogue generation systems have become a research focus in the field of artificial intelligence. In particular, multi-turn dialogue scenarios are widely used in various industries such as human-computer interaction, smart homes, financial customer service, medical consultation, and social robots, where the accuracy of dialogue generation is of paramount importance.

[0003] Currently, in multi-turn dialogue scenarios, attention mechanisms between words are often introduced, which can only focus on the semantics of the current sentence and ignore the attention between sentences, thus causing deviations in the accuracy of the sentences generated in the current dialogue. Furthermore, each turn of dialogue contains hidden topic information, and the topic directly affects the semantic cues in multi-turn dialogue. Introducing only attention mechanisms between words results in a lack of topic information, causing the sentences generated in the dialogue to deviate from the topic and resulting in low accuracy of dialogue generation.

Summary of the Invention

[0004] This invention provides a dialogue generation method, apparatus, and computer-readable storage medium based on multi-layer attention, with the main objective of solving the problem of low accuracy in dialogue generation.

[0005] To achieve the above objectives, the present invention provides a dialogue generation method based on multi-layer attention, comprising:

[0006] Obtain the dialogue text from a multi-turn dialogue task, perform word segmentation on the dialogue text, and obtain a word segmentation sequence set;

[0007] Using the pre-trained first encoder, the word segmentation sequences in the word segmentation sequence set are encoded to obtain the topic sentence vector set, and attention weights are assigned to each topic sentence vector in the topic sentence vector set to obtain the final sentence vector set;

[0008] The final sentence vector set is fused and encoded using the pre-trained second encoder to obtain a fused encoded vector. The similarity between the fused encoded vector and each topic sentence vector in the topic sentence vector set is calculated. The similarity is normalized to obtain a sentence attention weight set. The corresponding topic sentence vectors in the topic sentence vector set are weighted and summed according to the sentence attention weight set to obtain the overall topic vector.

[0009] The final sentence vector set is weighted using a self-attention mechanism to obtain the target context attention vector set.

[0010] The overall topic vector and the target context attention vector are decoded using a preset decoder to obtain the answer word set. The generated word set is then concatenated in sequence to obtain the answer sentence text of the dialogue text.

[0011] Optionally, the first encoder, which has been pre-trained, encodes the word segmentation sequences in the word segmentation sequence set to obtain a topic sentence vector set, including:

[0012] The word segmentation sequences in the word segmentation sequence set are encoded sequentially using the forward recurrent network in the first pre-trained encoder to obtain a forward sentence vector set;

[0013] The reverse recurrent network in the first encoder is used to encode each word sequence in the word segmentation sequence set in reverse, so as to obtain the reverse sentence vector set;

[0014] By concatenating the vectors at corresponding positions in the forward sentence vector set and the reverse sentence vector set, the topic sentence vector set is obtained.

[0015] Optionally, assigning attention weights to each topic sentence vector in the topic sentence vector set to obtain the final sentence vector set includes:

[0016] The attention weight of each topic sentence vector in the topic sentence vector set is calculated using the attention mechanism in the pre-defined fully connected layer;

[0017] The attention weight of each topic sentence vector is multiplied by the corresponding topic sentence vector to obtain a weighted topic sentence vector set. The weighted topic sentence vectors in the weighted topic sentence vector set are added to the corresponding topic sentence vectors to obtain the final sentence vector set.

[0018] Optionally, the step of using a pre-trained second encoder to fuse and encode the final sentence vector set to obtain a fused encoded vector includes:

[0019] The first final sentence vector in the final sentence vector set is weighted and summed using the first hidden layer in the pre-trained second encoder to obtain the first final sentence hidden layer vector;

[0020] The second final sentence vector and the first final sentence hidden layer vector in the final sentence vector set are weighted and summed using the second hidden layer in the second encoder to obtain the second final sentence hidden layer vector. The weighted summation of the final sentence vectors is performed sequentially until the last sentence vector in the final sentence vector set is traversed to obtain the fused encoding vector.

[0021] Optionally, the step of applying self-attention weights to the final sentence vector set using a self-attention mechanism to obtain the target context attention vector set includes:

[0022] The final sentence vectors in the final sentence vector set are copied three times to obtain the first final sentence vector set, the second final sentence vector set, and the third final sentence vector set;

[0023] The inner product of the first final sentence vector in the first final sentence vector set with all the final sentence vectors in the second final sentence vector set is calculated to obtain the first inner product value set. The first inner product value set is normalized to obtain the first self-attention weight set. The first self-attention weight set is multiplied with the first final sentence vector in the third final sentence vector set to obtain the first context attention vector set.

[0024] The inner product of the second final sentence vector in the first final sentence vector set with all the final sentence vectors in the second final sentence vector set is calculated to obtain the second inner product value set. The second inner product value set is normalized to obtain the second self-attention weight set. The second self-attention weight set is multiplied with the second final sentence vector in the third final sentence vector set to obtain the second context attention vector set.

[0025] The first context attention vector set and the second context attention vector set are processed by a fully connected layer to obtain the target context attention vector set.

[0026] Optionally, the step of decoding the overall topic vector and the target context attention vector using a preset decoder to obtain the answer word set includes:

[0027] A random word vector is generated as the first word vector;

[0028] The first word vector, the overall topic vector, and the target context attention vector are decoded N times using a preset decoder until a terminator is detected, thus obtaining the answer word set, where N is an integer greater than 1.

[0029] Optionally, the step of sequentially encoding each word segmentation sequence in the word segmentation sequence set using the forward recurrent network in the pre-trained first encoder to obtain a forward sentence vector set includes:

[0030] Using the first node of the forward recurrent network in the pre-trained first encoder, the first word of each word segmentation sequence in the word segmentation sequence set is encoded to obtain the first word segmentation vector corresponding to each word segmentation sequence;

[0031] Using the second node of the forward recurrent network in the first encoder, the first word segmentation vector and the next word in each corresponding word segmentation sequence are encoded respectively to obtain the second word segmentation vector corresponding to each word segmentation sequence. This process continues until all word segmentation in each word segmentation sequence has been traversed. All word segmentation vectors are then integrated to obtain a forward sentence vector set.

[0032] To address the aforementioned problems, the present invention also provides a dialogue generation device based on multi-layer attention, the device comprising:

[0033] The word segmentation module is used to obtain the dialogue text in a multi-turn dialogue task, perform word segmentation on the dialogue text, and obtain a word segmentation sequence set.

[0034] The word attention weighting module is used to encode the word segmentation sequences in the word segmentation sequence set using the pre-trained first encoder to obtain the topic sentence vector set, and to assign attention weights to each topic sentence vector in the topic sentence vector set to obtain the final sentence vector set.

[0035] The topic vector fusion module is used to fuse and encode the final sentence vector set using a pre-trained second encoder to obtain a fused encoded vector, to calculate the similarity between the fused encoded vector and each topic sentence vector in the topic sentence vector set, to normalize the similarity to obtain a sentence attention weight set, and to weight and sum the corresponding topic sentence vectors in the topic sentence vector set according to the sentence attention weight set to obtain the overall topic vector.

[0036] The sentence attention weighting module is used to perform self-attention weighting on the final sentence vector set using a self-attention mechanism to obtain the target context attention vector set.

[0037] The decoding module is used to decode the overall topic vector and the target context attention vector using a preset decoder to obtain the answer word set, and then concatenate the generated word set in sequence to obtain the answer sentence text of the dialogue text.

[0038] To address the above problems, the present invention also provides an electronic device, the electronic device comprising:

[0039] At least one processor; and,

[0040] A memory communicatively connected to the at least one processor; wherein,

[0041] The memory stores a computer program that can be executed by the at least one processor, which enables the at least one processor to perform the multi-layer attention-based dialogue generation method described above.

[0042] To address the aforementioned problems, the present invention also provides a computer-readable storage medium storing at least one computer program, which is executed by a processor in an electronic device to implement the above-described dialogue generation method based on multi-layer attention.

[0043] In embodiments of this invention, dialogue text is acquired from a multi-turn dialogue task and segmented into words to obtain a word segmentation sequence set, and a pre-trained first encoder encodes the word segmentation sequences in that set to obtain a topic sentence vector set. Attention weights are assigned to each topic sentence vector in the topic sentence vector set to obtain a final sentence vector set. Because the topic sentence vectors are weighted from the perspective of word attention, the final sentence vectors better reflect the semantics of the entire dialogue text and the degree of influence on the generated text of the current dialogue, thereby improving the accuracy of the generated text. Further, a pre-trained second encoder performs fusion encoding on the final sentence vector set to obtain a fusion encoded vector. The similarity between the fusion encoded vector and each topic sentence vector in the topic sentence vector set is calculated and normalized to obtain a sentence attention weight set, which is used to weight and sum the corresponding topic sentence vectors in the topic sentence vector set to obtain the overall topic vector. This fully considers the topic information of each sentence and of the entire dialogue text, making the subsequently generated dialogue text more relevant to the topic and thus improving the accuracy of dialogue generation. Furthermore, a self-attention mechanism assigns self-attention weights to the final sentence vector set to obtain a target context attention vector set, which ensures that contextual information is fully considered in the dialogue scenario and further improves the accuracy of the generated dialogue text. A preset decoder decodes the overall topic vector and the target context attention vector to obtain the answer word set, and the answer words are concatenated in sequence to obtain the answer sentence text of the dialogue text. This combines word attention, sentence context attention, and topic information, further improving the accuracy of the generated dialogue text. Therefore, the dialogue generation method, device, electronic device, and computer-readable storage medium based on multi-layer attention proposed in this invention can solve the problem of low accuracy in dialogue text generation.

Attached Figure Description

[0044] Figure 1 is a flowchart of a dialogue generation method based on multi-layer attention provided in an embodiment of the present invention.

[0045] Figure 2 shows a detailed implementation flow of one step of the dialogue generation method based on multi-layer attention shown in Figure 1.

[0046] Figure 3 shows a detailed implementation flow of another step of the dialogue generation method based on multi-layer attention shown in Figure 1.

[0047] Figure 4 is a functional block diagram of a dialogue generation device based on multi-layer attention provided in an embodiment of the present invention.

[0048] Figure 5 is a schematic diagram of the structure of an electronic device that implements the multi-layer attention-based dialogue generation method according to an embodiment of the present invention.

[0049] The realization of the objective, functional features and advantages of the present invention will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Implementation

[0050] It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

[0051] This application provides a dialogue generation method based on multi-layer attention. The execution entity of the multi-layer attention-based dialogue generation method includes, but is not limited to, at least one of the following electronic devices that can be configured to execute the method provided in this application: a server, a terminal, etc. In other words, the multi-layer attention-based dialogue generation method can be executed by software or hardware installed on a terminal device or a server device, and the software can be a blockchain platform. The server includes, but is not limited to, a single server, a server cluster, a cloud server, or a cloud server cluster. The server can be an independent server or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.

[0052] Referring to Figure 1, which is a flowchart of a dialogue generation method based on multi-layer attention according to an embodiment of the present invention, the dialogue generation method based on multi-layer attention in this embodiment includes:

[0053] S1. Obtain the dialogue text from the multi-turn dialogue task, and perform word segmentation on the dialogue text to obtain a word segmentation sequence set.

[0054] In this embodiment of the invention, the dialogue text is the text in a multi-turn dialogue task, such as the dialogue text between a doctor and patient on a consultation platform, which contains multiple sentences of dialogue text.

[0055] In this embodiment of the invention, the dialogue text can be segmented into sentences based on punctuation marks to obtain a text sentence sequence, wherein the text sentence sequence contains all sentences in the dialogue text. Furthermore, common word segmentation tools such as jieba and THULAC are used to segment each sentence in the text sentence sequence to obtain the word segmentation sequence corresponding to each sentence.
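
As a minimal illustrative sketch of step S1, assuming a particular set of sentence-ending punctuation marks and using a whitespace split as a stand-in tokenizer, the sentence splitting and word segmentation could look as follows. The function names and the punctuation set are hypothetical and do not come from the patent text; for Chinese dialogue text, a segmenter such as jieba would take the place of `segment_words`.

```python
import re

# Sentence-ending punctuation (Chinese and Western); this set is an assumption.
SENTENCE_END = r"[。！？!?.;；]"


def split_sentences(dialogue_text):
    """Split dialogue text into sentences at punctuation marks."""
    parts = re.split(SENTENCE_END, dialogue_text)
    return [p.strip() for p in parts if p.strip()]


def segment_words(sentence):
    """Stand-in tokenizer: split on whitespace.

    For Chinese text a real segmenter would be used here instead,
    for example jieba.lcut(sentence).
    """
    return sentence.split()


def build_segmentation_set(dialogue_text):
    """Return one word segmentation sequence per sentence."""
    return [segment_words(s) for s in split_sentences(dialogue_text)]


sequences = build_segmentation_set(
    "I have a headache. How long has it lasted? Two days."
)
```

Each element of `sequences` corresponds to one sentence of the dialogue, which is the shape the first encoder in S2 consumes.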

[0056] S2. Using the pre-trained first encoder, the word segmentation sequences in the word segmentation sequence set are encoded to obtain the topic sentence vector set, and attention weights are assigned to each topic sentence vector in the topic sentence vector set to obtain the final sentence vector set.

[0057] In this embodiment of the invention, the pre-trained first encoder can be a bidirectional gated recurrent neural network (GRU) model, which includes a forward recurrent network and a backward recurrent network.

[0058] In detail, referring to Figure 2, encoding the word segmentation sequences in the word segmentation sequence set using the pre-trained first encoder in S2 to obtain a topic sentence vector set includes:

[0059] S21. Encode each word sequence in the word segmentation sequence set sequentially using the forward recurrent network in the first pre-trained encoder to obtain a forward sentence vector set;

[0060] S22. The reverse recurrent network in the first encoder is used to encode each word sequence in the word segmentation sequence set in reverse order to obtain the reverse sentence vector set;

[0061] S23. Concatenate the vectors at corresponding positions in the forward sentence vector set and the reverse sentence vector set to obtain the topic sentence vector set.

[0062] Further, S21 includes:

[0063] Using the first node of the forward recurrent network in the pre-trained first encoder, the first word of each word segmentation sequence in the word segmentation sequence set is encoded to obtain the first word segmentation vector corresponding to each word segmentation sequence;

[0064] Using the second node of the forward recurrent network in the first encoder, the first word segmentation vector and the next word in each corresponding word segmentation sequence are encoded respectively to obtain the second word segmentation vector corresponding to each word segmentation sequence. This process continues until all word segmentation in each word segmentation sequence has been traversed. All word segmentation vectors are then integrated to obtain a forward sentence vector set.

[0065] In this embodiment of the invention, take one word segmentation sequence in the word segmentation sequence set, c = {c_1, c_2, ..., c_X}. The first word segment c_1 is encoded using the first node of the forward recurrent network in the pre-trained first encoder to obtain the first word segment vector h_1; the second word segment c_2 and the first word segment vector h_1 are encoded using the second node of the forward recurrent network in the first encoder to obtain the second word segment vector h_2; this process continues until the X-th word segment c_X and the (X-1)-th word segment vector h_{X-1} are encoded using the X-th node of the forward recurrent network in the first encoder to obtain the X-th word segment vector h_X, yielding the vector set {h_1, h_2, ..., h_X} corresponding to sequence c.

[0066] In this embodiment of the invention, the reverse recurrent network in the first encoder is further used to encode the word segmentation sequence c = {c_1, c_2, ..., c_X} in reverse, starting from the last word segment c_X and proceeding until all word segments in the sequence have been traversed. The encoding method is the same as that of forward encoding and is not elaborated here.
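
The forward and reverse passes of S21 to S23 can be sketched as below. This is only an illustration under simplifying assumptions: the recurrent node is a plain elementwise tanh update rather than a gated GRU cell, and the weights `w_x` and `w_h`, the function names, and the toy word embeddings are all hypothetical.

```python
import math


def recurrent_step(x, h_prev, w_x=0.5, w_h=0.5):
    """Simplified recurrent node: elementwise tanh(w_x * x + w_h * h_prev).

    A real GRU adds update and reset gates; this is only a stand-in.
    """
    return [math.tanh(w_x * xi + w_h * hi) for xi, hi in zip(x, h_prev)]


def run_direction(word_vectors):
    """Encode words one node at a time, feeding each state to the next node."""
    h = [0.0] * len(word_vectors[0])
    states = []
    for x in word_vectors:
        h = recurrent_step(x, h)
        states.append(h)
    return states


def encode_bidirectional(word_vectors):
    """Concatenate forward and reverse states at corresponding positions."""
    forward = run_direction(word_vectors)
    # Run over the reversed sequence, then flip back to the original word order.
    backward = run_direction(word_vectors[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]


# Toy 2-dimensional embeddings for a three-word sequence (made-up values).
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoded = encode_bidirectional(words)
```

Each entry of `encoded` has twice the embedding dimension, because the forward and reverse vectors for the same position are joined end to end.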

[0067] In this embodiment of the invention, a pre-trained first encoder is used to encode the word segmentation sequences in the word segmentation sequence set to obtain a topic sentence vector set, which makes the subsequent dialogue generated text closer to the dialogue topic, thereby making the accuracy of the dialogue generated text higher.

[0068] In one embodiment of the present invention, the preset fully connected layer is a restricted neural network.

[0069] Furthermore, the process described in S2, which assigns attention weights to each topic sentence vector in the topic sentence vector set to obtain the final sentence vector set, includes:

[0070] The attention weight of each topic sentence vector in the topic sentence vector set is calculated using the attention mechanism in the pre-defined fully connected layer;

[0071] The attention weight of each topic sentence vector is multiplied by the corresponding topic sentence vector to obtain a weighted topic sentence vector set. The weighted topic sentence vectors in the weighted topic sentence vector set are added to the corresponding topic sentence vectors to obtain the final sentence vector set.

[0072] In this embodiment of the invention, the sum of the attention weights of each topic sentence vector in the topic sentence vector set is limited to 1.
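
A minimal sketch of this weighting step follows, assuming that the fully connected layer reduces to one dot product plus a bias per vector and that softmax is what keeps the weights summing to 1. Both choices, along with the parameter names `w` and `b`, are assumptions made for illustration; the patent text does not specify them.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_residual(topic_vectors, w, b=0.0):
    """Weight each topic sentence vector, then add the original vector back."""
    # Hypothetical scoring layer: one dot product per vector plus a bias.
    scores = [sum(wi * vi for wi, vi in zip(w, vec)) + b for vec in topic_vectors]
    alphas = softmax(scores)
    # Final sentence vector = weighted vector + original vector.
    final = [[a * v + v for v in vec] for a, vec in zip(alphas, topic_vectors)]
    return alphas, final


topic_vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
alphas, final_vectors = attention_with_residual(topic_vectors, w=[0.0, 0.0])
```

Adding the original vector back means that a sentence with a small attention weight is attenuated relative to the others but never erased.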

[0073] In this embodiment of the invention, weights are assigned to the topic sentence vectors so that the final sentence vectors can better reflect the semantics of the entire dialogue text and the degree of influence on the current dialogue-generated text, thereby making the accuracy of the dialogue-generated text higher.

[0074] S3. The pre-trained second encoder is used to fuse and encode the final sentence vector set to obtain a fused encoded vector. The similarity between the fused encoded vector and each topic sentence vector in the topic sentence vector set is calculated. The similarity is normalized to obtain a sentence attention weight set. The corresponding topic sentence vectors in the topic sentence vector set are weighted and summed according to the sentence attention weight set to obtain the overall topic vector.

[0075] In this embodiment of the invention, the pre-trained second encoder is a bidirectional gated recurrent neural network (GRU) model.

[0076] In detail, S3 describes using a pre-trained second encoder to fuse and encode the final sentence vector set to obtain a fused encoded vector, including:

[0077] The first final sentence vector in the final sentence vector set is weighted and summed using the first hidden layer in the pre-trained second encoder to obtain the first final sentence hidden layer vector;

[0078] The second final sentence vector and the first final sentence hidden layer vector in the final sentence vector set are weighted and summed using the second hidden layer in the second encoder to obtain the second final sentence hidden layer vector. The weighted summation of the final sentence vectors is performed sequentially until the last sentence vector in the final sentence vector set is traversed to obtain the fused encoding vector.

[0079] In one embodiment of the present invention, the final sentence vector set {l_1, l_2, ..., l_m}, where l_1 is the final sentence vector of the first sentence, serves as the input sequence for the pre-trained second encoder. The first final sentence vector l_1 in the final sentence vector set is weighted and summed using the first hidden layer of the pre-trained second encoder to obtain the first final sentence hidden layer vector l_{1t}. Using the second hidden layer in the second encoder, the second final sentence vector l_2 and the first final sentence hidden layer vector l_{1t} are weighted and summed to obtain the second final sentence hidden layer vector l_{2t}; this continues until the m-th final sentence vector l_m has been encoded, which yields the fused encoding vector. Algorithms such as Euclidean distance and the Pearson correlation coefficient are used to calculate the similarity between the fused encoding vector and each topic sentence vector in the topic sentence vector set, giving m similarity scores. These similarity scores are then normalized using softmax to obtain the sentence attention weight set β = {β_1, β_2, ..., β_m}. The sentence attention weight set β is then used to weight and sum the topic sentence vector set to obtain the overall topic vector containing the overall topic information.
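
The computation in S3 can be sketched as follows. The sketch assumes that each hidden layer is a fixed weighted sum with hypothetical weights `w_in` and `w_prev`, that the fused vector and the topic sentence vectors share one dimension, and that negative Euclidean distance serves as the similarity score so that closer vectors receive larger weights. None of these details are specified in the patent text.

```python
import math


def fuse(final_vectors, w_in=0.5, w_prev=0.5):
    """Fold the final sentence vectors into one vector, one hidden layer per step."""
    hidden = [0.0] * len(final_vectors[0])
    for vec in final_vectors:
        hidden = [w_in * x + w_prev * h for x, h in zip(vec, hidden)]
    return hidden


def similarity(a, b):
    """Negative Euclidean distance, so that closer vectors score higher."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def overall_topic_vector(topic_vectors, final_vectors):
    """Weighted sum of topic sentence vectors using sentence attention weights."""
    fused = fuse(final_vectors)
    betas = softmax([similarity(fused, t) for t in topic_vectors])
    dim = len(topic_vectors[0])
    topic = [sum(b * t[i] for b, t in zip(betas, topic_vectors)) for i in range(dim)]
    return betas, topic


topic_vectors = [[1.0, 0.0], [0.0, 1.0]]
final_vectors = [[2.0, 0.0], [2.0, 0.0]]
betas, topic = overall_topic_vector(topic_vectors, final_vectors)
```

In this toy input the fused vector lies closer to the first topic sentence vector, so the first sentence receives the larger attention weight and dominates the overall topic vector.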

[0080] In this embodiment of the invention, the normalized value of the similarity between the fused encoding vector and each topic sentence vector in the topic sentence vector set is used as the weight of each topic sentence vector. The overall topic vector of the overall topic information is calculated, which fully considers the topic information of the entire dialogue text, so that the subsequent dialogue generation text is more relevant to the topic, thereby making the accuracy of dialogue generation higher.

[0081] S4. Apply self-attention weights to the final sentence vector set using a self-attention mechanism to obtain the target context attention vector set.

[0082] In detail, referring to Figure 3, S4 includes:

[0083] S41. Copy the final sentence vectors in the final sentence vector set three times to obtain the first final sentence vector set, the second final sentence vector set, and the third final sentence vector set;

[0084] S42. Perform inner product calculations on the first final sentence vector in the first final sentence vector set and all final sentence vectors in the second final sentence vector set to obtain a first inner product value set. Normalize the first inner product values to obtain a first self-attention weight set. Multiply the first self-attention weight set with the first final sentence vector in the third final sentence vector set to obtain a first context attention vector set.

[0085] S43. Perform inner product calculation on the second final sentence vector in the first final sentence vector set and all final sentence vectors in the second final sentence vector set respectively to obtain the second inner product value set. Normalize the second inner product value set to obtain the second self-attention weight set. Multiply the second self-attention weight set with the second final sentence vector in the third final sentence vector set respectively to obtain the second context attention vector set.

[0086] S44. Perform a fully connected operation on the first context attention vector set and the second context attention vector set to obtain the target context attention vector set.

[0087] In this embodiment of the invention, the final sentence vectors {l_1, l_2, ..., l_m} in the final sentence vector set are copied three times to obtain the first final sentence vector set {k_1, k_2, ..., k_m}, the second final sentence vector set {q_1, q_2, ..., q_m}, and the third final sentence vector set {v_1, v_2, ..., v_m}. The inner product of the first final sentence vector k_1 in the first final sentence vector set with each final sentence vector in the second final sentence vector set {q_1, q_2, ..., q_m} is calculated to obtain the first inner product value set. The first inner product value set is normalized to obtain the first self-attention weight set. The first self-attention weight set is multiplied by v_1 to obtain the first context attention vector set.

[0088] In this embodiment of the invention, the inner product of the second final sentence vector k_2 in the first final sentence vector set with each final sentence vector in the second final sentence vector set {q_1, q_2, ..., q_m} is further calculated to obtain the second inner product value set. The second inner product value set is normalized to obtain the second self-attention weight set. The second self-attention weight set is multiplied by v_2 to obtain the second context attention vector set.
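
The self-attention weighting of S41 to S43 can be sketched literally as described, assuming softmax as the normalization and assuming that the pattern given for the first and second sentences extends to every sentence. The fully connected operation of S44 is omitted because its parameters are not given. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def context_attention_sets(final_vectors):
    """For each sentence i, scale v_i by the normalized products of k_i with every q_j."""
    # The three copies are identical; no learned projections are applied here.
    keys = [list(v) for v in final_vectors]
    queries = [list(v) for v in final_vectors]
    values = [list(v) for v in final_vectors]
    result = []
    for k_i, v_i in zip(keys, values):
        weights = softmax([dot(k_i, q_j) for q_j in queries])
        # One scaled copy of v_i per weight, as the description states.
        result.append([[w * x for x in v_i] for w in weights])
    return result


final_vectors = [[1.0, 0.0], [0.0, 1.0]]
context_sets = context_attention_sets(final_vectors)
```

This follows the wording of the description, in which each weight set scales the sentence's own vector v_i. Conventional dot-product self-attention would instead sum the weights against all of the value vectors v_1 to v_m; which of the two the original filing intends cannot be determined from this text.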

[0089] In this embodiment of the invention, a vector that fuses the context of each sentence is obtained based on the attention between the final sentence vectors in the final sentence vector set. This allows the focus to be placed not only on the current text but also on the context information in the dialogue scenario, resulting in higher accuracy of the dialogue-generated text.

[0090] S5. Decode the overall topic vector and the target context attention vector using a preset decoder to obtain the answer word set, and concatenate the words in the answer word set sequentially to obtain the answer sentence text of the dialogue text.

[0091] In this embodiment of the invention, the preset decoder can be a bidirectional gated recurrent neural network (GRU) model.

[0092] In detail, in S5, decoding the overall topic vector and the target context attention vector using a preset decoder to obtain the answer word set includes:

[0093] A random word vector is generated as the first word vector;

[0094] The first word vector, the overall topic vector, and the target context attention vector are decoded N times using a preset decoder until a terminator is detected, thus obtaining the answer word set, where N is an integer greater than 1.

[0095] In one embodiment of the present invention, a preset decoder is used to decode the first word vector, the overall topic vector, and the target context attention vector to obtain a first target answer word; when no terminator is detected, the first target answer word is used as the target answer word of the previous time step, and the preset decoder is used to decode the target answer word of the previous time step, the overall topic vector, and the target context attention vector to obtain a second target answer word, until a terminator is detected to obtain the Nth target answer word, and all target answer words are integrated into an answer word set.

[0096] In another embodiment of the present invention, the end time of decoding the overall topic vector and the target context attention vector by the preset decoder can be limited according to the preset answer text length.
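The decoding loop described above can be sketched as follows. The decoder itself is replaced by a scripted stand-in (`make_scripted_step`), since the trained model is outside the scope of the example; `END`, `max_len`, and the function names are hypothetical. The loop stops either when the terminator is produced or when the preset answer length is reached.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]
END = "<eos>"  # hypothetical terminator token

StepFn = Callable[[Vector, Vector, List[Vector]], Tuple[str, Vector]]

def decode(step_fn: StepFn, topic_vector: Vector, context_vectors: List[Vector],
           max_len: int = 20, seed: int = 0) -> List[str]:
    """Decode one word per step until the terminator or max_len is reached."""
    rng = random.Random(seed)
    # A random word vector serves as the first input.
    prev = [rng.random() for _ in range(len(topic_vector))]
    answer: List[str] = []
    for _ in range(max_len):
        word, prev = step_fn(prev, topic_vector, context_vectors)
        if word == END:
            break
        answer.append(word)
    return answer

def make_scripted_step(words: List[str]) -> StepFn:
    """Stand-in for a trained decoder: emits the given words in order."""
    it = iter(words)
    def step(prev: Vector, topic: Vector, context: List[Vector]) -> Tuple[str, Vector]:
        # A real decoder would return the embedding of the emitted word here.
        return next(it, END), prev
    return step

script = ["hello", "there", END, "ignored"]
answer_words = decode(make_scripted_step(script), [0.1, 0.2], [[0.0, 0.0]])
answer_text = " ".join(answer_words)
short_answer = decode(make_scripted_step(script), [0.1, 0.2], [[0.0, 0.0]], max_len=1)
```

The space separator in `answer_text` suits the English toy words; for text without word spacing the words would be joined directly.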

[0097] In the embodiments of the present invention, dialogue text is acquired from a multi-turn dialogue task, word segmentation is performed on the dialogue text to obtain a word segmentation sequence set, and a pre-trained first encoder encodes the word segmentation sequences in the word segmentation sequence set to obtain a topic sentence vector set. Attention weights are assigned to each topic sentence vector in the topic sentence vector set to obtain a final sentence vector set. Because the topic sentence vectors are weighted from the perspective of word attention, the final sentence vectors better reflect the semantics of the entire dialogue text and each sentence's degree of influence on the text generated for the current dialogue, thereby improving the accuracy of the generated text. Further, a pre-trained second encoder performs fusion encoding on the final sentence vector set to obtain a fused encoded vector. The similarity between the fused encoded vector and each topic sentence vector in the topic sentence vector set is calculated and normalized to obtain a sentence attention weight set, which is used to weight and sum the corresponding topic sentence vectors in the topic sentence vector set to obtain the overall topic vector. This takes into account the topic information of each sentence and of the entire dialogue text, making the subsequently generated dialogue text more relevant to the topic and thus improving the accuracy of dialogue generation. Furthermore, a self-attention mechanism assigns self-attention weights to the final sentence vector set to obtain a target context attention vector set, so that contextual information in the dialogue scenario is taken into account, further improving the accuracy of the generated dialogue text. A preset decoder decodes the overall topic vector and the target context attention vector to obtain the answer word set, and the words in the answer word set are concatenated sequentially to obtain the answer sentence text of the dialogue text. This combines word attention, sentence context attention, and topic information, resulting in higher accuracy of the generated dialogue text. Therefore, the dialogue generation method based on multi-layer attention proposed in this invention can solve the problem of low accuracy in dialogue text generation.
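The sentence-level attention that produces the overall topic vector can be sketched as follows. The inner product is assumed as the similarity measure and softmax as the normalization; the function names and toy vectors are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]

def softmax(xs: List[float]) -> List[float]:
    """Normalize similarities into weights that sum to 1 (assumed normalization)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def overall_topic_vector(fused: Vector,
                         topic_vectors: List[Vector]) -> Tuple[Vector, List[float]]:
    """Weight each topic sentence vector by its similarity to the fused vector."""
    similarities = [dot(fused, t) for t in topic_vectors]
    weights = softmax(similarities)           # sentence attention weight set
    dim = len(topic_vectors[0])
    topic = [sum(w * t[d] for w, t in zip(weights, topic_vectors))
             for d in range(dim)]
    return topic, weights

fused_encoded_vector = [1.0, 0.0]             # toy output of the second encoder
topic_sentence_vectors = [[1.0, 0.0], [0.0, 1.0]]
topic, sentence_weights = overall_topic_vector(fused_encoded_vector,
                                               topic_sentence_vectors)
```

In this toy case the first sentence is more similar to the fused vector, so it receives the larger weight and dominates the overall topic vector.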

[0098] Figure 4 is a functional block diagram of a dialogue generation device based on multi-layer attention provided in an embodiment of the present invention.

[0099] The dialogue generation device 100 based on multi-layer attention described in this invention can be installed in an electronic device. Depending on the functions implemented, the dialogue generation device 100 based on multi-layer attention may include a word segmentation module 101, a word attention weighting module 102, a topic vector fusion module 103, a sentence attention weighting module 104, and a decoding module 105. The module described in this invention can also be called a unit, which refers to a series of computer program segments that can be executed by the processor of an electronic device and can perform a fixed function, and which are stored in the memory of the electronic device.

[0100] In this embodiment, the functions of each module / unit are as follows:

[0101] The word segmentation module 101 is used to acquire dialogue text in a multi-turn dialogue task, perform word segmentation on the dialogue text, and obtain a word segmentation sequence set.

[0102] The word attention weighting module 102 is used to encode the word segmentation sequences in the word segmentation sequence set using the pre-trained first encoder to obtain a topic sentence vector set, and to assign attention weights to each topic sentence vector in the topic sentence vector set to obtain a final sentence vector set.

[0103] The topic vector fusion module 103 is used to fuse and encode the final sentence vector set using a pre-trained second encoder to obtain a fused encoded vector, and to calculate the similarity between the fused encoded vector and each topic sentence vector in the topic sentence vector set, and to normalize the similarity to obtain a sentence attention weight set, and to perform a weighted summation of the corresponding topic sentence vectors in the topic sentence vector set according to the sentence attention weight set to obtain the overall topic vector;

[0104] The sentence attention weighting module 104 is used to perform self-attention weighting on the final sentence vector set using a self-attention mechanism to obtain the target context attention vector set.

[0105] The decoding module 105 is used to decode the overall topic vector and the target context attention vector using a preset decoder to obtain the answer word set, and then concatenate the generated word set in sequence to obtain the answer sentence text of the dialogue text.
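How the five modules hand data to one another can be sketched as follows. The class name, field names, and the trivial stand-in callables are hypothetical; only the data flow between modules 101 to 105 is taken from the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

@dataclass
class DialogueGenerationDevice:
    # module 101: dialogue text -> word segmentation sequence set
    segment: Callable[[Sequence[str]], List[List[str]]]
    # module 102: sequences -> (topic sentence vectors, final sentence vectors)
    word_attention: Callable[[List[List[str]]], Tuple[List[Vector], List[Vector]]]
    # module 103: (final vectors, topic vectors) -> overall topic vector
    fuse_topic: Callable[[List[Vector], List[Vector]], Vector]
    # module 104: final vectors -> target context attention vector set
    sentence_attention: Callable[[List[Vector]], List[Vector]]
    # module 105: (topic vector, context vectors) -> answer word set
    decode: Callable[[Vector, List[Vector]], List[str]]

    def generate(self, dialogue: Sequence[str]) -> str:
        sequences = self.segment(dialogue)
        topic_vecs, final_vecs = self.word_attention(sequences)
        topic = self.fuse_topic(final_vecs, topic_vecs)
        context = self.sentence_attention(final_vecs)
        return " ".join(self.decode(topic, context))

# Trivial stand-ins so the wiring can be exercised without trained models.
device = DialogueGenerationDevice(
    segment=lambda dialogue: [turn.split() for turn in dialogue],
    word_attention=lambda seqs: ([[float(len(s))] for s in seqs],
                                 [[float(len(s))] for s in seqs]),
    fuse_topic=lambda final_vecs, topic_vecs: [sum(v[0] for v in topic_vecs)],
    sentence_attention=lambda final_vecs: final_vecs,
    decode=lambda topic, context: ["turns:", str(len(context)),
                                   "words:", str(int(topic[0]))],
)
reply = device.generate(["how are you", "fine thanks"])
```

Keeping each module behind a plain callable mirrors the statement that the module division is a logical one: any stage can be swapped without touching the others.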

[0106] In detail, the modules in the multi-layer attention-based dialogue generation device 100 described in this embodiment of the invention employ the same technical means as the dialogue generation method based on multi-layer attention described above with reference to Figures 1 to 3, and can produce the same technical effect, so they are not described again here.

[0107] Figure 5 is a structural schematic of an electronic device that implements a dialogue generation method based on multi-layer attention, according to an embodiment of the present invention.

[0108] The electronic device 1 may include a processor 10, a memory 11, a communication bus 12, and a communication interface 13. It may also include a computer program stored in the memory 11 and capable of running on the processor 10, such as a dialogue generation program based on multi-layer attention.

[0109] In some embodiments, the processor 10 may be composed of integrated circuits, such as a single packaged integrated circuit or multiple integrated circuits with the same or different functions, including combinations of one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and various control chips. The processor 10 is the control unit of the electronic device, connecting various components of the entire electronic device through various interfaces and lines. It executes programs or modules stored in the memory 11 (e.g., executing a dialogue generation program based on multi-layer attention) and calls data stored in the memory 11 to perform various functions of the electronic device and process data.

[0110] The memory 11 includes at least one type of readable storage medium, including flash memory, portable hard drive, multimedia card, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 11 can be an internal storage unit of an electronic device, such as a portable hard drive. In other embodiments, the memory 11 can be an external storage device of the electronic device, such as a plug-in portable hard drive, Smart Media Card (SMC), Secure Digital (SD) card, Flash Card, etc. Furthermore, the memory 11 can include both internal and external storage units of the electronic device. The memory 11 can be used not only to store application software and various types of data installed on the electronic device, such as code for a dialogue generation program based on multi-layer attention, but also to temporarily store data that has been output or will be output.

[0111] The communication bus 12 can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. This bus can be divided into an address bus, a data bus, a control bus, etc. The bus is configured to enable communication between the memory 11 and at least one processor 10, etc.

[0112] The communication interface 13 is used for communication between the aforementioned electronic device and other devices, including a network interface and a user interface. Optionally, the network interface may include a wired interface and / or a wireless interface (such as a Wi-Fi interface, Bluetooth interface, etc.), typically used to establish communication connections between the electronic device and other electronic devices. The user interface may be a display, an input unit (such as a keyboard), or, optionally, a standard wired or wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, or an OLED (Organic Light-Emitting Diode) touchscreen, etc. The display may also be appropriately referred to as a screen or display unit, used to display information processed in the electronic device and to display a visual user interface.

[0113] Figure 5 shows only an electronic device with certain components; those skilled in the art will understand that the structure shown in Figure 5 does not constitute a limitation on the electronic device 1, which may include fewer or more components than shown, combine certain components, or have a different component arrangement.

[0114] For example, although not shown, the electronic device may also include a power supply (such as a battery) to power the various components. Preferably, the power supply can be logically connected to the at least one processor 10 through a power management device, thereby enabling functions such as charging management, discharging management, and power consumption management. The power supply may also include one or more DC or AC power supplies, recharging devices, power fault detection circuits, power converters or inverters, power status indicators, and other arbitrary components. The electronic device may also include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be described in detail here.

[0115] It should be understood that the embodiments described are for illustrative purposes only, and the scope of the patent application is not limited to this structure.

[0116] The dialogue generation program based on multi-layer attention stored in the memory 11 of the electronic device 1 is a combination of multiple instructions, which, when run in the processor 10, can achieve the following:

[0117] Obtain the dialogue text from a multi-turn dialogue task, perform word segmentation on the dialogue text, and obtain a word segmentation sequence set;

[0118] Using the pre-trained first encoder, the word segmentation sequences in the word segmentation sequence set are encoded to obtain the topic sentence vector set, and attention weights are assigned to each topic sentence vector in the topic sentence vector set to obtain the final sentence vector set;

[0119] The final sentence vector set is fused and encoded using the pre-trained second encoder to obtain a fused encoded vector. The similarity between the fused encoded vector and each topic sentence vector in the topic sentence vector set is calculated. The similarity is normalized to obtain a sentence attention weight set. The corresponding topic sentence vectors in the topic sentence vector set are weighted and summed according to the sentence attention weight set to obtain the overall topic vector.

[0120] The final sentence vector set is weighted using a self-attention mechanism to obtain the target context attention vector set.

[0121] The overall topic vector and the target context attention vector are decoded using a preset decoder to obtain the answer word set. The generated word set is then concatenated in sequence to obtain the answer sentence text of the dialogue text.

[0122] For the specific way in which the processor 10 implements the above instructions, refer to the description of the relevant steps in the embodiments corresponding to the accompanying drawings; it is not repeated here.

[0123] Furthermore, if the modules / units integrated in the electronic device 1 are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. The computer-readable storage medium can be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a portable hard drive, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).

[0124] The present invention also provides a computer-readable storage medium storing a computer program, which, when executed by a processor of an electronic device, can perform the following:

[0125] Obtain the dialogue text from a multi-turn dialogue task, perform word segmentation on the dialogue text, and obtain a word segmentation sequence set;

[0126] Using the pre-trained first encoder, the word segmentation sequences in the word segmentation sequence set are encoded to obtain the topic sentence vector set, and attention weights are assigned to each topic sentence vector in the topic sentence vector set to obtain the final sentence vector set;

[0127] The final sentence vector set is fused and encoded using the pre-trained second encoder to obtain a fused encoded vector. The similarity between the fused encoded vector and each topic sentence vector in the topic sentence vector set is calculated. The similarity is normalized to obtain a sentence attention weight set. The corresponding topic sentence vectors in the topic sentence vector set are weighted and summed according to the sentence attention weight set to obtain the overall topic vector.

[0128] The final sentence vector set is weighted using a self-attention mechanism to obtain the target context attention vector set.

[0129] The overall topic vector and the target context attention vector are decoded using a preset decoder to obtain the answer word set. The generated word set is then concatenated in sequence to obtain the answer sentence text of the dialogue text.

[0130] In the several embodiments provided by this invention, it should be understood that the disclosed devices, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of modules is only a logical functional division, and other division methods may be used in actual implementation.

[0131] The modules described as separate components may or may not be physically separate. The components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.

[0132] Furthermore, the functional modules in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or in the form of hardware plus software functional modules.

[0133] It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments described above, and that the present invention can be implemented in other specific forms without departing from the spirit or essential characteristics of the present invention.

[0134] Therefore, the embodiments should be considered exemplary and non-limiting in all respects, and the scope of the invention is defined by the appended claims rather than the foregoing description. Thus, all variations falling within the meaning and scope of equivalents of the claims are intended to be embraced within the invention. No reference sign in the claims should be construed as limiting the scope of the claims.

[0135] The blockchain referred to in this invention is a novel application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. Essentially, a blockchain is a decentralized database, a chain of data blocks linked together using cryptographic methods. Each data block contains information about a batch of network transactions, used to verify the validity of the information (anti-counterfeiting) and generate the next block. A blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.

[0136] The embodiments of this application can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) refers to the theories, methods, technologies, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.

[0137] Furthermore, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a system claim may also be implemented by a single unit or device through software or hardware. The terms "first," "second," etc., are used to indicate names and do not indicate any specific order.

[0138] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims

1. A dialogue generation method based on multi-layer attention, characterized in that, The method includes: Obtain the dialogue text from a multi-turn dialogue task, perform word segmentation on the dialogue text, and obtain a word segmentation sequence set; Using the pre-trained first encoder, the word segmentation sequences in the word segmentation sequence set are encoded to obtain the topic sentence vector set, and attention weights are assigned to each topic sentence vector in the topic sentence vector set to obtain the final sentence vector set; The final sentence vector set is fused and encoded using the pre-trained second encoder to obtain a fused encoded vector. The similarity between the fused encoded vector and each topic sentence vector in the topic sentence vector set is calculated. The similarity is normalized to obtain a sentence attention weight set. The corresponding topic sentence vectors in the topic sentence vector set are weighted and summed according to the sentence attention weight set to obtain the overall topic vector. The final sentence vector set is weighted using a self-attention mechanism to obtain the target context attention vector set. The overall topic vector and the target context attention vector are decoded using a preset decoder to obtain the answer word set. The word set is then concatenated in sequence to generate the answer sentence text of the dialogue text. 
The step of applying self-attention weights to the final sentence vector set using a self-attention mechanism to obtain the target context attention vector set includes: The final sentence vectors in the final sentence vector set are copied three times to obtain the first final sentence vector set, the second final sentence vector set, and the third final sentence vector set; The inner product of the first final sentence vector in the first final sentence vector set with all final sentence vectors in the second final sentence vector set is calculated to obtain the first inner product value set. The first inner product value set is normalized to obtain the first self-attention weight set. The first self-attention weight set is multiplied by the first final sentence vector in the third final sentence vector set to obtain the first context attention vector set. The inner product of the second final sentence vector in the first final sentence vector set with all final sentence vectors in the second final sentence vector set is calculated to obtain the second inner product value set. The second inner product value set is normalized to obtain the second self-attention weight set. The second self-attention weight set is multiplied by the second final sentence vector in the third final sentence vector set to obtain the second context attention vector set. The first context attention vector set and the second context attention vector set are processed by a fully connected layer to obtain the target context attention vector set.

2. The dialogue generation method based on multi-layer attention as described in claim 1, characterized in that, The first encoder, which has been pre-trained, encodes the word segmentation sequences in the word segmentation sequence set to obtain a topic sentence vector set, including: The word segmentation sequences in the word segmentation sequence set are encoded sequentially using the forward recurrent network in the first pre-trained encoder to obtain a forward sentence vector set; The reverse recurrent network in the first encoder is used to encode each word sequence in the word segmentation sequence set in reverse, so as to obtain the reverse sentence vector set; By concatenating the vectors at corresponding positions in the forward sentence vector set and the reverse sentence vector set, the topic sentence vector set is obtained.

3. The dialogue generation method based on multi-layer attention as described in claim 1, characterized in that, Attention weights are assigned to each topic sentence vector in the topic sentence vector set to obtain the final sentence vector set, including: The attention weight of each topic sentence vector in the topic sentence vector set is calculated using the attention mechanism in the pre-defined fully connected layer; The attention weight of each topic sentence vector is multiplied by the corresponding topic sentence vector to obtain a weighted topic sentence vector set. The weighted topic sentence vectors in the weighted topic sentence vector set are added to the corresponding topic sentence vectors to obtain the final sentence vector set.

4. The dialogue generation method based on multi-layer attention as described in claim 1, characterized in that, The step of using a pre-trained second encoder to fuse and encode the final sentence vector set to obtain a fused encoded vector includes: The first final sentence vector in the final sentence vector set is weighted and summed using the first hidden layer in the pre-trained second encoder to obtain the first final sentence hidden layer vector; The second final sentence vector in the final sentence vector set and the first final sentence hidden layer vector are weighted and summed using the second hidden layer in the second encoder to obtain the second final sentence hidden layer vector. The weighted summation of the final sentence vectors is performed sequentially until the last final sentence vector in the final sentence vector set has been traversed, to obtain the fused encoded vector.

5. The dialogue generation method based on multi-layer attention as described in any one of claims 1 to 4, characterized in that, The step involves decoding the overall topic vector and the target context attention vector using a preset decoder to obtain the answer word set, including: A random word vector is generated as the first word vector; The first word vector, the overall topic vector, and the target context attention vector are decoded N times using a preset decoder until a terminator is detected, thus obtaining the answer word set, where N is an integer greater than 1.

6. The dialogue generation method based on multi-layer attention as described in claim 2, characterized in that, The first encoder, which has been pre-trained, sequentially encodes each word segmentation sequence in the word segmentation sequence set using a forward recurrent network to obtain a forward sentence vector set, including: Using the first node of the forward recurrent network in the pre-trained first encoder, the first word of each word segmentation sequence in the word segmentation sequence set is encoded to obtain the first word segmentation vector corresponding to each word segmentation sequence; Using the second node of the forward recurrent network in the first encoder, the first word segmentation vector and the next word in each corresponding word segmentation sequence are encoded respectively to obtain the second word segmentation vector corresponding to each word segmentation sequence. This process continues until all word segmentation in each word segmentation sequence has been traversed. All word segmentation vectors are then integrated to obtain a forward sentence vector set.

7. A dialogue generation apparatus based on multi-layer attention, used to implement the dialogue generation method based on multi-layer attention as described in any one of claims 1 to 6, characterized in that, The device includes: The word segmentation module is used to obtain the dialogue text in a multi-turn dialogue task, perform word segmentation on the dialogue text, and obtain a word segmentation sequence set. The word attention weighting module is used to encode the word segmentation sequences in the word segmentation sequence set using the pre-trained first encoder to obtain the topic sentence vector set, and to assign attention weights to each topic sentence vector in the topic sentence vector set to obtain the final sentence vector set. The topic vector fusion module is used to fuse and encode the final sentence vector set using a pre-trained second encoder to obtain a fused encoded vector, and to calculate the similarity between the fused encoded vector and each topic sentence vector in the topic sentence vector set. The similarity is normalized to obtain a sentence attention weight set. The topic sentence vectors corresponding to the topic sentence vector set are weighted and summed according to the sentence attention weight set to obtain the overall topic vector. The sentence attention weighting module is used to perform self-attention weighting on the final sentence vector set using a self-attention mechanism to obtain the target context attention vector set. The decoding module is used to decode the overall topic vector and the target context attention vector using a preset decoder to obtain the answer word set, and then concatenate them in order to generate the answer sentence text of the dialogue text.

8. An electronic device, characterized in that, The electronic device includes: At least one processor; and, A memory communicatively connected to the at least one processor; wherein, The memory stores a computer program that can be executed by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to perform the dialogue generation method based on multi-layer attention as described in any one of claims 1 to 6.

9. A computer-readable storage medium storing a computer program, characterized in that, When the computer program is executed by a processor, it implements the dialogue generation method based on multi-layer attention as described in any one of claims 1 to 6.

Citation Information

Patent Citations

CN115168555A
CN115169367A
