Method for obtaining reply model, method and device for obtaining reply sentence, and equipment

By using the trained response model and the encoding and decoding sub-models to generate response statements, the problems of low efficiency and poor quality in existing technologies are solved, and efficient and high-quality response statement generation is achieved.

CN114281958B · Active · Publication Date: 2026-06-19 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date: 2021-10-20
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing KGD systems that externalize knowledge are inefficient and of poor quality when generating response statements, mainly because they use coarse-grained knowledge, which makes the process of obtaining response statements time-consuming and of low quality.

Method used

By acquiring the first training sample, a response model is trained based on the reference speech, the reference response, and the first knowledge statement corresponding to each word in the reference speech. The model includes an encoding sub-model and a decoding sub-model. The encoding sub-model is used to encode the input target speech into a target vector, and the decoding sub-model is used to output the target response statement based on the target vector.

Benefits of technology

It improves the efficiency and quality of response statement retrieval, reduces the need for additional retrieval knowledge by learning fine-grained knowledge, and the output response statements take into account more detailed knowledge content.

✦ Generated by Eureka AI based on patent content.

Abstract

This application discloses a method for obtaining a response model, a method for obtaining a response statement, an apparatus, and a device, belonging to the field of artificial intelligence technology. The method for obtaining the response model includes: obtaining a first training sample, which is generated based on a reference speech statement, a reference response statement, and first knowledge statements corresponding to each first word in the reference speech statement. A response model is trained based on the first training sample. The response model includes an encoding sub-model and a decoding sub-model. The encoding sub-model encodes the input target speech statement into a target vector, and the decoding sub-model outputs the target response statement based on the target vector. When using the response model trained by this application to output a response statement, there is no need to retrieve knowledge and input it into the response model, resulting in high efficiency in obtaining the response statement. The obtained response statement considers fine-grained knowledge and has high quality. This application can be applied to the mapping field.

Description

Technical Field

[0001] This application relates to the field of artificial intelligence technology, and in particular to a method for obtaining a response model, a method for obtaining a response statement, an apparatus, and a device.

Background Technology

[0002] A dialogue system is a system that can output response statements in response to input utterances. Because of the desire to improve the quality of output response statements, KGD (Knowledge-grounded Dialog Generation) systems have emerged. KGD systems consider the knowledge related to the utterances when generating response statements.

[0003] In related technologies, KGD systems are knowledge externalization KGD systems. When training a knowledge externalization KGD system, the knowledge statements corresponding to the spoken statements are input into the system.

[0004] However, because the relevant technologies use knowledge statements corresponding to spoken statements during training, and these knowledge statements are coarse-grained, the knowledge externalization KGD system learns only coarse-grained knowledge. Therefore, in obtaining response statements, it is necessary to search for related knowledge statements in the knowledge base, a process that consumes considerable time, resulting in low efficiency in obtaining response statements through the knowledge externalization KGD system. Furthermore, the response statements obtained through the knowledge externalization KGD system only consider coarse-grained knowledge, thus leading to lower quality response statements.

Summary of the Invention

[0005] This application provides a method, apparatus, and device for obtaining response models and response statements, to address the problems of low efficiency and low quality of response statements obtained through KGD systems that externalize knowledge in related technologies. The technical solution is as follows:

[0006] In one aspect, a method for obtaining a response model is provided, the method including:

[0007] Obtain the first training sample, which is generated based on the reference speech statement, the reference reply statement, and the first knowledge statement corresponding to each first word in the reference speech statement;

[0008] The response model is trained based on the first training sample. The response model includes an encoding sub-model and a decoding sub-model. The encoding sub-model is used to encode the input target statement into a target vector, and the decoding sub-model is used to output the target response statement based on the target vector.

[0009] In an exemplary embodiment, determining a third function value based on the fifth vector and the first knowledge statement in the second positive training sample, and determining a fourth function value based on the fifth vector and the first knowledge statement in the second negative training sample, includes:

[0010] The first knowledge statement in the second positive training sample is encoded into the third vector group, and the third vector group is pooled to obtain the sixth vector. The first knowledge statement in the second negative training sample is encoded into the fourth vector group, and the fourth vector group is pooled to obtain the seventh vector.

[0011] Map and normalize the fifth, sixth, and seventh vectors to obtain updated fifth, sixth, and seventh vectors of the same length;

[0012] Calculate the dot product between the updated fifth and sixth vectors to obtain the third function value, and calculate the dot product between the updated fifth and seventh vectors to obtain the fourth function value.

[0013] In another aspect, a method for obtaining response statements is provided, the method including:

[0014] Retrieve the target message that needs to be replied to;

[0015] The target statement is input into the encoding sub-model in the response model. The encoding sub-model is used to encode the target statement into a target vector. The response model also includes a decoding sub-model, which is used to output the target response statement based on the target vector. The encoding sub-model and the decoding sub-model are trained based on the first training sample. The first training sample is generated based on the reference statement, the reference response statement, and the first knowledge statement corresponding to each first word in the reference statement.

[0016] Obtain the target response statement output by the decoding submodel in the response model.
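
The flow in [0014] to [0016] can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in this application: `ResponseModel`, `toy_encoder`, and `toy_decoder` are hypothetical names, and the toy functions merely stand in for trained sub-models. What the sketch shows is that no knowledge retrieval step sits between the target statement and the target response statement.

```python
from typing import Callable, List

Vector = List[float]


class ResponseModel:
    """Hypothetical wrapper pairing an encoding sub-model with a decoding sub-model."""

    def __init__(self, encoder: Callable[[str], Vector], decoder: Callable[[Vector], str]):
        self.encoder = encoder
        self.decoder = decoder

    def reply(self, target_statement: str) -> str:
        # Encode the target statement into the target vector.
        target_vector = self.encoder(target_statement)
        # Decode the target vector into the target response statement.
        # No knowledge base lookup happens at this stage.
        return self.decoder(target_vector)


def toy_encoder(statement: str) -> Vector:
    # Stand-in for a trained encoder: a one-dimensional vector holding the word count.
    return [float(len(statement.split()))]


def toy_decoder(vector: Vector) -> str:
    # Stand-in for a trained decoder: reports what it received.
    return f"received {int(vector[0])} words"


model = ResponseModel(toy_encoder, toy_decoder)
response = model.reply("do bananas grow on trees")
```

In a real system the two callables would be the trained encoding and decoding sub-models; the wrapper only fixes the order in which they are applied.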

[0017] In another aspect, a device for acquiring a response model is provided, the device comprising:

[0018] The acquisition module is used to acquire the first training sample, which is generated based on the reference speech statement, the reference reply statement, and the first knowledge statement corresponding to each first word in the reference speech statement.

[0019] The training module is used to train the response model based on the first training sample. The response model includes an encoding sub-model and a decoding sub-model. The encoding sub-model is used to encode the input target utterance into a target vector, and the decoding sub-model is used to output the target response utterance based on the target vector.

[0020] In an exemplary embodiment, the acquisition module is configured to acquire a reference speech statement, determine a first knowledge statement corresponding to each first word in the reference speech statement, generate a first sub-training sample based on the first knowledge statement corresponding to each first word in the reference speech statement, and determine a reference response statement corresponding to the reference speech statement, and generate a second sub-training sample based on the reference response statement.

[0021] The training module is used to train the encoding sub-model and the decoding sub-model based on the first sub-training sample and the second sub-training sample.

[0022] In an exemplary embodiment, the acquisition module is used to input a reference speech statement into a retrieval model. The retrieval model is used to encode each first word in the reference speech statement into a first vector and output each first vector. Each first word corresponds one-to-one with each first vector. The retrieval model is trained based on a second training sample, which is generated based on a second word, the statement containing the second word, and the second knowledge statement corresponding to the second word. The module obtains each first vector output by the retrieval model. For any first vector, the module retrieves the candidate knowledge statement with the highest relevance to any first vector from the candidate knowledge statements included in the knowledge base, thereby obtaining the first knowledge statement corresponding to any first vector.
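
The lookup described in [0022] can be sketched as an argmax over candidate knowledge statements. The sketch assumes that relevance is a dot product between a first vector and a precomputed vector for each candidate; the function `retrieve_knowledge`, the knowledge base contents, and all numbers are hypothetical and serve only as illustration.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve_knowledge(first_vector: Vector, knowledge_base: Dict[str, Vector]) -> str:
    """Return the candidate knowledge statement most relevant to first_vector.

    Relevance is assumed here to be a dot product against a precomputed
    vector for each candidate statement.
    """
    return max(knowledge_base, key=lambda stmt: dot(first_vector, knowledge_base[stmt]))


# Hypothetical knowledge base: candidate statement -> illustrative vector.
knowledge_base = {
    "banana is a yellow, curved fruit": [0.9, 0.1],
    "eggplant is purple": [0.1, 0.9],
}

# Hypothetical first vectors, one per first word in the reference speech statement.
first_vectors = {"banana": [0.8, 0.2], "eggplant": [0.2, 0.7]}

# One first knowledge statement per first word.
first_knowledge = {
    word: retrieve_knowledge(vec, knowledge_base) for word, vec in first_vectors.items()
}
```

Because each first word has its own first vector, each word can be matched to a different knowledge statement, which is what makes the mined knowledge fine-grained.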

[0023] In an exemplary embodiment, the acquisition module is further configured to acquire a description statement corresponding to a term, the description statement including at least one descriptive word for the term; for any descriptive word, the descriptive word is determined as a second word, and the description statement is determined as the statement containing the second word and the second knowledge statement corresponding to the second word; a second training sample is generated based on the second word, the statement containing the second word, and the second knowledge statement corresponding to the second word, and a retrieval model is trained based on the second training sample.

[0024] In an exemplary embodiment, the second training sample includes a first positive training sample and a first negative training sample. The acquisition module is configured to, for any second word, form a first positive training sample by combining any second word, the statement containing any second word, and the second knowledge statement corresponding to the second word, and form a first negative training sample by combining any second word, the statement containing any second word, and the second knowledge statements corresponding to other second words, wherein the second knowledge statements corresponding to other second words are unrelated to any second word.

[0025] In an exemplary embodiment, the acquisition module is configured to, for any second word, input the statement containing the second word into a first initial model to obtain a second vector corresponding to any second word output by the first initial model; determine a first function value between the second vector and the second knowledge statement in the first positive training sample, and determine a second function value between the second vector and the second knowledge statement in the first negative training sample; determine a first loss function value based on the first function value and the second function value, and update the first initial model to a retrieval model based on the first loss function value.
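
Paragraph [0025] states that the first loss function value is determined from the first and second function values but does not give a formula at this point. One common choice in contrastive learning is a softmax cross-entropy in which the positive sample is the correct class; the sketch below uses that form purely as an assumption, and `contrastive_loss` is a hypothetical name.

```python
import math
from typing import Sequence


def contrastive_loss(positive_score: float, negative_scores: Sequence[float]) -> float:
    """Softmax cross-entropy with the positive sample as the correct class.

    positive_score plays the role of the first function value and
    negative_scores the role of one or more second function values.
    This form is an assumption, not a formula quoted from the source.
    """
    scores = [positive_score, *negative_scores]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - positive_score


# The loss is small when the positive sample scores higher than the negative one.
well_separated = contrastive_loss(0.9, [0.1])
# The loss is larger when the negative sample outscores the positive one.
confused = contrastive_loss(0.1, [0.9])
```

Minimizing such a loss pushes the first function value up and the second function value down, which matches the intent of pairing positive and negative training samples.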

[0026] In an exemplary embodiment, the acquisition module is configured to encode the second knowledge statement in the first positive training sample into a first vector group, pool the first vector group to obtain a third vector, encode the second knowledge statement in the first negative training sample into a second vector group, pool the second vector group to obtain a fourth vector; map and normalize the second, third, and fourth vectors to obtain updated second, third, and fourth vectors of the same length; calculate the dot product between the updated second and third vectors to obtain a first function value, and calculate the dot product between the updated second and fourth vectors to obtain a second function value.

[0027] In an exemplary embodiment, the training module is configured to input the first word from the first sub-training sample into the second initial model to obtain the fifth vector corresponding to the first word in the first sub-training sample output by the second initial model; determine a second loss function value based on the fifth vector and the first knowledge statement in the first sub-training sample; input the fifth vector corresponding to each first word into the third initial model to determine a third loss function value, the third loss function value being used to indicate the probability that the third initial model outputs the reference response statement in the second sub-training sample based on the fifth vector corresponding to each first word; determine a fourth loss function value based on the second loss function value and the third loss function value; and update the second initial model to an encoding sub-model and the third initial model to a decoding sub-model based on the fourth loss function value.
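
The joint objective in [0027] can be illustrated with a short sketch. Two assumptions are made that the text does not confirm: the third loss function value is taken to be the negative log-likelihood of the reference response statement, and the fourth loss function value is taken to be a weighted sum with a hypothetical weight `alpha`. The names `response_nll` and `combined_loss` are likewise illustrative.

```python
import math
from typing import Sequence


def response_nll(token_probs: Sequence[float]) -> float:
    """Negative log-likelihood of the reference response statement.

    token_probs holds the probability the decoder assigned to each token of
    the reference response; a higher probability gives a lower loss.
    """
    return -sum(math.log(p) for p in token_probs)


def combined_loss(knowledge_loss: float, response_loss: float, alpha: float = 1.0) -> float:
    """Fourth loss value as a weighted sum (alpha is a hypothetical weight)."""
    return knowledge_loss + alpha * response_loss


second_loss = 0.4                       # illustrative knowledge-alignment loss
third_loss = response_nll([0.5, 0.5])   # illustrative decoder probabilities
fourth_loss = combined_loss(second_loss, third_loss)
```

Updating both initial models from a single combined value means the encoder is shaped by the knowledge statements and by the reference response at the same time.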

[0028] In an exemplary embodiment, the first sub-training sample includes a second positive training sample and a second negative training sample. The acquisition module is configured to, for any first word, form a second positive training sample by combining any first word with the first knowledge statement corresponding to any first word, and form a second negative training sample by combining any first word with the first knowledge statement corresponding to other first words, wherein the first knowledge statement corresponding to other first words is unrelated to any first word.

[0029] In an exemplary embodiment, the training module is configured to, for any first word, input any first word into a second initial model to obtain a fifth vector corresponding to any first word output by the second initial model; determine a third function value based on the fifth vector and the first knowledge statement in the second positive training sample; determine a fourth function value based on the fifth vector and the first knowledge statement in the second negative training sample; and determine a second loss function value based on the third function value and the fourth function value.

[0030] In an exemplary embodiment, the training module is configured to encode the first knowledge statement in the second positive training sample into a third vector group, pool the third vector group to obtain a sixth vector, encode the first knowledge statement in the second negative training sample into a fourth vector group, pool the fourth vector group to obtain a seventh vector; map and normalize the fifth, sixth, and seventh vectors to obtain updated fifth, sixth, and seventh vectors of the same length; calculate the dot product between the updated fifth and sixth vectors to obtain a third function value, and calculate the dot product between the updated fifth and seventh vectors to obtain a fourth function value.

[0031] In another aspect, a device for obtaining response statements is provided, the device comprising:

[0032] The acquisition module is used to acquire the target message that needs to be replied to;

[0033] The input module is used to input the target speech statement into the encoding sub-model in the response model. The encoding sub-model is used to encode the target speech statement into a target vector. The response model also includes a decoding sub-model, which is used to output the target response statement based on the target vector. The encoding sub-model and the decoding sub-model are trained based on the first training sample. The first training sample is generated based on the reference speech statement, the reference response statement, and the first knowledge statement corresponding to each first word in the reference speech statement.

[0034] The acquisition module is used to obtain the target response statement output by the decoding sub-model in the response model.

[0035] In another aspect, a computer device is provided, which includes a memory and a processor; the memory stores at least one instruction, which is loaded and executed by the processor to enable the computer device to implement the method for obtaining a response model or the method for obtaining a response statement provided in any exemplary embodiment of this application.

[0036] In another aspect, a computer-readable storage medium is provided, which stores at least one instruction, which is loaded and executed by a processor to enable a computer to implement the method for obtaining a response model or the method for obtaining a response statement provided in any exemplary embodiment of this application.

[0037] In another aspect, a computer program or computer program product is provided, comprising: computer instructions, which, when executed by a computer, cause the computer to implement the method for obtaining a response model or the method for obtaining a response statement provided in any exemplary embodiment of this application.

[0038] The beneficial effects of the technical solutions provided in this application include at least the following:

[0039] In this embodiment, the training samples used to train the response model include knowledge statements corresponding to each word in the spoken statement. Since the knowledge statements corresponding to words are fine-grained knowledge, the response model can learn this fine-grained knowledge during training. Therefore, when using the trained response model to output response statements, there is no need to additionally retrieve knowledge and input it into the response model, improving the efficiency of obtaining response statements. Furthermore, the output response statements take into account fine-grained knowledge and have high quality.

Attached Figure Description

[0040] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0041] Figure 1 is a schematic diagram of the implementation environment provided in the embodiments of this application;

[0042] Figure 2 is a flowchart of the method for obtaining the response model provided in the embodiments of this application;

[0043] Figure 3 is a schematic diagram of the response model provided in the embodiments of this application;

[0044] Figure 4 is a flowchart of a method for obtaining response statements provided in an embodiment of this application;

[0045] Figure 5 is a schematic diagram of the structure of the response model acquisition device provided in the embodiments of this application;

[0046] Figure 6 is a schematic diagram of the structure of the device for obtaining response statements provided in an embodiment of this application;

[0047] Figure 7 is a schematic diagram of the structure of the electronic device provided in the embodiments of this application;

[0048] Figure 8 is a schematic diagram of the server structure provided in an embodiment of this application.

Detailed Implementation

[0049] To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application will be described in further detail below with reference to the accompanying drawings.

[0050] This application provides a method for obtaining a response model and a method for obtaining a response statement. Both methods can be applied to, for example, the implementation environment shown in Figure 1. As shown in Figure 1, the implementation environment includes at least one electronic device 11 and a server 12, and the electronic device 11 communicates with the server 12.

[0051] Regarding the method for obtaining the response model: When the method is applied to electronic device 11, electronic device 11 obtains training samples from server 12, or obtains training samples through other means, and trains the response model based on the training samples. When the method is applied to server 12, server 12 obtains training samples from electronic device 11, or obtains training samples through other means, and trains the response model based on the training samples.

[0052] Regarding the method for obtaining the response statement: When the method is applied to electronic device 11, electronic device 11 downloads a response model from server 12. After electronic device 11 obtains the target statement to be responded to, it obtains a response statement for the target statement based on the response model downloaded from server 12. Electronic device 11 can then output the obtained response statement. Alternatively, when the method is applied to server 12, after electronic device 11 obtains the target statement to be responded to, it transmits the target statement to server 12, and server 12 obtains a response statement for the target statement based on the response model. Server 12 can then return the response statement to electronic device 11 so that electronic device 11 can output the response statement.

[0053] For example, electronic device 11 can be any electronic product that can interact with a user through one or more means such as a keyboard, touchpad, touch screen, remote control, voice interaction or handwriting device, such as PC (Personal Computer), computer, mobile phone, smartphone, PDA (Personal Digital Assistant), wearable device, PPC (Pocket PC), tablet computer, smart car system, smart TV, smart speaker, smart voice interaction device, smart home appliance and vehicle terminal, etc.

[0054] For example, server 12 can be a single server, a server cluster consisting of multiple servers, or a cloud computing service center.

[0055] Those skilled in the art should understand that the above-described electronic device 11 and server 12 are merely examples, and other existing or future electronic devices or servers that are applicable to this application should also be included within the scope of protection of this application, and are hereby incorporated by reference.

[0056] Based on the implementation environment shown in Figure 1 above, and referring to Figure 2, an embodiment of this application provides a method for obtaining a response model, which can be applied to a computer device, the computer device including the electronic device or the server shown in Figure 1. As shown in Figure 2, the method includes the following steps.

[0057] 201. Obtain the first training sample. The first training sample is generated based on the reference speech statement, the reference reply statement, and the first knowledge statement corresponding to each first word in the reference speech statement.

[0058] The first training sample is used to train the response model. Since the generation of the first training sample utilizes the first knowledge statements corresponding to each first word in the reference speech statement, and these first knowledge statements represent fine-grained knowledge, the first training sample includes fine-grained knowledge. The response model trained based on this first training sample therefore performs well. The process of training the response model based on this first training sample is explained in step 202 below; here, the process of obtaining the first training sample is explained first.

[0059] In an exemplary embodiment, the first training sample includes a first sub-training sample and a second sub-training sample. Obtaining the first training sample includes the following steps 2011-2013.

[0060] 2011. Obtain the reference speech statement and determine the first knowledge statement corresponding to each first word in the reference speech statement.

[0061] For example, obtaining the reference speech statement includes collecting reference speech statements from public or private datasets. This embodiment does not limit the method of obtaining the reference speech statement. However, neither public nor private datasets include the fine-grained knowledge required in this embodiment; that is, they do not include the first knowledge statement corresponding to each first word in the reference speech statement. Therefore, this embodiment provides a retrieval model to mine this fine-grained knowledge.

[0062] First, the process of training the retrieval model is explained through the following steps 20111-20114.

[0063] 20111. Obtain the description statement corresponding to a term, the description statement including at least one descriptive word for the term.

[0064] For example, the entry may be from Wikipedia, Baidu Baike, or other encyclopedias. This embodiment does not limit the method of obtaining the entry. Each entry corresponds to a descriptive statement. For example, the descriptive statement for the entry "banana" is "banana is a yellow, curved fruit." The descriptive statement includes at least one descriptive word for the entry. For example, the descriptive statement "banana is a yellow, curved fruit" includes three descriptive words for the entry "banana": "yellow," "curved," and "fruit."

[0065] 20112. For any descriptive word, determine that descriptive word as the second word, and determine the description statement as both the statement containing the second word and the second knowledge statement corresponding to the second word.

[0066] The retrieval model is trained based on a second training sample. The second training sample is generated based on a second word, the statement containing the second word, and the second knowledge statement corresponding to the second word. In this embodiment, after the description statement corresponding to the term is obtained, the descriptive word is used as the second word, and the description statement is used both as the statement containing the second word and as the second knowledge statement corresponding to the second word.

[0067] Taking the example from 20111, we take the descriptive word "yellow" as the second word, the descriptive statement "bananas are a yellow, curved fruit" as the statement containing the second word, and the descriptive statement "bananas are a yellow, curved fruit" as the second knowledge statement corresponding to the second word. The descriptive words "curved" and "fruit" will not be elaborated upon here.

[0068] 20113. Generate the second training sample based on the second word, the statement containing the second word, and the second knowledge statement corresponding to the second word.

[0069] In some implementations, the second training sample includes a first positive training sample and a first negative training sample. The second training sample is generated based on a second word, the sentence containing the second word, and the second knowledge statement corresponding to the second word. This includes: for any second word, forming a first positive training sample by combining the second word, the sentence containing the second word, and the second knowledge statement corresponding to the second word; and forming a first negative training sample by combining the second word, the sentence containing the second word, and the second knowledge statements corresponding to other second words, where the second knowledge statements corresponding to other second words are unrelated to any second word. It can be seen that a first positive training sample and a first negative training sample can be formed based on a single second word, and in the corresponding first positive training sample and first negative training sample, the second word and the sentence containing the second word are the same.

[0070] In the first positive training sample, the second word and the second knowledge statement corresponding to the second word are related. This is because, in this embodiment, the descriptive word is used as the second word, and the descriptive statement containing that descriptive word is used as the second knowledge statement corresponding to the second word. Since the descriptive word and the descriptive statement containing that descriptive word are related, the second word and the second knowledge statement corresponding to the second word are also related. Taking the example in 20111, the first positive training sample might be, for example, <yellow, banana is a yellow, curved fruit, banana is a yellow, curved fruit>.

[0071] In the first negative training sample, the second word is unrelated to the second knowledge statement corresponding to another second word, and this embodiment does not limit which other second word is used. Taking the example in 20111, the first negative training sample is, for example, <yellow, banana is a yellow, curved fruit, eggplant is purple>, where "eggplant is purple" is the second knowledge statement corresponding to another second word.

[0072] In some other implementations, the second training samples may consist only of the first positive training samples. The method for composing the first positive training samples is described above and will not be repeated here.
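
The construction of positive and negative samples in steps 20112 and 20113 can be sketched with the banana example. The names `SecondTrainingSample` and `build_samples` are hypothetical, the descriptive words are supplied by hand, and the unrelated knowledge statement is passed in directly, since the text does not specify how either is selected.

```python
from typing import List, NamedTuple


class SecondTrainingSample(NamedTuple):
    second_word: str
    statement: str            # statement containing the second word
    knowledge_statement: str  # second knowledge statement
    is_positive: bool


def build_samples(descriptive_words: List[str], description: str,
                  unrelated_knowledge: str) -> List[SecondTrainingSample]:
    """Form one positive and one negative sample per descriptive word."""
    samples = []
    for word in descriptive_words:
        # Positive: the description statement doubles as the knowledge statement.
        samples.append(SecondTrainingSample(word, description, description, True))
        # Negative: same word and statement, but an unrelated knowledge statement.
        samples.append(SecondTrainingSample(word, description, unrelated_knowledge, False))
    return samples


samples = build_samples(
    descriptive_words=["yellow", "curved", "fruit"],
    description="banana is a yellow, curved fruit",
    unrelated_knowledge="eggplant is purple",
)
```

Each descriptive word yields a pair of samples that differ only in the knowledge statement, which is the property the contrastive training step relies on.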

[0073] 20114. Train the retrieval model based on the second training sample.

[0074] In some implementations, the second training samples include a first positive training sample and a first negative training sample. In this implementation, the process of training the retrieval model based on the second training samples is called contrastive learning. Contrastive learning not only increases the number of second training samples but also increases the training difficulty, thereby ensuring the accuracy of the trained retrieval model. For example, training the retrieval model based on the second training samples includes the following steps 201141-201143.

[0075] 201141. For any second word, input the sentence containing that second word into the first initial model to obtain the second vector corresponding to that second word output by the first initial model.

[0076] In this context, the sentence containing the second word includes at least one of its preceding and following contexts. For example, if the second word is "curved" and the sentence containing it is "banana is a yellow, curved fruit," then the preceding context of the second word is "banana" and "yellow," and the following context is "fruit." When a sentence containing a second word is input into a first initial model, the first initial model can combine at least one of the preceding and following contexts of the second word to output a second vector corresponding to that second word.

[0077] The sentence containing the second word is represented as a sequence of words whose length is a positive integer not less than 1, namely the number of words included in the sentence. The sentence containing the second word is input into the first initial model, and the second vector corresponding to the second word, as output by the first initial model, is obtained.

[0078] 201142, determine the first function value between the second vector and the second knowledge statement in the first positive training sample, and determine the second function value between the second vector and the second knowledge statement in the first negative training sample.

[0079] As explained above for the second training samples, the first positive training sample and the first negative training sample formed from a given second word share that second word and the sentence containing it. Therefore, the second vector corresponding to the second word can be combined with the second knowledge statement in the first positive training sample and with the second knowledge statement in the first negative training sample, respectively, to obtain the aforementioned first and second function values.

[0080] In an exemplary embodiment, determining the first function value between the second vector and the second knowledge statement in the first positive training sample, and determining the second function value between the second vector and the second knowledge statement in the first negative training sample, includes: encoding the second knowledge statement in the first positive training sample into a first vector group; pooling the first vector group to obtain a third vector; encoding the second knowledge statement in the first negative training sample into a second vector group; pooling the second vector group to obtain a fourth vector; mapping and normalizing the second, third, and fourth vectors to obtain updated second, third, and fourth vectors of the same length; calculating the dot product between the updated second and third vectors to obtain the first function value; and calculating the dot product between the updated second and fourth vectors to obtain the second function value.

[0081] The second word in the first positive training sample is represented as $w_i$, and the second knowledge statement in the first positive training sample is represented as $k^{+}$. $k^{+}$ is encoded into the first vector group, and the first vector group is pooled to obtain the third vector $u^{+}$. The second knowledge statement in the first negative training sample is represented as $k^{-}$. $k^{-}$ is encoded into the second vector group, and the second vector group is pooled to obtain the fourth vector $u^{-}$. Then, the second vector $e_i$ corresponding to $w_i$ is mapped and normalized to obtain the updated second vector $\hat{e}_i$, the third vector $u^{+}$ is mapped and normalized to obtain the updated third vector $\hat{u}^{+}$, and the fourth vector $u^{-}$ is mapped and normalized to obtain the updated fourth vector $\hat{u}^{-}$. The updated second vector $\hat{e}_i$, the updated third vector $\hat{u}^{+}$, and the updated fourth vector $\hat{u}^{-}$ have the same length.

[0082] Next, the dot product between the updated second vector $\hat{e}_i$ and the updated third vector $\hat{u}^{+}$ is calculated according to the following formula (1) to obtain the first function value $s_1$:

[0083] $$s_1 = \hat{e}_i^{\top} \hat{u}^{+} \tag{1}$$

[0084] The dot product between the updated second vector $\hat{e}_i$ and the updated fourth vector $\hat{u}^{-}$ is calculated according to the following formula (2) to obtain the second function value $s_2$:

[0085] $$s_2 = \hat{e}_i^{\top} \hat{u}^{-} \tag{2}$$
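The pooling, mapping, normalization, and dot-product steps above can be sketched in a few lines. This is a minimal illustration only: mean pooling and a linear map are assumptions (the text says only "pooling" and "mapping"), and every vector and weight matrix below is hypothetical toy data rather than the output of a trained model.

```python
import math

def mean_pool(vectors):
    """Average a group of equal-length vectors into one vector (assumed pooling type)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def project(vector, weight):
    """Map a vector to a new length with a linear map; weight has shape out_dim x in_dim."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical toy inputs: one word vector and two groups of token vectors.
second_vector = [1.0, 0.0, 1.0]               # vector of the second word (length 3)
positive_group = [[1.0, 1.0], [3.0, 1.0]]     # tokens of the positive knowledge statement
negative_group = [[-1.0, 0.0], [-3.0, 0.0]]   # tokens of the negative knowledge statement

# Hypothetical linear maps that bring both sides to the same length (2).
word_map = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
knowledge_map = [[1.0, 0.0], [0.0, 1.0]]

updated_second = l2_normalize(project(second_vector, word_map))
updated_third = l2_normalize(project(mean_pool(positive_group), knowledge_map))
updated_fourth = l2_normalize(project(mean_pool(negative_group), knowledge_map))

first_function_value = dot(updated_second, updated_third)
second_function_value = dot(updated_second, updated_fourth)
```

Because all three updated vectors have unit length, both function values lie in the range [-1, 1], which keeps a fixed margin such as 0.1 meaningful across samples.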

[0086] 201143, determine the first loss function value based on the first function value and the second function value, and update the first initial model to the retrieval model based on the first loss function value.

[0087] This embodiment does not limit the method by which the first loss function value is determined based on the first function value and the second function value. For example, this embodiment obtains the first loss function value by weighted summation of the first function value and the second function value. Another example is that this embodiment determines the first loss function value using the NCE (Noise-contrastive Estimation) loss function, which includes, but is not limited to, the InfoNCE loss function. Yet another example is that this embodiment determines the first loss function value using the hinge loss function, also known as the maximum-margin function.
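As a hedged illustration of the InfoNCE option mentioned above, the sketch below computes the loss as the negative log of the softmax probability assigned to the positive pair. The function name, the `temperature` parameter, and the sample scores are assumptions introduced here; the source does not specify how InfoNCE would be configured.

```python
import math

def info_nce(positive_score, negative_scores, temperature=1.0):
    """InfoNCE loss for one positive score and a list of negative scores."""
    logits = [positive_score / temperature]
    logits += [score / temperature for score in negative_scores]
    largest = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = largest + math.log(sum(math.exp(x - largest) for x in logits))
    return log_sum_exp - logits[0]

# Hypothetical function values: a well-separated pair and a poorly separated pair.
well_separated = info_nce(0.9, [-0.5])
poorly_separated = info_nce(0.1, [0.5])
```

The loss shrinks as the positive score grows relative to the negative scores, which is the same qualitative behaviour the hinge loss encourages.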

[0088] The method of determining the first loss function value through the hinge loss function includes: calculating a first difference based on the first function value and the second function value; calculating the sum of the first difference and the first hyperparameter to obtain the updated first difference; and determining the larger value between the updated first difference and the first reference value as the first loss function value. In this embodiment, the first hyperparameter and the first reference value are not limited; for example, the first hyperparameter is 0.1 and the first reference value is 0. The method of determining the first loss function value through the hinge loss function can be expressed as the following formula (3):

[0089] $$\mathcal{L}_1 = \max\left(0,\ \delta_1 + s_2 - s_1\right) \tag{3}$$

[0090] In formula (3), $\mathcal{L}_1$ is the first loss function value, $0$ is the first reference value, $\delta_1$ is the first hyperparameter, $s_1$ is the first function value, and $s_2$ is the second function value, so $\delta_1 + s_2 - s_1$ is the updated first difference described above. $\max(\cdot)$ takes the maximum of the elements inside it, which are the first reference value $0$ and the updated first difference $\delta_1 + s_2 - s_1$.
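The hinge loss described above can be sketched as follows. The sketch assumes that the first difference is the second (negative) function value minus the first (positive) function value, which is the usual maximum-margin convention; the function name and the sample values are hypothetical.

```python
def hinge_loss(positive_value, negative_value, margin=0.1, reference=0.0):
    """Hinge (maximum-margin) loss for one positive/negative pair.

    margin plays the role of the first hyperparameter and reference
    the role of the first reference value (0.1 and 0 in the example).
    """
    difference = negative_value - positive_value   # first difference (assumed sign)
    updated_difference = difference + margin       # updated first difference
    return max(reference, updated_difference)

# Hypothetical function values.
separated = hinge_loss(0.9, 0.2)      # positive already beats negative by more than the margin
not_separated = hinge_loss(0.3, 0.5)  # negative scores higher, so the loss is positive
```

Once the positive value exceeds the negative value by at least the margin, the loss is zero and the pair no longer contributes a gradient.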

[0091] For example, the first initial model and the model used to encode the second knowledge statement into a vector group (whether encoding the second knowledge statement in the first positive training sample into the first vector group or encoding the second knowledge statement in the first negative training sample into the second vector group) are two different BERT (Bidirectional Encoder Representations from Transformers) models, where Transformer is the name of a model architecture. BERT models come in two sizes, base and large, and in two types, uncased and cased. For example, the two different BERT models mentioned above are two separate pre-trained BERT-base-uncased models, i.e., base-size, uncased BERT models. This embodiment does not limit the first initial model or the model used to encode the second knowledge statement into a vector group.

[0092] After determining the first loss function value based on the first function value and the second function value, the first initial model is updated to a retrieval model based on the first loss function value. This includes minimizing the first loss function value and then updating the first initial model through a backpropagation process to obtain the retrieval model. This retrieval model can output vectors corresponding to each word in the input sentence.
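To make the "minimize the loss, then update the model" step concrete, here is a deliberately simplified sketch. It assumes a hinge loss with margin 0.1 and applies plain gradient descent directly to a free word vector; in the embodiment the gradient would instead flow back through the parameters of the first initial model, typically with an automatic-differentiation framework. All names and values are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def hinge_loss_and_gradient(word_vector, positive_vector, negative_vector, margin=0.1):
    """Return the hinge loss and its gradient with respect to word_vector."""
    loss = max(0.0, margin + dot(word_vector, negative_vector) - dot(word_vector, positive_vector))
    if loss > 0.0:
        # d/d(word) of (word . negative - word . positive)
        gradient = [n - p for n, p in zip(negative_vector, positive_vector)]
    else:
        gradient = [0.0] * len(word_vector)
    return loss, gradient

# Hypothetical starting point: the word vector is aligned with the negative statement.
word_vector = [0.0, 1.0]
positive_vector = [1.0, 0.0]
negative_vector = [0.0, 1.0]
learning_rate = 0.5

losses = []
for _ in range(4):
    loss, gradient = hinge_loss_and_gradient(word_vector, positive_vector, negative_vector)
    losses.append(loss)
    word_vector = [w - learning_rate * g for w, g in zip(word_vector, gradient)]
```

After two updates the word vector has moved from the negative direction to the positive one, and the loss stays at zero from then on.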

[0093] In other implementations, the second training sample includes only the first positive training sample and excludes the first negative training sample. In this implementation, after the second vector corresponding to the second word is determined according to 201141, the first function value between the second vector and the second knowledge statement in the first positive training sample is determined according to 201142, and the second function value is not determined. The first loss function value is then determined directly from the first function value, for example, by deleting the second function value from formula (3) above to obtain an updated formula (3) and determining the first loss function value based on the updated formula (3). The method of updating the first initial model to the retrieval model based on the first loss function value is explained above and will not be repeated here.
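As a worked form of the positive-only case, suppose formula (3) has the hinge form $\max(0, \delta_1 + s_2 - s_1)$, where $s_1$ and $s_2$ denote the first and second function values and $\delta_1$ the first hyperparameter (this notation and this form are assumptions of the sketch). Dropping the negative-sample term then leaves a loss that depends only on the first function value:

```latex
\mathcal{L}_1' = \max\left(0,\ \delta_1 - s_1\right)
```

Under this assumed form, with $\delta_1 = 0.1$ the loss is zero as soon as $s_1 \ge 0.1$, so training only pushes the positive pair's function value up to the margin.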

[0094] Steps 20111-20114 above describe the process of training the retrieval model, which is a weakly supervised training process. Next, steps 20115-20116 describe the process of using the retrieval model to determine the first knowledge statements corresponding to each first word in the reference speech statement. The process of using the retrieval model can be viewed as a maximum inner product search problem. For example, this embodiment implements the process of using the retrieval model in a computer device using FAISS (a similarity search library).

[0095] 20115. Input the reference speech statement into the retrieval model. The retrieval model encodes each first word in the reference speech statement into a first vector and outputs each first vector. The first words correspond one-to-one with the first vectors.

[0096] As explained in 20114, the retrieval model can output a vector for each word in the input statement. Therefore, after the reference speech statement is input into the retrieval model, the retrieval model encodes each first word in the reference speech statement into a first vector, so that the first words and the first vectors correspond one-to-one. The retrieval model then outputs these first vectors.

[0097] 20116. Obtain each first vector output by the retrieval model. For any first vector, retrieve, from the candidate knowledge statements included in the knowledge base, the candidate knowledge statement with the highest relevance to that first vector, and take it as the first knowledge statement corresponding to that first vector.

[0098] The knowledge base includes at least one candidate knowledge statement. For a first word, the relevance between the first word and each candidate knowledge statement in the knowledge base is determined, and the candidate knowledge statement with the highest relevance to the first word is taken as the first knowledge statement corresponding to the first vector.

[0099] For example, the reference speech statement is represented as $X = (x_1, x_2, \dots, x_n)$, where $n$ is a positive integer not less than 1 that indicates the number of first words included in the reference speech statement. The first vector that the retrieval model outputs for the first word $x_i$ ($1 \le i \le n$) is represented as $a_i$, and the updated first vector $\hat{a}_i$ is obtained by mapping and normalizing $a_i$. A candidate knowledge statement $c_j$ in the knowledge base $K$ is encoded into a candidate vector group, and the candidate vector group is pooled to obtain the candidate vector $b_j$. The updated candidate vector $\hat{b}_j$ is obtained by mapping and normalizing $b_j$. Then, the dot product $r(x_i, c_j)$ between the updated first vector $\hat{a}_i$ and the updated candidate vector $\hat{b}_j$ is calculated according to the following formula (4). The dot product indicates the relevance between the first word and the candidate knowledge statement.

[0100] $$r(x_i, c_j) = \hat{a}_i^{\top} \hat{b}_j \tag{4}$$

[0101] The larger the dot product $r(x_i, c_j)$, the higher the relevance between the first word and the candidate knowledge statement. Therefore, the candidate knowledge statement $k_i$ with the highest relevance to the first word $x_i$ is given by formula (5) below, and this candidate knowledge statement $k_i$ is the first knowledge statement corresponding to the first vector:

[0102] $$k_i = \arg\max_{c_j \in K} r(x_i, c_j) \tag{5}$$
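The retrieval step above amounts to an arg-max over inner products. The brute-force sketch below shows the idea on a three-entry knowledge base; a library such as FAISS would replace the linear scan at scale. The candidate statements and all vectors are hypothetical toy data (already normalized), and `retrieve_best` is an illustrative name, not part of any library.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def retrieve_best(word_vector, candidates):
    """Return the (statement, score) pair whose vector has the largest dot product."""
    scored = ((statement, dot(word_vector, vector)) for statement, vector in candidates)
    return max(scored, key=lambda pair: pair[1])

# Hypothetical knowledge base: (candidate knowledge statement, updated candidate vector).
knowledge_base = [
    ("banana is a yellow, curved fruit", [0.6, 0.8]),
    ("a keyboard is an input device", [-0.8, 0.6]),
    ("a microphone captures sound", [0.0, -1.0]),
]

# Hypothetical updated first vector for one first word.
word_vector = [0.8, 0.6]
best_statement, best_score = retrieve_best(word_vector, knowledge_base)
```

Running the same search once per first word yields one first knowledge statement per word, which is what makes the retrieved knowledge fine-grained.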

[0103] Of course, determining the relevance between the first word and the candidate knowledge statement based on the dot product is merely an example, and this embodiment does not limit the method of determining relevance. For example, this embodiment can also calculate the Euclidean distance between the aforementioned first vector and the candidate vector. The smaller the Euclidean distance, the higher the relevance between the first word and the candidate knowledge statement. Therefore, the candidate knowledge statement corresponding to the candidate vector with the smallest Euclidean distance from the first vector is taken as the first knowledge statement corresponding to the first vector.
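The Euclidean-distance alternative can be sketched the same way. For unit-length vectors the squared distance equals 2 minus twice the dot product, so the nearest candidate by distance is also the one with the largest dot product; the example checks this on hypothetical toy vectors.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical unit-length vectors for one first word and three candidates.
word_vector = [0.8, 0.6]
candidate_vectors = {
    "statement A": [0.6, 0.8],
    "statement B": [-0.8, 0.6],
    "statement C": [0.0, -1.0],
}

nearest = min(candidate_vectors, key=lambda name: euclidean(word_vector, candidate_vectors[name]))
most_similar = max(candidate_vectors, key=lambda name: dot(word_vector, candidate_vectors[name]))

# For unit vectors: distance squared = 2 - 2 * dot product.
distance_squared = euclidean(word_vector, candidate_vectors[nearest]) ** 2
identity_gap = abs(distance_squared - (2 - 2 * dot(word_vector, candidate_vectors[nearest])))
```

If the vectors are not normalized, the two criteria can disagree, so the choice of relevance measure then matters.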

[0104] It is understood that the method of determining the first knowledge statement corresponding to each first word in the reference speech statement through the retrieval model is merely an example, and this embodiment does not limit the method of determining the first knowledge statement corresponding to each first word. For example, after obtaining the reference speech statement, the first knowledge statement corresponding to each first word can also be determined by manual annotation.

[0105] 2012. Generate the first sub-training sample based on the first knowledge statements corresponding to each first word in the reference speech statement.

[0106] In some implementations, the first sub-training sample includes a second positive training sample and a second negative training sample. Generating the first sub-training sample based on the first knowledge statements corresponding to each first word in the reference speech statement includes: for any first word, forming a second positive training sample from that first word and its corresponding first knowledge statement; and forming a second negative training sample from that first word and a first knowledge statement corresponding to another first word, where the first knowledge statement corresponding to the other first word is unrelated to that first word. It can be seen that a second positive training sample and a corresponding second negative training sample are formed from a single first word; that is, the two samples share the same first word.

[0107] In the second positive training sample, the first word is related to the first knowledge statement corresponding to that first word. The reference speech statement is represented as $X = (x_1, x_2, \dots, x_n)$, a first word is represented as $x_i$, and the first knowledge statement corresponding to that first word is represented as $k_i$; the second positive training sample is then represented as $(x_i, k_i)$. In the second negative training sample, the first word is unrelated to the first knowledge statement, which corresponds to another first word. The first knowledge statement corresponding to another first word is represented as $k_j$ ($j \ne i$); the second negative training sample is then represented as $(x_i, k_j)$.

[0108] The process of forming the second positive training samples and the second negative training samples is illustrated with an example. A reference speech statement includes a first word A and a first word B. First word A corresponds to first knowledge statement A, and first word B corresponds to first knowledge statement B. Thus, it is possible to form the second positive training samples (first word A, first knowledge statement A) and (first word B, first knowledge statement B), and the second negative training samples (first word A, first knowledge statement B) and (first word B, first knowledge statement A).
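The pairing in this example can be reproduced with a short sketch. `build_samples` is a hypothetical helper; it assumes, as the text requires, that the knowledge statement of every other first word is unrelated to the current first word.

```python
def build_samples(word_to_knowledge):
    """Build second positive and second negative training samples.

    word_to_knowledge maps each first word to its first knowledge statement.
    Negatives pair a word with the statements of the other words, which are
    assumed to be unrelated to that word.
    """
    positives = [(word, knowledge) for word, knowledge in word_to_knowledge.items()]
    negatives = [
        (word, other_knowledge)
        for word in word_to_knowledge
        for other_word, other_knowledge in word_to_knowledge.items()
        if other_word != word
    ]
    return positives, negatives

pairs = {
    "first word A": "first knowledge statement A",
    "first word B": "first knowledge statement B",
}
positives, negatives = build_samples(pairs)
```

With $n$ first words this yields $n$ positive samples and $n(n-1)$ negative samples, so in practice one might sample only a subset of the negatives.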

[0109] It should be noted that when constructing the second negative training sample, it is sufficient to ensure that the first word in the second negative training sample is unrelated to the first knowledge statement. Therefore, the first knowledge statement in the second negative training sample is not limited to the first knowledge statements corresponding to the other first words mentioned above; the first knowledge statement in the second negative training sample can also be any knowledge statement obtained by random sampling that is unrelated to the first word in the second negative training sample.

[0110] In some other implementations, the first sub-training sample includes only the second positive training sample. The process of forming the second positive training sample is described above and will not be repeated here.

[0111] 2013. Determine the reference response statement corresponding to the reference speech statement, and generate the second sub-training sample based on the reference response statement.

[0112] For example, in this embodiment, the reference response statement corresponding to the reference speech statement is determined from a public or private database, and the reference response statement is used as the second sub-training sample. This embodiment does not limit the method of generating the second sub-training sample.

[0113] Steps 2011-2013 above describe the method for obtaining the first training sample, which includes a first sub-training sample and a second sub-training sample. Once the first training sample is obtained, the response model can be trained based on it, as explained in step 202.

[0114] 202. Train a response model based on the first training sample. The response model includes an encoding sub-model and a decoding sub-model. The encoding sub-model is used to encode the input target speech statement into a target vector, and the decoding sub-model is used to output the target response statement based on the target vector.

[0115] Before explaining the process of training the response model based on the first training sample, the structure of the response model in the embodiments of this application and the effects that can be obtained by using the response model will be explained first.

[0116] Referring to the usage process shown in Figure 3, which is a structural schematic of an exemplary response model, the response model comprises an encoding sub-model and a decoding sub-model connected in series. The encoding sub-model is also called the encoder, and the decoding sub-model is also called the decoder. After the target speech statement is input into the encoding sub-model, the encoding sub-model encodes it into a target vector. The target vector is a sequence of vectors that correspond one-to-one with the words in the target speech statement, and each vector is the representation of its word in a low-dimensional space. The encoding sub-model then outputs the target vector, which serves as the input to the decoding sub-model. After the target vector is input into the decoding sub-model, the decoding sub-model outputs the target response statement based on the target vector.

[0117] The target speech statement mentioned above is the statement that requires a response. This embodiment generates a target response statement based on the target speech statement, and then uses the target response statement to respond to the target speech statement. In some implementations, in response to detecting an input statement in the dialog interface, the input statement is taken as the target speech statement. For example, the input statement is a statement typed by the user via a keyboard, or a statement obtained through speech recognition of the user's voice input via a microphone. In other implementations, the target speech statement is selected from a dataset, which may be any public or private dataset. It is understood that the above ways of obtaining the target speech statement are merely examples, and this embodiment does not limit the method of obtaining the target speech statement.

[0118] Since the encoding and decoding sub-models in this embodiment are trained based on the first training sample, and the first training sample is generated based on the reference speech statement, the reference response statement, and the first knowledge statements corresponding to each first word in the reference speech statement, the encoding and decoding sub-models can learn knowledge during training. This learned knowledge is stored in the model parameters of the encoding and decoding sub-models. After training is complete, the target speech statement is input into the encoding sub-model. When encoding the target speech statement, the encoding sub-model takes into account the knowledge related to the target speech statement and obtains a target vector that reflects this knowledge. The decoding sub-model outputs the target response statement based on this knowledge-aware target vector, so the output target response statement is related to the knowledge, which improves the quality of the target response statement.

[0119] Furthermore, since the first training samples include the first knowledge statements corresponding to each first word in the reference speech statements, the knowledge learned by the encoding and decoding sub-models during training is the knowledge corresponding to the words, which is fine-grained knowledge. In contrast, in related technologies, the KGD system that externalizes knowledge learns the instructions corresponding to the speech statements, which is coarse-grained knowledge. Therefore, compared to related technologies, this embodiment learns finer-grained knowledge, thereby further improving the quality of the target response statement.

[0120] For example, the quality of the target response statement in the above description is related to the diversity of the target response statement and the richness of knowledge contained in the target response statement. The greater the diversity of the target response statement, the higher its quality. The richer the knowledge contained in the target response statement, or the more knowledge included in the target response statement, the higher its quality.

[0121] It should be noted that, because the encoding and decoding sub-models learn fine-grained knowledge during training, that is, they have already memorized this fine-grained knowledge in their model parameters, this embodiment does not need to retrieve knowledge related to the target speech statement when outputting the target response statement, as related KGD systems that externalize knowledge do, nor does it need to input the target speech statement together with the related knowledge into the encoding sub-model. Instead, only the target speech statement needs to be input into the encoding sub-model. This embodiment therefore omits the knowledge retrieval step when outputting the target response statement, which saves time and improves the efficiency of obtaining the target response statement, and this efficiency gain is achieved while the quality of the target response statement is maintained. The process of memorizing knowledge in the model parameters of the encoding and decoding sub-models during training is called knowledge internalization (KI). In this embodiment, the response model including the encoding and decoding sub-models is called a knowledge-internalized KGD system.

[0122] Next, the process of training the encoding sub-model and the decoding sub-model based on the first training sample is described with reference to the structure of the response model shown in Figure 3. For example, the first training sample includes a first sub-training sample and a second sub-training sample. Training the response model based on the first training sample includes: training the encoding sub-model and the decoding sub-model based on the first sub-training sample and the second sub-training sample, which includes the following steps 2021-2023.

[0123] 2021. Input the first word in the first sub-training sample into the second initial model to obtain the fifth vector that the second initial model outputs for that first word, and determine the second loss function value based on the fifth vector and the first knowledge statement in the first sub-training sample.

[0124] In some implementations, referring to Figure 3, if the first sub-training sample includes the second positive training sample and the second negative training sample, step 2021 includes the following steps 20211-20213.

[0125] 20211. For any first word, input that first word into the second initial model to obtain the fifth vector that the second initial model outputs for that first word.

[0126] The first word $x_i$ is input into the second initial model, and the second initial model encodes the first word $x_i$ to obtain the fifth vector $h_i$. The fifth vector $h_i$ is the representation of the first word $x_i$ in a low-dimensional space.

[0127] 20212. Determine the third function value based on the fifth vector and the first knowledge statement in the second positive training sample, and determine the fourth function value based on the fifth vector and the first knowledge statement in the second negative training sample.

[0128] As explained above, the second positive training sample and the corresponding second negative training sample formed from a first word share that first word. Therefore, the fifth vector corresponding to the first word can be combined with the first knowledge statement in the second positive training sample and with the first knowledge statement in the second negative training sample, respectively, to obtain the aforementioned third and fourth function values.

[0129] In an exemplary embodiment, determining a third function value based on a fifth vector and a first knowledge statement in a second positive training sample, and determining a fourth function value based on a fifth vector and a first knowledge statement in a second negative training sample, includes: encoding the first knowledge statement in the second positive training sample into a third vector group; pooling the third vector group to obtain a sixth vector; encoding the first knowledge statement in the second negative training sample into a fourth vector group; pooling the fourth vector group to obtain a seventh vector; mapping and normalizing the fifth, sixth, and seventh vectors to obtain updated fifth, sixth, and seventh vectors of the same length; calculating the dot product between the updated fifth and sixth vectors to obtain the third function value; and calculating the dot product between the updated fifth and seventh vectors to obtain the fourth function value.

[0130] The second positive training sample is represented as $(x_i, k_i)$. The first knowledge statement $k_i$ in the second positive training sample is encoded into the third vector group, and the third vector group is pooled to obtain the sixth vector $g_i$. The second negative training sample is represented as $(x_i, k_j)$. The first knowledge statement $k_j$ in the second negative training sample is encoded into the fourth vector group, and the fourth vector group is pooled to obtain the seventh vector $g_j$. After that, the fifth vector $h_i$ corresponding to the first word $x_i$ is mapped and normalized to obtain the updated fifth vector $\hat{h}_i$, the sixth vector $g_i$ is mapped and normalized to obtain the updated sixth vector $\hat{g}_i$, and the seventh vector $g_j$ is mapped and normalized to obtain the updated seventh vector $\hat{g}_j$. The updated fifth vector $\hat{h}_i$, the updated sixth vector $\hat{g}_i$, and the updated seventh vector $\hat{g}_j$ have the same length.

[0131] Next, the dot product between the updated fifth vector and the updated sixth vector is calculated according to the following formula (6) to obtain the third function value $s_3$:

[0132] $$s_3 = \hat{h}_i^{\top} \hat{g}_i \tag{6}$$

[0133] The larger the third function value $s_3$, the higher the relevance between the fifth vector $h_i$, into which the second initial model encodes the first word $x_i$, and the first knowledge statement $k_i$ in the second positive training sample (i.e., the first knowledge statement related to the first word $x_i$).

[0134] The dot product between the updated fifth vector and the updated seventh vector is calculated according to the following formula (7) to obtain the fourth function value $s_4$:

[0135] $$s_4 = \hat{h}_i^{\top} \hat{g}_j \tag{7}$$

[0136] The smaller the fourth function value $s_4$, the lower the relevance between the fifth vector $h_i$, into which the second initial model encodes the first word $x_i$, and the first knowledge statement $k_j$ in the second negative training sample (i.e., a first knowledge statement unrelated to the first word $x_i$).

[0137] 20213. Determine the second loss function value based on the third function value and the fourth function value.

[0138] This embodiment does not limit the method of determining the second loss function value based on the third and fourth function values. Taking the determination of the second loss function value through the hinge loss function as an example, the method of determining the second loss function value will be explained. For example, the second difference is calculated based on the third and fourth function values, and the sum of the second difference and the second hyperparameter is calculated to obtain the updated second difference. The larger value between the updated second difference and the second reference value is determined as the second loss function value. In this embodiment, the second hyperparameter and the second reference value are not limited. For example, the second hyperparameter is 0.1, and the second reference value is 0. The method of determining the second loss function value through the hinge loss function can be expressed as the following formula (8):

[0139] $$\mathcal{L}_2 = \max\left(0,\ \delta_2 + s_4 - s_3\right) \tag{8}$$

[0140] In formula (8), $\mathcal{L}_2$ is the second loss function value, $0$ is the second reference value, $\delta_2$ is the second hyperparameter, $s_3$ is the third function value, and $s_4$ is the fourth function value, so $\delta_2 + s_4 - s_3$ is the updated second difference described above. $\max(\cdot)$ takes the maximum of the elements inside it, which are the second reference value $0$ and the updated second difference $\delta_2 + s_4 - s_3$.

[0141] In other implementations, the first sub-training sample includes only the second positive training sample and excludes the second negative training sample. In this implementation, after the fifth vector corresponding to the first word is determined according to 20211, the third function value between the fifth vector and the first knowledge statement in the second positive training sample is determined according to 20212, and the fourth function value is not determined. The second loss function value is then determined directly from the third function value, for example, by deleting the fourth function value from formula (8) above to obtain an updated formula (8) and determining the second loss function value based on the updated formula (8).

[0142] 2022. Referring to Figure 3, input the fifth vector corresponding to each first word into the third initial model to determine the third loss function value. The third loss function value indicates the probability that the third initial model outputs the reference response statement in the second sub-training sample based on the fifth vector corresponding to each first word.

[0143] The fifth vector corresponding to a first word $x_i$ is represented as $h_i$, so the fifth vectors corresponding to the first words are represented as $H = (h_1, h_2, \dots, h_n)$. In addition, the reference response statement in the second sub-training sample is represented as $Y = (y_1, y_2, \dots, y_m)$, where $m$ is a positive integer not less than 1 that indicates the number of words included in the reference response statement.

[0144] For a word $y_t$ ($1 \le t \le m$) in the reference response statement, the third initial model generates $y_t$ based on the fifth vectors $H$ corresponding to the first words and on the words $y_1, \dots, y_{t-1}$ that precede $y_t$. For the initial word $y_1$ of the reference response statement, no words precede it, so $y_1$ is generated based solely on the fifth vectors $H$. On this basis, the probability that the third initial model generates the word $y_t$ is expressed as the following formula (9):

[0145] $$P(y_t) = P\left(y_t \mid y_1, \dots, y_{t-1}, H\right) \tag{9}$$

[0146] For example, in this embodiment, the NLL (negative log-likelihood) loss is calculated based on formula (9) to obtain the third loss function value $\mathcal{L}_3$. The third loss function value indicates the probability that the third initial model outputs the reference response statement in the second sub-training sample based on the fifth vector corresponding to each first word: the smaller the third loss function value, the higher that probability. The process of obtaining the third loss function value is represented by the following formula (10):

[0147] $$\mathcal{L}_3 = -\sum_{t=1}^{m} \log P\left(y_t \mid y_1, \dots, y_{t-1}, H\right) \tag{10}$$
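The negative log-likelihood computation can be sketched as follows. The per-word probabilities are hypothetical stand-ins for what the third initial model would assign to each word of the reference response statement given the encoder vectors and the preceding reference words; `sequence_nll` is an illustrative name.

```python
import math

def sequence_nll(word_probabilities):
    """Negative log-likelihood of a reference response.

    word_probabilities[t] is the probability the decoder assigns to the
    t-th reference word, conditioned on the encoder vectors and on the
    reference words before it.
    """
    return -sum(math.log(p) for p in word_probabilities)

# Hypothetical probabilities for a three-word reference response statement.
word_probabilities = [0.5, 0.25, 0.5]
third_loss = sequence_nll(word_probabilities)

# Exponentiating the negated loss recovers the probability of the whole response.
response_probability = math.exp(-third_loss)
```

Summing log-probabilities rather than multiplying probabilities avoids numerical underflow for long responses, which is one reason the loss is stated in log form.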

[0148] 2023. Determine the fourth loss function value based on the second loss function value and the third loss function value. Based on the fourth loss function value, update the second initial model to the encoding sub-model and the third initial model to the decoding sub-model.

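[0149] For example, in this embodiment, the weighted sum of the second loss function value and the third loss function value is determined as the fourth loss function value. See formula (11) below:

[0150] $$\mathcal{L}_4 = \mathcal{L}_2 + \lambda \mathcal{L}_3 \tag{11}$$

[0151] In formula (11), $\mathcal{L}_4$ is the fourth loss function value, $\mathcal{L}_2$ is the second loss function value, $\mathcal{L}_3$ is the third loss function value, and $\lambda$ is the weight corresponding to the third loss function value. This embodiment does not limit the value of $\lambda$; for example, $\lambda$ is 1.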
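A minimal sketch of the weighted sum follows. It assumes, following the description, that the weight is applied to the third loss function value and that the second loss function value has an implicit weight of 1; the function name and the sample loss values are hypothetical.

```python
def fourth_loss(second_loss, third_loss, weight=1.0):
    """Weighted sum of the knowledge loss (second) and the generation loss (third)."""
    return second_loss + weight * third_loss

# Hypothetical loss values for one training example.
equal_weighting = fourth_loss(0.3, 2.0)              # weight 1, as in the example
reduced_generation = fourth_loss(0.3, 2.0, weight=0.5)
```

Because both sub-models are updated from this single value, the weight controls how strongly response generation is traded off against knowledge internalization.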

[0152] For example, updating the second initial model to an encoding sub-model and the third initial model to a decoding sub-model based on the fourth loss function value includes: minimizing the fourth loss function value, then updating the second initial model to obtain the encoding sub-model through the backpropagation process, and updating the third initial model to obtain the decoding sub-model.

[0153] In summary, the training samples used to train the response model in this embodiment include knowledge statements corresponding to each word in the spoken statement. Since the knowledge statements corresponding to words are fine-grained knowledge, the response model can learn this fine-grained knowledge during training. Therefore, when using the response model to output response statements, there is no need to additionally retrieve knowledge and input it into the response model, improving the efficiency of obtaining response statements. Furthermore, the output response statements consider fine-grained knowledge and have high quality.

[0154] This application also provides a method for obtaining response statements; see Figure 4. The method includes the following steps 401-403.

[0155] 401. Obtain the target speech statement that needs to be replied to.

[0156] The method for obtaining the target speech statement that needs to be replied to is explained in step 202 above and is not repeated here.

[0157] 402. Input the target statement into the encoding sub-model in the response model. The encoding sub-model is used to encode the target statement into a target vector. The response model also includes a decoding sub-model. The decoding sub-model is used to output the target response statement based on the target vector. The encoding sub-model and the decoding sub-model are trained based on the first training sample. The first training sample is generated based on the reference statement, the reference response statement, and the first knowledge statement corresponding to each first word in the reference statement.

[0158] In step 402, the response model, which includes an encoding sub-model and a decoding sub-model, is the model trained through steps 201 and 202 above.

[0159] 403. Obtain the target response statement output by the decoding sub-model in the response model.

[0160] After the target statement is input into the encoding sub-model of the response model, the decoding sub-model of the response model will output the target response statement corresponding to the target statement. This target response statement is of high quality, and the efficiency of obtaining it is also high. The process of outputting the target response statement is explained in section 202 above, and will not be repeated here.
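Steps 401-403 can be sketched as a single call path in Python. The class name `ResponseModel` and the two toy stand-in functions are hypothetical and only mark where the trained encoding and decoding sub-models would sit. The point of the sketch is that no knowledge retrieval step appears between input and output.

```python
from typing import Callable, List

Vector = List[float]

class ResponseModel:
    """Hypothetical wrapper pairing an encoding sub-model with a decoding sub-model."""

    def __init__(self, encode: Callable[[str], Vector], decode: Callable[[Vector], str]):
        self.encode = encode
        self.decode = decode

    def reply(self, target_statement: str) -> str:
        target_vector = self.encode(target_statement)  # step 402: encode the target statement
        return self.decode(target_vector)              # step 403: obtain the decoder output

# Toy stand-ins for the trained sub-models (assumptions, not the patent's models).
def toy_encode(statement: str) -> Vector:
    return [float(len(statement.split()))]

def toy_decode(vector: Vector) -> str:
    return f"received {int(vector[0])} words"

model = ResponseModel(toy_encode, toy_decode)
answer = model.reply("how far is the station")  # step 401 supplies this statement
print(answer)
```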

[0161] The method provided in this application can be applied to the field of mapping. For example, the method provided in this application is applied to a voice interaction process in an in-vehicle scenario within the mapping field. During this voice interaction, the user's voice is recognized to obtain the target speech statement that needs to be replied to. Then, steps 402 and 403 are executed to obtain the target response statement. Next, the target response statement is returned to the user via text, voice, or other means, thus completing a voice interaction.

[0162] As mentioned earlier, the response model in this embodiment is a Knowledge Internalization (KI) KGD system. That is, this embodiment achieves the acquisition of response statements through a knowledge internalization (KI) process. Next, the beneficial effects of the knowledge internalization process will be explained using actual data.

[0163] In one related technology, typical dialogue systems such as Seq2Seq and Transformer are provided. These typical dialogue systems adopt an encoder-decoder structure, and therefore can be combined with the knowledge internalization process in this embodiment to obtain Seq2Seq+KI and Transformer+KI. Seq2Seq, Transformer, Seq2Seq+KI, and Transformer+KI are trained on different datasets respectively, and the results are shown in Tables 1, 2, 3, and 4 below.

[0164] Table 1

[0165]

[0166] Table 1 shows the results of training on the DailyDialog dataset. In Table 1, PPL indicates the perplexity of the response statement, BLEU-4 indicates the precision of the response statement, ROUGE indicates the recall of the response statement, and Distinct-1 / 2 indicates the diversity of the response statement. It can be seen that the diversity of the response statements generated by Seq2Seq+KI (3.36 / 14.10) is higher than that generated by Seq2Seq (2.85 / 11.74), and the diversity of the response statements generated by Transformer+KI (4.39 / 21.88) is higher than that generated by Transformer (1.48 / 5.10). Therefore, the knowledge internalization process in this embodiment can improve the diversity of the response statements, thereby improving the quality of the response statements.
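The text does not define how the diversity metric is computed. A minimal Python sketch of the commonly used Distinct-n definition (unique n-grams divided by total n-grams over all generated responses) is given here as an assumption. The function name and the toy responses are hypothetical, and the table values may be scaled differently (for example, as percentages).

```python
from typing import Iterable, List

def distinct_n(responses: Iterable[List[str]], n: int) -> float:
    """Ratio of unique n-grams to total n-grams over all tokenized responses."""
    total = 0
    unique = set()
    for tokens in responses:
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0

# Two hypothetical tokenized responses.
replies = [["i", "am", "fine"], ["i", "am", "here"]]
d1 = distinct_n(replies, 1)  # 4 unique unigrams out of 6
d2 = distinct_n(replies, 2)  # 3 unique bigrams out of 4
print(d1, d2)
```

A higher value means fewer repeated words or word pairs across responses, which is why the metric is read as diversity.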

[0167] Table 2

[0168]

[0169] Table 2 shows the training results on the CRD dataset. It can be seen that the diversity of response statements generated by Seq2Seq+KI (1.86 / 7.73) is higher than that generated by Seq2Seq (1.13 / 4.47), and the diversity of response statements generated by Transformer+KI (3.24 / 17.81) is higher than that generated by Transformer (2.01 / 7.40). Therefore, the knowledge internalization process in this embodiment can improve the diversity of response statements, thereby improving the quality of the response statements.

[0170] Table 3

[0171]

[0172] Table 3 shows the training results on the WOW Test Seen dataset. wikiF1 is used to indicate the knowledge richness of the response statements. It can be seen that the knowledge richness of the response statements generated by Seq2Seq+KI (9.59) is higher than that generated by Seq2Seq (6.15), and the diversity of the response statements generated by Seq2Seq+KI (4.99 / 17.32) is higher than that generated by Seq2Seq (1.81 / 5.48). Similarly, the knowledge richness of the response statements generated by Transformer+KI (10.69) is higher than that generated by Transformer (6.83), and the diversity of the response statements generated by Transformer+KI (5.66 / 18.68) is higher than that generated by Transformer (1.95 / 4.44). Therefore, the knowledge internalization process in this embodiment can improve both the diversity and the knowledge richness of the response statements, thereby improving the quality of the response statements.

[0173] Table 4

[0174]

[0175] Table 4 shows the training results on the WOW Test Unseen dataset. It can be seen that the knowledge richness of the responses generated by Seq2Seq+KI (7.09) is higher than that generated by Seq2Seq (6.11), and the diversity of the responses generated by Seq2Seq+KI (3.12 / 12.05) is higher than that generated by Seq2Seq (2.58 / 10.25). Similarly, the knowledge richness of the responses generated by Transformer+KI (7.13) is higher than that generated by Transformer (5.43), and the diversity of the responses generated by Transformer+KI (3.82 / 12.98) is higher than that generated by Transformer (1.43 / 3.27). Therefore, the knowledge internalization process in this embodiment can improve both the diversity and the richness of knowledge included in the responses, thereby improving the quality of the responses.

[0176] It should be noted that combining a typical dialogue system with the knowledge internalization process does not introduce additional overhead. Therefore, the efficiency of obtaining response statements remains high, exceeding that of KGD systems that externalize knowledge in related technologies. See Table 5 below, which compares the efficiency of obtaining response statements after incorporating the knowledge internalization process, using Transformer and Transformer+KI as examples.

[0177] Table 5

[0178]

[0179] In Table 5, sent / s represents the number of sentences decoded per second during the test, tok / s represents the number of tokens decoded per second during the test, and Time represents the total time required to complete the test. It can be seen that on the DailyDialog dataset, the time consumed by Transformer+KI (35.1 seconds) is only 3.7 seconds longer than the time consumed by Transformer (31.4 seconds). On the WoW Unseen dataset, the time consumed by Transformer+KI (24.3 seconds) is only 1.9 seconds longer than the time consumed by Transformer (22.4 seconds). On the CRD dataset, the time consumed by Transformer+KI (108.8 seconds) is less than the time consumed by Transformer (126.7 seconds). On the WoW Seen dataset, the time consumed by Transformer+KI (25.4 seconds) is also less than the time consumed by Transformer (25.9 seconds). Therefore, combining a typical dialogue system with a knowledge internalization process can ensure the efficiency of acquiring response sentences.
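The time comparison above can be checked with a short Python sketch that uses only the times quoted in the text. The dictionary layout and the function name are illustrative choices.

```python
# Total test time in seconds, as quoted in the text: (Transformer, Transformer+KI).
times = {
    "DailyDialog": (31.4, 35.1),
    "WoW Unseen": (22.4, 24.3),
    "CRD": (126.7, 108.8),
    "WoW Seen": (25.9, 25.4),
}

def time_deltas(table):
    """Return {dataset: (absolute difference in seconds, relative difference)}."""
    return {name: (ki - base, (ki - base) / base) for name, (base, ki) in table.items()}

deltas = time_deltas(times)
for name, (delta, ratio) in deltas.items():
    # A positive value means Transformer+KI took longer than Transformer.
    print(f"{name}: {delta:+.1f} s ({ratio:+.1%})")
```

On these figures, Transformer+KI is slower on two datasets and faster on the other two, so the sign of the difference varies by dataset.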

[0180] As can be seen from Tables 1-5 above, combining a typical dialogue system with the knowledge internalization process provided in this embodiment can improve the diversity of response statements and the richness of knowledge included in the response statements, without incurring additional overhead, thus ensuring the efficiency of response statement acquisition.

[0181] Another related technology provides a knowledge externalization KGD system: DiffKS. Since DiffKS also employs an encoder-decoder structure, it can be combined with the knowledge internalization process in this embodiment to obtain DiffKS+KI. DiffKS and DiffKS+KI were trained on different datasets, yielding the results shown in Tables 6 and 7 below.

[0182] Table 6

[0183]

[0184] Table 6 shows the training results on the WOW Test Seen dataset. The knowledge richness of the responses generated by DiffKS+KI (7.09) is higher than that of responses generated by DiffKS (6.11), and the diversity of responses generated by DiffKS+KI (3.12 / 12.05) is higher than that of responses generated by DiffKS (2.58 / 10.25). Therefore, the knowledge internalization process in this embodiment can improve both the diversity and the richness of knowledge included in the responses, thereby improving the quality of the responses.

[0185] Table 7

[0186]

[0187] Table 7 shows the training results on the WOW Test Unseen dataset. The knowledge richness of the responses generated by DiffKS+KI (7.09) is higher than that of responses generated by DiffKS (6.11), and the diversity of responses generated by DiffKS+KI (3.12 / 12.05) is higher than that of responses generated by DiffKS (2.58 / 10.25). Therefore, the knowledge internalization process in this embodiment can improve both the diversity and the richness of knowledge included in the responses, thereby improving the quality of the responses.

[0188] As can be seen from Tables 6 and 7 above, combining the KGD system that externalizes knowledge with the knowledge internalization process provided in this embodiment can improve the diversity of response statements and the richness of knowledge included in the response statements.

[0189] This application provides a device for obtaining a response model; see Figure 5. The device includes the following modules.

[0190] The acquisition module 501 is used to acquire the first training sample, which is generated based on the reference speech statement, the reference reply statement, and the first knowledge statement corresponding to each first word in the reference speech statement.

[0191] The training module 502 is used to train a response model based on the first training sample. The response model includes an encoding sub-model and a decoding sub-model. The encoding sub-model is used to encode the input target statement into a target vector, and the decoding sub-model is used to output the target response statement based on the target vector.

[0192] In an exemplary embodiment, the acquisition module 501 is configured to acquire a reference speech statement, determine a first knowledge statement corresponding to each first word in the reference speech statement, generate a first sub-training sample based on the first knowledge statement corresponding to each first word in the reference speech statement, and determine a reference response statement corresponding to the reference speech statement, and generate a second sub-training sample based on the reference response statement.

[0193] Training module 502 is used to train the encoding sub-model and the decoding sub-model based on the first sub-training sample and the second sub-training sample.

[0194] In an exemplary embodiment, the acquisition module 501 is used to input the reference speech statement into the retrieval model. The retrieval model is used to encode each first word in the reference speech statement into a first vector and output each first vector. Each first word corresponds one-to-one with each first vector. The retrieval model is trained based on a second training sample, which is generated based on the second word, the statement containing the second word, and the second knowledge statement corresponding to the second word. The module obtains each first vector output by the retrieval model. For any first vector, the module retrieves the candidate knowledge statement with the highest relevance to any first vector from the candidate knowledge statements included in the knowledge base, thereby obtaining the first knowledge statement corresponding to any first vector.
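The retrieval step described above can be sketched in Python as an argmax over the knowledge base. The sketch assumes that relevance is measured by a dot product and that each candidate knowledge statement already has a vector. The function names, toy vectors, and toy knowledge base are all hypothetical.

```python
from typing import Dict, List, Sequence

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def retrieve_knowledge(first_vector: Sequence[float],
                       candidates: Dict[str, List[float]]) -> str:
    """Return the candidate knowledge statement most relevant to one first vector.
    Relevance is assumed here to be the dot product of the two vectors."""
    return max(candidates, key=lambda statement: dot(first_vector, candidates[statement]))

# Hypothetical knowledge base: statement -> vector.
knowledge_base = {
    "A piano is a keyboard instrument.": [0.9, 0.1],
    "An apple is a fruit.": [0.1, 0.9],
}
# Hypothetical first vectors output by the retrieval model, one per first word.
first_vectors = {"piano": [1.0, 0.0], "apple": [0.0, 1.0]}

first_knowledge = {word: retrieve_knowledge(vec, knowledge_base)
                   for word, vec in first_vectors.items()}
print(first_knowledge)
```

Each first word ends up with exactly one first knowledge statement, matching the one-to-one correspondence between first words and first vectors.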

[0195] In an exemplary embodiment, the acquisition module 501 is further configured to acquire a description statement corresponding to a term, the description statement including at least one descriptive word for the term; for any descriptive word, the descriptive word is determined as a second word, and the description statement is determined as the statement containing the second word and the second knowledge statement corresponding to the second word; a second training sample is generated based on the second word, the statement containing the second word, and the second knowledge statement corresponding to the second word, and a retrieval model is trained based on the second training sample.

[0196] In an exemplary embodiment, the second training sample includes a first positive training sample and a first negative training sample. The acquisition module 501 is used to, for any second word, form a first positive training sample by combining any second word, the statement containing any second word, and the second knowledge statement corresponding to the second word, and form a first negative training sample by combining any second word, the statement containing any second word, and the second knowledge statement corresponding to other second words, wherein the second knowledge statement corresponding to other second words is unrelated to any second word.
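The construction of first positive and first negative training samples can be sketched as follows in Python. The tuple layout, the function name, and the rule used to decide that a knowledge statement is unrelated (the word does not occur in it) are assumptions made for illustration.

```python
import random
from typing import List, Tuple

# (second word, statement containing it, second knowledge statement for that word)
Triple = Tuple[str, str, str]

def build_samples(triples: List[Triple], rng: random.Random):
    """Build positive and negative training samples for each second word."""
    positives: List[Triple] = []
    negatives: List[Triple] = []
    for word, statement, knowledge in triples:
        # Positive: the word paired with its own knowledge statement.
        positives.append((word, statement, knowledge))
        # Negative: knowledge of another word, kept only if it does not mention
        # this word (a hypothetical stand-in for "unrelated").
        unrelated = [k for w, _, k in triples
                     if w != word and word not in k.lower().split()]
        if unrelated:
            negatives.append((word, statement, rng.choice(unrelated)))
    return positives, negatives

triples = [
    ("apple", "An apple is a fruit.", "An apple is a fruit."),
    ("piano", "A piano is an instrument.", "A piano is an instrument."),
]
positives, negatives = build_samples(triples, random.Random(0))
print(negatives)
```

In the toy data the statement and the knowledge statement are identical, which mirrors the case where a description statement serves as both.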

[0197] In an exemplary embodiment, the acquisition module 501 is configured to, for any second word, input the statement containing the second word into the first initial model to obtain a second vector corresponding to any second word output by the first initial model; determine a first function value between the second vector and the second knowledge statement in the first positive training sample, determine a second function value between the second vector and the second knowledge statement in the first negative training sample; determine a first loss function value based on the first function value and the second function value, and update the first initial model to a retrieval model based on the first loss function value.

[0198] In an exemplary embodiment, the acquisition module 501 is configured to encode the second knowledge statement in the first positive training sample into a first vector group, pool the first vector group to obtain a third vector, encode the second knowledge statement in the first negative training sample into a second vector group, pool the second vector group to obtain a fourth vector; map and normalize the second, third, and fourth vectors to obtain updated second, third, and fourth vectors of the same length; calculate the dot product between the updated second and third vectors to obtain a first function value, and calculate the dot product between the updated second and fourth vectors to obtain a second function value.
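The pooling, normalization, and dot product steps can be sketched in Python as below. Several details are assumptions because this passage does not fix them: mean pooling is used, the mapping step is treated as the identity because the toy vectors already share a length, and the loss takes a softmax-style contrastive form. All names and values are hypothetical.

```python
import math
from typing import List, Sequence

def mean_pool(vector_group: Sequence[Sequence[float]]) -> List[float]:
    """Pool a group of vectors into one vector (mean pooling is an assumption)."""
    return [sum(column) / len(vector_group) for column in zip(*vector_group)]

def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def function_value(word_vector, knowledge_group) -> float:
    """Dot product between the normalized word vector and the normalized pooled vector."""
    a = l2_normalize(word_vector)
    b = l2_normalize(mean_pool(knowledge_group))
    return sum(x * y for x, y in zip(a, b))

def contrastive_loss(first_value: float, second_value: float) -> float:
    """Assumed loss form: reward a high positive value and a low negative value."""
    return -math.log(math.exp(first_value) /
                     (math.exp(first_value) + math.exp(second_value)))

second_vector = [1.0, 0.0]                  # vector for the second word
positive_group = [[0.8, 0.2], [1.0, 0.0]]   # encoded positive knowledge statement
negative_group = [[0.0, 1.0], [0.2, 0.8]]   # encoded negative knowledge statement

first_value = function_value(second_vector, positive_group)
second_value = function_value(second_vector, negative_group)
loss = contrastive_loss(first_value, second_value)
print(first_value, second_value, loss)
```

Normalizing before the dot product keeps both function values in the range from -1 to 1, so the two are directly comparable.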

[0199] In an exemplary embodiment, the training module 502 is configured to input the first word from the first sub-training sample into the second initial model to obtain the fifth vector corresponding to the first word in the first sub-training sample output by the second initial model; determine the second loss function value based on the fifth vector and the first knowledge statement in the first sub-training sample; input the fifth vector corresponding to each first word into the third initial model to determine the third loss function value, the third loss function value being used to indicate the probability that the third initial model outputs the reference response statement in the second sub-training sample based on the fifth vector corresponding to each first word; determine the fourth loss function value based on the second loss function value and the third loss function value; and update the second initial model to an encoding sub-model and the third initial model to a decoding sub-model based on the fourth loss function value.

[0200] In an exemplary embodiment, the first sub-training sample includes a second positive training sample and a second negative training sample. The acquisition module 501 is used to, for any first word, form a second positive training sample by combining any first word with the first knowledge statement corresponding to any first word, and form a second negative training sample by combining any first word with the first knowledge statement corresponding to other first words, wherein the first knowledge statement corresponding to other first words is unrelated to any first word.

[0201] In an exemplary embodiment, the training module 502 is configured to, for any first word, input any first word into a second initial model to obtain a fifth vector corresponding to any first word output by the second initial model; determine a third function value based on the fifth vector and the first knowledge statement in the second positive training sample; determine a fourth function value based on the fifth vector and the first knowledge statement in the second negative training sample; and determine a second loss function value based on the third function value and the fourth function value.

[0202] In an exemplary embodiment, the training module 502 is configured to encode the first knowledge statement in the second positive training sample into a third vector group, pool the third vector group to obtain a sixth vector, encode the first knowledge statement in the second negative training sample into a fourth vector group, pool the fourth vector group to obtain a seventh vector; map and normalize the fifth, sixth, and seventh vectors to obtain updated fifth, sixth, and seventh vectors of the same length; calculate the dot product between the updated fifth and sixth vectors to obtain a third function value, and calculate the dot product between the updated fifth and seventh vectors to obtain a fourth function value.

[0203] This application provides a device for obtaining response statements; see Figure 6. The device includes the following modules.

[0204] Module 601 is used to obtain the target speech statement that needs to be replied to.

[0205] The input module 602 is used to input the target speech statement into the encoding sub-model in the response model. The encoding sub-model is used to encode the target speech statement into a target vector. The response model also includes a decoding sub-model, which is used to output the target response statement based on the target vector. The encoding sub-model and the decoding sub-model are trained based on the first training sample. The first training sample is generated based on the reference speech statement, the reference response statement, and the first knowledge statement corresponding to each first word in the reference speech statement.

[0206] The module 603 is used to obtain the target response statement output by the decoding sub-model in the response model.

[0207] In summary, the training samples used to train the response model in this embodiment include knowledge statements corresponding to each word in the spoken statement. Since the knowledge statements corresponding to words are fine-grained knowledge, the response model can learn this fine-grained knowledge during training. Therefore, when using the response model to output response statements, there is no need to additionally retrieve knowledge and input it into the response model, improving the efficiency of obtaining response statements. Furthermore, the output response statements consider fine-grained knowledge and have high quality.

[0208] It should be noted that the apparatus provided in the above embodiments is only illustrated by the division of the above functional modules. In practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus and method embodiments provided in the above embodiments belong to the same concept, and their specific implementation process can be found in the method embodiments, which will not be repeated here.

[0209] See Figure 7, which shows a structural schematic of an electronic device 700 provided in an embodiment of this application. The electronic device 700 can be a portable mobile electronic device, such as a smartphone, tablet computer, MP3 player (Moving Picture Experts Group Audio Layer III), MP4 player (Moving Picture Experts Group Audio Layer IV), laptop computer, or desktop computer. The electronic device 700 may also be referred to as a user device, portable electronic device, laptop electronic device, desktop electronic device, or other names.

[0210] Typically, electronic device 700 includes a processor 701 and a memory 702.

[0211] Processor 701 may include one or more processing cores, such as a quad-core processor, an octa-core processor, etc. Processor 701 may be implemented using at least one hardware form from the group consisting of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). Processor 701 may also include a main processor and a coprocessor. The main processor, also known as a CPU (Central Processing Unit), is used to process data in the wake-up state; the coprocessor is a low-power processor used to process data in the standby state. In some embodiments, processor 701 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on display screen 705. In some embodiments, processor 701 may also include an AI (Artificial Intelligence) processor, which is used to handle computational operations related to machine learning.

[0212] The memory 702 may include one or more computer-readable storage media, which may be non-transitory. The memory 702 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory devices. In some embodiments, the non-transitory computer-readable storage media in the memory 702 is used to store at least one instruction, which is executed by the processor 701 to implement the method for obtaining a response model or a response statement provided in the method embodiments of this application.

[0213] In some embodiments, the electronic device 700 may optionally include a peripheral device interface 703 and at least one peripheral device. The processor 701, memory 702, and peripheral device interface 703 can be connected via a bus or signal line. Each peripheral device can be connected to the peripheral device interface 703 via a bus, signal line, or circuit board. Specifically, the peripheral device includes at least one of the following groups: radio frequency circuitry 704, display screen 705, camera assembly 706, audio circuitry 707, and power supply 709.

[0214] Peripheral device interface 703 can be used to connect at least one I / O (Input / Output) related peripheral device to processor 701 and memory 702. In some embodiments, processor 701, memory 702 and peripheral device interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of processor 701, memory 702 and peripheral device interface 703 can be implemented on separate chips or circuit boards, which is not limited in this embodiment.

[0215] The radio frequency (RF) circuit 704 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The RF circuit 704 communicates with communication networks and other communication devices via electromagnetic signals. The RF circuit 704 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals back into electrical signals. Optionally, the RF circuit 704 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a user identity module card, etc. The RF circuit 704 can communicate with other electronic devices through at least one wireless communication protocol. This wireless communication protocol includes, but is not limited to: metropolitan area networks (MANs), various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks (WLANs), and / or Wi-Fi (Wireless Fidelity) networks. In some embodiments, the RF circuit 704 may also include circuitry related to NFC (Near Field Communication), which is not limited in this application.

[0216] Display screen 705 is used to display a UI (User Interface). This UI may include graphics, text, icons, videos, and any combination thereof. When display screen 705 is a touch display screen, display screen 705 also has the ability to collect touch signals on or above the surface of display screen 705. These touch signals can be input as control signals to processor 701 for processing. In this case, display screen 705 can also be used to provide virtual buttons and / or a virtual keyboard, also known as soft buttons and / or a soft keyboard. In some embodiments, there may be one display screen 705, disposed on the front panel of electronic device 700; in other embodiments, there may be at least two display screens 705, disposed on different surfaces of electronic device 700 or in a folded design; in other embodiments, display screen 705 may be a flexible display screen, disposed on a curved surface or folded surface of electronic device 700. Furthermore, display screen 705 may be configured as a non-rectangular irregular shape, i.e., a non-rectangular screen. Display screen 705 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

[0217] Camera assembly 706 is used to acquire images or videos. Optionally, camera assembly 706 includes a front-facing camera and a rear-facing camera. Typically, the front-facing camera is located on the front panel of the electronic device, and the rear-facing camera is located on the back of the electronic device. In some embodiments, there are at least two rear-facing cameras, each of which is any one of a main camera, a depth-sensing camera, a wide-angle camera, and a telephoto camera, so as to achieve background blurring by fusion of the main camera and the depth-sensing camera, panoramic shooting by fusion of the main camera and the wide-angle camera, VR (Virtual Reality) shooting, or other fusion shooting functions. In some embodiments, camera assembly 706 may also include a flash. The flash can be a single-color temperature flash or a dual-color temperature flash. A dual-color temperature flash refers to a combination of a warm-light flash and a cool-light flash, which can be used for light compensation at different color temperatures.

[0218] The audio circuit 707 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, converting the sound waves into electrical signals that are input to the processor 701 for processing, or input to the radio frequency circuit 704 for voice communication. For stereo sound acquisition or noise reduction purposes, multiple microphones may be used, each located in a different part of the electronic device 700. The microphone may also be an array microphone or an omnidirectional microphone. The speaker is used to convert the electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The speaker may be a conventional diaphragm speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert electrical signals not only into audible sound waves but also into inaudible sound waves for purposes such as distance measurement. In some embodiments, the audio circuit 707 may also include a headphone jack.

[0219] The power supply 709 is used to power the various components in the electronic device 700. The power supply 709 can be AC power, DC power, a disposable battery, or a rechargeable battery. When the power supply 709 includes a rechargeable battery, the rechargeable battery can support wired or wireless charging. The rechargeable battery can also be used to support fast charging technology.

[0220] In some embodiments, the electronic device 700 further includes one or more sensors 710. The one or more sensors 710 include, but are not limited to, an accelerometer 711, a gyroscope 712, a pressure sensor 713, an optical sensor 717, and a proximity sensor 716.

[0221] Accelerometer 711 can detect the magnitude of acceleration on the three coordinate axes of a coordinate system established by electronic device 700. For example, accelerometer 711 can be used to detect the components of gravitational acceleration on the three coordinate axes. Processor 701 can control display screen 705 to display the user interface in either a landscape or portrait view based on the gravitational acceleration signal acquired by accelerometer 711. Accelerometer 711 can also be used for games or for acquiring user motion data.

[0222] The gyroscope 712 can detect the orientation and rotation angle of the electronic device 700. The gyroscope 712, in conjunction with the accelerometer 711, can collect 3D motion data from the user on the electronic device 700. Based on the data collected by the gyroscope 712, the processor 701 can perform the following functions: motion sensing (e.g., changing the UI based on the user's tilt), image stabilization during shooting, game control, and inertial navigation.

[0223] The pressure sensor 713 can be disposed on the side bezel of the electronic device 700 and / or the lower layer of the display screen 705. When the pressure sensor 713 is disposed on the side bezel of the electronic device 700, it can detect the user's grip signal on the electronic device 700, and the processor 701 can perform left / right hand recognition or quick operation based on the grip signal collected by the pressure sensor 713. When the pressure sensor 713 is disposed on the lower layer of the display screen 705, the processor 701 can control the operable controls on the UI interface based on the user's pressure operation on the display screen 705. The operable controls include at least one of the group consisting of button controls, scroll bar controls, icon controls, and menu controls.

[0224] An optical sensor 717 is used to collect ambient light intensity. In one embodiment, the processor 701 can control the display brightness of the display screen 705 based on the ambient light intensity collected by the optical sensor 717. Specifically, when the ambient light intensity is high, the display brightness of the display screen 705 is increased; when the ambient light intensity is low, the display brightness of the display screen 705 is decreased. In another embodiment, the processor 701 can also dynamically adjust the shooting parameters of the camera assembly 706 based on the ambient light intensity collected by the optical sensor 717.

[0225] A proximity sensor 716, also known as a distance sensor, is typically located on the front panel of an electronic device 700. The proximity sensor 716 is used to detect the distance between the user and the front of the electronic device 700. In one embodiment, when the proximity sensor 716 detects that the distance between the user and the front of the electronic device 700 is gradually decreasing, the processor 701 controls the display screen 705 to switch from a screen-on state to a screen-off state; when the proximity sensor 716 detects that the distance between the user and the front of the electronic device 700 is gradually increasing, the processor 701 controls the display screen 705 to switch from a screen-off state to a screen-on state.
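
The screen-state switching described above amounts to a small state update driven by two consecutive distance readings. In the sketch below, the function name `next_screen_state` and the choice to keep the current state when the distance is unchanged are assumptions for illustration.

```python
def next_screen_state(screen_on: bool, prev_distance: float, distance: float) -> bool:
    """Return the new screen state (True = screen on) from two distance readings."""
    if distance < prev_distance:
        return False      # user is approaching the front panel: turn the screen off
    if distance > prev_distance:
        return True       # user is moving away: turn the screen on
    return screen_on      # unchanged distance: keep the current state (assumption)


# Hypothetical readings in centimetres.
state = True
state = next_screen_state(state, 10.0, 4.0)   # approaching
after_approach = state
state = next_screen_state(state, 4.0, 12.0)   # moving away
after_retreat = state
```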

[0226] Those skilled in the art will understand that the structure shown in Figure 7 does not constitute a limitation on the electronic device 700, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.

[0227] Figure 8 is a schematic diagram of the structure of the server provided in the embodiments of this application. The server 800 can vary considerably depending on its configuration or performance. It may include one or more processors 801 and one or more memories 802. The one or more memories 802 store at least one piece of program code, which is loaded and executed by the one or more processors 801 to enable the server to implement the response model acquisition method or response statement acquisition method provided in the various method embodiments described above. Of course, the server 800 may also have wired or wireless network interfaces, a keyboard, and input / output interfaces for input and output. The server 800 may also include other components for implementing device functions, which will not be elaborated here.

[0228] This application provides a computer device, which includes a memory and a processor. The memory stores at least one instruction, which is loaded and executed by the processor to enable the computer device to implement the response model acquisition method or response statement acquisition method provided in any exemplary embodiment of this application.

[0229] This application provides a computer-readable storage medium storing at least one instruction, which is loaded and executed by a processor to enable a computer to implement the method for obtaining a response model or a method for obtaining a response statement provided in any exemplary embodiment of this application.

[0230] This application provides a computer program or computer program product, which includes computer instructions. When executed by a computer, the computer instructions cause the computer to implement the method for obtaining a response model or the method for obtaining a response statement provided in any exemplary embodiment of this application.

[0231] All of the above-mentioned optional technical solutions can be combined in any way to form the optional embodiments of this application, and will not be described in detail here.

[0232] Those skilled in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware or by a program instructing related hardware. The program can be stored in a computer-readable storage medium, such as a read-only memory, a disk, or an optical disk.

[0233] The above description is merely an embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

Claims

1. A method for obtaining a response model, characterized in that, The method includes: Obtain a first training sample, which is generated based on a reference speech statement, a reference response statement, and a first knowledge statement corresponding to each first word in the reference speech statement; A response model is trained based on the first training sample. The response model includes an encoding sub-model and a decoding sub-model. The trained encoding sub-model is used to encode the input target statement to be responded to into a target vector. The trained decoding sub-model is used to output the target response statement corresponding to the target statement based on the target vector. During the training process, the knowledge learned by the encoding sub-model and the decoding sub-model is the knowledge corresponding to the words. The vectors included in the target vector correspond one-to-one with the words in the target statement. A vector is the representation of the word corresponding to the vector in space.

2. The method of claim 1, wherein, The first training sample includes a first sub-training sample and a second sub-training sample. Obtaining the first training sample includes: Obtain the reference speech statement, determine the first knowledge statement corresponding to each first word in the reference speech statement, and generate the first sub-training sample based on the first knowledge statement corresponding to each first word in the reference speech statement. Determine the reference response statement corresponding to the reference speech statement, and generate the second sub-training sample based on the reference response statement; The response model trained based on the first training sample includes: Based on the first sub-training sample and the second sub-training sample, the encoding sub-model and the decoding sub-model are trained.

3. The method of claim 2, wherein, The step of determining the first knowledge statement corresponding to each first word in the reference speech statement includes: The reference speech statement is input into the retrieval model, which encodes each first word in the reference speech statement into a first vector and outputs each first vector. Each first word corresponds one-to-one with each first vector. The retrieval model is trained based on a second training sample, which is generated based on a second word, the statement containing the second word, and the second knowledge statement corresponding to the second word. Obtain each of the first vectors output by the retrieval model; For any first vector, retrieve the candidate knowledge statement that is most relevant to the first vector from the candidate knowledge statements included in the knowledge base, and obtain the first knowledge statement corresponding to the first vector.

4. The method according to claim 3, characterized in that, Before inputting the reference speech statement into the retrieval model, the method further includes: Obtain the description statement corresponding to the term, wherein the description statement includes at least one descriptive word for the term; For any descriptive word, the descriptive word is determined as the second word, and the description statement is determined as the statement containing the second word and the second knowledge statement corresponding to the second word; The second training sample is generated based on the second word, the statement containing the second word, and the second knowledge statement corresponding to the second word. The retrieval model is then trained based on the second training sample.

5. The method according to claim 4, characterized in that, The second training sample includes a first positive training sample and a first negative training sample. Generating the second training sample based on the second word, the statement containing the second word, and the second knowledge statement corresponding to the second word includes: For any second word, the first positive training sample is formed by the second word, the statement containing the second word, and the second knowledge statement corresponding to the second word. The first negative training sample is formed by the second word, the statement containing the second word, and the second knowledge statements corresponding to other second words. The second knowledge statements corresponding to other second words are unrelated to the second word.

6. The method according to claim 5, characterized in that, The process of training the retrieval model based on the second training sample includes: For any second word, input the statement containing the second word into the first initial model to obtain the second vector corresponding to the second word output by the first initial model; Determine the first function value between the second vector and the second knowledge statement in the first positive training sample, and determine the second function value between the second vector and the second knowledge statement in the first negative training sample; A first loss function value is determined based on the first function value and the second function value, and the first initial model is updated to the retrieval model based on the first loss function value.

7. The method according to claim 6, characterized in that, Determining the first function value between the second vector and the second knowledge statement in the first positive training sample, and determining the second function value between the second vector and the second knowledge statement in the first negative training sample, includes: The second knowledge statement in the first positive training sample is encoded into a first vector group, and the first vector group is pooled to obtain a third vector. The second knowledge statement in the first negative training sample is encoded into a second vector group, and the second vector group is pooled to obtain a fourth vector. The second vector, the third vector, and the fourth vector are mapped and normalized to obtain updated second vectors, updated third vectors, and updated fourth vectors of the same length; Calculate the dot product between the updated second vector and the updated third vector to obtain the first function value; calculate the dot product between the updated second vector and the updated fourth vector to obtain the second function value.

8. The method according to any one of claims 2-7, characterized in that, The step of training the encoding sub-model and the decoding sub-model based on the first sub-training sample and the second sub-training sample includes: The first word in the first sub-training sample is input into the second initial model to obtain the fifth vector corresponding to the first word in the first sub-training sample output by the second initial model. The second loss function value is determined based on the fifth vector and the first knowledge statement in the first sub-training sample. The fifth vector corresponding to each of the first words is input into the third initial model to determine the third loss function value. The third loss function value is used to indicate the probability that the third initial model outputs the reference response statement in the second sub-training sample based on the fifth vector corresponding to each of the first words. A fourth loss function value is determined based on the second loss function value and the third loss function value. Based on the fourth loss function value, the second initial model is updated to the encoding sub-model, and the third initial model is updated to the decoding sub-model.

9. The method according to claim 8, characterized in that, The first sub-training sample includes a second positive training sample and a second negative training sample. Generating the first sub-training sample based on the first knowledge statement corresponding to each first word in the reference speech statement includes: For any first word, the first word and the first knowledge statement corresponding to the first word are combined to form the second positive training sample, and the first knowledge statement corresponding to the other first words is combined to form the second negative training sample, wherein the first knowledge statement corresponding to the other first words is unrelated to the first word.

10. The method according to claim 9, characterized in that, The step of inputting the first word from the first sub-training sample into the second initial model to obtain the fifth vector corresponding to the first word in the first sub-training sample output by the second initial model, and determining the second loss function value based on the fifth vector and the first knowledge statement in the first sub-training sample, includes: For any first word, input the first word into the second initial model to obtain the fifth vector corresponding to the first word output by the second initial model; The third function value is determined based on the fifth vector and the first knowledge statement in the second positive training sample, and the fourth function value is determined based on the fifth vector and the first knowledge statement in the second negative training sample. The second loss function value is determined based on the third function value and the fourth function value.

11. The method according to claim 10, characterized in that, The process of determining the third function value based on the fifth vector and the first knowledge statement in the second positive training sample, and determining the fourth function value based on the fifth vector and the first knowledge statement in the second negative training sample, includes: The first knowledge statement in the second positive training sample is encoded into a third vector group, and the third vector group is pooled to obtain a sixth vector. The first knowledge statement in the second negative training sample is encoded into a fourth vector group, and the fourth vector group is pooled to obtain a seventh vector. The fifth vector, the sixth vector, and the seventh vector are mapped and normalized to obtain updated fifth vector, updated sixth vector, and updated seventh vector with the same length; Calculate the dot product between the updated fifth vector and the updated sixth vector to obtain the third function value; calculate the dot product between the updated fifth vector and the updated seventh vector to obtain the fourth function value.

12. A method for obtaining a response statement, characterized in that, The method includes: Obtain the target statement to be responded to; The target statement is input into the encoding sub-model in the response model. The encoding sub-model is used to encode the target statement into a target vector. The response model also includes a decoding sub-model, which is used to output the target response statement corresponding to the target statement based on the target vector. The encoding sub-model and the decoding sub-model are trained based on a first training sample. The first training sample is generated based on a reference speech statement, a reference response statement, and first knowledge statements corresponding to each first word in the reference speech statement. The vectors included in the target vector correspond one-to-one with the words in the target statement. During the training process, the knowledge learned by the encoding sub-model and the decoding sub-model is the knowledge corresponding to the words. A vector is the representation of the word corresponding to the vector in space. Obtain the target response statement output by the decoding sub-model in the response model.

13. A device for acquiring a response model, characterized in that, The device includes: The acquisition module is used to acquire a first training sample, which is generated based on a reference speech statement, a reference response statement, and a first knowledge statement corresponding to each first word in the reference speech statement; The training module is used to train a response model based on the first training sample. The response model includes an encoding sub-model and a decoding sub-model. The trained encoding sub-model is used to encode the input target statement to be responded to into a target vector. The trained decoding sub-model is used to output the target response statement corresponding to the target statement based on the target vector. During the training process, the knowledge learned by the encoding sub-model and the decoding sub-model is the knowledge corresponding to the words. The vectors included in the target vector correspond one-to-one with the words in the target statement. A vector is the representation of the word corresponding to the vector in space.

14. The apparatus according to claim 13, characterized in that, The first training sample includes a first sub-training sample and a second sub-training sample; The acquisition module is used to acquire the reference speech statement, determine the first knowledge statement corresponding to each first word in the reference speech statement, and generate the first sub-training sample based on the first knowledge statement corresponding to each first word in the reference speech statement. Determine the reference response statement corresponding to the reference speech statement, and generate the second sub-training sample based on the reference response statement; The training module is used to train the encoding sub-model and the decoding sub-model based on the first sub-training sample and the second sub-training sample.

15. The apparatus according to claim 14, characterized in that, The acquisition module is used for: The reference speech statement is input into the retrieval model, which encodes each first word in the reference speech statement into a first vector and outputs each first vector. Each first word corresponds one-to-one with each first vector. The retrieval model is trained based on a second training sample, which is generated based on a second word, the statement containing the second word, and the second knowledge statement corresponding to the second word. Obtain each of the first vectors output by the retrieval model; For any first vector, retrieve the candidate knowledge statement that is most relevant to the first vector from the candidate knowledge statements included in the knowledge base, and obtain the first knowledge statement corresponding to the first vector.

16. The apparatus according to claim 15, characterized in that, The acquisition module is also used for: Obtain the description statement corresponding to the term, wherein the description statement includes at least one descriptive word for the term; For any descriptive word, the descriptive word is determined as the second word, and the description statement is determined as the statement containing the second word and the second knowledge statement corresponding to the second word; The second training sample is generated based on the second word, the statement containing the second word, and the second knowledge statement corresponding to the second word. The retrieval model is then trained based on the second training sample.

17. The apparatus according to claim 16, characterized in that, The second training sample includes a first positive training sample and a first negative training sample. The acquisition module is used to: For any second word, the first positive training sample is formed by the second word, the statement containing the second word, and the second knowledge statement corresponding to the second word. The first negative training sample is formed by the second word, the statement containing the second word, and the second knowledge statements corresponding to other second words. The second knowledge statements corresponding to other second words are unrelated to the second word.

18. The apparatus according to claim 17, characterized in that, The acquisition module is used for: For any second word, input the statement containing the second word into the first initial model to obtain the second vector corresponding to the second word output by the first initial model; Determine the first function value between the second vector and the second knowledge statement in the first positive training sample, and determine the second function value between the second vector and the second knowledge statement in the first negative training sample; A first loss function value is determined based on the first function value and the second function value, and the first initial model is updated to the retrieval model based on the first loss function value.

19. The apparatus according to claim 18, characterized in that, The acquisition module is used for: The second knowledge statement in the first positive training sample is encoded into a first vector group, and the first vector group is pooled to obtain a third vector. The second knowledge statement in the first negative training sample is encoded into a second vector group, and the second vector group is pooled to obtain a fourth vector. The second vector, the third vector, and the fourth vector are mapped and normalized to obtain updated second vectors, updated third vectors, and updated fourth vectors of the same length; Calculate the dot product between the updated second vector and the updated third vector to obtain the first function value; calculate the dot product between the updated second vector and the updated fourth vector to obtain the second function value.

20. The apparatus according to any one of claims 14-19, characterized in that, The training module is used for: The first word in the first sub-training sample is input into the second initial model to obtain the fifth vector corresponding to the first word in the first sub-training sample output by the second initial model. The second loss function value is determined based on the fifth vector and the first knowledge statement in the first sub-training sample. The fifth vector corresponding to each of the first words is input into the third initial model to determine the third loss function value. The third loss function value is used to indicate the probability that the third initial model outputs the reference response statement in the second sub-training sample based on the fifth vector corresponding to each of the first words. A fourth loss function value is determined based on the second loss function value and the third loss function value. Based on the fourth loss function value, the second initial model is updated to the encoding sub-model, and the third initial model is updated to the decoding sub-model.

21. The apparatus according to claim 20, characterized in that, The first sub-training sample includes a second positive training sample and a second negative training sample. The acquisition module is used to: For any first word, the first word and the first knowledge statement corresponding to the first word are combined to form the second positive training sample, and the first knowledge statement corresponding to the other first words is combined to form the second negative training sample, wherein the first knowledge statement corresponding to the other first words is unrelated to the first word.

22. The apparatus according to claim 21, characterized in that, The training module is used for: For any first word, input the first word into the second initial model to obtain the fifth vector corresponding to the first word output by the second initial model; The third function value is determined based on the fifth vector and the first knowledge statement in the second positive training sample, and the fourth function value is determined based on the fifth vector and the first knowledge statement in the second negative training sample. The second loss function value is determined based on the third function value and the fourth function value.

23. The apparatus according to claim 22, characterized in that, The training module is used for: The first knowledge statement in the second positive training sample is encoded into a third vector group, and the third vector group is pooled to obtain a sixth vector. The first knowledge statement in the second negative training sample is encoded into a fourth vector group, and the fourth vector group is pooled to obtain a seventh vector. The fifth vector, the sixth vector, and the seventh vector are mapped and normalized to obtain updated fifth vector, updated sixth vector, and updated seventh vector with the same length; Calculate the dot product between the updated fifth vector and the updated sixth vector to obtain the third function value; calculate the dot product between the updated fifth vector and the updated seventh vector to obtain the fourth function value.

24. A device for acquiring response statements, characterized in that, The device includes: The acquisition module is used to acquire the target speech statement to be responded to; An input module is used to input the target speech statement into the encoding sub-model of the response model. The encoding sub-model is used to encode the target speech statement into a target vector. The response model also includes a decoding sub-model, which is used to output the target response statement corresponding to the target speech statement based on the target vector. The encoding sub-model and the decoding sub-model are trained based on a first training sample. The first training sample is generated based on a reference speech statement, a reference response statement, and first knowledge statements corresponding to each first word in the reference speech statement. The vectors included in the target vector correspond one-to-one with the words in the target speech statement. During the training process, the knowledge learned by the encoding sub-model and the decoding sub-model is the knowledge corresponding to the words. A vector is the representation of the word corresponding to the vector in space. The acquisition module is also used to obtain the target response statement output by the decoding sub-model in the response model.

25. A computer device, characterized in that, The computer device includes a memory and a processor; the memory stores at least one instruction, which is loaded and executed by the processor to enable the computer device to implement the method for obtaining the response model according to any one of claims 1-11 or the method for obtaining the response statement according to claim 12.

26. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores at least one instruction, which is loaded and executed by a processor to enable the computer to implement the method for obtaining a response model as described in any one of claims 1-11 or the method for obtaining a response statement as described in claim 12.

27. A computer program product comprising computer instructions, wherein when executed by a computer, the computer instructions cause the computer to implement the method for obtaining a response model as described in any one of claims 1-11 or the method for obtaining a response statement as described in claim 12.
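
For illustration only, the function values recited in claims 7 and 11 (encode, pool, map, normalize, dot product) and a loss built from them as in claims 6 and 10 can be sketched as follows. Mean pooling, a linear mapping, L2 normalization, the softmax cross-entropy loss with a temperature, and every name and number in the sketch are assumptions; the claims do not fix these choices, and the encoders that would produce the input vectors are omitted.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average a vector group into a single vector (assumed pooling choice)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def project(matrix: Sequence[Sequence[float]], vec: Sequence[float]) -> Vector:
    """Linear mapping: each row of `matrix` produces one output entry."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def l2_normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit Euclidean norm (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def function_values(word_vec: Sequence[float],
                    pos_group: Sequence[Sequence[float]],
                    neg_group: Sequence[Sequence[float]],
                    w_word: Sequence[Sequence[float]],
                    w_know: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (positive score, negative score) for one word vector."""
    q = l2_normalize(project(w_word, word_vec))
    pos = l2_normalize(project(w_know, mean_pool(pos_group)))
    neg = l2_normalize(project(w_know, mean_pool(neg_group)))
    return dot(q, pos), dot(q, neg)


def contrastive_loss(pos_score: float, neg_score: float,
                     temperature: float = 0.1) -> float:
    """Softmax cross-entropy over one positive and one negative (assumed form)."""
    logits = [pos_score / temperature, neg_score / temperature]
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# Toy inputs (hypothetical): a 3-dimensional word vector and 2-dimensional
# knowledge token vectors; the mappings bring both to the same length (2).
word_vec = [1.0, 0.0, 1.0]
pos_group = [[1.0, 0.0], [1.0, 0.2]]
neg_group = [[0.0, 1.0], [-0.2, 1.0]]
w_word = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
w_know = [[1.0, 0.0], [0.0, 1.0]]

pos_score, neg_score = function_values(word_vec, pos_group, neg_group, w_word, w_know)
loss = contrastive_loss(pos_score, neg_score)
```

With these toy inputs the positive score exceeds the negative score, so the loss is close to zero; swapping the two scores yields a larger loss, which is the signal that would push a word vector toward its own knowledge statement and away from unrelated ones.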