Dialogue evaluation model processing method and device, computer device, and storage medium
By training the model using pseudo-dialogue quality and predicted dialogue quality obtained from positive and negative dialogue samples, the problem of the influence of the annotator's subjective consciousness is solved, and the training effect of the dialogue evaluation model is improved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA PING AN LIFE INSURANCE CO LTD
- Filing Date
- 2022-10-10
- Publication Date
- 2026-06-19
AI Technical Summary
The dialogue evaluation models trained in the existing technology are affected by the subjective consciousness of the annotators, resulting in poor model training performance.
The model is trained on the pseudo-dialogue quality and predicted dialogue quality obtained from positive and negative dialogue samples, so the original dialogue data does not need to be labeled.
This improves the model's learning performance, avoids the influence of annotators' subjective opinions on model training, and enhances the accuracy of the model's dialogue quality assessment.
Figure CN115545045B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of artificial intelligence technology, and in particular to a dialogue evaluation model processing method, apparatus, computer device, and storage medium.
Background Technology
[0002] Using dialogue evaluation models to assess the dialogue quality of dialogue data is a common approach.
[0003] In existing technologies, training a dialogue evaluation model requires manually annotating the original dialogue data beforehand, then using the annotated original dialogue data to train the model, and finally obtaining the dialogue evaluation model.
[0004] However, the labeling effect of the original dialogue data can be affected by the subjective consciousness of the labelers, resulting in poor training effect of the dialogue evaluation model obtained by training the model based on the labeled original dialogue data.
Summary of the Invention
[0005] This invention provides a method, apparatus, computer device, and storage medium for processing dialogue evaluation models, in order to solve the problem that the dialogue evaluation models trained in the prior art are affected by the subjective consciousness of the annotators, resulting in poor model training effects.
[0006] This invention discloses a method for processing dialogue evaluation models, the method comprising:
[0007] Based on the original dialogue data, dialogue training samples are obtained, which include positive dialogue samples and negative dialogue samples.
[0008] Based on the positive and negative dialogue samples, corresponding pseudo-dialogue quality is obtained; the pseudo-dialogue quality represents the intensity of the user's willingness to engage in dialogue.
[0009] Based on the positive dialogue samples and the negative dialogue samples, corresponding predicted dialogue quality is obtained; the predicted dialogue quality characterizes the overall dialogue quality of the positive dialogue samples or the negative dialogue samples.
[0010] Based on the positive dialogue samples and their corresponding pseudo-dialogue quality and predicted dialogue quality, and the negative dialogue samples and their corresponding pseudo-dialogue quality and predicted dialogue quality, a model is trained to obtain a dialogue model.
[0011] Optionally, in the above method, the positive dialogue sample and the negative dialogue sample each contain at least one round of dialogue, and each round of dialogue contains a corresponding system response;
[0012] The step of obtaining the corresponding pseudo-dialogue quality based on the positive dialogue samples and the negative dialogue samples includes:
[0013] Obtain the total number of dialogue rounds for the positive dialogue sample and the number of dialogue rounds corresponding to each system response;
[0014] The pseudo-dialogue quality corresponding to each system response is obtained based on the total number of dialogue rounds and the number of dialogue rounds corresponding to each system response, as well as the preset pseudo-dialogue quality calculation formula.
[0015] According to the preset assignment rules, the pseudo-dialogue quality value corresponding to the system response in each round of dialogue in the negative dialogue sample is assigned to 0; the assignment rules are the assignment rules for the pseudo-dialogue quality of the negative dialogue sample.
[0016] Optionally, in the above method, obtaining the corresponding predicted dialogue quality based on the positive dialogue samples and the negative dialogue samples includes:
[0017] The positive and negative dialogue samples are input into a pre-trained encoding model to obtain the corresponding dialogue codes.
[0018] The dialogue encoding of the positive dialogue sample and the dialogue encoding of the negative dialogue sample are respectively input into the linear layer for calculation to obtain the predicted dialogue quality corresponding to the positive dialogue sample and the predicted dialogue quality corresponding to the negative dialogue sample.
[0019] Optionally, in the above method, the function formula for the linear layer is:
[0020] $p_i = \mathrm{sigmoid}(\langle O, W \rangle + b)$
[0021] Where $p_i$ is the predicted dialogue quality, $O$ is the dialogue encoding, $W$ is the weight, and $b$ is the bias.
[0022] Optionally, in the above method, obtaining the dialogue training samples based on the original dialogue data includes:
[0023] For each piece of dialogue data in the original dialogue data, pseudo data is constructed according to the preset dialogue prediction to obtain the corresponding pseudo dialogue data.
[0024] Each dialogue data point is concatenated to obtain positive dialogue samples.
[0025] Each pseudo-dialogue data point is concatenated to obtain a negative dialogue sample.
[0026] Optionally, the above method, wherein training the model based on the positive dialogue samples and their corresponding pseudo-dialogue quality and predicted dialogue quality, and the negative dialogue samples and their corresponding pseudo-dialogue quality and predicted dialogue quality, to obtain a dialogue model, includes:
[0027] The dialogue data in the positive dialogue samples and the pseudo dialogue data in the negative dialogue samples are divided into batches.
[0028] For each batch of dialogue data and pseudo-dialogue data, the loss is calculated based on the corresponding pseudo-dialogue quality, predicted dialogue quality, and loss function formula.
[0029] The average loss is calculated based on the loss values corresponding to the dialogue data and pseudo-dialogue data in each batch.
[0030] Based on the obtained average loss, continuously update the model parameters in the training model;
[0031] Determine whether the training model has reached the preset training termination condition;
[0032] If the training model reaches the training termination condition, a dialogue model is obtained.
[0033] Optionally, the loss function formula in the above method is:
[0034]
[0035] Where $q_i$ is the pseudo-dialogue quality, $p_i$ is the predicted dialogue quality, and $i$ is the dialogue round number.
[0036] The present invention also discloses a dialogue evaluation model processing device, comprising:
[0037] The sample acquisition unit is used to obtain dialogue training samples based on the original dialogue data. The dialogue training samples include positive dialogue samples and negative dialogue samples.
[0038] The pseudo-dialogue quality acquisition unit is used to obtain the corresponding pseudo-dialogue quality based on the positive dialogue sample and the negative dialogue sample, respectively.
[0039] The predicted dialogue quality acquisition unit is used to obtain the corresponding predicted dialogue quality based on the positive dialogue samples and the negative dialogue samples, respectively.
[0040] The model training unit is used to train the model based on the positive dialogue samples and their corresponding pseudo-dialogue quality and predicted dialogue quality, and the negative dialogue samples and their corresponding pseudo-dialogue quality and predicted dialogue quality, to obtain the dialogue model.
[0041] The present invention also discloses a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement various steps as in a dialogue evaluation model processing method.
[0042] The present invention also discloses a computer-readable storage medium storing a computer program that, when executed by a processor, implements various steps of a dialogue evaluation model processing method.
[0043] The aforementioned dialogue evaluation model processing method, apparatus, computer equipment, and storage medium obtain dialogue training samples containing positive and negative dialogue samples based on the original dialogue data. Then, based on the dialogue training samples, corresponding pseudo-dialogue quality and predicted dialogue quality are obtained. Finally, the model is trained using the dialogue training samples and their corresponding pseudo-dialogue quality and predicted dialogue quality to obtain the dialogue model. It is evident that model training primarily utilizes the pseudo-dialogue quality and predicted dialogue quality corresponding to positive and negative dialogue samples, eliminating the need for annotation personnel to label the original dialogue data. This avoids the influence of annotation personnel's subjective opinions on the model's learning effect and improves the model's learning efficiency.
Attached Figure Description
[0044] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments of the present invention will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0045] Figure 1 is a flowchart of a dialogue evaluation model processing method disclosed in Embodiment 1 of the present invention;
[0046] Figure 2 is a partial flowchart of a dialogue evaluation model processing method disclosed in Embodiment 1 of the present invention;
[0047] Figure 3 is a partial flowchart of a dialogue evaluation model processing method disclosed in Embodiment 1 of the present invention;
[0048] Figure 4 is a partial flowchart of a dialogue evaluation model processing method disclosed in Embodiment 1 of the present invention;
[0049] Figure 5 is a partial flowchart of a dialogue evaluation model processing method disclosed in Embodiment 1 of the present invention;
[0050] Figure 6 is a schematic diagram of the structure of a dialogue evaluation model processing device disclosed in Embodiment 2 of the present invention;
[0051] Figure 7 is a schematic diagram of the structure of a computer device disclosed in Embodiment 3 of the present invention.
Detailed Implementation
[0052] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0053] This invention discloses a dialogue evaluation model processing method, apparatus, computer device, and storage medium. The method obtains dialogue training samples containing positive and negative dialogue samples from original dialogue data. Then, it acquires the pseudo-dialogue quality and predicted dialogue quality of the positive and negative dialogue samples, and trains a model based on these values to obtain a dialogue model. This dialogue model is used to evaluate the dialogue quality of new dialogue data. It is understood that the original dialogue data in this invention contains more than one dialogue data point. Dialogue data from the original dialogue data are used as positive dialogue samples. Corresponding negative dialogue samples are constructed based on the positive dialogue samples. Then, the pseudo-dialogue quality and predicted dialogue quality of the positive and negative dialogue samples are acquired. Finally, based on the acquired pseudo-dialogue quality and predicted dialogue quality of the positive and negative dialogue samples, the model is trained to obtain the dialogue model. As can be seen, this invention primarily trains the model based on the pseudo-dialogue quality and predicted dialogue quality corresponding to positive dialogue samples, and the pseudo-dialogue quality and predicted dialogue quality corresponding to negative dialogue samples. This eliminates the need for annotation personnel to label the original dialogue data, thus avoiding the influence of the annotation personnel's subjective opinions on the model training effect. Specific embodiments are described below.
[0054] It should be noted that the model structure used for model training in the embodiments of this invention can be any model structure that can achieve the technical effect of this invention, such as the BERT model. The model structure of the main model is not limited here.
[0055] Embodiment 1
[0056] Figure 1 shows a flowchart of a dialogue evaluation model processing method disclosed in Embodiment 1 of the present invention. The method in this embodiment mainly trains the model by calculating the pseudo-dialogue quality and predicted dialogue quality of positive dialogue samples, as well as the pseudo-dialogue quality and predicted dialogue quality of negative dialogue samples. This eliminates the need for labelers to annotate the original dialogue data, thus avoiding the influence of the labelers' subjective opinions on the model's learning effect.
[0057] Specifically, the method in this embodiment may include the following steps:
[0058] S101: Obtain dialogue training samples based on the original dialogue data.
[0059] The dialogue training samples include positive dialogue samples and negative dialogue samples.
[0060] In a specific implementation, the original dialogue data in this embodiment is dialogue data collected in normal operation, including questions asked by the user and the system responses made by the dialogue system to those questions. The original dialogue data may contain multiple dialogue data points; after processing, these dialogue data points serve as positive dialogue samples. Negative dialogue samples are derived from the same data: the content of each dialogue data point is processed, for example by replacing the original system response portion with a pre-constructed poor or incorrect system response, so that the result serves as a control sample, i.e., a negative dialogue sample, for the positive dialogue samples. Subsequent steps of dialogue model training are performed based on these positive and negative samples.
[0061] S102: Based on positive and negative dialogue samples, the corresponding pseudo-dialogue quality is obtained.
[0062] Here, the pseudo-dialogue quality represents the intensity of the user's willingness to engage in dialogue.
[0063] In this implementation, the pseudo-dialogue quality can be represented numerically. A higher pseudo-dialogue quality value indicates a stronger user's willingness to engage in dialogue, and vice versa. Alternatively, a lower pseudo-dialogue quality value indicates a stronger user's willingness to engage in dialogue, and vice versa. Positive and negative dialogue samples are processed separately. The pseudo-dialogue quality of the positive samples is calculated, and then the pseudo-dialogue quality of the negative samples is calculated. Based on this, subsequent steps of dialogue model processing are performed according to the obtained pseudo-dialogue quality.
[0064] S103: Obtain the corresponding predicted dialogue quality based on positive and negative dialogue samples.
[0065] Here, the predicted dialogue quality characterizes the overall dialogue quality of a positive or negative dialogue sample.
[0066] In this implementation, the predicted dialogue quality can be represented numerically. A higher predicted dialogue quality value indicates better overall dialogue quality for either positive or negative samples, while a lower value indicates worse overall dialogue quality. Alternatively, a lower predicted dialogue quality value indicates better overall dialogue quality for either positive or negative samples, and vice versa. Positive and negative dialogue samples are processed separately to calculate the predicted dialogue quality for both positive and negative samples. Based on this, subsequent steps in dialogue model processing are performed using the obtained predicted dialogue quality.
[0067] S104: The dialogue model is trained based on positive dialogue samples and their corresponding pseudo-dialogue quality and predicted dialogue quality, and negative dialogue samples and their corresponding pseudo-dialogue quality and predicted dialogue quality.
[0068] Positive dialogue samples, along with the pseudo-dialogue quality calculated from the positive dialogue samples and the predicted dialogue quality, and negative dialogue samples, along with the pseudo-dialogue quality calculated from the negative dialogue samples and the predicted dialogue quality, are input into the training model for model training until the model training is completed, resulting in a trained dialogue model.
[0069] Understandably, when newly collected dialogue data is input into a trained dialogue model, the model outputs the dialogue quality of that data. If the dialogue quality is poor, targeted adjustments need to be made to the system's response to improve the dialogue level of the system as much as possible.
[0070] As can be seen from the above embodiments, in the dialogue model processing method disclosed in this invention, dialogue training samples containing positive and negative dialogue samples are obtained based on the original dialogue data. Corresponding pseudo-dialogue quality and predicted dialogue quality are then obtained based on the dialogue training samples. Subsequently, the model is trained based on the dialogue training samples and the corresponding pseudo-dialogue quality and predicted dialogue quality to obtain the dialogue model. It is evident that model training is primarily performed using the pseudo-dialogue quality and predicted dialogue quality corresponding to positive and negative dialogue samples. This eliminates the need for annotation personnel to label the original dialogue data, avoiding the influence of the annotation personnel's subjective opinions on the model's learning effect and improving the model's learning efficiency.
[0071] Building on Figure 1, in a specific implementation, step S102 can be carried out through the following steps, as shown in Figure 2:
[0072] In this embodiment, the positive and negative dialogue samples each contain at least one round of dialogue, with each round including a user-asked question and a corresponding system response. This embodiment primarily processes the system response in each round of dialogue to achieve the following steps:
[0073] S201: Obtain the total number of dialogue rounds for the positive dialogue samples and the number of dialogue rounds corresponding to each system response.
[0074] In this implementation, the positive dialogue sample in this embodiment contains at least one round of dialogue. That is, a positive dialogue sample can contain one or more rounds of dialogue, and each round includes a system response. Each system response in a round can be one or more sentences. For example, if a positive dialogue sample contains 10 rounds of dialogue, then the total number of rounds in the positive dialogue sample is 10. The number of rounds corresponding to the system response in the first round is 1, and the number of rounds corresponding to the system response in the tenth round is 10. In other words, one round of dialogue is completed when the user asks a question and the system responds once. The total number of rounds of dialogue completed after the user finishes interacting with the system is the total number of rounds of dialogue.
[0075] S202: Based on the total number of dialogue rounds and the number of dialogue rounds corresponding to each system response, as well as the preset pseudo-dialogue quality calculation formula, the pseudo-dialogue quality corresponding to each system response is obtained.
[0076] The pseudo-dialogue quality is calculated by inputting the number of dialogue rounds corresponding to the system response in the first round of dialogue in the positive dialogue sample and the total number of dialogue rounds into a preset pseudo-dialogue quality calculation formula. Then, the pseudo-dialogue quality is calculated by inputting the number of dialogue rounds corresponding to the system response in the second round of dialogue in the positive dialogue sample and the total number of dialogue rounds into the preset pseudo-dialogue quality calculation formula. This process is repeated to calculate the pseudo-dialogue quality corresponding to the system response in each round of dialogue in the positive dialogue sample.
[0077] In its implementation, the pseudo-dialogue quality calculation formula in this embodiment needs to satisfy the characteristic that the pseudo-dialogue quality increases or decreases with the increase of the number of dialogue rounds, thereby characterizing the user's level of dialogue willingness through the numerical value of the pseudo-dialogue quality. One such pseudo-dialogue quality calculation formula in this embodiment is shown below:
[0078]
[0079] Where n is the total number of dialogue rounds, and i is the number of dialogue rounds corresponding to the system's response.
[0080] For example, for a positive dialogue sample with 10 rounds of dialogue, the formula gives one pseudo-dialogue quality value for the system response in the first round and another for the system response in the second round. Comparing the two values shows that the customer's willingness to engage in dialogue is stronger in the first round of the positive dialogue sample than in the second round.
[0081] As a further example, take positive dialogue sample A with 10 rounds of dialogue and positive dialogue sample B with 5 rounds of dialogue. The formula gives the pseudo-dialogue quality of the system response in the first round of sample A and of the system response in the first round of sample B. Comparing the two values shows that the customer's willingness to engage in dialogue in the first round of sample A is stronger than in the first round of sample B.
[0082] S203: According to the preset assignment rules, assign the value of the pseudo-dialogue quality corresponding to the system response in each round of dialogue in the negative dialogue sample to 0.
[0083] The assignment rule is a rule for assigning values to the pseudo-dialogue quality of negative dialogue samples.
[0084] In this implementation, the negative dialogue samples are derived from the positive dialogue samples and serve as control samples. The system responses in each round of dialogue in the negative dialogue samples can be understood as poor or incorrect system responses. To help the model distinguish positive and negative dialogue samples during training, the pseudo-dialogue quality of the system response in each round of dialogue in the negative dialogue samples is not calculated; it is directly assigned a value of 0 according to the preset pseudo-dialogue quality assignment rule. In this way, the pseudo-dialogue quality of the system response in each round of dialogue in the negative dialogue samples is obtained.
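Steps S201 to S203 can be sketched in Python as follows. The preset formula itself is not reproduced in this text, so the sketch assumes a hypothetical stand-in, q_i = (n - i) / n, chosen only because it decreases with the round number i and increases with the total number of rounds n, which matches the two comparisons described above. The function names are illustrative, not taken from the patent.

```python
def pseudo_quality_positive(total_rounds: int) -> list[float]:
    """Pseudo-dialogue quality for each system response of a positive sample.

    ASSUMPTION: q_i = (n - i) / n stands in for the preset formula, which is
    not given in the text. i is the 1-based round number, n the total rounds.
    """
    if total_rounds < 1:
        raise ValueError("a dialogue sample contains at least one round")
    n = total_rounds
    return [(n - i) / n for i in range(1, n + 1)]


def pseudo_quality_negative(total_rounds: int) -> list[float]:
    """Assignment rule of S203: every response in a negative sample gets 0."""
    if total_rounds < 1:
        raise ValueError("a dialogue sample contains at least one round")
    return [0.0] * total_rounds


sample_a = pseudo_quality_positive(10)  # positive sample with 10 rounds
sample_b = pseudo_quality_positive(5)   # positive sample with 5 rounds
print(sample_a[0] > sample_a[1])        # round 1 vs round 2 of sample A
print(sample_a[0] > sample_b[0])        # round 1 of A vs round 1 of B
```

With this stand-in formula the last round of a positive sample also receives 0, the same value as a negative sample; a real formula may avoid that overlap.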
[0085] Building on Figure 1, in a specific implementation, step S103 can be carried out through the following steps, as shown in Figure 3:
[0086] S301: Input the positive and negative dialogue samples into the pre-trained encoding model to obtain the corresponding dialogue codes.
[0087] Positive dialogue samples are input into a pre-trained encoding model to obtain the corresponding dialogue code, and negative dialogue samples are input into a pre-trained encoding model to obtain the corresponding dialogue code.
[0088] In this implementation, the encoding model can be any model capable of performing encoding functions, such as the BERT model. After pre-training the encoding model, it can automatically distinguish whether the input dialogue sample is a positive or negative dialogue sample, and then encode the positive or negative dialogue sample to obtain the dialogue code. Furthermore, the dialogue code in this embodiment is a two-dimensional vector.
[0089] S302: Input the dialogue encoding of the positive dialogue sample and the dialogue encoding of the negative dialogue sample into the linear layer for calculation, and obtain the predicted dialogue quality corresponding to the positive dialogue sample and the predicted dialogue quality corresponding to the negative dialogue sample, respectively.
[0090] The dialogue encoding of positive dialogue samples is input into a linear layer for calculation to obtain the predicted dialogue quality corresponding to the positive dialogue samples. Similarly, the dialogue encoding of negative dialogue samples is input into a linear layer for calculation to obtain the predicted dialogue quality corresponding to the negative dialogue samples.
[0091] In this specific implementation, the obtained two-dimensional vector representing the dialogue encoding is input into the function formula of the linear layer to calculate the predicted dialogue quality for positive and negative dialogue samples. Based on this, subsequent steps of model training are performed according to the calculated predicted dialogue quality.
[0092] In the specific implementation of Figure 3, the function formula for the linear layer in this embodiment can be as follows:
[0093] $p_i = \mathrm{sigmoid}(\langle O, W \rangle + b)$
[0094] Where $p_i$ is the predicted dialogue quality, $O$ is the dialogue encoding, $W$ is the weight, and $b$ is the bias.
[0095] In the specific implementation, the predicted dialogue quality in this application is the dialogue quality of a complete dialogue sample, such as the predicted dialogue quality of a positive dialogue sample or of a negative dialogue sample. The predicted dialogue quality calculated with the linear-layer formula is used in the subsequent model training steps.
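The linear-layer formula can be illustrated with a minimal Python sketch. It assumes the dialogue encoding O is available as a flat list of numbers and that W has the same length; the numeric values below are made up for illustration, and in practice O would come from the pre-trained encoding model.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_quality(encoding: list[float], weights: list[float], bias: float) -> float:
    """p = sigmoid(<O, W> + b) for one dialogue encoding O."""
    if len(encoding) != len(weights):
        raise ValueError("encoding and weights must have the same length")
    inner = sum(o * w for o, w in zip(encoding, weights))
    return sigmoid(inner + bias)


# Illustrative values only (not from the patent).
O = [0.5, -1.0]
W = [2.0, 1.0]
b = 0.0
print(predict_quality(O, W, b))  # inner product is 0, so this prints 0.5
```

Because the sigmoid maps any real number into the interval (0, 1), the predicted dialogue quality lives on the same scale as a pseudo-dialogue quality that ranges between 0 and 1.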
[0096] Building on Figure 1, in a specific implementation, step S101 can be carried out through the following steps, as shown in Figure 4:
[0097] S401: For each piece of dialogue data in the original dialogue data, construct pseudo data according to the preset dialogue prediction to obtain the corresponding pseudo dialogue data.
[0098] The original dialogue data contains at least one dialogue data. Each dialogue data can have a different total number of dialogue rounds. For example, some dialogue data has 10 dialogue rounds, while some dialogue data only has 5 dialogue rounds, and so on. There is a system response in each dialogue round. Pseudo-data is constructed for these system responses.
[0099] In this specific implementation, the dialogue prediction in this embodiment is a prediction library composed of pre-constructed poor system responses or pre-constructed incorrect system responses. During pseudo-data construction, for each dialogue data point and each round of dialogue, a system response is randomly selected from the prediction library to replace the original system response in each round of dialogue. This process is repeated, replacing the system responses in each dialogue data point in the original dialogue data with system responses from the prediction library. This completes the construction of the pseudo-dialogue data.
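The replacement procedure of S401 can be sketched as follows. The representation of a dialogue as a list of (question, response) pairs, the function name, and the example sentences are all assumptions made for illustration; the text only specifies that each original system response is replaced by one drawn at random from the pre-constructed library.

```python
import random


def build_pseudo_dialogue(dialogue, response_library, rng=None):
    """Return a copy of `dialogue` with every system response replaced.

    `dialogue` is a list of (user_question, system_response) pairs, one per
    round. Each response is swapped for a random entry of `response_library`,
    the pre-constructed library of poor or incorrect responses.
    """
    if not response_library:
        raise ValueError("the response library must not be empty")
    rng = rng or random.Random()
    return [(question, rng.choice(response_library)) for question, _ in dialogue]


# Made-up example data.
dialogue = [
    ("How do I file a claim?", "You can file it in the app."),
    ("How long does it take?", "Usually three working days."),
]
bad_responses = ["I don't know.", "Please repeat that.", "The weather is nice today."]

pseudo = build_pseudo_dialogue(dialogue, bad_responses, random.Random(0))
for question, response in pseudo:
    print(question, "->", response)
```

The user questions are left untouched, so the pseudo dialogue has the same number of rounds as the dialogue it was built from.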
[0100] S402: Concatenate each dialogue data to obtain positive dialogue samples;
[0101] In this implementation, each dialogue data item can be concatenated using serialization. After concatenation, a token sequence is obtained. Then, each dialogue data item in the original dialogue data is serialized separately to obtain a positive dialogue sample. One way to concatenate dialogue data into a token sequence is shown below:
[0102] For example:
[0103] $D = [u_1, r_1, \ldots, u_i, r_i]$
[0104] Where $D$ represents a dialogue data point, $u_1$ is the user's question in the first round of dialogue, $r_1$ is the system's response in the first round of dialogue, $u_i$ is the user's question in the i-th round of dialogue, and $r_i$ is the system's response in the i-th round of dialogue.
[0105] After serialization of the dialogue data D, a token sequence is obtained, which is the positive sample of the dialogue, as shown below:
[0106] [cls],[u1],[sep],[r1],[sep]……[ri],[sep]
[0107] [cls] is the dialogue code, which is a two-dimensional vector.
[0108] S403: Concatenate each pseudo-dialogue data to obtain a negative dialogue sample.
[0109] In a specific implementation, this embodiment can concatenate each pseudo-dialogue data in a serialization manner. After concatenating the pseudo-dialogue data, a token sequence is obtained. Serialization processing is then performed on each pseudo-dialogue data separately to obtain the negative dialogue sample.
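The serialization used in S402 and S403 can be sketched as below. This is a simplified illustration that keeps each utterance as a single item; an actual encoder such as BERT would further split utterances into subword tokens. The function name is an assumption.

```python
CLS, SEP = "[cls]", "[sep]"


def serialize(dialogue: list[tuple[str, str]]) -> list[str]:
    """Concatenate one dialogue into the sequence [cls], u1, [sep], r1, [sep], ...

    `dialogue` is a list of (user_question, system_response) pairs. The same
    function applies to original dialogue data (positive sample) and to
    pseudo-dialogue data (negative sample).
    """
    tokens = [CLS]
    for question, response in dialogue:
        tokens += [question, SEP, response, SEP]
    return tokens


print(serialize([("u1", "r1"), ("u2", "r2")]))
# ['[cls]', 'u1', '[sep]', 'r1', '[sep]', 'u2', '[sep]', 'r2', '[sep]']
```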
[0110] Building on Figure 4, in a specific implementation, step S104 can be carried out in the following way, as shown in Figure 5:
[0111] S501: Divide the dialogue data in the positive dialogue samples and the pseudo-dialogue data in the negative dialogue samples into batches.
[0112] Positive dialogue samples and negative dialogue samples each contain the same number of dialogue data and pseudo-dialogue data. The dialogue data in the positive dialogue samples and the pseudo-dialogue data in the negative dialogue samples are divided into batches. This batching can be done in several ways, as shown below:
[0113] In the first implementation, each batch contains the same number of dialogue data and pseudo-dialogue data. That is, in the same divided batch, if there are ten dialogue data, there will be ten pseudo-dialogue data.
[0114] In the second implementation, each batch contains at least one dialogue data and at least one pseudo dialogue data.
[0115] In the third implementation, each batch contains only dialogue data or only pseudo-dialogue data.
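As an illustration of the first batching option, the sketch below pairs the same number of dialogue data and pseudo-dialogue data in every batch. The function name, the `per_class` parameter, and the boolean flag marking positive items are assumptions introduced for the example.

```python
def make_balanced_batches(positives, negatives, per_class):
    """Split samples into batches holding equally many positives and negatives.

    Each batch is a list of (sample, is_positive) pairs. The final batch may be
    smaller, but it still contains the same number of each kind.
    """
    if len(positives) != len(negatives):
        raise ValueError("expected the same number of positive and negative samples")
    if per_class < 1:
        raise ValueError("per_class must be at least 1")
    batches = []
    for start in range(0, len(positives), per_class):
        batch = [(d, True) for d in positives[start:start + per_class]]
        batch += [(d, False) for d in negatives[start:start + per_class]]
        batches.append(batch)
    return batches


batches = make_balanced_batches(["d1", "d2", "d3"], ["p1", "p2", "p3"], per_class=2)
for batch in batches:
    print(batch)
```

The second and third options differ only in how items are assigned to batches; the per-item loss computation that follows is the same.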
[0116] S502: For each batch of dialogue data and pseudo-dialogue data, calculate the loss based on the corresponding pseudo-dialogue quality, predicted dialogue quality, and loss function formula.
[0117] For each batch of dialogue data and/or pseudo-dialogue data, the loss for each piece of dialogue data and/or pseudo-dialogue data is calculated based on the corresponding pseudo-dialogue quality and predicted dialogue quality, as well as the loss function formula. This loss function formula is derived from the mean squared error (MSE) function.
[0118] S503: Calculate the average loss based on the loss values corresponding to the dialogue data and pseudo-dialogue data in each batch.
[0119] For the loss values corresponding to dialogue data and/or pseudo-dialogue data in each batch, the arithmetic mean is calculated, which is the average loss value.
[0120] In this specific implementation, when calculating the average loss, the sum of the loss values of the dialogue data and/or pseudo-dialogue data in the batch is divided by the total number of dialogue data and/or pseudo-dialogue data entries in the batch. The subsequent steps of model training are then performed based on the obtained average loss value.
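The per-item loss and the batch average can be sketched as follows. Because the text only states that the loss function is derived from MSE, this sketch assumes a plain squared error per item; the function names are hypothetical.

```python
from typing import Sequence


def squared_error(pseudo_quality: float, predicted_quality: float) -> float:
    """Per-item loss, assuming plain squared error (an assumption, see above)."""
    return (pseudo_quality - predicted_quality) ** 2


def average_batch_loss(
    pseudo_qualities: Sequence[float],
    predicted_qualities: Sequence[float],
) -> float:
    """Sum of the per-item losses divided by the number of items in the batch."""
    if len(pseudo_qualities) != len(predicted_qualities):
        raise ValueError("each prediction needs exactly one pseudo-quality target")
    if not pseudo_qualities:
        raise ValueError("a batch must contain at least one item")
    losses = [
        squared_error(q, p) for q, p in zip(pseudo_qualities, predicted_qualities)
    ]
    return sum(losses) / len(losses)


# One positive item (target 1.0) and one negative item (target 0.0).
batch_loss = average_batch_loss([1.0, 0.0], [0.8, 0.3])
```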
[0121] S504: Continuously update the model parameters in the training model based on the average loss obtained.
[0122] In this specific implementation, the gradient of the average loss with respect to each model parameter is calculated. The model parameters are then updated along the direction in which the gradient decreases the loss, so that the model loss value is reduced and the model parameters in the training model are updated.
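One parameter update of this kind can be sketched as below. The sketch makes several assumptions that the text does not state: plain gradient descent with a fixed `learning_rate`, a squared-error loss, and a trainable sigmoid linear layer on top of fixed dialogue encodings. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def predict(encoding: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Sigmoid of the inner product of encoding and weights, plus bias."""
    z = sum(o * w for o, w in zip(encoding, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def mean_squared_error(encodings, targets, weights, bias) -> float:
    """Average squared error between predictions and pseudo-quality targets."""
    errors = [(predict(e, weights, bias) - q) ** 2 for e, q in zip(encodings, targets)]
    return sum(errors) / len(errors)


def gradient_step(
    encodings: Sequence[Sequence[float]],
    targets: Sequence[float],
    weights: Sequence[float],
    bias: float,
    learning_rate: float = 0.5,
) -> Tuple[List[float], float]:
    """One gradient-descent update of the weights and bias on the average loss."""
    n = len(encodings)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for encoding, q in zip(encodings, targets):
        p = predict(encoding, weights, bias)
        # Derivative of (p - q)^2 w.r.t. the pre-activation z, with p = sigmoid(z).
        delta = 2.0 * (p - q) * p * (1.0 - p)
        for j, o in enumerate(encoding):
            grad_w[j] += delta * o / n
        grad_b += delta / n
    new_weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    return new_weights, bias - learning_rate * grad_b
```

Calling `gradient_step` repeatedly, and recomputing `mean_squared_error` after each call, gives the update loop that step S505 then checks for termination.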
[0123] S505: Determine whether the training model has reached the preset training termination condition.
[0124] The model parameters in the training model are continuously updated, and it is continuously judged whether the training model has reached the preset training termination condition. If the training model has reached the preset training termination condition, step S506 is executed, that is, the dialogue model is obtained; if the training model has not reached the preset training termination condition, the process returns to step S504, that is, the model parameters in the training model are updated again.
[0125] In a specific implementation, the training termination conditions in this embodiment include the following:
[0126] The first implementation method terminates training when there are no positive or negative dialogue samples available for model training.
[0127] When all the positive and negative dialogue samples required for model training have been trained and there are no more samples available for training, the training termination condition is met, and training ends.
[0128] The second implementation method terminates training when a preset number of iterations is reached.
[0129] For a model structure used for training, a relatively accurate training model can be obtained after a certain number of training iterations. When the preset number of iterations is reached, the training termination condition is met, and training ends.
[0130] The third implementation method terminates training when the model parameters no longer change.
[0131] During model training, the model parameters are constantly changing. When the model parameters no longer change, it can be considered that a good training model has been obtained, and training ends at this point.
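The three termination conditions can be sketched as a single check. The text presents them as alternative implementations; combining them with a logical OR, and treating "no longer change" as a change no larger than a small `tolerance`, are assumptions made for this illustration. The function name `should_stop` is hypothetical.

```python
from typing import Sequence


def should_stop(
    remaining_samples: int,
    iteration: int,
    max_iterations: int,
    previous_params: Sequence[float],
    current_params: Sequence[float],
    tolerance: float = 1e-8,
) -> bool:
    """Return True when any of the three termination conditions holds."""
    # Condition 1: no positive or negative dialogue samples are left to train on.
    if remaining_samples == 0:
        return True
    # Condition 2: the preset number of iterations has been reached.
    if iteration >= max_iterations:
        return True
    # Condition 3: the model parameters no longer change (within a tolerance).
    largest_change = max(
        (abs(c - p) for p, c in zip(previous_params, current_params)),
        default=0.0,
    )
    return largest_change <= tolerance
```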
[0132] S506: Obtain the dialogue model.
[0133] Once the preset training termination conditions are met, a good dialogue model is considered to have been obtained. At this point, newly collected dialogue data is input into the dialogue model, and the dialogue model will output the dialogue quality of the data. If the dialogue quality of the data is poor, targeted adjustments need to be made to the system response to improve the dialogue level of the system as much as possible.
[0134] In one implementation, the formula for the loss function in this embodiment is as follows:
[0135]
[0136] where $q_i$ is the pseudo-dialogue quality, $p_i$ is the predicted dialogue quality, and $i$ is the dialogue round number.
[0137] Example 2
[0138] Figure 6 is a schematic diagram of a dialogue evaluation model processing device disclosed in Embodiment 2 of the present invention. The technical solution in this embodiment trains the model mainly by calculating the pseudo-dialogue quality and predicted dialogue quality of positive dialogue samples, as well as the pseudo-dialogue quality and predicted dialogue quality of negative dialogue samples. This eliminates the need for annotators to label the original dialogue data, thus avoiding the influence of the annotators' subjective consciousness on the model's learning effect.
[0139] Specifically, the device in this embodiment may include the following units:
[0140] The sample acquisition unit 601 is used to obtain dialogue training samples based on the original dialogue data, wherein the dialogue training samples include positive dialogue samples and negative dialogue samples.
[0141] The pseudo-dialogue quality acquisition unit 602 is used to obtain the corresponding pseudo-dialogue quality based on the positive dialogue samples and the negative dialogue samples, respectively.
[0142] The predicted dialogue quality acquisition unit 603 is used to obtain the corresponding predicted dialogue quality based on the positive dialogue samples and the negative dialogue samples, respectively.
[0143] The model training unit 604 is used to train the model based on positive dialogue samples and their corresponding pseudo-dialogue quality and predicted dialogue quality, and negative dialogue samples and their corresponding pseudo-dialogue quality and predicted dialogue quality, to obtain the dialogue model.
[0144] In one implementation, the positive dialogue sample and the negative dialogue sample each contain at least one round of dialogue, and each round of dialogue contains a corresponding system response. The pseudo-dialogue quality acquisition unit 602 includes the following units:
[0145] The dialogue turn acquisition unit is used to acquire the total number of dialogue turns for positive dialogue samples and the number of dialogue turns corresponding to each system response;
[0146] The pseudo-dialogue quality calculation unit is used to obtain the pseudo-dialogue quality corresponding to each system response based on the total number of dialogue rounds, the number of dialogue rounds corresponding to each system response, and the preset pseudo-dialogue quality calculation formula.
[0147] The pseudo-dialogue quality assignment unit is used to assign a value of 0 to the pseudo-dialogue quality corresponding to the system response in each round of dialogue in the negative dialogue sample, according to a preset assignment rule; the assignment rule is the pseudo-dialogue quality assignment rule for the negative dialogue sample.
[0148] In one implementation, the predicted dialogue quality acquisition unit 603 includes the following units:
[0149] The sample encoding unit is used to input positive and negative dialogue samples into the pre-trained encoding model to obtain the corresponding dialogue codes.
[0150] The linear layer computation unit is used to input the dialogue encoding of positive dialogue samples and the dialogue encoding of negative dialogue samples into the linear layer for computation, so as to obtain the predicted dialogue quality corresponding to the positive dialogue samples and the predicted dialogue quality corresponding to the negative dialogue samples, respectively.
[0151] In one implementation, the function formula for the linear layer is:
[0152] $p_i = \mathrm{sigmoid}(\langle O, W \rangle + b)$
[0153] where $p_i$ is the predicted dialogue quality, $O$ is the dialogue encoding, $W$ is the weight, and $b$ is the bias.
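The linear layer formula can be evaluated directly, as in the short sketch below. The function name `predicted_dialogue_quality` and the example vectors are hypothetical; the dialogue encoding is assumed to be a flat vector of the same length as the weights.

```python
import math
from typing import Sequence


def predicted_dialogue_quality(
    encoding: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Compute sigmoid(<O, W> + b) for one dialogue encoding O."""
    if len(encoding) != len(weights):
        raise ValueError("encoding and weights must have the same length")
    inner_product = sum(o * w for o, w in zip(encoding, weights))
    return 1.0 / (1.0 + math.exp(-(inner_product + bias)))


quality = predicted_dialogue_quality([1.0, 2.0], [0.5, -0.25], bias=0.1)
```

Because the sigmoid maps any real number into the open interval (0, 1), the predicted quality is directly comparable with pseudo-dialogue quality targets such as the value 0 assigned to negative samples.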
[0154] In one implementation, the sample acquisition unit 601 includes the following units:
[0155] The pseudo-dialogue data acquisition unit is used to construct pseudo-data for each piece of dialogue data in the original dialogue data according to the preset dialogue prediction, so as to obtain the corresponding pseudo-dialogue data.
[0156] The dialogue data splicing unit is used to splice each piece of dialogue data to obtain positive dialogue samples;
[0157] The pseudo-dialogue data splicing unit is used to splice each pseudo-dialogue data to obtain a negative dialogue sample.
[0158] In one implementation, the model training unit 604 includes the following units:
[0159] The batch partitioning unit is used to partition the dialogue data in the positive dialogue samples and the pseudo-dialogue data in the negative dialogue samples into batches.
[0160] The loss calculation unit is used to calculate the loss for each batch of dialogue data and pseudo-dialogue data based on the corresponding pseudo-dialogue quality, predicted dialogue quality, and loss function formula.
[0161] The average loss calculation unit is used to calculate the average loss based on the loss corresponding to the dialogue data and pseudo-dialogue data in each batch.
[0162] The model parameter update unit is used to continuously update the model parameters in the training model based on the obtained average loss.
[0163] The training termination condition judgment unit is used to determine whether the training model has reached the preset training termination condition;
[0164] The dialogue model acquisition unit is used to obtain a dialogue model if the training model reaches the preset training termination condition.
[0165] In one implementation, the loss function formula is:
[0166]
[0167] where $q_i$ is the pseudo-dialogue quality, $p_i$ is the predicted dialogue quality, and $i$ is the dialogue round number.
[0168] Example 3
[0169] Figure 7 is a structural schematic of a computer device disclosed in Embodiment 3 of this application. The computer device includes a processor, a memory, and a network interface connected via a system bus. The processor provides computing and control capabilities. The memory stores a computer program. The network interface is used to communicate with external devices via a network connection. When executed by the processor, the computer program implements the steps of any embodiment of the dialogue evaluation model processing method.
[0170] Example 4
[0171] Embodiment 4 of this application discloses a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of any embodiment of the dialogue evaluation model processing method described above.
[0172] Example 5
[0173] This application discloses a computer program product, which includes a computer program that, when executed by a processor, implements the steps of any embodiment of the dialogue evaluation model processing method described above.
[0174] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. This computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments of the above methods. Any references to memory, storage, databases, or other media used in the embodiments provided in this application can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
[0175] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is used as an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above.
[0176] The above-described embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.
Claims
1. A dialogue evaluation model processing method, characterized in that the method includes: obtaining dialogue training samples based on original dialogue data, the dialogue training samples including positive dialogue samples and negative dialogue samples; obtaining the corresponding pseudo-dialogue quality based on the positive dialogue samples and the negative dialogue samples, respectively, the pseudo-dialogue quality representing the strength of the user's willingness to engage in dialogue; obtaining the corresponding predicted dialogue quality based on the positive dialogue samples and the negative dialogue samples, respectively, the predicted dialogue quality characterizing the overall dialogue quality of the positive or negative dialogue samples; and training a model based on the positive dialogue samples and their corresponding pseudo-dialogue quality and predicted dialogue quality, and the negative dialogue samples and their corresponding pseudo-dialogue quality and predicted dialogue quality, to obtain a dialogue model. The positive dialogue samples and the negative dialogue samples each contain at least one round of dialogue, and each round of dialogue contains a corresponding system response. The step of obtaining the corresponding pseudo-dialogue quality based on the positive dialogue samples and the negative dialogue samples includes: obtaining the total number of dialogue rounds and the number of dialogue rounds corresponding to each system response in the positive dialogue samples; obtaining the pseudo-dialogue quality corresponding to each system response based on the total number of dialogue rounds, the number of dialogue rounds corresponding to each system response, and a preset pseudo-dialogue quality calculation formula; and assigning a value of 0 to the pseudo-dialogue quality corresponding to the system response in each round of dialogue in the negative dialogue samples according to a preset assignment rule.
The step of obtaining the corresponding predicted dialogue quality based on the positive dialogue samples and the negative dialogue samples includes: inputting the positive dialogue samples and the negative dialogue samples into a pre-trained encoding model to obtain the corresponding dialogue encodings; and inputting the dialogue encoding of the positive dialogue samples and the dialogue encoding of the negative dialogue samples into a linear layer for calculation to obtain the predicted dialogue quality corresponding to the positive dialogue samples and the predicted dialogue quality corresponding to the negative dialogue samples.
2. The method as described in claim 1, characterized in that the function formula of the linear layer is: $p_i = \mathrm{sigmoid}(\langle O, W \rangle + b)$, where $p_i$ is the predicted dialogue quality, $O$ is the dialogue encoding, $W$ is the weight, and $b$ is the bias.
3. The method as described in claim 1, characterized in that the step of obtaining dialogue training samples based on the original dialogue data includes: for each piece of dialogue data in the original dialogue data, constructing pseudo data according to the preset dialogue prediction to obtain the corresponding pseudo-dialogue data; concatenating each piece of dialogue data to obtain the positive dialogue samples; and concatenating each piece of pseudo-dialogue data to obtain the negative dialogue samples.
4. The method as described in claim 3, characterized in that the step of training a model based on the positive dialogue samples and their corresponding pseudo-dialogue quality and predicted dialogue quality, and the negative dialogue samples and their corresponding pseudo-dialogue quality and predicted dialogue quality, to obtain a dialogue model includes: dividing the dialogue data in the positive dialogue samples and the pseudo-dialogue data in the negative dialogue samples into batches; for each batch of dialogue data and pseudo-dialogue data, calculating the loss based on the corresponding pseudo-dialogue quality, predicted dialogue quality, and loss function formula; calculating the average loss based on the loss values corresponding to the dialogue data and pseudo-dialogue data in each batch; continuously updating the model parameters in the training model based on the obtained average loss; determining whether the training model has reached the preset training termination condition; and, if the training model has reached the training termination condition, obtaining the dialogue model.
5. The method as described in claim 4, characterized in that the loss function formula is as follows: where $q_i$ is the pseudo-dialogue quality, $p_i$ is the predicted dialogue quality, and $i$ is the dialogue round number.
6. A dialogue evaluation model processing apparatus for implementing the method as described in claim 1, characterized in that the apparatus includes: a sample acquisition unit, used to obtain dialogue training samples based on the original dialogue data, the dialogue training samples including positive dialogue samples and negative dialogue samples; a pseudo-dialogue quality acquisition unit, used to obtain the corresponding pseudo-dialogue quality based on the positive dialogue samples and the negative dialogue samples, respectively; a predicted dialogue quality acquisition unit, used to obtain the corresponding predicted dialogue quality based on the positive dialogue samples and the negative dialogue samples, respectively; and a model training unit, used to train the model based on the positive dialogue samples and their corresponding pseudo-dialogue quality and predicted dialogue quality, and the negative dialogue samples and their corresponding pseudo-dialogue quality and predicted dialogue quality, to obtain the dialogue model.
7. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the dialogue evaluation model processing method as described in any one of claims 1 to 5.
8. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the dialogue evaluation model processing method as described in any one of claims 1 to 5.
Citation Information
Patent Citations
Model training method and device, dialogue system evaluation method and device, equipment and storage medium
CN110188331A
Dialogue interaction method and device and electronic device
CN110543552A