Controllable text generation method and device
By constructing a training sample dataset and combining language modeling and contrastive learning, a controllable text generation model is generated, which solves the problems of universality and efficiency in the existing technology for generating controllable text, and achieves efficient and stable controllable text generation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TSINGHUA UNIVERSITY
- Filing Date
- 2023-06-01
- Publication Date
- 2026-06-26
AI Technical Summary
Existing technologies for generating controllable text language models suffer from low universality, high time complexity, and unstable response fluency. Furthermore, they require modifications to the model structure or the introduction of auxiliary models, resulting in low generation efficiency.
By constructing a training sample dataset, including negative and positive response samples, and training it using a pre-existing language model, combined with language modeling and contrastive learning, a controllable text generation model can be generated without modifying the model architecture or introducing additional auxiliary models.
It achieves greater universality, lower time complexity, and more stable response fluency. The generated text does not contain malicious behavior, and the model is easy to call and deploy.
Abstract figure: CN116881709B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of natural language processing technology, and in particular to a controllable text generation method and apparatus.
Background Art
[0002] Current language models based on deep neural networks and large-scale pre-training, such as ChatGPT and GPT-3, can generate fluent, grammatically correct text. However, they often produce behaviors that do not meet human expectations. For example, language models may generate offensive statements or express agreement with harmful content, and they may also generate text containing unnatural repetitions. This is mainly because language models lack inherent control mechanisms in their model structure and training objectives, making the text they generate uncontrollable by humans. Therefore, controlling language models to avoid undesirable behaviors in generated text—that is, controllable text generation—has become an important and challenging task in natural language generation research.
[0003] The basic idea behind controllable text generation methods is to reduce the probability that the language model generates undesirable responses (i.e., responses containing undesirable behavior, hereinafter referred to as negative examples; conversely, responses without undesirable behavior are referred to as positive examples). For example, auxiliary models can be trained on positive and negative example data and then used to adjust the generation probability of the main model. Existing methods often require modifying the structure of the language model and porting the corresponding modified code, which makes such models inconvenient to use. Furthermore, in addition to the main model, auxiliary models are needed to calculate adjustment values for the generation probability, which increases generation time and may compromise the fluency of the generated responses.
Summary of the Invention
[0004] This invention provides a controllable text generation method and apparatus to address the shortcomings of existing technologies in generating controllable text, such as low universality of language models, high time complexity, and unstable response fluency. It achieves the effect of generating controllable text with higher universality, lower time complexity, and more stable response fluency.
[0005] This invention provides a controllable text generation method, comprising:
[0006] Acquiring a task to be processed;
[0007] Inputting the task to be processed into a pre-built controllable text generation model to obtain a task response;
[0008] The controllable text generation model is obtained by training a pre-stored language model using a training sample dataset; the training sample dataset includes a dataset composed of comparison samples; the comparison samples include at least negative response samples, positive response samples, and task samples to be processed; the positive response samples are response samples that do not contain malicious behavior, and the negative response samples are response samples that contain malicious behavior.
[0009] According to the controllable text generation method provided by the present invention, training the pre-stored language model with the training sample dataset to obtain the controllable text generation model specifically includes:
[0010] Construct a training sample dataset; the training sample dataset includes a first sample dataset and a second sample dataset;
[0011] Based on the pre-stored language model, random samples are drawn from the first sample dataset and the second sample dataset to obtain a first sample and a second sample;
[0012] Calculate the first loss value based on the first sample and the first preset formula; calculate the second loss value based on the second sample and the second preset formula;
[0013] The first loss value and the second loss value are weighted and summed to obtain the training loss value;
[0014] The training loss value is optimized using the stochastic gradient descent method to obtain a controllable text generation model.
[0015] According to the controllable text generation method provided by the present invention, constructing the training sample dataset specifically includes:
[0016] S1: Obtain a pre-built input dataset and a pre-built first sample dataset; the input dataset includes a large number of task samples to be processed; the first sample dataset includes a large number of language samples, the language samples include the input to be processed and positive responses to the input to be processed;
[0017] S2: Extract a task sample to be processed from the input dataset, input the task sample to be processed into the pre-stored language model to obtain a preset number of task response samples; input the task sample to be processed and the task response samples into the pre-stored discriminant model to generate discriminant labels for the task response samples to obtain task response samples containing discriminant labels; wherein, the discriminant labels include positive examples and negative examples;
[0018] S3: Calculate the generation probability of the task response sample, and sort the task response samples according to a preset order based on the generation probability to obtain a task response sample sequence;
[0019] S4: Sequentially extract the task response samples with negative labels from the task response sample sequence, and use them as negative response samples;
[0020] S5: Obtain a positive response sample based on each negative response sample; the positive response sample is the task response sample that has a positive discrimination label and whose generation probability ranks next below that of the negative response sample;
[0021] S6: The negative response sample, the positive response sample, and the task sample to be processed constitute a comparison sample;
[0022] Repeat steps S2-S6 to obtain a large number of comparison samples. Use the large number of comparison samples to construct a second sample dataset. The first sample dataset and the second sample dataset together constitute the training sample dataset.
[0023] According to a controllable text generation method provided by the present invention, the first preset formula includes:
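[0024] $$\mathcal{L}_1 = \mathbb{E}_{(x_1, y) \sim \mathcal{D}_1}\left[-\log P_\theta(y \mid x_1)\right]$$
[0025] where $\mathcal{L}_1$ denotes the first loss value; $\mathbb{E}_{(x_1, y) \sim \mathcal{D}_1}[\cdot]$ denotes sampling data $(x_1, y)$ from $\mathcal{D}_1$ and averaging the bracketed term $-\log P_\theta(y \mid x_1)$; $\mathcal{D}_1$ denotes the first sample dataset; $P_\theta(\cdot \mid \cdot)$ denotes the generation probability; $y$ denotes the positive response to the input to be processed; and $x_1$ denotes the input to be processed;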
[0026] The second preset formula includes:
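[0027] $$\mathcal{L}_2 = \mathbb{E}_{(x_2, y^+, y^-) \sim \mathcal{D}_2}\left[\max\left(0,\ \gamma + \log P_\theta(y^- \mid x_2) - \log P_\theta(y^+ \mid x_2)\right)\right]$$
[0028] where $\mathcal{L}_2$ denotes the second loss value; $\mathbb{E}_{(x_2, y^+, y^-) \sim \mathcal{D}_2}[\cdot]$ denotes sampling data $(x_2, y^+, y^-)$ from $\mathcal{D}_2$ and averaging the bracketed term $\max(0, \gamma + \log P_\theta(y^- \mid x_2) - \log P_\theta(y^+ \mid x_2))$; $\mathcal{D}_2$ denotes the second sample dataset; $P_\theta(\cdot \mid \cdot)$ denotes the generation probability; $y^-$ denotes the negative response to the task sample to be processed; $x_2$ denotes the task sample to be processed; $y^+$ denotes the positive response to the task sample to be processed; and $\gamma$ denotes the margin (distance) hyperparameter of contrastive learning.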
[0029] According to a controllable text generation method provided by the present invention, calculating the generation probability of the task response sample specifically includes:
[0030] The generation probability is calculated using the third preset formula;
[0031] The third preset formula includes:
[0032] $$P_\theta(y^* \mid x^*) = \prod_{t=1}^{n} P_\theta\left(y^*_t \mid x^*, y^*_{0:t-1}\right)$$
[0033] where $P_\theta(y^* \mid x^*)$ denotes the generation probability; $x^*$ denotes the model input; $y^*$ denotes the model output; $n$ denotes the number of words in the model output; $y^*_t$ denotes the t-th word output by the model; and $y^*_{0:t-1}$ denotes the 0th through (t-1)th words output by the model.
[0034] According to the controllable text generation method provided by the present invention, computing the weighted sum of the first loss value and the second loss value to obtain the training loss value specifically includes:
[0035] $$\mathcal{L} = \mathcal{L}_1 + \alpha \mathcal{L}_2$$
[0036] where $\alpha$ denotes the weighting hyperparameter of the loss function; $\mathcal{L}$ denotes the training loss value; $\mathcal{L}_1$ denotes the first loss value; and $\mathcal{L}_2$ denotes the second loss value.
[0037] The present invention also provides a controllable text generation device, comprising:
[0038] an acquisition unit, configured to acquire a task to be processed;
[0039] a response unit, configured to input the task to be processed into a pre-built controllable text generation model to obtain a task response;
[0040] The controllable text generation model is obtained by training a pre-stored language model using a training sample dataset; the training sample dataset includes a dataset composed of comparison samples; the comparison samples include at least negative response samples, positive response samples, and task samples to be processed; the positive response samples are response samples that do not contain malicious behavior, and the negative response samples are response samples that contain malicious behavior.
[0041] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the controllable text generation method as described above.
[0042] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the controllable text generation method as described above.
[0043] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the controllable text generation method as described above.
[0044] This invention provides a controllable text generation method and apparatus. The method involves acquiring a task to be processed and inputting the task into a pre-built controllable text generation model to obtain a task response. The controllable text generation model is trained from a pre-stored language model using a training sample dataset. The training sample dataset includes a dataset composed of comparison samples, and the comparison samples include at least negative response samples, positive response samples, and task samples to be processed. The positive response samples are those that do not contain inappropriate behavior, and the negative response samples are those that do. This invention does not require modifying the model architecture or introducing additional auxiliary models: it uses the training sample dataset to train language modeling and contrastive learning simultaneously, yielding a controllable text generation model. The trained model is ready to use out of the box and exhibits higher universality, lower time complexity, and more stable response fluency.
Brief Description of the Drawings
[0045] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0046] Figure 1 is the first flowchart of the controllable text generation method provided by the present invention;
[0047] Figure 2 is the second flowchart of the controllable text generation method provided by the present invention;
[0048] Figure 3 is a schematic structural diagram of the controllable text generation device provided by the present invention;
[0049] Figure 4 is a schematic structural diagram of the electronic device provided by the present invention.
[0050] Figure label:
[0051] 310: Acquisition unit; 320: Response unit;
[0052] 410: Processor; 420: Communication interface; 430: Memory; 440: Communication bus.
Detailed Description of the Embodiments
[0053] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0054] The present invention is described below with reference to Figures 1-2. Figure 1 is the first flowchart of the controllable text generation method provided by the present invention. As shown in Figure 1, the method includes the following steps:
[0055] Step 110: Obtain the task to be processed. The task to be processed can be user input, which may include text sequences.
[0056] Step 120: Input the task to be processed into a pre-built controllable text generation model to obtain a task response;
[0057] The controllable text generation model is obtained by training a pre-stored language model using a training sample dataset; the training sample dataset includes a dataset composed of comparison samples; the comparison samples include at least negative response samples, positive response samples, and task samples to be processed; the positive response samples are response samples that do not contain malicious behavior, and the negative response samples are response samples that contain malicious behavior.
[0058] The acquired task is input into a trained controllable text generation model, which then outputs a task response based on the task. The task response is controllable text, which is text that does not contain inappropriate information or behavior.
[0059] The objectives of this invention are twofold: first, to control the language model to avoid generating responses with inappropriate characteristics (such as offensive statements, unnatural repetitions, etc.) while maintaining the ability to generate fluent and normal responses; and second, to avoid modifying the model architecture or introducing additional auxiliary models, thereby avoiding additional computational overhead during the generation phase.
[0060] The controllable text generation model is trained from a pre-stored language model using a training sample dataset. The pre-stored language model is a parameterized language model $P_\theta$: given a user input $x$, it can generate a response $y$ and model the probability $P_\theta(y \mid x)$ of generating that response. In some embodiments, the pre-stored language model is the BlenderBot model open-sourced by Meta AI; in other embodiments, it is the GPT-2 model open-sourced by OpenAI. The training sample dataset includes two datasets, denoted the first sample dataset and the second sample dataset. The first sample dataset consists of language model input-response pairs $(x, y)$ and can be denoted $\mathcal{D}_1 = \{(x, y)\}$. Note that every response $y$ in the first sample dataset is a positive response, i.e., one that does not contain offensive behavior or information. The second sample dataset consists of triples of a language model input, a positive response, and a negative response, $(x, y^+, y^-)$, and can be denoted $\mathcal{D}_2 = \{(x, y^+, y^-)\}$. As above, a positive response contains no inappropriate behavior or information, while a negative response does.
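For illustration only, the two kinds of training records can be represented as simple Python data classes. The class and field names below are hypothetical and not taken from the source; the negative example uses unnatural repetition, one of the undesirable behaviors mentioned in the background.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageSample:
    """One (x, y) pair of the first sample dataset (hypothetical names)."""
    x: str  # language model input
    y: str  # positive response: no inappropriate behavior or information


@dataclass(frozen=True)
class ComparisonSample:
    """One (x, y+, y-) triple of the second sample dataset (hypothetical names)."""
    x: str      # task sample to be processed
    y_pos: str  # positive response y+
    y_neg: str  # negative response y- (here: unnatural repetition)


first_dataset = [LanguageSample("How was your weekend?", "Pretty relaxing, I went hiking.")]
second_dataset = [
    ComparisonSample(
        "What do you like to eat?",
        "I enjoy noodles, especially in winter.",
        "I like noodles. I like noodles. I like noodles.",
    )
]
```

Keeping the two record types separate mirrors the fact that the first dataset feeds only the language modeling loss and the second feeds only the contrastive loss.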
[0061] The first sample dataset is pre-constructed, whereas the second sample dataset is constructed from the pre-constructed input dataset, the pre-stored language model, and a discriminative model. Specifically, for each input $x$ in the given input dataset, the pre-stored language model $P_\theta$ generates multiple responses $y$; the discriminative model judges these responses, and the responses that meet the conditions are retained, thereby constructing the second sample dataset.
[0062] In some embodiments, the first sample dataset can be a publicly available dataset, such as the Bot-Adversarial Dialogue dataset from Meta AI or the WikiText dataset. In other embodiments, the first sample dataset can be constructed according to the language modeling objective, following the construction method of the second sample dataset, i.e., from a pre-constructed input dataset, a pre-stored language model, and a discriminative model. The pre-constructed input dataset can likewise be a publicly available dataset, such as the Bot-Adversarial Dialogue dataset from Meta AI or the WikiText dataset. In other embodiments, the input dataset can also be constructed according to the contrastive learning training objective, for example, by manually selecting the inputs.
[0063] By using the constructed training sample dataset, a controllable text generation model is obtained by simultaneously performing language modeling training and contrastive learning training based on a pre-stored language model.
[0064] The controllable text generation method provided by this invention trains the pre-stored language model $P_\theta$ so that the responses it generates are judged as positive examples by the pre-stored discriminative model $c$. Compared with existing controllable text generation methods, this invention does not require modifying the model architecture or introducing additional auxiliary models. This avoids additional computational overhead during the generation stage and allows the trained model to be used out of the box. Furthermore, experiments show that this invention, while maintaining the language model's ability to generate fluent and coherent responses, achieves better control and better prevents the language model from generating responses with undesirable characteristics.
[0065] Based on the above embodiments, in this method, training the pre-stored language model with the training sample dataset to obtain the controllable text generation model specifically includes:
[0066] Construct a training sample dataset; the training sample dataset includes a first sample dataset and a second sample dataset;
[0067] Based on the pre-stored language model, random samples are drawn from the first sample dataset and the second sample dataset to obtain a first sample and a second sample;
[0068] Calculate the first loss value based on the first sample and the first preset formula; calculate the second loss value based on the second sample and the second preset formula;
[0069] The first loss value and the second loss value are weighted and summed to obtain the training loss value;
[0070] The training loss value is optimized using the stochastic gradient descent method to obtain a controllable text generation model.
[0071] Specifically, the training steps are as follows:
[0072] First, a training sample dataset is constructed; the training sample dataset includes a first sample dataset and a second sample dataset.
[0073] Using the constructed training sample dataset, the pre-stored language model $P_\theta$ is trained on the first sample dataset $\mathcal{D}_1$ with a language modeling objective, whose loss can be expressed by the first preset formula:
[0074] $$\mathcal{L}_1 = \mathbb{E}_{(x_1, y) \sim \mathcal{D}_1}\left[-\log P_\theta(y \mid x_1)\right]$$
[0075] where $\mathcal{L}_1$ denotes the first loss value; $\mathbb{E}_{(x_1, y) \sim \mathcal{D}_1}[\cdot]$ denotes sampling data $(x_1, y)$ from $\mathcal{D}_1$ and averaging the bracketed term $-\log P_\theta(y \mid x_1)$; $\mathcal{D}_1$ denotes the first sample dataset; $P_\theta(\cdot \mid \cdot)$ denotes the generation probability; $y$ denotes the positive response to the input to be processed; and $x_1$ denotes the input to be processed, i.e., the input $x$ of the first sample dataset.
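As a minimal sketch of the first loss value, assume the language model has already produced the log-probability of each word of every positive response (how those numbers are obtained from the model is outside this sketch). The loss is then the batch mean of $-\log P_\theta(y \mid x_1)$, where the sequence log-probability is the sum of the word log-probabilities. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def sequence_log_prob(token_log_probs: Sequence[float]) -> float:
    """log P_theta(y | x1): sum of the log-probabilities of the words of y."""
    return sum(token_log_probs)


def lm_loss(batch: List[Sequence[float]]) -> float:
    """First loss value: mean of -log P_theta(y | x1) over a sampled batch.

    Each batch element holds the per-word log-probabilities that the language
    model assigns to one positive response y given its input x1 (assumed to be
    computed elsewhere, e.g. by a forward pass of the model).
    """
    return sum(-sequence_log_prob(lp) for lp in batch) / len(batch)


# Two toy samples: P(y|x1) = 0.5 * 0.5 = 0.25 and P(y|x1) = 0.25.
batch = [[math.log(0.5), math.log(0.5)], [math.log(0.25)]]
print(lm_loss(batch))  # -log(0.25) = log(4), about 1.386
```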
[0076] The pre-stored language model $P_\theta$ is also trained on the second sample dataset $\mathcal{D}_2$ with a contrastive learning objective, as shown in Figure 2; its loss function can be expressed by the second preset formula:
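[0077] $$\mathcal{L}_2 = \mathbb{E}_{(x_2, y^+, y^-) \sim \mathcal{D}_2}\left[\max\left(0,\ \gamma + \log P_\theta(y^- \mid x_2) - \log P_\theta(y^+ \mid x_2)\right)\right]$$
[0078] where $\mathcal{L}_2$ denotes the second loss value; $\mathbb{E}_{(x_2, y^+, y^-) \sim \mathcal{D}_2}[\cdot]$ denotes sampling data $(x_2, y^+, y^-)$ from $\mathcal{D}_2$ and averaging the bracketed term $\max(0, \gamma + \log P_\theta(y^- \mid x_2) - \log P_\theta(y^+ \mid x_2))$; $\mathcal{D}_2$ denotes the second sample dataset; $P_\theta(\cdot \mid \cdot)$ denotes the generation probability; $y^-$ denotes the negative response to the task sample to be processed; $x_2$ denotes the task sample to be processed; $y^+$ denotes the positive response to the task sample to be processed; and $\gamma$ denotes the margin (distance) hyperparameter of contrastive learning.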
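The second loss value is a hinge (max-margin) loss on sequence log-probabilities. The sketch below assumes the log-probabilities of $y^+$ and $y^-$ have already been computed by the language model; the function name is hypothetical. It shows the key property of the margin $\gamma$: a comparison sample whose positive response already beats the negative one by more than $\gamma$ contributes nothing.

```python
from typing import List, Tuple


def contrastive_loss(batch: List[Tuple[float, float]], gamma: float) -> float:
    """Second loss value: mean of max(0, gamma + log P(y-|x2) - log P(y+|x2)).

    Each batch element is (log_p_pos, log_p_neg): the sequence log-probabilities
    of the positive and negative responses for one comparison sample
    (assumed to be computed by the language model beforehand).
    """
    terms = [max(0.0, gamma + log_p_neg - log_p_pos) for log_p_pos, log_p_neg in batch]
    return sum(terms) / len(terms)


batch = [
    (-2.0, -5.0),  # y+ already beats y- by 3 > gamma: contributes 0
    (-4.0, -3.0),  # y- is more likely than y+: contributes 1 + (-3) - (-4) = 2
]
print(contrastive_loss(batch, gamma=1.0))  # (0 + 2) / 2 = 1.0
```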
[0079] The goal of language modeling is to learn natural language text written by humans; the goal of contrastive learning is to learn more meaningful latent language representations by contrasting the generation probabilities of positive and negative examples.
[0080] The training loss value is optimized using stochastic gradient descent to obtain the controllable text generation model. Note that the language modeling objective and the contrastive learning objective are trained simultaneously. Specifically, in each training step, random samples are drawn from the first sample dataset $\mathcal{D}_1$ and the second sample dataset $\mathcal{D}_2$ to obtain a first sample and a second sample. The number of samples drawn is also random; that is, the number of language samples in the first sample and the number of comparison samples in the second sample are both random. In some embodiments, this number may be 64 or 128. The first loss value and the second loss value are then calculated from the first sample, the second sample, and the corresponding loss functions. Specifically, for each language sample $(x_1, y)$ in the first sample, the generation probability is obtained from the pre-stored language model, and $-\log P_\theta(y \mid x_1)$ is calculated and denoted $\ell_1$; the average of $\ell_1$ over all language samples in the first sample is the first loss value. Similarly, for each comparison sample $(x_2, y^+, y^-)$ in the second sample, the generation probabilities of $y^+$ and $y^-$ are obtained from the pre-stored language model, $\gamma + \log P_\theta(y^- \mid x_2) - \log P_\theta(y^+ \mid x_2)$ is calculated and compared with 0, and the larger of the two is denoted $\ell_2$; the average of $\ell_2$ over all comparison samples in the second sample is the second loss value. The first and second loss values are summed with weights to obtain the overall training loss value:
[0081] $$\mathcal{L} = \mathcal{L}_1 + \alpha \mathcal{L}_2$$
[0082] where $\alpha$ denotes the weighting hyperparameter of the loss function; $\mathcal{L}$ denotes the training loss value; $\mathcal{L}_1$ denotes the first loss value; and $\mathcal{L}_2$ denotes the second loss value.
[0083] The training loss value is then optimized using the stochastic gradient descent method, which enables simultaneous training. The resulting language model is the controllable text generation model.
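The toy sketch below illustrates how the two objectives can be optimized together by stochastic gradient descent. It is not the patent's implementation. Assumptions: the weighted sum takes the form $\mathcal{L}_1 + \alpha \mathcal{L}_2$; the "language model" is replaced by a softmax over three fixed candidate responses (the input is ignored), so that the gradients can be written by hand; and the batch size, learning rate, $\alpha$, and $\gamma$ are arbitrary fixed values, whereas the source draws a random number of samples per step. A real implementation would compute $\log P_\theta(y \mid x)$ with the pre-stored language model and obtain gradients from an autodiff framework.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(theta: List[float], d1: List[int], d2: List[Tuple[int, int]],
               alpha: float, gamma: float, lr: float, batch_size: int,
               rng: random.Random) -> List[float]:
    """One SGD step on L = L1 + alpha * L2 for a toy model P(y) = softmax(theta)[y]."""
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    # First sample: random batch of positive responses y from D1.
    batch1 = rng.choices(d1, k=batch_size)
    for y in batch1:
        for k in range(len(theta)):
            # gradient of -log softmax(theta)[y] with respect to theta[k]
            grad[k] += (probs[k] - (1.0 if k == y else 0.0)) / len(batch1)
    # Second sample: random batch of (y+, y-) index pairs from D2.
    batch2 = rng.choices(d2, k=batch_size)
    for y_pos, y_neg in batch2:
        if gamma + math.log(probs[y_neg]) - math.log(probs[y_pos]) > 0:
            # hinge is active; under a softmax, the gradient of
            # log P(y-) - log P(y+) is +1 at y- and -1 at y+
            grad[y_neg] += alpha / len(batch2)
            grad[y_pos] -= alpha / len(batch2)
    return [t - lr * g for t, g in zip(theta, grad)]


rng = random.Random(0)
theta = [0.0, 0.0, 1.0]   # candidate 2 (a negative response) starts most likely
d1 = [0, 1]               # positive responses for language modeling
d2 = [(1, 2)]             # comparison sample: (index of y+, index of y-)
for _ in range(200):
    theta = train_step(theta, d1, d2, alpha=0.5, gamma=1.0, lr=0.5, batch_size=4, rng=rng)
probs = softmax(theta)
print([round(p, 3) for p in probs])  # the negative response ends up least likely
```

Both terms act on the same parameters in a single update, which is what "training simultaneously" means here; no separate auxiliary model is involved.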
[0084] Based on the above embodiments, the method of constructing a training sample dataset specifically includes:
[0085] S1: Obtain the pre-constructed input dataset and the pre-constructed first sample dataset. Obtaining the input dataset and the first sample dataset can be done simultaneously, or the input dataset can be obtained first, followed by the first sample dataset; or vice versa. This invention does not limit this process. The input dataset includes a massive number of task samples to be processed, which are the language model inputs. The first sample dataset includes a massive number of language samples, each language sample comprising a data pair consisting of a task input and a positive response to that task input. The task input is the language model input.
[0086] S2: Extract a task sample $x$ to be processed from the pre-built input dataset and input it into the pre-stored language model to obtain a preset number of task response samples, as shown in Figure 2. Input the task sample to be processed and the task response samples into a pre-stored discriminative model, which determines whether each task response sample is a positive or negative example and generates a discrimination label for it, yielding task response samples with discrimination labels. The discrimination labels include positive and negative examples: a positive label means that the task response sample does not contain malicious behavior or information, and a negative label means that it does. In some embodiments, the preset number may be 6, as shown in Figure 2. The discriminative model is used to determine whether a task response sample contains inappropriate behavior or information: given an input-response pair $(x, y)$ of the language model, it distinguishes whether the response $y$ is a positive or negative example. In some embodiments, the discriminative model may be an existing model, such as an API for detecting offensive speech.
[0087] S3: Calculate the generation probability $P_\theta(y \mid x)$ of each task response sample; this probability is computed automatically by the pre-stored model. Based on the generation probability, sort the task response samples in a preset order to obtain a task response sample sequence, as shown in Figure 2. Assume the task response samples are y1, y2, ..., y6, where y1, y4, and y6 are positive examples and y2, y3, and y5 are negative examples; y1 has the highest generation probability and y6 the lowest. In some embodiments, the preset order is from high to low. The generation probability can be expressed by the third preset formula:
[0088] $$P_\theta(y^* \mid x^*) = \prod_{t=1}^{n} P_\theta\left(y^*_t \mid x^*, y^*_{0:t-1}\right)$$
[0089] where $P_\theta(y^* \mid x^*)$ denotes the generation probability; $x^*$ denotes the model input; $y^*$ denotes the model output; $n$ denotes the number of words in the model output; $y^*_t$ denotes the t-th word output by the model; and $y^*_{0:t-1}$ denotes the 0th through (t-1)th words output by the model.
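The third preset formula is the chain rule of probability over the output words. A minimal sketch follows, computing the product as a sum of logarithms (equivalent, but numerically safer for long outputs) and then ranking responses from high to low as in step S3. The bigram table is a hypothetical stand-in for the pre-stored language model: it conditions each word only on the previous word and ignores $x^*$, and treating position 0 as a start symbol is an assumption of this sketch.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical toy table standing in for P_theta(y_t | x*, y*_{0:t-1}); each word
# is conditioned only on the previous word to keep the example small.
NEXT_WORD: Dict[Tuple[str, str], float] = {
    ("<s>", "hello"): 0.6, ("<s>", "hi"): 0.4,
    ("hello", "there"): 0.5, ("hello", "friend"): 0.5,
    ("hi", "there"): 1.0,
}


def toy_next_prob(context: List[str], word: str) -> float:
    return NEXT_WORD.get((context[-1], word), 1e-9)


def generation_log_prob(words: List[str],
                        next_prob: Callable[[List[str], str], float]) -> float:
    """log of prod_{t=1..n} P(y*_t | x*, y*_{0:t-1}), computed as a sum of logs."""
    context = ["<s>"]  # position 0: a start symbol (assumption of this sketch)
    total = 0.0
    for word in words:  # t = 1, ..., n
        total += math.log(next_prob(context, word))
        context.append(word)
    return total


responses = [["hello", "there"], ["hi", "there"], ["hello", "friend"]]
# Step S3: sort task responses from high to low generation probability.
ranked = sorted(responses, key=lambda y: generation_log_prob(y, toy_next_prob), reverse=True)
for y in ranked:
    print(" ".join(y), round(math.exp(generation_log_prob(y, toy_next_prob)), 3))
```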
[0090] S4: Sequentially extract the task response samples with negative labels from the task response sample sequence and use them as negative response samples (denoted $y^-$).
[0091] S5: Obtain a positive response sample based on the negative response sample. The positive response sample is the task response sample that has a positive discrimination label and whose generation probability ranks next below that of the negative response sample; that is, find the positive response sample (denoted $y^+$) that follows the extracted negative response sample $y^-$ in order of generation probability, forming a comparison sample $(x, y^+, y^-)$. As shown in Figure 2, in some embodiments, y2 is extracted as the negative response sample in step S4; then y4, the first positively labeled task response sample ranked below y2 in generation probability, is a positive response sample that meets the requirement. In other embodiments, y3 is extracted as the negative response sample in step S4; then y4, the first positively labeled task response sample ranked below y3, meets the requirement. In still other embodiments, y5 is extracted as the negative response sample in step S4; then y6, the first positively labeled task response sample ranked below y5, meets the requirement.
[0092] S6: The negative response sample, the positive response sample, and the task sample to be processed constitute a comparison sample.
[0093] Repeat steps S2-S6 to obtain a large number of comparison samples. Use the large number of comparison samples to construct a second sample dataset. The first sample dataset and the second sample dataset together constitute the training sample dataset.
[0094] Based on the above embodiments, in this method, the first preset formula includes:
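[0095] $$\mathcal{L}_1 = \mathbb{E}_{(x_1, y) \sim \mathcal{D}_1}\left[-\log P_\theta(y \mid x_1)\right]$$
[0096] where $\mathcal{L}_1$ denotes the first loss value; $\mathbb{E}_{(x_1, y) \sim \mathcal{D}_1}[\cdot]$ denotes sampling data $(x_1, y)$ from $\mathcal{D}_1$ and averaging the bracketed term $-\log P_\theta(y \mid x_1)$; $\mathcal{D}_1$ denotes the first sample dataset; $P_\theta(\cdot \mid \cdot)$ denotes the generation probability; $y$ denotes the positive response to the input to be processed; and $x_1$ denotes the input to be processed;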
[0097] The second preset formula includes:
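[0098] $$\mathcal{L}_2 = \mathbb{E}_{(x_2, y^+, y^-) \sim \mathcal{D}_2}\left[\max\left(0,\ \gamma + \log P_\theta(y^- \mid x_2) - \log P_\theta(y^+ \mid x_2)\right)\right]$$
[0099] where $\mathcal{L}_2$ denotes the second loss value; $\mathbb{E}_{(x_2, y^+, y^-) \sim \mathcal{D}_2}[\cdot]$ denotes sampling data $(x_2, y^+, y^-)$ from $\mathcal{D}_2$ and averaging the bracketed term $\max(0, \gamma + \log P_\theta(y^- \mid x_2) - \log P_\theta(y^+ \mid x_2))$; $\mathcal{D}_2$ denotes the second sample dataset; $P_\theta(\cdot \mid \cdot)$ denotes the generation probability; $y^-$ denotes the negative response to the task sample to be processed; $x_2$ denotes the task sample to be processed; $y^+$ denotes the positive response to the task sample to be processed; and $\gamma$ denotes the margin (distance) hyperparameter of contrastive learning.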
[0100] Based on the above embodiments, the method of calculating the generation probability of the task response sample specifically includes:
[0101] The generation probability is calculated using the third preset formula;
[0102] The third preset formula includes:
[0103] $$P_\theta(y^* \mid x^*) = \prod_{t=1}^{n} P_\theta\left(y^*_t \mid x^*, y^*_{0:t-1}\right)$$
[0104] where $P_\theta(y^* \mid x^*)$ denotes the generation probability; $x^*$ denotes the model input; $y^*$ denotes the model output; $n$ denotes the number of words in the model output; $y^*_t$ denotes the t-th word output by the model; and $y^*_{0:t-1}$ denotes the 0th through (t-1)th words output by the model.
[0105] Based on the above embodiments, in this method, the first loss value and the second loss value are weighted and summed to obtain the training loss value, specifically including:
[0106] $$\mathcal{L} = \mathcal{L}_1 + \alpha \mathcal{L}_2$$
[0107] where $\alpha$ denotes the weighting hyperparameter of the loss function; $\mathcal{L}$ denotes the training loss value; $\mathcal{L}_1$ denotes the first loss value; and $\mathcal{L}_2$ denotes the second loss value. In some embodiments, $\alpha$ may be 1.0, 0.5, etc.
[0108] This invention provides a controllable text generation method, which involves acquiring a task to be processed; inputting the task to be processed into a pre-built controllable text generation model to obtain a task response; the controllable text generation model is trained based on a pre-stored language model using a training sample dataset; the training sample dataset includes a dataset composed of comparison samples; the comparison samples include at least negative response samples, positive response samples, and a task sample to be processed; the positive response samples are response samples that do not contain inappropriate behavior, and the negative response samples are response samples that contain inappropriate behavior. This invention does not require modification of the model architecture or the introduction of additional auxiliary models. It utilizes the training sample dataset to simultaneously train language modeling and contrastive learning to obtain a controllable text generation model. The trained model is ready to use out of the box and has higher universality, lower time complexity, and more stable response fluency.
[0109] The controllable text generation apparatus provided by the present invention is described below. The apparatus described below and the controllable text generation method described above may be cross-referenced with each other.
[0110] Figure 3 is a schematic diagram of the controllable text generation device provided by the present invention. As shown in Figure 3, the device includes an acquisition unit 310 and a response unit 320, wherein:
[0111] The acquisition unit 310 is configured to acquire a task to be processed;
[0112] The response unit 320 is configured to input the task to be processed into a pre-built controllable text generation model to obtain a task response;
[0113] The controllable text generation model is obtained by training a pre-stored language model using a training sample dataset; the training sample dataset includes a dataset composed of comparison samples; the comparison samples include at least negative response samples, positive response samples, and task samples to be processed; the positive response samples are response samples that do not contain malicious behavior, and the negative response samples are response samples that contain malicious behavior.
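The two-unit structure can be sketched as plain Python classes. This is an illustrative assumption only: the task source and the model are injected as hypothetical callables, whereas in practice the response unit would wrap the trained controllable text generation model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AcquisitionUnit:
    """Unit 310: acquires the task to be processed from a (hypothetical) source."""
    source: Callable[[], str]

    def acquire(self) -> str:
        return self.source()


@dataclass
class ResponseUnit:
    """Unit 320: feeds the task into the controllable text generation model."""
    model: Callable[[str], str]

    def respond(self, task: str) -> str:
        return self.model(task)


@dataclass
class ControllableTextGenerationDevice:
    acquisition_unit: AcquisitionUnit
    response_unit: ResponseUnit

    def run(self) -> str:
        task = self.acquisition_unit.acquire()
        return self.response_unit.respond(task)


# Hypothetical stand-ins: a fixed task source and an echoing "model".
device = ControllableTextGenerationDevice(
    AcquisitionUnit(lambda: "Summarize today's news."),
    ResponseUnit(lambda task: f"Response to: {task}"),
)
print(device.run())  # Response to: Summarize today's news.
```

Because the trained model keeps the original architecture, swapping it into the response unit requires no changes to the surrounding device code.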
[0114] Based on the above embodiments, in this device, obtaining the controllable text generation model by training the pre-stored language model on the training sample dataset specifically includes:
[0115] Construct a training sample dataset; the training sample dataset includes a first sample dataset and a second sample dataset;
[0116] Based on the pre-stored language model, samples are randomly sampled from the first sample dataset and the second sample dataset to obtain the first sample and the second sample.
[0117] Calculate the first loss value based on the first sample and the first preset formula; calculate the second loss value based on the second sample and the second preset formula;
[0118] The first loss value and the second loss value are weighted and summed to obtain the training loss value;
[0119] The training loss value is optimized using the stochastic gradient descent method to obtain a controllable text generation model.
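A toy sketch of steps [0116] to [0119], assuming the training sample dataset from step [0115] has already been built. Every modeling choice here is a hypothetical stand-in: the "language model" is a single vector of logits over a four-word vocabulary that ignores its input, the first loss is the negative log-likelihood, the second is the hinge max(0, γ + log P(y-) - log P(y+)), the two are combined as L1 + α·L2, and gradients are written out by hand for plain SGD.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def log_prob_and_grad(logits: Sequence[float], seq: Sequence[int]) -> Tuple[float, List[float]]:
    """log P(seq) under a toy unigram model, and its gradient w.r.t. the logits."""
    probs = softmax(logits)
    logp = sum(math.log(probs[w]) for w in seq)
    grad = [-len(seq) * p for p in probs]  # d log P / d logit_j = count_j - len(seq) * p_j
    for w in seq:
        grad[w] += 1.0
    return logp, grad


def train(first_dataset, second_dataset, vocab_size: int, alpha: float = 1.0,
          gamma: float = 1.0, lr: float = 0.1, steps: int = 200, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    logits = [0.0] * vocab_size  # toy stand-in for the pre-stored model's parameters
    for _ in range(steps):
        # [0116]: randomly draw one sample from each dataset (the toy model ignores inputs).
        _x1, y = rng.choice(first_dataset)
        _x2, y_pos, y_neg = rng.choice(second_dataset)
        # [0117]: gradient of the first loss, -log P(y | x1).
        _, g_y = log_prob_and_grad(logits, y)
        grad = [-g for g in g_y]
        # [0117]: gradient of the second loss, max(0, gamma + log P(y-) - log P(y+)).
        lp_pos, g_pos = log_prob_and_grad(logits, y_pos)
        lp_neg, g_neg = log_prob_and_grad(logits, y_neg)
        if gamma + lp_neg - lp_pos > 0:  # the hinge only contributes when the margin is violated
            grad = [g + alpha * (gn - gp) for g, gn, gp in zip(grad, g_neg, g_pos)]
        # [0118]-[0119]: one SGD step on L = L1 + alpha * L2.
        logits = [w - lr * g for w, g in zip(logits, grad)]
    return logits


# Hypothetical data over a 4-word vocabulary; word 3 occurs only in the negative response.
first = [("greeting", [0, 1])]
second = [("request", [0, 2], [3, 3])]
probs = softmax(train(first, second, vocab_size=4))
print([round(p, 3) for p in probs])
```

After training, the word that appears only in the negative response receives the lowest probability. A real implementation would compute both losses with the pre-stored Transformer language model and let an autodiff framework derive the gradients.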
[0120] Based on the above embodiments, the construction of the training sample dataset in this device specifically includes:
[0121] S1: Obtain a pre-built input dataset and a pre-built first sample dataset; the input dataset includes a large number of task samples to be processed; the first sample dataset includes a large number of language samples, the language samples include the input to be processed and positive responses to the input to be processed;
[0122] S2: Extract a task sample to be processed from the input dataset, input the task sample to be processed into the pre-stored language model to obtain a preset number of task response samples; input the task sample to be processed and the task response samples into the pre-stored discriminant model to generate discriminant labels for the task response samples to obtain task response samples containing discriminant labels; wherein, the discriminant labels include positive examples and negative examples;
[0123] S3: Calculate the generation probability of the task response sample, and sort the task response samples according to a preset order based on the generation probability to obtain a task response sample sequence;
[0124] S4: Sequentially extract the task response samples with negative labels from the task response sample sequence, and use them as negative response samples;
[0125] S5: Obtain positive response samples based on the negative response samples; the positive response sample is the task response sample whose generation probability is second only to that of the negative response sample and whose discriminant label is positive;
[0126] S6: The negative example response sample, the positive example response sample, and the task to be processed sample constitute a comparison sample;
[0127] Repeat steps S2-S6 to obtain a large number of comparison samples. Use the large number of comparison samples to construct a second sample dataset. The first sample dataset and the second sample dataset together constitute the training sample dataset.
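A sketch of steps S2 to S6, assuming three hypothetical callables: `generate` samples a preset number of responses from the pre-stored language model, `is_negative` stands in for the pre-stored discriminant model, and `seq_prob` returns the generation probability. It further assumes that the "preset order" is descending probability and reads "second only to" as the highest-ranked positive response that ranks below the negative one; both are interpretations, not statements from the source.

```python
from typing import Callable, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (task sample, positive response, negative response)


def build_comparison_samples(
    tasks: Sequence[str],
    generate: Callable[[str, int], List[str]],  # hypothetical sampler over the pre-stored LM
    is_negative: Callable[[str, str], bool],    # hypothetical wrapper around the discriminant model
    seq_prob: Callable[[str, str], float],      # generation probability P_theta(y | x)
    num_samples: int = 8,
) -> List[Triple]:
    triples: List[Triple] = []
    for task in tasks:
        # S2: sample a preset number of responses and label each one.
        labelled = [(r, is_negative(task, r)) for r in generate(task, num_samples)]
        # S3: sort by generation probability (assumed order: highest first).
        labelled.sort(key=lambda item: seq_prob(task, item[0]), reverse=True)
        for i, (neg, negative) in enumerate(labelled):
            if not negative:
                continue  # S4: walk the sequence, keeping only negative responses
            # S5: first positive response ranked below this negative one, if any.
            pos = next((r for r, lab in labelled[i + 1:] if not lab), None)
            if pos is not None:
                triples.append((task, pos, neg))  # S6: one comparison sample
    return triples


# Toy stand-ins (hypothetical): responses "a" and "c" are judged negative.
toy_probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
second_dataset = build_comparison_samples(
    ["task"],
    generate=lambda task, n: ["c", "a", "d", "b"],
    is_negative=lambda task, r: r in {"a", "c"},
    seq_prob=lambda task, r: toy_probs[r],
)
print(second_dataset)  # [('task', 'b', 'a'), ('task', 'd', 'c')]
```

Each negative response is paired with the next positive response below it, yielding two comparison samples here. The resulting triples form the second sample dataset, which together with the first sample dataset of (input, positive response) pairs makes up the training sample dataset.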
[0128] Based on the above embodiments, in this device, the first preset formula includes:
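[0129] $$\mathcal{L}_1 = \mathbb{E}_{(x_1, y)\sim\mathcal{D}_1}\left[-\log P_\theta(y \mid x_1)\right]$$
[0130] where $\mathcal{L}_1$ denotes the first loss value; $\mathbb{E}_{(x_1, y)\sim\mathcal{D}_1}[\cdot]$ denotes sampling data $(x_1, y)$ from $\mathcal{D}_1$ and taking the average of the bracketed term $-\log P_\theta(y \mid x_1)$; $\mathcal{D}_1$ denotes the first sample dataset; $P_\theta(\cdot \mid \cdot)$ denotes the generation probability; $y$ denotes the positive response to the input to be processed; and $x_1$ denotes the input to be processed;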
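A sketch of the first loss as a batch average, assuming a hypothetical `log_prob(x, y)` callable that returns log P_θ(y | x) from the model. The expectation over the first sample dataset is approximated by the mean over a sampled batch.

```python
import math
from typing import Callable, Sequence, Tuple


def first_loss(batch: Sequence[Tuple[str, str]],
               log_prob: Callable[[str, str], float]) -> float:
    """Language-modeling loss: mean of -log P_theta(y | x1) over sampled (x1, y) pairs."""
    if not batch:
        raise ValueError("batch must contain at least one (x1, y) pair")
    return sum(-log_prob(x1, y) for x1, y in batch) / len(batch)


# Hypothetical model: assigns probability 0.5 to "ok" and 0.25 to anything else.
def toy_log_prob(x: str, y: str) -> float:
    return math.log(0.5) if y == "ok" else math.log(0.25)


batch = [("How are you?", "ok"), ("How are you?", "fine")]
print(round(first_loss(batch, toy_log_prob), 4))  # 1.0397
```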
[0131] The second preset formula includes:
[0132] $$\mathcal{L}_2 = \mathbb{E}_{(x_2, y^+, y^-)\sim\mathcal{D}_2}\left[\max\left(0,\ \gamma + \log P_\theta(y^- \mid x_2) - \log P_\theta(y^+ \mid x_2)\right)\right]$$
[0133] where $\mathcal{L}_2$ denotes the second loss value; $\mathbb{E}_{(x_2, y^+, y^-)\sim\mathcal{D}_2}[\cdot]$ denotes sampling data $(x_2, y^+, y^-)$ from $\mathcal{D}_2$ and taking the average of the bracketed term; $\mathcal{D}_2$ denotes the second sample dataset; $P_\theta(\cdot \mid \cdot)$ denotes the generation probability; $y^-$ denotes the negative response to the task sample to be processed; $x_2$ denotes the task sample to be processed; $y^+$ denotes the positive response to the task sample to be processed; and $\gamma$ denotes the distance hyperparameter of the contrastive learning.
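The second loss is a margin (hinge) loss averaged over sampled (x2, y+, y-) triples; in this sketch `log_prob` is again a hypothetical stand-in for the model. The loss is zero once the positive response is more likely than the negative one by at least γ in log space, so only pairs that violate the margin contribute a gradient.

```python
import math
from typing import Callable, Sequence, Tuple


def second_loss(batch: Sequence[Tuple[str, str, str]],
                log_prob: Callable[[str, str], float],
                gamma: float = 1.0) -> float:
    """Contrastive loss: mean of max(0, gamma + log P(y-|x2) - log P(y+|x2))."""
    if not batch:
        raise ValueError("batch must contain at least one (x2, y+, y-) triple")
    total = 0.0
    for x2, y_pos, y_neg in batch:
        total += max(0.0, gamma + log_prob(x2, y_neg) - log_prob(x2, y_pos))
    return total / len(batch)


# Hypothetical model: the polite reply is twice as likely as the rude one.
toy_probs = {"Happy to help.": 0.5, "Go away.": 0.25}


def toy_log_prob(x: str, y: str) -> float:
    return math.log(toy_probs[y])


batch = [("Can you help me?", "Happy to help.", "Go away.")]
print(round(second_loss(batch, toy_log_prob, gamma=1.0), 4))  # 0.3069
print(second_loss(batch, toy_log_prob, gamma=0.5))            # 0.0
```

With γ = 1.0 the log-probability gap of ln 2 (about 0.693) falls short of the margin, so the loss is 1 - ln 2; with γ = 0.5 the margin is already satisfied and the loss vanishes.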
[0134] Based on the above embodiments, the calculation of the generation probability of the task response sample in this device specifically includes:
[0135] The generation probability is calculated using the third preset formula;
[0136] The third preset formula includes:
[0137] $$P_\theta(y^* \mid x^*) = \prod_{t=1}^{n} P_\theta\left(y^*_t \mid x^*, y^*_{<t}\right)$$
[0138] where $P_\theta(y^* \mid x^*)$ denotes the generation probability; $x^*$ denotes the input to the model; $y^*$ denotes the output of the model; $n$ denotes the number of words contained in the model output; $y^*_t$ denotes the t-th word output by the model; and $y^*_{<t}$ denotes the 0th to (t-1)th words output by the model.
[0139] Based on the above embodiments, in this device, the first loss value and the second loss value are weighted and summed to obtain the training loss value, specifically including:
[0140] $$\mathcal{L} = \mathcal{L}_1 + \alpha \mathcal{L}_2$$
[0141] where $\alpha$ denotes the weighting hyperparameter of the loss function; $\mathcal{L}$ denotes the training loss value; $\mathcal{L}_1$ denotes the first loss value; and $\mathcal{L}_2$ denotes the second loss value.
[0142] This invention provides a controllable text generation device that acquires a task to be processed; inputs the task to be processed into a pre-built controllable text generation model to obtain a task response; the controllable text generation model is trained based on a pre-stored language model using a training sample dataset; the training sample dataset includes a dataset composed of comparison samples; the comparison samples include at least negative example response samples, positive example response samples, and a task sample to be processed; the positive example response samples are response samples that do not contain inappropriate behavior, and the negative example response samples are response samples that contain inappropriate behavior. This invention does not require modification of the model architecture or the introduction of additional auxiliary models. It utilizes the training sample dataset to simultaneously train language modeling and contrastive learning to obtain a controllable text generation model. The trained model is ready to use out of the box and has higher universality, lower time complexity, and more stable response fluency.
[0143] Figure 4 is a schematic diagram of the physical structure of an electronic device. As shown in Figure 4, the electronic device may include a processor 410, a communications interface 420, a memory 430, and a communication bus 440, wherein the processor 410, the communications interface 420, and the memory 430 communicate with one another via the communication bus 440. The processor 410 can call logical instructions in the memory 430 to execute a controllable text generation method, which includes: acquiring a task to be processed; inputting the task to be processed into a pre-built controllable text generation model to obtain a task response; the controllable text generation model is trained based on a pre-stored language model using a training sample dataset; the training sample dataset includes a dataset composed of comparison samples; the comparison samples include at least negative response samples, positive response samples, and a task sample to be processed; the positive response samples are response samples that do not contain malicious behavior, and the negative response samples are response samples that contain malicious behavior.
[0144] Furthermore, the logical instructions in the aforementioned memory 430 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, essentially, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0145] In another aspect, the present invention also provides a computer program product comprising a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the controllable text generation method provided by the above methods. The method includes: acquiring a task to be processed; inputting the task to be processed into a pre-constructed controllable text generation model to obtain a task response; the controllable text generation model is trained based on a pre-stored language model using a training sample dataset; the training sample dataset includes a dataset composed of comparison samples; the comparison samples include at least negative response samples, positive response samples, and a task sample to be processed; the positive response samples are response samples that do not contain malicious behavior, and the negative response samples are response samples that contain malicious behavior.
[0146] In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the controllable text generation method provided by the above methods. The method includes: acquiring a task to be processed; inputting the task to be processed into a pre-constructed controllable text generation model to obtain a task response; the controllable text generation model is trained based on a pre-stored language model using a training sample dataset; the training sample dataset includes a dataset composed of comparison samples; the comparison samples include at least negative example response samples, positive example response samples, and a task sample to be processed; the positive example response samples are response samples that do not contain malicious behavior, and the negative example response samples are response samples that contain malicious behavior.
[0147] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0148] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0149] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A controllable text generation method, characterized in that it comprises: acquiring a task to be processed; inputting the task to be processed into a pre-built controllable text generation model to obtain a task response; wherein the controllable text generation model is obtained by training a pre-stored language model using a training sample dataset; the training sample dataset includes a dataset composed of comparison samples; the comparison samples include at least negative response samples, positive response samples, and task samples to be processed; the positive response samples are response samples that do not contain malicious behavior, and the negative response samples are response samples that contain malicious behavior; and the training sample dataset is constructed as follows: S1: obtaining a pre-built input dataset and a pre-built first sample dataset, wherein the input dataset includes a large number of task samples to be processed, the first sample dataset includes a large number of language samples, and each language sample includes an input to be processed and a positive response to that input; S2: extracting a task sample to be processed from the input dataset and inputting it into the pre-stored language model to obtain a preset number of task response samples; inputting the task sample to be processed and the task response samples into a pre-stored discriminant model to generate discriminant labels for the task response samples, thereby obtaining task response samples containing discriminant labels, wherein the discriminant labels include positive examples and negative examples; S3: calculating the generation probability of each task response sample, and sorting the task response samples in a preset order according to the generation probability to obtain a task response sample sequence; S4: sequentially extracting the task response samples with negative labels from the task response sample sequence as negative response samples; S5: obtaining positive response samples based on the negative response samples, wherein the positive response sample is the task response sample whose generation probability is second only to that of the negative response sample and whose discriminant label is positive; S6: forming a comparison sample from the negative response sample, the positive response sample, and the task sample to be processed; and repeating steps S2 to S6 to obtain a large number of comparison samples, constructing a second sample dataset from these comparison samples, the first sample dataset and the second sample dataset together constituting the training sample dataset.
2. The controllable text generation method according to claim 1, characterized in that obtaining the controllable text generation model by training the pre-stored language model using the training sample dataset specifically comprises: randomly sampling, based on the pre-stored language model, from the first sample dataset and the second sample dataset to obtain a first sample and a second sample; calculating a first loss value based on the first sample and a first preset formula, and calculating a second loss value based on the second sample and a second preset formula; computing a weighted sum of the first loss value and the second loss value to obtain a training loss value; and optimizing the training loss value using stochastic gradient descent to obtain the controllable text generation model.
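3. The controllable text generation method according to claim 2, characterized in that the first preset formula is: $\mathcal{L}_1 = \mathbb{E}_{(x_1, y)\sim\mathcal{D}_1}\left[-\log P_\theta(y \mid x_1)\right]$, where $\mathcal{L}_1$ denotes the first loss value; $\mathbb{E}_{(x_1, y)\sim\mathcal{D}_1}[\cdot]$ denotes sampling data $(x_1, y)$ from $\mathcal{D}_1$ and taking the average of the bracketed term; $\mathcal{D}_1$ denotes the first sample dataset; $P_\theta(\cdot \mid \cdot)$ denotes the generation probability; $y$ denotes the positive response to the input to be processed; and $x_1$ denotes the input to be processed; and the second preset formula is: $\mathcal{L}_2 = \mathbb{E}_{(x_2, y^+, y^-)\sim\mathcal{D}_2}\left[\max\left(0,\ \gamma + \log P_\theta(y^- \mid x_2) - \log P_\theta(y^+ \mid x_2)\right)\right]$, where $\mathcal{L}_2$ denotes the second loss value; $\mathbb{E}_{(x_2, y^+, y^-)\sim\mathcal{D}_2}[\cdot]$ denotes sampling data $(x_2, y^+, y^-)$ from $\mathcal{D}_2$ and taking the average of the bracketed term; $\mathcal{D}_2$ denotes the second sample dataset; $P_\theta(\cdot \mid \cdot)$ denotes the generation probability; $y^-$ denotes the negative response to the task sample to be processed; $x_2$ denotes the task sample to be processed; $y^+$ denotes the positive response to the task sample to be processed; and $\gamma$ denotes the distance hyperparameter for contrastive learning.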
4. The controllable text generation method according to claim 1, characterized in that calculating the generation probability of the task response sample specifically comprises calculating the generation probability using a third preset formula: $P_\theta(y^* \mid x^*) = \prod_{t=1}^{n} P_\theta\left(y^*_t \mid x^*, y^*_{<t}\right)$, where $P_\theta(y^* \mid x^*)$ denotes the generation probability; $x^*$ denotes the input to the model; $y^*$ denotes the output of the model; $n$ denotes the number of words contained in the model output; $y^*_t$ denotes the t-th word output by the model; and $y^*_{<t}$ denotes the 0th to (t-1)th words output by the model.
5. The controllable text generation method according to claim 2, characterized in that computing the weighted sum of the first loss value and the second loss value to obtain the training loss value specifically comprises: $\mathcal{L} = \mathcal{L}_1 + \alpha \mathcal{L}_2$, where $\alpha$ denotes the weighting hyperparameter of the loss function; $\mathcal{L}$ denotes the training loss value; $\mathcal{L}_1$ denotes the first loss value; and $\mathcal{L}_2$ denotes the second loss value.
6. A controllable text generation device, characterized in that it comprises: an acquisition unit configured to acquire a task to be processed; and a response unit configured to input the task to be processed into a pre-built controllable text generation model to obtain a task response; wherein the controllable text generation model is obtained by training a pre-stored language model using a training sample dataset; the training sample dataset includes a dataset composed of comparison samples; the comparison samples include at least negative response samples, positive response samples, and task samples to be processed; the positive response samples are response samples that do not contain malicious behavior, and the negative response samples are response samples that contain malicious behavior; and the training sample dataset is constructed as follows: S1: obtaining a pre-built input dataset and a pre-built first sample dataset, wherein the input dataset includes a large number of task samples to be processed, the first sample dataset includes a large number of language samples, and each language sample includes an input to be processed and a positive response to that input; S2: extracting a task sample to be processed from the input dataset and inputting it into the pre-stored language model to obtain a preset number of task response samples; inputting the task sample to be processed and the task response samples into a pre-stored discriminant model to generate discriminant labels for the task response samples, thereby obtaining task response samples containing discriminant labels, wherein the discriminant labels include positive examples and negative examples; S3: calculating the generation probability of each task response sample, and sorting the task response samples in a preset order according to the generation probability to obtain a task response sample sequence; S4: sequentially extracting the task response samples with negative labels from the task response sample sequence as negative response samples; S5: obtaining positive response samples based on the negative response samples, wherein the positive response sample is the task response sample whose generation probability is second only to that of the negative response sample and whose discriminant label is positive; S6: forming a comparison sample from the negative response sample, the positive response sample, and the task sample to be processed; and repeating steps S2 to S6 to obtain a large number of comparison samples, constructing a second sample dataset from these comparison samples, the first sample dataset and the second sample dataset together constituting the training sample dataset.
7. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the controllable text generation method according to any one of claims 1 to 5.
8. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the controllable text generation method according to any one of claims 1 to 5.
9. A computer program product comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the controllable text generation method according to any one of claims 1 to 5.