Method and apparatus for training a generative model and generating training samples for a text classifier

A generative model comprising encoders and a decoder, combined with the Prompt-tuning method, addresses the text classification problem in zero-shot learning scenarios by generating high-quality training samples, which improves the training of the text classifier.

CN116484968B · Active · Publication Date: 2026-06-26 · ALIPAY (HANGZHOU) INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ALIPAY (HANGZHOU) INFORMATION TECH CO LTD
Filing Date
2023-03-10
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

In zero-shot learning scenarios, traditional text classifier training methods struggle because little or no labeled text is available for the invisible (unseen) classes, which leads to information confusion between similar categories and poor training sample quality.

Method used

A generative model comprising a first encoder, a second encoder, and a decoder is employed. By processing text samples into semantic vectors and discrete vectors, and combining this with the Prompt-tuning method, high-quality training samples are generated.

Benefits of technology

This solves the problem of information confusion in zero-shot learning scenarios, generates high-quality training samples, and improves the training efficiency and accuracy of text classifiers.

✦ Generated by Eureka AI based on patent content.

Figure CN116484968B_ABST

Abstract

Embodiments of the present specification provide a method and apparatus for training a generative model and generating training samples for a text classifier. In the method for training the generative model, first processing and second processing are performed on a first text sample. The first processing includes determining a semantic vector of the first text sample by a first encoder. A first category of the first text sample is predicted based on the semantic vector by a text classifier, and a first prompt text corresponding to the first category is constructed. The second processing includes determining a first discrete vector corresponding to the first text sample in a target vector space by a second encoder. A reconstructed text of the first text sample is determined based on the first prompt text and the first discrete vector by a decoder. The generative model is trained based on a reconstruction loss determined based on the first text sample and the reconstructed text.

Description

Technical Field

[0001] This specification relates to the field of machine learning, and more particularly to a method and apparatus for training generative models and generating training samples for a text classifier.

Background

[0002] In conventional techniques, text classifiers are typically trained by supervised learning, that is, on labeled text. However, in some scenarios (such as distributing user complaints), it may be impossible to collect any labeled text, or only a very small amount can be collected; this situation is known as zero-shot learning. How to classify text in such zero-shot learning scenarios is therefore a problem to be solved.

Summary of the Invention

[0003] This specification describes one or more embodiments of a method and apparatus for training a generative model and generating training samples for a text classifier, which can solve the text classification problem in zero-shot learning scenarios.

[0004] In a first aspect, a method for training a generative model is provided, the generative model comprising a first encoder, a second encoder, and a decoder; the method comprises:

[0005] The first text sample is subjected to a first processing step and a second processing step, respectively; wherein,

[0006] The first process includes: determining the semantic vector of the first text sample using the first encoder; predicting the first category of the first text sample based on the semantic vector using the text classifier; and constructing a first prompt text corresponding to the first category.

[0007] The second process includes determining, through the second encoder, the first discrete vector corresponding to the first text sample in the target vector space;

[0008] The decoder determines the reconstructed text of the first text sample based on the first prompt text and the first discrete vector.

[0009] The generative model is trained based on the reconstruction loss, which is determined based on the first text sample and the reconstructed text.

[0010] In a second aspect, a method for generating training samples for a text classifier is provided, comprising:

[0011] Obtain the generative model trained by the method of the first aspect;

[0012] For any target category among N preset categories, select the target prompt text corresponding to the target category from the N category prompt texts corresponding to the N categories; and determine the corresponding target discrete vector in the target vector space;

[0013] The target prompt text and the target discrete vector are input into the decoder in the generative model to obtain the output text;

[0014] Based on the output text and the target category, a first training sample is formed to train the text classifier.
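The generation procedure of the second aspect can be sketched as a loop over the N categories. This is a minimal illustration, not the patented implementation: the function name `generate_training_samples`, the assumption that `prompt_texts` is aligned index-by-index with `categories`, and the choice to draw the target discrete vector uniformly at random from the codebook are all hypothetical, because the text does not say how the target discrete vector is picked. The trained decoder is passed in as an opaque callable.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def generate_training_samples(
    categories: Sequence[str],
    prompt_texts: Sequence[str],
    codebook: Sequence[Vector],
    decoder: Callable[[str, Vector], str],
    samples_per_category: int = 1,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Return (output_text, target_category) pairs for training the text classifier.

    Assumptions (hypothetical): prompt_texts[i] is the category prompt text of
    categories[i], and the target discrete vector is sampled uniformly from the codebook.
    """
    rng = random.Random(seed)
    samples: List[Tuple[str, str]] = []
    for category, prompt in zip(categories, prompt_texts):
        for _ in range(samples_per_category):
            target_vector = rng.choice(codebook)          # target discrete vector
            output_text = decoder(prompt, target_vector)  # trained decoder (stubbed here)
            samples.append((output_text, category))       # training sample = (text, label)
    return samples
```

In practice `decoder` would wrap the trained decoder of the generative model; any callable with the same shape works for testing.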

[0015] In a third aspect, an apparatus for training a generative model is provided, the generative model comprising a first encoder, a second encoder, and a decoder; the apparatus comprises:

[0016] The processing unit is used to perform a first processing and a second processing on the first text sample, respectively.

[0017] The processing unit includes:

[0018] The first processing submodule is configured to: determine the semantic vector of the first text sample using the first encoder; predict the first category of the first text sample based on the semantic vector using the text classifier; and construct a first prompt text corresponding to the first category.

[0019] The second processing submodule is used to determine the first discrete vector corresponding to the first text sample in the target vector space through the second encoder;

[0020] The determining unit is configured to determine the reconstructed text of the first text sample based on the first prompt text and the first discrete vector using the decoder;

[0021] A training unit is used to train the generative model based on a reconstruction loss, which is determined based on the first text sample and the reconstructed text.

[0022] In a fourth aspect, an apparatus for generating training samples for a text classifier is provided, comprising:

[0023] The acquisition unit is used to acquire the generative model trained based on the first aspect.

[0024] The determining unit is configured to, for any target category among the N preset categories, select the target prompt text corresponding to the target category from the N category prompt texts corresponding to the N categories, and to determine the corresponding target discrete vector in the target vector space;

[0025] An input unit is used to input the target prompt text and the target discrete vector into the decoder in the generative model to obtain the output text;

[0026] A forming unit is configured to form a first training sample based on the output text and the target category, for training the text classifier.

[0027] In a fifth aspect, a computer-readable storage medium is provided, having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first or second aspect.

[0028] In a sixth aspect, a computing device is provided, including a memory and a processor, wherein the memory stores executable code, and the processor, when executing the executable code, implements the method of the first or second aspect.

[0029] The methods and apparatuses for training a generative model and for generating training samples for a text classifier provided in one or more embodiments of this specification learn different types of features of text samples by applying two different processing steps to them. This helps to solve the problem of information confusion between similar categories when information is transferred from invisible to visible classes, so that the trained generative model can generate high-quality training samples for zero-shot learning scenarios.

Brief Description of the Drawings

[0030] To more clearly illustrate the technical solutions of the embodiments in this specification, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this specification. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0031] Figure 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification;

[0032] Figure 2 is a flowchart of a method for training a generative model according to an embodiment;

[0033] Figure 3 illustrates how the contrastive loss is calculated in one example;

[0034] Figure 4 is a schematic diagram of a method for generating training samples for a text classifier according to an embodiment;

[0035] Figure 5 is a schematic diagram of an apparatus for training a generative model according to an embodiment;

[0036] Figure 6 is a schematic diagram of an apparatus for generating training samples for a text classifier according to an embodiment.

Detailed Description

[0037] The solution provided in this specification will now be described with reference to the accompanying drawings.

[0038] Currently, text classification in zero-shot learning scenarios is typically addressed by using the external knowledge contained in pre-trained language models to help the classifier transfer knowledge from visible classes to invisible classes. Such models fall into the following types:

[0039] The first type is the encoder-based model, which mainly uses the textual semantic information of the pre-trained BERT language model to transfer information from invisible to visible classes. However, this type of model can confuse similar categories. To address this, some improved solutions propose a self-training model, which learns the invisible classes by adding pseudo-labels to unlabeled samples. Although pseudo-labels help the model better distinguish between classes, their inherent errors can degrade model performance.

[0040] The second type is the decoder-based model, which uses a decoder to generate text samples that help train the text classifier. However, such a model does not train the classifier and the decoder jointly; instead, it trains them in two separate steps, which results in generated text samples of poor quality.

[0041] To this end, the inventors of this application propose training a generative model that includes both an encoder and a decoder, in order to generate high-quality training samples for training a text classifier while utilizing textual semantic information.

[0042] Figure 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification. As shown in Figure 1, the generative model includes a first encoder, a second encoder, and a decoder, and is used to generate training samples for a text classifier. In one example, the generative model is obtained from a model pre-trained on at least one training task.

[0043] It should be noted that the generative model can further be trained through the following process:

[0044] A text sample is acquired and subjected to two different processes. In one process, the semantic vector of the text sample is determined by a first encoder; based on the semantic vector, a text classifier predicts the category of the text sample, and a category prompt text corresponding to the predicted category is constructed.

[0045] In one example, the semantic vector mentioned above could be the feature vector corresponding to [CLS] among the feature vectors output by the first encoder. That is, the category-related features of the text sample can be extracted through the first encoder.

[0046] The other process determines the discrete vector corresponding to the text sample in the target vector space using a second encoder. Specifically, this discrete vector is selected from a corresponding codebook based on the mean vector of the feature vectors output by the second encoder. The codebook includes multiple discrete vectors, denoted e1, e2, ..., ek.

[0047] That is, the second encoder extracts the class-independent features of the text sample.

[0048] Then, the decoder determines the reconstructed text of the text sample based on the category prompt text and the discrete vector, and the generative model is trained based on a reconstruction loss determined from the text sample and the reconstructed text.

[0049] In summary, this approach learns different types of features of the text samples, which helps to solve the problem of information confusion between similar categories during information transfer from invisible to visible classes, so that the trained generative model can generate high-quality training samples for zero-shot learning scenarios.

[0050] It should be understood that Figure 1 is merely an example. In practical applications, a classification loss can also be calculated from the category label of the text sample and the predicted category, and the text classifier can be trained based on this classification loss. The generative model and the text classifier are thus trained together, which further improves the quality of the generated training samples.

[0051] In addition, the generative model can also be trained with a quantization loss determined from the encoded vector output by the second encoder and the discrete vector; this specification does not limit this.

[0052] Figure 2 is a flowchart of a method for training a generative model according to an embodiment. The method can be executed by any device, apparatus, platform, or device cluster with computing and processing capabilities. Note that the method includes multiple iterations; Figure 2 shows the steps of the t-th iteration (t is a positive integer). By repeating these steps, the parameters of the generative model are adjusted over multiple rounds, and the generative model with the parameters from the last round is used as the final model. As shown in Figure 2, the method may include the following steps:

[0053] Step S202: Perform first processing and second processing on the first text sample respectively.

[0054] The first text sample here can be any text sample in the current sample set (also called the t-th round sample set), which may or may not have a category label.

[0055] The first process includes determining the semantic vector of a first text sample using a first encoder, predicting a first category of the first text sample based on the semantic vector using a text classifier, and constructing a first prompt text corresponding to the first category.

[0056] The first encoder mentioned above can use a Transformer-based network, such as BERT or RoBERTa, or an RNN-based LSTM or Bi-LSTM model to extract high-order semantic features of the text.

[0057] It should be noted that this first encoder typically outputs multiple feature vectors. The semantic vector of the first text sample mentioned above can be the feature vector corresponding to [CLS] among the feature vectors output by the first encoder. Thus, through this first encoder, category-related features can be extracted from the first text sample.

[0058] The text classifier can be based on various types of neural networks, such as TextCNN or Long Short-Term Memory (LSTM) networks, as long as it is capable of text classification. This specification does not limit this.

[0059] In one embodiment, the first category may be determined based on the highest score among N scores corresponding to N preset categories output by a text classifier.

[0060] In one embodiment, a first prompt text corresponding to the first category can be selected from N preset category prompt texts. Here, the category prompt text is used to indicate category information. For example, the first prompt text indicates the first category.

[0061] In another embodiment, the first prompt text can also be generated in real time. For example, a prompt template can be preset, which contains placeholders corresponding to category information. Generating the first prompt text may include replacing the placeholders corresponding to category information in the prompt template with the first category.
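Paragraphs [0059] to [0061] can be illustrated with a short sketch: the first category is the one with the highest of the N classifier scores, and the first prompt text is produced by filling a category placeholder in a preset template. The template wording `PROMPT_TEMPLATE` and the helper names are hypothetical; the text only states that the template contains a placeholder for category information.

```python
from typing import Sequence

# Hypothetical prompt template; "{category}" is the placeholder for category information.
PROMPT_TEMPLATE = "This text is about {category}."


def predict_category(scores: Sequence[float], categories: Sequence[str]) -> str:
    """Return the preset category with the highest of the N classifier scores."""
    if len(scores) != len(categories):
        raise ValueError("need exactly one score per preset category")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return categories[best_index]


def build_prompt(category: str, template: str = PROMPT_TEMPLATE) -> str:
    """Generate the first prompt text in real time by filling the placeholder."""
    return template.format(category=category)
```

Selecting from N preset prompt texts ([0060]) would simply replace `build_prompt` with a lookup keyed by category.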

[0062] In summary, the first processing described above is used to process the category-related features extracted from the first text sample.

[0063] Furthermore, during the first processing of the first text sample, a category prompt text corresponding to the predicted category is constructed, which integrates the Prompt-tuning method into the training of the generative model. This helps extract as much semantic information as possible from the pre-trained generative model, thereby improving the training efficiency of the generative model.

[0064] The second process described above includes determining the first discrete vector corresponding to the first text sample in the target vector space using a second encoder.

[0065] The second encoder can be implemented in the same ways as the first encoder described above.

[0066] Furthermore, all vectors in the aforementioned target vector space are discretized vectors (referred to as discrete vectors), meaning that each dimension is a discrete value within a predetermined range. For example, if the predetermined range is [0,1], then each dimension can be 0 or 1, etc.

[0067] In one embodiment, the second encoder is provided with a codebook that includes multiple discrete vectors. Determining the first discrete vector may include encoding the first text sample with the second encoder to obtain a first encoded vector, and querying the codebook based on the first encoded vector to obtain the first discrete vector.

[0068] The first encoded vector here can be obtained by averaging the feature vectors output by the second encoder; that is, the first encoded vector is the mean vector of the feature vectors output by the second encoder. Thus, the second encoder can extract class-independent features from the first text sample.

[0069] Furthermore, the codebook can be queried based on the first encoded vector using a discretization function. Specifically, this discretization function determines the first discrete vector by calculating the vector distance between the first encoded vector and each discrete vector in the codebook. For example, the discrete vector corresponding to the smallest vector distance is determined as the first discrete vector.
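The codebook query of [0067] to [0069] can be sketched as mean pooling followed by a nearest-neighbor lookup. This sketch assumes Euclidean distance (the text only says "vector distance") and represents vectors as plain Python lists; the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_vector(feature_vectors: Sequence[Sequence[float]]) -> Vector:
    """First encoded vector: element-wise mean of the second encoder's output vectors."""
    n = len(feature_vectors)
    return [sum(column) / n for column in zip(*feature_vectors)]


def nearest_code(encoded: Sequence[float], codebook: Sequence[Sequence[float]]) -> Tuple[int, Vector]:
    """Discretization function: return (index, vector) of the closest codebook entry."""
    best = min(range(len(codebook)), key=lambda k: math.dist(encoded, codebook[k]))
    return best, list(codebook[best])
```

Swapping `math.dist` for a cosine distance would implement the other distance choice mentioned in the text without changing the structure.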

[0070] In another embodiment, determining the first discrete vector may also include encoding the first text sample with the second encoder to obtain a first encoded vector, and then discretizing the first encoded vector with any known discretization algorithm to obtain the first discrete vector.

[0071] In summary, the second processing described above is used to process the category-independent features extracted from the first text sample.

[0072] It should be noted that the first and second processes mentioned above can be executed in parallel, or the first process can be executed first and then the second process, or the second process can be executed first and then the first process.

[0073] As can be seen, in training the generative model, the embodiments of this specification learn both category-related and category-independent features of the text samples. This helps to solve the problem of information confusion between similar categories during information transfer from invisible to visible classes, so that the trained generative model can generate high-quality training samples for zero-shot learning scenarios.

[0074] Step S204: Using a decoder, the reconstructed text of the first text sample is determined based on the first prompt text and the first discrete vector.

[0075] The decoder can be implemented with a DNN, CNN, or RNN; it can also be implemented with a Transformer or a pre-trained language model based on the Transformer decoder (such as GPT or GPT-2).

[0076] In one embodiment, the text vector corresponding to the first prompt text can be determined first. For example, word embedding can be applied to the word segments in the first prompt text to obtain the corresponding word vectors. Since many word embedding algorithms already train word vectors on large text corpora, the word vectors of the word segments in the first prompt text can be looked up directly in a trained word vector table. A neural network, such as a convolutional neural network (CNN) or a recurrent neural network (RNN), can then further process these word vectors to obtain the text vector corresponding to the first prompt text.

[0077] In another embodiment, the word segments from the first prompt text can be directly input into the representation model. Then, in the embedding layer of the representation model, word embedding processing is performed on the word segments of the first prompt text to obtain the corresponding word vectors. Based on the obtained word vectors, the text vector corresponding to the first prompt text can then be obtained. In a specific embodiment, this representation model can be a BERT model.

[0078] After the text vector corresponding to the first prompt text is determined, the text vector and the first discrete vector can be concatenated and input into the decoder to obtain the reconstructed text of the first text sample.
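Building the decoder input described in [0076] to [0078] can be sketched as a word-vector lookup, a pooling step, and a concatenation. As a labeled simplification, averaging stands in for the CNN, RNN, or BERT encoding named in the text, and unknown word segments map to zero vectors; both choices, and the helper names, are assumptions for illustration.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def prompt_text_vector(tokens: Sequence[str], word_vectors: Dict[str, Vector], dim: int) -> Vector:
    """Look up each word segment in a trained word-vector table and average the vectors.

    Averaging is a stand-in for the CNN/RNN/BERT encoder; unknown tokens get a zero vector.
    """
    vectors = [word_vectors.get(token, [0.0] * dim) for token in tokens]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def decoder_input(text_vector: Sequence[float], discrete_vector: Sequence[float]) -> Vector:
    """Concatenate the prompt text vector with the first discrete vector."""
    return list(text_vector) + list(discrete_vector)
```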

[0079] Of course, in practical applications, the first prompt text can also be one-hot encoded to obtain a corresponding one-hot encoded vector. The one-hot encoded vector and the first discrete vector can then be input into a discriminator to obtain the probability that this combination of vectors has occurred. This probability is used to calculate a discrimination loss (the calculation is explained later), which assists in training the generative model.

[0080] Step S206: Train the generative model based on the reconstruction loss, which is determined based on the first text sample and the reconstructed text.

[0081] The generative model described above is used to generate training samples for a text classifier.

[0082] In one embodiment, the vector distance (such as the Euclidean distance or cosine distance) between the first encoded vector of the first text sample (i.e., the vector determined by the second encoder) and the embedding vector of the reconstructed text (i.e., the vector output by the decoder from which the reconstructed text is determined) can be obtained. The reconstruction loss is then set to be positively correlated with this vector distance: the smaller the distance between the first encoded vector and the embedding vector, the smaller the difference between the data, and the smaller the reconstruction loss.

[0083] In another embodiment, the similarity between the first text sample and the reconstructed text can be obtained, for example, from the dot product of the first encoded vector and the embedding vector. The reconstruction loss is then set to be negatively correlated with this similarity: the greater the similarity, the smaller the reconstruction loss.
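The two reconstruction-loss variants of [0082] and [0083] can be sketched as follows. Using the raw Euclidean distance and the negated dot product are simply the most direct choices that satisfy "positively correlated with distance" and "negatively correlated with similarity"; they are assumptions, not the only valid forms.

```python
import math
from typing import Sequence


def distance_reconstruction_loss(encoded: Sequence[float], embedding: Sequence[float]) -> float:
    """Grows with the Euclidean distance between the first encoded vector and the decoder embedding."""
    return math.dist(encoded, embedding)


def similarity_reconstruction_loss(encoded: Sequence[float], embedding: Sequence[float]) -> float:
    """Shrinks as the dot-product similarity grows (negated dot product)."""
    return -sum(a * b for a, b in zip(encoded, embedding))
```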

[0084] The reconstruction loss determined above measures how well the generative model, in particular its decoder, reconstructs samples, and can therefore be used to train the generative model.

[0085] After the reconstruction loss is calculated, backpropagation can be used to calculate the update gradients of the decoder parameters, and the decoder parameters are then updated based on these gradients. More specifically, the updated parameters are obtained by subtracting the product of the update gradient and the learning step size from the current decoder parameters.
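The update rule in [0085] is plain gradient descent: new parameters equal old parameters minus learning step size times gradient. The sketch below shows only that arithmetic on a flat parameter list; the gradients themselves would come from backpropagation through the decoder, which is not modeled here.

```python
from typing import List, Sequence


def sgd_update(params: Sequence[float], grads: Sequence[float], learning_rate: float) -> List[float]:
    """Return params - learning_rate * grads, element by element."""
    if len(params) != len(grads):
        raise ValueError("params and grads must have the same length")
    return [p - learning_rate * g for p, g in zip(params, grads)]
```

For example, repeatedly applying `sgd_update` to minimize the toy loss (theta - 3)^2, whose gradient is 2 * (theta - 3), drives theta toward 3 when the step size is small enough.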

[0086] Similarly, the parameters and codebook of the second encoder can also be updated.

[0087] It should be understood that in practical applications, the reconstruction loss described above can be calculated by combining multiple text samples (i.e., the current sample set). Furthermore, all losses mentioned in this specification (such as the discrimination loss mentioned above, the classification loss described below, etc.) are calculated by combining multiple text samples, and this specification will not elaborate further on this point.

[0088] Of course, in practical applications, the quantization loss can also be combined to update the parameters and codebook of the second encoder. That is, the parameters and codebook of the second encoder are updated based on the combined loss of reconstruction loss and quantization loss.

[0089] The quantization loss here can be determined based on the vector distance between the first encoded vector and the first discrete vector. Furthermore, this quantization loss is positively correlated with the vector distance; that is, the smaller the vector distance between the first encoded vector and the first discrete vector, the smaller the quantization loss.

[0090] In one example, the combined loss described above can be calculated according to the following formula:

[0091]

$$L = \|x - D(e_k)\|_2^2 + \|\mathrm{sg}[E(x)] - e_k\|_2^2 + \|\mathrm{sg}[e_k] - E(x)\|_2^2$$

[0092] Here, x is the first text sample, e_k is the first discrete vector, and D(e_k) is the reconstructed text; the first term of the formula (the squared first L2 norm) is the reconstruction loss. E(x) is the first encoded vector, and the second and third terms (the squared second and third L2 norms) are both quantization losses. sg denotes the stop-gradient operation: the codebook corresponding to the second encoder is updated based on the first and second terms, and the parameters of the second encoder are updated based on the first and third terms.
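The combined loss of [0090] to [0092] (reconstruction term ||x - D(e_k)||^2 plus two stop-gradient quantization terms) can be sketched numerically. In the forward pass the two quantization terms have the same value; stop-gradient only decides who receives the gradient. The sketch assumes x and D(e_k) are represented as vectors of equal dimension, and returns hand-derived gradients of the quantization terms only: the codebook term sends 2(e_k - E(x)) to e_k, and the other term sends 2(E(x) - e_k) to the encoder output. How the reconstruction term's gradient reaches the codebook and encoder is not modeled.

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean norm of a - b."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def vq_combined_loss(
    x: Sequence[float],        # first text sample (as a vector; assumption)
    recon: Sequence[float],    # D(e_k), reconstructed text (as a vector; assumption)
    encoded: Sequence[float],  # E(x), first encoded vector
    code: Sequence[float],     # e_k, first discrete vector
) -> Tuple[float, List[float], List[float]]:
    reconstruction = squared_l2(x, recon)
    codebook_term = squared_l2(encoded, code)  # ||sg[E(x)] - e_k||^2: gradient only to e_k
    encoder_term = squared_l2(code, encoded)   # ||sg[e_k] - E(x)||^2: gradient only to E(x)
    total = reconstruction + codebook_term + encoder_term
    grad_code = [2 * (c - e) for c, e in zip(code, encoded)]
    grad_encoded = [2 * (e - c) for c, e in zip(code, encoded)]
    return total, grad_code, grad_encoded
```

In an autodiff framework, the same effect is obtained by wrapping the stopped operand in the framework's stop-gradient or detach operation instead of deriving gradients by hand.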

[0093] Furthermore, the parameters of the second encoder can be updated by incorporating the discrimination loss. That is, the parameters of the second encoder are updated based on the combined loss of reconstruction loss, quantization loss, and discrimination loss.

[0094] The discrimination loss here can be determined based on the probability output by the discriminator and the label value indicating whether the current vector combination has appeared. The label value is determined by querying the vector combination list, which stores all existing vector combinations that have appeared.

[0095] In one embodiment, the existing vector combinations in the vector combination list are recorded during the training of the generative model. For example, after the one-hot encoded vector of the first prompt text and the first discrete vector are input into the discriminator, if the probability output by the discriminator is greater than a probability threshold, the combination of the two vectors can be added to the vector combination list for subsequent queries.

[0096] In another embodiment, the contents of the vector combination list can also be pre-defined.

[0097] In a specific example, the discrimination loss can be calculated using the cross-entropy loss function based on the probability output by the discriminator and the label value.
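The discrimination loss of [0094] to [0097] can be sketched as a binary cross-entropy whose 0/1 label comes from querying the vector combination list, together with the recording rule of [0095]. The set-based list, the probability threshold of 0.5, and the helper names are hypothetical choices for illustration.

```python
import math
from typing import Sequence, Set, Tuple

Combination = Tuple[tuple, tuple]


def combination_key(one_hot: Sequence[float], discrete: Sequence[float]) -> Combination:
    """Hashable key for a (one-hot prompt vector, discrete vector) pair."""
    return (tuple(one_hot), tuple(discrete))


def record_if_confident(prob: float, one_hot, discrete, seen: Set[Combination], threshold: float = 0.5) -> None:
    """Add the combination to the list when the discriminator probability exceeds the threshold."""
    if prob > threshold:
        seen.add(combination_key(one_hot, discrete))


def discrimination_loss(prob: float, one_hot, discrete, seen: Set[Combination], eps: float = 1e-12) -> float:
    """Binary cross-entropy between the discriminator probability and the queried label."""
    label = 1.0 if combination_key(one_hot, discrete) in seen else 0.0
    p = min(max(prob, eps), 1.0 - eps)  # clamp to keep log() finite
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))
```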

[0098] It should be understood that this discrimination loss can also be used to update the parameters of the discriminator.

[0099] It should be noted that during the training of the generative model, this scheme can also train the aforementioned text classifier, that is, combine the generative model and the text classifier into a single framework for training, thereby further improving the quality of the training samples generated by the generative model.

[0100] In one embodiment, the text classifier can be trained (i.e., its parameters can be updated) based on a classification loss, which can be calculated with the cross-entropy loss function from the first category and the category label of the first text sample. The category label here is a visible class.
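The cross-entropy classification loss in [0100] can be sketched for a single sample. This assumes the classifier's N scores are unnormalized logits, so a numerically stable log-softmax is applied; if the classifier already outputs probabilities, the loss reduces to the negative log of the label's probability.

```python
import math
from typing import Sequence


def classification_loss(scores: Sequence[float], label_index: int) -> float:
    """Cross-entropy of softmax(scores) against the visible-class label."""
    m = max(scores)  # subtract the max before exp() for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[label_index]
```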

[0101] It should be noted that since the above text classifier is based on the semantic vector obtained by the first encoder, the parameters of the first encoder can also be updated based on the classification loss.

[0102] In practical applications, contrastive loss can also be combined to update the parameters of the text classifier and the first encoder. That is, the parameters of the text classifier and the first encoder are updated based on the combined loss of classification loss and contrastive loss.

[0103] The contrastive loss can be determined from a positive sample pair, consisting of the first text sample and its enhanced text, and negative sample pairs, each consisting of the first text sample and one of the other text samples in the current sample set.

[0104] The enhanced text can be obtained with Easy Data Augmentation (EDA), a data augmentation method in natural language processing that applies synonym replacement, synonym insertion, word-order swapping, and word deletion.
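The four EDA operations listed in [0104] can be sketched on whitespace-tokenized text. The toy synonym table `SYNONYMS`, the deletion probability, and the rule of applying one randomly chosen operation per enhanced text are illustrative assumptions; a real system would use a proper thesaurus and a tokenizer suited to the language.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical toy synonym table.
SYNONYMS: Dict[str, List[str]] = {"refund": ["reimbursement"], "late": ["delayed", "overdue"]}


def synonym_replace(words: Sequence[str], synonyms: Dict[str, List[str]], rng: random.Random) -> List[str]:
    """Replace one word that has a synonym entry with a random synonym."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in synonyms]
    if candidates:
        i = rng.choice(candidates)
        out[i] = rng.choice(synonyms[out[i]])
    return out


def synonym_insert(words: Sequence[str], synonyms: Dict[str, List[str]], rng: random.Random) -> List[str]:
    """Insert a synonym of a random word at a random position."""
    out = list(words)
    candidates = [w for w in out if w in synonyms]
    if candidates:
        new_word = rng.choice(synonyms[rng.choice(candidates)])
        out.insert(rng.randrange(len(out) + 1), new_word)
    return out


def word_swap(words: Sequence[str], rng: random.Random) -> List[str]:
    """Swap the positions of two random words."""
    out = list(words)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def word_delete(words: Sequence[str], rng: random.Random, p: float = 0.1) -> List[str]:
    """Delete each word with probability p, keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(list(words))]


def augment(text: str, seed: int = 0) -> str:
    """Produce one enhanced text by applying one randomly chosen EDA operation."""
    rng = random.Random(seed)
    words = text.split()
    op = rng.randrange(4)
    if op == 0:
        words = synonym_replace(words, SYNONYMS, rng)
    elif op == 1:
        words = synonym_insert(words, SYNONYMS, rng)
    elif op == 2:
        words = word_swap(words, rng)
    else:
        words = word_delete(words, rng)
    return " ".join(words)
```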

[0105] Figure 3 illustrates how the contrastive loss is calculated in one example. In Figure 3, blank circles represent the semantic vector of the first text sample, solid circles represent the text vector of the enhanced text, and hatched circles represent the semantic vectors of the other text samples. The contrastive loss is calculated as follows. First, a first text distance between the first text sample and the enhanced text is calculated from the semantic vector of the first text sample and the text vector of the enhanced text. Second, a second text distance between the first text sample and each of the other text samples is calculated from their respective semantic vectors. Third, the first text distance is divided by the sum of the first text distance and the second text distances, and the quotient is used as the contrastive loss.

[0106] The text vector of the enhanced text can be determined by referring to the method used to determine the text vector of the first prompt text.

[0107] It should be understood that there are multiple other text samples here, thus multiple second text distances can be calculated.
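The ratio described in [0105] to [0107] can be sketched directly: the positive-pair distance divided by the sum of itself and all negative-pair distances. Euclidean distance is an assumption, since the text only says "text distance"; the loss is small when the enhanced text lies close to the first text sample relative to the other samples.

```python
import math
from typing import Sequence


def contrastive_loss(
    anchor: Sequence[float],                 # semantic vector of the first text sample
    positive: Sequence[float],               # text vector of the enhanced text
    negatives: Sequence[Sequence[float]],    # semantic vectors of the other text samples
) -> float:
    """d(anchor, positive) / (d(anchor, positive) + sum of d(anchor, negative))."""
    d_pos = math.dist(anchor, positive)
    denominator = d_pos + sum(math.dist(anchor, n) for n in negatives)
    if denominator == 0.0:
        return 0.0  # all vectors coincide; define the loss as zero
    return d_pos / denominator
```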

[0108] Furthermore, the discrimination loss can also be used to update the parameters of the first encoder; that is, the parameters of the first encoder are updated based on the combined loss of the classification loss, the contrastive loss, and the discrimination loss.

[0109] In summary, this approach allows for parameter updates for all modules involved (i.e., the first encoder, the second encoder, the decoder, the discriminator, the text classifier, and the codebook) during the training of the generative model.

[0110] This completes one round of iterative training for the generative model.

[0111] After this iteration is completed, the following sample filtering steps can be performed:

[0112] Consider a text sample without a category label in the current sample set (i.e., the sample set of round t), hereinafter referred to as the second text sample. The text classifier outputs N scores for it, one for each of the preset N categories; obtain the highest of these scores. This highest score also determines the second category of the second text sample. Then determine whether the highest score is greater than the score threshold ε (e.g., 0.9). If it is, a pseudo label is added to the second text sample, and the sample is removed from the current sample set. The sample set remaining after the removal is used as the sample set for the next round (i.e., the sample set of round t+1).
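The filtering step in [0112] can be sketched as follows. The sketch assumes that the classifier's N scores for each unlabeled sample are already available as a list. The function name is hypothetical, and 0.9 is only the example threshold given above. The comparison is strictly "greater than" the threshold, matching the text.

```python
def filter_confident_samples(samples, scores, threshold=0.9):
    """One round of the sample-filtering step in [0112] (sketch).

    samples   -- unlabeled text samples in the round-t sample set
    scores    -- for each sample, the N per-category scores from the text classifier
    threshold -- score threshold epsilon

    Returns (pseudo_labeled, remaining). pseudo_labeled holds (sample, category)
    pairs whose highest score exceeds the threshold. remaining is the
    round-(t+1) sample set with those samples removed.
    """
    pseudo_labeled, remaining = [], []
    for sample, row in zip(samples, scores):
        best = max(range(len(row)), key=row.__getitem__)  # index of the highest score
        if row[best] > threshold:
            pseudo_labeled.append((sample, best))  # add pseudo label, drop from set
        else:
            remaining.append(sample)
    return pseudo_labeled, remaining
```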

[0113] As can be seen, the solution provided in this specification improves upon the traditional pseudo-labeling scheme. Instead of directly using these text samples with added pseudo-labels to train the text classifier, it removes them from the sample set. This helps the generative model generate diverse training samples, thereby improving the accuracy of the trained text classifier.

[0114] In summary, the method for training the generative model provided in this specification can simultaneously learn both class-related and class-independent features of text samples, which helps generate high-quality training samples for the text classifier. Furthermore, this approach can incorporate a prompt-tuning method to train the generative model, thereby significantly improving training efficiency. Finally, this approach combines the generative model and the text classifier within a single framework for training, further enhancing the quality of the generated training samples.

[0115] The above describes the training process of the generative model. The following describes the method of using this generative model to generate training samples for a text classifier.

[0116] Figure 4 illustrates a method for generating training samples for a text classifier according to one embodiment. This method can be executed by any device, apparatus, platform, or device cluster with computing and processing capabilities. As shown in Figure 4, the method may include the following steps:

[0117] Step S402: Obtain the generative model.

[0118] Specifically, the generative model may be obtained through training with the steps shown in Figure 2.

[0119] Step S404: For any target category among the preset N categories, select the target prompt text corresponding to the target category from the category prompt texts corresponding to the N categories respectively, and determine the corresponding target discrete vector in the target vector space.

[0120] As mentioned earlier, during the training of the generative model, a corresponding category prompt text can be constructed for each category predicted by the text classifier. In this step, the appropriate category prompt text can therefore be selected according to the target category.

[0121] Furthermore, when a codebook is provided for the second encoder, the above-mentioned determination of the corresponding target discrete vector may include determining each discrete vector in the codebook as the target discrete vector in turn.

[0122] For example, if the codebook includes K discrete vectors, then K target discrete vectors can be determined in the target vector space for a given target category.

[0123] Of course, in practical applications, multiple target discrete vectors can also be obtained by discretizing the encoded vectors of several text samples that belong to the target category.

[0124] Step S406: Input the target prompt text and the target discrete vector into the decoder of the generative model to obtain the output text.

[0125] In the example above, the target prompt text is concatenated with each of the K target discrete vectors in turn and input into the decoder, yielding K output texts.

[0126] Step S408: Based on the output text and the target category, form the first training sample for training the text classifier.

[0127] As in the previous example, based on K output texts and the target category, K training samples can be formed. It should be understood that the category label of these K training samples is the target category.

[0128] In summary, this generative model can generate K training samples for each of the preset N categories, or N × K in total. A text classifier can then be trained on these generated training samples. This approach can therefore be used to solve text classification problems in zero-shot learning scenarios.
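With a codebook of K vectors, Steps S404 to S408 can be sketched as a double loop over categories and codebook entries. The `decode` callable stands in for the trained decoder, and its signature is an assumption. Representing each category's prompt text as a plain string is also an assumption. The sketch yields N * K labeled samples.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def generate_training_samples(
    category_prompts: Dict[str, str],
    codebook: List[Vector],
    decode: Callable[[str, Vector], str],
) -> List[Tuple[str, str]]:
    """Steps S404-S408 (sketch). For each category, pair its prompt text with
    every codebook vector in turn, decode the pair, and label the output text
    with that category.

    `decode` is a hypothetical stand-in for the trained decoder.
    """
    samples = []
    for category, prompt in category_prompts.items():  # target category and prompt text
        for z in codebook:                              # target discrete vector
            text = decode(prompt, z)                    # output text
            samples.append((text, category))            # first training sample
    return samples
```

A toy decoder such as `lambda p, z: f"{p}|{z[0]}"` is enough to check the bookkeeping. In practice the decoder would produce natural-language text.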

[0129] Corresponding to the above method for training a generative model, one embodiment of this specification also provides an apparatus for training a generative model, the generative model including a first encoder, a second encoder, and a decoder. As shown in Figure 5, the apparatus may include:

[0130] The processing unit 502 is used to perform first processing and second processing on the first text sample respectively.

[0131] Processing unit 502 includes:

[0132] The first processing submodule 5022 is used to determine the semantic vector of the first text sample through the first encoder. Based on the semantic vector, the first category of the first text sample is predicted by the text classifier, and a first prompt text corresponding to the first category is constructed.

[0133] The second processing submodule 5024 is used to determine the first discrete vector corresponding to the first text sample in the target vector space through the second encoder.

[0134] The determining unit 504 is used to determine the reconstructed text of the first text sample based on the first prompt text and the first discrete vector through the decoder.

[0135] Training unit 506 is used to train a generative model based on a reconstruction loss determined based on the first text sample and the reconstructed text.
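One common choice of reconstruction loss is the mean token-level negative log-likelihood of the first text sample under the decoder with teacher forcing. This choice is an assumption, not something specified above. The sketch below takes the decoder's per-position probability distributions as given. The function name and input layout are hypothetical.

```python
import math
from typing import List, Sequence


def reconstruction_loss(token_probs: List[Sequence[float]], target_ids: List[int]) -> float:
    """Mean negative log-likelihood of the original tokens (sketch).

    token_probs -- for each position, the decoder's probability distribution
                   over the vocabulary (assumed teacher-forced on the original)
    target_ids  -- token ids of the first text sample
    """
    if not target_ids or len(token_probs) != len(target_ids):
        raise ValueError("need one distribution per target token")
    eps = 1e-12  # guards against log(0)
    nll = [-math.log(max(p[t], eps)) for p, t in zip(token_probs, target_ids)]
    return sum(nll) / len(nll)
```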

[0136] In one embodiment, the second encoder is provided with a codebook that includes multiple discrete vectors, and the second processing submodule 5024 is specifically configured to:

[0137] encode the first text sample using the second encoder to obtain a first encoded vector; and

[0138] query the codebook based on the first encoded vector to obtain the first discrete vector.

[0139] Obtaining the first encoded vector includes:

[0140] averaging the feature vectors output by the second encoder and determining the resulting mean vector as the first encoded vector.

[0141] In addition, querying the codebook includes: calculating the vector distance between the first encoded vector and each discrete vector in the codebook; and

[0142] determining the discrete vector corresponding to the minimum vector distance as the first discrete vector.
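The mean pooling of [0140] and the nearest-neighbour codebook lookup of [0141] and [0142] can be sketched as below. The sketch assumes Euclidean distance as the vector distance and plain Python lists for vectors. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mean_pool(features: List[Sequence[float]]) -> List[float]:
    """Average the feature vectors output by the second encoder ([0140])."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


def quantize(encoded: Sequence[float], codebook: List[Sequence[float]]) -> Tuple[int, List[float]]:
    """Return (index, vector) of the codebook entry nearest to `encoded`
    ([0141]-[0142]); Euclidean distance is assumed."""
    dists = [math.dist(encoded, c) for c in codebook]
    k = min(range(len(codebook)), key=dists.__getitem__)  # minimum vector distance
    return k, list(codebook[k])
```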

[0143] In one embodiment, training unit 506 includes:

[0144] The first update submodule 5062 is used to update the parameters of the decoder based on the reconstruction loss;

[0145] The second update submodule 5064 is configured to update the parameters of the second encoder and the codebook based on the combined loss of the reconstruction loss and the quantization loss. The quantization loss is determined based on the vector distance between the first encoded vector and the first discrete vector.
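One common way to instantiate these losses is shown below. It is borrowed from VQ-VAE and is not stated in this specification. Here z_e is the first encoded vector, e_k is the first discrete vector (the nearest codebook entry), sg[·] is a stop-gradient operator, and β is a weighting hyperparameter. All of these are assumptions. The codebook term moves e_k toward z_e, while the β term keeps the second encoder committed to its chosen entry. The reconstruction loss reaches the second encoder only if gradients are passed through the quantization step (e.g., with a straight-through estimator), which is likewise an assumption.

```latex
\mathcal{L}_{\text{dec}} = \mathcal{L}_{\text{rec}}, \qquad
\mathcal{L}_{\text{enc}} = \mathcal{L}_{\text{rec}}
  + \underbrace{\lVert \operatorname{sg}[z_e] - e_k \rVert_2^2
  + \beta\,\lVert z_e - \operatorname{sg}[e_k] \rVert_2^2}_{\mathcal{L}_{\text{quant}}},
\qquad k = \arg\min_j \lVert z_e - e_j \rVert_2
```

Here \mathcal{L}_{\text{enc}} denotes the combined loss used to update the second encoder and the codebook.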

[0146] In one embodiment, the first text sample has a category label, and the apparatus further includes an update unit 508;

[0147] The determining unit 504 is also used to determine the classification loss based on the first category and the category label;

[0148] Update unit 508 is used to update the parameters of the text classifier and the first encoder based on the classification loss.

[0149] Update unit 508 is specifically used for:

[0150] The parameters of the text classifier and the first encoder are updated based on a combined loss of classification loss and contrastive loss. The contrastive loss is determined based on positive sample pairs consisting of the first text sample and the corresponding augmented text, and negative sample pairs consisting of the first text sample and other text samples in the current sample set.
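A sketch of the combined loss of [0150] is given below. It assumes that the classification loss is softmax cross-entropy over the classifier's N raw scores. It also assumes that "combined" means a weighted sum with a hypothetical weight `weight`. The contrastive term is passed in as a precomputed number, for example from the quotient computation in [0151] to [0154].

```python
import math
from typing import Sequence


def cross_entropy(scores: Sequence[float], label: int) -> float:
    """Classification loss: -log softmax(scores)[label].
    `scores` are assumed to be the classifier's raw scores over the N categories."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[label]


def encoder_classifier_loss(scores: Sequence[float], label: int,
                            contrastive: float, weight: float = 1.0) -> float:
    """Combined loss (sketch): classification loss + weight * contrastive loss.
    The additive form and `weight` are assumptions."""
    return cross_entropy(scores, label) + weight * contrastive
```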

[0151] The above-mentioned contrastive loss is obtained through the following steps:

[0152] Calculate the first text distance between the first text sample and the enhanced text;

[0153] Calculate the second text distances between the first text sample and the other text samples;

[0154] Divide the first text distance by the sum of the second text distances and the first text distance, and use the resulting quotient as the contrastive loss.

[0155] In one embodiment, the device further includes:

[0156] The output unit 510 is configured to output, through the discriminator, the probability that the combination of two vectors has appeared. The two vectors are the one-hot encoded vector corresponding to the first prompt text and the first discrete vector.

[0157] The determining unit 504 is also used to determine the discrimination loss based on the output probability and the label value indicating whether the vector combination has appeared. The label value is determined by querying the vector combination list, which stores each existing vector combination that has appeared.
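The label lookup and discrimination loss of [0156] and [0157] can be sketched as follows. The sketch makes three assumptions. First, a vector combination can be keyed by the pair (category index of the one-hot prompt vector, codebook index of the discrete vector). Second, the vector combination list is a Python set of such pairs. Third, the discrimination loss is binary cross-entropy. The function names are hypothetical.

```python
import math
from typing import Set, Tuple


def discrimination_target(category: int, code_index: int,
                          seen: Set[Tuple[int, int]]) -> float:
    """Label value (sketch): 1.0 if the (prompt, discrete vector) combination
    is already stored in the vector combination list, else 0.0."""
    return 1.0 if (category, code_index) in seen else 0.0


def discrimination_loss(prob: float, label: float) -> float:
    """Binary cross-entropy between the discriminator's output probability
    and the label value (BCE is an assumed choice of loss)."""
    eps = 1e-12
    p = min(max(prob, eps), 1.0 - eps)  # keep log() finite
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))
```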

[0158] Training unit 506 is specifically used for:

[0159] A generative model is trained based on a combined loss of reconstruction loss and discrimination loss.

[0160] In one embodiment, the training unit 506 is also used to update the parameters of the discriminator based on the discriminative loss.

[0161] In one embodiment, the first text sample does not have a category label, and the device further includes:

[0162] The acquisition unit 512 is used to acquire the score corresponding to the first category;

[0163] Judgment unit 514 is used to determine whether the obtained score is greater than the score threshold;

[0164] The filtering unit 516 is used to remove the first text sample from the current sample set when the score is greater than the score threshold, so as to obtain the remaining sample set for the next round of iteration.

[0165] The functions of each functional module of the apparatus in the above embodiments of this specification can be implemented through the steps of the above method embodiments. Therefore, the specific working process of the apparatus provided in one embodiment of this specification will not be repeated here.

[0166] The apparatus for training a generative model provided in one embodiment of this specification yields a trained generative model that can generate high-quality training samples for a text classifier.

[0167] Corresponding to the method for generating training samples for a text classifier described above, one embodiment of this specification also provides an apparatus for generating training samples for a text classifier. As shown in Figure 6, the apparatus may include:

[0168] Acquisition unit 602 is configured to acquire the generative model trained through the method steps shown in Figure 2.

[0169] The determining unit 604 is used to select the target prompt text corresponding to the target category from the N category prompt texts corresponding to the N categories for any target category among the preset N categories, and determine the corresponding target discrete vector in the target vector space.

[0170] The input unit 606 is used to input the target prompt text and the target discrete vector into the decoder of the generative model to obtain the output text.

[0171] Forming unit 608 is used to form the first training sample based on the output text and the target category for training the text classifier.

[0172] In one embodiment, the second encoder in the above-mentioned generative model is provided with a codebook, which includes multiple discrete vectors;

[0173] Determining unit 604 is specifically configured to:

[0174] sequentially determine each of the multiple discrete vectors in the codebook as the target discrete vector.

[0175] The functions of each functional module of the apparatus in the above embodiments of this specification can be implemented through the steps of the above method embodiments. Therefore, the specific working process of the apparatus provided in one embodiment of this specification will not be repeated here.

[0176] This specification provides an embodiment of an apparatus for generating training samples for a text classifier, which can generate high-quality training samples.

[0177] According to another embodiment, a computer-readable storage medium is also provided, on which a computer program is stored. When the computer program is executed in a computer, it causes the computer to perform the method described in connection with Figure 2 or Figure 4.

[0178] According to another embodiment, a computing device is also provided, including a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, it implements the method described in connection with Figure 2 or Figure 4.

[0179] The various embodiments in this specification are described in a progressive manner. Similar or identical parts between embodiments can be referred to mutually. Each embodiment focuses on describing the differences from other embodiments. In particular, the device embodiments are basically similar to the method embodiments, so the description is relatively simple; relevant parts can be referred to the descriptions of the method embodiments.

[0180] The steps of the methods or algorithms described in conjunction with the disclosure in this specification can be implemented in hardware or by a processor executing software instructions. The software instructions can consist of corresponding software modules, which can be stored in RAM, flash memory, ROM, EPROM, EEPROM, registers, hard disk, external hard disk, CD-ROM, or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor, enabling the processor to read information from and write information to the storage medium. Of course, the storage medium can also be a component of the processor. The processor and storage medium can reside in an ASIC. Alternatively, the ASIC can reside in a server. Of course, the processor and storage medium can also exist as discrete components in the server.

[0181] Those skilled in the art will recognize that, in one or more of the examples above, the functions described in this invention can be implemented using hardware, software, firmware, or any combination thereof. When implemented in software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, wherein communication media include any medium that facilitates the transfer of a computer program from one place to another. Storage media can be any available medium accessible to a general-purpose or special-purpose computer.

[0182] The foregoing has described specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired result. In some embodiments, multitasking and parallel processing are possible or may be advantageous.

[0183] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of this specification. It should be understood that the above description is only a specific embodiment of this specification and is not intended to limit the scope of protection of this specification. Any modifications, equivalent substitutions, improvements, etc., made on the basis of the technical solution of this specification should be included within the scope of protection of this specification.

Claims

1. A method for training a generative model, the generative model comprising a first encoder, a second encoder, and a decoder, the method comprising: performing first processing and second processing, respectively, on a first text sample; wherein the first processing includes: determining a semantic vector of the first text sample using the first encoder; predicting a first category of the first text sample based on the semantic vector using a text classifier; and constructing a first prompt text corresponding to the first category; and the second processing includes: encoding the first text sample using the second encoder to obtain a first encoded vector; and querying, based on the first encoded vector, a codebook provided for the second encoder to obtain a first discrete vector corresponding to the first text sample in a target vector space; determining, by the decoder, a reconstructed text of the first text sample based on the first prompt text and the first discrete vector; and training the generative model based on a reconstruction loss, which is determined based on the first text sample and the reconstructed text.

2. The method according to claim 1, wherein obtaining the first encoded vector includes: calculating the average of the feature vectors output by the second encoder, and determining the resulting mean vector as the first encoded vector.

3. The method according to claim 1, wherein querying the codebook includes: calculating the vector distance between the first encoded vector and each discrete vector in the codebook; and determining the discrete vector corresponding to the minimum vector distance as the first discrete vector.

4. The method according to claim 1, wherein training the generative model includes: updating the parameters of the decoder based on the reconstruction loss; and updating the parameters of the second encoder and the codebook based on the combined loss of the reconstruction loss and a quantization loss, the quantization loss being determined based on the vector distance between the first encoded vector and the first discrete vector.

5. The method according to claim 1, wherein the first text sample has a category label, and the method further includes: determining a classification loss based on the first category and the category label; and updating the parameters of the text classifier and the first encoder based on the classification loss.

6. The method according to claim 5, wherein updating the parameters of the text classifier and the first encoder includes: updating the parameters of the text classifier and the first encoder based on the combined loss of the classification loss and a contrastive loss; the contrastive loss is determined based on positive sample pairs consisting of the first text sample and the corresponding enhanced text, and negative sample pairs consisting of the first text sample and other text samples in the current sample set.

7. The method according to claim 6, wherein the contrastive loss is obtained through the following steps: calculating a first text distance between the first text sample and the enhanced text; calculating second text distances between the first text sample and the other text samples; and dividing the first text distance by the sum of the second text distances and the first text distance, and using the resulting quotient as the contrastive loss.

8. The method according to claim 1, further comprising: outputting, by a discriminator, the probability that a vector combination has appeared, the vector combination consisting of the one-hot encoded vector corresponding to the first prompt text and the first discrete vector; and determining a discrimination loss based on the probability and a label value indicating whether the vector combination has appeared, the label value being determined by querying a vector combination list that stores each vector combination that has already appeared; wherein training the generative model includes: training the generative model based on the combined loss of the reconstruction loss and the discrimination loss.

9. The method according to claim 8, further comprising: updating the parameters of the discriminator based on the discrimination loss.

10. The method according to claim 1, wherein the first text sample does not have a category label, and the method further includes: obtaining the score corresponding to the first category; determining whether the score is greater than a score threshold; and if the score is greater than the score threshold, removing the first text sample from the current sample set and using the remaining sample set for the next iteration.

11. A method for generating training samples for a text classifier, comprising: obtaining the generative model trained according to claim 1; for any target category among N preset categories, selecting the target prompt text corresponding to the target category from the N category prompt texts corresponding to the N categories, and determining a corresponding target discrete vector in the target vector space; inputting the target prompt text and the target discrete vector into the decoder of the generative model to obtain an output text; and forming, based on the output text and the target category, a first training sample for training the text classifier.

12. The method according to claim 11, wherein the second encoder in the generative model is provided with a codebook that includes multiple discrete vectors, and determining the corresponding target discrete vector in the target vector space includes: sequentially determining each of the multiple discrete vectors in the codebook as the target discrete vector.

13. An apparatus for training a generative model, the generative model comprising a first encoder, a second encoder, and a decoder, the apparatus comprising: a processing unit configured to perform first processing and second processing, respectively, on a first text sample, the processing unit including: a first processing submodule configured to determine a semantic vector of the first text sample using the first encoder, predict a first category of the first text sample based on the semantic vector using a text classifier, and construct a first prompt text corresponding to the first category; and a second processing submodule configured to encode the first text sample using the second encoder to obtain a first encoded vector, and to query, based on the first encoded vector, a codebook provided for the second encoder to obtain a first discrete vector corresponding to the first text sample in a target vector space; a determining unit configured to determine, using the decoder, a reconstructed text of the first text sample based on the first prompt text and the first discrete vector; and a training unit configured to train the generative model based on a reconstruction loss, which is determined based on the first text sample and the reconstructed text.

14. The apparatus according to claim 13, wherein the training unit includes: a first update submodule configured to update the parameters of the decoder based on the reconstruction loss; and a second update submodule configured to update the parameters of the second encoder and the codebook based on the combined loss of the reconstruction loss and a quantization loss, the quantization loss being determined based on the vector distance between the first encoded vector and the first discrete vector.

15. The apparatus according to claim 13, wherein the first text sample has a category label, and the apparatus further includes an update unit; the determining unit is further configured to determine a classification loss based on the first category and the category label; and the update unit is configured to update the parameters of the text classifier and the first encoder based on the classification loss.

16. The apparatus according to claim 15, wherein the update unit is specifically configured to update the parameters of the text classifier and the first encoder based on the combined loss of the classification loss and a contrastive loss; the contrastive loss is determined based on positive sample pairs consisting of the first text sample and the corresponding enhanced text, and negative sample pairs consisting of the first text sample and other text samples in the current sample set.

17. The apparatus according to claim 16, wherein the contrastive loss is obtained through the following steps: calculating a first text distance between the first text sample and the enhanced text; calculating second text distances between the first text sample and the other text samples; and dividing the first text distance by the sum of the second text distances and the first text distance, and using the resulting quotient as the contrastive loss.

18. The apparatus of claim 13, further comprising: an output unit configured to output, through a discriminator, the probability that a vector combination has appeared, the vector combination consisting of the one-hot encoded vector corresponding to the first prompt text and the first discrete vector; wherein the determining unit is further configured to determine a discrimination loss based on the probability and a label value indicating whether the vector combination has appeared, the label value being determined by querying a vector combination list that stores each vector combination that has already appeared; and the training unit is specifically configured to train the generative model based on the combined loss of the reconstruction loss and the discrimination loss.

19. The apparatus according to claim 13, wherein the first text sample does not have a category label, and the apparatus further includes: an acquisition unit configured to acquire the score corresponding to the first category; a judgment unit configured to determine whether the score is greater than a score threshold; and a filtering unit configured to remove the first text sample from the current sample set when the score is greater than the score threshold, so as to obtain a remaining sample set for the next iteration.

20. An apparatus for generating training samples for a text classifier, comprising: an acquisition unit configured to acquire the generative model trained according to claim 1; a determining unit configured to, for any target category among N preset categories, select the target prompt text corresponding to the target category from the N category prompt texts corresponding to the N categories, and determine a corresponding target discrete vector in the target vector space; an input unit configured to input the target prompt text and the target discrete vector into the decoder of the generative model to obtain an output text; and a forming unit configured to form, based on the output text and the target category, a first training sample for training the text classifier.

21. The apparatus according to claim 20, wherein the second encoder in the generative model is provided with a codebook that includes multiple discrete vectors, and the determining unit is specifically configured to sequentially determine each of the multiple discrete vectors in the codebook as the target discrete vector.

22. A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed in a computer, it causes the computer to perform the method according to any one of claims 1-12.

23. A computing device comprising a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, it implements the method of any one of claims 1-12.