Training vector transformation model, method and device for transforming semantic vectors
By selecting target text and training the main network using a generative pre-trained model and an auxiliary network, a vector transformation model is constructed, which solves the problem of inaccurate text transformation variables and achieves more accurate text vector representation, text classification, and similarity calculation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- 阳光保险集团股份有限公司
- Filing Date
- 2023-02-24
- Publication Date
- 2026-06-30
AI Technical Summary
In existing technologies, text-to-vector conversion methods have limitations, failing to accurately represent the semantics of text, resulting in inaccurate vector conversion.
By selecting target text from a text set, generating connecting samples using a generative pre-trained model, and constructing a vector transformation model through training the main network and auxiliary network, the accuracy of text vector representation is improved by achieving collaborative training of self-learning and auxiliary networks.
It improves the accuracy of text vector conversion and enhances the effectiveness of text classification and similarity calculation.
Description
Technical Field
[0001] This application relates to the field of model pre-training, and more specifically, to methods and apparatus for training vector transformation models and for transforming semantic vectors.
Background Technology
[0002] Currently, with the rapid development of artificial intelligence, especially natural language processing (NLP) technology, NLP technology has found more application scenarios. Text vectorization is a core component of NLP algorithms. Traditional text vectorization typically involves extracting the text's inherent features and converting them into text vectors by constructing matrices.
[0003] The above-mentioned text-to-vector conversion method has significant limitations, as it can only represent vectors based on the features of the text itself, leading to inaccurate vector conversion.
[0004] Therefore, how to accurately convert text into text vectors is a technical problem that needs to be solved.
Summary of the Invention
[0005] The purpose of this application is to provide a method for training a vector conversion model. The technical solution of this application can achieve the effect of accurately converting text into text vectors.
[0006] In a first aspect, embodiments of this application provide a method for training a vector conversion model, comprising: selecting target text from a text set according to a preset probability to obtain prefix samples, wherein the prefix samples are target text or non-target text in the text set; inputting the prefix samples into a generative pre-trained model to obtain concatenation samples, wherein the concatenation samples carry a label indicating whether there is a semantic implication relationship between the concatenation samples and the target text, and when the prefix sample is the target text, there is a semantic implication relationship between the concatenation samples and the target text; and training the main network and the auxiliary network using the target text vectors corresponding to the concatenation samples and the target text to obtain a vector conversion model.
[0007] In the above embodiments, this application selects target text with probability and trains the main network and auxiliary network with the connecting text corresponding to the selected prefix samples. Through self-learning and the assistance of the auxiliary network, the main network can learn to analyze the meaning of the text based on the content of the text and the text connection. Then, the obtained vector conversion model can represent the text as a vector, which can achieve the effect of accurately converting the text into a text vector.
[0008] In some embodiments, before selecting target text from the text set according to a preset probability to obtain prefix samples, the method further includes:
[0009] Copy the target text to obtain sample pairs;
[0010] The target text vector is obtained by encoding the sample pairs through the main network.
[0011] In the above embodiments, after constructing sample pairs, the target text vector is directly generated through the main network, which can make the main network more accurate in converting text to vectors during subsequent pre-training.
[0012] In some embodiments, training the main network and the auxiliary network using the connecting samples and the target text vectors corresponding to the target text to obtain a vector transformation model includes:
[0013] The loss of the main network is calculated using the target text vector;
[0014] Input the target text vector and the connecting sample into the auxiliary network to obtain the semantic implication probability that the connecting sample and the target text have a semantic implication relationship;
[0015] The auxiliary network loss is calculated using semantic entailment probability.
[0016] The total loss is obtained by weighted summing of the losses from the main network and the auxiliary network.
[0017] The parameters of the main network and auxiliary network are adjusted based on the total loss to obtain the vector transformation model.
[0018] In the above embodiments, by using an auxiliary network to assist in the training of the main network and training the main network independently, the text vectors obtained by the main network during vector transformation can be made more accurate.
[0019] In some embodiments, inputting the prefix samples into the generative pre-trained model to obtain connecting samples includes:
[0020] The generative pre-trained model extends the prefix sample with continuation content to obtain the expanded text.
[0021] The text up to the first punctuation mark in the expanded text is kept as the connecting sample. When the prefix sample is not the target text, there is no semantic implication relationship between the connecting sample and the target text.
[0022] In the above embodiments, by expanding and filtering the prefix samples, connecting samples can be obtained for training the main network of the subsequent model, resulting in a more accurate vector transformation model.
[0023] In a second aspect, embodiments of this application provide a method for converting semantic vectors, including: obtaining text to be converted; and inputting the text to be converted into a vector conversion model to obtain a semantic vector, wherein the semantic vector represents the meaning of the text to be converted; the vector conversion model is obtained by training a main network and an auxiliary network using concatenation samples and the target text vectors corresponding to the target text; the concatenation samples are obtained by inputting prefix samples into a generative pre-trained model; and the prefix samples are obtained by selecting target text from a text set according to a preset probability.
[0024] In the above embodiments, this application selects target text with probability and trains the main network and auxiliary network with the connecting text corresponding to the selected prefix samples. Through self-learning and the assistance of the auxiliary network, the main network can learn to analyze the meaning of the text to be converted based on the content connecting the text to be converted. Then, the obtained vector conversion model can represent the text to be converted as a vector, which can achieve the effect of accurately converting the text to be converted into a text vector.
[0025] In some embodiments, before obtaining the text to be converted, the method further includes:
[0026] Target text is selected from the text set according to a preset probability to obtain prefix samples, where the prefix samples are either the target text or non-target text in the text set;
[0027] The prefix samples are input into the generative pre-trained model to obtain the connecting samples;
[0028] The main network and the auxiliary network are trained using the connecting samples and the target text vectors corresponding to the target text to obtain the vector transformation model.
[0029] In the above embodiments, this application selects target text with probability and trains the main network and auxiliary network with the connecting text corresponding to the selected prefix samples. Through self-learning and the assistance of the auxiliary network, the main network can learn to analyze the meaning of the text based on the content of the text and the text connection. Then, the obtained vector conversion model can represent the text as a vector, which can achieve the effect of accurately converting the text into a text vector.
[0030] In some embodiments, after inputting the text to be converted into the vector conversion model to obtain the semantic vector, the method further includes:
[0031] The text to be converted is classified based on semantic vectors;
[0032] or
[0033] The similarity probability is obtained by calculating the similarity between the semantic vector and the text vector corresponding to another text.
[0034] In the above embodiments, the vector transformation model obtained by the training method of this application yields more accurate results when performing text classification and similarity calculation.
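The similarity step in [0033] can be pictured with a small sketch. The application does not state here how the similarity probability is computed, so the cosine similarity and the rescaling to [0, 1] below are assumptions, and the function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def similarity_probability(u, v):
    """Rescale cosine similarity from [-1, 1] to [0, 1] (an illustrative choice)."""
    return (cosine_similarity(u, v) + 1.0) / 2.0

# Hypothetical semantic vectors for two texts.
semantic_vector = [0.2, 0.7, -0.1]
other_text_vector = [0.25, 0.6, 0.0]
score = similarity_probability(semantic_vector, other_text_vector)
```

In practice the two vectors would both come from the trained vector conversion model; the short lists above only stand in for them.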
[0035] In a third aspect, embodiments of this application provide an apparatus for training a vector transformation model, comprising:
[0036] The filtering module is used to select target text from the text set according to a preset probability to obtain prefix samples, wherein the prefix samples are either target text or non-target text in the text set;
[0037] The input module is used to input the prefix sample into the generative pre-trained model to obtain the concatenation sample. The concatenation sample carries a label indicating whether there is a semantic relationship between the concatenation sample and the target text. When the prefix sample is the target text, there is a semantic relationship between the concatenation sample and the target text.
[0038] The training module is used to train the main network and the auxiliary network using the concatenation samples and the target text vectors corresponding to the target text, so as to obtain the vector transformation model.
[0039] Optionally, the device further includes:
[0040] The encoding module is used to copy the target text and obtain sample pairs before the filtering module selects the target text from the text set according to a preset probability and obtains the prefix sample;
[0041] The target text vector is obtained by encoding the sample pairs through the main network.
[0042] Optionally, the training module is specifically used for:
[0043] The loss of the main network is calculated using the target text vector;
[0044] Input the target text vector and the connecting sample into the auxiliary network to obtain the semantic implication probability that the connecting sample and the target text have a semantic implication relationship;
[0045] The auxiliary network loss is calculated using semantic entailment probability.
[0046] The total loss is obtained by weighted summing of the losses from the main network and the auxiliary network.
[0047] The parameters of the main network and auxiliary network are adjusted based on the total loss to obtain the vector transformation model.
[0048] Optionally, the input module is specifically used for:
[0049] The generative pre-trained model extends the prefix sample with continuation content to obtain the expanded text.
[0050] The text up to the first punctuation mark in the expanded text is kept as the connecting sample. When the prefix sample is not the target text, there is no semantic implication relationship between the connecting sample and the target text.
[0051] In a fourth aspect, embodiments of this application provide an apparatus for converting semantic vectors, comprising:
[0052] The acquisition module is used to acquire the text to be converted;
[0053] The conversion module is used to input the text to be converted into the vector conversion model to obtain a semantic vector. The semantic vector is used to represent the meaning of the text to be converted. The vector conversion model is obtained by training the main network and the auxiliary network with the target text vector corresponding to the concatenation sample and the target text. The concatenation sample is obtained by inputting the prefix sample into the generative pre-trained model. The prefix sample is obtained by selecting a text from the text set where the target text is located according to a preset probability.
[0054] Optionally, the device further includes:
[0055] The training module is used to, before the acquisition module acquires the text to be converted, select target text from the text set according to a preset probability to obtain prefix samples, wherein the prefix samples are target text or non-target text in the text set;
[0056] input the prefix samples into the generative pre-trained model to obtain the connecting samples; and
[0057] train the main network and the auxiliary network using the connecting samples and the target text vectors corresponding to the target text to obtain the vector transformation model.
[0058] Optionally, the device further includes:
[0059] The application module is used to, after the conversion module inputs the text to be converted into the vector conversion model to obtain the semantic vector, classify the text to be converted based on the semantic vector;
[0060] or
[0061] calculate the similarity between the semantic vector and the text vector corresponding to another text to obtain a similarity probability.
[0062] In a fifth aspect, embodiments of this application provide an electronic device including a processor and a memory, the memory storing computer-readable instructions which, when executed by the processor, perform the steps of the methods provided in the first or second aspect above.
[0063] In a sixth aspect, embodiments of this application provide a readable storage medium having a computer program stored thereon, which, when executed by a processor, performs the steps of the methods provided in the first or second aspect above.
[0064] Other features and advantages of this application will be set forth in the following description and will be apparent in part from the description or may be learned by practicing embodiments of this application. The objectives and other advantages of this application may be realized and obtained by means of the structures particularly pointed out in the written description, claims, and drawings.
Attached Figure Description
[0065] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments of this application will be briefly introduced below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.
[0066] Figure 1 is a flowchart of a method for training a vector transformation model provided in an embodiment of this application;
[0067] Figure 2 is a flowchart of an overall method for training a vector transformation model provided in an embodiment of this application;
[0068] Figure 3 is a flowchart of a method for converting semantic vectors provided in an embodiment of this application;
[0069] Figure 4 is a schematic block diagram of an apparatus for training a vector transformation model provided in an embodiment of this application;
[0070] Figure 5 is a schematic block diagram of an apparatus for converting semantic vectors provided in an embodiment of this application;
[0071] Figure 6 is a schematic block diagram of a device for training a vector conversion model provided in an embodiment of this application;
[0072] Figure 7 is a schematic block diagram of a device for converting semantic vectors provided in an embodiment of this application.
Detailed Implementation
[0073] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. The components of the embodiments of this application described and shown in the accompanying drawings can be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of this application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without inventive effort are within the scope of protection of this application.
[0074] It should be noted that similar reference numerals and letters in the following figures indicate similar items; therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures. Furthermore, in the description of this application, terms such as "first," "second," etc., are used only to distinguish descriptions and should not be construed as indicating or implying relative importance.
[0075] First, some of the terms used in the embodiments of this application will be explained to facilitate understanding by those skilled in the art.
[0076] AI: Artificial Intelligence is a branch of computer science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence.
[0077] NLP: Natural Language Processing. Natural Language Processing is an important field within computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers using natural language.
[0078] Contrastive learning: A self-supervised / unsupervised learning method used to learn general features of a dataset by having the model learn which data points are similar or different without labels.
[0079] Softmax: The normalized exponential function, or Softmax function, is a generalization of the logistic function. It can "compress" a K-dimensional vector z containing arbitrary real numbers into another K-dimensional real vector σ(z), such that each element is in the range (0,1) and the sum of all elements is 1.
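As a small illustration of the Softmax definition above, the following sketch maps a K-dimensional list of real numbers to values in (0, 1) that sum to 1. The function name and the sample input are illustrative only.

```python
import math

def softmax(z):
    """Map a list of real numbers to positive values that sum to 1."""
    # Subtracting the maximum does not change the result but avoids overflow in exp.
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```

Larger inputs receive larger probabilities, and the ordering of the inputs is preserved in the output.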
[0080] This application is applied to the scenario of model pre-training, specifically the scenario of obtaining the final pre-trained model by assisting the training of the main network through an auxiliary network and the pre-training of the main network itself.
[0081] Currently, with the rapid development of artificial intelligence (AI), especially natural language processing (NLP), NLP technology has found more application scenarios. Text vectorization is a core component of NLP algorithms. Traditional text vectorization typically involves extracting the text's inherent features and converting them into text vectors by constructing matrices. This method has significant limitations, as it can only represent text using its own features, leading to inaccurate vector conversion.
[0082] To this end, this application selects target text from a text set according to a preset probability to obtain prefix samples, where the prefix samples are either target text or non-target text in the text set. The prefix samples are then input into a generative pre-trained model to obtain concatenation samples, where each concatenation sample carries a label indicating whether a semantic relationship exists between the concatenation sample and the target text. When the prefix sample is the target text, a semantic relationship exists between the concatenation sample and the target text. The main network and auxiliary network are trained using the target text vectors corresponding to the concatenation samples and the target text to obtain a vector conversion model. By selecting target text with probability and training the main and auxiliary networks with the concatenation texts corresponding to the selected prefix samples, the main network can learn to analyze the meaning of the text based on the content of the text and the text concatenation through self-learning and the assistance of the auxiliary network. The resulting vector conversion model can then represent the text as vectors, achieving an accurate conversion of text into text vectors.
[0083] In this embodiment of the application, the executing entity can be an apparatus for training a vector conversion model within a system for training vector conversion models. In practical applications, this apparatus can be an electronic device such as a terminal device or a server; no limitation is imposed here.
[0084] The method for training a vector transformation model according to embodiments of this application is described in detail below with reference to Figure 1.
[0085] Please refer to Figure 1, which is a flowchart of a method for training a vector transformation model provided in an embodiment of this application. The method for training a vector transformation model shown in Figure 1 includes:
[0086] Step 110: Select target text from the text set according to a preset probability to obtain prefix samples. Prefix samples can be either the target text or non-target text from the text set. Selecting target text from the text set according to a preset probability to obtain prefix samples can be understood as assigning a preset probability to the target text for extraction. For example, if the text set contains both target text and a second text, the probability of the second text being extracted relative to the target text is 1 - the preset probability. That is, if there is a 50% probability of extracting the target text, there is also a 50% probability of extracting the second text. This probability-based extraction method yields uncertain samples, which is beneficial for the subsequent training of the auxiliary network. The text set can contain one target text, and the remaining texts can be considered non-target text. Prefix samples can be either extracted target text or non-target text.
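The probability-based selection in Step 110 can be sketched as follows. This is a minimal illustration, assuming the non-target text is drawn uniformly from the rest of the text set (the application does not say how it is chosen); `select_prefix` and its parameters are hypothetical names.

```python
import random

def select_prefix(target_text, text_set, preset_probability, rng=random):
    """Return (prefix_sample, is_target).

    With probability `preset_probability` the target text is chosen;
    otherwise a non-target text is drawn from the text set.
    """
    non_targets = [t for t in text_set if t != target_text]
    # Fall back to the target text if the set holds nothing else.
    if not non_targets or rng.random() < preset_probability:
        return target_text, True
    # Uniform choice among non-target texts is an assumption of this sketch.
    return rng.choice(non_targets), False

text_set = ["Two dogs are running.", "A man is cooking.", "It is raining."]
rng = random.Random(0)
prefix, is_target = select_prefix(text_set[0], text_set, 0.5, rng)
```

The returned flag records whether the prefix sample is the target text, which is the information the later label on the connecting sample depends on.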
[0087] In some embodiments of this application, before selecting target text from the text set according to a preset probability to obtain prefix samples, the method shown in Figure 1 further includes: copying the target text to obtain sample pairs; and encoding the sample pairs through the main network to obtain the target text vector.
[0088] In the above process, after constructing sample pairs, the main network directly generates target text vectors, which can make the main network more accurate in converting text into vectors during subsequent pre-training.
[0089] Here, a sample pair consists of the target text and a copy of the target text. The main network encodes the text and performs the vector transformation, forming the core of the vector transformation model. The main network is a BERT (Bidirectional Encoder Representations from Transformers) model. For each positive sample pair, the two texts are encoded by the main network under different dropout mask strategies. The encoding of the [CLS] marker at the beginning of the sentence is taken as the output sentence vector, which yields two sentence vector representations. For example, for a text pair $\langle x, x^{+} \rangle$, the encoding results are $h = E(x, m)$ and $h^{+} = E(x^{+}, m^{+})$, where $E$ is the encoder (that is, BERT) and $m$ and $m^{+}$ are different mask strategies.
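The idea of encoding a text and its copy under different dropout masks can be shown with a toy stand-in. The sketch below does not use BERT: `toy_encode` is a hypothetical placeholder that only applies dropout to a fixed feature list, so that the role of the two masks is visible. The dropout rate and the feature values are assumptions.

```python
import random

def dropout(vec, rate, rng):
    """Inverted dropout: zero each element with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in vec]

def toy_encode(features, rate, rng):
    """Stand-in for E(x, m): a fresh dropout mask m is drawn on every call."""
    return dropout(features, rate, rng)

rng = random.Random(0)
x = [0.5, -1.2, 0.3, 0.8, -0.1, 1.1, 0.7, -0.4]  # hypothetical features of the target text
x_pos = list(x)                                  # the copy of the target text
h = toy_encode(x, 0.1, rng)          # first view, mask m
h_pos = toy_encode(x_pos, 0.1, rng)  # second view, mask m+
```

Because each call draws its own mask, the two vectors for identical input generally differ, which is what makes the pair usable as a positive pair for contrastive learning.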
[0090] Step 120: Input the prefix samples into the generative pre-trained model to obtain the connecting samples.
[0091] The concatenation sample carries a label indicating whether there is a semantic relationship between it and the target text. When the prefix sample is the target text, a semantic relationship exists between the concatenation sample and the target text. The pre-trained model can be a GPT (Generative Pre-Training) model, which outputs subsequent parts of the text to obtain the subsequent concatenation sample. The label indicates whether a semantic relationship exists between the concatenation sample and the target text. When the prefix sample is not the target text, no semantic relationship exists between the concatenation sample and the target text. In other words, the concatenation sample is not a concatenation sample of the target text, and there is no semantic relationship between them.
[0092] In some embodiments of this application, the prefix sample is input into a generative pre-trained model to obtain a connecting sample, including: expanding the connecting content of the prefix sample through the generative pre-trained model to obtain expanded text; filtering the text before the first punctuation mark of the expanded text to obtain a connecting sample, wherein when the prefix sample is not the target text, the connecting sample and the target text do not have a semantic implication relationship.
[0093] In the above process, by expanding and filtering the prefix samples, this application obtains connecting samples, which are used to train the main network and yield a more accurate vector transformation model. For example, when expanding the continuation content, inputting "Two dogs are running." can yield "They have been running very well. We got them to this place last week when they did get out, she said," and filtering then yields the connecting text "They have been running very well."
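The filtering step can be sketched with a regular expression that keeps the text up to and including the first punctuation mark. The set of characters treated as punctuation is an assumption, and `first_clause` is a hypothetical helper name; the expanded text is the example from the paragraph above rather than real model output.

```python
import re

# Characters treated as "punctuation marks" here; this set is an assumption.
_PUNCTUATION = re.compile(r"[.!?,;:。！？，；：]")

def first_clause(expanded_text):
    """Keep the expanded text up to and including its first punctuation mark."""
    text = expanded_text.strip()
    match = _PUNCTUATION.search(text)
    # If no punctuation is found, the whole text is kept.
    return text if match is None else text[: match.end()]

expanded = ("They have been running very well. We got them to this place "
            "last week when they did get out, she said,")
connecting_sample = first_clause(expanded)
# connecting_sample == "They have been running very well."
```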
[0094] Step 130: Train the main network and the auxiliary network using the connecting samples and the target text vectors corresponding to the target text to obtain the vector transformation model.
[0095] The auxiliary network assists in training the main network, improving the accuracy of vector transformation, and judges semantic implication between texts. When the prefix sample is the target text, the auxiliary network should determine that a semantic implication relationship exists between the connecting sample and the target text. When the prefix sample is not the target text, the auxiliary network should determine that no semantic implication relationship exists between the connecting sample and the target text. The auxiliary network can include a 6-layer Transformer (encoding layer) structure; the first output vector is taken as the semantic implication representation and is passed through a fully connected layer and softmax to obtain the probability that a semantic implication relationship exists.
[0096] In some embodiments of this application, a vector transformation model is obtained by training the main network and the auxiliary network using the target text vectors corresponding to the connecting samples and the target text. This includes: calculating the loss of the main network using the target text vectors; inputting the target text vectors and connecting samples into the auxiliary network to obtain the semantic implication probability that the connecting samples and the target text have a semantic implication relationship; calculating the loss of the auxiliary network using the semantic implication probability; weighted summing of the main network loss and the auxiliary network loss to obtain the total loss; and adjusting the parameters of the main network and the auxiliary network according to the total loss to obtain the vector transformation model.
[0097] In the above process, this application improves the accuracy of the text vectors obtained by the main network during vector transformation by using an auxiliary network to assist in the training of the main network and training the main network independently.
[0098] The total loss can be calculated using the following formula:
[0099]
[0100]
[0101]
[0102] Here, $L_{cl}$ and $L_{ce}$ are the main network loss and the auxiliary network loss, respectively; $L$ is the total loss; $\mathrm{sim}(\cdot,\cdot)$ denotes cosine similarity; $\tau$ is the temperature coefficient (a hyperparameter); $o_i$ is the auxiliary network output; $\lambda$ is a hyperparameter with $0 < \lambda < 0.1$; $i$ indexes the i-th text; $h$ denotes the target text vector and $h^{+}$ the vector of the other sample in the sample pair; and $N$ is the size of the text set (batch), which contains $N$ text samples.
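The symbol definitions above, together with the InfoNCE and cross-entropy losses named for the main and auxiliary networks elsewhere in this description, are consistent with a loss of the following form. This is a sketch under those assumptions, not the formula as filed: the averaging over $N$, the label symbol $y_i$ (1 if the connecting sample entails the target text, 0 otherwise), and the placement of $\lambda$ on the auxiliary term are all assumed.

```latex
L_{cl} = -\frac{1}{N}\sum_{i=1}^{N}
         \log\frac{\exp\!\big(\mathrm{sim}(h_i, h_i^{+})/\tau\big)}
                  {\sum_{j=1}^{N}\exp\!\big(\mathrm{sim}(h_i, h_j^{+})/\tau\big)}

L_{ce} = -\frac{1}{N}\sum_{i=1}^{N}
         \Big[\, y_i \log o_i + (1 - y_i)\log(1 - o_i) \Big]

L = L_{cl} + \lambda\, L_{ce}, \qquad 0 < \lambda < 0.1
```

With $\lambda$ below 0.1, the contrastive term dominates and the auxiliary term acts as a small correction.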
[0103] In the process shown in Figure 1 above, this application selects target text from a text set according to a preset probability to obtain prefix samples, where the prefix samples are either target text or non-target text in the text set. The prefix samples are then input into a generative pre-trained model to obtain concatenation samples, where each concatenation sample carries a label indicating whether there is a semantic relationship between the concatenation sample and the target text. When the prefix sample is the target text, a semantic relationship exists between the concatenation sample and the target text. The main network and auxiliary network are trained using the concatenation samples and the target text vectors corresponding to the target text to obtain a vector conversion model. By selecting target text with probability and training the main network and auxiliary network with the concatenation texts corresponding to the selected prefix samples, the main network can learn to analyze the meaning of the text based on the content of the text and the text concatenation through self-learning and the assistance of the auxiliary network. The resulting vector conversion model can then represent the text as vectors, achieving an accurate conversion of text into text vectors.
[0104] The overall method for training a vector transformation model according to embodiments of this application is described in detail below with reference to Figure 2.
[0105] Please refer to Figure 2, which is a flowchart of an overall method for training a vector transformation model provided in an embodiment of this application. The overall method shown in Figure 2 includes:
[0106] Input training samples: where training samples can be a set of target text and non-target text.
[0107] Construct positive sample pairs for the main network: replicate training samples to obtain positive sample pairs.
[0108] Main encoder network: also known as the main network, is used to encode training samples to obtain corresponding vectors.
[0109] Output sentence vector: Output the corresponding vector of the training sample.
[0110] Constructing prefix samples for the GPT model: Extracting target text from the training samples according to a preset probability.
[0111] Fixed-parameter GPT model: Input the prefix samples into the GPT model to obtain the connecting samples.
[0112] Input auxiliary network: The input includes the connecting samples and the output sentence vectors, and the auxiliary network determines whether semantic implication holds.
[0113] Calculate InfoNCE loss: Calculate the main network loss.
[0114] Calculate cross entropy loss: Calculate the auxiliary network loss.
[0115] Weighted summation of losses from the two networks: The total loss is obtained by weighted summation of the losses from the main network and the auxiliary network.
[0116] Backpropagation optimizes model parameters: Adjust the model parameters based on the total loss to complete the model pre-training.
[0117] In addition, for the specific methods and steps shown in Figure 2, refer to the method shown in Figure 1; they are not repeated here.
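The loss steps of Figure 2 (InfoNCE for the main network, cross-entropy for the auxiliary network, then a weighted sum) can be sketched in plain Python. This is a minimal illustration rather than the training code of the application: the batch averaging, the binary form of the cross-entropy, the default values of `tau` and `lam`, and all function names are assumptions, and a real implementation would compute gradients with a deep learning framework.

```python
import math

def cosine(u, v):
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(h, h_pos, tau):
    """Mean over i of -log softmax_j(sim(h_i, h_pos_j) / tau) evaluated at j = i."""
    n = len(h)
    total = 0.0
    for i in range(n):
        logits = [cosine(h[i], h_pos[j]) / tau for j in range(n)]
        top = max(logits)  # log-sum-exp with a shift for numerical stability
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denominator - logits[i]
    return total / n

def cross_entropy(probs, labels, eps=1e-12):
    """Binary cross-entropy between entailment probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

def total_loss(h, h_pos, probs, labels, tau=0.05, lam=0.05):
    """Weighted sum of the main (contrastive) and auxiliary (entailment) losses."""
    return info_nce(h, h_pos, tau) + lam * cross_entropy(probs, labels)

# Hypothetical batch of two texts: sentence vectors for each text and its copy,
# auxiliary-network entailment probabilities, and the labels from prefix selection.
h = [[1.0, 0.0], [0.0, 1.0]]
h_pos = [[0.9, 0.1], [0.1, 0.9]]
probs = [0.8, 0.3]
labels = [1, 0]
loss = total_loss(h, h_pos, probs, labels)
```

The contrastive term is small when each vector is closest to the vector of its own copy, and it grows when the pairs are mismatched.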
[0118] The following is combined Figure 3 The method for converting semantic vectors according to embodiments of this application will be described in detail.
[0119] Please refer to Figure 3 , Figure 3 A flowchart of a method for converting semantic vectors provided in this application embodiment is shown below. Figure 3 The methods for transforming semantic vectors shown include:
[0120] Step 310: Obtain the text to be converted.
[0121] In some embodiments of this application, before obtaining the text to be converted, the method shown in Figure 3 further includes: selecting target text from the text set according to a preset probability to obtain prefix samples, wherein the prefix samples are target text or non-target text in the text set; inputting the prefix samples into a generative pre-trained model to obtain concatenation samples; and training the main network and the auxiliary network with the concatenation samples and the target text vectors corresponding to the target text to obtain a vector transformation model.
[0122] In the above process, this application selects target text with probability and trains the main network and auxiliary network with the connecting text corresponding to the selected prefix samples. Through self-learning and the assistance of the auxiliary network, the main network learns to analyze the meaning of the text based on the content of the text and the text connection. Then, the obtained vector conversion model can represent the text as a vector, which can achieve the effect of accurately converting the text into a text vector.
[0123] Step 320: Input the text to be converted into the vector conversion model to obtain a semantic vector.
[0124] Here, the semantic vector is used to represent the meaning of the text to be converted. The vector conversion model is obtained by training the main network and the auxiliary network with the concatenation samples and the target text vectors corresponding to the target text. The concatenation samples are obtained by inputting the prefix samples into the generative pre-trained model. The prefix samples are obtained by selecting the target text from the text set according to a preset probability.
[0125] In some embodiments of this application, after the text to be converted is input into the vector conversion model to obtain the semantic vector, the method shown in Figure 3 further includes: classifying the text to be converted based on the semantic vector; or calculating the similarity between the semantic vector and the text vector corresponding to another text to obtain the similarity probability.
[0126] In the above process, the vector transformation model obtained by the training method of this application yields more accurate results in text classification and similarity calculation.
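The similarity calculation mentioned above can be sketched as follows. This application does not specify the similarity measure; cosine similarity and the linear rescaling to the range [0, 1] used here are assumptions for illustration, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two semantic vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_u * norm_v)


def similarity_probability(u, v):
    """Map cosine similarity from [-1, 1] to [0, 1] (assumed rescaling)."""
    return (cosine_similarity(u, v) + 1.0) / 2.0


if __name__ == "__main__":
    query = [0.2, 0.9, 0.1]
    other = [0.1, 0.8, 0.3]
    print(similarity_probability(query, other))
```

A classifier could consume the same semantic vectors as input features; that variant is not shown here.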
[0127] In the process shown in Figure 1 above, by selecting target text according to a probability and training the main network and the auxiliary network with the connecting text corresponding to the selected prefix sample, the main network learns, through self-learning and with the assistance of the auxiliary network, to analyze the meaning of the text to be converted based on the content of that text and the text connected to it. The resulting vector conversion model can then represent the text to be converted as a vector, so that the text to be converted is accurately converted into a text vector.
[0128] The foregoing describes, with reference to Figures 1-3, the methods for training a vector transformation model and for converting semantic vectors. The following describes, with reference to Figures 4-7, the apparatuses for training a vector transformation model and for converting semantic vectors.
[0129] Please refer to Figure 4, which is a schematic block diagram of an apparatus 400 for training a vector conversion model provided in an embodiment of this application. The apparatus 400 can be a module, program segment, or code on an electronic device. The apparatus 400 corresponds to the method embodiment of Figure 1 above and can execute the steps involved in that method embodiment. For the specific functions of the apparatus 400, refer to the description below. To avoid repetition, detailed descriptions are omitted here.
[0130] Optionally, the device 400 includes:
[0131] The filtering module 410 is used to select target text from the text set according to a preset probability to obtain prefix samples, wherein the prefix samples are either target text or non-target text in the text set;
[0132] The input module 420 is used to input the prefix sample into the generative pre-trained model to obtain the concatenation sample. The concatenation sample carries a label indicating whether there is a semantic relationship between the concatenation sample and the target text. When the prefix sample is the target text, there is a semantic relationship between the concatenation sample and the target text.
[0133] The training module 430 is used to train the main network and the auxiliary network using the connecting samples and the target text vectors corresponding to the target text to obtain the vector transformation model.
[0134] Optionally, the device further includes:
[0135] The encoding module is used to copy the target text to obtain sample pairs before the filtering module selects target text from the text set according to a preset probability and obtains prefix samples; and to encode the sample pairs through the main network to obtain the target text vector.
[0136] Optionally, the training module is specifically used for:
[0137] The main network loss is calculated using the target text vector; the target text vector and the connecting sample are input into the auxiliary network to obtain the semantic implication probability that the connecting sample and the target text have a semantic implication relationship; the auxiliary network loss is calculated using the semantic implication probability; the main network loss and the auxiliary network loss are weighted and summed to obtain the total loss; the parameters of the main network and the auxiliary network are adjusted according to the total loss to obtain the vector transformation model.
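The auxiliary network loss and the weighted summation described above can be sketched as follows. The sketch assumes a binary cross-entropy over the semantic implication probabilities; the weight values and the function names are hypothetical, since this application does not state specific weights.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        # Clamp to avoid log(0) for saturated predictions.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def total_loss(main_loss, aux_loss, main_weight=1.0, aux_weight=0.5):
    """Weighted sum of the main and auxiliary network losses (weights assumed)."""
    return main_weight * main_loss + aux_weight * aux_loss


if __name__ == "__main__":
    # Semantic implication probabilities from the auxiliary network and
    # the labels carried by the connecting samples (illustrative values).
    aux = binary_cross_entropy([0.9, 0.2, 0.7], [1, 0, 1])
    print(total_loss(main_loss=0.35, aux_loss=aux))
```

In training, the gradient of this total loss would be backpropagated to adjust the parameters of both networks, while the generative pre-trained model keeps fixed parameters as noted for Figure 2.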
[0138] Optionally, the input module is specifically used for:
[0139] The prefix sample is expanded by a generative pre-trained model to obtain the expanded text; the text before the first punctuation mark in the expanded text is selected to obtain the connecting sample. When the prefix sample is not the target text, the connecting sample and the target text have no semantic implication relationship.
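The truncation step above, which keeps only the text before the first punctuation mark, can be sketched in Python. The sketch assumes that the input is the continuation text generated by the model and that the listed ASCII and full-width marks count as punctuation; both choices, and the function name `first_clause`, are assumptions for illustration.

```python
import re

# Punctuation marks treated as clause boundaries (an assumed set).
_PUNCTUATION = re.compile(r"[,.;:!?，。；：！？、]")


def first_clause(expanded_text):
    """Return the text before the first punctuation mark, stripped of spaces.

    If no punctuation mark is found, the whole text is returned.
    """
    match = _PUNCTUATION.search(expanded_text)
    clause = expanded_text[:match.start()] if match else expanded_text
    return clause.strip()


if __name__ == "__main__":
    print(first_clause("so the policy was renewed, and the premium fell."))
    print(first_clause("所以保单续期了，保费也下降了。"))
```

Because an ASCII period is treated as a boundary, this sketch would also cut at decimal numbers or abbreviations; a production implementation would need a more careful rule.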
[0140] Please refer to Figure 5, which is a schematic block diagram of a semantic vector conversion device 500 provided in an embodiment of this application. The device 500 can be a module, program segment, or code on an electronic device. The device 500 corresponds to the method embodiment of Figure 3 above and can execute the steps involved in that method embodiment. For the specific functions of the device 500, refer to the description below. To avoid repetition, detailed descriptions are omitted here as appropriate.
[0141] Optionally, the device 500 includes:
[0142] The acquisition module 510 is used to acquire the text to be converted;
[0143] The conversion module 520 is used to input the text to be converted into the vector conversion model to obtain a semantic vector. The semantic vector is used to represent the meaning of the text to be converted. The vector conversion model is obtained by training the main network and the auxiliary network with the concatenation samples and the target text vectors corresponding to the target text. The concatenation samples are obtained by inputting the prefix samples into the generative pre-trained model. The prefix samples are obtained by selecting a text, according to a preset probability, from the text set in which the target text is located.
[0144] Optionally, the device further includes:
[0145] The training module is used to select target text from the text set according to a preset probability before the acquisition module acquires the text to be converted, and obtain prefix samples, wherein the prefix samples are target text or non-target text in the text set; input the prefix samples into the generative pre-trained model to obtain concatenation samples; and train the main network and the auxiliary network with the concatenation samples and the target text vector corresponding to the target text to obtain the vector conversion model.
[0146] Optionally, the device further includes:
[0147] The application module is used, after the conversion module inputs the text to be converted into the vector conversion model to obtain the semantic vector, to classify the text to be converted based on the semantic vector; or to calculate the similarity between the semantic vector and the text vector corresponding to another text to obtain the similarity probability.
[0148] Please refer to Figure 6, which is a schematic block diagram of a device for training a vector transformation model provided in an embodiment of this application. The device may include a memory 610 and a processor 620. Optionally, the device may further include a communication interface 630 and a communication bus 640. This device corresponds to the method embodiment of Figure 1 above and can execute the steps involved in that method embodiment. For the specific functions of the device, refer to the description below.
[0149] Specifically, memory 610 is used to store computer-readable instructions.
[0150] The processor 620 is used to process the readable instructions stored in the memory and is capable of executing each step of the method in Figure 1.
[0151] The communication interface 630 is used for signaling or data communication with other node devices. For example, it is used for communication with a server or terminal, or for communication with other device nodes, but the embodiments of this application are not limited thereto.
[0152] Communication bus 640 is used to enable direct communication between the above components.
[0153] In this embodiment, the communication interface 630 of the device is used for signaling or data communication with other node devices. The memory 610 can be high-speed RAM or non-volatile memory, such as at least one disk storage device. Optionally, the memory 610 can also be at least one storage device located remotely from the aforementioned processor. The memory 610 stores computer-readable instructions which, when executed by the processor 620, cause the electronic device to perform the method process shown in Figure 1 above. The processor 620 can be used in the apparatus 400 to perform the functions described in this application. By way of example, the processor 620 can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component; the embodiments of this application are not limited thereto.
[0154] Please refer to Figure 7, which is a schematic block diagram of a device for converting semantic vectors according to an embodiment of this application. The device may include a memory 710 and a processor 720. Optionally, the device may further include a communication interface 730 and a communication bus 740. This device corresponds to the method embodiment of Figure 3 above and can execute the steps involved in that method embodiment. For the specific functions of the device, refer to the description below.
[0155] Specifically, memory 710 is used to store computer-readable instructions.
[0156] The processor 720 is used to process the readable instructions stored in the memory and is capable of executing each step of the method in Figure 3.
[0157] The communication interface 730 is used for signaling or data communication with other node devices. For example, it is used for communication with a server or terminal, or for communication with other device nodes, but the embodiments of this application are not limited thereto.
[0158] Communication bus 740 is used to enable direct communication between the above components.
[0159] In this embodiment, the communication interface 730 of the device is used for signaling or data communication with other node devices. The memory 710 can be high-speed RAM or non-volatile memory, such as at least one disk storage device. Optionally, the memory 710 can also be at least one storage device located remotely from the aforementioned processor. The memory 710 stores computer-readable instructions which, when executed by the processor 720, cause the electronic device to perform the method process shown in Figure 3 above. The processor 720 can be used in the device 500 to perform the functions described in this application. By way of example, the processor 720 can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component; the embodiments of this application are not limited thereto.
[0160] An embodiment of this application also provides a readable storage medium storing a computer program which, when executed by a processor, performs the method process executed by the electronic device in the method embodiment shown in Figure 1 or Figure 3.
[0161] Those skilled in the art will understand that, for the sake of convenience and brevity, the specific working process of the device described above can be referred to the corresponding process in the aforementioned method, and will not be elaborated further here.
[0162] In summary, this application provides a method for training a vector conversion model. The method includes: selecting target text from a text set according to a preset probability to obtain prefix samples, wherein the prefix samples are either the target text or non-target text in the text set; inputting the prefix samples into a generative pre-trained model to obtain concatenation samples, wherein each concatenation sample carries a label indicating whether there is a semantic implication relationship between the concatenation sample and the target text, and when the prefix sample is the target text, there is a semantic implication relationship between the concatenation sample and the target text; and training the main network and the auxiliary network using the concatenation samples and the target text vectors corresponding to the target text to obtain a vector conversion model. This method can accurately convert text into text vectors.
[0163] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in a different order than those marked in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram and / or flowchart, and combinations of blocks in block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.
[0164] In addition, the functional modules in the various embodiments of this application can be integrated together to form an independent part, or each module can exist independently, or two or more modules can be integrated to form an independent part.
[0165] If the aforementioned functions are implemented as software functional modules and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0166] The above description is merely an embodiment of this application and is not intended to limit the scope of protection of this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of protection of this application. It should be noted that similar reference numerals and letters in the following figures indicate similar items; therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures.
[0167] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
[0168] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
Claims
1. A method for training a vector transformation model, characterized in that it comprises: selecting target text from a text set according to a preset probability to obtain prefix samples, wherein the prefix samples are either the target text or non-target text in the text set; inputting the prefix samples into a generative pre-trained model to obtain connecting samples, wherein each connecting sample carries a label indicating whether a semantic implication relationship exists between the connecting sample and the target text, and when the prefix sample is the target text, the semantic implication relationship exists between the connecting sample and the target text; and training a main network and an auxiliary network using the connecting samples and the target text vectors corresponding to the target text to obtain a vector transformation model; wherein, before selecting target text from the text set according to the preset probability to obtain prefix samples, the method further comprises: copying the target text to obtain sample pairs; and encoding the sample pairs using the main network to obtain the target text vectors; and wherein training the main network and the auxiliary network using the connecting samples and the target text vectors corresponding to the target text to obtain the vector transformation model comprises: calculating a main network loss using the target text vectors; inputting the target text vectors and the connecting samples into the auxiliary network to obtain a semantic implication probability that the connecting sample and the target text have the semantic implication relationship; calculating an auxiliary network loss using the semantic implication probability; obtaining a total loss as a weighted sum of the main network loss and the auxiliary network loss; and adjusting the parameters of the main network and the auxiliary network based on the total loss to obtain the vector transformation model.
2. The method according to claim 1, characterized in that inputting the prefix samples into the generative pre-trained model to obtain the connecting samples comprises: expanding the prefix samples with connecting content using the generative pre-trained model to obtain expanded text; and selecting the text preceding the first punctuation mark in the expanded text to obtain the connecting sample; wherein, when the prefix sample is not the target text, the connecting sample and the target text do not have the semantic implication relationship.
3. A method for converting semantic vectors, characterized in that it comprises: obtaining text to be converted; and inputting the text to be converted into a vector conversion model to obtain a semantic vector, wherein the semantic vector is used to represent the meaning of the text to be converted, the vector conversion model is obtained by training a main network and an auxiliary network with connecting samples and the target text vectors corresponding to target text, the connecting samples are obtained by inputting prefix samples into a generative pre-trained model, and the prefix samples are obtained by selecting the target text from a text set according to a preset probability; wherein, before obtaining the text to be converted, the method further comprises: selecting target text from the text set according to the preset probability to obtain the prefix samples, wherein the prefix samples are either the target text or non-target text in the text set; inputting the prefix samples into the generative pre-trained model to obtain the connecting samples; and training the main network and the auxiliary network using the connecting samples and the target text vectors corresponding to the target text to obtain the vector conversion model; and wherein training the main network and the auxiliary network using the connecting samples and the target text vectors corresponding to the target text to obtain the vector conversion model comprises: calculating a main network loss using the target text vectors; inputting the target text vectors and the connecting samples into the auxiliary network to obtain a semantic implication probability that the connecting sample and the target text have a semantic implication relationship; calculating an auxiliary network loss using the semantic implication probability; obtaining a total loss as a weighted sum of the main network loss and the auxiliary network loss; and adjusting the parameters of the main network and the auxiliary network based on the total loss to obtain the vector conversion model.
4. The method according to claim 3, characterized in that, after inputting the text to be converted into the vector conversion model to obtain the semantic vector, the method further comprises: classifying the text to be converted based on the semantic vector; or calculating the similarity between the semantic vector and the text vector corresponding to another text to obtain a similarity probability.
5. An apparatus for training a vector transformation model, characterized in that it comprises: a filtering module, used to select target text from a text set according to a preset probability to obtain prefix samples, wherein the prefix samples are either the target text or non-target text in the text set; an input module, used to input the prefix samples into a generative pre-trained model to obtain connecting samples, wherein each connecting sample carries a label indicating whether a semantic implication relationship exists between the connecting sample and the target text, and when the prefix sample is the target text, the semantic implication relationship exists between the connecting sample and the target text; and a training module, used to train a main network and an auxiliary network using the connecting samples and the target text vectors corresponding to the target text to obtain a vector transformation model; wherein, before selecting target text from the text set according to the preset probability to obtain prefix samples, the following is further included: copying the target text to obtain sample pairs; and encoding the sample pairs using the main network to obtain the target text vectors; and wherein training the main network and the auxiliary network using the connecting samples and the target text vectors corresponding to the target text to obtain the vector transformation model comprises: calculating a main network loss using the target text vectors; inputting the target text vectors and the connecting samples into the auxiliary network to obtain a semantic implication probability that the connecting sample and the target text have the semantic implication relationship; calculating an auxiliary network loss using the semantic implication probability; obtaining a total loss as a weighted sum of the main network loss and the auxiliary network loss; and adjusting the parameters of the main network and the auxiliary network based on the total loss to obtain the vector transformation model.
6. An apparatus for converting semantic vectors, characterized in that it comprises: an acquisition module, used to acquire text to be converted; and a conversion module, used to input the text to be converted into a vector conversion model to obtain a semantic vector, wherein the semantic vector is used to represent the meaning of the text to be converted, the vector conversion model is obtained by training a main network and an auxiliary network with connecting samples and the target text vectors corresponding to target text, the connecting samples are obtained by inputting prefix samples into a generative pre-trained model, and the prefix samples are obtained by selecting a text, according to a preset probability, from the text set in which the target text is located; wherein, before obtaining the text to be converted, the following is further included: selecting target text from the text set according to the preset probability to obtain the prefix samples, wherein the prefix samples are either the target text or non-target text in the text set; inputting the prefix samples into the generative pre-trained model to obtain the connecting samples; and training the main network and the auxiliary network using the connecting samples and the target text vectors corresponding to the target text to obtain the vector conversion model; and wherein training the main network and the auxiliary network using the connecting samples and the target text vectors corresponding to the target text to obtain the vector conversion model comprises: calculating a main network loss using the target text vectors; inputting the target text vectors and the connecting samples into the auxiliary network to obtain a semantic implication probability that the connecting sample and the target text have a semantic implication relationship; calculating an auxiliary network loss using the semantic implication probability; obtaining a total loss as a weighted sum of the main network loss and the auxiliary network loss; and adjusting the parameters of the main network and the auxiliary network based on the total loss to obtain the vector conversion model.
7. An electronic device, characterized in that it comprises: a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, perform the steps of the method according to any one of claims 1-4.