Method and apparatus for generating user description text based on a text generation network

By combining the first encoder, retrieval model, and decoder in the text generation network with a self-attention mechanism and a human knowledge base, logically coherent and well-argued user description text is generated. This solves the problem of poor text quality caused by insufficient training samples in existing technologies and improves the quality of text generation.

CN115358242B · Active · Publication Date: 2026-06-19 · ALIPAY (HANGZHOU) INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ALIPAY (HANGZHOU) INFORMATION TECH CO LTD
Filing Date
2021-02-19
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In existing technologies, when generating user description text based on text generation networks, the number of training samples is small, resulting in poor text quality.

Method used

A text generation network comprising a first encoder, a retrieval model, and a decoder is used to generate user description text. The first encoder produces user feature vectors, which are encoded through a self-attention mechanism into an encoded state vector; a semantic representation vector is generated from sentences retrieved from a human knowledge base; and the decoder generates the user description text. The network is trained using first-type and second-type samples to improve text quality.

Benefits of technology

It improves the quality of user description text, addresses the poor text quality caused by insufficient training samples, and makes the text more logically coherent and better supported.



Abstract

This specification provides a method and apparatus for generating user description text based on a text generation network. The method includes: inputting various features of a target user into a first encoder; obtaining, through the first encoder, initial user feature vectors corresponding to the features; encoding the initial user feature vectors based on a self-attention mechanism to obtain an encoded state vector; inputting the encoded state vector into a retrieval model; retrieving K sentences from a human knowledge base through the retrieval model; determining the character encoding vectors corresponding to the characters contained in the K sentences; determining attention coefficients based on the decoder's output feedback vector and the character encoding vectors; weighting and summing the character encoding vectors based on the attention coefficients to obtain a semantic representation vector; inputting the encoded state vector and the semantic representation vector into the decoder; and generating user description text for the target user through the decoder, with the decoder's hidden state serving as the output feedback vector. This method can improve the quality of the obtained text.

Description

[0001] This application is a divisional application of the invention patent application filed on February 19, 2021, with application number 202110189520.1, entitled "Method and apparatus for generating user description text based on text generation network".

Technical Field

[0002] One or more embodiments of this specification relate to the field of computers, and more particularly to a method and apparatus for generating user description text based on a text generation network.

Background

[0003] Because there is a correlation between a user's characteristics and their user category, a user can be categorized based on their characteristics. User characteristics can include data such as age, education level, and income, while user categories can include multiple pre-defined categories, such as whether a user poses a repayment risk or an illegal fund transfer risk. Typically, simply providing a user's characteristics and category is not convincing. Therefore, after obtaining the user's characteristics, it is necessary to generate a user description text based on them. This user description text includes multiple statements that demonstrate the correlation between the user characteristics and the user category. The requirements for the user description text are that it be logically sound, well-reasoned, concise, and easy to understand—a standard and formal message.

[0004] In existing technologies, user description text is often generated based on text generation networks, that is, user description text is generated through machine learning. However, since the number of training samples used to train the text generation network is small, the text quality obtained by this method is poor.

[0005] Therefore, an improved solution is desired that can enhance the quality of the obtained text.

Summary of the Invention

[0006] One or more embodiments of this specification describe a method and apparatus for generating user description text based on a text generation network, which can improve the quality of the obtained text.

[0007] In a first aspect, a method for generating user description text based on a text generation network is provided, wherein the text generation network includes a first encoder, a retrieval model, and a decoder, and the method includes:

[0008] The various features of the target user are input into the first encoder. The first encoder is used to obtain the initial user feature vectors corresponding to the various features of the target user. The initial user feature vectors are encoded based on the self-attention mechanism to obtain the encoded state vector.

[0009] The encoded state vector is input into the retrieval model, and K sentences are retrieved from the human knowledge base through the retrieval model. The character encoding vectors corresponding to each character contained in the K sentences are determined. The attention coefficients corresponding to each character are determined according to the output feedback vector of the decoder and the character encoding vectors. The character encoding vectors are then weighted and summed according to the attention coefficients to obtain the semantic representation vector corresponding to the K sentences.

[0010] The encoded state vector and the semantic representation vector are input into the decoder, and the decoder generates the user description text of the target user. The hidden state of the decoder is used as the output feedback vector.

[0011] In one possible implementation, the first encoder includes a time-based unidirectional encoding structure. The step of inputting various features of the target user into the first encoder and obtaining initial user feature vectors corresponding to each feature of the target user through the first encoder includes:

[0012] The features of the target user are sequentially used as the inputs of the first encoder at each time step, and the outputs of the first encoder at each time step are used as the initial user feature vectors.

[0013] In one possible implementation, the first encoder includes a time-based bidirectional encoding structure. The step of inputting various features of the target user into the first encoder and obtaining initial user feature vectors corresponding to each feature of the target user through the first encoder includes:

[0014] The various features of the target user are input into the first encoder in a first order, and the first feature vector of each feature is obtained based on the output of the first encoder at each time step.

[0015] The features of the target user are input into the first encoder in the reverse order of the first order, and the second feature vector of each feature is obtained based on the output of the first encoder at each time step.

[0016] The first and second feature vectors corresponding to the same feature are combined to form the initial user feature vector for that feature.

[0017] In one possible implementation, encoding the initial user feature vectors based on a self-attention mechanism includes:

[0018] The weights corresponding to each initial user feature vector are determined, and the initial user feature vectors are weighted and summed according to each weight to obtain the encoded state vector.

[0019] In one possible implementation, the retrieval model includes a second encoder, wherein determining the character encoding vector corresponding to each character contained in the K statements includes:

[0020] Obtain the character embedding vectors corresponding to each character contained in the K statements;

[0021] Each character embedding vector is input into a second encoder, which then determines the character encoding vector corresponding to each character in the K sentences based on an attention mechanism.

[0022] In one possible implementation, the decoder includes a time-based decoding structure. The decoder takes the encoded state vector as the initial state, takes the decoder output of the previous time step and the semantic representation vector output of the retrieval model at the previous time step as the input of the current time step, determines the output and hidden state at the current time step, and feeds the hidden state at the current time step as the output feedback vector of the current time step back to the retrieval model. The output at each time step corresponds to each character in the user description text.

[0023] In one possible implementation, the method further includes:

[0024] At least one of the first encoder, the retrieval model, and the decoder is trained using first-type samples and second-type samples. A first-type sample has the features of a sample user and a binary classification label for that sample user. A second-type sample has the features of a sample user and the sample description text for that sample user. The number of first-type samples is greater than the number of second-type samples.

[0025] Furthermore, the model training includes:

[0026] The first encoder is pre-trained using the first type of samples;

[0027] The second type of samples are used to continue training at least one of the pre-trained first encoder, the retrieval model, and the decoder.

[0028] Furthermore, the model training includes:

[0029] The first type of samples and the second type of samples are mixed together, and their order is randomly shuffled. Then, at least one of the first encoder, the retrieval model, and the decoder is trained in batches according to the shuffled order.
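The mixing and shuffling described in the preceding paragraph can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `mixed_batches`, the `"classification"` / `"description"` tags, and the fixed seed are all assumptions introduced here.

```python
import random


def mixed_batches(first_type, second_type, batch_size, seed=0):
    """Yield batches drawn from both sample types in one shuffled order."""
    # Tag each sample so the training step can tell which loss applies to it.
    samples = [("classification", s) for s in first_type]
    samples += [("description", s) for s in second_type]
    # Shuffle the combined pool so every batch can contain both types.
    random.Random(seed).shuffle(samples)
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


# Hypothetical toy data: six first-type samples, two second-type samples.
for batch in mixed_batches(list(range(6)), ["text-a", "text-b"], batch_size=3):
    print(batch)
```

Because first-type samples outnumber second-type samples, many batches in such a scheme would contain mostly or only classification samples.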

[0030] Furthermore, the model training uses a predefined total loss function to determine the total prediction loss of the first-type and second-type samples in the same batch, and adjusts the parameters of at least one of the first encoder, the retrieval model, and the decoder based on the total prediction loss. The total loss function is jointly determined by a first loss function and a second loss function: the value of the first loss function is determined based on the predicted classification probability of a first-type sample, and the value of the second loss function is determined based on the probabilities that the decoder outputs, for a second-type sample, for each word in the sample description text.
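One way to read the total loss is sketched below. The specification only says the two loss functions jointly determine the total; the negative log-likelihood form, the plain weighted sum, and the `weight` parameter are assumptions made for illustration.

```python
import math


def classification_loss(prob_of_true_label):
    """First loss: negative log-probability assigned to the correct class."""
    return -math.log(prob_of_true_label)


def generation_loss(token_probs):
    """Second loss: summed negative log-probability of each reference word."""
    return -sum(math.log(p) for p in token_probs)


def total_loss(batch, weight=1.0):
    """Combine both losses over one mixed batch (weighted sum is an assumption)."""
    loss = 0.0
    for kind, value in batch:
        if kind == "classification":
            loss += classification_loss(value)
        else:
            loss += weight * generation_loss(value)
    return loss


# Hypothetical batch: one first-type sample and one second-type sample.
batch = [("classification", 0.5), ("description", [0.5, 0.5])]
print(total_loss(batch))
```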

[0031] In a second aspect, an apparatus for generating user description text based on a text generation network is provided. The text generation network includes a first encoder, a retrieval model, and a decoder. The apparatus includes:

[0032] The encoding unit is used to input the various features of the target user into the first encoder, obtain the initial user feature vectors corresponding to the various features of the target user through the first encoder, and encode the initial user feature vectors based on the self-attention mechanism to obtain the encoded state vector.

[0033] The retrieval unit is used to input the encoded state vector obtained by the encoding unit into the retrieval model, retrieve K sentences from the human knowledge base through the retrieval model, determine the character encoding vector corresponding to each character contained in the K sentences, determine the attention coefficients corresponding to each character according to the output feedback vector of the decoder and the character encoding vectors, and perform a weighted summation of the character encoding vectors according to the attention coefficients to obtain the semantic representation vector corresponding to the K sentences.

[0034] The decoding unit is used to input the encoded state vector obtained by the encoding unit and the semantic representation vector obtained by the retrieval unit into the decoder, and generate the user description text of the target user through the decoder, wherein the hidden state of the decoder is used as the output feedback vector.

[0035] In a third aspect, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed in a computer, causes the computer to perform the method of the first aspect.

[0036] In a fourth aspect, a computing device is provided, including a memory and a processor, wherein the memory stores executable code, and the processor executes the executable code to implement the method of the first aspect.

[0037] The method and apparatus provided in the embodiments of this specification first input various features of the target user into a first encoder, and obtain initial user feature vectors corresponding to each feature of the target user through the first encoder. These initial user feature vectors are then encoded based on a self-attention mechanism to obtain an encoded state vector. Next, the encoded state vector is input into a retrieval model, which retrieves K sentences from a human knowledge base. The character encoding vectors corresponding to each character in the K sentences are determined. Attention coefficients corresponding to each character are determined based on the output feedback vector of the decoder and the character encoding vectors. The character encoding vectors are then weighted and summed based on the attention coefficients to obtain the semantic representation vector corresponding to the K sentences. Finally, the encoded state vector and the semantic representation vector are input into a decoder, which generates the user description text of the target user. The hidden state of the decoder serves as the output feedback vector. As can be seen from the above, the embodiments of this specification use not only the initial user feature vectors corresponding to the features of the target user, but also the semantic representation vector corresponding to the K retrieved sentences. Since these K sentences come from a human knowledge base, the method can draw on the human experience most relevant to the target user and can effectively address problems such as repeated words and incorrect words, giving it strong applicability and good text quality. Furthermore, the retrieval model and the decoder influence each other. The semantic representation vector output by the retrieval model serves as an input to the decoder, affecting the user description text generated by the decoder. At the same time, the hidden state of the decoder serves as the output feedback vector, which is an input to the attention mechanism of the retrieval model and affects the semantic representation vector output by the retrieval model, thereby further improving the quality of the obtained text.

Brief Description of the Drawings

[0038] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the following description of the embodiments will be briefly introduced. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0039] Figure 1 is a schematic diagram of an implementation scenario of one embodiment disclosed in this specification;

[0040] Figure 2 is a flowchart of a method for generating user description text based on a text generation network according to one embodiment;

[0041] Figure 3 is a schematic diagram of the structure of a first encoder according to one embodiment;

[0042] Figure 4 is a schematic diagram of the structure of a retrieval model according to one embodiment;

[0043] Figure 5 is a schematic diagram of the structure of a decoder according to one embodiment;

[0044] Figure 6 is a schematic diagram of the shuffled order of training samples according to one embodiment;

[0045] Figure 7 is a schematic block diagram of an apparatus for generating user description text based on a text generation network according to one embodiment.

Detailed Description

[0046] The solution provided in this specification will now be described with reference to the accompanying drawings.

[0047] Figure 1 is a schematic diagram of an implementation scenario of one embodiment disclosed in this specification. This implementation scenario involves generating user description text based on a text generation network. The text generation network, as a neural network model, can be obtained through machine learning. The input to the text generation network consists of various features of the target user. It can be understood that a user can be classified based on their user features. User features can be data such as age, education level, and income. User categories can include multiple predefined categories, such as whether there is a repayment risk or an illegal fund transfer risk. The user description text includes multiple statements that demonstrate the relationship between user features and user categories. The user description text is required to be logically sound, well-reasoned, concise, and easy to understand, in the form of a standard message.

[0048] Referring to Figure 1, the table lists the feature names and corresponding feature values of various features of User A. It can be understood that the target user is User A: the feature "Age" has a feature value of 50 years old, the feature "Education" has a feature value of high school, and so on, and the feature "Annual Income" has a feature value of 30,000 yuan. Based on the feature names and corresponding feature values of User A's features, the generated user description text is "User A is older, has a lower education level, and a lower annual income, therefore there is a repayment risk." In this embodiment of the specification, user features can be, but are not limited to, the user attribute features listed above, such as age, education level, and annual income. They can also include the user's historical behavioral features for a specific application, such as historical loan amounts and whether there have been any delays in repayment. The specific content and generation method of the user description text are usually not fixed. In this embodiment of the specification, user description text is generated by combining expert experience and machine learning. It can be understood that expert experience refers to human experience, and that automatic generation of user description text by machine is highly efficient. The embodiments in this specification use not only the initial user feature vectors corresponding to the features of the target user, but also the semantic representation vector corresponding to the K retrieved sentences. Since these K sentences come from a human knowledge base, the method can draw on the human experience most relevant to the target user and can effectively address problems such as repeated words and incorrect words, giving it strong applicability and good text quality. Furthermore, during decoding, the output feedback vector affects the semantic representation vector output by the retrieval model, thereby further improving the quality of the obtained text.

[0049] Figure 2 is a flowchart of a method for generating user description text based on a text generation network according to one embodiment. The text generation network includes a first encoder, a retrieval model, and a decoder. The method can be based on the implementation scenario shown in Figure 1. As shown in Figure 2, the method for generating user description text based on a text generation network in this embodiment includes the following steps: Step 21, inputting various features of the target user into a first encoder, obtaining initial user feature vectors corresponding to each feature of the target user through the first encoder, and encoding the initial user feature vectors based on a self-attention mechanism to obtain an encoded state vector; Step 22, inputting the encoded state vector into a retrieval model, retrieving K sentences from a human knowledge base through the retrieval model, determining the character encoding vectors corresponding to each character contained in the K sentences, determining the attention coefficients corresponding to each character based on the output feedback vector of the decoder and the character encoding vectors, and performing a weighted summation of the character encoding vectors based on the attention coefficients to obtain the semantic representation vector corresponding to the K sentences; Step 23, inputting the encoded state vector and the semantic representation vector into a decoder, and generating the user description text of the target user through the decoder, with the hidden state of the decoder serving as the output feedback vector. The specific execution of each of the above steps is described below.

[0050] First, in step 21, the various features of the target user are input into the first encoder. The first encoder obtains the initial user feature vectors corresponding to each feature of the target user. These initial user feature vectors are then encoded using a self-attention mechanism to obtain the encoded state vector. It is understood that the first encoder can be based on various model structures, such as transformers, long short-term memory networks (LSTM), or gated recurrent units (GRU).

[0051] In the embodiments described in this specification, the types of the features include: numerical type or text type.

[0052] For example, if user A is 50 years old, then age is a numerical feature, and its feature name is age, with a corresponding feature value of 50. If user A's place of residence is Beijing and Shanghai, then place of residence is a text feature, and its feature name is place of residence, with corresponding feature values of Beijing and Shanghai.

[0053] It can be understood that the type of a feature is also the type of its corresponding feature value.

[0054] In the embodiments of this specification, for a numerical feature, the feature name and its original feature value can be input into the first encoder; for a text feature, the original feature value can first be segmented into multiple word segments, and the feature name and these word segments can then be input into the first encoder.
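The handling of the two feature types can be illustrated with a short sketch. The helper name `feature_to_tokens` is hypothetical, and splitting on commas and whitespace while dropping "and" is only a stand-in for a real word segmenter, which the specification does not name.

```python
def feature_to_tokens(name, value):
    """Turn one (feature name, original feature value) pair into encoder input tokens."""
    if isinstance(value, (int, float)):
        # Numerical feature: the name and the original value are used directly.
        return [name, str(value)]
    # Text feature: segment the original value into words first.
    # This simple split is an assumption standing in for a word segmenter.
    words = [w for w in value.replace(",", " ").split() if w.lower() != "and"]
    return [name] + words


# Example values taken from the User A example in the text.
user_a = {"age": 50, "place of residence": "Beijing and Shanghai"}
encoder_inputs = [feature_to_tokens(name, value) for name, value in user_a.items()]
print(encoder_inputs)
```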

[0055] In one example, the first encoder includes a time-based unidirectional encoding structure. The step of inputting various features of the target user into the first encoder and obtaining initial user feature vectors corresponding to each feature of the target user through the first encoder includes:

[0056] The features of the target user are sequentially used as the inputs of the first encoder at each time step, and the outputs of the first encoder at each time step are used as the initial user feature vectors.

[0057] In another example, the first encoder includes a time-based bidirectional encoding structure. The step of inputting various features of the target user into the first encoder and obtaining initial user feature vectors corresponding to each feature of the target user through the first encoder includes:

[0058] The various features of the target user are input into the first encoder in a first order, and the first feature vector of each feature is obtained based on the output of the first encoder at each time step.

[0059] The features of the target user are input into the first encoder in the reverse order of the first order, and the second feature vector of each feature is obtained based on the output of the first encoder at each time step.

[0060] The first and second feature vectors corresponding to the same feature are combined to form the initial user feature vector for that feature.
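The bidirectional steps above can be sketched with a toy recurrent pass. Everything numeric here is an assumption: the `tanh` update with a fixed `decay` stands in for a trained LSTM or GRU cell, and "combined" is read as concatenation, which the specification does not confirm.

```python
import math


def run_recurrent(inputs, decay=0.5):
    """Toy recurrent pass that returns one output vector per time step."""
    state = [0.0] * len(inputs[0])
    outputs = []
    for x in inputs:
        # The new state depends on the previous state and the current input.
        state = [math.tanh(decay * s + xi) for s, xi in zip(state, x)]
        outputs.append(state)
    return outputs


def bidirectional_encode(feature_embeddings):
    """Return one initial user feature vector per feature."""
    forward = run_recurrent(feature_embeddings)
    # Run in reverse order, then flip back so index i again refers to feature i.
    backward = run_recurrent(feature_embeddings[::-1])[::-1]
    # Combine the two vectors of the same feature (concatenation is assumed).
    return [f + b for f, b in zip(forward, backward)]


# Hypothetical 2-dimensional embeddings for three features.
h = bidirectional_encode([[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]])
print(len(h), len(h[0]))
```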

[0061] In one example, encoding the initial user feature vectors based on a self-attention mechanism includes:

[0062] The weights corresponding to each initial user feature vector are determined, and the initial user feature vectors are weighted and summed according to each weight to obtain the encoded state vector.

[0063] Figure 3 is a schematic diagram of the structure of a first encoder according to one embodiment. Referring to Figure 3, the first encoder includes a time-based bidirectional encoding structure and a self-attention layer. The features of the target user, denoted feature 1, feature 2, feature 3, …, feature n, are input into the bidirectional encoding structure, which generates the initial user feature vectors corresponding to the features, denoted h1, h2, h3, …, hn. These initial user feature vectors are then input into the self-attention layer to obtain an encoded state vector, denoted X.
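The weighted summation that turns h1 … hn into X might look like the following sketch. The specification states only that a weight is determined for each initial user feature vector; scoring each vector against a learned `query` vector and normalizing with a softmax is an assumption made here for illustration.

```python
import math


def softmax(scores):
    """Normalize scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Weight each vector by its softmax score against `query`, then sum."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


# Hypothetical initial user feature vectors h1..h3 and query vector.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
X, weights = attention_pool(h, query=[0.3, -0.2])
print(X, weights)
```

With an all-zero query every feature receives the same weight, so X reduces to the plain average of the feature vectors.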

[0064] Then, in step 22, the encoded state vector is input into the retrieval model. The retrieval model retrieves K sentences from the human knowledge base, determines the character encoding vectors corresponding to each character in the K sentences, determines the attention coefficients corresponding to each character based on the output feedback vector of the decoder and the character encoding vectors, and performs a weighted summation of the character encoding vectors based on the attention coefficients to obtain the semantic representation vectors corresponding to the K sentences. It is understood that the sentences in the human knowledge base reflect human experience. Retrieval can obtain human experience relevant to the target user, and the output feedback vector of the decoder influences the semantic representation vector output by the retrieval model, thereby further improving the quality of the obtained text.
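The retrieval of K sentences can be pictured as a top-K selection. How the retrieval network scores sentences is not stated in the text, so the dot-product scoring, the precomputed `sentence_vectors`, and the function name `retrieve_top_k` below are all assumptions.

```python
def retrieve_top_k(state_vector, sentence_vectors, sentences, k):
    """Return the k knowledge-base sentences that score highest against the state vector."""
    scores = [
        sum(a * b for a, b in zip(state_vector, vec))
        for vec in sentence_vectors
    ]
    # Rank sentence indices from highest to lowest score.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:k]]


# Hypothetical knowledge base with N = 4 sentences and 2-dimensional vectors.
knowledge_base = ["sentence 0", "sentence 1", "sentence 2", "sentence 3"]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(retrieve_top_k([1.0, 0.5], vectors, knowledge_base, k=2))
```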

[0065] In one example, the retrieval model includes a second encoder, and determining the character encoding vector corresponding to each character contained in the K statements includes:

[0066] Obtain the character embedding vectors corresponding to each character contained in the K statements;

[0067] Each character embedding vector is input into a second encoder, which then determines the character encoding vector corresponding to each character in the K sentences based on an attention mechanism.

[0068] Figure 4 is a schematic diagram of the structure of a retrieval model according to one embodiment. Referring to Figure 4, the retrieval model includes a retrieval network, a second encoder, and a self-attention layer. The encoded state vector X is input into the retrieval network, which retrieves K sentences from the N sentences of the human knowledge base. Here, N is the total number of sentences in the human knowledge base and is typically large, such as hundreds or thousands; K is a preset value, such as 2, 3, or 5. The character embedding vector w_i of each character in the K sentences is input into the second encoder, which determines the character encoding vector s_i of each character. The character encoding vector R_i of each character and the output feedback vector F of the decoder are then input into the self-attention layer, which outputs the semantic representation vector H corresponding to the K sentences.
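The attention step of the retrieval model can be written out as follows. This is a sketch under the assumption that the attention coefficients come from a softmax over dot products between the output feedback vector and each character encoding vector; the specification only states that the coefficients are determined from these two inputs. Here s_i denotes the character encoding vector of the i-th of the M characters in the K sentences (M is a symbol introduced for illustration), and t indexes the decoder time steps.

```latex
\alpha_i(t) = \frac{\exp\big(F(t)^{\top} s_i\big)}{\sum_{j=1}^{M} \exp\big(F(t)^{\top} s_j\big)},
\qquad
H(t) = \sum_{i=1}^{M} \alpha_i(t)\, s_i
```

Under this reading, the coefficients are positive and sum to 1, so H(t) is a weighted average of the character encoding vectors that shifts as the feedback vector changes from one time step to the next.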

[0069] In the embodiments described in this specification, the second encoder can be based on various model structures, such as transformer, LSTM or GRU model structures.

[0070] Finally, in step 23, the encoded state vector and the semantic representation vector are input into the decoder, which generates the user description text for the target user. The hidden state of the decoder serves as the output feedback vector. It can be understood that the retrieval model obtains the semantic representation vector, which serves as the input to the decoder, while the hidden state of the decoder serves as the output feedback vector. This output feedback vector acts on the retrieval model, influencing the semantic representation vector obtained by the retrieval model. Through the interaction between the retrieval model and the decoder, the text quality of the generated user description text can be improved.

[0071] In one example, the decoder includes a time-based decoding structure. The decoder takes the encoded state vector as the initial state, takes the decoder output of the previous time step and the semantic representation vector output by the retrieval model at the previous time step as the input of the current time step, determines the output and hidden state at the current time step, and feeds the hidden state at the current time step back to the retrieval model as the output feedback vector of the current time step. The output at each time step corresponds to one character in the user description text.

[0072] Figure 5 is a schematic diagram of the structure of a decoder according to one embodiment. Referring to Figure 5, the decoder includes a time-based decoding structure. The decoder takes the encoded state vector X as the initial state, and uses the decoder output y(t-1) from the previous time step and the semantic representation vector H(t-1) output by the retrieval model at the previous time step as inputs to determine the current output y(t) and hidden state h(t). The hidden state h(t) is then fed back to the retrieval model as the current output feedback vector F(t). The outputs at each time step correspond to individual characters in the user description text. It can be understood that since each time step has a different hidden state, the output feedback vectors at each time step are different, and correspondingly, the semantic representation vectors at each time step are different.
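The feedback loop between the decoder and the retrieval model can be sketched as below. This is a toy illustration, not the patented decoder: the `tanh` update, the dot-product attention, the zero start embedding, the `"<end>"` token, the `vocab` embedding table, and the use of X as the first feedback vector are all assumptions introduced here.

```python
import math


def attend(feedback, char_vectors):
    """Weight character encoding vectors by softmax(feedback . vector) and sum them."""
    scores = [sum(f * c for f, c in zip(feedback, vec)) for vec in char_vectors]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    dim = len(feedback)
    return [sum(e / total * vec[d] for e, vec in zip(exps, char_vectors)) for d in range(dim)]


def decode(X, char_vectors, vocab, max_steps=5):
    """Generate tokens step by step; vocab maps each token to a toy embedding."""
    hidden = list(X)                    # the encoded state vector is the initial state
    prev_out = [0.0] * len(X)           # stand-in for a start-of-text embedding
    H = attend(hidden, char_vectors)    # assumption: the first H uses X as feedback
    text = []
    for _ in range(max_steps):
        # Inputs of the current step: previous output y(t-1) and previous H(t-1).
        hidden = [math.tanh(0.5 * h + y + s) for h, y, s in zip(hidden, prev_out, H)]
        # Pick the token whose embedding best matches the new hidden state h(t).
        token = max(vocab, key=lambda t: sum(a * b for a, b in zip(hidden, vocab[t])))
        if token == "<end>":
            break
        text.append(token)
        prev_out = vocab[token]
        # h(t) is fed back as F(t), so the retrieval model produces a new H(t).
        H = attend(hidden, char_vectors)
    return text


# Hypothetical two-token vocabulary and two character encoding vectors.
vocab = {"a": [1.0, 0.0], "b": [0.0, 1.0], "<end>": [-1.0, -1.0]}
print(decode([0.2, 0.1], [[1.0, 0.0], [0.0, 1.0]], vocab, max_steps=4))
```

Recomputing H inside the loop is what makes the semantic representation vector differ from one time step to the next.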

[0073] In one example, the method further includes:

[0074] At least one of the first encoder, the retrieval model, and the decoder is trained using first-type samples and second-type samples. A first-type sample has the features of a sample user and a binary classification label for that sample user. A second-type sample has the features of a sample user and the sample description text for that sample user. The number of first-type samples is greater than the number of second-type samples.

[0075] It is understood that the embodiments of this specification can classify target users based on the encoded state vector obtained by the first encoder.
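A classification head on top of the encoded state vector could be as simple as the sketch below. The specification does not describe the classifier, so the linear layer followed by a sigmoid, and the names `risk_probability`, `weights`, and `bias`, are assumptions.

```python
import math


def risk_probability(X, weights, bias=0.0):
    """Map the encoded state vector X to the probability of the 'black' label."""
    logit = sum(w * x for w, x in zip(weights, X)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical encoded state vector and classifier parameters.
print(risk_probability([0.4, -0.2, 0.9], weights=[1.0, 0.5, 2.0]))
```

A head of this kind is what would let the more plentiful first-type samples train the first encoder even though they carry no description text.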

[0076] Table 1 illustrates the sample composition of the first-type samples.

[0077] Table 1

[0078]
            Feature 1   Feature 2   …   Feature 3   Classification label
Sample 1    …           …           …   …           black
Sample 2    …           …           …   …           white
Sample 3    …           …           …   …           black

[0079] As shown in Table 1, Samples 1, 2, and 3 are first-type samples. A first-type sample has the various features of the sample user and a classification label taken from one of two categories, but it does not have the sample description text corresponding to the sample user. Classification labels can have different meanings in different fields. For example, in the field of combating illicit fund transfers, a black classification label indicates that the sample user poses a risk of illicit fund transfers, and a white classification label indicates that the sample user does not.

[0080] Table 2 is a schematic diagram of the sample composition of the second type of samples.

[0081] Table 2

[0082]

[0083] As shown in Table 2, samples 21, 22, and 23 are second-type samples. A second-type sample has the various features of the sample user and the sample description text corresponding to the sample user, but it does not have the binary classification label corresponding to the sample user.
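The two sample compositions can be pictured as two small record types. The sketch below is illustrative only: the class names, the field names, and the choice of a list of floats for the features are assumptions, not taken from the specification.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class FirstClassSample:
    features: List[float]  # assumed numeric encoding of the user features
    label: str             # "black" or "white"; no description text


@dataclass
class SecondClassSample:
    features: List[float]  # same feature layout as the first type (assumed)
    description: str       # manually written description text; no label


Sample = Union[FirstClassSample, SecondClassSample]


def has_label(sample: Sample) -> bool:
    """True for first-type samples, which carry a classification label."""
    return isinstance(sample, FirstClassSample)
```

Keeping the two types separate makes it explicit which supervision signal each sample can contribute during training.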

[0084] In this embodiment, because the sample description text is usually manually generated, consisting of several sentences, it is not easy to obtain, so the number of second-class samples is relatively small. Classification labels are relatively easy to obtain, so the number of first-class samples is relatively large. The number of first-class samples is much larger than that of second-class samples. The classification labels of samples in the first-class samples are related to the sample features, and the sample description text of samples in the second-class samples is related to the sample features. The underlying logic of both is consistent, and their convergence direction during model training is also consistent. Therefore, first-class samples can be used to help train the text generation network, effectively solving the problem of insufficient network learning and poor generalization due to the small number of second-class samples.

[0085] In one example, the model training includes:

[0086] The first encoder is pre-trained using the first type of samples;

[0087] The second type of samples are used to continue training at least one of the pre-trained first encoder, the retrieval model, and the decoder.

[0088] In another example, the model training includes:

[0089] The first type of samples and the second type of samples are mixed together, and their order is randomly shuffled. Then, at least one of the first encoder, the retrieval model, and the decoder is trained in batches according to the shuffled order.

[0090] Figure 6 shows a schematic diagram of a shuffled order of training samples according to one embodiment. Referring to Figure 6, samples 1, 2, 3, 4, and 5 are first-type samples, and samples 6, 7, 8, 9, and 10 are second-type samples. Originally, the first-type and second-type samples are arranged separately; after the order is shuffled, they are mixed together. A batch of training samples taken in the shuffled order contains both first-type and second-type samples. For example, if a batch holds 5 training samples and the order is samples 8, 2, 6, 4, and 5, the batch includes first-type samples (samples 2, 4, and 5) and second-type samples (samples 8 and 6).
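The mixing and batching step can be sketched as follows. The function name `shuffled_batches`, the `"first"`/`"second"` tags, and the optional `seed` parameter are hypothetical and serve only to make the example reproducible.

```python
import random


def shuffled_batches(first_class, second_class, batch_size, seed=None):
    """Mix both sample types, shuffle, and cut into consecutive batches."""
    # Tag each sample with its type so the loss can treat it accordingly later.
    mixed = [("first", s) for s in first_class] + [("second", s) for s in second_class]
    rng = random.Random(seed)
    rng.shuffle(mixed)
    return [mixed[i:i + batch_size] for i in range(0, len(mixed), batch_size)]


# Ten samples as in Figure 6: 1-5 are first-type, 6-10 are second-type.
batches = shuffled_batches([1, 2, 3, 4, 5], [6, 7, 8, 9, 10], batch_size=5, seed=0)
```

A plain random shuffle makes mixed batches likely but does not strictly guarantee that every batch contains both types; a stratified split would be needed if that guarantee matters.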

[0091] Furthermore, the model training utilizes a pre-defined total loss function to determine the total prediction loss of the first and second class samples in the same batch, and adjusts the parameters of at least one of the first encoder, the retrieval model, and the decoder based on the total prediction loss; the total loss function is jointly determined by a first loss function and a second loss function, the value of the first loss function is determined based on the probability of classifying the first class sample, and the value of the second loss function is determined based on the probability that the second class sample output by the decoder corresponds to each word in the sample description text.

[0092] For example, if the total prediction loss is denoted $\text{loss}$, the value of the first loss function is denoted $l_c$, and the value of the second loss function is denoted $l_g$, then $\text{loss} = l_g + l_c$.
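A minimal sketch of this sum for one mixed batch is given below. It assumes that both terms are averaged cross-entropy values and that all probabilities lie strictly between 0 and 1; the function names and the averaging are assumptions for illustration.

```python
import math


def classification_loss(labels, probs):
    """l_c: binary cross-entropy over the first-type samples in the batch.

    labels[i] is 1 for category one and 0 for category two;
    probs[i] is the predicted probability of category one.
    """
    if not labels:
        return 0.0
    terms = [y * math.log(p) + (1 - y) * math.log(1 - p) for y, p in zip(labels, probs)]
    return -sum(terms) / len(terms)


def generation_loss(char_probs):
    """l_g: average negative log-probability the decoder assigns to each
    character of the sample description text (second-type samples)."""
    if not char_probs:
        return 0.0
    return -sum(math.log(p) for p in char_probs) / len(char_probs)


def total_loss(labels, probs, char_probs):
    """loss = l_g + l_c for one batch."""
    return generation_loss(char_probs) + classification_loss(labels, probs)
```

For instance, `total_loss([1, 0], [0.5, 0.5], [0.5])` evaluates to twice the natural logarithm of 2, because each term contributes one `log 2`.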

[0093] The first loss function can be the cross-entropy loss function, which can be expressed by the following formula:

[0094]

[0095] Where n represents the number of first-type samples and i represents the sample index. $y_i$ is 1 when the i-th sample is classified as category one and 0 when it is classified as category two. $p_i$ represents the probability that the i-th sample is classified as category one.
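The formula itself is not reproduced in this copy of the text. A standard binary cross-entropy that is consistent with the symbols defined above would read as follows; the 1/n averaging is an assumption.

```latex
l_c = -\frac{1}{n}\sum_{i=1}^{n}\Bigl[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \Bigr]
```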

[0096] The second loss function can be the cross-entropy loss function, which can be expressed by the following formula:

[0097]

[0098] Where n represents the number of characters output by the decoder and i represents the character index. $y_i$ is 1 when the i-th character output by the decoder belongs to the sample description text and 0 when it does not. $p_{w_i}$ represents the probability of each character in the sample description text output by the decoder.
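This formula is likewise not reproduced in this copy. One cross-entropy form that is consistent with the symbols defined above is sketched below; both the 1/n averaging and the omission of a complementary term are assumptions.

```latex
l_g = -\frac{1}{n}\sum_{i=1}^{n} y_i \log p_{w_i}
```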

[0099] The method provided in the embodiments of this specification first inputs the various features of the target user into the first encoder, obtains the initial user feature vector corresponding to each feature through the first encoder, and encodes these initial user feature vectors using a self-attention mechanism to obtain an encoded state vector. Next, the encoded state vector is input into the retrieval model, which retrieves K sentences from the human-made knowledge base. The character encoding vector corresponding to each character in the K sentences is determined, the attention coefficient corresponding to each character is determined based on the output feedback vector of the decoder and the character encoding vectors, and the character encoding vectors are weighted and summed according to the attention coefficients to obtain the semantic representation vector corresponding to the K sentences. Finally, the encoded state vector and the semantic representation vector are input into the decoder, which generates the user description text of the target user; the hidden state of the decoder serves as the output feedback vector. As can be seen from the above, the embodiments of this specification use not only the initial user feature vectors corresponding to the various features of the target user but also the semantic representation vector corresponding to the K retrieved sentences. Because these K sentences come from the human-made knowledge base, the method draws on the human experience most relevant to the target user and can effectively address problems such as repeated words and erroneous words, giving it strong applicability and good text quality. Furthermore, the retrieval model and the decoder influence each other. The semantic representation vector output by the retrieval model serves as an input to the decoder and affects the user description text that the decoder generates. At the same time, the hidden state of the decoder serves as the output feedback vector, which is an input to the attention mechanism of the retrieval model and affects the semantic representation vector that the retrieval model outputs, thereby further improving the quality of the obtained text.

[0100] According to another embodiment, an apparatus for generating user description text based on a text generation network is also provided. The text generation network includes a first encoder, a retrieval model, and a decoder. The apparatus is used to perform the method for generating user description text based on a text generation network provided in the embodiments of this specification. Figure 7 shows a schematic block diagram of an apparatus for generating user description text based on a text generation network according to one embodiment. As shown in Figure 7, the apparatus 700 includes:

[0101] Encoding unit 71 is used to input various features of the target user into the first encoder, obtain each initial user feature vector corresponding to each feature of the target user through the first encoder, and encode each initial user feature vector based on a self-attention mechanism to obtain an encoded state vector.

[0102] The retrieval unit 72 is used to input the encoding state vector obtained by the encoding unit 71 into the retrieval model, retrieve K sentences from the artificial knowledge base through the retrieval model, determine the character encoding vector corresponding to each character contained in the K sentences, determine the attention coefficients corresponding to each character according to the output feedback vector of the decoder and the character encoding vectors, and perform a weighted summation of the character encoding vectors according to the attention coefficients to obtain the semantic representation vectors corresponding to the K sentences.

[0103] The decoding unit 73 is used to input the encoded state vector obtained by the encoding unit 71 and the semantic representation vector obtained by the retrieval unit 72 into the decoder, and generate the user description text of the target user through the decoder, wherein the hidden state of the decoder is used as the output feedback vector.

[0104] Optionally, as an embodiment, the first encoder includes a time-based unidirectional encoding structure, wherein the encoding unit 71 is specifically used to sequentially use the various features of the target user as the input of the first encoder at each time step, and use the output of the first encoder at each time step as the initial user feature vector.

[0105] Optionally, as an embodiment, the first encoder includes a time-based bidirectional coding structure, and the coding unit 71 includes:

[0106] The first encoding subunit is used to input the various features of the target user into the first encoder in a first order, and obtain the first feature vector of each feature based on the output of the first encoder at each time step.

[0107] The second encoding subunit is used to input the various features of the target user into the first encoder in the reverse order of the first order, and obtain the second feature vector of each feature based on the output of the first encoder at each time step.

[0108] The combination subunit is used to combine the first feature vector obtained by the first encoding subunit and the second feature vector obtained by the second encoding subunit corresponding to the same feature, as the initial user feature vector of that feature.
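The three subunits above can be sketched as a forward pass, a reverse pass, and a per-feature combination. In this illustrative sketch the scalar features, the toy recurrent update in `run_encoder`, and the use of concatenation as the combination are all assumptions.

```python
import math


def run_encoder(features, weight=0.5):
    """Toy recurrent pass (hypothetical cell): one output per time step."""
    state = 0.0
    outputs = []
    for x in features:
        state = math.tanh(weight * state + x)
        outputs.append(state)
    return outputs


def bidirectional_vectors(features):
    """Initial user feature vector for each feature from both reading orders."""
    forward = run_encoder(features)               # first order
    backward = run_encoder(features[::-1])[::-1]  # reverse order, realigned to the features
    # Combine the two vectors that correspond to the same feature (assumed: concatenation).
    return [[f, b] for f, b in zip(forward, backward)]
```

Reversing the backward outputs before pairing is what keeps the first and second feature vectors aligned to the same feature.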

[0109] Optionally, as an embodiment, the encoding unit 71 is specifically used to determine the weights corresponding to each initial user feature vector, and to perform a weighted summation of each initial user feature vector according to each weight to obtain the encoded state vector.
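The weighted summation can be sketched as below. How the weights are derived is not spelled out in this passage, so the sketch assumes that each vector is scored by its average dot-product similarity to all initial user feature vectors and that the scores are normalized with a softmax; the function name `encode_state` is hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def encode_state(feature_vectors):
    """Return (weights, encoded state vector) for a list of equal-length vectors."""
    n = len(feature_vectors)
    # Assumed scoring: average similarity of each vector to every vector in the set.
    scores = [sum(dot(v, u) for u in feature_vectors) / n for v in feature_vectors]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(feature_vectors[0])
    # Weighted sum of the initial user feature vectors gives the encoded state vector.
    state = [sum(w * v[j] for w, v in zip(weights, feature_vectors)) for j in range(dim)]
    return weights, state
```

The result has the same dimension as a single initial user feature vector regardless of how many features the user has, which is what allows it to serve as the decoder's initial state.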

[0110] Optionally, as an embodiment, the retrieval model includes a second encoder, and the retrieval unit 72 includes:

[0111] An acquisition subunit, used to obtain the character embedding vector corresponding to each character contained in the K sentences;

[0112] An encoding subunit, used to input the character embedding vectors obtained by the acquisition subunit into the second encoder, which determines, based on an attention mechanism, the character encoding vector corresponding to each character contained in the K sentences.

[0113] Optionally, as an embodiment, the decoder includes a time-based decoding structure. The decoder takes the encoded state vector as the initial state, takes the decoder output of the previous time step and the semantic representation vector output of the retrieval model at the previous time step as the input of the current time step, determines the output and hidden state at the current time step, and feeds the hidden state at the current time step as the output feedback vector of the current time step back to the retrieval model. The output at each time step corresponds to each character in the user description text.

[0114] Optionally, as an embodiment, the apparatus further includes:

[0115] The training unit is used to train at least one of the first encoder, the retrieval model, and the decoder using first-class samples and second-class samples, wherein the first-class samples have various features of the sample user and classification labels of the two categories corresponding to the sample user, and the second-class samples have various features of the sample user and sample description text corresponding to the sample user, and the number of first-class samples is greater than the number of second-class samples.

[0116] Furthermore, the training unit includes:

[0117] The first training subunit is used to pre-train the first encoder using the first type of samples.

[0118] The second training subunit is used to continue training at least one of the pre-trained first encoder, the retrieval model, and the decoder obtained by the first training subunit using the second type of samples.

[0119] Furthermore, the training unit is specifically used to mix the first type of samples and the second type of samples together, randomly shuffle their order, and then train at least one of the first encoder, the retrieval model, and the decoder in batches according to the shuffled order.

[0120] Furthermore, the model training utilizes a pre-defined total loss function to determine the total prediction loss of the first and second class samples in the same batch, and adjusts the parameters of at least one of the first encoder, the retrieval model, and the decoder based on the total prediction loss; the total loss function is jointly determined by a first loss function and a second loss function, the value of the first loss function is determined based on the probability of classifying the first class sample, and the value of the second loss function is determined based on the probability that the second class sample output by the decoder corresponds to each word in the sample description text.

[0121] Using the apparatus provided in the embodiments of this specification, the encoding unit 71 first inputs the various features of the target user into the first encoder, obtains the initial user feature vector corresponding to each feature through the first encoder, and encodes these initial user feature vectors based on a self-attention mechanism to obtain an encoded state vector. The retrieval unit 72 then inputs the encoded state vector into the retrieval model, retrieves K sentences from the human-made knowledge base through the retrieval model, determines the character encoding vector corresponding to each character contained in the K sentences, determines the attention coefficient corresponding to each character based on the output feedback vector of the decoder and the character encoding vectors, and performs a weighted summation of the character encoding vectors according to the attention coefficients to obtain the semantic representation vector corresponding to the K sentences. Finally, the decoding unit 73 inputs the encoded state vector and the semantic representation vector into the decoder and generates the user description text of the target user through the decoder; the hidden state of the decoder serves as the output feedback vector. As can be seen from the above, the embodiments of this specification use not only the initial user feature vectors corresponding to the various features of the target user but also the semantic representation vector corresponding to the K retrieved sentences. Because these K sentences come from the human-made knowledge base, the apparatus draws on the human experience most relevant to the target user and can effectively address problems such as repeated words and erroneous words, giving it strong applicability and good text quality. Furthermore, the retrieval model and the decoder influence each other. The semantic representation vector output by the retrieval model serves as an input to the decoder and affects the user description text that the decoder generates. At the same time, the hidden state of the decoder serves as the output feedback vector, which is an input to the attention mechanism of the retrieval model and affects the semantic representation vector that the retrieval model outputs, thereby further improving the quality of the obtained text.

[0122] According to another embodiment, a computer-readable storage medium is also provided, on which a computer program is stored, which, when executed in a computer, causes the computer to perform the method described in conjunction with Figure 2.

[0123] According to another embodiment, a computing device is also provided, including a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, it implements the method described in conjunction with Figure 2.

[0124] Those skilled in the art will recognize that, in one or more of the examples above, the functions described in this invention can be implemented using hardware, software, firmware, or any combination thereof. When implemented in software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.

[0125] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the above description is only a specific embodiment of the present invention and is not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, improvements, etc., made on the basis of the technical solution of the present invention should be included within the scope of protection of the present invention.

Claims

1. A method for generating user description text based on a text generation network, wherein the text generation network includes a first encoder, a retrieval model, and a decoder, the method comprising: The various features of the target user are input into the first encoder. The first encoder is used to obtain the initial user feature vectors corresponding to the various features of the target user. The initial user feature vectors are encoded based on the self-attention mechanism to obtain the encoded state vector. The encoded state vector is input into the retrieval model, and K sentences are retrieved from the artificial knowledge base through the retrieval model. The character encoding vectors corresponding to each character contained in the K sentences are determined. The attention coefficients corresponding to each character are determined according to the output feedback vector of the decoder at each time and the character encoding vectors. The character encoding vectors are weighted and summed according to the attention coefficients to obtain the semantic representation vectors corresponding to the K sentences at each time. The encoded state vector and the semantic representation vector are input into the decoder. The decoder includes a time-based decoding structure. The decoder takes the encoded state vector as the initial state, and takes the decoder output of the previous time step and the semantic representation vector output of the retrieval model at the previous time step as the input of the current time step to determine the output and hidden state at the current time step. The hidden state at the current time step is fed back to the retrieval model as the output feedback vector at the current time step. The output at each time step corresponds to each word, which constitutes the user description text of the target user.

2. The method according to claim 1, wherein, The text generation network is trained in the following way: The first encoder is trained using a first type of sample, wherein the first type of sample has various features of the sample user and a classification label of the two categories corresponding to the sample user. The sample user is classified based on the encoding state vector obtained by the first encoder during model training. At least one of the first encoder, the retrieval model and the decoder is trained using a second type of sample, wherein the second type of sample has various features of the sample user and a sample description text corresponding to the sample user. The number of first type samples is greater than the number of second type samples.

3. The method as described in claim 2, wherein, The model training includes: The first type of samples and the second type of samples are mixed together, and their order is randomly shuffled. The samples are then divided into batches according to the shuffled order. The total prediction loss of the first type of samples and the second type of samples in the same batch is determined using a pre-set total loss function. The parameters of at least one of the first encoder, the retrieval model and the decoder are adjusted according to the total prediction loss.

4. The method of claim 3, wherein, The total loss function is jointly determined by a first loss function and a second loss function. The value of the first loss function is determined based on the probability of classifying the first type of sample, and the value of the second loss function is determined based on the probability that the second type of sample output by the decoder corresponds to each word in the sample description text.

5. The method of claim 1, wherein, The first encoder includes a time-based unidirectional encoding structure. The step of inputting various features of the target user into the first encoder and obtaining initial user feature vectors corresponding to each feature of the target user through the first encoder includes: The features of the target user are sequentially used as the inputs of the first encoder at each time step, and the outputs of the first encoder at each time step are used as the initial user feature vectors.

6. The method of claim 1, wherein, The first encoder includes a time-based bidirectional encoding structure. The step of inputting various features of the target user into the first encoder and obtaining initial user feature vectors corresponding to each feature of the target user through the first encoder includes: The various features of the target user are input into the first encoder in a first order, and the first feature vector of each feature is obtained based on the output of the first encoder at each time step. The features of the target user are input into the first encoder in the reverse order of the first order, and the second feature vector of each feature is obtained based on the output of the first encoder at each time step. The first and second feature vectors corresponding to the same feature are combined to form the initial user feature vector for that feature.

7. The method of claim 1, wherein, The encoding of each initial user feature vector based on a self-attention mechanism includes: The weights corresponding to each initial user feature vector are determined, and the initial user feature vectors are weighted and summed according to each weight to obtain the encoded state vector.

8. An apparatus for generating user description text based on a text generation network, wherein the text generation network includes a first encoder, a retrieval model, and a decoder, the apparatus comprising: an encoding unit, used to input the various features of the target user into the first encoder, obtain the initial user feature vectors corresponding to the various features of the target user through the first encoder, and encode the initial user feature vectors based on the self-attention mechanism to obtain the encoded state vector; a retrieval unit, used to input the encoded state vector obtained by the encoding unit into the retrieval model, retrieve K sentences from the artificial knowledge base through the retrieval model, determine the character encoding vector corresponding to each character contained in the K sentences, determine the attention coefficients corresponding to each character according to the output feedback vector of the decoder at each time and the character encoding vectors, and perform a weighted summation of the character encoding vectors according to the attention coefficients to obtain the semantic representation vectors corresponding to the K sentences at each time; and a decoding unit, used to input the encoded state vector obtained by the encoding unit and the semantic representation vector obtained by the retrieval unit into the decoder, wherein the decoder includes a time-based decoding structure, the decoder takes the encoded state vector as the initial state, takes the decoder output of the previous time step and the semantic representation vector output by the retrieval model at the previous time step as the input of the current time step, determines the output and hidden state of the current time step, and feeds the hidden state of the current time step back to the retrieval model as the output feedback vector of the current time step, and the output of each time step corresponds to each word, which constitutes the user description text of the target user.

9. A computer-readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform the method of any one of claims 1-7.

10. A computing device comprising a memory and a processor, wherein the memory stores executable code, and the processor, when executing the executable code, implements the method of any one of claims 1-7.